Close
About
FAQ
Home
Collections
Login
USC Login
Register
0
Selected
Invert selection
Deselect all
Deselect all
Click here to refresh results
Click here to refresh results
USC
/
Digital Library
/
University of Southern California Dissertations and Theses
/
Risk-aware path planning for autonomous underwater vehicles
(USC Thesis Other)
Risk-aware path planning for autonomous underwater vehicles
PDF
Download
Share
Open document
Flip pages
Contact Us
Contact Us
Copy asset link
Request this asset
Transcript (if available)
Content
RISK-AWARE PATH PLANNING FOR AUTONOMOUS UNDERWATER VEHICLES by Arvind Antonio de Menezes Pereira A Dissertation Presented to the FACULTY OF THE USC GRADUATE SCHOOL UNIVERSITY OF SOUTHERN CALIFORNIA In Partial Fulfillment of the Requirements for the Degree DOCTOR OF PHILOSOPHY (COMPUTER SCIENCE) Dec 2013 Copyright 2013 Arvind Antonio de Menezes Pereira Acknowledgements First and foremost I must thank my advisor Gaurav Sukhatme for his patient guid- ance throughout this thesis. He took me under his wing and helped me develop confidence in my research work. Collaborative work with (now Asst. Prof.) Geof- frey Hollinger produced some of the key results in this thesis and I am very grateful for his suggestions, advice and friendship over the years I have known him. I must also thank my qualifying committee members Sven Koenig, Stefan Schaal, David Caron and Jernej Barbic for their valuable suggestions. Stefan and Dave deserve special thanks for finding time at short notice to be at my defense, despite their busy schedules. Prof. Burt Jones helped me get both the key datasets that I use extensively throughout this thesis - Automatic Identification System (through Paul Reuter) and the Regional Ocean Model (ROMs) datasets (by introducing us to Yi- Chao and John Fararra). Burt, Paul, Yi-Chao and John, thank you for providing access to the datasets that made this dissertation possible. The members of the Robotic Embedded Systems Laboratory (RESL) have always been a close-knit team and I have benefitted greatly by being part of this lab. Mem- bers both past and present have influenced my life (and this thesis) in many ways. ii Hordur Heidarsson, Jonathan Binney, Supreeth Subbaraya, Christian Potthast and StephanieKemnahavehelpedmeongliderdeployments. Hordur, JonandSupreeth played important roles in collaborating on several glider-related projects ranging from glider communications, to data-compression. Beside Geoff, I have to thank twomorepostdocs-RyanSmithandJoergMuellerfortheirvaluableadvice. Burt’s students have always been a big help especially during field deployments. Ivona Cetinic, Bridget Seegers, Elizabeth Teel, and Xiao Liu have been part of glider deployments and recoveries. Special thanks go to Carl Oberg and Matthew Ra- gan. Matthew and I have been up into the wee hours of the morning due to glider crisis management. I must thank the folks at Teledyne Webbresearch for their support, especially Ben Allsup and Clayton Jones. The folks at USC’s Wrigley Institute of Environmental studies have also been very important in enabling field tests, glider deployments, recoveries including several emergency recoveries. Trevor Oudin, Gerry Smith and Gordon Boivin deserve special mention for taking me out on many a glider recovery. Similarly, Ray Arntz, Kaya Heller and the rest of the Sundiver team helped us pick up gliders during emergency recoveries, allowing us to fix problems with the gliders and redeploy them. Big thanks also go to Lizsl de Leon for being an amazing PhD student advisor to all of us at USC. I must thank funding agencies that have funded me through the years (specially the National Science Foundation and Office of Naval research) for supporting this work. I would not be here without my parents Pamela and Balduino, and siblings Rachel iii and David, whose support helped me fulfill my dream of studying at a wonderful university like USC. I must also thank my friends at the National Institute of Oceanography (NIO) in India, friends from the Instituto Superior Technico (Prof. Antonio Pascoal and Luis Sebastiao in particular) for providing early inspiration to pursue marine robotics. Similarly, Srikanth Saripalli, Jonathan Kelly, Gabe Sibley, MarinKobilarov, KarthikDantu, SameeraPodhuriandBinZhang, whoweresenior students during my time at RESL were also inspirations. I am grateful to Helga and Joaquim Goes for encouraging me to pursue the American dream. To all my old lab mates at RESL, Megha, Hordur, Ryan W., JD, Harsh, Stephanie, Chris, Max and David, many thanks especially for enduring many of my practice talks and helping me improve as a speaker. My old room-mates and close friends Anuj, Arun, Akshay and Pankaj, thanks for the constant encouragement to get done with pursuing the higher degree. Finally, I must thank my wife Svetlana for not only encouraging me to finish my thesis, but also for running my planners for a few hours each night while I caught up on sleep to keep the field experiments in Chapter 3, going for over a week. The figures and text in this thesis about those field experiments will not explain the effort that was expended in producing these results. Looking back at nearly two thousand hours spent running these planners on gliders at sea - the satisfaction that comes from making robots autonomous, makes it all worth it. iv Table of Contents Acknowledgements ii List of Tables ix List of Figures x Abstract xxix Chapter 1 Introduction 1 Chapter 2 Deterministic Minimum risk planning 5 2.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5 2.1.1 Related Work . . . . . . . . . . . . . . . . . . . . . . . . . . 8 2.2 Problem Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . 12 2.3 Proposed Solution . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12 2.3.1 Risk Map Creation using AIS Data . . . . . . . . . . . . . . 17 2.3.2 Dealing with Dead-reckoning Uncertainty . . . . . . . . . . . 19 2.3.3 Dealing with Arbitrarily Long Paths . . . . . . . . . . . . . 20 2.4 Minimum Risk Planning with Communication Utility . . . . . . . . 22 2.4.1 Proposed Solution: Minimum Risk Planner using A ∗ . . . . 24 2.4.2 Proof of the Admissibility of the A* heuristic . . . . . . . . 27 2.4.3 Proof of the Correctness of using A* for planning Minimum Risk paths with edge-dependent, non-negative node weights . . . . . . . . . . . . . . . . . . . 29 2.5 Deterministic minimum risk planning in predictive ocean currents . . . . . . . . . . . . . . . . . . . . . . . . 31 v 2.5.1 Proposed solution: Time-expanded A* planner . . . . . . . . 32 2.5.2 Calculation of the Time of Flight . . . . . . . . . . . . . . . 35 2.6 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39 2.6.1 Resultsfromdeterministicplanningwithoutcurrentpredictions 39 2.6.2 Results from deterministic planning with current predictions 42 2.7 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43 Chapter 3 Risk-Aware Planning under Uncertainty in Prediction Models 56 3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56 3.2 Related Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61 3.3 Risk Aware Planning: Problem Definitions and Algorithms . . . . . 67 3.3.1 Problem formulation: Deterministic minimum-risk planning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68 3.3.2 Minimum-risk Planner . . . . . . . . . . . . . . . . . . . . . 69 3.3.3 Minimum-Expected Risk Planner . . . . . . . . . . . . . . . 70 3.3.4 Risk aware Markov Decision Process Planner . . . . . . . . . 71 3.4 Methods and Description of Sources of Data . . . . . . . . . . . . . 73 3.4.1 Risk Map Creation using AIS Data . . . . . . . . . . . . . . 74 3.4.2 Current predictions from the regional ocean model system . 78 3.4.3 Simulation of glider motion . . . . . . . . . . . . . . . . . . 79 3.4.4 Creation of the planning graph . . . . . . . . . . . . . . . . 86 3.5 Planner Simulations . . . . . . . . . . . . . . . . . . . . . . . . . . 89 3.5.1 Comparison of planners in simulation in the southern Cali- fornia Bight . . . . . . . . . . . . . . . . . . . . . . . . . . . 90 3.5.2 ComparisonofMDP,MERandMin-RiskPlannersonVirtual map . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95 3.6 Field Trials . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101 3.6.1 Experimental setup . . . . . . . . . . . . . . . . . . . . . . . 102 3.6.2 Experiment 1: Minimum-Expected-Risk and MDP execution along currents . . . . . . . . . . . . . . . . . . . . . . . . . . 106 3.6.3 Experiment 2: Minimum-Expected-Risk and MDP execution against currents . . . . . . . . . . . . . . . . . . . . . . . . . 108 3.7 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 110 vi Chapter 4 Learning uncertainty in ocean models to aid planning 118 4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 118 4.2 Risk-aware planning using MDPs and predictive ocean models . . . . . . . . . . . . . . . . . . . . . . . . 122 4.3 Methods and Algorithms . . . . . . . . . . . . . . . . . . . . . . . . 129 4.3.1 Gaussian Process regression . . . . . . . . . . . . . . . . . . 129 4.3.2 Local approximation and block learning . . . . . . . . . . . 132 4.3.3 Comparison of uncertainty predictions . . . . . . . . . . . . 133 4.3.4 Policy switching risk-aware MDP using stationary transition models . . . . . . . . . . . . . . . . . . . . . . . . 136 4.3.5 Risk-aware Gaussian Process MDP using stationary transi- tion models . . . . . . . . . . . . . . . . . . . . . . . . . . . 137 4.3.6 Risk-aware Gaussian Process MDP using non-stationary transition models . . . . . . . . . . . . . . . 139 4.4 Planner Simulations . . . . . . . . . . . . . . . . . . . . . . . . . . 141 4.4.1 Comparisonofswitchingpolicyrisk-awareMDPwiththesta- tionary risk-aware GP-MDP . . . . . . . . . . . . . . . . . . 144 4.4.2 Comparisonofnon-stationaryrisk-awareGP-MDPwithswitch- ing and stationary risk-aware MDPs . . . . . . . . . . . . . 148 4.4.3 Gain of using an Omniscient MDP over a practical finite- horizon non-stationary MDP . . . . . . . . . . . . . . . . . . 150 4.5 Field Trials . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 153 4.6 Related Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 156 4.7 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 161 Chapter 5 Conclusion 171 Appendix Chapter A Practical aspects to Coastal Ocean Observato- ries 176 Appendix Chapter B Description of field trials of glider AUVs 179 B.1 Pre-deployment procedures . . . . . . . . . . . . . . . . . . . . . . . 184 B.2 Glider Deployment . . . . . . . . . . . . . . . . . . . . . . . . . . . 186 B.3 Normal Glider Operations during deployment . . . . . . . . . . . . 189 vii B.4 Automated and Semi-Automated plan execution using Glider Path Planning Library (GPPLib) . . . . . . 190 B.5 Glider Retrieval . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 193 B.6 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 195 AppendixChapterC Risk-reductionviaimprovementstoAUV-shore communication 197 C.1 Hardware Modifications and Communication Protocol Design . . . 201 C.1.1 Hardware Modifications . . . . . . . . . . . . . . . . . . . . 201 C.1.2 The Communication Protocol Used on the Vehicle . . . . . . 203 C.2 Data Collection and Processing . . . . . . . . . . . . . . . . . . . . 206 C.2.1 Data Gathering at Sea . . . . . . . . . . . . . . . . . . . . . 207 C.2.2 Accelerometer Data Processing Pipe-line . . . . . . . . . . . 208 C.2.3 Communication parameters processing . . . . . . . . . . . . 213 C.3 Experimental Results and Analysis . . . . . . . . . . . . . . . . . . 215 C.4 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 224 C.5 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 226 Reference List 228 viii List of Tables 4.1 Correlation coefficients (R-values) for Gaussian Process variance and interpolation variance relative to true prediction error (shown sepa- rately for N/S component and E/W components of the ocean current vectors) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 136 4.2 Summary of planner performance. (X⇒ good,×⇒ poor). Here the stationary GPMDP consistently outperforms the other planners non-stationary GPMDP outperforms the stationary planners (even with a limited prediction horizon) in faster currents. . . . . . . . . . 165 C.1 Comparison of classifier performance on link-quality . . . . . . . . . 223 C.2 Comparison of classifier performance on link-quality . . . . . . . . . 224 C.3 Comparison of classifier performance on link-quality . . . . . . . . . 224 ix List of Figures 2.1 An overlay of the AIS data from Jan 2010 to April 2010 along with all glider surfacings from 2 gliders color coded differently for iden- tification, during the deployments from 2009 and 2010. Also shown are planned and currently operational base station sites for shore- based communication with gliders are also shown. Point Fermin and Catalina Island have HF-radar sites which provide surface current measurements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6 2.2 Obstaclessuchasthebathymetrycanalsobecombinedintothesame risk map with a very high value assigned to the risk of hitting it. The risk-map shown here has also undergone a Gaussian blur to spread the noise in the creation of the risk map, as well as to account for theuncertaintiesintheglidernavigationwhileexecutingthemission. Here lower risk is black while increasing risk goes to white through grey. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13 2.3 Figure illustrating theGetNeighbours † (C) step in ourA ∗ search for the shortest risk path while using the d max constraint. The blue circle indicates a region that encloses all nodes around a node at the frontier of expansion, where we compare the cost of all nodes around it which are under d max distance away. All nodes around C in the open−list have their parent-pointers pointing to C, since it is the minimum f-cost node. In this figure a darker cell refers to one with higher risk, and the lowest-cost path should be one that picks lighter cells in order to achieve the minimum total path risk. At the next stage of the algorithm, if node E has the minimum f-cost, it will be the next node at the search frontier. . . . . . . . . . . . . . . . . . 15 x 2.4 An illustration (not to scale) of how a glider plan is modified between a set of way points w k and w k+1 such that the glider stays deeper whiletraversingthesegmentthroughariskyarea. Foreverysegment that has a risky area which the glider might pass through, the glider will need to stay deeper. . . . . . . . . . . . . . . . . . . . . . . . . 16 2.5 Amapshowingtheproposedandfunctionalbase-stationsintheSCB region. Low bandwidth Iridium communication is available through- out the whole region, whereas Line-Of-Sight (LOS) communication with the base-stations is only available within the shaded regions around each of the red diamond shaped base-station locations. We create this throughput map based upon experimental data as well as LOS calculations from an elevation map. A better communication model for throughput if available, could use be substituted instead. 25 2.6 Glider needs to reach B from location A, but will drift due to the currents to location B’. The pseudo-waypoint method iteratively at- tempts to find a good location B” which the glider should aim for instead of B, so that the currents push the glider to location B. The location B” is called a pseudo-waypoint. . . . . . . . . . . . . . . . 35 2.7 Illustration of the effect of currents on the execution of a planned path between the Start waypoint A, and Goal waypoint B. In the presence of a field of currents whose average velocity is represented by the colored vector in each grid shown, the glider will drift along cells such as C, following the path to D instead of going to B. One way to circumvent this is to find a pseudo target waypoint E to aim for which in the presence of currents takes the glider to the goal waypoint B. Another approach to calculating the time-of-flight would be to compute the resultant velocity along the straight line between A and B, for all the cells (shown in blue here), which lie along that line. Knowing the distance between A and B, as well as the resultant velocity it is possible to calculate an approximate time-of-flight between A and B. . . . . . . . . . . . . . . . . . . . . 37 xi 2.8 Here Shortest-Path-Low-Risk algorithm is 11.8 times riskier than the minimum-risk-path chosen by our planner which is optimal w.r.t. risk and a givend max = 8 km. Even though the Min-Risk-At-Surface path’s waypoints are chosen to minimize the total risk of the path at the surface while respecting the d max constraint, it is 33.4 times riskier than the minimum-risk path found by our method and 3.23 times longer than the shortest path algorithm. . . . . . . . . . . . 41 2.9 A path stretching from the Malibu basin to the coast off Orange county illustrates how aiming for the shortest path could lead to bad results. Even though our path is 3.5 km longer than the shortest path between the start and goal location, the shortest path is 5200 times riskier. The path planned by our algorithm picks the minimum risk waypoints all along the path for which it accrues R min at each waypoint. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46 2.10 Plot of the variations in the d max constraint parameter. As we in- creased max we see that the usual trend is for the paths to get shorter while the risk associated with that path reduces. This is because we allow the glider to dive under hazards for longer distances. . . . . . 47 2.11 Plot of variations in the α tuning parameter which allows the user to assign higher nominal risk to each waypoint. By increasing this value ofα, we find that the minimum risk paths usually get shorter, since we are essentially allowing the glider to take a riskier path to the destination. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48 2.12 Running times for the Minimum risk planner vs. the maximum time of flight (100 randomly chosen start and end locations for each value oft max ). The Minimum-risk planner is more than 1000x faster on av- erage as compared to the time-expanded planner (due to the added dimension of time, as well as the overhead in testing feasibility of paths). We also see that although longer paths have a larger branch- ing factor, beyond t max =11 there is a gradual drop in the time to find the optimal solution. This is probably because longer paths can count toward the T max limit faster, and thus get eliminated quicker. 49 xii 2.13 Plots of (a) average risk values, (b) average number of waypoints in a plan and (c) average path-length vs. the maximum time of flight, over 100 trials for each value of t max . Note that ast max goes up, the Time−Expanded planner’s risk approaches that of theMin−Risk planner even though it uses more waypoints and has a longer path length. The Time−Expanded version is forced to use longer paths because it has to eliminate many of the longer path segments due to currents. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50 2.14 The goal waypoint in this case is the red star. If the glider pro- ceeds directly toward this waypoint, it will drift off along the green direction. The planner chooses to use the green triangle as the goal location as a pseudo-waypoint. This choice in the presence of the currents helps the glider get to the waypoint within a single itera- tion. The Min−Risk planner with no knowledge of currents, is unable to get to the desired waypoints accurately, and consequently has higher risk in practice than theTime−Expanded planner. Also shown in the figure is the grid layout on with the current map and risk maps used in the planner. . . . . . . . . . . . . . . . . . . . . . 51 2.15 Paths planned with a value oft max =12 hr. The magenta path is the minimum-risk path based on prior work where ocean currents are not accounted for. It results in the lowest possible numerical risk value, but is only valid if there are no ocean currents. The time-expanded planner’spathshownincyan, onthecontraryusesinformationabout the ocean currents. Also shown is an overlay of the ocean currents in the region and the risk map. We plan both these paths starting at 00:00 GMT on July 16, 2011, so as to simulate the conditions from our field experiment which motivated this path planner. . . . . . . . 52 2.16 Paths planned with a value of t max =11 hr. The green path is the time-expanded path planner’s output in the presence of the currents for the same start and goal locations shown in Fig. 3.2. The goal lo- cations are indicated by a "red" G. We plan both these paths starting at 00:00 GMT on July 16, 2011. Since the glider struggles against the currents when going from east to west (in the green path), it is forced to have to take a path going north-wards as well as to make a stop (Wpt 10) in between the two shipping lanes. When travelling in the opposite direction, the glider can take advantage of the currents and make it across much faster. . . . . . . . . . . . . . . . . . . . . 53 xiii 2.17 Paths planned with a value of t max =5, 7 and 11 hr using only the Time-Expanded planner in the presence of currents. The green path is the time-expanded path planner’s output in the presence of the currents for the same start and goal locations shown in Fig. 3.2. The goal locations are indicated by a "red" G. We notice that the cyan path has more than 20 waypoints to get to the goal, with each leg having a maximum time of 5 hours. The magenta path (t max = 7 hours) has fewer waypoints than the gray path but more waypoints than the green path which has t max = 11 hours. We also see that in order to collect lower risk the green path takes a much more longer path, but one which is shorter than T max of 100 hours. . . . . . . . 54 2.18 Pathsplannedwithavalueoft max =8, 10and12hrrespectivelyusing our Time-Expanded planner in the presence of currents. The green path is the time-expanded path planner’s output in the presence of the currents where the start and goal locations are interchanged from those shown in Fig. 3.2. We plan all these paths starting at 00:00 GMT on July 16, 2011. We notice that the "dark green" and "dark blue" paths have both got similar paths - they move south west initially because they attempt crossing the shipping lanes at a narrower portion. The red path which has t max = 12 hr plans a slightly counter-intuitive path which goes north before moving west- south-west with long jumps. It appears to take advantage of the currents and expects to comfortably clear the broad high-risk area in the center. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55 3.3 (a) Occupancy map created using historical AIS data. (b) Obstacle map created using bathymetry data. (c) Combined Occupancy map after Gaussian Blur. . . . . . . . . . . . . . . . . . . . . . . . . . . 74 xiv 3.4 (a) For simplicity we decouple the kinematic model for the glider in the horizontal and depth planes. In the depth plane, the glider performs yo-yo maneuvers which appear like sinusoids (due to the dynamics of the vehicle where the glider accelerates when diving to a terminal velocity on the downward yo, until it decelerates at depth 2 to begin an upward yo, followed by acceleration to terminal velocity on the upward yo). When flying in water deeper than depth 2 , the glider takes a nominal time T to perform each yo-cycle. When flyinginshallowerregionsthandepth 2 , thegliderdetectstheseafloor (using the altimeter) and inflects before reachingdepth 2 - shortening both dive and climb times. Ignoring accelerations and deceleration at both ends, we can arrive at a simple model for the depth of the glider to allow us to do integrate the glider trajectory. The glider pre-computesthenumberofyo-maneuversrequiredtogettothegoal at a given bearing. (b) This simplified vector diagram for glider dive and climb velocities allows us to compute the nominal horizontal glider velocity v g . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81 3.6 Resultsforsimulationsperformedonwellautocorrelatednodes(start and goal locations are well autocorrelated) on January 1, 2011 and February 1, 2011 respectively. January is a month with fast currents which overwhelm the glider in many cases causing it to have very low success rates. Most successful path completions are those which are along the flow of the currents, with very few successful completions in the reverse direction. In January, due to the dangers involved in fighting the currents the strategy chosen by the MDP is usually to go to a safe area and loiter until conditions improve instead of trying to head toward the goal. On the contrary, the MER and Min-Risk planners are very goal-directed and hence attempt to make headway to the goal even though this can result in them being forced toward land by the strong currents. Interestingly enough, there is very little difference in the performance of the planners in terms of success or failures. The MDP is certainly more conservative and consequently times out more often. . . . . . . . . . . . . . . . . . . . . . . . . . . 92 xv 3.7 Results for simulations performed on poorly autocorrelated nodes (start and goal locations are poorly autocorrelated) on January 1, 2011 and February 1, 2011 respectively. January is a month with fast currents which overwhelm the glider in many cases causing it to have verylowsuccessrates. ThebenefitsoftheMDPinbeingabletocope with higher variability in the currents, as opposed to the other two planners gives the MDP a definite edge w.r.t. poorly autocorrelated nodes. WeseethatinthemonthofJanuarytheMDPmakestheright decision by loitering instead of being goal-directed thus avoiding a large number of potential aborts. In February this over-conservative behavior earns it fewer crashes, but also prevents the MDP from being comparable to the other two planners in completion rates. While there are ways to make the MDP more goal-directed and less conservative, we set this MDP up to be conservative. . . . . . . . . 93 3.9 Figures (a, c) on the left show the Success/Abort/Timeout percent- ages for 300 simulations in the virtual map for (start,goal) pairs whose direction to goal from the start is approximately (a) along that of the median current, (c) opposing the median current direc- tion. Figures (b, d) on the right show the average risk, average time andtheaveragepathlengthforallsuccessfultrialsapproximately(b) along the median current direction, (d) against the median current direction. It is clear from the high-rates of mission aborts that the naive Minimum-Risk planner has very poor performance in the pres- enceofcurrents-itismorelikelytofailthantosucceed. Theaverage risk is much lower primarily because it was successful only at shorter paths(whichaccumulatelowerrisk). Whentravelingtowardthegoal inadirectionalignedwiththecurrents, theMinimum-ExpectedRisk planner is certainly the best choice. . . . . . . . . . . . . . . . . . . 99 3.10 One of the two Slocum gliders used in the field experiments (viewed from just below the water surface) while it is executing a dive. No- tice the absence of any thrusters for propulsion - a feature which gives the Slocum glider mission longevity at the expense of speed. The nominal speed attained by a glider is heavily influenced on its ballasting as well as the dive/climb angles used. . . . . . . . . . . . 102 xvi 3.11 An example of a forward simulation showing how the glider is ex- pected to behave over the next few steps of the planner output. The location of the glider when it surfaces is taken as the start point of the simulation, and 10 simulations are performed using ROMS data predictions (perturbed by a small amount of noise to simulate vari- ability). In general, if there is high variability in the ocean currents, there is high variability in the simulated paths as well because the glider experiences significantly faster or slower currents which affect its trajectory. Here the currents have very little variability and their magnitudes are much smaller than those of the glider resulting in all 10 hypotheses being fairly consistent with each other. Also note that the currents change with time and the current vectors depicted in this figure are those during the final portion of the last simulation. In situations where there can be high variability in the currents, we would see a much higher spread in the predicted trajectories. If any of these predicted trajectories ends up running into land, the oper- ator would become aware that there is risk associated with such a path. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 105 3.12 Gliders start in south-eastern portion of the graph (at 5:30 am on Jul 28, 2012) making their way toward the goal waypoint to the west. The currents during this period were generally aiding the motion of both the gliders. Both planners had difficulty crossing the first set of shipping lanes near the eastern virtual island. Both planners chose to cross the shipping lanes connecting the western virtual island with the larger land body. Both the gliders arrived at the goal by 6:00 pm on Jul 29, 2012. While both planners ended up having a few surfacinglocationswhichwereinshippinglanes, thechosenwaypoint (as seen in simulation) is stochastically the lower risk choice. There was very little qualitative difference in the type of paths executed by both planners, with both having approximately the same amount of cumulative risk of collisions at surface. . . . . . . . . . . . . . . . . 106 xvii 3.1 View of the region monitored by the Center for Integrated Net- worked Aquatic Platform Systems (CINAPS) coastal observatory in the Southern California Bight with the Los Angeles and Orange Counties to the north and east respectively, and St. Catalina Island to the south. The ports of Los Angeles and Long beach are among the busiest ports in the world. The solid yellow lines are the offi- cial shipping lanes for inbound and outbound shipping traffic. The gray traces are traces of ships which traversed this region during 2010. Also visible are the surfacing locations of two gliders in this region during 2009 and 2010. Notice that several surfacings fall in the shipping lanes, where gliders could be at high collision risk. . . 113 3.2 The figure above shows an overlay of the estimated path and dead- reckoned path from a field experiment conducted between July 16-18 2011, where the glider was making its way from the start, (waypoint 1) to the goal (waypoint 5). Also shown is an overlay of a risk map which shows actual tracks followed by ships between January and April 2010. The glider aimed for the locations chosen by the Minimum-risk planner (which ignores currents) but in every case missed waypoints by a large margin due to the strong currents on thosedays. Errorsindead-reckoningduetocurrentsresultedinthree glider surfacings within shipping lanes. . . . . . . . . . . . . . . . 114 3.5 Histograms of current magnitudes for (a) January 1, 2011 and (b) February 1, 2011. Here we histogram the depth averaged current magnitudes at each grid point between into 50 bins between 0 and 1m/s. January 1 is a day with a significant number of current mag- nitudes are larger than the nominal velocity of the glider. Note that the mean current magnitude in January is almost the same as the nominal speed of the glider. Current velocities of up to 0.8 m/s exist in some parts of the region which corresponds to a speed that is more than 3 times the glider speed. This makes January a very perilous month since the glider is at the mercy of strong ocean currents in much of the planning region. On a day like February 1 on the con- trary, most of the currents are slower than the velocity of the glider. The results in section 3.5 (Figures 3.6 and 3.7) also suggest that this is a very important consideration. . . . . . . . . . . . . . . . . . . . 115 xviii 3.8 Virtual test environment in the Southern Californian region. The outer rectangle has dimensions 53 km x 39 km. The farthest nodes along a straight line east-west axis are separated by 45 km, while the farthest nodes along the north-south axis are separated by a dis- tance of 15.6 km. Also visible are triangles indicating the nodes in our planning graph. The darker patches within the rectangle form an archipelago of virtual islands, which create a challenging plan- ning scenario with narrow corridors. We have also created virtual shipping-lanes (lighter gray lines connecting islands to larger land bodies) similar to those in the real world map. . . . . . . . . . . . . 116 3.13 Gliders start in western portion of the graph (at 23:30 am on Jul 30, 2012) making their way toward the goal waypoint to the east. The currents during this period were generally against the motion of both the gliders. Both the gliders arrived at the goal by 8:00 pm on August 1, 2012. In this run, the MER goes directly east toward the goal, while the MDP starts off going west after which it goes around the western virtual island from the north, while the MER planner goes around it from the south. The length of both paths is almost the same. The MDP came dangerously close to the eastern virtual island, but was still approximately 500 m away. . . . . . . . . . . . 117 4.1 (a) Slocum gliders are AUVs which are valuable assets in coastal ob- servatories. When at the surface, an inflatable air bladder helps raise the antennae in the tail approximately 30 cm above the water sur- face allowing the robot to communicate via Iridium or radio modem and receive GPS updates. A glider at the surface is at risk of colli- sion with boats and ships due to its small profile. (b) A virtual risk map built for the Southern California coastal ocean that shows the region we use in this thesis to evaluate our planners. This virtual map is intentionally designed to be more challenging than typical AUV coastal missions, while also having some safeguards built in to prevent accidental collisions due to operating the glider too close to land or regions with high shipping and boating traffic. Notice how we pretend that the shipping lanes from Figure 4.2 (b) are land bod- ies to the north and east of the virtual map. Three (virtual) islands andartificialshippinglanesareaddedtomaketheenvironmentmore challenging. We also pretend that the boundary of Santa Catalina island (off whose coast the test region is located) extends further out to prevent the glider from erroneously coming too close to land during field tests. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 119 xix 4.2 (a) Ocean currents in the Southern California Bight region predicted byRegionalOceanModelingSystems(ROMS)[1]. Weproposedata- driven methods to develop confidence estimates on these predictions, and we integrate the resulting estimates into a probabilistic planner to improve the safety of autonomous underwater vehicle operation. (b) A collision risk map built using automatic identification system (AIS) data for shipping traffic (indicated by the warmer lines), and thresholding on bathymetry data for identifying land (red color) and navigation hazards indicated by warmer colors. We threshold the occupancy of pixels (of resolution 250 x 250 m) at the maximum value of occupancy in the regions with shipping traffic. The traffic going between the mainland and the island to the south consists mainly of diurnal tourist traffic to and from St. Catalina island. The shipping traffic along the primary shipping lanes near the mainland which enter/leave the ports of Los Angeles and Long beach do not show much variation between day and night. . . . . . . . . . . . . . 121 4.3 We begin with historical ROMs data and the current predictions for the day we are interested in. We use the methods described in to estimate the interpolation variance associated with the current pre- dictions for the day indexed by location and time. These variance estimates are used to create transition models which can be station- ary (applicable over 6, 12 or 24 hour periods), or non-stationary (models varying with time i.e. every hour). The transition models are used to solve the MDP using classical methods like value itera- tion. The risk map is used to provide costs for the reward function of the MDP. Once solved, we obtain a policy which provides the action the agent must execute to get the maximum utility given the state the agent is currently in. (b) . . . . . . . . . . . . . . . . . . . . . . 124 4.4 Comparison of the Gaussian Process variance and the interpolation variance measures of uncertainty for the prediction of ocean currents in the Southern California Bight on August 7, 2012. The interpo- lation variance accounts for variability in the ocean currents when making its prediction of uncertainty, and it shows a stronger corre- lation with the true prediction error. . . . . . . . . . . . . . . . . . 135 xx 4.5 Basic flowchart explaining how simulations of the glider are per- formed for both the stationary and non-stationary planners. The pre-computed policy for the non-stationary MDP is sensitive to the time, whilethestationaryMDPpolicychoosesafinite-horizonpolicy which corresponds to the 6, 12 or 24 hour period for which the sta- tionary policy has been created. In the simulations reported in this chapterweusepolicieswhicharevalidfor12hourperiods. Whenthe glider is at sea during field experiments the same procedure holds; we use the latest pre-computed policy and lookup the best action based on the state consisting of the latitude, longitude (and in the case of non-stationary planners, time). This action is translated to a glider mission file which is uploaded to and executed by the glider. 141 4.6 Shows a total of 2160 simulations conducted in two regimes currents slower than speed of glider (Aug 4 to Aug 11, 2012), and when cur- rents often faster than speed of glider (Aug 17 to Aug 24, 2012) (a) In 270 simulations each where the glider had to travel generally with the direction of the slower currents (when going from the start to the goal), there was a significant reduction in the number of collisions when the GPMDP was used, as compared to the Switching MDP (p- value=0.01858). (b) When going against the currents there was no improvement in the success rate for the GPMDP over the switching MDP (p-value=0.5). The improved current estimates appear to help when going with the currents, but do not appear to provide any ad- vantagewhengoingagainstthecurrentsifthespeedofthecurrentsis slow in comparison to the speed of the glider. (c) In 270 simulations each where the glider had to travel generally with the direction of the fastercurrents, therewasasignificantreductionincollisionsbyusing the GPMDP over the switching MDP (p-value=0.00651). (d) When going against the faster currents we notice a significant reduction of crashes as compared to the successes by using the GPMDP over the switching MDP (p-value=0.02802). This indicates that using a more sophisticated technique such as GPMDPs produces an improvement when the currents are faster. Faster currents are also responsible for more policies resulting in timeouts - an MDP policy which finds that the currents are too strong for the glider to reach the goal is likely to find a policy that keeps the glider hovering between two or more low-risk waypoints. . . . . . . . . . . . . . . . . . . . . . . . . . . . 146 xxi 4.7 (a) Simulation results from 270 runs when the direction to goal is generally with the currents. The NS-GPMDP has fewer crashes than the GPMDP while timing out significantly more times. This reduc- tion in collisions through the use of the Non-stationary GP-MDP is not statistically significant (p-value=0.59) (b) Simulation results from 270 runs each when going against fast currents. Here, we no- tice a significant improvement in terms of reducing collisions with land through the use of the non-stationary GPMDP (p-value≈0). The non-stationary GP-MDP is an improvement over the stationary planners especially when the currents are fast. . . . . . . . . . . . . 148 4.8 (a) Simulation results from 405 runs for each planner when the goal is aligned with the goal. We notice that there is a significant increase in successes for the Omniscient MDP with a reduction in crashes (p- value≈0). This indicates that the knowledge of currents further out into the future is highly beneficial when going toward the goal with the currents. (b) Simulation results from 405 runs each when going againstthecurrents. Here, wenoticeanimprovementinsuccessrates for the Omniscient MDP, but it has more collisions (p-value=0.7697 which is not statistically significant). This indicates that risk of col- lision does not necessarily go down with additional knowledge about the uncertainty in current forecasts when going against currents. Overall however, we note that having knowledge of future currents does help the omniscient MDP avoid timing out, consequently in- creasing its success rates when going against the currents. . . . . . . 150 4.9 (a) Simulation results from 270 runs each in relatively fast currents when the goal is aligned with the goal. While the omniscient MDP appears to help increase the number of successes and reduce the number of crashes this is not very significant (p-value=0.1312). This indicates that the knowledge of currents further out into the future is beneficial when going toward the goal with the currents, but is less important than it is for slower currents. (b) Simulation results for 270 simulations each for all 4 planners in this chapter, when going against relatively fast currents. Here we notice that there is no improvement through having more knowledge about the currents - the omniscient MDP also times out the same amount as the non- stationary MDP. (c) Statistics of the planners when going with fast currents. (d)Statisticsoftheplannerswhengoingagainstfastcurrents.166 xxii 4.10 The figure above shows the execution of the stationary GP-MDP being executed by the Slocum glider. The experiment began at 19:10 PST on August 17, at the location denoted by the blue triangle. The glider arrived at the goal denoted by the black square at 3:21 am PST on August 19. The path being executed by the GP-MDP was generally with the direction of the currents. . . . . . . . . . . . . . 167 4.11 The figure above shows the execution of the Non-stationary GP- MDP being executed by the Slocum glider. The experiment began at 19:10 PST on August 13, 2013 with the glider starting at the location denoted by the blue triangle. The glider followed the path showninred, surfacingatthelocationsindicatedbythereddiamond markers, and finally arrived at the goal around 16:20 PST on August 16, 2013. Notice how the non-stationary GP-MDP had practically no oscillations (since the goal was reachable within 48 hours when going with the currents). We also note that the planner had to cross one set of shipping lanes but chose to cross the shipping lane to the east that required a longer path around the eastern and later around the central virtual islands to get to the goal. . . . . . . . . . . . . . 168 4.12 Screen-shots from the Glider path planning library during the field trial on August 15, 2013. (a) Non-stationary GPMDP policy com- puted when the glider surfaced at the location just south-east of the central island. Notice how all the arrows at this instant of time are goaldirected. (b)Forwardsimulationswithvaryingamountsofnoise in the currents indicate that the glider is likely to successfully exe- cute the computed policy from the surfacing location it is currently at. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 169 xxiii 4.13 (a) The figure above shows the execution of the Non-stationary GP- MDP being executed by the Slocum glider. The experiment began at 18:00 PST on August 16, at the location denoted by the blue triangle. The glider would have to travel against the currents and ended up executing policies without making any progress toward the goal. (b) Closer inspection of the policies generated corresponding to the time of the experiment revealed that none of the locations in the red region had goal-directed actions. The actions took the glider to low-risk regions of the map and caused it to oscillate there. These policies result in the NS-GP-MDP timing out quite often in practice due to a lack of information about the ocean currents in the future. This is a practical limitation imposed upon the planner due to the limited time-horizon for which ocean current predictions are available. We were able to get the glider to proceed toward the goal subsequently by manually directing it out of the red region, and resuming the mission using the latest computed non-stationary policy. This second run of the planner against the currents was also successful. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 170 B.1 The two gliders used in the work described here, waiting for deploy- ment at the USC Wrigley Institute for Marine Sciences, California, USA. We deploy the gliders with stickers indicating that the glider is a scientific instrument which should not be recovered by other people. We also provide contact phone numbers for people to re- port/enquire about gliders they may come across. Needless to say, there have been numerous occasions when we have been contacted by people who found the glider (as well as had our gliders recovered by fisherman/sailors, when we would have preferred that they were collecting data instead). We have been fortunate not to have lost either of these gliders in 5 years of deployments in fairly busy parts of Southern California. . . . . . . . . . . . . . . . . . . . . . . . . . 181 xxiv B.2 Parts that make up an electric Slocum glider. The glider consists of 5 parts: the front consists of the nose section, which houses the bellofram (used to change the volume of the glider for buoyancy control), as well as an altimeter sensor (which uses sonar to avoid hitting the bottom during dives). The forward section contains bat- teries mounted on a motorized-tray that slides back and forth (which cause the glider to pitch up and down respectively when underwa- ter). The middle section of the glider (called the science bay) holds electronics and sensors related to oceanographic sensing such as sen- sors for conductivity, temperature and density (CTD), fluorometers, spectrometers and so on. The aft section contains another large immovable battery pack (located only at the bottom for added sta- bility), as well as the bulk of the electronics required to navigate, control, communicate and sense the attitude and position of the ve- hicle. The tail cone contains an inflatable air-bag and a jettison weight (ejected for safety reasons if the glider is trapped underwa- ter). The airbag inflated when the glider is at the surface to raise the tail (containing the rudder used for heading control), so that the glider can communicate or get a GPS position using the antennae in the tail. The glider also has passive wings on either side providing improved gliding performance during dive and climb maneuvers. . . 183 B.3 Glider technicians Carl Oberg and Matthew Ragan work on a glider during a battery change. Notice how the glider comes apart longitu- dinallyintothevarioussectionsdescribedearlier. Carlisalsowriting down the difference in mass between the previous battery pack and the new one which he uses to guide the ballasting procedure for the glider whereby the glider’s mass is adjusted to make it (nominally) neutrally buoyant in sea water. . . . . . . . . . . . . . . . . . . . . 185 B.4 Glider technician Carl Oberg, in the ballasting tank with a glider (being fork-lifted into the tank), at the Southern California Marine Institute, San Pedro CA. . . . . . . . . . . . . . . . . . . . . . . . . 186 xxv B.5 GliderpilotArvindPereira, taskstheglideronamissionviaasmart- phone (Apple iPhone) in september 2009. The iPhone transmit- ted commands to the glider via a 3G connection through a shore- based Freewave base-station which in turn communicated with the glider through a Gumstix computer which resides in the glider. The software (and hardware modifications) on the glider and base sta- tion, were jointly developed by Arvind and Hordur Heidarsson. This serves as a demonstration of how improvements in mobile commu- nication devices can simplify tasks such as AUV operation at sea (particularly in coastal scenarios) and is the first time glider AUVs were operated using only a cell-phone (to the best of our knowledge). 188 B.6 A 20 hour simulation of an ensemble of over 20 gliders, with random start locations (in the south-east) and a single goal location (in the nort-west) using GPPLib’s kinematic simulator. GPPLib’s collision detection system detects collisions between the gliders and Catalina island (the land body to the south). The trajectories that result in collisions are depicted as thick red lines. . . . . . . . . . . . . . . . 191 B.7 A glider retrieval waypoint sent to the MotionX Gps app to help recover the glider. Clearly this screenshot was not created when re- trieving the glider (we do not recover gliders at night due to visibility constraints). ApplicationssuchasMotionXGpsmakeitfindingglid- ers very convenient since it obviates the need to manually enter the GPS coordinates into a GPS, allowing the boat operator freedom to steer the boat along the course provided by MotionX. The app takes advantage of the built-in Compass and GPS sensors on the phone to provide a range, bearing and expected time of arrival to the glider location (based on the location and speed of the boat). . . . . . . . 194 C.1 ABlockdiagramdepictingthecommunicationsystemandtheshore- based Glider Mission control server which allows users to plan mis- sions and view the status of the glider through a web-based interface.198 C.2 A deployed Slocum glider. Note the nose-down posture with the tail elevated to facilitate communication. In this picture, the antenna is located approximately 30 cm above the sea surface. . . . . . . . . . 199 C.3 Schematicdepictionofglidertoshoreline-of-sight(LOS)obstruction due to waves. (not to scale) . . . . . . . . . . . . . . . . . . . . . . 200 xxvi C.4 The glider Persistor, additional Gumstix computer which runs our communication protocol stack, and accelerometer as installed on the USC Slocum glider. . . . . . . . . . . . . . . . . . . . . . . . . . . . 203 C.5 Flowchart for a typical glider surfacing with the Gumstix and ac- celerometer additions. . . . . . . . . . . . . . . . . . . . . . . . . . 216 C.6 Carrierdetectexpressedasapercentageoftimewithmodem-modem carrier detect on from the May 2009 measurements. Some glider surfacing events had poor carrier detect in spite of being closer to the base station than other locations. This motivated us to use an accelerometer on the glider to test if the sea state was responsible for the poor communication performance. . . . . . . . . . . . . . . . 216 C.7 Protocol link state expressed as a percentage of time with protocol link on during the May 2009 trials. The link state is a ratio of the total time the glider had communication to the total time spent by theglideratthesurface. Linksclosertothebasestationaregenerally better although there are exceptions. Having carrier detect between modems does not imply a good link state. . . . . . . . . . . . . . . 217 C.8 File transfer speeds in bytes/s from the May 2009 trials. In these trials, we used a standard set of files as a payload to measure file transfer performance after the glider surfaces. No data compression was performed before transmission. For these trials, we used conser- vative estimates on retransmit timeouts. . . . . . . . . . . . . . . . 217 C.9 Protocol link-state expressed as a percentage vs. distance (km) be- tween the glider and the nearest base station. Our communication protocol gives a link-state of >90% up to 12 km with a base station located at a height of 70 m above the sea surface. We notice that there are instances of poor communication at ranges under 10 km. We hypothesize that these are due to surface conditions that affect communication between the glider and the base station. . . . . . . . 218 C.10 Link state obtained from the September 2009 glider deployment. While the link state is generally very good within 8 km, we find several surfacing locations where the link state was poor nearer the base station on Catalina island. . . . . . . . . . . . . . . . . . . . . 219 xxvii C.11 Comparison of H s from the glider accelerometer and NDBC Buoy 96222 located approximately 25 km away. The general trend in the significant wave-height measured at the buoy and the glider is simi- lar, although due to differences in the direction of the sea-waves, sig- nificantdistancebetweentheobservationsatthebuoyandtheglider, as well as the small time-window of accelerometer data-collection on the glider, variations may be expected. . . . . . . . . . . . . . . . . 220 C.12 Scatter plot showing the average correlated byte length and the dis- tance (m) between glider surfacing location and the base station. A cubic data fit is shown to highlight the trend followed by the data as distance increases. From the plot it is obvious that there is a sharp drop in correlated byte lengths beyond 3 km. There is considerable spread in the correlation scores between 5 km and 11 km, which we will investigate further. Beyond 12 km, the average correlated byte length is under 300 bytes. . . . . . . . . . . . . . . . . . . . . . . . 220 C.13 Scatter plot showing the average time occupied by dropped bytes in the received correlation sequence. This time provides us with a figure that includes delays due to transmission errors, modem- modem retransmissions, intermittent losses in carrier detect, line-of- sight between modems and so on. We see an increasing trend (as expected), with a large spread as glider-surfacing locations beyond 8km become progressively more distant from the base station. . . . 221 C.14 Scatter plot showing the maximum throughput between the glider and the base station from the received correlation sequence. Each data point is based on the achieved throughput at the base sta- tion based on the ratio of the total data successfully transmitted in the given time. This is the maximum throughput we can expect to achieve, since the modems cannot be made to transmit any faster. In practice, we have lower throughput because we need to retransmit data packets that have errors in them, due to which the rate goes down. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 223 xxviii Abstract Path planning is the process of generating an optimal sequence of waypoints from a start configuration to a desired goal configuration under constraints (e.g., avoiding obstacles, respecting time/energy budgets). In this thesis, we study the problem of risk-aware planning. Specifically, we design, develop, and experimentally validate optimal paths for Autonomous Underwater Vehicles (AUVs) in the open ocean in the presence of navigational hazards such as ships and other obstacles. A novel aspect of this work is the introduction of ocean current predictions to optimize planning in such settings. This is challenging because current predictions are typ- ically available at non-uniform spatial resolution, noisy, and time-delayed. We designed three risk-aware planners that reason probabilistically about the uncer- tainty in ocean currents predictions. The minimum expected risk planner ensures that the AUV always reaches the goal, while minimizing risk along the way, Risk- aware Markov Decision Process-based planning uses stationary models over a short horizon, and trades off between goal-directed behavior and reducing risk. This is susceptible to finding sub-optimal policies due to stationarity. The non-stationary, risk-aware MDP makes use of variability in the currents where possible to overcome xxix high-risk sections of paths on the way to the goal. In addition to these planners, we develop a taxonomy for risk-aware planning in dynamic settings. Another key contribution is learning the uncertainty in currents to improve planning. Results from extensive simulations clearly show that learning uncertainty helps significantly improve performance of risk-aware planners in uncertain currents, allowing AUVs to be operated in more challenging scenarios than was previously possible. Finally, the planners described in this dissertation have been field tested at unprecedented levels to validate their practical utility (∼2000 hours of testing at sea). xxx Chapter 1 Introduction This thesis studies the problem of planning risk-aware paths for slow-moving Au- tonomous Underwater Vehicles (AUVs). Slow-moving AUVs such as Slocum gliders lack maneuverability (which is sacrificed for longer endurance) since they are buoy- ancy driven and lack thrusters for propulsion. Moreover, gliders do not have the ability to sense the speed of the ocean currents through which they glide, making it difficult for them to estimate their true location while underwater. These vehicles require to surface in order to localize themselves using GPS as well as to transmit data back to shore via satellite phones. Surfacing exposes the AUV to the risk of collision with ships particularly when the AUV is operated in coastal regions. A glider’s limited ability to localize itself using dead-reckoning and the lack of sensing to accurately estimate the vehicle’s position make this a challenging problem, which requires planning. 1 Planning(particularlyinthecontextofautonomousrobots)istheprocessoffinding a sequence of waypoints which take the robot from a given start location to a given goal location subject to some constraint (eg. avoiding obstacles or sticking to a time/energy budget). Optimal planning involves finding the optimal sequence of waypoints which minimize some objective function, which in the risk-aware sense could be to minimize the risk of collision with obstacles. In the work described in this thesis, we create a risk map using historical Automatic Identification System (AIS) from ships for the Southern California Bight (SCB). We proceed to apply proven techniques from the Artificial Intelligence community to plan paths which minimize the surfacing risk deterministically. This is described in chapter 2, where we define the problem of finding a minimum risk path going from the start to the goal. Planning without considering ocean currents drastically affects the outcome of plan execution. We begin considering the use of ocean current predictions to help planpathswhichminimizerisklaterinthelatterpartofchapter2. Time-Expanded planning allows for paths which minimize risk by using the currents while pseudo- waypoints can help a glider surface closer to the desired location given knowledge about how currents affect the flight of the glider. If predictive ocean models were accurate, their use in risk-aware planning deter- ministically would be sufficient. Unfortunately, predictive ocean models are coarse and inherently noisy. Uncertainty estimates characterizing the noise in these es- timates are usually not available as part of the predictions. This uncertainty in 2 predictions has strong implications on our ability to predict where the glider will surface while navigating through these currents. The uncertainty in the surfacing location necessitates planning under uncertainty which is the focus of chapter 3. Here, we develop planners which reason about the uncertainty probabilistically. In chapter 3 we introduce two algorithms (1) Minimum Expected Risk (MER) planner and (2) risk-aware Markov Decision Process. We evaluate these planners in simulation and present results showing improvements from employing our tech- niques. We also describe field tests performed using these planners. The MER planner minimizes the expected risk of surfacing at high-risk locations while head- ing to the goal. It is ideal for use in low-variability currents and when being goal is a priority. In strong currents, we find that the MDP is better at trading off between being goal-directed and being risk-averse. During the course of simulation studies of the MDP, the noise assumed to be present in the predictions appeared to play an important role in resulting performance. This observation led to the thesis statement - probabilistic planning algorithms aided by learning uncertainty in noisy predictive ocean models, significantly reduce collision risk, for slow-moving autonomous underwater vehicles. Chapter 4 focuses the problem of learning uncertainty in ocean current predictions, with the aim of using this knowledge to improve risk-aware planning. We use the Gaussian Process framework to learn the uncertainty in ocean current predictions. GPs provide us with a principled way to build transition models. We develop 3 two types of MDP planners, one suited for low-variability currents (stationary GP- MDP) and the other suited for high-variability currents (non-stationary GP-MDP). Extensive simulation studies indicate that probabilistic planning algorithms when aided by learning uncertainty in noisy predictive ocean models do significantly reduce collision risk for slow-moving autonomous underwater vehicles. Our results also clearly bring out the regimes when each planner is most suitable and show that learning uncertainty is very useful. Comparison of these practical planners in simulation (which have a planning horizon of <72 hours) with an Omniscient planner provides a key insight: a non-stationary GPMDP planner is useful only in fast currents but is a poor choice in slower currents (unless large prediction horizons are available). The planners designed and developed in this thesis have undergone approximately 2000 hours of field tests on Slocum gliders. Given the limited computation available on gliders, and the large size of ocean prediction datasets most of the planning was performed off board on shore-side computers. When taken together the planners in this thesis provide a comprehensive treatment of the learning, planning and implementation challenges of fielding autonomous underwater vehicles in uncertain ocean currents. 4 Chapter 2 Deterministic Minimum risk planning 2.1 Introduction Autonomous Underwater Vehicles (AUVs) and slow moving Autonomous Underwa- ter Glider Vehicles (AUGVs), provide sampling platforms which have long ranges during deployments while sacrificing speed to gain mission longevity. Many sci- entific applications involve slow moving phenomena for which gliders make ideal sampling platforms. Since they are cheaper to operate in comparison to a ship, deploying gliders or AUVs in large numbers for a long duration of time makes it possible to improve the sampling capabilities in a given region significantly, thus overcoming many drawbacks inherent to such slow platforms. Slocum gliders, [2] are efficient AUVs with an endurance of approximately 1 month. They dead-reckon between waypoints, correcting for observed currents as they ex- ecute a mission to improve upon navigational accuracy in a future section of the 5 Glider Surfacing Locations Unofficial Shipping Lanes 20 km Figure 2.1: An overlay of the AIS data from Jan 2010 to April 2010 along with all glider surfacings from 2 gliders color coded differently for identification, during the deployments from 2009 and 2010. Also shown are planned and currently opera- tional base station sites for shore-based communication with gliders are also shown. PointFerminandCatalinaIslandhaveHF-radarsiteswhichprovidesurfacecurrent measurements mission. Gliders accumulate error in their position estimates over time making it advantageous for them to surface regularly to get a GPS fix beside transmitting collected information. At the surface however, the glider is exposed to risks of collision with larger marine vessels such as ships, especially in a region with heavy shipping traffic such as that near the ports of Los Angeles and Long beach. Hence it is important to be able to plan paths for the gliders consisting of waypoints that make it surface in safe locations while traversing between the desired locations. 6 Automated Identification System (AIS) is a system for automatically identifying and locating vehicles by electronically exchanging messages with other ships as well as Vessel Traffic Services (VTS) stations. The information provided includes an unique identification for each AIS transceiver, a position, course, speed etc. The International Maritime Organization’s (IMO) International Convention for the Safety of Life at Sea (SOLAS) requires AIS equipment to be fitted aboard all international ships with a gross tonnage (GT) of 300 tons or more, as well as all passenger ships regardless of tonnage. AIS is useful in collision avoidance, given the position of larger vessels, their speed and course assists in steering the vessels safely amidst traffic so that the vessels may plan safe paths accordingly. This is usually in addition to the ship radar. Local VTS stations utilize AIS information to manage ship traffic. AIS can provide them with the kinds of ships in their local area and their movement. Figure 2.1 is an overlay of the AIS data in the Southern California Bight within which we have operated two gliders (color coded so it is easier to differentiate between the two vehicles). We see the surfacing locations for the gliders during the firstfourmonthsof2010, whichisalsothedurationforwhichwehavehistoricalAIS data for the region. Various operational and planned base-stations for the region can be used to allow faster data communication between the glider and a base station in range. Figure 2.1 shows an overlay of the AIS data from 2010, for the entire Southern California Bight region. Regions with higher chances of collision 7 are color coded lighter in the occupancy grid. The twin shipping lanes along the coast leading into and leaving the ports of Los Angeles and Long beach are clearly visible. There is also a significant amount of traffic between the mainland and the south-eastern tip of Catalina Island which shows up as another fairly clear line. The officially specified shipping lanes which are along the coast, line up very well with the AIS data along the coast. 2.1.1 Related Work The Rutgers University Coastal Ocean Observation Laboratory, [3] describes how a large-scale coastal ocean observatory consisting of ships, gliders, moored sensors and HF-radars can provide sustained coastal observations that give a much clearer picture about coastal ocean phenomenae. Gliders due to their long endurance allow for sustained oceanographic sampling, with a servicing time which can be as short as a day. This gives unprecedented sampling capabilities to oceanographers and marine biologists. CINAPS at USC [4] also has a coastal observatory consisting of Slocum gliders, robotic boats, networked sensor buoys, HF-Radar and communica- tion sites, which are operated together to study scientific problems such as Harmful Algal Blooms. Path planning for Autonomous Underwater Vehicles is a well-studied problem. Most of the challenges in path-planning are due to the ocean being a very dy- namic environment with currents that typically have different velocities at different 8 depths. Typically path-planning for most oceanographic applications involves plan- ning lawn-mower patterns of motion that allow the robot to explore some portion of the ocean. In more complex regions where there are obstacles due to the water be- ing shallow or due to the presence of shore features, planning paths that do not run into obstacles becomes imperative. [5] describes a path planner using bathymetry, exclusion zones and ocean current databases to generate path corridors along great circle routes using A ∗ . This algorithm finds path corridors that satisfy path con- straints such as minimum depth, desired depth, maximum depth, width and fuel resources. In [6], Petres et al, introduce an algorithm called Fast Marching (FM) and extensions to it to deal with underwater currents. In this algorithm, they can specify the curvature of the final path which allows the turning radius of the vehicle to be used. While this is useful in lower-level planning involving control, the spatial scale at which we plan for gliders is often too large to allow us to meaningfully plan paths at such a low-level. Our planner plans paths consisting only of waypoints while allowing the vehicle’s lower-level controllers to aim for these locations as best they can. Fighting ocean currents can be an energy-expensive task. In some cases such as estuarine environments, [7] or in areas with time-varying ocean currents [8], algo- rithms that exploit the current to conserve energy and find paths that can provide significant gains in speed when making use of currents that help them move faster. 9 These methods typically require a fairly well understood model for the flow of cur- rents in order to exploit them, and may produce arbitrarily poor results if the actual conditions are very different from the values predicted by the models em- ployed. Gliders are slow-moving vehicles and are affected by currents even more than AUVs which are usually capable of higher speed in water since they have their own motorized propulsion. Most ocean models are not accurate enough to predict currents several days into the future, and therefore pre-planning long range paths based upon currents might not produce very favorable results. These techniques might also be difficult to use if it is important to visit locations in some order which contradicts the efficient path. Informative path planning [9], can drive planning paths that are driven by information gain in an attempt to reduce the uncertainty of a given region. Das et al [10], describe another form of path-planning which deals with planning in the Lagrangian frame, a method which allows for simultaneous tracking and sampling using Lagrangian drifters to track moving features in the ocean such as plumes. Hazard avoidance map building, both offline and online especially for UGVs [11] has beenstudiedandiswelldocumentedintheliterature.[12]discussestheconstruction of risk-maps based upon ground orography, which is then used to plan minimum- riskpathsforUAVsusingA ∗ andageneticalgorithm. Theyalsoperformsmoothing on their paths so they can be used for trajectory control on an UAV. 10 Interestingly, around the time we began studying this problem, Merckelbach (inde- pendently of us) was also working on developing hazard avoidance maps using AIS data (as evidenced by his paper, [13]). This work delves into a theoretical analysis for computation of collision risk between gliders at the surface, and ships traveling in the same region so as to compute probabilities that the glider will survive a mission (between two waypoints). Merckelbach showed that a simple probabilistic model that assumes the risk of collision is proportional to the ship density in a re- gion (and a few other factors which can be factored in), yielded comparable results to expected collision risk from a realistic (computationally expensive) Monte-Carlo simulation of over 10000 gliders traversing a transect in shipping traffic. His work did not plan paths that minimized risk. In this paper we describe an optimal path planning algorithm to find minimum risk-paths from a start to a goal location, under a constraint on the maximum distance between consecutive surfacings, while implicitly handling uncertainty in dead-reckoning. Parts of the work in this chapter, were first published in a confer- ence paper [14] and a technical report [15]. 11 2.2 Problem Definition Our goal is to find the minimum risk pathP ∗ for travelling between the given start and goal waypoints, P ∗ = argmin P X i risk(w i ) (2.1) where w i is a waypoint along the path P ={w 1 ,w 2 ,...}, under the constraint that e(w i ,w i−1 )≤d max (2.2) where i≥ 2, assuming the starting waypoint is w 1 . Here e(w i ,w i−1 ) is the length of an edge between waypoints w i andw i−1 , andd max is the maximum distance the glider is allowed to travel before it needs to transmit its data. For simplicity we assume that the glider requires a constant amount of time at each waypoint to transmit its data. Although the actual motion of the glider is in 3 dimensions, we treat this planning problem as 2 dimensional, because we can plan a deep dive between each waypoint to dive under risky regions as shown in Fig. 2.4. 2.3 Proposed Solution To find the minimum risk path, we consider the risk map created as described in section 3.4. 12 Figure 2.2: Obstacles such as the bathymetry can also be combined into the same risk map with a very high value assigned to the risk of hitting it. The risk-map shown here has also undergone a Gaussian blur to spread the noise in the creation of the risk map, as well as to account for the uncertainties in the glider navigation while executing the mission. Here lower risk is black while increasing risk goes to white through grey. We use A ∗ search [16], [17] to find the minimum risk path to solve the problem specifiedbyequations2.1and2.2. A ∗ usesaheuristicfunctionh(w k )thatminimizes the estimated cost to goal to help improve the cost-function of Uniform-cost search which minimizes the cost of the path so far g(w k ), where w k is the waypoint or node being evaluated during the search for the minimum-cost path. g(w k ) = X i risk(w i ) (2.3) where g(w k ) is the path cost from the start to the last waypoint on the optimal path explored currently in the search. 13 Algorithm 2.1 Minimum-Risk A ∗ 1: Input: start waypoint S, target waypoint T, bathymetric map B, risk map R, minimum risk R min 2: closed←{}, open←{s}, parentNodes←{} 3: g[S]← 0 4: h[S]←b(kT−Sk)c.R min 5: f[S]←g[S] +h[S] 6: while open6={} do 7: x← node in open with minimum f[] value 8: if C =T then 9: P←{T} 10: X←T 11: while parentNode[X]6=S do 12: P.Append(X) 13: X =parentNode[X] 14: end while 15: P.Append(S), P←Reverse(P ) 16: return P 17: end if 18: remove C from open 19: add C to closed 20: for all D in GetNeighbours † (C) do 21: if D in closed or CollisionDetected(D,B) then 22: continue 23: end if 24: g temp ←g[C] +R[D] 25: if C not in open then 26: open.Append(C), g update ←True 27: else if g temp <g[C] then 28: g update ←True 29: else 30: g update ←False 31: end if 32: if g update ←True then 33: parentNode[D]←C, g[D] =g temp 34: h[D]←b(kT−Dk)c.R min 35: f[D]←g[D] +h[D] 36: end if 37: end for 38: end while 39: return "Path not found" 14 C E B D d< d_max region A Figure 2.3: Figure illustrating the GetNeighbours † (C) step in our A ∗ search for the shortest risk path while using the d max constraint. The blue circle indicates a region that encloses all nodes around a node at the frontier of expansion, where we compare the cost of all nodes around it which are under d max distance away. All nodes around C in the open−list have their parent-pointers pointing to C, since it is the minimum f-cost node. In this figure a darker cell refers to one with higher risk, and the lowest-cost path should be one that picks lighter cells in order to achieve the minimum total path risk. At the next stage of the algorithm, if node E has the minimum f-cost, it will be the next node at the search frontier. We use the heuristic function h(w k ) which we set to risk min which is the globally minimum risk value among all cells on the risk map. h(w k ) =N∗risk min (2.4) and N is the number of waypoints to the goal from waypoint w k assuming the glider uses the minimum number of surfacings along the straight line path between w k and the goal waypoint, such that it satisfies the constraint from equation 2.2. Equation 2.4 is an admissible heuristic because it never underestimates the cost to reach the goal. This is because we are using the minimum possible number of 15 surfacings, as well as the minimum risk. Although this heuristic always evaluates to zero if there are any cells of zero risk, it comes into play when we start scaling the minimum risk to avoid longer paths, which we discuss in section 2.3.3. N =b(kw e −w k k)c (2.5) where w e is the goal way-point, andbc is the floor function. The cost-function we use to choose the search expansion node is f(w k ) =g(w k ) +h(w k ) (2.6) GLIDER EXECUTES A STAY-DEEP YO-YO WHEN IN A RISKY AREA REGULAR YO-YO OF GLIDER THROUGH A NON-RISKY AREA RISKY AREA WHERE HIGH SURFACE TRAFFIC IS EXPECTED WAYPOINT K (BEFORE RISKY REGION) WAYPOINT K+1 (AFTER RISKY REGION) REGULAR YO-YO DEEP-DIVE MARGIN RISK-AVOID YO-YO Figure 2.4: An illustration (not to scale) of how a glider plan is modified between a set of way points w k and w k+1 such that the glider stays deeper while traversing the segment through a risky area. For every segment that has a risky area which the glider might pass through, the glider will need to stay deeper. 16 2.3.1 Risk Map Creation using AIS Data We create risk maps which we represent in the form of occupancy grids. In this work, we have used grid-resolutions of 100 m x 100 m and 250 m x 250m for each cell. Planning at a lower resolution helps speed up the search but gives up on resolutioningliderpositionsinthepathbeingplanned. Themajorityofcommercial shipping vessels are under 250 m in length, making this a good cell size to treat simultaneous occupancy by a glider and a shipping vessel as highly undesirable. Each vessel equipped with AIS transmits several messages which contain an unique identifier for the vessel, the vessel’s location, course, speed and heading among many others. We use this information to find which cell is occupied by each unique vessel in a fixed window of time. If the vessel is moving, it may occupy several cells over a few hours, while if it is stationary, it will only occupy a single cell over all those hours. If there is no vessel in a given region, we assume there is only some R min risk in that region. When a vessel emits multiple position messages in an hour, we average all its messages in that time-window and consider the cell in which this average falls to be the one that it has occupied during that hour. risk(x,y) = P O(x,y,t 0 ) M (2.7) 17 where O(x,y,t 0 ) is the occupancy of the cell (x,y) during a block of time t 0 . M is the number of blocks of time for which the occupancy data is being accumulated. As stated earlier, we only consider the locations of each unique vessel in a given block of time. If the vessel emits multiple messages we treat its location to be the average of all positions reported during that time. Although this averaging may not reflect all cells that a particular ship was in during a particular hour, we care only that it reflects the overall risk when summed over many ships and a longer period of time. We can treat this occupancy map as a risk map, because a ship and a glider at the surface co-occupying a single grid cell can lead to a high chance of collision between the two, and this map allows us to evaluate the chances of a cell being occupied by a ship in a given amount of time. Hence we can treat this as a probabilistic measure of how risky a given cell is for a glider to surface in. In this work, we use AIS data collected during the first four months of 2010 in the Southern California Bight region to produce a set of occupancy grids at an hourly discretization. We construct the occupancy grid by summing up the occurrences of a ship having been in each cell through the entire period of time we are considering as history of ship traffic for the region. During practical implementation of A ∗ representingtheriskmapasintegersismoreefficientthanconvertingittoafloating point representation. Hence we do not divide the total occupancy map by M and we consequently have path risk values which are integers. 18 2.3.2 Dealing with Dead-reckoning Uncertainty As discussed in [18], since the glider dead-reckons underwater and is a very slow moving vehicle, there is a large degree of uncertainty involved in the glider’s bear- ing, actual velocity achieved in the water (due to the effect of currents), and con- sequently the achieved surfacing location can be substantially different from the planned location. This error usually grows with increasing lengths of the glider paths between waypoints. The distance limit on how far the glider is allowed to travel before it is forced to surface to re-transmit data can also keep the glider surfacing often enough to be fairly accurate. We know through analyzing all glider surfacings in our coastal region that our glider has a standard deviation of approx- imately 1.1 km. We use a two-dimensional Gaussian mask G(x,y) to compute the probability that the glider will surface in a region around the desired location. For any waypointw i underconsiderationweapplythisGaussianmasktoobtaintheriskvalueassociated with it by taking into account the fact that the glider might not surface at the exact location. G(x,y) = 1 2πσ 2 e − (x−¯ x) 2 +(y−¯ y) 2 2σ 2 (2.8) where σ is the standard deviation on the dead-reckoning of the glider expressed in grid-size units. This mask when convolved with the risk-map gives a risk value which is weighted by the probability that the glider might surface at locations 19 around the desired location, thus incorporating the uncertainty associated with surfacing at that waypoint in the mission plan. The size of the kernel function is typically chosen to be the nearest odd integer tod6σe. For a glider with a standard deviation of σ=1100 m, on the discrete grid, at a grid resolution of 250 m, the 6σ kernel operator is of size 27 x 27 elements. We convolve this Gaussian mask with the risk map which helps serve two purposes. Thefirstallowsustoaccountforgliderdead-reckoninguncertainty, whilethesecond helps us spread the risk to neighboring cells so we don’t have noisy low-risk areas in the middle of high-risk cells, which could make the planner erroneously pick one of those locations for surfacing at. If a better motion model were to become available, we can convolve that probability distribution with the risk map accordingly, but this would need to be done at run-time instead of the map pre-processing stage as done in this work. We noticed an improvement of upto 30 times by pre-processing the risk-map with the vehicle’s Gaussian mask relative to evaluating it at run-time. 2.3.3 Dealing with Arbitrarily Long Paths The A ∗ planner we propose finds the minimum risk path between start and goal. Unfortunately, this path could be fairly long and might be infeasible since it might not meet the criteria of energy available for the glider. Hence the user may specify a budget constraint on the length of the path P max . 20 If our planner returns a successful planned path P ∗ , we test if: kP ∗ k≤P max (2.9) This value of P max must be larger than the shortest path between the start and goal locations for the algorithm to have a solution. We modify a parameterα which modulates the risk-map such that we can increase the nominal risk over the entire map. risk 0 (x,y) = (1−α)∗risk(x,y) +α∗R max . (2.10) Ifrisk 0 (x,y)>R max ,risk 0 (x,y) =R max . The purpose ofα is to provide a human- tunable parameter that allows an operator to allow increased risk to obtain shorter paths, based on their own judgement. If this number is 0, the risk at the location will be the same as the original accumulated value. As we increase α toward 1 it becomes almost equally risky to surface everywhere on the map, thus forcing the path to be shorter until we are essentially finding the shortest path between start and goal when there is a constant cost R max to go from one waypoint to another. At this point the planner would select the minimum surfacings waypoints along the shortest path to go from start to goal, thus avoiding longer routes. 21 2.4 MinimumRiskPlanningwithCommunication Utility As mentioned earlier, one of the primary reasons for gliders requiring to surface is communication. Gliders need to transmit collected data from its sensors as well as to periodically receive update mission plans. One scheme of data collection involves the glider transmitting all data collected between the preceding and the current waypoint at that waypoint. This might be useful in adapting mission plans of other AUVs in the area. Most gliders can transmit collected data from any location using a Iridium (or similar) satellite data link. These links while glob- ally available are cost-prohibitive and slow, due to which the use of cheaper and faster communication modes such as line-of-sight (LOS) radio modems can be fa- vored (especially for glider operations in a coastal region such as the SCB). In prior work, [19], we described the development of such a network which allowed for higher throughput communication with the gliders. In [20] we conducted field experiments which helped us characterize the communication throughput we can expect when communicating between the base-station and the glider. We can construct a com- munication map such as that shown in Fig. 2.5, where the shaded regions around each base-station represent the communication throughput available at those loca- tions. Gliders surfacing within these regions serviced by a base-station can have 22 access to much faster data throughput as opposed to Iridium which is available ubiquitously. Given a risk map and a communication throughput map for the region, it is worth- while considering the problem of planning safe paths from a start to a goal location, while taking advantage of the higher throughput at certain surfacing locations if the glider has to communicate at every waypoint. A common way to handle this is to create a multi-attribute cost function such as equation 2.11 where a,b are weights giving the importance of risk and communication throughput at each way- point along the path. One drawback of such an approach is that quantities such as risk and communication are not directly comparable to each other in a meaningful way. Cost(w k ) = k X i=1 (a∗risk(w i ) + b Thr(w i ) ) (2.11) Itturnsoutthatforthisproblem, theuseofamultiplicativecost-function(equation 2.12)allowsustoconnectriskandcommunicationthroughputsinameaningfulway. This draws from the observation that when surfacing at the same location the risk of collision at the surface increases with the time spent by the glider transmitting data. The data transmission time is in turn inversely proportional to the data throughput available at that location. 23 Cost(w k ) = k X i=1 D(w i−1 ,w i )∗risk(w i ) Thr(w i ) ) (2.12) where D(w i−1 ,w i ) is the amount of data to be transmitted at waypoint w i which depends upon the time taken to travel between waypoints w i−1 and w i . The mod- ified minimum-risk problem is to find the minimum risk path P ∗ among all paths P∈P going from the start to goal waypoint such that P ∗ = argmin P∈P |P| X i=1 risk(w i ) Thr(w i ) (2.13) under the same constraint of equation 3.2. 2.4.1 Proposed Solution: Minimum Risk Planner using A ∗ WeuseA ∗ tosolvethemodifiedplanningprobleminequation2.13becausealledges and risk costs are positive, and we have found an admissible heuristic. We build a search graph G(V,E), adding nodes starting from the source w s , and iteratively adding all outgoing edges from this node until we add all the nodes in our region of interest while also updating the cost of edge-traversal C(w i ,w i−1 ). We define the cost of traversing an outgoing edge from node w i−1 to some node w i as C(w i−1 ,w i ) = D(w i−1 ,w i ) Thr(w i ) ∗Risk(w i ) (2.14) 24 Los Angeles County Orange County St. Catalina Island Malibu Figure 2.5: A map showing the proposed and functional base-stations in the SCB region. Low bandwidth Iridium communication is available throughout the whole region, whereas Line-Of-Sight (LOS) communication with the base-stations is only available within the shaded regions around each of the red diamond shaped base- station locations. We create this throughput map based upon experimental data as well as LOS calculations from an elevation map. A better communication model for throughput if available, could use be substituted instead. HereD(w i ,w i−1 ) is the amount of data the glider needs to transmit at waypointw i given it travelled along the edge e(w i−1 ,w i ). The outgoing edges from any node are all nodes reachable from it which satisfy equation 3.2. By doing so, we can now construct a connected directed graph and we want to find the shortest path going from the source w s to the goal node w e through it. When constructing the fully-connected graph, we maintain a list of nodes that have already been added so that each node’s neighbors are added to the graph only once. 25 We notice that since all values of C(w i ,w i−1 ) are non-negative, we can apply Dijk- stra’s shortest path algorithm on this graph to find the shortest path. It is possible to further improve upon the running-time of Dijkstra’s shortest path algorithm by expressing this as an A ∗ search for the shortest path through the same graph by using an admissible heuristic. We describe the cost-functions used to perform A ∗ search and prove that our heuristic is both admissible and consistent. We define the path-cost g(w k ) evaluated at node w k to be: g(w k ) = k X i=1 D(w i ,w i−1 ) Thr(w i ) ∗risk(w i ) (2.15) where the starting node w s is w 0 . Equation2.15statesthatthecostofthepathuntilnodew k isthesumoftheproduct of the risk at each waypoint and the time spent on the surface transmitting data at that location. We define the heuristic for the estimated cost h(w k ) as: h(w k ) = N∗risk min ∗D max Thr max (2.16) where N =floor kw e −w k k d max ! (2.17) 26 First, forh(w k ) to be an admissible heuristic, it needs to underestimate the cost to the goal. 2.4.2 Proof of the Admissibility of the A* heuristic Theorem 1: Heuristich(w k ) always underestimates the cost of the remaining path cost from w k to the goal node w e and is therefore an admissible heuristic. Proof. Let us assume that we are evaluating the minimum amount of data that can be accumulated when going from the current node to the end. This amount will be: D total =α∗ (w e −w k ) (2.18) The minimum total amount of time that the glider can spend at the surface is: t min =D total /Thr max (2.19) We know that risk min is≥ 0, if risk min = 0, the h(w k ) is zero, which is trivially admissible. Let us consider the more interesting case where risk min > 0. The minimum possible total risk between the node w k and the end waypoint w e occurs when we have the minimum number of nodes between the current waypoint and the end, with each having a risk value of risk min . The number of waypoints 27 required is at least N (equation 2.17). Hence we can never have more risk greater than risk min ∗N. Typically the minimum number of waypoints required will be N + 1 when the length of the path remaining is not an exact integral multiple of d max . Assume for now that there exists a path going from w k to w e which has a lower path-cost than our heuristic. To do so, the actual path cost for this path will be: h ∗ (w k ) = e X i=k d(w i ,w i−1 ) Thr(w i ) ∗α∗risk(w i ) (2.20) The minimum possible cost for such a path should have the lowest risk and the maximum throughput, or there could be a better path than it. Hence equation 10 becomes: h ∗ (w k ) =α∗ risk min Thr max ∗ e X i=k+1 d(w i ,w i−1 ) (2.21) The heuristic value we define in equation 2.16 can also be re-written as: h(w k ) =α∗ risk min Thr max ∗N∗d max (2.22) Now, this means that for h∗ (w k ) to be smaller than h(w k ), e X i=k+1 d(w i ,w i−1 )<N∗d max (2.23) But we know from equation 2.17, that N∗d max ≤ (kw e −w k k). 28 Hence it is not possible to find a path going from w k to w e consisting of smaller segments, that is shorter than that used in equation 2.16. This is a contradiction, and henceh(w k ) must always underestimate the distance to the goal, and therefore the heuristic in equation 6 is admissible. 2.4.3 Proof of the Correctness of using A* for planning Minimum Risk paths with edge-dependent, non-negative node weights In our A* formulation, the heuristic is chosen based upon the straight line distance from the node it is being evaluated at to the goal node. The g-cost, h-cost and f-cost for all outgoing links from the lowest-cost node selected from the open list are computed before the node is placed on the closed list. At this point of time, if this node had an incorrect g-cost which was higher than its true value, we will never re-visit it again and consequently end up with a suboptimal path to the goal which may not include the best path to it. We know that the path cost at some node w k is g(w k ) =α∗ k X i=1 risk(w i )∗kw i −w i−1 k∗ 1 Thr(w i ) (2.24) In order to be sure that the node picked based upon the lowest f-cost will already have a g-cost that will not change, we need to ensure that our heuristic function is 29 not only admissible but also that it ensures the monotonicity of the f-cost function. i.e. The heuristic function should never make the f-cost decrease. Lemma: The heuristic function h(w k ) is monotonic. Proof. h(w k ) =α∗ risk min Thr max ∗d max ∗floor kw e −w k k d max ! (2.25) or: h(w k ) =β∗floor( kw e −w k k d max ) (2.26) where β = α∗risk min ∗d max Thr max (2.27) Forh(w k ) to be monotonic, it must obey the triangle inequality (Pearl, 1984)(Rus- sell, 1995). Let us assume that we are considering a path fromA toC which has some heuristic function value h AC while there is another path that to C via B from A which has heuristic values of h AB and h BC for each of the other two legs. We want to prove that h AB +h BC <h AC (2.28) Now, from the definition of our heuristic function, the heuristich AB =h(w b ), while that of h BC =h(w c ). Similarly the heuristic h AC =h(w c ). 30 Let us assume that there exist some values of h AB , h BC and h AC for which this does not hold. Then we must have: h(w b ) +h(w c )<h(w c ) (2.29) Which becomes: $ kw e −w b k d max % + $ kw e −w c k d max % < $ kw e −w c k d max % (2.30) But we know that under no conditions is it possible for the floor function to go less than zero. Hence our heuristic obeys the triangle inequality and from this it follows that it is monotonic. 2.5 Deterministic minimum risk planning in predictive ocean currents Time-expanded shortest path problems such as those encountered in road trans- portation networks can usually be solved efficiently usingA ∗ if we deal with metric graphs (e.g. Chabini et al [21]). In this problem however, we deal with ocean cur- rents which make the edges on the search graph non metric (i.e. they do not satisfy the triangle inequality). The currents can speed up the glider in one direction and 31 slow it down in the other making travel times between any two nodes asymmetric in general. Hence, it is difficult to find good heuristics for non-metric graphs which can be used to make the search more efficient without sacrificing optimality. We begin by describing how we construct a search graph to find the minimum-risk path in the presence of ocean currents and then explain how we can use Dijkstra’s shortest path algorithm [22], to find a minimum-risk path going from the start waypoint to the goal waypoint under the constraints defined earlier. We construct a graph G(V,E) where each vertex v has a location and time of arrival at that location. These vertices make up waypoints in a possible path from the start waypoint to the goal. The edge connecting two vertices v i and v i+1 denotes the time taken to go fromv i tov i+1 while negotiating the field of ocean current between v i and v i+1 . e(v i ,v i+1 ) =t i+1 −t i (2.31) where v i = (x i ,y i ,t i ) and v i+1 = (x i+1 ,y i+1 ,t i+1 ). By construction of the graph, t i+1 >t i . Hence all edges are positive. The value of the risk at each node is also always positive. 2.5.1 Proposed solution: Time-expanded A* planner We want to find the minimum risk path while going from the start to the goal. One way to construct a search graph is to consider all the outgoing edges to all 32 the nodes which are spatially reachable from a node w(x i ,y i ,t i ). The value of each outgoing edge is computed based upon the time of flight for the glider to go from the previous node to the next. We discretize this time to allow for a finite number of time values. If the time of flight to reach to waypoint w i+1 from w i is t flight , where t flight is discretized to the nearest value of time values. If we define our optimal cost function for getting to a node g(w i ) as the lowest value of getting to that node among all possible paths leading to it, then if we have already updated the path leading up tog(w i ) with the optimal path, we set the value of g(w i+1 ) to: g(w i+1 ) = i X k=1 risk(w k )∗t(w k ,w k+1 ) +risk(w i+1 )∗t flight (2.32) Since the edges and the node-weights are always positive, we can use Dijkstra’s shortest path algorithm on this time-expanded network to find the minimum risk path going from the start to the goal. Algorithm 2.2 describes how we use the open and closed lists employed by Dijkstra’s algorithm to find the minimum risk path in our time-expanded network. We calculate pseudo-waypoints which are fake target locations the glider should aim for in order to reach the desired waypoint due to the effect of the currents. Pseudo-waypoints require a simulator that will describe the motion of the glider through an ocean model. We assume linearity in the currents and compute the value of currents at a given location using bilinear interpolation of depth-averaged 33 Algorithm 2.2 Minimum Risk Path Planner using an adaptation of Dijkstra’s Shortest Path formulation 1: Input: start waypointS(x s ,y s ,t s ), target waypointG(x g ,y g ), bathymetric map B, risk maps R k , minimum risk R min 2: closed←{}, open←{s}, parentNodes←{} 3: g[S]← 0 4: f[S]←g[S] 5: while open6={} do 6: C← node in open with minimum g[] value 7: if C =G then 8: P←{T} 9: X←T 10: while parentNode[X]6=S do 11: P.Append(X) 12: X =parentNode[X] 13: end while 14: P.Append(S), P←Reverse(P ) 15: return P 16: end if 17: remove C from open 18: add C to closed 19: for all D in GetNeighbours † (C) do 20: if D in closed or CollisionDetected(D,B) then 21: continue 22: end if 23: g temp ←g[C] +R[D] 24: if C not in open then 25: open.Append(C), g update ←True 26: else if g temp <g[C] then 27: g update ←True 28: else 29: g update ←False 30: end if 31: if g update ←True then 32: parentNode[D]←C, g[D] =g temp 33: end if 34: end for 35: end while 36: return "Path not found" 34 currents. In figure 2.6, we see that when a glider at location A desires to travel to waypoint B, the currents in the region will push the glider off course to another location B’. In order to bring the glider back to the correct location we compute a pseudo-waypoint B” which when aimed for brings the glider to location B. This concept introduced in prior work by Smith et al, provides a quick way of iteratively finding pseudo-waypoints [23]. This method is not guaranteed to converge, but usually returns with locations which under predicted conditions result in a glider surfacing close to that originally desired (in a noiseless deterministic simulation). B' B'' B A currents Figure 2.6: Glider needs to reach B from location A, but will drift due to the currents to location B’. The pseudo-waypoint method iteratively attempts to find a good location B” which the glider should aim for instead of B, so that the currents push the glider to location B. The location B” is called a pseudo-waypoint. To grow the graph at each step, we use Algorithm 2.3. The time of flight for the glider is computed as we describe in section 2.5.2. 2.5.2 Calculation of the Time of Flight We assume that the nominal velocity of the glider remains constant through the execution of the plan. There are several ways to calculate the true time of flight 35 Algorithm 2.3 GetNeighbours(C) 1: Input: node C, current maps CM(k), maximum flight time t max 2: neighborList ={} 3: for all reachable nodes N in neighborhood of C do 4: Tof←GetTimeOfFlight(C,N,CM,C.t) 5: if Tof≤t max and C.t +Tof≤T max then 6: N.t←C.t +Tof 7: neighborList.Append(N) 8: end if 9: end for 10: return neighborList for the vehicle. In order to illustrate two methods for computing the time of flight that we use in this paper, consider Fig.2.7. Here we assume that we have a map of currents (which we refer to as CM in Algorithm 2.3). One way to compute the time of flight between any two waypoints (say A and B from the figure), is to find the resultant velocity along the path between waypoints A and B. This amounts to doing a line integral of the resultant velocities of the glider. For each cell, it is possible to compute the two components of the resultant velocity vector (along the intended direction of travel V r2 and perpendicular to it V r1 ). Smith et al, describe an iterative method of using a simple sinusoidal model for the glider to simulate the expected trajectory of the glider in the presence of currents by using ROMS data [24] . This computation is done in 3D and results in a target waypoint which has deviated from the desired waypoint by some distance. In the subsequent step, this offset is computed and a target is chosen which will remove that offset. Once again the path followed by the glider if it were to aim for this offset is computed. This process is recursively attempted until a solution is found that brings the glider 36 Goal Wpt Wpt due to Currents Start Wpt C A B' B Pseudo Target Wpt E Vc1 Calculation of time of flight and pseudo waypoint E Vr1 V-current A Vr2 V_auv Vc2 B B' E V-res Figure 2.7: Illustration of the effect of currents on the execution of a planned path between the Start waypoint A, and Goal waypoint B. In the presence of a field of currents whose average velocity is represented by the colored vector in each grid shown, the glider will drift along cells such as C, following the path to D instead of going to B. One way to circumvent this is to find a pseudo target waypoint E to aim for which in the presence of currents takes the glider to the goal waypoint B. Another approach to calculating the time-of-flight would be to compute the resultant velocity along the straight line between A and B, for all the cells (shown in blue here), which lie along that line. Knowing the distance between A and B, as well as the resultant velocity it is possible to calculate an approximate time-of-flight between A and B. to the desired target in the presence of currents, or there is no convergence beyond a certain number of repetitions. In the latter case, we assume thatt flight is infinite. If a target waypoint is found, the solution also contains the time of flight to achieve this. Due to the recursive nature of this algorithm, computing the time of flight for every neighborofanodeisexpensiveandcouldhaveahighupperboundonrunningtime. Hence we need to pre-cache the time of flight from every node to its neighbors. As the branching factor for growing the graph increases (which is determined by 37 Algorithm 2.4 GetTimeOfFlight(w a ,w b ,CM,t) 1: Input: Waypoints: w a (x i ,y i ),w b (x i+1 ,y i+1 ), Current Map: CM(U,V ), Time: t i 2: TimeOfFlight = 0, N = 10, R =GetDist(w a ,w b ) 3: δR←R/N, θ = arctan( y i+1 −y i x i+1 −x i ) 4: for all k = 1 to N do 5: x 0 y 0 ! = x i +δR/2∗cos(θ) y i +δR/2∗sin(θ) ! 6: u 0 =GetBilinearU(x 0 ,y 0 ), v 0 =GetBilinearV (x 0 ,y 0 ) 7: v along =u 0 cosθ +v 0 sinθ 8: v perp =v 0 cosθ−u 0 sinθ 9: v r2 =v auv +v along 10: TimeOfFlight =TimeOfFlight + δR v r2 11: end for 12: if TimeOfFlight≤ 0 then 13: return Inf 14: end if 15: return TimeOfFlight t max ), the memory requirements grow exponentially. Furthermore, since we may have several current maps during a single planning session, there could be different solutions for the times of flights for paths at nodes during a switch from one map of currents to the next. A solution to avoid pre-caching is to compute the time of flight each time we add the neighbors of a particular nodeA to the open list in our algorithm. In this work we use a simplification of the algorithm above which requires only the current maps to be stored in memory. Computation of the time of flight can be performed in roughlyO(n) time wheren is the branching factor for each neighbor. Since we choose the neighboring waypoints to be within a spatial radius of the node being considered, this branching factor is constant for a particular instance 38 of planning. Similarly, an approximate target for the glider to aim for to counter the currents and arrive at B instead could be computed by finding the amount of expected deviation that the glider would experience perpendicular err perp to the desired path as well as along it err AB . The target waypoint is chosen such that it compensates for this difference. This gives us a pseudo target waypoint W p (x p ,y p ,t p ) to aim for if the glider should get to B in the presence of currents. 2.6 Results 2.6.1 Results from deterministic planning without current predictions We compare the paths planned by our method with those planned by two other algorithms, which we callShortest−Path−Low−Risk method, while the latter is called Min−Risk−At−Surface−Path. Shortest−Path−Low−Risk first finds the shortest path from start to goal based on 8-connectivity. Then it uses dynamic programming to choose the lowest risk set of waypoints at which to surface along that path while still satisfying the d max constraint. Min−Risk−At−Surface−Path starts by finding the lowest risk path assuming 8-connectivity, and then also uses dynamic programming to choose the lowest risk set of points at which to surface. 39 In Figures 2.8, 2.9 we show these as the blue and green lines respectively. Our path-planner for minimum risk paths with a constraint on the dive length between waypoints to d max has the lowest risk among all three paths. Our algorithm pro- duces the same path as theMin−Risk−At−Surface−Path path whend max is set to a distance equivalent to one cell on the occupancy grid map. This is similar to planning a path that surfaces in every cell between the start and goal location. For values larger than d max however, it always finds lower risk paths. Figure 2.8 shows paths planned by all three algorithms to go from a start to a goal waypoint which we have manually planned for in the past based solely upon information about the official shipping lanes. The manually planned path is very similar to theShortest−Path−Low−Risk path which is 11.8 times riskier than our Min-Risk path (shown as red lines) while being only 1.1 times longer. For this choice of start and goal locations, the Min−Risk−At−Surface path is even riskier overall, being 33.4 times riskier than our Min-Risk path while being 3.23 times longer than that found by the Shortest−Path−Low−Risk algorithm. In figure 2.9, we choose our start and end waypoints such that the shortest path would cross a large number of shipping lanes with high risk. We notice that in this case the shortest path is very risky in comparison to that found by our algorithm - it is approximately 5200 times riskier, while being only 3.3% shorter than our Minimum risk path. Our path is also significantly better than theMin−Risk−At−Surface path, which chooses to avoid most of the shipping lanes by going around Santa 40 Catalina Island. The Min−Risk−At−Surface path is 750 times riskier than our minimum-risk method while being 42.8% longer. We chosed max to a frequently encountered value of 8 km for these plans. Figure 2.8: Here Shortest-Path-Low-Risk algorithm is 11.8 times riskier than the minimum-risk-path chosen by our planner which is optimal w.r.t. risk and a given d max = 8 km. Even though the Min-Risk-At-Surface path’s waypoints are chosen to minimize the total risk of the path at the surface while respecting the d max constraint, it is 33.4 times riskier than the minimum-risk path found by our method and 3.23 times longer than the shortest path algorithm. 41 2.6.2 Resultsfromdeterministicplanningwithcurrentpredictions We compare the minimum risk planner output which was introduced in [14] which we callMin−Risk planner, with the time-expanded minimum risk planner which we call Time−Expanded planner, introduced here. In Fig.2.15, two paths are planned. The Min−Risk planner, was used to plan the path in magenta, while the Time−expanded planner was used to plan the path shown in cyan. With- out currents, the glider is assumed to have a nominal velocity of approximately 1km/hr, so we substitute time for distance, as was done in [14]. The output of the Minimum−Risk planner in Fig.2.15 is clearly not the desired behavior, as the glider has no chance of making it across the shipping lanes due to the strong op- posing currents. The time-expanded version has chosen the waypoints in its path, precisely from a set of achievable waypoints, provided the current predictions hold. This suggests that although theMin−Risk planner produces a numerically lower risk path, it may produce a path which is infeasible for actual execution. In Fig. 2.13, we see that as we allow t max to grow, the Time−Expanded planner producespathswhichhaveriskvalueswhichareaslowasthosefromtheMin−Risk planner. Note that both these planners are optimal and produce identical plans in the absence of currents, except that in the real world the plans by the Min−Risk planner might not be executable. We also notice that the minimum risk planner has fewer waypoints as well as shorter paths in general. We performed 50 trials for each value of t max with randomly chosen start and end locations in the same map 42 to produce these statistics. TheMin−Risk planner is more than 1000 times faster than the Time−Expanded planner when executed on the same computer. Figure 2.16 shows two paths planned, one opposing the currents, while the other is with the currents. We notice that the planner chooses to go south east via a much longer route (which takes 67 hours to execute), while the reverse path which is helpedbythecurrentsiscompletedinunder48hours. Thistimeisverycomparable to that taken by the same glider on its return path under similar ocean conditions. In the plan opposing the currents, the planner chooses to move southward since this allows it to cross the high-risk region near it at a narrower part. T max has been set to 120 hours for all of the planned paths in this paper. Figure 2.17 and Fig. 2.18 show three paths each with different values of t max . Smaller values of t max typically require more waypoints. We also note that up- current paths typically need more waypoints since the planner typically has to get close to the high risk region it needs to dive under to be able to get past the risky location. When going down current the glider has higher chances of covering this distance within t max and can use a longer leg if useful. 2.7 Conclusion We have proposed an algorithm that systematically creates and uses risk maps for gliders based upon historical AIS data from ships, which is then used to plan 43 paths through a given region from a designated start location to a goal. The glider is required to surface within a distance d max , so that it can transmit data. We apply this constraint to paths planned both by the regularA∗ shortest-path search and the A∗ search on the risk-map for the minimum risk path (while performing a single-step search). We also perform an A∗ search on the search space of the minimum-risk along with the d max constraint. This produces the lowest-risk path at the surface, that satisfies this constraint. In Figure 2.11 we see how increasing the nominal risk for the entire map helps reduce the path-length for the same value of d max . The paths generally tend to get shorter as the user increases α (although this is not necessary). By increasing α, the user can adjust the path chosen, by allowing the glider to traverse a riskier region. We do not have a principled way to tuneα, although using binary search to get to a reasonable value looks promising. From Figure 2.10 it is clear that as we allowd max to get larger, the expected total risk for the path reduces. While this is true to some extent, the glider’s navigational accuracy begins to deteriorate with distance, and hence larger d max values could potentially increase the uncertainty in glider surfacings, which in turn could substantially increase the risk experienced by surfacing at a location which is significantly riskier than that intended. We find that the planner tends to take paths which are more direct toward the goal since it is easier for it to reach lower risk waypoints which might be located further away. 44 We also explore the use of ocean current predictions in a deterministic way so as to improve the performance of planning. The ocean currents can be utilized to get realistic edge lengths (in terms of time) for the real travel time between waypoints. In such a scenario we do not connect nodes which have travel times larger than those stipulated (if the currents oppose the glider), while we could connect nodes further away (if the currents aid the glider to travel further). The planning in this case is performed in a higher dimensional space resulting in a time-expanded graph (with edges going forward only, along the time axis). We also described an iterative algorithm for the rapid computation of an alternative pseudo-waypoint which a glider aims for (instead of the true waypoint), but results in the glider to surface much closer to the desired waypoint due to the effect of the ocean currents. This type of method relies on accurate knowledge of ocean currents. Our practical experience with using this method has been that it works best during the first 8- 12 hours of the predictions for a day, but the benefits do not appear to provide much improvement beyond that. As ocean current models improve, we expect this horizontoincreaseinwhichcasedeterministictime-expandedplanningwillbemore important. Clearly the uncertainty in the ocean current predictions needed to be addressed - which leads us naturally toward planning techniques which look at the stochasticity in the ocean current measurements. 45 Figure 2.9: A path stretching from the Malibu basin to the coast off Orange county illustrates how aiming for the shortest path could lead to bad results. Even though ourpathis3.5kmlongerthantheshortestpathbetweenthestartandgoallocation, the shortest path is 5200 times riskier. The path planned by our algorithm picks the minimum risk waypoints all along the path for which it accrues R min at each waypoint. 46 Figure 2.10: Plot of the variations in thed max constraint parameter. As we increase d max we see that the usual trend is for the paths to get shorter while the risk associated with that path reduces. This is because we allow the glider to dive under hazards for longer distances. 47 Figure 2.11: Plot of variations in the α tuning parameter which allows the user to assign higher nominal risk to each waypoint. By increasing this value of α, we find that the minimum risk paths usually get shorter, since we are essentially allowing the glider to take a riskier path to the destination. 48 5 6 7 8 9 10 11 12 13 0 0.1 0.2 0.3 0.4 0.5 t max (hr) Min−Risk avg run time (s) 5 6 7 8 9 10 11 12 13 0 200 400 600 800 1000 t max (hr) Time−Expanded avg run time (s) Figure 2.12: Running times for the Minimum risk planner vs. the maximum time of flight (100 randomly chosen start and end locations for each value of t max ). The Minimum-risk planner is more than 1000x faster on average as compared to the time-expanded planner (due to the added dimension of time, as well as the overhead in testing feasibility of paths). We also see that although longer paths have a larger branching factor, beyond t max =11 there is a gradual drop in the time to find the optimal solution. This is probably because longer paths can count toward theT max limit faster, and thus get eliminated quicker. 49 5 6 7 8 9 10 11 12 13 0 200 400 600 800 t max (hr) Average Risk Min−Risk Planner Time−Expanded Planner 5 6 7 8 9 10 11 12 13 5 10 15 20 t max (hr) # of Waypoints Min−Risk Planner Time−Expanded Planner 5 6 7 8 9 10 11 12 13 5 10 15 20 t max (hr) PathLength Min−Risk Planner Time−Expanded Planner Figure 2.13: Plots of (a) average risk values, (b) average number of waypoints in a plan and (c) average path-length vs. the maximum time of flight, over 100 trials for each value oft max . Note that ast max goes up, theTime−Expanded planner’s risk approaches that of the Min−Risk planner even though it uses more waypoints and has a longer path length. TheTime−Expanded version is forced to use longer paths because it has to eliminate many of the longer path segments due to currents. 50 Figure 2.14: The goal waypoint in this case is the red star. If the glider proceeds directlytowardthiswaypoint, itwilldriftoffalongthegreendirection. Theplanner chooses to use the green triangle as the goal location as a pseudo-waypoint. This choice in the presence of the currents helps the glider get to the waypoint within a single iteration. TheMin−Risk planner with no knowledge of currents, is unable to get to the desired waypoints accurately, and consequently has higher risk in practice than the Time−Expanded planner. Also shown in the figure is the grid layout on with the current map and risk maps used in the planner. 51 Figure 2.15: Paths planned with a value of t max =12 hr. The magenta path is the minimum-risk path based on prior work where ocean currents are not accounted for. It results in the lowest possible numerical risk value, but is only valid if there are no ocean currents. The time-expanded planner’s path shown in cyan, on the contrary uses information about the ocean currents. Also shown is an overlay of the ocean currents in the region and the risk map. We plan both these paths starting at 00:00 GMT on July 16, 2011, so as to simulate the conditions from our field experiment which motivated this path planner. 52 Figure 2.16: Paths planned with a value of t max =11 hr. The green path is the time-expanded path planner’s output in the presence of the currents for the same start and goal locations shown in Fig. 3.2. The goal locations are indicated by a "red" G. We plan both these paths starting at 00:00 GMT on July 16, 2011. Since the glider struggles against the currents when going from east to west (in the green path), it is forced to have to take a path going north-wards as well as to make a stop (Wpt 10) in between the two shipping lanes. When travelling in the opposite direction, the glider can take advantage of the currents and make it across much faster. 53 Figure 2.17: Paths planned with a value of t max =5, 7 and 11 hr using only the Time-Expanded planner in the presence of currents. The green path is the time- expanded path planner’s output in the presence of the currents for the same start and goal locations shown in Fig. 3.2. The goal locations are indicated by a "red" G. We notice that the cyan path has more than 20 waypoints to get to the goal, with each leg having a maximum time of 5 hours. The magenta path (t max = 7 hours) has fewer waypoints than the gray path but more waypoints than the green path which has t max = 11 hours. We also see that in order to collect lower risk the green path takes a much more longer path, but one which is shorter than T max of 100 hours. 54 10 km 0 Figure 2.18: Paths planned with a value oft max =8, 10 and 12 hr respectively using ourTime-Expandedplannerinthepresenceofcurrents. Thegreenpathisthetime- expanded path planner’s output in the presence of the currents where the start and goal locations are interchanged from those shown in Fig. 3.2. We plan all these paths starting at 00:00 GMT on July 16, 2011. We notice that the "dark green" and "dark blue" paths have both got similar paths - they move south west initially because they attempt crossing the shipping lanes at a narrower portion. The red pathwhichhast max = 12hrplansaslightlycounter-intuitivepathwhichgoesnorth before moving west-south-west with long jumps. It appears to take advantage of the currents and expects to comfortably clear the broad high-risk area in the center. 55 Chapter 3 Risk-Aware Planning under Uncertainty in Prediction Models 3.1 Introduction Autonomous Underwater Vehicles (AUVs) are rapidly gaining popularity in the oceanographic sampling community. The ease of deployment, low operational cost and high resolution sampling capabilities in comparison to ship-based sampling have made them valuable tools for studying the oceans. Propeller-driven AUVs have high maneuverability at the expense of range, but are usually capable of fairly accurate navigation and can travel rapidly between waypoints. Another popular family of AUVs called Slocum gliders are buoyancy-driven vehicles which give up speed and maneuverability for longer endurance. Gliders have speeds in water ranging between 0.2 m/s and 0.4 m/s, with an average speed of approximately 0.27 m/s for our gliders. A glider has a typical endurance in excess of 3 weeks during 56 which it may travel 500 km or more, surfacing periodically for data transmission (taking tens of minutes to hours for transmission via Iridium) and for GPS updates (taking a couple of minutes). The long endurance afforded by the gliders makes them ideal for persistent monitoring of ocean processes over long durations of time. Coastal observatories are a combination of AUVs, sensor moorings, radar sites and communicationinfrastructurewhichhelpscientistsinobservingandunderstanding, the evolution of ocean processes in a coastal region (eg., Rutgers University Coastal Ocean Observation Laboratory [3], CINAPS [4]). When operating gliders in a coastal region such as the Southern California Bight (SCB) 1 , the time spent at the surface exposes them to the risk of colliding with ship and boat traffic. The ports of Los Angeles and Long Beach handle a significant amount of ship traffic, making minimizing collision risk of AUVs with ships in this region, very important (see Figure 3.1). As AUV use in near-coastal regions becomes more prevalent, there is an increasing need for path-planning methods which make operating these vehicles safer in high ship and boat traffic areas, especially if the glider has to spend non-trivial amounts of time at the surface transmitting data. At the same time, regional ocean models ( [25] and [1]) can predict ocean currents, providing a potentially valuable tool for path planning. In this work, we describe risk-aware path planners which aim at taking advantage of ocean current predictions, and account for the uncertainty in these model predictions. 1 The SCB is the region of the ocean within [32 ◦ N,−117 ◦ E] and [34.5 ◦ ,−121 ◦ E]. 57 We know that ocean model forecasts have uncertainty in predictions, which in- crease with time from the nowcast. The first question we seek to answer is whether predictive ocean models improve risk aware planning. This chapter extends upon the deterministic minimum risk planner (introduced in conference paper [14]), and the deterministic time-expanded minimum risk planner (described in a technical report, [15]). In [14] we presented a deterministic strategy using an A ∗ planner to minimize cumulative surfacing risk when traversing between desired waypoints - the effect of ocean currents was largely ignored. As can be seen in Figure 3.2, this strategy can fare poorly in practice. The primary motivation for using plan- ners which use ocean currents were results from field experiments of the minimum risk planner, (see Figure 3.2), where the glider surfaced several times in shipping lanes, due to strong currents between July 16-18, 2011. In [15] we used predictive ocean models deterministically, without explicitly accounting for the error in the predictions during planning. This work extends upon the prior work by introducing planning strategies designed to cope with the inherent uncertainty in the predictive ocean models during planning. The use of ocean current predictions for planning AUV paths has generated sub- stantial interest lately. Most path planners that utilize ocean current information use forecasts deterministically, with few addressing uncertainty. In [26], the authors looked at the uncertainty in predictions by comparing a planner which is omniscient about true currents with a realistic planner which uses 48 hour forecasts. In [27], 58 the authors describe a planner which beautifully captures the effects of flow-fields in planning paths for AUVs. This method is very suitable for planning paths for ensembles of AUVs since it scales linearly with the number of vehicles. Our work expands upon related work from these authors (see Section 4.6) by providing a framework which explicitly uses the uncertainty in the ocean current predictions during planning. In this chapter, we propose two stochastic planners that utilize ocean current pre- dictions in a probabilistic manner. Both stochastic planners use the notion of min- imizing the expected risk of collision, which helps the planners utilize uncertainty in ocean current predictions as well as navigational uncertainty in their respective planning frameworks. We compare to a baseline minimum risk planner that does not use the ocean current predictions, and our results demonstrate significant im- provements from the proposed methods. In this work we make the assumption that most of the error in navigation is due to poor estimates of the true current field the glider is flying through. While we do not explicitly do so in this work, other systematic navigational errors can also be modeled when creating the planning graphs. For AUV deployments in risky regions, it is important to keep the surface time to a minimum. In this work we provide an off-board stochastic planning framework for AUVs which requires minimal surfacing time to update policies. We do this by pre- computing policies (in the case of the risk aware MDP) and shortest-paths (in the 59 case of the Minimum Expected Risk planner), thus only performing lookups, given the AUV’s location and time of surfacing. We used this framework to plan paths for two gliders in the Pacific ocean over a period of 3 weeks. Planning and execution are intertwined in our approach which allows us to re-plan at each time the AUV surfaces during execution of the previous plan. Our planners use a rolling horizon in order to make use of the latest ocean current predictions available (which are more accurate than older predictions for the same time). These characteristics enable our planners to be more adaptive to disturbances that may affect the execution of plans. The key novel contributions of this work are (1) the adaptation of the shortest-path search and MDP planning algorithms to stochastically minimize surfacing risk for AUVs using ocean current predictions, (2) simulation studies which indicate that the use of ocean predictions can significantly lower planning risk, (3) a comparison of the two stochastic planners in simulation bringing out conditions where we can expect significant gains from the use of these planners, and (4) development of a field-able planning framework for stochastic planning which is practical for use in risky areas, and has been tested in the field with two gliders in the Southern California Bight. In Section 3.3 we define the risk aware planning problem and methods used to find the solution. In Section 3.4, we describe methods and algorithms used to create risk maps from AIS data, a description of Regional Ocean Modeling System (ROMS) 60 models used and glider simulations used to produce the planning graph for the stochastic planners. We present results from simulation which provide strong indi- cations of improvements in risk-aware planning for AUVs when current predictions are used. 3.2 Related Work In this section we describe how our work is related to the vast body of work on planning for robots that is already present in the literature. Motion and path planning for robots is a well studied problem ( [28], [29]). Motion planning for robotic manipulators drove much of the early research in this field with much work done in modeling, representing the configuration space (C-space) for robots, such that a path is found taking the robot from a designated start configuration to the goal configuration, by avoiding all obstacles. Optimal paths are a subset of feasible paths where the planner optimizes some cost function which might be the shortest distance, or the energy expended wrt some constraints such as an energy budget or time of traversal. Path planning problems are generally hard - the classical piano mover’s problem was shown to be PSPACE-hard in [30]. When planning for paths in the presence of moving obstacles a common technique used is velocity obstacles [31], [32], and even probabilistic velocity obstacles when dealing with uncertainties [33]. 61 While path planning can be done in the continuous domain, discrete planning al- gorithms are better suited for implementation on computers. Complexity in robots and/or planning environment, increases the dimensionality of the planning prob- lem making the problem much harder. This motivates the use of sampling methods such as Probabilistic Roadmaps (PRMs) and Rapidly Exploring Randomized Trees (RRTs) which sample the configuration space for feasible solutions [34] [35], [36], [37]. Both PRMs and RRTs have been shown to be probabilistically complete and are routinely used to find paths in complex systems, including systems with kino- dynamic constraints [38]. In general, sampling-based methods are not guaranteed to find the optimal solution. Hazard avoidance for unmanned ground vehicles is also well studied [11] as is minimum-risk planning for UAVs e.g. [12] where A ∗ and genetic algorithms are used to plan optimal paths. Our prior work reported in a conference paper [14], de- scribed a path-planning algorithm based upon the A* algorithm to plan risk-aware paths using A* (while ignoring the time-varying ocean fields). We later extended this work by looking at minimizing risk through search in a time-expanded plan- ning environment [15] where we use ocean current predictions deterministically. Recently [13] showed that the probability of collision between glider AUVs and ships is proportional to the shipping density in a region of operation. This in- formation is utilized to determine the probability of safely executing a path in high-traffic regions, although the choice of this path is left to an operator or some 62 other path planner. This probabilistic model can be utilized in planners such as the one described here to find low-risk paths for glider AUVs. Path planning for Slocum gliders [2], is often driven by the goal of sampling parts of the ocean [39]. Informative path planning methods presented in [40] help in sampling spatiotemporal phenomena. For oceanographic applications a commonly used objective is coverage, which is typically performed using lawnmower patterns [41] or spanning tree coverage [42]. More specific coverage algorithms for AUVs also exist [43]. Typically these paths are then provided to AUVs or ASVs in the form of waypoints which are meant to be followed in sequence. These methods usually assumethatalow-levelcontrollerhelpstherobotreachwaypointsaccuratelydespite external disturbances. Our planners are designed to address the lack of accurate navigation due to unpredictable or unobservable disturbances with the objective of reducing collision risk. In complex environments planning ahead to avoid regions which might have obsta- cles might be necessary. In [5] the planner that uses bathymetry, exclusion zones and ocean current databases to generate path corridors along great circle routes using A ∗ . Recently the Fast Marching algorithm originally introduced by Sethian, (see [44], [45] for details), was applied by [6] with extensions to deal with underwa- ter currents where the turning radius of the vehicle is utilized to ensure the plan can be executed. Fast Marching solves the Eikonal equations for a front whose 63 speed of movement does not change sign. This provides an elegant method of de- termining the location of the front by building the front efficiently using ordered upwind methods, in a way similar to Dijkstra’s shortest path algorithm. Just as in Dijkstra’s shortest path, the constraint that edge costs cannot be negative helps maintaintheinvariantontheorderinginwhichthefrontisexpanded, theconstraint that the speed function cannot change sign allows a similar invariant on the order- ing of when the front reaches a point during expansion to be maintained. Planning paths for energy conservation by taking advantage of the current flow such as [8] for time-varying ocean currents, and [7] for fast-flowing estuarine environments have also been studied. These methods typically require good models for currents, as well as AUVs capable of much faster movement than Slocum gliders (which are used in this work). In [24], an iterative Algorithm which improves navigation of gliders for track evolving Ocean processes is described. In [46], the authors discuss methods to improve the dead-reckoning on gliders using a DVL, which can help in the execution of the planned path. State estimation techniques like Unscented Kalman Filters are very useful in warning glider pilots when a particular choice of waypoints may be risky [47]. The A ∗ algorithm has been used to plan paths for AUVs. Carrol et al describe a planner that uses exclusion zones, bathymetry and ocean current databases to generate corridors for paths along great circle routes [5]. More recently, algorithms such as Fast Marching use constraints such as the turning radius of the AUV to 64 ensure paths (planned in the presence of currents) are executable by it. In [48] a heuristic planning algorithm constant-time surfacing A* used Regional Ocean Model data to find paths with the lowest temporal cost in the presence of currents (as applied to Slocum Gliders). In [49], several optimal path planners are described which apply search to the time-varying environment of ocean currents through which AUVs have to navigate. In [50] these algorithms were improved upon to take advantage of parallelization of graph based planners, as well as the use of more robust cost-functions. Planners which use ocean currents deterministically rely on the accuracy of these predictions. While predictive ocean models are improving, they are not accurate enough for planning far into the future deterministically in most cases. In this work, we try to extend the idea of using predictive ocean models tothemoregeneralcasewhereoceancurrentpredictionsareuncertainandplanners using these predictions ought to use this uncertainty. Level set methods (originally introduced and described in [51], [44]), are a more generaltechniquethantheFastMarchingalgorithmforwavefrontexpansion. These methods can handle changes in the sign of the speed function of the wavefront by using higher dimensions to represent it. This corresponds to a slower running time than Fast Marching, but provides the ability to handle more complex phenomenae. [27], path planning in flow fields using the level set method is described whereby the time-optimal path of the vehicle is obtained by solving a particle tracking equation backward in time after they evolve a front from the AUV’s start location until it 65 reaches the goal. This method is ideal for predicting the formation and evolution of swarms through time varying 2D flow fields since the time-complexity scales linearly with the number of vehicles for 2-dimensions. Recently [26], described wavefront planners for Slocum gliders where the perfor- mance of wavefront expansion planners using the full 48-hour ocean current pre- dictions was compared with similar planners which had true knowledge of currents. The paper contrasted planning with realistic current predictions (using 48 hour forecasts at 2 different grid resolutions), with planning with omniscient knowledge of currents (using more accurate nowcasts from historical ROMS data). Control strategies considered were (1) current-sensitive control which accounts explicitly for predictive forces, (2) current-blind control which aims for a pseudo-waypoint which takes the glider to the goal due to the currents and (3) a greedy strategy which aims for the goal. The results from their paper indicate that uncertainty in predictions does affect the performance of the planners used. Realizing the effect uncertainty may have during the planning stage, we extended upon deterministic risk-aware planners to implicitly account for the uncertainty in ocean current predictions. In [47], a model based decision support system for glider monitoring is presented. It uses an unscented Kalman filter to help glider pilots decide mission parameter corrections for the safe operation of gliders. This is the closest work to risk-averse decision making while glider planning to our work and it follows a similar line of approach to solving the problem. The primary difference is that their system is 66 intended to advise a human operator about when not to choose a particularly risky path, whereas our work provides a plan for the lowest risk path in expectation. While planners which use ocean models for planning in time-varying ocean currents exist ( [49], [48], [15]), we did not find a planner which (1) implicitly accounts for the inherent uncertainty in the ocean current models during planning and (2) incorporates the predictions without impractically increasing the complexity of the planner for near real time use in AUVs in the field. 3.3 Risk Aware Planning: Problem Definitions and Algorithms Risk aware planners in this work, aim to reduce the risk of collision between the robotandstaticordynamicobstacles(atthesurface), usingaprobabilisticriskmap described in Section 3.4.1. In this section, we will begin with a description of the Minimum Risk planner which aims at deterministically minimizing the cumulative surfacing risk for the AUV without considering errors due to environmental factors such as ocean currents. We follow this with a description of the Minimum Expected Risk planner in section 3.3.3 and the risk aware Markov Decision planner in section 3.3.4. 67 3.3.1 Problem formulation: Deterministic minimum-risk planning We define the minimum-risk pathP ∗ = [s 0 ,s 1 ,...] as the path which minimizes the total risk at the surface among all feasible pathsP, going froms 0 tos g wheres 0 is the start location and s g is the goal location. P ∗ = argmin P∈P |P| X i=1 risk(s i ) (3.1) wheres i = (x i ,y i ) T is a waypoint along the path P = [s 0 ,s 1 ,...] andrisk(s i ) is the value of risk at the surface from the risk map R, at node (or waypoint) s i . We impose the constraint that time of traversal for edge e(s i−1 ,s i ) is e(s i−1 ,s i )≤t max (3.2) where t max is maximum time allowed for submerged traversal between surfacing waypoints. For a constant velocity model of glider travel, d max denotes the maxi- mum dive-length between two consecutive waypoints. Additionally, the maximum travel time on this path, cannot exceed T max to stay within the time budget and feasible (straight-line) paths may not pass through designated obstacles. 68 3.3.2 Minimum-risk Planner The solution to this problem has two phases, (1) construction of the minimum- risk planning graphG min−risk (V,E) where the vertices V, are the same as those describedinsection3.4.4 andE arethelistofedgesconnectingreachableverticesto each other. Directed edges connect vertices to their reachable neighboring vertices. Every edge is assigned a weight equal to the risk of surfacing at the destination vertex. Theedgeweightforedgee(s j ,s k )isrisk(s k ), wheres k isavertexreachable from s j . The path-cost is defined by g(s k ) = k X i=1 risk(s i ). (3.3) A valid heuristic for the estimated cost h(s k ) is h(s k ) =N∗risk min , (3.4) where N = |s e −s k | d max (3.5) and s e is the desired goal state. 69 While heuristic algorithms likeA ∗ are ideal for finding the minimum-risk path using the heuristic above, Dijkstra’s shortest path and Bellman-Ford all-pairs shortest paths are also practical for finding the minimum-risk path on small graphs. Next, we discuss planners which use the idea of minimizing the expected risk of collision instead of the risk at the surface at that location. Note, that the expected risk used in these formulations is dependent upon the noise in the ocean current predictions as well as the expected variability in the ocean currents during the finite time-horizon used for the planning. 3.3.3 Minimum-Expected Risk Planner To take advantage of regional ocean model predictions, we propose minimizing the expected risk of surfacing in high risk areas. The problem is to find a path P ∗ = [s 0 ,s 1 ,...] which minimizes the total expected risk at the surface among all feasible pathsP, going from start s 0 to goal s g in a planning graphG min−exp−risk (V,E) created in section 3.4.4. P ∗ = argmin P∈P |P| X i=1 E risk(s i ) , (3.6) 70 where E is the Expected risk at waypoint s i , making the edge weights for our planning graphG min−exp−risk (V,E) ||e(s,s 0 )|| = X s 00 T (s 00 |s,a(s,s 0 ))R(s 00 ), (3.7) where R(s 00 ) is the risk of collision if the AUV were to surface at state s 00 , and T (s 00 |s,a(s,s 0 ) is the transition probability of surfacing in state s 00 when starting out at locations and performing the action of aiming for locations 0 . The minimum risk path is solved by finding the shortest path between the start node and goal node on this graphG min−exp−risk (V,E). 3.3.4 Risk aware Markov Decision Process Planner ROMs provides forecasts up to two days in advance, which allows us to develop actionmodelsforvehiclesthatutilizetheseoceanpredictions. BesidetheMinimum- Expected Risk stochastic replanner, we also evaluate a risk-averse Markov Decision Process(MDP)plannerwhichutilizesprobabilisticactionmodelsbaseduponocean predictions (where we assume that the stochasticity in the ROMS predictions is uniform throughout the map). An MDP is a tuple (S,A,P,H,R), whereS is a set of states; A is a set of possible actions; P : S×A→ S is a set of transition probabilities between states; H is a horizon indicating the number of time-steps being considered; and R :S×A→R 71 is a reward function which maps some state-action to a reward. In our problem the only positive reward available to the robot is that which we assign to the final state-action resulting in it getting to the goal. At every other location we incur a penalty proportional to the expected risk associated with performing the action a from the previous state s. The distribution of surfacing locations for each set of trials gives us an estimate of the transition function for that action. This transition functionT (s 00 |s,a(s,s 0 )) describes the probability of ending up in states 00 , given we choose to take action a (state s to state s 0 ). The methods described above can be incorporated into path planners for AUVs to improve the safety and reliability of operation. Path planners that incorporate data from ocean currents clearly stand to benefit from improvements in the ac- curacy of ocean current predictions. In addition, confidence estimates are useful both for planners that reason probabilistically as well as for planners that reason about worst-case instances. In particular, we are interested in improving the safety of operation of AUVs by avoiding areas with high probability of encountering a dangerous ship. TheMDPplannerusestheunderlyingtransitionfunctionabovetoperformBellman- updates in a planning graph where the rewardsR(s) are given by negative risk. The 72 Bellman-updates to compute the utility values U, if value iteration is used to find the optimal policy, are described by U i+1 (s)←−R(s,a) +argmax a X s 00 T (s 00 |s,a(s,s 0 ))U i (s 0 ) (3.8) MDPs can be solved using standard techniques such as Value-Iteration, Policy Iteration and so on to get the optimal policy π ∗ . 3.4 Methods and Description of Sources of Data In this section we start with a description of the methods used for the creation of risk maps using Automatic Identification System (AIS) data (section 3.4.1). Next, we briefly describe the ROMS model we use in performing glider simulations before providing a description of the kinematic glider model we use throughout our work. Finally, we explain how we create the graphs used in our stochastic planners in section 3.4.4. 73 3.4.1 Risk Map Creation using AIS Data (a) (b) (c) Figure 3.3: (a) Occupancy map created using historical AIS data. (b) Obstacle map created using bathymetry data. (c) Combined Occupancy map after Gaussian Blur. We create risk maps using historical Automatic Identification System (AIS) data 2 . AIS is a tracking system for ships intended to aid in vessel traffic management as well as for the safe navigation of marine vessels, through the exchange of vessel identification and location information via standard VHF transceivers. In several countries including the US, a large number of vessels are mandated to have oper- ational AIS devices for safety, national security as well as to aid in vessel traffic services. AIS data for many regions is also available on the internet which could be aggregated to create a map such as that in Fig. 3.4 (a). A risk map can be created by discretizing the region of interest into a grid. In this work we use a grid resolution of 100 m x 100 m for each cell. AIS data used, was 2 AdescriptionofAIScanbefoundathttp://en.wikipedia.org/wiki/Automatic_Identification_System 74 collected over a period of 5 months between January and May, 2010 in the region 33.25 ◦ N to 34.13 ◦ N and 117.7 ◦ W and 118.8 ◦ W. The time discretization used in map creation is 10 seconds. We select a spatial resolution for each pixel of the risk map, and then accumulate the number of times a pixel is occupied by at least one ship/boat in unit time defined by O(x,y). Although co-occupancy of a pixel by a ship and the glider does not necessarily imply a collision, Risk(x,y) is directly proportional to the probability of collision with obstacles. IfM denotes the total number of time intervals used to obtain the risk map, then the risk at location (x,y) is defined as Risk(x,y) = 1 M M X t=1 O(x,y,t). (3.9) In this thesis, risk is the likelihood of surfacing in a location whereby there is a known shipping hazard based on the risk map which is essentially a static time averaged occupancy grid based on prior known shipping movements in the region of interest (with a Gaussian blur to account for some uncertainty). We combine bathymetric maps 3 with the risk map, to identify non traversable regions such as land. The bathymetric maps could also be used to intrinsically avoid other possible sources of problems for AUVs such as kelp forests which tend to grow more abundantly in shallow regions with higher nutrient content. By thresholding the bathymetric map at some isobath for navigable locations (e.g. depth > 15 m), we 3 Bathymetry maps with sufficient resolution for AUV planning purposes for the US can be ob- tainedfromtheNationalOceanandAtmosphericAdministration’s(NOAA)NationalGeophysical Data Center (NGDC) (http://www.ngdc.noaa.gov/mgg/bathymetry/relief.html). 75 find that in practice we avoid most Kelp forests while also ensuring a safe buffer from obstacles such as land and beaches. It is also worth mentioning here (as shown in [13], probability of collision risk for AUVs is proportional to shipping density) that the probability of collisions between an AUV and obstacles is proportional to the risk value associated with its surfacing location. Minimizing the sum of the risks is equivalent to minimizing the product of the probabilities of collision (taking logarithms of that product). We choose to use cumulative risk values for paths in this work noting that minimizing a function is the same as minimizing its logarithm (since logarithms are monotonic). We have studied the AIS data and looked for temporal and seasonal variations in traffic patterns. Although there is very little variation between day and night traffic for larger ships (which follow the official shipping lanes), there is significantly lowertrafficinpassengercraftgoingbackandforthbetweenCatalinaislandandthe mainland at night (as opposed to that during the day). This traffic did not decrease significantly in the winter although one could expect that pleasure boat traffic could vary significantly based upon the weather and have seasonal fluctuations as a consequence. In this work we chose to ignore the reduction in nocturnal traffic to simplify the problem. It is easy to extend this work to also use risk-maps which are indexed by time - the simplest case being to use a day-time risk map and a second night-time risk map. Although we have the ability to obtain real-time AIS data, we have chosen to rely more upon historical data instead because (a) AUVs 76 are able to receive new AIS data only after surfacing, (b) accurate prediction of the future path of a ship or boat is not as easy on the time-frames of glider dives (which take several hours). A practical system for risk-avoidance could utilize AIS data to prevent file-transfers and shorten surface times for the gliders if the glider has surfaced close to a ship which is coming its way. Each pixel representing land gets the maximum value of risk. We apply Gaussian blur on this risk map to (a) spread risk onto cells with holes where no AIS data was collected from a location at sea and (b) to help keep the planner safely away from land masses which are also risky. While setting obstacles to the highest risk always prevents the planner from picking an obstacle as a waypoint, collision detection along the path joining the two waypoints with sufficient space for error drifts due to currents is necessary. In prior work we used Gaussian blur to account for uncertainties in glider dead reckoning. In this work the uncertainties in glider dead-reckoning are handled by the underlying graph generation process described in Section 3.4.4 for the Minimum Expected Risk and Markov Decision Process planners. 77 3.4.2 Current predictions from the regional ocean model system The predictive tool for ocean currents we use in this work is the Regional Ocean Model system (ROMS) which is a split-explicit, free-surface, topography-following- coordinate oceanic model [1]. It is an open-source ocean model that is widely accepted and supported throughout the physical oceanography and modeling com- munities. In this work we use the version of ROMS compiled and run by the Jet Propulsion Laboratory (JPL), California Institute of Technology, which provides a data-assimilated now-cast followed by 72 hourly forecasts for the SCB via their THREDDS data server at a 2.2 km grid resolution along the northing-easting axes. Ocean models can be fairly sophisticated and many model other physical and bio- geochemical properties (where phytoplankton and its dependence on nutrients like iron, nitrate, ammonium and silica may be modelled). The ROMs model we use provides predictions for physical parameters such as temperature, salinity and sea- surface height we are primarily concerned with the spatio-temporal current vector fields provided by ROMS. This current vector field is in the form ofu(x,t) (zonal or easting),v(x,t) (meridional or northing) and w(x,t) (vertical) component vectors, where x is a position vector (x,y,z) T and t is time (in hours). In this thesis we assume linearity in the current fields between grid locations. A 3 dimensionalcurrentvalueinaspatialgridarecomputedusingtrilinearinterpolation while that in a spatio-temporal grid are computed using quadrilinear interpolation. 78 Higher-order interpolation schemes are usually expensive, so for most of the work we describe in this thesis, we resort to depth-averaged current fields on which we use bilinear interpolation (for spatial interpolation) or trilinear interpolation (for spatio-temporal interpolation). Depth-averaging is performed by averaging the individual current components along the depth axis between the depths the glider flies between. In general the bathymetry may affect the speed of the current in the vertical direction (upwelling/downwelling). We ignore this effect (which is more pronounced in regions nearer to shore where the bathymetry is more likely to induce these current components) for the computational benefits provided by depth-averaging the currents. 3.4.3 Simulation of glider motion While dynamic models which describe glider motion in more detail can be found in the literature [52], [53], we will use a kinematic model (as done in [23], [48], [47]) because we feel that (1) the overall effect of the ocean currents dominate effects due to dynamics for slow-moving vehicles like Slocum gliders (2) the use of kinematics reduces the complexity of our integration of the glider trajectory making fast simulations more tractable and (3) accurate measurement and/or estimation of all the terms in the dynamics models for gliders is a difficult exercise. The planners described in this work are practical for use in the real-world only when fast simulations for vehicle dynamics can be performed. While we do not present results 79 based upon dynamics in this work, the use of more accurate AUV dynamics models can be expected to improve the results of this work albeit at higher computational expense. Although kinematic models for gliders can be found in the literature ( [52], [53]), we describe the development of a kinematic model for completeness. A constant velocitymodelreliesonthenominalvelocityofthegliderbeingfairlywellestimated. This value is sensitive to physical characteristics of the glider such as ballasting and thenominalpitchangle. Thecoefficientsforthekinematicmodelpresentedherecan be determined quite easily from data logged on board the glider during a mission involving a few dive-climb maneuvers. This velocity can be different for each glider depending upon how quickly it climbs and dives. While traveling between two waypoints in a typical mission, the glider performs a pre-computednumberofyo-maneuversbetweentwopre-programmeddepthsdepth 1 and depth 2 . While diving the glider pitches its noise downward to a programmed angle θ dive and reverses the process by pitching the nose upward while climbing back up at an angle θ climb . Let the average nominal vertical velocities at which the glider can dive (v dive ) and climb (v climb ), we can compute the nominal horizontal velocity for the glider from these and the pitch angle (θ∈ (−π/4,π/4)). The glider uses its pitch servomotor to achieve the desired pitch angle during its dive and climb and usually sets these to the same value (typically≈ 26 ◦ ). It is easy to obtainv dive andv climb empirically 80 although these values may also be estimated from a more rigorous glider dynamics model. (a) (b) Figure 3.4: (a) For simplicity we decouple the kinematic model for the glider in the horizontal and depth planes. In the depth plane, the glider performs yo-yo maneuvers which appear like sinusoids (due to the dynamics of the vehicle where the glider accelerates when diving to a terminal velocity on the downward yo, until it decelerates atdepth 2 to begin an upward yo, followed by acceleration to terminal velocity on the upward yo). When flying in water deeper than depth 2 , the glider takes a nominal timeT to perform each yo-cycle. When flying in shallower regions thandepth 2 , the glider detects the sea floor (using the altimeter) and inflects before reachingdepth 2 - shortening both dive and climb times. Ignoring accelerations and deceleration at both ends, we can arrive at a simple model for the depth of the glider to allow us to do integrate the glider trajectory. The glider pre-computes the number of yo-maneuvers required to get to the goal at a given bearing. (b) This simplified vector diagram for glider dive and climb velocities allows us to compute the nominal horizontal glider velocity v g . 81 v g_dive = v dive tan(θ dive ) (3.10) v g_climb = v climb tan(θ climb ) (3.11) v g = v g_dive +v g_climb 2 (3.12) If the glider executes fairly deep dives such that|depth 2 −depth 1 | 30m, then it will spend the majority of the time gliding at near terminal velocity which allows us to ignore the non-linearity due to the change in velocity in the upper and lower portions of each dive (so equations 3.10 - 3.12 hold for a large part of the yo- maneuver). We compute the time-period T for a single yo, using equation 3.13. T =t dive +t climb ≈kdepth 2 −depth 1 k 1 v dive + 1 v climb (3.13) Given the glider’s present location x i = (x i ,y i ) T and a target waypoint x w = (x w ,y w ) T , the range R to this waypoint, may be computed (making a simplifying rectilinear assumption) using equation 3.14. Similarly, we may also compute the 82 nominal heading angle ψ the gilder will try to maintain to get to the goal using equation 3.15. R = q (x w −x i ) 2 + (y w −y i ) 2 (3.14) ψ des = tan −1 y w −y i x w −x i (3.15) Ignoring any prior current estimates as well as adjustments due to surface drift, we may compute the approximate number of yo-yo maneuvers to get to the goal using equation3.16. Inthisworkweforcetheglidertoignoreanyinternalaveragecurrent estimates which it may observe based on the error between its desired location and the true surfacing location. When performing field experiments, we explicitly specify that the AUV not use internal current compensation computations. N yos = & R v g T !' (3.16) The total time between diving and surfacing during this leg is given by equation 3.17. t total =N yos T (3.17) With the knowledge of the number of yo-maneuvers required to go from the initial location of the glider to that of the next waypoint, we have enough information to be able to integrate the position of the glider through an ocean current field indexed 83 by the intermediate location of the glider (x,t). The values of current for locations are obtained through bilinear interpolation along the zonal and meridional axes (for depth-averaged simulations), or using trilinear interpolation along the zonal, meridional and depth axes (for simulations where we use equation 3.19). Equation 3.18 describes how we can integrate the horizontal trajectory of the glider from the initial location (x 0 ,y 0 ) T as the glider flies for time t total . x y = x 0 y 0 + Z t 0 +t total t 0 cosψ des −sinψ des sinψ des cosψ des v g v sway + u current (x,t) v current (x,t) dt (3.18) For this work, we assume the v sway component is zero (since we consider a very slow-moving AUV), and we attribute any perpendicular velocity to external distur- bances from ocean currents. When performing simulations where the glider depth is required, equation 3.19 may be used to obtain an approximate location for the glider location in the depth column using our simplifying sinusoidal assumption. depth(t) =depth 1 + 2(kdepth 2 −depth 1 k)(1−cos( 2π T t)) (3.19) The kinematic simulator described by equations 3.18 and 3.19, is extensively used throughout this work both in the creation of transition models for the gliders as explained in section 3.4.4, as well as in the evaluation of the planners in simulation. 84 As part of the simulation it is also useful (and in some cases important) to be able to test for collisions with obstacles. There are several methods to do such a test, the simplest of which is a look-up in the risk map for values of risk R(x)≈ 1.0 where a risk value of 1.0 refers to this location being an obstacle with probability 1. Finally we point out that the time spent by the glider at the surface also adds to the overall time of the mission. Once at the surface the glider typically requires to get a GPS fix (on the order of a few minutes), optionally transmit data (between a few minutes to a few hours) and finally to receive a new mission file. This work we assume that the behavior of the glider is identical at each surfacing location i.e. it takes almost the same time at each surfacing location to get a GPS update and to perform data download. In practice we find that if we do not perform data downloads, we require approximately 5 minutes at the surface to get a GPS update, run the planner and transmit a new mission file for the glider to execute. This time is almost instantaneous as compared to that taken by the glider to traverse between waypoints (which is on the order of hours). This allows us to make the simplifying assumption of not including surfacing time during evaluations of the risk on the planner. We have developed deterministic planners which do consider the GPS fix time in addition to data download times, which results in situations where the planner does need to explicitly account for the risk being proportional to the duration of each surfacing. 85 3.4.4 Creation of the planning graph The basic building block for planning in this thesis is the planning graph. Our planning graphG(V,E), consists of an uniform spatial grid of locations forming nodes which can be connected via edges. While there exist several methods to choose locations more intelligently for these nodes (we tried using potential fields to bias nodes toward low-risk locations, as well as randomized graph generation with low-risk nodes), in this work we only consider a uniform spatial grid of nodes for the purpose of evaluation of our planning algorithms. Naturally, the choice of the spatial grid affects the initial locations of the nodes in the graph and can consequently affect subsequent planning. We choose our spatial grid resolution to ensure that (a) edge-lengths are short enough to allow gliders to nominally traverse the edges within the maximum allowed inter-waypoint surfacing time and (b) the state space is manageable for the available computational resources. When performing simulations for generating the transition model, we choose a representative noise for the currents with a standard-deviation σ between 0.1 m/s to 0.001 m/s. The same noise model was used for both the easting and northing current components. The following equations describe how the simulated currents are obtained, where simulated noise is drawn from a Gaussian distribution N (0,σ) yielding simulated current for the easting component u sim (x,y,t) =u pred (x,y,t) +σ u , (3.20) 86 and the simulated current for the northing component v sim (x,y,t) =v pred (x,y,t) +σ v , (3.21) where σ u and σ v are the noise values for each component. We draw new values of σ u and σ v for each simulation (instead of at every simulation step). This is because the true currents differ from those predicted by the noise value we choose for the simulation and they are not expected to change quickly. By performing a large number of simulations with different start times and noise values, we can approximate the distribution of surfacing locations for attempting to do an action a starting from waypoint s and aiming for waypoint s 0 . We realize that Gaussian perturbations may not accurately represent large-scale systematic errors in the predictions and are investigating the use of methods like Gaussian processes to learn improved models from old current predictions [54]. During graph-construction, we cycle through every node and look at all its neigh- boring nodes within a distance d max from it. These nodes form the neighborhood of the node we are considering at this time and we add a directed out-going edge from our node to each of these neighboring nodes and set the edge weight to zero initially. We avoid adding edges to nodes too close to land, or nodes with obstacles between them. 87 These edges in the graph form what we call a transition model providing the prob- ability of transitioning from one node to another. The transition model we use relies upon a discrete grid of states for the AUV in the ocean, with possible actions being the choice of moving from one state to any of its 8-connected neighbors. Our planning graph is a regular grid, where any state or location (x,y) has 8 neighbors providing 8 actions of attempting to go from this state to any of its 8 neighbors. The glider while performing this action will aim for the center of the desired neigh- boring node (but will be affected by the currents and navigational errors, surfacing somewhere else). With sufficient simulations for each of these actions, we estimate a distribution for the transition probabilities for each action. Both the MDP and the MER utilize the same underlying transition probabilities. The grid is built by performing a number of simulations (> 30) for the AUV traversing each pair of grid locations under the influence of ocean currents with different times of start for each simulation to generalize for temporal variations. The distribution of surfacing locations for each set of trials gives us an estimate of the transition function for that action. This transition functionT (s 00 |s,a(s,s 0 )) describes the probability of ending up in states 00 , given we choose to take actiona (from states to states 0 ). The graph generation process described here provides a general way of creating probabilistic transition models. It is worth noting that errors in glider surfacing predictability can occur both due to dead reckoning navigational uncertainty as well as errors in estimation or prediction of the currents. If a relatively good model for the glider 88 is available the navigational errors are usually negligible relative to the errors in ocean current predictions. In this work we assume that the error in the navigation of the glider is primarily due to the errors in the current predictions. Navigational errors can also be similarly incorporated during the creation of the planning graph, although we did not do so in this work. 3.5 Planner Simulations In this section we report a subset of representative simulation results performed on the planners which show the conditions under which we can expect to see improve- ments through the use of planners which use ocean current predictions. The first set of simulations is performed on a relatively large planning graph for the entire SCB region within which we deploy the gliders. This set of simulations focuses on performance of the planners in a realistic scenario with a risk map which is built entirely from AIS data from ships as described in section 3.4.1. Here we analyze the ocean current predictions to look at the temporal autocorrelation of each node in the planning graph, to determine if it is well autocorrelated or poorly autocorre- lated and perform simulations which look at start/goal pairs that are either in good or poorly autocorrelated locations for comparison. The second set of representative simulations use a virtual risk map which is also used for performing field experi- ments. Here we look at the simulation performance of simulated gliders running 89 the Min-Risk, Minimum Expected Risk and the risk aware MDP, with and against the median ocean currents. 3.5.1 Comparison of planners in simulation in the southern California Bight Figure 3.5 shows the ocean current histograms for the first day of the months of January and February 2011. We see that the currents are much faster on January 1, with a significant portion of current values having a magnitude larger than that of the glider’s nominal horizontal velocity. Such a situation can as expected make navigation in regions with fast currents highly risky - because the glider will most likely be unable to fight the currents in those regions. Beside the high-speed and low-speed regimes, it is also interesting to look at paths that may originate or end at nodes which may have poor spatio-temporal autocor- relations. A poorly autocorrelated region (in terms of spatio-temporal variability of the ocean currents in it) usually signifies that the currents within it could vary sig- nificantly - thus making it substantially harder to use any deterministic corrections to improve upon the navigational accuracy of the glider. We propose using ROMS data to generate action models for AUVs for locations in the ocean that exhibit good temporal autocorrelations. Autocorrelation is a commonly used tool in statistics and time series analysis, and is usually defined 90 (for a second-order stationary process X t with mean μ, variance σ, and lag τ) as shown in Equation 3.22. The normalized form of autocorrelation is a number in the range [−1, 1], where 1 indicates perfect correlation, while−1 indicates perfect anti-correlation: R(τ) = E[(X t −μ)(X t+τ −μ)] σ 2 (3.22) ROMs provides forecasts up to two days in advance, which allows us to develop actionmodelsforvehiclesthatutilizetheseoceanpredictions. Wefindthatformost locations in the Southern California Bight during the first six months of 2011, ocean current magnitudes have an autocorrelation coefficient of at least 0.5 at 48 hour lag. In our discretized planning graph, we eliminate locations which exhibit poor auto-correlations by the end of 48 hours. We evaluate two risk-averse path planners for AUVs, which utilize probabilistic action models based upon ocean predictions - (1) a Minimum Expected Risk (MER) planner based upon a continuation of work described in [14], and (2) a Markov Decision Process (MDP) planner. We performed a large number (∼ 8000) of simulations between every pair of nodes in the planning graph as the start and goal locations (with 10 different start times for each pair). A simulation is termed a success if the glider is able to successfully get to the goal within a maximum number of surfacings (50 in this case). If the glider ends up surfacing too close to or within an obstacle, the mission ends up being aborted and we term this simulation as an abort. If the glider neither aborts the mission, nor is able to get to the goal within the maximum number of surfacings 91 then we term this simulation as a time-out. Although we do not want the glider to surface in shipping lanes, a surfacing in these does not count as an abort. It does add to the average risk associated with the execution of that plan. The statistics for the simulations are presented in Figures 3.6 and 3.7. (a) Simulations on Jan 1, 2011 (b) Simulations on Feb 1, 2011 Figure 3.6: Results for simulations performed on well autocorrelated nodes (start and goal locations are well autocorrelated) on January 1, 2011 and February 1, 2011 respectively. January is a month with fast currents which overwhelm the glider in many cases causing it to have very low success rates. Most successful path completions are those which are along the flow of the currents, with very few successful completions in the reverse direction. In January, due to the dangers involved in fighting the currents the strategy chosen by the MDP is usually to go to a safe area and loiter until conditions improve instead of trying to head toward the goal. On the contrary, the MER and Min-Risk planners are very goal-directed and hence attempt to make headway to the goal even though this can result in them being forced toward land by the strong currents. Interestingly enough, there is very little difference in the performance of the planners in terms of success or failures. The MDP is certainly more conservative and consequently times out more often. 92 (a) Simulations on Jan 1, 2011 (b) Simulations on Feb 1, 2011 Figure 3.7: Results for simulations performed on poorly autocorrelated nodes (start and goal locations are poorly autocorrelated) on January 1, 2011 and February 1, 2011respectively. Januaryisamonthwithfastcurrentswhichoverwhelmtheglider in many cases causing it to have very low success rates. The benefits of the MDP in being able to cope with higher variability in the currents, as opposed to the other two planners gives the MDP a definite edge w.r.t. poorly autocorrelated nodes. We see that in the month of January the MDP makes the right decision by loitering instead of being goal-directed thus avoiding a large number of potential aborts. In February this over-conservative behavior earns it fewer crashes, but also prevents the MDP from being comparable to the other two planners in completion rates. While there are ways to make the MDP more goal-directed and less conservative, we set this MDP up to be conservative. Both the MDP and the MER planners outperform the Min-Risk planner which does not take ocean currents into account performs poorly in months with fast currents. The Min-Risk planner is performs even worse when the goal or the glider’s start locations are badly auto-correlated nodes. The average risk values for successful paths is also generally higher than that for the MER and MDP planners. Locations thatarepoorlyautocorrelatedtendtobenearlandandsuchareaswithinconsistent behavior in terms of currents pose the biggest challenge to gliders navigating safely. Moreover, the transition models used in the planning graphs in this work make 93 the simplifying assumption that the currents are stationary. Regions with higher autocorrelations are closer to this assumption and as such, we expect these results to generalize to regions outside the Southern California Bight. The MDP planner in these simulations is clearly conservative - choosing to loiter in asaferegioninsteadofmakingprogresstowardthegoalespeciallyonthedayswhen the currents are significantly faster. Although the planner sacrifices reaching the goal successfully, we find that most of the reasons for land-proximity mission aborts for the other planners were due to the glider being forced to fly against stronger currents which usually push the glider toward land. The behavior exhibited by the MDP in such cases usually involved moving with the currents to a low-risk area nearby and loitering there. The MDP’s loitering behavior in this case is emergent and has not been explicitly encoded by us. We have noticed that it emerges when currentsarefasterthanthespeedoftheglideranditisdesirablebecausetheplanner couldwaituntilthecurrentsareslowerbeforeproceeding(ourfinitehorizonplanner allows this to be done in practice). We realize that this behavior is over-conservative because the success rates on the days when the currents were slower than the velocity of the glider were low. When the currents are relatively slow, and we are navigating between well correlated nodes, the MER planner is a good choice. For poorly correlated nodes and days with fast currents, the MDP is a safer bet and will ensure the safety of the vehicle more reliably. 94 3.5.2 Comparison of MDP, MER and Min-Risk Planners on Virtual map To provide additional comparison, we developed a virtual world (see Figure 3.8), within which we have shipping lanes and a virtual archipelago of islands. This is the same environment which we use in field trials described in Section 4.5. The virtual world we use has dimensions of 53 km x 39 km. First, the environment can be made more challenging with addition of an arbitrarily large number of virtual islands and shipping lanes. Second, we can lower the true risk of operation by selecting a region which does not have real-world shipping lanes in it. Finally, by taking advantage of the increased complexity of the planning world, we can perform more interesting experiments in a smaller area which allows us to run more field experiments than we would otherwise be able to. The major drawback of using virtual islands is that we do not model the effect of these land bodies on the currents. This results in elevated risks of collisions with virtual land-bodies than would happen in reality (since real land-bodies would affect the flow of currents around them). For the purposes of the evaluation of our planning algorithms, we ignore these effects, noting that the predictive ocean model would account for these changes had the virtual worlds been modeled in it. Before we present results from the field, it will be interesting to learn from the behavior of the planners in the virtual world. Simulations for each pair of vertices from these start and goal locations were run (from 6 uniformly distributed vertices 95 in the planning graph). For each (start,goal) pair we perform 10 simulations with 6 hour starting time delays with the first simulation starting at 00:00:00 UTC on Aug 18, 2012. The daily one-day forecast for each simulation was used (without any additive noise during simulation). We analyze the results by breaking up the simulations into 3 regimes depending upon the difference in the goal direction and that of the median currents in the region we are looking at during the time period Aug 18 to Aug 24, 2012. There were 90 simulations each where the direct path to the goal is along (or against) the median current direction. The remaining 120 simulations had current vectors which were closer to perpendicular to the desired goal direction. If the direct path to the goal is in the same general direction as the median current vector, then Figure 3.9 (a) indicates the success/abort/time-out ratios for each planner. Clearly, the Minimum-Expected Risk planner is the best performer here. The MDP (with a goal reward of 10.0) has the highest mission abort rate as well as time-out rate among the three planners. Among the successful paths in this regime, the MDP takes nearly double the amount of time to get to the goal as compared to the Minimum-Expected Risk re-planner and the Minimum Risk re-planner. The Minimum-Expected Risk planner has the best performance across path-lengths, time to get to the goal, and cumulative risk of the path taken. Similarly the minimum risk planner performed well in all these categories, handily beating the MDP in almost all categories. 96 When the direct path to the goal is in opposition to that of the currents, we begin to see the advantages of planning with ocean current predictions. Here, the naive Minimum-Risk planner has the maximum number of aborts (50.8%) as shown in Figure 3.9 (c). Although the MDP has fewer successes than the minimum expected risk planner when going with the currents, it has the fewest crashes when going against the currents. The MDP has a significant number of paths which are longer than those of the MER planner causing it to have longer paths and a longer amount of time to get to the goal. Not surprisingly this behavior results in many time-outs - although the MDP has the least aborts among the 3 planners. Figure 3.9 (b,d) shows average risk, average time and average path length for all successful runs. In general the minimum-risk planner has the lowest average risk especially when going against currents. The low average path length and average time associated with the successful path completions of the minimum-risk planner are intuitive - this planner is only able to reach the goal in 49.2% of the cases and most of these executions involve fairly short paths. The MDP has the highest successratesandisabletohavefairlylowriskseventhoughtheaveragepathlengths are high. Adding path length as a criterion for optimization while useful in practice is not explicitly used in the objective function of this planner. Typically if there exists at more than one feasible path to the goal, the planner picks a path which has fewer surfacings, since the cost of this path will be the sum of the expected risks associated with each surfacing location. This usually translates into the path 97 taking less time to execute. When currents are fast and the glider has to pass through narrow passages (such as in the virtual world), a strategy that appears to work well is to wait at a safe location for calmer currents, and then attempt crossing the narrow passages safely. Among the planners we use, the finite-horizon MDP’s emergent behavior (between switching) exhibits this characteristic. 98 (a) Completion statistics with currents (b) Avg. statistics for successful paths with currents (c) Completion statistics against currents (d) Avg. statistics for successful paths against currents Figure 3.9: Figures (a, c) on the left show the Success/Abort/Timeout percentages for 300 simulations in the virtual map for (start,goal) pairs whose direction to goal from the start is approximately (a) along that of the median current, (c) opposing the median current direction. Figures (b, d) on the right show the average risk, average time and the average path length for all successful trials approximately (b) along the median current direction, (d) against the median current direction. It is clear from the high-rates of mission aborts that the naive Minimum-Risk planner has very poor performance in the presence of currents - it is more likely to fail than to succeed. The average risk is much lower primarily because it was successful only at shorter paths (which accumulate lower risk). When traveling toward the goal in a direction aligned with the currents, the Minimum-Expected Risk planner is certainly the best choice. 99 Overall, the MDP is the most consistent under the more difficult circumstances of traveling against the currents, although there are better planners that may be used when the AUV needs to go in the general direction of the current flow. It is also clear from these simulations that the planners that use current predictions outperform the planner that does not do so (by a factor of up to∼ 2x when the currents are not in an aiding direction to that of desired travel). Simulation of the execution of the Min-Risk planner alongside the two planners which use the ocean current predictions attest to considerable gains which can be made through the use of these predictions. These gains (as may be expected) are much higher when the currents are faster, indicating that it is not only important to factor currents into planning, but also that it is feasible to use predictive ocean models. The results from simulation in this work clearly show the importance of being able to use ocean current predictions when attempting to do risk-aware planning in the ocean. The planning algorithms allow us to choose waypoints for AUVs at each surfacing such that we utilize the ocean current predictions in reducing the expected risk of colliding with an obstacle. The two planners we describe in this work rely on classical planning techniques, such as shortest paths in a graph and Markov decision processes, both of which are more likely to succeed at getting the AUV to the goal than a naive method which does not use the ocean currents during the planning phase. 100 In our analysis of the performance of these algorithms in simulation, we found that when the current velocities are relatively low and that the best planner (in terms of high-success rate and low average risk, average time and average path length) is the Minimum-Expected-Risk planner. The MDP with the reward-setup used in our work was over-conservative (since it had a much higher proportion of time-outs) and is not as goal-directed having a lower success rate. This conservative behavior is useful when the velocities of the currents are high - in such circumstances the MDP outperforms the other planners with the lowest number of mission aborts. We also observed that the MDP had better performance when the start or goal locations started or terminated at nodes in the planning graph located in poorly autocorrelated regions. Using the MDP for start or goal locations in these regions drastically reduced the potential for crashes especially in months where the currents velocities had a large magnitude. 3.6 Field Trials In order to test the validity of our planners in the field, we used two Teledyne Webb ResearchSlocumgliderssuchastheoneshowninFigure3.10. Aregionoftheocean in the Southern California Bight was designated for testing which contained a real land-body (in the form of Catalina island to the south). The real shipping lanes bounded this region to the north and the east (as shown in Figure 3.8) to keep gliders out of the path of real-world shipping traffic during experiments. We also 101 included virtual shipping lanes and islands which only the planners were aware of, to create a challenging test region for comparing simultaneous path executions of the planners on different gliders under the same ocean predictions and dynamical conditions. Figure 3.10: One of the two Slocum gliders used in the field experiments (viewed from just below the water surface) while it is executing a dive. Notice the absence of any thrusters for propulsion - a feature which gives the Slocum glider mission longevity at the expense of speed. The nominal speed attained by a glider is heavily influenced on its ballasting as well as the dive/climb angles used. 3.6.1 Experimental setup We used two Slocum gliders with each assigned a planner output to execute. To ensure that both gliders start the experiment at approximately the same start location and time, we used a holding mission where the gliders moved along a square of sides 200 m which was centered around the start waypoint. This scheme ensured that before the start of the experiment both gliders called in every 15 102 minutes. Before each trial, we download the latest ocean current predictions and pre-computed the probabilistic transition models for the nodes in the planning graph over the next 12 hours. Every time a glider calls in, we use the reported GPS location as the start location of the glider and run the corresponding planner for that glider. The planner outputs a set of waypoints which the glider should traverse, which are converted into the mission argument file format and uploaded to the glider via the satellite link being used for communication. Finally, the glider is commanded to execute the new mission plan. When the glider resurfaces, this procedure is repeated until the glider reaches the goal or the output of the planner is a desired location that exceeds the safety precautions. We term these termination criteria as a successful or an aborted mission respectively. We deployed two Slocum gliders for the purpose of testing our planners at sea. The firstglider(Rusalka)wasdeployedonJuly19, 2012, whilethesecondglider(He-Ha- Pe)wasdeployedonJuly27, 2012, neartheisthmusofTwoHarbor’satSt. Catalina Island. Rusalka was initially used to test out both the planners individually during the first week of deployment. The second glider allowed us to run two different planners, one on each glider with the same environmental conditions and the same start-goal pairs, for comparison. ROMS forecasts become available every morning approximately at 13:00 UTC (which is 6 am PST). The average time between consecutive surfacing locations for the gliders was approximately 3 hours (±2 hours). Computing new transition 103 models (at the time of the experiments) took approximately 1.5 hours. All the transition models for up to 48 hours from the time of arrival of ROMS data are pre-computed. By pre-computing these for validity during a 6 hour time interval at a time (as done in the simulations on the virtual map), it is easy to index into the appropriate planning graph for each planner given the surfacing location of the glider at that time. The glider position was manually provided to the planner by generating a mission file which was semi-automatically uploaded to the glider via the Teledyne Webb- Research Dockserver (a computer acting as a gateway to the gliders). Our program would perform some forward simulations of the output of the planner (assuming noisy current predictions), which allowed us to ascertain that the plan provided to the glider is unlikely to be catastrophic. We perform these simulations with an ensemble of 10 gliders at this step to provide some confidence in the likelihood of successful execution (as shown in Figure 3.11). The operator at this point makes the decision whether to run the new mission uploaded to the glider or whether to abort the mission for that planner. In this chapter, we report the results from two of the longest continuous runs of bothglidersflyingsimultaneouslyeachrunningtheMDPandMERplanners. These experiments were executed between 5:30 am UTC on July 28, 2012 and 8:00 pm UTC on August 1, 2012. These runs are representative of the other shorter paths 104 which we tested during the month-long deployment of the gliders in the Pacific ocean in Southern California. Figure 3.11: An example of a forward simulation showing how the glider is ex- pected to behave over the next few steps of the planner output. The location of the glider when it surfaces is taken as the start point of the simulation, and 10 simu- lations are performed using ROMS data predictions (perturbed by a small amount of noise to simulate variability). In general, if there is high variability in the ocean currents, there is high variability in the simulated paths as well because the glider experiences significantly faster or slower currents which affect its trajectory. Here the currents have very little variability and their magnitudes are much smaller than those of the glider resulting in all 10 hypotheses being fairly consistent with each other. Also note that the currents change with time and the current vectors de- picted in this figure are those during the final portion of the last simulation. In situations where there can be high variability in the currents, we would see a much higher spread in the predicted trajectories. If any of these predicted trajectories ends up running into land, the operator would become aware that there is risk associated with such a path. 105 3.6.2 Experiment 1: Minimum-Expected-Risk and MDP execution along currents MER execution MDP execution Start Goal St. Catalina Island 18 km Figure 3.12: Gliders start in south-eastern portion of the graph (at 5:30 am on Jul 28, 2012) making their way toward the goal waypoint to the west. The currents during this period were generally aiding the motion of both the gliders. Both plannershaddifficultycrossingthefirstsetofshippinglanesneartheeasternvirtual island. Both planners chose to cross the shipping lanes connecting the western virtual island with the larger land body. Both the gliders arrived at the goal by 6:00 pm on Jul 29, 2012. While both planners ended up having a few surfacing locations which were in shipping lanes, the chosen waypoint (as seen in simulation) is stochastically the lower risk choice. There was very little qualitative difference in the type of paths executed by both planners, with both having approximately the same amount of cumulative risk of collisions at surface. 106 The first experiment began at 5:30 am on July 28, 2012 , in the south east, with both gliders choosing paths which went under the eastern island. The MDP had a more cautious start while the MER planner travelled along the direction of the current flow. At each surfacing, we re-planned paths for each glider. The currents were strong enough to bring both the gliders fairly close to the virtual island on the westward journey. The initial plan began on a planning graph which used current predictions from July 27, 2012. ROMS predictions were updated at approximately 6:30 am each day (July 28 and 29) of travel to the goal waypoint. No significant qualitative difference between the paths executed by either the MER planner or the MDP was found, with both producing approximately the same amount of cumula- tive risk. We note that the glider executing the MER plan had some unexpected surfacing due to a hardware error which resulted in it surfacing more often than ex- pected during the crossing of the shipping lanes near the western island. The MDP successfully negotiated the crossing of the shipping lanes (although it surfaced close to the edge of the first of these shipping lanes). It is interesting to note that the planners decided not to go around the western island even though there are no shipping lanes in this area. After analyzing the currents in the region at the time when the gliders were in this area, we found that the strength and direction of the currents at this time made it riskier for the gliders to attempt to traverse this region without risking collisions with the land-mass to the north. Consequently, although there was a chance of surfacing in the shipping 107 lanes, the planners chose to use the safer of the two options. The gliders arrived at the goal by 6 pm on July 29 with the glider executing the MDP reaching the goal slightly earlier than the one executing the MER planner. 3.6.3 Experiment 2: Minimum-Expected-Risk and MDP execution against currents The second experiment began approximately at 22:30 am UTC on July 30, 2012. The gliders were kept in a holding pattern around the goal and missions were uploaded to both the gliders. In this experiment the gliders were traveling in a direction against the median direction of currents. During the experiment the MDP went around the island while the MER followed a route which was similar to that executed in the previous experiment (with the currents). The MDP chose to completely avoid the shipping lanes near the western island. Both planners chose paths going under the central island (avoiding its solitary northern shipping lane). The MDP continued traveling along the northern direction while the MER went south around the eastern island. The glider executing the MDP surfaced very close to the eastern island at this time. Upon closer inspection of the currents at this time, we realized that there was a drastic change in direction between the current predictions from a day earlier to those on the day when the experiment was conducted. The glider did surface at a location away from land and continued on to make it to the goal successfully (as did the MER planner). In this experiment, the 108 MDP had lower cumulative risk, although the near run-in with the virtual island could have resulted in a possible crash. Note that the use of virtual islands can affect the planners in a more detrimental way than otherwise since a real island would affect the flow of currents around it. Because the flow of currents is not affected by virtual islands in reality, there is a highpotentialforcurrentsintheseregionstoaffectthegliderduringtheexperiment. A glider trying to traverse such a region could risk getting swept into the island as compared to the same situation with an island actually present. That being said however, our aim in this chapter has been to evaluate planners for AUVs which use ocean model predictions in probabilistically minimizing the risk of collisions with ships and/or land. The advantages of being able to generate a more complex world for this evaluation (without adding to the real-world risk of such an experiment) was valuable enough, in our opinion, to choose to perform these experiments this way despite the drawbacks of the virtual land bodies. Our field trials helped us develop a framework for the use of stochastic planners which we used to perform several runs of our planners in the SCB. These trials took place over several days in relatively fast ocean currents, which made the execution of plans by the gliders more challenging. Despite the fact that the virtual map is riskier than the real map (particularly with fast currents), we did not see any instances of the gliders colliding with the virtual islands. 109 3.7 Conclusion In this chapter, we introduce two planning algorithms designed to use the un- certainty in ocean current predictions. We approach the problem of risk-aware planning under the assumption that while ocean current predictions are not ac- curate, we can minimize the risk in expectation. Results from a large number of simulations show considerable gains from the use of ocean current predictions over a naive planner which minimizes risk without considering the effect of currents. Even though the planners we use in this work are stochastically optimal within a finite time horizon, we found a significant reduction in the number of outcomes where missions had be aborted due to a high likelihood of crashes. This is an encouraging result indicating that the use of ocean currents is beneficial when planning paths in strong currents, poorly auto-correlated regions as well as when the glider has to fly against the general direction of stronger currents to get to the goal. Both of the proposed planners are more likely to avoid crashes when navigating the glider through challenging regions in strong currents than a more naive planner which ignores them. We field tested the planners described in this chapter in the ocean using a virtual map to make the planning scenarios more challenging. We successfully ran both planners using the predictions of the currents to generate stochastic models for our planners, demonstrating the practicality of our approach. We developed a framework for probabilistic planning which allows ocean current predictions to be 110 usedinreal-worlddeploymentsofgliders(withoutanymodificationtotheirinternal hardware or software). Encouraged by the absence of scenarios where we had to manually over-ride the planner outputs, we plan to add more autonomy to the decision to run the planner on the vehicle. We believe that a system adding such decision making can improve the safe operations of AUVs in near-coastal areas, allowing the gliders to operate much closer to shore than is usually possible around the clock with minimal human oversight. The stochastic planners investigated in this chapter have their respective uses - the Minimum Expected Risk planner is ideal for situations where the currents are not very strong and when the start or goal locations are not in regions with high variability in current predictions. The MDP on the other hand (being conservative by nature) is better able to handle high-risk conditions such as faster currents, moving against currents and planning paths in regions with high variability (poor autocorrelation). We found that the MER planner was more goal directed, but at times the MDP’s ability to loiter in a safe area until conditions improve is particularly useful in reducing the real risk of path execution. As such we are looking at methods such as reward-shaping to make the MDP more goal-directed. In the MDPs we have discussed in this work, we have assumed that we have an idea for how much prediction noise we have in the system. 111 We are looking at how we might be able to estimate the prediction noise for each edge in the planning graph, making our planners use a more accurate planning graph [54],. We believe that these improvements in prediction noise estimation coupled with a robust planner which is capable of deciding when to loiter in a low-risk area and when it ought to make progress toward the goal will help make reliable and safe operation of AUVs in coastal regions completely achievable in the nearfuture. Ultimatelyplannerslikethosedescribedinthischaptermoveustoward persistent coastal monitoring of ocean processes. 112 Glider Surfacing Locations Unofficial Shipping Lanes 20 km Figure 3.1: View of the region monitored by the Center for Integrated Networked AquaticPlatformSystems(CINAPS)coastalobservatoryintheSouthernCalifornia Bight with the Los Angeles and Orange Counties to the north and east respectively, and St. Catalina Island to the south. The ports of Los Angeles and Long beach are among the busiest ports in the world. The solid yellow lines are the official shipping lanes for inbound and outbound shipping traffic. The gray traces are traces of ships which traversed this region during 2010. Also visible are the surfacing locations of two gliders in this region during 2009 and 2010. Notice that several surfacings fall in the shipping lanes, where gliders could be at high collision risk. 113 Dead Reckoned Path True path Glider dead-reckons that it is at Wpt 2 Glider GPS fix indicates it is not at Wpt 2 7.5 km Error in dead-reckoning due to currents causes surfaces in shipping lanes Figure 3.2: The figure above shows an overlay of the estimated path and dead- reckoned path from a field experiment conducted between July 16-18 2011, where the glider was making its way from the start, (waypoint 1) to the goal (waypoint 5). Also shown is an overlay of a risk map which shows actual tracks followed by ships between January and April 2010. The glider aimed for the locations chosen by the Minimum-risk planner (which ignores currents) but in every case missed waypoints by a large margin due to the strong currents on those days. Errors in dead-reckoning due to currents resulted in three glider surfacings within shipping lanes. 114 (a) Histogram of current magnitudes on Jan 1(b) Histogram of current magnitudes on Feb 1 Figure 3.5: Histograms of current magnitudes for (a) January 1, 2011 and (b) February 1, 2011. Here we histogram the depth averaged current magnitudes at each grid point between into 50 bins between 0 and 1m/s. January 1 is a day with a significant number of current magnitudes are larger than the nominal velocity of the glider. Note that the mean current magnitude in January is almost the same as the nominal speed of the glider. Current velocities of up to 0.8 m/s exist in some parts of the region which corresponds to a speed that is more than 3 times the glider speed. This makes January a very perilous month since the glider is at the mercy of strong ocean currents in much of the planning region. On a day like February 1 on the contrary, most of the currents are slower than the velocity of the glider. The results in section 3.5 (Figures 3.6 and 3.7) also suggest that this is a very important consideration. 115 Western Island Eastern Island Central Island St. Catalina Island Shipping Lane (higher risk) Figure 3.8: Virtual test environment in the Southern Californian region. The outer rectangle has dimensions 53 km x 39 km. The farthest nodes along a straight line east-west axis are separated by 45 km, while the farthest nodes along the north-south axis are separated by a distance of 15.6 km. Also visible are triangles indicatingthenodesinourplanninggraph. Thedarkerpatcheswithintherectangle form an archipelago of virtual islands, which create a challenging planning scenario with narrow corridors. We have also created virtual shipping-lanes (lighter gray lines connecting islands to larger land bodies) similar to those in the real world map. 116 MER execution MDP execution Start Goal 18 km St. Catalina Island Figure 3.13: Gliders start in western portion of the graph (at 23:30 am on Jul 30, 2012) making their way toward the goal waypoint to the east. The currents during this period were generally against the motion of both the gliders. Both the gliders arrived at the goal by 8:00 pm on August 1, 2012. In this run, the MER goes directly east toward the goal, while the MDP starts off going west after which it goes around the western virtual island from the north, while the MER planner goes around it from the south. The length of both paths is almost the same. The MDP came dangerously close to the eastern virtual island, but was still approximately 500 m away. 117 Chapter 4 Learning uncertainty in ocean models to aid planning 4.1 Introduction Buoyancy-driven Autonomous Underwater Vehicles (AUVs) like Slocum gliders [see Figure 4.1 (a)] with proven endurance of several weeks are a natural choice for per- sistently monitoring of the ocean. Such slow-moving AUVs operating in coastal regions, are often at risk of colliding with navigational hazards such as ships [13], or of being swept away toward land by strong currents. Deterministic risk-aware planning techniques ignoring ocean currents [14], perform poorly in practice due to significant errors in navigation while the glider dead-reckons underwater. Cor- rections for averaged current fields experienced by gliders over long dives rarely help improve navigational accuracy in future dives, due to significant variability in currents both spatially and temporally. 118 (a) Slocum glider at surface near Catalina is- land Western Island Eastern Island Central Island St. Catalina Island Shipping Lane (higher risk) (b) Virtual map used to field test planners Figure 4.1: (a) Slocum gliders are AUVs which are valuable assets in coastal obser- vatories. When at the surface, an inflatable air bladder helps raise the antennae in the tail approximately 30 cm above the water surface allowing the robot to com- municate via Iridium or radio modem and receive GPS updates. A glider at the surface is at risk of collision with boats and ships due to its small profile. (b) A virtual risk map built for the Southern California coastal ocean that shows the region we use in this thesis to evaluate our planners. This virtual map is intention- ally designed to be more challenging than typical AUV coastal missions, while also having some safeguards built in to prevent accidental collisions due to operating the glider too close to land or regions with high shipping and boating traffic. Notice how we pretend that the shipping lanes from Figure 4.2 (b) are land bodies to the north and east of the virtual map. Three (virtual) islands and artificial shipping lanes are added to make the environment more challenging. We also pretend that the boundary of Santa Catalina island (off whose coast the test region is located) extends further out to prevent the glider from erroneously coming too close to land during field tests. Measurement of currents on gliders has been attempted, but the required power considerably restricts possible applications [46]. In the absence of current mea- surements, predictive ocean current models are used to compute paths of gliders through current fields [18,26]. Predictions of ocean currents utilize forward simu- lations of prior data, and they incorporate large-scale measurements from satellite 119 and HF-radar. We focus on the Regional Ocean Monitoring Systems (ROMS) pre- dictions of ocean currents [1]. An example is shown in Figure 4.2a for the Southern California Bight region. These predictions are provided daily and give forecasts up to 48 hours in advance. Such predictions are inherently uncertain [55], and this uncertainty affects the navigation accuracy of gliders in the ocean [56]. In prior work, we proposed probabilistic algorithms capable of planning risk-aware paths by using predictive ocean models [57]. These techniques require to create station- ary transition models through stochastic simulations of a glider model in a field of predicted ocean currents. The goal of the planners is to minimize the expected risk of surfacing in regions where the glider might collide with ships and other naviga- tional hazards, given noisy predictions of the currents. A limitation of prior work is the lack of a principled way to estimate the noise in these predictions (when not provided by the model). We address this shortcoming of our prior work in the current chapter. 120 (a) Ocean currents predictions for Southern California (b) Risk map for Southern California Figure 4.2: (a) Ocean currents in the Southern California Bight region predicted by Regional Ocean Modeling Systems (ROMS) [1]. We propose data-driven methods to develop confidence estimates on these predictions, and we integrate the resulting estimatesintoaprobabilisticplannertoimprovethesafetyofautonomousunderwa- ter vehicle operation. (b) A collision risk map built using automatic identification system (AIS) data for shipping traffic (indicated by the warmer lines), and thresh- olding on bathymetry data for identifying land (red color) and navigation hazards indicated by warmer colors. We threshold the occupancy of pixels (of resolution 250 x 250 m) at the maximum value of occupancy in the regions with shipping traffic. The traffic going between the mainland and the island to the south consists mainly of diurnal tourist traffic to and from St. Catalina island. The shipping traffic along the primary shipping lanes near the mainland which enter/leave the ports of Los Angeles and Long beach do not show much variation between day and night. Slocum gliders attempt to learn the average current experienced during a dive to correct for the expected deviation in their path over the next segment of their dive. This is reasonable when the glider is executing segments short enough for the currents not to vary considerably between consecutive dives. We use a similar idea in the policy switching risk-aware MDP where the deviation from the surfacing location predicted using ocean models is used to select a policy that is designed 121 for such deviations. These policies assume a fixed amount of noise in the currents throughout the region, which is typically not the case since some parts of the region gliders operate in might be modeled more accurately than others. The GP-MDP planners employ Gaussian Processes to additionally provide confidence measures aboutthetime-seriescurrentpredictionsfromtheRegionalOceanModelingSystem (ROMS), first described in a conference paper [54]. This data-driven approach improves the estimation of the amount of uncertainty we expect in different parts of the operational region, which can be used to improve the planner performance. The key contributions of this paper are (1) the design and evaluation in simulation of three risk-aware probabilistic planners, which build upon the Markov decision process (MDP) framework – namely a switching MDP, a stationary GP-MDP and a non-stationary GP-MDP, (2) field trials and simulated comparisons of these plan- ners using different amounts of ocean current knowledge. 4.2 Risk-aware planning using MDPs and predictive ocean models A Markov Decision Process is a concept described by a tuple (S,A,P,T,H,R), where S is the set of all possible states of the system (s 1 ,s 2 ,...), A is the finite set of all actions (a 1 ,a 2 ,...) an agent can take from each state [58]. The transition function P : S×A×S×D → [0, 1], is a mapping specifying the probability 122 P (s k |a h ,s j ,t i ) of arriving at state s k given that the action a h was executed at time t i when the agent was in state s j . T is a set of decision epochs or discrete time steps at which actions need to be taken (0, 1, 2,...T max ). The reward function R :S×A×S×T, provides a finite reward valueR(s k |a h ,s j ,t i ) obtained when the system goes from states j to states k due to the execution of actiona h at time step t i . MDPs makes the first-order Markov assumption where mappings P andR depend only upon the state the system is currently in (and not its history). An MDP is well suited for a glider path planning problem because the state of the system is known (through GPS updates) when the glider is at the surface, but is highly uncertain when the glider is dead-reckoning underwater. We provide rewards for the glider to get to the goal and penalize it proportional to the risk of the expected out comes of performing actions from each state. States are said to have a linear additive utility function in expectation which is typically the expected sum of all the rewards (possibly discounted) which can be collected by the agent during the epoch being considered. This utility U(R t ,R t+1 ,...) =E[ P |Tmax| t 0 =t γ t 0 −t R t 0]. A policy π is a set of actions that the agent executes from being in each state. An optimal policy π ∗ is one whose value function U ∗ (s) (called the optimal value function), dominates the value function for all other policies over all histories. A finite horizon MDP is one where the maximum number of time steps T max is finite, and in such a case γ is typically assumed to be 1. An infinite horizon MDP 123 is one whereT max is∞, and in this caseγ (the discount factor) is usually chosen to be some value between 0 and 1. A discount factor less than 1 makes an agent prefer rewards which can be obtained earlier. For these expected linear additive utilities, a policy is as good as the expected discounted rewards it will yield. Setting γ = 1 makes the agent indifferent to when it will receive rewards. ROMS Predictions (Historical + for the day) Variance Estimates Pre-select noise model, or use Gaussian Process regression to estimate noise Risk Map Create Transition Models Solve MDP Transition Models Policy Figure 4.3: We begin with historical ROMs data and the current predictions for the day we are interested in. We use the methods described in to estimate the interpolation variance associated with the current predictions for the day indexed by location and time. These variance estimates are used to create transition models whichcanbestationary(applicableover6, 12or24hourperiods), ornon-stationary (models varying with time i.e. every hour). The transition models are used to solve the MDP using classical methods like value iteration. The risk map is used to provide costs for the reward function of the MDP. Once solved, we obtain a policy which provides the action the agent must execute to get the maximum utility given the state the agent is currently in. (b) 124 In [57], we introduced the risk-aware MDP and described how it can be used to plan paths through noisy ocean currents through the use of predictive ocean mod- els and risk maps constructed using AIS data. Figure 4.1 (b) shows the planning environment which we use in this work both in evaluation of the planners (in simu- lation) as well as to conduct field experiments. This figure also shows how we have discretized the map using a regular grid (resolution approximately 2.7 km) to ob- tain statess used throughout this chapter. The transition modelP relies upon this discrete grid of states, with possible actions being the choice of moving from one state to any of its 8-connected neighbors. This transition function P (s 00 |s,a(s,s 0 )) describes the probability of ending up in state s 00 , given we choose to take action a (state s to state s 0 ). In order to estimate the transition model P, a large number of simulations (> 30) using a standard kinematic model [see [52], [53], [57]] are performed, between every pair of states. For the work reported in this chapter we use depth-averaged currents for all simulations. The currents used for each simulation are perturbed with additive Gaussian noise as shown in equations 4.1 and 4.2. For the switching risk-awareMDP,thisnoisedistributionisconstantthroughtheentiremap, whilefor the GP-MDP the noise variance is based upon the estimated interpolation variance. The distribution of surfacing locations for each set of trials gives us an estimate of the transition function for that action. This allows us to estimate the transition 125 function P (s 00 |s,a(s,s 0 )) describing the probability of ending up in state s 00 , given we choose to perform action a (attempting to go from state s to state s 0 ). When performing simulations for generating the transition model of the switching risk-aware MDP, we choose a representative noise for the currents with a standard- deviation between 0.1 m/s to 0.001 m/s. The same noise model is used for both the easting and northing current components. The following equations describe how the simulated currents are obtained, where simulated noise is drawn from a Gaussian distribution N (0,σ 2 ): u sim (x,y,t) =u pred (x,y,t) +σ u (4.1) v sim (x,y,t) =v pred (x,y,t) +σ v (4.2) For the switching risk-aware MDPs, we store transition models indexed by the noise used to generate them asP (σ). Now we turn our attention to generating the reward function R. The reward function is a mapping from an action a taken at a state s. The expected risk of performing this action, which is given by the average of the risk at the final surfacing location for each simulation. This risk is obtained from risk maps such as the one shown in figure 4.2. In equation 4.3,R(s i ,a k ) is the reward for performing action a k from state s i , f j is the final surfacing location of 126 the jth simulation involving glider movement from location s i under action a k . N is the total number of simulations. R(s i ,a k ) =− P N j=1 risk(f j |s i ,a k ) N (4.3) There are several well known algorithms which can be used to solve MDPs [see [58] for an extensive list]. Goal locations carry a one-time positive reward (set to 10.0 in this work) and are terminal states. We employ the value iteration algorithm described in [59] to solve our MDPs. In the case of the stationary (switching risk- aware as well as the Gaussian Process MDPs), this involves solving the Bellman backups corresponding to equation 4.4. Here V ∗ (s) is the maximum expected dis- counted utility which can be collected by the agent from state s with a discount factor γ and it is called a Bellman backup at state s. V ∗ (s) =max a∈A X s 0 ∈S P (s 0 |a,s)[R(s 0 |a,s) +γV ∗ (s 0 )] (4.4) Value iteration consists of performing sweeps of Bellman backups at each state until there is no difference between the new V ∗ (s) and V ∗ (s 0 ) for all states s and s 0 during each sweep. In this work we use a value of γ = 1, although in general 127 one might choose a discounting factor< 1 to ensure that value iteration converges. Once value iteration has converged, the optimal policy π ∗ for each state s is π ∗ (s) =argmax a∈A X s 0 ∈S P (s 0 |a,s)[R(s 0 |a,s) +γV ∗ (s 0 )] . (4.5) The policy π ∗ (s) informs the agent/glider of the action a it should take which will give it the maximum utility (which in most cases takes it to the goal so as to collect the one-time positive reward). Notice that in this formulation we do not index the transition models, rewards, optimal value function and optimal policy by the decision epoch. This is because we create these models under the assumption of stationarity over time. The switching risk-aware MDP uses noise values which are chosen from a list of user-defined scale. In this work we have chosen these values to fall in the set σ = [0.01, 0.03, 0.1, 0.3], where σ is the standard deviation of additive noise used to create the transition model. We solve each MDP to get the optimal policy corresponding to a given amount of noise in the map. This indexed set of policies will be used when switching policies based upon the noise during execution of the plan by the glider. A drawback of this approach is that the currents have a uniform amount of noise throughout the planning graph. This is a limiting assumption which we overcome by using the Gaussian Process framework to estimate noise values using historical current data, which is described in section 4.3. Figure 4.3 depicts the general procedure used to produce policies from ocean current predictions and the risk map. 128 4.3 Methods and Algorithms The goal is to provide a principled estimate of uncertainty for predictions of large- scaleoceanprocesses. Weproposeaugmentingavailablepredictionsthroughtheuse of Gaussian Process learning methods. In this work, we focus on the prediction of ocean surface currents, due to the availability of accurate HF-Radar measurements for these values. The ocean currents at a given latitudelat, longitudelon, and timet are represented by a vectorc(lat,lon,t) = (u,v), whereu andv are the components of the currents along the latitude and longitude axes respectively. The positions are discretized based on the latitude and longitude, and time is discretized into hours. At a given time T, we are given historical data for times t ={T− 1,T− 2,...} back several months or years. We are also given predictions from the ROMS data for t ={T +1,T +2,...,T +48} (two days in the future). Given this data, we want to provide better predictions for the points of time in the future as well as confidence bounds for these predictions. 4.3.1 Gaussian Process regression We propose modeling ocean currents using non-parametric Bayesian regression in the form of Gaussian Processes (GPs) [60]. A GP models a noisy process z i = f(x i ) +ε, where z i ∈R, x i ∈R d , and ε is Gaussian noise. 129 We are given some data of the form D = [(x 1 ,z 1 ), (x 2 ,z 2 ),..., (x n ,z n )], where x i represents a vector of latitude, longitude, and time values for a data pointi, andz i represents either the u or v component of the currents at that point and time. We note that this formulation decouples the prediction ofu andv, an assumption that could be relaxed in future work (e.g., using the techniques in [61]). We refer to the d×n matrix of x i vectors as X and the vector of z i values as z. To fully define a GP, we must choose a covariance function that relates the points inX to each other. We expect that data points that are close to each other in time and space will have high correlations, which provides a smoothing effect on the data. In addition, there is a periodic effect that arises due to tidal processes, which creates a correlation between points that are separated by 12 hours, 24 hours, etc. To capture both the spatial correlations and the periodic correlations, we apply the following kernel based on the squared exponential: k(x i ,x j ) =σ 2 f exp[−w lat (x lat i −x lat j ) 2 (4.6) −w lon (x lon i −x lon j ) 2 −w t (x t i −x t j ) 2 −w p sin 2 (π(x t i −x t j )/12)]. 130 The hyperparameterσ 2 f represents the process noise, and the hyperparametersw lat , w lon ,w t , andw p represent weighting for the latitude, longitude, time, and periodic correlations. Having defined the kernel, combining the covariance values for all points into an n×n matrix K and adding a Gaussian observation noise hyperpa- rameter σ 2 n yields K z =K +σ 2 n I. The following equation predicts the mean function value (u or v value) μ(x ∗ ) and varianceV gp (x ∗ ) at a selected point x ∗ given the historical and prediction training data: μ(x ∗ ) =k T ∗ (K +σ 2 n I) −1 z, (4.7) V gp (x ∗ ) =k(x ∗ ,x ∗ )−k T ∗ (K +σ 2 n I) −1 k ∗ , (4.8) where k ∗ is the covariance vector between the selected point x ∗ and the training inputsX. Thismodelgivesameanandvarianceforaparticularlatitude, longitude, and future time. An important aspect of this model is the derivation of a variance, which provides a measure of confidence of each prediction based on the correlations of the points used to predict it. The Gaussian Process variance described above gives an estimate of the uncertainty of a prediction based on the data sparsity around that point and the estimated hy- perparameters. However, this estimate does not take into account variability of the ocean currents around that point, which is an important predictor of measurement 131 confidence. To improve on the GP variance uncertainty predictions, we propose using a method based on the interpolation variance [62–64]. Given that a GP has already been learned, the interpolation variance can be estimated as: V iv (x ∗ ) =k T ∗ (K +σ 2 n I) −1 (z−μ(x ∗ )) T (z−μ(x ∗ )). (4.9) The interpolation variance incorporates correlations between the data learned using theGPframeworkaswellasvariabilityofcorrelateddatatodeterminethepredicted variance. This estimate of variance provides a richer representation that accounts for both data sparsity and data variability. 4.3.2 Local approximation and block learning The modeling technique proposed above can be used for small datasets. However, the ROMS datasets correspond to vast periods of time and a large portion of space. The computation time required by the GP scales O(n 3 ) in the number of data points, which makes it infeasible for any large dataset more than approximately 2000 points. As an alternative, we propose estimating the predictions using a subset of the data corresponding to the points we expect to be most correlated. To choose these points, we store the data in a KD-tree, where the relative weighting of space and time are adjusted to fit the values in the kernel (e.g., the periodic correlations are preserved). Such techniques have been applied successfully for 132 terrain modeling [65], though we believe their application to time-series data is novel. The estimation of the kernel hyperparameters θ = (σ f ,σ n ,w lat ,w lon ,w t ,w p ) in large-scale Gaussian Processes is also a challenging task. The standard method is maximizing the likelihood of the measurements given the data and the hyperpa- rameters [60]: logp(z|X,θ) =− 1 2 z T K −1 z z− 1 2 log|K z |− n 2 log 2π, (4.10) where K z =K +σ 2 n I. This likelihood can be maximized using conjugate gradient optimization for small and moderately sized datasets. For the large ROMS dataset, weuseablocklearningapproachthatlearnshyperparametersforsubsetsofthedata and then averages across many of these subsets to yield the final hyperparameter values. We use this method to learn separate hyperparameters for each day. 4.3.3 Comparison of uncertainty predictions The data processing was performed on a single desktop with a 3.2 GHz Intel i7 pro- cessor with 9 GB of RAM. The kernel hyperparameters were learned independently for each day by dividing the training data from each day into 500 measurement blocks and then estimating the hyperparameters for each block using conjugate gradient optimization. The final hyperparameter values for a day were then found 133 by averaging over all blocks. Using this block learning approximation and the KD- tree inference approximation, the GP took approximately 6 minutes to compute all predictions for a single day. These results use 100 points in the space/time KD-tree for the local GP prediction at each point. There are a total of 2560 locations that are estimated over a 24 hours period for predictions up to two days ahead (total of 123,880 point predictions per day). The proposed method is highly parallelizable, since the hyperparameter estimation of each block is independent, and the predic- tion for each point can be performed independently once the hyperparameters are learned and the KD-tree is built. Our methods provide confidence measures based on the underlying variance within theGP.Intheformulationabove, theGaussianProcessvariancepredictionprovides an estimate of error due to data sparsity and measurement noise. Since the ROMS data are available at high resolution across time and space, this value provides a limited measure of confidence that is fairly homogeneous. The alternative method based on the interpolation variance incorporates variability in the surrounding cur- rents as a component of the uncertainty. Figure 4.4 shows example variance maps for a day in August, 2012 as well as the true prediction error (compared to the next day’s nowcast, which would not have been available during planning). The results demonstrate that the interpolation variance provides a much richer measure of uncertainty that qualitatively matches with the true prediction error. 134 (a) GP Std. Dev. (E/W) (b) Interpolation Std. Dev. (E/W) (c) True Prediction Error (E/W) (d) GP Std. Dev. (N/S) (e) Interpolation Std. Dev. (N/S) (f) True Prediction Error (N/S) Figure 4.4: Comparison of the Gaussian Process variance and the interpolation variance measures of uncertainty for the prediction of ocean currents in the South- ern California Bight on August 7, 2012. The interpolation variance accounts for variability in the ocean currents when making its prediction of uncertainty, and it shows a stronger correlation with the true prediction error. We also compare the correlation coefficients (R-values) between the interpolation variance and the Gaussian Process variance correlated with the true prediction error. If the variance is an accurate representation of uncertainty, we would expect to see a positive correlation between the variance and the true error. Table 4.1 shows the R-values for three months in 2012. The interpolation variance shows a positive correlation with the true prediction error, while the Gaussian Process variance shows essentially no correlation. 1 1 TheslightnegativecorrelationfortheGPvarianceislikelyduetothehighpredictedvariances intheborderingedgesofthedataset, whichdonotcorrespondtoincreasederrorsintheprediction. 135 Table 4.1: Correlation coefficients (R-values) for Gaussian Process variance and interpolation variance relative to true prediction error (shown separately for N/S component and E/W components of the ocean current vectors) Month (2012) June July August GP variance R-value (E/W) -0.0519 -0.0178 -0.0277 Interp. variance R-value (E/W) 0.1383 0.1260 0.1655 GP variance R-value (N/S) -0.0652 -0.0245 -0.0271 Interp. variance R-value (N/S) 0.1745 0.1400 -0.1323 4.3.4 Policy switching risk-aware MDP using stationary transition models In the policy switching MDP we pre-compute transition models assuming a fixed amount of noise in the ocean currents with the understanding that we will switch policies during the execution of plans depending upon the deviation from predicted location observed by the AUV at that surfacing. The noise values (in terms of standard-deviation) when creating transition models are 0.001 m/s, 0.003 m/s, 0.01 m/s, 0.03 m/s, 0.1 m/s and 0.3 m/s. These correspond to increasing predic- tion noise in the current predictions. Using our knowledge of the nominal speed of the AUV and the distance between the start and the goal locations, we can de- termine thresholds at which a policy corresponding to higher noise in the currents must be employed when the AUV surfaces further away from the desired location. Conversely when the AUV surfaces closer to the predicted location, we switch to using a policy corresponding to lower noise in the current predictions. This allows the switching MDP to adapt to observations indicating noise in the currents. A 136 Algorithm 4.1 Policy switching risk-aware MDP 1: Input: previous surfacing location (a x ,a y ), desired surfacing location (d x ,d y ), observed/actualsurfacing location (b x ,b y ), set ofK noise valuesσ 2 u added tothe currents to create K transition models (symmetrical for easting and northing current components) and corresponding pre-computed MDP policies Π(σ 2 u ) 2: simulate predicted surfacing location by executing run without additive noise to get (p x ,p y ) when going from (a x ,a y ) to (d x ,d y ) 3: for all σ 2 u do 4: simulate N runs from (a x ,a y ) to (d x ,d y ) in currents with noise σ 2 u (i) to get a set of simulated surfacing locations (c x ,c y )(u) indexed by the noise used u 5: end for 6: compute variance of simulated surfacing locationsσ 2 x (u),σ 2 y (u) for each value of noise in currents 7: find lowest σ 2 u (i) where (b x −p x ) 2 + (b y −p y ) 2 ≤max{σ 2 x (σ 2 u (i)),σ 2 y (σ 2 u (i))}. 8: Output: Π(σ 2 u (i)) as the best policy to use for the next segment. disadvantage of this technique is that when the AUV is traversing long distances during path execution, the noise in the current predictions out into the future in a different region could be different. This planner could stand to benefit from im- proved measures of the uncertainty of currents at locations apriori to execution of the plan. 4.3.5 Risk-aware Gaussian Process MDP using stationary transition models The methods described above can be incorporated into path planners for AUVs to improve the safety and reliability of operation. Path planners that incorporate data from ocean currents clearly stand to benefit from improvements in the accuracy of ocean current predictions. In addition, confidence estimates are useful both for 137 planners that reason probabilistically as well as for planners that reason about worst-case instances. In particular, we are interested in improving the operation of AUVs by avoiding areas with high probability of encountering a dangerous ship. We now examine how the proposed GP prediction methods could be useful for improving the paths generated by such planners. In the GP-MDP we use the maximum interpolation variance among the states s and s 0 of each transition edge, e(s,s 0 ), over every time interval t∈ [t 1 ,t 2 ] to generate the transition model for the MDP. To compute the maximum variance for the generation of the transition model for a particular transition edge, we use Equations 4.11 and 4.12. Each transition model is then computed by conducting (> 30) simulations of transitions between each pair of states where the currents used for the simulation are drawn from a Gaussian N (0,σ), where the variance standard deviationσ isσ u andσ v for the easting and northing current components using Equations 4.1 and 4.2 respectively: σ 2 u = max t∈[t 1 ,t 2 ] n σ 2 u (s,t),σ 2 u (s 0 ,t) o , (4.11) σ 2 v = max t∈[t 1 ,t 2 ] n σ 2 v (s,t),σ 2 v (s 0 ,t) o . (4.12) 138 With improved confidence measures describing the amount of noise in the ocean current predictions, the transition models used by the GP-MDP are more repre- sentative of the true errors than the models using a constant prediction noise value over the entire planning graph. Thus, we would expect the GP-MDP to provide improvements in the execution speed, reliability, and safety of the resulting plan. 4.3.6 Risk-aware Gaussian Process MDP using non-stationary transition models Stationary models by their very nature are useful when there is low variability in the ocean currents. When planning paths through regions with high variability in currents, a natural option is to use non-stationary transition models during planning. To do this, we make use of the interpolation variance estimates learnt usingGaussianProcessregressiononthepredictions, andusethesemodelsinanon- stationary MDP called the risk-aware Non-Stationary GP-MDP (NS-GP-MDP). ThetransitionmodelP inthisplannerusesboththediscretegridofstatesandtime, with possible actions defined as moving from one state to any of its 8-connected neighbors at some time. We perform a large number of simulations for every action performed for each state s∈ S at time t, where S is the set of all states in our planningregion. ThisallowsustoestimatethetransitionfunctionP (s 00 |s,a(s,s 0 ),t) describing probability of surfacing in state s 00 when performing that action a(s,t). 139 The reward function is still computed using equation 4.3. The optimal value func- tion is given by V ∗ (s,t) =max a∈A X s 0 ∈S P (s,a,s 0 ,t)[R(s,a,s 0 ,t) +γV ∗ (s 0 ,t + 1)] , (4.13) where V ∗ (s,t) is the maximum expected utility the agent can collect at state s and time t. The optimal policy π ∗ corresponding to this optimal value function is deterministic Markovian and is given by π ∗ (s,t) =argmax a∈A X s 0 ∈S P (s,a,s 0 ,t)[R(s,a,s 0 ,t) +γV ∗ (s 0 ,t + 1)] . (4.14) The risk-aware GP-MDP using non-stationary MDPs has transition models only until the maximum lookahead time from the current predictions. Due to this, when the start and goal are separated far enough, the GP-MDP can be susceptible to computing an optimal policy which makes no progress toward the goal, but in fact chooses to hover in a low-risk region in the vicinity of this node.This is reasonable because the agent prefers to minimize the risk of colliding with obstacles it may encounter while making toward the goal if it is aware that it cannot make it to the goal (where it may get a high goal reward). We observe in section 4.4.2 (both in simulation as well as field tests), that this results in policies which timeout instead of taking the glider to the goal. 140 4.4 Planner Simulations ROMS Predictions Glider Model and Obstacle avoidance Start Risk Map Output: Surfacing Location(lat, lon,t) Initial Location (lat, lon,t) Pre-computed Policy Get Next Policy (lat,lon,t) Is at Goal? Stop N Y Figure 4.5: Basic flowchart explaining how simulations of the glider are performed for both the stationary and non-stationary planners. The pre-computed policy for the non-stationary MDP is sensitive to the time, while the stationary MDP policy chooses a finite-horizon policy which corresponds to the 6, 12 or 24 hour period for which the stationary policy has been created. In the simulations reported in this chapter we use policies which are valid for 12 hour periods. When the glider is at sea during field experiments the same procedure holds; we use the latest pre- computed policy and lookup the best action based on the state consisting of the latitude, longitude (and in the case of non-stationary planners, time). This action is translated to a glider mission file which is uploaded to and executed by the glider. We evaluate our planners in simulation using the virtual map shown in Figure 4.1 (b). A disadvantage of using virtual land bodies (eg. the three islands) is 141 that ocean currents may not behave exactly as modeled 2 , especially when the true currents flow through an island (a situation which is unlikely with a real island). Since the planners described in this work do not directly reason about the current models, statistics built during the transition model generation phase should provide evidence advising the planner against selecting actions taking the glider into an island. Simulations are performed by computing policies for one of each of these 6 goal locations: (33.4498 ◦ ,−118.3593 ◦ ), (33.5430 ◦ ,−118.3316 ◦ ), (33.4692 ◦ ,−118.4181 ◦ ), (33.5238 ◦ ,−118.5881 ◦ ), (33.4882 ◦ ,−118.7014 ◦ ) and (33.5223 ◦ ,−118.7572 ◦ ). When performing simulations with currents, we first compute the average direction of the currents during the simulation window (typically a week). Then for each pair of locations we compare the direction from the start to the goal with the average direction of the currents. If these directions are found to be within±45 ◦ of each other, we assume that this pair of locations is with the currents. If on the contrary, the directions are found to have a difference greater than±135 ◦ , we assume that this pair of locations is against the currents. Figure 4.5 shows the procedure followed when performing simulations of the gliders aswellaswellasduringfieldtestsoftheplannerswithglidersatsea. Insimulationa glider model and ROMS ocean current predictions are used to simulate the action of the gliders motion at sea. Policies (stationary or non-stationary) are pre-computed 2 A higher number of aborts due to potential collisions with land happen than we would expect with more realistic currents going around these virtual islands. 142 and indexed by date and time ranges, allowing for easy lookup when a glider is at the surface in the field. During simulation, the tuple of surfacing latitude, longitude and time of surfacing are used to lookup the best action to be taken from this state given the policy corresponding to that time. The policy provides a new latitude and longitude which the glider should aim for, which is then executed using the glider model until the glider is at the goal or aborts the mission due to a collision. In field tests we produce a mission file containing the desired waypoint and this file is automatically uploaded to the glider during the same surfacing. The time taken to retrieve the GPS location from the glider, lookup the policy, generate a new mission file and upload it to the glider is approximately 2 minutes. The total time at the surface also involves time for GPS position acquisition soon after surfacing as well as just before diving, and typically ranges between 5 and 10 minutes (without data transfer from the glider to shore). In simulations we report here, we do not model collisions with ships when the glider is at the surface to transmit data. Surfacing risk is accumulated at each surfacing location (with the assumption that time at surface is equal at every waypoint). A simulation run ends either in success (glider reaches goal), abort (glider reached land) or in a time-out (when the simulation has not ended in more than 250 hours). Simulations results are presented where we compare the stationary switching MDP policy with a stationary MDP with transition models guided by noise estimates learnt by Gaussian Processes in section 4.4.1. Further simulations are performed 143 to compare the performance of the stationary GP MDP with a non-stationary GP- MDP (which is designed for use when currents have higher higher variability) are presented in section 4.4.2. Finally, section 4.4.3 presents simulation results of an omniscient non-stationary MDP planner which has all the ocean predictions for a week of data available at the start of planning. 4.4.1 Comparison of switching policy risk-aware MDP with the stationary risk-aware GP-MDP In a switching policy risk-aware MDP the decision of the policy to be used in the next segment is decided by the observed difference between true surfacing location and predicted surfacing location. We have to pre-compute a larger number of MDP solutions corresponding to different amounts of noise (an expensive proposition), and we make the assumption that the noise in the current predictions will affect the glider in the subsequent dive in the same way as it did during the last dive. The GP-MDP on the contrary uses historical data to derive a stationary transition model which encodes the uncertainty in the ocean predictions within it. We solve a single policy for each finite horizon as opposed to solving N policies for each finite horizon (where N is the number of noise values for which transition models were createdintheswitchingpolicyrisk-awareMDP).Theadvantagesofthisdatadriven approach are obvious in the results shown in Figure 4.6 where there is a significant reduction in the number of collisions with land from the use of the GP-MDP as 144 compared to the switching policy risk-aware MDP. In the case when the direction from start to the goal was generally with the currents we observe a significant reduction (p-value=0.0186) in slower currents, as well as a significant reduction (p-value=0.0065) when going with faster currents. Although we do not observe any improvement through the use of the GPMDP when going against slow currents (p-value=0.5), there is a significant improvement in reducing collisions when going against currents (p-value=0.028). 145 249 261 21 9 0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100% Switching MDP GPMDP Timeouts Collisions Successes (a) With slow currents (Aug 4 to Aug 11, 2012) 260 261 10 9 0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100% Switching MDP GPMDP Timeouts Collisions Successes (b) Against slow currents (Aug 4 to Aug 11, 2012) 215 233 47 26 8 11 0% 20% 40% 60% 80% 100% Switching MDP GPMDP Timeouts Collisions Successes (c) With fast currents (Aug 17 to Aug 24, 2012) 114 94 94 115 62 61 0% 20% 40% 60% 80% 100% Switching MDP GPMDP Timeouts Collisions Successes (d) Against fast currents (Aug 17 to Aug 24, 2012) Figure 4.6: Shows a total of 2160 simulations conducted in two regimes currents slower than speed of glider (Aug 4 to Aug 11, 2012), and when currents often faster than speed of glider (Aug 17 to Aug 24, 2012) (a) In 270 simulations each where the glider had to travel generally with the direction of the slower currents (when going from the start to the goal), there was a significant reduction in the number of collisions when the GPMDP was used, as compared to the Switching MDP (p- value=0.01858). (b) When going against the currents there was no improvement in the success rate for the GPMDP over the switching MDP (p-value=0.5). The improved current estimates appear to help when going with the currents, but do not appear to provide any advantage when going against the currents if the speed of the currents is slow in comparison to the speed of the glider. (c) In 270 simula- tions each where the glider had to travel generally with the direction of the faster currents, there was a significant reduction in collisions by using the GPMDP over the switching MDP (p-value=0.00651). (d) When going against the faster currents we notice a significant reduction of crashes as compared to the successes by using the GPMDP over the switching MDP (p-value=0.02802). This indicates that using a more sophisticated technique such as GPMDPs produces an improvement when the currents are faster. Faster currents are also responsible for more policies result- ing in timeouts - an MDP policy which finds that the currents are too strong for the glider to reach the goal is likely to find a policy that keeps the glider hovering between two or more low-risk waypoints. 146 These results from simulation indicate that the improved estimates of the GP- MDP are likely to improve the performance of probabilistic planners. These im- provements are more significant when the speed of the currents are high. When considering the results of simulations against fast currents we notice that both the stationary planners have a large number of collisions with land, which is a highly undesirable behavior. The interpolation variance estimated by the GP-MDP uses a non-stationary kernel which provides estimates of the uncertainty in the current noise over time. The non-stationary GP-MDP which was described in section 4.3.6 provides a means of reducing the collisions by preventing the planners from being overly goal-directed when conditions are risky. 147 4.4.2 Comparison of non-stationary risk-aware GP-MDP with switching and stationary risk-aware MDPs 233 141 26 16 11 113 0% 20% 40% 60% 80% 100% GPMDP NS-‐GPMDP Timeouts Collisions Successes (a) With fast currents (Aug 17-24, 2012) 115 73 94 10 61 187 0% 20% 40% 60% 80% 100% GPMDP NS-‐GPMDP Timeouts Collisions Successes (b) Against fast currents (Aug 17-24, 2012) Figure 4.7: (a) Simulation results from 270 runs when the direction to goal is generally with the currents. The NS-GPMDP has fewer crashes than the GPMDP while timing out significantly more times. This reduction in collisions through the use of the Non-stationary GP-MDP is not statistically significant (p-value=0.59) (b)Simulationresultsfrom270runseachwhengoingagainstfastcurrents. Here, we notice a significant improvement in terms of reducing collisions with land through the use of the non-stationary GPMDP (p-value≈0). The non-stationary GP-MDP is an improvement over the stationary planners especially when the currents are fast. Figure 4.7 shows results from a total of 1080 simulations performed between Au- gust 17 and August 24, 2012 using the stationary GPMDP and the non-stationary GPMDP. The speeds of the currents were relatively fast in comparison to the speed of the glider, exceeding 0.27 m/s in several cases. In 270 simulations for each plan- ner, when going with the currents, we find that the non-stationary MDP does not significantly reduce the number of collisions as compared to the number of success- fully executed paths. The stationary MDP has success rates in excess of 86%, while 148 the non-stationary MDP is successful only 50% of the time, choosing to time out instead of reaching the goal. For the case where the currents go against the direction the glider generally follows when going from the start to the goal, we notice that the non-stationary GP- MDP has a much larger number of timeouts, but has significantly lower collisions than the switching MDP (p-value=0.0961). It also outperforms the GP-MDP (p- value=0.1791), although this is not as significant in comparison with the switching MDP. Here, the non-stationary MDP prefers not to make progress toward the goal in most cases because the small horizon for planning (approximately 48 hours) might be too small a time for the glider to make it successfully to the goal. This is quite evident in Figure 4.13 where the most of the states in the western portion of the planning region do not have actions which will take the glider toward the goal. States closer to the goal have actions going to the goal in the case where the glider goes against the currents. The limited amount of ocean current predictions prevents the non-stationary MDP from having better performance except in the case where the goal is easily reachable from the state the glider surfaces at. In low- speed currents, the non-stationary MDP is a slight improvement over stationary MDPs in terms of reducing collisions. 149 385 383 185 348 20 22 40 15 180 42 0% 20% 40% 60% 80% 100% Switching MDP GPMDP NS-‐GPMDP Omniscient MDP Timeouts Collisions Successes (a) Traveling with currents 313 319 100 227 92 85 20 54 0 1 285 124 0% 20% 40% 60% 80% 100% Switching MDP GPMDP NS-‐GPMDP Omniscient MDP Timeouts Collisions Successes (b) Traveling against currents Figure 4.8: (a) Simulation results from 405 runs for each planner when the goal is aligned with the goal. We notice that there is a significant increase in successes for the Omniscient MDP with a reduction in crashes (p-value≈0). This indicates that the knowledge of currents further out into the future is highly beneficial when going toward the goal with the currents. (b) Simulation results from 405 runs each when going against the currents. Here, we notice an improvement in success rates for the Omniscient MDP, but it has more collisions (p-value=0.7697 which is not statistically significant). This indicates that risk of collision does not necessarily go down with additional knowledge about the uncertainty in current forecasts when going against currents. Overall however, we note that having knowledge of future currents does help the omniscient MDP avoid timing out, consequently increasing its success rates when going against the currents. 4.4.3 Gain of using an Omniscient MDP over a practical finite-horizon non-stationary MDP In section 4.4.2, we discussed why the non-stationary risk-aware MDP often com- putes policies which make little progress toward the goal and may result in timeouts specially against currents. Although the knowledge about currents in the future is limited, we can develop a non-stationary MDP planner which concatenates together a week of ocean current data. This planner has more accurate current information over the planning horizon because it does not contain current predictions exceeding 150 the initial 24 hours for each day, consequently having improved knowledge about the currents. We call this planner the Omniscient MDP because it has substantially more knowledge about the currents during the time of planning than is practically possible with the stationary or non-stationary MDPs. Figure4.8showsresultsfromatotalof3240simulationsconductedinrelativelyslow currents from August 12 to August 22, 2013. Here we notice that the performance of the stationary planners is much better than that of the non-stationary planners, indicating that if the currents are always slower than the speed of the glider, it is better to use a stationary planner such as the switching MDP or the stationary GP-MDP. When we compare the performance of the non-stationary GPMDP and the omniscient MDP, we notice that in slower currents the Omniscient MDP has significantly fewer timeouts as compared to the non-stationary GPMDP. This is not surprising considering that the glider is likely to be able to have a policy that can take it from the start to the goal within the week-long planning horizon. The Omniscient MDP significantly reduces the number of collisions while increasing successes (p-value≈0) for the case when going with the currents. When going against slow currents however, the Omnsicient MDP performed worse than the non-stationary GPMDP (p-value=0.77 for Omniscient MDP being an improvement over the non-stationary GPMDP). Now, consider Figure 4.9, where the currents are relatively fast in comparison with thespeedoftheglider. Insuchcases,wehaveseen(asdiscussedinSecion4.4.2)that 151 the non-stationary MDP significantly reduce the number of collisions as compared to the stationary planners, both with and against the currents. This figure also shows us that providing additional knowledge about future currents does not yield a significant improvement over the performance of the non-stationary MDP in terms of reduction of crashes (p-value=0.1312). Moreover there is no improvement in terms of reduction in timeouts due to having additional information about the currentswhengoingagainstfastercurrents. Thereductionincollisionsascompared to the non-stationary GPMDP was not significant (p-value=0.5). This indicates that for challenging regions like that shown in Figure 4.1 (b), strong currents often call for policies in which the glider should not attempt to make progress toward the goal. In such cases, the best policy appears to be to wait the faster currents out in a safe area, and attempting to move to the goal only when current speeds reduce. In these simulations, the currents stayed strong throughout the runs, and in such cases the non-stationary MDP’s policies were nearly identical to those of the Omniscient MDP in trying to avoid risky areas and sticking to safer low-risk regions. Figures 4.9 (c) and (d) show statistics of the planners when going with and against relatively fast currents respectively. These statistics are the averages calculated for the successfully executed paths. The stationary MDP planners have higher risk associated with them even though they have more successfully executed paths. This is because even though the non-stationary MDP and the Omniscient MDP 152 timeout, they do so by going back and forth in low-risk regions, accruing very little risk. We notice that the risk associated with the non-stationary MDP is almost 40% lower than that of the stationary MDP when going with fast currents. When going against fast currents, the paths executed by the stationary GPMDP are almost 2.6 times riskier than the paths executed by the non-stationary GPMDP. The statistics for the average time taken to get to the goal and the average path length are computed only for cases where the glider successfully reaches the goal. We notice that the paths successfully executed by the non-stationary planner are usually shorter than those successfully executed by the stationary GPMDP. The omniscient MDP appears to have slightly shorter paths on average than the non- stationary GPMDP when going against currents. 4.5 Field Trials We report results from field trials of the stationary version of the GP-MDP as well as the non-stationary version of the GP-MDP. The experiments from the stationary GP-MDP planner were performed between August 17 and August 19, 2012, while the non-stationary GP-MDP experiments were performed in August 2013. In both cases, we obtain ROMS data at 6:30 am PST and compute transition models based upon the GP’s estimates of interpolation variance. Next, we solve the MDP using the desired goal, to pre-compute the MDP policy for each planner. When the AUV surfaces, we use its reported position to lookup the best action to execute from the 153 indexed policy corresponding to this surfacing location. We create a mission file for the glider with this state as the desired waypoint, and instruct the glider to execute it. At each subsequent waypoint we repeat this procedure until the glider reaches the goal. In the field tests of the stationary GP-MDP conducted in August 2012, we observed that the GP-MDP was able to get to the goal faster than the switching MDP (which was run on another glider simulatenously). This seemed to indicate that the GP- MDP was better informed and took less time to arrive at the goal as a consequence of planning a better path. Simulation results do not support this hypothesis - a glider executing the GP-MDP is likely to be slower on average than the switching MDP. Figure 4.10 depicts the execution of the non-stationary GP-MDP which began approximately at midnight on August 17, 2012. The GP-MDPs confidence measures produce a less conservative policy which moved the glider more quickly toward the goal. The glider oscillated a few times before crossing the western shipping lanes, before making its way to the goal at approximately 3 am on August 19. This was a day faster than the glider running the swiching policy MDP in the field trial. Figure 4.11 shows the path executed by the glider due to the non-stationary MDP planner. An example snapshot of the non-stationary policy used during the ex- ecution on August 15, 2013 is shown in Figure 4.12 (a), while several simulated trajectories used to test whether the mission should be sent to the glider during the 154 field tests is shown in Figure 4.12 (b). During field trials, if these simulations end up causing collisions the mission is not automatically loaded for execution on the glider and instead requires a human pilot’s supervision. This is an useful feature which can help glider pilots with a decision support framework during real glider deployments. After the glider arrived at the goal, we chose a goal location in the east of the map (to cause the glider to travel against the currents). This experiment began at 18:00 PST on August 16 and is shown in Figure 4.13 (a), where the glider started at the blue triangle. The goal location is depicted by the black square was too far for the glider to be able to realistically reach while traveling against the currents. We found that the glider made no progress toward the goal over the next 12 hours. Closer inspection of the non-stationary MDP policy revealed that the states shown in Figure 4.13 (b) did not make progress toward the goal, while states outside this region made progress toward the goal. The policy shown in Figure 4.13(b) is a snapshot of the policy at 22:00 PST on August 16. Notice how the actions are all risk-averse and take the glider to low-risk regions away from shipping lanes and land, where they oscillate back and forth between low-risk states. This is a natural solution to MDPs when the goal cannot be reached - they try to be risk-averse and find solutions which minimize the expected risk. Field trials of our planners resulted in execution of paths which were similar to those observed in simulation with currents from the same time period. Much of the 155 process was automated using a Python framework for glider path planning which includes the ability to communicate with a glider Dockserver, as well as to transfer mission files to the glider through the Dockserver. The framework which we call the Glider Path Planning Library also includes methods to download the latest ROMs update, create transition models, archive and reload them as well as algorithms to solve MDPs and shortest paths in planning graphs. Without sufficient automation, it is difficult to perform multi-day experiments when the glider travels distances of approximately 150 km during the course of a set of field trials. Successful field tests alsoindicatethatthesealgorithms(alongwithoptimizationssuchaspre-computing and indexing policies) allow for these planners to be practically implemented, and used on robots in the real world. Such automation in plan execution and decision making is essential to enable more complicated automated data-driven planning for AUVs. Our system compensates for the lack of computational resources onboard the vehicle (as well as limitations such as the large size of ROMs current data and the transition models) by performing most of the computationally expensive work off board and only uploading the final mission file to the vehicle for execution. 4.6 Related Work Optimal path planning is the process of generating an optimal sequence of way- points from a start configuration to a desired goal configuration under constraints and a metric (e.g., avoiding obstacles and requiring minimal time/energy). Several 156 well-known path and motion planning algorithms have been discussed in [28,29]. Most classical planning techniques in the artificial intelligence literature are de- signed for use on a discretized world due to ease of implementation on computers. Heuristic search techniques such as A ∗ have been very popular for deterministic planning in the artificial intelligence and robotics literature. These techniques use heuristics based on domain knowledge to guide the search, preventing unnecessary expansions whenever possible. Variants of these techniques such as D ∗ andD ∗ -lite are capable of performing efficient updates to the costs for nodes in the search graph when new knowledge of the world becomes available in dynamic scenarios [66,67]. While discrete path planning algorithms have been popular since they are easy to implement, lately sampling-based motion planning algorithms such as rapidly- exploringrandomizedtrees(RRTs)andprobabilisticroadmaps(PRMs)havegained popularityastheycansolveplanningproblemsinhigh-dimensionalcontinuousstate spaces [34–37]. These algorithms are well suited to problems requiring plans online as robots execute motion. These planners typically work well for robots with strong and accurate sensors and strong actuators. In contrast, Slocum gliders have low control authority since ocean currents are often faster than their nominal speed and lack the ability to observe the currents directly. Thus, we focus on planning techniques that are better suited for situations with uncertainty in observation and control of the state of the robot: Markov decision processes. 157 This work focusses on risk avoidance, a well studied problem in robotics, spanning from unmanned ground vehicles [11] to unmanned aerial vehicles [12]. Most of this work uses deterministic strategies for risk minimization using heuristic search techniques. Our conference paper [14] described a deterministic planning algorithm which used A ∗ to plan risk-aware paths (without considering ocean currents). A recent paper [13], studied the risk of collisions between glider AUVs and ships. The resulting collision probability model indicates that the probability of a collision is proportional to shipping density. Our work is different because we construct risk maps based on a similar model for planning low-risk paths for AUVs, while also considering how noisy ocean predictions affect our surfacing outcome. In [47] a decision support system using automatic identification system (AIS) data and ocean current predictions determines the safety of a set of waypoints chosen by human pilots. We extend this by providing an algorithm that chooses waypoints so as to minimize the expected risk of surfacing in hazardous locations or collisions with land bodies. Fast marching and level sets are techniques based on wavefront expansion that provide efficient solvers for the motion of AUVs through time-varying current fields. Fast marching [44,45] is a technique which solves the Eikonal equations of a front whose speed function cannot change sign. Fast marching has been successfully applied in the context of planning paths for AUVs in [6]. A related technique, level sets [44,51], relaxes the constraint by allowing the speed function of the wavefront 158 to change signs. Recently, [27] used level sets to plan paths for AUVs in flow fields using level sets. However, these techniques do not take into account the uncertainty of the current field. In contrast, we focus on algorithms that also reason about the noise in predictive ocean models such as Markov decision processes. Most work in path planning for AUVs relies on deterministic techniques such as A ∗ , e.g., [5] where bathymetry, exclusion zones, and ocean current databases help to generate path corridors along great circle routes. Witt and Dinbabin (2008) describe an algorithm designed to find energy-optimal paths through search by us- ing time-varying ocean current predictions [8]. Kruger et al., developed planning algorithms for AUVs in fast-flowing estuarine environments [7]. All of these algo- rithms are unsuitable for slow-moving vehicles like Slocum gliders which lack good control authority. When it comes to deterministic planning for gliders, Fernandez- Perdomo et al., developed an algorithm called constant time surfacing A ∗ , where predictive ocean models are used to plan time-optimal paths [48]. Similar graph- based planners for various cost-functions are described in [49]; further extensions using parallelization can also be found in [50]. All these planners require low noise in the ocean current predictions to be practically useful. The first attempt at study- ing the effect of noisy predictive ocean models in path planning is found in [26], where the performance of wavefront expansion based planners under uncertainty in current predictions was studied. Results from their work indicate that noise in predictions has a significant effect on the planning performance. Our work [57], 159 was the first to develop planners capable of reasoning probabilistically about noisy current predictions, but lacked a principled way determine noise values for building transition models. This chapter describes a planner based on observed surfacing lo- cations (switching risk-aware MDP) to switch policies and planners using transition models based on data-driven GPs to learn the noise model. Predictions of ocean currents have previously been used by several research groups to improve the navigation capabilities of autonomous vehicles and improve the safety of their operation [14,48]. Lower bounds on navigation error have also been derived to estimate the path following performance of underwater vehicles [68]. While these prior works provide a basis for navigation of AUVs in the open ocean, theyarelackinganalysisoftheuncertaintyofpredictions. Wedevelopthenecessary tools to utilize these uncertainty predictions and improve path planning methods. Prior work has examined the development of confidence measures for various ocean processes using straightforward statistical tools [69] as well as more sophisticated Bayesian models [70]. In [55], a comprehensive method for the quantification, pre- diction, and estimation of uncertainties of ocean dynamics is described, along with the challenges involved in modeling ocean fields in a very large scale. A rigor- ous computational method called error subspace statistical estimation (ESSE) is described with several illustrative examples for its use. ESSE has components of time-varying basis functions, multi-scale initializations, and stochastic ensem- ble predictions, which are combined with data-assimilation. However, ESSE does 160 not provide approximate governing equations for the time-evolving error covariance bases, and it is best suited for data-assimilation of complex processes. GPs have been used successfully to improve the accuracy of tasks such as large-scale terrain modeling [65], scientific planetary surveying [71], and robotic grasping [72]. We demonstrate that standard GP variance predictions do not capture the uncer- tainty inherent in prediction of ocean processes. Interpolation variance provides an alternative measure of uncertainty that also incorporates variability of the pre- diction into the variance. Interpolation variance was originally introduced for the estimation of geostatistical data [62,63] and was recently applied as a method for guiding adaptive sampling by mobile robots [64]. To our knowledge, we are the first to apply interpolation variance to the prediction of ocean currents and the first to integrate it into the action model of a probabilistic planner in [54]. 4.7 Conclusions This work addresses the problem of planning risk-aware paths for AUVs using noisy predictive ocean models by developing three planners based upon the Markov Deci- sion Making framework. Extending upon prior work, the planners described in this work utilize a principled approach to generating and using these noisy predictions during planning. We begin with the development of stationary models through a Switching risk-aware MDP which switches between pre-computed MDP policies 161 based on the deviation of the actual surfacing location from the predicted surfacing location. This planner employs a reasonable means of utilizing observed deviations in surfacing locations to advise the indexed policy that should be used for the next decision epoch. A drawback of this approach is that we are always a step behind in knowing how noisy predictions in the region we will be operating in are. To address this problem, we used a data driven approach using Gaussian Process regression to estimate the interpolation variance of the noisy ocean current predictions. These variance estimates were then used to create transition models (stationary transition models when currents have low variability) which are used in the stationary risk- aware GP-MDP and (non-stationary transition models when currents have high variability) which are used in the non-stationary risk-aware GP-MDP. Extensive simulation results indicate that often there are significant gains to be made over a switching risk-aware MDP by learning these uncertainties and utiliz- ing them during planning. We have observed improvements in simulation in the performance of the stationary GP-MDP over the switching (fixed-noise) risk-aware MDP particularly when the speed of the currents are relatively fast in comparison with the nominal speed of the glider. The non-stationary MDP is an ideal planner for situations where the currents are relatively fast in comparison with the speed of the glider. A disadvantage of this planner is that if the start and goal are separated enough for it to take too much time for the glider to go from the start to the goal, it may make no progress toward the goal. While the strategy of heading toward 162 a low-risk region and hovering between a pair of low-risk locations is optimal, in practice we might want the AUV to traverse a longer distance. We notice that when the speed of the currents is substantially slower than the speed of the glider the best solution is to utilize one of the two stationary MDP planners, since both outperform the non-stationary MDP. We also noticed that when the currents are slow, a non-stationary MDP could be useful in planning paths between distant start-goal pairs, only if they had access to a substantially larger amount of ocean current forecasts. This was evident from our simulations in slower currents, where we noticed that the omniscient MDP which had a week of current forecasts during planning, significantly reduced the number of timeouts when going from the start to the goal. It also significantly reduced the number of crashes as compared to the non-stationary MDP when going with the currents. This result indicates that the non-stationary MDP may be useful in practice when ocean models provide longer forecasts under conditions where the speed of the currents are slower than the speed of the glider. Even in this case, for goal-directed behavior the stationary MDP planners, particularly the stationary GPMDP is the preferred planner of choice. When the speed of the currents is fast however, the non-stationary GPMDP is much more useful (particularly in very challenging regions). Here we noticed a sig- nificant reduction in the number of collisions through the use of the non-stationary GPMDP over both types of stationary MDP planners, both when going with the 163 currents as well as against them. Moreover we did not notice a significant im- provement in terms of a reduction in collisions when providing our planner with more knowledge about the currents. It is likely however, that when planning paths in less challenging scenarios this difference in performance between the stationary and non-stationary planners is smaller even in faster currents. The non-stationary GPMDP is the winner in situations where the glider is operating in a region with complex terrain, lots of shipping lanes and fairly fast currents. In such scenarios the non-stationary GPMDP’s policies are much safer than those from the station- ary MDP planners, and are likely to ensure the safety of the glider while giving up on goal-directed behavior. Table 4.2 summarizes the qualitative performance of planners when traveling with and against slow or fast currents based upon the results obtained from simulations. The behaviors of the non-stationary planners in the field was observed to be similar to that observed during simulation. We have tested our algorithms on gliders over several multi-day runs at sea, developing a semi-autonomous planning framework which makes it easy to automate risk-aware planning for underwater vehicles. The stationary GPMDP which was tested in August 2012 out-performed the switching MDP in the field, by taking approximately 10 hours less to reach the goal. The non- stationary MDP was also tested at sea in August 2013 and performed satisfactorily when going with the currents. 164 Table 4.2: Summary of planner performance. (X⇒ good,×⇒ poor). Here the stationaryGPMDPconsistentlyoutperformstheotherplannersnon-stationaryGP- MDP outperforms the stationary planners (even with a limited prediction horizon) in faster currents. S-MDP GPMDP NSGPMDP Omni MDP Slow currents With XX XX × X Against X X × × Fast currents With × X XX XX Against ×× × XX XX Future extensions of this work can involve augmenting the reward functions of the GPMDP to allow the planners to automatically select between several goals using some multi-criteria model possibly driven by information gain metrics. We would also like to investigate other data-driven approaches to modeling and estimating the noise in the ocean currents. Overall this chapter described three planners each of which is useful in planning paths for gliders under uncertainty. The stationary GPMDP based on the interpolation variance measure is the best planner that can be used when the speed of the currents are slower than the speed of the currents. In situations where the glider is operating in a challenging region with fast cur- rents, the best planner is the non-stationary GPMDP. In practice, we can use the non-stationary GPMDP when we notice that the speed of the currents are fast in comparison with the speed of the glider, while using the stationary GPMDP otherwise. 165 215 233 141 178 8 11 113 80 47 26 16 12 0% 20% 40% 60% 80% 100% Switching MDP GPMDP NS-‐GPMDP Omniscient MDP Timeouts Collisions Successes (a) Traveling with currents 114 94 10 9 94 115 73 74 62 61 187 187 0% 20% 40% 60% 80% 100% Switching MDP GPMDP NS-‐GPMDP Omniscient MDP Timeouts Collisions Successes (b) Traveling against currents 2.0 34.5 31.0 2.0 34.9 32.8 1.2 24.1 23.5 1.2 25.9 26.5 0.0 5.0 10.0 15.0 20.0 25.0 30.0 35.0 40.0 Avg. Risk Avg. Time (hr) Avg. Path Length (km) Switching MDP GPMDP NS-‐GPMDP Omniscient MDP (c) Traveling with currents 4.0 93.3 45.7 3.7 87.3 45.2 1.4 31.1 20.3 1.4 34.2 17.9 0.0 10.0 20.0 30.0 40.0 50.0 60.0 70.0 80.0 90.0 100.0 Avg. Risk Avg. Time (hr) Avg. Path Length (km) Switching MDP GPMDP NS-‐GPMDP Omniscient MDP (d) Traveling against currents Figure 4.9: (a) Simulation results from 270 runs each in relatively fast currents when the goal is aligned with the goal. While the omniscient MDP appears to help increase the number of successes and reduce the number of crashes this is not very significant (p-value=0.1312). This indicates that the knowledge of currents further out into the future is beneficial when going toward the goal with the currents, but is less important than it is for slower currents. (b) Simulation results for 270 simulations each for all 4 planners in this chapter, when going against relatively fast currents. Here we notice that there is no improvement through having more knowledgeaboutthecurrents-theomniscientMDPalsotimesoutthesameamount as the non-stationary MDP. (c) Statistics of the planners when going with fast currents. (d) Statistics of the planners when going against fast currents. 166 Start MDP execution Goal 18 km 0 Santa Catalina Island Figure 4.10: The figure above shows the execution of the stationary GP-MDP being executed by the Slocum glider. The experiment began at 19:10 PST on August 17, at the location denoted by the blue triangle. The glider arrived at the goal denoted by the black square at 3:21 am PST on August 19. The path being executed by the GP-MDP was generally with the direction of the currents. 167 MDP execution Start Goal 18 km 0 St. Catalina Island Figure 4.11: The figure above shows the execution of the Non-stationary GP-MDP being executed by the Slocum glider. The experiment began at 19:10 PST on August13, 2013withthegliderstartingatthelocationdenotedbythebluetriangle. Thegliderfollowedthepathshowninred,surfacingatthelocationsindicatedbythe red diamond markers, and finally arrived at the goal around 16:20 PST on August 16, 2013. Notice how the non-stationary GP-MDP had practically no oscillations (since the goal was reachable within 48 hours when going with the currents). We also note that the planner had to cross one set of shipping lanes but chose to cross the shipping lane to the east that required a longer path around the eastern and later around the central virtual islands to get to the goal. 168 Goal (6,16) (a)Non-stationarypolicyonAug15, 2013 Goal (6,16) 0.28 m/s (b) Simulated trajectories to goal at same time Figure 4.12: Screen-shots from the Glider path planning library during the field trial on August 15, 2013. (a) Non-stationary GPMDP policy computed when the glider surfaced at the location just south-east of the central island. Notice how all the arrows at this instant of time are goal directed. (b) Forward simulations with varying amounts of noise in the currents indicate that the glider is likely to successfully execute the computed policy from the surfacing location it is currently at. 169 (a) Execution over 10 hours against cur- rents Santa Catalina Island 18 km 0 Goal (b) Policy for NS-GP-MDP against cur- rents Figure 4.13: (a) The figure above shows the execution of the Non-stationary GP- MDP being executed by the Slocum glider. The experiment began at 18:00 PST on August 16, at the location denoted by the blue triangle. The glider would have to travel against the currents and ended up executing policies without mak- ing any progress toward the goal. (b) Closer inspection of the policies generated corresponding to the time of the experiment revealed that none of the locations in the red region had goal-directed actions. The actions took the glider to low-risk regions of the map and caused it to oscillate there. These policies result in the NS-GP-MDP timing out quite often in practice due to a lack of information about the ocean currents in the future. This is a practical limitation imposed upon the planner due to the limited time-horizon for which ocean current predictions are available. We were able to get the glider to proceed toward the goal subsequently by manually directing it out of the red region, and resuming the mission using the latest computed non-stationary policy. This second run of the planner against the currents was also successful. 170 Chapter 5 Conclusion In this thesis we presented solutions to the problem of planning low risk paths for slow-moving autonomous underwater vehicles in the open ocean. Prior to this thesis, although ocean models existed, their use had not been combined to plan risk-aware paths for AUVs like gliders. Moreover, most of the work in the litera- ture used predictive ocean models deterministically. Although we began our work with deterministic algorithms to search for low-risk surfacing waypoints, the pri- mary contribution of this work is in developing probabilistic planning algorithms which reason about the uncertainty in the predictive ocean models and use this information in planning. Chapter 2 defined the problem of minimizing risk for slow-moving AUVs, and de- signed an algorithm based uponA ∗ which could find the minimum-risk path in the planning graph. We also showed how objectives such as goodness of communication could be used in the same planning framework to find paths optimizing criteria that 171 are related to collision risk. This initial work ignored the effect of ocean currents. Later in the chapter, we described a time-expanded version of our minimum-risk planner which is designed to use predictive ocean models to plan risk-aware paths. This algorithm was designed for use in conjunction with another method called "Pseudo-waypoints" where alternative target locations were generated using ocean predictions, such that a glider executing the plan would surface at the surfacing locations desired originally. We call this the time-expanded minimum risk planner. The biggest drawback of these planners is that ocean models are inherently noisy and this uncertainty makes planning of paths deterministically and their execution blindly a hazardous option for slow-moving vehicles, particularly because ocean currents are often faster than the gliders. The predictive ocean models which provide forecasts of ocean currents are coarse and noisy. This makes planning paths using predictive ocean models deterministi- cally impractical - the uncertainty can severely increase the risk of paths planned by deterministic planners like the Minimum Risk planner which ignores currents. Chapter 3 clearly shows the utility of probabilistic planning. We design two proba- bilistic planners - the Minimum Expected Risk (MER) planner and the risk-aware Markov Decision Process (MDP). The MER planner minimizes the expected risk of surfacing at high-risk locations while heading to the goal. It performs well in low-variability currents and is most useful when goal-directed behavior is of high- est priority. In fast currents however, planners need to trade-off between being 172 goal-directed and heading for low-risk regions until the currents reduce speed, after which the glider can continue on its way to the goal. We notice that in fast currents with high variability, the MDP significantly reduces the number of collisions. The performance of the MDP is affected by the noise assumed to be present in the cur- rents. This led us to hypothesize that "probabilistic planning algorithms aided by learning uncertainty in noisy predictive ocean models, significantly reduce collision risk for slow-moving autonomous underwater vehicles". Chapter 4 investigated this hypothesis by using a well-known data-driven technique from machine learning called Gaussian Processes (GPs) to learn uncertainty in the predictive ocean models. We utilize this information to develop two planners - a stationary GPMDP and a non-stationary GPMDP. Extensive simulations have shown that the stationary risk-aware GPMDP has very good performance in slower currents with lower variability. It significantly reduces the number of collisions vs. number of successfully completed paths in comparison with a naive policy switching MDP and the non-stationary GPMDP. The non-stationary GPMDP in this case suffers from a lack of information because the prediction horizon from the ocean model is often not long enough for the policy to find a path to the goal. This results in the non-stationary GPMDP timing out instead of allowing the glider to reach the goal. We notice that if we provide the non-stationary GPMDP with horizons of a week as opposed to the 72 hours (in the ROMS model we used), there would be significant improvements in reducing the number of timeouts. 173 We also learnt that when operating gliders in relatively fast currents with high vari- ability, the non-stationary GPMDP significantly reduces collisions in comparison with the stationary planners. We also noticed that in such cases having a much larger prediction horizon does not reduce the number of timeouts significantly. It also indicates that in challenging regions with lots of obstacles, slow-moving vehi- cles should use the non-stationary GPMDP until the speed of currents are certainly slower than the speed their own nominal speed. When the speed of currents in the region are slower than the speed of the glider, the stationary GPMDP can be uti- lized to ensure a high success rate toward the goal. This thesis has demonstrated the usefulness of predictive ocean models in planning low-risk paths for slow-moving AUVs. During the course of this work we have ex- tensively tested our algorithms for≈ 2000 hours between 2011 and 2013. We found the performance of our algorithms to be representative of similar trials observed in simulation. These extensive simulations indicate that data-driven techniques learn- ing the uncertainty in the predictive ocean models, significantly benefit planning both in low-speed currents, as well as high-speed currents, especially in complex regions with lots of obstacles. We have also developed tools that enable automat- ing risk-minimizing tasks which plan low-risk paths for gliders to enable safer glider operations. The key contribution of this thesis is the development of the first risk- aware probabilistic path planners for AUVs which reason about the uncertainty of ocean currents predictions. Through learning the uncertainty in noisy ocean 174 current predictions, our planners can significantly reduce collisions in challenging scenarios, enabling gliders to be operated in faster currents, closer to land than was previously possible. 175 Appendix A Practical aspects to Coastal Ocean Observatories Coastal ocean observatories such as Rutgers University Coastal Ocean Observation Laboratory (RU-COOL) and the University of Southern California’s Center for Integrated Networked Aquatic PlatformS (CINAPS) are aggregates of autonomous underwater vehicles, moored buoys with sensors, high frequency radar installations and other data sources which together provide a synoptic and integrated view of a coastal region. The aim in such observatories is to have sustained ocean sampling capabilities, and gliders such as the Teledyne Webb Research Slocum glider are ideal platforms due to their ability to monitor the ocean for extended periods of time. These vehicles can provide a spatio-temporal view of slowly evolving ocean processes. The electric version of these vehicles can operate for periods of time ranging from several weeks to several months. Slocum gliders are able to stay at sea for long periods of time by being very frugal in their use of energy. Unlike propeller driven AUVs which constantly utilize power 176 to drive thrusters and fins, a Slocum glider uses most power to change buoyancy between dive and climb maneuvers which it uses alternatively to move up and down. The hydrodynamic shape of the glider along with wings help it convert this vertical motion into horizontal motion. The glider also has a rudder in the tail section which aids in changing heading and through the combined use of the climb, dive maneuvers and heading changes is able to move between waypoints. When a glider is underwater it estimates its position by dead-reckoning from an open-loop model. When the glider resurfaces it is able to get a GPS position update. Without measuring the true ocean currents, a glider is unable to dead-reckon accurately and inevitably surfaces at a location which is at some distance from the waypoint it was aiming for. The glider can determine the average current vector acting on it during the last dive segment and attempt to compensate for it during the next dive segment. Communication with shore enabling glider pilots to download data collected or update the AUVs with new mission files. Gliders deployed at sea are designed to communicate with shore via satellite modems such as Iridium. This is an expensive form of communication and has low bandwidth in comparison to the secondary communication method - the high-frequency line-of-sight communication Freewave modem. Unlike Iridium, Freewave range (as well as bandwidth) are limited by the range between the glider and the base-station. Due to the low data rates on Iridium, the glider has to spend a significant amount of time at the surface 177 transmitting data files. When the glider is operating in coastal regions, time at the surface transmitting data significantly increases the chances of the glider being hit by a vessel. Although the algorithms in this thesis are designed to keep the AUV safe, the best way to avoid collisions when the AUV is at the surface is to keep the time at the surface as short as possible. We propose that base-stations at elevated locations could be used to communicate with the Freewave modems already on the glider with minor modification to each glider. We describe the coastal base-station network which we can use to communi- cate with gliders at the surface at higher data rates (which in turn reduces the time the glider has to be at the surface). In section C.3 we describe results from exper- iments conducted to measure throughputs available. Throughput measurements can in turn drive multi-objective planning criteria, such as planning for minimizing risk at surface while also minimizing the cost of data transmissions. During our experiments at sea, we noticed that the communication quality was related to the sea state. This led us to develop an algorithm for classifying communication quality based upon sea state. 178 Appendix B Description of field trials of glider AUVs Glider AUVs (see figure B.1) used to collect oceanographic data at USC in the Southern California Bight are Teledyne Webb-research Slocum (electric) gliders. Figure B.2 shows the parts that make up a glider and what each part general contains. The glider is able to change its volume by moving a bellofram in the nose - when retracted inward decreases the displaced volume (causing the glider to sink in the ocean), and when pushed outward increases the displaced volume (causing the glider to float upward instead). The glider simultaneously controls the position of the battery pack in the forward section so as to adjust itself to a desired pitch angle (also called pitch-servoing). During a climb, the glider increases buoyancy (for net upthrust) while simultaneously moving the battery pack back (to make the nose pitch upward). When diving, the glider decreases buoyancy (to sink downward) while simultaneously moving the battery pack forward (to make the nose pitch downward). The glider body and passive wings are able to produce 179 enough lift to convert a part of the vertical movement into horizontal movement during both the climb and the dive. Our glider manufacturer suggests using dive and climb pitch angles of±26 degrees since (empirical studies show that) these angles give energy-efficient glide characteristics under normal operating conditions (between 0 and 100 m depth). The glider uses a rudder in the tail to control its yaw angle which helps the glider change its heading. In this manner, the glider is able to move between waypoints by climbing and diving in the vertical axis, which consequently produces movement in the horizontal axis. The glider changes direction in the horizontal axis using the rudder and uses a magnetometer compass to adjust its own heading to that desired by the guidance law which will take the glider to the desired waypoint. The glider is able to measure its depth fairly accurately using a depth sensor. In the horizontal axes as the glider glides underwater, the glider cannot measure its own state (position, velocity or accelerations) and thus has growing uncertainty about its own position. This uncertainty about its position is compounded by ocean currents which modulate the speed of the vehicle and can vary spatio-temporally in the region the vehicle is operating in. Since our gliders do not have an instrument to measure the speed of the currents or their own speed relative to the ground or with respect to the currents, navigational accuracies are typically poor. The glider’s velocity during dives and climbs also depend upon how well-ballasted the vehicle is (a process described in section B.1). A poorly ballasted glider may dive 180 FigureB.1: Thetwoglidersusedintheworkdescribedhere, waitingfordeployment at the USC Wrigley Institute for Marine Sciences, California, USA. We deploy the gliderswithstickersindicatingthatthegliderisascientificinstrumentwhichshould notberecoveredbyotherpeople. Wealsoprovidecontactphonenumbersforpeople to report/enquire about gliders they may come across. Needless to say, there have been numerous occasions when we have been contacted by people who found the glider (as well as had our gliders recovered by fisherman/sailors, when we would have preferred that they were collecting data instead). We have been fortunate not to have lost either of these gliders in 5 years of deployments in fairly busy parts of Southern California. or climb very slowly producing little horizontal velocity which affects the ability of the glider to move between waypoints. Even with perfect ballasting, the glider (which typically has a nominal horizontal speed of 0.2 m/s to 0.4 m/s) may be over-powered by the currents and be unable to make progress against them. This uncertaintyinpositionestimationaswellas(unpredictable/unmeasurable)external factors make glider navigation particularly challenging. 181 The glider is equipped with a suite of sensors that it uses to collect scientific data while sampling the water column. Our gliders have a conductivity-temperature- density (CTD) sensor, fluorometers (measuring chlorophyll) and spectrometers (measuring back-scatter). The choice of sensors is driven primarily by the scientific objectivesbeingpursuedbyourcollaboratorsfromthebiologicalandoceanographic sciences. The computation power available on gliders is minimal due to power con- straints - every component is designed to be as power-efficient as possible. Beside the energy consumed at the start of a dive or climb maneuver (propulsion power), most of the remaining energy in the vehicle is utilized for scientific data collection and communication with shore. This energy efficient design enable our gliders to sample the oceans for a duration of a month at a time. We typically operate the gliders such that they surface every 4-8 hours to transfer data collected giving glid- ers a mission file consisting of a set of waypoints which it needs to visit. The glider must be at the surface to either communicate or to use GPS for a position update. Thegliderisabletocommunicatebi-directionallywithshoreusingeitheraFreewave (a high-bandwidth, short-range line-of-sight radio) or Iridium (a low-bandwidth, globally available data service). It is also able to use the Argos satellite-based location and data collection system to transmit a few bytes of position and sensor information when at the surface. This form of communication requires very little power and provides an independent means of locating the glider (from the GPS) - which has proven useful to us in locating gliders which had hardware/software 182 failures as well as a faulty GPS which reported wrong location estimates on one occasion. The Iridium modem is the primary mode of communication in gliders since it is globally available. Since our glider operations are primarily coastal, we have developed faster and more inexpensive ways to communicate with gliders in the Southern California Bight. We describe one such method in more detail in section C, where we propose the use of accelerometery data to aid in AUV-shore communication using LOS radios. Nose Cone: Altitude sensor & Bellofram Actuator Forward section: Pitch-control battery pack and motors for ballast pump operation Science Bay: Chlorophyl-A, CTD, Back- scatter etc. Aft section: GPS, Compass, ARGOS, Iridium and Freewave modem, Flight computer, pressure sensor and batteries Tail Cone: Air-Bladder Jetisson Wt GPS/Iridium and Freewave antennas Wings for improved gliding performance Rudder in Tail-fin Figure B.2: Parts that make up an electric Slocum glider. The glider consists of 5 parts: the front consists of the nose section, which houses the bellofram (used to change the volume of the glider for buoyancy control), as well as an altimeter sensor (which uses sonar to avoid hitting the bottom during dives). The forward section contains batteries mounted on a motorized-tray that slides back and forth (which cause the glider to pitch up and down respectively when underwater). The middle section of the glider (called the science bay) holds electronics and sensors related to oceanographic sensing such as sensors for conductivity, temperature and density (CTD), fluorometers, spectrometers and so on. The aft section contains another large immovable battery pack (located only at the bottom for added stability), as well as the bulk of the electronics required to navigate, control, communicate and sense the attitude and position of the vehicle. The tail cone contains an inflatable air-bag and a jettison weight (ejected for safety reasons if the glider is trapped underwater). The airbag inflated when the glider is at the surface to raise the tail (containingtherudderusedforheadingcontrol),sothattheglidercancommunicate or get a GPS position using the antennae in the tail. The glider also has passive wings on either side providing improved gliding performance during dive and climb maneuvers. 183 B.1 Pre-deployment procedures A glider deployment requires much preparation - more so for a multi-glider de- ployment. The gliders need a battery change as well as preventive maintenance checks on the seals and other components such as sacrificial anodes (that prevent corrosion). The battery change involves disassembling the glider and removing the battery packs from the forward and aft sections. Figure B.3 shows glider techni- cians working on a glider during a battery change operation. During the battery change it is also useful to keep track of the minor changes in mass between the old battery packs and those replacing them. These differences in mass need to be compensated to ensure that the glider will be well ballasted so as to be able to climb and dive with ease in the ocean. After putting the glider together, we use a vac- uum pump and adjust the internal pressure inside the glider to 6 inches of mercury. This pressure difference with the atmosphere keeps the glider pulled-together more tightly (especially to automatically deflate the rear air-bladder during operation) and makes it easier for us to detect leaks (if any) which could endanger the safety of the glider at sea. In order to perform ballasting, it is preferable to have access to a large tank filled with water with the same density as that in which the glider will operate. This is important because a glider in sea-water is ballasted heavier (in air) than a glider meant to operate in fresh water. The sea water in the tank is usually freshly pumped so as to have density similar to that in the ocean where the glider will 184 Figure B.3: Glider technicians Carl Oberg and Matthew Ragan work on a glider during a battery change. Notice how the glider comes apart longitudinally into the various sections described earlier. Carl is also writing down the difference in mass between the previous battery pack and the new one which he uses to guide the ballasting procedure for the glider whereby the glider’s mass is adjusted to make it (nominally) neutrally buoyant in sea water. operate. Figure B.4 shows one of our gliders being fork-lifted into a ballasting tank filled with sea-water. The ballasting procedure for gliders is fairly time consuming as it involves adding or removing weights (which are in small plastic bottles located in the glider’s forward and aft sections), such that the glider is barely floating in the water (with the buoyancy pump at the neutral location, the air-bladder deflated and the pitch-battery at the zero-location). A well ballasted glider’s antenna will be the only part barely breaking free from the surface of the water - while the rest of the glider should be submerged and near horizontal (without any pitch or roll). This gives the glider the ability to climb and dive at approximately the same speed which gives it a more evenly distributed speed while traveling horizontally (as well as vertically). 185 FigureB.4: GlidertechnicianCarlOberg, intheballastingtankwithaglider(being fork-lifted into the tank), at the Southern California Marine Institute, San Pedro CA. B.2 Glider Deployment Once the glider is mechanically fit to fly, we go through a final pre-deployment checklist on shore to ensure that it every sensor or means of communication is functional. This involves testing to ensure that we can communicate with the glider via Freewave, that it is holding vacuum correctly, the GPS indicates the correct location and that it is able to call back to the base-station via Iridium. After ensuring that the mission critical sensors are loaded onto the glider, we also check to ensure that the mission files we will use during this deployment are loaded onto the flight computer. With all these items ascertained, the next step is to head out to the deployment area in a small boat or research craft. We have deployed gliders from a wide variety of platforms (from larger vessels with an A-frame to small boats with an outboard engine). We have also deployed gliders directly from the dock at Catalina (although an experience with one of our gliders getting stuck 186 in a Kelp forest taught us not to repeat this procedure). The typical procedure we follow is to go to an area with fairly low boat traffic where we can run further glider qualification tests. We start off with a set of pre-determined test missions designed to ensure that the glider can dive, climb and call back to the shore-side computer successfully. The shore-side computer typically sends us email when the glider successfully commu- nicates with it via Iridium. The email helps us confirm that we can communicate with the glider when it resurfaces prior to launching the glider on a real mission. We also typically start the glider on a loiter behavior (such as a small box of way- points) to ensure that it is functioning correctly before we remotely task it to follow the mission we have designed for it. A glider launch may be completed either from the boat using a laptop via a Freewave radio, or with the help of another glider pilot who remotely operates the glider via Iridium using the Dockserver (a com- puter software provided by Teledyne Webb for interacting with the glider). We have also successfully launched gliders (running missions using non-standard soft- ware which we developed) from our cell-phone. Figure B.5 shows the author typing in commands to start the glider on a mission during field trials in September 2009. Once a deployed glider is executing a mission and we are satisfied that it is com- municating with us with all sensors functional, we typically head back to shore and continue operating the glider remotely (either via Iridium or our Freewave base-station network). 187 Figure B.5: Glider pilot Arvind Pereira, tasks the glider on a mission via a smart- phone (Apple iPhone) in september 2009. The iPhone transmitted commands to the glider via a 3G connection through a shore-based Freewave base-station which in turn communicated with the glider through a Gumstix computer which resides in the glider. The software (and hardware modifications) on the glider and base station, were jointly developed by Arvind and Hordur Heidarsson. This serves as a demonstration of how improvements in mobile communication devices can simplify tasks such as AUV operation at sea (particularly in coastal scenarios) and is the first time glider AUVs were operated using only a cell-phone (to the best of our knowledge). 188 B.3 NormalGliderOperationsduringdeployment A glider runs mission files which consist of a list of waypoints which the guid- ance system onboard the vehicle attempts to navigate between. The glider is also provided with other parameters such as the maximum dive depth, pitch angles, maximum time to stay submerged without communicating with shore, and so on. These parameters are typically designed for a region of operation as well as based upon the science parameters we are interested in. Typically, a human operator (designated the glider pilot), is responsible for creating mission files (in consulta- tion with scientists who define regions they need to sample in). The pilot chooses waypoints in the region of interest which appear to be safe for glider operation, writes missions and manually transfers the missions to the Dockserver (in anticipa- tion of the glider calling in via Iridium). When a glider calls the Dockserver, the glider pilot manually instructs the glider to download the mission files and then runs the mission on the glider. Gliders will call back periodically when they hit a waypoint (i.e. when the glider dead-reckons that its position is within a watch-circle around the desired waypoint), or based upon criteria such as elapsed time without communication and so on. These surfacing calls also provide the glider with GPS updates without which the uncertainty in a gliders position may result in it running into land. The length of dives between surfacing locations is chosen carefully to ensure that the drift in the 189 glider’s position uncertainty does not exceed that which may result in the glider crashing with land bodies. This computation is difficult to perform manually and requires much experience and judgement on the part of the glider pilot. It is useful to have software on shore which is capable of not only informing the decisions of a glider pilot (such as the risk of collision due to a particular choice of waypoints), but also one which is capable of helping the glider operator forecast mission plans in ocean current predictions as well as plan entire paths automatically for the glider operator. In section B.4 we describe a tool we have developed that takes steps in this direction. B.4 Automated and Semi-Automated plan executionusingGliderPathPlanningLibrary (GPPLib) The author has developed a planning framework for planning glider paths called GPPLib (an acronym for Glider Path Planning Library), which we use to perform the glider planning described in this work. It consists of several modules designed to work with the regional ocean model data, glider models and contains several planning algorithms as well as code that generates missions for the gliders auto- matically (as outputs from the glider). Figure B.6 shows an example simulation of 190 20 gliders performing a single long dive in a (historical) ocean current field during a 20 hour time period. It also has collision detection which helps us determine when any of the gliders collide with land bodies. GPPLib has been written in C++ and Python (but also has a few utilities written in Matlab primarily for interaction with ROMS data). Figure B.6: A 20 hour simulation of an ensemble of over 20 gliders, with random start locations (in the south-east) and a single goal location (in the nort-west) using GPPLib’s kinematic simulator. GPPLib’s collision detection system detects collisions between the gliders and Catalina island (the land body to the south). The trajectories that result in collisions are depicted as thick red lines. 191 GPPLib helps simplify decision making for the glider pilot through utilities which aid in planning, simulation and prediction of glider paths (using ocean current predictions and glider models). One of these methods provides Monte-Carlo simu- lations of possible trajectories that may be followed by the gliders based upon the time of surfacing. Advised by these predictions, a glider operator may decided to continue or abort the plan. Without such tools a glider at sea sometimes, requires persistent monitoring on the part of the operator (a situation the author is very aware of after years of playing the role of a glider pilot during months of deploy- ments). For planning situations where we re-plan the glider path based upon its surfacing location and the time of planning, a glider pilot potentially needs to be able to service gliders around the clock (several semi-automated experiments in July-August 2012 were conducted this way). Toward the end of these physically exhausting situations however, we developed a mode of operation which allowed for potential (complete) automation. We have worked toward automating a large part of the workflow which would normally require a human operator in the loop at all stages. When a glider calls in (via Iridium), we detect a glider connection via a Python software tool called Dockserver-Talk (developed by Lucas Merckelbach, who incidentally also did work on developing risk-maps independently and at approximately the same time as that of the author). We use Dockserver-talk to communicate with the glider through the Dockserver from Teledyne Webb. This provides us with the location of the glider, 192 using which we can invoke the planner we are using for that glider. This planner uses the glider’s location and time of surfacing to determine the policy or plan to be followed by the glider. This plan is then simulated under varying amounts of noise in ocean current predictions to provide a set of candidate trajectories the glider might take in the predicted current field for that time period. If none of the trajectories result in a high risk of collision, the mission will be uploaded to the Dockserver (via FTP) and then sent to the glider using Dockserver-Talk. The mission can then be started automatically (if setup to do so). If one or more of the trajectories has potential collision risks higher than a threshold set by the operator, themissionisnotautomaticallystartedandthegliderpilotisalertedviaemail/text message to attend to the glider themselves. This workflow drastically reduces the workload needed to be handled by the glider pilot as well as allows glider operations to be scaled up to multiple gliders without adding to the stress faced by the glider pilot. B.5 Glider Retrieval A glider is typically recovered (upon completion of tasks at sea, or when the batter- ies are running low whichever is earlier). Other scenarios where a glider may have to be recovered include glider damage due to collision with ships, sensor failures, communication failures, perceived leaks and other emergencies such as ejection of 193 FigureB.7: AgliderretrievalwaypointsenttotheMotionXGpsapptohelprecover the glider. Clearly this screenshot was not created when retrieving the glider (we do not recover gliders at night due to visibility constraints). Applications such as MotionX Gps make it finding gliders very convenient since it obviates the need to manually enter the GPS coordinates into a GPS, allowing the boat operator freedom to steer the boat along the course provided by MotionX. The app takes advantage of the built-in Compass and GPS sensors on the phone to provide a range, bearing and expected time of arrival to the glider location (based on the location and speed of the boat). 194 the jettison weight. If the glider is able to use the Iridium modem in these situ- ations, a glider pilot typically relays the position to the recovery team at sea via email or phone. The recovery team will then use the last known GPS location from the glider to find it. Once a glider is located, it can typically be pulled onto a small boat by one or two people. For larger boats a swimmer may have to guide the glider into a sling and have it winched up using an A-frame. Most of our glider recoveries have been done using a small boat. Recently, a new utility in GPPLib (a screenshot of which is shown in Figure B.7), was used to aid in glider recovery by providing the latest glider locations via email. This sends out an email with a GPX attachment which opens directly in a popular application called Motion X Gps for the Apple iPhone and iPad. This app provides a vector to the glider from the recovery boat and aids in rapid recovery of the gliders after the mission is complete without requiring a person to be present at the Dockserver on shore to send the latest position updates. We have successfully used GPPLib’s glider recovery tool to locate and recover two gliders during three separate deployments. B.6 Discussion Glider operation is typically a complex task involving a large team. At USC we have several Glider pilots who typically take turns at operating the gliders to allow 195 round-the-clock monitoring of gliders in the field. Gliders require maintenance and at times careful preparation is not enough to ensure that gliders will always be sea- worthy. We have had our fair-share of exciting stories due to gliders being picked up by random sea-farers, or having hardware or software failures rendering the glider to an incommunicado state where it was drifting with the currents without transmitting its position. We have been fortunate to be able to recover our gliders successfully so far (albeit with many sleepless nights when a glider appeared lost at sea). We believe automatic planning can significantly aid in AUV operation especially in coastal regions. Recent research from groups using AUVs ( [13], [47]) into using AIS data for risk-averse decision support for AUV operations indicates the importance of using computing in such scenarios. We will continue developing tools that aid in safer AUV operations through the use of planning techniques which exploit ocean current predictions (with and without uncertainty). It is also interesting to add multi-criteria objective functions which can trade off between various competing objectives (science goals such as information gain and so on). We hope to grow GPPLib into a useful set of tools and release it for use by the AUV community to simplify path planning for glider pilots and other decision makers. 196 Appendix C Risk-reduction via improvements to AUV-shore communication In this work we reported the development of a lightweight TCP-like communication protocol that performs communication between a glider at the water surface and a network of coastal base stations (each station is equipped with a Freewave radio modem). The primary motivation for doing this was to lower operational cost by reducing Iridium usage which constitutes a significant portion of operational costs for the vehicle. Our protocol allows the transfer of large files at high speed, provid- ing an improvement of∼24x over Iridium-based communications. This allows the gliders to spend less time at the surface thus improving operational safety; an im- portant consideration in a busy coastal area such as the SCB. The communication protocol was designed from the ground up to handle multiple gliders communicat- ing with multiple Freewave sites - a feature which is not easily possible with the commercial off-the-shelf Slocum gliders. 197 900MHz Radio File- Transfer Module Glider-Base communications Module Glider-console parser and command/status module File-Txfer via Z- modem Glider-Gumstix communications Module Glider Radio Modem Iridium Satellite Internet Phone Modem Base Station with Radio Modem and Internet Iridium Tranceiver Station Glider Computer (Flight Persistor) Gumstix Satellite Data-Link Slocum Glider at Sea Surface Status Commands Files Glider-Base communications Module Base-Station Computer Base to GliderServer Communications Module Glider-Control Server Glider-Server to GliderClients Module Central Server with Internet Web Interface Internet Iridium serial-data Freewave Serial Data 3-axis Accelero- meter Figure C.1: A Block diagram depicting the communication system and the shore- based Glider Mission control server which allows users to plan missions and view the status of the glider through a web-based interface. 198 Figure C.1 (based on an earlier version in [19]) shows how a glider at the surface can use either its Iridium satellite phone to communicate directly with a modem connected to the shore-side glider control server or it can communicate with the server via shore-based base stations using its Freewave radio modem. The figure also gives a logical overview of the typical data flow between the glider’s main computer and the communication logic running on the glider’s secondary computer which we have added to the commercial off-the-shelf system. Figure C.2: A deployed Slocum glider. Note the nose-down posture with the tail elevated to facilitate communication. In this picture, the antenna is located ap- proximately 30 cm above the sea surface. In a typical deployment where the glider’s antenna is approximately 30 cm above the sea surface during a surfacing, waves near the glider can cut off visibility of the shore based antenna. However there are short periods of time when the glider may be riding the crest of a wave which is locally higher than others, thus giving it line-of-sight (LOS) to the base station. Since sea waves are periodic, we expect to periodicallyobtaincommunicationbetweenthegliderandthebasestationwhenthe glider is at such locally elevated positions. A long term objective of this research 199 is to exploit the periodicity of these drops in LOS (and hence the link itself) to adapt packet size to improve the link-state protocol between the glider and the base stations. Base Station Antenna Ha LOS line between antennae Typically only height of antennae and curvature of the earth affect LOS range Near horizon, small waves may obstruct LOS Communication Range Sea surface Curvature of earth Glider at Surface Figure C.3: Schematic depiction of glider to shore line-of-sight (LOS) obstruction due to waves. (not to scale) The Freewave modem [73] we use in the experiments described here has in-built buffering and re-transmissions to ensure a reliable modem to modem link. If the link between the modems drops long enough, the modems lose carrier. In this case it takes longer for successful communication to take place. While our ultimate goal is to determine appropriate communication parameters from test packets and accelerometry, here we discuss progress in the first step to- wards this goal. We present results from recent field tests to measure the impact of sea state on communications. These measurements establish that the useful ter- restrial radio communication range depends strongly on the vehicle to base station distance as well as the local sea state experienced by the vehicle. Our analysis suggests that it is possible to predict communication quality from the sea state at 200 the vehicle’s location and the distance between the vehicle and the nearest base station. This, in turn, enables the vehicle to make a decision on whether it should use its short range radio or its satellite modem. We discuss how this finding enables online modification of low-level communication protocol parameters (e.g., packet size, in analogy with GSM networks [74]) C.1 HardwareModificationsandCommunication Protocol Design In this section we describe, in brief, modifications performed on a commercial off- the-shelf glider to enable them to use our communication protocol as well as to collect accelerometer data to assess the effect of sea surface conditions on our com- munication system, in a minimally intrusive manner. Next, we describe the com- munication protocol used to communicate between a glider and a base-station it is communicating with. C.1.1 Hardware Modifications Our communication experiments required making minor modifications to the hard- ware on the glider. We added a low-power Linux-based computer to the glider called the Gumstix Verdex which sits between the glider’s main flight computer (the Persistor) and the radio modem. The Gumstix appears as a Freewave modem 201 to the Persistor. It takes a serial bit stream from the Persistor and packetizes it before sending on to the Freewave radio using the protocol described in [19]. We developed this protocol to enable the operation of multiple gliders in a region where the same base station might end up servicing several gliders at the surface as and when they make (break) connections with it. We installed a Phidget 3-axis accelerometer on the glider, which is sampled at 50Hz. The accelerometer is interfaced with the Gumstix via USB, and is mounted at the rear of the vehicle (Fig. C.4). The Gumstix shares its power supply with the Freewave modem, and is powered up only when the glider is at the surface and transmitting data. This helps to keep power consumption to a minimum, since the glider does not typically surface very often during normal mission operation. When the glider detects that it is about to surface, (based on readings from its pressure transducer), it switches on the Freewave modem (and consequently the Gumstix and accelerometer). The Gumstix boots up and executes our communica- tion code, which attempts to look for a carrier on the Freewave modem (indicative that the glider can connect to a shore-based station). Figure C.5 shows the typical procedure that the Gumstix performs during a glider surfacing. We programmed the Gumstix to perform communication-related tests over two field deployments conducted in May 2009 and September 2009. During the trials in May 2009, the goal was to test the feasibility of using the shore station-based communication network and to determine file transfer speeds. In September 2009, 202 Figure C.4: The glider Persistor, additional Gumstix computer which runs our communication protocol stack, and accelerometer as installed on the USC Slocum glider. the goal was to concurrently record accelometry and communication quality (using test packet sequences) and develop some insight into the dependence of communi- cation quality on location and local sea state. In time, we expect to use these data to develop algorithms for modifying communication parameters (e.g., packet size) in real time based on local sea state at each surfacing to improve glider to shore communication. C.1.2 The Communication Protocol Used on the Vehicle The communication protocol we have implemented on the glider is a simplified lightweight version of TCP with selective acknowledgement. Outgoing packets are queued for transmission while incoming packets are buffered and analyzed for er- rors before being processed. Each valid packet contains a header which contains a packet identifier field used to distinguish between data and (various) command 203 packets. In our implementation these could be file transfer packets, console pack- ets, glider status packets, acknowledgement packets or test packets. In the case of some packets we want to ensure reliable transmission. These are placed in a re-transmission packet queue to be re-transmitted in the event of them not being acknowledged before the retransmission timeout (RTO) occurs. The acknowledge- ment is based on a selective acknowledgement scheme. Packets that need not be reliably transmitted are not copied to the re-transmission queue. When the modems have difficulty communicating, we reduce the packet sizes to improve the chances of a successful transmission. We refer to this as varying the Maximum Transmission Unit (MTU). Acknowledgement packets provide feedback to vary the MTU and the inter-packet transmission time. Successful transmissions (indicated by acknowledgement packets) are used to increase the MTU while re- ducing the inter-packet transmission time. On the contrary if we timeout on the acknowledgement of packets, we reduce the MTU and increase the inter-packet transmission time. An increase in the inter-packet transmission time provides the radio modems more time to attempt re-transmission of data using their own inter- nal protocol. The radio modem uses small packet sizes, hence we need to develop our own protocol for communication. As long as the communicating modems have a persistent LOS this model for com- munication tends to work fairly well. We have been able to utilize the commu- nications computer (Gumstix) to perform additional processing to zip data files 204 from the glider to reduce file sizes drastically. We can also multiplex file transfers between the glider while simultaneously being able to command the Gumstix to perform other tasks such as compressing data files or retrieving new files from the Glider’s computer (Persistor). We point out that since the Gumstix is powered up along with the radio modem, (which happens only at the surface), we still require to transmit data between the glider persistor and the Gumstix. This transmission is relatively fast since there is a reliable wired data link between these two computers. After a single file begins transferring between the glider and a base station, subse- quent files can be transferred between the glider and the Gumstix in parallel with transmissions from the Gumstix to the base stations. We have also implemented capabilities to differentially build data files to prevent the need to initiate a new file transfer if the file was partially transmitted during a previous data transfer at- tempt. This feature (which was already present in the glider Persistor’s Z-modem protocol) is extremely useful during file transfers. File transfers are typically high bandwidth applications and tend to take up most of the bandwidth available on the modems. To ensure fairness as well as the abil- ity to provide higher reliability for important packets (e.g., control packets) we also provide a priority field in the packet header. The packet transmitter sorts the outgoing packets based on priority to attempt to transmit the higher priority packets before the lower-priority packets. File transfer packets are usually assigned a lower priority than command packets. This also provides fairness and improves 205 the interaction with the glider during file transfers by improving the speed of the relay of these packets. As the radio modems move further apart, there is a natu- ral degradation in the signal quality and consequently a reduction in the available bandwidth between the modems. Local surface waves intermittently obscure the LOS between the glider and base station radio antennae. This is primarily due to the glider antenna being too close to the surface of the sea. Finally, we note that the curvature of the earth also affects the LOS and limits or reduces the bandwidth for communication between the modems. C.2 Data Collection and Processing We tested the modified glider in the SCB during May 2009 off the Santa Catalina island in California, USA. The goal was to measure coverage and quality, i.e., to build a spatial map of communication quality. We conducted a second set of tests during September 2009 where the glider was modified with the addition of the accelerometer to collect information about the sea state while concurrently measuring communication quality. This section describes the data collection and processing involved to compute features used to classify communication quality based on accelerometer data. 206 UnlikeNationalDataBuoyCenter(NDBC)buoyswhichgenerallyrelyon20minute time series at 2 Hz, we collect shorter accelerometer time series during glider surfac- ings. The surfacing event durations range between 3 and 10 minutes during which we sample the accelerometer at 50 Hz. The glider experiences roll, pitch and yaw at the surface due to its hydrodynamic interactions with the water. In Section C.2 we describe an algorithm for processing accelerometer timeseries to obtain acceler- ation in the heave axis, from which wave parameters are computed using standard techniques. As reference we compare these estimates with those from the nearest NDBC buoy (approximately 25 km away). C.2.1 Data Gathering at Sea In the course of the May 2009 tests for communication quality and coverage (pri- marilyfiletransfersatdifferentsurfacinglocations)wenoticedsignificantvariations in performance at comparable glider to base station distances. This variation led to the hypothesis that sea state was a significant factor that affected communication quality [19]. A second set of tests were conducted in September 2009 during which the glider was programmed, upon surfacing, to initiate transmission of a special sequence of bytes(describednext), beforeswitchingbackintoaregularcommunicationwiththe base stations using data packets following the communication protocol described 207 in Section C.1. During these tests accelerometry in all 3 axes was logged onboard the glider. During the field trials in May 2009, we collected data to build a coverage map of a region around a radio modem base station installed at a height of approximately 70 m on Catalina island. We collected three basic types of communication metrics which are shown in Figs. C.6, C.7, C.8 and C.9. These data are interpreted in Section C.3. C.2.2 Accelerometer Data Processing Pipe-line During the September 2009 tests, the glider collected accelerometer data while si- multaneouslytransmittingtestsequencesontheFreewaveradio. Thetestsequences were designed to allow us to understand the quality of communication between the glider and the base station at various surfacing locations and sea states. Here we describe the steps involved in processing the accelerometer data to arrive at the wave parameters in which we are interested. While this is a standard procedure and is well documented in standard wave mechanics texts for wave gauges and buoys (see [75]), we include it here to facilitate the reader. Our accelerometer is programmed to sample each axis if it detects a minimum acceleration of 0.025 g. Since the accelerometer data is sampled asynchronously, we use a zero-order-hold while resampling the data at 50 Hz to obtain the measured time-series. We smooth the data using a low-pass filter. If we call the smoothed 208 accelerometer time-series data as a(nΔt) where, n is the sample index and Δt is the time-step between subsequent samples, then we segment this data into J segments of length L. Each segment is a(j,nΔt) where a represents the values in each segment and j is the segment number. The segments are divided as shown below based on the index j. 1 a(1,nΔt) =a(nΔt),n = 0...,L− 1 2 a(2,nΔt) =a(nΔt),n = L 2 ,... 3L 2 − 1 . . . . . . J a(J,nΔt) =a(nΔt),n = (J− 1) L 2 ,..., (J + 1) L 2 − 1, (C.1) where J = 2( N− N 2 L ). (C.2) We remove the mean from the accelerometer data for each segment. After the removal of the means, we obtain wave spectra by correcting for roll and pitch using the glider’s tilt sensor. Next we perform spectral leakage reduction using a Hanning window, which is a cosine bell taper over each data segment. The bell window is given by W (nΔt) = 1 2 (1− cos( 2πn L ))), 0≤n≤L− 1, (C.3) where L is the record or data segment length. 209 The data is then multiplied with the Hanning window a w (j,nΔt) =a(j,nΔt)∗W (nΔt) (C.4) Next, we calculate the Fast Fourier Transform for the data. The Fast Fourier Transform provides the frequency domain representation, A of the measured time- series, a (or a w with the Hanning window). A(j,mΔf) = Δt L−1 X n=0 a w (j,nΔt)e −i 2πmn L , (C.5) where m = 0, 1, 2,··· , L 2 L even, m = 0, 1, 2,··· , L−1 2 L odd. The real and imaginary parts of A are given by Re[A(j,mΔf)] = Δt L−1 X n=0 a w (j,nΔt) cos( 2πmn L ), and Im[A(j,mΔf)] = −Δt L−1 X n=0 a w (j,nΔt) sin( 2πmn L ). 210 We obtain spectral estimates at Fourier frequencies, mΔf, where the interval be- tween the frequencies is given by Δf = 1 LΔt . (C.6) Power Spectral Density (PSD) estimates for the j th segment are obtained from S aa (j,mΔf) = A ∗ (j,mΔf)A(j,mΔf) LΔt (C.7) where A ∗ is the complex conjugate of A. In particular we are interested in the Co-Spectral Density given by C aa (j,mΔf) = Re[A]Re[A] +Im[A]Im[A] LΔt (C.8) The Final co-spectral estimate is obtained by C aa (mΔf) = 1 J J X j=1 C aa (j,mΔf) (C.9) Now, we compute some of the standard wave parameters which are usually mea- sured by wave-gauge buoys, which are the significant wave-height, peak time period and average time period. The first of these parameters is the significant wave height 211 (H mo or H s ), which is calculated from the wave elevation variance (also the zero moment, m 0 of a non-directional wave spectrum) using H s = 4.0 √ m 0 (C.10) where m 0 = n=1 X N C 11 (f n )df n . (C.11) The peak time period is given by: T p = 1 f p (C.12) where f p (the peak frequency) is the frequency at which the spectral wave energy density is a maximum. The average period T av is calculated by the following T avg = m 0 m 1 . (C.13) The spectral moments are given by: m i = n=1 X n b f i n C 11 (f n )df n , i=0,1. (C.14) 212 C.2.3 Communication parameters processing During the September 2009 trials, the glider was programmed to begin transmitting a sequence of bytes at the maximum baud-rate possible. It makes five attempts to transmit a sequence of bytes from 0-65535, and 2000 padding bytes around the sequence. We chose this sequence because we were able to detect dropped bytes and by cycling through these bytes, we are able to detect successful and unsuccessful parts of the transmission. While transmitting this data, the glider is simultaneously collecting accelerometry which is used to compute the wave parameters described in Section C.2.2. The base station is programmed to continuously monitor its Freewave modem for carrier or incoming data. Any new activity at the base station is assumed to be from the glider, since it has the only Freewave modem programmed to be able to communicate with the base station during the course of the experiment. The base station stores any change to its carrier detect state, as well as logs incoming byte streams with timestamps. We sample bytes collected from the incoming byte stream every 10 ms. The data rate between the radio modem and the base station computer is 115200, which translates into a maximum of 115 bytes being received between any two samples. If we designate a glider surfacing as the time between obtaining a carrier detect on the modem, and losing carrier detect for the final time (during a particular glider surfacing episode), we can compute the total time carrier detect was high during 213 this period of time. We use this ratio (expressed as a percentage of surfacing time) as one of the indicators for communication quality. Carrier detect is a necessary but not sufficient indicator of successful delivery of the data payload. The most useful communication parameter we can measure is the link state ratio. This is the ratio of the time we had a persistent communication link between the glider and the base station to the total surfacing time. Since the only way to ensure there is a link between the two is through the exchange of data packets, the gran- ularity of this measure is governed by the maximum time between transmission of any two packets from the glider. If the glider has nothing to transmit during a sur- facing (e.g., files or console packets), it tries to transmit a status packet containing health information (e.g., position, battery voltage, mission name etc. ) to the base station every 5 seconds. The link is assumed to be lost if no packet is received from the glider at the base station for more than 10 seconds. For many of the glider surfacings at distances greater than 12 km, despite good carrier detect ratios, the link state percentage is low. Another indicator of the communication quality is a count of the number of times the modems had a drop on their carrier. Similarly we can also use the number of times the link was lost as an indicator of how poor the communication is. These can be used to indicate poor communication conditions. From the experiments we performed in September, we perform a correlation of the byte sequences received at the base station, to determine the total correlation, 214 number of bytes dropped or skipped during transmission, as well as the average correlated byte length. This gives us substantial information on how to vary the communication parameters at each transmission time. The average correlated byte length gives us an indicator for the MTU, while we can compute the length of time occupied by the average skipped count to adjust the retransmission timeout (RTO) to a better value for the given conditions. The wave parameters computed from the acceleration spectra are significant wave height (H s ), peak time period (T p ) and average time period (T avg ). We find that the wave parameters obtained using the accelerometer on the glider show a similar trend to those gathered from the nearest data buoy which is NDBC buoy 96222 which is located approximately 25 km from the region the glider was operated within, (see Fig. C.11). We trained a linear perceptron classifier (described in [76]) on a subset of the data collected at surfacing events to assess how well it predicts the communication quality, specifically link percentage ( the ratio of the time a persistent communication link was present to the total time at the surface). We notice from our experiments that values greater than 90% are good for file transfers. C.3 Experimental Results and Analysis We present the coverage maps from the May 2009 tests for the Carrier-detect per- centage, Link-state percentage and file transfer percentages in Figs. C.6, C.7, C.8 215 Start Does Modem have Carrier? Force Iridium Phone Call and wait until Glider Dives again Stop Collect Accelerometer Data Process Accelerometer & GPS data in Classifier Adapt Transmission Scheduling, Packet sizes, RTT parameters etc. Transmit & Receive Packets Are we done communicating? N Y Y N GLIDER DIVES GLIDER AT SURFACE Figure C.5: Flowchart for a typical glider surfacing with the Gumstix and ac- celerometer additions. Figure C.6: Carrier detect expressed as a percentage of time with modem-modem carrier detect on from the May 2009 measurements. Some glider surfacing events had poor carrier detect in spite of being closer to the base station than other locations. This motivated us to use an accelerometer on the glider to test if the sea state was responsible for the poor communication performance. 216 Figure C.7: Protocol link state expressed as a percentage of time with protocol link on during the May 2009 trials. The link state is a ratio of the total time the glider had communication to the total time spent by the glider at the surface. Links closer to the base station are generally better although there are exceptions. Having carrier detect between modems does not imply a good link state. Figure C.8: File transfer speeds in bytes/s from the May 2009 trials. In these trials, we used a standard set of files as a payload to measure file transfer performance after the glider surfaces. No data compression was performed before transmission. For these trials, we used conservative estimates on retransmit timeouts. 217 FigureC.9: Protocollink-stateexpressedasapercentagevs. distance(km)between the glider and the nearest base station. Our communication protocol gives a link- state of>90% up to 12 km with a base station located at a height of 70 m above the sea surface. We notice that there are instances of poor communication at ranges under 10 km. We hypothesize that these are due to surface conditions that affect communication between the glider and the base station. respectively. We also present a scatter plot for the link state percentage vs. dis- tance for all the observations made during the May 2009 tests. Figure C.10 shows a link state coverage map from the September 2009 tests. Just as we observed for the tests in May, we observe that there are regions of lower performance closer to the base station compared to some regions further away from the base station. Figure C.11 compares the significant wave height (H s ) obtained by processing our accelerometer data, with that obtained from NDBC buoy 96222 located 25 km from mostoftheglidersurfacings. Thisbuoyisthenearestreferenceprovidingwavedata to the region where our glider experiments were performed. At the time of writing this paper, preliminary analysis compares significant wave height at glider surfacing location directly with that at the buoy during the same time interval. An analysis is 218 ongoing to provide better comparisons by incorporating the direction and velocity of the waves. The general trend in the significant wave-height measured at the buoy and the glider is similar, although due to differences in the direction of the sea-waves, significant distance between the observations at the buoy and the glider, as well as the smaller time-window of accelerometer data-collection on the glider, variations may be expected. Figure C.10: Link state obtained from the September 2009 glider deployment. While the link state is generally very good within 8 km, we find several surfacing locations where the link state was poor nearer the base station on Catalina island. Figure C.12 depicts the scatter plot of the correlated byte length and the distance between glider surfacing locations. We observe that there is a significant drop in the average correlated byte length from 2200 bytes at a range of 2.8 km to under 1000 bytes beyond 4 km. The trend from this point onwards is a slow decline beyond 4 km. We notice that there is significant spread in the average correlated 219 Figure C.11: Comparison of H s from the glider accelerometer and NDBC Buoy 96222locatedapproximately25kmaway. Thegeneraltrendinthesignificantwave- height measured at the buoy and the glider is similar, although due to differences in the direction of the sea-waves, significant distance between the observations at the buoy and the glider, as well as the small time-window of accelerometer data- collection on the glider, variations may be expected. 2000 4000 6000 8000 10000 12000 14000 16000 0 500 1000 1500 2000 2500 Distance from Base−station (m) Average Correlation byte−length Correlation−sequence data Cubic data−fit Figure C.12: Scatter plot showing the average correlated byte length and the dis- tance (m) between glider surfacing location and the base station. A cubic data fit is shown to highlight the trend followed by the data as distance increases. From the plot it is obvious that there is a sharp drop in correlated byte lengths beyond 3 km. There is considerable spread in the correlation scores between 5 km and 11 km, which we will investigate further. Beyond 12 km, the average correlated byte length is under 300 bytes. 220 2000 4000 6000 8000 10000 12000 14000 16000 −0.5 0 0.5 1 1.5 2 2.5 Distance from Base−station (m) Average byte−loss time (sec) Correlation−sequence data Linear data−fit Figure C.13: Scatter plot showing the average time occupied by dropped bytes in the received correlation sequence. This time provides us with a figure that includes delays due to transmission errors, modem-modem retransmissions, inter- mittent losses in carrier detect, line-of-sight between modems and so on. We see an increasing trend (as expected), with a large spread as glider-surfacing locations beyond 8km become progressively more distant from the base station. byte sequence length between 4 km and 10 km. We speculate that this is related to the conditions at the sea surface affecting the performance of the radio modem on the glider and its antenna. Beyond 8 km, there is a continued drop in the average byte length which falls to approximately 250 bytes beyond 12 km. We see several cases where the average correlated byte length is almost zero beyond 8 km. Figure C.13 is a scatter plot of the average time occupied by dropped bytes during the transmission of the correlation sequence. This serves as a measure of how much time is wasted during a typical transmission due to a combination of various factors which may include loss in LOS between the modems, a drop of carrier between modems, errors in reception, delays due to retransmission between modems, bytes 221 dropped due to buffer overruns in the modem and so on. We see an increasing trend (as expected), with a large spread as glider-surfacing locations beyond 8 km become progressively more distant from the base station. Our approach here is to treat the modems as a black box, and this is why we do not explicitly model the effects of each factor that might result in transmission delays. By looking at Fig. C.12 and Fig. C.13, we see that the overall throughput of trans- mission must drop very quickly with distance. This is because there is a significant increase in the amount of time between successful transmissions of data bytes, while there is a sharp drop in the number of successfully transmitted bytes at increasing distances between the modems. This translates into a sharp drop in data through- put, because the modems begin taking longer to transmit smaller amounts of data. If the glider attempts to transmit large files at the surface, it might risk staying at the surface for an unduly long period of time, which is undesirable. Figure C.14 is a scatter plot of the maximum throughput achievable by the glider by measuring the total number of bytes from the transmitted sequence successfully obtained at the base station, without any retransmissions from the glider side. This transmission speed is therefore governed primarily by the maximum transmission rate which the modems can handle, and it goes down with increasing distance. The theoretical maximum transmission rate between two Freewave modems such as those on the glider and base station is 11.5 KB/s. In practice, with the glider we have been testing and our base station the maximum throughput we have achieved 222 2000 4000 6000 8000 10000 12000 14000 16000 0 0.5 1 1.5 2 2.5 3 3.5 4 4.5 5 Distance from Base−station (m) Max throughput measured (KB/sec) Gllider−Base−stationT hroughput data cubic data−fit Figure C.14: Scatter plot showing the maximum throughput between the glider and the base station from the received correlation sequence. Each data point is based on the achieved throughput at the base station based on the ratio of the total data successfully transmitted in the given time. This is the maximum throughput we can expect to achieve, since the modems cannot be made to transmit any faster. In practice, we have lower throughput because we need to retransmit data packets that have errors in them, due to which the rate goes down. is approximately 5 KB/s for distances under 3 km. The throughput goes down to approximately 2 KB/s in good sea conditions between 4 km and 9 km. Beyond 9 km, there is a gradual drop in throughput. Table C.1: Comparison of classifier performance on link-quality Training Range >6 km Range >6 km Range >6 km data-set Link <30% Link >30% Link >90% A 91.14% 90.91% 91.19% B 90.97% 91.33% 89.40% C 94.39% 93.62% 93.79% Tables C.1, C.2, C.3 show results of averaging the comparison of 30 trials using a classifier with data from 206 glider surfacings for A - classification with range only, B - classification with T p , T avg , H s only, and C - classification with T p , T avg , H s 223 Table C.2: Comparison of classifier performance on link-quality Training Range <8 km Range <8 km Range <8 km data-set Link <30% Link >30% Link >90% A 97.25% 97.83% 96.35% B 98.62% 97.14% 98.36% C 98.20% 97.35% 97.72% Table C.3: Comparison of classifier performance on link-quality Training Range >8 km Range >8 km Range >8 km data-set Link <30% Link >30% Link >90% A 90.62% 91.99% 90.43% B 91.99% 91.34% 89.67% C 94.56% 96.05% 91.67% and range (distance between the base station and the glider). We used 80% of the permuted feature vectors for training the classifier and used the remaining feature vectors to assess its accuracy from the data we already have. We find that given processed wave parameters and a glider’s surfacing location, we can determine if we should use Iridium (Link< 30%), send out only a status report, or do transmission of files (Link > 90%). Range plays a major role as is also visible in Fig. C.10. C.4 Discussion Throughput is a good indicator of communication quality. It also serves as a good benchmark to decide where the cut-off between the use of Iridium or Freewave should be, since there is a direct comparison which is possible. The maximum transmission speed on Iridium is 240 byte/s. As we can see from the data, there 224 are several surfacings when the throughput is far lower than this - even at distances of approximately 5 km between the glider and the base station. Hence, although it appears that in general, we can perform file transfers via Freewave all the way up to 14 km, there are times when terrestrial communication is poor enough that Iridium even at a low data rate performs better. Some other key observations from the trials conducted are that the classifier has a very high accuracy rate in predicting how poor the communication quality is (98.62%), withouttheuseofrangeinthefeaturevectorfortraining. Similarly, there is a very high accuracy in predicting a good link quality between the two modems (98.36%) using accelerometer data only. Accelerometer data appears to capture the effect of some of the external conditions that might affect the link quality experienced by the glider during communication. In fact, the use of accelerometer data has a lower predictive error rate than the use of a classifier that uses range alone (as we can see from Table C.2). The use of the communication sequence yielded insights into the trend followed by communication throughput and even provided us with a means of choosing appro- priate values of MTU and RTO for use between the glider and the particular base station at Catalina island. Since the range achievable by an antenna is dependent on its elevation (among many other factors), coming up with a generalizable version of this will need further tests with antennas at multiple elevations. 225 For our antenna which is situated at an elevation of 70 m, the MTU can take the maximum value of 2000 bytes if the glider is closer to the base station than 3 km. Beyond this distance, a safe MTU is below 200 bytes (see Fig. C.12). Similarly, we find that the RTO can be very high at distances beyond 6 km. Up until 6 km, a reasonable value for the RTO is 1 second. Beyond 6 km, there are occasions when the RTO can be higher than 4 seconds. Such a high RTO is not practical for communication that relies on acknowledgements, while having low bandwidth. Hence, we would likely cap the RTO to a maximum of 2 seconds for acknowledged data transfers. C.5 Conclusion We find that wave parameters are useful in estimating expected quality of commu- nication. The sea state was generally calm during the time our data was collected, with significant wave heights between 0.4 m and 1.7 m, and peak time periods ranging between 2.86 sec and 22.22 sec. By knowing how well we can expect to communicate over the Freewave radio based on the sea state, the time at the surface can be better utilized. If the sea state indicates that the expected communication quality will be poor, file transmissions can be postponed to a future surfacing. Ac- celerometer time series from the glider at the surface can be utilized to predict the expected quality of communication with shore stations. While range plays a major role in predicting the expected communication quality, Table C.2 shows that for 226 ranges under 8 km, the classifier that relies on acceleration information is more accurate in predicting when the communication quality is expected to be poor. This provides evidence supporting the hypothesis that sea state plays a role in de- termining the quality of communication and can be used to enhance prediction of communication quality. 227 Reference List [1] A. F. Shchepetkin and J. C. McWilliams, “The Regional Oceanic Modeling System(ROMS):asplit-explicit, free-surface, topography-following-coordinate oceanic model,” Ocean Modelling, vol. 9, no. 4, pp. 347–404, 2005. [2] O. Schofield, J. Kohut, D. Aragon, E. L. Creed, J. Graver, C. Haldeman, J. Kerfoot, H. Roarty, C. Jones, D. C. Webb, and S. Glenn, “Slocum gliders: Robust and ready,” Journal of Field Robotics, vol. 24, no. 6, pp. 473–485, 2007. [3] S. Glenn and O. Schofield, “Growing distributed ocean observatory: Our view from the cool room,” Oceanography Magazine, vol. 22, no. 2, pp. 128–145, 2009. [4] R. N. Smith, J. Das, H. K. Heidarsson, A. A. Pereira, F. Arrichiello, I. Cetinic, L. Darjany, M.-E. Garneau, M. D. Howard, C. Oberg, M. Ragan, E. Seubert, E. C. Smith, B. Stauffer, A. Schnetzer, G. Toro-Farmer, D. A. Caron, B. H. Jones, and G. S. Sukhatme, “Usc cinaps builds bridges: Observing and monitoring the southern california bight,” IEEE Robotics and Automation Magazine, vol. 17, no. 1, pp. 20–30, Mar 2010. [Online]. Available: http://cres.usc.edu/cgi-bin/print_pub_details.pl?pubid=627 [5] K. P. Carroll, S. R. McClaran, E. L. Nelson, D. M. Barnett, D. K. Friesen, and G. N. Williams, “Auv path planning: An a* approach to path planning with consideration of variable vehicle speeds and multiple, overlapping, time- dependent exclusion zones,” in Proceedings of the 1992 Symposium on Au- tonomous Underwater Vehicle Technology, Washington DC, USA, June 1992, pp. 79–84. [6] C. Petres, Y. Pilhas, P. Patron, Y. Petillot, J. Evans, and D. Lane, “Path plan- ning for autonomous underwater vehicles,” IEEE Transactions on Robotics, vol. 23, no. 2, pp. 331–340, April 2007. [7] D. Kruger, R. Stolkin, A. Blum, and J. Briganti, “Optimal AUV path plan- ning for extended missions in complex fast-flowing estuarine environments,” in IEEE International Conference on Robotics and Automation, 2007, pp. 4265– 4270. 228 [8] J. Witt and M. Dunbabin, “Go with the flow: Optimal auv path planning in coastal environments,” in Australian Conference on Robotics and Automation, 2008. [9] J. Binney, A. Krause, and G. S. Sukhatme, “Informative path planning for an autonomous underwater vehicle,” in IEEE International Conference on Robotics and Automation, 2010, pp. 4791–4796. [Online]. Available: http://cres.usc.edu/cgi-bin/print_pub_details.pl?pubid=642 [10] J. Das, F. Py, T. Maughan, T. O’Reilly, M. Messie, J. Ryan, K. Ra- jan, and G. S. Sukhatme, “Simultaneous tracking and sampling of dynamic oceanographic features with autonomous underwater vehicles and lagrangian drifters,” in 12th International Symposium on Experimental Robotics, New Delhi, India, December 2010. [11] A. R. Soltani, H. Tawfik, J. Y. Goulermas, and T. Fernando, “Path planning in construction sites: performance evaluation of the Dijkstra, A*, and GA search algorithms,” Advanced Engineering Informatics, vol. 16, no. 4, pp. 291 – 303, 2002. [Online]. Available: http://www.sciencedirect.com/science/ article/B6X1X-497WRKV-5/2/4ec796c642888bff54cac457254d0e5d [12] L. D. Filippis, G. Guglieri, and F. Quagliotti, “A minimum risk approach for path planning of uavs,” Journal of Intelligent Robot Systems, vol. 61, pp. 203–219, 2011. [13] L. Merckelbach, “On the probability of underwater glider loss due to collision with a ship,” Journal of Marine Science and Technology, June 2012. [14] A. M. Pereira, J. Binney, B. H. Jones, M. Ragan, and G. S. Sukhatme, “To- ward risk aware mission planning for autonomous underwater vehicles,” in IEEE/RSJ Int. Conf. Intelligent Robots and Systems, 2011, pp. 3147–3153. [15] A. A. Pereira and G. S. Sukhatme, “Minimum-risk time-expanded planning for auvs using ocean current predictions,” CRES-USC, Center for Robotic Embedded Systems, CRES-Techreport CRES-11-002, 2011. [16] P. E. Hart, N. J. Nilsson, and B. Raphael, “A formal basis for the heuristic determination of minimum cost paths in graphs,” IEEE Trans. on Systems Science and Cybernetics, vol. SSC-4, no. 2, pp. 100–107, July 1968. [17] S.RussellandP.Norvig, Artificial Intelligence - A Modern Approach. Prentice Hall, 1995. [18] R. N. Smith, J. Kelly, Y. Chao, B. H. Jones, and G. S. Sukhatme, “Towards improvement of autonomous glider navigation accuracy through the use of regional ocean models,” in Proceedings of the 29th International Conference on Offshore Mechanics and Arctic Engineering (OMAE 2010), 2010. [19] A. A. Pereira, H. Heidarsson, C. Oberg, D. A. Caron, B. H. Jones, and G. S. Sukhatme, “A communication framework for cost-effective operation 229 of auvs in coastal regions,” in The 7th International Conference on Field and Service Robots, Cambridge, Massachusetts, Jul 2009. [Online]. Available: http://cres.usc.edu/cgi-bin/print_pub_details.pl?pubid=619 [20] A. A. Pereira and G. S. Sukhatme, “Estimation of wave parameters from accelerometry data to aid auv-shore communication,” in IEEE OCEANS, Sydney, Australia, May 2010. [Online]. Available: http: //cres.usc.edu/cgi-bin/print_pub_details.pl?pubid=656 [21] I. Chabini and S. Lan, “Adaptations of the A* Algorithm for the Computation of Fastest Paths in Deterministic Discrete-Time Dynamic Networks,” IEEE Transactions on Intelligent Transportation Systems, vol. 3, no. 1, pp. 60–74, March 2002. [22] E. W. Dijkstra, “A note on two problems in connexion with graphs,” Nu- merische Mathematik, vol. 1, pp. 269–271, 1959. [23] R. N. Smith, A. A. Pereira, Y. Chao, P. P. Li, D. A. Caron, B. H. Jones, and G. S. Sukhatme, “Autonomous underwater vehicle trajectory design coupled with predictive ocean models: A case study,” in IEEE International Conference on Robotics and Automation, 2010, pp. 4770–4777. [Online]. Available: http://cres.usc.edu/cgi-bin/print_pub_details.pl?pubid=638 [24] R. N. Smith, Y. Chao, P. P. Li, D. A. Caron, B. H. Jones, and G. S. Sukhatme, “Planning and Implementing Trajectories for Autonomous Underwater Vehicles to Track Evolving Ocean Processes based on Predictions from a Regional Ocean Model,” International Journal of Robotics Research, 2010. [Online]. Available: http://cres.usc.edu/cgi-bin/print_pub_details.pl? pubid=646 [25] A. R. Robinson, “Forecasting and simulating coastal ocean processes and vari- abilities with the harvard ocean prediction system,” Coastal Ocean Prediction, pp. 77–100, 1999. [26] D. R. Thompson, S. Chien, Yi-Chao, P. P. Li, B. Cahill, J. Levin, O. Schofield, A. Balasuriya, S. Petillo, M. Arrott, and M. Meisinger, “Spatiotemporal path planning in strong, dynamic, uncertain currents,” in IEEE International Con- ference on Robotics and Automation, 2010. [27] T. Lolla, M. P. Ueckermann, K. Yigit, P. J. H. Jr., and P. F. J. Lermusi- aux, “Path planning in time dependent flow fields using level set methods,” in IEEE International Conference on Robotics and Automation (ICRA 2012), RiverCentre, Saint Paul, Minnesota, USA, 2012, pp. 166–173. [28] J. Latombe, Robot Motion Planning. Boston, MA: Kluwer Academic Pub- lishers, 1991. [29] S. M. LaValle, Planning Algorithms. Cambridge, U.K.: Cambridge University Press, 2006, also available at http://planning.cs.uiuc.edu/. 230 [30] J. H. Reif, “Complexity of the mover’s problem and generalizations,” in IEEE Symposium on Foundations of Computer Science, 1979, pp. 421–427. [31] P. Fiorini and Z. Shiller, “Motion planning in dynamic environments using velocity obstacles,” The International Journal of Robotics Research, vol. 17, pp. 760–772, 1998. [32] D. Wilkie, J. van den Berg, and D. Manocha, “Generalized velocity obstacles,” in Intelligent Robots and Systems, 2009. IROS 2009. IEEE/RSJ International Conference on, oct. 2009, pp. 5573 –5578. [33] C. Fulgenzi, A. Spalanzani, and C. Laugier, “Dynamic obstacle avoidance in uncertain environment combining pvos and occupancy grid,” in Robotics and Automation, 2007 IEEE International Conference on, april 2007, pp. 1610 –1616. [34] L. Kavraki, P. Svestka, J. C. Latombe, and M. Overmars, “Probabilistic roadmaps for path planning in high-dimensional configuration spaces,” IEEE Transactions on Robotics and Automation, vol. 12, no. 4, pp. 566–580, 1996. [35] S.M.LaValleandJ.J.Kuffner, Algorithmic and Computational Robotics: New Directions. MA: A K Peters, Wellesley, 2001, ch. Rapidly-exploring random trees: Progress and prospects, pp. 293–308. [36] D. Ferguson and A. T. Stentz, “Anytime rrts,” in Proceedings of the 2006 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS ’06), October 2006, pp. 5369 – 5375. [37] M. Kobilarov, J. E. Marsden, and G. S. Sukhatme, “Global estimation in constrained environments,” The International Journal of Robotics Research, vol. 31, no. 1, pp. 24–41, 2012. [38] S. Lavalle and J. Kuffner, “Randomized kinodynamic planning,” International Journal of Robotics Research, vol. 20, no. 5, pp. 378–400, May 2001. [39] J. Das, F. Py, T. Maughan, T. O’Reilly, M. Messié, J. Ryan, G. S. Sukhatme, and K. Rajan, “Coordinated sampling of dynamic oceanographic features with AUVs and drifters,” International Journal of Robotics Research, Apr 2012. [Online]. Available: http://cres.usc.edu/cgi-bin/print_pub_details.pl? pubid=721 [40] J. Binney, A. Krause, and G. S. Sukhatme, “Optimizing waypoints for moni- toring spatiotemporal phenomena,” The International Journal of Robotics Re- search, 2013, (To appear in). [41] H. Choset and P. Pignon, “Coverage path planning: The boustrophedon de- composition,” in International Conference on Field and Service Robotics, Can- berra, Australia, December 1997. 231 [42] Y. Gabriely and E. Rimon, “Spanning-tree based coverage of continuous ar- eas by a mobile robot,” in IEEE International Conference on Robotics and Automation, 2001, pp. 1927–1933. [43] S. Hert, S. Tiwari, and V. Lumelsky, “A Terrain-Covering Algorithm for an AUV,” Autonomous Robots, vol. 3, pp. 91–119, 1996. [44] J. A. Sethian, “Evolution, implementation, and application of level set and fast marching methods for advancing fronts,” Journal of Computational Physics, vol. 169, pp. 503–555, 2001. [45] J. A. Sethian and A. Vladimirsky, “Ordered Upwind Methods for Static Hamilton-Jacobi Equations: Theory and Algorithms,” Society for Industrial and Applied Mathematics, vol. 41, no. 1, pp. 325–363, 2003. [46] H. C. Woithe, D. Boehm, and U. Kremer, “Improving Slocum Glider Dead Reckoning Using a Doppler Velocity Log,” in Proceedings of the IEEE Oceans 2011, 2011. [47] R. Grasso, D. Cecchi, M. Cococcioni, C. Trees, M. Rixen, A. Alvarez, and C.Strode, “Modelbaseddecisionsupportforunderwaterglideroperationmon- itoring,” in Proceedings of the IEEE/MTS Oceans 2010, 2010. [48] C. Fernández-Perdomo, J. Cabrera-Gámez, D. Hernández-Sosa, J. Isern- González, A. C. Domínguez-Brito, A. Redondo, J. Coca, A. G. Ramos, E. A. Fanjul, and M. García, “Path planning for gliders using Regional Ocean Mod- els: Application of pinzón path planner with the ESEOAT model and the RU27 trans-Atlantic flight data,” in Proc. IEEE OCEANS Conf., 2010, pp. 1–10. [49] M. Eichhorn, “Solutions for practice-oriented requirements for optimal path planning for the auv "slocum glider",” in Proceedings of the IEEE Oceans 2010, 2010, pp. 1–10. [50] M. Eichhorn and U. Kremer, “Opportunities to Parallelize Path Planning Al- gorithms for Autonomous Underwater Vehicles,” in Proceedings of the IEEE Oceans 2011, 2011. [51] S. Osher and J. A. Sethian, “Fronts propagating with curvature dependent speed: Algorithms based on Hamilton-Jacobi formulations,” Journal of Com- putational Physics, vol. 79, pp. 12–49, 1988. [52] J. G. Graver, “Underwater gliders: Dynamics, control and design,” Ph.D. dissertation, Princeton University, May 2005. [53] P. Bhatta, “Nonlinear stability and control of gliding vehicles,” Ph.D. disser- tation, Princeton University, Dept. of Mechanical and Aerospace Engineering, September 2006. [54] G. A. Hollinger, A. A. Pereira, and G. S. Sukhatme, “Learning uncertainty models for reliable operation of autonomous underwater vehicles,” in IEEE 232 International Conference on Robotics and Automation (ICRA 2013), 2013, (submitted to). [Online]. Available: http://robotics.usc.edu/~geoff/files/ HollingerICRA13a.pdf [55] P. F. Lermusiaux, “Uncertainty estimation and prediction for interdisciplinary ocean dynamics,” Journal of Computational Physics, vol. 217, pp. 176–199, 2006. [56] R. N. Smith, J. Kelly, K. Nazarzadeh, and G. S. Sukhatme, “An investigation on the accuracy of regional ocean models through field trials,” in Proceedings of the IEEE International Conference on Robotics and Automation, 2013. [57] A. A. Pereira, J. Binney, G. A. Hollinger, and G. S. Sukhatme, “Risk-aware path planning for autonomous underwater vehicles using predictive ocean models,” Journal of Field Robotics, vol. 30, no. 5, pp. 741–762, Oct 2013. [Online]. Available: http://robotics.usc.edu/publications/801/ [58] Mausam and A. Kolobov., Planning with Markov Decision Processes: An AI Perspective, ser. Synthesis Lectures on Artificial Intelligence and Machine Learning. Morgan and Claypool Publishers, June 2012. [59] S. Russell and P. Norvig, Artificial Intelligence - A Modern Approach, Sec- ond ed. Prentice Hall, ISBN 0-13-790395-2, 2003. [60] C.E.RasmussenandC.K.I.Williams,Gaussian Processes for Machine Learn- ing. The MIT Press, 2006. [61] M.A.ÁlvarezandN.D.Lawrence,“Computationallyefficientconvolvedmulti- ple output gaussian processes,” Journal of Machine Learning Research, vol. 12, pp. 1425–1466, 2011. [62] J. Yamamoto, “An alternative measure of the reliability of ordinary kriging estimates,” Mathematical Geology, vol. 32, no. 4, pp. 489–509, 2000. [63] J. Yamamoto and M. M. da Rocha, “Properties and applications of the in- terpolation variance associated with ordinary kriging estimates,” in Proc. Int. Symp. Spatial Accuracy Assessment in Natural Resources and Environmental Sciences, 2008, pp. 70–77. [64] Y.-H. Kim, D. Shell, C. Ho, and S. Saripalli, “Spatial interpolation for robotic sampling: Uncertainty with two models of variance,” in Proc. Int. Symp. Ex- perimental Robotics, 2012. [65] S. Vasudevan, F. T. Ramos, E. W. Nettleton, and H. F. Durrant-Whyte, “Gaussian process modeling of large scale terrain,” J. Field Robotics, vol. 26, no. 10, pp. 812–840, 2009. [66] S. Koenig and M. Likhachev, “D* lite,” in Proceedings of the Eighteenth Na- tional Conference on Artificial Intelligence (AAAI). AAAI, 2002. 233 [67] A. T. Stentz, “The focussed d* algorithm for real-time re-planning,” in Pro- ceedings of the International Joint Conference on Artificial Intelligence, 1995, pp. 1652–1659. [68] K. Szwaykowska and F. Zhang, “A lower bound on navigation error for marine robots guided by ocean circulation models,” in Proc. IEEE/RSJ Int. Conf. Intelligent Robots and Systems, 2011, pp. 3584–3588. [69] C. J. Willmott, R. E. Davis, J. J. Feddema, K. M. Klink, D. R. Legates, C. M. Rowe, S. G. Ackleson, and J. O’Donnell, “Statistics for the evaluation and comparison of models,” Journal of Geophysical Research, vol. 90, no. C12, pp. 8995–9005, 1985. [70] M.C.Kerman, W.Jiang, A.F.Blumberg, andS.E.Buttrey, “Acomparisonof robust metamodels for the uncertainty quantification (UQ) of New York Har- bor oceanographic data,” Journal of Operational Oceanography, vol. 1, no. 2, pp. 3–13, 2008. [71] D. Thompson and D. Wettergreen, “Intelligent maps for autonomous kilometer-scale science survey,” in Proc. Int. Symp. Artificial Intelligence, Robotics and Automation in Space, 2008. [72] S. Dragiev, M. Toussaint, and M. Gienger, “Gaussian process implicit surfaces for shape estimation and grasping,” in IEEE Int. Conf. Robotics and Automa- tion, 2011. [73] F. Technologies, “Spread spectrum wireless data traceiver user manual,” Free- Wave Technologies Inc., Tech. Rep. 6.3, 2005. [74] R. Ludwig, A. Konrad, A. D. Joseph, and R. H. Katz, “Optimizing the end- to-end performance of reliable flows over wireless links,” Wireless Networks, vol. 8, no. 2-3, pp. 289–299, November 2002. [75] R. Brown, J. Baker, and J. McCall, “Nondirectional and directional wave data analysis procedures,” National Data Buoy Center, NOAA, Stennis Space Cen- ter, Tech. Rep. 01, January 1996. [76] R. O. Duda, P. E. Hart, and D. G. Stork, Pattern Classification, 2nd ed. A Wiley-Interscience Publication, 2001. 234
Abstract (if available)
Abstract
Path planning is the process of generating an optimal sequence of waypoints from a start configuration to a desired goal configuration under constraints (e.g., avoiding obstacles, respecting time/energy budgets). In this thesis, we study the problem of risk-aware planning. Specifically, we design, develop, and experimentally validate optimal paths for Autonomous Underwater Vehicles (AUVs) in the open ocean in the presence of navigational hazards such as ships and other obstacles. A novel aspect of this work is the introduction of ocean current predictions to optimize planning in such settings. This is challenging because current predictions are typ- ically available at non-uniform spatial resolution, noisy, and time-delayed. We designed three risk-aware planners that reason probabilistically about the uncer- tainty in ocean currents predictions. The minimum expected risk planner ensures that the AUV always reaches the goal, while minimizing risk along the way, Risk- aware Markov Decision Process-based planning uses stationary models over a short horizon, and trades off between goal-directed behavior and reducing risk. This is susceptible to finding sub-optimal policies due to stationarity. The non-stationary, risk-aware MDP makes use of variability in the currents where possible to overcome high-risk sections of paths on the way to the goal. In addition to these planners, we develop a taxonomy for risk-aware planning in dynamic settings. Another key contribution is learning the uncertainty in currents to improve planning. Results from extensive simulations clearly show that learning uncertainty helps significantly improve performance of risk-aware planners in uncertain currents, allowing AUVs to be operated in more challenging scenarios than was previously possible. Finally, the planners described in this dissertation have been field tested at unprecedented levels to validate their practical utility (∼2000 hours of testing at sea).
Linked assets
University of Southern California Dissertations and Theses
Conceptually similar
PDF
Navigation and guidance of an autonomous surface vehicle
PDF
Data-driven robotic sampling for marine ecosystem monitoring
PDF
Informative path planning for environmental monitoring
PDF
Multi-robot strategies for adaptive sampling with autonomous underwater vehicles
PDF
Learning objective functions for autonomous motion generation
PDF
Any-angle path planning
PDF
Discrete geometric motion control of autonomous vehicles
PDF
Bayesian methods for autonomous learning systems
PDF
Speeding up trajectory planning for autonomous robots operating in complex environments
PDF
Optimization-based whole-body control and reactive planning for a torque controlled humanoid robot
PDF
Intelligent robotic manipulation of cluttered environments
PDF
Information theoretical action selection
PDF
Speeding up path planning on state lattices and grid graphs by exploiting freespace structure
PDF
Iterative path integral stochastic optimal control: theory and applications to motor control
PDF
Data-driven autonomous manipulation
PDF
Incremental search-based path planning for moving target search
PDF
Large-scale path planning and maneuvering with local information for autonomous systems
PDF
Target assignment and path planning for navigation tasks with teams of agents
PDF
Learning from planners to enable new robot capabilities
PDF
A robotic system for benthic sampling along a transect
Asset Metadata
Creator
de Menezes Pereira, Arvind Antonio
(author)
Core Title
Risk-aware path planning for autonomous underwater vehicles
School
Viterbi School of Engineering
Degree
Doctor of Philosophy
Degree Program
Computer Science
Publication Date
11/26/2013
Defense Date
09/06/2013
Publisher
University of Southern California
(original),
University of Southern California. Libraries
(digital)
Tag
artificial intelligence,autonomous underwater vehicles,gliders,OAI-PMH Harvest,path planning
Format
application/pdf
(imt)
Language
English
Contributor
Electronically uploaded by the author
(provenance)
Advisor
Sukhatme, Gaurav S. (
committee chair
), Caron, David A. (
committee member
), Schaal, Stefan (
committee member
)
Creator Email
arvind.pereira@gmail.com,menezesp@usc.edu
Permanent Link (DOI)
https://doi.org/10.25549/usctheses-c3-351180
Unique identifier
UC11295757
Identifier
etd-deMenezesP-2184.pdf (filename),usctheses-c3-351180 (legacy record id)
Legacy Identifier
etd-deMenezesP-2184.pdf
Dmrecord
351180
Document Type
Thesis
Format
application/pdf (imt)
Rights
de Menezes Pereira, Arvind Antonio
Type
texts
Source
University of Southern California
(contributing entity),
University of Southern California Dissertations and Theses
(collection)
Access Conditions
The author retains rights to his/her dissertation, thesis or other graduate work according to U.S. copyright law. Electronic access is being provided by the USC Libraries in agreement with the a...
Repository Name
University of Southern California Digital Library
Repository Location
USC Digital Library, University of Southern California, University Park Campus MC 2810, 3434 South Grand Avenue, 2nd Floor, Los Angeles, California 90089-2810, USA
Tags
artificial intelligence
autonomous underwater vehicles
path planning