Close
About
FAQ
Home
Collections
Login
USC Login
Register
0
Selected
Invert selection
Deselect all
Deselect all
Click here to refresh results
Click here to refresh results
USC
/
Digital Library
/
University of Southern California Dissertations and Theses
/
Speeding up trajectory planning for autonomous robots operating in complex environments
(USC Thesis Other)
Speeding up trajectory planning for autonomous robots operating in complex environments
PDF
Download
Share
Open document
Flip pages
Contact Us
Contact Us
Copy asset link
Request this asset
Transcript (if available)
Content
Speeding Up Trajectory Planning for Autonomous Robots Operating in Complex Environments by Pradeep Rajendran A Dissertation Presented to the FACULTY OF THE GRADUATE SCHOOL UNIVERSITY OF SOUTHERN CALIFORNIA In Partial Fulllment of the Requirements for the Degree DOCTOR OF PHILOSOPHY (Mechanical Engineering) December 2019 Copyright 2020 Pradeep Rajendran Acknowledgements First and foremost, I would like to thank Professor Satyandra K. Gupta for being my doctoral advisor. His guidance, patience and constructive critique has been very instrumental in shaping my research philosophy and my approach to solving new problems. I am very fortunate to have an understanding and caring advisor. There are numerous valuable lessons I learned from him. These lessons will continue to guide me throughout my career. Also, I would like to thank professors on my thesis committee: Professor Mitul Luhar and Professor Sven Koenig. I appreciate their time and eort in going over all the work I did and giving me their input. I would also like to thank the National Science Foundation (NSF grants 1634433,1634431,1526016). Without their nancial support, this work would not have been pos- sible. Next, I would like to thank the professors who taught me at University of Maryland, University of Wisconsin and Hong Kong University of Science and Technology. I would like to thank all of my labmates for their pleasant company and numerous stimulating discussions throughout the past few years. Specically, I would like to thank Ariyan Kabir, Di Zeng, Brual Shah, Shaurya Shriyam, Sarah Al-Hussaini, Shantanu Thakar, Prahar Bhatt, Yeo Jung Yoon, Aniruddha Shembekar, Rishi Malhan and Jason Gregory. Finally, I would like to thank my parents and my sister who were always available and gave words of encouragement and support when I needed it most. I want to thank my wife. She has always given words of encouragement, put up with my absent-mindedness and taken on more than fair share of life's many responsibilities. ii Table of Contents Acknowledgements ii List Of Tables vi List Of Figures vii Abstract xiii Chapter 1: Introduction 1 1.1 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1 1.2 Background . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3 1.2.1 Trajectory Planning Paradigms . . . . . . . . . . . . . . . . . . . . . . . . . 6 1.2.1.1 Search-Based Methods . . . . . . . . . . . . . . . . . . . . . . . . 6 1.2.1.2 Sampling-Based Methods . . . . . . . . . . . . . . . . . . . . . . . 7 1.2.1.3 Optimization-Based Methods . . . . . . . . . . . . . . . . . . . . . 7 1.2.2 Planning Hierarchy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8 1.2.3 Motion and Perception Uncertainty . . . . . . . . . . . . . . . . . . . . . . 9 1.2.4 High-Dimensional State-Space . . . . . . . . . . . . . . . . . . . . . . . . . 10 1.3 Goal and Scope . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10 1.4 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13 Chapter 2: Literature Review 15 2.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15 2.2 Handling Dynamic Obstacles . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15 2.2.1 Reactive Trajectory Planning Methods . . . . . . . . . . . . . . . . . . . . . 15 2.2.1.1 Zero-Order Methods . . . . . . . . . . . . . . . . . . . . . . . . . . 16 2.2.1.2 First-Order Methods . . . . . . . . . . . . . . . . . . . . . . . . . . 16 2.2.2 Deliberative Trajectory Planning Methods . . . . . . . . . . . . . . . . . . . 18 2.3 Handling Uncertainty . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20 2.4 Handling High-Dimensional Conguration Spaces . . . . . . . . . . . . . . . . . . . 24 2.4.1 Ecient Collision Detection . . . . . . . . . . . . . . . . . . . . . . . . . . . 28 2.4.2 Sampling Strategies . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28 2.4.3 Node and Target Selection Strategies . . . . . . . . . . . . . . . . . . . . . . 30 2.4.4 Connection Strategies . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32 2.4.5 Incorporating Human Assistance in Planning . . . . . . . . . . . . . . . . . 33 Chapter 3: Dynamics-Aware Reactive Trajectory Planning 34 3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34 3.2 Background . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36 3.3 Problem Formulation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41 iii 3.4 Approach . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43 3.4.1 Construction of a Planning Window . . . . . . . . . . . . . . . . . . . . . . 43 3.4.2 Construction of Dynamics Constraints . . . . . . . . . . . . . . . . . . . . . 46 3.4.3 Construction of st Obstacles . . . . . . . . . . . . . . . . . . . . . . . . . 48 3.4.4 Evaluating Strategies for Obstacle Avoidance . . . . . . . . . . . . . . . . . 48 3.4.5 Generating Dynamically Feasible Plans . . . . . . . . . . . . . . . . . . . . 52 3.4.6 Trajectory Deviation Cost Function . . . . . . . . . . . . . . . . . . . . . . 54 3.5 Results and Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55 3.5.1 Impact of Inexact Dynamics Constraints on Performance . . . . . . . . . . 56 3.5.2 Impact of Perception Uncertainty on Performance . . . . . . . . . . . . . . 57 3.5.3 Need for Both Speed-Regulation and Path-Variation . . . . . . . . . . . . . 58 3.5.4 Sequential Processing of Dynamic Obstacles . . . . . . . . . . . . . . . . . . 58 3.6 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59 Chapter 4: Deliberative Trajectory Planning 65 4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65 4.2 Background . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68 4.2.1 Vehicle Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68 4.2.2 Trac Vessel Proles . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69 4.2.3 Environmental Disturbances . . . . . . . . . . . . . . . . . . . . . . . . . . . 72 4.2.4 Failure Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73 4.3 Problem Formulation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74 4.4 Approach . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74 4.4.1 Node Expansion and Cost Functions . . . . . . . . . . . . . . . . . . . . . . 75 4.4.1.1 Identication of Tracking Error Model . . . . . . . . . . . . . . . 75 4.4.1.2 Obtaining Tracking Error and Failure Model Parameters . . . . . 77 4.4.1.3 Obtaining Tracking Error Parameters . . . . . . . . . . . . . . . . 81 4.4.1.4 Obtaining MTBF . . . . . . . . . . . . . . . . . . . . . . . . . . . 81 4.4.1.5 Cost Assessment . . . . . . . . . . . . . . . . . . . . . . . . . . . 84 4.4.1.6 Accounting for Perception Uncertainty . . . . . . . . . . . . . . . 85 4.4.2 Speed-up Techniques . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86 4.4.2.1 Static Obstacle Heuristics (SOH) . . . . . . . . . . . . . . . . . . . 86 4.4.2.2 Space-Time Exploration Heuristics . . . . . . . . . . . . . . . . . . 87 4.4.2.3 Adaptive Motion Goal Set (AMGS) . . . . . . . . . . . . . . . . . 90 4.4.3 Parameter Tuning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 90 4.4.3.1 User Preference Parameters . . . . . . . . . . . . . . . . . . . . . . 90 4.4.3.2 Planning Speed Parameters . . . . . . . . . . . . . . . . . . . . . . 91 4.5 Results and Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91 4.5.1 Simulation Experiments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92 4.5.1.1 Sensitivity to Number of Dynamic Obstacles . . . . . . . . . . . . 105 4.5.1.2 Sensitivity to Dynamic Obstacle Uncertainty . . . . . . . . . . . . 105 4.5.2 Physical Experiments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 113 4.5.2.1 Experimental Setup . . . . . . . . . . . . . . . . . . . . . . . . . . 113 4.5.2.2 Experimental Results . . . . . . . . . . . . . . . . . . . . . . . . . 113 4.5.3 Comparison of MDP and Graph Search . . . . . . . . . . . . . . . . . . . . 114 4.6 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 124 iv Chapter 5: Trajectory Planning in High-Dimensional Conguration Spaces 128 5.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 128 5.2 Background . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 132 5.2.1 Sampling Strategies . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 132 5.2.2 Node/Target Selection Strategies . . . . . . . . . . . . . . . . . . . . . . . . 133 5.2.3 Connection Strategies . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 133 5.3 Problem Formulation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 134 5.4 Approach . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 134 5.4.1 Tree Selection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 135 5.4.2 Focus Region Selection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 137 5.4.3 Node Selection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 140 5.4.4 Target Selection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 141 5.4.5 Extend Strategy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 144 5.4.6 Connection Strategy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 146 5.4.7 Scheduling of Strategies . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 150 5.4.8 Incorporating Human Assistance . . . . . . . . . . . . . . . . . . . . . . . . 152 5.5 Results and Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 156 5.5.1 Impact of Human Assistance . . . . . . . . . . . . . . . . . . . . . . . . . . 163 5.5.2 Impact of Strategies . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 164 5.6 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 169 Chapter 6: Conclusion 170 6.1 Intellectual Contributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 170 6.2 Summary of Lessons Learnt . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 172 6.3 Anticipated Benets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 174 6.4 Future Directions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 175 Reference List 177 v List Of Tables 3.1 Scaling parameters for each terrain type . . . . . . . . . . . . . . . . . . . . . . . . 46 3.2 An example of the cost matrix for 4 path choices . . . . . . . . . . . . . . . . . . . 52 3.3 Comparison of speed-regulation, path-variation against the hybrid approach . . . . 57 4.1 Tracking error model. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80 4.2 Mean-time-between-failure (MTBF) model. . . . . . . . . . . . . . . . . . . . . . . 80 4.3 Planner parameters. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91 4.4 Planning stack congurations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 96 4.5 Comparison of performance. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 97 4.6 Impact of speed-up techniques on performance. . . . . . . . . . . . . . . . . . . . . 104 4.7 Comparison of execution time. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 113 5.1 Summary of Strategies . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 149 5.2 Strategy Assignments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 152 5.3 Parameters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 153 5.4 Failure rate and suboptimality factor for all tested methods (T o = 7:5s) . . . . . . 161 5.5 Failure rate and suboptimality factor for all tested methods (T o = 15s) . . . . . . 161 5.6 Failure Rate (FR) and Path Length (PL) for all tested methods . . . . . . . . . . . 163 5.7 Impact of strategies on a subset of tested scenarios . . . . . . . . . . . . . . . . . . 168 vi List Of Figures 1.1 Prime applications of marine automation: Port assistance, Search and Rescue, Port Policing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2 1.2 Left: An environment where tasks may change slowly (e.g., car model changes, dierent seat assembly), Right: An environment where tasks may change quickly (e.g., mobile robots interacting with human trac in a warehouse setting) . . . . . 4 1.3 Architecture of a robotic system . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4 1.4 Planning hierarchy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8 1.5 Dissertation theme. Specic applications impose constraints on minimum optimal- ity and maximum planning-time. This rules out anytime methods with generic heuristics as candidates. Anytime methods with good heuristics are needed to produce high-quality solutions quickly. . . . . . . . . . . . . . . . . . . . . . . . . . 11 3.1 An illustrative example showing the need for vehicle-dynamics aware planning . . . 38 3.2 Alternative paths choices are shown. For each path, there can be many associated speed proles that can circumnavigate the dynamic obstacle in the face of dynamics constraints. Only one speed prole is shown for each path. On each path, a few speed proles are evaluated in TGDC to yield the best path and speed prole. . . 38 3.3 Graphical overview of the approach . . . . . . . . . . . . . . . . . . . . . . . . . . . 43 3.4 (a) PathK and (b) its corresponding planning windowPW K (0;t start ;l K ;t h ) show- ing a reference speed prole v(s) to follow . . . . . . . . . . . . . . . . . . . . . . . 45 3.5 Maximum speed, acceleration and deceleration constraints are trajectories in the planning window corresponding to a certain curve. . . . . . . . . . . . . . . . . . . 47 3.6 Snapshots of an in ated (vehicle radius + obstacle radius) xyt space obstacle as it passes diagonally across a few paths at velocity v. (also see Figure 3.7) . . . . 49 vii 3.7 A planning window in which ast space obstacle (dark red) and its bounding box approximation (light red) are shown. The reference speed prole is infeasible as it intersects thest obstacle. Two trajectories start at (s start ;t start ) in the planning window. The passing trajectory (green) accelerates in front of the obstacle using the upper left (UL) vertex. The yielding trajectory (purple) decelerates and yields to the obstacle using lower right (LR) vertex. (also see Figure 3.6) . . . . . . . . . 50 3.8 An example of how two st obstacles C K;DO1 and C K;DO2 are merged into one single st obstacle C K;DO . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51 3.9 Qualitative example of risk assessment for a passing strategy where 6 samples of a st obstacles on a particular path are used. One of the samples has the upper left (UL) vertex in the infeasible region. Thus, the collision risk is 1/6 for the passing strategy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52 3.10 Evaluation of strategies . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54 3.11 Triggering of value binding operation depends acceleration/deceleration curve. (a) and (c) illustrate the scenario where pass and yield strategies have been decided but reference speed prole is followed. (b) and (d) illustrate the scenario a little later at t 1 and just after value binding has occurred. The computed numerical plan is shown as a dotted black line. The blue rectangle shows the worst-case st obstacle corresponding to the redst obstacle samples used in value binding process 61 3.12 Top view of the scenario used for generating the results. Dynamic obstacles are poised to cut across the reference path from either the left or the right of the reference path. Inset: a magnied view of the dynamic obstacle (red with yellow buer region), reference path (red) and alternative paths (blue, pink). . . . . . . . 62 3.13 Relationship between collision rates and speed violation as the constraint augmen- tation factor is varied. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62 3.14 (a) Percent increase in execution time and (b) percent increase in speed violation ratio as uncertainty parameters p;max and v;max are varied. . . . . . . . . . . . . 63 3.15 When handling two obstacles sequentially, collisions (green) happen when st obstacles are spaced less than 1.5 stopping distances and 1 stopping time . . . . . 63 3.16 When handling two obstacles as a single composite obstacle, speed violation ratio increases when st obstacles are spaced apart further than 0.5 stopping distances 64 4.1 Handling large dynamic obstacles requires deliberative planning. . . . . . . . . . . 66 4.2 Motion goal set with r = 30m containing n dir = 8 motion goals (in blue) directed toward cardinal and intercardinal directions. Eight optimal trajectories (in red) from (x = 0;y = 0; = 0;t = 0) to each of the motion goals is shown. A ninth motion goal labeled O is also shown which corresponds to a wait-action of duration t wait . The trajectories start with = 0 and end at the heading angle corresponding to the cardinal directions. . . . . . . . . . . . . . . . . . . . . . . . . 70 viii 4.3 (a) A Kelvin wave pattern for a trac vessel underway westward, operating at hull Froude number 0:5 and waterline boat length 10m. Crests are shown in yellow and troughs are shown in red. Two local wavefronts near two instances of the USV is labeled 1 and 2. (b) Simplication of wave pattern into zones with dierent local wavefronts. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71 4.4 An illustration showing how parameter vector is dened over a trajectory at a few characteristic query points A, B, C and D. . . . . . . . . . . . . . . . . . . . . 72 4.5 Realization of multiple simulated trajectories as the USV is commanded to go towards ve dierent locations. Deviation from the nominal trajectories can be observed by looking at the spread in the realized trajectories. . . . . . . . . . . . . 78 4.6 Waveeld experiment setup. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79 4.7 Determining wave incidence angle and relative position from a trac vessel. . . . . 82 4.8 (a) Trajectory tracking error envelope dened by ct and at for a trajectory start- ing at p q = (0; 0; 0), t q = 0 and reaching east under a particular sea state , (b) The 3 error ellipse at an intermediate time t along the nominal trajectory. . . . . 83 4.9 Collision cost assessment for a trajectory starting at a parent node and ending at a child node. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84 4.10 Generation of heuristic space-time path. . . . . . . . . . . . . . . . . . . . . . . . . 88 4.11 Islands map . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92 4.12 Busy map . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93 4.13 Port map . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 94 4.14 Stress map . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95 4.15 Scenario to illustrate the eects of changing the MTBF parameter M t . USV is positioned at the start point (250; 100) and commanded to reach the blue square at (235; 80) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 98 4.16 Case M t = 1000. At t = 75:0, the USV is seen avoiding crossing waveeld and taking a longer route to the goal point. . . . . . . . . . . . . . . . . . . . . . . . . . 99 4.17 Case M t = 1000. USV reaches goal point at t = 166:1 . . . . . . . . . . . . . . . . 100 4.18 Case M t = 3000. At t = 62:0, the USV is seen a crossing waveeld at the appro- priate angle. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101 4.19 Case M t = 3000. USV reaches goal point at t = 139:1 . . . . . . . . . . . . . . . . 102 ix 4.20 The scenario used for evaluating the eect of increasing the number of dynamic obstacles. Random start states and random goal states are sampled from the start region and the goal region respectively. Dynamic obstacles are randomly spawned on the sampling line and made to move at the indicated directions at various speeds.106 4.21 Planning time as the number of dynamic obstacles was varied between 1 15. . . . 107 4.22 Scenario to illustrate the eects of low and high dynamic obstacle perception un- certainty. A dynamic obstacle is entering a narrow channel. Start and goal states are separated by the narrow channel. The USV must access the risks of entering the channel while another vessel is underway. . . . . . . . . . . . . . . . . . . . . . 108 4.23 USV enters the channel as the estimates of dynamic obstacle are sucient for passing the channel. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 109 4.24 USV has traveled halfway into the channel after encountering the dynamic obstacle in a COLREGs compliant way. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 110 4.25 USV waits at the mouth of the channel for the dynamic obstacle to pass as it is too dangerous to attempt passing the channel when the estimate of the dynamic obstacle is bad. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111 4.26 USV starts to move only after the dynamic obstacle has cleared the channel. . . . 112 4.27 Setup. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 114 4.28 One set of trajectories from scenario P 1 comparing wave-aware trajectory with a conservative trajectory. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 115 4.29 States, clusters and actions are illustrated. For a particular state, the possible next states when the action u =FL is applied is shown using dashed blue lines. . . . . 116 4.30 Example of a cluster graph showing a shortest trajectory from C s toC g computed usingA . Part of the trajectory is complete but the trajectory has not been faith- fully executed resulting in a slight deviation atC a . Would the optimal MDP policy change in light of this deviation ? . . . . . . . . . . . . . . . . . . . . . . . . . . . . 119 4.31 A test scenario to compare MDP and A solutions. The goal state set is shown as a red dot. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 122 4.32 MDP policy computed for t = 0:0. . . . . . . . . . . . . . . . . . . . . . . . . . . . 123 4.33 An execution run of the plan computed for the test scenario using SDP. . . . . . . 125 4.34 An execution run of the plan computed for the test scenario using A . . . . . . . . 126 5.1 (a) Point-to-point planning between any two points in (A;B;C;D;E) can bene- t from workspace guidance. (b) A complex planning problem where workspace guidance is unlikely to produce solutions. A sanding tool is to be moved from the starting pose to the goal pose. The gap circled red is too narrow. The only way out is across the slot circled blue. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 130 x 5.2 Our framework dynamically switches strategies selected from a library of strategies based on feedback from the current search state. It can incorporate new strategies and scheduling logic for application specic needs . . . . . . . . . . . . . . . . . . . 136 5.3 An illustration of the six primitive steps in realizing the bi-directional tree search method. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 137 5.4 (a) An illustration of bi-directional tree search using TS A , (b) An illustration of bi-directional tree search usingTS C . We observe that the balanced growth of both trees results in a solution without sampling a lot of unnecessary (blue) nodes. . . . 138 5.5 An illustration of free-space balls inW (K = 6). TreeT s has progressed upto ball B 3 . TreeT g has progressed upto ballB 4 . Note that for illustration purposes, both trees have been projected ontoW based on the EE positions of the node. . . . . . 139 5.6 An illustration of how a local model (dim(C) = 2) is utilized for exploration. Di- rections roughly orthogonal to the gradient vector are promising directions. . . . . 144 5.7 Connection biasing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 147 5.8 (a) A complex planning scenario where a human operator intuitively knows how to maneuver around obstacles, (b) Humans intuitively know the sanding tool cannot be slid across the narrow gap. By inspection, humans immediately know that the tool has to exit the cavity to reach the goal pose. . . . . . . . . . . . . . . . . . . . 154 5.9 An example of a cooperative-planning GUI. Focus regions in the form of workspace balls are created by the human-operator to guide the welding tool around the roll- cage assembly. In the lower-right area, a slider control pops up when the human operator clicks a workspace ball. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 156 5.10 A representative subset of test scenarios. For clarity, only one of many possible joint congurations is shown for each scenario. . . . . . . . . . . . . . . . . . . . . 158 5.11 Performance metrics of CODES3 and alternative methods aggregated over 30 di- verse scenarios. All methods were given time budget T o = 7:5s to compute solu- tions. The dots indicate median sub-optimality factor for individual scenarios and the stars indicate average performance over all scenarios tested. The point (0; 1) is an ideal point indicating zero failure rate and optimal solution quality. Some data points that have a sub-optimality factor greater than 8 are not visualized. . . . . . 159 5.12 Performance metrics of CODES3 and alternative methods aggregated over 30 di- verse scenarios. All methods were given time budget T o = 15s to compute solu- tions. The dots indicate median sub-optimality factor for individual scenarios and the stars indicate average performance over all scenarios tested. The point (0; 1) is an ideal point indicating zero failure rate and optimal solution quality. Some data points that have a sub-optimality factor greater than 8 are not visualized. . . . . . 160 5.13 Test scenarios where human operator assistance is utilized. For clarity, only one of many possible joint congurations is shown for each scenario. . . . . . . . . . . . . 162 xi 5.14 A representative subset of test scenarios used for analyzing the impact of various strategies. For clarity, only one of many possible joint congurations is shown for each scenario. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 166 5.15 Performance of CODES3 (C3), CODES3 with RBF, AF, and CB (C3-P) and other methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 167 xii Abstract Advances in sensing and computing hardware have physically equipped robots to operate in complex environments. In many real-world settings, we desire robots to operate at a high-level of autonomy to reduce operating costs and manpower requirements. A high-level of autonomy can be achieved only when robots are able to plan missions and tasks themselves. Trajectory planning is a fundamental building block required to support high-level decision making in robots. Trajectory planning for autonomous robots operating in complex environments is a challenging problem. The complexity of trajectory planning problems stems from the dimensionality of robot's state space, the complexity of the robot kinematic and dynamic model, the nature of environmen- tal constraints (e.g., obstacles), task constraints (e.g., rules), the optimization objective function, and the planning-time requirements needed for deployment in the real world. Depending on the complexity, these problems can be solved by existing methods to produce feasible trajectories. But, in many practical applications (e.g., automated package delivery), computing a feasible tra- jectory alone is not enough. The quality of the computed trajectory is also important. However, in many cases, computing truly optimal trajectories is computationally intensive and thus, very time-consuming. As a result, existing methods do not satisfy planning-time constraints required by the application while maintaining optimality. We need a method that produces high-quality trajectories and at the same time produce those trajectories quickly. Anytime methods handle ex- actly this problem. However, these methods produce high-quality trajectories quickly only when good heuristics are used. This thesis focuses on heuristic techniques for anytime algorithms that speed up trajectory planning for autonomous robots in complex environments. xiii Trajectory planning for Unmanned Surface Vehicles (USVs) is one representative example that includes many of the complicating factors such as uncertainty in motion, uncertainty in perception, spatio-temporally varying environment (e.g., waves and trac vessels), and fast planning-time requirements. Accounting for all of these factors is crucial for building reliable autonomy solutions. A wave-aware deliberative planner that avoids collisions with trac vessels is presented. It also opportunistically traverses waveelds generated by these vessels while minimizing the execution time and the risk of failure. The planner performs a search over a 4D pose-time lattice to generate a collision-free, minimum-risk sequence of motion goals. It addresses motion uncertainty, failure risk and perception uncertainty through a parametric model. Heuristics to speed-up computation are presented. Simulation and physical experimental results are presented. They show that our methodology quickly produces plans that execute faster and safer when compared to the typical reactive planning schemes. A real-time dynamics-aware reactive trajectory generator is presented. It produces trajectories that avoid collisions with dynamic obstacles under varying dynamic constraints imposed on the robot as its operating environment changes. It is an obstacle avoidance approach that explicitly reasons about reference trajectory deviation, varying dynamics constraints while minimizing col- lision risk with uncertain dynamic obstacles. Simulation results are presented in the context of the Unmanned Ground Vehicle (UGV) domain, to show that the planner is able to eectively deal with the dynamic obstacles on terrains with varying slopes. Trajectory planning for industrial manipulators operating in cluttered environments is a rep- resentative example that involves navigating a high-dimensional conguration space. In small- volume manufacturing applications, manipulators perform non-repetitive tasks that change fre- quently. Every time a task changes, generating manipulator trajectories by manual placement of waypoints is tedious and inecient. Automated trajectory-planning methods can help improve manufacturing throughput and reduce operating costs. Planning-time and trajectory-quality are the key metrics that decide the utility of an automated planning method. A highly reliable xiv trajectory planning framework is presented. It exploits workspace hints and uses a parameter scheduling scheme to generate high-quality trajectories quickly. The framework is also able to seamlessly incorporate human assistance through an intuitive user-interface to cooperatively (i.e., human and computer) solve dicult planning problems. xv Chapter 1 Introduction 1.1 Motivation Diverse domains such as large-scale automobile manufacturing, large-scale electronics manufactur- ing, medical surgery [128], self-driving urban vehicles, package delivery, marine trac control [6] and search/rescue operations can greatly benet from a degree of automation. These automation systems range from simple hardware mechanisms to a complex heterogenous team of autonomous robots. Let us take the example of the marine domain to see the need for automation and the benets it oers. Industry interest in marine automation is booming with growing pressure from heavy competition, increasing oil prices, stricter environmental regulations, elaborate global trade and emerging shipping trends. The marine industry is moving towards the phased introduction of automation technologies ranging from assistive-behaviors to fully autonomous operations. With recent advances in robotics and key enabling technologies (i.e., faster compute, sensing), utilizing autonomous robots in sectors such as ocean mining, sh farming, arctic exploration is physically realizable. In general, annual cargo-shipping costs are dominated by fuel and man power expen- ditures. To save on costs, naval eets in the shipping industry are expected to slowly transition into hybrid eets containing autonomous, semi-autonomous (reduced crew) and manned ships by 1 2035[5, 1]. Beyond the shipping industry, even oshore wind farm operations report that manned vessels account for as much as 60% of operating costs [4]. It is clear that marine automation can reduce operating costs. Marine automation can not only reduce operating costs but also improve eciency and safety. Between the years 2011-2016, loss of control and ship-to-ship collision were the major casualty events occurring in the sea [3]. As much as 42% of the casualties were reported to have taken place in port environments followed by 28% in coastal areas. Upto 60% of the marine accidents that took place between 2011-2016 were caused by human error. A disproportionately large number of these events were reported have happened to cargo-ships. From these statistics, Figure 1.1: Prime applications of marine automation: Port assistance, Search and Rescue, Port Policing it is clear that incorporating a degree of automation in prime application areas such as in Figure 1.1 can reduce costs and save lives. 2 Another domain that can benet greatly through automation is manufacturing. In manufac- turing settings, robotic manipulators are widely used in the factory oor for various tasks such as automotive assembly, quality control, packing and machining [2, 144, 67, 200]. A large fraction of these manipulators follow pre-programmed sequence of actions to complete tasks. Before any task, human operators program the manipulator by manually placing waypoints to accomplish a given task. This is tedious and inecient use of human labor. If the environment does not change, programming the robot once is enough. The robot is going to perform millions of repetitions of the same trajectory. So, the manual programming time is going to be amortized over millions of repetitions. In this case, it seems that time spent in manual programming is justied. But, due to the fact that humans have programmed the robot, we do not get any local optimality guarantees let alone global optimality guarantees. Considering millions of repetitions, even losing a second can quickly add up and decrease manufacturing throughput. When the environment changes or the task description changes (e.g., dierent car model, dierent door assembly) (Figure 1.2), the manipulators have to be programmed again. When the environment changes quickly (e.g., dynamic environment such as product packing area, mo- bile manipulators in a warehouse setting) (Figure 1.2), it becomes impractical to have humans reprogram robots. Thus, in such scenarios, using automated trajectory planning for manipulators can clearly lead to reduced operating costs and higher productivity. 1.2 Background Autonomous robots is a general term for various types of robots not limited to mobile robots, robotic manipulators and mobile manipulators. In order for a robot to operate autonomously, it should be able to perceive the environment it operates in and perform actions based on the 3 Figure 1.2: Left: An environment where tasks may change slowly (e.g., car model changes, dierent seat assembly), Right: An environment where tasks may change quickly (e.g., mobile robots interacting with human trac in a warehouse setting) perceived information to achieve a certain goal. Thus, robotic systems can be decomposed as shown in Figure 1.3. Figure 1.3: Architecture of a robotic system Commanding and controlling autonomous robots requires the con uence of many elds of engineering such as machine design, decision-making algorithms, Articial Intelligence (AI) plan- ning/scheduling, trajectory planning, control algorithms, computer vision, computer networks and signal processing. In this list, trajectory planning refers to a eld of robotics that studies decision-making al- gorithms that plan how a robot is going to navigate the world to achieve a set of goals. For instance, the goal could be to move from one location to another location in a safe manner and in the shortest amount of time. 4 Many higher-level autonomy functions such as mission/task planning rely heavily on trajectory planning. Consider a heterogenous team of 10 package-delivery robots consisting of land and air units attempting to deliver 100 packages within a deadline of a few hours. Each robot is to be allocated a task of delivering a set of packages to a certain area. And, the capability of each robot is dierent. Land units can haul heavy payloads and stay active longer. Air units can y fast but only take on light payloads and stay active for a shorter duration. Before all robots can be tasked, we require an understanding of whether a certain allocated task can be realistically completed by a robot (i.e., does there exist a trajectory plan for which delivery can be completed?). Thus, even at the task planning level, trajectory planning serves as a crucial tool for evaluating the feasibility of a task. Some of the relevant application areas are [192, 125, 175, 81]. Trajectory planning is a fundamental capability required to realize autonomous robot. Tra- jectory planning for autonomous robots is a challenging research area with immediate practical utility across many domains. Given an representation of the environment, trajectory planning determines how a robot should operate in the environment over time in order to move from an initial state to a nal state while satisfying a set of constraints. Typically, a trajectory planning problem is dened by a robot model, a search-space, a set of environment and task constraints, an objective function, an initial robot state and goal conditions. Depending on the nature of the planning problem, a robot model could include a robot kine- matic model, a robot dynamic model or both. A kinematic model of a robot describes the relationship between the input command velocities (e.g., wheel velocity, steering velocity) and how the robot is going move (e.g., robot body velocity). A dynamic model of a robot describes the relationship between the control forces/torques (e.g., motor torque, thruster force) and how it changes velocities of the components of the robot (e.g., wheel velocity). In modern control theory and planning literature, it is typical to describe the robot model using a state-space representa- tion. The state space of a robot refers to the collection of states a robot might take. For example, 5 the state of an Unmanned Ground Vehicle (UGV) can be described by the vector (x;y;; _ x; _ y; _ ). Given a state-space representation, the search-space can be dened in terms of the control sig- nals that move the robot. For example, a UGV is moved by the throttle and steering control signals. The complexity of a trajectory planning problem is determined by the dimensionality of the state space, the complexity of the robot model, the nature of environmental constraints, task constraints and the objective function. 1.2.1 Trajectory Planning Paradigms The goal of trajectory planning is to compute reference control signals (or reference trajectories) that move the robot from an initial state to a nal state while satisfying environment and task constraints. One of the basic constraints in trajectory planning is collision avoidance with static and dynamic obstacles. There are three general paradigms to solve the trajectory planning problem: search-based, sampling-based and optimization-based methods. Hybrid approaches combining the general meth- ods is also not uncommon. 1.2.1.1 Search-Based Methods Search-based methods perform a combinatorial optimization over a sequence of actions that can be performed in the state-space. Combinatorial optimization entails the systematic examination of the state-space to produce an optimal sequence of actions. Classical forms of dynamic pro- gramming fall in this category. Two popular algorithms are Dijkstra's algorithm [46] and A algorithm [69]. A algorithm utilizes a heuristic that guides the search towards promising actions that nally lead to the goal conguration. These algorithms work well when the environment information is crisp and known apriori. Moreover, due to the combinatorial nature of the search, high-dimensional state-space pose a signicant challenge to these methods. For instance, the worst-case time complexity of a combinatorial search grows as O(b d ) where b is the branching 6 factor and d is the depth of the search-tree. As the entire state-space is examined eventually, search-based methods are said to be complete. In complex environments and high-dimensional state-spaces, complete planners may not be able to produce solutions within reasonable time limits. 1.2.1.2 Sampling-Based Methods Sampling-based approaches such as RRT (Rapidly-exploring Random Trees) [110] and PRM (Probabilistic Road Maps) [87] are very similar to search-based methods except that they do not systematically examine every possible state-space before producing a solution. Instead, these methods take a dierent approach. Random samples of the state-space are used to explore and build a representation of the connectivity in the state-space. For this reason, many sampling- based approaches are only probabilistically complete; given enough computation time, they will eventually produce a solution. One of the major challenges in these methods is generating useful state-space samples (i.e., sampling), evaluating distances/costs between two points in the state- space and nding connections between these samples. 1.2.1.3 Optimization-Based Methods Optimization-based methods typically formulate the trajectory planning problem as an optimal control problem [73]. Environmental and task constraints are decomposed into a collection of initial, nal, path, dierential and algebraic constraints. Then, the optimal control problem is solved using approaches outlined in [22]. Ultimately, the problem is either solved analytically or numerically. The challenge in either case is the ability to incorporate a wide variety of constraints. For example, disjunctive constraints are hard to incorporate into the formulation without resorting to a type of Mixed-Integer Non-Linear Programming (MINLP). 7 Figure 1.4: Planning hierarchy 1.2.2 Planning Hierarchy Owing to the complexity, trajectory planning is often performed hierarchically. Typically, search- based methods are used for global planning over large areas and sampling-based methods are used for global planning in high-dimensional state-spaces. These methods are able to reason over long planning horizons produce reference trajectories that are able to easily capture a wide variety of constraints. Classically, optimization-based methods are used for low-level reactive trajectory planning where the constraints are simpler and typically have analytical representations. Op- timization methods are used to produce minor tweaks to the reference trajectory produced by search-based or sampling-based method. In all of these approaches, the duration of planning horizon is a design variable. Planning horizon refers to the length of time for which decisions are evaluated. Shorter planning horizons apply local reasoning and therefore focuses predominantly on shortsighted tasks such as avoiding obstacles only in the local vicinity. This forgoes looking at the larger picture and may get the robot stuck in a local minimum. However, shorter planning horizons generally require less computation time. Longer planning horizons on the other hand al- low for global reasoning and hence, result in optimal behavior. However, longer planning horizons generally require much more computation time. 8 1.2.3 Motion and Perception Uncertainty Motion uncertainty refers to uncertainty in the motion of a robot as it is commanded to perform a maneuver. This typically arises due to unmodeled dynamics and disturbances a robot is exposed to as it operates in the environment. For example, consider an Unmanned Surface Vessel (USV) moving over the sea. Even though the USV may be commanded to move due North, it may deviate o course due to sea disturbances. Perception uncertainty refers to the uncertainty in the measurements recorded by the perception system. This typically arises due to occlusions, sensing range limitations and sensor update rates. When the environment is controlled and predictable to a high-degree, the trajectory planning problem is often simple especially if the state space is low-dimensional (less than 4D). Automated warehouse management [39, 13] using mobile robots and robotic manipulators operating in an automobile assembly unit are examples where the environment is carefully controlled, uncertainty in state estimates are very low and often many parts of the trajectory planning problem can be pre-computed. Thus, in these applications the trajectory planning problem is relatively simple. However, when the environments change over time and when the robot is able to perceive the environment only with a degree of certainty, trajectory planning becomes challenging. It becomes more challenging when the spatio-temporally changing environments also in uence robot dynamics. There are many domains such as aerial robotics, ground-based robotics and marine robotics that present complex environments to an autonomous robot. In the marine robotics domain, automated trajectory planning for Unmanned Surface Vehicles (USVs) is a representative example of the challenges of automated trajectory planning in a complex environment. Signicant environmental spatio-temporally varying disturbances like waves, wakes and winds [184] impart strong motion uncertainty [26] to vessels operating over water bodies. The eects of motion uncertainty are markedly more pronounced in smaller USVs as they are particularly more susceptible to wave disturbances. Furthermore, the intentions of other trac 9 vessels in the vicinity of the USV is not known apriori and only uncertain predictions of the future trajectories of these trac vessels can be made. Collision avoidance is more challenging under these circumstances and it involves not only staying a safe distance from static obstacles and uncertain dynamic obstacles but also following COLREGs [186]. COLREGs is a set of maritime navigation rules set by the International Maritime Organization (IMO) to minimize collisions and confusion among marine vessels. Another aspect that makes trajectory planning challenging is the real-time requirements imposed by physical systems. Autonomous robots operating in real environments need to sense, plan and act as fast as possible to avoid collisions and adapt to changing environmental circumstances. 1.2.4 High-Dimensional State-Space Another aspect that makes trajectory planning challenging is the high-dimensionality of the search space. A representative example of this is the search space associated with a high Degree-of- Freedom (DoF) robotic manipulator arm. Typically, the dimensionality of the conguration space corresponding to a manipulator is equal to the number of joints the manipulator has. Determining collision-free trajectories for manipulators amounts to nding a continuous curve in the cong- uration space. Explicitly mapping the conguration space ahead of planning is intractable due to the dimensionality of the space. Therefore, trajectory planning for manipulators needs to be performed without this explicit mapping. 1.3 Goal and Scope This dissertation explores computational foundations for trajectory planning for autonomous robots operating in complex environments. Depending on the complexity, these trajectory plan- ning problems can be solved by existing methods to produce feasible trajectories. However, in many practical applications (e.g., automated package delivery, search and rescue), computing a 10 Figure 1.5: Dissertation theme. Specic applications impose constraints on minimum optimality and maximum planning-time. This rules out anytime methods with generic heuristics as can- didates. Anytime methods with good heuristics are needed to produce high-quality solutions quickly. 11 feasible trajectory alone is not enough. The optimality of the computed trajectory is also impor- tant. Traditionally, search-based methods have been very popular as they provide strong trajec- tory optimality guarantees. However, seeking optimal solutions by utilizing search-based methods are only practical when low-dimensional search-spaces are considered. In practical settings, the trajectory planning problem often needs to include not only the state of the robot but also time- varying constraints such as dynamic obstacles and dynamics constraints. Thus, time-dimension needs to be included in the search-space. However, in many cases where the search-space is high- dimensional, computing truly optimal trajectories using search-based methods is computationally intensive and thus, very time-consuming. Operating in complex environments (e.g., uncertain and dynamic environments) impose strict constraints on planning-time. One the one hand, we seek trajectory optimality aorded by search-based methods. On the other hand, we also seek fast planning times that search-based methods are known to lack. In order to make the search tractable, the eld of anytime methods was introduced by re- searchers. Anytime methods utilize heuristics in search-based methods to accelerate and produce useful suboptimal solutions very quickly. There are two classes of heuristics: admissible heuristics and inadmissible heuristics. On the one hand, the use of admissible heuristics preserves the opti- mality guarantees of search-based methods. However, ecient admissible heuristics that work in time-extended search-spaces are dicult to generate. On the other hand, the use of potentially inadmissible heuristics provides solutions quickly but provides no guarantees regarding optimality. There are no general techniques that can be applied to generate these heuristics. In any case, the optimality guarantees given by search-based methods are valid only when information about the environment and constraints are free of uncertainty. And in a lot of applications, it is acceptable to produce and use high-quality trajectories that are near-optimal. Despite the shortcomings of inadmissible heuristics, these heuristics are still valuable if they provide high-quality trajectories that are near-optimal. 12 For problems that are intrinsically high-dimensional (e.g., trajectory planning for manipula- tors), even search-based anytime methods do not scale well. Thus, sampling-based methods are the only viable alternative. However, it is well known that sampling-based methods produce sub- optimal solutions unless specic eort is made to take suboptimal solutions and repair them into optimal solutions. In these circumstances, sampling-based methods can be tted with machin- ery that utilizes potentially inadmissible heuristics to rapidly converge to near-optimal solutions. Figure 1.5 outlines the premise of the dissertation. This dissertation presents computational techniques for speeding up trajectory planning for autonomous robots operating in complex environments exhibiting one or more of the following qualities: 1. Requirement of real-time operation 2. Requirement of collision avoidance 3. Existence of perception uncertainty 4. Existence of robot motion uncertainty 5. High-dimensionality of search space The methods developed in this dissertation are domain independent. These approaches are how- ever motivated and demonstrated in the context of trajectory planning for Unmanned Ground Vehicles (UGVs), Unmanned Surface Vehicles (USVs) and robotic manipulators. 1.4 Overview This section provides a brief summary for each chapter in this dissertation. Chapter 2 presents a literature review on trajectory planning methods used in applications involving unmanned ground vehicles, unmanned surface vehicles and manipulators. The focus is 13 on search-based, sampling-based and optimization-based methods that address dynamic obstacles, motion/perception uncertainty and high-dimensionality of search space. Chapter 3 presents a reactive planning methodology that explicitly reasons about reference trajectory deviation and handles varying dynamics constraints while minimizing collision risk. Simulations are performed on an unmanned ground vehicle that operates over uneven terrain. It presents simulation results to show that the planner is able to eectively deal with the dynamic obstacles on terrains with varying slopes. Chapter 4 presents a deliberative planning methodology that avoids collision with dynamic obstacles, satises trajectory constraints, opportunistically traverses large permeable dynamic obstacles while minimizing the risk of failure. It also describes speed-up techniques to improve search performance. Physical and simulation experiments are described for an unmanned surface vehicle operating in congested environments. It also presents results from these simulation and physical experiments. Chapter 5 presents a sampling-based framework for trajectory planning in high-dimensional conguration space. It describes a novel bi-directional tree-search framework and its building blocks. Simulation experiments are conducted on high degree-of-freedom manipulators. It also presents results from these simulation experiments. Chapter 6 provides a summary of the contributions of this dissertation and its potential impact. It also identies avenues of future research. 14 Chapter 2 Literature Review 2.1 Overview Trajectory planning for autonomous robots is an extensively studied problem in robotics. This chapter reviews the literature related to trajectory planning in unmanned ground vehicles, un- manned surface vehicles specically focusing on dynamic obstacle avoidance, motion/perception uncertainty and high-dimensional state-spaces. Additionally, this chapter also reviews literature related to trajectory planning for high degree-of-freedom robots such as robotic manipulators. It also outlines some of the shortcomings of existing methods. 2.2 Handling Dynamic Obstacles 2.2.1 Reactive Trajectory Planning Methods A variety of methods has been proposed to address the issue of fast, reactive, dynamics-aware planning that translate waypoints into control sequences that avoid collision with static obstacles and dynamic obstacles. 15 2.2.1.1 Zero-Order Methods Some methods like potential eld method, Vector Field Histogram [25] and Nearness Diagram Method [134] do not explicitly encode the dynamics of the robot and obstacle into the procedure. They are considered zero-order methods as they only consider the noisy instantaneous position as measured by the sensors. As a result, a higher-level of deliberative planning is not possible. Furthermore, though most of these methods solve the given problem to a reasonable extent, they fail to provide a more general framework that can incorporate human-imposed rules (like obeying a trac light). Furthermore, they may result in undesirable oscillatory behavior. 2.2.1.2 First-Order Methods Methods such as Dynamic Window Approach (DWA) [53] and Velocity Obstacle (VO) Method [51] work directly in the velocity space (VS) and hence, are able to incorporate rst-order behavior of obstacles and also respect acceleration limits on actuators in the planning process. In particular, VO and its variants such as Non-Linear Velocity Obstacle (NLVO) [171], Generalized Velocity Obstacles (GVO) [208] has been successful in handling the dynamic obstacle and actuator limit problem. A major reason for its success is due to planning in VS. This provides a framework to capture many constraints in a unied approach. For example, static or dynamic obstacles can be readily encoded in VS. Other constraints like trac rules and other rules of engagement can be transformed into appropriately shaped velocity obstacles. The original VO formulation assumes the obstacles are moving in piece-wise constant veloci- ties between processing windows. While this assumption is relaxed in NLVO, prior information regarding the obstacle trajectory is required. Both in VO and NLVO, the robot is assumed to be capable of moving in any direction from a given location. But, car-like robots can only move in the direction they are facing. The GVO formulation incorporates such constraints through the kinematic model of the robot. However, it still retains the assumption that obstacle velocities are 16 piece-wise constant between processing windows. In high-speed situations, the feasible action set may become empty and this has to be detected early enough for contingency measures. Local obstacle avoidance paradigms such as potential eld method, vector eld histogram [25] and nearness diagram method [134] primarily consider the instantaneous position of the obstacles while computing collision avoidance strategies. Using just instantaneous position and purely reactive strategies result in undesirable oscillatory behavior. These methods are termed as zero-order methods as they only consider position and not the velocity of obstacles. Inevitable collision states (ICS) [55] is an ideal method that provides provable zero collision guarantees. In this method, states of the robot in which no feasible action may be performed to avoid collision are identied and deliberately avoided. Variants of ICS such as [127] and [24] dier in how much the three criteria for motion safety [54] is relaxed (i.e. knowledge of vehicle dynamics, future environment/obstacles and ability to reason over innite time horizon). The precise characterization of ICS is computationally expensive even for low-dimensional dynamics and hence, prohibitive for real-time applications. Other methods such as Dynamic Window Approach (DWA) [53] and Velocity Obstacle (VO) [51] directly work in the velocity space (VS). Hence, these methods are able to incorporate rst- order behavior of obstacles and vehicle dynamics to yield viable velocity vectors that avoid ob- stacles. The original VO formulation models obstacles moving in piece-wise constant veloci- ties between planning windows. Assumptions are relaxed in recent developments of VO such as [171, 78, 56, 17]. They dier in the representation of obstacles, assumptions related to obstacle trajectories and how they react to impending collision. While dynamics constraints are respected in GVO, ICS and some of the recent approaches [172, 178, 72, 75, 206], they do not explicitly consider (1) reference trajectory deviation, (2) time- varying dynamics constraints due to uneven terrain. Collision avoidance alone without regard to reference path deviation may compromise mission performance. 17 2.2.2 Deliberative Trajectory Planning Methods Deliberative planning involves planning for a larger time horizon and reasoning over the evolution of environmental constraints and other constraints over the entire planning horizon. Lattice- based methods used for dynamic obstacle avoidance compute trajectories using a lattice of viable alternatives. This lattice structure is computed using the kinodynamic model of the vehicle. The approach presented in [27] plans the trajectory in a 4D lattice structure. Computational eciency is retained by adaptive variation of lattice resolution and by using the Anytime Dynamic A* (AD*) [119] algorithm for graph search and by using environment constrained heuristics to guide the search. Similar methods have been developed for unmanned surface vehicles in [167], where the motion primitives used during the graph search are scaled proportionate to the congestion in the environment. Chen et al. utilize workspace heuristics [32] to accelerate the search. A sequence of circles is generated to connect the start and goal congurations. It is then used as a heuristic to guide the search. Dynamic obstacles are also considered in [33] where a cylinder path in time-extended workspace is rst generated and later utilized as a heuristic. Using lattice spaces with only position and orientation of the UGV results in inability to reason about dynamics. In [27], a 4D (x,y,,v) multi-resolution lattice based global planner is used to generate dynamically feasible trajectories for an autonomous car over 200 200 meter grids. Motion primitives are specically tailored for traversing the lattice space. This ensures seamless transfer between high-resolution and low-resolution lattices. To deal with limited deliberation time, the Anytime Dynamic A* (AD*) algorithm is used to solve the graph problem. Mechanism and environment constrained heuristics are computed to more eectively guide AD* to the goal. All of these features combined enables real-time planning capability. Time-bounded lattice [99] is an extension of [27] to account for uncertain and dynamic ob- stacles. It uses a hybrid state lattice that is 6D near the current UGV position and 2D when 18 suciently far away. The threshold for the transition to lower dimensional lattice is determined by the uncertainty propagation of dynamic obstacle. The transition to lower dimensional lattice is initiated when collision probability with dynamic obstacles decrease below a threshold. By doing this, real-time guarantees on the performance is given. Optimization-based methods such as [199], [37], [151], [207], [117] have been used to compute optimal trajectories. These trajectories are generated for a relatively short planning horizon and thus, only optimal considering the short horizon. Specically, Model predictive control (MPC) based algorithms [7, 151, 37, 199] are also used to compute trajectories for robots with dynamics constraints by solving it as an optimization problem. In the marine domain, Abdelaal and Hahn implement collision avoidance [7] as a time-varying constraint at the controller level. Their method is also compliant with International Regulations for Preventing Collisions at Sea (COLREGS) by incorporating them as constraints depending on the state of the Unmanned Surface Vessel (USV) and the trac vessels. However, uncertainty introduced by the perception system is not included in the formulation. Tam and Bucknall report a cooperative path planning algorithm [189] for USVs. Their main contribution is a deterministic path planner that incorporates the evasive action that other trac vessels may take when reacting to the motion of the USV. The path planner produces single course changes in a manner that is consistent with COLREGs. This is dierent from other methods where multiple course changes are prescribed. Naeem et al. utilize the A algorithm [135] with specialized heuristics. To implement COL- REGs constraints, virtual obstacles are placed at specic locations of the oncoming trac. This naturally encodes the COLREGs rules without any special handling. Works such as [215, 177] consider the ocean currents as time-varying elds over which path optimization is performed. Some of these techniques are also applicable in the context of dynamic obstacle induced waves. 19 In the context of aerial robotics, Liu et al. use a two-stage approach [121] where the workspace is rst decomposed into regions known as Safe Flight Corridor (SFC). These regions are formed by a sequence of overlapping convex polyhedra connecting the start and goal points. Once these regions are computed, these regions are used in the ensuing trajectory optimization problem as constraints. The idea is to nd optimal trajectories that completely lie within the sequence of overlapping polyhedra. The candidate trajectories are constructed in a piecewise manner such that each piece of trajectory lies within a single convex polyhedron and each piece connects to the next piece (including higher-order continuity). This way a graph-search (global) method is fused with an optimization-based (local) method to generate trajectories that are not fundamentally limited to one homotopy class. For trajectory planning on an aerial platform, Otte et al. [140] form a discrete graph over state-space where the state consists of time, pose and velocity. It handles a time-varying wind eld and produces solutions in an anytime fashion by allowing a specied level of suboptimality. Werling et al. generate optimal trajectories [206] for autonomous vehicle operating in urban trac. It considers trajectories of dynamic obstacles and produces trajectories that realize tasks such as lane-changing, merging, distance-keeping and velocity-regulation. A family of polynomial curves are used to describe lateral and longitudinal movements in body-xed frame of reference. Lateral and longitudinal movements are jointly optimized to form optimal trajectories considering minimization of jerk, velocity-regulation and collision avoidance. Generally, motion uncertainty and perception uncertainty are not included in the formulation of the problem. 2.3 Handling Uncertainty In [184], motion uncertainty is handled in through the use of precomputed transition probabilities over diering sea states and contingencies are handled through a localized search in the space of 20 probabilistic transitions. Under perception and motion uncertainty, MDPs are transformed into POMDPs (Partially Observable Markov Decision Processes). Belief-space (pose covariance space) search is employed in static maps [31, 115] to capture motion uncertainty and compute plans that minimize nal time and covariance. In [9], the Probabilistic Road Map (PRM) idea is extended for belief spaces using MDPs and feedback controllers. Feedback controllers are designed to drive the agent into a sampled state in belief space called a FIRM node. Then, through the use of a precomputed policy for the MDP over FIRM nodes, a sequence of controllers that drive the agent from the initial state to the nal state is obtained. This way, POMDPs is made computationally tractable for larger scale problems involving only static obstacles. It is hard to extend these methods to work in domains with spatio-temporal disturbances. Raeisakhae et al. solve certain class of POMDPs [155] using a two-stage process. They attempt to break the curse of history by moving the robot towards states for which observation uncertainty is lower. Doing this allows for a more accurate knowledge of the robot state as a function of time. This in turn allows for better trajectory plans even though the robot may deviate signicantly from an initially planned path. When non-linear inequality constraints (e.g., obstacle avoidance constraints) are certain, one can easily verify if these constraints are satised or not satised. However, when non-linear inequality constraints are uncertain and their probability distribution has unbounded support, one cannot deterministically claim these constraints are satised { only a probabilistic claim can be made. Thus, instead of hard/crisp inequality constraints, we have a chance constraint (probabilistic constraint). An example of a chance constraint is P (g(x) 0)> 0:99 21 Here,g(x) 0 cannot be satised perfectly. But, the chance constraint requires that g(x) 0 be satised with a probability of 0:99. In [23, 183], chance constraints are used to guarantee the failure probability to be below a certain prescribed value while nding paths through static obstacles dened by convex polygons. [23] introduces the notion of risk budget and risk allocation. It allocates a large part of the risk budget to trajectory segments where collisions are more likely, and only small part of the budget to trajectory segments where collisions are less likely. Dynamic obstacles are not considered in these works. Toit and Burdick use a belief-space formulation [196, 47] and use chance constraints to avoid dynamic obstacles. By incorporating the fact that a new measurement of the robot state will be made in the future and the robot controller will attempt to regulate based on that measurement, they are able to produce more aggressive trajectories in the presence of uncertainties. A similar approach is taken in [202] where apriori probability distribution of state and control is computed using a linearized dynamic model under LQR control and a Kalman lter based state estimator. The idea is to analyse in advance how the trajectory of the robot will evolve over time. The uncertainty in the trajectory is then used to perform probabilistic collision detection along segments of the trajectory. In [131] and [66], terrain variation and its eect on the low-level trajectory tracking error is considered in the planning framework. In the trajectory tracking context, tracking error is typi- cally split into two parts: cross-track error and along-track error [169, 114]. Trajectory tracking error is characterized in terms of cross-track and along-track error for various terrain types. And, it is modeled as a function of terrain. The probability of successful navigation through the terrain is computed during search via a cost function that uses terrain-dependant error model. Bry et al. [29] produce trajectories by reasoning about collision avoidance in belief space. It uses the fact that a motion plan is going to be executed with a linear state estimator and controller to prediction future distributions of the state vector. The goal is to reduce uncertainty 22 by exploiting measurement regions (regions where measurement is accurate) and also nd low cost paths. A similar approach is taken by Berg et al. in [203]. The main novelty arises from the fact that it does not perform discretization of the state-space, does not use maximum-likelihood observations and runs in polynomial time. Instead, it makes a quadratic approximation the value function and approximates the belief dynamics using a EKF(Extended Kalman Filter). It considers the robot dynamics model as well as the perception model. In [145], a learning-based approach is taken to account for motion uncertainty. Samples of executed trajectories on some representative terrain is used to train a Gaussian Process Regression (GPR) based model. This model is then used in the MDP-based policy planner. This allows the MDP-based policy planner to handle traversal of rigid terrain and also deformable terrain such as small rocks. A Monte Carlo sampling scheme is used to compute collision probabilities [104]. We favor the Monte Carlo sampling approach as it is amenable to parallelization. However, this approach requires extensive sampling when high condence estimates of collision probabilities are required especially when chance of a collision is rare. This problem can be alleviated if adaptive sampling techniques are used [164]. Computing exact collision probabilities is important when probabilistic constraints are used. When only relative safety of trajectories are considered, the exact collision probability does not matter. In [79], the reported planning times are suitable in a reactive planning context. A semi-analytical method to calculate collision probability [143]. Accurate prediction of trajectories of moving obstacles [137] is a fundamental requirement for deliberative trajectory planning. Several threads of research work are in progress to provide predictions. An intention model is used in [168] to predict the motion of civilian vessels and develop contingency aware plans. In the marine domain, AIS (Automatic Identication System) data [198] can be used for coarse grained trajectory prediction. Though AIS is susceptible to communication interruptions [105], when it is used in conjunction trajectory prediction methods 23 using GPR [92, 57, 195, 14], can produce good estimates for long-term planning purposes [113]. Waves generated by moving vessels can be forecast upto 180s into the future using radar systems [71, 100]. In this thesis, it is assumed that such a perception system is available in the sensing suite to aid deliberative planning. 2.4 Handling High-Dimensional Conguration Spaces A prime example of trajectory planning in high-dimensional conguration space is manipulator trajectory planning. In this context, trajectory planning is typically done in two levels. In the rst level, paths satisfying problem constraints are generated. Problem constraints can consist of many constraints such as contact constraints (e.g., tracing workpiece borders, touching objects), manipulability constraints, grasping constraints and collision avoidance constraints. At the second level, trajectories are computed by meeting robot dynamics constraints. This section reviews methods used in the rst level. Specically, it considers only methods that handle collision avoidance constraints (e.g., point-to-point trajectory planning for manipulators). Point-to-point trajectory planning for manipulators is a well studied problem and researchers have developed many methods spanning potential-eld-based methods [204, 89], sampling-based methods [87, 95, 86, 79, 59], search-based methods [150, 211, 35, 36, 60, 61, 191], and optimization- based methods [84, 219, 221, 193]. Search-based methods such as [10, 146] have been successful in motion planning for high degree of freedom systems. In particular, Phillips et al. propose a trajectory planning method that utilizes plans from past episodes to accelerate the search. The premise of this method is that despite dynamically changing environments, large portions of the environments that the robot encounters are static. The planner should exploit this fact and learn to solve similar problems faster as more experience is gained. In this method, past solutions are stored in terms of an experience graph that allows this method to work for unseen start/goal congurations. For unseen 24 start/goal congurations, the method attempts to nd solutions that pass through the experience graph. This way regions explored by the experience graph is eectively reused. When new regions need to be included, the experience graph is updated seamlessly. By reusing and updating the experience graph every new planning query, this method is able to rapidly learn to focus planning eort in promising regions of the search-space. In [10], Aine et al. propose a search-based method that utilizes multiple heuristics to solve planning problems. The core idea is that a single guiding heuristic that works well in all regions of the search space is hard to generate. Often a single heuristic fails make progress as enters a heuristic depression region (a region where there is a large discrepancy between the heuristic value and the actual cost-to-go). Instead, multiple inadmissible heuristics can be generated. Each of these heuristics can be given an opportunity to make progress in the search space in a round- robin fashion. These dierent heuristics complement each other in such a way that not all of the heuristics experience a depression region at the same time. This allows the search to proceed even when a few of the heuristics are not eective. Sampling-based approaches have been extensively used for motion planning of high degree of freedom systems. Many variants of Probabilistic Roadmaps (PRMs)[87], and Rapidly-exploring Random Trees (RRTs)[95] have been developed. Many techniques have been applied to achieve these goals. These include introducing a biased sampling scheme, providing avenues for anytime convergence, heuristic guided sampling, and allowing a specied factor of sub-optimality [12, 18, 59, 86, 136, 187, 96, 107, 220, 190, 28, 159, 103, 213, 160, 218, 38, 162, 216, 217, 11, 126, 111, 68, 129]. In [49], recent developments of RRT and its variants are outlined. Recent approaches such fast matching trees (FMT )[123] and batch informed trees (BIT )[59, 58] attempt to outperform RRT in high dimensional problems. Traditional search-based methods suer the curse of dimensionality. When the dimensionality of the search space is large, search-based methods need to explore an astronomical number of conguration points to compute a sequence of actions leading to the goal. Typically, trajectory 25 planning problems in ve or greater dimensional space are generally computationally very expen- sive and take a lot of time to solve. This is due to the systematic evaluation of potential actions at each conguration space point which provides completeness at the expense of planning time. Traditional sampling-based methods do not face this combinatorial blow-up as they sample in conguration space. Sampling-based algorithms such as RRT [107] and PRM [87] capture the connectivity of free conguration space through the incremental construction of a free space connectivity graph. However, sampling congurations point by point to establish connectivity of the free-space is inecient and potentially useful information regarding the local structure of the free conguration space is discarded. Recent works have attempted to incorporate the information about local conguration space in various ways. Firstly, caching methods are used in [142] where the conguration space is learned through a sequence of conguration space validity queries. These methods build a predictive model of the conguration space to quickly answer validity queries. The method presented in [41] not only learns the conguration space but also engages in active learning to improve prediction accuracies. In other works such as [8, 174, 102], the local conguration space information is directly used in the search process. In [174], free conguration space volume information is used to build free space ball in conguration space. A tree of such balls is constructed using RRT and tree search is performed over the tree of balls to nd a sequence of balls leading to the goal conguration. A similar method is used in [143], [8]. [102] builds on [152], extends it and provides a method to approximate the free space volume in conguration space. An ecient method to construct volumes of free C-space (i.e. bubbles of free C-space) was rst introduced in [152]. The basic idea behind the method is to use workspace clearance-distance information to analytically dene 26 a volume in C-space that is guaranteed to be collision free. A bubbleB(q;d c )2C f is a region of free C-space around a conguration q dened as B(q; d c ) =fyj n X i=1 r i jy i q i j d c g (2.1) where d c is the clearance distance between the robot and the closest obstacle in the workspace. And,r i is the radius of a virtual cylinder that is coaxial with the rotational jointi and encompasses the rest of the robot from link i onwards till the end-eector. Details can be found in [152, 101]. Lacevic et al. extended it by introducing the notion of complete bubbles [101] which can be approximated by a set of spines (i.e. rays emanating from q). Adaptation laws to determine the delity of the approximate volume is presented. Using these better approximates of the free space volume, RRT style search is performed to yield a solution. Through the use of RRT method in constructing the search tree, the method is susceptible to missing some promising free space volumes and prone to nding sub-optimal solutions. Furthermore, if free space volume approximates are generated through caching methods such as [41], then these volumes may not be guaranteed to be free of obstacles. This possibility is explored in [174]. Researchers have also combined sampling-based approaches with Jacobian-based control to improve performance[182, 181]. Other ideas for improving computational performance include examining only promising sub- spaces of the overall search space [60, 62, 61], performing a multi-stage search [148, 147, 149], and the use of motion primitives [35, 36, 132]. Each method targets one or more of the following goals. Reducing planning time Increasing solution quality Providing optimality guarantees 27 In the context of bi-directional tree search, these goals are achieved by various techniques that modify: How sampling is performed How promising nodes are identied How promising directions to explore are found How small steps (i.e., extensions) are performed How two trees are connected 2.4.1 Ecient Collision Detection Ecient collision-detection and eective sampling distribution are fundamental requirements for the eciency of RRT-style planners [138]. Collision detection dominates planning time; ecient collision-detection reduces planning time. Eective sampling not only reduces planning time but also increases solution quality by focusing on regions where high-quality solutions are likely to occur. Recent eorts to improve eciency of collision-detection try to build approximate models of C-space [85, 142, 41] or attempt to nd mathematical conditions to avoid unnecessary collision- detection queries [152, 101, 139]. 2.4.2 Sampling Strategies Workspace cues have been used to guide sampling-based approaches [201, 98, 219, 158, 90]. Rickert et al. follow a two stage approach [158] where a RRT-based planner operating in C-space leverages workspace cues. Workspace paths connecting start and goal poses are used to bias the search into promising regions in [158]. As a precomputation step, a search-based method is used to nd a workspace path for the end-eector. This workspace path provides workspace cues that focus 28 RRT sampling and attempts to drag the end-eector along the workspace path. This method was deliberately designed as an incomplete method. This method works well in uncluttered workspaces and in situations where connectivity inC is largely in uenced by that of theW- space (e.g. conguration spaces of mobile manipulators). It behaves like a potential eld method whenW-space connectivity mirrorsC-space connectivity and works very well for many practical scenarios involving mobile manipulators and less cluttered manipulator workspaces. However, these methods suer when there is a large discrepancy between the connectivity in the cluttered workspace than that of the high-dimensional conguration-space. In [16], a novel sampling strategy is proposed. It uses a learning-based approach to perform approximate collision detection. It also ensures the sampled point is in a relevant region before continuing with an accurate collision checker. As this method is presented only in 2D conguration spaces, it is unclear if this method is applicable to higher-dimensions. Kiesel et al. use a focusing scheme [90] where a subspace ofC-space (i.e. essentiallyW-space) is discretized into regions. By incrementally establishing connectivity between these regions in an online fashion, easy regions of the C-space are identied. Planning eort is focused into easy regions as they are uncovered. This scheme also suers when there is a mismatch between connectivity in W-space and C-space. Although techniques that provide a diverse set of paths as in [21, 153, 94, 214] can be em- ployed, explicitly constructing a workspace path for the end-eector in a cluttered workspace while eciently pruning workspace paths having infeasible C-space counterparts seems to be just as hard as the original problem itself. Nevertheless, if we insist on utilizing workspace cues, then quickly detecting that a workspace path is misguiding is crucial for conserving computational resources. Clearly, there is a trade-o. We could invest computational resources attempting to nesse through a workspace path and succeed or admit failure and give up after a while resorting to uniform sampling. We require the focusing scheme to reliably predict a misguiding workspace path and give up quickly. 29 Other workspace guidance methods as in [133, 112] will not work for a cluttered 3D scene unless the workspace is carefully decomposed into a tractable number of disjoint polytopes. De- pending heavily on any particular workspace-focused-sampling strategy may be detrimental to performance. 2.4.3 Node and Target Selection Strategies In [205, 173, 112], the Jacobian pseudo-inverse is used to generate target congurations. Specif- ically, Weghe et. al. use a goal biasing scheme that drives the search towards workspace goal regions [205]. Jacobian pseudo-inverse is a potent technique for bridging the workspace and the conguration space. While it helps cover large distances in the workspace, intricate congura- tion space maneuvers cannot easily be performed using this technique alone. When solutions pass through narrow channels in the conguration space, sampling-based approaches tend to miss them unless special care is taken to sample these channels. Methods such as Retraction-based RRT [141],Spark PRM [170] and [187] focus on these problems. Variants such as RRT , PRM [86] guarantee almost-sure asymptotic optimality. However, without eective focused-sampling, nei- ther probabilistic completeness nor asymptotic optimality is practically appreciable cf. RRT [86], BIT [59]. In [154], a novel target selection method is employed. A Contractive Auto Encoder (CAE) is trained using a xed number of colliding congurations. Random maps and random start/goal congurations are used to generate optimal paths. These paths are segmented into many linear segments. Given the start point of a segment, goal conguration and the encoded obstacle map, a neural network is trained to produce the end point of the segment. Any lacking in the repre- sentation of the obstacles can severely aect performance. The main diculty in this method is being able to faithfully represent an unseen environment using the CAE. And, the MPNet does not adapt to changing tree-state, rather it just represents a static policy determined through 30 only through the initial sampling of the obstacles. Adapting to tree-state changes is required for robustness against any representational lacking. End-to-end Machine Learning (ML) methods to solve the motion planning problem are gen- erally not preferable as they may have unreliable performance (e.g., completeness issues). They typically require large training datasets. It is better to use Deep Learning (DL) based techniques as acceleration scheme for existing methods. In this spirit, a novel target selection strategy is proposed in [74] that uses Q-learning. It attempts to unify the benets of PRM (i.e., multi-query method) and also avoids the need for graph repair when obstacles shift under one method. In this work,C-space is approximated by tting Radial-Basis-Functions (RBFs) using pre-sampled C-space samples and their collision labels. The trained RBFs output a number denoting the ap- proximate degree of collision. Distance functions representing the conguration space distance from an arbitrary state to the goal and workspace distance between EE position of the arbitrary state to that of the goal are dened. The Q-value determined through weighted combination of the distance function and RBF approximation. The weights are learnt online. Another method that is meant to be used as a multi-query method is OracleNet [20]. It uses Long-Short-Term-Memory (LSTM) networks to learn the optimal next move to make in the conguration-space. While OracleNet has an advantage in terms of its applicability in a real-time setting (i.e., an one-step move can be immediately generated to move in a dynamic environment), it is likely to not work well for cluttered environments where a more intricate conguration-space maneuver needs to be made. In other words, even if obstacle context is added, it unlikely that LSTM networks can generalize and learn the required behavior for previously unseen and cluttered environments. This particularly true when narrow passages in static environments are missed by the training set. The main diculties in applying this method are being able to sample eectively in narrow passages (for generating training set) and nding a way to impart high-delity obstacle context to the network. 31 For example, in [76], the basic functions required by sampling-based methods are replaced by encoder, decoder and classier networks. The basic idea is to move the original problem into a latent-space and solve the latent-space problem. Training these networks requires a lot of data. The problem of tuning strategy vectors (i.e., planner parameters) can be posed as reinforcement learning problem [165] where we learn a policy that sets planner parameters as a function of the search-state. The main drawback is developing a feature vector to encode the search-state which includes a dynamically growing tree. Specically, we can apply a Deep Reinforcement Learning (DRL) approach to automatically extract features. If DRL scheme is used, we require a graph- embedding that accommodates a dynamic graph-network such as the two search-trees and the collision data collected so far. These entities grow in size over time. The embedding needs to be permutation invariant, node label invariant. Finding an embedding for these dynamic entities is challenging. Methods such as [124, 197, 64, 65, 63] deal with dynamic graphs but are not immediately applicable in our context. Another diculty in applying DRL is the fact that motion planning problems only have sparse rewards. For instance, we get a reward (i.e., solution found) after the whole episode of executing a policy. Moreover, the reward is stochastic which further aects the convergence properties of the DRL method. 2.4.4 Connection Strategies Variants such as RRT*, PRM* [86] guarantee almost-sure asymptotic global optimality. BIT* [59] uses a conguration space focusing schedule and provides optimality guarantees. While these methods are guaranteed to nd optimal solutions eventually, being able to utilize workspace guidance when it is accurate can greatly complement their strengths [34]. In [45], a workspace RRT is grown from the goal position. The resulting tree is used to bias the growth of the conguration space tree. This method works well for mobile manipulation in uncluttered environments. 32 2.4.5 Incorporating Human Assistance in Planning In many applications, there is always a human-operator managing robot operations even if these robots are automated. Often these operators perform a nal verication before the automatically- generated trajectories are executed. Despite the use of specialized techniques, automated path- planning methods may not always nd a solution within a given time budget; even if they nd a solution, the human-operator may reject it due to excessively long trajectories or uncomfortable proximity to obstacles. In such cases, the human-operator may as well play a proactive role in shaping the produced paths. The eld of cooperative-planning studies how human assistance can be incorporated into an underlying automated path planner. The goal of cooperative-planning is to harness the power of human intuition in focusing the automated search and produce high- quality solutions quickly. The level of assistance by a human-operator is a crucial factor in the usability and eciency of a cooperative-search. Denny et al. discuss the various levels of assistance in [44] such as specifying intermediate congurations, providing a rough path and providing hints [130, 50] in the form of focus regions to the underlying automated search. In [188, 19], haptic devices are used to generate intermediate congurations. Humans generate intermediate congurations slowly; intermediate congurations also pose a signicant User-Interface (UI) design-challenge unless expensive haptic devices are used. Generating a rough path, though eective [44], is mentally more demanding than providing focus regions. A method of providing focus regions is given in [43]. In [77], search progress stagnation is detected using heuristic values and auxiliary metrics related to nodes that are popped out of the priority queues. Once a stagnation is identied, the user is prompted for assistance in the form of a conguration point. 33 Chapter 3 Dynamics-Aware Reactive Trajectory Planning 3.1 Introduction The ability to successfully avoid collisions with dynamic obstacles is a fundamental capability needed to realize autonomous robots. The collision avoidance approach must take into account the robot's performance constraints, such as maximum achievable velocities, accelerations, braking distances, and turning radii. Collision avoidance with dynamic obstacles on uneven terrains is challenging because a robot's performance constraints change based on the robot state, terrain characteristics, terrain slope, and robot's orientation with respect to the terrain slope. Consider the stopping distance constraint as an example. The stopping distance constraint cannot be dened just based on the robot characteristics alone. The stopping distance changes based on robot's velocity. It also depends on the traction available on the terrain. The slope of the terrain aects this as well. Finally, the stopping distance is shorter if the robot is traveling uphill compared to the same robot traveling downhill. Thus, the planner must be dynamics-aware and use accurate estimates of robot's dynamic constraints based on the current state of the robot and the terrain. Using overly conservative dynamics constraints can avoid collisions by stopping or steering the robot far from the obstacles, but lead to signicant deviations from the intended trajectories and 34 compromise mission performance. By contrast, using constraints that overestimate robot's capa- bilities leads to an increased risk of collision. Therefore, using accurate constraints is important during the generation of reactive plans to avoid collision with dynamic obstacles. In order to avoid collision with dynamic obstacles, the robot must consider modications to the intended path by generating path alternatives and consider regulating speed along the alternative paths to nd a trajectory that avoids collision and minimizes deviations from the planned nominal trajectory. This trajectory modication must be done in real-time. Based on the available time, the planner should then automatically adjust the number of options that it evaluates to ensure that computation time is tractable and does not reduce the reaction time available to the robot. Finally, the reactive planner needs to account for uncertainty in the obstacle position and velocity when evaluating trajectories for collision risks. The uncertainty in obstacle position and velocity reduces as obstacles approach the robot. The planner should delay committing to a specic trajectory until the last possible moment to fully exploit the reduction in uncertainty in obstacle position and velocity. Dynamic obstacle collision avoidance is a well-studied problem (see Section 2 for a detailed discussion). Popular methods include Generalized Velocity Obstacles (GVO) [208] and its variants. In this work, a methodology that is real-time, dynamics-aware and generates trajectories which avoid collision with dynamic obstacles is presented. It works by blending the path selection and the speed prole selection. Conceptually, our method can be viewed in the same family as GVO. Our method diers from GVO in the way control/input space is partitioned. In GVO, control- space sampling and collision detection determines the available feasible control actions. These control actions impart changes to both the path and speed of the UGV. So, a dense control- space sampling is required to explore a variety of trajectories. In addition, one way time-varying dynamics constraints can be incorporated into the selection of control actions is to use costly forward simulations of at least a low-delity UGV dynamics model to verify feasibility of the 35 control actions. Otherwise, it is customary to assume an overly conservative UGV capability and use conservative control actions resulting in degradation in mission performance in some cases. In this method, trajectory generation is decoupled using the standard approach of geometric path selection followed by speed prole selection. As a result, the geometric paths can be more spatially focused for collision avoidance avoiding the need for dense control-space sampling. Fur- thermore, the variation in dynamics constraints along these paths is easier to incorporate as a function of the position of the UGV on the path. For example, a path crossing a slippery or sloped region imposes acceleration constraints that can easily be incorporated into the spatio-temporal conguration space of the path along with the dynamic obstacles. These paths are chosen with the curvature and velocity limits of the UGV. Our trajectory generator accounts for uncertainty in the obstacle position and velocities when evaluating alternative trajectories for collision risks and uses conservative values of estimated dynamics constraints to ensure that the risk of collision is minimized. It also uses deferred value binding to utilize improved estimates of obstacles' states as they come closer to the robot. Finally, it automatically adjusts the number of options being evaluated based on the estimated time to collision. A robot may encounter multiple dynamic obstacles. In order to maintain real-time performance, the approach presented in this work either groups multiple obstacles into a single composite obstacle in spatio-temporal conguration space or deals with them sequentially (i.e. prioritizing obstacles based on the estimated time to collision). This approach works well for both cases when obstacles are too close or too far apart. 3.2 Background Autonomous operation of an Unmanned Ground Vehicle (UGV) requires a path and trajectory planner that respects both the vehicle's kinematic and dynamic constraints [106]. Typical UGV 36 missions may span several kilometers and computing the entirety of a dynamically feasible trajec- tory over such large distances is time consuming and impractical in the face of dynamic obstacles. Thus, a hierarchical planning architecture is used. The hierarchical planning architecture is composed of three layers: (i) the Global Path Gen- erator (GPG), (ii) the Trajectory Generator respecting Kinematic Constraints (TGKC) and (iii) the reactive Trajectory Generator respecting Dynamics Constraints (TGDC). Each layer in the architecture has its own planning horizon dened by distance or time. And, each layer sets up a reference trajectory for the layer below. In the case where a reference trajectory imposed by a upper layer is found to be infeasible by the lower layer, the lower layer raises an exception that is handled by the upper layer. In this work, the GPG described in [166] is employed. It produces a globally optimal, any-angle, geometrically feasible path on a planning horizon which spans several hundred kilometers. The TGKC computes trajectories respecting the vehicle's kinematic constraints while tracking the geometric path laid out by the GPG. It employs a planning horizon spanning hundreds of meters. This relatively short planning horizon enables it to re-compute plans at higher frequency than the GPG. The TGKC computes a trajectory consisting of a reference path and a reference speed prole along the path. Only static obstacles are taken into consideration in TGKC while the dynamic obstacles are delegated to the TGDC. The focus of this work is the development of the third layer of the planning architecture: Trajectory Generation respecting Dynamics Constraints (see Section 3.3). The TGDC computes trajectories that respect the dynamics constraints of the vehicle and avoids moving/dynamic obstacles over uneven terrains, planning trajectories with a horizon of up to several seconds. Its re-planning frequency is signicantly higher than that of the TGKC, which is essential in the presence of dynamic obstacles. While avoiding dynamic obstacles, the TGDC also attempts to follow the trajectory provided by the TGKC as closely as possible, altering the velocity prole on the reference path. Where appropriate, the TGDC also locally alters the reference path prescribed 37 by the TGKC. These alterations are performed in view of vehicle capabilities on uneven terrain. An example is presented in Section 3.2 to brie y illustrate the trajectory generation process in TGDC. Let us assume that TGKC layer has commanded a UGV to reach the top of the ramp in Figure 3.1. The UGV is on the ramp, attempting to climb it. A dynamic obstacle is poised to cross the intended path of the UGV. In this situation, it is crucial to know if the UGV is capable of accelerating (against gravity) to pass the on-coming obstacle. On the same note, if the UGV happens to be facing down the slope, it is crucial to know if the UGV is capable of slowing down (against gravity) to avoid on-coming obstacles. Static obstacle Ramp Dynamic obstacle Reference path given by TGKC UGV Figure 3.1: An illustrative example showing the need for vehicle-dynamics aware planning 3 alternative paths 1 2 3 Speed pro file on path Time Path 1 Path 2 Path 3 Figure 3.2: Alternative paths choices are shown. For each path, there can be many associated speed proles that can circumnavigate the dynamic obstacle in the face of dynamics constraints. Only one speed prole is shown for each path. On each path, a few speed proles are evaluated in TGDC to yield the best path and speed prole. Suppose the UGV's throttle actuator is already close to saturation. If we simply assume that the UGV is capable of high acceleration and go full throttle ahead, the UGV may not able to 38 speed up fast enough to avoid the dynamic obstacle and end up colliding with it. Conversely, assuming overly conservative acceleration bounds for the UGV and not accelerating when it is in fact doable results in reference trajectory deviation. This means future commands to the speed of UGV have to be selected carefully. Broadly, the selected speed proles perform one of three strategies: pass obstacles (by rushing ahead along the path before the obstacle crosses the path), yield to obstacles (by waiting for obstacles to cross the path) or continue along path with no change to speed. Given that the UGV is already throttle-saturated, what other options are there to reach the top of the ramp quickly and safely? Instead of accelerating, the UGV could try modifying its path slightly to \buy" some time for maneuvering around the dynamic obstacle at the same speed. This means alternative future paths (see Figure 3.2) have to be selected opportunistically while minimizing the deviation from the reference path provided by TGKC. The rest of the chapter covers the technical details behind the TGDC. Symbols and terms used in the following sections are brie y described below. s2R : Parameter variable in the arc-length parameterization of a curve in 3D space. K(s) : A curve K :R7!R 3 in 3D space parameterized by s. l K : Total length of curve K. t2R : Time. v2R : Speed on a curve K. Speed prole : This refers to various speeds the UGV travels at along the path. Note that this is slightly misrepresented in the gures as a position vs time function (instead of speed vs time function). C K (or st space) : A conguration space C K dened over s and t. Each point p 0 = (s 0 ;t 0 )2C K R 2 in this space represents a particular position s 0 on a certain curve K(s) 39 at a particular timet 0 . The subscriptK denotes the association of the space with the curve K. (see Chapter 7.1.3 [106]) PW K (s 0 ;t 0 ;l;t h ) (or planning window) : A rectangular subset of C K dened by the rect- angle's two diagonal points p lower left = (s 0 ;t 0 ) and p upper right = (s 0 +l;t 0 +t h ) where t h is the horizon (look-ahead) time and l is the total length of the curve K. K (t) (orst trajectory) : A trajectory K :R7!R dened over a planning windowPW K with timet as the independent variable. In other words, this trajectory is a specication of position on the curve K as a function of time t. UGV state : A tuple of numbers consisting of the bounding circle radius, position, velocity, orientation and direction ( , see Figure 3.1) with respect to the slope of the terrain. accel;K (t) : Thest trajectory on the curveK as a function of time as the UGV accelerates under full throttle along the curve. This function also depends on UGV state. decel;K (t) : Thest trajectory on the curveK as a function of time as the UGV decelerates under full braking along the curve. This function also depends on UGV state. DO : A tuple of numbers consisting of the bounding circle radius, position and velocity of a dynamic obstacle. xyt obstacle : This is a projection of a dynamic obstacle onto the local terrain plane (i.e. discarding the z components) C K;DO (or st obstacle) : This is the set of all points C K;DO PW K corresponding to a particular xyt obstacle moving across the curve K. These points mark the position and time at which the UGV will be in collision with the dynamic obstacle DO. The UGV is in collision if the position of the UGV on the curve is s ugv att n and the point (s ugv ;t n )2C K;DO . Thus, C K;DO can computed by sampling points in PW K and checking if the sampled point corresponds to a collision between the UGV and a dynamic obstacle. 40 UL vertex (or upper left vertex) : This is the upper left vertex of a st obstacle C K;DO . UL(C K;DO ) = (s UL ;t UL )2C K;DO where s UL = max (s;t)2C K;DO (s) and t UL = min (s;t)2C K;DO (t) LR vertex (or lower right vertex) : This is the lower right vertex of a st obstacleC K;DO . LR(C K;DO ) = (s LR ;t LR )2C K;DO where s LR = min (s;t)2C K;DO (s) and t LR = max (s;t)2C K;DO (t) Velocity tuning (or speed regulation) : The process of nding a st trajectory K overC K such that the trajectory does not pass through anyst obstacleC K;DO . (see Chapter 7.1.3 [106]) 3.3 Problem Formulation Given: (a) A kinematically feasible, collision free reference path K r (s)2R 3 parameterized by the arc- length parameter s and a reference speed prole v r (s) over the path K r (s) (b) The current time t start (c) UGV state consisting of: p ugv 2R 3 : The current position of the UGV v start 2R : The current velocity in direction of motion : The average slope angle of the terrain under footprint of the UGV : The yaw angle with respect to the slope (d) A look-up table of speed, acceleration and deceleration constraints indexed by and (e) A set ofn do dynamic obstaclesD =fDO k g n do k=1 sensed by the perception system, each dened by: Instantaneous position measurement p do 2 R 3 and the associated perception variance p 2R following the Gaussian distributionN (p do ;I 33 p ) 41 Instantaneous velocity measurement v do 2 R 3 and the associated perception variance v 2R following the Gaussian distributionN (v do ;I 33 v ) Distance between obstacle and the UGV perception system d Noise saturation threshold d t Position uncertainty parameter { p (d) = 8 > > > < > > > : p;max d dt ; d<d t p;max ; dd t Velocity uncertainty parameter { v (d) = 8 > > > < > > > : v;max d dt ; d<d t v;max ; dd t (f) A set of kinematically feasible paths (computed oine) A, which locally modify K r (s) by connecting p ugv to a common point K r (s end ) on the path K r An alternative path: K m of length l Km where K m (0) =p ugv and K m (l Km ) =K r (s end ) A(p ugv ;s end ) =fK m (s)g n alt m=1 (g) Full-throttle acceleration trajectory for the current UGV state accel;K (t;t 0 ;v 0 ;s 0 ;(s); (s)) and full braking deceleration trajectory decel;K (t;t 0 ;v 0 ;s 0 ;(s); (s)) dened overC K space. s 0 denotes the current position of the UGV onK(s). t 0 denotes the current time. Additionally, the tuple (t 0 ;v 0 ;s 0 ) denes an initial condition over st space. (s) is the local slope angle of the terrain at the point K(s) and (s) is the angle the tangent at K(s) makes with the slope. Compute: A dynamically feasible trajectory, which minimizes collision risk and minimizes ref- erence path and reference speed deviation. 42 3.4 Approach The approach used in the TGDC to compute a dynamically feasible plan is outlined in Algorithm 1. The approach is described with reference to the motivational example. Generate a planning window on each path at the current time Generate a set of alternative paths from the current UGV position to a common end-point on the reference path Populate the planning window with − obstacles Generate a − trajectory by making a decision to pass obstacles or yield to obstacles or continue with no change to speed Path Collision Risk Cost Best Strategy 0.7 0.0 No change 1 0.0 50.0 No change 2 0.1 30.0 Yield 3 0.6 40.0 Pass = { , 1 , 2 , 3 } 1 2 3 4 Return a symbolic plan ( 1 , ) i.e. Use path 1 and maintain 5 1 2 3 Figure 3.3: Graphical overview of the approach 3.4.1 Construction of a Planning Window Line 3 of Algorithm 1: The kinematically feasible pathK r (s) and speed prolev r (s) provided by TGKC is used as a reference by the TGDC. They are termed reference path and reference speed prole respectively. Each time the TGDC is invoked, the TGDC generates a set of paths A. This path generation can be done online or oine. In this work, A is constructed by picking a few paths from a library of pre-computed paths described in [72]. Some of the these paths may be in collision with static obstacles when rotated, translated and placed in front of the UGV. While some other paths may not satisfy the curvature constraints imposed by the vehicle kinematics. These paths are removed fromA. The number of alternative pathsn alt is scaled based on the time to collision with the next 43 Algorithm 1 Trajectory planning with dynamics constraints 1: procedure ComputeDynamicallyFeasiblePlan 2: Depending on time to collision with the nearest obstacle, choose the number of alternative paths n alt to generate. 3: Generate the path set A(p ugv ;s end ) based on the current vehicle position and look-ahead distance. 4: Obtain the dynamic obstacle state measurementsD =fDO k g from the perception system 5: for each path K m 2A do 6: Construct a planning window PW Km 7: Compute vehicle dynamics constraints accel;Km , decel;Km based on and along the path K m and the UGV state 8: Pre-process the reference speed prole on the path using vehicle dynamics constraints to form a dynamically feasible reference trajectory 9: for each DO k 2D do 10: Obtainn s samples ofDO k (from the Gaussian distribution) and compute the st obstacle samplesfC Km;DO k ;n g ns n=1 11: If combining multiple obstacles is appropriate, coalescest obstaclesfC Km;DO k ;n g n do k=1 to form n s samples of a single composite st obstaclefC Km;DO;n g ns n=1 12: for each strategy2fPASS;YIELD;NOCHANGEg do 13: Compute p coll;strategy , c strategy using the st obstacle samplesfC Km;DO;n g ns n=1 using the method in Section 3.4.4 14: Append the cost tuplep coll;pass ;c pass ;p coll;yield ;c yield ;p coll;nochange ;c nochange for each path option K m and each of the strategies to the cost matrix 15: Generate a symbolic plan (K bestpath ;c beststrategy ) consisting of the best path option and best strategy on that path by using criteria in Section 3.4.5 44 Planning window Figure 3.4: (a) PathK and (b) its corresponding planning windowPW K (0;t start ;l K ;t h ) showing a reference speed prole v(s) to follow dynamic obstacle. This allows for fast reactive collision avoidance even when available reaction time is limited. In this work, n alt ranges between 2 and 10. Each path option starts at p ugv and ends atK(s end ). It deviates slightly from reference path midway. Speed prole on the alternative path could be specied using the dynamic trajectory space [178] or adapted from the reference speed prole. For example, in Figure 3.1 and 3.3,A =fK r ;K 1 ;K 2 ;K 3 g. K r will not be included if the UGV happens to be signicantly away from K r . Each path in A is such that it starts at the current position of the UGV and ends at a common pointK r (s end ) wheres end =s start +l deviation . s start is parameter value on K r such that K r (s start ) is the closest point on K r to the current UGV position. l deviation is the distance overK r in which the UGV will be deviating from the reference path K r . l deviation typically has to scale with the obstacle size. In this work, l deviation was xed to 10 m as obstacles were of a xed size. Note thatK m (l Km ) =K r (s end ). For each pathK m 2A, 45 a planning window PW Km (0;t start ;l Km ;t h ) is constructed (see Figure 3.4b). Thus, the planning window is t h wide starting at the current time t start and l Km tall starting at 0. So far, the planning windows are all blank. The next step is to populate each window with dynamic constraints. 3.4.2 Construction of Dynamics Constraints Line 7: The terrain type (i.e. the local slope at the vehicle footprint) is used to compute the associated dynamics constraints. In this work, a simple look-up table (Table 3.1) is used to determine parameters describing the acceleration and deceleration capabilities (3.1) of the vehicle in a particular terrain type. Acceleration and deceleration constraints are specied via trajectories in the planning window. Terrain parameters and the orientation of the vehicle on the terrain aect the local acceleration and deceleration constraints. In this work, only the local slope under the vehicle's footprint and the direction with respect to the slope are used to form equations of acceleration and deceleration. However, other parameters aecting the powertrain can be easily incorporated into these equations [178]. The upper bound on accelerationa + and the upper bound on decelerationa used in this work are shown in (3.1) and Table 3.1. For a certain initial state (s 0 ;t 0 ;v 0 ) and a pathK, an acceleration trajectory accel;K (t;t 0 ;v 0 ;s 0 ;(s); (s)) and a deceleration trajectory decel;K (t;t 0 ;v 0 ;s 0 ;(s); (s)) can be computed (see Figure 3.5) as the double integral of the equations describing upper bounds on acceleration and deceleration along the path. Note that and change along the path K. Table 3.1: Scaling parameters for each terrain type Parameter Slope angle ( ) -10 0 10 v max (m/s) 2:0 1.8 1:6 k 0:3 + 0:2c 0:3 0:3 0:2c a min (m/s 2 ) 0:2 + 0:1c 0:2 0:2 0:1c 46 a + (v) =k(v max v) a =a min (3.1) accelerating decelerating stopping speed saturating stopping distance time to maximum speed Infeasible region Feasible region Figure 3.5: Maximum speed, acceleration and deceleration constraints are trajectories in the planning window corresponding to a certain curve. The acceleration trajectory (blue) shows the resulting st trajectory if maximum throttle were to be applied from the initial state (s 0 ;t 0 ;v 0 ) onwards. Similarly, the deceleration trajectory (red) shows the resulting st trajectory if maximum braking were to be applied from the initial state. Thus, the region above the acceleration curve and the region below the deceleration curve are not reachable from the initial state. In this context, a target point in st space is considered reachable if there is a continuous function between the initial state and the target point that does 47 not pass through st obstacles or the infeasible region. At this point, planning windows have dynamic constraints. The next step is to populate them with dynamic obstacles. 3.4.3 Construction of s t Obstacles Line 8: A given reference speed though kinematically feasible, may be dynamically infeasible when vehicle is on certain types of terrain (e.g. high-speed over uphill terrain). In such circumstances, the reference speed prole is capped at the maximum speeds allowed by the local terrains in the planning window. This pre-processing step ensures that reference speed prole is reasonably altered for the terrain conditions but not necessarily collision free. Line 10: Collision avoidance using the velocity tuning method requires dynamic obstacles to be translated from xyt space to the planning window PW Km . Dynamic obstacles that cut across the reference path are termed xyt space obstacles henceforth. Figures 3.7 and 3.6 collectively illustrate the relationship between xyt space and st space. For example, the st obstacle C K;DO corresponding to the dynamic obstacle cutting across the path K in Figure 3.6 is computed by moving the in ated obstacle time step by time step from t =t 0 to t =t 1 and noting the intersection between the in ated obstacle and the path. Thus, in this case, for the path K, C K;DO is [s a ;s b ] [t 0 ;t 1 ] as shown in Figure 3.7. Note that the subscript m is omitted from K m for notational simplicity. However, it is to be understood that K has to be replaced with each of the paths in A to yield path specic st obstacles such as C Km;DO . 3.4.4 Evaluating Strategies for Obstacle Avoidance Given a st obstacle, a schematic view of two possible trajectories is illustrated in Figure 3.7. The two possible trajectories, pass and yield, are computed over the time range [t start ;t end ]. The two classes of speed proles maneuvering around the st obstacles are termed strategies. When the UGV executes the pass strategy, the UGV passes in the front of the dynamic obstacle before tracking reference speed. When the UGV executes the yield strategy, it slows down to pass the 48 Alternative paths K(s) Figure 3.6: Snapshots of an in ated (vehicle radius + obstacle radius) xyt space obstacle as it passes diagonally across a few paths at velocity v. (also see Figure 3.7) dynamic obstacle before tracking reference speed. There is a third strategy no change that is not shown. This strategy is applicable when reference speed prole does not intersect with the st obstacle and hence, the UGV can follow the speed prole with no changes. Multiple dynamic obstacles cutting across the path are either dealt with sequentially one after the other or coalesced into a composite obstacle depending on how these obstacles are spaced apart in the planning window. Figure 3.8 shows how two st obstacles are combined into one singlest obstacle by obtaining the bounding box ofC K;DO1 andC K;DO2 . Section 3.5.4 species a condition to determine if multiple obstacles should be merged depending on s and t. Either way, multiple dynamic obstacles are reduced to a singlest obstacle to be dealt within a planning window. This way, the pass and yield strategies developed so far still work the same way even with multiple obstacles in the planning window. Before any informed decision can be made regarding the strategy to choose, perception uncer- tainty associated with the dynamic obstacle state needs to be incorporated. In order to incorporate dynamic obstacle state (position and velocity) uncertainty and assess the risk associated with each strategy, the uncertainty model of a xyt obstacle has to be utilized. Multiple realizations of the dynamic obstacle state is sampled from the Gaussian distribution yielding samples of position 49 UL LR Pass strategy Yield strategy Reference speed pro file Figure 3.7: A planning window in which a st space obstacle (dark red) and its bounding box approximation (light red) are shown. The reference speed prole is infeasible as it intersects the st obstacle. Two trajectories start at (s start ;t start ) in the planning window. The passing trajectory (green) accelerates in front of the obstacle using the upper left (UL) vertex. The yielding trajectory (purple) decelerates and yields to the obstacle using lower right (LR) vertex. (also see Figure 3.6) and velocity. In this work, a sample sizen s of 50 is used. For each samplep do;n andv do;n , a corre- spondingst obstacleC K;DO;n can be computed. A few samples ofst obstacles corresponding to a dynamic obstacle is illustrated in Figure 3.9. In order to circumnavigate the obstacle, there are two strategies that can be employed - passing and yielding. A passing strategy involves accel- erating and passing in front of the obstacle, which requires targeting and reaching the upper left (UL) vertex from the initial state (s 0 ;t 0 ;v 0 ). Similarly, a yielding strategy involves decelerating and letting the obstacle pass, requiring targeting and reaching the lower right (LR) vertex from the initial state. For a passing strategy, the risk of collision can be estimated as the ratio of the number UL vertices that are unreachable to the total number of UL vertices. Similarly, for a yielding strategy, the risk of collision can be estimated as the ratio of the number of LR vertices that are unreachable to the total number of LR vertices. For the passing strategy shown in Figure 50 UL LR Figure 3.8: An example of how twost obstaclesC K;DO1 andC K;DO2 are merged into one single st obstacle C K;DO 3.10, the collision risk is 1=6 since there is only one UL vertex that is situated in the infeasible region. In other words, R(p) = 8 > > > < > > > : 1; if point p is not reachable 0; if point p is reachable (3.2) p coll;pass = 1 n s ns X n=1 R(UL(C Km;DO;n )) (3.3) p coll;yield = 1 n s ns X n=1 R(LR(C Km;DO;n )) (3.4) For the no change strategy, the ratio of st obstacles instances intersecting the reference speed prole to the total number of st obstacle instances is taken as the collision risk. This way perception uncertainty and satisfaction of dynamics constraints are taken care of. Note that the notion of collision employed here not only involves a physical collision but also counts inability to satisfy dynamics constraints as a collision. In addition to collision risk, each strategy has an associated reference trajectory deviation cost. Section 3.4.6 describes the cost function used to evaluate a strategy. Once collision risk 51 Infeasible region Figure 3.9: Qualitative example of risk assessment for a passing strategy where 6 samples of a st obstacles on a particular path are used. One of the samples has the upper left (UL) vertex in the infeasible region. Thus, the collision risk is 1/6 for the passing strategy and trajectory deviation cost are computed for each strategy, they are assembled into a 6-tuple containing collision riskp coll and reference deviation costc for each of the three possible strategies that can be attempted over the reference path. An example of such a tuple is shown in the rst row of Table 3.2. Table 3.2: An example of the cost matrix for 4 path choices Path choice Strategy choice Pass Yield No change p coll c pass p coll c yield p coll c nochange K r 0.5 37.2 0.0 43.3 0.9 0 K 1 0.1 45.2 0.0 108.6 0.1 40 K 2 0.0 77.2 0.0 103.3 0.0 50 K 3 0.2 63.2 0.1 93.3 0.0 60 3.4.5 Generating Dynamically Feasible Plans Line 13: For each path inA, each strategy is evaluated. And, a cost matrix is populated to form the entire table shown in Table 3.2. 52 Line 15: A combination of path option and strategy pair is termed as a symbolic plan. For example, (K 3 ;YIELD) is a symbolic plan. The risk assessment method implicitly handles the dynamics constraints. Hence, picking symbolic plans with zero risk satises dynamics constraints. In some scenarios, there might be multiple symbolic plans that are risk free. In these cases, the expected cost is used to favor one symbolic plan over the other. Conversely, in some scenarios, there might be no symbolic plan that is risk free. In such cases, the symbolic plan exhibiting the least risk is chosen without regard for costs. An exception is raised and fed to the higher-level layer to trigger a replan. Note that the symbolic plan does not specify any specic speed or specic coordinates on the planning window that has to be targeted and reached. Rather, it only species an abstract class of trajectories over the planning window. A value binding operation that converts the symbolic plan to a numerical plan occurs only when the actual obstacle is observed at a closer range during execution. Being at closer range, more accurate st obstacles are observed and this accuracy is exploited to minimize reference trajectory violation while retaining safety. This value binding operation is needed because the perception system may be running at a higher frequency and thus, provide fast updates to dynamic obstacle measurements. During the value binding operation, new samples ofst obstacles are used to construct a worst casest obstacle by obtaining a bounding box around the samples (see blue rectangle in Figure 3.11). If the strategy decision was to pass, the planner defaults to following the reference speed prole (i.e. no change strategy) till a critical time after which value binding operation is triggered. This critical time depends on the acceleration curve as shown in Figure 3.11. For the pass strategy, the value binding operation uses the UL vertex of the worst casest obstacle and computes the command velocity along the chosen path as v command = s UL =(t UL t start ). Similarly, if the strategy decision was to yield, the planner defaults to following the reference speed prole till a critical distance after which value binding operation is triggered. This critical distance depends on the stopping distance associated with the deceleration curve. For the yield strategy, the value binding operation uses the LR vertex 53 of the worst case st obstacle and computes the command velocity along the chosen path as v command =s LR =(t LR t start ). The rationale behind such triggering is to incur as little reference trajectory deviation as possible and wait till it is absolutely necessary to act to avoid collision. Value binding triggered this way minimizes reference speed deviation. After the pass or yield maneuver, the planner resumes following the reference speed prole. Infeasible region Reference speed profile s-t obstacle cloud No change strategy Evaluation of no change strategy Evaluation of pass strategy Infeasible region Reference speed profile s-t obstacle cloud Pass strategy Evaluation of yield strategy Infeasible region Reference speed profile s-t obstacle cloud Yield strategy Figure 3.10: Evaluation of strategies 3.4.6 Trajectory Deviation Cost Function A distance based metric is used to evaluate symbolic plans consisting of a path choice and a strategy choice. The trajectory deviation cost consists of path deviation penalty c path incurred while using alternative paths and speed deviation penalty incurred when passing and yielding 54 strategies are used . The cost of pass, yield and no change strategies on a certain path are computed as follows: c path = 8 > > > < > > > : l Km ; if path is K m 0; if path is K r (3.5) c pass = s UL +s LR 2 jv p v r j v r +c path (3.6) c yield = s UL +s LR 2 jv y v r j v r +c path (3.7) c nochange = 0 +c path (3.8) The cost function captures the distance over which speed violation happens and also weights it by the degree of speed deviation from the reference speed v r (Figure 3.10). Note that the cost computed by this cost function corresponds to only one sample of the st obstacle. Over n s samples selected from ast obstacle distribution, an expected cost can be computed for each of the strategies as follows: E[c strategy ] = 1 n s ns X k=1 c strategy;k (3.9) 3.5 Results and Discussion The following performance metrics are used to evaluate the executed trajectory: Number of collisions Execution time: the time taken in moving from start point to end point Speed violation ratio: the ratio of execution time spent in violation of the reference speed prole to the overall execution time Path deviation distance: the distance over which reference path was not followed. 55 The TGDC was implemented on the scenario shown in Figure 3.12. The scenario features terrain strips with varying slopes. A point mass dynamic model of a vehicle was used. The capability of the vehicle was articially capped at the limits allowed by the terrain. For example, if the terrain only allowed a certain maximum speed 2:0ms 1 , any further throttle will have no eect after the vehicle reached 2:0ms 1 . The vehicle was placed at the start point and commanded to reach the end point. Then, a kinematically feasible trajectory was supplied to the TGDC. Dynamic obstacles were positioned along the reference path at various points and with their trajectories crossing the path at various angles. Each dynamic obstacle was assigned a trigger region, as the vehicle entered the trigger region, the dynamic obstacle was set in motion and assigned a specic speed. This speed was such that collision would occur if the vehicle continued traveling at the reference speed. Such triggering emulates the appearance of dynamic obstacles within the vehicle's sensing radius. Once triggered, a dynamic obstacle's state information is available to TGDC. Dynamic obstacle speed was also randomly varied around the nominal obstacle speed. 3.5.1 Impact of Inexact Dynamics Constraints on Performance The TGDC's sensitivity to inexact dynamics constraints was measured by deliberately supplying a range of dynamics constraints around the correct dynamics constraints. Each dynamics constraint parameter in Table 3.1 was in ated by constraint augmentation factor to yield new parameters that were either more conservative or aggressive than the correct values. Figure 3.13 shows the collision rate and speed violation as the dynamics constraints are altered. Speed violation ratio has been normalized with respect to that of the exact dynamics constraints ( = 0). Figure 3.13 shows that conservative ( < 0) dynamics constraints results in zero collision at the cost of increased speed violations. On the other hand, aggressive ( > 0) dynamics constraints result in lower speed violations at the cost of a higher collision rate. 56 Table 3.3: Comparison of speed-regulation, path-variation against the hybrid approach (a) Pre-critical value binding (dynamic obstacles detected on time) Normalized metric Collision avoidance approach Path-variation only Speed-regulation only Hybrid Collision rate 0 0 0 Execution time 0.95 1.05 1.0 Path deviation 1.89 - 1.0 Speed violation ratio - 1.3 1.0 (b) Post-critical value binding (dynamic obstacles detected too late) Normalized metric Collision avoidance approach Path-variation only Speed-regulation only Hybrid Collision rate 6.34 2.65 1.0 Execution time 0.77 0.96 1.0 Path deviation 0.38 - 1.0 Speed violation ratio - 1.2 1.0 3.5.2 Impact of Perception Uncertainty on Performance To observe the eect of perception uncertainty on TGDC performance, the dynamic obstacle noise model parameters (position uncertainty parameter, p;max and speed uncertainty parameter v;max ) were varied, while the TGDC was supplied conservative dynamics constraints ( =20%). These noise model parameters control the interval width of the uniform distribution used in the obstacle position and velocity measurement. Note that v;max scales with the velocity of the dynamic obstacle. For each combination of noise parameters, 50 independent experiments were performed. The noise saturation distance threshold d t (see Section 3.3) was set to 10 meters. The normalized mean execution time and normalized speed violation ratio are shown in Figure 3.14. These quantities were normalized with respect to the zero noise case. No collisions were observed in these experiments. However, as noise increased, the execution time and speed violation ratio increased by up to 31:7% and 88:1% respectively. This result shows that increased dynamic obstacle state measurement noise does not compromise the safety of our method. 57 3.5.3 Need for Both Speed-Regulation and Path-Variation Relying on speed-regulation alone (with no alternative paths) to avoid dynamic obstacles results in longer execution time and a larger reference speed deviation. This is undesirable when dynamic obstacles stop along the reference path, which causes the vehicle to stop on the reference path indenitely. Similarly, avoiding dynamic obstacles by solely relying on moving on alternative paths (without speed-regulation) results in a larger reference path deviation. This is especially undesirable when no alternative paths exist (e.g. in a narrow passage). Table 3.3 compares the execution time, path deviation and speed violation ratio of the path-variation and speed- regulation approaches against the hybrid approach. No collisions were observed when dynamic obstacles were sensed suciently early (Table 3.3). These results show that the hybrid approach used in TGDC, combining speed-regulation and path-variation incurs a lower path-deviation and speed violation ratio at the cost of a marginal increase in execution time. The sudden appearance of dynamic obstacles impacts collision avoidance. The value of the hybrid approach is more apparent when dynamic obstacles appear suddenly and hence, are sensed late (after the critical point). In such a situation, speed-regulation alone may not avoid collisions due to the UGV's inability to quickly decelerate or accelerate. Similarly, swerving to the extreme left or right without modifying speed, may cause vehicle to overturn. In contrast, the hybrid approach allows for these actions whilst decelerating. Allowing such actions results in lower collision rate (Table 3.3). 3.5.4 Sequential Processing of Dynamic Obstacles The spacing between multiplest obstacles in the planning window aects TGDC performance. If thest obstacles are well spaced, it may be better for the TGDC to consider eachst obstacle sequentially. Otherwise, it is better to coalesce them into a single compositest obstacle as shown in Figure 3.8. To quantitatively study the eect of spacing between two st obstacles, a grid 58 of st points is used (see Figure 3.16). Each point in the grid corresponds to the centroid- to-centroid vector (t; s) between two st obstacles (e.g. (0; 0) implies both obstacles are coincident, see Figure 3.8). The s and t axes of Figure 3.15 are normalized by vehicle stopping distance (corresponding to maximum speed) and vehicle stopping time, respectively. Figure 3.15 shows collisions whenst obstacles separated by 1.5 stopping distances (s) and 1 stopping time (t). This arises due to sequential processing of obstacles, where overtaking one obstacle put the vehicle on a collision course with the next. From the examination of Figure 3.15, st obstacles with centroids less than one stopping time away from each other or less than approximately 2 stopping distances can be safely handled by creating a composite st obstacle. However, Figure 3.16 shows an increase in speed violation ratio up to a factor of approximately 3.7 when these compositest obstacles are used. This trade-o must be taken into consideration while processing multiple dynamic obstacles within a single planning window. 3.6 Summary In this work, a method to generate collision risk-aware, dynamically feasible trajectories for a UGV operating over uneven terrain is presented. This method is shown to produce collision-free trajectories when the correct dynamics constraints are used even in the presence of sensor noise. Many recent approaches for avoiding collisions with dynamic obstacles respect the dynamic constraints of the vehicle. However, as the terrain changes, dynamic constraints imposed on the vehicle also change. A common practice is to assume conservative dynamics constraints that work on all terrains. This reduces overall mission performance. In this work, the focus is on developing a real-time dynamics-aware reactive trajectory generator which produces trajectories that avoid collisions with dynamic obstacles under varying dynamic constraints. The trajectory generator considers modications to the intended path, generating alternatives in real-time. It also considers regulating speed along the various modied paths, nding a trajectory that avoids collision while 59 minimizing deviation from the intended trajectory. It accounts for uncertainty in the obstacle position and velocity while evaluating the alternative trajectories and uses conservative estimates of the vehicle's dynamic constraints to ensure collision risk is minimized. It also uses \deferred value binding", to exploit more accurate estimates of the dynamic obstacle states as obstacles approach the vehicle. It automatically adjusts the number of options being evaluated based on the estimated time to collision. In order to ensure real-time performance, the trajectory generator handles multiple dynamic obstacles by either grouping them into a single composite obstacle in the conguration space or deals with them sequentially by prioritizing obstacles based on the estimated time to collision. Simulation results are presented to show that the planner is able to eectively deal with the dynamic obstacles on terrains with varying slopes. 60 Time left to react Critical time to act has been reached Time left to react Critical distance or time to act has been reached Distance left to react (a) (b) (d) (c) Pass Yield Numerical plan Numerical plan Figure 3.11: Triggering of value binding operation depends acceleration/deceleration curve. (a) and (c) illustrate the scenario where pass and yield strategies have been decided but reference speed prole is followed. (b) and (d) illustrate the scenario a little later at t 1 and just after value binding has occurred. The computed numerical plan is shown as a dotted black line. The blue rectangle shows the worst-casest obstacle corresponding to the redst obstacle samples used in value binding process 61 SLOPE = 10° SLOPE = 0° SLOPE = -10° SLOPE = 10° SLOPE = 0° SLOPE = -10° SLOPE = 10° SLOPE = 0° Dynamic obstacle Direction of movement Alternative paths Start point End point Figure 3.12: Top view of the scenario used for generating the results. Dynamic obstacles are poised to cut across the reference path from either the left or the right of the reference path. Inset: a magnied view of the dynamic obstacle (red with yellow buer region), reference path (red) and alternative paths (blue, pink). Figure 3.13: Relationship between collision rates and speed violation as the constraint augmen- tation factor is varied. 62 (a) (b) Figure 3.14: (a) Percent increase in execution time and (b) percent increase in speed violation ratio as uncertainty parameters p;max and v;max are varied. Figure 3.15: When handling two obstacles sequentially, collisions (green) happen when st obstacles are spaced less than 1.5 stopping distances and 1 stopping time 63 Figure 3.16: When handling two obstacles as a single composite obstacle, speed violation ratio increases when st obstacles are spaced apart further than 0.5 stopping distances 64 Chapter 4 Deliberative Trajectory Planning 4.1 Introduction Automated trajectory planning is a prerequisite for realizing autonomous unmanned vehicles. Collision avoidance with static and dynamic obstacles under perception uncertainty is a basic requirement for generating useful trajectories. The focus of this work is a framework for deliber- ative trajectory planning in the presence of large semi-permeable dynamic obstacles and motion uncertainty. Typically, trajectory planning for unmanned vehicles is decoupled into two parts. First, a path planning problem is solved to obtain a tentative path, ignoring dynamic obstacles. Then, a motion planning problem is solved, this time considering dynamic obstacles, using the tentative path as a reference to yield a nal trajectory. The motion planning problem is conventionally solved using reactive planning approaches like generalized velocity obstacles (GVO) [209] and velocity tuning [179, 156]. These methods work well when the dynamic obstacles are small and the introduction of these dynamic obstacles does not introduce drastic dierences between the true optimal path and the tentative path. Thus, in the case of small dynamic obstacles, a reactively-planned trajectory is likely to be close to the true optimal trajectory while following a tentative path computed while ignoring small dynamic obstacles. 65 However, in the case of large dynamic obstacles, it is generally likely for reactively-planned trajectories to be highly sub-optimal. For instance, on the one hand reactively avoiding (i.e., going around) large dynamic obstacles may introduce unnecessary slow-downs and increase ex- ecution time. On the other hand, punching through semi-permeable dynamic obstacles may decrease execution time but increase the risk of failure and collisions. Therefore, handling large semi-permeable dynamic obstacles requires deliberative planning that can carefully balance the competing priorities (i.e., reduce execution time, reduce failure/collision rate). For example, reducing failure/collision rate should be the rst priority. When it is possible to have zero fail- ures/collisions, reducing execution time is the next priority. In some situations such as those that arise in the marine robotics domain, navigation around or through large dynamic obstacles is required (Figure 4.1). Figure 4.1: Handling large dynamic obstacles requires deliberative planning. The waveeld generated by a marine vessel is a good example of a large semi-permeable dynamic obstacle. In a majority of practical scenarios, unmanned ground vehicles exhibit signif- icantly less motion uncertainty compared to unmanned surface vehicles. Trajectory planning for 66 unmanned ground vehicles is a well-studied problem and in the recent years, signicant progress has been made in solving it using a wide variety of methods such as graph search [118], stochastic tree search [57], MDPs (Markov Decision Processes), optimal control [88, 210]. Graph search methods like state lattice search have become popular as global planning methods due their op- timality guarantees and eciency in large environments. Unmanned surface vehicles exhibit more motion uncertainty due to signicant external dis- turbances like waves, wakes and winds [184]. Collision avoidance is more challenging under these circumstances and it involves not only staying a safe distance from static obstacles and dynamic obstacles but also following COLREGs [186]. COLREGs is a set of maritime navigation rules set by the International Maritime Organization (IMO) to minimize collisions and confusion among marine vessels. The eects of motion uncertainty is markedly more pronounced in smaller USVs as they are particularly more susceptible to wave disturbances. Motion uncertainty can be rig- orously handled in a MDP (Markov Decision Process) framework for environments with static obstacles. It requires a state transition model of the USV. A state transition model to account for wave disturbances is developed in [194] for use in path planning amid static obstacles. In addition to ambient sea waves, waveelds generated by the motion of dynamic obstacles over the sea create yet another source of motion uncertainty and adds to the failure modes. The waveeld induces external disturbances that vary both spatially and temporally. Trajectory planning for USVs in the presence of dynamic obstacles needs to consider a state vector that includes position, orien- tation, velocity and time. For time-varying environments, the regular MDP can be converted to a time-extended MDP [120]. Time-extended MDP over high dimensional state spaces is complex and computationally prohibitive [83, 161]. On the other hand, na ve lattice-based search over mo- tion primitives require careful collision risk assessment and specialized heuristics for dealing with dynamic obstacles in the presence of signicant motion uncertainty. Large-scale POMDPs can be approximately solved only in a reactive online fashion using methods developed in [176, 212]. The main contribution of this work is a deliberative planning framework which 1) uses a trajectory 67 tracking error model to handle motion uncertainty, 2) introduces a failure-risk assessment scheme for moving wave-elds generated by trac vessels, and 3) utilizes speed-up techniques for the search process to generate trajectories rapidly to be useful in dynamic environments. A state lattice search [118] approach is employed in this work. And, the waves generated by the moving vessels are modeled as trailing spatio-temporally changing penalty regions. 4.2 Background 4.2.1 Vehicle Model For trajectory planning purposes, a simplied kinematic vehicle model is used to describe the motion of the USV. _ x =f(x; u; w 0 ) (4.1) = 2 6 6 6 6 6 6 4 cos 0 sin 0 0 1 3 7 7 7 7 7 7 5 2 6 6 4 v ! 3 7 7 5 + w 0 (4.2) The USV pose is x = (x;y; ) T where x and y denote the position in North-East-Down (NED) coordinates. k denotes heading with respect to north. USV control variables are u = (v;!) T and w 0 is a zero-mean random process noise capturing the eects of environmental disturbances. A state s is dened by the tuple (x;y; ;t;p s ) encoding the USV pose, the time-stamp and probability of being operational upto t units of time from the start time. A motion goal is a tuple MG = (x f ;y f ; f ;t f ) encoding a pose (x f ;y f ; f ) to be reached at time instant t f . Given an initial state x i and a nal state x f = (x f ;y f ; f ) T specied by the motion goal MG, and letting w 0 = 0, a nominal trajectory x(t) satisfying x(t i ) = x i ; x(t f ) = x f can be computed using trajectory generation methods [42, 88]. It is assumed that the low-level controller of the USV will 68 guarantee that target pose x f is achieved by the time deadlinet f . In this work, Dubins curves [48] is used as the optimal trajectory between motion goals using 70% of the maximum allowable USV speed (as the controller may need to exercise full throttle occasionally to compensate for time losses while rejecting disturbances). A motion goal set MGS r;n dirs (s) is a collection of motion goals spaced out a distance ofr units relative to a particular state s and directed atn dirs dierent angles. Each of the motion goals is given a label l corresponding to its direction from s. Thus, MGS r;n dirs (s) =fMG l g[MG O , wherel =k360=n dirs ;k2 [0;n dirs 1]. The labelO indicates a wait action of duration t wait at the same pose. A station-keeping controller is assumed to be active for the duration of the wait action. Figure 4.2 shows an example of a labeled motion goal set MGS r=30m;n dirs =8 (s) where s = (x = 0;y = 0; = 0;t = 0). r is adapted to have multiple resolutions depending on how congested the local region is. This allows for high resolution motion goals in areas which are dicult. 4.2.2 Trac Vessel Proles A trac vessel prole TVP is a collection of quantities describing the geometry and state of a trac vessel underway as predicted by a trajectory prediction system [14]. A typical trac vessel prole TVP i made up of the following quantities: Predicted trajectory p i (t) Instantaneous position covariance matrix i (t) Vessel zonesfZ k g 3 k=1 where eachZ k is an ellipse attached to and dened in the vessel body frame (see Figure 4.3). The vessel zones travel along with the trac vessels as vessels move. Vessel zone 1 is a region with 100% probability of failure . Vessel zone 2 and 3 mark regions where signicant wave disturbances are expected and hence, also specify the local wavefront direction in that region. It is assumed that waves that are outside the zones have negligible eect on the USV. The sizes of these vessel 69 Figure 4.2: Motion goal set with r = 30m containing n dir = 8 motion goals (in blue) directed toward cardinal and intercardinal directions. Eight optimal trajectories (in red) from (x = 0;y = 0; = 0;t = 0) to each of the motion goals is shown. A ninth motion goal labeledO is also shown which corresponds to a wait-action of duration t wait . The trajectories start with = 0 and end at the heading angle corresponding to the cardinal directions. 70 zones are application-dependant. In this work, the length of zone 1 is xed to three boat-lengths long, and it can be made longer for vessels traveling at a higher relative speed compared the USV. The length of zones 2 and 3 have been chosen based on the attenuation of the waveeld and where the waveeld poses no danger to the USV. A set of trac vessel proles is denoted by TVPS =fTVP k g ntv k=1 . Figure 4.3: (a) A Kelvin wave pattern for a trac vessel underway westward, operating at hull Froude number 0:5 and waterline boat length 10m. Crests are shown in yellow and troughs are shown in red. Two local wavefronts near two instances of the USV is labeled 1 and 2. (b) Simplication of wave pattern into zones with dierent local wavefronts. 71 4.2.3 Environmental Disturbances Environmental disturbances are encoded parametrically through a parameter vector dened for each state in the state-space as in Figure 4.4. For example, at point D, D = fsea state = calm; near vessel = falseg where sea states are dened as in the Douglas sea scale [52]. Point C lies in a static obstacle region and hence, C =finvalid = trueg is left undened and handled as a spe- cial case. Whenever a trac vessel passes through a point, for example, point A at a certain timet, the parameter vector is updated as A =fsea state = calm; near vessel = true; wave zone =Z 2 g. Similarly, at point B, B =fsea state = rough; near vessel = true; wave zone = Z 3 g. In both cases A and B, the local wave amplitude and wave direction can be determined given the wave zone and the pose of the trac vessel at t. The eld near vessel in indicates that the point is in some zone of some trac vessel. Figure 4.4: An illustration showing how parameter vector is dened over a trajectory at a few characteristic query points A, B, C and D. 72 4.2.4 Failure Model We seek to model failure probabilities using a robot reliability model [180]. The probability that a robot remains functional up to time t is given by p s (t) =e t=M ; (4.3) where M is the Mean-Time-Between-Failures (MTBF), which is estimated as M = Total observation duration Number of failures during observation ; (4.4) the ratio of number of failures to the total number of attempts within an observation time window. Developing a precise model of the failure (i.e., capsize) of the USV in varying sea-states and waves is dicult. It involves allowing the robot to fail in the process of counting the number of failures within an observation time window. This is not practical. A safer alternative is to measure how close to failure the robot got to when exposed to a certain local disturbance. In this work, a local disturbance is described parametrically as the tuple consisting of sea-state and proximity to a vessel. Two sea-states are considered calm (undisturbed waters) and rough (choppy waters due to wind conditions/multiple trac vessels). And, proximity to vessel is a Boolean ag that is determined by whether the USV happens to be in any of the wavezones around the trac vessel (see Figure 4.7). If the USV is in proximity to a vessel, then the wavezone is also part of the local disturbance parameter. USVs typically have certain design limits such as roll limits or pitch limits. We dene a failure event as an event when any of these design limits is exceeded. Thus, given a set of trajectory samplesfs k (t)g n k k=1 as in Section 4.4.1.1 of duration t 1 . For each trajectory sample, we count the number of failure events. With failure events labeled and counted, compute M as in Equation 4.4 for each trajectory sample. We aggregate the MTBF over trajectory samples for 73 each combination of local disturbance parameter . In this way a look-up table that is indexed by can be generated (see Table 4.2). Once MTBF estimates are computed for an USV operating in dierent local disturbance parameters, it can later be used in the exponential reliability model to compute failure probabilities using Equation 4.3. 4.3 Problem Formulation Given: (a) Initial state s i = (x i ;y i ; i ;t i ;p s = 1) (b) A region G around the goal state s f = (x g ;y g ; g ) (c) A set of trac vessel proles TVPS =fTVP i g nv k=1 sensed by the perception system. (d) A map of static obstacles dened by a set of polygons SO Compute: A sequence of statesfs i g p i=0 such that (a) s 0 = s i (b) s p 2G (c) the trajectory is COLREGs compliant (d) probability of success p s at s p is maximized (e) expected time of execution is minimized. 4.4 Approach The problem is posed as an implicit graph search problem and utilize the Theta* graph search algorithm [40] to solve it. The essential components of the Theta* graph search implementation are outlined in the following subsections. 74 4.4.1 Node Expansion and Cost Functions A node n is a tuple dened as n =fs;g;h;fg where g is the cost-to-come, h is the cost-to- go, f is the total cost (f = g +h), s is the state. An initial node n i is constructed as n i = fs i ; 0;h(s i ;s f ); 0 +h(s i ;s f )g. When a parent node n p is expanded, it is assigned a set of child nodesfn c;k g Nc k=1 that contain with states corresponding to the nearby motion goals. HereN c is the total number of child nodes expanded for n p . Each of the child nodes also contains the reference trajectory x(s p ; s c;k ) specied by the Dubins curve linking the parent state to the child state. During execution, there is motion uncertainty as the controller tracks the reference trajectory. This uncertainty is handled while evaluating the cost function. To illustrate the cost function, let us focus on the expansion of an arbitrary parent node n p yielding a child node n c ending in the state s c as in Figure 4.9. Let us assume the reference trajectory x(s p ; s c ) starts at t =t p and ends at t =t c . In order to evaluate the cost function, the reference trajectory must be discretized into tempo- ral segments. Performing regular discretization, we have a sequence of time samplesTS =ft q g nts q=1 wheret q 2 [t p ;t c ],t q+1 t q = t. Suppose for a t2TS, the reference trajectory species the USV state to be s with the 2D position being p and time being t. And, similarly, all of trac vessels are moved to t on their mean predicted trajectory. The uncertainty in the predicted trajectory will be considered Section 4.4.1.6. Having moved the trac vessels to their positions in the environment map, the local disturbance vector is obtained by looking up p on the environment map and identifying which of the characteristic scenarios apply (see Figure 4.4). Let us now see how is used in the trajectory evaluation process. 4.4.1.1 Identication of Tracking Error Model In order to characterize the motion of the USV, we rst generate an optimal control solution from a initial state s 0 = (x = 0;y = 0; = 0;t = 0) to s 1 = (x =x 1 ;y = 0; = 0;t =t 1 ) in the absence of disturbances and refer to it as the reference trajectory x(t). Next, we want to characterize how 75 much deviation from x(t) is expected in the presence of disturbances. Let the initial position be p 0 . When the low-level controller is commanded a motion goal to s 1 , it tries to track the reference in the presence of disturbances. Thus, the target position to be achieved at t = t 1 is p 1 = (x 1 ; 0; 0). It is assumed that the motion goal is conservative enough to be feasible even under heavy disturbances (i.e. s 1 can be reached by t 1 ). The controller is able to achieve the target position while absorbing the eect of disturbances (by temporarily maneuvering at faster or slower speeds than the reference speed). Many trials are conducted where the USV is repeatedly made to move from s 0 to s 1 under dierent conditions (i.e. sea states and wave directions) and USV trajectory s(t) is recorded. The USV is subject to superposition of waves of varying frequency and amplitude expected in a particular sea state and varying local waves generated by trac vessels. Thek th trial captures the evolution of the USV state under a particular realization of . Given a set of trajectory samplesfs k (t)g n k k=1 , the deviation from the optimal reference trajectory can be characterized using two quantities: maximal cross-track error standard deviation ( ct ), maximal along-track ( at ) error standard deviation. ct = max q2[0;t1] [fN q [ k (q) (q)]g n k k=1 ] at = max q2[0;t1] [fT q [ k (q) (q)]g n k k=1 ] Here k and denote the xy path associated with s k and x respectively. [] denotes the standard deviation operator. N q [] and T q [] denote the normal projection, tangential projection operator on the reference path point (q). The maximal cross-track error standard deviation is the maximum standard deviation of the normal distance between the USV and the path over the entire time interval t2 [0;t 1 ]. The maximal along-track error standard deviation is the maximum standard deviation of the tangential 76 distance between the USV and the reference position over the entire time interval t2 [0;t 1 ]. ct and at collectively capture the lateral position and speed uncertainty experienced by the USV as it travels toward a motion goal under in uence of disturbance parameter . Given a set of trajectory samplesfs k (t)g n k k=1 of durationt 1 . For each trajectory sample, count the number of failure events. A failure event is when a roll limit or pitch limit has been exceeded or violations of other design limits. With failure events labeled and counted, compute M as in Equation 4.4 for each trajectory sample. Aggregate the MTBF over trajectory samples for each combination of local disturbance parameter . This way a look-up table that is indexed by can be generated. 4.4.1.2 Obtaining Tracking Error and Failure Model Parameters Local environmental conditions such as local sea state and being in the vicinity of a moving vessel aect the trajectory tracking performance as the USV moves along a reference trajectory. These conditions are encoded into (Figure 4.4) and can be used to obtain tracking error and failure model parameters. If the USV were to be controlled in an open-loop conguration, then trajectory tracking error would build up over time. However, in the presence of a low-level stabilizing feedback controller, the tracking error can be bounded within an envelope around the reference trajectory [29]. The eects of the environment and controller can be extensively studied either using oine Monte Carlo simulations to obtain the various realizations of the transition trajectories [194] or empirically evaluated in real scenarios. An example is shown in Figure 4.5. In either case, the goal is to characterize the motion of the USV as it is commanded to move in a straight line from an initial state s 0 towards a motion goal ahead in the presence of parameterized disturbances in the environment. We seek to model the tracking error and the probability of successful navigation along the reference trajectory as a function of the local disturbance parameter vector . Tracking 77 Figure 4.5: Realization of multiple simulated trajectories as the USV is commanded to go towards ve dierent locations. Deviation from the nominal trajectories can be observed by looking at the spread in the realized trajectories. 78 error can be characterized by observing cross-track, along-track error (see Figure 4.8). Cross- track error at any point in time is the deviation in a direction normal to the reference trajectory (i.e. lateral deviation). Along-track error at any point in time is the dierence in track-projected current position and the target point on the trajectory (i.e. lag or overshoot). Probability of successful navigation can be characterized by measuring MTBF under dierent conditions. To this end, preliminary experiments were conducted with a 16 foot WAM-V USV [163] in Stranahan river, south-eastern Florida. The WAM-V platform was chosen due to its robustness under wave action. This way physical experiments could be safely performed while not risking the total loss of the robot. However, this work is applicable to platforms that are not specically designed to withstand waves that may cause hull damage. Data was collected using wave gauges and the navigation sensors (IMU, GPS, Compass) of the USV (Figure 4.6). The data was studied to understand the eect of waves generated by a trac vessel on the WAM-V. Figure 4.6: Waveeld experiment setup. By observing the collected data, we generated Table 4.2 for MTBF (M) and Table 4.1 for cross-track error standard deviation ( ct ), along-track error standard deviation ( at ). Due to 79 Table 4.1: Tracking error model. Disturbance parameter Trajectory tracking error std. Along-track Cross-track fsea state = calm, near vessel = falseg at ct fsea state = rough, near vessel = falseg 2 at 2 ct fsea state = any, near vessel = trueg 2 at 2 ct Table 4.2: Mean-time-between-failure (MTBF) model. Disturbance parameter MTBF fsea state = calm, near vessel = falseg M c fsea state = rough, near vessel = anyg M r fsea state = calm, near vessel = true, wave zone = Z 0 g M t finvalid = trueg, 0 the diculties of controlling sea-state, only two states calm and rough were considered. Calm condition refers to a situation when water level across the test area is roughly constant, and no wind eects are seen on the surface of the water. Rough condition refers to a situation when water levels are changing due to interacting waves generated by passing trac vessels and wind. In both conditions, the USV was made to traverse a 20m linear path 10 times and the cross-track ct and along-track at error standard deviation in the tracked trajectory was used to populate Table 4.1. Similarly, in both sea-state conditions, the pitch and roll angles recorded and the procedure in Section 4.2.4 is used to computeM r andM c in Table 4.2. M t is not generated using physical experiments instead we use a surrogate model dened by Equation 4.5. In a very rough sea-state, the data from IMU is likely to be noisy and other sensing modalities might be needed for producing better estimates. Since we performed experiments in a water channel, our experiments were shielded from disturbances and did not experience noisy IMU measurements and we were able to observe roughly sinusoidal variations as the waves passed by. Tables 4.2 and 4.1 are indexed by the local disturbance parameters in and they are brie y explained below. 80 4.4.1.3 Obtaining Tracking Error Parameters As for the tracking error model, when the query state lies over calm waters and is not in vicinity of any trac vessel, we use nominal values for tracking error parameters. Otherwise, tracking error parameters are doubled. 4.4.1.4 Obtaining MTBF Since trac-free calm waters are very safe, M c =1. Since rough waters are not safe regardless of trac, M r = 600. Assigning MTBF over calm waters near trac vessels involves determining angles between waves and USV heading. If the query state is located within any zone of the trac vessels, we determine the relative position p d from the center of the trac vessel and the incidence angle i between the USV heading and the wave direction of zone Z 0 (see Figure 4.7). p d is calculated in the body coordinates of the trac vessel and d = 1 0 p d . Adhering to this notion, we use the following assignment for MTBF: = 8 > > > < > > > : 1:0; 8 i 4 8 0:25; otherwise (4.5) M t = 8 > > > > > > > > < > > > > > > > > : 0; p d 2Z 1 M Z 0d=L z ; 9k2f2; 3g; p d 2Z k M c ; otherwise (4.6) HereM Z 0 is MTBF value associated with the zoneZ 0 and is dened asM Z1 = 0,M Z2 = 1000, M Z3 = 3000. And, L z is the maximum trailing length of the trac vessel zones. 81 Figure 4.7: Determining wave incidence angle and relative position from a trac vessel. 82 Figure 4.8: (a) Trajectory tracking error envelope dened by ct and at for a trajectory starting at p q = (0; 0; 0), t q = 0 and reaching east under a particular sea state , (b) The 3 error ellipse at an intermediate time t along the nominal trajectory. 83 4.4.1.5 Cost Assessment Having determined tracking error and MTBF parameters using , we proceed to evaluate the cost function along the reference trajectory x(s p ; s c ). Tracking error parameters at and ct encode the deviation along the path and the deviation in the normal direction from the position p, respectively (see Figure 4.8). Thus, the actual position p of the USV at t is given by p = p +cN[ p] +aT [ p] (4.7) where cN (0; ct ) and aN (0; at ). N[ p] and T [ p] are the normal and tangent unit-vectors at p. An ellipse E t centered at p and aligned to the tangent direction T [ p] and normal direction N[ p] with axis lengths 6 at and 6 ct describes a 99:7% condence region within which the USV can be expected to be at t = t (99:7% comes from the 68-95-99.7 empirical rule associated with the normal distribution). Figure 4.9: Collision cost assessment for a trajectory starting at a parent node and ending at a child node. 84 For each ellipse E tq , the probability of failure p f;q is computed as follows. p f;q = 1 exp( t M q ) (4.8) where M q is computed by averaging the MTBF values over a uniform set of points in E tq . More specically, these MTBF values are computed by looking up each point in and then using Table 4.2 to nd the MTBF. Assuming independence of the failure events, the overall probability of success for executing x(s p ; s c ) is given by p s = nts q=1 (1p f;q ) (4.9) Thus, the probability of success assigned to the child state isp s (n c ) :=p s (n p )p s wherep s (n p ) and p s (n c ) are the probabilities of success of the parent and child states respectively. The child node n c is then assigned a cost as follows: g(n c ) =g(n p ) + (1! f )c t ! f F f log(p s ) +c c F c (4.10) where time cost is c t =t c t p , F f is a large constant with units of time penalizing risky motion causing failure,F c is a large constant with units of time penalizing moves that violate COLREGs. c c 2f0; 1g encodes the violation of COLREGs computed using [185, 168]. ! f 2 [0; 1] is a user dened weight parameter. 4.4.1.6 Accounting for Perception Uncertainty A Monte Carlo sampling scheme [104] is used to account for the perception uncertainty in the measurement of the trac vessel prolesTVP . The Monte Carlo sampling approach is favored as it is a general method and amenable to parallelization. However, this approach requires extensive sampling when high condence estimates of collision probabilities are required especially when 85 chance of a collision is rare. This problem can be alleviated if adaptive sampling techniques are used [164]. In the previous section, was computed using only the mean position of trac vessels at t. However, to account for uncertainty, the covariance information needs to be used. Let us pick one trac vessel prole TVP i 2TVPS to illustrate the process. A set of possible instantaneous positions PS i =fp k g nps k=1 is sampled from the distributionN ( p i ( t); i ( t)). The trac vessel are moved to each possible position p k 2PS i , then k is looked-up as in Figure 4.4. The corresponding M k are computed using Table 4.2 and Figure 4.7. Finally, the average M is computed over all the computed M k . This average M is used in place of the original MTBF in Equation 4.8. 4.4.2 Speed-up Techniques Some techniques are described to improve the search performance. 4.4.2.1 Static Obstacle Heuristics (SOH) Since the cost function includes the time duration of going from one position to another (i.e. c t ) and static obstacles remain xed for at least a few planning cycles, it is benecial to compute a time-to-goal map (also known as arrival time map). This map is termed SOH and it stores the lower bound on the time cost of going from any position in the workspace to the goal position g. This map is computed using Fast Marching Method (FMM) to approximately solve an Eikonal equation [122, 15]. The level sets of the solution of the Eikonal equation correspond to the frontiers of a wave emanating from the goal position. The speed of the wave is set to the maximum speed of the USV. As the frontier (i.e. the open set in Dijkstra's algorithm) bends around corners, the time-to-goal estimate is more informative than the na ve Euclidean distance based estimate. The raw time-to-goal values given FMM are naturally not admissible unless they scaled down by a factor that depends on the discretization used in computing the map [118]. After appropriate scaling, each cell in this map contains an underestimate of the time to reach the goal from that 86 cell location. This value is used to compute an estimate of c t to be accumulated till the goal is reached. Given a query position s, a heuristic value h s (s;g) is looked up from the SOH. 4.4.2.2 Space-Time Exploration Heuristics Developing admissible heuristics in the presence of dynamic and static obstacles is challenging and can often be just as hard as the original problem. However, by relaxing the original problem, eective heuristics can be generated. We follow [33] and drop to 3D conguration space by excluding (i.e. we use only xyt space) to generate a heuristic trajectory using a point robot kinematic model (relaxed version of the USV kinematic model) from initial state s i to the goal region G. Maximum speed v max of the point robot is the same as that of the USV. Zones 2 and 3 of trac vessels are ignored such that only zone 1 is considered as a dynamic obstacle. In other words, for the purpose of heuristics generation, the point robot is allowed to move freely across zone 2 and 3 without any penalty. Suppose the point robot is at the initial state p i (x i ;y i ) at the query time t i (see Figure 4.10). One can observe that set of all points this point robot can reach in t is given by a cone C i with apex at (x i ;y i ;t i ), perpendicular height t and radiusv max t. t is chosen such that it is as large as possible without the cone touching any conguration space obstacle. Suppose d c (t) is the clearance distance between p i and any workspace obstacle at time t. Then, t i = min t2[ti;ti+t;max] d c (t)=v max : This cone represents a collision-free volume in the 3D conguration space. In order to explore the space, the idea is to grow new cones at the top face of cone C i and continue this process until the goal region is reached by a certain cone. In this way we generate a sequence of overlapping cones forming a collision-free volume in conguration space which connects the initial state and the goal region. 87 Figure 4.10: Generation of heuristic space-time path. Algorithm 2 Heuristic Space-Time Path 1: procedure ComputeSTHeuristicPath(s i ;G) 2: S closed ; 3: S open fn start g 4: while S open 6=; do 5: n k PopTop(S open ) 6: if ExistsOverlap(n k ;G) then . Check if cone of n k has non-empty intersection with G 7: return RetracePath() 8: else if n k 62S closed then 9: D ExpandNode(n k ;r min ;r max ) . Store child nodes in D 10: S open D[S open 11: S closed n k [S closed 12: else 13: continue 14: return failure 88 Nodes are described using tuples of the formn k = (x k ;y k ;t k ;r k ;C k ;g k ;h k ;f k ). C k is the cone situated at (x k ;y k ;t k ) with radiusr k =v max =t k . The cost-to-comeg k =t k +c c F c is the sum of time elapsed and COLREGs violation cost. And, the cost-to-go is the value from the SOH lookup table, h k =h s (p k ;g). The total cost is f k =g k +h k . The space exploration process proceeds in an implicit A* graph search manner as in Algorithm 2 by rst pushing the start node n i in to the open set S open . The open set S open stores the nodes to be explored in a sorted order with respect to the costf in ascending order and it is implemented using a priority queue. For as long as S open is non-empty, the node with the lowest f value is chosen for further expansion. In this context, an expansion of a node involves the creation of new child nodes at the top face of C k . The perimeter of the top face is equally sampled to form candidate locations for new child nodes. These locations indicate constant velocity motion. Furthermore, an additional candidate location is sampled at the center of the top face. This candidate location captures the notion of a wait state (Figure 4.10). For each candidate location (x c ;y c ;t c ), a new cone C c is constructed with apex at the candidate location such that it has the largest height and does not collide with conguration space obstacles. If such a cone can be constructed, a new node n c = (x c ;y c ;t c ;r c ;C c ;g c ;h c ;f c ) is generated where g c = t c +c c F c , h c = h s ( (x c ;y c );g), f c = g c +h c . Incorporation of COLREGs constraints through the term c c F c improves the eectiveness of the heuristics. Once Algorithm 2 ends, a sequence of cones CS =fC q g starting at the initial state and ending in the goal region is obtained. Let be the sequence of apexes of the cones in CS and it represents a coarse-grained trajectory in the (x;y;t) space that avoids static and dynamic obstacles. The computation of heuristic value is rst described. We rst project to (x;y) space and denote it as p . For a given a query state s, we use the position component p of s to nd the closest point p c on p . The heuristic is computed as in [33]. It is the sum of time dierence between the end-point of and the closest point on and time required to reach p c (i.e., jppcj vmax +t end t c where (t c ) = p c and t end is the timestamp of the end-point of ). 89 4.4.2.3 Adaptive Motion Goal Set (AMGS) In order to accelerate the Theta* graph search, the motion goal set resolution parameters r and n dirs are modied depending on the distance of the parent node position from moving vessels and static obstacles. A precomputed Euclidean distance transform is used to query the shortest distance from any of the static obstacles. Shortest distance from any of the trac vessels is computed online. (r;n dirs ) = 8 > > > > > > > > > > > > > > > > > > < > > > > > > > > > > > > > > > > > > : (3L u ; 16); if within 10L u of a moving vessel or within 5L u of a static obstacle (5L u ; 8); if within 25L u of a moving vessel or within 10L u of static obstacle (8L u ; 8); otherwise This way, in open waters, longer strides are made and the progress towards goal is faster. 4.4.3 Parameter Tuning Table 4.3 summarizes the list of parameters used in the planner. These parameters are chosen by the user. 4.4.3.1 User Preference Parameters User preference parameters balance safety and performance. The failure penaltyF f can be chosen as the duration of time required to dispatch a functional USV in lieu of the failed USV. This choice roughly captures the real time-impact a failed USV will cause. The COLREGs violation penalty F c should be much smaller thanF f to favor COLREGs non-compliant trajectories over trajectories that lead to collision. ! f controls the extent to which large time delays are accepted in order to 90 Table 4.3: Planner parameters. Parameters Symbol Description Typical Range r Motion goal spacing 5m-30m n dirs Number of directions f8; 16g t wait Wait state duration 10s M Failure probability model parameter 500 6000s L u Length of USV 5m L v Length of trac vessel 10m L z Maximum extent of the waveeld extending behind vessels 100m ct Trajectory tracking cross-track 1m at Trajectory tracking along-track 1m F f Time penalty for failure 300 F c Time penalty for COLREGs violation 300 ! f Weight parameter favoring collision avoidance over travel time 0:5 n s Sample size for handling perception uncertainty 20 minimize failure. The closer to ! f is to 1, the more conservative the trajectories are in terms of avoiding failure. 4.4.3.2 Planning Speed Parameters Decreasingr and increasingn dirs creates a denser motion goal lattice and thus, increases planning times. Using longer wait state durations t wait may introduce undesirable looping behavior as the cost of gyrating about a point may be cheaper than waiting at the same point for a duration of t wait . Introducing an extra cost term in Equation 4.10 capturing energy cost will remove this behavior (e.g. c p = PathLength(n p ;n c ) ). 4.5 Results and Discussion The wave-aware trajectory planner was implemented in MATLAB. A set of simulation experiments was conducted. The setup and results of these experiments are described below. 91 4.5.1 Simulation Experiments Four dierent maps were generated to test the trajectory planner. Trac vessels are shown in their initial conguration in Figures 4.11, 4.13, 4.12 and 4.14. Magenta arrows indicate the intended direction of motion. The USV and trac vessels were simulated using three degrees-of-freedom dynamic model as in [93]. In the USV pose observations used for planning, noise was added following the distribution in Equation 4.7. The response of trac vessels to the motion of the USV not simulated. Only the USV is allowed to react to the trac vessels. Figure 4.11: Islands map Some important parameters used are: Map size = 600m 600m, USV length = L u = 5m, trac vessel length = L v = 10m, trac vessel speed = 1 2ms 1 . Other parameters are as in Table 4.3. The turning radius and maximum speed of the USV is 5m and 2ms 1 respectively. 92 Figure 4.12: Busy map 93 Figure 4.13: Port map 94 Figure 4.14: Stress map 95 The trac vessels are placed to trigger interesting dynamic obstacle avoidance as well as wave eld crossing behavior. Our approach (WA-TP: Wave-Aware Trajectory Planner) is compared with three dierent congurations of the planning stack. These congurations are summarized in Table 4.4. Global Path Planner (G-PP) ignores the dynamic obstacles and generates only kinematically feasible paths to the goal. G-PP can be realized by removing trac vessels from the implementation of WA-TP. Conservative Trajectory Planner (C-TP) considers the trac vessels along with their waveelds as a non-permeable dynamic obstacles to yield conservative trajectories. C-TP can be realized through setting M = 0 within the waveeld in the implementation of WA-TP. The conguration G-PP + VO follows the generated path and avoids trac vessels using Velocity Obstacles (VO). G-PP + C-VO is a conservative version of G-PP + VO which treats trac vessels along with their waveelds as obstacles. The conguration C-TP + VO follows the conservative trajectory generated by C-TP and avoids trac vessels using VO. Table 4.5 the execution time, collision count, success probability and distance to an obstacle at Closest Point of Approach (CPA). Success probability is calculated after each run using the executed trajectory as in Section 4.4.1.5 (cost assessment). Table 4.4: Planning stack congurations Conguration Feature Enabled Trac vessel Waveeld VO G-PP + VO 7 7 3 G-PP + C-VO 7 7 3 C-TP + VO 3 3(M = 0) 3 WA-TP + VO 3 3 3 Table 4.5 shows aggregated performance metrics for 20 runs of three dierent scenarios for dierent congurations of the planning stack. Relying on path planning and reactive collision avoidance with large dynamic obstacles as in G-PP + VO and G-PP + C-VO results in collisions, high failure probabilities and increased execution time. On the other hand, conservative trajectory 96 planning as in C-TP helps prevent collisions and failures at the cost of increased execution time. The results show that the proposed approach safely reduces execution time by roughly 13%, 24% and 3% for scenarios S1, S2 and S3 respectively when compared to the next safe alternative. Table 4.5: Comparison of performance. Metric Scenario G-PP + VO G-PP + C-VO C-TP + VO WA-TP + VO Execution time [s] S1 442.6 456.7 465.1 405.2 S2 436.7 422.4 484.1 367.5 S3 384.3 415.5 367.2 355.1 Collision count S1 2.0 2.0 0.0 0.0 S2 1.2 1.9 0.0 0.0 S3 0.0 0.2 0.0 0.0 Success probability S1 0.0 0.0 1.0 0.9975 S2 0.0 0.0 0.9964 0.9991 S3 0.9739 0.0728 0.9716 0.9894 CPA distance [m] S1 1.6 1.8 17.8 46.3 S2 7.5 5.6 20.5 17.7 S3 22.9 12.9 20.4 15.3 The eects of changing the MTBF parameter of the failure model is shown in Table 4.2. Figure 4.15 shows the initial conguration of USV and trac vessels in the islands map. Trac vessels were initialized to speeds between 1 2ms 1 . The USV was setup to use WA-TP + VO and commanded to cross the busy channel. In this experiment, the parameter controlling the danger level of the wave region (i.e., MTBF value M t ) is changed. Two dierent values of M t are used simulate two dierent levels of danger. In Figure 4.16 and 4.17, the waveelds are made to be dangerous by assigning a lower MTBF parameter (M t = 1000). This results in the USV taking a longer trajectory to reach the goal point while altogether avoiding any contact with the waveeld. In Figure 4.18 and 4.19, the waveelds are made to be permeable by assigning a high MTBF parameter (M t = 3000), making the traversal of the waveeld less dangerous. Thus, in this case, the USV attempts crossing the waveeld at an appropriate angle as it makes it way to the goal point. Figure 4.16 and Figure 4.17 show snapshots of the executed trajectory when M t = 1000. Similarly, Figure 4.18 and Figure 4.19 show the case when M t = 3000. 97 Figure 4.15: Scenario to illustrate the eects of changing the MTBF parameter M t . USV is positioned at the start point (250; 100) and commanded to reach the blue square at (235; 80) 98 Figure 4.16: CaseM t = 1000. Att = 75:0, the USV is seen avoiding crossing waveeld and taking a longer route to the goal point. 99 Figure 4.17: Case M t = 1000. USV reaches goal point at t = 166:1 100 Figure 4.18: CaseM t = 3000. Att = 62:0, the USV is seen a crossing waveeld at the appropriate angle. 101 Figure 4.19: Case M t = 3000. USV reaches goal point at t = 139:1 102 To study the eects of performance enhancing techniques, the planner was invoked in the port map enabling and disabling subsets of these techniques. The baseline conguration of the planner uses Euclidean distance based time heuristic (i.e. h(s;g) = time taken using straight line path from s to g using maximum speed), the nest motion goal set and a sample size of n s = 30. The results are shown in Table 4.6. 103 Table 4.6: Impact of speed-up techniques on performance. Speed-up technique Planning Time [s] Expansion Count Solution Cost S1 S2 S3 S1 S2 S3 S1 S2 S3 Baseline 144.0 148.9 38.0 597 565 190 373.8 308.5 298.0 SOH 19.1 63.5 12.2 64 277 42 385.9 314.3 315.7 SOH + DOH 16.8 34.1 8.7 52 125 28 389.5 338.1 322.4 SOH + DOH + AMGS 7.2 15.2 6.2 20 32 22 406.3 358.2 341.1 SOH + DOH + AMGS + AS 4.5 4.1 2.5 20 28 12 397.7 351.1 341.7 Improvement over baseline % 96.8 97.2 93.4 96.6 95.0 93.7 -6.4 -13.8 -14.7 104 Between the two heuristics (SOH and DOH), the eect of static obstacle heuristics (SOH) has a huge impact on the performance, reducing the expansion counts by at least 51%. While the dynamic obstacle heuristics help reduce expansion counts on top of SOH, the eect is not so pronounced when only a few dynamic obstacles are encountered for a particular trajectory connecting the start/goal pair. There is a slight increase in the solution cost due to the resolution of the heuristic lookup table. Introducing adaptive motion goals reduces the expansion count by at least 21%, but increases the solution cost at worst by only 6%. Adaptive sample sizes for the Monte Carlo failure probability computation helped reduce the planning time by only sampling densely where interaction with static obstacles and dynamic obstacles were possible. It was observed that approximately 60% 70% of the planning time was spent in sampling dynamic obstacles and performing collision checking. 4.5.1.1 Sensitivity to Number of Dynamic Obstacles The performance of the planner was studied when the number of dynamic obstacles was varied between 1 15. Dynamic obstacles were randomly spawned along the sampling lines the map going towards various directions at speeds between 1:03:0ms 1 as shown in Figure 4.20. Thirty trials were conducted for random start points in the start region and random end points in the goal region for each possible number of dynamic obstacles. Figure 4.21 shows the variation in the planning time as the number of dynamic obstacles encountered increases. It is observed that the planning times are under 6s for dynamic obstacle counts of less than 7. Beyond 7 dynamic obstacles, the planning time grows linearly with the number of dynamic obstacles. 4.5.1.2 Sensitivity to Dynamic Obstacle Uncertainty A study on how increased dynamic obstacle perception uncertainty aects the executed trajectory was conducted. Specically, two levels of perception uncertainty were used: low and high in a narrow channel scenario shown in Figure 4.22. 105 Figure 4.20: The scenario used for evaluating the eect of increasing the number of dynamic obstacles. Random start states and random goal states are sampled from the start region and the goal region respectively. Dynamic obstacles are randomly spawned on the sampling line and made to move at the indicated directions at various speeds. 106 Figure 4.21: Planning time as the number of dynamic obstacles was varied between 1 15. 107 Figure 4.22: Scenario to illustrate the eects of low and high dynamic obstacle perception uncer- tainty. A dynamic obstacle is entering a narrow channel. Start and goal states are separated by the narrow channel. The USV must access the risks of entering the channel while another vessel is underway. 108 Figure 4.23: USV enters the channel as the estimates of dynamic obstacle are sucient for passing the channel. 109 Figure 4.24: USV has traveled halfway into the channel after encountering the dynamic obstacle in a COLREGs compliant way. 110 Figure 4.25: USV waits at the mouth of the channel for the dynamic obstacle to pass as it is too dangerous to attempt passing the channel when the estimate of the dynamic obstacle is bad. 111 Figure 4.26: USV starts to move only after the dynamic obstacle has cleared the channel. 112 Figures 4.25 and 4.23 showcase the dierent behaviors of the USV when faced with diering levels of uncertainty in the perception of the dynamic obstacle. When the perception uncertainty is high, we observe the USV to act conservatively and wait at the mouth of the narrow channel for the dynamic obstacle to clear the channel. This behavior is due to the high perception uncertainty associated with the sensing of the trac vessel. Entering the channel in this situation is a risky move. If the channel is entered without an accurate state estimate of the trac vessel, a collision might be inevitable when the USV is unable to maneuver around the trac vessel. This behavior is avoided. When the perception uncertainty is low, we observe the USV to enter the channel and pass the dynamic obstacle on the portside in compliance with COLREGs. 4.5.2 Physical Experiments 4.5.2.1 Experimental Setup A set of experiments was performed using a 16 feet WAM-V USV in North Lake, Hollywood, Florida to verify the eectiveness of the proposed approach. Instead of physical static and dynamic obstacles, virtual obstacles were introduced to the navigation system in lieu of the perception subsystem. 4.5.2.2 Experimental Results Table 4.7: Comparison of execution time. Scenario Planner Type Execution time(s) Reduction (%) P 1 C-TP + VO 148.9 - WA-TP + VO 136.5 8.3 P 2 C-TP + VO 163.6 - WA-TP + VO 122.5 25.1 Dynamic obstacles were moving at constant speeds between 1:0 2:0ms 1 . During experi- ments, we noted south-easterly winds ranging between 10 12KTS. From Table 4.7, we observe that execution times are reduced by as much as 25:1% under the proposed approach. 113 Figure 4.27: Setup. 4.5.3 Comparison of MDP and Graph Search Although, motion planning under motion and perception uncertainty is formally described as a POMDP. In practical trajectory planning scenarios for USVs, the localization accuracy is good enough and does not warrant a POMDP formulation. In light of this, we ignore perception uncertainty pertaining to localization of the agent and construct an MDP formulation of trajectory planning in the presence of dynamic obstacles. We prove that, under some assumptions, MDP problems arising in the context of practical trajectory planning can be solved using any DP (Dynamic Programming) techniques instead of SDP (Stochastic Dynamic Programming) without compromising the optimality of solutions. LetX R n be a discrete state space consisting of the state s = (x;y;t). This state space is not densely sampled over the spatial and time dimensions. Rather,X consists of clusters of states that are separated by a xed distance (see Figure 4.29). Let C =fs 0 ; ; s k g denote a cluster of states. If a state s belongs to C and we want to draw attention to it, we denote it using the notation C s. Each cluster is centered on a nominal state s c . LetC denote the set of all clusters inX . LetU =fL;R;F;B;FL;FR;BL;BR;Sg be the space of actions. The 114 Figure 4.28: One set of trajectories from scenario P 1 comparing wave-aware trajectory with a conservative trajectory. 115 Figure 4.29: States, clusters and actions are illustrated. For a particular state, the possible next states when the action u =FL is applied is shown using dashed blue lines. 116 actions inU are shorthand for cardinal motions over 2D space (i.e. L=left, R=right, F=forward, B=backward, FL=forward-left, FR=forward-right, BL=backward-left, BR=backward-right and S=stay). The stay action waits at the same spatial location for a xed period of time. Suppose feedback controllers realize these motions over distances of upto 20m (e.g. 20m forward) while minimizing drift due to motion uncertainty. It is assumed that the typical motion uncertainty faced is relatively small (i.e. under 5m compared to the large 20m motion). And, therefore these controllers are robust enough to suppress the motion uncertainty propagation and guarantee that the USV moves as planned from any state in one cluster to within some state in another cluster. As a result, the inter-cluster transitions are deterministic and given by the invertible mapping T c :UC!C; (u;C src )7!C dst . Thus, for a given action u2U from any state s in the source cluster C src , the destination cluster C dst is determined without any stochasticity. For example, given any state s in a cluster C src centered at (x = 0;y = 0;t = 0), the action u = R results in the controller moving the USV at an average speed of 5ms 1 while targeting the state s c in the cluster C dst centered at (x = 20m;y = 0m;t = 4s). Let F c :CC!R + be a cluster transition cost function that assigns a cost for the move from one cluster to another. At this point, we recognize the tuple W c =hC;T c ;F c i as a specication of a dynamic programming (DP) problem minimizing cumulative transition costs over nominal states of clusters between the start cluster C s and the goal cluster set G =fC g;k g K k=0 . Each C g;k 2 G has the same spatial location but varies in time value. This captures the open-ended time dimension upto time step K. While the feedback controllers are able to drive the state to the general vicinity of a destination cluster C dst , the exact state C dst s 0 reached by the controller is stochastic. To incorporate this motion uncertainty, let us formulate an MDP. LetEXX be the set of feasible state transitions satisfying the condition (s i ; s f )2 E () 9C src ;C dst 2C;9u2U s.t. C dst = T c (u;C src ); s i 2 C src ; s f 2C dst . The transition probabilities between any two states inX are given by the mapping P : E! [0; 1] (the action spaceU is redundant and hence, omitted becauseT c is a invertible map. i.e. the action is completely determined by the source and destination cluster). Let F : E!R + 117 be a state transition cost function that assigns a cost for the move from one state to another. Let R: E!R be a state transition reward function that assigns a cost for the move from one state to another such that: R(s 1 ; s 2 ) = 8 > > > < > > > : 0 if L[s 2 ];L[s 1 ]2G F (s 1 ; s 2 ) otherwise. Let :X!U be a policy that gives an action u2U for every state inX . Thus, the MDP dened by the tupleM =hX;U;P;Ri. The goal of the MDP formulation is to nd a optimal policy such that it maximizes the cumulative rewards accumulated over state transitions leading to one of the goal states from any given state. We would like to know if SDP is the only way to solve this MDP and under what conditions a DP approach can be used. To illustrate this question, consider an example of a DP as shown in Figure 4.30. Suppose, Cs s c is the start state and Cg s c is one possible goal state. Ignoring motion uncertainty, let us suppose we perform DP (e.g. A*) over the nominal states of clusters, and suppose it yields an optimal path ( Cs s c ; Ca s c ; C b s c ; Cg s c ) through nominal states of the clusters (shown in dashed green line). We then proceed with the transition from the cluster C s to C a . However, due to motion uncertainty, we are not able to reach Ca s c precisely. But, suppose we reached a nearby state Ca s n instead. Would the transition to C b still be the next optimal action ? Would SDP produce a policy that prescribes a dierent action at Ca s n instead of the action towards C b . Theorem 1. Suppose the following statements are true: 1. The controller robustness condition is satised. 2. An optimal solution to the problem W c is a sequence of N clusters q c =fC k g N1 k=0 . 3. We start in cluster C s and end in some C g 2G. 118 Figure 4.30: Example of a cluster graph showing a shortest trajectory from C s to C g computed using A . Part of the trajectory is complete but the trajectory has not been faithfully executed resulting in a slight deviation at C a . Would the optimal MDP policy change in light of this deviation ? Then, the cluster transition costs of reaching some C g from C s using the optimal sequence q c is exactly the cluster transition costs incurred through following the optimal policy generated for the MDP M. Proof. Let L be an operator that extracts the cluster a state belongs to (i.e. L[ C s] = C). Let the condition F (s 1 ; s 2 ) = F c (L[s 1 ];L[s 2 ]) where (s 1 ; s 2 )2 E be called the controller robustness condition (C.R.C.). The structure of the MDPM allows for a reduction to a smaller MDPM c dened over clusters. We proceed by following the work of [116] to reduce the MDP. LetE c CC be the set of feasible cluster transitions. E c satises the condition (s 1 ; s 2 )2E () (L[s 1 ];L[s 2 ])2E c : 119 Let w :X ! [0; 1] be any weighting function where s2C w(s) = 1;8C2C. This required for well-dened transition probability and reward functions. We dene P c : E c ! [0; 1] as: P c (C src ;C dst ) = s12Csrc s22C dst w(s 1 )P (s 1 ; s 2 ) = s22C dst P (s 2 ) s12Csrc w(s 1 ) = s22C dst P (s 2 ) = 1 We dene a reward function R c : E!R + [ 0: R c (C 1 ;C 2 ) = s12C1;s22C2 w(s 1 )P (s 1 ; s 2 )R(s 1 ; s 2 ) = s22C2 (P (s 2 ) s12C1 w(s 1 )R(s 1 ; s 2 )) Applying the controller robustness condition, R c (C 1 ;C 2 ) = 8 > > > > > > > > < > > > > > > > > : 0 if C 1 ;C 2 2G; s22C2 (P (s 2 ) (F c (C 1 ;C 2 ) s12C1 w(s 1 )) otherwise. = 8 > > > < > > > : 0 if C 1 ;C 2 2G; F c (C 1 ;C 2 ) otherwise. Thus, a state-aggregated version of the MDP M can now be dened as M c =hC;U;P c ;R c i Note that M c is now a dynamic programming problem because: 120 1. The set E c expresses an adjacency list between the clusters, 2. The actions have deterministic outcomes (i.e. P c (C 1 ;C 2 ) = 1;8(C 1 ;C 2 )2E c ), Taking the negative of the reward function as the cost function, we can create another instance of DP as W =hC;E c ;R c i. Furthermore, F c =R c (except for any pair in the goal set G). Graph search will converge as soon as any cluster in the goal set is reached excluding the possibility of examining any cost between pairs of clusters in the goal set. Thus, the graph search problem W c is exactly the same as W and hence, their solutions have identical optimal costs. For a deeper understanding, the MDP problem M and the graph search problem W c was solved using the Python programming language on a GNU/Linux machine with Intel Xeon E3- 1245 CPU running at 3.5GHz and 32 GB RAM. The MDP and A algorithms were run for the test scenario shown in Figure 4.31. In this scenario, there are 8 dynamic obstacles (in green and some are hidden) and a few static obstacles (black) over a 500m 500m area. The clusters are spaced out uniformly every 20m in bothx andy directions yielding 2525 = 625 clusters. Each cluster has 9 possible states around the central nominal state (inclusive). These 9 states are arranged on a grid with 3m spacing. This attempts to capture the various possible end states at any cluster after a trajectory tracking controller reaches that cluster. The agent was assumed to move at a constant speed of 5ms 1 . Diagonal moves between clusters such as FL;FR;BL;BR take p 2 times longer (i.e. 5:65s) than non-diagonal moves such as F;B;L;R that only take 4s. Thus, time dimension needs to include all possible combinations of diagonal moves and non-diagonal moves resulting in 530 possible times values betweent = 0s and t = 150s. Thus,jXj = 25 25 9 530 = 2981250 states. The cost function for transitioning from one cluster to another cluster is the time dierence between the clusters (i.e. destination cluster time - source cluster time = t2f4 p 2; 4g). If a transition results in collision, then the 121 Figure 4.31: A test scenario to compare MDP and A solutions. The goal state set is shown as a red dot. 122 Figure 4.32: MDP policy computed for t = 0:0. 123 reward is set to1000s. A transition is considered to be in collision if the any of the sampled points along the trajectory envelope collides with obstacles (see Figure 4.8). It took roughly 2 days to compute an optimal policy for the MDP using SDP. However, solving the DP problem (under C.R.C. assumption) using A produced solutions in around 103s. Figure 4.33 and Figure 4.34 show the trajectory obtained while the computed plans where executed in simulation for SDP and A respectively. Both plans had the nal time of 149:8 and comparable collision rates. Though the trajectories varied in shape and sometimes the homotopy class, the solution cost was the same. The variation in trajectories are possibly due the extraction order in the priority queue used in the implementation of theA algorithm. The SDP solution had a collision rate of 3=300 and the DP solution had a collision rate of 4=500 when tighter trajectory envelopes and smaller buer distances where used. Although both SDP and DP formulations yield almost identical solutions, the SDP approach is computationally intensive and intractable for online trajectory planning. It is dicult to incor- porate heuristic techniques to speed it up. Moreover, there is no principled way of incorporating the uncertainty in the perception of dynamic obstacles into the SDP approach. Thus, A search is more favorable than SDP. 4.6 Summary Many practical operations like search and rescue operations are time-sensitive missions. Automa- tion of such operations using Unmanned Surface Vehicles (USVs) requires automated trajectory generation. Automated trajectory generation is challenging as it requires collision avoidance with static obstacles, moving trac vessels and the waveelds they leave behind. By ignoring the dan- gers of traversing these waveelds, USVs take undue risk and compromise their survivability. On the other hand, acting conservatively by circumnavigating large waveelds wastes mission time. In both cases, the timely completion of the mission is in jeopardy. USVs are susceptible to the 124 Figure 4.33: An execution run of the plan computed for the test scenario using SDP. 125 Figure 4.34: An execution run of the plan computed for the test scenario using A . 126 eects of wave disturbances resulting in motion uncertainty and at times total failure. The com- mon practice of decoupling trajectory planning into long-term global path planning in conjunction with short-term reactive planning leads to delays, reducing mission performance. Furthermore, regions with signicant wave disturbance is typically avoided completely to ensure safety. In this work, an attempt to strike a balance is made. And, a wave-aware deliberative planner is presented. It avoids collisions with trac vessels and also opportunistically traverses waveelds generated by these vessels while minimizing the risk of failure. The planner performs a search over a 4D pose-time lattice to generate a collision-free, minimum-risk sequence of motion goals. It addresses motion uncertainty and failure risk parametrically via a pre-computed look-up table. Furthermore, speed-up techniques such as heuristics, adaptive action-set and adaptive sampling to speed up the search are also presented. Results from simulation experiments are presented. They show that the wave-aware planner: 1. is able to plan trajectories over long distances while avoiding static obstacles, trac vessels and the waves generated by them 2. produces plans that execute faster when compared to the typical reactive planning scheme 3. can compute plans in under a minute Results from physical experiments showing the feasibility of the proposed approach is presented as well. 127 Chapter 5 Trajectory Planning in High-Dimensional Conguration Spaces 5.1 Introduction Though the motion planning problem for robots operating in a 3D workspace is PSPACE- hard [157], many automated trajectory planning methods are available when the completeness requirement is relaxed. For example, popular sampling-based planners such as Rapidly-exploring Random Trees (RRTs) [108] and Probabilistic Road-Maps (PRMs) [87] are probabilistically com- plete. Deterministic graph-based planners [35, 61] are resolution complete. On the one hand, the inherent randomness of sampling-based methods results in unpredictable solution quality; sometimes in cluttered environments, these methods may fail unless the narrow passage problem is handled using specialized techniques [141, 170]. Sampling-based motion planning algorithms are well studied methods and they have been extremely successful in solving a wide variety of motion planning problems. The inherent stochasticity of these sampling-based algorithms results in unpredictable solution quality between multiple planning queries. Although local optimization methods such as STOMP [84] and Shortcuts [70] may alleviate the problem, these methods are generally limited to the homotopy class of the seed solution. Even after local optimization, we are 128 likely to observe locally-optimal trajectories belonging to a variety of homotopy classes. These homotopy classes may each harbor locally-optimal trajectories of diverse quality. In many applications, we prefer trajectories that lie not only in a predictable homotopy class but also those that are aesthetically pleasing. For instance, consider a manipulator operating in a factory oor as in Figure 5.1(a). We desire the end-eector trajectories between any two locations in (A;B;C;D;E) to be short and roughly linear motions; indeed, workspace cues are very informative partly due to the absence of clutter. While informative, workspace cues can also be unreliable at times. Consider Figure 5.1(b), where the L-shaped sanding tool is to be moved from its starting pose to the goal pose. Clearly, there is no room to slide between the gap circled red. Relying purely on workspace cues (i.e., attempting sliding) is likely to produce no solution. The only way is to move through the slot circled blue to exit the cavity. In light of this, we motivate our context-dependent bi-directional tree-search framework that can encode preferences on how much and when to exploit workspace cues. At the core of sampling-based bi-directional tree-search methods are the following key modules: focusing module, sampling module, collision-detection module, inter-tree and intra-tree connection module. A specic implementation of a module is referred a strategy. These modules have seen wide interest in the planning community and signicant eort has been made adapting them for various applications, resulting in a rich set of alternative strategies for each module; these strategies are often a drop-in replacement (see Section 5.2). For example, there are a set of alternative strategies for (1) handling the narrow passage problem [141, 170, 187, 34], (2) focused sampling [59, 158]. Intelligently switching between these strategies is one way to: (1) improve the resiliency of the search method, (2) encode user preferences (e.g. planning speed vs. sub-optimality trade-o). In prior work [82], a deterministic graph-search based method called COntext DEpendent Search Strategy Switching version 1 (CODES3v1) has been used. It integrates multiple heuristics and switches between these heuristics during planning. This work explores a sampling-based approach and applies biasing to increase trajectory quality. 129 Figure 5.1: (a) Point-to-point planning between any two points in (A;B;C;D;E) can benet from workspace guidance. (b) A complex planning problem where workspace guidance is unlikely to produce solutions. A sanding tool is to be moved from the starting pose to the goal pose. The gap circled red is too narrow. The only way out is across the slot circled blue. 130 In application domains such as small-volume manufacturing, industrial manipulators per- form non-repetitive tasks that change frequently; manual programming of trajectories is time- consuming and impractical. In such applications, we require automated trajectory planning that quickly produces collision-free trajectories. We also require these trajectories to be of high quality as this directly impacts manufacturing throughput. High-quality trajectories are typically short and tend to minimize the execution time and energy spent. Manipulator motion may be classied into two categories. (1) Point-to-Point Motion: Here the manipulator is given start and goal joint conguration. And, the task is to nd a sequence of collision-free joint-space congurations that connect them. (2) Trajectory-Tracking Motion: Here the manipulator is given a workspace (W-space) trajectory to be followed. And, the task is to nd a corresponding conguration-space (C-space) trajectory. In this work, the focus is on a general point-to-point trajectory planning problem where the start and goal end-eector workspace congurations (i.e., transform frames) are given. Under- specied goal congurations frequently occur in manipulator trajectory planning where the nal end-eector pose is important but the conguration of the rest of the arm is not. This problem is challenging as the workspace is often cluttered with obstacles with complex geometries. The primary contribution is a exible framework that enables switching between alternative strategies to yield a resilient search method amenable to machine learning. The other contribution is a simple condition for tree swapping. Its ecacy is demonstrated on a variety of challenging planning problems through CODES3v2, a simple instance of the proposed framework. In the rest of the chapter, CODES3 is used to refer to CODES3v2, an improvement over CODES3v1. The switching conditions in CODES3 encode the following preference; we should rst utilize focusing strategies that are likely to quickly produce high-quality solutions. Then, only in response to growing evidence that the focusing strategies are misguiding, we should resort to other seemingly less lucrative strategies. 131 5.2 Background Deterministic search-based methods are used in [35, 61, 191] to solve the trajectory planning problem for manipulators. Without specialized heuristics, these methods take a longer time to solve the problem than sampling-based methods like RRT. Recent developments related to RRT and its variants are given in [49]. Each of these variants introduce novel strategies in the sampling-based tree-search algorithm. The goal of these strategies is to either reduce planning time or improve solution quality. 5.2.1 Sampling Strategies Specically, workspace sampling strategies have been studied in the past. Rickert et al. follow a two stage approach [158] where a RRT-based planner operating inC-space leverages workspace cues. It works very well wheneverW-space connectivity roughly mirrorsC-space connectivity. For instance, this is true in the case of xed and mobile manipulators operating in uncluttered workspaces. Kiesel et al. use a focusing scheme [90] where a subspace ofC-space (i.e., essentiallyW-space) is discretized into regions. By incrementally establishing connectivity between these regions in an online fashion, easy regions of theC-space are identied. Planning eort is focused on easy regions as they are uncovered. Both methods suer when there is a large discrepancy between the connectivity in the workspace to that of the high-dimensional conguration-space (Figure 5.1). Although techniques that provide a diverse set of trajectories as in [21, 153, 94, 214] can be employed, explicitly constructing a workspace path for the end-eector in a cluttered workspace while eciently pruning workspace paths having infeasibleC-space counterparts is just as hard as the original problem itself. Other workspace guidance methods as in [133, 112] do not scale well for a cluttered 3D scene unless special care is taken to carefully breakdown workspace into a 132 computationally tractable number of disjoint polytopes. Depending heavily on any single focused- sampling strategy may be detrimental to performance. Workspace cues have been used to guide sampling-based approaches [201, 98, 219]. These methods involve decomposition/analysis of workspace [98]. 5.2.2 Node/Target Selection Strategies Works such as [205, 173, 112] use the Jacobian pseudo-inverse to generate target congurations. Specically, Weghe et. al. use a goal biasing scheme that drives the search towards workspace goal regions [205]. Jacobian pseudo-inverse is a potent technique for bridging the workspace and conguration space. While it helps cover large distances in the workspace, intricate conguration space maneuvers cannot easily be performed using this technique alone. When solutions pass through narrow channels in the conguration space, sampling-based approaches tend to miss them unless special care is taken to sample these channels. Methods such as [141, 170, 187] focus on these problems. 5.2.3 Connection Strategies Variants such as RRT , PRM [86] guarantee almost-sure asymptotic global optimality. BIT [59] uses a conguration space focusing schedule and provides optimality guarantees. While these methods are guaranteed to nd optimal solutions eventually, being able to utilize workspace guidance when it is accurate can greatly complement their strengths [34]. In [45], a workspace RRT is grown from the goal position. The resulting tree is used to bias the growth of the conguration space tree. This method works well for mobile manipulation in uncluttered environments. 133 5.3 Problem Formulation Let the workspace of the robot be denoted asW R 3 . Let q =fq 1 ;q 2 ;:::;q n g be the joint conguration of a serial-link manipulator with n joints. Assume the geometric and kinematic models (R) of the robot are well dened and available. Let the geometry of the manipulator at a conguration q be represented as a set of rigid bodiesM(q)W. LetOW be the set of workspace obstacles. LetC be the conguration space of the manipulator containing all joint congurations of the manipulator. The set of joint congurations that lead to collision is denoted byC obs =fq2C :M(q)\O6=;g. The set of joint congurations that are collision-free is C free =fq2CnC obs g. Let T be the homogeneous transformation matrix representing the pose of the End-Eector (EE) inW. For a given joint conguration q, T can be found by applying Forward Kinematics (FK). In this work, the dot notation is used to extract quantities (e.g. T:p denotes the position component). For given a pose of the end-eector, a set of corresponding joint congurations can be found by computing Inverse Kinematics (IK). A tree node n consists of a conguration q, the EE transform frame T(q). The terms nodes and congurations are used interchangeably and the distinction is made where appropriate.T s denotes the tree of nodes rooted at the start node andT g for the tree rooted at the goal node. We are interested in the trajectory planning problem for a manipulator that needs to move from a starting end-eector transform frame T s to a goal transform frame T g . Given a robot modelR, workspace obstaclesO, start and end transform frames T s ; T g , the objective is to nd a collision-free trajectory in the form of a sequence of points Q =fq k g K k=0 where FK(q 0 ) = T s ; FK(q K ) = T g . 5.4 Approach Our context-dependent bi-directional tree search algorithm (Figure 5.2) proceeds as in Algorithm 3. For each module, one of the alternatives from Table 5.1 is dynamically selected during each 134 Algorithm 3 Context Dependent Bi-directional Tree Search 1: function Solve(T s ; T g ) 2: T s IK(T s );T g IK(T g ) 3: T c T s ;T o T g 4: while true do 5: T c ;T o TreeSelection(S e ;T s ;T g ) 6: F s ;F t FocusSelection(T c ) 7: n c NodeSelection(T c ;F s ) 8: n t TargetSelection(n c ;F t ) 9: S e ; n e ExtendStrategy(n c ; n t ) 10: T c T c [fn e g 11: S c ConnectStrategy(S e ; n e ;T o ;T c ) 12: if REACHED ==S c then 13: return RetracePath(T s ;T g ) iteration. These alternatives are outlined and some insights into when they might be useful is provided. 5.4.1 Tree Selection Tree selection refers to picking a current treeT c from the two choices {T s andT g . After a current tree is picked, the other treeT o refers to the tree that was not picked. In a broad sense, tree selection is a type of focusing scheme where the eort is focused in growing the current tree. In many cases, unconditionally swapping trees before every iteration works well (TS A ). However, it is observed that this may not always be the best strategy. Suppose the goal treeT g is harder to grow (e.g. the goal conguration has a tool in a cavity) and the start treeT s is easy to grow. It is crucial to focus more eort growingT g thanT s . Adding nodes toT s may not alleviate the problem of exiting the cavity as much as adding useful nodes inT g would. A simple strategy is to switch to the other tree only if a valid (i.e., S e 6= TRAPPED) new node was added to the current tree in the previous iteration (TS B ). While TS B allows progress on both trees, greedy connection strategies may introduce signicant node-count dierence (i.e., 135 Figure 5.2: Our framework dynamically switches strategies selected from a library of strategies based on feedback from the current search state. It can incorporate new strategies and scheduling logic for application specic needs 136 Figure 5.3: An illustration of the six primitive steps in realizing the bi-directional tree search method. abs(jT s jjT g j)) as intermediate nodes are laid down between the trees. Another simple strategy is to switch only when the normalized node-count dierence abs(jT s jjT g j) jT s j +jT g j ! between the two trees is less than a threshold ! (TS C ). This has the eect of roughly equalizing node-count on both trees resulting in a balanced growth (see Figure 5.4). Such a balanced growth results in the drastic reduction of failure rate (see Section 5.5). For TS C to work, during the extend and connect steps, only the chosen tree must be extended and the other tree must be left untouched. This ensures that the node-count balancing feedback is negative. 5.4.2 Focus Region Selection Having chosen a tree, focus regions are picked next. Focus regions are regions of the search space where planning eort is expended. There are two types of focus regions: W-space focus regions 137 Figure 5.4: (a) An illustration of bi-directional tree search using TS A , (b) An illustration of bi- directional tree search using TS C . We observe that the balanced growth of both trees results in a solution without sampling a lot of unnecessary (blue) nodes. andC-space focus regions. Methods such as [21, 153, 94, 214] can be used to generateW-space focus regions. Domain specic workspace focusing can also be triggered upon certain conditions (e.g. moving along conned workspace slits/slots as in scenario S13 in Figure 5.10). Similarly, C-space focus regions can be generated through methods outlined in [59, 187, 154, 97]. For either type of focusing, two subsets can be dened: node-focus regionF s and target-focus- regionF t in their respective spaces. Node-focus region is a region used for selecting nodes, while target-focus region is used for selecting targets (see Figure 5.3). Targets are points inC orW towards which the chosen tree is attempted to extend towards. In this work,C-space focusing was not used (i.e., forC-space sampling,F s =F t =C). However, aW-space focusing method given in [158] is used. The wavefront propagation method (Algorithm 4) is used to nd a workspace path for a key point located at the end-eector ange. The wavefront algorithm greedily grows a search tree from the start point p s towards the goal point p g . We seed with the rst node containing a workspace ball centered at p s . The radius of this ball is set by querying the workspaceW for the distance to the closest obstacle from p s . 138 Figure 5.5: An illustration of free-space balls inW (K = 6). TreeT s has progressed upto ballB 3 . TreeT g has progressed upto ballB 4 . Note that for illustration purposes, both trees have been projected ontoW based on the EE positions of the node. This node is then put into a priority queue Q which sorts nodes in the order of least Euclidean distance to the goal position. Whenever a node is popped from this queue, we check if the ball contains the goal point. If it contains the goal point, we terminate the search and retrace the search tree to return a sequence of workspace-balls. If it does not contain the goal point, the surface of the ball is uniformly sampled. And, each sample p i that is already contained by the existing balls in V is ignored. The remaining samples are inserted into the queue by performing a distance query. This process repeats until a sequence of workspace-ballsS is returned. This workspace path denotes the sequence of free-space ballsfB k gW (starting at p s leading to the workspace goal p g ) through which the end-eector can be dragged along (Figure 5.5). The progress of the search tree is monitored along this ball path using a progress index m2 [1;K] where K is the number of balls in the path. For a particular tree, sayT s , a progress index m = 3, indicates that there exists a conguration q 0 in this tree whose EE workspace position has progressed 3 balls from the initial position p s towards the nal ball (i.e., T(q 0 ):p2B 3 ). With a slight abuse of notation, we say that q 0 2B 3 . Since our framework is bi-directional, there are two progress indices, one for each tree. Initially, m s = 1 andm g =K. The intent is to increment 139 Algorithm 4 Wavefront Algorithm 1: function WorkspaceBallPath(q s ; q g ) 2: p s ForwardKinematics(q s ) 3: p g ForwardKinematics(q g ) 4: V ; 5: E ; 6: Q ; 7: r s DistanceQuery(p s ) 8: Insert(Q; s;kp g p s kr s ) 9: while true do 10: s (p center ;r; s parent ) Pop(Q) 11: V V[fsg 12: E E[f(s parent ; s)g 13: ifkp g p center kr then 14: S RetraceTree(s) 15: returnS 16: P UniformSamplesOnSphere(s) 17: for p i 2P do 18: for s i = (p center;i ;r i ; s parent )2V do 19: ifkp i p center;i k>r i then 20: r 0 DistanceQuery(p i ) 21: s 0 (p i ;r 0 ; s) 22: Insert(Q; s 0 ; s) m s till m s =K and decrement m g till m g = 1 in the hope that both trees will progress towards their respective terminal balls (Figure 5.5). ForT s andT g , the terminal balls areB K andB 1 respectively. To this intent, a biased coin is tossed to choose between the following assignments: (F s ;F t ) (B m ;B m+d ) on heads and (F s ;F t ) (W;W) on tails whered = 1 whenT c =T s andd =1 whenT c =T g . The bias for heads is denoted by. 5.4.3 Node Selection Node selection refers to picking a node n s from the current treeT c using the focus regionF s . Much like RRT-Connect (RRTC) [95], strategyNS A picks a random conguration q r 2F s and nds the 140 closest node to q r inT c (NS A ). This selection method has the Voronoi-bias property that allows for rapid exploration (i.e., nodes ofT c are selected with frequency proportional to the Voronoi- region-volume of each node). However, rapid exploration happens only when a suitable distance- metric is used [109]. In many applications, it is often benecial to make a move in workspace by applying Voronoi-bias in workspace [173]. This is done by picking a random reachable workspace frame T r 2F s and nding the closest node inT c (i.e., argmin n2Tc d(n:T T r )) (NS B ). Nodes can also be selected based on criteria such as a cost estimate as in graph-search methods like A* (NS C ). 5.4.4 Target Selection Target selection refers to picking a target in the vicinity of the chosen node n c in order to make a local connection. This target could be a point in eitherC orW. TaS A produces a target conguration q t =Interpolate(q c ; q r ; q ) where q r 2F t and q is the step-size. TaS B produces a target transform frame T t =Interpolate(T c ; T r ; p ) where T r 2F t and p species the step- size. While TaS A and TaS B receive target points from the node selection module, the target point could be chosen independently of node selection. In TaS C , n directions from the chosen node can be sampled and a direction with least cost can be picked. In dicult areas (e.g., narrow passages and entrance/exit of narrow passages), extending a tree can become dicult as many of the sampled targets are in collision. In such cases, building an approximate local analyticalC-space model ( b C obs ) can be helpful. This analytical model ( b C obs ) allows for quick inference and it can be used for selection promising targets (TaS D ). Since this building this model takes a non-negligible amount of time, we judiciously build these local models only for regions ofC that are dicult. Dicult regions ofC-space can be identied in several ways. After every extend step, we log the status of the extend operation into the source node (n s ). For example, suppose that an extend step is performed. Then, n s :ea n s :ea + 1 (extend attempts). If the return code for an extend step is TRAPPED, then n s :ef n s :ef + 1. A node 141 is considered to be in a dicult region when n s :ea>N a andn s :ef=n s :ea> d and a local model is built (i.e., n: b C obs ). N a is a threshold that waits for enough number of attempts ( 20) and d is also a threshold ( 0:8). Suppose that Q =fq m 2Cg M m=1 2R Mn a sequence of conguration points. Suppose that L =fl m g M m=1 where l m = 1 C obs (q m )2R M1 and 1 is the indicator function. We can obtain a local approximation of theC-space by tting a RBF model. In order to reduce the computational burden when M is large, we can rst perform clustering on X alone (disregarding L). Suppose that after clustering Q using k-means algorithm with k = K, we have C =f k g K k=1 2 R Kn where k 2C are cluster centers and cluster variance is =f 2 k g K k=1 2R K1 . Suppose that we represent the RBF basis set as =f k g K k=1 (5.1) where k (q) = exp( k kq k k 2 ) (5.2) and k 1=(2 2 k ) (5.3) We wish to approximateC obs using the following RBF model. V (q; ) K X k=1 k k (q) 1 C obs (q) (5.4) TheC-space approximation problem can be formulated as follows. A = L (5.5) 142 where A = 2 6 6 6 6 6 6 6 6 6 6 4 1 (q 1 ) 2 (q 1 ) K (q 1 ) 1 (q 2 ) 2 (q 2 ) K (q 2 ) . . . . . . . . . . . . 1 (q M ) 2 (q M ) K (q M ) 3 7 7 7 7 7 7 7 7 7 7 5 , q m 2 Q and 1 2 K T . The approximation of theC-space can be determined by nding a least-squares solution LS = (A T A) 1 A T L (5.6) One can also compute the gradient of V as r q V (q) = K X k=1 (2 k k k (q))(q k ) (5.7) To build a localC-space model ( b C obs (q 0 )) at a point q 0 , we can sample a distribution of points Q 0 (a domain around q 0 ) and their corresponding labels L 0 . Then, we can construct a local model as b C obs (q 0 ) (q 0 ; Q 0 ; L 0 ; LS ) and store in a node (i.e.,n: b C obs = b C obs (q 0 )). Whenever a selected node (n s ) falls in the domain of an existing local model b C obs;0 , we set n s : b C obs = b C obs;0 . Given that we start atn s :q q s , we want to nd suitable directions to extend and avoid obstacles. An easy way to accomplish this is to move orthogonal to the gradient of V as shown in Figure 5.6. Suppose that ^ q r is a random unit vector (^ v denotes normalization of v). An orthogonal vector can be constructed as q r = ^ q r (^ q T r r ^ V (q s ) +)r ^ V (q s ) (5.8) Thus, the target can be computed as q t = q s + q r (5.9) 143 Figure 5.6: An illustration of how a local model (dim(C) = 2) is utilized for exploration. Directions roughly orthogonal to the gradient vector are promising directions. Without a large number of samples, this model is unlikely to be perfect and thus, we cannot rely on it exclusively. We handle this drawback in two simple ways. First, we add a slight perturbation U(0; 0:1) that helps avoid obstacles a little further. Second, every time a model is created, we compute the accuracy (p r ) of the model. The outcome of a biased (p r ) coin-toss is used to decide if model-based target selection is performed. The more accurate the model is, the more likely model-based target selection is used. 5.4.5 Extend Strategy Extend strategy refers to how the selected node n c and the selected target are connected. There are two types of targets: q t (C-space target) and T t (W-space target). If the selected target is q t , then ES A and ES C apply. ES A performs a linear interpolation between q c n c :q and q t ; it checks for collisions along the way. Judiciously applied non-linear connections [34, 91] in C-space (ES C ) have the potential to improve performance signicantly. In this work, non-linear connections are not included. 144 ES B uses the geometric Jacobian (J g ) to drive q c to T t using the pseudo-inverse method [30]. p = T t :p T c :p: o = R c (Quat(R T c R t ):xyz) where Quat(R) is the quaternion representation of R and R c is the rotation component of T x . Let (v;) == max(kvk 1 ;). x = [(p; p )p ;(o; o )o] (5.10) q = (J T g J g + I) 1 J T g x (5.11) q =(q; q )q (5.12) q n q n + q (5.13) The adaptive scaling function allows ner control over step-size and increased stability at the cost of more iterations to reach T t . A status ag S e and a new node n e is returned by all extend strategies; S e TRAPPED when a collision-free move cannot happen at q c , S e ADVANCED when a collision-free move can be made and S e REACHED when the target is reached without collisions. n e contains q e , the resultant collision-free conguration after extending q c . If the target was selected using the workspace balls, we follow [158] to update progress indices; suppose S e 6= TRAPPED and the extension produced q e in the current treeT c . We start scanning from the terminal ball towardsB m and identify the rst ballB p such that q e 2B p . Then, we perform the update m c = p. Suppose S e = TRAPPED, we follow the backtracking technique in [158] where the current ball is in ated by a small factor. On recurring entrapment, 145 the progress index is regressed so that a more comprehensive eort can be made from a previous ball. 5.4.6 Connection Strategy Connection strategy refers how a connection can be made between the start and the goal trees. Specically, it refers to how a node (source node) fromT c is connected to another node (destination node) inT o . One way to pick the source node is to pick n e 2T c whenever the status agS e 6=TRAPPED. The destination node inT o can be picked in many ways. The closest node to n e :q inT o can be picked as the destination node; the source and destination nodes can be connected using linear- interpolation inC-space (CS A ). Due to the bi-directional search scheme, some nodes ofT c andT o often intertwine inW-space; however, inC-space, these nodes are relatively distant. Such nodes in T o which are in proximity to n e inW-space are good candidates as destination nodes as they have the potential to connect the two trees. Thus, a candidate set can be generated by ltering nodes ofT o that are within a workspace-ball of radiusr c centered at n e :T:p. A random candidate is picked from this set as the destination node and attempt connect to it from the source node (CS B ). Connection is performed by repeatedly using the extend strategy ES A until the destination node is reached or collision is detected. As noted in [95], the greediness of the connect strategy has an impact on performance; It is observed that limiting the number of times the extend strategy is repeatedly called improves planning time and solution cost. Furthermore, inhibiting the addition of intermediate nodes (i.e., nodes generated while attempting to connect the source and destination, excluding the end-points) has the eect of decreasing planning time at the expense of decreasing trajectory quality. A maximum limit of 5 extends per connect was used. And, intermediate nodes were not added. Occasionally, with a very low frequency, it is benecial to select a random node fromT o (CS C ). This helps when the distance-metric is not accurate. 146 In RRT-Connect [95], a newly generated node in the chosen tree (T c ) is connected greedly along a straight-line to the nearest node in the other tree (T o ). While it is simple and works well in many cases, the connection strategy can be made adaptive to information gathered about the C-space so far. The idea is to make connection eort not just along a straight-line but a general region ofC-space (CS D ). In this work, CS D is also referred to as the Connection Biasing (CB) strategy. Consider the scenario in Figure 5.7 where n a and n c (closest on the other tree) have Figure 5.7: Connection biasing an unobstructed straight-line path between them while n b and n d have an obstruction. Initially, this information is unknown; whenever, n a (or n b ) is chosen, attempting connection directly to n c (orn d ) is desired. However, as time progresses, collision samples are observed during attempts to connect. These samples indicate diculty in connection. A cone c of a certain angle between q 1 and q 2 can be dened asc ( q 1 ; q 2 ;) =fq2Cjkq par kd;kq perp kd tan(); q par ((q q 1 ) T q dir )q dir ; q perp = (q q 1 ) q par g where q perp and q par denote the perpendicular and parallel components along the direction q dir (q 2 q 1 )=kq 2 q 1 k. Suppose that a node n c is selected and the closest node on the other tree is n o . A conec(q c ; q o ;), = 10 deg. Within the conec, the ratior of collision samples to the total number of observed samples can be computed. This can be done eciently by culling operations utilizing R-trees. Here observed samples consists of the collision samples and nodes (of both trees). Based on this ratio r, we can make a decision on what to do next. The decisions to be made are (1) should a connection attempt be made at 147 all?, (2) if we are attempting a connection, which direction to move towards? We use the following rule. If r < 0:2, attempt connection directly towards q o . Else if r < 0:5, make 5 extend steps consecutively along a random ray passing through the apex of the cone (i.e., q c ) and a random point on the cap of the cone. If r> 0:5, skip the connection step and move on to tree selection. 148 Table 5.1: Summary of Strategies Module Strategy Description (more details in Section 5.4) Tree Selection TS A Swap trees TS B Swap trees only on success TS C Swap trees on low node count dierence Focus Selection FS A Use conguration space focusing FS B Use workspace focusing Node Selection NS A Pick closest node inF s to a random q r 2F t C NS B Pick closest node inF s to a random workspace frame T r 2F t W NS C Pick least f-cost node from priority queue Target Selection TaS A Pick target q away in direction of q r q c TaS B Pick target p away in direction of T r :p T(n s :q):p TaS C Sample n directions and pick direction with least f-cost TaS D Use an approximate localC-space model to quickly identify promising targets (RBF) Extend Strategy ES A Linear interpolation inC-space ES B Linear interpolation inW-space ES C Non-linear interpolation inC-space Connect Strategy CS A Connect new node to nearest (inC) tree node in other tree CS B Connect new node to nearest (inW) tree node in other tree CS C Connect new node to random tree node in other tree CS D Connect new node to nearest node in other tree based cones (Connection Biasing) 149 5.4.7 Scheduling of Strategies The choice of strategies across each of these modules alters the behavior of the search method. For each module, one of many alternative strategies can be chosen. Letf k g represent the strategy vector, a stochastic vector containing the probabilities of selecting a strategy for module k. For example, TS = [TS A : 0; TS B : 0:2; TS C : 0:8] indicates the selection of TS B and TS C with a probability of 0:2 and 0:8 respectively. Certain strategies are not compatible with each other. In particular, selecting a workspace target (i.e., strategy TS B ), limits options in the selection of the extend strategy. E B is the only strategy that can be selected as the extend strategy as there is no joint conguration as a target. In such cases, the probability values are zeroed out to conform to the constraints. A strategy sequence can be dened as a tuple of strategy vectors = ( TS ; FS ; NS ; TaS ; ES ; CS ). Conforming to such constraints, a sequence of strategies could be statically determined before the search starts and kept constant during the search. An approximation of RRTC [95] (A-RRTC) and an approximation of a bi-directional EET (A-EETC) [158] can be represented as static tuples as in Table 5.2 where R (TS A : 1;FS A : 1;NS A : 1;TaS A : 1;ES A : 1;CS A : 1) and E (TS A : 1;FS B : 1;NS B : 1;TaS B : 1;ES B : 1;CS A : 1) with = 1:0. The sequence can also be dynamically determined during the search. By changing strategies during the search, some of the synergistic interaction between these strategies can be exploited. Conversely, mixing dissonant strategies can have the opposite eect. For example, whenever the focus strategy is FS B , using only the connection strategy CS B yields better performance as 150 the intermediate nodes created while using CS A often provide little value in making progress in workspace. While a dynamic strategy sequence captures synergistic interaction between strategies, it is not context-dependant; therefore, it may miss contextual-cues and select unpromising strategies. For instance, A-RRTC+A-EETC, a 50 50 blend of A-RRTC and A-EETC, is context-independent as it does not depend on the search state and it only depends on an independent coin-toss (see Table 5.2). In the interest of space, A-RRTC+A-EETC is also labeled as BLEND in the results and they refer to the same method. CODES3 can be represented as a context-dependent dynamic strategy sequence. CC refers to the collision-check counts since the search started. SF is a ag indicating if a solution has already been found. Initially, CODES3 behaves almost like A-EETC till CC <CL 1 using 1 = (TS C ;fFS A : 0:0;FS B : 1:0g;fNS A : 0:50;NS B : 0:50g;fTaS A : 0:50;TaS B : 0:49;TaS C : 0:01g;fES A : 0:5;ES B : 0:5g;fCS A : 0:49;CS B : 0:49;CS C : 0:02g) and = 0:9. As CC increases and no solution has been found, it is likely that the workspace cues are misguiding. We switch gears and use a copy 2 1 with the change 2 : FS =fFS A : 0:4;FS B : 0:6g, = 0:5. In making this change, we are arming our ambivalence regarding the eectiveness of the workspace cues. The conditions involving limits CL 1 and CL 2 act as a gauge of problem complexity. WhenCC exceedsCL 2 and no solution has been found, it is unlikely that workspace cues are useful at all; 3 1 is used along with the change 3 : FS =fFS A : 0:9;FS B : 0:1g, = 0. In making this change, we are arming our guess that perhaps conguration space focusing is the only way to a solution. As soon as a solution is found, we revert back to heavy workspace focusing by using 1 and = 0:9 again. In doing so, we are redoubling eorts to nd a higher-quality solution based on workspace cues, content in the fact that we have at least a feasible solution. Instead of using xed threshold of collision count (CC) to determine if workspace sampling was helpful. This threshold is arbitrary and may not be determined using any principled approach. 151 The Auto-Focus (AF) workspace/conguration rule is presented. It controls FS and the rule is of the form FS = [FS A : 1p FS B ;FS B : p FS B ]. The basic idea is to monitor progress along the sequence of balls and change p FS B accordingly. Progress refers to the update of the progress index towards the terminal ball. Whenever progress is made along the ball sequence, we increase the probability of workspace focus selection (p FS B ). When no progress is made, we decreasep FS B by a xed amount. The following equation describes the rule succinctly. p FS B [k + 1] = min(0:8; max(0:2;p FS B [k]d +u)) (5.14) where u = 0:1 if progress is made and u = 0 otherwise. And, d = 0:01 is a decay parameter, k refers to the iteration count of the main while-loop. p FS B is limited to the range [0:2; 0:8] to ensure that FS B gets chosen even when no progress along the ball sequence has been made in a long time. Table 5.2: Strategy Assignments Name Strategy Sequence A-RRTC R A-EETC E and setting = 1 A-RRTC+A-EETC (BLEND) ( R ; if heads on coin-toss E ; if tails on coin-toss CODES3 8 > > > < > > > : 1 ; CC2 [0;CL 1 )^:SF 2 ; CC2 [CL 1 ;CL 2 )^:SF 3 ; CC2 [CL 2 ;1)^:SF 1 ; SF 5.4.8 Incorporating Human Assistance In many applications, there is always a human-operator managing robot operations even if these robots are automated. Often these operators perform a nal verication before the automatically- generated trajectories are executed. Despite the use of specialized techniques, automated trajectory- planning methods may not always nd a solution within a given time budget; even if they nd a 152 Table 5.3: Parameters Parameter Value q 0:12 rad p 0:03 m q 0:1 rad p 0:1 m o 0:1 0:05 ! 0:05 CL 1 3000 CL 2 10 000 solution, the human-operator may reject it due to excessively long trajectories or uncomfortable proximity to obstacles. In such cases, the human-operator may as well play a proactive role in shaping the produced trajectories. The eld of cooperative-planning studies how human assistance can be incorporated into an underlying automated trajectory planner. The goal of cooperative-planning is to harness the power of human intuition in focusing the automated search and produce high-quality solutions quickly. For instance, humans intuitively know the shortest trajectories for scenarios as in Figure 5.8, whereas standard sampling-based methods may produce aesthetically disturbing trajectories. This is particularly frustrating when there is high perception uncertainty in the vicinity of the computed trajectories and a replan is required. Playing an active role can reduce frustration and bridge the gap between the human-operator's expectation and the automatically produced trajectories. By cooperatively interacting with the planner, the operator can impart human intuition in focusing the search and help the automatic planner produce high-quality solutions quickly. Figure 5.8 shows example scenarios in which humans intuitively know the shortest end- eector paths to reach the goal pose. While the performance of CODES3 is already better than alternative methods, it is benecial to incorporate human assistance whenever CODES3 is unlikely to nd a solution quickly or it is likely to nd suboptimal solutions. Such a cooperative-planning scheme is useful to a robot operator only when it satises the following conditions: (C1) presents an intuitive graphical 153 Figure 5.8: (a) A complex planning scenario where a human operator intuitively knows how to maneuver around obstacles, (b) Humans intuitively know the sanding tool cannot be slid across the narrow gap. By inspection, humans immediately know that the tool has to exit the cavity to reach the goal pose. 154 interface to the underlying planner, (C2) allows dynamic changes to the planner behavior over time, (C3) controls the underlying planner eectively, (C4) requests human assistance only when help is needed. In this work, Human assisted CODES3 is referred to as HA-CODES3. In this work, the human operator provides the focus regions. An illustrative example of a cooperative-planning work ow is described using Figure 5.9. Figure 5.9 shows a scenario where a roll-cage assembly is welded for a buggy. Given a welding scenario as in Figure 5.9, CODES3 starts o un-assisted to solve the welding problem. Meanwhile, the human-operator is presented a graphical-user-interface (GUI) as in Figure 5.9. Depending on the progress of the search, the operator can choose to make the following changes: 1. Tweaks to Focus Selection Bias: The human-operator can dynamically change FS using the slider. Suppose the slider value is v2 [0; 1] (v = 0 atFS A ). Then, FS =fFS A : 1v;FS B : vg. This gives the operator an option to override and disallow workspace biasing, 2. Tweaks to the positions and sizes of the workspace balls by moving them around: Moving one ball will aect the positions of other balls in a manner similar to a string of beads; this way the operator spends the least time positioning all the balls. Whenever the balls are modied, the progress indices are reset, 3. Clicking workspace balls: The operator can also observe a live value from the CODES3 algorithm indicating the usefulness of the selected ball (the red icon). The usefulness value acts as the feedback to the operator and it is determined by the ratio between the number of successful extends and the total number of extends that were made within the selected ball. The operator uses it to make an informed guess at the correct value of 2 [0; 1] ( = 0 is the cross symbol) specic to the selected ball by sliding. The workspace projection of both trees is also displayed (for clarity, this has not been visualized in Figure 5.9). As the tree 155 search continues, the ball corresponding to the progress indices of both trees is also shown to the operator using a tinted color. Figure 5.9: An example of a cooperative-planning GUI. Focus regions in the form of workspace balls are created by the human-operator to guide the welding tool around the roll-cage assembly. In the lower-right area, a slider control pops up when the human operator clicks a workspace ball. 5.5 Results and Discussion The proposed method was implemented in MATLAB on a Dell workstation with Intel Xeon 3.5 GHz CPU and 32 GB RAM. Table 5.3 lists some of the important parameters we used. EETC and RRTC* refer to the standard implementation of bi-directional variant of [158] and [86] respectively. The proposed method was tested in a diverse set of 30 scenarios. This set contains 156 scenarios which are: easy, hard, workspace-misguiding, workspace-guiding. Figure 5.10 shows a representative subset of the test scenarios. For each scenario, a 3D Euclidean Distance Transform (EDT) of the environment [80] was precomputed. Collision checks were performed between the environment and a ball-approximation of the robot using the EDT. All methods were given a planning timeout of T o = 7:5s or 15s to produce any number of valid solutions and a smoothing timeout of T s = 2:5s to achieve less than 1% relative cost dierence between consecutive smoothing iterations. If any method took longer than T o , the trial was recorded as a failure; otherwise, for each of the raw solutions found, the distance traveled by the tool-tip was also recorded as a metric of trajectory quality. Finally, the raw solution with the least tool-path distance was smoothed using a simplied version of [70]. Independently, the optimal trajectory for each scenario was computed in order to assess the trajectory cost sub-optimality factor (i.e., ratio of trajectory cost obtained to the optimal trajectory cost) for each of the smoothed solutions produced. Results shown in Figure 5.11, Figure 5.12, Table 5.5 and Table 5.4 were computed using the data collected over 50 trials of each planner-scenario combination using dierent planning timeouts. 157 Figure 5.10: A representative subset of test scenarios. For clarity, only one of many possible joint congurations is shown for each scenario. 158 Figure 5.11: Performance metrics of CODES3 and alternative methods aggregated over 30 diverse scenarios. All methods were given time budgetT o = 7:5s to compute solutions. The dots indicate median sub-optimality factor for individual scenarios and the stars indicate average performance over all scenarios tested. The point (0; 1) is an ideal point indicating zero failure rate and optimal solution quality. Some data points that have a sub-optimality factor greater than 8 are not visualized. 159 Figure 5.12: Performance metrics of CODES3 and alternative methods aggregated over 30 diverse scenarios. All methods were given time budgetT o = 15s to compute solutions. The dots indicate median sub-optimality factor for individual scenarios and the stars indicate average performance over all scenarios tested. The point (0; 1) is an ideal point indicating zero failure rate and optimal solution quality. Some data points that have a sub-optimality factor greater than 8 are not visualized. 160 Table 5.4: Failure rate and suboptimality factor for all tested methods (T o = 7:5s) Method Failure Rate (%) Suboptimality Factor RRTC 47:71 3:14 RRTC-TSC 29:94 4:72 RRTC* 48:59 2:56 EETC 31:35 1:93 A-RRTC+A-EETC (BLEND) 17:30 2:18 CODES3-TSA 7:23 1:53 CODES3 2:05 1:42 Table 5.5: Failure rate and suboptimality factor for all tested methods (T o = 15s) Method Failure Rate (%) Suboptimality Factor RRTC 43:29 2:65 RRTC-TSC 16:56 2:59 RRTC* 43:37 2:47 EETC 25:39 1:90 A-RRTC+A-EETC (BLEND) 12:89 2:18 CODES3-TSA 4:40 1:45 CODES3 0:79 1:41 Results shown in Table 5.6 were computed using the data collected over 30 trials of each planner-scenario combination. A brief description of some of the test scenarios is given below. S13 features a slot along which a sanding tool is to be moved. In S31, the sanding tool is lodged between two horizontal bars, requiring some tool reorientation before it comes free. In S21, the manipulator is reaching into a machine and transferring a workpiece to another machine. In S19, the manipulator is reorienting the tool to better reach the other side of the large curved plate. In S12, the manipulator is exposed to a pillar problem where it has to move from one side to the other. S26 is skipped as it is described in detail in Figure 5.1(b). In S03, the manipulator is attempting to reach to the opposite side under table. In S23, the manipulator needs to retract through a tunnel to reach the other part. 161 Figure 5.13: Test scenarios where human operator assistance is utilized. For clarity, only one of many possible joint congurations is shown for each scenario. RRTC and RRTC* have the highest failure rates. In Figure 5.11, compared to RRTC, RRTC* yields only a slightly better solution quality and incurs an increased failure rate. This is due to time spent rewiring nodes that were never going to be part of a high-quality solution. Given more time, both RRTC and RRTC manage to achieve a roughly equal failure rate. EETC dominates RRTC in terms of failure rate and trajectory quality. The blending of EETC and RRTC as in A-RRTC+A-EETC (BLEND) yields lower failure rates and demonstrates the synergistic interaction between the strategies. This positive interaction is particularly noticeable in the workspace-misguiding scenarios (S12, S26, S19, S31) where EETC is likely to fail often. However, it does little to increase trajectory quality. Context-awareness and synergistic interaction of strategies help drive the failure rate down and increase trajectory quality in CODES3. Specically, random sampling inW-space helps to easily identify narrow passages inW-space that are close to the end-eector; thus, it eectively identies the correspondingC-space narrow passages (e.g. scenarios S23, S21). Although the trajectory 162 Table 5.6: Failure Rate (FR) and Path Length (PL) for all tested methods Scenario Metric RRTC RRTC* EETC CODES3 HA-CODES3 S1 FR (%) 86:67 96:67 86:67 0:03 0:00 PL (m) 0:64 0:59 0:71 0:68 0:62 S2 FR (%) 100:00 100:00 86:67 26:67 0:00 PL (m) - - 2:16 2:10 1:72 S3 FR (%) 100:00 100:00 100:00 53:33 0:00 PL (m) - - - 1:92 1:69 S4 FR (%) 100:00 100:00 100:0 0:00 0:00 PL (m) - - - 1:70 1:45 quality of alternative methods are occasionally better in some tough scenarios, CODES3 has a considerably lower failure rate and suers only slightly in terms of trajectory quality. Sub-optimal feasible solutions obtained at a low failure-rate are often more valuable than optimal solutions with high failure-rates. Nevertheless, in general, the trajectory quality is better when CODES3 is used; better trajectory quality is attained quickly compared to other methods. CODES3 is able to achieve a low suboptimality factor of 1:42 within the rst 7:5s, whereas the next best alternative EETC only manages to achieve 1:93 (Table 5.4). The impact of the tree selection strategy TS C is studied. The eect of using TS C is very pronounced whenever one of the trees is rooted in a narrow passage (e.g. scenario S26). The extent to which employing TS C benets RRTC is tested by considering RRTC-TSC, a variant of RRTC which uses TS C instead of TS A . It is observed that the use of TS C decreased the failure rate from 47:71% to 29:94%. CODES3-TSA, a variant of CODES3 using TS A instead of TS C was also tested. Not usingTS C resulted in an increase of failure rate from 2:05% to 7:23% in the case of CODES3-TSA. 5.5.1 Impact of Human Assistance A brief description of the test scenarios in Figure 5.13 is given. S1 is skipped as it is described in detail in Figure 5.1(b) as well as in [82]. In S2, a long rod-like tool is lodged between two 163 horizontal bars and it needs to be moved to the opposite side. S3 and S4 feature a scenario where a miniature jeep roll cage is welded. The manipulator is reaching into the jeep's window to weld the tubes. Navigating the voids between the tubular structures is a challenge. In Table 5.6, compared to RRTC, RRTC* gives slightly better solution quality. However, due to the time spent rewiring nodes that were never going to be part of a high-quality solution, RRTC* suers an increased failure rate. Context-awareness and synergistic interaction of strategies help drive the failure rate down and increase trajectory quality in CODES3. Specically, random sampling inW-space helps to creep along narrow passages close to the end-eector. As a result, the correspondingC-space narrow passages are identied eectively. Though the trajectory quality of alternative methods may be better in S1, CODES3 has a very low failure rate and yet comes very close to that of RRTC*. One often prefers low failure-rate over solution quality. In light of this, it is observed that HA-CODES3 performs very well with zero failure rate and a very good solution quality. 5.5.2 Impact of Strategies To study the impact of strategies, test scenarios shown in Figure 5.14 is used. A brief description of some of the test scenarios is given below. S03 features a table-like structure where the tool has to reach the other side of the central partition. In S04, a sanding tool needs to be moved from one side of the mold to the middle slot. There is a narrow constriction blocking the way and the tool cannot be slid through. In S07, the manipulator is reaching into the void formed by the cross-bars supporting the cylindrical structure. In S12, the manipulator is exposed to a pillar problem where it has to move from one side to the other. In S14, the manipulator needs to move from one mold to another mold. But, its motion highly restricted by the long vertical rod in the middle. S26 is skipped as it is described in detail in Figure 5.1(b). In S30, the manipulator needs to move the tool from a conned pose between two bars. The tool has to be taken out into a new pose where it is relatively less conned. 164 Figure 5.15 was generated using the data collected over 30 runs of each method-scenario combination (methods are listed in the legend). In Figure 5.15, C3 refers to CODES3. And, C3-P refers to CODES3 including the strategies TaS D (RBF), CS D (CB, Connection Biasing) and Auto-Focusing (AF). C3-P performs very well compared to the rest of the methods. Specically, it beats C3 (CODES3) in terms of failure rate. C3-P reduces the failure rate by 4 times (compared to C3) and brings it down to 1%. In terms of suboptimality factor, both C3-P and C3 perform equally well and have no discernible dierence. This is expected as none of the strategies (i.e., RBF, CB and AF) were focused towards improving solution quality. Table 5.7 compares the performance of C3 as dierent strategies are enabled and disabled. For instance, the column +CB consists of the baseline C3 and the inclusion of CB only. Similarly, the column +AF and +RBF refers to C3+AF and C3+RBF respectively. The method C3-P consists of the baseline C3 and the inclusion of RBF, CB and AF. When planning times are considered, the introduction of RBF has the most dramatic eect compared to that of CB and AF. Notably, introducing RBF in S07, S12, S26 and S30 reduces planning time by a large factor. This is due the presence of a few dicult regions (e.g., narrow passage) in these scenarios. Local models help accelerate the search and reduce planning time. However, in S14 and S23, there are too many of non-overlapping dicult regions in the cong- uration space (e.g., the tunnel in S23). Many local models are created and thus, the advantage of RBF is greatly diminished resulting in higher planning time than C3. No signicant change is observed in sub-optimality factor or failure rate. Introduction of AF generally reduces planning time with the exception of S26. S26 is a very challenging problem that involves intricate conguration space maneuvers. And, in this case AF increases the failure rate to 100% from the baseline of 88%. No signicant change is observed in sub-optimality factor. Introduction of CB consistently reduces planning time across all tested scenarios. 165 Through combining strategies, C3-P is able to generally reduce planning time by atleast 13% and upto 70%. In some cases, it does worse by as much as 20%. However, the overall failure rate is improved by C3-P compared to C3. Figure 5.14: A representative subset of test scenarios used for analyzing the impact of various strategies. For clarity, only one of many possible joint congurations is shown for each scenario. 166 Figure 5.15: Performance of CODES3 (C3), CODES3 with RBF, AF, and CB (C3-P) and other methods 167 Table 5.7: Impact of strategies on a subset of tested scenarios Scen. Planning Time (s) Sub-Optimality Factor Failure Rate (%) C3 +CB +AF +RBF C3-P Red. (%) C3 +CB +AF +RBF C3-P Red. (%) C3 +CB +AF +RBF C3-P Red. (%) S03 5:1 5:0 4:5 3:9 5:8 13:1 10:6 10:9 9:8 9:8 9:7 8:4 0:0 0:0 0:0 0:0 0:0 { S04 1:1 0:8 0:6 0:6 0:6 42:5 6:0 5:7 6:2 6:4 6:1 0:4 0:0 0:0 0:0 0:0 0:0 { S07 2:6 1:0 1:0 0:3 0:8 70:0 7:6 7:6 7:6 7:6 7:6 0:1 0:0 0:0 0:0 0:0 0:0 { S12 9:4 9:6 6:8 6:5 6:9 26:4 1:6 1:6 1:6 1:5 1:7 4:4 8:0 0:0 0:0 0:0 0:0 100:0 S14 7:8 6:8 6:9 9:7 9:3 19:3 1:9 1:9 1:9 1:8 1:8 2:0 0:0 15:8 5:0 0:0 5:0 { S23 2:9 2:4 2:7 2:8 2:6 13:0 1:5 1:5 1:5 1:4 1:5 1:5 0:0 0:0 0:0 0:0 0:0 { S26 16:6 15:7 { 10:8 14:5 14:1 1:3 1:8 { 1:7 1:6 28:5 88:0 20:0 100:0 78:0 36:0 59:5 S30 1:6 1:0 0:7 0:5 0:8 48:2 3:1 3:2 3:1 3:0 3:0 0:8 0:0 0:0 0:0 0:0 0:0 { 168 5.6 Summary A context-dependent bi-directional tree-search framework for point-to-point trajectory planning for manipulators is presented in this chapter. Conceptually, the framework is composed of six modules: tree selection, focus selection, node selection, target selection, extend selection and connection type selection. Each module consists of a set of interchangeable strategies. Many strategies are explained in detail in this chapter. Human assistance is also considered a strategy that could be used by the planner. By exploiting synergistic interaction between these strategies and selecting appropriate strategies based the contextual cues from the search state, the proposed method computes high quality solutions in a variety of complex scenarios with a low failure rate. It is also shown that some popular trajectory planning methods in the literature can be easily represented in the proposed framework. The proposed approach is compared with these popular methods in a diverse set of test scenarios. The proposed approach yields a 15-fold reduction in failure rate coupled with at least a 26% drop in solution suboptimality when compared to the best of the alternative methods. With human assistance, the proposed method has a zero failure rate as well as high solution quality in the test scenarios. 169 Chapter 6 Conclusion 6.1 Intellectual Contributions This dissertation develops computational techniques for speeding up for trajectory planning for autonomous robots operating in complex environments. This dissertation presents advancements in speed-up techniques to perform: (1) dynamic obstacle avoidance while satisfying dynamics con- straints and accounting for perception uncertainty, (2) dynamic obstacle avoidance while travers- ing large uncertain dynamic obstacles, satisfying trajectory constraints and accounting for motion and perception uncertainty, (3) obstacle avoidance in high-dimensional conguration spaces. The key contributions are outlined below. Reactive Trajectory Planning A speed-up scheme where the trajectory planning problem is decoupled into path generation and velocity tuning. A collision-risk assessment method that incorporates the varying dynamics constraints of the robot and uncertainty in the measure- ment of obstacle's state is presented. The speed-up scheme adapts to varying number of dynamic obstacles by prioritizing critical obstacles over non-critical obstacles. The scheme also is able to maintain real-time behavior and able to handle suddenly appearing obstacle by prioritizing it over other obstacles. 170 Deliberative Trajectory Planning The main contributions of this work are speed-up tech- niques applicable to trajectory planning over long distances while avoiding static obstacles and large dynamic obstacles. , introduces a failure-risk assessment scheme for traversing large dynamic obstacles. A speed-up scheme based on heuristics generated by an approxi- mate lower-dimensional problem for the search process to generate trajectories rapidly and be useful in dynamic environments. It also provides dynamic obstacle heuristics in the form of a precomputed look-up table. The generated heuristics are aware of large dynamic ob- stacles and help reduce the planning eort. A collision-risk assessment method that uses a precomputed trajectory tracking error model to handle motion uncertainty, incorporates perception uncertainty and large permeable dynamic obstacles. The incorporation of the risk of traversing large permeable dynamic obstacles into the planning process improves the success probability of the planned trajectory. The speed-up scheme produces safe trajecto- ries that execute faster than purely reactive methods on the physical robot platform. It also reacts safely to varying levels of perception uncertainty. Trajectory Planning for High Degree-of-Freedom Robots Operating in Cluttered Environ- ments The main contribution is a context-dependent bi-directional tree-search framework that schedules multiple speed-up techniques. It is a exible framework that enables switch- ing between multiple heuristics to yield a resilient search method amenable to machine learning. It is also able to represent many existing methods. Other contributions include new heuristics for tree selection, target selection, connection strategy selection. Moreover, a user-friendly interface for incorporating human assistance in a cooperative planning setting is presented. Finally, rules for scheduling heuristics is also given. Using these scheduling rules, CODES3, a high-performance trajectory planner based on the framework is presented. 171 6.2 Summary of Lessons Learnt The previous section went over the major contributions of this dissertation. This section is focused on discussing lessons learned from work in this dissertation and how these can be used by readers for speeding up trajectory planning. The simplest technique that can be used for speeding up computation is the technique of precomputation. Many methods rely on precomputed data for ecient planning. For example, consider a problem where repeated query of the distance from a 3D point to the closest obstacle in 3D space is required. This distance computation can be performed on-demand and this may be computationally expensive. When many distance queries are required, a lot of computation time is required to perform these queries. Euclidean Distance Transform (EDT) is a technique where the distance to obstacles is precomputed for every point in a query region (e.g., box) and stored in a 3D matrix. Having the EDT computed, the distance query becomes very fast. When a point is queried, the cell (query cell) corresponding the point in the 3D matrix is rst calculated. Then, the distance entry recorded in that cell is an approximate distance between the point and the closest obstacle. The distance is approximate due to the required discretization of the query region and only one distance is stored per cell. This highlights the need to be aware of the level of error that discretization of the problem brings about. This error can somewhat be mitigated by interpolating values between neighboring cells for a particular query cell. Furthermore, the ner the discretization level, the more memory is required to store the precomputed values. There is a trade o between storage space and computation time. There are few tips to keep in mind when picking a quantity to precompute. Sometimes the quantity to be precomputed may not be obvious. Some property of the system/problem instance may have to be precomputed. A contrived yet illustrative example would be the following. Suppose that set (S) of 10000 unique numbers are given. In that array, one number is randomly removed. Now, given the leftover set (S 0 ) of 9999 numbers, we are tasked to nd the number that was randomly removed. One of the 172 possible ways to nd the removed number is to compute an property of the original set and utilize it. In this case, the sum s of all numbers in S is a good property. Similarly, the sum s 0 of all numbers in S 0 can be computed. The missing number can be calculated as ss 0 . If a property is used, then it is a good idea to pick properties that are easy to compute. In the above case, the sum of these numbers is a simple property that can be computed very quickly. When the search-space is large, any search method becomes slow. If ecient heuristics can be computed, then large search-spaces can be handled easily. However, nding ecient heuristics can be more of an art than a scientic procedure. In the context of A* search, a heuristic is function that gives cost-to-go values for a given query state. The cost-to-go value represents the potential accumulated cost of going to the goal state from the query state. In such a case, to develop a good heuristic, a good starting point is to observe the terms of the cost function and identify terms for which good approximations can be quickly computed. For example, if a particular cost term is known to dominate the total cost at the goal state, then that term is a good candidate for an attempt at approximation. Other non-numeric heuristics can be generated by observing the regions of search-space that optimal/high-quality solutions pass through. If a function can be developed that takes cues from the search-space/search-state and produces promising regions in which high-quality solutions may lie, it can be used to guide search in the predicted promising regions. Another class of techniques is approximation techniques. The idea is to approximate the problem by simplifying certain less important details. For example, the popular non-linear pro- gramming method Sequential Quadratic Programming (SQP) follows this idea. In SQP, the original problem is approximated by a series of QPs (Quadratic Problem). As each QP is solved, the solution gets closer to the solution of the original problem. This way a series of QP is solved to nally reach the solution of the original problem. A similar technique can be adopted to solve trajectory search problems. A local analytical approximation of the constraints (e.g., obstacles) in the search space can be used to nd approximate solutions. These approximate solutions may 173 not be feasible considering the original constraints (i.e., actual obstacles). Nevertheless, the ap- proximate solutions can be repaired or used in some manner to aid generation of the solution to the original problem. Another approximation scheme could be dimensionality reduction. Instead of approximating constraints, the dimensionality of the state space can be reduced by ignoring the less dominant dimensions. The lower dimensional problem may be easier to solve quickly. The solutions of the lower dimensional problem can be repurposed as a heuristic/guess for the original high dimensional problem. Another related idea is to decouple complex problems into its constituent sub-problems. This involves rst observing where the problem is loosely coupled. For example, a trajectory generation problem can be decoupled into path generation and timing generation. Path generation is per- formed rst, followed by timing generation. Path generation deals with how to navigate around obstacles while timing generation deals with when to be at a particular point on the path. There can be certain conditions that need to be satised before such a decoupling can be justied. In the example above, the dynamics of the robot could prevent the robot from executing a certain path due to restrictions on velocity. However, restrictions on velocity are considered only in timing generation. Therefore, it is necessary to ensure the decoupling does not introduce infeasiblity. Once the sub-problems of path generation and timing generation are solved, the solution to the trajectory generation problem can be computed by assembling the solutions of the sub-problems. 6.3 Anticipated Benets This dissertation presents speed-up techniques for automated trajectory planning for autonomous mobile robots subject to motion uncertainty, perception uncertainty. Many mobile robots in practical settings require automated trajectory planning. Specically, it is anticipated that the methodology presented in this work will be applicable to mobile robots operating in an outdoor setting such as uneven terrain, water bodies. In such settings, the speed-up techniques will 174 allow the robot to quickly react to the environment and perform tasks safely. Depending on the application domain, this will serve as an enabling technology for advanced services. For instance, in the marine domain, applications in areas such as automated port assistance (e.g., container ship docking), water taxis, marine structure inspection are possible. It will reduce operating costs, reduce vessel collisions, improve trac ow, and increase robot survivability. This dissertation also presents a general framework accelerates trajectory planning for robots with high degrees of freedom through the speed-up techniques. Many industrial processes are currently use high degree-of-freedom manipulators that are manually programmed by a human operator. Methods presented in this work can greatly simplify work ows related to manipulators. For instance, humans can ooad the time-consuming process of manipulator trajectory generation to the proposed method. This can increase human productivity and manufacturing throughput. This work also serves as a enabling technology for higher-level task planning methods. 6.4 Future Directions Reactive Trajectory Planning for Mobile Robots The overall trajectory generation architec- ture does not make assumptions regarding sensing modalities and vehicle models. Though the trajectory generation method is presented in the context of UGVs over uneven terrain, it is widely applicable to a variety of mobile robot platforms and with some modications, even unmanned surface vehicles operating in dynamic sea states encountering dynamic obstacles. For each terrain type, the dynamics constraints were assumed to be known before hand. A natural extension would be to perform online learning of dynamics constraints, starting with conservative estimates and converging to the correct constraints. While eective, this approach has some limitations. Opportunities to explore multiple homotopy classes while navigating around st obstacles are eliminated due to the way multiple obstacles are handled. This issue can be addressed. 175 Deliberative Trajectory Planning for Mobile Robots Through the use of heuristics and adap- tive motion goal sets, the planning eort and time has been reduced. However, further reduction is required for practical deployment in congested harbors. Harbors have speed limits ranging between 1:8ms 1 5ms 1 and reacting to trac vessels potentially mov- ing at those speeds requires faster planning times than possible by the current MATLAB implementation. Furthermore, the proposed method does not address the reactive behaviors of trac vessels which is crucial for practical situations. Potential directions of future work are: (1) develop- ing an optimized C++ implementation to achieve a faster planning time by utilizing GPU and multi-core frameworks to perform the computations in the Monte Carlo scheme used for failure probability computations, (2) incorporating reactive behavior of trac vessels to the actions of the USV to ensure safe trajectories are generated. The interaction of wave- elds generated by multiple trac vessels has been ignored. The impact of this interaction remains to be studied. Trajectory Planning for High Degree-of-Freedom Manipulators Operating in Cluttered Envi- ronments Future avenues of research include developing more informed switching conditions that help switch between heuristics. There is signicant potential for improving the per- formance by introducing more informed switching conditions. And, non-linear connection strategies can developed to improve performance. Developing a polished and intuitive GUI for obtaining human assistance is a promising future work. Incorporating machine learning techniques to identify switching conditions automatically is a promising future direction. In addition, applying Deep Reinforcement Learning to learn a policy for changing heuris- tics is powerful and has a very high impact potential. Adapting incremental tree repairing techniques can provide asymptotic optimality guarantees. 176 Reference List [1] Bhp billiton plans for roboships within a decade | workboat. https://www.workboat. com/news/bluewater/bhp-billiton-plans-roboships-within-decade/. (Accessed on 05/21/2018). [2] The business of automation, betting on robots. [3] Latest news - annual overview of marine casualties and inci- dents 2017 - emsa - european maritime safety agency. http: //www.emsa.europa.eu/news-a-press-centre/external-news/item/ 3156-annual-overview-of-marine-casualties-and-incidents-2017.html. (Ac- cessed on 06/01/2018). [4] New project to prove use of autonomous surface vessels for oshore wind. https://ore. catapult.org.uk/press-releases/. (Accessed on 05/21/2018). [5] Rolls-royce might build the worlds rst autonomous naval vessel. https://futurism.com/ rolls-royce-might-build-the-worlds-first-autonomous-naval-vessel/. (Accessed on 05/21/2018). [6] Nato and rafael carry out unmanned surface vehicle missile exercise o israeli coast, May 2019. [7] M. Abdelaal and A. Hahn. Nmpc-based trajectory tracking and collision avoidance of unmanned surface vessels with rule-based colregs connement. In 2016 IEEE Conference on Systems, Process and Control (ICSPC), pages 23{28, Dec 2016. [8] A. Ademovic and B. Lacevic. Path planning for robotic manipulators using expanded bubbles of free c-space. In 2016 IEEE International Conference on Robotics and Automation (ICRA), pages 77{82, May 2016. [9] A. Agha-mohammadi, S. Chakravorty, and N. M. Amato. Firm: Feedback controller-based information-state roadmap - a framework for motion planning under uncertainty. In 2011 IEEE/RSJ International Conference on Intelligent Robots and Systems, pages 4284{4291, Sept 2011. [10] Sandip Aine, Siddharth Swaminathan, Venkatraman Narayanan, Victor Hwang, and Maxim Likhachev. Multi-heuristic a*. The International Journal of Robotics Research, 35(1-3):224{ 243, 2016. [11] Hossein Akbaripour and Ellips Masehian. Semi-lazy probabilistic roadmap: a parameter- tuned, resilient and robust path planning method for manipulator robots. The International Journal of Advanced Manufacturing Technology, 89(5-8):1401{1430, 2017. [12] Baris Akgun and Mike Stilman. Sampling heuristics for optimal motion planning in high dimensions. In International Conference on Intelligent Robots and Systems, pages 2640{ 2645, 2011. 177 [13] Vivek Annem, Pradeep Rajendran, Shantanu Thakar, and Satyandra K. Gupta. Towards remote teleoperation of a semi-autonomous mobile manipulator system in machine tending tasks. [14] Georges S Aoude, Brandon D Luders, Joshua M Joseph, Nicholas Roy, and Jonathan P How. Probabilistically safe motion planning to avoid dynamic obstacles with uncertain motion patterns. Autonomous Robots, 35(1):51{76, jul 2013. [15] Csar Arismendi, David lvarez, Santiago Garrido, and Luis Moreno. Nonholonomic motion planning using the fast marching square method. International Journal of Advanced Robotic Systems, 12(5):56, 2015. [16] Oktay Arslan and Panagiotis Tsiotras. Machine learning guided exploration for sampling- based motion planning algorithms. In 2015 IEEE/RSJ International Conference on Intel- ligent Robots and Systems (IROS), pages 2646{2652. IEEE, 2015. [17] Daman Bareiss and Jur van den Berg. Generalized reciprocal collision avoidance. The International Journal of Robotics Research, 34(12):1501{1514, 2015. [18] Timothy Barfoot. Informed RRT*: Optimal Sampling-based Path Planning Focused via . (Iros):1{8, 2014. [19] O. Burhan Bayazit, Guang Song, and Nancy M. Amato. Ligand binding with obprm and user input. In ICRA, 2001. [20] Mayur Bency, Ahmed Qureshi, and Michael Yip. Neural path planning: Fixed time, near- optimal path generation via oracle imitation. 04 2019. [21] Subhrajit Bhattacharya, Maxim Likhachev, and Vijay Kumar. Identication and represen- tation of homotopy classes of trajectories for search-based path planning in 3d. In Proceed- ings of Robotics: Science and Systems, Los Angeles, CA, USA, June 2011. [22] Francesco Biral, Enrico Bertolazzi, and Paolo Bosetti. Notes on numerical methods for solving optimal control problems. IEEJ Journal of Industry Applications, 5(2):154{166, 2016. [23] L. Blackmore, M. Ono, and B. C. Williams. Chance-constrained optimal path planning with obstacles. IEEE Transactions on Robotics, 27(6):1080{1094, Dec 2011. [24] M. Blaich, S. Weber, J. Reuter, and A. Hahn. Motion safety for vessels: An approach based on inevitable collision states. In 2015 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 1077{1082, Sept 2015. [25] J. Borenstein and Y. Koren. The vector eld histogram-fast obstacle avoidance for mobile robots. Robotics and Automation, IEEE Transactions on, 7(3):278{288, Jun 1991. [26] B. Braginsky and H. Guterman. Trajectory controller for autonomous surface vehicle under sea waves. In OCEANS 2015 - MTS/IEEE Washington, pages 1{5, Oct 2015. [27] O. Brock, J. Trinkle, and F. Ramos. Planning Long Dynamically-Feasible Maneuvers for Autonomous Vehicles, pages 214{221. MIT Press, 2009. [28] Oliver Brock and Lydia E Kavraki. Decomposition-based motion planning: A framework for real-time motion planning in high-dimensional conguration spaces. In IEEE International Conference on Robotics and Automation (ICRA), volume 2, pages 1469{1474, 2001. 178 [29] A. Bry and N. Roy. Rapidly-exploring random belief trees for motion planning under uncertainty. In 2011 IEEE International Conference on Robotics and Automation, pages 723{730, May 2011. [30] F. Caccavale and B. Siciliano. Quaternion-based kinematic control of redundant space- craft/manipulator systems. In Proceedings 2001 ICRA. IEEE International Conference on Robotics and Automation (Cat. No.01CH37164), volume 1, pages 435{440 vol.1, May 2001. [31] A. Censi, D. Calisi, A. De Luca, and G. Oriolo. A bayesian framework for optimal mo- tion planning with uncertainty. In 2008 IEEE International Conference on Robotics and Automation, pages 1798{1805, May 2008. [32] Chao Chen. Motion Planning for Nonholonomic Vehicles with Space Exploration Guided Heuristic Search. Dissertation, Technische Universitt Mnchen, Mnchen, 2016. [33] Chao Chen, Markus Rickert, and Alois Knoll. Motion planning under perception and control uncertainties with Space Exploration Guided Heuristic Search. IEEE Intelligent Vehicles Symposium, Proceedings, (Iv):712{718, 2017. [34] S. Choudhury, J. D. Gammell, T. D. Barfoot, S. S. Srinivasa, and S. Scherer. Regionally accelerated batch informed trees (rabit*): A framework to integrate local information into optimal path planning. In 2016 IEEE International Conference on Robotics and Automation (ICRA), pages 4207{4214, May 2016. [35] Benjamin J Cohen, Sachin Chitta, and Maxim Likhachev. Search-based Planning for Ma- nipulation with Motion Primitives. In IEEE International Conference on Robotics and Automation (ICRA), 2010. [36] Benjamin J Cohen, Gokul Subramanian, Sachin Chitta, and Maxim Likhachev. Planning for Manipulation with Adaptive Motion Primitives. In International Conference on Robotics and Automation, 2011. [37] Camilla Colombo, Massimiliano Vasile, and Gianmarco Radice. Optimal low-thrust tra- jectories to asteroids through an algorithm based on dierential dynamic programming. Celestial mechanics and dynamical astronomy, 105(1-3):75{112, 2009. [38] S ebastien Dalibard and Jean-Paul Laumond. Linear dimensionality reduction in random motion planning. The International Journal of Robotics Research, 30(12):1461{1476, 2011. [39] R. D'Andrea. Guest editorial: A revolution in the warehouse: A retrospective on kiva systems and the grand challenges ahead. IEEE Transactions on Automation Science and Engineering, 9(4):638{639, Oct 2012. [40] Kenny Daniel, Alex Nash, Sven Koenig, and Ariel Felner. Theta*: Any-angle path planning on grids. Journal of Articial Intelligence Research, 39:533{579, 2010. [41] Nikhil Das, Naman Gupta, and Michael Yip. Fastron: An online learning-based model and active learning strategy for proxy collision detection. In Sergey Levine, Vincent Vanhoucke, and Ken Goldberg, editors, Proceedings of the 1st Annual Conference on Robot Learning, volume 78 of Proceedings of Machine Learning Research, pages 496{504. PMLR, 13{15 Nov 2017. [42] V. Delsart and T. Fraichard. Tiji, a generic trajectory generation tool for motion plan- ning and control. In 2010 IEEE/RSJ International Conference on Intelligent Robots and Systems, pages 1439{1444, Oct 2010. 179 [43] Jory Denny, Read Sandstr om, Nicole Julian, and Nancy M Amato. A Region-Based Strategy for Collaborative Roadmap Construction, pages 125{141. Springer International Publishing, Cham, 2015. [44] Jory Denny, Read Sandstrm, and Nancy Amato. A General Region-Based Framework for Collaborative Planning, pages 563{579. 01 2018. [45] Rosen Diankov. Bispace planning: Concurrent multi-space exploration. Proceedings of Robotics: Science and Systems IV, 63, 2008. [46] E. W. Dijkstra. A note on two problems in connexion with graphs. Numer. Math., 1(1):269{ 271, December 1959. [47] Noel E. Du Toit and Joel W. Burdick. Robot motion planning in dynamic, uncertain environments. IEEE Transactions on Robotics, 28(1):101{115, 2012. [48] L. E. Dubins. On curves of minimal length with a constraint on average curvature, and with prescribed initial and terminal positions and tangents. American Journal of Mathematics, 79(3):497{516, 1957. [49] M. Elbanhawi and M. Simic. Sampling-based robot motion planning: A review. IEEE Access, 2:56{77, 2014. [50] Aim ee Vargas Estrada, Jyh-Ming Lien, and Nancy M Amato. Vizmo++: a visualization, authoring, and educational tool for motion planning. In Proceedings 2006 IEEE Interna- tional Conference on Robotics and Automation, 2006. ICRA 2006., pages 727{732. IEEE, 2006. [51] Paolo Fiorini and Zvi Shiller. Motion planning in dynamic environments using velocity obstacles. I. J. Robotic Res., 17(7):760{772, 1998. [52] T. Fossen. Guidance and Control of Ocean Vehicles. Wiley, New York, NY, 1995. [53] D. Fox, W. Burgard, and S. Thrun. The dynamic window approach to collision avoidance. Robotics Automation Magazine, IEEE, 4(1):23{33, Mar 1997. [54] T. Fraichard. A short paper about motion safety. In Robotics and Automation, 2007 IEEE International Conference on, pages 1140{1145, April 2007. [55] T. Fraichard and H. Asama. Inevitable collision states. a step towards safer robots? In Intelligent Robots and Systems, 2003. (IROS 2003). Proceedings. 2003 IEEE/RSJ Interna- tional Conference on, volume 1, pages 388{393 vol.1, Oct 2003. [56] C. Fulgenzi, A. Spalanzani, and C. Laugier. Dynamic obstacle avoidance in uncertain environment combining PVOs and occupancy grid. In IEEE International Conference on Robotics and Automation, 2007, pages 1610{1616. IEEE. [57] C. Fulgenzi, C. Tay, A. Spalanzani, and C. Laugier. Probabilistic navigation in dynamic en- vironment using rapidly-exploring random trees and gaussian processes. In 2008 IEEE/RSJ International Conference on Intelligent Robots and Systems, pages 1056{1062, Sept 2008. [58] J. D. Gammell, T. D. Barfoot, and S. S. Srinivasa. Informed sampling for asymptotically optimal path planning. IEEE Transactions on Robotics, pages 1{19, 2018. [59] Jonathan D. Gammell, Siddhartha S. Srinivasa, and Timothy D. Barfoot. Batch Informed Trees (BIT): Sampling-based optimal planning via the heuristically guided search of implicit random geometric graphs. In IEEE International Conference on Robotics and Automation, volume 2015-June, pages 3067{3074, 2015. 180 [60] Kalin Gochev, Benjamin Cohen, Jonathan Butzke, Alla Safonova, and Maxim Likhachev. Path Planning with Adaptive Dimensionality. In The Symposium on Combinatorial Search (SoCS), 2011. [61] Kalin Gochev, Venkatraman Narayanan, Benjamin Cohen, Alla Safonova, and Maxim Likhachev. Motion Planning for Robotic Manipulators with Independent Wrist Joints. In International Conference on Robotics and Automation, 2014. [62] Kalin Gochev, Alla Safonova, and Maxim Likhachev. Planning with adaptive dimensionality for mobile manipulation. Proceedings - IEEE International Conference on Robotics and Automation, pages 2944{2951, 2012. [63] Palash Goyal, Sujit Rokka Chhetri, and Arquimedes Canedo. dyngraph2vec: Capturing network dynamics using dynamic graph representation learning, 2018. [64] Palash Goyal and Emilio Ferrara. Graph embedding techniques, applications, and perfor- mance: A survey. Knowledge-Based Systems, 151:78{94, 2018. [65] Palash Goyal, Nitin Kamra, Xinran He, and Yan Liu. Dyngem: Deep embedding method for dynamic graphs. arXiv preprint arXiv:1805.11273, 2018. [66] M. Greytak and F. Hover. Motion planning with an analytic risk cost for holonomic vehicles. In Proceedings of the 48h IEEE Conference on Decision and Control (CDC) held jointly with 2009 28th Chinese Control Conference, pages 5655{5660, Dec 2009. [67] Eric Guizzo. Three engineers, hundreds of robots, one warehouse. IEEE spectrum, 45(7):26{ 34, 2008. [68] Li Han. A kinematics-based probabilistic roadmap method for closed chain systems. In International Workshop on Algorithmic Foundations of Robotics (WAFR. Citeseer, 2000. [69] P. E. Hart, N. J. Nilsson, and B. Raphael. A formal basis for the heuristic determination of minimum cost paths. IEEE Transactions on Systems Science and Cybernetics, 4(2):100{ 107, July 1968. [70] K. Hauser and V. Ng-Thow-Hing. Fast smoothing of manipulator trajectories using optimal bounded-acceleration shortcuts. In 2010 IEEE International Conference on Robotics and Automation, pages 2493{2498, May 2010. [71] T. Hilmer and E. Thornhill. Observations of predictive skill for real-time deterministic sea waves from the WaMoS II. In OCEANS 2015 - MTS/IEEE Washington, pages 1{7, Oct 2015. [72] Thomas M Howard and Alonzo Kelly. Optimal rough terrain trajectory generation for wheeled mobile robots. The International Journal of Robotics Research, 26(2):141{166, 2007. [73] T.M. Howard and A. Kelly. Trajectory and spline generation for all-wheel steering mobile robots. In Intelligent Robots and Systems, 2006 IEEE/RSJ International Conference on, pages 4827{4832, Oct 2006. [74] Jinwook Huh and Daniel D Lee. Ecient sampling withq-learning to guide rapidly exploring random trees. IEEE Robotics and Automation Letters, 3(4):3868{3875, 2018. [75] Karl Iagnemma, Shingo Shimoda, and Zvi Shiller. Near-optimal navigation of high speed mobile robots on uneven terrain. In 2008 IEEE/RSJ International Conference on Intelligent Robots and Systems, pages 4098{4103. IEEE, 2008. 181 [76] Brian Ichter and Marco Pavone. Robot motion planning in learned latent spaces. CoRR, abs/1807.10366, 2018. [77] Fahad Islam, Oren Salzman, and Maxim Likhachev. Online, interactive user guidance for high-dimensional, constrained motion planning. In IJCAI, 2018. [78] S. Guy D. Manocha J. van den Berg, J. Snape. Reciprocal collision avoidance with acceleration-velocity obstacles. In IEEE Int. Conf. on Robotics and Automation, 2011. [79] Lucas Janson, Edward Schmerling, and Marco Pavone. Monte carlo motion planning for robot trajectory optimization under uncertainty. CoRR, abs/1504.08053, 2015. [80] Derek Jung and Kamal K. Gupta. Octree-based hierarchical distance maps for collision detection. Proceedings of IEEE International Conference on Robotics and Automation, 1:454{459 vol.1, 1996. [81] A. M. Kabir, J. D. Langsfeld, S. Shriyam, V. S. Rachakonda, C. Zhuang, K. N. Kaipa, J. Marvel, and S. K. Gupta. Planning algorithms for multi-setup multi-pass robotic clean- ing with oscillatory moving tools. In 2016 IEEE International Conference on Automation Science and Engineering (CASE), pages 751{757, Fort Worth, Texas, USA, Aug 2016. [82] A. M. Kabir, B. C. Shah, and S. K. Gupta. Trajectory planning for manipulators operating in conned workspaces. In 2018 IEEE International Conference on Automation Science and Engineering (CASE), Munich, Germany, Aug 2018. [83] Leslie Pack Kaelbling, Michael L. Littman, and Anthony R. Cassandra. Planning and acting in partially observable stochastic domains. Articial Intelligence, 101(1):99 { 134, 1998. [84] Mrinal Kalakrishnan, Sachin Chitta, Evangelos Theodorou, Peter Pastor, and Stefan Schaal. STOMP: Stochastic Trajectory Optimization for Motion Planning. In International Con- ference on Robotics and Automation, 2011. [85] H. Kang and F. C. Park. Conguration space learning for constrained manipulation tasks us- ing gaussian processes. In 2014 IEEE-RAS International Conference on Humanoid Robots, pages 1088{1093, Nov 2014. [86] Sertac Karaman, Matthew Walter, Alejandro Perez, Emilio Frazzoli, and Seth Teller. Any- time Motion Planning Using the Rrt. In International Conference on Robotics and Automa- tion, pages 1478{1483, 2011. [87] Lydia E. Kavraki, Petr Svec, Jean-Paul Laumond, and Mark H Overmars. Probabilistic roadmaps for path planning in high-dimensional conguration space. IEEE Transactions on Robotics, 1996. [88] Alonzo Kelly and Bryan Nagy. Reactive nonholonomic trajectory generation via parametric optimal control. The International Journal of Robotics Research, 22(7-8):583{601, 2003. [89] Oussama Khatib. Real-Time Obstacle Avoidance for Manipulators and Mobile Robots. The International Journal of Robotics Research, 1986. [90] Scott Kiesel, Tianyi Gu, and Wheeler Ruml. An eort bias for sampling-based motion plan- ning. In Intelligent Robots and Systems (IROS), 2017 IEEE/RSJ International Conference on, pages 2864{2871. IEEE, 2017. 182 [91] Donghyuk Kim, Youngsun Kwon, and Sung-Eui Yoon. Dancing prm: Simultaneous plan- ning of sampling and optimization with conguration free space approximation. In IEEE International Conference on Robotics and Automation (ICRA). IEEE Robotics and Au- tomation Society, 2018. [92] Kihwan Kim, Dongryeol Lee, and I. Essa. Gaussian process regression ow for analysis of motion trajectories. In 2011 International Conference on Computer Vision, pages 1164{ 1171, Nov 2011. [93] Wilhelm B Klinger, Ivan R Bertaska, Karl D von Ellenrieder, and Manhar R Dhanak. Con- trol of an unmanned surface vehicle with uncertain displacement and drag. IEEE Journal of Oceanic Engineering, 42(2):458{476, 2016. [94] Ross A Knepper, Siddhartha S Srinivasa, and Matthew T Mason. Toward a deeper under- standing of motion alternatives via an equivalence relation on local paths. The International Journal of Robotics Research, 31(2):167{186, 2012. [95] James J Kuner and Steven M LaValle. Rrt-connect: An ecient approach to single- query path planning. In Robotics and Automation, 2000. Proceedings. ICRA'00. IEEE International Conference on, volume 2, pages 995{1001. IEEE, 2000. [96] James J Kuner and Steven M LaValle. RRT-connect: An ecient approach to single-query path planning. In IEEE International Conference on Robotics and Automation (ICRA)., volume 2, pages 995{1001. IEEE, 2000. [97] Yen-Ling Kuo, Andrei Barbu, and Boris Katz. Deep sequential models for sampling-based planning. In 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 6490{6497. IEEE, 2018. [98] Hanna Kurniawati and David Hsu. Workspace-based connectivity oracle: An adaptive sampling strategy for prm planning. Springer Tracts in Advanced Robotics, 47:35{51, 01 2006. [99] Aleksandr Kushleyev and M. Likhachev. Time-bounded lattice for ecient planning in dynamic environments. In Robotics and Automation, 2009. ICRA '09. IEEE International Conference on, pages 1662{1668, May 2009. [100] J. G. Kusters, K. L. Cockrell, B. S. H. Connell, J. P. Rudzinsky, and V. J. Vinciullo. Futurewaves: A real-time ship motion forecasting system employing advanced wave-sensing radar. In OCEANS 2016 MTS/IEEE Monterey, pages 1{9, Sept 2016. [101] B. Lacevic, D. Osmankovic, and A. Ademovic. Burs of free c-space: A novel structure for path planning. pages 70{76, May 2016. [102] B. Lacevic, D. Osmankovic, and A. Ademovic. Path planning using adaptive burs of free conguration space. In 2017 XXVI International Conference on Information, Communica- tion and Automation Technologies (ICAT), pages 1{6, Oct 2017. [103] Bakir Lacevic, Dinko Osmankovic, and Adnan Ademovic. Burs of free c-space: a novel structure for path planning. In IEEE International Conference on Robotics and Automation (ICRA), pages 70{76, 2016. [104] A. Lambert, D. Gruyer, and G. Saint Pierre. A fast monte carlo algorithm for collision probability estimation. In 2008 10th International Conference on Control, Automation, Robotics and Vision, pages 406{411, Dec 2008. 183 [105] Philipp Last, Christian Bahlke, Martin Hering-Bertram, and Lars Linsen. Comprehensive analysis of automatic identication system (ais) data in regard to vessel movement predic- tion. Journal of Navigation, 67(5):791809, 2014. [106] S. M. LaValle. Planning algorithms. Cambridge University Press, Cambridge, U.K., 2006. Available at http://planning.cs.uiuc.edu. [107] Steven M LaValle. Rapidly-exploring random trees: A new tool for path planning. 1998. [108] Steven M LaValle. Rapidly-exploring random trees: A new tool for path planning. 1998. [109] Steven M LaValle. From dynamic programming to rrts: Algorithmic design of feasible trajectories. In Control Problems in Robotics, pages 19{37. Springer, 2003. [110] Steven M. LaValle, James J. Kuner, and Jr. Rapidly-exploring random trees: Progress and prospects, 2000. [111] Duong Le and Erion Plaku. Guiding sampling-based tree search for motion planning with dynamics via probabilistic roadmap abstractions. In IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 212{217, 2014. [112] J. Lee and S. Yoon. Prot: Productive regions oriented task space path planning for hyper- redundant manipulators. In 2014 IEEE International Conference on Robotics and Automa- tion (ICRA), pages 6491{6498, May 2014. [113] S Lef evre. A survey on motion prediction and risk assessment for intelligent vehicles. Robomech . . . , pages 1{14, 2014. [114] Anastasios M Lekkas. Guidance and Path-Planning Systems for Autonomous Vehicles. Number April. 2014. [115] D. Lenz, M. Rickert, and A. Knoll. Heuristic search in belief space for motion planning under uncertainties. In 2015 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 2659{2665, Sept 2015. [116] Lihong Li, Thomas J. Walsh, and Michael L. Littman. Towards a unied theory of state abstraction for mdps. In ISAIM, 2006. [117] Xiaohui Li, Zhenping Sun, Qi Zhu, and Daxue Liu. A unied approach to local trajec- tory planning and control for autonomous driving along a reference path. In 2014 IEEE International Conference on Mechatronics and Automation, pages 1716{1721. IEEE, 2014. [118] Maxim Likhachev and Dave Ferguson. Planning long dynamically feasible maneuvers for autonomous vehicles. Int. J. Rob. Res., 28(8):933{945, August 2009. [119] Maxim Likhachev, David I Ferguson, Georey J Gordon, Anthony Stentz, and Sebastian Thrun. Anytime dynamic A*: An anytime, replanning algorithm. In ICAPS, pages 262{ 271, 2005. [120] Lantao Liu and Gaurav S Sukhatme. A Solution to Time-Varying Markov Decision Pro- cesses. 2018. [121] S. Liu, M. Watterson, K. Mohta, K. Sun, S. Bhattacharya, C. J. Taylor, and V. Kumar. Planning dynamically feasible trajectories for quadrotors using safe ight corridors in 3-d complex environments. IEEE Robotics and Automation Letters, 2(3):1688{1695, July 2017. 184 [122] Y. Liu, R. Song, and R. Bucknall. A practical path planning and navigation algorithm for an unmanned surface vehicle using the fast marching algorithm. In OCEANS 2015 - Genova, pages 1{7, May 2015. [123] Marco Pavone Lucas Janson, Edward Schmerling, Ashley Clark. Fast Marching Tree: a Fast Marching Sampling-Based Method for Optimal Motion Planning in Many Dimensions. International Journal of Robotics Research, 34(7):883{921, 2015. [124] S. Mahdavi, S. Khoshraftar, and A. An. dynnode2vec: Scalable dynamic network embed- ding. In 2018 IEEE International Conference on Big Data (Big Data), pages 3762{3765, Dec 2018. [125] Rishi K. Malhan, Ariyan M. Kabir, Brual C. Shah, and Satyandra K. Gupta. Identifying feasible workpiece placement with respect to redundant manipulator for complex manu- facturing tasks. In IEEE International Conference on Robotics and Automation (ICRA), Montreal, Canada, May 2019. [126] James D Marble and Kostas E Bekris. Asymptotically near-optimal is good enough for motion planning. In Robotics Research, pages 419{436. Springer, 2017. [127] L. Martinez-Gomez and T. Fraichard. Collision avoidance in dynamic environments: An ics-based solution and its comparative evaluation. In 2009 IEEE International Conference on Robotics and Automation, pages 100{105, May 2009. [128] Kayla Matthews. 6 industries that have been improved by robotic automation. [129] Troy McMahon, Shawna Thomas, and Nancy M Amato. Sampling-based motion planning with reachable volumes for high-degree-of-freedom manipulators. The International Journal of Robotics Research, 37(7):779{817, 2018. [130] Siddhartha Mehta, Cuong Ton, Michael Mccourt, Zhen Kan, E.A. Doucette, and W Curtis. Human-assisted rrt for path planning in urban environments. pages 941{946, 10 2015. [131] D. Mellinger and V. Kumar. Control and planning for vehicles with uncertainty in dynamics. In 2010 IEEE International Conference on Robotics and Automation, pages 960{965, May 2010. [132] Arjun Menon, Benjamin Cohen, and Maxim Likhachev. Motion Planning for Smooth Pickup of Moving Objects. In IEEE International Conference on Robotics and Automation (ICRA), 2014. [133] George Mesesan, Mximo Alejandro Roa Garzon, Esra er, and Matthias Altho. Hierarchical path planner using workspace decomposition and parallel task-space rrts. 10 2018. [134] J. Minguez and L. Montano. Nearness diagram navigation (nd): a new real time collision avoidance approach. In Intelligent Robots and Systems, 2000. (IROS 2000). Proceedings. 2000 IEEE/RSJ International Conference on, volume 3, pages 2094{2100 vol.3, 2000. [135] Wasif Naeem, George W. Irwin, and Aolei Yang. Colregs-based collision avoidance strategies for unmanned surface vehicles. Mechatronics, 22(6):669 { 678, 2012. [136] Jauwairia Nasir, Fahad Islam, Usman Malik, Yasar Ayaz, Osman Hasan, Mushtaq Khan, and Mannan Saeed Muhammad. Rrt*-smart: A rapid convergence implementation of rrt* regular paper. International Journal of Advanced Robotic Systems, 2013. 185 [137] Teck Chew Ng, J. Ibanez-Guzman, Jian Shen, Zhiming Gong, Han Wang, and Chen Cheng. Vehicle following with obstacle avoidance capabilities in natural environments. In Robotics and Automation, 2004. Proceedings. ICRA '04. 2004 IEEE International Conference on, volume 5, pages 4283{4288 Vol.5, April 2004. [138] Iram Noreen, Amna Khan, and Zulqar Habib. Optimal path planning using rrt* based approaches: a survey and future directions. Int. J. Adv. Comput. Sci. Appl, 7:97{107, 2016. [139] Dinko Osmankovic and Bakir Lacevic. Rapidly exploring bur trees for optimal motion planning. pages 002085{002090, 2016. [140] M. Otte, W. Silva, and E. Frew. Any-time path-planning: Time-varying wind eld + moving obstacles. In 2016 IEEE International Conference on Robotics and Automation (ICRA), pages 2575{2582, May 2016. [141] J. Pan, L. Zhang, and D. Manocha. Retraction-based rrt planner for articulated models. In 2010 IEEE International Conference on Robotics and Automation, pages 2529{2536, May 2010. [142] Zherong Pan, Chonhyon Park, and Dinesh Manocha. Robot Motion Planning for Pouring Liquids. International Conference on Automated Planning and Scheduling, (Icaps):518{526, 2016. [143] J. Park and J. Kim. Predictive evaluation of ship collision risk using the concept of proba- bility ow. IEEE Journal of Oceanic Engineering, 42(4):836{845, Oct 2017. [144] Nicola Pedrocchi, Federico Vicentini, Malosio Matteo, and Lorenzo Molinari Tosatti. Safe human-robot cooperation in an industrial environment. International Journal of Advanced Robotic Systems, 10(1):27, 2013. [145] Thierry Peynot, Sin-Ting Lui, Rowan McAllister, Robert Fitch, and Salah Sukkarieh. Learned stochastic mobility prediction for planning with control uncertainty on unstruc- tured terrain. J. Field Robotics, 31:969{995, 2014. [146] Mike Phillips, Benjamin Cohen, Sachin Chitta, and Maxim Likhachev. E-graphs: Boot- strapping planning with experience graphs. In In Proceedings of the Robotics: Science and Systems Conference (RSS 2012), July 2012. [147] Vinay Pilania and Kamal Gupta. A hierarchical and adaptive mobile manipulator planner. In IEEE-RAS International Conference on Humanoid Robots (Humanoids), pages 45{51. IEEE, 2014. [148] Vinay Pilania and Kamal Gupta. A hierarchical and adaptive mobile manipulator planner with base pose uncertainty. Autonomous Robots, 39(1):65{85, 2015. [149] Vinay Pilania and Kamal Gupta. Mobile manipulator planning under uncertainty in un- known environments. The International Journal of Robotics Research, 37(2-3):316{339, 2018. [150] Erion Plaku, Lydia E Kavraki, and Moshe Y Vardi. Discrete Search Leading Continuous Exploration for Kinodynamic Motion Planning. In Robotics: Science and Systems, 2007. [151] Oliver Purwin and Raaello DAndrea. Trajectory generation and control for four wheeled omnidirectional vehicles. Robotics and Autonomous Systems, 54(1):13{22, 2006. [152] Sean Quinlan. Real-time modication of collision-free paths. Number 1537. Stanford Uni- versity Stanford, 1994. 186 [153] Ana Huam an Quispe and Mike Stilman. Deterministic motion planning for redundant robots along end-eector paths. In Humanoid Robots (Humanoids), 2012 12th IEEE-RAS International Conference on, pages 785{790. IEEE, 2012. [154] Ahmed H. Qureshi and Michael C. Yip. Deeply informed neural sampling for robot motion planning. 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 6582{6588, 2018. [155] Mohammadhussein Raeisakhaei, Amirhossein Tamjidi, Suman Chakravorty, and P. R. Ku- mar. Feedback motion planning under non-gaussian uncertainty and non-convex state con- straints. CoRR, abs/1511.05186, 2015. [156] Pradeep Rajendran, Brual C. Shah, and Satyandra K. Gupta. Dynamics-Aware Reactive Planning for Unmanned Ground Vehicles to Avoid Collisions with Dynamic Obstacles on Uneven Terrains. In Workshop on Planning and Robotics (PlanRob), held at International Conference on Automated Planning and Scheduling (ICAPS' 17), Pittsburgh, PA, June 19 - 20, 2017, 2017. [157] J. H. Reif. Complexity of the mover's problem and generalizations. In 20th Annual Sympo- sium on Foundations of Computer Science (sfcs 1979), pages 421{427, Oct 1979. [158] M. Rickert, A. Sieverling, and O. Brock. Balancing exploration and exploitation in sampling- based motion planning. IEEE Transactions on Robotics, 30(6):1305{1317, Dec 2014. [159] Markus Rickert, Oliver Brock, and Alois Knoll. Balancing exploration and exploitation in motion planning. In IEEE International Conference on Robotics and Automation (ICRA)., pages 2812{2817, 2008. [160] Samuel Rodriguez, Shawna Thomas, Roger Pearce, and Nancy M Amato. Resampl: A region-sensitive adaptive motion planner. In Algorithmic Foundation of Robotics VII, pages 285{300. Springer, 2008. [161] Nicholas Roy and Sebastian Thrun. Coastal navigation with mobile robots. In Advances in Neural Information Processing Systems, pages 1043{1049, 2000. [162] Gildardo S anchez and Jean-Claude Latombe. On delaying collision checking in prm plan- ning: Application to multi-robot coordination. The International Journal of Robotics Re- search, 21(1):5{26, 2002. [163] Edoardo I. Sarda, Huajin Qu, Ivan R. Bertaska, and Karl D. von Ellenrieder. Station- keeping control of an unmanned surface vehicle exposed to current and wind disturbances. CoRR, abs/1702.04941, 2017. [164] Edward Schmerling and Marco Pavone. Evaluating trajectory collision probability through adaptive importance sampling for safe motion planning. CoRR, abs/1609.05399, 2016. [165] Jonathan Scholz and Mike Stilman. Combining motion planning and optimization for ex- ible robot manipulation. 2010 10th IEEE-RAS International Conference on Humanoid Robots, Humanoids 2010, 12 2010. [166] Brual Shah and Satyandra K. Gupta. Speeding up A* search on visibility graphs dened over quadtrees to enable long distance path planning for unmanned surface vehicles. In Internation Conference on Automated Planning and Scheduling (ICAPS' 16), London, UK, June 12 - 17, 2016, 2016. 187 [167] Brual C Shah, Petr Svec, Ivan R Bertaska, Armando J Sinisterra, Wilhelm Klinger, Karl von Ellenrieder, Manhar Dhanak, and Satyandra K Gupta. Resolution-adaptive risk-aware trajectory planning for surface vehicles operating in congested civilian trac. Autonomous Robots, pages 1{25, 2015. [168] Brual C. Shah, Petr Svec, Ivan R. Bertaska, Armando J. Sinisterra, Wilhelm Klinger, Karl von Ellenrieder, Manhar Dhanak, and Satyandra K. Gupta. Resolution-adaptive risk-aware trajectory planning for surface vehicles operating in congested civilian trac. Autonomous Robots, 40(7):1139{1163, Oct 2016. [169] M. Sheckells, T. M. Caldwell, and M. Kobilarov. Fast approximate path coordinate motion primitives for autonomous driving. In 2017 IEEE 56th Annual Conference on Decision and Control (CDC), pages 837{842, Dec 2017. [170] K. Shi, J. Denny, and N. M. Amato. Spark prm: Using rrts within prms to eciently explore narrow passages. In 2014 IEEE International Conference on Robotics and Automation (ICRA), pages 4659{4666, May 2014. [171] Z. Shiller, F. Large, and S. Sekhavat. Motion planning in dynamic environments: obstacles moving along arbitrary trajectories. In Robotics and Automation, 2001. Proceedings 2001 ICRA. IEEE International Conference on, volume 4, pages 3716{3721 vol.4, 2001. [172] S. Shimoda, Y. Kuroda, and K. Iagnemma. Potential eld navigation of high speed un- manned ground vehicles on uneven terrain. In Proceedings of the 2005 IEEE International Conference on Robotics and Automation, pages 2828{2833, April 2005. [173] A. Shkolnik and R. Tedrake. Path planning in 1000+ dimensions using a task-space voronoi bias. In 2009 IEEE International Conference on Robotics and Automation, pages 2061{ 2067, May 2009. [174] Alexander Shkolnik and Russ Tedrake. Sample-based planning with volumes in congura- tion space. arXiv preprint arXiv:1109.3145, 2011. [175] Shaurya Shriyam and Satyandra K. Gupta. Task assignment and scheduling for mobile robot teams. In ASME International Design Engineering Technical Conferences Computers and Information in Engineering Conference (IDETC/CIE), Quebec City, Canada, Aug 2018. [176] David Silver and Joel Veness. Monte-carlo planning in large pomdps. In Advances in neural information processing systems, pages 2164{2172, 2010. [177] Rui Song, Yuanchang Liu, and Richard Bucknall. A multi-layered fast marching method for unmanned surface vehicle path planning in a time-variant maritime environment. Ocean Engineering, 129:301 { 317, 2017. [178] Matthew Spenko, Yoji Kuroda, Steven Dubowsky, and Karl Iagnemma. Hazard avoidance for high-speed mobile robots in rough terrain. Journal of Field Robotics, 23(5):311{331, 2006. [179] Matthew Spenko, Yoji Kuroda, Steven Dubowsky, and Karl Iagnemma. Hazard avoidance for high-speed mobile robots in rough terrain. Journal of Field Robotics, 23(5):311{331, 2006. [180] Stephen B. Stancli, John M. Dolan, and Ashitey Trebi-Ollennu. Towards a predictive model of mobile robot reliability. Technical Report CMU-RI-TR-05-38, Pittsburgh, PA, August 2005. 188 [181] M. Stilman. Global manipulation planning in robot joint space with task constraints. IEEE Transactions on Robotics, 26(3):576{584, June 2010. [182] Mike Stilman. Task constrained motion planning in robot joint space. IEEE International Conference on Intelligent Robots and Systems (ICRA), pages 3074{3081, 2007. [183] Daniel Strawser and Brian Williams. Approximate Branch and Bound for Fast , Risk-Bound Stochastic Path Planning. pages 7047{7054, 2018. [184] P. Svec, M. Schwartz, A. Thakur, and S. K. Gupta. Trajectory planning with look-ahead for unmanned sea surface vehicles to handle environmental disturbances. In 2011 IEEE/RSJ International Conference on Intelligent Robots and Systems, pages 1154{1159, Sept 2011. [185] P. Svec, B. C. Shah, I. R. Bertaska, J. Alvarez, A. J. Sinisterra, K. von Ellenrieder, M. Dhanak, and S. K. Gupta. Dynamics-aware target following for an autonomous sur- face vehicle operating under colregs in civilian trac. In 2013 IEEE/RSJ International Conference on Intelligent Robots and Systems, pages 3871{3878, Nov 2013. [186] Petr Svec, Brual C. Shah, Ivan R. Bertaska, Wilhelm Klinger, Armando J. Sinisterra, Karl von Ellenrieder, Manhar Dhanak, and Satyandra K. Gupta. Adaptive sampling based colregs-compliant obstacle avoidance for autonomous surface vehicles. 2014. [187] Adnan Tahirovic and Mina Ferizbegovic. Rapidly-exploring Random Vines ( RRV ) for Motion Planning in Conguration Spaces with Narrow Passages. pages 7055{7062, 2018. [188] Michel Ta x, David Flavign e, and Etienne Ferr e. Human interaction with motion planning algorithm. Journal of Intelligent & Robotic Systems, 67(3):285{306, Sep 2012. [189] CheeKuang Tam and Richard Bucknall. Cooperative path planning algorithm for marine surface vessels. Ocean Engineering, 57:25 { 33, 2013. [190] Xinyu Tang, Jyh-Ming Lien, NM Amato, et al. An obstacle-based rapidly-exploring random tree. In IEEE International Conference on Robotics and Automation (ICRA)., pages 895{ 900. Citeseer, 2006. [191] Shantanu Thakar, Liwei Fang, Brual Shah, and Satyandra Gupta. Towards time-optimal trajectory planning for pick-and-transport operation with a mobile manipulator. pages 981{987, 08 2018. [192] Shantanu Thakar, Ariyan Kabir, Prahar Bhatt, Rishi Malhan, Pradeep Rajendran, Brual Shah, and Satyandra K. Gupta. Task assignment and motion planning for bi-manual mobile manipulation. In IEEE International Conference on Automation Science and Engineering (CASE), Vancouver, Canada, August 2019. [193] Shantanu Thakar, Pradeep Rajendran, Vivek Annem, Ariyan Kabir, and Satyandra K. Gupta. Accounting for part pose estimation uncertainties during trajectory generation for part pick-up using mobile manipulators. In IEEE International Conference on Robotics and Automation (ICRA), Montreal, Canada, May 2019. [194] Atul Thakur, Petr Svec, and Satyandra K. Gupta. GPU based generation of state transition models using simulations for unmanned surface vehicle trajectory planning. Robotics and Autonomous Systems, 60(12):1457 { 1471, 2012. [195] M. Tiger and F. Heintz. Online sparse gaussian process regression for trajectory modeling. In 2015 18th International Conference on Information Fusion (Fusion), pages 782{791, July 2015. 189 [196] N. E. Du Toit and J. W. Burdick. Probabilistic collision checking with chance constraints. IEEE Transactions on Robotics, 27(4):809{815, Aug 2011. [197] Rakshit Trivedi, Mehrdad Farajtbar, Prasenjeet Biswal, and Hongyuan Zha. Representation learning over dynamic graphs. arXiv preprint arXiv:1803.04051, 2018. [198] E. Tu, G. Zhang, L. Rachmawati, E. Rajabally, and G. B. Huang. Exploiting ais data for intelligent maritime navigation: A comprehensive survey from data to methodology. IEEE Transactions on Intelligent Transportation Systems, PP(99):1{24, 2017. [199] V. Turri, A. Carvalho, H. E. Tseng, K. H. Johansson, and F. Borrelli. Linear model pre- dictive control for lane keeping and obstacle avoidance on low curvature roads. In 16th International IEEE Conference on Intelligent Transportation Systems (ITSC 2013), pages 378{383, Oct 2013. [200] V. V. Unhelkar, J. Perez, J. C. Boerkoel, J. Bix, S. Bartscher, and J. A. Shah. Towards control and sensing for an autonomous mobile robotic assistant navigating assembly lines. In 2014 IEEE International Conference on Robotics and Automation (ICRA), pages 4161{ 4167, May 2014. [201] J. P. van den Berg and M. H. Overmars. Using workspace information as a guide to non- uniform sampling in probabilistic roadmap planners. In IEEE International Conference on Robotics and Automation, 2004. Proceedings. ICRA '04. 2004, volume 1, pages 453{460 Vol.1, April 2004. [202] Jur Van Den Berg, Pieter Abbeel, and Ken Goldberg. Lqg-mp: Optimized path planning for robots with motion uncertainty and imperfect state information. The International Journal of Robotics Research, 30(7):895{913, 2011. [203] Jur van den Berg, Sachin Patil, and Ron Alterovitz. Motion Planning Under Uncertainty Using Dierential Dynamic Programming in Belief Space, volume 100, pages 473{490. 01 2017. [204] C W Warren, J C Danos, and B W Mooring. An Approach To Manipulator Path Planning. Van Nostrand Reinhold. Schutz, B. Geometrical Methods of Mathematical Physics, 1981. [205] Mike Vande Weghe, Dave Ferguson, and Siddhartha S Srinivasa. Randomized path planning for redundant manipulators without inverse kinematics. In Humanoid Robots, 2007 7th IEEE-RAS International Conference on, pages 477{482. IEEE, 2007. [206] M. Werling, J. Ziegler, S. Kammel, and S. Thrun. Optimal trajectory generation for dynamic street scenarios in a frenet frame. In 2010 IEEE International Conference on Robotics and Automation, pages 987{993, May 2010. [207] Moritz Werling, S oren Kammel, Julius Ziegler, and Lutz Gr oll. Optimal trajectories for time-critical street scenarios using discretized terminal manifolds. The International Journal of Robotics Research, 31(3):346{359, 2012. [208] D. Wilkie, J. van den Berg, and D. Manocha. Generalized velocity obstacles. In Intelligent Robots and Systems, 2009. IROS 2009. IEEE/RSJ International Conference on, pages 5573{ 5578, Oct 2009. [209] D. Wilkie, J. van den Berg, and D. Manocha. Generalized velocity obstacles. In Intelligent Robots and Systems, 2009. IROS 2009. IEEE/RSJ International Conference on, pages 5573{ 5578, Oct 2009. 190 [210] Wenda Xu, Junqing Wei, J. M. Dolan, Huijing Zhao, and Hongbin Zha. A real-time motion planner with trajectory optimization for autonomous vehicles. In 2012 IEEE International Conference on Robotics and Automation, pages 2061{2067, May 2012. [211] Zhenwang Yao and Kamal Gupta. Path planning with general end-eector constraints: Using task space to guide conguration space search. IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 2211{2216, 2005. [212] Nan Ye, Adhiraj Somani, David Hsu, and Wee Sun Lee. DESPOT: online POMDP planning with regularization. CoRR, abs/1609.03250, 2016. [213] Anna Yershova, L eonard Jaillet, Thierry Sim eon, and Steven M LaValle. Dynamic-domain rrts: Ecient exploration by controlling the sampling domain. In IEEE International Con- ference on Robotics and Automation (ICRA). Proceedings of the 2005, pages 3856{3861, 2005. [214] D. Yi, M. A. Goodrich, and K. D. Seppi. Homotopy-aware rrt*: Toward human-robot topological path-planning. In 2016 11th ACM/IEEE International Conference on Human- Robot Interaction (HRI), pages 279{286, March 2016. [215] Byunghyun Yoo and Jinwhan Kim. Path optimization for marine vehicles in ocean currents using reinforcement learning. Journal of Marine Science and Technology, 21(2):334{343, Jun 2016. [216] Xiaowen Yu, Yu Zhao, Cong Wang, and Masayoshi Tomizuka. Trajectory planning for robot manipulators considering kinematic constraints using probabilistic roadmap approach. Journal of Dynamic Systems, Measurement, and Control, 139(2):021001, 2017. [217] Fang Yuan, Jia-Hong Liang, Yue-Wen Fu, Han-Cheng Xu, and Ke Ma. A hybrid sampling strategy with optimized probabilistic roadmap method. In 12th International Conference on Fuzzy Systems and Knowledge Discovery (FSKD), pages 2298{2302, 2015. [218] Liangjun Zhang and Dinesh Manocha. An ecient retraction-based rrt planner. In IEEE International Conference on Robotics and Automation (ICRA), pages 3743{3750, 2008. [219] M. Zucker, J. Kuner, and J. A. Bagnell. Adaptive workspace biasing for sampling-based planners. In 2008 IEEE International Conference on Robotics and Automation, pages 3757{ 3762, May 2008. [220] Matt Zucker, James Kuner, and J Andrew Bagnell. Adaptive workspace biasing for sampling-based planners. In IEEE International Conference on Robotics and Automation (ICRA), pages 3757{3762, 2008. [221] Matt Zucker, Nathan Ratli, Anca D Dragan, Mihail Pivtoraiko, Matthew Klingensmith, Christopher M Dellin, J Andrew Bagnell, Siddhartha S Srinivasa, Pittsburgh Pa, and Philadelphia Pa. CHOMP: Covariant Hamiltonian Optimization for Motion Planning. The International Journal of Robotics Research, 2012. 191
Abstract (if available)
Abstract
Advances in sensing and computing hardware have physically equipped robots to operate in complex environments. In many real-world settings, we desire robots to operate at a high-level of autonomy to reduce operating costs and manpower requirements. A high-level of autonomy can be achieved only when robots are able to plan missions and tasks themselves. Trajectory planning is a fundamental building block required to support high-level decision making in robots. ❧ Trajectory planning for autonomous robots operating in complex environments is a challenging problem. The complexity of trajectory planning problems stems from the dimensionality of robot's state space, the complexity of the robot kinematic and dynamic model, the nature of environmental constraints (e.g., obstacles), task constraints (e.g., rules), the optimization objective function, and the planning-time requirements needed for deployment in the real world. Depending on the complexity, these problems can be solved by existing methods to produce feasible trajectories. But, in many practical applications (e.g., automated package delivery), computing a feasible trajectory alone is not enough. The quality of the computed trajectory is also important. However, in many cases, computing truly optimal trajectories is computationally intensive and thus, very time-consuming. As a result, existing methods do not satisfy planning-time constraints required by the application while maintaining optimality. We need a method that produces high-quality trajectories and at the same time produce those trajectories quickly. Anytime methods handle exactly this problem. However, these methods produce high-quality trajectories quickly only when good heuristics are used. This thesis focuses on heuristic techniques for anytime algorithms that speed up trajectory planning for autonomous robots in complex environments. ❧ Trajectory planning for Unmanned Surface Vehicles (USVs) is one representative example that includes many of the complicating factors such as uncertainty in motion, uncertainty in perception, spatio-temporally varying environment (e.g., waves and traffic vessels), and fast planning-time requirements. Accounting for all of these factors is crucial for building reliable autonomy solutions. A wave-aware deliberative planner that avoids collisions with traffic vessels is presented. It also opportunistically traverses wavefields generated by these vessels while minimizing the execution time and the risk of failure. The planner performs a search over a 4D pose-time lattice to generate a collision-free, minimum-risk sequence of motion goals. It addresses motion uncertainty, failure risk and perception uncertainty through a parametric model. Heuristics to speed-up computation are presented. Simulation and physical experimental results are presented. They show that our methodology quickly produces plans that execute faster and safer when compared to the typical reactive planning schemes. ❧ A real-time dynamics-aware reactive trajectory generator is presented. It produces trajectories that avoid collisions with dynamic obstacles under varying dynamic constraints imposed on the robot as its operating environment changes. It is an obstacle avoidance approach that explicitly reasons about reference trajectory deviation, varying dynamics constraints while minimizing collision risk with uncertain dynamic obstacles. Simulation results are presented in the context of the Unmanned Ground Vehicle (UGV) domain, to show that the planner is able to effectively deal with the dynamic obstacles on terrains with varying slopes. ❧ Trajectory planning for industrial manipulators operating in cluttered environments is a representative example that involves navigating a high-dimensional configuration space. In small-volume manufacturing applications, manipulators perform non-repetitive tasks that change frequently. Every time a task changes, generating manipulator trajectories by manual placement of waypoints is tedious and inefficient. Automated trajectory-planning methods can help improve manufacturing throughput and reduce operating costs. Planning-time and trajectory-quality are the key metrics that decide the utility of an automated planning method. A highly reliable trajectory planning framework is presented. It exploits workspace hints and uses a parameter scheduling scheme to generate high-quality trajectories quickly. The framework is also able to seamlessly incorporate human assistance through an intuitive user-interface to cooperatively (i.e., human and computer) solve difficult planning problems.
Linked assets
University of Southern California Dissertations and Theses
Conceptually similar
PDF
Trajectory planning for manipulators performing complex tasks
PDF
Decentralized real-time trajectory planning for multi-robot navigation in cluttered environments
PDF
Contingency handling in mission planning for multi-robot teams
PDF
Motion coordination for large multi-robot teams in obstacle-rich environments
PDF
Target assignment and path planning for navigation tasks with teams of agents
PDF
Robot trajectory generation and placement under motion constraints
PDF
Large-scale path planning and maneuvering with local information for autonomous systems
PDF
Discrete geometric motion control of autonomous vehicles
PDF
Planning for mobile manipulation
PDF
Intelligent robotic manipulation of cluttered environments
PDF
Data-driven acquisition of closed-loop robotic skills
PDF
Information theoretical action selection
PDF
Speeding up path planning on state lattices and grid graphs by exploiting freespace structure
PDF
Data scarcity in robotics: leveraging structural priors and representation learning
PDF
Multi-robot strategies for adaptive sampling with autonomous underwater vehicles
PDF
Process planning for robotic additive manufacturing
PDF
Risk-aware path planning for autonomous underwater vehicles
PDF
Rethinking perception-action loops via interactive perception and learned representations
PDF
Automated alert generation to improve decision-making in human robot teams
PDF
AI-driven experimental design for learning of process parameter models for robotic processing applications
Asset Metadata
Creator
Rajendran, Pradeep
(author)
Core Title
Speeding up trajectory planning for autonomous robots operating in complex environments
School
Viterbi School of Engineering
Degree
Doctor of Philosophy
Degree Program
Mechanical Engineering
Publication Date
12/03/2019
Defense Date
09/13/2019
Publisher
University of Southern California
(original),
University of Southern California. Libraries
(digital)
Tag
autonomous robots,OAI-PMH Harvest,trajectory planning
Language
English
Contributor
Electronically uploaded by the author
(provenance)
Advisor
Gupta, Satyandra (
committee chair
), Koenig, Sven (
committee member
), Luhar, Mitul (
committee member
)
Creator Email
pradeepr@usc.edu
Permanent Link (DOI)
https://doi.org/10.25549/usctheses-c89-243172
Unique identifier
UC11673587
Identifier
etd-RajendranP-7978.pdf (filename),usctheses-c89-243172 (legacy record id)
Legacy Identifier
etd-RajendranP-7978.pdf
Dmrecord
243172
Document Type
Dissertation
Rights
Rajendran, Pradeep
Type
texts
Source
University of Southern California
(contributing entity),
University of Southern California Dissertations and Theses
(collection)
Access Conditions
The author retains rights to his/her dissertation, thesis or other graduate work according to U.S. copyright law. Electronic access is being provided by the USC Libraries in agreement with the a...
Repository Name
University of Southern California Digital Library
Repository Location
USC Digital Library, University of Southern California, University Park Campus MC 2810, 3434 South Grand Avenue, 2nd Floor, Los Angeles, California 90089-2810, USA
Tags
autonomous robots
trajectory planning