Close
About
FAQ
Home
Collections
Login
USC Login
Register
0
Selected
Invert selection
Deselect all
Deselect all
Click here to refresh results
Click here to refresh results
USC
/
Digital Library
/
University of Southern California Dissertations and Theses
/
Incremental search-based path planning for moving target search
(USC Thesis Other)
Incremental search-based path planning for moving target search
PDF
Download
Share
Open document
Flip pages
Contact Us
Contact Us
Copy asset link
Request this asset
Transcript (if available)
Content
Incremental Search-Based Path Planning for Moving Target Search by Xiaoxun Sun A Dissertation Presented to the FACULTY OF THE USC GRADUATE SCHOOL UNIVERSITY OF SOUTHERN CALIFORNIA In Partial Fulllment of the Requirements for the Degree DOCTOR OF PHILOSOPHY (Computer Science) May 2013 Copyright 2013 Xiaoxun Sun Acknowledgements First and foremost, I would like to thank my advisor, Sven Koenig, for his guidance and strong support throughout this journey. I would also like to thank my dissertation committee and dissertation proposal committee, Maxim Likhachev, Raghu Raghavendra, Michael Zyda, Aiichiro Nakano and Cyrus Shahabi for their helpful comments and sug- gestions. I would also like to thank my colleagues and collaborators Xiaoming Zheng, Po-An Chen, Kenny Daniel, Alex Nash, Janusz Marecki, Changhe Yuan, Jason Tsai, Carlos Hernandez, Pedro Meseguer, Wheeler Ruml, Marek Druzdzel, Pradeep Varakan- tham, David Bond, Tansel Uras, and Niels Widger for all the stimulating discussions we have had. I thank Nathan Sturtevant for providing the game maps used in the chap- ter on Experimental Evaluation to measure the runtimes of the developed incremental search algorithms. I would like to separately acknowledge Maxim Likhachev, the creator of SBPL, for providing me with the source code of SBPL and explaining to me how to use it, and William Yeoh for our fruitful collaborations during the past seven years that directly in uenced my dissertation. Last but not least, I would like to thank my wife for being so understanding and patient with me. ii Table of Contents Acknowledgements ii List of Tables vi List of Figures vii Abstract xi Chapter 1: Introduction 1 1.1 Problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4 1.1.1 Approaches . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5 1.1.2 Terrains . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7 1.2 Hypothesis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9 1.3 Contributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12 1.4 Dissertation Structure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15 Chapter 2: Background 16 2.1 Notation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16 2.2 State Space Representation . . . . . . . . . . . . . . . . . . . . . . . . . . 17 2.2.1 Grids . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18 2.2.2 State Lattices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19 2.3 A* . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21 2.3.1 Values . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21 2.3.2 OPEN and CLOSED Lists . . . . . . . . . . . . . . . . . . . . . . 22 2.3.3 Operations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24 2.3.4 Properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25 2.3.5 Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27 2.3.6 Applying A* to Moving Target Search . . . . . . . . . . . . . . . . 29 2.3.6.1 Search Directions . . . . . . . . . . . . . . . . . . . . . . 29 2.3.6.2 Repeated A* . . . . . . . . . . . . . . . . . . . . . . . . . 30 2.4 Repeated A*-based Path Planning Algorithms for Agent Navigation . . . 34 2.4.1 Real-time Search . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34 2.4.2 Incremental Search . . . . . . . . . . . . . . . . . . . . . . . . . . . 37 2.4.2.1 Adaptive A* . . . . . . 
. . . . . . . . . . . . . . . . . . . 40 2.4.2.2 D* Lite . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52 iii 2.4.2.3 Dierential A* . . . . . . . . . . . . . . . . . . . . . . . . 59 2.5 Evasion Algorithms for the Targets . . . . . . . . . . . . . . . . . . . . . . 60 2.6 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63 Chapter 3: Incremental Search by Learning h-values 65 3.1 Review of Moving Target Adaptive A* . . . . . . . . . . . . . . . . . . . . 67 3.1.1 Eager Moving Target Adaptive A* . . . . . . . . . . . . . . . . . . 67 3.1.1.1 Applicability . . . . . . . . . . . . . . . . . . . . . . . . . 68 3.1.1.2 Principle . . . . . . . . . . . . . . . . . . . . . . . . . . . 68 3.1.1.3 Operations . . . . . . . . . . . . . . . . . . . . . . . . . . 69 3.1.1.4 Example . . . . . . . . . . . . . . . . . . . . . . . . . . . 70 3.1.2 Lazy Moving Target Adaptive A* . . . . . . . . . . . . . . . . . . . 73 3.1.2.1 Applicability . . . . . . . . . . . . . . . . . . . . . . . . . 74 3.1.2.2 Principle . . . . . . . . . . . . . . . . . . . . . . . . . . . 75 3.1.2.3 Operations . . . . . . . . . . . . . . . . . . . . . . . . . . 76 3.1.2.4 Example . . . . . . . . . . . . . . . . . . . . . . . . . . . 82 3.2 Generalized Adaptive A* . . . . . . . . . . . . . . . . . . . . . . . . . . . 85 3.2.1 Applicability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85 3.2.2 Principle . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86 3.2.3 Operations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86 3.2.4 Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93 3.3 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 96 Chapter 4: Incremental Search by Reusing Search Trees 97 4.1 Generalized Fringe-Retrieving A* and its Optimization . . . . . . . . . . . 100 4.1.1 Generalized Fringe-Retrieving A* . . . . . . . . . . . . . . . . . . . 100 4.1.1.1 Applicability . . . . . . . . . . . . . . . . . . . . . . . . . 100 4.1.1.2 Principle . . . . . . . . . . . . . . . . . . . . . . . . . . . 101 4.1.1.3 Operations . . . . . . . . . . . . . . . . . . . . . . . . . . 101 4.1.1.4 Example . . . . . . . . . . . . . . . . . . . . . . . . . . . 111 4.1.2 Fringe-Retrieving A* . . . . . . . . . . . . . . . . . . . . . . . . . . 113 4.1.2.1 Applicability . . . . . . . . . . . . . . . . . . . . . . . . . 113 4.1.2.2 Principle . . . . . . . . . . . . . . . . . . . . . . . . . . . 114 4.1.2.3 Operations . . . . . . . . . . . . . . . . . . . . . . . . . . 115 4.1.2.4 Example . . . . . . . . . . . . . . . . . . . . . . . . . . . 123 4.2 Moving Target D* Lite . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 126 4.2.1 Basic Moving Target D* Lite . . . . . . . . . . . . . . . . . . . . . 127 4.2.1.1 Applicability . . . . . . . . . . . . . . . . . . . . . . . . . 127 4.2.1.2 Principle . . . . . . . . . . . . . . . . . . . . . . . . . . . 127 4.2.1.3 Operations . . . . . . . . . . . . . . . . . . . . . . . . . . 129 4.2.1.4 Example . . . . . . . . . . . . . . . . . . . . . . . . . . . 133 4.2.2 Moving Target D* Lite . . . . . . . . . . . . . . . . . . . . . . . . 135 4.2.2.1 Applicability . . . . . . . . . . . . . . . . . . . . . . . . . 135 4.2.2.2 Principle . . . . . . . . . . . . . . . . . . . . . . . . . . . 136 4.2.2.3 Operations . . . . . . . . . . . . . . . . . . . . . . . . . . 136 iv 4.2.2.4 Example . . . . . . . . . . . . . . . . . . . . . . . . . . . 139 4.3 Summary . . . . . . . . . . . . . . 
. . . . . . . . . . . . . . . . . . . . . . 142 Chapter 5: Experimental Evaluation 144 5.1 Objective . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 144 5.2 Experimental Setups . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 146 5.2.1 Algorithms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 146 5.2.2 State Spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 147 5.2.3 Terrains . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 148 5.2.4 Target Movement Strategies . . . . . . . . . . . . . . . . . . . . . . 149 5.2.5 Measures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 150 5.3 Experimental Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 152 5.3.1 Experiments in Known Static Terrains . . . . . . . . . . . . . . . . 152 5.3.1.1 Target Movement Strategy: Random Waypoint . . . . . . 152 5.3.1.2 Target Movement Strategy: TrailMax . . . . . . . . . . . 161 5.3.1.3 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . 166 5.3.2 Experiments in Known Dynamic Terrains . . . . . . . . . . . . . . 168 5.3.2.1 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . 176 5.3.3 Experiments in Unknown Static Terrains . . . . . . . . . . . . . . 178 5.3.3.1 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . 181 5.4 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 182 Chapter 6: Applications 190 6.1 Generalized Adaptive A* Application . . . . . . . . . . . . . . . . . . . . 190 6.1.1 Background . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 191 6.1.2 State Space Representation . . . . . . . . . . . . . . . . . . . . . . 193 6.1.3 Overview of the Experimental Results . . . . . . . . . . . . . . . . 193 6.1.4 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 194 6.2 Generalized Fringe-Retrieving A* Application . . . . . . . . . . . . . . . . 195 6.2.1 Background . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 196 6.2.2 State Space Representation . . . . . . . . . . . . . . . . . . . . . . 196 6.2.3 Overview of the Experimental Results . . . . . . . . . . . . . . . . 198 6.2.4 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 199 6.3 Moving Target D* Lite Application . . . . . . . . . . . . . . . . . . . . . . 200 6.3.1 Background . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 201 6.3.2 State Space Representation . . . . . . . . . . . . . . . . . . . . . . 201 6.3.3 Overview of the Experimental Results . . . . . . . . . . . . . . . . 202 6.3.4 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 203 6.4 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 204 Chapter 7: Conclusions 205 Reference List 211 Bibliography 211 v List of Tables 1.1 Search Problems that Existing Incremental Search Algorithms Apply to . 9 1.2 Contributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11 1.3 Fastest Incremental Search Algorithms for Dierent Terrains . . . . . . . 15 2.1 Operations of A* . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29 3.1 Search Problems that Heuristic Learning Incremental Search Algorithms Apply to . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65 4.1 Search Problems that Search Tree Transforming Incremental Search Algo- rithms Apply to . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 
97 5.1 Experimental Results in Known Static Grids (Target Movement Strategy: Random Waypoint) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 152 5.2 Experimental Results in Known Static Grids (Target Movement Strategy: TrailMax) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 161 5.3 Experimental Results in Known Dynamic Grids . . . . . . . . . . . . . . 169 5.4 Experimental Results in Unknown Static Grids . . . . . . . . . . . . . . . 178 5.5 Best Incremental Search Algorithms for Dierent Terrains . . . . . . . . 183 vi List of Figures 1.1 Contributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12 2.1 Grids . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17 2.2 Example Representation . . . . . . . . . . . . . . . . . . . . . . . . . . . 19 2.3 Example Feasible Motion Primitive . . . . . . . . . . . . . . . . . . . . . . 20 2.4 Pseudo Code of A* . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23 2.5 Legend of Figure 2.6 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27 2.6 Example Trace of A* . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28 2.7 Pseudo Code of Forward Repeated A* . . . . . . . . . . . . . . . . . . . . 31 2.8 Legend of Figure 2.9 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33 2.9 Example Trace of Forward Repeated A* . . . . . . . . . . . . . . . . . . 33 2.10 Pseudo Code of Eager Adaptive A* . . . . . . . . . . . . . . . . . . . . . . 42 2.11 Legend of Figure 2.12 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46 2.12 Example Trace of Forward Repeated A* . . . . . . . . . . . . . . . . . . 47 2.13 Example Trace of Eager Adaptive A* . . . . . . . . . . . . . . . . . . . . 47 2.14 Pseudo Code of Lazy Adaptive A* . . . . . . . . . . . . . . . . . . . . . . 49 2.15 Legend of Figure 2.16 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51 2.16 Example Trace of Lazy Adaptive A* . . . . . . . . . . . . . . . . . . . . . 52 vii 2.17 Pseudo Code of D* Lite (Part 1) . . . . . . . . . . . . . . . . . . . . . . . 53 2.18 Pseudo Code of D* Lite (Part 2) . . . . . . . . . . . . . . . . . . . . . . . 54 2.19 Legend of Figure 2.20 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57 2.20 Example Trace of D* Lite . . . . . . . . . . . . . . . . . . . . . . . . . . . 58 3.1 Pseudo Code of Forward Eager Moving Target Adaptive A* . . . . . . . 71 3.2 Legend of Figures 3.3 and 3.4 . . . . . . . . . . . . . . . . . . . . . . . . . 72 3.3 Example Trace of Forward Repeated A* . . . . . . . . . . . . . . . . . . 72 3.4 Example Trace of Forward Eager Moving Target Adaptive A* . . . . . . 72 3.5 Pseudo Code of Forward Lazy Moving Target Adaptive A* (Part 1) . . . 74 3.6 Pseudo Code of Forward Lazy Moving Target Adaptive A* (Part 2) . . . 75 3.7 Legend of Figure 3.8 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83 3.8 Example Trace of Forward Lazy Moving Target Adaptive A* . . . . . . . 83 3.9 Pseudo Code of Forward Generalized Adaptive A* (Part 1) . . . . . . . . 88 3.10 Pseudo Code of Forward Generalized Adaptive A* (Part 2) . . . . . . . . 89 3.11 Legend of Figures 3.12 and 3.13 . . . . . . . . . . . . . . . . . . . . . . . . 93 3.12 Example Trace of Forward Repeated A* . . . . . . . . . . . . . . . . . . 94 3.13 Example Trace of Forward Generalized Adaptive A* . . . . . . . . . . . . 94 4.1 Pseudo Code of Generalized Fringe-Retrieving A* (Part 1) . . . . . . . . . 
103 4.2 Pseudo Code of Generalized Fringe-Retrieving A* (Part 2) . . . . . . . . . 104 4.3 Operations of Generalized Fringe-Retrieving A* . . . . . . . . . . . . . . . 104 4.4 Legend of Figure 4.5 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111 4.5 Example Trace of Generalized Fringe-Retrieving A* . . . . . . . . . . . . 112 4.6 Denition of Perimeter . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 115 viii 4.7 Pseudo Code of Fringe-Retrieving A* (Part 1) . . . . . . . . . . . . . . . . 116 4.8 Pseudo Code of Fringe-Retrieving A* (Part 2) . . . . . . . . . . . . . . . . 117 4.9 Legend of Figure 4.10 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 123 4.10 Example Trace of Fringe-Retrieving A* . . . . . . . . . . . . . . . . . . . 124 4.11 Pseudo Code of Basic Moving Target D* Lite (Part 1) . . . . . . . . . . . 130 4.12 Pseudo Code of Basic Moving Target D* Lite (Part 2) . . . . . . . . . . . 131 4.13 Legend of Figure 4.14 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 133 4.14 Example Trace of Basic Moving Target D* Lite . . . . . . . . . . . . . . . 134 4.15 Pseudo Code of Moving Target D* Lite (Part 1) . . . . . . . . . . . . . . 137 4.16 Pseudo Code of Moving Target D* Lite (Part 2) . . . . . . . . . . . . . . 138 4.17 Legend of Figure 4.18 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 139 4.18 Example Trace of Moving Target D* Lite . . . . . . . . . . . . . . . . . . 140 5.1 Grids Used in the Experimental Evaluation . . . . . . . . . . . . . . . . . 147 5.2 Reusable and Deleted States of Moving Target D* Lite . . . . . . . . . . 160 5.3 Runtime Relationships in Known Static Terrains (Target Movement Strat- egy: Random Waypoint) . . . . . . . . . . . . . . . . . . . . . . . . . . . 166 5.4 Runtime Relationships in Known Static Terrains (Target Movement Strat- egy: TrailMax) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 166 5.5 Runtime Relationships in Known Dynamic Terrains (Target Movement Strategy: Random Waypoint) . . . . . . . . . . . . . . . . . . . . . . . . 177 5.6 Runtime Relationships in Unknown Static Terrains (Target Movement Strategy: Random Waypoint) . . . . . . . . . . . . . . . . . . . . . . . . 182 6.1 Interface of the Roman Tutoring System (Belghith, Kabanza, & Hartman, 2010) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 190 6.2 Space Station Remote Manipulator System (NASA, 2001) . . . . . . . . . 191 ix 6.3 Example Unmanned Ground Vehicle Path . . . . . . . . . . . . . . . . . . 197 6.4 Motion Primitives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 197 x Abstract In this dissertation, I demonstrate how to speed up path planning for moving target search, which is a problem where an agent needs to move to a target and the target can move over time. It is assumed that the current locations of the agent and the target are known to the agent at all times. The information about the terrain that an agent has is the action costs of the agent in any particular location of the terrain. The information about the terrain that the agent has can change over time depending on dierent applications: For example, when a robot is deployed in a new terrain without any map a priori, the robot initially does not have any information about the terrain and it has to acquire information about the terrain during navigation. 
However, a character in a computer game may have complete information about the terrain that remains unchanged over time given that the whole game map is loaded in memory and is available to the character. I use the following movement strategy for the agent that is based on assumptive planning: The agent rst nds a cost-minimal path to the target with the information about the terrain that is currently available to it. The agent then starts moving along the path. Whenever new information about the terrain is acquired or the target moves o the path, the agent performs a new search to nd a new cost-minimal path from the agent to the target. The agent uses this movement strategy until either the target is caught or the agent nds that xi there does not exist any path from the agent to the target after a search (and in any future searches), upon which the agent stops navigation. Since the agent's information about the terrain can change and the target can move over time, the agent needs to repeatedly perform searches to nd new cost-minimal paths to the target. Path planning for moving target search by using this movement strategy is thus often a repeated search process. Additionally, agents need to nd new cost-minimal paths as fast as possible, such that they move smoothly and without delay. Many path planning algorithms have been developed, among which incremental search algorithms reuse information from previous searches to speed up the current search and are thus often able to nd cost-minimal paths for series of similar search problems faster than by solving each search problem from scratch. Incremental search algorithms have been demonstrated to be very successful in path planning for many important applications in robotics. However, it is assumed that the target does not move over time during navigation for most incremental search algorithms, and they are either inapplicable or run more slowly than A* to solve moving target search. Thus, I demonstrate how to speed up search-based path planning for moving target search by developing new incremental search algorithms. In my dissertation, I make the following contributions: (1) I develop Generalized Adaptive A* (GAA*), that learns h-values (= heuristic values) to make them more in- formed for moving target search. GAA* applies to moving target search in terrains where the action costs of the agent can change between searches. (2) I develop Gener- alized Fringe-Retrieving A* (G-FRA*), that transforms the search tree of the previous search to the search tree of the current search for moving target search. Though G-FRA* xii applies only to moving target search in terrains where the action costs of the agent do not change between searches, it creates a new way of transforming the search tree of the previous search to the search tree of the current search. (3) I develop Moving Target D* Lite (MT-D* Lite), that transforms the search tree of the previous search to the search tree of the current search for moving target search. MT-D* Lite combines the principles of G-FRA* and D* Lite, an existing incremental search algorithm. MT-D* Lite applies to moving target search in terrains where the action costs of the agent can change between searches. (4) I compare the new incremental search algorithms, discuss their strengths and weaknesses and provide guidelines for when to choose a particular algorithm over another. 
Simulation results show that the developed incremental search algorithms run up to one order of magnitude faster than A* for moving target search. xiii Chapter 1: Introduction An agent often needs to move from its start location to a target in a terrain, which is the vertical and horizontal dimension of land surface, known as agent navigation (Thrun, 1998; Goto & Stentz, 1987; Hrabar & Sukhatme, 2009). Although in dierent applica- tions the action costs of the agent can be aected by many things such as the speed of the wind or the battery level of the agent, for ease of description of the developed algorithms, I assume that the action costs of the agent can only be aected by the terrain. This assumption can easily be extended to cover more things that can aect the action costs of the agent in dierent applications while no changes to the developed algorithms are necessary, because the pseudo codes of the developed algorithms are independent of the action costs of the agent. Agent navigation has a wide range of applications in articial in- telligence, such as in robotics (Garrido, Moreno, Abderrahim, & Monar, 2006; Kushleyev & Likhachev, 2009; Vandapel, Donamukkala, & Hebert, 2006; Ferguson & Stentz, 2005a) and real-time computer games (Rabin, 2002; Yap, Burch, Holte, & Schaeer, 2011). In such situations, an agent has to take into account new information about the terrain during navigation and respond quickly. For example, a character in real-time computer 1 games often has to deal with newly observed obstacles that block its path during nav- igation and repeatedly searches for new paths to catch its target. Moreover, an agent usually has a limited amount of time for planning its path. For example, a character in real-time computer games needs to plan its path quickly in order to move smoothly and without delay. The computer game company Bioware recently imposed a limit of 1-3 ms on the search time (Bulitko, Bjornsson, Luvstrek, Schaeer, & Sigmundarson, 2007) for game maps of approximately 500 500 cells. 1 Planning a new path from scratch can be computationally expensive or may not even be possible if an agent frequently receives new information (Ferguson, 2006). Agent navigation generally falls into two classes: Stationary target search, where the target does not move over time (Stentz, 1995; Koenig & Likhachev, 2002; Likhachev, 2005; Ferguson, 2006), and moving target search, where the target can move over time (Ishida & Korf, 1991; Moldenhauer & Sturtevant, 2009b, 2009a). Lots of research has focused on developing path planning algorithms for stationary target search. For example, incremen- tal search algorithms reuse information from previous searches to speed up the current search and are thus often able to nd cost-minimal paths for series of similar search prob- lems faster than by solving each search problem from scratch (Koenig, Likhachev, Liu, & Furcy, 2004). Most of the recently developed incremental search algorithms are based on A* (Hart, Nilsson, & Raphael, 1968), a popular search algorithm that uses h-values (= heuristic values), estimates of the minimum costs of moving from each state to the goal state, to guide its search. The informedness of an h-value of a given state is a mea- sure of how accurate that h-value estimates the minimum cost of moving from the given 1 I thank Vadim Bulitko for providing the information about the size of the game maps. 2 state to the goal state. 
More informed h-values (= h-values that estimate the minimum costs of moving from each state to the goal state more accurately) make A* focus its search better and hence dramatically out-perform a search that does not use h-values (= uninformed search) (Pearl, 1985). Incremental search algorithms are well-suited for path planning for stationary target search. For example, Field D* (Ferguson & Stentz, 2005c), an incremental search algorithm extended from D* Lite (Koenig & Likhachev, 2002), is currently implemented as the path planner in several elded robotic systems, such as Automated E-Gator, GDRS XUV, and Mars rovers of NASA's Jet Propulsion Laboratory (Ferguson & Stentz, 2005b). Anytime D* (Likhachev, Ferguson, Gordon, Stentz, & Thrun, 2005a, 2005b), an anytime incremental search algorithm, is able to eciently generate complex paths over large, obstacle-laden terrains. Anytime D* has been applied to ecient path planning for land-based mobile robots, such as the ATRV and Segbot robotic platforms (Likhachev et al., 2005a; Likhachev & Ferguson, 2009), and \BOSS", the autonomous Chevy Tahoe that won the 2007 Defense Advanced Research Projects Agency (DARPA) Urban Challenge, where unmanned ground vehicles needed to autonomously navigate a 60-mile urban course in moving trac (Urmson et al., 2008). All these applications of incremental search algorithms make solid progress on fast path planning in robotics. Thus far, incremental search algorithms, such as D* Lite and its variants, were de- signed for stationary target search. They are either inapplicable or run more slowly than A* for moving target search. However, many applications in articial intelligence need to plan and replan paths in response to target movements during navigation (Ander- son, 1988; Mehrandezh, Sela, Fenton, & Benhabib, 2000; Riley & Atkeson, 2002; Croft, 3 Fenton, & Benhabib, 1998; Rabin, 2002; Moldenhauer & Sturtevant, 2009b, 2009a; Un- deger & Polat, 2007; Coue & Bessiere, 2001; Freda & Oriolo, 2007; Loh & Prakash, 2009; Kolling, Kleiner, Sycara, & Lewis, 2011). For example, a character in computer games often needs to catch other moving characters (Rabin, 2002). Therefore, in this disserta- tion, I develop new incremental search algorithms for moving target search. I hypothesize that one can develop new incremental search algorithms for moving target search that can speed up repeated A*. 1.1 Problem In this dissertation, I demonstrate how to speed up path planning for moving target search, which is a problem where an agent needs to move to its target and the target can move over time. It is assumed that the current locations of the agent and the target are known to the agent at all times. This assumption is used in many applications, such as computer games, in which a game character knows its current location and its target's current location at all times. For example, the target can be another game character that the game character wants to catch. The chase process is triggered when the game player sets another game character as the target. There is no assumption about the movement strategy or speed of the target except that the target can move from a location to another location only when there exists a path between these two locations. I rst introduce the approach that I use for moving target search in Section 1.1.1, then I introduce dierent classes of terrains used in my dissertation in Section 1.1.2. 
4 1.1.1 Approaches Existing algorithms for moving target search generally fall into two classes: The rst class is a class of oine path planning algorithms that take into account all possible contingencies (for example, all possible movements of the target and all possible action cost changes in the terrain) to nd paths for the agent. Thus, the agent does not perform any path planning during navigation. For example, one version of the minimax search algorithm (Hahn & MacGillivray, 2006) falls into this class. Reverse Minimax A* (RMA*) (Moldenhauer & Sturtevant, 2009b) extends this algorithm by incorporating a heuristic function that estimates the minimum cost of moving from each state to the goal state to speed up the search. However, algorithms of this class are typically slow and do not scale up well due to the large number of contingencies (Moldenhauer & Sturtevant, 2009a). The second class is a class of online path planning algorithms that interleave path planning with action execution. Online path planning algorithms do not need to consider all possible contingencies: An agent rst nds a path with the information currently available to it. The agent then starts to move along the path found until new information is acquired (for example, the target moved and the path does not lead to the target anymore, or the path is blocked by newly observed obstacles and thus becomes infeasible), upon which the agent performs a new search to nd a new path from the agent to the target. The agent uses this approach until either the target is caught successfully or the agent nds that there does not exist any path from the agent to the target after one search (and in future searches), upon which 5 the agent stops navigation. The solution quality of path planning for moving target search is the total action cost of the agent from the moment it starts navigation until it stops navigation (= the cost of the trajectory of the agent). This approach is known as assumptive planning (Nourbakhsh & Genesereth, 1996). The main advantage of this approach compared to oine algorithms is that it does not need to consider contingencies that the agent has not encountered and hence is faster and more memory-ecient. It has been demonstrated that online path planning algorithms scale up well in the size of the terrain (Likhachev & Ferguson, 2009), and the path of the agent can be calculated reasonably fast to satisfy the runtime requirements of certain applications, such as robotics (Urmson, 2008). The limitation of assumptive planning is that it is incomplete, that is, an agent using assumptive planning might fail to catch the target if the terrain changes in a way that prevents the agent from catching the target. For example, consider a situation where a robot is trapped in a room. There is a door 10 meters away from the robot to the north (= northern door). The northern door is always open and provides an exit from the room. There is another door 5 meters away from the robot to the south (= southern door). The southern door is controlled by an ad- versarial agent. The target is right behind the southern door outside of the room. Whenever the robot is within 1 meter from the southern door, the door is shut by the adversarial agent temporarily and reopened when the robot is more than 2 meters away from it. While the robot always has a way out by passing through the northern door, a path planning algorithm using assumptive planning will guide the robot to toggle between the two doors forever. 
In this situation, the robot 6 needs to reason explicitly about the changes of the terrain to learn the features of the changes, and identify the southern door as an untraversable area. Despite the fact that path planning algorithms based on assumptive planning are incom- plete, agent navigation using assumptive planning has been a great success in many applications of robotics (Stentz, 1997; Koenig & Smirnov, 1996; Likhachev, 2005; Ferguson, 2006). In this dissertation, I develop incremental search algorithms based on assumptive planning for moving target search, and test them by using problems without adversarial agents in them. In this situation, the developed incremental search algorithms are complete and can guide the agent to catch the target in all problems. 1.1.2 Terrains An agent's information about the terrain may vary depending on the application. For example, when a robot is deployed in a new terrain without any map a priori, the robot initially does not have any information about the terrain and it has to acquire information about the terrain during navigation. On the other hand, a character in a computer game can have complete information about the terrain given that the whole game map can be loaded into memory and is available to the character. In order to evaluate the runtime eciency of the developed incremental search algorithms in dierent terrains, I systematically consider three classes of terrains in this dissertation: Known Static Terrain: The terrain does not change over time, and the agent has complete information about the terrain at all times. The only information updated 7 over time for the agent is the current locations of the agent and the target. In this terrain, the action costs of the agent do not change over time. Known Dynamic Terrain: The terrain can change over time, and the agent has complete information about the terrain at all times. The information updated over time for the agent is thus (1) the current locations of the agent and the target, and (2) the information about the terrain. In this terrain, the action costs of the agent can increase and decrease over time. Unknown Static Terrain: The terrain does not change over time, but the agent initially has no information about the terrain and thus assumes that the terrain is traversable everywhere (= no obstacles in the terrain). 2 The sensors on-board the agent can typically observe the terrain only within a certain range around its current location and update its information about the terrain if any obstacles are newly observed. The information updated over time for the agent is thus (1) the current locations of the agent and the target, and (2) the information about any newly observed obstacles within the sensor range. In this terrain, the action costs of the agent can increase but not decrease over time. In this dissertation, most examples use grids for ease of description of the developed algorithms. I use the term \known static grids" to refer to grids in known static terrains. Similarly, the terms \known dynamic" and \unknown static grids" refer to grids in the corresponding terrains. 2 This is a simple but reasonable assumption since, if it is assumed that a traversable area is un- traversable, then there is a chance that no path can be found even if there exists one. 
8 Action Costs Action Costs Start State Goal State Algorithms Can Increase Can Decrease Can Change Can Change Between Searches Between Searches Between Searches Between Searches Heuristic Learning Yes No Yes Yes Incremental Search Algorithms Search Tree Transforming Yes Yes No Yes Incremental Search Algorithms Table 1.1: Search Problems that Existing Incremental Search Algorithms Apply to 1.2 Hypothesis Incremental search algorithms can combine assumptive planning to repeatedly nd cost- minimal paths from the agent to the target faster than repeated A* searches. Incremental search algorithms have been demonstrated to be well-suited for path planning for agent navigation where the target does not move over time (Likhachev, 2005; Ferguson, 2006). Existing incremental search algorithms generally fall into two classes: The rst class uses information from previous searches to update theh-values of the current search so that they become more informed and focus the current search bet- ter (Principle 1), referred to as \heuristic learning incremental search algorithms". The second class transforms the search tree of the previous search to the search tree of the current search and hence the current search starts with the transformed search tree instead of from scratch (Principle 2), referred to as \search tree transforming incremental search algorithms". Table 1.1 lists classes of search problems that existing incremental search algorithms apply to. It shows \Yes" if at least one algorithm is known in that class that applies to the problems and runs faster than repeated A*, and it shows \No" otherwise. Note that, the \start state" and \goal state" in Table 1.1 as well as the discussions in the following 9 two paragraphs refer to the start state and goal state of each search, respectively, rather than the current states of the agent and the target. Table 1.1 shows that, although existing incremental search algorithms have been demonstrated to be able to speed up path planning for stationary target search, they are either inapplicable or run more slowly than repeated A* for moving target search where both the agent and the target can move and the action costs of the agent can change over time: Heuristic learning incremental search algorithms, such as Adaptive A* (Koenig & Likhachev, 2005) and its extension Moving Target Adaptive A* (MT-Adaptive A*) (Koenig, Likhachev, & Sun, 2007), need consistent h-values with respect to the goal state of each search, that is, h-values that satisfy the \triangle inequal- ity" (Pearl, 1985), to guide their searches. Adaptive A* and MT-Adaptive A* basi- cally transform consistent h-values into more informed consistent h-values. Adap- tive A* is not guaranteed to nd cost-minimal paths in terrains where (1) the action costs of the agent can decrease over time or (2) the goal state can change between searches, because consistent h-values do not necessarily remain consistent with re- spect to the goal state of each search in these two situations. MT-Adaptive A* extends Adaptive A* to maintain the consistency of the h-values with respect to the goal state after the goal state changed between searches. However, MT-Adaptive A* cannot maintain the consistency of the h-values with respect to the goal state of each search when the action costs of the agent decrease between searches. 
Thus, it is necessary to maintain the consistency of theh-values of MT-Adaptive A* with respect to the goal state in terrains where the action costs of the agent can decrease 10 Action Costs Action Costs Start State Goal State Algorithms Can Increase Can Decrease Can Change Can Change Between Searches Between Searches Between Searches Between Searches GAA* Yes Yes Yes Yes FRA* & G-FRA No No Yes Yes Basic MT-D* Lite & MT-D* Lite Yes Yes Yes Yes Table 1.2: Contributions between searches, so that it applies to moving target search in all three classes of terrains (see Section 1.1.2). Search tree transforming incremental search algorithms, such as D* Lite and its variants, are ecient only if the start state does not change between searches. For stationary target search, D* Lite performs backward searches from the target to the agent by assigning the current states of the agent and target to the goal and start states of each search, respectively. However, it has not been investigated how to apply D* Lite and its variants to moving target search and make it run faster than A* where both the start and goal states can change between searches. Therefore, in this dissertation, I demonstrate how to speed up path planning for moving target search by developing new incremental search algorithms. My hypothesis is as follows: One can develop incremental search algorithms that can nd cost-minimal paths faster than repeated A* for moving target search. 11 Heuristic Learning Incremental Search Algorithms Search Tree Transforming Incremental Search Algorithms Adaptive A* MT-Adaptive A* GAA* Existing Work My Contributions D* Lite FRA* Basic MT-D* Lite G-FRA* MT-D* Lite Figure 1.1: Contributions 1.3 Contributions To validate my hypothesis, I developed new incremental search algorithms for moving target search. The newly developed algorithms thus extend the applicability of incre- mental search algorithms. Table 1.2 lists types of path planning problems that the newly developed incremental search algorithms apply to. In detail, my contributions are shown in Figure 1.1: For heuristic learning incremental search algorithms, I develop Generalized Adap- tive A* (GAA*) (Sun, Koenig, & Yeoh, 2008), an incremental A* variant for moving target search that extends MT-Adaptive A* (Koenig et al., 2007). GAA* extends 12 MT-Adaptive A* to apply to moving target search in terrains where the action costs of the agent can both increase and decrease between searches. Thus, GAA* applies to all three classes of terrains (see Section 1.1.2). GAA* is currently the fastest incremental search algorithm for moving target search in unknown static terrains. For search tree transforming incremental search algorithms, I make the following contributions: { First, I develop Generalized Fringe-Retrieving A* (G-FRA*), an incremental A* variant for moving target search in known static terrains (see Section 1.1.2). G-FRA* runs up to one order of magnitude faster than A* when applied to moving target search in known static terrains. { Second, I develop Fringe-Retrieving A* (FRA*) (Sun, Yeoh, & Koenig, 2009), an incremental A* variant for moving target search in known static grids (see Section 1.1.2). FRA* optimizes G-FRA* to apply to moving target search in known static grids only. FRA* runs up to one order of magnitude faster than A* when applied to moving target search in known static grids. { Finally, I develop Moving Target D* Lite (MT-D* Lite) by combining the principles of G-FRA* and D* Lite. 
MT-D* Lite applies to moving target search in terrains where the action costs of the agent can increase and decrease between searches. Thus, MT-D* Lite applies to all three classes of terrains (see Section 1.1.2). MT-D* Lite runs faster than A* and GAA* by up to a factor of four for moving target search in known dynamic terrains. 13 I compare the runtime of the developed incremental search algorithms, discuss their strengths and weaknesses and provide guidelines for when to choose a particular algorithm over another. Table 1.3 provides a summary of the best algorithms in terms of runtime per search in all three classes of terrains (see Section 1.1.2): A* and GAA* can perform repeated searches either from the current state of the agent to the current state of the target, referred to as Forward Repeated A* and For- ward GAA*, respectively, or from the current state of the target to the current state of the agent, referred to as Backward Repeated A* and Backward GAA*, respectively. FRA*, G-FRA* and MT-D* Lite can only perform forward but not backward searches. The numbers in square brackets below each incremental search algorithm are the smallest ratios of the runtime per search of repeated A* and the runtime per search of the incremental search algorithm. The runtime per search of repeated A* is the smaller runtime per search between Forward Repeated A* and Backward Repeated A*. { For known static terrains, FRA* is the fastest algorithm on grids. G-FRA* runs slightly more slowly than FRA* on known static grids, and it is the second fastest algorithm. However, G-FRA* applies to arbitrary graphs and FRA* applies only to known static grids. { For known dynamic terrains, if the h-values are well-informed, MT-D* Lite is the fastest algorithm. If the h-values are ill-informed: Backward GAA* is the fastest algorithm if the number of actions whose costs change is small. 14 Dynamic Terrains Static Terrains User-provided h-value Small Number of Medium Number of Large Number of No Informedness Cost Changes Cost Changes Cost Changes Cost Changes Known Terrains Well-informed MT-D* Lite MT-D* Lite MT-D* Lite FRA*/G-FRA* [2.82] [2.29] [1.12] [3.25/2.26] Ill-informed Backward GAA* MT-D* Lite Forward Repeated A* FRA*/G-FRA* [2.09] [1.59] [1.00] [2.29/2.12] Unknown Terrains Well-informed Forward GAA* [1.05] Ill-informed Forward GAA* [3.44] Table 1.3: Fastest Incremental Search Algorithms for Dierent Terrains MT-D* Lite is the fastest algorithm if the number of actions whose costs change is medium. Forward Repeated A* and Backward Repeated A* are the fastest algo- rithms if the number of actions whose costs change is large. { For unknown static terrains, Forward GAA* is the fastest algorithm. 1.4 Dissertation Structure This dissertation is structured as follows: In Chapter 2, I introduce background knowledge on path planning and incremental search algorithms. In Chapter 3, I introduce GAA*, a new heuristic learning incremental search algorithm. In Chapter 4, I introduce G-FRA*, FRA*, Basic MT-D* Lite and MT-D* Lite, new search tree transforming incremental search algorithms. In Chapter 5, I compare the new incremental search algorithms for moving target search to evaluate their runtime. In Chapter 6, I introduce the applications of the new incremental search algorithms. In Chapter 7, I provide a summary of my work. 15 Chapter 2: Background In this chapter, I introduce background knowledge on search-based path planning for moving target search. 
The structure of this chapter is as follows: In Section 2.1, I introduce notation used in the rest of my dissertation. In Section 2.2, I describe represen- tations of path planning problems that allow them to be solved with search algorithms. In Section 2.3, I describe A*, a popular search algorithm in articial intelligence, based on which new incremental search algorithms for moving target search are developed. In Section 2.4, I describe existing search-based path planning algorithms. In Section 2.5, I describe evasion algorithms for the target in moving target search. In Section 2.6, I provide a summary of this chapter. 2.1 Notation I use the following notation throughout this dissertation: S denotes the nite set of states. s start 2 S denotes the start state of a search, and s goal 2 S denotes the goal state of a search. Succ(s) denotes the nite set of successor states of s2 S. Pred(s) denotes the nite set of predecessor states of s2S. A(s) denotes the nite set of actions that can be executed in s2 S. Executing a2 A(s) in s2 S results in a transition to 16 S4 S3 S S1 S2 (a) Four-neighbor Grid S6 S7 S8 S5 s S1 S4 S3 S2 (b) Eight-neighbor Grid Figure 2.1: Grids succ(s;a)2 Succ(s). It is assumed that, for all s2S, there exists only a unique action a2A(s) that transitions from s2S to s 0 2Succ(s), and c(s;s 0 )> 0 denotes the action cost that transitions froms2S tos 0 2Succ(s). 1 dist (s 1 ;s 2 ) denotes the minimum cost of moving from s 1 2S to s 2 2S. The state space consists of S and all actions that can be executed in each s2S. A path from s start to s goal consists of the sequence of actions fa 1 ;a 2 ;a 3 ;:::;a n1 g and the sequence of statesfs 1 = s start ;s 2 ;s 3 ;:::;s n = s goal g that satises s i+1 =succ(s i ;a i ) (1i<n). The cost of the path is n1 X i=1 c(s i ;succ(s i ;a i )). A cost-minimal path is a path with the minimum cost among all paths from s start to s goal . 2.2 State Space Representation Path planning often needs to deal with continuous states and continuous ac- tions (Likhachev, 2005; Ferguson, 2006). A common approach for path planning is to simplify these problems by discretizing these continuous elements. 1 It is trivial to extend the assumption to the situation where there exists more than one action that is able to transition from s2S to s 0 2Succ(s). In that situation, for each pair of s2S and s 0 2Succ(s), one can only use the action with the minimum cost that transitions from s2 S to s 0 2 Succ(s) at any point in time. 17 2.2.1 Grids Four-neighbor and eight-neighbor grids are commonly used in path planning for agent navigation (Ferguson, 2006; Likhachev, 2005; Koenig & Likhachev, 2002; Ishida & Korf, 1991; Koenig et al., 2007; Sun et al., 2008, 2009; Moldenhauer & Sturtevant, 2009b). Figure 2.1 shows an example of a four-neighbor grid and an eight-neighbor grid. An agent can always move from an unblocked cell to one of the four neighboring cells with action cost one (four-neighbor grid) or one of the eight neighboring cells with action cost one for orthogonal movements or p 2 for diagonal movements (eight-neighbor grid) provided that the neighboring cell is unblocked. The undirected lines between s and s i (1i 4 in four-neighbor grids and 1i 8 in eight-neighbor grids) indicate that an agent can move from s to s i and vice versa, if s i is unblocked. All other action costs are innite. 2 Figure 2.2 shows an example path planning problem from a given start location to a given goal location. 
Figure 2.2(a) shows the real terrain: The start location is marked S, and the goal location is marked G. The black areas represent obstacles, which are blocked. The white area is unblocked (= traversable). I use the same convention in the rest of the dissertation, namely that, black cells are blocked and white cells are unblocked. Figure 2.2(b) shows a uniform discretization of the real terrain. Each cell that is either partially or completely occupied by obstacles is treated as blocked. This results in the (eight-neighbor) grid representation of the terrain shown in Figure 2.2(c). It is the user's 2 In this dissertation, I regard actions with innite costs as nonexistent, unless they become nite in known dynamic terrains, since these actions cannot be in any path of nite cost that can be followed by the agent. Similarly, I regard blocked states as nonexistent, unless they become unblocked in known dynamic terrains. 18 S G (a) Real Terrain S G (b) Discretized Terrain S G (c) Grid Representation S G (d) State Space Representation Figure 2.2: Example Representation choice whether the grid representation of the terrain is eight-neighbor or four-neighbor, which depends on the intended application. The start and goal cells are the cells that contain the start and goal locations in the real terrain, respectively. Figure 2.2(d) shows that a state is assigned to each cell in Figure 2.2(c), with actions connecting each pair of neighboring states, referred to as the state space representation. The start and goal states are the states assigned to the start and goal cells, respectively. Then, one can search for a cost-minimal path from the start state to the goal state by applying search algorithms, such as A*, to the state space. 2.2.2 State Lattices State lattices are extensions of grids that can model motion constraints (Pivtoraiko & Kelly, 2005b; Likhachev & Ferguson, 2009; Kushleyev & Likhachev, 2009) and are there- fore well suited to path planning for non-holonomic and highly constrained robotic sys- tems with limited maneuverability, such as unmanned ground vehicles (UGVs) (Likhachev 19 START END 1 2 3 4 5 6 7 8 9 10 11 A B C D E Figure 2.3: Example Feasible Motion Primitive & Ferguson, 2009; Urmson et al., 2008). A state lattice is constructed by discretizing the terrain into a multi-dimensional grid and connecting the cells of the grid with motion primitives, which are the building blocks for more complicated motions. In this disserta- tion, I use state lattices that are constructed from three-dimensional grids: A state in a state lattice is a tuple (x;y;), where x and y together dene the location of the center of the UGV and is its orientation. A motion primitive is feasible in a state i the UGV does not collide with obstacles when executing it in that state. Ideally, an action from stateu to statev in a state lattice exists i there is a feasible motion primitive inu whose execution results in v. State lattices often include only a subset of actions to make path planning fast (Howard & Kelly, 2007; Likhachev & Ferguson, 2009). Figure 2.3 shows a feasible motion primitive in state (2;D; 90 ), whose execution results in state (10;C; 90 ), where 90 indicates an east-facing orientation. The black curve represents the path of the center of the UGV, and the blue rectangles are the perimeters of the UGV as it executes the motion primitive. 20 2.3 A* A* (Hart et al., 1968) is probably the most popular search algorithm in articial intelli- gence. 
The objective of each search of A* is to nd a cost-minimal path from the start state to the goal state. 2.3.1 Values A* maintains four values for every state s2S: Theh-value (= heuristic value)h(s) ofs estimatesdist (s;s goal ), the minimum cost of moving from s to s goal of a search. The h-values are admissible with respect to s goal i 0 h(s) dist (s;s goal ) for all s2 S. The h-values are consistent with respect to s goal i h(s) satises the triangle inequality, namely h(s goal ) = 0 and 0 h(s) c(s;s 0 ) +h(s 0 ) for all s2 S and s 0 2 Succ(s) (Pearl, 1985). In this dissertation, I use a function H(s;s goal ) to calculate the user-provided h-value of s with respect to s goal . The user-provided h-values H(s;s goal ) of all states s2 S need to be consistent with respect to s goal , where s goal can be any state in S. The user-providedh-valuesH(s;s goal ) also need to satisfyH(s;s 00 )H(s;s 0 )+H(s 0 ;s 00 ) for all states s;s 0 ;s 00 2S, referred to as \H-value triangle inequality". In this dissertation, I use the Manhattan distances and the Octile distances (Bulitko & Lee, 2006) to the goal state as the user-provided h-values in four-neighbor and eight-neighbor grids, respectively. The Manhattan distance between a pair of states (x 1 ;y 1 ) and (x 2 ;y 2 ) is equal tojx 1 x 2 j +jy 1 y 2 j, and the Octile distance between a pair of states (x 1 ;y 1 ) and (x 2 ;y 2 ) is equal to p 2 min(jx 1 x 2 j;jy 1 y 2 j) + 21 jjx 1 x 2 jjy 1 y 2 jj (Zhang, Sturtevant, Holte, Schaeer, & Felner, 2009), which correspond to the minimum cost of moving from (x 1 ;y 1 ) to (x 2 ;y 2 ) when all cells are unblocked in four-neighbor and eight-neighbor grids, respectively. The Manhattan distances and the Octile distances are not only consistent with respect to any s goal but also satisfy the H-value triangle inequality. A* is an informed search algorithm, that is, A* uses theh-values to guide its search. For two consistent h-values h 1 and h 2 of A*, if h 1 (s) h 2 (s) for all s2 S, then h 2 is no less informed than h 1 (Pearl, 1985). h-values are called \well-informed" if they are close todist (s;s goal ), that is, are accurate estimates of dist (s;s goal ), and \ill-informed" otherwise. The g-value g(s) is the minimum cost of moving from the start state to s found so far. Thef-valuef(s) :=g(s) +h(s) is an estimate of the minimum cost of moving from the start state via s to the goal state. The parent pointer parent(s) points to one of the predecessor states s 0 of s. s 0 is called the parent of s. The parent pointers are used to extract the path after the search terminates. 2.3.2 OPEN and CLOSED Lists A* maintains two data structures: The OPEN list is a priority queue that contains all states to be considered for expansion. Initially, it contains the start state only. 
22 01 function CalculateKey(s) 02 return g(s)+h(s); 03 procedure InitializeState(s) 04 if search(s)6=counter 05 g(s) :=1; 06 h(s) :=H(s;s goal ); 07 search(s) :=counter; 08 procedure UpdateState(s) 09 if s2OPEN 10 OPEN.Update(s, CalculateKey(s)); 11 else 12 OPEN.Insert(s, CalculateKey(s)); 13 function ComputePath() 14 while OPEN.TopKey()< CalculateKey(s goal ) 15 s :=OPEN.Pop(); 16 for all s 0 2Succ(s) 17 InitializeState(s 0 ); 18 if g(s 0 )>g(s)+c(s;s 0 ) 19 g(s 0 ) :=g(s)+c(s;s 0 ); 20 parent(s 0 ) :=s; 21 UpdateState(s 0 ); 22 if OPEN =; 23 return false; 24 return true; 25 function Main() 26 counter := 0; 27 for all s2S 28 search(s) := 0; 29 counter :=counter+1; 30 InitializeState(s start ); 31 InitializeState(s goal ); 32 g(s start ) := 0; 33 OPEN :=;; 34 OPEN.Insert(s start , CalculateKey(s start )); 35 if ComputePath() = false 36 return false; /* No path found */ 37 return true; Figure 2.4: Pseudo Code of A* The CLOSED list is a set that contains all states that have been deleted from the OPEN list. Initially, it is empty. 3 3 Although the pseudocode of A* in Figure 2.4 does not maintain a CLOSED list, we can add \CLOSED :=;" between Line 33 and Line 34 to initialize theCLOSED list, and add \CLOSED.Insert(s)" between Line 15 and Line 16 to insert s into the CLOSED list, in order to maintain a CLOSED list ex- plicitly. 23 2.3.3 Operations Figure 2.4 gives the pseudo code of a version of A* for only one search. 4 For ease of description of how A* can be applied to moving target search later in this dissertation where multiple searches might be performed, I use a variable counter to indicate how many searches have been performed so far, including the current search. Since this is the rst search, counter is set to one (Lines 26 and 29). I use a variable search(s) for alls2S to indicate whether g(s) and h(s) have been initialized in the counter-th search. g(s) and h(s) have been initialized in the counter-th search i counter = search(s). Initially, search(s) = 0 for all s2S, which indicates that none of the states' g- and h-values have been initialized (Lines 27 - 28). A* then initializes the g- and h-values of s start and s goal (Lines 30 - 31) and initializes the OPEN list to contain s start only (Lines 33 - 34). The main routine of A* is performed in ComputePath() (Lines 13 - 24). One execution of ComputePath() is called one search. A* repeats the following procedure in ComputePath(): It deletes a state s with the smallestf-value from the OPEN list (Line 15) and expandss by performing the following operations for each successor states 0 2 Succ(s) (Lines 17 - 21). Ifg(s 0 ) andh(s 0 ) has not yet been initialized (search(s 0 )6= counter), A* initializes them in InitializeState(s 0 ) (Line 17) by assigningg(s 0 ) :=1 and calculating the user-providedh-valueh(s 0 ) :=H(s 0 ;s goal ) (Lines 05 - 06). A* assigns search(s 0 ) := counter (Line 07) each time InitializeState(s 0 ) 4 In this dissertation, all pseudo codes use the following functions to manage the OPEN list unless otherwise specied: The keys of all states in the OPEN list are theirf-values. CalculateKey(s) calculates the key ofs. OPEN.Top() returns a state with the smallest key of all states in theOPEN list. OPEN.Pop() returns a state with the smallest key of all states in the OPEN list and removes the state from the OPEN list. OPEN.TopKey() returns the smallest key of all states in the OPEN list. If the OPEN list is empty, then OPEN.TopKey() returns1. OPEN.Insert(s;k) inserts state s into the OPEN list with key k. 
OPEN.Update(s;k) changes the key of state s in the OPEN list to k. OPEN.Delete(s) deletes state s from the OPEN list. 24 is called to indicate that g(s 0 ) andh(s 0 ) have been initialized in the counter-th search. If g(s 0 ) is larger thang(s) +c(s;s 0 ), then A* generatess 0 by performing the following three steps: (1) Assigning g(s 0 ) := g(s) +c(s;s 0 ) (Line 19); (2) Setting the parent pointer of state s 0 to state s (Line 20); and (3) Updating the key of s 0 to re ect its current f-value if s 0 is in the OPEN list or inserting s 0 into the OPEN list with its current f-value as its key (Line 21). A* terminates when its OPEN list is empty (Lines 22 - 23) or when the smallest f-value of all states in the OPEN list is no smaller than the f-value of s goal (Line 24). The former condition indicates that no path exists from s start tos goal , and the latter condition indicates that A* found a cost-minimal path from s start to s goal . The principle behind A* is that, when examining whether a states should be expanded next, A* considers not only the cost of the path from s start to s found so far (= g(s)), but also the estimated cost of the cost-minimal path froms tos goal (=h(s)). The sum of g(s) and h(s) (= f(s)) approximates the cost of a cost-minimal path from s start to s goal via s, which provides more accurate guidance than only considering g(s) when A* picks the most promising state to expand. 2.3.4 Properties A* has the following properties when using consistenth-values (Pearl, 1985), that will be used in this dissertation: A* Property 1: During a search, every state can be expanded at most once. A* Property 2: During a search, A* expands no more states than an otherwise identical version of A* (with the same tie-breaking strategy) for the same search 25 problem if the h-values used by the rst A* search are no less informed than the h-values used by the latter A* search. A* Property 3: During a search, thef-values of the series of expanded states over time are monotonically non-decreasing. Thus, f(s) f(s goal ) for all states s2 S that were expanded during an A* search, andf(s goal )f(s) for alls2S that were generated but remained unexpanded (= remained in the OPEN list) when the A* search terminates. A* Property 4: During a search, every expanded state s (= every state in the CLOSED list) satises the following conditions: (a) If s6=s start , then the parent of s has also been expanded andg(s) =g(parent(s)) +c(parent(s);s). (b) Theg-value of s satises g(s) =g(s start ) +dist (s start ;s). Thus, the dierence between g(s) of every expanded states andg(s start ) is equal to the cost of a cost-minimal path from s start to s. A* Property 5: During a search, a path from s start to any generated state s2S can be identied in reverse by repeatedly following the parent pointers from s to s start . This path is a cost-minimal path from s to s start if s has been expanded in that search. The search tree of an A* search consists of s start and all generated states with their parent pointers pointing to their parents. The root of the search tree is s start . The subtree (of the search tree of an A* search) rooted in s consists of (1) s, (2) every generated state s 0 2 S (s 0 6= s) with a path to s, which can be identied in reverse by repeatedly following the parent pointers from s 0 to s, and 26 (3) the parent pointer of state s 0 . A* Property 5 implies that the states in the CLOSED list form a contiguous area in grids (Sun & Koenig, 2007). 
A* Property 6: During a search, the OPEN list contains the following states: If the CLOSED list is empty, then the OPEN list contains only s_start. Otherwise, the OPEN list contains exactly all states that are not in the CLOSED list but have at least one predecessor state in the CLOSED list. Every state s (s ≠ s_start) in the OPEN list satisfies the following conditions: (a) The parent of s is the state s' in the CLOSED list that minimizes g(s') + c(s', s). (b) The g-value of s satisfies g(s) = g(parent(s)) + c(parent(s), s).

A* Property 7: A* terminates. If A* terminates because its OPEN list is empty, then no path exists from the start state to the goal state. Otherwise, A* terminates when the smallest f-value of all states in the OPEN list is no smaller than the f-value of the goal state. One can then identify a cost-minimal path from the start state to the goal state in reverse by repeatedly following the parent pointers from the goal state to the start state.

2.3.5 Example

[Figure 2.5: Legend of Figure 2.6 - a single grid cell showing where each value is displayed: g-value in the upper left corner, f-value in the upper right corner, h-value in the lower left corner, and search-value in the lower right corner.]

[Figure 2.6: Example Trace of A* - nine panels, (a) Time Step 0 through (i) Time Step 8, showing the four-neighbor grid with rows A-D and columns 1-4; the start state D2 is marked S and the goal state C4 is marked G.]

Figure 2.6 gives an example of A* in a four-neighbor grid. In this example, A* performs only one search. The start state is D2 (marked S) and the goal state is C4 (marked G). All states have their search-value in the lower right corner. States that have been initialized in InitializeState() also have their g-values in the upper left corner, h-values in the lower left corner, and f-values in the upper right corner (see Figure 2.5). The outgoing arrow from a state is the parent pointer of the state and points to its parent. A time step is one execution of Lines 15 - 21 in Figure 2.4. The state that is being expanded in each time step is highlighted with a bold frame. The states that have been expanded before each time step are shaded grey. A* breaks ties among states with the same f-values in favor of states with larger g-values, which is known to be a good tie-breaking strategy. The operations of A* and the corresponding states in the OPEN list, expanded states, and the state being expanded in each time step are given in Table 2.1.

Table 2.1: Operations of A*

Time Step   | Expanded States            | States in the OPEN List | State being Expanded
Time Step 0 | (none)                     | D2                      | NULL
Time Step 1 | (none)                     | D1                      | D2
Time Step 2 | D2                         | C1                      | D1
Time Step 3 | D1, D2                     | B1                      | C1
Time Step 4 | D1, D2, C1                 | A1, B2                  | B1
Time Step 5 | D1, D2, C1, B1             | A1, A2, B3              | B2
Time Step 6 | D1, D2, C1, B1, B2         | A1, A2, A3, B4, C3      | B3
Time Step 7 | D1, D2, C1, B1, B2, B3     | A1, A2, A3, A4, C3, C4  | B4
Time Step 8 | D1, D2, C1, B1, B2, B3, B4 | A1, A2, A3, A4, C3, C4  | Terminate
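To complement the pseudo code in Figure 2.4 and the trace in Figure 2.6 and Table 2.1, here is a minimal Python sketch of a single A* search. It is not the dissertation's implementation: the heapq-based OPEN list with duplicate entries (skipped via the CLOSED set) stands in for OPEN.Update(), states are assumed to be comparable tuples such as (x, y) pairs, and succ, cost and h are assumed to be user-supplied functions with a consistent h.

import heapq

def a_star(succ, cost, h, s_start, s_goal):
    # succ(s) returns the successors of state s, cost(s, t) the action cost c(s, t),
    # and h(s) a consistent user-provided h-value with respect to s_goal.
    g = {s_start: 0}
    parent = {s_start: None}
    # OPEN list entries are (f-value, -g-value, state); the negated g-value breaks
    # ties among equal f-values in favor of larger g-values.
    open_list = [(h(s_start), 0, s_start)]
    closed = set()
    while open_list:
        _, _, s = heapq.heappop(open_list)
        if s in closed:
            continue  # stale entry left behind by a later g-value improvement
        if s == s_goal:
            path = []
            while s is not None:  # follow the parent pointers from s_goal to s_start
                path.append(s)
                s = parent[s]
            path.reverse()
            return path, g[s_goal]
        closed.add(s)  # expand s
        for t in succ(s):
            g_new = g[s] + cost(s, t)
            if t not in g or g_new < g[t]:
                g[t] = g_new
                parent[t] = s
                heapq.heappush(open_list, (g_new + h(t), -g_new, t))
    return None, float('inf')  # the OPEN list is empty: no path exists

With consistent h-values, returning when the goal state is selected for expansion yields the same cost-minimal path as the termination test on Line 14 of Figure 2.4.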
After the search, one can identify a cost-minimal path from the start state to the goal state in reverse by repeatedly following the parent pointers from C4 to D2.

2.3.6 Applying A* to Moving Target Search

A* is able to find a cost-minimal path from the start state to the goal state in a single search. In this section, I describe how to apply A* to moving target search, where multiple searches can be performed.

2.3.6.1 Search Directions

When applying A* to moving target search, A* can perform a search in two different directions:

A* can perform a search from the agent to the target by assigning the current states of the agent and target to the start and goal states of the search, respectively, referred to as Forward A*.

A* can perform a search from the target to the agent by assigning the current states of the agent and target to the goal and start states of the search, respectively, referred to as Backward A*.

In this dissertation, "forward search" refers to a search that assigns the current states of the agent and target to the start and goal states, respectively, and "backward search" refers to a search that assigns the current states of the agent and target to the goal and start states, respectively.

2.3.6.2 Repeated A*

One can apply Forward A* (by using assumptive planning, introduced in Section 1.1.1) to moving target search by assigning the current states of the agent and target to the start and goal states of each search, respectively, and repeatedly performing searches to find paths for the agent, referred to as Forward Repeated A*. One can also apply Backward A* to moving target search by assigning the current states of the agent and the target to the goal and start states of each search, respectively, and repeatedly performing searches to find paths for the agent, referred to as Backward Repeated A*. In this section, I introduce only Forward Repeated A*, since it is trivial to switch the search direction to backward search, resulting in Backward Repeated A*.
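The two search directions differ only in how the current states of the agent and the target are mapped to the start and goal states of a search. The following small helper, which reuses the hypothetical a_star sketch from Section 2.3.5, is one way to make that mapping explicit; re-orienting the backward path assumes symmetric action costs, as on the grids considered here.

def plan(agent_state, target_state, succ, cost, H, direction="forward"):
    # Forward search: s_start = agent, s_goal = target (Forward A*).
    # Backward search: s_start = target, s_goal = agent (Backward A*).
    if direction == "forward":
        s_start, s_goal = agent_state, target_state
    else:
        s_start, s_goal = target_state, agent_state
    path, path_cost = a_star(succ, cost, lambda s: H(s, s_goal), s_start, s_goal)
    if path is not None and direction == "backward":
        path = list(reversed(path))  # re-orient the path from the agent to the target
    return path, path_cost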
Forward Repeated A* is guaranteed to nd a cost- minimal path from the start state to the goal state for each search no matter how the 30 01 function CalculateKey(s) 02 return g(s)+h(s); 03 procedure InitializeState(s) 04 if search(s)6=counter 05 g(s) :=1; 06 h(s) :=H(s;s goal ); 07 search(s) :=counter; 08 procedure UpdateState(s) 09 if s2OPEN 10 OPEN.Update(s, CalculateKey(s)); 11 else 12 OPEN.Insert(s, CalculateKey(s)); 13 function ComputePath() 14 while OPEN.TopKey()< CalculateKey(s goal ) 15 s :=OPEN.Pop(); 16 for all s 0 2Succ(s) 17 InitializeState(s 0 ); 18 if g(s 0 )>g(s)+c(s;s 0 ) 19 g(s 0 ) :=g(s)+c(s;s 0 ); 20 parent(s 0 ) :=s; 21 UpdateState(s 0 ); 22 if OPEN =; 23 return false; 24 return true; 25 function Main() 26 counter := 0; 27 s start := the current state of the agent; 28 s goal := the current state of the target; 29 for all s2S 30 search(s) := 0; 31 while s start 6=s goal 32 counter :=counter+1; 33 InitializeState(s start ); 34 InitializeState(s goal ); 35 g(s start ) := 0; 36 OPEN :=;; 37 OPEN.Insert(s start , CalculateKey(s start )); 38 if ComputePath() = true 39 while target not caught AND target on path from s start to s goal AND action costs do not change 40 agent follows the path from s start to s goal ; 41 if agent caught target 42 return true; 43 s start := the current state of the agent; 44 else 45 wait until some action costs decrease; 46 s goal := the current state of the target; 47 update the action costs (if any); 48 return true; Figure 2.7: Pseudo Code of Forward Repeated A* action costs of the agent change and the target moves between searches, since it performs Forward A* for each search from scratch. Figure 2.7 gives the pseudo code of one version of Forward Repeated A* that extends the pseudo code of Forward A* in Figure 2.4, which applies to all terrains introduced in 31 Section 1.1.2. This pseudo code can be optimized when applied to unknown static terrains, where the action costs are non-decreasing between searches: We can replace Line 39 with \while target not caught AND target on path from s start to s goal AND action costs do not increase between the agent and the target along the path", and replace Line 45 with \return false;". For ease of description of the algorithms, I model the problem in a way that the agent and the target take turns moving at most one step at a time rather than moving simultaneously. The main routine of Forward Repeated A* is performed in ComputePath() (Lines 13 - 24), which remains unchanged. The main change of Forward Repeated A* over Forward A* is that multiple searches can be performed in the while loop on Line 31 in order to repeatedly nd new cost-minimal paths from the current state of the agent to the current state of the target: After each search, if a path is found (Line 38), then the agent moves along the path to catch the target until either the target is caught or the current state of the target is no longer on the path or the action costs of the agent change (Lines 39 - 43). If no path exists from the current state of the agent to the current state of the target, the agent does not move, but waits until some action costs decrease upon which there might exist a path from the current state of the agent to the current state of the target (Lines 44 - 45). 5 The agent then updates the current states of the agent and target (Lines 43 and 46) and the action costs of the agent (if they changed) (Line 47). 
The agent then performs the next search to nd a path from the current state of the agent to the current state of the target given the current action costs of the agent. Forward Repeated A* terminates when the agent caught the target (Line 42). 5 In unknown static terrains, where the action costs are non-decreasing, Forward Repeated A* can terminate immediately and determine that the target cannot be caught if no path is found after a search. 32 g search h f Figure 2.8: Legend of Figure 2.9 1 2 3 4 D A B C S G (a) Real Terrain 4 9 5 9 6 9 7 9 5 1 4 1 3 1 2 1 3 7 4 7 5 7 6 7 4 1 3 1 2 1 1 1 2 5 6 7 7 7 3 1 1 1 0 1 1 5 0 3 4 1 3 1 0 D A B C 1 2 3 4 S G (b) First Search counter = 1 4 9 5 9 6 9 7 9 5 1 4 1 3 1 2 1 3 7 5 7 6 7 4 1 2 1 1 1 2 5 6 7 7 7 3 1 1 1 0 1 1 5 0 3 4 1 3 1 0 D A B C 1 2 3 4 S G (c) After Target Moved and B2 Blocked counter = 1 1 5 2 5 3 5 4 7 4 2 3 2 2 2 3 2 0 3 4 5 5 7 3 2 1 2 2 2 1 3 5 5 7 7 2 2 0 2 0 1 2 5 3 5 3 2 2 2 0 D A B C 1 2 3 4 S G (d) Second Search counter = 2 Figure 2.9: Example Trace of Forward Repeated A* I use an example problem of moving target search in unknown static terrains to demonstrate the operations of Forward Repeated A*. In the rest of the dissertation, all example problems on grids are given on four-neighbor grids. Figure 2.9(a) illustrates the real terrain: The current states of the agent and the target are D2 (marked S) and C4 (marked G), respectively. The sensor onboard the agent can observe only the four neighboring states of the current state of the agent. Thus, initially, the agent only knows that C2 and D3 are blocked, and that D1 and D2 are unblocked. Figures 2.9(b) - (d) illustrate the agent navigation problem solved with Forward Re- peated A*. All states have their search-value in the lower right corner. States that have been initialized in InitializeState() have their h-values in the lower left corner, g-values in the upper left corner and f-values in the upper right corner (shown in Figure 2.8). Expanded states during each search are shaded grey. Figure 2.9(b) illustrates the rst search. The agent then moves along the path found by the rst search. Figure 2.9(c) 33 illustrates that, after the rst search, the agent moves along the path to B1 where it observes that B2 blocks its path. In the meantime, the target moves to C3 which is o the path found by the rst search. The agent then performs the second search to plan a new path from its current state B1 to the current state of the target C3. Figure 2.9(d) illustrates the second search. The agent then moves along this path until it arrives at C3 and catches the target. Forward and Backward Repeated A* can be applied to moving target search. However, they may not be fast enough to satisfy the runtime requirements of certain applications. For example, Repeated A* is not fast enough to satisfy the runtime requirements of the ROMAN Tutoring System (Belghith et al., 2010), an automatic intelligent tutoring system to train the astronauts to operate the Space Station Remote Manipulator System. 2.4 Repeated A*-based Path Planning Algorithms for Agent Navigation Repeated A* has been extended in various directions to speed up its search such that it can be applied to dierent applications of agent navigation (Korf, 1990; Koenig & Likhachev, 2002; Stentz, 1995; Ferguson & Stentz, 2005a; Likhachev, Ferguson, Gordon, Stentz, & Thrun, 2007; Hern andez, Meseguer, Sun, & Koenig, 2009). 
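As a baseline for the speed-up techniques surveyed in this section, the following sketch captures the outer loop of Forward Repeated A* from Figure 2.7. The env, agent and target objects and their methods are hypothetical interfaces for a moving target search simulation, a_star is the earlier sketch, and the code illustrates only the loop structure, not the dissertation's implementation.

def forward_repeated_a_star(env, agent, target):
    # env.succ, env.cost and env.H reflect the currently assumed action costs;
    # agent.move_to() executes one action; target.move() moves the target one step.
    while agent.state != target.state:
        path, _ = a_star(env.succ, env.cost,
                         lambda s: env.H(s, target.state),
                         agent.state, target.state)
        if path is None:
            env.wait_until_costs_decrease()  # no path under the current action costs
            continue
        # Follow the path until the target is caught, the target leaves the path,
        # or the action costs of the agent change (compare Lines 39 - 43 of Figure 2.7).
        for s in path[1:]:
            agent.move_to(s)
            if agent.state == target.state:
                return True
            target.move()  # the agent and the target take turns moving
            if target.state not in path or env.action_costs_changed():
                break
    return True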
2.4.1 Real-time Search Real-time search algorithms (Korf, 1990; Hern andez & Meseguer, 2005; Koenig & Likhachev, 2006; Bulitko & Lee, 2006; Furcy & Koenig, 2000; Bond, Widger, Ruml, 34 & Sun, 2010; Sturtevant & Bulitko, 2011; Hern andez & Meseguer, 2005, 2007b, 2007a; Furcy & Koenig, 2000; Shue & Zamani, 1993) have been used for path planning for agent navigation in terrains where the action costs of the agent do not decrease between searches (by using assumptive planning introduced in Section 1.1.1). Real-time search algorithms guarantee a constant search time per action execution (referred to as the \real-time property"). LRTA* (Korf, 1990) is a real-time search algorithm designed for stationary target search. LRTA* interleaves searches with action executions, potentially at the expense of increasing the number of action executions. LRTA* nds the beginning (= prex) of a complete path from the current state of the agent to the current state of the target by restricting the search to the states around the current state of the agent with the current information about the terrains, and updates the h-value of the current state of the agent to make it more informed after each step. The agent then moves along that path until it reaches the end of the path or observes that the action costs of the agent along the path have increased (for example, the path is blocked by newly observed blocked states). If the current state of the agent is dierent from the current state of the target, then LRTA* repeats the process. Otherwise, LRTA* terminates successfully. MTS (Ishida & Korf, 1991; Ishida, 1992) extends LRTA* to moving target search. MTS updates h- values from the current state of the agent to the current state of the target after each search, and guarantees a constant search time per action execution. Trailblazer (Chimura & Tokoro, 1994) extends MTS to reduce the number of searches needed to catch the target by sacricing MTS's real-time property in some situations: Trailblazer stores the trajectories of both the agent and the target in a graph G. Before their trajectories 35 intersect, Trailblazer performs the search in the same way as MTS. When their trajectories intersect, MTS runs Dijkstra's algorithm (Dijkstra, 1959) on graphG to nd a path from the current state of the agent to the current state of the target. Since Dijkstra's algorithm cannot guarantee a constant search time per action execution, Trailblazer loses its real- time property in this situation. Updating Shortest Paths (USP) (Edelkamp, 1998) is an incremental version of Trailblazer. USP corrects only those g-values of the states in G that are not the minimum costs moving from the start state to these states after each search. Thus, USP can run faster than Trailblazer that runs Dijkstra's algorithm from scratch on graphG. 6 A longer overview of real-time search algorithms is given in (Ishida, 1997). The advantage of real-time search algorithms is that they can guarantee a constant search time per action execution. Unfortunately, the quality of the solution of real-time search algorithms, that is, the total action cost of the agent from the moment it starts navigation until it stops navigation can be quite poor due to the fact that they do not nd a complete path from the current state of the agent to the current state of the target in each search and often move back and forth in order to be able to learn the h- values necessary to move to the current state of the target. 
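For contrast with algorithms that find complete paths, the following sketch shows a single agent step of an LRTA*-style real-time search with a lookahead of one. It uses the textbook value-update rule (raise h(s) to the best one-step lookahead value, then move greedily) and is given only as an illustration of the real-time principle, not as the exact variant of any algorithm discussed above.

def lrta_star_step(succ, cost, h, s):
    # One agent step of LRTA* with lookahead 1. h is a dictionary of learned h-values,
    # initialized with the user-provided h-values.
    best_succ, best_value = None, float('inf')
    for t in succ(s):
        value = cost(s, t) + h[t]
        if value < best_value:
            best_succ, best_value = t, value
    h[s] = max(h[s], best_value)  # learning step: the h-values never decrease
    return best_succ  # the agent executes the action that leads to best_succ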
Thus, in order to address the disadvantages of real-time search algorithms, I develop new incremental search algorithms in this dissertation to speed up search-based path planning algorithms that nd complete paths from the current state of agent to the current state of the target in each search. I 6 Trailblazer and USP cannot guarantee a constant search time per action execution in all situations (for example, when the trajectory of the agent intersects the trajectory of the target). However, I still list them as (partial) real-time search algorithms as they are based on the principle of MTS and can provide the real-time property in some situations. 36 demonstrate that the newly developed algorithms can satisfy the runtime requirements of some applications of moving target search, such as computer games. 2.4.2 Incremental Search Path planning with assumptive planning is often a repeated search process (as discussed in Section 1.1.1) where an agent needs to repeatedly nd new paths for a series of similar search problems as the action costs of the agent change between searches. Fortunately, in many path planning applications, the number of actions whose costs change between searches is small (Koenig, Likhachev, & Furcy, 2004a). This suggests that complete recomputations of the paths from scratch can be wasteful and inecient. Incremental search algorithms reuse information from previous searches to speed up the current search and hence are often able to nd cost-minimal paths for series of similar search problems faster than is possible by solving each search problem from scratch. Incremental search algorithms are particularly well-suited for large terrains with slight changes over time, as there is usually a large amount of information that can be reused between searches in such cases (Koenig, Likhachev, Liu, & Furcy, 2004b). The idea of incremental search was introduced a few decades ago. For example, Deo and Pang wrote an overview article on cost-minimal path algorithms (Deo & Pang, 1984), which already cited several incremental search algorithms, including several ones published in the late 1960s (Koenig et al., 2004b). Since then, additional incremental search algorithms have been introduced in the literature (Stentz, 1994, 1995; Barbehenn & Hutchinson, 1995; Ramalingam & Reps, 1996; Frigioni, Marchetti-Spaccamela, & Nanni, 1998). Recent developments of incremental search algorithms have focused on replanning 37 with A*, referred to as incremental heuristic search algorithms, which have been used as path planning algorithms for a wide range of robotic systems (Likhachev et al., 2005a; Urmson et al., 2008; Likhachev & Ferguson, 2009; Likhachev et al., 2007, 2005b; Ferguson & Stentz, 2005b, 2005a). Existing incremental heuristic search algorithms generally fall into two classes: Class 1: Heuristic learning incremental search algorithms. The rst class of al- gorithms updates the h-values for the current search by using information from previous searches to make them more informed and thus future searches more fo- cused, referred to as heuristic learning incremental search algorithms. For example, Adaptive A* (Koenig & Likhachev, 2005) and Moving Target Adaptive A* (MT- Adaptive A*) (Koenig et al., 2007) belong to this class. Adaptive A* is simple to understand and easy to implement. 
However, Adaptive A* has restrictions on its applicability: Adaptive A* nds cost-minimal paths in terrains where (1) the goal state does not change between searches, and (2) the action costs of the agent do not decrease between searches. It cannot guarantee to nd cost-minimal paths in other situations. MT-Adaptive A* extends Adaptive A* to nd cost-minimal paths in terrains where both the start and goal states can change between searches. How- ever, MT-Adaptive A* still cannot guarantee to nd cost-minimal paths in terrains where the action costs of the agent decrease between searches. Class 2: Search tree transforming incremental search algorithms. The second class of algorithms transforms the search tree of the previous search to the search tree of the current search such that the current search can start with the transformed 38 search tree instead of starting from scratch, referred to as search tree transforming incremental search algorithms. For example, DynamicSWSF-FP (Ramalingam & Reps, 1996), Dierential A* (Trovato, 1990), Dynamic A* (D*) (Stentz, 1995), Lifelong Planning A* (LPA*) (Koenig et al., 2004a) and its generalized version D* Lite (Koenig & Likhachev, 2002) belong to this class. D* Lite is one of the state-of- the-art incremental search algorithms that has been extended and applied to many applications (Ferguson & Stentz, 2005b; Likhachev et al., 2005a, 2005b). However, D* Lite also has restrictions on its applicability: D* Lite can nd cost-minimal paths in terrains where the goal state can change between searches but the start state does not change between searches. It is still unknown how to apply D* Lite to search problems where both the start and goal states can change between searches and make it run faster than A*. Incremental search algorithms have the following properties: First, like Repeated A*, incremental search algorithms nd paths with a memory requirement that is in linear in the number of states in the state space, which can be satised by many applications. For example, Repeated A* variants have been imple- mented in computer games (Rabin, 2002), and incremental search algorithms have been implemented in robotic systems (Likhachev, 2005; Ferguson & Stentz, 2005c). It has been demonstrated that incremental search algorithms can solve agent navi- gation problems in large state spaces (Likhachev & Ferguson, 2009). Thus, in this dissertation, I focus mainly on the runtimes of the developed incremental search algorithms. 39 Second, incremental search algorithms are runtime-ecient. Incremental search algorithms can reuse information from previous searches to speed up the current search, and can nd cost-minimal paths for a series of similar search problems faster than by solving each search problem from scratch (Koenig et al., 2004b). For example, D* Lite can achieve a speedup of one to two orders of magnitude over repeated A* searches for stationary target search (Koenig & Likhachev, 2002). Third, like Repeated A*, incremental search algorithms guarantee that the path returned by each search is a cost-minimal path based on the information currently available to the agent (Koenig et al., 2004a; Koenig & Likhachev, 2005; Sun et al., 2009), no matter how the state space changes. I now give an overview of three popular incremental search algorithms: namely Adap- tive A* (Koenig & Likhachev, 2005), D* Lite (Koenig & Likhachev, 2002) and Dierential A* (Trovato, 1990). 
I give more details on Adaptive A* and D* Lite than on Dierential A* because I develop new incremental search algorithms for moving target search based on Adaptive A* and D* Lite. A longer overview of incremental search has been given in (Koenig et al., 2004b). 2.4.2.1 Adaptive A* Adaptive A* (Koenig & Likhachev, 2005) is a heuristic learning incremental search algo- rithm designed for stationary target search in unknown static terrains. It needs consistent h-values with respect to the goal state of each search to guide its search. Adaptive A* performs Forward Repeated A* to repeatedly nd cost-minimal paths from the current 40 state of the agent to the current state of the target by assigning the current states of the agent and the target to the start and goal states of each search, respectively. Similar to Forward Repeated A*, Adaptive A* constructs its search tree for each search from scratch. Dierent from Forward Repeated A*, Adaptive A* updates itsh-values between searches to make them more informed and thus future searches more focused. Adaptive A* can only perform forward (but not backward) searches to nd cost- minimal paths between the current states of the agent and the target. This is so because the h-value of a state is an estimate of the minimum cost of moving from the state to the goal state of a search. After each search, the h-values of all expanded states are updated with respect to the goal state of the previous search (= the h-values are updated to estimate the minimum cost of moving from the state to the goal state of the previous search). Since the target is stationary, the goal state does not change between searches if forward searches are performed. Thus, the updatedh-values remain consistent with respect to the goal state. However, the goal state is the current state of the agent if backward searches are performed. Thus, the h-values can become inadmissible and inconsistent with respect to the goal state when the agent moves. Therefore, Adaptive A* can have dierent start states but has to have the same goal state. In addition, Adaptive A* is guaranteed to nd cost-minimal paths only in terrains where the action costs of the agent do not decrease between searches. Thus, Adaptive A* only applies to known static terrains and unknown static terrains (see Section 1.1.2). I rst review an eager version of Adaptive A* that is easier to understand, referred to as Eager Adaptive A* (Koenig & Likhachev, 2005). 
I then review a more complicated 41 01 function CalculateKey(s) 02 return g(s)+h(s); 03 procedure InitializeState(s) 04 if search(s) = 0 05 g(s) :=1; 06 h(s) :=H(s;s goal ); 07 else if search(s)6=counter 08 g(s) :=1; 09 search(s) :=counter; 10 procedure UpdateState(s) 11 if s2OPEN 12 OPEN.Update(s, CalculateKey(s)); 13 else 14 OPEN.Insert(s, CalculateKey(s)); 15 function ComputePath() 16 while OPEN.TopKey()< CalculateKey(s goal ) 17 s :=OPEN.Pop(); 18 CLOSED.Insert(s); 19 for all s 0 2Succ(s) 20 InitializeState(s 0 ); 21 if g(s 0 )>g(s)+c(s;s 0 ) 22 g(s 0 ) :=g(s)+c(s;s 0 ); 23 parent(s 0 ) :=s; 24 UpdateState(s 0 ); 25 if OPEN =; 26 return false; 27 return true; 28 function Main() 29 counter := 0; 30 s start := the current state of the agent; 31 s goal := the current state of the target; 32 for all s2S 33 search(s) := 0; 34 while s start 6=s goal 35 counter :=counter+1; 36 InitializeState(s start ); 37 InitializeState(s goal ); 38 g(s start ) := 0; 39 OPEN :=CLOSED :=;; 40 OPEN.Insert(s start , CalculateKey(s start )); 41 if ComputePath() = false 42 return false; /* Target cannot be caught*/ 43 for all s2CLOSED 44 h(s) :=g(s goal )g(s); 45 while target not caught AND action costs on path do not increase 46 agent follows the path from s start to s goal ; 47 if agent caught target 48 return true; 49 s start := the current state of the agent; 50 update the increased action costs (if any); 51 return true; Figure 2.10: Pseudo Code of Eager Adaptive A* version, referred to as Lazy Adaptive A* (Koenig & Likhachev, 2005), that is more ecient than the eager version. 42 Eager Adaptive A* As discussed earlier, Eager Adaptive A* can only perform forward searches, and it only applies to stationary target search in known static terrains and unknown static terrains (see Section 1.1.2), where the action costs of the agent do not decrease between searches. The principle behind Eager Adaptive A* is the following: Let s denote any expanded state by an A* search. Then, g(s) is equal to dist (s start ;s) since s was expanded by the search (see A* Property 4 in Section 2.3.4). Similarly,g(s goal ) is equal todist (s start ;s goal ). dist (s start ;s), dist (s;s goal ) and dist (s start ;s goal ) satisfy the triangle inequality: dist (s start ;s goal ) dist (s start ;s) +dist (s;s goal ) dist (s start ;s goal )dist (s start ;s) dist (s;s goal ) g(s goal )g(s) dist (s;s goal ): Thus,g(s goal )g(s) is an admissible estimate of the minimum cost of moving from s to s goal that can be calculated quickly. It can thus be used as a new admissible h-value of s. Adaptive A* therefore updates the h-values by assigning h(s) :=g(s goal )g(s) (2.1) for all expanded states s by the search. Let h 0 (s) denote the h-values after the updates. The updated h-values h 0 (s) are not only admissible but also consistent with respect to 43 s goal (Koenig & Likhachev, 2005). Furthermore, since s was expanded by the search, it holds that f(s) f(s goal ) (see A* Property 3 in Section 2.3.4), and f(s goal ) = g(s goal ). Thus, g(s) +h(s) g(s goal ) (2.2) h(s) g(s goal )g(s) h(s) h 0 (s) Thus, the updated h-values h 0 (s) of all expanded states s are no smaller than the im- mediately preceding h-values h(s). This property implies that the h-values h 0 (s) are no less informed than the immediately preceding h-values h(s) and thus also all previous h-values, including the user-provided h-values. 
Consequently, a search with the h-values h'(s) cannot expand more states than an A* search with the user-provided h-values and the same tie-breaking strategy (see A* Property 2 in Section 2.3.4). It therefore cannot be slower (except possibly for the small amount of runtime needed by the bookkeeping and h-value update operations) than an A* search with the user-provided h-values. In this sense, Eager Adaptive A* is guaranteed to reduce its runtime per search more and more over time, and its runtime per search cannot be larger than that of a search that does not modify the user-provided h-values. This principle was used in (Holte, Mkadmi, & Macdonald, 1996) and later resulted in the independent development of Eager Adaptive A* (Koenig & Likhachev, 2005). Eager Adaptive A* does not update the h-values of the states s that remained unexpanded, because f(s) ≤ f(s_goal) holds for all expanded states (see A* Property 3 in Section 2.3.4) but not necessarily for an unexpanded state. Thus, Inequality (2.2) is not guaranteed to hold for an unexpanded state s, and g(s_goal) - g(s) can be smaller than the immediately preceding h-value h(s). Updating these h-values could therefore decrease them, and the argument above does not hold for them.

Figure 2.10 gives the pseudo code of Eager Adaptive A* (see footnote 7), which extends the pseudo code of Forward Repeated A* in Figure 2.7. Eager Adaptive A* applies only to terrains where the action costs of the agent do not decrease between searches. Thus, Eager Adaptive A* can terminate when it finds that no path exists between the start state and the goal state after a search (Line 42), since it is guaranteed that Eager Adaptive A* will not be able to find any path in future searches. Similar to Forward Repeated A*, Eager Adaptive A* repeatedly performs Forward A* searches in ComputePath() (Lines 15 - 27). InitializeState(s) is executed when (1) the g-value and h-value of s are needed during a search (Line 20) and (2) s_start and s_goal are initialized before each search (Lines 36 - 37). It initializes the h-value of s to its user-provided h-value iff s has not yet been initialized by InitializeState(s) in any search (search(s) = 0) (Lines 04 and 06), and it initializes the g-value of s to infinity iff s has not been initialized by InitializeState(s) during the counter-th search (= the current search) (Lines 05 and 08). search(s) is set to counter each time InitializeState(s) is executed to indicate that the g-value of s has been initialized in the counter-th search and should not be re-initialized in the same search (Line 09).

The main difference between Forward Repeated A* and Eager Adaptive A* is as follows: After each search, Eager Adaptive A* updates the h-values of all states expanded in that search by executing Assignment (2.1) (Lines 43 - 44). Eager Adaptive A* has to update the h-values after the search rather than during the search because it has to know the value of g(s_goal) after the search. In order to store all expanded states of each search so that their h-values can be updated after the search, Eager Adaptive A* maintains a CLOSED list that stores all expanded states in each search (Lines 18 and 39).

Footnote 7: In this dissertation, all pseudo codes use the following functions to manage the CLOSED list unless otherwise specified: CLOSED.Insert(s) inserts state s into the CLOSED list; CLOSED.Delete(s) deletes state s from the CLOSED list.
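The h-value update of Eager Adaptive A* (Assignment (2.1), Lines 43 - 44 of Figure 2.10) amounts to one assignment per expanded state. A minimal sketch, assuming g is a dictionary of g-values and closed is the set of states expanded by the search that just terminated:

def update_h_values_eagerly(h, g, closed, s_goal):
    # After a search, every expanded state s satisfies g(s) = dist(s_start, s), so
    # g(s_goal) - g(s) is an admissible and consistent estimate of dist(s, s_goal).
    for s in closed:
        h[s] = g[s_goal] - g[s]  # Assignment (2.1); never smaller than the old h(s)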
I use the same example problem of agent navigation in unknown static terrains shown in Figure 2.9, except that the target does not move between searches, to demonstrate the operations of Eager Adaptive A*. Figures 2.12 and 2.13 illustrate the same agent naviga- tion problem solved by Forward Repeated A* and Eager Adaptive A*, respectively. All states have their search-value in the lower right corner. States that have been initialized in InitializeState() have their h-values in the lower left corner, g-values in the upper left corner and f-values in the upper right corner (shown in Figure 2.11). Expanded states during each search are shaded grey. All search algorithms break ties among states with the same f-values in favor of states with larger g-values, which is known to be a good tie-breaking strategy (Koenig & Likhachev, 2005). g search h f Figure 2.11: Legend of Figure 2.12 46 4 9 5 9 6 9 7 9 5 1 4 1 3 1 2 1 3 7 4 7 5 7 6 7 4 1 3 1 2 1 1 1 2 5 6 7 7 7 3 1 1 1 0 1 1 5 0 3 4 1 3 1 0 D A B C 1 2 3 4 S G (a) First Search counter = 1 4 9 5 9 6 9 7 9 5 1 4 1 3 1 2 1 3 7 4 7 5 7 6 7 4 1 3 1 2 1 1 1 2 5 6 7 7 7 3 1 1 1 0 1 1 5 0 3 4 1 3 1 0 1 2 3 4 D A B C S G (b) After First Search counter = 1 1 6 2 6 3 6 4 6 5 2 4 2 3 2 2 2 0 4 4 6 5 6 4 2 2 2 1 2 1 4 6 7 6 6 3 2 1 1 0 2 2 6 3 6 4 2 3 2 0 1 2 3 4 D A B C S G (c) Second Search counter = 2 Figure 2.12: Example Trace of Forward Repeated A* 4 9 5 9 6 9 7 9 5 1 4 1 3 1 2 1 3 7 4 7 5 7 6 7 4 1 3 1 2 1 1 1 2 5 6 7 7 7 3 1 1 1 0 1 1 5 0 3 4 1 3 1 0 D A B C 1 2 3 4 S G (a) First Search counter = 1 4 9 5 9 6 9 7 9 5 1 4 1 3 1 2 1 3 7 4 7 5 7 6 7 4 1 3 1 2 1 1 1 2 5 6 7 7 7 5 1 1 1 0 1 1 5 0 3 6 1 7 1 0 D A B C 1 2 3 4 S G (b) After First Search counter = 1 1 6 2 6 3 6 4 6 5 2 4 2 3 2 2 2 0 4 4 6 5 6 4 2 2 2 1 2 1 6 6 7 6 6 5 2 1 1 0 2 2 8 0 3 6 2 7 1 0 1 2 3 4 D A B C S G (c) Second Search counter = 2 Figure 2.13: Example Trace of Eager Adaptive A* For the rst search, Forward Repeated A* (shown in Figure 2.12(a)) and Eager Adap- tive A* (shown in Figure 2.13(a)) perform the same operations, since Eager Adaptive A* and Forward Repeated A* both use the user-providedh-values (= Manhattan Distances) to guide the search. After the rst search, Eager Adaptive A* (shown in Figure 2.13(b)) updates the h-values of all expanded states during the rst search (shaded grey) by executing Assign- ment (2.1). The expanded states have their updated h-values in the lower left corner. In contrast to Eager Adaptive A*, Forward Repeated A* does not update anyh-values after the rst search. The agent then moves along the cost-minimal path returned by the rst 47 search until it reaches B1, where it observes that obstacle B2 blocks its path. Then, the agent performs a second search to nd a new path from the current state of the agent B1 to the current state of the target C4. For the second search, Forward Repeated A* still uses the user provided h-values, while Eager Adaptive A* uses updated h-values for all states that were expanded during the rst search, which can be more informed. Figure 2.12(c) shows that Forward Repeated A* expands D1 and D2 because the user-provided h-values are misleading and form a local minimum. Eager Adaptive A* avoids expanding D1 and D2 during the second search (shown in Figure 2.13(c)) resulting in a smaller number of state expansions. Lazy Adaptive A* Eager Adaptive A* rst performs a Forward A* search and then updates theh-values of the states that were expanded during the search. 
A disadvantage of updating the h-values of all states that were expanded during the search is that one potentially updates the h-values of states that are not needed by future searches. Lazy Adaptive A* addresses this problem by postponing updating theh-values of the expanded states after one search and updates theh-values only when needed during a future search. Like Eager Adaptive A*, Lazy Adaptive A* can only perform forward searches, and it applies only to agent navigation where the target does not move between searches in known static terrains and unknown static terrains (see Section 1.1.2), where the action costs of the agent do not decrease between searches. Eager Adaptive A* and Lazy Adaptive A* share the same principle, and they use the same h-values for all states when they are needed during each search. Thus, they nd 48 01 function CalculateKey(s) 02 return g(s)+h(s); 03 procedure InitializeState(s) 04 if search(s) = 0 05 g(s) :=1; 06 h(s) :=H(s;s goal ); 07 else if search(s)6=counter 08 if g(s)+h(s)<pathcost(search(s)) 09 h(s) :=pathcost(search(s))g(s); 10 g(s) :=1; 11 search(s) :=counter; 12 procedure UpdateState(s) 13 if s2OPEN 14 OPEN.Update(s, CalculateKey(s)); 15 else 16 OPEN.Insert(s, CalculateKey(s)); 17 function ComputePath() 18 while OPEN.TopKey()< CalculateKey(s goal ) 19 s :=OPEN.Pop(); 20 for all s 0 2Succ(s) 21 InitializeState(s 0 ); 22 if g(s 0 )>g(s)+c(s;s 0 ) 23 g(s 0 ) :=g(s)+c(s;s 0 ); 24 parent(s 0 ) :=s; 25 UpdateState(s 0 ); 26 if OPEN =; 27 return false; 28 return true; 29 function Main() 30 counter := 0; 31 s start := the current state of the agent; 32 s goal := the current state of the target; 33 for all s2S 34 search(s) := 0; 35 while s start 6=s goal 36 counter :=counter+1; 37 InitializeState(s start ); 38 InitializeState(s goal ); 39 g(s start ) := 0; 40 OPEN :=;; 41 OPEN.Insert(s start , CalculateKey(s start )); 42 if ComputePath() = false 43 return false; /* Target cannot be caught*/ 44 pathcost(counter) :=g(s goal ); 45 while target not caught AND action costs on path do not increase 46 agent follows the path from s start to s goal ; 47 if agent caught target 48 return true; 49 s start := the current state of the agent; 50 update the increased action costs (if any); 51 return true; Figure 2.14: Pseudo Code of Lazy Adaptive A* the same paths and move the agent along the same trajectory when using the same tie- breaking strategy. The only dierence between them is that Lazy Adaptive A* updates the h-values of the states only when they are needed by future searches. 49 Figure 2.14 gives the pseudo code of Lazy Adaptive A*. Lazy Adaptive A* remembers some information during each search, such as the g-values (Lines 23 and 39) and search- values of states (Line 11), and some information after each search, such as the g-value of the goal state (Lines 44), and then uses this information to calculate the h-value of a state only when it is needed by a future search. During the counter-th search (= the current search), InitializeState(s) is called to initialize theg- andh-values ofs (Lines 21, 37 and 38): If h(s) has not yet been initialized by any search before (search(s) = 0), Lazy Adaptive A* initializesh(s) with the user-providedh-value (Line 06). 
Otherwise, if search(s)6= counter, which indicates thath(s) has not been calculated during the current search (Line 07), Lazy Adaptive A* checks whethers was expanded during the search(s)- th search, where h(s) was last calculated: If g(s) +h(s)< pathcost(search(s)) (Line 08), then the f-value of s (= g(s) +h(s)) is smaller than the g-value of the goal state during the search(s)-th search (Line 44). Since the g-value of the goal state is always identical to the f-value of the goal state (with consistent h-values), the f-value of s is smaller than thef-value of the goal state during the search(s)-th search. Thus,s must have been expanded during the search(s)-th search according to A* Property 3 (see Section 2.3.4). Lazy Adaptive A* updatesh(s) by executingh(s) := pathcost(search(s))g(s) (Line 09). If g(s) +h(s)> pathcost(search(s)), then h(s) is not updated, since s was not expanded during the search(s)-th search. If g(s) +h(s) = pathcost(search(s)), then h(s) is not updated either, since updating it by executing Line 09 will not change its value. g(s) is initialized to innity if s has not been initialized during the current search (Lines 05 and 10). Finally, search(s) is set to counter each time InitializeState(s) is executed to 50 indicate that theg-value ofs has been initialized, and should not be re-initialized during the same search (Line 11). g search h f Figure 2.15: Legend of Figure 2.16 I now use the same example path planning problem for Eager Adaptive A* shown in Figure 2.13 to demonstrate the behavior of Lazy Adaptive A*. Figure 2.16 shows the behavior of Lazy Adaptive A*: All states have their search-value in the lower right corner. States that have been initialized in InitializeState() also have their g-values in the upper left corner, h-values in the lower left corner, and f-values in the upper right corner (shown in Figure 2.15). Expanded states in each search are shaded grey. Figure 2.16(a) shows the rst search of Lazy Adaptive A*, that searches from the cur- rent state of the agent D2 to the current state of the target C4. For the rst search, Lazy Adaptive A* expands the same states as Eager Adaptive A* (shown in Figure 2.13(a)), since Eager Adaptive A* and Lazy Adaptive A* both use the user-provided h-values (= Manhattan Distances) to guide the search. After the rst search, Lazy Adaptive A* returns a cost-minimal path with path cost 7, and thus pathcost(1) = 7 (Line 44). In contrast to Eager Adaptive A* (shown in Figure 2.13(b)), Lazy Adaptive A* does not update any h-values directly after the rst search (shown in Figure 2.16(b)). The agent then moves along the cost-minimal path returned by the rst search until it reaches B1, where it observes that obstacle B2 blocks 51 4 9 5 9 6 9 7 9 5 1 4 1 3 1 2 1 3 7 4 7 5 7 6 7 4 1 3 1 2 1 1 1 2 5 6 7 7 7 3 1 1 1 0 1 1 5 0 3 4 1 3 1 0 D A B C 1 2 3 4 S G (a) First Search counter = 1 4 9 5 9 6 9 7 9 5 1 4 1 3 1 2 1 3 7 4 7 5 7 6 7 4 1 3 1 2 1 1 1 2 5 6 7 7 7 3 1 1 1 0 1 1 5 0 3 4 1 3 1 0 1 2 3 4 D A B C S G (b) After First Search counter = 1 pathcost(1) = 7 1 6 2 6 3 6 4 6 5 2 4 2 3 2 2 2 0 4 4 6 5 6 4 2 2 2 1 2 1 6 6 7 6 6 5 2 1 1 0 2 2 8 0 3 6 2 3 1 0 D A B C 1 2 3 4 S G (c) Second Search counter = 2 pathcost(1) = 7 Figure 2.16: Example Trace of Lazy Adaptive A* its path. Then, the agent performs a second search to replan a new path from the current state of the agent B1 to the current state of the target C4 (shown in Figure 2.16(c)). 
During the second search, Lazy Adaptive A* updates the h-values of some states expanded during the rst search to make them more informed. An example is D1: InitializeState(D1) updates its h-value from 4 to 6 by using Assignment (2.1) (Lines 07 - 09), which is more informed than the user-provided h-value. Lazy Adaptive A* up- dates the h-values of the expanded states of the rst search only when they are needed during a search. Thus, Lazy Adaptive A* might update theh-values of a smaller number of states than Eager Adaptive A*. For example, the h-value of D2 is updated from 3 to 7 by Eager Adaptive A* (shown in Figure 2.13(c)) but is not updated by Lazy Adaptive A*. 2.4.2.2 D* Lite D* Lite (Koenig & Likhachev, 2002) is a search tree transforming incremental search algorithm designed for stationary target search. D* Lite requires the root of the search 52 01 function CalculateKey(s) 02 return [min(g(s);rhs(s))+H(s goal ;s)+km;min(g(s);rhs(s))]; 03 procedure Initialize() 04 OPEN :=;; 05 km := 0; 06 for all s2S 07 rhs(s) :=g(s) :=1; 08 parent(s) :=NULL; 09 s start := the current state of the target; 10 s goal := the current state of the agent; 11 rhs(s start ) := 0; 12 OPEN.Insert(s start , CalculateKey(s start )); 13 procedure UpdateState(u) 14 if g(u)6=rhs(u) AND u2OPEN 15 OPEN.Update(u, CalculateKey(u)); 16 else if g(u)6=rhs(u) AND u = 2OPEN 17 OPEN.Insert(u, CalculateKey(u)); 18 else if g(u) =rhs(u) AND u2OPEN 19 OPEN.Delete(u); 20 function ComputePath() 21 while OPEN.TopKey()< CalculateKey(s goal ) OR rhs(s goal )>g(s goal ) 22 u :=OPEN.Top(); 23 k old :=OPEN.TopKey(); 24 knew := CalculateKey(u); 25 if k old <knew 26 OPEN.Update(u,knew); 27 else if g(u)>rhs(u) 28 g(u) :=rhs(u); 29 OPEN.Delete(u); 30 for all s2Pred(u) 31 if s6=s start AND rhs(s)>g(u)+c(s;u) 32 parent(s) :=u; 33 rhs(s) :=g(u)+c(s;u); 34 UpdateState(s); 35 else 36 g(u) :=1; 37 for all s2Pred(u)[fug 38 if s6=s start AND parent(s) =u 39 rhs(s) := min s 0 2Succ(s) (g(s 0 )+c(s;s 0 )); 40 if rhs(s) =1 41 parent(s) :=NULL; 42 else 43 parent(s) := argmin s 0 2Succ(s) (g(s 0 )+c(s;s 0 )); 44 UpdateState(s); 45 if rhs(s goal ) =1 46 return false; 47 return true; Figure 2.17: Pseudo Code of D* Lite (Part 1) tree to remain unchanged. Thus, for stationary target search, D* Lite performs backward searches (and cannot perform forward searches) by assigning the current states of the agent and target to the goal and start states of each search, respectively. Thus, the goal state of each search can change between searches since the agent can move between searches. However, the start state of each search does not change between searches since 53 48 function Main() 49 Initialize(); 50 while s start 6=s goal 51 s oldgoal :=s goal ; 52 if ComputePath() = true 53 while target not caught AND action costs do not changed 54 agent follows the path from s goal to s start ; 55 if agent caught target 56 return true; 57 s goal := the current state of the agent; 58 km :=km +H(s oldgoal ;s goal ); 59 else 60 wait until some action costs decrease; 61 for all actions whose costs changed from c 0 (u;v) to c(u;v) 62 if c 0 (u;v)>c(u;v) 63 if u6=s start AND rhs(u)>g(v)+c(u;v) 64 parent(u) :=v; 65 rhs(u) :=g(v)+c(u;v); 66 UpdateState(u); 67 else 68 if u6=s start AND parent(u) =v 69 rhs(u) := min s 0 2Succ(u) (g(s 0 )+c(u;s 0 )); 70 if rhs(u) =1 71 parent(u) :=NULL; 72 else 73 parent(u) := argmin s 0 2Succ(u) (g(s 0 )+c(u;s 0 )); 74 UpdateState(u); 75 return true; Figure 2.18: Pseudo Code of D* Lite (Part 2) the target does not move between searches. 
D* Lite maintains a cost-minimal path between the start and goal states of each search in terrains where the action costs of the agent can both increase and decrease between searches. Thus, D* Lite applies to agent navigation where the target does not move between searches in all three classes of terrains (see Section 1.1.2). However, it is still unknown how to apply D* Lite to moving target search, where both the start and goal states can change between searches, and make it run faster than A*.

Figures 2.17 and 2.18 show the pseudo code of D* Lite (see footnote 8). D* Lite maintains an h-value, g-value and parent pointer for every state s, with semantics similar to those used by A*, but it also maintains an rhs-value. The rhs-value of s is defined to be

rhs(s) = c, if s = s_start (Eq. 1)
rhs(s) = min_{s' ∈ Succ(s)} (g(s') + c(s, s')), otherwise (Eq. 2)

where c = 0 for all searches. Thus, the rhs-value is basically a one-step lookahead g-value. A state s is called locally inconsistent iff g(s) ≠ rhs(s). A locally inconsistent state s is called locally underconsistent iff g(s) < rhs(s), and locally overconsistent iff g(s) > rhs(s). Finally, the parent pointer of state s is defined to be

parent(s) = NULL, if s = s_start or rhs(s) = ∞ (Eq. 3)
parent(s) = argmin_{s' ∈ Succ(s)} (g(s') + c(s, s')), otherwise (Eq. 4)

The main routine of D* Lite is performed in ComputePath(), which calculates a cost-minimal path from s_goal to s_start. The OPEN list of A* is a priority queue that allows A* to always expand a state with the smallest f-value in it. D* Lite also maintains an OPEN list with similar semantics. Its OPEN list always contains exactly the locally inconsistent states, whose g-values need to be updated to make the states locally consistent.

Footnote 8: The key k(s) of a state s in the OPEN list roughly corresponds to the f-value used by A* and is the pair k(s) = [k1(s), k2(s)], where k1(s) = min(g(s), rhs(s)) + H(s_goal, s) + k_m and k2(s) = min(g(s), rhs(s)) (Line 02). The pseudo code uses the following functions to manage the OPEN list: OPEN.Top() returns a state with the smallest key of all states in the OPEN list. OPEN.TopKey() returns the smallest key of all states in the OPEN list (if the OPEN list is empty, OPEN.TopKey() returns [∞, ∞]). OPEN.Insert(s, k) inserts state s into the OPEN list with key k. OPEN.Update(s, k) changes the key of state s in the OPEN list to k. OPEN.Delete(s) deletes state s from the OPEN list.

Keys are compared according to a lexicographical order: a key k(s) = [k1(s), k2(s)] is less than or equal to a key k'(s) = [k1'(s), k2'(s)], denoted by k(s) ≤ k'(s), iff either k1(s) < k1'(s) or (k1(s) = k1'(s) and k2(s) ≤ k2'(s)). The first component of the key, k1(s), corresponds to the f-value f(s) := g(s) + h(s) used by A* (footnote 9), because both the g-values and rhs-values of D* Lite correspond to the g-values of A* and the h-values of D* Lite correspond to the h-values of A* (footnote 10). The second component of the key, k2(s), corresponds to the g-value of A*. D* Lite always expands a state in the OPEN list (by executing Lines 27 - 44) with the smallest k1-value, which corresponds to the f-value of A*, breaking ties in favor of the state with the smallest k2-value, which corresponds to the g-value of A*. This is similar to a version of A* that always expands a state in the OPEN list with the smallest f-value, breaking ties towards smallest g-values.
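Eqs. 1 - 4, the local consistency test, and the key from Line 02 of Figure 2.17 can be written down compactly. The following Python sketch is only an illustration under the assumption that g and rhs are dictionaries and succ, cost and H are the usual user-supplied functions; it is not the dissertation's implementation.

def compute_rhs(s, s_start, g, succ, cost):
    # Eq. 1 and Eq. 2: the rhs-value is a one-step lookahead g-value.
    if s == s_start:
        return 0
    return min((g.get(t, float('inf')) + cost(s, t) for t in succ(s)),
               default=float('inf'))

def locally_consistent(s, g, rhs):
    # A state is locally inconsistent iff its g-value and rhs-value differ.
    return g.get(s, float('inf')) == rhs.get(s, float('inf'))

def calculate_key(s, g, rhs, H, s_goal, k_m):
    # Line 02 of Figure 2.17; Python tuples compare lexicographically, so the usual
    # < and <= comparisons implement the key ordering described above.
    m = min(g.get(s, float('inf')), rhs.get(s, float('inf')))
    return (m + H(s_goal, s) + k_m, m)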
Before ComputePath() is called, states can have arbitrary g-values but their rhs- values and parent pointers have to satisfy Eqs. 1-4 and the OPEN list has to contain 9 Since D* Lite performs backward search, it calculates the user-provided h-values by using H(s goal ;s), which estimates dist (s goal ;s), the minimum cost of moving from s goal (= the current state of the agent) to s of a search. 10 Every time ComputePath() is performed to nd a cost-minimal path, D* Lite could recalculate the keys of the states in the OPEN list when the agent observes a change in action costs after it has moved. Since the goal state changes between searches after the agent has moved and the h-values of all states in theOPEN list were calculated with respect to the goal state of the previous search, theh-values need to be updated with respect to the current goal state. However, the repeated reordering of the OPEN list can be computationally expensive since the OPEN list often contains a large number of states. D* Lite therefore uses a method derived from focussed D* (Stentz, 1995) to avoid having to reorder the keys of all states in the OPEN list, namely keys that are lower bounds on the keys that D* Lite uses for the corresponding states: After the agent has moved from the goal state of the previous search s oldgoal to the current goal state s goal , where it observes changes in action costs, the rst component of the keys can have decreased by at most H(s oldgoal ;s goal ). (The second component does not depend on the h-values and thus remains unchanged.) Thus, in order to maintain lower bounds, D* Lite needs to subtract H(s oldgoal ;s goal ) from the rst component of the keys of all states in theOPEN list. However, sinceH(s oldgoal ;s goal ) is the same for all states in the OPEN list, the order of the states in the OPEN list does not change if the subtraction is not performed. Then, when new keys are computed, their rst components are by H(s oldgoal ;s goal ) too small relative to the keys in the OPEN list. Thus, H(s oldgoal ;s goal ) has to be added to their rst components every time some action costs change. If the agent moves again and then observes action cost changes again, then the constants need to get added up. D* Lite maintains a variable km for this method (Lines 02, 05 and 58) (Koenig & Likhachev, 2002). 56 exactly all locally inconsistent states. ComputePath() terminates when s goal is not lo- cally underconsistent and its key is less than or equal to OPEN.TopKey() (Line 21). If rhs(s goal ) =1, then D* Lite terminates with no path found (Line 46). Otherwise, D* Lite nds a cost-minimal path from s start to s goal . The principle of D* Lite can be summarized as follows: D* Lite initially nds a cost- minimal path from the start state to the goal state. When some action costs change between searches, all states s whose rhs-values are aected by these changes have their rhs-values updated and are inserted into the OPEN list if they become locally inconsistent and are not in the OPEN list yet, or removed from the OPEN list if they become locally consistent and were in the OPEN list before. D* Lite then propagates the eects of these rhs-value changes to the rest of the state space until ComputePath() terminates, upon which all states s whose keys are smaller than the key of s goal are locally consistent, and rhs(s) and g(s) are equal to dist (s;s start ). g rhs Figure 2.19: Legend of Figure 2.20 Figure 2.20 gives an example stationary target search problem using D* Lite in known dynamic grids. 
For ease of illustration, I use H(s goal ;s) (= user-provided h-values for D* Lite) that are zero for all states and do not show them in Figure 2.20. All states have their g-values in the upper left corner and their rhs-values in the upper right corner (shown in Figure 2.19). Expanded states are shaded grey. The state of the target is B1 (marked S), and the current state of the agent is D5 (marked G). The rst search of D* Lite runs 57 1 1 2 2 3 3 4 4 5 5 0 0 1 1 2 2 6 6 1 1 2 2 3 3 7 7 2 2 3 3 4 4 8 3 3 4 4 5 5 5 1 2 3 4 E A B C D S G (a) First Search 1 1 2 2 3 3 4 4 5 5 0 0 1 1 2 2 3 6 6 1 1 2 2 3 3 7 7 2 2 3 3 4 4 8 3 3 4 4 5 5 E A B C D 5 1 2 3 4 S G (b) After B4 Unblocked 1 1 2 2 3 3 4 4 5 5 0 0 1 1 2 2 3 3 6 4 1 1 2 2 3 3 7 7 2 2 3 3 4 4 8 3 3 4 4 5 5 5 1 2 3 4 E A B C D S G (c) After B4 Expanded 1 1 2 2 3 3 4 4 5 5 0 0 1 1 2 2 3 3 4 4 1 1 2 2 3 3 7 5 2 2 3 3 4 4 8 3 3 4 4 5 5 E A B C D 5 1 2 3 4 S G (d) After B5 Expanded Figure 2.20: Example Trace of D* Lite with B1 and D5 as the start and goal states, respectively (shown in Figure 2.20(a)). The cost-minimal path is B1, B2, B3, A3, A4, A5, B5, C5 and D5. The agent then moves along this path to C5, at which point in time B4 becomes unblocked, which changes the costs c(B3,B4), c(A4,B4), c(B4,B5), c(B4,B3), c(B4,A4) and c(B5,B4) from innity to one (shown in Figure 2.20(b)). D* Lite then updates the rhs-values and parent pointers of the states aected by the action cost changes by using Eqs. 1-4: D* Lite updates the rhs-value and parent pointer of B4 to 3 and B3 respectively, and inserts B4 into the OPEN list. The OPEN list then contains all locally inconsistent states, which provides a starting point for the second search. With all locally inconsistent states in the OPEN list, D* Lite then propagates the rhs-value changes to the states whose rhs-values need to be updated by executing ComputePath() (Lines 20 - 47). The second search of D* Lite then runs with B1 and C5 as its start and goal states, respectively, and performs only two state expansions B4 (shown in Figure 2.20(c)) and B5 (shown in Figure 2.20(d)). The cost-minimal path is B1, B2, B3, B4, B5 and C5. The agent then moves along this path until it reaches B1. D* Lite is ecient for path planning for stationary target search in unknown static terrain because an agent can usually observe action cost changes only within its sensor range. Thus, the action cost changes are commonly close to the agent. Since D* Lite 58 performs backward searches by assigning the current states of the agent and target to the goal and start states of each search, respectively, these changes are usually close to the goal state and far away from the start state (= the root of the search tree). Thus, D* Lite only needs to propagate the rhs-value changes to a small number of states. It is possible for D* Lite and its variants to be slower than A* for some easy navigation problems, where the agent catches the target with only a small number of searches (Hern andez, Baier, Uras, & Koenig, 2012) or if the changes are close to the root of the search tree, in which case D* Lite can even expand more states than an A* search that searches from scratch. It has been demonstrated that D* Lite can achieve a speedup of one to two orders of magnitude over repeated A* searches for agent navigation in unknown static terrains (Koenig & Likhachev, 2002). However, it is still unknown how to adapt D* Lite to moving target search, where both the start and goal states can change between searches, and make it run faster than A*. 
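The way D* Lite absorbs an action cost change in an example like the one above mirrors Lines 61 - 74 of Figure 2.18: a decreased cost can only lower the rhs-value of the source state of the action, while an increased cost forces the rhs-value to be recomputed if the changed action was the one the parent pointer relied on. A simplified sketch, reusing the hypothetical compute_rhs helper from the earlier D* Lite sketch and an update_state callback that restores the OPEN list invariant:

def on_cost_change(u, v, old_cost, new_cost, s_start, g, rhs, parent,
                   succ, cost, update_state):
    # The cost of the action from u to v changed from old_cost to new_cost
    # (the cost() function already returns the new value).
    if u == s_start:
        return
    inf = float('inf')
    if new_cost < old_cost:
        # A cheaper action can only lower rhs(u).
        if rhs.get(u, inf) > g.get(v, inf) + new_cost:
            rhs[u] = g.get(v, inf) + new_cost
            parent[u] = v
    elif parent.get(u) == v:
        # rhs(u) relied on the action whose cost increased: recompute it via Eq. 2.
        rhs[u] = compute_rhs(u, s_start, g, succ, cost)
        parent[u] = None if rhs[u] == inf else min(
            succ(u), key=lambda t: g.get(t, inf) + cost(u, t))
    update_state(u)  # insert, reorder or remove u in the OPEN list as needed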
2.4.2.3 Differential A*

Differential A* (Trovato, 1990) is a search tree transforming incremental search algorithm that was designed for agent navigation in terrains where the action costs of the agent can both increase and decrease between searches. Differential A* can perform either forward or backward searches since it applies to search problems where both the start and goal states can change between searches. Thus, it applies to moving target search in all three classes of terrains (see Section 1.1.2).

The principle of Differential A* is as follows: Differential A* defines an "active transition" for each s ∈ S in the search tree, which is the transition among all transitions from s' ∈ Pred(s) to s that minimizes g(s') + c(s', s). When the start state or action costs change between searches, Differential A* uses a "Difference Engine" to perform a case-by-case analysis (with 14 cases) to determine the potentially affected active transitions and the states involved. All potentially affected active transitions and the involved states are deleted from the search tree of the previous search. Differential A* then starts a new search with the remaining search tree of the previous search and thus does not have to construct the search tree for the new search from scratch. Differential A* is efficient when the number of potentially affected active transitions is small and hence a large portion of the search tree of the previous search can be reused for the current search. When the start state changes between searches, Differential A* deletes the previous start state and creates the new start state. Deleting the previous start state potentially affects the active transitions of all states in the search tree of the previous search, and Differential A* has to recursively delete the search tree of the previous search. Thus, Differential A* cannot reuse any part of the search tree from the previous search in this case (but has to spend effort deleting the search tree from the previous search) and hence can be slower than A* for moving target search, where both the start and goal states can change over time.

2.5 Evasion Algorithms for the Targets

Although most research on moving target search has focused on developing efficient path planning algorithms for the agent to catch the target, there are several path planning algorithms for the target to avoid the agent, referred to as "evasion algorithms for the targets." Evasion algorithms for the targets generally fall into two classes:

The first class is a class of offline algorithms. Offline algorithms, such as minimax search, can be used for the target to avoid the agent for as long as possible. Unfortunately, even if well optimized, offline algorithms do not scale to large terrains (Moldenhauer & Sturtevant, 2009a).

The second class is a class of online algorithms. Online algorithms, such as MTS (Ishida & Korf, 1991), have been used to guide the target to a state as far away from the current state of the agent as possible (see the "Experimental Evaluation Section" in (Ishida & Korf, 1991)). Four simple algorithms are used in (Goldenberg, Kovarsky, Wu, & Schaeffer, 2003) to guide the target: (1) moving to a state that maximizes the minimum cost of moving from the current state of the agent to the state, (2) moving to a state that maximizes the move choices of the target and thus its mobility, (3) moving to a state that is not in line of sight of the agent, and (4) moving randomly.
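As an illustration of strategy (1), the following sketch moves the target to the neighboring state that maximizes the minimum cost of moving from the current state of the agent to that state. It is a minimal sketch under the assumption of uniform action costs (so that breadth-first search yields the minimum costs); the function names bfs_distances and evade_maximize_agent_distance, and the convention that the target may also stay in its current state, are my own and not taken from the cited work.

    from collections import deque

    def bfs_distances(source, succ):
        """Breadth-first distances from `source`, assuming unit action costs."""
        dist = {source: 0}
        queue = deque([source])
        while queue:
            s = queue.popleft()
            for t in succ(s):
                if t not in dist:
                    dist[t] = dist[s] + 1
                    queue.append(t)
        return dist

    def evade_maximize_agent_distance(target_state, agent_state, succ):
        """Strategy (1): move the target to the successor that maximizes the
        minimum cost of moving from the current state of the agent to it."""
        dist_from_agent = bfs_distances(agent_state, succ)
        candidates = list(succ(target_state)) + [target_state]   # staying put is allowed here
        # States the agent cannot reach at all are treated as maximally safe.
        return max(candidates, key=lambda s: dist_from_agent.get(s, float('inf')))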
The cover heuristic algorithm (Isaza, Lu, Bulitko, & Greiner, 2008) is a path planning algorithm for the target to avoid the agent: It computes, for each successor state of the current state of the target, the number of states in the state space that the target can move to before the agent, referred to as the "target's cover" of that successor state. The algorithm then moves the target to the successor state with the largest target's cover.

TrailMax (Moldenhauer & Sturtevant, 2009a) is a recent evasion algorithm for the target to avoid the agent. TrailMax generates paths for the target that maximize the minimum cost of the trajectory of the agent to catch the target in known static terrains. TrailMax interleaves two independent searches by using Dijkstra's algorithm starting from the current state of the agent, referred to as the agent's search, and the current state of the target, referred to as the target's search, respectively. Two priority queues are maintained, one for the agent's search and one for the target's search. The speeds of the agent and the target are v_a and v_t, respectively. Each state s maintains two versions of g-values: (1) g_a(s) is the g-value of s in the agent's search, which is the minimum cost of moving from the current state of the agent to s. (2) g_t(s) is the g-value of s in the target's search, which is the minimum cost of moving from the current state of the target to s. Initially, the g_a-value of the current state of the agent and the g_t-value of the current state of the target are zero, and the g_a- and g_t-values of all other states are infinity. TrailMax repeatedly performs the following procedure: The agent's search and the target's search take turns expanding all states with a given g-value (starting from 0), starting with the agent's search. In the agent's search, states are expanded as in Dijkstra's algorithm. In the target's search, whenever a state s is deleted from the priority queue for expansion, TrailMax checks whether g_t(s)/v_t >= g_a(s)/v_a. If this is the case, then s is discarded and not expanded in the target's search, because the agent can move to s before the target and catch the target at s. Otherwise, s is expanded as in Dijkstra's algorithm. If all states that have been expanded in the target's search have also been expanded in the agent's search, then TrailMax terminates, and the last state that was expanded in the agent's search is the state that the target moves to. Otherwise, TrailMax sets the g-value to the smallest value among the g_a- and g_t-values of all states in the two priority queues and repeats the procedure.

In this dissertation, I will evaluate the runtime of the developed incremental search algorithms by applying them to moving target search with different movement strategies of the target: the target either repeatedly moves to a randomly chosen state (with the restriction that there exists a path from the current state of the target to the randomly chosen state), or it uses TrailMax to avoid the agent. By doing so, I demonstrate the runtime advantage of the developed incremental search algorithms over Repeated A* and all competing incremental search algorithms with respect to different movement strategies of the target.

2.6 Summary

In this chapter, I introduced background knowledge of path planning for agent navigation. In Section 2.1, I introduced the notation used in this dissertation.
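The following Python sketch illustrates the TrailMax procedure just described: two Dijkstra searches, one from the current state of the agent and one from the current state of the target, are expanded in interleaved layers of increasing g-value; states that the agent reaches no later than the target are pruned from the target's search; and the last state expanded in the agent's search is returned once every state expanded in the target's search has also been expanded in the agent's search. This is a simplified illustration rather than the authors' implementation; the names trailmax, succ and cost are placeholders, states are assumed to be orderable (e.g., (row, col) tuples) so that they can break heap ties, and stale heap entries are handled by lazy deletion.

    import heapq

    INF = float('inf')

    def trailmax(agent, target, succ, cost, v_a=1.0, v_t=1.0):
        """One TrailMax decision: return the state the target should move toward."""
        g = {'a': {agent: 0.0}, 't': {target: 0.0}}
        heap = {'a': [(0.0, agent)], 't': [(0.0, target)]}
        closed = {'a': set(), 't': set()}
        last_agent_state = agent

        def expand_layer(side, threshold):
            nonlocal last_agent_state
            while heap[side] and heap[side][0][0] <= threshold:
                key, s = heapq.heappop(heap[side])
                if s in closed[side] or key > g[side][s]:
                    continue                    # stale heap entry
                if side == 't' and g['t'][s] / v_t >= g['a'].get(s, INF) / v_a:
                    continue                    # agent reaches s no later than the target: prune
                closed[side].add(s)
                if side == 'a':
                    last_agent_state = s
                for s2 in succ(s):
                    new_g = g[side][s] + cost(s, s2)
                    if new_g < g[side].get(s2, INF):
                        g[side][s2] = new_g
                        heapq.heappush(heap[side], (new_g, s2))

        threshold = 0.0
        while True:
            expand_layer('a', threshold)        # the agent's search expands first
            expand_layer('t', threshold)
            if closed['t'] <= closed['a']:      # every target expansion also expanded by the agent
                return last_agent_state         # state the target heads for
            keys = [k for h in heap.values() for (k, _) in h]
            if not keys:
                return last_agent_state
            threshold = min(keys)

Intuitively, with equal speeds this makes the target head for a state that it can still reach strictly before the agent and that lies as deep as possible in the agent's search, which is what maximizes the cost of the agent's catching trajectory.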
In Section 2.2, I described a way of representing path planning problems by using discretization, so that they can be solved with search algorithms. In Section 2.3, I described A*, a popular search algorithm in artificial intelligence. All incremental search algorithms developed in this dissertation are based on A*. In Section 2.4, I described existing Repeated A*-based path planning algorithms, including real-time and incremental search algorithms, and described their advantages and disadvantages. Incremental search algorithms have been a great success for many applications of stationary target search. However, there does not exist any incremental search algorithm that applies to moving target search and runs faster than Repeated A* in terrains where the action costs of the agent can change over time. Thus, the objective of my dissertation is to develop new incremental search algorithms for moving target search that run faster than Repeated A*. In Section 2.5, I described evasion algorithms for the targets to avoid the agent in moving target search. I will evaluate the runtime of the developed incremental search algorithms by applying them to moving target search with different movement strategies of the target.

Chapter 3: Incremental Search by Learning h-values

In this chapter, I introduce heuristic learning incremental search algorithms for moving target search. In Section 2.4.2.1, I reviewed Adaptive A* (Koenig & Likhachev, 2005), which was designed for stationary target search in unknown static terrains. Thus far, there does not exist a heuristic learning incremental search algorithm that applies to moving target search in all three classes of terrains (see Section 1.1.2). Thus, I develop GAA* (Sun et al., 2008), a new heuristic learning incremental search algorithm for moving target search. Table 3.1 illustrates the search problems that Adaptive A*, MT-Adaptive A* and GAA* apply to: Adaptive A* applies to search problems where the start state can change and the action costs do not decrease between searches. It does not apply to search problems where the goal state can change or the action costs can decrease between searches.

Algorithms | Action Costs Can Increase Between Searches | Action Costs Can Decrease Between Searches | Start State Can Change Between Searches | Goal State Can Change Between Searches
Adaptive A* | Yes | No | Yes | No
MT-Adaptive A* | Yes | No | Yes | Yes
GAA* | Yes | Yes | Yes | Yes

Table 3.1: Search Problems that Heuristic Learning Incremental Search Algorithms Apply to

Thus, Adaptive A* can only perform forward (but not backward) searches by assigning the current states of the agent and the target to the start and goal states of each search, respectively, for stationary target search. Adaptive A* applies only to stationary target search in known static terrains and unknown static terrains (see Section 1.1.2). MT-Adaptive A* extends Adaptive A* to search problems where both the start and goal states can change between searches. Thus, MT-Adaptive A* can either perform forward searches by assigning the current states of the agent and the target to the start and goal states of each search, respectively, referred to as Forward Moving Target Adaptive A* (Forward MT-Adaptive A*), or perform backward searches by assigning the current states of the agent and the target to the goal and start states of each search, respectively, referred to as Backward Moving Target Adaptive A* (Backward MT-Adaptive A*).
However, MT-Adaptive A* still does not apply to search problems where the action costs can decrease between searches. Thus, MT-Adaptive A* applies only to moving target search in known static terrains and unknown static terrains (see Section 1.1.2). GAA* extends MT-Adaptive A* to search problems where both the start and goal states can change and the action costs can both increase and decrease between searches. Thus, GAA* can either perform forward searches by assigning the current states of the agent and the target to the start and goal states of each search, respectively, referred to as Forward Generalized Adaptive A* (Forward GAA*), or perform backward searches by assigning the current states of the agent and the target to the goal and start states of each search, respectively, referred to as Backward Generalized Adaptive A* (Backward GAA*). GAA* applies to moving target search in all three classes of terrains (see Section 1.1.2).

The structure of this chapter is as follows: In Section 3.1, I review MT-Adaptive A* (Koenig et al., 2007), which extends Adaptive A* to moving target search by maintaining the consistency of the h-values with respect to the goal state after it changed between searches. In Section 3.2, I introduce GAA* (Sun et al., 2008), which extends MT-Adaptive A* to moving target search in terrains where the action costs can both increase and decrease between searches. In Section 3.3, I provide a summary of the heuristic learning incremental search algorithms.

3.1 Review of Moving Target Adaptive A*

MT-Adaptive A* (Koenig et al., 2007) extends Adaptive A* to moving target search by maintaining the consistency of the h-values with respect to the goal state after it changed between searches. In Section 3.1.1, I first review an eager version of MT-Adaptive A*, referred to as Eager Moving Target Adaptive A* (Eager MT-Adaptive A*) (Koenig et al., 2007). In Section 3.1.2, I then review a more complicated version, referred to as Lazy Moving Target Adaptive A* (Lazy MT-Adaptive A*) (Koenig et al., 2007).

3.1.1 Eager Moving Target Adaptive A*

In this section, I review Eager MT-Adaptive A* (Koenig et al., 2007), which extends Eager Adaptive A* to moving target search.

3.1.1.1 Applicability

Eager MT-Adaptive A* can perform either forward or backward searches, and it applies only to moving target search in known static terrains and unknown static terrains (see Section 1.1.2). In this section, I review only Forward Eager MT-Adaptive A*, since it is trivial to switch the search direction, resulting in Backward Eager MT-Adaptive A*.

3.1.1.2 Principle

It is not straightforward to extend Eager Adaptive A* to moving target search since the h-values need to be consistent with respect to the goal state. In moving target search, where the goal state can change between searches, the h-values have to be corrected in order to maintain their consistency with respect to the goal state after it changed between searches, which is the issue addressed by Forward Eager MT-Adaptive A*. The principle behind Forward Eager MT-Adaptive A* is as follows: Assume that the goal state changes from s_goal to s'_goal with s'_goal ≠ s_goal after a search. For all s ∈ S, h(s) is the h-value of s immediately after the previous search. Assume that the h-values of all s ∈ S are consistent with respect to s_goal.
Forward Eager MT-Adaptive A* then corrects the h-values of all s ∈ S to make them consistent with respect to s'_goal by assigning

    h(s) := max(H(s, s'_goal), h(s) - h(s'_goal)).    (3.1)

It has been shown that the h-values of all s ∈ S are consistent with respect to s'_goal after the assignment (Koenig et al., 2007). Moreover, taking the maximum of the user-provided h-value H(s, s'_goal) and h(s) - h(s'_goal) ensures that the h-values used by Forward Eager MT-Adaptive A* are at least as informed as the user-provided h-values with respect to s'_goal. Thus, a search of Forward Eager MT-Adaptive A* with the new h-values cannot expand more states than a search of A* with the user-provided h-values with respect to s'_goal (with the same tie-breaking strategy). However, it is not the case that Forward Eager MT-Adaptive A* is guaranteed to reduce its runtime per search more and more over time, since the h-values are not guaranteed to become more and more informed: updating them makes them potentially more informed, while correcting them makes them potentially less informed. In the following, I refer to "updating the h-values" and "correcting the h-values" to distinguish the two steps.

3.1.1.3 Operations

Figure 3.1 gives the pseudo code of Forward Eager MT-Adaptive A*. Since Forward Eager MT-Adaptive A* applies only to terrains where the action costs of the agent do not decrease between searches, Forward Eager MT-Adaptive A* can terminate when no path exists between the start state and the goal state after a search (Line 42), because it is then guaranteed that it will not be able to find any path in future searches. Forward Eager MT-Adaptive A* performs each search by executing ComputePath() (Lines 14 - 26). After each search, it updates the h-values of all states that have been expanded by that search by executing Assignment (2.1) (Lines 43 - 44). The updated h-values are consistent with respect to the goal state of that search. The agent then moves along the path found by the search. If the goal state changed and is now off the path because the target moved after the previous search, then Forward Eager MT-Adaptive A* corrects the h-values of all states in the state space by executing Assignment (3.1) (Lines 53 - 54). There is only one addition and there are two modifications to the pseudo code of Eager Adaptive A* (shown in Figure 2.10). The only addition is Lines 50 - 54, where the h-values of all states are corrected if the goal state changes between searches. One modification is Line 32, where the h-values are now initialized with the user-provided h-values with respect to the goal state of the first search up front, since all h-values are corrected after the goal state changes between searches and thus need to have been initialized at this point in time. The second modification is Line 45, which now takes into account that the target can move between searches.

3.1.1.4 Example

I now use the example problem of moving target search introduced in Figure 2.9 to demonstrate the advantage of Forward Eager MT-Adaptive A* over Forward Repeated A*. The terrain and the current states of the agent and target are exactly the same as those in Figure 2.9(a) before the first search. After the first search, the target moves from C4 to C3. Figures 3.3 and 3.4 illustrate the same path planning problem solved with Forward Repeated A* and Forward Eager MT-Adaptive A*, respectively. All states have their search-value in the lower right corner.
States that have been initialized in InitializeState() also have their g-values in the upper left corner, h-values in the lower left corner, and f-values in the upper right corner (shown in Figure 3.2). For Forward Eager MT-Adaptive A*, I also show the h-values of the states that have not been initialized in InitializeState(), since they have been initialized before the first search (Lines 31 - 32).

01 function CalculateKey(s)
02   return g(s) + h(s);
03 procedure InitializeState(s)
04   if search(s) = 0
05     g(s) := ∞;
06   else if search(s) ≠ counter
07     g(s) := ∞;
08   search(s) := counter;
09 procedure UpdateState(s)
10   if s ∈ OPEN
11     OPEN.Update(s, CalculateKey(s));
12   else
13     OPEN.Insert(s, CalculateKey(s));
14 function ComputePath()
15   while OPEN.TopKey() < CalculateKey(s_goal)
16     s := OPEN.Pop();
17     CLOSED.Insert(s);
18     for all s' ∈ Succ(s)
19       InitializeState(s');
20       if g(s') > g(s) + c(s, s')
21         g(s') := g(s) + c(s, s');
22         parent(s') := s;
23         UpdateState(s');
24     if OPEN = ∅
25       return false;
26   return true;
27 function Main()
28   counter := 0;
29   s_start := the current state of the agent;
30   s_goal := the current state of the target;
31   for all s ∈ S
32     h(s) := H(s, s_goal);
33     search(s) := 0;
34   while s_start ≠ s_goal
35     counter := counter + 1;
36     InitializeState(s_start);
37     InitializeState(s_goal);
38     g(s_start) := 0;
39     OPEN := CLOSED := ∅;
40     OPEN.Insert(s_start, CalculateKey(s_start));
41     if ComputePath() = false
42       return false; /* Target cannot be caught */
43     for all s ∈ CLOSED
44       h(s) := g(s_goal) - g(s);
45     while target not caught AND action costs on path do not increase AND target on path from s_start to s_goal
46       agent follows the path from s_start to s_goal;
47     if agent caught target
48       return true;
49     s_start := the current state of the agent;
50     s_newgoal := the current state of the target;
51     if s_goal ≠ s_newgoal
52       old := h(s_newgoal);
53       for all s ∈ S
54         h(s) := max(H(s, s_newgoal), h(s) - old);
55       s_goal := s_newgoal;
56     update the increased action costs (if any);
57   return true;

Figure 3.1: Pseudo Code of Forward Eager Moving Target Adaptive A*

[Figure 3.2: Legend of Figures 3.3 and 3.4]

[Figure 3.3: Example Trace of Forward Repeated A*. (a) First Search (counter = 1); (b) After First Search (counter = 1); (c) After Target Moved (counter = 1); (d) Second Search (counter = 2).]

[Figure 3.4: Example Trace of Forward Eager Moving Target Adaptive A*. (a) First Search (counter = 1); (b) After First Search (counter = 1); (c) After Target Moved (counter = 1); (d) Second Search (counter = 2).]
By doing so, I demonstrate how theirh-values are corrected after the target moved. All search methods break ties between states with the same f-values in favor of states with larger g-values. For the rst search, Forward Repeated A* (shown in Figure 3.3(a)) and Forward Eager MT-Adaptive A* (shown in Figure 3.4(a)) expand exactly the same states as they both use the user-providedh-values to guide the search. The expanded states are shaded grey. After the rst search, Forward Eager MT-Adaptive A* (shown in Figure 3.4(b)) updates the h-values of all states that have been expanded in the rst search (shaded grey) by 72 executing Assignment (2.1). The expanded states have their updated h-values in the lower left corner. In contrast to Forward Eager MT-Adaptive A*, Forward Repeated A* does not update any h-values after the rst search. The agent then moves along the cost-minimal path returned by the rst search until it reaches B1, where it observes that B2 is blocked and the target moves from C4 to C3. After the target moved to C3, Forward Eager MT-Adaptive A* (shown in Fig- ure 3.4(c)) corrects the h-values of all states (shaded grey) by executing Assignment (3.1). All states have their corrected h-values in the lower left corner. In contrast to Forward Eager MT-Adaptive A*, Forward Repeated A* does not correct the h-values. The agent then performs the second search to replan a new path from its current state B1 to the current state of the target C3. For the second search, Forward Repeated A* still uses the user-provided h-values, while Forward Eager MT-Adaptive A* can use the updated and corrected h-values, which can be more informed. The expanded states are shaded grey. Figure 3.3(d) shows that Forward Repeated A* expands D1 and D2 because the user-provided h-values are misleading. Forward Eager MT-Adaptive A* avoids expanding D1 and D2 during the second search (shown in Figure 3.4(d)) resulting in a smaller number of state expansions. 3.1.2 Lazy Moving Target Adaptive A* Eager MT-Adaptive A* can update and correct theh-values of states that are not needed by future A* searches which can be time-consuming if the number of expanded states are large or the goal state changes (Koenig et al., 2007). In this section, I review Lazy Moving 73 01 function CalculateKey(s) 02 return g(s)+h(s); 03 procedure InitializeState(s) 04 if search(s) = 0 05 g(s) :=1; 06 h(s) :=H(s;s goal ); 07 else if search(s)6=counter 08 if g(s)+h(s)<pathcost(search(s)) 09 h(s) :=pathcost(search(s))g(s); 10 h(s) :=h(s)(deltah(counter)deltah(search(s))); 11 h(s) := max(h(s);H(s;s goal )); 12 g(s) :=1; 13 search(s) :=counter; 14 procedure UpdateState(s) 15 if s2OPEN 16 OPEN.Update(s, CalculateKey(s)); 17 else 18 OPEN.Insert(s, CalculateKey(s)); 19 function ComputePath() 20 while OPEN.TopKey()< CalculateKey(s goal ) 21 s :=OPEN.Pop(); 22 for all s 0 2Succ(s) 23 InitializeState(s 0 ); 24 if g(s 0 )>g(s)+c(s;s 0 ) 25 g(s 0 ) :=g(s)+c(s;s 0 ); 26 parent(s 0 ) :=s; 27 UpdateState(s 0 ); 28 if OPEN =; 29 return false; 30 return true; Figure 3.5: Pseudo Code of Forward Lazy Moving Target Adaptive A* (Part 1) Target Adaptive A* (Lazy MT-Adaptive A*) (Koenig et al., 2007), that calculates the h-value of a state only when it is needed by a future A* search. 3.1.2.1 Applicability Like Eager MT-Adaptive A*, Lazy MT-Adaptive A* can perform either forward or back- ward searches, and it applies only to moving target search in known static terrains and unknown static terrains (see Section 1.1.2). 
In this section, I review only Forward Lazy MT-Adaptive A*, since it is trivial to switch the search direction, resulting in Backward Lazy MT-Adaptive A*. 74 31 function Main() 32 counter := 0; 33 s start := the current state of the agent; 34 s goal := the current state of the target; 35 deltah(1) := 0; 36 for all s2S 37 search(s) := 0; 38 while s start 6=s goal 39 counter :=counter+1; 40 InitializeState(s start ); 41 InitializeState(s goal ); 42 g(s start ) := 0; 43 OPEN :=;; 44 OPEN.Insert(s start , CalculateKey(s start )); 45 if ComputePath() = false 46 return false; /* Target cannot be caught */ 47 pathcost(counter) :=g(s goal ); 48 while target not caught AND action costs on path do not increase AND target on path from s start to s goal 49 agent follows the path from s start to s goal ; 50 if agent caught target 51 return true; 52 s start := the current state of the agent; 53 s newgoal := the current state of the target; 54 if s goal 6=s newgoal 55 InitializeState(s newgoal ); 56 if g(s newgoal )+h(s newgoal )<pathcost(counter) 57 h(s newgoal ) :=pathcost(counter)g(s newgoal ); 58 deltah(counter+1) :=deltah(counter)+h(s newgoal ); 59 s goal :=s newgoal ; 60 else 61 deltah(counter+1) :=deltah(counter); 62 update the increased action costs (if any); 63 return true; Figure 3.6: Pseudo Code of Forward Lazy Moving Target Adaptive A* (Part 2) 3.1.2.2 Principle Forward Eager MT-Adaptive A* and Forward Lazy MT-Adaptive A* share the same principle and use the same h-values for all states when they are needed during each search. Thus, they nd the same paths and move the agent along the same trajectory. They dier in that Forward Lazy MT-Adaptive A* updates and corrects the h-values of the states only when they are needed by future searches, while Forward Eager MT- Adaptive A* updates the h-values of all expanded states after each search and corrects theh-values of all states after each search when the goal state changed, no matter whether these h-values will be needed by future searches or not. 75 3.1.2.3 Operations Figures 3.5 and 3.6 give the pseudo code of Forward Lazy MT-Adaptive A*. InitializeState(s) initializes h(s) with the user-provided h-value with respect to the goal state of the current search ifh(s) has not been initialized by any previous search (Lines 04 and 06). 1 Ifh(s) has been initialized by one of the previous searches but not the current one (Line 07), then InitializeState(s) rst updates h(s) if s has been expanded before (Lines 08 - 09), and then correctsh(s) with respect to the goal state of the current search in case the target has moved since the last time s was initialized by InitializeState(s) (Lines 10 - 11). I now explain in detail how Lazy MT-Adaptive A* corrects theh-value of s: After each search, if the target moved and hence the goal state changed, the correction for the goal state of the current search decreases the h-values of all states by the h-value of the goal state of the current search. This h-value is rst calculated (Lines 55 - 57) and then added to a running sum of all corrections (Line 58). In particular, the value of deltah(x) during thex-th search is the running sum of all corrections up to the beginning of the x-th search. 
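To make the difference concrete before the detailed pseudo code in the next section, the sketch below contrasts the two bookkeeping styles in Python. It shows only the h-value correction for a changed goal state (the update of h-values from the g-values of expanded states is omitted), and the dictionary-based bookkeeping and function names are illustrative only, not taken from the pseudo code.

    def eager_correct_h_values(h, H, all_states, s_newgoal):
        """Eager style (Assignment 3.1): when the goal state changes, correct the
        h-value of every state immediately, whether or not a later search needs it."""
        old = h[s_newgoal]
        for s in all_states:
            h[s] = max(H(s, s_newgoal), h[s] - old)

    def lazy_correct_h_value(h, H, deltah, search, counter, s, s_goal):
        """Lazy style: keep only a running sum deltah[i] of all corrections made up
        to the beginning of search i, and apply the accumulated correction to a
        single state s the first time the current search (number counter) needs it.
        Assumes s was already initialized by some earlier search."""
        if search[s] != counter:
            h[s] = max(H(s, s_goal), h[s] - (deltah[counter] - deltah[search[s]]))
            search[s] = counter

The eager variant touches every state in S after every goal change, while the lazy variant performs a constant amount of work per state actually encountered by a later search, which is where the runtime advantage comes from.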
If h(s) was initialized (by InitializeState(s)) in a previous search but not yet initialized (by InitializeState(s)) in the current search (search(s)6= counter), then InitializeState(s) corrects its h-value by the sum of all corrections between the search when s was initialized last and the current search, which is the same as the dierence of the value of deltah during the current search (deltah(counter)) and the search where s was initialized last (deltah(search(s))) (Line 10). It then takes the maximum of this value and the user-provided h-value with respect to the goal state of the current search (Line 1 This is similar to both Eager Adaptive A* and Lazy Adaptive A* but dierent from Forward Eager MT-Adaptive A* that initializes the h-value up front. 76 11). In this way, Forward Lazy MT-Adaptive A* updates and corrects the h-values of all states only when they are needed by future searches, which is faster than updating the h-values of all expanded states after each search and correcting the h-values of all states after each search when the goal state changed as done by Eager MT-Adaptive A*. It has been shown that Forward Eager MT-Adaptive A* uses consistenth-values with respect to the goal state during every search (Koenig et al., 2007). I now show that during every search Forward Lazy MT-Adaptive A* uses the same h-values for all s2S as Forward Eager MT-Adaptive A* when they use the same tie-breaking strategy. Thus, its h-values remain consistent with respect to the goal state. Theorem 1 During every search, whenever the h-value of any state s is calculated by InitializeState(s) for Forward Lazy MT-Adaptive A*, it is identical to the h-value used by Forward Eager MT-Adaptive A* (with the same tie-breaking strategy) in the same search. Proof: The values at the end of the i-th search are indicated via superscript i. The h-values at the end of the i-th search are the same as those used during the i-th search since Forward Eager MT-Adaptive A* does not update any h-value during a search and Forward Lazy MT-Adaptive A* calculates any h-value the rst time it is needed by a search and then returns its h-value whenever it is needed again by the same search. The values of Forward Eager MT-Adaptive A* are not overlined, while the values of Forward Lazy MT-Adaptive A* are overlined. We do not make this distinction for s i goal since s i goal =s i goal per construction. We dene a new variable z i (s) =h i (s) ifs was not expanded by Eager MT-Adaptive A* during the i-th search and z i (s) = g i (s i goal )g i (s) otherwise. Similarly, we dene 77 a new variable z i (s) = h i (s) if s was not expanded by Forward Lazy MT-Adaptive A* during the i-th search and z i (s) = g i (s goal )g i (s) otherwise. 2 z i (s i+1 goal ) is equal to h(s newgoal ) calculated by Forward Eager MT-Adaptive A* on Line 52, and z i (s i+1 goal ) is equal to h(s newgoal ) calculated by Forward Lazy MT-Adaptive A* on Line 57. (In this proof, the line numbers that are not overlined refer to the pseudo code of Forward Eager MT-Adaptive A* in Figure 3.1, and the line numbers that are overlined refer to the pseudo code of Forward Lazy MT-Adaptive A* in Figures 3.5 and 3.6, respectively.) Forward Eager and Lazy MT-Adaptive A* generate and expand exactly the same states, nd exactly the same paths for every search when they use the same h-values and tie-breaking strategy and thus also calculate the same g- and z-values (Property 1). 
We prove the theorem by induction on the number of times Forward Lazy MT- Adaptive A* calls InitializeState. Assume that Forward Lazy MT-Adaptive A* calls InitializeState(s) during the j-th search. Let x be equal to search(s) at that point in time. These s, j and x are used in the remainder of the proof. Lemma 1 z i (s)h i (s) for all i with i 1. Proof: After each searchi withi 1, Forward Eager MT-Adaptive A* calculatesz i (s) for all s2 CLOSED (Lines 43 - 44). Since Forward Eager MT-Adaptive A* uses consistent h-values, g i (s) +h i (s)g i (s goal ) (A* Property 3), and h i (s)g i (s goal )g i (s) =z i (s). For all s = 2 CLOSED, h i (s) =z i (s) by denition. Lemma 2 h i (s)H(s;s i goal ) for all i with i 1. 2 If Forward Lazy MT-Adaptive A* expands a state s with g i (s) +h i (s) = g i (s i goal ), then it actually setsz i (s) =h i (s) but this does not cause a problem for our denition sincez i (s) =h i (s) =g i (s)+h i (s) g i (s) =g i (s i goal )g i (s). 78 Proof: The lemma holds trivially during the rst search, since h 1 (s) =H(s;s 1 goal ) (Lines 31 - 33), and Forward Eager MT-Adaptive A* does not update any h-value during a search. After each search, there are only two places whereh-values can change in Forward Eager MT-Adaptive A*: (1) h-values of all states s2 CLOSED are updated on Line 44, and we have shown in Lemma 1 that the h-values cannot be decreased on Line 44. (2) h-values of all states s2S are corrected on Line 54 i the goal state changed, and it is trivial to see that h i+1 (s) H(s;s i+1 goal ) after Line 54 is executed. If the goal state does not change between searches, then only (1) is executed. Otherwise, both (1) and (2) are executed. Combing (1) and (2), the lemma holds no matter whether the goal state changed or not between searches. Lemma 3 z i (s) h i (s) H(s;s i goal ) for all i with i 1 (Combining Lemma 1 and Lemma 2). Lemma 4 If h k+1 (s) =H(s;s k+1 goal ) for at least one k with 0xk <j, then h j (s) = H(s;s j goal ). Proof: The lemma trivially holds if k = j 1. Otherwise, we show that h l+2 (s) = H(s;s l+2 goal ) if h l+1 (s) = H(s;s l+1 goal ) for k l < j, which implies the lemma. InitializeState(s) was called last during the x-th search (or has not been called before i x = 0). Thus, s was expanded last during or before the x-th search (or has not been expanded yet i x = 0) by Forward Lazy MT-Adaptive A*, and thus also by Forward Eager MT-Adaptive A* (Property 1). h l+2 (s) = max(h l+1 (s)z l+1 (s l+2 goal );H(s;s l+2 goal )) = max(H(s;s l+1 goal )z l+1 (s l+2 goal );H(s;s l+2 goal )) 79 max(H(s;s l+1 goal )H(s l+2 goal ;s l+1 goal );H(s;s l+2 goal )) (Lemma 3) max(H(s;s l+2 goal );H(s;s l+2 goal )) (H-value triangle inequality) =H(s;s l+2 goal ) In the meantime, Lemma 2 guarantees that h l+2 (s) H(s;s l+2 goal ). Thus, h l+2 (s) = H(s;s l+2 goal ). Lemma 5 deltah(l) = l1 P i=1 z i (s i+1 goal ) for all l with 1lj. Proof: The lemma holds trivially when l = 1 since deltah(l) = 0 initially (Line 35). Otherwise, when 1<lj, assume that the lemma holds until the (l 1)-st search, that is, deltah(l 1) = l2 P i=1 z i (s i+1 goal ). After the (l 1)-st search, Forward Lazy MT-Adaptive A* calculates deltah(l) in two cases: Case 1: If s l1 goal 6=s l goal (Lines 54 - 59), deltah(l) = deltah(l 1) +z l1 (s l goal ) (Line 58) = l2 P i=1 z i (s i+1 goal ) +z l1 (s l goal ) = l1 P i=1 z i (s i+1 goal ). Case 2: If s l1 goal =s l goal (Lines 60 - 61), h l1 (s l1 goal ) = h l1 (s l goal ) = 0 since Forward Lazy MT-Adaptive A* uses consistent h-values during the (l 1)-st search. 
Moreover, since s l1 goal cannot be expanded during the (l 1)-st search due to the condition on Line 20, ands l1 goal =s l goal in this case,s l goal cannot be expanded during the (l1)-st search either. Thus,z l1 (s l goal ) = h l1 (s l goal ) according to our denition. Thus, z l1 (s l goal ) =h l1 (s l goal ) = 0. 80 Thus, deltah(l) = deltah(l 1) (Line 61) = deltah(l 1) + 0 = deltah(l 1) +z l1 (s l goal ) = l2 P i=1 z i (s i+1 goal ) +z l1 (s l goal ) = l1 P i=1 z i (s i+1 goal ). Combining Case 1 and Case 2, the lemma holds. If x = j, then InitializeState(s) does not change h(s). It was called last during the x-th search, that is, the current search. It continues to hold that h j (s) =h j (s) according to the induction hypothesis. Otherwise, 0x<j. We distinguish two cases: Case 1: Assume that x = 0 (induction basis). Then, h j (s) = H(s;s j goal ) = h j (s) since h 1 (s) =H(s;s 1 goal ) and thus h j (s) =H(s;s j goal ) according to Lemma 4. Case 2: Otherwise,x> 0. Assume that Forward Eager and Lazy MT-Adaptive A* used the same h-values every time Forward Lazy MT-Adaptive A* called Initial- izeState so far. s was expanded last during or before the x-th search by Forward Lazy MT-Adaptive A* and thus also by Forward Eager MT-Adaptive A* accord- ing to the induction hypothesis since they expand the same states during the same search where they use the same h-values and tie-breaking strategy. We distinguish two caeses: { Case a: Assume that h k+1 (s) = H(s;s k+1 goal ) for at least one k with x k < j. Then h j (s) = H(s;s j goal ) according to Lemma 4. It holds that z x (s) j1 P l=x z l (s l+1 goal ) h j (s) due to the monotonicity of the max operator 81 used repeatedly in the calculation of h j (s) (Property 2) . Thus, h j (s) = max(z x (s) (deltah(j) deltah(x));H(s;s j goal )) (Lines 07 - 11) = max(z x (s) j1 P l=x z l (s l+1 goal );H(s;s j goal )) (Lemma 5) = max(z x (s) j1 P l=x z l (s l+1 goal );H(s;s j goal )) (Induction hypothesis) max(h j (s);H(s;s j goal )) (Property 2) = max(H(s;s j goal ));H(s;s j goal )) =H(s;s j goal ) Since x> 0, h j (s)H(s;s j goal ) (Line 11). Thus, h j (s) =H(s;s j goal ) =h j (s). { Case b: Otherwise, h x+1 (s) = z x (s)z x (s x+1 goal ) and h k+1 = h k (s)z k (s k+1 goal ) for all k with x<k <j since h x+1 (s) = max(z x (s)z x (s x+1 goal );H(s;s x+1 goal ))6= H(s;s x+1 goal ) and h k+1 (s) = max(h k (s)z k (s k+1 goal );H(s;s k+1 goal ))6=H(s;s k+1 goal ) for all x<k<j. Then, h j (s) = max(h j (s);H(s;s j goal )) (Lemma 2) = max(z x (s) j1 P l=x z l (s l+1 goal );H(s;s j goal )) = max(z x (s) j1 P l=x z l (s l+1 goal );H(s;s j goal )) (Induction hypothesis) = max(z x (s) (deltah(j) deltah(x));H(s;s j goal )) (Lemma 5) =h j (s) 3.1.2.4 Example I now use the example path planning problem for Forward Eager MT-Adaptive A* shown in Figure 3.4 to demonstrate the behavior of Forward Lazy MT-Adaptive A*. 
[Figure 3.7: Legend of Figure 3.8]

[Figure 3.8: Example Trace of Forward Lazy Moving Target Adaptive A*. (a) First Search (counter = 1, deltah(1) = 0); (b) After First Search (counter = 1, deltah(1) = 0, pathcost(1) = 7); (c) After Target Moved (counter = 1, deltah(1) = 0, deltah(2) = 1, pathcost(1) = 7); (d) Second Search (counter = 2, deltah(1) = 0, deltah(2) = 1, pathcost(1) = 7).]

Figure 3.8 shows the behavior of Forward Lazy MT-Adaptive A*: All states have their search-value in the lower right corner. States that have been initialized in InitializeState() also have their g-values in the upper left corner, h-values in the lower left corner, and f-values in the upper right corner (shown in Figure 3.7). Forward Lazy MT-Adaptive A* breaks ties among states with the same f-values in favor of states with larger g-values. Figure 3.8(a) shows the first search of Forward Lazy MT-Adaptive A*, which searches from the current state of the agent D2 to the current state of the target C4. For the first search, Forward Lazy MT-Adaptive A* and Forward Eager MT-Adaptive A* (shown in Figure 3.4(a)) expand exactly the same states, as they both use the user-provided h-values to guide the search. The expanded states are shaded grey.

After the first search, Forward Lazy MT-Adaptive A* does not update any h-values (shown in Figure 3.8(b)). The agent then moves along the cost-minimal path returned by the first search until it reaches B1, where it observes that B2 is blocked and the target moves from C4 to C3. Since the goal state is now off the path, the agent then performs a second search to replan a new path from its current state B1 to the goal state C3. After the goal state changed to C3, Forward Lazy MT-Adaptive A* does not correct the h-values of all states. Instead, it only calculates h(C3) (= 1), the h-value of the goal state of the second search with respect to the goal state of the first search, and adds h(C3) to the value of deltah(1), yielding deltah(2) = 1 (shown in Figure 3.8(c)), which will be used for correcting the h-values of the states that are used by future searches.

Figure 3.8(d) shows the second search of Forward Lazy MT-Adaptive A*. The expanded states are shaded grey. For all states whose h-values are needed by the second search, it uses the same h-values as Forward Eager MT-Adaptive A*. Thus, they find the same paths and move the agent along the same trajectory. They differ in that Forward Lazy MT-Adaptive A* updates and corrects the h-values of the states only when these h-values are needed by future searches, while Forward Eager MT-Adaptive A* updates the h-values of all states that have been expanded after each search and corrects the h-values of all states after the goal state changed between searches, no matter whether the h-values of these states will be needed by future searches. One example is D2: The h-value of D2 is updated from 3 to 7, and corrected from 7 to 6, by Forward Eager MT-Adaptive A* after the first search. However, the h-value of D2 is not needed by the second search and thus not changed by Forward Lazy MT-Adaptive A*.
Another example is D4: The h-value of D4 is corrected from 1 to 2 after the rst search by Forward Eager MT-Adaptive A*. However, the h-value of D4 is not needed by the second search and thus not changed by Forward Lazy MT-Adaptive A*. 84 3.2 Generalized Adaptive A* MT-Adaptive A* extends Adaptive A* to moving target search in terrains where the action costs of the agent do not decrease between searches. However, MT-Adaptive A* does not apply to terrains where the action costs of the agent can decrease between searches. Thus, MT-Adaptive A* only applies to known static terrains and unknown static terrains (see Section 1.1.2). In order to extend MT-Adaptive A* to moving target search in terrains where the action costs can both increase and decrease between searches, so that it applies to all three classes of terrains (see Section 1.1.2), I introduce Generalized Adaptive A* (GAA*) in this section. GAA* extends Lazy MT-Adaptive A*. I choose to extend Lazy MT-Adaptive A* but not Eager MT-Adaptive A* because Eager MT- Adaptive A* can update the h-values of states that are not needed by future searches, which is time-consuming if the number of expanded states is large (and Eager MT- Adaptive A* then needs to update theh-values of all expanded states) or the target moves (and Eager MT-Adaptive A* then needs to correct the h-values of all states) (Koenig et al., 2007). 3.2.1 Applicability GAA* can either perform forward searches by assigning the current states of the agent and target to the start and goal states of each search, respectively, referred to as Forward GAA*, or perform backward searches by assigning the current states of the agent and target to the goal and start states of each search, respectively, referred to as Backward GAA*. It applies to moving target search in all three classes of terrains (see Section 1.1.2). 85 In this section, I only introduce Forward GAA*, since it is trivial to switch the search direction, resulting in Backward GAA*. 3.2.2 Principle The principle behind Forward GAA* is as follows: If the cost of no action decreases be- tween searches, Forward GAA* operates in the same way as Forward Lazy MT-Adaptive A*. If some actions whose costs decrease after one search, Forward GAA* performs a version of Dijkstra's algorithm, referred to as the consistency procedure, to eagerly up- date the h-values of all states that are aected by the action cost decrease, so that they remain consistent with respect to the goal state of the current search. 3.2.3 Operations Figures 3.9 and 3.10 give the pseudo code of Forward GAA*, which extends the pseudo code of Forward Lazy MT-Adaptive A* (shown in Figure 3.5). The major changes to the pseudo code of Forward Lazy MT-Adaptive A* are as follows: Forward GAA* repeatedly searches for a new path from the current state of the agent to the current state of the target after it found that there does not exist a path after a search, since the action costs can not only increase but also decrease between searches. There might exist a path from the current state of the agent to the current state of the target after the action costs of the agent decrease in the future. Thus, Forward GAA* terminates only when the agent catches the target (Lines 70 - 71). In order to deal with this termination condition, Forward GAA* 86 uses a variable last nopath to remember the last search in which no path existed. Initially, last nopath = 0 (Line 57) since no search has been performed. 
Forward GAA* contains a consistency procedure, referred to as the consistency procedure (Lines 36 - 51), that eagerly updates consistent h-values with respect to the goal state with a version of Dijkstra's algorithm so that they remain consistent with respect to the goal state after action cost increases and decreases. I rst describe the situation when there exists a path between the current states of the agent and target: After each search, the consistency procedure is performed to maintain the consistency of the h-values with respect to the goal state. The consistency procedure maintains an OPEN list, which is a priority queue that contains those states whose h-values need to be updated in the consistency procedure. Initially, the OPEN list is set to empty (Line 38). The consistency procedure uses the same OPEN list used in procedure ComputePath(), except that the key of each state in the OPEN list is the h-value (rather than thef-value) during the consistency procedure, since the objective of the consistency procedure is to propagate theh-value changes to the states whoseh-values need to be updated. Then, for all actions whose costs decreased from c 0 (s;s 0 ) to c(s;s 0 ) between searches, the consistency procedure performs the following operations: It rst calculates the h-values of s and s 0 by executing InitializeState(s) and InitializeState(s 0 ) (Lines 40 - 41). If h(s)>c(s;s 0 ) +h(s 0 ), h(s) is set to c(s;s 0 ) +h(s 0 ) (Lines 42 - 43) and inserted into the OPEN list (with h(s) as its key) (Line 44). Starting with the OPEN list, the consistency procedure performs a version of Dijkstra's algorithm by repeatedly executing the following operations: It deletes a state s 0 with the smallest h-value from 87 01 function CalculateKey(s) 02 return g(s)+h(s); 03 procedure InitializeState(s) 04 if search(s)last nopath 05 g(s) :=1; 06 h(s) :=H(s;s goal ); 07 else if search(s)6=counter 08 if g(s)+h(s)<pathcost(search(s)) 09 h(s) :=pathcost(search(s))g(s); 10 h(s) :=h(s)(deltah(counter)deltah(search(s))); 11 h(s) := max(h(s);H(s;s goal )); 12 g(s) :=1; 13 search(s) :=counter; 14 procedure UpdateState(s) 15 if s2OPEN 16 OPEN.Update(s, CalculateKey(s)); 17 else 18 OPEN.Insert(s, CalculateKey(s)); 19 procedure UpdateState2(s) 20 if s2OPEN 21 OPEN.Update(s, h(s)); 22 else 23 OPEN.Insert(s, h(s)); 24 function ComputePath() 25 while OPEN.TopKey()< CalculateKey(s goal ) 26 s :=OPEN.Pop(); 27 for all s 0 2Succ(s) 28 InitializeState(s 0 ); 29 if g(s 0 )>g(s)+c(s;s 0 ) 30 g(s 0 ) :=g(s)+c(s;s 0 ); 31 parent(s 0 ) :=s; 32 UpdateState(s 0 ); 33 if OPEN =; 34 return false; 35 return true; 36 procedure ConsistencyProcedure() 37 update the increased and decreased action costs (if any); 38 OPEN :=;; 39 for all actions whose costs decreased from c 0 (s;s 0 ) to c(s;s 0 ) 40 InitializeState(s); 41 InitializeState(s 0 ); 42 if h(s)>c(s;s 0 )+h(s 0 ) 43 h(s) :=c(s;s 0 )+h(s 0 ); 44 UpdateState2(s); 45 while OPEN6=; 46 s 0 :=OPEN.Pop(); 47 for all s2Pred(s 0 )nfs goal g 48 InitializeState(s); 49 if h(s)>c(s;s 0 )+h(s 0 ) 50 h(s) :=c(s;s 0 )+h(s 0 ); 51 UpdateState2(s); Figure 3.9: Pseudo Code of Forward Generalized Adaptive A* (Part 1) the OPEN list (Line 46) and then checks all s2 Pred(s 0 ) with s6= s goal (Line 47). 
If h(s)>c(s;s 0 )+h(s 0 ), thenh(s) is set toc(s;s 0 )+h(s 0 ) (Lines 49 - 50) ands is inserted into 88 52 function Main() 53 counter := 1; 54 s start := the current state of the agent; 55 s goal := the current state of the target; 56 deltah(1) := 0; 57 last nopath := 0; 58 for all s2S 59 search(s) := 0; 60 while s start 6=s goal 61 InitializeState(s start ); 62 InitializeState(s goal ); 63 g(s start ) := 0; 64 OPEN :=;; 65 OPEN.Insert(s start , CalculateKey(s start )); 66 if ComputePath() = true 67 pathcost(counter) :=g(s goal ); 68 while target not caught AND action costs do not change AND target on path from s start to s goal 69 agent follows the path from s start to s goal ; 70 if agent caught target 71 return true; 72 s start := the current state of the agent; 73 s newgoal := the current state of the target; 74 if s goal 6=s newgoal 75 InitializeState(s newgoal ); 76 if g(s newgoal )+h(s newgoal )<pathcost(counter) 77 h(s newgoal ) :=pathcost(counter)g(s newgoal ); 78 deltah(counter+1) :=deltah(counter)+h(s newgoal ); 79 s goal :=s newgoal ; 80 else 81 deltah(counter+1) :=deltah(counter); 82 counter :=counter+1; 83 ConsistencyProcedure(); 84 else 85 pathcost(counter) :=1; 86 last nopath := counter; 87 deltah(counter+1) := 0; 88 counter :=counter+1; 89 wait until some action costs decrease; 90 update the increased and decreased action costs (if any); 91 s goal := the current state of the target; 92 return true; Figure 3.10: Pseudo Code of Forward Generalized Adaptive A* (Part 2) the OPEN list (with h(s) as its key) (Line 44). The consistency procedure terminates when the OPEN list is empty (Line 45). It is important to point out that Forward GAA* updates and corrects the h-values in a lazy way during each search. However, the consistency procedure currently updates the h-values in an eager way, which means that it is run between searches whenever action costs decrease. The reason is that after some action costs decrease, the consistency procedure decreases the h-values of the states that are directed aected by the action costs decrease (Lines 39 - 44) and inserts the states into 89 the OPEN list (Line 44). Theseh-value decreases are then propagated to the predecessor states of the states in the OPEN list by using a version of Dijkstra's algorithm (Lines 45 - 51) if the condition on Line 49 is satised. The h-values of those predecessor states then decrease (Lines 49 - 50). By executing the consistency procedure, the h-values of all states aected by the action costs decrease are updated (by propagating the h-value changes), and they cannot be updated by using Assignment (3.1) in a Lazy way. Thus, the consistency procedure can potentially update a large number ofh-values that are not needed by future searches. I have also tried to make the consistency procedure update the h-values in a lazy way by interleaving the consistency procudure with searches in order to speed up GAA*. However, the resutls showed that this variant of GAA* did not run faster than the current version of GAA* described in Figures 3.9 and 3.10. Thus, I only describe the main idea and skip the technical details of this variant. The variant maintains two priority queues (1) the OPEN list OPEN 1 for the searches (the key of each state in OPEN 1 is its f-value), and (2) the OPEN list OPEN 2 for the consistency procedure (the key of each state in OPEN 2 is its h-value), since it needs to interleav the consistency procudure with searches. 
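As a concrete illustration of the eager consistency procedure (Lines 36 - 51), the sketch below re-establishes the consistency of the h-values after action cost decreases with a Dijkstra-like pass over predecessors. It is a simplified illustration, not a drop-in replacement for the pseudo code: the lazy initialization of h-values via InitializeState() is omitted, the OPEN list is a plain binary heap with lazy deletion of stale entries, and the names consistency_procedure, pred and cost are placeholders.

    import heapq

    def consistency_procedure(h, pred, cost, decreased_edges, s_goal):
        """Re-establish consistency of the h-values with respect to s_goal after
        some action costs decreased: seed the queue with the tails of the decreased
        edges, then propagate h-value decreases to predecessor states."""
        open_heap = []
        for (s, s2) in decreased_edges:               # action costs c(s, s2) that decreased
            if h[s] > cost(s, s2) + h[s2]:
                h[s] = cost(s, s2) + h[s2]
                heapq.heappush(open_heap, (h[s], s))  # key of a state is its h-value
        while open_heap:
            key, s2 = heapq.heappop(open_heap)
            if key > h[s2]:
                continue                              # stale heap entry
            for s in pred(s2):
                if s == s_goal:
                    continue                          # h(s_goal) stays zero
                if h[s] > cost(s, s2) + h[s2]:
                    h[s] = cost(s, s2) + h[s2]
                    heapq.heappush(open_heap, (h[s], s))

Because h-values are only ever lowered toward cost(s, s2) + h(s2), the pass terminates once no predecessor can be improved, which is exactly the point at which the triangle inequality, and hence consistency with respect to the goal state, holds again.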
After each search, the variant only executes lines 36 - 44 of the consistency procedure after some action costs decrease and then starts a new sarch. During the new search, whenever a state s is deleted from OPEN 1 for expansion (Line 26), if h(s) OPEN 2 :TopKey(), the variant expands state s in the same way as GAA* (Lines 27 - 32). The reason is because the consistency procedure can only decrease the h-values but not increase them (Lines 42 - 43 and 49 - 50). Since h(s) OPEN 2 :TopKey() on Line 26, it is guarateed that h(s) cannot 90 be decreased by the consistency procedure (Lines 45 - 51). Thus, it is safe for the variant to expands withh(s) (Lines 27 - 32) before the consistency procedure terminates. Otherwise, the variant repeatedly executes lines 45 - 51 of the consistent procedure until h(s) OPEN 2 :TopKey() on Line 45. Then, the variant continues to expand state s (Lines 27 - 32). I now describe the situation when there does not exist a path between the current states of the agent and target: If there does not exist a path in the counter-th search, Forward GAA* executes Lines 84 - 91: The reason Lines 84 - 91 are added is as follows: Assume Forward GAA* executes Lines 67 - 83 to deal with the situation when there does not exist a path after the counter-th search. Then, pathcost(counter) =1 after the counter-th search (Line 67). h(s newgoal ) is then initialized to1 since pathcost(counter) = 1 (Line 77). Thus, deltah(counter + 1) =1 (Line 78), and the deltah-values will be1 for all future searches since they cannot decrease (Lines 78 and 81). The consequence is that, whenever a state s with search(s) = counter is initialized in InitializeState(s) in a future search, 11 can be executed when calculating h(s) (Line 10), which causes a problem. To address this problem, Forward GAA* uses a variable last nopath to remember the last search in which no path existed. All information before the last nopath- th search (including the last nopath-th search itself) is discarded: Forward GAA* resets the running sum deltah(last nopath + 1) = 0 (Line 87) and uses the user-provided h- values for all states whose h-values have been calculated by InitializeState() last before and during the last nopath-th search (Lines 04 - 06). I now show that the h-values remain consistent with respect to the goal state after the consistency procedure. 91 Theorem 2 The h-values remain consistent with respect to the goal state after action cost increases and decreases if the consistency procedure is run. Proof: Let c denote the action costs before the changes and c 0 denote the action costs after the changes. Let h(s) denote the h-values before running the consistency procedure and h 0 (s) denote the h-values after its termination. Thus, h(s goal ) = 0 and h(s) c(s;s 0 ) +h(s 0 ) for all s2 Snfs goal g and all s 0 2 Succ(s) since the h-values are consistent with respect to s goal before action cost changes. The h-value of s goal remains zero since it is never updated. The h-value of any s 2 Snfs goal g is monotonically non-increasing over time since it only gets updated to an h-value that is smaller than its current h-value (Lines 42 - 43 and 49 - 50). Thus, I distinguish three cases for any s2Sns goal and s 0 2Succ(s): First, theh-value ofs 0 never decreased during the consistency procedure (and thus h(s 0 ) = h 0 (s 0 )) and c(s;s 0 ) c 0 (s;s 0 ). Then, h 0 (s) h(s) c(s;s 0 ) +h(s 0 ) = c(s;s 0 ) +h 0 (s 0 )c 0 (s;s 0 ) +h 0 (s 0 ). 
Second, the h-value of s 0 never decreased during the consistency procedure (and thus h(s 0 ) = h 0 (s 0 )) and c(s;s 0 ) > c 0 (s;s 0 ). Then, c(s;s 0 ) decreased and s6= s goal and Lines 39 - 43 were thus executed. Let h(s) be theh-value ofs after the execution of Lines 39 - 44. Then, h 0 (s) h(s)c 0 (s;s 0 ) +h 0 (s 0 ). Third, theh-value ofs 0 decreased during the consistency procedure and thush(s 0 )> h 0 (s 0 ). Then, s 0 was inserted into the priority queue and later retrieved on Line 46. Consider the last time that it was retrieved on Line 46. Then, Lines 47 - 51 were 92 executed. Let h(s) be the h-value of s after the execution of Lines 47 - 51. Then, h 0 (s) h(s)c 0 (s;s 0 ) +h 0 (s 0 ). Thus, h 0 (s) c 0 (s;s 0 ) +h 0 (s 0 ) in all three cases, and the h-values remain consistent with respect to the goal state. 3.2.4 Example g search h f Figure 3.11: Legend of Figures 3.12 and 3.13 I now use an example problem of moving target search that extends the problem in Figure 2.9 to demonstrate the behavior of GAA*. Figures 3.12 and 3.13 illustrate the example problem solved with Forward Repeated A* and Forward GAA*, respectively. The real terrain and the current states of the agent and target are exactly the same as those in Figure 2.9: Initially, the current states of the agent and target are D2 and C4, respectively. The only dierence is that the terrain now is a known dynamic terrain, where the action costs of the agent can both increase and decrease between searches. After the rst search, the target moves from C4 to C3 (shown in Figures 3.12(b) and 3.13(b)), and B2 becomes unblocked, which changes the costsc(B1,B2),c(B2,B1),c(A2,B2),c(B2,A2), c(B3,B2) and c(B2,B3) from innity to one (shown in Figures 3.12(c) and 3.13(c)). Forward GAA* updates the h-values in a lazy way except for the consistency proce- dure. All states have their search-values in the lower right corner. States that have been 93 4 9 5 9 6 9 7 9 5 1 4 1 3 1 2 1 3 7 7 9 8 9 4 1 2 1 1 1 2 5 9 9 3 1 0 0 1 1 5 0 3 4 1 3 1 0 1 2 3 4 D A B C S G (a) First Search counter = 1 4 9 5 9 6 9 7 9 5 1 4 1 3 1 2 1 3 7 7 9 8 9 4 1 2 1 1 1 2 5 9 9 3 1 0 0 1 1 5 0 3 4 1 3 1 0 D A B C 1 2 3 4 S G (b) After Target Moved counter = 1 4 9 5 9 6 9 7 9 5 1 4 1 3 1 2 1 3 7 7 9 8 9 4 1 0 2 1 1 1 2 5 9 9 3 1 0 0 1 1 5 0 3 4 1 3 1 0 D A B C 1 2 3 4 S G (c) After B2 Unblocked counter = 1 3 7 4 7 5 7 7 9 4 2 3 2 2 2 2 1 2 5 3 5 4 5 5 7 3 2 2 2 1 2 2 2 1 3 5 5 9 9 2 2 0 2 0 1 0 3 1 3 3 2 2 2 0 1 2 3 4 D A B C S G (d) Second Search counter = 2 Figure 3.12: Example Trace of Forward Repeated A* 4 9 5 9 6 9 7 9 5 1 4 1 3 1 2 1 3 7 7 9 8 9 4 1 2 1 1 1 2 5 9 9 3 1 0 0 1 1 5 0 3 4 1 3 1 0 1 2 3 4 D A B C S G (a) First Search counter= 1 deltah(1) = 0 4 9 5 9 6 9 7 9 5 1 4 1 3 1 2 1 3 7 7 9 8 9 4 1 2 1 1 1 2 5 9 9 3 1 1 1 0 1 1 5 0 3 4 1 3 1 0 1 2 3 4 D A B C S G (b) After Target Moved counter = 1 deltah(1) = 0 deltah(2) = 1 pathcost(1) = 9 6 9 7 9 4 2 3 2 3 1 2 1 8 9 3 2 2 2 1 2 1 1 9 9 4 2 1 1 0 1 5 2 6 2 0 D A B C 1 2 3 4 S G (c) After B2 Unblocked counter = 2 deltah(1) = 0 deltah(2) = 1 pathcost(1) = 9 3 7 4 7 5 7 7 9 4 2 3 2 2 2 2 1 2 5 3 5 4 5 5 7 3 2 2 2 1 2 2 2 1 5 5 5 9 9 4 2 0 2 0 1 0 5 1 7 5 2 6 2 0 1 2 3 4 D A B C S G (d) Second Search counter = 2 deltah(1) = 0 deltah(2) = 1 pathcost(1) = 9 Figure 3.13: Example Trace of Forward Generalized Adaptive A* initialized by InitializeState() also have their g-values in the upper left corner, h-values in the lower left corner, and f-values in the upper right corner (shown in Figure 3.11). 
Both algorithms break ties among states with the same f-values in favor of states with larger g-values. For the rst search, Forward Repeated A* (shown in Figure 3.12(a)) and Forward GAA* (shown in Figure 3.13(a)) expand exactly the same states as they both use the user-provided h-values to guide the search. The expanded states are shaded grey. After the rst search, the agent moves along the cost-minimal path returned by the rst search from D2 to D1, and the target moves from C4 to C3. Forward GAA* (shown in Figure 3.13(b)) calculates theh-value of the current state of the target C3 with respect 94 to the goal state of the rst search C4, which is equal to one, and then adds it to a running sum of all corrections by executing deltah(2) = deltah(1) + 1 = 1. Forward GAA* then assigns the current state of the target C3 to the goal state of the second search. States initialized by InitializeState() are shaded grey (shown in Figure 3.13(b)). In contrast to Forward GAA*, Forward Repeated A* does not perform any operations to update h-values after the target moved (shown in Figure 3.12(b)). When B2 becomes unblocked, Forward GAA* performs the consistency procedure to maintain the consistency of the h-values with respect to the goal state of the current search. States initialized by InitializeState() are shaded grey (shown in Figure 3.13(c)). In contrast to Forward GAA*, Forward Repeated A* does not perform any operations to update h-values after B2 becomes unblocked (shown in Figure 3.12(c)). For the second search, Forward Repeated A* still uses the user-provided h-values, while Forward GAA* can use the updated h-values, which can be more informed. Fig- ure 3.12(d) shows that Forward Repeated A* expands D2 because the user-provided h-values are misleading and form a local minimum. Forward GAA* avoids expanding D2 during the second search (shown in Figure 3.13(d)) resulting in a smaller number of state expansions (but at the cost of updating the h-values of states in the consistency procedure to maintain their consistency). Thus, it is important to evaluate the runtime of Forward GAA* compared to Forward Repeated A*. 95 3.3 Summary In this chapter, I reviewed MT-Adaptive A*, that extends Adaptive A* to moving target search: In Section 3.1.1, I rst reviewed of an eager version of MT-Adaptive A*, referred to as Eager MT-Adaptive A*. In Section 3.1.2, I then reviewed of a more complicated version, referred to as Lazy MT-Adaptive A*. Both versions of MT-Adaptive A* apply to moving target search, where the start and goal states can change between searches. However, they do not apply to moving target search in terrains where the action costs of the agent can decrease between searches. Thus, in Section 3.2, I introduced GAA* that extends Lazy MT-Adaptive A* to moving target search in terrains where the action costs of the agent can both increase and decrease between searches. GAA* is thus the rst heuristic learning incremental search algorithm that applies to moving target search in all three classes of terrains (see Section 1.1.2). 96 Chapter 4: Incremental Search by Reusing Search Trees In this chapter, I introduce search tree transforming incremental search algorithms for path planning for moving target search. In Section 2.4.2.2, I reviewed D* Lite (Koenig & Likhachev, 2002), a state-of-the- art search tree transforming incremental search algorithm designed for stationary target search in all three classes of terrains (see Section 1.1.2). 
However, it is still unknown how to apply D* Lite to moving target search and make it run faster than Forward and Backward Repeated A*. Thus, in order to speed up search tree transforming incremental search algorithms and apply them to moving target search, I develop new search tree transforming incremental search algorithms, including Generalized Fringe-Retrieving A* (G-FRA*) (Sun, Yeoh, & Koenig, 2010a), Fringe-Retrieving A* (FRA*) (Sun et al., 2009), Basic Moving Target D* Lite (Basic MT-D* Lite) and Moving Target D* Lite (MT-D* Lite) (Sun, Yeoh, & Koenig, 2010b). Table 4.1 illustrates the search problems that D* Lite, G-FRA*, FRA*, Basic MT-D* Lite and MT-D* Lite apply to:

Table 4.1: Search Problems that Search Tree Transforming Incremental Search Algorithms Apply to

    Algorithms                     | Action Costs     | Action Costs     | Start State      | Goal State
                                   | Can Increase     | Can Decrease     | Can Change       | Can Change
                                   | Between Searches | Between Searches | Between Searches | Between Searches
    D* Lite                        | Yes              | Yes              | No               | Yes
    G-FRA* & FRA*                  | No               | No               | Yes              | Yes
    Basic MT-D* Lite & MT-D* Lite  | Yes              | Yes              | Yes              | Yes

D* Lite applies to search problems where the goal state can change and the action costs of the agent can change between searches. It does not apply to search problems where the start state can change between searches. For stationary target search, D* Lite can only perform backward (but not forward) searches by assigning the current states of the agent and target to the goal and start states of each search, respectively. D* Lite only applies to stationary target search in all three classes of terrains (see Section 1.1.2).

G-FRA* applies to search problems where the start and goal states can change between searches in known static terrains, where the action costs of the agent do not change between searches. FRA* optimizes G-FRA* for search problems where the start and goal states can change between searches in known static grids, where the action costs of the agent do not change between searches. Both FRA* and G-FRA* require the start state of the current search to remain in the search tree of the previous search since they transform the search tree of the previous search into the search tree of the current search by reusing the part of the search tree of the previous search rooted in the start state of the current search. Thus, they can only perform forward (but not backward) searches by assigning the current states of the agent and the target to the start and goal states of each search, respectively, since the current state of the agent can only move along the path found by the previous search and hence cannot move outside of the search tree of the previous search, while the current state of the target can move to a state that is outside of the search tree of the previous search. If FRA* and G-FRA* performed backward searches and the target moved outside of the search tree of the previous search, then they could not reuse any part of the search tree from the previous search. FRA* applies to moving target search in known static grids only, and G-FRA* applies to moving target search in known static terrains (see Section 1.1.2).
Though FRA* applies only to moving target search in known static grids and G-FRA* applies only to moving target search in known static terrains, where the action costs of the agent do not change between searches, they provide a new way of transforming the search tree of the previous search to the search tree of the current search, which can be used to develop new incremental search algorithms that apply to moving target search in terrains where the action costs of the agent can change between searches, such as MT-D* Lite. FRA* and G-FRA* run up to one order of magnitude faster than both Repeated A* and GAA* for moving target search in known static terrains. Basic MT-D* Lite generalizes D* Lite to search problems where both the start and goal states can change, and the action costs of the agent can change between searches. MT-D* Lite combines the principles of G-FRA* and Basic MT-D* Lite to optimize Basic MT-D* Lite and applies to the same search problems as Basic MT- D* Lite. Both Basic MT-D* Lite and MT-D* Lite can only perform forward (but not backward) searches since they require the start state of the current search to remain in the search tree of the previous search, similar to G-FRA*. Basic MT-D* Lite and MT-D* Lite apply to moving target search in all three classes of terrains 99 (see Section 1.1.2). Basic MT-D* Lite runs faster than Repeated A* and GAA* by up to a factor of four and MT-D* Lite runs faster than Repeated A* and GAA* by up to a factor of eight for moving target search in known dynamic terrains. The structure of this chapter is as follows: I introduce G-FRA* and its optimization FRA* in Section 4.1: First, I introduce G-FRA* in Section 4.1.1. Then, I introduce FRA* in Section 4.1.2. I introduce Basic MT-D* Lite and its optimization MT-D* Lite in Section 4.2: First, I introduce Basic MT-D* Lite in Section 4.2.1. Then, I introduce MT-D* Lite in Section 4.2.2. 4.1 GeneralizedFringe-RetrievingA*anditsOptimization In this section, I introduce G-FRA* for moving target search in known static terrains. I then introduce FRA* that optimizes G-FRA* for moving target search in known static grids. 4.1.1 Generalized Fringe-Retrieving A* G-FRA* (Sun et al., 2010a) is a new search tree transforming incremental search algo- rithm for moving target search in known static terrains. 4.1.1.1 Applicability G-FRA* can only perform forward (but not backward) searches by assigning the current states of the agent and target to the start and goal states of each search, respectively. G-FRA* only applies to moving target search in known static terrains (see Section 1.1.2). 100 4.1.1.2 Principle The principle behind G-FRA* is as follows: G-FRA* performs forward searches. After a search, A* Properties 4, 5 and 6 hold since G-FRA* uses A* for the search (= previous search). The agent then moves along the path towards the target. Thus, the current state of the agent must remain in the search tree and cannot move outside of the search tree of the previous search. Whenever the target moves o the path, G-FRA* performs a new search (= current search) by assigning the current states of the agent and target to the start and goal states of the current search again. Instead of performing the current search from scratch, G-FRA* transforms the search tree of the previous search to the search tree of the current search by reusing the part of search tree of the previous search rooted in the start state of the current search. 
The search tree of the previous search is given by the initial OPEN and CLOSED lists as well as the parent pointers and the g-values of the states in them. Thus, G-FRA* changes the initial OPEN and CLOSED lists as well as theg-values and parent pointers of the states in them to guarantee that A* Properties 4, 5 and 6 hold again after the search tree has been transformed, and then starts the current search with these initial OPEN and CLOSED lists to nd a cost-minimal path from the start state of the current search to the goal state of the current search. 4.1.1.3 Operations Figures 4.1 and 4.2 give the pseudo code of G-FRA*. 1 Since G-FRA* applies only to terrains where the action costs of the agent do not change between searches, G-FRA* can terminate when no path exists between the start state and the goal state after a search 1 In this dissertation, all pseudo codes use the following functions to manage the DELETED list unless otherwise specied: DELETED.Insert(s) inserts state s into the DELETED list. 101 (Line 64), because it is then guaranteed that it will not be able to nd any path in future searches. Similar to the pseudo code of A* (shown in Figure 2.4), G-FRA* initializes the g- and h-values of a state s whenever they are needed during a search by executing InitializeState(s) (Line 19). Dierent from the pseudo code of A*, G-FRA* maintains a CLOSED list that contains all states that have been expanded in previous searches that can be reused for the current search, and a DELETED list that contains all states that are deleted from the search tree of the previous search. Both the CLOSED and DELETED lists are needed to transform the search tree of the previous search to the search tree of the current search. The OPEN list is complete i it satises the following properties: If the CLOSED list is empty, then the OPEN list contains only s start . Otherwise, the OPEN list contains exactly all states that are not in the CLOSED list but have at least one predecessor state in the CLOSED list. The OPEN list is incomplete if does not satisfy the properties described above. G-FRA* maintains a variable open incomplete to indicate whether the initial OPEN list is incomplete (Lines 65, 76, and 77). open incomplete is set to false each time after ComputePath() has been performed (Line 65) since the OPEN list is complete after each search, and set to true each time after Step 2 (Lines 27 - 33) has been performed (Line 76) since deleting states from the initial OPEN and CLOSED lists can result in an incomplete OPEN list. G-FRA* terminates after the goal state is inserted into the CLOSED list (Line 16) and expanded (Lines 17 - 23). The rst search of G-FRA* is identical to an A* search, and starts from scratch to nd a cost-minimal path from the start state to the goal state. The agent then moves along the path until it catches the target or the target moves o the path. 
In the latter 102 01 function CalculateKey(s) 02 return g(s)+h(s); 03 procedure InitializeState(s) 04 if search(s)6=counter 05 g(s) :=1; 06 h(s) :=H(s;s goal ); 07 search(s) :=counter; 08 procedure UpdateState(s) 09 if s2OPEN 10 OPEN.Update(s, CalculateKey(s)); 11 else 12 OPEN.Insert(s, CalculateKey(s)); 13 function ComputePath() 14 while OPEN6=; 15 s :=OPEN.Pop(); 16 CLOSED.Insert(s); 17 for all s 0 2Succ(s) 18 if s 0 = 2CLOSED 19 InitializeState(s 0 ); 20 if g(s 0 )>g(s)+c(s;s 0 ) 21 g(s 0 ) :=g(s)+c(s;s 0 ); 22 parent(s 0 ) :=s; 23 UpdateState(s 0 ); 24 if s =s goal 25 return true; 26 return false; 27 procedure Step2() 28 parent(s start ) :=NULL; 29 for all s2 S in the subtree rooted in s oldstart /* After parent(s start ) := NULL, all states in the subtree rooted in s start are not in the subtree rooted in s oldstart anymore. */ 30 parent(s) :=NULL; 31 if s2OPEN then OPEN.Delete(s); 32 if s2CLOSED then CLOSED.Delete(s); 33 DELETED.Insert(s); 34 procedure Step4() 35 for all s2OPEN 36 h(s) :=H(s;s goal ); 37 UpdateState(s); 38 search(s) :=counter; 39 for all s2DELETED 40 if9s 0 2Pred(s) : s 0 2CLOSED 41 InitializeState(s); 42 for all s 0 2Pred(s) 43 if s 0 2CLOSED AND g(s)>g(s 0 )+c(s 0 ;s) 44 g(s) :=g(s 0 )+c(s 0 ;s); 45 parent(s) :=s 0 ; 46 UpdateState(s); 47 DELETED :=;; 48 procedure Step5() 49 for all s2OPEN 50 h(s) :=H(s;s goal ); 51 UpdateState(s); Figure 4.1: Pseudo Code of Generalized Fringe-Retrieving A* (Part 1) case, G-FRA* runs an A* search with initial OPEN and CLOSED lists obtained from the previous search rather than from scratch. 103 52 function Main() 53 counter := 1; 54 s start := the current state of the agent; 55 s goal := the current state of the target; 56 for all s2S 57 search(s) := 0; 58 InitializeState(s start ); 59 g(s start ) := 0; 60 OPEN :=CLOSED :=DELETED :=;; 61 OPEN.Insert(s start , CalculateKey(s start )); 62 while s start 6=s goal 63 if ComputePath() = false /* Step 6 */ 64 return false; /* Target cannot be caught */ 65 open incomplete := false; 66 while s goal 2CLOSED 67 while target not caught AND target on path from s start to s goal 68 agent follows the path from s start to s goal ; 69 if agent caught target 70 return true; 71 s oldstart :=s start ; 72 s start := the current state of the agent; 73 s goal := the current state of the target; 74 if s start 6=s oldstart 75 Step2(); 76 open incomplete := true; 77 if open incomplete 78 counter :=counter+1; 79 Step4(); 80 else 81 Step5(); 82 return true; Figure 4.2: Pseudo Code of Generalized Fringe-Retrieving A* (Part 2) Previous OPEN List Previous CLOSED List S’ G’ G S (a) Before Step 1 Deleted CLOSED List Deleted OPEN List Initial CLOSED List G Incomplete Initial OPEN List S (b) After Step 2 Deleted CLOSED List Initial CLOSED List G S Inserted OPEN List Incomplete Initial OPEN List (c) After Step 4 Figure 4.3: Operations of Generalized Fringe-Retrieving A* Figure 4.3(a) visualizes the OPEN and CLOSED lists after the previous search, which are used as the initial OPEN and CLOSED lists that the following six steps manipulate. S' and S represent the start states of the previous and current search, respectively, and G' and G represent the goal states of the previous and current search, respectively. Step 1 (Starting A* Immediately): G-FRA* executes this step if the agent has not moved since ComputePath() was executed last (Line 63), and hence s start = s oldstart , where s oldstart is the start state when ComputePath() was executed last. 
104 When Step 1 is executed, Step 2 (Deleting States) must have not been executed since ComputePath() was executed last, because s start =s oldstart and the precondition of Step 2 (Line 74) is not satised. Consequently, open incomplete is false (= the OPEN list is complete) when Step 1 is executed, because open incomplete was set to false (Line 65) after ComputePath() was executed last and the only place where open incomplete is set to true is Line 76, which cannot be executed since its precondition s start 6= s oldstart (Line 74) does not hold in this case. There are the following two cases in Step 1: { Case 1 (Lines 66 - 73): If the goal state of the current search is in the initial CLOSED list (Line 66), then the previous search already determined a cost- minimal path from the start state of the current search to the goal state of the current search, which can be identied in reverse by repeatedly following the parent pointers from the goal state of the current search to the start state of the current search (see A* Property 5 in Section 2.3.4). Then G-FRA* skips Steps 2 to 6. { Case 2 (80 - 81): If the goal state of the current search is not in the initial CLOSED list (Line 66), then the previous search can be continued to determine a cost-minimal path from the start state of the current search to the goal state of the current search. G-FRA* then skips Steps 2 to 4 and executes Step 5 to calculate the h-values of all states in the OPEN list with respect to the goal state of the current search and update theirf-values before the current search is run since the goal state has changed after the previous search (Line 81). 105 G-FRA* then starts the current search with the initial OPEN and CLOSED lists (Line 63) to determine a new cost-minimal path from the start state of the current search to the goal state of the current search. Step 2 (Deleting States): All states in the initial CLOSED list need to satisfy A* Property 4 with respect to the start state of the current search. After the previous search, the initial CLOSED list satises A* Property 4 with respect to the start state of the previous search since G-FRA* performs A* for the previous search. Before the current search, if the initial CLOSED list contains exactly all states of the subtree rooted in the start state of the current search, then A* Property 4 holds with respect to the start state of the current search. A* Property 4(a) holds trivially, and A* Property 4(b) holds for the following reason: Consider the cost- minimal path from the start state of the previous search s 0 start to any states in the initial CLOSED list that results from following the parent pointers froms tos 0 start in reverse. Since the start state of the current searchs start is on this cost-minimal path, it holds thatdist (s 0 start ;s) =dist (s 0 start ;s start )+dist (s start ;s). Sinces satises A* Property 4(b) with respect to s 0 start , it holds that g(s) = g(s 0 start ) +dist (s 0 start ;s). Sinces start satises A* Property 4(b) with respect tos 0 start , it holds thatg(s start ) = g(s 0 start ) +dist (s 0 start ;s start ). Thus, s also satises A* Property 4(b) with respect to s start since g(s) = g(s 0 start ) +dist (s 0 start ;s) = g(s start )dist (s 0 start ;s start ) + dist (s 0 start ;s) =g(s start ) +dist (s start ;s). However, A* Property 4 might not hold for the states that are not in the subtree rooted in the start state of the current search. 
Thus, G-FRA* performs the following 106 procedure to delete those states from the search tree of the previous search: G- FRA* rst sets the parent pointer of the start state of the current search to NULL (Line 28), then a path from the start state of the previous search to any state in the subtree rooted in the start state of the current search cannot be identied in reverse by repeatedly following the parent pointers from the state to the start state of the previous search. Thus, all states in the subtree rooted in the start state of the current search are not in the subtree rooted in the start state of the previous search. G-FRA* then iterates over all states in the subtree rooted in the start state of the previous search (Line 29): 2 G-FRA* sets the parent pointers of the states to NULL (Line 30), deletes the states from the initial OPEN and CLOSED lists if they were in the lists (Lines 31 - 32), and inserts them into the DELETED list (Line 33). Figure 4.3(b) visualizes the initial OPEN and CLOSED lists after Step 2. The dotted line represents the states deleted from the initial OPEN list (called \deleted OPEN list"), the yellow area represents the states deleted from the initial CLOSED list (called \deleted CLOSED list"), the solid line represents the states remaining in the initial OPEN list (called \incomplete initial OPEN list"), and the blue area represents the states remaining in the initial CLOSED list (called \initial CLOSED list"). 2 One possible implementation of Line 29 is using Breadth-First Search (BFS), which maintains a rst- in-rst-out (FIFO) queue. Initially, the FIFO queue only contains the start state of the previous search. BFS repeats the following procedure until the FIFO queue is empty: It deletes a state s from the FIFO queue and expands s by performing the following operations for each s 0 2Succ(s). If parent(s 0 ) =s then BFS generates s 0 by inserting s 0 into the FIFO queue. 107 Step 3 (Terminating Early): If the goal state of the current search is in the initial CLOSED list (Line 66), then G-FRA* executes in the same way as Case 1 of Step 1 (= determines a cost-minimal path from the start state of the current search to the goal state of the current search and skips the rest of the steps), except that the OPEN list is complete in Case 1 of Step 1 and the OPEN list might be incomplete in Step 3, since it might not contain exactly all states that are not in the initial CLOSED list but have at least one predecessor state in the initial CLOSED list after Step 2 has been executed (shown in Figure 4.3(b)). Thus, G-FRA* has to complete the OPEN list in the following steps before performing the current search. Step 4 (Inserting States): The initial OPEN list can be incomplete at the start of Step 4. Thus, Step 4 identies the states that have to be inserted into the initial OPEN list to make it complete. Step 4 iterates over all states in the DELETED list and inserts those states that have a predecessor state in the initial CLOSED list into the initial OPEN list. The DELETED list contains all states deleted from the initial OPEN and CLOSED lists in all executions of Step 2 since G-FRA* ran the last search (Line 33). 
G-FRA* completes the initial OPEN list by performing the following check for all states in the DELETED list before it sets the DELETED list to empty again (Line 47): If the state has a predecessor state in the initial CLOSED list (Line 40), then G-FRA* sets its parent pointer and g-value according to A* Property 6 (see Section 2.3.4) and inserts it into the initial OPEN list (Lines 41 - 46). According to A* Property 6, the initial OPEN list has to contain all states that are not in the initial CLOSED list but have at least one predecessor state in the initial CLOSED list. The correctness of G-FRA* follows from the fact that every state that has a predecessor state in the initial CLOSED list when G-FRA* executes Step 4 was in the initial OPEN or CLOSED lists after the last search and thus is in the initial CLOSED list, the initial OPEN list or the DELETED list at the start of Step 4 (Line 34). I distinguish three cases:

- If a state is in the initial CLOSED list at the start of Step 4 (Line 34), then it was in the initial CLOSED list and thus not in the initial OPEN list after the last search and still is not in the initial OPEN list during Step 4. G-FRA* does nothing since the state should indeed not be in the initial OPEN list during Step 4.

- If a state is in the initial OPEN list at the start of Step 4 (Line 34), then it was in the initial OPEN list and thus was not in the initial CLOSED list but had at least one predecessor state in the initial CLOSED list after the last search, and this still holds during Step 4 (otherwise it would have been deleted from the initial OPEN list in Step 2). G-FRA* does nothing since the state should indeed be in the initial OPEN list during Step 4 and its parent pointer and g-value still satisfy A* Property 6.

- If a state is in the DELETED list at the start of Step 4 (Line 34), then it was deleted from the initial OPEN or CLOSED lists in Step 2. Thus, if it has at least one predecessor state in the initial CLOSED list, G-FRA* inserts it into the initial OPEN list in Step 4 and sets its parent pointer and g-value according to A* Property 6.

Figure 4.3(c) visualizes the initial OPEN and CLOSED lists after Step 4. The dotted line represents the inserted states that complete the initial OPEN list (called "inserted OPEN list"). The solid and dotted lines together represent the initial OPEN list. After Step 4, the h-values of all states in the initial OPEN list have been updated with respect to the goal state of the current search, since (1) the h-values of all states that remained in the initial OPEN list before any state from the DELETED list was inserted into the initial OPEN list have been updated with respect to the goal state of the current search (Line 36), and (2) the h-values of all states that were inserted into the initial OPEN list in this step have been updated with respect to the goal state of the current search (Line 41) before the states were inserted into the initial OPEN list. G-FRA* then skips Step 5 and executes Step 6. Note that the OPEN list, if it is implemented as a binary heap, requires the execution of heap operations to keep it sorted only once, namely at the end of Step 4, since it can remain unsorted until then.
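Before turning to Steps 5 and 6, the following Python sketch summarizes Steps 2 and 4 as described above. It is only an illustration under assumptions of mine: OPEN, CLOSED and DELETED are plain Python sets (the heap ordering of the OPEN list is omitted), the search tree is given by dictionaries parent and g, and succ, pred, cost and the user-provided heuristic H describe the terrain; it is not the dissertation's pseudo code in Figures 4.1 and 4.2.

    from collections import deque

    INF = float('inf')

    def step2_delete_states(s_start, s_old_start, parent, succ, open_list, closed, deleted):
        # Compare Lines 27 - 33: detach the subtree rooted in the new start state,
        # then delete every remaining state of the subtree rooted in the old start
        # state from the OPEN and CLOSED lists and remember it in the DELETED list.
        parent[s_start] = None
        queue = deque([s_old_start])  # breadth-first traversal (compare footnote 2)
        while queue:
            s = queue.popleft()
            for t in succ[s]:
                if parent.get(t) == s:
                    queue.append(t)
            parent[s] = None
            open_list.discard(s)
            closed.discard(s)
            deleted.add(s)

    def step4_insert_states(s_goal, H, g, h, parent, pred, cost, open_list, closed, deleted):
        # Compare Lines 35 - 47: refresh the h-values of the states that remained
        # in the OPEN list with respect to the new goal state ...
        for s in open_list:
            h[s] = H(s, s_goal)
        # ... and complete the OPEN list from the DELETED list: every deleted state
        # with a predecessor in the CLOSED list re-enters the OPEN list, with its
        # parent pointer and g-value chosen according to A* Property 6.
        for s in deleted:
            if any(t in closed for t in pred[s]):
                g[s], parent[s], h[s] = INF, None, H(s, s_goal)
                for t in pred[s]:
                    if t in closed and g[t] + cost[(t, s)] < g[s]:
                        g[s], parent[s] = g[t] + cost[(t, s)], t
                open_list.add(s)
        deleted.clear()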
Step 5 (Updating h-values): G-FRA* executes this step only when the start state remains unchanged since ComputePath() was executed last and the goal state of the current search is not in the initial CLOSED list (see Step 1): Since the goal state of the current search has changed after the previous search, the h-values of all states in the initial OPEN list need to be updated with respect to the goal state 110 of the current search. G-FRA* also updates the f-values of all states in the initial OPEN list since they depend on the h-values (Lines 49 - 51). Step 6 (Starting A*): G-FRA* then starts the current search with the initial OPEN and CLOSED lists (Line 63) to determine a new cost-minimal path from the start state of the current search to the goal state of the current search. 4.1.1.4 Example g search h f Figure 4.4: Legend of Figure 4.5 I now use an example problem of moving target search that is extended from the problem in Figure 2.20 to demonstrate the behavior of G-FRA*. The real terrain and the current states of the agent and target are exactly the same as those in Figure 2.20: Initially, the current states of the agent and target are B1 (marked S) and D5 (marked G), respectively. The only dierence is that the terrain now is a known static terrain, where the action costs of the agent do not change between searches. All states have their search-value in the lower right corner. States that have been initialized in InitializeState() also have their g-values in the upper left corner, h-values in the lower left corner, and f-values in the upper right corner (shown in Figure 4.4). G-FRA* breaks ties between states with the same f-values in favor of states with larger g-values. The outgoing arrows from states are the parent pointers of the states and point to their parents. 111 1 8 2 8 3 8 4 8 5 8 7 1 6 1 5 1 4 1 3 1 0 6 1 6 2 6 6 8 6 1 5 1 4 1 2 1 1 6 2 6 3 6 7 8 5 1 4 1 3 1 1 1 2 6 3 6 4 6 8 8 4 1 3 1 2 1 0 1 3 8 4 8 5 8 9 10 5 1 4 1 3 1 1 1 5 1 2 3 4 E A B C D S G (a) First Search counter = 1 1 8 2 8 3 8 4 8 5 8 7 1 6 1 5 1 4 1 3 1 0 6 1 6 2 6 6 8 6 1 5 1 4 1 2 1 1 6 2 6 3 6 7 8 5 1 4 1 3 1 1 1 2 6 3 6 4 6 8 8 4 1 3 1 2 1 0 1 3 8 4 8 5 8 9 10 5 1 4 1 3 1 1 1 E A B C D 5 1 2 3 4 S G (b) Step 2 counter = 1 1 8 2 9 3 8 4 8 5 8 7 1 7 2 5 1 4 1 3 1 2 9 1 6 2 6 6 8 7 2 5 1 4 1 2 1 1 6 2 7 3 7 7 8 5 1 5 2 4 2 1 1 2 6 3 6 4 6 8 8 4 1 3 1 2 1 0 1 3 8 4 8 5 8 9 9 5 1 4 1 3 1 0 2 E A B C D 5 1 2 3 4 S G (c) Step 4 counter = 2 1 8 2 9 3 8 4 8 5 8 7 1 7 2 5 1 4 1 3 1 2 9 1 6 2 6 6 8 7 2 5 1 4 1 2 1 3 9 2 7 3 7 7 8 6 2 5 2 4 2 1 1 4 9 3 7 4 7 8 8 5 2 4 2 3 2 0 1 5 9 4 7 5 7 9 9 4 2 3 2 2 2 0 2 E A B C D 5 1 2 3 4 S G (d) Step 6 (Second Search) counter = 2 Figure 4.5: Example Trace of Generalized Fringe-Retrieving A* G-FRA* performs its rst search from B1 to D5 (shown in Figure 4.5(a)), which starts from scratch. The expanded states (= states in the initial CLOSED list) are shaded grey. After the rst search, the agent moves along the path from B1 to B2, and the target moves o the path from D5 to E5. G-FRA* then executes the following steps before the second search: G-FRA* skips Step 1 and executes Step 2 since the agent moved along the path from B1 to B2, and thus the start state changed between searches. Thus, the conditions of Step 1 are not satised. In Step 2 (shown in Figure 4.5(b)), G-FRA* deletes all states in the search tree of the previous search that are not in the subtree rooted in the start state of the current search by deleting them from the initial OPEN and CLOSED lists. 
The state that is deleted from the initial OPEN list is A1, and the states that are deleted from the initial CLOSED list are B1, C1, C2, C3, D1, D2, D3, E1, E2 and E3. All states that are deleted have their parent pointer set to NULL and are inserted into the DELETED list. Thus, the DELETED list contains A1, B1, C1, C2, C3, D1, D2, D3, E1, E2, and E3. 112 G-FRA* skips Step 3 and executes Step 4, since the goal state of the current search changed after the previous search and is not in the initial CLOSED list. Thus, the conditions of Step 3 are not satised. In Step 4 (shown in Figure 4.5(c)), G-FRA* completes the initial OPEN list by checking all states s in the DELETED list: If s has a predecessor state in the initial CLOSED list, then G-FRA* sets its parent pointer andg-value according to A* Property 6 (see Section 2.3.4) and inserts it into the initial OPEN list. After Step 4, B1, C2, and C3 have been inserted into the initial OPEN list with g(B1) = 2, g(C2) = 2, and g(C3) = 3, and parent(B1) = B2, parent(C2) = B2, and parent(C3) = B3. G-FRA* skips Step 5 and executes Step 6, since the start state of the current search has changed after the previous search. Thus, the conditions of Step 5 are not satised. In Step 6 (shown in Figure 4.5(d)), G-FRA* starts the second search, which performs 7 state expansions, namely C3, D3, E3, C2, D2, E2 and E5. On the other hand, A* would perform 17 state expansions if one performed A* from scratch for the second search. 4.1.2 Fringe-Retrieving A* Fringe-Retrieving A* (FRA*) (Sun et al., 2009) is a new search tree transforming incre- mental search algorithm that optimizes G-FRA* for moving target search in known static grids (see Section 1.1.2). 4.1.2.1 Applicability Like G-FRA*, FRA* requires the start state of the current search to remain in the search tree of the previous search. Thus, FRA* can only perform forward (but not backward) 113 searches. FRA* applies only to moving target search in known static grids, since it uses geometric properties that are specic to grids and thus does not apply to other state space representations, such as state lattices (see Section 2.2.2). 4.1.2.2 Principle FRA* and G-FRA* share the same principle: Whenever the target moves o the path, FRA* performs a new search (= current search) by assigning the current states of the agent and target to the start and goal states of the current search, respectively. Instead of performing the current search from scratch, FRA* transforms the search tree of the previous search, which is given by the initial OPEN and CLOSED lists as well as the parent pointers and g-values of the states in them, to the search tree of the current search. FRA* changes the initial OPEN and CLOSED lists as well as the parent pointers of the states in them to guarantee that A* Properties 4, 5 and 6 hold again and then starts the current search with these initial OPEN and CLOSED lists to nd a cost-minimal path from the start state of the current search to the goal state of the current search. The dierence between FRA* and G-FRA* is that the initial CLOSED list of G-FRA* contains only the expanded states of the previous search that are in the subtree rooted in the start state of the current search, while FRA* uses geometric properties that are specic to grids to insert additional states into the initial CLOSED list and guarantee that A* Properties 4, 5 and 6 hold again after the insertions. 
This optimization enables FRA* to reuse more states from the previous search and thus perform fewer state expansions than G-FRA* in the current search, which runs faster than G-FRA* (Sun et al., 2009). 114 I now dene some terminologies for the CLOSED list of A* that are specic to grids. These terminologies will be used to describe the operations of FRA*: The inner perimeter of the CLOSED list contains all states that are in the CLOSED list and have at least one neighboring state that is not in the CLOSED list. The outer perimeter of the CLOSED list contains all states that are not in the CLOSED list and share at least one corner with at least one state that is in the CLOSED list. A* Property 6 implies that the states in the OPEN list belong to the outer perimeter of the CLOSED list. Figure 4.6 illustrates what we call the inner and outer perimeters of the CLOSED list (The CLOSED list is shaded grey). The solid black line is the outer perimeter of the CLOSED list, and the dashed black line is the inner perimeter of the CLOSED list. The outer and inner perimeters consist of three parts each. Figure 4.6: Denition of Perimeter 4.1.2.3 Operations Figures 4.7 and 4.8 give the pseudo code of FRA*, that extends the pseudo code of G- FRA* in Figures 4.1 and 4.2. Similar to G-FRA*, FRA* initializes theg- andh-values of a state s whenever they are needed during a search by executing InitializeState(s) (Line 115 01 function CalculateKey(s) 02 return g(s)+h(s); 03 procedure InitializeState(s) 04 if search(s)6=counter 05 g(s) :=1; 06 h(s) :=H(s;s goal ); 07 search(s) :=counter; 08 procedure UpdateState(s) 09 if s2OPEN 10 OPEN.Update(s, CalculateKey(s)); 11 else 12 OPEN.Insert(s, CalculateKey(s)); 13 function ComputePath() 14 while OPEN6=; 15 s :=OPEN.Pop(); 16 CLOSED.Insert(s); 17 for all s 0 2Succ(s) 18 if s 0 = 2CLOSED 19 InitializeState(s 0 ); 20 if g(s 0 )>g(s)+c(s;s 0 ) 21 g(s 0 ) :=g(s)+c(s;s 0 ); 22 parent(s 0 ) :=s; 23 UpdateState(s 0 ); 24 if s =s goal 25 return true; 26 return false; 27 function UpdateParent(direction) 28 for all s2Succ(cell) in direction order, starting with parent(cell) 29 if g(s) =g(cell)+c(cell;s) AND s2CLOSED 30 parent(s) :=cell; 31 cell :=s; 32 return true; 33 return false; 34 procedure AdditionalStep() 35 cell :=s start ; 36 while UpdateParent(counter-clockwise) /* body of while-loop is empty */; 37 cell :=s start ; 38 while UpdateParent(clockwise) /* body of while-loop is empty */; 39 procedure Step2() 40 parent(s start ) :=NULL; 41 for all s2 S in the subtree rooted in s oldstart /* After parent(s start ) := NULL, all states in the subtree rooted in s start are not in the subtree rooted in s oldstart anymore. 
*/ 42 parent(s) :=NULL; 43 if s2OPEN then OPEN.Delete(s); 44 if s2CLOSED then CLOSED.Delete(s); 45 procedure Step4() 46 for all s2S on the outer perimeter of CLOSED, starting with anchor 47 if s is unblocked AND s = 2OPEN AND9s 0 2Pred(s) : s 0 2CLOSED 48 OPEN :=OPEN[fsg; 49 for all s2OPEN 50 InitializeState(s); 51 for all s 0 2Pred(s) 52 if s 0 2CLOSED AND g(s)>g(s 0 )+c(s 0 ;s) 53 g(s) :=g(s 0 )+c(s 0 ;s); 54 parent(s) :=s 0 ; 55 UpdateState(s); Figure 4.7: Pseudo Code of Fringe-Retrieving A* (Part 1) 116 56 procedure Step5() 57 for all s2OPEN 58 h(s) :=H(s;s goal ); 59 UpdateState(s); 60 function Main() 61 counter := 1; 62 s start := the current state of the agent; 63 s goal := the current state of the target; 64 for all s2S 65 search(s) := 0; 66 InitializeState(s start ); 67 g(s start ) := 0; 68 OPEN :=CLOSED :=;; 69 OPEN.Insert(s start , CalculateKey(s start )); 70 while s start 6=s goal 71 if ComputePath() = false /* Step 6 */ 72 return false; /* Target cannot be caught */ 73 open incomplete := false; 74 while s goal 2CLOSED 75 while target not caught AND target on path from s start to s goal 76 agent follows the path from s start to s goal ; 77 if agent caught target 78 return true; 79 s oldstart :=s start ; 80 s start := the current state of the agent; 81 s goal := the current state of the target; 82 if s start 6=s oldstart 83 AdditionalStep(); 84 anchor :=parent(s start ); 85 Step2(); 86 open incomplete := true; 87 if open incomplete 88 counter :=counter+1; 89 Step4(); 90 else 91 Step5(); 92 return true; Figure 4.8: Pseudo Code of Fringe-Retrieving A* (Part 2) 19), and the main search routine is given in ComputePath() (Lines 13 - 26), which remains unchanged from G-FRA*. Dierent from G-FRA*, FRA* uses geometric properties that are specic to grids to insert more states into the initial CLOSED list and guarantee that A* Properties 4, 5 and 6 hold again after the insertions. There are two major changes from the pseudo code of G-FRA*: (1) FRA* performs an additional step between Step 1 and Step 2, named \Additional Step" to insert more states into the initial CLOSED list, and (2) FRA* performs a version of Step 4 that is dierent from Step 4 of G- FRA* to complete the initial OPEN list. Both Additional Step and Step 4 of FRA* use geometric properties that are specic to grids, and thus do not apply to other state space 117 representation, such as state lattices. I now show how FRA* obtains the initial OPEN and CLOSED lists with a focus on the dierence between the operations of G-FRA* and FRA*: Step 1 (Starting A* Immediately): Step 1 of FRA* is the same as Step 1 of G-FRA*: FRA* executes this step if the agent has not moved (and hence s start = s oldstart ) since ComputePath() was executed last (Line 71). There are the following two cases in Step 1: { Case 1 (Lines 74 - 81): If the goal state of the current search is in the initial CLOSED list, then FRA* skips Additional Step to Step 6. { Case 2 (Lines 90 - 91): If the goal state of the current search is not in the initial CLOSED list, then the previous search can be continued to determine a cost- minimal path from the start state of the current search to the goal state of the current search. FRA* then skips Additional Step to Step 4 and executes Step 5 to calculate the h-values of all states in the initial OPEN list with respect to the goal state of the current search and update their f-values before the current search is run since the goal state has changed after the previous search (Line 81). 
FRA* then starts the current search with the initial OPEN and CLOSED lists (Line 71) to determine a new cost-minimal path from the start state of the current search to the goal state of the current search. Additional Step (Changing Parent Pointers): The initial CLOSED list at the start of the current search should be as large a subset of the initial CLOSED list after the previous search as possible, so that the current search performs fewer 118 state expansions and runs faster. FRA* executes this step (which does not exist for G-FRA*) to insert additional states into the subtree rooted in the start state of the current search by using geometric properties that are specic to grids, so that there are more states in the initial CLOSED list at the start of the current search: States from the initial CLOSED list after the previous search often satisfy A* Property 4(b) with respect to the start state of the current search but not A* Property 4(a) since there are typically many alternative cost-minimal paths from the start state of the previous search to a state s in the initial CLOSED list after the previous search. If the start state of the current search is on the cost-minimal path from the start state of the previous search to s that results from following the parent pointers from s to the start state of the previous search, then s is in the subtree rooted in the start state of the current search and thus in the initial CLOSED list at the start of the current search. If the start state of the current search is not on the cost-minimal path, then s is not in the subtree rooted in the start state of the current search. FRA* nds such states and changes their parent pointers to make them part of the subtree rooted in the start state of the current search if possible so that they can be part of the initial CLOSED list at the start of the current search. First, FRA* makes the start state of the current search its current state s (Line 35), faces its parent and performs checks in the counter-clockwise direction. It turns counter-clockwise to face the next neighboring state s 0 of its current state s that is in the initial CLOSED list and checks whether it holds that g(s 0 ) =g(s) +c(s;s 0 ) (Lines 28 - 29). 119 { If the check is successful, then FRA* sets the parent ofs 0 tos (Line 30), which is possible because this change does not aect its g-value. Due to this change, all states in the subtree rooted in s 0 now belong to the subtree rooted in the start state of the current search and their g-values remain unaected. FRA* then makes s 0 its current state (Line 31), faces its new parent s and repeats the process of turning and checking the neighboring state that it faces. { If the check is unsuccessful, then FRA* repeats the process of turning and checking the neighboring state that it faces. If, during the process of turning and checking the neighboring state that it faces, FRA* faces the parent of its current state again, then it makes the start state of the current search its current state again (Line 37), faces its parent and now performs similar checks in the clockwise direction (Line 38). The resulting search tree is one that a search of A* could have generated if it had broken ties among states with the same f-value appropriately. It is guaranteed that the process of checking the neighboring state terminates. 
This is because during the checks in the counter-clockwise direction, whenever making a state s 0 the current state (except when s 0 is set to the start state of the current search at the start of the process), itsg-value must be larger than theg-value of the previous current state s (Lines 28 - 29 and c(s;s 0 ) > 0). Thus, the g-value of the current state must increase monotonically during the process. Since the g-values of the states in the search tree must be nite, the process must terminate. Similar situation holds for the checks in the clockwise direction. 120 Step 2 (Deleting States): Before Step 2, FRA* remembers the parent of the start state of the current search, called the anchor state, that will be used in Step 4 to complete the initial OPEN list (Line 84). Step 2 of FRA* is similar to Step 2 of G-FRA*: FRA* rst sets the parent pointer of the start state of the current search to NULL (Line 40), then all states in the subtree rooted in the start state of the current search are not in the subtree rooted in the start state of the previous search anymore. FRA* then iterates over all states in the subtree rooted in the start state of the previous search: FRA* sets the parent pointers of the states to NULL (Line 42) and deletes the states from the initial OPEN and CLOSED lists if they were in the lists (Lines 43 - 44). The only dierence to G-FRA* is that FRA* does not maintain a DELETED list, and thus does not insert all deleted states into the DELETED list. Step 3 (Terminating Early): Step 3 of FRA* is as same as Step 3 of G-FRA*: If the goal state of the current search is in the initial CLOSED list (Line 74), then the previous search already determined a cost-minimal path from the start state of the current search to the goal state of the current search (see A* Property 5 in Section 2.3.4). Thus, FRA* skips Steps 4 to 6. Step 4 (Inserting States): Step 4 of FRA* is dierent from Step 4 of G-FRA*: This step uses geometric properties that are specic to grids: The initial OPEN list needs to contain exactly all states that are on the outer perimeter of the initial CLOSED list that have at least one predecessor state in the initial CLOSED list according to A* Property 6. The initial CLOSED list forms a contiguous area in 121 grids according to A* Property 5. FRA* thus completes the initial OPEN list by circumnavigating the outer perimeter of the initial CLOSED list that contains the anchor state, starting with the anchor state, and inserting every visited unblocked state that has at least one predecessor state (= neighboring state on the grid) in the initial CLOSED list into the initial OPEN list if the unblocked state is not yet in the initial OPEN list (Lines 46 - 48). Afterwards, the initial OPEN list is complete. 3 However, the parent pointers and g-values of some states in the initial OPEN list might not satisfy A* Properties 6(a) and (b). FRA* therefore sets the parent of every states in the initial OPEN list to the states 0 in the initial CLOSED list that minimizes g(s 0 ) +c(s 0 ;s) and then the g-value of s to g(parent(s)) +c(parent(s);s) (Lines 49 - 54). The parent pointers and g-values of all states in the initial OPEN list now satisfy A* Properties 6(a) and (b). FRA* also updates the f-values of all states in the initial OPEN list using their current g- and h-values (Line 55). 
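To make the perimeter-based completion concrete, here is a simplified Python sketch. It deliberately does not reproduce FRA*'s circumnavigation of the outer perimeter starting at the anchor state; instead it computes the outer perimeter directly from its definition, which scans the whole CLOSED list and therefore gives up part of the intended savings. Cells are assumed to be (x, y) tuples, and names such as blocked, pred and cost are assumptions of mine rather than the dissertation's pseudo code.

    def corner_neighbors(cell):
        # All cells that share at least one corner with the given cell.
        x, y = cell
        return [(x + dx, y + dy)
                for dx in (-1, 0, 1) for dy in (-1, 0, 1)
                if (dx, dy) != (0, 0)]

    def outer_perimeter(closed):
        # Definition from the text: all states that are not in the CLOSED list but
        # share at least one corner with at least one state in the CLOSED list.
        return {n for s in closed for n in corner_neighbors(s) if n not in closed}

    def complete_open_list(closed, open_list, blocked, pred, cost, g, parent):
        # Compare Lines 46 - 55: insert every unblocked perimeter state that has a
        # predecessor in the CLOSED list, then repair the parent pointers and
        # g-values of all OPEN states according to A* Properties 6(a) and 6(b).
        for s in outer_perimeter(closed):
            if s not in blocked and s not in open_list and any(t in closed for t in pred(s)):
                open_list.add(s)
        for s in open_list:
            candidates = [t for t in pred(s) if t in closed]
            if candidates:
                best = min(candidates, key=lambda t: g[t] + cost(t, s))
                parent[s], g[s] = best, g[best] + cost(best, s)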
The initial OPEN list, which, if it is implemented as a binary heap, requires the execution of heap operations to keep it sorted only once, namely at the end of Step 4, since it can remain unsorted until then. Step 5 (Updating h-values): Step 5 of FRA* is as same as Step 5 of G-FRA*: If the start state remains unchanged since ComputePath() was executed last, and the goal state of the current search is not in the initial CLOSED list, FRA* updates the h-values of all states in the initial OPEN list with respect to the goal state of 3 FRA* does not need to circumnavigate the outer perimeters of the initial CLOSED list that do not contain the anchor state (if any) to complete the initial OPEN list, because the states belong to those outer perimeters have not been deleted from the search tree of the previous search in Step 2, and remain in the initial OPEN list after Steps 2 and 3. 122 the current search. G-FRA* also updates the f-values of all states in the initial OPEN list since their f-values depend on their current h-values (Lines 57 - 59). Step 6 (Starting A*): Step 6 of FRA* is as same as Step 6 of G-FRA*: It then starts the current search with the initial OPEN and CLOSED lists (Line 71) to determine a new cost-minimal path from the start state of the current search to the goal state of the current search. 4.1.2.4 Example g search h f Figure 4.9: Legend of Figure 4.10 I now use the same example problem of moving target search as for G-FRA* (shown in Figure 4.5) to demonstrate the behavior of FRA*. Initially, the current states of the agent and target are B1 (marked S) and D5 (marked G), respectively. All states have their search-value in the lower right corner. States that have been initialized in InitializeState() also have their g-values in the upper left corner, h-values in the lower left corner, and f-values in the upper right corner (shown in Figure 4.9). FRA* breaks ties among states with the same f-values in favor of states with larger g-values. The outgoing arrows from states are the parent pointers of the states and point to their parents. Like G-FRA*, FRA* performs its rst search from B1 to D5 from scratch (shown in Figure 4.10(a)). The expanded states (= states in the initial CLOSED list) are shaded 123 1 8 2 8 3 8 4 8 5 8 7 1 6 1 5 1 4 1 3 1 0 6 1 6 2 6 6 8 6 1 5 1 4 1 2 1 1 6 2 6 3 6 7 8 5 1 4 1 3 1 1 1 2 6 3 6 4 6 8 8 4 1 3 1 2 1 0 1 3 8 4 8 5 8 9 10 5 1 4 1 3 1 1 1 4 5 1 2 3 E A B C D S G (a) First Search counter = 1 1 8 2 8 3 8 4 8 5 8 7 1 6 1 5 1 4 1 3 1 0 6 1 6 2 6 6 8 6 1 5 1 4 1 2 1 1 6 2 6 3 6 7 8 5 1 4 1 3 1 1 1 2 6 3 6 4 6 8 8 4 1 3 1 2 1 0 1 3 8 4 8 5 8 9 10 5 1 4 1 3 1 1 1 E A B C D 4 5 1 2 3 S G (b) Additional Step counter = 1 1 8 2 8 3 8 4 8 5 8 7 1 6 1 5 1 4 1 3 1 0 6 1 6 2 6 6 8 6 1 5 1 4 1 2 1 1 6 2 6 3 6 7 8 5 1 4 1 3 1 1 1 2 6 3 6 4 6 8 8 4 1 3 1 2 1 0 1 3 8 4 8 5 8 9 10 5 1 4 1 3 1 1 1 4 5 1 2 3 E A B C D S G (c) Step 2 counter = 1 1 8 2 9 3 8 4 8 5 8 7 1 7 2 5 1 4 1 3 1 2 9 1 6 2 6 6 8 7 2 5 1 4 1 2 1 3 9 2 6 3 6 7 8 6 2 4 1 3 1 1 1 4 9 3 6 4 6 8 8 5 2 3 1 2 1 0 1 5 9 4 8 5 8 9 9 4 2 4 1 3 1 0 2 4 5 1 2 3 E A B C D S G (d) Step 4 counter = 2 1 8 2 9 3 8 4 8 5 8 7 1 7 2 5 1 4 1 3 1 2 9 1 6 2 6 6 8 7 2 5 1 4 1 2 1 3 9 2 6 3 6 7 8 6 2 4 1 3 1 1 1 4 9 3 6 4 6 8 8 5 2 3 1 2 1 0 1 5 9 4 8 5 8 9 9 4 2 4 1 3 1 0 2 E A B C D 4 5 1 2 3 S G (e) Step 6 (Second Search) counter = 2 Figure 4.10: Example Trace of Fringe-Retrieving A* grey. After the rst search, the agent moves along the path from B1 to B2, and the target moves o the path from D5 to E5. 
Then, FRA* executes the following steps before the second search: FRA* skips Step 1 and executes Additional Step since the agent moved along the path from B1 to B2, and thus the start state changed between searches. Thus, the conditions of Step 1 are not satised. In Additional Step (shown in Figure 4.10(b)), FRA* inserts states that were not in the subtree rooted in the start state of the current search into the subtree, so that there are more states in the initial CLOSED list at the start of the second search: Before this 124 step, the expanded states in the subtree rooted in the start state B2 of the current search and in the initial CLOSED list are A3, A4, A5, B2, B3, B5, C5 and D5 (shown in Figure 4.10(a)). However, C2 could have B2 as parent rather than C1 since this change does not aect its g-value. Then, all states in the subtree rooted in C2 belong to this subtree, and their g-values remain unaected. FRA* starts at the start state of the current search B2 and performs checks in the counter-clockwise direction. First, FRA* is at B2 facing its parent B1 and checks C2 successfully. It then sets the parent of C2 to B2. Second, FRA* is at C2 facing its parent B2, checks C1 unsuccessfully and then checks D2 successfully. It then sets the parent of D2 to C2. Third, FRA* is at D2 facing its parent C2, checks D1 unsuccessfully and then checks E2 successfully. It then sets the parent of E2 to D2. Fourth, FRA* is at E2 facing its parent D2, checks E1 unsuccessfully and then checks E3 successfully. It then sets the parent of E3 to E2 (which does not change its parent). Fifth, FRA* is at E3 facing its parent E2 and checks D3 unsuccessfully. FRA* then starts again at the start state of the current search B2 and performs similar checks in the clockwise direction until it reaches D5. The states in the subtree rooted in the start state B2 of the current search that are in the initial CLOSED list after Additional Step are A3, A4, A5, B2, B3, B5, C2, C3, C5, D2, D3, D5, E2 and E3. In Step 2 (shown in Figure 4.10(c)), FRA* deletes all states in the subtree rooted in the start state of the previous search but not in the subtree rooted in the start state of the current search by deleting them from the initial OPEN and CLOSED lists. The state that is deleted from the initial OPEN list is A1, and the states that are deleted from the initial CLOSED list are B1, C1, D1, and E1. All states that are deleted have their parent pointer set to NULL. 125 FRA* skips Step 3 and executes Step 4, since the goal state of the current search changed after the rst search and is not in the initial CLOSED list. Thus, the conditions of Step 3 are not satised. In Step 4 (shown in Figure 4.10(d)), FRA* completes the initial OPEN list by circum- navigating the outer perimeter of the initial CLOSED list that contains the anchor cell B1, starting with B1. The grey arrows point towards the states that FRA* visits during the circumnavigation. During the circumnavigation, FRA* inserts B1, C1, D1 and E1 into the initial OPEN list, since they have at least one predecessor state in the initial CLOSED list, and then corrects all of their parent pointers and g-values. For example, before this step, parent(C1) = NULL and g(C1) = 1. After this step, parent(C1) = C2 and g(C1) = 3. FRA* skips Step 5 and executes Step 6, since the start state of the current search changed after the rst search. Thus, the conditions of Step 5 are not satised. 
In Step 6 (shown in Figure 4.10(e)), FRA* starts the second search, in which FRA* performs only one state expansion, namely E5. On the other hand, G-FRA* performs 7 state expansions, namely C3, D3, E3, C2, D2, E2 and E5 (shown in Figure 4.5(d)).

4.2 Moving Target D* Lite

In this section, I introduce Moving Target D* Lite for moving target search in terrains where the action costs of the agent can change between searches. I first introduce a simple version of Moving Target D* Lite, referred to as Basic Moving Target D* Lite (Basic MT-D* Lite), for moving target search in Section 4.2.1. I then introduce an optimized version of Moving Target D* Lite, referred to as Moving Target D* Lite (MT-D* Lite), in Section 4.2.2, which runs faster than Basic MT-D* Lite.

4.2.1 Basic Moving Target D* Lite

Basic MT-D* Lite is a new search tree transforming incremental search algorithm for moving target search.

4.2.1.1 Applicability

Basic MT-D* Lite requires the start state of the current search to remain in the search tree of the previous search. Thus, Basic MT-D* Lite can only perform forward (but not backward) searches by assigning the current states of the agent and target to the start and goal states of each search, respectively, since the current state of the agent can only move along the path found by the previous search and hence cannot move outside of the search tree of the previous search, while the current state of the target can move to a state that is outside of the search tree of the previous search. Basic MT-D* Lite applies to moving target search in all three classes of terrains (see Section 1.1.2).

4.2.1.2 Principle

    rhs(s) = c                                             if s = s_start                   (Eq. 1')
    rhs(s) = min_{s' ∈ Pred(s)} (g(s') + c(s', s))         otherwise                        (Eq. 2')

    parent(s) = NULL                                       if s = s_start OR rhs(s) = ∞     (Eq. 3')
    parent(s) = argmin_{s' ∈ Pred(s)} (g(s') + c(s', s))   otherwise                        (Eq. 4')

Basic MT-D* Lite shares its principle with D* Lite, and it extends D* Lite to the situation where the start state can change between searches: D* Lite requires that all states satisfy Eqs. 1-4 described in Section 2.4.2.2. The correctness proof of D* Lite continues to hold if c is an arbitrary finite constant in Eq. 1 instead of zero. In Basic MT-D* Lite, if the start state changed between searches, the rhs-value of the start state of the current search can be an arbitrary finite value, including its current rhs-value, since its current rhs-value is finite.⁴ Since Basic MT-D* Lite performs forward searches while D* Lite performs backward searches, Basic MT-D* Lite simply converts Eqs. 1-4 described in Section 2.4.2.2 to Eqs. 1'-4', respectively.⁵ Basic MT-D* Lite therefore calculates the rhs-value of the previous start state (according to Eq. 2' since it is no longer the start state of the current search), its parent pointer (according to Eqs. 3'-4') and its membership in the OPEN list. Then, Basic MT-D* Lite performs a new search using operations similar to those of D* Lite.

⁴ The rhs-value of the start state was initialized to 0 (Line 11), and the rhs-value of the start state of the current search is guaranteed to be finite because (1) if there exists a path after the previous search, then the start state of the current search is on the cost-minimal path from the start state to the goal state of the previous search.
The rhs-values of all states on this path are no larger than the rhs-value of the goal state of the previous search, which is nite due to Lines 45 - 46 of the Basic MT-D* Lite pseudo code, and (2) if there does not exist a path after the previous search, then the start state remains unchanged because the start state does not move in this case. Therefore, the rhs-value of the start state is still nite. 5 This conversion is straightforward, Basic MT-D* Lite checks the predecessor states rather than the successor states for setting therhs-values and parent pointers of all states in Eqs. 2 and 4 when performing forward searches, resulting in Eqs. 2' and 4', respectively. Eqs. 1 and 3 are identical to Eqs. 1' and 3', respectively. 128 4.2.1.3 Operations Figures 4.11 and 4.12 give the pseudo code of Basic MT-D* Lite that extends the pseudo code of D* Lite in Figures 2.17 and 2.18. Like D* Lite, Basic MT-D* Lite maintains an h-value (= H(s;s goal )), g-value, rhs-value, and parent pointer for every state s. The rhs-value is dened in a similar way to D* Lite (see Eqs. 1' and 2'). The main dierence is that c can be an arbitrary nite constant during each search for Basic MT-D* Lite, whilec must be 0 during each search for D* Lite (see Section 2.4.2.2). The parent pointer of state s is also dened in a similar way to D* Lite (see Eqs. 3' and 4'). The main search routine is ComputePath() (Lines 20 - 55), which is similar to ComputePath() in D* Lite. The main dierences of the operations between D* Lite and Basic MT-D* Lite are caused by the change in search direction: When expanding a state s, Basic MT-D* Lite needs to check the successor states (rather than the predecessor states) of s (Lines 30 and 37). When calculating the user-provided h-value of a state s, Basic MT-D* Lite uses H(s;s goal ) (rather than H(s goal ;s) (Line 02). When calculating k m , Basic MT-D* Lite uses k m :=k m +H(s goal ;s oldgoal ) (rather than k m :=k m +H(s oldgoal ;s goal )) (Line 70). ComputePath() determines a cost-minimal path from the start state to the goal state of each search. Before it is executed, states can have arbitrary g-values but their rhs- values and parent pointers have to satisfy Eqs. 1'-4' and the OPEN list has to contain all locally inconsistent states. 
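As a small illustration of this invariant, the following Python sketch recomputes the rhs-value and parent pointer of a state other than the start state according to Eqs. 2'-4' and tests local consistency. The dictionaries g, rhs, cost and pred are assumptions of mine, not the dissertation's pseudo code; BasicDeletion() (Lines 48 - 55) performs this kind of recomputation for the start state of the previous search.

    INF = float('inf')

    def recompute_rhs_and_parent(s, s_start, g, cost, pred):
        # Eqs. 1' - 4': the start state keeps an arbitrary finite rhs-value; every
        # other state takes its rhs-value and parent pointer from the predecessor
        # that minimizes g(s') + c(s', s), or NULL if no finite value exists.
        assert s != s_start, "the rhs-value of the start state is not recomputed"
        rhs, parent = INF, None
        for t in pred[s]:
            if g[t] + cost[(t, s)] < rhs:
                rhs, parent = g[t] + cost[(t, s)], t
        return rhs, parent

    def locally_consistent(s, g, rhs):
        # A state is locally consistent iff g(s) = rhs(s); before ComputePath()
        # runs, the OPEN list has to contain all locally inconsistent states.
        return g[s] == rhs[s]

A state whose g-value and rhs-value differ would be inserted into (or updated in) the OPEN list by UpdateState() (Lines 13 - 19).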
The main necessary modications to the pseudo code of D* Lite in Figures 2.17 and 2.18 are as follows: 129 01 function CalculateKey(s) 02 return [min(g(s);rhs(s))+H(s;s goal )+km;min(g(s);rhs(s))]; 03 procedure Initialize() 04 OPEN :=;; 05 km := 0; 06 for all s2S 07 rhs(s) :=g(s) :=1; 08 parent(s) :=NULL; 09 s start := the current state of the agent; 10 s goal := the current state of the target; 11 rhs(s start ) := 0; 12 OPEN.Insert(s start ,CalculateKey(s start )); 13 procedure UpdateState(u) 14 if g(u)6=rhs(u) AND u2OPEN 15 OPEN.Update(u, CalculateKey(u)); 16 else if g(u)6=rhs(u) AND u = 2OPEN 17 OPEN.Insert(u, CalculateKey(u)); 18 else if g(u) =rhs(u) AND u2OPEN 19 OPEN.Delete(u); 20 function ComputePath() 21 while OPEN.TopKey()< CalculateKey(s goal ) OR rhs(s goal )>g(s goal ) 22 u :=OPEN.Top(); 23 k old :=OPEN.TopKey(); 24 knew := CalculateKey(u); 25 if k old <knew 26 OPEN.Update(u,knew); 27 else if g(u)>rhs(u) 28 g(u) :=rhs(u); 29 OPEN.Delete(u); 30 for all s2Succ(u) 31 if s6=s start AND rhs(s)>g(u)+c(u;s) 32 parent(s) :=u; 33 rhs(s) :=g(u)+c(u;s); 34 UpdateState(s); 35 else 36 g(u) :=1; 37 for all s2Succ(u)[fug 38 if s6=s start AND parent(s) =u 39 rhs(s) := min s 0 2Pred(s) (g(s 0 )+c(s 0 ;s)); 40 if rhs(s) =1 41 parent(s) :=NULL; 42 else 43 parent(s) := argmin s 0 2Pred(s) (g(s 0 )+c(s 0 ;s)); 44 UpdateState(s); 45 if rhs(s goal ) =1 46 return false; 47 return true; 48 procedure BasicDeletion() 49 parent(s start ) :=NULL; 50 rhs(s oldstart ) := min s 0 2Pred(s oldstart ) (g(s 0 )+c(s 0 ;s oldstart )); 51 if rhs(s oldstart ) =1 52 parent(s oldstart ) :=NULL; 53 else 54 parent(s oldstart ) := argmin s 0 2Pred(s oldstart ) (g(s 0 )+c(s 0 ;s oldstart )); 55 UpdateState(s oldstart ); Figure 4.11: Pseudo Code of Basic Moving Target D* Lite (Part 1) 130 56 function Main() 57 Initialize(); 58 while s start 6=s goal 59 s oldstart :=s start ; 60 s oldgoal :=s goal ; 61 if ComputePath() = true 62 while target not caught AND target on path from s start to s goal AND action costs do not change 63 agent follows path from s start to s goal ; 64 if agent caught target 65 return true; 66 s start := the current state of the agent; 67 else 68 wait until some action costs decrease; 69 s goal := the current state of the target; 70 km :=km +H(s goal ;s oldgoal ); 71 if (s oldstart 6=s start ) 72 BasicDeletion(); 73 for all actions whose costs changed from c 0 (u;v) to c(u;v) 74 if c 0 (u;v)>c(u;v) 75 if v6=s start AND rhs(v)>g(u)+c(u;v) 76 parent(v) :=u; 77 rhs(v) :=g(u)+c(u;v); 78 UpdateState(v); 79 else 80 if v6=s start AND parent(v) =u 81 rhs(v) := min s 0 2Pred(v) (g(s 0 )+c(s 0 ;v)); 82 if rhs(v) =1 83 parent(v) :=NULL; 84 else 85 parent(v) := argmin s 0 2Pred(v) (g(s 0 )+c(s 0 ;v)); 86 UpdateState(v); 87 return true; Figure 4.12: Pseudo Code of Basic Moving Target D* Lite (Part 2) First, D* Lite performs backward searches. On the other hand, Basic MT-D* Lite performs forward searches by assigning the current states of the agent and target to the start and goal states of each search, respectively (Lines 09, 10, 66 and 69). This is so because, like G-FRA*, Basic MT-D* Lite requires the start state of the current search to remain in the search tree of the previous search to reuse the search tree. If Basic MT-D* Lite performed backward searches, the start state (= the state of the target) might move outside of the search tree of the previous search. 
In that case, Basic MT-D* Lite could not reuse any part of the search tree of the previous search.

Second, D* Lite requires the start state to remain unchanged between searches. On the other hand, the start state of Basic MT-D* Lite can change between searches. If the agent moved after the previous search, so that the start state of the current search is different from the start state of the previous search (Line 71), Basic MT-D* Lite executes BasicDeletion(), a new procedure that does not exist in D* Lite: Before ComputePath() is executed, the rhs-values and parent pointers of all states have to satisfy Eqs. 1'-4' and the OPEN list has to contain all locally inconsistent states. Fortunately, the rhs-values and parent pointers of all states already satisfy these invariants, with the possible exception of the start states of the previous and current searches. Basic MT-D* Lite therefore calculates the rhs-value of the start state of the previous search (according to Eq. 2', since it is no longer the start state of the current search), its parent pointer (according to Eqs. 3'-4') and its membership in the OPEN list (Lines 50 - 55). The correctness proofs of D* Lite continue to hold if c is an arbitrary finite value in Eq. 1' instead of zero (see Section 4.2.1.2). Thus, the rhs-value of the start state of the current search can be an arbitrary finite value, including its current rhs-value, since its current rhs-value is finite. Basic MT-D* Lite therefore does not change the rhs-value of the start state of the current search nor its membership in the OPEN list and only sets its parent pointer to NULL (according to Eq. 3') (Line 49).(6)

(Footnote 6) It is possible to execute BasicDeletion() after updating the action costs of the agent (by executing Lines 71 - 72 after Line 86) without affecting the correctness of Basic MT-D* Lite, since Eqs. 1'-4' still hold for all states.

[Figure 4.13: Legend of Figure 4.14 - each cell shows the g-value in its upper left corner and the rhs-value in its upper right corner.]

4.2.1.4 Example

I now use an example problem of moving target search that extends the problem in Figure 4.5 to demonstrate the behavior of Basic MT-D* Lite. The only difference is that the terrain is now a known dynamic terrain. Initially, the current states of the agent and target are B1 (marked S) and D5 (marked G), respectively. All unblocked states have their g-values in the upper left corner and their rhs-values in the upper right corner (shown in Figure 4.13). For ease of illustration, I use h-values that are zero for all states and thus do not show them in each state. The key of each state is thus the minimum of its g- and rhs-values. The outgoing arrows from states are the parent pointers of the states and point to their parents. Basic MT-D* Lite performs its first search from B1 to D5 (shown in Figure 4.14(a)), which starts from scratch. The cost-minimal path is B1, B2, B3, A3, A4, A5, B5, C5, and D5. The agent then moves along the path to B2, the target moves off the path to E5, and B4 becomes unblocked, which changes the action costs of the agent c(B3,B4), c(B4,B3), c(B4,B5), c(B5,B4), c(B4,A4) and c(A4,B4) from infinity to one (shown in Figure 4.14(b)). Since the target moved off the path and the action costs of the agent changed, Basic MT-D* Lite finds a cost-minimal path from the current state of the agent B2 to the current state of the target E5 using the following steps.
[Figure 4.14: Example Trace of Basic Moving Target D* Lite. Panels (a)-(q) show: (a) the first search; (b) after the target moved and B4 was unblocked; (c) after BasicDeletion() and the action cost updates; and (d)-(q) the states expanded by the second search, in order B1, A1, C1, D1, B1, E1, C1, A1, B4, D1, B5, E1, C5 and D5.]

Basic MT-D* Lite first sets the parent pointer of the start state of the current search B2 to NULL, updates the rhs-value and parent pointer of the start state of the previous search B1 to 2 and B2, respectively, and inserts B1 into the OPEN list (shown in Figure 4.14(c)). Basic MT-D* Lite then starts the second search, which performs 14 state expansions, namely B1, A1, C1, D1, B1, E1, C1, A1, B4, D1, B5, E1, C5 and D5 (shown in Figures 4.14(d-q)). The cost-minimal path is B2, B3, B4, B5, C5, D5 and E5. All states in the search tree of the previous search that are not in the subtree rooted in the start state of the current search, namely A1, B1, C1, D1 and E1, are expanded twice during the second search. Expanding these states twice can be computationally expensive (for example, expanding a state in D* Lite and its variants involves many operations for updating the g- and rhs-values of the state and its successor states, and updating their keys and memberships in the OPEN list), which is the issue addressed by Moving Target D* Lite (MT-D* Lite).
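To see why each of these expansions is relatively expensive, the following minimal sketch renders the overconsistent case of a state expansion (Lines 27 - 34 of the pseudo code in Figure 4.11) in executable form: besides setting the g-value of the expanded state, every successor has to be examined, and each relaxed successor may have to be inserted into, re-keyed in, or removed from the OPEN list. The priority-queue interface (delete, insert_or_update and delete_if_present operations) and the helper names are assumptions made for this illustration, not identifiers from the pseudo code.

def expand_overconsistent(u, s_start, g, rhs, parent, succ, cost, open_list, calculate_key):
    # Lines 28-29: make u locally consistent and remove it from the OPEN list.
    g[u] = rhs[u]
    open_list.delete(u)
    # Lines 30-34: relax every successor of u; each relaxation may touch the OPEN list again.
    for s in succ(u):
        if s != s_start and rhs[s] > g[u] + cost(u, s):
            parent[s] = u
            rhs[s] = g[u] + cost(u, s)
            # UpdateState(s): membership in the OPEN list follows local consistency.
            if g[s] != rhs[s]:
                open_list.insert_or_update(s, calculate_key(s))
            else:
                open_list.delete_if_present(s)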
4.2.2 Moving Target D* Lite

Moving Target D* Lite (MT-D* Lite) is a new search tree transforming incremental search algorithm for moving target search that optimizes Basic MT-D* Lite.

4.2.2.1 Applicability

Like Basic MT-D* Lite, MT-D* Lite can only perform forward (but not backward) searches by assigning the current states of the agent and target to the start and goal states of each search, respectively. MT-D* Lite applies to moving target search in all three classes of terrains (see Section 1.1.2).

4.2.2.2 Principle

MT-D* Lite shares its principle with Basic MT-D* Lite, except that Basic MT-D* Lite, which uses BasicDeletion(), lazily expands the states in the search tree of the previous search that are not in the subtree rooted in the start state of the current search to set their g-values to infinity, one at a time and only when needed. Whenever it sets the g-value of such a state to infinity, it needs to update the rhs-values, parent pointers and memberships in the OPEN list of all successor states of the state, so that all states satisfy Eqs. 1'-4'. Expanding these states can be computationally expensive. MT-D* Lite, on the other hand, uses OptimizedDeletion(), a more sophisticated procedure that combines the principles of Basic MT-D* Lite and G-FRA* to delete all states in the search tree of the previous search that are not in the subtree rooted in the start state of the current search (= deleted states) by eagerly setting their g- and rhs-values to infinity and setting their parent pointers to NULL, so that all states satisfy Eqs. 1'-4'. MT-D* Lite thus avoids expanding the deleted states twice during the current search and can be computationally less expensive than Basic MT-D* Lite.

4.2.2.3 Operations

Figures 4.15 and 4.16 give the pseudo code of MT-D* Lite, which extends the pseudo code of Basic MT-D* Lite in Figures 4.11 and 4.12. There is only one modification to the pseudo code of Basic MT-D* Lite, namely that MT-D* Lite uses OptimizedDeletion() (Lines 48 - 60) instead of BasicDeletion() to ensure that all states satisfy Eqs.
1'-4' after the start state has changed between searches: 136 01 function CalculateKey(s) 02 return [min(g(s);rhs(s))+H(s;s goal )+km;min(g(s);rhs(s))]; 03 procedure Initialize() 04 OPEN :=;; 05 km := 0; 06 for all s2S 07 rhs(s) :=g(s) :=1; 08 parent(s) :=NULL; 09 s start := the current state of the agent; 10 s goal := the current state of the target; 11 rhs(s start ) := 0; 12 OPEN.Insert(s start ,CalculateKey(s start )); 13 procedure UpdateState(u) 14 if g(u)6=rhs(u) AND u2OPEN 15 OPEN.Update(u, CalculateKey(u)); 16 else if g(u)6=rhs(u) AND u = 2OPEN 17 OPEN.Insert(u, CalculateKey(u)); 18 else if g(u) =rhs(u) AND u2OPEN 19 OPEN.Delete(u); 20 function ComputePath() 21 while OPEN.TopKey()< CalculateKey(s goal ) OR rhs(s goal )>g(s goal ) 22 u :=OPEN.Top(); 23 k old :=OPEN.TopKey(); 24 knew := CalculateKey(u); 25 if k old <knew 26 OPEN.Update(u,knew); 27 else if g(u)>rhs(u) 28 g(u) :=rhs(u); 29 OPEN.Delete(u); 30 for all s2Succ(u) 31 if s6=s start AND rhs(s)>g(u)+c(u;s) 32 parent(s) :=u; 33 rhs(s) :=g(u)+c(u;s); 34 UpdateState(s); 35 else 36 g(u) :=1; 37 for all s2Succ(u)[fug 38 if s6=s start AND parent(s) =u 39 rhs(s) := min s 0 2Pred(s) (g(s 0 )+c(s 0 ;s)); 40 if rhs(s) =1 41 parent(s) :=NULL; 42 else 43 parent(s) := argmin s 0 2Pred(s) (g(s 0 )+c(s 0 ;s)); 44 UpdateState(s); 45 if rhs(s goal ) =1 46 return false; 47 return true; 48 procedure OptimizedDeletion() 49 DELETED :=;; 50 parent(s start ) :=NULL; 51 for all s2S that belong to the search tree rooted in s oldstart but not the subtree rooted in s start 52 parent(s) :=NULL; 53 rhs(s) :=g(s) :=1; 54 insert s into DELETED; 55 for all s2DELETED 56 for all s 0 2Pred(s) 57 if rhs(s)>g(s 0 )+c(s 0 ;s) 58 rhs(s) :=g(s 0 )+c(s 0 ;s); 59 parent(s) :=s 0 ; 60 UpdateState(s); Figure 4.15: Pseudo Code of Moving Target D* Lite (Part 1) 137 61 function Main() 62 Initialize(); 63 while s start 6=s goal 64 s oldstart :=s start ; 65 s oldgoal :=s goal ; 66 if ComputePath() = true 67 while target not caught AND target on path from s start to s goal AND action costs do not change 68 agent follows path from s start to s goal ; 69 if agent caught target 70 return true; 71 s start := the current state of the agent; 72 else 73 wait until some action costs decrease; 74 s goal := the current state of the target; 75 km :=km +H(s goal ;s oldgoal ); 76 if (s oldstart 6=s start ) 77 OptimizedDeletion(); 78 for all actions whose costs changed from c 0 (u;v) to c(u;v) 79 if c 0 (u;v)>c(u;v) 80 if v6=s start AND rhs(v)>g(u)+c(u;v) 81 parent(v) :=u; 82 rhs(v) :=g(u)+c(u;v); 83 UpdateState(v); 84 else 85 if v6=s start AND parent(v) =u 86 rhs(v) := min s 0 2Pred(v) (g(s 0 )+c(s 0 ;v)); 87 if rhs(v) =1 88 parent(v) :=NULL; 89 else 90 parent(v) := argmin s 0 2Pred(v) (g(s 0 )+c(s 0 ;v)); 91 UpdateState(v); 92 return true; Figure 4.16: Pseudo Code of Moving Target D* Lite (Part 2) MT-D* Lite maintains a DELETED list in OptimizedDeletion(). The DELETED list contains all states in the search tree of the previous search that are not in the subtree rooted in the start state of the current search, which is exactly the denition that G- FRA* uses. First, MT-D* Lite sets the parent pointer of the start state of the current search to NULL (Line 50). Then, MT-D* Lite executes two phases to ensure that all states satisfy Eqs. 
1'-4':

Phase 1 (Lines 51 - 54): MT-D* Lite eagerly sets the parent pointers of all states in the search tree of the previous search that are not in the subtree rooted in the start state of the current search to NULL, sets their g- and rhs-values to infinity and inserts them into the DELETED list, in one pass.

Phase 2 (Lines 55 - 60): MT-D* Lite then updates the rhs-values according to Eqs. 1'-2', the parent pointers according to Eqs. 3'-4', and the memberships in the OPEN list according to the definition of local consistency for all states in the DELETED list, in one pass. Their rhs-values, parent pointers and memberships in the OPEN list can be updated in one pass since they depend only on the g-values of their predecessor states, which do not change in Phase 2.

Overall, Phase 1 is similar to Step 2 (Deleting States) of G-FRA*, and Phase 2 is similar to Step 4 (Inserting States) of G-FRA*. Phases 1 and 2 iterate over all states in the DELETED list. Phase 2 also iterates over all predecessor states of the states in the DELETED list. Thus, each state in the DELETED list can require n operations on n-neighbor gridworlds. Both phases manipulate the OPEN list, which, if it is implemented as a binary heap, requires heap operations to keep it sorted only once, namely at the end of Phase 2, since the OPEN list can remain unsorted until then.
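The two phases can be rendered as the following minimal sketch of OptimizedDeletion() (Lines 48 - 60 of Figure 4.15). It is only an illustration: the pseudo code leaves the enumeration of the old search tree abstract, so the traversal via children(s) (the states whose parent pointer is s), as well as the priority-queue interface and the other helper names, are assumptions made for this example.

INFINITY = float('inf')

def optimized_deletion(s_start, s_oldstart, g, rhs, parent, pred, cost, children, open_list, calculate_key):
    deleted = []
    parent[s_start] = None  # Line 50
    # Phase 1 (Lines 51-54): walk the previous search tree from its old root, skip the
    # subtree rooted in the new start state, and eagerly delete every other state.
    stack = [s_oldstart]
    while stack:
        s = stack.pop()
        if s == s_start:
            continue  # keep the reusable subtree intact
        stack.extend(children(s))
        parent[s] = None
        g[s] = rhs[s] = INFINITY
        deleted.append(s)
    # Phase 2 (Lines 55-60): recompute the rhs-values and parent pointers of the deleted
    # states from the g-values of their predecessors and fix their OPEN-list membership.
    for s in deleted:
        for s2 in pred(s):
            if rhs[s] > g[s2] + cost(s2, s):
                rhs[s] = g[s2] + cost(s2, s)
                parent[s] = s2
        if g[s] != rhs[s]:
            open_list.insert_or_update(s, calculate_key(s))  # now locally inconsistent
        else:
            open_list.delete_if_present(s)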
4.2.2.4 Example

[Figure 4.17: Legend of Figure 4.18 - each cell shows the g-value in its upper left corner and the rhs-value in its upper right corner.]

I now use the same example problem of moving target search as used for Basic MT-D* Lite (shown in Figure 4.14) to demonstrate the behavior of MT-D* Lite. Initially, the current states of the agent and target are B1 (marked S) and D5 (marked G), respectively. All unblocked states have their g-values in the upper left corner and their rhs-values in the upper right corner (shown in Figure 4.17). For ease of illustration, I use h-values that are zero for all states and thus do not show them in each state. The key of each state is thus the minimum of the g- and rhs-values of the state. The outgoing arrows from states are the parent pointers of the states and point to their parents.

[Figure 4.18: Example Trace of Moving Target D* Lite. Panels (a)-(l) show: (a) the first search; (b) after the target moved and B4 was unblocked; (c) after OptimizedDeletion() and the action cost updates; and (d)-(l) the states expanded by the second search, in order B1, A1, C1, B4, D1, B5, E1, C5 and D5.]

MT-D* Lite runs its first search from B1 to D5 (shown in Figure 4.18(a)), which is exactly the same as for Basic MT-D* Lite (shown in Figure 4.14(a)). The cost-minimal path is B1, B2, B3, A3, A4, A5, B5, C5, and D5. The agent then moves along the path to B2, the target moves off the path to E5, and B4 becomes unblocked, which changes the action costs c(B3,B4), c(B4,B3), c(B4,B5), c(B5,B4), c(B4,A4) and c(A4,B4) from infinity to one (shown in Figure 4.18(b)). Since the target moved off the path and the action costs of the agent changed, MT-D* Lite finds a cost-minimal path from the current state of the agent B2 to the current state of the target E5 using the following steps. Different from Basic MT-D* Lite, which lazily expands the states in the search tree of the previous search that are not in the subtree rooted in the start state of the current search to set their g-values, rhs-values and parent pointers and updates their memberships in the OPEN list one at a time, so that all states satisfy Eqs. 1'-4' (see Section 2.4.2.2), MT-D* Lite uses OptimizedDeletion() to eagerly set the g-values, rhs-values and parent pointers of these states and update their memberships in the OPEN list, so that they satisfy Eqs. 1'-4', which can be computationally less expensive. (For example, in order to expand a state, Basic MT-D* Lite needs to not only set the g-value of the state itself, but also update the rhs-values of all successor states of that state.) First, MT-D* Lite sets the parent pointer of the start state of the current search B2 to NULL. Second, in Phase 1, MT-D* Lite deletes all states in the search tree of the previous search that are not in the subtree rooted in the start state of the current search, namely A1, B1, C1, D1, and E1, by setting their g- and rhs-values to infinity, setting their parent pointers to NULL and inserting them into the DELETED list. Third, in Phase 2, MT-D* Lite updates the rhs-values according to Eqs. 1'-2' and the parent pointers according to Eqs. 3'-4' for all states in the DELETED list. For example, rhs(B1) = 2 and parent(B1) = B2. Then, it inserts A1, B1, C1, D1, and E1 into the OPEN list, since they have become locally inconsistent. MT-D* Lite then starts the second search, which performs 9 state expansions (compared to 14 state expansions performed by Basic MT-D* Lite in Figures 4.14(d-q)), namely B1, A1, C1, B4, D1, B5, E1, C5 and D5 (shown in Figures 4.18(d-l)). The cost-minimal path is B2, B3, B4, B5, C5, D5 and E5.

4.3 Summary

In this chapter, I introduced new search tree transforming incremental search algorithms for moving target search. In Section 4.1.1, I introduced G-FRA*, a new search tree transforming incremental search algorithm for moving target search in known static terrains. In Section 4.1.2, I introduced FRA*, which uses geometric properties specific to grids to optimize G-FRA* for moving target search in known static grids. Both G-FRA* and FRA* transform the search tree of the previous search into the search tree of the current search by reusing the subtree rooted in the start state of the current search. Thus, they can be computationally more efficient than Repeated A* if a large number of states belong to the subtree rooted in the start state of the current search.
In Section 4.2.1, I introduced Basic MT-D* Lite, a new search tree transforming incremental search algorithm that generalizes D* Lite to moving target search. In Section 4.2.2, I introduced MT-D* Lite, which optimizes Basic MT-D* Lite by combining the principles of G-FRA* and Basic MT-D* Lite. Similar to G-FRA* and FRA*, both Basic MT-D* Lite and MT-D* Lite transform the search tree of the previous search into the search tree of the current search by reusing the subtree rooted in the start state of the current search. Thus, Basic MT-D* Lite and MT-D* Lite can be computationally more efficient than Repeated A* if a large number of states are in the subtree rooted in the start state of the current search, since they can then reuse a large number of states from the previous search.

Chapter 5: Experimental Evaluation

In this chapter, I measure the runtimes of the developed incremental search algorithms for moving target search. The structure of this chapter is as follows: In Section 5.1, I introduce the objective of the experimental evaluation. In Section 5.2, I introduce the experimental settings. In Section 5.3, I describe the experimental results and analyze them. In Section 5.4, I discuss the strengths and weaknesses of the developed incremental search algorithms and provide guidelines for potential users on when to choose one algorithm over another.

5.1 Objective

The objective of the comparison of the developed incremental search algorithms is to find out their strengths and weaknesses in terms of their runtime per search in different scenarios of moving target search and to demonstrate that the developed incremental search algorithms are able to satisfy the runtime requirements imposed by the computer game company Bioware (= 1-3 ms per search). I do not measure their memory requirements since they are linear in the number of states in the state space for all of them, as for A*, which can often be satisfied in different applications, such as computer games (Rabin, 2002). Specifically, I compare their runtime per search with respect to: different search directions of the incremental search algorithms, namely forward and backward searches (if applicable); different state spaces, namely randomly generated grids, randomly generated mazes and game maps; different classes of terrains, namely the ones discussed in Section 1.1; different numbers of actions whose costs change between searches in known dynamic terrains; and different movement strategies of the target, namely moving to randomly selected states and using TrailMax, a state-of-the-art evasion algorithm for known static terrains that generates movement strategies for the target that result in long trajectories of the agent necessary to catch the target (Moldenhauer & Sturtevant, 2009a), as discussed in Section 2.5. Then, I discuss their strengths and weaknesses and provide guidelines for when to choose a particular algorithm over another. In order to ensure that the style of the pseudo code of all compared search algorithms is consistent throughout the dissertation, I use the style of the pseudo code of D* Lite (Koenig & Likhachev, 2002) for all compared search algorithms. The implementation of a search algorithm can differ slightly from the pseudo code of the algorithm. For example, rather than using UpdateState(s), I have implemented Forward and Backward Repeated A*, GAA*, G-FRA* and FRA* with inline code to update the membership of a state in the OPEN list. The source code of all compared search algorithms is available at (Sun, 2012).
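As a small illustration of this difference, the following sketch shows an A*-style successor relaxation with the OPEN-list update written inline instead of being delegated to an UpdateState(s) helper. It is not taken from the released source code; the priority-queue interface (a membership test plus insert and update operations), the dictionaries and the function names are assumptions made for this example.

def relax_successor_inlined(u, s, g, parent, cost, h, open_list):
    # Relax the edge (u, s); on improvement, update the OPEN list in place
    # rather than calling a separate UpdateState(s) procedure.
    new_g = g[u] + cost(u, s)
    if new_g < g.get(s, float('inf')):
        g[s] = new_g
        parent[s] = u
        f_value = new_g + h(s)
        if s in open_list:
            open_list.update(s, f_value)   # decrease-key
        else:
            open_list.insert(s, f_value)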
The experimental results showed that the runtimes per search of these algorithms with slightly dierent implementation are similar. Some developed incremental search algorithms, such as GAA* and MT-D* Lite, have also been implemented by other researchers, who have gained similar experimental results to the ones demonstrated in this chapter (Belghith et al., 2010; Anand, 2011). I will provide more details about their works in Chapter 6. 5.2 Experimental Setups In this section, I introduce my experimental setups for evaluating the runtime per search of the developed incremental search algorithms. 5.2.1 Algorithms The search algorithms that I compare in this chapter for moving target search are: Forward Repeated A* and Backward Repeated A*; Forward Dierential A* and Backward Dierential A*; Forward Generalized Adaptive A* (Forward GAA*) and Backward Generalized Adaptive A* (Backward GAA*); Generalized Fringe-Retrieving A* (G-FRA*) and Fringe-Retrieving A* (FRA*); and 146 Figure 5.1: Grids Used in the Experimental Evaluation Basic Moving Target D* Lite (Basic MT-D* Lite) and Moving Target D* Lite (MT-D* Lite). 5.2.2 State Spaces I primarily use four-neighbor grids to evaluate the runtime of the developed search algo- rithms, since they are standard test beds for evaluating the runtime of search algorithms and have been widely used to represent state spaces of search-based path planning prob- lems (Korf, 1990; Ishida & Korf, 1991; Koenig & Likhachev, 2002; Thrun & Buecken, 1996; Likhachev, 2005; Ferguson, 2006; Rabin, 2002; Sun & Koenig, 2007; Moldenhauer & Sturtevant, 2009b). I use the Manhattan distances from each state to the goal state as the user-provided h-values in four-neighbor grids. I perform experiments to evaluate the runtime of the developed incremental search algorithms in three dierent kinds of grids that have been used to evaluate the runtime of search-based path planning algorithms (Koenig et al., 2004a, 2007; Hern andez et al., 2009; Moldenhauer & Sturtevant, 2009b; Sun & Koenig, 2007): 147 Random grids of size 500 500 in which 25% of randomly chosen states were blocked (shown in Figure 5.1 left). Random mazes of size 500 500 whose corridors are ten states wide (shown in Figure 5.1 middle). The corridors are generated with depth-rst search. Two game maps adapted from World of Warcraft of size 512 512 and 676 676, respectively (shown in Figure 5.1 right). 5.2.3 Terrains I systematically evaluate the runtime of the search algorithms described in Section 5.2.1 in all three classes of terrains discussed in Section 1.1.2. Known Static Terrains: In Section 5.3.1, I compare the developed search al- gorithms described in Section 5.2.1 that apply to known static terrains in known static grids. Known Dynamic Terrains: In Section 5.3.2, I compare the developed search algorithms described in Section 5.2.1 that apply to known dynamic terrains in known dynamic grids. I unblock k blocked states and block k unblocked states every tenth move of the agent in a way so that there always remains a path from the current state of the agent to the current state of the target, where k is a parameter whose value I vary from ten to ve thousand in order to get a comprehensive comparison of the search algorithms. For each test case in the experiments, if the rst search of the test case does not return a path from the current state of the agent to the current state of the target, then I terminate the test case. 
If the rst 148 search returns a path, then I ensure that a path from the current state of the agent to the current state of the target exists by keeping all states on the rst path and the trajectory of the target unblocked. The reason why I terminate a test case if there does not exist a path from the current state of the agent to the current state of the target is that it can take a long time before the state space changes in a way such that a new path exists from the current state of the agent to the current state of the target, which makes it unacceptably time-consuming to perform the experiments. Unknown Static Terrains: In Section 5.3.3, I compare the developed search algo- rithms described in Section 5.2.1 that apply to unknown static terrains in unknown static grids. I set the sensor range of the agent to one state, that is, the agent can always observe the blocked state of the four neighboring states of its current state. 5.2.4 Target Movement Strategies I evaluate the runtime of all search algorithms described in Section 5.2.1 with respect to two dierent movement strategies of the target: Random Waypoint: The target always follows a cost-minimal path from its cur- rent state to a randomly selected unblocked state. I ensure that there always exists a path from the current state of the target to the unblocked state when selected, and this path is kept unblocked during the test case. The target repeats the pro- cess once it reaches that unblocked state in all three kind of terrains discussed in Section 1.1.2. This strategy can be seen in computer games. For example, a game 149 character can repeatedly move to random states selected by the player via mouse clicks. TrailMax: TrailMax moves the target to a state that is as far away from the current state of the agent as possible in known static terrains (see Section 2.5). Following the previous literature on moving target search (Ishida & Korf, 1991; Ishida, 1992; Chimura & Tokoro, 1994; Moldenhauer & Sturtevant, 2009a), I choose the initial states of the agent and target randomly for each test case. The target skips every tenth move to enable the agent to catch it. 5.2.5 Measures I run all experiments on an Intel Core 2 Duo 2.53 Ghz Linux PC with 2 GBytes of RAM. I report two measures for the diculty of the search problems, namely (a) the number of searches per test case and (b) the number of moves of the agent per test case until it catches the target. All search algorithms determine paths of the same cost for each search if the start and goal states of that search are the same, and their numbers of moves and searches per test case are thus approximately the same. They dier slightly since the agent can follow dierent trajectories due to tie breaking. I report two measures for the eciency of the search algorithms, namely (c) the number of expanded states (= state expansions) per search and (f) the runtime per search. I calculate the runtime per search by dividing the total runtime until the target is caught by the number of searches. 1 Unfortunately, the runtime per search depends on low-level machine and implementation details, such as the instruction set of the processor, the optimizations performed by the 1 The time for calculating the movement strategy of the target is not included in the total runtime. 150 compiler and coding decisions. This point is especially important since the grids t into memory and the resulting state spaces are thus small. 
I do not know of any better method for evaluating search algorithms than to implement them as well as possible, publish their runtimes, and let other researchers validate them with their own and thus potentially slightly dierent implementations. I also report the standard deviation of the mean for the number of state expansions per search (in parentheses) for all search algorithms described in Section 5.2.1 to demonstrate the statistical signicance of my results. I do not report the standard deviation of the mean for the runtime per search because the smallest clock granularity available in the Linux system is one microsecond, which is larger than some runtimes per search when the agent is near the target. In this situation, the runtimes per search measured are all zeros, resulting in an inaccurate measurement of the standard deviation of the mean for the runtime per search. For GAA*, I report (d) the number of h-value updates (= state propagations) per search by the consistency procedure as third measure since the runtime per search depends on both the number of state expansions and propagations per search. I calculate the number of state propagations per search by dividing the total number of h-value updates until the target is caught by the number of searches. For FRA*, G-FRA* and MT-D* Lite, I report (e) the number of deleted states (= state deletions) per search as third measure since the runtime per search depends on both the number of state expansions and deletions per search. I calculate the number of state deletions per search by dividing the total number of deleted states by the number of searches. 151 Random Grid Random Maze (a) (b) (c) (d) (e) (f) (a) (b) (c) (d) (e) (f) Forward Repeated A* 252 401 2,417 (26.4) 371 345 590 10,500 (84.9) 1,659 Forward GAA* 255 405 2,102 (24.1) 0 363 348 589 6,710 (59.4) 0 1,140 Forward Dierential A* 252 401 2,417 (26.4) 478 345 590 10,500 (84.9) 2,290 Backward Repeated A* 245 402 2,032 (21.9) 312 329 590 11,109 (98.7) 1,716 Backward GAA* 236 403 1,034 (14.1) 0 178 303 584 4,810 (54.1) 0 788 Backward Dierential A* 245 402 2,032 (21.9) 394 329 590 11,109 (98.7) 2,348 FRA* 278 412 206 (14.5) 187 65 356 592 1,937 (24.1) 1,529 328 G-FRA* 283 417 223 (11.1) 220 76 374 604 1,895 (23.5) 1,545 344 Basic MT-D* Lite 277 415 310 (11.3) 154 366 600 5,392 (53.2) 1,542 MT-D* Lite 282 419 184 (18.8) 185 94 368 602 2,725 (27.8) 2,754 801 Game Map 512 512 Game Map 676 676 Forward Repeated A* 165 296 1,726 (31.6) 238 345 536 6,387 (55.9) 1,097 Forward GAA* 165 296 1,321 (25.7) 0 208 347 537 4,906 (46.7) 0 859 Forward Dierential A* 165 296 1,726 (31.6) 285 345 536 6,387 (55.9) 1,437 Backward Repeated A* 169 301 2,310 (45.9) 331 329 545 4,869 (51.5) 829 Backward GAA* 161 301 802 (17.8) 0 135 311 539 2,150 (36.0) 0 375 Backward Dierential A* 169 301 2,310 (45.9) 407 329 545 4,869 (51.5) 1,084 FRA* 179 290 199 (8.7) 156 48 361 539 464 (26.8) 338 124 G-FRA* 185 288 211 (7.3) 216 50 373 547 521 (21.1) 402 129 Basic MT-D* Lite 198 316 431 (4.3) 148 369 545 1,011 (23.8) 374 MT-D* Lite 203 320 220 (7.6) 36 83 371 547 549 (20.5) 537 221 (a) = searches until the target is caught; (b) = moves until the target is caught; (c) = state expansions per search (in parenthesis: standard deviation of the mean); (d) = state propagations per search; (e) = state deletions per search; and (f) = runtime per search (in microseconds) Table 5.1: Experimental Results in Known Static Grids (Target Movement Strategy: Random Waypoint) 5.3 Experimental Results In this section, I discuss the 
experimental results for all search algorithms described in Section 5.2.1. 5.3.1 Experiments in Known Static Terrains In this section, I compare all search algorithms described in Section 5.2.1 for moving target search in known static terrains. 5.3.1.1 Target Movement Strategy: Random Waypoint In this section, I compare all search algorithms described in Section 5.2.1 for moving target search when the target uses the Random Waypoint movement strategy. First, I discuss the experimental results for Forward and Backward Repeated A* and the heuristic learning incremental search algorithms, namely, Forward and Backward 152 GAA*, which are identical to Forward and Backward MT-Adaptive A*, respectively, in known static grids. Table 5.1 shows the following relationships: For all three kinds of grids (see Section 5.2.2), GAA* has a smaller runtime per search than Repeated A* with the same search direction because GAA* updates theh-values to make them more informed over time and hence expands fewer states per search. The smallest runtime per search in each kind of grid is shown in bold. For example, Backward GAA* runs faster than Backward Repeated A* by a factor of 1.75 in random grids, 2.18 in random mazes, 2.45 in the game map of size 512 512 and 2.21 in the game map of size 676 676. For all three kinds of grids, Backward GAA* has a smaller runtime per search than Forward GAA*. The reason is as follows: Assume that after the previous search the current state of the target changed fromt tot 0 . Letd be equal to the minimum cost of moving from the state of the agent in the previous search to t, and d 0 be equal to the minimum cost of moving from the state of the agent in the previous search to t 0 . For both Forward and Backward GAA*, there are two possible cases: { Case 1: dd 0 . For example, the target moves away from the agent after the previous search. { Case 2: d > d 0 . For example, the target moves towards the agent after the previous search. For Forward GAA*, the minimum costs of moving from most of the expanded states in the previous search to the current state of the targett 0 (= goal state of the current search) do not decrease in Case 1 after the target moved. However, Forward GAA* 153 corrects the h-value of the state s expanded in the previous searches by assigning h(s) := max(H(s;t 0 );h(s)h(t 0 )) (see Assignment (3.1) on page 68), which likely decreases theh-value and thus make it less informed although the minimum cost of moving from s to the goal state likely does not decrease. For Backward GAA* and for both Cases 1 and 2, the agent moves froms goal tos 0 goal towards the target after the previous search. Thus, the minimum costs of moving from most of the expanded states in the previous search to the current state of the agent (= goal state of the current search) decrease after the agent moved. Backward GAA* corrects the h-value of each state s expanded in the previous search by assigning h(s) := max(H(s;s 0 goal );h(s)h(s 0 goal )), which likely re ects the change of the h-value of s accurately. I also perform additional experiments with the game map of size 676 676 to measure the ratios between the updatedh-values and the user-providedh-values of Forward GAA* and Backward GAA*. The results show that the ratios of Forward GAA* and Backward GAA* are 2.13 and 2.39, respectively. Therefore, Backward GAA* has more informed h-values and expands fewer states per search than For- ward GAA*, which results in a smaller runtime per search than Forward GAA*. 
For example, Backward GAA* runs faster than Forward GAA* by a factor of 2.04 in random grids, 1.45 in random mazes, 1.54 in the game map of size 512 512 and 2.29 in the game map of size 676 676. Second, I discuss the experimental results for Forward and Backward Repeated A* and the search tree transforming incremental search algorithms, that is, Forward and 154 Backward Dierential A*, FRA*, G-FRA*, Basic MT-D* Lite and MT-D* Lite. Table 5.1 shows the following relationships: For all three kinds of grids, Repeated A* has a smaller runtime per search than Dierential A* with the same search direction. This is because the start state changes between searches for moving target search. After the start state changes, Dierential A* deletes the start state of the previous search and creates the start state of the current search. Deleting the start state of the previous search results in the deletion of all states in the search tree of the previous search. Thus, Dierential A* cannot reuse any part of the search tree from the previous search but spends eort on deleting the search tree of the previous search and hence can be even slower than Repeated A* with the same search direction (see Section 2.4.2.3). For example, Forward Repeated A* runs faster than Forward Dierential A* by a factor of 1.29 in random grids, 1.38 in random mazes, 1.20 in the game map of size 512 512 and 1.31 in the game map of size 676 676. For all three kinds of grids, both FRA* and G-FRA* have a smaller runtime per search than Forward and Backward Repeated A* because FRA* and G-FRA* do not expand the states in the subtree of the previous search rooted in the start state of the current search. Forward and Backward Repeated A*, on the other hand, expand some of these states. For example, in random grids, FRA* expands only about 8:52% of the states that Forward Repeated A* expands. It thus runs by a factor of 5.71 faster per search than Forward Repeated A*. Similarly, FRA* runs 155 faster than Forward Repeated A* by a factor of 5.06 in random mazes, 4.96 in the game map of size 512 512 and 8.85 in the game map of size 676 676. For all three kinds of grids, MT-D* Lite has a smaller runtime per search than Basic MT-D* Lite because Basic MT-D* Lite expands the states in the search tree of the previous search that are not in the subtree rooted in the start state of the current search to set their g-values to innity. MT-D* Lite, on the other hand, uses OptimizedDeletion() instead, which is demonstrated to be faster with the experimental results. For example, MT-D* Lite runs faster than Basic MT-D* Lite by a factor of 1.64 in random grids, 1.93 in random mazes, 1.78 in the game map of size 512 512 and 1.69 in the game map of size 676 676. For all three kinds of grids, FRA* and G-FRA* have smaller runtimes per search than Basic MT-D* Lite and MT-D* Lite because of two reasons: (1) FRA* and G-FRA* have a smaller runtime per state expansion than Basic MT-D* Lite and MT-D* Lite. For example, in random grids, the approximate runtime per state expansion (calculated by dividing the runtime per search by the number of expanded states per search) is 0.32 and 0.34 microseconds for FRA* and G-FRA*, respectively, while the approximate runtime per state expansion is 0.50 and 0.51 microseconds for Basic MT-D* Lite and MT-D* Lite, respectively. 
(2) FRA* and G-FRA* expand fewer states than Basic MT-D* Lite and MT-D* Lite in the following case: If the target moves to a state in the subtree of the previous search that is rooted in the start state of the current search, FRA* and G-FRA* terminate without expanding states due to Step 3 (Terminating Early). I also perform additional experiments for 156 FRA* with the game map of size 676 676 to measure the ratios of the number of early terminations and the total number of searches per test case. The results show that 64 out of 351 searches of FRA* (= 18.23%) terminate early. Basic MT-D* Lite and MT-D* Lite, on the other hand, do not terminate early but have to expand all locally inconsistent states whose keys are smaller than the key of the goal state. For example, FRA* runs faster than MT-D* Lite by a factor of 1.45 in random grids, 2.44 in random mazes, 1.73 in the game map of size 512 512 and 1.78 in the game map of size 676 676. For all three kinds of grids, FRA* has a smaller runtime per search than G-FRA* because G-FRA* reuses only the subtree of the previous search that is rooted in the start state of the current search. FRA*, on the other hand, uses the Additional Step for grids (see Section 4.1.2.3), that allows it to reuse more of the search tree of the previous search. Thus, FRA* deletes and expands fewer states per search than G-FRA*. For example, in random grids, FRA* expands only about 92:38% of the states per search that G-FRA* expands. It thus runs by a factor of 1.17 faster per search than G-FRA*. Similarly, FRA* runs faster than G-FRA* by a factor of 1.05 in random mazes, 1.04 in the game map of size 512 512 and 1.04 in the game map of size 676 676. For all three kinds of grids, FRA* has the smallest runtime per search among Forward and Backward Repeated A*, FRA*, G-FRA*, Basic MT-D* Lite and MT- D* Lite. 157 Third, I compare the search tree transforming incremental search algorithms with the heuristic learning incremental search algorithms. Table 5.1 shows the following relation- ships: For all three kinds of grids, G-FRA* and FRA* have smaller runtimes per search than Forward and Backward GAA* because G-FRA* and FRA* expand fewer states per search than Forward and Backward GAA*. I focus only on the comparison of FRA* against Backward GAA* since FRA* runs faster than G-FRA* and Backward GAA* runs faster than Forward GAA*. In random grids, FRA* expands about 19:92% of the states per search that Backward GAA* expands. It thus runs by a factor of 2.74 faster per search than Backward GAA*. Similarly, FRA* runs faster than Backward GAA* by a factor of 2.40 in random mazes, 2.81 in the game map of size 512 512 and 3.02 in the game map of size 676 676. For all three kinds of grids except the random mazes, MT-D* Lite has a smaller runtime per search than Forward and Backward GAA*. I focus only on the com- parison of MT-D* Lite against Backward GAA* since MT-D* Lite runs faster than Basic MT-D* Lite and Backward GAA* runs faster than Forward GAA*. MT-D* Lite runs faster than Backward GAA* by a factor of 1.89 in random grids, 1.63 in the game map of size 512 512 and 1.70 in the game map of size 676 676. How- ever, Backward GAA* runs faster than MT-D* Lite by a factor of 1.02 in random mazes. I perform additional experiments with MT-D* Lite and Backward GAA* to gain more insight into the operations of both search algorithms. 
I rst measure the ratio 158 r of the number of states deleted from the search tree and the total number of states in the search tree after each search of MT-D* Lite. I calculater by dividing the total number of states deleted from the search tree by the total number of states in the search tree before the deletion is performed. r can signicantly aect the runtime of MT-D* Lite because, the smaller r is, the smaller a portion of the search tree of the previous search is deleted and hence the larger a portion of the search tree of the previous search can be reused for the current search. Thus, a smallerr tends to result in a smaller runtime per search for MT-D* Lite. The results show that r is equal to 4.63% and 21.12% in random grids and random mazes, respectively, which explains why MT-D* Lite runs faster than Backward GAA* by a factor of 1:89 in random grids, but runs more slowly than Backward GAA* by a factor of 1:02 in random mazes. To explain why r is much larger in random mazes than random grids, I measure the informedness of the user-provided h-values in each kind of grid as follows: I randomly select 100 pairs of start state s start and goal state s goal in each kind of grid. Then, I calculate the informedness of the user-provided h-values in each kind of grid by using the ratior in between the average ratio of the user-providedh-values H(s start ;s goal ) and dist (s start ;s goal ) between each pair of s start and s goal . Since the user-provided h-values (= Manhattan Distances) are admissible, 0 r in 1 and, the larger r in is, the more informed the h-values are. Experimental results show that r in is equal to 0.61 in random mazes, 0.95 in random grids, 0.85 in the game map of size 512 512 and 0.90 in the game map of size 676 676. Thus, the order of informedness of the user-provided h-values in the dierent kinds of grids, from 159 S’ G S Reusable States G S’ Deleted States G’ (a) Search with Well-informed h-values S’ G S Reusable States G S’ G’ Deleted States (b) Search with Ill-informed h-values Figure 5.2: Reusable and Deleted States of Moving Target D* Lite the most to the least well-informed, is in random grids, the game map of size 676 676, the game map of size 512 512, and random mazes. In this dissertation, the user-provided h-values in random mazes are regarded as \ill-informed", since they are much smaller than the ones in random grids, and the two game maps, which are regarded as \well-informed". The informedness of the user-provided h- values explains why r is much larger in random mazes than random grids: Assume that MT-D* Lite performs a search from the current state of the agent S to the current state of the target G and, after the search, the current states of the agent and target change toS 0 andG 0 , respectively. Figure 5.2(a) visualizes the expanded states of a search with well-informed h-values. In this case, the search is more focused on the area that contains the cost-minimal path from S to G. When the agent moves fromS toS 0 , a large portion of expanded states belongs to the subtree rooted in S 0 (= reusable states) since the search was more focused. MT-D* Lite only needs to delete a small portion of expanded states from the search tree of the previous search, resulting in a smaller runtime per search than Backward GAA*. Figure 5.2(b) visualizes the expanded states of a search with ill-informed h-values. 160 In this case, the search is less focused on the area that contains the cost-minimal path from S to G. 
When the agent moves from S to S 0 , only a relatively small portion of expanded states belongs to the subtree rooted in S 0 (= reusable states) since the search was less focused. MT-D* Lite needs to delete a large portion of expanded states from the search tree of the previous search, resulting in a larger r and hence a larger runtime per search than Backward GAA*. 5.3.1.2 Target Movement Strategy: TrailMax I now evaluate the runtime of all search algorithms described in Section 5.2.1 when the target uses TrailMax as its movement strategy to nd out how dierent movement strategies of the target aect the runtimes of the compared search algorithms. I do not report the results of Dierential A*, since its runtime per search is larger than the one of Repeated A* (with the same search direction) and hence it is not competitive. Random Grid Random Maze (a) (b) (c) (d) (e) (f) (a) (b) (c) (d) (e) (f) Forward Repeated A* 806 1,001 1,828 (12.1) 315 1,212 1,591 9,090 (47.1) 1,499 Forward GAA* 813 1,008 1,721 (11.5) 0 284 1,216 1,594 6,275 (33.3) 0 1,064 Backward Repeated A* 820 1,015 1,209 (8.9) 193 1,164 1,543 7,538 (41.1) 1,106 Backward GAA* 812 1,017 690 (5.3) 0 117 1,123 1,505 3,947 (23.3) 0 643 FRA* 810 1,005 116 (5.1) 105 49 1,161 1,528 2,471 (16.9) 2,627 482 G-FRA* 785 9,85 144 (4.2) 151 61 1,136 1,501 2,696 (17.5) 3,086 521 Basic MT-D* Lite 841 1,030 149 (4.0) 119 1,154 1,517 5,474 (37.9) 1,592 MT-D* Lite 844 1,032 85 (3.6) 90 64 1,155 1,517 2,766 (19.3) 364 899 Game Map 512 512 Game Map 676 676 Forward Repeated A* 386 554 2,537 (26.9) 358 797 1,097 6,452 (37.4) 1,395 Forward GAA* 385 553 1,753 (20.4) 0 301 773 1,076 5,647 (35.4) 0 1,291 Backward Repeated A* 451 624 1,719 (19.8) 231 668 996 5,950 (39.6) 1,323 Backward GAA* 434 612 712 (8.3) 0 108 677 1,005 2,644 (22.5) 0 583 FRA* 364 521 370 (6.2) 438 71 768 1,077 377 (12.9) 329 127 G-FRA* 366 523 527 (7.8) 576 102 776 1,086 382 (10.3) 354 131 Basic MT-D* Lite 365 522 1,026 (14.8) 329 773 1,082 746 (13.2) 336 MT-D* Lite 367 524 524 (8.0) 548 176 773 1,084 396 (10.4) 398 193 (a) = searches until the target is caught; (b) = moves until the target is caught; (c) = state expansions per search (in parenthesis: standard deviation of the mean); (d) = state propagations per search; (e) = state deletions per search; and (f) = runtime per search (in microseconds) Table 5.2: Experimental Results in Known Static Grids (Target Movement Strategy: TrailMax) 161 I rst describe the similarity between the results in Table 5.1, where the target uses the Random Waypoint movement strategy, and the results in Table 5.2, where the target uses the TrailMax movement strategy. The explanations given earlier in Section 5.3.1.1 still apply to the following relationships: First, for heuristic learning incremental search algorithms, both tables show the following similar relationships: { For all three kinds of grids, GAA* has a smaller runtime per search than Repeated A* with the same search direction. For example, Backward GAA* runs faster than Backward Repeated A* by a factor of 1.65 in random grids, 1.72 in random mazes, 2.14 in the game map of size 512 512 and 2.27 in the game map of size 676 676. { For all three kinds of grids, Backward GAA* has a smaller runtime per search than Forward GAA*. For example, Backward GAA* runs faster than Forward GAA* by a factor of 2.43 in random grids, 1.65 in random mazes, 2.79 in the game map of size 512 512 and 2.21 in the game map of size 676 676. 
Second, for search tree transforming incremental search algorithms, both tables show the following similar relationships: { For all three kinds of grids, both FRA* and G-FRA* have a smaller runtime per search than Forward and Backward Repeated A*. For example, FRA* runs faster than Forward Repeated A* by a factor of 6.43 in random grids, 162 3.11 in random mazes, 5.04 in the game map of size 512 512 and 10.98 in the game map of size 676 676. { For all three kinds of grids, MT-D* Lite has a smaller runtime per search than Basic MT-D* Lite. For example, MT-D* Lite runs faster than Basic MT-D* Lite by a factor of 1.86 in random grids, 1.77 in random mazes, 1.87 in the game map of size 512 512 and 1.74 in the game map of size 676 676. { For all three kinds of grids, FRA* and G-FRA* have smaller runtimes per search than Basic MT-D* Lite and MT-D* Lite. For example, FRA* runs faster than MT-D* Lite by a factor of 1.31 in random grids, 1.87 in random mazes, 2.48 in the game map of size 512 512 and 1.52 in the game map of size 676 676. { For all three kinds of grids, FRA* has a smaller runtime per search than G- FRA*. For example, FRA* runs faster than G-FRA* by a factor of 1.24 in random grids, 1.08 in random mazes, 1.44 in the game map of size 512 512 and 1.03 in the game map of size 676 676. { For all three kinds of grids, FRA* has the smallest runtime per search among Forward and Backward Repeated A*, FRA*, G-FRA*, Basic MT-D* Lite and MT-D* Lite. Third, when comparing the heuristic learning incremental search algorithms to the search tree transforming incremental search algorithms, both tables show the fol- lowing similar relationships: 163 { For all three kinds of grids, FRA* and G-FRA* have smaller runtimes per search than both Forward and Backward GAA*. I focus only on the compar- ison of FRA* against Backward GAA* since FRA* runs faster than G-FRA* and Backward GAA* runs faster than Forward GAA*. FRA* runs faster than Backward GAA* by a factor of 2.39 in random grids, 1.33 in random mazes, 1.52 in the game map of size 512 512 and 4.59 in the game map of size 676 676. { For random grids and the game map of size 676 676 (where the user-provided h-values are well-informed, see Section 5.3.1.1), MT-D* Lite has a smaller runtime per search than Backward GAA*. For example, MT-D* Lite runs faster than Backward GAA* by a factor of 1.83 in random grids and 3.02 in the game map of size 676 676. { For random mazes (where the user-providedh-values are ill-informed, see Sec- tion 5.3.1.1) and the game map of size 512 512 (where the user-provided h-values are less well-informed than random grids and the game map of size 676 676), Backward GAA* has a smaller runtime per search than MT-D* Lite. For example, Backward GAA* runs faster than MT-D* Lite by a factor of 1.40 in random mazes and 1.63 in the game map of size 512 512. I now describe the dierences between the results in Table 5.1, where the target uses the Random Waypoint movement strategy, and the results in Table 5.2, where the target uses the TrailMax movement strategy: 164 For all three kinds of grids, the number of searches of all search algorithms and the number of moves to catch the target are larger when the target uses the TrailMax rather than the Random Waypoint movement strategy, which demonstrates the power of TrailMax to avoid the agent. For example, in random grids, the agent performs 252 Forward Repeated A* searches and 401 moves to catch the target when the target uses the Random Waypoint movement strategy. 
On the other hand, the agent performs 806 Forward Repeated A* searches (= an increase by a factor of 3.20) and 1,001 moves (= an increase by a factor of 2.50) to catch the target when the target uses the TrailMax movement strategy.

- For all three kinds of grids, the runtime per search of all search algorithms can be smaller or larger when the target uses different movement strategies, since different movement strategies of the target can result in different trajectories of both the agent and the target. In this case, a search algorithm can expand different states when searching for a cost-minimal path between the current states of the agent and the target, resulting in a different runtime per search. For example, the runtime per search of MT-D* Lite is 94 microseconds in random grids when the target uses the Random Waypoint movement strategy, and it decreases to 64 microseconds when the target uses the TrailMax movement strategy. In contrast to the situation in random grids, the runtime per search of MT-D* Lite in the game map of size 512 x 512 is 83 microseconds when the target uses the Random Waypoint movement strategy, and it increases to 176 microseconds when the target uses the TrailMax movement strategy.

5.3.1.3 Summary

[Figure 5.3: Runtime Relationships in Known Static Terrains (Target Movement Strategy: Random Waypoint)]

[Figure 5.4: Runtime Relationships in Known Static Terrains (Target Movement Strategy: TrailMax)]

Finally, I summarize the results by combining the experimental results from Tables 5.1 and 5.2. The runtime relationships of the fastest five search algorithms when the target uses the Random Waypoint and TrailMax movement strategies are shown in Figures 5.3 and 5.4, respectively. An arrow points from algorithm A to algorithm B iff the runtime per search of algorithm A is smaller than the one of algorithm B. Figure 5.3 shows the following relationships:

- For all three kinds of grids except for random mazes (where the user-provided h-values are well-informed, see Section 5.3.1.1), FRA* and G-FRA* have the smallest and second smallest runtime per search, respectively. MT-D* Lite has the third smallest runtime per search.

- For random mazes (where the user-provided h-values are ill-informed, see Section 5.3.1.1), FRA* and G-FRA* have the smallest and second smallest runtime per search, respectively. Backward GAA* has the third smallest runtime per search.

Figure 5.4 shows the following relationships:

- For all three kinds of grids except for random mazes (where the user-provided h-values are well-informed, see Section 5.3.1.1), FRA* and G-FRA* have the smallest and second smallest runtime per search, respectively. For random grids and the game map of size 676 x 676, MT-D* Lite has the third smallest runtime per search. For the game map of size 512 x 512, Backward GAA* has the third smallest runtime per search.
- For random mazes (where the user-provided h-values are ill-informed, see Section 5.3.1.1), FRA* and G-FRA* have the smallest and second smallest runtime per search, respectively. Backward GAA* has the third smallest runtime per search.

To summarize, FRA* and G-FRA* have the smallest runtime per search among all compared search algorithms for all three kinds of grids and for both movement strategies of the target. Thus, they are the fastest incremental search algorithms for moving target search in known static terrains.

5.3.2 Experiments in Known Dynamic Terrains

In this section, I compare all search algorithms described in Section 5.2.1 for moving target search in known dynamic terrains. Since TrailMax was designed only for known static terrains, I compare all search algorithms for the case in which the target uses the Random Waypoint movement strategy.

First, I discuss the experimental results for Forward and Backward Repeated A* and the heuristic learning incremental search algorithms, that is, Forward and Backward GAA*. Table 5.3 shows the following relationships:

- For all three kinds of grids and for all k, the number of state expansions per search of GAA* is smaller than the one of Repeated A* with the same search direction because GAA* updates the h-values to make them more informed over time and hence expands fewer states per search. For example, in random grids, when k = 10, Backward GAA* expands only about 51.38% of the states per search that Backward Repeated A* expands.

- For all three kinds of grids, the number of state propagations per search of GAA* increases as k increases since a larger number of action costs then decrease. Thus, the h-values of more states in the state space are affected by the action cost decreases. The consistency procedure needs to perform more state propagations to maintain the consistency of the h-values. For example, in random grids, the number of state propagations per search of Forward GAA* is 7 when k = 10, and it increases to 916 when k = 5,000.
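The role of the consistency procedure can be made more concrete with a small sketch. The following C++ fragment is not the implementation used in this dissertation (all identifiers are invented for this illustration, and the actual procedure may differ in its details); it only shows the general idea: after action costs have decreased, h-values may violate the consistency condition h(s) <= c(s, s') + h(s'), and a Dijkstra-like pass lowers the affected h-values, starting at the sources of the decreased edges and propagating the changes to their predecessors.

```cpp
#include <functional>
#include <queue>
#include <unordered_map>
#include <utility>
#include <vector>

using State = int;
struct Edge { State to; double cost; };   // edge ending in 'to' with cost 'cost'

// Restores consistency of the h-values after some action costs have decreased.
// 'decreasedSources' are the source states of the decreased edges, 'succ' maps a
// state to its outgoing edges, 'pred' maps a state to its incoming edges
// (Edge::to is then the predecessor state).  Assumes h has an entry per state.
void restoreConsistency(const std::vector<State>& decreasedSources,
                        const std::unordered_map<State, std::vector<Edge>>& succ,
                        const std::unordered_map<State, std::vector<Edge>>& pred,
                        std::unordered_map<State, double>& h) {
    using Item = std::pair<double, State>;
    std::priority_queue<Item, std::vector<Item>, std::greater<Item>> open;

    // Only at the sources of decreased edges can consistency be violated directly.
    for (State s : decreasedSources) {
        auto it = succ.find(s);
        if (it == succ.end()) continue;
        for (const Edge& e : it->second)
            if (h[s] > e.cost + h[e.to]) h[s] = e.cost + h[e.to];
        open.push({h[s], s});
    }
    // Lowering h(s) can in turn violate consistency at the predecessors of s,
    // so the lowered values are propagated backwards, smallest value first.
    while (!open.empty()) {
        auto [hs, s] = open.top();
        open.pop();
        if (hs > h[s]) continue;                        // stale queue entry
        auto it = pred.find(s);
        if (it == pred.end()) continue;
        for (const Edge& e : it->second) {              // e.to is a predecessor of s
            if (h[e.to] > e.cost + hs) {
                h[e.to] = e.cost + hs;
                open.push({h[e.to], e.to});
            }
        }
    }
}
```

The number of such propagation steps grows with the number of decreased action costs, which is the overhead that becomes visible in column (d) of the experiments below.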
k = 10, Random Grid | Random Maze
Forward Repeated A*    315  421  2,221 (22.5)  -  -    369  |  401  584  11,048 (80.6)  -  -      1,725
Forward GAA*           321  428  1,838 (19.7)  7  -    304  |  387  574  6,402 (54.9)   2  -      1,077
Backward Repeated A*   308  421  1,884 (19.2)  -  -    292  |  379  582  11,588 (95.4)  -  -      1,810
Backward GAA*          298  419  968 (12.1)    7  -    169  |  364  582  4,561 (47.3)   1  -      742
Basic MT-D* Lite       337  430  266 (9.4)     -  -    135  |  394  573  5,552 (52.6)   -  -      1,622
MT-D* Lite             340  433  159 (8.7)     -  161  81   |  411  591  2,662 (26.4)   -  2,668  792

k = 10, Game Map 512 x 512 | Game Map 676 x 676
Forward Repeated A*    201  298  1,665 (27.7)  -  -    234  |  407  541  6,847 (53.3)  -  -    1,169
Forward GAA*           198  296  1,119 (20.2)  1  -    178  |  408  541  5,033 (43.3)  2  -    886
Backward Repeated A*   176  281  2,672 (47.3)  -  -    388  |  375  535  5,598 (49.2)  -  -    1,032
Backward GAA*          172  281  977 (17.7)    2  -    151  |  365  531  2,381 (32.4)  2  -    411
Basic MT-D* Lite       223  307  445 (10.6)    -  -    151  |  429  548  1,022 (20.9)  -  -    371
MT-D* Lite             225  309  232 (7.2)     -  249  83   |  427  545  556 (17.9)    -  548  215

k = 100, Random Grid | Random Maze
Forward Repeated A*    315  424  2,339 (22.8)  -   -    375  |  401  586  11,696 (90.0)  -   -      1,912
Forward GAA*           311  420  1,997 (20.5)  55  -    340  |  418  602  6,977 (59.2)   77  -      1,269
Backward Repeated A*   309  422  1,964 (19.2)  -   -    307  |  375  577  10,778 (90.6)  -   -      1,694
Backward GAA*          298  421  997 (11.9)    48  -    181  |  360  578  4,866 (46.8)   67  -      812
Basic MT-D* Lite       349  443  269 (9.0)     -   -    143  |  424  597  5,748 (57.9)   -   -      1,748
MT-D* Lite             355  448  163 (8.4)     -   152  87   |  402  580  3,021 (33.2)   -   2,978  1,011

k = 100, Game Map 512 x 512 | Game Map 676 x 676
Forward Repeated A*    187  286  1,813 (30.2)  -    -    282  |  388  529  6,320 (50.3)  -   -    1,065
Forward GAA*           192  290  1,188 (21.5)  93   -    199  |  397  539  4,581 (40.3)  33  -    804
Backward Repeated A*   175  282  2,573 (45.6)  -    -    382  |  347  510  5,369 (49.0)  -   -    900
Backward GAA*          171  282  966 (17.4)    101  -    159  |  348  514  2,230 (31.7)  31  -    393
Basic MT-D* Lite       210  300  497 (11.4)    -    -    169  |  426  552  932 (20.7)    -   -    345
MT-D* Lite             211  300  265 (7.8)     -    274  96   |  427  553  512 (17.8)    -   491  210

k = 1,000, Random Grid | Random Maze
Forward Repeated A*    280  390  2,192 (23.2)  -    -    359  |  331  547  11,005 (92.9)  -      -      1,967
Forward GAA*           279  388  2,003 (21.1)  272  -    387  |  335  536  6,600 (59.2)   1,037  -      1,469
Backward Repeated A*   270  385  1,920 (19.7)  -    -    311  |  300  524  10,843 (99.3)  -      -      1,901
Backward GAA*          280  404  995 (12.1)    245  -    215  |  301  543  6,474 (66.3)   1,289  -      1,410
Basic MT-D* Lite       324  421  320 (10.1)    -    -    174  |  328  533  5,552 (67.9)   -      -      1,813
MT-D* Lite             312  409  244 (10.1)    -    148  136  |  327  523  3,121 (42.7)   -      2,543  1,196

k = 1,000, Game Map 512 x 512 | Game Map 676 x 676
Forward Repeated A*    158  269  1,871 (29.1)  -    -    329  |  358  506  6,952 (55.9)  -    -    1,288
Forward GAA*           161  272  1,281 (20.0)  314  -    238  |  367  513  5,155 (43.9)  448  -    999
Backward Repeated A*   145  265  2,577 (48.2)  -    -    457  |  360  530  4,960 (47.5)  -    -    919
Backward GAA*          139  265  989 (19.8)    346  -    194  |  349  528  2,381 (32.2)  358  -    559
Basic MT-D* Lite       174  281  573 (13.5)    -    -    210  |  378  516  1,047 (37.9)  -    -    459
MT-D* Lite             171  277  342 (10.2)    -    263  135  |  379  517  637 (20.4)    -    483  302

k = 5,000, Random Grid | Random Maze
Forward Repeated A*    248  359  2,510 (25.8)  -    -    428  |  228  435  7,109 (87.5)  -      -      1,627
Forward GAA*           251  362  2,269 (23.9)  916  -    764  |  248  456  4,857 (58.3)  1,922  -      2,050
Backward Repeated A*   244  360  1,977 (20.2)  -    -    326  |  230  446  7,151 (95.5)  -      -      1,691
Backward GAA*          236  362  1,169 (14.1)  915  -    594  |  240  463  4,019 (57.4)  1,960  -      1,934
Basic MT-D* Lite       245  342  543 (15.4)    -    -    308  |  254  459  3,676 (61.3)  -      -      1,892
MT-D* Lite             249  346  450 (14.8)    -    -    260  |  246  454  2,681 (53.6)  -      1,480  1,629

k = 5,000, Game Map 512 x 512 | Game Map 676 x 676
Forward Repeated A*    57  192  2,572 (58.8)  -      -    409  |  317  482  6,437 (56.3)  -      -    1,207
Forward GAA*           62  203  1,519 (39.4)  1,195  -    605  |  330  490  4,798 (45.2)  1,294  -    1,355
Backward Repeated A*   47  186  3,674 (99.7)  -      -    602  |  299  482  4,598 (51.2)  -      -    833
Backward GAA*          43  186  1,475 (48.9)  1,487  -    682  |  292  487  2,455 (37.4)  1,233  -    924
Basic MT-D* Lite       56  191  1,215 (40.4)  -      -    452  |  335  488  1,361 (29.3)  -      -    665
MT-D* Lite             55  192  921 (36.9)    -      333  365  |  328  480  1,039 (27.9)  -      408  546

(a) = searches until the target is caught; (b) = moves until the target is caught; (c) = state expansions per search (in parentheses: standard deviation of the mean); (d) = state propagations per search; (e) = state deletions per search; and (f) = runtime per search (in microseconds). Each row lists the values (a)-(f) per grid; "-" marks values that are not reported for an algorithm.

Table 5.3: Experimental Results in Known Dynamic Grids
- For all three kinds of grids, when the number of action cost changes is small (that is, when k <= 100), GAA* has a smaller runtime per search than Repeated A* with the same search direction because GAA* updates the h-values to make them more informed over time and hence expands fewer states per search. The computational effort of the consistency procedure (= the overhead of the consistency procedure), that is, the number of state propagations to maintain the consistency of the h-values, is small when k is small. Thus, the savings of GAA* over Repeated A* in state expansions dominate the overhead of the consistency procedure, resulting in a smaller runtime per search than Repeated A*. For example, when k = 10 in random grids, Backward GAA* expands only about 51.38% of the states per search that Backward Repeated A* expands. It thus runs faster per search than Backward Repeated A* by a factor of 1.73. Similarly, Backward GAA* runs faster per search than Backward Repeated A* by a factor of 2.44 in random mazes, 2.57 in the game map of size 512 x 512 and 2.51 in the game map of size 676 x 676.

- For all three kinds of grids, when the number of action cost changes is large (that is, when k = 5,000), GAA* has a larger runtime per search than Repeated A* with the same search direction because the overhead of the consistency procedure increases as k increases. When k is large, the overhead of the consistency procedure dominates the savings of GAA* over Repeated A* in state expansions, resulting in a larger runtime per search than Repeated A*. For example, when k = 5,000 in the game map of size 676 x 676 (and thus 2.19% of the states change their blockage status), Backward GAA* expands only about 53.39% of the states per search that Backward Repeated A* expands. However, the overhead of the consistency procedure dominates the savings of Backward GAA* over Repeated A* in state expansions. Thus, Backward GAA* runs more slowly per search than Backward Repeated A* by a factor of 1.11.

- For all three kinds of grids and for all k, Backward GAA* has a smaller runtime per search than Forward GAA* for the same reason as discussed in Section 5.3.1. For example, when k = 10, Backward GAA* runs faster than Forward GAA* by a factor of 1.80 in random grids, 1.45 in random mazes, 1.18 in the game map of size 512 x 512 and 2.16 in the game map of size 676 x 676.

Second, I discuss the experimental results for Forward and Backward Repeated A* and the search tree transforming incremental search algorithms, that is, Basic MT-D* Lite and MT-D* Lite (FRA* and G-FRA* do not apply to known dynamic terrains). Table 5.3 shows the following relationships:

- For all three kinds of grids and for all k, MT-D* Lite has a smaller runtime per search than Basic MT-D* Lite for the same reason as discussed in Section 5.3.1. For example, when k = 10, MT-D* Lite runs faster than Basic MT-D* Lite by a factor of 1.67 in random grids, 2.05 in random mazes, 1.82 in the game map of size 512 x 512 and 1.73 in the game map of size 676 x 676.
- For all three kinds of grids, Basic MT-D* Lite and MT-D* Lite have runtimes per search that generally increase as k increases because they generally update more g-values and hence expand more states per search. Thus, the savings in the runtime per search of Basic MT-D* Lite and MT-D* Lite over Forward and Backward Repeated A* generally decrease as k increases. For example, when k = 10 in random grids, Basic MT-D* Lite and MT-D* Lite run faster per search than Forward Repeated A* by a factor of 2.73 and 4.56, respectively. These factors decrease to 2.62 and 4.31 when k = 100, 2.06 and 2.64 when k = 1,000, and 1.39 and 1.65 when k = 5,000.

- For all three kinds of grids except the random mazes (where the user-provided h-values are well-informed, see Section 5.3.1.1) and for all k, MT-D* Lite has the smallest runtime per search among all compared search algorithms. However, in random mazes (where the user-provided h-values are ill-informed, see Section 5.3.1.1), the savings in runtime per search of MT-D* Lite over Forward Repeated A* are not as large as the ones in random grids and game maps due to the large ratio r of the states deleted from the search tree and the total number of states in the search tree after each search. The reason for this is similar to that discussed for known static terrains in Section 5.3.1: a smaller r tends to result in a smaller runtime per search for MT-D* Lite. I perform additional experiments to compare the ratio r of MT-D* Lite in random grids and in random mazes: For example, when k = 10, r is equal to 4.05% in random grids (where the user-provided h-values are well-informed, see Section 5.3.1.1) and 23.91% in random mazes (where the user-provided h-values are ill-informed, see Section 5.3.1.1). MT-D* Lite thus runs faster than Forward Repeated A* by a factor of 4.56 in random grids (with a small r), but by a factor of only 2.18 in random mazes (with a large r). Similar results hold for all k, which explains why MT-D* Lite is slower in random mazes (where the user-provided h-values are ill-informed, see Section 5.3.1.1) than in random grids (where they are well-informed, see Section 5.3.1.1).

Third, I compare the search tree transforming incremental search algorithms with the heuristic learning incremental search algorithms. Table 5.3 shows the following relationships:

- For all three kinds of grids except the random mazes (where the user-provided h-values are well-informed, see Section 5.3.1.1) and for all k, MT-D* Lite has a smaller runtime per search than both Forward and Backward GAA*. The reason is as follows:

  - For Forward and Backward GAA*, their runtimes per search generally increase as k increases because the number of state propagations per search increases as k increases, since a larger number of action costs then decrease, which affects the h-values of more states. The consistency procedure needs to perform more state propagations to maintain the consistency of the h-values, resulting in a larger runtime per search. Moreover, Forward and Backward GAA* need to construct the search tree from scratch for each search.

  - For MT-D* Lite, its runtime per search generally increases with k (which is similar to both Forward and Backward GAA*), because it updates more g-values as k increases and hence expands more states per search.
However, since the user-provided h-values are well-informed, the ratio r between the number of states deleted from the search tree and the total number of states in the search tree after each search is very small. For example, when k = 10, r is equal to 4.05% in random grids, which means that 95.95% of the states belong to the reusable search tree. Thus, although the runtime per search of MT-D* Lite generally increases with k, MT-D* Lite still runs faster than Forward and Backward GAA*, since a large portion of the search tree can be reused. For example, in random grids, MT-D* Lite runs faster per search than Forward and Backward GAA* by a factor of 3.75 and 2.09, respectively, when k = 10, 3.91 and 2.08 when k = 100, 2.85 and 1.58 when k = 1,000, and 2.94 and 2.28 when k = 5,000.

- For random mazes (where the h-values are ill-informed, see Section 5.3.1.1), there are three different cases as k increases:

  - First, when the number of action cost changes is small (that is, when k <= 100), Backward GAA* has a smaller runtime per search than MT-D* Lite and both Forward and Backward Repeated A*, because Backward GAA* updates the h-values to make them more informed over time and hence focuses the search better than both Forward and Backward Repeated A*. The number of state propagations of the consistency procedure is small when k is small. Thus, the savings of Backward GAA* over both Forward and Backward Repeated A* in state expansions dominate the overhead of the consistency procedure, resulting in a smaller runtime per search than both Forward and Backward Repeated A*. For example, when k = 10, Backward GAA* runs faster than Forward and Backward Repeated A* by factors of 2.32 and 2.44, respectively. On the other hand, since MT-D* Lite has a large ratio r of the states deleted from the search tree and the total number of states in the search tree after each search in random mazes, Backward GAA* runs faster than MT-D* Lite in random mazes when k is small.

  - Second, when the number of action cost changes is medium (that is, when k = 1,000), Backward GAA* has a larger runtime per search than MT-D* Lite because the number of state propagations of the consistency procedure increases as k increases. Although the runtime per search of MT-D* Lite also increases as k increases, the increase in the runtime per search of MT-D* Lite is smaller than the one of Backward GAA*. When k = 1,000, MT-D* Lite has a smaller runtime per search than both Forward and Backward Repeated A* and Forward and Backward GAA*.

  - Third, when the number of action cost changes is large (that is, when k = 5,000), Forward Repeated A* has a smaller runtime per search than both Forward and Backward GAA* and MT-D* Lite: Forward Repeated A* has a smaller runtime per search than Forward and Backward GAA* because the number of state propagations per search of the consistency procedure dominates the savings of GAA* over Forward Repeated A* in state expansions per search. Forward Repeated A* has a smaller runtime per search than MT-D* Lite because the runtime per search of MT-D* Lite increases as k increases. When k = 5,000, the computational effort of MT-D* Lite needed to reuse the information from previous searches for the current search is larger than the computational savings gained from reusing the information. Thus, neither Forward nor Backward GAA* nor MT-D* Lite should be used in this case.

5.3.2.1 Summary

Finally, I summarize the results in Table 5.3.
The runtime relationships of the fastest five search algorithms are shown in Figure 5.5. An arrow points from algorithm A to algorithm B iff the runtime per search of algorithm A is smaller than the one of algorithm B. Figure 5.5 shows the following relationships:

- For all three kinds of grids except the random mazes (where the user-provided h-values are well-informed, see Section 5.3.1.1), MT-D* Lite has the smallest runtime per search among all compared search algorithms.

- For random mazes (where the user-provided h-values are ill-informed, see Section 5.3.1.1), there are three different cases as the number of action cost changes increases:

  - When the number of action cost changes is small (that is, when k <= 100), Backward GAA* has the smallest runtime per search among all compared search algorithms.

  - When the number of action cost changes is medium (that is, when k = 1,000), MT-D* Lite has the smallest runtime per search among all compared search algorithms.

  - When the number of action cost changes is large (that is, when k = 5,000), Forward Repeated A* has the smallest runtime per search among all compared algorithms.

[Figure 5.5: Runtime Relationships in Known Dynamic Terrains (Target Movement Strategy: Random Waypoint); panels: (a) Random Grids, (b) Random Mazes, (c) Game Map 512 x 512, (d) Game Map 676 x 676]

Random Grid | Random Maze
Forward Repeated A*    545  707  154 (6.4)     -  -    31   |  1,306  1,528  2,338 (36.0)  -  -    403
Forward GAA*           535  694  151 (6.6)     0  -    28   |  1,287  1,507  738 (5.6)     0  -    117
Backward Repeated A*   531  690  1,148 (26.2)  -  -    214  |  1,168  1,374  7,301 (43.9)  -  -    1,311
Backward GAA*          529  690  931 (24.0)    0  -    185  |  1,181  1,394  5,685 (37.5)  0  -    1,072
Basic MT-D* Lite       536  663  120 (6.6)     -  -    73   |  1,300  1,582  2,311 (37.2)  -  -    758
MT-D* Lite             537  686  113 (6.4)     -  5    69   |  1,382  1,586  1,796 (31.6)  -  706  637

Game Map 512 x 512 | Game Map 676 x 676
Forward Repeated A*    368  473  323 (4.1)      -  -   48   |  965  1,235  362 (10.7)    -  -   69
Forward GAA*           369  473  262 (3.1)      0  -   42   |  964  1,233  317 (10.6)    0  -   66
Backward Repeated A*   425  526  2,022 (29.2)   -  -   328  |  869  1,134  5,318 (61.0)  -  -   1,221
Backward GAA*          424  525  1,690 (25.3)   0  -   292  |  902  1,180  3,680 (50.7)  0  -   880
Basic MT-D* Lite       378  469  350 (5.0)      -  -   135  |  926  1,144  315 (11.2)    -  -   170
MT-D* Lite             371  459  263 (3.4)      -  93  110  |  901  1,115  287 (11.4)    -  44  161

(a) = searches until the target is caught; (b) = moves until the target is caught; (c) = state expansions per search (in parentheses: standard deviation of the mean); (d) = state propagations per search; (e) = state deletions per search; and (f) = runtime per search (in microseconds). Each row lists the values (a)-(f) per grid; "-" marks values that are not reported for an algorithm.

Table 5.4: Experimental Results in Unknown Static Grids
5.3.3 Experiments in Unknown Static Terrains

In this section, I compare all search algorithms described in Section 5.2.1 for moving target search in unknown static terrains. Since the TrailMax algorithm has been designed only for known static terrains, I compare all search algorithms for the case in which the target uses the Random Waypoint movement strategy. All search algorithms are optimized for unknown static grids as discussed in Section 2.3.6.2: they replan only when either the target moves off the path or the action costs increase between the agent and the target along the path.

First, I discuss the experimental results for Forward and Backward Repeated A* and the heuristic learning incremental search algorithms, that is, Forward and Backward GAA*. The consistency procedure is not performed in unknown static terrains, since action costs are non-decreasing between searches in unknown static grids. Table 5.4 shows the following relationships:

- For all three kinds of grids, GAA* has a smaller runtime per search than Repeated A* with the same search direction because GAA* updates the h-values to make them more informed over time and hence expands fewer states per search, resulting in a smaller runtime per search than A*. For example, Backward GAA* runs faster than Backward Repeated A* by a factor of 1.16 in random grids, 1.22 in random mazes, 1.12 in the game map of size 512 x 512 and 1.39 in the game map of size 676 x 676.

- For all three kinds of grids, Forward Repeated A* has a smaller runtime per search than Backward Repeated A* because Forward Repeated A* has a smaller number of state expansions per search than Backward Repeated A*. Similarly, Forward GAA* has a smaller runtime per search than Backward GAA*. For example, Forward Repeated A* has a smaller runtime per search than Backward Repeated A* by a factor of 6.90 in random grids, and Forward GAA* has a smaller runtime per search than Backward GAA* by a factor of 6.61 in random grids. The reason why the search algorithms that search forward have smaller runtimes per search than the ones that search backward was discussed in previous literature: "We typically assign optimistic costs to edges whose costs we do not know. As a result, areas of the graph that have been observed have more expensive edge costs than the unexplored areas. This means that, when searching forwards, as soon as the search exits the observed area it can rapidly progress through the unexplored area directly to the goal. However, when searching backwards, the search initially rapidly progresses to the observed area, then once it encounters the more costly edges in the observed area, it begins expanding large portions of the unexplored area trying to find a cheaper path. As a result, it can be significantly more efficient to use forward A* rather than backward A* when replanning from scratch." (Ferguson, Likhachev, & Stentz, 2005).
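The optimistic cost assignment described in this quote can be made concrete with a small sketch. The following C++ fragment is not taken from the implementation used for the experiments (all identifiers are invented for this illustration); it only shows the "freespace" assumption under which the compared search algorithms plan in unknown static grids: cells that the agent has not observed yet are assumed to be unblocked, so their action costs remain one, and only cells observed to be blocked receive infinite cost.

```cpp
#include <limits>
#include <vector>

// Illustrative only: a four-neighbor grid whose cells are unknown, free, or blocked.
enum class Cell { Unknown, Free, Blocked };

struct Grid {
    int width = 0, height = 0;
    std::vector<Cell> cells;                       // row-major, size width * height
    Cell at(int x, int y) const { return cells[y * width + x]; }
};

// Cost of an action that moves the agent into cell (x, y) under the optimistic
// (freespace) assumption: unobserved cells are treated as if they were unblocked,
// so only cells actually observed to be blocked get infinite cost.
double actionCost(const Grid& grid, int x, int y) {
    if (grid.at(x, y) == Cell::Blocked)
        return std::numeric_limits<double>::infinity();
    return 1.0;                                    // Free and Unknown cells are traversable
}
```

Under this assumption, action costs can only increase between searches (a cell can only change from assumed-unblocked to observed-blocked), which is why the consistency procedure of GAA* never has to run in unknown static terrains, and why the cost increases are always close to the agent's current state.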
Second, I discuss the experimental results for Forward and Backward Repeated A* and the search tree transforming incremental search algorithms, that is, Basic MT-D* Lite and MT-D* Lite (FRA* and G-FRA* are designed for known static terrains and hence do not apply to unknown static terrains). Table 5.4 shows the following relationships:

- For all three kinds of grids, MT-D* Lite has a smaller runtime per search than Basic MT-D* Lite for the same reason as discussed in Section 5.3.1. For example, MT-D* Lite runs faster than Basic MT-D* Lite by a factor of 1.06 in random grids, 1.19 in random mazes, 1.23 in the game map of size 512 x 512 and 1.06 in the game map of size 676 x 676.

- For all three kinds of grids, MT-D* Lite has a larger runtime per search than Forward Repeated A*. (I compare MT-D* Lite against Forward Repeated A* only, because MT-D* Lite runs faster than Basic MT-D* Lite and Forward Repeated A* runs faster than Backward Repeated A* in unknown static terrains.) The reason is as follows: MT-D* Lite, as a variant of D* Lite, has a smaller runtime per search than Repeated A* when the action cost changes are close to the goal state (see Section 2.4.2.2). However, for moving target search, MT-D* Lite has to perform forward (but not backward) searches in order to reuse the search tree rooted in the current state of the agent. When the agent observes new blocked states near it, the resulting action cost increases are close to the current state of the agent and thus the start state of MT-D* Lite. MT-D* Lite then has to update a large number of g-values and hence expands a large number of states per search, resulting in a large runtime. Therefore, MT-D* Lite runs more slowly than Forward Repeated A*. For example, Forward Repeated A* runs faster than MT-D* Lite by a factor of 2.23 in random grids, 1.58 in random mazes, 2.29 in the game map of size 512 x 512 and 2.33 in the game map of size 676 x 676. Thus, MT-D* Lite and Basic MT-D* Lite should not be used in unknown static terrains.

Third, I compare the search tree transforming incremental search algorithms with the heuristic learning incremental search algorithms. Table 5.4 shows the following relationships:

- For all three kinds of grids, MT-D* Lite has a larger runtime per search than Forward GAA*. (I compare MT-D* Lite against Forward GAA* only, because MT-D* Lite runs faster than Basic MT-D* Lite and Forward GAA* runs faster than Backward GAA* in unknown static terrains.) This is because Forward GAA* updates the h-values to make them more informed over time and hence expands fewer states per search than Forward Repeated A*, resulting in a smaller runtime per search than Forward Repeated A*. Moreover, as discussed earlier, MT-D* Lite has a larger runtime per search than Forward Repeated A*. Thus, Forward GAA* runs faster than MT-D* Lite. For example, Forward GAA* runs faster than MT-D* Lite by a factor of 2.46 in random grids, 5.44 in random mazes, 2.62 in the game map of size 512 x 512 and 2.44 in the game map of size 676 x 676.

5.3.3.1 Summary

Finally, I summarize the results in Table 5.4. The runtime relationships of the compared search algorithms are shown in Figure 5.6. An arrow points from algorithm A to algorithm B iff the runtime per search of algorithm A is smaller than the one of algorithm B. Figure 5.6 shows the following relationships:
[Figure 5.6: Runtime Relationships in Unknown Static Terrains (Target Movement Strategy: Random Waypoint)]

- For all three kinds of grids, Forward GAA* has the smallest runtime per search among all compared search algorithms.

- For all three kinds of grids, search algorithms that search forward have smaller runtimes per search than search algorithms that search backward.

- For all three kinds of grids, MT-D* Lite and Basic MT-D* Lite have larger runtimes per search than Forward Repeated A*. Thus, MT-D* Lite and Basic MT-D* Lite should not be used for moving target search in unknown static terrains.

Overall, Forward GAA* is the fastest incremental search algorithm for moving target search in unknown static terrains.

5.4 Conclusions

In this chapter, I have systematically evaluated the runtime of the developed incremental search algorithms for moving target search with respect to (1) the different terrains introduced in Section 1.1; (2) different target movement strategies; (3) different numbers of action cost changes in known dynamic terrains; and (4) the different kinds of grids introduced in Section 5.2.2.

Known terrains, well-informed user-provided h-values:  small number of cost changes: MT-D* Lite [2.82]; medium number: MT-D* Lite [2.29]; large number: MT-D* Lite [1.12]; static terrains (no cost changes): FRA*/G-FRA* [3.25/2.26]
Known terrains, ill-informed user-provided h-values:   small number of cost changes: Backward GAA* [2.09]; medium number: MT-D* Lite [1.59]; large number: Forward Repeated A* [1.00]; static terrains (no cost changes): FRA*/G-FRA* [2.29/2.12]
Unknown terrains, well-informed user-provided h-values: Forward GAA* [1.05]
Unknown terrains, ill-informed user-provided h-values:  Forward GAA* [3.44]

Table 5.5: Best Incremental Search Algorithms for Different Terrains

Nevertheless, there are limitations on the experimental setup:

- First, the only state space representation used for the experiments is four-neighbor grids. I have not performed experiments with different state space representations, such as eight-neighbor grids and polygonal maps (Rabin, 2002), in order to get a more comprehensive comparison of the developed incremental search algorithms. As demonstrated in this chapter, the developed incremental search algorithms run even more slowly than both Forward and Backward Repeated A* in terrains where the number of action cost changes is large. Thus, it is expected that the developed incremental search algorithms do not work well in high-dimensional state spaces with large branching factors (= the number of successor states of each state) in which the action costs can change between searches, because even a small change in the state space (for example, one observed obstacle) can typically affect a large number of action costs in this state space.

- Second, the sizes of the grids used for all experiments are about 500 x 500, which is about the same size as the game maps used by the game company Bioware. I have not varied the size of the grids in order to get a more comprehensive comparison of the developed incremental search algorithms.

- Third, the action costs can only change from one to infinity or from infinity to one in the experiments. I have not varied the values that the action costs can change to (for example, an action cost could change from one to a finite value)
in order to get a more comprehensive comparison of the developed incremental search algorithms.

- Fourth, the speed of the target is smaller than the speed of the agent. I have not varied the ratio of the speed of the target and the speed of the agent in order to get a more comprehensive comparison of the developed incremental search algorithms.

- Fifth, the sensor range of the agent is set to one cell in unknown static terrains. I have not varied the sensor range of the agent in order to get a more comprehensive comparison of the developed incremental search algorithms in unknown static terrains.

- Sixth, the priority queues of the developed incremental search algorithms are implemented by using binary heaps only. I have not varied the data structure of the priority queues by using other containers, such as buckets, in order to get a more comprehensive comparison of the developed incremental search algorithms.

- Seventh, the memory usage of the developed incremental search algorithms is not evaluated, since they have memory requirements that are only linear in the number of states in the state space, which can often be satisfied by many applications, such as computer games. This measurement can become important and should not be skipped when applying the developed incremental search algorithms to applications with high-dimensional state spaces with large branching factors, because those applications require a large amount of memory, which might not be easily satisfied.

- Eighth, the experiments focus mainly on test cases where there exists a path from the agent to the target. I have not focused on test cases where there does not exist a path from the agent to the target, for which the developed incremental search algorithms do not run faster (and may even run more slowly) than Forward and Backward Repeated A*.

Table 5.5 lists the search algorithms with the smallest runtime per search in different scenarios. The numbers in square brackets are the smallest ratio of the runtime per search of Repeated A* and the runtime per search of the fastest search algorithm across the different kinds of grids. The runtime per search of Repeated A* is the smaller runtime per search of Forward Repeated A* and Backward Repeated A*. (For example, in known static terrains, when the user-provided h-values are well-informed and the movement strategy of the target is either Random Waypoint (see Table 5.1) or TrailMax (see Table 5.2), FRA* has the smallest runtime per search among all compared search algorithms. When the movement strategy of the target is Random Waypoint, the smallest ratio of the runtime per search of Repeated A* and the runtime per search of FRA* (= 4.96) is calculated by dividing the runtime per search of Forward Repeated A* (= 238 microseconds) by the one of FRA* (= 48 microseconds) in the game map 512 x 512. When the movement strategy of the target is TrailMax, the smallest ratio of the runtime per search of Repeated A* and the runtime per search of FRA* (= 3.25) is calculated by dividing the runtime per search of Backward Repeated A* (= 231 microseconds) by the one of FRA* (= 71 microseconds) in the game map 512 x 512. Thus, the smallest ratio reported is min(4.96, 3.25) = 3.25.)

In known static terrains, FRA* and G-FRA* have the smallest runtime per search of all search algorithms described in Section 5.2.1. FRA* is optimized for grids and does not apply to other state space representations, such as state lattices, while G-FRA* applies to arbitrary state space representations.
In known dynamic terrains, if the user-provided h-values are well-informed, MT-D* Lite has the smallest runtime per search, because the ratio of the number of states deleted from the search tree and the total number of states in the search tree after each search is very small. Therefore, MT-D* Lite can reuse a large portion of the search tree from the previous search for the current search. If the user-provided h-values are ill-informed, Backward GAA* has the smallest runtime per search when the number of action cost changes is small. When the number of action cost changes is medium, MT-D* Lite has a smaller runtime per search than Backward GAA* because the overhead of the consistency procedure of Backward GAA* becomes large. Finally, when the number of action cost changes is large, both Backward GAA* and MT-D* Lite have larger runtimes per search than Forward Repeated A*. In such a situation, incremental search algorithms should not be used, since their computational effort to reuse information from previous searches to speed up the current search is larger than the computational savings gained from reusing the information.

In unknown static terrains, Forward GAA* has the smallest runtime per search because it updates the h-values to make them more informed over time. Basic MT-D* Lite and MT-D* Lite run more slowly than Forward Repeated A* and Forward GAA*, since the action cost changes are close to the current state of the agent (= start state) between searches. Basic MT-D* Lite and MT-D* Lite then have to update a large number of g-values and hence expand more states per search than Forward and Backward Repeated A*.

In this chapter, it has been demonstrated that the developed incremental search algorithms have smaller runtimes per search than both Forward and Backward Repeated A* for moving target search in different classes of terrains. However, there are scenarios in which their runtimes per search can be larger than both Forward and Backward Repeated A*. We need to be aware of these worst-case scenarios and avoid using the developed incremental search algorithms in such scenarios:

- Heuristic learning incremental search algorithms, such as GAA*, have a smaller runtime per search than Repeated A* with the same search direction in known static and unknown static terrains, because GAA* updates its h-values to make them more informed and hence focuses its search better. However, when a larger number of action costs decrease in known dynamic terrains, the consistency procedure needs to perform more state propagations to maintain the consistency of the h-values, which can be computationally expensive. Thus, GAA* then has a larger runtime per search than Repeated A* with the same search direction because of the overhead of the consistency procedure.

- Search tree transforming incremental search algorithms, such as MT-D* Lite, have a smaller runtime per search than both Forward and Backward Repeated A* in known static terrains, because MT-D* Lite reuses the part of the search tree of the previous search rooted in the start state of the current search. However, MT-D* Lite has a larger runtime per search than both Forward and Backward Repeated A* in the following scenarios:

  - In unknown static terrains, when the agent observes new blocked states near it, the resulting action cost increases are close to the current state of the agent and thus the start state of MT-D* Lite.
MT-D* Lite then has to update a large number of g-values and hence expands a large number of states per search, resulting in a large runtime per search.

  - In known dynamic terrains, if the user-provided h-values in the state space are ill-informed, then the ratio r of the number of states deleted from the search tree and the total number of states in the search tree after each search of MT-D* Lite is large. Thus, MT-D* Lite can only reuse a small portion of the search tree of the previous search. Moreover, when a larger number of action costs change, MT-D* Lite also has to update a large number of g-values and hence expands a large number of states per search. Thus, when both the user-provided h-values in the state space are ill-informed and a large number of action costs change, MT-D* Lite has a larger runtime per search than both Forward and Backward Repeated A*.

Overall, the fastest incremental search algorithms that I have developed achieve runtimes per search below 1 ms in known static terrains, in unknown static terrains and in known dynamic terrains when the number of action cost changes is small (that is, when k <= 100). For example, FRA* achieves a runtime of 0.127 ms per search in the game map 676 x 676, which is one order of magnitude faster than Repeated A* in known static terrains. This runtime speedup is important given that all experiments have been performed on a relatively fast computer available in 2011, a Linux PC with an Intel Core 2 Duo 2.53 GHz CPU. On the other hand, many current mobile devices, such as cell phones or touchpads, have slower CPUs than the one I used for the experiments. However, game developers are designing more and more computer games for mobile devices nowadays, which means that the search algorithms have to satisfy the runtime requirements with relatively slow CPUs. Thus, the developed incremental search algorithms have good potential to be applied to computer games on mobile devices, given their large speedups over Repeated A*.

Chapter 6: Applications

In this chapter, I give an overview of applications of the developed incremental search algorithms. First, I describe an application of GAA*, namely, the Flexible Adaptive Probabilistic RoadMap planner (FAPRM planner) (Belghith et al., 2010), in Section 6.1. Second, I describe an application of G-FRA*, namely, applying G-FRA* to unmanned ground vehicle (UGV) navigation on state lattices (Sun et al., 2010a), in Section 6.2. Third, I describe an application of MT-D* Lite, namely, the JPathPlan path planner (Anand, 2011), in Section 6.3.

6.1 Generalized Adaptive A* Application

[Figure 6.1: Interface of the ROMAN Tutoring System (Belghith et al., 2010)]

In this section, I describe an application of GAA*, namely, the Flexible Adaptive Probabilistic RoadMap planner (FAPRM planner) (Belghith et al., 2010), which has been developed in the context of the ROMAN Tutoring System (shown in Figure 6.1) by Khaled Belghith and Froduald Kabanza from the University of Sherbrooke (Canada) and Leo Hartman from the Canadian Space Agency for training astronauts to operate the Space Station Remote Manipulator System (SSRMS). I describe the operations of the FAPRM planner in a way that is sufficient for understanding its principles. A detailed description of the application can be found in (Belghith et al., 2010) and in Khaled Belghith's dissertation (Belghith, 2010).
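Since the FAPRM planner builds on (Backward) GAA*, it may help to recall the heuristic-update idea behind GAA* with a minimal sketch before describing the application. The following C++ fragment is only an illustration of the basic Adaptive A* update that GAA* builds on, not the FAPRM planner's code (all identifiers are invented for this example): after a search that found a path of cost goalG, every state s that was expanded by the search receives the larger, still admissible h-value goalG - g(s). GAA* additionally corrects the h-values when the target moves and runs a consistency procedure when action costs decrease; both are omitted here.

```cpp
#include <algorithm>
#include <unordered_map>

using State = int;

// Illustrative heuristic update performed after each completed A* search.
// 'expandedG' maps every state expanded by the search to its g-value, and
// 'goalG' is the g-value of the goal state, i.e., the cost of the found path.
void updateHValues(const std::unordered_map<State, double>& expandedG,
                   double goalG,
                   std::unordered_map<State, double>& h) {
    for (const auto& [s, g] : expandedG) {
        // No path from s to the goal can cost less than goalG - g(s), so
        // goalG - g(s) is an admissible (and usually more informed) h-value.
        h[s] = std::max(h[s], goalG - g);
    }
}
```

Because the updated h-values are never smaller than the user-provided ones, each subsequent search is at least as focused as a search with the user-provided h-values.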
6.1.1 Background

[Figure 6.2: Space Station Remote Manipulator System (NASA, 2001)]

The Space Station Remote Manipulator System (SSRMS) (shown in Figure 6.2) on board the International Space Station (ISS) is a 17-meter-long articulated robot arm (Currie & Peacock, 2002). It has a complex geometry, with seven rotational joints, each with a range of 270 degrees. There are fourteen cameras covering different parts of the ISS, which are able to observe the operations of the SSRMS. The SSRMS is a key component of the ISS and has been used in the assembly, maintenance and repair of the station, and also for moving payloads from visiting shuttles. Astronauts operate the SSRMS through a workstation located inside one of the ISS compartments.

The ROMAN Tutoring System is used to train the operators of the SSRMS before they are sent to the ISS to accomplish their missions. The tutoring system is important for providing an unlimited supply of training examples for astronauts to gain and maintain their skills for long-duration missions. In essence, the ROMAN Tutoring System observes an astronaut operating a simulation of the SSRMS to accomplish a task in the simulation environment. Every few seconds the system has to determine whether the astronaut is keeping the simulated SSRMS close to a path that achieves the task. If the simulated SSRMS is too far away from such a path, the tutoring system intervenes and shows the astronaut a more appropriate path generated by the FAPRM planner embedded in the tutoring system.

The action costs of the SSRMS can change over time depending on the orbit of the ISS and the visibility of the corresponding area via the cameras on board the ISS, since the SSRMS prefers a path that avoids regions with limited visibility in order to see the robot motion well. The visibility of an area can change over time. For example, a camera may point towards the sun as the SSRMS moves, resulting in visibility changes of the corresponding area that the camera views (Belghith et al., 2010). I do not provide a detailed description of how the action costs of the SSRMS are modeled since the pseudo code of GAA* is independent of how the action costs are modeled. Since the FAPRM planner uses a version of Backward GAA*, the goal state (= the current state of the agent) can change over time as the agent moves towards the target (Belghith et al., 2010). The start state (= the current state of the target) does not change in the experiments performed in (Belghith et al., 2010), although the FAPRM planner can solve problems with moving targets. The FAPRM planner has to be able to generate new paths for the SSRMS as fast as possible as the action costs change to guarantee that the tutoring system runs smoothly and without delay, while A* is generally not fast enough to satisfy the runtime requirements of the tutoring system. In this section, I describe the FAPRM planner with an emphasis on the version of GAA* implemented in the planner.

6.1.2 State Space Representation

The FAPRM planner uses Probabilistic RoadMaps (PRMs) (Lavalle, 2006), a sampling-based representation of the terrain that is different from the discretization-based representation described in Section 2.2. PRMs are most commonly used to efficiently generate paths between any two configurations (Lavalle, 2006). A configuration of an articulated robot with n degrees of freedom is an n-element vector of the robot joint positions.
The basic idea behind the sampling-based representation is to generate a set of sample configurations and to try to connect these configurations to each other through sequences of actions. Each configuration corresponds to a state in the state space representation.

6.1.3 Overview of the Experimental Results

Experiments were performed in the context of training an astronaut to manipulate the SSRMS from a given start configuration to a given goal configuration, which remains unchanged over time (Belghith, 2010). (I thank Professor Froduald Kabanza from the University of Sherbrooke, one of the designers of the ROMAN Tutoring System, for confirming this.) Since the FAPRM planner uses a version of Backward GAA*, searches are performed from the goal configuration (= start state) to the current configuration of the SSRMS (= goal state). Thus, the goal state can change over time as the SSRMS moves towards the goal configuration. The simulated SSRMS has 7 degrees of freedom and the terrain consists of 75 obstacles modeled with 85,000 triangles. The experiments were run on a 2.86 GHz Core 2 processor with 2 GB of RAM. The experiments were run three times with the operator performing exactly the same manipulations to reach the goal configuration. The average runtimes per search of the FAPRM planner and the SBL planner (Sanchez & Latombe, 2001) are reported. Except for the first few searches, the FAPRM planner runs faster than the SBL planner, because the FAPRM planner uses GAA* to update the h-values after each search to make them more informed and thus future searches more focused. The FAPRM planner runs faster than the SBL planner by up to a factor of four (shown in Figure 1.3 of (Belghith, 2010)).

6.1.4 Discussion

This application demonstrated that the FAPRM planner, which uses Backward GAA*, achieves a smaller runtime per search than the SBL planner in a state space representation that is different from grids. The FAPRM planner combines the incremental search algorithm (Backward GAA*) with the sampling-based representation of the state space so that the planner can be applied to robotic applications with high-dimensional state spaces. The sampling-based representation of the state space can significantly reduce the number of states in the state space compared to the discretized representation of the state space (see Chapter 2), at the expense of potentially increasing the cost of the path found by each search. I believe that combining incremental search with the sampling-based representation of the state space is a promising direction for applying incremental search algorithms to applications with high-dimensional state spaces in the future.

Nevertheless, there are limitations on the experimental setup used to demonstrate the runtime per search of the FAPRM planner:

- First, all experiments are performed in situations where the target does not move between searches, although the FAPRM planner can solve problems with moving targets. Thus, the experimental results do not directly demonstrate the performance of the FAPRM planner for moving target search.

- Second, the FAPRM planner uses a sampling-based representation of the state space, resulting in an increased cost of the path found by each search. The experimental results have not shown the ratio of the cost of the path found and the minimum cost.

- Third, the runtime per search of Backward GAA* is affected by the informedness of the h-values used during each search.
It is unclear how informed the user-provided h-values are in this application, and by how much the updated h-values are more informed than the user-provided h-values.

6.2 Generalized Fringe-Retrieving A* Application

In this section, I describe an application of G-FRA*, namely, applying G-FRA* to moving target search on state lattices, which are popular kinds of state spaces for unmanned ground vehicle (UGV) navigation (Pivtoraiko & Kelly, 2005a, 2005b; Kelly, Howard, & Green, 2007; Pivtoraiko, Knepper, & Kelly, 2009; Likhachev & Ferguson, 2009; Kushleyev & Likhachev, 2009). A detailed description of the application can be found in (Sun et al., 2010a).
I use the default size of the UGV, namely 20 centimeters 4 centimeters. I use one of the example terrain denition les (env2.cfg), which denes the size of the grid to be 1200 cells 100 cells, the size of each cell to be 2.5 centimeters 2.5 centimeters, the translational velocity of the UGV to be 1 meter per second and the rotational velocity of the UGV to be 22:5 per second. 6.2.3 Overview of the Experimental Results I use four motion primitive denition les together with the default h-values, namely the amount of time needed for the UGV to move to the target at maximal speed in a straight line in the absence of obstacles and motion constraints. Figure 6.4 shows the motion primitives dened in the four les for state (8; 2; 90 ). The target moves randomly but skips every tenth move to enable the UGV to catch it. The problem is solved when the UGV and the target are in the same state. Experimental results show that G-FRA* has a smaller runtime per search than Forward and Backward Repeated A* and Forward and Backward GAA* for all motion primitive denition les, because it reuses parts of the search tree of the previous search and hence expands fewer states per search than Forward and Backward Repeated A* and Forward and Backward GAA*. The runtime per search of G-FRA* is smaller than the one of Forward Repeated A* by factors of 12.54 to 21.28 198 and smaller than the one of Forward GAA* by factors of 6.52 to 11.45. The runtime per search of G-FRA* is smaller than the one of Backward Repeated A* by factors of 10.58 to 20.32 and smaller than the one of Backward GAA* by factors of 3.28 to 4.58. Overall, the experimental results of G-FRA* on state lattices are similar to the experimental results on grids that were shown in Section 5.3.1. G-FRA* runs up to one order of magnitude faster than both Forward and Backward Repeated A* (Sun et al., 2010a). The source code of SBPL that I extended for this application is available at (Sun, 2012). 6.2.4 Discussion The implementation of G-FRA* in SBPL demonstrated that G-FRA* is up to one order of magnitude faster than both Forward and Backward Repeated A* when applied to UGV navigation on state lattices, which extends the applicability of the newly developed incremental heuristic search algorithms for moving target search in a dierent state space from grids. Moreover, the priority queues of the compared search algorithms in SBPL are implemented by using the hashed buckets, which are dierent from the binary heaps used in Chapter 5. Thus, the experimental results demonstrate that G-FRA* runs faster than both Forward and Backward Repeated A* when the priority queues are implemented with dierent data structures. Nevertheless, since I wanted to use similar experimental settings as earlier experiments in Chapter 5, I did not attempt to build on a realistic simulation of moving target search with UGVs, and there are limitations on the experimental setups: First, I used a coarse-grained discrete simulation on the motion-primitive level. 199 Second, the UGV and target took turns executing uninterruptible actions with potentially dierent execution times. Third, the UGV and target had symmetrical motion capabilities. Fourth, the motion strategy of the target was simplistic. Fifth, I assumed the exact position for the target, and did not try to run into the target. 
It is therefore my future work to evaluate G-FRA* as part of a realistic simulation of moving target search with UGVs, where (a) a motion simulator provides a continuous simulation, (b) the UGV and target are able to move in parallel, (c) the UGV and target have dierent motion capabilities, and (d) the target uses a more sophisticated motion strategy. 6.3 Moving Target D* Lite Application In this section, I describe an application of MT-D* Lite, the JPathPlan path planner, which has been implemented by Abhijeet Anand from the Royal Melbourne Institute of Technology (RMIT) in the RACT-GOAL agent system for the International Multi-Agent Contest (Dastani, Dix, & Novak, 2005). A detailed description of the application can be found in Abhijeet Anand's Master's thesis (Anand, 2011). 200 6.3.1 Background The International Multi-Agent Contest started in 2005 and provides a platform for re- searchers from all over the world to demonstrate their research in multi-agent systems design, development and programming (Anand, 2011). The gold-mining game in the International Multi-Agent Contest involves two teams of six agents each, interacting with a remote simulation server. The game runs on a grid. The agents need to collect gold and deposit it into a depot. They need to deposit gold when they reach their maximum carrying capacity. They are not allowed to enter the depot if they are not carrying any gold. However, if they are carrying gold and enter the depot but do not deposit it, they are punished by being teleported to a random location on the grid. The agents initially have no information about certain areas of the grid. Moreover, moving target search is necessary since the gold piece (= target) can be moved from its current cell to one of the neighboring cells before the agent arrives. Finally, path planning must be fast to allow the agent to collect the gold quickly (Anand, 2011). The RACT (= RMIT Agent Contest Team) group of the Royal Melbourne Institute of Technology has designed the RACT-GOLD system for participating in the gold-mining game (Yadav, Zhou, Sardina, & R onnquist, 2010). However, the path planner in the RACT-GOLD system was too slow. Thus, they developed JPathPlan, a new path planner for the RACT-GOLD system based on MT-D* Lite. 6.3.2 State Space Representation The gold-mining game maps are represented as grids. The agents, gold pieces and the depot are placed in unblocked cells. The content of a cell is initially unknown to an 201 agent. In each time step, the game server sends each agent a request-action message. A request-action message contains information about the map, the agent's current x- and y- coordinate on the grid, the time step number and the number of gold pieces that the agent carries. The agent can respond to a request-action message by sending an action message. This action message can either encode a movement action (up, left, down or right), a pick up or drop gold action, or a skip action. The game server imposes a timeout period for every request-action message that is sends to the agents. If an agent fails to reply within the timeout period, the game server considers this as a skip action. If an agent sends an action message with a skip action while it is in the depot, it is teleported to a random unblocked cell on the grid (Anand, 2011). 6.3.3 Overview of the Experimental Results Both Forward Repeated A* and MT-D* Lite have been implemented in the JPathPlan planner (in JAVA) and compared in the context of the gold-mining game. 
The following performance measures have been reported in (Anand, 2011): total time spent on search; time per search, which measures how quickly an agent responds to the server after it receives a request-action message from the server; and the number of time steps skipped due to path planning taking longer than the timeout period for a time step. Simulation results show that MT-D* Lite outperforms Forward Repeated A* with respect to all performance measures (Anand, 2011). For example, in four-neighbor grids 202 of size 300 300 (shown in Table 4.2 in (Anand, 2011)), MT-D* Lite runs faster than Forward Repeated A* by a factor of 3.47 when 8% of randomly chosen cells are blocked, 5.86 when 9% of randomly chosen cells are blocked, and 7.96 when 10% of randomly chosen cells are blocked. The developer of the JPathPlan planner concluded that MT-D* Lite provides a substantial improvement over Forward Repeated A* in the context of the gold-mining game of the International Multi-Agent Contest (Anand, 2011). 6.3.4 Discussion The JPathPlan planner demonstrated that MT-D* Lite can achieve a smaller runtime per search than Forward Repeated A* in the context of the gold-mining game, which is similar to the results reported in Chapter 5. Nevertheless, there are limitations on the experimental setups to demonstrate the feasibility of MT-D* Lite when applied to the moving target search applications in multi-agent systems: First, although the RACT-GOLD system consists of six agents, the developer only assigned one of the agents the task of path planning. The other agents in the system move randomly and provide location updates to the single agent that is assigned the task of path planning. Second, only one incremental heuristic search algorithm is implemented in the RACT-GOLD system. Since the runtime per search of MT-D* Lite can be aected by the informedness of the user-providedh-values used during each search, it is un- clear how informed the user-provided h-values are in the terrain used, and whether 203 Backward GAA* (which has not been implemented in the JPathPlan planner) can have a smaller runtime per search than MT-D* Lite or not. 6.4 Conclusions In this chapter, I described applications of the developed incremental search algorithms. In Section 6.1, I described the Flexible Adaptive Probabilistic RoadMap planner (FAPRM planner), that uses Backward GAA* for moving target search. The FAPRM planner has been implemented in the ROMAN Tutoring System for training astronauts to operate the Space Station Remote Manipulator System. Results showed that the FAPRM planner runs faster than the SBL planner by up to a factor of four. In Section 6.2, I described how I applied G-FRA* to moving target search for unmanned ground vehicle naviga- tion on state lattices. Results showed that G-FRA* runs faster than both Forward and Backward Repeated A* by up to one order of magnitude. In Section 6.3, I described the JPathPlan path planner that uses MT-D* Lite for moving target search. The JPathPlan path planner has been implemented by researchers from the Royal Melbourne Institute of Technology in their RACT-GOLD system, a multi-agent system designed for the In- ternational Multi-Agent Contest. Results showed that JPathPlan with MT-D* Lite runs faster than JPathPlan with repeated A* by up to a factor of 7.96. Thus, the developed incremental search algorithms provide a substantial improvement over repeated A* in applications of moving target search. 
204 Chapter 7: Conclusions In moving target search, an agent's information about the terrain can change and the target can move over time. Thus, the agent often needs to repeatedly perform searches to nd new cost-minimal paths to the target. Incremental search algorithms can reuse information from previous searches to speed up the current search and hence nd cost- minimal paths for series of similar path planning problems faster than is possible by solving each path planning problem from scratch. Incremental search algorithms generally fall into two classes: Heuristic learning incremental search algorithms reuse information from the previ- ous searches to update the h-values of the current search so that they become more informed and focus the current search better. Search tree transforming incremental search algorithms reuse the search tree of the previous search for the current search so that the current search does not need to start from scratch. However, there did not exist incremental search algorithms that apply to moving target search, where both the start and goal states can change between searches, and run faster than A*. Thus, I have developed new incremental search algorithms that 205 signicantly speed up search-based path planning for moving target search. The key contributions of this dissertation are as follows: In Chapter 3, I developed new heuristic learning incremental search algorithms for moving target search. In particular, I developed Generalized Adaptive A* (GAA*) (Sun et al., 2008), an incremental A* variant that extends MT-Adaptive A* (Koenig et al., 2007) to moving target search in terrains where the action costs of the agent can both increase and decrease between searches. Thus, GAA* ap- plies to known static, unknown static and known dynamic terrains. I demonstrated experimentally that Forward GAA* runs faster than Backward GAA*, Forward and Backward Repeated A*, Basic MT-D* Lite and MT-D* Lite for moving target search in unknown static terrains, and Backward GAA* runs faster than Forward GAA*, Forward and Backward Repeated A*, Basic MT-D* Lite and MT-D* Lite for moving target search in known dynamic terrains when the user-providedh-values are ill-informed and the number of action cost changes is small. In Chapter 4, I developed new search tree transforming incremental search algo- rithms for moving target search: { First, I developed Generalized Fringe-Retrieving A* (G-FRA*), an incremental A* variant for moving target search in known static terrains. G-FRA* is currently the fastest incremental search algorithm for moving target search in known static terrains, and runs up to one order of magnitude faster than Forward and Backward Repeated A*. 206 { Second, I developed Fringe-Retrieving A* (FRA*) (Sun et al., 2009), an incre- mental A* variant that optimizes G-FRA* to apply to moving target search in known static grids only. FRA* is currently the fastest incremental search algorithm for moving target search in known static grids, and runs up to one order of magnitude faster than Forward and Backward Repeated A*. { Finally, I developed Moving Target D* Lite (MT-D* Lite) by combining the principles of G-FRA* and D* Lite. MT-D* Lite applies to moving target search in terrains where the action costs of the agent can increase and decrease between searches. Thus, MT-D* Lite applies to known static, unknown static and known dynamic terrains. 
MT-D* Lite is currently the fastest incremental search algorithm for moving target search in known dynamic terrains when the user-provided h-values are well-informed, and runs by up to a factor of eight faster than Forward and Backward Repeated A*. In Chapter 5, I systematically evaluated the runtime of the developed incremental search algorithms for moving target search with respect to (1) dierent terrains, in- cluding the known static, unknown static and known dynamic terrains; (2) dierent target movement strategies, including the random waypoints and TrailMax strate- gies; (3) dierent numbers of action cost changes in known dynamic terrains and (4) dierent kinds of grids, including random grids, random mazes and two game maps. I discussed the strengths and weaknesses of them and provided guidelines for when to choose a particular algorithm over others with Table 5.5. 207 In Chapter 6, I gave an overview of the following applications of the developed incremental search algorithms. { For GAA*, I described the Flexible Adaptive Probabilistic RoadMap (FAPRM) planner by researchers from the University of Sherbrooke and the Canadian Space Agency, that uses Backward GAA*. The FAPRM planner has been used in the ROMAN Tutoring System for training astronauts to operate the Space Station Remote Manipulator System. { For G-FRA*, I applied G-FRA* to unmanned ground vehicle (UGV) naviga- tion in state lattices. { For MT-D* Lite, I described the JPathPlan path planner by researchers from the Royal Melbourne Institute of Technology (RMIT). The JPathPlan path planner has been used in their RACT-GOLD system, a multi-agent system designed for the International Multi-Agent Contest. For future work, I now discuss possible approaches to speed up path planning for moving target search even further: Currently, there does not exist any incremental search algorithm that both learns h-values and reuses search trees from the previous searches for moving target search. Thus, one possible approach to speed up path planning for moving target search is to combine heuristic learning incremental search algorithms with search tree trans- forming incremental search algorithms. 208 Anytime search algorithms (Likhachev et al., 2007), such as Anytime Repairing A* (ARA*) (Likhachev, Gordon, & Thrun, 2003), solve a single search problem by quickly nding an initial, error-bounded suboptimal path and then improving this path until the available time runs out (Likhachev et al., 2007). Thus, one possible approach to speed up path planning for moving target search is to combine anytime search algorithms with incremental search algorithms, resulting in anytime incremental search algorithms. State abstraction techniques (Holte et al., 1996; Bulitko, Sturtevant, Lu, & Yau, 2007) map several states in the state space to a single abstract state. The abstract state space consists of all abstract states. Thus, the number of abstract states in the abstract state space can be much smaller than the number of states in the state space. The abstract states that the start and goal states are mapped to become the abstract start and goal states. One can rst nd a path between the abstract start and goal states in the abstract state space, which can be faster than nding a path between the start and goal states in the state space since the number of abstract states is smaller than the number of states, and then rene this path to a path in the state space, which can be used by the agent for navigation. 
Currently, the developed search tree transforming incremental search algorithms, such as FRA* and MT-D* Lite, require the agent to move along the path found by the previous search so that the agent (= start state of each search) cannot move outside the search tree of the previous search. However, in realistic applications, such as UGV navigation, there are situations where the agent can accidentally move 209 o the path found by the previous search and move outside the search tree of the previous search. In these situations, the developed search tree transforming incre- mental search algorithms cannot reuse any part of the search tree of the previous search. Thus, one potential research direction is to investigate situations where the agent can move outside the search tree of the previous search, and develop new search tree transforming incremental search algorithms that can reuse the search trees in such situations. 210 Bibliography Anand, A. (2011). Path planning in agents with incomplete information in dynamic environments. Master's thesis, Royal Melbourne Institute of Technology. Anderson, R. (1988). A robot ping-pong player: experiment in real-time intelligent control. MIT Press. Barbehenn, M., & Hutchinson, S. (1995). Ecient search and hierarchical motion plan- ning using dynamic single-source shortest paths trees. In Proceedings of the IEEE Transactions on Robotics and Automation, pp. 198{214. Belghith, K. (2010). Simulateur Tutoriel Intelligent pour les operations robotisees : Ap- plication au bras canadien sur la station spatiale internationale. Ph.D. thesis, Uni- versity of Sherbrooke. Belghith, K., Kabanza, F., & Hartman, L. (2010). Using a randomized path planner to generate 3d task demonstrations of robot operations. In Proceedings of the Inter- national Conference on Autonomous and Intelligent Systems, pp. 1{6. Bond, D., Widger, N., Ruml, W., & Sun, X. (2010). Real-time search in dynamic worlds. In Proceedings of the Symposium on Combinatorial Search. Bulitko, V., & Lee, G. (2006). Learning in real-time search: A unifying framework. Journal of Articial Intelligence Research, 25, 119{157. Bulitko, V., Bjornsson, Y., Luvstrek, M., Schaeer, J., & Sigmundarson, S. (2007). Dy- namic control in path-planning with real-time heuristic search. In Proceedings of the International Conference on Automated Planning and Scheduling, pp. 49{56. Bulitko, V., & Lee, G. (2006). Learning in real-time search: A unifying framework. Journal of Articial Intelligence Research, 25, 119{157. Bulitko, V., Sturtevant, N., Lu, J., & Yau, T. (2007). Graph abstraction in real-time heuristic search. Journal of Articial Intelligence Research, 30, 51{100. 211 Chimura, F., & Tokoro, M. (1994). The trailblazer search: a new method for searching and capturing moving targets. In Proceedings of the National Conference on Articial Intelligence, pp. 1347{1352. Coue, C., & Bessiere, P. (2001). Chasing an elusive target with a mobile robot. In Proceedings of the IEEE International Conference on Robotics and Automation, pp. 1370{1375. Croft, E., Fenton, R., & Benhabib, B. (1998). Optimal rendezvous-point selection for robotic interception of moving objects. IEEE Transactions on Systems, Man, and Cybernetics, 28(2), 192{204. Currie, N., & Peacock, B. (2002). International space station robotic systems operations - a human factors perspective. In Proceedings of Human Factors and Ergonomics Society Annual Meeting, Aerospace Systems, pp. 26{30. Dastani, M., Dix, J., & Novak, P. (2005). 
International multi-agent programming contest. http://www.multiagentcontest.org. Deo, N., & Pang, C.-Y. (1984). Shortest-path algorithms: taxonomy and annotation. Networks, 14, 275{323. Dijkstra, E. W. (1959). A note on two problems in connexion with graphs. Numerische Mathematik, 1, 269{271. Edelkamp, S. (1998). Updating shortest paths. In Proceedings of the European Conference on Articial Intelligence, pp. 655{659. Ferguson, D. (2006). Single agent and multi-agent path planning in unknown and dy- namic environments. Ph.D. thesis, School of Computer Science, Carnegie Mellon University. Ferguson, D., Likhachev, M., & Stentz, A. (2005). A guide to heuristic-based path plan- ning. In Proceedings of the International Workshop on Planning under Uncertainty for Autonomous Systems, International Conference on Automated Planning and Scheduling. Ferguson, D., & Stentz, A. (2005a). The Delayed D* algorithm for ecient path re- planning. In Proceedings of the IEEE International Conference on Robotics and Automation, pp. 2045{2050. Ferguson, D., & Stentz, A. (2005b). The Field D* algorithm for improved path planning and replanning in uniform and non-uniform cost environments. Tech. rep. CMU- RI-TR-05-19, Robotics Institute, Carnegie Mellon University. 212 Ferguson, D., & Stentz, A. (2005c). Field D*: An interpolation-based path planner and replanner. In Proceedings of the International Symposium on Robotics Research, pp. 239{253. Freda, L., & Oriolo, G. (2007). Vision-based interception of a moving target with a nonholonomic mobile robot. Robotics and Autonomous Systems, 55(6), 419{432. Frigioni, D., Marchetti-Spaccamela, A., & Nanni, U. (1998). Semi-dynamic algorithms for maintaining single-source shortest path trees. Algorithmica, 22, 250{274. Furcy, D., & Koenig, S. (2000). Speeding up the convergence of real-time search. In Proceedings of the National Conference on Articial Intelligence, pp. 891{897. Garrido, S., Moreno, L., Abderrahim, M., & Monar, F. M. (2006). Path planning for mobile robot navigation using Voronoi diagram and fast marching. In International Conference on Intelligent Robots and Systems, pp. 2376{2381. Goldenberg, M., Kovarsky, E., Wu, X., & Schaeer, J. (2003). Multiple agents moving target search. In Proceedings of the International Joint Conference on Articial Intelligence, pp. 1511{1512. Goto, Y., & Stentz, A. (1987). Mobile robot navigation: The cmu system. IEEE Expert, 2(4), 44 { 55. Hahn, G., & MacGillivray, G. (2006). A note on k-cop, l-robber games on graphs. Discrete Mathematics, 306, 2492{2497. Hart, P., Nilsson, N., & Raphael, B. (1968). A formal basis for the heuristic determination of minimum cost paths. IEEE Transactions on Systems Science and Cybernetics, 2, 100{107. Hern andez, C., Baier, J. A., Uras, T., & Koenig, S. (2012). Position paper: Incremen- tal search algorithms considered poorly understood. In Proceedings of the Annual Symposium on Combinatorial Search, pp. 159{160. Hern andez, C., & Meseguer, P. (2005). LRTA*(k). In Proceedings of the International Joint Conference in Articial Intelligence, pp. 1238{1243. Hern andez, C., & Meseguer, P. (2007a). Improving HLRTA*(k). In Proceedings of the Conference of the Spanish Association for Articial Intelligence, pp. 110{119. Hern andez, C., & Meseguer, P. (2007b). Improving LRTA*(k). In Proceedings of the International Joint Conference in Articial Intelligence, pp. 2312{2317. 213 Hern andez, C., Meseguer, P., Sun, X., & Koenig, S. (2009). Path-Adaptive A* for in- cremental heuristic search in unknown terrain. 
In Proceedings of the International Conference on Automated Planning and Scheduling, pp. 358{361. Holte, R. C., Mkadmi, T., & Macdonald, A. J. (1996). Speeding up problem solving by abstraction: A graph oriented approach. Articial Intelligence, 85, 321{361. Howard, T., & Kelly, A. (2007). Optimal rough terrain trajectory generation for wheeled mobile robots. International Journal of Robotics Research, 26(1), 141{166. Hrabar, S., & Sukhatme, G. S. (2009). Vision-based navigation through urban canyons. Journal of Field Robotics, 26(5), 431{452. Isaza, A., Lu, J., Bulitko, V., & Greiner, R. (2008). A cover-based approach to multi-agent moving target pursuit. In Proceedings of the Conference on Articial Intelligence and Interactive Digital Entertainment, pp. 54{59. Ishida, T. (1992). Moving target search with intelligence. In National Conference on Articial Intelligence, pp. 525{532. Ishida, T. (1997). Real-time search for learning autonomous agents. Kluwer Academic Publishers. Ishida, T., & Korf, R. E. (1991). Moving target search. In Proceedings of the International Joint Conference in Articial Intelligence, pp. 204{211. Kelly, A., Howard, T., & Green, C. (2007). Terrain aware inversion of predictive mod- els for high performance ugvs. In Proceedings of the SPIE Defense and Security Symposium, pp. 6561{6566. Koenig, S., & Likhachev, M. (2002). D* Lite. In Proceedings of the National Conference on Articial Intelligence, pp. 476{483. Koenig, S., & Likhachev, M. (2005). Adaptive A*. In Proceedings of the International Joint Conference on Autonomous Agents and Multiagent Systems, pp. 1311{1312. Koenig, S., Likhachev, M., & Furcy, D. (2004a). Lifelong Planning A*. Articial Intelli- gence Journal, 155(1{2), 93{146. Koenig, S., Likhachev, M., Liu, Y., & Furcy, D. (2004b). Incremental heuristic search in articial intelligence. Articial Intelligence Magazine, 25(2), 99{112. 214 Koenig, S., Likhachev, M., & Sun, X. (2007). Speeding up moving-target search. In Pro- ceedings of the International Joint Conference on Autonomous Agents and Multi- Agent Systems, pp. 1136{1143. Koenig, S., & Likhachev, M. (2006). Real-time Adaptive A*. In Proceedings of the International Joint Conference on Autonomous Agents and Multiagent Systems, pp. 281{288. Koenig, S., Likhachev, M., Liu, Y., & Furcy, D. (2004). Incremental heuristic search in AI. Articial Intelligence Magazine, 25(2), 99{112. Koenig, S., & Smirnov, Y. (1996). Sensor-based planning with the freespace assumption. In Proceedings of the IEEE International Conference on Robotics and Automation, pp. 3540{3545. Kolling, A., Kleiner, A., Sycara, K., & Lewis, M. (2011). Computing and executing strate- gies for moving target search. In Proceedings of the IEEE International Conference on Robotics and Automation, pp. 4246{4253. Korf, R. E. (1990). Real-time heuristic search. Articial Intelligence, 42(2-3), 189{211. Kushleyev, A., & Likhachev, M. (2009). Time-bounded lattice for ecient planning in dynamic environments. In Proceedings of the IEEE International Conference on Robotics and Automation, pp. 4303{4309. Lavalle, S. (2006). Planning Algorithms. Cambridge University Press. Likhachev, M., Gordon, G., & Thrun, S. (2003). ARA*: Anytime A* search with provable bounds on sub-optimality. In Proceedings of Conference on Neural Information Processing Systems. Likhachev, M. (2005). Search-based planning for large dynamic environments. Ph.D. thesis, School of Computer Science, Carnegie Mellon University. Likhachev, M. (2009). SBPL Library. 
www.seas.upenn.edu/maximl/software.html. Likhachev, M., & Ferguson, D. (2009). Planning long dynamically feasible maneuvers for autonomous vehicles. International Journal of Robotics Research, 28(8), 933{945. Likhachev, M., Ferguson, D., Gordon, G., Stentz, A., & Thrun, S. (2007). Anytime search in dynamic graphs. Articial Intelligence, 172(14), 1613{1643. 215 Likhachev, M., Ferguson, D., Gordon, G., Stentz, A., & Thrun, S. (2005a). Anytime Dynamic A*: An anytime, replanning algorithm. In Proceedings of the International Conference on Automated Planning and Scheduling, pp. 262{271. Likhachev, M., Ferguson, D., Gordon, G., Stentz, A., & Thrun, S. (2005b). Anytime Dy- namic A*: The proofs. Tech. rep. CMU-RI-TR-05-12, Robotics Institute, Carnegie Mellon University. Loh, P., & Prakash, E. (2009). Performance simulations of moving target search algo- rithms. International Journal of Computer Games Technology, 2009, 31{36. Mehrandezh, M., Sela, N., Fenton, R., & Benhabib, B. (2000). Robotic interception of moving objects using an augmented ideal proportional navigation guidance tech- nique. IEEE Transactions on Systems, Man and Cybernetics, 30(3), 238{250. Moldenhauer, C., & Sturtevant, N. (2009a). Evaluating strategies for running from the cops. In Proceedings of the International Joint Conference on Artical Intelligence, pp. 584{589. Moldenhauer, C., & Sturtevant, N. (2009b). Optimal solutions for moving target search. In Proceedings of the International Conference on Autonomous Agents and Multia- gent Systems, pp. 1249{1250. NASA (2001). SSRMS. http://science.nasa.gov/science-news/science-at-nasa/ 2001/ast18apr 1/. Nourbakhsh, I., & Genesereth, M. (1996). Assumptive planning and execution: a simple, working robot architecture. Autonomous Robots, 3(1), 49{67. Pearl, J. (1985). Heuristics: Intelligent Search Strategies for Computer Problem Solving. Addison-Wesley. Pivtoraiko, M., & Kelly, A. (2005a). Generating near minimal spanning control sets for constrained motion planning in discrete state spaces. In Proceedings of the International Conference on Intelligent Robots and Systems, pp. 3231{3237. Pivtoraiko, M., & Kelly, A. (2005b). Ecient constrained path planning via search in state lattices. In International Symposium on Articial Intelligence, Robotics and Automation in Space, pp. 1{7. Pivtoraiko, M., Knepper, R. A., & Kelly, A. (2009). Dierentially constrained mobile robot motion planning in state lattices. Journal of Field Robotics, 26(1), 308{333. Rabin, S. (2002). AI Game Programming Wisdom. Charles River Media. 216 Ramalingam, G., & Reps, T. (1996). An incremental algorithm for a generalization of the shortest-path problem. Journal of Algorithms, 21, 267{305. Riley, M., & Atkeson, C. (2002). Robot catching: Towards engaging human-humanoid interaction. Autonomous Robots, 12(1), 119{128. Ru i, M., & Siegwart, R. (2010). On the design of deformable input- / state-lattice graphs. In Proceedings of the IEEE International Conference on Robotics and Automation, pp. 3071{3077. Sanchez, G., & Latombe, J. (2001). A single-query bi-directional probabilistic roadmap planner with lazy collision checking. In Proceedings of the International Symposium on Robotics Research, pp. 403{417. Shue, L.-Y., & Zamani, R. (1993). An admissible heuristic search algorithm. In Proceed- ings of the International Symposium on Methodologies for Intelligent Systems, pp. 69{75. Stentz, A. (1994). Optimal and ecient path planning for partially-known environments. 
In Proceedings of the IEEE International Conference on Robotics and Automation, pp. 3310{3317. Stentz, A. (1995). The focussed D* algorithm for real-time replanning. In Proceedings of the International Joint Conference on Articial Intelligence, pp. 1652{1659. Stentz, A. (1997). Best information planning for unknown, uncertain, and changing do- mains. In Proceedings of the National Conference on Articial Intelligence Work- shop on On-Line Search, pp. 110{113. Sturtevant, N., & Bulitko, V. (2011). Learning where you are going and from whence you came: h-and g-cost learning in real-time heuristic search. In Proceedings of the International Joint Conference on Articial Intelligence, pp. 365{370. Sun, X. (2012). Source code. www-scf.usc.edu/xiaoxuns/research.html. Sun, X., & Koenig, S. (2007). The Fringe-Saving A* search algorithm - a feasibility study. In Proceedings of the International Joint Conference on Articial Intelligence, pp. 2391{2397. Sun, X., Koenig, S., & Yeoh, W. (2008). Generalized Adaptive A*. In Proceedings of the International Joint Conference on Autonomous Agents and Multiagent Systems, pp. 469{476. 217 Sun, X., Yeoh, W., & Koenig, S. (2009). Ecient incremental search for moving target search. In Proceedings of the International Joint Conference on Articial Intelli- gence, pp. 615{620. Sun, X., Yeoh, W., & Koenig, S. (2010a). Generalized Fringe-Retriving A*: Faster moving target search on state lattices. In Proceedings of the International Joint Conference on Autonomous Agents and Multi-Agent Systems, pp. 1081{1087. Sun, X., Yeoh, W., & Koenig, S. (2010b). Moving Target D* Lite. In Proceedings of the International Joint Conference on Autonomous Agents and Multi-Agent Systems, pp. 67{74. Thrun, S. (1998). Learning metric-topological maps for indoor mobile robot navigation. Articial Intelligence, 99(1), 21{71. Thrun, S., & Buecken, A. (1996). Integrating grid-based and topological maps for mobile robot navigation. In Proceedings of the National Conference on Articial Intelli- gence, pp. 944{950. Trovato, K. (1990). Dierential A*: An adaptive search method illustrated with robot path planning for moving obstacles and goals and an uncertain environment. Inter- national Journal on Pattern Recognition and Articial Intelligence, 4(2), 245{268. Undeger, C., & Polat, F. (2007). Moving target search in grid worlds. In Proceedings of the International Joint Conference on Autonomous Agents and Multiagent Systems, pp. 1{3. Urmson, C. (2008). Autonomous driving in urban environments: Boss and the urban challenge. Journal of Field Robotics, 25(8), 425{466. Urmson et al., C. (2008). Autonomous driving in urban environments: Boss and the Urban Challenge. Journal of Field Robotics (Special Issue on the 2007 DARPA Urban Challenge, Part I), 25(1), 425{466. Vandapel, N., Donamukkala, R. R., & Hebert, M. (2006). Unmanned ground vehicle navigation using aerial ladar data. The International Journal of Robotics Research, 25(1), 31{51. Yadav, N., Zhou, C., Sardina, S., & R onnquist, R. (2010). A BDI agent system for the cow herding domain. Annals of Mathematics and Articial Intelligence, 59(3-4), 313{333. 218 Yap, P., Burch, N., Holte, R. C., & Schaeer, J. (2011). Any-angle path planning for computer games. In Proceedings of the Conference on Articial Intelligence and Interactive Digital Entertainment, pp. 201{207. Zhang, Z., Sturtevant, N., Holte, R., Schaeer, J., & Felner, A. (2009). A* search with inconsistent heuristics. 
In Proceedings of the International Joint Conference on Articial Intelligence, pp. 634{639. 219
Abstract (if available)
Abstract
In this dissertation, I demonstrate how to speed up path planning for moving target search, which is a problem where an agent needs to move to a target and the target can move over time. It is assumed that the current locations of the agent and the target are known to the agent at all times. The information about the terrain that an agent has is the action costs of the agent in any particular location of the terrain. The information about the terrain that the agent has can change over time depending on different applications: For example, when a robot is deployed in a new terrain without any map a priori, the robot initially does not have any information about the terrain and it has to acquire information about the terrain during navigation. However, a character in a computer game may have complete information about the terrain that remains unchanged over time given that the whole game map is loaded in memory and is available to the character. I use the following movement strategy for the agent that is based on assumptive planning: The agent first finds a cost-minimal path to the target with the information about the terrain that is currently available to it. The agent then starts moving along the path. Whenever new information about the terrain is acquired or the target moves off the path, the agent performs a new search to find a new cost-minimal path from the agent to the target. The agent uses this movement strategy until either the target is caught or the agent finds that there does not exist any path from the agent to the target after a search (and in any future searches), upon which the agent stops navigation. Since the agent's information about the terrain can change and the target can move over time, the agent needs to repeatedly perform searches to find new cost-minimal paths to the target. Path planning for moving target search by using this movement strategy is thus often a repeated search process. Additionally, agents need to find new cost-minimal paths as fast as possible, such that they move smoothly and without delay. ❧ Many path planning algorithms have been developed, among which incremental search algorithms reuse information from previous searches to speed up the current search and are thus often able to find cost-minimal paths for series of similar search problems faster than by solving each search problem from scratch. Incremental search algorithms have been demonstrated to be very successful in path planning for many important applications in robotics. However, it is assumed that the target does not move over time during navigation for most incremental search algorithms, and they are either inapplicable or run more slowly than A* to solve moving target search. Thus, I demonstrate how to speed up search-based path planning for moving target search by developing new incremental search algorithms. ❧ In my dissertation, I make the following contributions: (1) I develop Generalized Adaptive A* (GAA*), that learns $h$-values (= heuristic values) to make them more informed for moving target search. GAA* applies to moving target search in terrains where the action costs of the agent can change between searches. (2) I develop Generalized Fringe-Retrieving A* (G-FRA*), that transforms the search tree of the previous search to the search tree of the current search for moving target search. 
Though G-FRA* applies only to moving target search in terrains where the action costs of the agent do not change between searches, it creates a new way of transforming the search tree of the previous search to the search tree of the current search. (3) I develop Moving Target D* Lite (MT-D* Lite), that transforms the search tree of the previous search to the search tree of the current search for moving target search. MT-D* Lite combines the principles of G-FRA* and D* Lite, an existing incremental search algorithm. MT-D* Lite applies to moving target search in terrains where the action costs of the agent can change between searches. (4) I compare the new incremental search algorithms, discuss their strengths and weaknesses and provide guidelines for when to choose a particular algorithm over another. Simulation results show that the developed incremental search algorithms run up to one order of magnitude faster than A* for moving target search.
Linked assets
University of Southern California Dissertations and Theses
Conceptually similar
PDF
Any-angle path planning
PDF
Speeding up multi-objective search algorithms
PDF
Efficient and effective techniques for large-scale multi-agent path finding
PDF
Efficient bounded-suboptimal multi-agent path finding and motion planning via improvements to focal search
PDF
Target assignment and path planning for navigation tasks with teams of agents
PDF
Speeding up distributed constraint optimization search algorithms
PDF
Speeding up path planning on state lattices and grid graphs by exploiting freespace structure
PDF
Auction and negotiation algorithms for cooperative task allocation
PDF
A framework for research in human-agent negotiation
PDF
Risk-aware path planning for autonomous underwater vehicles
PDF
Improving decision-making in search algorithms for combinatorial optimization with machine learning
PDF
Artificial intelligence for low resource communities: Influence maximization in an uncertain world
PDF
Speeding up trajectory planning for autonomous robots operating in complex environments
PDF
Informative path planning for environmental monitoring
PDF
A statistical ontology-based approach to ranking for multi-word search
PDF
Modeling emotional effects on decision-making by agents in game-based simulations
PDF
Robot trajectory generation and placement under motion constraints
PDF
Decentralized real-time trajectory planning for multi-robot navigation in cluttered environments
PDF
Planning for mobile manipulation
PDF
A search-based approach for technical debt prioritization
Asset Metadata
Creator
Sun, Xiaoxun
(author)
Core Title
Incremental search-based path planning for moving target search
School
Viterbi School of Engineering
Degree
Doctor of Philosophy
Degree Program
Computer Science
Publication Date
04/16/2013
Defense Date
01/18/2012
Publisher
University of Southern California
(original),
University of Southern California. Libraries
(digital)
Tag
artificial intelligence,heuristic search,moving target search,OAI-PMH Harvest,search
Format
application/pdf
(imt)
Language
English
Contributor
Electronically uploaded by the author
(provenance)
Advisor
Koenig, Sven (
committee chair
), Likhachev, Maxim (
committee member
), Raghavendra, Raghu (
committee member
), Zyda, Michael (
committee member
)
Creator Email
sunxiaoxun@gmail.com,xiaoxuns@usc.edu
Permanent Link (DOI)
https://doi.org/10.25549/usctheses-c3-236875
Unique identifier
UC11294048
Identifier
etd-SunXiaoxun-1552.pdf (filename),usctheses-c3-236875 (legacy record id)
Legacy Identifier
etd-SunXiaoxun-1552.pdf
Dmrecord
236875
Document Type
Dissertation
Format
application/pdf (imt)
Rights
Sun, Xiaoxun
Type
texts
Source
University of Southern California
(contributing entity),
University of Southern California Dissertations and Theses
(collection)
Access Conditions
The author retains rights to his/her dissertation, thesis or other graduate work according to U.S. copyright law. Electronic access is being provided by the USC Libraries in agreement with the a...
Repository Name
University of Southern California Digital Library
Repository Location
USC Digital Library, University of Southern California, University Park Campus MC 2810, 3434 South Grand Avenue, 2nd Floor, Los Angeles, California 90089-2810, USA
Tags
artificial intelligence
heuristic search
moving target search
search