Towards Addressing Spatio-Temporal Aspects in Security Games

by Fei Fang

A Dissertation Presented to the
FACULTY OF THE GRADUATE SCHOOL
UNIVERSITY OF SOUTHERN CALIFORNIA
In Partial Fulfillment of the Requirements for the Degree
DOCTOR OF PHILOSOPHY (Computer Science)

August 2016

Copyright 2016 Fei Fang

Acknowledgements

Having now gone through the process, I would like to express my gratitude to those who helped me reach this point. I have enjoyed the past five years, with the privilege of having shared experiences, conversations and relationships with a number of extraordinary people along the way.

I would like to start by thanking my advisor, Milind Tambe. Five years ago, I joined Teamcore without much knowledge about research. Thank you for seeing the potential in me and guiding me through every step of doing research. Thank you for the many opportunities you provided me, including working on great projects such as PROTECT-Ferry and PAWS, collaborating with researchers within and outside our field, and supporting my attendance at conferences all around the world. Your passion and devotion to research have inspired me and are something I will strive to emulate in my career. Your commitment to working on real problems and making a real-world impact has shaped my philosophy of research. I learned so much from you, not only about research but also about how to be a good mentor with care and patience. I will be forever grateful for your heartfelt advice about my future career path. Your commitment to your students extends long after graduation, and the Teamcore family network continues to show its strength over time.

I would like to thank my committee members, Shaddin Dughmi, Leana Golubchik, Jelena Mirkovic and Suvrajeet Sen, not only for serving on my qualifying committee but also for providing input and suggestions on my research and future career.
Your guidance is so valuable to me, and I am thankful for all your time and effort in helping and supporting me.

I would like to thank United States Coast Guard personnel, in particular Craig Baldwin, for consistent support of the PROTECT project, especially the PROTECT-Ferry application, and for making the application a success. Second, I would like to thank Rob Pickles, Wai Y. Lam and Gopalasamy R. Clements from two non-governmental organizations, Panthera and Rimba, for their input and help in making the PAWS application a success, including its regular deployment in the field. I would also like to thank Arnaud Lyet and Barney Long from the World Wildlife Fund for their helpful feedback, and Andrew Lemieux for his support of the field test of PAWS. And I would like to thank our partners in the field who made the test and deployment of the PAWS application possible. I am also grateful for the support of the Army Research Office for my research over the years.

During my time at USC, I had the privilege of collaborating with many excellent researchers: Peter Stone, Janusz Marecki, Vincent Conitzer, Heidi J. Albers, Rajiv Maheswaran, Nicole Sintov, Christopher Kiekintveld, Bo An, Albert Jiang, Matthew P. Johnson, Francesco M. Delle Fave, William Haskell, Matthew Brown and Arunesh Sinha. I thank you for your guidance and insights on the projects we worked on, as well as the patience you showed in your mentoring.

Being in a large research group, I was fortunate enough to share the Ph.D. experience with a group of bright and talented fellow students who all became my friends: Manish Jain, James Pita, Jun-young Kwak, Rong Yang, Zhengyu Yin, Jason Tsai, Eric Shieh, Thanh Nguyen, Leandro Marcolino, Chao Zhang, Yundi Qian, Debarun Kar, Benjamin Ford, Haifeng Xu, Amulya Yadav, Aaron Schlenker, Sara Mc Carthy, Yasaman D. Abbasi, Shahrzad Gholami, Bryan Wilder and Elizabeth Orrico. Thank you for making my life full of laughter and excitement, and for always being ready to help.
I will always remember sharing offices, traveling to conferences, grinding through conference deadlines, and attending group seminars and retreats. Special thanks to Manish Jain for all the advice you have given me over the years, and to Rong Yang for all your kind help, especially with my research and life during my first year.

I also want to thank all the people I had the privilege to mentor: Dana Thomas, Brian Schwedock, Amandeep Singh, Matthew Burke, David Liao, Kevin Hoffman and Jewels Kovach. It was a great experience working with you, and I learned a lot from you and from the mentoring experience.

I would like to thank Yu Cheng for being a constant source of support and encouragement. This thesis would never have been possible without you. You have shared this entire journey with me, and you have been my rock along the way.

Lastly, I would like to thank my parents for the love and support you have provided me over the years. Thank you for always believing in me, supporting my decisions and sharing my good times and bad times.

Contents

Acknowledgements
List Of Figures
List Of Tables
Abstract

1 Introduction
  1.1 Spatio-temporal Continuity
  1.2 Novel Challenges in Green Security Domains
  1.3 Thesis Overview

2 Background
  2.1 Security Games
  2.2 Human Behavior Models

3 Related Work
  3.1 Stackelberg Security Games
  3.2 Continuous Strategy Space in Games
  3.3 Planning and Learning in Repeated Games

4 Reasoning in Continuous Time
  4.1 Problem Statement
    4.1.1 Domain Description
    4.1.2 Defender Strategy
    4.1.3 Attacker Strategy
    4.1.4 Utility Function
    4.1.5 Equilibrium
    4.1.6 Assumptions
  4.2 Models
    4.2.1 Representing Defender's Strategies
    4.2.2 DASS: Discretized Attacker Strategies
    4.2.3 CASS: Continuous Attacker Strategies
    4.2.4 Optimal Defender Strategy in the Original Game
    4.2.5 Generalized Model with Multiple Defender Resources
  4.3 Equilibrium Refinement
    4.3.1 Route Adjust
    4.3.2 Flow Adjust
  4.4 Extension to Two-Dimensional Space
    4.4.1 Defender Strategy for 2-D
    4.4.2 DASS for 2-D
    4.4.3 CASS for 2-D
  4.5 Route Sampling
  4.6 Evaluation
    4.6.1 Experiments for One-Dimensional Setting
      4.6.1.1 Experimental Settings
      4.6.1.2 Performance of Solvers
      4.6.1.3 Improvement Using Refinement Methods
      4.6.1.4 Sampled Routes
      4.6.1.5 Number of Patrollers
      4.6.1.6 Approximation Approach for Multiple Defender Resources
    4.6.2 Experiments for Two-Dimensional Setting
  4.7 Chapter Summary

5 Reasoning in Continuous Space
  5.1 Problem Setting
    5.1.1 Detection Probability Models
  5.2 Patrol Allocations
    5.2.1 Algorithmic Extensions
  5.3 Experiments
  5.4 Chapter Summary

6 Reasoning with Frequent and Repeated Attacks
  6.1 Motivation and Defining GSGs
  6.2 Planning in GSGs
  6.3 Learning and Planning in GSGs
  6.4 General Case
  6.5 Experimental Results
    6.5.1 Planning Algorithms
    6.5.2 Learning and Planning Framework
  6.6 Chapter Summary

7 Reasoning about Spatial Constraints
  7.1 Background
  7.2 First Tests and Feedback
  7.3 PAWS Overview and Game Model
    7.3.1 Input and Initial Analysis
    7.3.2 Build Game Model
      7.3.2.1 Defender Action Modeling
      7.3.2.2 Poacher Action Modeling and Payoff Modeling
  7.4 Calculate Patrol Strategy
  7.5 Deployment and Evaluation
  7.6 Lessons Learned
  7.7 Chapter Summary

8 Conclusion and Future Directions
  8.1 Contributions
  8.2 Future Directions

Bibliography

List Of Figures

1.1 Different domains with complex spatio-temporal aspects.
4.1 (a) Protecting ferries with patrol boats; (b) part of the map of New York Harbor commuter ferry routes. The straight line linking St. George Terminal and Whitehall Terminal indicates a public ferry route run by the New York City Department of Transportation.
4.2 An example with three targets (triangles) and two patrollers (squares). The protective circles of the patrollers are shown with protection radius r_e. A patroller protects all targets in her protective circle. Patroller P_1 is protecting F_2 and P_2 is protecting F_3.
4.3 Compact representation: the x-axis shows time intervals; the y-axis shows the discretized distance points in the one-dimensional movement space.
4.4 An example showing how a target moving from d_3 to d_2 during [t_k, t_{k+1}] is protected. In each sub-interval, a patroller either always protects the target or never protects it; equivalently, the target is either always within the protective circle of a patroller or always outside the circle.
4.5 Sub-interval analysis in (t_k, t_{k+1}) for the example shown in Figure 4.4.
4.6 An example showing that one equilibrium outperforms another when the attacker is constrained to attack in [t_0, t_2].
4.7 Step (i): decomposition. Each time, a route containing the minimal flow variable is subtracted, and a residual graph is left for further decomposition. The original flow distribution is thus decomposed into three routes R_2, R_1 and R_3, with probabilities 0.4, 0.2 and 0.4 respectively.
4.8 An example of flow adjust. A rational attacker who is constrained to attack in [t_1, t_2] will choose to attack around time (t_1 + t_2)/2, getting utility U_0 given f_0, and around t_1 or t_2, getting utility 0.5 U_0 given f_1.
4.9 Part of the route map of Washington State Ferries.
4.10 An illustration of the calculation of intersection points in the two-dimensional setting. The x and y axes indicate the position in 2-D and the z axis is time. To simplify the illustration, the z axis starts from time t_k. In this example, there are two intersection points, occurring at time points t_a and t_b.
4.11 Schedules of the ferries.
4.12 Performance under different randomized utility function settings. The utility function in this set of experiments is a function of the distance to Terminal A. The utility function is piecewise linear, and the value at each discretized distance point d_i is chosen randomly from [0, 10].
4.13 Performance under different realistic utility function settings. The utility function is U-shaped or inverse U-shaped. The utility around distance 0.5 is denoted as U_mid. We compare the defender strategies given by DASS and CASS with the baseline as U_mid changes from 1 to 20.
4.14 The attacker's expected utility function given the defender strategy calculated by DASS vs. CASS under the example setting. The expected utilities at the discretized time points are indicated by squares for CASS and dots for DASS. The maximum of AttEU under CASS is 3.82, 30% less than the maximum of AttEU under DASS, which is 4.99.
4.15 Performance of equilibrium refinement approaches.
4.16 Comparison of refinement approaches.
4.17 Results for sampling under the example setting: (a) the frequency with which each edge is chosen when the first sampling method, based on the Markov strategy, is used; (b) the decomposed routes with the highest probability, superimposed on ferry schedules, when the second sampling method, based on decomposition, is used.
4.18 Performance with a varying number of patrollers.
4.19 An example setting in two-dimensional space.
4.20 Experimental results under two-dimensional settings.
5.1 "A truck loaded with illegally cut rosewood passes through Russey Chrum Village...in the Central Cardamom Protected Forest." Photo from (Boyle, 2011).
5.2 The forest, with the pristine area shaded.
5.3 The shaded regions correspond to the reduction in marginal benefits within the patrol zone. Not shown are the (less dramatic) effects on b(·) beyond the patrol zone, due to the cumulative capture probability.
5.4 Patrol strategy effectiveness for sample b(·) and c(·) functions.
6.1 Snare poaching.
6.2 Experimental results show improvements over algorithms from previous work.
6.3 Robustness against uncertainty in model parameters.
7.1 A picture of a snare placed by poachers.
7.2 One patrol route during the test in Uganda.
7.3 First 4-day patrol in Malaysia. Figure 7.3(a) shows one suggested route (orange straight lines) and the actual patrol track (black line). Figure 7.3(b) shows the patrollers walking along the stream during the patrol.
7.4 Illustrative examples.
7.5 PAWS overview.
7.6 KAPs (black) for 2-by-2 grid cells.
7.7 New integrated algorithm.
7.8 Various signs recorded during PAWS patrols.
7.9 One daily PAWS patrol route in Aug. 2015.

List Of Tables

4.1 Summary of notations involved in the chapter.
4.2 Two full representations that can be mapped into the same compact representation shown in Figure 4.3.
4.3 Step (ii): adjust each route greedily.
4.4 Step (iii): compose a new compact representation.
4.5 Details about discretization levels. In the experiments mentioned in this section, the distance space is evenly discretized, parameterized by ∆d = d_{i+1} − d_i.
4.6 Comparison of different refinement approaches in terms of average performance and runtime. Only the runtime for the refinement process is counted.
4.7 The maximum of the attacker's expected utility in each time interval decreases after flow adjust is used.
4.8 Performance of the approximation approach.
6.1 Summary of key notations.
6.2 Payoff structure of Example 3.
7.1 Problem scale for PAWS patrols.
7.2 Basic information of PAWS patrols.
7.3 Summary of observations.

Abstract

Game theory has been successfully used to handle complex resource allocation and patrolling problems in security and sustainability domains. More specifically, real-world applications have been deployed in different domains based on the framework of security games, where the defender (e.g., a security agency) has a limited number of resources to protect a set of targets from an adversary (e.g., a terrorist).
Whereas the first generation of security games research provided algorithms for optimizing security resources in mostly static settings, my thesis advances the state of the art to a new generation of security games, handling massive games with complex spatio-temporal settings and leading to real-world applications that have fundamentally altered current practices of security resource allocation. Indeed, in many real-world domains, players act in a geographical space over time, and the goal of my thesis is to expand the frontiers of security games and to deal with challenges in domains with spatio-temporal dynamics. My thesis provides the first algorithms and models for advancing key aspects of spatio-temporal challenges in security games, including (i) continuous time; (ii) continuous space; (iii) frequent and repeated attacks; and (iv) complex spatial constraints.

First, focusing on games where actions are taken over continuous time (for example, games with moving targets such as ferries and refugee supply lines), I propose a new game model that accurately models the continuous strategy space for the attacker. Based on this model, I provide an efficient algorithm to calculate the defender's optimal strategy, using a compact representation for both the defender's and the attacker's strategy spaces. Second, for games where actions are taken over continuous space (for example, games with forest land as a target), I provide an algorithm computing the optimal distribution of patrol effort. Third, my work addresses one key dimension of complexity: frequent and repeated attacks. Motivated by the repeated interaction of players in domains such as preventing poaching and illegal fishing, I introduce a novel game model that deals with frequent defender-adversary interactions and provide algorithms to plan effective sequential defender strategies.
Furthermore, I handle the complex spatial constraints that arise from the problem of designing optimal patrol strategies given detailed topographical information. My thesis work has led to two applications which have been deployed in the real world and have fundamentally altered previously used tactics: one used by the US Coast Guard for protecting the Staten Island Ferry in New York City, and another deployed in a protected area in Southeast Asia to combat poaching.

Chapter 1

Introduction

Security and sustainability challenges exist all around the world. These challenges include protecting critical infrastructure and transportation networks, preventing intrusions into cyber systems, protecting environmental resources, saving endangered wildlife from poaching, and stopping illegal fishing. A unifying theme in these challenges is the strategic reasoning between law enforcement agencies and adversaries such as terrorists and poachers. The law enforcement agencies have only limited resources, and it is not possible to protect everything at all times; at the same time, the adversaries can conduct surveillance to observe the agencies' actions. Therefore, any deterministic allocation of resources can be exploited by the adversaries, and it is important for the agencies to introduce randomness into the allocation.

Game theory has become a well-established paradigm for modeling complex resource allocation and scheduling problems in security and sustainability domains (Tambe, 2011; Gatti, 2008; Agmon, Kraus, & Kaminka, 2008; Basilico, Gatti, & Amigoni, 2009). One game-theoretic model that has received significant attention is the Stackelberg security game (SSG) model with two players, the defender and the attacker. In an SSG, the defender needs to allocate and schedule her limited resources to protect a set of targets from the attacker.
The defender commits to a mixed strategy, which is a randomized schedule specified by a probability distribution over deterministic schedules; the attacker then observes the distribution and plays the best response (Korzhyk, Conitzer, & Parr, 2010a). Models and algorithms have been proposed to compute the optimal strategy for the defender efficiently to address real-world challenges, forming the first generation of security games (Jain, 2013; Yin, 2013; Pita, 2012; Yang, 2014; Shieh, 2015; Brown, 2015). Decision-support systems based on SSGs and the proposed algorithms have been successfully deployed in several domains to assist security agencies (Pita, Jain, Marecki, Ordóñez, Portway, Tambe, Western, Paruchuri, & Kraus, 2008a; Tsai, Rathi, Kiekintveld, Ordonez, & Tambe, 2009a; Shieh, An, Yang, Tambe, Baldwin, DiRenzo, Maule, & Meyer, 2012a; Yin, Jiang, Johnson, Kiekintveld, Leyton-Brown, Sandholm, Tambe, & Sullivan, 2012b).

In most previous work on security games, only limited spatio-temporal aspects have been considered. First, the targets are assumed to be stationary (e.g., airport terminals (Pita, Jain, Marecki, Ordóñez, Portway, Tambe, Western, Paruchuri, & Kraus, 2008b)), or stationary relative to the defender and the attacker (e.g., trains (Yin & Tambe, 2012) and planes (Tsai, Rathi, Kiekintveld, Ordonez, & Tambe, 2009b), where the players can only move along with the targets to protect or attack them). Second, the targets are assumed to be discrete, such as airport terminals or different regions in a port. Third, it is assumed that the attackers can conduct long-term surveillance to understand the defender's strategy and then plan a one-shot attack. Fourth, spatial constraints that restrict the movement of the defender's resources (e.g., patrollers) are often ignored or significantly simplified.
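To make the kind of computation involved concrete, consider the zero-sum special case of an SSG, where the optimal defender coverage drives the attacker's expected utility on every attractive target down to a common value. The following water-filling style sketch is my own illustration, not an algorithm from this thesis; the payoff values and function names are invented.

```python
def optimal_coverage(u_unc, u_cov, resources):
    """Defender coverage for a zero-sum security game.

    u_unc[t]: attacker's utility when target t is attacked uncovered.
    u_cov[t]: attacker's utility when t is attacked while covered (< u_unc[t]).
    resources: total coverage probability the defender can distribute.
    Returns (coverage, v): coverage caps every target's attacker utility at v.
    """
    def needed(v):
        # Coverage required on each target to cap the attacker's utility at v.
        return [0.0 if uu <= v else min(1.0, (uu - v) / (uu - uc))
                for uu, uc in zip(u_unc, u_cov)]

    lo, hi = min(u_cov), max(u_unc)
    for _ in range(100):  # bisect on the attacker's equilibrium value v
        mid = (lo + hi) / 2.0
        if sum(needed(mid)) > resources:
            lo = mid  # not enough resources to push utilities this low
        else:
            hi = mid
    return needed(hi), hi

# One patrol boat (resources = 1.0) spread over three targets: the two
# attractive targets are equalized at v ~ 1.4 with coverage ~ [0.6, 0.4, 0.0];
# the third target is already unattractive and gets no coverage.
cov, v = optimal_coverage([5.0, 3.0, 1.0], [-1.0, -1.0, -1.0], 1.0)
```

The general-sum case requires a linear program per hypothesized attack target, but the intuition, spreading coverage until no single target stands out, is the same.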
However, as the security game model has evolved, there has been a push towards increasingly complex security domains where these simple assumptions may not be sufficient. My thesis focuses on such domains and addresses the research challenges raised by introducing more complex spatio-temporal aspects.

Figure 1.1: Different domains with complex spatio-temporal aspects: (a) ferry protection; (b) forest protection; (c) wildlife protection; (d) fishery protection.

The first part of my thesis considers the spatio-temporal continuity of targets and of the players' action spaces in security games. The second part focuses on spatio-temporal dynamics in "green" (environmental) security domains such as preventing poaching and illegal fishing, where frequent and repeated attacks and complex spatial constraints are involved. The technical contributions of my thesis include models and algorithms for addressing these spatio-temporal aspects.

1.1 Spatio-temporal Continuity

The first part of my thesis focuses on spatio-temporal continuity in security games. In many real-world scenarios, players act in a geographical space over time even when considering a one-shot attack, and these scenarios lead to a need to reason about continuous time and space.

One important scenario is moving target protection, where the defender has mobile resources to protect targets that move according to a known daily schedule and have changing utility values. This scenario captures domains such as protecting ferry systems from potential attacks and protecting refugee supply lines. The changing positions of the targets make it necessary to reason about continuous space and time, which leads to significant challenges in representing the players' strategies and calculating the equilibrium. For example, the attacker's action set is infinite, as he can choose not only which target to attack, but also when to attack.
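Once the infinitely many attack times are reduced to a finite candidate set (the reduction developed in Chapter 4), the attacker's best response is a simple enumeration over (target, time) pairs. The sketch below is a toy illustration; the schedules, coverage function, and utility form are invented for this example, not taken from the thesis.

```python
def best_response(targets, candidate_times):
    """Attacker's best (target, time) pair over a finite candidate set.

    targets: dict mapping a target name to (value_fn, coverage_fn), where
    value_fn(t) is the attacker's payoff for a successful attack at time t
    and coverage_fn(t) is the defender's protection probability at time t.
    candidate_times: the finitely many time points worth considering.
    """
    best = None
    for name, (value, coverage) in targets.items():
        for t in candidate_times:
            # The attack succeeds with probability (1 - coverage) at time t.
            utility = (1.0 - coverage(t)) * value(t)
            if best is None or utility > best[2]:
                best = (name, t, utility)
    return best

# A high-value ferry F1 that is well covered mid-route (t in [2, 4]) and
# lightly covered near the terminals, and a lower-value ferry F2:
targets = {
    "F1": (lambda t: 8.0, lambda t: 0.7 if 2 <= t <= 4 else 0.2),
    "F2": (lambda t: 5.0, lambda t: 0.3),
}
name, t, u = best_response(targets, [0, 1, 2, 3, 4, 5])
# The attacker picks F1 at a lightly covered time, with utility near 6.4.
```

The point of the sub-interval analysis in Chapter 4 is precisely that a small candidate set like this suffices, even though the attacker may in principle strike at any real-valued time.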
Such an infinite action space arising from continuous time and space was not considered in previous work on security games.

Another important scenario where continuous space needs to be considered is protecting a large area from intruders. An intruder's reward is measured not only by whether or not he intrudes into the area without being captured by the patroller, but also by how far he intrudes. This scenario abstracts the problem of protecting forest land from excessive fuel-wood extraction, where the rangers aim to deter extractors who enter the forest from the boundary of the protected land. Calculating the optimal strategies for the players is a significant challenge, as the action sets of both players have infinite cardinality: the extractors need to decide how far to intrude, and the rangers need to plan their patrol effort over the whole area.

To handle the infinite action space introduced by spatio-temporal continuity in security games, one commonly used approach is discretization (Yang, Ford, Tambe, & Lemieux, 2014a). In some problems, the discretization needs to be sufficiently fine-grained to ensure solution quality. However, a fine-grained discretization can still lead to a large action space for the players, making the calculation of optimal strategies computationally prohibitive. Also, in some problems, no matter how fine-grained the discretization is, it may still lead to sub-optimal solutions. My main contributions in this part include exploiting the spatio-temporal structure to handle the infinite action space directly, and to handle the large action space after discretization efficiently. The key ideas include reducing the number of actions to be considered based on dominance or equivalence relationships, and analyzing the properties of the optimal strategy to reduce the search space. Both ideas aim to abstract the players' action spaces before calculating the equilibrium strategy.
In dealing with the problem of moving target protection, I introduce a novel game model that considers an SSG with a continuous set of actions for the attacker. In contrast, while the defender's action space is also continuous, it is discretized, for three reasons. First, if the defender's action space were continuous, the space of mixed strategies for the defender would have infinite dimensions, which makes exact computation infeasible. Second, in practice, the defender's mobile resources (i.e., patrol boats for protecting ferries) may not be able to exert infinitely fine-grained control over their movement, which makes the actual defender action space effectively discrete. Finally, the discretized defender action space is a subset of the original continuous defender action space, so the optimal defender strategy calculated in the discretized action space is a feasible solution in the original game and gives a lower-bound guarantee for the defender.

Given this model, I propose CASS (Solver for Continuous Attacker Strategies), an efficient linear program that exactly solves the proposed game model. In this solver, I represent the defender's mixed strategies with marginal probability variables; this compact representation significantly reduces the number of variables needed to describe a defender strategy under discretization. In addition, I handle the infinitely many attacker actions over continuous time directly by partitioning the attacker's action set. I show that, due to the spatio-temporal structure of the problem, only one time point could potentially be chosen by a best-responding attacker for each target in each partitioned subset, where an attacker is best-responding if he attacks a target at a time point that gives him the highest expected utility. Therefore, it is sufficient to consider a finite number of attacker actions when calculating the optimal defender strategy.
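The compact representation stores, for each discretized position and time step, the marginal probability of each move rather than a full distribution over exponentially many routes; an executable route can then be sampled step by step. The data layout below is my own illustration (the thesis formulates this as flow variables with conservation constraints), and the example flow values are invented.

```python
import random

def sample_route(marginal_flow, start, steps, rng=random):
    """Sample one patrol route from a compact, Markov-style strategy.

    marginal_flow[(d, k)] maps each next distance-point d2 to the probability
    of moving from point d at time step k to d2 at step k+1 (probabilities
    at each state sum to 1).
    """
    route = [start]
    d = start
    for k in range(steps):
        transitions = marginal_flow[(d, k)]
        r = rng.random()
        acc = 0.0
        d_next = None
        for d2, p in sorted(transitions.items()):
            acc += p
            d_next = d2          # remember the last option as a fallback
            if r < acc:
                break
        d = d_next
        route.append(d)
    return route

# A two-step strategy: move from point 0 to 1, then split evenly.
flow = {
    (0, 0): {1: 1.0},
    (1, 1): {0: 0.5, 2: 0.5},
}
route = sample_route(flow, start=0, steps=2)
# route is [0, 1, 0] or [0, 1, 2], each with probability 0.5.
```

Storing one small transition table per (position, time step) is what keeps the number of strategy variables polynomial, in contrast to enumerating full routes.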
Additional contributions include equilibrium refinement in the game and patrol route sampling. The game may have multiple equilibria, and the defender strategy found by CASS can be suboptimal with respect to uncertainties in the attacker's model, e.g., if the attacker can only attack during certain time intervals. I propose two heuristic equilibrium refinement approaches: the first, route-adjust, iteratively computes a defender strategy that dominates earlier strategies; the second, flow-adjust, is a linear-programming-based approach. An application built on this work has been deployed for protecting the Staten Island Ferry and has been used by the US Coast Guard since April 2013.

For area protection, the solution stems from a cost-benefit analysis, which leads to the key observation that, in equilibrium, the attacker chooses the distance with the highest net benefit, and the defender allocates the patrol resources so as to make the attacker's marginal net benefit zero beyond the equilibrium distance. That is, every additional step the attacker takes beyond the equilibrium distance yields zero net benefit. We prove that there exists an optimal strategy satisfying this property, and we aim to find such a strategy. With this analysis, the search space for the optimal patrol strategy is reduced significantly, since the strategy can be characterized by the distance from which the defender starts to allocate patrol resources. Our work provides an efficient algorithm to find the optimal patrol strategy, a simple and practical approximation with a theoretical bound, as well as closed-form solutions for special cases.

1.2 Novel Challenges in Green Security Domains

The second part of my thesis focuses on spatio-temporal dynamics in green security domains such as protecting wildlife and preventing illegal fishing.
As in infrastructure security domains, the law enforcement agencies have limited resources, and game theory can be brought into the field to fight illegal activities such as poaching and illegal fishing. However, these green security domains differ from infrastructure security in several main ways. First, frequent and repeated attacks are involved. For example, poachers place snares on the ground to trap animals, and more than one thousand snares can be found annually in a conservation area in Uganda. If we consider the problem as a game, it is no longer a one-shot game. In addition, the frequent attacks bring in more attack data that can be exploited by the defender. The second main difference is in the attacker's decision making. The attacks take place frequently, so it is impossible for the attacker to conduct long-term surveillance before each of the attacks. The attacker may even have a lagged understanding of the defender's strategy. Also, due to frequent attacks and the relatively low cost of failure, the attacker will take less effort in planning the attacks and may be boundedly rational in his decision making. Third, the feasibility of a patrol route is often constrained by various spatial constraints. For example, it is important to take into account the geographical information and design routes that can be followed by patrollers. With all these differences, new models are needed for green security domains, and the defender faces a much more complex optimization problem. My contributions in this part include models and algorithms to handle the challenges introduced by these differences.

I propose Green Security Games (GSGs), a novel game model that considers frequent and repeated attacks, together with a novel behavior model for the attacker. In green security domains, the attackers can repeatedly and frequently perform attacks.
On the other hand, this makes it possible for the defender to exploit the attackers' temporal attack pattern and benefit from changing her strategy from time to time. GSG generalizes the standard Stackelberg assumption in security games and instead assumes that the attackers' understanding of the defender strategy can be approximated as a convex combination of the defender strategies used in several recent rounds. Based on this model, I propose two sets of algorithms that plan ahead, providing defender strategies in each round, and I further provide a novel framework that incorporates learning into the planning framework.

In bringing the defender strategy to the real world, one important aspect to be considered is spatial constraints. For example, efforts have been made by law enforcement agencies in many countries to protect endangered animals; the most commonly used approach is conducting foot patrols. However, for human patrollers who move in conservation areas with complex terrain, a defender strategy in the form of coverage probabilities is not sufficient. More guidance should be provided, and a desirable defender strategy should include a set of complete and detailed patrol routes that are compatible with the terrain. In my thesis, I incorporate topographical information into the game model and handle the spatial constraints it brings.

The main technical challenge in addressing the spatial constraints is that considering detailed topographical information leads to a need for a fine-grained discretization, which makes the calculation of the optimal defender strategy computationally prohibitive. To address this challenge, I use a hierarchical modeling approach that combines grid-based discretization and graph representation. I first apply a coarse grid-based discretization to the whole area, and then represent each discretized region as a sub-graph and connect the sub-graphs to get a large graph: a virtual street map of the area.
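The hierarchical construction can be illustrated with a minimal sketch: each region is a small adjacency map over its own nodes, and the regions are stitched together by edges between boundary nodes. The data layout and function names here are hypothetical illustrations, not the thesis's actual implementation.

```python
def build_street_map(region_subgraphs, cross_edges):
    """Merge per-region sub-graphs into one large undirected graph.

    region_subgraphs: list of dicts mapping node -> set of neighbors
        within the same region (e.g., nodes along ridgelines or streams).
    cross_edges: (u, v) pairs linking boundary nodes of adjacent regions.
    """
    street_map = {}
    for subgraph in region_subgraphs:
        for node, neighbors in subgraph.items():
            street_map.setdefault(node, set()).update(neighbors)
    for u, v in cross_edges:
        street_map.setdefault(u, set()).add(v)
        street_map.setdefault(v, set()).add(u)
    return street_map

# Two toy regions joined at boundary nodes "a2" and "b1".
region_a = {"a1": {"a2"}, "a2": {"a1"}}
region_b = {"b1": {"b2"}, "b2": {"b1"}}
graph = build_street_map([region_a, region_b], [("a2", "b1")])
print(sorted(graph["a2"]))  # ['a1', 'b1']
```

Patrol routes can then be restricted to walks on this merged graph, which is what makes the subsequent optimization tractable.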
More specifically, I build the sub-graphs of the discretized regions based on terrain features such as ridgelines and streams, as these are important conduits for animals' movement and patrollers should focus on them during their patrols. With the street map, the number of patrol routes for the defender is significantly reduced, making it possible to calculate the optimal defender strategy efficiently.

1.3 Thesis Overview

The thesis is organized as follows: Chapter 2 discusses background material for Stackelberg security games. Chapter 3 reviews related work to provide the context for the contributions of the thesis. Chapter 4 considers security games in continuous time (for example, games with moving targets such as ferries and refugee supply lines). Chapter 5 investigates how to reason about continuous space in security games (for example, games with forest land as a target). Chapter 6 explores the problem of frequent and repeated attacks, motivated by green security domains such as preventing poaching and illegal fishing. Chapter 7 examines how to handle the complex spatial constraints that arise when designing optimal patrol strategies given complex topographical information. Chapter 8 summarizes the thesis and presents possible directions for future work.

Chapter 2 Background

2.1 Security Games

A security game (Conitzer & Sandholm, 2006; Kiekintveld, Jain, Tsai, Pita, Ordóñez, & Tambe, 2009a; Paruchuri, Pearce, Marecki, Tambe, Ordonez, & Kraus, 2008) is a two-player game between a defender and an attacker, where the defender must protect a set of N targets from the attacker. The defender tries to prevent attacks using K defender resources. A pure strategy for the defender is typically an assignment of the K resources to either patrols or targets (depending on the type of the game), while a pure strategy for the adversary is typically the target that is to be attacked.
Denote the k-th defender pure strategy as A_k, which is an assignment of all the security resources. A_k is represented as a column vector A_k = ⟨A_ki⟩^T, where A_ki indicates whether target i is covered by A_k. For example, in a game with four targets and two resources, A_k = ⟨1, 1, 0, 0⟩ represents the pure strategy of assigning one resource to target 1 and another to target 2.

Each target i ∈ [N] is assigned a set of payoffs {P^a_i, R^a_i, P^d_i, R^d_i}: if an attacker attacks target i and it is protected by a defender resource, the attacker gets utility P^a_i (P stands for penalty) and the defender gets utility R^d_i (R stands for reward). If target i is not protected, the attacker gets utility R^a_i and the defender gets utility P^d_i. In order to be a valid security game, it must hold that R^a_i > P^a_i and R^d_i > P^d_i, which means that assigning a resource to cover a target more often is always beneficial for the defender and disadvantageous for the adversary. Careful planning by the defender is necessary, as the amount of available security resources is limited, i.e., K < N, and not all targets can be covered.

Most work on security games has used the Stackelberg assumption, i.e., the defender commits to a strategy first. The adversary is then able to conduct surveillance and thus learn the defender's strategy before selecting his own strategy. The game is then denoted as a Stackelberg security game (SSG), and the standard solution concept is the Strong Stackelberg Equilibrium (SSE), in which the defender selects an optimal strategy based on the assumption that the adversary will choose an optimal response, breaking ties in favor of the defender. In an SSG, the optimal resource allocation strategy for the defender will usually be a mixed (randomized) strategy a, which is a distribution over the set of pure defender strategies A, as any deterministic defender strategy would easily be exploited by the adversary.
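The payoff structure above implies the standard expected utilities under a given coverage probability; a minimal sketch (variable names are ours, not from the text):

```python
def attacker_eu(coverage, reward_a, penalty_a):
    """Attacker's expected utility for a target covered with probability `coverage`."""
    return coverage * penalty_a + (1 - coverage) * reward_a

def defender_eu(coverage, reward_d, penalty_d):
    """Defender's expected utility for the same target."""
    return coverage * reward_d + (1 - coverage) * penalty_d

# With R^a = 5 and P^a = -1, more coverage strictly hurts the attacker,
# consistent with the validity condition R^a_i > P^a_i.
print(attacker_eu(0.0, 5, -1), attacker_eu(1.0, 5, -1))  # 5.0 -1.0
```

This linearity in the coverage probability is what lets many SSG solvers work with marginal coverage rather than full mixed strategies.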
The defender’s mixed strategy can then be represented as a vector a =ha k i, wherea k 2 [0; 1] is the probability of choosing pure strategyA k . There is also a more compact ”marginal” representation for defender strategies. Letx be the marginal strategy, wherex i = P A k 2A a k A ki is the probability that targeti is covered. Thus, depending on the particular type of security game, the defender is trying to find either the optimal mixed strategya or marginal strategyx. There have been many algorithms and models developed to solve SSGs, including DOBSS (Paruchuri et al., 2008) which solves SSGs using a mixed-integer linear program, ORIGAMI (Kiekintveld et al., 2009a) which provides a polynomial time algorithm for SSGs that contain no 11 scheduling constraints. However, these algorithms do not apply to security games with spatio- temporal aspects associated with moving targets, continuous target, and frequent adversary inter- action. 2.2 Human Behavior Models One of the strongest assumptions in classic game theory is that the players are perfectly rational utility maximizer. However, it is well-understood that human beings are boundedly rational, and thus, the attackers may not always choose to attack the target with the highest expected utility, i.e., best respond to the defender’s strategy. Incorporating human behavioral models (McKelvey & Palfrey, 1995) in decision making into security games has been demonstrated to improve the performance of defender patrol strategies in both simulations and human subject experiments (Pita, Jain, Ordonez, Tambe, & Kraus, 2010; Yang, Ordonez, & Tambe, 2012; Nguyen, Yang, Azaria, Kraus, & Tambe, 2013c). (Yang et al., 2012) was the first to address human adversaries in security games by incorpo- rating quantal response (QR) model (McKelvey & Palfrey, 1995) from the behavioral economics literature. 
Instead of choosing the action with the highest utility, the QR model predicts a probability distribution over adversary actions in which actions with higher utility have a greater chance of being chosen.

(Nguyen et al., 2013c) extended the QR model by proposing that humans use "subjective utility", a weighted linear combination of features (such as defender coverage, adversary reward, and adversary penalty), to make decisions. This subjective utility quantal response (SUQR) model was shown to outperform QR in human subject experiments. As a result, most subsequent research on boundedly rational human adversaries in security games has focused on the SUQR model. In this model, an attacker's choice is based on an evaluation of key properties of each target, including the coverage probability, the reward, and the penalty, weighted by the parameter vector ω = (ω_1, ω_2, ω_3). If the attackers respond to the defender strategy x, the probability that an attacker with parameter ω attacks target i is

q_i(ω, x) = exp(ω_1 x_i + ω_2 R^a_i + ω_3 P^a_i) / Σ_j exp(ω_1 x_j + ω_2 R^a_j + ω_3 P^a_j)    (2.1)

When there are multiple attackers, the Bayesian SUQR model (Yang et al., 2014a) has been proposed based on the SUQR model. It captures the heterogeneity of a group of attackers and assumes that different attackers have different parameters.

Chapter 3 Related Work

3.1 Stackelberg Security Games

Stackelberg games have been widely applied to security domains, although most of this work has considered static targets (e.g., Korzhyk et al., 2010a; Krause, Roper, & Golovin, 2011; Letchford & Vorobeychik, 2012; Kiekintveld, Islam, & Kreinovich, 2013). (Agmon et al., 2008) proposed algorithms for computing mixed strategies for setting up a perimeter patrol in adversarial settings with mobile robot patrollers. Similarly, (Basilico et al., 2009) computed randomized leader strategies for robotic patrolling in environments with arbitrary topologies.
Even when both players are mobile, e.g., in hider-seeker games (Halvorson, Conitzer, & Parr, 2009), infiltration games (Alpern, 1992), or search games (Gal, 1980), the targets (if any) were assumed to be static. (Tsai et al., 2009b) applied Stackelberg games to the domain of scheduling federal air marshals on board flights. The targets (i.e., flights) in this domain are mobile, but the players are restricted to move along the targets to protect or attack them. This stationary nature leads to discrete game models with finite numbers of pure strategies.

(Bošanský, Lisý, Jakob, & Pěchouček, 2011) and (Vaněk, Jakob, Hrstka, & Pěchouček, 2011) studied the problem of protecting moving targets. However, they both considered a model in which the defender, the attacker, and the targets have discretized movements on a directed graph. Such discretization of attacker strategy spaces can introduce suboptimality in the solutions, as we have shown with DASS. In our work, we generalize the strategy space of the attacker to the continuous realm and compute optimal strategies even in such a setting. Furthermore, while we provide an efficient and scalable linear formulation, (Bošanský et al., 2011) presented a formulation with non-linear constraints, which faced problems scaling up to larger games even with a single defender resource.

(Yin & Tambe, 2012) considered the domain of patrolling in public transit networks (such as the LA Metro subway train system) in order to catch fare evaders. Because the players ride trains that follow a fixed schedule, the domain is inherently discrete, and they modeled the patrolling problem as a finite zero-sum Bayesian game. (Yin & Tambe, 2012) proposed a compact representation for defender mixed strategies as flows in a network. We adapt this compact representation idea to a continuous domain.
In particular, in our domain, we need to model the interaction between the defender's flow and the attacker's continuous strategy space. Our proposed sub-interval analysis uses spatio-temporal reasoning to efficiently reduce the problem to a finite LP.

There is an extensive literature on equilibrium refinement; however, most existing work on the computation of equilibrium refinements focuses on finite games. For simultaneous-move finite games, solution concepts such as perfect equilibrium and proper equilibrium were proposed as refinements of Nash equilibrium (Fudenberg & Tirole, 1991). (Miltersen & Sørensen, 2007) proposed an efficient algorithm for computing proper equilibria in finite zero-sum games. For finite security games, (An, Tambe, Ordóñez, Shieh, & Kiekintveld, 2011) proposed a refinement of Stackelberg equilibrium and techniques for computing such refinements. The resulting defender strategy is robust against the possibility of constrained capabilities of the attacker. These existing approaches rely on the finiteness of action sets and are thus not applicable to our setting. (Simon & Stinchcombe, 1995) proposed definitions of perfect equilibrium and proper equilibrium for infinite games with continuous strategy sets. However, they did not propose any computational procedure for the resulting solution concepts. Exact computation of equilibrium refinements of continuous games remains a challenging open problem.

There is rising interest in applying game theory to green security domains, e.g., protecting fisheries from over-fishing (Brown, Haskell, & Tambe, 2014; Haskell, Kar, Fang, Tambe, Cheung, & Denicola, 2014a) and protecting wildlife from poaching (Yang et al., 2014a).
However, previous work in green security domains (Yang, Ford, Tambe, & Lemieux, 2014b; Haskell, Kar, Fang, Tambe, Cheung, & Denicola, 2014b) models the problem as a game with multiple rounds, where each round is an SSG (Yin, Korzhyk, Kiekintveld, Conitzer, & Tambe, 2010) in which the defender commits to a mixed strategy and the attackers respond to it. These efforts share the standard Stackelberg assumption that the defender's mixed strategy is fully observed by the attacker via extensive surveillance before each attack. This assumption can be unrealistic in green security domains due to the frequent and repeated attacks. Because of this assumption, previous efforts do not engage in any planning and instead rely only on designing strategies for the current round.

The bounded rationality of attackers has been studied extensively in the context of Stackelberg security games. Instead of always choosing to attack the target with the highest expected utility, the attackers may choose sub-optimal targets. It has been shown that considering human behavioral models (McKelvey & Palfrey, 1995) when designing defender strategies in security games can significantly improve the performance of the defender in human subject experiments (Pita et al., 2010; Yang et al., 2012; Nguyen, Yang, Azaria, Kraus, & Tambe, 2013a). In addition, the bounded rationality of attackers has also been studied in green security problems, and new human behavior models have been proposed to incorporate heterogeneity among a population of attackers (Yang et al., 2014a). However, previous work does not consider the attacker's lagged understanding of the defender strategy and bounded memory. By embedding these factors, we complement previous work that focuses on modeling human bounded rationality and bounded memory (Rubinstein, 1997; Cowan, 2005).
Previous work on learning in repeated SSGs (Marecki, Tesauro, & Segal, 2012; Letchford, Conitzer, & Munagala, 2009; Blum, Haghtalab, & Procaccia, 2014) has mainly focused on learning the payoffs of attackers, assuming perfectly rational attackers. In contrast, we not only generalize the Stackelberg assumption to fit green security domains but also provide algorithms to learn the parameters of the attackers' bounded rationality model. While the work of (Yang et al., 2014a) does exploit the available attack data, it uses Maximum Likelihood Estimation (MLE) to learn the parameters of the SUQR model for individual attackers, which may lead to skewed results.

3.2 Continuous Strategy Space in Games

Games with continuous strategy spaces have been well studied in game theory. Much of the economics literature has focused on games whose equilibria can be solved analytically (so the question of computation does not arise), for example, the classical theory of auctions (see, e.g., Krishna, 2009). Recent computational approaches for the analysis and design of auctions have focused on discretized versions of the auction games (e.g., Thompson & Leyton-Brown, 2009; Daskalakis & Weinberg, 2012). There has been research on efficiently solving two-player continuous games with specific types of utility functions, such as zero-sum games with convex-concave utility functions (Owen, 1995) and separable continuous games with polynomial utility functions (Stein, Ozdaglar, & Parrilo, 2008). However, these approaches to handling the continuous strategy space cannot be directly applied to security games with complex spatio-temporal settings.

Hybrid games and timed games in control problems have also been studied with an investigation into the continuous timeline (Henzinger, Horowitz, & Majumdar, 1999; De Alfaro, Henzinger, & Majumdar, 2001; Platzer, 2015). To address the infinite number of possible actions over the continuous timeline, region construction is used.
In timed games, the options are partitioned into equivalence classes of states based on bisimilarity, similarity, and trace equivalence, forming a finite number of regions in the action space. The region construction in timed games and the compact representation in my work share the spirit of exploiting equivalence classes of strategies. Also, the region construction and the partitioning of the attacker's strategy space in my work share the high-level intuition of reducing the number of actions to be considered from infinitely many to finitely many by exploiting the problem structure. However, the underlying techniques do not apply directly. In my work, I partition the continuous timeline into a finite set of zones and consider only one best time point in each zone by exploiting the dominance relationship within each zone.

3.3 Planning and Learning in Repeated Games

Planning and learning in repeated games against opponents with bounded memory has been studied (Sabourian, 1998; Powers & Shoham, 2005; Chakraborty, Agmon, & Stone, 2013; de Cote & Jennings, 2010; Banerjee & Peng, 2005). However, most of this work considers the case where each player chooses one action from his finite action set in each round of the game. In addition, there is no delay in observing the other player's action. Therefore, online-learning-based approaches can be used to find a good strategy against the opponent. Instead, we focus on a problem motivated by real-world green security challenges, where the players can choose a mixed strategy and implement it for multiple episodes in each round. Previous approaches fail to apply in such settings, which are a better fit for green security problems.

Chapter 4 Reasoning in Continuous Time

My thesis has addressed reasoning in continuous time when the defender has mobile resources to protect moving targets.
One major example of the practical domains motivating this work is the problem of protecting ferries that carry passengers in many waterside cities. Packed with hundreds of passengers, these may present attractive targets for an attacker. For example, the attacker may ram a suicide boat packed with explosives into the ferry, as happened with the attacks on the French supertanker Limburg and the USS Cole (Greenberg, Chalk, & Willis, 2006). In this case, the intention of the attacker can only be detected once he gets very close to the ferry. Small, fast, and well-armed patrol boats (patrollers) can provide protection to the ferries by detecting the attacker and stopping him with their armed weapons. However, there are often limited numbers of patrol boats, i.e., they cannot protect the ferries at all times at all locations.

Most previous work on game-theoretic models for security has assumed either stationary targets such as airport terminals (Pita et al., 2008b), or targets that are stationary relative to the defender and the attacker, e.g., trains (Yin & Tambe, 2012) and planes (Tsai et al., 2009b), where the players can only move along with the targets to protect or attack them. This stationary nature leads to discrete game models with finite numbers of pure strategies. In contrast, the attacker in this problem can attack these targets at any point in time during their movement, leading to a continuous set of strategies. The defender can deploy a set of mobile defender resources (called patrollers for short) to protect these targets. The patrollers' movement is constrained by a speed limit, and a patroller can provide protection to targets within a known protection radius. The values/utilities of the targets may vary depending on their locations and time.
In line with previous work (Tambe, 2011; Yin & Tambe, 2012; Kiekintveld et al., 2009a), the attacker in this problem is assumed to be perfectly rational and will choose to attack a target at a time that is most favorable to him (i.e., with the highest expected utility). The defender's objective is to schedule the patrollers to minimize the attacker's maximal expected utility (a zero-sum game).

The first contribution of this chapter is a novel game model for this problem of mobile resources protecting moving targets (MRMT), called MRMT_sg. MRMT_sg is an attacker-defender Stackelberg game model with a continuous set of strategies for the attacker. In contrast, while the defender's strategy space is also continuous, we discretize it in MRMT_sg for three reasons. First, if we let the defender's strategy space be continuous, the space of mixed strategies for the defender would have infinitely many dimensions, which makes exact computation infeasible. Second, in practice, the patrollers are not able to exercise such fine-grained control over their vehicles, which makes the actual defender strategy space effectively a discrete one. Finally, the discretized defender strategy space is a subset of the original continuous defender strategy space, so the optimal solution calculated under our formulation is a feasible solution in the original game and gives a lower-bound guarantee for the defender in terms of expected utility for the original continuous game. On the other hand, discretizing the attacker's strategy space can be highly problematic, as we will illustrate later in this chapter. In particular, if we deploy a randomized schedule for the defender under the assumption that the attacker can only attack at certain discretized time points, the actual attacker could attack at some other time point, leading to a possibly much worse outcome for the defender.

Our second contribution is CASS (Solver for Continuous Attacker Strategies), an efficient linear program to exactly solve MRMT_sg.
Despite discretization, the defender strategy space still contains an exponential number of pure strategies. We overcome this difficulty by compactly representing the defender's mixed strategies as marginal probability variables. On the attacker side, CASS exactly and efficiently models the attacker's continuous strategy space using sub-interval analysis, which is based on the observation that, given the defender's mixed strategy, the attacker's expected utility is a piecewise-linear function. Along the way to presenting CASS, we present DASS (Solver for Discretized Attacker Strategies), which finds minimax solutions for MRMT_sg games while constraining the attacker to attack at discretized time points. For clarity of exposition, we first derive DASS and CASS for the case where the targets move on a one-dimensional line segment. We later show that DASS and CASS can be extended to the case where targets move in a two-dimensional space.

Our third contribution is focused on equilibrium refinement. Our game has multiple equilibria, and the defender strategy found by CASS can be suboptimal with respect to uncertainties in the attacker's model, e.g., if the attacker can only attack during certain time intervals. We present two heuristic equilibrium refinement approaches for this game. The first, route-adjust, iteratively computes a defender strategy that dominates earlier strategies. The second, flow-adjust, is a linear-programming-based approach. Our experiments show that flow-adjust is computationally faster than route-adjust, but route-adjust is more effective in selecting robust equilibrium strategies.

Additionally, I provide several sampling methods for generating practical patrol routes given the defender strategy in compact representation. Finally, I present detailed experimental analyses of our algorithm in the ferry protection domain. CASS has been deployed by the US Coast Guard since April 2013.
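A toy numerical illustration of why discretizing the attacker's strategy space is risky, as discussed above: suppose (purely for illustration, with a protection coefficient of 1) the defender's coverage of a target dips between two discretized time points. The attacker's expected utility at the midpoint then far exceeds its value at either grid point. All numbers below are made up.

```python
def attack_utility(coverage, reward=10.0):
    # Zero-sum game, single patroller with protection coefficient 1:
    # the attack succeeds with probability (1 - coverage).
    return (1.0 - coverage) * reward

# Hypothetical coverage: high at the grid points t = 0 and t = 1,
# but low at t = 0.5, where no discretized constraint applies.
coverage = {0.0: 0.75, 0.5: 0.25, 1.0: 0.75}
print(attack_utility(coverage[0.0]), attack_utility(coverage[0.5]))  # 2.5 7.5
```

An attacker who can strike at any continuous time would exploit the midpoint gap, which is exactly the failure mode that CASS's continuous-time analysis prevents.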
The rest of the chapter is organized as follows: Section 4.1 provides our problem statement. Section 4.2 presents the MRMT_sg model and an initial formulation of DASS and CASS for a one-dimensional setting. Section 4.3 discusses equilibrium refinement, followed by Section 4.4, which gives the generalized formulation of DASS and CASS for two-dimensional settings. Section 4.5 describes how to sample a patrol route, and Section 4.6 provides experimental results in the ferry protection domain. Section 4.7 provides concluding remarks.

4.1 Problem Statement

One major example of the practical domains motivating this chapter is the problem of protecting ferries that carry passengers in many waterside cities. Packed with hundreds of passengers, these may present attractive targets for an attacker. For example, the attacker may ram a suicide boat packed with explosives into the ferry, as happened with the attacks on the French supertanker Limburg and the USS Cole (Greenberg et al., 2006). In this case, the intention of the attacker can only be detected once he gets very close to the ferry. Small, fast, and well-armed patrol boats (patrollers) can provide protection to the ferries (Figure 4.1(a)) by detecting the attacker and stopping him with their armed weapons. However, there are often limited numbers of patrol boats, i.e., they cannot protect the ferries at all times at all locations. We first focus on the case where ferries and patrol boats move along a one-dimensional line segment (this is a realistic setting and also simplifies exposition); we will discuss the two-dimensional case in Section 4.4. Table 6.1 lists all the notation used in this work.

Figure 4.1: (a) Protecting ferries with patrol boats; (b) part of the map of New York Harbor commuter ferry routes. The straight line linking St. George Terminal and Whitehall Terminal indicates a public ferry route run by the New York City Department of Transportation.
4.1.1 Domain Description

In this problem, there are L moving targets, F_1, F_2, ..., F_L. We assume that these targets move along a one-dimensional domain, specifically a straight line segment linking two terminal points, which we name A and B. This is sufficient to capture real-world domains such as ferries moving back and forth in a straight line between two terminals, as they do in many ports around the world; an example is the green line shown in Figure 4.1(b). We provide an illustration of our geometric formulation of the problem in Figure 4.2.

The targets have fixed daily schedules. The schedule of each target can be described as a continuous function S_q : T → D, where q = 1, ..., L is the index of the target, T = [0, 1] is a continuous time interval (e.g., representing the duration of a typical daily patrol shift), and D = [0, 1] is the continuous space of possible locations (normalized), with 0 corresponding to terminal A and 1 to terminal B. Thus S_q(t) denotes the position of target F_q at a specified time t. We assume S_q is piecewise linear.

The defender has W mobile patrollers that can move along D to protect the targets, denoted as P_1, P_2, ..., P_W. Although capable of moving faster than the targets, they have a maximum speed of v_m. While the defender attempts to protect the targets, the attacker will choose a certain time and a certain target to attack. (In the rest of the chapter, we refer to the defender as "she" and the attacker as "he".) The probability of attack success depends on the positions of the patrollers at that time. Specifically, each patroller can detect and try to intercept anything within the protection radius r_e, but cannot detect the attacker beyond that radius. Thus, a patroller protects all targets within her protective circle of radius r_e (centered at her current position), as shown in Figure 4.2.

Figure 4.2: An example with three targets (triangles) and two patrollers (squares).
The protective circles of the patrollers are shown with protection radius r_e. A patroller protects all targets in her protective circle: patroller P_1 is protecting F_2, and P_2 is protecting F_3.

Symmetrically, a target is protected by all patrollers whose protective circles cover it. If the attacker attacks a protected target, then the probability of a successful attack is a decreasing function of the number of patrollers protecting the target. Formally, we use a set of coefficients {C_G} to describe the strength of the protection.

Definition 1. Let G ∈ {1, ..., W} be the total number of patrollers protecting a target F_q, i.e., there are G patrollers such that F_q is within radius r_e of each of the G patrollers. Then C_G ∈ [0, 1] specifies the probability that the patrollers can successfully stop the attacker. We require that C_{G_1} ≤ C_{G_2} if G_1 ≤ G_2, i.e., more patrollers offer better protection.

As with previous work in security games (Tambe, 2011; Yin & Tambe, 2012; Kiekintveld et al., 2009a), we model the game as a Stackelberg game, where the defender commits to a randomized strategy first, and then the attacker can respond to that strategy. The patrol schedules in these domains were previously created by hand and hence suffer the drawbacks of hand-drawn patrols, including lack of randomness (in particular, informed randomness) and reliance on simple patrol patterns (Tambe, 2011), which we remedy in this chapter.

4.1.2 Defender Strategy

A pure strategy of the defender is a movement schedule for each patroller. Analogous to a target's schedule, a patroller's schedule can be written as a continuous function R_u : T → D, where u = 1, ..., W is the index of the patroller. R_u must be compatible with the patroller's velocity range. A mixed defender strategy is a randomization over the pure strategies, denoted as f.
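Since both target schedules S_q and patroller routes R_u are piecewise-linear functions from T to D, positions and protection can be checked with a small interpolation routine; a sketch under those assumptions (function and variable names are ours, not the thesis's):

```python
def position(schedule, t):
    """Evaluate a piecewise-linear schedule, given as sorted (time, location) breakpoints."""
    for (t0, x0), (t1, x1) in zip(schedule, schedule[1:]):
        if t0 <= t <= t1:
            return x0 + (x1 - x0) * (t - t0) / (t1 - t0)
    raise ValueError("time outside schedule")

def is_protected(patrol_route, target_schedule, t, r_e):
    """A patroller protects a target iff it lies within her protection radius r_e."""
    return abs(position(patrol_route, t) - position(target_schedule, t)) <= r_e

ferry = [(0.0, 0.0), (1.0, 1.0)]    # moves from terminal A to terminal B
patrol = [(0.0, 0.2), (1.0, 0.8)]   # a slower sweep through the middle
print(is_protected(patrol, ferry, 0.5, r_e=0.1))  # True
```

A velocity-feasibility check (|x1 - x0| / (t1 - t0) ≤ v_m on every segment) would be layered on top of this in the same way.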
4.1.3 Attacker Strategy

The attacker conducts surveillance of the defender's mixed strategy and the targets' schedules; he may then execute a pure-strategy response, attacking a certain target at a certain time. The attacker's pure strategy can be denoted $\langle q, t \rangle$, where $q$ is the index of the target to attack and $t$ is the time of attack.

4.1.4 Utility Function

We assume the game is zero-sum. If the attacker performs a successful attack on target $F_q$ at location $x$ at time $t$, he gets a positive reward $U_q(x, t)$ and the defender gets $-U_q(x, t)$; otherwise both players get utility zero. The positive reward $U_q(x, t)$ is a known function which accounts for many factors in practice. For example, an attacker may be more effective in his attack when the target is stationary (such as at a terminal point) than when the target is in motion. As the target's position is determined by its schedule, the utility function can be written as $U_q(t) \equiv U_q(S_q(t), t)$. We assume that for each target $F_q$, $U_q(t)$ can be represented as a piecewise linear function of $t$.

4.1.5 Equilibrium

Since our game is zero-sum, the Strong Stackelberg Equilibrium can be calculated by finding the minimax/maximin strategy (Fudenberg & Tirole, 1991; Korzhyk et al., 2010a). That is, we can find the optimal defender strategy by finding a strategy that minimizes the maximum of the attacker's expected utility.

Definition 2. For the single-patroller case, the attacker's expected utility of attacking target $F_q$ at time $t$ given defender mixed strategy $f$ is

$$\text{AttEU}_f(F_q, t) = (1 - C_1 \omega_f(F_q, t)) U_q(t) \quad (4.1)$$

$U_q(t)$ is the reward for a successful attack, $\omega_f(F_q, t)$ is the probability that the patroller is protecting target $F_q$ at time $t$, and $C_1$ is the protection coefficient of a single patroller. We drop the subscript if $f$ is obvious from the context. As $C_1$ and $U_q(t)$ are constants for a given attacker pure strategy $\langle q, t \rangle$, $\text{AttEU}(F_q, t)$ is determined entirely by $\omega(F_q, t)$. The definition for multiple patrollers is given in Section 4.2.5.
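As a concrete illustration, the single-patroller expected utility of Equation (4.1) can be sketched in a few lines of Python; the function name and the example values below are illustrative, not part of the model:

```python
def att_eu(omega, u_q, c1):
    """Attacker's expected utility (Eq. 4.1): the attack on target F_q at
    time t succeeds only if the patroller fails to stop it, which happens
    with probability 1 - C_1 * omega, where omega is the probability that
    the patroller is protecting F_q at time t."""
    return (1.0 - c1 * omega) * u_q

# If the target is protected with probability 0.5, C_1 = 0.8, and the
# reward for a successful attack is 2, the attacker expects (1 - 0.4) * 2.
print(att_eu(omega=0.5, u_q=2.0, c1=0.8))  # -> 1.2
```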
We further denote the attacker's maximum expected utility as

$$\text{AttEU}^m_f = \max_{q,t} \text{AttEU}_f(F_q, t) \quad (4.2)$$

So the optimal defender strategy is a strategy $f$ such that $\text{AttEU}^m_f$ is minimized; formally,

$$f \in \arg\min_{f'} \text{AttEU}^m_{f'} \quad (4.3)$$

4.1.6 Assumptions

In our problem, the following assumptions are made based on discussions with domain experts. Here we provide our justifications for these assumptions. While appropriate for the current domain of application, relaxing these assumptions for future applications remains an issue for future work, and we provide an initial discussion at the end of this chapter.

• The attacker's plan is decided offline, i.e., the attacker does not take into account the patroller's current partial route (partial pure strategy) in executing an attack: This assumption is similar to the assumption made in other applications of security games and justified elsewhere (An, Kempe, Kiekintveld, Shieh, Singh, Tambe, & Vorobeychik, 2012; Pita, Jain, Ordonez, Portway, Tambe, Western, Paruchuri, & Kraus, 2009; Tambe, 2011). One key consideration is that, given that attackers have limited resources as well, generating and executing complex conditional plans that change based on "online" observations of the defender's pure strategy is both difficult and risky for them.

• A single attacker is assumed instead of multiple attackers: This assumption arises because performing even a single attack is already costly for the attacker. Coordinating multiple simultaneous attacks would be even harder and is therefore significantly less likely.

• The game is assumed to be zero-sum: In this case, the objectives of the defender and attacker are in direct conflict: preventing an attack with higher potential damage is a bigger success for the defender in our game.
• The schedules for the targets are deterministic: For the domains we focus on, potential delays in the targets' schedules are usually within several minutes if any, and the targets will try to catch up with their fixed schedules as soon as possible. Therefore, even when delays occur, the deterministic schedule for a target can be viewed as a good approximation of the actual schedule.

4.2 Models

In this section, we introduce our MRMT$_{sg}$ model, which uses a discretized strategy space for the defender and a continuous strategy space for the attacker. For clarity of exposition, we first introduce the DASS approach to compute a minimax solution for a discretized attacker strategy space (Section 4.2.2), followed by CASS for the attacker's continuous strategy space (Section 4.2.3). We first assume a single patroller in Sections 4.2.1 through 4.2.3 and then generalize to multiple patrollers in Section 4.2.5.

4.2.1 Representing Defender's Strategies

In this subsection, we introduce the discretized defender strategy space and the compact representation used to represent the defender's mixed strategy. We show that the compact representation is equivalent to the intuitive full representation, and establish several properties of the compact representation.

Since the defender's strategy space is discretized, we assume that each patroller only makes changes at a finite set of time points $T = \{t_1, t_2, \dots, t_M\}$, evenly spaced across the original continuous time interval. $t_1 = 0$ is the starting time and $t_M = 1$ is the normalized ending time. We denote by $\delta_t$ the distance between two adjacent time points: $\delta_t = t_{k+1} - t_k = \frac{1}{M-1}$. We set $\delta_t$ to be small enough that for each target $F_q$, the schedule $S_q(t)$ and the utility function $U_q(t)$ are linear in each interval $[t_k, t_{k+1}]$ for $k = 1, \dots, M-1$, i.e., the target moves with uniform speed and the utility of a successful attack on it changes linearly during each of these intervals.
Thus, if $t_0$ is a breakpoint of $S_q(t)$ or $U_q(t)$ for any $q$, it can be represented as $t_0 = \delta_t K_0$ where $K_0$ is an integer.

In addition to discretization in time, we also discretize the line segment $AB$ that the targets move along into a set of points $D = \{d_1, d_2, \dots, d_N\}$, and restrict each patroller to be located at one of the discretized points $d_i$ at any discretized time point $t_k$. Note that $D$ is not necessarily evenly distributed, and the targets' locations are not restricted at any $t_k$. During each time interval $[t_k, t_{k+1}]$, each patroller moves with constant speed from her location $d_i$ at time $t_k$ to her location $d_j$ at time $t_{k+1}$. Only movements compatible with the speed limit $v_m$ are possible. The points $d_1, d_2, \dots, d_N$ are ordered by their distance to terminal $A$, with $d_1$ referring to $A$ and $d_N$ to $B$. Since the time interval is discretized into $M$ points, a patroller's route $R_u$ can be represented as a vector $R_u = (d_{r_u(1)}, d_{r_u(2)}, \dots, d_{r_u(M)})$, where $r_u(k)$ indicates the index of the discretized distance point at which the patroller is located at time $t_k$.

We discretized the defender's strategy space not only for computational reasons: it is not even clear whether an equilibrium exists in the original game with continuous strategy spaces for both players. The discretization is also motivated by practical constraints on the patrollers. In addition, as we show later in this chapter, we can provide a bi-criteria polynomial time approximation scheme for the optimal defender strategy in the original game using the discretized games.

For expository purposes, we first focus on the case with a single defender resource and then generalize to a larger number of resources later. For a single defender resource, the defender's mixed strategy in full representation assigns a probability to each of the patrol routes that can be executed.
Since at each time step a patroller can choose to go to at most $N$ different locations, there are at most $N^M$ possible patrol routes in total, and this number is achievable if there is no speed limit (or $v_m$ is large enough). The exponentially growing number of routes makes any analysis based on the full representation intractable. Therefore, we use a compact representation of the defender's mixed strategy.

Definition 3. The compact representation for a single defender resource is a compact way to represent the defender's mixed strategy using flow distribution variables $\{f(i,j,k)\}$. $f(i,j,k)$ is the probability of the patroller moving from $d_i$ at time $t_k$ to $d_j$ at time $t_{k+1}$.

The complexity of the compact representation is $O(MN^2)$, which is much more efficient than the full representation.

Proposition 1. Any strategy in full representation can be mapped into a compact representation.

Proof sketch: If there are $H$ possible patrol routes $R_1, R_2, \dots, R_H$, a mixed defender strategy can be represented in full representation as a probability vector $(p(R_1), \dots, p(R_H))$, where $p(R_u)$ is the probability of taking route $R_u$. Taking route $R_u$ means the patroller moves from $d_{R_u(k)}$ to $d_{R_u(k+1)}$ during time $[t_k, t_{k+1}]$, so the edge $E_{R_u(k), R_u(k+1), k}$ is taken when route $R_u$ is chosen. The total probability of taking edge $E_{i,j,k}$ is then the sum of the probabilities of all routes $R_u$ with $R_u(k) = i$ and $R_u(k+1) = j$. Therefore, given any strategy in full representation specified by the probability vector $(p(R_1), \dots, p(R_H))$, we can construct a compact representation consisting of a set of flow distribution variables $\{f(i,j,k)\}$ where

$$f(i,j,k) = \sum_{R_u: R_u(k) = i \text{ and } R_u(k+1) = j} p(R_u). \quad (4.4)$$

Figure 4.3 shows a simple example illustrating the compact representation. Numbers on the edges indicate the value of $f(i,j,k)$. We use $E_{i,j,k}$ to denote the directed edge linking nodes $(t_k, d_i)$ and $(t_{k+1}, d_j)$.
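The mapping of Equation (4.4) from a full-representation mixed strategy to the compact flow variables can be sketched as follows. This is a minimal illustration; the encoding of a route as a tuple of distance-point indices (0-based time index $k$) is an assumption of this sketch:

```python
from collections import defaultdict

def to_compact(routes, probs):
    """Map a full-representation mixed strategy to compact flow variables.
    routes: list of routes, each a tuple (r_u(1), ..., r_u(M)) of indices
            of discretized distance points.
    probs:  probability of taking each route.
    Returns a dict mapping (i, j, k) to f(i, j, k), the probability of
    moving from d_i at t_k to d_j at t_{k+1} (Eq. 4.4); k is 0-based."""
    f = defaultdict(float)
    for route, p in zip(routes, probs):
        for k in range(len(route) - 1):
            # Route R_u traverses edge E_{route[k], route[k+1], k},
            # so its probability contributes to that flow variable.
            f[(route[k], route[k + 1], k)] += p
    return dict(f)

# Two routes over M = 3 time points; f(i, j, k) sums the probabilities of
# all routes that traverse the same edge.
flows = to_compact([(1, 2, 1), (1, 2, 2)], [0.6, 0.4])
print(flows[(1, 2, 0)])  # both routes take the first edge -> 1.0
```

Note that the map is many-to-one: distinct full-representation strategies can produce the same flow dictionary, which is exactly the observation made below about Table 4.2.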
For example, $f(2,1,1)$, the probability of the patroller moving from $d_2$ to $d_1$ during time $t_1$ to $t_2$, is shown on the edge $E_{2,1,1}$ from node $(t_1, d_2)$ to node $(t_2, d_1)$. While a similar compact representation was used earlier by (Yin & Tambe, 2012), we use it in a continuous setting.

Figure 4.3: Compact representation: the x-axis shows time intervals; the y-axis shows the discretized distance points in the one-dimensional movement space.

Note that different mixed strategies in full representation can be mapped to the same compact representation. Table 4.2 shows two different mixed defender strategies in full representation that map to the same mixed strategy in compact representation, as shown in Figure 4.3. The probability of a route is labeled on all edges in the route in the full representation. Adding up the numbers on a particular edge $E_{i,j,k}$ across all routes of a full representation yields $f(i,j,k)$ for the compact representation.

Theorem 1. The compact representation does not lead to any loss in solution quality.

Proof sketch: The complete proof of the theorem relies on the calculations in Sections 4.2.2 and 4.2.3; here we provide a sketch. Recall that our goal is to find an optimal defender strategy $f$ that minimizes the maximum attacker expected utility $\text{AttEU}^m_f$. As we show in the next subsections, $\omega(F_q, t)$ can be calculated from the compact representation $\{f(i,j,k)\}$. If two defender strategies under the full representation map to the same compact representation $\{f(i,j,k)\}$, they have the same $\omega$ function and hence the same AttEU function according to Equation 4.1. Thus the value of $\text{AttEU}^m_f$ is the same for the two defender strategies. So an optimal mixed defender strategy in compact representation is still optimal for the corresponding defender strategies in full representation.

We exploit the following properties of the compact representation.

Property 1.
For any time interval $[t_k, t_{k+1}]$, the sum of all flow distribution variables equals 1: $\sum_{i=1}^{N} \sum_{j=1}^{N} f(i,j,k) = 1$.

Property 2. The sum of flows that go into a particular node equals the sum of flows that go out of the node. Denoting the sum for node $(t_k, d_i)$ by $p(i,k)$, we have $p(i,k) = \sum_{j=1}^{N} f(j,i,k-1) = \sum_{j=1}^{N} f(i,j,k)$. Each $p(i,k)$ is equal to the marginal probability that the patroller is at location $d_i$ at time $t_k$.

Property 3. Combining Properties 1 and 2, $\sum_{i=1}^{N} p(i,k) = 1$.

4.2.2 DASS: Discretized Attacker Strategies

In this subsection, we introduce DASS, a mathematical program that efficiently finds minimax solutions for MRMT$_{sg}$-based games under the assumption that the attacker will attack at one of the discretized time points $t_k$. In this problem, we need to minimize $z$, the maximum of the attacker's expected utility; here, $z$ is the maximum of $\text{AttEU}(F_q, t)$ over all targets $F_q$ and all discretized time points $t_k$.

From Equation (4.1), we know that $\text{AttEU}(F_q, t)$ is determined by $\omega(F_q, t)$, the probability that the patroller is protecting target $F_q$ at time $t$. Given the position of the target $S_q(t)$, we define the protection range $\beta_q(t) = [\max\{S_q(t) - r_e, d_1\}, \min\{S_q(t) + r_e, d_N\}]$. If the patroller is located within the range $\beta_q(t)$, the distance between the target and the patroller is no more than $r_e$, and thus the patroller is protecting $F_q$ at time $t$. So $\omega(F_q, t)$ is the probability that the patroller is located within range $\beta_q(t)$ at time $t$. At the discretized time points $t_k$, the patroller can only be located at a discretized distance point $d_i$, so we define the following.

Definition 4. $I(i,q,k)$ is a function taking two values: $I(i,q,k) = 1$ if $d_i \in \beta_q(t_k)$, and $I(i,q,k) = 0$ otherwise.

In other words, $I(i,q,k) = 1$ means that a patroller located at $d_i$ at time $t_k$ can protect target $F_q$. Note that the value of $I(i,q,k)$ can be calculated directly from the input parameters ($d_i$, $S_q(t)$ and $r_e$) and stored in a look-up table.
In particular, $I(i,q,k)$ is not a variable in the formulations that follow; it simply encodes the relationship between $d_i$ and the location of target $F_q$ at $t_k$. The probability that the patroller is at $d_i$ at time $t_k$ is $p(i,k)$. So we have

$$\omega(F_q, t_k) = \sum_{i: I(i,q,k)=1} p(i,k), \quad (4.5)$$

$$\text{AttEU}(F_q, t_k) = \left(1 - C_1 \sum_{i: I(i,q,k)=1} p(i,k)\right) U_q(t_k). \quad (4.6)$$

Equation (4.6) follows from Equations (4.1) and (4.5), expressing the attacker's expected utility at the discretized time points. Finally, we must address the speed restriction on the patroller. We set all flows corresponding to actions that are not achievable to zero,¹ that is, $f(i,j,k) = 0$ if $|d_j - d_i| > v_m \delta_t$. Thus, DASS can be formulated as a linear program. This linear program solves for any number of targets but only one defender resource.

$$\min_{f(i,j,k),\, p(i,k)} z \quad (4.7)$$
$$f(i,j,k) \in [0, 1], \quad \forall i,j,k \quad (4.8)$$
$$f(i,j,k) = 0, \quad \forall i,j,k \text{ such that } |d_j - d_i| > v_m \delta_t \quad (4.9)$$
$$p(i,k) = \sum_{j=1}^{N} f(j,i,k-1), \quad \forall i, \forall k > 1 \quad (4.10)$$
$$p(i,k) = \sum_{j=1}^{N} f(i,j,k), \quad \forall i, \forall k < M \quad (4.11)$$
$$\sum_{i=1}^{N} p(i,k) = 1, \quad \forall k \quad (4.12)$$
$$z \ge \text{AttEU}(F_q, t_k), \quad \forall q, \forall k \quad (4.13)$$

Constraint 4.8 describes the probability range. Constraint 4.9 describes the speed limit. Constraints 4.10–4.11 describe Property 2. Constraint 4.12 is exactly Property 3. Property 1 can be derived from Properties 2 and 3, so it is not listed as a constraint. Constraint 4.13 ensures the attacker chooses the strategy that gives him the maximal expected utility among all possible attacks at discretized time points, where AttEU(·) is described by Equation (4.6).

¹ Besides the speed limit, we can also model other practical restrictions of the domain by placing constraints on $f(i,j,k)$.

4.2.3 CASS: Continuous Attacker Strategies

In this subsection, we generalize the problem to one with a continuous attacker strategy set and provide a linear-programming-based solution, CASS. CASS efficiently finds the optimal mixed defender strategy under the assumption that the attacker can attack at any time in the continuous time interval $T = [0, 1]$.
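The quantities feeding Constraint 4.13 can be illustrated with a short sketch that precomputes the look-up table $I(i,q,k)$ and then evaluates Equations (4.5)–(4.6). The concrete schedule, marginals, and utility below are made-up example data, not part of the model:

```python
def build_I(d, S, r_e, t):
    """Look-up table: I[i][q][k] = 1 iff a patroller at d_i at time t_k is
    within distance r_e of target F_q, i.e., d_i lies in the protection
    range of F_q at t_k (Definition 4). S[q] is the schedule function."""
    return [[[1 if abs(d[i] - S[q](t[k])) <= r_e else 0
              for k in range(len(t))]
             for q in range(len(S))]
            for i in range(len(d))]

def att_eu_dass(q, k, p, I, U, c1):
    """Eq. 4.5-4.6: omega(F_q, t_k) sums the marginal probabilities p[i][k]
    over the locations d_i that protect F_q at t_k."""
    omega = sum(p[i][k] for i in range(len(I)) if I[i][q][k] == 1)
    return (1.0 - c1 * omega) * U(q, k)

# Toy instance: 3 locations, one stationary target at 0.5, 2 time points.
d = [0.0, 0.5, 1.0]
S = [lambda t: 0.5]
t = [0.0, 1.0]
I = build_I(d, S, r_e=0.1, t=t)
p = [[0.2, 0.2], [0.5, 0.5], [0.3, 0.3]]   # marginals p(i, k)
print(att_eu_dass(0, 0, p, I, lambda q, k: 2.0, c1=0.8))  # (1 - 0.8*0.5)*2
```

In a full DASS implementation these expressions would appear on the right-hand side of Constraint 4.13 inside an LP solver, with the $p(i,k)$ as decision variables rather than fixed numbers.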
With this assumption, DASS's solution quality guarantee may fail: if the attacker chooses to attack between $t_k$ and $t_{k+1}$, he may get a higher expected reward than attacking at $t_k$ or $t_{k+1}$. Consider the following example, with the defender's compact strategy between $t_k$ and $t_{k+1}$ shown in Figure 4.4. Here the defender's strategy has only three non-zero flow variables, $f(3,4,k) = 0.3$, $f(3,1,k) = 0.2$, and $f(1,3,k) = 0.5$, indicated by the set of three edges $E^+ = \{E_{3,4,k}, E_{3,1,k}, E_{1,3,k}\}$. A target $F_q$ moves from $d_3$ to $d_2$ at constant speed during $[t_k, t_{k+1}]$. Its schedule is depicted by the straight line segment $S_q$. The dark lines $L^1_q$ and $L^2_q$ are parallel to $S_q$ at distance $r_e$; the area between them indicates the protection range $\beta_q(t)$ for any time $t \in (t_k, t_{k+1})$.

Consider the time points at which an edge from $E^+$ intersects one of $L^1_q$, $L^2_q$, labeled $\theta^r_{qk}$, $r = 1, \dots, 4$ in Figure 4.4. Intuitively, these are all the time points at which a defender patrol could potentially enter or leave the protection range of the target. To simplify the notation, we denote $t_k$ as $\theta^0_{qk}$ and $t_{k+1}$ as $\theta^5_{qk}$. For example, a patroller moving from $d_3$ to $d_4$ (equivalently, taking the edge $E_{3,4,k}$) protects the target from $\theta^0_{qk}$ to $\theta^1_{qk}$, because $E_{3,4,k}$ is between $L^1_q$ and $L^2_q$ in $[\theta^0_{qk}, \theta^1_{qk}]$, during which the distance to the target is at most the protection radius $r_e$. Consider the sub-intervals between each $\theta^r_{qk}$ and $\theta^{r+1}_{qk}$, for $r = 0, \dots, 4$. Since within each of these five sub-intervals no patroller enters or leaves the protection range, the probability that the target is being protected is a constant in each sub-interval, as shown in Figure 4.5(a).

Figure 4.4: An example showing how a target moving from $d_3$ to $d_2$ during $[t_k, t_{k+1}]$ is protected. In a sub-interval $[\theta^r_{qk}, \theta^{r+1}_{qk}]$, a patroller either always protects the target or never protects the target.
Equivalently, the target is either always within the protective circle of a patroller or always outside it. Suppose $U_q(t)$ decreases linearly from 2 to 1 during $[t_k, t_{k+1}]$ and $C_1 = 0.8$. We can then calculate the attacker's expected utility function $\text{AttEU}(F_q, t)$ for $(t_k, t_{k+1})$, as plotted in Figure 4.5(b). $\text{AttEU}(F_q, t)$ is linear in each sub-interval, but the function is discontinuous at the intersection points $\theta^1_{qk}, \dots, \theta^4_{qk}$ because of the patroller leaving or entering the protection range of the target. We denote the limit of AttEU as $t$ approaches $\theta^r_{qk}$ from the left as

$$\lim_{t \to \theta^{r-}_{qk}} \text{AttEU}(F_q, t) = \text{AttEU}(F_q, \theta^{r-}_{qk})$$

Similarly, the right limit is denoted as

$$\lim_{t \to \theta^{r+}_{qk}} \text{AttEU}(F_q, t) = \text{AttEU}(F_q, \theta^{r+}_{qk})$$

If $F_q$ is the only target, an attacker can choose to attack at a time immediately after $\theta^2_{qk}$, getting an expected utility arbitrarily close to 1.70. According to Equation (4.6), $\text{AttEU}(F_q, t_k) = 1.20$ and $\text{AttEU}(F_q, t_{k+1}) = 1.00$, both lower than $\text{AttEU}(F_q, \theta^{2+}_{qk})$. Thus, the attacker can get a higher expected reward by attacking between $t_k$ and $t_{k+1}$, violating DASS's quality guarantee.

Figure 4.5: Sub-interval analysis in $(t_k, t_{k+1})$ for the example shown in Figure 4.4. (a) The probability that the target is protected is a constant in each sub-interval. (b) The attacker's expected utility is linear in each sub-interval.

However, because of the discontinuities in the attacker's expected utility function, a maximum might not exist. This implies that the minimax solution concept might not be well-defined for our game. We thus define our solution concept as minimizing the supremum of $\text{AttEU}(F_q, t)$.

Definition 5. The supremum of the attacker's expected utility is the smallest real number that is greater than or equal to all elements of the infinite set $\{\text{AttEU}(F_q, t)\}$, denoted $\sup \text{AttEU}(F_q, t)$.
The supremum is the least upper bound of the function $\text{AttEU}(F_q, t)$. For the CASS model, Equation 4.2 should thus be modified to

$$\text{AttEU}^m_f = \sup_{q,t} \text{AttEU}_f(F_q, t) \quad (4.14)$$

So a defender strategy $f$ is minimax if $\text{AttEU}^m_f$ is minimized, i.e.,

$$f \in \arg\min_{f'} \sup_{q,t} \text{AttEU}_{f'}(F_q, t)$$

In the above example, the supremum of the attacker's expected utility in $(t_k, t_{k+1})$ is $\text{AttEU}(F_q, \theta^{2+}_{qk}) = 1.70$. In the rest of the chapter, we will not specify when supremum is used instead of maximum, as it can easily be inferred from the context.

How can we deal with possible attacks between the discretized points and find an optimal defender strategy? We generalize the process above (called sub-interval analysis) to all possible edges $E_{i,j,k}$. We then make use of the piecewise linearity of $\text{AttEU}(F_q, t)$ and the fact that the potential discontinuity points are fixed, which allows us to construct a linear program that solves the problem to optimality. We name the approach CASS (Solver for Continuous Attacker Strategies).

We first introduce the general sub-interval analysis. For any target $F_q$ and any time interval $(t_k, t_{k+1})$, we calculate the time points at which edges $E_{i,j,k}$ intersect $L^1_q$ or $L^2_q$, denoted intersection points. We sort the intersection points in increasing order, denoted $\theta^r_{qk}$, $r = 1, \dots, M_{qk}$, where $M_{qk}$ is the total number of intersection points. Set $\theta^0_{qk} = t_k$ and $\theta^{M_{qk}+1}_{qk} = t_{k+1}$. Thus $(t_k, t_{k+1})$ is divided into sub-intervals $(\theta^r_{qk}, \theta^{r+1}_{qk})$, $r = 0, \dots, M_{qk}$.

Lemma 1. For any given target $F_q$, $\text{AttEU}(F_q, t)$ is piecewise linear in $t$. Furthermore, there exists a fixed set of time points, independent of the defender's mixed strategy, such that $\text{AttEU}(F_q, t)$ is linear between each adjacent pair of points. Specifically, these points are the intersection points $\theta^r_{qk}$ defined above.

Proof: In each sub-interval $(\theta^r_{qk}, \theta^{r+1}_{qk})$ for a target $F_q$, a feasible edge $E_{i,j,k}$ is either entirely above or entirely below $L^1_q$, and similarly for $L^2_q$.
Otherwise there would be a new intersection point, contradicting the definition of the sub-intervals. If edge $E_{i,j,k}$ is between $L^1_q$ and $L^2_q$, the distance between a patroller taking the edge and target $F_q$ is less than $r_e$, meaning the target is protected by the patroller. As edge $E_{i,j,k}$ is taken with probability $f(i,j,k)$, the total probability that the target is protected, $\omega(F_q, t)$, is the sum of the $f(i,j,k)$ whose corresponding edge $E_{i,j,k}$ is between the two lines in a sub-interval. So $\omega(F_q, t)$ is constant in $t$ in each sub-interval, and thus the attacker's expected utility $\text{AttEU}(F_q, t)$ is linear in each sub-interval according to Equation 4.1, as $U_q(t)$ is linear in $[t_k, t_{k+1}]$. Discontinuities can only exist at these intersection points, and an upper bound on the number of these points for target $F_q$ is $MN^2$.

Define the coefficient $A^r_{qk}(i,j)$ to be $C_1$ if edge $E_{i,j,k}$ is between $L^1_q$ and $L^2_q$ in $(\theta^r_{qk}, \theta^{r+1}_{qk})$, and 0 otherwise. According to Equation (4.1) and the fact that $\omega(F_q, t)$ is the sum of the $f(i,j,k)$ whose corresponding coefficient $A^r_{qk}(i,j) = C_1$, we have the following equation for $t \in (\theta^r_{qk}, \theta^{r+1}_{qk})$:

$$\text{AttEU}(F_q, t) = \left(1 - \sum_{i=1}^{N} \sum_{j=1}^{N} A^r_{qk}(i,j) f(i,j,k)\right) U_q(t) \quad (4.15)$$

Piecewise linearity of $\text{AttEU}(F_q, t)$ means the function is monotonic in each sub-interval, so the supremum can be found at the intersection points. Because of linearity, the supremum of AttEU in $(\theta^r_{qk}, \theta^{r+1}_{qk})$ can only be one of the one-sided limits at the endpoints, $\text{AttEU}(F_q, \theta^{r+}_{qk})$ or $\text{AttEU}(F_q, \theta^{(r+1)-}_{qk})$. Furthermore, if $U_q(t)$ is decreasing in $[t_k, t_{k+1}]$, the supremum is $\text{AttEU}(F_q, \theta^{r+}_{qk})$, and otherwise it is $\text{AttEU}(F_q, \theta^{(r+1)-}_{qk})$. In other words, all other attacker strategies in $(\theta^r_{qk}, \theta^{r+1}_{qk})$ are dominated by attacking at times close to $\theta^r_{qk}$ or $\theta^{r+1}_{qk}$. Thus, CASS adds new constraints to Constraints 4.8–4.13 which consider attacks occurring at $t \in (t_k, t_{k+1})$.
We add one constraint for each sub-interval, with respect to the possible supremum value in that sub-interval:

$$\min_{f(i,j,k),\, p(i,k)} z \quad (4.16)$$
$$\text{subject to constraints } (4.8 \dots 4.13)$$
$$z \ge \max\{\text{AttEU}(F_q, \theta^{r+}_{qk}), \text{AttEU}(F_q, \theta^{(r+1)-}_{qk})\} \quad (4.17)$$
$$\forall k \in \{1, \dots, M-1\}, q \in \{1, \dots, L\}, r \in \{0, \dots, M_{qk}\}$$

This linear program stands at the core of CASS, and we will not differentiate between the name of the solver and the name of the linear program in what follows. All the linear constraints included in Constraint 4.17 can be added to CASS using Algorithm 1. The inputs of the algorithm are the targets' schedules $\{S_q\}$, the protection radius $r_e$, the speed limit $v_m$, the set of discretized time points $\{t_k\}$, and the set of discretized distance points $\{d_i\}$. The function CalInt($L^1_q, L^2_q, v_m$) returns the list of all intersection time points between all possible edges $E_{i,j,k}$ and the parallel lines $L^1_q$, $L^2_q$, with the additional points $t_k$ as $\theta^0_{qk}$ and $t_{k+1}$ as $\theta^{M_{qk}+1}_{qk}$. The function CalCoef($L^1_q, L^2_q, v_m, \theta^r_{qk}, \theta^{r+1}_{qk}$) returns the coefficient matrix $A^r_{qk}$. $A^r_{qk}$ can easily be determined by checking the status at the midpoint in time: set $t_{mid} = (\theta^r_{qk} + \theta^{r+1}_{qk})/2$ and denote the patroller's position at $t_{mid}$ when edge $E_{i,j,k}$ is taken as $E_{i,j,t_{mid}}$; then $A^r_{qk}(i,j) = C_1$ if $E_{i,j,t_{mid}} \in \beta_q(t_{mid})$. The algorithm then adds, for each sub-interval $(\theta^r_{qk}, \theta^{r+1}_{qk})$, a constraint with respect to the larger of $\text{AttEU}(F_q, \theta^{r+}_{qk})$ and $\text{AttEU}(F_q, \theta^{(r+1)-}_{qk})$: when the attacker chooses to attack $F_q$ in this sub-interval, his best choice is decided by the larger of the two side limits of AttEU in $(\theta^r_{qk}, \theta^{r+1}_{qk})$.
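The midpoint check performed by CalCoef can be sketched directly: since no edge crosses $L^1_q$ or $L^2_q$ inside a sub-interval, testing the patroller's position at the sub-interval midpoint suffices. The following is an illustrative sketch with linear interpolation of positions; the function name and toy data are ours, not the thesis's:

```python
def cal_coef(d, S_q, r_e, c1, t_k, t_k1, theta_lo, theta_hi):
    """Coefficient matrix A^r_qk for sub-interval (theta_lo, theta_hi):
    A[i][j] = C_1 if edge E_{i,j,k} stays within distance r_e of the
    target throughout the sub-interval, else 0. Because edges cannot
    cross the boundary lines inside a sub-interval, checking the
    midpoint in time is sufficient."""
    n = len(d)
    t_mid = (theta_lo + theta_hi) / 2.0
    frac = (t_mid - t_k) / (t_k1 - t_k)        # fraction of interval elapsed
    A = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            pos = d[i] + frac * (d[j] - d[i])  # patroller position on E_{i,j,k}
            if abs(pos - S_q(t_mid)) <= r_e:
                A[i][j] = c1
    return A

# Toy check: target fixed at 0.5; the edge from d_1 = 0.4 to d_2 = 0.6
# passes through 0.5 at the interval midpoint, so it protects the target.
A = cal_coef([0.4, 0.6], lambda t: 0.5, r_e=0.05, c1=0.8,
             t_k=0.0, t_k1=1.0, theta_lo=0.0, theta_hi=1.0)
print(A[0][1])  # -> 0.8
```

In a full implementation, edges violating the speed limit $v_m$ would be skipped, and each resulting matrix would generate one linear constraint of the form in Constraint 4.17.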
Algorithm 1: Add the constraints described in Constraint 4.17
Input: $\{S_q\}$, $r_e$, $v_m$, $\{t_k\}$, $\{d_i\}$
for $k = 1, \dots, M-1$ do
    for $q = 1, \dots, L$ do
        $L^1_q \leftarrow S_q + r_e$, $L^2_q \leftarrow S_q - r_e$
        $\theta^0_{qk}, \dots, \theta^{M_{qk}+1}_{qk} \leftarrow$ CalInt($L^1_q, L^2_q, v_m$)
        for $r = 0, \dots, M_{qk}$ do
            $A^r_{qk} \leftarrow$ CalCoef($L^1_q, L^2_q, v_m, \theta^r_{qk}, \theta^{r+1}_{qk}$)
            if $U_q(t)$ is decreasing in $[t_k, t_{k+1}]$ then
                add constraint $z \ge \text{AttEU}(F_q, \theta^{r+}_{qk})$
            else
                add constraint $z \ge \text{AttEU}(F_q, \theta^{(r+1)-}_{qk})$
            end
        end
    end
end

Theorem 2. CASS computes (in polynomial time) the exact solution (minimax) of the game with discretized defender strategies and continuous attacker strategies.

Proof: According to Lemma 1, $\text{AttEU}(F_q, t)$ is piecewise linear, and discontinuity can only occur at the intersection points $\theta^r_{qk}$. These intersection points divide the time space into sub-intervals. Because of piecewise linearity, the supremum of $\text{AttEU}(F_q, t)$ equals the one-sided limit at an endpoint of at least one sub-interval. For any feasible defender strategy $f$, a feasible $z$ of the linear program 4.16–4.17 is no less than any of the limit values at the intersection points according to Constraint 4.17, and no less than the values at the discretized time points $t_k$ according to Constraint 4.13; thus $z$ can be any upper bound of $\text{AttEU}(F_q, t)$ for $f$. As $z$ is minimized in the objective function, $z$ is no greater than the supremum of $\text{AttEU}(F_q, t)$ given any defender strategy $f$, and further $z$ will be the minimum of the set of suprema corresponding to all defender strategies. Thus we get the optimal defender strategy $f$.

The total number of variables in the linear program is $O(MN^2)$. The number of constraints represented in Algorithm 1 is $O(MN^2 L)$, as the number of intersection points is at most $2(M-1)N^2$ for each target. The number of constraints represented in Constraints 4.8–4.13 is $O(MN^2)$. Thus, the linear program computes the solution in polynomial time.

Corollary 1.
The solution of CASS provides a feasible defender strategy for the original continuous game and gives the exact expected value of that strategy.

4.2.4 Optimal Defender Strategy in the Original Game

We have solved the problem with a discretized defender strategy space. In this subsection, we discuss how much the defender loses by using the discretization. Indeed, we can provide a bi-criteria polynomial time approximation scheme for the optimal defender strategy in the original game using the discretized games. Let $\text{DefEU}_{\delta_d, \delta_t}(r_e, v_m)$ denote the defender's optimal expected utility with a discretization granularity defined by $\delta_d$ and $\delta_t$ (i.e., $N = \frac{1}{\delta_d}$). Let $\text{DefEU}_{opt}(r_e, v_m)$ denote the defender's optimal expected utility without discretization. The defender's capability is characterized by the protection radius $r_e$ and speed limit $v_m$. Then $\text{DefEU}_{opt}(r_e, v_m)$ is bounded as follows:

$$\text{DefEU}_{\delta_d, \delta_t}(r_e, v_m) \le \text{DefEU}_{opt}(r_e, v_m) \le \text{DefEU}_{\delta'_d, \delta'_t}(r_e + \epsilon, v_m + \eta) \quad (4.18)$$
$$\forall \delta_d, \delta_t, \epsilon, \eta, \quad \delta'_t \le \frac{2\epsilon}{v_m}, \quad \delta'_d \le \eta \, \delta'_t \quad (4.19)$$

4.2.5 Generalized Model with Multiple Defender Resources

In this subsection, we generalize DASS and CASS to solve the problem with multiple defender resources. When there are multiple patrollers, the patrollers coordinate with each other. Recall the protection coefficient $C_G$ in Definition 1: a target is better protected when more patrollers are close to it (within radius $r_e$). So the protection provided to a target is determined once all patrollers' locations are known. Thus it is not sufficient to calculate the probability that an individual edge is taken, as in the single-patroller case. In the presence of multiple patrollers, we need a more complex representation to explicitly describe the defender strategy. To illustrate the generalization to multiple defender resources, we first take two patrollers as an example.
If there are two patrollers, the patrol strategy can be represented using flow distribution variables $\{f(i_1, j_1, i_2, j_2, k)\}$. Here the flow distribution variables are defined on the Cartesian product of two duplicated sets of all feasible edges $\{E_{i,j,k}\}$. $f(i_1, j_1, i_2, j_2, k)$ is the joint probability of the first patroller moving from $d_{i_1}$ to $d_{j_1}$ and the second patroller moving from $d_{i_2}$ to $d_{j_2}$ during time $t_k$ to $t_{k+1}$, i.e., taking edges $E_{i_1,j_1,k}$ and $E_{i_2,j_2,k}$ respectively. The corresponding marginal distribution variable $p(i_1, i_2, k)$ represents the probability that the first patroller is at $d_{i_1}$ and the second at $d_{i_2}$ at time $t_k$. Protection coefficients $C_1$ and $C_2$ are used when one or two patrollers are protecting the target, respectively. So the attacker's expected utility can be written as

$$\text{AttEU}(F_q, t) = (1 - (C_1 \omega_1(F_q, t) + C_2 \omega_2(F_q, t))) U_q(t)$$

$\omega_1(F_q, t)$ is the probability that exactly one patroller is protecting target $F_q$ at time $t$, and $\omega_2(F_q, t)$ is the probability that both patrollers are protecting the target. For attacks that happen at discretized points $t_k$, we can make use of $I(i,q,k)$ in Definition 4: $I(i_1, q, k) + I(i_2, q, k)$ is the total number of patrollers protecting target $F_q$ at time $t_k$.

$$\omega_1(F_q, t_k) = \sum_{i_1, i_2: I(i_1,q,k) + I(i_2,q,k) = 1} p(i_1, i_2, k)$$
$$\omega_2(F_q, t_k) = \sum_{i_1, i_2: I(i_1,q,k) + I(i_2,q,k) = 2} p(i_1, i_2, k)$$

Constraints for attacks occurring in $(t_k, t_{k+1})$ can be calculated with an algorithm analogous to Algorithm 1. The main differences are in the coefficient matrix $A^r_{qk}$ and the expression of AttEU. We set the values in the coefficient matrix $A^r_{qk}(i_1, j_1, i_2, j_2)$ to $C_2$ if both edges $E_{i_1,j_1,k}$ and $E_{i_2,j_2,k}$ are between $L^1_q$ and $L^2_q$, and to $C_1$ if only one of the edges protects the target.
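For the two-patroller case, the protection probabilities $\omega_1$ and $\omega_2$ above can be computed from the joint marginals by counting how many patrollers protect the target. A minimal sketch follows; the dictionary encoding of $p(i_1, i_2, k)$ and the toy numbers are our own illustrative choices:

```python
def protection_probs(p_joint, I, q, k):
    """omega_1, omega_2 at discretized time t_k for two patrollers.
    p_joint: dict mapping (i1, i2, k) to the joint probability that
             patroller 1 is at d_{i1} and patroller 2 at d_{i2} at t_k.
    I[i][q][k] = 1 iff a patroller at d_i protects target F_q at t_k.
    I(i1,q,k) + I(i2,q,k) counts the protecting patrollers."""
    omega = {1: 0.0, 2: 0.0}
    for (i1, i2, kk), prob in p_joint.items():
        if kk != k:
            continue
        count = I[i1][q][k] + I[i2][q][k]
        if count in omega:
            omega[count] += prob
    return omega[1], omega[2]

# Toy instance: 2 locations; only d_1 protects the single target (q = k = 0).
I = [[[1]], [[0]]]
p_joint = {(0, 0, 0): 0.4,   # both patrollers at d_1 -> two protectors
           (0, 1, 0): 0.3,   # only patroller 1 protects
           (1, 1, 0): 0.3}   # neither protects
w1, w2 = protection_probs(p_joint, I, q=0, k=0)
print(w1, w2)  # -> 0.3 0.4
```

The same counting idea extends to $W$ patrollers by summing $I(i_u, q, k)$ over all $u$, exactly as in the generalized $\omega_Q$ formula below.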
The attacker's expected utility function in $(\theta^r_{qk}, \theta^{r+1}_{qk})$ is

$$\text{AttEU}(F_q, t) = \left(1 - \sum_{i_1, j_1, i_2, j_2} A^r_{qk}(i_1, j_1, i_2, j_2) f(i_1, j_1, i_2, j_2, k)\right) U_q(t)$$

For the general case of $W$ defender resources, we can use $\{f(i_1, j_1, \dots, i_W, j_W, k)\}$ to represent the patrol strategy.

Definition 6. The compact representation for multiple defender resources is a compact way to represent the defender's mixed strategy using flow distribution variables $\{f(i_1, j_1, \dots, i_W, j_W, k)\}$. $f(i_1, j_1, \dots, i_W, j_W, k)$ is the joint probability that patroller $u$ moves from $d_{i_u}$ at time $t_k$ to $d_{j_u}$ at time $t_{k+1}$, for $u = 1, \dots, W$.

Given the generalized compact representation, we get the following equations for calculating the attacker's expected utility function and the protection probabilities:

$$\text{AttEU}(F_q, t) = \left(1 - \sum_{Q=1}^{W} C_Q \omega_Q(F_q, t)\right) U_q(t)$$

$$\omega_Q(F_q, t_k) = \sum_{i_1, \dots, i_W: \sum_{u=1}^{W} I(i_u, q, k) = Q} p(i_1, \dots, i_W, k)$$

$Q$ is the number of patrollers protecting the target. We can modify Algorithm 1 to apply to the multiple-defender-resource case: set $A^r_{qk}(i_1, j_1, \dots, i_W, j_W)$ to $C_Q$ if $Q$ of the edges $\{E_{i_u, j_u, k}\}$ are between $L^1_q$ and $L^2_q$. We conclude with the linear program for generalized CASS with multiple patrollers:

$$\min_{f(i_1, j_1, \dots, i_W, j_W, k),\, p(i_1, \dots, i_W, k)} z \quad (4.20)$$
$$f(i_1, j_1, \dots, i_W, j_W, k) = 0, \quad \forall i_1, \dots, i_W, j_1, \dots, j_W \text{ such that } \exists u, |d_{j_u} - d_{i_u}| > v_m \delta_t \quad (4.21)$$
$$p(i_1, \dots, i_W, k) = \sum_{j_1=1}^{N} \dots \sum_{j_W=1}^{N} f(j_1, i_1, \dots, j_W, i_W, k-1), \quad \forall i_1, \dots, i_W, \forall k > 1 \quad (4.22)$$
$$p(i_1, \dots, i_W, k) = \sum_{j_1=1}^{N} \dots \sum_{j_W=1}^{N} f(i_1, j_1, \dots, i_W, j_W, k), \quad \forall i_1, \dots, i_W, \forall k < M \quad (4.23)$$
$$\sum_{i_1=1}^{N} \dots \sum_{i_W=1}^{N} p(i_1, \dots, i_W, k) = 1, \quad \forall k \quad (4.24)$$
$$z \ge \text{AttEU}(F_q, t_k), \quad \forall q, \forall k \quad (4.25)$$
$$z \ge \max\{\text{AttEU}(F_q, \theta^{r+}_{qk}), \text{AttEU}(F_q, \theta^{(r+1)-}_{qk})\}, \quad \forall k, \forall q, \forall r \quad (4.26)$$

The number of variables in the linear program is $O(MN^{2W})$ and the number of constraints is $O(MN^W)$. It is therefore reasonable to examine potentially more efficient alternatives.
We summarize the results of such an examination below, concluding that the current linear program appears to offer the best tradeoff between solution quality and runtime, at least for the current domains of application; as discussed below, future work may reveal alternative approaches for other domains.

The first question to examine is the computational complexity of the problem at hand: generating optimal patrolling strategies for multiple patrollers on a graph. Unfortunately, despite the significant attention paid to the topic, the complexity remains unknown. More specifically, the question of the computational complexity of generating patrols for multiple defenders on graphs of different types has received significant attention (Letchford, 2013; Korzhyk et al., 2010a). These studies illustrate that in several cases the problem is NP-hard and in some cases it is known to be solvable in polynomial time, but despite significant effort, the complexity in many cases remains unknown (Letchford & Conitzer, 2013). Unfortunately, our graph turns out to be different from the cases considered in their work. Indeed, the DASS model can be viewed as a game with homogeneous defender resources patrolling on a graph, similar to the cases that have already been considered. However, prior results cannot settle the complexity of our game, because the structure of our graph does not fit any of the prior graphs.

Given that computational complexity results are not directly available, we may examine approaches that provide efficient approximations. Here we provide an overview of two such approaches (with experimental results in Section 4.6.1.6). Our first approach attempts to provide a more compact representation in the hope of achieving a speedup.
To that end, we apply an intuitive approach that uses an individual strategy profile for each patroller and then calculates the best possible mixed-strategy combination. Unfortunately, this approach is inefficient in runtime even for the DASS model and may result in a suboptimal solution. Thus, although more compact, this approach fails to achieve our goal; we explain it next. Assume each patroller independently follows her own mixed strategy. Denote the individual mixed strategy for patroller $u$ as $f_u(i_u,j_u,t_k)$; the probability that a target is protected by $Q$ patrollers can then be represented as a polynomial expression in $\{f_u(i_u,j_u,t_k)\}$ of order $Q$. Our optimization problem is thereby converted into minimizing the objective function $z$ subject to non-linear constraints. Suppose we have two patrollers, and for a potential attack at target $q$ at time $t_k$, denote the probability that patroller $u$ is protecting the target as $\varpi_u$. $\varpi_u$ is linear in $f_u$, and the attacker's expected utility for this attack can be represented as

$\mathrm{AttEU}(F_q,t_k) = \big(1 - C_1((1-\varpi_1)\varpi_2 + (1-\varpi_2)\varpi_1) - C_2\,\varpi_1\varpi_2\big) U_q(t_k)$

So a constraint $z \ge \mathrm{AttEU}(F_q,t_k)$ is quadratic in $f$, because the joint probability is the product of the individual probabilities of the patrollers. These constraints are not guaranteed to have a convex feasible region, and there are no known polynomial-time algorithms for solving this kind of non-convex optimization problem.
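The quadratic dependence is easy to see numerically. A minimal sketch of the expression above (the $\varpi$ values and coefficients below are illustrative):

```python
def atteu_independent(w1, w2, C1, C2, U):
    """AttEU(F_q, t_k) when the patrollers randomize independently: the coverage
    terms (1-w1)*w2 + (1-w2)*w1 and w1*w2 are products of the individual
    probabilities, so a constraint z >= AttEU is quadratic in the flows."""
    exactly_one = (1 - w1) * w2 + (1 - w2) * w1
    both = w1 * w2
    return (1 - C1 * exactly_one - C2 * both) * U
```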
We attempt to solve the problem by converting it into a mathematical program with a non-convex objective function and linear constraints; i.e., instead of minimizing $z$ subject to constraints $z \ge \mathrm{AttEU}(F_q,t_k)$, we incorporate the constraints into the objective function as

$z = \max_{q,k}\{\mathrm{AttEU}(F_q,t_k)\}$  (4.27)

The results in Section 4.6.1.6 show that when we solve this mathematical program in MATLAB using the function fmincon with the interior-point method for the DASS model, the algorithm fails to reach a feasible solution efficiently, and even when enough time is given, the solution can still be suboptimal, as it may get stuck at a local minimum. To conclude, although this approach is more compact and helps save memory, it is inefficient in runtime and may result in a loss of solution quality.

Our second approach takes a further step to reduce the runtime complexity, making it a polynomial approximation algorithm, but it can lead to a high degradation in solution quality. In this approach, we iteratively compute the optimal defender strategy for a newly added resource given the existing strategies for the previous defender resources. Namely, we first calculate $f_1(i_1,j_1,t_k)$ as if only one patroller were available and then calculate $f_2(i_2,j_2,t_k)$ given the value of $f_1(i_1,j_1,t_k)$. In this way, we need to solve $W$ linear programs of complexity $O(MN^2)$, so this approach is much faster than the former one. Unfortunately, it fails to capture the coordination between the patrollers effectively and thus may result in a high degradation in solution quality. For example, suppose there are only two targets of constant utility $U$: one stays at terminal A and the other stays at terminal B. Further, suppose the protection coefficient is always 1 when a target is protected by one or more patrollers.
When two patrollers are available, the optimal solution is for each patroller to protect one of the targets throughout, so both targets are protected with probability 1 and the attacker's expected utility is 0. If the defender strategy is calculated for each patroller sequentially as discussed above, the solution is to protect each target with probability 0.5 for both patrollers, making the attacker's expected utility $0.25U$. In other words, we reach a suboptimal solution, wasting resources: the two patrollers end up at the same target (each target with probability 0.25). In this case, there is a 0.25 probability that a target is unprotected, even though an optimal solution exists that protects all targets with probability 1. Thus, even with just two patrollers, this approach leads to a potentially significant loss in expected utility and is clearly inadequate for our purposes.

Given the above discussion, a fast approximation may lead to significant losses in solution quality or may not be efficient enough. Fortunately, in current application domains, such as the deployment of CASS for protecting ferries (e.g., the Staten Island Ferry in New York), the number of defender resources is limited. Indeed, the scarcity of resources is the main reason that optimization using security games becomes critical. As a result, our current approach of CASS is adequate for current domains such as ferry protection. Further research on scale-up is an issue for future work.

4.3 Equilibrium Refinement

A game often has multiple equilibria. Since our game is zero-sum, all equilibria achieve the same objective value. However, if an attacker deviates from his best response, some equilibrium strategies for the defender may provide better results than others. Consider the following example game.
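Before turning to that example, the loss from the sequential approach in the two-terminal scenario above is easy to verify numerically. A sketch (the helper function is ours; coverage by either patroller counts as full protection, $C = 1$):

```python
def atteu_best_target(p1_at_A, p2_at_A, U):
    """Attacker's utility against the better of two stationary targets at
    terminals A and B, when each patroller independently covers A with the
    given probability and B with the complementary probability."""
    unprotected_A = (1 - p1_at_A) * (1 - p2_at_A)
    unprotected_B = p1_at_A * p2_at_A
    return max(unprotected_A, unprotected_B) * U
```

The sequential solution $(0.5, 0.5)$ yields $0.25U$, while the coordinated assignment of one patroller per terminal yields 0.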
There are two targets moving during $[t_1,t_2]$ (no further discretization): one moves from $d_3$ to $d_2$ and the other moves from $d_1$ to $d_2$ (see Figure 4.6(a)). Suppose $d_3 - d_2 = d_2 - d_1 = \Delta d$ and $r_e = 0.5\Delta d$. There is only one patroller available, and the protection coefficient is $C_1 = 1$. Both targets' utility functions decrease from 10 to 1 over $[t_1,t_2]$ (see Figure 4.6(b)). In one equilibrium, $f(3,2,1) = f(1,2,1) = 0.5$, i.e., the patroller randomly chooses one target and follows it all the way. In another equilibrium, $f(3,3,1) = f(1,1,1) = 0.5$, i.e., the patroller either stays at $d_1$ or at $d_3$. In either equilibrium, the attacker's best response is to attack at $t_1$, with a maximum expected utility of 5. However, if an attacker is physically constrained (e.g., due to launch point locations) to attack no earlier than $t'$, with $t' > \theta^1_1$ (where $\theta^1_1 = (t_1+t_2)/2$ is the only intersection time point), then against both defender strategies he will choose to attack either of the targets at $t'$. The attacker's expected utility is $U_q(t')/2$ in the first equilibrium, because with probability 50% the patroller is following that target. In the second equilibrium, however, he is assured of success and gets a utility of $U_q(t')$, because the distance between the chosen target and $d_1$ (or $d_3$) is larger than $r_e$ at $t'$, i.e., the chosen target is unprotected at $t'$. In this case, the defender strategy in the first equilibrium is preferable to the one in the second; indeed, the first defender strategy dominates the second, by which we mean the first is equally good or better than the second no matter what strategy the attacker chooses. We provide a formal definition of dominance in Section 4.3.1.

Figure 4.6: An example where one equilibrium outperforms another when the attacker is constrained to attack in $[t',t_2]$ with $t' > \theta^1_1$. (a) Two targets move with schedules $S_1$ and $S_2$. (b) The utility function, identical for both targets, decreases linearly from 10 to 1 over time.
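The utility gap between the two equilibria for a constrained attacker can be checked in a few lines. A sketch with $t_1 = 0$ and $t_2 = 1$ normalized (helper names are ours):

```python
def utility(t):
    """Both targets' utility, decreasing linearly from 10 at t_1 = 0 to 1 at t_2 = 1."""
    return 10.0 - 9.0 * t

def atteu_at(t, equilibrium):
    """Attacker utility at attack time t > (t_1 + t_2)/2.  In the 'follow'
    equilibrium the patroller tracks the chosen target with probability 0.5;
    in the 'stay' equilibrium neither endpoint is within r_e after the
    intersection point, so the chosen target is unprotected."""
    coverage = 0.5 if equilibrium == "follow" else 0.0
    return (1 - coverage) * utility(t)
```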
Our goal is to improve the defender strategy so that it is more robust against constrained attackers while keeping the defender's expected utility against unconstrained attackers the same. This task of selecting one among the multiple equilibria of a game is an instance of the equilibrium refinement problem, which has received extensive study in game theory (van Damme, 1987; Fudenberg & Tirole, 1991; Miltersen & Sørensen, 2007). For finite security games, An et al. (2011) proposed techniques that provide refinement over Stackelberg equilibrium. However, there has been little prior research on the computation of equilibrium refinements for continuous games. In this section, we introduce two equilibrium refinement approaches: "route-adjust" (Section 4.3.1) and "flow-adjust" (Section 4.3.2). Both approaches can be applied to improve any feasible defender strategy, and when they are applied to an optimal defender strategy in an existing equilibrium, we obtain new equilibria with more robust optimal defender strategies. For expository simplicity, we again use the single-resource case as an example, but both methods apply to the multiple-resource case. The results in the evaluation section experimentally illustrate that these two refinement methods can significantly improve performance.

4.3.1 Route Adjust

Given that $f$ is the defender strategy of one equilibrium of the game, if we can find a defender strategy $f'$ such that for any attacker strategy $(q,t)$ the defender's expected utility under $f'$ is equal to or higher than that under $f$, and strictly higher for at least one attacker strategy, we say that $f'$ dominates $f$. Intuitively, the defender should choose $f'$ instead of $f$, as $f'$ is at least as good as $f$ for any attacker strategy and achieves better performance for some. An equilibrium with strategy $f'$ is thus more robust to unknown deviations on the attacker side.
We give the formal definition of dominance as follows.

Definition 7. Defender strategy $f$ dominates $f'$ if $\forall q,t$, $\mathrm{DefEU}_f(F_q,t) \ge \mathrm{DefEU}_{f'}(F_q,t)$, and $\exists q,t$, $\mathrm{DefEU}_f(F_q,t) > \mathrm{DefEU}_{f'}(F_q,t)$; or equivalently in this zero-sum game, $\forall q,t$, $\mathrm{AttEU}_f(F_q,t) \le \mathrm{AttEU}_{f'}(F_q,t)$, and $\exists q,t$, $\mathrm{AttEU}_f(F_q,t) < \mathrm{AttEU}_{f'}(F_q,t)$.

Corollary 2. Defender strategy $f$ dominates $f'$ if $\forall q,t$, $\omega(F_q,t) \ge \omega'(F_q,t)$ and $\exists q,t$, $\omega(F_q,t) > \omega'(F_q,t)$.

Definition 7 simply restates the commonly used weak dominance definition in game theory for this specific game. Corollary 2 follows from Equation (4.1). In this section, we introduce the route-adjust approach, which gives a procedure for finding a defender strategy $f^1$ that dominates a given defender strategy $f^0$. Route-adjust produces final routes in three steps: (i) decompose the flow distribution $f^0$ into component routes; (ii) for each route, greedily find a route that provides better protection to targets; (iii) combine the resulting routes into a new flow distribution $f^1$, which dominates $f^0$ if $f^1$ is different from $f^0$. The detailed process is listed in Algorithm 2. We illustrate this approach using the simple dominated strategy shown in Figure 4.3.

To accomplish step (i), we decompose the flow distribution by iteratively finding a route that contains the edge with minimum probability. As shown in Figure 4.7, we first randomly choose a route that contains edge $E_{1,2,2}$, as $f(1,2,2) = 0.4$ is the minimum among all flow variables. We choose $R_2 = (d_1,d_1,d_2)$ and set $p(R_2) = f(1,2,2) = 0.4$. Then, for each edge of route $R_2$, we subtract 0.4 from the original flow, resulting in a residual flow. We continue to extract routes from the residual flow until no flow is left. Denoting by $Z$ the number of non-zero edges in the flow distribution graph, $Z$ decreases by at least 1 after each iteration, so the algorithm terminates in at most $Z$ steps and at most $Z$ routes are found.
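Steps (i) and (iii) can be sketched as a decompose/recombine pair. A minimal sketch (edge $(i,j,k)$ denotes a move from $d_i$ at $t_k$ to $d_j$ at $t_{k+1}$; tie-breaking among minimal edges is arbitrary, as in the text):

```python
def decompose_flow(flow, M, eps=1e-12):
    """Step (i): repeatedly pick the positive edge with minimum flow, extend it
    into a full route through positive edges, and subtract that route's
    probability, leaving a residual flow."""
    flow = {e: p for e, p in flow.items() if p > eps}
    routes = []
    while flow:
        (i0, j0, k0), f_min = min(flow.items(), key=lambda kv: kv[1])
        route = [None] * M
        route[k0 - 1], route[k0] = i0, j0
        for k in range(k0 + 1, M):       # extend forward along positive edges
            route[k] = next(j for (i, j, kk) in flow if kk == k and i == route[k - 1])
        for k in range(k0 - 2, -1, -1):  # extend backward
            route[k] = next(i for (i, j, kk) in flow if kk == k + 1 and j == route[k + 1])
        routes.append((tuple(route), f_min))
        for k in range(1, M):            # subtract the route, keep the residual
            e = (route[k - 1], route[k], k)
            flow[e] -= f_min
            if flow[e] <= eps:
                del flow[e]
    return routes

def routes_to_flow(routes, M):
    """Step (iii): the inverse mapping, adding each route's probability back
    onto its edges (cf. Equation 4.4)."""
    flow = {}
    for route, prob in routes:
        for k in range(1, M):
            e = (route[k - 1], route[k], k)
            flow[e] = flow.get(e, 0.0) + prob
    return flow
```

Because each extracted route removes at least one edge, the loop runs at most $Z$ times, matching the termination argument above.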
The result of step (i) is a sparse description of a defender mixed strategy in full representation. As we will discuss in Section 4.5, this decomposition constitutes one method of executing a compact strategy. For step (ii), we adjust each of the routes greedily. To that end, we first introduce the dominance relations of edges and routes, using the intersection points $\theta^r_{qk}$ and the coefficient matrix $A^r_{qk}(i,j)$ defined in Section 4.2.3.

Algorithm 2 Route-Adjust
Input: a mixed defender strategy $f$
Output: an updated mixed defender strategy $f'$

(i) Decompose $f$ into multiple routes by iteratively finding a route that contains the edge with minimum probability:
  (a) Initialize the remaining flow distribution $\tilde{f} = f$ and the route set $S = \emptyset$. Initialize the probability distribution over routes $p(R_u) = 0$, $\forall u$.
  (b) while $\max \tilde{f}(i,j,k) > 0$ do
    i. Set $(i_0,j_0,k_0) = \arg\min_{i,j,k:\, \tilde{f}(i,j,k)>0} \tilde{f}(i,j,k)$.
    ii. Set $f_{\min} = \tilde{f}(i_0,j_0,k_0)$.
    iii. Find an arbitrary route $R_{u_0}$ such that $r_{u_0}(k_0-1) = i_0$ and $r_{u_0}(k_0) = j_0$ (i.e., edge $E_{i_0,j_0,k_0}$ is in the route) and $\tilde{f}(r_{u_0}(k-1), r_{u_0}(k), k) > 0$, $\forall k$ (i.e., all edges in the route have non-zero remaining flow).
    iv. Add $R_{u_0}$ to $S$ and set $p(R_{u_0}) = f_{\min}$.
    v. Set $\tilde{f}(i,j,k) = \tilde{f}(i,j,k) - f_{\min}$ if $r_{u_0}(k-1) = i$ and $r_{u_0}(k) = j$.
  end

(ii) Adjust each route in $S$ greedily to get a new set of routes $S'$ and the corresponding new probability distribution $p'$:
  (a) Initialize the new set $S' = \emptyset$ and the new probability distribution $p'(R_u) = 0$, $\forall u$.
  (b) while $S \neq \emptyset$ do
    i. Pick a route $R_u$ from $S$.
    ii. Adjust $R_u$ to get a new route $R_{u'}$: for the given $R_u$ and a specified $k^*$, set $r_{u'}(k) = r_u(k)$ if $k \neq k^*$. Set $r_{u'}(k^*) = i_0$ such that: 1) $E(u',k^*-1)$ and $E(u',k^*)$ meet the speed constraint; 2) $R_{u'}$ dominates $R_u$ with the choice of $i_0$; 3) $R_{u'}$ is not dominated by a route with any other choice of $i_0$. If no such $i_0$ exists, set $r_{u'}(k^*) = r_u(k^*)$.
    iii. Add $R_{u'}$ to $S'$ and set $p'(R_{u'}) = p(R_u)$.
    iv. Remove $R_u$ from $S$.
  end

(iii) Reconstruct a new compact representation $f'$ from $S'$ and $p'$ according to Equation 4.4.

Figure 4.7: Step (i): decomposition. Each time, a route containing the minimal flow variable is subtracted, and a residual graph is left for further decomposition. The original flow distribution is thus decomposed into three routes $R_2$, $R_1$, and $R_3$ with probabilities 0.4, 0.2, and 0.4 respectively.

Definition 8. Edge $E_{i,j,k}$ dominates edge $E_{i',j',k}$ in $[t_k,t_{k+1}]$ if $A^r_{qk}(i,j) \ge A^r_{qk}(i',j')$, $\forall q = 1 \dots L$, $\forall r = 0 \dots M_{qk}$, and $\exists q,r$ such that $A^r_{qk}(i,j) > A^r_{qk}(i',j')$.

The dominance relation of edges is based on comparing the protection provided to the targets in each sub-interval. In the following dominance relation of routes, we denote the edge $E_{r_u(k),r_u(k+1),k}$ as $E(u,k)$ to simplify the notation.

Definition 9. Route $R_u = (d_{r_u(1)},\dots,d_{r_u(M)})$ dominates $R_{u'} = (d_{r_{u'}(1)},\dots,d_{r_{u'}(M)})$ if $\forall k = 1 \dots M-1$, $E(u,k) = E(u',k)$ or $E(u,k)$ dominates $E(u',k)$, and $\exists k$ such that $E(u,k)$ dominates $E(u',k)$.

That is, route $R_u$ dominates $R_{u'}$ if each edge of $R_u$ is either the same as or dominates the corresponding edge in $R_{u'}$, and at least one edge in $R_u$ dominates the corresponding edge in $R_{u'}$.

Denote the original route to be adjusted as $R_u$ and the new route as $R_{u'}$. A greedy way to improve the route is to replace only one node in it. If we want to replace the node at time $t_{k^*}$, then we have $r_{u'}(k) = r_u(k)$, $\forall k \neq k^*$, and $d_{r_u(k^*)}$ in the original route is replaced with
The second one requires the changed edgesE(u 0 ;k 1) andE(u 0 ;k ) are either equal to or dominate the corresponding edges in the original route (and dominance relation exist for at least one edge). The third require- ment attains a local maximum. If such a new node does not exist for a specifiedk , we return the original routeR u . We can iterate this process for the new route and get a final route denoted byR u 0 after several iterations or when the state of convergence is reached. When the state of convergence is reached, the resulting routeR u 0 keeps unchanged no matter whichk is chosen for the next iteration. For the example in Figure 4.7, assume the only target’s moving schedule isd 1 ! d 1 ! d 2 , d 3 d 2 = d 2 d 1 = d ,r e = 0:1 d and utility function is constant. We adjust each route for only one iteration by changing the patroller’s position at time t 3 , i.e., r u (3). As t 3 is the last discretized time point, only edgeE(u; 2) may be changed. ForR 1 = (d 1 ;d 1 ;d 1 ), we enumerate all possible patroller’s positions at time t 3 and choose one according to the three constraints mentioned above. In this case, the candidates ared 1 andd 2 , so the corresponding new routes are R 1 (unchanged) andR 2 = (d 1 ;d 1 ;d 2 ) respectively. Note that edgeE d 1 ;d 2 ;2 dominatesE d 1 ;d 1 ;2 because the former one protects the target all the way in [t 2 ;t 3 ] and thusR 2 dominatesR 1 . Sod 2 is chosen as the patroller’s position att 3 andR 2 is chosen as the new route. The adjustment for all routes with non-zero probability after decomposition is shown in Table 4.3. 57 The new routes we get after step (ii) are same as the original routes or dominate the original routes. That is, whenever a routeR u is chosen according to the defender mixed strategy resulting from step (i), it is always equally good or better to choose the corresponding new route R u 0 instead, becauseR u 0 provides equal or more protection to the targets thanR u . 
Suppose there are $H$ possible routes in the defender strategy after step (i), denoted $R_1,\dots,R_H$. After adjusting the routes, we get a new defender strategy $(p'(R_1), p'(R_2), \dots, p'(R_H))$ in full representation (see Table 4.4). Compared to the original strategy, some routes are taken with higher probability (e.g., $p'(R_2) = 0.2 + 0.4 = 0.6$) and some with lower probability (e.g., $p'(R_3) = 0$). For step (iii), we reconstruct a new compact representation according to Equation 4.4. This is accomplished via a process that is the inverse of decomposition and is the same as the mapping from a strategy in full representation to a compact representation. For the example above, the result is shown in Table 4.4.

Theorem 3. After steps (i)–(iii), we get a new defender strategy $f^1$ that dominates the original one, $f^0$, if $f^1$ is different from $f^0$.

Proof: We continue to use the notation that the decomposition in step (i) yields the routes $R_1,\dots,R_H$. Each flow distribution variable in the original distribution, $f^0(i,j,k)$, is decomposed into $H$ sub-flows $\{f^0_u(i,j,k)\}$ according to the route decomposition: $f^0_u(i,j,k) = p(R_u)$ if $i = r_u(k)$ and $j = r_u(k+1)$, and $f^0_u(i,j,k) = 0$ otherwise. Thus we have the following equation:

$f^0(i,j,k) = \sum_{u=1}^{H} f^0_u(i,j,k)$  (4.28)

$= \sum_{u:\, r_u(k)=i,\, r_u(k+1)=j} f^0_u(i,j,k)$  (4.29)

After adjusting each route separately, each non-zero sub-flow $f^0_u(i,j,k)$ on edge $E(u,k)$ is moved to edge $E(u',k)$ as route $R_u$ is adjusted to $R_{u'}$. Reconstructing the flow distribution $f^1$ can then be regarded as adding up all the adjusted sub-flows on each edge. That is, $f^1$ is composed of a set of adjusted sub-flows, denoted $\{f^1_u(i',j',k)\}$, where the subscript $u$ is the index of the original route, indicating that the sub-flow is moved from edge $E(u,k)$. So $f^1_u(i',j',k) = f^0_u(r_u(k), r_u(k+1), k)$ if $i' = r_{u'}(k)$ and $j' = r_{u'}(k+1)$; otherwise $f^1_u(i',j',k) = 0$.
Similarly to Equation 4.29, we have the following equation for $f^1$:

$f^1(i',j',k) = \sum_{u=1}^{H} f^1_u(i',j',k)$  (4.30)

$= \sum_{u':\, r_{u'}(k)=i',\, r_{u'}(k+1)=j'} f^1_u(i',j',k)$  (4.31)

By construction of the adjustment, $R_{u'}$ is the same as or dominates $R_u$, and thus $E(u',k)$ is the same as or dominates $E(u,k)$. So if edge $E(u,k)$ protects target $F_q$ at time $t$, the corresponding edge $E(u',k)$ after adjustment also protects target $F_q$ at time $t$. Recall from Section 4.2.3 that $\omega(F_q,t)$ is the sum of the $f(i,j,k)$ whose corresponding edge $E_{i,j,k}$ protects target $F_q$ at time $t$. We denote by $\omega^0(F_q,t)$ and $\omega^1(F_q,t)$ the protection probabilities corresponding to $f^0$ and $f^1$ respectively. According to Equation 4.29, $\omega^0(F_q,t)$ can be viewed as the sum of all the non-zero sub-flows $f^0_u(i,j,k)$ whose corresponding $E(u,k)$ protects target $F_q$ at time $t$. If $f^0_u(i,j,k)$ is a term in the summation for $\omega^0(F_q,t)$, then $E(u,k)$ protects $F_q$ at $t$ and thus so does the corresponding $E(u',k)$; hence the corresponding sub-flow $f^1_u(r_{u'}(k), r_{u'}(k+1), k)$ in $f^1$ is also a term in the summation for $\omega^1(F_q,t)$. This leads to the conclusion that $\omega^0(F_q,t) \le \omega^1(F_q,t)$. Note that if $\omega^0(F_q,t) = \omega^1(F_q,t)$ for all $q,t$, then all routes were kept unchanged in step (ii), as otherwise this would contradict the fact that a new route dominates its original route. According to Corollary 2, $f^1$ dominates $f^0$ if it is different from $f^0$.

In the example in Figure 4.7, $f^0(1,1,2)$ is decomposed into two non-zero terms, $f^0_1(1,1,2) = 0.2$ and $f^0_3(1,1,2) = 0.4$, along routes $R_1$ and $R_3$ (see Figure 4.7). After adjustment, we get the corresponding sub-flows $f^1_1(1,2,2) = 0.2$ and $f^1_3(1,2,2) = 0.4$. Recall that the target's schedule is $d_1 \to d_1 \to d_2$. The flow distribution after adjustment (see Table 4.6) gives more protection to the target in $[t_2,t_3]$.
Since the flow from $t_1$ to $t_2$ is unchanged (and therefore so is the protection), overall the new strategy dominates the old one. Therefore, if we apply route-adjust to the optimal defender strategy calculated by CASS, we get a more robust equilibrium. While step (iii) allows us to prove Theorem 3, note that at the end of step (ii) we already have a probability distribution over a set of routes from which actual patrol routes can be sampled.

For two or more defender resources, a generalized version of Definition 8 can be used to define the dominance relation on the edge tuple $(E_{i_1,j_1,k},\dots,E_{i_W,j_W,k})$ with the coefficient matrix for multiple patrollers, $A^r_{qk}(i_1,j_1,\dots,i_W,j_W)$.

There are other ways to adjust each route. Instead of adjusting only one node at a time, we can adjust several consecutive nodes; for example, we can adjust both $r_{u'}(k^*)$ and $r_{u'}(k^*+1)$ by checking edges $E(u',k^*-1)$, $E(u',k^*)$, and $E(u',k^*+1)$. However, we need to trade off solution quality against the efficiency of the algorithm. This tradeoff is discussed further in Section 4.6.

4.3.2 Flow Adjust

Whereas route-adjust tries to select an equilibrium that is robust against attackers playing suboptimal strategies, the second approach, flow-adjust, attempts to select a new equilibrium that
Formally, DefEU k f = min q2f1:::Lg;t2[t k ;t k+1 ] fDefEU f (F q ;t)g. We give the following definition of “local dominance”. Definition 10. Defender strategyf locallydominatesf 0 if DefEU k f DefEU k f 0,8k. 2 Corollary 3. Defender strategyf locallydominatesf 0 if min q2f1:::Lg;t2[t k ;t k+1 ] fDefEU f (F q ;t)g min q2f1:::Lg;t2[t k ;t k+1 ] fDefEU f 0(F q ;t)g;8k; or equivalently in this zero-sum game, max q2f1:::Lg;t2[t k ;t k+1 ] fAttEU f (F q ;t)g max q2f1:::Lg;t2[t k ;t k+1 ] fAttEU f 0(F q ;t)g;8k: Corollary 3 follows from the fact that the attacker plays a best response given the defender strategy, and it means thatf locally dominatesf 0 if the maximum of attacker expected utilities in each time interval [t k ;t k+1 ] givenf is no greater than that off 0 . 2 We don’t require that there exists at least onek such that DefEU k f > DefEU k f 0. 61 Compared to Definition 7, which gives the standard condition for dominance, local domi- nance is a weaker condition; that is if f dominates f 0 then f locally dominates f 0 . However, the converse is not necessarily true. Intuitively, whereas in Definition 7 the attacker can play any (possibly suboptimal) strategy, here the attacker’s possible deviations from best response are more restricted. As a result, the set of locally dominated strategies includes the set of dominated strategies. From Definition 10, iff locally dominatesf 0 , and the attacker is rational (i.e., still playing a best response) but constrained to attack during some time interval [t k ;t k+1 ], thenf is preferable tof 0 for the defender. A further corollary is that even if the rational attacker is con- strained to attack in the union of some of these intervals, f is still preferable tof 0 iff locally dominatesf 0 . One intuition for the local dominance concept is the following: suppose we suspect the attacker will be restricted to an (unknown) subset of time, due to some logistical constraints. 
Such logistical constraints would likely make the restricted time subset contiguous, or a union of a small number of contiguous sets. Since such sets are well approximated by unions of intervals $[t_k,t_{k+1}]$, local dominance can serve as an approximate notion of dominance with respect to such attackers.

Flow-adjust looks for a defender strategy $f^1$ that locally dominates the original defender strategy $f^0$. To achieve this, we simply adjust the flow distribution variables $f(i,j,k)$ while keeping the marginal probabilities $p(i,k)$ the same. Figure 4.8 shows an example game with two discretized intervals, $[t_1,t_2]$ and $[t_2,t_3]$ (only the first interval is shown). Suppose the maximal attacker expected utility is $5U_0$ in this equilibrium and is attained in the second interval, $[t_2,t_3]$. If the attacker's utility for success is a constant $U_0$ in the first interval $[t_1,t_2]$, then the defender strategy in $[t_1,t_2]$ could be chosen arbitrarily, because the attacker's worst-case expected utility in $[t_1,t_2]$ is smaller than that of the attacker's best response in $[t_2,t_3]$. However, if an attacker is
If we adjust the flows to edge E 1;2;1 and E 2;1;1 , as shown in Figure 4.8(b), the attacker’s maximum expected utility in [t 1 ;t 2 ] is reduced to 0:5U 0 as edgeE 1;2;1 is within the target’s protection range all the way. So a rational attacker who is constrained to attack between [t 1 ;t 2 ] will get a lower expected utility given defender strategyf 1 than givenf 0 , and thus the equilibrium withf 1 is more robust to this kind of deviation on the attacker side. (a) f 0 : the patroller is taking edges E1;1;1 and E2;2;1 with probability 0:5. (b) f 1 : the patroller is taking edges E1;2;1 and E2;1;1 with probability 0:5. Figure 4.8: An example of flow adjust. An rational attacker who is constrained to attack in [t 1 ;t 2 ] will choose to attack around time (t 1 +t 2 )=2 to get utilityU 0 givenf 0 and attack aroundt 1 ort 2 to get utility 0:5U 0 givenf 1 . So in flow-adjust, we construct M 1 new linear programs, one for each time interval [t k ;t k +1 ], k = 1:::M 1 to find a new set of flow distribution probabilities f(i;j;k ) 63 to achieve the lowest local maximum in [t k ;t k +1 ] with unchanged p(i;k ) and p(i;k + 1). The linear program for an interval [t k ;t k +1 ] is shown below. min f(i;j;k ) v f(i;j;k ) = 0; ifjd j d i j>v m t p(i;k + 1) = n X j=1 f(j;i;k );8i2f1:::ng p(i;k ) = n X j=1 f(i;j;k );8i2f1:::ng vAttEU(F q ;t k );8q2f1:::Lg;k2fk ;k + 1g v maxfAttEU(F q ; r+ qk );AttEU(F q ; (r+1) qk )g 8q2f1:::Lg;r2f0:::M qk g While the above linear program appears similar to the linear program of CASS, they have signif- icant differences. Unlike CASS, the marginal probabilitiesp(i;k ) here are known constants and are provided as input and as mentioned above, there is a separate program for each [t k ;t k +1 ]. Thus, we get f(i;j;k ) such that the local maximum in [t k ;t k +1 ] is minimized. Denote the minimum asv 1 k . From the original flow distributionf 0 , we getAttEU f 0(F q ;t) and we denote the original local maximum value in [t k ;t k +1 ] asv 0 k . 
As the subset $\{f^0(i,j,k^*)\}$ of the original flow distribution $f^0$ is a feasible solution of the linear program above, we have $v^1_{k^*} \le v^0_{k^*}$; equality holds for the interval from which the attacker's best response is chosen. Note that any change made to $f(i,j,k)$ in an interval $[t_{k^*},t_{k^*+1}]$ does not affect the performance of $f$ in other intervals, since the marginal probabilities $p(i,k)$ are kept the same; i.e., changing $f(i,j,k^*)$ via the linear program above is independent of any change to $f(i,j,k)$, $k \neq k^*$. So we can solve the $M-1$ linear programs independently. After calculating $f(i,j,k^*)$ for all $k^* = 1 \dots M-1$, we obtain the new defender strategy $f^1$ by combining the solutions of the different linear programs. As $v^1_{k^*} \le v^0_{k^*}$, we have

$\max_{q \in \{1 \dots L\},\, t \in [t_{k^*},t_{k^*+1}]} \mathrm{AttEU}_{f^0}(F_q,t) \ge \max_{q \in \{1 \dots L\},\, t \in [t_{k^*},t_{k^*+1}]} \mathrm{AttEU}_{f^1}(F_q,t)$

for all $k^* = 1 \dots M-1$, i.e., $f^1$ locally dominates $f^0$. On the other hand, while we have restricted the strategies to have the same $p(i,k)$, there may exist another strategy $f^2$ with a different set of $p(i,k)$ that locally dominates $f^1$. Finding locally dominating strategies whose $p(i,k)$ differ from the original is a topic for future research.

Although the two refinement approaches we provide do not necessarily lead to a non-dominated strategy under the corresponding dominance definition, they are guaranteed to find an equilibrium that is more robust (or at least no worse) against constrained attackers than the original equilibrium obtained from CASS. Clearly, these two approaches do not exhaust the space of refinement approaches; other refinements are possible and may lead to equilibria that are better than (e.g., dominate) the one found by CASS.
However, it is likely that different defender strategies resulting from different equilibrium refinements are not comparable to each other in terms of dominance; i.e., against some constrained attackers one equilibrium might turn out to be better, and against other constrained attackers another equilibrium might be better. Their computational costs may differ as well. Thus, understanding this space of refinement approaches in terms of computational cost and output quality, and determining which approach should be adopted under which circumstances, is an important challenge for future work.

4.4 Extension To Two-Dimensional Space

Both DASS and CASS presented in Section 4.2 are based on the assumption that both the targets and the patrollers move along a straight line. However, a more complex model is needed in some practical domains. For example, Figure 4.9 shows a part of the route map of Washington State Ferries, where there are several ferry trajectories. If a number of patrol boats are tasked to protect all the ferries in this area, it is not necessarily optimal to simply assign a ferry trajectory to each patrol boat and calculate the patrolling strategies separately according to the CASS described in Section 4.2. As the ferry trajectories are close to each other, a patrolling strategy that takes into account all the ferries in this area will be much more efficient; e.g., a patroller can first protect a ferry moving from Seattle to Bremerton, and then change direction halfway and protect another ferry moving from Bainbridge Island back to Seattle.

Figure 4.9: Part of the route map of Washington State Ferries

In this section, we extend the previous model to a more complex case, where the targets and patrollers move in a two-dimensional space, and provide the corresponding linear-program-based solution. Again we use a single defender resource as an example and generalize to multiple defenders at the end of this section.
4.4.1 Defender Strategy for 2-D

As in the one-dimensional case, we discretize time and space for the defender in order to calculate the defender's optimal strategy. The time interval T is discretized into a set of time points T = {t_k}. Let G = (V, E) be the graph where the set of vertices V corresponds to the locations that the patrollers may occupy at the discretized time points in T, and E is the set of feasible edges that the patrollers can take. An edge e in E satisfies the patroller's maximum speed limit and possibly other practical constraints (e.g., a small island may block some edges).

4.4.2 DASS for 2-D

When the attack can only occur at the discretized time points, the linear program of DASS described in Section 4.2 can be applied to the two-dimensional setting once the distance in Constraint 4.9 is replaced with the Euclidean distance in 2-D space between nodes V_i and V_j.

$$\min_{f(i,j,k),\,p(i,k)} \; v \quad (4.32)$$
$$f(i,j,k) \in [0,1], \quad \forall i,j,k \quad (4.33)$$
$$f(i,j,k) = 0, \quad \forall i,j,k \text{ such that } \|V_j - V_i\| > v_m\,\delta t \quad (4.34)$$
$$p(i,k) = \sum_{j=1}^{N} f(j,i,k-1), \quad \forall i,\ \forall k > 1 \quad (4.35)$$
$$p(i,k) = \sum_{j=1}^{N} f(i,j,k), \quad \forall i,\ \forall k < M \quad (4.36)$$
$$\sum_{i=1}^{N} p(i,k) = 1, \quad \forall k \quad (4.37)$$
$$v \ge AttEU(F_q, t_k), \quad \forall q,\ \forall k \quad (4.38)$$

Note that f(i, j, k) now represents the probability that a patroller moves from node V_i to V_j during [t_k, t_{k+1}]. Recall from Figure 4.1.1 that a patroller protects all targets within her protective circle of radius r_e. In the one-dimensional space, we only care about the straight line AB, so we used β_q(t) = [max{S_q(t) - r_e, d_1}, min{S_q(t) + r_e, d_N}] as the protection range of target F_q at time t, which is in essence a line segment. In contrast, in the two-dimensional space the whole circle needs to be considered as the protection range, and the extended protection range can be written as β_q(t) = {V = (x, y) : ||V - S_q(t)|| <= r_e}. This change affects the value of I(i, q, k) and thus the value of AttEU(F_q, t_k) in Constraint 4.38.
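The feasible edge set E can be enumerated directly from the speed-limit condition in Constraint 4.34. The sketch below (Python; the function name, vertex encoding, and the optional blocked set are illustrative assumptions, not the dissertation's implementation) keeps an edge (i, j) only when the Euclidean distance between V_i and V_j can be covered within one time step at maximum speed v_m:

```python
import math

def feasible_edges(vertices, v_max, delta_t, blocked=None):
    """Enumerate edges (i, j) a patroller can traverse in one time step.

    An edge from vertex i to vertex j is feasible when the Euclidean
    distance ||V_j - V_i|| does not exceed v_max * delta_t and the edge
    is not ruled out by a practical constraint (e.g., a small island).
    """
    blocked = blocked or set()
    edges = []
    for i, (xi, yi) in enumerate(vertices):
        for j, (xj, yj) in enumerate(vertices):
            if (i, j) in blocked:
                continue
            if math.hypot(xj - xi, yj - yi) <= v_max * delta_t:
                edges.append((i, j))
    return edges
```

Self-loops (i, i) are kept, corresponding to a patroller staying at a node for one time step.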
4.4.3 CASS for 2-D

When the attacking time t can be chosen from the continuous time interval T, we analyze the problem in a way similar to Section 4.2.3. The protection radius is r_e, which means only patrollers located within the circle centered at S_q(t) with radius r_e can protect target F_q. As we assume that the target does not change its speed or direction during [t_k, t_{k+1}], the circle also moves along a line in the 2-D space. If we track the circle in a 3-D space where the x and y axes indicate the position in 2-D and the z axis is the time, we get an oblique cylinder, which is similar to a cylinder except that the top and bottom surfaces are displaced from each other (see Figure 4.10). When a patroller moves from vertex V_i (in V) to vertex V_j during [t_k, t_{k+1}], she protects the target only when she is within the oblique cylinder. In the 3-D space described above, the patroller's movement can be represented as a straight line. Intuitively, there will be at most two intersection points between the patroller's route in 3-D space and the cylinder's surface. This can be proved by analytically calculating the exact times of these intersection points. Assume the patroller is moving from V_1 = (x_1, y_1) to V_2 = (x_2, y_2) and the target is moving from S_q(t_k) = (x̂_1, ŷ_1) to S_q(t_{k+1}) = (x̂_2, ŷ_2) during [t_k, t_{k+1}] (an illustration is shown in Figure 4.10).

Figure 4.10: An illustration of the calculation of intersection points in the two-dimensional setting. The x and y axes indicate the position in 2-D and the z axis is the time. To simplify the illustration, the z axis starts from time t_k. In this example, there are two intersection points, occurring at time points t_a and t_b.

To get the times of the intersection points, we solve a quadratic equation in these coordinate parameters and the protection radius r_e.
If a root of the quadratic equation lies within the interval [t_k, t_{k+1}], the patroller's route intersects the surface at that time point, so there are at most two intersection points. Once we find all these intersection points, the same analysis as in Section 4.2.3 applies and we can again claim Lemma 1. We therefore conclude that we only need to consider the attacker's strategies at these intersection points. We use the same notation θ^r_{qk} as in the one-dimensional case to denote the sorted intersection points and get the following linear program for the 2-D case.

$$\min_{f(i,j,k),\,p(i,k)} \; v \quad (4.39)$$
$$\text{subject to constraints } (4.33 \dots 4.38)$$
$$v \ge \max\{AttEU(F_q, \theta^{r+}_{qk}),\ AttEU(F_q, \theta^{(r+1)-}_{qk})\} \quad (4.40)$$
$$\forall k \in \{1,\dots,M\},\ q \in \{1,\dots,L\},\ r \in \{0,\dots,M_{qk}\}$$

Algorithm 4.2.3 can still be used to add constraints to the linear program of CASS for the 2-D case. The main difference compared to CASS in the 1-D case is that, since Euclidean distance in 2-D is used in Constraint 4.34, we need to use the extended 2-D definition of β_q(t) when deciding the entries of the coefficient matrix A^r_{qk}(i, j).

The detailed calculation for finding the intersection points is shown below. We calculate the times at which the patroller's route intersects the protection range of a target when the patroller is moving from V_1 = (x_1, y_1) to V_2 = (x_2, y_2) and the target is moving from S_q(t_k) = (x̂_1, ŷ_1) to S_q(t_{k+1}) = (x̂_2, ŷ_2) during [t_k, t_{k+1}]. The patroller's position at a given time t in [t_k, t_{k+1}] is denoted (x, y) and the target's position is denoted (x̂, ŷ).
Then we have

$$x = \frac{t - t_k}{t_{k+1} - t_k}(x_2 - x_1) + x_1, \qquad y = \frac{t - t_k}{t_{k+1} - t_k}(y_2 - y_1) + y_1 \quad (4.41)$$
$$\hat{x} = \frac{t - t_k}{t_{k+1} - t_k}(\hat{x}_2 - \hat{x}_1) + \hat{x}_1, \qquad \hat{y} = \frac{t - t_k}{t_{k+1} - t_k}(\hat{y}_2 - \hat{y}_1) + \hat{y}_1 \quad (4.42)$$

At an intersection point, the distance from the patroller's position to the target's position equals the protection radius r_e, so we are looking for a time t such that

$$(x - \hat{x})^2 + (y - \hat{y})^2 = r_e^2 \quad (4.43)$$

By substituting the variables in Equation 4.43 with Equations 4.41–4.42, and denoting

$$A_1 = \frac{(x_2 - x_1) - (\hat{x}_2 - \hat{x}_1)}{t_{k+1} - t_k}, \quad B_1 = x_1 - \hat{x}_1, \quad A_2 = \frac{(y_2 - y_1) - (\hat{y}_2 - \hat{y}_1)}{t_{k+1} - t_k}, \quad B_2 = y_1 - \hat{y}_1,$$

Equation 4.43 can be simplified to

$$(A_1 t - A_1 t_k + B_1)^2 + (A_2 t - A_2 t_k + B_2)^2 = r_e^2. \quad (4.44)$$

Denoting C_1 = B_1 - A_1 t_k and C_2 = B_2 - A_2 t_k, we can easily get the two roots of this quadratic equation, which are

$$t_{a,b} = \frac{-2(A_1 C_1 + A_2 C_2) \pm 2\sqrt{(A_1 C_1 + A_2 C_2)^2 - (A_1^2 + A_2^2)(C_1^2 + C_2^2 - r_e^2)}}{2(A_1^2 + A_2^2)}. \quad (4.45)$$

t_a or t_b is the time of a valid intersection point if and only if it lies within the time interval under consideration, [t_k, t_{k+1}].

For multiple defender resources, the linear program described in Section 4.2.5 again applies when the extended definition of β_q(t) is used to calculate AttEU and Constraint 4.21 is replaced with the following constraint:

$$f(i_1, j_1, \dots, i_W, j_W, k) = 0, \quad \forall i_1, \dots, i_W, j_1, \dots, j_W \text{ such that } \exists u,\ \|V_{j_u} - V_{i_u}\| > v_m\,\delta t.$$

4.5 Route Sampling

We have discussed how to generate an optimal defender strategy in the compact representation; however, the defender strategy will be executed as a complete route, so we need to sample a complete route from the compact representation. In this section, we give two methods of sampling and show the corresponding defender strategy in the full representation when these methods are applied.

The first method is to convert the strategy in the compact representation into a Markov strategy.
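The intersection-time calculation of Equations 4.41–4.45 can be sketched as follows (Python; the function name and input conventions are illustrative assumptions, not the dissertation's implementation). It forms the quadratic of Equation 4.44 and keeps only the real roots that fall inside [t_k, t_{k+1}]:

```python
import math

def intersection_times(patroller_from, patroller_to, target_from, target_to,
                       t_k, t_k1, r_e):
    """Times at which a patroller's straight-line route crosses the boundary
    of the target's protection circle of radius r_e (Equations 4.41-4.45)."""
    (x1, y1), (x2, y2) = patroller_from, patroller_to
    (hx1, hy1), (hx2, hy2) = target_from, target_to
    dt = t_k1 - t_k
    A1 = ((x2 - x1) - (hx2 - hx1)) / dt
    B1 = x1 - hx1
    A2 = ((y2 - y1) - (hy2 - hy1)) / dt
    B2 = y1 - hy1
    C1 = B1 - A1 * t_k
    C2 = B2 - A2 * t_k
    # Quadratic (A1^2 + A2^2) t^2 + 2(A1 C1 + A2 C2) t + (C1^2 + C2^2 - r_e^2) = 0
    a = A1 ** 2 + A2 ** 2
    b = 2 * (A1 * C1 + A2 * C2)
    c = C1 ** 2 + C2 ** 2 - r_e ** 2
    if a == 0:  # identical velocities: distance is constant, no crossing
        return []
    disc = b * b - 4 * a * c
    if disc < 0:  # the route never reaches the circle's boundary
        return []
    roots = sorted([(-b - math.sqrt(disc)) / (2 * a),
                    (-b + math.sqrt(disc)) / (2 * a)])
    return [t for t in roots if t_k <= t <= t_k1]
```

The quadratic has at most two real roots, matching the claim that the patroller's route in 3-D space crosses the oblique cylinder's surface at most twice.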
A Markov strategy in our setting is a defender strategy in which the patroller's movement from t_k to t_{k+1} depends only on the patroller's location at t_k. We denote by π(i, j, k) the conditional probability of moving from d_i to d_j during [t_k, t_{k+1}] given that the patroller is located at d_i at time t_k. In other words, π(i, j, k) represents the chance of taking edge E_{i,j,k} given that the patroller is already located at node (t_k, d_i). Thus, given a compact defender strategy specified by f(i, j, k) and p(i, k), we have

$$\pi(i,j,k) = f(i,j,k)/p(i,k), \quad \text{if } p(i,k) > 0; \quad (4.46)$$

π(i, j, k) can be an arbitrary number if p(i, k) = 0. We can get a sampled route by first determining where to start patrolling according to p(i, 1), and then, for each t_k, randomly choosing where to go from t_k to t_{k+1} according to the conditional probability distribution π(i, j, k). The distribution resulting from this sampling procedure matches the given marginal variables, as each edge E_{i,j,k} is sampled with probability p(i, k) π(i, j, k) = f(i, j, k). This sampling method in effect induces a full representation in which route R_u = (d_{r_u(1)}, d_{r_u(2)}, ..., d_{r_u(M)}) is sampled with probability p(r_u(1), 1) ∏_{k=1}^{M-1} π(r_u(k), r_u(k+1), k), the product of the probability of the initial distribution and the probability of taking each step. This method is intuitively straightforward, and the patrol route can be decided online during the patrol, i.e., the patroller's position at t_{k+1} is decided when the patroller reaches her position at t_k, which makes the defender strategy more unpredictable. The downside of the method is that the number of routes chosen with non-zero probability can be as high as N^M. For the 2-D case, the patroller is located at node V_i at time t_k, and the sampling process is exactly the same when π(i, j, k) is used to denote the probability of moving from V_i to V_j during [t_k, t_{k+1}].

The second method of sampling is based on the decomposition process mentioned in Section 4.3.1 (step (i)).
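The first sampling method (converting to a Markov strategy via Equation 4.46, then drawing a route node by node) can be sketched as follows (Python; the function names and the nested-list encoding of f and p are illustrative assumptions, not the dissertation's implementation):

```python
import random

def to_markov(f, p):
    """Equation 4.46: conditional probability f(i, j, k) / p(i, k) when p > 0.

    f[k][i][j] and p[k][i] encode the compact strategy for steps k = 1..M-1;
    rows with p(i, k) = 0 are left all-zero (the value is arbitrary there).
    """
    return [[[fk[i][j] / pk[i] if pk[i] > 0 else 0.0
              for j in range(len(pk))]
             for i in range(len(pk))]
            for fk, pk in zip(f, p)]

def sample_route(p1, markov, rng=random):
    """Sample a route: start according to p(i, 1), then step conditionally."""
    nodes = range(len(p1))
    route = [rng.choices(nodes, weights=p1)[0]]
    for step in markov:  # one transition matrix per interval [t_k, t_{k+1}]
        route.append(rng.choices(nodes, weights=step[route[-1]])[0])
    return route
```

Because each step depends only on the patroller's current node, the route can equally be decided online during the patrol, one transition at a time.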
As we discussed above for the first sampling method, sampling is essentially restoring a full representation from the compact representation. As shown in Table 4.2, there are multiple ways to assign probabilities to different routes, and the decomposition process of "route-adjust" constructively defines one of them. So we can make use of the information we get from that process and sample a route according to the probability assigned to each decomposed route. The number of routes chosen with non-zero probability is at most N^2 M, much less than with the first method, and it thus becomes feasible to describe the strategy in full representation by providing only the routes that are chosen with positive probability.

Different sampling approaches may be necessitated by different application requirements. Some applications might require that the defender obtain a strategy in full representation and be presented only a small number of pure strategies. For other applications, a strategy that can be decided online, potentially with a hand-held smartphone as in (Luber, Yin, Fave, Jiang, Tambe, & Sullivan, 2013), may be preferred. Therefore, based on the needs of the application, different sampling strategies might be selected.

4.6 Evaluation

We use different settings in the ferry protection domain and compare performance in terms of the attacker's expected utility AttEU(F_q, t). As it is a zero-sum game, a lower value of AttEU indicates a higher defender expected utility.

We run experiments for both the 1-D and 2-D settings. We evaluate the performance of CASS, show the sampling results, and also evaluate the improvement from the two refinement approaches for 1-D. Section 4.6.1 shows our results for the 1-D setting; Section 4.6.2 for the 2-D setting.
4.6.1 Experiments for One-Dimensional Setting

For the 1-D setting, we first evaluate the performance of the solvers and then show how much the performance can be improved by the refinement methods. We also show sampled routes for an example setting and evaluate CASS with varying numbers of patrollers.

4.6.1.1 Experimental Settings

We used the following setting for the experiments in the one-dimensional case. This is a complex spatio-temporal game, rather than a discrete security game as in most previous work. There are three ferries moving between terminals A and B, and the total distance AB = 1. The simulation time is 30 minutes. The schedules of the ferries are shown in Figure 4.11, where the x-axis indicates the time and the y-axis the distance from terminal A. Ferry 1 and Ferry 3 move from A to B while Ferry 2 moves from B to A. The maximum speed for patrollers is v_m = 0.1/min and the protection radius is r_e = 0.1. Experiments in the one-dimensional case use 2 patrollers (with C_1 = 0.8 and C_2 = 1.0), except in Section 4.6.1.5 where we report on experiments with different numbers of patrollers.

Figure 4.11: Schedules of the ferries

4.6.1.2 Performance of Solvers

We compare the strategies calculated by CASS with DASS and a baseline strategy. In the baseline strategy, the two patrollers choose one ferry with probability 1/3 (uniformly at random) and move alongside it to offer it full protection, leaving the other two ferries unprotected (a strategy observed in practice). First, we wished to stress-test CASS with more complex utility functions than in the realistic case that follows. We therefore tested under four different discretization levels (details about the discretization levels are included in Table 4.5) with random utilities, and at each discretization level, we created 20 problem instances. The problem instances differ across levels.
In this ferry protection domain, the utility function for each ferry usually depends on the ferry's position, so each instance has utilities uniformly randomly chosen from [0, 10] at the discretized distance points; an example is shown in Figure 4.12(a). The chosen discretization levels ensure that U_q(t) is linear in t within each time interval [t_k, t_{k+1}] for each target F_q. In Figure 4.12(a), the x-axis indicates the distance d from terminal A, and the y-axis indicates the utility of a successful attack if the ferry is located at distance d. In Figure 4.12(b), the x-axis plots the four discretization levels and the y-axis plots the attacker's expected utility when he plays a best response, averaged over the 20 instances, for the baseline, DASS, and CASS. CASS is shown to outperform DASS and the baseline, and the differences are statistically significant (p < 0.01). Note that different sets of instances are generated for different discretization levels, so we cannot compare the results across levels directly; the comparison is nevertheless helpful for understanding the models. From the figure, we find that the solution quality of DASS varies a lot and can sometimes be worse than the naive strategy (e.g., at level 1). This is because DASS calculates an optimal solution that considers only attacks at the discretized time points. In Figure 4.12(b), solution quality is measured by AttEU_m, which is calculated as the maximum over the continuous attacker strategy set. The gap between the optimal objective value of DASS and the actual AttEU_m given the optimal solution of DASS may vary across strategies and discretization levels. Another interesting observation is that the average solution quality of CASS is almost the same across all discretization levels. Despite the difference in instance sets, this result implies that the improvement from a finer discretization may be limited for CASS.
Figure 4.12: Performance under different randomized utility function settings: (a) randomized attacker utility function; (b) average solution quality of NAIVE, DASS, and CASS. The utility function in this set of experiments is a function of the distance to Terminal A; it is piece-wise linear and its value at the discretized distance points d_i is chosen randomly from [0, 10].

Next, we turn to a more realistic utility function in this ferry domain, which is U-shaped or inverse-U-shaped. Figure 4.13(a) shows a sample utility curve where the attacker gains higher utility closer to the shore. We fix the utility at the shore at 10, vary the utility in the middle (denoted U_mid), which is the value at the floor of the U-shape or the top of the inverse U-shape, and evaluate the strategies. In Figure 4.13(b), U_mid is shown on the x-axis, and we compare the performance of the strategies on the y-axis in terms of the attacker's expected utility when he plays the best response. We conclude that 1) the strategy calculated by CASS outperforms the baseline and DASS, and 2) DASS may achieve worse results than the baseline.

Figure 4.13: Performance under different realistic utility function settings: (a) realistic attacker utility function with U_mid = 5; (b) solution quality of the different strategies. The utility function is U-shaped or inverse-U-shaped; the utility around distance 0.5 is denoted U_mid. We compare the defender strategies given by DASS and CASS with the baseline as U_mid varies from 1 to 20.

Figure 4.14: The attacker's expected utility function given the defender strategy calculated by DASS vs. CASS under the example setting.
The expected utilities at the discretized time points are indicated by squares for CASS and dots for DASS. The maximum of AttEU under CASS is 3.82, 30% less than the maximum of AttEU under DASS, which is 4.99.

Among all these different experimental settings of discretization and utility function, we choose one instance and provide a more detailed analysis of it; we refer to this instance as the example setting in the rest of this section. In this example setting, discretization level 4 is used, the utility curve is as shown in Figure 4.13(a), and the other parameters are as described in Section 4.6.1.1. Figure 4.14 compares the attacker's expected utility function when DASS and CASS are used, respectively. The x-axis indicates the time t, and the y-axis indicates the attacker's expected utility if he attacks Ferry 1 at time t. For the strategy calculated by DASS, the worst performance at the discretized time points is 3.50 (AttEU(F_1, 20)); however, the supremum of AttEU(F_1, t), t in [0, 30], can be as high as 4.99 (AttEU(F_1, 4+)), which experimentally shows that taking into consideration attacks between the discretized time points is necessary. For the strategy calculated by CASS, the supremum of AttEU(F_1, t) is reduced to 3.82.

4.6.1.3 Improvement Using Refinement Methods

We compare the refinement approaches described in Section 4.3 and analyze the tradeoff between performance improvement and runtime. Three approaches are considered for comparison: route-adjust, flow-adjust, and a variation of route-adjust denoted route-adjust2. In step (ii) of route-adjust, we replace every node in the route one by one in sequence.³ In step (ii) of route-adjust2, we replace every consecutive pair of nodes in the route in sequence. We first show results for the example setting. In Figure 4.15(a), we compare the AttEU(F_q, t) function of the defender strategy given by CASS with that of the strategy after route-adjust, for Ferry 1.
It shows that for an attack aiming at any target at any time, the defender strategy after route-adjust refinement is equally good or better than the one in the original equilibrium; thus the defender performs equally well or better no matter how the attacker is constrained in time, i.e., the defender strategy after route-adjust dominates the original strategy. Figure 4.15(b) compares the AttEU functions of the defender strategies after route-adjust and after route-adjust2 for Ferry 1.

³In supplementary experiments, we also tested route-adjust with more iterations, e.g., repeating the process of replacing every node in sequence five times. The extra benefit is insignificant while the runtime increases proportionally to the number of iterations. In light of this, we choose to replace each node only once in the experiments reported in this chapter.

Figure 4.15: Performance of equilibrium refinement approaches: (a) AttEU function of Ferry 1 after route-adjust (one node at a time); (b) AttEU function of Ferry 1 after route-adjust2 (two nodes at a time); (c) performance of flow-adjust.

The strategy after route-adjust2 does not dominate the one after route-adjust, but overall the former appears to perform better than the latter more frequently and by larger amounts. Using the average value of the AttEU function as a performance metric, route-adjust2 is better than route-adjust in this example setting, as shown later in Table 4.6. Figure 4.15(c) compares the AttEU function of the defender strategy given by CASS with that of the defender strategy after flow-adjust for Ferry 1.
The strategy given by CASS is not dominated by the one after flow-adjust under Definition 7, but if we examine the maximum of AttEU in each time interval [t_k, t_{k+1}], as shown in Table 4.7, we find that the defender strategy after flow-adjust locally dominates the original strategy.

We list the worst-case and average performance of the AttEU function over all ferries in this example setting for four defender strategies (CASS, route-adjust, route-adjust2, flow-adjust) in Table 4.6, from which we conclude that 1) the worst-case performance of all four strategies is the same, which means the defender achieves exactly the same expected utility against an unconstrained rational attacker; 2) the average performance of flow-adjust is slightly better than CASS but is outperformed by route-adjust and route-adjust2, while flow-adjust takes much less time to run than the other two; 3) in this example setting, adjusting two consecutive nodes at a time performs better than adjusting only one node at a time, but the difference is not significant and it is much more expensive in terms of runtime.

Figure 4.16(a) and Figure 4.16(b) show the maximum and the average improvement of route-adjust, route-adjust2, and flow-adjust, averaged over the 20 instances of Level 4 with randomized utilities that were used for Figure 4.12(b); Figure 4.16(c) shows the average runtime. The maximum improvement is the largest difference between the AttEU function given the defender strategy calculated by CASS and the one after refinement; the average improvement is the average difference between the two functions. The standard deviations over all instances are shown as error bars. Figure 4.16 confirms that all the refinement approaches improve the defender strategy calculated by CASS in terms of both maximum and average performance, and thus provide better defender strategies against possibly constrained attackers.
Route-adjust2 achieves the most improvement, then route-adjust, and flow-adjust the least. Flow-adjust achieves much less improvement than the other two approaches. One explanation is that its constraints are very strong, requiring all marginal probabilities to remain unchanged, so it is likely that only small changes are made to the original defender strategy. The difference between route-adjust2 and route-adjust is not as significant. Regarding runtime, flow-adjust is the least expensive, route-adjust second, and route-adjust2 the most; route-adjust2 is significantly more expensive than the other two. We therefore conclude that route-adjust is the better choice considering the tradeoff between improvement and runtime.

Figure 4.16: Comparison of refinement approaches: (a) average of maximal improvement; (b) average of average improvement; (c) average of runtime.

4.6.1.4 Sampled Routes

We first convert the defender strategy under the example setting into a Markov strategy and sample 1000 pairs of patrol routes. The defender strategy used here is the one after route-adjust. In each sample, a pair of routes is chosen step by step for the two patrol boats according to the joint conditional probability distribution {π(i_1, j_1, i_2, j_2, k)}. The routes for the two patrol boats are chosen simultaneously as they coordinate with each other. We cannot show each pair separately for all 1000 samples. Instead, Figure 4.17(a) shows, for each edge, the frequency with which it was taken over the 1000 samples. The x-axis indicates the time and the y-axis the distance to terminal A. The width of each edge indicates the frequency with which it is chosen by at least one patroller.
Although Figure 4.17(a) does not precisely depict the samples, it provides a rough view of how the routes are taken by the patrol boats.

Figure 4.17(b) shows the pair of routes with the highest probability when we use the decomposition method of sampling. The solid lines show the patrol boats' routes and the dashed lines show the ferries' schedules. We obtain 3958 different pairs of patrol routes in total in the decomposition process, and the pair shown is chosen with probability 1.57%.

Figure 4.17: Results for sampling under the example setting: (a) frequency with which each edge is chosen when the first sampling method, based on the Markov strategy, is used; (b) decomposed routes with highest probability, superimposed on the ferry schedules, when the second sampling method, based on decomposition, is used.

4.6.1.5 Number of Patrollers

Figure 4.18(a) shows the improvement in performance of CASS with an increasing number of patrollers under discretization Level 1. The x-axis shows the number of patrollers and the y-axis indicates the attacker's maximal expected utility, i.e., his expected reward when he plays his best response, averaged over 20 random utility settings of discretization Level 1. With fewer patrollers, the performance of the defender varies a lot depending on the randomized utility function (as indicated by the standard deviation shown as error bars). But the variance becomes much smaller with more patrollers, which means the defender has sufficient resources across the different instances. Figure 4.18(b) shows the runtime of CASS; the y-axis indicates the average natural logarithm of the runtime. Not surprisingly, the runtime increases with the number of patrollers.
Figure 4.18: Performance with varying number of patrollers: (a) solution quality at Level 1; (b) runtime at Level 1; (c) solution quality at Level 2; (d) runtime at Level 2.

Figure 4.18(c) and 4.18(d) show the average performance and runtime of CASS with discretization Level 2, using the same set of utility settings as used in Level 1. Only results for 1 to 3 patrollers are shown; the program runs out of memory for four patrollers, as there are N^8 M = 2734375 flow distribution variables and at least N^4 M = 8757 constraints. Note that the average solution quality at Level 2 is better than at Level 1 (e.g., the average attacker EU for 1 patroller is 4.81 at Level 1 and 4.13 at Level 2), which indicates that a higher level of granularity can improve the solution quality. However, granularity clearly affects the ability to scale up, which means we need to consider the tradeoff between solution quality and the memory used; one way to combat the scale-up problem is to reduce the level of granularity. Nonetheless, the number of patrollers we have encountered in real-world scenarios such as New York is on the order of 3 or 4, so CASS is capable at least for key real-world scenarios.

4.6.1.6 Approximation Approach for Multiple Defender Resources

We tested the first approximation approach for multiple defender resources described in Section 4.2.5 on the example setting. We used the fmincon function with the interior-point method in MATLAB to minimize the non-linear objective function (Equation 4.27). Table 4.8 lists the runtime and the objective value achieved for different iteration limits (denoted MaxIter).
The function is not guaranteed to provide a feasible solution when the iteration limit is not large enough, as shown in the first two rows. We compared the result with our LP formulation of DASS, which was implemented in MATLAB using the linprog function. DASS can be solved within 8.032 seconds and provides an optimal solution with AttEU_m = 3.5; the approximation approach is outperformed in both runtime efficiency and solution quality. It fails to provide a feasible solution efficiently, and even when sufficient time is given (more than 400 times the runtime of the LP formulation), the maximum attacker expected utility is 18% larger than the optimal solution. This is mainly because the new formulation in the approximation approach is no longer linear or convex, making it difficult to find a global optimum.

4.6.2 Experiments for Two-Dimensional Setting

The settings in 2-D space are more complex even with a single patroller. Here we show an example setting motivated by the ferry system between Seattle, Bainbridge Island, and Bremerton shown in Figure 4.9. In this example setting, three terminals (denoted A, B, and C) are non-collinear in the 2-D space, as shown in Figure 4.19(a). Ferry 1 and Ferry 2 move on the trajectory between Terminals B and C (denoted Trajectory 1), and Ferry 3 and Ferry 4 move on the trajectory between Terminals B and A (denoted Trajectory 2).
The schedules of the four ferries are shown in Figure 4.19(b), where the x-axis is the time and the y-axis is the distance from the common terminal B. Ferry 1 moves from C to B, Ferry 2 from B to C, Ferry 3 from B to A, and Ferry 4 from A to B. As in the one-dimensional ferry scenario, we assume the utility is decided by the ferry's position; the utility function is shown in Figure 4.19(c), where the x-axis is the distance from the common terminal B and the y-axis is the utility for each of the two trajectories. The 2-D space is discretized into a grid as shown in Figure 4.19(d), with grid spacing 1.5 on the x-axis and 1 on the y-axis. A patroller is located at one of the intersection points of the grid graph at any discretized time point. The simulation time is 60 minutes and M = 13, i.e., t_{k+1} - t_k = 5 minutes. The speed limit for the patroller is v_m = 0.38, and all the available edges that a patroller can take during [t_k, t_{k+1}] are shown in Figure 4.19(d). Only one patroller is involved. The protection radius is set to r_e = 0.5, and the protection coefficient is C_1 = 0.8.

Figure 4.19: An example setting in two-dimensional space: (a) three terminals; (b) ferry schedules; (c) utility function; (d) available edges.
[Figure 4.20: Experimental results under two-dimensional settings. (a) Solution quality of DASS and CASS for Ferry 2; (b) Sampled route 1 superimposed on ferry trajectories; (c) Sampled route 2 superimposed on ferry trajectories.]

Figure 4.20(a) compares the performance of DASS and CASS for Ferry 2. Ferry 2 is chosen because in both strategies, the attacker's best response is to attack Ferry 2. The x-axis is the time t, and the y-axis is the attacker expected utility of attacking Ferry 2 at time t. The maximum of AttEU under CASS is 6.1466, 12% lower than the result of DASS, which is 6.9817. Figures 4.20(b) and 4.20(c) show two sampled routes given the strategy calculated by CASS on the 2-D map, where the dashed lines represent the ferry trajectories. The patroller starts from the node labeled "start", follows the arrowed route, and ends at the node labeled "end" at the end of the patrol. She may stay at the nodes labeled "stay". The patrol routes are shown in an intuitive way but can be ambiguous; the exact route should be listed as a table with time and position. The routes are sampled based on the converted Markov strategy, and the total number of patrol routes that may be chosen with non-zero probability is 4.49 × 10^10.

4.7 Chapter Summary

This chapter makes several contributions to computing optimal strategies given moving targets and mobile patrollers. First, we introduce MRMT_sg, a novel Stackelberg game model that takes spatial and temporal continuity into consideration. In this model, targets move with fixed schedules and the attacker chooses his attacking time from a continuous time interval. Multiple mobile defender resources protect the targets within their protection radius, bringing continuous space into our analysis.
Second, we develop a fast solution approach, CASS, based on compact representation and sub-interval analysis. The compact representation dramatically reduces the number of variables in designing the optimal patrol strategy for the defender. Sub-interval analysis reveals the piecewise linearity of the attacker expected utility function and shows that there is a finite set of dominating strategies for the attacker. Third, we propose two approaches for equilibrium refinement of CASS's solutions: route-adjust and flow-adjust. Route-adjust decomposes the patrol routes, greedily improves the routes, and composes the new routes together to get the new defender strategy. Flow-adjust is a fast and simple algorithm that adjusts the flow distribution to achieve optimality in each time interval while keeping the marginal probabilities at the discretized time points unchanged. Additionally, we provide detailed experimental analyses in the ferry protection domain. CASS has been deployed by the US Coast Guard since April 2013.

There are several important avenues for future work. These include: (i) using a decreasing function to model the protection provided to the targets instead of a fixed protection radius; (ii) handling practical constraints on patrol boat schedules, as not all are easily implementable; (iii) efficiently handling more complex and uncertain target schedules and utility functions. Here we provide an initial discussion of relaxing the assumptions that we listed in Section 4.1 and used throughout the chapter:

• If we allow for complex and uncertain target schedules, we may model the problem as a game where the targets follow stochastic schedules. Our framework may still apply but may need to be enriched (e.g., using approaches such as MDPs to represent defender strategies; see (Jiang, Yin, Zhang, Tambe, & Kraus, 2013)). Coordinating multiple such defenders then becomes an important challenge.
It may be helpful in such cases to appeal to more of the prior work on multi-agent teamwork, given the significant uncertainty, which leads to more need for on-line coordination (Tambe, 1997; Stone, Kaminka, Kraus, & Rosenschein, 2010; Kumar & Zilberstein, 2010; Yin & Tambe, 2011).

• If we focus on environments where multiple attackers can coordinate their attacks, then we may need to further enhance our framework. Prior results from (Korzhyk, Conitzer, & Parr, 2011) over stationary targets and discrete time would be helpful in addressing this challenge, although the case of moving targets in continuous space and time provides a very significant challenge. Combined with the previous item for future work, a complex multiple-defender, multiple-attacker scenario would appear to be a very significant computational challenge.

Notation — Meaning
MRMT — The problem of multiple Mobile Resources protecting Moving Targets.
MRMT_sg — Game model with a continuous set of strategies for the attacker for MRMT.
L — Number of ferries.
F_q — Ferry with index q.
A, B — Terminal points.
T — Continuous time interval or a finite set of time points.
D — Continuous space of possible locations or a set of distance points.
S_q(t) — Ferry schedule: position of the target F_q at a specified time t.
W — Number of patrollers.
P_u — Patroller with index u.
v_m — Speed limit of patroller.
r_e — Protection radius of patroller.
C_G — Probability that the attacker can be stopped with G patrollers.
U_q(t) — Positive reward of a successful attack on target F_q at time t for the attacker.
M — Number of discretized time points.
N — Number of discretized distance points.
t_k — Discretized time point.
d_i — Discretized distance point.
δt — Distance between two adjacent time points.
R_u — Patrol route for patroller P_u. Under discretization of the defender's strategy space, R_u can be described as a vector.
r_u(k) — The patroller is located at d_{r_u(k)} at time t_k.
f(i, j, k) — Flow distribution variable.
    Probability that the patroller moves from d_i to d_j during time [t_k, t_{k+1}].
p(i, k) — Marginal distribution variable. Probability that the patroller is located at d_i at time t_k.
E_{i,j,k} — The directed edge linking nodes (t_k, d_i) and (t_{k+1}, d_j).
p(R_u) — Probability of taking route R_u.
AttEU(F_q, t) — Attacker expected utility of attacking target F_q at time t.
β_q(t) — Protection range of target F_q at time t.
ω(F_q, t) — Probability that the patroller is protecting target F_q at time t.
I(i, q, k) — Whether a patroller located at d_i at time t_k is protecting target F_q.
L^1_q, L^2_q — Lines of S_q(t) ± r_e.
θ^r_{qk} — The r-th intersection point in [t_k, t_{k+1}] with respect to target F_q.
AttEU(F_q, θ^{r±}_{qk}) — Left/right-side limit of AttEU(F_q, t) at θ^r_{qk}.
M_{qk} — Number of intersection points in [t_k, t_{k+1}] with respect to target F_q.
A^r_{qk}(i, j) — C_1 if a patroller taking edge E_{i,j,k} can protect target F_q in [θ^r_{qk}, θ^{r+1}_{qk}]; 0 otherwise.
E(u, k) — Short for E_{r_u(k), r_u(k+1), k}.
Table 4.1: Summary of notations involved in the chapter.

Full Representation 1: R_1 = (d_1, d_1, d_1), R_2 = (d_1, d_1, d_2), R_3 = (d_2, d_1, d_1), R_4 = (d_2, d_1, d_2)
Full Representation 2: R_1 = (d_1, d_1, d_1), R_2 = (d_1, d_1, d_2), R_3 = (d_2, d_1, d_1), R_4 = (d_2, d_1, d_2)
Table 4.2: Two full representations that can be mapped into the same compact representation shown in Figure 4.3.

R_u                      p(R_u) after decomposition   Adjusted Routes
R_1 = (d_1, d_1, d_1)    0.2                          (d_1, d_1, d_2) = R_2
R_2 = (d_1, d_1, d_2)    0.4                          (d_1, d_1, d_2) = R_2
R_3 = (d_2, d_1, d_1)    0.4                          (d_2, d_1, d_2) = R_4
Table 4.3: Step (ii): Adjust each route greedily.

R_u                      p'(R_u) after adjustment
R_1 = (d_1, d_1, d_1)    0
R_2 = (d_1, d_1, d_2)    0.6
R_3 = (d_2, d_1, d_1)    0
R_4 = (d_2, d_1, d_2)    0.4
Table 4.4: Step (iii): Compose a new compact representation.

Level   Δt (minutes)   M    Δd      N
1       10             4    0.5     3
2       5              7    0.25    5
3       2.5            13   0.125   9
4       2              16   0.1     11
Table 4.5: Details about discretization levels.
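Step (iii) of route-adjust, illustrated in Tables 4.3 and 4.4, composes a compact representation by accumulating route probabilities on the edges each route traverses. A minimal sketch of that composition step, using the adjusted routes and probabilities from Table 4.4 (string labels "d1", "d2" stand in for the discretized distance points):

```python
from collections import defaultdict

# Sketch of step (iii) of route-adjust: compose a compact representation from
# route probabilities by summing, for each edge (i, j, k), the probabilities of
# the routes that traverse it. Routes and probabilities follow Table 4.4.
routes = {("d1", "d1", "d2"): 0.6,   # R2 with probability 0.6
          ("d2", "d1", "d2"): 0.4}   # R4 with probability 0.4

flow = defaultdict(float)            # flow[(i, j, k)] approximates f(i, j, k)
for route, prob in routes.items():
    for k in range(len(route) - 1):
        flow[(route[k], route[k + 1], k)] += prob
```

In this toy instance the composed flow places probability 0.6 on the edge (d1, d1) and 0.4 on (d2, d1) in the first interval, and probability 1 on (d1, d2) in the second, since both surviving routes traverse that edge.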
In the experiments mentioned in this section, the distance space is evenly discretized, parameterized by Δd = d_{i+1} − d_i.

Strategies      Worst Case Performance   Average Performance   Runtime (minutes)
CASS            3.82                     3.40                  -
Route-Adjust    3.82                     2.88                  8.96
Route-Adjust2   3.82                     2.76                  32.31
Flow-Adjust     3.82                     3.34                  0.50
Table 4.6: Comparison of different refinement approaches in terms of average performance and runtime. Only the runtime for the refinement process is calculated.

Time interval   Max before   Max after    Time interval   Max before   Max after
[2, 4]          3.7587       3.6675       [16, 18]        3.8111       3.7291
[4, 6]          3.8182       3.8182       [18, 20]        3.8182       3.8182
[6, 8]          3.8153       3.6164       [20, 22]        3.8182       3.8182
[8, 10]         3.8137       3.6316       [22, 24]        3.8182       3.8182
[10, 12]        3.8052       3.6316       [24, 26]        3.8182       3.8182
[12, 14]        3.8050       3.5664       [26, 28]        3.8182       3.8182
[14, 16]        3.7800       3.2100       [28, 30]        3.8182       3.8182
Table 4.7: The maximum of the attacker's expected utility in each time interval [t_k, t_{k+1}] decreases after flow-adjust is used.

MaxIter   Runtime (sec)   AttEU_m
3000      4.14            infeasible
10000     17.21           infeasible
900000    3298            4.0537
Table 4.8: Performance of the approximation approach.

Chapter 5

Reasoning in Continuous Space

In addition to reasoning in continuous time, my thesis addresses the problem of reasoning in continuous space when protecting a large area. This work is motivated by the challenge in a "green security game" where policy-makers try to design patrol strategies for protecting forest areas from illegal extraction. Illegal extraction of fuelwood or other natural resources from forests is a problem confronted by officials in many developing countries, with only partial success (MacKinnon, MacKinnon, Child, & Thorsell, 1986; Dixon & Sherman, 1990; Clarke, Reed, & Shrestha, 1993; Robinson, 2008).
To cite just two examples, Tanzania's Kibaha Ruvu Forest Reserves are "under constant pressure from the illegal production of charcoal to supply markets in nearby Dar es Salaam" (http://www.tfcg.org/ruvu.html), and illegal logging is reported to be "decimating" the rosewood of Cambodia's Central Cardamom Protected Forest (see Fig. 5.1). In many cases, forest land covers a large area, which the local people may freely visit. Rather than protecting the forest by denying extractors entry to it, therefore, protective measures take the form of patrols throughout the forest, seeking to observe and hence deter illegal extraction activity (Lober, 1992; Sinclair & Arcese, 1995).

[Footnote: The first author of the work in this chapter is Matthew P. Johnson.]

With a limited budget, a patrol strategy will seek to distribute the patrols throughout the forest, to minimize the resulting amount of extraction that occurs or to protect as much of the forest as possible. The extraction-preventing benefits of patrols are twofold: extraction is prevented directly, when would-be extractors are caught in the act, and also indirectly, through deterrence.

[Figure 5.1: "A truck loaded with illegally cut rosewood passes through Russey Chrum Village...in the Central Cardamom Protected Forest." Photo from (Boyle, 2011).]

The problem setting to be addressed differs from those considered in previous works on security games, most crucially in that the forest protection setting is essentially continuous rather than discrete, both spatially and in terms of player actions. In the existing problems, there are a finite number of discrete locations to protect, whereas ideally the entire forest area would be protected from extraction. To address this problem, I considered a Stackelberg game in which the defender publicly chooses a (mixed) patrol strategy in the form of a patrol density distribution over the two-dimensional protected region, i.e.,
a probability distribution from which to select patrols; in response, the extractor then chooses whether or not to extract, or to what degree. Previous work in forest economics has provided an influential forest protection model (Albers, 2010) (see also (Robinson, Albers, & Williams, 2008, 2011)), in which there is a circular forest surrounded by villages (hence potential extractors); the task is to distribute the patrols' probability density across the region of interest; the objective is to minimize the distance by which the extractors will trespass into the forest (since nearby villagers will extract as a function of this distance (Hofer, Campbell, East, & Huish, 2000)) and hence to maximize the size of the resulting pristine forestland. The Stackelberg game I consider is a game-theoretic extension of this model, with additional features such as permitting spatial variation in patrol density, multiple patrol units, and convex polygon-shaped forests.

To reason about the attacker's strategy in this continuous setting, I provided a detailed benefit-cost analysis of the attacker. As the extractors go a distance d into the protected area, they incur a cost and gain a benefit if not caught, based on an increasing marginal cost function c(d) and a decreasing marginal benefit function b(d). (The instantaneous or marginal cost and benefit functions are the derivatives of the functions specifying the cumulative costs and benefits, respectively, of walking that far into the forest.) A given patrol strategy will reduce the extractor's expected benefit for an incursion of distance d from b(d) to some value b_p(d). Based on this analysis, I proposed an efficient algorithm that calculates the optimal patrol strategy. The general idea of the optimal patrol strategy is to reduce the extractor's expected marginal benefit to a level equal to his expected marginal cost, i.e., b_p(d) = c(d).
Thus, the attacker has no incentive to go further, and the area is effectively protected. I also provided a 1/2-approximation algorithm that calculates a ring patrol strategy, i.e., one in which all patrol resources are distributed on a thin ring somewhere in the protected area.

Economists have studied the relationship generally between enforcement policy for protecting natural resources and the resulting incentives for neighbors of the protected area (Milliman, 1986; Robinson, 2008; Sanchirico & Wilen, 2001). Our point of departure in this chapter is the influential forest protection model of (Albers, 2010) (see also (Robinson et al., 2008, 2011)), in which there is a circular forest surrounded by villages (hence potential extractors); the task is to distribute the patrols' probability density across the region of interest; the objective is to minimize the distance by which the extractors will trespass into the forest and hence (since nearby villagers will extract as a function of this distance (Hofer et al., 2000)) to maximize the size of the resulting pristine forestland.

We strengthen this model in several ways, permitting spatial variation in patrol density, multiple patrol units, and convex polygon-shaped forests. As has been observed (Albers, 2010), exogenous legal restrictions on patrol strategies, such as requiring homogeneous patrols, can degrade protection performance (MacKinnon et al., 1986; Hall & Rodgers, 1992). Unlike the existing work on this model, we bring algorithmic analysis to bear on the problem. Specifically, we show that while certain such allocations can perform arbitrarily badly compared to the optimal, provably approximate or near-optimal allocations can be found efficiently.

5.1 Problem Setting

In this section we present the forest model of (Albers, 2010) and formulate a corresponding optimization problem. Villagers are distributed about the forest perimeter (see Fig.
5.2), which is initially assumed to be a circular region of radius 1, though we later extend to convex polygons. An extractor's action is to choose some distance d to walk into the forest, extracting on the return trip. We may assume, without loss of generality, that the extractor's route goes the chosen distance d towards the forest center (on a straight line), before reversing back to his starting point P on the perimeter. To see this, observe that all possible paths from P will sweep out a lens-like shape but, since all points on the perimeter are possible starting points, the set of all trespass paths directed towards the center sweeps out the same area.

[Figure 5.2: The forest, with the pristine area shaded.]

Given our objective of maximizing pristine forest area, this holds true even if extractors are distributed around the perimeter nonuniformly, as long as there is a nonzero probability of villager presence at each point on the perimeter. Due to symmetries and the fact that extractors' decisions are uncoordinated, the problem is essentially one-dimensional. Extractors incur a cost and gain a benefit if not caught, based on an increasing marginal cost function c(d) and a decreasing marginal benefit function b(d). (The instantaneous or marginal cost and benefit functions are the derivatives of the functions specifying the cumulative costs and benefits, respectively, of walking that far into the forest.) If caught, the extractor's benefit is 0 (the extracted resources are confiscated) but the cost is unchanged (the extractor's traveled distance does not change; there is no positive punishment beyond the confiscation itself and being prevented from engaging in further extraction while leaving the forest). Since extraction can be assumed to occur only on the return trip, and given the nature of the punishment, we may restrict our attention to detection on the return trip.
Thus a given patrol strategy will reduce the extractor's expected benefit for an incursion of distance d from b(d) to some value b_p(d). For a sufficiently fast-growing cost function relative to the benefit function, there will be a "natural core" of pristine forest even with no patrolling at all (Albers, 2010); that is, the optimal trespass distance will be less than 1, since the marginal cost of extraction will eventually outweigh the marginal benefit, corresponding to the point at which the curves b(d) and c(d) intersect (see Fig. 5.3). The overall result of choosing a given patrol strategy, therefore, is to transform the benefit curve b(d) into a lower benefit curve b_p(d), thus reducing the extractor's optimal incursion distance (see Fig. 5.3). In the language of mathematical morphology (Soille, 2004), the pristine forest area P due to a given patrol strategy will be an erosion P = F ⊖ B of the forest F by a shape B, where B is a circle whose radius equals the trespass distance. The erosion is the locus of points reached by the center of B as it moves about inside of F.

Notation. b(x), c(x), φ(x) are the marginal benefit, cost, and capture probability functions, respectively. B(x), C(x), Φ(x) are the corresponding cumulative functions. d_p for p ∈ {n, o, r} is the trespass distance under no patrols, the optimal patrol allocation, and the best ring allocation, respectively. r_p is the radius of the pristine forest area under some patrol p. (Similarly, b_p(x), B_p(x).) d_n − d_p is the reduction in trespass distance under this patrol.

Definition 11. Let OPT(I) be the optimal solution value of a problem instance I, and let ALG(I) be the solution value computed by a given algorithm. An algorithm for a maximization problem is a c-approximation (with c < 1) if, for every problem instance I, we have ALG(I) ≥ c · OPT(I).

The leader has a budget E specifying a bound on the total detection probability mass that can be distributed across the region.
The task is to choose an allocation in order to minimize the extractor's resulting optimal trespass distance, which is equivalent to maximizing the trespass distance reduction and implies maximizing the pristine radius. Note that our optimal and approximation algorithms both perform a binary search and thus incur an additive error.

5.1.1 Detection probability models

Let φ(x) be the detection probability density function chosen by the leader for the forest. An extractor is detected if he comes within some distance ε ≪ 1 of the patrol. Under our time model, the patrol units move much less quickly than the extractors, and so patrols can be modeled as stationary from the extractor's point of view. Therefore, if e.g. φ(x) is constant (for a single patrol unit) over the region R (of size |R|), then the probability of detection for an extraction path of length d is proportional to d, specifically d · 2ε/|R|, where the total area within distance ε of the length-d walk is approximated as d · 2ε. That is, probabilities are added rather than "multiplied" due to stationarity. (Here we assume the patrol unit is not visible to the extractor.) The model described here also covers settings in which the amount spent at a location determines the sensing range there. For notational convenience, we drop ε and |R| throughout the chapter, assuming normalization as appropriate.

φ(x) influences the extractor's behavior in two ways. The rational extractor will trespass a distance into the forest that maximizes his total (or cumulative) net benefit, which is where his net marginal benefit b(x) − c(x) equals zero. As the extractor moves about through a region with nonzero φ(x), his cost-benefit analysis is affected in two ways. First, the probability of reaching a given location x is reduced by the cumulative probability of capture up to that point, Φ(x), and so the net marginal benefit at point x is reduced from b(x) − c(x) by the amount Φ(x)b(x).
(Recall that capture occurs on the return trip out of the forest, and so the cost c(x) is paid regardless of whether confiscation occurs.) Second, the probability of being caught at point x is φ(x), and being caught means losing the full benefit accrued so far, which further reduces the net marginal benefit at this point by the amount φ(x)B(x), where B(x) = ∫_0^x b(y) dy is the cumulative benefit.

We emphasize that the extractor's strategy (trespass distance) is chosen offline (in advance), based on the expected returns of each possible strategy. Note that the extractor acquires no new information online that can affect his decision-making: the strategy consists entirely of a distance by which to attempt to trespass; once caught, there is no further choice.

5.2 Patrol Allocations

Let the patrol zone be the region of the forest assigned nonzero patrol density. We note three patrol allocation strategies that have been proposed in the past:

• Homogeneous: Patrol density distributed uniformly over the entire region.

• Boundary: Patrol density distributed uniformly over a ring (of some negligible width) at the forest boundary.

• Ring: Patrol density distributed uniformly over some ring (of negligible width w) concentric with the forest.

Boundary patrols can be superior to homogeneous patrols, since homogeneous patrols waste enforcement on the natural core (Albers, 2010). It is interesting to note that this is not always so. Suppose the homogeneous-induced core radius is less than 1 − d, w is very small, and the trip length d satisfies w < 1/2 < d ≤ 1. With homogeneous patrols, we will have Φ(d) = (E/π) · d. With boundary patrols, however, this probability for any d ≥ w will be E/(π(1 − (1 − w)²)) · w = E · w/(π(1 − (1 − w)²)), which approaches E/(2π) as w → 0. In this case, homogeneous patrols will outperform boundary patrols. Intuitively, this is because a patrol in the interior will "intersect" more trips from the boundary to the center than a patrol on the boundary will. Unfortunately, both boundary and homogeneous patrols can perform arbitrarily badly.
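The homogeneous-versus-boundary comparison above can be checked numerically. The sketch below uses our assumed normalization that the budget E is spread over the unit-disc forest of area π, so a homogeneous patrol yields cumulative capture probability (E/π) · d for a trip of length d, while a boundary ring of width w yields E · w/(π(1 − (1 − w)²)), which tends to E/(2π) as w → 0:

```python
import math

# Numeric check (ours, not from the thesis) of the homogeneous-vs-boundary
# comparison, assuming budget E is normalized over the unit disc of area pi.
E = 1.0

def capture_homogeneous(d):
    # Cumulative capture probability for a trip of length d under a uniform
    # density E / pi over the whole forest.
    return (E / math.pi) * d

def capture_boundary(w):
    # Cumulative capture probability for crossing a boundary ring of width w
    # that carries the entire budget E; tends to E / (2 * pi) as w -> 0.
    return E * w / (math.pi * (1 - (1 - w) ** 2))
```

For example, capture_homogeneous(0.8) ≈ 0.255 exceeds the w → 0 boundary limit E/(2π) ≈ 0.159, matching the text's observation that homogeneous patrols outperform boundary patrols on trips longer than 1/2, while capture_homogeneous(0.4) ≈ 0.127 falls below it.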
Proposition 2. The approximation ratios of boundary and homogeneous patrols are both 0.

Proof (sketch). To see this, hold the budget fixed, and consider extremely large forests and cost and benefit functions yielding an empty natural core. The relationship between the cost/benefit functions and the budget can be such that an optimal patrol allocation will place patrols near the forest center, halting the extractors at some distance r_o from the center, but the significant dispersion of patrols due to either boundary or homogeneous allocations would mean failing to stop the extractors prior to the forest center, resulting in an approximation factor of 0.

Instead, our optimal patrol will be of the following sort:

• Band: The shape of the patrol zone is a band, i.e., the set difference of two circles (generalizable to other forest shapes, as discussed below), both concentric with the forest.

The net cumulative benefit of walking distance x is B_o(x) − C(x) = B(x) − Φ(x)B(x) − C(x), where Φ(x) is the capture probability for this walk. Let φ(x) = dΦ(x)/dx be the probability density function of the capture probability, which is proportional to patrol density. Then the probability density function corresponding to B_o(x) − C(x) will be

d(B_o(x) − C(x))/dx = dB(x)/dx − d(Φ(x)B(x))/dx − dC(x)/dx = (1 − Φ(x))b(x) − φ(x)B(x) − c(x)   (5.1)

Let band [d_o, e) (with 0 ≤ d_o ≤ e ≤ d_n) be the patrol zone chosen by Algorithm 3.

Algorithm 3 Computing the optimal allocation(b, c, E, ε)
  (d_1, d_2) ← (0, d_n)
  binary search: while d_1 < d_2 − ε/3 or φ_2 not set do
    d ← (d_1 + d_2)/2
    φ(x) ← (b(x) − c(x))/B(x) − (b(x)/B²(x)) · (B(x) − C(x) − (B(d) − C(d)))
    e ← x s.t. d ≤ x ≤ d_n and φ(x) = 0
    cost ← ∫_d^e 2π(1 − x)φ(x) dx
    if cost ≤ E then {d_2 ← d, φ_2 ← φ} else d_1 ← d
  end
  return (d_2, φ_2)

Lemma 2. Without loss of generality, the optimal density φ(x) at each point x ∈ [d_o, e) can be assumed to be the smallest possible value disincentivizing further walking from x, i.e., that density yielding b_o(x) = c(x). Moreover, b_o(x) < c(x) and φ(x) = 0 for x > e.

Proof.
Consider a function φ(·) that successfully stops the extractor at some location d_o but which violates the stated property, at some particular level of discretization. That is, partition the interval [d_o, d_n] into n equal-sized subintervals, numbered d_1, ..., d_n. For this discretization, we write B(d_i) = Σ_{j=1}^{i−1} b(d_j) and Φ(d_i) = Σ_{j=1}^{i−1} φ(d_j) (omitting the coefficients). Let d_i be the first such subinterval for which b_o(d_i) < c(d_i), and let d_i^+ be shorthand for d_{i+1}. In this case (see Eq. 5.1) we have (1 − Φ(d_i))b(d_i) − φ(d_i)B(d_i) − c(d_i) < 0. We correct this by subtracting a value δ from φ(d_i) to bring about equality, and adding δ to φ(d_i^+). The marginal net benefit of step d_i is then 0 (by construction), and that of step d_i^+ is only lower than it was before, so there is no immediate payoff to walking from d_i to d_i^+ or d_{i+2}. Clearly Φ(d_{i+2}) is unchanged. Finally, we verify that the expected total net benefit of walking
b o (x) < c(x) and(x) = 0 forx > e follows from() being a band that stops the extractor at positiond o . Lemma 3. Without loss of generality, we may assumed o kisses the outer edge of the patrol region. Proof. Clearlyd o will not be prior to the start of the patrol region. Ifd o lay after the beginning of the patrol region, then, by Lemma 2, the solution would have its cost only lowered by shifting the earlier patrol density pastd o . Under the varying patrol density regime, the optimal patrol allocation can be computed (nu- merically). We remark that under the resulting patrol allocation, patrol density will decline mono- tonically with distance into the forest. Intuitively, the reason for this is that as distance into the 102 marginal values b(d) c(d) d o b o (d) d n (a) Optimal patrol allocation. marginal values b(d) c(d) d r d n b r (d) (b) Ring patrol. Figure 5.3: The shaded regions correspond to the reduction in marginal benefits within the patrol zone. Not shown are the (less dramatic) effects onb() following the patrol zone, due to the cumulative capture probability. forest grows, there is a smaller and smaller remaining net marginal benefit (b(x)c(x) that we need to compensate for by threat of confiscation, and yet the magnitude of the potential confisca- tion (B(x)) grows only larger. Theorem 4. Algorithm 3 produces a near-optimal allocation (i.e., with arbitrarily small er- ror). Proof. We assume the properties stated by Lemma 2. Let d o indeed be the optimal trespass distance. Observe that forx < d o ,b o (x) = b(x); forx > e,b o (x) is determined only byb(x) and the cumulative capture probability, i.e.,b o (x) = (1 (x))b(x). e is the point at which (x) = 0 and (1 (x))b(x)c(x) = 0. Now we computeb o (). Setting Eq. 
5.1 to 0 yields:

φ(x) = ((1 − Φ(x))b(x) − c(x)) / B(x)   (5.2)

The solution to this standard-form first-order differential equation (recall that Φ(x) = ∫_{d_o}^x φ(y) dy, and note that Φ depends on the value d_o) is:

Φ(x) = e^{−∫P(x)dx} (∫ Q(x) e^{∫P(x)dx} dx + K)

where P(x) = b(x)/B(x), Q(x) = (b(x) − c(x))/B(x), and K is a constant. Since ∫ P(x)dx = ∫ (b(x)/B(x)) dx = ln B(x), we have e^{∫P(x)dx} = e^{ln B(x)} = B(x). Therefore

∫ Q(x) e^{∫P(x)dx} dx = ∫ ((b(x) − c(x))/B(x)) B(x) dx = ∫ (b(x) − c(x)) dx = B(x) − C(x)

and, based on the initial condition Φ(d_o) = 0,

K = −∫ Q(x) e^{∫P(x)dx} dx |_{d_o} = −(B(d_o) − C(d_o))

Since φ(x) = Φ′(x), this yields:

Φ(x) = (B(x) − C(x) − (B(d_o) − C(d_o))) / B(x)
φ(x) = (b(x) − c(x))/B(x) − (b(x)/B²(x)) (B(x) − C(x) − (B(d_o) − C(d_o)))

Then the optimal allocation for any given budget E will equal φ(x) for x ∈ [d_o, d_n]. The total cost of this is E(d_o) = ∫_{d_o}^{d_n} 2π(1 − x)φ(x) dx. If b(x) and c(x) are polynomial functions, then φ(x) is a rational function, and so E(d_o) is solvable analytically, by the method of partial fractions. In this case, we can evaluate E(d_o) in constant time (for fixed b(x) and c(x)) in a real-number computation model. Alternatively, E(d_o) can be approximated within additive error ε in time O(1/ε), using standard numerical integration methods. We can compute the smallest d_o for which E(d_o) ≤ E by binary search. (e is also found by binary search, within error ε/3 · 1/(2πφ(0)), which is a constant; recall that φ(x) is a decreasing function.)
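As a concrete illustration (ours, not the thesis implementation), the closed-form φ(x) and the binary search over d_o can be sketched numerically for the sample pair b(x) = 1, c(x) = x from Figure 5.4, replacing the analytic evaluation of E(d_o) with grid integration:

```python
import math

# Numeric sketch of Algorithm 3 for the sample pair b(x) = 1, c(x) = x, so
# that B(x) = x, C(x) = x**2 / 2 and the natural core boundary is d_n = 1.
# Grid integration stands in for the analytic E(d_o); tolerances are ad hoc.
def phi(x, d_o):
    K = d_o - d_o ** 2 / 2                      # B(d_o) - C(d_o)
    return (1 - x) / x - (x - x ** 2 / 2 - K) / x ** 2

def cost(d_o, n=20000):
    # Integrate 2*pi*(1 - x)*phi(x) from d_o up to e, the first zero of phi.
    total, dx = 0.0, (1.0 - d_o) / n
    for i in range(n):
        x = d_o + (i + 0.5) * dx
        p = phi(x, d_o)
        if p <= 0:                              # reached e; phi decreases here
            break
        total += 2 * math.pi * (1 - x) * p * dx
    return total

def optimal_d(E, tol=1e-6):
    # Binary search for the smallest d_o whose patrol cost fits the budget E.
    lo, hi = 1e-9, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if cost(mid) > E:
            lo = mid                            # too expensive: allow deeper trespass
        else:
            hi = mid
    return hi

d_o = optimal_d(E=1.0)                          # trespass distance under budget 1
```

Because the patrol cost grows without bound as d_o shrinks (the integrand behaves like 1/x near the center of mass of the band), the cost is decreasing in d_o and the bisection is well defined.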
[Figure 5.4: Patrol strategy effectiveness for sample b(·), c(·) functions. Each panel plots the trespass distance d_p against the budget E for the Border, Homogeneous, Best ring (10^{−1}), Best ring (10^{−3}), and Optimal strategies, together with the optimal density φ_o(d): (a) b(x) = 2 − x − x², c(x) = 3x²; (b) b(x) = 1 − x^20, c(x) = x; (c) b(x) = 1, c(x) = x; (d) b(x) = 1, c(x) = 5x.]

Algorithm 4 Computing the best ring patrol(b, c, E, ε)
  (d_1, d_2) ← (0, d_n)
  binary search: while d_1 < d_2 − ε or φ_2 not set do
    d ← (d_1 + d_2)/2
    φ(d) ← E/(2π((1 − d) − w/2)w)
    Φ ← φ · w
    e ← x s.t. (1 − Φ)b(x) = c(x)
    pos ← ∫_d^e ((1 − Φ)b(x) − c(x)) dx
    neg ← Φ · B(d)
    if neg ≥ pos then {d_2 ← d, φ_2 ← φ} else d_1 ← d
  end
  return (d_2, φ_2)

This yields a total running time of either O(log² 1/ε) or O(1/ε · log 1/ε), depending on whether E(d_o) is solved analytically or approximated. The varying-density allocation of Algorithm 3 may be difficult or impractical to implement; moreover, each iteration of the loop requires an expensive iterative approximation parameterized by ε, if E(d_o) is not solvable analytically. Now we present a more efficient algorithm that produces easier-to-implement allocations. Assuming b(·) and c(·) can be integrated analytically and that their intersection can be found analytically, Algorithm 4 runs in time O(log 1/ε).

Theorem 5.
Algorithm 4 produces a near-optimal ring patrol (i.e., within additive error at most $\epsilon$).

Proof. For a candidate trespass distance $d$, allocating the budget $E$ to a width-$w$ ring (where $w$ is negligible) of radius $d$ yields $\phi = E/\big(2\pi((1-d) - w/2)\,w\big)$ and $\Phi = w\phi$. In order to discourage the extractor from passing point $d$, it must be the case that the expected cost of doing so (the potential loss due to confiscation, weighted by probability: $\Phi\,B(d)$) exceeds the expected benefit (the remaining net benefit, weighted by probability: $(1-\Phi)\int_d^{d_n} b(x) - c(x)\,dx$). We binary search for the smallest such value $d$.

Theorem 6. Algorithm 4 provides a 1/2-approximation, both in trespass distance reduction and pristine radius.

Proof. Let $r_n = 1 - d_n$ be the radius of the natural core. Let $r_o = 1 - d_o$ be the pristine area radius under the optimal patrol allocation $\phi_o(\cdot)$. We know that $\phi_o(\cdot)$ will be nonzero over the range $[d_o, d_n]$. Consider locations $x$ within this range. As $x$ grows from $d_o$ to $d_n$, the marginal benefit $b(x)$ falls monotonically while $c(x)$ grows, and the cumulative benefit $B(x)$ and cumulative capture probability $\Phi(x)$ both grow monotonically. Thus by Eq. 5.2, $\phi(x)$ falls monotonically over $[d_o, d_n]$. Now consider the radius $r_r = (r_o + r_n)/2$ and the corresponding location $d_r = 1 - r_r$, which divides the range $[d_o, d_n]$ into two halves. Because $\phi(x)$ is monotonically decreasing, it has at least as much total mass in the first half as in the second, i.e., $\int_{d_o}^{d_r} \phi(x)\,dx \ge \int_{d_r}^{d_n} \phi(x)\,dx$. Because the total cost of patrol density $\phi(x)$ at location $x$, rotated about the entire circle, is $2\pi(1-x)\phi(x)$, "flattening" $\phi(x)$ over the range $[d_o, d_n]$ (i.e., setting it equal to $\frac{1}{d_n - d_o}\int_{d_o}^{d_n}\phi(x)\,dx$) will only lower the total cost. (Though doing so will sacrifice the guarantee of trespass distance $d_o$.)
Then "compressing" this total probability mass $\int_{d_o}^{d_n}\phi(x)\,dx$ from the range $[d_o, d_n]$ to the point $d_r$ will not change the cost any further, since the mean circle circumference for radii in $[r_o, r_n]$ is $2\pi(r_o + r_n)/2$, which is the same as that for radius $r_r$.

We now claim that the constructed negligible-width ring patrol at $d_r$ will deter the extractors from crossing it, by accounting for the two "halves" of $\phi_o(x)$. First, the "left" half of $\phi_o(x)$ transferred to $d_r$ will yield a cumulative detection probability of $\Phi_o(d_r)$, just as under the optimal patrol. Second, the "right" half of $\phi_o(x)$ will inflict the same total reduction in net benefits for the action of traversing $[d_r, d_n]$ as the optimal patrol does. After passing $d_r$, each additional step would provide a positive net marginal benefit, until regaining the pre-$d_r$ cumulative net benefit only at point $d_n$, after which all net marginal benefits are negative. Thus every stopping point after $d_r$ will have cumulative net benefit lower than the value immediately before $d_r$. We have constructed a ring patrol allocation that reduces the trespass distance by at least half the optimal such value, i.e., $(r_n - r_o)/2$, yielding pristine radius $r_r = (r_o + r_n)/2 \ge r_o/2$, and so the result follows.

We note that the approximation ratio is tight. To see this, problem instances can be constructed satisfying the following: $c(x) = 0$ and $b(x)$ is constant (and small) over the interval $[d_o, d_n]$ (which meets an empty natural core, i.e., $d_n = 1$), and $E$ is very small and hence $[d_o, d_n]$ is very narrow. In this case, $\Phi_o(x)$ grows very slowly over the patrol region, and $\phi_o(x)$ declines very slowly over it. In the extreme case, the weight of $\phi_o(x)$'s probability mass to the right of $d_r$ approaches the weight to the left.

5.2.1 Algorithmic extensions

Multiple patrol units. We can extend from one to multiple patrol units, weighted equally or unequally.
Given $k$ patrol units, each given budget $E_i$ (e.g., $1/k$) with $E = \sum E_i$, we partition the forest into $k$ sectors, each of angle $2\pi E_i/E$. We run one of our algorithms with budget $E$. Then we position patrol unit $i$ at a location within sector $i$, chosen according to the computed $\phi(\cdot)$.

Other forest shapes. In the noncircular forest context, permitting extractors to traverse any length-bounded path from their starting points implies that the pristine area determined by a given patrol strategy will again be an erosion of the forest. Computing the erosion of an arbitrary shape is computationally intensive (Soille, 2004), but it is easily computable for convex polygons, which approximate many realistic forests. To be practically implementable in such cases, the patrol should be symmetric around the forest area. Our algorithms above adapt easily to the setting of convex polygonal forest shapes, where pristine areas are erosions, by integrating the cost of a patrol around the forest boundary. In both cases, we replace the circle circumference $2\pi(1-x)$ with the cost of the corresponding polygon circumference. For large polygons with a reasonable number of sides, the resulting error due to corners will be insignificant.

5.3 Experiments

We implemented both our algorithms, as well as the baseline solutions of homogeneous and boundary patrols. We tested these algorithms on certain realistic pairs of benefit and cost functions (with forest radius 1; see four examples in Fig. 5.2). We now summarize our observations on these results.

In each setting (see left subfigures), we vary the patrol budget, computing the patrol allocation function, and hence the extractor's trespass distance $d_p$, for each. First, the optimal algorithm indeed dominates all the others. Both our algorithms, however, perform much better overall than the two baselines, up until the point at which the budget is sufficient to deter any entry into the forest using boundary and best ring.
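The best-ring deterrence test at the heart of Algorithm 4 can be implemented directly. The sketch below is a hedged illustration for the assumed pair $b(x) = 1$, $c(x) = 5x$ (so $B(x) = x$ and $d_n = 0.2$); the ring width, budget, and tolerance are arbitrary choices.

```python
import math

D_N = 0.2    # natural core boundary for b(x) = 1, c(x) = 5x
W = 1e-3     # ring width (negligible relative to the forest radius)

def deters(d, E):
    """Algorithm 4's test: does a budget-E ring at distance d deter crossing?"""
    phi = E / (2 * math.pi * ((1 - d) - W / 2) * W)   # density over the ring
    Phi = W * phi                                      # detection probability
    e = (1 - Phi) / 5.0                                # root of (1 - Phi) b(x) = c(x)
    # pos: residual net benefit of continuing past d; neg: expected confiscation
    pos = (1 - Phi) * (e - d) - 2.5 * (e * e - d * d) if e > d else 0.0
    neg = Phi * d                                      # Phi * B(d), with B(x) = x
    return neg >= pos

def best_ring(E, tol=1e-6):
    """Binary search for the smallest deterring ring distance (cf. Algorithm 4)."""
    d1, d2 = 0.0, D_N
    while d2 - d1 > tol:
        d = (d1 + d2) / 2
        if deters(d, E):
            d2 = d
        else:
            d1 = d
    return d2
```

Because `neg` grows and `pos` shrinks as the candidate distance increases, the deterrence condition is a monotone threshold and binary search is valid.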
Best ring will consider a ring at the boundary, so it cannot do worse than boundary, and so the two curves must intersect at zero. Prior to this point, best ring does outperform boundary. As observed above, neither homogeneous nor boundary consistently dominates the other.

We computed ring patrols for two ring widths, one very narrow ($10^{-4}$) and one less so ($0.1$). Interestingly, neither ring size dominates the other. With a sufficiently large budget, the rings will lie on the boundary, but a wider ring will permit some nonnegligible trespass (part way across the ring itself). With smaller budgets the rings will lie in the interior of the forest. In this case, the narrow ring will spend the entire budget at one (expensive) density level, whereas the wider ring can (more cheaply, and hence more successfully) spend some of its budget at lower density levels.

Next (see middle subfigures), we plot the optimal $\phi_o(\cdot)$ functions under many different budgets. As can be seen, these curves sweep out different regions of the plane, depending on the $b(\cdot), c(\cdot)$ pair. Finally (see right subfigures), we illustrate the result of applying Algorithm 3 to a rectangular forest, with one sample budget (3.5, normalized to the dimensions of the forest). The patrol density is represented by the level of shading. The border of the natural core is also shown.

5.4 Chapter Summary

In this chapter, we have presented a Stackelberg security game setting that differs significantly from those previously considered in the AI literature and that necessitates very different techniques from those used in the past. At the same time, this work opens up an exciting new area of research for AI at the intersection of forest economics and game theory. Eventually, as with the counterterrorism Stackelberg games studied in the literature, we aim to deploy our solutions in real-world settings.
Potential sites for such deployments include Tanzania's aforementioned Kibaha Ruvu Forest Reserves and the mangrove forests of Mnazi Bay Ruvuma Estuary Marine Park.

Chapter 6

Reasoning with Frequent and Repeated Attacks

Another spatio-temporal aspect that is addressed in my thesis is reasoning with frequent and repeated attacks in domains such as protecting fisheries from over-fishing (Qian, Haskell, Jiang, & Tambe, 2014; Haskell et al., 2014a) and protecting rhinos and tigers from illegal poaching (Yang et al., 2014a). Poaching and illegal over-fishing are critical international problems leading to the destruction of ecosystems. For example, three out of nine tiger species have gone extinct in the past 100 years, and others are now endangered due to poaching (Secretariat, 2013). Law enforcement agencies in many countries are hence challenged with applying their limited resources to protecting endangered animals and fish stocks. Building upon the success of applying SSGs to protect infrastructure including airports (Pita, Jain, Western, Portway, Tambe, Ordonez, Kraus, & Paruchuri, 2008), ports (Shieh, An, Yang, Tambe, Baldwin, DiRenzo, Maule, & Meyer, 2012b) and trains (Yin, Jiang, Johnson, Tambe, Kiekintveld, Leyton-Brown, Sandholm, & Sullivan, 2012a), researchers are now applying game theory to green security domains, e.g., protecting fisheries from over-fishing (Brown et al., 2014; Haskell et al., 2014b) and protecting wildlife from poaching (Yang et al., 2014b).

There are several key features of green security domains that introduce novel research challenges. First, the defender is faced with multiple adversaries who carry out repeated and frequent illegal activities (attacks), yielding a need to go beyond the one-shot SSG model.
Second, in carrying out such frequent attacks, the attackers generally do not conduct extensive surveillance before performing an attack and spend less time and effort on each attack; thus, it becomes more important to model the attackers' bounded rationality and bounded surveillance. Third, there is more attack data available in green security domains than in infrastructure security domains, which makes it possible to learn the attackers' decision-making model from data.

Previous work in green security domains (Yang et al., 2014b; Haskell et al., 2014b) models the problem as a game with multiple rounds, each of which is an SSG (Yin et al., 2010) in which the defender commits to a mixed strategy and the attackers respond to it. In addition, these works address the bounded rationality of attackers using the SUQR model (Nguyen, Yang, Azaria, Kraus, & Tambe, 2013b). While such advances have allowed these works to be tested in the field, there are three key weaknesses in these efforts. First, the Stackelberg assumption in these works (that the defender's mixed strategy is fully observed by the attacker via extensive surveillance before each attack) can be unrealistic in green security domains, as mentioned above. Indeed, the attacker may experience a delay in observing how the defender strategy changes over time, from round to round. Second, since the attacker may lag in observing the defender's strategy, it may be valuable for the defender to plan ahead; however, these previous efforts do not engage in any planning and instead rely only on designing strategies for the current round. Third, while they do exploit the available attack data, they use Maximum Likelihood Estimation (MLE) to learn the parameters of the SUQR model for individual attackers, which we show may lead to skewed results.

Figure 6.1: Snare poaching

In this chapter, we offer remedies for these limitations. First, we introduce a novel model called Green Security Games (GSGs).
Generalizing the perfect Stackelberg assumption, GSGs assume that the attackers' understanding of the defender strategy may not be up to date and can instead be approximated as a convex combination of the defender strategies used in recent rounds. Previous models in green security domains, such as (Yang et al., 2014b; Haskell et al., 2014b), can be seen as a special case of GSGs, as they assume that the attackers always have up-to-date information, whereas GSGs allow for more generality and hence for planning of defender strategies. Second, we provide two algorithms that plan ahead; the generalization of the Stackelberg assumption introduces a need to plan ahead and take into account the effect of the defender strategy on future attacker decisions. While the first algorithm plans a fixed number of steps ahead, the second designs a short sequence of strategies for repeated execution. Third, this chapter also provides a novel framework that incorporates learning of the parameters in the attackers' bounded rationality model into the planning algorithms, where, instead of using MLE as in past work, we use insights from Bayesian updating. All proposed algorithms are fully implemented, and we provide detailed empirical results.

6.1 Motivation and Defining GSGs

Our motivating example assumes a perfectly rational attacker purely for simplicity of exposition. In the rest of the chapter, we consider attackers with bounded rationality.

Example 1. Consider a ranger protecting a large area with rhinos. The area is divided into two subareas $N_1$ and $N_2$ of the same importance. The ranger chooses a subarea to guard every day, and she can stop any snaring by poachers in the guarded area. The ranger has been using a uniform random strategy throughout the last year. So for this January, she can choose to continue using the uniform strategy throughout the month, catching 50% of the snares.
But now assume that the poachers change their strategy every two weeks based on the most recently observed ranger strategy. In this case, the ranger can catch 75% of the snares by always protecting $N_1$ in the first two weeks of January and then switching to always protecting $N_2$: at the beginning of January, the poachers are indifferent between the two subareas due to their observations from last year. Thus, 50% of the snares will be placed in $N_1$, and the ranger can catch these snares in the first half of January by only protecting $N_1$. But after observing the change in ranger strategy, the poachers will switch to only putting snares in $N_2$. The poachers' behavior change can be anticipated by the ranger, and the ranger can catch 100% of the snares by only protecting $N_2$ starting from mid-January. (Of course, the poachers must then be expected to adapt further.)

This example conceptually shows that the defender can benefit from planning strategy changes in green security domains. We now define a GSG as an abstraction of the problem in green security domains (borrowing some terminology from Stackelberg Security Games (Yin et al., 2010)).

Definition 12. A GSG is a $T$ ($< \infty$) round repeated game between a defender and $L$ GSG attackers, in which: (i) The defender has $K$ guards to protect $N$ ($\ge K$) targets. (ii) Each round has multiple episodes, and in every episode, each guard can protect one target and each attacker can attack one target. (iii) In round $t$, the defender chooses a mixed strategy at the beginning of the round, which is a probability distribution over all pure strategies, i.e., the $\binom{N}{K}$ assignments
from the guards to targets. In every episode, the guards are assigned to targets according to an assignment randomly sampled from the mixed strategy. (iv) Each target $i \in [N]$ has payoff values $P^a_i$, $R^a_i$, $P^d_i$, $R^d_i$ ("P" for "Penalty", "R" for "Reward", "a" for "attacker" and "d" for "defender"). If an attacker attacks target $i$ while it is protected by a guard, the attacker gets utility $P^a_i$ and the defender gets $R^d_i$. If target $i$ is not protected, the attacker gets utility $R^a_i$ and the defender gets $P^d_i$. $R^d_i > P^d_i$ and $R^a_i > P^a_i$. (v) The defender's utility in round $t$ is the total expected utility calculated over all attackers.

Each round of the repeated game corresponds to a period of time, which can be a time interval (e.g., a month) after which the defender (e.g., a warden) communicates with local guards to assign them a new strategy. We divide each round into multiple episodes in which the players take actions. Consistent with previous work on green security games (Yang et al., 2014b; Haskell et al., 2014b), we divide the protected area into subareas or grid cells and treat each subarea or cell as a target. Different targets may have different importance to the defender and the attackers due to differences in resource richness and accessibility. We therefore associate each target $i \in [N]$ with payoff values.

A mixed defender strategy can be represented compactly by a coverage vector $c = \langle c_i \rangle$, where $0 \le c_i \le 1$ is the probability that target $i$ is covered by some guard, and it satisfies $\sum_{i=1}^{N} c_i \le K$ (Kiekintveld, Jain, Tsai, Pita, Ordonez, & Tambe, 2009b; Korzhyk, Conitzer, & Parr, 2010b). If an attacker attacks target $i$, the expected utility for the defender is
\[
U^d_i(c) = c_i R^d_i + (1 - c_i) P^d_i
\]
given defender strategy $c$. We denote the mixed defender strategy in round $t$ as $c^t$.

Definition 13. A GSG attacker is characterized by his memory length $\Gamma$, coefficients $\alpha_0, \ldots, \alpha_\Gamma$ and his parameter vector $\omega$. In round $t$, a GSG attacker with memory length $\Gamma$ responds to a convex combination of the defender strategies in the most recent $\Gamma + 1$ rounds, i.e., he responds to $\eta^t = \sum_{\tau=0}^{\Gamma} \alpha_\tau c^{t-\tau}$, where $\sum_{\tau=0}^{\Gamma} \alpha_\tau = 1$ and $c^{t-\tau} = c^0$ if $t - \tau \le 0$. In every episode of round $t$, a GSG attacker follows the SUQR model and chooses a random target to attack based on his parameter vector $\omega$ in the SUQR model.

We aim to provide automated decision aids to defenders in green security domains who defend against human adversaries, such as poachers, who have no automated tools; hence, we model the poachers as being boundedly rational and as having bounded memory. We approximate a GSG attacker's belief about the defender's strategy in round $t$ as a convex combination of the defender strategy in the current round and the last $\Gamma$ rounds. This is because the attackers may not be capable of knowing the defender's exact strategy when attacking; naturally, they will consider the information they obtain from the past. Further, human beings have bounded memory, and the attackers may tend to rely on recent information instead of the whole history. The Stackelberg assumption in (Yang et al., 2014b; Haskell et al., 2014b) can be seen as a special case of this approximation with $\alpha_0 = 1$. In this chapter, we assume all attackers have the same memory length $\Gamma$ and coefficients $\alpha_\tau$, and that these values are known to the defender. $c^0$ is the defender strategy used before the game starts and is known to the players.

To model the bounded rationality of human attackers such as poachers, we use the SUQR model, which has performed the best so far against human subjects in security games (Nguyen et al., 2013b). In this model, an attacker's choice is based on key properties of each target, including the coverage probability, the reward and the penalty, represented by the parameter vector $\omega = (\omega_1, \omega_2, \omega_3)$. Given $\eta$ as the attacker's belief (with $\eta_i$ the belief of the coverage probability on target $i$), the probability that an attacker with parameter $\omega$ attacks target $i$ is
\[
q_i(\omega, \eta) = \frac{e^{\omega_1 \eta_i + \omega_2 R^a_i + \omega_3 P^a_i}}{\sum_j e^{\omega_1 \eta_j + \omega_2 R^a_j + \omega_3 P^a_j}} \tag{6.1}
\]

Notation | Meaning
$T, N, K$ | # of rounds, targets and guards, respectively
$L$, $\Gamma$ | # of attackers and memory length of attackers
$c^t$ | defender strategy in round $t$
$\eta^t$ | attackers' belief of the defender strategy in round $t$, a convex combination of the $c^{t-\tau}$
$\alpha_\tau$ | coefficient of $c^{t-\tau}$ when calculating $\eta^t$
$\omega^l$ | parameter vector of the SUQR model for attacker $l$; $\omega^l_1$, $\omega^l_2$ and $\omega^l_3$ are the coefficients on $c_i$, $R^a_i$, $P^a_i$, respectively
$q_i$ | the probability of attacking target $i$
$E^t$ | defender's expected utility in round $t$
Table 6.1: Summary of key notations.

Following the work of Yang et al. (Yang et al., 2014b), in this chapter we assume the group of attackers may have heterogeneous weighting coefficients, i.e., each attacker $l \in [L]$ is associated with a parameter vector $\omega^l = (\omega^l_1, \omega^l_2, \omega^l_3)$.

A GSG defender strategy profile $[c]$ is defined as a sequence of defender strategies of length $T$, i.e., $[c] = \langle c^1, \ldots, c^T \rangle$. The defender's expected utility in round $t$ is $E^t([c]) = \sum_l \sum_i q_i(\omega^l, \eta^t) U^d_i(c^t)$. The objective of the defender is to find the strategy profile with the highest average expected utility over all rounds, i.e., to maximize $E([c]) = \sum_{t=1}^{T} E^t([c]) / T$.

Algorithm 5 Plan Ahead$(\omega, M)$
  Output: a defender strategy profile $[c]$
  for $t = 1$ to $T$ do
    $c^t \leftarrow$ f-PlanAhead$(c^{t-1}, \omega, \min\{T - t + 1, M\})$
  end

6.2 Planning in GSGs

The defender can potentially improve her average expected utility by carefully planning changes in her strategy from round to round in a GSG. In this section, we consider the case where the attackers' parameter vectors $\omega^1, \ldots, \omega^L$ are known to the defender. For clarity of exposition, we will first focus on the case where $\alpha_0 = 0$ and $\Gamma = 1$. This is the special case in which the attackers have one-round memory and no information about the defender strategy in the current round, i.e., the attackers respond to the defender strategy in the last round. We discuss the more general case in Section 6.4.

To maximize her average expected utility, the defender could optimize over all rounds simultaneously.
However, this approach is computationally expensive when $T$ is large: it requires solving a non-convex optimization problem with $NT$ variables ($c^t_i$), as the defender must consider the attacker response, and the attacking probability has a non-convex form (see Equation 7.3). An alternative is a myopic strategy, i.e., the defender can always protect the targets with the highest expected utility in the current round. However, this myopic choice may lead to significant quality degradation, as it ignores the impact of $c^t$ on the next round. Therefore, we propose an algorithm named PlanAhead-M (or PA-M) that looks ahead a few steps (see Algorithm 5). PA-M finds an optimal strategy for the current round as if it were the $M$th-to-last round of the game. If $M = 2$, the defender chooses a strategy assuming she will play a myopic strategy in the next round and then end the game. When there are fewer than $M - 1$ future rounds, the defender only needs to look ahead $T - t$ steps (Line 5). PA-$T$ corresponds to the optimal solution, and PA-1 is the myopic strategy. Unless otherwise specified, we choose $1 < M < T$. Function f-PlanAhead$(c^{t-1}, \omega, m)$ solves the following mathematical program (MP):
\[
\max_{c^t, c^{t+1}, \ldots, c^{t+m-1}} \sum_{\tau=0}^{m-1} E^{t+\tau} \tag{6.2}
\]
\[
\text{s.t.}\quad E^\tau = \sum_l \sum_i q_i(\omega^l, \eta^\tau)\,U^d_i(c^\tau), \quad \tau = t, \ldots, t+m-1 \tag{6.3}
\]
\[
\eta^\tau = c^{\tau-1}, \quad \tau = t, \ldots, t+m-1 \tag{6.4}
\]
\[
\sum_i c^\tau_i \le K, \quad \tau = t, \ldots, t+m-1 \tag{6.5}
\]
This is a non-convex problem when $m > 1$ and can be solved approximately with local search approaches. Although we show in the experiments section that PA-2 can provide a significant improvement over baseline approaches in most cases, there exist settings where PA-2 can perform arbitrarily badly compared to the optimal solution. The intuition is that the defender might make a suboptimal choice in the current round with the expectation of a high reward in the next round; however, when she moves to the next round, she plans for two rounds again, and as a result, she never gets the high reward until the last round.

Target | $R^d_i$ | $P^d_i$
$N_1$ | 2 | 1
$N_2$ | $X$ | 3

Example 2.
Consider a guard protecting two subareas with the payoff values shown in the table ($X \gg 1$). For simplicity of the example, assume the defender can only choose pure strategies. There is one poacher with a large negative coefficient on the coverage probability, i.e., the poacher will always snare in the subarea that was not protected in the last round. The initial defender strategy is protecting $N_1$, meaning the attacker will snare in $N_2$ in round 1. According to PA-2, the defender will protect $N_1$ in round 1 and plan to protect $N_2$ in round 2, expecting a total utility of $3 + X$. However, in round 2, the defender chooses $N_1$ again, as she assumes the game ends after round 3. Thus, her average expected utility is $\frac{3(T-1)+X}{T} \approx 3$. On the other hand, if the defender alternates between $N_1$ and $N_2$, she gets a total utility of $X + 2$ for every two consecutive rounds, and her average utility is at least $\frac{X}{2} \gg 3$.

PA-2 fails in such cases because it over-estimates the utility in the future. To remedy this, we generalize PA-M to PA-M-$\gamma$ by introducing a discount factor $0 < \gamma \le 1$ for future rounds, i.e., substituting Equation 6.2 with
\[
\max_{c^t, c^{t+1}, \ldots, c^{t+m-1}} \sum_{\tau=0}^{m-1} \gamma^{\tau} E^{t+\tau} \tag{6.6}
\]
While PA-M-$\gamma$ presents an effective way to design sequential defender strategies, we provide another algorithm called FixedSequence-M (FS-M) for GSGs (see Algorithm 6).

Algorithm 6 Fixed Sequence
  Output: defender strategy profile $[c]$
  $(a^1, \ldots, a^M) \leftarrow$ f-FixedSequence$(\omega, M)$
  for $t = 1$ to $T$ do
    $c^t \leftarrow a^{(t \bmod M)+1}$
  end

FS-M not only has provable theoretical guarantees, but may also ease implementation in practice. The idea of FS-M is to find a short sequence of strategies of fixed length $M$ and require the defender to execute this sequence repeatedly. If $M = 2$, the defender alternates between two strategies, and she can exploit the attackers' delayed response. It can be easier to communicate with local guards to implement FS-M in green security domains, as the guards only need to alternate between
Function f-FixedSequence(!;M) calculates the best fixed sequence of lengthM through the following MP. max a 1 ;:::;a M P M t=1 E t (6.7) s:t E t = P l P i q i (! l ; t )U d i (a t );t = 1;:::;M (6.8) 1 =a M (6.9) t =a t1 ;t = 2;:::;M (6.10) P i a t i K;t = 1;::;M (6.11) Theorem 7 shows that the solution to this MP provides a good approximation of the optimal defender strategy profile. Theorem 7. In a GSG withT rounds, 0 = 0 and = 1, for any fixed length 1<MT , there exists a cyclic defender strategy profile [s] with periodM that is a (1 1 M ) Z1 Z+1 approxi- mation of the optimal strategy profile in terms of the normalized utility, whereZ =d T M e. The intuition is to divide the optimal sequence into sections with lengthM1 and bound the defender’s expected utility in each section. Definition 14. A cyclic defender strategy profile for a GSG is a profile consisting of a cyclic sequence of strategies, i.e.,9 T , such that8t > T ,c t = c t T , T is denoted as the period of the strategy profile. Proof of Theorem 7: UseU(x 1 ;x 2 ) to denote the defender’s normalized expected utility in a round where defender strategyx 2 is used in this round and defender strategyx 1 is used in the 121 previous round. Then 0 U(x 1 ;x 2 ) 1. For the optimal defender strategy profile [c], denote the normalized utility asU opt . hb 1 ;:::;b M i is a strategy sequence whose average normalized expected utility for the last M 1 rounds, i.e., U b = P M t=2 U(b t1 ;b t ) M1 , is maximized. ha 1 ;:::;a M i is a strategy sequence such that the average normalized expected utility of the sequence when it forms a cycle, i.e., U a = U(a M ;a 1 )+ P M t=2 U(a t1 ;a t ) M , is maximized. Then MU a = U(a M ;a 1 ) + X M t=2 U(a t1 ;a t ) U(b M ;b 1 ) + X M t=2 U(b t1 ;b t ) X M t=2 U(b t1 ;b t ) = (M 1)U b Let Z = d T M e. Construct a cyclic defender strategy profile [s] by repeating the strategy sequenceha 1 ;:::;a M i. 
Then
\[
T\,U([s]) = U(c^0, s^1) + \sum_{t=2}^{T} U(s^{t-1}, s^t) \tag{6.12}
\]
\[
\ge (Z-1)\,M\,U_a \tag{6.13}
\]
\[
\ge (Z-1)(M-1)\,U_b \tag{6.14}
\]
Strategy profile $[s]$ contains $Z - 1$ complete cycles (starting with $a^2$) with an average normalized utility $U_a$. The first inequality is derived by ignoring the first round and the last incomplete cycle when $T \bmod M \ne 1$.

On the other hand, for the optimal defender strategy profile $[c] = [c]^{opt}$, we know that for any consecutive sequence of length $M$, the average normalized utility of the last $M - 1$ rounds can be no more than $U_b$. So we divide the strategy profile into $\lceil \frac{T}{M-1} \rceil$ pieces, each of length $M - 1$ except the last. Then for each piece, the sum of normalized utility is no more than $U_b (M-1)$. Otherwise, if the sum of normalized utility of the $i$th piece were higher than $U_b (M-1)$, the strategy sequence $\langle c^{(i-1)(M-1)}, \ldots, c^{i(M-1)} \rangle$ would contradict the optimality of $\langle b^1, \ldots, b^M \rangle$. Thus,
\[
T\,U^{opt} = U(c^0, c^1) + \sum_{t=2}^{T} U(c^{t-1}, c^t) \tag{6.15}
\]
\[
\le U_b (M-1) \left\lceil \frac{T}{M-1} \right\rceil \tag{6.16}
\]
\[
\le (T + M - 1)\,U_b \tag{6.17}
\]
The last inequality is obtained by conceptually completing the last piece. Combining these results, we get
\[
\frac{U([s])}{U^{opt}} \ge \frac{(Z-1)(M-1)}{T + M - 1} \ge \frac{(Z-1)(M-1)}{ZM + M} = \Big(1 - \frac{1}{M}\Big)\frac{Z-1}{Z+1}
\]
So $[s]$ is a $(1 - \frac{1}{M})\frac{Z-1}{Z+1}$ approximation of the optimal strategy profile in terms of the normalized utility.

According to Theorem 7, when the game has many rounds ($T \gg M$), the cyclic sequence constructed by repeating $a^1, \ldots, a^M$ is a $1 - 1/M$ approximation. While in our experiments this non-convex MP is solved only approximately, with a large number of random restarts we may be able to achieve this $1 - 1/M$ approximation.

6.3 Learning and Planning in GSGs

In Section 6.2, we assumed that the parameter vectors $\omega^1, \ldots, \omega^L$ in the attackers' bounded rationality model are known.
Since the defender may not know these parameter values precisely at the beginning of the game in practice, we now aim to learn the attackers' average parameter distribution from attack data. Previous work in green security domains (Yang et al., 2014b; Haskell et al., 2014b) treats each data point, e.g., each snare or fishnet, as an independent attacker and applies MLE to select the most probable parameter vector. However, some of the assumptions made in proposing MLE in previous work may not always hold, as MLE works well when a large number of data samples is used to estimate one set of parameters (Eliason, 1993). Here we show that estimating $\omega$ from a single data point using MLE can lead to highly biased results.

Example 3. Consider a guard protecting two targets in round 1. The payoff structure and initial defender strategy are shown in Table 6.2, where $X \gg 1$ and $0 < \delta \ll 1$. An attacker with parameter vector $\omega = (-1, 0, 0)$ will choose $N_1$ or $N_2$ with probability close to 0.5, as $\omega_1 = -1$ means he has only a slight preference for targets with lower coverage probability (see Equation 7.3). If the attacker attacks $N_1$, applying MLE will lead to an estimate of $\omega = (+\infty, \cdot, \cdot)$, meaning the attacker will always choose the target with higher coverage probability. This is because the probability of attacking $N_1$ approaches 1 as $\omega_1 \to +\infty$, which is higher than under any other parameter value. Similarly, if the attacker attacks $N_2$, an extreme parameter of $(-\infty, \cdot, \cdot)$ is derived from MLE. These extreme parameters will mislead the defender in designing her strategy in the following round.

Target | $R^d_i$ | $P^d_i$ | $R^a_i$ | $P^a_i$ | $c^0_i$
$N_1$ | 1 | $-1$ | 1 | $-1$ | $0.5 + \delta$
$N_2$ | 1 | $-X$ | 1 | $-1$ | $0.5 - \delta$
Table 6.2: Payoff structure of Example 3.

We therefore leverage insights from Bayesian updating. For each data point, we estimate a probability distribution over parameter values instead of selecting the $\omega$ vector that maximizes the likelihood of the outcome.
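The bias in Example 3 can be checked numerically. The sketch below is an illustration, not the chapter's implementation: the discrete $\omega_1$ grid, $\delta = 0.05$, and the restriction to the coverage term (identical rewards and penalties across targets) are assumptions. A single observed attack on $N_1$ drives the grid MLE of $\omega_1$ to the boundary, while the posterior over the same grid keeps a moderate mean.

```python
import math

delta = 0.05
eta = [0.5 + delta, 0.5 - delta]          # initial coverage c^0, as in Table 6.2
grid = [w / 2.0 for w in range(-20, 21)]  # candidate omega_1 values in [-10, 10]

def q_n1(w1):
    # Eq. 6.1 with omega_2 = omega_3 = 0: rewards and penalties are identical
    # across targets, so only the coverage term matters here.
    e1, e2 = math.exp(w1 * eta[0]), math.exp(w1 * eta[1])
    return e1 / (e1 + e2)

likelihood = {w1: q_n1(w1) for w1 in grid}      # one observed attack on N_1
mle = max(grid, key=lambda w1: likelihood[w1])  # monotone in w1: argmax at the edge

z = sum(likelihood.values())                    # Bayes' rule with a uniform prior
posterior = {w1: likelihood[w1] / z for w1 in grid}
post_mean = sum(w1 * p for w1, p in posterior.items())
```

Here `mle` lands on the largest grid value (the finite analogue of $\omega_1 = +\infty$), whereas `post_mean` stays a small positive number; the distribution, rather than a point estimate, is what the learning framework below propagates.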
This approach is also different from maximum a posteriori probability (MAP) estimation, because MAP still provides single-value estimates, whereas Bayesian updating uses distributions to summarize data. Algorithm 7 describes the learning algorithm for one round of the game. Rather than learning single parameter values, one from each attack, we learn a probability distribution. The input of the algorithm includes the number of attacks $\beta_i$ found on each target $i \in [N]$, the attackers' belief $\eta$ of the defender strategy, and the prior distribution $p = \langle p_1, \ldots, p_S \rangle$ over a discrete set of parameter values $\{\hat{\omega}\} = \{\hat{\omega}^1, \ldots, \hat{\omega}^S\}$, each of which is a 3-element tuple. If an attacker attacks target $i$, we can calculate the posterior distribution of this attacker's parameter by applying Bayes' rule based on the prior distribution $p$. We then calculate the average posterior distribution $\bar{p}$ over all attackers.

Algorithm 7 Learn-BU$(\eta, \beta, \{\hat{\omega}\}, p)$
  Output: $\bar{p}$, a probability distribution over $\{\hat{\omega}\} = \{\hat{\omega}^1, \ldots, \hat{\omega}^S\}$.
  for $i = 1$ to $N$ do
    for $s = 1$ to $S$ do
      $p_i(s) = \frac{p(s)\,q_i(\hat{\omega}^s, \eta)}{\sum_r p(r)\,q_i(\hat{\omega}^r, \eta)}$
    end
  end
  for $s = 1$ to $S$ do
    $\bar{p}(s) = \frac{\sum_i \beta_i\,p_i(s)}{\sum_i \beta_i}$
  end

Based on Algorithm 7, we now provide a novel framework that incorporates the learning algorithm into PA-M(-$\gamma$), as shown in Algorithm 8.

Algorithm 8 BU-PA-M-$\gamma$ $(p^1)$
  Output: defender strategy profile $\langle c^1, \ldots, c^T \rangle$.
  for $t = 1$ to $T$ do
    $c^t \leftarrow$ f-PlanAhead$(c^{t-1}, \omega, \min\{T - t, M - 1\})$
    $\bar{p}^t \leftarrow$ Learn-BU$(c^{t-1}, \beta^t, \{\hat{\omega}\}, p^t)$
    $p^{t+1} \leftarrow \bar{p}^t$
  end

The input $p^1$ is the prior distribution of the attackers' parameters before the game starts. This prior distribution is for the general population of attackers, and we need to learn the distribution of the $L$ attackers we are facing in one game. The main idea of the algorithm is to use the average posterior distribution calculated in round $t$ (denoted $\bar{p}^t$) as the prior distribution in round $t + 1$ (denoted $p^{t+1}$), i.e., $p^{t+1} = \bar{p}^t$.
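A minimal executable sketch of Learn-BU (Algorithm 7), together with the SUQR attack probabilities of Equation 6.1. The candidate parameter set, payoffs, coverage, and attack counts below are illustrative assumptions, not values from the chapter.

```python
import math

def suqr_probs(omega, eta, Ra, Pa):
    """Eq. 6.1: probability that a SUQR attacker with omega = (w1, w2, w3)
    attacks each target, given belief eta of the coverage probabilities."""
    w1, w2, w3 = omega
    scores = [math.exp(w1 * e + w2 * r + w3 * p) for e, r, p in zip(eta, Ra, Pa)]
    z = sum(scores)
    return [s / z for s in scores]

def learn_bu(eta, beta, candidates, prior, Ra, Pa):
    """Algorithm 7 sketch: per-target Bayes update over the candidate set,
    then an average posterior weighted by the attack counts beta_i."""
    q = [suqr_probs(w, eta, Ra, Pa) for w in candidates]   # q[s][i]
    post = []
    for i in range(len(eta)):
        z = sum(prior[s] * q[s][i] for s in range(len(candidates)))
        post.append([prior[s] * q[s][i] / z for s in range(len(candidates))])
    total = sum(beta)
    return [sum(beta[i] * post[i][s] for i in range(len(eta))) / total
            for s in range(len(candidates))]

# Illustrative instance: every observed attack hits the lightly covered target,
# so the coverage-averse candidate type should gain posterior mass.
eta = [0.9, 0.1]
Ra, Pa = [1.0, 1.0], [-1.0, -1.0]
candidates = [(-5.0, 0.0, 0.0), (0.0, 0.0, 0.0)]   # coverage-averse vs. indifferent
p_bar = learn_bu(eta, [0, 10], candidates, [0.5, 0.5], Ra, Pa)
```

In this instance the averaged posterior `p_bar` shifts mass toward the coverage-averse type (above its 0.5 prior) without collapsing to a point estimate, which is the behavior Algorithm 8 relies on from round to round.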
Given prior p^t, the function γ-PlanAhead in Line 2 is calculated through Equations 6.2–6.5 by substituting Equation 6.3 with

E_t = L Σ_s Σ_i p^t(s) q_i(ω̂_s, c^{t−1}) U_i^d(c^t).

Note that there was no probability term in Equation 6.3 because there we knew the attackers' parameter values exactly. After we collect data in round t, we apply Learn-BU (Algorithm 7) again and update the prior for the next round (Lines 3–4). This is a simplification of a more rigorous process, which would enumerate the (exponentially many) matchings between the data points and attackers and update the distribution of each attacker separately when the attack data is anonymous (the guard may only find the snares placed on the ground without knowing the identity of the poacher).

When incorporating Algorithm 7 into FS-M, we divide the game into several stages, each containing more than M rounds, and only update the parameter distribution at the end of each stage. As FS-M may not achieve its average expected utility if only a part of the sequence is executed, updating the parameter distribution in every round may lead to a low utility.

6.4 General Case

Generalization from Γ = 1 and α₀ = 0 to Γ > 1 and/or α₀ ∈ [0, 1] can be achieved by generalizing η^t. PA-M(-γ) can be calculated by substituting Constraint 6.4 with η^t = Σ_{k=0}^{Γ} α_k c^{t−k}, and FS-M can be calculated by changing Constraints 6.9–6.10 accordingly. Theorem 8 shows the theoretical bound of FS-M with Γ > 1, and the proof is similar to that of Theorem 7.

Theorem 8. In a GSG with T rounds, for any fixed memory length Γ and period M with Γ < M ≤ T, there exists a cyclic defender strategy profile [s] with period M that is a (1 − Γ/M)·(Z−1)/(Z+1) approximation of the optimal strategy profile in terms of the normalized utility, where Z = ⌈(T+1)/M⌉.

Proof of Theorem 8: Use U([x], x′) to denote the defender's normalized expected reward in a round where defender strategy x′ is used in this round and defender strategy sequence [x] = ⟨x^{−Γ}, ..., x^{−1}⟩ is used in the previous Γ rounds. Then 0 ≤ U([x], x′) ≤ 1.
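The substituted objective can be sketched as follows (Python for illustration; the actual solver is MATLAB's fmincon over the full program in Equations 6.2–6.5):

```python
def planning_objective(L, param_dist, q, def_util):
    """The substituted expected-utility term
    E_t = L * sum_s sum_i p^t(s) q_i(omega_hat_s, c^{t-1}) U_i^d(c^t).

    L: number of attackers
    param_dist[s]: learned distribution p^t(s) over parameter vectors
    q[s][i]: attack probability on target i for parameters omega_hat_s
             under the attackers' believed strategy c^{t-1}
    def_util[i]: defender expected utility U_i^d under the planned c^t
    """
    return L * sum(p * qi * u
                   for p, qs in zip(param_dist, q)
                   for qi, u in zip(qs, def_util))
```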
For the optimal defender strategy profile [c], denote the normalized utility by U_opt. Let ⟨b¹, ..., b^M⟩ be a strategy sequence whose average normalized expected utility over its last M − Γ rounds is maximized, and denote that value by U_b. Let ⟨a¹, ..., a^M⟩ be a strategy sequence such that the average normalized expected utility of the sequence when it forms a cycle is maximized, and denote that value by U_a. Then

M U_a ≥ Σ_{t=Γ+1}^{M} U(⟨b^{t−Γ}, ..., b^{t−1}⟩, b^t) = (M − Γ) U_b.

Construct a defender strategy profile [s] by repeating the strategy sequence ⟨a¹, ..., a^M⟩. Then

T U([s]) ≥ (Z − 1) M U_a          (6.18)
         ≥ (Z − 1)(M − Γ) U_b     (6.19)

Strategy profile [s] contains ⌊(T − Γ)/M⌋ complete cycles (starting from round Γ + 1 with a¹) with average normalized reward U_a. As Z = ⌈(T + 1)/M⌉, ⌊(T − Γ)/M⌋ ≥ Z − 1. Inequality 6.18 is derived by ignoring the first Γ rounds and the last incomplete cycle, if any (when mod(T − Γ, M) ≠ 0). On the other hand, for the optimal defender strategy profile [c] = [c]_opt, we know that for any consecutive sequence of length M, the average normalized reward of its last M − Γ rounds can be no more than U_b. So we divide the strategy profile into ⌈T/(M − Γ)⌉ pieces, each piece of length M − Γ except possibly the last. Then for each piece, the sum of normalized reward is no more than U_b (M − Γ). Thus,

T U_opt ≤ U_b (M − Γ) ⌈T/(M − Γ)⌉     (6.20)
        ≤ (T + M − Γ) U_b             (6.21)

Inequality 6.21 is obtained by conceptually completing the last piece. Combining 6.18–6.21, we get

U([s])/U_opt ≥ (Z − 1)(M − Γ)/(T + M − Γ)
             ≥ (Z − 1)(M − Γ)/(M + ZM)
             = (1 − Γ/M)·(Z − 1)/(Z + 1).

The second inequality is derived from the definition of Z, as T ≤ ZM − 1 ≤ ZM. So the cyclic strategy profile [s] is a (1 − Γ/M)·(Z−1)/(Z+1) approximation of the optimal strategy profile in terms of normalized utility.
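A quick numeric check of the guarantee (Python; the parameter values are chosen only for illustration):

```python
import math

def fs_m_bound(T, M, Gamma):
    """Approximation guarantee of the cyclic FS-M profile (Theorem 8):
    (1 - Gamma/M) * (Z - 1) / (Z + 1), with Z = ceil((T + 1) / M)."""
    assert Gamma < M <= T
    Z = math.ceil((T + 1) / M)
    return (1 - Gamma / M) * (Z - 1) / (Z + 1)
```

For example, with T = 100 rounds, period M = 4, and memory length Γ = 1, the guarantee is (3/4)·(25/27) ≈ 0.69, and it approaches 1 − Γ/M as T (and hence Z) grows.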
6.5 Experimental Results

[Figure 6.2: Experimental results show improvements over algorithms from previous work. Panels: (a) solution quality (Game Set 1); (b) runtime (Game Set 1); (c) solution quality (Game Set 2); (d) defender AEU with varying α₀ (x-axis); (e) solution quality (Γ = 2); (f) actual defender AEU with estimated Γ = 3 and actual Γ ∈ {1, 2, 3, 4}; (g) solution quality for learning; (h) runtime for learning.]

We test all the proposed algorithms on GSGs motivated by scenarios in green security domains such as defending against poaching and illegal fishing. Each round corresponds to 30 days, and each poacher/fisherman chooses a target to place snares/fishnets every day. All algorithms are implemented in MATLAB, with the fmincon function used for solving MPs, and tested on a 2.4GHz CPU with 128 GB memory. All key differences noted are statistically significant (p < 0.05).

6.5.1 Planning Algorithms

We compare the proposed planning algorithms PA-M(-γ) and FS-M with the baseline approaches FS-1 and PA-1. FS-1 is equivalent to calculating the defender strategy under a perfect Stackelberg assumption, which is used in previous work (Yang et al., 2014b; Haskell et al., 2014b): the defender uses the same strategy in every round, and the attackers' belief coincides with the defender strategy. PA-1 is the myopic strategy, which tries to maximize the defender's expected utility in the current round. We assume c⁰ is the MAXIMIN strategy.
We first consider the special case (α₀ = 0, Γ = 1) and test on 32 game instances with 5 attackers, 3 targets, 1 guard, and 100 rounds, with random rewards and penalties chosen from [0, 10] and [−10, 0] respectively (denoted as Game Set 1). We run 100 restarts for each MP. Figure 6.2(a) shows that PA-M(-γ) and FS-M significantly outperform FS-1 and PA-1 in terms of the defender's average expected utility (AEU). This means that using the perfect Stackelberg assumption would be detrimental to the defender if the attackers respond to last round's strategy. For PA-M, adding a discount factor may improve the solution. Figure 6.2(b) shows that FS-M takes much less time than PA-M overall, as FS-M only needs to solve one MP throughout a game while PA-M solves an MP in each round. We also test on 32 games with 100 attackers, 10 targets, 4 guards, and 100 rounds (denoted as Game Set 2) in the special case (see Figure 6.2(c)). We set a 1-hour runtime limit for the algorithms and, again, FS-M and PA-M(-γ) significantly outperform FS-1 and PA-1 in solution quality.

We then test general cases on Game Set 2. Figure 6.2(d) shows the defender's AEU with varying α₀ when Γ = 1. In the extreme case of α₀ = 1, i.e., the attackers have perfect knowledge of the current defender strategy, the problem reduces to a repeated Stackelberg game, and all approaches provide similar solution quality. However, when α₀ < 0.5, FS-2 and PA-2 provide a significant improvement over FS-1 and PA-1, indicating the importance of planning.

We further test the robustness of FS-2 when there is a slight deviation in α₀, with Γ = 1 (see Figure 6.3). For example, the value of 5.891 in the 2nd row, 1st column of the table is the defender's AEU when the actual α₀ = 0 and the defender assumes (estimates) it to be 0.125 when calculating her strategies. Cells on the diagonal show the case when the estimation is accurate. Cells in the last row show results for the baseline algorithm FS-1.
FS-1 uses the Stackelberg assumption, and thus the estimated value makes no difference. When the actual value slightly deviates from the defender's estimate (cells adjacent to the diagonal ones in the same column), the solution quality does not change much if the actual α₀ > 0.5. When the actual α₀ < 0.5, FS-2 outperforms FS-1 significantly even given the slight deviation.

In Figure 6.2(e), we compare algorithms assuming Γ = 2, α₁ = α₂ = 0.5, and α₀ = 0. As expected, PA-M with M > 1 and FS-M with M > 2 significantly outperform FS-1 and PA-1. The improvement of FS-2 over FS-1 is negligible, as any fixed sequence of length 2 can be exploited by attackers with memory length Γ = 2. Figure 6.2(f) shows the solution quality of PA-M when the defender assumes the attackers' memory length is Γ = 3, but the actual value of Γ varies from 1 to 4. When Γ is slightly over-estimated (actual Γ = 1 or 2), PA-M still significantly outperforms the baseline algorithms FS-1 and PA-1. However, when Γ is under-estimated (actual Γ = 4), the attackers have a longer memory than the defender's estimate, and thus the attackers can exploit the defender's planning. This observation suggests that it is more robust to over-estimate the attackers' memory length when there is uncertainty in Γ. We defer to future work learning Γ and the weights α_k from attack data.

[Figure 6.3: Robustness against uncertainty in α₀ when Γ = 1.]

6.5.2 Learning and Planning Framework

When the parameter vectors {ω_l} are unknown, we compare Algorithm 7 with the baseline learning algorithm that uses MLE (denoted as MLE) when incorporated into the planning algorithms. In each game of Game Set 2, we randomly choose {ω_l} for the 100 attackers from a three-dimensional normal distribution with mean μ = (−17.81, 0.72, 0.47) and covariance

Σ = [ 209.48   2.64   0.71
        2.64   0.42   0.24
        0.71   0.24   0.36 ].

We use BU to denote the case when an accurate prior (μ and Σ) is given to the defender.
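The sampling step can be sketched as below (Python; the sign of the mean's first component is reconstructed from the SUQR semantics, and the Cholesky-based sampler is an illustrative stand-in for however the experiments drew the vectors):

```python
import math
import random

# Mean and covariance from the experimental setup; the first component
# (the coverage weight) is taken as negative, since SUQR attackers
# avoid highly covered targets.
MU = [-17.81, 0.72, 0.47]
SIGMA = [[209.48, 2.64, 0.71],
         [2.64, 0.42, 0.24],
         [0.71, 0.24, 0.36]]

def cholesky3(a):
    """Lower-triangular Cholesky factor of a 3x3 SPD matrix."""
    l = [[0.0] * 3 for _ in range(3)]
    for i in range(3):
        for j in range(i + 1):
            s = sum(l[i][k] * l[j][k] for k in range(j))
            l[i][j] = math.sqrt(a[i][i] - s) if i == j else (a[i][j] - s) / l[j][j]
    return l

def sample_omega(rng, chol=None):
    """Draw one attacker parameter vector omega ~ N(MU, SIGMA)."""
    chol = chol or cholesky3(SIGMA)
    z = [rng.gauss(0.0, 1.0) for _ in range(3)]
    return [MU[i] + sum(chol[i][k] * z[k] for k in range(3)) for i in range(3)]
```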
Recall that the defender plays against 100 attackers throughout a game, and BU aims to learn the parameter distribution of these 100 attackers. BU′ represents the case when the prior distribution is a slightly deviated estimate (a normal distribution with random μ′ and Σ′ satisfying |μ_i − μ′_i| ≤ 5 and |Σ′_ii − Σ_ii| ≤ 5). KnownPara represents the case when the exact values of {ω_l} are known to the defender. We set a time limit of 30 minutes for the planning algorithms. Figures 6.2(g)–6.2(h) show that BU and BU′ significantly outperform MLE. Indeed, the solution quality of BU and BU′ is close to that of KnownPara, indicating the effectiveness of the proposed learning algorithm. Also, BU and BU′ run much faster than MLE, as MLE solves a convex optimization problem for each target in every round.

6.6 Chapter Summary

So far, the field had been lacking an appropriate game-theoretic model for green security domains; this chapter provides Green Security Games (GSGs) to fill this gap. GSGs' generalization of the Stackelberg assumption, which is commonly used in previous work, has led to two new planning algorithms as well as a new learning framework, providing a significant advance over previous work in green security domains (Yang et al., 2014b; Haskell et al., 2014b). Additional related work includes criminological work on poaching and illegal fishing (Lemieux, 2014; Beirne & South, 2007), but a game-theoretic approach is completely missing in that line of research.

Chapter 7
Reasoning about Spatial Constraints

In bringing a defender strategy represented by coverage probabilities to the real world, my thesis addresses one important aspect: dealing with spatial constraints. Efforts have been made by law enforcement agencies in many countries to protect endangered animals; the most commonly used approach is conducting foot patrols.
However, given their limited human resources, improving the efficiency of patrols to combat poaching remains a major challenge. To address this problem, prior work introduced a novel emerging application called PAWS (Protection Assistant for Wildlife Security) (Yang et al., 2014b); PAWS is proposed as a game-theoretic decision aid to optimize the use of human patrol resources to combat poaching. PAWS is an application in the general area of "security games" (Tambe, 2011); security-game-based decision support systems have previously been successfully deployed in the real world to protect critical infrastructure such as airports, flights, ports, and metro trains. PAWS was inspired by this success, and was the first of a new wave of proposed applications in the subarea now called "green security games" (Fang, Stone, & Tambe, 2015; Kar, Fang, Fave, Sintov, & Tambe, 2015). Specifically, PAWS solves a repeated Stackelberg security game, where the patrollers (defenders) conduct randomized patrols against poachers (attackers), while balancing the priorities of different locations with different animal densities. Despite its promise, the initial PAWS effort did not test the concept in the field. This chapter reports on PAWS's significant evolution over the last two years from a proposed decision aid to a regularly deployed application. We report on the innovations made in PAWS and lessons learned from the first tests in Uganda in Spring 2014, through PAWS's continued evolution to current regular use in Malaysia (in collaboration with two Non-Governmental Organizations: Panthera and Rimba). Indeed, the first tests revealed key shortcomings in PAWS's initial algorithms and assumptions (we will henceforth refer to the initial version of PAWS as PAWS-Initial, and to the version after our enhancement as PAWS). First, a major limitation, the severity of which was completely unanticipated, was that PAWS-Initial ignored topographic information.
Yet in many conservation areas, large changes in elevation and the existence of large water bodies can make a big difference in the effort needed for patrollers' movement. These factors also have a direct effect on poachers' movement. Second, PAWS-Initial assumed animal density and relevant problem features at different locations to be known. However, in practice, there are uncertainties in the payoffs of different locations due to uncertainty over animal movement. Not considering such uncertainty may lead to high degradation in patrol quality. Third, PAWS-Initial could not scale to provide detailed patrol routes in large conservation areas. Detailed routes require fine-grained discretization, which leads to a large number of feasible patrol routes. Finally, PAWS-Initial failed to consider patrol scheduling constraints.

In this chapter, we outline novel research advances which remedy the aforementioned limitations, making it possible to deploy PAWS on a regular basis. First, we incorporate elevation information and land features and use a novel hierarchical modeling approach to build a virtual "street map" of the conservation area. This virtual "street map" helps scale up while providing fine-grained guidance, and is an innovation that would be useful in many other domains requiring patrolling of large areas. Essentially, the street map connects the whole conservation area through easy-to-follow route segments such as ridgelines, streams, and river banks. The rationale for this comes from the fact that animals, poachers, and patrollers all use these features while moving.
To address the second and third limitations, we build on the street map concept with a novel algorithm that uniquely synthesizes two threads of prior work in the security games literature; specifically, the new PAWS algorithm handles payoff uncertainty using the concept of minimax regret (Nguyen, Fave, Kar, Lakshminarayanan, Yadav, Tambe, Agmon, Plumptre, Driciru, Wanyama, & Rwetsiba, 2015), while simultaneously ensuring scalability, using our street maps, via the cutting plane framework (Yang, Jiang, Tambe, & Ordóñez, 2013). To address the final limitation, we incorporate in PAWS's algorithm the ability to address constraints such as the patrol time limit and starting and ending at the base camp. In the final part of the chapter, we provide detailed information about the regular deployment of PAWS.

7.1 Background

Criminologists have begun to work on the problem of combating poaching, from policy design to illegal trade prevention (Lemieux, 2014). Geographic Information Systems (GIS) experts (Hamisi, 2008) and wildlife management staff (Wato, Wahungu, & Okello, 2006) have carefully studied the identification of poaching hotspots. In recent years, software tools such as SMART (SMART, 2013) and MIST (Stokes, 2010) have been developed to help conservation managers record data and analyze patrols retrospectively. We work on a complementary problem: optimizing the patrol planning of limited security staff in conservation areas.

In optimizing security resource allocation, previous work on Stackelberg Security Games (SSGs) has led to many successfully deployed applications for the security of airports, ports, and flights (Pita et al., 2008; Fang, Jiang, & Tambe, 2013). Based on the early work on SSGs, recent work has focused on green security games (Kar et al., 2015), providing conceptual advances in integrating learning and planning (Fang et al., 2015) and the first application to wildlife security, PAWS-Initial.
PAWS-Initial (Yang et al., 2014b) models the interaction between the patroller (defender) and the poacher (attacker) who places snares in the conservation area (see Figure 7.1) as a basic green security game, i.e., a repeated SSG, where every few months, poaching data is analyzed and a new SSG is set up to improve patrolling strategies. The deployed version of PAWS adopts this framework. We provide a brief review of SSGs, using PAWS as a key example.

In SSGs, the defender protects T targets from an adversary by optimally allocating a set of R resources (R < T) (Pita et al., 2008). In PAWS, the defender discretizes the conservation area into a grid, where each grid cell is viewed as a target for poachers, to be protected by a set of patrollers. The defender's pure strategy is an assignment of the resources to targets. The defender can choose a mixed strategy, which is a probability distribution over pure strategies. The defender strategy can be compactly represented as a coverage vector c = ⟨c_i⟩, where c_i is the coverage probability, i.e., the probability that a defender resource is assigned to target i (Korzhyk, Conitzer, & Parr, 2010c). The adversary observes the defender's mixed strategy through surveillance and then attacks a target. An attack could refer to the poacher, a snare, or some other aspect facilitating poaching (e.g., a poaching camp). Each target is associated with payoff values which indicate the reward and penalty for the players. If the adversary attacks target i, and i is protected by the defender, the defender gets reward U_{r,i}^d and the adversary receives penalty U_{p,i}^a. Conversely, if not protected, the defender gets penalty U_{p,i}^d and the adversary receives reward U_{r,i}^a.
Given a defender strategy c, the players' expected utilities when target i is attacked are:

U_i^a = c_i U_{p,i}^a + (1 − c_i) U_{r,i}^a      (7.1)
U_i^d = c_i U_{r,i}^d + (1 − c_i) U_{p,i}^d      (7.2)

The game in PAWS is zero-sum: U_{r,i}^d = −U_{p,i}^a and U_{p,i}^d = −U_{r,i}^a. U_{r,i}^a is decided by animal density: higher animal density implies higher payoffs.

In SSGs, the adversary's behavior model decides his response to the defender's mixed strategy. Past work has often assumed that the adversary is perfectly rational, choosing a single target with the highest expected utility (Pita et al., 2008). PAWS is the first deployed application that relaxes this assumption in favor of a bounded rationality model called SUQR, which models the adversary's stochastic response to the defender's strategy (Nguyen et al., 2013b). SUQR was shown to perform best in human subject experiments when compared with other models. Formally, SUQR predicts the adversary's probability of attacking i as follows:

q_i = exp(w₁ c_i + w₂ U_{r,i}^a + w₃ U_{p,i}^a) / Σ_j exp(w₁ c_j + w₂ U_{r,j}^a + w₃ U_{p,j}^a)      (7.3)

where (w₁, w₂, w₃) are parameters indicating the importance of three key features: the coverage probability and the attacker's reward and penalty. The parameters can be learned from data.

[Figure 7.1: A picture of a snare placed by poachers.]

7.2 First Tests and Feedback

We first tested PAWS-Initial (Yang et al., 2014b) at Uganda's Queen Elizabeth National Park (QENP) for 3 days. Subsequently, with the collaboration of Panthera and Rimba, we have been working in forests in Malaysia since September 2014¹. These protected forests are home to endangered animals such as the Malayan Tiger and Asian Elephant but are threatened by poachers. One key difference of this site compared to QENP is that there are large changes in elevation, and the terrain is much more complex. The first 4-day patrol in Malaysia was conducted in November 2014.
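Equations 7.1–7.3 can be sketched directly (Python for illustration; the SUQR weights below are placeholder values, not the parameters learned for deployment):

```python
import math

def expected_utilities(c, a_rew, a_pen):
    """Per-target expected utilities (Equations 7.1-7.2); the game is
    zero-sum, so the defender's utility is the negation of the
    attacker's."""
    u_att = [ci * p + (1 - ci) * r for ci, r, p in zip(c, a_rew, a_pen)]
    return u_att, [-u for u in u_att]

def suqr_response(c, a_rew, a_pen, w=(-9.0, 0.5, 0.3)):
    """SUQR attack distribution (Equation 7.3). The weights w here are
    illustrative placeholders only."""
    w1, w2, w3 = w
    logits = [w1 * ci + w2 * r + w3 * p for ci, r, p in zip(c, a_rew, a_pen)]
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]
```

With a negative coverage weight w₁, raising coverage on one target shifts attack probability toward the less-covered targets, which is the stochastic-response behavior SUQR captures.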
These initial tests revealed four areas of shortcomings, which restricted PAWS-Initial from being used regularly and widely.

The first limitation, which was surprising given that it had received no attention in previous work on security games, is the critical importance of topographic information. Topography can affect patrollers' speed in key ways. For example, lakes are inaccessible to foot patrols. Not considering such information may lead to failure to complete the patrol route. Figure 7.2 shows one patrol route during the test in Uganda. The suggested route (orange straight line) goes across the water body (lower right part of the figure), and hence the patrollers decided to walk along the water body (black line). Also, changes in elevation require extra patrol effort, and extreme changes may stop the patrollers from following a route. For example, in Figure 7.3(a) [Malaysia], PAWS-Initial planned a route on a 1km by 1km grid (straight lines) and suggested that the patrollers walk to the north area (Row 1, Column 3) from the south side (Row 2, Column 3). However, such movement was extremely difficult because of the large changes in elevation. So the patrollers decided to head towards the northwest area, where the elevation change is more gentle.

[Figure 7.2: One patrol route during the test in Uganda.]
[Figure 7.3: First 4-day patrol in Malaysia. (a) Deployed route: one suggested route (orange straight lines) and the actual patrol track (black line). (b) Patrollers walking along a stream during the patrol.]
[Figure 7.4: Illustrative examples. (a) Ridgeline; (b) Feasible routes; (c) Coverage.]

¹ For the security of animals and patrollers, no latitude/longitude information is presented in this chapter.
In addition, it is necessary to focus on terrain features such as ridgelines and streams (Figure 7.3(b)) when planning routes, for three reasons: (i) they are important conduits for certain mammal species such as tigers; (ii) hence, poachers use these features for trapping and for moving about in general; and (iii) patrollers find it easier to move around on them than on slopes. Figure 7.4(a) shows a prominent ridgeline.

The second limitation is that PAWS-Initial assumes the payoff values of the targets, e.g., U_{r,i}^a, are known and fixed. In the domain of wildlife protection, there can be uncertainties due to animal movement and seasonal changes. Thus, considering payoff uncertainty is necessary for optimizing the patrol strategy.

The third limitation is that PAWS-Initial cannot scale to provide detailed patrol routes in large conservation areas, which is necessary for successful deployment. Detailed routes require fine-grained discretization, which leads to an exponential number of routes in total.

The fourth limitation is that PAWS-Initial considers covering individual grid cells, but not feasible routes. In practice, the total patrolling time is limited, and the patrollers can only move to nearby areas. A patrol strategy for implementation should be in the form of a distribution over feasible patrol routes satisfying these constraints. Without taking these scheduling (routing) constraints into account, the optimal coverage probabilities calculated by PAWS-Initial may not be implementable. Figure 7.4(b) shows an example area that is discretized into four cells, with the base camp located in the upper left cell. There are three available patrol routes, each protecting two targets. The coverage probabilities shown in Figure 7.4(c) cannot be achieved by a randomization over the three routes, because the coverage of the upper left cell (Target 1) should be no less than the overall coverage of the remaining three cells, since all routes start from the base camp.
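The implementability issue can be illustrated with a small sketch (Python; cells are 0-indexed here with the base cell as target 0, and the infeasible coverage numbers are hypothetical since Figure 7.4(c)'s values are not reproduced in the text):

```python
def coverage_from_distribution(routes, dist):
    """Coverage vector induced by a distribution over patrol routes;
    routes are tuples of protected cell indices (0 = base cell)."""
    n = 1 + max(t for r in routes for t in r)
    c = [0.0] * n
    for route, p in zip(routes, dist):
        for t in route:
            c[t] += p
    return c

def violates_base_cut(c):
    """Necessary condition from the example: each route covers the base
    cell plus at most one other cell, so any implementable coverage
    satisfies c[0] >= c[1] + c[2] + ... (the returned cutting plane)."""
    return c[0] < sum(c[1:]) - 1e-9

# The three routes of Figure 7.4(b): base cell 0 plus one neighbor each.
routes = [(0, 1), (0, 2), (0, 3)]
c_ok = coverage_from_distribution(routes, [0.5, 0.25, 0.25])
c_bad = [0.4, 0.3, 0.3, 0.2]   # hypothetical coverage target that fails
```

Any distribution over these routes pushes all probability mass through the base cell, so c_bad cannot be realized no matter how the three routes are mixed.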
7.3 PAWS Overview and Game Model

Figure 7.5 provides an overview of the deployed version of PAWS. PAWS first takes the input data and estimates the animal distribution and human activity distribution. Based on this information, an SSG-based game model is built, and the patrol strategy is calculated. In wildlife protection, there is repeated interaction between patrollers and poachers. When patrollers execute the patrol strategy generated by PAWS over a period (e.g., three months), more information is collected and can become part of the input in the next round.

PAWS provides significant innovations in addressing the aforementioned limitations of PAWS-Initial. In building the game model, PAWS uses a novel hierarchical modeling approach to build a virtual street map, incorporating detailed topographic information. PAWS models the poacher's bounded rationality as described by the SUQR model and considers uncertainty in payoff values. In calculating the patrol strategy, PAWS uses the ARROW algorithm (Nguyen et al., 2015) to deal with payoff uncertainty, and adopts a cutting plane approach and column generation to address the scalability issue introduced by scheduling constraints.

[Figure 7.5: PAWS overview. The pipeline runs from input (terrain information, patrol tracks, observation data) through initial analysis (animal distribution, human activity distribution), game model construction (street map, payoff uncertainty, poacher behavior model), and patrol strategy calculation (ARROW, route generation, cutting plane), to execution and collection of new observations.]

7.3.1 Input and Initial Analysis

The input information includes contour lines which describe the elevation, terrain information such as lakes and drainage, base camp locations, previous observations (animals and human activities), as well as previous patrol tracks. However, the point detections of animal and human activity presence are not likely to be spatially representative.
As such, it is necessary to predict the animal and human activity distributions over the entire study area. To this end, we used: 1) JAGS (Plummer, 2003) to produce a posterior predictive density raster for tigers (as a target species), derived from a spatially explicit capture-recapture analysis conducted in a Bayesian framework; and 2) MaxEnt (Phillips, Anderson, & Schapire, 2006) to create a raster of predicted human activity distribution based on meaningful geographical covariates (e.g., distance to water, slope, elevation) in a Maximum Entropy modelling framework.

7.3.2 Build Game Model

Based on the input information and the estimated distributions, we build a game model abstracting the strategic interaction between the patroller and the poacher as an SSG. Building a game model involves defender action modeling, adversary action modeling, and payoff modeling. We will discuss all three parts but emphasize defender action modeling, since this is one of the major challenges in bringing PAWS to a regularly deployed application. Given the topographic information, modeling defender actions in PAWS is far more complex than in any previous security game domain.
This approach would be applicable in many other domains for large open area patrolling where security games are applicable, not only other green security games applications, but others including patrolling of large warehouse areas or large open campuses via robots or UA Vs. More specifically, we leverage insights from hierarchical abstraction for heuristic search such as path planning (Botea, M ˜ AŒller, & Schaeffer, 2004) and apply two levels of discretization to the conservation area. We first discretize the conservation area into 1km by 1km Grid Cells and treat every grid cell as a target. We further discretize the grid cells into 50m by 50m Raster Pieces and describe the topographic information such as elevation in 50m scale. The defender actions are patrol routes defined over a virtual “street map” – which is built in the terms of raster pieces while aided by the grid cells in this abstraction as described below. With this hierarchical 144 Figure 7.6: KAPs (black) for 2 by 2 grid cells. modeling, the model keeps a small number of targets and reduces the number of patrol routes while allowing for details at the 50m scale. The street map is a graph consisting of nodes and edges, where the set of nodes is a small subset of the raster pieces and edges are sequences of raster pieces linking the nodes. We denote nodes as Key Access Points (KAPs) and edges as route segments. The street map not only helps scalability but also allows us to focus patrolling on preferred terrain features such as ridgelines. The street map is built in three steps: (i) determine the accessibility type for each raster piece, (ii) define KAPs and (iii) find route segments to link the KAPs. In the first step, we check the accessibility type of every raster piece. For example, raster pieces in a lake are inaccessible, whereas raster pieces on ridgelines or previous patrol tracks are easily accessible. 
Ridgelines and valley lines are inferred from the contour lines using existing approaches in hydrology (Tarboton, Bras, & Rodriguez-Iturbe, 2007). The second step is to define a set of KAPs, via which patrols will be routed. We want to build the street map in such a way that each grid cell can be reached. So we first choose as KAPs the raster pieces which can serve as entries and exits for the grid cells, i.e., the ones that are on the boundary of grid cells and are easily accessible. In addition, we consider existing base camps and mountain tops as KAPs, as they are key points in planning the patroller's route. We choose additional KAPs to ensure that KAPs on the boundary of adjacent cells are paired. Figure 7.6 shows identified KAPs and easily accessible pieces (black and gray raster pieces, respectively).

The last step is to find route segments to connect the KAPs. Instead of inefficiently finding route segments to connect each pair of KAPs on the map globally, we find route segments locally for each pair of KAPs within the same grid cell, which is sufficient to connect all the KAPs. When finding a route segment, we design a distance measure which estimates the actual patrol effort and also gives high priority to the preferred terrain features. The effort needed for three-dimensional movement can be interpreted as an equivalent distance on flat terrain. For example, for gentle slopes, the equivalent "flat-terrain" distance is obtained by adding 8km for every 1km of elevation ascent, according to Naismith's rule (Thompson, 2011). In PAWS, we apply Naismith's rule with Langmuir corrections (Langmuir, 1995) for gentle slopes (< 20°) and apply Tobler's hiking speed function (Tobler, 1993) for steep slopes (≥ 20°). Very steep slopes (> 30°) are not allowed. We penalize not walking on preferred terrain features by adding extra distance. Given the distance measure, the route segment is defined as the shortest-distance path linking two KAPs within the grid cell.
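A minimal sketch of this distance measure and the shortest-path computation (Python; the Langmuir and Tobler corrections and the off-terrain penalty constant are simplified placeholders):

```python
import heapq

def equivalent_flat_km(horizontal_km, ascent_km, on_preferred_terrain=True,
                       off_terrain_penalty_km=0.5):
    """Simplified Naismith-style effort measure: each 1 km of ascent
    counts as 8 km of flat walking; moving off preferred terrain
    features (ridgelines, streams) incurs an assumed extra distance."""
    d = horizontal_km + 8.0 * ascent_km
    if not on_preferred_terrain:
        d += off_terrain_penalty_km
    return d

def route_segment(neighbors, cost, start, goal):
    """Dijkstra's algorithm over raster pieces: the route segment is the
    least-equivalent-distance path linking two KAPs in a grid cell."""
    dist = {start: 0.0}
    prev = {}
    heap = [(0.0, start)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == goal:
            break
        if d > dist[u]:
            continue  # stale heap entry
        for v in neighbors.get(u, ()):
            nd = d + cost(u, v)
            if nd < dist.get(v, float("inf")):
                dist[v], prev[v] = nd, u
                heapq.heappush(heap, (nd, v))
    path = [goal]
    while path[-1] != start:
        path.append(prev[path[-1]])
    return path[::-1], dist[goal]
```

Here `cost(u, v)` would be the equivalent flat distance of moving between adjacent raster pieces, so high climbs and off-feature moves are naturally avoided.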
The defender’s pure strategy is defined as a patrol route on the street map, starting from the base camp, walking along route segments and ending with base camp, with its total distance satisfying the patrol distance limit (all measured as the distance on flat terrain). The patroller confiscates the snares along the route and thus protects the grid cells. More specifically, if the patroller walks along a route segment which covers a sufficiently large portion (e.g., 50% of animal distribution) of a grid cell, the cell is considered to be protected. The defender’s goal is to find an optimal mixed patrol strategy — a probability distribution over patrol routes. 146 7.3.2.2 Poacher Action Modeling and Payoff Modeling The poacher’s actions are defined over the grid cells to aid scalability. In this game, we assume the poacher can observe the defender’s mixed strategy and then chooses one target (a grid cell) and places snares in this target. Following earlier work, the poacher in this game is assumed to be boundedly rational, and his actions can be described by the SUQR model. Each target is associated with payoff values indicating the reward and penalty for the pa- trollers and the poachers. As mentioned earlier, PAWS models a zero-sum game and the reward for the attacker (and the penalty for the defender) is decided by the animal distribution. However, in this game model, we need to handle uncertainty in the players’ payoff values since key domain features such as animal density which contribute to the payoffs are difficult to precisely estimate. In addition, seasonal or dynamic animal migration may lead to payoffs to become uncertain in the next season. We use intervals to represent payoff uncertainty in PAWS; the payoffs are known to lie within a certain interval whereas the exact values are unknown. Interval uncertainty is, in fact, a well-known concept to capture uncertainty in security games (Nguyen, Yadav, An, Tambe, & Boutilier, 2014; Nguyen et al., 2015). 
We determine the size of the payoff intervals at each grid cell based on the patrollers' patrol effort at that cell. Intuitively, if the patrollers patrol a cell more frequently, there is less uncertainty in the players' payoffs at that target and thus a smaller size of the payoff intervals.

7.4 Calculate Patrol Strategy

We build on algorithms from the rich security game literature to optimize the defender strategy. However, we find that no existing algorithm directly fits our needs, as we need an algorithm that can scale up to the size of the domain of interest, where: (i) we must generate patrol routes over the street map over the entire conservation area, while (ii) simultaneously addressing payoff uncertainty and (iii) the bounded rationality of the adversary.

Figure 7.7: New integrated algorithm. [ARROW computes an optimal coverage vector ĉ given a set of linear constraints S. A separation oracle checks whether ĉ ∈ C; if ĉ ∉ C, it finds a cutting plane (a hyperplane separating ĉ from the feasible region C), route generation finds the routes that constitute the separating hyperplane, a new constraint s is obtained, and S = S ∪ {s}.]

While the ARROW (Nguyen et al., 2015) algorithm allows us to address (ii) and (iii) together, it cannot handle scale-up over the street map. Indeed, while the (virtual) street map is of tremendous value in scaling up as discussed earlier, scaling up given all possible routes (about 10^12 routes) on the street map is still a massive research challenge. We therefore integrate ARROW with another algorithm, BLADE (Yang et al., 2013), to address the scalability issue, resulting in a novel algorithm that can handle all three aforementioned challenges. The new algorithm is outlined in Figure 7.7. In the following, we explain how ARROW and BLADE are adapted and integrated. ARROW attempts to compute a strategy that is robust to payoff uncertainty given that poachers' responses follow SUQR.
The concept of minimizing maximum regret is well known in AI for decision making under uncertainty (Wang & Boutilier, 2003). ARROW uses the solution concept of behavioral minimax regret to provide the strategy that minimizes regret, or utility loss, for the patrollers in the presence of payoff uncertainty and boundedly rational attackers. In small-scale domains, ARROW could be provided with all the routes (the defender's pure strategies), on the basis of which it would calculate the PAWS solution – a distribution over the routes. Unfortunately, in large-scale domains like ours, enumerating all the routes is infeasible. We must therefore turn to an approach of incremental solution generation, which is where ARROW interfaces with the BLADE framework. More specifically, for scalability reasons, ARROW first generates the robust strategy for the patrollers in the form of coverage probabilities over the grid cells, without consideration of any routes. Then a separation oracle in BLADE is called to check whether the coverage vector is implementable. If it is implementable, the oracle returns a probability distribution over patrol routes that implements the coverage vector, which is the desired PAWS solution. If it is not implementable – see Figure 7.4(c) for an example of a coverage vector that is not implementable – the oracle returns a constraint (cutting plane) that informs ARROW why it is not. For the example in Figures 7.4(b)-7.4(c), if ARROW generates a vector as shown in Figure 7.4(c), the constraint returned could be c_1 ≤ Σ_{i=2}^{4} c_i, since every implementable coverage vector should satisfy this constraint. This constraint helps ARROW refine its solution. The process repeats until the coverage vector generated by ARROW is implementable.
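The cutting-plane loop described above can be sketched as follows; `arrow_solve` and `separation_oracle` are hypothetical stand-ins for the actual ARROW optimization and BLADE oracle, whose internals the text does not specify at this level of detail.

```python
def compute_paws_strategy(arrow_solve, separation_oracle, max_iters=100):
    """Sketch of the ARROW-BLADE master loop.

    arrow_solve(constraints) -> coverage vector c_hat, i.e., the robust
        strategy over grid cells (ignoring routes) subject to the
        accumulated linear constraints S.
    separation_oracle(c_hat) -> (route_distribution, None) if c_hat is
        implementable by a distribution over patrol routes, or
        (None, cutting_plane) with a violated linear constraint
        explaining why c_hat is not implementable.
    """
    constraints = []                   # the constraint set S
    for _ in range(max_iters):
        c_hat = arrow_solve(constraints)
        routes, cut = separation_oracle(c_hat)
        if cut is None:
            return routes              # the desired PAWS solution
        constraints.append(cut)        # S = S ∪ {s}; refine and repeat
    raise RuntimeError("did not converge within max_iters")
```

In practice, the loop terminates once ARROW's coverage vector falls inside the implementable region C, at which point the oracle's route distribution is returned directly as the mixed patrol strategy.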
As described in BLADE (Yang et al., 2013), to avoid enumerating all the routes when checking whether the coverage vector is implementable, the separation oracle iteratively generates routes until it has just enough routes (usually after a small number of iterations) to match the coverage vector probabilities or to obtain the constraint (cutting plane). At each iteration of this route generation (shown in the bottom-most box in Figure 7.7), the new route is optimized to cover targets of high value. However, we cannot directly use any existing algorithm to find the optimal route at each iteration due to the presence of our street map. But we note similarities to the well-studied orienteering problem (Vansteenwegen, Souffriau, & Oudheusden, 2011) and exploit the insight of the S-algorithm for orienteering (Tsiligiridis, 1984). In particular, in this bottom-most box of Figure 7.7, to ensure each route returned is of high quality, we run a local search over a large number of routes and return the one with the highest total value. In every iteration, we start from the base KAP and choose which KAP to visit next through a weighted random selection. The next KAP to be visited can be any KAP on the map, and we assume the patroller will take the shortest path from the current KAP to the next KAP. The weight of each candidate KAP is proportional to the ratio of the additional target value that can be accrued to the distance from the current KAP. We set the lower bound of the weight to be some ε > 0 to make sure every feasible route can be chosen with positive probability. The process continues until the patroller has to go back to the base to meet the patrol distance limit constraint. Given a large number of such routes, our algorithm returns a route close to the optimal solution. Integrating all these algorithms, PAWS calculates the patrol strategy consisting of a set of patrol routes and the corresponding probabilities for taking them.
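The weighted random route construction above can be sketched as follows; `value` (additional target value accrued at a KAP) and `dist` (shortest-path equivalent flat-terrain distance between KAPs) are hypothetical callables standing in for quantities PAWS computes from the street map.

```python
import random

def sample_route(base, kaps, value, dist, limit, epsilon=1e-3, rng=random):
    """One stochastic route construction in the spirit of the
    S-algorithm for orienteering: starting from the base KAP, repeatedly
    pick the next KAP with probability proportional to
    (additional value / distance), bounded below by epsilon so every
    feasible route has positive probability, while the return trip to
    the base still fits within the patrol distance limit."""
    route, used, visited = [base], 0.0, {base}
    cur = base
    while True:
        # Only KAPs we can still visit and return from are candidates.
        candidates = [k for k in kaps if k not in visited
                      and used + dist(cur, k) + dist(k, base) <= limit]
        if not candidates:
            break
        weights = [max(value(k) / dist(cur, k), epsilon) for k in candidates]
        nxt = rng.choices(candidates, weights=weights, k=1)[0]
        used += dist(cur, nxt)
        route.append(nxt)
        visited.add(nxt)
        cur = nxt
    route.append(base)  # head back to base within the distance limit
    return route
```

The local search then draws a large number of such routes and keeps the one with the highest total target value, which is the high-quality route returned to the separation oracle.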
7.5 Deployment and Evaluation

PAWS patrols are now regularly deployed at a conservation area in Malaysia. This section provides details about the deployment and both subjective and objective evaluations of PAWS patrols. PAWS supports daily patrols conducted from base camps. Before a patrol starts, PAWS generates the patrol strategy starting from the base camp selected by the patrol team leader. The patrol distance limit considered by PAWS is 10km per day (equivalent flat terrain). As shown in Table 7.1, this leads to about 9000 raster pieces to be considered. Thus, it is impossible to consider each raster piece as a separate target or to consider all possible routes over the raster pieces.

Table 7.1: Problem Scale for PAWS Patrols.
  Average # of Reachable Raster Pieces: 9066.67
  Average # of Reachable Grid Cells (Targets): 22.67
  Average # of Reachable KAPs: 194.33

Table 7.2: Basic Information of PAWS Patrols.
  Average Trip Length: 4.67 days
  Average Number of Patrollers: 5
  Average Patrol Time Per Day: 4.48 hours
  Average Patrol Distance Per Day: 9.29 km

With the two levels of discretization and the street map, the problem scale is reduced, with 8.57 (= 194.33/22.67) KAPs and 80 route segments in each grid cell on average, making the problem manageable. The strategy generated by PAWS is a set of suggested routes associated with probabilities, and the average number of suggested routes with probability > 0.001 is 12. Each PAWS patrol lasts for 4-5 days and is executed by a team of 3-7 patrollers. The patrol planner makes plans based on the strategy generated by PAWS. After reaching the base camp, patrollers execute daily patrols, guided by PAWS's patrol routes. Table 7.2 provides a summary of basic statistics about the patrols. During the patrol, the patrollers are equipped with a printed map, a handheld GPS, and a data recording booklet. They detect animal and human activity signs and record them with detailed comments and photos.
After the patrol, the data manager puts all the information into a database, including the patrol tracks recorded by the handheld GPS and the observations recorded in the log book. Figure 7.8 shows various types of signs recorded during PAWS patrols: (a) tiger sign (Nov. 2014); (b) human sign (lighter; Jul. 2015); (c) human sign (old poacher camp; Aug. 2015); (d) human sign (tree marking; Aug. 2015). Table 7.3 summarizes all the observations. These observations show that there is a serious ongoing threat from the poachers. Column 2 shows results for all PAWS patrols. Column 3 shows results for explorative PAWS patrols, the (partial) patrol routes which go across areas where the patrollers have never been before. To better understand the numbers, we show in Column 4 statistics from early-stage non-PAWS patrols in this conservation area, which were deployed for a tiger survey. Although it is not a fair comparison, as the objectives of the non-PAWS patrols and PAWS patrols are different, comparing Columns 2 and 3 with Column 4 indicates that PAWS patrols are effective in finding human activity signs and animal signs. Finding the human activity signs is important for identifying hotspots of poaching activity, and the patrollers' presence will deter the poachers. Animal signs do not directly evaluate PAWS patrols, but they indicate that PAWS patrols prioritize areas with higher animal density. Finding these signs is aligned with the goal of PAWS – combating poaching to save animals – and thus is evidence of the effectiveness of PAWS. Comparing Column 3 with Column 2, we find that the average number of observations made along the explorative routes is comparable to, and even higher than, that of all PAWS patrol routes. The observations on explorative routes are important as they lead to a better understanding of the unexplored area.
These results show that PAWS can guide the patrollers towards hotspots of poaching activity and provide valuable suggestions to the patrol planners.

Table 7.3: Summary of observations.
  Patrol Type                              | All PAWS Patrol | Explorative PAWS Patrol | Previous Patrol for Tiger Survey
  Total Distance (km)                      | 130.11          | 20.1                    | 624.75
  Average # of Human Activity Signs per km | 0.86            | 1.09                    | 0.57
  Average # of Animal Signs per km         | 0.41            | 0.44                    | 0.18

Figure 7.9: One daily PAWS patrol route in Aug. 2015.

Along the way of the PAWS deployment, we have received feedback from patrol planners and patrollers. The patrol planners mentioned that the top routes in the PAWS solution (routes with the highest probability) come close to an actual planner's routes, which shows that PAWS can suggest feasible routes and potentially reduce the burden of planning effort. As we deploy PAWS at other sites in the future, the cumulative human planners' effort saved by using PAWS will be considerable. In addition, patrollers commented that PAWS was able to guide them towards poaching hotspots. The fact that they found multiple human signs along the explorative PAWS patrol routes makes them believe that PAWS is good at finding ridgelines that are taken by animals and humans. Patrollers and patrol planners also agree that PAWS generates detailed suggested routes which can guide the actual patrol. Patrollers commented that the suggested routes were mostly along ridgelines, which are easier to follow, compared with the routes from the first trial by PAWS-Initial. Figure 7.9 shows one suggested route (orange line) and the actual patrol track (black line) during a PAWS patrol in Aug. 2015 (shown on a 1km by 1km grid). Due to the precision of the contour lines we obtained, we provide a 50m buffer zone (light orange polygon) around the suggested route (orange line). The patrollers started from the base camp (green triangle) and headed to the southeast.
The patrollers mostly followed PAWS's suggested route, indicating that the route generated by PAWS is easy to follow (contrast with PAWS-Initial as shown in Figure 7.3(a)). Finally, the benefit of randomization in the PAWS solution can be expected to show in the long term.

7.6 Lessons Learned

During the development and deployment process, we faced several challenges, and here we outline some lessons learned.

First, first-hand immersion in the security environment of concern is critical to understanding the context and accelerating the development process. The authors (from USC and NTU) intentionally went on patrols in the forest with the local patrolling team to familiarize themselves with the area. The first-hand experience confirmed the importance of ridgelines, as several human and animal signs were found along the way, and also confirmed that extreme changes in elevation require considerable extra effort from the patrollers. This gave us the insight for building the street map.

Second, visualizing the solution is important for communication and technology adoption. When we communicate with domain experts and human planners, we need to effectively convey the game-theoretic strategy generated by PAWS, which is a probability distribution over routes. We first visualize the routes with probability > 0.01 using ArcGIS so that they can be shown on the topographic map and the animal distribution map. Then, for each route, we provide detailed information that can assist the human planners' decision-making. We not only provide basic statistics such as the probability of the route being taken and its total distance, but also estimate the difficulty level of the patrol, predict the probability of finding animal and human signs, and provide an elevation chart that shows how the elevation changes along the route. Such information helps the planners understand the strategy.
Third, minimizing the need for extra equipment/effort would further ease future PAWS deployments; i.e., patrollers would prefer having a single handheld device for collecting patrol data and displaying suggested patrol routes. If PAWS routes could be embedded in the software that is already in use for collecting data in many conservation areas, e.g., SMART, it would reduce the effort required of planners. This is one direction for future development.

7.7 Chapter Summary

PAWS is the first deployed "green security game" application to optimize human patrol resources to combat poaching. We provided key research advances to enable this deployment; this has provided a practical benefit to patrol planners and patrollers. The deployment of PAWS patrols will continue at the site in Malaysia. Panthera has seen the utility of PAWS, and we are taking steps to expand PAWS to its other sites. This future expansion and maintenance of PAWS will be taken over by ARMORWAY (ARMORWAY, 2015), a "security games" company (starting in Spring 2016); ARMORWAY has significant experience in supporting security-games-based software deployments.

Chapter 8
Conclusion and Future Directions

8.1 Contributions

Whereas the first generation of "security games" research provided algorithms for optimizing security resources in mostly static settings, my thesis advances the state of the art to a new generation of security games, handling massive games with complex spatio-temporal settings and leading to real-world applications that have fundamentally altered current practices of security resource allocation. My work spans many different domains, including protecting ferry systems, forests, fisheries, and wildlife. My thesis provides the first algorithms and models for advancing several key aspects of spatio-temporal challenges in security games, including actions over continuous time and space, frequent and repeated attacks, as well as complex spatial constraints.
First, for games with moving targets such as ferries and refugee supply lines, players' actions are taken over continuous time, and I provide an efficient linear-programming-based solution while accurately modeling the attacker's continuous strategy. This work has been deployed by the US Coast Guard for protecting the Staten Island Ferry in New York City in the past few years, fundamentally altering previously used tactics. Second, for games where actions are taken over continuous space (for example, games with forest land as the target), I provide an algorithm computing the optimal distribution of patrol effort. Third, my work addresses challenges along one key dimension of complexity – frequent and repeated attacks. Motivated by the repeated interaction of players in domains such as preventing poaching and illegal fishing, I introduce a novel game model that accounts for the temporal behavior change of opponents and provide algorithms to plan effective sequential defender strategies. Furthermore, I incorporate complex terrain information and design the PAWS application to combat illegal poaching, which generates patrol plans with detailed patrol routes for local patrollers. PAWS has been deployed in a protected area in Southeast Asia, with plans for worldwide deployment.

While these challenges are brought by different domains, the proposed approaches for addressing them share some high-level ideas that can shed light on future research. The spatio-temporal aspects often lead to a large and even infinite action space, and my work has shown that it is often possible to exploit the spatio-temporal structure to abstract the players' action space and strategy space. In my work, I have used four approaches for abstraction. The first approach is to investigate the dominance relationships among the players' actions; partitioning the action space can be helpful in finding such dominance relationships.
In the problem of moving target protection, I partition the attacker's action set over continuous time and show that in each partitioned subset, there exists one time point that (weakly) dominates the others. Therefore, it is sufficient to consider a finite number of attacker actions when calculating the optimal defender strategy. The second approach is to use a compact representation for the defender strategy based on an equivalence relationship. Two defender strategies are equivalent if they lead to the same expected utility for the defender. If the defender strategies can be classified into several equivalence classes, and each class can be represented using a small number of parameters, a compact representation is found, and solving the problem with the compact representation can be much more efficient. The third approach is to exploit properties of the optimal defender strategy to reduce the search space. In the problem of area protection, we made the key observation that the defender should allocate the resources in such a way that the attacker gets zero benefit by going beyond the equilibrium distance. With this property, the defender's strategy can be uniquely decided once the equilibrium distance is known. Therefore, finding an optimal strategy for the defender is simplified to finding the equilibrium distance. The last approach is to use hierarchical modeling or hierarchical discretization. When a fine-grained discretization is necessary but computationally expensive, an alternative is to use multiple layers of abstraction. In my work, I have used two layers of abstraction for addressing the spatial constraints brought by topographical information. The higher layer is a coarse grid-based discretization, and in the lower layer, sub-graphs are built based on terrain features.
8.2 Future Directions

My thesis provides algorithms and models for advancing several key aspects of spatio-temporal dynamics in security games, and one extension would be to explore domains where several spatio-temporal aspects are present simultaneously. For example, consider games with both spatial continuity and repeated attacks. The first challenge in calculating an optimal strategy in these games is how to compactly represent the defender's strategy space. The second challenge is how to efficiently calculate the optimal patrol strategy for the defender given the complexity of the problem.

In dealing with spatio-temporal aspects in security games, my thesis has provided several approaches for exploiting the spatio-temporal structure and abstracting the players' action space or strategy space. The first three approaches mentioned in Section 8.1 can be seen as lossless abstraction, since the abstraction does not lead to degradation of solution quality. The last approach can be lossy in certain cases, but it is not yet clear in which cases a hierarchical discretization can be lossless or what kind of guarantee can be found. Furthermore, adaptive discretization may be a better way of balancing solution quality and computational efficiency. It is important to understand this tradeoff in various problems. These questions point to a need for further investigation of abstraction and discretization in security games.

Another direction of future work is dealing with dynamic defender-attacker interactions in the presence of data. In domains such as wildlife conservation and urban security, repeated interactions between the defender and the attacker(s) are involved. The players can (partially) observe the other players' actions, and the observation data collected from informants and surveillance may lead to a change in their behavior in the future.
The basic questions remain open; for example, how to model these factors in a game-theoretic model, and under what conditions a desirable equilibrium can be reached. In fact, a new solution concept may be required when the infinite time horizon of the game, the uncertainty in the observed actions, and the bounded rationality of the attackers are considered.

In addition to infrastructure security domains and green security domains, many research problems are open in applying game theory to cyber-security. Cyber-security has become an increasingly significant problem. It is impossible to fully protect cyber assets at all times, and an important research direction is to develop game-theoretic solutions to inform decisions about what to protect and when. A key future research challenge is learning human behavior models from data collected from human behavior logs and system logs, where timestamps could be an important attribute of an attack. However, cyber-security brings in a third crucial player, the human user, who was absent in my previous work. This raises fundamental new challenges in attacker-defender game settings, which normally have only two players.

Bibliography

Agmon, N., Kraus, S., & Kaminka, G. A. (2008). Multi-robot perimeter patrol in adversarial settings. In IEEE International Conference on Robotics and Automation (ICRA), pp. 2339–2345.

Albers, H. J. (2010). Spatial modeling of extraction and enforcement in developing country protected areas. Resource and Energy Economics, 32, 165–179.

Alpern, S. (1992). Infiltration games on arbitrary graphs. Journal of Mathematical Analysis and Applications, 163, 286–288.

An, B., Kempe, D., Kiekintveld, C., Shieh, E., Singh, S. P., Tambe, M., & Vorobeychik, Y. (2012). Security games with limited surveillance. In Proceedings of the Twenty-Sixth AAAI Conference on Artificial Intelligence, pp. 1241–1248.

An, B., Tambe, M., Ordóñez, F., Shieh, E., & Kiekintveld, C. (2011).
Refinement of strong Stackelberg equilibria in security games. In Proceedings of the Twenty-Fifth AAAI Conference on Artificial Intelligence (AAAI), pp. 587–593.

ARMORWAY (2015). http://armorway.com/.

Banerjee, B., & Peng, J. (2005). Efficient learning of multi-step best response. In AAMAS, pp. 60–66.

Basilico, N., Gatti, N., & Amigoni, F. (2009). Leader-follower strategies for robotic patrolling in environments with arbitrary topologies. In Proceedings of The 8th International Conference on Autonomous Agents and Multiagent Systems (AAMAS) - Volume 1, pp. 57–64.

Beirne, P., & South, N. (Eds.). (2007). Issues in Green Criminology. Willan Publishing.

Blum, A., Haghtalab, N., & Procaccia, A. D. (2014). Learning optimal commitment to overcome insecurity. In NIPS.

Botea, A., Müller, M., & Schaeffer, J. (2004). Near optimal hierarchical path-finding. Journal of Game Development, 1, 7–28.

Bošanský, B., Lisý, V., Jakob, M., & Pěchouček, M. (2011). Computing time-dependent policies for patrolling games with mobile targets. In The 10th International Conference on Autonomous Agents and Multiagent Systems (AAMAS) - Volume 3, pp. 989–996.

Boyle, D. (2011). Logging in the wild west. The Phnom Penh Post.

Brown, M. (2015). Balancing Tradeoffs in Security Games: Handling Defenders and Adversaries with Multiple Objectives. Ph.D. thesis, University of Southern California.

Brown, M., Haskell, W. B., & Tambe, M. (2014). Addressing scalability and robustness in security games with multiple boundedly rational adversaries. In Conference on Decision and Game Theory for Security (GameSec).

Chakraborty, D., Agmon, N., & Stone, P. (2013). Targeted opponent modeling of memory-bounded agents. In Proceedings of the Adaptive Learning Agents Workshop (ALA).

Clarke, H. R., Reed, W. J., & Shrestha, R. M. (1993). Optimal enforcement of property rights on developing country forests subject to illegal logging. Resource and Energy Economics, 15, 271–293.
Conitzer, V., & Sandholm, T. (2006). Computing the optimal strategy to commit to. In Proceedings of the 7th ACM Conference on Electronic Commerce, EC '06, pp. 82–90.

Cowan, N. (2005). Working Memory Capacity. Essays in Cognitive Psychology. Psychology Press.

Daskalakis, C., & Weinberg, S. M. (2012). Symmetries and optimal multi-dimensional mechanism design. In Proceedings of the 13th ACM Conference on Electronic Commerce, EC '12, pp. 370–387.

De Alfaro, L., Henzinger, T. A., & Majumdar, R. (2001). Symbolic algorithms for infinite-state games. In CONCUR 2001 – Concurrency Theory, pp. 536–550. Springer.

de Cote, E. M., & Jennings, N. R. (2010). Planning against fictitious players in repeated normal form games. In AAMAS, pp. 1073–1080.

Dixon, J. A., & Sherman, P. B. (1990). Economics of Protected Areas: A New Look at Benefits and Costs. Island Press, Washington, DC.

Eliason, S. (1993). Maximum Likelihood Estimation: Logic and Practice. Vol. 96 of Quantitative Applications in the Social Sciences. Sage Publications.

Fang, F., Jiang, A. X., & Tambe, M. (2013). Optimal patrol strategy for protecting moving targets with multiple mobile resources. In AAMAS.

Fang, F., Stone, P., & Tambe, M. (2015). When security games go green: Designing defender strategies to prevent poaching and illegal fishing. In International Joint Conference on Artificial Intelligence (IJCAI).

Fudenberg, D., & Tirole, J. (1991). Game Theory. MIT Press.

Gal, S. (1980). Search Games. Academic Press, New York.

Gatti, N. (2008). Game theoretical insights in strategic patrolling: Model and algorithm in normal-form. In Proceedings of the 18th European Conference on Artificial Intelligence (ECAI), pp. 403–407.

Greenberg, M., Chalk, P., & Willis, H. (2006). Maritime Terrorism: Risk and Liability. RAND Corporation Monograph Series. RAND Center for Terrorism Risk Management Policy.

Hall, J. B., & Rodgers, W. A. (1992). Buffers at the boundary. Rural Development Forestry Network, Summer (Paper 13a).
Halvorson, E., Conitzer, V., & Parr, R. (2009). Multi-step multi-sensor hider-seeker games. In IJCAI.

Hamisi, M. (2008). Identification and mapping risk areas for zebra poaching: A case of Tarangire National Park, Tanzania. Ph.D. thesis, ITC.

Haskell, W., Kar, D., Fang, F., Tambe, M., Cheung, S., & Denicola, E. (2014a). Robust protection of fisheries with COmPASS. In Proceedings of the Twenty-Sixth Innovative Applications of Artificial Intelligence Conference, IAAI 2014, July 29–31, 2014, Québec City, Québec, Canada, pp. 2978–2983.

Haskell, W. B., Kar, D., Fang, F., Tambe, M., Cheung, S., & Denicola, L. E. (2014b). Robust protection of fisheries with COmPASS. In IAAI.

Henzinger, T. A., Horowitz, B., & Majumdar, R. (1999). Rectangular hybrid games. Springer.

Hofer, H., Campbell, K. L., East, M. L., & Huish, S. A. (2000). Modeling the spatial distribution of the economic costs and benefits of illegal game meat hunting in the Serengeti. Natural Resource Modeling, 13(1), 151–177.

Jain, M. (2013). Thwarting Adversaries with Unpredictability: Massive-scale Game-Theoretic Algorithms for Real-world Security Deployments. Ph.D. thesis, University of Southern California.

Jiang, A. X., Yin, Z., Zhang, C., Tambe, M., & Kraus, S. (2013). Game-theoretic randomization for security patrolling with dynamic execution uncertainty. In Proceedings of the 2013 International Conference on Autonomous Agents and Multi-agent Systems, AAMAS '13, pp. 207–214.

Kar, D., Fang, F., Fave, F. D., Sintov, N., & Tambe, M. (2015). "A Game of Thrones": When human behavior models compete in repeated Stackelberg security games. In AAMAS 2015.

Kiekintveld, C., Islam, T., & Kreinovich, V. (2013). Security games with interval uncertainty. In Proceedings of the 2013 International Conference on Autonomous Agents and Multi-agent Systems, AAMAS '13, pp. 231–238.

Kiekintveld, C., Jain, M., Tsai, J., Pita, J., Ordóñez, F., & Tambe, M. (2009a).
Computing optimal randomized resource allocations for massive security games. In Proceedings of The 8th International Conference on Autonomous Agents and Multiagent Systems - Volume 1, AAMAS '09, pp. 689–696.

Kiekintveld, C., Jain, M., Tsai, J., Pita, J., Ordóñez, F., & Tambe, M. (2009b). Computing optimal randomized resource allocations for massive security games. In AAMAS.

Korzhyk, D., Conitzer, V., & Parr, R. (2010a). Complexity of computing optimal Stackelberg strategies in security resource allocation games. In Proceedings of the 24th National Conference on Artificial Intelligence (AAAI), pp. 805–810.

Korzhyk, D., Conitzer, V., & Parr, R. (2010b). Complexity of computing optimal Stackelberg strategies in security resource allocation games. In AAAI, pp. 805–810.

Korzhyk, D., Conitzer, V., & Parr, R. (2010c). Complexity of computing optimal Stackelberg strategies in security resource allocation games. In AAAI, pp. 805–810.

Korzhyk, D., Conitzer, V., & Parr, R. (2011). Security games with multiple attacker resources. In Proceedings of the Twenty-Second International Joint Conference on Artificial Intelligence - Volume One, IJCAI '11, pp. 273–279. AAAI Press.

Krause, A., Roper, A., & Golovin, D. (2011). Randomized sensing in adversarial environments. In Proceedings of the 22nd International Joint Conference on Artificial Intelligence (IJCAI), pp. 2133–2139.

Krishna, V. (2009). Auction Theory. Academic Press.

Kumar, A., & Zilberstein, S. (2010). Anytime planning for decentralized POMDPs using expectation maximization. In Proceedings of the Twenty-Sixth Conference on Uncertainty in Artificial Intelligence, pp. 294–301.

Langmuir, E. (1995). Mountaincraft and Leadership: A Handbook for Mountaineers and Hillwalking Leaders in the British Isles. Mountain Leader Training Board.

Lemieux, A. M. (Ed.). (2014). Situational Prevention of Poaching. Crime Science Series. Routledge.

Letchford, J. (2013). Computational Aspects of Stackelberg Games. Ph.D.
thesis, Duke University.

Letchford, J., & Conitzer, V. (2013). Solving security games on graphs via marginal probabilities. In Proceedings of the Twenty-Seventh AAAI Conference on Artificial Intelligence (AAAI), pp. 591–597.

Letchford, J., Conitzer, V., & Munagala, K. (2009). Learning and approximating the optimal strategy to commit to. In Proceedings of the 2nd International Symposium on Algorithmic Game Theory, pp. 250–262.

Letchford, J., & Vorobeychik, Y. (2012). Computing optimal security strategies for interdependent assets. In The Conference on Uncertainty in Artificial Intelligence (UAI), pp. 459–468.

Lober, D. J. (1992). Using forest guards to protect a biological reserve in Costa Rica: One step towards linking parks to people. Journal of Environmental Planning and Management, 35(1), 17.

Luber, S., Yin, Z., Fave, F. D., Jiang, A. X., Tambe, M., & Sullivan, J. P. (2013). Game-theoretic patrol strategies for transit systems: The TRUSTS system and its mobile app (demonstration). In International Conference on Autonomous Agents and Multiagent Systems (AAMAS) [Demonstrations Track], pp. 1377–1378.

MacKinnon, J., MacKinnon, K., Child, G., & Thorsell, J. (1986). Managing Protected Areas in the Tropics. IUCN, Gland, Switzerland.

Marecki, J., Tesauro, G., & Segal, R. (2012). Playing repeated Stackelberg games with unknown opponents. In AAMAS, pp. 821–828.

McKelvey, R. D., & Palfrey, T. R. (1995). Quantal response equilibria for normal form games. Games and Economic Behavior, 2, 6–38.

Milliman, S. R. (1986). Optimal fishery management in the presence of illegal activity. Journal of Environmental Economics and Management, 12, 363–381.

Miltersen, P. B., & Sørensen, T. B. (2007). Computing proper equilibria of zero-sum games. In Proceedings of the 5th International Conference on Computers and Games, CG '06, pp. 200–211.

Nguyen, T. H., Fave, F. M. D., Kar, D., Lakshminarayanan, A. S., Yadav, A., Tambe, M., Agmon, N., Plumptre, A.
J., Driciru, M., Wanyama, F., & Rwetsiba, A. (2015). Making the most of our regrets: Regret-based solutions to handle payoff uncertainty and elicitation in green security games. In Conference on Decision and Game Theory for Security.
Nguyen, T. H., Yadav, A., An, B., Tambe, M., & Boutilier, C. (2014). Regret-based optimization and preference elicitation for Stackelberg security games with uncertainty. In AAAI.
Nguyen, T. H., Yang, R., Azaria, A., Kraus, S., & Tambe, M. (2013). Analyzing the effectiveness of adversary modeling in security games. In Conference on Artificial Intelligence (AAAI).
Owen, G. (1995). Game Theory (3rd ed.). Academic Press.
Paruchuri, P., Pearce, J. P., Marecki, J., Tambe, M., Ordonez, F., & Kraus, S. (2008). Playing games for security: An efficient exact algorithm for solving Bayesian Stackelberg games. In Proceedings of the 7th International Joint Conference on Autonomous Agents and Multiagent Systems (AAMAS), pp. 895–902, Richland, SC. International Foundation for Autonomous Agents and Multiagent Systems.
Phillips, S. J., Anderson, R. P., & Schapire, R. E. (2006). Maximum entropy modeling of species geographic distributions. Ecological Modelling, 190(3-4), 231–259.
Pita, J. (2012). The Human Element: Addressing Human Adversaries in Security Domains. Ph.D. thesis, University of Southern California.
Pita, J., Jain, M., Marecki, J., Ordóñez, F., Portway, C., Tambe, M., Western, C., Paruchuri, P., & Kraus, S. (2008). Deployed ARMOR protection: The application of a game theoretic model for security at the Los Angeles International Airport.
In Proceedings of the 7th International Joint Conference on Autonomous Agents and Multiagent Systems: Industrial Track, AAMAS '08, pp. 125–132.
Pita, J., Jain, M., Ordonez, F., Portway, C., Tambe, M., Western, C., Paruchuri, P., & Kraus, S. (2009). Using game theory for Los Angeles airport security. AI Magazine, 30, 43–57.
Pita, J., Jain, M., Ordonez, F., Tambe, M., & Kraus, S. (2010). Robust solutions to Stackelberg games: Addressing bounded rationality and limited observations in human cognition. Artificial Intelligence, 174(15), 1142–1171.
Platzer, A. (2015). Differential game logic. ACM Transactions on Computational Logic (TOCL), 17(1), 1.
Plummer, M. (2003). JAGS: A program for analysis of Bayesian graphical models using Gibbs sampling.
Powers, R., & Shoham, Y. (2005). Learning against opponents with bounded memory. In IJCAI, pp. 817–822, San Francisco, CA, USA. Morgan Kaufmann Publishers Inc.
Qian, Y., Haskell, W. B., Jiang, A. X., & Tambe, M. (2014). Online planning for optimal protector strategies in resource conservation games. In International Conference on Autonomous Agents and Multiagent Systems (AAMAS).
Robinson, E. J. Z., Albers, H. J., & Williams, J. C. (2008). Spatial and temporal modelling of community non-timber forest extraction.
Journal of Environmental Economics and Management, 56, 234–245.
Robinson, E. J. Z., Albers, H. J., & Williams, J. C. (2011). Sizing reserves within a landscape: The roles of villagers' reactions and the ecological-socioeconomic setting. Land Economics, 87, 233–249.
Robinson, E. J. (2008). India's disappearing common lands: Fuzzy boundaries, encroachment, and evolving property rights. Land Economics, 84(3), 409–422.
Rubinstein, A. (1997). Modeling Bounded Rationality, Vol. 1 of MIT Press Books. The MIT Press.
Sabourian, H. (1998). Repeated games with m-period bounded memory (pure strategies). Journal of Mathematical Economics, 30(1), 1–35.
Sanchirico, J. N., & Wilen, J. E. (2001). A bioeconomic model of marine reserve creation. Journal of Environmental Economics and Management, 42, 257–276.
Secretariat, G. T. I. (2013). Global tiger recovery program implementation plan: 2013–14. Tech. rep., The World Bank, Washington, D.C.
Shieh, E. (2015). Not a Lone Ranger: Unleashing Defender Teamwork in Security Games. Ph.D. thesis, University of Southern California.
Shieh, E., An, B., Yang, R., Tambe, M., Baldwin, C., DiRenzo, J., Maule, B., & Meyer, G. (2012). PROTECT: A deployed game theoretic system to protect the ports of the United States. In Proceedings of the 11th International Conference on Autonomous Agents and Multiagent Systems (AAMAS), pp. 13–20.
Simon, L. K., & Stinchcombe, M. B. (1995). Equilibrium refinement for infinite normal-form games. Econometrica, 63(6), 1421–1443.
Sinclair, A. R. E., & Arcese, P. (1995). Serengeti II: Dynamics, Management, and Conservation of an Ecosystem. University of Chicago Press, Chicago.
SMART (2013). The spatial monitoring and reporting tool (SMART). http://www.smartconservationsoftware.org/.
Soille, P.
(2004). Morphological Image Analysis: Principles and Applications. Springer.
Stein, N. D., Ozdaglar, A., & Parrilo, P. A. (2008). Separable and low-rank continuous games. International Journal of Game Theory, 37(4), 475–504.
Stokes, E. J. (2010). Improving effectiveness of protection efforts in tiger source sites: Developing a framework for law enforcement monitoring using MIST. Integrative Zoology, 5(4), 363–377.
Stone, P., Kaminka, G. A., Kraus, S., & Rosenschein, J. S. (2010). Ad hoc autonomous agent teams: Collaboration without pre-coordination. In Proceedings of the 24th AAAI Conference on Artificial Intelligence, pp. 1504–1509.
Tambe, M. (1997). Towards flexible teamwork. Journal of Artificial Intelligence Research, 7, 83–124.
Tambe, M. (2011). Security and Game Theory: Algorithms, Deployed Systems, Lessons Learned. Cambridge University Press.
Tarboton, D. G., Bras, R. L., & Rodriguez-Iturbe, I. (1991). On the extraction of channel networks from digital elevation data. Hydrological Processes, 5(1), 81–100.
Thompson, D. R. M., & Leyton-Brown, K. (2009). Computational analysis of perfect-information position auctions. In Proceedings of the 10th ACM Conference on Electronic Commerce, EC '09, pp. 51–60.
Thompson, S. (2011). Unjustifiable Risk?: The Story of British Climbing. Cicerone Press.
Tobler, W. (1993). Three presentations on geographical analysis and modeling: Non-isotropic geographic modeling; speculations on the geometry of geography; and global spatial analysis (93-1). Tech. rep., UC Santa Barbara.
Tsai, J., Rathi, S., Kiekintveld, C., Ordonez, F., & Tambe, M. (2009). IRIS - a tool for strategic security allocation in transportation networks.
In The Eighth International Conference on Autonomous Agents and Multiagent Systems - Industry Track, AAMAS '09, pp. 37–44.
Tsiligiridis, T. (1984). Heuristic methods applied to orienteering. The Journal of the Operational Research Society, 35(9), 797–809.
van Damme, E. (1987). Stability and Perfection of Nash Equilibria. Springer-Verlag.
Vansteenwegen, P., Souffriau, W., & Oudheusden, D. V. (2011). The orienteering problem: A survey. European Journal of Operational Research, 209(1), 1–10.
Vaněk, O., Jakob, M., Hrstka, O., & Pěchouček, M. (2011). Using multi-agent simulation to improve the security of maritime transit. In Proceedings of the 12th International Workshop on Multi-Agent-Based Simulation (MABS), pp. 1–16.
Wang, T., & Boutilier, C. (2003). Incremental utility elicitation with the minimax regret decision criterion. In IJCAI.
Wato, Y. A., Wahungu, G. M., & Okello, M. M. (2006). Correlates of wildlife snaring patterns in Tsavo West National Park, Kenya. Biological Conservation, 132(4), 500–509.
Yang, R. (2014). Human Adversaries in Security Games: Integrating Models of Bounded Rationality and Fast Algorithms. Ph.D. thesis, University of Southern California.
Yang, R., Ford, B., Tambe, M., & Lemieux, A. (2014). Adaptive resource allocation for wildlife protection against illegal poachers. In International Conference on Autonomous Agents and Multiagent Systems (AAMAS).
Yang, R., Jiang, A. X., Tambe, M., & Ordóñez, F. (2013). Scaling-up security games with boundedly rational adversaries: A cutting-plane approach. In IJCAI.
Yang, R., Ordonez, F., & Tambe, M. (2012). Computing optimal strategy against quantal response in security games. In Proceedings of the 11th International Conference on Autonomous Agents and Multiagent Systems, pp. 847–854.
International Foundation for Autonomous Agents and Multiagent Systems.
Yin, Z. (2013). Addressing Uncertainty in Stackelberg Games for Security: Models and Algorithms. Ph.D. thesis, University of Southern California.
Yin, Z., Jiang, A. X., Johnson, M. P., Kiekintveld, C., Leyton-Brown, K., Sandholm, T., Tambe, M., & Sullivan, J. P. (2012). TRUSTS: Scheduling randomized patrols for fare inspection in transit systems. In Proceedings of the Twenty-Fourth Conference on Innovative Applications of Artificial Intelligence (IAAI), pp. 2348–2355.
Yin, Z., Korzhyk, D., Kiekintveld, C., Conitzer, V., & Tambe, M. (2010). Stackelberg vs. Nash in security games: Interchangeability, equivalence, and uniqueness. In AAMAS.
Yin, Z., & Tambe, M. (2011). Continuous time planning for multiagent teams with temporal constraints. In Proceedings of the Twenty-Second International Joint Conference on Artificial Intelligence (IJCAI), pp. 465–471. AAAI Press.
Yin, Z., & Tambe, M. (2012). A unified method for handling discrete and continuous uncertainty in Bayesian Stackelberg games. In International Conference on Autonomous Agents and Multiagent Systems (AAMAS).
Asset Metadata
Creator: Fang, Fei (author)
Core Title: Towards addressing spatio-temporal aspects in security games
School: Viterbi School of Engineering
Degree: Doctor of Philosophy
Degree Program: Computer Science
Publication Date: 06/20/2016
Defense Date: 04/08/2016
Publisher: University of Southern California (original); University of Southern California. Libraries (digital)
Tags: game theory, OAI-PMH Harvest, security game, spatio-temporal analysis
Format: application/pdf (imt)
Language: English
Contributor: Electronically uploaded by the author (provenance)
Advisor: Tambe, Milind (committee chair), Dughmi, Shaddin (committee member), Golubchik, Leana (committee member), Mirkovic, Jelena (committee member), Sen, Suvrajeet (committee member)
Creator Email: fangfeiff123@gmail.com, feifang@usc.edu
Permanent Link (DOI): https://doi.org/10.25549/usctheses-c40-254981
Unique Identifier: UC11279447
Identifier: etd-FangFei-4460.pdf (filename), usctheses-c40-254981 (legacy record id)
Legacy Identifier: etd-FangFei-4460.pdf
Dmrecord: 254981
Document Type: Dissertation
Rights: Fang, Fei
Type: texts
Source: University of Southern California (contributing entity), University of Southern California Dissertations and Theses (collection)
Access Conditions: The author retains rights to his/her dissertation, thesis or other graduate work according to U.S. copyright law. Electronic access is being provided by the USC Libraries in agreement with the a...
Repository Name: University of Southern California Digital Library
Repository Location: USC Digital Library, University of Southern California, University Park Campus MC 2810, 3434 South Grand Avenue, 2nd Floor, Los Angeles, California 90089-2810, USA