Addressing Uncertainty in Stackelberg Games for Security: Models and Algorithms
by
Zhengyu Yin
A Dissertation Presented to the
FACULTY OF THE USC GRADUATE SCHOOL
UNIVERSITY OF SOUTHERN CALIFORNIA
In Partial Fulfillment of the
Requirements for the Degree
DOCTOR OF PHILOSOPHY
(Computer Science)
May 2013
Copyright 2013 Zhengyu Yin
Acknowledgments
First and foremost, I would like to thank my advisor, Milind Tambe. When I first came abroad
to USC to pursue my doctoral degree, I knew little about doing research and writing papers.
It was not easy initially, and my limited English and unfamiliarity with daily life in the
U.S. certainly did not make things any better. I still vividly remember the awkward moment when my
first meeting with Milind had to end after five minutes because I could barely communicate. Through
my countless moments of feeling disappointed, frustrated, or frightened, Milind has always been
supportive and encouraging. He has taught me how to do scientific study and kept reminding
me to think of the big picture and always push an idea to its limits. His dedication to students,
accessibility, ethical standards, and insistence on practicality make him the best academic advisor I
could ever ask for. Words are never enough to say how great Milind is as a teacher, a
role model, and a sincere friend.
I would also like to thank the other members of my dissertation committee for providing valuable
feedback on my research and pushing me to think about it at another level: Bhaskar
Krishnamachari, Rajiv Maheswaran, Mathew McCubbins, Fernando Ordóñez, and Tuomas Sandholm.
Over the years I have been fortunate to collaborate with many great researchers: Bo An,
Matthew Brown, Branislav Bosansky, Emma Bowring, Jesus Cerquides, Vincent Conitzer,
Francesco Delle Fave, Manish Jain, Michal Jakob, Albert Jiang, Matthew Johnson, Bostjan
Kaluza, David Kempe, Christopher Kiekintveld, Antonin Komenda, Dmytro Korzhyk, Sarit
Kraus, Atul Kumar, Junyoung Kwak, Kevin Leyton-Brown, Samantha Luber, Zbynek Moler,
Fernando Ordóñez, Michal Pechoucek, Kanna Rajan, Juan Antonio Rodriguez-Aguilar, Tuomas
Sandholm, Eric Shieh, John Sullivan, Milind Tambe, Matthew Taylor, Jason Tsai, Ondrej Vanek,
Meritxell Vinyals, Rong Yang, Rob Zinkov.
I also want to thank my friends at USC and the entire TEAMCORE community. Special thanks
to Manish Jain for helping me through many emergencies, Jason Tsai for correcting my language
errors and proofreading my papers countless times, Matthew Brown for reigniting my
excitement about international soccer leagues, and Bo Yang and Xin Fang for constantly treating me to
delicious food. Thank you all for making the five years of my Ph.D. an enjoyable and memorable
experience.
Finally, I want to thank my family for their unconditional love; I always feel sad that I
have to live far away from my parents. I would like to thank my wife Olivia for always being
supportive and taking good care of me when I am busy with work. Olivia, thank you for making
our life so cheerful and enjoyable.
Table of Contents
Acknowledgments
List of Figures
Abstract
Chapter 1: Introduction
1.1 Problem Addressed
1.2 Contributions
1.3 Overview of Thesis
Chapter 2: Background
2.1 Motivating Applications
2.2 Stackelberg Games
2.3 Bayesian Stackelberg Games
2.4 Strong Stackelberg Equilibrium
2.5 Baseline Solvers
2.5.1 Multiple Linear Programs
2.5.2 Dobss: Mixed-Integer Linear Program
2.5.3 HBGS: Branch-and-Bound Search
2.6 Security Games
Chapter 3: Stackelberg Games with Distributional Uncertainty
3.1 Hunter: Discrete Uncertainty
3.1.1 Algorithm Overview
3.1.2 Upper Bound Linear Program
3.1.2.1 Convex Hull of a Single Type
3.1.2.2 Tractable Relaxation
3.1.3 Bender’s Decomposition
3.1.4 Reusing Bender’s Cuts
3.1.5 Heuristic Branching Rules
3.2 Extension to Continuous Uncertainty
3.2.1 Uncertain Stackelberg Game Model
3.2.2 Sample Average Approximation
3.2.3 A Unified Approach
3.3 Experimental Results
3.3.1 Handling Discrete Follower Types
3.3.2 Handling Both Types of Uncertainty
Chapter 4: Robust Solutions for Security Games
4.1 Formal Model
4.2 Approach
4.2.1 Recon MILP
4.2.2 Speeding Up
4.2.2.1 a-Recon
4.2.2.2 i-Recon
4.3 Experimental Results
4.3.1 Runtime of Recon
4.3.2 Performance under Uncertainty
Chapter 5: Stackelberg vs. Nash in Security Games
5.1 Properties of Security Games
5.1.1 Equivalence of Nash Equilibrium and Minimax
5.1.2 Interchangeability of Nash Equilibria
5.1.3 SSE and Minimax / NE
5.1.4 Uniqueness in Restricted Games
5.2 Multiple Attacker Resources
5.3 Experimental Results
Chapter 6: Patrolling in Transit Systems
6.1 TRUSTSv1: Deterministic Model for Perfect Execution
6.1.1 Formal Model
6.1.2 Approach
6.1.2.1 Basic Formulation
6.1.2.2 Issues with the Basic Formulation
6.1.2.3 Extended Formulation
6.2 TRUSTSv2: Stochastic Model for Imperfect Execution
6.2.1 Formal Model
6.2.2 Approach
6.2.2.1 Efficient Computation on Separable Utilities
6.2.2.2 Generating Patrol Schedules
6.2.2.3 Coupled Execution: Cartesian Product MDP
6.2.3 Application to Fare Evasion Deterrence
6.2.3.1 Linear Program Formulation
6.2.3.2 Metro App: Smart-Phone Implementation
6.3 Experimental Results
6.3.1 Data Sets
6.3.2 Simulation Results of TRUSTSv1
6.3.3 Field Trials of TRUSTSv1
6.3.4 Simulation Results of TRUSTSv2
Chapter 7: Related Work
7.1 Uncertainty in Simultaneous-move Games
7.2 Uncertainty in Stackelberg Games
7.2.1 Algorithms for Bayesian Stackelberg Games
7.2.2 Robust Solutions
7.2.3 Against Suboptimal Opponents
7.2.4 Observability and Commitment
7.2.5 Markov Decision Process and Stochastic Games
7.3 Solving Complex Graph Patrolling Games
Chapter 8: Conclusions
8.1 Contributions
8.2 Future Work
Bibliography
Appendix A: Bender’s Decomposition
List of Figures
2.1 Entrance of an LA Metro Rail station.
2.2 Example of a Stackelberg game.
2.3 Payoff matrices of a Bayesian Stackelberg game.
2.4 Example search tree of solving Bayesian games.
2.5 Payoff structure of security games.
2.6 Relationship among Stackelberg, security, IRIS, and ARMOR games.
3.1 Steps of creating internal search nodes in Hunter.
3.2 Example of internal nodes in Hunter’s search tree.
3.3 Visualization of the Hunter relaxation.
3.4 Experimental analysis of Hunter and runtime comparison against HBGS and Dobss.
4.1 Example ARMOR game with two targets.
4.2 Runtime of Recon MILP and the speedup of lower bound heuristics a-Recon and i-Recon.
4.3 Performance of strategies generated by Recon, Hunter, Eraser, and Cobra.
5.1 A security game where the defender’s expected utility varies in different NE profiles.
5.2 A schedule-constrained security game where the defender’s SSE strategy is not an NE strategy.
5.3 A security game with multiple attacker resources where the defender’s SSE strategy is not an NE strategy.
5.4 The number of games in which the SSE strategy is not an NE strategy, for different parameter settings.
6.1 The transition graph of a toy problem instance.
6.2 Basic Formulation.
6.3 Example of an infeasible marginal strategy.
6.4 (a) HDT graph of Example 4 with two starting time points. (b) Extension storing the last action occurring.
6.5 Extended Formulation.
6.6 Example game with separable utilities.
6.7 Metro App user interface.
6.8 Solution quality of TRUSTSv1: (a) Per passenger revenue of the computed mixed strategy (b) Percentage of the solution value compared to the LP upper bound.
6.9 Fare evasion analysis of TRUSTSv1: (a) Evasion tendency distribution of the Red line (b) Percentage of riders that prefer fare evasion.
6.10 Runtime analysis of TRUSTSv1: (a) Runtime of solving the LP by CPLEX (b) Tradeoffs between optimality and runtime.
6.11 Reducing number of switches: (a) Tradeoffs between optimality and patrol preference (b) Cumulative probability distribution of the number of switches for the Red line.
6.12 Example of a fare inspection patrol shift.
6.13 Example of shift statistics and feedback provided by the LASD.
6.14 Markov strategy (TRUSTSv2) vs. pre-generated strategy (TRUSTSv1): (a) revenue per rider of varying (b) revenue per rider of varying delay time (c) evasion rate of varying
6.15 Simulation results of TRUSTSv2: (a) Revenue per rider of Markov strategy (b) Evasion rate of Markov strategy (c) Revenue decay with varying coverage levels.
6.16 Simulation results of TRUSTSv2: (a) Revenue per rider with increasing coverage (b) Worst-case LP runtime.
Abstract
Recently, there has been significant research interest in using game-theoretic approaches to
allocate limited security resources to protect physical infrastructure, including ports, airports,
transit systems, and other critical national infrastructure, as well as natural resources such as
forests, tigers, and fish. Indeed, the leader-follower Stackelberg game model is at the heart of many
deployed applications. In these applications, the game model provides a randomized strategy for
the leader (security forces), under the assumption that the adversary will conduct surveillance
before launching an attack. Inevitably, the security forces are faced with the problem of
uncertainty. For example, a security officer may be forced to execute a different patrol strategy from the
planned one due to unexpected events. Also, there may be significant uncertainty regarding the
amount of surveillance conducted by an adversary. While Bayesian Stackelberg games for
modeling discrete uncertainty have been successfully used in deployed applications, they are NP-hard
problems and existing methods scale poorly with the number of types, making them inadequate for
complex real-world problems. Furthermore, Bayesian Stackelberg games have not been applied
to model execution and observation uncertainty and, finally, they require the availability of full
distributional information of the uncertainty.
To overcome these difficulties, my thesis presents four major contributions. First, I provide
a novel algorithm, Hunter, for Bayesian Stackelberg games to scale up the number of types.
Exploiting the efficiency of Hunter, I show that preference, execution, and observation uncertainty can
be addressed in a unified framework. Second, to address execution and observation uncertainty
(where the distribution may be difficult to estimate), I provide a robust optimization formulation to
compute the optimal risk-averse leader strategy in Stackelberg games. Third, addressing the
uncertainty of the adversary’s capability of conducting surveillance, I show that for a class of
Stackelberg games motivated by real security applications, the leader is always best-responding with a
Stackelberg equilibrium strategy regardless of whether the adversary conducts surveillance or not.
As the final contribution, I provide TRUSTS, a novel game-theoretic formulation for scheduling
randomized patrols in public transit domains where timing is a crucial component. TRUSTS
addresses dynamic execution uncertainty in such spatiotemporal domains by integrating Markov
Decision Processes into the game-theoretic model. Simulation results as well as real-world trials
of TRUSTS in the Los Angeles Metro Rail system validate my approach.
Chapter 1: Introduction
My thesis focuses on game-theoretic approaches to allocating resources to protect critical
infrastructure in a variety of security settings. While there is a diverse set of security scenarios, the
typical problem among them is that the security agencies have to protect a large set of targets
with limited resources, making it impossible to protect all targets at all times. For instance, the
security agencies are responsible for protecting large transportation networks such as ports, train
stations, and airports from potential terrorist activities that may cause significant damage. The
security agencies are also required to patrol an area or a network to deter crimes or misdemeanors
such as illegal extraction of forest resources or fare evasion in public transit systems. Since
deterministic allocations of security resources can often be exploited by an intelligent adversary who
conducts surveillance before acting, it is often more desirable for the security agencies to allocate
their resources in a randomized fashion.
1.1 Problem Addressed
Game theory provides a formal mathematical framework for reasoning about the aforementioned
resource randomization problems. Indeed, game-theoretic approaches have been used in multiple
deployed applications, including ARMOR for randomizing checkpoints and canine units at the
Los Angeles International Airport (LAX) [Pita et al., 2008], IRIS for randomizing Federal Air
Marshals on commercial flights [Tsai et al., 2009], PROTECT for randomizing port security
patrols at the Boston Coast Guard [Shieh et al., 2012], and TRUSTS for randomizing ticket
inspections in the Los Angeles Metro Rail System [Yin et al., 2012a] (under evaluation as of
March 2013). At the backbone of these applications is the leader-follower Stackelberg game
model, where the leader (security agency) acts first by committing to a mixed strategy, and the
follower (adversary) best-responds after observing the leader’s mixed strategy perfectly. Beyond
those deployed security applications, the Stackelberg game model has been studied in numerous
other security problems ranging from patrolling in adversarial domains [Agmon et al., 2008;
Gatti, 2008; Vanek et al., 2011] to natural resource protection [Johnson et al., 2012] and computer
network security [Vanek et al., 2012a].
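To make the commitment model concrete, the following minimal Python sketch evaluates a leader mixed strategy against a best-responding follower. The 2x2 payoff numbers and the function name `follower_best_response` are illustrative assumptions and are not taken from any deployed application. Ties are broken in the leader's favor, in the spirit of the Strong Stackelberg Equilibrium concept introduced in Section 2.4.

```python
from typing import List, Tuple

# Hypothetical 2x2 game: rows are leader actions, columns are follower actions.
LEADER = [[2.0, 4.0],
          [1.0, 3.0]]
FOLLOWER = [[1.0, 0.0],
            [0.0, 2.0]]


def follower_best_response(x: List[float],
                           leader_payoff: List[List[float]],
                           follower_payoff: List[List[float]]) -> Tuple[int, float]:
    """Return (follower action, leader expected utility) when the leader
    commits to mixed strategy x and the follower observes x perfectly.
    Ties in the follower's utility are broken in favor of the leader."""
    n_rows = len(x)
    n_cols = len(follower_payoff[0])
    best_j, best_f, best_l = -1, float("-inf"), float("-inf")
    for j in range(n_cols):
        f = sum(x[i] * follower_payoff[i][j] for i in range(n_rows))
        l = sum(x[i] * leader_payoff[i][j] for i in range(n_rows))
        strictly_better = f > best_f + 1e-9
        tie_better_for_leader = abs(f - best_f) <= 1e-9 and l > best_l
        if strictly_better or tie_better_for_leader:
            best_j, best_f, best_l = j, f, l
    return best_j, best_l


pure = follower_best_response([1.0, 0.0], LEADER, FOLLOWER)
mixed = follower_best_response([0.5, 0.5], LEADER, FOLLOWER)
```

In this toy game, committing to the pure strategy (1, 0) leads the follower to pick the first column and gives the leader a utility of 2, whereas committing to the mixed strategy (0.5, 0.5) induces the follower to switch to the second column and gives the leader 3.5. This is one way to see why committing to a randomized strategy can benefit the leader.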
The Stackelberg game model, despite its recent success in real-world deployments, presumes
both perfect knowledge about the adversary and perfect execution of the planned security
activities. It also assumes the adversary can perfectly observe the mixed strategy of the security
agency, i.e., a probability distribution over security activities. Nevertheless, in real-world security
domains, the security agencies are inevitably faced with various types of uncertainty.
Game-theoretic approaches neglecting these types of uncertainty may lead to a significant loss in
real-world deployments.
My thesis studies three main causes of uncertainty typically found in security applications.
First, the security agencies may have incomplete information about the adversaries. Adversaries
can have distinct objectives and capabilities, leading to varying preferences over different targets.
For example, the police at LAX may be facing either a well-funded, hard-line terrorist or
criminals from local gangs; and the LA Metro system has tens of thousands of potential fare evaders
daily, each of whom may have a distinct intended trip and risk profile. Second, the adversary’s
observation is most likely imperfect. A deliberate terrorist may get noisy observations when
conducting surveillance: he may occasionally fail to observe an officer patrolling a target, or mistake
a passing car for a security patrol. In some situations, the adversary may act without acquiring
information about the security strategy, especially when the cost of surveillance (such as
monetary expenses and risk of being caught) is prohibitively high or the security measures are difficult
to observe (e.g., undercover officers). In these situations, the information that an adversary can
acquire through surveillance is either too limited or too noisy to be an important factor in his
decision making. Finally, the security agencies may not be able to execute their strategies perfectly.
Planned security activities may be interrupted or canceled due to emergency events. For example,
a canine unit protecting a terminal at LAX may be urgently called off to another assignment or,
alternatively, a new unit could become available. A patrol schedule of an LA Metro officer may
get interrupted for a variety of reasons such as writing citations, handling emergencies, or
making felony arrests.
Earlier research on modeling uncertainty in Stackelberg games has primarily focused on
the Bayesian extension of Stackelberg games, which represents discrete preference
uncertainty using multiple adversary types. Unfortunately, solving Bayesian Stackelberg games is
NP-hard [Conitzer and Sandholm, 2006], and existing methods [Conitzer and Sandholm, 2006;
Paruchuri et al., 2008; Jain et al., 2011b] scale poorly with the number of types, making
them inadequate for complex real-world problems. The second drawback of the Bayesian
Stackelberg game model is that it requires full distributional information of the uncertainty, which
may not always be available. Finally, the Bayesian Stackelberg game model has not been (and
in certain situations cannot be) applied to problems where there is uncertainty in the follower’s
observation and the leader’s execution.
Thus, there are four problems to be addressed. The first is to design new efficient and scalable
algorithms for Bayesian Stackelberg games. The second is to design models and algorithms to
compute robust solutions when the uncertainty distribution is unavailable. The third is to address
the follower’s observation uncertainty, including the uncertainty in his capability of observing
the leader’s strategy. The fourth is to address the leader’s execution uncertainty, in particular
for time-critical domains where execution errors can affect the leader’s ability to carry out the
planned schedules in later time steps.
1.2 Contributions
In this context, my thesis provides the following four major contributions. The first contribution
of my thesis is a new unified method for solving Bayesian Stackelberg games with both
discrete and continuous uncertainty [Yin and Tambe, 2012]. At the core of this unified method is a
new algorithm for solving discrete finite Bayesian Stackelberg games, called Hunter (Handling
UNcerTainty Efficiently using Relaxation). Hunter combines the following key ideas:
• efficient pruning via a best-first search in the follower’s strategy space;
• a novel linear program for computing tight upper bounds for this search;
• Bender’s decomposition for solving the upper bound linear program efficiently;
• efficient inheritance of Bender’s cuts from parent to child;
• an efficient heuristic branching rule.
Then I show that the sample average approximation approach can be applied together with Hunter
to address preference, execution, and observation uncertainty in both discrete and continuous
forms in a unified framework. Furthermore, my experimental results suggest that Hunter
provides orders of magnitude speedups over the best existing methods for Bayesian Stackelberg
games [Conitzer and Sandholm, 2006; Paruchuri et al., 2008; Jain et al., 2011b]. The efficiency
of Hunter can be further exploited in the sample average approximation approach to solving
problems with both discrete and continuous uncertainty.
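The idea behind sample average approximation can be sketched in a few lines of Python: a continuous distribution over follower payoffs is replaced by a finite set of sampled follower types, each weighted equally, which yields a discrete Bayesian Stackelberg game. The Gaussian noise model, the payoff numbers, and the function names below are assumptions made for illustration only. The sketch merely evaluates a fixed leader strategy against the sampled types; it does not perform the optimization over leader strategies that an algorithm such as Hunter would carry out.

```python
import random
from typing import List

Matrix = List[List[float]]

# Hypothetical nominal payoffs (rows: leader actions, columns: follower actions).
LEADER: Matrix = [[2.0, 4.0], [1.0, 3.0]]
FOLLOWER: Matrix = [[1.0, 0.0], [0.0, 2.0]]


def sample_types(nominal: Matrix, sigma: float, n_samples: int, seed: int = 0) -> List[Matrix]:
    """Draw n_samples follower payoff matrices by adding independent Gaussian
    noise (an assumed noise model) to each nominal follower payoff."""
    rng = random.Random(seed)
    return [[[v + rng.gauss(0.0, sigma) for v in row] for row in nominal]
            for _ in range(n_samples)]


def saa_leader_value(x: List[float], leader_payoff: Matrix, types: List[Matrix]) -> float:
    """Average leader utility over the sampled types, each with weight 1/N.
    Every sampled type best-responds to x (first maximizer on ties)."""
    total = 0.0
    for follower_payoff in types:
        n_cols = len(follower_payoff[0])
        follower_utils = [sum(x[i] * follower_payoff[i][j] for i in range(len(x)))
                          for j in range(n_cols)]
        j_star = max(range(n_cols), key=lambda j: follower_utils[j])
        total += sum(x[i] * leader_payoff[i][j_star] for i in range(len(x)))
    return total / len(types)


estimate = saa_leader_value([0.5, 0.5], LEADER, sample_types(FOLLOWER, 1.0, 200, seed=1))
```

Increasing the number of samples makes the discrete game a closer approximation of the continuous one but also increases the number of follower types, which is why the scalability of the underlying Bayesian solver matters.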
My second contribution is a robust optimization framework, called Recon (Risk-averse
Execution Considering Observational Noise), to address execution and observation uncertainty
of unknown distribution, with a focus on security problems motivated by the ARMOR
application [Yin et al., 2011]. Recon addresses the major drawback of the Bayesian model, namely the necessity
of knowing the precise distribution of the uncertainty, and is particularly useful for security
scenarios where no good estimate of the uncertainty distribution is available. For example, the
distribution of the follower’s observation noise is often difficult to measure statistically due to
limited data. Recon models the uncertainty boundary as a hyper-rectangle, and correspondingly
computes the optimal risk-averse strategy for the leader. In particular, Recon assumes that
nature chooses an uncertainty realization within the given hyper-rectangle to maximally reduce the
leader’s utility, and maximizes against this worst case. This robust optimization formulation
is similar in spirit to [Aghassi and Bertsimas, 2006a]; the latter, however, is in the context of
simultaneous-move games. To solve the Recon formulation efficiently, I provide a mixed-integer
linear program (MILP) and two novel heuristics that speed up the computation of the MILP by
orders of magnitude. I provide experimental analysis comparing the performance of various
security game strategies, including those generated by Recon and Hunter, in simulated uncertainty
settings, showing the value of Recon and Hunter under different assumptions.
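The worst-case reasoning described above can be illustrated with a simplified Python sketch. It is not the Recon MILP: it only evaluates one fixed coverage vector, it models execution noise alone (each target's realized coverage may deviate from the planned coverage by at most `alpha`, an assumed box-shaped uncertainty set), and it assumes the attacker observes the realized coverage perfectly. It also assumes the usual security game payoff structure, in which covering a target helps the defender and hurts the attacker. All names and numbers are hypothetical.

```python
from typing import List


def worst_case_defender_utility(x: List[float], alpha: float,
                                def_cov: List[float], def_unc: List[float],
                                att_cov: List[float], att_unc: List[float]) -> float:
    """Worst-case defender utility for planned coverage x when nature picks
    realized coverage y with |y_t - x_t| <= alpha (clipped to [0, 1]) and the
    attacker best-responds to y, with ties broken against the defender.

    Assumes def_cov[t] >= def_unc[t] and att_cov[t] <= att_unc[t] for all t.
    Under these assumptions, target t can be made an attacker best response
    if and only if it is one when y_t is as low as possible and every other
    coverage is as high as possible, so checking that extreme point suffices."""
    n = len(x)
    lo = [max(0.0, c - alpha) for c in x]
    hi = [min(1.0, c + alpha) for c in x]
    worst = float("inf")
    for t in range(n):
        att_t = lo[t] * att_cov[t] + (1.0 - lo[t]) * att_unc[t]
        rivals = [hi[s] * att_cov[s] + (1.0 - hi[s]) * att_unc[s]
                  for s in range(n) if s != t]
        if all(att_t >= r - 1e-9 for r in rivals):
            def_t = lo[t] * def_cov[t] + (1.0 - lo[t]) * def_unc[t]
            worst = min(worst, def_t)
    return worst


# Hypothetical two-target, zero-sum example.
DEF_COV, DEF_UNC = [0.0, 0.0], [-10.0, -5.0]
ATT_COV, ATT_UNC = [0.0, 0.0], [10.0, 5.0]
nominal = worst_case_defender_utility([0.5, 0.5], 0.0, DEF_COV, DEF_UNC, ATT_COV, ATT_UNC)
robust = worst_case_defender_utility([0.5, 0.5], 0.25, DEF_COV, DEF_UNC, ATT_COV, ATT_UNC)
```

With no noise the defender's utility for this coverage is -5, but allowing a deviation of 0.25 per target lowers the guaranteed utility to -7.5. A risk-averse leader would search for the coverage vector that maximizes this guaranteed value rather than the nominal one.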
The third contribution of my thesis studies security problems where the adversary may or
may not conduct surveillance before taking an action [Yin et al., 2010]. The assumption that
the adversary always observes the leader’s strategy (perfectly or imperfectly) is fundamental in
both the Stackelberg game model and the Recon model. However, when the adversary acts
without surveillance, a simultaneous-move game model may be a better reflection of the real
situation. The leader faces an unclear choice about which strategy to adopt: the recommendation of
the Stackelberg model, or of the simultaneous-move model, or something else entirely? My
thesis provides theoretical and experimental analysis of the leader’s dilemma, focusing on security
games, a class of Stackelberg games motivated by the ARMOR and IRIS applications. In
particular, I show that in security games that satisfy the SSAS (Subsets of Schedules Are Schedules)
property (such as ARMOR games), any Stackelberg game equilibrium strategy for the leader
is also a Nash equilibrium strategy. The leader is therefore best-responding with a Stackelberg
equilibrium strategy regardless of the follower’s ability to observe the leader’s strategy, resolving
the leader’s dilemma. On the other hand, counterexamples to this (partial) equivalence between
the leader’s Stackelberg and Nash equilibrium strategies exist when the SSAS property does not hold.
However, my experiments show that in this case, the fraction of games where the Stackelberg
equilibrium strategy is not in any Nash equilibrium becomes vanishingly small as problem
sizes increase. In practical terms, my theoretical and experimental contributions imply that security
agencies in applications such as ARMOR (where games satisfy the SSAS property) and IRIS (where
games have small schedule size and a large number of schedules) can simply stick to the
Stackelberg game model regardless of the follower’s ability to observe the leader’s mixed strategy.
The final contribution of this thesis addresses dynamic execution uncertainty in security
patrolling for public transit systems. This problem is significantly more complex than earlier
problems such as ARMOR and IRIS, where security activities are represented as a single action. In
transit domains, security activities are patrols within the transit systems, carried out as sequences
of actions in different places and at different times. Execution uncertainty in such spatiotemporal domains has
a vastly different impact, since an execution error can affect the security officers’ ability to carry out
their planned schedules in later time steps. The result of the investigation is a new game-theoretic
model, called TRUSTS (Tactical Randomization for Urban Security in Transit Systems) [Yin
et al., 2012a,b; Jiang et al., 2013]. TRUSTS, as proposed in my thesis, features the following four
key ideas:
• I provide a general Bayesian Stackelberg game model for spatiotemporal patrolling with
execution uncertainty where the execution uncertainty is represented as Markov Decision
Processes.
• I show that when the utility functions have a certain separable structure, the leader’s
strategy space can be compactly represented. As a result the problem can be reduced to
a polynomial-sized optimization problem, solvable by existing approaches for Bayesian
Stackelberg games without execution uncertainty.
• TRUSTS employs a novel history duplicate approach to encode constraints on feasible
patrols within this compact representation.
• The compactly represented solutions are stochastic patrol policies that can be used to
generate randomized patrol schedules with contingency plans. Such contingency plans can be
implemented as a smartphone app carried by patrol units, or as a communication protocol
with a central operator.
As an empirical validation of the approach, I apply the game-theoretic model to the problem of
fare evasion deterrence in the Los Angeles Metro Rail system, providing details of model creation,
simulation results, and smartphone app design for implementing the patrol policies generated.
1.3 Overview of Thesis
This thesis is organized as follows. Chapter 2 introduces the necessary background for the
research presented in this thesis. Chapter 3 presents the algorithm Hunter for Bayesian
Stackelberg games, its extension to address preference, execution, and observation uncertainty in a
unified framework, and the corresponding experimental results. Chapter 4 presents the robust
optimization framework Recon and the corresponding experimental results. Chapter 5 studies
the uncertainty of whether the adversary conducts surveillance or not, establishing a connection
between the Stackelberg equilibrium and the Nash equilibrium in security games. Chapter 6
presents the TRUSTS system, describing the model framework, strategy representation,
execution uncertainty model using Markov Decision Processes, and experimental results from
computer simulations as well as field trials. Chapter 7 presents related work. Finally, Chapter 8
concludes the thesis and presents issues for future work.
Chapter 2: Background
This chapter begins by introducing motivating examples of real-world security applications in
Section 2.1. It then provides background on the general Stackelberg game model and its Bayesian
extension in Sections 2.2 and 2.3. Section 2.4 introduces the standard solution concept known as
the Strong Stackelberg Equilibrium (SSE) and Section 2.5 describes previous algorithms for
finding SSE in general Bayesian Stackelberg games. Finally, in Section 2.6, I introduce a restricted
class of Stackelberg games called security games, motivated by two security applications:
ARMOR for the Los Angeles International Airport (LAX) and IRIS for the Federal Air Marshals
Service (FAMS).
2.1 Motivating Applications
While there are many potential security applications where game theory is applicable, e.g., protecting ports, road networks, forests, fish, etc., in this section, I will emphasize three real world security applications that are closely related to this thesis. The first is the ARMOR security system deployed at the Los Angeles International Airport (LAX) [Pita et al., 2008]. In this domain police are able to set up checkpoints on roads leading to particular terminals, and assign canine units (bomb-sniffing dogs) to patrol terminals. Police resources in this domain are homogeneous, and do not have significant scheduling constraints.
IRIS is a similar application deployed by the Federal Air Marshals Service (FAMS) [Tsai
et al., 2009]. Armed marshals are assigned to commercial flights to deter and defeat terrorist
attacks. This domain has more complex constraints. In particular, marshals are assigned to tours
of flights that return to the same destination, and the tours on which any given marshal is available
to fly are limited by the marshal’s current location and timing constraints. The types of scheduling
and resource constraints considered in this thesis (in particular Chapter 5) are motivated by those
necessary to represent this domain.
The third example is the TRUSTS application for the Los Angeles Metro Rail system focusing on fare evasion deterrence. In the Los Angeles Metro Rail system (and other proof-of-payment transit systems worldwide), passengers are legally required to buy tickets before boarding, but there are no gates or turnstiles. There are, quite literally, no barriers to entry, as illustrated in Figure 2.1. Instead, security personnel are dynamically deployed throughout the transit system, randomly inspecting passenger tickets; fare evaders face significant penalties when caught. This proof-of-payment fare collection method is typically chosen as a more cost-effective alternative to direct fare collection, i.e., when the revenue lost to fare evasion is believed to be less than what it would cost to directly preclude it. (See http://en.wikipedia.org/wiki/Proof-of-payment for a list of such systems.)
For the Los Angeles Metro system, with approximately 300,000 riders daily, this revenue loss can be significant; the annual cost has been estimated at $5.6 million [Booz Allen Hamilton, 2007]. The Los Angeles Sheriff's Department (LASD) deploys uniformed patrols on board trains and at stations for fare-checking (and for other purposes such as crime prevention), in order to discourage fare evasion. With limited resources to devote to patrols, it is impossible to cover all locations at all times. The LASD thus requires some mechanism for choosing times and locations for inspections. Any predictable patterns in such a patrol schedule are likely to be observed and exploited by potential fare-evaders. The traditional approach relies on humans for scheduling the patrols. However, human schedulers are poor at generating unpredictable schedules [Wagenaar, 1972; Tambe, 2011]; furthermore such scheduling for LASD is a tremendous cognitive burden on the human schedulers who must take into account all of the scheduling complexities (e.g., train timings, switching time between trains, and schedule lengths). Indeed, the sheer difficulty of even enumerating the trillions of potential patrols makes any simple automated approach (such as a simple dice roll) inapplicable.

Figure 2.1: Entrance of an LA Metro Rail station.
2.2 Stackelberg Games
A Stackelberg game is a two-person game played by a leader and a follower [von Stackelberg, 1934], where the leader commits to a mixed strategy first, and the follower observes the leader's strategy and responds with a pure strategy, maximizing his utility correspondingly. Since a significant portion of this thesis focuses on Stackelberg games for security applications where the leader defends a set of physical assets against potential attacks, the terms "defender" and "leader", and the terms "attacker" and "follower", are used interchangeably. For explanatory purposes, I will also refer to the leader (defender) as "her" and the follower (attacker) as "him".
The leader in Stackelberg games benefits from the power of commitment, known as the first mover's advantage in the game theory literature. To see the advantage of being a leader, consider the simple game in normal form given in Figure 2.2. If the players move simultaneously, the only Nash Equilibrium (NE) of this game is for the row player to play a and the column player c, giving the row player a utility of 2. This can be seen by noticing that b is strictly dominated for the row player. On the other hand, if the row player moves first, she can commit to b. With the column player best responding with d, the row player receives a utility of 3, better than in the simultaneous-move case. In fact, the Stackelberg equilibrium strategy is for the row player to play a with probability 0.5 and b with probability 0.5, so that the best response for the column player is to play d, which gives the row player an expected utility of 3.5 (see footnote 2).

       c       d
a     2,1     4,0
b     1,0     3,1

Figure 2.2: Example of a Stackelberg game
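The commitment argument for the game of Figure 2.2 can be checked numerically. The following is a minimal Python sketch, not part of the original thesis; the function name `leader_value` and the dictionary layout are illustrative assumptions. It computes the row player's expected utility when she commits to playing a with a given probability and the column player best responds, breaking ties in her favor.

```python
from fractions import Fraction

# Payoffs (leader, follower) from Figure 2.2; rows a, b and columns c, d.
PAYOFF = {
    ("a", "c"): (2, 1), ("a", "d"): (4, 0),
    ("b", "c"): (1, 0), ("b", "d"): (3, 1),
}

def leader_value(p_a):
    """Leader's expected utility when she commits to playing a with probability p_a."""
    mix = {"a": p_a, "b": 1 - p_a}
    outcomes = []
    for col in ("c", "d"):
        lead = sum(mix[r] * PAYOFF[(r, col)][0] for r in mix)
        foll = sum(mix[r] * PAYOFF[(r, col)][1] for r in mix)
        outcomes.append((lead, foll))
    best_foll = max(f for _, f in outcomes)
    # Among the follower's best responses, ties are broken in the leader's favor.
    return max(l for l, f in outcomes if f == best_foll)
```

Calling `leader_value` with `Fraction(1)`, `Fraction(0)`, and `Fraction(1, 2)` corresponds to committing to a, committing to b, and the equal mixture discussed in the text. Exact fractions are used so that the follower's indifference at the equal mixture is detected without floating-point tolerance.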
In the general form of Stackelberg games, the leader's mixed strategy is an N-dimensional real vector $x \in \mathbb{R}^N$ subject to a set of linear constraints (e.g., $Ax \le b$, $x \ge 0$). This mixed strategy representation generalizes the traditional mixed strategy concept in game theory where $\sum_{i=1}^{N} x_i = 1$ with $x_i$ representing the probability of playing pure strategy $i$. The added expressiveness of this generalization is useful for compact strategy representation in many security domains, e.g., TRUSTS as we will see in Section 6.1.2.

Footnote 2: In these games it is assumed that if the follower is indifferent, he breaks the tie in the leader's favor (otherwise, the optimal solution is not well defined).
The leader and follower's expected utilities are both linear combinations of $x$ with weights dependent on the follower's choice. Given a leader's strategy $x$, the follower maximizes his expected utility by choosing one of his $J$ pure strategies. For each pure strategy $j$ played by the follower, the leader gets a utility of $\mu_j^T x + \mu_{j,0}$ and the follower gets a utility of $\nu_j^T x + \nu_{j,0}$, where $\mu_j, \nu_j$ are real vectors in $\mathbb{R}^N$ and $\mu_{j,0}, \nu_{j,0} \in \mathbb{R}$. It is useful to define the leader's utility matrix $U$ and the follower's utility matrix $V$ as the following,

$$U = \begin{pmatrix} \mu_{1,0} & \cdots & \mu_{J,0} \\ \mu_1 & \cdots & \mu_J \end{pmatrix}, \quad V = \begin{pmatrix} \nu_{1,0} & \cdots & \nu_{J,0} \\ \nu_1 & \cdots & \nu_J \end{pmatrix}.$$

Then for a leader's strategy $x$, the leader and follower's $J$ utilities for the follower's $J$ pure strategies are $U^T \begin{pmatrix} 1 \\ x \end{pmatrix}$ and $V^T \begin{pmatrix} 1 \\ x \end{pmatrix}$.
2.3 Bayesian Stackelberg Games
A Bayesian extension to the Stackelberg game allows multiple types of followers, each with its own payoff matrix. This extension is useful in modeling the diversity of potential adversaries in all aspects. For example, the police at LAX may be facing either a well-funded hard-line terrorist or criminals from local gangs; and the LA Metro system has tens of thousands of potential fare evaders daily, each of whom may have a distinct intended trip and risk profile.
Formally, a Bayesian Stackelberg game is a Stackelberg game between a leader and a follower whose type is drawn randomly from a set of follower types $\{1, 2, \ldots, \Lambda\}$. Each type $1 \le \lambda \le \Lambda$ is associated with a prior probability $p^\lambda$ representing the likelihood of its occurrence and a pair of utility matrices $(U^\lambda, V^\lambda)$ for the leader and the follower respectively. The leader commits to a mixed strategy knowing the prior distribution of all different follower types but not the type of the follower she faces. The follower, however, knows his own type $\lambda$, and plays the best response according to his utility matrix $V^\lambda$. For the purpose of equilibrium computation, it is sufficient to consider only pure strategy responses of the follower as shown in [Conitzer and Sandholm, 2006].

The expected utilities of both players are well-defined for a pair of leader's mixed strategy $x$ and a vector of the follower's pure responses $\mathbf{j} = (j^1, \ldots, j^\Lambda)$ where $j^\lambda$ denotes the pure strategy of follower type $\lambda$. For the follower of type $\lambda$, his expected utility is $v^\lambda(x, j^\lambda) = (\nu^\lambda_{j^\lambda})^T x + \nu^\lambda_{j^\lambda,0}$. For the leader, her expected utility is $u(x, \mathbf{j}) = \sum_{\lambda=1}^{\Lambda} p^\lambda u^\lambda(x, j^\lambda)$ where $u^\lambda(x, j^\lambda) = (\mu^\lambda_{j^\lambda})^T x + \mu^\lambda_{j^\lambda,0}$ is the leader's expected utility against follower type $\lambda$.
As an example, which we will return to in Chapter 3, consider a Bayesian Stackelberg game with two follower types, where type 1 appears with probability 0.84 and type 2 appears with probability 0.16. The leader (defender) chooses a probability distribution of allocating one resource to protect the two targets whereas the follower (attacker) chooses the best target to attack. We show the payoff matrices in Figure 2.3, where the leader is the row player and the follower is the column player. The utilities of the two types are identical except that a follower of type 2 gets a utility of 1 for attacking Target2 successfully, whereas one of type 1 gets 0. The leader's strategy is a column vector $(x_1, x_2)^T$ representing the probabilities of protecting the two targets. Given one resource, the strategy space of the leader is $x_1 + x_2 \le 1$, $x_1 \ge 0$, $x_2 \ge 0$, i.e., $A = (1, 1)$, $b = 1$.

Type 1      Target1     Target2
Target1     1, -1       -1, 0
Target2     0, 1        1, -1

Type 2      Target1     Target2
Target1     1, -1       -1, 1
Target2     0, 1        1, -1

Figure 2.3: Payoff matrices of a Bayesian Stackelberg game.

The payoffs in Figure 2.3 can be represented by the following utility matrices,

$$U^1 = \begin{pmatrix} 0 & 0 \\ 1 & -1 \\ 0 & 1 \end{pmatrix}, \quad V^1 = \begin{pmatrix} 0 & 0 \\ -1 & 0 \\ 1 & -1 \end{pmatrix}, \quad U^2 = \begin{pmatrix} 0 & 0 \\ 1 & -1 \\ 0 & 1 \end{pmatrix}, \quad V^2 = \begin{pmatrix} 0 & 0 \\ -1 & 1 \\ 1 & -1 \end{pmatrix}.$$

Suppose the leader commits to a mixed strategy $(x_1, x_2)^T$ while follower type 1 attacks Target1 and type 2 attacks Target2. Follower type 1 gets an expected utility of $-x_1 + x_2$ and follower type 2 gets an expected utility of $x_1 - x_2$. On the other hand, the leader's expected utility is $0.84(1 \cdot x_1 + 0 \cdot x_2) + 0.16((-1) \cdot x_1 + 1 \cdot x_2)$.
2.4 Strong Stackelberg Equilibrium
Two types of unique Stackelberg equilibria were proposed in [Leitmann, 1978], which are typically called “strong” and “weak” after [Breton et al., 1988]. The two types both assume the follower best responds to the leader's mixed strategy. But in cases where ties exist, i.e., multiple follower pure strategies yield the same maximum expected utility for the follower, the strong form assumes that the follower will always choose the optimal strategy for the leader while the weak form assumes that the follower will choose the worst strategy for the leader. A strong Stackelberg equilibrium always exists, but a weak Stackelberg equilibrium may not [Basar and Olsder, 1995]. In addition, the leader can often induce the favorable strong equilibrium by selecting a strategy arbitrarily close to the equilibrium that causes the follower to strictly prefer the desired strategy [von Stengel and Zamir, 2004].
The Strong Stackelberg Equilibrium (SSE) is adopted in recent works utilizing the Stackelberg game model for security resource randomization [Paruchuri et al., 2008; Kiekintveld et al., 2009]. In Bayesian Stackelberg games, the follower's strategy specifies the pure strategy of each follower type given the leader's mixed strategy $x$, i.e., a vector of functions $g = (g^1, \ldots, g^\Lambda)$, where each $g^\lambda$ maps a leader's mixed strategy to a pure strategy of follower type $\lambda$. Let $g(x)$ be the vector of the follower's responses to $x$ according to $g$, i.e., $g(x) = (g^1(x), \ldots, g^\Lambda(x))$. Formally a Strong Stackelberg Equilibrium is defined below:

Definition 1. For a given Bayesian Stackelberg game with utility matrices $(U^1, V^1), \ldots, (U^\Lambda, V^\Lambda)$ and type distribution $p$, a pair of strategies $(x, g)$ forms a Strong Stackelberg Equilibrium if and only if:
1. The leader plays a best response:
$u(x, g(x)) \ge u(x', g(x')), \ \forall x'.$
2. The follower plays a best response:
$v^\lambda(x, g^\lambda(x)) \ge v^\lambda(x, j), \ \forall 1 \le \lambda \le \Lambda, \ \forall 1 \le j \le J.$
3. The follower breaks ties in favor of the leader:
$u^\lambda(x, g^\lambda(x)) \ge u^\lambda(x, j), \ \forall 1 \le \lambda \le \Lambda, \ \forall j$ that is a best response to $x$ as above.
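Conditions 2 and 3 of Definition 1 can be illustrated on the two-type game of Figure 2.3. The following Python sketch is an illustration rather than code from the thesis; the names `best_response` and `evaluate` and the coefficient layout are assumptions made here. It computes each type's best response with ties broken in the leader's favor, and the resulting leader utility $u(x, g(x))$.

```python
from fractions import Fraction as F

# Linear utility coefficients for the game of Figure 2.3 (all constant terms are 0).
# MU[t][j] and NU[t][j] are the coefficients on (x1, x2) for the leader and the
# follower when follower type t attacks target j (0-based indices).
PRIOR = [F(84, 100), F(16, 100)]
MU = [[(1, 0), (-1, 1)], [(1, 0), (-1, 1)]]
NU = [[(-1, 1), (0, -1)], [(-1, 1), (1, -1)]]

def dot(c, x):
    return sum(ci * xi for ci, xi in zip(c, x))

def best_response(t, x):
    """Pure strategy of type t: maximize follower utility, break ties for the leader."""
    targets = range(len(NU[t]))
    top = max(dot(NU[t][j], x) for j in targets)
    ties = [j for j in targets if dot(NU[t][j], x) == top]
    return max(ties, key=lambda j: dot(MU[t][j], x))

def evaluate(x):
    """Leader's expected utility u(x, g(x)) under the tie-breaking response g."""
    return sum(p * dot(MU[t][best_response(t, x)], x) for t, p in enumerate(PRIOR))
```

For instance, `evaluate((F(2, 3), F(1, 3)))` evaluates the leader strategy that protects Target1 with probability 2/3; at that point type 1 is indifferent between the two targets, so the tie-breaking rule matters.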
2.5 Baseline Solvers
2.5.1 Multiple Linear Programs
The leader's strategy in the SSE is considered the optimal leader's strategy as it maximizes the leader's expected utility assuming the follower best responds. This section explains the baseline algorithms for finding the optimal leader's strategy of a Bayesian Stackelberg game. As shown in [Conitzer and Sandholm, 2006], the problem of computing the optimal leader's strategy $x^*$ is equivalent to finding a leader's mixed strategy $x$ and a follower's pure strategy response $\mathbf{j} = g(x)$ such that the three SSE conditions are satisfied. Mathematically $x^*$ can be found by solving the following maximization problem:

$$(x^*, \mathbf{j}^*) = \arg\max_{x, \mathbf{j}} \{ u(x, \mathbf{j}) \mid v^\lambda(x, j^\lambda) \ge v^\lambda(x, j'), \ \forall 1 \le j' \le J \}. \quad (2.1)$$

Equation (2.1) suggests the multiple linear program (LP) approach for finding $x^*$ as given in [Conitzer and Sandholm, 2006]. The idea is to enumerate all possible pure strategy responses of the follower $\mathbf{j} \in \{1, \ldots, J\}^\Lambda$. For each $\mathbf{j}$, the optimal mixed strategy of the leader $x^*(\mathbf{j})$ such that $\mathbf{j}$ is a best response of the follower can be found by solving the following LP (see footnote 1):

$$\begin{aligned} \max_{x} \quad & u(x, \mathbf{j}) \\ \text{s.t.} \quad & Ax \le b, \ x \ge 0 \\ & v^\lambda(x, j^\lambda) \ge v^\lambda(x, j'), \quad \forall 1 \le \lambda \le \Lambda, \ \forall 1 \le j' \le J \end{aligned} \quad (2.2)$$

Some of the LPs may be infeasible but it can be shown that at least one LP will return a feasible solution. The optimal leader's strategy $x^*$ is then the optimal solution of the LP which has the highest objective value (i.e., the leader's expected utility) among all feasible LPs.

Footnote 1: Note the formulation here is slightly different from and has fewer constraints in each LP than the original multiple LPs approach in [Conitzer and Sandholm, 2006] where a Bayesian game is transformed to a normal-form one using Harsanyi transformation [Harsanyi, 1967].
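The enumeration described in Section 2.5.1 can be sketched on the game of Figure 2.3. The Python code below is an illustration under stated assumptions, not the thesis implementation: instead of a general LP solver it uses a tiny vertex-enumeration routine (`solve_lp_2d`, a name introduced here) that only works for two variables and a bounded feasible region, and it takes the strategy space to be $x_1 + x_2 \le 1$, $x_1, x_2 \ge 0$.

```python
from fractions import Fraction as F
from itertools import combinations, product

PRIOR = [F(84, 100), F(16, 100)]
MU = [[(1, 0), (-1, 1)], [(1, 0), (-1, 1)]]    # leader coefficients per type, target
NU = [[(-1, 1), (0, -1)], [(-1, 1), (1, -1)]]  # follower coefficients per type, target
# Strategy space as rows (a1, a2, c) meaning a1*x1 + a2*x2 <= c.
BASE = [(1, 1, 1), (-1, 0, 0), (0, -1, 0)]

def solve_lp_2d(obj, cons):
    """Maximize obj . x over a bounded 2-D polygon by checking its vertices."""
    best = None
    for (a1, a2, c), (b1, b2, e) in combinations(cons, 2):
        det = a1 * b2 - a2 * b1
        if det == 0:
            continue  # parallel constraints do not define a vertex
        x = (F(c * b2 - a2 * e, det), F(a1 * e - c * b1, det))  # Cramer's rule
        if all(r1 * x[0] + r2 * x[1] <= rc for r1, r2, rc in cons):
            val = obj[0] * x[0] + obj[1] * x[1]
            if best is None or val > best[0]:
                best = (val, x)
    return best  # None when no feasible vertex exists

def multiple_lps():
    """Solve one LP per follower response vector and keep the best feasible one."""
    best = None
    for resp in product(range(2), repeat=len(PRIOR)):  # one target per type
        cons = list(BASE)
        for t, j in enumerate(resp):
            for k in range(2):
                if k != j:
                    # v(x, k) <= v(x, j)  <=>  (nu_k - nu_j) . x <= 0
                    cons.append((NU[t][k][0] - NU[t][j][0],
                                 NU[t][k][1] - NU[t][j][1], 0))
        obj = tuple(sum(p * MU[t][j][i] for t, (p, j) in enumerate(zip(PRIOR, resp)))
                    for i in range(2))
        sol = solve_lp_2d(obj, cons)
        if sol is not None and (best is None or sol[0] > best[0]):
            best = (sol[0], sol[1], resp)
    return best
```

`multiple_lps()` returns the best objective value, the corresponding leader strategy, and the follower response vector (0-based targets per type). With $\Lambda$ types and $J$ pure strategies the outer loop runs $J^\Lambda$ times, which is the source of the exponential cost.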
2.5.2 Dobss: Mixed-Integer Linear Program
Since the followers of different types are mutually independent of each other, there can be at most $J^\Lambda$ possible combinations of follower best response actions over the $\Lambda$ follower types. The multiple LPs approach will then have to solve $J^\Lambda$ LPs and therefore its runtime complexity grows exponentially in the number of follower types. In fact, the problem of finding the optimal strategy for the leader in a Bayesian Stackelberg game with multiple follower types is NP-hard [Conitzer and Sandholm, 2006]. Nevertheless, researchers have continued to provide practical improvements. Dobss is an efficient general Stackelberg solver [Paruchuri et al., 2008] and is in use for security scheduling at the Los Angeles International Airport [Pita et al., 2008]. Dobss obtains a decomposition scheme by exploiting the property that follower types are independent of each other and solves the entire problem as one mixed-integer linear program (MILP):

$$\begin{aligned} \max_{x, u, v, q^1, \ldots, q^\Lambda} \quad & \sum_{\lambda=1}^{\Lambda} p^\lambda u^\lambda \\ \text{s.t.} \quad & Ax \le b, \ x \ge 0 \\ & \sum_{j=1}^{J} q^\lambda_j = 1, && \forall \lambda \\ & q^\lambda_j \in \{0, 1\}, && \forall \lambda, \forall j \\ & u^\lambda \le u^\lambda(x, j) + (1 - q^\lambda_j) \cdot M, && \forall \lambda, \forall j \\ & 0 \le v^\lambda - v^\lambda(x, j) \le (1 - q^\lambda_j) \cdot M, && \forall \lambda, \forall j \end{aligned} \quad (2.3)$$

Dobss effectively reduces the problem of solving an exponential number of LPs to a compactly represented MILP which can be solved much more efficiently via modern techniques in operations research. The key idea of the Dobss MILP is to represent the pure strategy of each follower type $\lambda$ as a binary vector $q^\lambda = (q^\lambda_1, \ldots, q^\lambda_J)$. In particular, the binary variable $q^\lambda_j$ is 1 if the follower type $\lambda$ chooses the pure strategy $j$ and 0 otherwise. It is easy to see $\sum_{j=1}^{J} q^\lambda_j = 1$ since only one $q^\lambda_j$ can be 1. $M$ is (conceptually) an infinitely large constant. Variable $u^\lambda$ represents the leader's expected utility against type $\lambda$, which is equal to $u^\lambda(x, j)$ when the follower chooses $j$ (i.e., $q^\lambda_j = 1$). Variable $v^\lambda$ represents the expected utility of follower type $\lambda$, which is the maximum of $v^\lambda(x, j)$ over all possible $1 \le j \le J$.
2.5.3 HBGS: Branch-and-Bound Search
In addition to multiple LPs and Dobss, recent work (HBGS) solves the problem via a branch-and-bound tree search [Jain et al., 2011b]. In contrast to the branch-and-bound techniques typically used in integer programming where branches are created by considering each side of a separating hyperplane, HBGS uses the knowledge of the problem and creates the search tree by assigning one follower type to one pure strategy at each tree level. For example, Figure 2.4 shows the search tree of the example game in Figure 2.3. Each leaf node corresponds to one LP in the multiple LPs approach. For instance, the corresponding linear program of the leftmost leaf node finds the optimal leader strategy such that both type 1 and type 2 have a best response of attacking Target1. The multiple LPs approach will solve and compare across all leaf nodes to find the overall optimal strategy of the leader. In this case, the leaf node where type 1 is assigned to Target1 and type 2 to Target2 provides the overall optimal strategy.

Instead of solving an LP for all $J^\Lambda$ leaf nodes, branch-and-bound techniques can be used to speed up the tree search. The key to efficiency in branch-and-bound is obtaining tight upper and lower bounds for internal nodes, i.e., for nodes shown by circles in Figure 2.4, where subsets of follower types are assigned to particular targets. For example, in Figure 2.4, suppose the left subtree has been explored; now if at the rightmost internal node (where type 1 is assigned to Target2) we realize that the upper bound on solution quality is 0.5, we could prune the right subtree without even considering type 2. One possible way of obtaining upper bounds is by relaxing the integrality constraints in the Dobss MILP. Unfortunately, when the integer variables in Dobss are relaxed, the objective can be arbitrarily large, leading to meaningless upper bounds. HBGS [Jain et al., 2011b] computes upper bounds by heuristically utilizing the solutions of smaller restricted games. However, the preprocessing involved in solving many small games can be expensive and the bounds computed using heuristics can again be loose. In my thesis, a new framework of computing upper and lower bounds will be presented in Chapter 3 which leads to several orders of magnitude speedup over both Dobss and HBGS.

Figure 2.4: Example search tree of solving Bayesian games (levels: Type 1, then Type 2; branches: Target1, Target2; leaf labels: 0.5, 0.506, Infeasible, 0.33).
2.6 Security Games
The security games definition in this thesis is quite general, but with assumptions motivated by two real-world applications ARMOR and IRIS (see Section 2.1). A security game [Kiekintveld et al., 2009] is a two-player game between a defender (leader) and an attacker (follower). The attacker may choose to attack any target from a set of $N$ targets: $T = \{t_1, t_2, \ldots, t_N\}$. The defender tries to prevent attacks by covering targets using resources from a set of resources. As shown in Figure 2.5, $\mu^c_i$ is the defender's utility if $t_i$ is attacked while $t_i$ is covered by some defender resource. If $t_i$ is not covered, the defender gets $\mu^u_i$. The attacker's utility is denoted similarly by $\nu^c_i$ and $\nu^u_i$. $\Delta\mu_i = \mu^c_i - \mu^u_i$ denotes the difference between the defender's covered and uncovered utilities. Similarly, $\Delta\nu_i = \nu^u_i - \nu^c_i$. As a key property of security games, we assume $\Delta\mu_i > 0$ and $\Delta\nu_i > 0$. In words, adding resources to cover a target helps the defender and hurts the attacker.

For ease of memorization, the notation here uses $\mu$ to denote utility for the leader (defender) and $\nu$ to denote the utility for the follower (attacker), similar to that defined for the general Stackelberg game introduced in Section 2.2. However, with no explicit definition of utility vectors here, the expected utility calculation in Section 2.2 is not applicable; instead the expected utilities of the two players for a certain strategy profile are computed in a more compact way as I will explain later (given in (2.4) and (2.5)).

Figure 2.5: Payoff structure of security games (defender: $\mu^u_i$ when not covered, $\mu^c_i$ when covered, $\Delta\mu_i > 0$; attacker: $\nu^u_i$ when not covered, $\nu^c_i$ when covered, $\Delta\nu_i > 0$).
Motivated by the IRIS application and similar real-world domains, I introduce resource and scheduling constraints for the defender. Resources may be assigned to schedules covering multiple targets, $s \subseteq T$. For each resource, there is a subset of the schedules that the resource can potentially cover. In the IRIS application, flights are targets and air marshals are resources. Schedules capture the idea that air marshals fly tours, and must return to a particular starting point. Heterogeneous resources can express additional timing and location constraints that limit the tours on which any particular marshal can be assigned to fly. The IRIS application is an important subset of security games with heterogeneous resources where the minimum size of feasible schedules is 2 since an air marshal needs to cover at least a pair of departing and returning flights. The ARMOR application is also an important subclass of security games, with schedules of size 1 and homogeneous resources. In my thesis, the security games for the ARMOR and IRIS applications are referred to as ARMOR games and IRIS games respectively. Figure 2.6 visualizes the relationship among the four classes of games defined so far.

Figure 2.6: Relationship among Stackelberg, security, IRIS, and ARMOR games (Stackelberg games contain security games, which contain ARMOR games and IRIS games).
A security game described above can be represented as a strategic form game as follows. The attacker's pure strategy space is the set of targets. The attacker's mixed strategy $a = (a_1, \ldots, a_N)$ is a vector where $a_i$ represents the probability of attacking $t_i$. The defender's pure strategy is a feasible assignment of resources to schedules. Since covering a target with one resource is exactly the same as covering it with any positive number of resources, the defender's pure strategy can also be represented by a coverage vector $d = (d_1, \ldots, d_N) \in \{0, 1\}^N$ where $d_i$ represents whether $t_i$ is covered or not. For example, $(\{t_1, t_4\}, \{t_2\})$ can be a possible assignment, and the corresponding coverage vector is $(1, 1, 0, 1)$. However, not all the coverage vectors are feasible due to resource and schedule constraints. Denote the set of feasible coverage vectors by $\mathcal{D} \subseteq \{0, 1\}^N$.
The defender's mixed strategy $X$ specifies the probabilities of playing each $d \in \mathcal{D}$, where each individual probability is denoted by $X_d$. Let $x = (x_1, \ldots, x_N)$ be the vector of coverage probabilities corresponding to $X$, where $x_i = \sum_{d \in \mathcal{D}} d_i X_d$ is the marginal probability of covering $t_i$. For example, suppose the defender has two coverage vectors: $d^1 = (1, 1, 0)$ and $d^2 = (0, 1, 1)$. Then $X = (0.5, 0.5)$ is one defender's mixed strategy, and the corresponding $x = (0.5, 1, 0.5)$. Denote the mapping from $X$ to $x$ by $\varphi$, i.e., $x = \varphi(X)$. For defender mixed strategy $X$ and target $t_i$ attacked, denote the defender's and the attacker's expected utility by $u(X, t_i)$ and $v(X, t_i)$ respectively. It is easy to see $u(X, t_i) = x_i \mu^c_i + (1 - x_i) \mu^u_i$ and $v(X, t_i) = x_i \nu^c_i + (1 - x_i) \nu^u_i$.

If strategy profile $(X, a)$ is played, the defender's expected utility is

$$u(X, a) = \sum_{i=1}^{N} a_i u(X, t_i) = \sum_{i=1}^{N} a_i \left[ x_i \mu^c_i + (1 - x_i) \mu^u_i \right], \quad (2.4)$$

while the attacker's expected utility is

$$v(X, a) = \sum_{i=1}^{N} a_i v(X, t_i) = \sum_{i=1}^{N} a_i \left[ x_i \nu^c_i + (1 - x_i) \nu^u_i \right]. \quad (2.5)$$
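The mapping from a mixed strategy over coverage vectors to marginal coverage, and the expected utility of (2.4) and (2.5), can be sketched in a few lines of Python. The function names below are assumptions of this sketch. The same `expected_utility` function serves both players: pass the defender's covered and uncovered payoffs for (2.4), or the attacker's for (2.5).

```python
def coverage(X, D):
    """Marginal coverage x = phi(X), with x_i = sum over d of d_i * X_d."""
    n = len(D[0])
    return [sum(p * d[i] for p, d in zip(X, D)) for i in range(n)]

def target_utility(x_i, covered, uncovered):
    """Expected utility when a target with coverage x_i is attacked."""
    return x_i * covered + (1 - x_i) * uncovered

def expected_utility(X, D, a, covered, uncovered):
    """Expected utility of profile (X, a), following the form of (2.4) and (2.5)."""
    x = coverage(X, D)
    return sum(a_i * target_utility(x_i, c, u)
               for a_i, x_i, c, u in zip(a, x, covered, uncovered))
```

With the coverage vectors from the text, `coverage([0.5, 0.5], [(1, 1, 0), (0, 1, 1)])` gives the marginal vector used in the example. Any payoff values passed as `covered` and `uncovered` are the caller's own; the thesis text does not specify numeric payoffs for this example.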
Given a defender's mixed strategy $X$, let $g(X): X \to a$ denote the attacker's response function. Similar to Definition 1, a Strong Stackelberg Equilibrium in the security game context is defined below.

Definition 2. A pair of strategies $\langle X, g \rangle$ forms a Strong Stackelberg Equilibrium (SSE) of a security game if they satisfy the following:
1. The leader (defender) plays a best-response:
$u(X, g(X)) \ge u(X', g(X'))$, for all $X'$.
2. The follower (attacker) plays a best-response:
$v(X, g(X)) \ge v(X, g'(X))$, for all $X, g'$.
3. The follower breaks ties optimally for the leader:
$u(X, g(X)) \ge u(X, t_i)$, for all $X$ and for all $t_i$ that is a best-response to $X$.

As we will see in Chapter 5, the defender in security games may not always have the power of commitment (acting as the leader) in certain situations. If the players move simultaneously, the standard solution concept is Nash equilibrium.

Definition 3. A pair of strategies $\langle X, a \rangle$ forms a Nash Equilibrium (NE) of a security game if they satisfy the following:
1. The defender plays a best-response:
$u(X, a) \ge u(X', a) \ \forall X'$.
2. The attacker plays a best-response:
$v(X, a) \ge v(X, a') \ \forall a'$.

For convenience, I denote the set of mixed strategies for the defender that are played in some Nash Equilibrium by $\Omega_{NE}$, and the corresponding set for Strong Stackelberg Equilibrium by $\Omega_{SSE}$.
Chapter 3: Stackelberg Games with Distributional Uncertainty
As discussed earlier, the Bayesian Stackelberg game model is useful in modeling distributional uncertainty in Stackelberg games. A key challenge of applying Bayesian Stackelberg game models to real world problems is to scale up the number of follower types. Scalability of discrete follower types is essential in domains such as road network security [Dickerson et al., 2010] and public transit networks [Yin et al., 2012a], where each follower type could represent an adversary attempting to follow a certain path. Scaling up the number of types is also necessary for the sampling-based algorithms [Kiekintveld et al., 2011] to obtain high quality solutions under continuous uncertainty. Unfortunately, such scale-up remains difficult, as finding the equilibrium of a Bayesian Stackelberg game is NP-hard [Conitzer and Sandholm, 2006]. Indeed, despite the recent algorithmic advancement including Multiple-LPs [Conitzer and Sandholm, 2006], Dobss [Paruchuri et al., 2008], and HBGS [Jain et al., 2011b], none of these techniques can handle games with more than 50 types, even when the number of actions per player is as few as 5: inadequate both for scale-up in discrete follower types and for sampling-based approaches.

This chapter presents a novel algorithm for solving Bayesian Stackelberg games called Hunter, combining techniques in artificial intelligence such as best-first search and operations research such as Benders' decomposition. In Section 3.1, I will describe the algorithmic details of Hunter. In Section 3.2, I will show how Hunter can be used, together with the sample average approximation technique, to solve Stackelberg games with continuous uncertainty such as the defender's execution and the attacker's observation noise. Finally, Section 3.3 contains the experimental results of Hunter, demonstrating its superior scalability compared to existing algorithms.
3.1 Hunter: Discrete Uncertainty
In this section, I will present Hunter (Handling UNcerTainty Efficiently using Relaxation) based on five key ideas: i) best-first search for efficient pruning of the search tree; ii) a novel linear program relaxation for computing upper bounds in that search tree; iii) solving the upper bound LP efficiently using Benders' decomposition; iv) inheritance of Benders' cuts from parent nodes to child nodes for speedup; v) efficient heuristic branching rules utilizing the solution returned by the upper bound LP.
3.1.1 Algorithm Overview
To find the optimal leader's mixed strategy, Hunter conducts a best-first search in the search tree that results from assigning follower types to pure strategies, such as the search tree in Figure 2.4. Simply stated, Hunter aims to search this space much more efficiently than HBGS [Jain et al., 2011b]. As discussed earlier in Section 2.5.3, efficiency gains are sought by obtaining tight upper bounds and lower bounds at internal nodes in the search tree (each of which corresponds to a partial assignment in which a subset of follower types are fixed). To that end, as illustrated in Figure 3.1, we use an upper bound LP within an internal search node. The LP returns an upper bound UB and a feasible solution $x^*$, which is then evaluated by computing the follower best response, providing a lower bound LB. The solution returned by the upper bound LP is also utilized in choosing a new type $\lambda^*$ to create branches. To avoid having this upper bound LP itself become a bottleneck, it is solved efficiently using Benders' decomposition, which will be explained below.

Figure 3.1: Steps of creating internal search nodes in Hunter (a node with constraints $Ax \le b$, $x \ge 0$ is passed to the upper bound LP, which is solved by Benders' decomposition with a master problem and subproblems Sub 1, Sub 2, ..., Sub $\Lambda$, and yields $x^*$, UB, LB, and $\lambda^*$).

To understand Hunter's behavior on a toy game instance, see Figure 3.2, which illustrates Hunter's search tree in solving the example game in Figure 2.3 (in Section 2.3). To start the best-first search, at the root node, no type is assigned any targets yet; we solve the upper bound LP with the initial strategy space $x_1 + x_2 \le 1$, $x_1, x_2 \ge 0$ (Node 1). As a result, we obtain an upper bound of 0.560 and the optimal solution $x_1^* = 2/3$, $x_2^* = 1/3$. We evaluate the solution returned and obtain a lower bound of 0.506. Using Hunter's heuristics, type 2 is then chosen to create branches by assigning it to Target1 and Target2 respectively. Next, we consider a child node (Node 2) in which type 2 is assigned to Target1, i.e., type 2's best response is to attack Target1. As a result, the follower's expected utility of choosing Target1 must be at least as high as that of choosing Target2, i.e., $-x_1 + x_2 \ge x_1 - x_2$, simplified as $x_1 - x_2 \le 0$. Thus, in Node 2, we impose an additional constraint $x_1 - x_2 \le 0$ on the strategy space and obtain an upper bound of 0.5. Since its upper bound is lower than the current lower bound 0.506, this branch can be pruned. Next we consider the other child node (Node 3) in which type 2 is assigned to Target2. This time we add the constraint $-x_1 + x_2 \le 0$ instead, and obtain an upper bound of 0.506. Since the upper bound coincides with the lower bound, we do not need to expand the node further. Moreover, since we have considered both Target1 and Target2 for type 2, we can terminate the algorithm and return 0.506 as the optimal solution value.
Now let us discuss Hunter's behavior line-by-line (see Algorithm 1). We initialize the best-first search by creating the root node of the search tree with no assignment of types to targets and with the computation of the node's upper bound (Lines 2 and 3). The initial lower bound is obtained by evaluating the solution returned by the upper bound LP (Line 4). We add the root node to a priority queue of open nodes which is internally sorted in decreasing order of their upper bounds (Line 5). Each node contains information on the partial assignment, the feasible region of $x$, the upper bound, and the Benders' cuts generated by the upper bound LP. At each iteration, we retrieve the node with the highest upper bound (Line 8), select a type $\lambda^*$ to assign pure strategies (Line 9), compute the upper bounds of the node's child nodes (Lines 12 and 14), update the lower bound using the new solutions (Line 15), and enqueue child nodes with upper bound higher than the current lower bound (Line 16). As shown later, Benders' cuts at a parent node can be inherited by its children, speeding up the computation (Line 12).

Figure 3.2: Example of internal nodes in Hunter's search tree.
Node 1: constraints $x_1 + x_2 \le 1$, $x_1, x_2 \ge 0$; $x_1^* = 2/3$, $x_2^* = 1/3$; UB = 0.560; LB = 0.506; $\lambda^*$ = Type 2.
Node 2 (Type 2 → Target1): constraints $x_1 + x_2 \le 1$, $x_1, x_2 \ge 0$, $x_1 - x_2 \le 0$; UB = 0.5; pruned (UB < best LB).
Node 3 (Type 2 → Target2): constraints $x_1 + x_2 \le 1$, $x_1, x_2 \ge 0$, $-x_1 + x_2 \le 0$; UB = 0.506; optimality proved (UB = best LB).

In the rest of the section, I will 1) present the upper bound LP, 2) show how to solve it using Benders' decomposition, 3) verify the correctness of passing down Benders' cuts from parent to child nodes, and 4) introduce the heuristic branching rule.
Algorithm 1: Hunter
 1  Initialization;
 2  [UB, x*, BendersCuts] := SolveUBLP(∅, Ax ⪯ b, −∞);
 3  root := ⟨UB, x*, Ax ⪯ b, x ⪰ 0, BendersCuts⟩;
 4  LB := Evaluate(x*);
 5  Enqueue(queue, root);
 6  Best-first Search;
 7  while not Empty(queue) do
 8      node := pop(queue);
 9      λ* := PickType(node);
10      for j := 1 to J do
11          NewConstraints := node.Constraints ∪ {D_j^{λ*} x + d_j^{λ*} ⪯ 0};
12          [NewUB, x', NewBendersCuts] := SolveUBLP(node.BendersCuts, NewConstraints, LB);
13          if NewUB > LB then
14              child := ⟨NewUB, x', NewConstraints, NewBendersCuts⟩;
15              LB := max{Evaluate(x'), LB};
16              Enqueue(queue, child);
17          end
18      end
19  end
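The control flow of Algorithm 1 can be sketched in Python as follows. This is only an illustrative skeleton, not the thesis implementation: the callbacks `solve_ub`, `evaluate`, and `pick_type` are hypothetical stand-ins for SolveUBLP, Evaluate, and PickType, Bender's cuts are omitted, and the early exit when the best open node cannot beat the lower bound is an added assumption. The stub callbacks simply replay the three nodes of Figure 3.2 (types and targets are 0-indexed, and the strategies attached to the two child nodes are placeholders).

```python
import heapq
from fractions import Fraction
from itertools import count


def hunter_search(num_types, num_strategies, solve_ub, evaluate, pick_type):
    """Best-first branch and bound over partial type-to-strategy assignments."""
    tie = count()                       # tie-breaker so the heap never compares dicts
    ub, x = solve_ub({})                # root node: no type assigned yet
    lb, best_x = evaluate(x), x
    queue = [(-ub, next(tie), {})]      # max-heap on the upper bound via negation
    while queue:
        neg_ub, _, assignment = heapq.heappop(queue)
        if -neg_ub <= lb:               # assumption: best open node cannot improve LB
            break
        if len(assignment) == num_types:
            continue                    # fully assigned node, nothing to branch on
        t = pick_type(assignment)
        for j in range(num_strategies):
            child = {**assignment, t: j}
            result = solve_ub(child)
            if result is None:          # infeasible child
                continue
            new_ub, new_x = result
            if new_ub > lb:
                value = evaluate(new_x)
                if value > lb:
                    lb, best_x = value, new_x
                heapq.heappush(queue, (-new_ub, next(tie), child))
    return lb, best_x


# Stub data taken from Figure 3.2 (rounded values as printed in the figure).
X_ROOT = (Fraction(2, 3), Fraction(1, 3))
UB_TABLE = {
    frozenset(): (Fraction("0.560"), X_ROOT),            # Node 1 (root)
    frozenset({(1, 0)}): (Fraction("0.5"), X_ROOT),      # Node 2: Type 2 -> Target1
    frozenset({(1, 1)}): (Fraction("0.506"), X_ROOT),    # Node 3: Type 2 -> Target2
}


def solve_ub(assignment):
    return UB_TABLE.get(frozenset(assignment.items()))


def evaluate(x):
    return Fraction("0.506")            # lower bound reported at the root in Figure 3.2


def pick_type(assignment):
    return 1                            # Figure 3.2 branches on Type 2


best_value, best_strategy = hunter_search(2, 2, solve_ub, evaluate, pick_type)
```

With these stubs, neither child has an upper bound strictly above the root's lower bound, so the queue empties after one expansion and the lower bound is returned as the optimal value, mirroring the walkthrough above.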
3.1.2 Upper Bound Linear Program
In this section, I will derive a tractable linear relaxation of Bayesian Stackelberg games to provide
an upper bound efficiently at each of Hunter’s internal nodes. For expository purposes, let us focus
on the root node of the search tree. Applying results from disjunctive programming [Balas, 1998], I
will first derive the convex hull for a single type. Then I will show that intersecting the convex hulls
of all types provides a tractable, polynomial-size relaxation of the entire Bayesian Stackelberg
game.
3.1.2.1 Convex Hull of a Single Type
Consider a Stackelberg game with a single follower type (U, V). The leader’s optimal strategy $x^*$
is the best among the optimal solutions of J LPs, each of which restricts the follower’s best response
to one pure strategy [Conitzer and Sandholm, 2006]. Hence we can represent the optimization
problem as the following disjunctive program (i.e., a disjunction of the “Multiple LPs” of [Conitzer and
Sandholm, 2006]):
\[
\begin{aligned}
\max_{x,\,u}\quad & u \\
\text{s.t.}\quad & Ax \preceq b,\quad x \succeq 0, \\
& \bigvee_{j=1}^{J}
\begin{pmatrix}
u \le \mu_j^T x + \mu_{j,0} \\
D_j x + d_j \preceq 0
\end{pmatrix}
\end{aligned}
\tag{3.1}
\]
where the $D_j$’s and $d_j$’s are given by
\[
D_j = \begin{pmatrix} \nu_1^T - \nu_j^T \\ \vdots \\ \nu_J^T - \nu_j^T \end{pmatrix},
\qquad
d_j = \begin{pmatrix} \nu_{1,0} - \nu_{j,0} \\ \vdots \\ \nu_{J,0} - \nu_{j,0} \end{pmatrix}.
\]
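To make the role of $D_j$ and $d_j$ concrete, the following minimal sketch builds them from the follower's payoff coefficients and checks whether a given leader strategy makes pure strategy j a best response. The function names and the follower payoffs `nu`, `nu0` are hypothetical, chosen only so that the first constraint set reads x1 − 2·x2 ≤ 0.

```python
def best_response_constraints(nu, nu0, j):
    """Return (D_j, d_j): D_j x + d_j <= 0 holds iff strategy j is a follower best response.

    nu[i]  : coefficient vector of the follower's utility for pure strategy i
    nu0[i] : constant term of that utility
    """
    D = [[a - b for a, b in zip(nu[i], nu[j])] for i in range(len(nu))]
    d = [nu0[i] - nu0[j] for i in range(len(nu))]
    return D, d


def is_best_response(D, d, x):
    """Check D x + d <= 0 row by row."""
    return all(sum(c * xi for c, xi in zip(row, x)) + di <= 0
               for row, di in zip(D, d))


# Hypothetical follower payoffs for a two-target game.
nu = [[0, 0], [1, -2]]
nu0 = [0, 0]

D1, d1 = best_response_constraints(nu, nu0, 0)   # rows: 0 <= 0 and x1 - 2*x2 <= 0
D2, d2 = best_response_constraints(nu, nu0, 1)   # rows: -x1 + 2*x2 <= 0 and 0 <= 0
```

Row j of $D_j$ is always zero, since it compares strategy j with itself; it is kept here only to match the matrix form above.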
The feasible set of (3.1), denoted by $H$, is a union of J convex sets, each corresponding to
a disjunctive term. Applying the results in [Balas, 1998], the closure of the convex hull of $H$,
$\operatorname{clconv} H$, is¹
\[
\operatorname{clconv} H = \left\{
(x, u) \in \mathbb{R}^n \times \mathbb{R}
\;\middle|\;
\begin{aligned}
& x = \sum_{j=1}^{J} \chi_j, && \chi_j \succeq 0,\ \forall j \\
& u = \sum_{j=1}^{J} \psi_j, && \psi_j \ge 0,\ \forall j \\
& \sum_{j=1}^{J} \theta_j = 1, && \theta_j \ge 0,\ \forall j \\
& \begin{pmatrix} A & -b & 0 \\ D_j & d_j & 0 \\ -\mu_j^T & -\mu_{j,0} & 1 \end{pmatrix}
  \begin{pmatrix} \chi_j \\ \theta_j \\ \psi_j \end{pmatrix} \preceq 0, && \forall j
\end{aligned}
\right\}.
\]
The intuition here is that the continuous variables $\theta_j$, with $\sum_{j=1}^{J} \theta_j = 1$, are used to create all possible
convex combinations of points in $H$. Furthermore, when $\theta_j \neq 0$, $\langle \chi_j / \theta_j,\ \psi_j / \theta_j \rangle$ represents a point in the
convex set defined by the j-th disjunctive term in the original problem (3.1). Finally, since all the
extreme points of $\operatorname{clconv} H$ belong to $H$, the disjunctive program (3.1) is equivalent to the linear
program:
\[
\max_{x,\,u} \{\, u \mid (x, u) \in \operatorname{clconv} H \,\}.
\]
This result is important, as it shows that one can use a single linear program (as opposed to
multiple LPs [Conitzer and Sandholm, 2006] or a mixed integer LP [Paruchuri et al., 2008]) to
solve a Stackelberg game with a single type.
¹ To use the results in [Balas, 1998], we assume $u \ge 0$ for convenience. In the case where $u$ can be negative, we can
replace $u$ by $u^+ - u^-$, with $u^+, u^- \ge 0$.
3.1.2.2 Tractable Relaxation
Building on the convex hulls of individual types, I will now derive the relaxation of a Bayesian
Stackelberg game with $\Lambda$ types. Let us rewrite this game as the following disjunctive
program:
\[
\begin{aligned}
\max_{x,\,u^1,\ldots,u^\Lambda}\quad & \sum_{\lambda=1}^{\Lambda} p^\lambda u^\lambda \\
\text{s.t.}\quad & Ax \preceq b,\quad x \succeq 0, \\
& \bigwedge_{\lambda=1}^{\Lambda} \left[\, \bigvee_{j=1}^{J}
\begin{pmatrix}
u^\lambda \le (\mu_j^\lambda)^T x + \mu_{j,0}^\lambda \\
D_j^\lambda x + d_j^\lambda \preceq 0
\end{pmatrix} \right]
\end{aligned}
\tag{3.2}
\]
Returning to the toy example, the corresponding disjunctive program of the game in Figure 2.3
can be written as
\[
\begin{aligned}
\max_{x_1,\,x_2,\,u^1,\,u^2}\quad & 0.84\,u^1 + 0.16\,u^2 \\
\text{s.t.}\quad & x_1 + x_2 \le 1,\quad x_1, x_2 \ge 0, \\
& \begin{pmatrix} u^1 \le x_1 \\ x_1 - 2x_2 \le 0 \end{pmatrix}
\vee
\begin{pmatrix} u^1 \le -x_1 + x_2 \\ -x_1 + 2x_2 \le 0 \end{pmatrix}, \\
& \begin{pmatrix} u^2 \le x_1 \\ x_1 - x_2 \le 0 \end{pmatrix}
\vee
\begin{pmatrix} u^2 \le -x_1 + x_2 \\ -x_1 + x_2 \le 0 \end{pmatrix}.
\end{aligned}
\tag{3.3}
\]
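Denote the set of feasible points $(x, u^1, \ldots, u^\Lambda)$ of (3.2) by $H^*$. Unfortunately, to use the
results of [Balas, 1998] here and create $\operatorname{clconv} H^*$, we need to expand (3.2) to a disjunctive normal
form, resulting in a linear program with an exponential number ($O(NJ^\Lambda)$) of variables. Instead, I
now give a much more tractable, polynomial-size relaxation of (3.2). Denote the feasible set of
each type $\lambda$, $(x, u^\lambda)$, by $H^\lambda$, and define
\[
\widehat{H}^* := \{\, (x, u^1, \ldots, u^\Lambda) \mid (x, u^\lambda) \in \operatorname{clconv} H^\lambda,\ \forall\, 1 \le \lambda \le \Lambda \,\}.
\]
Then the following program is a relaxation of (3.2):
\[
\max_{x,\,u^1,\ldots,u^\Lambda} \left\{ \sum_{\lambda=1}^{\Lambda} p^\lambda u^\lambda \;\middle|\; (x, u^\lambda) \in \operatorname{clconv} H^\lambda,\ \forall\, 1 \le \lambda \le \Lambda \right\}
\tag{3.4}
\]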
Indeed, for any feasible point $(x, u^1, \ldots, u^\Lambda)$ in $H^*$, $(x, u^\lambda)$ must belong to $H^\lambda$, implying that
$(x, u^\lambda) \in \operatorname{clconv} H^\lambda$. Hence $H^* \subseteq \widehat{H}^*$, implying that optimizing over $\widehat{H}^*$ provides an upper bound
on the optimum over $H^*$. On the other hand, $\widehat{H}^*$ will in general have points not belonging to $H^*$, and thus the
relaxation can lead to an overestimation.
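For example, consider the disjunctive program in (3.3). The point $(x_1 = \tfrac{2}{3},\ x_2 = \tfrac{1}{3},\ u^1 = \tfrac{2}{3},\ u^2 = 0)$ does
not belong to $H^*$: for type 2, the first disjunctive term is violated because $x_1 - x_2 = \tfrac{1}{3} > 0$, and the
second is violated because $-x_1 + x_2 \le 0$ but $u^2 > -x_1 + x_2 = -\tfrac{1}{3}$. However, the point belongs to
$\widehat{H}^*$ because: i) $(x_1 = \tfrac{2}{3},\ x_2 = \tfrac{1}{3},\ u^1 = \tfrac{2}{3})$ belongs to $H^1 \subseteq \operatorname{clconv} H^1$; ii) $(x_1 = \tfrac{2}{3},\ x_2 = \tfrac{1}{3},\ u^2 = 0)$
belongs to $\operatorname{clconv} H^2$, as it is the convex combination of two points in $H^2$, namely $(x_1 = \tfrac{1}{2},\ x_2 = \tfrac{1}{2},\ u^2 = \tfrac{1}{2})$
and $(x_1 = 1,\ x_2 = 0,\ u^2 = -1)$:
\[
\left(\tfrac{2}{3},\ \tfrac{1}{3},\ 0\right) = \tfrac{2}{3}\left(\tfrac{1}{2},\ \tfrac{1}{2},\ \tfrac{1}{2}\right) + \tfrac{1}{3}\,(1,\ 0,\ -1).
\]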
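The membership claims in this example can be checked mechanically. The sketch below uses exact rational arithmetic and assumes the disjunctive terms of (3.3) with the signs as they follow from the worked numbers in this section (for type 1: u ≤ x1 with x1 − 2·x2 ≤ 0, or u ≤ −x1 + x2 with −x1 + 2·x2 ≤ 0; for type 2: u ≤ x1 with x1 − x2 ≤ 0, or u ≤ −x1 + x2 with −x1 + x2 ≤ 0). The helper names are illustrative only.

```python
from fractions import Fraction as F


def in_H1(x1, x2, u):
    """Type 1 of (3.3): at least one disjunctive term holds."""
    return (u <= x1 and x1 - 2 * x2 <= 0) or (u <= -x1 + x2 and -x1 + 2 * x2 <= 0)


def in_H2(x1, x2, u):
    """Type 2 of (3.3): at least one disjunctive term holds."""
    return (u <= x1 and x1 - x2 <= 0) or (u <= -x1 + x2 and -x1 + x2 <= 0)


x = (F(2, 3), F(1, 3))
u1, u2 = F(2, 3), F(0)

# The two points of H^2 used in the convex combination.
a = (F(1, 2), F(1, 2), F(1, 2))
b = (F(1), F(0), F(-1))
combo = tuple(F(2, 3) * ai + F(1, 3) * bi for ai, bi in zip(a, b))

type1_ok = in_H1(*x, u1)        # (x, u^1) lies in H^1
type2_ok = in_H2(*x, u2)        # (x, u^2) does not lie in H^2 itself
```

Here `combo` reproduces (2/3, 1/3, 0), so the point lies in the convex hull of H^2 without lying in H^2, which is exactly the gap between the relaxation and the original feasible set.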
Perhaps a better way to understand the Hunter relaxation is through the demonstrative
example shown in Figure 3.3. In Figure 3.3(a), the blue and orange rectangles
represent the solution spaces of follower types 1 and 2 respectively, i.e., the blue rectangles
represent $\{(x_1, x_2, u^1, u^2) \mid (x_1, x_2, u^1) \in H^1,\ u^2 \in \mathbb{R}\}$ and the red rectangles represent
$\{(x_1, x_2, u^1, u^2) \mid (x_1, x_2, u^2) \in H^2,\ u^1 \in \mathbb{R}\}$. For each type, the two rectangles represent the two
disjoint sets corresponding to attacking one of the two targets respectively. Then the intersection
of the rectangles of the two types, shown as the green rectangles, represents the feasible solution
space $H^*$. Built upon the four disjoint green rectangles, the convex hull of $H^*$ ($\operatorname{clconv} H^*$) is the
area within the purple lines. As shown in Figure 3.3(b), the Hunter relaxation $\widehat{H}^*$ is the intersection
of the convex hulls of the two types, i.e., the purple region within the solid purple lines. As can
be easily visualized in Figure 3.3(b), $\widehat{H}^*$ is indeed a relaxation compared to $\operatorname{clconv} H^*$, the area
within the dashed purple lines.
Figure 3.3: Visualization of the Hunter relaxation. (a) Convex hull $\operatorname{clconv} H^*$. (b) Relaxation $\widehat{H}^*$.
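The upper bound LP (3.4) has $O(NJ\Lambda)$ variables and constraints, and can be
written as the following two-stage problem by explicitly representing each $\operatorname{clconv} H^\lambda$:
\[
\begin{aligned}
\max_{x}\quad & \sum_{\lambda=1}^{\Lambda} p^\lambda u^\lambda(x) \\
\text{s.t.}\quad & Ax \preceq b,\quad x \succeq 0
\end{aligned}
\tag{3.5}
\]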
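where, for each type $1 \le \lambda \le \Lambda$, $u^\lambda(x)$ is defined to be the optimal value of
\[
\begin{aligned}
\max_{\chi_j^\lambda,\,\psi_j^\lambda,\,\theta_j^\lambda}\quad & \sum_{j=1}^{J} \psi_j^\lambda \\
\text{s.t.}\quad & \sum_{j=1}^{J} \chi_j^\lambda = x, \\
& \sum_{j=1}^{J} \theta_j^\lambda = 1, \\
& \chi_j^\lambda \succeq 0,\quad \psi_j^\lambda \ge 0,\quad \theta_j^\lambda \ge 0, && \forall\, 1 \le j \le J, \\
& \begin{pmatrix} A & -b & 0 \\ D_j^\lambda & d_j^\lambda & 0 \\ -(\mu_j^\lambda)^T & -\mu_{j,0}^\lambda & 1 \end{pmatrix}
  \begin{pmatrix} \chi_j^\lambda \\ \theta_j^\lambda \\ \psi_j^\lambda \end{pmatrix} \preceq 0, && \forall\, 1 \le j \le J.
\end{aligned}
\tag{3.6}
\]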
Although written in two stages, the above formulation is in fact a single linear program, as both
stages are maximization problems and combining the two stages will not produce any nonlinear
terms. I display formulations (3.5) and (3.6) in order to reveal the block structure for further
speedup as explained below.
Note that so far, we have only derived the relaxation for the root node of Hunter’s search
tree, without assigning any type to a pure strategy. This relaxation is also applied to other internal
nodes in Hunter’s search tree. For example, if type $\lambda$ is assigned to pure strategy j, the leader’s
strategy space is further restricted by adding the constraints $D_j^\lambda x + d_j^\lambda \preceq 0$ to the original
constraints $Ax \preceq b,\ x \succeq 0$. That is, we obtain the same form of constraints as in the
root node: $A' x \preceq b',\ x \succeq 0$, where
\[
A' = \begin{pmatrix} D_j^\lambda \\ A \end{pmatrix}
\quad\text{and}\quad
b' = \begin{pmatrix} -d_j^\lambda \\ b \end{pmatrix}.
\]
3.1.3 Bender’s Decomposition
Although much easier than solving a full Bayesian Stackelberg game, solving the upper bound
LP can still be computationally challenging. Here we invoke the block structure of (3.4) observed
above, which partitions it into (3.5) and (3.6), where (3.5) is a master problem and (3.6)
for $\lambda = 1, \ldots, \Lambda$ are the $\Lambda$ subproblems. This block structure allows us to solve the upper bound
LP efficiently using multi-cut Bender’s decomposition [Birge and Louveaux, 1988]. Generally
speaking, the computational difficulty of optimization problems increases significantly with the
number of variables and constraints. Instead of considering all variables and constraints of a large
problem simultaneously, Bender’s decomposition partitions the problem into multiple smaller
problems, which can then be solved in sequence. For completeness, I will briefly describe the
technique here in the context of solving LP formulation (3.5)-(3.6). A detailed general explanation
of Bender’s decomposition can be found in Appendix A.
In Bender’s decomposition, the second-stage maximization problem (3.6) is replaced by its
dual minimization counterpart, with dual variables $\omega_j^\lambda$, $\pi^\lambda$, $\rho^\lambda$ for $\lambda = 1, \ldots, \Lambda$:
\[
\begin{aligned}
u^\lambda(x) = \min_{\omega_j^\lambda \succeq 0,\ \pi^\lambda,\ \rho^\lambda}\quad & (\pi^\lambda)^T x + \rho^\lambda \\
\text{s.t.}\quad &
\begin{pmatrix}
A^T & (D_j^\lambda)^T & -\mu_j^\lambda \\
-b^T & (d_j^\lambda)^T & -\mu_{j,0}^\lambda \\
0^T & 0^T & 1
\end{pmatrix} \omega_j^\lambda
+ \begin{pmatrix} \pi^\lambda \\ \rho^\lambda \\ -1 \end{pmatrix} \succeq 0,
\quad \forall\, 1 \le j \le J.
\end{aligned}
\tag{3.7}
\]
Since the feasible region of (3.7) is independent of $x$, its optimal solution is reached at one
of a finite number of extreme points (of the dual variables). Since $u^\lambda(x)$ is the minimum of
$(\pi^\lambda)^T x + \rho^\lambda$ over all possible dual points, we know the following inequality must be true in the
master problem:
\[
u^\lambda \le (\pi_k^\lambda)^T x + \rho_k^\lambda, \quad k = 1, \ldots, K,
\tag{3.8}
\]
where $(\pi_k^\lambda, \rho_k^\lambda)$, $k = 1, \ldots, K$, are all the dual extreme points. Constraints of type (3.8) for the
master problem are called optimality cuts (infeasibility cuts, another type of constraint, turn out
not to be relevant in this context).
Since there are typically exponentially many extreme points for the dual formulation (3.7),
generating all constraints of type (3.8) is not practical. Instead, Bender’s decomposition starts by
solving the master problem (3.5) with a subset of these constraints to find a candidate optimal
solution $(x^*, u^{1,*}, \ldots, u^{\Lambda,*})$. It then solves the dual subproblems (3.7) to calculate $u^\lambda(x^*)$. If all
the subproblems have $u^\lambda(x^*) = u^{\lambda,*}$, the algorithm stops. Otherwise, for those types with $u^\lambda(x^*) < u^{\lambda,*}$, the
corresponding constraints of type (3.8) are added to the master program for the next iteration.
As a numerical example, let us consider the example given in Figure 3.2 again. As mentioned
earlier, the example problem can be written as disjunctive program (3.3). To illustrate the process
of applying Bender’s decomposition to solve an upper bound program, let us focus on the root
search node where no type has been assigned to any target yet. At the beginning, the master
program (3.5) has no Bender’s cuts:
\[
\begin{aligned}
\max_{x_1,\,x_2}\quad & 0.84\,u^1 + 0.16\,u^2 \\
\text{s.t.}\quad & x_1 + x_2 = 1,\quad x_1, x_2 \ge 0.
\end{aligned}
\]
Although the above master program is unbounded, an arbitrary feasible strategy $x$ can be returned
as the optimal solution. Without loss of generality, let $x_1^{(1)} = 1$ and $x_2^{(1)} = 0$ be the optimal $x$
returned from the master program in the first iteration. Similarly, I will denote by $x_1^{(k)}$, $x_2^{(k)}$, $u^{1,(k)}$,
and $u^{2,(k)}$ the optimal solution obtained from the master program in the k-th iteration. Since the
first master program is unbounded, $u^{1,(1)}$ and $u^{2,(1)}$ can be considered as $+\infty$.
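Given a solution $x_1^{(k)}$ and $x_2^{(k)}$, two subproblems corresponding to the two follower types need
to be solved. For better readability, I will give the subproblems in their primal form only, although
the dual solution is used to construct the Bender’s cuts. Note that when solving the primal problem
using primal-dual methods, the values of the dual variables can be obtained as well. The subproblem
for follower type 1 is:
\[
\begin{aligned}
\max_{\chi^1_1,\,\chi^1_2,\,\theta^1_1,\,\theta^1_2,\,\psi^1_1,\,\psi^1_2}\quad & \psi^1_1 + \psi^1_2 \\
\text{s.t.}\quad
& \chi^1_{1,1} + \chi^1_{1,2} = x_1^{(k)}, && 0 \le \chi^1_{1,1} \le \theta^1_1, && 0 \le \chi^1_{1,2} \le \theta^1_2, \\
& \chi^1_{2,1} + \chi^1_{2,2} = x_2^{(k)}, && 0 \le \chi^1_{2,1} \le \theta^1_1, && 0 \le \chi^1_{2,2} \le \theta^1_2, \\
& \theta^1_1 + \theta^1_2 = 1, && \theta^1_1, \theta^1_2 \ge 0, \\
& \chi^1_{1,1} - 2\chi^1_{2,1} \le 0, && \psi^1_1 \le \chi^1_{1,1}, \\
& -\chi^1_{1,2} + 2\chi^1_{2,2} \le 0, && \psi^1_2 \le -\chi^1_{1,2} + \chi^1_{2,2}.
\end{aligned}
\]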
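Similarly, the subproblem for follower type 2 is:
\[
\begin{aligned}
\max_{\chi^2_1,\,\chi^2_2,\,\theta^2_1,\,\theta^2_2,\,\psi^2_1,\,\psi^2_2}\quad & \psi^2_1 + \psi^2_2 \\
\text{s.t.}\quad
& \chi^2_{1,1} + \chi^2_{1,2} = x_1^{(k)}, && 0 \le \chi^2_{1,1} \le \theta^2_1, && 0 \le \chi^2_{1,2} \le \theta^2_2, \\
& \chi^2_{2,1} + \chi^2_{2,2} = x_2^{(k)}, && 0 \le \chi^2_{2,1} \le \theta^2_1, && 0 \le \chi^2_{2,2} \le \theta^2_2, \\
& \theta^2_1 + \theta^2_2 = 1, && \theta^2_1, \theta^2_2 \ge 0, \\
& \chi^2_{1,1} - \chi^2_{2,1} \le 0, && \psi^2_1 \le \chi^2_{1,1}, \\
& -\chi^2_{1,2} + \chi^2_{2,2} \le 0, && \psi^2_2 \le -\chi^2_{1,2} + \chi^2_{2,2}.
\end{aligned}
\]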
Recall that in the first iteration, $x_1^{(1)} = 1$ and $x_2^{(1)} = 0$. The two subproblems are both feasible and
bounded. Solving the two subproblems gives the dual solutions $\pi^1_1 = -1$, $\pi^1_2 = 4$, $\rho^1 = 0$, and
$\pi^2_1 = -1$, $\pi^2_2 = 2$, $\rho^2 = 0$. Here recall that $\pi^\lambda_j$ is the dual variable corresponding to the constraint
$\sum_{j'=1}^{J} \chi^\lambda_{j,j'} = x_j$ and $\rho^\lambda$ is the dual variable corresponding to the constraint $\sum_{j=1}^{J} \theta^\lambda_j = 1$. The optimal
values of both subproblems are $-1.0$, lower than $u^{1,(1)}$ and $u^{2,(1)}$. Therefore each subproblem
can generate one Bender’s cut to be added to the master problem. The two cuts are $u^1 \le -x_1 + 4x_2$
and $u^2 \le -x_1 + 2x_2$.
After adding the cuts, the master program becomes the following in the second iteration:
\[
\begin{aligned}
\max_{x_1,\,x_2}\quad & 0.84\,u^1 + 0.16\,u^2 \\
\text{s.t.}\quad & x_1 + x_2 = 1,\quad x_1, x_2 \ge 0, \\
& u^1 \le -x_1 + 4x_2, \\
& u^2 \le -x_1 + 2x_2.
\end{aligned}
\]
The new optimal solution of the master program is $x_1^{(2)} = 0$, $x_2^{(2)} = 1$, $u^{1,(2)} = 4$, and $u^{2,(2)} = 2$.
Solving the subproblems again generates two Bender’s cuts: $u^1 \le x_1$ and $u^2 \le x_1$. Hence the
master program becomes:
\[
\begin{aligned}
\max_{x_1,\,x_2}\quad & 0.84\,u^1 + 0.16\,u^2 \\
\text{s.t.}\quad & x_1 + x_2 = 1,\quad x_1, x_2 \ge 0, \\
& u^1 \le -x_1 + 4x_2, \\
& u^2 \le -x_1 + 2x_2, \\
& u^1 \le x_1, \\
& u^2 \le x_1.
\end{aligned}
\]
The optimal solution of the third iteration is $x_1^{(3)} = 2/3$, $x_2^{(3)} = 1/3$, $u^{1,(3)} = 2/3$, and $u^{2,(3)} = 0$.
This time, the optimal values of the two subproblems are 2/3 and 0 respectively. Since these
optimal values are the same as $u^{1,(3)}$ and $u^{2,(3)}$ respectively, no further Bender’s cut needs to be
added. Therefore the process of Bender’s decomposition terminates with an upper bound value
of $0.84 \times 2/3 + 0.16 \times 0 = 0.56$ at the root node of the search tree.
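The three iterations above can be replayed with a small script. The sketch below is a simplification tailored to this toy example rather than a general Benders implementation: it substitutes x2 = 1 − x1 so that each cut becomes a line in t = x1, replaces the subproblem LP by a direct computation of the concave envelope of each type's extreme points (assuming the disjunctive terms of (3.3) as used in this example), and solves the master by enumerating breakpoints. All function names are illustrative.

```python
from fractions import Fraction as F
from itertools import combinations

P = [F("0.84"), F("0.16")]              # type probabilities
# Extreme points (t, u) of each type's two disjunctive terms, with t = x1, x2 = 1 - t.
POINTS = [
    [(F(0), F(0)), (F(2, 3), F(2, 3)), (F(2, 3), F(-1, 3)), (F(1), F(-1))],  # type 1
    [(F(0), F(0)), (F(1, 2), F(1, 2)), (F(1, 2), F(0)), (F(1), F(-1))],      # type 2
]


def subproblem(points, t):
    """Return (u(t), intercept, slope) of the tightest valid cut u <= intercept + slope*t."""
    best = None
    for (t1, u1), (t2, u2) in combinations(points, 2):
        if t1 == t2:
            continue
        slope = (u2 - u1) / (t2 - t1)
        intercept = u1 - slope * t1
        if all(intercept + slope * tr >= ur for tr, ur in points):  # dominates all points
            value = intercept + slope * t
            if best is None or value < best[0]:
                best = (value, intercept, slope)
    return best


def master(cuts):
    """Maximize sum_l P[l]*u_l over t in [0, 1] subject to the current cuts."""
    if any(not c for c in cuts):         # unbounded: return an arbitrary feasible point
        return F(1), [None] * len(cuts)  # None stands for +infinity
    candidates = {F(0), F(1)}
    for c in cuts:                       # breakpoints: intersections of cuts of one type
        for (a1, b1), (a2, b2) in combinations(c, 2):
            if b1 != b2 and 0 <= (a2 - a1) / (b1 - b2) <= 1:
                candidates.add((a2 - a1) / (b1 - b2))

    def u_at(t):
        return [min(a + b * t for a, b in c) for c in cuts]

    best_t = max(sorted(candidates),
                 key=lambda t: sum(p * u for p, u in zip(P, u_at(t))))
    return best_t, u_at(best_t)


def benders():
    cuts = [[] for _ in P]
    iterations = 0
    while True:
        iterations += 1
        t, u = master(cuts)
        added = False
        for lam, points in enumerate(POINTS):
            value, a, b = subproblem(points, t)
            if u[lam] is None or value < u[lam]:
                cuts[lam].append((a, b))  # multi-cut: one cut per violated type
                added = True
        if not added:
            return t, sum(p * ul for p, ul in zip(P, u)), iterations


x1_opt, upper_bound, num_iterations = benders()
```

In the t-parametrization the generated cuts are 4 − 5t and t for type 1, and 2 − 3t and t for type 2, which correspond to the cuts −x1 + 4·x2, x1, −x1 + 2·x2, and x1 in the text. The run stops in the third iteration at x1 = 2/3 with an upper bound of 14/25 = 0.56.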
3.1.4 Reusing Bender’s Cuts
It is possible to further speed up the upper bound LP computation at internal nodes of Hunter’s
search tree by not creating all of the Bender’s cuts from scratch; instead, the Bender’s cuts from
the parent node can be reused in its children. Suppose $u^\lambda \le (\pi^\lambda)^T x + \rho^\lambda$ is a Bender’s cut in the
parent node. This means $u^\lambda$ cannot be greater than $(\pi^\lambda)^T x + \rho^\lambda$ for any $x$ in the feasible region of
the parent node. Intuitively, because a child node’s feasible region is always more restricted than
its parent’s, it can be concluded that $u^\lambda$ cannot be greater than $(\pi^\lambda)^T x + \rho^\lambda$ for any $x$ in the child
node’s feasible region. Hence, $u^\lambda \le (\pi^\lambda)^T x + \rho^\lambda$ must also be a valid cut for the child node. The
following proposition provides a formal proof.
Proposition 1. The Bender’s cuts generated for a parent node are valid cuts for its child nodes.
Proof. Let the feasible region of a parent node be $Ax \preceq b,\ x \succeq 0$, and the feasible region of a
child node be $A' x \preceq b',\ x \succeq 0$, where
$A' = \begin{pmatrix} \tilde{A} \\ A \end{pmatrix}$ and $b' = \begin{pmatrix} \tilde{b} \\ b \end{pmatrix}$.
Assume $u^\lambda \le (\pi^\lambda)^T x + \rho^\lambda$ is a
cut of the parent node, implying there exist $\omega_j^\lambda \succeq 0$, for all $j = 1, \ldots, J$, such that
\[
\begin{pmatrix}
A^T & (D_j^\lambda)^T & -\mu_j^\lambda \\
-b^T & (d_j^\lambda)^T & -\mu_{j,0}^\lambda \\
0^T & 0^T & 1
\end{pmatrix} \omega_j^\lambda
+ \begin{pmatrix} \pi^\lambda \\ \rho^\lambda \\ -1 \end{pmatrix} \succeq 0,
\quad \forall\, 1 \le j \le J.
\]
Then, together with $\pi^\lambda$ and $\rho^\lambda$, the vectors $\begin{pmatrix} 0 \\ \omega_j^\lambda \end{pmatrix}$, $j = 1, \ldots, J$, form a feasible point of the dual problem (3.7) for the child node
because
\[
\begin{pmatrix}
\tilde{A}^T & A^T & (D_j^\lambda)^T & -\mu_j^\lambda \\
-\tilde{b}^T & -b^T & (d_j^\lambda)^T & -\mu_{j,0}^\lambda \\
0^T & 0^T & 0^T & 1
\end{pmatrix}
\begin{pmatrix} 0 \\ \omega_j^\lambda \end{pmatrix}
+ \begin{pmatrix} \pi^\lambda \\ \rho^\lambda \\ -1 \end{pmatrix} \succeq 0,
\quad \forall\, 1 \le j \le J.
\]
Since (3.7) is a minimization problem, the objective value of any feasible dual point bounds $u^\lambda(x)$ from above.
The above result therefore implies $u^\lambda \le (\pi^\lambda)^T x + \rho^\lambda$ is a valid cut for the child node.
3.1.5 Heuristic Branching Rules
Given an internal node in the search tree of Hunter, one must decide on the type to branch on
next, i.e., the type for which J child nodes will be created at the next lower level of the tree.
The simplest way of selecting such a type is to randomly choose one type that has not been selected
before. However, as I will show in Section 3.3, the choice of branching type has a significant effect on
efficiency, and therefore it is important to choose such a type intelligently. While multiple heuristics
can be developed, I will limit the focus to the following one within the scope of this thesis.
Throughout the branch-and-bound search process, after a new search node is evaluated, the
global lower bound never decreases and the maximum upper bound never increases. The algorithm terminates
with the optimal solution when the lower bound meets the upper bound. Hence intuitively, one
should select a type whereby the upper bound at the resulting child nodes will decrease the most
significantly. While the best type can be found by a one-step lookahead, such a lookahead requires
solving many upper bound linear programs and incurs significant extra runtime. It is therefore
desirable to choose one type heuristically without further lookahead.
To this end, Hunter chooses the type whose $\theta^\lambda$ returned by (3.6) violates the integrality
constraint the most. By branching on this type, the integrality constraint of its $\theta^\lambda$ must be
satisfied. This in turn will reduce the upper bound as the problem becomes more constrained.
Recall that $\theta^\lambda$ is used to generate the convex combinations. If all $\theta^\lambda$ returned by (3.6) are integer
vectors, the solution of the upper bound LP (3.5) and (3.6) is a feasible point of the original
problem (3.2), implying the relaxed LP already returns the optimal solution. More specifically,
as inspired by [Gilpin and Sandholm, 2011], Hunter chooses the type $\lambda^*$ whose corresponding $\theta^{\lambda^*}$
has the maximum entropy, i.e., $\lambda^* = \arg\max_{\lambda} \left( -\sum_{j=1}^{J} \theta_j^\lambda \log \theta_j^\lambda \right)$.
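The maximum-entropy branching rule can be sketched in a few lines. The function names and the dictionary layout are illustrative assumptions, and the θ values in the usage lines are one possible LP solution at the root of the toy example, not output from the thesis code: type 1 sits on a single disjunctive term, while type 2 is a 2/3 : 1/3 mixture of its two terms.

```python
import math


def entropy(theta):
    """Shannon entropy of a convex-combination vector (0 * log 0 is treated as 0)."""
    return -sum(t * math.log(t) for t in theta if t > 0)


def pick_type(thetas, assigned=()):
    """Return the unassigned type whose theta vector has maximum entropy.

    thetas   : dict mapping a type identifier to its list of theta_j values
    assigned : types that already have a pure strategy assigned at this node
    """
    candidates = {lam: th for lam, th in thetas.items() if lam not in assigned}
    return max(candidates, key=lambda lam: entropy(candidates[lam]))


root_thetas = {1: [1.0, 0.0], 2: [2 / 3, 1 / 3]}
branch_type = pick_type(root_thetas)
```

An integral θ vector has zero entropy, so a type whose relaxation is already tight is never preferred over a type whose θ is fractional. For the values above the rule selects type 2, the type branched on in Figure 3.2.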
3.2 Extension to Continuous Uncertainty
This section extends Hunter to handle continuous uncertainty via the sample average approximation
technique [Ahmed et al., 2002]. I first introduce the uncertain Stackelberg game model with
continuously distributed uncertainty in the leader’s execution, the follower’s observation, and both
players’ utilities. Then I show that the uncertain Stackelberg game model can be written as a two-stage
mixed-integer stochastic program, to which existing convergence results of the sample average
approximation technique apply. Finally, I show that the sampled problems are equivalent to Bayesian
Stackelberg games, and consequently can also be solved by Hunter.
3.2.1 Uncertain Stackelberg Game Model
Let us consider the following types of uncertainty in Stackelberg games with known distributions.
First, similar to [Kiekintveld et al., 2011], I assume there is uncertainty in both the leader’s and the
follower’s utilities U and V. Second, the leader’s execution and the follower’s observation can
also be noisy. More specifically, I assume the executed strategy and the observed strategy are linear
perturbations of the intended strategy, i.e., when the leader commits to $x$, the actual executed
strategy is $y = F^T x + f$ and the strategy observed by the follower is $z = G^T x + g$, where $(F, f)$
and $(G, g)$ are uncertain. Intuitively, $f$ and $g$ represent the execution and observation
noise that is independent of $x$, while $F$ and $G$ are $N \times N$ matrices representing execution and
observation noise that is linearly dependent on $x$. For example, we can represent an execution
noise that is independent of $x$ and follows a Gaussian distribution with zero mean using $F = I_N$ and
$f \sim \mathcal{N}(0, \Sigma)$, where $I_N$ is the $N \times N$ identity matrix. U, V, F, f, G, and g are random variables
that follow some known continuous (joint) distributions. Note that G and g can be dependent
on F and f to capture the correlation between the defender’s executed strategy and the attacker’s
observed strategy. We use a vector $\xi = (U, V, F, f, G, g)$ to represent a realization of the above
inputs, and use the notation $\xi(\varpi)$ to represent the corresponding random variable.
I now show that the uncertain Stackelberg game can be written as a two-stage mixed-integer
stochastic program. Let $Q(x, \xi)$ be the leader’s utility for a strategy $x$ and a realization
$\xi$, assuming the follower chooses the best response. The first stage maximizes the
expectation of the leader’s utility with respect to the joint probability distribution of $\xi(\varpi)$, i.e.,
$\max_x \{\, \mathbb{E}[Q(x, \xi(\varpi))] \mid Ax \preceq b,\ x \succeq 0 \,\}$. The second stage computes $Q(x, \xi)$:²
\[
Q(x, \xi) = \mu_{j^*}^T (F^T x + f) + \mu_{j^*,0},
\quad\text{where}\quad
j^* = \arg\max_{j = 1, \ldots, J}\ \nu_j^T (G^T x + g) + \nu_{j,0}.
\tag{3.9}
\]
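As a small illustration of the second stage (3.9), the sketch below evaluates Q(x, ξ) for one realization: the follower best-responds to the observed strategy $G^T x + g$, while the leader is paid according to the executed strategy $F^T x + f$. The payoff numbers are hypothetical, and breaking follower ties by lowest index is an assumption of this sketch.

```python
def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def perturb(M, v, x):
    """Return M^T x + v, with the square matrix M given as a list of rows."""
    n = len(x)
    return [sum(M[k][i] * x[k] for k in range(n)) + v[i] for i in range(n)]


def second_stage(x, mu, mu0, nu, nu0, F, f, G, g):
    """Return (Q(x, xi), j*) for one realization xi = (U, V, F, f, G, g)."""
    y = perturb(F, f, x)                  # executed strategy
    z = perturb(G, g, x)                  # strategy observed by the follower
    j_star = max(range(len(nu)), key=lambda j: dot(nu[j], z) + nu0[j])
    return dot(mu[j_star], y) + mu0[j_star], j_star


# Hypothetical two-target game.
I2 = [[1.0, 0.0], [0.0, 1.0]]
mu, mu0 = [[1.0, 0.0], [-1.0, 1.0]], [0.0, 0.0]    # leader payoff coefficients
nu, nu0 = [[0.0, 0.0], [1.0, -2.0]], [0.0, 0.0]    # follower payoff coefficients
x = [0.5, 0.5]

exact = second_stage(x, mu, mu0, nu, nu0, I2, [0.0, 0.0], I2, [0.0, 0.0])
noisy = second_stage(x, mu, mu0, nu, nu0, I2, [0.0, 0.0], I2, [0.5, -0.5])
```

In the noise-free call the follower attacks the first target and the leader receives 0.5; with the observation shifted by g = (0.5, −0.5) the follower switches to the second target and the leader's utility drops to 0, even though the executed strategy is unchanged.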
3.2.2 Sample Average Approximation
Sample average approximation is a popular solution technique for stochastic programs with
continuously distributed uncertainty [Ahmed et al., 2002]. It can be applied to solving uncertain
Stackelberg games as follows. First, a sample $\xi^1, \ldots, \xi^\Lambda$ of $\Lambda$ realizations of the random vector
$\xi(\varpi)$ is generated. The expected value function $\mathbb{E}[Q(x, \xi(\varpi))]$ can then be approximated by the
sample average function $\frac{1}{\Lambda} \sum_{\lambda=1}^{\Lambda} Q(x, \xi^\lambda)$. The sampled problem is therefore given by
\[
\max_{x} \left\{ \sum_{\lambda=1}^{\Lambda} \frac{1}{\Lambda}\, Q(x, \xi^\lambda) \;\middle|\; Ax \preceq b,\ x \succeq 0 \right\}.
\tag{3.10}
\]
² Problem (3.9) can be formulated as a mixed-integer linear program similar to the Dobss [Paruchuri et al., 2008]
formulation shown in Section 2.5.2.
The sampled problem provides an increasingly tight statistical upper bound of the true problem as
the number of samples increases [Mak et al., 1999]; the number of samples required to solve the
true problem to a given accuracy grows linearly in the dimension of $x$ [Ahmed et al., 2002].
More specifically, Ahmed and Shapiro [2002] showed that if the objective function in terms of
$x$ is Lipschitz continuous, the sample size $K$ required to solve the true problem with
probability $1 - \alpha$ and accuracy $\epsilon > 0$ by solving the sample average approximation problem (3.10)
with accuracy $\delta < \epsilon$ grows linearly in the dimension of the first-stage problem ($C$ is a constant
dependent on the feasible space of $x$ and the objective function):
\[
K \ge \frac{12\sigma^2}{(\epsilon - \delta)^2} \left( |x| \log \frac{2C}{\epsilon - \delta} - \log \alpha \right).
\]
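The bound can be turned into a small calculator to see how the required sample size scales. The sketch assumes the bound has the form K ≥ 12σ²/(ε − δ)² · (|x| log(2C/(ε − δ)) − log α), with σ² a variance-type constant of the problem; the numeric inputs in the usage lines are arbitrary illustrative values, not taken from the experiments.

```python
import math


def saa_sample_size(sigma2, eps, delta, dim_x, C, alpha):
    """Smallest integer K satisfying the sample-size bound stated above."""
    if not (0 <= delta < eps and 0 < alpha < 1):
        raise ValueError("need 0 <= delta < eps and 0 < alpha < 1")
    gap = eps - delta
    bound = 12 * sigma2 / gap ** 2 * (dim_x * math.log(2 * C / gap) - math.log(alpha))
    return math.ceil(bound)


# Arbitrary illustrative inputs: only the dimension of x changes between calls.
k5 = saa_sample_size(1.0, 0.5, 0.1, 5, 1.0, 0.05)
k10 = saa_sample_size(1.0, 0.5, 0.1, 10, 1.0, 0.05)
k15 = saa_sample_size(1.0, 0.5, 0.1, 15, 1.0, 0.05)
```

The increments `k10 - k5` and `k15 - k10` agree up to rounding, which reflects the linear growth in the dimension of $x$; the dependence on the accuracy gap ε − δ is much steeper, since it enters quadratically in the denominator.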
In the sampled problem, each sample $\xi$ corresponds to a tuple (U, V, F, f, G, g). The following
proposition shows that the sampled execution and observation noise can be handled by simply
perturbing the utility matrices, i.e., $\xi$ is equivalent to some $\hat{\xi}$ where $\hat{F} = \hat{G} = I_N$ and $\hat{f} = \hat{g} = 0$.
Proposition 2. For any leader’s strategy $x$ and follower’s strategy $j$, both players get the same
expected utilities in the two noise realizations $(U, V, F, f, G, g)$ and $(\hat{U}, \hat{V}, I_N, 0, I_N, 0)$, where
\[
\hat{U} = \begin{pmatrix} 1 & f^T \\ 0 & F \end{pmatrix} U,
\qquad
\hat{V} = \begin{pmatrix} 1 & g^T \\ 0 & G \end{pmatrix} V.
\]
Proof. We can calculate both players’ expected utility vectors for both noise realizations to
establish the equivalence:
\[
\hat{U}^T \begin{pmatrix} 1 \\ x \end{pmatrix}
= U^T \begin{pmatrix} 1 & 0^T \\ f & F^T \end{pmatrix} \begin{pmatrix} 1 \\ x \end{pmatrix}
= U^T \begin{pmatrix} 1 \\ F^T x + f \end{pmatrix},
\]
\[
\hat{V}^T \begin{pmatrix} 1 \\ x \end{pmatrix}
= V^T \begin{pmatrix} 1 & 0^T \\ g & G^T \end{pmatrix} \begin{pmatrix} 1 \\ x \end{pmatrix}
= V^T \begin{pmatrix} 1 \\ G^T x + g \end{pmatrix}.
\]
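The identity in the proof is easy to verify numerically. The sketch below assumes the layout implied by the product $U^T (1; x)$: U is an (N+1) × J matrix whose first row holds the constant terms and whose remaining rows hold the coefficient vectors. The helper names and the numbers are illustrative only.

```python
from fractions import Fraction as Fr


def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    return [list(r) for r in zip(*A)]


def fold_noise(U, F, f):
    """Return U_hat = [[1, f^T], [0, F]] U, i.e. the payoffs with the noise folded in."""
    n = len(f)
    T = [[1] + list(f)] + [[0] + list(F[i]) for i in range(n)]
    return matmul(T, U)


def utilities(U, x):
    """Expected utility of every follower pure strategy: U^T (1; x)."""
    column = [[1]] + [[xi] for xi in x]
    return [row[0] for row in matmul(transpose(U), column)]


# Illustrative data with N = 2 targets and J = 2 follower strategies.
U = [[0, 1], [2, -1], [-3, 4]]
F = [[1, 2], [0, 1]]
f = [1, -1]
x = [Fr(1, 4), Fr(3, 4)]

# Executed strategy y = F^T x + f.
y = [sum(F[k][i] * x[k] for k in range(2)) + f[i] for i in range(2)]

lhs = utilities(fold_noise(U, F, f), x)   # perturbed payoffs, noise-free strategy
rhs = utilities(U, y)                     # original payoffs, noisy strategy
```

Both sides evaluate to (7/4, 3/4). The same computation with (G, g) and V covers the follower, so a sampled realization can be stored as a pair of perturbed payoff matrices.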
A direct implication of Proposition 2 is that the sampled problem (3.10) and (3.9) is equivalent
to a Bayesian Stackelberg game of $\Lambda$ equally weighted types, with utility matrices $(\hat{U}^\lambda, \hat{V}^\lambda)$, $\lambda = 1, \ldots, \Lambda$.
Hence, via sample average approximation, Hunter can be used to solve Stackelberg
games with continuous payoff, execution, and observation uncertainty.
3.2.3 A Unified Approach
Both discrete and continuous uncertainty can be handled simultaneously using Hunter by applying
sample average approximation in Bayesian Stackelberg games with discrete follower types.
The idea is to replace each discrete follower type by a set of samples of the continuous
distribution, converting the original Bayesian Stackelberg game to a larger one that can be solved by
Hunter.
3.3 Experimental Results
Since no existing algorithm can handle both discrete and continuous uncertainty in Stackelberg
games, I provide two sets of experiments in this section, considering (i) only discrete
uncertainty and (ii) both types of uncertainty. The utility matrices were randomly generated from
a uniform distribution between −10 and 10. All experimental results were obtained on a standard
2.8GHz machine with 2GB main memory, and were averaged over 30 trials.
The main focus of the experiments in this section is to show the scalability of Hunter in
comparison with existing algorithms. As described earlier, an important motivation for scaling
up the number of types is applying sample average approximation to handle continuous
uncertainty. Therefore it is also interesting to see how good the solutions returned by the
Hunter-based sample average approximation approach are in the presence of continuous uncertainty. I
will provide such experimental results in the next chapter, after introducing Recon, a
robust optimization alternative aiming at providing risk-averse strategies for the defender.
3.3.1 Handling Discrete Follower Types
For discrete uncertainty, I compared the runtime of Hunter with Dobss [Paruchuri et al., 2008]
and HBGS [Jain et al., 2011b] (specifically, HBGS-F, the most efficient variant), the two fastest
known algorithms for general Bayesian Stackelberg games. I compared the performance of these
algorithms with a varying number of types and a varying number of pure strategies per player. The
tests used a cutoff time of one hour for all three algorithms.
Figure 3.4(a) shows the performance of the three algorithms when the number of types
increases. The games tested in this set have 5 pure strategies for each player. The x-axis shows the
number of types, while the y-axis shows the runtime in seconds. As can be seen in Figure 3.4(a),
Hunter provides a significant speedup, of orders of magnitude, over both HBGS and Dobss³ (the
line depicting Hunter is almost touching the x-axis in Figure 3.4(a)). For example, Hunter can
solve a Bayesian Stackelberg game with 50 types in 17.7 seconds on average, whereas neither
HBGS nor Dobss can solve an instance in an hour. Figure 3.4(b) shows the performance of the
three algorithms when the number of pure strategies for each player increases. The games tested
in this set have 10 types. The x-axis shows the number of pure strategies for each player, while
the y-axis shows the runtime in seconds. Hunter again provides a significant speedup over both
HBGS and Dobss. For example, Hunter on average can solve a game with 13 pure strategies in
108.3 seconds, but HBGS and Dobss take more than 30 minutes.
Let us now turn to analyzing the contributions of Hunter’s key components to its performance.
First, we consider the runtime of Hunter with two search heuristics, best-first (BFS) and
depth-first (DFS), when the number of types is further increased. I set the number of pure strategies for each
player to 5, and increased the number of types from 10 to 200. In Table 3.1, I summarize the
average runtime and average number of nodes explored in the search process. As we can see,
DFS is faster than BFS when the number of types is small, e.g., 10 types. However, BFS always
explores significantly fewer nodes than DFS and is more efficient when the number of
types is large. For games with 200 types, the average runtime of BFS-based Hunter is about 20 minutes,
highlighting its scalability to a large number of types. Such scalability is achieved by efficient
pruning: for a game with 200 types, Hunter explores on average $5.3 \times 10^3$ nodes with BFS and
$1.1 \times 10^4$ nodes with DFS, compared to a total of $5^{200} \approx 6.2 \times 10^{139}$ possible leaf nodes.
³ The runtime results of HBGS and Dobss are inconsistent with the results in [Jain et al., 2011b] because I used
CPLEX 12 for solving mixed-integer linear programs instead of GLPK, which was used in [Jain et al., 2011b].
[Figure 3.4 panels: (a) Scaling up types: runtime (in seconds) vs. number of types for Hunter, HBGS, and Dobss. (b) Scaling up pure strategies: runtime (in seconds) vs. number of pure strategies for Hunter, HBGS, and Dobss. (c) Effectiveness of heuristics: runtime (in seconds) vs. number of types for Variant-I, Variant-II, and Variant-III (Hunter). (d) Finding approximate solutions: runtime (in seconds) vs. error bound for 50, 100, and 150 types.]
Figure 3.4: Experimental analysis of Hunter and runtime comparison against HBGS and Dobss.
#Types 10 50 100 150 200
BFS Runtime (s) 5.7 17.7 178.4 405.1 1143.5
BFS #Nodes Explored 21 316 1596 2628 5328
DFS Runtime (s) 4.5 29.7 32.1 766.0 2323.5
DFS #Nodes Explored 33 617 3094 5468 11049
Table 3.1: Scalability of Hunter to a large number of types
Second, I tested the effectiveness of the two heuristics: inheritance of Bender’s cuts from
parent node to child nodes and the branching rule utilizing the solution returned by the upper
bound LP. I fixed the number of pure strategies for each agent to 5 and increased the number of
types from 10 to 50. In Figure 3.4(c), I show the runtime results of three variants of Hunter: i)
Variant-I does not inherit Bender’s cuts and chooses a random type to create branches; ii)
Variant-II does not inherit Bender’s cuts and uses the heuristic branching rule; iii) Variant-III (Hunter)
inherits Bender’s cuts and uses the heuristic branching rule. The x-axis represents the number of
types while the y-axis represents the runtime in seconds. As we can see, each individual heuristic
helps speed up the algorithm significantly, showing their usefulness. For example, it took 14.0
seconds to solve an instance of 50 types when both heuristics were enabled (Variant-III) compared
to 51.5 seconds when neither of them was enabled (Variant-I).
Finally, let us consider the performance of Hunter in finding quality-bounded approximate
solutions. To this end, Hunter is allowed to terminate once the difference between the upper
bound and the lower bound decreases to $\epsilon$, a given error bound. The solution returned is therefore
an approximate solution provably within $\epsilon$ of the optimal solution. In this set of experiments, we
test 30 games with 5 pure strategies for each player and 50, 100, and 150 types, with the error
bound $\epsilon$ varying from 0 to 10. As shown in Figure 3.4(d), Hunter can effectively trade off solution quality
for further speedup, indicating the effectiveness of its upper bound and lower bound heuristics.
For example, for games with 100 types, Hunter returns within 30 seconds a suboptimal solution
at most 5 away from the optimal solution (the average optimal solution quality is 60.2). Compared
to finding the global optimal solution in 178 seconds, Hunter is able to achieve a six-fold speedup
by allowing at most 5 quality loss.
3.3.2 Handling Both Types of Uncertainty
In the other set of experiments, I consider Stackelberg games with both discrete and continuous uncertainty. Since no previous algorithm can handle both, I will only show the runtime results of Hunter. I tested on security games with five targets and one resource, and with multiple discrete follower types whose utilities are randomly generated. For each type, a certain number of samples from a continuous uniform distribution was drawn. Table 3.2 summarizes the runtime results of Hunter for 3, 4, 5, and 6 follower types, and 10 and 20 samples per type. As we can see, Hunter can efficiently handle both types of uncertainty simultaneously. For example, Hunter spends less than 4 minutes on average to solve a problem with 5 follower types and 20 samples per type.
#Discrete Types      3       4       5       6
10 Samples         4.9    12.8    29.3    54.8
20 Samples        32.4    74.6   232.8   556.5
Table 3.2: Runtime results (in seconds) of Hunter for handling both discrete and continuous uncertainty.
Chapter 4: Robust Solutions for Security Games
While the Bayesian Stackelberg game model is a useful tool for modeling various types of uncertainty in security domains, the requirement of full distributional information about the uncertainty limits its applicability. The lack of a precise uncertainty distribution is a particularly important challenge in security domains where historical data is scarce. This chapter focuses on security systems like ARMOR [Pita et al., 2008] and considers two types of uncertainty: the defender's execution error and the attacker's observation noise. Instead of modeling the execution and observation uncertainty probabilistically as in Section 3.2, I provide a robust optimization framework, called Recon (Risk-averse Execution Considering Observational Noise), to find risk-averse strategies for the defender. Recon assumes that nature chooses noise (of a known boundary) to maximally reduce the defender's utility, and Recon maximizes against this worst case.
Section 4.1 describes the formal Recon model and the notation specific to this chapter in addition to the standard notation of security games introduced in Section 2.6. Section 4.2 provides a mixed-integer linear program (MILP) for Recon that computes the optimal risk-averse strategy and two novel heuristics that speed up the computation of Recon MILP by orders of magnitude. Finally, Section 4.3 contains the experimental results that demonstrate the superiority of Recon in uncertain domains where existing algorithms perform poorly.
4.1 Formal Model
This chapter restricts its investigation to ARMOR games, which are security games with schedules of size 1 and homogeneous resources as defined earlier in Section 2.6. For ARMOR games, a strategy profile can be restricted to the form $\langle x, t_i \rangle$ where $x = (x_1, \ldots, x_N)$ is a vector of probabilities of defender coverage over all targets and $t_i$ is the attacker's choice of which target to attack. The sum of all coverage probabilities is not more than the number of available resources $\gamma$, i.e., $\sum_{i=1}^{N} x_i \le \gamma$. For example, a mixed strategy for the defender can be 0.25 coverage on $t_1$ and 0.75 coverage on $t_2$. I assume $y_i$, the defender's actual coverage on $t_i$, can vary from the intended coverage $x_i$ by the amount $\alpha_i$, that is, $|y_i - x_i| \le \alpha_i$. Thus, if $\alpha_1 = 0.1$, it would mean that $0.15 \le y_1 \le 0.35$. Additionally, I assume that the attacker would not necessarily observe the actual implemented mixed strategy of the defender; instead the attacker's perceived coverage for $t_i$, denoted by $z_i$, can vary by $\beta_i$ from the implemented coverage $y_i$. Therefore, $|z_i - y_i| \le \beta_i$. Thus, in the earlier example, if $y_1$ was 0.3 and $\beta_1$ was set to 0.05, then $0.25 \le z_1 \le 0.35$. Table 4.1 summarizes notation used in this chapter.
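The two worked intervals above can be checked with a few lines of Python. This is a minimal sketch rather than part of the original model description; the helper names `clamp` and `coverage_interval` are hypothetical, and the clamping to [0, 1] reflects the bounds on coverage probabilities.

```python
def clamp(value, low=0.0, high=1.0):
    """Restrict a coverage probability to the interval [low, high]."""
    return max(low, min(high, value))


def coverage_interval(center, max_error):
    """Return the closed interval of coverages within max_error of center."""
    return clamp(center - max_error), clamp(center + max_error)


# Intended coverage x_1 = 0.25 with maximum execution error alpha_1 = 0.1.
y_low, y_high = coverage_interval(0.25, 0.1)
# Executed coverage y_1 = 0.3 with maximum observation error beta_1 = 0.05.
z_low, z_high = coverage_interval(0.3, 0.05)
print(y_low, y_high, z_low, z_high)
```

Up to floating-point rounding, the printed bounds match the intervals quoted in the text for $y_1$ and $z_1$.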
To provide the rationale behind the uncertainty model in the context of a real-world scenario, let us consider the ARMOR application at LAX. ARMOR might generate a schedule for two canines to patrol Terminals 1, 2, 3, 4 with probabilities of 0.2, 0.8, 0.5, 0.5 respectively. However, a last-minute cargo inspection may require a canine unit to be called away from, say, Terminal 2 in its particular patrol, or an extra canine unit may become available by chance and get sent to Terminal 3. Additionally, an attacker may fail to observe a canine patrol on a terminal, or he may mistake an officer walking across as engaged in a patrol. Since each target is patrolled and observed independently, we can assume that both execution and observation noise are independent per target.
Variable          Definition
$T$               $T = \{t_1, \ldots, t_N\}$ is a set of $N$ targets
$\mu_i^u$         Defender's payoff if target $t_i$ is uncovered
$\mu_i^c$         Defender's payoff if target $t_i$ is covered
$\nu_i^u$         Attacker's payoff if target $t_i$ is uncovered
$\nu_i^c$         Attacker's payoff if target $t_i$ is covered
$\gamma$          Number of defender resources
$x_i$             Defender's intended coverage of target $t_i$
$y_i$             Defender's actual coverage of target $t_i$
$z_i$             Attacker's observed coverage of target $t_i$
$\Delta\mu_i$     $\Delta\mu_i = \mu_i^c - \mu_i^u$
$\Delta\nu_i$     $\Delta\nu_i = \nu_i^u - \nu_i^c$
$D_i(x_i)$        Defender's expected utility for target $t_i$: $D_i(x_i) = \mu_i^u + \Delta\mu_i x_i$
$A_i(x_i)$        Attacker's expected utility for target $t_i$: $A_i(x_i) = \nu_i^u - \Delta\nu_i x_i$
$\alpha_i$        Maximum execution error for target $t_i$
$\beta_i$         Maximum observation error for target $t_i$
Table 4.1: Notation for Recon
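Target   Defender covered ($\mu_i^c$)   Defender uncovered ($\mu_i^u$)   Attacker covered ($\nu_i^c$)   Attacker uncovered ($\nu_i^u$)
$t_1$               10                              0                             -1                              1
$t_2$                0                            -10                             -1                              1
Figure 4.1: Example ARMOR game with two targets.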
To see why SSE can be vulnerable to execution and observation noise, consider the example in Figure 4.1 with two targets, $t_1$ and $t_2$, and one defender resource. The SSE strategy for the defender would be protecting $t_1$ and $t_2$ with 0.5 probability each, making them indifferent for the attacker. The attacker breaks ties in the defender's favor and chooses $t_1$ to attack, giving the defender an expected utility of 5. This SSE strategy is not robust to any noise: by deducting an infinitesimal amount of coverage probability from $t_2$, the attacker's best response changes to $t_2$, reducing the defender's expected utility to $-5$. In this case, it is better for the security agency to use a risk-averse strategy, which provides the defender the maximum worst-case expected utility. For example, assuming no execution error and 0.1 observational uncertainty ($\alpha = 0$ and $\beta = 0.1$), the optimal risk-averse defender strategy is to protect $t_1$ with $0.4 - \epsilon$ probability and $t_2$ with $0.6 + \epsilon$ probability so that even in the worst case, the attacker would choose $t_1$, giving the defender an expected utility of 4. Finding the optimal risk-averse strategy for general games remains difficult, as it is essentially a bilevel programming problem [Bard, 2006].
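The fragility of the SSE strategy in this example can be reproduced numerically. The sketch below is only an illustration under stated assumptions: the payoffs are those of Figure 4.1 with the attacker's covered payoff taken as $-1$ and the defender's uncovered payoff on $t_2$ taken as $-10$, and the function names and the perturbation size `delta` are hypothetical.

```python
# Payoffs of the two-target game in Figure 4.1 (index 0 is t_1, index 1 is t_2).
DEF_COVERED = [10.0, 0.0]
DEF_UNCOVERED = [0.0, -10.0]
ATT_COVERED = [-1.0, -1.0]
ATT_UNCOVERED = [1.0, 1.0]


def defender_utility(i, coverage):
    """Defender's expected utility D_i for the given coverage of target i."""
    return DEF_UNCOVERED[i] + (DEF_COVERED[i] - DEF_UNCOVERED[i]) * coverage


def attacker_utility(i, coverage):
    """Attacker's expected utility A_i for the given coverage of target i."""
    return ATT_UNCOVERED[i] - (ATT_UNCOVERED[i] - ATT_COVERED[i]) * coverage


def attacked_target(observed, executed, tol=1e-12):
    """Attacker's best response to observed coverage; ties favor the defender."""
    values = [attacker_utility(i, z) for i, z in enumerate(observed)]
    best = max(values)
    tied = [i for i, v in enumerate(values) if best - v <= tol]
    return max(tied, key=lambda i: defender_utility(i, executed[i]))


sse = [0.5, 0.5]
# Without noise the attacker is indifferent and attacks t_1.
target = attacked_target(sse, sse)
utility_without_noise = defender_utility(target, sse[target])

# Remove a tiny amount of coverage from t_2 (delta is a hypothetical value).
delta = 1e-6
noisy = [0.5, 0.5 - delta]
target_noisy = attacked_target(noisy, noisy)
utility_with_noise = defender_utility(target_noisy, noisy[target_noisy])
print(utility_without_noise, utility_with_noise)
```

The first value is the SSE utility of 5, and the second is approximately $-5$, since the attacker switches to $t_2$ as soon as its coverage drops.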
The objective is to find the optimal risk-averse strategy $x$, maximizing the worst-case defender utility, $u^*(x)$ (Constraints (4.1) and (4.2)). Given a fixed maximum execution and observation noise, $\alpha$ and $\beta$ respectively, $u^*(x)$ is computed by the minimization problem from Constraint (4.3) to (4.6).
$$\max_{x} \; u^*(x) \tag{4.1}$$
$$\text{s.t.} \quad \sum_{i=1}^{N} x_i \le \gamma, \quad 0 \le x_i \le 1 \tag{4.2}$$
$$u^*(x) = \min_{y, z, t_j} D_j(y_j) \tag{4.3}$$
$$\text{s.t.} \quad t_j \in \arg\max_{t_i \in T} A_i(z_i) \tag{4.4}$$
$$-\alpha_i \le y_i - x_i \le \alpha_i, \quad 0 \le y_i \le 1 \tag{4.5}$$
$$-\beta_i \le z_i - y_i \le \beta_i, \quad 0 \le z_i \le 1 \tag{4.6}$$
The overall problem is a bilevel programming problem. For a fixed defender strategy $x$, the second-level problem from Constraint (4.3) to (4.6) computes the worst-case defender's executed coverage $y$, the attacker's observed coverage $z$, and the target attacked $t_j$. $(y, z, t_j)$ is chosen such that the defender's expected utility $D_j(y_j)$ (see Table 4.1) is minimized, given that the attacker maximizes his believed utility$^1$ $A_j(z_j)$ (Constraint (4.4)). This robust optimization is similar in spirit to Aghassi and Bertsimas [2006b], although that is in the context of simultaneous-move games.
This also highlights the need to separately model both execution and observation noise. Indeed, a problem with uncertainty defined as $(\alpha, \beta)$ is different from a problem with $(\alpha' = 0, \beta' = \alpha + \beta)$ (or vice versa), since the defender utility is different in the two problems. Other key properties of this approach are as follows. The solution of the above problem is an SSE if $\alpha = \beta = 0$. Furthermore, a Maximin strategy is obtained when $\beta = 1$ with $\alpha = 0$, since $z$ can be arbitrary. Finally, $\alpha = 1$ implies that the execution of the defender is independent of $x$ and thus, any feasible $x$ is optimal.
1 The attacker's believed utility is computed using the strategy observed by the attacker, and it may not be achieved, since $z$ can be different from $y$, which can be different from $x$.
4.2 Approach
I will present a mixed-integer linear programming (MILP) formulation for Recon to compute the risk-averse defender strategy in the presence of execution and observation noise. It encodes the necessary and sufficient conditions of the second-level problem (Constraint (4.4)) as linear constraints. The intuition behind these constraints is to identify $S(x)$, the best-response action set for the attacker given a strategy $x$, and then break ties against the defender. Additionally, Recon represents the variables $y$ and $z$ in terms of the variable $x$, which reduces the bilevel optimization problem to a single-level optimization problem. I will first define the term inducible target and then the associated necessary and sufficient conditions of the second-level problem.
Definition 4. A target $t_j$ is said to be weakly inducible by a mixed strategy $x$ if there exists a strategy $z$ with $0 \le z_i \le 1$ and $|z_i - x_i| \le \alpha_i + \beta_i$ for all $t_i \in T$, such that $t_j$ is the best response to $z$ for the attacker, i.e., $t_j = \arg\max_{t_i \in T} A_i(z_i)$.
Additionally, I define the upper and lower bounds on the utility the attacker may believe to obtain for the strategy profile $\langle x, t_i \rangle$. These bounds will then be used to determine the best response set $S(x)$ of the attacker.
Definition 5. For the strategy profile $\langle x, t_i \rangle$, the upper bound of the attacker's believed utility is given by $A_i^+(x_i)$, which would be reached when the attacker's observed coverage of $t_i$ reaches the lower bound $\max\{0, x_i - \alpha_i - \beta_i\}$.
$$A_i^+(x_i) = \min\{\nu_i^u, \; A_i(x_i - \alpha_i - \beta_i)\} \tag{4.7}$$
Similarly, denote the lower bound of the attacker's believed utility of attacking target $t_i$ by $A_i^-(x_i)$, which is reached when the attacker's observed coverage probability on $t_i$ reaches the upper bound $\min\{1, x_i + \alpha_i + \beta_i\}$.
$$A_i^-(x_i) = \max\{\nu_i^c, \; A_i(x_i + \alpha_i + \beta_i)\} \tag{4.8}$$
Lemma 1. A target $t_j$ is weakly inducible by $x$ if and only if $A_j^+(x_j) \ge \max_{t_i \in T} A_i^-(x_i)$.
Proof. If $t_j$ is weakly inducible, consider $z$ such that $t_j = \arg\max_{t_i \in T} A_i(z_i)$. Since $z_j \ge \max\{0, x_j - \alpha_j - \beta_j\}$ and for all $t_i \ne t_j$, $z_i \le \min\{1, x_i + \alpha_i + \beta_i\}$, we have:
$$A_j^+(x_j) = \min\{\nu_j^u, \; A_j(x_j - \alpha_j - \beta_j)\} \ge A_j(z_j) \ge A_i(z_i) \ge \max\{\nu_i^c, \; A_i(x_i + \alpha_i + \beta_i)\} = A_i^-(x_i).$$
On the other hand, if $A_j^+(x_j) \ge A_i^-(x_i)$ for all $t_i \in T$, we can let $z_j = \max\{0, x_j - \alpha_j - \beta_j\}$ and $z_i = \min\{1, x_i + \alpha_i + \beta_i\}$ for all $t_i \ne t_j$, which satisfies $t_j = \arg\max_{t_i \in T} A_i(z_i)$. This implies $t_j$ is weakly inducible.
Let us also define $D_i^-(x_i)$, the lower bound on the defender's expected utility for the strategy profile $\langle x, t_i \rangle$. This lower bound is used to determine the defender's worst-case expected utility.
Definition 6. For the strategy profile $\langle x, t_i \rangle$, $D_i^-(x_i)$ is achieved when the defender's implemented coverage on $t_i$ reaches the lower bound $\max\{0, x_i - \alpha_i\}$, and is given by:
$$D_i^-(x_i) = \max\{\mu_i^u, \; D_i(x_i - \alpha_i)\} \tag{4.9}$$
Lemma 2. Let $S(x)$ be the set of all targets that are weakly inducible by $x$; then $u^*(x) = \min_{t_i \in S(x)} D_i^-(x_i)$.
Proof. A target not in $S(x)$ cannot be attacked, since it is not the best response of the attacker for any feasible $z$. Additionally, for any target $t_i$ in $S(x)$, the minimum utility of the defender is $D_i^-(x_i)$. Therefore, $u^*(x) \ge \min_{t_i \in S(x)} D_i^-(x_i)$.
Additionally, we prove $u^*(x) \le \min_{t_i \in S(x)} D_i^-(x_i)$ by showing there exist $(y, z, t_j)$ satisfying Constraints (4.4) to (4.6) with $D_j(y_j) = \min_{t_i \in S(x)} D_i^-(x_i)$. To this end, we choose $t_j = \arg\min_{t_i \in S(x)} D_i^-(x_i)$, $y_j = \max\{0, x_j - \alpha_j\}$, $z_j = \max\{0, x_j - \alpha_j - \beta_j\}$, and $y_i = \min\{1, x_i + \alpha_i\}$, $z_i = \min\{1, x_i + \alpha_i + \beta_i\}$ for all $t_i \ne t_j$. The choice of $y$ and $z$ here is to maximally reduce the actual and perceived coverage on $t_j$ and maximally increase the actual and perceived coverage on all other targets $t_i \ne t_j$. By construction, $y$ and $z$ satisfy Constraints (4.5) and (4.6). And since $t_j$ is weakly inducible, we have for all $t_i \ne t_j$, $A_j(z_j) = A_j^+(x_j) \ge A_i^-(x_i) = A_i(z_i)$, implying $t_j = \arg\max_{t_i \in T} A_i(z_i)$.
Lemmas 1 and 2 are the necessary and sufficient conditions for the second-level optimization problem, reducing the bilevel optimization problem into a single-level MILP.
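To make the reduction concrete, the following Python sketch evaluates the worst-case utility of a fixed strategy using the bounds $A_i^+$, $A_i^-$ and $D_i^-$ and the two lemmas, and then runs a coarse grid search on the two-target game of Figure 4.1. It is an illustration under stated assumptions, not the MILP used in this chapter: the payoff signs are those inferred from the example text, all function and variable names are hypothetical, and the grid search only stands in for the optimization over $x$.

```python
# Payoffs of the two-target game in Figure 4.1 (index 0 is t_1, index 1 is t_2).
DEF_COVERED, DEF_UNCOVERED = [10.0, 0.0], [0.0, -10.0]
ATT_COVERED, ATT_UNCOVERED = [-1.0, -1.0], [1.0, 1.0]


def D(i, c):
    """Defender's expected utility D_i(c)."""
    return DEF_UNCOVERED[i] + (DEF_COVERED[i] - DEF_UNCOVERED[i]) * c


def A(i, c):
    """Attacker's expected utility A_i(c)."""
    return ATT_UNCOVERED[i] - (ATT_UNCOVERED[i] - ATT_COVERED[i]) * c


def worst_case_utility(x, alpha, beta):
    """Worst-case defender utility of x via Lemma 1 and Lemma 2."""
    n = len(x)
    # Upper and lower bounds on the attacker's believed utility, (4.7) and (4.8).
    a_plus = [min(ATT_UNCOVERED[i], A(i, x[i] - alpha - beta)) for i in range(n)]
    a_minus = [max(ATT_COVERED[i], A(i, x[i] + alpha + beta)) for i in range(n)]
    # Lower bound on the defender's utility, (4.9).
    d_minus = [max(DEF_UNCOVERED[i], D(i, x[i] - alpha)) for i in range(n)]
    threshold = max(a_minus)
    inducible = [i for i in range(n) if a_plus[i] >= threshold]  # Lemma 1
    return min(d_minus[i] for i in inducible)                    # Lemma 2


u_sse = worst_case_utility([0.5, 0.5], alpha=0.0, beta=0.1)
u_robust = worst_case_utility([0.39, 0.61], alpha=0.0, beta=0.1)

# Coarse grid search over strategies that spend the whole single resource.
grid = [k / 1000 for k in range(1001)]
best_u, best_x1 = max(
    (worst_case_utility([x1, 1.0 - x1], 0.0, 0.1), x1) for x1 in grid
)
print(u_sse, u_robust, best_u, best_x1)
```

On this instance the SSE strategy has a worst-case utility of $-5$, the strategy $(0.39, 0.61)$ yields about 3.9, and the grid search settles just below or at a coverage of 0.4 on $t_1$ with a worst-case utility close to 4, which agrees with the risk-averse strategy discussed for this example.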
4.2.1 Recon MILP
Now we present the MILP formulation for Recon. It maximizes the defender utility, denoted as $u$. $v$ represents the highest lower bound on the believed utility of the attacker, given in Constraint (4.11). The binary variable $q_i$ is 1 if the target $t_i$ is weakly inducible; it is 0 otherwise. Constraint (4.12) says that $q_i = 1$ if $A_i^+(x_i) \ge v$ ($M$ is a large constant and $\epsilon$ is a small positive constant which together ensure that $q_i = 1$ when $A_i^+(x_i) = v$) and, together with Constraint (4.11), encodes Lemma 1. The constraint that $q_i = 0$ if $A_i^+(x_i) < v$ could be added to Recon; however, it is redundant since the defender wants to set $q_i = 0$ in order to maximize $u$. Constraint (4.13) says that the defender utility $u$ is no greater than $D_i^-(x_i)$ for all inducible targets, thereby implementing Lemma 2. Constraint (4.14) ensures that the allocated resources are no more than the number of available resources $\gamma$, maintaining feasibility.
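$$\max_{x, q, u, v} \; u \tag{4.10}$$
$$\text{s.t.} \quad v = \max_{t_i \in T} A_i^-(x_i) \tag{4.11}$$
$$A_i^+(x_i) \le v + q_i M - \epsilon \tag{4.12}$$
$$u \le D_i^-(x_i) + (1 - q_i) M \tag{4.13}$$
$$\sum_{i} x_i \le \gamma \tag{4.14}$$
$$x_i \in [0, 1] \tag{4.15}$$
$$q_i \in \{0, 1\} \tag{4.16}$$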
The max function in Constraint (4.11) can be formulated using $N$ binary variables, $(h_1, \ldots, h_N)$, in the following manner:
$$A_i^-(x_i) \le v \le A_i^-(x_i) + (1 - h_i) M \tag{4.17}$$
$$\sum_{i=1}^{N} h_i = 1, \quad h_i \in \{0, 1\} \tag{4.18}$$
Constraint (4.17) ensures that $v \ge A_i^-(x_i)$ for all $1 \le i \le N$ and $v = A_j^-(x_j)$ when $h_j = 1$, and Constraint (4.18) ensures that only one $h_j$ is set to 1.
The min operation in $A_i^+(x_i)$ is also implemented similarly. For example, Equation (4.7) can be encoded as:
$$\nu_i^u - (1 - l_i) M \le A_i^+(x_i) \le \nu_i^u$$
$$A_i(x_i - \alpha_i - \beta_i) - l_i M \le A_i^+(x_i) \le A_i(x_i - \alpha_i - \beta_i)$$
$$l_i \in \{0, 1\}$$
It is easy to see that $A_i^+(x_i) \le \min\{\nu_i^u, A_i(x_i - \alpha_i - \beta_i)\}$. Furthermore, when $l_i = 1$, the first constraint enforces $A_i^+(x_i) = \nu_i^u$, and when $l_i = 0$, the second constraint enforces $A_i^+(x_i) = A_i(x_i - \alpha_i - \beta_i)$. I will omit the details for expanding $A_i^-(x_i)$ and $D_i^-(x_i)$; they can be encoded in exactly the same fashion.
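The behavior of this big-M encoding can be checked without a solver. The sketch below, with a hypothetical helper `feasible_range` and a hypothetical constant `M`, computes the values that a variable $w$ may take under the two constraint pairs for a fixed binary $l$, where `a` plays the role of the attacker's uncovered payoff and `b` the role of $A_i$ evaluated at the reduced coverage.

```python
def feasible_range(a, b, l, big_m):
    """Values w allowed by the big-M encoding of w = min(a, b) for a fixed binary l.

    The encoding mirrors the constraints given for Equation (4.7):
        a - (1 - l) * M <= w <= a
        b - l * M       <= w <= b
    Returns (low, high), or None when no value of w is feasible.
    """
    low = max(a - (1 - l) * big_m, b - l * big_m)
    high = min(a, b)
    return (low, high) if low <= high else None


M = 1000.0  # hypothetical big-M value, larger than any payoff difference
# Case 1: the first argument is smaller, so only l = 1 is feasible.
case1 = {l: feasible_range(1.0, 3.0, l, M) for l in (0, 1)}
# Case 2: the second argument is smaller, so only l = 0 is feasible.
case2 = {l: feasible_range(2.0, -0.5, l, M) for l in (0, 1)}
print(case1)
print(case2)
```

In both cases exactly one value of the binary variable leaves a feasible value, and that value is pinned to the minimum of the two arguments, which is the property the encoding relies on.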
4.2.2 Speeding up
I described a MILP formulation of Recon to compute the risk-averse strategy for the defender. Solving this MILP is, however, computationally challenging as it involves a large number of integer variables. Using integer variables increases the complexity of the linear programming problem; indeed, solving integer programs is NP-hard. MILP solvers internally use branch-and-bound to evaluate integer assignments. Availability of good lower bounds implies that fewer combinations of integer assignments (branch-and-bound nodes) need to be evaluated. Such lower bounds can be supplied to Recon MILP by simply adding a constraint, e.g., $u \ge u_b$ where $u_b$ is a lower bound. This is indeed the intuition behind speeding up the execution of Recon MILP. I will provide two methods, aRecon and iRecon, to generate lower bounds.
4.2.2.1 aRecon:
aRecon solves a restricted version of Recon. This restricted version has a lower number of integer variables, and thus generates solutions faster. It replaces $A_i^+(x_i)$ by $A_i(x_i - \alpha_i - \beta_i)$ and $D_i^-(x_i)$ by $D_i(x_i - \alpha_i)$, thereby rewriting Constraints (4.12) and (4.13) as follows:
$$A_i(x_i - \alpha_i - \beta_i) \le v + q_i M - \epsilon \tag{4.19}$$
$$u \le D_i(x_i - \alpha_i) + (1 - q_i) M \tag{4.20}$$
aRecon is indeed more restricted: the LHS of Constraint (4.19) in aRecon is no less than the LHS of Constraint (4.12), and the RHS of Constraint (4.20) is no greater than the RHS of Constraint (4.13). Therefore, any solution generated by aRecon is feasible in Recon, and acts as a lower bound.
4.2.2.2 iRecon:
iRecon uses an iterative method to obtain monotonically increasing lower bounds $u^{(k)}$ of Recon. Using the insight that Constraint (4.19) is binding only when $q_i = 0$, and (4.20) when $q_i = 1$, iRecon rewrites Constraints (4.19) and (4.20) as follows:
$$x_i \ge \begin{cases} \tau_{a,i}(v) = \dfrac{\nu_i^u - v + \epsilon}{\Delta\nu_i} + \alpha_i + \beta_i & \text{if } q_i = 0 \\[2ex] \tau_{d,i}(u) = \dfrac{u - \mu_i^u}{\Delta\mu_i} + \alpha_i & \text{if } q_i = 1 \end{cases} \tag{4.21}$$
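As a check on Constraint (4.21), its two branches can be recovered by solving Constraints (4.19) and (4.20) for $x_i$. The short derivation below assumes $\Delta\nu_i > 0$ and $\Delta\mu_i > 0$ (covering a target lowers the attacker's payoff and raises the defender's), which the text above does not state explicitly, and uses $\mu$, $\nu$ for the defender's and attacker's payoffs and $\tau$ for the two thresholds; these symbol names should be read as assumptions.

```latex
\begin{align*}
q_i = 0:\quad
  & \nu_i^u - \Delta\nu_i\,(x_i - \alpha_i - \beta_i) \le v - \epsilon
  \;\Longleftrightarrow\;
  x_i \ge \frac{\nu_i^u - v + \epsilon}{\Delta\nu_i} + \alpha_i + \beta_i = \tau_{a,i}(v), \\
q_i = 1:\quad
  & u \le \mu_i^u + \Delta\mu_i\,(x_i - \alpha_i)
  \;\Longleftrightarrow\;
  x_i \ge \frac{u - \mu_i^u}{\Delta\mu_i} + \alpha_i = \tau_{d,i}(u).
\end{align*}
```

The first line uses $A_i(c) = \nu_i^u - \Delta\nu_i c$ with the big-M term switched off, and the second uses $D_i(c) = \mu_i^u + \Delta\mu_i c$ in the same way.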
Constraint (4.21) says that $q_i = 0$ implies $x_i \ge \tau_{a,i}(v)$ and $q_i = 1$ implies $x_i \ge \tau_{d,i}(u)$.$^2$ Constraint (4.21) is equivalent to:
$$x_i \ge \min\{\tau_{d,i}(u), \; \tau_{a,i}(v)\} = \tau_{d,i}(u) + \min\{0, \; \tau_{a,i}(v) - \tau_{d,i}(u)\} \tag{4.22}$$
The equivalence between Constraints (4.21) and (4.22) can be verified as follows: $(x, u, v)$ from any feasible solution $(x, q, u, v)$ of (4.21) is trivially feasible in (4.22). On the other hand, given a feasible solution $(x, u, v)$ to Constraint (4.22), we choose $q_i = 1$ if $x_i \ge \tau_{d,i}(u)$ and 0 otherwise, and thus obtain a feasible solution to Constraint (4.21). Hence, an equivalent problem of aRecon can be obtained by replacing Constraints (4.12) and (4.13) by Constraint (4.22). In the $k$-th iteration, iRecon substitutes $\tau_{d,i}(u) - \tau_{a,i}(v)$ by a constant, $\Delta\tau_i^{(k)}$, restricting Constraint (4.22). This value is updated in every iteration while maintaining a restriction of Constraint (4.22). Such a substitution reduces Constraint (4.22) to a linear constraint, implying that iRecon performs a polynomial-time computation in every iteration.$^3$
Observe that $\tau_{d,i}(u)$ is increasing in $u$ whereas $\tau_{a,i}(v)$ is decreasing in $v$ (refer to Constraint (4.21)), and hence $\tau_{d,i}(u) - \tau_{a,i}(v)$ is increasing in both $u$ and $v$. iRecon generates an increasing sequence $\{\Delta\tau_i^{(k)} = \tau_{d,i}(u^{(k)}) - \tau_{a,i}(v^{(k)})\}$ by finding increasing sequences of $u^{(k)}$ and $v^{(k)}$. As I will show later, substituting $\tau_{d,i}(u) - \tau_{a,i}(v)$ with $\{\Delta\tau_i^{(k)}\}$ in Constraint (4.22) guarantees correctness. Since a higher value of $\Delta\tau_i^{(k)}$ implies a lower value of $\min\{0, -\Delta\tau_i^{(k)}\}$, a weaker restriction is imposed by Constraint (4.22), leading to a better lower bound $u^{(k+1)}$.
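The effect of replacing the difference of the two thresholds by a constant can be seen on a small numeric example. The sketch below uses hypothetical threshold values and hypothetical function names; it only illustrates that any constant not exceeding the true difference gives a bound at least as strict as the exact one, and that larger constants loosen it.

```python
def exact_bound(tau_d, tau_a):
    """Right-hand side of Constraint (4.22): min{tau_d, tau_a}."""
    return tau_d + min(0.0, tau_a - tau_d)


def restricted_bound(tau_d, delta):
    """Right-hand side after replacing tau_d - tau_a by the constant delta."""
    return tau_d + min(0.0, -delta)


tau_d, tau_a = 0.7, 0.4              # hypothetical threshold values
true_gap = tau_d - tau_a             # the quantity iRecon replaces by a constant
deltas = [-1.0, 0.0, 0.2, true_gap]  # increasing under-estimates of the gap
bounds = [restricted_bound(tau_d, d) for d in deltas]
print(exact_bound(tau_d, tau_a), bounds)
```

Each entry of `bounds` is a lower limit on the coverage of a target; the limits fall as the constant rises and meet the exact limit when the constant equals the true difference, which mirrors the argument that the restriction weakens from one iteration to the next.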
2 This is not equivalent to the unconditional equation $x_i \ge \max\{\tau_{a,i}(v), \tau_{d,i}(u)\}$.
3 While the formulation has integer variables from Constraint (4.11), it can be considered as 2N LPs since there are only 2N distinct combinations of integer assignments.
Algorithm 2: Pseudo code of iRecon
    $k = 0$, $u^{(0)} = v^{(0)} = -\infty$;
    while $|v^{(k+1)} - v^{(k)}|$ and $|u^{(k+1)} - u^{(k)}|$ exceed the convergence tolerance do
        $v^{(k+1)}$ = Solve(ALP($u^{(k)}$, $v^{(k)}$));
        $u^{(k+1)}$ = Solve(DLP($u^{(k)}$, $v^{(k)}$));
        $k = k + 1$;
    end
Given $u^{(k)}$ and $v^{(k)}$, iRecon uses DLP to compute $u^{(k+1)}$, and ALP to compute $v^{(k+1)}$. The pseudo code for iRecon is given in Algorithm 2. DLP is the following maximization linear program, which returns the solution vector $(x, u, \hat{v})$, such that $u$ is the desired lower bound.
$$\max_{x, u, \hat{v}} \; u$$
$$\text{s.t.} \quad \text{Constraints (4.11), (4.14) and (4.15)}$$
$$x_i \ge \tau_{d,i}(u) + \min\{0, \; -\Delta\tau_i^{(k)}\} \tag{4.23}$$
$$u \ge u^{(k)}, \quad \hat{v} \ge v^{(k)} \tag{4.24}$$
Constraint (4.24) is added to DLP to ensure that we get a monotonically increasing solution in every iteration. Similarly, given $u^{(k)}$ and $v^{(k)}$, ALP is the following minimization problem. It minimizes $v$ to guarantee that Constraint (4.23) in DLP remains a restriction to Constraint (4.22) for the next iteration, ensuring DLP always provides a lower bound of Recon. More detail is given in Proposition 3, which proves the correctness of iRecon.
$$\min_{x, u, v} \; v$$
$$\text{s.t.} \quad \text{Constraints (4.11), (4.14) and (4.15)}$$
$$x_i \ge \tau_{a,i}(v) + \min\{\Delta\tau_i^{(k)}, \; 0\} \tag{4.25}$$
$$v \ge v^{(k)} \tag{4.26}$$
Proposition 3. Both DLP and ALP are feasible and bounded for every iteration $k$ until iRecon converges.
Proof. ALP is bounded for every iteration because $v \ge \max_{t_i \in T} \nu_i^c$ by Constraint (4.11). I will prove the rest of the proposition using induction. First I establish that both DLP and ALP are feasible and bounded in the first iteration. In the first iteration, DLP is feasible for any value of $x_i \ge 0$ when $u = \min_{t_i \in T} \{\mu_i^u - \alpha_i \Delta\mu_i\}$ (from Constraint (4.21)), and it is bounded since $\tau_{d,i}(u) \le x_i \le 1$ for all $t_i \in T$. In the same way, for ALP, Constraint (4.25) becomes $x_i \ge -\infty$ in the first iteration. Thus, $v = \max_{t_i \in T} A_i^-(x_i) > -\infty$ is a feasible solution.
Assuming that DLP and ALP are feasible and bounded for iterations $1, 2, \ldots, k$, I now show that they remain bounded and feasible in iteration $k + 1$. Firstly, DLP is bounded in the $(k+1)$-th iteration since $\tau_{d,i}(u) \le 1 - \min\{0, -\Delta\tau_i^{(k)}\}$ for all $t_i \in T$. DLP is feasible because the solution from the $k$-th iteration, $(x^{(k)}, u^{(k)}, \hat{v}^{(k)})$, remains feasible. To see this, observe that since $\tau_{d,i}^{(k)}$ is increasing and $\tau_{a,i}^{(k)}$ is decreasing with $k$, we have $\Delta\tau_i^{(k)} \ge \Delta\tau_i^{(k-1)}$. Hence $\min\{0, -\Delta\tau_i^{(k-1)}\} \ge \min\{0, -\Delta\tau_i^{(k)}\}$, implying that $(x^{(k)}, u^{(k)})$ satisfies Constraint (4.23). Moreover, Constraints (4.11), (4.14), (4.15) and (4.24) are trivially satisfied.
Similarly, ALP is also feasible in the $(k+1)$-th iteration since $\langle x^{(k+1)}, u^{(k+1)}, \hat{v}^{(k+1)} \rangle$, the solution returned by DLP in the $(k+1)$-th iteration, satisfies all the constraints of ALP. Firstly, Constraints (4.11), (4.14), (4.15) and (4.26) are trivially satisfied. Secondly, Constraint (4.25) is also satisfied since:
$$\tau_{d,i}(u^{(k+1)}) - \tau_{a,i}(\hat{v}^{(k+1)}) \ge \Delta\tau_i^{(k)}. \tag{4.27}$$
$$\begin{aligned}
x_i^{(k+1)} &\ge \tau_{d,i}(u^{(k+1)}) + \min\{0, \; -\Delta\tau_i^{(k)}\} && \text{from (4.23)} \\
&= \min\{\tau_{d,i}(u^{(k+1)}), \; \tau_{d,i}(u^{(k+1)}) - \Delta\tau_i^{(k)}\} \\
&\ge \min\{\tau_{d,i}(u^{(k+1)}), \; \tau_{a,i}(\hat{v}^{(k+1)})\} && \text{from (4.27)} \\
&= \tau_{a,i}(\hat{v}^{(k+1)}) + \min\{\tau_{d,i}(u^{(k+1)}) - \tau_{a,i}(\hat{v}^{(k+1)}), \; 0\} \\
&\ge \tau_{a,i}(\hat{v}^{(k+1)}) + \min\{\Delta\tau_i^{(k)}, \; 0\} && \text{from (4.27)}
\end{aligned}$$
Similarly, $(x^{(k+1)}, u^{(k+1)}, \hat{v}^{(k+1)})$ is a feasible solution of aRecon for any $k$ using inequality (4.27), and hence, $u^{(k+1)}$ is a lower bound of Recon. Additionally, since the sequence $\{u^{(k)}\}$ is bounded and monotonically increasing, it must converge.
4.3 Experimental Results
I provide two sets of experimental results: (i) I provide the runtime results of Recon, showing the effectiveness of the two heuristics aRecon and iRecon. (ii) I compare the solution quality of strategies generated by Eraser, Cobra, Hunter, and Recon, under execution and observation uncertainty.
4.3.1 Runtime of Recon
[Figure 4.2: two plots of runtime (in seconds, logarithmic scale from 0.1 to 1000) against the number of targets (10 to 80) for RECON, aRECON, and iRECON. Panel (a): runtime of Recon with $\alpha = \beta = 0.01$. Panel (b): runtime of Recon with $\alpha = \beta = 0.1$.]
Figure 4.2: Runtime of Recon MILP and the speedup of lower bound heuristics aRecon and iRecon.
In this set of experiments, I show the runtime of the three variants of Recon with an increasing number of targets. In all test instances, I set the number of defender resources to 20% of the number of targets. The results were obtained using CPLEX on a standard 2.8GHz machine with 2GB main memory, and averaged over 30 trials. Figures 4.2(a) and 4.2(b) show the runtime results of Recon without any lower bounds, and with lower bounds provided by aRecon and iRecon respectively. The x-axis shows the number of targets and the y-axis (in logarithmic scale) shows the total runtime in seconds. Both the aRecon and iRecon heuristics help reduce the total runtime significantly in both uncertainty settings; the speedup is of orders of magnitude in games with a large number of targets. For instance, for cases with 80 targets and high uncertainty, Recon without heuristic lower bounds takes 3,948 seconds, whereas Recon with the aRecon lower bound takes a total runtime of 52 seconds and Recon with the iRecon lower bound takes a total runtime of 22 seconds.
4.3.2 Performance under uncertainty
In this set of experiments, I compared the performance of various candidate strategies under continuous execution and observation uncertainty. Two scenarios of the ARMOR security games were considered: (i) games with 5 targets and 2 defender resources and (ii) games with 8 targets and 3 defender resources. Payoffs $\mu_i^c$ (defender, target covered) and $\nu_i^u$ (attacker, target uncovered) are integers chosen uniformly at random from 1 to 10, while $\mu_i^u$ (defender, target uncovered) and $\nu_i^c$ (attacker, target covered) are integers chosen uniformly at random from $-10$ to $-1$.
Let us define uncertainty distributions parameterized by $\rho$ as simplified examples of continuous execution and observation uncertainty. In ARMOR security games with the uncertainty distribution given by $\rho$, the defender's execution and the attacker's observation at every target follow independent uniform distributions determined by $\rho$. More specifically, given an intended defender strategy $x = (x_1, \ldots, x_N)$, where $x_i$ represents the probability of protecting target $i$, the actual executed strategy $y = (y_1, \ldots, y_N)$ has every $y_i$ following a uniform distribution between $x_i - \rho$ and $x_i + \rho$, and the actual observed strategy $z = (z_1, \ldots, z_N)$ has every $z_i$ following a uniform distribution between $y_i - \rho$ and $y_i + \rho$. Here $\rho$ is referred to as the uncertainty parameter, and a higher $\rho$ implies a higher amount of uncertainty.
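The sampling scheme just described can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the code used for the experiments: it uses the two-target payoffs of Figure 4.1 instead of randomly generated games, clips sampled coverages to [0, 1] (the text does not say how values outside this range are treated), lets the attacker break ties by index, and uses hypothetical function names and a fixed seed.

```python
import random

# Payoffs of the two-target game in Figure 4.1 (index 0 is t_1, index 1 is t_2).
DEF_COVERED, DEF_UNCOVERED = [10.0, 0.0], [0.0, -10.0]
ATT_COVERED, ATT_UNCOVERED = [-1.0, -1.0], [1.0, 1.0]


def clip(p):
    """Keep a coverage value inside [0, 1] (an assumption of this sketch)."""
    return max(0.0, min(1.0, p))


def sample_realization(x, rho, rng):
    """Draw executed coverage y and observed coverage z for intended coverage x."""
    y = [clip(xi + rng.uniform(-rho, rho)) for xi in x]
    z = [clip(yi + rng.uniform(-rho, rho)) for yi in y]
    return y, z


def expected_defender_utility(x, rho, samples=10000, seed=0):
    """Monte Carlo estimate of the defender's expected utility under noise rho."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(samples):
        y, z = sample_realization(x, rho, rng)
        attacker_values = [
            ATT_UNCOVERED[i] - (ATT_UNCOVERED[i] - ATT_COVERED[i]) * z[i]
            for i in range(len(x))
        ]
        # The attacker best-responds to what he observes (z) ...
        j = max(range(len(x)), key=lambda i: attacker_values[i])
        # ... but the defender's payoff depends on the executed coverage (y).
        total += DEF_UNCOVERED[j] + (DEF_COVERED[j] - DEF_UNCOVERED[j]) * y[j]
    return total / samples


estimate = expected_defender_utility([0.5, 0.5], rho=0.1)
print(estimate)
```

For the SSE strategy of the example game, the estimate lands near zero, far below the noise-free utility of 5, because the attacker ends up choosing each target about half of the time.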
The following candidate strategies were compared:
• Eraser: defender's SSE strategy computed by the Eraser algorithm [Kiekintveld et al., 2009].
• Cobra-1: defender strategy generated by Cobra [Pita et al., 2010], the latest algorithm that addresses the attacker's observational error, with the bounded rationality parameter $\epsilon$ set to 1.0.$^4$
• Cobra-2: defender strategy generated by Cobra with $\epsilon = 2.0$ (as suggested in [Pita et al., 2010]).
• Hunter-100: defender strategy generated by Hunter-based sample average approximation with 100 samples (see Section 3.2). The samples of uncertainty realization were drawn randomly from the uncertainty distribution with $\rho = 0.1$.
• Recon: defender strategy generated by the Recon MILP with $\alpha = \beta = 0.1$, i.e., the maximum execution and observation noise for every target was set to 0.1.
To understand the performance of the candidate strategies above, two major metrics were employed for a given uncertainty parameter $\rho$: (i) expected defender utility under the uncertainty distribution with parameter $\rho$ and (ii) worst-case defender utility given a maximum execution error $\alpha$ and a maximum observation error $\beta$ where $\alpha_i = \beta_i = \rho$. The expected defender utility was computed by evaluating the strategy for 10,000 sample uncertainty realizations and taking the average. The expected defender utility is a valuable metric to evaluate a strategy's performance when the uncertainty follows some smooth and continuous distribution (here independent uniform distributions). The worst-case defender utility was computed using the second-level optimization problem given in Constraints (4.3) to (4.6). The worst-case defender utility is also an important metric which determines how robust a strategy is given some uncertainty boundary (here a hyper-rectangle boundary defined by $\rho$). In the experimental results reported in Figure 4.3, for each strategy and each metric, 5 uncertainty settings were evaluated: $\rho = 0, 0.05, 0.1, 0.15, 0.2$.
4 The human bias parameter in Cobra is set to 1 since the experiments here are not tested against human subjects.
[Figure 4.3: four plots comparing ERASER, COBRA-1, COBRA-2, RECON, and HUNTER-100 as the uncertainty grows from 0 to 0.2. Panel (a): expected defender utility with increasing uncertainty parameter $\rho$ (games with 5 targets and 2 defender resources). Panel (b): worst-case defender utility with increasing uncertainty boundary $\alpha = \beta$ (games with 5 targets and 2 defender resources). Panel (c): expected defender utility with increasing uncertainty parameter $\rho$ (games with 8 targets and 3 defender resources). Panel (d): worst-case defender utility with increasing uncertainty boundary $\alpha = \beta$ (games with 8 targets and 3 defender resources).]
Figure 4.3: Performance of strategies generated by Recon, Hunter, Eraser, and Cobra.
Figure 4.3(a) and Figure 4.3(b) show the expected defender utility and the worst-case defender utility with increasing uncertainty for ARMOR games with 5 targets and 2 defender resources. Here the x-axis represents the value of $\rho$ and the y-axis represents the defender's utility (either expected or worst-case). Figure 4.3(c) and Figure 4.3(d) show exactly the same comparison but for ARMOR games with 8 targets and 3 defender resources. As we can see, the trends observed in the two scenarios were consistent. All the comparisons claimed below between two candidate strategies were statistically significant with p-value under 0.05. The take-away messages from Figure 4.3 are:
• The SSE strategy computed by Eraser performs poorly in the presence of execution and observation uncertainty in terms of both expected and worst-case utility metrics. A small amount of noise ($\rho = 0.05$) was sufficient to lower the expected utility of Eraser from 2.73 to 0.63 in the 5-target scenario and from 2.86 to $-0.96$ in the 8-target scenario. Indeed, when there was nonzero uncertainty, Eraser was consistently outperformed by other candidate strategies in both expected and worst-case utility metrics.
• Comparing Cobra-1 and Cobra-2, we can see that the parameter $\epsilon$ in Cobra offers a tradeoff between expected utility and robustness to noise. Cobra-1 had higher expected utility than Cobra-2 when there was low uncertainty but degraded faster than Cobra-2 when the amount of uncertainty increased.
• When the true uncertainty distribution was close to the distribution used in generating the strategy ($\rho = 0.05, 0.1, 0.15$), Hunter-based sample average approximation provided the best expected utility consistently in both the 5-target and the 8-target scenarios. This suggests that Hunter performs well even when the modeled uncertainty distribution ($\rho = 0.1$) is different from the actual uncertainty distribution.
• When the true uncertainty distribution was drastically different from the modeled uncertainty distribution, e.g., when the true uncertainty parameter was $\rho = 0$ or $\rho = 0.2$, Hunter-100 was outperformed by other candidate strategies. It is therefore valuable to obtain a good estimate of the true uncertainty distribution in order for the Hunter-based sample average approximation approach to work well in practice.
• In the presence of uncertainty, Recon was consistently the best performer in terms of worst-case utility. This is indeed the motivation of Recon: being able to provide guarantees on the defender's utility is extremely valuable in situations where a precise uncertainty distribution is unavailable. It is worth noting that the worst-case utility of Recon can still be very bad when the actual uncertainty realization exceeds the estimated boundary. For example, when the uncertainty boundary increased from $\alpha = \beta = 0.1$ to $\alpha = \beta = 0.15$, the worst-case utility of Recon dropped from $-0.24$ to $-4.20$ and from $-0.29$ to $-5.89$ for the 5-target and 8-target scenarios respectively (although Recon was still better than other candidate strategies when the uncertainty boundary was $\alpha = \beta = 0.15$).
• Recon was outperformed by Hunter-100 and the variants of Cobra in terms of expected utility when the uncertainty was low ($\rho \le 0.1$). This implies that the Recon-generated strategies can be overly conservative when the uncertainty boundary used is too loose. For example, when the true uncertainty distribution has $\rho = 0.1$, the Recon strategy assuming an uncertainty boundary of $\alpha = \beta = 0.1$ is too conservative to generate good expected utility.
Chapter 5: Stackelberg vs. Nash in Security Games
A key element of the Stackelberg paradigm is the concept of leadership, which naturally defines one party of the game as the leader who commits to a possibly randomized strategy, whereas the other party acts as the follower who attempts to observe the leader's strategy. In previous chapters, despite the fact that the follower's observation is possibly noisy, this leadership paradigm is always taken for granted. However, there are legitimate concerns about whether the Stackelberg model is appropriate in all cases. In some situations attackers may choose to act without acquiring costly information about the security strategy, especially if security measures are difficult to observe (e.g., undercover officers) and insiders are unavailable. In such cases, a simultaneous-move game model may be a better reflection of the real situation. The defender faces an unclear choice about which strategy to adopt: the recommendation of the Stackelberg model (SSE strategy), or of the simultaneous-move model (NE strategy), or something else entirely? Recall from the example given in Figure 2.2 that the equilibrium strategy can in fact differ between these models.
In this chapter I will provide theoretical and experimental analysis of the leader’s dilemma,
focusing on security games defined in Chapter 2. Section 5.1 characterizes a set of key properties
of security games. In particular, I show that when the security games satisfy the SSAS (Subsets
of Schedules Are Schedules) property, the defender’s SSE strategy is also an NE strategy. In
74
this case, the defender is always playing a best response by using an SSE regardless of whether
the attacker can observe or not. Section 5.2 shows that this property no longer holds when the
attacker can attack multiple targets. Section 5.3 contains experimental results.
5.1 Properties of Security Games
The challenge faced here is to understand the fundamental relationships between the SSE and NE strategies in security games. A special case is zero-sum security games, where the defender’s utility is the exact opposite of the attacker’s utility. For finite two-person zero-sum games, it is known that the different game-theoretic solution concepts of NE, minimax, maximin and SSE all give the same answer. In addition, Nash equilibrium strategies of zero-sum games have a very useful property in that they are interchangeable: an equilibrium strategy for one player can be paired with the other player’s strategy from any equilibrium profile, and the result is an equilibrium, and the payoffs for both players remain the same.
Unfortunately, security games are not necessarily zero-sum (and are not zero-sum in deployed applications). Many properties of zero-sum games do not hold in security games. For instance, a minimax strategy in a security game may not be a maximin strategy. Consider the example in Table 5.1, in which there are 3 targets and one defender resource. The defender has three actions; each of the defender’s actions can only cover one target at a time, leaving the other targets uncovered. While all three targets are equally appealing to the attacker, the defender has varying utilities of capturing the attacker at different targets. For the defender, the unique minimax strategy, (1/3, 1/3, 1/3), is different from the unique maximin strategy, (6/11, 3/11, 2/11).
                 t1            t2            t3
              Cov.  Unc.    Cov.  Unc.    Cov.  Unc.
Defender        1     0       2     0       3     0
Attacker        0     1       0     1       0     1
Table 5.1: Security game which is not strategically zero-sum
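The gap between the two strategies in Table 5.1 can be checked numerically. The following is a minimal sketch, not part of the original analysis: it assumes that "minimax" means minimizing the attacker's best-response utility and "maximin" means maximizing the defender's worst-case utility, and it searches a grid of coverage vectors with denominator 33 (chosen because it is a common multiple of 3 and 11). The helper names `utilities` and `grid` are illustrative.

```python
from fractions import Fraction
from itertools import product

# Payoffs from Table 5.1 as (covered, uncovered) pairs, one per target.
DEF = [(1, 0), (2, 0), (3, 0)]
ATT = [(0, 1), (0, 1), (0, 1)]

def utilities(x, payoffs):
    """Expected payoff at each target when target i is covered with probability x[i]."""
    return [xi * c + (1 - xi) * u for xi, (c, u) in zip(x, payoffs)]

def grid(n_targets, denom):
    """All coverage vectors with entries k/denom that sum to 1 (one defender resource)."""
    for ks in product(range(denom + 1), repeat=n_targets - 1):
        last = denom - sum(ks)
        if last >= 0:
            yield tuple(Fraction(k, denom) for k in ks + (last,))

points = list(grid(3, 33))
# Minimax: minimize the attacker's best-response utility.
minimax = min(points, key=lambda x: max(utilities(x, ATT)))
# Maximin: maximize the defender's worst-case utility.
maximin = max(points, key=lambda x: min(utilities(x, DEF)))
print(minimax)
print(maximin)
```

A grid search is used only because the example is tiny; a linear program would be the natural choice for larger games.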
Strategically zero-sum games [Moulin and Vial, 1978] are a natural and strict superset of zero-sum games for which most of the desirable properties of zero-sum games still hold. This is exactly the class of games for which no completely mixed Nash equilibrium can be improved upon. Moulin and Vial proved that a game $(A, B)$ is strategically zero-sum if and only if there exist $\alpha > 0$ and $\beta > 0$ such that $\alpha A + \beta B = I_c + I_r$, where $I_c$ is a matrix with identical columns and $I_r$ is a matrix with identical rows [Moulin and Vial, 1978]. Unfortunately, security games are not even strategically zero-sum. The game in Table 5.1 is a counterexample, because otherwise there must exist $\alpha, \beta > 0$ such that
$$\alpha \begin{pmatrix} 1 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 3 \end{pmatrix} + \beta \begin{pmatrix} 0 & 1 & 1 \\ 1 & 0 & 1 \\ 1 & 1 & 0 \end{pmatrix} = \begin{pmatrix} c_1 & c_1 & c_1 \\ c_2 & c_2 & c_2 \\ c_3 & c_3 & c_3 \end{pmatrix} + \begin{pmatrix} r_1 & r_2 & r_3 \\ r_1 & r_2 & r_3 \\ r_1 & r_2 & r_3 \end{pmatrix}.$$
From these equations, $c_1 + r_2 = c_1 + r_3 = c_2 + r_1 = c_2 + r_3 = c_3 + r_1 = c_3 + r_2 = \beta$, which implies $r_1 = r_2 = r_3$ and $c_1 = c_2 = c_3$. We also know $c_1 + r_1 = \alpha$, $c_2 + r_2 = 2\alpha$, $c_3 + r_3 = 3\alpha$. However, since $c_1 + r_1 = c_2 + r_2 = c_3 + r_3$, $\alpha$ must be 0, which contradicts the assumption $\alpha > 0$.
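The counterexample can also be checked mechanically. The sketch below rests on a standard fact that is not stated in the text: a matrix $M$ can be written as $M_{ij} = c_i + r_j$ exactly when every double difference $M_{ij} - M_{i1} - M_{1j} + M_{11}$ vanishes. Applying this to a weighted sum of the two payoff matrices from Table 5.1 gives linear constraints on the two positive weights (called $\alpha$ and $\beta$ here, an assumed naming), and two independent constraints leave only the zero solution. The function names are illustrative.

```python
# Payoff matrices for Table 5.1: rows index the covered target, columns the attacked target.
A = [[1, 0, 0], [0, 2, 0], [0, 0, 3]]   # defender
B = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]   # attacker

def double_differences(M):
    """M[i][j] - M[i][0] - M[0][j] + M[0][0] for all i, j >= 1 (all zero iff M[i][j] = c_i + r_j)."""
    n = len(M)
    return [M[i][j] - M[i][0] - M[0][j] + M[0][0]
            for i in range(1, n) for j in range(1, n)]

# Each pair (p, q) encodes the constraint p * alpha + q * beta = 0.
constraints = list(zip(double_differences(A), double_differences(B)))

def only_trivial_solution(rows):
    """True if two constraints are linearly independent, forcing alpha = beta = 0."""
    return any(a1 * b2 - a2 * b1 != 0 for (a1, b1) in rows for (a2, b2) in rows)

print(constraints)
print(only_trivial_solution(constraints))
```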
Nevertheless, I will show in the rest of this section that security games still have some important properties. I will start by establishing equivalence between the set of defender’s minimax strategies and the set of defender’s NE strategies. Second, I will show Nash equilibria in security games are interchangeable, resolving the defender’s equilibrium strategy selection problem in simultaneous-move games. Third, I will show that under a natural restriction on schedules, any SSE strategy for the defender is also a minimax strategy and hence an NE strategy. This resolves the defender’s dilemma about whether to play according to SSE or NE when there is uncertainty about the attacker’s ability to observe the strategy. Finally, for a restricted class of games (ARMOR games), there is a unique SSE/NE defender strategy and a unique attacker NE strategy.
5.1.1 Equivalence of Nash Equilibrium and Minimax
Recall the definition and notation provided in Section 2.6. In this section, I will first prove that any defender’s NE strategy is also a minimax strategy. Then for every defender’s minimax strategy $X$ we construct a strategy $a$ for the attacker such that $\langle X, a \rangle$ is an NE profile.
Definition 7. For a defender’s mixed strategy $X$, define the attacker’s best response utility by $E(X) = \max_{i=1}^{N} v(X, t_i)$. Denote the minimum of the attacker’s best response utilities over all defender’s strategies by $E^* = \min_X E(X)$. The set of defender’s minimax strategies is defined as:
$$\Omega_M = \{X \mid E(X) = E^*\}.$$
Define the function $f$ as follows. If $a$ is an attacker’s strategy in which target $t_i$ is attacked with probability $a_i$, then $f(a) = \bar{a}$ is an attacker’s strategy such that
$$\bar{a}_i = \lambda a_i \frac{\Delta\mu_i}{\Delta\nu_i},$$
where $\lambda > 0$ is a normalizing constant such that $\sum_{i=1}^{N} \bar{a}_i = 1$. The inverse function $f^{-1}(\bar{a}) = a$ is given by the following equation.
$$a_i = \frac{1}{\lambda} \bar{a}_i \frac{\Delta\nu_i}{\Delta\mu_i} \tag{5.1}$$
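As a small illustration of the map $f$ and its inverse, the sketch below assumes that each target carries two positive payoff differences, written `d_mu` for the defender and `d_nu` for the attacker (assumed names for the quantities in the ratio above), and that the normalizing constant is whatever makes the result sum to 1. Exact fractions are used so the round trip can be compared with equality.

```python
from fractions import Fraction

def f(a, d_mu, d_nu):
    """Reweight a_i by d_mu_i / d_nu_i and renormalize so the result sums to 1."""
    weights = [ai * Fraction(m) / Fraction(n) for ai, m, n in zip(a, d_mu, d_nu)]
    total = sum(weights)
    return [w / total for w in weights]

def f_inverse(a_bar, d_mu, d_nu):
    """Undo f: reweight by d_nu_i / d_mu_i and renormalize."""
    weights = [ab * Fraction(n) / Fraction(m) for ab, m, n in zip(a_bar, d_mu, d_nu)]
    total = sum(weights)
    return [w / total for w in weights]

a = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]
d_mu, d_nu = [1, 2, 3], [1, 1, 1]      # illustrative payoff differences, all positive
a_bar = f(a, d_mu, d_nu)
print(a_bar)
print(f_inverse(a_bar, d_mu, d_nu))
```

Because every weight is positive, a target has zero probability in `a` exactly when it has zero probability in `f(a)`, which is the support-preservation fact used in the proof of Lemma 3.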
Lemma 3. Consider a security game $G$. Construct the corresponding zero-sum security game $\bar{G}$ in which the defender’s utilities are redefined as follows.
$$\bar{\mu}_i^c = -\nu_i^c, \qquad \bar{\mu}_i^u = -\nu_i^u, \qquad \forall i = 1, \ldots, N$$
Then $\langle X, a \rangle$ is an NE profile in $G$ if and only if $\langle X, f(a) \rangle$ is an NE profile in $\bar{G}$.
Proof. Note that the supports of strategies $a$ and $\bar{a}$ are the same, and also that the attacker’s utility function is the same in games $G$ and $\bar{G}$. Thus $a$ is a best response to $X$ in $G$ if and only if $\bar{a}$ is a best response to $X$ in $\bar{G}$.
Denote the utility that the defender gets if profile $\langle X, a \rangle$ is played in game $G$ by $u_G(X, a)$. To show that $X$ is a best response to $a$ in game $G$ if and only if $X$ is a best response to $\bar{a}$ in $\bar{G}$, it is sufficient to show equivalence of the following two inequalities.
$$u_G(X, a) - u_G(X', a) \ge 0 \iff u_{\bar{G}}(X, \bar{a}) - u_{\bar{G}}(X', \bar{a}) \ge 0$$
I will prove the equivalence by starting from the first inequality and transforming it into the second one. On the one hand, from Equation (2.4) we have,
$$u_G(X, a) - u_G(X', a) = \sum_{i=1}^{N} a_i (x_i - x'_i) \Delta\mu_i.$$
Similarly, on the other hand, we have,
$$u_{\bar{G}}(X, \bar{a}) - u_{\bar{G}}(X', \bar{a}) = \sum_{i=1}^{N} \bar{a}_i (x_i - x'_i) \Delta\nu_i.$$
Given Equation (5.1) and $\lambda > 0$, we have,
$$\begin{aligned}
& u_G(X, a) - u_G(X', a) \ge 0 \\
\iff{} & \sum_{i=1}^{N} a_i (x_i - x'_i) \Delta\mu_i \ge 0 \\
\iff{} & \sum_{i=1}^{N} \frac{1}{\lambda} \bar{a}_i \frac{\Delta\nu_i}{\Delta\mu_i} (x_i - x'_i) \Delta\mu_i \ge 0 \\
\iff{} & \frac{1}{\lambda} \sum_{i=1}^{N} \bar{a}_i (x_i - x'_i) \Delta\nu_i \ge 0 \\
\iff{} & \frac{1}{\lambda} \left( u_{\bar{G}}(X, \bar{a}) - u_{\bar{G}}(X', \bar{a}) \right) \ge 0 \\
\iff{} & u_{\bar{G}}(X, \bar{a}) - u_{\bar{G}}(X', \bar{a}) \ge 0
\end{aligned}$$
Lemma 4. Suppose $X$ is a defender NE strategy in a security game. Then $E(X) = E^*$, i.e., $\Omega_{NE} \subseteq \Omega_M$.
Proof. Suppose $\langle X, a \rangle$ is an NE profile in the security game $G$. According to Lemma 3, $\langle X, f(a) \rangle$ must be an NE profile in the corresponding zero-sum security game $\bar{G}$. Since $X$ is an NE strategy in a zero-sum game, it must also be a minimax strategy [Fudenberg and Tirole, 1991]. Thus $E(X) = E^*$.
Lemma 5. In a security game $G$, any defender’s strategy $X$ such that $E(X) = E^*$ is an NE strategy, i.e., $\Omega_M \subseteq \Omega_{NE}$.
Proof. $X$ is a minimax strategy in both $G$ and the corresponding zero-sum game $\bar{G}$. Any minimax strategy is also an NE strategy in a zero-sum game [Fudenberg and Tirole, 1991]. Then there must exist an NE profile $\langle X, \bar{a} \rangle$ in $\bar{G}$. By Lemma 3, $\langle X, f^{-1}(\bar{a}) \rangle$ is an NE profile in $G$. Thus $X$ is an NE strategy in $G$.
Theorem 5.1.1. In a security game, the set of defender’s minimax strategies is equal to the set of defender’s NE strategies, i.e., $\Omega_M = \Omega_{NE}$.
Proof. Lemma 4 shows that every defender’s NE strategy is a minimax strategy, and Lemma 5 shows that every defender’s minimax strategy is an NE strategy. Thus the sets of defender’s NE and minimax strategies must be equal.
5.1.2 Interchangeability of Nash Equilibria
I show that Nash equilibria in security games are interchangeable.
Theorem 5.1.2. Suppose $\langle X, a \rangle$ and $\langle X', a' \rangle$ are two NE profiles in a security game $G$. Then $\langle X, a' \rangle$ and $\langle X', a \rangle$ are also NE profiles in $G$.
Proof. Consider the corresponding zero-sum game $\bar{G}$. From Lemma 3, both $\langle X, f(a) \rangle$ and $\langle X', f(a') \rangle$ must be NE profiles in $\bar{G}$. By the interchange property of NE in zero-sum games [Fudenberg and Tirole, 1991], $\langle X, f(a') \rangle$ and $\langle X', f(a) \rangle$ must also be NE profiles in $\bar{G}$. Applying Lemma 3 again in the other direction, we get that $\langle X, a' \rangle$ and $\langle X', a \rangle$ must be NE profiles in $G$.
By Theorem 5.1.2, the defender’s equilibrium selection problem in a simultaneous-move security game is resolved. The reason is that given the attacker’s NE strategy $a$, the defender must get the same utility by responding with any NE strategy. Next, I will provide some additional insights on the expected utilities of both players when some NE profile is played. In particular, I will first show the attacker’s expected utility is the same in all NE profiles. However, the defender may have varying expected utilities corresponding to different attacker’s strategies, as demonstrated by an example.
Theorem 5.1.3. Suppose $\langle X, a \rangle$ is an NE profile in a security game. Then, $v(X, a) = E^*$.
Proof. From Lemma 4, $X$ is a minimax strategy and $E(X) = E^*$. On the one hand,
$$v(X, a) = \sum_{i=1}^{N} a_i v(X, t_i) \le \sum_{i=1}^{N} a_i E(X) = E^*.$$
On the other hand, because $a$ is a best response to $X$, it should be at least as good as the strategy of attacking $t^* \in \arg\max_t v(X, t)$ with probability 1, that is,
$$v(X, a) \ge v(X, t^*) = E(X) = E^*.$$
Therefore we know $v(X, a) = E^*$.
Unlike the attacker who gets the same utility in all NE profiles, the defender may get varying expected utilities depending on the attacker’s strategy selection as shown in Example 1.
                 t1            t2
              Cov.  Unc.    Cov.  Unc.
Defender        1     0       2     0
Attacker        1     2       0     1
Figure 5.1: A security game where the defender’s expected utility varies in different NE profiles
Example 1. Consider the game shown in Figure 5.1. The defender can choose to cover one of the two targets at a time. The only defender’s NE strategy is to cover $t_1$ with 100% probability, making the attacker indifferent between attacking $t_1$ and $t_2$. One attacker’s NE response is always attacking $t_1$, which gives the defender an expected utility of 1. Another attacker’s NE strategy is (2/3, 1/3), given which the defender is indifferent between defending $t_1$ and $t_2$. In this case, the defender’s utility decreases to 2/3 because she captures the attacker with a lower probability.
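Both profiles in Example 1 can be verified directly from the payoffs in Figure 5.1. The following sketch uses illustrative helper names and relies on the fact that, with one resource and schedules of size 1, the defender's expected utility is linear in the coverage vector, so checking the two pure deviations is enough.

```python
from fractions import Fraction as F

# Payoffs from Figure 5.1 as (covered, uncovered) pairs for t1 and t2.
DEF = [(1, 0), (2, 0)]
ATT = [(1, 2), (0, 1)]

def attacker_utils(x):
    """Attacker's expected utility for attacking each target under coverage x."""
    return [xi * c + (1 - xi) * u for xi, (c, u) in zip(x, ATT)]

def defender_value(x, a):
    """Defender's expected utility when coverage x meets attacker mixed strategy a."""
    return sum(ai * (xi * c + (1 - xi) * u) for ai, xi, (c, u) in zip(a, x, DEF))

def is_ne(x, a):
    """Mutual best-response check for one resource that covers a single target."""
    v = attacker_utils(x)
    attacker_ok = all(ai == 0 or vi == max(v) for ai, vi in zip(a, v))
    pure = [(F(1), F(0)), (F(0), F(1))]      # cover t1, or cover t2
    defender_ok = defender_value(x, a) == max(defender_value(p, a) for p in pure)
    return attacker_ok and defender_ok

x = (F(1), F(0))                             # always cover t1
for a in [(F(1), F(0)), (F(2, 3), F(1, 3))]:
    print(a, is_ne(x, a), defender_value(x, a))
```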
5.1.3 SSE and Minimax / NE
We have already shown that the set of defender’s NE strategies coincides with her minimax
strategies. If every defender’s SSE strategy is also a minimax strategy, then SSE strategies must
also be NE strategies. The defender can then safely commit to an SSE strategy; there is no
selection problem for the defender. Unfortunately as shown in Example 2, if a security game has
arbitrary scheduling constraints, an SSE strategy may not be part of any NE profile.
                 t1            t2            t3            t4
              Cov.  Unc.    Cov.  Unc.    Cov.  Unc.    Cov.  Unc.
Defender       10     9      -2    -3       1     0       1     0
Attacker        2     5       3     4       0     1       0     1
Figure 5.2: A schedule-constrained security game where the defender’s SSE strategy is not an NE strategy.
Example 2. Consider the game in Figure 5.2 with 4 targets $\{t_1, \ldots, t_4\}$, 2 schedules $s_1 = \{t_1, t_2\}$, $s_2 = \{t_3, t_4\}$, and a single defender resource. The defender always prefers that $t_1$ is attacked, and $t_3$ and $t_4$ are never appealing to the attacker. There is a unique SSE strategy for the defender, which places as much coverage probability on $s_1$ as possible without making $t_2$ more appealing to the attacker than $t_1$. The rest of the coverage probability is placed on $s_2$. The result is that $s_1$ and $s_2$ are both covered with probability 0.5. In contrast, in a simultaneous-move game, $t_3$ and $t_4$ are dominated for the attacker. Thus, there is no reason for the defender to place resources on targets that are never attacked, so the defender’s unique NE strategy covers $s_1$ with probability 1. That is, the defender’s SSE strategy is different from the NE strategy. The difference between the defender’s payoffs in these cases can also be arbitrarily large because $t_1$ is always attacked in an SSE and $t_2$ is always attacked in an NE.
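The numbers in Example 2 can be reproduced with a short search. The sketch below is illustrative only: it assumes the defender's $t_2$ payoffs in Figure 5.2 are $-2$ (covered) and $-3$ (uncovered), evaluates the defender's utility under commitment on a grid of probabilities for $s_1$ with ties broken in the defender's favor, and then looks at the simultaneous-move outcome when $s_1$ is always covered.

```python
from fractions import Fraction as F

# Payoffs from Figure 5.2 as (covered, uncovered) pairs for t1..t4.
DEF = [(10, 9), (-2, -3), (1, 0), (1, 0)]
ATT = [(2, 5), (3, 4), (0, 1), (0, 1)]

def coverage(p):
    """Marginal coverage when s1 = {t1, t2} is played with probability p and s2 = {t3, t4} otherwise."""
    return [p, p, 1 - p, 1 - p]

def payoff(x, table):
    """Expected payoff at each target under coverage vector x."""
    return [xi * c + (1 - xi) * u for xi, (c, u) in zip(x, table)]

def sse_value(p):
    """Defender utility when the attacker observes p and breaks ties in the defender's favor."""
    x = coverage(p)
    v, u = payoff(x, ATT), payoff(x, DEF)
    best = max(v)
    return max(ui for ui, vi in zip(u, v) if vi == best)

grid = [F(k, 100) for k in range(101)]
p_sse = max(grid, key=sse_value)
print(p_sse, sse_value(p_sse))          # commitment outcome

# Simultaneous-move outcome: with s1 always covered the attacker prefers t2.
print(payoff(coverage(F(1)), ATT))
print(payoff(coverage(F(1)), DEF)[1])   # defender's utility when t2 is attacked
```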
The above example restricts the defender to protect $t_1$ and $t_2$ together, which makes it impossible for the defender to put more coverage on $t_2$ without making $t_1$ less appealing. If the defender could assign resources to any subset of a schedule, this difficulty is resolved. More formally, denote the set of schedules that a resource $i$ can cover by $S_i$; then for any resource $1 \le i \le \gamma$, any subset of a schedule in $S_i$ is also a possible schedule in $S_i$:
$$\forall\, 1 \le i \le \gamma: \quad s' \subseteq s \in S_i \Rightarrow s' \in S_i. \tag{5.2}$$
If a security game satisfies Equation (5.2), we say it has the SSAS property. This is natural in many security domains, since it is often possible to cover fewer targets than the maximum number that a resource could possibly cover in a schedule. I will show that this property is sufficient to ensure that the defender’s SSE strategy must also be an NE strategy.
Lemma 6. Suppose $X$ is a defender strategy in a security game which satisfies the SSAS property and $x = \varphi(X)$ is the corresponding vector of marginal probabilities. Then for any $x'$ such that $0 \le x'_i \le x_i$ for all $t_i \in T$, there must exist a defender strategy $X'$ such that $\varphi(X') = x'$.
Proof. The proof is by induction on the number of $t_i$ where $x'_i \ne x_i$, denoted by $\delta(x, x')$. As the base case, if there is no target $i$ such that $x'_i \ne x_i$, the existence trivially holds because $\varphi(X) = x'$. Suppose the existence holds for all $x, x'$ such that $\delta(x, x') = k$, where $0 \le k \le N - 1$. Consider any $x, x'$ such that $\delta(x, x') = k + 1$. Then for some $j$, $x'_j \ne x_j$. Since $x'_j \ge 0$ and $x'_j < x_j$, we have $x_j > 0$. There must be a nonempty set of coverage vectors $D_j$ that cover $t_j$ and receive positive probability in $X$. Because the security game satisfies the SSAS property, for every $d \in D_j$, there is a valid $d^-$ which covers all targets in $d$ except for $t_j$. From the defender strategy $X$, by shifting $X_d \frac{x_j - x'_j}{x_j}$ probability from every $d \in D_j$ to the corresponding $d^-$, we get a defender strategy $X^\dagger$ where $x^\dagger_i = x_i$ for $i \ne j$, and $x^\dagger_i = x'_i$ for $i = j$. Hence $\delta(x^\dagger, x') = k$, implying there exists an $X'$ such that $\varphi(X') = x'$ by the induction assumption. By induction, the existence holds for any $x, x'$.
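The inductive argument in the proof of Lemma 6 is constructive, and the probability-shifting step can be sketched in code. This is a minimal illustration under simplifying assumptions: a mixed strategy is a dictionary from coverage vectors (represented as frozensets of covered targets) to probabilities, and removing any target from a coverage vector is assumed to yield a feasible coverage vector, which is what SSAS provides. The names `marginals` and `lower_coverage` are hypothetical.

```python
from fractions import Fraction as F

def marginals(X, targets):
    """Marginal coverage of each target under mixed strategy X."""
    return {t: sum(p for d, p in X.items() if t in d) for t in targets}

def lower_coverage(X, x_prime, targets):
    """Return a strategy whose marginals equal x_prime, assuming x_prime <= marginals(X)."""
    X = dict(X)
    for t in targets:
        x_t = marginals(X, targets)[t]
        if x_prime[t] == x_t:
            continue
        frac = (x_t - x_prime[t]) / x_t      # share of each covering vector to move
        Y = {}
        for d, p in X.items():
            if t in d:
                Y[d] = Y.get(d, 0) + p * (1 - frac)
                Y[d - {t}] = Y.get(d - {t}, 0) + p * frac   # same vector without t
            else:
                Y[d] = Y.get(d, 0) + p
        X = Y
    return X

targets = ["t1", "t2", "t3", "t4"]
X = {frozenset({"t1", "t2"}): F(1, 2), frozenset({"t3", "t4"}): F(1, 2)}
x_prime = {"t1": F(1, 2), "t2": F(1, 4), "t3": F(0), "t4": F(1, 2)}
X_new = lower_coverage(X, x_prime, targets)
print(marginals(X_new, targets))
```

Each pass changes the marginal of only one target, mirroring the step from $k + 1$ differing targets to $k$ in the proof.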
Theorem 5.1.4. Suppose $X$ is a defender SSE strategy in a security game which satisfies the SSAS property. Then $E(X) = E^*$, i.e., $\Omega_{SSE} \subseteq \Omega_M = \Omega_{NE}$.
Proof. The proof is by contradiction. First, it is impossible that $E(X) < E^*$ since by definition $E^*$ is the minimum of all possible $E(X)$. Now suppose $\langle X, g \rangle$ is an SSE profile in a security game which satisfies the SSAS property, and $E(X) > E^*$. Let $T_a = \{t_i \mid v(X, t_i) = E(X)\}$ be the set of targets that give the attacker the maximum utility against the defender strategy $X$. By the definition of SSE, we have
$$u(X, g(X)) = \max_{t_i \in T_a} u(X, t_i).$$
Consider a defender mixed strategy $X^*$ such that $E(X^*) = E^*$. Then for any $t_i \in T_a$, $v(X^*, t_i) \le E^*$. Consider the following vector $x'$:
$$x'_i = \begin{cases} x^*_i - \dfrac{E^* - v(X^*, t_i) + \epsilon}{\nu_i^u - \nu_i^c}, & t_i \in T_a, \quad \text{(5.3a)} \\ x^*_i, & t_i \notin T_a, \quad \text{(5.3b)} \end{cases}$$
where $\epsilon$ is an infinitesimal positive number. Since $E^* - v(X^*, t_i) + \epsilon > 0$, we have $x'_i < x^*_i$ for all $t_i \in T_a$. On the other hand, since for all $t_i \in T_a$,
$$v(x', t_i) = E^* + \epsilon < E(X) = v(X, t_i),$$
we have $x'_i > x_i \ge 0$. Then for any $t_i \in T$, we have $0 \le x'_i \le x^*_i$. From Lemma 6, there exists a defender strategy $X'$ corresponding to $x'$. The attacker’s utility of attacking each target is as follows:
$$v(X', t_i) = \begin{cases} E^* + \epsilon, & t_i \in T_a, \quad \text{(5.4a)} \\ v(X^*, t_i) \le E^*, & t_i \notin T_a. \quad \text{(5.4b)} \end{cases}$$
Thus, the attacker’s best responses to $X'$ are still $T_a$. For all $t_i \in T_a$, since $x'_i > x_i$, it must be the case that $u(X, t_i) < u(X', t_i)$. By definition of the attacker’s SSE response $g$, we have,
$$u(X', g(X')) = \max_{t_i \in T_a} u(X', t_i) > \max_{t_i \in T_a} u(X, t_i) = u(X, g(X)).$$
It follows that the defender is better off using $X'$, which contradicts the assumption that $X$ is an SSE strategy of the defender.
Theorems 5.1.1 and 5.1.4 together imply the following corollary.
Corollary 1. In security games with the SSAS property, any defender’s SSE strategy is also an NE strategy.
We can now answer the original question posed in this chapter: when there is uncertainty over the type of game played, should the defender choose an SSE strategy or a mixed strategy Nash equilibrium or some combination of the two? For domains that satisfy the SSAS property, we have proven that any of the defender’s SSE strategies is also an NE strategy.
Among our motivating domains, the LAX domain satisfies the SSAS property since all schedules are of size 1. Other patrolling domains, such as patrolling a port, also satisfy the SSAS property. In such domains, the defender could thus commit to an SSE strategy, which is also now known to be an NE strategy. The defender retains the ability to commit, but is still playing a best response to an attacker in a simultaneous-move setting (assuming the attacker plays an equilibrium strategy; it does not matter which one, due to the interchange property shown above). However, the FAMS domain does not naturally satisfy the SSAS property because marshals must fly complete tours (though in principle they could fly as civilians on some legs of a tour). The question of selecting SSE vs. NE strategies in this case is addressed experimentally in Section 5.3.
5.1.4 Uniqueness in Restricted Games
The previous sections show that SSE strategies are NE strategies in many cases. However, there may still be multiple equilibria to select from (though this difficulty is alleviated by the interchange property). Here I will prove an even stronger uniqueness result for ARMOR games, an important restricted class of security domains. In particular, I consider security games where the defender has homogeneous resources that can cover any single target. The SSAS property is trivially satisfied, since all schedules are of size 1 (and “stay home” is allowed). Any vector of coverage probabilities $x = (x_i)$ such that $\sum_{i=1}^{N} x_i \le \gamma$ is a feasible strategy for the defender, so we can represent the defender strategy by marginal coverage probabilities. With a minor restriction on the attacker’s payoff matrix, the defender always has a unique minimax strategy which is also the unique SSE and NE strategy. Furthermore, the attacker also has a unique NE response to this strategy.
Theorem 5.1.5. In an ARMOR game, if for every target $t_i \in T$, $\nu_i^c \ne E^*$, then the defender has a unique minimax, NE, and SSE strategy.
Proof. I first show the defender has a unique minimax strategy. Let $T^* = \{t_i \mid \nu_i^u \ge E^*\}$. Define $x^*$ as
$$x^*_i = \begin{cases} \dfrac{\nu_i^u - E^*}{\nu_i^u - \nu_i^c}, & t_i \in T^*, \quad \text{(5.5a)} \\ 0, & t_i \notin T^*. \quad \text{(5.5b)} \end{cases}$$
Note that $E^*$ cannot be less than any $\nu_i^c$; otherwise, regardless of the defender’s strategy, the attacker could always get at least $\nu_i^c > E^*$ by attacking $t_i$, which contradicts the fact that $E^*$ is the attacker’s best response utility to a defender’s minimax strategy. Since $E^* \ge \nu_i^c$ and we assume $E^* \ne \nu_i^c$,
$$1 - x^*_i = \frac{E^* - \nu_i^c}{\nu_i^u - \nu_i^c} > 0 \Rightarrow x^*_i < 1.$$
Next, we will prove $\sum_{i=1}^{N} x^*_i \ge \gamma$. For the sake of contradiction, suppose $\sum_{i=1}^{N} x^*_i < \gamma$. Consider $x'$ where $x'_i = x^*_i + \epsilon$. Since $x^*_i < 1$ and $\sum_{i=1}^{N} x^*_i < \gamma$, we can find $\epsilon > 0$ such that $x'_i < 1$ and $\sum_{i=1}^{N} x'_i < \gamma$. Then every target has strictly higher coverage in $x'$ than in $x^*$, hence $E(x') < E(x^*) = E^*$, which contradicts the fact that $E^*$ is the minimum of all $E(x)$.
Next, we show that if $x$ is a minimax strategy, then $x = x^*$. By the definition of a minimax strategy, $E(x) = E^*$. Hence, $v(x, t_i) \le E^* \Rightarrow x_i \ge x^*_i$. On the one hand $\sum_{i=1}^{N} x_i \le \gamma$ and on the other hand $\sum_{i=1}^{N} x_i \ge \sum_{i=1}^{N} x^*_i \ge \gamma$. Therefore it must be the case that $x_i = x^*_i$ for any $i$. Hence, $x^*$ is the unique minimax strategy of the defender.
Furthermore, by Theorem 5.1.1, we have that $x^*$ is the unique defender’s NE strategy. By Theorem 5.1.4 and the existence of SSE [Basar and Olsder, 1995], we have that $x^*$ is the unique defender’s SSE strategy.
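The closed form in the proof suggests a simple way to compute the minimax coverage numerically. The sketch below is not the author's method: it uses a bisection on the attacker's best-response utility, finding the smallest level at which the coverage needed to hold every target at or below that level fits within the resource budget. The function name, the `num_resources` parameter, and the assumption that every target's uncovered attacker payoff strictly exceeds its covered payoff are all illustrative.

```python
def armor_minimax(att_cov, att_unc, num_resources, iters=100):
    """Bisection sketch: return (E_star, coverage) for homogeneous single-target resources.

    att_cov[i] and att_unc[i] are the attacker's payoffs at target i when it is
    covered and uncovered; att_unc[i] > att_cov[i] is assumed for every target.
    """
    def need(level):
        # Coverage required so that attacking target i yields at most `level`.
        return [min(1.0, max(0.0, (u - level) / (u - c)))
                for c, u in zip(att_cov, att_unc)]

    lo, hi = max(att_cov), max(att_unc)   # E* cannot be below any covered payoff
    for _ in range(iters):
        mid = (lo + hi) / 2
        if sum(need(mid)) <= num_resources:
            hi = mid                      # feasible: try a lower attacker utility
        else:
            lo = mid
    return hi, need(hi)

# Attacker payoffs from Table 5.1 with one defender resource.
E_star, x_star = armor_minimax([0, 0, 0], [1, 1, 1], 1)
print(E_star, x_star)
```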
The restriction that $\nu_i^c \ne E^*$ is equivalent to $\nu_i^c < E^*$, implying that when a target is completely covered by the defender, it is never appealing to the attacker. It is a reasonable assumption because usually if the attacker knew that attacking a particular target would definitely get himself caught, he would never consider attacking it.
Theorem 5.1.6. In an ARMOR game, if for every target $t_i \in T$, $\nu_i^c \ne E^*$ and $\nu_i^u \ne E^*$, the attacker has a unique NE strategy.
Proof. Define $x^*$ and $T^*$ to be the same as in the proof of Theorem 5.1.5. Given the defender’s unique NE strategy $x^*$, in any attacker’s best response, only $t_i \in T^*$ can be attacked with positive probability, because,
$$v(x^*, t_i) = \begin{cases} E^*, & t_i \in T^* \quad \text{(5.6a)} \\ \nu_i^u < E^*, & t_i \notin T^* \quad \text{(5.6b)} \end{cases}$$
Suppose $\langle x^*, a \rangle$ forms an NE profile. We have
$$\sum_{t_i \in T^*} a_i = 1 \tag{5.7}$$
For any $t_i \in T^*$, we know from the proof of Theorem 5.1.5 that $x^*_i < 1$. In addition, because $\nu_i^u \ne E^*$, we have $x^*_i \ne 0$. Thus we have $0 < x^*_i < 1$ for any $t_i \in T^*$. For any $t_i, t_j \in T^*$, necessarily $a_i \Delta\mu_i = a_j \Delta\mu_j$. Otherwise, assume $a_i \Delta\mu_i > a_j \Delta\mu_j$. Consider another defender’s strategy $x'$ where $x'_i = x^*_i + \epsilon < 1$, $x'_j = x^*_j - \epsilon > 0$, and $x'_k = x^*_k$ for any $k \ne i, j$.
$$u(x', a) - u(x^*, a) = \epsilon (a_i \Delta\mu_i - a_j \Delta\mu_j) > 0$$
Hence, $x^*$ is not a best response to $a$, which contradicts the assumption that $\langle x^*, a \rangle$ is an NE profile. Therefore, there exists $\beta > 0$ such that, for any $t_i \in T^*$, $a_i \Delta\mu_i = \beta$. Substituting $a_i$ with $\beta / \Delta\mu_i$ in Equation (5.7), we have
$$\beta = \frac{1}{\sum_{t_i \in T^*} \dfrac{1}{\Delta\mu_i}}$$
Then we can explicitly write down $a$ as
$$a_i = \begin{cases} \dfrac{\beta}{\Delta\mu_i}, & t_i \in T^*, \quad \text{(5.8a)} \\ 0, & t_i \notin T^*. \quad \text{(5.8b)} \end{cases}$$
As we can see, $a$ defined by (5.8a) and (5.8b) is the unique attacker NE strategy.
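The closed form in (5.8a) and (5.8b) is straightforward to evaluate. The sketch below assumes the per-target defender payoff difference (covered minus uncovered, written `d_mu` here as an assumed name) is positive and that membership in the set of attackable targets has already been determined; the function name is illustrative.

```python
from fractions import Fraction as F

def attacker_ne(d_mu, in_T_star):
    """Attack probabilities that make the defender indifferent across attackable targets."""
    beta = 1 / sum(F(1) / m for m, keep in zip(d_mu, in_T_star) if keep)
    return [beta / m if keep else F(0) for m, keep in zip(d_mu, in_T_star)]

# Table 5.1: defender payoff differences are 1, 2, 3 and every target is attackable.
a = attacker_ne([1, 2, 3], [True, True, True])
print(a)
# The products a_i * d_mu_i coincide, so no shift of coverage helps the defender.
print([ai * m for ai, m in zip(a, [1, 2, 3])])
```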
The implication of Theorem 5.1.5 and Theorem 5.1.6 is that in the simultaneous-move game, there is a unique NE profile, which as a result gives each player a unique expected utility.
5.2 Multiple Attacker Resources
To this point I have assumed that the attacker will attack exactly one target. We now extend our security game definition to allow the attacker to use multiple resources to attack multiple targets simultaneously. To keep the model simple, I assume homogeneous resources (for both players) and schedules of size 1. The defender has $\gamma < N$ resources which can be assigned to protect any target, and the attacker has fewer than $N$ resources which can be used to attack any target. Attacking the same target with multiple resources is equivalent to attacking with a single resource. The defender’s pure strategy is again a coverage vector $d = (d_1, \ldots, d_N) \in D$, where $d_i \in \{0, 1\}$ represents whether $t_i$ is covered or not. Similarly, the attacker’s pure strategy is an attack vector $q = (q_1, \ldots, q_N) \in Q$. We have $\sum_{i=1}^{N} d_i = \gamma$, and $\sum_{i=1}^{N} q_i$ equals the number of attacker resources. If $\langle d, q \rangle$ is played, the defender gets a utility of
$$u(d, q) = \sum_{i=1}^{N} q_i \left( d_i \mu_i^c + (1 - d_i) \mu_i^u \right)$$
while the attacker’s utility is given by
$$v(d, q) = \sum_{i=1}^{N} q_i \left( d_i \nu_i^c + (1 - d_i) \nu_i^u \right)$$
The defender’s mixed strategy is again a vector $X$ which specifies the probability of playing each $d \in D$. Similarly, the attacker’s mixed strategy $A$ is a vector of probabilities corresponding to all $q \in Q$.
In security games with multiple attacker resources, the defender’s SSE strategy may not be part of any NE profile, even if there are no scheduling constraints, as shown in the following example.
                 t1            t2            t3
              Cov.  Unc.    Cov.  Unc.    Cov.  Unc.
Defender        0    -1    -100  -100       0     0
Attacker      100   100       0    10       5     5
Figure 5.3: A security game with multiple attacker resources where the defender’s SSE strategy is not an NE strategy.
Example 3. Consider the game shown in Figure 5.3. There are 3 targets $t_1, t_2, t_3$. The defender has 1 resource, and the attacker has 2 resources. Therefore the defender’s pure strategy space is the set of targets to protect: $\{t_1, t_2, t_3\}$, while the attacker’s pure strategy space consists of the pairs of targets: $\{(t_1, t_2), (t_1, t_3), (t_2, t_3)\}$. If the defender protects $t_1$ and the attacker attacks $(t_1, t_2)$, the defender’s utility is $\mu_1^c + \mu_2^u = -100$ and the attacker’s utility is $\nu_1^c + \nu_2^u = 110$. In this example, $t_1$ is very appealing to the attacker whether or not it is covered, so $t_1$ is always attacked. If $t_2$ is attacked, the defender gets a very low utility, even if $t_2$ is defended. So in the SSE, the defender wants to make sure that $t_2$ is not attacked. The defender’s SSE strategy places at least 0.5 probability on $t_2$, so that $t_1$ and $t_3$ are attacked instead of $t_2$ (recall that the attacker breaks ties in the defender’s favor in an SSE). The attacker’s SSE response is $A = (0, 1, 0)$, i.e., to always attack $t_1$ and $t_3$. The other 0.5 defense probability will be placed on $t_1$ because $\Delta\mu_1 > \Delta\mu_3$. So, the SSE profile is $\langle X, A \rangle$, where $X = (0.5, 0.5, 0)$.
Next, I show that there is no NE in which the defender plays $X$. Suppose there is an NE profile $\langle X, A' \rangle$. Given $X$, the attacker’s utility for attacking $t_1$ is higher than the utility for attacking $t_2$, so it must be that $t_1$ is always attacked in this NE. Therefore, the attacker never plays $(t_2, t_3)$. However, this implies that $t_1$ is the most appealing target for the defender to cover, because $u(t_1, A') > u(t_i, A')$, $i \in \{2, 3\}$. So to be a best response, the coverage of $t_1$ would need to be 1 instead of 0.5, contradicting the assumption that $X$ is an equilibrium strategy for the defender.
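The reasoning in Example 3 can be replayed numerically. The sketch below is illustrative: it assumes the defender payoffs in Figure 5.3 are $0$ and $-1$ at $t_1$, $-100$ and $-100$ at $t_2$, and $0$ and $0$ at $t_3$, computes the attacker's expected utility for each pair of targets against $X = (0.5, 0.5, 0)$, and then checks that covering $t_1$ is strictly best for the defender against every attacker mix over the two pairs that contain $t_1$.

```python
from fractions import Fraction as F
from itertools import combinations

# Payoffs from Figure 5.3 as (covered, uncovered) pairs for t1, t2, t3.
DEF = [(0, -1), (-100, -100), (0, 0)]
ATT = [(100, 100), (0, 10), (5, 5)]
ATTACKS = list(combinations(range(3), 2))    # (t1,t2), (t1,t3), (t2,t3) as index pairs

def pure_payoff(covered, attack, table):
    """Payoff summed over attacked targets when the single resource covers `covered`."""
    return sum(table[t][0] if t == covered else table[t][1] for t in attack)

def expected(X, attack, table):
    """Expected payoff of an attack when the defender covers target c with probability X[c]."""
    return sum(p * pure_payoff(c, attack, table) for c, p in enumerate(X))

def defender_pure_values(A):
    """Defender's expected utility of covering each target against attacker mix A."""
    return [sum(a * pure_payoff(c, q, DEF) for q, a in zip(ATTACKS, A))
            for c in range(3)]

X = [F(1, 2), F(1, 2), F(0)]
att_values = {q: expected(X, q, ATT) for q in ATTACKS}
print(att_values)

# Any attacker best response to X mixes only over (t1,t2) and (t1,t3).
for k in range(11):
    w = F(k, 10)
    vals = defender_pure_values([w, 1 - w, F(0)])
    assert vals[0] > vals[1] and vals[0] > vals[2]   # covering t1 is strictly best
```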
5.3 Experimental Results
While the theoretical results presented earlier resolve the leader’s dilemma for many interesting classes of security games, as we have seen, there are still some cases where SSE strategies are distinct from NE strategies for the defender. One case is when the schedules do not satisfy the SSAS property, and another is when the attacker has multiple resources. In this section, I provide experiments to further investigate these two cases, offering evidence about the frequency with which SSE strategies differ from all NE strategies across randomly generated games, for a variety of parameter settings.
The methodology used is as follows. For a particular game instance, I first compute an SSE strategy $X$ using Dobss [Paruchuri et al., 2008]. Then I use the linear feasibility program below to determine whether or not this SSE strategy is part of some NE profile by attempting to find an appropriate attacker response strategy.
$$A_q \in [0, 1], \quad \text{for all } q \in Q \tag{5.9}$$
$$\sum_{q \in Q} A_q = 1 \tag{5.10}$$
$$A_q = 0, \quad \text{for all } q \text{ with } v(X, q) < E(X) \tag{5.11}$$
$$\sum_{q \in Q} A_q u(d, q) \le Z, \quad \text{for all } d \in D \tag{5.12}$$
$$\sum_{q \in Q} A_q u(d, q) = Z, \quad \text{for all } d \in D \text{ with } X_d > 0 \tag{5.13}$$
Here $Q$ is again the set of attacker pure strategies, which is the set of targets when the attacker has a single resource. The probability that the attacker plays $q$ is denoted by $A_q$, which must be between 0 and 1 (Constraint (5.9)). Constraint (5.10) forces the probabilities to sum to 1. Constraint (5.11) prevents the attacker from placing positive probabilities on pure strategies which give the attacker a utility less than the best response utility $E(X)$. In constraints (5.12) and (5.13), $Z$ is a variable which represents the maximum expected utility the defender can get among all pure strategies given the attacker’s strategy $A$, and $X_d$ denotes the probability of playing $d$ in $X$. These two constraints require the defender’s strategy $X$ to be a best response to the attacker’s mixed strategy. Therefore, any feasible solution $A$ to this linear feasibility program, taken together with the Stackelberg strategy $X$, constitutes a Nash equilibrium. Conversely, if $\langle X, A \rangle$ is a Nash equilibrium, $A$ must satisfy all of the LP constraints.
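To make the role of each constraint concrete, the following sketch checks whether a given candidate attacker strategy satisfies (5.9) through (5.13) for a fixed defender strategy. It is a verifier, not a solver: finding a feasible $A$ in general requires an LP solver, which is not shown here. The payoff matrices `U[d][q]` and `V[d][q]` (defender and attacker utilities for pure strategy pairs) and the function name are assumptions made for illustration.

```python
from fractions import Fraction as F

def satisfies_ne_constraints(X, A, U, V):
    """Check constraints (5.9)-(5.13) for defender mix X and candidate attacker mix A."""
    n_d, n_q = len(X), len(A)
    # Attacker's expected utility v(X, q) for each pure attack q.
    att = [sum(X[d] * V[d][q] for d in range(n_d)) for q in range(n_q)]
    # Defender's expected utility of each pure strategy d against A.
    dfn = [sum(A[q] * U[d][q] for q in range(n_q)) for d in range(n_d)]
    Z = max(dfn)   # smallest Z that satisfies (5.12) for every d
    return (all(0 <= a <= 1 for a in A)                                   # (5.9)
            and sum(A) == 1                                               # (5.10)
            and all(A[q] == 0 for q in range(n_q) if att[q] < max(att))   # (5.11)
            and all(dfn[d] == Z for d in range(n_d) if X[d] > 0))         # (5.13)

# Figure 5.1 with pure strategies "cover t1" / "cover t2" and "attack t1" / "attack t2".
U = [[1, 0], [0, 2]]
V = [[1, 1], [2, 0]]
X = [F(1), F(0)]
for A in ([F(1), F(0)], [F(2, 3), F(1, 3)], [F(0), F(1)]):
    print(A, satisfies_ne_constraints(X, A, U, V))
```

Choosing $Z$ as the maximum over pure strategies makes (5.12) hold automatically, so only the equality (5.13) on the support of $X$ needs to be tested.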
In this set of experiments, I varied:
the number of attacker resources,
the number of (homogeneous) defender resources,
the size of the schedules that resources can cover,
the number of schedules.
For each parameter setting, I generated 1000 games with 10 targets. For each target $t_i$, a pair of defender payoffs $(\mu_i^c, \mu_i^u)$ and a pair of attacker payoffs $(\nu_i^u, \nu_i^c)$ were drawn uniformly at random from the set $\{(x, y) \in \mathbb{Z}^2 : x \in [-10, +10],\ y \in [-10, +10],\ x > y\}$. In each game in the experiment, all of the schedules have the same size, except there is also always the empty schedule; assigning a resource to the empty schedule corresponds to the resource not being used. The schedules are randomly chosen from the set of all subsets of the targets that have the size specified by the corresponding parameter.
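A game generator along these lines might look as follows. This is a sketch under stated assumptions rather than the code used for the experiments: integer payoffs are drawn from the range $-10$ to $10$ by rejection sampling, schedules are sampled independently (so duplicates are possible, a detail the text leaves open), and the function and field names are illustrative.

```python
import random

def draw_pair(rng):
    """Uniform draw from integer pairs (x, y) with -10 <= y < x <= 10, by rejection."""
    while True:
        x, y = rng.randint(-10, 10), rng.randint(-10, 10)
        if x > y:
            return x, y

def random_game(n_targets=10, n_schedules=5, schedule_size=2, seed=None):
    """Generate payoffs and schedules for one random security game."""
    rng = random.Random(seed)
    targets = range(n_targets)
    return {
        # Defender pairs are (covered, uncovered): covered is the larger value.
        "defender": [draw_pair(rng) for _ in targets],
        # Attacker pairs are (uncovered, covered): uncovered is the larger value.
        "attacker": [draw_pair(rng) for _ in targets],
        # The empty schedule is always available, plus equally sized random schedules.
        "schedules": [frozenset()] + [frozenset(rng.sample(targets, schedule_size))
                                      for _ in range(n_schedules)],
    }

game = random_game(seed=0)
print(game["defender"][:3], game["schedules"][:3])
```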
The results of our experiments are shown in Figure 5.4. The plots show the percentage of games in which the SSE strategy is not an NE strategy, for different numbers of defender and attacker resources, different schedule sizes, and different numbers of schedules. Each row corresponds to a different number of attacker resources, and each column to a different schedule size. The number of defender resources is on the x-axis, and each number of schedules is plotted separately. For each parameter setting, 1000 random games with 10 targets were generated. The SSAS property holds in the games with schedule size 1 (shown in column 1); SSAS does not hold in the games with schedule sizes 2 and 3 (columns 2 and 3). For the case where there is a single attacker resource and schedules have size 1, the SSAS property holds, and the experimental results confirm the theoretical result that the SSE strategy is always an NE strategy. If we increase either the number of attacker resources or the schedule size, then the theoretical result no longer holds, and indeed we start to see cases where the SSE strategy is not an NE strategy.
Figure 5.4: The number of games in which the SSE strategy is not an NE strategy, for different parameter settings.
Let us first consider the effect of increasing the number of attacker resources. We can see that the number of games in which the defender’s SSE strategy is not an NE strategy increased significantly as the number of attacker resources increased, especially as it went from 1 to 2 (note the different scales on the y-axes). In fact, when there were 2 or 3 attacker resources, the phenomenon that in many cases the SSE strategy is not an NE strategy was consistent across a wide range of values for the other parameters.
Now, let us consider the effect of increasing the schedule size. When the schedule size (with a single attacker resource) increases, the SSAS property no longer holds, and so there exist some games where the SSE strategy is not an NE strategy, but the percentage of such games was generally small (< 6%). Also, as more random schedules were generated, the number of games where the SSE strategy is not an NE strategy dropped to zero. This is particularly encouraging for domains like FAMS, where the schedule sizes are relatively small (2 in most cases), and the number of possible schedules is large relative to the number of targets. The effect of increasing the number of defender resources is more ambiguous. When there are multiple attacker resources, increasing the schedule size sometimes increases and sometimes decreases the number of games where the SSE strategy is not an NE strategy.
The main message to take away from the experimental results appears to be that for the case of a single attacker resource, SSE strategies are usually also NE strategies even when SSAS does not hold, which appears to further justify the practice of playing an SSE strategy. On the other hand, when there are multiple attacker resources, there are generally many cases where the SSE strategy is not an NE strategy. This strongly poses the question of what should be done in the case of multiple attacker resources (in settings where it is not clear whether the attacker can observe the defender’s mixed strategy). While a formal answer to this question is out of the scope of this thesis, it has been studied extensively in the literature. In particular, security games with multiple attacker resources were studied in [Korzhyk et al., 2011b]. Korzhyk et al. also provided an extensive-form game formulation to further address the defender’s dilemma in security games where the attacker’s observability is uncertain and the defender’s SSE strategy is not an NE strategy [Korzhyk et al., 2011a].
Chapter 6: Patrolling in Transit Systems
This chapter presents game-theoretic models for patrolling in transit systems, where timing is an
integral part of what determines the effectiveness of patrol schedules, in addition to the set of
targets being covered. For example, trains, buses and ferries follow specific schedules, and in
order to protect them the patroller needs to be at the right place at the right time. As introduced
earlier in Section 2.1, a motivating example that I will focus on is the problem of scheduling
randomized ticket inspections for fare evasion deterrence in the Los Angeles Metro Rail system.
Patrolling in transit systems introduces significant new challenges in designing and implementing
game-theoretic models. First, there can be exponentially many feasible patrols, which
are sequences of patrol actions subject to both the spatial and temporal constraints of
travel within the underlying system. There can be trillions of feasible ticket inspection patrols in
complex real-world train systems. Second, there are potentially a large number of opponents. The
Los Angeles Metro Rail system serves 300,000 riders every day, of whom approximately 6%
evade fares [Booz Allen Hamilton, 2007]. These potential fare evaders are of many types,
each corresponding to a unique travel pattern. Finally, execution uncertainty (errors, emergencies,
noise, etc.) in transit systems can affect the defender units’ ability to carry out their planned
schedules in later time steps. Patrols in a train system may be interrupted for a variety of reasons
such as writing citations, felony arrests, and handling emergencies. Such interruptions can cause
the officers to miss the train that they were supposed to take and void the rest of the schedule.
To address the three challenges mentioned above, I provide new game-theoretic models with
a focus on the problem of fare evasion deterrence in transit systems. The result of this investigation
is a novel application called TRUSTS (Tactical Randomization for Urban Security in Transit
Systems), for fare evasion deterrence in urban transit systems, carried out in collaboration with
the Los Angeles Sheriff’s Department (LASD). TRUSTS models this problem as a Stackelberg
game with one leader (the LASD) and many followers, in which each metro rider (a follower)
takes a fixed route at a fixed time. The leader precommits to a mixed patrol strategy (a probability
distribution over all pure patrols), and riders observe this mixed strategy before deciding
whether to buy the ticket or not (the decision to ride having already been made), in order to minimize
their expected total cost, following for simplicity the classic economic analysis of rational
crime [Becker and Landes, 1974]. Both ticket sales and fines issued for fare evasion translate
into revenue to the government. Therefore the optimization objective chosen for the leader is to
maximize total revenue (total ticket sales plus penalties).
The remainder of this chapter is divided into three parts. In Section 6.1, I introduce the first
generation of TRUSTS (TRUSTSv1), assuming the leader has perfect execution. TRUSTSv1
uses the transition graph, which captures the spatial as well as temporal structure of the domain,
and solves for the optimal (fractional) flow through this graph, using linear programming (LP).
Such a flow can be interpreted as a marginal coverage vector from which a mixed strategy of feasible
patrols can be extracted. Additionally, I show that a straightforward approach to extracting
patrol strategies from the marginals faces important challenges: it can create infeasible patrols
that violate the constraint on patrol length, and it can generate patrols that switch too frequently
between trains, which can be difficult for patrol personnel to carry out. Thus, I present a novel
technique to overcome these difficulties using an extended formulation on a history-duplicate
transition graph that allows us to specify constraints and preferences on individual patrols.
In Section 6.2, I present TRUSTSv2, generalizing and extending TRUSTSv1 to model execution
uncertainty and provide automatic contingency plans when the patrol officer deviates from
the original schedule. TRUSTSv2 models execution uncertainty using Markov Decision Processes.
Computing a Stackelberg equilibrium for this game presents significant computational challenges
due to the exponential dimension of the defender’s strategy space. I show that when the utility
functions have a certain separable structure, the leader’s strategy space can be compactly
represented. As a result the problem can be reduced to a polynomial-sized optimization problem,
solvable by existing approaches for Bayesian Stackelberg games such as DOBSS and HUNTER.
Furthermore, I show that from the compactly represented solution we can generate randomized patrol
schedules with contingency plans. Such contingency plans can be implemented as a smartphone
app carried by patrol units, or as a communication protocol with a central operator. Finally, I will
show how this approach can be applied to the fare evasion deterrence problem and provide details
of a smartphone app that facilitates the deployment of TRUSTSv2 in the Los Angeles Metro
System.
In Section 6.3, I provide simulation results of TRUSTSv1 and TRUSTSv2 based on actual
ridership data provided by the LASD, for four LA Metro train lines (Blue, Gold, Green, and Red).
My simulation results on TRUSTSv1 suggest the possibility of significant fare evasion deterrence
and hence prevention of revenue loss with very few resources. Field trials conducted by the LASD
using patrol schedules generated by TRUSTSv1 show encouraging results but also reveal serious
issues. While TRUSTSv1 schedules were more effective in catching fare evaders, they were
vulnerable to execution errors and often got interrupted and abandoned before completion. My
further simulation results show that execution uncertainty has a significant impact on revenue and
TRUSTSv2 significantly outperforms TRUSTSv1 in the presence of execution uncertainty.
6.1 TRUSTSv1: Deterministic Model for Perfect Execution
In this section, I introduce TRUSTSv1, a deterministic game-theoretic model assuming the
defender’s patrol actions are always executed perfectly. A formal problem setting for the fare
evasion deterrence problem is given in Section 6.1.1. Section 6.1.2 introduces the basic and extended
linear program formulations for solving the problem.
6.1.1 Formal Model
We model this problem as a leader-follower Stackelberg game with one leader (the LASD) and
multiple followers (riders). In this game, a pure leader strategy is a patrol, i.e., a sequence of
patrol actions (defined below), of constant bounded duration. The two possible pure follower
strategies are buying and not buying. Each follower observes the strategy the leader commits to
and plays a best response. There are many types of followers, one for each source, destination,
and departure time triple (corresponding to the set of all riders who take such a trip). In general
the leader’s strategies will be mixed; the followers are assumed to play pure strategies [Conitzer
and Sandholm, 2006].
Train System: The train system consists of a single line on which trains travel back and forth,
in general with multiple trains traveling simultaneously. The system operates according to a fixed
daily schedule, with trains arriving at stations at (finitely many) designated times throughout the
day. Therefore we can model time as discrete, focusing only on the time steps at which some
train arrival/departure event occurs. We use the (directed) transition graph $G = \langle V, E \rangle$ to encode
the daily timetable of the metro line, where a vertex $v = \langle l, \tau \rangle$ corresponds to some pair of
location (train station) $l$ and time point $\tau$. An edge in $G$ represents a possible (minimal) action.
In particular, there is an edge from $\langle l, \tau \rangle$ to $\langle l', \tau' \rangle$ if:
• $l'$ is either the predecessor or successor of $l$ in the station sequence and $\langle l, \tau \rangle$ and
  $\langle l', \tau' \rangle$ are two consecutive stops for some train in the train schedule (traveling action), or
• $l' = l$, $\tau < \tau'$, and there is no vertex $\langle l, \tau'' \rangle$ with $\tau < \tau'' < \tau'$ (staying action).
We refer to the entire path that a given train takes through $G$, from the start station to the terminal
station, as a train path. For simplicity, this model assumes a single train line in the system;
however, the solution methods presented in this thesis are applicable to extensions of multiple
intersecting lines with transfer points.
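The two edge rules above can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `build_transition_graph`, the representation of a train path as a list of `(station, time)` stops, and the four-train timetable are assumptions made for the example, not part of the TRUSTS implementation.

```python
from collections import defaultdict


def build_transition_graph(train_paths):
    """Build vertices and labeled edges from a list of train paths.

    Each train path is a list of (station, time) stops in travel order.
    """
    vertices = set()
    edges = set()
    # Traveling actions: consecutive stops of the same train.
    for stops in train_paths:
        vertices.update(stops)
        for u, v in zip(stops, stops[1:]):
            edges.add((u, v, "travel"))
    # Staying actions: consecutive time points at the same station,
    # with no other vertex of that station in between.
    times_at = defaultdict(list)
    for station, time in vertices:
        times_at[station].append(time)
    for station, times in times_at.items():
        times.sort()
        for t0, t1 in zip(times, times[1:]):
            edges.add(((station, t0), (station, t1), "stay"))
    return vertices, edges


# Hypothetical timetable: four trains on a three-station line (hours, 24h clock).
TRAINS = [
    [("A", 18), ("B", 19), ("C", 20)],
    [("C", 18), ("B", 19), ("A", 20)],
    [("A", 19), ("B", 20), ("C", 21)],
    [("C", 19), ("B", 20), ("A", 21)],
]
VERTICES, EDGES = build_transition_graph(TRAINS)
```

Only vertices at which some train event occurs are created, so a station with events at 7pm and 8pm gets a single staying edge between them.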
Patrols: There are a fixed number $\gamma$ of deployable patrol units, each of which may be scheduled
on a patrol of duration at most $\kappa$ hours. There are two sorts of patrol actions, which a given
patrol unit can alternate between on its shift: on-train inspections (in which patrollers ride the
train, inspecting passengers), and in-station inspections (in which they inspect passengers as they
exit the station). A pure patrol strategy is represented mathematically as a path in $G$ for each
patrol unit, in which an edge $e$ represents an atomic patrol action, i.e., inspecting in-station from
the time of one train event at that station to the next (at that station) or inspecting on-train as it
travels from one station to the next. Each edge $e$ has a duration $\delta_e$ equal to the corresponding
patrol action duration and an effectiveness value $f_e$, which represents the percentage of the relevant
ridership inspected by this action. For both in-station and on-train inspections, $f_e$ depends on the
ridership volume at that location and time of day and on the duration. A valid pure patrol strategy
is a set of paths $P_1, \ldots, P_\gamma$, each of size at most $\kappa$, i.e., $\sum_{e \in P_i} \delta_e \le \kappa$.
Example 4. A simple scenario with 3 stations ($A$, $B$, $C$) and 4 discrete time points (6pm, 7pm,
8pm, 9pm) is given in Figure 6.1. The dashed lines represent staying actions; the solid lines
represent traveling actions. There are 4 trains in the system; all edge durations are 1 hour. A
sample train path here is $\langle A, \text{6pm} \rangle \to \langle B, \text{7pm} \rangle \to \langle C, \text{8pm} \rangle$. In this example, if $\kappa = 2$ and
$\gamma = 1$, then the valid pure leader strategies (pure patrol strategies) consist of all paths of length 2.
[Figure: transition graph with one vertex per station (A, B, C) and time point (6PM, 7PM, 8PM, 9PM).]
Figure 6.1: The transition graph of a toy problem instance.
Riders: The riders are assumed to be daily commuters who take a fixed route at a fixed time.
Horizon Research Corporation [2002] suggests that more than 82% of riders use the system at least 3
days a week. A rider’s type is therefore defined by the path he takes in the graph. Because there
is a single train line, riders never linger in stations, i.e., do not follow any “stay” edges (staying
at a station) mid-journey; the last edge of every follower type is a (short) stay edge, representing
the action of “exiting” the destination station, during which the rider may be subject to in-station
inspection. Therefore the space of rider types corresponds to the set of all subpaths of train
paths. (When $G$ is drawn as in Figure 6.1, all rider paths are “diagonal” except for the last edge.)
The ticket price that a rider of type $\lambda$ pays is a nominal fee $\rho_\lambda$, with the fine for fare evasion
$\varrho$ much greater. As the riders follow the same route every day, they can estimate the likelihood
of being inspected. The ticket cost is fixed but the possibility of being caught and fined for fare
evasion is uncertain; based on the likelihood of being caught, the rider must decide whether to buy
a ticket. We assume the riders know the inspection probability perfectly, and are rational, risk-neutral
economic actors [Becker and Landes, 1974], who make this choice in order to minimize expected
cost. (Equivalently, we can assume that some riders are conscientious, but that selfish or rational
riders are distributed evenly among all passenger types.)
Given a pure patrol strategy of the $\gamma$ units, $(P_1, \ldots, P_\gamma)$, the inspection probability for a rider
of type $\lambda \in \Lambda$ is:
$$\min\Big\{1, \sum_{i=1}^{\gamma} \sum_{e \in P_i \cap \lambda} f_e\Big\}, \qquad (6.1)$$
and therefore his expected utility is the negative of the expected amount he pays: $-\rho_\lambda$ if he
buys the ticket and $-\varrho \cdot \min\{1, \sum_{i=1}^{\gamma} \sum_{e \in P_i \cap \lambda} f_e\}$ otherwise. The inspection probability for a mixed
strategy is then the expectation of Equation (6.1), taken over the distribution of pure strategies.
We justify the inspection probability in Equation (6.1) as follows. First, consider on-train
inspections. The fraction of the train that is inspected in a given inspection action is determined
by $f_e$ (which depends on ridership volume). The key is that in the next inspection action, a
patrol will not reinspect the fraction of the train that is already inspected in a previous inspection
action. Therefore, unlike in settings where patrollers may repeatedly draw a random sample
from the same set of train passengers to inspect, in our setting, the probabilities $f_e$ are added rather
than multiplied. Now also consider in-station inspections. Since a rider taking a journey only
exits a single station, a rider will encounter at most one in-station inspection. Finally, when
multiple patrol units cover the same edge $e$, the inspection probability given by (6.1) is the sum
of the contributions from each patrol unit, capped at 1. This is a reasonable assumption when the
number of patrol units on each edge $e$ is small, as multiple patrol units on the same train could
check different cars or different portions of the same car, and multiple patrol units inspecting at
the same station could be checking different exits.
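As a rough illustration of Equation (6.1) and the rider's decision, the following Python sketch computes the capped, additive inspection probability for a pure strategy, its expectation under a mixed strategy, and the cost-minimizing choice of a risk-neutral rider. The function names, the string edge identifiers, and the numeric values are assumptions chosen for the example. Ties are broken toward buying here; the leader's revenue is the same either way.

```python
def inspection_probability(patrols, rider_edges, f):
    """Equation (6.1): sum f_e over patrolled edges on the rider's path, capped at 1."""
    total = sum(f[e] for path in patrols for e in path if e in rider_edges)
    return min(1.0, total)


def mixed_inspection_probability(mixed, rider_edges, f):
    """Expectation of Equation (6.1) over (probability, pure strategy) pairs."""
    return sum(prob * inspection_probability(patrols, rider_edges, f)
               for prob, patrols in mixed)


def rider_best_response(inspect_prob, ticket_price, fine):
    """Return the rider's cost-minimizing action and the expected amount paid."""
    expected_fine = fine * inspect_prob
    if ticket_price <= expected_fine:
        return "buy", ticket_price
    return "not buy", expected_fine


# Hypothetical effectiveness values and rider path.
F = {"e1": 0.25, "e2": 0.5, "e3": 0.25}
RIDER = {"e1", "e2"}

ONE_UNIT = [["e1", "e2", "e3"]]        # one patrol unit covering e1, e2, e3
TWO_UNITS = [["e2"], ["e1", "e2"]]     # two units, both covering e2
MIXED = [(0.5, ONE_UNIT), (0.5, [["e3"]])]
```

With these values, the single unit inspects the rider with probability 0.75, while the two-unit strategy sums to 1.25 and is capped at 1.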
Objective: The leader’s utility, equal to total expected revenue, can be decomposed into
utilities from bilateral interactions with each individual follower. Hence the game is equivalent to
a Bayesian Stackelberg game between one leader with one type and one follower with multiple
types. Specifically, we denote the prior probability of a follower type $\lambda \in \Lambda$ (proportional to its
ridership volume) by $p_\lambda$.
Furthermore, these utility functions imply that the game is zero-sum, in which case the Stackelberg
equilibrium is equivalent to the maximin solution. Although such zero-sum Bayesian
games are solvable by either applying the LP formulation of [Ponssard and Sorin, 1980] or treating
the Bayesian game as an extensive-form game and applying the sequence form LP formulation
of [Koller et al., 1994], those LP formulations would be impractical here because they explicitly
enumerate the exponential number of pure strategies of the leader.
6.1.2 Approach
In this section, I formulate a linear program which finds a maximum-revenue (mixed) patrol
strategy. As noted above, the leader’s space of pure strategies is exponentially large, even with a
single patrol unit. (A pure strategy consists of $\gamma$ pure patrol strategies, one for each patrol unit.)
We avoid this difficulty by compactly representing mixed patrol strategies by marginal coverage
on edges $x_e$ of the transition graph (the marginal strategy), i.e., by the expected numbers of
inspections that will occur on these edges. Subsequently, we construct a mixed strategy (i.e., a
probability distribution over pure strategies) consistent with the marginal coverage.
For expository purposes, I will first present a basic LP formulation based on the compactly
represented strategy. This basic formulation, however, may generate infeasible patrols due to a
couple of key issues. I will then introduce an extended formulation to address these issues.
6.1.2.1 Basic Formulation
We denote the set of possible starting vertices in the transition graph $G = \langle V, E \rangle$ by $V^+ \subseteq V$,
and the set of possible ending vertices by $V^- \subseteq V$. For algorithmic convenience, we add to the
transition graph a source $v^+$ with edges to all vertices in $V^+$ and a sink $v^-$ with edges from all
vertices in $V^-$. We assign these additional dummy edges zero duration and zero effectiveness.
Based on this graph, we provide a linear program (shown in Figure 6.2) that gives an upper
bound on the optimal revenue achievable. Here $u_\lambda$ denotes the expected value paid by a rider
of type $\lambda$, and so $p_\lambda u_\lambda$ is the expected total revenue from riders of this type; $x_e$ is the expected
number of inspections on edge $e$. Constraint (6.4) bounds the total flow entering and exiting the
system by $\gamma$, the number of total patrol units allowed. Constraint (6.5) enforces conservation
of flow, which clearly is satisfied by any mixed patrol strategy. Constraint (6.6) limits the total
number of time units to $\gamma \cdot \kappa$, and also bounds $x_e$ for each $e$ by $\gamma$.

\begin{align}
\max_{x,u} \quad & \sum_{\lambda \in \Lambda} p_\lambda u_\lambda \tag{6.2} \\
\text{s.t.} \quad & u_\lambda \le \min\Big\{\rho_\lambda,\ \varrho \sum_{e \in \lambda} x_e f_e\Big\}, \quad \text{for all } \lambda \in \Lambda \tag{6.3} \\
& \sum_{v \in V^+} x_{(v^+, v)} = \sum_{v \in V^-} x_{(v, v^-)} \le \gamma \tag{6.4} \\
& \sum_{(v', v) \in E} x_{(v', v)} = \sum_{(v, v^\dagger) \in E} x_{(v, v^\dagger)}, \quad \text{for all } v \in V \tag{6.5} \\
& \sum_{e \in E} \delta_e x_e \le \gamma \cdot \kappa, \quad 0 \le x_e \le \gamma,\ \forall e \in E \tag{6.6}
\end{align}
Figure 6.2: Basic Formulation
Finally, let us consider Constraint (6.3), which indicates that the rider will best respond,
by bounding the expected cost to a rider of type $\lambda$ by both the ticket price $\rho_\lambda$ and
$\varrho \cdot \min\{1, \sum_{e \in \lambda} x_e f_e\} = \min\{\varrho, \varrho \sum_{e \in \lambda} x_e f_e\}$, the formulation’s estimate of the expected fine if
the rider chooses not to buy. However, the latter is only an overestimate of the actual expected
fine of not buying. This is because the expression $\min\{1, \sum_{e \in \lambda} x_e f_e\}$ only caps the expectation
(over its pure strategies) of the inspection probability at 1, but allows a pure strategy $(P_1, \ldots, P_\gamma)$
in its support to achieve $\sum_{i=1}^{\gamma} \sum_{e \in P_i \cap \lambda} f_e > 1$, whereas according to (6.1) the inspection probability
of each pure strategy should be at most 1. This results in an overestimate of the actual inspection
probability (and thus the leader’s utility). As a result the solution of this LP provides only an upper
bound on the optimal revenue. Fortunately, once we generate the patrols from the marginals
we are able to compute the actual best-response utilities of the riders. Our experiments show that
the differences between the actual utilities and the upper bounds given by the LP formulation are
small. The remaining task is to construct a $\gamma$-unit mixed patrol strategy whose marginals match
the marginal strategy $x$.
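The gap between the LP's estimate and the actual inspection probability can be seen on a tiny numeric example. The Python sketch below is illustrative only; the function names, edge identifiers, and effectiveness values are assumptions. It compares the cap applied to the marginals (as in Constraint (6.3)) with the cap applied to each pure strategy (as in Equation (6.1)).

```python
def lp_estimate(mixed, rider_edges, f):
    """Cap applied to the marginals: min{1, sum_e x_e * f_e}."""
    x = {}
    for prob, patrols in mixed:
        for path in patrols:
            for e in path:
                # Marginal coverage: expected number of inspections on edge e.
                x[e] = x.get(e, 0.0) + prob
    return min(1.0, sum(x.get(e, 0.0) * f[e] for e in rider_edges))


def actual_probability(mixed, rider_edges, f):
    """Cap applied per pure strategy, then averaged over the mixed strategy."""
    return sum(
        prob * min(1.0, sum(f[e] for path in patrols for e in path
                            if e in rider_edges))
        for prob, patrols in mixed
    )


# Hypothetical instance: one patrol unit, two highly effective edges.
F = {"e1": 0.75, "e2": 0.75}
RIDER = {"e1", "e2"}
# With probability 0.5 the unit covers both edges; otherwise it covers neither.
MIXED = [(0.5, [["e1", "e2"]]), (0.5, [[]])]
```

Here the pure strategy covering both edges sums to 1.5 and is capped at 1, so the actual probability is 0.5, while the marginal-based estimate is 0.75.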
Proposition 4. Given a marginal strategy $x$, a $\gamma$-unit mixed strategy for the leader that produces
the same coverage on each edge $e$ as $x$ does can be constructed in polynomial time.
Proof. First, we construct a set of weighted patrol paths, by extracting distinct source-to-sink
flows from $x$ through the following iterative procedure.
1. Find a path $P$ from $v^+$ to $v^-$ where $x_e > 0$ for all $e \in P$. If no such path exists, terminate
   because $x_e$ must then be 0 for all $e \in E$ (due to Constraint (6.5)). Otherwise go to step 2.
2. Let $x^* = \min_{e \in P} \{x_e\}$. Add path $P$ with weight $x^*$ to the set of weighted patrol paths.
   Deduct $x^*$ from $x_e$ for all $e \in P$. Go to step 1.
Since every iteration removes a complete source-to-sink flow, Constraint (6.5) is maintained
throughout the execution of this procedure. The procedure’s running time is polynomial because
at least one new $x_e$ is set to 0 in each iteration.
Finally, we create a mixed strategy of joint patrol paths (with $\gamma$ units) that matches exactly the
set of weighted patrol paths obtained in the procedure above, and thus the marginal strategy $x$.
To do this, we could assign a path of weight $x^*$ to the $\gamma$ units independently, each with an equal
probability of $x^*/\gamma$. Since $x^* \le \gamma$, we have $x^*/\gamma \le 1$.
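The iterative procedure in the proof can be sketched as follows. This is a minimal Python sketch under stated assumptions: the marginal strategy is given as a dictionary from `(u, v)` edges to flow values, the graph is acyclic, flow conservation holds, and the names `decompose_flow` and `find_path` as well as the sample marginals are hypothetical.

```python
def find_path(x, source, sink, eps):
    """Follow positive-flow edges from source; return a list of edges or None.

    Assumes an acyclic graph with flow conservation, so any walk along
    positive-flow edges from the source eventually reaches the sink.
    """
    out = {}
    for (u, v), value in x.items():
        if value > eps:
            out.setdefault(u, []).append(v)
    path, node = [], source
    while node != sink:
        if node not in out:
            return None
        nxt = out[node][0]
        path.append((node, nxt))
        node = nxt
    return path


def decompose_flow(x, source, sink, eps=1e-9):
    """Extract weighted source-to-sink paths whose weights add up to x."""
    x = dict(x)  # work on a copy
    weighted_paths = []
    while True:
        path = find_path(x, source, sink, eps)
        if path is None:
            break
        weight = min(x[e] for e in path)   # bottleneck flow on the path
        weighted_paths.append((path, weight))
        for e in path:
            x[e] -= weight                 # at least one edge drops to 0
    return weighted_paths


# Hypothetical marginal strategy on five vertices.
X = {
    ("v+", "v1"): 0.5, ("v1", "v2"): 0.5, ("v2", "v3"): 0.5,
    ("v+", "v3"): 0.5, ("v3", "v-"): 1.0,
}
PATHS = decompose_flow(X, "v+", "v-")
```

Each of the $\gamma$ units can then be assigned a path of weight $x^*$ with probability $x^*/\gamma$, as in the last step of the proof.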
6.1.2.2 Issues with the Basic Formulation
There are two fundamental issues with the basic formulation. First, the mixed strategy constructed
can fail to satisfy the patrol length limit of $\kappa$, notwithstanding Constraint (6.6) on the sum of the
lengths of all patrols, and hence be infeasible. In fact, the marginal strategy computed in the
basic formulation may not correspond to any feasible mixed strategy in which all patrols have
length at most $\kappa$. Consider the counterexample in Figure 6.3. Edges $v_1 \to v_2$ and $v_2 \to v_3$
represent two real atomic actions, each with duration 1. Patrols must start from either $v_1$ or $v_3$,
but can terminate at any of $v_1$, $v_2$ and $v_3$. This is specified using $v^+$ and $v^-$, the dummy source
and sink respectively. Let $\kappa = 1$ and $\gamma = 1$. It can be verified that the marginal strategy shown
in Figure 6.3 satisfies Constraints (6.4) through (6.6). However, the only corresponding mixed
strategy is to take $v^+ \to v_3 \to v^-$ with 50% probability and $v^+ \to v_1 \to v_2 \to v_3 \to v^-$ with 50%
probability. This mixed strategy is infeasible since its second patrol has duration greater than
1. This patrol length violation arises because the basic formulation only constrains the average
patrol length, and therefore allows the use of overlong patrols as long as some short patrols are
also used.
[Figure: marginal strategy on the vertices $v^+$, $v_1$, $v_2$, $v_3$, $v^-$, with edge flows of 0.5, 1, and 0.]
Figure 6.3: Example of an infeasible marginal strategy.
Second, the paths selected according to the constructed mixed strategy may switch between
trains or between in-station and on-train an impractically large number of times, making the
patrol path difficult to implement and error-prone. This is an important issue because we want
real LASD officers to be able to carry out these strategies. The more switches there are in a
patrol strategy, the more instructions the patrol unit has to remember, and the more likely they
will miss a switch due to imperfections in the train schedule and/or the unit’s mis-execution
of the instructions. For example, in Example 4, $\langle A, \text{6pm} \rangle \to \langle B, \text{7pm} \rangle \to \langle A, \text{8pm} \rangle$ and
$\langle C, \text{6pm} \rangle \to \langle B, \text{7pm} \rangle \to \langle C, \text{8pm} \rangle$ each have 1 switch, while $\langle A, \text{6pm} \rangle \to \langle B, \text{7pm} \rangle \to \langle C, \text{8pm} \rangle$
and $\langle C, \text{6pm} \rangle \to \langle B, \text{7pm} \rangle \to \langle A, \text{8pm} \rangle$ each have 0. Both path pairs cover the same set of edges;
however, the second pair will be preferred because it is easier to implement.
6.1.2.3 Extended Formulation
Now I present a more sophisticated formulation designed to address the two aforementioned issues.
The difficulty involved in imposing constraints on the patrol paths (i.e., penalizing or forbidding
certain paths) in the marginal representation is that paths themselves are not represented, instead
being encoded only as marginal coverage.
Hence the key idea is to preserve sufficient path history information within vertices to be able
to evaluate our constraints, while avoiding the exponential blowup that creating a node for every
path would cause. To this end, we construct a new graph, called the History-Duplicate Transition
graph (HDT graph), by creating multiple copies of the original vertices, each corresponding to
a unique (partial) patrol history. This duplication is performed only to preserve the patrol history
information that is necessary, as I will show next.
I will first explain how to construct the HDT graph from a transition graph $G$ in order to
restrict the length of patrol paths to at most $\kappa$. The HDT graph is composed of multiple restricted
copies of $G$ (i.e., subgraphs of $G$), each corresponding to a unique starting time. For the copy
corresponding to starting time point $\tau^*$, we only keep the subgraph that can be reached from time
$\tau^*$, i.e., vertices $v = \langle l, \tau \rangle \in V$ where $\tau^* \le \tau \le \tau^* + \kappa$. Thus, in each restricted copy of $G$, the
length of any path is guaranteed to be less than or equal to $\kappa$. Since there are a finite number of
distinct possible starting time points (i.e., all distinct discrete time points in $V^+$), the new graph is
a linear expansion of $G$. It is however often desirable to use fewer starting time points (e.g., one
for every hour) to improve runtime efficiency at the cost of a small quality loss.
Figure 6.4(a) shows the HDT graph (the shaded portion further explained below) of Example 4
with $\kappa = 2$ and 2 starting time points, 6pm and 7pm. The HDT graph is thus composed of
two restricted copies of the original transition graph. In each vertex, the time shown in parentheses
indicates the starting time point. For example, the original vertex $\langle A, \text{7pm} \rangle$ now has two
copies $\langle A, \text{7pm}, (\text{6pm}) \rangle$ and $\langle A, \text{7pm}, (\text{7pm}) \rangle$ in the HDT graph. For the starting time point of
6pm, the patrol must end at or before 8pm, hence we do not need to keep vertices whose discrete
time point is 9pm. For the starting time point of 7pm, the patrol must start at or after 7pm, hence
we do not need to keep vertices whose discrete time point is 6pm. The two restricted copies are
not two separate graphs but a single graph that will be tied together by the dummy source and
sink.
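The construction of the restricted copies can be sketched in Python as follows. This is an illustrative sketch only: the function name `restricted_copies`, the encoding of vertices as `(station, time)` pairs with times in hours, and the sample edges are assumptions. Each kept edge is tagged with its starting time point, mirroring the parenthesized times in Figure 6.4(a).

```python
def restricted_copies(edges, start_times, kappa):
    """Return HDT edges: one restricted copy of the graph per starting time.

    An edge ((l, t), (l2, t2)) is kept in the copy for start time t0 only if
    both endpoints lie in the window t0 <= time <= t0 + kappa.
    """
    hdt_edges = []
    for t0 in start_times:
        for (l, t), (l2, t2) in edges:
            if t0 <= t and t2 <= t0 + kappa:
                hdt_edges.append(((l, t, t0), (l2, t2, t0)))
    return hdt_edges


# Hypothetical staying edges at station A between 6pm and 9pm (24h clock).
STAY_A = [(("A", 18), ("A", 19)), (("A", 19), ("A", 20)), (("A", 20), ("A", 21))]
HDT = restricted_copies(STAY_A, start_times=[18, 19], kappa=2)
```

With $\kappa = 2$, the 6pm copy drops the edge ending at 9pm and the 7pm copy drops the edge starting at 6pm, so the original 7pm-to-8pm edge appears once in each copy.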
Next, I will explain how to further extend the HDT graph to penalize complex patrol paths.
The idea is to have each vertex encode the last action occurring prior to it. Specifically, we create
multiple copies of a vertex $v$, each corresponding to a different edge (prior action) that leads to
it. If $v$ is a possible starting vertex, we create an additional copy representing no prior action. If
there is an edge from $v$ to $v'$, we connect all copies of $v$ to the specific copy of $v'$ whose last action
was edge $(v, v')$. A new edge is called a switching edge if the recorded last actions of its two
vertices are of different types (e.g., inspecting different trains), unless one of the two vertices is a
“no prior action” vertex. As can be verified, the number of switches of a patrol path in the new
graph is the number of switching edges it has. To favor simple patrol paths, we demand a cost
$\beta > 0$ for using switching edges. Varying the value of $\beta$ allows us to trade off between revenue
and patrol complexity (average number of switches).
In Figure 6.4(b), we show how to apply this extension using the subgraph shown in the
shaded box of Figure 6.4(a). Since there is only one edge leading to $\langle A, \text{7pm}, (\text{6pm}) \rangle$, we
create one copy of it representing the action of staying at $A$. There are 3 edges leading to
$\langle B, \text{7pm}, (\text{6pm}) \rangle$, so we create 3 copies of it representing the actions of taking a train from $A$,
staying at $B$, and taking a train from $C$. The original edges are also duplicated. For example,
$\langle B, \text{7pm}, (\text{6pm}) \rangle \to \langle B, \text{8pm}, (\text{6pm}) \rangle$ has 3 copies connecting the 3 copies of $\langle B, \text{7pm}, (\text{6pm}) \rangle$ to
the copy of $\langle B, \text{8pm}, (\text{6pm}) \rangle$ representing the staying at $B$ action. Among the three copies, only
the “Stay” to “Stay” edge is not a switching edge.
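The switch count that this extension encodes in the graph can also be computed directly on a path, which may help in checking a constructed strategy. The Python sketch below is an illustration under assumptions: each atomic action on a path is given a label (a hypothetical train identifier or a stay label), a switch is counted whenever two consecutive labels differ, and the function names are not from the thesis.

```python
def count_switches(actions):
    """Number of consecutive action pairs whose action labels differ."""
    return sum(1 for prev, cur in zip(actions, actions[1:]) if prev != cur)


def expected_switches(mixed):
    """Average number of switches of a mixed strategy of labeled paths."""
    return sum(prob * count_switches(actions) for prob, actions in mixed)


def penalized_objective(revenue, avg_switches, beta):
    """Revenue minus the switching cost beta per expected switch."""
    return revenue - beta * avg_switches


# Hypothetical labels: "T1" runs from A toward C, "T2" runs from C toward A.
SAME_TRAIN = ["T1", "T1"]     # e.g. A -> B -> C on a single train
CHANGE_TRAIN = ["T1", "T2"]   # e.g. A -> B, then back to A on another train
MIXED = [(0.5, SAME_TRAIN), (0.5, CHANGE_TRAIN)]
```

In the HDT graph the same quantity is obtained without enumerating paths, as the total marginal coverage on switching edges.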
[Figure: (a) two restricted copies of the transition graph, with each vertex labeled by station,
time, and (in parentheses) its starting time point, 6PM or 7PM; (b) copies of the 7PM and 8PM
vertices of the 6PM copy, labeled by last action: Stay, From A, From B, From C.]
Figure 6.4: (a) HDT graph of Example 4 with two starting time points. (b) Extension storing the
last action occurring.
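Given the final HDT graph $\mathcal{G} = \langle \mathcal{V}, \mathcal{E} \rangle$, I provide an extended linear program formulation
in Figure 6.5. We still use $x_e$ to represent the marginal coverage on an original edge $e \in E$,
but we now also use $y_e$ to represent the marginal coverage on an HDT graph edge $e \in \mathcal{E}$. Let
$\Gamma(e) \subseteq \mathcal{E}$ be the set of copies of $e$; then $x_e = \sum_{e' \in \Gamma(e)} y_{e'}$. Let $c_e = 1$ if $e \in \mathcal{E}$ is a switching edge
and 0 otherwise. The set of possible starting vertices $\mathcal{V}^+$ is the set of copies of $V^+$ that are “no
prior action” vertices. The set of possible ending vertices $\mathcal{V}^-$ is the set of all copies of $V^-$. We
again add a dummy source $v^+$ leading to $\mathcal{V}^+$ and a dummy sink $v^-$ that can be reached from $\mathcal{V}^-$.
Because the extended formulation enforces stricter restrictions on patrols allowed than the basic
formulation, the LP of Figure 6.5, with $\beta$ set to 0, provides a tighter upper bound on the optimal
revenue than the LP of Figure 6.2.

\begin{align}
\max_{x,y,u} \quad & \sum_{\lambda \in \Lambda} p_\lambda u_\lambda - \beta \sum_{e \in \mathcal{E}} c_e y_e \tag{6.7} \\
\text{s.t.} \quad & u_\lambda \le \min\Big\{\rho_\lambda,\ \varrho \sum_{e \in \lambda} x_e f_e\Big\}, \quad \text{for all } \lambda \in \Lambda \tag{6.8} \\
& \sum_{v \in \mathcal{V}^+} y_{(v^+, v)} = \sum_{v \in \mathcal{V}^-} y_{(v, v^-)} \le \gamma \tag{6.9} \\
& \sum_{(v', v) \in \mathcal{E}} y_{(v', v)} = \sum_{(v, v^\dagger) \in \mathcal{E}} y_{(v, v^\dagger)}, \quad \text{for all } v \in \mathcal{V} \tag{6.10} \\
& x_e = \sum_{e' \in \Gamma(e)} y_{e'},\ \forall e \in E; \quad 0 \le x_e \le \gamma,\ \forall e \in E \tag{6.11}
\end{align}
Figure 6.5: Extended Formulation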
A path in the HDT graph $\mathcal{G}$ trivially corresponds to a path in the transition graph $G$, since any
edge in $\mathcal{G}$ is a duplicate of some edge in $G$. Therefore from the optimal solution $y^*$, we can use
the same process described for the basic formulation to construct a mixed patrol strategy. As we
can see, this mixed patrol strategy does not have the two issues of the basic formulation. First,
the length of any patrol path in the HDT graph is bounded by $\kappa$. In addition, since the number
of switches in a patrol path equals the number of switching edges in it, the average number
of switches of the constructed mixed strategy is equal to $\sum_{e \in \mathcal{E}} c_e y_e$, which is penalized in the
objective function.
6.2 TRUSTSv2: Stochastic Model for Imperfect Execution
A major drawback of TRUSTSv1 is its vulnerability to execution uncertainty. In real-world
trials carried out by the LASD, a significant fraction of the executions of pre-generated schedules
got interrupted, for a variety of reasons such as writing citations, felony arrests, and handling
emergencies. Such interruptions can cause the officers to miss the train that they were supposed
to take as part of the schedule. The solution of TRUSTSv1 does not provide instructions
on what to do after such an interruption occurs. Furthermore, since the TRUSTSv1 model does not
take into account such execution uncertainty in its optimization formulation, the quality guarantee
of its solution is no longer valid in real-world settings.
In this section, I will present TRUSTSv2, the second generation of TRUSTS, to address the
challenge of execution uncertainty. In Section 6.2.1, I will first present a formal general
game-theoretic model for patrolling with dynamic execution uncertainty. Section 6.2.2 provides a
solution method for problems where the utilities have additional separable structure. Finally, I will
explain the details of applying this model to the fare evasion deterrence problem in Section 6.2.3.
6.2.1 Formal Model
A patrolling game with execution uncertainty is a two-player Bayesian Stackelberg game,
between a leader (the defender) and a follower (the adversary). The leader has $\gamma$ patrol units, and
commits to a randomized daily patrol schedule for each unit. A (naive) patrol schedule consists
of a list of commands to be carried out in sequence. Each command is of the form: at time $\tau$, the
unit should be at location $l$, and should execute patrol action $a$. The patrol action $a$ of the current
command, if executed successfully, will take the unit to the location and time of the next
command. Each unit faces uncertainty in the execution of each command: delays, or being called to
deal with emergencies (possibly at another location). As a result the unit may end up at a location
and a time that is different from the intended outcome of the action.
We use Markov Decision Processes (MDPs) as a compact representation to model each individual
defender unit’s execution of patrols. We emphasize that these MDPs are not the whole
game: they only model the defender’s interactions with the environment when executing patrols;
we will later describe the interaction between the defender and the adversary. Formally, for each
defender unit $i \in \{1, \ldots, \gamma\}$ we define an MDP $(S_i, A_i, T_i, R_i)$, where
• $S_i$ is a finite set of states. Each state $s_i \in S_i$ is a tuple $(l, \tau)$ of the current location of the
  unit and the current discretized time. We denote by $l(s_i)$ and $\tau(s_i)$ the location and time of
  $s_i$, respectively.
• $A_i$ is a finite set of actions. Let $A_i(s_i) \subseteq A_i$ be the set of actions available at state $s_i$.
• For each $s_i \in S_i$ and each action $a_i \in A_i(s_i)$, the default next state $n(s_i, a_i) \in S_i$ is the
  intended next state when executing action $a_i$ at $s_i$. We call a transition $(s_i, a_i, s'_i)$ a default
  transition if $s'_i = n(s_i, a_i)$ and a non-default transition otherwise.
• $T_i(s_i, a_i, s'_i)$ is the probability of the next state being $s'_i$ if the current state is $s_i$ and the action
  taken is $a_i$.
• $R_i(s_i, a_i, s'_i)$ is the immediate reward for the defender from the transition $(s_i, a_i, s'_i)$. For
  example, being available for emergencies (such as helping a lost child) is an important
  function of the police, and we can take this into account in our optimization formulation by
  using $R_i$ to give positive rewards for such events.
We assume that the MDP is acyclic: $T_i(s_i, a_i, s_i')$ is positive only when
$\tau(s_i') > \tau(s_i)$, i.e., all transitions go forward in time. $S_i^+, S_i^- \subseteq S_i$
are two subsets of states where a patrol could start and end, respectively. For convenience, we add
a dummy source state $s_i^+ \in S_i$ that has actions with deterministic transitions going into
each of the states in $S_i^+$, and analogously a dummy sink state $s_i^- \in S_i$. Thus each patrol
of defender $i$ starts at $s_i^+$ and ends at $s_i^-$. A patrol execution of $i$ is specified by
its complete trajectory $t_i = (s_i^+, a_i^+, s_i^1, a_i^1, s_i^2, \ldots, s_i^-)$, which records
the sequence of states visited and actions performed. A joint complete trajectory, denoted by
$t = (t_1, \ldots, t_\gamma)$, is a tuple of complete trajectories of all units. Let $\mathcal{X}$
be the finite space of joint complete trajectories.
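The definitions above can be made concrete with a small data structure. The following is a minimal sketch, assuming states are (location, time) tuples; the class name `UnitMDP`, its field names, and the toy instance at the bottom are illustrative assumptions, not part of the formal model.

```python
from dataclasses import dataclass, field


@dataclass
class UnitMDP:
    # T[(s, a)] maps each possible next state s' to the probability T_i(s, a, s').
    T: dict = field(default_factory=dict)
    # default_next[(s, a)] is the intended next state n(s, a).
    default_next: dict = field(default_factory=dict)
    # R[(s, a, s')] is the immediate reward; missing entries are treated as 0.
    R: dict = field(default_factory=dict)

    def add_action(self, s, a, default, outcomes):
        """Register action a at state s with its outcome distribution."""
        assert abs(sum(outcomes.values()) - 1.0) < 1e-9, "probabilities must sum to 1"
        self.T[(s, a)] = dict(outcomes)
        self.default_next[(s, a)] = default

    def actions(self, s):
        """A_i(s): the actions available at state s."""
        return [a for (s0, a) in self.T if s0 == s]

    def is_default(self, s, a, s_next):
        """True if (s, a, s') is a default transition."""
        return self.default_next[(s, a)] == s_next

    def is_acyclic(self):
        """True if every positive-probability transition moves forward in time."""
        return all(s_next[1] > s[1]
                   for (s, _a), out in self.T.items()
                   for s_next, p in out.items() if p > 0)


# Toy instance (hypothetical): one state with a reliable and an unreliable action.
mdp = UnitMDP()
mdp.add_action(("L1", 0), "Stay", ("L1", 1), {("L1", 1): 1.0})
mdp.add_action(("L1", 0), "ToL2", ("L2", 1), {("L2", 1): 0.9, ("L1", 1): 0.1})
```

Storing the outcome distribution per (state, action) pair keeps the default transition and the non-default ones side by side, which is convenient when simulating delays.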
The immediate rewards $R_i$ are not all the utility received by the defender. The defender also
receives rewards from interactions with the adversary. The adversary can be of a set $\Lambda$ of
possible types and has a finite set of actions $\mathcal{A}$. The types are drawn from a known
distribution, with $p_\lambda$ the probability of type $\lambda \in \Lambda$. The defender does not
know the instantiated type of the adversary, while the adversary does and can condition his
decision on his type.
In this general game model, the utilities resulting from defender-adversary interaction could
depend arbitrarily on the complete trajectories of the defender units. Formally, for a joint
complete trajectory $t$, the realized adversary type $\lambda \in \Lambda$, and an action of the
adversary $\alpha \in \mathcal{A}$, the defender receives utility $u^d(t, \lambda, \alpha)$, while
the adversary receives $u^a(t, \lambda, \alpha)$.
We are interested in finding the Strong Stackelberg Equilibrium (SSE) of this game, in which
the defender commits to a randomized policy, which we define next, and the adversary plays
a best response to this randomized policy. It is sufficient to consider only pure strategies for
the adversary [Conitzer and Sandholm, 2006]. Finding one SSE is equivalent to the following
optimization problem:
$$\max_{\pi} \sum_{\lambda \in \Lambda} p_\lambda \, E_{t \sim \pi}\Big[u^d(t, \lambda, \alpha_\lambda) + \sum_i R_i(t_i)\Big] \tag{6.12}$$
$$\text{s.t.} \quad \alpha_\lambda \in \arg\max_{\alpha} E_{t \sim \pi}\big[u^a(t, \lambda, \alpha)\big], \quad \forall \lambda \in \Lambda \tag{6.13}$$
where $R_i(t_i)$ is the total immediate reward from the trajectory $t_i$, and
$E_{t \sim \pi}[\cdot]$ denotes the expectation over joint complete trajectories induced by the
defender’s randomized policy $\pi$.
Whereas MDPs always have Markovian and deterministic optimal policies, in our game the
defender’s optimal strategy may be non-Markovian because the utilities depend on trajectories,
and may be randomized because of interactions with the adversary. The execution of patrols
can be either coupled or decoupled. In coupled execution, patrol units can coordinate with
each other; that is, the behavior of unit $i$ at $s_i$ could depend on the earlier joint trajectory
of all units. Formally, let $\mathcal{T}_i$ be the set of unit $i$’s partial trajectories
$(s_i^+, a_i^+, s_i^1, a_i^1, \ldots, s_i')$. A coupled randomized policy is a function
$\pi: \prod_i \mathcal{T}_i \times \prod_i A_i \to \mathbb{R}$ that specifies a probability
distribution over joint actions of units for each joint partial trajectory. Denote by
$\varphi(t; \pi) \in \mathbb{R}$ the probability that joint complete trajectory
$t \in \mathcal{X}$ is instantiated under policy $\pi$. In decoupled execution, patrol units
do not communicate with each other. Formally, a decoupled randomized policy is
$\pi = (\pi_1, \ldots, \pi_\gamma)$, where for each unit $i$,
$\pi_i: \mathcal{T}_i \times A_i \to \mathbb{R}$ specifies a probability distribution over $i$’s
actions given each partial trajectory of $i$. Thus a decoupled randomized policy
$(\pi_1, \ldots, \pi_\gamma)$ can be thought of as a coupled randomized policy $\pi'$ where
$\pi'(t, (a_1, \ldots, a_\gamma)) = \prod_i \pi_i(t_i, a_i)$.
Coupled execution potentially yields higher expected utility than decoupled execution. Suppose
the defender wants to protect an important target with at least one unit, and unit 1 is assigned
that task. Then if she knows unit 1 is dealing with an emergency and unable to reach that target,
she can reroute unit 2 to cover the target. However, coordinating among units presents significant
logistical and (as I will explain later) computational burden.
6.2.2 Approach
The defender’s optimal strategy may be coupled and non-Markovian, i.e., the policy at
$s$ could depend on the entire earlier trajectories of all units rather than only on the current
state $s$. This makes solving the game computationally difficult: the dimension of the space of
mixed strategies is exponential in the number of states.
Nevertheless, in many domains, the utilities have additional structure. In Section 6.2.2.1 I
will show that under the assumption that the utilities have separable structure, it is possible to
efficiently compute an SSE of patrolling games with execution uncertainty. In Section 6.2.2.2 I will
discuss generating patrol schedules from solutions described in Section 6.2.2.1. In Section 6.2.2.3
I will consider a more general case with partially separable utilities.
6.2.2.1 Efficient Computation on Separable Utilities
Consider a coupled strategy $\pi$. Denote by $x_i(s_i, a_i, s_i')$ the marginal probability of
defender unit $i$ reaching state $s_i$, executing action $a_i$, and ending up at next state
$s_i'$. Formally,
$$x_i(s_i, a_i, s_i') = \sum_{t \in \mathcal{X}} \varphi(t; \pi)\, \theta(t_i, s_i, a_i, s_i'), \tag{6.14}$$
where the value of the membership function $\theta(t_i, s_i, a_i, s_i')$ is equal to 1 if
trajectory $t_i$ contains transition $(s_i, a_i, s_i')$ and is equal to 0 otherwise. Let
$x \in \mathbb{R}^M$ be the vector of these marginal probabilities, where
$M = \sum_i |S_i|^2 |A_i|$. Similarly, let $w_i(s_i, a_i)$ be the marginal probability of unit $i$
reaching $s_i$ and taking action $a_i$. Let $w \in \mathbb{R}^{\sum_i |S_i||A_i|}$ be the vector of
these marginal probabilities. I will show that $w$ and $x$ satisfy the linear constraints:
$$x_i(s_i, a_i, s_i') = w_i(s_i, a_i)\, T_i(s_i, a_i, s_i'), \quad \forall s_i, a_i, s_i' \tag{6.15}$$
$$\sum_{s_i', a_i'} x_i(s_i', a_i', s_i) = \sum_{a_i} w_i(s_i, a_i), \quad \forall s_i \tag{6.16}$$
$$\sum_{a_i} w_i(s_i^+, a_i) = \sum_{s_i', a_i'} x_i(s_i', a_i', s_i^-) = 1, \tag{6.17}$$
$$w_i(s_i, a_i) \ge 0, \quad \forall s_i, a_i \tag{6.18}$$
Lemma 7. For any coupled randomized policy $\pi$, the resulting marginal probabilities
$w_i(s_i, a_i)$ and $x_i(s_i, a_i, s_i')$ satisfy constraints (6.15), (6.16), (6.17), (6.18).
Proof. Constraint (6.15) holds by the definition of transition probabilities of MDPs. Constraint
(6.16) holds because both the left-hand side and the right-hand side equal the marginal probability
of reaching state $s_i$. Constraint (6.17) holds because by construction, the marginal probability
of reaching $s_i^+$ is 1, and so is the marginal probability of reaching $s_i^-$. Constraint (6.18)
holds because $w_i(s_i, a_i)$ is a probability.
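Lemma 7 can be checked numerically in the special case of a single unit following a Markov policy (the lemma itself covers all coupled policies; this sketch does not). The function name `marginals`, the dictionary layout, and the toy MDP with `"src"` and `"sink"` standing in for the dummy source and sink are assumptions made for illustration.

```python
from collections import defaultdict


def marginals(T, policy, source):
    """Forward-propagate reach probabilities through an acyclic MDP.

    T[(s, a)] -> {s': prob}; policy[s] -> {a: prob}; each state is (location, time).
    Returns (w, x) with entries w[(s, a)] and x[(s, a, s')].
    """
    reach = defaultdict(float)
    reach[source] = 1.0
    w, x = {}, {}
    # Visiting states in time order is valid because all transitions go forward in time.
    for s in sorted(policy, key=lambda st: st[1]):
        for a, pa in policy[s].items():
            w[(s, a)] = reach[s] * pa
            for s2, pt in T[(s, a)].items():
                x[(s, a, s2)] = w[(s, a)] * pt      # constraint (6.15)
                reach[s2] += x[(s, a, s2)]          # inflow side of (6.16)
    return w, x


# Toy MDP (hypothetical): the move to L2 succeeds with probability 0.9.
src, sink = ("src", -1), ("sink", 2)
T = {
    (src, "start"): {("L1", 0): 1.0},
    (("L1", 0), "Stay"): {("L1", 1): 1.0},
    (("L1", 0), "ToL2"): {("L2", 1): 0.9, ("L1", 1): 0.1},
    (("L1", 1), "end"): {sink: 1.0},
    (("L2", 1), "end"): {sink: 1.0},
}
policy = {
    src: {"start": 1.0},
    ("L1", 0): {"Stay": 0.5, "ToL2": 0.5},
    ("L1", 1): {"end": 1.0},
    ("L2", 1): {"end": 1.0},
}
w, x = marginals(T, policy, src)
out_of_source = sum(v for (s, _a), v in w.items() if s == src)
into_sink = sum(v for (_s, _a, s2), v in x.items() if s2 == sink)
```

Here `out_of_source` and `into_sink` are the two sides of constraint (6.17), and both evaluate to 1 up to floating-point error.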
Intuitively, if we can formulate utilities in terms of $w$ and $x$, which have dimensions
polynomial in the sizes of the MDPs, this will lead to a much more compact representation of the
SSE problem compared to (6.12). It turns out this is possible if the game’s utilities are separable,
which intuitively means that given the adversary’s strategy, the utilities of both players are sums
of contributions from individual units’ individual transitions:
Definition 8. A patrolling game with execution uncertainty as defined in Section 6.2.1 has
separable utilities if there exist utilities $U_\lambda(s_i, a_i, s_i', \alpha)$ and
$V_\lambda(s_i, a_i, s_i', \alpha)$ for each unit $i$, transition $(s_i, a_i, s_i')$,
$\lambda \in \Lambda$, $\alpha \in \mathcal{A}$, such that for all $t \in \mathcal{X}$,
$\lambda \in \Lambda$, $\alpha \in \mathcal{A}$, the defender’s and the adversary’s utilities can be
expressed as
$$u^d(t, \lambda, \alpha) = \sum_i \sum_{s_i, a_i, s_i'} \theta(t_i, s_i, a_i, s_i')\, U_\lambda(s_i, a_i, s_i', \alpha)$$
and
$$u^a(t, \lambda, \alpha) = \sum_i \sum_{s_i, a_i, s_i'} \theta(t_i, s_i, a_i, s_i')\, V_\lambda(s_i, a_i, s_i', \alpha),$$
respectively.
Let $U_\lambda, V_\lambda \in \mathbb{R}^{M \times |\mathcal{A}|}$ be the corresponding matrices.
Then $U_\lambda, V_\lambda$ completely specify the utility functions $u^d$ and $u^a$.
[Figure 6.6: Example game with separable utilities. The figure shows a single unit’s MDP over
locations L1, L2 and times τ0, τ1, τ2. “Stay” transitions have probability 1.0; “To L1” and
“To L2” transitions succeed with probability 0.9 and fail with probability 0.1.]
Example 5. Consider the following simple example game with one defender unit, whose MDP
is illustrated in Figure 6.6. There are six states, shown as circles in the figure, over two
locations $L_1, L_2$ and three time points $\tau_0, \tau_1, \tau_2$. From states at $\tau_0$ and
$\tau_1$, the unit has two actions: to stay at the current location, which always succeeds, and to
try to go to the other location, which with probability 0.9 succeeds and with probability 0.1 fails
(in which case it stays at the current location). There are 12 transitions in total, which is fewer
than the number of complete trajectories (18). There is a single type of adversary who chooses one
location between $L_1$ and $L_2$ and one time point between $\tau_1$ and $\tau_2$ to attack
($\tau_0$ cannot be chosen). If the defender is at that location at that time, the attack fails and
both players get zero utility. Otherwise, the attack succeeds, and the adversary gets utility 1
while the defender gets $-1$. In other words, the attack succeeds if and only if it avoids the
defender unit’s trajectory. It is straightforward to verify that this game has separable utilities:
for any transition $(s_i, a_i, s_i')$ in the MDP, let $V_\lambda(s_i, a_i, s_i', \alpha)$ be 1 if
$\alpha$ coincides with $s_i'$ and 0 otherwise. For example, the utility expression for the
adversary given trajectory $((L_1, \tau_0), \text{To } L_2, (L_1, \tau_1), \text{To } L_2, (L_2, \tau_2))$
is $V_\lambda((L_1, \tau_0), \text{To } L_2, (L_1, \tau_1), \alpha) + V_\lambda((L_1, \tau_1), \text{To } L_2, (L_2, \tau_2), \alpha)$,
which gives the correct utility value for the adversary: 1 if $\alpha$ equals $(L_1, \tau_1)$ or
$(L_2, \tau_2)$ and 0 otherwise.
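The counts quoted in Example 5 can be reproduced by enumeration. The sketch below rebuilds the MDP of Figure 6.6 without the dummy source and sink states; the variable and function names are illustrative assumptions.

```python
from itertools import product

LOCS, TIMES = ("L1", "L2"), (0, 1, 2)


def other(loc):
    """Return the location that is not loc."""
    return "L2" if loc == "L1" else "L1"


# transitions[(s, a)] -> {s': probability}, defined for states at the first two time points.
transitions = {}
for loc, t in product(LOCS, TIMES[:-1]):
    s = (loc, t)
    transitions[(s, "Stay")] = {(loc, t + 1): 1.0}
    transitions[(s, "To " + other(loc))] = {(other(loc), t + 1): 0.9, (loc, t + 1): 0.1}

num_transitions = sum(len(out) for out in transitions.values())


def trajectories(s):
    """Enumerate all complete trajectories (s, a, s', a', ...) starting at state s."""
    if s[1] == TIMES[-1]:
        return [(s,)]
    result = []
    for (s0, a), out in transitions.items():
        if s0 == s:
            for s2 in out:
                result += [(s, a) + rest for rest in trajectories(s2)]
    return result


all_trajectories = [tr for loc in LOCS for tr in trajectories((loc, 0))]
print(num_transitions, len(all_trajectories))  # 12 18
```

Each added time step multiplies the number of trajectories by three but adds only twelve transitions, which is why the marginal representation stays compact as the horizon grows.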
It is straightforward to show the following.
Lemma 8. Consider a game with separable utilities. Suppose $x$ is the vector of marginal
probabilities induced by the defender’s randomized policy $\pi$. Let
$y_\lambda \in \mathbb{R}^{|\mathcal{A}|}$ be a vector describing the mixed strategy of the
adversary of type $\lambda$, with $y_\lambda(\alpha)$ denoting the probability of choosing
action $\alpha$. Then the defender’s and the adversary’s expected utilities from their interactions
are $\sum_\lambda p_\lambda x^T U_\lambda y_\lambda$ and
$\sum_\lambda p_\lambda x^T V_\lambda y_\lambda$, respectively.
In other words, given the adversary’s strategy, the expected utilities of both players are linear
in the marginal probabilities $x_i(s_i, a_i, s_i')$. Lemma 8 also applies when (as in an SSE) the
adversary is playing a pure strategy, in which case $y_\lambda$ is a 0-1 integer vector with
$y_\lambda(\alpha) = 1$ if $\alpha$ is the action chosen. We can thus use this compact
representation of defender strategies to rewrite the formulation for SSE (6.12) as a
polynomial-sized optimization problem.
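$$\max_{w, x, y} \sum_{\lambda \in \Lambda} p_\lambda x^T U_\lambda y_\lambda + \sum_{i=1}^{\gamma} \sum_{s_i, a_i, s_i'} x_i(s_i, a_i, s_i')\, R_i(s_i, a_i, s_i') \tag{6.19}$$
s.t. constraints (6.15), (6.16), (6.17), (6.18)
$$\sum_{\alpha} y_\lambda(\alpha) = 1, \quad y_\lambda(\alpha) \in \{0, 1\} \tag{6.20}$$
$$y_\lambda \in \arg\max_{y_\lambda'} x^T V_\lambda y_\lambda' \tag{6.21}$$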
As I will show in Section 6.2.2.2, given a solution $w, x$ to (6.19), we can calculate a decoupled
policy that matches the marginals $w, x$. Compared to (6.12), the optimization problem (6.19) has
exponentially fewer dimensions; in particular, the numbers of variables and constraints are
polynomial in the sizes of the MDPs. Furthermore, existing methods for solving Bayesian Stackelberg
games, such as DOBSS [Paruchuri et al., 2008] or HUNTER in this thesis [Yin and Tambe, 2012],
can be directly applied to (6.19).
For the special case of $U_\lambda + V_\lambda = 0$ for all $\lambda$, i.e., when the interaction
between defender and adversary is zero-sum, the above SSE problem can be formulated as a linear
program (LP):
$$\max_{w, x, u} \sum_{\lambda \in \Lambda} p_\lambda u_\lambda + \sum_i \sum_{s_i, a_i, s_i'} x_i(s_i, a_i, s_i')\, R_i(s_i, a_i, s_i') \tag{6.22}$$
s.t. constraints (6.15), (6.16), (6.17), (6.18)
$$u_\lambda \le x^T U_\lambda e_\alpha, \quad \forall \lambda \in \Lambda,\ \alpha \in \mathcal{A}, \tag{6.23}$$
where $e_\alpha$ is the basis vector corresponding to adversary action $\alpha$. This LP is similar
to the maximin LP for a zero-sum game with the utilities given by $U_\lambda$ and $V_\lambda$,
except that an additional term $\sum_i \sum_{s_i, a_i, s_i'} x_i(s_i, a_i, s_i')\, R_i(s_i, a_i, s_i')$
representing the defender’s expected utilities from immediate rewards is added to the objective.
One potential issue arises: because of the extra defender utilities from immediate rewards, the
entire game is no longer zero-sum. Is it still valid to use the above maximin LP formulation? It
turns out that the LP is indeed valid, as the immediate rewards do not depend on the adversary’s
strategy.
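For fixed marginals $x$, the largest $u_\lambda$ allowed by (6.23) is the minimum of $x^T U_\lambda e_\alpha$ over adversary actions, so the objective (6.22) can be evaluated without an LP solver. The sketch below does only this evaluation (it does not optimize over $x$); the function name, the nested-dictionary layout of $U_\lambda$, and the toy numbers are assumptions.

```python
def lp_objective(x, U, p, R):
    """Value of objective (6.22) for fixed marginals x.

    x[k]: marginal probability of transition k.
    U[lam][k][alpha]: defender utility for type lam, transition k, adversary action alpha.
    p[lam]: prior probability of type lam.  R[k]: immediate reward of transition k.
    """
    total = sum(xk * R.get(k, 0.0) for k, xk in x.items())
    for lam, prior in p.items():
        actions = {alpha for row in U[lam].values() for alpha in row}
        # Tightest value of u_lam permitted by constraint (6.23).
        u_lam = min(
            sum(xk * U[lam].get(k, {}).get(alpha, 0.0) for k, xk in x.items())
            for alpha in actions
        )
        total += prior * u_lam
    return total


# Toy numbers (hypothetical): two transitions, one adversary type, two attack options.
x = {"a": 0.6, "b": 0.4}
U = {"type0": {"a": {"A1": 0.0, "A2": -1.0}, "b": {"A1": -1.0, "A2": 0.0}}}
value = lp_objective(x, U, {"type0": 1.0}, {"a": 0.5})
```

In this toy instance the adversary's best attack is A2 (defender utility -0.6), and the immediate rewards add 0.3, so `value` is -0.3 up to floating-point error.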
Proposition 5. If the game has separable utilities and $U_\lambda + V_\lambda = 0$ for all
$\lambda$, then a solution of the LP (6.22) is an SSE.
Proof. We can transform this game to an equivalent zero-sum Bayesian game whose LP formulation
is equivalent to (6.22). Specifically, given the non-zero-sum Bayesian game $\Gamma$ specified
above, consider the Bayesian game $\Gamma'$ with the following “meta” type distribution for the
second player: for all $\lambda \in \Lambda$ of $\Gamma$ there is a corresponding type
$\lambda' \in \Lambda'$ in $\Gamma'$, with probability $p_{\lambda'} = 0.5\, p_\lambda$, with the
familiar utility functions; and there is a special type $\phi \in \Lambda'$ with probability
$p_\phi = 0.5$, whose action does not affect either player’s utility. Specifically, the utilities
under the special type $\phi$ are
$u^d(t, \phi, \alpha) = \sum_i \sum_{s_i, a_i, s_i'} \theta(t_i, s_i, a_i, s_i')\, R_i(s_i, a_i, s_i')$ and
$u^a(t, \phi, \alpha) = -\sum_i \sum_{s_i, a_i, s_i'} \theta(t_i, s_i, a_i, s_i')\, R_i(s_i, a_i, s_i')$.
The resulting game $\Gamma'$ is zero-sum, with the defender’s utility exactly half the objective of
(6.22). Since for zero-sum games maximin strategies and SSE coincide, a solution of the LP (6.22)
is an optimal SSE marginal vector for the defender of $\Gamma'$. On the other hand, if we compare
the induced normal forms of $\Gamma$ and $\Gamma'$, the only difference is that for the adversary
the utility $-0.5 \sum_i \sum_{s_i, a_i, s_i'} x_i(s_i, a_i, s_i')\, R_i(s_i, a_i, s_i')$ is added,
which does not depend on the adversary’s strategy. Therefore $\Gamma$ and $\Gamma'$ have the same
set of SSE, which implies that a solution of the LP is an SSE of $\Gamma$.
6.2.2.2 Generating Patrol Schedules
The solution of (6.19) does not yet provide a complete specification of what to do. We ultimately
want an explicit procedure for generating the patrol schedules. We define a Markov strategy $\pi$
to be a decoupled strategy $(\pi_1, \ldots, \pi_\gamma)$, $\pi_i: S_i \times A_i \to \mathbb{R}$,
where the distribution over next actions depends only on the current state. Proposition 6 below
shows that given $w, x$, there is a simple procedure to calculate a Markov strategy that matches
the marginal probabilities. This implies that if $w, x$ is the optimal solution of (6.19), then the
corresponding Markov strategy achieves the same expected utility. I have thus shown that for games
with separable utilities it is sufficient to consider Markov strategies.
Proposition 6. Given $w, x$ satisfying constraints (6.15) to (6.18), construct a Markov strategy
$\pi$ as follows: for each $s_i \in S_i$, for each $a_i \in A_i(s_i)$,
$\pi_i(s_i, a_i) = \frac{w_i(s_i, a_i)}{\sum_{a_i'} w_i(s_i, a_i')}$. Suppose the defender
plays $\pi$; then for every unit $i$ and transition $(s_i, a_i, s_i')$, the probability that
$(s_i, a_i, s_i')$ is reached by $i$ equals $x_i(s_i, a_i, s_i')$.
Sketch. Such a Markov strategy induces a Markov chain over the states $S_i$ for each unit $i$. It
can be verified by induction that the resulting marginal probability vector matches $x$.
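The construction in Proposition 6 is a per-state normalization of $w$. A minimal sketch follows; the function name is hypothetical, and the uniform fallback for states that are never reached (where the denominator is zero and the choice cannot affect the marginals) is an assumption of this sketch rather than something the proposition prescribes.

```python
from collections import defaultdict


def markov_strategy(w):
    """Normalize w[(s, a)] within each state to obtain pi[s][a]."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for (s, _a), val in w.items():
        totals[s] += val
        counts[s] += 1
    pi = defaultdict(dict)
    for (s, a), val in w.items():
        if totals[s] > 0:
            pi[s][a] = val / totals[s]
        else:
            # Unreached state: any distribution works, so fall back to uniform.
            pi[s][a] = 1.0 / counts[s]
    return dict(pi)


# Hypothetical marginals for two states; the second state is never reached.
w = {
    (("L1", 1), "Stay"): 0.3,
    (("L1", 1), "ToL2"): 0.1,
    (("L2", 1), "Stay"): 0.0,
    (("L2", 1), "ToL1"): 0.0,
}
pi = markov_strategy(w)
```

For the reached state the resulting probabilities are 0.75 and 0.25, and for the unreached state they are 0.5 each.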
In practice, directly implementing a Markov strategy requires the unit to pick an action
according to the randomized Markov strategy at each time step. This is possible when units can
consult a smartphone app that stores the strategy, or can communicate with a central command.
However, in certain domains such a requirement on computation or communication at each time
step places an additional logistical burden on the patrol unit. To avoid unnecessary computation or
communication at every time step, it is desirable to derive a deterministic schedule (i.e., a pure
strategy) from the Markov strategy. Without execution uncertainty, a pure strategy can be
specified by a complete trajectory for each unit. However, this no longer works in the case with
execution uncertainty.
I will thus begin by defining a Markov pure strategy, which specifies a deterministic choice
at each state.
Definition 9. A Markov pure strategy $q$ is a tuple $(q_1, \ldots, q_\gamma)$ where for each unit
$i$, $q_i: S_i \to A_i$.
Given a Markov strategy $\pi$, we sample a Markov pure strategy $q$ as follows: for each unit $i$
and state $s_i \in S_i$, sample an action $a_i$ as $q_i(s_i)$ according to $\pi_i$. This procedure
is correct since each state in $i$’s MDP is visited at most once and thus $q_i$ exactly simulates a
walk from $s_i^+$ on the Markov chain induced by $\pi_i$.
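The sampling step can be sketched in a few lines for a single unit. The function name and the dictionary layout `pi[s][a]` are assumptions; `random.choices` is the standard-library weighted sampler.

```python
import random


def sample_pure_strategy(pi, rng=random):
    """Draw a Markov pure strategy q with q[s] sampled from pi[s], independently per state."""
    q = {}
    for s, dist in pi.items():
        actions = list(dist)
        q[s] = rng.choices(actions, weights=[dist[a] for a in actions], k=1)[0]
    return q


# Hypothetical Markov strategy for one unit.
pi = {
    ("L1", 0): {"Stay": 0.25, "ToL2": 0.75},
    ("L1", 1): {"Stay": 1.0},
}
q = sample_pure_strategy(pi, random.Random(0))
```

Because every state is drawn up front, the unit's whole shift is fixed by `q` before the patrol starts, regardless of which states end up being visited.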
To directly implement a Markov pure strategy, the unit needs to remember the entire mapping
$q$ or receive the action from the central command at each time step. A logistically more efficient
way is for the central command to send the unit a trajectory assuming perfect execution, and only
after a non-default transition happens does the unit communicate with the central command to
get a new trajectory starting from the current state. Formally, given $s_i \in S_i$ and $q_i$, we
define the optimistic trajectory from $s_i$ induced by $q_i$ to be
$(s_i, q_i(s_i), n(s_i, q_i(s_i)), \ldots, s_i^-)$, i.e., the trajectory assuming it always reaches
its default next state. Given a Markov pure strategy $q$, the following procedure for each unit $i$
exactly simulates $q$: (i) central command gives unit $i$ the optimistic trajectory from $s_i^+$
induced by $q_i$; (ii) unit $i$ follows the trajectory until the terminal state $s_i^-$ is
reached or some unexpected event happens and takes $i$ to state $s_i'$; (iii) central command sends
the new optimistic trajectory from $s_i'$ induced by $q_i$ to unit $i$, and the procedure repeats
from step (ii).
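Steps (i) to (iii) can be sketched as a small simulation for one unit. All names here are illustrative assumptions, as is the callback `realized_next`, which stands in for the environment deciding where the unit actually ends up.

```python
def optimistic_trajectory(s, q, default_next, sink):
    """Follow q from s, assuming every action reaches its default next state."""
    traj = [s]
    while s != sink:
        a = q[s]
        s = default_next[(s, a)]
        traj += [a, s]
    return traj


def execute(q, default_next, source, sink, realized_next):
    """Simulate one patrol; return the visited states and the number of re-sent trajectories."""
    plan = optimistic_trajectory(source, q, default_next, sink)      # step (i)
    pos, visited, replans = 0, [source], 0
    while plan[pos] != sink:                                          # step (ii)
        s, a, intended = plan[pos], plan[pos + 1], plan[pos + 2]
        actual = realized_next(s, a)
        if actual == intended:
            pos += 2
        else:                                                         # step (iii)
            plan, pos = optimistic_trajectory(actual, q, default_next, sink), 0
            replans += 1
        visited.append(actual)
    return visited, replans


# Hypothetical instance: the move to L2 is delayed, so the unit stays at L1.
sink = ("sink", 2)
q = {("L1", 0): "ToL2", ("L1", 1): "end", ("L2", 1): "end"}
default_next = {
    (("L1", 0), "ToL2"): ("L2", 1),
    (("L1", 1), "end"): sink,
    (("L2", 1), "end"): sink,
}
delayed = lambda s, a: ("L1", 1) if s == ("L1", 0) else default_next[(s, a)]
visited, replans = execute(q, default_next, ("L1", 0), sink, delayed)
```

In this run the unit contacts the central command exactly once, after the delayed first move, instead of at every time step.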
6.2.2.3 Coupled Execution: Cartesian Product MDP
Without the assumption of separable utilities, it is no longer sufficient to consider decoupled
Markov strategies of individual units’ MDPs. We therefore need to create a new MDP that
captures the joint execution of patrols by all units. For simplicity of exposition, we look at the
case with two defender units. Then a state in the new MDP corresponds to the tuple (location of
unit 1, location of unit 2, time). An action in the new MDP corresponds to a tuple (action of
unit 1, action of unit 2). Formally, if unit 1 has an action $a_1$ at state $s_1 = (l_1, \tau)$
that takes her to $s_1' = (l_1', \tau')$ with probability $T_1(s_1, a_1, s_1')$, and unit 2 has an
action $a_2$ at state $s_2 = (l_2, \tau)$ that takes her to $s_2' = (l_2', \tau')$ with probability
$T_2(s_2, a_2, s_2')$, we create in the new MDP an action $a^\times = (a_1, a_2)$ from state
$s^\times = (l_1, l_2, \tau)$ that transitions to $s^{\times\prime} = (l_1', l_2', \tau')$ with
probability $T^\times(s^\times, a^\times, s^{\times\prime}) = T_1(s_1, a_1, s_1') \cdot T_2(s_2, a_2, s_2')$.
The immediate rewards $R^\times$ of the MDP are defined analogously. We call the resulting MDP
$(S^\times, A^\times, T^\times, R^\times)$ the Cartesian Product MDP.
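The product construction for two units can be sketched as follows. For brevity a joint state is stored as a pair of per-unit (location, time) states rather than the triple (location of unit 1, location of unit 2, time) used in the text; the function name and the toy transition tables are assumptions, and every transition is assumed to take exactly one time step.

```python
from itertools import product


def product_transitions(T1, T2):
    """Build joint transitions Tx[((s1, s2), (a1, a2))] -> {(s1', s2'): T1 * T2}.

    T1 and T2 map (s, a) -> {s': prob}, with each state a (location, time) tuple.
    Only per-unit states at the same time are paired into a joint state.
    """
    Tx = {}
    for ((s1, a1), out1), ((s2, a2), out2) in product(T1.items(), T2.items()):
        if s1[1] != s2[1]:
            continue
        Tx[((s1, s2), (a1, a2))] = {
            (n1, n2): p1 * p2
            for (n1, p1), (n2, p2) in product(out1.items(), out2.items())
        }
    return Tx


# Hypothetical per-unit transitions: unit 1 may be delayed, unit 2 is reliable.
T1 = {(("A", 0), "go"): {("B", 1): 0.9, ("A", 1): 0.1}}
T2 = {(("C", 0), "stay"): {("C", 1): 1.0}}
Tx = product_transitions(T1, T2)
```

The number of joint (state, action) pairs is the product of the per-unit counts, which is the source of the exponential growth in the number of units.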
An issue arises when at state $s^\times$ the individual units have transitions of different time
durations. For example, unit 1 rides a train that takes 2 time steps to reach the next station
while unit 2 stays at a station for 1 time step. During these intermediate time steps only unit 2
has a “free choice”. How do we model this on the Cartesian Product MDP? One approach is to create
new states for the intermediate time steps. For example, suppose at location $L_A$ at time 1 a
non-default transition takes unit 1 to location $L_A$ at time 3. We modify unit 1’s MDP so that
this transition ends at a new state $(L_A^1, 2) \in S_1$, where $L_A^1$ is a “special” location
specifying that the unit will become alive again at location $L_A$ in one more time step. There is
only one action from $(L_A^1, 2)$, with only one possible next state $(L_A, 3)$. Once we have
modified the individual units’ MDPs so that all transitions take exactly one time step, we can
create the Cartesian Product MDP as described in the previous paragraph.
Like the units’ MDPs, the Cartesian Product MDP is also acyclic. Therefore we can analogously
define marginal probabilities $w^\times(s^\times, a^\times)$ and
$x^\times(s^\times, a^\times, s^{\times\prime})$ on the Cartesian Product MDP. Let
$w^\times \in \mathbb{R}^{|S^\times||A^\times|}$ and
$x^\times \in \mathbb{R}^{|S^\times|^2|A^\times|}$ be the corresponding vectors. Utilities
generally cannot be expressed in terms of $w^\times$ and $x^\times$. We consider a special case in
which utilities are partially separable:
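Definition 10. A patrolling game with execution uncertainty has partially separable utilities
if there exist $U_\lambda(s^\times, a^\times, s^{\times\prime}, \alpha)$ and
$V_\lambda(s^\times, a^\times, s^{\times\prime}, \alpha)$ for each transition
$(s^\times, a^\times, s^{\times\prime})$, $\lambda \in \Lambda$, $\alpha \in \mathcal{A}$, such
that for all $t \in \mathcal{X}$, $\lambda \in \Lambda$, $\alpha \in \mathcal{A}$, the defender’s
and the adversary’s utilities can be expressed as
$$u^d(t, \lambda, \alpha) = \sum_{s^\times, a^\times, s^{\times\prime}} \theta(t, s^\times, a^\times, s^{\times\prime})\, U_\lambda(s^\times, a^\times, s^{\times\prime}, \alpha)$$
and
$$u^a(t, \lambda, \alpha) = \sum_{s^\times, a^\times, s^{\times\prime}} \theta(t, s^\times, a^\times, s^{\times\prime})\, V_\lambda(s^\times, a^\times, s^{\times\prime}, \alpha),$$
respectively.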
Partial separability is a weaker condition than separability, as now the expected
utilities may not be sums of contributions from individual units. When utilities are partially
separable, we can express expected utilities in terms of $w^\times$ and $x^\times$ and find an SSE
by solving an optimization problem analogous to (6.19). From the optimal $w^\times$, we can get a
Markov strategy
$\pi^\times(s^\times, a^\times) = \frac{w^\times(s^\times, a^\times)}{\sum_{a^{\times\prime}} w^\times(s^\times, a^{\times\prime})}$,
which is provably the optimal coupled strategy.
This approach cannot scale up to a large number of defender units, as the sizes of $S^\times$ and
$A^\times$ grow exponentially in the number of units. In particular, the dimension of the Markov
policy $\pi^\times$ is already exponential in the number of units. To overcome this we will need a
more compact representation of defender strategies. One approach is to use decoupled strategies.
Although decoupled strategies are no longer optimal in general, I will show in Section 6.2.3 that
they can provide a good approximation in the fare evasion deterrence problems.
6.2.3 Application to Fare Evasion Deterrence
I will now explain how the techniques proposed in Section 6.2.2 can be applied to the fare evasion
deterrence problem in transit systems. As we will see in Section 6.2.3.1, although the utilities in
this domain are not separable, we are able to upper bound the defender utilities by separable
utilities, allowing efficient computation. The solution given in Section 6.2.3.1 is a Markov
strategy, which can be used to sample Markov pure strategies as described earlier in
Section 6.2.2.2. However, implementation of a Markov pure strategy with tens of thousands of states
is non-trivial in practice. In Section 6.2.3.2, I will demonstrate a smartphone app solution that
facilitates TRUSTSv2 deployment in real-world transit systems.
6.2.3.1 Linear Program Formulation
Similar to the extended formulation on the history-duplicate transition graph given in
Section 6.1.2, a state here comprises the current station and time of a unit, as well as necessary
history information such as starting time and prior patrol action. At any state, a unit may stay at
her current station to conduct an in-station operation for some time, or she can ride a train to
conduct an on-train operation when her current time coincides with the train schedule. Due to
execution uncertainty, a unit may end up at a state other than the intended outcome of the action.
For ease of analysis, I assume a single type of unexpected event which delays a patrol unit for
some time beyond the intended execution time. Specifically, I assume for any fare check operation
taken, there is a probability that the operation will be delayed, i.e., staying at the same station
(for in-station operations) or on the same train (for on-train operations) involuntarily for some
time. Furthermore, I assume that units will be involved with events unrelated to fare enforcement
and thus will not check fares during any delayed period of an operation. Intuitively, a higher
chance of delay leads to less time spent on fare inspection.
The riders (adversaries) and the objective remain unchanged from TRUSTSv1. Recall that riders
have multiple types, each corresponding to a fixed route. A rider observes the likelihood of being
checked and makes a binary decision between buying and not buying the ticket. If the rider of
type $\lambda$ buys the ticket, he pays a fixed ticket price $\rho$. Otherwise, he rides the train
for free but risks being caught and paying a fine of $\varrho > \rho$. The LASD’s objective is set
to maximize the overall revenue of the whole system, including ticket sales and fines collected,
essentially forming a zero-sum game.
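The rider's binary decision can be illustrated with a short sketch: the rider pays the smaller of the ticket price and the expected fine, and that payment is the LASD's revenue from this rider. The function name and the tie-breaking toward buying are assumptions of this sketch; the $1.50 fare and $100 fine are the values used later in Section 6.3.1.

```python
def rider_best_response(ticket_price, fine, detection_prob):
    """Return the rider's choice and expected payment (the LASD's expected revenue)."""
    expected_fine = fine * detection_prob
    if ticket_price <= expected_fine:   # ties broken toward buying (an assumption)
        return "buy", ticket_price
    return "evade", expected_fine


low = rider_best_response(1.5, 100.0, 0.01)    # expected fine 1.0 is below the fare
high = rider_best_response(1.5, 100.0, 0.02)   # expected fine 2.0 exceeds the fare
```

With these numbers the rider switches from evading to buying once the detection probability reaches 1.5 / 100 = 0.015.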
Recall that in TRUSTSv1, we define the fare check effectiveness $f$ for each atomic patrol action
represented by an edge in the transition graph. However, in TRUSTSv2 the fare check operation
performed is determined by the actual transition rather than the action taken. Therefore we will
define the effectiveness of a transition $(s, a, s')$ against a rider type $\lambda$,
$f_\lambda(s, a, s')$, as the percentage of riders of type $\lambda$ checked by transition
$(s, a, s')$. Note that $f_\lambda(s, a, s')$ is non-zero if and only if the actual operation in
transition $(s, a, s')$ intersects with the route that $\lambda$ takes. Following the same
argument as in Section 6.1.1, the probability that a joint complete trajectory $t$ detects evader
$\lambda$ is the sum of $f_\lambda$ over all transitions in $t = (t_1, \ldots, t_\gamma)$, capped
at one:
$$\Pr(t, \lambda) = \min\Big\{\sum_{i=1}^{\gamma} \sum_{s_i, a_i, s_i'} f_\lambda(s_i, a_i, s_i')\, \theta(t_i, s_i, a_i, s_i'),\ 1\Big\}. \tag{6.24}$$
For type $\lambda$ and joint trajectory $t$, the LASD receives $\rho$ if the rider buys the ticket
and $\varrho \cdot \Pr(t, \lambda)$ otherwise. The utilities in this domain are indeed not
separable: even though multiple units (or even a single unit) may detect a fare evader multiple
times, the evader can only be fined once. As a result, neither player’s utilities can be computed
directly using marginal probabilities $x$ and $w$. Instead, we upper bound the defender utility by
overestimating the detection probability using marginals as follows:
$$\widehat{\Pr}(x, \lambda) = \sum_{i=1}^{\gamma} \sum_{s_i, a_i, s_i'} f_\lambda(s_i, a_i, s_i')\, x_i(s_i, a_i, s_i'). \tag{6.25}$$
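Equation (6.25) leads to the following upper bound LP for the fare evasion deterrence problem:
$$\max_{x, w, u} \sum_{\lambda \in \Lambda} p_\lambda u_\lambda + \sum_{i=1}^{\gamma} \sum_{s_i, a_i, s_i'} x_i(s_i, a_i, s_i')\, R_i(s_i, a_i, s_i') \tag{6.26}$$
s.t. constraints (6.15), (6.16), (6.17), (6.18)
$$u_\lambda \le \min\{\rho,\ \varrho \cdot \widehat{\Pr}(x, \lambda)\}, \quad \text{for all } \lambda \in \Lambda \tag{6.27}$$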
We prove the claims above by the following two propositions.
Proposition 7. $\widehat{\Pr}(x, \lambda)$ is an upper bound of the true detection probability of
any coupled strategy with marginals $x$.
Proof. Consider a coupled strategy $\pi$. Recall that $\varphi(t; \pi) \in \mathbb{R}$ is the
probability that joint trajectory $t \in \mathcal{X}$ is instantiated. For rider type $\lambda$,
the true detection probability is
$\Pr(\pi, \lambda) = \sum_{t \in \mathcal{X}} \varphi(t; \pi) \Pr(t, \lambda)$. Relaxing
$\Pr(t, \lambda)$ by removing the cap at 1 in Equation (6.24) and applying Equation (6.14), we have
$$\begin{aligned}
\Pr(\pi, \lambda) &\le \sum_{t \in \mathcal{X}} \varphi(t; \pi) \sum_{i=1}^{\gamma} \sum_{s_i, a_i, s_i'} f_\lambda(s_i, a_i, s_i')\, \theta(t_i, s_i, a_i, s_i') \\
&= \sum_{i=1}^{\gamma} \sum_{s_i, a_i, s_i'} f_\lambda(s_i, a_i, s_i') \sum_{t \in \mathcal{X}} \varphi(t; \pi)\, \theta(t_i, s_i, a_i, s_i') \\
&= \sum_{i=1}^{\gamma} \sum_{s_i, a_i, s_i'} f_\lambda(s_i, a_i, s_i')\, x_i(s_i, a_i, s_i') = \widehat{\Pr}(x, \lambda).
\end{aligned}$$
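The inequality in Proposition 7 can be checked on a tiny instance by computing both sides from an explicit distribution over trajectories. In this sketch transitions are opaque labels and the trajectory distribution, effectiveness values, and function names are all hypothetical.

```python
def true_detection(traj_probs, f):
    """Expected value of the capped sum in (6.24) over the trajectory distribution.

    traj_probs maps a trajectory (a tuple of transitions) to its probability;
    f maps a transition to its effectiveness against the rider type.
    """
    return sum(p * min(sum(f.get(tr, 0.0) for tr in traj), 1.0)
               for traj, p in traj_probs.items())


def marginal_estimate(traj_probs, f):
    """Estimate (6.25): effectiveness weighted by the transition marginals x."""
    x = {}
    for traj, p in traj_probs.items():
        for tr in set(traj):
            x[tr] = x.get(tr, 0.0) + p
    return sum(f.get(tr, 0.0) * xv for tr, xv in x.items())


# Hypothetical instance: one trajectory checks the rider twice, the other barely at all.
traj_probs = {("e1", "e2"): 0.5, ("e3",): 0.5}
f = {"e1": 0.7, "e2": 0.6, "e3": 0.2}
exact = true_detection(traj_probs, f)      # 0.5 * min(1.3, 1) + 0.5 * 0.2
relaxed = marginal_estimate(traj_probs, f)  # 0.5 * (0.7 + 0.6 + 0.2)
```

The gap between `relaxed` and `exact` comes entirely from the trajectory whose uncapped sum exceeds 1, which is the double-counting that the relaxation allows.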
Proposition 8. LP (6.26) provides an upper bound on the value of the optimal coupled strategy.
Proof. Let $x^*$ and $w^*$ be the marginal coverage and $u_\lambda^*$ be the value of the
patroller against rider type $\lambda$ in the optimal coupled strategy $\pi^*$. It suffices to show
that $x^*$, $w^*$, and $u^*$ form a feasible point of the LP. From Lemma 7, we already know that
$x^*$ and $w^*$ must satisfy constraints (6.15) to (6.18). Furthermore, we have
$u_\lambda^* \le \rho$ since the rider pays at most the ticket price. Finally,
$u_\lambda^* \le \varrho \cdot \widehat{\Pr}(x^*, \lambda)$ since $\widehat{\Pr}(x^*, \lambda)$ is
an overestimate of the true detection probability.
Intuitively, LP (6.26) relaxes the utility functions by allowing an evader to be fined multiple
times (instead of only once in reality) during a single trip. The relaxed utilities are indeed
separable and thus the relaxed problem can be efficiently solved. Since the returned solution
$x^*$ and $w^*$ satisfies constraints (6.15) to (6.18), we can construct a Markov strategy from
$w^*$ as described in Section 6.2.2.2. The Markov strategy provides an approximate solution to the
original problem, whose actual value can be evaluated using Monte Carlo simulation.
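A Monte Carlo evaluation of this kind can be sketched for a single unit and a single rider type. The function name, the dictionary layouts, the fixed seed, and the toy instance are assumptions; the actual evaluation would cover all units and rider types.

```python
import random


def simulate_detection(T, pi, f, source, sink, n_samples=10000, seed=0):
    """Estimate the true (capped) detection probability of one unit's Markov strategy.

    T[(s, a)] -> {s': prob}; pi[s] -> {a: prob}; f[(s, a, s')] -> effectiveness.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        s, checked = source, 0.0
        while s != sink:
            acts = list(pi[s])
            a = rng.choices(acts, weights=[pi[s][b] for b in acts])[0]
            nxt = list(T[(s, a)])
            s2 = rng.choices(nxt, weights=[T[(s, a)][n] for n in nxt])[0]
            checked += f.get((s, a, s2), 0.0)
            s = s2
        total += min(checked, 1.0)   # an evader can be fined only once
    return total / n_samples


# Hypothetical instance: a delay with probability 0.2 wipes out all fare checks.
sink = ("end", 2)
T = {
    (("L1", 0), "check"): {("L1", 1): 0.8, ("late", 1): 0.2},
    (("L1", 1), "check"): {sink: 1.0},
    (("late", 1), "wait"): {sink: 1.0},
}
pi = {("L1", 0): {"check": 1.0}, ("L1", 1): {"check": 1.0}, ("late", 1): {"wait": 1.0}}
f = {(("L1", 0), "check", ("L1", 1)): 0.6, (("L1", 1), "check", sink): 0.6}
estimate = simulate_detection(T, pi, f, ("L1", 0), sink)
```

In this instance the exact capped detection probability is 0.8, while the relaxed estimate from (6.25) would be 0.96, so the simulation shows how much the LP value overstates the real one.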
6.2.3.2 Metro App: SmartPhone Implementation
In order to implement the TRUSTSv2 approach in real-world transit systems, the Metro App
presented in this section is being developed to work in accordance with TRUSTSv2 to (i) provide
officers with patrol policies generated by TRUSTSv2, (ii) provide recovery from schedule
interruptions, and (iii) collect patrol data. In this section, I will present how the Metro App
will interface with the user and the TRUSTSv2 component to provide patrol officers with real-time
TRUSTSv2-generated patrol schedules and collect reporting data from the patrol officer’s shift.
Moreover, I will discuss the features of the Metro App and its user interface design, and the
benefits expected from deploying TRUSTSv2 in the Los Angeles Metro System.
(a) Schedule view (b) Reporting view (c) Summary view
Figure 6.7: Metro App user interface.
The Metro App is a software agent carried by each patrol officer that provides an interface for
interaction between the user and TRUSTSv2. The Metro App provides three principal features:
a TRUSTSv2-generated patrol schedule for the current shift, a tracking system for reporting
passenger violations, and a shift statistics summary report. At the beginning of an officer’s
shift, the Metro App queries the database for the user’s patrol strategy (a Markov pure strategy)
for the current shift. From the patrol strategy, the Metro App displays a schedule of the user’s
current and upcoming patrol actions in “Schedule View”, shown in Figure 6.7(a). To implement
recovery from unexpected real-world events that cause the officer to fall off schedule, “Schedule
View” allows the officer to manually set their current location, triggering the app to dynamically
update their schedule based on the officer’s location. The new updated schedule is obtained from
the Markov pure strategy assuming no unexpected events will happen, as I have explained in
Section 6.2.2.2.
The Metro App also allows patrol officers to view and record passenger violations, such as
fare evasion, for the current patrol action using Reporting View, as shown in Figure 6.7(b).
Officers can also view and edit the passenger violations reported for past actions in Summary View,
shown in Figure 6.7(c). In Summary View, the officer can also view and submit their Metro
App-generated shift statistics summary report, including all unexpected events and violations
reported by the officer throughout the shift, to the TRUSTS database. Through analysis of this
collected patrol data, we expect to gain valuable insight into the Los Angeles Metro patrolling
domain, such as follower behavior patterns, and better evaluate the effectiveness of TRUSTS
deployment in a real transit system. In addition, as many transit system security departments
manually enter violations data, the Metro App can eliminate this inefficiency by automatically
submitting the collected data to the security department. Furthermore, this collected data will
also benefit transit system security departments that conduct their own analysis of patrol system
performance and the transit system.
6.3 Experimental Results
In this section, I will present experimental evaluation of TRUSTS based on real metro schedules
and rider traffic data provided by the LASD. For both TRUSTSv1 and TRUSTSv2, I solved the
LP with history duplication using CPLEX 12.2 on a standard 2.8GHz machine with 4GB memory.
I will first describe the data sets I used, followed by simulation results.
6.3.1 Data Sets
I created four data sets, each based on a different Los Angeles Metro Rail line: Red (including
Purple), Blue, Gold, and Green. For each line, I created its transition graph using the
corresponding timetable from http://www.metro.net. Implementing TRUSTSv1 and TRUSTSv2 requires
a fine-grained ridership distribution of potential fare evaders (recall that a rider type
corresponds to a 4-tuple of boarding station / time and disembarking station / time).
In my experiments, I assumed that potential fare evaders were evenly distributed among the
general population and created the required fine-grained rider distribution using hourly boarding
and alighting counts provided by the Los Angeles Sheriff’s Department. Suppose the percentage
of riders boarding in hour \(i\) is \(d_i^+\) and the percentage of riders alighting in hour \(i\) is \(d_i^-\). Denote
the set of rider types that board in hour \(i\) by \(\Lambda_i^+\) and the set that alight in hour \(i\) by \(\Lambda_i^-\). Then it is necessary
to compute a fine-grained ridership distribution \(p\) that matches the hourly boarding and alighting
percentages, i.e., to find a point within the following convex region \(\Omega\):
\[
\Omega = \Big\{\, p \;\Big|\; p \ge 0 \;\wedge\; \sum_{\lambda \in \Lambda_i^+} p_\lambda = d_i^+ \;\wedge\; \sum_{\lambda \in \Lambda_i^-} p_\lambda = d_i^-, \;\forall i \,\Big\}.
\]
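For simplicity, I estimated the fare evader distribution by finding the analytic center of
\(\Omega\), i.e., \(p^* = \arg\min_{p \in \Omega} \sum_{\lambda \in \Lambda} -\log(p_\lambda)\), where \(\Lambda\) is the set of all rider types, which can be efficiently computed.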
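As a rough illustration of the analytic-center idea (not the implementation used in the thesis), the sketch below handles a toy instance with two boarding hours and two alighting hours. It assumes four hypothetical rider types, one per (boarding hour, alighting hour) pair, so the feasible region has a single free parameter and the log-barrier maximizer can be found by bisection. The function name, the 2x2 layout, and the bisection solver are all assumptions made for illustration; a realistic instance would need a general convex solver.

```python
def analytic_center_2x2(board, alight, tol=1e-12):
    """Analytic center of {p > 0 : row sums = board, column sums = alight}
    for a toy 2x2 table of rider types (boarding hour x alighting hour)."""
    r1, r2 = board
    c1, c2 = alight
    assert abs((r1 + r2) - (c1 + c2)) < 1e-9, "marginals must have equal totals"

    # One free parameter t = p[0][0]; the other entries follow from the marginals.
    def table(t):
        return [[t, r1 - t], [c1 - t, r2 - c1 + t]]

    # Derivative of sum(log p) with respect to t. It is strictly decreasing,
    # so its root is the unique maximizer of the log barrier, which is the
    # same point as the minimizer of sum(-log p).
    def slope(t):
        (a, b), (c, d) = table(t)
        return 1 / a - 1 / b - 1 / c + 1 / d

    lo, hi = max(0.0, c1 - r2), min(r1, c1)  # feasible interval for t
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if slope(mid) > 0:
            lo = mid
        else:
            hi = mid
    return table((lo + hi) / 2)


if __name__ == "__main__":
    for row in analytic_center_2x2((0.6, 0.4), (0.7, 0.3)):
        print(row)
```

The returned table always matches the requested boarding and alighting percentages by construction; the bisection only decides how the mass is spread among the types.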
The inspection effectiveness f of a patrol action was adjusted according to the ridership volume
intersected. f is capped at 0.5 to capture the fact that the inspector cannot switch between
cars while the train is moving. (Trains contain at least two cars.) In the initial batch of experiments
on TRUSTSv1 (Section 6.3.2), f was estimated based on the assumption that 10 passengers
can be inspected per minute. In subsequent experiments on TRUSTSv2 (Section 6.3.4), this
inspection rate was reduced to 3 passengers per minute to account for the longer inspection time needed
for tap card users. While this modeling discrepancy makes the results in the two subsections not directly
comparable, the experiments conducted within each section were self-consistent, and a comparison
between TRUSTSv1 and TRUSTSv2 with exactly the same modeling parameters is given
in Section 6.3.4. The ticket fare was set to $1.5 (the actual current value) while the fine was set
to $100. (Fare evaders in Los Angeles can be fined $200, but they may also be issued warnings.)
If the fine could be increased dramatically, riders would have much less incentive for fare evasion,
and we could achieve better revenue. However, a larger fine is legally infeasible. Table 6.1
summarizes the detailed statistics for the Metro lines.
Line    Stops   Trains   Daily Riders   Types
Red     16      433      149991.5       26033
Blue    22      287      76906.2        46630
Gold    19      280      30940.0        41910
Green   14      217      38442.6        19559
Table 6.1: Statistics of Los Angeles Metro lines.
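The text states only that f depends on the ridership volume intersected, is capped at 0.5, and is derived from an assumed inspection rate. One plausible reading, sketched below purely as an assumption, is that f is the fraction of riders present that an inspector can check during the action. The function name and the exact formula are hypothetical and are not taken from the thesis.

```python
def inspection_effectiveness(duration_min, riders_present,
                             rate_per_min=10.0, cap=0.5):
    """Hypothetical model: fraction of riders inspected during one action.

    duration_min   -- length of the patrol action in minutes
    riders_present -- ridership volume intersected by the action
    rate_per_min   -- passengers inspected per minute (10 in the TRUSTSv1
                      experiments, 3 in the TRUSTSv2 experiments)
    cap            -- upper bound on effectiveness (0.5 in the text)
    """
    if riders_present <= 0:
        return cap  # no riders to check; the value is irrelevant in this case
    return min(cap, rate_per_min * duration_min / riders_present)


if __name__ == "__main__":
    # Same 5-minute action under the two inspection rates used in the text.
    print(inspection_effectiveness(5, 200, rate_per_min=10))
    print(inspection_effectiveness(5, 200, rate_per_min=3))
```

Under this reading, the effectiveness of a fixed-length action falls as ridership grows, which is in line with the observation that the busiest line is the hardest to cover.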
6.3.2 Simulation Results of TRUSTSv1
Throughout this set of experiments, I fixed the number of patrol units γ to 1. In the first set of
experiments, I fixed the penalty β to 0 (no penalty for using patrol paths with more switches), and
varied the maximum number of hours that an inspector can patrol from 4 to 7 hours. To create the
HDT graph, I took one starting time point every hour.
[Figure 6.8 plots: (a) x-axis: number of patrol hours (4 to 7); y-axis: revenue per rider. (b) x-axis: number of patrol hours (4 to 7); y-axis: percentage of upper bound. Series in both panels: Blue, Gold, Green, Red.]
Figure 6.8: Solution quality of TRUSTSv1: (a) Per passenger revenue of the computed mixed
strategy (b) Percentage of the solution value compared to the LP upper bound.
Figure 6.8(a) shows the expected revenue per rider of the mixed patrol strategy generated by
TRUSTSv1, which is the total revenue divided by the number of daily riders. Since the LP only
returns an upper bound on the attainable revenue, the true expected revenue of the mixed patrol
strategy was computed by evaluating the riders’ best responses for all rider types. A rider can
always pay the ticket price of $1.5 and will only evade the fare when the expected fine is lower.
Hence the theoretical maximum achievable value is $1.5, which is achieved when every rider
purchases a ticket. As we can see, the per-rider revenue increases as the number of patrol hours
increases, almost converging to the theoretical upper bound of $1.5 for the Gold and Green lines.
Specifically, a 4-hour patrol strategy already provides a reasonably good expected value: 1.31 for
the Blue line (87.4% of the maximum), 1.45 for the Gold line (97.0%), 1.48 for the Green line
(98.8%), and 1.22 for the Red line (81.3%). Among the four lines, the Red line has the lowest
revenue per rider. This is because the effectiveness of fare inspection decreases as the volume of
daily riders increases, and the Red line has a significantly higher number of daily riders than the
other lines.
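The evaluation step described above can be sketched as follows: each rider type pays the smaller of the ticket price and its expected fine, and the per-rider revenue is the probability-weighted average of these payments. The function name and the input format (one probability and one expected fine per rider type) are assumptions made for this sketch.

```python
TICKET_PRICE = 1.5  # ticket fare used in the experiments


def revenue_per_rider(type_probs, expected_fines, ticket_price=TICKET_PRICE):
    """Expected revenue per rider when every rider type best-responds.

    type_probs     -- probability (or weight) of each rider type
    expected_fines -- expected fine each type would pay if it evaded
    """
    total_weight = sum(type_probs)
    revenue = sum(p * min(ticket_price, fine)
                  for p, fine in zip(type_probs, expected_fines))
    return revenue / total_weight


if __name__ == "__main__":
    # Two equally likely types: one is deterred (fine 2.0), one evades (fine 1.0).
    print(revenue_per_rider([0.5, 0.5], [2.0, 1.0]))
```

Because each type pays at most the ticket price, the result can never exceed $1.5, matching the theoretical maximum stated in the text.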
Figure 6.8(b) depicts the true expected revenue as a percentage of the theoretical upper
bound returned by the LP. Strategies generated by our method are near optimal; for example,
our 4-hour strategies for the Blue, Gold, Green, and Red lines provided expected revenues of
96.5%, 98.5%, 99.5%, and 97.0% of the upper bound (and thus at least that fraction of the optimum),
respectively.
[Figure 6.9 plots: (a) x-axis: number of patrol hours (4 to 7); y-axis: percentage of riders; regions: prefer purchasing, prefer fare evasion, indifferent. (b) x-axis: number of patrol hours (4 to 7); y-axis: percentage of riders that prefer fare evasion; series: Blue, Gold, Green, Red.]
Figure 6.9: Fare evasion analysis of TRUSTSv1: (a) Evasion tendency distribution of the Red
line (b) Percentage of riders that prefer fare evasion.
To study riders’ responses to the computed strategy, I partitioned the entire population of
riders into three groups depending on their expected fine if fare-evading: riders who prefer
purchasing tickets (expected fine greater than 1.7, i.e., 13.3% above the ticket price), riders who prefer
fare evasion (expected fine less than 1.3, i.e., 13.3% below the ticket price), and indifferent
riders (expected fine between 1.3 and 1.7). In Figure 6.9(a), I show the distribution of the three
groups against the strategies computed for the Red line. The three dashed lines inside the region
of indifferent riders represent, from top to bottom, the percentages of riders whose expected fine
is less than 1.6, 1.5, and 1.4, respectively. As the number of patrol hours increases from 4 to 7, the
percentage of riders who prefer fare evasion decreases from 38% to 7%, the percentage of riders
who prefer purchasing tickets increases from 17% to 43%, and the percentage of indifferent riders
remains stable between 45% and 50%.
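The three-way partition can be written down directly from the thresholds in the text (1.3 and 1.7, i.e., the $1.5 ticket price plus or minus $0.2). The sketch below is illustrative only; the function names, labels, and input format are assumptions.

```python
def classify_rider(expected_fine, ticket_price=1.5, margin=0.2):
    """Label a rider type by its expected fine, using the thresholds in the text."""
    if expected_fine > ticket_price + margin:
        return "purchase"      # expected fine above 1.7
    if expected_fine < ticket_price - margin:
        return "evade"         # expected fine below 1.3
    return "indifferent"       # expected fine between 1.3 and 1.7


def group_shares(type_probs, expected_fines):
    """Share of the rider population falling in each of the three groups."""
    shares = {"purchase": 0.0, "evade": 0.0, "indifferent": 0.0}
    total = sum(type_probs)
    for p, fine in zip(type_probs, expected_fines):
        shares[classify_rider(fine)] += p / total
    return shares


if __name__ == "__main__":
    print(group_shares([0.2, 0.3, 0.5], [2.0, 1.0, 1.5]))
```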
Zooming in on fare evasion, Figure 6.9(b) shows the percentage of riders who preferred
fare evasion against the patrol strategies computed. As we can see, this percentage decreased
almost linearly in the number of additional patrol hours beyond 4. Our 7-hour patrol strategy
lowered this percentage to 4.2% for the Blue line, 0.01% for the Gold line, 0.01% for the Green
line, and 6.8% for the Red line. Again, due to having the highest daily volume, the Red line had
the highest percentage of riders who preferred fare evasion.
[Figure 6.10 plots: (a) x-axis: number of patrol hours (4 to 7); y-axis: solving time in seconds (log scale). (b) x-axis: solving time in seconds (log scale); y-axis: normalized revenue. Series in both panels: Blue, Gold, Green, Red.]
Figure 6.10: Runtime analysis of TRUSTSv1: (a) Runtime of solving the LP by CPLEX (b)
Tradeoffs between optimality and runtime.
Figure 6.10(a) shows the runtime required by CPLEX to solve the LPs created. As we can
see, the runtime increased as the number of patrol hours increased for all the metro lines. This is
because the size of the HDT graph constructed is roughly proportional to the maximum length of
the patrols, and a larger HDT graph requires an LP with more variables and constraints. Among
the four lines, the Red and the Green lines have significantly fewer types, and are thus easier to
solve than the other two lines.
To further study the tradeoff between solution quality and runtime efficiency, I varied the
interval between starting time points. I fixed the patrol length to 4 hours and the penalty parameter β
to 0. For each line, I tested 6 interval settings ranging from 0.5 hours to 4 hours. In Figure 6.10(b),
the x-axis is the runtime (in log scale) and the y-axis is the revenue normalized against the
expected revenue of the 0.5-hour interval within each line. For each line, the data points from left to right
correspond to intervals of 4, 3, 2, 1.5, 1, and 0.5 hours, respectively. Increasing the runtime
always led to a better solution; however, the quality gain was diminishing. For example, for the
Blue line, it took 20 seconds of additional runtime to increase the solution quality from 87.9%
(4 hours) to 92.9% (3 hours), whereas it took 1456 seconds of additional runtime to increase the
solution quality from 99.1% (1 hour) to 100% (0.5 hour).
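To make the diminishing return concrete, the Blue line figures quoted above can be converted into quality gained per additional second of runtime. The helper name below is hypothetical; the numbers are the ones reported in the text.

```python
def marginal_gain(quality_before, quality_after, extra_seconds):
    """Percentage points of solution quality gained per extra second."""
    return (quality_after - quality_before) / extra_seconds


# Blue line, coarse end: 4-hour interval -> 3-hour interval.
coarse = marginal_gain(87.9, 92.9, 20)
# Blue line, fine end: 1-hour interval -> 0.5-hour interval.
fine = marginal_gain(99.1, 100.0, 1456)

if __name__ == "__main__":
    print(f"coarse end: {coarse:.4f} points per second")
    print(f"fine end:   {fine:.6f} points per second")
    print(f"ratio:      {coarse / fine:.0f}x")
```

On these figures, a second of extra runtime buys roughly 400 times more quality at the coarse end than at the fine end.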
[Figure 6.11 plots: (a) x-axis: number of switches; y-axis: normalized revenue; series: Blue, Gold, Green, Red. (b) x-axis: number of switches; y-axis: cumulative distribution; series: β = 0, β = 0.001, β = 0.01.]
Figure 6.11: Reducing number of switches: (a) Tradeoffs between optimality and patrol
preference (b) Cumulative probability distribution of the number of switches for the Red line.
In the final experiment, I varied the penalty β, trading off solution quality against the
average number of switches. I fixed the patrol length to 4 hours and the starting time interval to
one hour. For each line, I tested 7 penalty settings from β = 0 to β = 0.01. Figure 6.11(a) plots
the average number of switches against the revenue normalized by the expected revenue of
β = 0 within each line. For all lines, higher β values led to both lower solution quality and fewer
switches. For example, the average number of switches in the solution of the highest
revenue (β = 0) ranged from 18.6 (Gold line) to 26.7 (Red line). However, by allowing a 3%
quality loss, this number could be lowered to less than 10 for all four lines.
To further understand the patrol paths returned in these solutions, I show, in Figure 6.11(b),
the cumulative probability distributions of the number of switches for the Red line given 3 settings
of β: 0, 0.001, and 0.01. Choosing a lower β tended to lead to more complex patrol paths. For
example, the solution of β = 0 used patrol paths whose number of switches is greater than 20
with 68.9% probability; the solution of β = 0.001 (99.7% of the optimum) only used such paths
with 31.2% probability. And the solution of β = 0.01 (97.0% of the optimum) never used patrol
paths that had more than 20 switches.
6.3.3 Field Trials of TRUSTSv1
In addition to simulations, some real-world trials of TRUSTSv1-generated schedules have been
conducted by the Los Angeles Sheriff’s Department to further validate the approach. In particular,
the LASD conducted two 4-hour patrol shifts on Jan. 4 and Jan. 5, 2012 as initial trials, followed
by ten 3-hour shifts on seven distinct dates in May and June and twelve more 3-hour shifts on
Sep. 21 and 24, 2012. These shifts were all conducted on the Red line. Figure 6.12 shows one
example of a patrol shift given to the LASD, where “Post” represents a fare inspection in the given
station and “Train” represents a fare inspection on the given train. The LASD followed the given
shift as closely as they could, and for each inspection period they collected statistics including the
number of patrons checked, warned, cited, and arrested. They were also encouraged to provide
feedback on the schedules, especially when they were unable to follow the patrol completely. An
example sheet of collected statistics and feedback is given in Figure 6.13.
Post   UNION 15:06 → 15:41 (35 mins)
Train  UNION 15:41 → WILSHIRE/VERMONT 15:50 (9 mins)
Post   WILSHIRE/VERMONT 15:50 → 16:44 (54 mins)
Train  WILSHIRE/VERMONT 16:44 → 7TH/METRO CENTER 16:48 (4 mins)
Post   7TH/METRO CENTER 16:48 → 17:23 (35 mins)
Train  7TH/METRO CENTER 17:23 → UNION 17:28 (5 mins)
Post   UNION 17:28 → 17:58 (30 mins)
Figure 6.12: Example of a fare inspection patrol shift.
Figure 6.13: Example of shift statistics and feedback provided by the LASD.
Table 6.2 summarizes the comparison between TRUSTSv1 shifts and LASD’s regular shifts.
It is worth noting that TRUSTSv1 shifts were more effective than regular shifts: officers were
able to check more patrons and catch more fare evaders when following the TRUSTSv1 schedules.
TRUSTSv1 shifts also yielded a higher citation rate (i.e., fare evasion rate) than regular shifts. A
plausible explanation of this observation is that regular shifts decided by human schedulers may
be limited to certain spatio-temporal patterns and can consequently fail to intersect the
high-evasion-rate traffic that emerges. Being fully machine-randomized and optimized, TRUSTS, on the other
hand, is able to identify location and time pairs where fare inspection can be the most effective yet
avoids being predictable.
(Daily Average)         TRUSTSv1   Regular
Checks per officer      403.3      317.8
Citations per officer   6.72       3.54
Citation rate           1.67%      1.11%
Table 6.2: Comparison between TRUSTSv1 shifts and LASD’s regular shifts.
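The citation rates in Table 6.2 follow from the other two rows (citations per officer divided by checks per officer), and the same figures give the relative improvement of TRUSTSv1 shifts over regular shifts. The short check below uses only the numbers reported in the table; the variable names are illustrative.

```python
shifts = {
    "TRUSTSv1": {"checks": 403.3, "citations": 6.72},
    "Regular": {"checks": 317.8, "citations": 3.54},
}

# Citation rate = citations per officer / checks per officer.
rates = {name: s["citations"] / s["checks"] for name, s in shifts.items()}

# Relative improvement of TRUSTSv1 over regular shifts.
check_gain = shifts["TRUSTSv1"]["checks"] / shifts["Regular"]["checks"] - 1
citation_gain = shifts["TRUSTSv1"]["citations"] / shifts["Regular"]["citations"] - 1

if __name__ == "__main__":
    for name, rate in rates.items():
        print(f"{name}: citation rate {rate:.2%}")
    print(f"checks per officer:    +{check_gain:.1%}")
    print(f"citations per officer: +{citation_gain:.1%}")
```

On these figures, officers following TRUSTSv1 schedules checked about 27% more patrons and issued about 90% more citations per day.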
While field trials of TRUSTSv1 showed encouraging results, serious issues also emerged
from the feedback given by the LASD. First, 8 out of 24 patrols were explicitly reported as
executed with errors for various reasons, including felon arrests, train delays, and backup requests.
The observation that execution error affected at least one third of the TRUSTSv1 schedules
motivates moving the TRUSTS system towards TRUSTSv2 to provide recoverable
patrols. Moreover, the feedback written on paper requires significant human effort to digitize the
statistics for further systematic analysis. More importantly, this digitization process is subject
to significant noise: counting errors, typos, bad handwriting, and incorrect data entry can all
corrupt the valuable data collected. The mobile phone application proposed in my thesis is
motivated by resolving these important issues: it provides a user-friendly interface for TRUSTSv2
schedules and simplifies the tasks of data collection, transmission, and formatting.
6.3.4 Simulation Results of TRUSTSv2
I studied the performance of the Markov strategies generated by TRUSTSv2 under a variety of
settings. As mentioned earlier, to better fit reality, the experiments in this section assumed
that the inspection rate was 3 passengers per minute instead of 10 (which was used in the
experiments described in Section 6.3.2). Throughout the settings that I tested, the Markov strategy
was close to optimal, with revenue always above 99% of the LP upper bound. Therefore, in the
remainder of this subsection I report values of the Markov strategy without mentioning the
LP upper bound.
In the first set of experiments, I compared, under execution uncertainty, the performance of
the Markov strategy against pre-generated schedules given by TRUSTSv1, a deterministic model
assuming perfect execution. However, the actions to take after deviations from the original plan are
not well-defined in TRUSTSv1 schedules, making a direct comparison inapplicable. Therefore, I
augmented these pre-generated schedules with two naive contingency plans indicating the actions
to follow after a unit deviates from the original plan. The first plan, “Abort”, is to simply abandon
the entire schedule and return to the base. The second plan, “Arbitrary”, is to pick an action
uniformly at random from all available actions at any decision point after the deviation.
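The two contingency plans can be summarized as a single decision rule. The sketch below is a simplified illustration rather than the simulator used in the thesis: the function name, the representation of a schedule as a list of action labels, and the "return_to_base" label are all assumptions.

```python
import random


def next_action(planned, step, deviated, policy, available, rng=random):
    """Choose the next action for a unit following a pre-generated schedule.

    planned   -- list of planned actions (hypothetical labels)
    step      -- index of the current decision point
    deviated  -- True once an unexpected event has broken the schedule
    policy    -- "abort" or "arbitrary", the two naive contingency plans
    available -- actions that are feasible at the current decision point
    """
    if not deviated:
        return planned[step]          # still on schedule: follow the plan
    if policy == "abort":
        return "return_to_base"       # abandon the rest of the schedule
    if policy == "arbitrary":
        return rng.choice(available)  # uniformly random feasible action
    raise ValueError(f"unknown policy: {policy}")


if __name__ == "__main__":
    plan = ["post_union", "train_to_wilshire", "post_wilshire"]
    options = ["post_union", "train_to_7th"]
    print(next_action(plan, 1, False, "abort", options))
    print(next_action(plan, 1, True, "abort", options))
    print(next_action(plan, 1, True, "arbitrary", options))
```

Neither rule uses any information about where the unit is or how much of the shift remains, which is the gap that a state-dependent Markov strategy is meant to fill.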
[Figure 6.14 plots: (a) x-axis: probability of unexpected event (0 to 0.25); y-axis: revenue per rider. (b) x-axis: delay time (5 to 25); y-axis: revenue per rider. (c) x-axis: probability of unexpected event (0 to 0.25); y-axis: fare evasion rate. Series in all panels: Markov, Arbitrary, Abort.]
Figure 6.14: Markov strategy (TRUSTSv2) vs. pre-generated strategy (TRUSTSv1): (a) revenue
per rider of varying η (b) revenue per rider of varying delay time (c) evasion rate of varying η
In this experiment, I fixed the number of units to 6 and the patrol length to 3 hours, and
presented the results on the Red line (experiments on other lines showed similar results). I first
fixed the delay time to 10 minutes and varied the delay probability η from 0% to 25%. As we can
see in Figure 6.14(a), both “Abort” and “Arbitrary” performed poorly in the presence of execution
uncertainty. With increasing values of η, the revenue of “Abort” and “Arbitrary” decayed much
faster than that of the Markov strategy. For example, when η was increased from 0% to 25%, the revenue
of “Abort” and “Arbitrary” decreased by 75.4% and 37.0%, respectively, while that of the Markov
strategy decreased by only 3.6%.
In addition to revenue, Figure 6.14(c) shows the fare evasion rate of the three policies with
increasing η. Following the same categorization as in Figure 6.9(a), I considered a rider to prefer
fare evasion if and only if his expected penalty from fare evasion is less than $1.3. As we can
see, “Abort” and “Arbitrary” showed extremely poor performance in evasion deterrence with
even a tiny probability of execution error. In particular, when η was increased from 0% to 5%,
the evasion rate of the Markov strategy barely increased, while that of “Abort” and “Arbitrary”
increased from 11.2% to 74.3% and 43.9%, respectively.
Then I fixed η to 10% and varied the delay time from 5 to 25 minutes. Figure 6.14(b) shows
that both “Abort” and “Arbitrary” performed worse than the Markov strategy. With increasing
delay time, the revenue of “Abort” remained the same, because the length of the delay does not
matter if the unit abandons the schedule after the first unexpected event. The revenue of
“Arbitrary”, however, decayed at a faster rate than that of the Markov strategy. When the delay time
was increased from 5 to 25 minutes, the revenue of “Abort” remained the same while that of
“Arbitrary” and the Markov strategy decreased by 14.4% and 3.6%, respectively.
An important observation here is that the revenue of “Abort”, a common practice in fielded
operations, decayed extremely fast with increasing η: even with a 5% probability of delay, the
revenue of “Abort” was only 73.5% of that of the Markov strategy. With a conservative estimate
of 6% potential fare evaders [Booz Allen Hamilton, 2007] and 300,000 daily riders in the LA
Metro Rail system, the 26.5% difference implies a daily revenue loss of $6,500, or $2.4 million
annually.
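The arithmetic behind this estimate can be retraced from the quoted figures. The text does not state the per-rider revenue used to arrive at $6,500, so the last line below back-solves for it under the assumption that the loss equals the number of potential evaders times the Markov revenue per potential evader times the 26.5% gap. That assumption, and the variable names, are mine rather than the thesis's.

```python
daily_riders = 300_000   # daily riders in the LA Metro Rail system
evader_share = 0.06      # conservative share of potential fare evaders
abort_gap = 0.265        # "Abort" earns 73.5% of the Markov revenue
daily_loss = 6_500       # daily revenue loss quoted in the text (dollars)

potential_evaders = daily_riders * evader_share   # 18,000 riders
annual_loss = daily_loss * 365                    # 2,372,500 dollars

# Assumption: daily_loss = potential_evaders * revenue_per_evader * abort_gap.
implied_revenue_per_evader = daily_loss / (abort_gap * potential_evaders)

if __name__ == "__main__":
    print(f"potential evaders per day: {potential_evaders:,.0f}")
    print(f"annual loss: ${annual_loss:,}")
    print(f"implied Markov revenue per potential evader: ${implied_revenue_per_evader:.2f}")
```

Under that assumption the quoted daily loss corresponds to a Markov revenue of about $1.36 per potential evader, and 365 days of such losses round to the $2.4 million annual figure.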
[Figure 6.15 plots: (a) x-axis: probability of unexpected event (0 to 0.25); y-axis: revenue per rider; series: Blue, Gold, Green, Red. (b) x-axis: probability of unexpected event (0 to 0.25); y-axis: fare evasion rate; series: Blue, Gold, Green, Red. (c) x-axis: probability of unexpected event (0 to 0.25); y-axis: revenue per rider; series: Low, Medium, High.]
Figure 6.15: Simulation results of TRUSTSv2: (a) Revenue per rider of Markov strategy (b)
Evasion rate of Markov strategy (c) Revenue decay with varying coverage levels.
In the second set of experiments, I showed that the Markov strategy performed consistently
well on all four lines with increasing delay probability η. I fixed the number of units to 6
and the patrol length to 3 hours, but varied η from 0% to 25%. Figure 6.15(a) and Figure 6.15(b)
show the revenue per rider and the evasion rate of the four lines, respectively [1]. As we can see,
the revenue decreased and the evasion rate increased with increasing η. However, the Markov
strategy was able to effectively allocate resources to counter the effect of increasing η in terms
of both revenue maximization and evasion deterrence. For example, the ratio of the revenue at
η = 25% to that at η = 0% was 97.2%, 99.1%, 99.9%, and 95.3% for the Blue, Gold, Green, and Red
lines, respectively. Similarly, when η was increased from 0% to 25%, the evasion rate of the Blue,
Gold, Green, and Red lines increased by 4.6, 1.9, 0.1, and 5.2 percentage points, respectively.
The next experiment showed that the revenue decay of the Markov strategy with respect to
the delay probability η could be affected by the amount of resources devoted to fare enforcement.
In Figure 6.15(c), I presented the revenue per rider with increasing η on the Red line only, but
the same trends were found on the other three lines. In this experiment, I considered 3, 6, and 9
patrol units, representing three levels of fare enforcement: low, medium, and high, respectively.
Intuitively, with more resources, the defender could better afford the time spent on handling
unexpected events without sacrificing the overall revenue. Indeed, as we can see, the rate of
revenue decay with respect to η decreased as we increased the level of fare enforcement from
low to high. For example, when η was increased from 0% to 25%, the revenue drop in the low,
medium, and high enforcement settings was 13.2%, 4.7%, and 0.4%, respectively.
[1] The revenue of the Red line was significantly lower than that of the other lines because the fare check
effectiveness f defined in Section 6.2.3.1 was set inversely proportional to the ridership volume.
[Figure 6.16 plots: (a) x-axis: number of patrol units (2 to 6); y-axis: revenue per rider; series: η = 0%, η = 10%, η = 20%. (b) x-axis: probability of unexpected event (0 to 0.25); y-axis: runtime in minutes; series: Blue, Gold, Green, Red.]
Figure 6.16: Simulation results of TRUSTSv2: (a) Revenue per rider with increasing coverage
(b) Worst-case LP runtime.
Next, I demonstrate the usefulness of the Markov strategy in distributing resources under
different levels of uncertainty. I show results on the Red line with a fixed patrol length of 3
hours. Three delay probabilities, η = 0%, 10%, and 20%, were considered, representing increasing
levels of uncertainty. Figure 6.16(a) shows the revenue per rider with the number of
units increasing from 2 to 6. As I increased the number of units, the revenue increased towards the maximal
achievable value of $1.5 (the ticket price). For example, when η = 10%, the revenue per rider was
$0.65, $1.12, and $1.37 with 2, 4, and 6 patrol units, respectively.
Finally, Figure 6.16(b) plots the worst-case runtime (over 10 runs) of the LP with increasing
η for the four metro lines. The number of units was fixed to 3 and the patrol length per unit was
fixed to 3 hours. As we can see, TRUSTSv2 was able to solve all of the problems within an hour.
The runtime varied among the four Metro lines and correlated with their numbers of states and types.
For example, when η = 10%, the runtime for the Blue, Gold, Green, and Red lines was 14.0,
24.3, 2.4, and 4.3 minutes, respectively. Surprisingly, for all four lines, stochastic models
with η = 5% took less time to solve than deterministic models (η = 0%). Overall, no significant
correlation between the runtime and the delay probability was found.
Chapter 7: Related Work
Dealing with uncertainty and finding robust equilibria has long been an active topic in game
theory, traditionally with a focus on simultaneous-move games. Numerous models and approaches
have been proposed, such as the classic Bayesian game model [Harsanyi, 1967], robust game
theory [Aghassi and Bertsimas, 2006b], and various equilibrium refinement concepts [Selten, 1975;
McKelvey and Palfrey, 1995; Beja, 1992]. My thesis focuses on Stackelberg games, which have
received a lot of recent attention due to their real-world deployment: the ARMOR program [Pita
et al., 2008] has been deployed at the Los Angeles International Airport since 2007. Since then,
many new uncertainty models, robust techniques, and human bias models for Stackelberg games
have been developed with a focus on security applications [Tambe, 2011]. In addition to research on
uncertainty in game theory, another line of research that is related to my thesis, and in particular
to the TRUSTS application, aims at finding efficient algorithms for solving complex graph-based
patrolling games, either optimally or approximately.
Therefore, in this chapter, I describe research related to my thesis in the following three
categories: (i) work on addressing uncertainty in simultaneous-move games, (ii) uncertainty
modeling and robust solutions for Stackelberg games and security games, and (iii) efficient solutions for
complex graph patrolling games.
7.1 Uncertainty in Simultaneous-move Games
In game theory, uncertainty about data (such as players’ payoffs) is typically considered in
simultaneous-move games with a focus on finding robust Nash equilibria. Harsanyi 1967
modeled incomplete information games (i.e., games with payoff uncertainty) as Bayesian games,
encoding such payoff uncertainty in players’ type information. He showed that any Bayesian game is
equivalent to an extensive-form game with complete but imperfect information. This
extensive-form game, in turn, is known to have a strategic-form representation. This modeling technique
requires the availability of full prior distributional information for all uncertain parameters.
Robust game theory [Aghassi and Bertsimas, 2006b], alternatively, employed an uncertainty
model analogous to robust optimization [Ben-Tal et al., 2008] and provided a distribution-free
equilibrium concept called robust-optimization equilibrium. They showed that computing a
robust-optimization equilibrium for finite games with bounded polyhedral payoff uncertainty sets
is equivalent to identifying a Nash equilibrium of finite games with complete information. My
work Recon [Yin et al., 2011] employs a similar framework in terms of optimizing the worst-case
utility with bounded uncertainty sets, but solves for the Stackelberg equilibrium, to which the work
of Aghassi and Bertsimas 2006b is not applicable.
Other than payoff uncertainty, work in the game theory community also investigates execution
error or bounded rationality separately, such as trembling-hand perfect equilibria [Selten, 1975],
quantal response equilibria [McKelvey and Palfrey, 1995], and imperfect equilibria [Beja, 1992].
The goal of these works is to refine notions of equilibrium. In each of these works, players are
assumed to make errors in choosing which pure strategy to play. The deviations from the intended
actions of the players are correlated with the expected payoff of each of the actions. Execution
error was also studied in repeated games with incomplete information, such as [Archibald and
Shoham, 2011].
While the uncertainty models and equilibrium refinement concepts can often be adapted to
the Stackelberg setting, as seen in [Paruchuri et al., 2008; Yin et al., 2011; Yang et al., 2012], the
algorithms developed for simultaneous-move games are generally inapplicable to Stackelberg
games. Consequently, many works focusing on Stackelberg games have been proposed
separately.
7.2 Uncertainty in Stackelberg Games
7.2.1 Algorithms for Bayesian Stackelberg Games
For leader-follower Stackelberg games, the Bayesian extension analogous to that of
simultaneous-move games has received much recent research interest due to its use in deployed security
applications [Tambe, 2011]. In particular, Conitzer and Sandholm 2006 proved that finding an
optimal leader’s mixed strategy in two-player Bayesian Stackelberg games is NP-hard, and provided a
solution method that solves multiple linear programs (possibly exponentially many). Paruchuri et
al. 2008 provided Dobss, a single mixed-integer linear program formulation that solves the
problem. Jain et al. 2011b employed a branch-and-bound search algorithm in which they computed
heuristic upper and lower bounds by solving smaller restricted problems. Despite these algorithmic
advances, none of these techniques can handle games with more than 50 types, even when the
number of actions per player is as few as 5. Beyond discrete uncertainty, Kiekintveld et al. 2011
modeled continuously distributed uncertainty over the preferences of the follower as Bayesian
Stackelberg games with infinite types, and proposed an algorithm to generate approximate solutions
for such games.
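All of these solvers search over leader mixed strategies, and the quantity they optimize is the leader's expected utility when each follower type best-responds. The sketch below only evaluates that objective for a given mixed strategy, with ties broken in the leader's favor as in a strong Stackelberg equilibrium. It does not implement the multiple-LP method, Dobss, or any other solver named above, and the function name and payoff layout are assumptions.

```python
def leader_value(x, types, tol=1e-9):
    """Leader's expected utility in a Bayesian Stackelberg game.

    x     -- leader mixed strategy, x[i] = probability of leader action i
    types -- list of (prior, leader_payoff, follower_payoff) per follower
             type, where payoff[i][j] is for leader action i, follower action j
    """
    total = 0.0
    for prior, leader_payoff, follower_payoff in types:
        n_follower = len(follower_payoff[0])
        follower_eu = [sum(x[i] * follower_payoff[i][j] for i in range(len(x)))
                       for j in range(n_follower)]
        leader_eu = [sum(x[i] * leader_payoff[i][j] for i in range(len(x)))
                     for j in range(n_follower)]
        best = max(follower_eu)
        # Follower best responses; ties are broken in favor of the leader.
        best_responses = [j for j in range(n_follower)
                          if follower_eu[j] >= best - tol]
        total += prior * max(leader_eu[j] for j in best_responses)
    return total


if __name__ == "__main__":
    R = [[2, 4], [1, 3]]          # leader payoffs (same for both types)
    C1 = [[1, 0], [0, 1]]         # follower payoffs, type 1
    C2 = [[0, 1], [1, 0]]         # follower payoffs, type 2
    print(leader_value([1.0, 0.0], [(0.5, R, C1), (0.5, R, C2)]))
```

The number of follower types enters only through the outer loop here, but an exact solver must reason about every combination of best responses across types, which is where the scaling difficulty noted above comes from.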
My work advances this line of research in two aspects: (i) I provided a novel Bayesian
Stackelberg game solver, Hunter, which runs orders of magnitude faster than previous methods, and (ii)
I extended Bayesian Stackelberg games to model the leader’s execution uncertainty and the follower’s
observation uncertainty (potentially continuous) in a unified framework.
7.2.2 Robust Solutions
As an alternative to the Bayesian model of uncertainty, there have been works on security games
that compute robust solutions without explicitly modeling the uncertainty. Cobra [Pita et al.,
2010] assumes that the follower has bounded rationality and may not strictly maximize expected
value. As a result, the follower may select an ε-optimal response strategy, i.e., the follower
may choose any of the responses within ε of his optimal strategy. Cobra attempts to maximize
the leader’s expected value for the worst-case scenario that falls within this ε-bound of the optimal
response.
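The worst-case evaluation over ε-optimal responses can be illustrated for a fixed leader strategy as follows. This sketch covers only that evaluation step. It is not the Cobra algorithm, which optimizes over leader strategies and also models other aspects of human behavior, and the function name and payoff layout are assumptions.

```python
def worst_case_leader_value(x, leader_payoff, follower_payoff, eps):
    """Leader's utility if the follower picks the worst (for the leader)
    response among those within eps of the follower's best expected utility.

    x                -- leader mixed strategy over leader actions
    leader_payoff    -- leader_payoff[i][j] for leader action i, follower action j
    follower_payoff  -- follower_payoff[i][j], same indexing
    eps              -- tolerance defining the set of eps-optimal responses
    """
    n, m = len(x), len(follower_payoff[0])
    follower_eu = [sum(x[i] * follower_payoff[i][j] for i in range(n))
                   for j in range(m)]
    leader_eu = [sum(x[i] * leader_payoff[i][j] for i in range(n))
                 for j in range(m)]
    best = max(follower_eu)
    return min(leader_eu[j] for j in range(m) if follower_eu[j] >= best - eps)


if __name__ == "__main__":
    R = [[5, 1], [0, 0]]
    C = [[1.0, 0.9], [0.0, 0.0]]
    print(worst_case_leader_value([1.0, 0.0], R, C, eps=0.0))
    print(worst_case_leader_value([1.0, 0.0], R, C, eps=0.2))
```

In the example, a follower who is allowed to be 0.2 away from optimal can switch to the second response, which lowers the leader's guaranteed utility from 5 to 1.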
In contrast, Match [Pita et al., 2012] employs an idea of graduated robust optimization, which
constrains the impact of the follower’s deviations depending on the magnitude of the deviation. In
particular, Match bounds the leader’s loss for a potential deviation of the follower by an adjustable
fraction of the follower’s loss for the deviation from the expected-value-maximizing strategy.
In addition to robust optimization formulations, An et al. 2011 provided refinement
methods for strong Stackelberg equilibrium in security games to achieve “free” additional robustness
against potential off-equilibrium actions played by an adversary due to his capability limitations.
My work Recon complements the works above by explicitly considering two major causes
of real-world uncertainty, the leader’s execution error and the follower’s observation noise, and
therefore provides solutions that are robust to such uncertainty. Recon is able to utilize partial
knowledge about the uncertainty, such as the noise levels at different targets, when such
information is available. In contrast, Cobra and Match are incapable of taking advantage of such
uncertainty knowledge due to their limited parameter space.
7.2.3 Against Suboptimal Opponents
Another line of research studied systematic biases and bounded rationality of human opponents
in the context of security games. Pita et al. 2010 suggested an anchoring bias of human decision
makers and incorporated this human modeling component in their algorithmic contribution Cobra.
Yang et al. designed algorithms [Yang et al., 2011, 2012] to compute solutions for
Stackelberg games based on the prospect theory model [Kahneman and Tversky, 1979] and the quantal
response model [McKelvey and Palfrey, 1995]. Shieh et al. showed that the quantal response model
could also provide robustness against execution and observation errors as an effect of smoothing
out the follower’s response [Shieh et al., 2012].
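The smoothing effect mentioned above is easy to see in the commonly used logit form of quantal response, in which the follower chooses each action with probability proportional to the exponential of a precision parameter times that action's expected utility. The sketch below uses this standard form as an assumption; the cited works may parameterize the model differently, and the function and parameter names are illustrative.

```python
import math


def quantal_response(utilities, precision):
    """Logit quantal response: P(j) is proportional to exp(precision * utilities[j]).

    precision = 0 gives uniformly random play; large precision approaches
    a strict best response.
    """
    top = max(utilities)
    # Subtract the maximum before exponentiating for numerical stability.
    weights = [math.exp(precision * (u - top)) for u in utilities]
    total = sum(weights)
    return [w / total for w in weights]


if __name__ == "__main__":
    utilities = [1.0, 0.8, 0.2]
    for precision in (0.0, 2.0, 20.0):
        print(precision, [round(q, 3) for q in quantal_response(utilities, precision)])
```

Because the choice probabilities change continuously with the utilities, a small perturbation of the leader's strategy changes the follower's response only slightly, in contrast to the abrupt switch of a strict best response.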
These works have focused on creating accurate models of human decision making using
controlled human subject experiments. When sufficient domain data is available, integrating human
decision models with the Bayesian Stackelberg game model can be a valuable future research topic.
However, finding perfect models of human decision making is difficult and requires a large amount
of data, which can be limited in certain security applications. When data is limited, it can be
beneficial for the security agency to use a robust optimization framework such as Recon, Cobra, or
Match.
7.2.4 Observability and Commitment
In terms of the follower’s observation uncertainty, there has been significant interest in
understanding the interaction of observability and commitment in general Stackelberg games.
Bagwell 1995 questioned the value of commitment to pure strategies given noisy observations by
followers, but the ensuing and ongoing debate illustrated that the leader retains her advantage in
the case of commitment to mixed strategies [Huck and Müller, 2000; van Damme and Hurkens, 1997].
The value of commitment for the leader when observations are costly was studied in [Morgan and
Vardy, 2007]. Secrecy and deception in Stackelberg games were also considered in [Zhuang and
Bier, 2011]. In contrast, my work focused on real-world security games, providing theoretical
properties [Yin et al., 2010] that do not exist in the general Stackelberg games studied
previously.
In the context of security games, the follower’s limited observability has been investigated both
theoretically, assuming that the follower updates his belief according to Bayes’ rule [An et al.,
2012], and empirically, through human subject experiments [Pita et al., 2010]. Both investigations
stick to a modified Stackelberg paradigm where the follower is assumed to infer the leader’s
strategy through a limited number of observations. Such approaches are, however, sensitive to
the follower’s observability model, such as the number of observations allowed, which is difficult
to estimate in certain security applications. My work, alternatively, established a theoretical
partial equivalence between the leader’s strategies in the Stackelberg (perfect observability) and
simultaneous-move (no observability) models, suggesting that playing a Stackelberg equilibrium
strategy is optimal for the leader regardless of the follower’s observability. As a follow-up to my
work, Korzhyk et al. 2011a studied the problem in which the follower observes the leader’s strategy
perfectly with a known probability and does not observe it at all otherwise.
7.2.5 Markov Decision Process and Stochastic Games
The MDP model used in TRUSTSv2 [Jiang et al., 2013] for modeling execution uncertainty
resembles the transition-independent DEC-MDP (TI-DEC-MDP) [Becker et al., 2003]. TRUSTSv2
with the coupled execution of multiple teams is analogous to the TI-DEC-MDP model with full
communication, where the optimal joint policy is sought. Decoupled execution, on the other hand,
corresponds to the TI-DEC-MDP model with no communication. However, two major distinctions
exist, presenting unique computational challenges. First, TRUSTSv2 considers the strategic
interaction against adversaries and focuses on equilibrium computation. Second, utility functions in
TRUSTSv2 are non-Markovian: they depend on entire trajectories, as opposed to only state-action
pairs as in typical DEC-MDP models.
The game model of TRUSTS in my thesis can be considered as a special case of extensive-form
Stackelberg games with chance nodes, or as a special case of stochastic Stackelberg
games [Basar and Olsder, 1995]. The state in this special stochastic Stackelberg game is
essentially the patroller’s physical state (location and time), and the transitions between states
depend purely on the patroller’s actions. The follower (rider) in this special game can only
choose one action (i.e., buy or not buy the ticket) in the initial state and sticks to that action in
all future states. The general cases of both games were shown to be NP-hard [Letchford and
Conitzer, 2010; Letchford et al., 2012]. Vorobeychik and Singh [2012] provided mixed integer
linear programs for finding optimal and approximate Markov stationary strategies in general-sum
stochastic Stackelberg games. However, their approach does not handle multiple adversary types,
and their MILP formulation does not scale to a large number of states, which makes it inapplicable
to the Los Angeles Metro problems studied in my thesis.
7.3 Solving Complex Graph Patrolling Games
There has been research on a wide range of problems in game-theoretic patrolling on graphs
that are related to TRUSTS, presented in this thesis. One line of work considers games
in which one player, the patroller, patrols the graph to detect and catch the other player, the
evader, who tries to minimize the detection probability. This includes work on hider-seeker
games [Halvorson et al., 2009] for the case of mobile evaders and search games [Gal, 1979]
for the case of immobile evaders.
Another line of research considers games in which the patroller deploys resources (static
or mobile) on the graph to prevent the other player, the attacker, from reaching certain target
vertices. There are a few variations depending on the set of possible sources and targets of the
attacker. Infiltration games [Alpern, 1992] consider one source and one target. Asset protection
problems [Dickerson et al., 2010] and network interdiction [Washburn and Wood, 1995] consider
multiple sources and multiple equally weighted targets.
In the context of Stackelberg games for security, there have been numerous works related
to solving large-scale graph-related problems such as protecting commercial flights [Tsai et al.,
2009; Kiekintveld et al., 2009; Jain et al., 2010], protecting urban road networks [Tsai et al.,
2010; Jain et al., 2011a], port security [Shieh et al., 2012], hostile area transit [Vanek et al.,
2011], malicious packet detection [Vanek et al., 2012b], and preventing illegal extraction of forest
resources [Johnson et al., 2012]. Large-scale games on graphs often involve a combinatorial
number of pure strategies, which grows exponentially with increasing problem size. Therefore,
general-purpose Stackelberg game solvers such as Dobss and Hunter can be extremely inefficient
when applied directly. Efficient solution approaches for large-scale problems can be generally
divided into three categories: (i) exact solution methods using oracle-based algorithms [Jain et al.,
2010, 2011a; Vanek et al., 2011; Tsai et al., 2012], (ii) approximate solution methods utilizing
submodular objective functions [Krause et al., 2011; Vanek et al., 2012b], and (iii) approximate
(relaxed) solution methods via compact strategy representation [Kiekintveld et al., 2009; Tsai et al.,
2010].
Oracle-based algorithms start with a small subset of pure strategies of the full game and search
for an equilibrium iteratively in a succession of increasingly larger subgames of the full game.
In each iteration, the best response to the equilibrium of the current subgame is provided by an
oracle and added to the current pure strategy set of the respective player. The performance of the
oracle plays an important role in the overall performance of the algorithm. Jain et al. [2010]
presented Aspen for scheduling air marshals to protect commercial flights (FAMS), which combines
a branch-and-price approach and a single best-response oracle for generating the defender’s pure
strategies. Double oracle algorithms, with one oracle for each player, were used in zero-sum games
where both players have large pure strategy spaces. Jain et al. [2011a] provided a double oracle
algorithm for scheduling checkpoints in urban road networks, where the defender chooses a
combination of road segments to set up checkpoints and the attacker chooses a path in the network
to reach a desired target.
In certain security domains where the leader’s utility function has the submodularity property,
i.e., a natural effect of diminishing returns, there exist approximation algorithms with provable
quality guarantees. Specifically, the leader is considered to have a submodular utility function
if the marginal utility of deploying an additional resource is larger when few resources have been
deployed and smaller when many resources have been deployed. Submodularity has been exploited in
optimizing sensor allocations in adversarial environments [Krause et al., 2011] and in randomizing
deep packet inspections for malicious packet detection within computer networks [Vanek et al.,
2012b].
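The diminishing-returns property described above can be seen in a toy coverage problem. The sketch below is the classical greedy heuristic for maximizing a monotone submodular set function under a cardinality budget, for which a (1 - 1/e) approximation guarantee is known in the non-adversarial setting. It is not the algorithm of Krause et al. [2011] or Vanek et al. [2012b]; the sensor names and coverage sets are hypothetical.

```python
def coverage(selected, sensor_range):
    """Number of distinct locations covered by the selected sensors."""
    covered = set()
    for s in selected:
        covered |= sensor_range[s]
    return len(covered)


def greedy(sensor_range, budget):
    """Pick `budget` sensors, each time adding the largest marginal gain."""
    chosen, gains = [], []
    for _ in range(budget):
        base = coverage(chosen, sensor_range)
        best = max(
            (s for s in sensor_range if s not in chosen),
            key=lambda s: coverage(chosen + [s], sensor_range) - base,
        )
        gains.append(coverage(chosen + [best], sensor_range) - base)
        chosen.append(best)
    return chosen, gains


if __name__ == "__main__":
    # Hypothetical sensors and the locations each one can observe.
    ranges = {"a": {1, 2, 3, 4}, "b": {3, 4, 5}, "c": {5, 6}, "d": {1, 6}}
    chosen, gains = greedy(ranges, 3)
    print(chosen, gains)  # the marginal gains never increase
```

The recorded marginal gains shrink as more sensors are deployed, which is exactly the structure that makes a provable quality guarantee possible.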
The final line of research focuses on finding approximate solutions by utilizing compact
strategy representations. Eraser-C [Kiekintveld et al., 2009] is an approximate algorithm for the
FAMS problem, representing the defender’s mixed strategy as a marginal coverage vector. This
representation relaxes the original strategy space and therefore may fail to generate a feasible
solution in cases where arbitrary schedules with more than two flights (i.e., multi-city tours)
are allowed. Eraser-C avoids enumerating joint schedules to gain runtime efficiency, but loses
the ability to correctly model arbitrary schedules. Similar to Eraser-C, Ranger [Tsai et al.,
2010] solves the urban network security problem using a marginal coverage representation of the
defender’s allocation strategy of road checkpoints and provides a couple of sampling approaches
to create feasible pure allocations from the marginal strategy generated. Although the sampling
approaches are guaranteed to match the marginal coverage vector, the defender’s utility
function optimized in Ranger is an overestimate, and therefore Ranger may not find the
optimal solution.
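One standard way to draw a pure allocation whose inclusion probabilities match a marginal coverage vector is systematic sampling, sometimes called comb sampling. The sketch below illustrates that idea only. Whether it coincides with the sampling procedures in Ranger is not stated here, and it assumes unconstrained placement, with each marginal in [0, 1] and the marginals summing to the number of resources. The function name and the example numbers are hypothetical.

```python
import random


def comb_sample(marginals, rng=random):
    """Sample target indices so that target i is chosen with probability marginals[i].

    Assumes every marginal lies in [0, 1] and that they sum to an integer m
    (the number of resources). Up to floating-point rounding, the result
    contains m distinct indices.
    """
    m = round(sum(marginals))
    offset = rng.random()  # a single uniform draw positions all m comb teeth
    picks, cumulative, tooth = [], 0.0, 0
    for i, c in enumerate(marginals):
        cumulative += c
        # Target i owns the interval ending at `cumulative`; it is picked
        # when the next unused tooth (at offset + tooth) falls inside it.
        if tooth < m and offset + tooth < cumulative:
            picks.append(i)
            tooth += 1
    return picks


if __name__ == "__main__":
    coverage_vector = [0.5, 0.5, 0.25, 0.75]  # hypothetical, sums to 2 checkpoints
    print(comb_sample(coverage_vector, random.Random(1)))
```

Matching the marginals does not by itself make the optimized objective exact, which is consistent with the caveat above that a marginal-based formulation may overestimate the defender's utility.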
TRUSTS presented in this thesis, however, introduces unique computational challenges. First,
unlike in existing work on graph patrolling games and previous security applications for
counterterrorism, the followers to influence in TRUSTS are potentially very many: large numbers of
train riders might plausibly consider fare evasion. Booz Allen Hamilton [2007] estimates that 6% of
riders are ticketless in the metro system overall; anecdotal reports suggest that on some lines this
percentage could be far greater, even a majority. Second, the patrols in TRUSTS correspond to all
the feasible trips within the transit network subject to restrictions and preferences that did not
exist in previous applications. Similar to Eraser-C [Kiekintveld et al., 2009] and Ranger [Tsai
et al., 2010], the patrol strategies in TRUSTS are compactly represented as a marginal coverage
vector. But unlike the FAMS problem, where a patrol consists of a very limited number of flights
(often a pair of flights), and unlike the urban network security problem, where checkpoints can be
placed on any edges of the graph without constraints, TRUSTS allows much more complex patrol
constraints using a novel compact representation based on history-duplicate transition graphs.
Moreover, in contrast to Eraser-C, which may fail to provide a feasible solution,
the approximate solutions given by TRUSTS are always feasible, with near-optimal performance
on real datasets.
Chapter 8: Conclusions
Game-theoretic approaches have shown their usefulness in deployed security applications such
as ARMOR for the Los Angeles International Airport [Pita et al., 2008], IRIS for the Federal Air
Marshal Service [Tsai et al., 2009], Guards for the Transportation Security Administration [Pita
et al., 2011], PROTECT for the Boston Coast Guard [Shieh et al., 2012], and TRUSTS for the
Los Angeles Metro Rail System [Yin et al., 2012a]. At the core of these applications is the
Stackelberg game model. Despite its recent success in real-world deployments, the Stackelberg
game paradigm is often questioned due to unrealistic assumptions such as (i) the security agency
has complete knowledge about the adversary, (ii) the security agency can perfectly execute the
planned security activities, and (iii) the adversary can observe the exact mixed strategy of the
security agency, i.e., a probability distribution over actions.
Given the huge growth of recent research interest at the intersection of computer science
and game theory, there have been heated discussions about the question “Does game theory
actually work?”. The answer is not as straightforward as one might think, and depends heavily on
how game theory is interpreted. Wooldridge [2012] gave two interpretations: a descriptive
interpretation, which views game theory as predicting how (human) players will behave in strategic
settings, and a normative interpretation, which views game theory as a tool to recommend actions
for players. My thesis focuses on applying game theory to real-world problems (in particular
security randomization), where the grand task is to make game theory work better under both its
descriptive and normative interpretations.
To this end, my thesis on the one hand augments the existing game-theoretic framework to
model and address real-world uncertainty such as that in preference, execution, and observation,
providing better descriptive models and the corresponding solution methods for these real-world
problems. On the other hand, my thesis also addresses various real-world challenges that arise
from public transit domains such as scheduling constraints, human preferences, patrol
interruptions, and so on, providing practical and usable recommendations to human users. In
particular, my thesis makes the following four key contributions.
8.1 Contributions
Hunter is a new algorithm for solving discrete finite Bayesian Stackelberg games,
combining five key ideas:
– efficient pruning via a best-first search in the follower’s strategy space;
– a novel linear program for computing tight upper bounds for this search;
– using Benders’ decomposition for solving the upper bound linear program efficiently;
– efficient inheritance of Benders’ cuts from parent to child;
– an efficient heuristic branching rule.
My experimental results suggest that Hunter could provide orders of magnitude speedups
over the best existing methods for Bayesian Stackelberg games [Conitzer and Sandholm,
2006; Paruchuri et al., 2008; Jain et al., 2011b]. Moreover, as verified by my experiments,
Hunter’s efficiency can be exploited in the sample average approximation approach to
handling execution and observation uncertainty in both discrete and continuous forms in a
unified framework.
Recon is a robust optimization framework to address execution and observation
uncertainty of unknown distribution, with a focus on security games motivated by the ARMOR
application. Recon is suitable for security applications where full distributional knowledge
about the uncertainty is difficult or impossible to acquire. In the absence of the precise
uncertainty distribution, Recon models the uncertainty boundary as a hyper-rectangle, and
correspondingly computes the optimal risk-averse strategy for the leader. I provide
experimental analysis comparing the performance of various security game strategies, including
those generated by Recon and Hunter, in simulated uncertainty settings, showing the value
of Recon and Hunter under different assumptions.
Stackelberg vs. Nash: This work answers a fundamental question in game-theoretic
modeling of security applications: what should the security agency do if it is uncertain whether
or not the adversary will conduct surveillance? I provide theoretical and experimental
analysis of this problem, focusing on security games motivated by the ARMOR and IRIS
applications. In particular, I show that in security games that satisfy the SSAS property (such
as ARMOR games), any Stackelberg game equilibrium strategy for the defender is also
a Nash equilibrium strategy. In this case, the defender is therefore best-responding with
a Stackelberg equilibrium strategy regardless of the follower’s ability to observe. On the
other hand, counterexamples to this (partial) equivalence between the Stackelberg and
Nash equilibrium strategies exist when the SSAS property does not hold. However, my
experiments show that in this case, the fraction of games where the Stackelberg
equilibrium strategy is not in any Nash equilibrium is vanishingly small with increasing problem
sizes, especially for the IRIS games, which have small schedule sizes and a large number of
schedules.
TRUSTS is a new application for scheduling inspection patrols in public transit systems for
fare evasion deterrence, which presents new challenges in game-theoretic modeling and
execution uncertainty handling. In particular, security activities in TRUSTS are carried out
as sequences of actions at different places and times, subject to strict restrictions imposed by
the underlying train system and preferences expressed by human patrollers. Execution
uncertainty in such spatiotemporal domains needs an entirely different treatment than in earlier
applications such as ARMOR and IRIS, since an execution error can affect the security
officers’ ability to carry out their planned schedules in later time steps. The novel contributions
of TRUSTS are the following:
– a general Bayesian Stackelberg game model for spatiotemporal patrolling with
execution uncertainty, where the execution uncertainty is represented as Markov Decision
Processes,
– a compact strategy representation when the utility functions have a certain
separable structure, which reduces the problem to a polynomial-sized linear optimization
problem,
– a novel history-duplicate approach to encode constraints on feasible patrols within
the compact representation,
– a smart phone app implementation of the generated patrol schedules with contingency
plans,
– simulations and real-world experiments on the Los Angeles Metro Rail system in
collaboration with the Los Angeles Sheriff’s Department.
My simulation results show that TRUSTS can provide near-optimal solutions for large-scale
problems within reasonable runtime requirements. Initial real-world trials show
encouraging results, indicating that TRUSTS schedules can be more effective than human-created
schedules in catching fare evaders.
To summarize, my thesis contributes multiple uncertainty models for Stackelberg games,
focusing on security applications where the leader’s execution and the follower’s observation are
imperfect. These contributions allow the security agency to utilize different amounts of information
available about the uncertainty and generate reliable or robust strategies, or even strategies with
contingency plans for cases where execution errors at earlier steps may void later plans.
8.2 Future Work
In the future, one can imagine game-theoretic approaches being applied in a large spectrum of
applications far beyond counterterrorism. The growing list of such applications ranges from
ensuring safety in public facilities such as transportation hubs, parks, and sports stadiums to
protecting natural resources such as forests, animals, and fish. New applications are
emerging at a rapid rate, bringing significant challenges in scalability and modeling that require
future research endeavors.
While this thesis presented algorithmic advancements in Bayesian Stackelberg games that
allow problems with significantly more types to be solved, the scalability of the algorithm is still
limited and inadequate for large-scale problems that may involve tens of thousands of opponents,
such as riders in public transit systems,¹ cars in road networks, and criminals in large
metropolitan areas. In addition, a straightforward Stackelberg game model may no longer be suitable
for new applications where some adversaries may be opportunistic, without deliberate planning and
intelligent inferences. For example, patrols in the Los Angeles Metro Rail system serve multiple
purposes including ticket enforcement, ensuring public safety by suppressing crimes, and
counterterrorism. Compared to terrorists, who are careful planners with surveillance, and fare
evaders, who are informed decision makers, pickpockets on trains who snatch smart phones are more
opportunistic. Finally, a pure mathematical model without correct numbers cannot work in practical
problems. It thus requires significant research and engineering effort to create accurate and
predictive models using quantitative methods.
In the short run, my goal is to develop algorithms for Stackelberg games with a large number
of types to meet the needs of future applications. To this end, I plan to further improve the
scalability of my Bayesian Stackelberg game solver Hunter by exploring different relaxation
techniques and search heuristics. Moreover, I plan to design new approximation schemes to provide
high-quality solutions to large-scale problems that cannot be solved to optimality. Finally, another
interesting direction to pursue is to integrate human decision models into the Bayesian framework,
allowing the use of multiple types of human adversary, each characterized by a different human
decision model.
¹ TRUSTS has to stick to a zero-sum model to avoid the computational complexity of solving a general-sum model,
which prohibits modeling of human biases and risk adjustments.
In the long run, it will be important to devise mathematical models for applications with both
deliberate and opportunistic adversaries. One possible way is to create a mixture of a Stackelberg
game model against deliberate saboteurs and a partial differential equation model for
the dynamics of opportunistic crimes [Short et al., 2010]. It will also be important to employ
quantitative methods to create game-theoretic models using real-world data systematically
collected from security operations. As shown in Section 6.2.3.2, mobile applications can improve
law enforcement agencies’ efficiency and effectiveness in carrying out their daily duties as well
as collect and format patrol data automatically. Such data, along with certain statistical
inference methods, will help future researchers create more accurate models, and in turn improve
the effectiveness of game-theoretic methods. For example, it can help create more accurate
distributions over different types of adversary, yielding better solutions from Hunter. It can also
help create more accurate ridership distributions and MDP transition models for the TRUSTS
system, and in turn improve the effectiveness of the fare inspection operations.
Bibliography
Michele Aghassi and Dimitris Bertsimas. Robust game theory. Math. Program., 107:231–273,
June 2006a.
Michele Aghassi and Dimitris Bertsimas. Robust game theory. Math. Program., 107:231–273,
June 2006b.
Noa Agmon, Vladimir Sadov, Gal A. Kaminka, and Sarit Kraus. The Impact of Adversarial
Knowledge on Adversarial Planning in Perimeter Patrol. In AAMAS, volume 1, 2008.
Shabbir Ahmed and Alexander Shapiro. The sample average approximation method
for stochastic programs with integer recourse. SIAM Journal on Optimization, 12:479–502,
2002.
S. Alpern. Infiltration games on arbitrary graphs. Journal of Mathematical Analysis and
Applications, 163(1):286–288, 1992.
B. An, D. Kempe, C. Kiekintveld, E. Shieh, S. Singh, M. Tambe, and Y. Vorobeychik. Security
games with limited surveillance: An initial report. In AAAI Spring Symposium on Game Theory
for Security, Sustainability and Health, 2012.
Bo An, Milind Tambe, Fernando Ordonez, Eric Shieh, and Christopher Kiekintveld. Refinement
of strong Stackelberg equilibria in security games. In AAAI, 2011.
Christopher Archibald and Yoav Shoham. Hustling in repeated zero-sum games with imperfect
execution. In IJCAI, 2011.
Kyle Bagwell. Commitment and observability in games. Games and Economic Behavior, 8:
271–280, 1995.
Egon Balas. Disjunctive programming: Properties of the convex hull of feasible points. Discrete
Applied Mathematics, 89(1-3):3–44, 1998.
Jonathan F. Bard. Practical Bilevel Optimization: Algorithms and Applications (Nonconvex
Optimization and Its Applications). Springer-Verlag New York, Inc., 2006.
Tamer Basar and Geert Jan Olsder. Dynamic Noncooperative Game Theory. Academic Press,
San Diego, CA, 2nd edition, 1995.
Gary Becker and William Landes. Essays in the Economics of Crime and Punishment. Columbia
University Press, 1974.
R. Becker, S. Zilberstein, V. Lesser, and C.V. Goldman. Transition-independent decentralized
markov decision processes. In AAMAS, pages 41–48. ACM, 2003.
Avraham Beja. Imperfect equilibrium. Games and Economic Behavior, 4(1):18–36, 1992.
Aharon Ben-Tal, Laurent El Ghaoui, and Arkadi Nemirovski. Robust optimization. Princeton
University Press, 2008.
John R. Birge and François V. Louveaux. A multicut algorithm for two-stage stochastic linear
programs. European Journal of Operational Research, 34(3):384–392, 1988.
Booz Allen Hamilton. Faregating analysis. Report commissioned by the LA Metro,
http://boardarchives.metro.net/Items/2007/11_November/20071115EMACItem27.pdf, 2007.
M. Breton, A. Alj, and A. Haurie. Sequential stackelberg equilibria in two-person games.
Optimization Theory and Applications, 59(1):71–94, 1988.
V. Conitzer and T. Sandholm. Computing the optimal strategy to commit to. In EC: Proceedings
of the ACM Conference on Electronic Commerce, 2006.
J. P. Dickerson, G. I. Simari, V. S. Subrahmanian, and Sarit Kraus. A graph-theoretic approach
to protect static and moving targets from adversaries. In AAMAS, 2010.
Drew Fudenberg and Jean Tirole. Game Theory. MIT Press, October 1991.
Shmuel Gal. Search games with mobile and immobile hider. SIAM Journal on Control and
Optimization, 17(1):99–122, 1979.
Nicola Gatti. Game theoretical insights in strategic patrolling: Model and algorithm in
normal-form. In ECAI08, pages 403–407, 2008.
Andrew Gilpin and Tuomas Sandholm. Information-theoretic approaches to branching in search.
Discrete Optimization, 8(2):147–159, 2011. ISSN 1572-5286.
E. Halvorson, V. Conitzer, and R. Parr. Multi-step multi-sensor hider-seeker games. In IJCAI,
2009.
J.C. Harsanyi. Games with incomplete information played by “Bayesian” players, I-III. Part I. The
basic model. Management Science, 14(3):159–182, 1967.
Horizon Research Corporation. Metropolitan transit authority fare evasion study.
http://libraryarchives.metro.net/DPGTL/studies/2002_horizon_fare_evasion_study.pdf,
2002.
Steffen Huck and Wieland Müller. Perfect versus imperfect observability–an experimental test of
Bagwell’s result. Games and Economic Behavior, 31(2):174–190, 2000.
Manish Jain, Erim Kardes, Christopher Kiekintveld, Milind Tambe, and Fernando Ordonez.
Security games with arbitrary schedules: A branch and price approach. In AAAI, 2010.
Manish Jain, Dmytro Korzhyk, Ondrej Vanek, Vincent Conitzer, Michal Pechoucek, and Milind
Tambe. A double oracle algorithm for zero-sum security games on graphs. In AAMAS, 2011a.
Manish Jain, Milind Tambe, and Christopher Kiekintveld. Quality-bounded solutions for finite
bayesian stackelberg games: Scaling up. In AAMAS, 2011b.
Albert Xin Jiang, Zhengyu Yin, Chao Zhang, Sarit Kraus, and Milind Tambe. Game-theoretic
randomization for security patrolling with dynamic execution uncertainty. In AAMAS, 2013.
Matthew P. Johnson, Fei Fang, and Milind Tambe. Patrol strategies to maximize pristine forest
area. In Conference on Artificial Intelligence (AAAI), 2012.
D. Kahneman and A. Tversky. Prospect theory: An analysis of decision under risk.
Econometrica: Journal of the Econometric Society, 47(2):263–291, 1979.
Christopher Kiekintveld, Manish Jain, Jason Tsai, James Pita, Milind Tambe, and Fernando
Ordóñez. Computing optimal randomized resource allocations for massive security games.
In AAMAS, 2009.
Christopher Kiekintveld, Janusz Marecki, and Milind Tambe. Approximation methods for infinite
bayesian stackelberg games: Modeling distributional payoff uncertainty. In AAMAS, 2011.
D. Koller, N. Megiddo, and B. von Stengel. Fast algorithms for finding randomized strategies in
game trees. In STOC: Proceedings of the Annual ACM Symposium on Theory of Computing,
pages 750–759, 1994.
Dmytro Korzhyk, Vincent Conitzer, and Ronald Parr. Solving stackelberg games with uncertain
observability. In AAMAS, pages 1013–1020, 2011a.
Dmytro Korzhyk, Vincent Conitzer, and Ronald Parr. Security games with multiple attacker
resources. In IJCAI, pages 273–279, 2011b.
A. Krause, A. Roper, and D. Golovin. Randomized sensing in adversarial environments. In
IJCAI, 2011.
G. Leitmann. On generalized Stackelberg strategies. Optimization Theory and Applications,
26(4):637–643, 1978.
Joshua Letchford and Vincent Conitzer. Computing optimal strategies to commit to in
extensive-form games. In EC, 2010.
Joshua Letchford, Liam MacDermed, Vincent Conitzer, Ronald Parr, and Charles L. Isbell.
Computing optimal strategies to commit to in stochastic games. In AAAI, 2012.
Wai-Kei Mak, David P. Morton, and R. Kevin Wood. Monte carlo bounding techniques for
determining solution quality in stochastic programs. Operations Research Letters, 24(1-2):47–56,
1999. ISSN 0167-6377.
R.D. McKelvey and T.R. Palfrey. Quantal response equilibria for normal form games. Games
and economic behavior, 10(1):6–38, 1995.
John Morgan and Felix Vardy. The value of commitment in contests and tournaments when
observation is costly. Games and Economic Behavior, 60(2):326–338, 2007.
H. Moulin and J. P. Vial. Strategically zero-sum games: The class of games whose completely
mixed equilibria cannot be improved upon. International Journal of Game Theory, 7(3-4):
201–221, 1978.
P. Paruchuri, J. P. Pearce, J. Marecki, M. Tambe, F. Ordonez, and S. Kraus. Playing games with
security: An efficient exact algorithm for Bayesian Stackelberg games. In AAMAS, 2008.
J. Pita, Manish Jain, Craig Western, Christopher Portway, Milind Tambe, Fernando Ordonez,
Sarit Kraus, and Praveen Paruchuri. Deployed ARMOR protection: The application of a game
theoretic model for security at the los angeles international airport. In AAMAS, 2008.
James Pita, Manish Jain, Fernando Ordonez, Milind Tambe, and Sarit Kraus. Robust solutions to
stackelberg games: Addressing bounded rationality and limited observations in human
cognition. Artificial Intelligence Journal, 174(15):1142–1171, 2010.
James Pita, Milind Tambe, Chris Kiekintveld, Shane Cullen, and Erin Steigerwald. GUARDS -
game theoretic security allocation on a national scale. In AAMAS, 2011.
James Pita, Richard John, Rajiv Maheswaran, Milind Tambe, and Sarit Kraus. A robust approach
to addressing human adversaries in security games. In ECAI, 2012.
J. P. Ponssard and S. Sorin. The lp formulation of finite zero-sum games with incomplete
information. International Journal of Game Theory, 9:99–105, 1980. ISSN 0020-7276. URL
http://dx.doi.org/10.1007/BF01769767.
R. Selten. Reexamination of the perfectness concept for equilibrium points in extensive games.
International Journal of Game Theory, 4:25–55, 1975.
Eric Shieh, Bo An, Rong Yang, Milind Tambe, Craig Baldwin, Joseph DiRenzo, Ben Maule, and
Garrett Meyer. Protect: A deployed game theoretic system to protect the ports of the united
states. In AAMAS, 2012.
M.B. Short, A.L. Bertozzi, and P.J. Brantingham. Nonlinear patterns in urban crime: Hotspots,
bifurcations, and suppression. SIAM Journal on Applied Dynamical Systems, 9(2):462–483,
2010.
Milind Tambe. Security and Game Theory: Algorithms, Deployed Systems, Lessons Learned.
Cambridge University Press, 2011.
Jason Tsai, Shyamsunder Rathi, Christopher Kiekintveld, Fernando Ordóñez, and Milind Tambe.
IRIS - a tool for strategic security allocation in transportation networks. In AAMAS - Industry
Track, 2009.
Jason Tsai, Zhengyu Yin, Junyoung Kwak, David Kempe, Christopher Kiekintveld, and Milind
Tambe. Urban security: Game-theoretic resource allocation in networked physical domains.
In AAAI, 2010.
Jason Tsai, Thanh H. Nguyen, and Milind Tambe. Security games for controlling contagion. In
AAAI, 2012.
Eric van Damme and Sjaak Hurkens. Games with imperfectly observable commitment. Games
and Economic Behavior, 21(1-2):282–308, 1997.
Ondrej Vanek, Michal Jakob, Viliam Lisy, Branislav Bosansky, and Michal Pechoucek. Iterative
game-theoretic route selection for hostile area transit and patrolling. In AAMAS, 2011.
Ondrej Vanek, Zhengyu Yin, Manish Jain, Branislav Bosansky, Milind Tambe, and Michal
Pechoucek. Game-theoretic resource allocation for malicious packet detection in computer
networks. In AAMAS, 2012a.
Ondrej Vanek, Zhengyu Yin, Manish Jain, Branislav Bosansky, Milind Tambe, and Michal
Pechoucek. Game-theoretic resource allocation for malicious packet detection in computer
networks. In AAMAS, 2012b.
Heinrich von Stackelberg. Marktform und Gleichgewicht. Springer, 1934.
B. von Stengel and S. Zamir. Leadership with commitment to mixed strategies. In CDAM
Research Report LSE-CDAM-2004-01, London School of Economics, 2004.
Yevgeniy Vorobeychik and Satinder Singh. Computing stackelberg equilibria in discounted
stochastic games. In AAAI, 2012.
W. A. Wagenaar. Generation of random sequences by human subjects: A critical survey of
literature. Psychological Bulletin, 77(1):65–72, 1972.
Alan Washburn and Kevin Wood. Two-person zero-sum games for network interdiction.
Operations Research, 43(2):243–251, 1995.
M. Wooldridge. Does game theory work? Intelligent Systems, IEEE, 27(6):76–80, Nov.-Dec.
2012.
Rong Yang, Christopher Kiekintveld, Fernando Ordóñez, Milind Tambe, and Richard John.
Improved computational models of human behavior in security games. In International
Conference on Autonomous Agents and Multiagent Systems (Ext. Abstract), 2011.
Rong Yang, Fernando Ordonez, and Milind Tambe. Computing optimal strategy against quantal
response in security games. In AAMAS, 2012.
Zhengyu Yin and Milind Tambe. A unified method for handling discrete and continuous
uncertainty in Bayesian Stackelberg games. In AAMAS, 2012.
Zhengyu Yin, Dmytro Korzhyk, Christopher Kiekintveld, Vincent Conitzer, and Milind Tambe.
Stackelberg vs. Nash in security games: Interchangeability, equivalence, and uniqueness. In
AAMAS, 2010.
Zhengyu Yin, Manish Jain, Milind Tambe, and Fernando Ordonez. Risk-averse strategies for
security games with execution and observational uncertainty. In AAAI, 2011.
Zhengyu Yin, Albert Xin Jiang, Matthew P. Johnson, Milind Tambe, Christopher Kiekintveld,
Kevin Leyton-Brown, Tuomas Sandholm, and John Sullivan. TRUSTS: Scheduling randomized
patrols for fare inspection in transit systems. In IAAI, 2012a.
Zhengyu Yin, Albert Xin Jiang, Matthew P. Johnson, Milind Tambe, Christopher Kiekintveld,
Kevin Leyton-Brown, Tuomas Sandholm, and John Sullivan. TRUSTS: Scheduling randomized
patrols for fare inspection in transit systems using game theory. AI Magazine, 33(4),
2012b.
Jun Zhuang and Vicki Bier. Secrecy and deception at equilibrium, with applications to
anti-terrorism resource allocation. Defence and Peace Economics, 22:43–61, 2011.
Appendix A: Benders' Decomposition
Benders' decomposition, named after Jacques F. Benders, is a technique in mathematical
programming that allows the solution of very large linear programming problems that have the
following special block structure (this structure often occurs in applications such as stochastic
programming):
\[
\begin{aligned}
\max_{x,\,y_1,\ldots,y_k} \quad & c^T x + f_1^T y_1 + \cdots + f_k^T y_k \\
\text{s.t.} \quad & A x \le b \\
& B_1 x + D_1 y_1 \le d_1 \\
& B_2 x + D_2 y_2 \le d_2 \\
& \qquad\vdots \\
& B_k x + D_k y_k \le d_k \\
& x,\, y_1, \ldots, y_k \ge 0
\end{aligned}
\tag{A.1}
\]
where $x, y_1, \ldots, y_k$ are all vectors of continuous variables having arbitrary dimensions,
$A, B_1, \ldots, B_k, D_1, \ldots, D_k$ are matrices, and $b, c, d_1, \ldots, d_k, f_1, \ldots, f_k$
are vectors of appropriate dimensions. Due to the special structure, the problem becomes
significantly easier to solve if $x$ is fixed: we can solve for each $y_i$ separately. Benders'
decomposition partitions problem (A.1) into a master problem that contains only the
$x$-variables, and $k$ subproblems where the $i$th subproblem contains variables $y_i$.
In particular, problem (A.1) can be partitioned into the master problem:
\[
\begin{aligned}
\max_{x} \quad & c^T x + \sum_{i=1}^{k} \phi_i(x) \\
\text{s.t.} \quad & A x \le b \\
& x \ge 0
\end{aligned}
\tag{A.2}
\]
and $k$ subproblems, where for every $i = 1, \ldots, k$ the $i$th subproblem is:
\[
\begin{aligned}
\phi_i(x) = \max_{y_i} \quad & f_i^T y_i \\
\text{s.t.} \quad & D_i y_i \le d_i - B_i x \\
& y_i \ge 0
\end{aligned}
\tag{A.3}
\]
Formulation (A.3) is a linear program for any given $x$. Note that if (A.3) is unbounded for
some $i$ and some $x$ in the feasible region of problem (A.2), then (A.2) is also unbounded, which in
turn implies the original problem (A.1) is unbounded. Assuming boundedness of (A.3), we can
also calculate the value of $\phi_i(x)$ by solving its dual. Let $\lambda_i$ be the dual variables
for constraints $D_i y_i \le d_i - B_i x$. Then the dual of (A.3) is:
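\[
\begin{aligned}
\phi_i(x) = \min_{\lambda_i} \quad & (d_i - B_i x)^T \lambda_i \\
\text{s.t.} \quad & D_i^T \lambda_i \ge f_i \\
& \lambda_i \ge 0
\end{aligned}
\tag{A.4}
\]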
The key observation is that the feasible region of the dual formulation (A.4) does not depend
on the values of the variables $x$, which affect only the objective function. If the dual feasible
region of (A.4) is empty, then either the primal problem (A.3) is unbounded for some $x$ and hence
the original problem (A.1) is unbounded, or the primal feasible region of (A.3) is also empty for
all $x$ and hence the original problem (A.1) is infeasible.
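As a small illustration of this observation, consider a single subproblem with scalar data $f_i = 2$, $D_i = 1$, $d_i = 4$, $B_i = 1$. These numbers are hypothetical and chosen only for illustration; they do not come from the thesis. The primal-dual pair then reads:

```latex
\begin{aligned}
\text{primal:}\quad & \max_{y \ge 0} \; 2y \quad \text{s.t.} \quad y \le 4 - x, \\
\text{dual:}\quad   & \min_{\lambda \ge 0} \; (4 - x)\,\lambda \quad \text{s.t.} \quad \lambda \ge 2.
\end{aligned}
```

The dual feasible set $\{\lambda \ge 2\}$ is the same for every $x$; only the objective coefficient $(4 - x)$ changes. For $x \le 4$ both problems have value $2(4 - x)$, attained at $y = 4 - x$ and $\lambda = 2$. For $x > 4$ the primal is infeasible and the dual is unbounded as $\lambda \to \infty$.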
Now let us consider the nontrivial case where the feasible region of (A.4) is nonempty for
every $i = 1, \ldots, k$. Then we can enumerate all extreme points
$(\lambda_i^1, \ldots, \lambda_i^{P_i})$ and all extreme rays $(\mu_i^1, \ldots, \mu_i^{R_i})$
of the feasible region in (A.4), where $P_i$ and $R_i$ are the numbers of extreme points and
extreme rays of the $i$th subproblem, respectively. Then for a given $x$, the $i$th dual problem can be
solved by (i) checking whether $(d_i - B_i x)^T \mu_i^r < 0$ for some extreme ray $\mu_i^r$, in
which case (A.4) is unbounded and the primal formulation is infeasible, and (ii) finding an extreme
point $\lambda_i^p$ that minimizes the value of the objective function
$(d_i - B_i x)^T \lambda_i^p$, in which case both the primal and dual formulations have finite
optimal solutions. Then the dual problem (A.4) can be reformulated as follows:
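\[
\begin{aligned}
\phi_i(x) = \max_{\theta_i} \quad & \theta_i \\
\text{s.t.} \quad & (d_i - B_i x)^T \mu_i^r \ge 0, && \forall r = 1, \ldots, R_i \\
& (d_i - B_i x)^T \lambda_i^p \ge \theta_i, && \forall p = 1, \ldots, P_i
\end{aligned}
\tag{A.5}
\]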
We can replace $\phi_i(x)$ in (A.2) with (A.5) and obtain a reformulation of the original problem
in terms of $x$ and $\theta_1, \ldots, \theta_k$:
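\[
\begin{aligned}
\max_{x,\,\theta_1,\ldots,\theta_k} \quad & c^T x + \sum_{i=1}^{k} \theta_i \\
\text{s.t.} \quad & A x \le b, \quad x \ge 0 \\
& (d_i - B_i x)^T \mu_i^r \ge 0, && \forall i = 1, \ldots, k,\ \forall r = 1, \ldots, R_i \\
& (d_i - B_i x)^T \lambda_i^p \ge \theta_i, && \forall i = 1, \ldots, k,\ \forall p = 1, \ldots, P_i
\end{aligned}
\tag{A.6}
\]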
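To make the reformulation concrete, here is a hand-worked instance with scalar variables and $k = 2$. All numbers are hypothetical and chosen only for illustration: $c = 2$, $A = 1$, $b = 10$; subproblem 1 has $f_1 = 2$, $B_1 = 1$, $D_1 = 1$, $d_1 = 4$; subproblem 2 has $f_2 = 1$, $B_2 = -1$, $D_2 = 1$, $d_2 = 0$. The dual feasible sets are $\{\lambda_1 \ge 2\}$ and $\{\lambda_2 \ge 1\}$, so each has one extreme point ($2$ and $1$) and one extreme ray (direction $1$). The full reformulation is then:

```latex
\begin{aligned}
\max_{x,\,\theta_1,\,\theta_2} \quad & 2x + \theta_1 + \theta_2 \\
\text{s.t.} \quad & x \le 10, \quad x \ge 0 \\
& (4 - x)\cdot 1 \ge 0, \qquad (0 + x)\cdot 1 \ge 0 && \text{(feasibility cuts)} \\
& (4 - x)\cdot 2 \ge \theta_1, \qquad (0 + x)\cdot 1 \ge \theta_2 && \text{(optimality cuts)}
\end{aligned}
```

At an optimum both optimality cuts are tight, so the objective equals $2x + 2(4 - x) + x = 8 + x$ on $0 \le x \le 4$. The solution is $x = 4$, $\theta_1 = 0$, $\theta_2 = 4$ with value $12$, which matches the block-structured problem solved directly ($y_1 = 0$, $y_2 = 4$).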
Since there are typically an exponential number of extreme points and extreme rays of the
dual formulation (A.4), generating all constraints for (A.6) is not realistic. Instead, Benders'
decomposition starts with a subset of these constraints and solves a relaxed master problem, which
yields a candidate optimal solution $(x^*, \theta_1^*, \ldots, \theta_k^*)$. Then we can solve the
dual subproblem (A.4) to calculate $\phi_i(x^*)$. If for every $i = 1, \ldots, k$ the $i$th
subproblem has an optimal solution such that $\phi_i(x^*) = \theta_i^*$, then the algorithm stops
and $x^*$ is an optimal solution of the original problem (A.1).
Otherwise, there exists at least one subproblem $i$ such that (A.4) is unbounded or (A.4) is
bounded with $\phi_i(x^*) < \theta_i^*$. If the $i$th dual subproblem is unbounded, then an extreme
ray $\mu_i^r$ is obtained and therefore the constraint $(d_i - B_i x)^T \mu_i^r \ge 0$ should be
added to the relaxed master problem. Constraints of this type are referred to as Benders'
feasibility cuts because they enforce necessary conditions for feasibility of the primal
subproblems (A.3). On the other hand, if the $i$th dual subproblem has an optimal solution such
that $\phi_i(x^*) < \theta_i^*$, then an extreme point $\lambda_i^p$ is obtained and the constraint
$(d_i - B_i x)^T \lambda_i^p \ge \theta_i$ should be added to the relaxed master problem.
Constraints of this type are referred to as Benders' optimality cuts because they enforce the
necessary conditions for optimality of the subproblems.
Multiple constraints can be generated in each iteration if there are multiple subproblems that
are unbounded or have $\phi_i(x^*) < \theta_i^*$. After adding these constraints, we solve the new
relaxed master problem and repeat the process. Since $P_i$ and $R_i$ are finite for each subproblem
$i$ and at least one new Benders' cut is generated in each iteration, it can be concluded that the
algorithm will converge in a finite number of iterations, i.e., at most
$\sum_{i=1}^{k} P_i + \sum_{i=1}^{k} R_i$ iterations. In practice, the number of iterations needed
until convergence is typically orders of magnitude smaller than the total number of extreme points
and extreme rays, and therefore applying Benders' decomposition by solving (A.2) and (A.4)
iteratively is often significantly more efficient than solving the original linear program (A.1)
directly.
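The iteration described above can be sketched in a few lines of Python. This is a minimal illustration, not the implementation used in the thesis, and it makes several simplifying assumptions: x and every y_i are scalars, every D_i is positive (so each dual feasible set has exactly one extreme point and one extreme ray), the constraint Ax <= b is reduced to x <= x_max, and each theta_i is capped by an assumed constant BIG_M until its first optimality cut arrives. The names `solve_dual`, `solve_master`, and `benders` are hypothetical.

```python
BIG_M = 1e6  # assumed cap on theta_i before any optimality cut exists
TOL = 1e-9


def solve_dual(i, x, B, D, d, f):
    """Dual of subproblem i for fixed x: min (d_i - B_i x) * lam s.t. D_i * lam >= f_i, lam >= 0.

    Returns (value, extreme_point), or (None, None) when the dual is unbounded.
    """
    slack = d[i] - B[i] * x
    if slack < -TOL:
        return None, None          # objective decreases along the extreme ray lam -> +inf
    lam = max(0.0, f[i] / D[i])    # the only extreme point of the dual feasible set
    return slack * lam, lam


def solve_master(c, lo, hi, opt_cuts):
    """Relaxed master: max c*x + sum_i theta_i over lo <= x <= hi.

    opt_cuts[i] holds (intercept, slope) pairs meaning theta_i <= intercept + slope * x.
    The objective is concave piecewise linear in x, so comparing the interval
    endpoints and the breakpoints of each theta_i is enough.
    """
    def theta(i, x):
        return min([BIG_M] + [a + s * x for a, s in opt_cuts[i]])

    candidates = {lo, hi}
    for cuts in opt_cuts:
        lines = [(BIG_M, 0.0)] + cuts
        for p in range(len(lines)):
            for q in range(p + 1, len(lines)):
                (a1, s1), (a2, s2) = lines[p], lines[q]
                if abs(s1 - s2) > TOL:
                    xb = (a2 - a1) / (s1 - s2)   # where the two lines intersect
                    if lo <= xb <= hi:
                        candidates.add(xb)
    k = len(opt_cuts)
    x = max(candidates, key=lambda v: c * v + sum(theta(i, v) for i in range(k)))
    return x, [theta(i, x) for i in range(k)]


def benders(c, x_max, B, D, d, f, max_iter=50):
    k = len(f)
    lo, hi = 0.0, x_max                  # feasibility cuts shrink this interval
    opt_cuts = [[] for _ in range(k)]
    for iteration in range(1, max_iter + 1):
        x, theta = solve_master(c, lo, hi, opt_cuts)
        converged = True
        for i in range(k):
            value, lam = solve_dual(i, x, B, D, d, f)
            if value is None:
                # Feasibility cut (d_i - B_i x) * 1 >= 0, i.e. B_i x <= d_i.
                if B[i] > 0:
                    hi = min(hi, d[i] / B[i])
                elif B[i] < 0:
                    lo = max(lo, d[i] / B[i])
                else:
                    raise ValueError("subproblem %d is infeasible for every x" % i)
                converged = False
            elif value < theta[i] - TOL:
                # Optimality cut theta_i <= (d_i - B_i x) * lam.
                opt_cuts[i].append((lam * d[i], -lam * B[i]))
                converged = False
        if converged:
            return x, c * x + sum(theta), iteration
        if lo > hi:
            raise ValueError("problem is infeasible")
    raise RuntimeError("no convergence within max_iter iterations")


# Hypothetical instance: max 2x + 2*y1 + y2  s.t.  x <= 10, x + y1 <= 4, -x + y2 <= 0.
x_opt, value, iterations = benders(2.0, 10.0, [1.0, -1.0], [1.0, 1.0], [4.0, 0.0], [2.0, 1.0])
print(x_opt, value, iterations)
```

On this instance the first master solve proposes x = 10, which triggers a feasibility cut from subproblem 1 and an optimality cut from subproblem 2. The second solve adds an optimality cut for subproblem 1, and the third confirms x = 4 with objective value 12. A general implementation would replace the closed-form master and dual solves with calls to an LP solver.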
Asset Metadata
Creator
Yin, Zhengyu
(author)
Core Title
Addressing uncertainty in Stackelberg games for security: models and algorithms
School
Viterbi School of Engineering
Degree
Doctor of Philosophy
Degree Program
Computer Science
Publication Date
04/16/2013
Defense Date
03/05/2013
Publisher
University of Southern California
(original),
University of Southern California. Libraries
(digital)
Tag
algorithms,game theory,OAIPMH Harvest,optimization,robustness,Security
Format
application/pdf
(imt)
Language
English
Contributor
Electronically uploaded by the author
(provenance)
Advisor
Tambe, Milind (
committee chair
), Krishnamachari, Bhaskar (
committee member
), Maheswaran, Rajiv (
committee member
), McCubbins, Mathew D. (
committee member
), Ordonez, Fernando (
committee member
), Sandholm, Tuomas W. (
committee member
)
Creator Email
yzyyzy08@gmail.com
Permanent Link (DOI)
https://doi.org/10.25549/uscthesesc3238610
Unique identifier
UC11293489
Identifier
etdYinZhengyu1556.pdf (filename),uscthesesc3238610 (legacy record id)
Legacy Identifier
etdYinZhengyu1556.pdf
Dmrecord
238610
Document Type
Dissertation
Format
application/pdf (imt)
Rights
Yin, Zhengyu
Type
texts
Source
University of Southern California
(contributing entity),
University of Southern California Dissertations and Theses
(collection)
Access Conditions
The author retains rights to his/her dissertation, thesis or other graduate work according to U.S. copyright law. Electronic access is being provided by the USC Libraries in agreement with the a...
Repository Name
University of Southern California Digital Library
Repository Location
USC Digital Library, University of Southern California, University Park Campus MC 2810, 3434 South Grand Avenue, 2nd Floor, Los Angeles, California 900892810, USA