Information as a Double-Edged Sword in Strategic Interactions

by

Haifeng Xu

A Dissertation Presented to the
FACULTY OF THE GRADUATE SCHOOL
UNIVERSITY OF SOUTHERN CALIFORNIA
In Partial Fulfillment of the Requirements for the Degree
DOCTOR OF PHILOSOPHY
(Computer Science)

May 2019

Copyright 2019 Haifeng Xu

Acknowledgment

First and foremost, I would like to thank my advisors, Shaddin Dughmi and Milind Tambe, whose careful mentorship and tremendous support largely influenced my growth as an academic. (All lists of names in this acknowledgment are in alphabetical order.) Being co-supervised is a very unique experience; Shaddin and Milind have made this a really harmonious and rewarding journey for me. I am very grateful for getting exposed to different research fields and experiencing different supervision styles, which also helped to shape my own perspectives. I also thank them for providing me the freedom to work on problems of my own interest, while always being available to help whenever I needed them. Over the years, their guidance has been a constant source of inspiration and helped me to develop as an independent researcher. I learned a lot from Shaddin through his beautiful taste in research, deep insights, fast thinking, clarity of thought and explanation, artistic paper writing, and the list goes on. Milind taught me what the important things to do are at different stages, how to find significant research problems to work on, and how to explain research to any audience. I am also grateful to Milind for providing me the opportunity to work on cool real-world projects, such as FAMS and PAWS.

I would like to thank my committee members: Odilon Câmara, Vincent Conitzer, David Kempe and Detlof von Winterfeldt, not only for serving on my thesis committee but also for providing invaluable suggestions for my research and career. Special thanks to Vincent Conitzer, whom I had the privilege to collaborate with and who has inspired me a lot with his broad knowledge and deep insights. I also thank David Kempe for giving me so much genuine and constructive feedback about my research, writing and presentation in the past five years.

I had the fortune to collaborate with and also learn from many excellent researchers: Bo An, Ashwinkumar Badanidiyuru, Bin Gao, Kshipra Bhawalkar, Matthew Brown, Emma Bowring, Hau Chan, Yu Cheng, Bistra Dilkina, Fei Fang, Benjamin Ford, Rupert Freeman, Jiarui Gan, Mina Guirguis, Manish Jain, Nick Jennings, Albert X. Jiang, Chris Kiekintveld, Kate Larson, Tieyan Liu, Leandro Marcolino, Venil Loyd Noronha, Andrew Plumptre, Zinovi Rabinovich, Eric Rice, Aaron Schlenker, Arunesh Sinha, Solomon Sonya, Omkar Thakoor, Long Tran-Thanh, Phebe Vayanos, Yevgeniy Vorobeychik, Kai Wang, Amulya Yadav and Yue Yin. Particularly, I would like to thank Tieyan Liu and Bin Gao for the memorable time at Microsoft Research Asia and for introducing me to computational game theory. If it were not for them, I do not think I would be pursuing a PhD in computer science. My first publication in this area was under the guidance of Kate Larson; I thank Kate for her patient mentorship. I thank Matthew, Manish and Venil for helping me with the implementation and delivery of the software to the Federal Air Marshal Service (FAMS).

I am grateful to Google Research for the PhD fellowship as well as an enjoyable summer internship, where I had the fortune to learn from many excellent researchers. I also thank Ruggiero Cavallo for hosting me as a summer intern at Yahoo! Labs, which helped to broaden my research scope.
A great advantage of being co-supervised is that I had the chance to interact closely with friends from two large research groups (the Teamcore group and the USC theory group). Those I have not mentioned previously include: Yasaman Dehghani Abbasi, Brendan Avent, Joseph Bebel, Biswarup Bhattacharya, Elizabeth Bondi, Hsing-Hau Chen, Ho Yee Cheung, Sarah Cooney, Ehsan Emamjomeh-Zadeh, Shahrzad Gholami, Li Han, Xinran He, Debarun Kar, Lian Liu, Sara Marie Mc Carthy, Thanh Nguyen, Han Ching Ou, Yundi Qian, Ruixin Qiang, Aida Rahmattalabi, Eric Shieh, Alana Shine, Omkar Thakur, Anastasia Voloshinov, Bryan Wilder and Chao Zhang. I truly enjoyed all the casual talks with them, on research, life and other fun topics. All of them together made Teamcore and Theoroom both a home to me.

I am indebted to my parents for their unconditional love, for always respecting and supporting my choices, and for always giving me the best they can give. I am most grateful to Yanqing for always being there with me, sharing my ups and downs. Research time sometimes gets dull and frustrating, but she made every minute of my life joyful.

Contents

Acknowledgment
List Of Figures
List Of Tables
Abstract

I Background and Overview
1 Overview
  1.1 Introduction
  1.2 Summary of Contributions
  1.3 Thesis Structure
2 Background and Preliminaries
  2.1 Information in Games
    2.1.1 The Importance of Information in Games
    2.1.2 Persuasion by Utilizing Informational Advantages
  2.2 Security Games
    2.2.1 The General Security Game Model
    2.2.2 Equilibrium Concepts
    2.2.3 Three Concrete Examples
3 Related Work
  3.1 Persuasion
  3.2 Information in Security Games

II Exploiting Informational Advantages
4 Real-World Motivation and Two Illustrative Examples
  4.1 Motivating Example I: Deterrence of Fare Evasion
  4.2 Motivating Example II: Combating Poaching
5 Persuasion and Its Algorithmic Foundation
  5.1 The Bayesian Persuasion Model
  5.2 Algorithmic Foundation for Bayesian Persuasion
    5.2.1 Explicit Input Model
    5.2.2 Poly-Time Solvability for Persuasion with I.I.D. Actions
    5.2.3 Complexity Barriers to Persuasion with Independent Actions
    5.2.4 An FPTAS for the General Persuasion Problem
  5.3 Persuading Multiple Receivers
    5.3.1 A Fundamental Setting: Binary Actions and No Externalities
    5.3.2 Technical Preliminaries: Set Functions and Submodularity
    5.3.3 Optimal Private Persuasion and Its Complexity Characterization
    5.3.4 Private Persuasion with Submodular Objectives
    5.3.5 The Sharp Contrast Between Private and Public Persuasion
6 Persuasion in Security Games
  6.1 Exploiting Informational Advantage to Deter Fare Evasion
    6.1.1 A Two-Stage Security Game Model
    6.1.2 When Does Signaling Help?
    6.1.3 Computing the Optimal Defender Strategy
    6.1.4 Experiments
  6.2 Exploiting Informational Advantages to Combat Poaching
    6.2.1 The Model
    6.2.2 Additional Challenges and Computational Hardness
    6.2.3 A Branch-and-Price Approach
      6.2.3.1 Column Generation & Scalable Algorithms for the Slave
      6.2.3.2 LP Relaxation for Branch-and-Bound Pruning
    6.2.4 Experiments
  6.3 Exploiting Informational Advantage in Bayesian Stackelberg Games
    6.3.1 An Example of Stackelberg Competition
    6.3.2 Single Leader Type, Multiple Follower Types
      6.3.2.1 Normal-Form Games
      6.3.2.2 Security Games
    6.3.3 Multiple Leader Types, Single Follower Type
      6.3.3.1 Normal-Form Games
      6.3.3.2 Security Games
    6.3.4 Experiments

III Dealing with Information Leakage
7 Real-World Motivation and Two Illustrative Examples
  7.1 Motivating Example I: Information Leakage in Air Marshal Scheduling
  7.2 Motivating Example II: Information Leakage in Patrol Route Design
  7.3 The Curse of Correlation in Security Games
8 The Algorithmic Foundation for Dealing with Information Leakage
  8.1 Information Leakage in Security Games – Two Basic Models
    8.1.1 Adversarial Leakage
    8.1.2 Probabilistic Leakage
  8.2 Complexity Barriers to Computing the Optimal Strategy
    8.2.1 An Exponential-Size LP Formulation and Evidence of Hardness
    8.2.2 The Dual Program and Evidence of Hardness
  8.3 Provable Algorithms for Restricted Settings and Approximate Solutions
    8.3.1 Leakage from Small Support
    8.3.2 An Approximation Algorithm
9 Mitigating Harms of Information Leakage via Entropy Maximization
  9.1 The Max-Entropy Sampling Framework
    9.1.1 Max-Entropy Sampling Over General Set Systems
    9.1.2 Why Maximizing Entropy?
  9.2 Security Settings with No Scheduling Constraints
    9.2.1 A Polynomial-Time Max-Entropy Sampling Algorithm
    9.2.2 A Linear-Time Heuristic Sampling Algorithm
    9.2.3 Experiments
  9.3 The Air Marshal Scheduling Problem
    9.3.1 A Polynomial-Time Max-Entropy Sampling Algorithm
    9.3.2 Scalability Challenges and A Heuristic Sampling Algorithm
    9.3.3 Experiments
  9.4 The Design of Randomized Patrol Routes
    9.4.1 Complexity Barriers
    9.4.2 An Efficient Algorithm for a Restricted Setting
    9.4.3 Experiments
      9.4.3.1 Synthetic Data
      9.4.3.2 Real-World Data from the Queen Elizabeth National Park

IV Conclusion
10 Conclusions and Open Directions

V Appendices
Appendix A Omitted Proofs From Section 5.2
  A.1 Omissions from Section 5.2.2
    A.1.1 Symmetry of the Optimal Scheme (Theorem 5.2.1)
    A.1.2 The Optimal Scheme
    A.1.3 A Simple (1 - 1/e)-Approximate Scheme
  A.2 Proof of Theorem 5.2.5
  A.3 Omitted Proofs from Section 5.2.4
    A.3.1 A Bicriteria FPTAS
    A.3.2 Information-Theoretic Barriers
Appendix B Omissions From Section 6.2.3.1
  B.1 Omitted Proofs
  B.2 Counterexample to Submodularity of f(T)
Appendix C Omitted Proofs From Section 6.3
  C.1 Proof of Proposition 5
  C.2 Proof of Propositions in Section 6.3.2.2
  C.3 Proof of the Polytope Transformation Lemma
Bibliography

List Of Figures

1.1 Concrete security domains which motivate, and are also directly impacted by, the research of this thesis.
4.1 An honor-based metro station in Los Angeles.
4.2 Flying UAVs for conservation.
4.3 Cycle graph.
5.1 Realizable signatures P.
5.2 Persuasion in signature space.
6.1 Feasible regions (gray areas) and an objective function gaining strictly better defender utility than SSE for the case U_att > 0 (Left) and U_att < 0 (Right).
6.2 Comparison between SSE and peSSE: fixed parameter r = 3 (upper) and fixed parameter cov = 0.5. The trend is similar for different r or cov, except the utility scales are different.
6.3 Utility comparison.
6.4 TailoredGreedy vs. MILP.
6.5 Utility comparison and scalability test of different algorithms for solving general-sum and zero-sum SEGs.
6.6 Payoff matrices for followers of different types.
6.7 Timeline of the BSG with multiple follower types.
6.8 Timeline of the BSG with multiple leader types.
6.9 Extra utility gained by the leader from signaling.
6.10 Runtime and utility comparisons by varying the number of actions n and the number of types for the three different models in the case of multiple follower types.
7.1 A tweet that leaks information.
7.2 A round-trip schedule with information leakage.
7.3 Desired marginal protection probabilities and two different mixed strategies to implement the marginals.
7.4 An example with four cells to be protected within three time layers.
7.5 One mixed strategy that implements the marginals in Figure 7.4.
9.1 Comb sampling.
9.2 Comparisons on real LAX airport data.
9.3 Comparisons in simulated games.
9.4 Consistent round-trip flights between a domestic city and two outside cities.
9.5 CARD decomposition.
9.6 Utility comparisons in the FAMS domain (x-axis is the DtS ratio).
9.7 Structure of a spatio-temporal security game.
9.8 Utility comparisons in spatio-temporal security games.
B.1 Graph G for the counter example.

List Of Tables

6.1 Payoff table for the constructed game.
9.1 Comparisons of different criteria at different patrol posts.
A.1 Receiver's payoffs in the rain and shine example.
A.2 Two distributions on three actions.

Abstract

This thesis considers the following question: in systems with self-interested agents (a.k.a. games), how does information — i.e., what each agent knows about their environment and other agents' preferences — affect their decision making? The study of the role of information in games has a rich history, and in fact forms the celebrated field of information economics. However, in contrast to previous descriptive studies, this thesis takes a prescriptive approach and examines computational questions pertaining to the role of information.
In particular, it illustrates the double-edged role of information through two threads of research: (1) how to utilize information to one's own advantage in strategic interactions; (2) how to mitigate losses resulting from information leakage to an adversary. In each part, we study the algorithmic foundation of basic models, and also develop efficient solutions to real-world problems arising from physical security domains. Besides pushing the research frontier, the work of this thesis is also directly impacting several real-world applications, resulting in delivered software for improving the scheduling of US federal air marshals and the design of patrolling routes for wildlife conservation.

More concretely, the first part of this thesis studies an intrinsic phenomenon in human endeavors termed persuasion — i.e., the act of exploiting an informational advantage in order to influence the decisions of others. We start with two real-world motivating examples, illustrating how security agencies can utilize an informational advantage to influence adversaries' decisions and deter potential attacks. Afterwards, we provide a systematic algorithmic study of the foundational economic models underlying these examples. Our analysis not only fully resolves the computational complexity of these models, but also leads to new economic insights. We then leverage the insights and algorithmic ideas from our theoretical analysis to develop new models and solutions for concrete real-world security problems.

The second part of this thesis studies the other side of the double-edged sword, namely, how to deal with disadvantages due to information leakage. We also start with real-world motivating examples to illustrate how classified information about security measures may leak to the adversary and cause significant loss to security agencies. We then propose different models to capture information leakage based on how much the security agency is aware of the leakage situation, and provide a thorough algorithmic analysis for these models. Finally, we return to the real-world problems and design computationally efficient algorithms tailored to each security domain.

Part I
Background and Overview

Chapter 1
Overview

1.1 Introduction

This thesis considers a basic question in multi-agent systems: How does information — i.e., what each agent knows about their environment and other agents' preferences — affect their decision making in systems with self-interested actors (i.e., games)? Prior work, primarily from the economics literature, reveals that information can have a profound effect on the equilibrium outcome of strategic interactions; in fact, the study of the intricate role of information in games forms the celebrated field of information economics. However, previous study in economics is mostly descriptive, while the prescriptive counterpart of these questions has remained largely unexplored. This thesis aims to fill this gap by taking a prescriptive approach, and examines computational questions pertaining to the role of information in games. In particular, we view information as an endogenous variable of a game and look to design the information structure that induces the most desirable equilibrium. Such problems are intrinsically algorithmic, and are particularly relevant in this digital age given the unprecedented convenience today to generate and communicate information.
Our computational study not only results in implementable algorithms that enable automated applications, but also leads to new economic insights regarding the role of information in games.

The primary motivating domain of this thesis is the strategic interaction between a defender and an adversary in physical security, a.k.a. security games (see Figure 1.1 for a few real-world application domains of this thesis). In a security game, the defender must allocate a limited number of security resources, possibly under constraints, to protect a set of targets, while the adversary will strategically choose targets to attack. This important framework has been extensively studied in the past decade, and has led to deployed systems in real-world use by security agencies such as the Federal Air Marshal Service (FAMS), the US Coast Guard (USCG) and the Wildlife Conservation Society (WCS) (Tambe, 2011).

Figure 1.1: Concrete security domains which motivate, and are also directly impacted by, the research of this thesis: (a) scheduling of federal air marshals; (b) patrol planning for wildlife conservation; (c) preventing fare evasion in honor-based metro systems; (d) flying UAVs to deter poaching in wildlife conservation.

Almost all the previous work on security games has focused on optimizing the scheduling of limited security resources to make them unpredictable to a strategic adversary. This thesis, however, departs from previous study by taking a completely different perspective and focuses on studying the effects of information on security games. Obviously, information is playing a more and more important role in security domains today. In fact, at a high level, what the defender does in a security game is essentially to hide information from the adversary by randomization, while the adversary looks to extract information from the defender by conducting surveillance. Additionally, the striking amount of information distilled from today's numerous data sources serves as another key motivating force. For example, defenders and attackers may use sensors, surveillance tools and even infiltration techniques to collect information. Consequently, it is becoming increasingly important for us to understand how information affects such strategic interactions.

1.2 Summary of Contributions

This thesis considers both the positive and negative effects of information on games, illustrating its double-edged role. In particular, we study questions along the following two angles: (1) how to utilize information to one's own advantage in strategic interactions; (2) how to mitigate losses resulting from information leakage to an adversary. Each part begins with real-world motivating examples arising from physical security domains, followed by a systematic study of the fundamental theoretical questions underlying these real-world problems. We then show how our theoretical analyses shed light on practical solutions to the corresponding real-world problems. This forms an organic loop between theory and application.

More concretely, the first part of this thesis studies how an agent can utilize informational advantages in strategic interactions. We start with two motivating examples. The first example illustrates how an unmanned aerial vehicle (UAV) can deter poaching activities by deceptively signaling to poachers the presence of nearby rangers — information known to the defender but unknown to the poacher. The second example seeks to deter fare evasion in honor-based metro systems (for example, many metro stations in Los Angeles and the Caltrain stations in the San Francisco area are honor-based fare collection systems) via deceptive signaling.
Despite the rich literature on security games, such an approach of exploiting informational advantages to improve defense has not received much attention. To study these problems, we start from the intrinsic phenomenon underlying all these examples, which is termed persuasion. Specifically, persuasion is the act of exploiting an informational advantage to influence the decisions of others; it has been the theme of a large body of work in economics due to its wide presence in many human activities including security, advertising, marketing, politics, negotiation and financial regulation. We provide a systematic algorithmic study of the most foundational model in this space as well as its natural generalizations, and fully resolve the computational complexity of these models by developing efficient algorithms whose performance matches the complexity lower bound. Our algorithmic analysis not only paves the way for applications, but also leads to new economic insights about the problem. Finally, we incorporate these basic models and algorithmic ideas to develop new security game models that capture the aforementioned real-world problems, and design practical algorithms to solve these models. En route to these solutions, we overcome additional challenges arising due to particular domain features.

The second part of this thesis considers the other side of the double-edged sword, namely, how to mitigate harms due to information leakage. We also start with two motivating examples. The first example is about the scheduling of US federal air marshals. Since the schedule for each air marshal is usually a round trip, this necessarily creates correlation among the protection statuses of the outbound and return flights. As a result, if the real-time protection status of some outbound flight leaks out to the adversary, he can use this information to infer the protection status of the return flight. This may cause significant loss to the defender if not addressed properly (see Section 7.1 for a concrete example). A similar issue arises in the second motivating example, concerning the design of randomized patrol routes for wildlife conservation. Despite its importance, such vulnerability due to information leakage has not been investigated in the previous work on security games. We initiate the study by developing basic models to capture information leakage, and then provide a thorough algorithmic study of the computational complexity of computing the optimal defender strategy. Surprisingly, even for the simplest possible security game model, which can be solved by a simple quadratic-time algorithm in the absence of leakage, we show that the problem suddenly becomes computationally intractable when information leakage is considered in the model. This illustrates the intrinsic difficulty of handling leakage. To overcome this complexity barrier, we develop solutions from both theoretical and practical perspectives. On the theoretical side, we design efficient approximation algorithms with provable guarantees; on the practical side, we propose a sampling-based framework which efficiently generates defender mixed strategies that have small correlation among targets and thus are robust to leakage. This framework enjoys several practical advantages which make it very useful in the real world.
Finally, we use this sampling framework to develop defender strategies that effectively mitigate the harms arising in the aforementioned real-world problems due to information leakage. En route to these solutions, we overcome specific computational challenges pertaining to each particular domain.

Besides pushing the research frontier, this thesis is also directly impacting several real-world applications. For example, the software based on an algorithm from this thesis for improving the scheduling of US federal air marshals has been delivered to the Federal Air Marshal Service (FAMS) and is currently under pre-deployment evaluation. Another algorithm of this thesis for designing randomized patrol routes has been integrated into PAWS, an anti-poaching software system, and is currently being tested at several national parks in Africa.

1.3 Thesis Structure

The remainder of this thesis is structured as follows. In Chapter 2, we describe the background and preliminaries. Chapter 3 surveys the related work. Afterwards, we move to the first main part of this thesis and study how to utilize informational advantages in strategic interactions. Chapter 4 describes two real-world motivating examples. Chapter 5 provides a systematic algorithmic study of the foundational economic models about the strategic use of information. Chapter 6 returns to the real world, and develops new models and algorithms to improve defense by exploiting the defender's informational advantages in various security settings. Next, we move to the second main part of this thesis and study how to mitigate harms due to information loss to an adversary. We again start with two real-world motivating examples in Chapter 7. Chapter 8 proposes basic models to capture information leakage, followed by rigorous algorithmic analysis for these models. Chapter 9 returns to real-world problems, and seeks to develop efficient and practical algorithms tailored to each concrete security setting. Finally, Chapter 10 concludes the thesis with several open directions.

Chapter 2
Background and Preliminaries

2.1 Information in Games

2.1.1 The Importance of Information in Games

The study of how information affects strategic interactions (i.e., games) dates back to the 1970s. A classic example illustrating the importance of information in games is Akerlof's market for "lemons" (Akerlof, 1970). Akerlof considered the example of the market for used cars, where buyers and sellers have asymmetric information regarding car qualities. That is, the seller knows the condition of her own car better than buyers. Akerlof observed that if car buyers cannot distinguish between good cars and bad cars (which are also called "lemons"), and therefore are only willing to pay an average price, this will drive the sellers with high-quality cars out of the market. Knowing this, the buyer will further lower his price, which then drives the sellers of average-quality cars out of the market. At its most extreme, the market would only be left with "lemons". Therefore, the information asymmetry among buyers and sellers has led to an extremely inefficient market. To overcome such inefficiency, one way that has been suggested is to let car sellers "signal" their car quality to buyers so that buyers can distinguish good cars from "lemons". Such signaling could be done, for example, by turning to a trusted third party for quality certification, as most of us do today. The previous example illustrates that more information may be beneficial to some or all players in a game.
The opposite effect is observed in brand advertising. Here, advertisers usually adopt a "semi-transparent" information revelation strategy by highlighting their products' positive attributes while obscuring the defects. In fact, in most economic activities, players or system designers tend to selectively reveal their private information in order to yield a more desirable equilibrium outcome. Such phenomena have been observed and analyzed in numerous economic realms, e.g., advertising (Anderson & Renault, 2006; Waldfogel & Chen, 2006; Johnson & Myatt, 2006; Chakraborty & Harbaugh, 2014), voting (Alonso & Camara, 2014), security (Brown, Carlyle, Diehl, Kline, & Wood, 2005; Powell, 2007; Zhuang & Bier, 2010), medical research (Kolotilin, 2015), and financial regulation (Gick & Pausch, 2012; Goldstein & Leitner, 2013). As all these works make clear, the information structure of a game — i.e., who has what information — can profoundly affect its equilibrium outcome. This raises a fundamental research question: how should a player utilize her own information advantage to influence the information structure of a game so that a desirable equilibrium outcome is attained? Unsurprisingly, this basic question and its instantiations in concrete domains have been extensively studied in the past. This line of work has led to many models which study how to influence the equilibrium through the control of information in different applications. In the next section, we describe one of the most foundational models in this space, namely the Bayesian persuasion model, which has been a building block of many models and applications including some of the new security games developed in this thesis.

2.1.2 Persuasion by Utilizing Informational Advantages

Persuasion, sometimes also known as signaling, is the act of exploiting an informational advantage in order to influence the decisions of others. Persuasive communication is intrinsic in most human activities and has been estimated to account for almost a third of all economic activity in the US (Antioch, 2013). Such scenarios are increasingly common in the information economy. It is therefore unsurprising that persuasion has been the subject of a large body of work in recent years. In the rich literature of persuasion, perhaps no model is more basic and fundamental than the Bayesian Persuasion model of (Kamenica & Gentzkow, 2011), generalizing an earlier model from (Brocas & Carrillo, 2007). Here there are two players, who we call the sender and the receiver. The receiver is faced with selecting one of a number of actions, each of which is associated with an a priori unknown payoff to both players. The state of nature, describing the payoff to the sender and receiver from each action, is drawn from a prior distribution known to both players. However, the sender possesses an informational advantage, namely access to the realized state of nature prior to the receiver choosing his action. In order to persuade the receiver to take a more favorable action for her, the sender can commit to a policy, often known as an information structure or signaling scheme, of releasing information about the realized state of nature to the receiver before the receiver makes his choice. This policy may be simple, say by always announcing the payoffs of the various actions or always saying nothing, or it may be intricate, involving partial information and added noise. Crucially, the receiver is aware of the sender's committed policy, and moreover is rational and Bayesian.
An Example of Persuasion

To illustrate the intricacy of Bayesian Persuasion, (Kamenica & Gentzkow, 2011) use a simple example in which the sender is a prosecutor, the receiver is a judge, and the state of nature is the guilt or innocence of a defendant. The receiver (judge) has two actions, conviction and acquittal, and wishes to maximize the probability of rendering the correct verdict. On the other hand, the sender (prosecutor) is interested in maximizing the probability of conviction. As they show, it is easy to construct examples in which the optimal signaling scheme for the sender releases noisy partial information regarding the guilt or innocence of the defendant. For example, if the defendant is guilty with probability 1/3, the prosecutor's best strategy is to claim "guilt" whenever the defendant is guilty, and also claim "guilt" just under half the time when the defendant is innocent. As a result, the defendant will be convicted whenever the prosecutor claims "guilt" (happening with probability just under 2/3), assuming that the judge is fully aware of the prosecutor's signaling scheme. We note that it is not in the prosecutor's interest to always claim "guilt", since a rational judge aware of such a policy would ascribe no meaning to such a signal, and render his verdict based solely on his prior belief — in this case, this would always lead to acquittal. (In other words, a signal is an abstract object with no intrinsic meaning, and is only imbued with meaning by virtue of how it is used. In particular, a signal has no meaning beyond the posterior distribution on states of nature it induces.)
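To make the arithmetic of this example concrete, the short sketch below (Python, written for this text as an illustration rather than taken from the thesis; the signaling probability q for an innocent defendant is an assumed parameter set just below 1/2) computes the judge's posterior belief and the resulting conviction probability under the scheme just described.

```python
# Illustrative check of the prosecutor-judge example (not code from the thesis).
# Prior: the defendant is guilty with probability 1/3.
# Scheme: send the signal "guilt" whenever the defendant is guilty, and with
# probability q (just under 1/2) when the defendant is innocent.

prior_guilty = 1 / 3
q = 0.499  # assumed signaling probability for an innocent defendant

# Total probability that the "guilt" signal is sent.
p_guilt_signal = prior_guilty * 1.0 + (1 - prior_guilty) * q

# Judge's posterior belief of guilt after seeing "guilt" (Bayes' rule).
posterior_guilty = prior_guilty / p_guilt_signal

# A judge maximizing the probability of a correct verdict convicts exactly when
# the posterior probability of guilt is at least 1/2.
judge_convicts = posterior_guilty >= 0.5

print(f"P(signal = guilt)          = {p_guilt_signal:.3f}")   # just under 2/3
print(f"P(guilty | signal = guilt) = {posterior_guilty:.4f}")  # just above 1/2
print(f"judge convicts on 'guilt':   {judge_convicts}")

# By contrast, always claiming "guilt" (q = 1) makes the signal uninformative:
# the posterior equals the prior 1/3 < 1/2, so the judge always acquits.
```

With q just below 1/2 the posterior stays just above 1/2, the judge convicts on the "guilt" signal, and the overall conviction probability is just under 2/3, matching the discussion above.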
2.2 Security Games

2.2.1 The General Security Game Model

The security of critical infrastructures and resources is an important concern around the world, especially given the increasing threats of terrorism. Limited security resources cannot provide full security coverage at all places all the time, leaving potential attackers the chance to explore patrolling patterns and attack the weakness. How can we make use of the limited resources to build the most effective defense against strategic attackers? The past decade has seen an explosion of research in an attempt to address this fundamental question, which has led to the development of the well-known model of security games. The security game is a basic model for resource allocation in adversarial environments, and naturally captures the strategic interaction between security agencies and potential adversaries. Indeed, these models and their game-theoretic solutions have led to real-world deployments in use today by major security agencies. For example, they are used by LAX airport for checkpoint placement, the US Coast Guard for port patrolling and the Federal Air Marshal Service for scheduling air marshals (Tambe, 2011). Recently, new models and algorithms have been tested by the Transportation Security Administration for airport passenger screening (Brown, Sinha, Schlenker, & Tambe, 2016) and by non-governmental organizations in Malaysia for wildlife protection (Fang, Nguyen, Pickles, Lam, Clements, An, Singh, Tambe, & Lemieux, 2016b). Next, we give a formal description of security games.

Player Strategies

A security game is a two-player game played between a defender and an attacker. The defender possesses multiple security resources and aims to allocate these resources to protect $n$ targets (e.g., physical facilities, critical locations, etc.) from the attacker's attack. We use $[n]$ to denote the set of these targets. A defender pure strategy is a subset of targets that is protected (a.k.a. covered) in a feasible allocation of these resources. For example, the defender may have $k$ ($< n$) security resources, each of which can be assigned to protect any target. In this simple example, any subset of $[n]$ with size at most $k$ is a defender pure strategy. However, in practice, there are usually resource allocation constraints; thus not all such subsets correspond to feasible allocations. We will provide more examples in Section 2.2.3.

A more convenient representation of a pure strategy is a binary vector $e \in \{0,1\}^n$, in which the entries of value 1 specify the covered targets. Let $\mathcal{E} \subseteq \{0,1\}^n$ denote the set of all defender pure strategies. Notice that $\mathcal{E}$ also represents a set system. The size of $\mathcal{E}$ is very large, usually exponential in the number of security resources. In the example mentioned above, $|\mathcal{E}| = \binom{n}{k}$, which is exponential in $k$. Therefore, computational efficiency in security games means time polynomial in $n$, not in $|\mathcal{E}|$. A defender mixed strategy is a distribution $p$ over the elements in $\mathcal{E}$. The attacker chooses one target to attack (there are generalizations of security games in which an attacker may attack multiple targets, see, e.g., (Korzhyk, Conitzer, & Parr, 2011a); however, in all the models considered in this thesis, the attacker attacks only a single target); thus an attacker pure strategy is a target $i \in [n]$. We use $y \in \Delta_n$ to denote an attacker mixed strategy, where $y_i$ is the probability of attacking target $i$.

Payoff Structures

The payoff structure of the game is as follows: given that the attacker attacks target $i$, the defender gets utility $U^d_c(i)$ if target $i$ is covered or utility $U^d_u(i)$ if $i$ is uncovered, while the attacker gets utility $U^a_c(i)$ if target $i$ is covered or a reward $U^a_u(i)$ if $i$ is uncovered. Both players have utility 0 on the other $n-1$ unattacked targets. A crucial structure of security games is summarized in the following assumption: $U^d_c(i) > U^d_u(i)$ and $U^a_c(i) < U^a_u(i)$ for all $i \in [n]$. That is, covering a target is strictly beneficial to the defender compared to not covering it, and the attacker prefers to attack a target when it is uncovered. (In practice, the attacker can also choose not to attack; this can be incorporated into the current model by adding a dummy target, so we will not explicitly consider it.)
Therefore, we will also interpret a pointx2P as a mixed strategy, and instead write the defender’s utility as U d (x;y). Similarly, the attacker’s expected utility can be compactly represented in the following form. We note thatU a (x;y) is also bilinear inx andy. U a (x;y) = n X i=1 y i U a c (i) [1x i ] +U a u (i)x i : 2.2.2 Equilibrium Concepts Zero-Sum Settings and the Minimax Equilibrium Many security games, including some deployed systems (An, Shieh, Tambe, Yang, Baldwin, DiRenzo, Maule, & Meyer, 2012; Yin, Jiang, Tambe, Kiekintveld, Leyton-Brown, Sandholm, 3 In practice, the attacker can also choose to not attack. This can be incorporated into the current model by adding a dummy target. Therefore, we will not explicitly consider this. 11 & Sullivan, 2012), are modeled as zero-sum games. That is, the defender’s reward [cost] is the attacker’s cost [reward]. For example, in the deployed security system for patrolling proof- of-payment metro systems (Yin et al., 2012), the defender aims to catch fare evaders at metro stations. This game can be viewed as zero-sum due to the following reasons: the evader’s cost of paying a fine is the defender’s reward of catching the evader, while the ticket price is the evader’s reward and the defender’s cost if failing to catch the evader. In zero-sum games, all standard equilibrium concepts are payoff-equivalent to the well-known minimax equilibrium. General-Sum Settings and the Strong Stackelberg Equilibrium When the game is not zero-sum, the main solution concept adopted in the literature of security games is the strong Stackelberg equilibrium (SSE) (von Stackelberg, 1934; von Stengel & Za- mir, 2004). In particular, the defender plays the role of the leader and can commit to a mixed strategy before the attacker moves. The attacker observes the defender’s mixed strategy and best responds. This is motivated by the consideration that the attacker usually does surveillance be- fore committing an attack, and thus is able to observe the empirical distribution of the defender’s patrolling strategy (Tambe, 2011). In this case, our goal is to compute the optimal mixed strat- egy for the defender to commit to (the attacker’s best response problem is usually easy). The adoption of SSE has another advantage — in contrast to the intractability of Nash equilibria in normal-form games (Daskalakis, Goldberg, & Papadimitriou, 2006; Chen, Deng, & Teng, 2009), SSE is typically tractable in both normal-form games (Conitzer & Sandholm, 2006) and security games (Xu, 2016). Moreover, previous work shows that under minor technical assumptions, the defender’s SSE strategy is always a Nash equilibrium strategy and all Nash equilibria in security games are exchangeable, which alleviates the equilibrium selection issue (Korzhyk, Yin, Kiek- intveld, Conitzer, & Tambe, 2011b). This serves as a theoretical justification for adopting SSE in security games. Notice that the classic Stackelberg security game model always assumes that the attacker is not able to observe, even partially, the defender’s real-time deployment (i.e., the sampled pure strategy). 2.2.3 Three Concrete Examples Section 2.2.1 gives an abstract description about the general security game model. The difference among various concrete security games essentially lies in the structure ofE, i.e., the structure of pure strategies. Next, we will describe a few examples. Airport Checkpoint Placement. 
In the problem of placing checkpoints at different entrances of an airport to prevent potential attackers’ attack (Pita, Jain, Marecki, Ord´ o˜ nez, Portway, Tambe, Western, Paruchuri, & Kraus, 2008a), the defender can place limited security resources at any 12 subset of airport entrances of a limited size. This can be modeled as a game where the defender has k security resources, and each resource can be assigned to protect any one of n targets. Therefore, any subset of [n] of size at mostk is a defender pure strategy. The set systemE is also called a uniform matroid in this setting. Scheduling of Federal Air Marshals. In the problem of scheduling air marshals to protect flights (Jain, Kardes, Kiekintveld, Ordez, & Tambe, 2010), flights are targets to be protected and air marshals are security resources. Since the schedule for each air marshal is usually a tour consisting of multiple flights, each pure strategy in this case correspond tok feasible tours where k is the number of air marshals. Here, tour feasibility means that the departure and arrival time of all flights within the tour should be compatible. Patrol Route Design for Wildlife Conservation. In the problem of designing randomized patrol routes for wildlife conservation (Fang et al., 2016b), targets correspond to the areas to be patrolled (usually discretized into cells), and rangers are security resources. Any defender pure strategy consists ofk feasible patrol routes, each for a ranger. 13 Chapter 3 Related Work 3.1 Persuasion To our knowledge, (Brocas & Carrillo, 2007) were the first to explicitly consider persuasion through information control. They consider a sender with the ability to costlessly acquire infor- mation regarding the payoffs of the receiver’s actions, with the stipulation that acquired infor- mation is available to both players. This is technically equivalent to an informed sender who commits to a signaling scheme. Brocas and Carrillo restrict attention to a particular setting with two states of nature and three actions, and characterize optimal policies for the sender and their associated payoffs. The Bayesian Persuasion model of (Kamenica & Gentzkow, 2011) naturally generalizes (Brocas & Carrillo, 2007) to finite (or infinite yet compact) states of nature and action spaces. They establish a number of properties of optimal information structures in this model; most notably, they characterize settings in which signaling strictly benefits the sender in terms of the convexity/concavity of the sender’s payoff as a function of the receiver’s posterior belief. The Bayesian persuasion model is foundational for understanding the strategic use of informa- tional advantage since it considers essentially the simplest possible scenario in this space — one persuader (i.e., the sender) influences the action of one decision maker (i.e., the receiver). Since (Brocas & Carrillo, 2007) and (Kamenica & Gentzkow, 2011), an explosion of interest in persuasion problems followed. Generalizations and variants of the Bayesian persuasion model have been considered: (Gentzkow & Kamenica, 2016) consider multiple senders, (Alonso & Cˆ amara, 2016) consider multiple receivers in a voting setting, (Gentzkow & Kamenica, 2014) consider costly information acquisition, (Rayo & Segal, 2010) consider an outside option for the receiver, and (Kolotilin, Mylovanov, Zapechelnyuk, & Li, 2017) considers a receiver with private side information. 
The basic Bayesian persuasion model underlies, or is closely related to, recent work in a number of different domains: price discrimination (Bergemann, Brooks, & Morris, 2015), advertising (Chakraborty & Harbaugh, 2014), security games (Rabinovich, Jiang, Jain, & Xu, 2015), recommendation systems (Kremer, Mansour, & Perry, 2014; Mansour, Slivkins, & 14 Syrgkanis, 2015), medical research (Kolotilin, 2015), and financial regulation (Gick & Pausch, 2012; Goldstein & Leitner, 2013). A important generalization of the Bayesian persuasion model is one recently proposed by (Arieli & Babichenko, 2016). Here the sender interacts with multiple receivers, each of whom is restricted to a binary choice of actions. As mentioned in (Arieli & Babichenko, 2016; Babichenko & Barman, 2017), settings like this arise when a manager seeks to persuade investors to invest in a project, or when a principal persuades opinion leaders in a social network with the goal of maximizing social influence. Each receiver’s utility depends only on his own action and the state of nature, but crucially not on the actions of other receivers — the no externality assumption. The sender’s utility, on the other hand, depends on the state of nature as well as the profile of receiver actions. As in (Kamenica & Gentzkow, 2011), the state of nature is drawn from a common prior, and the sender can commit to a policy of revealing information regarding the realization of the state of nature. Since there are multiple receivers, this policy — the information structure — is more intricate, since it can reveal different, and correlated, information to different receivers. As made clear in (Arieli & Babichenko, 2016), such flexibility is crucial to the sender unless receivers are homogeneous and the sender’s utility function highly structured (for example, additively sep- arable across receivers). In particular, if restricted to a public communication channel, the sender is limited in her ability to discriminate between receivers and correlate their actions, whereas a private communication channel provides more flexibility. However, the extent to which a public communication channel limits the sender’s powers of persuasion is a fundamental question which has not been thoroughly explored. Much of the earlier work on persuasion (a.k.a., signaling), in particular its computational as- pects, focused on public signaling models. This includes work on signaling in auctions (Emek, Feldman, Gamzu, Paes Leme, & Tennenholtz, 2012; Miltersen & Sheffet, 2012; Guo & Deligkas, 2013; Dughmi, Immorlica, & Roth, 2014), voting (Alonso & Camara, 2014), routing (Bhaskar, Cheng, Ko, & Swamy, 2016), and abstract game models (Dughmi, 2014; Cheng, Cheung, Dughmi, Emamjomeh-Zadeh, Han, & Teng, 2015; Bhaskar et al., 2016; Rubinstein, 2017). The work of (Cheng et al., 2015) is relevant to our results in Section 5.3 in that they identify conditions under which public persuasion problems are tractable to approximate, and prove impossibility re- sults in some cases where those conditions are violated. Our hardness proof in Section 5.3 is in part based on some of their ideas. Private persuasion has been less thoroughly explored, particularly through the computational lens. There is a recent line of work that explores private persuasion in the context of voting (Wang, 2015; Chan, Gupta, Li, & Wang, 2016; Bardhi & Guo, 2016). Additionally, the space of possible information structures and their induced equilibria is characterized in two-agent two- action games by (Taneva, 2015). 
15 The models in (Kamenica & Gentzkow, 2011; Arieli & Babichenko, 2016) and other works are crucially based on the assumption that the sender has the power of commitment to a signaling scheme. The commitment assumption is not as unrealistic as it might first sound, and a number of arguments to that effect are provided in (Rayo & Segal, 2010; Kamenica & Gentzkow, 2011; Dughmi, 2017). We mention one of those arguments here: commitment arises organically at equilibrium if the sender and receiver(s) interact repeatedly over a long horizon, in which case commitment can be thought of as a proxy for “establishing credibility.” Other Problems Related to Persuasion Optimal persuasion is a special case of information structure design in games. The space of (private channel) information structures is studied by (Bergemann & Morris, 2016), who observe that these information structures and their associated equilibria form a generalization of correlated equilibria, and term the generalization the Bayes Correlated Equilibrium (BCE). Recent work in the CS community has also examined the design of information structures algorithmically. Work by (Emek et al., 2012), (Miltersen & Sheffet, 2012), (Guo & Deligkas, 2013), and (Dughmi et al., 2014), examine optimal signaling in a variety of auction settings, and presents polynomial-time algorithms and hardness results. (Dughmi, 2014) exhibits hardness results for signaling in two- player zero-sum games, and (Cheng et al., 2015) present an algorithmic framework and apply it to a number of different signaling problems. Also related to the Bayesian persuasion model is the extensive literature on cheap talk starting with (Crawford & Sobel, 1982). Cheap talk can be viewed as the analogue of persuasion when the sender cannot commit to an information revelation policy. Crawford and Sobel (1982) char- acterize the set of Bayesian Nash equilibria of the cheap talk game and show that, under technical assumptions, the sender’s equilibrium signaling scheme is more informative when her preference is more aligned with the receiver. After (Crawford & Sobel, 1982), there has been extensive study in the cheap talk model, and its variants and applications; we refer the reader to (Crawford, 1998) for a survey. When the sender has the power to commit, the game becomes a Stackelberg game. The commitment assumption in persuasion has been justified on the grounds that it arises organ- ically in repeated cheap talk interactions with a long horizon — in particular when the sender must balance his short term payoffs with long-term credibility. We refer the reader to the discus- sion of this phenomenon in (Rayo & Segal, 2010). Also to this point, (Kamenica & Gentzkow, 2011) mention that an earlier model of repeated 2-player games with asymmetric information by (Aumann, Maschler, & Stearns, 1995) is mathematically analogous to Bayesian persuasion. Various recent models on selling information in (Babaioff, Kleinberg, & Paes Leme, 2012; Bergemann & Bonatti, 2015; Bergemann, Bonatti, & Smolin, 2016) are quite similar to Bayesian 16 persuasion, with the main difference being that the sender’s utility function is replaced with rev- enue. Whereas (Babaioff et al., 2012) consider the algorithmic question of selling information when states of nature are explicitly given as input, the analogous algorithmic questions to ours have not been considered in their model. 
We speculate that some of our algorithmic techniques might be applicable to models for selling information when the prior distribution on states of nature is represented succinctly. 3.2 Information in Security Games Secrecy, Deception, and Strategic Signaling in Security Previous work on homeland security has realized the importance of information asymmetry be- tween the defender and adversary (Brown et al., 2005; Powell, 2007; Zhuang & Bier, 2010). In particular, they justify, via theoretical models, that it is important for the defender to hide private information and remain unpredictable to the adversary. However, these works mainly focused on studying how a defender can hide private information by secrecy and deception. For example, (Powell, 2007) observes that more defense on a particular target may not always be beneficial, since it may help the attacker to infer the importance of a target. These works inspire our explo- ration of the role of information in Stackelberg security games (SSGs). However, their goals and approaches differ from ours. (Yin, An, V orobeychik, & Zhuang, 2013) consider optimal allocation of deceptive resources (e.g., hidden cameras) in the Stackelberg game model. This naturally introduces asymmetric information regarding deployments of resources between the defender and attacker — i.e., the defender has private information regarding the allocation of these security resources while the attacker may not know. However, (Yin et al., 2013) did not consider the strategic use of such informational advantage. Instead, they model the failure of deceptive resources by a probability and feed it to a resource allocation formulation. (Zhuang & Bier, 2011) study a question that is more relevant to ours. They develop a game- theoretic model to analyze whether a defender should disclose correct information about her resource allocation, incorrect information, or no information. However, their work is more about comparing three natural strategies of information revelation while our work takes an optimization approach to compute the optimal strategy of information revelation. To the best of our knowledge, little is known in the prior literature about how to optimally reveal a defender’s private information to improve the defense. Only very recently, concurrently with this thesis work, have researchers started to investigate the strategic use of signaling or deception to improve the defender’s utility in security games (Rabinovich et al., 2015; Talmor & Agmon, 2017; Guo, An, Bosansky, & Kiekintveld, 2017). (Rabinovich et al., 2015) study 17 how a defender can increase her utility by deceptively revealing her private information about the vulnerability of different targets in order to mislead the attacker. (Rabinovich et al., 2015) analyze the computational complexity of the problem and experimentally show that such strategic use of an informational advantage may significantly increase the defender’s utility. (Talmor & Agmon, 2017) compare the advantages and limitations of several different deceptive strategies to manipulate the attacker’s belief in a multi-robot adversarial patrolling setting. (Guo et al., 2017) examine the Stackelberg security game setting and analyze the benefit for a defender of disguising her security resources. 
Gathering Information via Sensors for Security The security game model in Section 6.2 uses UA Vs (more generally, mobile sensors) to collect information about the poacher and deceptively signal the defender’s private information to deter illegal poaching. This part relates to several threads of research on security. The first line of research considers how to use UA Vs to gather information or monitor targets (Stranders, De Cote, Rogers, & Jennings, 2013; Mersheeva & Friedrich, 2015). The main research challenge there is to optimize the patrolling path of UA Vs so that it maximizes the defender’s objective. These works are usually in non-strategic settings and only consider the planning of UA V paths. In contrast, our work falls into a game-theoretic setting with an adversarial attacker. Moreover, we consider the joint task of UA V path planing and deceptive signaling, and seek to compute the globally optimal defending policy. Another interesting line of research studies adversarial patrolling games with alarm systems (Basilico, De Nittis, & Gatti, 2017b; Basilico, Celli, De Nittis, & Gatti, 2017a), which also utilizes sensors (i.e., alarms) to assist patrollers. The sensors in all these works are static (staying at fixed locations) and do not strategically signal. Sensors in our model, however, can strategically signal and are mobile. Such mobility gives us the extra flexibility to optimize their (possibly randomized) allocation. Concerns of Information Leakage in Games To our knowledge, (Alon, Emek, Feldman, & Tennenholtz, 2013) are the first to study games with information leakage. They focused on two-player zero-sum normal-form games. (Alon et al., 2013) consider a game with two players (player A and B) and assume that not only player A’s mixed strategy but also some partial information about her realized pure strategy is known to player B (a situation that is termed information leakage). The goal is to compute player A’s optimal strategy to play under the leakage model. Even for simple normal-form zero-sum games, they exhibit NP-hardness results for several model variants. The information leakage in our 18 work has similar meaning to that of (Alon et al., 2013). However, our specific leakage models are directly tied to the particular structure of security games and are different from the abstract leakage models considered in (Alon et al., 2013). Therefore, their hardness results do not directly apply to our settings. Information leakage has not received much attention in the study of Stackelberg security games. However, in the literature on adversarial patrolling games (APGs), the attacker’s real- time surveillance of the defender’s pure strategy has been considered (Agmon, Sadov, Kaminka, & Kraus, 2008; Basilico, Gatti, Rossi, Ceppi, & Amigoni, 2009b; Alpern, Morton, & Papadaki, 2011; Boˇ sansk´ y, Lis´ y, Jakob, & Pˇ echouˇ cek, 2011; V orobeychik, An, & Tambe, 2012). All these papers study settings of patrols carried out over space and time, i.e., the defender follows a sched- ule of visits to multiple targets over time. In addition, they assume that an attack action is not instantaneous and it takes time for the attacker to execute an attack, during which the defender can interrupt the attacker by visiting the attacked target. Therefore, even if the attacker can fully observe the current position of the defender (in essence, status of all targets), he may not have enough time to complete an attack on a target before being interrupted by the defender. 
The main challenge there is to create patrolling schedules with the smallest possible time between any two target visits. In contrast, our work studies information leakage in Stackelberg security game models, where the attack is instantaneous and cannot be interrupted by the defender's resource rescheduling. Furthermore, as may be more realistic in our settings, we assume that information is leaked from a small subset of targets. As a result, our setting necessitates novel models and techniques.

In some settings, a security game with information leakage can be viewed as an extensive-form game (EFG). Though there has been significant progress in solving general-purpose large EFGs recently (Letchford & Conitzer, 2010; Bošanský, Kiekintveld, Lisý, & Pěchouček, 2014; Cermak, Bosansky, Durkota, Lisy, & Kiekintveld, 2016; Cermak, Bošanský, & Lisý, 2017), we did not take this approach because the size of information sets in our game increases exponentially in the number of security resources, time steps, and possibly leaking targets. This very quickly makes our problem intractable (see Section 9.1.2 for more details).

Part II
Exploiting Informational Advantages

Chapter 4
Real-World Motivation and Two Illustrative Examples

In this chapter, we will describe two concrete examples, motivated by real-world domains, that illustrate how informational advantages can be utilized to improve security.

4.1 Motivating Example I: Deterrence of Fare Evasion

Our first example concerns the problem of deterring fare evasion in honor-based metro stations. Such metro systems exist in many cities; e.g., many metro stations in Los Angeles (see Figure 4.1) and the Caltrain stations in the San Francisco area are honor-based fare collection systems. One problem with these systems is that some passengers get into the metro without purchasing a ticket. For example, it was estimated that such fare evasion results in a loss of $5.6 million each year in Los Angeles. To prevent such fare evasion, the Los Angeles Sheriff Department (LASD) allocates ticket inspectors to these metro stations. However, the LASD has a very limited number of ticket inspectors and can only inspect a few stations at a time. Naturally, they will allocate the inspectors randomly, with the goal of deterring as much fare evasion as possible.

Figure 4.1: An honor-based metro station in Los Angeles.

To be concrete, let us consider the following example. The LASD, as the defender, aims to schedule 10 ticket inspectors to protect 50 identical (w.r.t. importance) metro stations, namely t_1, ..., t_50. Each ticket inspector can cover one metro station. Therefore, the defender's pure strategies are simply arbitrary subsets of size at most 10 of the 50 stations. For each "potential" fare evader, if he indeed does not purchase a ticket, the defender gets utility 2 for catching the evader through inspection and utility −2 for failing to catch the fare evader. For simplicity, we assume that a fare evader will be caught for sure if the corresponding station is under inspection. Using the security game notation from Section 2.2, this means U_d^u(t_i) = −2 and U_d^c(t_i) = 2, for all i = 1, ..., 50. On the other hand, the fare evader gets utility 2 if he is not caught and utility −6 otherwise. That is, U_a^u(t_i) = 2 and U_a^c(t_i) = −6, for all i = 1, ..., 50. The fare evader also has the option of choosing to purchase a ticket. In that case, both players get utility 0.
We note that these simple utility numbers are chosen for convenience; the example is similarly valid when these numbers are replaced by realistic ones. We view the problem as a two-player game played between the defender (i.e., the LASD) and a potential fare evader. By symmetry, the optimal defender strategy is to protect each metro station with probability 10/50 = 0.2. This results in an expected attacker utility of 0.8·2 + 0.2·(−6) = 0.4, which is greater than 0, the utility of purchasing a ticket. Therefore, the fare evader will prefer not to purchase a ticket, resulting in defender utility 0.8·(−2) + 0.2·2 = −1.2.

We have computed the Strong Stackelberg Equilibrium (SSE); traditionally we would be done. However, one interesting question is whether −1.2 is the best possible utility that the defender can achieve. Is there a way to achieve better defender utility?

The answer turns out to be "yes". Our approach exploits the asymmetric knowledge of the defensive strategy between the defender and the fare evader: the defender knows more. We show that, surprisingly, the defender can significantly improve her utility by strategically revealing such information. For any metro station t_i, let X_c [X_u] denote the event that t_i is under inspection [not under inspection]. The defender's mixed strategy results in P(X_c) = 0.2 and P(X_u) = 0.8. Consider a fare evader at some station, w.l.o.g. say station t_1. The fare evader only knows that t_1 is protected with probability 0.2, while the defender knows precisely whether station t_1 is protected or not.

We now design a policy for the defender to strategically reveal part of this information to the fare evader. More concretely, we will sometimes put a warning sign (e.g., a sign like "inspection in progress") at the entrance of the station. Let σ+ [σ−] denote the situation that there is a warning sign [no warning sign]. The policy for putting up the sign is defined as follows (ε > 0 is a negligible positive constant), and we assume that the defender commits to this policy:

    P(σ+ | X_c) = 1,          P(σ− | X_c) = 0;
    P(σ+ | X_u) = 3/4 − ε,    P(σ− | X_u) = 1/4 + ε.

In other words, if t_1 is under inspection, the defender will always announce σ+; if t_1 is not under inspection, the defender will announce σ+ with probability 3/4 − ε and σ− with probability 1/4 + ε. This is also called a signaling scheme, and σ+, σ− are signals which carry noisy information about the underlying true protection status of the station. We assume that this signaling scheme is publicly known (thus also known to the fare evader), since passengers may learn it from their past observations.

We analyze the scheme from the fare evader's perspective. If he receives signal σ+, occurring with probability

    P(σ+) = P(σ+ | X_c)·P(X_c) + P(σ+ | X_u)·P(X_u) = 0.8(1 − ε),

the fare evader infers the following posterior belief via Bayes' rule:

    P(X_c | σ+) = P(σ+ | X_c)·P(X_c) / P(σ+) = 1 / (4(1 − ε)),

and similarly P(X_u | σ+) = (3 − 4ε) / (4(1 − ε)). Therefore, the fare evader's expected utility for not purchasing a ticket conditioned on σ+ is

    (1 / (4(1 − ε)))·(−6) + ((3 − 4ε) / (4(1 − ε)))·2 = −2ε / (1 − ε),

which is strictly less than 0. Therefore, the fare evader will prefer to purchase a ticket, and both players get utility 0 instead. On the other hand, if the attacker receives signal σ− (with probability 0.2 + 0.8ε), he infers that the station is not under inspection, and thus will not purchase a ticket. In this situation, the defender's utility is −2.
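The per-signal analysis above, and the overall defender utility derived in the next paragraph, can be checked numerically; a minimal sketch in Python, with the negligible constant instantiated as ε = 0.01:

# A minimal numerical check of the warning-sign scheme above (eps = 0.01 is illustrative).
eps = 0.01
p_c, p_u = 0.2, 0.8                      # P(X_c), P(X_u): inspected / not inspected
sign = {"c": 1.0, "u": 0.75 - eps}       # P(sigma+ | protection status)
p_plus = p_c * sign["c"] + p_u * sign["u"]
post_c = p_c * sign["c"] / p_plus        # P(X_c | sigma+), by Bayes' rule
evade_given_plus = post_c * (-6) + (1 - post_c) * 2
p_minus = 1 - p_plus                     # given sigma-, the station is surely uninspected
print(f"P(sigma+) = {p_plus:.3f}, P(X_c | sigma+) = {post_c:.3f}")
print(f"evader's utility from evading given sigma+: {evade_given_plus:.4f}  (< 0, so he buys a ticket)")
print(f"defender's overall utility: {p_plus * 0 + p_minus * (-2):.3f}  (vs. -1.2 without signaling)")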
As a result, overall the defender receives expected utility (0.2 + 0.8ε)·(−2) = −0.4 − 1.6ε at target t_1, which is significantly larger than her original utility of −1.2 (for a small ε). Interestingly, the attacker's expected utility is (0.2 + 0.8ε)·2 = 0.4 + 1.6ε, which essentially equals his SSE utility of 0.4 (up to the negligible ε).

We remark that the signals σ+, σ− have no intrinsic meaning besides the posterior distributions inferred by the fare evader based on the signaling scheme and prior information. Intuitively, by designing signals, the defender identifies a "part" of the prior distribution that is "bad" for both players, i.e., the posterior distribution of σ+, and signals as much to the fare evader, so that the two players can "cooperate" to avoid it. This is why the defender can do strictly better while the attacker is not worse off.

4.2 Motivating Example II: Combating Poaching

Our second concrete example concerns the protection of conservation areas (Fang, Nguyen, Pickles, Lam, Clements, An, Singh, & Tambe, 2016a). Illegal poaching is a major threat to endangered animals. For example, from 2010 to 2013, in just a few years, about 20% of animals in Africa were killed due to illegal poaching (Wittemyer, Northrup, Blanc, Douglas-Hamilton, Omondi, & Burnham, 2014). Recently, there has been a rapidly growing trend of using UAVs, or more generally, mobile sensors, to combat poaching (Figure 4.2). Next, we illustrate how a UAV can utilize the defender's informational advantage to deter illegal poaching.

To be concrete, consider the problem where a defender needs to protect 8 conservation areas whose underlying geographic structure is captured by the cycle graph depicted in Figure 4.3 (e.g., they are the border areas of the park): each vertex represents an area, and edges indicate the adjacency relation among these areas. The defender has only one patroller. There is a poacher who seeks to attack one area. For simplicity, assume that these 8 areas are of equal importance.

Figure 4.2: Flying UAVs for conservation.
Figure 4.3: Cycle graph.

If the poacher is caught by the patroller in any area, the defender [poacher] gets utility 1 [−1]; if the poacher successfully attacks an area, the defender [poacher] gets utility −5 [1.25]. The defender has only one patroller, who can protect any area in the graph. Since areas are symmetric, it is easy to see that the optimal patrolling strategy here simply assigns the only patroller to each area with equal probability 1/8. As a result, the poacher attacks an arbitrary area, resulting in expected defender utility 1·(1/8) + (−5)·(7/8) = −17/4.

Now consider that the defender is assisted by 4 UAVs (e.g., an NGO named Air Shepherd [http://airshepherd.org/] provides such UAVs for conservation). Each UAV can be assigned to patrol any area. When the poacher visits any area i, he will be caught right away if the patroller is at i. If there is neither the patroller nor a UAV at area i, the poacher will successfully poach animals at that target. If there is a UAV at i, since UAVs are usually visible to the poacher from a distance, the poacher can choose, upon seeing the UAV and based on his rational judgment, either to continue poaching or to stop poaching and run away. If he chooses to continue poaching, the attack will fail if the patroller is at any neighbor of area i, since the UAV can notify the patroller to come and catch the poacher (e.g., this is how Air Shepherd operates). Otherwise, the poaching succeeds (despite the presence of the UAV).
The poacher can also choose to stop poaching and immediately run away, in which case both players get utility 0.

We are interested in the defender's optimal strategy for allocating these resources. By symmetry of the problem, it is natural to consider the following randomized strategy. The defender first chooses an area i uniformly at random to place the patroller, and then uses two UAVs to cover the left two neighbors of i and another two to cover the right two neighbors. The pattern is also illustrated in Figure 4.3, where the thick dark vertex for placing the patroller is chosen uniformly at random. Under such an allocation, each vertex is assigned the patroller with probability 1/8 and is assigned a UAV with probability 4/8. By symmetry, the poacher still chooses an arbitrary area to visit. With probability 1/8, the poacher will be caught by the patroller right away; with probability 3/8, the poacher encounters neither the patroller nor the UAVs, and thus will successfully conduct an attack. With the remaining 4/8 probability, the poacher will see a UAV and needs to make a choice between continuing and stopping poaching. Observe that, conditioned on a UAV showing up at an area, with probability 0.5 the patroller is at a neighboring area. This is because out of the four areas covered by UAVs, two are neighbors of the patroller-covered area. Therefore, the rational poacher will update his expected utility of continuing poaching as (−1)·0.5 + 1.25·0.5 = 0.125, which is greater than the utility of stopping poaching. So the poacher will prefer to continue poaching, resulting in expected defender utility 1·0.5 + (−5)·0.5 = −2. Taking expectations over all possible situations, the defender derives expected utility 1·(1/8) + (−5)·(3/8) + (−2)·(4/8) = −11/4, which is an improvement over her previous utility of −17/4.

A more interesting question is whether the defender can achieve utility that is even larger than −11/4. The answer turns out to be "yes". We show that the defender can further improve her utility via strategic signaling, which is a natural functionality of UAVs. Such improvement is possible when the poacher visits an area i covered by a UAV. In particular, let θ_s+ [θ_s−] denote the random event that there is a patroller [no patroller] at some neighbor of area i. As mentioned before, conditioned on seeing a UAV at i, the poacher infers P(θ_s+) = P(θ_s−) = 0.5. However, the UAV will know the precise state of i through communications with the defender. The UAV could strategically signal the state of area i to the poacher with the goal of deterring his poaching. This may sound counter-intuitive at first, but it turns out that strategic signaling does help. In particular, the following signaling scheme with two signals improves the defender's utility:

    P(alert | θ_s+) = 1,      P(quiet | θ_s+) = 0;
    P(alert | θ_s−) = 0.8,    P(quiet | θ_s−) = 0.2.

That is, when there is a patroller near area i (state θ_s+), the UAV always sends an alert signal; when there is no patroller near i (state θ_s−), 80% of the time the UAV still sends an alert signal, while it keeps quiet otherwise. We assume that the poacher is aware of the signaling scheme and will best respond to each signal. If he receives an alert signal, which occurs with probability P(alert) = P(alert | θ_s+)·P(θ_s+) + P(alert | θ_s−)·P(θ_s−) = 0.9, the poacher infers a posterior distribution on the state by Bayes' rule: P(θ_s+ | alert) = P(alert | θ_s+)·P(θ_s+) / P(alert) = 5/9 and P(θ_s− | alert) = 4/9.
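The Bayes' rule computation above, and the utility comparisons carried out in the next paragraph, can be verified exactly with a few lines of code (Python's fractions module keeps the arithmetic exact):

# Exact check of the alert-signal analysis; expected output: 9/10 5/9 0 -1/2 -2
from fractions import Fraction as F
p_near = F(1, 2)                                                       # P(theta_s+ | UAV seen)
alert = {"near": F(1), "far": F(4, 5)}                                 # P(alert | state)
p_alert = p_near * alert["near"] + (1 - p_near) * alert["far"]         # 9/10
post_near = p_near * alert["near"] / p_alert                           # P(theta_s+ | alert) = 5/9
poach_given_alert = post_near * (-1) + (1 - post_near) * F(5, 4)       # exactly 0
defender_given_uav = (1 - p_alert) * (-5)                              # -1/2
overall = F(1, 8) * 1 + F(3, 8) * (-5) + F(4, 8) * defender_given_uav  # -2
print(p_alert, post_near, poach_given_alert, defender_given_uav, overall)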
Under this posterior, the poacher's expected utility from continuing to poach is (5/9)·(−1) + (4/9)·1.25 = 0, which is the same as the utility from not attacking. We assume that the poacher breaks ties in favor of the defender (see justifications later) and, in this case, chooses to stop poaching. This results in utility 0 for both players. On the other hand, if the poacher receives a quiet signal, he knows for sure that there is no patroller nearby; thus he chooses to continue poaching, resulting in defender utility −5. As a result, the above signaling scheme (which occurs whenever a poacher encounters a UAV) results in defender utility 0·0.9 + (−5)·0.1 = −0.5. Overall, the defender's expected utility is further improved to 1·(1/8) + (−5)·(3/8) + (−0.5)·(4/8) = −2, which is less than half of the original loss of 17/4.

Remark. This example shows how a defender can utilize an informational advantage to deceive the poacher and improve her utility. Note that a signal takes effect only through its underlying posterior distribution over the states θ_s+ and θ_s−. In the above example, the poaching would not have been deterred if the UAV always sent an alert signal, since in that case the poacher would ignore the signal and act based on his prior belief. However, the signals could be deceptive in the sense that an alert may be issued even when there is no patroller nearby. The poacher still prefers to stop poaching even though he is aware of the potential deception!

Chapter 5
Persuasion and Its Algorithmic Foundation

Though the two motivating examples in Chapter 4 are in security domains, the underlying phenomenon they illustrate is more general and fundamental. Such an act of exploiting an informational advantage in order to influence the decisions of others is called persuasion. Indeed, persuasion is intrinsic to most human activities; persuasive communication has been estimated to account for almost a third of all economic activity in the US (Antioch, 2013). Such scenarios are increasingly common in today's information economy. It is therefore unsurprising that persuasion has been the subject of a large body of work in recent years. In the rich literature on persuasion, perhaps no model is more basic than the Bayesian Persuasion (BP) model of (Kamenica & Gentzkow, 2011). It has been a building block of many models and applications. In the next section, we will provide a formal description of the BP model, referring back to the poaching example in Section 4.2 to illustrate how the interaction there can be framed using the BP model. The correspondence between the fare evasion example in Section 4.1 and the BP model follows similarly.

5.1 The Bayesian Persuasion Model

In a Bayesian persuasion game, there are two players: a sender and a receiver. The receiver is faced with selecting an action from [n] = {1, ..., n}, with an a-priori-unknown payoff to each of the sender and receiver. We assume that payoffs are a function of an unknown state of nature θ, drawn from an abstract set Θ of potential realizations of nature. Specifically, the sender's and receiver's payoffs are functions s, r : [n] × Θ → R, respectively. We use r = r(θ) ∈ R^n to denote the receiver's payoff vector as a function of the state of nature, where r_i(θ) is the receiver's payoff if he takes action i and the state of nature is θ. Similarly, s = s(θ) ∈ R^n denotes the sender's payoff vector, and s_i(θ) is the sender's payoff if the receiver takes action i and the state is θ.
Without loss of generality, we often conflate the abstract set Θ indexing states of nature with the set of realizable payoff vector pairs (s, r); i.e., we think of Θ as a subset of R^n × R^n.

Correspondence to the example of Section 4.2: In the example, the defender is the sender and the poacher is the receiver. After seeing the UAV, the poacher has two actions: either continue poaching, or stop poaching and run away. The random state of nature θ describes whether a ranger is nearby or not, so θ has two possible realizations. Naturally, θ affects both the defender's and the poacher's utilities.

In Bayesian persuasion, it is assumed that the state of nature θ is a priori unknown to the receiver, and drawn from a common-knowledge prior distribution λ supported on Θ. The sender, on the other hand, has access to the realization of θ, and can commit to a policy of partially revealing information regarding its realization before the receiver selects his action. Specifically, the sender commits to a signaling scheme φ, mapping (possibly randomly) states of nature to a family of signals Σ. For θ ∈ Θ, we use φ(θ) to denote the (possibly random) signal selected when the state of nature is θ. Moreover, we use φ(θ, σ) to denote the probability of selecting the signal σ given a state of nature θ. An algorithm implements a signaling scheme φ if it takes as input a state of nature θ, and samples the random variable φ(θ).

Correspondence to the example of Section 4.2: In the example, the state of nature θ, i.e., whether a ranger is nearby or not, is known to the defender but unknown to the poacher. However, the probability that a ranger is nearby is publicly known. The defender commits to a signaling scheme to deceptively send the warning signal. The process can be randomized since, when the ranger is not nearby, the defender sometimes sends the warning signal and sometimes does not.

Given a signaling scheme φ with signals Σ, each signal σ ∈ Σ is realized with probability λ_σ = ∑_{θ∈Θ} λ(θ) φ(θ, σ). Conditioned on the signal σ, the expected payoffs to the receiver of the various actions are summarized by the vector r(σ) = (1/λ_σ) ∑_{θ∈Θ} λ(θ) φ(θ, σ) r(θ). Similarly, the sender's payoffs as a function of the receiver's action are summarized by s(σ) = (1/λ_σ) ∑_{θ∈Θ} λ(θ) φ(θ, σ) s(θ). On receiving a signal σ, the receiver performs a Bayesian update and selects an action i*(σ) ∈ argmax_i r_i(σ), with expected receiver utility max_i r_i(σ). This induces utility s_{i*(σ)}(σ) for the sender. In the event of ties when selecting i*(σ), we assume those ties are broken in favor of the sender.

Correspondence to the example of Section 4.2: When the poacher receives a warning signal, he updates his belief about the probability of a ranger being nearby and then best responds.

We will adopt the perspective of a sender looking to design φ to maximize her expected utility ∑_{σ∈Σ} λ_σ s_{i*(σ)}(σ), in which case we say φ is optimal. When φ yields expected sender utility within an additive [multiplicative] ε of the best possible, we say it is ε-optimal [ε-approximate] in the additive [multiplicative] sense. A simple revelation-principle style argument (Kamenica & Gentzkow, 2011) shows that an optimal signaling scheme need not use more than n signals, with one recommending each action. Such a direct scheme φ has signals Σ = {σ_1, ..., σ_n}, and satisfies r_i(σ_i) ≥ r_j(σ_i) for all i, j ∈ [n]. We think of σ_i as a signal recommending action i, and the requirement r_i(σ_i) ≥ max_j r_j(σ_i) as a persuasiveness constraint on the signaling scheme (persuasiveness has also been called incentive compatibility or obedience in prior work); i.e., the recommended action is indeed the receiver's favorite action.
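To make these definitions concrete, the following sketch computes, for an explicitly given prior and a direct scheme, each signal's probability λ_σ, the induced posterior payoffs r(σ) and s(σ), the receiver's best response i*(σ), and whether each recommendation is persuasive. All numbers below are illustrative and not from the thesis.

import numpy as np

# Two states, two actions: an explicit prior lambda, payoff matrices s and r
# (rows = states, columns = actions), and a direct scheme phi given as a
# row-stochastic (state x signal) matrix.
prior = np.array([0.6, 0.4])            # lambda(theta)
s = np.array([[0.0, 1.0],               # s_i(theta): the sender always prefers action 1
              [0.0, 1.0]])
r = np.array([[1.0, 0.0],               # r_i(theta): the receiver prefers action 0 in state 0
              [0.0, 1.0]])              #             and action 1 in state 1
phi = np.array([[0.5, 0.5],             # phi(theta, sigma_i)
                [0.0, 1.0]])

for i in range(phi.shape[1]):           # signal sigma_i recommends action i
    weight = prior * phi[:, i]          # lambda(theta) * phi(theta, sigma_i)
    lam_sigma = weight.sum()            # probability that sigma_i is sent
    r_post = weight @ r / lam_sigma     # receiver's expected payoffs given sigma_i
    s_post = weight @ s / lam_sigma     # sender's expected payoffs given sigma_i
    best = int(np.argmax(r_post))       # i*(sigma_i)
    print(f"sigma_{i}: prob {lam_sigma:.2f}, receiver best response {best}, "
          f"persuasive: {r_post[i] >= r_post.max() - 1e-9}, "
          f"sender utility if followed {s_post[i]:.2f}")

For these particular numbers the scheme is persuasive and yields expected sender utility 0.3·0 + 0.7·1 = 0.7, whereas revealing no information yields 0 and revealing full information yields 0.4; this mirrors the familiar observation that partial disclosure can strictly outperform both extremes.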
All the signaling schemes considered in this thesis will be direct, unless explicitly stated otherwise. (One reason is that schemes for which it is tractable to compute the best receiver response, or the desired best response, are w.l.o.g. direct; indirect schemes are therefore somewhat less useful to consider.)

Correspondence to the example of Section 4.2: The optimal scheme we described in the example uses two signals, alert and quiet, since the poacher has only two actions. Moreover, alert [quiet] can be thought of as a persuasive recommendation of stopping [continuing] poaching. So, the scheme we describe is direct.

Next we mention a few remarks about the results in the next sections. For our results in Section 5.2.4, we relax the persuasiveness constraints by assuming that the receiver follows the recommendation so long as it approximately maximizes his utility: for a parameter ε > 0, we relax our requirement to r_i(σ_i) ≥ max_j r_j(σ_i) − ε, which translates to the relaxed persuasiveness constraints ∑_{θ∈Θ} λ(θ) φ(θ, σ_i) r_i(θ) ≥ ∑_{θ∈Θ} λ(θ) φ(θ, σ_i) (r_j(θ) − ε) in LP (5.1). We call such schemes ε-persuasive. We judge the suboptimality of an ε-persuasive scheme relative to the best (absolutely) persuasive scheme; i.e., in a bi-criteria sense. We note that expected utilities, persuasiveness, and optimality are properties not only of a signaling scheme φ, but also of the distribution λ over its inputs. Therefore, we often say that a signaling scheme φ is persuasive [ε-persuasive] for λ, or optimal [ε-optimal] for λ. We also use u_s(φ, λ) to denote the expected sender utility ∑_{θ∈Θ} λ(θ) ∑_{i=1}^n φ(θ, σ_i) s_i(θ).

The Commitment Assumption. We conclude this section with a few justifications for the commitment assumption in the persuasion model. The commitment to signaling schemes is justified on the grounds of repeated games with a long horizon, in particular when the sender must balance his short-term payoffs with long-term credibility. We refer the reader to the discussion of this phenomenon in (Rayo & Segal, 2010). Also, (Kamenica & Gentzkow, 2011) mention that an earlier model of repeated 2-player games with asymmetric information by (Aumann et al., 1995) is mathematically analogous to Bayesian persuasion.

With respect to the concrete security applications we study, the commitment to signaling schemes is usually natural and realistic. For example, in the poaching example of Section 4.2, the signaling schemes need to be implemented as software in the UAVs. Once the code is finalized and deployed, the defender is committed to using the signaling scheme prescribed by the code. We will also assume that the receiver (i.e., the attacker in security games) is aware of the signaling scheme and will best respond to each signal. This is because, by interacting with the system, the attacker can gradually learn each signal's posterior. This is particularly true in "green security" domains, which generally involve limited penalties for being caught (Carthy, Tambe, Kiekintveld, Gore, & Killion, 2016; Fang et al., 2016a). Moreover, there is usually a community of attackers who can learn these probabilities by sharing knowledge.

5.2 Algorithmic Foundation for Bayesian Persuasion

Naturally, the sender in the Bayesian persuasion model seeks to find the signaling scheme that maximizes her expected utility subject to the receiver's strategic response. Therefore, the Bayesian persuasion problem is by nature an optimization problem.
We now provide a thorough algorithmic analysis of the model and pin down the complexity of the problem under several natural input models. To the best of our knowledge, this is the first algorithmic study of this foundational economic model. Our results not only pave the way to applications and help to operationalize the model, but also provide structural insights into the problem. Moreover, complexity-theoretic results often shed light on whether or not a model is realistic.

5.2.1 Explicit Input Model

We start with the simple case where the distribution λ of the state of nature is explicitly given. That is, the probability λ(θ) of each state of nature θ is explicitly enumerated. In this case the sender's optimization problem can be formulated as a linear program (LP) with variables {φ(θ, σ_i) : θ ∈ Θ, i ∈ [n]}:

    maximize    ∑_{θ∈Θ} λ(θ) ∑_{i=1}^n φ(θ, σ_i) s_i(θ)
    subject to  ∑_{θ∈Θ} λ(θ) φ(θ, σ_i) r_i(θ) ≥ ∑_{θ∈Θ} λ(θ) φ(θ, σ_i) r_j(θ),   for i, j ∈ [n]
                ∑_{i=1}^n φ(θ, σ_i) = 1,                                         for θ ∈ Θ
                φ(θ, σ_i) ≥ 0,                                                   for θ ∈ Θ, i ∈ [n]        (5.1)

At a high level, the LP maximizes the sender's expected utility subject to the constraints that the recommendation carried by each signal is persuasive and that the scheme is feasible. This shows that the optimal persuasion problem can be solved in polynomial time in the explicit input model, since linear programs can be solved in time polynomial in the LP size.

5.2.2 Poly-Time Solvability for Persuasion with I.I.D. Actions

In this section, we assume that the payoffs of different actions are independently and identically distributed (i.i.d.) according to an explicitly described marginal distribution. To better motivate this setting, we start with an illustrative example.

Example 1 (An Example of Persuasion with I.I.D. Actions). Our example is in the context of wildlife conservation. The receiver is a poacher, actions correspond to visiting conservation areas for poaching, and the sender is a security agency or defender with access to the statuses of conservation areas (e.g., animal populations, ranger locations, etc.), which are a priori unknown to the poacher. The misaligned incentives between the defender and the poacher give rise to a nontrivial Bayesian persuasion problem. In fact, interesting examples exist when the statuses of conservation areas are independent from each other, or even i.i.d. Consider the following simple example, which fits into the i.i.d. model considered in this section: there are two conservation areas, each of which is a priori equally likely to be in one of the following three states (independently): protected animals show up and rangers are patrolling the area (state A); protected animals show up and rangers are not patrolling the area (state B); only regular animals, which are not protected, show up (state C). We refer to A/B/C as the types of an area, and associate them with poacher utilities of −1, 1, and ε, respectively. Suppose that the defender's goal is to prevent the poacher from attacking an area with protected animals. Concretely, the defender receives utility −1 if the poacher attacks an area of type A or B (see footnote 3), and utility 0 if the poacher attacks an area of type C. The poacher will always choose one of these two areas to attack. A simple calculation shows that providing full information to the poacher results in an expected defender utility of −2/3, as does providing no information.
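Both baseline utilities can be verified by brute-force enumeration over the 3^2 = 9 equally likely type profiles; a minimal sketch (the value chosen for ε is illustrative, and any value strictly between −1 and 1 gives the same answer):

import itertools

# Brute-force check of the two baselines in Example 1.
eps = 0.1
poacher = {"A": -1.0, "B": 1.0, "C": eps}
defender = {"A": -1.0, "B": -1.0, "C": 0.0}
profiles = list(itertools.product("ABC", repeat=2))

# No information: both areas look identical a priori, so the poacher attacks an
# arbitrary one, say area 0.
no_info = sum(defender[p[0]] for p in profiles) / len(profiles)

# Full information: the poacher attacks an area whose type he likes best.
full_info = sum(defender[max(p, key=poacher.get)] for p in profiles) / len(profiles)

print(f"no information:   expected defender utility {no_info:.3f}")    # -2/3
print(f"full information: expected defender utility {full_info:.3f}")  # -2/3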
An optimal signaling scheme, which guarantees that the poacher attacks an area with type C whenever such an area exists, is the following: when exactly one of the areas has type C “recommend” that area to the poacher, and otherwise “recommend” any area uniformly at random. A simple calculation using Bayes’ rule shows that the poacher prefers to follow the recommendations of this partially informative scheme, and it follows that the expected defender utility is 4 9 . More formally, in the Bayesian persuasion model with i.i.d. actions, each state of nature is a vector in = [m] n for a parameterm, where i 2 [m] is the type of actioni. Associated with each typej2 [m] is a pair ( j ; j )2R 2 , where j [ j ] is the payoff to the sender [receiver] when the receiver chooses an action with type j. We are given a marginal distribution over types, described by a vectorq = (q 1 ;:::;q m )2 m . We assume each action’s type is drawn independently according toq; specifically, the prior distribution on states of nature is given by () = Q i2[n] q i . For convenience, we let = ( 1 ;:::; m )2R m and = ( 1 ;:::; m )2R m denote the type-indexed vectors of sender and receiver payoffs, respectively. We assume,, and q — the parameters describing an i.i.d. persuasion instance — are given explicitly. 3 The defender does not want the poacher to attack an area with protected animals even though there are patrollers there (i.e., in state B). This is because the protected animals may have already be killed before the rangers catch the poacher, and this is a huge loss to the defender. 31 M i = P ()'(; i )M ; fori = 1;:::;n: P n i=1 '(; i ) = 1; for2 : '(; i ) 0; for2 ;i2 [n]: Figure 5.1: Realizable signaturesP max P n i=1 M i i s.t. M i i M i j ; fori;j2 [n]: (M 1 ;:::;M n )2P Figure 5.2: Persuasion in signature space Note that the number of states of nature ism n , and therefore the natural representation of a signaling scheme hasnm n variables. As a result, the natural linear program for the persuasion problem in Section 5.2.1 has an exponential inn number of both variables and constraints. Never- theless, we will not need to explicitly write down the signaling scheme. Instead, as mentioned in Section 5.1, we seek only to implement an optimal or near-optimal scheme' as an oracle which takes as input and samples a signal'(). Our algorithms will run in time polynomial inn andm, and will optimize over a space of succinct “reduced forms” for signaling schemes which we term signatures, to be described next. For a state of nature, define the matrixM 2f0; 1g nm so thatM ij = 1 if and only if actioni has typej in (i.e. i =j). Given an i.i.d prior and a signaling scheme' with signals =f 1 ;:::; n g, for eachi2 [n] let i = P ()'(; i ) denote the probability of sending i , and let M i = P ()'(; i )M . Note that M i jk is the joint probability that action j has typek and the scheme outputs i . Also note that each row ofM i sums to i , and thejth row represents the un-normalized posterior type distribution of actionj given signal i . We call M = (M 1 ;:::;M n )2 R nmn the signature of '. The sender’s objective and receiver’s persuasiveness constraints can both be expressed in terms of the signature. In particular, using M j to denote thejth row of a matrixM, the persuasiveness constraints areM i i M i j for all i;j2 [n], and the sender’s expected utility assuming the receiver follows the scheme’s recommendations is P i2[n] M i i . 
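To make the signature concrete, the following sketch builds M for a toy i.i.d. instance under a simple direct scheme of our own choosing (not one from the thesis), and checks the persuasiveness constraints M_i^i · ρ ≥ M_i^j · ρ described above; mu and rho below are our names for the type-indexed sender and receiver payoff vectors.

import itertools
import numpy as np

# Toy instance: n = 3 actions, m = 2 types, uniform marginal q. The scheme
# recommends a type-0 action when one exists (uniformly among them) and
# otherwise recommends an action uniformly at random.
n, m = 3, 2
q = np.array([0.5, 0.5])
mu = np.array([0.0, 1.0])                 # sender payoff by type
rho = np.array([1.0, -1.0])               # receiver payoff by type

def phi(theta):                           # phi(theta, sigma_i) for i = 0..n-1
    pool = [i for i, t in enumerate(theta) if t == 0] or list(range(n))
    p = np.zeros(n)
    p[pool] = 1.0 / len(pool)
    return p

# Signature: M[i][j, k] = sum_theta lambda(theta) * phi(theta, sigma_i) * 1[theta_j = k]
M = np.zeros((n, n, m))
for theta in itertools.product(range(m), repeat=n):
    lam = np.prod(q[list(theta)])         # lambda(theta) = prod_j q[theta_j]
    p = phi(theta)
    for i in range(n):
        for j in range(n):
            M[i, j, theta[j]] += lam * p[i]

for i in range(n):
    assert np.allclose(M[i].sum(axis=1), M[i].sum(axis=1)[0])                # every row sums to alpha_i
    assert all(M[i, i] @ rho >= M[i, j] @ rho - 1e-9 for j in range(n))      # persuasiveness
print("sender's expected utility:", sum(M[i, i] @ mu for i in range(n)))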
We sayM = (M 1 ;:::;M n )2 R nmn is realizable if there exists a signaling scheme ' withM as its signature. Realizable signatures constitutes a polytopeP R nmn , which has an exponential-sized extended formulation as shown Figure 5.1. Given this characterization, the sender’s optimization problem can be written as a linear program in the space of signatures, shown in Figure 5.2: 32 Symmetry of the Optimal Signaling Scheme We now show that there always exists a “symmetric” optimal scheme when actions are i.i.d. Given a signatureM = (M 1 ;:::;M n ), it will sometimes be convenient to think of it as the set of pairsf(M i ; i )g i2[n] . Definition 1. A signaling scheme' with signaturef(M i ; i )g i2[n] is symmetric if there exist x;y2R m such thatM i i =x for alli2 [n] andM i j =y for allj6=i. The pair (x;y) is the s-signature of'. In other words, a symmetric signaling scheme sends each signal with equal probabilityjjxjj 1 , and induces only two different posterior type distributions for actions: x jjxjj 1 for the recommended action, and y jjyjj 1 for the others. We call (x;y) realizable if there exists a signaling scheme with (x;y) as its s-signature. The family of realizable s-signatures constitutes a polytopeP s , and has an extended formulation by adding the variablesx;y2 R m and constraintsM i i =x and M i j = y for alli;j2 [n] withi6= j to the extended formulation of (asymmetric) realizable signatures from Figure 5.1. We make two simple observations regarding realizables-signatures. First,jjxjj 1 =jjyjj 1 = 1 n for each (x;y)2P s , and this is because bothjjxjj 1 andjjyjj 1 equal the probability of each of then signals. Second, since the signature must be consistent with prior marginal distributionq, we havex + (n 1)y = P n i=1 M i 1 =q. We show that the restriction to symmetric signaling schemes will not reduce the sender’s optimal utility. Theorem 5.2.1. When the action payoffs are i.i.d., there exists an optimal and persuasive signal- ing scheme which is symmetric. Theorem 5.2.1 is proved in Appendix A.1.1. At a high level, we show that optimal signal- ing schemes are closed with respect to two operations: convex combination and permutation. Specifically, a convex combination of realizable signatures — viewed as vectors inR nmn — is realized by the corresponding “random mixture” of signaling schemes, and this operation pre- serves optimality. The proof of this fact follows easily from the fact that linear program in Figure 5.2 has a convex family of optimal solutions. Moreover, given a permutation2SS n and an op- timal signatureM =f(M i ; i )g i2[n] realized by signaling scheme', the “permuted” signature (M) =f(M i ; (i) )g i2[n] — where premultiplication of a matrix by denotes permuting the rows of the matrix — is realized by the “permuted” scheme' () =('( 1 ())), which is also optimal. The proof of this fact follows from the “symmetry” of the (i.i.d.) prior distribution about the different actions. Theorem 5.2.1 is then proved constructively as follows: given a real- izable optimal signatureM, the “symmetrized” signatureM = 1 n! P 2SSn (M) is realizable, optimal, and symmetric. 33 Implementing the Optimal Signaling Scheme We now exhibit a polynomial-time algorithm for persuasion in the i.i.d. model. Theorem 5.2.1 permits re-writing the optimization problem in Figure 5.2 as follows, with variablesx;y2R m . 
maximize nx subject to xy (x;y)2P s (5.2) Problem (5.2) cannot be solved directly, sinceP s is defined by an extended formulation with exponentially many variables and constraints, as described previously. Nevertheless, we make use of a connection between symmetric signaling schemes and single-item auctions with i.i.d. bidders to solve (5.2) using the Ellipsoid method. Specifically, we show a one-to-one correspondence between symmetric signatures and (a subset of) symmetric reduced forms of single-item auctions with i.i.d. bidders, defined as follows. Definition 2. (Border, 1991) Consider a single-item auction setting withn i.i.d. bidders andm types for each bidder, where each bidder’s type is distributed according toq2 m . An allocation rule is a randomized functionA mapping a type profile2 [m] n to a winnerA()2 [n][fg, where denotes not allocating the item. We say the allocation rule has symmetric reduced form 2 [0; 1] m if for each bidderi2 [n] and typej2 [m], j is the conditional probability ofi receiving the item given that she has typej. Whenq is clear from context, we say is realizable if there exists an allocation rule with as its symmetric reduced form. We say an algorithm implements an allocation ruleA if it takes as input, and samplesA(). Theorem 5.2.2. Consider the Bayesian Persuasion problem withn i.i.d. actions andm types, with parametersq2 m ,2 R m , and2 R m given explicitly. An optimal and persuasive signaling scheme can be implemented in poly(m;n) time. Theorem 5.2.2 is a consequence of the following set of lemmas. Lemma 1. Let (x;y)2 [0; 1] m [0; 1] m , and define = ( x 1 q 1 ;:::; xm qm ). The pair (x;y) is a realizable s-signature if and only if (a)jjxjj 1 = 1 n , (b)x + (n 1)y = q, and (c) is a realizable symmetric reduced form of an allocation rule withn i.i.d. bidders,m types, and type distributionq. Moreover, assumingx andy satisfy (a), (b) and (c), and given black-box access to an allocation ruleA with symmetric reduced form , a signaling scheme withs-signature (x;y) can be implemented in poly(n;m) time. Lemma 2. An optimal realizable s-signature, as described by LP (5.2), is computable in poly(n;m) time. 34 Lemma 3. (See (Cai, Daskalakis, & Weinberg, 2012; Alaei, Fu, Haghpanah, Hartline, & Malekian, 2012)) Consider a single-item auction setting with n i.i.d. bidders and m types for each bidder, where each bidder’s type is distributed according toq2 m . Given a realizable symmetric reduced form2 [0; 1] m , an allocation rule with reduced form can be implemented in poly(n;m) time. The proofs of Lemmas 1 and 2 can be found in Appendix A.1.2. The proof of Lemma 1 builds a correspondence betweens-signatures of signaling schemes and certain reduced-form allocation rules. Specifically, actions correspond to bidders, action types correspond to bidder types, and signaling i corresponds to assigning the item to bidderi. The expression of the reduced form in terms of the s-signature then follows from Bayes’ rule. Lemma 2 follows from Lemma 1, the ellipsoid method, and the fact that symmetric reduced forms admit an efficient separation oracle (see (Border, 1991, 2007; Cai et al., 2012; Alaei et al., 2012)). A “Simple” (1 1 e )-Approximate Scheme Our next result is a “simple” signaling scheme which obtains a (1 1=e) multiplicative approx- imation when payoffs are nonnegative. 
This algorithm has the distinctive property that it signals independently for each action, and therefore implies that approximately optimal persuasion can be parallelized among multiple colluding senders, each of whom only has access to the type of one or more of the actions. Recall that an s-signature (x;y) satisfiesjjxjj 1 =jjyjj 1 = 1 n andx + (n 1)y =q. Our simple scheme, shown in Algorithm 1, works with the following explicit linear programming relaxation of optimization problem (5.2). maximize nx subject to xy x + (n 1)y =q jjxjj 1 = 1 n x;y 0 (5.3) Algorithm 1 has a simple and instructive interpretation. It computes the optimal solution (x ;y ) to the relaxed problem (5.3), and uses this solution as a guide for signaling independently for each action. The algorithm selects, independently for each action i, a component signal o i 2fHIGH;LOWg. Each o i is chosen so that Pr[o i = HIGH] = 1 n , and moreover the events o i = HIGH and o i = LOW induce the posterior beliefs nx and ny , respectively, regarding the type of actioni. 35 Algorithm 1: Independent Signaling Scheme Input: Sender payoff vector, receiver payoff vector, prior distributionq Input: State of nature2 [m] n Output: Ann-dimensional binary signal2fHIGH;LOWg n 1: Compute an optimal solution (x ;y ) from linear program (5.3). 2: For each actioni independently, set component signalo i toHIGH with probability x i q i and toLOW otherwise, where i is the type of actioni in the input state. 3: Return = (o 1 ;:::;o n ). The signaling scheme implemented by Algorithm 1 approximately matches the optimal value of (5.3), as shown in Theorem 5.2.3, assuming the receiver is rational and therefore selects an ac- tion with aHIGH component signal if one exists. We note that the scheme of Algorithm 1, while not a direct scheme as described, can easily be converted into one; specifically, by recommend- ing an action whose component signal isHIGH when one exists (breaking ties arbitrarily), and recommending an arbitrary action otherwise. Theorem 5.2.3 follows from the fact that (x ;y ) is an optimal solution to LP (5.3), the fact that the posterior type distribution of an actioni isnx wheno i = HIGH andny wheno i = LOW, and the fact that each component signal is high independently with probability 1 n . We defer the formal proof to Appendix A.1.3. Theorem 5.2.3. Algorithm 1 runs inpoly(m;n) time, and serves as a (1 1 e )-approximate signal- ing scheme for the Bayesian Persuasion problem withn i.i.d. actions,m types, and nonnegative payoffs. Remark 5.2.4. Algorithm 1 signals independently for each action. This conveys an interesting conceptual message. That is, even though the optimal signaling scheme might induce posterior beliefs which correlate different actions, it is nevertheless true that signaling for each action in- dependently yields an approximately optimal signaling scheme. As a consequence, collaborative persuasion by multiple parties (the senders), each of whom observes the payoff of one or more ac- tions, is a task that can be parallelized, requiring no coordination when actions are identical and independent and only an approximate solution is sought. We leave open the question of whether this is possible when action payoffs are independently but not identically distributed. 5.2.3 Complexity Barriers to Persuasion with Independent Actions In this section, we consider optimal persuasion with independent action payoffs as in Section 5.2.2, albeit with action-specific marginal distributions given explicitly. 
Specifically, for each actioni we are given a distributionq i 2 m i onm i types, and each typej2 [m i ] of actioni is associated with a sender payoff i j 2 R and a receiver payoff i j 2 R. The positive results 36 of Section 5.2.2 draw a connection between optimal persuasion in the special case of identi- cally distributed actions and Border’s characterization of reduced-form single-item auctions with i.i.d. bidders. One might expect this connection to generalize to the independent non-identical persuasion setting, since Border’s theorem extends to single-item auctions with independent non- identical bidders. Surprisingly, we show that this analogy to Border’s characterization fails to generalize. We prove the following theorem. Theorem 5.2.5. Consider the Bayesian Persuasion problem with independent actions, with action-specific payoff distributions given explicitly. It is #P -hard to compute the optimal ex- pected sender utility. Invoking the framework of (Gopalan, Nisan, & Roughgarden, 2015), this rules out a gener- alized Border’s theorem for our setting, in the sense defined by (Gopalan et al., 2015), unless the polynomial hierarchy collapses toP NP . We view this result as illustrating some of the important differences between persuasion and mechanism design. The proof of Theorem 5.2.5 is rather involved. We defer the full proof to Appendix A.2, and only present a sketch here. Our proof starts from the ideas of (Gopalan et al., 2015), who show the #P-hardness for revenue or welfare maximization in several mechanism design problems. In one case, (Gopalan et al., 2015) reduce from the #P -hard problem of computing the Khintchine constant of a vector. Our reduction also starts from this problem, but is much more involved: 4 First, we exhibit a polytope which we term the Khintchine polytope, and show that computing the Khintchine constant reduces to linear optimization over the Khintchine polytope. Second, we present a reduction from the membership problem for the Khintchine polytope to the computation of optimal sender utility in a particularly-crafted instance of persuasion with independent actions. Invoking the polynomial-time equivalence between membership checking and optimization (see, e.g., (Gr¨ otschel, Lov´ asz, & Schrijver, 1988)), we conclude the #P-hardness of our problem. The main technical challenge we overcome is in the second step of our proof: given a vectorx which may or may not be in the Khintchine polytopeK, we construct a persuasion instance and a threshold T so that points inK encode signaling schemes, and the optimal sender utility is at leastT if and only ifx2K and the scheme corresponding tox results in sender utilityT . Proof Sketch of Theorem 5.2.5 The Khintchine problem, shown to be #P-hard in (Gopalan et al., 2015), is to compute the Khint- chine constant K(a) of a given vectora2 R n , defined asK(a) = E f1g n[jaj] where 4 In (Gopalan et al., 2015), Myerson’s characterization is used to show that optimal mechanism design in a public project setting directly encodes computation of the Khintchine constant. No analogous direct connection seems to hold here. 37 is drawn uniformly at random fromf1g n . To relate the Khintchine problem to Bayesian per- suasion, we begin with a persuasion instance with n i.i.d. actions and two action types, which we refer to as type -1 and type +1. The state of nature is a uniform random draw from the set f1g n , with theith entry specifying the type of actioni. We call this instance the Khintchine- like persuasion setting. 
As in Section 5.2.2, we still use the signature to capture the payoff- relevant features of a signaling scheme, but we pay special attention to signaling schemes which use only two signals, in which case we represent them using a two-signal signature of the form (M 1 ;M 2 )2R n2 R n2 . The Khintchine polytopeK(n) is then defined as the (convex) family of all realizable two-signal signatures for the Khintchine-like persuasion problem with an addi- tional constraint: each signal is sent with probability exactly 1 2 . We first prove that general linear optimization overK(n) is #P-hard by encoding computation of the Khintchine constant as linear optimization overK(n). In this reduction, the optimal solution inK(n) is the signature of the two-signal scheme'() =sign(a), which signals + and each with probability 1 2 . To reduce the membership problem for the Khintchine polytope to optimal Bayesian per- suasion, the main challenges come from our restrictions onK(n), namely to schemes with two signals which are equally probable. Our reduction incorporates three key ideas. The first is to design a persuasion instance in which the optimal signaling scheme uses only two signals. The instance we define will haven + 1 actions. Action 0 is special – it deterministically results in sender utility> 0 (small enough) and receiver utility 0. The othern actions are regular. Action i > 0 independently results in sender utilitya i and receiver utilitya i with probability 1 2 (call this type 1 i ), or sender utilityb i and receiver utilityb i with probability 1 2 (call this type 2 i ), for a i andb i to be set later. Note that the sender and receiver utilities are zero-sum for both types. Since the special action is deterministic and the probability of its (only) type is 1 in any signal, we can interpret any (M 1 ;M 2 )2K(n) as a two-signal signature for our persuasion instance (the row corresponding to the special action 0 is implied). We show that restricting to two-signal schemes is without loss of generality in this persuasion instance. The proof tracks the following intuition: due to the zero-sum nature of regular actions, any additional information regarding regular actions would benefit the receiver and harm the sender. Consequently, the sender does not reveal any information which distinguishes between different regular actions. Formally, we prove that there always exists an optimal signaling scheme with only two signals: one signal recommends the special action, and the other recommends some regular action. We denote the signal that recommends the special action 0 by + (indicating that the sender derives positive utility), and denote the other signal by (indicating that the sender derives negative utility, as we show). The second key idea concerns choosing appropriate values for fa i g n i=1 ;fb i g n i=1 for a given two-signature (M 1 ;M 2 ) to be tested. We choose these values to sat- isfy the following two properties: (1) For all regular actions, the signaling scheme implementing 38 (M 1 ;M 2 ) (if it exists) results in the same sender utility1 (thus receiver utility 1) conditioned on and the same sender utility 0 conditioned on + ; (2) the maximum possible expected sender utility from , i.e., the sender utility conditioned on multiplied by the probability of , is 1 2 . As a result of Property (1), if (M 1 ;M 2 )2K(n) then the corresponding signaling scheme' is persuasive and results in expected sender utilityT = 1 2 1 2 (since each signal is sent with probability 1 2 ). 
Property (2) implies that' results in the maximum possible expected sender utility from . We now run into a challenge: the existence of a signaling scheme with expected sender utility T = 1 2 1 2 does not necessarily imply that (M 1 ;M 2 )2K(n) if is large. Our third key idea is to set > 0 “sufficiently small” so that any optimal signaling scheme must result in the maximum possible expected sender utility 1 2 from signal (see Property (2) above). In other words, we must make so small that the sender prefers to not sacrifice any of her payoff from in order to gain utility from the special action recommended by + . We show that such an exists with polynomially many bits. We prove its existence by arguing that the polytope of persuasive two-signal signatures has polynomial bit complexity, and therefore an> 0 that is smaller than the “bit complexity” of the vertices would suffice. As a result of this choice of, if the optimal sender utility is preciselyT = 1 2 1 2 then we know that signal + must be sent with probability 1 2 since the expected sender utility from signal must be 1 2 . We show that this, together with the specifically constructedfa i g n i=1 ;fb i g n i=1 , is sufficient to guarantee that the optimal signaling scheme must implement the given two-signature (M 1 ;M 2 ), i.e., (M 1 ;M 2 )2K(n). When the optimal optimal sender utility is strictly greater than 1 2 1 2 , the optimal signaling scheme does not implement (M 1 ;M 2 ), but we show that it can be post-processed into one that does. 5.2.4 An FPTAS for the General Persuasion Problem We now turn our attention to the Bayesian Persuasion problem when the payoffs of different ac- tions are arbitrarily correlated, and the joint distribution is presented as a black-box sampling oracle. We assume that payoffs are normalized to lie in the bounded interval, and prove essen- tially matching positive and negative results. Our positive result is a fully polynomial-time ap- proximation scheme for optimal persuasion with a bi-criteria guarantee; specifically, we achieve approximate optimality and approximate persuasiveness in the additive sense described in Sec- tion 5.1. Our negative results show that such a bi-criteria loss is inevitable in the black box model for information-theoretic reasons. 39 A Bicriteria FPTAS Theorem 5.2.6. Consider the Bayesian Persuasion problem in the black-box oracle model with n actions and payoffs in [1; 1], and let > 0 be a parameter. An-optimal and-persuasive signaling scheme can be implemented in poly(n; 1 ) time. To prove Theorem 5.2.6, we show that a simple Monte-Carlo algorithm implements an ap- proximately optimal and approximately persuasive scheme'. Notably, our algorithm does not compute a representation of the entire signaling scheme' as in Section 5.2.2, but rather merely samples its output'() on a given input. At a high level, when given as input a state of na- ture , our algorithm first takes K = poly(n; 1 ) samples from the prior distribution which, intuitively, serve to place the true state of nature in context. Then the algorithm uses a linear program to compute the optimal-persuasive schemee ' for the empirical distribution of samples augmented with the input. Finally, the algorithm signals as suggested bye ' for. Details are in Algorithm 2, which we instantiate with> 0 andK =d 256n 2 4 log( 4n )e. We note that relaxing persuasiveness is necessary for convergence to the optimal sender utility — we prove this formally in Section 5.2.4. 
This is why LP (5.4) features relaxed persuasiveness constraints. Instantiating Algorithm 2 with = 0 results in an exactly persuasive scheme which could be far from the optimal sender utility for any finite number of samplesK, as reflected in Lemma 6. Algorithm 2: Signaling Scheme for a Black Box Distribution Parameter: 0 Parameter: IntegerK 0 Input: Prior distribution supported on [1; 1] 2n , given by a sampling oracle Input: State of nature2 [1; 1] 2n Output: Signal2 , where =f 1 ;:::; n g. 1: Draw integer` uniformly at random fromf1;:::;Kg, and denote ` =. 2: Sample 1 ;:::; `1 ; `+1 :::; K independently from, and let the multiset e =f 1 ;:::; K g denote the empirical distribution augmented with the input state = ` . 5 3: Solve linear program (5.4) to obtain the signaling schemee ' : e ! (). 4: Output a sample frome '() = e '( ` ). 5 It is not essential for the algorithm to pick a uniformly random` to set l =. That is, the algorithm also works if we always set1 =. We choosel uniformly at random because this makes uniformly distributed in e , conditioned on the samples. This simplifies our proof of Theorem 5.2.6. 40 maximize P K k=1 P n i=1 1 K e '( k ; i )s i ( k ) subject to P n i=1 e '( k ; i ) = 1; fork2 [K]: P K k=1 1 K e '( k ; i )r i ( k ) P K k=1 1 K e '( k ; i )(r j ( k )); fori;j2 [n]: e '( k ; i ) 0; fork2 [K];i2 [n]: (5.4) Relaxed Empirical Optimal Signaling Problem Theorem 5.2.6 follows from three lemmas pertaining to the scheme' implemented by Algo- rithm 2. 6 Approximate persuasiveness for (Lemma 4) follows from the principle of deferred decisions, linearity of expectations, and the fact that e ' is approximately persuasive for the aug- mented empirical distribution e . A similar argument, also based on the principal of deferred decisions and linearity of expectations, shows that the expected sender utility from our scheme when equals the expected optimal value of linear program (5.4), as stated in Lemma 5. Finally, we show in Lemma 6 that the optimal value of LP (5.4) is close to the optimal sender utility for with high probability, and hence also in expectation, whenK = poly(n; 1 ) is chosen appropriately; the proof of this fact invokes standard tail bounds as well as structural properties of linear program (5.4), and exploits the fact that LP (5.4) relaxes the persuasiveness constraint. We prove all three lemmas in Appendix A.3.1. Even though our proof of Lemma 6 is self-contained, we note that it can be shown to follow from (Weinberg, 2014, Theorem 6) with some additional work. Lemma 4. Algorithm 2 implements an-persuasive signaling scheme for prior distribution. Lemma 5. Assume, and assume the receiver follows the recommendations of Algorithm 2. The expected sender utility equals the expected optimal value of the linear program (5.4) solved in Step 3. Both expectations are taken over the random input as well as internal randomness and Monte-Carlo sampling performed by the algorithm. Lemma 6. LetOPT denote the expected sender utility induced by the optimal persuasive sig- naling scheme for distribution. When Algorithm 2 is instantiated withK 256n 2 4 log( 4n ) and its input is drawn from, the expected optimal value of the linear program (5.4) solved in Step 3 is at leastOPT. The expectation is over the random input as well as the Monte-Carlo sampling performed by the algorithm. 
Information-Theoretic Barriers We now show that our bi-criteria FPTAS is close to the best we can hope for: there is no bounded-sample signaling scheme in the black box model which guarantees persuasiveness and 6 Note that the overall scheme' implemented by Algorithm 2 should be distinguished from the particulare ' for empirical distribution e , which is used to construct'() for the particular input. 41 c-optimality for any constantc < 1, nor is there such an algorithm which guarantees optimality andc-persuasiveness for anyc < 1 4 . Formally, we consider algorithms which implement direct signaling schemes. Such an algorithm takes as input a black-box distribution supported on [1; 1] 2n and a state of nature2 [1; 1] 2n , wheren is the number of actions, and outputs a signal 2f 1 ;:::; n g recommending an action. We say such an algorithm is -persuasive [-optimal] if for every distribution the signaling schemeA() is-persuasive [-optimal] for . We define the sample complexitySC A (;) as the expected number of queries made byA to the blackbox given inputs and, where the expectation is taken over the randomness inherent in the Monte-Carlo sampling from as well as any other internal coins ofA. We show that the worst-case sample complexity is not bounded by any function of n and the approximation parameters unless we allow bi-criteria loss in both optimality and persuasiveness. More so, we show a stronger negative result for exactly persuasive algorithms: the average sample complexity over is also not bounded by a function of n and the suboptimality parameter. Whereas our results imply that we should give up on exact persuasiveness, we leave open the question of whether an optimal and-persuasive algorithm exists with poly(n; 1 ) average case (but un- bounded worst-case) sample complexity. Theorem 5.2.7. The following hold for every algorithmA for Bayesian Persuasion in the black- box model: (a) IfA is persuasive andc-optimal forc< 1, then for every integerK there is a distribution =(K) on 2 actions and 2 states of nature such thatE [SC A (;)]>K. (b) IfA is optimal andc-persuasive forc < 1 4 , then for every integerK there is a distribu- tion = (K) on 3 actions and 3 states of nature, and in the support of, such that SC A (;)>K. Our proof of each part of this theorem involves constructing a pair of distributions and 0 which are arbitrarily close in statistical distance, but with the property that any algorithm with the postulated guarantees must distinguish between and 0 . We defer the proof to Appendix A.3.2. 5.3 Persuading Multiple Receivers The Bayesian persuasion model examined in Section 5.1 and 5.2 consider the interaction be- tween one sender and one receiver. In this section, we consider a natural generalization in which the sender persuades multiple receivers. We focus on a basic model, first studied in (Arieli & Babichenko, 2016), with binary receiver actions and no externalities. This model generalizes and restricts aspects of the Bayesian persuasion model, and is a fundamental special case for multi-agent persuasion. 42 5.3.1 A Fundamental Setting: Binary Actions and No Externalities We adopt the perspective of a sender facingn receivers. Each receiver has two actions, which we denote by 0 and 1. The receiver’s payoff depends only on his own action and a random state of nature supported on . 
In particular, we useu i (; 1) andu i (; 0) to denote receiver i’s utility for action 1 and action 0, respectively, at the state of nature ; as shorthand, we use u i () = u i (; 1)u i (; 0) to denote how much receiveri prefers action 1 over action 0 given the state of nature . 7 Note that u i () may be negative. The sender’s utility (our objective) is a function of all the receivers’ actions and the state of nature . We use f (S) to denote the sender’s utility when the state of nature is andS is the set of receivers who choose action 1. We assume throughout this section thatf is a monotone non-decreasing set function for every. For convenience in stating our approximation guarantees, we assume without loss of generality that f is normalized so thatf (;) = 0 andf (S)2 [0; 1] for all2 andS [n]. Like in the Bayesian persuasion (BP) model, is drawn from a common prior distribution . The sender has access to the realized state of nature and can publicly commit to a signaling scheme that reveals to each receiver noisy partial information regarding the state of nature. The main difference from the BP model is that upon observing the realized state, the sender will draw a profile of signals ( 1 ;:::; n )'() and send signal i to each receiveri. Private vs. Public Signaling A general signaling scheme permits sending different signals to different receivers through a private communication channel — we term these private signaling schemes to emphasize this generality. We also study the special case of public signaling schemes — these are restricted to a public communication channel, and hence send the same signal to all receivers. We formally define these two signaling models in Sections 5.3.3 and 5.3.5, including the equilibrium concept and the induced sender optimization problem for each. In both cases, we are primarily interested in the optimization problem faced by the sender in step (1), the goal of which is to maximize the sender’s expected utility. When' yields expected sender utility within an additive [multiplicative] of the best possible, we say it is-optimal [-approximate] in the additive [multiplicative] sense. Input Models We distinguish two input models for describing persuasion instances. The first is the explicit model, in which the prior distribution is given explicitly as a probability vector. The second is the sample oracle model, where and are provided implicitly through sample access to. In both models, we assume that given a state of nature, we can efficiently evaluateu i () for 7 An equivalent presentation is to, w.l.o.g., assumeui(; 0) = 0. 43 eachi2 [n] andf (S) for eachS [n]. Our analysis will primarily focus on the explicit input model, though we will mention in the context how our results generalize to the implicit input model using techniques from Section 5.2.4. 5.3.2 Technical Preliminaries: Set Functions and Submodularity Given a finite ground setX, a set function is a mapf : 2 X !R. Such a function is nonnegative iff(S) 0 for allS X, monotone non-decreasing (or monotone for short) iff(S) f(T ) for allS T . Most importantly,f is submodular if for anyS;T X, we havef(S[T ) + f(S\T )f(S) +f(T ). Submodular functions are widely used to model utilities for a set of items. We also consider continuous functionsG from the solid hypercube [0; 1] X to the real num- bers. 
Such a function is nonnegative ifG(x) 0 for allx2 [0; 1] X , monotone non-decreasing (or monotone for short) ifG(x)G(y) wheneverxy (coordinate wise), and smooth submod- ular (in the sense of (Calinescu, Chekuri, P´ al, & V ondr´ ak, 2011)) if its second partial derivatives exist and are non-positive everywhere. The Multilinear Extension of a Set Function. Given any set functionf : 2 X !R, the multi- linear extension off is the continuous functionF : [0; 1] X !R defined as follows: F (x) = X SX f(S) Y i2S x i Y i62S (1x i ); (5.5) Notice thatF (x) can be viewed as the expectation off(S) when the random setS independently includes each elementi with probabilityx i . In particular, letp I x denote the independent distribu- tion with marginalsx, defined byp I x (S) = Q i2S x i Q i62S (1x i ). ThenF (x) =E Sp I x f(S). If f is nonnegative/monotone then so isF . Moreover, iff is submodular thenF is smooth submod- ular. For our results, we will need to maximizeF (x) subject to a set of linear constraints onx. This problem is NP-hard in general, yet can be approximated by the continuous greedy process of (Calinescu et al., 2011) for fairly general families of constraints. Note that though we cannot exactly evaluateF (x) in polynomial time, it is sufficient to approximateF (x) within a good pre- cision in order to apply the continuous greedy process. By an additive FPTAS evaluation oracle forF , we mean an algorithm that evaluatesF (x) within additive error in poly(n; 1 ) time. Theorem 5.3.1 (Adapted form (Calinescu et al., 2011)). Let F : [0; 1] n ! [0; 1] be a non- negative, monotone, smooth submodular function. LetP [0; 1] n be a down-monotone poly- tope 8 , specified explicitly by its linear constraints. Given an additive FPTAS evaluation oracle 8 A polytopePR n + is called down-monotone if for allx;y2R n + , ify2 P andx y (coordinate-wise) then x2P . 44 forF , there is a poly(n; 1 ) time algorithm that outputsx2P such thatF (x) (1 1 e )OPT, whereOPT = max x2P F (x). Correlation Gap. A general definition of the correlation gap can be found in (Agrawal, Ding, Saberi, & Ye, 2010). For our results, the following simple definition will suffice. Specifically, for anyx2 [0; 1] X , letD(x) be the set of all distributionsp over 2 X with fixed marginal probability Pr Sp (i2 S) = x i for all i. Let p I x , as defined above, be the independent distribution with marginal probabilitiesx. Note thatp I x 2D(x). For any set functionf(S), the correlation gap is defined as follows: = max x2[0;1] X max p2D(x) E Sp f(S) E Sp I x f(S) : (5.6) Loosely speaking, the correlation gap upper bounds the “loss” of the expected function value over a random set by ignoring the correlation in the randomness. Theorem 5.3.2. (Agrawal et al., 2010) The correlation gap is upper bounded by e e1 for any non-negative monotone non-decreasing submodular function. 5.3.3 Optimal Private Persuasion and Its Complexity Characterization A private signaling scheme' is a randomized map from the set of states of nature to a set of signal profiles = 1 2 n , where i is the signal set of receiveri. We use'(;) to denote the probability of selecting the signal profile = ( 1 ;:::; n )2 given a state of nature . Therefore, P 2 '(;) = 1 for every. With some abuse of notation, we use'() to denote the random signal profile selected by the scheme' given the state. Moreover, for each2 , i2 [n], and i 2 i , we use' i (; i ) =Pr[' i () = i ] to denote the marginal probability that receiveri receives signal i in state. 
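As a brief aside before continuing with private schemes, the multilinear extension F of Equation (5.5) is typically evaluated by sampling, which is all that the additive FPTAS oracle in Theorem 5.3.1 requires. The sketch below is my own illustration, not from the thesis; the coverage function is merely one example of a monotone submodular f.

```python
# A sketch (not from the thesis) of a Monte-Carlo evaluation oracle for the multilinear
# extension F(x) = E_{S ~ p^I_x}[f(S)] from Equation (5.5).
import numpy as np

def coverage_f(S, element_sets):
    """Example monotone submodular set function: number of ground elements covered by S."""
    covered = set()
    for i in S:
        covered |= element_sets[i]
    return len(covered)

def estimate_multilinear(f, x, num_samples=5000, seed=0):
    """Average f over random sets that include each i independently with probability x[i]."""
    rng = np.random.default_rng(seed)
    n, total = len(x), 0.0
    for _ in range(num_samples):
        S = [i for i in range(n) if rng.random() < x[i]]
        total += f(S)
    return total / num_samples

# Usage: three sets over the ground elements {0,...,4}, each included with probability 1/2.
element_sets = [{0, 1}, {1, 2, 3}, {3, 4}]
print(estimate_multilinear(lambda S: coverage_f(S, element_sets), [0.5, 0.5, 0.5]))
```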
An algorithm implements a signaling scheme' if it takes as input a state of nature, and samples the random variable'(). Given a signaling scheme', each signal i 2 i for receiveri is realized with probability Pr( i ) = P 2 ()' i (; i ). Upon receiving i , receiver i — like the receiver in the BP model — performs a Bayesian update and infers a posterior belief over the state of nature, as follows: the realized state is with posterior probability()' i (; i )=Pr( i ). Receiveri then takes the action maximizing his posterior expected utility. In case of indifference, we assume ties are broken in favor of the sender (i.e., in favor of action 1). Therefore, receiveri chooses action 1 if 1 Pr( i ) X 2 ()' i (; i )u i (; 1) 1 Pr( i ) X 2 ()' i (; i )u i (; 0); or equivalently X 2 ()' i (; i )u i () 0; 45 whereu i () =u i (; 1)u i (; 0). Like in the BP model, a revelation-principle style argument shows that there exist an optimal private signaling scheme which is direct and persuasive (Kamenica & Gentzkow, 2011; Arieli & Babichenko, 2016). By direct we mean that signals correspond to actions — in our setting i =f0; 1g for each receiver i — and can be interpreted as action recommendations. A di- rect scheme is persuasive if the strategy profile where all receivers follow their recommendations forms an equilibrium of the resulting Bayesian game. Due to the absence of inter-receiver exter- nalities in our setting, such an equilibrium will necessarily also satisfy the stronger property of being a dominant-strategy equilibrium — i.e., each receiveri maximizes his posterior expected utility by following the recommendation, regardless of whether other receivers follow their rec- ommendations. When designing private signaling schemes, we restrict attention (without loss) to direct and persuasive schemes. Here, a signal profile can be equivalently viewed as a set S [n] of re- ceivers — namely, the set of receivers who are recommended action 1. Using this alternative representation, a scheme can be specified by variables'(;S) for all2 ;S [n]. We can now encode the sender’s optimization problem of computing the optimal scheme using the fol- lowing exponentially large linear program; note the use of auxiliary variablesx ;i to denote the marginal probability of recommending action 1 to receiveri in state. maximize P 2 () P S[n] '(;S)f (S) subject to P S:i2S '(;S) =x ;i ; fori2 [n];2 : P 2 ()x ;i u i () 0; fori = 1;:::;n: P S[n] '(;S) = 1; for2 : '(;S) 0; for2 ;S [n]: (5.7) The second set of constraints in LP (5.7) are persuasiveness constraints, and state that each receiver i should maximize his utility by taking action 1 whenever action 1 is recommended. Note that the persuasiveness constraints for action 0, which can be written as P 2 ()(1 x ;i )u i () 0 for eachi2 [n], are intentionally omitted from this LP. This omission is without loss whenf is a non-decreasing set function for each: any solution to the LP in which a receiver prefers action 1 when recommended action 0 can be improved by always recommending action 1 to that receiver. Since the size of LP (5.7) is exponential in the input size of the problem, it is not clear whether we can solve the problem in time polynomial in the input size. Next, we study the complexity of optimal private persuasion. 
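Before turning to that complexity question, it may help to see LP (5.7) written out explicitly at toy scale. The sketch below is a brute-force illustration with hypothetical names, not the thesis code; it enumerates all 2^n subsets and is only meant to make the formulation concrete, here on a small instance in which the sender gets utility 1 whenever at least one of two receivers takes action 1.

```python
# A brute-force solver for LP (5.7) at toy scale (exponentially many variables), meant only
# to make the formulation concrete; the representation and names are mine, not the thesis code.
from itertools import chain, combinations
import numpy as np
from scipy.optimize import linprog

def optimal_private_scheme(prior, delta_u, f):
    """prior[t]: probability of state t; delta_u[t][i] = u_i(theta_t, 1) - u_i(theta_t, 0);
    f(t, S): monotone sender utility when S is the set of receivers playing 1 in state t."""
    T, n = len(prior), len(delta_u[0])
    subsets = list(chain.from_iterable(combinations(range(n), k) for k in range(n + 1)))
    idx = {(t, S): t * len(subsets) + s for t in range(T) for s, S in enumerate(subsets)}

    c = np.zeros(len(idx))                       # maximize => minimize the negated objective
    for (t, S), v in idx.items():
        c[v] = -prior[t] * f(t, S)
    A_eq = np.zeros((T, len(idx)))               # each phi(theta_t, .) is a distribution
    for (t, S), v in idx.items():
        A_eq[t, v] = 1.0
    A_ub = np.zeros((n, len(idx)))               # persuasiveness of recommending 1 to each i
    for i in range(n):
        for (t, S), v in idx.items():
            if i in S:
                A_ub[i, v] = -prior[t] * delta_u[t][i]
    res = linprog(c, A_ub=A_ub, b_ub=np.zeros(n), A_eq=A_eq, b_eq=np.ones(T),
                  bounds=(0, None), method="highs")
    scheme = {key: p for key, p in zip(idx, res.x) if p > 1e-9}
    return scheme, -res.fun

# Usage: two states (favorable/unfavorable for the receivers), two receivers, f = "someone acts".
prior = [1/3, 2/3]
delta_u = [[1, 1], [-1, -1]]                     # receivers like acting only in state 0
scheme, value = optimal_private_scheme(prior, delta_u, lambda t, S: min(len(S), 1))
print(value)                                     # anti-correlated recommendations yield utility 1
```

In the returned scheme the recommendations in the unfavorable state are anti-correlated across the two receivers, which is exactly the kind of coordination that public signaling, discussed later in this chapter, cannot replicate.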
In particular, we relate the computational complexity of private persuasion to the complexity of maximizing the sender’s objective function, and show that the optimal private signaling scheme can be computed efficiently for a broad class of sender 46 objectives. LetF denote any collection of monotone set functions. We useI(F) to denote the class of all persuasion instances in our model in which the sender utility functionf is inF for all states of nature. We restrict attention to the explicit input model for most of this discussion, though discuss how to extend our results to the sample oracle model, modulo an arbitrarily small additive loss in both the sender’s objective and the persuasiveness constraints, at the end of this section. The following theorem establishes the polynomial-time equivalence between computing the optimal private signaling scheme and the problem of maximizing the objective function plus an additive function. Note that although the number of variables in LP (5.7) is exponential in the number of receivers, a vertex optimal solution of this LP is supported onO(njj) variables. Theorem 5.3.3. LetF be any collection of monotone set functions. There is a polynomial-time algorithm which computes the optimal private signaling scheme given any instance inI(F) if and only if there is a polynomial time algorithm for maximizingf(S) + P i2S w i given anyf2F and any set of weightsw i 2R. Proof. We first reduce optimal private signaling to maximizing the objective function plus an additive function, via linear programming duality. Consider the following dual program of LP (5.7) with variablesw ;i ; i ;y . minimize P 2 y subject to P i2S w ;i +y ()f (S); forS [n];2 : w ;i + i ()u i () = 0; fori = 1;:::;n: i 0; fori2 [n]: (5.8) We can obtain a separation oracle for LP (5.8) given an algorithm for maximizingf (S) plus an additive function. Given any variablesw ;i ; i ;y , separation over the first set of constraints reduces to maximizing the set functiong (S) = f (S) 1 () P i2S w ;i for each2 . The other constraints can be checked directly in linear time. Given the resulting separation oracle, we can use the Ellipsoid method to obtain a vertex optimal solution to both LP (5.8) and its dual LP (5.7) in polynomial time (Gr¨ otschel et al., 1988). We now prove the converse. Namely, we construct a polynomial-time Turing reduction from the problem of maximizingf plus an additive function to a private signaling problem inI(F). At a high level, we first reduce the set function maximization problem to a certain linear program, and then prove that solving the dual of the LP reduces to optimal private signaling for a set of particularly constructed instances inI(F). 47 Givenf2 F and weightsw, our reduction concerns the following linear program, parame- terized bya = (a 1 ;:::;a n ) andb, with variablesz = (z 1 ;:::;z n ) andv. minimize P i2[n] a i z i +bv subject to P i2S z i +vf(S); forS [n]: (5.9) LetP denote the feasible region of LP (5.9). As the first step of our reduction, we reduce maximizing the set function g w (S) = f(S) + P i2S w i to the separation problem forP. Let z i =w i for eachi. Notice that (z;v) is feasible (i.e., inP) if and only ifv max S[n] f(S) P i2S z i . Therefore, we can binary search for a valuee v such that (z;e v) is almost feasible, but not quite. More precisely, letB denote the bit complexity of thef(S)’s and thew i ’s. Then binary search returns the exact optimal value of the set function maximization problem afterO(B) steps. 
We then sete v to equal that value minus 2 B . Feeding (z;e v) to the separation oracle, we obtain a violated constraint which must correspond to the maximizer off(S) + P i2S w i . As the second step of our reduction, we reduce the separation problem forP to solving LP (5.9) for every choice of objective coefficientsa andb. This polynomial-time Turing reduction follows from the equivalence of separation and optimization (Gr¨ otschel et al., 1988). Third, we reduce solving LP (5.9) for arbitrarya andb to the special case wherea2 [0; 1] n and b = 1. The reduction involves a case analysis. (a) If any of the objective coefficients are negative, then the fact thatP is upwards closed implies that LP (5.9) is unbounded. (b) Ifb = 0 anda i > 0 for somei, then the LP is unbounded since we can makev arbitrarily small andz i arbitrarily large. Normalizing by dividing byb, we have reduced the problem to the case when b = 1 and a 0 (coordinate-wise). (c) Now suppose that a i > 1 = b for some i; the LP is unbounded by makingz i arbitrarily small andv arbitrarily large. This analysis leaves the case of b = 1 anda2 [0; 1] n . Fourth, we reduce LP (5.9) with parametersa2 [0; 1] n andb = 1 to its dual shown below, with variablesp S forS [n]. maximize P S[n] p S f(S) subject to P S:i2S p S a i ; fori2 [n]: P S[n] p S = 1 p S 0; forS [n]: (5.10) We note that LP (5.10) is not the standard dual of LP (5.9). In particular, the first set of constraints are inequality rather than equality constraints. It is easy to see that LP (5.10) is equivalent to the standard dual whenf is monotone non-decreasing, and that an optimal solution to one of the two duals can be easily converted to an optimal solution of the other. The fifth and final step of our reduction reduces LP (5.10) to a private signaling problem in I(F). There aren receivers and two states of nature 0 ; 1 with( 0 ) = ( 1 ) = 1=2. Define 48 u i ( 1 ) = 1 andu i ( 0 ) = 1 a i (1 ifa i = 0) for alli. The sender’s utility function satisfies f 1 = f 0 = f. Let' be an optimal signaling scheme, in particular an optimal solution to the instantiation of LP (5.7) for our instance. Note that all receivers prefer action 1 in state 1 ; there- fore,' can be weakly improved, without violating the persuasiveness constraints, by modifying it to always recommend action 1 to all receivers when in state 1 . After this modification,' is an optimal solution to the following LP, which optimizes over all signaling schemes satisfying '( 1 ; [n]) = 1. maximize 1 2 f([n]) + 1 2 P S[n] '( 0 ;S)f(S) subject to P S:i2S '( 0 ;S) =x 0 ;i ; fori2 [n]: x 0 ;i a i ; fori = 1;:::;n: P S[n] '( 0 ;S) = 1 '( 0 ;S) 0; for2 ;S [n]: (5.11) It is now easy to see that settingp S =' ( 0 ;S) yields an optimal solution to LP (5.10) As an immediate corollary of Theorem 5.3.3, the optimal private signaling scheme can be computed efficiently when the sender’s objective function is supermodular or anonymous. Recall that a set functionf : 2 [n] ! R is anonymous if there exists a functiong : Z! R such that f(S) =g(jSj). Corollary 1. There is a polynomial-time algorithm for computing the optimal private signaling scheme when the sender objective functions are either supermodular or anonymous. Proof. Since a supermodular function plus an additive function is still supermodular, and the problem of unconstrained supermodular maximization can be solved in polynomial time , The- orem 5.3.3 implies that the optimal private signaling scheme can also be computed in polyno- mial time. 
As for anonymous objectives, there is a simple algorithm for maximizing an anonymous set function plus an additive function. In particular, consider the problem of maximizing f(S) + Σ_{i∈S} w_i where f(S) = g(|S|). Observe that, fixing |S| = k, the optimal set S_k corresponds to the k highest-weight elements in w. Enumerating all k and choosing the best S_k yields the optimal set (a short code sketch of this subroutine is given below).

Finally, we make two remarks on Theorem 5.3.3, particularly on the reduction from optimal private signaling to set function maximization. First, the assumption of monotonicity is not necessary to the reduction from signaling to optimization. In other words, even without the monotonicity assumption for the sender's objective function, one can still efficiently compute the optimal private signaling scheme for instances in I(F) given access to an oracle for maximizing f(S) + Σ_{i∈S} w_i for any f ∈ F and weight vector w. This can be verified by adding the persuasiveness constraints for action 0 back to LP (5.7) and examining the corresponding dual, which has a similar structure to LP (5.8). We omit the details here. Consequently, Corollary 1 applies to non-monotone supermodular or anonymous functions as well.

Second, our reduction assumes that the prior distribution over the state of nature is explicitly given. This can be generalized to the sample oracle model. In particular, when our only access to the prior is through random sampling, we can implement an ε-optimal and ε-persuasive private signaling scheme in poly(n, 1/ε) time using the idea in Section 5.2.4 (assuming u_i(θ) ∈ [-1, 1]). The algorithm is as follows: given any input state θ, we first take poly(n, 1/ε) samples from the prior, and then solve LP (5.7) on the empirical distribution of the samples plus θ, with persuasiveness constraints relaxed by ε. Finally, we signal for θ as the solution to the LP suggests. The analysis of this algorithm is very similar to that in Section 5.2.4, and is omitted here. Moreover, the bi-criteria loss is inevitable in this oracle model for information-theoretic reasons.

5.3.4 Private Persuasion with Submodular Objectives

Theorem 5.3.3 relates the exact computation of the optimal private signaling scheme to exact maximization of (a variant of) the set function f(S). One natural question is what happens if exactly maximizing the set function f(S) is intractable and we can only obtain an approximate solution efficiently. An important case of such a scenario is when f(S) is submodular.

To answer this question, we consider optimal private signaling for submodular sender objectives in this section, and show that there is a polynomial-time (1 - 1/e)-approximation scheme, modulo an additive loss of ε. This is almost the best possible: (Babichenko & Barman, 2017) show that even in the special case of two states of nature, it is NP-hard to approximate the optimal private signaling scheme within a factor better than (1 - 1/e) for monotone submodular sender objectives.

Theorem 5.3.4. Consider private signaling with monotone submodular sender objectives. Let OPT denote the optimal sender utility. For any ε > 0, a private signaling scheme achieving expected sender utility at least (1 - 1/e)·OPT - ε can be implemented in poly(n, |Θ|, 1/ε) time.

The main technical challenge in proving Theorem 5.3.4 is that a private signaling scheme may have exponentially large support, as is apparent from linear program (5.7). To overcome this difficulty, we prove a structural characterization of (approximately) optimal persuasive private schemes, i.e., solutions to LP (5.7).
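Returning for a moment to Corollary 1, the following sketch (my own code, with hypothetical names) implements the enumerate-over-k argument for anonymous objectives, i.e., it maximizes g(|S|) + Σ_{i∈S} w_i by sorting the weights once.

```python
# A short sketch of the anonymous-objective subroutine from Corollary 1's proof: maximize
# g(|S|) + sum_{i in S} w_i by sorting the weights and enumerating |S| = k. Names are mine.
def maximize_anonymous_plus_additive(g, w):
    """g: function of the set's cardinality; w: list of (possibly negative) weights."""
    order = sorted(range(len(w)), key=lambda i: w[i], reverse=True)
    best_val, best_S, prefix = g(0), [], 0.0
    for k in range(1, len(w) + 1):
        prefix += w[order[k - 1]]              # sum of the k highest weights
        val = g(k) + prefix
        if val > best_val:
            best_val, best_S = val, order[:k]
    return best_val, sorted(best_S)

# Usage: g(k) = k**2 (a supermodular-style anonymous function), three weighted elements.
print(maximize_anonymous_plus_additive(lambda k: k * k, [0.5, -3.0, 1.0]))
```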
Roughly speaking, we show that LP (5.7) always has an ap- proximately optimal solution with polynomial-sized support and nicely structured distributions. This greatly narrows down the solution space we need to search over. Recall that for any,'() is a random variable supported on 2 [n] . We say'() isK-uniform if it follows a uniform dis- tribution on a multiset of sizeK. The following lemma exhibits a structural property regarding 50 (approximately) optimal solutions to LP (5.7). Notably, this property only depends on mono- tonicity of the sender’s objective functions and does not depend on submodularity. Its proof is postponed to the end of this section. Lemma 7. Letf be monotone for each. For any > 0, there exists an-optimal persuasive private signaling scheme' such that'() isK-uniform for every, whereK = 108n log(2njj) 3 . By Lemma 7, we can, without much loss, restrict our design of'() to the special class of K-uniform distributions. Note that aK-uniform distribution'() can be described by variables x j ;i 2f0; 1g fori2 [n];j2 [K], wherex j ;i denotes the recommended action to receiveri in the j’th profile in the support of'(). Relaxing our variables to lie in [0; 1], this leads to optimization problem (5.12), whereF (x) = P S[n] f (S) Q i2S x i Q i62S (1x i ) is the multi-linear extension off . maximize P 2 () K P K j=1 F (x j ) subject to P 2 () K P K j=1 x j ;i u i () 0; fori = 1;:::;n: 0x j ;i 1; fori = 1;:::;n;2 : (5.12) At a high level, our algorithm first approximately solves Program (5.12) and then signals according to its solution. Details are in Algorithm 3, which we instantiate with > 0 and K = 108n log(2njj) 3 . SinceF (x) = E Sp I x f(S) wherep I x is the independent distribution over 2 [n] with marginal probabilityx, the expected sender utility induced by the signaling scheme in Algorithm 3 is precisely the objective value of Program (5.12) at the obtained solution. Theorem 5.3.4 then follows from two claims: 1. The optimal objective value of Program (5.12) is - close to the optimal sender utility (Claim 1); 2. The continuous greedy process (Calinescu et al., 2011) can be applied to Program (5.12) to efficiently compute a (1 1=e)-approximate solution, modulo a small additive loss (Claim 2). We remark that Theorem 5.3.4 can be generalized to the sample oracle model, but with an additional -loss in persuasiveness constraints (assuming u i ()2 [1; 1]), using the idea from Section 5.2.4. Claim 1. When K = 108n log(2njj) 3 , the optimal objective value of Program (5.12) is at least OPT, whereOPT is the optimal sender utility in private signaling. Proof. By Lemma 7, there exists a private signaling scheme' such that: (i)' achieves sender utility at leastOPT; (ii) for each, there existsK setsS 1 ;:::;S K [n] such that' is a uniform distribution overfS 1 ;:::;S K g. Utilizing ', we can construct a feasible solution x to Program (5.12) with objective value at leastOPT. In particular, letx j 2f0; 1g n be the indicator vector of the setS j , formally defined as follows: x j ;i = 1 if and only ifi2 S j . By 51 Algorithm 3: Private Signaling Scheme for Submodular Sender Objectives Parameter: > 0 Input: Prior distribution supported on Input: u i ()’s and value oracle access to the sender utilityf (S) Input: State of nature Output: A setS [n] indicating the set of receivers who will be recommended action 1. 1: Approximately solve Program (5.12). Letfe x j ;i g 2;i2[n];j2[K] be the returned solution. 
2: Choosej from [K] uniformly at random; For each receiveri, addi toS independently with probabilitye x j ;i . 3: ReturnS. referring to the feasibility of' for LP (5.7), it is easy to check thatx j ;i ’s are feasible for Program (5.12). Moreover, sinceF (x j ) = f (S j ), the objective value of Program (5.12) at the solution x equals the objective value of Program (5.7) at the solution ', which is at least OPT. Therefore, the optimal objective value of Program (5.12) is at leastOPT, as desired. Claim 2. There is an algorithm that runs in poly(n;jj;K; 1 ) time and computes a (1 1=e)- approximate solution, modulo an additive loss of=e, to Program (5.12). Proof. The objective function of Program (5.12) is a linear combination, with non-negative coef- ficients, of multilinear extensions of monotone submodular functions, and thus is smooth, mono- tone and submodular. Moreover, the function value can be evaluated within error by poly(n; 1 ) random samples, and thus in poly(n; 1 ) time. To apply Theorem 5.3.1, we only need to prove that the feasible region is a down-monotone polytope. Observe that there always exists an optimal solution to Program (5.12) such thatx ;i = 1 for any;i such thatu i () 0. Therefore, w.l.o.g., we can pre-set these variables to be 1 and view the program as an optimization problem over x ;i ’s for all;i such thatu i ()< 0. It is easy to verify that thesex ;i ’s form a down-monotone polytope determined by polynomially many constraints, as desired. Proof of Lemma 7 Our proof is based on the probabilistic method. Recall that the optimal private signaling scheme can be computed by solving the exponentially large LP (5.7). Roughly speaking, given any optimal private scheme ' , we will take polynomially many samples from ' () for each , and prove that with strictly positive probability the corresponding empirical distributions form a solution to LP (5.7) that is close to optimality. However, the sampling approach usually suffers from -loss in both the objective and persuasiveness constraints. It turns out that the -loss in persuasiveness constraints can be avoided in our setting with carefully designed pre-processing steps. 52 At a high level, to get rid of the -loss in persuasiveness constraints, there are two main technical barriers. The first is to handle the estimation error in the receiver’s utilities, which is inevitable due to sampling. We address this by adjusting ' to strengthen the persuasiveness constraints so that a small estimation error still preserves the original persuasiveness constraints. The second barrier arises when somex ;i ’s are smaller than inverse polynomial of the precision . Thenpoly( 1 ) samples cannot guarantee a good multiplicative estimate ofx ;i . We deal with this issue by making the “honest” recommendation, i.e., action 0, in these cases, and show that such a modification does not cause much loss in our objective. We first introduce some convenient notations. For any receiveri, let + i =f :u i () 0g be the set of states in which receiveri (weakly) prefers action 1; similarly, i =f :u i ()< 0g is the set of states in which receiveri strictly prefers action 0. For any state of nature, letI + = fi :u i () 0g be the set of receivers who (weakly) prefer action 1 in state. It is convenient to think off + i g i2[n] andfI + g 2 as two different partitions of the setf(;i) :u i () 0g. Observe that by monotonicity there always exists an optimal signaling scheme' such that x ;i = 1 for every 2 + i . 
Let ' be such an optimal signaling scheme and OPT denote the optimal sender utility. We now adjust the scheme' without degrading the objective value by much but such that the scheme is more suitable for applying concentration bounds for our probabilistic argument. Adjustment 1: Always Recommend Action 0 Whenx ;i < 3n Note thatx ;i < 3n only when2 i , i.e., action 0 is the best action for receiveri condi- tioned on. We first adjust' to obtain a new schemee ', as follows: e ' is the same as' except that for every;i such thatx ;i < 3n ,e ' always recommends action 0 to receiveri given the state of nature. As a result,e x ;i equalsx ;i wheneverx ;i 3n and equals 0 otherwise. Note that the signaling scheme still satisfies the persuasiveness constraints. Naturally, each adjustment above, corresponding to;i satisfyingx ;i < 3n , could decrease the objective value since the marginal probability of recommending action 1 decreases. Never- theless, this loss, denoted asL(;i), can be properly bounded as follows: L(;i) = () X S:i2S ' (;S)f (S) X S:i2S ' (;S)f (Snfig) () X S:i2S ' (;S) = ()x ;i () 3n : 53 As a result, the aggregated loss of all the adjustments made in this step can be upper bounded by P 2 P n i=1 () 3n = 3 . That is, the objective value ofe ' is at leastOPT 3 . Adjustment 2: Strengthen the Persuasiveness Constraints by Scaling Downx ;i ’s We now strengthen the persuasiveness constraints by further adjusting the e ' obtained above so that a small estimation error due to sampling will still maintain the original persuasive- ness constraints. For any , we define ' 0 (;S) = 3 3+ e '(;S) for all S 6= I + , and define ' 0 (;I + ) = 1 P S6=I + ' 0 (;S). Obviously, ' 0 is still a distribution over 2 [n] . We claim thatx 0 ;i = E S' 0 I(i2 S) = 1 whenevere x ;i = 1, i.e., 2 + i . That is, given state, any receiveri2 I + will still aways be recommended action 1. This is because, to construct' 0 , we moved some probability mass from all other setsS to the setI + ; therefore the marginal proba- bility of recommending action 1 to any receiveri2I + will not decrease. However, this marginal probability is originally 1 in the solution of e '. Therefore,x 0 ;i still equals 1 for anyi2 I + , or equivalently, for any2 + i . Similarly, we also havex 0 ;i = 0 whenevere x ;i = 0. LetVal(') denote the objective value of a scheme'. We claim thatVal(' 0 )OPT 2 3 and' 0 satisfiesx 0 ;i = 3 3+ e x ;i for every2 i . For anyi2 [n];2 i (which meansi62I + ), we have x 0 ;i = X S:i2S ' 0 (;S) = 3 3 + X S:i2S e '(;S) = 3 3 + e x ;i ; since the summation excludes the term' 0 (;I + ). We now prove the guarantee of the objective value. Observe that' 0 (;I + ) 3 3+ e '(;I + ) also holds in our construction. Therefore, we have Val(' 0 ) = X 2 () X S[n] ' 0 (;S)f (S) 3 3 + X 2 () X S[n] e '(;S)f (S) = 3 3 + Val(e ') OPT 2 3 ; where we used the upper boundVal(e ') 1. Existence of An-Optimal Solution of Small Support. The above two steps of adjustment result in a feasible 2 3 -optimal solution' 0 to LP (5.7) that satisfies the following properties: (i)x 0 ;i =x ;i = 1 wheneveru i () 0; (ii)x 0 ;i = 3 3+ e x ;i = 3 3+ x ;i 4n whenx ;i 3n and2 i ; (iii)x 0 ;i = 0 whenx ;i < 3n and2 i . Utilizing such a' 0 we show that there exists an-optimal solution' to LP (5.7) such that the distribution ' is aK-uniform distribution for every, whereK = 108n log(2njj) 3 . 54 Our proof is based on the probabilistic method. 
For each , independently take K = 108n log(2njj) 3 samples from random variable' 0 (), and let' denote the corresponding empiri- cal distribution. Obviously,' is aK-uniform distribution. We claim that with strictly positive probability over the randomness of the samples,' is feasible to LP (5.7) and achieves utility at leastVal(' 0 ) 3 OPT. We first examine the objective value. Note that the objective valueVal(' 0 ) can be viewed as the expectation of the random variable P 2 ()f (S )2 [0; 1], whereS follows the distribu- tion of' 0 (). Our sampling procedure generatesK samples for the random variablefS g 2 ; therefore by the Hoeffding bound, with probability at least 1exp(2K 2 =9)> 11=(2njj), the empirical mean is at leastVal(' 0 )=3. Now we only need to show that all the persuasiveness constraints are preserved with high probability. First, if x 0 ;i = 0, then x ;i induced by ' also equals 0. This is because x 0 ;i = E S' 0 () I(i2 S) = 0 implies thati is not contained in anyS from the support of' 0 (), and therefore, also not contained in any sample. Similarly, x 0 ;i = 1 implies x ;i = 1. To show that all the persuasiveness constraints hold, we only need to argue that x ;i x ;i for every 2 i satisfying x ;i 3n . This holds with high probability by tail bounds. In particular, x 0 ;i = E S' 0 () I(i2 S) and we take K samples from ' 0 (). By the Chernoff bound, with probability at least 1 exp( K 2 x 0 ;i 27 ) 1 exp( K 3 108n )> 1 1 2njj ; the empirical meanx ;i is at most (1 +=3)x 0 ;i =x ;i . Note that there are at mostnjj choices of such;i. By the union bound, with probability at least 1 (njj + 1)=(2njj) > 0, ' satisfies all the persuasiveness constraints and thus is feasible for LP (5.7), and achieves objective value at leastVal(' 0 ) 3 OPT. So there must exist a feasible-optimal solution' to LP (5.7) such that' isK-uniform for every. This concludes our proof of Lemma 7. 5.3.5 The Sharp Contrast Between Private and Public Persuasion A public signaling scheme can be viewed as a special type of private signaling schemes in which each receiver must receive the same signal, i.e., only a public signal is sent. Overloading the notation of Section 5.3.3, we use to denote the set of public signals and2 to denote a public signal. A public signaling scheme is fully specified byf(;)g ; , where(;) denotes the probability of sending signal in state. Upon receiving a signal, each receiver performs the same Bayesian update and infers a posterior belief over the state of nature, as follows: the realized state is with probability()(;)=Pr(), wherePr() = P 2 (;). This induces a 55 subgame for each signal, one in which all receivers share the same belief regarding the state of nature. Whereas in more general settings than ours, receivers may play a mixed Nash equilibrium in each subgame, our restriction to a setting with no externalities removes this complication. Given a posterior distribution on states of nature (say, one induced by a signal ), our receivers face disjoint single-agent decision problems, each of which admits an optimal pure strategy. We as- sume that receivers break ties in favor of the sender (specifically, in favor of action 1), which results in a unique pure response for each receiver. Therefore, our solution concept here results in a unique action profile for each posterior distribution, and hence for each signal. 
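For concreteness, the unique action profile induced by a posterior belief can be computed as follows. This is a minimal sketch in my own notation, where delta_u[t][i] denotes the utility gain u_i(θ_t, 1) - u_i(θ_t, 0) and ties are broken in favor of action 1, as assumed in the text.

```python
# A minimal sketch (my notation) of the unique action profile induced by a posterior:
# receiver i plays 1 exactly when his posterior-expected utility gain is >= 0
# (ties broken in favor of the sender, i.e., in favor of action 1).
import numpy as np

def induced_action_profile(posterior, delta_u):
    """posterior: length-|Theta| probability vector; delta_u[t][i] = u_i(theta_t,1) - u_i(theta_t,0).
    Returns the set of receivers who take action 1 under this posterior."""
    posterior = np.asarray(posterior)
    delta_u = np.asarray(delta_u)              # |Theta| x n matrix of utility gains
    gains = posterior @ delta_u                # expected gain of action 1 for each receiver
    return {i for i, g in enumerate(gains) if g >= 0}

# Usage: two states, three receivers; under a 50/50 posterior receivers 0 and 2 act.
print(induced_action_profile([0.5, 0.5], [[1.0, -1.0, 2.0], [0.5, -2.0, -1.0]]))
```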
A simple revelation-principle style argument then allows us to conclude that there is an optimal public sig- naling scheme which is direct, meaning that the public signals are action profiles, and persuasive, meaning that in the subgame induced by the signal = ( 1 ;:::; n ) each receiveri’s optimal decision problem (which breaks ties in favor of action 1) solves to action i . Restricting attention to direct and persuasive public signaling schemes, each signal can also be viewed as a subsetS [n] of receivers taking action 1. The sender’s optimization problem can then be written as the following exponentially large linear program. maximize P 2 () P S[n] (;S)f (S) subject to P 2 ()(;S)u i () 0; forS [n] withi2S: P S[n] (;S) = 1; for2 : (;S) 0; for2 ;S [n]: (5.13) The first set of constraints are persuasiveness constraints corresponding to action 1. Like in LP (5.7), the persuasiveness constraints for action 0 are intentionally omitted from this LP. This omission is without loss whenf is non-decreasing for each state: if signalS withi62S is such that receiveri prefers action 1 in the resulting subgame, then we can replace it with the signal S[fig without degrading the sender’s utility. We remark that LP (5.13) and LP (5.7) only differ in their persuasiveness constraints. We now consider the design of optimal public signaling schemes, and show a stark contrast with private signaling, both in terms of their efficacy at optimizing the sender’s utility, and in terms of their computational complexity. We start with an example illustrating how the restriction to public signaling can drastically reduce the sender’s expected utility. The example is notably simple: two states of nature, and a binary sender utility function which is independent of the state of nature. We show a multiplica- tive gap of (n), and an additive gap of 1 1 (n) , between the expected sender utility from the optimal private and public signaling schemes, wheren is the number of receivers. 56 Example 2 (Inefficacy of Public Signaling Schemes). Consider an instance with n identical receivers and two states of nature =fH;Lg. Each receiver has the same utility function, defined as follows: u i (H) = 1 andu i (L) =1, for alli. The state of natureH occurs with probability 1 n+1 , and L occurs with probability n n+1 . The sender’s utility function is f (S) = f(S) = min(jSj; 1). In other words, the sender gets utility 1 precisely when at least one receiver takes action 1. The persuasiveness constraints imply that each receiver can take action 1 with probability no more than 2 n+1 . This is achievable by always recommending action 1 to the receiver in stateH, and recommending action 0 with probability 1 n in stateL. The sender’s expected utility depends on how these recommendations are correlated. The optimal private scheme anti-correlates the receivers’ recommendations in order to guar- antee that at least one receiver takes action 1 always, which achieves an expected sender utility of 1, the maximum possible. Specifically, in stateH the scheme always recommends action 1 to every receiver, and in stateL the scheme chooses one receiver uniformly at random and recom- mends action 1 to that receiver, and action 0 to the other receivers. We argue that no public scheme can achieve sender utility more than 2 n+1 . Indeed, since receivers are identical, our solution concept implies that they choose the same action for every realization of a public signal. 
Therefore, the best that a public scheme can do is to recommend action 1 to all receivers simultaneously with probability 2 n+1 in aggregate, and recommend action 0 with the remaining probability, yielding an expected sender utility of 2 n+1 . This is achievable: in stateH the scheme always recommends action 1 to every receiver, and in stateL the scheme recommends action 1 to all receivers with probability 1 n , and action 0 to all receivers with prob- ability 1 1 n . Our next result illustrates the computational barrier to obtaining the optimal public signaling scheme, even for additive sender utility functions. Our proof is inspired by a reduction in (Cheng et al., 2015) for proving the hardness of computing the best posterior distribution over , a problem termed mixture selection in (Cheng et al., 2015), in a voting setting. That reduction is from the maximum independent set problem. Since a public signaling scheme is a combination of posterior distributions, one for each signal, we require a more involved reduction from a graph- coloring problem to prove our result. Theorem 5.3.5. Consider public signaling in our model, with sender utility function f (S) = f(S) = jSj n . It is NP-hard to approximate the optimal sender utility to within any constant multiplicative factor. Moreover, there is no additive PTAS for evaluating the optimal sender utility, unless P = NP . 57 Proof. We prove the theorem by reducing from the following NP-hard problem. (Khot & Saket, 2012) proves that for any positive integerk, any integerq such thatq 2 k + 1, and an arbitrarily small constant > 0, given an undirected graph G, it is NP-hard to distinguish between the following two cases: Case 1: There is aq-colorable induced subgraph ofG containing a (1) fraction of all vertices, where each color class contains a 1 q fraction of all vertices. Case 2: Every independent set inG contains less than a 1 q k+1 fraction of all vertices. Given a graphG with vertices [n] =f1;:::;ng and edgesE, we will construct a public persua- sion instance so that the desired algorithm for approximating the optimal sender utility can be used to distinguish these two cases. Our construction is similar to that in (Cheng et al., 2015). We let there ben receivers, and let = [n]. In other words, both receivers and states of nature correspond to vertices of the graph. The prior distribution over states of nature is uniform — i.e., the realized state of nature is a uniformly-drawn vertex in the graph. We define the receiver utili- ties as follows: u i () = 1 2 ifi = ;u i () =1 if (i;)2E; andu i () = 1 4n otherwise. We define the sender’s utility function, with range [0; 1], to bef (S) = f(S) = jSj n . The following claim is proven in (Cheng et al., 2015). Claim 3. (Cheng et al., 2015) For any distribution x 2 , the set S = fi 2 [n] : P 2 x u i () 0g is an independent set ofG. Claim 3 implies that upon receiving any public signal with any posterior distributionx over , the players who take action 1 always form an independent set ofG. Therefore, if the graphG is from Case 2, the sender’s expected utility in any public signaling scheme is at most 1 q k+1 . Now supposing thatG is from Case 1, we fix the corresponding coloring of (1)n vertices with colorsk = 1;:::;q, and we use this coloring to construct a public scheme achieving ex- pected sender utility at least (1) 2 q . The scheme usesq+1 signals, and is as follows: if has color k then deterministically send the signalk, and if is uncolored then deterministically send the sig- nal 0. 
Given signalk> 0, the posterior distribution on states of nature is the uniform distribution over the vertices with colork — an independent setS k of size 1 q n. It is easy to verify that re- ceiversi2S k prefer action 1 to action 0, since P 2S k 1 jS k j u i () = 1 jS k j ( 1 2 jS k j1 4n )> 1 4jS k j 0. Therefore, the sender’s utility isf(S k ) = jS k j n = 1 q wheneverk> 0. Since signal 0 has proba- bility, we conclude that the sender’s expected utility is at least (1) 2 q , as needed. Since distinguishing Case 1 and Case 2 is NP-hard for arbitrarily large constantsk andq, we conclude that it is NP-hard to approximate the optimal sender utility to within any constant factor. Moreover, by settingk = 1;q = 3, we conclude that the sender’s utility cannot be approximated additively to within (1) 2 =3 1=3 2 > 1=9, and thus there is no additive PTAS, unless P=NP. 58 Chapter 6 Persuasion in Security Games Chapter 5 studies the algorithmic foundations for basic models of persuasion. In this chapter, we examine how these basic economic models can be applied to real-world security problems, particularly, the motivating domains described in Chapter 4. We will also illustrate how the specific domain features further complicate the problem and how we overcome these challenges by developing new algorithmic techniques. 6.1 Exploiting Informational Advantage to Deter Fare Evasion In this section, we study how to improve a defender’s utility by strategically revealing noisy information about each target’s protection status to the attacker. We develop a two-stage security game model which abstracts the example described in Section 4.1. We then study when the defender can strictly benefit from such strategic signaling and how the defender can play both stages in a globally optimal fashion. Finally, we experimentally show that the two-state security game model allows the defender to achieve better utility than SSE in simulated random games. 6.1.1 A Two-Stage Security Game Model Consider a security game where the defender allocates k security resources, possibly under scheduling constraints, to protect n targets. Players’ strategies and the payoff structure are as described in Section 2.2.1. The game has two stages. The first stage is similar to regular security games, during which the defender commits to a mixed strategy. We now model the second stage — the signaling procedure. This stage can be viewed as a Bayesian persuasion game (Kamenica & Gentzkow, 2011), during which the defender persuades a rational attacker in order to yield a desired outcome. So we call it the persuasion phase. Specifically, for anyt2 [n] covered with probabilityx t , letZ =fZ c ;Z u g be the set of events describing whethert is covered (Z c ) or not 59 (Z u ) and be the set of all possible signals. A signaling scheme, with respect to targett, is a randomized map f c :Z rnd ! : The set of probabilities fp(z;) : z2Z; 2 g completely describes the mapf, in whichp(z;) is the probability that eventz2Z happens and signal2 is sent. Therefore, P p(z;) = P(z),8z2 Z. Upon receiving a signal, the attacker infers a posterior distributionP(Z c j) = p(Zc;) p(Zc;)+p(Zu;) andP(Z u j) = p(Zu;) p(Zc;)+p(Zu;) , and makes a decision among two actions: attack or not attack. For every targett, the defender seeks a signaling scheme w.r.t.t to maximize her expected utility ont. Mathematically, a signal induces a posterior distribution on Z. 
Thus a signaling scheme can be viewed as a partition of the prior distribution (x t ; 1x t ) intojj posteriors so that it maximizes the defender’s utility on t. Like in Bayesian persuasion, we can w.l.o.g. focus on “direct” signaling schemes, as formalized in the following lemma. Lemma 8. (Kamenica & Gentzkow, 2011) There exists an optimal signaling scheme, w.r.t. any targett, that uses at most two signals, each resulting in an attacker best response of attacking and not attacking, respectively. As a result, a signaling scheme w.r.t.t can be characterized by p(Z c ; c ) =p p(Z c ; u ) =x t p; p(Z u ; c ) =q p(Z u ; u ) = 1x t q; in whichp2 [0;x t ];q2 [0; 1x t ] are variables. So the attacker infers the following expected utility: E(utilityj c ) = 1 p+q (pU a c +qU a u ) andE(utilityj u ) = 1 1pq ((xp)U a c + (1x q)U a u ), where, for ease of notation, we drop the “t” in x t and U d=a c=u (t) when it is clear from context. W.l.o.g, let c be a signal recommending the attacker to not attack, i.e., constraining E(utilityj c ) 0, in which case both players get 0. Then the following LP parametrized by coverage probabilityx, denoted aspeLP t (x) (Persuasion Linear Program), computes the optimal signaling scheme w.r.t.t: max (xp)U d c + (1xq)U d u (6.1) s:t: pU a c +qU a u 0 (xp)U a c + (1xq)U a u 0 0px 0q 1x: 60 This yields the attacker utilityP( u )E(utilityj u ) +P( c ) 0 = (xp)U a c + (1xq)U a u and defender utility (xp)U d c + (1xq)U d u , w.r.t.t. We propose the following two-stage Stackelberg security game model: Phase 1 (Scheduling Phase): the defender (randomly) schedules the resources by playing a mixed strategyx2 [0; 1] T , and samples one pure strategy each round. Phase 2 (Persuasion Phase):8t2 [n], the defender commits to an optimal signaling scheme w.r.t. t computed by peLP t (x t ) before the game starts, and then in each round, sends a signal on each targett according to the commitment. During the play, the attacker first observesx by surveillance. Then he chooses a targett 0 to approach or board at some round, where the attacker receives a signal and decides whether to attackt 0 or not. Note that the model makes the following three assumptions. First, the defender is able to commit to a signaling scheme, and crucially will also follow the commitment. She is incentivized to do so because otherwise the attacker will not trust the signaling scheme, and thus may ignore signals. Then the game degenerates to a standard Stackelberg game. Second, the attacker breaks ties in favor of the defender. Similar to the definition of SSE, this is without loss of generality since if there is a tie among different choices, we can always make a tiny shift of the probability mass to make the choice preferred by the defender better than other choices. Third, we assume that the attacker cannot distinguish whether a target is protected or not when he approaches it. With the persuasion phase, both of the defender and the attacker’s payoff structures might be changed. Specifically, the defender’s utility on any targett is the optimal objective value of the linear programpeLP t (x), which is non-linear inx. Can the defender always strictly benefit by adding the persuasion phase? How can we compute the optimal mixed strategy in this new model? We answer these questions in the next two sections. 6.1.2 When Does Signaling Help? 
In this section, fixing a marginal coverage x on a target t, we compare the defender's and attacker's utilities w.r.t. t in the following two different models:

Model 1: the regular security game model, without persuasion (but the attacker can choose to not attack);

Model 2: the two-stage security game model, in which the signaling scheme w.r.t. t is optimal.

The following notation will be used frequently in our comparisons and proofs (the index t is omitted when it is clear):

DefU_1/2(t): defender's expected utility in Model 1/2;
AttU_1/2(t): attacker's expected utility in Model 1/2;
U^def/att(t) := x U^{d/a}_c + (1-x) U^{d/a}_u: expected utility of defense/attack, if the attacker attacks t.

Note that AttU_1 = max(U^att, 0) may not equal U^att, since the attacker chooses to not attack if U^att < 0. Similarly, DefU_1 may not equal U^def.

Defender's Utility

First, we observe that the defender will never be worse off in Model 2 than in Model 1 w.r.t. t.

Proposition 1. For any t ∈ [n], DefU_2 ≥ DefU_1.

Proof. If U^att ≥ 0, then p = q = 0 is a feasible solution to peLP_t(x) in formula (6.1), which achieves a defender utility of x U^d_c + (1-x) U^d_u = DefU_1. So DefU_2 ≥ DefU_1. If U^att < 0, the attacker will choose to not attack in Model 1, so DefU_1 = 0. In this case, p = x, q = 1-x is a feasible solution to peLP_t(x), which achieves a defender utility of 0. So DefU_2 ≥ 0 = DefU_1.

However, the question is whether the defender will always strictly benefit w.r.t. t from the persuasion phase. The following theorem gives a succinct characterization.

Theorem 6.1.1. For any t ∈ [n] with marginal coverage x ∈ [0, 1], DefU_2 > DefU_1 if and only if:

    U^att · (U^d_c U^a_u - U^a_c U^d_u) < 0.    (6.2)

Proof. The inequality condition (6.2) corresponds to the following four cases:

1. U^att > 0, U^d_u ≥ 0, U^d_c U^a_u - U^a_c U^d_u < 0;
2. U^att > 0, U^d_u < 0, U^d_c U^a_u - U^a_c U^d_u < 0;
3. U^att < 0, U^d_u ≥ 0, U^d_c U^a_u - U^a_c U^d_u > 0;
4. U^att < 0, U^d_u < 0, U^d_c U^a_u - U^a_c U^d_u > 0.
In this case, the defender gains extra utilityOpt = z U a u (U a u U d c U a c U d u )> 0 by adding the persuasion phase. When U att < 0, the attacker chooses to not attack, resulting in DefU 1 = 0. To increase the defender’s utility, we have to guarantee Opt < U def . Note that the vertex (x; 1x) yields exactly an objective U def , so we only need to guarantee the optimal solution is the vertex ( U att U a c ; 0). This happens either whenU d u 0 (corresponding to case 3 in which caseU d c U a u U a c U d u > 0 holds naturally) or whenU d u < 0 and the slope ofobj = pU d c +qU d u is greater than the slope 63 of 0 = pU a c +qU a u . That is,U d c =U d u >U a c =U a u . This corresponds to case 4 above. In such cases, the defender gains extra utility U def Opt = 1x U a c (U a u U d c U a c U d u ) > 0 by adding the persuasion phase. When U att = 0, the possible optimal vertices are (0; 0) and (x; 1x), which corresponds to the defender utility 0 and U def , respectively. So DefU 2 = maxf0; U def g at optimality, which equals to DefU 1 assuming the attacker breaks ties in favor of the defender. Interpreting the Condition in Theorem 6.1.1 Inequality (6.2) immediately yields that the defender does not benefit from persuasion in zero- sum security games, sinceU d c U a u U a c U d u = 0 for any target in zero-sum games. Intuitively, this is because there are no posterior distributions, and thus signals, where the defender and attacker can cooperate due to the strictly competitive nature of zero-sum games. One case of the Inequality (6.2) is U att > 0 andU d c U a u U a c U d u < 0. To interpret the latter, let us start from a zero-sum game, which assumesU d u = U a u > 0 andU d c =U a c > 0. Then the conditionU d c U a u U a c U d u =U d c U a u (U a c )(U d u )< 0 could be achieved by makingU d u >U a u orU d c <U a c . That is, the defender values a target more than the attacker (U d u >U a u ), e.g., the damage to a flight causes more utility loss to the defender than the utility gained by the attacker, or the defender values catching the attacker less than the cost to the attacker (U d c <U a c ), e.g., the defender does not gain much benefit by placing a violator in jail but the violator loses a lot. In such games, if the attacker has incentives to attack (i.e., U att > 0), the defender can “persuade” him to not attack. Another case of Condition 2 is U att < 0 andU d c U a u U a c U d u > 0. In contrast to the situation above, this is when the defender values a target less than the attacker (e.g., a fake target or honey pot) but cares more about catching the attacker. Interestingly, the defender benefits when the attacker does not want to attack (i.e., U att < 0), but the defender “entices” him to commit an attack in order to catch him. Attacker’s Utility Now we compare the attacker’s utilities w.r.t.t in Model 1 and Model 2. Recall that Proposition 1 shows the defender will never be worse off. A natural question is: whether the attacker can be strictly better off? The attacker will never be worse off under any signaling scheme. Intuitively, this is because he could just ignore any signals. Mathematically, this holds simply by observing the constraints inpeLP t (x) Formulation 6.1: 1. when U att 0, AttU 1 = U att =xU a c +(1x)U a u and AttU 2 = (xp)U a c +(1xq)U a u , so AttU 1 AttU 2 =pU a c +qU a u 0; 64 2. when U att < 0, AttU 2 = (xp)U a c + (1xq)U a u 0 = AttU 1 . 
Note that the above conclusion holds without requiring the signaling scheme to be optimal, since the derivation only uses feasibility constraints. Interestingly, if the defender does persuade opti- mally, then equality holds. Theorem 6.1.2. Given any targett2 [n] with marginal coveragex2 [0; 1], we have AttU 1 = AttU 2 = max(0; U att ) . Proof. FrompeLP t (x) we know that AttU 2 = U att (pU a c +qU a u ). The proof is divided into three cases. When U att > 0 (left panel in Figure 6.1), we have AttU 1 = U att . As argued in the proof of Theorem 6.1.1, the optimal solution can never be the vertex (x; 0). So the only possible optimal vertices are (0; 0) and (x;x U a c U a u ), both of which satisfypU a c +qU a u = 0. So AttU 2 = U att (pU a c +qU a u ) = U att = DefU 1 . When U att < 0 (right panel in Figure 6.1),we have AttU 1 = 0. The only possible optimal vertices are (x; 1x) or ( U att U a c ; 0), both of which satisfiespU a c +qU a u = U att . So AttU 2 = 0 = AttU 1 . For the case U att = 0, a similar argument holds. To sum up, we always have AttU 1 = AttU 2 . 6.1.3 Computing the Optimal Defender Strategy As we have seen so far, the defender can strictly benefit from persuasion in the two-stage security game model. Here comes the natural question for computer scientists: how can we compute the optimal mixed strategy? We answer the question in this section, starting with an example showing that the defender’s optimal mixed strategy in the two-stage model is different from the SSE in its standard security game model. Example 3. Consider a security game with payoff matrix in Table 6.1. U d c U d u U a c U a u t 1 1 -2 -1 1 t 2 3 -5 -3 5 t 3 1 -4 -2 4 t 4 0 -0.5 -2 1 Table 6.1: Payoff table for the constructed game Assume that there are two resources, and the feasible pure strategies are A 1 = (t 1 ;t 2 ), A 2 = (t 2 ;t 3 ) andA 3 = (t 3 ;t 4 ). Letp = (p 1 ;p 2 ;p 3 ) denote a mixed strategy wherep i is the probability of taking actionA i . After simple calculations, one can compute the Strong Stackelberg Equilibrium (SSE) asp = ( 3 8 ; 7 32 ; 13 32 ) with coverage probability vectorx = ( 3 8 ; 19 32 ; 5 8 ; 13 32 ). The 65 attacker’s utility is ( 1 4 ; 1 4 ; 1 4 ; 7 32 ) and the defender’s utility is ( 7 8 ; 1 4 ; 7 8 ; 19 64 ), so the attacker will attackt 2 . Now, if we add the persuasion phase as in Model 2, the optimal mixed strategy is p = ( 3 8 ; 3 8 ; 1 4 ) with coverage probability vectorx = ( 3 8 ; 3 4 ; 5 8 ; 1 4 ). The attacker’s utility is ( 1 4 ;1; 1 4 ; 1 4 ) and defender’s utility is ( 1 2 ; 1; 1 4 ; 1 8 ), so the attacker will attackt 4 , in favor of the defender’s preference. So the defender’s utility changes from 1 4 in Model 1 to 1 8 in Model 2. Therefore, we define the following solution concept. Definition 3. The optimal defender mixed strategy and signaling scheme in the two-stage Stack- elberg security game, together with the attacker’s best response, form an equilibrium called the Strong Stackelberg Equilibrium with Persuasion (peSSE). Proposition 1 yields that, by adding the persuasion phase, the defender’s utility will not be worse off under any mixed strategy, and specifically under the SSE mixed strategy. This yields the following performance guarantee of peSSE. Proposition 2. Given any security game, the defender’s utility in peSSE is at least the defender’s utility in SSE. Now we consider the computation of peSSE. Note that the optimal signaling scheme can be computed by LP 6.1 for any targett with given coverage probabilityx t . 
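To make this per-target step concrete, the following is a minimal sketch (our own illustration, not the thesis's implementation) of the persuasion LP described above, solved with scipy.optimize.linprog. The function name persuasion_lp, the variable names, and the example payoffs are ours. Here p and q are the joint probabilities that the target is covered, respectively uncovered, and the "do not attack" signal is sent; the remaining probability mass carries the "attack" signal.

from scipy.optimize import linprog

def persuasion_lp(x, Udc, Udu, Uac, Uau):
    """Optimal two-signal scheme for one target with marginal coverage x.
    Udc/Udu: defender utility if the attacked target is covered/uncovered;
    Uac/Uau: attacker utility if the attacked target is covered/uncovered.
    Returns (p, q, defender utility under the optimal scheme)."""
    U_att = x * Uac + (1 - x) * Uau   # attacker's utility from attacking regardless of signal
    # Maximizing (x-p)*Udc + (1-x-q)*Udu is the same as minimizing p*Udc + q*Udu.
    c = [Udc, Udu]
    # (i)  on the "do not attack" signal, not attacking is a best response:
    #          p*Uac + q*Uau <= 0
    # (ii) on the "attack" signal, attacking is a best response:
    #          (x-p)*Uac + (1-x-q)*Uau >= 0,  i.e.  p*Uac + q*Uau <= U_att
    res = linprog(c, A_ub=[[Uac, Uau], [Uac, Uau]], b_ub=[0.0, U_att],
                  bounds=[(0.0, x), (0.0, 1.0 - x)])
    p, q = res.x
    return p, q, (x - p) * Udc + (1 - x - q) * Udu

# Example with U_att > 0 and Udc*Uau - Uac*Udu = 1 - 4 < 0, so by Theorem 6.1.1
# persuasion strictly helps: without signaling the defender gets -2.0, while the
# optimal two-signal scheme yields -0.8.
print(persuasion_lp(x=0.4, Udc=1.0, Udu=-4.0, Uac=-1.0, Uau=1.0))

Solving this small LP target by target is the building block reused in the rest of this section.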
The main challenge is how to compute the optimal mixed strategy in Phase 1. Assume that the defender’s (leader) mixed strategy, represented as a marginal coverage vector over target set [n], lies in a polytopeP d . 1 With a bit of abuse of notation, let us usepeLP t (x t ) to denote also the optimal objective value of the persuasion LP, as a function ofx t . Let U att (t;x) =xU a c (t) + (1x)U a u (t) be the attacker’s expected utility, if he attacks, as a linear function ofx. Recall that, given a mixed strategyx2 [0; 1] T , the defender’s utility w.r.t.t ispeLP t (x t ) and the attacker’s utility w.r.t. t is max(U att (t;x t ); 0) (Theorem 6.1.2). Similar to the framework in 1 Note that a polytope can always be represented by linear constraints (though possibly exponentially many). For example, a simple case is the games in which pure strategies are arbitrary subsetsA [n] with cardinalityjAjk, P d can be represented by 2T + 1 linear inequalities: P i xi k and 0 x 1. However,P d can be compli- cated in security games, such that it is NP-hard to optimize a linear objective overP d (Xu, 2016). Finding succinct representations ofP d plays a key role in the computation of SSE. 66 (Conitzer & Sandholm, 2006), we define the following optimization problem for every targett, denoted asOPT t : max peLP t (x t ) (6.3) s:t: max(U att (t;x t ); 0) max(U att (t 0 ;x t 0); 0)8t 0 x2P d ; which computes a defender mixed strategy maximizing the defender’s utility ont, subject to: 1. the mixed strategy is achievable; 2. attackingt is the attacker’s best response. Notice that some of these optimization problems may be infeasible. Nevertheless, at least one of them is feasible. The peSSE is obtained by solving theseT optimization problems and picking the best solution among thoseOPT t ’s. To solve optimization problem (6.3), we have to deal with non-linear constraints and the specific objectivepeLP t (x t ), which is the optimal objective value of another LP. We first simplify the constraints to make them linear. In particular, the following constraints max(U att (t;x t ); 0) max(U att (t 0 ;x t 0); 0);8t 0 2 [n] can be split into two cases, corresponding to U att (t;x t ) 0 and U att (t;x t ) 0 respectively, as follows, CASE 1 CASE 2 U att (t;x t ) 0 U att (t 0 ;x t 0) 0;8t 0 U att (t;x t ) U att (t 0 ;x t 0);8t 0 Now, the only problem is to deal with the objective function in Formulation (6.3). Here comes the crux. Lemma 9. For anyt2 [n],peLP t (x) is increasing inx for anyx2 (0; 1). Proof. For ease of notation, letf(x) =peLP t (x). We show that for any sufficiently small> 0 (so thatx +< 1),f(x +)f(x). Fixingx, if the optimal solution forpeLP t (x), sayp ;q , satisfiesq = 0, then we observe thatp ;q is also feasible forpeLP t (x+). As a result, plugging p ;q inpeLP t (x +), we havef(x +) (xp )U d c + (1xq )U d u +(U d c U d c )f(x) since (U d c U d c ) 0. On the other hand, if q > 0, then for any small > 0 (specifically, <q ),p +;q is feasible forpeLP t (x +). Here the only need is to check the feasibility constraint (p +)U a c + (q )U a u = p U a c +q U a u +(U a c U a u ) 0, which holds since 67 (U a c U a u ) 0. This feasible solution achieves an objective value equalingf(x). Therefore, we must havef(x +)f(x). The intuition behind Lemma 9 is straightforward — the defender should always get more utility by protecting a target more. However, this actually does not hold in standard security games. Simply consider a target withU d c = 2,U d u =1 andU a c =1,U a u = 1. 
If the target is covered with probability 0:4, then in expectation both the attacker and defender get 0:2; however, if the target is covered with probability 0:6, the attacker will not attack and both of them get 0. Therefore, the monotonicity in Lemma 9 is really due to the signaling scheme. Back to the optimization problem (6.3), here comes our last key observation: the monotonic- ity property in Lemma 9 reduces the problem to an LP. Specifically, the following lemma is a simple consequence of the monotonicity. Lemma 10. Maximizing the increasing functionpeLP t (x t ) over any feasible regionD reduces to directly maximizingx t overD and then plugging in the optimalx t topeLP t (x t ). To this end, we summarize the main results in this section. The following theorem essentially shows that computing peSSE efficiently reduces to computing SSE (see (Conitzer & Sandholm, 2006) for a standard way to compute SSE by multiple LPs). In other words, adding the persuasion phase does not increase the computational complexity. Theorem 6.1.3. For any security game, the Strong Stackelberg Equilibrium with Persuasion (peSSE), defined in Definition 3, can be computed by multiple LPs. Proof. According to Lemma 9 and 10, Algorithm 4, based on multiple LPs, computes the peSSE. 6.1.4 Experiments In this section, we compare SSE and peSSE on randomly generated security games. Our simula- tions aim to compare the two concepts, SSE and peSSE, in games with various payoff structures. To generate payoffs, we follow most security game papers and use the covariance random payoff generator (Nudelman, Wortman, Shoham, & Kevin, 2004), but with a slight modifica- tion. Let [a;b] denote a uniform distribution on interval [a;b]. Then we randomly generate the following random payoffs: U d c [0;r], U d u [10; 0], U a c = aU d c 10 r +b[10; 0] (setU d c 10 r = 0 ifr = 0) andU a u = aU d u +b[0; 10], wherea = cov;b = p 1a 2 . Here cov2 [1; 0] is the covariance parameter between the defender’s reward (or penalty) and the at- tacker’s penalty (or reward). Socov = 0 means completely uncorrelated payoffs whilecov =1 andr = 10 means a zero-sum game. By settingU d c 2 [0;r] whileU a c 2 [0; 10], we intentionally 68 Algorithm 4: Computing peSSE 1: For every targett2 [n], compute the optimal objectives for the following two LPs: max x t (6.4) s:t: U att (t;x t ) 0 U att (t;x t ) U att (t 0 ;x t 0);8t 0 2 [n] x2P d and max x t (6.5) s:t: U att (t 0 ;x t 0) 0;8t 0 2 [n] x2P d : Letx t;1 ,x t;2 be the optimal objective value for LP (6.4), LP (6.5) respectively.x t;i =null if the corresponding LP is infeasible. 2: Choose the non-nullx t;i , denoted asx , that maximizespeLP t (x t;i ) overt2 [n] and i = 1; 2. The optimal mixed strategy that achievesx in one of the above LPs is the peSSE mixed strategy. capture the defender’s “overall” value of catching the attacker by the parameterr. Standard co- variance payoff (Nudelman et al., 2004) fixesr = 10, but Theorem 1 suggests thatr may affect the utility difference between SSE and peSSE. In all the simulations, every game has 8 targets and 3 resources, and the attacker has the option to not attack. We simulate two different kinds of pure strategies, which results in two types of games: 1. Uniform Strategy Game (UniG): in such games, a pure strategy is any subset of targets with cardinality at most 3. 2. Random Strategy Game (RanG): for each game we randomly generate 6 pure strategies, each of which is a subset of targets with cardinality at most 3. 
Each target is guaranteed to be covered by at least one pure strategy. We set r = 0, 1, ..., 10 and cov = 0, −0.1, −0.2, ..., −1. For each parameter instance, i.e., each pair (r, cov), 100 random security games are simulated. As a result, in total 2 × 100 × 11² = 24,200 (2 types of games, 11² parameter combinations and 100 games per case) random security games are tested in our experiments. We find that the UniG and RanG games have similar experimental performance, except that RanG games have a lower utility at a given parameter instance. This is reasonable since UniG games are relaxations of the RanG games in terms of the set of pure strategies. So we only show results for UniG to avoid repetition.

Figure 6.2: Comparison between SSE and peSSE: fixed parameter r = 3 (upper) and fixed parameter cov = −0.5 (lower). The trend is similar for different r or cov, except the utility scales are different.

Figure 6.2 gives a comprehensive comparison of the difference between SSE and peSSE. All these performances are averaged over 100 games. The figures suggest the following empirical conclusions, as expected (the trends reflected in the figures are basically similar for different r or cov, except that the utility scales differ).

In the left two panels, the line SSE ≠ peSSE shows the number of games, out of 100 simulations, that have different SSE and peSSE mixed strategies. This number seems not very sensitive to the parameter cov (note that games with cov = −1 are not zero-sum when r = 3), but it increases as r decreases. That is, when the defender cares less about catching the attacker, persuading the attacker to not attack benefits the defender more.

The line U_peSSE > U_SSE in the left two panels shows how many games have strictly greater peSSE utility than SSE utility. This number increases as cov or r decreases. That is, if the defender cares less about catching the attacker, or the game becomes more competitive (i.e., cov decreases), then the defender benefits more from strategic signaling. The Udif lines in the right two panels show the same trend.

The right two panels show that persuasion usually helps more when the defender's SSE utility is lower. Specifically, peSSE can increase the SSE utility by about half when r is small, with cov fixed at −0.5 (lower-right panel).

6.2 Exploiting Informational Advantages to Combat Poaching

In this section, we study how to improve a defender's utility via strategic signaling in a different setting, motivated by the emerging application of mobile sensors for patrolling (the example in Section 4.2). This setting differs from that of the previous section in two key aspects. First, here we assume that the attacker (e.g., a poacher) can observe whether a patroller is at a target or not before he attacks the target (e.g., whether a ranger is patrolling the area or not); in the previous section, by contrast, the attacker cannot observe whether a target is protected before he attacks.
Second, here the defender has a limited number of signaling devices (e.g., mobile sensors) and we have to optimally place these signaling devices at targets; while previously, the defender can have every target signal noisy information about its own protection status. These differences necessitate a different game model with strategic signaling. In particular, we propose the Sensor-Empowered security Game (SEG) model in this section. SEG captures the joint allocation of human patrollers and mobile sensors, and abstracts the example described in Section 4.2. Sensors differ from patrollers in that they cannot directly interdict attacks, but they can notify nearby patrollers about the attack (if any) and strategically signal to the attacker in order to deter attacks. On the technical side, we first illustrate the challenges in solving the new SEG model by proving its NP-hardness even for zero-sum cases. We then develop a scalable algorithmSEGer based on the branch-and-price framework with two key novelties: (1) a novel MILP formulation for the slave; (2) an efficient relaxation of the problem for pruning. To further accelerate SEGer, we design a faster combinatorial algorithm for the slave problem, which is provably a constant-approximation to the slave problem in zero-sum cases and serves as a useful heuristic for general-sum SEGs. We experimentally demonstrate the benefit of utilizing mobile sensors via simulations. 6.2.1 The Model Basic Setup. Consider a security game played between a defender (she) and an attacker (he). The defender possessesk human patrollers andm mobile sensors. She aims to protectn targets, 71 whose underlying geographic structure is captured by an undirected graphG. We use [n] to denote the set of all targets, i.e., all vertices. The attacker seeks to attack one target. LetU d=a += (i) denote the defender/attacker (d=a) payoff when the defender successfully protects/fails to protect (+=) the attacked targeti. 2 AssumeU d + (i) 0>U d (i) andU a + (i) 0<U a (i) for everyi. Sensors cannot directly interdict an attack; however, they can inform patrollers to come when detecting the attacker at a target. Let integer > 0 be the intervention distance such that a sensor-informed patroller within distance to the attacked target can successfully come to intervene in the attack. If there is no patroller within distance to the attacked target, the target is not protected despite being covered by a sensor. So a target covered by some resource (i.e., sensors) is not necessarily protected, which is a key difference between SEGs and classical security games. We assume that sensors are visible. Therefore, the attacker knows whether a target is covered by a sensor or not, upon visiting the target. Defender’s Action Space of Resource Allocation. We assume that any patroller or sensor can be assigned to cover any target onG without scheduling restrictions. Therefore, a defender pure strategy covers an arbitrary subset ofk vertices with patrollers and another subset ofm vertices with sensors. For convenience, we call both patrollers and sensors resources. W.l.o.g., we assume that the defender never places more than one resource at any target (otherwise, reallocating one resource to any uncovered target would only do better). 
Targets in SEGs have 4 possible states: (1) covered by a patroller (state + ); (2) uncovered by any resource (state ); (3) covered by a sensor and at least one patroller is within distance (state s+ ); (4) covered by a sensor but no patroller is within distance (state s ). Note that only state + ; s+ mean successful defense. Let =f + ; ; s+ ; s g denote the set of all states. Any resource allocation uniquely determines the state for each target and vice versa. Therefore we can equivalently use a state vectore2 n to denote a defender pure strategy. Lete i 2 denote the state of targeti2 [n] andE n denote the set of defender pure strategies. A defender mixed strategy is a distribution over the exponentially large setE. Mobile Sensor Signaling. SEGs naturally integrate the sensor functionality of strategic signal- ing, which can be easily implemented for many types of mobile sensors (e.g., UA Vs). Let denote the set of possible signals that a sensor could send (e.g, noise, warning lights, etc.). Let s =f s+ ; s g denote the set of possible states when a sensor covers the target. A signaling scheme, w.r.t. targeti, is a randomized map i : s r n d ! ; 2 The utility notationU d=a += (i) is different from the standard notationU d=a c=u (i) of classic security games as described in Section 2.2.1. This is to avoid confusion because in SEGs, successfully protecting a target is not the same as covering the target with a security guard. For example, if a target is covered by a UA V and meanwhile a security guard is nearby, the target is also successfully protected. 72 which is characterized by variablesf i (e i ; i )g e i 2s; i 2 . Here i (e i ; i ) is the joint probability that targeti is in statee i 2 s and signal i 2 is sent. So P i 2 i (e i ; i ) must equalP(e i ), the marginal probability that targeti is in statee i . A sensor at targeti first determines its state e i 2 s and then sends a signal i with probability i (e i ; i )=P(e i ). We assume that the defender commits to a signaling scheme and the rational attacker is aware of the commitment. Upon observing signal i , the attacker updates his belief on the target state: P( s+ j i ) = i ( s+ ; i ) i ( s+ ; i )+ i ( s ; i ) andP( s j i ) = 1P( s+ j i ), and derives expected utility AttU( i ) =U a + (i)P( s+ j i ) +U a (i)P( s j i ): The attacker will attack target i if AttU( i ) > 0. When AttU( i ) < 0, the rational attacker chooses to not attack, in which case both players get utility 0. We assume that the attacker breaks tie in favor of the defender when AttU( i ) = 0. This is without loss of generality because the defender can always slightly tune the probabilities to favor her preferred attacker action. As illustrated in Lemma 8 of the previous section, there always exists an optimal signaling scheme (w.r.t. a target) that uses at most two signals, each resulting in an attacker best response of attacking and not attacking, respectively. In our previous example of Section 6.2.3.1, an alert signal results in not attacking while a quiet signal result in attacking. Attacker’s Action Space. We assume that the defender commits to a mixed strategy (i.e., ran- domized resource allocation) and signaling schemes. The attacker is aware of the defender’s commitment, and will rationally respond. In particular, the attacker first chooses a target to visit. If he observes a sensor at the target, the attacker then makes a second decision and determines to attack or not, based on the signal from the sensor. 
If the attacker chooses to not attack, both players get utility 0. The attacker will choose actions that maximize his utility. 6.2.2 Additional Challenges and Computational Hardness We are interested in solving SEGs, by which we mean computing the globally optimal defender commitment consisting of the mixed strategy and signaling schemes. Without sensors in the game (i.e.,m = 0), the problem can be easily solved by anO(n 2 ) algorithm called ORIGAMI (Kiek- intveld, Jain, Tsai, Pita, Ord´ o˜ nez, & Tambe, 2009). In this section, we illustrate the additional challenges due to the consideration of sensors by proving the NP-hardness of solving SEGs even in zero-sum cases. Then we formulate the problem using the multiple-LP approach (Conitzer & Sandholm, 2006). Theorem 6.2.1. Computing the optimal defender commitment is NP-hard even in zero-sum SEGs. Proof. We reduce from the dominating set problem. A dominating set for a graphG is a subsetD of vertices such that every vertex is either inD or adjacent to a vertex inD. The dominating set 73 problem is to compute the size of a smallest dominating set forG. This problem is NP-hard even whenG is a planar graph with maximum degree 3 (Garey & Johnson, 1979). We now reduce an arbitrary dominating set instance to our problem. Given any graphG withn vertices, consider a zero-sum SEG instance withk patrollers and m = nk sensors. Let = 1 andU d + (i) = U a + (i) = 0;U d (i) =1 =U a (i) for every i. That is the defender receives utility 0 for successfully protecting a target and utility1 for failing to protect a target. We now prove thatG has a dominating set of sizek if and only if the optimal defender utility is 0 in the constructed SEG. As a result, by solving SEGs, we can solve the dominating set problem by enumerating different k’s, yielding the NP-hardness of solving SEGs. ): IfG has a dominating setD of sizek, the defender can cover thek vertices inD with patrollers and cover all the remaining vertices with sensors. By definition, any vertex not inD, covered by a sensor, will be adjacent to a vertex inD and therefore is successfully protected. As a result, all vertices are successfully protected and the defender receives utility 0. (: If the defender achieves utility 0, this must imply that each target is always successfully protected, i.e., either in state + or s+ . Otherwise, since attack failure has cost 0 to the attacker (U a + (i) = 0), the attacker will attack any target that is protected with probabilityp < 1, which would have resulted in a negative defender utility — a contraction. This implies that any pure strategy must aways protect every target, which means the vertices protected by thek patrollers must form a dominating set. A Formulation with Exponential-Size LPs The main challenge of solving a SEG is its nature as a bi-level optimization problem since signal- ing schemes are built on top of the mixed strategy. We show that the problem can be formulated as multiple (exponential-size) LPs. We first formulate the signaling process w.r.t. targeti. For convenience, lety i =P(e i = s+ ) andz i =P(e i = s ) denote the marginal probabilities of states s+ ; s , respectively. Thanks to Lemma 8, we can w.l.o.g. restrict to signaling schemes with two signals 1 ; 0 that result in the attacker best response of attacking and not attacking, respectively. Define variables + i = i ( s+ ; 1 )2 [0;y i ] and i = i ( s ; 1 )2 [0;z i ]. 
To guarantee that 1 ; 0 result in the desired attacker best responses, we need two constraints: U a 1 ( + i ; i ) = + i U a + (i) + i U a (i) 0 andU a 0 ( + i ; i ;y i ;z i ) = (y i + i )U a + (i) + (z i i )U a (i) 0: Under these constraints, the defender’s expected utility from 1 isU d 1 ( + i ; i ) = + i U d + (i) + i U d (i). Recall that the defender utility from 0 is 0. Crucially,U a 1 ;U d 1 ;U a 0 are all linear functions of + i ; i ;y i ;z i . With these representations of defender and attacker utilities from different signals, we are ready to present LPs to compute the optimal defender mixed strategy. For any fixed targett we 74 exhibit an LP that computes the optimal defender strategy, subject to visiting targett being the attacker’s best response. Details are given in the following linear program with variablesfp e g e2E andx i ;y i ;z i ;w i ; + i ; i for alli2 [n]. max x t U d + (t) +w t U d (t) +U d 1 ( + t ; t ) s.t. x t U a + (t) +w t U a (t) +U a 1 ( + t ; t ) x i U a + (i) +w i U a (i) +U a 1 ( + i ; i ) 8i6=t P e2E:e i = + p e =x i 8i2 [n] P e2E:e i = s+ p e =y i 8i2 [n] P e2E:e i = s p e =z i 8i2 [n] x i +y i +z i +w i = 1 8i2 [n] P e2E p e = 1 p e 0 8e2E U a 1 ( + i ; i ) 0 8i2 [n] U a 0 ( + i ; i ;y i ;z i ) 0 8i2 [n] 0 + i y i ; 0 i z i 8i2 [n] (6.6) In LP (6.6), variable p e is the probability of pure strategy e and x i ;y i ;z i ;w i are the marginal probabilities of different states. Program (6.6) is an LP sinceU d 1 ;U a 1 ;U a 0 are all linear func- tions. The last three sets of constraints guarantee thatf + i ; i g is a feasible signaling scheme at each targeti. The first set of constraints enforce that visiting targett is an attacker best response. The remaining constraints define various marginal probabilities. It is easy to see that LP (6.6) computes the optimal defender commitment, subject to visiting target t being an attacker best response. The optimal commitment can be computed by solving LP (6.6) for each t and picking the solution with maximum objective. A scalable algorithm for solving SEGs is given next. 6.2.3 A Branch-and-Price Approach The challenge of solving SEGs are two-fold. First, LP (6.6) has exponentially many variables. Second, we have to solve LP (6.6) for each t2 [n], which is very costly. In this section, we propose SEGer (SEGs engine with LP relaxations) — a branch and price based algorithm — to solve SEGs. We omit the standard description of branch and price (see, e.g., (Barnhart, John- son, Nemhauser, Savelsbergh, & Vance, 1998)) but highlight how SEGer instantiates the two key ingredients of this framework: (a) the column generation technique for solving LP (6.6) by developing scalable algorithms for the slave problem; (a) an efficient relaxation of LP (6.6) for branch-and-bound pruning. We will describe the column generation step first. 75 6.2.3.1 Column Generation & Scalable Algorithms for the Slave Our goal is to efficiently solve the exponential-size LP (6.6). The idea of column generation is to start by solving a restricted version of LP (6.6), where only a small subsetE 0 E of pure strategies are considered. We then search for a pure strategye2EnE 0 such that addinge toE 0 improves the optimal objective value. This procedure iterates until no pure strategies inEnE 0 can improve the objective, which means an optimal solution is found. The restricted LP (6.6) is called the master, while the problem of searching for a pure strategy e2EnE 0 is referred to as the slave problem. 
The slave is derived from the dual program of LP (6.6), particularly, from the dual constraints corresponding to primal variablep e s. We omit its textbook derivation here (see, e.g., (Tambe, 2011) for details), and only directly describe the slave problem in our setting as follows. Slave Problem: Given different weights i ; i ; i 2 R for each i, solve the following weight maximization problem: maximize e2E X i:e i = + i + X i:e i = s+ i + X i:e i = s i : (6.7) We mention that i ; i ; i in the slave are the optimal dual variables for the constraints that define x i ;y i ;z i respectively in LP (6.6). The slave is an interesting resource allocation prob- lem with multiple resource types (i.e., patrollers and sensors) which affect each other. Using a reduction from the dominating set problem, it is not difficult to prove the following. Lemma 11. The slave problem is NP-hard. Proof. The proof is similar to the proof of Theorem 6.2.1. By letting i = i = 1, i = 0, = 1 andm = nk, it is easy to show that the graph has an independent set of sizek if and only if the slave problem has optimal objective valuen. An MILP Formulation for the Slave Next we propose a mixed integer linear program (MILP) formulation for the slave problem. Our idea is to use three binary vectors v 1 ;v 2 ;v 3 2f0; 1g n to encode for each target whether it is in state + ; s+ ; s respectively. For example, target i is in state s+ if and only if v 2 i = 1. The main challenge then is to properly set up linear (in)equalities over these vectors to precisely capture their constraints and relations. The capacity for each resource type results in two natural constraints: P i2[n] v 1 i k and P i2[n] (v 2 i +v 3 i ) m. Moreover, since at most one resource is assigned to any target, we have 76 v 1 i +v 2 i +v 3 i 1 for eachi2 [n]. Finally, we use the set of constraintsA v 1 v 2 to specify which vertices could possibly have state s+ (i.e., have a patroller within distance). To see that this is the correct constraint, we claim that no vertex inv 1 is within distance toi if and only if A i v 1 = 0 where A i is the i’th row of A . This is easy to verify for = 1 and follows by induction for general. It turns out that these constraints are sufficient to encode the slave problem. Details are presented in MILP (6.8), whose correctness is summarized in Proposition 3. Here, = ( 1 ;:::; n ) > (; defined similarly) andhv 1 i is the inner product betweenv 1 and. The matrixA2f0; 1g nn is the adjacency matrix ofG (but with ones on its diagonal), andA is the’th power ofA. maximize hv 1 i +hv 2 i +hv 3 i subject to P i2[n] v 1 i k P i2[n] (v 2 i +v 3 i )m v 1 i +v 2 i +v 3 i 1; fori2 [n]: A v 1 v 2 v 1 ;v 2 ;v 3 2f0; 1g n (6.8) Proposition 3. Letfb e 1 ;b e 2 ;b e 3 g be an optimal solution to MILP (6.8). Then assigningk patrollers to vertices inb e 1 andm sensors to vertices inb e 2 +b e 3 correctly solves Slave (6.7). Here, for a vectorv2f0; 1g n , we say “i is inv” iffv i = 1. Proof. We prove that feasible solutions to MILP (6.8) precisely encode all pure strategies inE, under the mapping that vertices inb e 1 have state s + , vertices inb e 2 have state s s+ and vertices inb e 3 have states s . As a result, the objective of MILP (6.8) equals the objective of the slave, yielding the desired conclusion. First, any pure strategy inE must satisfy all constraints of MILP (6.8). To see this, we only need to argue the necessity of satisfying constraintA v 1 v 2 . LetA i denote thei’th row of A. 
The non-zero entries inA i specify all vertices within distance 1 fromi. A standard inductive argument shows that the non-zero entries in thei’th row ofA , denoted byA i , are precisely all the vertices within distance toi. Letv 1 denote the subset of vertices covered by patrollers. Then A i v 1 > 0 if and only if there is a vertex inv 1 (i.e., covered by a patroller) that is within distance toi. Only such a vertexi can havee 2 i = 1, and this is precisely captured byA i v 1 e 2 i for alli (i.e.,A v 1 v 2 ). Conversely, a similar argument shows that any feasible solution to MILP (6.8) corresponds to a pure strategy inE by assigningk patrollers to vertices inb e 1 andm sensors to vertices inb e 2 +b e 3 , concluding the proof of the proposition. 77 A 1 2 (1 1 e )-Approximation Algorithm for the Slave Next, we design a polynomial-time algorithm to approximately solve the slave problem, which can be used to accelerate SEGer. Our algorithm is provably a 1 2 (1 1 e )-approximation to the slave problem in zero-sum cases. The approximation guarantee relies on a special property of the slave for zero-sum SEGs, stated as follows, which unfortunately is not true in general. However, the algorithm can still be used as a good heuristic for solving general-sum SEGs. All the proofs in this part are deferred to Appendix B. Lemma 12. In zero-sum SEGs, the i ; i ; i in Slave (6.7) are guaranteed to satisfy: i i i 0 for anyi2 [n]. Our algorithm originates from the following idea. The slave problem can be viewed as a two- step resource allocation problem. In the first step, a vertex subsetT of size at mostk is chosen for allocating patrollers; in the second step, a subsetI [n]nT of size at mostm is chosen for allocating sensors. Our key observation is that givenT , the second step of choosingI is easy. Let T N =fiji62T butA i;j > 0 for somej2Tg denote the set of all vertices that are not in T but within distance to some vertices in T (in- terpreted as neighbors of T ). With some abuse of notations, let T c = [n]n (T[T N ) denote the set of remaining vertices. Notice thatT;T N ;T c are mutually disjoint. The following lemma illustrates how to pick the optimal setI, givenT . Lemma 13. GivenT , the second step of the slave (i.e., picking setI) simply picks them vertices corresponding to the largestm weights inf i ji2T N g[f i ji2T c g. Lemma 13 is true because when T is given, the weight of covering target i by a sensor is determined — either i if i 2 T N or i if i 2 T c . Thus the main difficulty of solving the slave problem lies at the first step, i.e., to find the allocation for patrollers. For convenience, let operator m max (W ) denote the sum of the largestm weights in weight setW . Utilizing Lemma 13, the objective value of the slave, parameterized by setT , can be viewed as a set function ofT : f(T ) = P i2T i + m max f i ji2T N g[f i ji2T c g : As a result, the slave problem can be re-formulated as a set function maximization problem: Slave Reformulation: max T[n]:jTjk f(T ): 78 The NP-hardness of the slave implies that there is unlikely to be a polynomial-time algo- rithm that maximizesf(T ) exactly. One natural question is whetherf(T ) is submodular, since submodular maximization admits good approximation guarantees (Calinescu et al., 2011). Unfor- tunately, the answer turns out to be “No” (see Appendix B.2 for a counter example). Nevertheless, we show that maximizingf(T ) admits a constant approximation under certain conditions. Theorem 6.2.2. 
When i i i 0;8i2 [n], there is a poly-time 1 2 (1 1 e )-approximate algorithm for the slave. A formal proof of Theorem 6.2.2 can be found in Appendix B; we only provide a proof sketch here. The key insight is that althoughf(T ) is not submodular, a variant of f(T ), defined below, can be proved to be submodular. Define g(T ) = P i2T i + m max f i ji2T N [Tg[f i ji2T c g : The only difference betweenf(T ) andg(T ) is that the weight set in the definition off(T ) [resp., g(T )] contains i s for anyi2 T N [resp.,i2 T N [T ]. Notice thatg(T ) can be evaluated in polynomial time for anyT [n]. Our algorithm, named TailoredGreedy (details in Algorithm 5), runs the greedy algo- rithm for maximizing g(T ) and then uses the output to construct a solution for the slave, i.e., for maximizingf(T ). The theorem can then be proved in two steps. First, we prove thatg(T ) is monotone submodular. This requires a somewhat intricate proof with careful analysis of the function. Then we show thatTailoredGreedy yields a 1 2 (1 1 e )-approximation for the slave problem. The key step for proving this result is to establish the following relation between the functionsf(T ) andg(T ):f(T )g(T ) 2f(T ). Algorithm 5:TailoredGreedy Input: weights i ; i ; i 2R for anyi2 [n] Output: a pure strategy inE 1: Initialization:T =;. 2: fort = 1 tok do 3: Computei = arg max i2[n]nT [g(T[fig)g(T )]. 4: Addi toT 5: return the pure strategy that covers the vertices inT with patrollers and covers them vertices corresponding to the largestm weights inf i ji2T N g[f i ji2T c g with sensors. 6.2.3.2 LP Relaxation for Branch-and-Bound Pruning Our goal of using branch-and-bound is to avoid solving LP (6.6) one by one for eacht, which is too costly. The idea is to come up with an efficiently computable upper bound of LP (6.6) for 79 eacht, so that once the best objective value among the solved LP (6.6)’s is larger than the upper bound of all the (yet) unsolved ones, we can safely claim that the current best solution is optimal without solving the remaining LPs. In this section, by properly relaxing LP (6.6) we obtain such an upper bound, which leads to significant speed-up in experiments. The standard approach for finding relaxations in security games is to ignore scheduling con- straints. Unfortunately, this does not work in our case since our security resources do not have scheduling constraints. The difficulty of our problem lies in characterizing marginal probabili- ties of different states in . Our idea is to utilize the constraints in MILP (6.8). Observe that v 1 ;v 2 ;v 3 in MILP (6.8) can be viewed as marginal vectors of a pure strategy for the states + ; s+ ; s respectively. Recall thatx;y;z in LP (6.6) are the marginal vectors of a mixed strat- egyp for state + ; s+ ; s respectively. Therefore, thex;y;z of any pure strategy must satisfy the constraints in MILP (6.8) by settingv 1 = x, v 2 = y, v 3 = z. By linearity, thex;y;z of any mixed strategy must also satisfy these constraints. This results in a relaxation of LP (6.6) by substituting the constraints in LP (6.6) that definex i ;y i ;z i with the constraints of MILP (6.8). Proposition 4. The following is a valid relaxation of LP (6.6). Moreover, this relaxation results in a linear program with polynomially many variables and constraints. 
P e2E:e i = + p e =x i ;8i P e2E:e i = s+ p e =y i ;8i P e2E:e i = s p e =z i ;8i P e2E p e = 1 p e 0; 8e2E =) P i2[n] x i k P i2[n] (y i +z i )m x i +y i +z i 1;8i A xy x;y;z2 [0; 1] n Relaxation: substitute left part in LP (6.6) with right part 6.2.4 Experiments In this section, we experimentally test our model and algorithms. All LPs and MILPs are solved by CPLEX (version 12.7.1) on a machine with an Intel core i5-7200U CPU and 11.6 GB mem- ory. All the game payoffs are generated via the covariant game model (Nudelman et al., 2004), which are widely adopted to test algorithms in security games. Let [a;b] denote the uniform distribution over the interval [a;b]. For any i2 [n], we generate U d + (i) [0; 10];U d (i) [10; 0];U a + (i) =corU d + (i)+(1+cor)[10; 0] andU a (i) =corU d (i)+(1+cor)[0; 10] wherecor2 [1; 0] is a parameter controlling the correlation between the defender and attacker payoffs. The game is zero-sum when cor =1. All general-sum games are generated with cor =0:6 unless otherwise stated. The graphG is generated via the Erd¨ os — R´ enyi random graph model. 80 Sensors Improve the Defender’s Utility Figure 6.3 shows the comparison of the defender utility under different scenarios. All data points in Figure 6.3 are averaged over 30 random instances and each instance has 30 targets. -4 -3 -2 -1 0 1 2 1 3 5 7 defender utility # patrollers (k) ratio 3 ratio 5 ratio 7 -4 -3.5 -3 -2.5 -2 -1.5 -1 -0.5 0 -1 -0.8 -0.6 -0.4 -0.2 0 defender utility correlation signaling no signaling no sensor Figure 6.3: Utility comparison 0 200 400 600 800 1000 1200 40 140 240 340 objective value # targets (n) MILP TailoredGreedy 0.001 0.01 0.1 1 10 100 40 140 240 340 sec [log-scale] # targets (n) MILP TailoredGreedy Figure 6.4: TailoredGreedy vs. MILP -2 -1.8 -1.6 -1.4 -1.2 -1 -0.8 -0.6 -0.4 -0.2 0 20 40 60 80 100 120 defender utility # targets (n) CG[milp] CG[grdy] (a) Zero-sum, def utility 0 500 1000 1500 2000 2500 3000 20 40 60 80 100 120 sec # targets (n) CG[milp] CG[grdy] (b) Zero-sum, run time 1.2 1.3 1.4 1.5 1.6 1.7 1.8 1.9 2 20 40 60 80 defender utility # targets (n) SEGer[milp] SEGer[grdy] NtLP (c) General-sum, def utility 0 500 1000 1500 2000 2500 3000 20 40 60 80 sec # targets (n) SEGer[milp] SEGer[grdy] NtLP (d) General-sum, run time Figure 6.5: Utility comparison and scalability test of different algorithms for solving general-sum and zero-sum SEGs. The left panel of Figure 6.3 compares the following scenarios. The defender has a fixed budget that equals the total cost of 7 patrollers, and the cost of a patroller may equal the cost of 3 or 5 or 7 sensors (corresponding to ratio 3, ratio 5 and ratio 7 line, respectively). The x- axis coordinate k means that the defender gets k patrollers and ratio (7k) sensors; the y-axis is the defender utility. The figure demonstrates that a proper combination of patrollers and sensors results in better defender utility than just having patrollers (i.e.,k = 7). This is the case even when the cost ratio is 3. The figure also shows that many sensors with few patrollers will not perform well, either. Therefore, the number of patrollers and sensors need to be properly balanced in practice. The right panel of Figure 6.3 compares the defender utility in three different models: 1. signaling — SEG model; 2. no signaling — SEG model but assuming sensors do not strategically signal; 3. no sensor — classical security games. 
Bothsignaling andno signaling have 4 patrollers and 10 sensors whileno sensor has 6 patrollers with no sensors (i.e., cost ratio between the patroller and sensor is 5). The x-axis is the correlation parameter of the general-sum games. The graphG used in this figure is a cycle graph motivated by the protection of the border of conservation parks as in our previous illustrative example. The figure shows that signaling results in higher utility thanno signaling, demonstrating the benefit of using strategic signaling in this setting. Such a benefit decreases as the game becomes closer to being 81 zero-sum (i.e.,cor tends to1). This is as expected since signaling does not help in zero-sum settings due to its strict competition — any information to the attacker will benefit the attacker, and thus hurt the defender in a zero-sum setting. Bothsignaling andno signaling result in a stably higher utility thatno sensor regardless of players’ payoff correlation. TailoredGreedy vs. MILP In Figure 6.4, we compare the performances of MILP (6.8) andTailoredGreedy on solving just the slave problem. Notice that that running time in the right panel is in logarithmic scale. Each data point is an average over 15 instances with randomly generated i i i 0 for eachi2 [n]. Figure 6.4 shows thatTailoredGreedy achieves only slightly worse objective value than MILP but is much more scalable. The scalability superiority ofTailoredGreedy becomes prominent for larger instances (n280) where MILP starts to run in exponential time whileTailoredGreedy is a polynomial time algorithm. Game Solving: Utility & Scalability Comparisons Finally, we compare the performance of different algorithms in solving SEGs in Figure 6.5. Since zero-sum SEGs can be formulated as a single LP, which can then be solved by column generation. We compare two algorithms in this case: CG[milp] — column generation with MILP (6.8) for the slave;CG[grdy] — column generation withTailoredGreedy for the slave. Note that CG[milp] is optimal whileCG[grdy] is not optimal since it uses an approximate algorithm for the slave. 3 Figure 6.10(a) shows that our algorithms can solve zero-sum SEGs with 80 100 targets (depending on the algorithm) within 10 minutes. CG[grdy] achieves less utility than CG[milp], but is more scalable (exact calculations show that CG[grdy] is at least 6 times faster). The utility gap betweenCG[milp] andCG[grdy] becomes smaller asn grows, while their running time gap becomes larger. This suggests that it might be more desirable to use CG[milp] for small instances and CG[grdy] for large instances if some utility loss is acceptable. For general-sum SEGs (Figures 6.10(d) and 6.10(c)), we consider three algorithms: 1. SEGer[milp] — SEGer using MILP for column generation; 2. SEGer[grdy] — SEGer usingTailoredGreedy for column generation; 3. NtLP — solving LP (6.6) one by one for each t without branch and bound. Though SEGer[grdy] is not optimal, it achieves close-to-optimal objective value in this case and runs faster than SEGer[milp] (roughly half of the running time of SEGer[milp]). On the other hand, both SEGer[milp] and 3 We also implemented the algorithm that uses TailoredGreedy first and then switches to MILP when TailoredGreedy does not improve the objective. However, this approach seems to not help in our case and results in the same running time asCG[milp]. Thus we do not present it here. 82 SEGer[grdy] are much more scalable than NtLP. 
In fact, the running time for solving a general-sum SEG by SEGer[milp] is only slightly more than the running time of solving a zero-sum SEG of the same size, which demonstrates the significant advantage of our branch and price algorithm. 6.3 Exploiting Informational Advantage in Bayesian Stackelberg Games The previous two sections developed Stackelberg security game models which allow the defender to commit not only to a distribution over actions, but also to a scheme for stochastically signaling information about these actions to the attacker. This can result in higher utility for the defender. In this section, we extend this methodology to general Bayesian games, in which either the leader or the follower or both have payoff-relevant private information. This leads to novel variants of the model, for example by imposing an incentive compatibility constraint for each type to listen to the signal intended for it. We show that, in contrast to previous hardness results for the case without signaling (Conitzer & Sandholm, 2006; Letchford, Conitzer, & Munagala, 2009), we can solve unrestricted games in time polynomial in their natural representation. For security games, we obtain hardness results as well as efficient algorithms, depending on the settings. We show the benefits of our approach in experimental evaluations of our algorithms. 6.3.1 An Example of Stackelberg Competition The Stackelberg model was originally introduced to capture market competition between a leader (e.g., a leading firm in some area) and a follower (e.g., an emerging start-up). The leader has an advantage of committing to a strategy (or equivalently, moving first) before the follower makes decisions. Here we consider a Bayesian case of Stackelberg competition where the leader does not have full information about the follower. For example, consider a market with two firms, a leader and a follower. The leader specializes in two products, product 1 and product 2. The follower is a new start-up which focuses on only one product. It is publicly known that the follower will focus on product 1 with probability 0:55 (call him a follower of type 1 in this case), and product 2 with probability 0:45 (call him a follower of type 2 ). But the realization is only known to the follower. The leader has a research team, and must decide which product to devote this (indivisible) team to, or to send them on vacation. On the other hand, the follower has two options: either entering the market and developing the product he focuses on, or leaving the market. 83 F ; F 1 F 2 L ; 0 2 1 L 1 0 1 1 L 2 0 2 1 type 1 ,p = 0:55 F ; F 1 F 2 L ; 0 1 1 L 1 0 1 1 L 2 0 1 1 type 2 ,p = 0:45 Figure 6.6: Payoff matrices for followers of different types Naturally, the follower wants to avoid competition with the leader’s research team. In partic- ular, depending on the type of the follower, the leader’s decision may drive the follower out of the market or leave the follower with a chance to gain substantial market share. This can be modeled as a Bayesian Stackelberg Game (BSG) where the leader has one type and the follower has two possible types. To be concrete, we specify the payoff matrices for different types of follower in Figure 6.6, where the leader’s actionL i simply denotes the leader’s decision to devote the team to producti fori2f1; 2;;g;; means a team vacation. Similarly, the follower’s actionF i means the follower focuses on productsi2f1; 2;;g where; means leaving the market. 
Notice that the payoff matrices force the follower to only produce the product that is consistent with his type, otherwise he gets utility1. The utility for the leader is relatively simple: the leader gets utility 1 only if the follower (of any type) takes action F ; , i.e., leaving the market, and gets utility 0 otherwise. In other words, the leader wants to drive the follower out of the market. Possessing a first-mover advantage, the leader can commit to a randomized strategy to assign her research team so that it maximizes her utility in expectation over the randomness of her mixed strategy and the follower types. Unfortunately, finding the optimal mixed strategy to commit to turns out to be NP-hard for BSGs in general (Conitzer & Sandholm, 2006). Nevertheless, by exploiting the special structure in this example, it is easy to show that any mixed strategy that puts at least 2=3 probability on L 1 is optimal for the leader to commit to. This is because to drive a follower of type 1 out of the market, the leader has to takeL 1 with probability at least 2=3. Likewise, to drive a follower of type 2 out of the market, the leader has to takeL 2 with probability at least 1=2. Since 2=3 + 1=2 > 1, the leader cannot achieve both, so the optimal choice is to drive the follower of type 1 (occurring with a higher probability) out of the market so that the leader gets utility 0:55 in expectation. Notice that the leader commits to the strategy without knowing the realization of the fol- lower’s type. This is reasonable because the follower, as a start-up, can keep information con- fidential from the leader firm at the initial stage of the competition. However, as time goes on, the leader will gradually learn the type of the follower. Nevertheless, the leader firm cannot change her chosen action at that point because, for example, there is insufficient time to switch to another product. Can the leader still do something strategic at this point? In particular, we 84 study whether the leader can benefit by partially revealing her action to the follower after observ- ing the follower’s type. To be concrete, consider the following leader policy. Before observing the follower’s type, the leader commits to choose actionL 1 andL 2 uniformly at random, each with probability 1=2. Meanwhile, the leader also commits to the following signaling scheme. If the follower has type 1 , the leader will send a signal ; to the follower when the leader takes actionL 1 , and will send either ; or 1 uniformly at random when the leader takes actionL 2 . Mathematically, the signaling scheme for the follower of type 1 is captured by the following probabilities. Pr( ; jL 1 ; 1 ) = 1 Pr( 1 jL 1 ; 1 ) = 0; Pr( ; jL 2 ; 1 ) = 1 2 Pr( 1 jL 2 ; 1 ) = 1 2 . On the other hand, if the follower has type 2 , the leader will always send ; regardless of what action she has taken. When a follower of type 1 receives signal ; (occurring with probability 3=4), he infers the posterior belief of the leader’s strategy asPr(L 1 j ; ; 1 ) = 2=3 andPr(L 2 j ; ; 1 ) = 1=3, thus deriving an expected utility of 0 from taking actionF 1 . Assuming the follower breaks ties in favor of the leader, 4 he will then choose action F ; , leaving the market. On the other hand, if the follower receives 1 (occurring with probability 1=4), he knows that the leader has taken actionL 2 for sure; thus the follower will take actionF 1 , achieving utility 2. 
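A few lines of arithmetic (our own check; the payoffs for F_1 are read off Figure 6.6 for type theta_1) confirm the numbers just derived.

# Leader mixes uniformly over L1, L2; for a type theta_1 follower, sigma_empty is
# sent always under L1 and half the time under L2, as specified above.
x = {"L1": 0.5, "L2": 0.5}
phi = {"L1": 1.0, "L2": 0.5}                 # Pr(sigma_empty | action, theta_1)
u_F1 = {"L1": -1.0, "L2": 2.0}               # type theta_1's payoff for playing F1

p_empty = sum(x[a] * phi[a] for a in x)                 # = 3/4
posterior = {a: x[a] * phi[a] / p_empty for a in x}     # L1: 2/3, L2: 1/3
eu_F1 = sum(posterior[a] * u_F1[a] for a in x)          # = 0, so the follower
print(p_empty, posterior, eu_F1)                        # leaves on sigma_empty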
In other words, the signals ; and 1 can be viewed as recommendations to the follower to leave the market ( ; ) or develop the product ( 1 ), though we emphasize that a signal has no meaning beyond the posterior distribution on leader’s actions that it induces. As a result, the leader drives the follower out of the market 3=4 of the time. On the other hand, if the follower has type 2 , since the leader reveals no information, the follower derives expected utility 0 from takingF 2 , and thus will chooseF 0 in favor of the leader. In expectation, the leader gets utility 3 4 1 2 + 1 2 = 0:875(> 0:55). Thus, the leader achieves better utility by signaling. The design of the signaling scheme above depends crucially on the fact that the leader can distinguish different follower types before sending the signals and will signal differently to dif- ferent follower types. This fits the setting where the leader can observe the follower’s type after the leader takes her action and then signals accordingly. However, in many cases, the leader is not able to observe the follower’s type. Interestingly, it turns out that the leader can in some cases design a signaling scheme which incentivizes the follower to truthfully report his type to the leader and still benefit from signaling. Note that the signaling scheme above does not satisfy the follower’s incentive compatibility constraints — if the follower is asked to report his type, a follower of type 2 would be better off to report his type as 1 . This follows from some simple 4 This is without loss of generality because the leader can always slightly tune the probability mass to make the follower slightly preferF ; . 85 calculation, but an intuitive reason is that a follower of type 2 will not get any information if he truthfully reports 2 , but will receive a more informative signal, and thus benefit himself, by reporting 1 . Now let us consider another leader policy. The leader commits to the mixed strategy (L ; ;L 1 ;L 2 ) = (1=11; 6=11; 4=11). Interestingly, this involves sometimes sending the research team on vacation! Meanwhile, the leader also commits to the following more sophisticated sig- naling scheme. If the follower reports type 1 , the leader will send signal ; wheneverL 1 is taken as well as 3 4 of the time thatL 2 is taken; otherwise the leader sends signal 1 . If the follower reports type 2 , the leader sends signal ; wheneverL 2 is taken as well as 2 3 of the time thatL 1 is taken; otherwise the leader sends signal 2 . It turns out that this policy is incentive compatible — truthfully reporting the type is in the follower’s best interests — and achieves the maximum expected leader utility 17 22 0:7732 (0:55; 0:875) among all such policies. 6.3.2 Single Leader Type, Multiple Follower Types We now generalize the example in Section 6.3.1 and consider how the leader’s additional ability of committing to a signaling scheme changes the game and the computation. We start with the Bayesian Stackelberg Game (BSG) with one leader type and multiple follower types. Let denote the set of all the follower types. An instance of such a BSG in normal form is given by a set of tuplesf(A ;B ; )g 2 whereA ;B 2 R mn are the payoff matrices of the leader (row player) and the follower (column player), respectively, when the follower has type, which occurs with probability . We use [m] and [n] to denote the leader’s and follower’s pure strategy set respectively. 
For convenience, we assume that every follower type has the same number of actions (i.e., n) in the above notation. This is without loss of generality, since we can always add "dummy" actions with payoff −∞ to both players. We use a^θ_{ij} [b^θ_{ij}] to denote a generic entry of A^θ [B^θ]. If A^θ = −B^θ for all θ ∈ Θ, we say that the BSG is zero-sum. Following the standard assumption of Stackelberg games, we assume that the leader can commit to a mixed strategy. Such a leader strategy is optimal if it results in maximal leader utility in expectation over the randomness of the strategy and the follower types, assuming each follower type best responds to the leader's mixed strategy (note that the follower cannot observe the leader's realized action, which is a standard assumption in Stackelberg games). It is known that computing the optimal mixed strategy to commit to, also known as the Bayesian Strong Stackelberg Equilibrium (BSSE) strategy, is NP-hard in such a normal-form BSG (Conitzer & Sandholm, 2006). A later result strengthened the hardness to approximation: no polynomial-time algorithm can give a non-trivial approximation ratio in general unless P = NP (Letchford et al., 2009).

Figure 6.7: Timeline of the BSG with multiple follower types. The leader commits to a strategy and a signaling scheme and plays an action; the follower's type is realized; the leader "observes" the follower's type and samples a signal; the follower observes the signal and plays.

We consider a richer model where the leader can commit not only to a mixed strategy but also to a signaling scheme that partially reveals information about the action she is currently playing, i.e., the realized sample of the leader's mixed strategy. Formally, the leader commits to a mixed strategy x ∈ Δ_m, where Δ_m is the m-dimensional simplex, and a signaling scheme φ, which is a randomized map from [m] to a set of signals Σ. In other words, the leader randomly chooses a signal to send based on the action she currently plays and the follower type she observes. We call the pair (x, φ), where

x ∈ Δ_m,    φ : [m] → Σ (randomized),        (6.9)

a leader policy. After the commitment, the leader samples an action to play. Then the follower's type is realized, and the leader observes the follower's type and samples a signal. We assume that the follower has full knowledge of the leader policy. Upon receiving a signal, the follower updates his belief about the leader's action and plays a best response. Figure 6.7 illustrates the timeline of the game.

We note that if the leader cannot distinguish different follower types and has to send the same signal to all of them, then signaling does not benefit the leader (for the same reason as in the non-Bayesian setting). In this case, she should simply commit to the optimal mixed strategy. The leader only benefits when she can target different follower types with different signals. In many cases, like the example in Section 6.3.1, the leader gets to observe the follower's type when it is realized (but after her action is completed) and can therefore signal differently to different follower types. Moreover, in practice it is sometimes natural for the leader to send different signals to different follower types even without genuinely learning their types; e.g., the follower's type may be defined by his location, in which case the leader can send signals using location-specific devices such as physical signs or radio transmission, and this fits our model just as well. We will elaborate on one such example when discussing security games.
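Before turning to the computation, here is a sketch (our own illustration under the model just described) of how a given leader policy (x, φ) is evaluated: for each follower type and each signal, form the posterior over the leader's actions, let the follower best respond with ties broken in the leader's favor, and take expectations. The example reuses the game of Section 6.3.1, where the leader earns 1 exactly when the follower leaves the market.

import numpy as np

def leader_policy_value(x, phi, A, B, prior):
    """x: (m,) mixed strategy; phi[t]: (m, n_signals) array with
    phi[t][i, s] = Pr(signal s | action i, type t);
    A[t], B[t]: (m, n) leader / follower payoff matrices; prior[t]: P(type t).
    Returns the leader's expected utility under this policy."""
    value = 0.0
    for t, lam in prior.items():
        joint = x[:, None] * phi[t]                # Pr(action i and signal s | type t)
        for s in range(joint.shape[1]):
            mass = joint[:, s].sum()               # Pr(signal s | type t)
            if mass == 0:
                continue
            post = joint[:, s] / mass              # posterior over leader actions
            fol_u = post @ B[t]                    # follower's utility for each action
            lead_u = post @ A[t]                   # leader's utility for each follower action
            best = np.flatnonzero(np.isclose(fol_u, fol_u.max()))
            j = best[np.argmax(lead_u[best])]      # tie-break in the leader's favor
            value += lam * mass * lead_u[j]
    return value

# The Section 6.3.1 game with a single (uninformative) signal and the mixed
# strategy (0, 2/3, 1/3) over (L_empty, L1, L2) recovers the value 0.55
# obtained there without signaling.
A = {t: np.array([[1, 0, 0], [1, 0, 0], [1, 0, 0]], float) for t in ("t1", "t2")}
B = {"t1": np.array([[0, 2, -1], [0, -1, -1], [0, 2, -1]], float),
     "t2": np.array([[0, -1, 1], [0, -1, 1], [0, -1, -1]], float)}
prior = {"t1": 0.55, "t2": 0.45}
x = np.array([0.0, 2 / 3, 1 / 3])
phi = {t: np.ones((3, 1)) for t in prior}      # one signal carries no information
print(leader_policy_value(x, phi, A, B, prior))  # -> 0.55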
6.3.2.1 Normal-Form Games
We first consider the case where the leader can explicitly observe the follower's type, and thus can signal differently to different follower types. As in the Bayesian persuasion model, we can w.l.o.g. focus on direct signaling schemes that use at most $n$ signals, with signal $\sigma_j$ recommending action $j\in[n]$ to the follower. As a result, we assume that $\Sigma = \{\sigma_j\}_{j\in[n]}$.
Theorem 6.3.1. The optimal leader policy can be computed in $\mathrm{poly}(m, n, |\Theta|)$ time by linear programming.
Proof. Let $x = (x_1, \dots, x_m) \in \Delta_m$ be the leader's mixed strategy to commit to. A direct signaling scheme $\varphi$ can be characterized by $\varphi(\sigma_j \mid i, \theta)$, the probability of sending signal $\sigma_j$ conditioned on the leader's (pure) action $i$ and the follower's type $\theta$. Then $p^{\theta}_{ij} = x_i\,\varphi(\sigma_j \mid i, \theta)$ is the joint probability that the leader plays pure strategy $i$ and sends signal $\sigma_j$, conditioned on observing a follower of type $\theta$. The following linear program computes the optimal leader policy, captured by variables $\{x_i\}_{i\in[m]}$ and $\{p^{\theta}_{ij}\}_{i\in[m],\, j\in[n],\, \theta\in\Theta}$:
maximize  $\sum_{\theta\in\Theta} \lambda^{\theta} \sum_{i,j} p^{\theta}_{ij} a^{\theta}_{ij}$
subject to  $\sum_{j=1}^{n} p^{\theta}_{ij} = x_i$,  for $i\in[m]$, $\theta\in\Theta$;
  $\sum_{i=1}^{m} p^{\theta}_{ij} b^{\theta}_{ij} \ge \sum_{i=1}^{m} p^{\theta}_{ij} b^{\theta}_{ij'}$,  for $\theta\in\Theta$, $j\ne j'$;
  $\sum_{i=1}^{m} x_i = 1$;
  $p^{\theta}_{ij} \ge 0$,  for all $i, j, \theta$.        (6.10)
The first set of constraints says that the probability mass $p^{\theta}_{ij}$ (the joint probability of playing pure strategy $i$ and sending signal $\sigma_j$ conditioned on follower type $\theta$), summed over $j$, should equal the probability of playing action $i$, for every type $\theta$. The second set of constraints guarantees that the action $j$ recommended by signal $\sigma_j$ is indeed the follower's best response.
Given any game $G$, let $U_{sig}(G)$ be the leader's expected utility from the optimal leader policy computed by LP (6.10). Moreover, let $U_{BSSE}(G)$ be the leader's utility in the BSSE, i.e., the expected leader utility from committing to (only) the optimal mixed strategy.
Proposition 5. If $G$ is a zero-sum BSG, then $U_{sig}(G) = U_{BSSE}(G)$. That is, the leader does not benefit from signaling in zero-sum BSGs.
The intuition underlying Proposition 5 is that, in a situation of pure competition, any information volunteered to the follower will be used to "harm" the leader. In other words, signaling is only helpful when the game exhibits some "cooperative components". We defer the formal proof to Appendix C.1.
Remark: As we mentioned earlier, computing the optimal mixed strategy (assuming no signaling) to commit to is NP-hard to approximate within any non-trivial ratio (Conitzer & Sandholm, 2006; Letchford et al., 2009). Interestingly, it turns out that when we consider a richer model with signaling, the problem becomes easy! Intuitively, this is because the signaling scheme "relaxes" the game by introducing correlation between the leader's and the follower's actions (via the signal). Such correlation allows more efficient computation. Similar intuition can be seen in the literature on computing Nash equilibria (hard for two players (Daskalakis et al., 2006; Chen et al., 2009)) and correlated equilibria (easy in fairly general settings (Papadimitriou & Roughgarden, 2008; Jiang & Leyton-Brown, 2011)).
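LP (6.10) is small enough to prototype directly. The sketch below assumes SciPy's `linprog`; the function name and the flat variable layout are our own, not the thesis's. It builds the objective, the marginal-consistency equalities, and the obedience constraints from given payoff tensors, assuming $n \ge 2$.

```python
import numpy as np
from scipy.optimize import linprog

def optimal_policy_with_signaling(A, B, lam):
    """Sketch of LP (6.10). A, B: arrays of shape (T, m, n) holding the leader's
    and follower's payoffs per follower type; lam: type probabilities.
    Returns (x, p) where p[t, i, j] is the joint probability of playing action i
    and recommending action j to a follower of type t."""
    T, m, n = A.shape
    nvar = m + T * m * n                       # variables: x_1..x_m, then p[t,i,j]

    def pidx(t, i, j):                         # index of p[t,i,j] in the flat vector
        return m + (t * m + i) * n + j

    # Objective: maximize sum_t lam_t sum_{i,j} p[t,i,j] * A[t,i,j]
    c = np.zeros(nvar)
    for t in range(T):
        for i in range(m):
            for j in range(n):
                c[pidx(t, i, j)] = -lam[t] * A[t, i, j]   # linprog minimizes

    # Equalities: sum_j p[t,i,j] = x_i for every (t, i), and sum_i x_i = 1
    A_eq, b_eq = [], []
    for t in range(T):
        for i in range(m):
            row = np.zeros(nvar)
            row[i] = -1.0
            for j in range(n):
                row[pidx(t, i, j)] = 1.0
            A_eq.append(row); b_eq.append(0.0)
    row = np.zeros(nvar); row[:m] = 1.0
    A_eq.append(row); b_eq.append(1.0)

    # Obedience: the recommended action j beats any deviation j' for each type t
    A_ub, b_ub = [], []
    for t in range(T):
        for j in range(n):
            for jp in range(n):
                if jp == j:
                    continue
                row = np.zeros(nvar)
                for i in range(m):
                    row[pidx(t, i, j)] = B[t, i, jp] - B[t, i, j]
                A_ub.append(row); b_ub.append(0.0)

    res = linprog(c, A_ub=np.array(A_ub), b_ub=np.array(b_ub),
                  A_eq=np.array(A_eq), b_eq=np.array(b_eq),
                  bounds=[(0, None)] * nvar, method="highs")
    x = res.x[:m]
    p = res.x[m:].reshape(T, m, n)
    return x, p
```

Wherever $x_i > 0$, the signaling scheme itself can be recovered from the output by $\varphi(\sigma_j\mid i,\theta) = p^{\theta}_{ij} / x_i$.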
Incentivizing the Follower Type
In many situations, it is not realistic to expect that the leader can observe the follower's type. For example, the follower's type may be whether he has a high or low value for an object, which is not directly observable. In such cases, the leader can ask the follower to report his type. However, it is not always in the follower's best interest to truthfully report his own type, since the signal intended for a different follower type might be more beneficial to him (recall the example in Section 6.3.1). In this section, we consider how to compute an optimal incentive compatible (IC) leader policy that incentivizes the follower to truthfully report his type and meanwhile benefits the leader.
Note that focusing on direct signaling schemes is still without loss of generality in this setting. To see this, consider a follower of type $\theta$ that receives more than one signal, each resulting in the same follower best response. Then, as before, the leader can merge these signals without harming the follower of type $\theta$. But if a follower of type $\theta' \ne \theta$ misreports his type as $\theta$, receiving the merged signal provides less information than receiving one of the unmerged signals. Therefore, if the follower of type $\theta'$ had no incentive to misreport type $\theta$ before the signals were merged, he has no incentive to misreport after the signals are merged. So any signaling scheme with more than $n$ signals can be reduced to an equivalent scheme with exactly $n$ signals.
Theorem 6.3.2. The optimal incentive compatible (IC) leader policy can be computed in $\mathrm{poly}(m, n, |\Theta|)$ time by linear programming, assuming the leader does not observe the follower's type.
Proof. We still use variables $x\in\Delta_m$ and $\{p^{\theta}_{ij}\}_{i\in[m],\, j\in[n],\, \theta\in\Theta}$ to capture the leader's policy. Then $\alpha^{\theta}_j = \sum_{i=1}^{m} p^{\theta}_{ij}$ is the probability of sending signal $\sigma_j$ when the follower reports type $\theta$. Now consider the case where the follower reports type $\theta$ but has true type $\theta'$. When the leader recommends action $j$ (assuming a follower of type $\theta$), which now is not necessarily the follower's best response due to the misreport, the follower's utility for any action $j'$ is $\frac{1}{\alpha^{\theta}_j}\sum_{i=1}^{m} p^{\theta}_{ij} b^{\theta'}_{ij'}$. Therefore, the follower's action will be $\arg\max_{j'} \frac{1}{\alpha^{\theta}_j}\sum_{i=1}^{m} p^{\theta}_{ij} b^{\theta'}_{ij'}$, with expected utility $\max_{j'} \frac{1}{\alpha^{\theta}_j}\sum_{i=1}^{m} p^{\theta}_{ij} b^{\theta'}_{ij'}$. As a result, the expected utility of a follower of type $\theta'$ who misreports type $\theta$ is
$U(\theta', \theta) = \sum_{j=1}^{n} \alpha^{\theta}_j \Big[ \max_{j'} \frac{1}{\alpha^{\theta}_j} \sum_{i=1}^{m} p^{\theta}_{ij} b^{\theta'}_{ij'} \Big] = \sum_{j=1}^{n} \Big[ \max_{j'} \sum_{i=1}^{m} p^{\theta}_{ij} b^{\theta'}_{ij'} \Big].$
Therefore, to incentivize the follower to truthfully report his type, we only need to add the incentive compatibility constraints $U(\theta', \theta') \ge U(\theta', \theta)$ for all $\theta \ne \theta'$. Using the condition $\max_{j'} \sum_{i=1}^{m} p^{\theta'}_{ij} b^{\theta'}_{ij'} = \sum_{i=1}^{m} p^{\theta'}_{ij} b^{\theta'}_{ij}$, i.e., that the action $j$ recommended by $\sigma_j$ is indeed the follower's best response when the follower has (and truthfully reports) type $\theta'$, we have
$U(\theta', \theta') = \sum_{j=1}^{n} \Big[ \max_{j'} \sum_{i=1}^{m} p^{\theta'}_{ij} b^{\theta'}_{ij'} \Big] = \sum_{j=1}^{n} \sum_{i=1}^{m} p^{\theta'}_{ij} b^{\theta'}_{ij}.$
Therefore, incorporating the above constraints into LP (6.10) gives the following optimization program, which computes an optimal incentive compatible leader policy:
maximize  $\sum_{\theta\in\Theta} \lambda^{\theta} \sum_{i,j} p^{\theta}_{ij} a^{\theta}_{ij}$
subject to  $\sum_{j=1}^{n} p^{\theta}_{ij} = x_i$,  for all $i$, $\theta$;
  $\sum_{i=1}^{m} p^{\theta}_{ij} b^{\theta}_{ij} \ge \sum_{i=1}^{m} p^{\theta}_{ij} b^{\theta}_{ij'}$,  for $\theta\in\Theta$, $j\ne j'$;
  $\sum_{j=1}^{n} \sum_{i=1}^{m} p^{\theta'}_{ij} b^{\theta'}_{ij} \ge \sum_{j=1}^{n} \Big[ \max_{j'} \sum_{i=1}^{m} p^{\theta}_{ij} b^{\theta'}_{ij'} \Big]$,  for $\theta \ne \theta'$;
  $\sum_{i=1}^{m} x_i = 1$;
  $p^{\theta}_{ij} \ge 0$,  for all $i, j, \theta$.        (6.11)
Notice that $\sum_{j=1}^{n} \big[ \max_{j'} \sum_{i=1}^{m} p^{\theta}_{ij} b^{\theta'}_{ij'} \big]$ is a convex function of the variables. Therefore, the above is a convex program. By standard tricks, the convex constraint can be converted to a set of polynomially many linear constraints (see, e.g., (Boyd & Vandenberghe, 2004)).
Given any BSG $G$, let $U_{IC}(G)$ be the expected leader utility from playing an optimal incentive compatible leader policy computed by Convex Program (6.11). The following proposition captures the utility ranking of the different models.
Proposition 6 (Utility Ranking). $U_{sig}(G) \ge U_{IC}(G) \ge U_{BSSE}(G)$.
Proof.
The first inequality holds because any feasible solution to Program (6.11) must be feasible to LP (6.10). The second inequality follows from the fact that the BSSE is an incentive compatible leader policy where the signaling scheme simply reveals no information to any follower. This scheme is trivially incentive compatible because it is indifferent to the follower’s report. 90 Relation to Other Models. Our model in this section relates to the model of Persuasion with Privately Informed Receivers (“followers” in our terminology) by (Kolotilin et al., 2017). Though in a different context, the model of Kolotilin et al. is essentially a BSG played between a leader and a follower of type only known to himself. In our model, players’ payoffs are affected by the leader’s action; thus the leader first commits to a mixed strategy and then signals her sampled action to the follower with incentive compatibility constraints. In (Kolotilin et al., 2017), the leader does not have actions. Instead, the payoffs are determined by some random state of nature, which the leader can privately observe but does not have control over. The follower only has a prior belief about the state of nature, analogous to the follower knowing the leader’s mixed strategy in our model. Kolotilin et al. study how the leader can signal such exogenously given information to the follower with incentive compatibility constraints. Mathematically, this corresponds to the case wherex in Program (6.11) is given a-priori instead of being designed. 6.3.2.2 Security Games We now consider Bayesian Security Games, a particular type of Stackelberg game played between a defender (leader) and an attacker (follower). Our results here are generally negative — the optimal leader policy becomes hard to compute even in the simplest of the security games. In particular, we consider a security game with n targets and k (< n) identical unconstrained security resources. Each resource can be assigned to at most one target; a target with a resource assigned is called covered, otherwise it is uncovered. Therefore, the defender pure strategies are subsets of targets (to be protected) of cardinality k. The attacker has n actions — attack any one of the n targets. The attacker has a private type which is drawn from finite set with probability . The attacker is privy to his own type, but the defender only knows the distribution f g 2 . This captures many natural security settings. For example, in airport patrolling, the attacker could either be a terrorist or a regular policy violator as modeled in (Pita, Jain, Marecki, Ord´ o˜ nez, Portway, Tambe, Western, Paruchuri, & Kraus, 2008b). In wildlife patrolling, the type of an attacker could be the species the attacker is interested in (Fang, Stone, & Tambe, 2015). If the attacker chooses to attack target i2 [n], players’ utilities depend not only on whether target i is covered or not, but also on the attacker’s type . We use U d=a c=u (ij) to denote the defender/attacker (d=a) utility when targeti is covered/uncovered (c=u) and an attacker of type attacks targeti. The leader now has n k pure strategies; thus, the natural LP has exponential size. Never- theless, in security games we can sometimes solve the game efficiently by exploiting compact representations of the defender’s strategies. Unfortunately, we show that this is not possible here. It turns out that the complexity of the problem depends on how many targets an attacker is inter- ested in. 
We say that an attacker of type is not interested in attacking targeti if there existsj 91 such thatU a u (ij)<U a c (jj). That is, even when targeti is totally uncovered and targetj is fully covered, the attacker still prefers attacking targetj — thus targeti will never be attacked by an attacker of type. Otherwise we say that an attacker of type is interested in attacking targeti. One might imagine that if an attacker is only interested in a small number of targets, this should simplify the computation. Unfortunately, this is not the case. Proposition 7. Computing the optimal defender policy in a Bayesian Stackelberg security game (both with and without type-reporting IC constraints) is NP-hard, even when the defender payoff does not depend on the attacker’s type and when each type of attacker is interested in attacking at most four targets. The proof of Proposition 7 requires a slight modification of a similar proof in (Li, Conitzer, & Korzhyk, 2016), and is provided in Appendix C.2 just for completeness. Our next proposition shows that we are able to compute the optimal defender policy in a restricted setting. This setting is motivated by fare evasion deterrence (Yin et al., 2012) where each attacker (i.e., a passenger) is only interested in attacking (i.e., stealing a ride from) one specific target (i.e., the metro station nearby), or choosing to not attack (e.g., buying a ticket) in which case both players get utility 0. Formally, we model this as a setting where each attacker type is interested in two targets: one type-specific target and one common targett ; (corresponding to the option of not attacking). If t ; is attacked, each player gets utility 0 regardless of whethert ; is protected or not — we callt ; coverage-invariant for this reason. 6 Proposition 8. Suppose each attacker type is interested in two targets: the common coverage- invariant targett ; and a type-specific target. Then the defender’s optimal policy (without type- reporting IC constraints) can be computed in poly(m;n;jj) time. The proof of Proposition 8 crucially exploits the fact that each player’s utility is “coverage- invariant” on targett ; . As a result, the defender will not covert ; at all at optimality. Therefore, for any attacker of type who is interested in targeti andt ; , the defender only needs to signal information about the protection of target i. This allows us to write a linear program. The proof is deferred to Appendix C.2. Note that when we take incentive compatibility constraints into account, the situation becomes more intricate. It could be the case that an attacker is not interested in attacking a target, but would still like to receive an informative signal regarding its coverage status in order to infer some information about the distribution of resources. This is reminiscent of information leakage as described in (Xu, Jiang, Sinha, Rabinovich, Dughmi, & Tambe, 2015), and our proof does not naturally extend to this setting. 6 The utility 0 is not essential so long ast ; is coverage-invariant. 92 2 Leader commits (strategies + signaling scheme) Follower observes the signal and plays Leader observes her type and samples an action + a signal time Figure 6.8: Timeline of the BSG with multiple leader types Our next result shows that the restriction in Proposition 8 is almost necessary for efficient computation, as evidence of computational hardness manifests itself when we slightly go beyond the condition there. Proposition 9. 
The defender oracle problem 7 is NP-hard (both with and without type-reporting IC constraints), even when each type of attacker is interested in two targets. 6.3.3 Multiple Leader Types, Single Follower Type Similarly to Section 6.3.2, we still start with the normal-form Bayesian Stackelberg Game, but with multiple leader types and a single follower type. Following the notation in Section 6.3.2, an instance of such a BSG is also given by a set of tuplesf(A ;B ; )g 2 whereA ;B 2R mn are the payoff matrices of the leader (row player) and the follower (column player) respectively. However, now is the set of leader types and is the probability that the leader has type . Among its many applications, one key motivation of this model is from security domains. In security games, the follower, i.e., the attacker, usually does not have full information regarding the importance and vulnerability of the targets for attack, while the leader, i.e., the defender, possesses much better knowledge. This can be modeled as a BSG where the leader has multiple types and the single-type follower has a prior belief regarding the leader’s types. It is known that in this case, a set of linear programs suffices to compute the optimal mixed strategy to commit to (Conitzer & Sandholm, 2006). We consider a richer model where the leader can additionally commit to a policy, namely a signaling scheme, of partially releasing her type and action. Formally, the leader commits to a mixed strategyx for each realized type and a signaling scheme' which is a stochastic map from [m] to . We call the pair (fx g 2 ;') wherex 2 m ; ' : [m] rnd ! (6.12) a leader policy in this setting. The game starts with the leader’s commitment. Afterwards, the leader observes her own type, and then samples an action and a signal accordingly. The follower observes the signal and best responds. Figure 6.8 illustrates the timeline of the game. 7 The optimal policy can be computed by an LP with exponential size. The defender oracle is a main subroutine for solving the dual of the LP . See Appendix C.2 for a derivation of the defender oracle and proof of the hardness. 93 6.3.3.1 Normal-Form Games We focus on direct signaling schemes and assume =f 1 ;:::; n g where j is a signal recom- mending actionj to the follower. Theorem 6.3.3. The optimal leader policy defined in Formula (6.12) can be computed in poly(m;n;jj) time by linear programming. Proof. To represent the signaling scheme', let'(jji;) be the probability of sending signal j , conditioned on the realized leader type and pure strategyi. Thenp ij = '(jji;)x (i) is the joint probability for the leader to take (pure) actioni and send signal j , conditioned on a realized leader type. The following linear program computes the optimalfp ij g i2[m];j2[n];2 . 8 maximize P 2 P ij p ij a ij subject to P m i=1 P n j=1 p ij = 1; for2 : P i; p ij b ij P i; p ij b ij 0 ; forj6=j 0 : p ij 0; for alli;j;: (6.13) By lettingx (i) = P n j=1 p ij and'(jji;) = p ij =x (i), we can recover the optimal defender policy (fx g 2 ;'). 6.3.3.2 Security Games We now again consider the security game setting. We have shown in Section 6.3.2 that, when there are multiple follower types, the polynomial-time solvability of BSGs does not extend to even the simplest security game setting. It turns out that when the leader has multiple types, the optimal leader strategy and signaling scheme can be efficiently computed in fairly general settings, as we show below. 
We still adopt the notations from Section 6.3.2.2, except that is now the defender’s private type. We further allow scheduling constraints in the defender’s resource allocation. Recall that the set of defender pure strategiesE and the set of marginal probabilitiesP have been described in Section 2.2.1. It was shown previously that if the defender’s best response problem can be solved in polynomial time, then the Strong Stackelberg equilibrium can also be computed in polynomial time (Jain et al., 2010; Xu, 2016). We now establish an analogous result for BSG with signaling. Theorem 6.3.4. The optimal defender policy can be computed in poly(n;jj) time if the de- fender’s best response problem (i.e., defender oracle) admits apoly(n) time algorithm. 8 Whenjj = 1, the game degenerates to a Stackelberg game without uncertainty of player types, and LP (6.13) degenerates to a linear program that computes the Strong Stackelberg equilibrium (Conitzer & Korzhyk, 2011). 94 Proof. First, observe that LP (6.13) does not obviously extend to security game settings because the number of leader pure strategies is exponentially large here and so is the LP formulation. Therefore, like classic security game algorithms, it is crucial to exploit a compact representation of the leader’s policy space. For this, we need an equivalent but slightly different view of the leader policy. That is, the leader policy can be equivalently viewed as follows: the leader ob- serves her type and then randomly chooses a signal j (occurring with probability P m i=1 p ij in LP (6.13)), and finally picks a mixed strategy that depends on both and j (i.e., the vector (p 1j ;p 2j ;:::;p mj ) normalized by the factor P m i=1 p ij in LP (6.13)). The different view of leader policy above allows us to write a quadratic program for comput- ing the optimal leader policy. In particular, letp j be the probability that the leader sends signal j conditioned on the realized leader type, and letx j be the leader’s (marginal) mixed strategy conditioned on observing and sending signal j . Then, upon receiving signal j , a rational Bayesian attacker will updates his belief, and compute the expected utility for attacking targetj 0 as X p j j h x j (j 0 )U a c (j 0 j) + 1x j (j 0 ) U a u (j 0 j) i ! (6.14) where the normalization factor j = P p j is the probability of sending signal j . Define AttU(j;j 0 ) to be the attacker utility by attacking target j 0 conditioned on receiving signal j , multiplied by the probability j of receiving signalj. Formally, AttU(j;j 0 ) = j Equation (6.14) = P p j x j (j 0 )U a c (j 0 j) + h p j p j x j (j 0 ) i U a u (j 0 j) Similarly, we can also defineDefU(j;j 0 ), the leader’s expected utility of sending signal j with targetj 0 being attacked, scaled by the probability of sending j . The attacker’s incentive compatibility constraints are thenAttU(j;j) AttU(j;j 0 ) for anyj 0 6= j. Then the leader’s problem can be expressed as the following quadratic program with variablesfx j g j2[n];2 and fp j g j2[n];2 . maximize P j DefU(j;j) subject to AttU(j;j)AttU(j;j 0 ); forj6=j 0 : P j p j = 1; for2 : x j 2P; forj;: p j 0; forj;: (6.15) The optimization program (6.15) is quadratic becauseAttU(j;j 0 ) andDefU(j;j 0 ) are quadratic in the variables. Notably, these two functions are linear inp j and the termp j x j . Therefore, we define variablesy j = p j x j 2R n . Then, bothAttU(j;j 0 ) andDefU(j;j 0 ) are linear inp j and 95 y j . 
The only problematic constraint in program (6.15) isx j 2P, which now becomesy j 2p j P where bothp j andy j are variables. HerepP denotes the polytopefpx : x2Pg for any given p. This turns out to still be a convex constraint, and behaves nicely as long as the polytopeP behaves nicely. Lemma 14 (Polytope Transformation). LetP R n be any bounded convex set. Then the fol- lowing hold: (i) The extended set e P =f(x;p) :x2pP;p 0g is convex. (ii) IfP is a polytope expressed by constraintsAx b, then e P is also a polytope, given by f(x;p) :Axpb;p 0g; (iii) IfP admits apoly(n) time separation oracle, so does e P. The proof of Lemma 15 is standard, and is deferred to Appendix C.3. We note that the restriction thatP is bounded is important; otherwise, some conclusions do not hold, e.g., Property 2. Fortunately, the polytopeP of mixed strategies is bounded. Therefore, using Lemma 15, we can rewrite Quadratic Program (6.15) as the following linear program. maximize P j DefU(j;j) subject to AttU(j;j)AttU(j;j 0 ); forj6=j 0 : P j p j = 1; for2 : (y j ;p j )2 e P; forj;: p j 0; forj;: (6.16) Program (6.16) is linear becauseAttU(j;j 0 ) andDefU(j;j) are linear inp j andy j , and more- over, (y j ;p j )2 e P are essentially linear constraints due to Lemma 15 and the fact thatP is a polytope in security games. Furthermore, LP (6.16) has a compact representation as long as the polytope of realizable mixed strategiesP has one. In this case, LP (6.16) can be solved explicitly. More generally, by standard techniques from convex programing, we can show that the separa- tion oracle forP easily reduces to the defender’s best response problem. Thus if the defender oracle admits apoly(n) time algorithm, then a separation oracle forP can be found inpoly(n) time. By Lemma 15, e P then admits apoly(n) time separation oracle, so LP (6.16) can solved in poly(n;jj) time. The proof is not particularly insightful and a similar argument can be found in (Xu, Fang, Jiang, Conitzer, Dughmi, & Tambe, 2014). So we omit the details here. Relation to Other Models We note that our model in this section is related to several models from the literature on both information economics and security games. In particular, when the leader does not have actions and only privately observes her type, our model degenerates to the Bayesian Persuasion (BP) 96 model of (Kamenica & Gentzkow, 2011). Our model generalizes the BP model to the case where the sender has both actions and private information, and our results show that this generalized model can be solved in fairly general settings. The security game setting in this section also relates to the model of Rabinovich et al. (Ra- binovich et al., 2015). Rabinovich et al. considered a similar security setting where the defender can partially signal her strategy and extra knowledge about targets’ states to the attacker in or- der to achieve better defender utility. This is essentially a BSG with multiple leader types and a single follower type. Rabinovich et al. (Rabinovich et al., 2015) were able to efficiently solve for the case with unconstrained identical security resources. Our Theorem 6.3.4 shows that this model can actually be efficiently solved in much more general security settings allowing com- plicated real-world scheduling constraints, as long as the defender oracle problem can be solved efficiently. 
6.3.4 Experiments
We mainly present the comparison of the models discussed in Section 6.3.2 in terms of both the leader's optimal utility and the runtime required to compute the leader's optimal policy. We focus primarily on the setting with one leader type and multiple follower types, for two reasons. First, this is the case in which it is NP-hard to compute the optimal leader strategy without allowing the leader to signal (i.e., to compute the BSSE strategy), while our models of signaling permit a polynomial-time solution. Second, some interesting phenomena in our simulations for the case of multiple leader types also show up in the case of multiple follower types.
We generate random instances using a modification of the covariant game model (Nudelman et al., 2004). For each $i$, $j$, and $\theta$, we independently set $a^{\theta}_{ij}$ equal to a random integer in the range $[-5, 5]$. The probabilities $\{\lambda^{\theta}\}_{\theta\in\Theta}$ are generated randomly. For some value of $\alpha \in [0, 1]$, we then set $B^{\theta} = \alpha B'^{\theta} + (1-\alpha)(-A^{\theta})$, where $B'^{\theta}$ is a random matrix generated in the same fashion as $A^{\theta}$. So in the case $\alpha = 0$ the game is zero-sum, while $\alpha = 1$ means completely uncorrelated leader and follower payoffs. For every set of parameter values, we averaged over 50 instances generated in this manner to obtain the utility/runtime values we report.
[Figure 6.9: Extra utility gained by the leader from signaling.]
We first consider the value of signaling for different values of $\alpha$ chosen from the set $\{0, 0.1, 0.2, \dots, 1\}$. For these simulations, we fix $m = n = 10$ and $|\Theta| = 5$. Figure 6.9 shows the absolute increase in leader utility from signaling (both with and without the type-reporting IC constraints), compared with the utility from BSSE (the $y = 0$ baseline). Note that when $\alpha = 0$ there is no gain from signaling, by Proposition 5. The gain from signaling is non-monotone, peaking at around $\alpha = 0.7$. Intuitively, large $\alpha$ means low correlation between the payoff matrices of the leader and follower; therefore, there is a high probability that some entries will induce high payoff to both players. The leader can therefore extract high utility from commitment alone, and thus derives little gain from signaling. However, as we decrease $\alpha$ and the game becomes more competitive, commitment alone is not as powerful for the leader, and she has more to gain from being able to signal.
[Figure 6.10: Runtime and utility comparisons obtained by varying the number of actions $n$ and the number of types $|\Theta|$ for the three different models in the case of multiple follower types. Panels: (a) running time, $|\Theta|$ constant; (b) leader utility, $|\Theta|$ constant; (c) running time, $n$ constant; (d) leader utility, $n$ constant.]
We next investigate the relation between the size of the BSG and the leader's utility, as well as the runtime, for the three different models. In Figures 6.10(a) and 6.10(b), we hold the number of follower types constant ($|\Theta| = 5$) and vary $m = n$ between 1 and 15. In Figures 6.10(c) and 6.10(d), we fix $m = n = 5$ and vary $|\Theta|$ between 1 and 15. In all cases we set $\alpha = 0.5$ for generating random instances. Not surprisingly, allowing signaling (both with and without the IC constraints) provides a significant speed-up over computing the BSSE.
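For completeness, the random-instance generation just described can be sketched as follows; this is a minimal version with our own function name, assuming uniform integer payoffs in $[-5,5]$ and the covariance parameter $\alpha$ exactly as in the text, and drawing the type distribution uniformly at random before normalizing it.

```python
import numpy as np

def random_bsg_instance(m, n, num_types, alpha, rng):
    """Covariant-style generator: alpha = 0 gives a zero-sum game,
    alpha = 1 gives uncorrelated leader/follower payoffs."""
    A = rng.integers(-5, 6, size=(num_types, m, n)).astype(float)
    B0 = rng.integers(-5, 6, size=(num_types, m, n)).astype(float)
    B = alpha * B0 + (1.0 - alpha) * (-A)
    lam = rng.random(num_types)
    lam /= lam.sum()
    return A, B, lam

rng = np.random.default_rng(0)
A, B, lam = random_bsg_instance(m=10, n=10, num_types=5, alpha=0.5, rng=rng)
```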
9 On the other hand, the additional constraints in the model with IC constraints also increase the running time over the model without those constraints. Indeed, the time to compute the leader’s optimal policy without the IC constraints appears as a flat line in Figures 6.10(a) and 6.10(c). In both figures of leader utility, the differences of the leader’s utility among the models are as indicated by Proposition 6. Observe that in all models the leader’s utility increases with the number of actions, but decreases with the number of types. One explanation is that the former effect is due to the increased probability that the payoff matrices for a given follower type contain ‘cooperative’ entries where both players achieve high utility. However, as the number of follower types increases, it becomes less likely that the leader’s strategy (which does not depend on the follower type) can “cooperate” with a majority of follower types simultaneously. Thus there 9 To compute the BSSE, we implement the state-of-art algorithm DOBBS, a mixed integer linear program as for- mulated in (Paruchuri, Pearce, Marecki, Tambe, Ordonez, & Kraus, 2008). 98 is an increased chance that the leader’s strategy results in low utilities when playing against a reasonable fraction of follower types, which accounts for the latter effect. In the case of multiple leader types, allowing the leader to signal actually results in a small computational speed up compared to the case without signaling. We hypothesize that this is because we only need to solve one LP to compute the optimal policy, rather than the multiple LPs required to solve without signaling (Conitzer & Sandholm, 2006). Unsurprisingly, we also see an increase in the leader’s utility. The utility trends are similar to the case of multiple follower types, so we do not present them in detail. 99 Part III Dealing with Information Leakage 100 Chapter 7 Real-World Motivation and Two Illustrative Examples In this chapter, we describe two concrete examples motivated from real-world domains that illus- trate the issue of information leakage in security games. Our examples show that such leakage may cause significant loss to the defender if not addressed properly. 7.1 Motivating Example I: Information Leakage in Air Marshal Scheduling Our first example considers the problem of scheduling federal air marshals to protect flights (Tsai, Rathi, Kiekintveld, Ordonez, & Tambe, 2009). With more than 30,000 commercial flights per day in the United States airspace but only a limited number of air marshals, the Federal Air Marshal Service (FAMS) can only cover a small portion of flights and has to schedule air marshals in a randomized fashion. Naturally, such randomization needs to be intelligently designed based on the risk and importance of different flights. To assist in this large-scale scheduling process, a software assistance called Intelligent Randomization in Scheduling (IRIS) has been deployed and is currently in use by the FAMS (Tsai et al., 2009). Figure 7.1: A tweet that leaks information Figure 7.2: A round-trip schedule with information leakage. 101 When designing IRIS, a crucial assumption made was that the attacker could only observe the defender’s mixed strategy but was not able to observe, even partially, the defender’s pure strategy. However, this assumption may fail in practice. The realized protection status of some flights may leak out to the adversary due to various reasons, e.g., even an unintentional tweet (Figure 7.1). 
Since the air marshal’s schedule is usually a round trip or even a multi-way trip, if the adversary knows the protection status of a certain flight, he can infer the protection status of return flights (see Figure 7.2). It turns out that such information leakage may cause a significant loss to the defender if not addressed carefully. To illustrate this vulnerability, we consider a simple example in which the FAMS needs to protect four flights — two from LAX to ORD (denoted asA 1 ;A 2 ) and two return flights from ORD to LAX (denoted as B 1 ;B 2 ). There is only one air marshal available. The flights are depicted in the left panel of Figure 7.3. We assume that any outbound flight can form a round trip with any return flight — i.e., their arrival and departure times are compatible. Assume that, due to the different importances of the flights, the desired marginal protection probability is 2=3 for flightsA 1 andB 1 and is 1=3 for flightsA 2 andB 2 when there is no information leakage. Figure 7.3: Desired marginal protection probabilities and two different mixed strategies to imple- ment the marginals. There have been different algorithms (Tsai et al., 2009; Kiekintveld et al., 2009; Jain et al., 2010) developed to efficiently compute the optimal defender mixed strategy — i.e., a distribu- tion over the air marshal’s schedules — for the air marshal scheduling problem. However, they all assume that the attacker does not observe the defender’s realized pure strategy. Under this assumption, it does not matter how we implement the marginal protection probabilities. One computational challenge here is the exponential explosion of the total number of pure strategies due to the combinatorial structure of the defender’s strategy. To overcome this challenge, most efficient algorithms are designed to implement the desired marginal protection probabilities by randomizing over as few schedules as possible. This is also the reason that they are efficient — if the algorithm randomizes over too many schedules, it is not efficient any more. In this example, most efficient algorithms will output the strategy depicted in the middle panel of Figure 7.3. That is, the air marshal will take the red round trip with probability 2=3 and the 102 blue round trip with probability 1=3. It is easy to verify that this mixed strategy implements the desired marginal protection probabilities. Unfortunately, such a mixed strategy with small support may be extremely vulnerable to in- formation leakage. In fact, in this example, the attacker can completely uncover the air marshal’s schedule by observing the protection status of just one flight — actually, any flight. This is be- cause this mixed strategy creates too much correlation among flights. For example, consider that the adversary can observe the protection status of flightA 2 (e.g., because he sees a tweet about an air marshal as in Figure 7.1). Now, if flightA 2 is protected, this implies that the air marshal is taking the blue trip; therefore flightB 1 will not be protected later. Therefore, the attacker will have some time to plan an attack on B 1 . On the other hand, if flight A 2 is not protected, this implies that the air marshal is taking the red trip and flightB 2 will not protected later. In either case, the attacker can always identify a completely uncovered flight to attack. This example illustrates that previous algorithms can be extremely vulnerable to information leakage. 
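To make the inference in this example concrete, the following sketch computes the attacker's posterior over the return flights after observing that flight $A_2$ is protected, both for the small-support implementation above and for a maximum-entropy implementation of the same marginals (an alternative of the kind discussed next). The max-entropy probabilities below are computed by the sketch itself rather than copied from Figure 7.3, which is not reproduced in the text; the helper names are ours.

```python
import numpy as np
from scipy.optimize import minimize

# The four possible round trips in the example: (outbound, return).
trips = [("A1", "B1"), ("A1", "B2"), ("A2", "B1"), ("A2", "B2")]
marginals = {"A1": 2/3, "A2": 1/3, "B1": 2/3, "B2": 1/3}

def cond_given_A2_covered(q):
    """Distribution over return flights given the attacker sees A2 covered."""
    w = np.array([q[i] if trips[i][0] == "A2" else 0.0 for i in range(4)])
    w = w / w.sum()
    return {"B1": float(w[2]), "B2": float(w[3])}

# Small-support implementation: red trip {A1, B1} w.p. 2/3, blue trip {A2, B2} w.p. 1/3.
q_small = np.array([2/3, 0.0, 0.0, 1/3])
print(cond_given_A2_covered(q_small))      # {'B1': 0.0, 'B2': 1.0} -- fully revealed

# A max-entropy implementation of the same marginals over the four round trips.
def neg_entropy(q):
    q = np.clip(q, 1e-12, 1.0)
    return float(np.sum(q * np.log(q)))

cons = [{"type": "eq", "fun": lambda q: q.sum() - 1.0},
        {"type": "eq", "fun": lambda q: q[0] + q[1] - marginals["A1"]},
        {"type": "eq", "fun": lambda q: q[0] + q[2] - marginals["B1"]}]
res = minimize(neg_entropy, x0=np.full(4, 0.25), method="SLSQP",
               bounds=[(0, 1)] * 4, constraints=cons)
q_maxent = res.x                            # approx. (4/9, 2/9, 2/9, 1/9)
print(np.round(q_maxent, 3), cond_given_A2_covered(q_maxent))
```

Under the small-support strategy the observation of $A_2$ pins down the return leg completely, whereas under the max-entropy implementation the attacker remains uncertain between $B_1$ and $B_2$, which is exactly the contrast the example is meant to illustrate.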
In particular, these algorithms save running time by generating small-support mixed strategies which tends to introduce strong correlation among flights and make the strategies vul- nerable to leakage. This seems to create a dilemma between time efficiency and robustness to leakage. The next two chapters will illustrate how we can overcome such a dilemma. In this particular example, one possible way of overcoming the vulnerability to information leakage is to design a different schedule distribution that achieves the same marginal protection probabilities but has much less correlation among flights. The right panel of Figure 7.3 depicts one such implementation. It is easy to verify that the distribution implements the given marginal probabilities. However, it has much less correlation among flights. For example, even if the attacker knows that flightA 2 is protected, he is still uncertain whetherB 1 orB 2 will be protected later since both the orange and blue trip are possible. We note that the distribution of the air marshal’s schedule here is the max-entropy distribution subject to achieving the given marginal distributions. As we will see, the max-entropy distribution turns out to be a natural and useful choice in the presence of information leakage. Remark. One might wonder how much information the attacker needs in order to infer the protection status of a flight from the status of another. In particular, would this require the attacker to know the whole pure strategy — i.e., the probability of each pure strategy, which may be unrealistic for the attacker to know? For such inference, it is enough for the attacker to just know the correlation of the protection status for each pair of flights. In fact, if the attacker has a more specific idea about which particular target to observe and which to attack, he would only need to know the correlation among these two flights. 103 7.2 Motivating Example II: Information Leakage in Patrol Route Design Our second example considers the design of randomized patrol routes for rangers in order to combat poachers’ poaching activity. Information leakage is also a very important concern in this setting due to the poacher’s partial observation of rangers’ patrol routes. There have been many works optimizing the design of randomized patrol routes under differ- ent game settings and player rationality models (Fang et al., 2015; Nguyen, Delle Fave, Kar, Lak- shminarayanan, Yadav, Tambe, Agmon, Plumptre, Driciru, Wanyama, et al., 2015; Fang et al., 2016a). The essential charge of all these works — regardless of which model or algorithm is adopted — is to create unpredictability via randomization. However, despite te fact that there are usually a huge number of patrol routes to choose from, most efficient algorithms tend to ran- domize over as few routes as possible, as we illustrated in Section 7.1. Such small-support mixed strategies usually result in high correlation among targets. This opens the door for the poacher to use his partial observation to infer the rangers’ upcoming patrol directions. In fact, such an issue of information leakage has been a widely known concern in wildlife conservation (Nyirenda & Chomba, 2012; Moreto, 2013). To be more concrete, we now give an example illustrating how the attacker can first observe the protection status of a single target and then utilize correlations to infer the protection of other targets. To do so, the attacker does not even need full knowledge of the defender’s mixed strategy. 
Instead, he only needs to know the correlations between the observed targets and his own targets.
Consider the problem of designing rangers' patrol routes within a fixed time period, say, a day. This is usually modeled by discretizing the area into cells as well as discretizing time. At the top of Figure 7.4, we depict a concrete example with 4 cells to be protected at 3 time layers: morning, noon and afternoon. The numbers around each cell are the desired marginal coverage probabilities for each cell at each time. For simplicity, we assume that as time goes by, a ranger can move from any cell to any other one without constraints. The defender has 2 rangers, and seeks to randomize their patrolling to achieve the required marginal probabilities.
[Figure 7.4: An example with four cells to be protected within three time layers.]
[Figure 7.5: One mixed strategy that implements the marginals in Figure 7.4.]
Naturally, there are many different ways to implement these marginal probabilities. As we mentioned before, classic algorithms strive to compute mixed strategies of small support. In this example, previous algorithms tend to output the mixed strategy depicted in Figure 7.5, which randomizes over three pure strategies. Unfortunately, such an implementation is extremely vulnerable to the attacker's partial surveillance. For example, if the attacker can surveil the protection status of the bottom cell in the morning (i.e., the one with gray color and dashed boundary in Figure 7.4) and prepare an attack in the afternoon, he can always find a completely uncovered cell to attack. Specifically, if the dashed cell is covered, this means the third strategy in Figure 7.5 is deployed, and the attacker can find two completely uncovered cells in the afternoon; otherwise, either the first or the second strategy is deployed, and thus the second-from-the-top cell will be uncovered in the afternoon for sure. To sum up, the attacker can successfully identify uncovered cells in the afternoon by monitoring only one cell in the morning.
7.3 The Curse of Correlation in Security Games
The issue illustrated in the previous motivating examples is due to the inherent correlation among the protection statuses of different targets when allocating a limited number of resources: the coverage of some targets must imply that some other targets are unprotected. These examples show how the attacker can take advantage of such correlation and infer a significant amount of information about the protection of other targets by monitoring even a single target. This is what we term the Curse of Correlation (CoC) in security games. The following proposition captures this phenomenon in a more formal sense.
We begin with some notation. Recall that any mixed strategy $p$ is a distribution over the set $\mathcal{E}$ of pure strategies. Equivalently, we can view a mixed strategy as a random binary vector $X = (X_1, \dots, X_n) \in \{0,1\}^n$ satisfying $\Pr(X = e) = p_e$. Here, $X_i \in \{0,1\}$ denotes the random protection status of target $i$, and $\Pr(X_i = 1) = x_i$ is its marginal coverage probability. For any $X_i$, let $H(X_i) = -x_i \log x_i - (1 - x_i)\log(1 - x_i)$ denote its Shannon entropy. Note that the $X_i$'s are correlated. Let $X_i \mid X_k$ denote $X_i$ conditioned on $X_k$, and let $H(X_i \mid X_k)$ denote its conditional entropy. We say that target $k$ is trivial if $\Pr(X_k = 1) = 0$ or $1$; otherwise, $k$ is non-trivial. Obviously, the attacker infers no information about other targets by monitoring a trivial target.
The following proposition shows that if any non-trivial target $k$ is monitored, the attacker can always infer information about the protection of other targets.
Proposition 10. For any non-trivial target $k$, we have $\mathbb{E}_{X_k}\big[\sum_{i\ne k} H(X_i \mid X_k)\big] < \sum_{i\ne k} H(X_i)$.
Proof. Let $\Pr(X_i = 1 \mid X_k = 1) = x^1_i$ and $\Pr(X_i = 1 \mid X_k = 0) = x^0_i$. Since $\mathbb{E}_{X_k}[\Pr(X_i = 1 \mid X_k)] = \Pr(X_i = 1) = x_i$, we have $x_i = x^1_i \Pr(X_k = 1) + x^0_i \Pr(X_k = 0)$. Since $H(X_i)$ is strictly concave with respect to $x_i$, we have
$\mathbb{E}_{X_k} H(X_i \mid X_k) \le H(X_i).$
Summing over all $i \ne k$, we get $\sum_{i\ne k} \mathbb{E}_{X_k} H(X_i \mid X_k) \le \sum_{i\ne k} H(X_i)$. We now argue that equality cannot hold, and prove it by contradiction. Note that target $k$ is non-trivial; therefore $\Pr(X_k = 1) \ne 0, 1$. Since the $H(X_i)$'s are strictly concave, if equality held, then we would have $\Pr(X_i = 1 \mid X_k = 1) = \Pr(X_i = 1 \mid X_k = 0) = x_i$ for every $i \ne k$. However, this is impossible unless target $k$ is trivial, i.e., $k$ is either always protected or never protected: otherwise, since all security resources are fully used, there must exist some $j \ne k$ whose marginal coverage probability differs between the case that $k$ is protected and the case that $k$ is not protected. This contradicts the assumption that $k$ is non-trivial.
Proposition 10 shows that, conditioned on $X_k$, the entropy sum of all other $X_i$'s strictly decreases in expectation. Note that this holds regardless of whether the security resources have scheduling constraints or not. This illustrates that the correlations among targets are intrinsic and inevitable.
1 Here we assume that all resources are fully used, and do not consider the (unreasonable) situations in which certain security resources are sometimes underused or idle.
Chapter 8
The Algorithmic Foundation for Dealing with Information Leakage
Most security games assume that the attacker only knows the defender's mixed strategy, but is not able to observe (even partially) the instantiated pure strategy. This fails to capture cases where the attacker conducts real-time surveillance and may obtain a partial observation of the deployed pure strategy. Despite its potential presence in reality, as illustrated in Chapter 7, this issue, which we refer to as information leakage, has not been paid much attention in the literature on security games. In this chapter, we propose two natural models of security games with information leakage, depending on how much the defender knows about the leakage situation. We then undertake an algorithmic study of the problem of computing the optimal defender strategy under leakage, and focus on perhaps the most basic setting: zero-sum security games with no scheduling constraints. We first describe an exponential-size LP formulation to compute the defender's optimal strategy against leakage, and then exhibit evidence of computational intractability for the model. This shows the intrinsic difficulty of handling leakage. We then tackle the problem from two different angles: developing polynomial-time algorithms for restricted settings and designing algorithms with approximation guarantees.
8.1 Information Leakage in Security Games – Two Basic Models
To the best of our knowledge, there has not been any previous study of security games with information leakage. Therefore, for simplicity, we start with a basic model where information leaks from only one target, though our model and algorithms can be generalized. For our algorithmic analysis in this chapter, we will focus on the simple security game setting where the defender allocates $k$ resources to protect $n$ targets without any scheduling constraint.
Such models have applications in real security systems like ARMOR for the LAX airport and GUARDS for port patrolling (Tambe, 2011).
Consider a standard zero-sum Stackelberg security game with a defender and an attacker. The defender allocates $k$ security resources to protect $n$ targets, denoted by the set $[n] = \{1, 2, \dots, n\}$. In this section we focus on the case where the security resources do not have scheduling constraints. As a result, any subset of $[n]$ with cardinality at most $k$ is a defender pure strategy. For any $i\in[n]$, let $r_i$ be the reward [$c_i$ be the cost] of the defender when the attacked target $i$ is protected [unprotected]. Since the game is zero-sum, the attacker's utility is the negation of the defender's utility. Following the notation in Section 2.2, we still use $e\in\{0,1\}^n$ to denote a pure strategy and $\mathcal{E}$ to denote the set of all pure strategies. Recall that we may also view $e$ as a subset of $[n]$, denoting the protected targets; the intended interpretation should be clear from context. The support of a mixed strategy is the set of pure strategies with non-zero probability.
Without information leakage, the problem of computing the defender's optimal mixed strategy can be compactly formulated as linear program (8.1), with each variable $x_i$ being the marginal probability of covering target $i$. Any feasible marginal vector $\vec{x}$ can be efficiently implemented as a distribution over pure strategies, e.g., by Comb Sampling (Tsai, Yin, Kwak, Kempe, Kiekintveld, & Tambe, 2010).
maximize  $u$
subject to  $u \le r_i x_i + c_i (1 - x_i)$,  for $i\in[n]$;
  $\sum_{i\in[n]} x_i \le k$;
  $0 \le x_i \le 1$,  for $i\in[n]$.        (8.1)
Building on this basic security game, our model goes one step further and considers the possibility that the protection status of one target leaks to the attacker. Here, by "protection status" we mean whether this target is protected or not in an instantiation of the mixed strategy. We consider two basic models of information leakage.
8.1.1 Adversarial Leakage
In the ADversarial Information Leakage (ADIL) model, parameterized by a probability parameter $p_0 \in [0, 1]$, we assume that with probability $1 - p_0$ one adversarially chosen target leaks information, and otherwise no target leaks information. Our goal then is to compute the optimal defender strategy assuming such an adversarially chosen leaking target. This model captures the case where the attacker strategically chooses a target for surveillance and, with a certain probability, succeeds in observing the protection status of the surveilled target. In practice, the parameter $p_0$ can be estimated by domain experts. This model is suitable for the situation where the defender does not know much about the leakage situation, except that some target may leak information. The model then takes a robust perspective and optimizes against the worst case.
8.1.2 Probabilistic Leakage
In the PRobabilistic Information Leakage (PRIL) model, parameterized by probabilities $p_i \ge 0$ for $i = 0, 1, \dots, n$, we assume that the leaking target is $i$ with probability $p_i$, and that with probability $p_0$ no target leaks information. Therefore, these probabilities satisfy $p_0 + \sum_{i=1}^{n} p_i = 1$, i.e., $\vec{p} = (p_0, p_1, \dots, p_n) \in \Delta_{n+1}$, where $\Delta_{n+1}$ is the $(n+1)$-dimensional simplex. In practice, the vector $\vec{p}$ is usually given by domain experts and may be determined by the nature or properties of the targets. This model requires the defender to have much more knowledge about the game. It is suitable when the defender has a relatively good estimate of the leakage situation and such an estimate is summarized as a distribution over the leakage probabilities of targets.
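Both leakage models can be made concrete with a small evaluation routine: given a mixed strategy, compute the defender's expected utility when a particular target's protection status may leak. The sketch below (helper names are ours) does this for the PRIL model by conditioning on the leaking target's status; as a check, it reproduces the value $-4/3$ of the small-support implementation analyzed in the example of Section 8.2 below.

```python
import numpy as np

def pril_defender_utility(pure_strats, probs, r, c, p_leak):
    """Evaluate a given mixed strategy under the PRIL model.
    p_leak = (p_0, p_1, ..., p_n); with prob. p_i the attacker observes the
    protection status of target i and then attacks a best target.  Zero-sum,
    so the attacker minimizes the defender's expected utility."""
    n = len(r)
    X = np.array([[1.0 if t in e else 0.0 for t in range(n)] for e in pure_strats])
    probs = np.asarray(probs, dtype=float)

    def best_attack_value(weights):
        """Defender value (scaled by the probability of the conditioning event)
        when the attacker best-responds to the conditional coverage implied by
        `weights` over pure strategies."""
        total = weights.sum()
        if total <= 0:
            return 0.0
        cov = (weights @ X) / total                       # conditional marginals
        return total * np.min(r * cov + c * (1.0 - cov))  # attacker picks the worst target

    value = p_leak[0] * best_attack_value(probs)          # no-leakage term
    for i in range(n):                                    # target i leaks
        covered = probs * X[:, i]
        uncovered = probs * (1.0 - X[:, i])
        value += p_leak[i + 1] * (best_attack_value(covered) +
                                  best_attack_value(uncovered))
    return value

# The 4-target, 2-resource example of Section 8.2 (rewards/costs as in the text):
r = np.array([1.0, 1.0, 2.0, 2.0]); c = np.array([-2.0, -2.0, -1.0, -1.0])
strats = [{0, 1}, {2, 3}]; probs = [2/3, 1/3]
print(pril_defender_utility(strats, probs, r, c, [0, 1, 0, 0, 0]))   # -4/3
```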
8.2 Complexity Barriers to Computing the Optimal Strategy
Given either leakage model, PRIL parameterized by $\vec{p}\in\Delta_{n+1}$ or ADIL parameterized by $p_0$, we are interested in computing the optimal defender mixed strategy. Recall that we focus on zero-sum security games. To see what an optimal strategy looks like under a particular leakage model, and what kind of information we need to keep track of, we start with a simple illustrative example.
Consider a zero-sum security game with 4 targets and 2 resources. The profile of rewards $r_i$ [costs $c_i$] is $\vec{r} = (1, 1, 2, 2)$ [$\vec{c} = (-2, -2, -1, -1)$], where the coordinates are indexed by target ids. If there is no information leakage, it is easy to see that the optimal marginal coverage is $\vec{x} = (\frac{2}{3}, \frac{2}{3}, \frac{1}{3}, \frac{1}{3})$. The attacker will attack an arbitrary target, resulting in a defender utility of $0$.
Now let us consider a simple case of information leakage. Assume the attacker observes whether target 1 is protected or not in any instantiation of the mixed strategy, i.e., $p_1 = 1$. As we will argue, how the marginal probability $\vec{x}$ is implemented matters now. One way to implement $\vec{x}$ is to protect targets $\{1, 2\}$ with probability $\frac{2}{3}$ and protect $\{3, 4\}$ with probability $\frac{1}{3}$. However, this implementation is "fragile" in the presence of the above information leakage. In particular, if the attacker observes that target 1 is protected (which occurs with probability $\frac{2}{3}$), he infers that the defender is protecting targets $\{1, 2\}$ and will attack target 3 or 4, resulting in a defender utility of $-1$; if target 1 is not protected, the attacker will simply attack it, resulting in a defender utility of $-2$. Therefore, the defender gets expected utility $-\frac{4}{3}$.
Now consider another way to implement the same marginals $\vec{x}$, namely the following mixed strategy:
Pure strategy:   $\{1,2\}$   $\{1,3\}$   $\{1,4\}$   $\{2,3\}$   $\{2,4\}$   $\{3,4\}$
Probability:     $10/27$     $4/27$      $4/27$      $4/27$      $4/27$      $1/27$
These are the questions we aim to answer in this section. 8.2.1 An Exponential-Size LP Formulation and Evidence of Hardness We will focus on the PRIL model. The formulation for the ADIL model will be provided at the end of this section since it admits a similar derivation. Fixing the defender’s mixed strategy, letT i (:T i ) denote the event that targeti is protected (unprotected). For the PRIL model, the defender’s utility equals DefU =p 0 u + n X i=1 p i (u i +v i ) whereu = min j [r j Pr(T j ) +c j Pr(:T j )] is the defender’s utility when there is no information leakage; and u i = Pr(T i ) min j [r j Pr(T j jT i ) +c j Pr(:T j jT i )] = min j [r j Pr(T j ;T i ) +c j Pr(:T j ;T i )] is the defender’s utility when targeti leaks out its protection status asT i (i.e., protected) multiplied by probabilityPr(T i ). Similarly v i = min j [r j Pr(T j ;:T i ) +c j Pr(:T j ;:T i )] is the defender’s expected utility multiplied by probability Pr(:T i ) when target i leaks status :T i (i.e., unprotected). 111 Define variablesx ij =Pr(T i ;T j ) (settingx ii =Pr(T i )). Using the fact thatPr(T i ;:T j ) = x ii x ij andPr(:T i ;:T j ) = 1x ii x jj +x ij , we obtain the following linear program which computes the defender’s optimal patrolling strategy: maximize p 0 u + P n i=1 p i (u i +v i ) subject to ur j x jj +c j (1x jj ); forj2 [n]: u i r j x ij +c j (x ii x ij ); fori;j2 [n]: v i r j (x jj x ij ) +c j (1x ii x jj +x ij ); fori;j2 [n]: x ij = P e:i;j2e e ; fori;j2 [n]: P e2E e = 1 e 0; fore2E: (8.2) whereu;u i ;v i ;x ij ; e are variables;e denotes a pure strategy and the sum condition “e :i;j2 e” means summing over all the pure strategies that protect both targetsi andj (ori ifi =j); e denotes the probability of choosing strategye. Unfortunately, LP (8.2) suffers from an exponential explosion of variables, specifically, e . From the sake of computational efficiency, one natural idea is to find a compact representation of the defender’s mixed strategy. As suggested by LP (8.2), the variables x ij , indicating the probability that targetsi;j are both protected, are sufficient to describe the defender’s objective and the attacker’s incentive constraints. Let us call the variablesx ij the pairwise marginals and think of them as a matrixX2R nn , i.e., thei’th row andj’th column ofX isx ij (not to be confused with the marginals~ x). We say X is feasible if there exists a mixed strategy, i.e., a distribution over pure strategies, that achieves the pair-wise marginalsX. Clearly, not allX2R nn are feasible. LetP(n;k)R nn be the set of all feasibleX. The following lemma shows a structural property ofP(n;k). Lemma 15.P(n;k) is a polytope and anyX2P(n;k) is a symmetric positive semi-definite (PSD) matrix. Proof. Notice thatX is feasible if and only if there exists e for any pure strategye such that the following linear constraints hold: x ij = P e:i;j2e e ; fori;j2 [n]: P e2E e = 1 e 0; fore2E: (8.3) These constraints define a polytope for variables (X; ~ ). Therefore, its projection to the lower dimensionX, which is preciselyP(n;k), is also a polytope. To proveX2P(n;k) is PSD, we first observe that any vertex ofP(n;k), characterizing a pure strategy, is PSD. In fact, lete2f0; 1g n be any pure strategy. Then the pair-wise marginal 112 w.r.t. e isX e =ee T , which is PSD. Therefore, anyX2P, which is a convex combination of its vertices, is also PSD. 
maximize p 0 u + P n i=1 p i (u i +v i ) subject to ur j x jj +c j (1x jj ); forj2 [n]: u i r j x ij +c j (x ii x ij ); fori;j2 [n]: v i r j (x jj x ij ) +c j (1x ii x jj +x ij ); fori;j2 [n]: X2P(n;k) (8.4) With Lemma 15, we may re-write LP (8.2) compactly as LP (8.4) with variables u, u i , v i andX. Therefore, we would be able to compute the optimal strategy in polynomial time if there are only polynomially many constraints determining the polytopeP(n;k) — recall that this is the approach we took with LP (8.1) in the case of no information leakage. Unfortunately, the following lemma rules out the approach of using compact representations of polytopes (unless P = NP). Lemma 16. Optimizing overP(n;k) is NP-hard. Proof. We prove the lemma by reduction from the densestk-subgraph problem. Given any graph instanceG = (V;E), letA be the adjacency matrix ofG. Consider the following linear program: maximize P i;j2[n] A ij x ij subject to X2P(n;k): (8.5) This linear program must have a vertex optimal solutionX which satisfiesX =ee T for some pure strategye2f0; 1g n . Therefore, the linear objective satisfies X i;j2[n] A ij x ij =tr(AX ) =tr(Aee T ) =tr(e T Ae) =e T Ae: Notice thate T Ae=2k equals the density of a subgraph ofG withk nodes indicated bye. Since X is the optimal solution to LP (8.5), it also maximizes the densitye T Ae=2k over all subgraphs withk nodes. In other words, the ability to optimize LP (8.5) implies the ability to compute the densestk-subgraph, which is NP-hard. Therefore, optimizing overP(n;k) is NP-hard. Lemma 16 suggests that there is no hope of finding polynomially many linear constraints which determineP(n;k) or, more generally, an efficient separation oracle forP(n;k), assuming P6=NP. In fact,P(n;k) is closely related to a fundamental geometric object, known as the corre- lation polytope, which has applications in quantum mechanics, statistics, machine learning and combinatorial problems. The following is a formal definition of the correlation polytope. 113 Definition 4. (Pitowsky, 1991) Given an integern, the Correlation PolytopeP(n) is defined as follows P(n) =Conv fvv T :v2f0; 1g n g : where Conv(S) denotes the convex hull of set S. Notice that vv T 2 f0; 1g nn for all v 2 f0; 1g n . The following proposition shows an interesting connection betweenP(n;k) and the correla- tion polytopeP(n). Proposition 11. X2P(n;k) if and only if the following three constraints hold: (a)X2P(n); (b)tr(X) = P n i=1 x ii = k; and (c)sum(X) = P n i;j=1 x ij = k 2 . In other words,P(n;k) is decided byP(n) with two additional linear constraints. Proof. We show that, given X 2 P(n), if X satisfies the following two linear constraints: tr(X) =k; sum(X) =k 2 , thenX2P(n;k). Since X 2P(n), there exist X i 2P(n;i) and p i 0, such that X = P n i=1 p i X i and P n i=1 p i = 1. That is, X is a convex combination of elements from eachP(n;i). Notice that 8X i 2P(n;i), we have tr(X i ) = i and sum(X i ) = i 2 , since any vertex ofP(n;i) satisfies these constraints. LetX2P(n;k). Then we have: (i) :1 = n X i=1 p i (ii) :k =tr(X) = n X i=1 p i tr(X i ) = n X i=1 p i i (iii) :k 2 =sum(X) = n X i=1 p i sum(X i ) = n X i=1 p i i 2 By the Cauchy-Schwarz inequality, we have ( P n i=1 p i )( P n i=1 p i i 2 ) ( P n i=1 p i i) 2 . Plugging the above three equations into the Cauchy-Schwarz inequality yields that the equality holds. The condition of equality for the Cauchy-Schwarz inequality is thatp i i 2 =p i is a constant for alli, such thatp i 6= 0. 
This shows that there is only one non-zero element among the $p_i$'s, namely $p_k = 1$. Therefore, $X \in \mathcal{P}(n,k)$.

We note that (Pitowsky, 1991) defines correlation polytopes in a more general fashion, and our definition of $\mathcal{P}(n)$ is in fact an important special case of the correlation polytope, called the full correlation polytope. (Pitowsky, 1991) proved that checking membership in the polytope $\mathcal{P}(n)$ is NP-complete.

Remark. Though Lemma 16 does not directly imply the NP-hardness of computing the optimal defender strategy (i.e., solving LP (8.4)), it serves as strong evidence. In (Xu, 2016), it was proved that for standard security games with no information leakage, the NP-hardness of optimizing over the polytope of marginal probabilities implies the NP-hardness of computing the optimal defender strategy. We thus view Lemma 16 as an indicator of the difficulty of computing the optimal defender strategy, though we remark that whether such an implication holds rigorously in the setting with information leakage is an interesting open problem.

8.2.2 The Dual Program and Evidence of Hardness

Another popular approach for computing the optimal defender strategy in security games is the technique of column generation, a master/slave decomposition of an optimization problem (Tambe, 2011; Jain, Korzhyk, Vaněk, Conitzer, Pěchouček, & Tambe, 2011). The essential part of this approach is the slave problem (Jain et al., 2010). Next we show that this approach will not work either: we first derive the slave problem and then show that it is NP-hard to solve.

The slave problem is the key subproblem when solving security games with a large number of pure strategies via column generation. Any algorithm for solving the slave problem is also called a "defender oracle" by convention (Jain et al., 2010). We now derive the formulation of the slave problem in the PRIL model.

Recall that LP (8.2) has a large number of variables because the number of pure strategies is exponential. However, by counting the number of constraints that are active at optimality, we know that only polynomially many of these pure strategies have non-zero probability at optimality, since most pure strategies make the corresponding constraint $\theta_e \ge 0$ tight and thus take probability 0. Column generation is based on this observation, i.e., the optimal mixed strategy has small support. Basically, instead of solving LP (8.2) over the set $E$ of all pure strategies, it starts from a small subset of pure strategies, denoted $A$, and solves the following "restricted" LP:

  maximize   $p_0 u + \sum_{i=1}^{n} p_i (u_i + v_i)$
  subject to $u \le r_j x_{jj} + c_j (1 - x_{jj})$,  for $j \in [n]$
             $u_i \le r_j x_{ij} + c_j (x_{ii} - x_{ij})$,  for $i, j \in [n]$
             $v_i \le r_j (x_{jj} - x_{ij}) + c_j (1 - x_{ii} - x_{jj} + x_{ij})$,  for $i, j \in [n]$
             $x_{ij} = \sum_{e \in A:\, i,j \in e} \theta_e$,  for $i, j \in [n]$
             $\sum_{e \in A} \theta_e = 1$
             $\theta_e \ge 0$,  for $e \in A$                                       (8.6)

Notice that the only difference between LP (8.2) and LP (8.6) is that the set $E$ of all pure strategies is replaced by a small subset $A$. In practice, $A$ is usually initialized with a small number of arbitrarily chosen pure strategies. Column generation proceeds roughly as follows: (1) solve LP (8.6); (2) by checking the dual of LP (8.6), the defender oracle judges whether the computed optimal solution for LP (8.6) is also optimal for LP (8.2) (assigning all pure strategies in $E \setminus A$ probability 0); if not, the oracle finds a new pure strategy to be added to the set $A$ and updates $A$.
This procedure continues until the defender oracle asserts that the computed optimal solution w.r.t. the current $A$ is also optimal for LP (8.2). We now explain the underlying rationale of the column generation technique.

We first derive the dual of LP (8.6). To emphasize the key aspects and avoid messy derivations, we rewrite LP (8.6) in the following abstract form:

  maximize   $d^T y$
  subject to $M x + N y \le c$
             $x_{ij} - \sum_{e \in A:\, i,j \in e} \theta_e = 0$,  for $i, j \in [n]$
             $\sum_{e \in A} \theta_e = 1$
             $\theta_e \ge 0$,  for $e \in A$                                       (8.7)

where the variable $y$ is the vector consisting of $u, u_i, v_i$, while the variable $x$ is the vector representation of the $x_{ij}$ (putting the pairs $i,j$ in some fixed order); $d$ is a vector summarizing the original objective coefficients; and the constraints $M x + N y \le c$ summarize the first three sets of constraints in LP (8.6). This abstract form not only simplifies our derivation of the dual; more importantly, it emphasizes that the column generation technique works regardless of what the first three sets of constraints are, as long as there are polynomially many of them.

Let $M_{\mathrm{index}(i,j)}$ be the column of $M$ corresponding to variable $x_{ij}$, and let $N_l$ be the column of $N$ corresponding to the $l$'th component of $y$. We can now derive the dual of LP (8.7) as follows:

  minimize   $c^T \lambda + \omega$
  subject to $\lambda^T N_l \ge d_l$,  for all $l$
             $\lambda^T M_{\mathrm{index}(i,j)} + \gamma_{ij} \ge 0$,  for $i, j \in [n]$
             $\sum_{i,j \in e} \gamma_{ij} \le \omega$,  for $e \in A$
             $\lambda \ge 0$                                                        (8.8)

where $\lambda$ are the dual variables w.r.t. the first set of constraints in LP (8.7), and $\gamma_{ij}$, $\omega$ are the dual variables w.r.t. the second and third sets of constraints, respectively.

Note that the optimal solution to LP (8.7) (denoted $\mathrm{OptSol}_A$) and the optimal solution to LP (8.8) (denoted $\mathrm{OptSolDual}_A$) can both be computed efficiently when $A$ is small. A key observation is that, if $\mathrm{OptSolDual}_A$ — in particular, $\omega$ and the $\gamma_{ij}$'s — happens to make the constraints $\sum_{i,j \in e} \gamma_{ij} \le \omega$, $\forall e \in A$, hold more generally as $\sum_{i,j \in e} \gamma_{ij} \le \omega$, $\forall e \in E$, then we claim that $\mathrm{OptSol}_A$ is also an optimal solution to LP (8.2) (by giving the pure strategies in $E \setminus A$ probability 0). This is because, if we replace $A$ by $E$ in both LP (8.7) and LP (8.8), $\mathrm{OptSol}_A$ is still feasible for LP (8.7) because all the newly added strategies (in $E \setminus A$) have probability 0, and $\mathrm{OptSolDual}_A$ is still feasible for LP (8.8) because our $\omega, \gamma_{ij}$ make the constraints $\sum_{i,j \in e} \gamma_{ij} \le \omega$ hold for all $e \in E$ by assumption. Furthermore, complementary slackness still holds since the newly added variables in LP (8.7) all take value 0. By linear programming basics, $\mathrm{OptSol}_A$ is still optimal if we replace $A$ in LP (8.7) by $E$, which is precisely LP (8.2).

As a result, our key task is to judge whether $\sum_{i,j \in e} \gamma_{ij} \le \omega$ holds for all $e \in E$ for a given dual solution. This is equivalent to deciding whether $\omega \ge \max_{e \in E} \big[\sum_{i,j \in e} \gamma_{ij}\big]$. Therefore, the slave problem is defined as follows.

Slave Problem: For any given weights $\gamma_{ij}$, solve the following maximization problem:

  $\max_{e \in E} \Big[\sum_{i,j \in e} \gamma_{ij}\Big] = \max_{e \in E}\; s_e^T \Big(\frac{M_\gamma + M_\gamma^T}{2}\Big) s_e$    (8.9)

where $M_\gamma$ is the matrix satisfying $(M_\gamma)_{ij} = \gamma_{ij}$ and $s_e \in \{0,1\}^n$ is the indicator vector of $e$. In other words, the defender oracle finds a pure strategy $e$ that maximizes the sum $\sum_{i,j \in e} \gamma_{ij}$.

Recall that any algorithm that solves the slave problem is called a defender oracle. With this oracle, column generation proceeds as follows: (1) solve LP (8.7) and LP (8.8); (2) use the defender oracle to solve Problem (8.9): if the optimal value is less than or equal to the dual variable $\omega$, assert optimality; otherwise, add $e^*$ — the optimal solution to Problem (8.9) — to $A$; (3) repeat until optimality is reached. Notice that the newly added $e^*$ does not belong to the original $A$, because every $e \in A$ satisfies $\sum_{i,j \in e} \gamma_{ij} \le \omega$.
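As a concrete, if naive, reference point for this loop — the function names are ours and the master-LP solve is omitted — the following Python sketch implements the slave problem (8.9) by brute-force enumeration over all size-k pure strategies, which is only practical for very small n, and uses it for the column-generation optimality check:

import numpy as np
from itertools import combinations

def slave_brute_force(Gamma, k):
    """max over |e| = k of sum_{i,j in e} Gamma[i][j], by exhaustive enumeration."""
    n = Gamma.shape[0]
    M = (Gamma + Gamma.T) / 2.0              # symmetrized matrix, as in Problem (8.9)
    best_val, best_e = -np.inf, None
    for e in combinations(range(n), k):
        s = np.zeros(n)
        s[list(e)] = 1.0                     # indicator vector of the pure strategy e
        val = float(s @ M @ s)               # equals sum_{i,j in e} Gamma[i][j]
        if val > best_val:
            best_val, best_e = val, e
    return best_val, best_e

def oracle_check(Gamma, omega, k, tol=1e-9):
    """One column-generation step: return a violated pure strategy to add to the
    restricted set, or None if the current restricted solution is already optimal."""
    val, e = slave_brute_force(Gamma, k)
    return e if val > omega + tol else None

# toy usage with stand-in dual values (in a real run these come from solving LP (8.8))
rng = np.random.default_rng(0)
Gamma = rng.normal(size=(6, 6))
print(oracle_check(Gamma, omega=1.0, k=3))

As Lemma 17 below indicates, this exponential enumeration cannot in general be replaced by a polynomial-time routine unless P = NP.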
Column generation does not guarantee polynomial convergence, but usually converges very fast in practice. This is because the optimal mixed strategy usually has small support. The following lemma shows that the slave problem is also NP-hard, and thus rules out the efficient implementation of the column generation approach for solving the problem. Lemma 17. The slave problem described above is NP-hard. Proof. The proof is similar to that of Lemma 16 by viewing the matrixM as an adjacency matrix of a graph. We omit the repetition here. By now, we have exhibited evidence of the hardness for solving LP (8.2) using either compact representation or the technique of column generation. For the ADIL model, a similar derivation 117 yields that the following LP formulation computes the optimal defender strategy. It is easy to verify that it shares its marginal probabilities and slave problem with the PRIL model. maximize p 0 u + (1p 0 )w subject to ur j x jj +c j (1x jj ); forj2 [n]: u i r j x ij +c j (x ii x ij ); fori;j2 [n]: v i r j (x jj x ij ) +c j (1x ii x jj +x ij ); fori;j2 [n]: wu i +v i ; fori2 [n]: X2P(n;k) (8.10) where variable w is the defender’s expected utility when an adversarially chosen target is ob- served by the attacker. LP (8.10) can also be abstractly written in the form of LP (8.7), and thus its slave problem is also NP-hard. 8.3 Provable Algorithms for Restricted Settings and Approximate Solutions The results in Section 8.2 suggest the difficulty of developing a polynomial-time algorithm to exactly solve security games with leakage. In this section, we seek to tackle this computational challenge by focusing on well-motivated special settings. 8.3.1 Leakage from Small Support Despite the hardness results for the general case, we show that the slave problem admits a polyno- mial time algorithm if the information only possibly leaks from a small subset of targets; we call this set the leakage support. By reordering the targets, we may assume without loss of generality that only the firstm targets, denoted by the set [m], could possibly leak information in both the PRIL and ADIL model. For the PRIL model, this meansp i = 0 for anyi>m and for the ADIL model, this means the attacker only chooses a target in [m] for surveillance. Why does this make the problem tractable? Intuitively the reason is as follows: when infor- mation leaks from a small set of targets, we only need to consider the correlations between these leaking targets and others, which is a much smaller set of variables than in LP (8.2) or (8.10). When restricted to a leakage support of sizem, a similar derivation as in Section 8.2.2 reveals that the slave problem is the follows. LetA be a symmetric matrix of the following block form Slave Problem with Leakage Support [m]: LetA be a symmetric matrix of the following block form A : " A mm A mm 0 A m 0 m A m 0 m 0 # (8.11) 118 where m 0 = nm; A mm 0 2 R mm 0 for any integers m;m 0 is a sub-matrix and, crucially, A m 0 m 0 is a diagonal matrix. GivenA of the form (8.11), find a pure strategye such thate T Ae is maximized. A defender oracle will identify the size-k principal submatrix with maximum entry sum for any A of form (8.11). Note that m = n in the general case. Next, we prove that the slave problem admits a polynomial time algorithm whenm is a constant. We start with some notation. LetA[i; :] be thei’th row of matrixA anddiag(A) be the vector consisting of the diagonal entries ofA. 
For any subsets $C_1, C_2$ of $[n]$, let $A_{C_1, C_2}$ be the submatrix of $A$ consisting of the rows in $C_1$ and the columns in $C_2$, and let $\mathrm{sum}(A_{C_1, C_2}) = \sum_{i \in C_1, j \in C_2} A_{ij}$ be the entry sum of $A_{C_1, C_2}$. The following lemma shows that Algorithm 6 solves the slave problem. The key insight is that for a pure strategy $e$ to be optimal, once the set $C = e \cap [m]$ is decided, its complement $\bar{C} = e \setminus C$ can be identified explicitly. Therefore we can simply brute-force search for the best $C \subseteq [m]$. Lemma 18 provides the algorithm's guarantee, which then yields polynomial-time solvability for the case of small $m$ (Theorem 8.3.1).

Lemma 18. Let $m$ be the size of the leakage support. Algorithm 6 solves the slave problem and runs in $\mathrm{poly}(n, k, 2^m)$ time. In particular, the slave problem admits a $\mathrm{poly}(n,k)$ time algorithm if $m$ is a constant.

Proof. First, it is easy to see that Algorithm 6 runs in $\mathrm{poly}(2^m, n, k)$ time, since the for-loop is executed at most $2^m$ times. We now show that it solves the slave problem. Let $e^*$ denote the index set of the principal submatrix of $A$ with maximum entry sum; notice that $e^*$ can also be viewed as a pure strategy. Let $C = e^* \cap [m]$ and $\bar{C} = e^* \setminus C$. We claim that, given $C$, $\bar{C}$ must be the set of indices of the largest $k - |C|$ values from the set $\{v_{m+1}, \ldots, v_n\}$, where the vector $\vec{v}$ is defined as $\vec{v} = 2 \sum_{i \in C} A[i,:] + \mathrm{diag}(A)$. In other words, once we know $C$, the set $\bar{C}$ can easily be identified. To prove the claim, we rewrite $\mathrm{sum}(A_{e^*, e^*})$ as follows (recall that $A_{\bar{C},\bar{C}}$ is a diagonal matrix, since $\bar{C} \subseteq [n] \setminus [m]$):

  $\mathrm{sum}(A_{e^*, e^*}) = \mathrm{sum}(A_{C,C}) + 2\,\mathrm{sum}(A_{C,\bar{C}}) + \mathrm{sum}(A_{\bar{C},\bar{C}})$
                $= \mathrm{sum}(A_{C,C}) + 2\,\mathrm{sum}(A_{C,\bar{C}}) + \mathrm{sum}(\mathrm{diag}(A_{\bar{C},\bar{C}}))$
                $= \mathrm{sum}(A_{C,C}) + \mathrm{sum}\big(2 \sum_{i \in C} A_{i,\bar{C}} + \mathrm{diag}(A_{\bar{C},\bar{C}})\big)$
                $= \mathrm{sum}(A_{C,C}) + \mathrm{sum}(v_{\bar{C}})$
                $= \mathrm{val}_C$,

where $\vec{v} = 2 \sum_{i \in C} A[i,:] + \mathrm{diag}(A)$ and $v_{\bar{C}}$ is the sub-vector of $\vec{v}$ with indices in $\bar{C}$. Given $C$, $\mathrm{sum}(A_{C,C})$ is fixed; therefore $\bar{C}$ must be the set of indices of the largest $k - |C|$ elements from $\{v_{m+1}, \ldots, v_n\}$. Algorithm 6 then loops over all possible $C \subseteq [m]$ ($2^m$ many) and identifies the optimal one, i.e., the one achieving the maximum $\mathrm{val}_C$.

Algorithm 6: Defender Oracle
Input: matrix $A$ of the form (8.11).
Output: a pure strategy $e$.
1: for all $C \subseteq [m]$ with $|C| \le k$ do
2:   $\vec{v} = 2 \sum_{i \in C} A[i,:] + \mathrm{diag}(A)$;
3:   choose the largest $k - |C|$ values from the set $\{v_{m+1}, \ldots, v_n\}$, and denote the set of their indices by $\bar{C}$;
4:   set $\mathrm{val}_C = \mathrm{sum}(A_{C,C}) + \mathrm{sum}(v_{\bar{C}})$;
5: return the pure strategy $e = C \cup \bar{C}$ with maximum $\mathrm{val}_C$.

Utilizing Lemma 18, we can prove the following theorem.

Theorem 8.3.1 (Polynomial Solvability). There is a $\mathrm{poly}(n,k)$ time algorithm which computes the optimal defender strategy in the PRIL and ADIL models, if the size $m$ of the leakage support is a constant.

Proof. We prove that LP (8.7) (which is really LP (8.4) written abstractly) can be solved in polynomial time. In fact, we prove that its dual program can be solved in polynomial time, which then implies that the primal LP (8.7) can be solved in polynomial time due to complementary slackness (Grötschel et al., 1988). Since the leaking target can only be from $[m]$, LP (8.7) only has variables $x_{ij}$ with $i \in [m]$ or $j \in [m]$ or $i = j$. As a result, the dual LP (8.8) only has variables $\gamma_{ij}$ with $i \in [m]$ or $j \in [m]$ or $i = j$, so the weight matrix $M_\gamma$ has precisely the block form (8.11) required by the slave problem with small leakage support $[m]$. This implies that the polynomial-time defender oracle (Algorithm 6) can be used to efficiently check whether any of the constraints $\sum_{i,j \in e} \gamma_{ij} \le \omega$ is violated. All other (polynomially many) constraints can be evaluated explicitly. Therefore, an efficient defender oracle gives rise to an efficient separation oracle for the feasible region of the dual LP (8.8). As a result, we can solve the dual program in polynomial time, which concludes the proof.
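For completeness, here is a direct Python transcription of Algorithm 6 (a sketch of ours, assuming numpy and 0-based indices, so the leakage support $[m]$ becomes $\{0, \ldots, m-1\}$):

import numpy as np

def defender_oracle_small_support(A, k, m):
    """Algorithm 6: A is symmetric of block form (8.11), i.e., its lower-right
    (n-m) x (n-m) block is diagonal. Returns a size-k index set maximizing sum(A[e, e])."""
    n = A.shape[0]
    diag = np.diag(A)
    rest = np.arange(m, n)                        # candidate indices for C-bar
    best_val, best_e = -np.inf, None
    for mask in range(1 << m):                    # enumerate all C subseteq [m]
        C = [i for i in range(m) if (mask >> i) & 1]
        if len(C) > k or k - len(C) > n - m:
            continue
        v = 2 * A[C, :].sum(axis=0) + diag        # v = 2 * sum_{i in C} A[i, :] + diag(A)
        Cbar = rest[np.argsort(v[rest])[::-1][:k - len(C)]]
        val = A[np.ix_(C, C)].sum() + v[Cbar].sum()   # val_C = sum(A_{C,C}) + sum(v_{C-bar})
        if val > best_val:
            best_val, best_e = val, sorted(C) + sorted(Cbar.tolist())
    return best_e, best_val

If the returned value exceeds the dual variable $\omega$, the returned index set is a violated column to add to the restricted strategy set in the column-generation loop of Section 8.2.2.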
8.3.2 An Approximation Algorithm

We now consider approximation algorithms. Recall that information leakage is harmful due to the correlation between targets. Thus one natural way to mitigate leakage is to allocate each resource independently according to some distribution. The normalized marginal $\vec{x}/k$ is a natural choice, where $\vec{x}$ is the solution to LP (8.1). To avoid the waste of using multiple resources to protect the same target, we sample without replacement. Formally, the independent sampling without replacement (IndepSamp) algorithm proceeds as follows: (1) compute the optimal solution $\vec{x}$ of LP (8.1); (2) independently sample $k$ elements from $[n]$ without replacement using the distribution $\vec{x}/k$.

Since players may have positive or negative utilities in zero-sum games, a multiplicative approximation ratio in terms of utility is not meaningful. To analyze the performance of this algorithm, we instead consider the "coverage-match" criterion — i.e., how many more security resources are needed in order to achieve the same coverage level as in the case of no leakage? More formally, we say that an algorithm is an $\alpha$-approximation ($\alpha \ge 1$) if, using at most $\alpha k$ resources, the protection statuses $T_1, \ldots, T_n$ it induces satisfy, for any $i$, $\Pr(T_i) \ge x_i$, $\Pr(T_i \mid T_j) \ge x_i$ and $\Pr(T_i \mid \neg T_j) \ge x_i$ for any target $j$ that possibly leaks information.¹ This guarantees that the marginal protection probability of any target $i$ is at least $x_i$ — i.e., $i$'s protection probability in the SSE with no leakage — conditioned on the status of any target $j$ with possible leakage.

Theorem 8.3.2 shows that IndepSamp, with a slight modification,² is roughly an $\frac{e}{e-1}$-approximation under the aforementioned coverage-match criterion for both the PRIL and ADIL models.

Theorem 8.3.2. Under the coverage-match criterion, there is an $\frac{e}{\,e - \frac{k-1}{k-2}\,}$-approximation algorithm for both the PRIL and ADIL models.

Proof of Theorem 8.3.2. Let $Y = Y(\vec{x}) \in \mathbb{R}^{n \times n}$ be a function of any $\vec{x} \in \mathbb{R}^n$, where $y_{ij}$ is the probability that targets $i, j$ are both protected under IndepSamp. Let $T_i$ ($\neg T_i$) denote the event that target $i$ is protected (unprotected) under IndepSamp. We first prove Lemma 19, which lower-bounds how well the pairwise marginals in $Y$ approximate the original marginals $\vec{x}$. The difficulty in proving Lemma 19 lies in the fact that $Y$ does not have a closed form in terms of $\vec{x}$ when we sample without replacement. Our proof is based on a coupling argument relating the algorithm to independent sampling with replacement.³

Lemma 19. Given $\vec{x}$, $Y = Y(\vec{x})$ satisfies the following (in)equalities:

  $\Pr(T_j) = y_{jj} \ge (1 - \tfrac{1}{e})\, x_j$,  for all $j \in [n]$;   (8.12)
  $\Pr(T_j \mid T_i) = \dfrac{y_{ij}}{y_{ii}} \ge \big(\tfrac{k-2}{k-1} - \tfrac{1}{e}\big)\, x_j$,  for all $i \neq j$;   (8.13)
  $\Pr(T_j \mid \neg T_i) = \dfrac{y_{jj} - y_{ij}}{1 - y_{ii}} \ge (1 - \tfrac{1}{e})\, x_j$,  for all $i \neq j$.   (8.14)

¹ Recall that $T_i$ is the event that target $i$ is protected.
² Because directly applying IndepSamp can never match, e.g., a coverage probability of 1.
³ Our insistence on sampling without replacement is due to a practical consideration — making full use of all security resources — though the sampling approach with replacement may be easier to analyze from a theoretical perspective.

Proof. To prove these inequalities, we instead consider independent sampling with replacement. Define $Z = Z(\vec{x}) \in \mathbb{R}^{n \times n}$ to be a function of $\vec{x}$, where $z_{ij}$ is the probability that targets $i, j$ are both protected when sampling with replacement. Contrary to $Y$, $Z$ has a succinct closed form, and therefore we can lower-bound its entries. We first consider $z_{jj}$.
z jj = 1 (1x j =k) k 1e x j (1 1 e )x j : where we used the fact (1) 1 e 1 for any 2 (0; 1). Now we lower bound z ij =z ii as follows. z ij z ii = 1 (1 x i k ) k (1 x j k ) k + (1 x i k x j k ) k 1 (1x i =k) k (8.15) = 1 (1 x j k ) k (1 x i k ) k (1 x j k ) k (1 x i k x j k ) k 1 (1x i =k) k = 1 (1 x j k ) k (1 x i k ) k 1 (1x i =k) k [(1 x j k ) k (1 x j kx i ) k ] (1 1 e )x j e x i 1e x i [(1 x j k ) k (1 x j kx i ) k ] where all the equations use arithmetic, while the inequality uses the fact that (1 x j k ) k e x j and x 1x is a decreasing function ofx2 (0; 1). We now upper-bound the term (1 x j k ) k (1 x j k1 ) k using the formulaa k b k = (ab) P k1 i=0 a i b k1i , as follows (1 x j k ) k (1 x j kx i ) k = (1 x j k 1 + x j kx i ) k1 X t=0 (1 x j k ) t (1 x j kx i ) k1t x j x i k(kx i ) k x i x j k 1 Plugging the above upper bound back into Inequality 8.15, we thus have z ij z ii (1 1 e )x j e x i 1e x i x i x j k 1 = (1 1 e )x j x i e x i 1 x j k 1 (1 1 e )x j x j k 1 = ( k 2 k 1 1 e )x j ; 122 where the last inequality is due to the fact thatf(x) = x e x 1 is a decreasing function forx2 (0; 1) and is upper bounded by lim x!0 x e x 1 = 1. Finally, we have z jj z ij 1z ii = (1 x i k ) k (1 x i k x j k ) k (1 x i k ) k = 1 (1 x j kx i ) k 1 (1x j =k) k (1 1 e )x j : To prove the lemma, we only need to show that y jj z jj , y ij =y ii z ij =z ii and (y jj y ij )=(1y ii ) (z jj z ij )=(1z ii ). To prove these inequalities, we use a coupling argument. Consider the following two stochastic process (StoP): 1. StoP 1 : at timet independently sample a random valuei t (2 [n]) with probabilityx it =k for anyt = 1; 2;::: until preciselyk different elements from [n] show up. 2. StoP 2 : at timet independently sample a random valuei t (2 [n]) with probabilityx it =k fort = 1; 2;:::k. Let C 1 [C 2 ] denote all the possible random sequences generated by StoP 1 [StoP 2 ], and C 1 j [C 2 j ] denote the subset ofC 1 [C 2 ], which consists of all the sequences including at least onej. For anye2C 2 j , letC e be the subset of sequences inC 1 , whose firstk elements are preciselye. Notice that any sequence inC 1 has length at leastk while any sequence inC 2 has preciselyk elements. Furthermore,C e C 1 j andC e \C e 0 =; for anye;e 0 2C 2 j ande6=e 0 . Now, think of each sequence as a probabilistic event generated by the stochastic process. Notice thatP (e;StoP 2 ) = P (C e ;StoP 1 ) due to the independence of the sampling procedure. Therefore, we have P (C 2 j ;StoP 2 ) = X e2C 2 j P (e;StoP 2 ) = X e2C 2 j P (C e ;StoP 1 ) P (C 1 j ;StoP 1 ) However,P (C 1 j jStoP 1 ) =y jj andP (C 2 j jStoP 2 ) =z jj . This provesy jj z jj . Notice thaty ij =y ii z ij =z ii is equivalent toP (e2 C 2 j je2 C 2 i ;StoP 2 ) P (e2 C 1 j je2 C 1 i ;StoP 1 ). To prove this inequality, we claim that it is without loss of generality to assume the first sample isi in both processes. This is because, if the firsti shows up as thet’th sample, mov- ingi to the first position would not change the probability of the sequence due to independence 123 between the sampling steps. Conditioned oni being sampled first, a similar argument as above shows that the probability of Stochastic processStoP 1 generatingj is at least the probability of stochastic processStoP 2 generatingj. Finally, (y jj y ij )=(1 y ii ) (z jj z ij )=(1 z ii ) is equivalent to P (e 2 C 2 j je 62 C 2 i ;StoP 2 ) P (e 2 C 1 j je 62 C 1 i ;StoP 1 ). 
The conditional probability P (e 2 C 2 j je 62 C 2 i ;StoP 2 ) can be viewed as the probability of generating a sequence including element j in a modifiedStoP 2 — it generates anyj6= i with probabilityx j =(kx i ) but generatesi with probability 0. Viewing from this perspective, we can conclude P (e2 C 2 j je62 C 2 i ;StoP 2 ) P (e2C 1 j je62C 1 i ;StoP 1 ) using a similar argument for provingy jj z jj . Let~ x be the optimal solution to LP (8.1) and let = e e k1 k2 . One natural idea is to scale up~ x by a factor of and then applyIndepSamp. The problem here is that some targets may have probability larger than 1 after the scaling up. To deal with this issue, we divide targets into two sets: S =fj : x j < 1=g and S { = [n]nS =fj : x j 1=g. For any j2 S { , we simply cover it with probability 1. Note that these targets will satisfyPr(T j )x j and will not leak any information about the protection of other targets since they will be always protected. For targets in S, we scale up their marginal probability by a factor of and then apply the IndepSamp algorithm. In total we need no more thank resources. By Lemma 19, we know thatPr(T j )x j ,Pr(T j jT i )x j andPr(T j j:T i )x j for anyj6=i2S. To summarize, for any targeti2 [n] that possibly leaks its protection status (either protected or unprotected), the conditional protection probability of any other target j is always at least x j . Therefore, in the ADIL leakage model, regardless which target i leaks information, the conditional protection probability of any other targetj is always at leastx j . This also holds in the PRIL model. 124 Chapter 9 Mitigating Harms of Information Leakage via Entropy Maximization Chapter 8 proposed and studied the complexity of two basic leakage models. Unfortunately, even in simple security game settings, we easily encounter barriers of computational intractability. Therefore, to obtain solutions with rigorous guarantees, we have to restrict ourselves to even more specific settings as described in Section 8.2. In this chapter, we instead propose a heuristic approach, based on max-entropy sampling, for handling information leakage. The solutions in Chapter 8 only work for the setting with no scheduling constraints. However, the framework we describe in this chapter will be generalizable to security games with arbitrary scheduling constraints. Together with some other practical advantages (illustrated later), this makes the approach very appealing in various real-world security applications. 9.1 The Max-Entropy Sampling Framework 9.1.1 Max-Entropy Sampling Over General Set Systems As we mentioned in Section 2.2.1, the set of defender pure strategies can be viewed as a set system, or equivalently, a set of binary vectors. Classic security games seek to achieve certain marginal probability vector ~ x (indexed by targets) by randomizing over these binary vectors. From Carath´ eodory’s theorem we know that, given any marginal vector~ x in the convex hull of E, denoted asconv(E), there are usually many different mixed strategies that achieve the same~ x (e.g., see examples in Section 7.1). One question then is which of these mixed strategies is more robust to information leakage. One natural choice is the mixed strategy of maximum entropy subject to achieving the given marginal vector x. Intuitively, this is because the max-entropy distribution is the most unpredictable distribution and usually has low correlation among targets. 
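To make this point concrete, here is a small numerical illustration (ours, not from the text; it assumes Python with numpy): for n = 4 targets and k = 2 resources with marginals (0.5, 0.5, 0.5, 0.5), both mixed strategies below implement the same marginal vector, but the higher-entropy one — by symmetry, the max-entropy distribution for these marginals — has far lower correlation between targets, so observing one target's protection status reveals little about the others.

import numpy as np
from itertools import combinations

def entropy(probs):
    p = np.array([q for q in probs if q > 0])
    return float(-(p * np.log(p)).sum())

def cond_coverage(strategies, probs, i, j):
    """Pr(target j protected | target i protected) under the mixed strategy."""
    pi = sum(p for e, p in zip(strategies, probs) if i in e)
    pij = sum(p for e, p in zip(strategies, probs) if i in e and j in e)
    return pij / pi

n, k = 4, 2
# strategy A: two perfectly correlated pure strategies
A = ([(0, 1), (2, 3)], [0.5, 0.5])
# strategy B: uniform over all size-2 subsets (max entropy for these marginals, by symmetry)
pairs = list(combinations(range(n), k))
B = (pairs, [1.0 / len(pairs)] * len(pairs))

for name, (strats, probs) in [("A", A), ("B", B)]:
    print(name, "entropy =", round(entropy(probs), 3),
          " Pr(T_2 | T_1) =", round(cond_coverage(strats, probs, 0, 1), 3))
# A: entropy ln 2 ~ 0.693 and Pr(T_2 | T_1) = 1 (full leakage);
# B: entropy ln 6 ~ 1.792 and Pr(T_2 | T_1) = 1/3.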
In this section, we describe a general framework for computing the max-entropy distribution over the set system $E$ subject to matching any given marginal $\vec{x} \in \mathrm{conv}(E)$. This problem has been studied in the theoretical computer science literature; see, e.g., (Jerrum, Valiant, & Vazirani, 1986; Singh & Vishnoi, 2013). Our description here serves mainly as a review of previous work and its variants.

Computing the max-entropy distribution can be formulated as the solution to an $O(2^n)$-size convex program (CP (9.1)), where the variable $\theta_e$ is the probability of playing pure strategy $e$:

  maximize   $-\sum_{e \in E} \theta_e \ln(\theta_e)$
  subject to $\sum_{e:\, i \in e} \theta_e = x_i$,  for $i \in [n]$
             $\sum_{e \in E} \theta_e = 1$
             $\theta_e \ge 0$,  for $e \in E$                                      (9.1)

Convex program for computing the max-entropy distribution.

An obvious challenge in solving CP (9.1) is that the optimal $\theta^*$ typically has exponentially large support, and thus cannot even be written down explicitly in polynomial time. This can be overcome via algorithms that efficiently sample a set $e$ "on the fly". Therefore, we say that an algorithm solves CP (9.1) if it takes $\vec{x}$ as input and randomly samples $e \in E$ with probability $\theta^*_e$, where $\theta^*$ is the optimal solution to CP (9.1).

Sampling the max-entropy distribution is closely related to the following generalized counting problem over $E$.

Definition 5 (Generalized Counting). Given any $\vec{\gamma} \in \mathbb{R}^n_+$, compute $C(\vec{\gamma}) = \sum_{e \in E} \gamma_e$, where $\gamma_e = \prod_{i \in e} \gamma_i$.

Observe that $C(\vec{1})$ equals precisely the cardinality of $E$. More generally, $C(\vec{\gamma})$ is a weighted count of the elements of $E$ with weights $\gamma_e = \prod_{i \in e} \gamma_i$. The relation between max-entropy sampling and counting is through the following unconstrained convex dual program of CP (9.1), with variables $\vec{\lambda} \in \mathbb{R}^n$ and $\gamma_e = \prod_{i \in e} e^{-\lambda_i}$:

  minimize $f(\vec{\lambda}) = \sum_{i=1}^{n} \lambda_i x_i + \ln\big(\sum_{e \in E} \gamma_e\big)$.   (9.2)

Dual program of the convex program (9.1).

The following theorem characterizes the optimal solutions of CP (9.1) and will be useful for our later results.

Theorem 9.1.1. (Singh & Vishnoi, 2013) Let $\vec{\lambda}^* \in \mathbb{R}^n$ be the optimal solution to CP (9.2) and set $\gamma_i = e^{-\lambda^*_i}$ for each $i \in [n]$. Then the optimal solution of CP (9.1) satisfies

  $\theta^*_e = \dfrac{\gamma_e}{\sum_{e' \in E} \gamma_{e'}}$,   (9.3)

where $\gamma_e = \prod_{i \in e} \gamma_i$ for any pure strategy $e \in E$. Furthermore, if the generalized counting problem over $E$ can be solved in $\mathrm{poly}(n)$ time, then $\vec{\lambda}^*$ can be computed in $\mathrm{poly}(n)$ time.

Proof. The characterization of $\theta^*_e$ follows from the KKT conditions of CP (9.1) and its dual program (9.2). Its proof can be found in (Singh & Vishnoi, 2013); we thus will not repeat the argument. Here, we prove the algorithmic part of the theorem. In particular, we will show that $\vec{\lambda}^*$ can be computed in $\mathrm{poly}(n)$ time given any polynomial-time algorithm for the generalized counting problem; moreover, our algorithm will be practically efficient as well.

Notice that CP (9.2) has $n$ variables but an expression with exponentially many terms, in particular the sum $\sum_{e \in E} \gamma_e$. The essential difficulty in evaluating $f(\vec{\lambda})$ lies in computing the sum $\sum_{e \in E} \gamma_e$, since the other parts can be calculated explicitly in $\mathrm{poly}(n)$ time. Note that calculating $\sum_{e \in E} \gamma_e$ is precisely the generalized counting problem over the set system $E$ with weights $\gamma_i = e^{-\lambda_i}$ for $i \in [n]$. As a result, if we have a $\mathrm{poly}(n)$ time counting oracle, we can evaluate the function value $f(\vec{\lambda})$ in $\mathrm{poly}(n)$ time. With this $\mathrm{poly}(n)$ time value oracle, one can conclude that the dual CP (9.2) — and hence CP (9.1), via Equation (9.3) — can be solved in $\mathrm{poly}(n)$ time using the ellipsoid method (Grötschel et al., 1988), though the order of this polynomial is usually large. Here, we instead give a more practical algorithm. We show that the gradient can also be evaluated efficiently.
Therefore, one can use standard gradient-descent based algorithm to solve CP (9.1) which is usually more efficient in practice. In particular, @f( ~ ) @ i = x i P e2E:i2e e e P e2E e e : The only non-trivial part of evaluating @f( ~ ) @ i is to compute P e2E:i2e e e . This can be calculated by employing a generalized counting algorithm twice: once for the weightse j for eachj2 [n] and once with the same weights except using 2e i for i. Their difference equals precisely P e2E:i2e e e . To sum up, given a poly(n) time algorithm for the generalized counting problem, we can evaluatef( ~ ) and its gradient in poly(n) time. Thus we can also optimize the function in poly(n) time. After computing the optimal dual solution ~ , we need to develop sampling algorithms that output strategy e with probability precisely e = e P e 0 2E e 0 . This process will depend on the setting. However, it can usually be done efficiently given a generalized counting algorithm, which is the case in all the settings we study. 127 9.1.2 Why Maximizing Entropy? As we mentioned before, the issue of information leakage arises due to the correlation among the protection statues of targets, a phenomenon which we term the curse of correlation (CoC) in Section 7.3. To deal with CoC, the ideal approach is to come up with an accurate model to capture the attacker’s partial observation, i.e., an information leakage model, and then solve the model to obtain the defender’s optimal defending strategy, as we did in Section 8.2. However, we note that this approaches suffers from several drawbacks. 1. Unavailability of an Accurate Leakage Model. The attacker’s choice of target monitoring depends on many hidden factors, and thus is highly unpredictable. Therefore, it is typically very difficult to know which targets are leaking information — otherwise the defender could have resolved the issue in the first place via other approaches. As a result, it is usually not possible to obtain an accurate leakage model. As we will illustrates in our experiments, optimizing over an inaccurate leakage model can even be harmful to the defender compared to doing nothing. 2. Scalability and Computational Barriers. Even if the defender has an accurate leakage model, computing the optimal defender strategy against the leakage model is intractable generally. As we mentioned in Section 8.2, even in the simplest possible model — zero- sum games, no scheduling constraints and a single target leaking information — we ex- hibit evidence of intractability. The problem becomes even more difficult in more compli- cated spatio-temporal settings with scheduling constraints, e.g., the motivating examples in Chapter 7. 3. Vulnerability to Attacker’s Strategic Manipulations. Another concern about any op- timal solution tailored to a specific leakage model is that such a solution may be easily “gamed” by the attacker. In particular, the optimal solution naturally biases towards the leaking targets by assigning more security forces to these targets. This, however, opens the door for the attacker to strategically manipulate the defender’s belief on leaking targets, e.g., by intentionally spreading misleading information, with the goal of shifting the de- fense away from the attacker’s prime targets. As we show in our experiments, this could cause significant loss to the defender. Entropy maximization — a more robust solution. 
These barriers motivate our adoption of the more robust (though inevitably more conservative) max-entropy approach, as illustrated in Sec- tion 9.1.1. We propose to first compute the optimal defender strategy assuming no leakage and then play the mixed strategy with maximum entropy subject to matching the desired marginal 128 probabilities. Our choice of max entropy is due to at least three reasons. First, the max-entropy strategy is the most random, and thus unpredictable, defender strategy. When the defender is uncertain about which target is leaking information (the setting we are in), we believe that taking the most random strategy is one natural choice. Second, the max-entropy distribution usually ex- hibits substantial approximate stochastic independence among the protection statuses of targets 1 , so that the protection status of any leaking target does not carry much information about that of others. Third, as we will illustrate in the next few sections, the max entropy approach performs well in comparisons with several other alternatives in simulated games; in fact, in some settings, it achieves a solution quality that is even close to the optimal defender utility under no leakage! Given such encouraging empirical results, we believe that entropy maximization stood out as a powerful approach to address information leakage. From a practical perspective, the max-entropy approach also enjoys several advantages. First, it does not require a concrete leakage model. Instead, it seeks to reduce the overall correlation among the statuses of all targets, and thus serves as a robust solution. Second, this approach is easily “compatible” with any current deployed security systems since it does not require any change to previously deployed algorithms while only adding randomness (in some sense, this is a strictly better solution than previous ones). This is particularly useful in domains where re-building a new security system is not feasible or too costly. 9.2 Security Settings with No Scheduling Constraints As an instantiation of the above framework, we first consider the simple security game setting with no scheduling constraints. In this case, a defender pure strategy is any subset of [n] of size k. Such models have applications in real security systems like ARMOR for LAX airport and GUARDS for port patrolling in general (Tambe, 2011). 9.2.1 A Polynomial-Time Max-Entropy Sampling Algorithm In this section, we prove the following theorem. Theorem 9.2.1. When there are no scheduling constraints, the distribution that maximizes en- tropy subject to matching any given marginalx2conv(E) can be sampled in poly(n) time. The proof of Theorem 9.2.1 relies on the following two lemmas. Lemma 20. When there are no scheduling constraints, the generalized counting problem over Efor any given weight admits a poly(n) time algorithm. 2 1 This is widely observed in practice, and also theoretically proved in some settings, e.g., matchings (Kahn & Kayll, 1997). 2 The set systemE is also known as the uniform matroid in this case. 129 Proof. Let ~ = ( 1 ;:::; n ) be any given weight vector. Our goal is to compute the sum P e2E e where e = i2e i . We show that a dynamic program computes the sum P e2E e inpoly(n) time. Note that the set of all pure strategies consists of all the subsets of [n] of cardinalityk. We build the following DP tableT (i;j) = P e:e[j];jej=i e , which sums over all the subsets of [j] of cardinalityi. Our goal is to computeT (k;n) = P e2E e e . 
We first initializeT (0;j) = 1 andT (1;j) = j i=1 i for anyj. Then using the following update rule, we can build the DP table and computeT (k;n) inpoly(k;n) time. T (i;j) = T (i;j 1) + j T (i 1;j 1): This update rule is correct because T (i;j) is the sum of two parts. The first part contains terms without elementj. Therefore, these terms must sum up toT (i;j 1). The second part contains terms with elementj and otheri 1 elements beforej. These terms must sum up to j T (i 1;j 1). Lemma 20, together with Theorem 9.1.1, shows that we can solve CP (9.2) in poly(n) time. Our next lemma shows how to efficiently sample a pure strategye from an exponentially large support with probability e defined by Equation (9.3). The algorithm (Algorithm 7) simply goes through each target and adds it to the pure strategy with a specifically designed probability until exactlyk targets are added. Lemma 21. Given any input ~ 2 [0;1) n , Algorithm 7 runs in poly(n) time and correctly samples a pure strategye with probability e = e P e2E e , where e = i2e i . Proof. Note that Table T (i;j) can be computed in poly(n). We first show that the “while” loop in Algorithm 7 terminates within at mostn steps. In fact, j decreases by 1 each step and furthermorej i 0 always holds. This is because whenj decreases untilj = i, j will be sampled with probability j T (i1;j1) T (i;j) = i T (i1;i1) T (i;i) = 1; then bothj andi will decrease by 1 (Steps 6 9). This continues untili = 0. Furthermore, the algorithm terminates withjej = k because the cardinality ofe always satisfiesjej = ki by Steps 6 8 until the termination at i = 0. Therefore, Algorithm 7 runs in poly(n) time. Now we show that Algorithm 2 outputse with probability e . Let the outpute =fi 1 ;:::;i k g be sorted in decreasing order, i.e.,i 1 >i 2 >:::>i k . Notice that T (i;j) = j T (i 1;j 1) +T (i;j 1): Therefore, in the Sampling step (Step 5) of Algorithm 7,j is not included ine with probability T (i;j 1)=T (i;j). Therefore, to sample e = fi 1 ;:::;i k g, it must be the case that n;n 130 Algorithm 7: Max-entropy sampling in settings with no scheduling constraints Input: : ~ 2 [0;1) n ,k. Output: : a pure strategye withjej =k. 1: Initialize:e =;; the DP tableT (0;j) = 1 andT (j;j) = j i=1 i for anyj2 [n]. 2: ComputeT (i;j) = P e:e[j];jej=i e for anyi;j satisfyingik;jn and 1ij, using the following update rule T (i;j) = T (i;j 1) + j T (i 1;j 1): 3: Seti =k,j =n; 4: whilei> 0 do 5: Sampling: independently addj toe with probability p j = j T (i 1;j 1) T (i;j) ; 6: ifj was added toe then 7: i =i 1; 8: j =j 1; 9: return e. 1;:::;i 1 + 1 are not included, whilei 1 is included; i 1 1;:::;i 2 + 1 are not included, while i 2 is included; and so on. In addition, the sampling in each of these steps is conditioned on all its previous steps and the probability of each step is known. Therefore, by multiplying these probabilities together, we have P (s) = T (k;n 1) T (k;n) T (k;n 2) T (k;n 1) ::: i 1 T (k 1;i 1 1) T (k;i 1 ) T (k 1;i 1 2) T (k 1;i 1 1) ::: i k T (0;i k 1) T (1;i k ) = tk it T (k;n) = e : This gives precisely the probability we want. 9.2.2 A Linear-Time Heuristic Sampling Algorithm Though the sampling algorithm in Section 9.2.1 provably runs in polynomial time, the order of the polynomial may be large due to repeated calls to the counting oracle. This limits the scalability of the algorithm in very large applications. In this section, we develop a linear-time heuristic sampling algorithm, termed Uniform Comb Sampling (UniCS). 
UniCS is extremely efficient and, as we will show, also performs well in practice. 131 (Tsai et al., 2010) presented the Comb Sampling algorithm, which randomly samples a pure strategy and achieves a given marginal in expectation. The algorithm can be elegantly described as follows (also see Figure 9.1): thinking ofk resources ask buckets with height 1 each, we then put each target, the height of which equals precisely its marginal probability, one by one into the buckets. If one bucket gets full when filling in a certain target, we move the “rest” of that target to a new empty bucket. Continue this until all the targets are filled in, at which time we know that allk buckets are full. The algorithm then takes a horizontal line with a uniformly randomly chosen height from the interval [0; 1], and thek targets intersecting the horizontal line constitute the sampled pure strategy. As easily observed, Comb Sampling achieves the marginal coverage in expectation (Tsai et al., 2010). . . . . . . 1 . . . . . . sample x 2 x 1 x 3 x 2 x n 1 2 k Figure 9.1: Comb sampling However, is Comb Sampling robust against information leakage? We first observe that Comb Sampling generates a mixed strategy with support size at mostn + 1, which precisely matches the upper bound of Carath´ eodory’s theorem. Proposition 12. Comb Sampling generates a mixed strategy which mixes over at mostn+1 pure strategies. Proof. In Figure 9.1, let the sample line move from height 0 to height 1 continuously. The sampled pure strategy changes only when it meets a dotted line in any bucket. There are at most n 1 dotted lines (because there aren targets), so the total number of possible pure strategies is (n 1) + 2 =n + 1. Proposition 12 suggests that the mixed strategy sampled by Comb Sampling might be very easy to explore. Therefore we propose a variant of the Comb Sampling algorithm. Our key 132 observation is that Comb Sampling achieves the marginal coverage regardless of the order of the targets. That is, the marginal is still obtained if we randomly shuffle the order of the targets each time before sampling, and then fill them in one by one. Therefore, we propose the following Uniform Comb Sampling (UniCS) algorithm: 1. Choose an order of then targets uniformly at random; 2. Fill the targets into the buckets based on the random order, and then apply Comb Sampling. This algorithm runs in linear time because: (1) a random permutation can be generated in linear time, e.g., using the Knuth Shuffle (Knuth, 1997); (2) the Comb Sampling algorithm runs in linear time. The property of UniCS is summarized in the following proposition. Proposition 13. Uniform Comb Sampling (UniCS) runs inO(n) time and achieves the marginal coverage probability. 9.2.3 Experiments In this section, we experimentally study how traditional algorithms and our new algorithms per- form in the presence of probabilistic or adversarial information leakage (i.e., the PRIL and ADIL model in Section 8.1). Since we also have an algorithm that computes the exact optimal solution in this setting (Section 8.3) (though it runs in exponential time), we will also compare our max- entropy sampling (heuristic) approach with the exact optimal solution. In particular, we compare the following five algorithms. 
9.2.3 Experiments

In this section, we experimentally study how traditional algorithms and our new algorithms perform in the presence of probabilistic or adversarial information leakage (i.e., the PRIL and ADIL models in Section 8.1). Since we also have an algorithm that computes the exact optimal solution in this setting (Section 8.3), though it runs in exponential time, we also compare our max-entropy sampling (heuristic) approach with the exact optimal solution. In particular, we compare the following five algorithms:

Traditional: optimal marginal + comb sampling, the traditional way to solve security games with no scheduling constraints (Kiekintveld et al., 2009; Tsai et al., 2010);
OPT: the optimal algorithm for the PRIL or ADIL model (Section 8.1) using column generation with the defender oracle in Algorithm 6;
IndepSample: independent sampling without replacement (Section 8.3);
MaxEntro: max-entropy sampling (Algorithm 7);
UniCS: uniform comb sampling.

All algorithms are tested on the following two sets of data.

Los Angeles International Airport (LAX) Checkpoint Data from (Pita et al., 2008b). This problem was modeled as a Bayesian Stackelberg game with multiple adversary types in (Pita et al., 2008b). To be consistent with our model, we instead only consider the game against one particular type of adversary — the terrorist-type adversary, which is the main concern of the airport. The defender's rewards and costs are obtained from (Pita et al., 2008b), and the game is assumed to be zero-sum in our experiments.

Figure 9.2: Comparisons on real LAX airport data (left panel: PRIL; right panel: ADIL; x-axis is $1 - p_0$, y-axis is the defender's utility).

Simulated Game Payoffs. A systematic examination is conducted on simulated zero-sum security games with no scheduling constraints, i.e., the basic setting we studied in Section 8.1.³ All generated games have 20 targets and 10 resources. The reward $r_i$ (cost $c_i$) of each target $i$ is chosen uniformly at random from the interval $[0, 10]$ ($[-10, 0]$). This corresponds to the covariant random game generator (Nudelman et al., 2004) with covariance equal to $-1$.

In terms of running time, all the algorithms run efficiently as expected (terminating within seconds in MATLAB), except the optimal algorithm OPT, which takes about 3 minutes per simulated game on average. Therefore we mainly compare defender utilities. All the comparisons are shown in Figure 9.2 (for LAX data) and Figure 9.3 (for simulated data). The line "Basis" is the utility with no leakage and is plotted as a baseline for utility comparisons. The y-axis is the defender's utility — the higher, the better. We examine the effect of the total probability of leakage (i.e., the x-axis value $1 - p_0$) on the defender's utility and consider $1 - p_0 = 0, 0.1, \ldots, 1$. For probabilistic information leakage, we randomly generate the probabilities that each target leaks information subject to the constraint $\sum_{i=1}^{n} p_i = 1 - p_0$. For the case of leakage from small support (for simulated payoffs only), we randomly choose a support of size 5. All the utilities are averaged over 50 random games, except for the ADIL model on the LAX data. For the simulated payoffs, we also consider a special case with uniform leakage probability for each target.

³ Another rationale for focusing on zero-sum games is the following. Zero-sum games are strictly competitive; therefore, any information leaking to the attacker will benefit the attacker and hurt the defender. The effects of the curse of correlation (CoC) could be a mix of both good and bad aspects in general-sum security games, because "leaking" information to the attacker could sometimes be beneficial to the defender there. This has been studied in Part I of this thesis on strategic information revelation in security games (Rabinovich et al., 2015; Guo et al., 2017). In zero-sum security games, however, any information to the attacker will hurt the defender. In this sense, zero-sum games serve as the best fit for studying the harms of CoC. Previous work studying information leakage in normal-form games (Alon et al., 2013) also focused on zero-sum games.
Figure 9.3: Comparisons in simulated games (panels: PRIL with small leakage support, PRIL with full leakage support, PRIL with uniform leakage probability, and ADIL; x-axis is $1 - p_0$, y-axis is the defender's utility).

The following observations follow from the figures.

Observation 1. The gap between the lines "Basis" and "OPT" shows that information leakage from even one target may cause a dramatic utility decrease for the defender. Moreover, adversarial leakage causes more utility loss than probabilistic leakage, and leakage restricted to a small support of targets causes less utility decrease than leakage from full support.

Observation 2. The gap between the lines "OPT" and "Traditional" demonstrates the necessity of handling information leakage. The relative loss $u(\mathrm{OPT}) - u(\mathrm{Basis})$ is approximately half of the relative loss $u(\mathrm{Traditional}) - u(\mathrm{Basis})$ in Figure 9.3 (and about 65% of it in Figure 9.2). Furthermore, if leakage is from a small support (top-left panel of Figure 9.3), OPT is close to Basis.

Observation 3. MaxEntro and UniCS have almost the same performance (their curves overlap in all of these figures). Both algorithms are almost optimal when the leakage support is the full set $[n]$ (they almost overlap with OPT in the top-right and bottom-left panels of Figure 9.3).

Observation 4. Interestingly, in all of these figures IndepSample starts to outperform Traditional roughly at $1 - p_0 = 0.3$ or $0.4$, which is around $1/e \approx 0.37$. Furthermore, the gap between IndepSample and OPT does not change much across different values of $1 - p_0$.

Observation 5. From a practical view, if the leakage is from a small support, OPT is preferred as it admits efficient algorithms (Section 8.3); if the leakage is from a large support, MaxEntro and UniCS are preferred as they can be computed efficiently and are close to optimal. From a theoretical perspective, we note that the intriguing performance of IndepSample, MaxEntro and UniCS raises questions for future work.

9.3 The Air Marshal Scheduling Problem

In this section, we consider the problem of randomized air marshal scheduling, as illustrated in Section 7.1. One important task faced by the Federal Air Marshal Service (FAMS) is to schedule air marshals to protect international flights. In this setting, the schedule of each air marshal is a round trip (Kiekintveld et al., 2009), which is what we focus on.

We start by formally defining the problem. FAMS seeks to allocate $k$ homogeneous air marshals to protect round-trip international flights originating from domestic cities to different outside cities. These round-trip flights constitute a bipartite graph $G = (A \cup B, E)$ in which the nodes in $A$ [$B$] correspond to all outbound [return] flights; $e = (A_i, B_j) \in E$ iff $e$ forms a consistent round trip. We remind the reader that we abuse notation here, since $e$ is also used to denote a pure strategy and $E$ the set of all defender pure strategies.
Figure 9.4 depicts the graph between one domestic city and two outside cities, though in general we consider multiple domestic cities and multiple outside cities. Note thatG is a union of multiple isolated smaller bipartite graphs, each containing all flights between two cities. This is because any flight from citya to cityb can never form a round trip with a flight from cityc to citya. We will call each isolated bipartite graph a component. Naturally, not any two flightsA i ;B j can form consistent round-trip flights. The following are natural constraints on the structure of the graphG: (A i ;B j ) forms a compatible round trip (i.e., (A i ;B j )2E) if (1) the destination city ofA i is the departure city ofB j ; (2) the arrival time ofA i , denoted asarr(A i ), and the departure time ofB j , denoted asdep(Bj), satisfy dep(B j )arr(A i )2 [T 1 ;T 2 ] for constantsT 2 >T 1> 0. Moreover, we assume that in any pure strategy, each flight is covered by at most one air marshal. This is a requirement that comes from the US Transportation Security Administration to ensure maximum usage of valuable security resources (Jain et al., 2010). 136 " # $ % & ' ( ) * " # $ % & ' ( ) * Figure 9.4: Consistent round-trip flights between a domestic city and two outside cities. Next, we will develop a provably polynomial-time algorithm for sampling the max-entropy distribution as well as a fast heuristic sampling algorithm. Our exact algorithm crucially exploits certain “order” structure of the air marshal’s schedules. The heuristic sampling algorithm can be generalized to other security games as well so long as we can efficiently compute the defender’s best response. We evaluate these algorithms experimentally at the end of this section. 9.3.1 A Polynomial-Time Max-Entropy Sampling Algorithm We prove the following theorem in this section. Theorem 9.3.1. In the federal air marshal scheduling problem with round trips, the distribution that maximizes entropy subject to matching any given marginalx2conv(E) can be sampled in poly(n;k) time, wherek is the number of air marshals andn =jA[Bj is the number of total flights. The proof of Theorem 9.3.1 has two steps: First, we design a poly(n;k) time algorithm for the generalized counting problem over the set systemE of defender pure strategies. By Theorem 9.1.1, this implies that we can compute the optimal solution to CP (9.2) in poly(n;k) time. Second, we will design an efficient sampling algorithm that samples a pure strategye from an exponentially large support with probability e as defined by Equation (9.3). Step 1 We start with the first task. LetG = (A[B;E) denote the bipartite graph for the air marshal scheduling problem,jAj = n 1 ;jBj = n 2 . Recall thatG is a union of multiple isolated compo- nents, each containing all flights between two cities (see Figure 9.4). Within each component, we sort the flights inA by their arrival time and flights inB by their departure time. We now show that generalized counting over the set of defender pure strategies admits a polynomial time algorithm. Our algorithm crucially exploits the following “order” structure. 137 Definition 6. [Ordered Matching] In a bipartite graph G = (A[B;E), a matching M = fe 1 ;:::;e k g is called an ordered matching if for any edgee = (A i ;B j ) ande 0 = (A i 0;B j 0) in M, eitheri>i 0 ,j >j 0 ori<i 0 ,j <j 0 . Visually, any two edgese;e 0 in an ordered matching satisfy thate is either “above” or “below” e 0 — they do not cross. 
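As a hedged illustration of these two building blocks — the data, time window, and function names below are ours — the following Python sketch constructs the edge set E within a single component (one city pair, so the destination-city condition holds automatically) from the arrival/departure times via the [T_1, T_2] window, and checks whether a given matching is ordered, i.e., contains no crossing pair of edges:

from itertools import combinations

def build_edges(arrivals, departures, T1, T2):
    """(i, j) is an edge iff outbound flight A_i and return flight B_j form a consistent
    round trip; A is sorted by arrival time and B by departure time."""
    return [(i, j) for i, a in enumerate(arrivals)
                   for j, d in enumerate(departures)
                   if T1 <= d - a <= T2]

def is_ordered(matching):
    """A matching {(A_i, B_j)} is ordered (Definition 6) iff no two of its edges cross."""
    return all((i - i2) * (j - j2) > 0
               for (i, j), (i2, j2) in combinations(matching, 2))

# toy component: three outbound and three return flights between one pair of cities
arrivals = [9.0, 10.0, 12.0]           # arrival times of A_1, A_2, A_3
departures = [11.0, 12.5, 14.0]        # departure times of B_1, B_2, B_3
E = build_edges(arrivals, departures, T1=1.0, T2=4.0)
print(E)
print(is_ordered([(0, 0), (1, 1)]), is_ordered([(0, 1), (1, 0)]))   # True, False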
Since each flight has at most one air marshal, any assignment of air marshals must corre- spond to a matching inG. However, a pure strategye — i.e., a set of covered flights — can be accomplished by different matchings. For example, the sete =fA 1 ;A 2 ;B 1 ;B 2 g in Figure 9.4 can be achieved by the matchingf(A 1 ;B 1 ); (A 2 ;B 2 )g or the matchingf(A 1 ;B 2 ); (A 2 ;B 1 )g. However, only the matchingf(A 1 ;B 1 ); (A 2 ;B 2 )g is ordered. The following lemma shows that pure strategies and orderedk-matchings are in one-to-one correspondence. Lemma 22. In the air marshal scheduling problem, there exits an ordering of flights inA andB so that pure strategies and size-k ordered matchings are in one-to-one correspondence. Proof. It is easy to see that any orderedk-matching corresponds to one pure strategy. We prove the converse. Given any pure strategyS consisting of 2k flights, let e E =fe 1 ;:::;e k g be any matching that results inS. We claim that if there exist two edgese;e 0 2 e E withe = (A i ;B j ) ande 0 = (A i 0;B j 0) such thati>i 0 andj <j 0 , then (A i ;B j 0) and (A i 0;B j ) must also be edges in E. Since e;e 0 2 e E, we must have T 1 < dep(B j )arr(A i ) < T 2 and T 1 < dep(B j 0) arr(A i 0) < T 2 : Since flights inA are ordered increasingly by arrival time and flights inB are ordered increasingly by departure time, we havearr(A i )arr(A i 0) anddep(B j )dep(B j 0). These inequalities imply dep(B j 0)arr(A i ) dep(B j 0)arr(A i 0) T 2 and dep(B j 0) arr(A i ) dep(B j )arr(A i ) T 1 ; therefore (A i ;B j 0)2 E. Similarly, one can show that (A j ;B i 0)2E. As a result, we can adjust the matching by using the edges (A i ;B j 0) and (A i 0;B j ) instead. Such adjustments can continue until the matching becomes ordered. The procedure will terminate within a finite time by a simple potential function argument, with potential function f( e E) = P e=(A i ;B j )2 e E jijj 2 . The above adjustment always strictly decreases the potential function sincejijj 2 +ji 0 j 0 j 2 >jij 0 j 2 +ji 0 jj 2 ifi>i 0 andj <j 0 . The adjustment will terminate with an ordered matching and the ordered matching is unique, concluding our proof. Lemma 22 provides a way to reduce generalized counting over the set of pure strategies to generalized counting of size-k ordered matchings. Given any set of non-negative weights 2R n 1 +n 2 + , we define edge weightw e = A i B j for anye = (A i ;B j )2 E. As a result, the weight of any pure strategy equals the weight of the corresponding size-k ordered matching with edge weightsw e ’s. 138 Next we show that generalized counting of size-k ordered matchings admits an efficient al- gorithm. The main idea is to dynamically compute the generalized sum of size-k ordered match- ings according to some “order” of the bipartite graphs. More specifically, define E l;r E to be the set of edges that are “under” A l and B r , where A l 2 A;B r 2 B. Formally, any e = (A i ;B j )2 E is inE l;r iffi l;j r. We build a dynamic programing table with terms DP(l;r;d) = P M:ME l;r ;jMj=d w M , in which DP(l;r;d) is the sum of the weights of all size-d ordered matchings with edges in E l;r . Now, to compute DP(l;r;d), we only need to enumer- ate all the possibilities of the uppermost edge in the ordered matching, given that DP(i;j;d)s are known fori < l andj < r. This can be done by a dynamic program (Algorithm 8). The correctness of Algorithm 8 follows by definition. Algorithm 8: Generalized Counting of Orderedk-Matchings Input: :G = (A[B;E);w e 0 for anye2E. 
Output: : P M:jMj=d w M %M is an ordered matching 1: Initialization: DP(l;r; 0) = 1 forl = 0;::;n 1 ;r = 0;::;n 2 ; DP(0;r;d) = DP(l; 0;d) = 0 for alld 1;l = 0; 1;;:::;n 1 ;r = 0; 1;:::;n 2 . 2: Update: ford = 1;:::;k;l = 2;:::;n 1 ; r = 2;:::;n 2 : DP(l;r;d) =T (l 1;r 1;d) + X e=(A i ;B j )2E l;r s.t.i=l orj=r w e DP(i 1;j 1;d 1): 3: return DP(n 1 ;n 2 ;k). Step 2 Let ~ be the optimal solution of CP (9.2) for the air marshal scheduling problem. Invoking Algorithm 8, we can compute ~ in poly(n;k) time. Let i =e i for alli2A[B. Then the following algorithm (Algorithm 9) efficiently samples a pure strategy e from an exponentially large support with probability e defined by Equation (9.3). The correctness of Algorithm 9 follows a similar argument as the proof of Lemma 21; we thus will not repeat the details here. 139 Algorithm 9: Max-entropy sampling in the air marshal scheduling problem Input: : ~ 2 [0;1) n 1 +n 2 ,k. Output: : a pure strategye usingk air marshals. 1: Initialize:e =;; build the DP tableT (l;r;d) as in the previous part. 2: Setc =k,l =n 1 ;r =n 2 ; 3: whilec> 0 do 4: Sampling: for any edgee = (i;j)2E l;r incident onl orr, add edgee toe with probability p = i j T (i 1;j 1;k 1) T (l;r;k) ; 5: ife = (i;j) was added toe then 6: c =c 1; 7: l =i 1; r =j 1 8: else 9: l =l 1; r =r 1 10: returne. 9.3.2 Scalability Challenges and A Heuristic Sampling Algorithm Though the sampling algorithm in Section 9.3.1 provably runs in polynomial time, the order of the polynomial is large due to repeated calls to the counting oracle. In fact, the algorithm can only scale to a problem size of about 300 flights. However, FAMS needs to schedule about 30; 000 flights every day. Therefore, it is necessary to develop a much more efficient algorithm in order to scale up to this real-world problem size. In this section, we propose a heuristic sampling algorithm that matches any given marginal vectorx2 conv(E) and is expected to achieve high entropy. This algorithm works for general security games (not only the air marshal scheduling problem), and is computationally efficient as long as the underlying security game can be solved efficiently. At a high level, our idea is to design a randomized implementation for the celebrated Carath´ eodory’s theorem, which makes the following existence statement: for any bounded poly- topePR n and anyx2P, there exist (at most)n + 1 vertices ofP such thatx can be written as a convex combination of these vertices. InterpretingP as the convex hull of defender pure strategies, this means that any defender mixed strategy, i.e., a point inP, can be decomposed as a distribution over at most n + 1 pure strategies (n is the number of targets). We turn this existence statement into an efficient randomized algorithm, named CArath´ eodory Randomized Decomposition (CARD). 140 " $ ' = ' Figure 9.5: CARD Decomposition. Consider any polytopeP = fz : Az b;Mz = cg explicitly represented by polyno- mially many linear constraints, and any x 2 P. We use A i ;b i to denote the i’th row of A and b respectively; A i z = b i is a facet ofP. Geometrically, CARD randomly picks a vertex v 1 = arg max z2P ha;zi for a linear objective a2 [0; 1] n chosen uniformly at random. CARD then “walks along” the ray that originates from v 1 and points to x, until it crosses a facet of P, denoted byA i z = b i , at some point v 2 (see the illustration in Figure 9.5). Thus, x can be decomposed as a convex combination of v 1 ;v 2 . 
9.3.2 Scalability Challenges and a Heuristic Sampling Algorithm

Though the sampling algorithm in Section 9.3.1 provably runs in polynomial time, the order of the polynomial is large due to repeated calls to the counting oracle. In fact, the algorithm can only scale to a problem size of about 300 flights. However, FAMS needs to schedule about 30,000 flights every day. Therefore, it is necessary to develop a much more efficient algorithm in order to scale up to this real-world problem size. In this section, we propose a heuristic sampling algorithm that matches any given marginal vector $x \in \mathrm{conv}(\mathcal{E})$ and is expected to achieve high entropy. This algorithm works for general security games (not only the air marshal scheduling problem), and is computationally efficient as long as the underlying security game can be solved efficiently.

At a high level, our idea is to design a randomized implementation of the celebrated Carathéodory's theorem, which makes the following existence statement: for any bounded polytope $P \subseteq \mathbb{R}^n$ and any $x \in P$, there exist (at most) $n+1$ vertices of $P$ such that $x$ can be written as a convex combination of these vertices. Interpreting $P$ as the convex hull of defender pure strategies, this means that any defender mixed strategy, i.e., a point in $P$, can be decomposed as a distribution over at most $n+1$ pure strategies ($n$ is the number of targets). We turn this existence statement into an efficient randomized algorithm, named CArathéodory Randomized Decomposition (CARD).

Figure 9.5: CARD decomposition.

Consider any polytope $P = \{z : Az \leq b, Mz = c\}$ explicitly represented by polynomially many linear constraints, and any $x \in P$. We use $A_i, b_i$ to denote the $i$'th row of $A$ and entry of $b$, respectively; $A_i z = b_i$ is a facet of $P$. Geometrically, CARD randomly picks a vertex $v_1 = \arg\max_{z \in P} \langle a, z \rangle$ for a linear objective $a \in [0,1]^n$ chosen uniformly at random. CARD then "walks along" the ray that originates from $v_1$ and points to $x$, until it crosses a facet of $P$, denoted by $A_{i^*} z = b_{i^*}$, at some point $v_2$ (see the illustration in Figure 9.5). Thus, $x$ can be decomposed as a convex combination of $v_1, v_2$. CARD then treats $v_2$ as a new $x$ and decomposes it within the facet $A_{i^*} z = b_{i^*}$ recursively, until $v_2$ becomes a vertex. Details are presented in Algorithm 10.

Algorithm 10: CARD
Require: $P = \{z \in \mathbb{R}^n : Az \leq b, Mz = c\}$ and $x \in P$
Ensure: $v_1, \ldots, v_k$ and $p_1, \ldots, p_k$ such that $\sum_{i=1}^k p_i v_i = x$.
1: if rank$(M) = n$ then
2: Return the unique point $v_1$ in $P$ and $p_1 = 1$.
3: else
4: Choose $a \in [-1, 1]^n$ uniformly at random.
5: Compute $v_1 = \arg\max_{z \in P} \langle a, z \rangle$.
6: Compute $t = \min_{i : A_i(x - v_1) > 0} \frac{b_i - A_i x}{A_i (x - v_1)}$. Let $i^*$ be the row achieving $t$, and $P' = \{z \in P : A_{i^*} z = b_{i^*}\}$.
7: $v_2 = x + t(x - v_1)$; $p_1 = \frac{t}{t+1}$; $p_2 = \frac{1}{1+t}$.
8: $[V', p'] = \mathrm{CARD}(v_2, P')$.
9: return $V = (v_1, V')$ and $p = (p_1, p_2 \cdot p')$.

A crucial ingredient of CARD is that each vertex $v_1$ is the optimal vertex solution to a uniformly random linear objective. Recall that the max-entropy distribution over any given support under no constraints is the uniform distribution. The intuition underlying CARD is that these randomly selected vertices will inherit the high entropy of their linear objectives. Notice that the decomposition generated by CARD is different in each execution due to its randomness; therefore the strategies generated by CARD are sampled from a very large support.
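The following is a minimal numerical sketch of Algorithm 10 in Python using scipy's LP solver. The tolerances, the random-number-generator plumbing, and the guard for the degenerate case where the sampled vertex already equals $x$ are implementation choices of ours and are not part of the pseudocode above.

import numpy as np
from scipy.optimize import linprog

def card(x, A, b, M=None, c=None, rng=None):
    # Randomized Caratheodory decomposition of x inside
    # P = {z : A z <= b, M z = c}.  A, b (and optionally M, c) are numpy
    # arrays; returns (vertices, probabilities) with sum_i p_i v_i = x.
    x = np.asarray(x, dtype=float)
    n = len(x)
    rng = rng if rng is not None else np.random.default_rng()
    if M is None:
        M, c = np.zeros((0, n)), np.zeros(0)
    # Base case: the equality system pins down a single point (a vertex).
    if np.linalg.matrix_rank(M) == n:
        return [x], [1.0]
    # Optimal vertex for a uniformly random linear objective.
    a = rng.uniform(-1.0, 1.0, size=n)
    res = linprog(-a, A_ub=A, b_ub=b, A_eq=M, b_eq=c,
                  bounds=[(None, None)] * n, method="highs")
    v1 = res.x
    d = x - v1
    if np.linalg.norm(d) < 1e-9:     # degenerate case: x is the sampled vertex
        return [v1], [1.0]
    # Walk from v1 through x until the ray exits P through facet A_i z = b_i.
    Ad = A @ d
    slack = b - A @ x
    ratios = np.full(len(b), np.inf)
    pos = Ad > 1e-12
    ratios[pos] = slack[pos] / Ad[pos]
    i_star = int(np.argmin(ratios))
    t = ratios[i_star]
    v2 = x + t * d
    p1, p2 = t / (1.0 + t), 1.0 / (1.0 + t)
    # Recurse inside the facet that was hit.
    M2 = np.vstack([M, A[i_star:i_star + 1]])
    c2 = np.concatenate([c, b[i_star:i_star + 1]])
    V, p = card(v2, A, b, M2, c2, rng)
    return [v1] + list(V), [p1] + [p2 * q for q in p]

Each call returns one decomposition with the stated property; sampling an index $i$ with probability $p_i$ then yields one pure strategy, and repeated calls draw from many different decompositions.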
9.3.3 Experiments

We now experimentally compare MaxEn and CARD with traditional security game algorithms in the air marshal scheduling problem. We are not aware of any previous algorithm that directly computes the optimal defender strategy against a particular leakage model; therefore, the rigorously optimal solution is not available. We instead use a "harder" BaseLine, which is the attacker utility assuming no leakage. This is the best (i.e., smallest) possible attacker utility. The most widely used approach for solving large-scale security games is the column generation technique, a.k.a. strategy/constraint generation (Jain et al., 2010; Bosansky, Jiang, Tambe, & Kiekintveld, 2015). We compare MaxEn and CARD with ColG, the optimal mixed strategy computed via column generation assuming no leakage (see Footnote 4 below). Note that without leakage, all three algorithms achieve the same solution quality, since they implement the same marginal vector. The goal of this experiment is to test their robustness in the presence of information leakage. Since it is impossible to obtain real-world data in this setting, all algorithms are tested on simulated instances of the Federal Air Marshal Scheduling problem with round-trip flights (FAMS). As in the previous section, we generate zero-sum security games with reward and cost drawn randomly from $[0, 10]$ and $[-10, 0]$, respectively. All results are averaged over 20 games. In the tested instances, we assume that the attacker can monitor two randomly chosen outbound flights and seeks to attack one return flight.

Figure 9.6: Utility comparisons in the FAMS domain (x-axis: DtS ratio; y-axis: attacker utility; series: MaxEn, CARD, ColG, BaseLine).

Footnote 4: The column generation technique is widely used in many security game algorithms. Though some security games use a compact linear program to directly compute the optimal marginal vector, the ultimate generation of a deployable mixed strategy still requires strategy generation techniques. In the FAMS domain, ColG is precisely the ASPEN algorithm (Jain et al., 2010), the leading algorithm today for scheduling air marshals at scale.

Figure 9.6 compares the defender utility obtained by the different algorithms when there is information leakage. We vary the deployment-to-saturation (DtS) ratio (Jain, Leyton-Brown, & Tambe, 2012). The DtS ratio captures the fraction of targets that can be covered in a pure strategy, which is $2k/n$ in the FAMS domain ($n = 60$ in Figure 9.6). From Figure 9.6, we observe that MaxEn and CARD significantly outperform ColG in our simulations; CARD is usually slightly outperformed by MaxEn. In fact, the attacker utility of the max-entropy approach is even close to the BaseLine benchmark. This shows that the approach performs very well, since the BaseLine is the lowest possible attacker utility. We observed that the higher the DtS ratio is, the worse ColG performs. This is possibly because, with higher DtS, ColG quickly converges to an optimal mixed strategy with very small support, since each pure strategy covers many targets. Unfortunately, such a small-support strategy suffers severely from the curse of correlation. We observed that in FAMS games with 100 targets, MaxEn, CARD, and ColG use 99,997, 2,199, and 53 pure strategies on average, respectively (MaxEn samples 100,000 pure strategies in our experiments, and almost all of them are different).

9.4 The Design of Randomized Patrol Routes

In this section, we consider the problem of designing randomized patrol routes, as illustrated in Section 7.2. This setting belongs to a broader class of games termed spatio-temporal security games. These games are played out in space and time, and have applications in many domains, e.g., wildlife protection, protection of mobile ferries, etc. (Basilico, Gatti, & Amigoni, 2009a; Fang, Jiang, & Tambe, 2013; Yin, Xu, Gan, An, & Jiang, 2015). In these domains, the defender needs to move patrollers as time goes on. Due to the inherent correlation among a patroller's consecutive moves, these games are more likely to suffer from information leakage.

We start with the formal definition of the problem. Like most previous work, we focus on discretized spatio-temporal security games. Such games are described by a $T \times N$ grid graph $G = (V, E)$, indicating a problem with $N$ cells and $T$ time layers (see Figure 9.7).

Figure 9.7: Structure of a spatio-temporal security game (axes: time and space).

We use $v_{t,i}$ to denote a grid node, representing cell $i$ at time $t$. Each $v_{t,i}$ is treated as a target, so there are $n = TN$ targets. The directed edges denote the patroller's feasible moves between cells in consecutive time layers. Such feasibility usually incorporates speed limits, terrain constraints, etc. Figure 9.7 depicts some feasible moves between time layers 1 and 2. A feasible patrol path is a path in $G$ starting from time 1 and ending at time $T$ (e.g., the dashed path in Figure 9.7). Note that there are exponentially many patrol paths. We assume that the defender has $k$ homogeneous patrollers, so a defender pure strategy corresponds to the set of nodes covered by $k$ feasible patrol paths.

Different from the cases in the previous two sections, we will prove that it is computationally intractable in this setting to compute the distribution that maximizes entropy subject to matching a given marginal vector. We will then develop a polynomial-time algorithm for a well-motivated special case. Finally, we thoroughly evaluate the algorithm based on both synthetic and real-world data.

9.4.1 Complexity Barriers

We prove the following theorem in this section.

Theorem 9.4.1. It is #P-hard to sample the max-entropy distribution for spatio-temporal security games even when there are only two time layers (i.e., $T = 2$).
Proof. When there are two time steps, the game structure corresponds to a bipartite graph (T = 2 in Figure 9.7). It is important to notice that a pure strategy here does not simply correspond to a bipartite matching of size k; therefore we cannot reduce from the problem of counting size- k matchings. This is because the selected k edges are allowed to share nodes. Moreover, our definition of a pure strategy is the set of covered targets, not the edges themselves. In fact, sometimes one pure strategy can be achieved by different sets ofk edges. To prove the theorem, we reduce from the problem of counting bases of a transversal matroid, which is known to be #P-complete (Colbourn, Provan, & Vertigan, 1995). Given any bipartite graphG = (L[R;E) withjLj =k,jRj =n andkn, any setTR is an independent set of the transversal matroidM(G) ofG if there exists a matching of sizejTj in the subgraph induced byL[T ; such aT is a base ifjTj =jLj =k. Given any bipartite graphG = (L[R;E) withk =jLjjRj = n, we reduce counting bases of the transversal matroidM(G) to computing the max-entropy distribution for the two- time-layer spatio-temporal security game on graphG. 5 LetS 2k denote the set of pure strategies 5 Though the definition of spatio-temporal security games requires that each time layer has the same number of nodes, this requirement is not essential since one can always add isolated nodes to each time layer to equalize the number of nodes. 144 that cover exactly 2k nodes. We first reduce counting bases ofM(G) to countingS 2k . This is simply because if a pure strategy covers 2k nodes, it must cover allk nodes inL and anotherk nodes inR, and these 2k nodes are matchable.Then elements inS 2k andM(G) are in one-to-one correspondence. Since counting reduces to generalized counting, we finally reduce generalized counting over the setS 2k to computing the max-entropy distribution for the following special subset of marginal vectorsX 2k =fx2 [0; 1] k+n : x 1;i = 1;8i2 L; P n i=1 x 2;i = kg. It is easy to see that any mixed strategy that matches a marginal vector x2X 2k must have support inS 2k . Therefore, when considering the max-entropy distribution for anyx2X 2k , we can w.l.o.g. restrict the set of pure strategies to beS 2k . By the computational equivalence between generalized counting and max-entropy sampling (Singh & Vishnoi, 2013), generalized counting overS 2k reduces to computing the max-entropy distribution for anyx2X 2k . 9.4.2 An Efficient Algorithm for a Restricted Setting Theorem 9.4.1 suggests that it is unlikely that there is an efficient algorithm for sampling the max-entropy distribution with given marginal probabilities in this setting. Moreover, there is no known polynomial size compact formulation for sampling the max-entropy distribution. Thus we cannot utilize state-of-the-art optimization software to tackle the problem neither. Nevertheless, in this section, we show that the max-entropy approach can be efficiently im- plemented in a well-motivated special setting where the defender only possesses a small number of patrollers. For example, in wildlife protection, the defender usually has only one or two patrol teams at each patrol post (Fang et al., 2016a); the US Coast Guard uses two patrollers to protect Staten Island ferries (Fang et al., 2013). We show that when the number of patrollers is small (i.e., a constant), the max entropy distribution can be sampled efficiently. Theorem 9.4.2. 
When the number of patrollers is a constant, there is a poly$(N, T)$-time algorithm for sampling the distribution that maximizes entropy subject to matching any given marginal vector in spatio-temporal security games.

Similar to the proof of Theorem 9.3.1, this proof also has two steps. First, we design a poly$(N, T)$-time algorithm for the generalized counting problem over the set system $\mathcal{E}$ of defender pure strategies. Second, we design an efficient sampling algorithm that samples a pure strategy $e$ from an exponentially large support, with the probability of each $e$ given by Equation (9.3).

Step 1. For ease of presentation, our description focuses on the case with two patrollers, though it easily generalizes to a constant number of patrollers. We propose a dynamic program (DP) that exploits the natural chronological order of the targets along the temporal dimension (see Footnote 6 below). Let us call a pure strategy a 2-path, since each of the two patrollers takes a path in $G$. Our goal is to compute the weighted count of all 2-paths in the grid graph $G$ (one path for each patroller), where the weight of a 2-path is the product of the weights of the nodes it traverses. Let $\{\lambda_{t,i}\}_{t \in [T], i \in [N]}$ be any given weight set. Obviously, the counting problem is easy if $T = 1$, i.e., there is only one time layer. Our key observation is that the solution for $T = t$ can be constructed by utilizing the solutions for $T = t - 1$. For any $1 \leq i \leq j \leq N$, we use $DP(i,j,t)$ to denote the solution to the counting problem restricted to the truncated graph with only time layers $1, 2, \ldots, t$, with the requirement that the two patrollers end at cells $i, j$ at time $t$. Observe that $DP(i,j,1) = \lambda_{1,i}\lambda_{1,j}$ when $i \neq j$ and $DP(i,i,1) = \lambda_{1,i}$. We then use the following update rule for $t \geq 2$:
$$DP(i,j,t) = \begin{cases} \lambda_{t,i}\,\lambda_{t,j} \sum_{(i',j') \in pre(i,j)} DP(i',j',t-1) & \text{if } i < j,\\[2pt] \lambda_{t,i} \sum_{(i',j') \in pre(i,j)} DP(i',j',t-1) & \text{if } i = j, \end{cases} \qquad (9.4)$$
where $pre(i,j) = \{(i',j') : i' \leq j' \text{ s.t. } v_{t-1,i'}, v_{t-1,j'} \text{ can reach } v_{t,i}, v_{t,j}\}$ is essentially the set of all pairs of nodes that can reach $v_{t,i}, v_{t,j}$. Note that the solution to the generalized counting problem is $\sum_{i \leq j} DP(i,j,T)$. The correctness of the algorithm follows from the observation that if the two patrollers are at $v_{t,i}$ and $v_{t,j}$, they must come from $v_{t-1,i'}$ and $v_{t-1,j'}$ for some $(i',j') \in pre(i,j)$; the update rule simply aggregates all such choices. The algorithm runs in poly$(N, T)$ time.

Footnote 6: Dynamic programming is widely used in counting problems. See, e.g., (Cryan & Dyer, 2002; Dyer, 2003) as well as the remarks in (Valiant, 1979). The novel parts usually lie in a careful analysis of the problem to uncover the proper structure for DP.

Step 2. Let $\tilde{\lambda}$ be the optimal solution of CP (9.2) in the spatio-temporal setting. Let $\lambda_{t,i} = e^{\tilde{\lambda}_{t,i}}$ for all $t, i$. Then the following algorithm efficiently samples a pure strategy $e$ from an exponentially large support, with the probability of each $e$ given by Equation (9.3). The correctness of Algorithm 11 follows from a similar argument as the proof of Lemma 21.

Algorithm 11: Max-Entropy Sampling in Spatio-Temporal Security Games
Input: $\tilde{\lambda} \in [0, \infty)^{(T+1)(N+1)}$.
Output: a pure strategy $e$.
1: Initialize: $e = \emptyset$; build the DP table $DP(i,j,t)$ according to Equation (9.4).
2: Sample two nodes $(v_{T,i}, v_{T,j})$, with $0 \leq i \leq j \leq N$, at time $T$ with probability
$$p = \frac{DP(i,j,T)}{\sum_{i=0}^{N} \sum_{j=i}^{N} DP(i,j,T)};$$
let $i^*, j^*$ be the two sampled indices and add $v_{T,i^*}, v_{T,j^*}$ to $e$.
3: Define $a = i^*$, $b = j^*$.
4: for $t = T - 1$ down to $1$ do
5: Sample nodes $v_{t,i}, v_{t,j}$, for $(i,j) \in pre(a,b)$ and $0 \leq i \leq j \leq N$, with probability
$$p = \frac{\lambda_{t,i}\,\lambda_{t,j}\, DP(i,j,t)}{DP(a,b,t+1)} \qquad \left(p = \frac{\lambda_{t,i}\, DP(i,j,t)}{DP(a,b,t+1)} \text{ if } i = j\right);$$
6: Let $v_{t,i^*}, v_{t,j^*}$ be the sampled nodes, and add $v_{t,i^*}, v_{t,j^*}$ to $e$;
7: Update $a = i^*$, $b = j^*$.
8: return $e$.
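The two pieces above (the counting table of Equation (9.4) and the backward sampler of Algorithm 11) can be sketched in a few lines of Python. The reading of $pre(i,j)$ as "some assignment of the two patrollers at cells $i', j'$ in layer $t-1$ to cells $i, j$ in layer $t$ is feasible", as well as the function and parameter names, are ours; cells and layers are 1-indexed and the boundary 0 indices of the pseudocode are dropped.

import random
from collections import defaultdict

def two_patroller_dp(N, T, lam, can_move):
    # Counting table of Equation (9.4).  lam[t][i] is the weight of node
    # v_{t,i}; can_move(t, a, i) says whether a patroller in cell a at layer
    # t-1 may be in cell i at layer t.  DP[(i, j, t)] (with i <= j) is the
    # weighted count of 2-paths through layers 1..t ending at cells i and j.
    DP = defaultdict(float)
    for i in range(1, N + 1):
        for j in range(i, N + 1):
            DP[(i, j, 1)] = lam[1][i] * lam[1][j] if i != j else lam[1][i]
    for t in range(2, T + 1):
        for i in range(1, N + 1):
            for j in range(i, N + 1):
                pre = [(a, b) for a in range(1, N + 1) for b in range(a, N + 1)
                       if (can_move(t, a, i) and can_move(t, b, j))
                       or (can_move(t, a, j) and can_move(t, b, i))]
                s = sum(DP[(a, b, t - 1)] for (a, b) in pre)
                DP[(i, j, t)] = (lam[t][i] * lam[t][j] if i != j else lam[t][i]) * s
    return DP

def sample_two_paths(N, T, lam, can_move, DP):
    # Backward sampler in the spirit of Algorithm 11: draw the end cells at
    # layer T proportionally to DP[(i, j, T)], then walk backwards, drawing
    # each predecessor pair proportionally to its term in Equation (9.4).
    def draw(pairs, weights):
        tot = sum(weights)
        u, acc = random.random() * tot, 0.0
        for pr, w in zip(pairs, weights):
            acc += w
            if u < acc:
                return pr
        return pairs[-1]

    pairs = [(i, j) for i in range(1, N + 1) for j in range(i, N + 1)]
    a, b = draw(pairs, [DP[(i, j, T)] for (i, j) in pairs])
    covered = [(T, a), (T, b)]
    for t in range(T - 1, 0, -1):
        pre = [(i, j) for (i, j) in pairs
               if (can_move(t + 1, i, a) and can_move(t + 1, j, b))
               or (can_move(t + 1, i, b) and can_move(t + 1, j, a))]
        a, b = draw(pre, [DP[(i, j, t)] for (i, j) in pre])
        covered.extend([(t, a), (t, b)])
    return covered   # covered targets v_{t,i} of one sampled pure strategy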
9.4.3 Experiments

9.4.3.1 Synthetic Data

We first experimentally compare MaxEn and CARD with traditional algorithms for spatio-temporal security games. As in the setup of Section 9.3.3, we are not aware of any previous algorithm that directly computes the optimal defender strategy against a particular leakage model. Instead, we use a "harder" BaseLine, which is the attacker utility assuming no leakage. This is the best (i.e., smallest) possible attacker utility. We compare MaxEn and CARD with ColG (the optimal mixed strategy computed via column generation assuming no leakage). Note that without leakage, all three algorithms achieve the same solution quality, since they implement the same marginal vector. The goal of this experiment is to test their robustness in the presence of information leakage.

In this part, we test all algorithms on simulated instances. All results are averaged over 20 zero-sum security games with utilities drawn randomly from $[-10, 10]$. In the tested instances, unless specifically mentioned, we always assume that the attacker can monitor two randomly chosen targets at the first time layer (i.e., $t = 1$) and seeks to attack one target at the last time layer (i.e., $t = T$).

Figure 9.8: Utility comparisons in spatio-temporal security games (y-axis: attacker utility; series: MaxEn, CARD, ColG, BaseLine). Panel (a): increasing $T$; panel (b): increasing the number of monitored targets (#MoT).

Figure 9.8 compares the defender utility obtained by the different algorithms when there is information leakage. From the figure, it is easy to see that MaxEn and CARD significantly outperform ColG; CARD is usually slightly outperformed by MaxEn. Moreover, the attacker utility that the max-entropy approach induces is even close to the BaseLine benchmark. This shows that the approach performs very well in the simulated random games, since the BaseLine is the lowest possible attacker utility.

Figure 9.8(a) compares the algorithms by varying the number of time layers $T$, while fixing $N = 9$. When $T$ increases, MaxEn and CARD approach the BaseLine, i.e., the lowest possible attacker utility. This shows that in patrolling strategies of large entropy, the correlation between a patroller's initial and later moves gradually disappears as time goes on, and it illustrates the validity of the max-entropy approach for mitigating CoC. In Figure 9.8(b), we fix $T = 9$, $N = 9$, and compare the algorithms by varying the number of monitored targets (#MoT). We observe that even when the attacker can monitor 6 out of 9 targets at $t = 1$, MaxEn and CARD are still close to BaseLine, while the performance of ColG gradually decreases as #MoT increases.
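To make the leakage model concrete, the next snippet shows how an observation of the monitored targets conditions a sampled mixed strategy; it is an illustrative evaluation aid of our own, not code from the experiments. It takes a list of sampled pure strategies (sets of covered targets) with their empirical probabilities, the monitored targets, and a candidate attack target, and returns the coverage of that target conditioned on each possible observation.

from collections import defaultdict

def conditional_coverage(strategies, probs, monitored, target):
    # strategies: list of pure strategies, each a set of covered targets;
    # probs: their (empirical) probabilities; monitored: targets the attacker
    # can observe; target: the candidate attack target.
    # Returns {observation: (probability of the observation,
    #                        coverage of `target` given that observation)}.
    post = defaultdict(lambda: [0.0, 0.0])
    for e, p in zip(strategies, probs):
        obs = tuple(t in e for t in monitored)
        post[obs][0] += p
        if target in e:
            post[obs][1] += p
    return {obs: (mass, cov / mass) for obs, (mass, cov) in post.items()}

For small-support strategies such as those produced by column generation, a single observation often pins down the remaining coverage almost completely, which is exactly the curse of correlation that the max-entropy approach is designed to mitigate.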
9.4.3.2 Real-World Data from the Queen Elizabeth National Park

Finally, we test our algorithm on a real-world wildlife crime dataset from Uganda's Queen Elizabeth Protected Area (QEPA). QEPA spans approximately 2,520 square kilometers and is patrolled by wildlife park rangers. While on patrol, they collect data on animal sightings and signs of illegal human activity (e.g., poaching, trespassing). In addition to this observational data, the dataset contains terrain information (e.g., slope, vegetation), distance data (e.g., nearest patrol post), animal density, and the kilometers walked by rangers in an area (i.e., effort). There are 39 patrol posts at QEPA. We test the patrol design algorithm on the real data/model at patrol posts 11, 19 and 24, which are the three posts that had the most attacks in the three months of our testing. We divide the area around each patrol post into 1-square-kilometer grid cells and optimize the patrol route for that particular post based on the importance of each cell, estimated from several features (e.g., animal density, past captures, terrain, past effort). In our data, all posts have fewer than 100 cells/targets reachable from the post by a route of maximum duration $T = 12$ (equivalently, a route that is 12 cells long).

We aim to compare our patrol design algorithm with the real patrol routes adopted in the past by park rangers. One major challenge in experimenting with this real data is the lack of ground truth. In particular, for the past patrolling, we do not know what happened at those cells that were not patrolled, nor do we know what would have happened if the rangers had adopted our algorithm. Therefore, as an approximation, we use the state-of-the-art predictive model in (Gholami, Ford, Fang, Plumptre, Tambe, Driciru, Wanyama, Rwetsiba, Nsubaga, & Mabonga, 2017) to estimate the attacks at each cell. This is of course not perfect, but it is the best comparison we could do currently, since (Gholami et al., 2017) show that this predictive model outperforms all previous poaching prediction models and provides relatively accurate predictions on the QEPA dataset.

The comparisons are conducted under the following criteria:

#Detection: the total number of detected attacks under the prediction model. Since the prediction model we adopt is a 0-1 classification algorithm, #Detection also equals the number of cells at which the corresponding patrol routes result in detected attacks.

#Routes: the number of different patrol routes in 90 days of route samples (corresponding to a 3-month patrolling period).

Entropy: the entropy of the empirical distribution of the 90 samples.

The first criterion concerns the efficacy of the patrol routes, while the last two criteria are used to test the unpredictability of the patrol routes. For the #Detection criterion, $a/b$ means that out of the $b$ cells with predicted attacks, $a$ are patrolled. For example, in Table 9.1, the "15/19" means the following: 19 cells are predicted to be attacked, and the patrol route visits 15 of these 19 cells. A higher value of #Routes means that the patroller has more choices of patrol routes and is thus less explorable by the poacher; Entropy is a natural measure to quantify uncertainty.

Criteria      | Post 11: MaxEn | Post 11: Past | Post 19: MaxEn | Post 19: Past | Post 24: MaxEn | Post 24: Past
#Detections   | 15/19          | 4/19          | 6/6            | 5/6           | 4/4            | 3/4
#Routes       | 61             | 4             | 22             | 33            | 34             | 5
Entropy       | 4.0            | 1.2           | 2.6            | 3             | 2.8            | 1.4

Table 9.1: Comparisons of different criteria at different patrol posts.

The results are jointly presented in Table 9.1. As we can see, the patrol routes generated by MaxEn clearly outperform past patrolling in terms of the #Detections criterion. The routes we generate can detect most (if not all) of the predicted attacks. In terms of unpredictability, past patrolling does not have stable performance.
Particularly, it follows only a few routes at posts 11 and 24 with low unpredictability but takes many different routes at post 19 with high unpredictability. This is a consequence of various factors at different posts, e.g., the patroller’s preferences, location of the patrol post (e.g., inside or at the boundary of the area), terrain features, etc. On the other hand, MaxEn always comes with sufficient unpredictability. This shows the advantage ofMaxEn over the past patrolling. 150 Part IV Conclusion 151 Chapter 10 Conclusions and Open Directions This thesis seeks to understand how information affects agents’ decision making in strategic interactions through a computational lens. It illustrates the double-edged role of information through two threads of research: (1) how to utilize information to one’s own advantage in strategic interactions; (2) how to mitigate losses resulting from information leakage to an adversary. We conduct both theoretical study to understand the algorithmic foundations of these problems as well as applied study to show how these problems can be modeled and solved in real-world applications. Notably, algorithms from this thesis have been implemented and tested in the field by security agencies. This shows the potential real-world impact of understanding the role of information in decision making. Though the work of this thesis is primarily motivated by the strategic interaction between security agencies and adversaries (i.e., security games), we believe that the foundational economic models we studied and the basic tools we developed can find applications in many other application domains as well. We conclude with some future directions that are motivated by this thesis or are aligned with its theme. Future Direction I: How to optimally persuade receivers in more realistic, yet more intricate, settings by taking into account, e.g., externalities among receivers, uncer- tainties in receivers’ beliefs, and multi-round interactions between the sender and receivers? Can we still design efficient algorithms for these problems? In Section 5.2, we provide a thorough algorithmic analysis for two of the most foundational models of persuasion, and consider the setting where there is either one receiver or multiple receivers with binary actions and no externalities. However, in many settings, the receivers may have externalities and their decisions affect each others’ payoffs. Examples include: (1) auction settings where the auctioneer (the sender) may want to persuade bidders (receivers) about the value of the item for sale; (2) traffic routing where certain recommendation systems like Google Maps (the sender) may want to persuade drivers (receivers) about choices of routing paths; (3) 152 voting settings where a principal (the sender) may want to persuade voters (receivers) regarding which candidate to vote for. In all these applications, the receivers’ decisions will affect each other’s payoffs. One fundamental problem is to understand the complexity of persuasion due to such externalities. One important concern about previous models of persuasion is that the sender and receivers must share the same prior belief and the receivers must know precisely the signaling scheme. However, in practice, these beliefs usually come from observations or data analysis, and thus are rarely precise. To make these models more realistic, we have to understand how robust the models or computation are to uncertain or imprecise player beliefs. 
Moreover, in many settings, persuasion is done in a multi-round interaction between the sender and receiver. How to optimally persuade receivers in a multi-round interaction and how would this change the complexity of the problem? Future Direction II: What are other roles that information could play in security domains? How to improve security decision making by taking them into account? This thesis initiates a computational study of the role of information in security decision making and considers how to utilize the defender’s informational advantage and how to deal with harms due to information leakage. This motivates the future study of many other possible roles that information could play in security domains. For example, in security games we usu- ally assume that the adversary will surveil the defender’s strategy and then strategically respond. However, in many domains, the defender can also surveil the adversary by using surveillance tools like closed-circuit televisions. One important question in these problems is how to inte- grate such information into the defender’s decision making to improve the defense. This thesis studies how the defender can utilize informational advantages to deceive the adversary. How- ever, in practice, the adversary can also be deceptive and may provide misleading information to the defender. Therefore, another interesting question is how to take such strategic adversary behavior into account and how it would affect the defender’s decision making. This question is particularly relevant in cybersecurity domains since the attackers there are usually more sophisti- cated, and deceptive attacks have been frequently observed in practice (Rowe & Rothstein, 2004; Rowe, 2006). These are just two examples, and there are many other questions pertaining to the role of information in security domains that remain largely unexplored. More importantly, when considering a particular application, how can we take into account various effects of information together to make better decisions? Future Direction III: Besides security domains, how does information affect the decision making in other multi-agent systems and applications? How to model and design efficient algorithms for these applications? 153 This thesis is primarily motivated by strategic interactions in security domains. This, how- ever, is just one particular example of multi-agent systems. In the future, it will be interesting to study other systems with self-interested agents, and our work indicates that information can play a crucial role in influencing the outcomes of these systems. This broad line of research is particu- larly relevant in the digital age since the ubiquitous access to data and advances in data analytics have made it much easier today to generate and communicate information. This is profoundly af- fecting people’s decision making. Indeed, many of our decisions today (e.g., which route to take, which restaurant to go to, which stock to invest in, which candidate to vote for, etc.) are affected by, or even rely on, numerous information sources such as news, media, social networks, search engines and various recommendation system applications (e.g., Google Maps, Yelp, etc.). This brings tremendous opportunities for studying the effects of information in these domains and un- derstanding how we can utilize these effects to improve decision making. 
Moreover, we believe that the requirement of automated applications today makes it particularly suitable to develop computational techniques and algorithms for solving these problems. 154 Part V Appendices 155 Appendix A Omitted Proofs From Section 5.2 A.1 Omissions from Section 5.2.2 A.1.1 Symmetry of the Optimal Scheme (Theorem 5.2.1) To prove Theorem 5.2.1, we need two closure properties of optimal signaling schemes — with respect to permutations and convex combinations. We use to denote a permutation of [n], and let SS n denote the set of all such permutations. We define the permutation () of a state of nature 2 [m] n so that (()) j = (j) , and similarly the permutation of a signal i so that ( i ) = (i) . Given a signatureM =f(M i ; i )g i2[n] , we define the permuted signature (M) =f(M i ;( i ))g i2[n] , where M denotes applying permutation to the rows of a matrixM. Lemma 23. Assume the action payoffs are i.i.d., and let2SS n be an arbitrary permutation. If M is the signature of a signaling scheme', then(M) is the signature of the scheme' defined by' () =('( 1 ())). Moreover, if' is persuasive and optimal, then so is' . Proof. LetM =f(M ;)g 2 be the signature of', as given in the statement of the lemma. We first show that(M) =f(M ;())g 2 is realizable as the signature of the scheme' . 156 By definition, it suffices to show that P ()' (;())M = M for an arbitrary signal (). X ()' (;())M = X ()'( 1 ();)M (by definition of' ) = X 2 ()'( 1 ();)( 1 M ) (by linearity of permutation) = X 2 ()'( 1 ();)M 1 () = X 2 ( 1 ())'( 1 ();)M 1 () (Since is i.i.d.) = X 0 2 ( 0 )'( 0 ;)M 0 (by renaming 1 () to 0 ) =M (by definition ofM ) Now, assuming ' is persuasive, we check that ' is persuasive by verifying the relevant inequality for its signature. (M i ) (i) (M i ) (j) =M i i M i j 0 Moreover, we show that the sender’s utility is the same for' and' , completing the proof. (M i ) (i) = (M i ) i Lemma 24. Let t 2 [0; 1]. IfA = (A 1 ;:::;A n ) is the signature of scheme ' A , and B = (B 1 ;:::;B n ) is the signature of a scheme ' B , then their convex combinationC = (C 1 ;:::;C n ) with C i = tA i + (1t)B i is the signature of the scheme ' C which, on input, outputs' A () with probabilityt and' B () with probability 1t. Moreover, if' A and ' B are both optimal and persuasive, then so is' C . Proof. This follows almost immediately from the fact that the optimization problem in Figure 5.2 is a linear program, with a convex feasible set and a convex family of optimal solutions. We omit the straightforward details. Proof of Theorem 5.2.1 Given an optimal and persuasive signaling scheme' with signaturef(M i ; i )g i2[n] , we show the existence of a symmetric optimal and persuasive scheme of the form in Definition 1. Accord- ing to Lemma 23, for 2 SS n the signaturef(M i ;( i ))g i2[n] — equivalently written as 157 f(M 1 (i) ; i g i2[n] — corresponds to the optimal persuasive scheme' . Invoking Lemma 24, the signature f(A i ; i )g i2[n] =f( 1 n! X 2SSn M 1 (i) ; i )g i2[n] also corresponds to an optimal and persuasive scheme, namely the scheme which draws a permu- tation uniformly at random, then signals according to' . Observe that theith row of the matrixM 1 (i) is the 1 (i)th row of the matrixM 1 (i) . ExpressingA i i as a sum over permutations2SS n , and grouping the sum byk = 1 (i), we can write A i i = 1 n! X 2SSn [M 1 (i) ] i = 1 n! X 2SSn M 1 (i) 1 (i) = 1 n! n X k=1 M k k 2SS n : 1 (i) =k = 1 n! n X k=1 M k k (n 1)! = 1 n n X k=1 M k k ; which does not depend oni. 
Similarly, thejth row of the matrixM 1 (i) is the 1 (j)th row of the matrixM 1 (i) . Forj6= i, expressingA i j as a sum over permutations2 SS n , and grouping the sum byk = 1 (i) andl = 1 (j), we can write A i j = 1 n! X 2SSn [M 1 (i) ] j = 1 n! X 2SSn M 1 (i) 1 (j) = 1 n! X k6=l M k l 2SS n : 1 (i) =k; 1 (j) =l = 1 n! X k6=l M k l (n 2)! = 1 n(n 1) X k6=l M k l ; 158 which does not depend oni orj. Let x = 1 n n X k=1 M k k ; y = 1 n(n 1) X k6=l M k l : The signaturef(A i ; i )g i2[n] therefore describes an optimal, persuasive, and symmetric scheme withs-signature (x;y). A.1.2 The Optimal Scheme Proof of Lemma 1 For the “only if” direction,jjxjj 1 = 1 n andx + (n 1)y =q were established in Section 5.2.2. To show that is a realizable symmetric reduced form for an allocation rule, let' be a signaling scheme withs-signature (x;y). Recall from the definition of ans-signature that, for eachi2 [n], signal i has probability 1=n, andnx is the posterior distribution of actioni’s type conditioned on signal i . Now consider the following allocation rule: Given a type profile2 [m] n of then bidders, allocate the item to bidderi with probability'(; i ) for anyi2 [n]. By Bayes rule, Pr[i gets itemji has typej] =Pr[i has typejji gets item] Pr[i gets item] Pr[i has typej] =nx j 1=n q j = x j q j Therefore is indeed the reduced form of the described allocation rule. For the “if” direction, let ,x, andy be as in the statement of the lemma, and consider an allocation rule A with symmetric reduced form . Observe that A always allocates the item, since for each player i 2 [n] we have Pr[i gets the item] = P m j=1 q j j = P m j=1 x j = 1 n . We define the direct signaling scheme' A by' A () = A() . LetM = (M 1 ;:::;M n ) be the signature of ' A . Recall that, for and arbitrary i 2 [n] and j 2 [m], M i ij is the probability that' A () = i and i =j; by definition, this equals the probability thatA allocates the item to player i and her type is j, which is j q j = x j . As a result, the signatureM of ' A satisfies M i i = x for every action i. If ' A were symmetric, we would conclude that its s-signature is (x;y) since everys-signature (x;y 0 ) must satisfyx + (n 1)y 0 =q (see Section 5.2.2). However, this is not guaranteed when the allocation rule A exhibits some asymmetry. Nevertheless,' A can be “symmetrized” into a signaling scheme' 0 A which first draws a random permutation 2 SS n , and signals (' A ( 1 ())). That ' 0 A has s-signature (x;y) follows a similar argument to that used in the proof of Theorem 5.2.1, and we therefore omit the details here. 159 Finally, observe that the description of' 0 A above is constructive assuming black-box access toA, with runtime overhead that is polynomial inn andm. Proof of Lemma 2 By Lemma 1, we can re-write LP (5.2) as follows: maximize nx subject to xy x + (n 1)y =q jjxjj 1 = 1 n ( x 1 q 1 ;::::; xm qm ) is a realizable symmetric reduced form (A.1) From (Border, 1991, 2007; Cai et al., 2012; Alaei et al., 2012), we know that the family of all the realizable symmetric reduced forms constitutes a polytope, and moreover that this polytope admits an efficient separation oracle. The runtime of this oracle is polynomial inm andn, and as a result the above linear program can be solved inpoly(n;m) time using the Ellipsoid method. 
A.1.3 A Simple (1 1=e)-Approximate Scheme Proof of Theorem 5.2.3 Given a binary signal = (o 1 ;:::;o n )2fHIGH;LOWg n , the posterior type distribution for an action equals nx if the corresponding component signal is HIGH, and equals ny if the component signal isLOW. This is simply a consequence of the independence of the action types, the fact that the different component signals are chosen independently, and Bayes’ rule. The constraintx y implies that the receiver prefers actionsi for whicho i =HIGH, any one of which induces an expected utility ofnx for the receiver andnx for the sender. The latter quantity matches the optimal value of LP (5.3). The constraintjjxjj 1 = 1 n implies that each component signal isHIGH with probability 1 n , independently. Therefore, the probability that at least one component signal is HIGH equals 1 (1 1 n ) n 1 1 e . Since payoffs are nonnegative, and since a rational receiver selects aHIGH action when one is available, the sender’s overall expected utility is at least a 1 1 e fraction of the optimal value of LP (5.3). 160 A.2 Proof of Theorem 5.2.5 This section is devoted to proving Theorem 5.2.5. Our proof starts from the ideas of (Gopalan et al., 2015), who show the #P-hardness for revenue or welfare maximization in several mecha- nism design problems. In one case, (Gopalan et al., 2015) reduce from the #P -hard problem of computing the Khintchine constant of a vector. Our reduction also starts from this problem, but is much more involved: First, we exhibit a polytope which we term Khintchine polytope, and show that computing the Khintchine constant reduces to linear optimization over the Khintchine poly- tope. Second, we present a reduction from the membership problem for the Khintchine polytope to the computation of optimal sender utility in a particularly-crafted instance of persuasion with independent actions. Invoking the polynomial-time equivalence between membership checking and optimization (see, e.g., (Gr¨ otschel et al., 1988)), we conclude the #P-hardness of our prob- lem. The main technical challenge we overcome is in the second step of our proof: given a point x which may or may not be in the Khintchine polytopeK, we construct a persuasion instance and a thresholdT so that points inK encode signaling schemes, and the optimal sender utility is at leastT if and only ifx2K and the scheme corresponding tox results in sender utilityT . The Khintchine Polytope We start by defining the Khintchine problem, which is shown to be #P-hard in (Gopalan et al., 2015). Definition 7. (Khintchine Problem) Given a vector a2 R n , compute the Khintchine constant K(a) ofa, defined as follows: K(a) = E f1g n [jaj]; where is drawn uniformly at random fromf1g n . To relate the Khintchine problem to Bayesian persuasion, we begin with a persuasion in- stance with n i.i.d. actions. Moreover, there are only two action types, 1 which we refer to as type -1 and type +1. The state of nature is a uniform random draw from the setf1g n , with the ith entry specifying the type of action i. It is easy to see that these actions are i.i.d., with marginal probability 1 2 for each type. We call this instance the Khintchine-like persuasion set- ting. As in Section 5.2.2 , we still use the signature to capture the payoff-relevant features of a signaling scheme. A signature for the Khintchine-like persuasion problem is of the form M = (M 1 ;:::;M n ) whereM i 2 R n2 for anyi2 [n]. 
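Border's theorem provides the polynomial-time separation oracle cited above; for intuition only, the realizability constraint in LP (A.1) can also be checked directly on tiny instances by writing the allocation rule explicitly, with one variable per (type profile, bidder) pair, as in the proof of Lemma 1. The brute-force sketch below does exactly that; it is exponential in $n$, meant only to illustrate the constraint, and the encoding and function name are our own.

import itertools
import numpy as np
from scipy.optimize import linprog

def reduced_form_realizable(q, pi, n):
    # q[j]: i.i.d. type distribution over m types; pi[j]: candidate symmetric
    # reduced form, i.e., Pr[i gets the item | i has type j] for every bidder.
    # Feasibility LP over an explicit allocation rule z[theta, i].
    m = len(q)
    profiles = list(itertools.product(range(m), repeat=n))
    prob = {th: float(np.prod([q[t] for t in th])) for th in profiles}
    nvar = len(profiles) * n
    idx = {(th, i): k for k, (th, i) in
           enumerate(itertools.product(profiles, range(n)))}
    # sum_i z[theta, i] <= 1 for every type profile theta
    A_ub = np.zeros((len(profiles), nvar)); b_ub = np.ones(len(profiles))
    for r, th in enumerate(profiles):
        for i in range(n):
            A_ub[r, idx[(th, i)]] = 1.0
    # sum_{theta: theta_i = j} Pr[theta] * z[theta, i] = q[j] * pi[j] for all i, j
    A_eq = np.zeros((n * m, nvar)); b_eq = np.zeros(n * m)
    for i in range(n):
        for j in range(m):
            row = i * m + j
            b_eq[row] = q[j] * pi[j]
            for th in profiles:
                if th[i] == j:
                    A_eq[row, idx[(th, i)]] = prob[th]
    res = linprog(np.zeros(nvar), A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
                  bounds=[(0, 1)] * nvar, method="highs")
    return res.status == 0   # feasible iff the reduced form is realizable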
We pay special attention to signaling 1 Recall from Section 5.2.2 that each type is associated with a pair (;), where [] is the payoff to the sender [receiver] if the receiver takes an action of that type. 161 maximize P n i=1 a i (M + i;+1 M + i;1 ) P n i=1 a i (M i;+1 M i;1 ) subject to (M + ;M )2K(n) (A.2) Linear program for computing the Khintchine constantK(a) fora2R n schemes which use only two signals, in which case we represent them using a two-signal signa- ture of the form (M 1 ;M 2 )2R n2 R n2 . Recall that such a signature is realizable if there is a signaling scheme which uses only two signals, with the property thatM i jt is the joint probability of theith signal and the event that actionj has typet. We now define the Khintchine polytope, consisting of a convex family of two-signal signatures. Definition 8. The Khintchine polytope is the familyK(n) of realizable two-signal signatures (M 1 ;M 2 ) for the Khintchine-like persuasion setting which satisfy the additional constraints M 1 i;1 +M 1 i;2 = 1 2 8i2 [n]. We sometimes useK to denote the Khintchine polytopeK(n) when the dimensionn is clear from the context. Note that the constraintsM 1 i;1 +M 1 i;2 = 1 2 ;8i2 [n] state that the first signal should be sent with probability 1 2 (hence also the second signal). We now show that optimizing over the Khintchine polytope is #P -hard by reducing the Kintchine problem to Linear program (A.2). Lemma 25. General linear optimization over the Khintchine polytopeK is #P -hard. Proof. For any givena2R n , we reduce the computation ofK(a) – the Khintchine constant for a – to a linear optimization problem over the Khintchine polytopeK. Since our reduction will use two signals + and which correspond to the sign ofa, we will use (M + ;M ) to denote the two matrices in the signature in lieu of (M 1 ; M 2 ). Moreover, we use the two action types +1 and1 to index the columns of each matrix. For example,M + i;1 is the joint probability of signal + and the event that theith action has type1. We claim that the Kintchine constant K(a) equals the optimal objective value of the implicitly-described linear program (A.2). We denote this optimal objective value by OPT (LP (A.2)). We first prove thatK(a) OPT (LP (A.2)). Consider a signaling scheme ' in the Kintchine-like persuasion setting which simply outputs sign(a) for each state of na- ture 2f1g n (breaking tie uniformly at random if a = 0). Since is drawn uniformly fromf1g n andsign(a) =sign(a), this scheme outputs each of the signals and + with probability 1 2 . Consequently, the two-signal signature of' is a point inK. Moreover, 162 evaluating the objective function of LP (A.2) on the two-signal signature (M + ;M ) of' yields K(a) =E [jaj], as shown below. E [jaj] =E [aj + ]Pr( + ) +E [aj ]Pr( ) = n X i=1 a i E [ i j + ]Pr( + ) n X i=1 a i E [ i j ]Pr( ) = n X i=1 a i [Pr( i = 1j + )Pr( i =1j + )]Pr( + ) n X i=1 a i [Pr( i = 1j )Pr( i =1j )]Pr( ) = n X i=1 a i [Pr( i = 1; + )Pr( i =1; + )] n X i=1 a i [Pr( i = 1; )Pr( i =1; )] = n X i=1 a i [M + i;+1 M + i;1 ] n X i=1 a i [M i;+1 M i;1 ] This concludes the proof thatK(a)OPT (LP (A.2)). Now we proveK(a) OPT (LP (A.2)). Take any signaling scheme which uses only two signals + and , and let (M + ;M ) be its two-signal signature. Notice, however, that + now is only the “name” of the signal, and does not imply thata is positive. 
Nevertheless, it is still valid to reverse the above derivation until we reach n X i=1 a i [M + i;+1 M + i;1 ] n X i=1 a i [M i;+1 M i;1 ] = E [aj + ]Pr( + ) +E [aj ]Pr( ): Sincea anda are each no greater thanjaj, we have E [aj + ]Pr( + ) +E [aj ]Pr( )E [jajj + ]Pr( + ) +E [jajj ]Pr( ) =E [jaj] =K(a): That is, the objective value of LP (A.2) is upper bounded byK(a), as needed. Before we proceed to present the reduction from the membership problem forK to optimal persuasion, we point out an interesting corollary of Lemma 25. Corollary 2. LetP be the polytope of realizable signatures for a persuasion problem withn i.i.d. actions andm types (see Section 5.2.2). Linear optimization overP is #P -hard, and this holds even whenm = 2. Proof. Consider the Khintchine-like persuasion setting. It is easy to see that the Khintchine polytopeK can be obtained fromP by adding the constraintsM i = 0 fori 3 andM 1 i;1 + 163 M 1 i;2 = 1 2 fori2 [n], followed by a simple projection. Therefore, the membership problem forK can be reduced in polynomial time to the membership problem forP, since the additional linear constraints can be explicitly checked in polynomial time. By the polynomial-time equivalence between optimization and membership, it follows that general linear optimization overP is #P - hard. Remark A.2.1. It is interesting to compare Corollary 2 to single item auctions with i.i.d. bidders, where the problem does admit a polynomial-time separation oracle for the polytope of realizable signatures via Border’s Theorem (Border, 1991, 2007) and its algorithmic properties (Cai et al., 2012; Alaei et al., 2012). In contrast, the polytope of realizable signatures for Bayesian persua- sion is #P-hard to optimize over. Nevertheless, in Section 5.2.2 we were indeed able to compute the optimal signaling scheme and sender utility for persuasion with i.i.d. actions. Corollary 2 conveys that it was crucial for our algorithm to exploit the special structure of the persuasion objective and the symmetry of the optimal scheme, since optimizing a general objective overP is #P-hard. Reduction We now present a reduction from the membership problem for the Khintchine polytope to the computation of optimal sender utility for persuasion with independent actions. As the output of our reduction, we construct a persuasion instance of the following form. There are n + 1 actions. Action 0 is special – it deterministically results in sender utility and receiver utility 0. Here, we think of > 0 as being small enough for our arguments to go through. The other n actions are regular. Action i > 0 independently results in sender utilitya i and receiver utility a i with probability 1 2 (call this the type 1 i ), or sender utilityb i and receiver utility b i with probability 1 2 (call this the type 2 i ). Note that the sender and receiver utilities are zero-sum for both types. Notice that, though each regular action’s type distribution is uniform over its two types, the actions here are not identical because the associated payoffs — specified bya i andb i for each actioni — are different for different actions. Since the special action is deterministic and the probability of its (only) type is 1 in any signal, we can interpret any (M 1 ;M 2 )2K(n) as a two-signal signature for our persuasion instance (the row corresponding to the special action 0 is implied). For example,M 1 i;2 is the joint probability of the first signal and the event that actioni has type 2 i . 
Our goal is to reduce membership checking forK(n) to computing the optimal expected sender utility for a persuasion instance with carefully chosen parametersfa i g n i=1 ,fb i g n i=1 , and. In relating optimal persuasion to the Khintchine polytope, there are two main difficulties: (1)K consists of two-signal signatures, so there should be an optimal scheme to our persuasion instance which uses only two signals; (2) To be consistent with the definition ofK, such an 164 optimal scheme should send each signal with probability exactly 1 2 . We will design specific ;a i ;b i to accomplish both goals. For notational convenience, we will again use (M + ;M ) to denote a typical element inK instead of (M 1 ;M 2 ) because, as we will see later, the two constructed signals will induce positive and negative sender utilities, respectively. Notice that there are only n degrees of freedom in (M + ;M )2K. This is because M + +M is the all- 1 2 matrix in R n2 , corresponding to the prior distribution of states of nature (by the definition of realizable signatures). Moreover, M + i;1 +M i;2 = 1 2 for alli2 [n] (by the definition ofK). Therefore, we must have M + i;1 =M i;2 = 1 2 M + i;2 = 1 2 M i;1 : This implies that we can parametrize signatures (M + ;M )2K by a vectorx2 [0; 1 2 ] n , where M + i;1 = M i;2 = x i andM + i;2 = M i;1 = 1 2 x i for eachi2 [n]. For anyx2 [0; 1 2 ] n , letM(x) denote the signature (M + ;M ) defined byx as just described. We can now restate the membership problem forK as follows: givenx2 [0; 1 2 ] n , determine whetherM(x)2K. When any of the entries ofx equals 0 or 1 2 this problem is trivial, 2 so we assume without loss of generality that x 2 (0; 1 2 ) n . Moreover, when x i = 1 4 for some i, it is easy to see that a signaling scheme with signatureM(x), if one exists, must choose its signal independently of the type of action i, and thereforeM(x) 2 K(n) if and only if M(x i )2K(n 1). This allows us to assume without loss of generality thatx i 6= 1 4 for alli. Givenx2 (0; 1 2 ) n with x i 6= 1 4 for all i, we construct specific and a i ;b i for all i such that we can determine whetherM(x)2K by simply looking at the optimal sender utility in the corresponding persuasion instance. We choose parametersa i andb i to satisfy the following two equations. x i a i + ( 1 2 x i )b i = 0: (A.3) ( 1 2 x i )a i +x i b i = 1 2 : (A.4) We note that the above linear system always has a solution when x i 6= 1 4 , which we assumed previously. We make two observations about our choice ofa i andb i . First, the prior expected receiver utility 1 2 (a i +b i ) equals 1 2 for all actionsi (by simply adding Equation (A.3) and (A.4)). Second,a i andb i are both non-zero, and this follows easily from our assumption thatx i 2 (0; 1 2 ). Now we show how to determine whetherM(x)2K by only examining the optimal sender utility in the constructed persuasion instance. We start by showing that restricting to two-signal schemes is without loss of generality in our instance. 2 Ifxi is 0 or 1 2 , thenM(x)2K if and only ifxj = 1 4 for allj6=i. This is because the corresponding signaling scheme must choose its signal based solely on the type of actioni. 165 Lemma 26. There exists an optimal persuasive signaling scheme which uses at most two signals: one signal recommends the special action, and the other recommends some regular action. Proof. Recall that an optimal persuasive scheme usesn+1 signals, with signal i recommending actioni fori = 0; 1;:::;n. 
Fix such a scheme, and let i denote the probability of signal i . Signal i induces posterior expected receiver utilityr j ( i ) and sender utilitys j ( i ) for each actionj. For a regular actionj6= 0, we haves j ( i ) =r j ( i ) for alli due to the zero-sum nature of our construction. Notice thatr i ( i ) 0 for all regular actionsi6= 0, since otherwise the receiver would prefer action 0 over action i. Consequently, for each signal i with i6= 0, the receiver derives non-negative utility and the sender derives non-positive utility. We claim that merging signals 1 ; 2 ;:::; n — i.e., modifying the signaling scheme to out- put the same signal in lieu of each of them — would not decrease the sender’s expected utility. Recall that persuasiveness implies thatr i ( i ) = max n j=0 r j ( i ). Using Jensen’s inequality, we get n X i=1 i r i ( i ) n max j=0 " n X i=1 i r j ( i ) # : (A.5) If the maximum in the right hand side expression of (A.5) is attained at j = 0, the receiver will choose the special action 0 when presented with the merged signal . Recalling that s i ( i ) is non-positive for i 6= 0, this can only improve the sender’s expected utility. Oth- erwise, the receiver chooses a regular action j 6= 0 when presented with , resulting in a total contribution of P n i=1 i r j ( i ) to the receiver’s expected utility from the merged signal, down from the total contribution of P n i=1 i r i ( i ) by the original signals 1 ;:::; n . Recall- ing the zero-sum nature of our construction for regular actions, the merged signal contributes P n i=1 i s j ( i ) = P n i=1 i r j ( i ) to the sender’s expected utility, up from a total contribution of P n i=1 i s i ( i ) = P n i=1 i r i ( i ) by the original signals 1 ;:::; n . Therefore, the sender is not worse off by merging the signals. Moreover, interpreting as a recommendation for action j yields persuasiveness. Therefore, in characterizing the optimal solution to our constructed persuasion instance, it suffices to analyze two-signal schemes of the the form guaranteed by Lemma 26. For such a scheme, we denote the signal that recommends the special action 0 by + (indicating that the sender derives positive utility ), and denote the other signal by (indicating that the sender derives negative utility, as we will show). For convenience, in the following discussion we use the expression “payoff from a signal” to signify the expected payoff of a player conditioned on that signal multiplied by the probability of that signal. For example, the sender’s expected payoff from signal equals the sender’s expected payoff conditioned on signal multiplied by the overall probability that the scheme outputs , assuming the receiver follows the scheme’s (persuasive) 166 recommendations. We also use the expression “payoff from an action in a signal” to signify the posterior expected payoff of a player for that action conditioned on the signal, multiplied by the probability that the scheme outputs the signal. For example, the receiver’s expected payoff from actioni in signal + equals + r i ( + ), wherer i ( + ) is the receiver’s posterior expected payoff from actioni given signal + , and + is the overall probability of signal + . Lemma 27. Fix a persuasive scheme with signals and + as described above. The sender’s expected payoff from signal is at most 1 2 . Moreover, if the sender’ expected payoff from is exactly 1 2 , then for each regular actioni the expected payoff of both the sender and the receiver from actioni in signal + equals 0. Proof. 
Assume that signal + [ ] is sent with probability + [ ] and induces posterior ex- pected receiver payoffr i ( + ) [r i ( )] for each actioni. Recall from our construction that the prior expected payoff of each regular actioni6= 0 equals 1 2 a i + 1 2 b i = 1 2 . Since the prior expecta- tion must equal the expected posterior expectation, it follows that + r i ( + ) + r i ( ) = 1 2 when i is regular. The receiver’s reward from the special action is deterministically 0, and therefore persuasiveness implies that r i ( + ) 0 for each regular action i. It follows that r i ( ) = 1 2 + r i ( + ) 1 2 for regular actions i. In other words, the receiver’s ex- pected payoff from each regular action in signal is at least 1 2 . By the zero-sum nature of our construction, the sender’s expected payoff from each regular action in signal is at most 1 2 . Since recommends a regular action, we conclude that the sender’s expected payoff from is at most 1 2 . Now assume that the sender’s expected payoff from is exactly 1 2 . By the zero-sum property, persuasiveness, and the above-established fact that r i ( ) 1 2 for regular actions i, it follows that the receiver’s expected payoff from each regular action in signal is exactly 1 2 . Recalling that + r i ( + ) + r i ( ) = 1 2 wheni is regular, we conclude that the receiver’s expected payoff from a regular action in signal + equals 0. By the zero-sum property for regular actions, the same is true for the sender. The key to the remainder of our reduction is to choose a small enough value for the parameter — the sender’s utility from the special action — so that the optimal signaling scheme satisfies the property mentioned in Lemma 27: The sender’s expected payoff from signal is exactly equal to its maximum possible value of 1 2 . In other words, we must make so small so that the sender prefers to not sacrifice any of her payoff from in order to gain utility from the special action recommended by + . Notice that this upper bound of 1 2 is indeed achievable: the uninformative signaling scheme which recommends an arbitrary regular action has this property. We now show that a “small enough” indeed exists. The key idea behind this existence proof is 167 the following: We start with a signaling scheme which maximizes the sender’s payoff from at 1 2 , and moreover corresponds to a vertex of the polytope of persuasive signatures. When> 0 is smaller than the “bit complexity” of the vertices of this polytope, moving to a different vertex — one with lower sender payoff from — will result in more utility loss from than utility gain from + . We show that > 0 with polynomially many bits suffices, and can be computed in polynomial time. LetP 2 be the family of all realizable two-signal signatures (again, ignoring action 0). It is easy to see thatP 2 is a polytope, and importantly, all entries of any vertex ofP 2 are integer mul- tiples of 1 2 n . This is because every vertex ofP 2 corresponds to a deterministic signaling scheme which partitions the set of states of nature, and every state of nature occurs with probability 1=2 n . As a result, all vertices ofP 2 haveO(n) bit complexity. To ease our discussion, we use a compact representation for points inP 2 . In particular, any point inP 2 can be captured by n + 1 variables: variable p denotes the probability of sending signal + , and variabley i denotes the joint probability of signal + and the event that actioni has type 1 i . 
It follows that joint probability of type 2 i and signal + ispy i , and the probabilities associated with signal are determined by the constraint thatM + +M is the all- 1 2 matrix. With some abuse of notation, we useM(p;y) = (M + ;M ) to denote the signature inP 2 corresponding to the probabilityp and n-dimensional vectory. Now we consider the following two linear programs. maximize p +u subject to M(p;y)2P 2 y i a i + (py i )b i 0; fori = 1;:::;n: u[( 1 2 y i )a i + ( 1 2 p +y i )b i ]; fori = 1;:::;n: (A.6) maximize u subject to M(p;y)2P 2 y i a i + (py i )b i 0; fori = 1;:::;n: u[( 1 2 y i )a i + ( 1 2 p +y i )b i ]; fori = 1;:::;n: (A.7) Linear programs (A.6) and (A.7) are identical except for the fact that the objective of LP (A.6) includes the additional termp. LP (A.6) computes precisely the optimal expected sender util- ity in our constructed persuasion instance: The first set of inequality constraints are the per- suasiveness constraints for the signal + recommending action 0; The second set of inequality constraints state that the sender’s payoff from signal is the minimum among all actions, as implied by the zero-sum nature of our construction; The objective is the sum of the sender’s pay- offs from signals + and . Notice that the persuasiveness constraints for signal , namely ( 1 2 y i )a i +( 1 2 p+y i )b i 0 for alli6= 0, are implicitly satisfied because 1 2 a i + 1 2 b i = 1 2 by our 168 construction and ( 1 2 y i )a i +( 1 2 p+y i )b i = 1 2 a i + 1 2 b i [y i a i +(py i )b i ] 1 2 0> 0. On the other hand, LP (A.7) maximizes the sender’s expected payoff from signal . Observe that the optimal objective value of LP (A.7) is precisely 1 2 becauseu[( 1 2 y i )a i +( 1 2 p+y i )b i ] 1 2 for alli6= 0, and equality is attained, for example, atp = 0 andy = 0. Let f P 2 be the set of all feasible (u;M(p;y)) for LP (A.6) (and LP (A.7)). Obviously, f P 2 is a polytope. We now argue that all vertices of f P 2 have bit complexity polynomial inn and the bit complexity ofx2 (0; 1 2 ) n . In particular, denote the bit complexity ofx by`. Sincea i ;b i are computed by a two-variable two-equation linear system involving x i (Equations (A.3) and (A.4)), they each haveO(`) bit complexity. Consequently, all the explicitly described facets of f P 2 haveO(`) bit complexity. Moreover, since each vertex ofP 2 hasO(n) bit complexity, each facet ofP 2 then hasO(n 3 ) bit complexity, i.e., the coefficients of inequalities that determine the facets haveO(n 3 ) bit complexity. This is due to the fact that facet complexity of a rational polytope is upper bounded by a cubic polynomial of the vertex complexity and vice versa (see, e.g., (Schrijver, 2003)). To sum up, any facet of polytope f P 2 has bit complexityO(n 3 +`), and therefore any vertex of f P 2 hasO(n 9 ` 3 ) bit complexity. Let the polynomialB(n;`) = O(n 9 ` 3 ) be an upper bound on the maximum bit complexity of vertices of f P 2 . Now we are ready to set the value of. LP (A.6) always has an optimal vertex solution which we denote as (u ;M ). Recall thatu 1 2 for all points (u;M(p;y)) in f P 2 andu = 1 2 is attainable at some vertices. Since all vertices of f P 2 haveB(n;`) bit complexity, (u ;M ) must either satisfy eitheru = 1 2 oru 1 2 2 B(n;`) . Therefore, it suffices to set = 2 nB(n;`) , which is a number with polynomial bit complexity. As a result, any optimal vertex solution to LP (A.6) must satisfyu = 1 2 , since the loss incurred by moving to any other vertex withu< 1 2 can never be compensated for by the other termp<. 
With such a small value of $\delta$, the sender's goal is to send signal $\sigma^+$ with probability as high as possible, subject to the constraint that her utility from $\sigma^-$ is precisely $-\frac{1}{2}$. In other words, signal $\sigma^+$ must induce expected receiver/sender utility precisely $0$ for each regular action $i \neq 0$ (see Lemma 27). This characterization of the optimal scheme now allows us to determine whether $M(x) \in \mathcal{K}$ by inspecting the sender's optimal expected utility. The following lemma completes our proof of Theorem 5.2.5.

Lemma 28. Given the small enough value of $\delta$ described above, the sender's expected utility in the optimal signaling scheme for our constructed persuasion instance is at least $\frac{1}{2}\delta - \frac{1}{2}$ if and only if $M(x) \in \mathcal{K}$.

Proof. ($\Leftarrow$) If $M(x) \in \mathcal{K}$, then by our choice of $a_i, b_i$ (recall Equations (A.3) and (A.4)), the signaling scheme implementing $M(x)$ is persuasive, the sender's payoff from signal $\sigma^+$ is $\frac{1}{2}\delta$, and her payoff from $\sigma^-$ is $-\frac{1}{2}$. Therefore, the optimal sender utility is at least $\frac{1}{2}\delta - \frac{1}{2}$.

($\Rightarrow$) Let $M(p,y)$ be the signature of a vertex optimal signaling scheme in LP (A.6). By our choice of $\delta$ we know that the sender's payoff from signal $\sigma^-$ must be exactly $-\frac{1}{2}$. Therefore, to achieve overall sender utility at least $\frac{1}{2}\delta - \frac{1}{2}$, signal $\sigma^+$ must be sent with probability $p \geq \frac{1}{2}$, and the receiver's payoff from each regular action $i \neq 0$ in signal $\sigma^+$ must be exactly $0$; that is, $y_i a_i + (p - y_i) b_i = 0$. By construction, we also have that $x_i a_i + (0.5 - x_i) b_i = 0$ and $a_i, b_i \neq 0$, which imply that $\frac{y_i}{x_i} = \frac{p - y_i}{0.5 - x_i}$ and, furthermore, that $y_i \geq x_i$ since $p \geq \frac{1}{2}$. Now let $\varphi$ be a signaling scheme with the signature $M(p,y)$. We can post-process $\varphi$ so that it has signature $M(x)$ as follows: whenever $\varphi$ outputs the signal $\sigma^+$, flip a biased random coin to output $\sigma^+$ with probability $\frac{0.5}{p}$ and output $\sigma^-$ otherwise. Using the identity $\frac{y_i}{x_i} = \frac{p - y_i}{0.5 - x_i}$, it is easy to see that this adjusted signaling scheme has signature $M(x)$.

A.3 Omitted Proofs from Section 5.2.4

A.3.1 A Bicriteria FPTAS

Proof of Lemma 4

Fix $\epsilon$, $K$, and $\delta$, and let $\varphi$ denote the resulting signaling scheme implemented by Algorithm 2. Let $\theta$ denote the input to $\varphi$, and $\varphi(\theta)$ denote its output. First, we condition on the empirical sample $\widetilde{\Theta} = \{\theta_1, \dots, \theta_K\}$ without conditioning on the index $\ell$ of the input state of nature $\theta = \theta_\ell$, and show that $\epsilon$-persuasiveness holds subject to this conditioning. The principle of deferred decisions implies that, subject to this conditioning, $\theta$ is uniformly distributed in $\widetilde{\Theta}$. By definition of linear program (5.4), the signaling scheme $\widetilde{\varphi}$ computed in Step 3 is an $\epsilon$-persuasive scheme for the empirical distribution $\widetilde{\lambda}$. Since $\varphi(\theta) = \widetilde{\varphi}(\theta)$ and $\theta$ is conditionally distributed according to $\widetilde{\lambda}$, this implies that all $\epsilon$-persuasiveness constraints conditionally hold; formally, the following holds for each pair of actions $i$ and $j$:
$$\mathbb{E}[r_i(\theta) \mid \sigma = \sigma_i, \widetilde{\Theta}] \;\geq\; \mathbb{E}[r_j(\theta) \mid \sigma = \sigma_i, \widetilde{\Theta}] - \epsilon.$$
Removing the conditioning on $\widetilde{\Theta}$ and invoking linearity of expectations shows that $\varphi$ is $\epsilon$-persuasive for $\lambda$, completing the proof.

Proof of Lemma 5

As in the proof of Lemma 4, we condition on the empirical sample $\widetilde{\Theta} = \{\theta_1, \dots, \theta_K\}$ and observe that $\theta$ is uniformly distributed in $\widetilde{\Theta}$ after this conditioning. The conditional expectation of sender utility then equals $\sum_{k=1}^{K} \sum_{i=1}^{n} \frac{1}{K} \widetilde{\varphi}(\theta_k, \sigma_i) s_i(\theta_k)$, where $\widetilde{\varphi}$ is the signaling scheme computed in Step 3 based on $\widetilde{\Theta}$. Since this is precisely the optimal value of the LP (5.4) solved in Step 3, removing the conditioning and invoking linearity of expectations completes the proof.

Proof of Lemma 6

Recall that linear program (5.1) solves for the optimal persuasive scheme for $\lambda$.
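For concreteness, the following is a minimal sketch of how an empirical persuasion LP of this kind, i.e., the kind of program solved in Step 3 on the $K$ sampled states, can be set up and solved numerically. It is an illustrative sketch rather than the thesis's implementation: the payoff arrays `S` and `R`, the tolerance `eps`, the unnormalized form of the $\epsilon$-persuasiveness constraints, and the use of `scipy.optimize.linprog` are all assumptions made here for illustration.

```python
import numpy as np
from scipy.optimize import linprog


def empirical_persuasion_lp(S, R, eps):
    """Solve an eps-persuasive empirical signaling LP (illustrative sketch).

    S, R: (K, n) arrays; S[k, i] and R[k, i] are the sender's and receiver's
    payoffs when the state of nature is the k-th sample and the receiver
    plays action i.  Returns the K x n matrix phi, where phi[k, i] is the
    probability of recommending action i in sampled state k, together with
    the optimal expected sender utility under the uniform empirical prior.
    """
    K, n = S.shape
    # Objective: maximize (1/K) * sum_{k,i} phi[k, i] * S[k, i]  (linprog minimizes).
    c = -(S / K).ravel()
    # Each sampled state must receive a full probability distribution over recommendations.
    A_eq = np.zeros((K, K * n))
    for k in range(K):
        A_eq[k, k * n:(k + 1) * n] = 1.0
    b_eq = np.ones(K)
    # eps-persuasiveness (unnormalized form):
    # (1/K) * sum_k phi[k, i] * (R[k, i] - R[k, j]) >= -eps   for all i != j.
    rows, b_ub = [], []
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            row = np.zeros(K * n)
            for k in range(K):
                row[k * n + i] = -(R[k, i] - R[k, j]) / K  # flip sign for "<=" form
            rows.append(row)
            b_ub.append(eps)
    res = linprog(c, A_ub=np.array(rows), b_ub=np.array(b_ub),
                  A_eq=A_eq, b_eq=b_eq, method="highs")
    return res.x.reshape(K, n), -res.fun
```

In the notation of the proofs above, the returned matrix plays the role of $\widetilde{\varphi}$: on input $\theta_\ell$, the sampling-based scheme simply signals according to row $\ell$.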
It is easy to see that the linear program (5.4) solved in Step 3 is simply the instantiation of LP (5.1) for the empirical distribution $\widetilde{\lambda}$ consisting of $K$ samples from $\lambda$. To prove the lemma, it would suffice to show that the optimal persuasive scheme $\varphi^*$ corresponding to LP (5.1) remains $\epsilon$-persuasive and $\epsilon$-optimal for the distribution $\widetilde{\lambda}$, with high probability. Unfortunately, this approach fails because polynomially many samples from $\lambda$ are not sufficient to approximately preserve the persuasiveness constraints corresponding to low-probability signals (i.e., signals which are output with probability smaller than inverse polynomial in $n$). Nevertheless, we show in Claim 4 that there exists an approximately optimal solution $\widehat{\varphi}$ to LP (5.1) with the property that every signal $\sigma_i$ is either large, which we define as being output by $\widehat{\varphi}$ with probability at least $\frac{\epsilon}{4n}$, or honest, in that only states of nature $\theta$ with $i \in \arg\max_j r_j(\theta)$ are mapped to it. It is easy to see that sampling preserves persuasiveness exactly for honest signals. As for large signals, we employ tail bounds and the union bound to show that polynomially many samples suffice to approximately preserve persuasiveness (Claim 5).

Claim 4. There is a signaling scheme $\widehat{\varphi}$ which is persuasive for $\lambda$, induces sender utility $u_s(\widehat{\varphi}, \lambda) \geq \mathrm{OPT} - \frac{\epsilon}{2}$ on $\lambda$, and such that every signal of $\widehat{\varphi}$ is either large or honest.

Proof. Let $\varphi^*$ be the optimal persuasive scheme for $\lambda$, i.e., the optimal solution to LP (5.1). We call a signal $\sigma$ small if it is output by $\varphi^*$ with probability less than $\frac{\epsilon}{4n}$, i.e., if $\sum_{\theta \in \Theta} \lambda_\theta \varphi^*(\theta, \sigma) < \frac{\epsilon}{4n}$, and otherwise we call it large. Let $\widehat{\varphi}$ be the scheme defined as follows: on input $\theta$, it first samples $\sigma \sim \varphi^*(\theta)$; if $\sigma$ is large then $\widehat{\varphi}$ simply outputs $\sigma$, and otherwise it recommends an action maximizing receiver utility in state of nature $\theta$, i.e., outputs $\sigma_{i'}$ for $i' \in \arg\max_i r_i(\theta)$. It is easy to see that every signal of $\widehat{\varphi}$ is either large or honest. Moreover, since $\varphi^*$ is persuasive and $\widehat{\varphi}$ only replaces recommendations of $\varphi^*$ with "honest" recommendations, it is easy to check that $\widehat{\varphi}$ is persuasive for $\lambda$. Finally, since the total probability of small signals in $\varphi^*$ is at most $\frac{\epsilon}{4}$, and utilities are in $[-1, 1]$, the sender's expected utility from $\widehat{\varphi}$ is no worse than $\frac{\epsilon}{2}$ smaller than her expected utility from $\varphi^*$.

Claim 5. Let $\widehat{\varphi}$ be the signaling scheme from Claim 4. With probability at least $1 - \frac{\delta}{8}$ over the sample $\widetilde{\Theta}$, $\widehat{\varphi}$ is $\epsilon$-persuasive for $\widetilde{\lambda}$, and moreover $u_s(\widehat{\varphi}, \widetilde{\lambda}) \geq u_s(\widehat{\varphi}, \lambda) - \frac{\epsilon}{4}$.

Proof. Recall that $\widehat{\varphi}$ is persuasive for $\lambda$, and every signal is either large or honest. Since $\widetilde{\Theta}$ is a set of samples from $\lambda$, it is easy to see that the persuasiveness constraints pertaining to the honest signals continue to hold over $\widetilde{\lambda}$. It remains to show that the persuasiveness constraints for large signals, as well as expected sender utility, are approximately preserved when replacing $\lambda$ with $\widetilde{\lambda}$. Recall that persuasiveness requires that $\mathbb{E}_{\theta \sim \lambda}[\widehat{\varphi}(\theta, \sigma_i)(r_i(\theta) - r_j(\theta))] \geq 0$ for each $i, j \in [n]$. Moreover, the sender's expected utility can be written as $\mathbb{E}_{\theta \sim \lambda}[\sum_{i=1}^{n} \widehat{\varphi}(\theta, \sigma_i) s_i(\theta)]$. The left-hand side of each persuasiveness constraint evaluates the expectation of a fixed function of $\theta$ with range $[-2, 2]$, whereas the sender's expected utility evaluates the expectation of a function of $\theta$ with range in $[-1, 1]$. Standard tail bounds and the union bound, coupled with our careful choice of the number of samples $K$, imply that replacing the distribution $\lambda$ with $\widetilde{\lambda}$ approximately preserves each of these $n^2 + 1$ quantities to within an additive error of $\frac{\epsilon^2}{4n}$ with probability at least $1 - \frac{\delta}{8}$.
This bound on the additive loss translates to $\epsilon$-persuasiveness for the large signals, and is less than the permitted decrease of $\frac{\epsilon}{4}$ for expected sender utility.

The above claims, coupled with the fact that sender payoffs are bounded in $[-1, 1]$, imply that the expected optimal value of linear program (5.4) is at least $\mathrm{OPT} - \epsilon$, as needed.

           Rainy            Sunny
Walk       $1 - \epsilon$   $1$
Drive      $1$              $0$

Table A.1: Receiver's payoffs in the rain and shine example

A.3.2 Information-Theoretic Barriers

Impossibility of Persuasiveness (Proof of Theorem 5.2.7 (a))

Consider a setting with two states of nature, which we will conveniently refer to as rainy and sunny. The receiver, who we may think of as a daily commuter, has two actions: walk and drive. The receiver slightly prefers driving on a rainy day, and strongly prefers walking on a sunny day. We summarize the receiver's payoff function, parametrized by $\epsilon > 0$, in Table A.1. The sender, who we will think of as a municipality with black-box sample access to weather reports drawn from the same distribution as the state of nature, strongly prefers that the receiver choose walking regardless of whether it is sunny or rainy: we let $s_{\mathrm{walk}} = 1$ and $s_{\mathrm{drive}} = 0$ in both states of nature.

Let $\lambda_r$ be the point distribution on the rainy state of nature, and let $\lambda_s$ be such that $\Pr_{\lambda_s}[\text{rainy}] = \frac{1}{1+2\epsilon}$ and $\Pr_{\lambda_s}[\text{sunny}] = \frac{2\epsilon}{1+2\epsilon}$. It is easy to see that the unique direct persuasive scheme for $\lambda_r$ always recommends driving, and hence results in expected sender utility $0$. In contrast, a simple calculation shows that always recommending walking is persuasive for $\lambda_s$, and results in expected sender utility $1$. If algorithm $\mathcal{A}$ is persuasive and $c$-optimal for a constant $c < 1$, then $\mathcal{A}(\lambda_r)$ must never recommend walking, whereas $\mathcal{A}(\lambda_s)$ must recommend walking with constant probability at least $(1-c)$ overall (in expectation over the input state of nature as well as all other internal randomness). Consequently, given a black-box distribution $D \in \{\lambda_r, \lambda_s\}$, evaluating $\mathcal{A}(D, \theta)$ on a random draw $\theta \sim D$ yields a tester which distinguishes between $\lambda_r$ and $\lambda_s$ with constant probability $1-c$. Since the total variation distance between $\lambda_r$ and $\lambda_s$ is $O(\epsilon)$, it is well known (and easy to check) that any black-box algorithm which distinguishes between the two distributions with $\Omega(1)$ success probability must take $\Omega(\frac{1}{\epsilon})$ samples in expectation when presented with one of these distributions. As a consequence, the average-case sample complexity of $\mathcal{A}$ on either of $\lambda_r$ and $\lambda_s$ is $\Omega(\frac{1}{\epsilon})$. Since $\epsilon > 0$ can be made arbitrarily small, this completes the proof.

Impossibility of Optimality (Proof of Theorem 5.2.7 (b))

Consider a setting with three actions $\{1, 2, 3\}$ and three corresponding states of nature $\theta_1, \theta_2, \theta_3$. In each state $\theta_i$, the receiver derives utility $1$ from action $i$ and utility $0$ from the other actions. The sender, on the other hand, derives utility $1$ from action $3$ and utility $0$ from actions $1$ and $2$.

             $\Pr[\theta_1]$    $\Pr[\theta_2]$   $\Pr[\theta_3]$
$\lambda$    $1 - 2\epsilon$    $2\epsilon$       $0$
$\lambda'$   $1 - 2\epsilon$    $\epsilon$        $\epsilon$

Table A.2: Two distributions on three states of nature

For an arbitrary parameter $\epsilon > 0$, we define two distributions $\lambda$ and $\lambda'$ over states of nature with total variation distance $\epsilon$, illustrated in Table A.2. Assume algorithm $\mathcal{A}$ is optimal and $c$-persuasive for a constant $c < \frac{1}{4}$. The optimal persuasive scheme for $\lambda'$ results in expected sender utility $3\epsilon$ by recommending action $3$ whenever the state of nature is $\theta_2$ or $\theta_3$, and with probability $\frac{\epsilon}{1-2\epsilon}$ when the state of nature is $\theta_1$. Some calculation reveals that, in order to match this expected sender utility subject to $c$-persuasiveness, the signaling scheme $\varphi' = \mathcal{A}(\lambda')$ must satisfy $\varphi'(\theta_2, \sigma_3) \geq \beta$ for $\beta = 1 - 4c > 0$.
In other words, $\varphi'$ must recommend action $3$ a constant fraction of the time when given state $\theta_2$ as input. In contrast, since $c < \frac{1}{2}$ it is easy to see that $\varphi = \mathcal{A}(\lambda)$ can never recommend action $3$: for any signal, the posterior expected receiver reward for action $3$ is $0$, whereas one of the other two actions must have posterior expected receiver reward at least $\frac{1}{2}$. It follows that, given $D \in \{\lambda, \lambda'\}$, a call to $\mathcal{A}(D, \theta_2)$ yields a tester which distinguishes between $\lambda$ and $\lambda'$ with constant probability. Since $\lambda$ and $\lambda'$ have statistical distance $\epsilon$, we conclude that the worst-case sample complexity of $\mathcal{A}$ on either of $\lambda$ or $\lambda'$ is $\Omega(\frac{1}{\epsilon})$. Since $\epsilon > 0$ can be made arbitrarily small, this completes the proof.

Appendix B

Omissions From Section 6.2.3.1

B.1 Omitted Proofs

Proof of Lemma 12

The linear program for solving zero-sum SEGs can be written as follows, which is a slight modification of LP (6.6):

maximize    $u$
subject to  $u \leq x_i U_d^+(i) + w_i U_d^-(i) + U_d(\psi_i^+, \psi_i^-)$,  for all $i \in [n]$
            $\sum_{e \in E: e_i = +} p_e = x_i$,  for all $i \in [n]$
            $\sum_{e \in E: e_i = s+} p_e = y_i$,  for all $i \in [n]$
            $\sum_{e \in E: e_i = s-} p_e = z_i$,  for all $i \in [n]$
            $x_i + y_i + z_i + w_i = 1$,  for all $i \in [n]$
            $\sum_{e \in E} p_e = 1$
            $p_e \geq 0$,  for all $e \in E$
            $U_d(\psi_i^+, \psi_i^-) \leq 0$,  for all $i \in [n]$
            $(y_i - \psi_i^+) U_d^+(i) + (z_i - \psi_i^-) U_d^-(i) \geq 0$,  for all $i \in [n]$
            $0 \leq \psi_i^+ \leq y_i$, $0 \leq \psi_i^- \leq z_i$,  for all $i \in [n]$        (B.1)

We first prove a useful property of the optimal solution of LP (B.1). In particular, we show that there always exists an optimal solution to LP (B.1) that satisfies $\psi_i^- = z_i$ for all $i \in [n]$. First, we claim that it is without loss of generality to assume that the optimal solution satisfies either $y_i = \psi_i^+$ or $z_i = \psi_i^-$. Otherwise, we can increase $\psi_i^+$ by $-\epsilon U_d^-(i)$ and $\psi_i^-$ by $\epsilon U_d^+(i)$ for a small $\epsilon > 0$; this leaves $\psi_i^+ U_d^+(i) + \psi_i^- U_d^-(i)$ unchanged, so no constraint is violated and the objective value does not change. Once one of $\psi_i^+, \psi_i^-$ reaches its upper bound, we have $y_i = \psi_i^+$ or $z_i = \psi_i^-$, and the solution remains optimal. Now, if $\psi_i^- = z_i$, then we are done. If $\psi_i^+ = y_i$, we have $0 \leq (y_i - \psi_i^+) U_d^+(i) + (z_i - \psi_i^-) U_d^-(i) = (z_i - \psi_i^-) U_d^-(i) \leq 0$, which implies $z_i = \psi_i^-$ or $U_d^-(i) = 0$. In the latter case, we can set $\psi_i^-$ to be $z_i$ without affecting anything either.

Therefore, adding the constraint $z_i = \psi_i^-$ does not affect the optimal value of linear program (B.1). Moreover, $\psi_i^+$ is always non-negative at the optimal solution, so relaxing $\psi_i^+$ to be a real number does not affect the optimal value either. Thus, linear program (B.1) is equivalent to the following linear program:

maximize    $u$
subject to  $u \leq x_i U_d^+(i) + (1 - x_i - y_i - z_i) U_d^-(i) + \psi_i^+ U_d^+(i) + z_i U_d^-(i)$,  for all $i \in [n]$
            $\sum_{e \in E: e_i = +} p_e = x_i$,  for all $i \in [n]$
            $\sum_{e \in E: e_i = s+} p_e = y_i$,  for all $i \in [n]$
            $\sum_{e \in E: e_i = s-} p_e = z_i$,  for all $i \in [n]$
            $\sum_{e \in E} p_e = 1$
            $p_e \geq 0$,  for all $e \in E$
            $\psi_i^+ U_d^+(i) + z_i U_d^-(i) \leq 0$,  for all $i \in [n]$
            $\psi_i^+ \leq y_i$,  for all $i \in [n]$        (B.2)

The dual of LP (B.2) is the following LP:

minimize    $\sum_{i=1}^{n} U_d^-(i) w_i + r$
subject to  $r \geq \sum_{i: e_i = +} \alpha_i + \sum_{i: e_i = s+} \beta_i + \sum_{i: e_i = s-} \gamma_i$,  for all $e \in E$
            $\alpha_i = [U_d^+(i) - U_d^-(i)] w_i$,  for all $i \in [n]$
            $\beta_i = \varphi_i - w_i U_d^-(i)$,  for all $i \in [n]$
            $\gamma_i = -\eta_i U_d^-(i)$,  for all $i \in [n]$
            $\varphi_i = U_d^+(i) w_i - U_d^+(i) \eta_i$,  for all $i \in [n]$
            $\sum_{i=1}^{n} w_i = 1$        (B.3)

in which $\alpha_i, \beta_i, \gamma_i$ correspond to the constraints defining $x_i, y_i, z_i$ respectively, $w_i \geq 0$ is the dual variable of the $i$-th utility constraint, $r$ corresponds to $\sum_{e \in E} p_e = 1$, and $\varphi_i, \eta_i \geq 0$ correspond to the constraints $\psi_i^+ \leq y_i$ and $\psi_i^+ U_d^+(i) + z_i U_d^-(i) \leq 0$ respectively. Note that
$$\alpha_i = [U_d^+(i) - U_d^-(i)] w_i \;\geq\; [U_d^+(i) - U_d^-(i)] w_i - U_d^+(i) \eta_i = \beta_i.$$
Also, since $\varphi_i \geq 0$ and $U_d^+(i) \geq 0$ too, we have the implicit constraint $w_i \geq \eta_i \geq 0$. Therefore,
$$\beta_i = [U_d^+(i) - U_d^-(i)] w_i - U_d^+(i) \eta_i \;\geq\; [U_d^+(i) - U_d^-(i)] \eta_i - U_d^+(i) \eta_i = -U_d^-(i) \eta_i = \gamma_i.$$
Since $\gamma_i = -U_d^-(i) \eta_i \geq 0$, this implies $\alpha_i \geq \beta_i \geq \gamma_i \geq 0$.

Proof of Lemma 13

This is because when $T$ is fixed, the weight of covering any target $i$ by a sensor has been determined: it is $\beta_i$ if $i \in T_N$ and $\gamma_i$ if $i \in T^c$. Therefore, to maximize the total weight, we simply pick the largest $m$ elements of $\{\beta_i \mid i \in T_N\} \cup \{\gamma_i \mid i \in T^c\}$.
Proof of Theorem 6.2.2

The proof follows from the following two lemmas.

Lemma 29. When $\alpha_i \geq \beta_i \geq \gamma_i \geq 0$ for all $i \in [n]$, the function $g(T)$ is nonnegative, monotone increasing and submodular.

Proof of Lemma 29. It is easy to see that $g(T) \geq 0$ and is monotone increasing in $T$. We only prove its submodularity. Since $\sum_{i \in T} \alpha_i$ is a modular function of $T$, we only need to prove that the function $f'(T) = \mathrm{max}^m\big(\{\beta_i \mid i \in T_N \cup T\} \cup \{\gamma_i \mid i \in T^c\}\big)$ is submodular in $T$. The key step is to prove that the following function is submodular:
$$W(S) = \mathrm{max}^m\big(\{\beta_i \mid i \in S\} \cup \{\gamma_i \mid i \in \bar{S}\}\big),$$
where $\beta_i \geq \gamma_i$ for all $i \in [n]$ and $\bar{S} = [n] \setminus S$ is the complement of $S$. Notice that $W(T) \neq f'(T)$ (instead $W(T_N \cup T) = f'(T)$), so they are two different functions despite the similarity.

Pick any sets $S \subseteq T \subseteq [n]$ and $j \notin T$. Following the standard definition of submodularity, we prove the following inequality:
$$W(S \cup \{j\}) - W(S) \;\geq\; W(T \cup \{j\}) - W(T).$$
This follows from a case analysis. For convenience, we will say "$\beta_j$ [$\gamma_j$] contributes to $W(S)$" if $\beta_j$ [$\gamma_j$] is among the largest $m$ weights of $\{\beta_i \mid i \in S\} \cup \{\gamma_i \mid i \in \bar{S}\}$; moreover, we denote the set $S \cup \{j\}$ by $S_{+j}$.

Case 1: $\beta_j$ contributes to $W(T_{+j})$. Then we must have that $\beta_j$ also contributes to $W(S_{+j})$ since $S \subseteq T$. In this case, $W(S_{+j}) - W(S)$ equals $\beta_j$ minus the smallest weight that contributes to $W(S)$. On the other hand, $W(T_{+j}) - W(T)$ equals $\beta_j$ minus the smallest weight that contributes to $W(T)$. Since $S \subseteq T$, the smallest weight contributing to $W(T)$ is larger than the smallest weight contributing to $W(S)$. This implies $W(S_{+j}) - W(S) \geq W(T_{+j}) - W(T)$.

Case 2: $\beta_j$ does not contribute to $W(T_{+j})$. In this case $W(T_{+j}) - W(T) = 0$ and $W(S_{+j}) - W(S) \geq 0$. Therefore, $W(S_{+j}) - W(S) \geq W(T_{+j}) - W(T)$.

As a result, $W(S)$ is submodular. We now show that $f'(T)$ is submodular by proving
$$f'(S_{+j}) - f'(S) \;\geq\; f'(T_{+j}) - f'(T)$$
for any $S \subseteq T \subseteq [n]$ and $j \notin T$. Let $A = (T_{N,+j} \cup T_{+j}) \setminus (T_N \cup T)$ and $B = (S_{N,+j} \cup S_{+j}) \setminus (S_N \cup S)$. Note that $A \subseteq B$ since $S \subseteq T$. Therefore,
$$f'(S_{+j}) - f'(S) = W(S_{N,+j} \cup S_{+j}) - W(S_N \cup S) = W(S_N \cup S \cup B) - W(S_N \cup S) \geq W(S_N \cup S \cup A) - W(S_N \cup S) \geq W(T_N \cup T \cup A) - W(T_N \cup T) = f'(T_{+j}) - f'(T),$$
where the first inequality follows from the monotonicity of $W(S)$ and the second inequality follows from the submodularity of $W(S)$. This proves that $f'(T)$, and thus $g(T)$, is submodular.

Lemma 30. When $\alpha_i \geq \beta_i \geq \gamma_i \geq 0$ for all $i \in [n]$, Algorithm 5 outputs a $\frac{1}{2}(1 - \frac{1}{e})$-approximation for the slave problem.

Proof of Lemma 30. Let $T^*_g$ and $T^*_f$ be the optimal solutions to maximizing $g(T)$ and $f(T)$ subject to $|T| \leq k$, respectively. Let $\widehat{T}$ be the set generated by the greedy process (steps 2-5) in Algorithm 5. Our goal is to prove $f(\widehat{T}) \geq \frac{1}{2}(1 - \frac{1}{e}) f(T^*_f)$. The key step is to show the following relations:
$$f(T) \leq g(T) \leq 2 f(T), \quad \text{for all } T \subseteq [n].$$
Since the $\mathrm{max}^m$ operator in $g(T)$ acts on a larger set than that in $f(T)$, this implies $g(T) \geq f(T)$. We now prove $g(T) \leq 2 f(T)$. Since $T$, $T_N$, $T^c$ are mutually disjoint, the weights that contribute to $f(T)$ are all indexed by different vertices. However, since $T \subseteq T_N \cup T$, there may exist a vertex $i \in T$ such that both $\alpha_i$ and $\beta_i$ contribute to $g(T)$. Let $A \subseteq T$ be the set of all such $i$'s. We have
$$\sum_{i \in A} \beta_i \leq \sum_{i \in A} \alpha_i \leq \sum_{i \in T} \alpha_i \leq f(T). \quad \text{(B.4)}$$
Moreover, if we remove the portion $\sum_{i \in A} \beta_i$ from $g(T)$, then the remaining contributing weights are all indexed by different vertices and their total weight is at most $f(T)$. That is,
$$g(T) - \sum_{i \in A} \beta_i \leq f(T). \quad \text{(B.5)}$$
Combining Inequalities (B.4) and (B.5) yields $g(T) \leq f(T) + \sum_{i \in A} \beta_i \leq 2 f(T)$, as desired.

By the monotone submodularity of $g(T)$ (Lemma 29), we have $g(\widehat{T}) \geq (1 - \frac{1}{e})\, g(T^*_g)$. Since $g(T^*_g) \geq g(T^*_f) \geq f(T^*_f)$ and $2 f(\widehat{T}) \geq g(\widehat{T})$, this implies $f(\widehat{T}) \geq \frac{1}{2}(1 - \frac{1}{e}) f(T^*_f)$.
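To make the greedy step concrete, the following is a minimal sketch of cardinality-constrained greedy maximization against a generic value oracle, i.e., the kind of loop used in steps 2-5 of Algorithm 5; the function name `greedy_max` and the toy coverage example are illustrative assumptions rather than the thesis's code. By Lemma 29, the surrogate $g$ is monotone submodular when $\alpha_i \geq \beta_i \geq \gamma_i \geq 0$, so running this loop with a value oracle for $g$ yields the $(1 - 1/e)$ guarantee invoked in the proof of Lemma 30.

```python
def greedy_max(value, ground_set, k):
    """Greedy maximization of a monotone submodular set function `value`
    subject to |T| <= k: repeatedly add the element with the largest
    marginal gain.  For monotone submodular functions the returned set
    has value at least (1 - 1/e) times the optimum."""
    T = set()
    for _ in range(k):
        candidates = [v for v in ground_set if v not in T]
        if not candidates:
            break
        best = max(candidates, key=lambda v: value(T | {v}) - value(T))
        T.add(best)
    return T


if __name__ == "__main__":
    # Toy monotone submodular function: coverage of a small universe.
    cover = {1: {"a", "b"}, 2: {"b", "c"}, 3: {"d"}, 4: {"a", "d"}}
    value = lambda T: len(set().union(*(cover[v] for v in T))) if T else 0
    # Prints {1, 2} (value 3); the optimum {2, 4} has value 4, and
    # 3 >= (1 - 1/e) * 4, illustrating the approximation guarantee.
    print(greedy_max(value, cover.keys(), k=2))
```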
B.2 Counterexample to Submodularity of $f(T)$

Recall that
$$f(T) = \sum_{i \in T} \alpha_i + \mathrm{max}^m\big(\{\beta_i \mid i \in T_N\} \cup \{\gamma_i \mid i \in T^c\}\big).$$
Consider a simple line graph $G$ with 5 vertices, as in Figure B.1, and let $m = 2$. Moreover, let $\alpha_i = \beta_i = 1$ while $\gamma_i = 0$ for all $i = 1, \dots, 5$.

[Figure B.1: Graph $G$ for the counterexample: the line graph on vertices $1 - 2 - 3 - 4 - 5$.]

Consider $S = \{2\}$, $T = \{2, 4\}$ and $j = 1 \notin T$. We have $f(S) = 3$, $f(S \cup \{j\}) = 3$, $f(T) = 4$ and $f(T \cup \{j\}) = 5$. Therefore,
$$f(T \cup \{j\}) - f(T) = 1 > 0 = f(S \cup \{j\}) - f(S).$$
So $f(T)$ is not submodular in $T$.

Appendix C

Omitted Proofs From Section 6.3

C.1 Proof of Proposition 5

First, notice that $U_{\mathrm{sig}}(G) \geq U_{\mathrm{BSSE}}(G)$ for any BSG $G$ (not necessarily zero-sum). This is because the leader policy of playing the BSSE leader mixed strategy and sending only one signal to each attacker type degenerates to the BSSE. We now show that $U_{\mathrm{sig}}(G) \leq U_{\mathrm{BSSE}}(G)$. Let $(x^*, p)$ be the optimal leader policy computed by LP (6.10). Note that if the leader plays the optimal leader policy $(x^*, p)$, but the follower of type $\theta$ "irrationally" ignores any signal and simply reacts to $x^*$ by taking the best-response (to $x^*$) action $j^*$, then the follower of type $\theta$ gets utility $\sum_i x^*_i b^\theta_{ij^*}$. We claim that this utility is no more than the utility of best responding to each signal separately, as shown below:
$$\sum_j \sum_i p^\theta_{ij} b^\theta_{ij} \;\geq\; \sum_j \sum_i p^\theta_{ij} b^\theta_{ij^*} = \sum_i x^*_i b^\theta_{ij^*},$$
where the inequality is due to the second set of constraints in LP (6.10) and the equality is due to the first set of constraints in LP (6.10). Since this is a zero-sum game, the leader is better off if the follower of type $\theta$ ignores signals. Let $U$ be the defender utility when all the attacker types best respond to $x^*$ by ignoring signals; then $U \geq U_{\mathrm{sig}}(G)$. However, $U$ is simply the defender utility in this BSG obtained by committing to the mixed strategy $x^*$ without any signaling, and is therefore upper bounded by $U_{\mathrm{BSSE}}(G)$. As a result, $U_{\mathrm{BSSE}}(G) \geq U \geq U_{\mathrm{sig}}(G)$, as desired.

C.2 Proof of Propositions in Section 6.3.2.2

Proof of Proposition 7

This is a slight modification of a proof of the hardness of Bayesian Stackelberg games (Theorem 2 in (Li et al., 2016)); we provide it only for completeness. The reduction is from 3-SAT. Given an instance of 3-SAT with $n$ variables and $m$ clauses, we create a security game with $2n + 2$ targets and $n$ resources. For each variable, there is a target corresponding to that variable and one corresponding to its negation (call these variable targets), as well as a punishment target and a reward target. There are $m + 3n$ types of attacker. $m$ of these are clause types, one per clause. Each of these types is interested in attacking the targets corresponding to literals appearing in the corresponding clause, or the reward target. For any literal contained in the clause, this type gets $-1$ payoff for attacking when the target is covered and $0$ when it is uncovered. Any clause-type attacker gets $0$ payoff for attacking the punishment target, whether or not it is covered. Note that if a clause type believes that at least one of the literal targets is covered with probability 1, then they will attack that target (breaking ties favorably). Otherwise, they attack the punishment target.

There is one pair type for each variable. These types are not interested in any literal target that does not correspond to the relevant variable, nor in the reward target. For the two literal targets they are interested in, they get $-1$ payoff for attacking a covered target and $0$ for an uncovered target. They get $0$ for attacking the punishment target.
Again, a pair type will attack the punishment target only if they believe that both of their literal targets are covered with non-zero probability. Lastly, there are $2n$ counting types, one per literal. Each of these types is not interested in any literal target other than the one corresponding to them, nor in the punishment target. If they attack the relevant literal target and it is covered, they get $0$ payoff, and if it is uncovered they get $1$. They get $0$ payoff for attacking the reward target, regardless of whether it is covered. Note that each of these types attacks the reward target if they believe that their literal target is covered with probability 1.

The defender gets $0$ payoff whenever a literal target is attacked, regardless of whether it is covered, and $-1$ payoff whenever the punishment target is attacked. If any attacker attacks the reward target, the defender gets payoff $1$ (note that the only attacker types that will ever attack the reward target are the counting types). Each type occurs with equal probability.

We show that the defender can obtain a utility of $\frac{n}{m+3n}$ if and only if the instance of 3-SAT is satisfiable. If the instance is satisfiable, then we simply cover the variable targets corresponding to a satisfying assignment, and signal as such. Then all clauses are satisfied, so no clause type attacks the punishment target; no variable has both its positive and negative literals covered with positive probability; and $n$ counting types are sure that their literal is covered, so they attack the reward target. This results in an expected utility of $\frac{n}{m+3n}$ for the defender.

Now suppose the instance of 3-SAT is not satisfiable. Note that whenever there is any uncertainty for the attacker, they take an undesirable action; therefore the defender optimally signals truthfully about the chosen allocation. Since the instance is unsatisfiable, for any allocation of resources either a clause type or a pair type will be incentivized to attack the punishment target. The defender can get payoff $1$ at most a $\frac{n}{m+3n}$ fraction of the time (from exactly $n$ counting types, as the defender can cover only $n$ variable targets at a time), and gets $-1$ payoff from the pair/clause type that attacks the punishment target. Therefore the defender gets less than $\frac{n}{m+3n}$ expected utility.

Proof of Proposition 8

For convenience, let target $0$ denote the common coverage-invariant target, and let $i_\theta$ denote the only type-specific target for the attacker of type $\theta$ (which exists by assumption). Notice that our signaling scheme only needs two signals for the attacker of type $\theta$, recommending either target $i_\theta$ or target $0$ for attack, since he is not interested in other targets. Therefore, for each attacker type $\theta$, we define four variables: $p^\theta_{c,j}$ [$p^\theta_{u,j}$] is the probability that type $\theta$'s specific target $i_\theta$ is covered [uncovered] and action $j$ is recommended to the attacker, where $j \in \{i_\theta, 0\}$ is either to attack $i_\theta$ or to stay home. Notice that we can define these variables because our signaling scheme for type $\theta$ only depends on the coverage status of target $i_\theta$, as the utility of the common target $0$ is coverage-invariant. This is crucial, since otherwise the optimal signaling scheme may depend on all the targets that type $\theta$ is interested in, which makes the problem much harder (as shown in Proposition 9). The following linear program, with variables $p^\theta_{c,j}$, $p^\theta_{u,j}$ and $x$, computes the optimal defender utility.
maximize    $\sum_{\theta \in \Theta} \sum_{s \in \{c,u\}} p^\theta_{s, i_\theta} U_d^s(i_\theta, \theta)$
subject to  $\sum_{j \in \{0, i_\theta\}} p^\theta_{c,j} = x_{i_\theta}$,  for $\theta \in \Theta$
            $\sum_{j \in \{0, i_\theta\}} p^\theta_{u,j} = 1 - x_{i_\theta}$,  for $\theta \in \Theta$
            $\sum_{s \in \{c,u\}} p^\theta_{s,j} U_a^s(j, \theta) \geq \sum_{s \in \{c,u\}} p^\theta_{s,j} U_a^s(j', \theta)$,  for $j, j' \in \{0, i_\theta\}$ and $\theta \in \Theta$
            $x \in \mathcal{P}$        (C.1)

where the first two constraints mean that the signaling scheme should be consistent with the true marginal probability that $i_\theta$ is covered (first constraint) or uncovered (second constraint). The third constraint is the incentive compatibility constraint, which guarantees that the attacker prefers to follow the recommended action. The last constraint ensures that the marginal distribution $x$ is implementable ($\mathcal{P}$ is the set of all implementable marginals).

Proof of Proposition 9

LP Formulation of the Problem and Its Dual

Using similar notations as Section 6.3.3, we equivalently regard each pure strategy as a vector $e \in \{0,1\}^n$, and let $E$ be the set of all pure strategies. We consider the case where the defender does not have any scheduling constraints, i.e., $e$ is any vector with at most $k$ 1's, and show that the defender oracle in this basic setting is already NP-hard. To describe a mixed strategy, let $p_e$ be the probability of taking pure strategy $e$. Then
$$x = \mathbb{E}(e) = \sum_{e \in E} e \, p_e \quad \text{(C.2)}$$
is the marginal coverage probability corresponding to the mixed strategy $\{p_e\}_{e \in E}$. Notice that $x \in \mathbb{R}^n$. Since $n$ signals are needed for each attacker type in the optimal scheme, let $p^\theta_{e,i}$ be the probability that pure strategy $e$ is taken and the attacker of type $\theta$ is recommended to take action $i$. Then $\pi^\theta_i = \sum_{e \in E} p^\theta_{e,i}$ is the probability that the attacker of type $\theta$ is recommended to take action $i$, while
$$x^\theta_i = \sum_{e \in E} e \, p^\theta_{e,i}$$
is the corresponding posterior belief (up to a normalization factor $1/\pi^\theta_i$) of the marginal coverage when the attacker of type $\theta$ is recommended action $i$. Then the following optimization formulation computes the defender's optimal mixed strategy as well as the signaling scheme.¹

maximize    $\sum_{\theta, i} \big[ x^\theta_{ii} U_d^c(i, \theta) + (\pi^\theta_i - x^\theta_{ii}) U_d^u(i, \theta) \big]$
subject to  $x^\theta_{ii} U_a^c(i, \theta) + (\pi^\theta_i - x^\theta_{ii}) U_a^u(i, \theta) \geq x^\theta_{ij} U_a^c(j, \theta) + (\pi^\theta_i - x^\theta_{ij}) U_a^u(j, \theta)$,  for all $i, j, \theta$
            $\pi^\theta_i = \sum_{e \in E} p^\theta_{e,i}$,  for all $i, \theta$
            $\sum_{e \in E} e \, p^\theta_{e,i} = x^\theta_i$,  for all $i, \theta$
            $\sum_{i=1}^{n} p^\theta_{e,i} = p_e$,  for all $e, \theta$
            $\sum_{e \in E} p_e = 1$
            $p^\theta_{e,i} \geq 0$, $p_e \geq 0$,  for all $e, i, \theta$        (C.3)

where $x^\theta_i \in \mathbb{R}^n$, $p_e \in \mathbb{R}$, and $p^\theta_{e,i} \in \mathbb{R}$ are variables, and $x^\theta_{ij}$ denotes the $j$-th entry of $x^\theta_i$.

¹ We only consider the case with no IC constraints for incentivizing the attacker's type report. Adding such IC constraints results in the same defender oracle, and is thus omitted here.

We now take the dual of LP (C.3). Instead of providing the exact dual program, we abstractly represent the dual by highlighting the non-trivial part, as follows:

minimize    $\gamma$
subject to  poly($n, |\Theta|$) linear constraints on $y^\theta_i, \mu^\theta_i$
            $\mu^\theta_i - e \cdot y^\theta_i \leq q^\theta_e$,  for all $i, e, \theta$
            $\sum_{\theta} q^\theta_e \leq \gamma$,  for all $e \in E$        (C.4)

where $\mu^\theta_i, q^\theta_e, \gamma \in \mathbb{R}$ and $y^\theta_i \in \mathbb{R}^n$ are variables. We now analyze the dual program (C.4). Notice that the first (implicitly described) set of constraints does not depend on $\gamma$ or $q^\theta_e$. So the last constraint, together with the "min" objective, yields that $\gamma = \max_{e \in E} \sum_\theta q^\theta_e$ at optimality. The middle constraint, together with the "min" objective, yields that $q^\theta_e = \max_i [\mu^\theta_i - e \cdot y^\theta_i]$ at optimality. As a result, the dual program can be re-written in the following form:
$$\min \; \max_{e \in E} \sum_\theta \max_i \big( \mu^\theta_i - e \cdot y^\theta_i \big) \quad \text{s.t. poly}(n, |\Theta|) \text{ linear constraints on } y^\theta_i, \mu^\theta_i.$$
Notice that this is still a convex program: the objective is a maximum of linear functions of the variables, hence a convex function, and it is being minimized.

The Defender Oracle

The defender oracle problem is precisely to evaluate the function
$$f(y^\theta_i, \mu^\theta_i) = \max_{e \in E} \sum_\theta \max_i \big( \mu^\theta_i - e \cdot y^\theta_i \big) \quad \text{(C.5)}$$
for any given input $y^\theta_i, \mu^\theta_i$. Suppose the attacker of type $\theta$ is only interested in a small number of targets, say a subset $S_\theta$ of targets.
Then in LP (C.3), the third constraint on $x^\theta_i \in \mathbb{R}^n$ only needs to be restricted to the targets in $S_\theta$, since the attacker of type $\theta$ does not care about the coverage of the other targets at all. That is, there are no constraints on $x^\theta_i$ for all $i \notin S_\theta$; moreover, for those $i \in S_\theta$, the constraint on $x^\theta_i$ can be restricted to only the entries in $S_\theta$. This simplification is reflected in the defender oracle problem in the following way: the inputs $y^\theta_i$ are non-zero vectors only for those $i \in S_\theta$; moreover, each non-zero $y^\theta_i$ has non-zero entries only at the coordinates corresponding to $S_\theta$.

Hardness of the Defender Oracle

We now prove that the defender oracle problem is NP-hard, even when each attacker type is only interested in 2 targets. In other words, we prove that evaluating the function $f(y^\theta_i, \mu^\theta_i)$ is NP-hard, even when only two of the $y^\theta_i$'s are non-zero vectors for each $\theta$ and each of these two $y^\theta_i$'s has only two non-zero entries. We reduce from max-cut. Given any graph $G = (V, \Gamma)$ with node set $V$ and edge set $\Gamma$, construct a security game with $V$ as targets and $\Gamma$ as attacker types. The attacker type $\theta = (i,j)$ is interested in only targets $i, j$. For any type $\theta = (i,j)$, define $y^\theta_i$ as follows: $y^\theta_{ii} = 1$, $y^\theta_{ij} = -1$ and $y^\theta_{ik} = 0$ for any $k \neq i, j$; define $y^\theta_j$ as follows: $y^\theta_{ji} = -1$, $y^\theta_{jj} = 1$ and $y^\theta_{jk} = 0$ for any $k \neq i, j$. Let $\mu^\theta_i = 0$ for any $i, \theta$. We will think of each pure strategy $e$ as a cut of size $k$, with all value-1 nodes on one side and all value-0 nodes on the other side. Let
$$c(e) = \sum_{\theta \in \Gamma} \max_k \big( \mu^\theta_k - e \cdot y^\theta_k \big) = \sum_{\theta = (i,j) \in \Gamma} \max\big( -e \cdot y^\theta_i, \; -e \cdot y^\theta_j \big).$$
Note that $\max(-e \cdot y^\theta_i, -e \cdot y^\theta_j) = 1$ if and only if edge $\theta$ is cut by strategy $e$ (in which case $e \cdot y^\theta_i$ and $e \cdot y^\theta_j$ equal $1$ and $-1$ in some order); otherwise $\max(-e \cdot y^\theta_i, -e \cdot y^\theta_j) = 0$. Therefore, $c(e)$ equals precisely the size of the cut induced by $e$. Note that evaluating the function $f$ defined in Equation (C.5) amounts to maximizing $c(e)$ over $e \in E$, which is precisely to compute the Max $k$-Cut, a well-known NP-hard problem. Therefore the defender oracle is NP-hard, even when each attacker type is only interested in two targets.

C.3 Proof of the Polytope Transformation Lemma

In this section, we prove Lemma 15.

Part 1: This is standard, and can be found, e.g., in (Boyd & Vandenberghe, 2004); we provide a proof for completeness. Consider any two elements $(x, p)$ and $(y, q)$ from $\widetilde{\mathcal{P}}$. So there exist $a, b \in \mathcal{P}$ such that $x = pa$ and $y = qb$. To prove convexity, we need to show that $\lambda(x,p) + \mu(y,q) \in \widetilde{\mathcal{P}}$ for any $\lambda, \mu \in (0,1)$ with $\lambda + \mu = 1$. If $p = q = 0$, this is obvious; otherwise, we have
$$\lambda(x,p) + \mu(y,q) = \lambda(pa, p) + \mu(qb, q) = \Big( [\lambda p + \mu q] \cdot \frac{\lambda p a + \mu q b}{\lambda p + \mu q}, \; \lambda p + \mu q \Big).$$
Notice that $\frac{\lambda p a + \mu q b}{\lambda p + \mu q} \in \mathcal{P}$ due to the convexity of $\mathcal{P}$, and therefore $\lambda(x,p) + \mu(y,q) \in \widetilde{\mathcal{P}}$. So $\widetilde{\mathcal{P}}$ is convex.

Part 2: First, it is easy to see that any element of $\widetilde{\mathcal{P}}$ satisfies $Ax \leq pb$ and $p \geq 0$. We prove the other direction: namely, any $(x, p)$ satisfying $Ax \leq pb$ and $p \geq 0$ belongs to $\widetilde{\mathcal{P}}$. It is easy to see that this is true for $p > 0$, since $x/p \in \mathcal{P}$. The non-trivial part is when $p = 0$, in which case $(x, p) \in \widetilde{\mathcal{P}}$ if and only if $x = \mathbf{0}$. We need to prove that the only $x$ satisfying $Ax \leq \mathbf{0}$ is the all-zero vector $\mathbf{0}$. Here we need the condition that $\mathcal{P}$ is bounded. If (by contradiction) there exists $x_0 \neq \mathbf{0}$ satisfying $Ax_0 \leq \mathbf{0}$, then for any $x \in \mathcal{P}$ we must have $x + \lambda x_0 \in \mathcal{P}$ for any $\lambda > 0$, which contradicts the fact that $\mathcal{P}$ is bounded.

Part 3: If $\mathcal{P}$ has a separation oracle $\mathcal{O}$, then the following is a separation oracle for $\widetilde{\mathcal{P}}$. Given an arbitrary $(x_0, p_0) \in \mathbb{R}^{n+1}$:

Case 1: If $p_0 < 0$, return "no" and the separating hyperplane $p = 0$.

Case 2: If $p_0 > 0$, first check whether $x_0 / p_0 \in \mathcal{P}$. If this is true, return "yes"; otherwise, use oracle $\mathcal{O}$ to find a violated constraint, i.e., an inequality such that $a^T x_0 / p_0 > b$ but $a^T x' \leq b$ for any $x' \in \mathcal{P}$.
We claim that $a^T x - bp = 0$ is a hyperplane separating $(x_0, p_0)$ from $\widetilde{\mathcal{P}}$. In particular, for any $(x, p) \in \widetilde{\mathcal{P}}$ with $p > 0$, there exists $x' \in \mathcal{P}$ such that $x/p = x'$. Note that $a^T x' \leq b$ since $x' \in \mathcal{P}$, so $a^T x \leq pb$ (this also holds when $p = 0$, in which case $x = \mathbf{0}$). However, $a^T x_0 > p_0 b$. Therefore, $a^T x - pb = 0$ is a separating hyperplane.

Case 3: If $p_0 = 0$, return "yes" if $x_0 = \mathbf{0}$. Otherwise, return "no", and find a separating hyperplane as follows. Since $\mathcal{P}$ is bounded, we can find some $L_0 > 0$ large enough such that $y_0 = L_0 x_0 \notin \mathrm{conv}(\mathcal{P}, \mathbf{0})$, where $\mathrm{conv}(\mathcal{P}, \mathbf{0})$ is the convex hull of $\mathcal{P}$ and the origin $\mathbf{0}$ (thus it contains $\mathcal{P}$), and is introduced for technical convenience. Let $a \cdot y = b$ be a hyperplane separating $y_0$ from $\mathrm{conv}(\mathcal{P}, \mathbf{0})$; that is, $a \cdot y_0 > b$ and $a \cdot y \leq b$ for any $y \in \mathrm{conv}(\mathcal{P}, \mathbf{0})$, in particular for any $y \in \mathcal{P}$. Similarly to the argument in Case 2, we know that $a \cdot x \leq pb$ for any $(x, p) \in \widetilde{\mathcal{P}}$. Note that, since $\mathbf{0} \in \mathrm{conv}(\mathcal{P}, \mathbf{0})$, we have $b \geq a \cdot \mathbf{0} = 0$, so $b$ is non-negative. As a result, $a \cdot (L x_0) = \frac{L}{L_0} \, a \cdot y_0 > b$ for any $L \geq L_0$; that is, $a \cdot x_0 > \frac{1}{L} b$ for any $L \geq L_0$. Therefore, we must have $a \cdot x_0 > 0 = p_0 b$ since $p_0 = 0$. As a result, the hyperplane $a \cdot x = pb$ separates $(x_0, p_0)$ from $\widetilde{\mathcal{P}}$.

Bibliography

Agmon, N., Sadov, V., Kaminka, G. A., & Kraus, S. (2008). The impact of adversarial knowledge on adversarial planning in perimeter patrol. Vol. 1, pp. 55–62.

Agrawal, S., Ding, Y., Saberi, A., & Ye, Y. (2010). Correlation robust stochastic optimization. In Proceedings of the Twenty-first Annual ACM-SIAM Symposium on Discrete Algorithms, SODA '10, pp. 1087–1096, Philadelphia, PA, USA. Society for Industrial and Applied Mathematics.

Akerlof, G. A. (1970). The market for "lemons": Quality uncertainty and the market mechanism. The Quarterly Journal of Economics, 84(3), 488–500.

Alaei, S., Fu, H., Haghpanah, N., Hartline, J. D., & Malekian, A. (2012). Bayesian optimal auctions via multi- to single-agent reduction. In Faltings, B., Leyton-Brown, K., & Ipeirotis, P. (Eds.), ACM Conference on Electronic Commerce, p. 17. ACM.

Alon, N., Emek, Y., Feldman, M., & Tennenholtz, M. (2013). Adversarial leakage in games. SIAM Journal on Discrete Mathematics, 27(1), 363–385.

Alonso, R., & Camara, O. (2014). Persuading voters. Working paper.

Alonso, R., & Câmara, O. (2016). Persuading voters. American Economic Review, 106(11), 3590–3605.

Alpern, S., Morton, A., & Papadaki, K. (2011). Patrolling games. Operations Research, 59(5), 1246–1257.

An, B., Shieh, E., Tambe, M., Yang, R., Baldwin, C., DiRenzo, J., Maule, B., & Meyer, G. (2012). PROTECT: a deployed game theoretic system for strategic security allocation for the United States Coast Guard. AI Magazine, 33(4), 96.

Anderson, S. P., & Renault, R. (2006). Advertising content. American Economic Review, 96(1), 93–113.

Antioch, G. (2013). Persuasion is now 30 per cent of US GDP. Economic Roundup, pp. 1–10.

Arieli, I., & Babichenko, Y. (2016). Private Bayesian persuasion. Available at SSRN 2721307.

Aumann, R., Maschler, M., & Stearns, R. (1995). Repeated Games with Incomplete Information. MIT Press.

Babaioff, M., Kleinberg, R., & Paes Leme, R. (2012). Optimal mechanisms for selling information. In Proceedings of the 13th ACM Conference on Electronic Commerce, EC '12, pp. 92–109, New York, NY, USA. ACM.

Babichenko, Y., & Barman, S. (2017). Algorithmic aspects of private Bayesian persuasion. In Proceedings of the 2017 ACM Conference on Innovations in Theoretical Computer Science, ITCS.

Bardhi, A., & Guo, Y. (2016). Modes of persuasion toward unanimous consent. Working paper.

Barnhart, C., Johnson, E. L., Nemhauser, G. L., Savelsbergh, M.
W., & Vance, P. H. (1998). Branch-and-price: Column generation for solving huge integer programs. Operations re- search, 46(3), 316–329. Basilico, N., Celli, A., De Nittis, G., & Gatti, N. (2017a). Coordinating multiple defensive re- sources in patrolling games with alarm systems. In Proceedings of the 16th Conference on Autonomous Agents and MultiAgent Systems, AAMAS. Basilico, N., De Nittis, G., & Gatti, N. (2017b). Adversarial patrolling with spatially uncertain alarm signals. Artificial Intelligence, 246, 220–257. Basilico, N., Gatti, N., & Amigoni, F. (2009a). Leader-follower strategies for robotic patrolling in environments with arbitrary topologies. In Proceedings of The 8th International Confer- ence on Autonomous Agents and Multiagent Systems - Volume 1, AAMAS ’09. Basilico, N., Gatti, N., Rossi, T., Ceppi, S., & Amigoni, F. (2009b). Extending algorithms for mobile robot patrolling in the presence of adversaries to more realistic settings. In Pro- ceedings of the 2009 IEEE/WIC/ACM International Joint Conference on Web Intelligence and Intelligent Agent Technology - Volume 02, WI-IAT ’09. IEEE Computer Society. Bergemann, D., & Bonatti, A. (2015). Selling cookies. American Economic Journal: Microeco- nomics, 7(3), 259–94. Bergemann, D., Bonatti, A., & Smolin, A. (2016). The designing and pricing information. Work- ing Paper. Bergemann, D., Brooks, B., & Morris, S. (2015). The limits of price discrimination. American Economic Review, 105(3), 921–57. Bergemann, D., & Morris, S. (2016). Bayes correlated equilibrium and the comparison of infor- mation structures in games. Theoretical Economics, 11(2), 487–522. Bhaskar, U., Cheng, Y ., Ko, Y . K., & Swamy, C. (2016). Hardness results for signaling in Bayesian zero-sum and network routing games. In Proceedings of the 2016 ACM Con- ference on Economics and Computation (EC). ACM. Border, K. (2007). Reduced Form Auctions Revisited. Economic Theory, 31(1), 167–181. Border, K. C. (1991). Implementation of Reduced Form Auctions: A Geometric Approach. Econometrica, 59(4). Bosansky, B., Jiang, A. X., Tambe, M., & Kiekintveld, C. (2015). Combining compact represen- tation and incremental generation in large games with sequential strategies. In AAAI. Boˇ sansk´ y, B., Kiekintveld, C., Lis´ y, V ., & Pˇ echouˇ cek, M. (2014). An exact double-oracle algo- rithm for zero-sum extensive-form games with imperfect information. J. Artif. Int. Res., 51(1), 829–866. Boˇ sansk´ y, B., Lis´ y, V ., Jakob, M., & Pˇ echouˇ cek, M. (2011). Computing time-dependent policies for patrolling games with mobile targets.. In AAMAS. IFAAMAS. 188 Boyd, S., & Vandenberghe, L. (2004). Convex Optimization. Cambridge University Press, New York, NY , USA. Brocas, I., & Carrillo, J. D. (2007). Influence through ignorance. The RAND Journal of Eco- nomics, 38(4), 931–947. Brown, G., Carlyle, M., Diehl, D., Kline, J., & Wood, K. (2005). A two-sided optimization for theater ballistic missile defense. Oper. Res., 53(5), 745–763. Brown, M., Sinha, A., Schlenker, A., & Tambe, M. (2016). One size does not fit all: A game- theoretic approach for dynamically and effectively screening for threats. In AAAI confer- ence on Artificial Intelligence (AAAI). Cai, Y ., Daskalakis, C., & Weinberg, S. M. (2012). An algorithmic characterization of multi- dimensional mechanisms. In Proceedings of the Forty-fourth Annual ACM Symposium on Theory of Computing, STOC ’12, pp. 459–478, New York, NY , USA. ACM. Calinescu, G., Chekuri, C., P´ al, M., & V ondr´ ak, J. (2011). 
Maximizing a monotone submodular function subject to a matroid constraint. SIAM Journal on Computing. Carthy, S. M., Tambe, M., Kiekintveld, C., Gore, M. L., & Killion, A. (2016). Preventing illegal logging: simultaneous optimization of resource teams and tactics for security. In Proceed- ings of the Thirtieth AAAI Conference on Artificial Intelligence, pp. 3880–3886. AAAI Press. Cermak, J., Bosansky, B., Durkota, K., Lisy, V ., & Kiekintveld, C. (2016). Using correlated strategies for computing stackelberg equilibria in extensive-form games. In Thirtieth AAAI Conference on Artificial Intelligence. Cermak, J., Boˇ sansk´ y, B., & Lis´ y, V . (2017). An algorithm for constructing and solving imper- fect recall abstractions of large extensive-form games. In Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence, IJCAI-17, pp. 936–942. Chakraborty, A., & Harbaugh, R. (2014). Persuasive puffery. Marketing Science, 33(3), 382–400. Chan, J., Gupta, S., Li, F., & Wang, Y . (2016). Pivotal persuasion. Available at SSRN. Chen, X., Deng, X., & Teng, S.-H. (2009). Settling the complexity of computing two-player Nash Equilibria. J. ACM, 56(3), 14:1–14:57. Cheng, Y ., Cheung, H. Y ., Dughmi, S., Emamjomeh-Zadeh, E., Han, L., & Teng, S.-H. (2015). Mixture selection, mechanism design, and signaling. In Foundations of Computer Science (FOCS), 2015 IEEE 56th Annual Symposium on, pp. 1426–1445. IEEE. Colbourn, J. C., Provan, J. S., & Vertigan, D. (1995). The complexity of computing the tutte polynomial on transversal matroids. Combinatorica. Conitzer, V ., & Korzhyk, D. (2011). Commitment to correlated strategies.. In Proceedings of the 25th AAAI Conference on Artificial Intelligence (AAAI). Conitzer, V ., & Sandholm, T. (2006). Computing the optimal strategy to commit to. In Proceed- ings of the 7th ACM conference on Electronic commerce, pp. 82–90. ACM. Crawford, P., & Sobel, J. (1982). Strategic information transmission. Econometrica. Crawford, V . (1998). A survey of experiments on communication via cheap talk. Journal of Economic theory, 78(2), 286–298. 189 Cryan, M., & Dyer, M. (2002). A polynomial-time algorithm to approximately count contingency tables when the number of rows is constant. In Proceedings of the Thiry-fourth Annual ACM Symposium on Theory of Computing, STOC ’02, pp. 240–249, New York, NY , USA. ACM. Daskalakis, C., Goldberg, P. W., & Papadimitriou, C. H. (2006). The complexity of computing a Nash Equilibrium. In Proceedings of the Thirty-eighth Annual ACM Symposium on Theory of Computing, STOC ’06, pp. 71–78, New York, NY , USA. ACM. Dughmi, S. (2014). On the hardness of signaling. In Proceedings of the 55th Symposium on Foundations of Computer Science, FOCS ’14. IEEE Computer Society. Dughmi, S. (2017). Algorithmic information structure design: a survey. ACM SIGecom Ex- changes, 15(2), 2–24. Dughmi, S., Immorlica, N., & Roth, A. (2014). Constrained signaling in auction design. In Proceedings of the Twenty-five Annual ACM-SIAM Symposium on Discrete Algorithms, SODA ’14. Society for Industrial and Applied Mathematics. Dyer, M. (2003). Approximate counting by dynamic programming. In Proceedings of the Thirty- fifth Annual ACM Symposium on Theory of Computing, STOC ’03, pp. 693–699, New York, NY , USA. ACM. Emek, Y ., Feldman, M., Gamzu, I., Paes Leme, R., & Tennenholtz, M. (2012). Signaling schemes for revenue maximization. In Proceedings of the 13th ACM Conference on Electronic Commerce, EC ’12, pp. 514–531, New York, NY , USA. ACM. Fang, F., Jiang, A. 
X., & Tambe, M. (2013). Optimal patrol strategy for protecting moving targets with multiple mobile resources. In International Conference on Autonomous Agents and Multiagent Systems (AAMAS). Fang, F., Nguyen, T. H., Pickles, R., Lam, W. Y ., Clements, G. R., An, B., Singh, A., & Tambe, M. (2016a). Deploying paws to combat poaching: Game-theoretic patrolling in areas with complex terrain (demonstration). In Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence, AAAI’16, pp. 4355–4356. AAAI Press. Fang, F., Nguyen, T. H., Pickles, R., Lam, W. Y ., Clements, G. R., An, B., Singh, A., Tambe, M., & Lemieux, A. (2016b). Deploying PAWS: Field optimization of the protection assistant for wildlife security. In IAAI. Fang, F., Stone, P., & Tambe, M. (2015). When security games go green: Designing defender strategies to prevent poaching and illegal fishing. In International Joint Conference on Artificial Intelligence (IJCAI). Garey, M. R., & Johnson, D. S. (1979). Computers and Intractability: A Guide to the Theory of NP-Completeness (Series of Books in the Mathematical Sciences) (First Edition edition). W. H. Freeman. Gentzkow, M., & Kamenica, E. (2014). Costly persuasion. American Economic Review, 104(5), 457–62. Gentzkow, M., & Kamenica, E. (2016). Competition in persuasion. The Review of Economic Studies, 84(1), 300–322. 190 Gholami, S., Ford, B., Fang, F., Plumptre, A., Tambe, M., Driciru, M., Wanyama, F., Rwetsiba, A., Nsubaga, M., & Mabonga, J. (2017). Taking it for a test drive: a hybrid spatio-temporal model for wildlife poaching prediction evaluated through a controlled field test. In Joint European Conference on Machine Learning and Knowledge Discovery in Databases, pp. 292–304. Springer. Gick, W., & Pausch, T. (2012). Persuasion by stress testing: Optimal disclosure of supervisory information in the banking sector. No. 32/2012. Discussion Paper, Deutsche Bundesbank. Goldstein, I., & Leitner, Y . (2013). Stress tests and information disclosure.. Gopalan, P., Nisan, N., & Roughgarden, T. (2015). Public projects, boolean functions, and the borders of border’s theorem. In Proceedings of the Sixteenth ACM Conference on Eco- nomics and Computation, EC ’15, pp. 395–395, New York, NY , USA. ACM. Gr¨ otschel, M., Lov´ asz, L., & Schrijver, A. (1988). Geometric Algorithms and Combinatorial Optimization, V ol. 2 of Algorithms and Combinatorics. Springer. Guo, M., & Deligkas, A. (2013). Revenue maximization via hiding item attributes. CoRR, abs/1302.5332. Guo, Q., An, B., Bosansky, B., & Kiekintveld, C. (2017). Comparing strategic secrecy and stack- elberg commitment in security games. In Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence, IJCAI-17. Jain, M., Kardes, E., Kiekintveld, C., Ordez, F., & Tambe, M. (2010). Security games with arbi- trary schedules: A branch and price approach.. In Fox, M., & Poole, D. (Eds.), Proceedings of the 24th AAAI Conference on Artificial Intelligence (AAAI). AAAI Press. Jain, M., Korzhyk, D., Vanˇ ek, O., Conitzer, V ., Pˇ echouˇ cek, M., & Tambe, M. (2011). A dou- ble oracle algorithm for zero-sum security games on graphs. In The 10th International Conference on Autonomous Agents and Multiagent Systems - Volume 1, AAMAS ’11, pp. 327–334. Jain, M., Leyton-Brown, K., & Tambe, M. (2012). The deployment-to-saturation ratio in security games. In Proceedings of the Twenty-Sixth AAAI Conference on Artificial Intelligence, pp. 1362–1370. AAAI Press. Jerrum, M. R., Valiant, L. G., & Vazirani, V . V . (1986). 
Random generation of combinatorial structures from a uniform distribution. Theor. Comput. Sci., 43, 169–188. Jiang, A. X., & Leyton-Brown, K. (2011). Polynomial-time computation of exact correlated equilibrium in compact games. In Proceedings of the Twelfth ACM Electronic Commerce Conference (ACM-EC). Johnson, J. P., & Myatt, D. P. (2006). On the simple economics of advertising, marketing, and product design. American Economic Review, 96(3), 756–784. Kahn, J., & Kayll, P. M. (1997). On the stochastic independence properties of hard-core distribu- tions. Combinatorica, 17(3), 369–391. Kamenica, E., & Gentzkow, M. (2011). Bayesian persuasion. American Economic Review, 101(6), 2590–2615. 191 Khot, S., & Saket, R. (2012). Hardness of finding independent sets in almost q-colorable graphs. In IEEE 53rd Annual Symposium on Foundations of Computer Science (FOCS), pp. 380– 389. IEEE. Kiekintveld, C., Jain, M., Tsai, J., Pita, J., Ord´ o˜ nez, F., & Tambe, M. (2009). Computing optimal randomized resource allocations for massive security games. In Proceedings of The 8th International Conference on Autonomous Agents and Multiagent Systems-Volume 1, pp. 689–696. International Foundation for Autonomous Agents and Multiagent Systems. Knuth, D. E. (1997). The art of computer programming, V ol. 3. Pearson Education. Kolotilin, A. (2015). Experimental design to persuade. Games and Economic Behavior, 90, 215–226. Kolotilin, A., Mylovanov, T., Zapechelnyuk, A., & Li, M. (2017). Persuasion of a privately informed receiver. Econometrica, 85(6), 1949–1964. Korzhyk, D., Conitzer, V ., & Parr, R. (2011a). Security games with multiple attacker resources. In Twenty-Second International Joint Conference on Artificial Intelligence. Korzhyk, D., Yin, Z., Kiekintveld, C., Conitzer, V ., & Tambe, M. (2011b). Stackelberg vs. nash in security games: An extended investigation of interchangeability, equivalence, and unique- ness. Journal of Artificial Intelligence Research, 41, 297–327. Kremer, I., Mansour, Y ., & Perry, M. (2014). Implementing the ”wisdom of the crowd”. Journal of Political Economy, 122(5), 988–1012. Letchford, J., & Conitzer, V . (2010). Computing optimal strategies to commit to in extensive-form games. In Proceedings of the 11th ACM conference on Electronic commerce, pp. 83–92. ACM. Letchford, J., Conitzer, V ., & Munagala, K. (2009). Learning and approximating the optimal strategy to commit to.. In Mavronicolas, M., & Papadopoulou, V . G. (Eds.), SAGT, V ol. 5814 of Lecture Notes in Computer Science, pp. 250–262. Springer. Li, Y ., Conitzer, V ., & Korzhyk, D. (2016). Catcher-evader games. In Proceedings of the Twenty- Fifth International Joint Conference on Artificial Intelligence, pp. 329–337. AAAI Press. Mansour, Y ., Slivkins, A., & Syrgkanis, V . (2015). Bayesian incentive-compatible bandit explo- ration. arXiv preprint arXiv:1502.04147. Mersheeva, V ., & Friedrich, G. (2015). Multi-uav monitoring with priorities and limited en- ergy resources. In Proceedings of the Twenty-Fifth International Conference on Automated Planning and Scheduling, pp. 347–355. AAAI Press. Miltersen, P. B., & Sheffet, O. (2012). Send mixed signals: earn more, work less.. In Faltings, B., Leyton-Brown, K., & Ipeirotis, P. (Eds.), ACM Conference on Electronic Commerce, pp. 234–247. ACM. Moreto, W. (2013). To conserve and protect: Examining law enforcement ranger culture and operations in Queen Elizabeth National Park, Uganda. Ph.D. thesis, Rutgers University- Graduate School-Newark. Nguyen, T. H., Delle Fave, F. 
M., Kar, D., Lakshminarayanan, A. S., Yadav, A., Tambe, M., Agmon, N., Plumptre, A. J., Driciru, M., Wanyama, F., et al. (2015). Making the most 192 of our regrets: Regret-based solutions to handle payoff uncertainty and elicitation in green security games. In International Conference on Decision and Game Theory for Security, pp. 170–191. Springer. Nudelman, E., Wortman, J., Shoham, Y ., & Kevin, L.-B. (2004). Run the gamut: A compre- hensive approach to evaluating game-theoretic algorithms. In Proceedings of the Third International Joint Conference on Autonomous Agents and Multiagent Systems-Volume 2, pp. 880–887. IEEE Computer Society. Nyirenda, V . R., & Chomba, C. (2012). Field foot patrol effectiveness in kafue national park, zambia. Journal of Ecology and the Natural Environment, 4(6), 163–172. Papadimitriou, C. H., & Roughgarden, T. (2008). Computing correlated equilibria in multi-player games. J. ACM, 55(3), 14:1–14:29. Paruchuri, P., Pearce, J. P., Marecki, J., Tambe, M., Ordonez, F., & Kraus, S. (2008). Efficient algorithms to solve Bayesian Stackelberg games for security applications.. In Proceedings of the 23rd AAAI Conference on Artificial Intelligence (AAAI), pp. 1559–1562. Pita, J., Jain, M., Marecki, J., Ord´ o˜ nez, F., Portway, C., Tambe, M., Western, C., Paruchuri, P., & Kraus, S. (2008a). Deployed ARMOR protection: the application of a game theoretic model for security at the los angeles international airport. In AAMAS: industrial track. Pita, J., Jain, M., Marecki, J., Ord´ o˜ nez, F., Portway, C., Tambe, M., Western, C., Paruchuri, P., & Kraus, S. (2008b). Deployed armor protection: the application of a game theoretic model for security at the Los Angeles international airport. In Proceedings of the 7th international joint conference on Autonomous agents and multiagent systems: industrial track, pp. 125– 132. International Foundation for Autonomous Agents and Multiagent Systems. Pitowsky, I. (1991). Correlation polytopes: Their geometry and complexity.. Math. Program., 395–414. Powell, R. (2007). Allocating defensive resources with private information about vulnerability. American Political Science Review, 101(04), 799–809. Rabinovich, Z., Jiang, A. X., Jain, M., & Xu, H. (2015). Information disclosure as a means to security. In Proceedings of the 2015 International Conference on Autonomous Agents and Multiagent Systems, AAMAS, Istanbul, Turkey, 2015. Rayo, L., & Segal, I. (2010). Optimal information disclosure. Journal of Political Economy, 118(5), 949 – 987. Rowe, N. C. (2006). A taxonomy of deception in cyberspace.. Rowe, N. C., & Rothstein, H. (2004). Two taxonomies of deception for attacks on information systems. Journal of Information Warfare, 3(2), 27–39. Rubinstein, A. (2017). Honest signaling in zero-sum games is hard...and lying is even harder!. In Proceedings of the 44th international colloquium conference on Automata, Languages, and Programming. Springer-Verlag. Schrijver, A. (2003). Combinatorial Optimization - Polyhedra and Efficiency. Springer. Singh, M., & Vishnoi, N. K. (2013). Entropy, optimization and counting. CoRR. 193 Stranders, R., De Cote, E. M., Rogers, A., & Jennings, N. R. (2013). Near-optimal continuous patrolling with teams of mobile information gathering agents. Artificial intelligence, 195, 63–105. Talmor, N., & Agmon, N. (2017). On the power and limitations of deception in multi-robot adversarial patrolling. In Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence, IJCAI-17, pp. 430–436. Tambe, M. 
(2011). Security and Game Theory: Algorithms, Deployed Systems, Lessons Learned. Cambridge University Press. Taneva, I. A. (2015). Information design.. Tsai, J., Rathi, S., Kiekintveld, C., Ordonez, F., & Tambe, M. (2009). Iris - a tool for strategic security allocation in transportation networks. In The Eighth International Conference on Autonomous Agents and Multiagent Systems - Industry Track. Tsai, J., Yin, Z., young Kwak, J., Kempe, D., Kiekintveld, C., & Tambe, M. (2010). Urban security: Game-theoretic resource allocation in networked physical domains. In National Conference on Artificial Intelligence (AAAI). Valiant, L. G. (1979). The complexity of computing the permanent. Theoretical computer science, 8(2), 189–201. von Stackelberg, H. (1934). Marktform und Gleichgewicht. Springer, Vienna. von Stengel, B., & Zamir, S. (2004). Leadership with commitment to mixed strategies.. CDAM Research Report LSE-CDAM-2004-01, London School of Economics. V orobeychik, Y ., An, B., & Tambe, M. (2012). Adversarial patrolling games. In Proceedings of the 11th International Conference on Autonomous Agents and Multiagent Systems-Volume 3, pp. 1307–1308. International Foundation for Autonomous Agents and Multiagent Sys- tems. Waldfogel, J., & Chen, L. (2006). Does information undermine brand? information intermediary use and preference for branded web retailers. The Journal of Industrial Economics, 54(4), 425–449. Wang, Y . (2015). Bayesian persuasion with multiple receivers. Available at SSRN 2625399. Weinberg, S. M. (2014). Algorithms for Strategic Agents. Ph.D. thesis, Massachusetts Institute of Technology. Wittemyer, G., Northrup, J. M., Blanc, J., Douglas-Hamilton, I., Omondi, P., & Burnham, K. P. (2014). Illegal killing for ivory drives global decline in african elephants. Proceedings of the National Academy of Sciences, 111(36), 13117–13121. Xu, H. (2016). The mysteries of security games: Equilibrium computation becomes combina- torial algorithm design. In Proceedings of the 2016 ACM Conference on Economics and Computation, pp. 497–514. ACM. Xu, H., Fang, F., Jiang, A. X., Conitzer, V ., Dughmi, S., & Tambe, M. (2014). Solving zero- sum security games in discretized spatio-temporal domains. In Proceedings of the 28th Conference on Artificial Intelligence (AAAI 2014), Qubec, Canada. 194 Xu, H., Jiang, A. X., Sinha, A., Rabinovich, Z., Dughmi, S., & Tambe, M. (2015). Security games with information leakage: Modeling and computation. In Proceedings of the Twenty- Fourth International Joint Conference on Artificial Intelligence, IJCAI 2015, Buenos Aires, Argentina, July 25-31, 2015, pp. 674–680. Yin, Y ., An, B., V orobeychik, Y ., & Zhuang, J. (2013). Optimal deceptive strategies in security games: A preliminary study.. Yin, Y ., Xu, H., Gain, J., An, B., & Jiang, A. X. (2015). Computing optimal mixed strategies for security games with dynamic payoffs. In IJCAI. Yin, Z., Jiang, A. X., Tambe, M., Kiekintveld, C., Leyton-Brown, K., Sandholm, T., & Sullivan, J. P. (2012). TRUSTS: Scheduling randomized patrols for fare inspection in transit systems using game theory. AI Magazine, 33(4), 59. Zhuang, J., & Bier, V . M. (2010). Reasons for secrecy and deception in Homeland-Security resource allocation. Risk Analysis, 30(12), 1737–1743. Zhuang, J., & Bier, V . M. (2011). Secrecy and deception at equilibrium, with applications to anti-terrorism resource allocation. Defence and Peace Economics, 22(1), 43–61. 195
Abstract
This thesis considers the following question: in systems with self-interested agents (a.k.a. games), how does information, i.e., what each agent knows about the environment and about other agents' preferences, affect their decision making? The study of the role of information in games has a rich history, and in fact forms the celebrated field of information economics. However, unlike previous descriptive studies, this thesis takes a prescriptive approach and examines computational questions pertaining to the role of information. In particular, it illustrates the double-edged role of information through two threads of research: (1) how to utilize information to one's own advantage in strategic interactions