DISCOUNTED ROBUST STOCHASTIC GAMES WITH APPLICATIONS TO HOMELAND SECURITY AND FLOW CONTROL

by Erim Kardeş

A Dissertation Presented to the FACULTY OF THE GRADUATE SCHOOL, UNIVERSITY OF SOUTHERN CALIFORNIA, in Partial Fulfillment of the Requirements for the Degree DOCTOR OF PHILOSOPHY (INDUSTRIAL AND SYSTEMS ENGINEERING)

August 2007

Copyright 2007 Erim Kardeş

Dedication

To... my parents Ümran Kardeş, Hamdi Kardeş, and my brother Emre Kardeş, Marie Saldaña for all her support, Sigmund Freud, Friedrich Nietzsche, Karl Marx, and Beno Kuryel, Alphonso Johnson, my bands, and the Red Hot Chili Peppers for being around... much love... Erim, Los Angeles, Spring 2007

Acknowledgments

This research was partially supported by the United States Department of Homeland Security through the Center for Risk and Economic Analysis of Terrorism Events (CREATE), grant number EMW-2004-GR-0112. However, any opinions, findings, and conclusions or recommendations in this document are those of the author and do not necessarily reflect the views of the U.S. Department of Homeland Security.

Table of Contents

Dedication
Acknowledgments
List of Tables
List of Figures
Abstract
Chapter 1. Introduction
Chapter 2. Literature Survey
2.1. Game Theory
2.2. Stochastic Games
2.3. Game Theory and Robust Optimization
2.4. Game Theory and Influence Diagrams
2.5. Game Theory and Reliability
2.6. Summary of Research and Contributions
Chapter 3. Methodology
3.1. Stochastic Games
3.2. Robust Optimization
3.3. Formulation of Discounted Robust Stochastic Games
3.4. Existence of Equilibrium Points in Discounted Robust Stochastic Games
3.5. Calculation of an Equilibrium Point
Chapter 4. Applications
4.1. MANPADS Model
4.2. The Robust Game Model and Sensitivity Analyses
4.3. A Time-Variant Model
4.4. A Flow Control Example
Chapter 5. Conclusions
Bibliography
Appendix

List of Tables

Table 1: Equilibrium solution for the main MANPADS model
Table 2: Intervals for probability transitions
Table 3: Transition data scenarios
Table 4: Intervals of rates for different instances

List of Figures

Figure 1: The main MANPADS model
Figure 2: 2-way sensitivity analysis on the cost of fatal crash and the probability of attempt
Figure 3: Best response values to the defender
Figure 4: 2-way sensitivity analysis with countermeasure costs set to $30 billion
Figure 5: 2-way sensitivity analysis on the probability of re-play and fatal crash cost (with a fixed attack probability of 0.50)
Figure 6: 2-way sensitivity analysis on the probability of re-play and fatal crash cost (with a fixed attack probability of 0.20)
Figure 7: Utilities to attacker
Figure 8: Strategy profiles
Figure 9: Payoff functions
Figure 10: L, the average number of customers in the system
Figure 11: W, the average time a customer spends in the system
Figure 12: Average value of the game
Figure 13: Robust equilibrium average values vs. average values for nominal strategies at their worst-case data
Figure 14: Robust equilibrium average values vs. average values for nominal strategies at their worst-case data
Figure 15: L values for robust equilibrium vs. for nominal strategies at their worst-case data
Figure 16: L values for robust equilibrium vs. for nominal strategies at their worst-case data
Figure 17: Mean values for the average value of the game to player 2
Figure 18: Standard deviation values

Abstract

This dissertation presents a distribution-free, robust optimization model for n-person finite state/action discounted stochastic games with incomplete information.
We consider n-player, non-zero-sum discounted stochastic games in which none of the players knows the true data of the game and each player considers a distribution-free incomplete information stochastic game to be played using robust optimization. We call such games “discounted robust stochastic games”. Discounted robust stochastic games allow us to use simple uncertainty sets for the unknown data of the game, and to eliminate former approaches’ requirements for defining prior probability distributions over a set of games. We prove the existence of equilibrium points when the payoffs of the game belong to a bounded set and the transition data is ambiguous. Unlike prior work on incomplete information stochastic games, our approach lends itself to an explicit mathematical programming formulation for equilibrium calculation. We illustrate the use of discounted robust stochastic games in a security-related decision problem, followed by a control problem in a single-server queuing system.

Chapter 1. Introduction

Stochastic games have been studied in fields such as mathematics, operations research, economics, and engineering since the 1950s. They were first introduced to the game theory literature by Lloyd S. Shapley in 1953 [66]. In Shapley’s (noncooperative) two-person zero-sum game, play proceeds in stages, from one state to another, according to transition probabilities controlled jointly by the two opponents. The game consists of states and actions associated with each player. Once in a state, each player chooses his respective action. Play then moves into another state with some probability determined by the actions chosen and by the state in which they are chosen. Given the opponents’ decisions at a given stage, each player incurs a cost. In game theory, “two-person” indicates that there are two players in the model.
“Zero-sum” denotes that one player’s (usually player 1’s) gain is a cost to the other player (usually player 2). Hence, there is a complete utility transfer from one player to the other, and the payoffs to the players sum to zero. Many extensions to Shapley’s model have been proposed since his seminal paper, such as games with infinite states and actions, n-person games, games with incomplete information, continuous-time games, and semi-Markov games, among numerous others. In certain applications, such as the security-related decision models presented in this study, the data of a decision model may be subject to uncertainty at the time decisions are made. Furthermore, it may be difficult to obtain distributions on the uncertain parameters of a decision model. Even if we suppose that the data could be extracted in some manner from a given source, we may not be able to measure, estimate, or compute it exactly. In this dissertation, we consider n-player, non-zero-sum discounted stochastic games in which none of the players knows the true data of the game and each player considers a distribution-free incomplete information stochastic game to be played using robust optimization. We call such games “discounted robust stochastic games”. Robust models represent uncertainty via uncertainty sets: in robust optimization, the uncertain parameters are not known, but they are known to belong to a set. An example of such a set is a set of intervals for the transition probabilities of a stochastic game, which can be easier to obtain in applications than point estimates of the exact transition data. In this sense, robust optimization optimizes an objective function with respect to the worst-case values the uncertain parameters could attain, given the values the decision variables attain. In this approach, neither the values of the decision variables nor those of the uncertain parameters are known a priori.
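As a toy, self-contained illustration of this worst-case viewpoint (the payoff function, interval, and candidate decisions below are hypothetical and not taken from the dissertation), one can compare the decision that is best against nominal data with the decision whose worst case over an interval uncertainty set is best:

```python
# Hypothetical robust (max-min) decision sketch: evaluate each candidate
# decision x at the worst parameter value p in an interval uncertainty set,
# then pick the decision whose worst-case payoff is largest.

def payoff(x, p):
    """Hypothetical payoff: linear reward p*x minus a quadratic effort cost."""
    return p * x - 0.5 * x * x

decisions = [0.0, 0.5, 1.0, 1.5, 2.0]   # candidate decisions
p_lo, p_hi = 0.6, 1.4                   # interval uncertainty set for p

def worst_case(x, lo, hi, grid=101):
    """Worst-case payoff of x over p in [lo, hi], evaluated on a grid."""
    return min(payoff(x, lo + (hi - lo) * i / (grid - 1)) for i in range(grid))

robust_x = max(decisions, key=lambda x: worst_case(x, p_lo, p_hi))
nominal_x = max(decisions, key=lambda x: payoff(x, 1.0))  # nominal p = 1.0

print("robust choice:", robust_x, " nominal choice:", nominal_x)
```

With these numbers the robust choice hedges by acting less aggressively than the nominal one, which is precisely the trade-off the worst-case formulation captures.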
Hence, an objective in this study is to present a distribution-free model for discounted stochastic games with incomplete information and to provide an explicit formulation for equilibrium calculation. This research is motivated by the need to model and solve homeland-security-related decision problems using methodologies that account for both antagonism and uncertainty. Unlike naturally occurring or accidental events, such as floods, earthquakes, or system failures, security-related problems involve adversarial and adaptive opponents, i.e., the attackers or the terrorists. Thus, investments designed to protect against one type of terrorism threat (e.g., against blasts, biological agents, or radiological devices) have the potential to elevate the risk of other types of terrorism. Furthermore, interventions to protect against a certain terrorism threat category may influence the terrorists’ selection of alternative methods within the same threat category. On the other hand, investments targeted at reducing the general effectiveness of terrorist organizations, or at the willingness of individuals to engage in terrorism, may protect against multiple types of terrorism. One goal of the Center for Risk and Economic Analysis of Terrorism Events (CREATE) is to develop new methods to guide investments in counterterrorism, accounting for economic costs and benefits and for non-state-based terrorism risk. From a theoretical perspective, our research is motivated by the fact that in many practical applications the data of a game-theoretic model may not be known exactly, but may be known to belong to a set over which we have no probabilistic information. Furthermore, solutions to stochastic games may be very sensitive to payoff and transition probability data for which the estimates could be inaccurate.
Therefore, this dissertation investigates finite state/action, discrete-time, non-zero-sum, discounted stochastic games in which the payoffs and/or the transition probabilities among states are ambiguous and the players adopt a robust optimization approach to cope with the ambiguity. Here, we use the conventional terminology of decision analysis, where “ambiguity” refers to quantities with unknown probabilities, unlike the term “uncertainty”, which refers to random quantities with known probability distributions. From an applied point of view, we first demonstrate the use of discounted robust stochastic games in a homeland security case study on Man-Portable Air Defense Systems (MANPADS). MANPADS are man-portable surface-to-air missiles. Recently, there have been publicized MANPADS attacks on large civilian aircraft in Kenya and Iraq, which has increased the fear of such attacks both on US soil and outside the US. One countermeasure that could be installed on civilian aircraft to deflect such incoming missiles is the directed infrared countermeasure (DIRCM). DIRCMs jam the heat-seeking device of a MANPADS missile and deflect its course away from the airplane; the heat-seeking MANPADS are called infrared (IR) missiles. In 2004, the US Department of Homeland Security (DHS) initiated a $100 million program to develop DIRCM countermeasures. Currently, there is a pending decision by Congress on whether to install these countermeasures on some or all US commercial aircraft. To aid DHS in this decision process, CREATE has been conducting decision analysis for the MANPADS threat. Within CREATE, the MANPADS case is studied in detail in [52]. In another study by CREATE researchers [76], a standard decision tree methodology is used to perform cost/benefit analyses of installing MANPADS countermeasures on civilian aircraft in the US.
In this dissertation, we illustrate the use of an unconventional decision analysis technique, namely stochastic games, to capture the intentional, antagonistic character inherent in the MANPADS problem, which could not be addressed using conventional methods such as decision trees and influence diagrams. Furthermore, this research addresses the uncertain adversarial behavior of attackers via sensitivity analyses performed on a stochastic game model. A crucial property of stochastic games, as also pointed out in [19], is that since costs and transition probabilities depend on the decisions of both decision makers, as well as on the current state, the fates of the decision makers are coupled in the process. Note that this property is intimately related to the characterization of security investment decisions in the presence of terrorism threats in general, and to the MANPADS case in particular. In summary, this dissertation presents a distribution-free model for n-person finite state/action discounted stochastic games with incomplete information. This distribution-free model allows us to use simple uncertainty sets for the unknown data of the game and to eliminate former approaches’ requirements for defining prior probability distributions over a set of games, which in fact form the foundation of earlier research efforts on incomplete information stochastic games [58]. Furthermore, unlike previous work on incomplete information stochastic games, our approach lends itself to an explicit mathematical programming formulation for equilibrium calculation in a robust stochastic game. This dissertation is organized as follows. Chapter 2 surveys the literature on game-theoretic models used in homeland security applications, as well as on stochastic games. In Chapter 3, we introduce the foundations of robust stochastic games. Specifically, Chapter 3 introduces stochastic game basics and robust optimization.
After formulating our incomplete information stochastic game model, we prove that robust stochastic games have equilibrium points in the sense of Nash’s pioneering work [49]. In other words, we seek to determine whether decisions that prescribe best responses to the players’ best strategies can be found in the presence of ambiguity in the data of the game. Chapter 3 also shows that when the ambiguity in the transition data of a stochastic game comes from a polytope intersected with the probability simplex, the robust equilibrium can be formulated as a feasibility problem whose solution gives an equilibrium point of the discounted robust stochastic game. In Chapter 4, we illustrate the use of discounted robust stochastic games in the context of a homeland security case study, namely the MANPADS problem, followed by sensitivity analyses. Chapter 4 also presents an application of discounted robust stochastic games to the control of a birth-and-death process; in this section, we discuss the value of robust policies to each player. Chapter 5 presents conclusions and possible future research directions.

Chapter 2. Literature Survey

This research is motivated by the need to model and solve homeland-security-related decision problems using methodologies that account for antagonism and uncertainty. Hence, models used in homeland security problems play an important role in shaping the main focus of this dissertation. In this chapter, we review the literature on homeland-security-related adversarial modeling approaches that have shaped the main ideas in this research, and present the related literature on stochastic games and robust optimization in the game theory context.

2.1. Game Theory

Social scientists have written many papers on applications of game theory to terrorism, as explained in [59].
The authors contend that game theory captures the strategic interactions between terrorists and targeted governments, that is, between players whose actions are interdependent and neither of whom can be considered passive. Other reasons include the rationality assumption on the players and the ability of games to represent gains or losses to a player through payoffs. It is important to note that [59] uses a simple game theory model to answer high-level, generic questions. The authors note that the model would benefit from a multi-period analysis of terrorist campaigns, in which terrorist resource allocation is studied over time. Another area of future work could be differential games that examine how terrorist organizations are influenced by successful and failed operations; the dynamics of both players’ strategic choices can be captured with this approach by modeling the change in each player’s resources over time. Finally, the authors note that cooperative game theory has never been applied to the study of terrorism; doing so would enable analysis of how shared intelligence, training facilities, and operatives strengthen terrorists’ abilities. [60] presents models that depict the negotiation process between terrorists and government policymakers for incidents where hostages are seized and demands are issued. [42] presents a game in extensive form where the government first chooses the level of deterrence, which consequently determines the logistical failure or success of terrorists when they engage in a hostage mission. [3] extends Nash’s bargaining game to take time into consideration. [61] presents an application of game theory involving terrorists’ choice of targets in a three-player game with two targeted governments and a common terrorist threat. [43] analyzes a scenario via a two-period game where the government is incompletely informed about the terrorists’ capability.
The extent of terrorist attacks in this scenario can provide information to the government about the type of the terrorist group. [17] makes two contributions to the literature on terrorism: 1) it presents a model that explains the cyclical character of terrorist attacks, and 2) it improves on existing theoretical cyclical models by taking terrorists’ motivations and decision making into account explicitly. A differential game is played between terrorists and the government, in which the terrorists maximize the number of attacks subject to a constraint that combines terrorist resources and government anti-terrorist policies. This is a standard microeconomic model: a representative terrorist group solves a maximization problem based on preferences, actions, incentives, and budget restrictions, while the government’s problem concerns the maximization of national security. The solution of the terrorist problem yields a time path for terrorist activities, which the government takes into account when maximizing national security over time. [15] studies the emergence of the recent form of terrorism using evolutionary game theory. The model in this paper presents terrorism as the result of competition between countries, when the desire to imitate the leading country is frustrated by the impossibility of doing so. The authors define a multi-country setup where interaction takes place in an international trade game, which is a coordination game. [40] considers security as a problem among agents and focuses on situations where the security levels of members of a group are interdependent. The main idea in this paper is that the dependence of one agent’s security on the behavior of others may partially or completely negate the payoffs it receives from its own investment in protective measures; these cross-effects are referred to as contagion. The authors illustrate this argument with an airline that attempts to determine whether to install a baggage-checking system.
In making this decision, the airline needs to balance the cost of installing and operating such a system against the reduction in the risk of an explosion from a piece of luggage, not only from the passengers who check in with the airline directly, but also from the bags of passengers who check in on other airlines and then transfer to it. In this example, the incentive to invest in security decreases if other airlines fail to adopt protective measures. As the authors indicate, this paper examines the case where all agents are identical. [40] also considers situations where the agents have different protection costs and risks, and where the actions creating potential losses are affected by the agents’ protective decisions. Future research directions suggested in the paper include examining how agents behave in multi-period models and determining appropriate behavioral models of choice that could characterize individuals who make imperfectly rational decisions. [47] presents another application of game theory, a simplified model of terrorism risk used to develop a probability distribution of losses. However, this effort captures only the severity component of risk, which is of potential interest to insurance professionals: the losses that could occur with certain probabilities are revealed given that an attack is attempted. Game theory is a suitable way to model adversarial decision-making processes. However, the approach still has limitations and relies on simplifications; game theory applications can be supported with additional modeling methodologies as described below.

2.2. Stochastic Games

As mentioned in the introduction, unlike naturally occurring or accidental events, terrorism is essentially adversarial. Therefore, investments designed to protect against one type of terrorism (e.g., against blasts, biological agents, or radiological devices) have the potential to elevate the risk of other types of terrorism.
An approach that accounts for such shifts in intentions can be built on stochastic games. There has been an extensive amount of research on stochastic games in fields such as economics, mathematics, and operations research since the 1950s. The basic two-person zero-sum (discrete) stochastic game is played as follows. There are states, and strategy sets for each player at each state. The system evolves in stages represented by discrete time points. At each stage, the system is in one of its states, and players 1 and 2 choose their respective actions. There is an immediate payoff as a consequence of the players’ choices. Then the system moves into another state with some probability determined by the previous state and by the choices the players made there. The fundamental question is to find the optimal strategies the players could adopt to optimize their own (noncooperative) objectives. [66] first introduced stochastic games and proved that the value and optimal strategies of the game exist. Since publications on stochastic games are usually in the form of research papers and monographs, [19] devotes a single textbook to the topic. The authors study discrete-time, finite state, finite action stochastic games with complete information from the Markov decision process and mathematical programming points of view, where there is more than one decision maker with conflicting objectives, and use the name Competitive Markov Decision Processes. The authors treat discounted stochastic games, their relation to linear and nonlinear programming formulations, and the existence of stationary strategies and equilibria in depth. An important result is that the class of nonstationary strategies cannot achieve a better equilibrium value than the class of stationary strategies.
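The play described above admits a classical value iteration scheme due to Shapley [66]: iterate on a value vector, where each update solves, at every state, the auxiliary one-shot matrix game whose entries are the immediate payoff plus the discounted expected continuation value. The sketch below is only an illustration of that idea on hypothetical data (2 states, 2x2 action sets, chosen so each auxiliary game has a closed-form value), not data or code from the dissertation:

```python
# Value iteration for a two-person zero-sum discounted stochastic game
# (Shapley-style), on hypothetical data with 2 states and 2 actions each,
# so each auxiliary matrix game is 2x2 and solvable in closed form.

def value_2x2(A):
    """Value (to the row maximizer) of a 2x2 zero-sum matrix game."""
    for i in range(2):                      # pure saddle point, if any
        for j in range(2):
            if A[i][j] == min(A[i]) and A[i][j] == max(A[0][j], A[1][j]):
                return A[i][j]
    (a, b), (c, d) = A                      # otherwise the fully mixed value
    return (a * d - b * c) / (a + d - b - c)

beta = 0.8                                  # discount factor
# r[s][i][j]: payoff to player 1 in state s under actions (i, j)
r = [[[3, -1], [0, 2]],
     [[1, 0], [-2, 4]]]
# P[s][i][j][t]: probability of moving from state s to state t
P = [[[[0.7, 0.3], [0.4, 0.6]], [[0.5, 0.5], [0.2, 0.8]]],
     [[[0.1, 0.9], [0.6, 0.4]], [[0.3, 0.7], [0.9, 0.1]]]]

v = [0.0, 0.0]
for _ in range(500):                        # contraction => convergence
    v = [value_2x2([[r[s][i][j] + beta * sum(P[s][i][j][t] * v[t]
                                             for t in range(2))
                     for j in range(2)] for i in range(2)])
         for s in range(2)]

print("state values:", v)
```

The update is a contraction with modulus beta, so the iteration converges to the unique value vector of the discounted game regardless of the starting point.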
Another observation is that, unlike Markov decision processes, general two-person zero-sum stochastic games cannot be solved by linear programming (LP). However, certain restrictions can be imposed to convert the problem into a suitable LP. Two such restrictions are as follows. First, single-controller discounted games lend themselves to LPs [53]. In this model, the system makes a transition into the next state with some probability according to the previous state and the action taken there by one of the decision makers alone; the action of the other player is irrelevant in determining the next state. Second, separable-reward state-independent-transition discounted stochastic games can be converted to an LP. In this model, the payoff function can be expressed as two components, one depending only on the current state and the other only on the pair of choices made by the decision makers; moreover, the transition to the next state is determined only by the pair of actions taken by the opponents and does not depend on the current state of the system [54]. Advances in stochastic games over the years can be viewed from two coupled perspectives: the game-theoretic perspective and the stochastic processes perspective. One extension of the stochastic game models mentioned above is the one with incomplete information. The incomplete information case within repeated games was first introduced in [4], and several authors have adapted the approach of [4] to stochastic games. In [58], the authors consider stochastic games with incomplete information for one of the players; the restriction in this model is that the transitions to the next state are controlled by a single player. Another extension by the same authors concerns incomplete information on both sides. A two-player zero-sum stochastic game with incomplete information is described by a finite collection of stochastic games.
It is assumed that the games differ only through their payoffs: they all have the same sets of states and actions, and the same transition matrix. The game is played in stages. A stochastic game is to be played out of the finite set of games over which a probability distribution is specified. Player 1 is informed of the specific game to be played, while player 2 is not; all the second player knows is that a game is to be chosen randomly from the finite set of games and played thereafter. At every stage, the two players choose their actions simultaneously and the system moves into the next state. Both players are informed of their actions and the current state of the system. Note that the actual payoff is not told to player 2, but is known by player 1. A stochastic game can be viewed as a Markov decision process (MDP) with two or more competing decision makers who act as adversaries [19]. It can also be viewed as a collection of auxiliary one-shot matrix games [66]. Hence, two lines of related research in the literature are MDPs and one-shot games. Some authors have addressed the issue of ambiguity in the transition probabilities of MDPs. A Bayesian approach is presented in [65], where a prior distribution on the transition matrix must be known. [62], [75], and [23] have modeled MDPs in which the transition matrix lies in a given set, most typically a polytope. [51] considers robust control in MDPs and presents a proof of robust value iteration. [6] considers a similar problem and presents robust value iteration without proof, and [35] provides an independent proof of robust value iteration. It is important to note that these recent efforts all consider robustness in the context of MDPs, where an opponent player is not modeled explicitly. Using a worst-case approach has been prevalent in game theory since the “max-min” formulation of von Neumann and Morgenstern [72].
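To make the robust MDP idea concrete, here is a generic sketch of robust value iteration under simple interval uncertainty sets on each transition row (all numbers hypothetical; this illustrates the common scheme studied in those references rather than reproducing any specific paper's algorithm). The inner worst-case problem over a transition row with interval bounds can be solved greedily by shifting probability mass toward the lowest-valued successor states:

```python
# Robust value iteration for a single-agent discounted MDP whose transition
# rows are only known to lie in interval uncertainty sets (hypothetical data).
# An adversarial "nature" picks the worst feasible transition row each update.

def worst_case_expectation(v, lo, hi):
    """Minimize sum_t p[t]*v[t] over lo[t] <= p[t] <= hi[t], sum(p) = 1.
    Greedy: start every p[t] at its lower bound, then pour the remaining
    probability mass into the lowest-valued successor states first."""
    p = list(lo)
    slack = 1.0 - sum(p)
    for t in sorted(range(len(v)), key=lambda t: v[t]):
        add = min(hi[t] - lo[t], slack)
        p[t] += add
        slack -= add
    return sum(p[t] * v[t] for t in range(len(v)))

beta = 0.9                                   # discount factor
r = [[1.0, 0.0], [0.0, 2.0]]                 # r[s][a]: immediate reward
# Interval bounds on each transition row P(. | s, a), indexed lo[s][a][t]:
lo = [[[0.5, 0.3], [0.2, 0.6]], [[0.1, 0.7], [0.4, 0.4]]]
hi = [[[0.7, 0.5], [0.4, 0.8]], [[0.3, 0.9], [0.6, 0.6]]]

v = [0.0, 0.0]
for _ in range(400):
    v = [max(r[s][a] + beta * worst_case_expectation(v, lo[s][a], hi[s][a])
             for a in range(2)) for s in range(2)]

print("robust state values:", v)
```

The greedy inner step is what makes interval sets attractive in practice: the adversary's problem is a tiny linear program whose optimum is attained by this sorted fill, so no LP solver is needed per update.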
For instance, [22], [46], and [48] have presented max-min approaches to cope with ambiguous uncertainty in normal-form games. Although these authors adopt a worst-case approach, their models are fundamentally probabilistic and are based on prior probability distributions. Furthermore, these authors address complete information games and adopt a worst-case approach with respect to the players’ behavior towards each other, rather than addressing ambiguity in the data of a game. [27] modeled incomplete information games by considering that each player could use a prior probability distribution to obtain a conditional distribution on the data of the game unknown to himself. Unlike these approaches, a robust optimization approach to ambiguous payoff uncertainty in one-shot games is considered in [1], where the authors prove the existence of equilibrium and formulate the robust game for payoffs that belong to a polytope, which opens the way to equilibrium computation. The incomplete information case within repeated games was first introduced by [4]. [67] and [68] consider stochastic games with incomplete information on one side that have a single nonabsorbing state. It is proven that these games have a min-max and a max-min value; however, there is no explicit computational scheme for these values. In a more recent effort, [58] considers two-player zero-sum stochastic games with incomplete information, where the incomplete information is described by a finite collection of stochastic games and a game is to be played out of the finite set of games over which a probability distribution is specified. This paper focuses on stochastic games in which one player controls the transitions; in other words, the evolution of the game is independent of one of the opponents’ actions and depends only on one player’s actions.
We note that the approach adopted in [58] is based on the approach proposed in [27] and requires a probability distribution over a set of games. In the study that brought him the Nobel Prize in 1994, Harsanyi proved that an incomplete information two-person zero-sum normal-form game (I-game) can be converted into an equivalent set of complete information games (C-games). Types of extensions to stochastic games from the stochastic processes perspective include nonhomogeneous games [24], continuous-time games [44], and semi-Markov games [36], among numerous others.

2.3. Game Theory and Robust Optimization

The combination of game theory and robust optimization techniques is a very new research area. [1] considers robustness in one-shot, general-sum, n-person games. In this paper, the authors assume that the payoffs to players, who use robust optimization, belong to bounded uncertainty sets. The authors prove the existence of equilibrium points in robust n-person one-shot games, and present a formulation whose solution gives an equilibrium point. [29] considers a bimatrix game in which the players can neither evaluate their cost functions exactly nor estimate their opponents’ strategies accurately; note that this is the case in many applications in homeland security research. To formulate such a game, the authors introduce the concept of robust Nash equilibrium and prove its existence under some conditions. Moreover, the authors show that a robust Nash equilibrium in the bimatrix game can be characterized as a solution of a second-order cone problem (SOCP). Some numerical results are presented to illustrate the behavior of robust Nash equilibria. Although [29] considers robustness in a bimatrix game, combining robust optimization techniques with game theory remains open to many future research areas. First of all, differential or dynamic games with uncertainty could be worthwhile to study through robust optimization techniques.
Furthermore, as the authors indicate, the concept of robust Nash equilibrium could be extended to general n-person games. For the two-person bimatrix game studied in this paper, it is sufficient to consider uncertainty in the cost matrices and the opponent’s strategy; to treat general n-person games, a more complicated structure must be dealt with. Another issue is to find other sufficient conditions for the existence of robust Nash equilibria. Theoretical study of the relation between Nash equilibrium and robust Nash equilibrium is also worthwhile; for example, it is not known whether the uniqueness of a Nash equilibrium is inherited by the robust Nash equilibrium. In this paper, the authors have formulated several robust Nash equilibrium problems as SOCPs. However, they have only considered the cases where either the cost matrices or the opponent’s strategy is uncertain for each player; according to the authors, it seems interesting to study the case where both are uncertain, or where the uncertainty set is more complicated. In numerical experiments, the authors employed an existing algorithm for solving SOCPs, but there is room for improvement in solution methods; it may be useful to develop a specialized method for solving robust Nash equilibrium problems.

2.4. Game Theory and Influence Diagrams

[55] presents a generic influence diagram model for setting priorities among threats and among countermeasures. The random variables used in the authors’ first model are fairly generic and account for types of terrorist groups, their access to materials, cash, types of weapons, etc. For instance, only one decision variable is used to represent U.S. countermeasures. The authors’ next model elaborates on the previous one by considering two influence diagrams, one for terrorist behavior and the other for the U.S. Results pertaining to the influence diagram for terrorist behavior are then used as inputs to the influence diagram for the U.S.; hence, this model is called two-sided.
The authors then consider using the two-sided diagram in a dynamic fashion via discrete time steps. At each step, each side updates its beliefs, objectives, and decisions based on the previous step. It is also noted that each side is uncertain about the other's actions and state of knowledge. According to the authors, another change that needs to be included in the model is the evolution of the organizations involved, the emergence of new groups, or a new structure of existing groups and networks. Although these ideas are put forward, no implementations or quantitative illustrations exist with regard to the dynamic approach or the evolution of organizations.

According to [39], the traditional representations of games using the extensive form or the normal form obscure much of the structure that is present in real-world games. Hence, the authors propose a new representation language, named multi-agent influence diagrams (MAIDs), for general multi-player games. This approach extends influence diagrams to a context where more than one decision maker is involved, an idea first examined in [63]. MAIDs allow the dependencies among variables to be represented explicitly, whereas both the normal and the extensive form obscure certain important relationships among variables. The MAID representation extends the Bayesian network formalism [56] and influence diagrams [33] to decision problems involving multiple decision makers, with semantics defined as non-cooperative games. Just as Bayesian networks make explicit the dependencies among random variables, MAIDs make explicit the dependencies among decision variables. They are also related to the formalism presented by [41], where a network representation for games is developed. Solution methods for MAIDs exploit the strategic independence structure of the diagram. Extensions to this research could include establishing the relations among competitive Markov decision processes, stochastic games, and MAIDs.
Another extension could be exploring ways to integrate the issue of evolution over time into the MAID framework.

[12] reviews some military applications of gaming and introduces a game component into an influence diagram example. The authors illustrate the use of Bayesian game-theoretic reasoning for operations planning by transforming a decision problem into a Bayesian game.

[70] describes a multistage influence diagram game for modeling the maneuvering decisions of pilots in one-on-one air combat. [71] describes an extension of the influence diagram approach into a dynamic multistage setting without any game aspect. The authors contend that this paper is the first work in which ideas regarding multi-agent, multi-period influence diagrams are combined and implemented. Dynamic programming is considered for the solution of the model in this paper. To cope with the combinatorial explosion, the authors trade the solution of the complete game for computing time and apply a moving horizon control approach, in which the horizon of the original influence diagram is truncated and a dynamic game with a shorter planning horizon is solved at each decision instant. Instead of considering the whole duration of the game, this approach allows the players to update their information about the state of the system at any moment over the limited planning horizon. The solution approach in [71] is inspired by [13], who contend that dynamic game theory is a suitable formulation for problems that involve adversaries interacting with each other over a time period. [13] notes that traditional solutions from dynamic game theory that involve optimizing objective functions over the entire time horizon of the system are extremely difficult, but not impossible, to derive. Hence, the authors discuss a solution approach in which, at each step, the players limit the computation of their actions to a shorter time horizon that may involve only the next few time steps.
This moving horizon Nash equilibrium solution proves to be useful for near-term decisions of the adversaries. An important extension to this research effort could be accounting for the uncertainty in payoffs by combining robust optimization techniques with game theory.

2.5. Game Theory and Reliability

Many of the applications of reliability to security consider threats against critical infrastructure, such as water supply systems [26]. However, many applications do not consider an adaptive adversary. Therefore, incorporating game theory into risk and reliability analysis could be a fruitful approach [9]. [28] attempts to combine probabilistic risk analysis (PRA) and game theory by associating each unit in a reliability system with a player. By doing so, a behavioral dimension is introduced into the PRA framework. The article demonstrates the different conflicts that arise among players in series, parallel, and summation systems over which the players incur costs.

[10] applies game theory and reliability analysis to identify optimal defenses against intentional threats to system reliability. Various scenarios are considered in this paper, such as perfect attacker knowledge of defenses and a single attack with a constrained defender budget, or no attacker knowledge and a single attack with an unconstrained defender budget. The results of this paper emphasize the value of redundancy as a defensive strategy. According to the authors, future research could include extending this work to combinations of parallel and series systems rather than focusing only on pure parallel or series systems. Finding optimal strategies for arbitrary systems is difficult; hence, near-optimal heuristic attack and defense strategies could be developed. Another promising area of future research is to extend the models to include time, rather than the current static or snapshot view of system security. This could allow the modeler to consider imperfect attacker information as well as multiple attacks over time.
Another interesting future research topic could be the relation between stochastic games and reliability analysis.

In a more recent effort, [5] extends results for the defense of simple systems to combined series/parallel systems of more realistic complexity. This effort sometimes yields counterintuitive results, such as the observation that defending the stronger components in a parallel subsystem can actually impose greater burdens on prospective attackers than hardening the weaker components. The authors indicate that the approach is limited to cases where the cost of attacks increases linearly with the defensive investments. However, this may hold only for a limited range of defensive investments. An extension to this paper could be to relax the budget constraint and permit the total investment to be optimized based on the value of the system being protected.

2.6. Summary of Research and Contributions

In this chapter, we have presented game theoretical approaches to homeland security related problems. Game theoretical modeling techniques have been used mainly by political scientists to capture the adversarial character of security problems. These models are simple in nature and are based on the basic rationality assumption of game theory. We have also presented numerous articles that extend the basic game theoretical approaches by combining different modeling techniques.

We notice in this survey that although the use of multi-stage games to model homeland security has been mentioned by various researchers, there is limited work on this topic. Furthermore, and more importantly, there is limited work that takes into account the uncertainty (or data ambiguity) in game theoretical approaches, which forms a crucial aspect of homeland security problems. The data related to homeland security applications are subject to uncertainty at the time the decisions are to be made. Furthermore, it is very difficult to obtain distributions on the uncertain parameters of a homeland security related problem.
Even if we suppose that data could be extracted in some manner from a given source, they cannot be measured, estimated, or computed exactly.

From a methodological point of view, unlike one-shot games, incomplete information in stochastic games seems to be a fairly new research area in operations research. Very little exists in the literature regarding incomplete information stochastic games, and a distribution-free model for such games does not exist in the literature. Single-controller stochastic games with incomplete information are presented in [58], where the authors interpret the incomplete information as partial information on the payoff matrix for one player; the other player knows the exact payoff matrix. In the future directions section of that paper, the authors consider the case where each of the players has partial information on the payoff matrix. The incomplete information scheme in that paper extends the ideas in [4] to stochastic games; hence, there is some probability distribution associated with the unknown payoff matrix for a player. Stochastic games with incomplete information on one side that have a single non-absorbing state have been studied in [67] and [68].

The key contributions of this dissertation, from both an application and a theory perspective, are as follows.

1. We consider n-player, non-zero-sum discounted stochastic games in which none of the players knows the true transition probabilities and/or payoffs of the game and each player adopts a robust optimization approach to ambiguous uncertainty. We offer an alternative equilibrium concept for stochastic games with incomplete information. We propose a distribution-free model that relaxes the former approaches' assumptions as to which player has incomplete information and as to whether the transitions are controlled by a single player. Our approach lends itself to computational results via a feasibility formulation for an equilibrium of a robust stochastic game.
We finally illustrate the use of discounted robust stochastic games in the context of a flow control model.

2. Our approach extends the ideas in certain parts of [16]. Specifically, we extend robust Markov decision processes with uncertain transition probabilities to the competitive case where there is more than one player.

3. In this dissertation, we determine several properties of discounted robust stochastic games. First, it follows from our existence proof that an equilibrium exists even if there exist players who do not adopt a robust optimization approach. This stems from the fact that when there are no uncertainty sets for the data of a stochastic game, the best response functions are already continuous, as shown in [20]. Hence, we can construct a correspondence that satisfies Kakutani's theorem and that includes players who may play non-robustly. Second, an equilibrium in discounted robust stochastic games exists whether or not the players have the same payoff uncertainty set. Third, the zero-sum property is most likely to vanish for stochastic games in which the payoff uncertainty is a common set for all players and in which there exist players who play robustly. Fourth, if there is ambiguity in any data of the game, the players' approaches to this ambiguity differ, resulting in the loss of the completely antagonistic property of a zero-sum game. For instance, if there is an uncertainty set associated with the probability transition data of a stochastic game, the players' perspectives on the worst-case data in this set could be different at equilibrium. Hence, at equilibrium, players may play the game according to different probability values in the uncertainty set. Fifth, if the stochastic game is a two-person zero-sum game but the transition data are ambiguous, then the equilibrium values for the players do not negate each other.
This implies that although such games are zero-sum, formulations for zero-sum stochastic games cannot be used for their analysis, and properties that pertain to zero-sum games cannot be expected to hold in the presence of ambiguity.

4. We present an approach that accounts for antagonism in homeland security related decisions. The use of this modeling technique is not restricted to the MANPADS case study; on the contrary, it could be used as a general modeling technique for decision making in the context of homeland security problems. We investigate the MANPADS decision problem using non-zero-sum stochastic games and perform sensitivity analyses.

In summary, this dissertation offers an alternative equilibrium concept for stochastic games with incomplete information. We propose a distribution-free model that relaxes the former approaches' assumptions as to which player has incomplete information and as to whether the transitions are controlled by a single player. Our approach lends itself to computational results via a feasibility formulation for an equilibrium of a discounted robust stochastic game. We finally illustrate the use of our new methodology in the context of a homeland security case study and in the context of a flow control model.

Chapter 3. Methodology

In this chapter, we introduce a novel approach: discounted robust stochastic games. To this end, we first review the basics of complete information stochastic games with finite state and action sets.

3.1. Stochastic Games

This section reviews the basics of stochastic game theory, as presented in [66] and [20]. In a stochastic game, the play proceeds from one state to another according to transition probabilities controlled jointly by two or more players. The game consists of states and actions associated with each player. Once the game starts in a state, each player chooses his respective action. The play then moves into the next state with some probability and continues from there on.
The probability that the game moves into the next state is determined by the current state and the actions chosen in the current state.

Let the set of states $S = \{1,\dots,M\}$ and the set of players $I = \{1,\dots,N\}$ be finite. If the play is in state $s$, player $i$ can choose an action $a^i_s \in A^i_s$, where $A^i_s$ is the set of actions of player $i$ in state $s$. Suppose that each player makes a choice in state $s$, i.e., we have $a_s = (a^1_s,\dots,a^i_s,\dots,a^N_s)$. Then the game moves into state $k$ with probability $P_{s a_s k} \ge 0$, $\sum_{k=1}^{M} P_{s a_s k} = 1$.

In the most general sense, a stochastic game can be seen as a sequence of one-shot non-zero-sum $n$-person games, with the values of the one-shot games to the players accumulated in the process. The value of a stochastic game for player $i$ starting in state $s$ is defined as the total value accumulated by player $i$ throughout the process if he starts the game in state $s$. In discounted stochastic games, player $i$ discounts the values of the one-shot games to be played in the future by a factor $\beta_i$, $0 \le \beta_i < 1$.

At each stage, players may consider using mixed strategies. Let $x^i_s$ be a probability distribution over the set $A^i_s$, whose cardinality is $m^i_s$. In other words, the probability vector for player $i$ in state $s$ is $x^i_s = (x^i_{s,1},\dots,x^i_{s,m^i_s})$, where $x^i_{s,k} \ge 0$, $\sum_{k=1}^{m^i_s} x^i_{s,k} = 1$. If we denote the set of mixed strategies of player $i$ in state $s$ by $X^i_s$, then $X^i_s$ is the polytope given by
$$X^i_s = \Big\{x^i_s \in \Re^{m^i_s}_{+} \;\Big|\; \sum_{k=1}^{m^i_s} x^i_{s,k} = 1\Big\}.$$

In this proposal, we consider a certain class of strategies introduced by Shapley (1953), namely, stationary strategies. A stationary strategy prescribes a player the same probabilities for his choices each time the player visits a certain state, no matter what route he follows to reach that state. Let us represent the stationary strategies of a player $i$ by $x^i = (x^i_1,\dots,x^i_M)$ and denote the set of mixed strategies of all players in the state space of the game by $x = (x^1,\dots,x^N)$.
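As a concrete illustration (not taken from the dissertation; all numbers are hypothetical), the data of a small two-player, two-state stochastic game can be stored as arrays indexed exactly as above: a cost tensor `C[i][s, a1, a2]` for each player and a transition tensor `P[s, a1, a2, k]` whose last axis sums to one.

```python
import numpy as np

M, N = 2, 2          # states s in {0, 1}, players i in {0, 1}
m = 2                # each player has two actions in every state

rng = np.random.default_rng(0)

# Immediate costs C[i][s, a1, a2]: cost to player i when the joint
# action (a1, a2) is played in state s (hypothetical values).
C = [rng.uniform(0.0, 10.0, size=(M, m, m)) for _ in range(N)]

# Transition probabilities P[s, a1, a2, k], i.e. P_{s a_s k}: probability
# of moving to state k from state s under joint action (a1, a2).
P = rng.uniform(0.1, 1.0, size=(M, m, m, M))
P /= P.sum(axis=-1, keepdims=True)   # normalize each row into a distribution

# Sanity check: P_{s a_s k} >= 0 and sum_k P_{s a_s k} = 1 for every (s, a_s).
assert np.all(P >= 0.0)
assert np.allclose(P.sum(axis=-1), 1.0)

def step(s, a1, a2):
    """Sample the next state k from the current state and joint action."""
    return rng.choice(M, p=P[s, a1, a2])

print(step(0, 1, 0))  # one simulated transition
```

The same layout extends directly to $N$ players and state-dependent action counts by replacing the fixed `m` with per-state, per-player sizes $m^i_s$.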
We denote the mixed strategies of all players for all states, except player $i$, by $x^{-i} = (x^1,\dots,x^{i-1},x^{i+1},\dots,x^N)$. The following notation is used to distinguish a mixed strategy of player $i$ from those of the others:
$$(x^{-i},u^i) = (x^1,\dots,x^{i-1},u^i,x^{i+1},\dots,x^N).$$
Finally, we designate the set of mixed strategies of player $i$ and the set of mixed strategies of all players in state $s$ by
$$X^i = \prod_{s=1}^{M} X^i_s \quad \text{and} \quad X_s = \prod_{m=1}^{N} X^m_s,$$
respectively.

Suppose that the players play with mixed strategies. Then a probability is associated with each realization of $a_s \in A_s$, where $A_s = \prod_{i=1}^{N} A^i_s$. Suppose that the players choose their actions secretly (independently) at a given state. Then the probability associated with $a_s$ is
$$\prod_{\substack{m=1 \\ m \ne i}}^{N} x^m_{s,a^m_s}\, u^i_{s,a^i_s}.$$
The expected cost to player $i$ starting in state $s$ is then given, $\forall s \in S$, $i \in I$, by
$$g^i_s(x^{-i}_s,u^i_s;v^i) = \sum_{a_s \in A_s} \prod_{\substack{m=1 \\ m \ne i}}^{N} x^m_{s,a^m_s}\, u^i_{s,a^i_s} \Big\{C^i_{s a_s} + \beta_i \sum_{k=1}^{M} P_{s a_s k}\, v^i_k\Big\}, \quad (1)$$
where $C^i_{s a_s}$ is the immediate cost to player $i$ induced by $a_s$ in state $s$ and $v^i_k$ is the value to player $i$ if the next state is $k$. We interpret $v^i_k$ as a cost incurred by player $i$ in state $k$. As seen in the above equation, the expected cost to player $i$ is composed of his immediate expected cost in state $s$ and the expected total values of the games to be played in future stages.

In this model, given the strategies of all other players in state $s$, i.e., $x^{-i}_s$, player $i$ wishes to minimize his expected cost in $s$. This minimization, in turn, yields his value of the stochastic game starting in $s$. Hence, we obtain the following well known condition that the value vector for player $i$, i.e., $v^i = (v^i_1,\dots,v^i_M)$, must satisfy, if it exists:
$$v^i_s = \min_{u^i_s \in X^i_s} \sum_{a_s \in A_s} \prod_{\substack{m=1 \\ m \ne i}}^{N} x^m_{s,a^m_s}\, u^i_{s,a^i_s} \Big\{C^i_{s a_s} + \beta_i \sum_{k=1}^{M} P_{s a_s k}\, v^i_k\Big\}, \quad \forall s \in S,\, i \in I.$$
(2)

It is in fact another result that, for any strategy tuple $x = (x^1,\dots,x^N) \in \prod_{i=1}^{N} X^i$, there exists a unique corresponding value $v^i_s$, $\forall i \in I$, $\forall s \in S$. We are now ready for the following definition.

Definition. A tuple of strategies $x = (x^1,\dots,x^N)$ is a Nash equilibrium point in a stochastic game if and only if, $\forall i \in I$ and $\forall s \in S$,
$$v^i_s(x^1,\dots,x^N) \le v^i_s(x^{-i},u^i), \quad \forall u^i \in X^i. \quad (3)$$

When the above conditions hold, the value $v^i_s$ is called the optimal value of the game to player $i$ starting in state $s$, and $x^i$ is called an optimal stationary strategy for $i$. When this holds, we see that player $i$'s strategy $x^i$ is a best answer against all other players' strategies $x^{-i}$, for all $i \in I$. Hence no player has an incentive to deviate from his strategy; in other words, once the equilibrium is reached, no player individually wants to deviate from it.

It is now a very well known result that optimal values in stochastic games exist. This interesting result was first found in [66] for two-person zero-sum stochastic games and was later extended to general-sum $n$-player stochastic games in [20].

Equation (2) is a fundamental condition in stochastic games. It states that if a player knew how to play optimally from the next stage on, then, at the current stage, he would play with strategies that minimize the expected immediate cost at the current stage together with the expected costs possibly incurred in future stages. Hence, player $i$ is concerned not only with the immediate outcome of his actions but also with the future consequences of his strategies in the current stage.

We next state an equivalent equilibrium definition for the purposes of the discussions that follow.

Definition. A point $x \in X$ is a Nash equilibrium in a stochastic game if and only if $\exists\, v = (v^1,\dots,v^N)$ such that, $\forall i \in I$, $\forall s \in S$,
$$v^i_s = \min_{u^i_s \in X^i_s} g^i_s(x^{-i}_s,u^i_s;v^i) \quad \text{and} \quad x^i_s \in \operatorname*{argmin}_{u^i_s \in X^i_s} g^i_s(x^{-i}_s,u^i_s;v^i).$$
(4)

This definition states that $x^i_s$ is an optimal (stationary) strategy for player $i$ in state $s$ if, when Equation (4) is satisfied, the corresponding minimizer of player $i$'s objective function is a strategy that he always wishes to use against all other players' strategies in state $s$. If this statement holds for all players and all states, then no player wishes to deviate from his strategy, resulting in an equilibrium. Player $i$ starting in state $s$ can use an arbitrary strategy and obtain Equation (4) and a corresponding value; but, in return, the other players may change their strategies, which forces player $i$ to establish Equation (4) again. We look for strategies whose use always makes all players reluctant to deviate from them.

3.2. Robust Optimization

This section briefly reviews the basics of robust optimization, as introduced in [7]. Consider the following optimization problem, $P_\gamma$ [7]:
$$P_\gamma: \quad \min_{x \in \Re^n} f(x,\gamma) \quad \text{s.t.} \quad F(x,\gamma) \in K \subset \Re^m,$$
where $\gamma \in \Re^M$ is the data vector, $x \in \Re^n$ is the decision vector, and $K$ is a convex cone. Suppose that

• the data of $P_\gamma$ are uncertain, and all that is known about the data is that they belong to an uncertainty set $U \subset \Re^M$;
• the constraints $F(x,\gamma) \in K$ must be satisfied no matter what the actual realization of $\gamma \in U$ is.

Now, consider the problem $P = \{P_\gamma\}_{\gamma \in U}$. An optimal solution to the uncertain problem $P$ is defined as a solution that gives the best possible guaranteed value under all possible realizations of the constraints. Formally, it should be an optimal solution of the following program:
$$P_R: \quad \min_{x \in \Re^n} \Big\{\sup_{\gamma \in U} f(x,\gamma) \;\; \text{s.t.} \;\; F(x,\gamma) \in K, \;\forall \gamma \in U\Big\}.$$
Problem $P_R$ is called the robust counterpart of $P$, and its feasible and optimal solutions are called robust feasible and robust optimal solutions, respectively [7].

Optimization of a linear program with column-wise uncertainty in the constraint matrix was first studied in [69].
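To make the robust counterpart $P_R$ concrete, the following sketch (not from the dissertation; the data are hypothetical) evaluates it for a small uncertain LP when the uncertainty set $U$ is a finite list of scenarios: a candidate $x$ is robust feasible if it satisfies the constraints under every scenario, and its robust objective is the worst-case (sup) value over the scenarios.

```python
import numpy as np

# Uncertain LP in the form of P_gamma:
#   min f(x, gamma) = c(gamma)' x   s.t.   A(gamma) x >= b
# with a finite uncertainty set U of three hypothetical scenarios (c, A).
U = [
    (np.array([1.0, 2.0]), np.array([[1.0, 1.0], [2.0, 0.5]])),
    (np.array([1.5, 1.8]), np.array([[0.9, 1.2], [2.1, 0.4]])),
    (np.array([1.2, 2.2]), np.array([[1.1, 0.9], [1.8, 0.6]])),
]
b = np.array([1.0, 1.0])

def robust_feasible(x):
    # F(x, gamma) in K must hold for every gamma in U.
    return all(np.all(A @ x >= b - 1e-9) for _, A in U)

def robust_objective(x):
    # sup over gamma in U of f(x, gamma).
    return max(c @ x for c, _ in U)

# Approximate the robust optimal point by scanning a coarse grid of candidates.
grid = [np.array([i, j]) for i in np.linspace(0.0, 2.0, 41)
                         for j in np.linspace(0.0, 2.0, 41)]
feasible = [x for x in grid if robust_feasible(x)]
x_star = min(feasible, key=robust_objective)
print(x_star, robust_objective(x_star))
```

The grid scan is only for illustration; for a polyhedral or ellipsoidal $U$, the robust counterpart is solved exactly as an LP or SOCP, as discussed next.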
Soyster's model is equivalent to an LP in which all uncertain parameters are fixed at their corresponding worst-case values, resulting in an overly conservative approach. [7] examines ellipsoidal uncertainty sets that relax the overly conservative approach of Soyster and shows that the robust counterpart of an LP with an ellipsoidal uncertainty set is a second-order cone program. It is explained in [50] that the robust counterpart of an optimization problem is not restricted to LPs. Furthermore, [21] considers semidefinite programs (SDPs) whose data belong to some uncertainty set.

3.3. Formulation of Discounted Robust Stochastic Games

In this section, we formalize our robust model for incomplete information discounted stochastic games and the robust equilibrium concept by considering that both the payoffs and the transition probabilities of the game belong to respective uncertainty sets. In robust stochastic games, it is assumed that the players commonly know the uncertainty set of payoffs $C_s$ at each state and the set of transition probabilities $P_s$ out of each state. Unlike the approach in [58], the players need not have distributional information on the uncertainty sets with respect to which they adopt a worst-case approach. Now, in light of the results summarized in the previous section, we notice the following: if a player knew how to play the robust stochastic game optimally from the next stage on, then, at the current stage, she would play with strategies that minimize the maximum expected immediate cost at the current stage together with the maximum expected costs possibly incurred in future stages.
Hence, if optimal robust values for player $i$ exist, given $x^{-i}$, they must satisfy
$$\omega^i_s = \min_{u^i_s \in X^i_s} \max_{\substack{\tilde C_s \in C_s \\ \tilde P_s \in P_s}} \sum_{a_s \in A_s} \prod_{\substack{m=1 \\ m \ne i}}^{N} x^m_{s,a^m_s}\, u^i_{s,a^i_s} \Big\{\tilde C_{s a_s} + \beta_i \sum_{k=1}^{M} \tilde P_{s a_s k}\, \omega^i_k\Big\}, \quad (5)$$
where the inner maximization problem is with respect to the uncertain transition probabilities and uncertain immediate costs.

Note that we could have modeled each player as wishing to minimize her expected maximum total cost, rather than her maximum expected total cost. We use the latter model for the following reasons. In the former case, the players would have the advantage of observing each others' randomized actions before adopting their own perspectives on the ambiguous data. In the latter case, the adversaries do not have this advantage, and the worst-case perspective on the ambiguity is taken with respect to the mixed strategies. Note that while the latter results in a pessimistic approach, the former model would be overly pessimistic. Furthermore, unlike the latter, the former approach does not capture the essence of robust optimization, where we typically optimize a worst-case objective.

To ease the notation, let us define
$$\psi^i_s(\tilde C_s,\tilde P_s; x^{-i}_s,u^i_s;\omega^i) = \sum_{a_s \in A_s} \prod_{\substack{m=1 \\ m \ne i}}^{N} x^m_{s,a^m_s}\, u^i_{s,a^i_s} \Big\{\tilde C_{s a_s} + \beta_i \sum_{k=1}^{M} \tilde P_{s a_s k}\, \omega^i_k\Big\}.$$
Equation (5) now reads as follows:
$$\omega^i_s = \min_{u^i_s \in X^i_s} \max_{\substack{\tilde C_s \in C_s \\ \tilde P_s \in P_s}} \psi^i_s(\tilde C_s,\tilde P_s; x^{-i}_s,u^i_s;\omega^i). \quad (6)$$
We will in fact show that such robust values exist. Similar to condition (4), we are now ready to state our definition of equilibrium in robust stochastic games.

Definition. A point $x$ is a robust equilibrium point in a robust stochastic game if and only if $\exists\, \omega = (\omega^1,\dots,\omega^N)$ such that, $\forall i \in I$, $\forall s \in S$,
$$\omega^i_s = \min_{u^i_s \in X^i_s} \max_{\substack{\tilde C_s \in C_s \\ \tilde P_s \in P_s}} \psi^i_s(\tilde C_s,\tilde P_s; x^{-i}_s,u^i_s;\omega^i) \quad (7)$$
$$x^i_s \in \operatorname*{argmin}_{u^i_s \in X^i_s} \max_{\substack{\tilde C_s \in C_s \\ \tilde P_s \in P_s}} \psi^i_s(\tilde C_s,\tilde P_s; x^{-i}_s,u^i_s;\omega^i).$$
(8)

Equivalently, a tuple of strategies $x = (x^1,\dots,x^N)$ is a robust equilibrium point in a robust stochastic game if and only if, $\forall i \in I$ and $\forall s \in S$, $\omega^i_s(x^1,\dots,x^N) \le \omega^i_s(x^{-i},u^i)$, $\forall u^i \in X^i$.

3.4. Existence of Equilibrium Points in Discounted Robust Stochastic Games

Our proof of the existence of equilibrium points in robust stochastic games parallels Fink's (1964). However, a different correspondence is defined that takes the robustness into account. This correspondence uses a maximum expected total cost function with respect to the mixed strategies. We show that the fixed point of this suitably constructed correspondence is an equilibrium point.

Let $W^i \equiv \{\omega^i_s \in \Re\}_{s \in S}$ and $W \equiv \{W^i\}_{i \in I}$. The infinity norm on $W$ is defined as follows: $\|\omega - \theta\|_\infty = \max_{i \in I,\, s \in S} |\omega^i_s - \theta^i_s|$.

Next, a transformation is defined. Given the strategies of all other players and an arbitrary robust value vector for player $i$, this transformation minimizes the maximum expected total cost with respect to the mixed strategies of player $i$. As a result of Theorem 2 below, such a robust value vector exists for any given $x^{-i}$.

Theorem 1 Let $\gamma^i_{s,x^{-i}_s}: W^i \to \Re$ be defined by
$$\gamma^i_{s,x^{-i}_s}(\omega^i) = \min_{u^i_s \in X^i_s} \max_{\substack{\tilde C_s \in C_s \\ \tilde P_s \in P_s}} \psi^i_s(\tilde C_s,\tilde P_s; x^{-i}_s,u^i_s;\omega^i).$$
For $x \in X$, define $\gamma_x(\omega): W \to W$ by $(\gamma_x(\omega))_{is} = \gamma^i_{s,x^{-i}_s}(\omega^i)$. The function $\gamma_x(\omega)$ is a contraction mapping.

Proof. Let $\omega, \theta \in W$. For $x^{-i}_s$ fixed, $\forall i \in I$, $s \in S$,
$$\gamma^i_{s,x^{-i}_s}(\omega^i) = \min_{u^i_s \in X^i_s} \max_{\substack{\tilde C_s \in C_s \\ \tilde P_s \in P_s}} \psi^i_s(\tilde C_s,\tilde P_s; x^{-i}_s,u^i_s;\omega^i) = \psi^i_s\big(C^i_s(x^{-i}_s,u^{*i}_s),\, P^i_s(x^{-i}_s,u^{*i}_s,\omega^i);\, x^{-i}_s,u^{*i}_s;\,\omega^i\big),$$
where $u^{*i}_s$ is the minimizer, and $C^i_s(x^{-i}_s,u^{*i}_s) \in C_s$ and $P^i_s(x^{-i}_s,u^{*i}_s,\omega^i) \in P_s$ are the optimizers, which now depend on $(x^{-i}_s,u^{*i}_s)$.
Similarly, with $z^{*i}_s$ and $C^i_s(x^{-i}_s,z^{*i}_s) \in C_s$, $P^i_s(x^{-i}_s,z^{*i}_s,\theta^i) \in P_s$, we have
$$\gamma^i_{s,x^{-i}_s}(\theta^i) = \min_{z^i_s \in X^i_s} \max_{\substack{\tilde C_s \in C_s \\ \tilde P_s \in P_s}} \psi^i_s(\tilde C_s,\tilde P_s; x^{-i}_s,z^i_s;\theta^i) = \psi^i_s\big(C^i_s(x^{-i}_s,z^{*i}_s),\, P^i_s(x^{-i}_s,z^{*i}_s,\theta^i);\, x^{-i}_s,z^{*i}_s;\,\theta^i\big).$$
Now,
$$\begin{aligned}
\gamma^i_{s,x^{-i}_s}(\omega^i) - \gamma^i_{s,x^{-i}_s}(\theta^i)
&= \psi^i_s\big(C^i_s(x^{-i}_s,u^{*i}_s), P^i_s(x^{-i}_s,u^{*i}_s,\omega^i); x^{-i}_s,u^{*i}_s;\omega^i\big) - \psi^i_s\big(C^i_s(x^{-i}_s,z^{*i}_s), P^i_s(x^{-i}_s,z^{*i}_s,\theta^i); x^{-i}_s,z^{*i}_s;\theta^i\big) \\
&\le \psi^i_s\big(C^i_s(x^{-i}_s,z^{*i}_s), P^i_s(x^{-i}_s,z^{*i}_s,\omega^i); x^{-i}_s,z^{*i}_s;\omega^i\big) - \psi^i_s\big(C^i_s(x^{-i}_s,z^{*i}_s), P^i_s(x^{-i}_s,z^{*i}_s,\theta^i); x^{-i}_s,z^{*i}_s;\theta^i\big) \\
&= \sum_{a_s \in A_s} \prod_{\substack{m=1 \\ m \ne i}}^{N} x^m_{s,a^m_s}\, z^{*i}_{s,a^i_s} \Big\{C^i_{s a_s}(x^{-i}_s,z^{*i}_s) + \beta \sum_{k=1}^{M} P^i_{s a_s k}(x^{-i}_s,z^{*i}_s,\omega^i_k)\,\omega^i_k\Big\} \\
&\quad - \sum_{a_s \in A_s} \prod_{\substack{m=1 \\ m \ne i}}^{N} x^m_{s,a^m_s}\, z^{*i}_{s,a^i_s} \Big\{C^i_{s a_s}(x^{-i}_s,z^{*i}_s) + \beta \sum_{k=1}^{M} P^i_{s a_s k}(x^{-i}_s,z^{*i}_s,\theta^i_k)\,\theta^i_k\Big\} \\
&\le \sum_{a_s \in A_s} \prod_{\substack{m=1 \\ m \ne i}}^{N} x^m_{s,a^m_s}\, z^{*i}_{s,a^i_s}\, \beta \Big\{\sum_{k=1}^{M} P^i_{s a_s k}(x^{-i}_s,z^{*i}_s,\omega^i_k)\,(\omega^i_k - \theta^i_k)\Big\} \\
&\le \sum_{a_s \in A_s} \prod_{\substack{m=1 \\ m \ne i}}^{N} x^m_{s,a^m_s}\, z^{*i}_{s,a^i_s}\, \beta \Big(\sum_{k=1}^{M} P^i_{s a_s k}(x^{-i}_s,z^{*i}_s,\omega^i_k)\Big) \|\omega - \theta\|_\infty = \beta \|\omega - \theta\|_\infty.
\end{aligned}$$
The second-to-last inequality above follows from the fact that
$$\sum_{a_s \in A_s} \prod_{\substack{m=1 \\ m \ne i}}^{N} x^m_{s,a^m_s}\, z^{*i}_{s,a^i_s} \sum_{k=1}^{M} P^i_{s a_s k}(x^{-i}_s,z^{*i}_s,\omega^i_k)\,\theta^i_k \;\le\; \sum_{a_s \in A_s} \prod_{\substack{m=1 \\ m \ne i}}^{N} x^m_{s,a^m_s}\, z^{*i}_{s,a^i_s} \sum_{k=1}^{M} P^i_{s a_s k}(x^{-i}_s,z^{*i}_s,\theta^i_k)\,\theta^i_k,$$
because, for a given $(x^{-i}_s,z^{*i}_s,\theta^i_k)$, $[P^i_{s a_s k}(x^{-i}_s,z^{*i}_s,\theta^i_k)]_{k=1,\dots,M}$ is the maximizer of $\psi^i_s(\tilde C_s,\tilde P_s; x^{-i}_s,z^i_s;\theta^i)$ over $\tilde P_s \in P_s$. By arguments similar to the above, we have for $x^{-i}_s$ fixed that, $\forall i \in I$, $s \in S$,
$$\gamma^i_{s,x^{-i}_s}(\theta^i) - \gamma^i_{s,x^{-i}_s}(\omega^i) \le \beta \|\omega - \theta\|_\infty.$$
Thus, $\|\gamma_x(\omega) - \gamma_x(\theta)\|_\infty \le \beta \|\omega - \theta\|_\infty$. $\Box$

Theorem 2 (Application of Banach's Theorem). For any $x \in X$ and $\forall i \in I$, $s \in S$, there exists a unique $\omega^i_s$ such that
$$\omega^i_s = \min_{u^i_s \in X^i_s} \max_{\substack{\tilde C_s \in C_s \\ \tilde P_s \in P_s}} \psi^i_s(\tilde C_s,\tilde P_s; x^{-i}_s,u^i_s;\omega^i).$$

Proof. Note that $W$ is complete.
Hence, by Banach's Theorem, $\gamma_x(\omega)$ has a unique fixed point $\omega$. That is, $\exists\, \omega$ such that $\gamma_x(\omega) = \omega$, which means
$$\omega^i_s = \min_{u^i_s \in X^i_s} \max_{\substack{\tilde C_s \in C_s \\ \tilde P_s \in P_s}} \psi^i_s(\tilde C_s,\tilde P_s; x^{-i}_s,u^i_s;\omega^i). \quad (9)$$
$\Box$

The above theorem states, for all players, all states, and any given $x^{-i} \in X^{-i}$, that a unique robust value vector $\omega^i$ exists satisfying (9). This also implies that if $\beta < 1$ and we consider any fixed $x^{-i}_s$, then applying the above transformation repeatedly, starting with an arbitrary robust value vector, converges to the unique fixed point of the transformation.

We next state the definition of upper semi-continuity for correspondences and Kakutani's fixed point theorem [37].

Definition. A correspondence $\phi: S \to 2^S$ is upper semi-continuous if $y_n \in \phi(x_n)$, $\lim_{n\to\infty} x_n = x$, $\lim_{n\to\infty} y_n = y$ imply that $y \in \phi(x)$.

Theorem 3 (Kakutani's Fixed Point Theorem). If $S$ is a closed, bounded, and convex set in a Euclidean space, and $\phi$ is an upper semi-continuous correspondence mapping $S$ into the family of closed, convex subsets of $S$, then $\exists\, x \in S$ such that $x \in \phi(x)$.

Let
$$f^i_s(x^{-i}_s,u^i_s;\omega^i) = \max_{\substack{\tilde C_s \in C_s \\ \tilde P_s \in P_s}} \psi^i_s(\tilde C_s,\tilde P_s; x^{-i}_s,u^i_s;\omega^i).$$
Define the metrics $d_{X_s}(x_s,u_s) = \max_{i \in I} \|x^i_s - u^i_s\|_\infty$, $d_{W^i}(\omega^i,\theta^i) = \max_{s \in S} \|\omega^i_s - \theta^i_s\|_\infty$, and $d_1(p,q) = d_{X_s}(x_s,u_s) + d_{W^i}(\omega^i,\theta^i)$. We need the following lemma to show that $f^i_s$ satisfies the properties needed to apply Kakutani's theorem.

Lemma 1 Let $p = (x_s,\omega^i)$, $q = (u_s,\theta^i)$. Given $\epsilon > 0$, $\exists\, \delta(\epsilon) > 0$ such that if, for any $p, q \in X_s \times W^i$, $d_1(p,q) < \delta(\epsilon)$, then, $\forall \tilde C_s \in C_s$, $\forall \tilde P_s \in P_s$,
$$\big|\psi^i_s(\tilde C_s,\tilde P_s; x_s,\omega^i) - \psi^i_s(\tilde C_s,\tilde P_s; u_s,\theta^i)\big| < \epsilon.$$

Proof. Since $\tilde C_s \in C_s$ and $C_s$ is bounded $\forall s \in S$, we have $|\tilde C_{s a_s}| \le K$, where $K < \infty$. It is clear that the robust values are bounded; hence, we have, $\forall i \in I$, $s \in S$, that $|\omega^i_s| \le W$, where $W < \infty$.
Note that
$$\begin{aligned}
\big|\psi^i_s(\tilde C_s,\tilde P_s; x_s,\omega^i) - \psi^i_s(\tilde C_s,\tilde P_s; u_s,\theta^i)\big|
&= \Big| \sum_{a_s \in A_s} \prod_{m=1}^{N} x^m_{s,a^m_s}\, \tilde C_{s a_s} + \beta_i \sum_{a_s \in A_s} \Big(\prod_{m=1}^{N} x^m_{s,a^m_s}\Big)\Big(\sum_{k=1}^{M} \tilde P_{s a_s k}\, \omega^i_k\Big) \\
&\qquad - \sum_{a_s \in A_s} \prod_{m=1}^{N} u^m_{s,a^m_s}\, \tilde C_{s a_s} - \beta_i \sum_{a_s \in A_s} \Big(\prod_{m=1}^{N} u^m_{s,a^m_s}\Big)\Big(\sum_{k=1}^{M} \tilde P_{s a_s k}\, \theta^i_k\Big) \Big| \\
&= \Big| \sum_{a_s \in A_s} \tilde C_{s a_s} \Big(\prod_{m=1}^{N} x^m_{s,a^m_s} - \prod_{m=1}^{N} u^m_{s,a^m_s}\Big) + \beta_i \sum_{a_s \in A_s} \sum_{k=1}^{M} \tilde P_{s a_s k} \Big(\prod_{m=1}^{N} x^m_{s,a^m_s}\, \omega^i_k - \prod_{m=1}^{N} u^m_{s,a^m_s}\, \theta^i_k\Big) \Big| \\
&\le \Big|\sum_{a_s \in A_s} \tilde C_{s a_s} \Big(\prod_{m=1}^{N} x^m_{s,a^m_s} - \prod_{m=1}^{N} u^m_{s,a^m_s}\Big)\Big| + \beta_i \Big|\sum_{a_s \in A_s} \sum_{k=1}^{M} \tilde P_{s a_s k} \Big(\prod_{m=1}^{N} x^m_{s,a^m_s}\, \omega^i_k - \prod_{m=1}^{N} u^m_{s,a^m_s}\, \theta^i_k\Big)\Big| \\
&\le K \sum_{a_s \in A_s} \Big|\prod_{m=1}^{N} x^m_{s,a^m_s} - \prod_{m=1}^{N} u^m_{s,a^m_s}\Big| + \beta_i \sum_{a_s \in A_s} \sum_{k=1}^{M} \Big|\prod_{m=1}^{N} x^m_{s,a^m_s}\, \omega^i_k - \prod_{m=1}^{N} u^m_{s,a^m_s}\, \theta^i_k\Big|.
\end{aligned}$$
Let
$$\delta_1(\epsilon) = \frac{\min\{\epsilon,1\}}{3K(2^N-1)\prod_{i=1}^{N} m^i_s}, \qquad \delta_2(\epsilon) = \frac{\min\{\epsilon,1\}}{3M\beta_i \prod_{i=1}^{N} m^i_s}, \qquad \delta_3(\epsilon) = \frac{\min\{\epsilon,1\}}{3WM\beta_i(2^N-1)\prod_{i=1}^{N} m^i_s},$$
and let $\delta(\epsilon) = \min\{\delta_1(\epsilon),\delta_2(\epsilon),\delta_3(\epsilon)\}$. Now, $d_1(p,q) < \delta(\epsilon)$ implies that, $\forall i \in I$, $s \in S$, and $\forall a^i_s \in A^i_s$, $x^m_{s,a^m_s} = u^m_{s,a^m_s} + \alpha^m_{s,a^m_s}$ and $\omega^i_s = \theta^i_s + \gamma^i_s$, where $|\alpha^m_{s,a^m_s}| < \delta(\epsilon)$ and $|\gamma^i_s| < \delta(\epsilon)$. We will make use of the following algebraic identity:
$$\prod_{m=1}^{N}\big(u^m_{s,a^m_s} + \alpha^m_{s,a^m_s}\big) - \prod_{m=1}^{N} u^m_{s,a^m_s} = \sum_{\substack{I \subseteq \{1,\dots,N\} \\ |I| \ge 1}} \Big(\prod_{m \in I} \alpha^m_{s,a^m_s}\Big)\Big(\prod_{m \in I'} u^m_{s,a^m_s}\Big),$$
where $I' = \{1,\dots,N\} \setminus I$. Note that $\big|\prod_{m \in I} \alpha^m_{s,a^m_s}\big| < (\delta_1(\epsilon))^{|I|} \le \delta_1(\epsilon)$, and that
$$\Big|\prod_{m=1}^{N}\big(u^m_{s,a^m_s} + \alpha^m_{s,a^m_s}\big) - \prod_{m=1}^{N} u^m_{s,a^m_s}\Big| \le \sum_{\substack{I \subseteq \{1,\dots,N\} \\ |I| \ge 1}} \Big|\prod_{m \in I} \alpha^m_{s,a^m_s}\Big| \prod_{m \in I'} u^m_{s,a^m_s}.$$
Hence, we have
$$K \sum_{a_s \in A_s} \Big|\prod_{m=1}^{N}\big(u^m_{s,a^m_s} + \alpha^m_{s,a^m_s}\big) - \prod_{m=1}^{N} u^m_{s,a^m_s}\Big| \le K \sum_{a_s \in A_s} \sum_{\substack{I \subseteq \{1,\dots,N\} \\ |I| \ge 1}} \Big|\prod_{m \in I} \alpha^m_{s,a^m_s}\Big| \prod_{m \in I'} u^m_{s,a^m_s} \le K \sum_{a_s \in A_s} \sum_{\substack{I \subseteq \{1,\dots,N\} \\ |I| \ge 1}} \Big|\prod_{m \in I} \alpha^m_{s,a^m_s}\Big| < K \sum_{a_s \in A_s} \sum_{\substack{I \subseteq \{1,\dots,N\} \\ |I| \ge 1}} \delta_1(\epsilon) = \frac{\min\{\epsilon,1\}}{3} \le \frac{\epsilon}{3}.$$
We also have

$$\beta^i \sum_{a_s \in A_s} \sum_{k=1}^M \Bigl| \prod_{m=1}^N x^m_{s,a^m_s} \, \omega^i_k - \prod_{m=1}^N u^m_{s,a^m_s} \, \theta^i_k \Bigr| = \beta^i \sum_{a_s \in A_s} \sum_{k=1}^M \Bigl| \prod_{m=1}^N \bigl(u^m_{s,a^m_s} + \alpha^m_{s,a^m_s}\bigr) \, \omega^i_k - \prod_{m=1}^N u^m_{s,a^m_s} \, \theta^i_k \Bigr|$$
$$= \beta^i \sum_{a_s \in A_s} \sum_{k=1}^M \Bigl| \prod_{m=1}^N u^m_{s,a^m_s} \bigl(\omega^i_k - \theta^i_k\bigr) + \omega^i_k \sum_{\substack{I \subseteq \{1,\dots,N\} \\ |I| \ge 1}} \Bigl( \prod_{m \in I} \alpha^m_{s,a^m_s} \Bigr) \Bigl( \prod_{m \in I'} u^m_{s,a^m_s} \Bigr) \Bigr|$$
$$\le \beta^i \sum_{a_s \in A_s} \sum_{k=1}^M \prod_{m=1}^N u^m_{s,a^m_s} \bigl|\omega^i_k - \theta^i_k\bigr| + \beta^i W \sum_{a_s \in A_s} \sum_{k=1}^M \sum_{\substack{I \subseteq \{1,\dots,N\} \\ |I| \ge 1}} \Bigl| \prod_{m \in I} \alpha^m_{s,a^m_s} \Bigr| \prod_{m \in I'} u^m_{s,a^m_s}$$
$$\le \beta^i \sum_{a_s \in A_s} \sum_{k=1}^M \bigl|\gamma^i_k\bigr| + \beta^i W \sum_{a_s \in A_s} \sum_{k=1}^M \sum_{\substack{I \subseteq \{1,\dots,N\} \\ |I| \ge 1}} \Bigl| \prod_{m \in I} \alpha^m_{s,a^m_s} \Bigr| < \beta^i \sum_{a_s \in A_s} \sum_{k=1}^M \delta_2(\epsilon) + \beta^i W \sum_{a_s \in A_s} \sum_{k=1}^M \sum_{\substack{I \subseteq \{1,\dots,N\} \\ |I| \ge 1}} \delta_3(\epsilon) = \frac{\epsilon}{3} + \frac{\epsilon}{3} = \frac{2\epsilon}{3}.$$

Thus,

$$\bigl| \psi^i_s(\tilde{C}_s, \tilde{P}_s; x_s, \omega^i) - \psi^i_s(\tilde{C}_s, \tilde{P}_s; u_s, \theta^i) \bigr| \le K \sum_{a_s \in A_s} \Bigl| \prod_{m=1}^N x^m_{s,a^m_s} - \prod_{m=1}^N u^m_{s,a^m_s} \Bigr| + \beta^i \sum_{a_s \in A_s} \sum_{k=1}^M \Bigl| \prod_{m=1}^N x^m_{s,a^m_s} \, \omega^i_k - \prod_{m=1}^N u^m_{s,a^m_s} \, \theta^i_k \Bigr| < \frac{\epsilon}{3} + \frac{2\epsilon}{3} = \epsilon. \quad \Box$$

The following two lemmas are direct consequences of Lemma 1 and the definition of $f^i_s(x^{-i}_s, u^i_s; \omega^i)$.

Lemma 2. The function $f^i_s(x^{-i}_s, u^i_s; \omega^i)$ is continuous for all $i \in I$ and $s \in S$.

Lemma 3. $f^i_s(x^{-i}_s, u^i_s; \omega^i)$ is convex in $u^i_s$ for fixed $x^{-i}_s$ and $\omega^i$.

We need the following transformation and definition in order to prove Lemma 5, which is required to show the upper semi-continuity result in the main existence theorem below (Theorem 4). Let $\gamma^i_{s,x^{-i}_s}(\omega^i) = \alpha^i_{s,\omega^i}(x^{-i}_s)$. Define

$$\tau^i(x^{-i}) = \Bigl\{ \omega^i = (\omega^i_1, \dots, \omega^i_M) : \omega^i_s = \min_{u^i_s \in S^i_s} \; \max_{\tilde{C}_s \in \mathcal{C}_s, \, \tilde{P}_s \in \mathcal{P}_s} \psi^i_s(\tilde{C}_s, \tilde{P}_s; x^{-i}_s, u^i_s; \omega^i), \; s \in S \Bigr\},$$

and denote the $s$th element of $\tau^i(x^{-i})$ by $\tau^i_s(x^{-i})$.

The proof of Lemma 4 follows directly from [20] and Lemma 1 above, and the proof of Lemma 5 follows directly from Lemma 4 as shown in [20]. These proofs are presented in our notation in the Appendix. Lemma 4 is used to prove Lemma 5, and Lemma 5 is used to show the upper semi-continuity result required by Kakutani's fixed point theorem.

Lemma 4. $\alpha^i_{s,\omega^i}(x^{-i}_s)$ is continuous on $X^{-i}_s$. Furthermore, the set $\{\alpha^i_{s,\omega^i} : \omega^i \text{ is bounded}\}$ is equicontinuous.
Lemma 5. If $x^{-i,n} \to x^{-i}$ and $\tau^i_s(x^{-i,n}) \to \omega^i_s$ as $n \to \infty$, then $\tau^i_s(x^{-i}) = \omega^i_s$.

Theorem 4 (Existence of Equilibrium in Discounted Robust Stochastic Games). Suppose that the uncertain transition probabilities and payoffs in a discounted robust stochastic game belong to compact sets, and that the sets of actions and of players, who use stationary strategies, are finite. Then an equilibrium point of this game exists.

Proof. We are now ready to apply Kakutani's fixed point theorem. We will show that the fixed point of a suitably constructed correspondence is an equilibrium point. To this end, let

$$\phi(x^1,\dots,x^N) = \Bigl\{ (y^1,\dots,y^N) \in X : y^i_s \in \operatorname*{argmin}_{u^i_s \in X^i_s} \; \max_{\tilde{C}_s \in \mathcal{C}_s, \, \tilde{P}_s \in \mathcal{P}_s} \psi^i_s(\tilde{C}_s, \tilde{P}_s; x^{-i}_s, u^i_s; \omega^i), \text{ where } \omega^i_s = \min_{u^i_s \in X^i_s} \; \max_{\tilde{C}_s \in \mathcal{C}_s, \, \tilde{P}_s \in \mathcal{P}_s} \psi^i_s(\tilde{C}_s, \tilde{P}_s; x^{-i}_s, u^i_s; \omega^i), \; \forall s \in S, \, i \in I \Bigr\}.$$

Note that by definition, $\phi(x) \subseteq X$ for all $x \in X$. Next, we show that $\phi(x^1,\dots,x^N)$ is a convex set. Suppose that $(z^1,\dots,z^N), (v^1,\dots,v^N) \in \phi(x^1,\dots,x^N)$. Then, for all $u^i_s$, $s \in S$, and $i \in I$,

$$\omega^i_s = f^i_s(x^{-i}_s, z^i_s; \omega^i) = f^i_s(x^{-i}_s, v^i_s; \omega^i) \le f^i_s(x^{-i}_s, u^i_s; \omega^i).$$

Hence, for any $\lambda \in [0,1]$ and all $i \in I$, $s \in S$,

$$\omega^i_s = \lambda f^i_s(x^{-i}_s, z^i_s; \omega^i) + (1-\lambda) f^i_s(x^{-i}_s, v^i_s; \omega^i) \le f^i_s(x^{-i}_s, u^i_s; \omega^i).$$

By the convexity of $f^i_s(x^{-i}_s, u^i_s; \omega^i)$ in $u^i_s$, we obtain

$$f^i_s\bigl(x^{-i}_s, \lambda z^i_s + (1-\lambda) v^i_s; \omega^i\bigr) \le \lambda f^i_s(x^{-i}_s, z^i_s; \omega^i) + (1-\lambda) f^i_s(x^{-i}_s, v^i_s; \omega^i) = \omega^i_s \le f^i_s(x^{-i}_s, u^i_s; \omega^i),$$

and since $\omega^i_s$ is the minimum value, $\lambda z^i_s + (1-\lambda) v^i_s$ is also a minimizer. Hence, $\lambda(z^1,\dots,z^N) + (1-\lambda)(v^1,\dots,v^N) \in \phi(x^1,\dots,x^N)$.

Finally, we must show that $\phi(x^1,\dots,x^N)$ is an upper semi-continuous correspondence. Suppose $x^n \to x$, $y^n \to y$, and $y^n \in \phi(x^n)$. Taking a subsequence, we have $\tau^i_s(x^{-i,n}) \to \omega^i_s$, and by Lemma 5, $\tau^i_s(x^{-i}) = \omega^i_s$.
Using the triangle inequality, we have for all $i \in I$ and $s \in S$ that

$$\bigl| f^i_s(x^{-i}_s, y^i_s; \omega^i) - \omega^i_s \bigr| \le \bigl| f^i_s(x^{-i}_s, y^i_s; \omega^i) - f^i_s\bigl(x^{-i,n}_s, y^{i,n}_s; \tau^i(x^{-i,n})\bigr) \bigr| + \bigl| f^i_s\bigl(x^{-i,n}_s, y^{i,n}_s; \tau^i(x^{-i,n})\bigr) - \omega^i_s \bigr|$$
$$= \bigl| f^i_s(x^{-i}_s, y^i_s; \omega^i) - f^i_s\bigl(x^{-i,n}_s, y^{i,n}_s; \tau^i(x^{-i,n})\bigr) \bigr| + \bigl| \tau^i_s(x^{-i,n}) - \omega^i_s \bigr| \to 0 \quad \text{as } n \to \infty.$$

Therefore, $\omega^i_s = f^i_s(x^{-i}_s, y^i_s; \omega^i)$, and since $\tau^i_s(x^{-i}) = \omega^i_s$, we obtain

$$\omega^i_s = \min_{u^i_s \in X^i_s} \; \max_{\tilde{C}_s \in \mathcal{C}_s, \, \tilde{P}_s \in \mathcal{P}_s} \psi^i_s(\tilde{C}_s, \tilde{P}_s; x^{-i}_s, u^i_s; \omega^i).$$

Therefore, $y \in \phi(x)$, completing the proof that $\phi$ is an upper semi-continuous correspondence. The fact that $\phi(x)$ is closed follows from the fact that it is an upper semi-continuous correspondence. Therefore, $\phi$ satisfies the assumptions of Kakutani's fixed point theorem. $\Box$

3.5. Calculation of an Equilibrium Point

Now that we have proved the existence of an equilibrium point in a discounted robust stochastic game, our next step is to calculate such a point. We will show that when the ambiguity in the probability transition data of the game belongs to a polytope intersected with the probability simplex, the problem of finding an equilibrium point can be cast as a feasibility problem with multilinear constraints. Although we could have considered ambiguity in both the payoffs and the transition data of a game and obtained a feasibility problem that characterizes equilibria, for simplicity we consider ambiguity in the transition data only.

Recall the definition of a robust equilibrium given in Condition (7). These conditions are equivalent to the requirement that for all $i \in I$ and $s \in S$ there exists $q^i_s \in \Re$ such that $(x^i_s, q^i_s)$ is an optimizer of the following robust mathematical program $P_R$, with the objective value at optimality equal to $w^i_s$:

$$P_R := \Bigl\{ w^i_s = \min_{u^i_s, q^i_s} q^i_s \; : \; q^i_s \ge \max_{\tilde{P}_s \in \mathcal{P}_s} \psi^i_s(C_s, \tilde{P}_s; x^{-i}_s, u^i_s; w^i), \; \mathbf{1}^T u^i_s = 1, \; u^i_s \ge 0 \Bigr\}.$$

Here, $(x^{-i}, w^i)$ is treated as data.
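As the following paragraphs show, once $(x^{-i}_s, u^i_s, w^i)$ is fixed, the inner maximization over $\tilde{P}_s$ in $P_R$ is a linear program over the uncertainty polytope. The minimal numerical sketch below illustrates this worst-case evaluation with an off-the-shelf LP solver; the two successor states, the interval-style polytope, and all numbers are hypothetical illustration data, not data from the model.

```python
from scipy.optimize import linprog

# Worst-case (cost-maximizing) choice of transition probabilities over a
# polytope, for a FIXED strategy pair and value vector: the inner LP in P_R.
# All numbers below are hypothetical illustration data.
beta = 0.9
w = [10.0, 2.0]          # continuation values of the two successor states
# Interval-style polytope written as A p >= b:  p1 >= 0.2 and p1 <= 0.6
A = [[1.0, 0.0], [-1.0, 0.0]]
b = [0.2, -0.6]
# linprog minimizes, so negate the objective beta * (p . w)
res = linprog(c=[-beta * wk for wk in w],
              A_ub=[[-a for a in row] for row in A],  # A p >= b  ->  -A p <= -b
              b_ub=[-bk for bk in b],
              A_eq=[[1.0, 1.0]], b_eq=[1.0],          # probabilities sum to 1
              bounds=[(0.0, None)] * 2)
worst_case_value = -res.fun  # = beta * max_p (p . w)
```

Here the adversary pushes as much probability mass as the polytope allows onto the successor state with the larger continuation value, which is exactly the behavior that the dual of this LP prices out.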
Define the uncertain probability transition matrix induced by a strategy $(x^{-i}, u^i)$:

$$\tilde{P}(x^{-i}, u^i) = \Bigl[ \sum_{a_s \in A_s} \prod_{\substack{m=1 \\ m \ne i}}^N x^m_{s,a^m_s} \, u^i_{s,a^i_s} \, \tilde{P}_{s a_s k} \Bigr]_{s=1,\dots,M; \; k=1,\dots,M}.$$

Denote the $s$th row of $\tilde{P}(x^{-i}, u^i)$ by $\tilde{p}_s(x^{-i}, u^i)$. Let $\tilde{p}_s$ denote the uncertain transition probability vector associated with the starting state $s$, that is, $\tilde{p}_s = [\tilde{P}_{s a_s k}]_{a_s \in A_s; \, k \in S}$. Let $\mathbf{1}$ be a vector of ones of appropriate dimension. Let $E^i_s(x^{-i}_s, C^i) \in \Re^{(\prod_{n \ne i} m^n_s) \times m^i_s}$ denote the matrix a row of which is given by the vector

$$\Bigl[ \prod_{\substack{m=1 \\ m \ne i}}^N x^m_{s,a^m_s} \, C^i_{s,a^{-i}_s,a^i_s} \Bigr]_{a^i_s \in \{1,\dots,m^i_s\}}.$$

Note that we have the following requirement in $P_R$:

$$q^i_s \ge \max_{\tilde{P}_s \in \mathcal{P}_s} \psi^i_s(C_s, \tilde{P}_s; x^{-i}_s, u^i_s; w^i) = \max_{\tilde{p}_s} \bigl\{ \beta \, \tilde{p}_s(x^{-i}, u^i) \, w^i \bigr\} + \mathbf{1}^T E^i_s(x^{-i}_s, C^i) \, u^i_s. \quad (10)$$

We assume that for any action combination of the players, the uncertain transition probabilities belong to a polytope intersected with the probability simplex. Let $Q_s \in \Re^{(\prod_{i=1}^N m^i_s) \times (M \cdot \prod_{i=1}^N m^i_s)}$ be a matrix of 0s and 1s such that each of its rows that corresponds to a pure strategy combination $a_s \in A_s$ enforces $\sum_{k \in S} \tilde{P}_{s,a_s,k} = 1$. In other words, we assume that the transition probabilities belong to the following uncertainty set:

$$\mathcal{P} = \bigl\{ \tilde{p}_s, \, s \in S : A_s \tilde{p}_s \ge b_s, \; Q_s \tilde{p}_s = \mathbf{1}, \; \tilde{p}_s \ge 0 \bigr\}, \quad \text{where } A_s \in \Re^{l_s \times (M \cdot \prod_{i=1}^N m^i_s)}.$$

Consider the maximization problem in $P_R$, where $(x^{-i}_s, u^i_s, w^i)$ is regarded as data. Given that the uncertainty set is as stated, for fixed $(x^{-i}_s, u^i_s, w^i)$, this maximization problem is equivalent to the following LP:

$$\max_{\tilde{p}_s} \bigl\{ \beta \, \tilde{p}_s(x^{-i}, u^i) \, w^i : A_s \tilde{p}_s \ge b_s, \; Q_s \tilde{p}_s = \mathbf{1}, \; \tilde{p}_s \ge 0 \bigr\}. \quad (11)$$

Define the column vector $z^i_s = [\prod_{m \ne i} x^m_{s,a^m_s} \, u^i_{s,a^i_s} \, w^i_k]_{a^m_s \in A^m_s; \, a^i_s \in A^i_s; \, k \in S}$ such that the indices of $z^i_s$ match those of $\tilde{p}_s$. Let $Y^i_s(x^{-i}_s, w^i) \in \Re^{(M \cdot \prod_{n=1}^N m^n_s) \times m^i_s}$ be the matrix such that $Y^i_s(x^{-i}_s, w^i) \, u^i_s = z^i_s$. Let $m^i_s$ and $n^i_s$ be the dual variable vectors.
The dual of problem (11) is

$$\min_{m^i_s, n^i_s} \bigl\{ b_s^T m^i_s + \mathbf{1}^T n^i_s \; : \; A_s^T m^i_s + Q_s^T n^i_s \ge \beta \, Y^i_s(x^{-i}_s, w^i) \, u^i_s, \; m^i_s \le 0 \bigr\}. \quad (12)$$

By the definition of our uncertainty set, problem (11) is feasible, and it is clearly bounded. By strong duality, problem (12) is bounded and feasible, and its optimal objective value is equal to that of problem (11). Therefore, if $(x^{-i}, u^i, w^i)$ satisfies condition (10), then (10) is equivalent to the condition that there exist $m^i_s \in \Re^{l_s}$ and $n^i_s \in \Re^{\prod_{i=1}^N m^i_s}$ such that

$$q^i_s - \mathbf{1}^T E^i_s(x^{-i}_s, C^i) \, u^i_s \ge b_s^T m^i_s + \mathbf{1}^T n^i_s, \qquad A_s^T m^i_s + Q_s^T n^i_s \ge \beta \, Y^i_s(x^{-i}_s, w^i) \, u^i_s, \qquad m^i_s \le 0. \quad (13)$$

Conversely, if condition (13) is satisfied, then problem (12) is feasible. Then by weak duality, the objective value $b_s^T m^i_s + \mathbf{1}^T n^i_s$ of any feasible solution of problem (12) is greater than or equal to the objective value $\beta \, \tilde{p}_s(x^{-i}, u^i) \, w^i$ of any feasible solution of problem (11), so

$$q^i_s - \mathbf{1}^T E^i_s(x^{-i}_s, C^i) \, u^i_s \ge b_s^T m^i_s + \mathbf{1}^T n^i_s \ge \max_{\tilde{p}_s} \; \beta \, \tilde{p}_s(x^{-i}, u^i) \, w^i.$$

Therefore, conditions (10) and (13) are equivalent. This proves:

Lemma 6. Condition (10) is equivalent to condition (13).

Let

$$T^i(x) = \Bigl[ \sum_{a_s \in A_s} \prod_{m=1}^N x^m_{s,a^m_s} \, t^i_{s a_s k} \Bigr]_{s=1,\dots,M; \; k=1,\dots,M},$$

and denote the $s$th row of $T^i(x)$ by $t^i_s(x)$. Let $t^i_s$ denote the variables representing the transition probabilities adopted by player $i$ according to $i$'s worst-case perspective, associated with the starting state $s$; that is, $t^i_s = [t^i_{s a_s k}]_{a_s \in A_s; \, k \in S}$.

Theorem 5. A stationary strategy $x$ is a robust equilibrium point if and only if, for all $i \in I$ and $s \in S$, there exist $m^i_s \in \Re^{l_s}$, $n^i_s \in \Re^{\prod_{i=1}^N m^i_s}$, and $t^i_s \in \Re^{M \cdot \prod_{i=1}^N m^i_s}$ such that, for $j = 1,\dots,m^i_s$, $(w^i_s, x^i_s, m^i_s, n^i_s, t^i_s)$ satisfies

$$w^i_s = \mathbf{1}^T E^i_s(x^{-i}_s, C^i) \, x^i_s + \beta \, t^i_s(x) \, w^i$$
$$[e^i_{s,j}]^T E^i_s(x^{-i}_s, C^i)^T \mathbf{1} + \beta \, [e^i_{s,j}]^T Y^i_s(x^{-i}_s, w^i)^T t^i_s \ge w^i_s$$
$$w^i_s - \mathbf{1}^T E^i_s(x^{-i}_s, C^i) \, x^i_s \ge b_s^T m^i_s + \mathbf{1}^T n^i_s$$
$$A_s^T m^i_s + Q_s^T n^i_s - \beta \, Y^i_s(x^{-i}_s, w^i) \, x^i_s \ge 0$$
$$\mathbf{1}^T x^i_s = 1, \quad m^i_s \le 0, \quad x^i_s \ge 0, \quad A_s t^i_s \ge b_s, \quad Q_s t^i_s = \mathbf{1}.$$

Proof. Recall problem $P_R$. By Lemma 6, if $x$ is a robust equilibrium point, then for all $i \in I$ and $s \in S$ there exist $q^i_s \in \Re$, $m^i_s \in \Re^{l_s}$, and $n^i_s \in \Re^{\prod_{i=1}^N m^i_s}$ such that $(x^i_s, q^i_s, m^i_s, n^i_s)$ is an optimizer of

$$w^i_s = \min_{u^i_s, q^i_s, m^i_s, n^i_s} q^i_s \quad (14)$$
$$\text{s.t.} \quad q^i_s - \mathbf{1}^T E^i_s(x^{-i}_s, C^i) \, u^i_s \ge b_s^T m^i_s + \mathbf{1}^T n^i_s,$$
$$A_s^T m^i_s + Q_s^T n^i_s \ge \beta \, Y^i_s(x^{-i}_s, w^i) \, u^i_s, \quad m^i_s \le 0, \quad \mathbf{1}^T u^i_s = 1, \quad u^i_s \ge 0.$$

Let $e^i_{s,j}$ be the $j$th unit vector. The dual of the above is

$$\max_{\nu^i_s, t^i_s} \; \nu^i_s \quad \text{s.t.} \quad A_s t^i_s \ge b_s, \quad Q_s t^i_s = \mathbf{1}, \quad \nu^i_s \le [e^i_{s,j}]^T E^i_s(x^{-i}_s, C^i)^T \mathbf{1} + \beta \, [e^i_{s,j}]^T Y^i_s(x^{-i}_s, w^i)^T t^i_s, \; j = 1,\dots,m^i_s.$$

The statement in the theorem follows from strong duality and Theorem 2. For the other direction, suppose that for all $i \in I$ and $s \in S$ there exist $m^i_s \in \Re^{l_s}$, $n^i_s \in \Re^{\prod_{i=1}^N m^i_s}$, and $t^i_s \in \Re^{M \cdot \prod_{i=1}^N m^i_s}$ such that, for $j = 1,\dots,m^i_s$, $(x^i_s, m^i_s, n^i_s, t^i_s)$ satisfies the above system. Let

$$\nu^i_s = \min_{j \in \{1,\dots,m^i_s\}} \Bigl\{ [e^i_{s,j}]^T E^i_s(x^{-i}_s, C^i)^T \mathbf{1} + \beta \, [e^i_{s,j}]^T Y^i_s(x^{-i}_s, w^i)^T t^i_s \Bigr\}, \qquad q^i_s = b_s^T m^i_s + \mathbf{1}^T n^i_s + \mathbf{1}^T E^i_s(x^{-i}_s, C^i) \, x^i_s.$$

Then, for $(x^{-i}_s, w^i)$, $(x^i_s, q^i_s, m^i_s, n^i_s)$ is feasible for problem (14), and $(\nu^i_s, t^i_s)$ is feasible for its dual with $\nu^i_s \ge q^i_s$. By weak duality, $\nu^i_s \le q^i_s$, so $\nu^i_s = q^i_s$. Hence, $(x^i_s, q^i_s, m^i_s, n^i_s)$ is optimal for problem (14), and therefore $(x^i_s, q^i_s)$ is optimal in $P_R$. Therefore, $x$ is an equilibrium point of the robust stochastic game. $\Box$

Now that we have proved that equilibrium points in discounted robust stochastic games exist and that an equilibrium point can be found using a mathematical programming formulation, we next present applications of discounted robust stochastic games.

Chapter 4. Applications

In this chapter, we present three applications of discounted robust stochastic games.
First, we illustrate the use of this method on a homeland security problem, namely the MANPADS problem. Then, we present the results of a generic stochastic game model with time-variant payoffs, in which a certain target can be attacked over time, in different time periods. Finally, we present another use of our methodology in a queuing control context.

4.1. MANPADS Model

In this section, we illustrate the use of our new methodology on the MANPADS case study. We first present a stochastic game model in which the data are known exactly. Uncertainty in the data is then introduced, and the consequences of the players adopting a robust approach are explained subsequently.

MANPADS are man-portable surface-to-air missiles. Recently, there have been publicized MANPADS attacks on large civilian aircraft in Kenya and Iraq, which increased the fear of such attacks on US soil or outside the US [76]. A countermeasure that can be installed on aircraft to deflect such incoming missiles is the directed infrared countermeasure (DIRCM). DIRCMs jam the heat-seeking device of a MANPADS missile and deflect its course away from the airplane. Heat-seeking MANPADS are called infrared (IR) missiles. In 2004, the US Department of Homeland Security (DHS) initiated a $100 million program to develop DIRCM countermeasures [76]. Currently, there is a pending decision by Congress on whether to install these countermeasures on some or all US commercial aircraft. To aid DHS in this decision process, the Center for Risk and Economic Analysis of Terrorism Events (CREATE) has been conducting decision analysis for the MANPADS threat. Within CREATE, the MANPADS case is studied in detail in [52]. In another study by CREATE researchers [76], a standard decision tree methodology is used to perform cost/benefit analyses of installing MANPADS countermeasures on civilian aircraft in the US.
Our model has the following features. First, the attacker's alternatives in the first state are to attack or not to attack using MANPADS. Second, the data used in the model remain in the ranges suggested in [76]. Third, if the attacker chooses not to attack and the defender chooses not to install countermeasures, then we consider the possibility that the MANPADS threat could still exist with some probability. Finally, this model is a non-zero-sum stochastic game in which the defender's loss is not necessarily the attacker's gain. Therefore, we allow the possibility of different payoffs for different players.

Although our main model is fundamentally different from the decision tree model used in [76], it parallels it. Like the main model, the decision tree model in [76] considers two options for the decision-maker: installing countermeasures or not. It captures the probability of an attack using a chance node that follows the decision of whether to install countermeasures. Similarly, the main model captures the probability of an attack by declaring it as an option for the attacker. Considering the baseline values and ranges, the data of the decision tree model are mapped into the data of the stochastic game model. Like the main model, the tree model captures the possibility of safe landing and fatal crash given that there is a hit, and the possibility of a miss given that there is an attack.

The fundamental difference between the two approaches lies in the fact that the main model is a stochastic game model, not a decision tree model. Hence, in the first state, the players choose their alternatives secretly, without any ordering of their possible actions, whereas in the tree model the attempt chance node follows the decision of whether to install countermeasures.
Unlike the decision tree, where the roll-back procedure is used to obtain the solution, the stochastic game model is mapped into an optimization problem, the solution of which yields an equilibrium point, at which none of the players would be willing to deviate from their chosen alternatives. Another main difference between the two models is that the game model allows the possibility of using mixed strategies, which leads to a measure in which some, but not all, of the commercial airliners are equipped with countermeasures. The decision tree counterpart of the game model considers only two extreme options: installing countermeasures or not. Yet another difference between the models is that if the attacker chooses not to attack and the defender chooses not to install countermeasures, then the game model takes into account the possibility that the MANPADS threat could still exist. Finally, the possibility of the players receiving different payoffs at a given state is what distinguishes our approach from the decision tree method, where a single payoff is associated with a consequence.

Figure 1 depicts our main model with original baseline data values. The first state in the model includes two alternative options for each player: the alternatives "Attack" and "No Attack" for the attacker, and the alternatives "CM" and "No CM," which stand for the decisions to install DIRCM countermeasures on civilian airliners or not. The pairs of numbers in the cells of this 2x2 matrix indicate the baseline payoffs associated with each alternative combination (in billions of dollars). For instance, if the defender chooses to install countermeasures and the attacker chooses to attack, the attacker incurs the cost of attacking, which is $1 billion, whereas the defender incurs the installation and maintenance cost of countermeasures, $10 billion over a horizon of ten years. The payoff and probability figures in the model are precisely the baseline values suggested in [76].
Furthermore, every payoff and probability value in this model is parametrized and can be changed for sensitivity analysis purposes. The attacking cost for the attacker is taken to be 1% of the fatal crash cost to the defender.

Figure 1: The main MANPADS model

Based on the alternatives chosen in the first state, the process then moves into subsequent states where additional payoffs are incurred by the two players. The payoffs associated with safe landing and fatal crash are calculated in the same manner as in the previous example, with baseline values being precisely the values used in [76]. Except for the first state, the cost of the defender is a gain for the attacker. Hence, this model is non-zero-sum and captures the cost of attacking from the attacker's perspective, a point which cannot be taken into account in conventional decision tree analysis. If the players choose not to attack and not to install countermeasures, there is a 50% chance that the same game will be played in the next ten years. This implies that there is a 50% chance that the MANPADS threat will exist in the next ten years if nothing happens. This value is parametrized as well, and we investigate its effect in the sensitivity analysis. Table 1 depicts the equilibrium solution of this stochastic game with baseline data.

Players     CM   No CM   Attack   No Attack   Value (Billions)
Defender    1    0       -        -           28.2
Attacker    -    -       1        0           -17.2

Table 1: Equilibrium solution for the main MANPADS model

Since this is a non-zero-sum game, the values to the players starting the game in state 1 are $28.2 billion and -$17.2 billion, where the minus sign indicates a gain to the attacker. At equilibrium, the attacker chooses to attack and the defender chooses to install countermeasures, each with certainty. This is so because the cost of attacking to the attacker is quite small ($1 billion), whereas the expected gain to the attacker from attacking is large, due to the zero-sum property of the future states.
The zero-sum property in this model captures the antagonism inherent in our problem. However, since it makes attacking a very attractive option for the attacker, the equilibrium solution prescribes that the attacker attack, which is the rational action for the maximizing player. Therefore, it would be beneficial not only to analyze the equilibrium strategies of the players (or at least those of the attacker) but also to consider the best-response strategies of the defender against non-equilibrium strategies of the attacker. We perform such analyses in the next section.

4.2. The Robust Game Model and Sensitivity Analyses

In this section, we apply our robust stochastic game approach to the MANPADS model by considering intervals of transition probabilities around the baseline values and solving this problem using Theorem 5. The intervals of probabilities that are changed are presented in Table 2.

state   next state   alternative combination   lower bound   upper bound
s1      s2           (1,1)                     0             0.3
s1      s3           (1,1)                     0.7           1
s1      s4           (1,2)                     0.7           1
s1      s5           (1,2)                     0             0.3
s1      s1           (2,2)                     0.9           1
s2      s6           (1,1)                     0.5           0.8
s2      s7           (1,1)                     0.2           0.5
s4      s8           (1,1)                     0.5           0.8
s4      s9           (1,1)                     0.2           0.5

Table 2: Intervals for probability transitions

In this case, if the players play robustly, we see that the equilibrium strategies remain the same; that is, the defender chooses to install countermeasures and the attacker chooses to attack. However, when the players play robustly, they obtain worse values starting in state 1. In other words, the gain for the attacker decreases and the cost for the defender increases. For instance, player 1 considers a higher transition probability from state 1 to state 3 than the baseline probability of 0.25. This is in fact a pessimistic perspective for player 1, because he could have earned more had he considered a higher transition probability from state 1 to state 2, since he would then have the possibility of earning the high payoff associated with state 7.
A similar pessimistic behavior is observed for player 2 as well. The robust solution for this model is the same as the nominal solution under its own worst-case transition probability data. Here, using the notation of Chapter 3, we calculate the values obtained by the nominal solution under its own worst-case transition data by finding the global maximum of

$$\max_{w, t} \; \sum_{i \in I, \, s \in S} w^i_s$$
$$\text{s.t.} \quad \mathbf{1}^T E^i_s(x^{-i}_s, C^i) \, x^i_s + \beta \, t^i_s(x) \, w^i = w^i_s, \qquad A_s t^i_s \ge b_s, \quad Q_s t^i_s = \mathbf{1}.$$

This problem is modeled in AMPL and solved with the solver LOQO. Consequently, the robust solution under the nominal transition data is the same as the nominal solution. The same type of analyses, that is, the equilibrium solution of the robust stochastic game model and the comparison of the robust and nominal solutions' performances, are also carried out using larger intervals than the ones presented in Table 2. As a result, we see that the values for player 2 increase, whereas the values for player 1 decrease. Nevertheless, the equilibrium strategies given by the robust game remain the same. The reason for this is that the payoffs, as opposed to the transition probability values, of the MANPADS game model are the dominant factors in the players' choice of equilibrium strategies.

The payoffs (benefits) to the attacker in our model are significantly larger than the cost of attacking in state 1. Since the players play rationally, the attacker being the maximizing player and the defender the minimizing player, the players always choose an equilibrium pair that prescribes "attacking" and "defending." Since the payoffs of this model prescribe the players their first alternatives, changes in the transition probabilities do not alter the equilibrium policies, only the values obtained by the two players.
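The "fixed strategy under worst-case transition data" evaluation described above can also be sketched directly. The following minimal sketch is not the AMPL/LOQO model used in the dissertation: it evaluates a fixed stationary strategy by value iteration, letting an adversary re-choose cost-maximizing transition probabilities within per-state interval bounds at every sweep. The functions `worst_case_expectation` and `robust_value`, the interval bounds, the costs, and the discount factor are all hypothetical illustration choices.

```python
def worst_case_expectation(w, lo, hi):
    """Maximize sum_k p_k * w_k over the box lo <= p <= hi with sum(p) = 1.
    Greedy: start from the lower bounds, then push the leftover mass
    toward the states with the largest continuation values."""
    p = list(lo)
    slack = 1.0 - sum(lo)
    for k in sorted(range(len(w)), key=lambda k: -w[k]):
        add = min(hi[k] - lo[k], slack)
        p[k] += add
        slack -= add
    return sum(pk * wk for pk, wk in zip(p, w))

def robust_value(cost, lo, hi, beta=0.9, iters=500):
    """Worst-case discounted cost of a FIXED stationary strategy:
    value iteration with an adversarial transition choice at each sweep."""
    S = len(cost)
    w = [0.0] * S
    for _ in range(iters):
        w = [cost[s] + beta * worst_case_expectation(w, lo[s], hi[s])
             for s in range(S)]
    return w
```

With a costly state 0 (per-stage cost 1, self-transition probability allowed to range over [0.2, 0.8]) and an absorbing zero-cost state 1, the adversary puts as much mass as allowed on the costly state, giving $w_0 = 1/(1 - 0.9 \cdot 0.8) \approx 3.57$.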
This is indeed a drawback of the "equilibrium" concept in game-theoretic applications to homeland security, rather than of the discounted robust stochastic game methodology developed in this dissertation. An equilibrium means that no player is willing to deviate unilaterally from his respective strategy, and that each player's strategy is a best response to the other players' strategies. Players reach an equilibrium by considering the payoffs of each other's actions. In this model, the payoff for the "attack" option is quite attractive compared to the cost of attacking. Hence, as a rational decision-maker, the attacker prefers to attack in equilibrium. Our model captures the utility (payoff) aspect of the attacker's decision process. However, there are other factors that determine the attacker's willingness to attack in homeland security problems, such as the difficulty of plotting and organizing an attack and the psychology of the attackers. For instance, the fear of getting caught after an attack could be incorporated into the payoff table of a game model as the cost of getting caught. However, it is difficult to incorporate the mental processes leading to the attackers' decisions, or the psychological aspect of their behavior, into a payoff table. Moreover, game theory and the equilibrium concept apply more naturally to situations of economic competition, since it is plausible to assume that a firm's objective in a competitive environment is to maximize its profit. The attackers certainly do not behave according to this rationale.

On the other hand, game-theoretic concepts, such as best-response functions, allow us to analyze a player's best response against the other players' possibly non-equilibrium strategies.
In other words, we can run analyses in which we do not solve a homeland security game-theoretic problem for equilibrium, but instead use it to calculate the best-response strategies of a defender against possible non-equilibrium strategies adopted by the attacker. Therefore, we next consider the case in which the attacker fixes his attack strategy at a non-equilibrium strategy, and we calculate the best-response strategies for the defender under various data scenarios. We discuss the advantages and disadvantages of using robust policies in Section 4.4 using a queuing control model.

First, we run a two-way sensitivity analysis by changing the fatal crash cost and the probability of attack (chosen by the attacker). To this end, we fix the attacker's attempt probability at various points between 0 and 1, and for each fixed attempt probability and a given fatal crash payoff, we solve an optimization problem for the defender, which has nonconvex quadratic constraints and a linear objective:

$$\min_{w^2} \; \sum_{s \in S} w^2_s \quad \text{s.t.} \quad \mathbf{1}^T E^2_s(x^{-2}_s, C^2) \, x^2_s + \beta \, t_s(x) \, w^2 = w^2_s,$$

where $x^1$ is treated as a parameter and $t_s$ represents the transition data. This problem is derived from the nonlinear programs for equilibrium calculations in stochastic games given in [19]; given the fixed strategy of the attacker, it yields the best response for the defender by minimizing his total cost throughout the game. AMPL is used for modeling the mathematical program, and the nonlinear solver KNITRO is used for calculating best responses for the defender.

The area below the function in Figure 2 depicts the values of fatal crash costs and attempt probabilities that do not favor installing countermeasures. Countermeasures are preferred above this function.
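Once the attacker's strategy $x^1$ is fixed, the stochastic game collapses to an ordinary Markov decision process for the defender, so a best response can also be obtained with standard dynamic programming. The sketch below uses plain value iteration on hypothetical two-state data (the dissertation instead solves the nonlinear program above with AMPL/KNITRO); the state and action labels in the comments are illustrative assumptions, not the MANPADS model's data.

```python
def best_response_value(costs, trans, beta=0.9, iters=500):
    """Defender's best response once the attacker's strategy is fixed:
    the game collapses to an MDP, solved here by value iteration.
    costs[s][a] : expected stage cost in state s under defender action a
                  (attacker's mixed strategy already averaged in)
    trans[s][a] : transition distribution over states for the pair (s, a)
    """
    S = len(costs)
    w = [0.0] * S
    for _ in range(iters):
        w = [min(costs[s][a]
                 + beta * sum(p * w[k] for k, p in enumerate(trans[s][a]))
                 for a in range(len(costs[s])))
             for s in range(S)]
    policy = [min(range(len(costs[s])),
                  key=lambda a: costs[s][a]
                  + beta * sum(p * w[k] for k, p in enumerate(trans[s][a])))
              for s in range(S)]
    return w, policy

# Hypothetical data: state 0 is the threat state, state 1 is terminal.
# Defender action 0 = "install CM" (cost 10, threat resolved),
# action 1 = "no CM" (expected damage 20, threat may persist).
w, policy = best_response_value(costs=[[10.0, 20.0], [0.0]],
                                trans=[[[0.0, 1.0], [0.5, 0.5]], [[0.0, 1.0]]])
```

Under these illustrative numbers the cheaper "install CM" action is the best response in the threat state, so `policy[0]` is 0 and the defender's value is the $10 installation cost.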
Figure 2: Two-way sensitivity analysis on the cost of fatal crash and the probability of attempt

For instance, if the attack probability is 0.2 and the total economic cost of a fatal crash is less than about $150 billion, then countermeasures are not preferred. Figure 3 depicts the values to the defender given by the best-response strategies against the attacker's attack probabilities and various economic costs for a fatal crash. For instance, if the attack probability is 1 and the fatal crash costs are ignored (or assumed to be zero), the best-response value to the defender is about $19 billion, which comprises the economic costs associated with the "safe landing" and/or "miss" states. Next, we run the same analysis with a fixed countermeasure cost of $30 billion instead of $10 billion. Figure 4 plots the areas in which the fatal crash cost and attack probability data favor installing countermeasures or not. As depicted in the figure, the higher countermeasure cost of $30 billion results in higher fatal crash costs being required to favor countermeasures.

Figure 3: Best-response values to the defender

We next run two-way sensitivity analyses by changing the strategies of the defender and the cost of a fatal crash, defining the problem from the attacker's point of view. In this analysis, we keep the payoffs associated with safe landing and attacking at their original values. The analyses indicate that optimizing the best-response function for the attacker, for any combination of the defense strategies and positive fatal crash costs ($0-$400 billion), prescribes that the attacker attack. This is so because the gain that the attacker achieves from attacking is substantial compared to the cost of attacking and to the benefits of not attacking. We then analyze the problem from the attacker's point of view by assuming that the fatal crash costs range from -$400 billion to $400 billion, where a negative value indicates a gain to the defender, possibly due to the benefit to society caused by thwarting an attack. In this case, if the defender chooses to install countermeasures with some probability in the range 0 to 0.4, then the attacker chooses to attack if the cost of a fatal crash to the attacker is less than $50 billion.

Figure 4: Two-way sensitivity analysis with countermeasure costs set to $30 billion

If the defender considers installing countermeasures with some probability between 0.5 and 0.8, the threshold for the attacker's cost of a fatal crash, below which the attacker is willing to attack, increases to $100 billion. If the defender considers installing countermeasures with some probability between 0.8 and 1, this threshold increases to $150 billion. This means that as the defender puts more weight on his countermeasure installation option, the attacker is willing to increase the fatal crash cost threshold below which he will attack. However, we see a limit to this threshold, namely $150 billion. The attacker is willing to increase his threshold because the probability of safe landing is high when countermeasures are installed, and the gain to the attacker from safe landing is taken to be $25 billion. Another reason for the attacker to increase his threshold is that the cost of attacking is only $1 billion. Therefore, we next run sensitivity analyses by changing the cost of a fatal crash in the range of -$400 billion to $400 billion, while keeping the cost of attack equal to the countermeasure cost ($10
Therefore, we next run sensitivity analyses by changing the cost of fatal crash in the range of -$400 to $400 billion, while keeping the cost of attack as much as the countermeasure cost ($10 57 billion), and while safe landing causes a $25 billion cost to the attacker, rather than the defender. Under this data scenario, the attacker chooses to attack if the gain he achieves from attacking exceeds $150 billion, no matter the defender chooses to install countermeasures or not. We see from these analyses that when the payoffs to the attacker are at their original values, the attacker is willing to increase his threshold for his cost of fatal crash that will prescribe him to attack as the defender puts more weight to his countermeasure installation option. On the other hand, we see that the attacker could require a minimum gain from attacking, if the cost of attacking is high, and safe landing incurs costs on the attacker. Next, we run two-way sensitivity analysis on the probability of re-play and cost of fatal crash, when the chances for the attack probability are 50%. The area where the countermeasures are preferred is plotted in Figure 5. 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 0 50 100 150 probability of re−play cost of fatal crash (billions) with %50 attack chance no countermeasures; countermeasures Figure5: 2waysensitivityanalysisontheprobabilityofre-playandfatalcrashcost (with a fixed attack probability of 0.50) 58 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 0 50 100 150 200 250 300 350 probability of re−play cost of fatal crash (billions) with %20 attack chance no countermeasures preferred; countermeasures preferred Figure6: 2waysensitivityanalysisontheprobabilityofre-playandfatalcrashcost (with a fixed attack probability of 0.20) Figure 5 indicates that if the re-play probability is between 0.1 and 0.4, then countermeasures should be preferred if the economic costs of a fatal crash exceeds $100 billion, when the cost of countermeasures is at $10 billion. 
Furthermore, we see that as the re-play probability increases, the fatal crash cost that favors countermeasures decreases as a step function of the re-play probability; Figure 5 outlines where these steps occur. Figure 5 also shows, for a $10 billion countermeasure cost, that if the re-play probability is between 0.5 and 0.9, then countermeasures should be preferred if the cost of a fatal crash exceeds $50 billion. Next, we run the same analyses for an attack probability of 0.20 and show the plot in Figure 6. In this case, for a $10 billion countermeasure cost, if the re-play probability is 0.2, countermeasures are not preferred unless the cost of a fatal crash exceeds $250 billion, which is more than twice the corresponding fatal crash cost when the attack probability is 0.50.

In this section, we have applied a stochastic game model and sensitivity analyses to investigate the cost effectiveness of directed infrared countermeasures to protect commercial airliners from a possible MANPADS attack. Our analyses suggest that the countermeasures are cost-effective if the countermeasure costs over a ten-year period are around $10 billion, the attack probability is high (greater than 0.4), and the fatal crash cost is more than about $75 billion. This conclusion mirrors the conclusion given in [76]. Furthermore, we conclude that if the attack probability is less than 0.4 and the re-play probability is low (around 0.1), then the countermeasures are not cost-effective unless the economic costs associated with a fatal crash are very high (above $250 billion). Finally, our results suggest that, assuming the attack probability is around 0.2, countermeasures could be cost-effective if the economic cost of a fatal crash is above $50 billion and if the MANPADS threat continues to exist with high probability, given that no attacks occur and no countermeasures are installed. In the next section, we present a different stochastic game model that captures the time aspect of a security-related problem.

4.3.
A Time-Variant Model

This section presents a stochastic game model that has time-variant payoffs. In this example, a certain target could be attacked over time, in different time periods. At each time period, the attacker has the options of attacking and not attacking, whereas the defender has the alternatives of defending and not defending. Hence, a 2x2 matrix game is played in each time period. Once the game starts and the players choose their alternatives, the game ends with a given probability, or is played in the next time period, with different payoffs, with the complementary probability. As long as the players continue to play, there is always a probability that the play will end at the end of a given time period. In this example, we consider 40 time periods (states). The utility functions over time for the different alternative combinations are shown in Figure 7.

Figure 7: Utilities to the attacker over the 40 time periods, for the four alternative pairs (attack & defend, attack & not defend, not attack & defend, not attack & not defend).

The payoff (utility) scenario for this example is as follows. Figure 7 indicates that the utility of the pair (attack & defend) to player 1, u1(a,d), is a concave, decreasing function of the discrete time points. Until the nineteenth time period, u1(a,d) exceeds the utility of the pair (attack & not defend), u1(a,n). Since this is a zero-sum game, the utility values in Figure 7 are also the disutility values to the defender. Note that the disutility of the (attack & defend) pair is larger than that of the (attack & not defend) pair until the nineteenth time period, which captures the costs associated with defensive investments. From this point on in time, the disutility of the (attack & defend) pair becomes smaller than that of the (attack & not defend) pair (and hence preferable from the defender's perspective).
On the other hand, the utilities of the (not attack & defend) and (not attack & not defend) pairs, u1(n,d) and u1(n,n), are less than u1(a,d) and u1(a,n). However, when the time exceeds approximately the 30th period, the attack option becomes less attractive for player 1 than the not-attack option, no matter what the defender chooses. This could be attributed to the diminishing attractiveness of a target over time. Our scenario indicates that as time passes, the attack option becomes less and less attractive to the attacker, even though the defender does not defend the target in the very last time periods. One last point this utility scenario implies is that the disutility of defending is greater than that of not defending when the attacker chooses not to attack. For simplicity and illustration purposes, we use a simple transition rule for this game: we assume that the probability of playing another game in the next time period is 0.4, no matter what the players choose in the current time period. Hence, 0.6 is the probability that the whole game will end at the end of each time period. We first solve this model using the stochastic game formulations given in [19], the AMPL modeling language, and the solver LOQO. The strategy profiles for the players at equilibrium are shown in Figure 8. Figure 8 indicates that until about the 20th time period, the defender chooses not to defend and the attacker chooses to attack. After this point in time, both players choose their first alternatives, that is, attack and defend. The utility curve that the defender wishes to follow is certainly (not attack & not defend), whereas the (attack & defend) curve maximizes the attacker's utility. As the equilibrium strategy profiles show, the players reach an equilibrium that lies between these two extreme cases.
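Because the chain here only moves forward (continue with probability 0.4, end otherwise), the model above can be solved by a single backward pass of Shapley-style value iteration: the value of each period's 2x2 zero-sum matrix game, augmented by the discounted continuation value. The utility curves below are hypothetical stand-ins that loosely mimic the shape of Figure 7; the dissertation itself solves the model via the formulations in [19] with AMPL and LOQO.

```python
import numpy as np

def matrix_game_value(A):
    """Value of a 2x2 zero-sum matrix game (row player maximizes)."""
    lo = max(min(row) for row in A)              # maximin
    hi = min(max(col) for col in zip(*A))        # minimax
    if lo == hi:                                 # saddle point in pure strategies
        return float(lo)
    a, b = A[0]; c, d = A[1]
    return (a * d - b * c) / (a + d - b - c)     # mixed-strategy value

T, CONT = 40, 0.4            # 40 periods; 0.4 continuation probability
t = np.arange(1, T + 1)
# Hypothetical utilities to player 1, loosely mimicking Figure 7 (assumed):
u_ad = np.clip(1.0 - (t / 22.0) ** 2, 0.0, None)   # attack & defend
u_an = np.clip(0.85 - t / 38.0, 0.0, None)         # attack & not defend
u_nd = np.full(T, 0.2)                             # not attack & defend
u_nn = np.full(T, 0.1)                             # not attack & not defend

v = np.zeros(T + 1)          # v[T] = 0: the game surely ends after period 40
for k in range(T - 1, -1, -1):
    cont = CONT * v[k + 1]   # expected continuation value, same in every cell
    A = [[u_ad[k] + cont, u_an[k] + cont],
         [u_nd[k] + cont, u_nn[k] + cont]]
    v[k] = matrix_game_value(A)
```

The equilibrium mixed strategies in each period can be recovered from the same augmented matrices; since the continuation term is added uniformly to all four cells, it never changes which pure pair is played, only the values.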
Approximately after the 20th period, the disutility of the (attack & defend) pair becomes smaller than that of the (attack & not defend) pair (and therefore preferable from the defender's perspective), and hence the utility curve the players follow at equilibrium becomes that of the (attack & defend) pair.

Figure 8: Strategy profiles (probability of the attack and defense strategies over the 40 time periods).

Finally, approximately after the 29th time period, both players start using mixed strategies and put more and more emphasis on the (not attack & not defend) pair. This is so because our payoff (utility) scenario indicates that the attacker loses interest in attacking in the far-future time periods. This behavior of the attacker prescribes that the defender switch to his "not defend" option gradually, since the disutility of defending is higher than that of not defending when the attacker chooses not to attack. Next, the transition interval data shown in Table 3 are considered. The first entry of a row in this table is the interval for the probability of transitioning from one state to the next. The second entry is the interval for the probability of absorption in a dummy state, which indicates the end of the game. For each row of Table 3, we solve the model using the utility scenarios shown in Figure 7 and using Theorem 5. For each row, the equilibrium solution gives exactly the strategy profiles depicted in Figure 8 but, of course, different values to the players. This is because, as we observed in the sensitivity analyses for the MANPADS problem, the players adopt different worst-case transition data from a given interval.
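The observation that the two players adopt different worst-case transition data from the same interval can be made concrete with a small sketch. Here `cont_value` stands for a player's continuation value (assumed positive, as in this model); the function name and signature are illustrative, not the dissertation's notation.

```python
def worst_case_continuation(p_lo, p_hi, cont_value, maximizer):
    """Endpoint of the interval [p_lo, p_hi] least favorable to a player:
    a maximizing player guards against the continuation probability that
    minimizes p * cont_value, while a cost-minimizing player guards
    against the one that maximizes it."""
    if maximizer:
        return p_lo if p_lo * cont_value <= p_hi * cont_value else p_hi
    return p_hi if p_lo * cont_value <= p_hi * cont_value else p_lo

# With a positive continuation value, the players pick opposite endpoints
# of the same interval -- hence identical strategies but different values:
worst_case_continuation(0.3, 0.5, 2.0, maximizer=True)   # -> 0.3
worst_case_continuation(0.3, 0.5, 2.0, maximizer=False)  # -> 0.5
```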
prob. of going to the next state    prob. of game ending (absorption)
(0.3-0.5)                           (0.5-0.7)
(0.2-0.6)                           (0.4-0.8)
(0.1-0.7)                           (0.3-0.9)
(0.7-0.9)                           (0.1-0.3)

Table 3: Transition Data Scenarios

Similar to a prior conclusion on the MANPADS model, we conclude that the payoffs of this model are the dominant factors determining the players' strategy profiles, which makes the strategy profiles very insensitive to the transition data. Although this is not necessarily a specific characteristic of homeland-security-related decision problems, the equilibrium profiles for both security-related models presented in this study are insensitive to the transition probability data, and the payoffs of these game-theoretical models are the prevailing factors that define the players' strategies. On the other hand, equilibrium calculations using payoffs (or utilities) in a game-theoretical model of a homeland security decision problem fail to capture the human aspect of terrorism. Equilibrium concepts in game theory apply more naturally to economic competition, to actual games (such as tic-tac-toe), and, as we present in the next section, to systems control, since in all of these applications it is reasonable to assume that a decision maker's objective in a competitive environment is to maximize profits or minimize costs. The utility scheme, when used in game theory for equilibrium calculations, fails to account for the human behavioral aspects of attackers. Instead, it considers each player to be an agent who solely wishes to maximize his gain, disregarding the human factors present in terrorism, such as the mental processes of individuals leading to an attack decision and their motivation. Hence, we argue against using the equilibrium concept for terrorism, although some authors contend that equilibrium and the payoff scheme in game theory capture antagonistic intentions and hence are suitable tools for analyzing homeland security problems (see [59]).
Although the equilibria of zero-sum models capture antagonism, they fail to capture the underlying motivation for attackers' decisions and behavior. Game theory is still a useful tool for analyzing terrorism since, besides equilibrium, it offers the use of best response functions. Using the given utility values for the players' alternatives, a player's best response to the other players' strategies can be obtained by fixing all other players' strategies. In security-related models, fixing the attacker's strategy to an estimated value implies that the defender considers a possibly non-equilibrium strategy adopted by the attacker and seeks to optimize his best response against such a policy. Furthermore, fixing the attacker's strategy to an estimated probability value allows us not to treat the attacker solely as a maximizing player, overcoming the deficiency of equilibrium in capturing the attacker's motivations. Therefore, considering the motivations of attackers, incorporating this information into their strategies, and solving the corresponding best response problem from the defender's point of view is likely to give more insight into homeland-security-related decision problems than using equilibrium solutions as guidelines for counter-terrorism policies. Note that, in the previous section, we did so by presenting sensitivity analysis results in which estimates of the attacker's strategy are used for the defender's best response calculations.

We next present an application of discounted robust stochastic games in a queuing control context.

4.4. A Flow Control Example

In this section, we present an application of our robust model for incomplete information stochastic games to a single-server queuing system in which a user controls the flow of arriving customers into a finite buffer. This application is inspired by the flow control model presented in [2]. Let X_t represent the number of customers in the system at time t, t = 0, 1, 2, ...
The state space is denoted by X = {0, 1, ..., L}, where L < ∞ is the buffer size. It is assumed that at most one arrival can occur at the beginning of a time slot. At the end of each time slot, if the state is x, the service controller (player 1) chooses an interval from a finite set of possibly disjoint intervals of service rates. Hence, the exact service rate is unknown and belongs to an interval. Formally, if the state is x, the service controller chooses an interval I^1_j ∈ A^1_x, where A^1_x is a finite collection of possibly disjoint closed intervals. That is, A^1_x = {I^1_j}, j = 1, ..., J, where each I^1_j = [μ_j_min, μ_j_max] is a closed interval of service rates. At the beginning of a time slot, if there are x customers in the system, the flow controller chooses an arrival rate λ from a finite set A^2_x of arrival rates. We assume that no arrivals are allowed when the buffer is full. Let us denote the unknown service rate by μ̃. We assume that the alternative sets of both players are the same for all states, that is, A^1_x = A^1 and A^2_x = A^2 for all x ∈ X. Therefore, we have the following transition rule, which is unknown to both players:

  p̃(y | x, μ̃, λ) =
      μ̃ / (λ + μ̃),        if 1 ≤ x ≤ L and y = x − 1,
      λ / (λ + μ̃),        if 0 ≤ x ≤ L − 1 and y = x + 1,
      1 − λ / (λ + μ̃),    if y = x = 0,
      1 − μ̃ / (λ + μ̃),    if y = x = L.

There are three main differences between our model and the one given in [2]. First, the alternatives in the latter are probability values for service completion and arrival rates that the service and flow controllers (players) choose from finite sets, whereas the alternatives in the former are arrival rates and intervals of service rates, which lead to different transition rules. Second, the latter is an application of a complete information stochastic game in which the transition data of the game are commonly known to both players; in our model, the service rates, and therefore the transition probabilities, are unknown to both players.
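The transition rule above can be sketched directly. Since the true service rate is interval-valued in the model, the function below evaluates the rule at one candidate rate from the interval; the function name is illustrative.

```python
def transition_prob(y, x, mu, lam, L):
    """Birth-death transition rule p(y | x, mu, lam) for buffer size L,
    evaluated at one candidate service rate mu (the model's rate is only
    known to lie in an interval)."""
    up, down = lam / (lam + mu), mu / (lam + mu)
    if 1 <= x <= L and y == x - 1:
        return down                  # service completion
    if 0 <= x <= L - 1 and y == x + 1:
        return up                    # arrival admitted
    if x == y == 0:
        return 1 - up                # empty system: only an arrival moves it
    if x == y == L:
        return 1 - down              # full buffer: arrivals are blocked
    return 0.0
```

Note that at interior states the up and down probabilities sum to one, so only the boundary states 0 and L have self-transitions, consistent with the rule above.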
Lastly, unlike the model in [2], the underlying Markov chain in our model is a continuous-time chain and constitutes a birth and death process.

The immediate payoff function frequently used in the literature on flow control models is defined as follows (see [2]):

  C(x, μ̂, λ) = c(x) + θ(μ̂) + ρ(λ),

where μ̂ denotes the midpoint of a service rate interval. C(x, μ̂, λ) represents the cost that the flow controller pays the service controller, given the state of the system and the alternative pair. Hence, in this two-person zero-sum model, player 2 is the minimizing and player 1 the maximizing player. Here, c(x) is a nondecreasing real function of x, θ ≥ 0 is an increasing real function of μ̂, and ρ is a nonpositive real function of λ. c(x) can be interpreted as the holding cost per unit time, θ as the cost associated with the service rate, and ρ as the reward associated with an entering customer. Note that the exact immediate costs are known to both players.

We set up an instance of this problem with two alternatives for the service controller in each state. The first alternative for player 1 is a service rate of one customer per 14 to 16 seconds, and the second is a rate of one customer per 19 to 21 seconds. Note that the mean service times for the first and second alternatives lie between 14 and 16 seconds, and between 19 and 21 seconds, respectively. We set up nine more instances of the same problem by enlarging the range of the mean service times for both alternatives by 2 seconds for every new instance, obtaining the service rate intervals in Table 4.

instance      1            2            3            4            5
μ1         [1/16,1/14]  [1/17,1/13]  [1/18,1/12]  [1/19,1/11]  [1/20,1/10]
μ2         [1/21,1/19]  [1/22,1/18]  [1/23,1/17]  [1/24,1/16]  [1/25,1/15]

instance      6            7            8            9            10
μ1         [1/21,1/9]   [1/22,1/8]   [1/23,1/7]   [1/24,1/6]   [1/25,1/5]
μ2         [1/26,1/14]  [1/27,1/13]  [1/28,1/12]  [1/29,1/11]  [1/30,1/10]

Table 4: Intervals of rates for different instances

In each state, the flow controller has two alternatives.
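The pattern behind Table 4 — instance k widens each alternative's mean service time range by one second on each side of the nominal 15- and 20-second means — can be reproduced by a small generator (the function name is illustrative):

```python
from fractions import Fraction

def service_intervals(instance):
    """Service-rate intervals of Table 4: instance k widens the mean
    service time ranges to 15 +/- k and 20 +/- k seconds, so each new
    instance enlarges the range by 2 seconds in total."""
    k = instance
    mu1 = (Fraction(1, 15 + k), Fraction(1, 15 - k))  # first alternative
    mu2 = (Fraction(1, 20 + k), Fraction(1, 20 - k))  # second alternative
    return mu1, mu2

service_intervals(1)   # -> ((1/16, 1/14), (1/21, 1/19)), matching instance 1
```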
The first alternative for the flow controller (player 2), with mean interarrival time 1/λ1, is to permit one customer into the system every 15 seconds. The second alternative for player 2 is to let one customer into the system every 20 seconds on average. Note that since the players may use mixed strategies, a convex combination of the two alternatives could be chosen in any state. Under these sets of alternatives, we set up the scenario for the payoff functions depicted in Figure 9.

Figure 9: Payoff functions (payoffs in dollars vs. the number of customers in the system, for the four alternative pairs).

This scenario indicates that in the first few states, the service controller pays the flow controller so that he admits more customers into the system, which would be beneficial for the service controller. However, after the admission of approximately 2 customers, the flow controller pays some amount to the service controller so that the admitted customers get served. From this point, up to approximately 20 customers in the system, the alternative pair (μ̂1 = 1/15 sec⁻¹, λ2 = 1/20 sec⁻¹) incurs the highest amount paid by the flow controller to the service controller. Note that this pair indicates that player 1 serves at a relatively higher rate while player 2 admits at a slower rate. Hence, the flow controller encourages the service controller to complete the services more quickly. The alternative pair (μ̂2 = 1/20 sec⁻¹, λ1 = 1/15 sec⁻¹) indicates a relatively slower service rate with respect to the flow rate and is less beneficial from the service controller's perspective.
Up until the state of 20 customers in the system, the alternative pairs (μ̂1 = 1/15 sec⁻¹, λ1 = 1/15 sec⁻¹) and (μ̂2 = 1/20 sec⁻¹, λ2 = 1/20 sec⁻¹) incur costs to the flow controller that lie between the first two cases described, with higher service rates incurring slightly higher costs to the flow controller. When the number of customers in the system exceeds approximately 20, this scenario encourages the flow controller to decrease the flow rate to protect himself against higher payments to the service controller, whereas the service controller would still be willing to serve at a higher rate. In summary, the antagonistic nature of this scenario is captured by the fact that the service controller would be willing to serve faster even though it would be to the benefit of the flow controller to decrease admission into the system. In other words, the service controller would be glad to have as many customers in the system as possible, whereas the flow controller would wish the opposite.

Based on the service rate intervals of this birth and death process, we first calculated the intervals of transition probabilities to obtain the uncertain transition data of the game for each instance in Table 4. We solved each instance using Theorem 5. From the equilibrium strategies, we calculated the expected service and arrival rates at equilibrium for each state, from the two players' perspectives. From these rates, we calculated the steady state probabilities, the average number of customers in the system (L), the average amount of time a customer spends in the system (W), and the average value of the game from the point of view of each player. The average value for a player is calculated by weighting the value to that player of starting in each state by the respective steady state probabilities from that player's perspective and summing over the states. Furthermore, we solved the nominal zero-sum stochastic game by ignoring the uncertainty in service rates and considering the midpoints of the service intervals, and we calculated each of the above quantities for this nominal stochastic game as well. Note that all instances share the same interval midpoints, and hence the same nominal rates.

Figure 10: L, the average number of customers in the system (nominal equilibrium strategies vs. the robust equilibrium from the perspectives of the service and flow controllers).

Figure 11: W, the average time a customer spends in the system (nominal equilibrium strategies vs. the robust equilibrium from the perspectives of the service and flow controllers).

Figure 12: Average value of the game (nominal equilibrium strategies vs. the robust equilibrium from the perspectives of the service and flow controllers).

The results are depicted in Figures 10, 11, and 12. Figure 10 indicates that, as far as the service controller is concerned, the average number of customers in the system is less than that of the nominal solution when there is uncertainty in the system. Furthermore, the first five instances yield larger decreases in L than the last five instances. From the flow controller's perspective, L increases as the intervals become larger. Note that these are pessimistic points of view for both players, since an increase in the number of customers would be an advantage to the service controller and a disadvantage to the flow controller.
Consequently, the gain that the service controller achieves decreases, and the cost to the flow controller increases, as the length of the service intervals increases. On the other hand, the average waiting time for a customer decreases from the first player's perspective and increases from the second player's perspective, in parallel with the results for L and the average values. Note that although this stochastic game is zero-sum, the resulting values differ for each player when they play robustly. We observe that the deviation of each player's average values from those obtained from the nominal zero-sum game, and the difference between the average values the players achieve, both increase as the length of the service intervals increases. Hence, we confirm that although our example is a zero-sum game, formulations for zero-sum games cannot be used to solve robust stochastic games, despite the fact that one player pays the other player a fixed amount.

Figure 13: Robust equilibrium average values vs. average values for nominal strategies at their worst-case data (from the perspective of the service controller).

Figure 14: Robust equilibrium average values vs. average values for nominal strategies at their worst-case data (from the perspective of the flow controller).

We next compare the results obtained for the average values, L, and W under the robust equilibrium with those of the nominal equilibrium under its respective worst-case transition probability data. To obtain the latter for each instance, we first fixed the nominal strategies of each player and solved the resulting formulations in which the transition data are treated as variables. The objectives in these formulations yield the maximum value (maximum cost) and the minimum value (minimum profit) for players 2 and 1, respectively. This approach allows us to contrast the robust equilibrium with the nominal solution under its worst possible data scenario. Note that the results of the nominal solution under its worst possible data scenario do not pertain to an equilibrium. Figure 13 shows that for the fifth instance, the average values obtained from the robust equilibrium exceed the worst-case values obtained from the nominal stochastic game's equilibrium strategies. This is possible in the context of stochastic games since, unlike the dotted line, the dashed line in Figure 13 pertains to a robust equilibrium point from which no player is willing to deviate. In a robust equilibrium, players reach an equilibrium by considering not only their opponents' strategies but also the worst-case transition data with respect to those strategies. The dotted line is obtained by taking the nominal stochastic game's equilibrium strategies together with their own worst-case transition data. Considering that the opponent is playing with nominal strategies and their respective worst-case data, a player could be willing to deviate from the nominal strategies unless they constitute an equilibrium. Here, the dotted line is not associated with an equilibrium point, and hence is not a stable point: the players would be willing to deviate from it. This phenomenon becomes prevalent for the flow controller when the uncertainty in the service rate intervals is relatively small (up to instance 5), as seen in Figure 14.
This implies that for our example, as the uncertainty grows, using robust strategies is beneficial for the flow controller, as far as he is concerned with the average value of the game. Figures 15 and 16 indicate that using robust strategies usually leads to a larger number of customers in the system. However, we just observed that robust strategies yield better (lower) average values for the flow controller when the interval length for the service rates increases. Since having more customers in the system is costly for the flow controller, this may seem unusual. However, it is not, and the explanation is as follows: our computational results indicate that nominal strategies at their worst-case data scenario and robust strategies yield very similar steady state probabilities, but the value (cost) of the game to the flow controller starting in any given state under the former is always greater than under the latter. Therefore, although robust strategies yield a slightly higher number of customers in the system, they provide lower average values. We next fix the robust strategies for both players, simulate the transition data using uniform distributions for instances 1, 3, 5, and 10, and calculate the corresponding values for the players. For each sample of transition data, we then set the players' strategies to their nominal strategies and re-calculate the corresponding values. We plot the mean and standard deviation values for a sample size of 50 in Figures 17 and 18.

Figure 15: L values for the robust equilibrium vs. for nominal strategies at their worst-case data (from the perspective of the service controller).

Figure 16: L values for the robust equilibrium vs. for nominal strategies at their worst-case data (from the perspective of the flow controller).

The simulation results indicate that when there is uncertainty in the system and the players use robust strategies, the flow controller incurs higher costs compared to the situation where both players use nominal strategies. Note that the simulation is carried out on a zero-sum game; therefore, given that both players play robustly, the service controller obtains higher gains when there is uncertainty in the system. However, as the uncertainty increases, the mean average values for both the nominal and the robust cases decrease, which is beneficial to the flow controller, as his average cost decreases. The reason is that as the uncertainty increases, the flow controller puts more weight on his first (higher admission rate) option, which brings him the higher reward associated with the higher admission rate. The result that the means for the robust policies are greater than those for the nominal policies stems from the fact that the robust policies prescribe to the flow controller, who is the minimizing player, a pessimistic (worst-case) perspective with respect to the transition data. Figure 18 indicates that when the uncertainty is not large (instances 1, 3, and 5), the standard deviation of the average values obtained using robust policies is less than that of the nominal policies. We see that in instance 10, the standard deviation for the robust policies exceeds that of the nominal policies, an undesirable situation for a robust equilibrium point. A full explanation of this phenomenon requires further research.
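The simulation procedure just described — fix a pair of strategies, sample transition data uniformly from the intervals, and evaluate the resulting values — can be sketched on a simple forward-moving chain. The induced stage payoffs `r`, the discount factor, and the interval `[0.3, 0.5]` below are hypothetical; the dissertation's simulation operates on the full queuing game.

```python
import numpy as np

rng = np.random.default_rng(0)

def policy_value(P, r, beta=0.95):
    """Discounted value of fixed stationary strategies: v = (I - beta*P)^(-1) r,
    where P is the (sub)stochastic matrix induced by the fixed strategies and
    r the induced stage payoff vector."""
    n = len(r)
    return np.linalg.solve(np.eye(n) - beta * P, r)

def sample_chain(p_lo, p_hi, n):
    """One sampled forward-moving chain: from state t, continue to t+1 with a
    probability drawn uniformly from [p_lo, p_hi]; the missing mass absorbs
    (the game ends)."""
    P = np.zeros((n, n))
    for t in range(n - 1):
        P[t, t + 1] = rng.uniform(p_lo, p_hi)
    return P

n = 40
r = np.linspace(1.0, 0.0, n)   # hypothetical induced stage payoffs
samples = [policy_value(sample_chain(0.3, 0.5, n), r)[0] for _ in range(50)]
mean, std = float(np.mean(samples)), float(np.std(samples))
```

Re-running the same loop with the nominal strategies substituted for the robust ones (i.e., a different induced `P` and `r` per sample) yields the paired means and standard deviations plotted in Figures 17 and 18.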
However, two possible explanations are as follows. First, it may be possible to have equilibrium points with undesirable properties, a phenomenon commonly encountered for Nash equilibrium in general (see [14]). Second, the probability intervals in instance 10 are large, and the simulation is carried out using uniform distributions; for larger uncertainty sets, the robust equilibrium may become overly conservative and possess undesirable properties.

Figure 17: Mean values for the average value of the game to player 2 (nominal policies fixed vs. robust policies fixed).

Figure 18: Standard deviation values for the average value of the game to player 2 (nominal policies fixed vs. robust policies fixed).

In this section, we have illustrated the use of discounted robust stochastic games in a queuing control context. We analyzed and presented the effects of using robust policies versus nominal policies on the average number of customers in the system (L), the average amount of time a customer spends in the system (W), and the average value of the game for each player. We then contrasted the average values obtained from the robust equilibrium with those of the nominal strategies at their worst-case data. We conclude from these analyses that as the uncertainty increases, using robust policies becomes beneficial for the flow controller, as far as he is concerned with the average value of the game. We arrive at a similar conclusion from our simulation study: as the uncertainty increases, the mean average values for both the nominal and the robust cases decrease, which is beneficial to the flow controller since his average cost decreases. In the next chapter, we summarize our conclusions and present possible future research directions.
Chapter 5. Conclusions

In this dissertation, we develop a new methodology, n-player, non-zero-sum discounted robust stochastic games, and present applications. In such games, none of the players knows the true data of the game, and each player considers a distribution-free incomplete information stochastic game to be played using robust optimization. This dissertation focuses on two main research areas. First, we examine incomplete information discounted n-person stochastic games. In these non-zero-sum games, we consider that none of the players knows the true transition probability data and/or payoffs, and the players adopt a robust optimization approach to the ambiguity in the data of the game. We offer an alternative equilibrium concept for incomplete information stochastic games. Prior research efforts on incomplete information stochastic games make assumptions on which player has incomplete information and on whether the transition probabilities are controlled by a single player (see [58]). Our approach relaxes these assumptions and considers that neither player knows the exact data of the game and that the transitions are controlled jointly by the players. Furthermore, prior efforts assume prior probability distributions for the incomplete information stochastic games, whereas this dissertation focuses on a distribution-free model, namely discounted robust stochastic games. Excluding the work in [1], which is on normal form incomplete information games rather than incomplete information stochastic games, the prior approaches to incomplete information do not provide formulations or procedures for equilibrium calculation. This dissertation, on the other hand, provides an explicit mathematical programming formulation for equilibrium calculation in incomplete information discounted n-person stochastic games, leading the way to computational results.
We determined several properties in this research:

• An equilibrium exists even if some players do not adopt a robust optimization approach. This stems from the fact that when there are no uncertainty sets for the data of a stochastic game, the best response functions are already continuous, as shown in [20]. Hence, we can construct a correspondence that satisfies Kakutani's theorem and that accommodates players who may play non-robustly.

• In our existence proof, we assumed that the players commonly know the uncertainty set of payoffs C_s at each state. The results of this existence theorem hold even if each player has a different uncertainty set.

• The zero-sum property is most likely to vanish for stochastic games in which the payoff uncertainty is a common set for all players and in which some players play robustly.

• If there is ambiguity in any data of the game, the players' approaches to this ambiguity differ, resulting in the loss of the completely antagonistic character of a zero-sum game.

• If the stochastic game is a two-person zero-sum game but the transition data are ambiguous, then the equilibrium values for the players do not negate each other. This implies that although such games are zero-sum, formulations for zero-sum stochastic games cannot be used for their analysis, and properties that pertain to zero-sum games cannot be expected to hold in the presence of ambiguity.

• As dilemmas are common in game theory, it is possible to find dilemmas in discounted robust stochastic games. As in normal form games, discounted robust stochastic games can encompass different equilibrium points with different values and probability perspectives for the players. In this sense, an equilibrium may not deliver the best of what the players could do. For instance, the equilibrium values obtained by the minimizing player in a two-player discounted stochastic game could be less than those of its nominal counterpart.
An equilibrium, in fact, means that given the probability perspectives adopted by the players, no player is unilaterally willing to deviate from his/her own strategy (Nash equilibrium). Consequently, based on the uncertainty sets used, it is possible to have robust equilibrium points that do not have desirable properties. This issue parallels discussions on the drawbacks of Nash equilibrium. Hence, the question of stability and perfection ([14]) also arises in the context of discounted robust stochastic games.

The second focus of this dissertation is the application of stochastic games. To this end, we first illustrate the use of stochastic games via a simple example for the MANPADS problem. Our example in this section differs from the conventional decision tree methodology in several ways:

• First, we address the adversarial aspect of the problem using a game theoretical approach that yields self-enforcing equilibrium points. Hence, this approach takes into account the possible strategies that the attackers could adopt.

• Second, in this approach, it is possible for a decision maker to take certain actions with some probability, which measures the weight of an action over other alternative actions. Furthermore, if desired, this approach allows the decision makers to choose an option with certainty, as in decision trees.

• Third, the example models in this thesis have a tree-like structure that is easy to communicate; however, unlike in decision trees, the nodes of these trees are not chance or decision nodes but payoff matrices. Hence, players play a normal form one-shot game at each node (state).

• Fourth, unlike decision trees, our applications model an opponent explicitly.

• Fifth, besides merely modeling an opponent, the mathematics of stochastic games allows us to determine the best response of one player against the strategies chosen by the other player through a suitable mathematical programming problem.
Unlike the roll-back procedure in decision trees, which chooses the decision alternative that yields the lowest or highest expected value, our problems are solved either for an equilibrium or for a best response using mathematical programming. This allows us to model an irrational player in our applications and to find the best response strategies of the other player against the irrational one. Furthermore, all sorts of sensitivity analyses that decision trees allow a user to perform could also be done using stochastic games.

Subsequently, the main model for the MANPADS problem is presented along with solutions and sensitivity analyses. The output of the main model includes two-way sensitivity analyses on the cost of a fatal crash and the attack probability for different countermeasure costs, and on the probability of re-play and the cost of a fatal crash for different attack probabilities. The final results on the MANPADS problem are similar to those obtained in [76], with the additional consideration that if nothing happens, there is a 50% chance that the MANPADS threat continues to exist. Hence, we see that countermeasures are cost-effective if the countermeasure cost over ten years is around $10 billion, the attack probability is more than 0.4, and the fatal crash cost is more than about $75 billion.

As an extension to the conclusions provided in [76], we conclude that if the attack probability is around 0.2 and the replay probability is low (around 0.1), then countermeasures are not cost-effective unless the economic costs associated with a fatal crash are very high (above $250 billion). On the other hand, even if the attack probability is around 0.2, countermeasures could be cost-effective if the economic costs of a fatal crash are more than about $50 billion, provided the MANPADS threat continues to exist with high probability given that there is no attack and no countermeasures are installed.
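The decision-tree roll-back procedure contrasted above, which takes expectations at chance nodes and the best expected value at decision nodes, is short enough to state in code. This is a minimal single-agent sketch with an illustrative node encoding of our own; it is the computation that the game-theoretic formulations in this dissertation generalize, not part of them.

```python
def rollback(node):
    """Fold a decision tree back to its root value.
    A node is either a terminal payoff (a number),
    ('chance', [(prob, child), ...]), or ('decision', [child, ...])."""
    if isinstance(node, (int, float)):
        return float(node)
    kind, branches = node
    if kind == 'chance':
        # Expected value over the chance branches.
        return sum(p * rollback(child) for p, child in branches)
    # Decision node: the (maximizing) decision maker picks the best branch.
    return max(rollback(child) for child in branches)
```

Note that there is no opposing player here: once the chance probabilities are fixed, the optimal branch is fixed, which is precisely the limitation that the equilibrium and best-response formulations above remove.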
Next, we present another stochastic game model that takes into account the changes in payoffs over time. Similar to a conclusion on the MANPADS model, we conclude that the payoffs of this model are the dominant factors that determine the strategy profiles of the players, which makes the strategy profiles very insensitive to the transition data.

We observe in these examples that the utility scheme, when used in game theory for equilibrium calculations, fails to take into account the human behavior aspects of attackers. It rather considers each player as an agent who solely wishes to maximize his gain, disregarding the human factors present in terrorism, such as the mental processes of individuals leading to an attack decision and their motivation. However, game theory is still a useful tool to analyze terrorism since, besides equilibrium, it offers the use of best response functions. Considering the motivations of the attackers, incorporating this information into their strategies, and solving the corresponding best response problem from the defender's point of view is likely to give more insight into homeland security related decision problems than using equilibrium solutions as guidelines for counter-terrorism policies.

Finally, we present an application of discounted robust stochastic games to queueing control. We analyze and present the effects of using robust policies versus nominal policies on the average number of customers in the system, the average amount of time a customer spends in the system, and the average value of the game for each player. We then contrast the average values obtained from robust equilibrium strategies and from nominal strategies at their worst-case data and present related results.

This dissertation reveals several future research topics to pursue. Some of these could be outlined as follows:

• It is possible to have multiple equilibria in discounted robust stochastic games, as it is with discounted stochastic games.
Characterization of the set of equilibria, and the determination of equilibria with better properties, is an important topic. The idea of "perfect equilibrium points" has been a research area for bimatrix games, proposing refinements aimed at selecting equilibria with more stability (see [14]). Hence, a similar research direction could be pursued for discounted robust stochastic games.

• The feasibility problem that yields an equilibrium point in Chapter 3 considers that both players adopt a robust approach. Another important future research topic would be to address the problem where one of the players does not adopt a robust approach, and to investigate the effects of using or not using a robust approach.

• In this dissertation, we have considered discounted games. Robustness in average reward stochastic games is not addressed in the literature, and is hence a new direction that could be followed.

• From an applied point of view, an interesting future research topic would be to investigate the effect of different payoff structures on the robust equilibrium, in the context of the queueing control example presented in this dissertation.

• Certain types of stochastic games could be solved using linear programming. It is worthwhile to determine the necessary conditions that allow one to model discounted robust stochastic games as the robust counterparts of suitably constructed linear programs.

We hope that this dissertation provides a new perspective on incomplete information stochastic games and related applications.

Bibliography

[1] Aghassi, M., Bertsimas, D. (2004). "Robust Game Theory," to appear, revised November 2004.

[2] Altman, E. (1994). "Monotonicity of Optimal Policies in a Zero-Sum Game: A Flow Control Model," Advances of Dynamic Games and Applications, Annals of the International Society of Dynamic Games, Birkhäuser, Vol.1, pp.269-286.

[3] Atkinson, S. E., Sandler, T., and Tschirhart, J. T. (1987). "Terrorism in a bargaining framework," Journal of Law and Economics, Vol.30, No.1.

[4] Aumann, R. J., and Maschler, M.
B. (1968). "Repeated games of incomplete information: The zero-sum extensive case," in Report of the U.S. Arms Control and Disarmament Agency ST-143, Washington, D.C., Chapter III, pp.37-116.

[5] Azaiez, N., Bier, V. M. (2004). "Optimal Resource Allocation for Security in Reliability Systems," Working Paper, Industrial Engineering Department, University of Wisconsin-Madison.

[6] Bagnell, J., Ng, A., Schneider, J. (2001). "Solving uncertain Markov decision problems," Technical Report CMU-RI-TR-01-25, Robotics Institute, Carnegie Mellon University, Pittsburgh, PA.

[7] Ben-Tal, A., Nemirovski, A. (1998). "Robust Convex Optimization," Mathematics of Operations Research, Vol.23, No.4.

[8] Bernard, H. (2004). "Mathematical Methods in Combating Terrorism," Risk Analysis, Vol.24, No.4.

[9] Bier, V. (2004). "Game-Theoretic and Reliability Methods in Counter-Terrorism and Security," Technical Report, Center for Human Performance and Risk Analysis, University of Wisconsin-Madison.

[10] Bier, V., Nagaraj, A., Abhichandani, V. (2004). "Protection of simple series and parallel systems with components of different values," Reliability Engineering and Systems Safety, Vol.10, No.27.

[11] Border, K. C. (1985). Fixed Point Theorems with Applications to Economics and Game Theory, Cambridge University Press.

[12] Brynielsson, J., Arnborg, S. (2004). "Bayesian Games for Threat Prediction and Situation Analysis," Technical Report, Department of Numerical Analysis and Computer Science, Royal Institute of Technology, Stockholm, Sweden.

[13] Cruz, Jr., J. B., Simaan, M. A., Gacic, A., and Liu, Y. (2002). "Moving horizon Nash strategies for a military air operation," IEEE Transactions on Aerospace and Electronic Systems, Vol.38, No.3.

[14] van Damme, E. (1991). Stability and Perfection of Nash Equilibria, Springer-Verlag, New York.

[15] D'Artigues, A., Vignolo, T. (2003). "Why Global Integration May Lead to Terrorism: An Evolutionary Theory of Mimetic Rivalry," Economics Bulletin, Vol.6, No.11.

[16] L.
El Ghaoui, A. Nilim (2004). "Robust solutions to Markov decision problems with uncertain transition matrices," Operations Research, to appear.

[17] Faria, J. R. (2003). "Terror Cycles," Studies in Nonlinear Dynamics and Econometrics, Vol.7, No.1.

[18] Faria, J. R. (2004). "Terrorist Innovations and Anti-Terrorist Policies," Political Economy Working Paper, School of Social Sciences, University of Texas at Dallas.

[19] Filar, J., Vrieze, K. (1997). Competitive Markov Decision Processes, Springer-Verlag, New York.

[20] Fink, A. M. (1964). "Equilibrium in a Stochastic N-Person Game," Journal of Science of the Hiroshima University, Series A-I, Vol.28, pp.89-93.

[21] Ghaoui, L. E., Oustry, F., and Lebret, H. (1998). "Robust Solutions to Uncertain Semidefinite Programs," SIAM Journal on Optimization, Vol.9, No.1, pp.33-52.

[22] Gilboa, I., Schmeidler, D. (1989). "Maxmin Expected Utility with a Non-unique Prior," Journal of Mathematical Economics, Vol.18, pp.141-153.

[23] Givan, R., Leach, S., Dean, T. (1997). "Bounded parameter Markov decision processes," Fourth European Conference on Planning, pp.234-246.

[24] Guo, X., Hernandez, O. (2004). "Zero-Sum Games for Nonhomogeneous Markov Chains with an Expected Average Payoff Criterion," Applied and Computational Mathematics, Vol.3, No.1.

[25] Haimes, Y. Y. (2002). "Roadmap for Modeling Risks of Terrorism to the Homeland," Journal of Infrastructure Systems, June 2002.

[26] Haimes, Y. Y. (2002). "Strategic Responses to Risks of Terrorism to Water Resources," Journal of Water Resources Planning and Management, Vol.128.

[27] Harsanyi, J. C. (1967, 1968). "Games with Incomplete Information Played by Bayesian Players, I-III," Management Science, Vol.14, pp.159-182, pp.320-334, pp.486-502.

[28] Hausken, K. (2002). "Probabilistic risk analysis and game theory," Risk Analysis, Vol.22.

[29] Hayashi, S., Yamashita, N., and Fukushima, M. (2004). "Robust Nash equilibria and second-order cone complementarity problems," Technical Report 2004-004, Department of Applied Mathematics and Physics, Kyoto University, April 2004.
[30] Hazen, G. B. (2002). "Stochastic Trees and the StoTree Modeling Environment: Models and Software for Medical Decision Analysis," Journal of Medical Systems, Vol.26.

[31] Heal, G., Kunreuther, H. (2003). "You Only Die Once: Managing Discrete Interdependent Risks," Working Paper, Columbia Business School and Wharton Risk Management and Decision Processes Center.

[32] Hildenbrand, W., Kirman, A. P. (1976). Introduction to Equilibrium Analysis, North-Holland Publishing Company.

[33] Howard, R. A., Matheson, J. E. (1981). "Influence Diagrams," Principles and Applications of Decision Analysis, Vol.2.

[34] Hudson, L. D., Ware, B. S., Laskey, K. B., and Mahoney, S. M. (2002). "An Application of Bayesian Networks to Antiterrorism Risk Management for Military Planners," Technical Report, Digital Sandbox, Inc.

[35] Iyengar, G. (2005). "Robust Dynamic Programming," Mathematics of Operations Research, Vol.30, No.2.

[36] Jaskiewicz, A. (2002). "Zero-Sum Semi-Markov Games," SIAM Journal on Control and Optimization, Vol.41, No.3.

[37] Kakutani, S. (1941). "A Generalization of Brouwer's Fixed Point Theorem," Duke Mathematical Journal, Vol.8, pp.457-459.

[38] Kirk, W. A., Sims, B. (2001). Handbook of Metric Fixed Point Theory, Kluwer Academic Publishers, Dordrecht, Netherlands.

[39] Koller, D., Milch, B. (2001). "Multi-agent influence diagrams for representing and solving games," in Proceedings of the 17th International Joint Conference on Artificial Intelligence.

[40] Kunreuther, H., Heal, G. (2003). "Interdependent security," Journal of Risk and Uncertainty, Vol.26, No.2.

[41] LaMura, P. (2000). "Game Networks," in Proceedings of the 16th Conference on Uncertainty in Artificial Intelligence.

[42] Lapan, H. E., Sandler, T. (1988). "To bargain or not to bargain: That is the question," American Economic Review, Vol.78, No.2.

[43] Lapan, H. E., Sandler, T. (1993). "Terrorism and signaling," European Journal of Political Economy, Vol.9, No.3.

[44] Laraki, R., Solan, E. (2002).
"Stopping Games in Continuous Time," SIAM Journal on Control and Optimization, to appear.

[45] Laskey, K. B., Levitt, T. S. (2002). "Multisource Fusion for Opportunistic Detection and Probabilistic Assessment of Homeland Terrorist Threats," Technical Report, SEOR Department, George Mason University.

[46] Lo, K. C. (1996). "Equilibrium in Beliefs Under Uncertainty," Journal of Economic Theory, Vol.71, pp.443-484.

[47] Major, J. (2002). "Advanced techniques for modeling terrorism risk," Journal of Risk Finance, Vol.4.

[48] Marinacci, M. (2000). "Ambiguous Games," Games and Economic Behavior, Vol.31, pp.191-219.

[49] Nash, J. F. (1950). "Equilibrium Points in n-Person Games," Proceedings of the National Academy of Sciences of the United States of America, Vol.36, No.1, pp.48-49.

[50] Nemirovski, A. Lectures on Modern Convex Optimization, Technion, Israel Institute of Technology.

[51] Nilim, A., El Ghaoui, L. (2005). "Robust control of Markov decision processes with uncertain transition matrices," Operations Research, Vol.53, No.5, pp.780-798.

[52] O'Sullivan, T. M. "External Terrorist Threats to Civilian Airliners: A Summary Risk Analysis of MANPADS, Other Ballistic Weapons Risks, Future Threats, and Possible Countermeasures Policies," CREATE Report.

[53] Parthasarathy, T., Raghavan, T. E. S. (1981). "An Orderfield Property for Stochastic Games When One Player Controls Transition Probabilities," Journal of Optimization Theory and Applications, Vol.33, pp.375-392.

[54] Parthasarathy, T., Tijs, S. H., and Vrieze, O. J. (1984). "Stochastic Games with State Independent Transition and Separable Rewards," Selected Topics in OR and Mathematical Economics, Springer-Verlag, Lecture Note Series 226, New York.

[55] Pate-Cornell, E., Guikema, S. (2002). "Probabilistic modeling of terrorist threats: A systems analysis approach to setting priorities among countermeasures," Military Operations Research, Vol.7, No.4.

[56] Pearl, J. (1988).
Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference, Morgan Kaufmann Publishers, San Francisco.

[57] Robust Optimization Lecture Notes, Epstein Department of Industrial and Systems Engineering, Fall 2004.

[58] Rosenberg, D., Solan, E., and Vieille, N. (2004). "Stochastic Games with a Single Controller and Incomplete Information," SIAM Journal on Control and Optimization, Vol.43, No.1.

[59] Sandler, T., Arce M., D. G. (2003). "Terrorism and Game Theory," Simulation and Gaming, Vol.34, No.3.

[60] Sandler, T., Tschirhart, T. T., and Cauley, J. (1983). "A theoretical analysis of transnational terrorism," American Political Science Review, Vol.77, No.4.

[61] Sandler, T., Siqueira, K. (2002). "Global terrorism: Deterrence versus preemption," unpublished manuscript, University of Southern California.

[62] Satia, J. K., Lave, R. L. (1973). "Markov decision processes with uncertain transition probabilities," Operations Research, Vol.21, No.3, pp.728-740.

[63] Shachter, R. D. (1986). "Evaluating influence diagrams," Operations Research, Vol.34, No.6.

[64] Singh, S., Tu, H., Allanach, J., Pattipati, K. R., and Willett, P. (2004). "Stochastic Modeling of a Terrorist Event via the ASAM System," in Proceedings of the International Conference on Systems, Man and Cybernetics, The Hague, The Netherlands.

[65] Shapiro, A., Kleywegt, A. J. (2002). "Minimax analysis of stochastic problems," Optimization Methods and Software, Vol.17, No.1, pp.523-592.

[66] Shapley, L. S. (1953). "Stochastic Games," Proceedings of the National Academy of Sciences, Vol.39, pp.1095-1100.

[67] Sorin, S. (1984). "Big Match with Lack of Information on One Side (Part 1)," International Journal of Game Theory, Vol.13, pp.201-255.

[68] Sorin, S. (1985). "Big Match with Lack of Information on One Side (Part 2)," International Journal of Game Theory, Vol.14, pp.173-204.

[69] Soyster, A. L. (1973). "Convex Programming with Set-Inclusive Constraints and Applications to Inexact Linear Programming," Operations Research, Vol.21, pp.1154-1157.
[70] Virtanen, K., Karelahti, J., Raivio, T., and Hamalainen, R. P. (2004). "Modeling Air Combat by a Moving Horizon Influence Diagram Game," Technical Report, Systems Analysis Laboratory, Helsinki University of Technology, Finland.

[71] Virtanen, K., Raivio, T., Hamalainen, R. P. (2004b). "Modeling pilots' sequential maneuvering decisions by a multistage influence diagram," Journal of Guidance and Control, to appear.

[72] von Neumann, J., Morgenstern, O. (1944). The Theory of Games and Economic Behavior, Princeton University Press, Princeton.

[73] Vrieze, O. J. (1987). "Stochastic Games with Finite State and Action Spaces," CWI Tracts 33, Amsterdam.

[74] Weaver, R., Silverman, B. G., Shin, H., and Dubois, R. (2001). "Modeling and Simulating Terrorist Decision-Making: A Performance Moderator Function Approach to Generating Virtual Opponents," 10th Conference on Computer Generated Forces and Behavioral Representation, SISO, May 2001.

[75] White, C. C., Eldeib, H. K. (1994). "Markov decision processes with imprecise transition probabilities," Operations Research, Vol.42, No.4, pp.739-749.

[76] von Winterfeldt, D., O'Sullivan, T. M. (2005). "A Decision Analysis to Evaluate the Cost-Effectiveness of MANPADS Countermeasures," CREATE Report.

Appendix

Banach's Theorem. Let $(W,\rho)$ be a complete metric space and let $\gamma: W \to W$ be a contraction mapping. Then there exists a unique fixed point of the function $\gamma$.

Lemma 4 (Fink (1964)). $\alpha^i_{s,\omega^i}(x^{-i}_s)$ is continuous on $X^{-i}_s$. Furthermore, the set $\{\alpha^i_{s,\omega^i} \mid \omega^i \text{ is bounded}\}$ is equicontinuous.

Proof. Let
$$\alpha^i_{s,\omega^i}(x^{-i}_s) = \psi^i_s\big(C^i_s(x^{-i}_s,u^{*i}_s),\, P^i_s(x^{-i}_s,u^{*i}_s);\; x^{-i}_s,\, u^{*i}_s;\; \omega^i\big),$$
$$\alpha^i_{s,\omega^i}(y^{-i}_s) = \psi^i_s\big(C^i_s(y^{-i}_s,z^{*i}_s),\, P^i_s(y^{-i}_s,z^{*i}_s);\; y^{-i}_s,\, z^{*i}_s;\; \omega^i\big).$$
Furthermore,
$$\alpha^i_{s,\omega^i}(y^{-i}_s) - \alpha^i_{s,\omega^i}(x^{-i}_s) \le \psi^i_s\big(C^i_s(y^{-i}_s,u^{*i}_s),\, P^i_s(y^{-i}_s,u^{*i}_s);\; y^{-i}_s,\, u^{*i}_s;\; \omega^i\big) - \psi^i_s\big(C^i_s(x^{-i}_s,u^{*i}_s),\, P^i_s(x^{-i}_s,u^{*i}_s);\; x^{-i}_s,\, u^{*i}_s;\; \omega^i\big),$$
$$\alpha^i_{s,\omega^i}(x^{-i}_s) - \alpha^i_{s,\omega^i}(y^{-i}_s) \le \psi^i_s\big(C^i_s(x^{-i}_s,z^{*i}_s),\, P^i_s(x^{-i}_s,z^{*i}_s);\; x^{-i}_s,\, z^{*i}_s;\; \omega^i\big) - \psi^i_s\big(C^i_s(y^{-i}_s,z^{*i}_s),\, P^i_s(y^{-i}_s,z^{*i}_s);\; y^{-i}_s,\, z^{*i}_s;\; \omega^i\big).$$
If $\omega^i$ is restrained to be in a bounded region, then the right-hand sides can be made uniformly small because of the uniform continuity of $\psi^i_s$ on compact sets. $\Box$

Lemma 5 (Fink (1964)). If $x^{-i,n} \to x^{-i}$ and $\tau^i_s(x^{-i,n}) \to \omega^i_s$ as $n \to \infty$, then $\tau^i_s(x^{-i}) = \omega^i_s$.

Proof.
$$|\omega^i_s - \gamma^i_{s,x^{-i}_s}(\omega^i)| \le |\omega^i_s - \tau^i_s(x^{-i,n})| + |\tau^i_s(x^{-i,n}) - \gamma^i_{s,x^{-i}_s}(\tau^i(x^{-i,n}))| + |\gamma^i_{s,x^{-i}_s}(\tau^i(x^{-i,n})) - \gamma^i_{s,x^{-i}_s}(\omega^i)|.$$
Now, by assumption, as $n \to \infty$,
$$|\omega^i_s - \tau^i_s(x^{-i,n})| \to 0 \quad \text{and} \quad |\gamma^i_{s,x^{-i}_s}(\tau^i(x^{-i,n})) - \gamma^i_{s,x^{-i}_s}(\omega^i)| \to 0.$$
Note that
$$|\tau^i_s(x^{-i,n}) - \gamma^i_{s,x^{-i}_s}(\tau^i(x^{-i,n}))| = |\alpha^i_{s,\tau^i(x^{-i,n})}(x^{-i,n}_s) - \alpha^i_{s,\tau^i(x^{-i,n})}(x^{-i}_s)| \to 0$$
as $n \to \infty$ by Lemma 3 in Fink (1964). Hence, $|\omega^i_s - \gamma^i_{s,x^{-i}_s}(\omega^i_s)| \to 0$ as $n \to \infty$. $\Box$
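Banach's theorem above is also constructive: iterating a contraction from any starting point converges to its unique fixed point, which is the computational idea behind value-iteration arguments for discounted games. A minimal scalar sketch follows; the mapping cos is a contraction on [0, 1] since |cos'(w)| = |sin w| ≤ sin 1 < 1 there, and the function name is illustrative, not from the dissertation.

```python
import math

def banach_fixed_point(gamma, w0, tol=1e-12, max_iter=10_000):
    """Iterate a contraction mapping gamma from w0 until successive
    iterates differ by less than tol.  Banach's theorem guarantees
    convergence to the unique fixed point when gamma is a contraction."""
    w = w0
    for _ in range(max_iter):
        w_next = gamma(w)
        if abs(w_next - w) < tol:
            return w_next
        w = w_next
    raise RuntimeError("no convergence within max_iter iterations")

# Converges to the unique solution of w = cos(w) on [0, 1].
w_star = banach_fixed_point(math.cos, 0.5)
```

In the discounted setting, the same iteration applied to the contraction $\gamma^i_{s,x^{-i}_s}$ of the appendix yields the value maps whose fixed points underlie the equilibrium existence argument.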