Team Decision Theory and Decentralized Stochastic Control

Author: Seyedmohammad Asgharipari

A Dissertation Submitted to the Faculty of the USC Graduate School
In Partial Fulfillment of the Requirements for the Degree of
Doctor of Philosophy

Committee Members:
Prof. Ashutosh Nayyar (Chair)
Prof. Rahul Jain
Prof. Ketan Savla

Department of Electrical Engineering
University of Southern California
Los Angeles, CA 90089

August 2019

Abstract

The advent of new applications and demands for decentralized systems is changing the shape of today's control systems. The performance of such systems hinges on a series of operational decisions made by multiple controllers, where each controller may have different information regarding the overall system. Team theory provides a framework for studying decision and control problems in decentralized systems. The main focus of this thesis is on linear-quadratic-Gaussian (LQG) dynamic team decision problems. Of particular interest are problems in decentralized stochastic control that can be viewed as LQG dynamic teams. Existing work on such team decision and decentralized control problems has the following limitations: (L1) the available results on LQG dynamic teams are largely restricted to partially nested information structures, and little is known about the solution of general non-partially nested team decision problems; (L2) even for partially nested LQG team problems, the team-theoretic approach may not provide any structural results or sufficient statistics for the decision strategies; (L3) much of the prior work on team decision problems assumes that the interconnections among the team members, in terms of whether one member's actions affect another member's information or whether one member's information is available to the other members, are deterministic. However, since real networks are subject to random effects such as packet drops, the standard team-theoretic results do not easily extend to networked control problems and decentralized networked control problems with unreliable communication; (L4) the standard team-theoretic results are limited to decentralized LQG control problems over a finite time horizon. For decentralized control problems over an infinite time horizon, the corresponding team problem has an infinite number of members, and no general results are known about the solution of LQG dynamic team decision problems with infinitely many members; (L5) the prior work has largely assumed that the system model is known. However, this is not always the case for real-world systems, where model parameters are usually only partially known or even totally unknown.

This thesis focuses on four classes of team decision and decentralized control problems where some of the above limitations are present and develops solution approaches that circumvent these limitations. With respect to limitations L2 and L3, we study a class of decentralized control problems with unreliable communication among the controllers over a finite time horizon and show that not only can we find structural results for optimal control strategies, but it is also possible to explicitly find the optimal strategies in a computationally efficient way. With respect to limitation L4, we study a class of decentralized control problems over an infinite time horizon and characterize a set of conditions under which the decentralized control system is stabilizable. Further, when these conditions are met, we provide an explicit characterization of the optimal strategies.
With respect to limitation L5, we study a decentralized control problem with a partially unknown system and propose an efficient learning algorithm in which each controller independently learns the unknown system. Finally, we attempt to bypass limitation L1 and identify a new class of dynamic team/decentralized control problems beyond partially nested problems for which the computation of optimal strategies remains tractable.

Acknowledgements

Over the course of my doctoral studies, I had the privilege of having the fantastic support of many individuals, without whom none of this would have been possible. First and foremost, I express my deepest gratitude to my advisor, Professor Ashutosh Nayyar, for his continuous support and diligent guidance during the past years. Without his generosity in sharing his in-depth knowledge, his patience, and his tenacity, this work would not be what it is. I thank him for that, and for all the time he spent helping me improve my writing and presentations.

I would also like to thank Yi Ouyang for being like a second advisor to me. I much enjoyed working with him and having rewarding and invaluable discussions with him. Our work together remarkably enhanced my understanding and improved the quality of my work.

I am grateful to my qualifying exam and dissertation committee members, Professor Rahul Jain, Professor Ketan Savla, Professor Bhaskar Krishnamachari, and Professor Insoon Yang. Their insightful comments and constructive suggestions have been essential for improving the quality of this dissertation. I would also like to thank my labmates and colleagues at USC, Mukul Gagrani, Mehdi Jafarnia, Hiteshi Sharma, and Nathan Dahlin. I am grateful to them for their valuable suggestions and many fruitful discussions.

I am thankful to several people who have made my stay in Los Angeles so enjoyable and homelike. In particular, I would like to thank Mehdi Jafarnia again for being like a true brother to me and for being such a generous and wonderful roommate, Navid Naderializadeh for helping me settle in and for giving me rides to beautiful places around Los Angeles, Ahmad Fallahpour for the spiritual and insightful discussions we had, and Mohammad Noormohammadpour for his continuous encouragement and support. I also thank my friends Ghasem Pasandi, Seyed Mohammadreza Mousavi, Mehrdad Kiamari, and Ali Zarei for the time we spent together at Tea time! I would also like to thank everyone in the volleyball group for all the fun.

Above all, I want to express my sincerest and deepest gratitude to my lovely parents (Maman and Baba) and my lovely brothers (Amin, Ali, and Hamed). Their boundless love, constant support, and selfless sacrifices have enabled me to achieve all I have in my life. Undoubtedly, the hardest part of my Ph.D. was not being able to see them over the last five years. I am forever indebted to them for everything. This dissertation is dedicated to them.
Contents

Abstract
Acknowledgements
Contents
List of Figures
List of Tables

1 Introduction
  1.1 Background
    1.1.1 Team-theoretic formulation
    1.1.2 Prior results for LQG team decision and decentralized control problems
    1.1.3 Limitations of prior work in LQG team decision and decentralized control problems
    1.1.4 Scope of the Thesis
  1.2 Organization and Contributions of the Thesis
    1.2.1 Chapter 3: Decentralized Control over Unreliable Communication – Finite horizon
    1.2.2 Chapter 4: Decentralized Control of Finite State and Jump Linear Systems Connected by Markovian Channels
    1.2.3 Chapter 5: Decentralized Control over Unreliable Communication – Infinite horizon
    1.2.4 Chapter 6: Decentralized Control with Partially Unknown Systems
    1.2.5 Chapter 7: Dynamic Teams and Decentralized Control Problems with Substitutable Actions
2 Preliminaries
  2.1 Notation
  2.2 Operator Definitions
3 Decentralized Control over Unreliable Communication – Finite horizon
  3.1 Introduction
    3.1.1 Motivation
    3.1.2 Organization
  3.2 System Model and Problem Formulation
    3.2.1 Communication Model
    3.2.2 Information structure and cost
    3.2.3 Problem Formulation
    3.2.4 Characteristics of the Optimization Problem
  3.3 Equivalent Problem and Dynamic Program: A Common Information Approach
    3.3.1 Identifying irrelevant information at the controllers
    3.3.2 Formulating an equivalent centralized problem
    3.3.3 Identifying an information state for the remote controller
    3.3.4 Writing a dynamic program for the equivalent problem
  3.4 Optimal Control Strategies
    3.4.1 Optimal prescription strategy in Problem 3.2
    3.4.2 Optimal control strategies in Problem 3.1
  3.5 Extension to Multiple Local Controllers
    3.5.1 System Model and Problem Formulation
    3.5.2 Equivalent Problem and Dynamic Program: A Common Information Approach
    3.5.3 Optimal Control Strategies
  3.6 Discussion
    3.6.1 Common Information Approach
    3.6.2 Structure of Optimal Controllers
    3.6.3 Special cases
      3.6.3.1 No control action for some controllers
      3.6.3.2 Decoupled systems
      3.6.3.3 Always active links
      3.6.3.4 Always failed links
  3.7 Numerical Experiments
  3.8 Conclusion
4 Decentralized Control of Finite State and Jump Linear Systems Connected by Markovian Channels
  4.1 Introduction
    4.1.1 Organization
  4.2 System Model and Problem Formulation
  4.3 Main Result For The General System Model
  4.4 Finite Systems
  4.5 Markov jump linear systems
    4.5.1 Solving the DP
  4.6 Applications
    4.6.1 Decentralized Networked Control Systems with Broadcast-out Architecture
    4.6.2 Decentralized Networked Control Systems with Decoupled subsystems and coupled costs
    4.6.3 Two-controller decentralized systems with decoupled dynamics, coupled costs, and two-way unreliable communication
    4.6.4 Decoupled subsystems with coupled costs
  4.7 Conclusion
5 Decentralized Control over Unreliable Communication – Infinite horizon
  5.1 Introduction
    5.1.1 Organization
  5.2 System Model and Problem Formulation
    5.2.1 Communication Model
    5.2.2 Information structure and cost
    5.2.3 Problem Formulation
  5.3 Review of Markov Jump Linear Systems
  5.4 Infinite Horizon Optimal Control
    5.4.1 Answering Q1
    5.4.2 Answering Q2 and Q3
    5.4.3 Summary of the Infinite Horizon Results
  5.5 Extension to Multiple Local Controllers
    5.5.1 Infinite Horizon Optimal Control
    5.5.2 Summary of the Infinite Horizon Results
  5.6 Extension to Multiple Local Controllers With A Global State
    5.6.1 Infinite Horizon Optimal Control
    5.6.2 Summary of the Infinite Horizon Results
  5.7 Discussion
    5.7.1 Summary of the Approach
    5.7.2 The Information Switching
  5.8 Simulation Results
  5.9 Conclusion
6 Decentralized Control with Partially Unknown Systems
  6.1 Introduction
    6.1.1 Organization
  6.2 Problem Formulation
    6.2.1 The Optimal Multi-Agent Linear-Quadratic Problem
    6.2.2 The Multi-Agent Reinforcement Learning Problem
  6.3 A Single-Agent LQ Problem
  6.4 Main Results
  6.5 Proof of Theorem 6.1
  6.6 Experiments
  6.7 Conclusion
7 Dynamic Teams and Decentralized Control Problems with Substitutable Actions
  7.1 Introduction
    7.1.1 Organization
  7.2 Dynamic Team with Non-partially-nested Information Structure
    7.2.1 Team Model and Information Structure
    7.2.2 Substitutability Assumption
    7.2.3 Partially nested expansion of the information structure
    7.2.4 Main Results
    7.2.5 Proof of Theorem 7.1
    7.2.6 Discussion
  7.3 Substitutability in Decentralized LQG Control
    7.3.1 System Model and Information Structure
    7.3.2 Substitutability Assumption
    7.3.3 A Centralized Problem
    7.3.4 Main results
    7.3.5 Proof of Theorem 7.2
  7.4 Conclusion
8 Concluding Remarks
  8.1 Summary
  8.2 Future Directions
    8.2.1 Decentralized Control over Unreliable Communication with Output feedback
    8.2.2 Stochastic Teams with Randomized Information Structures
    8.2.3 Learning in Decentralized Control
Appendices
  A Decentralized Control over Unreliable Communication – Finite horizon
    A.1 Preliminary Results
    A.2 Proof of Lemma 3.4
    A.3 Proof of Lemma 3.7
    A.4 Proof of Theorem 3.4
    A.5 Proof of Theorem 3.5
    A.6 Proof of Theorem 3.6
  B Decentralized Control of Finite State and Jump Linear Systems Connected by Markovian Channels
    B.1 Proof of Theorem 4.1
    B.2 Proof of Theorem 4.3
  C Decentralized Control over Unreliable Communication – Infinite horizon
    C.1 Properties of the Operators
    C.2 Proof of Lemma 5.3
    C.3 Proof of Lemma 5.6
    C.4 Proof of Lemma 5.7
    C.5 Proof of Lemma 5.8, parts 1 and 2
    C.6 Proof of Lemma 5.8, part 3
    C.7 Proof of Lemma 5.9
    C.8 Proof of Lemma 5.17
    C.9 Proof of Lemma 5.19
    C.10 Proof of Lemma 5.20
    C.11 Proof of Lemma 5.21
    C.12 Proof of Claim .5
  D Decentralized Control with Partially Unknown Systems
    D.1 Proof of Lemma 6.3
    D.2 Proof of Lemma 6.4
    D.3 Proof of Lemma 6.5
    D.4 Proof of Lemma 6.6
  E Dynamic Teams and Decentralized Control Problems with Substitutable Actions
    E.1 Proof of Claim 7.1
Bibliography

List of Figures

1.1 An application of the DNCS of Chapter 3 in smart buildings
1.2 An application of the DNCS of Chapter 3 in control of UAVs
1.3 System model of the problem of Chapter 4
1.4 System model of the problem of Chapter 6. Solid lines indicate communication links, dashed lines indicate control links, and dash-dot lines indicate that one system can affect another one.
1.5 An application of problems with the substitutability property
3.1 System model. The binary random variables $\Gamma^{1:N}_t$ indicate whether packets are transmitted successfully. Dashed lines indicate control links and solid lines indicate communication links. Blue links are perfect but red links are prone to packet drops.
3.2 Two-controller system model. The binary random variable $\Gamma_t$ indicates whether packets are transmitted successfully. Dashed lines indicate control links and solid lines indicate communication links. Blue links are perfect but red links are prone to packet drops.
3.3 System model. The binary random variables $\Gamma^{1:N}_t$ indicate whether packets are transmitted successfully. Dashed lines indicate control links and solid blue lines indicate communication links. Blue links are perfect but red links are prone to packet drops.
3.4 Time-ordering of relevant variables.
4.1 System model. The binary random variables $\Gamma^{1:N}_t$ indicate whether packets are transmitted successfully. Blue lines indicate perfect links and red lines indicate unreliable links. Solid lines are communication links and dotted lines are control links.
4.2 Two-state Markovian channel from $C^n$ to $C^0$
4.3 Broadcast-out architecture. Blue dotted lines indicate perfect communication links and blue solid lines indicate that the root node affects the dynamics of the leaf nodes.
4.4 Communication over networks. Blue dotted lines indicate perfect communication links and red dotted lines indicate unreliable communication channels.
4.5 Two-controller system with two-way unreliable communication. Red dotted lines indicate unreliable communication links.
4.6 Decoupled subsystems with coupled costs.
5.1 Two-controller system model. The binary random variable $\Gamma_t$ indicates whether packets are transmitted successfully. Dashed lines indicate control links and solid lines indicate communication links. Blue links are perfect but red links are prone to packet drops.
5.2 System model. The binary random variables $\Gamma^{1:N}_t$ indicate whether packets are transmitted successfully. Dashed lines indicate control links and solid blue lines indicate communication links. Blue links are perfect but red links are prone to packet drops.
5.3 System model. The binary random variables $\Gamma^{1:N}_t$ indicate whether packets are transmitted successfully. Blue lines indicate perfect links and red lines indicate unreliable links. Solid lines are communication links and dotted lines are control links.
5.4 $\sum_{i=0}^{1} \|P^i_{t+1} - P^i_t\|_F$ versus number of iterations
5.5 Average cost
6.1 Three-agent system model. Solid lines indicate communication links, dashed lines indicate control links, and dash-dot lines indicate that one system can affect another one.
6.2 TS-MARL algorithm with sampling condition $\mathcal{C}$ of [1]

List of Tables

3.1 Average running time in seconds for computing the optimal strategies for 100 instances of Problem 3.3 with unreliable links
3.2 Average running time in seconds for computing the optimal strategies for the corresponding centralized LQ problems (with always active links)

To My Dearest Father, Mother, and Brothers.
Chapter 1
Introduction

1.1 Background

The advent of new applications and demands for decentralized systems is changing the shape of today's control systems. Smart grids [2–4], building systems [5–7], networked control systems (NCSs) [8–11], mobile sensor networks [12], automated highway systems, and unmanned aerial vehicles [13–16] are all examples of decentralized systems. The performance of such systems hinges on a series of operational decisions made by multiple controllers, where each controller may have different information regarding the overall system.
These two salient features of decentralized systems, namely the decentralization of decision-making and the decentralization of information, distinguish such systems from centralized systems. While techniques from centralized stochastic control can be used to solve many decision and control problems in centralized systems, the decentralization of information and decision-making presents a significant challenge in solving decentralized decision and control problems. Team theory provides a systematic framework for studying decentralized decision and control problems.

In this introductory chapter, we present a description of the team-theoretic approach, a brief literature review of prior work on team decision and decentralized control problems relevant to this thesis, and a discussion of some of the limitations of this prior work. We also specify the scope of this thesis and give an overview of the classes of problems examined in this thesis and the organization of the rest of the thesis.

1.1.1 Team-theoretic formulation

In team decision problems, first proposed by Marschak [17] and Radner [18], there are multiple members/decision makers, each taking an action based on the information available to it. A key property of a team decision problem is the information structure of the problem, which describes what information is available to which member. The information available to each member can be a function of (i) the actions of other members and (ii) exogenous uncertainties which are not controlled by any team member. The goal of the team members is to collaboratively accomplish a common task by minimizing a shared cost function. The generality of the team problem formulation allows one to view any decentralized control problem as a team decision problem by treating each controller at each time as a team member.

Team decision problems where the information function of a member does not depend on the actions of others are known as static teams. Team decision problems which are not static are known as dynamic teams. A team decision problem is referred to as a linear-quadratic-Gaussian (LQG) team problem if the information functions are linear, the cost function is quadratic, and the exogenous uncertainties are Gaussian.

The main focus of this thesis (with the exception of Chapter 4) is on LQG decentralized control problems. Since these problems can be considered as team problems, in the next section we first provide a brief literature review of prior work on LQG team decision problems and then discuss some of the existing results for LQG decentralized control problems.

1.1.2 Prior results for LQG team decision and decentralized control problems

In dynamic teams, the dependence of the information of one team member on the past strategy of another leads to difficult strategy optimization problems. This complexity is best captured in the counterexample provided by Witsenhausen [19], where it was demonstrated that nonlinear strategies can outperform linear strategies. Thereafter, there has been significant interest in identifying classes of problems that are more tractable. [20] introduced the concept of a "partially nested" information structure. An information structure is partially nested if the following condition holds: whenever the action of a member/decision maker (say $i$) affects the information of another member (say $j$), member $j$ knows everything that member $i$ knows. For LQG dynamic team decision problems with a partially nested information structure, also known as partially nested LQG teams, [20] showed that affine control strategies are optimal. Related properties such as stochastic nestedness [21] and P-quasiclassical information structures [22] were later introduced for some non-partially nested problems. [21–24] studied the idea of considering an expanded information structure and using strategies in the expansion to investigate optimal strategies in the original information structure. In some cases [21], [23, Section 3.5.2], it is shown that the expansion is redundant as far as strategy optimization is concerned: an optimal strategy found in the expanded information structure is implementable in the original information structure. In [22, 24], an optimal strategy found in the expanded structure may not be directly implementable in the original information structure, but, under some conditions, it can be used to construct an optimal strategy in the original information structure. [25, 26] introduced the idea of considering a team problem as a centralized planning problem that may be solved by decomposing it into several sub-problems. However, the results of [25, 26] appear to be computationally prohibitive when applied to LQG teams. Recently, [27] introduced the concept of common knowledge in teams, which can be used to construct a sequential decomposition of the strategy optimization problem in teams. However, this result is also not applicable to LQG teams, as it has been proposed for problems with finite spaces. For LQG decentralized control problems, the prior work has largely focused on problems with a partially nested information structure where the controllers can communicate with each other either over deterministic graphs [28–36] or under some delay constraints [37–45].

We should note that, in addition to the team-theoretic approach, there is another approach to decentralized control problems known as the norm-optimal design approach. This approach can be applied to problems where the plant and the controllers are linear time-invariant (LTI) systems. The plant has two inputs, an exogenous input vector and a control input vector, and two outputs, a performance-related vector and an observation vector. The controller is an LTI system with the observation vector as its input and the control input vector as its output. The information structure is described in terms of structural constraints on the transfer function of the controller. For a fixed choice of the controller, the closed-loop LTI system can be described in terms of its transfer function from the exogenous input vector to the performance-related vector. The design problem is to minimize a norm (such as the $\mathcal{H}_2$-norm) of this transfer function. In the norm-optimal design framework, some properties of the plant and the information constraint have been identified as simplifying conditions. These properties imply convexity of the transfer-function-norm optimization problem. Examples of such properties include quadratic invariance [46, 47], funnel causality [48], and certain hierarchical architectures [49]. The norm-optimal design approach has been used for various models. This approach, however, does not apply if the closed-loop system is not LTI.

1.1.3 Limitations of prior work in LQG team decision and decentralized control problems

Existing work on LQG team decision and decentralized control problems has the following key limitations.

(L1) The available results on LQG team decision and decentralized control problems are largely restricted to partially nested information structures, and little is known about the solution of general problems with non-partially nested information structures.

(L2) Even for a decentralized control problem that can be described by a partially nested LQG team problem, the team-theoretic approach of [20] does not provide any structural results or sufficient statistics for the decision strategies, as it describes the strategies as functions of all the information a controller has. This lack of structure typically makes it difficult to compute good/optimal strategies and to implement such strategies. Structural results/sufficient statistics have been obtained for only some models of LQG decentralized control problems [32, 50].

(L3) Much of the prior work on team decision (resp. decentralized control) problems assumes that the interconnections among the members (resp. controllers), in terms of whether one member's (resp. controller's) actions affect another member's (resp. controller's) information or whether one's information is available to the other members (resp. controllers), are deterministic. However, real networks are subject to random effects such as packet drops [51, 52]. Hence, the standard results do not easily extend to networked control problems and decentralized networked control problems with unreliable communication.

(L4) The standard team-theoretic results are limited to decentralized control problems over a finite time horizon. For decentralized control problems over an infinite time horizon, the corresponding team problem has an infinite number of members, and no general results are known about the solution of dynamic team decision problems with infinitely many members. (A team-theoretic result with an infinite number of team members is available in [53], but it requires a very special cost and information structure.)

(L5) The prior work has largely focused on problems with known models. In particular, for team decision (resp. decentralized control) problems, it is assumed that the information (resp. system dynamics) functions are known. However, this is not always the case for real-world systems, where these functions are usually only partially known or even totally unknown. Hence, the standard results do not work for decentralized control problems with partially or totally unknown model parameters.

1.1.4 Scope of the Thesis

We will focus on four classes of decentralized control problems where some of the limitations of Section 1.1.3 are present and develop solution approaches that circumvent these limitations.

1. With respect to limitations L2 and L3, we study decentralized control problems with unreliable communication among the controllers in Chapters 3 and 4 and show that not only can we find structural results for the optimal control strategies of these problems, but it is also possible to explicitly find the optimal strategies in a computationally efficient way.

2. With respect to limitation L4, we study the problem of Chapter 3 over an infinite time horizon in Chapter 5 and, by exploiting the connection between this problem and Markov jump linear systems, we characterize a set of conditions under which the system of this problem is stabilizable. Further, when these conditions are met, we provide an explicit characterization of the optimal strategies.
3. With respect to limitation L5, we study a decentralized control problem with a partially unknown system in Chapter 6 and propose an efficient learning algorithm in which each controller independently learns the unknown system.

4. Finally, in Chapter 7, we attempt to bypass limitation L1 and identify a new class of problems beyond partially nested problems for which optimal decentralized control problems still remain tractable.

1.2 Organization and Contributions of the Thesis

1.2.1 Chapter 3: Decentralized Control over Unreliable Communication – Finite horizon

Problem Statement: We consider a decentralized networked control system (DNCS) consisting of a "remote" controller and a collection of linear plants, each associated with a "local" controller. For example, Fig. 1.1 shows an example of this DNCS architecture in a smart building, where the central AC unit plays the role of a remote controller that affects the temperature of multiple rooms which may also have local controllers. Fig. 1.2 presents another example of this DNCS architecture in remotely controlled systems such as unmanned aerial vehicles (UAVs), in which certain low-level functions like collision avoidance are controlled by a group of local processors, but many high-level mission-related functions are remotely controlled by a ground control station [14]. For this DNCS architecture, we assume that each local controller can perfectly observe the state of its associated plant. The remote controller can control all plants, but it does not have direct access to the states, as its name suggests. The remote controller and local controllers are connected by a communication network where the links from the remote controller to local controllers are perfect but the links from local controllers to the remote controller are unreliable channels with random packet drops. Our objective is to find decentralized control strategies for the local controllers and the remote controller that minimize an overall quadratic performance cost over a finite horizon.

[Figure 1.1: An application of the DNCS of Chapter 3 in smart buildings]
[Figure 1.2: An application of the DNCS of Chapter 3 in control of UAVs]

Main challenges: Due to the unreliable links from the local controllers to the remote controller, the information structure of this problem is not partially nested. Furthermore, we do not assume that the underlying primitive random variables of the system are necessarily Gaussian. Therefore, we cannot a priori restrict attention to linear strategies for optimal control. This means that we have to optimize over the full (infinite-dimensional) space of control strategies rather than the finite-dimensional subspace of linear strategies.

Main ideas: We use ideas from the common information approach of [54] to compute optimal controllers. Since the state and action spaces of our problem are Euclidean spaces, the results and arguments of [54] for finite spaces cannot be directly applied. We provide a complete set of results to adapt the common information approach to our linear-quadratic setting with a non-partially nested information structure. Our rigorous proofs carefully handle the issues of measurability constraints, the existence of well-defined value functions, and infinite-dimensional strategy spaces.

Main results: We show that the optimal control strategies of this problem admit simple structures: the optimal remote control is linear in the common estimates of the system states, and each optimal local control is linear in both the common estimates of the system states and the local state of that controller. The main strengths of our result are that (i) it provides a simple strategy that is proven to be optimal; (ii) it uses estimates that can be easily updated; and (iii) it provides a tractable way of computing the gain matrices involved in the optimal strategy. In fact, our numerical experiments indicate that the computational burden of finding the optimal gain matrices in our decentralized problem is comparable to finding optimal strategies in a corresponding centralized LQ problem.

1.2.2 Chapter 4: Decentralized Control of Finite State and Jump Linear Systems Connected by Markovian Channels

Problem Statement: We consider a decentralized networked control system (DNCS) consisting of one "global" controller $C^0$ and $N$ "local" controllers $C^1$ to $C^N$, as shown in Fig. 1.3. The DNCS includes a "global" plant controlled only by the global controller and $N$ "local" plants which are controlled jointly by a co-located local controller and the global controller. The global controller is analogous to the remote controller in Chapter 3. However, in addition to controlling the local plants, here it also controls the global plant (which was absent in Chapter 3). All $N + 1$ plants have general system dynamics. We assume that the state of a local plant is perfectly observed by its co-located local controller and that the global state is perfectly observed by the global controller. Each local controller can inform the global controller of its local plant's state through an unreliable two-state Markovian communication channel. The global controller shares whatever information it has received over the unreliable channels, as well as the global state, with all local controllers. We assume that the communication channels from the global controller to the local controllers are perfect, as shown in Fig. 1.3. The objective of the controllers is to cooperatively minimize a general cost function over a finite time horizon.

[Figure 1.3: System model of the problem of Chapter 4]

Main challenges: While this problem is closely related to the problem of Chapter 3, the presence of the global state, the Markovian channels, and the general system dynamics and cost function make this problem different.

Main ideas: We use the main ideas of Chapter 3 and show that they can be extended to the problem of Chapter 4.

Main results: We first provide a dynamic program to obtain the optimal strategies of the controllers. For the case with finite state and action spaces, it is possible to solve the dynamic program numerically using POMDP (partially observable Markov decision process) solvers. For the case with switched linear dynamics and mode-dependent quadratic cost, we show that it is possible to explicitly solve the dynamic program and obtain explicit optimal strategies for all local controllers and the global controller.

1.2.3 Chapter 5: Decentralized Control over Unreliable Communication – Infinite horizon

Problem Statement: We consider the same DNCS architecture as the one described in Chapter 3. While Chapter 3 focused on the finite time horizon problem, in this chapter we attempt to find decentralized control strategies that minimize an infinite horizon average cost.

Main challenges: For the finite horizon version of this problem, we obtained optimal decentralized controllers in Chapter 3 using ideas from the common information approach. The optimal strategies in the finite horizon case are characterized by coupled Riccati recursions. In contrast to the finite horizon problem, stability is an important issue for the infinite horizon problem, and this makes it non-trivial to extend the finite horizon results to the infinite horizon.

Main ideas: We exploit a connection between our problem and an auxiliary Markov jump linear system (MJLS). This connection enables us to use the theory of Markov jump linear systems ([55, 56]) to investigate the existence of solutions to the infinite horizon/fixed point version of the coupled Riccati recursions found for our problem in Chapter 3. We use these solutions to construct provably optimal decentralized control strategies.

Main results: We characterize a critical failure probability for each unreliable link above which no decentralized control strategies can give finite cost for the DNCS. Further, when all link failure probabilities are below the thresholds, we provide an explicit characterization of the optimal strategies.

1.2.4 Chapter 6: Decentralized Control with Partially Unknown Systems

Problem Statement: We consider a multi-agent (or multi-controller) linear-quadratic (LQ) control problem consisting of three systems, a "global" system and two "local" systems, as shown in Fig. 1.4. In this problem, there are three agents: the actions of agent 1 (the global agent) can affect the global system as well as the local systems, while the actions of local agents 2 and 3 can only affect their respective co-located local systems. Further, the global system's state can affect the local systems' state evolution. We are interested in minimizing the infinite-horizon average cost incurred when the dynamics of the global system are not known to the agents, which makes this problem a multi-agent reinforcement learning (MARL) problem.

[Figure 1.4: System model of the problem of Chapter 6. Solid lines indicate communication links, dashed lines indicate control links, and dash-dot lines indicate that one system can affect another one.]

Main challenges: Our problem is a decentralized online learning problem where agents need to learn model parameters and control the system simultaneously. The asymmetry of the information among agents and the coupling in their dynamics and cost make this a difficult problem.

Main ideas: We construct an auxiliary single-agent (centralized) LQ control problem and use it for the regret analysis of our MARL problem.

Main results: We propose a Thompson Sampling (TS)-based multi-agent learning algorithm in which each agent learns the global system's dynamics independently. We show that the expected (Bayesian) regret achieved by our algorithm is upper bounded by the expected (Bayesian) regret for the auxiliary single-agent problem under a TS algorithm. Consequently, using existing results for single-agent LQ regret, our result indicates that the expected regret of our algorithm is upper bounded by $\tilde{O}(\sqrt{T})$ under certain assumptions. Our numerical experiments indicate that this bound is matched in practice.
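To make the Thompson Sampling idea above concrete, the following is a minimal, hypothetical single-agent illustration of the TS principle that Chapter 6 builds on: maintain a posterior over the unknown dynamics, sample from it, and apply a certainty-equivalent controller for the sampled model. This is not the TS-MARL algorithm itself; the scalar system, the Gaussian prior, and the deadbeat gain (used instead of the LQ-optimal Riccati gain for brevity) are all our own illustrative assumptions.

```python
import numpy as np

# Scalar Thompson-Sampling sketch: x_{t+1} = a*x_t + b*u_t + w_t with unknown a.
rng = np.random.default_rng(0)
a_true, b = 0.8, 1.0                    # true (unknown) a; b assumed known
mean, var, noise_var = 0.0, 1.0, 1.0    # Gaussian prior on a; known noise variance
x = 0.0
for t in range(1000):
    a_hat = rng.normal(mean, np.sqrt(var))   # sample dynamics from the posterior
    u = -(a_hat / b) * x                     # certainty-equivalent (deadbeat) control
    x_next = a_true * x + b * u + rng.standard_normal()
    # Conjugate Bayesian update from the observation y = a*x + w, where y = x_next - b*u
    y = x_next - b * u
    prec = 1.0 / var + x * x / noise_var
    mean = (mean / var + x * y / noise_var) / prec
    var = 1.0 / prec
    x = x_next
print(f"posterior over a: mean={mean:.3f}, std={np.sqrt(var):.3f}")
```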
1.2.5 Chapter 7: Dynamic Teams and Decentralized Control Problems with Substitutable Actions

Problem Statement: We consider an LQG dynamic team problem and an LQG decentralized control problem for which the information structure is not partially nested. Our objective is to find optimal control strategies for these two problems. We assume that the pattern of information propagation and the cost function enjoy a special property that we call "substitutability". As an example of a problem that has this property, consider the set of soccer-playing robots shown in Fig. 1.5. In this example, the action of robot A affects the information of robot C, but assume that there is no communication link from robot A to robot C due to the large distance between them. Hence, the information structure of this problem does not satisfy the partially nested property. However, since robots A and B are close to each other, robot A can share its information with robot B. Now, assume that robot B is also able to replicate the effect of robot A's action on the information structure and the cost function. We call robot B a substituting member for the pair of robots A and C. We use such substituting members to solve our dynamic team and decentralized control problems.

[Figure 1.5: An application of problems with the substitutability property]

Main challenges: The information structure of these two problems is not partially nested. Therefore, we cannot a priori restrict attention to linear strategies for optimal control, and this makes the problem of finding optimal control strategies hard.

Main ideas: We show that although these problems do not belong to the class of problems with partially nested information structures, the substitutability property allows us to circumvent the difficulties arising from the non-partially nested structure.

Main results: For the non-partially nested LQG dynamic team problem, we show that under certain conditions linear strategies are optimal. For the non-partially nested LQG decentralized control problem, the state structure can be exploited to obtain optimal control strategies with recursively updatable sufficient statistics. These results suggest that substitutability can serve as a counterpart of the information structure requirements that enable simplification of dynamic teams and decentralized control problems.

Chapter 2
Preliminaries

2.1 Notation

Random variables/vectors are denoted by upper-case letters and their realizations by the corresponding lower-case letters. For a sequence of column vectors $X, Y, Z, \dots$, the notation $\mathrm{vec}(X, Y, Z, \dots)$ denotes the vector $[X^\top, Y^\top, Z^\top, \dots]^\top$. The transpose, trace, spectral radius, spectral norm, Frobenius norm, and Moore-Penrose pseudo-inverse of a matrix $A$ are denoted by $A^\top$, $\operatorname{tr}(A)$, $\rho(A)$, $\|A\|$, $\|A\|_F$, and $A^\dagger$, respectively. In general, subscripts are used as time indices while superscripts are used to index team members/controllers. For time indices $t_1 \le t_2$, $X_{t_1:t_2}$ (resp. $g_{t_1:t_2}(\cdot)$) is shorthand for the variables $X_{t_1}, X_{t_1+1}, \dots, X_{t_2}$ (resp. the functions $g_{t_1}(\cdot), \dots, g_{t_2}(\cdot)$). Similarly, for $n_1 \le n_2$, $X^{n_1:n_2}$ (resp. $g^{n_1:n_2}(\cdot)$) is shorthand for the variables $X^{n_1}, X^{n_1+1}, \dots, X^{n_2}$ (resp. the functions $g^{n_1}(\cdot), \dots, g^{n_2}(\cdot)$). For a set $\mathcal{A} = \{\alpha_1, \dots, \alpha_N\}$, the collection $X^{\alpha_1}, \dots, X^{\alpha_N}$ (resp. $g^{\alpha_1}(\cdot), \dots, g^{\alpha_N}(\cdot)$) is denoted by $\{X^m\}_{m \in \mathcal{A}}$ (resp. $\{g^m(\cdot)\}_{m \in \mathcal{A}}$). Furthermore, the notation $X^{-\alpha_n}$ is used to denote $\{X^m\}_{m \in \mathcal{A} \setminus \{\alpha_n\}}$. The intersection of the events $E^{\alpha_1}, \dots, E^{\alpha_N}$ is denoted by $\{E^m\}_{m \in \mathcal{A}}$. In addition, for a sequence of column vectors $X^i$, $i \in \mathcal{A} = \{\alpha_1, \alpha_2, \dots, \alpha_n\}$, $X^{\mathcal{A}}$ is used to denote the vector $\mathrm{vec}(X^{\alpha_1}, X^{\alpha_2}, \dots, X^{\alpha_n})$. For two vectors $X$ and $Y$, we use $X \subset_v Y$ to denote that $X$ is a sub-vector of $Y$ and $X \not\subset_v Y$ to denote that $X$ is not a sub-vector of $Y$. If $\mathcal{A}$ is a set, we denote the cardinality of $\mathcal{A}$ by $|\mathcal{A}|$. Furthermore, if $\mathcal{A} = \{\alpha_1, \alpha_2, \dots, \alpha_n\}$, we use $[B^m]_{m \in \mathcal{A}}$ to denote a matrix composed of $B^{\alpha_1}, B^{\alpha_2}, \dots, B^{\alpha_n}$ as row blocks, that is, $[B^m]_{m \in \mathcal{A}} = [(B^{\alpha_1})^\top, (B^{\alpha_2})^\top, \dots, (B^{\alpha_n})^\top]^\top$. Similarly, $[B^{mk}]_{m \in \mathcal{A}}$ denotes a matrix composed of $B^{\alpha_1 k}, B^{\alpha_2 k}, \dots, B^{\alpha_n k}$ as row blocks.

The indicator function of a set $E$ is denoted by $\mathbb{1}_E(\cdot)$; that is, $\mathbb{1}_E(x) = 1$ if $x \in E$, and $0$ otherwise. If $E$ is an event, then $\mathbb{1}_E$ denotes the resulting random variable. $\mathbb{P}(\cdot)$, $\mathbb{E}[\cdot]$, and $\operatorname{cov}(\cdot)$ denote the probability of an event, the expectation of a random variable/vector, and the covariance matrix of a random vector, respectively. For random variables/vectors $X$ and $Y$, $\mathbb{P}(\cdot \mid Y = y)$ denotes the probability of an event given that $Y = y$, and $\mathbb{E}[X \mid y] := \mathbb{E}[X \mid Y = y]$. For a strategy $g$, we use $\mathbb{P}^g(\cdot)$ (resp. $\mathbb{E}^g[\cdot]$) to indicate that the probability (resp. expectation) depends on the choice of $g$.

Let $\Delta(\mathbb{R}^n)$ denote the set of all probability measures on $\mathbb{R}^n$ with finite second moment. For any $\theta \in \Delta(\mathbb{R}^n)$, $\theta(E) = \int_{\mathbb{R}^n} \mathbb{1}_E(x)\,\theta(dx)$ denotes the probability of the event $E$ under $\theta$. The mean and the covariance of a distribution $\theta \in \Delta(\mathbb{R}^n)$ are denoted by $\mu(\theta)$ and $\operatorname{cov}(\theta)$, respectively, and are defined as $\mu(\theta) = \int_{\mathbb{R}^n} x\,\theta(dx)$ and $\operatorname{cov}(\theta) = \int_{\mathbb{R}^n} (x - \mu(\theta))(x - \mu(\theta))^\top\,\theta(dx)$.

The notation $I_n$ and $0_{n \times m}$ is used to denote an $n \times n$ identity matrix and an $n \times m$ zero matrix, respectively. For two symmetric matrices $A, B$, $A \succeq B$ means that $(A - B)$ is positive semi-definite (PSD). For a block matrix $A$, we use $[A]_{m,:}$ to denote the block matrix located at the $m$-th block row and $[A]_{:,n}$ to denote the block matrix located at the $n$-th block column of $A$. Further, $[A]_{m,n}$ denotes the block located at the $m$-th block row and $n$-th block column of $A$. For example, if
$$A = \begin{bmatrix} A_{11} & A_{12} & A_{13} \\ A_{21} & A_{22} & A_{23} \\ A_{31} & A_{32} & A_{33} \end{bmatrix},$$
then
$$[A]_{2,:} = \begin{bmatrix} A_{21} & A_{22} & A_{23} \end{bmatrix}, \qquad [A]_{:,3} = \begin{bmatrix} A_{13} \\ A_{23} \\ A_{33} \end{bmatrix}, \qquad [A]_{2,3} = A_{23}.$$

When a random vector $X$ is normally distributed with mean $\mu$ and variance $\Sigma$, we write $X \sim \mathcal{N}(\mu, \Sigma)$. For a vector $x$ and a matrix $G$, we use $\mathrm{QF}(G, x) := x^\top G x = \operatorname{tr}(G x x^\top)$ to denote the quadratic form.
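For readers who want to experiment with the conventions above, here is a small, hypothetical numpy transcription of three of them ($\mathrm{vec}$, the quadratic form $\mathrm{QF}$, and block indexing $[A]_{m,n}$); the helper names and the uniform-block-size assumption are ours, not the thesis's.

```python
import numpy as np

def vec(*xs):
    """vec(X, Y, Z, ...) = [X', Y', Z', ...]' for column vectors."""
    return np.concatenate(xs, axis=0)

def QF(G, x):
    """QF(G, x) = x' G x = tr(G x x')."""
    return (x.T @ G @ x).item()

def block(A, m, n, s):
    """[A]_{m,n}: the (m, n) block of A, assuming uniform s x s blocks (1-indexed)."""
    return A[(m - 1) * s:m * s, (n - 1) * s:n * s]

x = np.array([[1.0], [2.0]]); y = np.array([[3.0]])
print(vec(x, y).ravel())        # [1. 2. 3.]
print(QF(np.eye(2), x))         # 5.0 = 1^2 + 2^2
A = np.arange(16.0).reshape(4, 4)
print(block(A, 2, 1, 2))        # lower-left 2 x 2 block of A
```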
2.2 Operator Definitions

We define the following operators.

• For matrices $P, Q, R, A, B$ with appropriate dimensions, we define $\Omega(P,Q,R,A,B)$ and $\Psi(P,R,A,B)$ as follows:
$$\Omega(P,Q,R,A,B) := Q + A^\top P A - A^\top P B (R + B^\top P B)^{-1} B^\top P A. \quad (2.1)$$
$$\Psi(P,R,A,B) := -(R + B^\top P B)^{-1} B^\top P A. \quad (2.2)$$

• Let $P$ be a block matrix with $M_1$ block rows and $M_2$ block columns. Then, for indices $m_1, m_2$ and a matrix $Q$, $\mathcal{L}_{\mathrm{zero}}(P,Q,m_1,m_2)$ is a matrix of the same size as $P$, equal to $Q$ in block position $(m_1, m_2)$ and zero in every other block:
$$\mathcal{L}_{\mathrm{zero}}(P,Q,m_1,m_2) := \begin{bmatrix} 0 & 0 & 0 \\ 0 & Q & 0 \\ 0 & 0 & 0 \end{bmatrix},$$
where the three block rows correspond to the row-index ranges $1{:}m_1{-}1$, $m_1$, $m_1{+}1{:}M_1$ and the three block columns to the column-index ranges $1{:}m_2{-}1$, $m_2$, $m_2{+}1{:}M_2$.

• Let $P$ be a block matrix with $M_1$ block rows and $M_1$ block columns. Then, for an index $m_1$ and a matrix $Q$, $\mathcal{L}_{\mathrm{iden}}(P,Q,m_1)$ is a matrix of the same size as $P$, equal to $Q$ in block position $(m_1, m_1)$, with identity matrices in the remaining diagonal blocks and zeros elsewhere:
$$\mathcal{L}_{\mathrm{iden}}(P,Q,m_1) := \begin{bmatrix} I & 0 & 0 \\ 0 & Q & 0 \\ 0 & 0 & I \end{bmatrix},$$
with block rows and columns corresponding to the index ranges $1{:}m_1{-}1$, $m_1$, $m_1{+}1{:}M_1$.
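As a sanity check on the definitions above, the following is a direct numpy transcription of (2.1)-(2.2) and of the two block constructions (assuming, for simplicity, uniform square blocks; the function names and this assumption are ours). Note that $\Omega$ and $\Psi$ are the familiar Riccati update and LQR feedback gain: for the classical centralized problem, $P_t = \Omega(P_{t+1}, Q, R, A, B)$ and $U_t = \Psi(P_{t+1}, R, A, B)\,X_t$.

```python
import numpy as np

def Omega(P, Q, R, A, B):
    """Omega(P,Q,R,A,B) = Q + A'PA - A'PB (R + B'PB)^{-1} B'PA, eq. (2.1)."""
    K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)  # (R + B'PB)^{-1} B'PA
    return Q + A.T @ P @ A - A.T @ P @ B @ K

def Psi(P, R, A, B):
    """Psi(P,R,A,B) = -(R + B'PB)^{-1} B'PA, eq. (2.2)."""
    return -np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)

def L_zero(P, Q, m1, m2, s):
    """Same shape as P; all blocks zero except block (m1, m2), which equals Q."""
    L = np.zeros_like(P, dtype=float)
    L[(m1 - 1) * s:m1 * s, (m2 - 1) * s:m2 * s] = Q
    return L

def L_iden(P, Q, m1, s):
    """Same shape as the square P; identity diagonal blocks except block (m1, m1) = Q."""
    L = np.eye(P.shape[0])
    L[(m1 - 1) * s:m1 * s, (m1 - 1) * s:m1 * s] = Q
    return L
```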
Furthermore, in many remotely controlled systems such as UAVs, certain low-level functions like collision avoid- ance are controlled by a local processor, but many high-level mission-related functions are remotely controlled by a ground control station [14]; 2) System-wide global references and constraints could be modeled by the remote controller’s actions. For example, the remote controller’s action can describe the target location for a robot formation problem; and 3) The remote controller can be used to model an access point or a base station that relays and broadcasts information for all local controllers. For example, in vehicle to infrastructure (V2I) communication, the remote controller/access point can relay information among a set of autonomous vehicles [64]. When the local controllers are smart sensors or encoders that can only sense and transmit information, the DNCS operation depends only on remote estimation and control. Remote 18 Chapter 3. Decentralized Control over Unreliable Communication– Finite horizon Remote Controller C 0 Local Controller C 1 Plant X 1 t U 0 t U 1 t X 1 t Z 1 t Γ 1 t Z 1:N t Local Controller C N Plant X N t U 0 t U N t X N t Z N t Γ N t Z 1:N t Figure 3.1: System model. The binary random variables Γ 1:N t indicate whether packets are transmitted successfully. Dashed lines indicate control links and solid lines indicate communication links. Blue links are perfects but red links are prone to packet drops. estimation with a single smart sensor has been studied in [65–68] and has been extended to the case with multiple smart sensors and general packet drop models in [69, 70]. Re- mote estimation and control of a linear plant has been studied in [71–76] under various channel models between smart sensors and a remote controller. The problem we consider in this chapter is different from these previous works on NCS because our problem is a decentralized control problem with multiple controllers where the dynamics of each plant is controlled by the remote controller as well as the corresponding local controller. Find- ing optimal strategies in decentralized control problems is generally considered a difficult problem (see [19, 77, 78]). In general, linear control strategies are not optimal, and even the problem of finding the best linear control strategies is not convex [23]. Existing optimal solutions of decentralized control problems require either specific information structures, such as partially nested [20, 32–34, 79, 80], stochastically nested [21], or other specific properties, such as quadratic invariance [46] or substitutability [81, 82]. A two-controller partially-nested decentralized control problem with packet drop channels from controllers to actuators but with perfect one-directional communication from controller 1 to controller 2 was investigated in [83, 84]. For the problem we consider in this chapter, none of the above properties hold either due 19 Chapter 3. Decentralized Control over Unreliable Communication– Finite horizon to the unreliable inter-controller communication or due to the nature of dynamics and cost function. We use the common information approach to show that this problem is equivalent to a centralized sequential decision-making problem where the remote controller is the only decision-maker. We provide a dynamic program to obtain the optimal strategies of the remote controller in the equivalent problem. 
to the unreliable inter-controller communication or due to the nature of the dynamics and the cost function. We use the common information approach to show that this problem is equivalent to a centralized sequential decision-making problem in which the remote controller is the only decision-maker. We provide a dynamic program to obtain the optimal strategies of the remote controller in the equivalent problem. Then, using the optimal strategies of the equivalent problem, we obtain explicit optimal strategies for all local controllers and the remote controller. In the optimal strategies, all controllers compute common estimates of the states of the plants based on the common information obtained from the communication network. The remote controller's action is linear in the common state estimates, and the action of each local controller is linear in both the actual state of its co-located plant and the common state estimates.

3.1.2 Organization

The rest of this chapter is organized as follows. We introduce the system model and formulate the DNCS problem with one local controller and one remote controller in Section 3.2. In Section 3.3, we formulate an equivalent problem using the common information approach and provide a dynamic program for this problem. We solve the dynamic program in Section 3.4. In Section 3.5, we extend the system model of Section 3.2 to the case with multiple local controllers and provide the optimal control strategies for this problem. In Section 3.6, we discuss some key aspects of our approach and results. In Section 3.7, we present some numerical experiments. Section 3.8 concludes this chapter. The proofs of all technical results of this chapter appear in Appendix A.

3.2 System Model and Problem Formulation

Consider the discrete-time system with a local controller $C^1$ and a remote controller $C^0$ shown in Fig. 3.2. The linear plant dynamics are given by
$$
X_{t+1} = AX_t + B^{10}U^0_t + B^{11}U^1_t + W_t = AX_t + BU_t + W_t, \tag{3.1}
$$
where $X_t \in \mathbb{R}^{d_X}$ is the state of the plant at time $t$, $U_t = \operatorname{vec}(U^0_t, U^1_t)$, $U^0_t \in \mathbb{R}^{d^0_U}$ is the control action of the remote controller $C^0$, $U^1_t \in \mathbb{R}^{d^1_U}$ is the control action of the local controller $C^1$, and $W_t$ is the noise at time $t$. We assume that $X_0 = 0$ and that $W_t$, $t = 0, 1, \dots$, is an i.i.d. noise process with $\operatorname{cov}(W_t) = I$. $A$ and $B = [B^{10}, B^{11}]$ are matrices with appropriate dimensions.

Figure 3.2: Two-controller system model. The binary random variable $\Gamma_t$ indicates whether packets are transmitted successfully. Dashed lines indicate control links and solid lines indicate communication links. Blue links are perfect but red links are prone to packet drops.

3.2.1 Communication Model

At each time $t$, the local controller $C^1$ perfectly observes the state $X_t$ and sends the observed state to the remote controller $C^0$ through an unreliable link with packet drop probability $p^1$. Let $\Gamma^1_t$ be a Bernoulli random variable describing the state of this link, that is, $\Gamma^1_t = 0$ when the link is broken (i.e., the packet is dropped) and $\Gamma^1_t = 1$ when the link is active. We assume that $\Gamma^1_t$, $t \geq 0$, is an i.i.d. process independent of the noise process $W_{0:t}$, $t \geq 0$. Let $Z^1_t$ be the output of the unreliable link. Then,
$$
\Gamma^1_t = \begin{cases} 1 & \text{with probability } 1 - p^1,\\ 0 & \text{with probability } p^1, \end{cases} \tag{3.2}
$$
$$
Z^1_t = \begin{cases} X_t & \text{when } \Gamma^1_t = 1,\\ \emptyset & \text{when } \Gamma^1_t = 0. \end{cases} \tag{3.3}
$$
We assume that $Z^1_t$ is perfectly observed by $C^0$. Further, we assume that $C^0$ sends an acknowledgment to the local controller $C^1$ if it receives the state value. Thus, effectively, $Z^1_t$ is perfectly observed by $C^1$ as well. The two controllers select their control actions after observing $Z^1_t$. We assume that the links for sending acknowledgments, as well as the links from the controllers to the plant, are perfectly reliable.
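To make the communication model concrete, the following Python sketch simulates one run of the loop (3.1)-(3.3). The control laws g0 and g1 are placeholders for whatever strategies are under consideration, and the use of None for the empty channel output $\emptyset$ is an implementation choice, not part of the model.

```python
import numpy as np

def simulate_two_controller(A, B10, B11, g0, g1, p1, T, rng):
    """One run of the plant (3.1) with the erasure uplink (3.2)-(3.3).

    The local controller sees X_t directly; the remote controller sees only
    the channel outputs Z_0, ..., Z_t (shared back via acknowledgments), and
    both controllers act after Z_t is revealed.
    """
    d_X = A.shape[0]
    X = np.zeros(d_X)                        # X_0 = 0
    Z_hist = []                              # common information Z_{0:t}
    for t in range(T):
        gamma = rng.random() < 1.0 - p1      # Gamma_t ~ Bernoulli(1 - p1)
        Z_hist.append(X.copy() if gamma else None)   # channel output (3.3)
        U0 = g0(Z_hist)                      # remote action: common info only
        U1 = g1(X, Z_hist)                   # local action: state + common info
        W = rng.standard_normal(d_X)         # i.i.d. noise, cov(W_t) = I
        X = A @ X + B10 @ U0 + B11 @ U1 + W  # plant update (3.1)
    return X
```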
3.2.2 Information structure and cost

Let $H^0_t$ and $H^1_t$ denote the information available to controllers $C^0$ and $C^1$, respectively, for making decisions at time $t$. Then,
$$
H^0_t = \{Z^1_{0:t}, U^0_{0:t-1}\}, \qquad H^1_t = \{X_{0:t}, U^1_{0:t-1}, Z^1_{0:t}, U^0_{0:t-1}\}. \tag{3.4}
$$
$H^0_t$ will be referred to as the common information between the two controllers at time $t$. (Footnote: We assume that $U^0_{0:t-1}$ is part of $H^1_t$. This is not a restriction because even if $U^0_{0:t-1}$ is not directly observed by $C^1$ at time $t$, $C^1$ can still compute it using $C^0$'s strategy, since it knows everything $C^0$ knows.) Let $\mathcal{H}^0_t$ and $\mathcal{H}^1_t$ be the spaces of all possible realizations of $H^0_t$ and $H^1_t$, respectively. Then, the control actions are selected according to
$$
U^0_t = g^0_t(H^0_t), \qquad U^1_t = g^1_t(H^1_t), \tag{3.5}
$$
where the control laws $g^0_t : \mathcal{H}^0_t \to \mathbb{R}^{d^0_U}$ and $g^1_t : \mathcal{H}^1_t \to \mathbb{R}^{d^1_U}$ are measurable mappings. We use $g := (g^0_0, g^0_1, \dots, g^1_0, g^1_1, \dots)$ to denote the control strategies of $C^0$ and $C^1$.

The instantaneous cost $c(X_t, U_t)$ of the system is a quadratic function given by
$$
c(X_t, U_t) = X_t^\intercal Q X_t + U_t^\intercal R U_t, \tag{3.6}
$$
where $Q$ is a symmetric positive semi-definite (PSD) matrix, and
$$
R = \begin{bmatrix} R^{00} & R^{01}\\ R^{10} & R^{11} \end{bmatrix}
$$
is a symmetric positive definite (PD) matrix.

3.2.3 Problem Formulation

Let $\mathcal{G}$ denote the set of all possible control strategies of $C^0$ and $C^1$ that ensure that all states and control actions have finite second moments. The performance of control strategies $g$ over a finite horizon $T$ is measured by the total expected cost
$$
J_T(g) := \mathbb{E}^g\left[\sum_{t=0}^{T} c(X_t, U_t)\right]. \tag{3.7}
$$
We consider the problem of strategy optimization for the above decentralized networked control system (DNCS) over a finite time horizon. This problem is formally defined below.

Problem 3.1. For the DNCS described by (3.1)-(3.6), determine decentralized control strategies $g$ that optimize the total expected cost over a finite horizon of duration $T$. In other words, solve the following strategy optimization problem:
$$
\inf_{g \in \mathcal{G}} J_T(g). \tag{3.8}
$$

Remark 3.1. Without loss of optimality, we can restrict attention to strategy profiles $g$ that ensure a finite expected cost at each time step. Because $R$ is positive definite, a finite expected cost at each time $t$ is equivalent to
$$
\mathbb{E}^g[(U^n_t)^\intercal U^n_t] = \mathbb{E}^g[g^n_t(H^n_t)^\intercal g^n_t(H^n_t)] < \infty, \quad n = 0, 1, \; \forall t. \tag{3.9}
$$
Therefore, in the subsequent analysis, we will implicitly assume that the strategy profile under consideration, $g$, ensures that for all times $t$ and for $n = 0, 1$, $g^n_t : \mathcal{H}^n_t \to \mathbb{R}^{d^n_U}$ has finite second moments, that is, (3.9) holds.

3.2.4 Characteristics of the Optimization Problem

Problem 3.1 is a decentralized optimal control problem with two controllers. Decentralized optimal control problems are generally believed to be hard because (i) linear strategies may not be globally optimal and (ii) the strategy optimization problem may be a non-convex problem over infinite-dimensional spaces [50]. For decentralized linear-quadratic-Gaussian (LQG) control problems with a partially nested information structure, however, linear control strategies are known to be optimal [20]. An information structure is partially nested if, whenever the action of a controller affects the information of another controller, the latter knows whatever the former knows. Note that Problem 3.1 is not a partially nested LQG problem.
In particular, C 1 ’s action U 1 t−1 affects X t , and consequently, it affects Z 1 t . Since Z 1 t is a part of the remote controller C 0 ’s information H 0 t at time t but H 1 t−1 6⊂ H 0 t , the 23 Chapter 3. Decentralized Control over Unreliable Communication– Finite horizon information structure in Problem 3.1 is not partially nested. Furthermore, in Problem 3.1, W 0:T are not necessarily Gaussian. Therefore, we cannot a priori assume that linear control strategies are optimal for Problem 3.1. This means we have to optimize over the full space of control strategies rather than the finite-dimensional subspace of linear strategies. Our approach to Problem 3.1 is based on the common information approach [54] for decen- tralized decision-making. We identify the common information among the controllers and use it to define a common belief on the system state. This common belief can serve as an information state for a dynamic program that characterizes optimal control strategies. Even though our conceptual approach is borrowed from [54], we have to deal with the infinite- dimensional strategy spaces of our problem and we cannot fully rely on the arguments in [54] that explicitly only deal with finite strategy spaces. 3.3 Equivalent Problem and Dynamic Program: A Common Information Approach In this section, we present the following four-step approach to derive a dynamic program- ming for Problem 3.1. 3.3.1 Identifying irrelevant information at the controllers We first provide a structural result for the local controller’s strategies. Lemma 3.1. Let ˆ H 1 t ={X t ,H 0 t }, and ˆ G 1 ={g 1 ∈G 1 :g 1 t depends only on ˆ H 1 t }. Then, inf g 0 ∈G 0 ,g 1 ∈G 1 J(g) = inf g 0 ∈G 0 g 1 ∈ ˆ G 1 J(g). (3.10) Proof. See Appendix A.2. Due to Lemma 3.1, we only need to consider strategies g 1 ∈ ˆ G 1 for the local controller C 1 . That is, the local controller C 1 only needs to use ˆ H 1 t ={X t ,H 0 t } to make the decision att. According to the information structure (3.4) and Lemma 3.1,H 0 t is the common information among the controllersC 0 andC 1 , andX t (which is ˆ H 1 t \H 0 t ) is the private information used 24 Chapter 3. Decentralized Control over Unreliable Communication– Finite horizon by the local controller C 1 in its decision-making. Note that C 0 has no private information (sinceH 0 t \H 0 t =∅). Following the common information approach [54], we construct below an equivalent centralized problem using the controllers’ common information. 3.3.2 Formulating an equivalent centralized problem Consider arbitrary control strategies g 1 ∈ ˆ G 1 and g 0 ∈G 0 for the local and the remote controllers, respectively. Under these strategies, U 1 t can be written as U 1 t =g 1 t (X t ,H 0 t ) =E g [g 1 t (X t ,H 0 t )|H 0 t ] + n g 1 t (X t ,H 0 t )−E g [g 1 t (X t ,H 0 t )|H 0 t ] o . (3.11) We can rewrite (3.11) as U 1 t = ¯ g 1 t (H 0 t ) + ˜ g 1 t (X t ,H 0 t ), (3.12) where ¯ g 1 t (H 0 t ) =E g [g 1 t (X t ,H 0 t )|H 0 t ], ˜ g 1 t (X t ,H 0 t ) =g 1 t (X t ,H 0 t )−E g [g 1 t (X t ,H 0 t )|H 0 t ]. (3.13) Observe that ˜ g 1 t (X t ,H 0 t ) is conditionally zero-mean given H 0 t , that is,E g [˜ g 1 t (X t ,H 0 t )|H 0 t ] = 0. Note that ¯ g 1 t (H 0 t ) is the conditional mean of g 1 t (X t ,H 0 t ) given the remote controller’s in- formation H 0 t and ˜ g 1 t (X t ,H 0 t ) can be interpreted as the deviation of g 1 t (X t ,H 0 t ) from the mean ¯ g 1 t (H 0 t ). 
With this interpretation, (3.12) suggests that, at each time $t$, the problem of finding the optimal control action $U^1_t$ for $C^1$ is equivalent to the problem of finding the "mean value" of $U^1_t$ and the "deviation" of $U^1_t$ from this mean value.

We will use the above representation of $g^1_t$ in terms of $\bar{g}^1_t$ and $\tilde{g}^1_t$ to formulate a centralized decision-making problem. In the centralized problem, the remote controller is the only decision-maker. At each time $t$, given the realization $h^0_t$ of the remote controller's information, it makes three decisions:

1. the remote controller's control action $u^0_t = \phi^0_t(h^0_t)$,
2. the mean value of the local controller's control action, $\bar{u}^1_t = \bar{\phi}^1_t(h^0_t)$,
3. a "deviation from the mean value" mapping $q_t \in \mathcal{Q}$, where $\mathcal{Q} = \{q : \mathbb{R}^{d_X} \to \mathbb{R}^{d^1_U}, \text{ Borel measurable}\}$ (in other words, $\mathcal{Q}$ is the set of all Borel measurable functions from $\mathbb{R}^{d_X}$ to $\mathbb{R}^{d^1_U}$) and $q_t = \tilde{\phi}^1_t(h^0_t)$.

Based on the above decisions, the control actions applied to the system described by (3.1)-(3.6) are:

- $u^0_t$, as the control action of the remote controller,
- $U^1_t = \bar{u}^1_t + q_t(X_t)$, as the control action of the local controller.

We call $u^{\mathrm{prs}}_t := (u^0_t, \bar{u}^1_t, q_t)$ the prescription at time $t$. We denote $(\phi^0_t, \bar{\phi}^1_t, \tilde{\phi}^1_t)$ by $\phi^{\mathrm{prs}}_t$ and write $u^{\mathrm{prs}}_t = \phi^{\mathrm{prs}}_t(h^0_t)$ to indicate that the prescription is a function of the common information $h^0_t$. The functions $(\phi^{\mathrm{prs}}_t, 0 \leq t \leq T)$ are collectively referred to as the prescription strategy and denoted by $\phi^{\mathrm{prs}}$. The prescription strategy is required to satisfy the following conditions:

(C1) $\phi^0 \in \mathcal{G}^0$.

(C2) Define $\phi^1_t(X_t, H^0_t) := \bar{\phi}^1_t(H^0_t) + [\tilde{\phi}^1_t(H^0_t)](X_t)$. Then, $\phi^1 \in \hat{\mathcal{G}}^1$, where the notation $[\tilde{\phi}^1_t(H^0_t)](X_t)$ means that we first use $\tilde{\phi}^1_t(H^0_t)$ to find the deviation mapping $q_t$ and then evaluate $q_t$ at $X_t$.

(C3) We require that for any $t$,
$$
\mathbb{E}^{\phi^{\mathrm{prs}}}\left\{ [\tilde{\phi}^1_t(H^0_t)](X_t) \mid H^0_t \right\} = 0, \tag{3.14}
$$
where $\mathbb{E}^{\phi^{\mathrm{prs}}}$ denotes expectation under the probability measure induced by the prescription strategy $\phi^{\mathrm{prs}}$.

Denote by $\Phi^{\mathrm{prs}}$ the set of all prescription strategies satisfying the above conditions. Consider the following problem of optimizing the prescription strategies.

Problem 3.2. Consider the system described by (3.1)-(3.6). Given a prescription strategy $\phi^{\mathrm{prs}} \in \Phi^{\mathrm{prs}}$, let
$$
\Lambda(\phi^{\mathrm{prs}}) = \mathbb{E}^{\phi^{\mathrm{prs}}}\left[\sum_{t=0}^{T} c^{\mathrm{prs}}_t(X_t, U^{\mathrm{prs}}_t)\right], \tag{3.15}
$$
where for any $x_t$ and $u^{\mathrm{prs}}_t = (u^0_t, \bar{u}^1_t, q_t)$,
$$
c^{\mathrm{prs}}_t(x_t, u^{\mathrm{prs}}_t) = c_t\big(x_t, u^0_t, \bar{u}^1_t + q_t(x_t)\big). \tag{3.16}
$$
Then, we would like to solve the following optimization problem:
$$
\inf_{\phi^{\mathrm{prs}} \in \Phi^{\mathrm{prs}}} \Lambda(\phi^{\mathrm{prs}}). \tag{3.17}
$$
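The zero-mean requirement (C3)/(3.14) on the deviation prescriptions is easy to sanity-check numerically. The sketch below splits an arbitrary local control law into its conditional mean and deviation as in (3.12)-(3.13), with the common belief approximated by samples; the particular law and belief are illustrative choices, not part of the model.

```python
import numpy as np

rng = np.random.default_rng(1)

# An arbitrary (nonlinear) local control law x -> g(x); illustrative only.
g = lambda x: np.tanh(x) + 0.5 * x**2

# Samples standing in for the common belief on X_t given H^0_t.
x_samples = rng.normal(loc=1.0, scale=2.0, size=100_000)

u1_bar = g(x_samples).mean()       # mean prescription, as in (3.13)
q = lambda x: g(x) - u1_bar        # deviation mapping q_t

# Condition (C3): the deviation integrates to zero under the common belief.
print(abs(q(x_samples).mean()))    # ~1e-16, i.e., zero up to floating point
```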
We now note that any feasible prescription strategy in Problem 3.2 can be used to construct control strategies in Problem 3.1. Conversely, any control strategies in Problem 3.1 can be represented by a prescription strategy in Problem 3.2. This equivalence between Problems 3.1 and 3.2 is formally stated in the following lemma.

Lemma 3.2. Problems 3.1 and 3.2 are equivalent in the following sense:

1. For any control strategies $g^1 \in \hat{\mathcal{G}}^1$ and $g^0 \in \mathcal{G}^0$ in Problem 3.1, there is a prescription strategy $\phi^{\mathrm{prs}} \in \Phi^{\mathrm{prs}}$ in Problem 3.2 such that for $0 \leq t \leq T$,
$$\phi^0_t(H^0_t) = g^0_t(H^0_t), \tag{3.18}$$
$$\bar{\phi}^1_t(H^0_t) = \bar{g}^1_t(H^0_t) = \mathbb{E}^g[g^1_t(X_t, H^0_t) \mid H^0_t], \tag{3.19}$$
$$[\tilde{\phi}^1_t(H^0_t)](X_t) = \tilde{g}^1_t(X_t, H^0_t) = g^1_t(X_t, H^0_t) - \mathbb{E}^g[g^1_t(X_t, H^0_t) \mid H^0_t], \tag{3.20}$$
$$\Lambda(\phi^{\mathrm{prs}}) = J(g). \tag{3.21}$$

2. Conversely, for any prescription strategy $\phi^{\mathrm{prs}} \in \Phi^{\mathrm{prs}}$ in Problem 3.2, there are control strategies $g^1 \in \hat{\mathcal{G}}^1$ and $g^0 \in \mathcal{G}^0$ in Problem 3.1 such that for $0 \leq t \leq T$,
$$g^0_t(H^0_t) = \phi^0_t(H^0_t), \tag{3.22}$$
$$g^1_t(X_t, H^0_t) = \bar{g}^1_t(H^0_t) + \tilde{g}^1_t(X_t, H^0_t) = \bar{\phi}^1_t(H^0_t) + [\tilde{\phi}^1_t(H^0_t)](X_t), \tag{3.23}$$
$$J(g) = \Lambda(\phi^{\mathrm{prs}}). \tag{3.24}$$

Proof. The proof is a straightforward extension of arguments used in [54] and is therefore omitted.

3.3.3 Identifying an information state for the remote controller

Since Problem 3.2 is a centralized decision-making problem for the remote controller $C^0$, $C^0$'s belief on the system state can be used as an information state for decision-making. Note that $C^0$'s information at any time $t$ is the common information $H^0_t$. Therefore, we define the common belief $\Theta_t$ as the conditional probability distribution of $X_t$ given $H^0_t$. That is, under prescription strategies $\phi^{\mathrm{prs}}_{0:t-1}$ until time $t-1$, for any measurable set $E \subset \mathbb{R}^{d_X}$,
$$
\Theta_t(E) := \mathbb{P}^{\phi^{\mathrm{prs}}_{0:t-1}}(X_t \in E \mid H^0_t). \tag{3.25}
$$
Then, for a given realization $h^0_t$ of $H^0_t$, the corresponding realization $\theta_t$ of $\Theta_t$ belongs to $\Delta(\mathbb{R}^{d_X})$. We show in the following that the common beliefs $\theta_t$ can be sequentially updated.

Lemma 3.3. For any feasible prescription strategy $\phi^{\mathrm{prs}} \in \Phi^{\mathrm{prs}}$ and any $h^0_t \in \mathcal{H}^0_t$, we recursively define $\nu_t(h^0_t) \in \Delta(\mathbb{R}^{d_X})$ as follows: for any measurable set $E \subset \mathbb{R}^{d_X}$,
$$
[\nu_0(h^0_0)](E) = \begin{cases} \pi_{X_0}(E) & \text{if } z^1_0 = \emptyset,\\ \mathbb{1}_E(x_0) & \text{if } z^1_0 = x_0, \end{cases} \tag{3.26}
$$
$$
[\nu_{t+1}(h^0_{t+1})](E) = [\psi_t(\nu_t(h^0_t), u^{\mathrm{prs}}_t, z^1_{t+1})](E), \tag{3.27}
$$
where $u^{\mathrm{prs}}_t = \phi^{\mathrm{prs}}_t(h^0_t)$ and $\psi_t(\nu_t(h^0_t), u^{\mathrm{prs}}_t, z^1_{t+1})$ is defined as follows:

- If $z^1_{t+1} = x_{t+1}$, then
$$
[\psi_t(\nu_t(h^0_t), u^{\mathrm{prs}}_t, z^1_{t+1})](E) = \mathbb{1}_E(x_{t+1}). \tag{3.28}
$$
- If $z^1_{t+1} = \emptyset$, then
$$
[\psi_t(\nu_t(h^0_t), u^{\mathrm{prs}}_t, \emptyset)](E) = \int\!\!\int \mathbb{1}_E\big(f_t(x_t, w_t, u^{\mathrm{prs}}_t)\big)\, \nu_t(h^0_t)(dx_t)\, \pi_{W_t}(dw_t), \tag{3.29}
$$
where
$$
f_t(x_t, w_t, u^{\mathrm{prs}}_t) = A x_t + B^{11}\big(\bar{u}^1_t + q_t(x_t)\big) + B^{10} u^0_t + w_t. \tag{3.30}
$$
Then, $\nu_t$ is the conditional probability distribution of $X_t$ given $H^0_t$, that is, $[\nu_t(H^0_t)](E) = \mathbb{P}^{\phi^{\mathrm{prs}}_{0:t-1}}(X_t \in E \mid H^0_t)$.

Proof. See Appendix A.3 for a proof.

Lemma 3.3 implies that the realization $\theta_t$ of the belief $\Theta_t$ can be updated according to
$$
\theta_{t+1} = \psi_t(\theta_t, u^{\mathrm{prs}}_t, z^1_{t+1}). \tag{3.31}
$$
Recall that $\mathcal{Q}$ is the space of all measurable functions $q : \mathbb{R}^{d_X} \to \mathbb{R}^{d^1_U}$. We now define the space $\mathcal{Q}(\theta) \subset \mathcal{Q}$ for any $\theta \in \Delta(\mathbb{R}^{d_X})$ to be
$$
\mathcal{Q}(\theta) = \Big\{ q : \mathbb{R}^{d_X} \to \mathbb{R}^{d^1_U} \text{ measurable}, \; \int q(x)\,\theta(dx) = 0 \Big\}. \tag{3.32}
$$
Note that for any feasible prescription strategy $\phi^{\mathrm{prs}} \in \Phi^{\mathrm{prs}}$, (3.14) implies that for almost every realization $h^0_t$ under $\phi^{\mathrm{prs}}$,
$$
\mathbb{E}^{\phi^{\mathrm{prs}}}[q_t(X_t) \mid h^0_t] = 0, \tag{3.33}
$$
where $q_t = \tilde{\phi}^1_t(h^0_t)$. Then, (3.33) and (3.25) imply that for almost every realization $h^0_t$, $\int q_t(x_t)\,\theta_t(dx_t) = 0$, that is, $q_t$ belongs to $\mathcal{Q}(\theta_t)$.
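The belief update of Lemma 3.3 has a simple sampled form: on a successful reception the belief collapses to a Dirac mass at the received state, and on a drop every sample is pushed through the closed-loop map (3.30). The particle approximation below is an illustration of this update, not part of the chapter's development.

```python
import numpy as np

rng = np.random.default_rng(2)

def belief_update(particles, z, A, B10, B11, u0, u1_bar, q):
    """One step of psi_t in (3.27)-(3.30), with theta_t represented by samples.

    particles : (num, d_X) array of samples approximating theta_t
    z         : received state on success, or None on a packet drop
    q         : deviation mapping q_t in Q(theta_t)
    """
    if z is not None:
        # (3.28): the belief becomes the Dirac distribution at the received state.
        return np.repeat(z[None, :], len(particles), axis=0)
    # (3.29)-(3.30): propagate each sample through f_t and add fresh noise.
    u1 = u1_bar + np.apply_along_axis(q, 1, particles)
    w = rng.standard_normal(particles.shape)
    return particles @ A.T + u1 @ B11.T + (B10 @ u0)[None, :] + w
```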
3.3.4 Writing a dynamic program for the equivalent problem

We can use the common belief $\Theta_t$ as an information state to construct a dynamic program for Problem 3.2. For that purpose, we will use the following definitions. For every $x \in \mathbb{R}^{d_X}$, we use $\rho(x)$ to denote the Dirac-delta distribution at $x$; then, for any $E \subset \mathbb{R}^{d_X}$, $[\rho(x)](E) = \mathbb{1}_E(x)$. For any $\theta_t \in \Delta(\mathbb{R}^{d_X})$, $q_t \in \mathcal{Q}(\theta_t)$, $\bar{u}^1_t \in \mathbb{R}^{d^1_U}$, $u^0_t \in \mathbb{R}^{d^0_U}$, and $u^{\mathrm{prs}}_t = (u^0_t, \bar{u}^1_t, q_t)$, we define:

- $IC(\theta_t, u^{\mathrm{prs}}_t) := \int c^{\mathrm{prs}}_t(x_t, u^{\mathrm{prs}}_t)\,\theta_t(dx_t)$. This function represents the remote controller's expected instantaneous cost at time $t$ when its belief on the system state is $\theta_t$ and it selects $u^{\mathrm{prs}}_t$.
- $\alpha^1_t := \psi_t(\theta_t, u^{\mathrm{prs}}_t, \emptyset)$ (see (3.31), and note that $\alpha^1_t \in \Delta(\mathbb{R}^{d_X})$).
- For any realization $\gamma^1_{t+1} \in \{0,1\}$ of $\Gamma^1_{t+1}$, $NB(\gamma^1_{t+1}, \alpha^1_t, x_{t+1}) := (1-\gamma^1_{t+1})\alpha^1_t + \gamma^1_{t+1}\rho(x_{t+1})$. This function represents the next-belief equation for $\theta_t$: if $\gamma^1_{t+1} = 0$, then $\theta_{t+1} = \alpha^1_t$, and if $\gamma^1_{t+1} = 1$, then $\theta_{t+1} = \rho(x_{t+1})$.
- $LS(p^1, \gamma^1_{t+1}) := (p^1)^{1-\gamma^1_{t+1}}(1-p^1)^{\gamma^1_{t+1}}$. If $\gamma^1_{t+1} = 0$, this function equals the link failure probability $p^1$; if $\gamma^1_{t+1} = 1$, it equals the probability that the link is active, that is, $1-p^1$.

The following theorem provides a dynamic program for optimal prescription strategies of Problem 3.2.

Theorem 3.1. Suppose there exist functions $V_t : \Delta(\mathbb{R}^{d_X}) \to \mathbb{R}$ for $t = 0, 1, \dots, T+1$ such that for each $\theta_t \in \Delta(\mathbb{R}^{d_X})$ the following are true:

- $V_{T+1}(\theta_t) = 0$;
- for any $t = 0, 1, \dots, T$,
$$
V_t(\theta_t) = \min_{q_t \in \mathcal{Q}(\theta_t)} \; \min_{\bar{u}^1_t \in \mathbb{R}^{d^1_U},\, u^0_t \in \mathbb{R}^{d^0_U}} \Big\{ IC(\theta_t, u^{\mathrm{prs}}_t) + \sum_{\gamma^1_{t+1} \in \{0,1\}} LS(p^1, \gamma^1_{t+1}) \int V_{t+1}\big(NB(\gamma^1_{t+1}, \alpha^1_t, x_{t+1})\big)\, \alpha^1_t(dx_{t+1}) \Big\}, \tag{3.34}
$$
where $u^{\mathrm{prs}}_t = (u^0_t, \bar{u}^1_t, q_t)$.

Further, suppose there exists a feasible prescription strategy $\phi^{\mathrm{prs}*} \in \Phi^{\mathrm{prs}}$ such that for any realization $h^0_t \in \mathcal{H}^0_t$ and its corresponding common belief $\theta_t = \nu_t(h^0_t)$ (as defined in Lemma 3.3), the prescription $u^{\mathrm{prs}*}_t = (u^{0*}_t, \bar{u}^{1*}_t, q^*_t) = \phi^{\mathrm{prs}*}(h^0_t)$ achieves the minimum in the definition of $V_t(\theta_t)$. Then, $\phi^{\mathrm{prs}*}$ is an optimal prescription strategy for Problem 3.2.

Proof. See Appendix A.4 for a proof.

If the functions $V_{0:T}$ of Theorem 3.1 can be shown to exist, then Theorem 3.1 provides a dynamic program to solve Problem 3.2. Even when such a dynamic program is available, it suffers from two significant challenges. First, it is a dynamic program on the belief space $\Delta(\mathbb{R}^{d_X})$, which is infinite-dimensional. Second, each step of the dynamic program involves functional optimization over the spaces $\mathcal{Q}(\theta_t)$. In the next section, we show that functions satisfying (3.34) exist and that it is possible to use the dynamic program of Theorem 3.1 to obtain optimal control strategies in Problem 3.1.

3.4 Optimal Control Strategies

3.4.1 Optimal prescription strategy in Problem 3.2

The following theorem presents functions $V_{0:T}$ satisfying (3.34) and an explicit optimal solution of the dynamic program in Theorem 3.1.

Theorem 3.2. For $t = 0, 1, \dots, T$, the functions $V_t(\cdot)$ of Theorem 3.1 exist and are given by
$$
V_t(\theta_t) = QF\big(P_t, \mu(\theta_t)\big) + \operatorname{tr}\big(\tilde{P}_t \operatorname{cov}(\theta_t)\big) + e_t, \tag{3.35}
$$
where (recall that $\mu(\theta_t)$ and $\operatorname{cov}(\theta_t)$ are the mean vector and the covariance matrix of the probability distribution $\theta_t$)
$$
e_t = \sum_{s=t}^{T} \operatorname{tr}\big((1-p^1)P_{s+1} + p^1 \tilde{P}_{s+1}\big). \tag{3.36}
$$
The matrices $P_t$ and $\tilde{P}_t$, defined recursively below, are symmetric positive semi-definite (PSD):
$$
P_{T+1} = 0, \tag{3.37}
$$
$$
P_t = \Omega(P_{t+1}, Q, R, A, B), \tag{3.38}
$$
$$
K_t = \Psi(P_{t+1}, R, A, B); \tag{3.39}
$$
$$
\tilde{P}_{T+1} = 0, \tag{3.40}
$$
$$
\tilde{P}_t = \Omega\big((1-p^1)P_{t+1} + p^1\tilde{P}_{t+1},\, Q,\, R^{11},\, A,\, B^{11}\big), \tag{3.41}
$$
$$
\tilde{K}_t = \Psi\big((1-p^1)P_{t+1} + p^1\tilde{P}_{t+1},\, R^{11},\, A,\, B^{11}\big). \tag{3.42}
$$
Furthermore, the optimal prescription strategy is given by
$$
\begin{bmatrix} u^{0*}_t \\ \bar{u}^{1*}_t \end{bmatrix} = \begin{bmatrix} \phi^{0*}_t(\theta_t) \\ \bar{\phi}^{1*}_t(\theta_t) \end{bmatrix} = K_t\,\mu(\theta_t), \tag{3.43}
$$
$$
q^*_t(X_t) = [\tilde{\phi}^{1*}_t(\theta_t)](X_t) = \tilde{K}_t\big(X_t - \mu(\theta_t)\big), \tag{3.44}
$$
and the optimal cost is given by
$$
J^*_T = \sum_{t=0}^{T} \operatorname{tr}\big((1-p^1)P_{t+1} + p^1\tilde{P}_{t+1}\big). \tag{3.45}
$$
Proof. See Appendix A.5 for a proof.

3.4.2 Optimal control strategies in Problem 3.1

From Theorem 3.1, Theorem 3.2, and Lemma 3.2, we can explicitly compute the optimal control strategies for Problem 3.1.

Theorem 3.3. The optimal strategies of Problem 3.1 are given by
$$
\begin{bmatrix} U^{0*}_t \\ U^{1*}_t \end{bmatrix} = K_t \hat{X}_t + \begin{bmatrix} 0 \\ \tilde{K}_t(X_t - \hat{X}_t) \end{bmatrix}, \tag{3.46}
$$
where $\hat{X}_t$ is the estimate (conditional expectation) of $X_t$ based on the common information $H^0_t$. $\hat{X}_t$ can be computed recursively according to
$$
\hat{X}_0 = 0, \tag{3.47}
$$
$$
\hat{X}_{t+1} = \begin{cases} (A + BK_t)\hat{X}_t & \text{if } Z^1_{t+1} = \emptyset,\\ X_{t+1} & \text{if } Z^1_{t+1} = X_{t+1}. \end{cases} \tag{3.48}
$$
Proof. See Appendix A.6 for a proof.

Theorem 3.3 shows that the optimal control strategy of the remote controller $C^0$ is linear in the state estimate $\hat{X}_t$, and the optimal control strategy of the local controller $C^1$ is linear in both the state $X_t$ and the state estimate $\hat{X}_t$.
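The recursions (3.37)-(3.42) and the control law of Theorem 3.3 are straightforward to implement once the operators $\Omega$ and $\Psi$ are fixed. The sketch below assumes they are the standard discrete-time Riccati and gain maps, $\Omega(P,Q,R,A,B) = Q + A^\intercal PA - A^\intercal PB(R + B^\intercal PB)^{-1}B^\intercal PA$ and $\Psi(P,R,A,B) = -(R + B^\intercal PB)^{-1}B^\intercal PA$. This is consistent with the way $K_t$ and $\tilde{K}_t$ are used above, but it is our reading of notation defined earlier in the thesis and not reproduced here.

```python
import numpy as np

def Omega(P, Q, R, A, B):
    # Assumed form of the thesis's Riccati operator (see the lead-in above).
    S = R + B.T @ P @ B
    return Q + A.T @ P @ A - A.T @ P @ B @ np.linalg.solve(S, B.T @ P @ A)

def Psi(P, R, A, B):
    # Assumed gain operator, so that u = Psi(P, R, A, B) @ x is the minimizer.
    return -np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)

def gains(A, B, B11, Q, R, R11, p1, T):
    """Backward recursions (3.37)-(3.42) producing K_t and Ktilde_t."""
    d_X = A.shape[0]
    P = np.zeros((d_X, d_X))                    # P_{T+1} = 0
    P_til = np.zeros((d_X, d_X))                # Ptilde_{T+1} = 0
    K, K_til = [None] * (T + 1), [None] * (T + 1)
    for t in range(T, -1, -1):
        K[t] = Psi(P, R, A, B)                  # (3.39)
        M = (1 - p1) * P + p1 * P_til
        K_til[t] = Psi(M, R11, A, B11)          # (3.42)
        P, P_til = Omega(P, Q, R, A, B), Omega(M, Q, R11, A, B11)  # (3.38), (3.41)
    return K, K_til

def closed_loop(A, B, K, K_til, d0, p1, T, rng):
    """Closed loop of Theorem 3.3; d0 is the remote action dimension."""
    d_X = A.shape[0]
    X, Xhat = np.zeros(d_X), np.zeros(d_X)      # X_0 = 0, Xhat_0 = 0  (3.47)
    for t in range(T + 1):
        U = K[t] @ Xhat                         # [U0; U1] = K_t Xhat_t  (3.46)
        U[d0:] += K_til[t] @ (X - Xhat)         # local correction term
        X_next = A @ X + B @ U + rng.standard_normal(d_X)
        if rng.random() < 1.0 - p1:             # uplink succeeds
            Xhat = X_next                       # (3.48), second case
        else:
            Xhat = (A + B @ K[t]) @ Xhat        # (3.48), first case
        X = X_next
    return X
```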
3.5 Extension to Multiple Local Controllers

In this section, we extend the system model of Section 3.2 to the case with multiple local controllers.

3.5.1 System Model and Problem Formulation

Instead of one local controller, we now have $N$ local controllers, $C^1, C^2, \dots, C^N$, each associated with a co-located plant, as shown in Fig. 3.3. We use $\mathcal{N}$ to denote the set $\{1, 2, \dots, N\}$ and $\bar{\mathcal{N}}$ to denote $\{0, 1, \dots, N\}$. The linear dynamics of plant $n \in \mathcal{N}$ are given by
$$
X^n_{t+1} = A^{nn}X^n_t + B^{nn}U^n_t + B^{n0}U^0_t + W^n_t, \quad t = 0, \dots, T, \tag{3.49}
$$
where $X^n_t \in \mathbb{R}^{d^n_X}$ is the state of plant $n \in \mathcal{N}$ at time $t$, $U^n_t \in \mathbb{R}^{d^n_U}$ is the control action of controller $C^n$, $n \in \bar{\mathcal{N}}$, and $A^{nn}, B^{nn}, B^{n0}$, $n \in \mathcal{N}$, are matrices with appropriate dimensions. We assume that $X^n_0 = 0$ and that $W^n_t$, $n \in \mathcal{N}$, $t \geq 0$, are i.i.d. noise processes with zero mean and $\operatorname{cov}(W^n_t) = I$. Note that we do not assume that $W^n_t$, $n \in \mathcal{N}$, $t \geq 0$, are Gaussian. The overall dynamics can be written as
$$
X_{t+1} = AX_t + BU_t + W_t, \tag{3.50}
$$
where $X_t = \operatorname{vec}(X^{1:N}_t)$, $U_t = \operatorname{vec}(U^{0:N}_t)$, $W_t = \operatorname{vec}(W^{1:N}_t)$, and $A, B$ are defined as
$$
A = \begin{bmatrix} A^{11} & & 0\\ & \ddots & \\ 0 & & A^{NN} \end{bmatrix}, \qquad
B = \begin{bmatrix} B^{10} & B^{11} & & 0\\ \vdots & & \ddots & \\ B^{N0} & 0 & & B^{NN} \end{bmatrix}. \tag{3.51}
$$

Figure 3.3: System model. The binary random variables $\Gamma^{1:N}_t$ indicate whether packets are transmitted successfully. Dashed lines indicate control links and solid lines indicate communication links. Blue links are perfect but red links are prone to packet drops.

Figure 3.4: Time-ordering of relevant variables.

Communication Model

We assume that the communication network between the local controller $C^n$, $n \in \mathcal{N}$, and the remote controller $C^0$ is the same as the one described for the local controller $C^1$ in Section 3.2.1. In particular, there exists an unreliable channel with link failure probability $p^n$ from the local controller $C^n$, $n \in \mathcal{N}$, to the remote controller $C^0$, through which the local controller $C^n$ sends the perfectly observed state $X^n_t$ of its co-located plant. The state of this channel at time $t$ is described by the Bernoulli random variable $\Gamma^n_t$, and the output of this channel at time $t$ is denoted by $Z^n_t$, where $\Gamma^n_t$ and $Z^n_t$ are described analogously to (3.2) and (3.3).

Unlike the unreliable uplinks, we assume that there exist perfect links from $C^0$ to $C^n$ for $n \in \mathcal{N}$. Therefore, $C^0$ can share $Z^{1:N}_t$ and $U^0_{t-1}$ with all local controllers $C^{1:N}$. All controllers select their control actions at time $t$ after observing $Z^{1:N}_t$. A schematic of the time-ordering of variables is shown in Fig. 3.4. We assume that for all $n \in \mathcal{N}$, the links from controllers $C^n$ and $C^0$ to plant $n$ are perfect.

Information structure and cost

Let $H^n_t$ denote the information available to controller $C^n$, $n \in \bar{\mathcal{N}}$, to make decisions at time $t$. Then,
$$
H^n_t = \{X^n_{0:t}, U^n_{0:t-1}, Z^{1:N}_{0:t}, U^0_{0:t-1}\}, \quad n \in \mathcal{N}, \qquad H^0_t = \{Z^{1:N}_{0:t}, U^0_{0:t-1}\}. \tag{3.52}
$$
Let $\mathcal{H}^n_t$ be the space of all possible realizations of $H^n_t$. Then, $C^n$'s actions are selected according to
$$
U^n_t = g^n_t(H^n_t), \quad n \in \bar{\mathcal{N}}, \tag{3.53}
$$
where $g^n_t : \mathcal{H}^n_t \to \mathbb{R}^{d^n_U}$ is a Borel measurable mapping. We use $g := (g^0_0, g^0_1, \dots, g^1_0, g^1_1, \dots, g^N_0, g^N_1, \dots)$ to denote the control strategies of $C^0$ and $C^{1:N}$. The instantaneous cost $c_t(X_t, U_t)$ of the system is a quadratic function similar to the one described in (3.6), where $X_t = \operatorname{vec}(X^{1:N}_t)$, $U_t = \operatorname{vec}(U^{0:N}_t)$, and
$$
Q = \begin{bmatrix} Q^{11} & \cdots & Q^{1N}\\ \vdots & \ddots & \vdots\\ Q^{N1} & \cdots & Q^{NN} \end{bmatrix}, \qquad
R = \begin{bmatrix} R^{00} & R^{01} & \cdots & R^{0N}\\ R^{10} & R^{11} & \cdots & R^{1N}\\ \vdots & \vdots & \ddots & \vdots\\ R^{N0} & \cdots & \cdots & R^{NN} \end{bmatrix}. \tag{3.54}
$$
$Q$ is a symmetric positive semi-definite (PSD) matrix and $R$ is a symmetric positive definite (PD) matrix.

Problem Formulation

Let $\mathcal{G}$ denote the set of all possible control strategies of $C^{0:N}$ that ensure that all states and control actions have finite second moments. The performance of control strategies $g$ over a finite horizon $T$ is measured by $J_T(g)$ as described in (3.7). We consider the problem of strategy optimization for the above DNCS over a finite time horizon. This problem is formally defined below.

Problem 3.3. For the DNCS described above, solve the following strategy optimization problem:
$$
\inf_{g \in \mathcal{G}} J_T(g). \tag{3.55}
$$

Remark 3.2. Without loss of optimality, we can restrict attention to strategy profiles $g$ that ensure a finite expected cost at each time step. Because $R$ is positive definite, a finite expected cost at each time $t$ is equivalent to
$$
\mathbb{E}^g[(U^n_t)^\intercal U^n_t] = \mathbb{E}^g[g^n_t(H^n_t)^\intercal g^n_t(H^n_t)] < \infty, \quad n \in \bar{\mathcal{N}}, \; \forall t. \tag{3.56}
$$
Therefore, in the subsequent analysis, we will implicitly assume that the strategy profile under consideration, $g$, ensures that for all times $t$ and all $n \in \bar{\mathcal{N}}$, $g^n_t : \mathcal{H}^n_t \to \mathbb{R}^{d^n_U}$ has finite second moments, that is, (3.56) holds.

3.5.2 Equivalent Problem and Dynamic Program: A Common Information Approach

In this section, we present a four-step approach to derive a dynamic program for Problem 3.3.

Identifying irrelevant information at the controllers

We first provide a structural result for the local controllers' strategies.

Lemma 3.4. For $n \in \mathcal{N}$, let $\hat{H}^n_t = \{X^n_t, H^0_t\}$ and $\hat{\mathcal{G}}^n = \{g^n \in \mathcal{G}^n : g^n_t \text{ depends only on } \hat{H}^n_t\}$. Then,
$$
\inf_{g^0 \in \mathcal{G}^0, \dots, g^N \in \mathcal{G}^N} J(g^{0:N}) = \inf_{g^0 \in \mathcal{G}^0,\; g^1 \in \hat{\mathcal{G}}^1, \dots, g^N \in \hat{\mathcal{G}}^N} J(g^{0:N}). \tag{3.57}
$$
Proof. See Appendix A.2.

Due to Lemma 3.4, we only need to consider strategies $g^n \in \hat{\mathcal{G}}^n$ for the local controller $C^n$, $n \in \mathcal{N}$. That is, the local controller $C^n$ only needs to use $\hat{H}^n_t = \{X^n_t, H^0_t\}$ to make the decision at $t$. According to the information structure (3.52) and Lemma 3.4, $H^0_t$ is the common information among $C^{0:N}$, and $X^n_t$ (which is $\hat{H}^n_t \setminus H^0_t$) is the private information used by the local controller $C^n$ in its decision-making. Note that $C^0$ has no private information (since $H^0_t \setminus H^0_t = \emptyset$). Following the common information approach [54], we construct below an equivalent centralized problem using the controllers' common information.

Formulating an equivalent centralized problem

Consider arbitrary control strategies $g^n \in \hat{\mathcal{G}}^n$, $n \in \mathcal{N}$, and $g^0 \in \mathcal{G}^0$ for the local and the remote controllers, respectively. Under these strategies, $U^n_t$ can be written as
$$
U^n_t = g^n_t(X^n_t, H^0_t) = \mathbb{E}^g[g^n_t(X^n_t, H^0_t) \mid H^0_t] + \left\{ g^n_t(X^n_t, H^0_t) - \mathbb{E}^g[g^n_t(X^n_t, H^0_t) \mid H^0_t] \right\}. \tag{3.58}
$$
We can rewrite (3.58) as
$$
U^n_t = \bar{g}^n_t(H^0_t) + \tilde{g}^n_t(X^n_t, H^0_t), \tag{3.59}
$$
where
$$
\bar{g}^n_t(H^0_t) = \mathbb{E}^g[g^n_t(X^n_t, H^0_t) \mid H^0_t], \qquad \tilde{g}^n_t(X^n_t, H^0_t) = g^n_t(X^n_t, H^0_t) - \mathbb{E}^g[g^n_t(X^n_t, H^0_t) \mid H^0_t]. \tag{3.60}
$$
Observe that $\tilde{g}^n_t(X^n_t, H^0_t)$ is conditionally zero-mean given $H^0_t$, that is, $\mathbb{E}^g[\tilde{g}^n_t(X^n_t, H^0_t) \mid H^0_t] = 0$. Note that $\bar{g}^n_t(H^0_t)$ is the conditional mean of $g^n_t(X^n_t, H^0_t)$ given the remote controller's information $H^0_t$, and $\tilde{g}^n_t(X^n_t, H^0_t)$ can be interpreted as the deviation of $g^n_t(X^n_t, H^0_t)$ from the mean $\bar{g}^n_t(H^0_t)$. With this interpretation, (3.59) suggests that, at each time $t$, the problem of finding the optimal control action $U^n_t$ for $C^n$ is equivalent to the problem of finding the "mean value" of $U^n_t$ and the "deviation" of $U^n_t$ from this mean value.

We will use the above representation of $g^n_t$ in terms of $\bar{g}^n_t$ and $\tilde{g}^n_t$ to formulate a centralized decision-making problem. In the centralized problem, the remote controller is the only decision-maker. At each time $t$, given the realization $h^0_t$ of the remote controller's information, it makes three decisions:

1. the remote controller's control action $u^0_t = \phi^0_t(h^0_t)$,
2. the mean value of every local controller's control action, $\bar{u}^n_t = \bar{\phi}^n_t(h^0_t)$, $n \in \mathcal{N}$,
3. a "deviation from the mean value" mapping $q^n_t \in \mathcal{Q}^n$, $n \in \mathcal{N}$, where $\mathcal{Q}^n = \{q^n : \mathbb{R}^{d^n_X} \to \mathbb{R}^{d^n_U}, \text{ Borel measurable}\}$ (in other words, $\mathcal{Q}^n$ is the set of all Borel measurable functions from $\mathbb{R}^{d^n_X}$ to $\mathbb{R}^{d^n_U}$) and $q^n_t = \tilde{\phi}^n_t(h^0_t)$.

Based on the above decisions, the control actions applied to the system described by (3.49)-(3.54) are:

- $u^0_t$, as the control action of the remote controller,
- $U^n_t = \bar{u}^n_t + q^n_t(X^n_t)$, as the control action of the $n$-th local controller, $n \in \mathcal{N}$.

We call $u^{\mathrm{prs}}_t := (u^0_t, \bar{u}^{1:N}_t, q^{1:N}_t)$ the prescription at time $t$. We denote $(\phi^0_t, \bar{\phi}^{1:N}_t, \tilde{\phi}^{1:N}_t)$ by $\phi^{\mathrm{prs}}_t$ and write $u^{\mathrm{prs}}_t = \phi^{\mathrm{prs}}_t(h^0_t)$ to indicate that the prescription is a function of the common information $h^0_t$. The functions $(\phi^{\mathrm{prs}}_t, 0 \leq t \leq T)$ are collectively referred to as the prescription strategy and denoted by $\phi^{\mathrm{prs}}$. The prescription strategy is required to satisfy the following conditions:

(C1) $\phi^0 \in \mathcal{G}^0$.

(C2) Define $\phi^n_t(X^n_t, H^0_t) := \bar{\phi}^n_t(H^0_t) + [\tilde{\phi}^n_t(H^0_t)](X^n_t)$.
Then, $\phi^n \in \hat{\mathcal{G}}^n$ for any $n \in \mathcal{N}$, where the notation $[\tilde{\phi}^n_t(H^0_t)](X^n_t)$ means that we first use $\tilde{\phi}^n_t(H^0_t)$ to find the deviation mapping $q^n_t$ and then evaluate $q^n_t$ at $X^n_t$.

(C3) We require that for any $t$,
$$
\mathbb{E}^{\phi^{\mathrm{prs}}}\left\{ [\tilde{\phi}^n_t(H^0_t)](X^n_t) \mid H^0_t \right\} = 0, \tag{3.61}
$$
where $\mathbb{E}^{\phi^{\mathrm{prs}}}$ denotes expectation under the probability measure induced by the prescription strategy $\phi^{\mathrm{prs}}$.

Denote by $\Phi^{\mathrm{prs}}$ the set of all prescription strategies satisfying the above conditions. Consider the following problem of optimizing the prescription strategies.

Problem 3.4. Consider the system described by (3.49)-(3.54). Given a prescription strategy $\phi^{\mathrm{prs}} \in \Phi^{\mathrm{prs}}$, let
$$
\Lambda(\phi^{\mathrm{prs}}) = \mathbb{E}^{\phi^{\mathrm{prs}}}\left[\sum_{t=0}^{T} c^{\mathrm{prs}}_t(X^{1:N}_t, U^{\mathrm{prs}}_t)\right], \tag{3.62}
$$
where for any $x^{1:N}_t$ and $u^{\mathrm{prs}}_t = (u^0_t, \bar{u}^{1:N}_t, q^{1:N}_t)$,
$$
c^{\mathrm{prs}}_t(x^{1:N}_t, u^{\mathrm{prs}}_t) = c_t\big(x^{1:N}_t, u^0_t, \{\bar{u}^n_t + q^n_t(x^n_t)\}_{n \in \mathcal{N}}\big). \tag{3.63}
$$
Then, we would like to solve the following optimization problem:
$$
\inf_{\phi^{\mathrm{prs}} \in \Phi^{\mathrm{prs}}} \Lambda(\phi^{\mathrm{prs}}). \tag{3.64}
$$

We now note that any feasible prescription strategy in Problem 3.4 can be used to construct control strategies in Problem 3.3. Conversely, any control strategies in Problem 3.3 can be represented by a prescription strategy in Problem 3.4. This equivalence between Problems 3.3 and 3.4 is formally stated in the following lemma.

Lemma 3.5. Problems 3.3 and 3.4 are equivalent in the following sense:

1. For any control strategies $g^n \in \hat{\mathcal{G}}^n$ and $g^0 \in \mathcal{G}^0$ in Problem 3.3, there is a prescription strategy $\phi^{\mathrm{prs}} \in \Phi^{\mathrm{prs}}$ in Problem 3.4 such that for $0 \leq t \leq T$,
$$\phi^0_t(H^0_t) = g^0_t(H^0_t), \tag{3.65}$$
$$\bar{\phi}^n_t(H^0_t) = \bar{g}^n_t(H^0_t) = \mathbb{E}^g[g^n_t(X^n_t, H^0_t) \mid H^0_t], \quad \forall n \in \mathcal{N}, \tag{3.66}$$
$$[\tilde{\phi}^n_t(H^0_t)](X^n_t) = \tilde{g}^n_t(X^n_t, H^0_t) = g^n_t(X^n_t, H^0_t) - \mathbb{E}^g[g^n_t(X^n_t, H^0_t) \mid H^0_t], \quad \forall n \in \mathcal{N}, \tag{3.67}$$
$$\Lambda(\phi^{\mathrm{prs}}) = J(g^{0:N}). \tag{3.68}$$

2. Conversely, for any prescription strategy $\phi^{\mathrm{prs}} \in \Phi^{\mathrm{prs}}$ in Problem 3.4, there are control strategies $g^n \in \hat{\mathcal{G}}^n$ and $g^0 \in \mathcal{G}^0$ in Problem 3.3 such that for $0 \leq t \leq T$,
$$g^0_t(H^0_t) = \phi^0_t(H^0_t), \tag{3.69}$$
$$g^n_t(X^n_t, H^0_t) = \bar{g}^n_t(H^0_t) + \tilde{g}^n_t(X^n_t, H^0_t) = \bar{\phi}^n_t(H^0_t) + [\tilde{\phi}^n_t(H^0_t)](X^n_t), \quad \forall n \in \mathcal{N}, \tag{3.70}$$
$$J(g^{0:N}) = \Lambda(\phi^{\mathrm{prs}}). \tag{3.71}$$

Proof. The proof is a straightforward extension of arguments used in [54] and is therefore omitted.

Identifying an information state for the remote controller

Since Problem 3.4 is a centralized decision-making problem for the remote controller $C^0$, $C^0$'s belief on the system states can be used as an information state for decision-making. Note that $C^0$'s information at any time $t$ is the common information $H^0_t$. Therefore, we define the common belief $\Theta_t$ as the conditional probability distribution of $X^{1:N}_t$ given $H^0_t$. That is, under prescription strategies $\phi^{\mathrm{prs}}_{0:t-1}$ until time $t-1$, for any measurable set $E \subset \prod_{n=1}^{N} \mathbb{R}^{d^n_X}$,
$$
\Theta_t(E) := \mathbb{P}^{\phi^{\mathrm{prs}}_{0:t-1}}(\operatorname{vec}(X^{1:N}_t) \in E \mid H^0_t). \tag{3.72}
$$
Let $\Theta^n_t$ denote the marginal common belief on $X^n_t$. That is, for any measurable set $E^n \subset \mathbb{R}^{d^n_X}$,
$$
\Theta^n_t(E^n) := \mathbb{P}^{\phi^{\mathrm{prs}}_{0:t-1}}(X^n_t \in E^n \mid H^0_t). \tag{3.73}
$$
Then, for a given realization $h^0_t$ of $H^0_t$, the corresponding realization $\theta_t$ of $\Theta_t$ belongs to $\Delta\big(\prod_{n=1}^{N} \mathbb{R}^{d^n_X}\big)$, and the realization $\theta^n_t$ of $\Theta^n_t$ belongs to $\Delta(\mathbb{R}^{d^n_X})$, $n \in \mathcal{N}$.
Since the plants' dynamics are coupled only through the remote controller's actions, which belong to the common information, the common belief has the following conditional independence property.

Lemma 3.6. Consider a feasible prescription strategy $\phi^{\mathrm{prs}} \in \Phi^{\mathrm{prs}}$. Then, the random vectors $X^{1:N}_t$ are conditionally independent given the common information $H^0_t$. That is, for any measurable sets $E^n \subset \mathbb{R}^{d^n_X}$, $n \in \mathcal{N}$,
$$
\Theta_t\Big(\prod_{n=1}^{N} E^n\Big) = \prod_{n=1}^{N} \Theta^n_t(E^n), \tag{3.74}
$$
where $\Theta_t$ and $\Theta^n_t$ are given by (3.72) and (3.73).

Proof. The proof is a direct consequence of Part 2 of Claim A.2 in Appendix A.1.

Remark 3.3. The conditional independence of the states $X^{1:N}_t$ given the common information, as described above in Lemma 3.6, is similar to the conditional independence of the global state given the common information in [85, Lemma 6]. However, our model is different from the one considered in [85].

From Lemma 3.6, the joint common belief $\Theta_t$ can be represented by the collection of marginal common beliefs $\Theta^{1:N}_t$. We show in the following that the marginal common beliefs $\theta^n_t$, $n \in \mathcal{N}$, can be sequentially updated.

Lemma 3.7. For any feasible prescription strategy $\phi^{\mathrm{prs}} \in \Phi^{\mathrm{prs}}$ and any $h^0_t \in \mathcal{H}^0_t$, we recursively define $\nu^n_t(h^0_t) \in \Delta(\mathbb{R}^{d^n_X})$ as follows: for any measurable set $E^n \subset \mathbb{R}^{d^n_X}$,
$$
[\nu^n_0(h^0_0)](E^n) = \begin{cases} \pi_{X^n_0}(E^n) & \text{if } z^n_0 = \emptyset,\\ \mathbb{1}_{E^n}(x^n_0) & \text{if } z^n_0 = x^n_0, \end{cases} \tag{3.75}
$$
$$
[\nu^n_{t+1}(h^0_{t+1})](E^n) = [\psi^n_t(\nu^n_t(h^0_t), u^{\mathrm{prs}}_t, z^n_{t+1})](E^n), \tag{3.76}
$$
where $u^{\mathrm{prs}}_t = \phi^{\mathrm{prs}}_t(h^0_t)$ and $\psi^n_t(\nu^n_t(h^0_t), u^{\mathrm{prs}}_t, z^n_{t+1})$ is defined as follows:

- If $z^n_{t+1} = x^n_{t+1}$, then
$$
[\psi^n_t(\nu^n_t(h^0_t), u^{\mathrm{prs}}_t, z^n_{t+1})](E^n) = \mathbb{1}_{E^n}(x^n_{t+1}). \tag{3.77}
$$
- If $z^n_{t+1} = \emptyset$, then
$$
[\psi^n_t(\nu^n_t(h^0_t), u^{\mathrm{prs}}_t, \emptyset)](E^n) = \int\!\!\int \mathbb{1}_{E^n}\big(f^n_t(x^n_t, w^n_t, u^{\mathrm{prs}}_t)\big)\, \nu^n_t(h^0_t)(dx^n_t)\, \pi_{W^n_t}(dw^n_t), \tag{3.78}
$$
where
$$
f^n_t(x^n_t, w^n_t, u^{\mathrm{prs}}_t) = A^{nn}x^n_t + B^{nn}\big(\bar{u}^n_t + q^n_t(x^n_t)\big) + B^{n0}u^0_t + w^n_t. \tag{3.79}
$$
Then, $\nu^n_t$ is the conditional probability distribution of $X^n_t$ given $H^0_t$, that is, $[\nu^n_t(H^0_t)](E^n) = \mathbb{P}^{\phi^{\mathrm{prs}}_{0:t-1}}(X^n_t \in E^n \mid H^0_t)$.

Proof. See Appendix A.3 for a proof.

Lemma 3.7 implies that the realization $\theta^n_t$ of the belief $\Theta^n_t$ can be updated according to
$$
\theta^n_{t+1} = \psi^n_t(\theta^n_t, u^{\mathrm{prs}}_t, z^n_{t+1}). \tag{3.80}
$$
Recall that $\mathcal{Q}^n$ is the space of all measurable functions $q : \mathbb{R}^{d^n_X} \to \mathbb{R}^{d^n_U}$. We now define the space $\mathcal{Q}^n(\theta^n) \subset \mathcal{Q}^n$ for any $\theta^n \in \Delta(\mathbb{R}^{d^n_X})$ to be
$$
\mathcal{Q}^n(\theta^n) = \Big\{ q^n : \mathbb{R}^{d^n_X} \to \mathbb{R}^{d^n_U} \text{ measurable}, \; \int q^n(x^n)\,\theta^n(dx^n) = 0 \Big\}. \tag{3.81}
$$
Note that for any feasible prescription strategy $\phi^{\mathrm{prs}} \in \Phi^{\mathrm{prs}}$, (3.61) implies that for almost every realization $h^0_t$ under $\phi^{\mathrm{prs}}$,
$$
\mathbb{E}^{\phi^{\mathrm{prs}}}[q^n_t(X^n_t) \mid h^0_t] = 0, \tag{3.82}
$$
where $q^n_t = \tilde{\phi}^n_t(h^0_t)$. Then, (3.82) and (3.73) imply that for almost every realization $h^0_t$, $\int q^n_t(x^n_t)\,\theta^n_t(dx^n_t) = 0$, that is, $q^n_t$ belongs to $\mathcal{Q}^n(\theta^n_t)$.

Writing a dynamic program for the equivalent problem

We can use the collection of marginal common beliefs $\Theta^{1:N}_t$ as an information state to construct a dynamic program for Problem 3.4. For that purpose, we will use the following definitions. For every $x \in \mathbb{R}^{d_X}$, we use $\rho(x)$ to denote the Dirac-delta distribution at $x$; then, for any $E \subset \mathbb{R}^{d_X}$, $[\rho(x)](E) = \mathbb{1}_E(x)$.
For all $n \in \mathcal{N}$ and any $\theta^n_t \in \Delta(\mathbb{R}^{d^n_X})$, $q^n_t \in \mathcal{Q}^n(\theta^n_t)$, $\bar{u}^n_t \in \mathbb{R}^{d^n_U}$, $u^0_t \in \mathbb{R}^{d^0_U}$, and $u^{\mathrm{prs}}_t = (u^0_t, \bar{u}^{1:N}_t, q^{1:N}_t)$, we define:

- $IC(\theta^{1:N}_t, u^{\mathrm{prs}}_t) := \int c^{\mathrm{prs}}_t(x^{1:N}_t, u^{\mathrm{prs}}_t) \prod_{n \in \mathcal{N}} \theta^n_t(dx^n_t)$. This function represents the remote controller's expected instantaneous cost at time $t$ when its beliefs on the system states are $\theta^{1:N}_t$ and it selects $u^{\mathrm{prs}}_t$.
- $\alpha^n_t := \psi^n_t(\theta^n_t, u^{\mathrm{prs}}_t, \emptyset)$ (see (3.80), and note that $\alpha^n_t \in \Delta(\mathbb{R}^{d^n_X})$).
- For any realization $\gamma^n_{t+1} \in \{0,1\}$ of $\Gamma^n_{t+1}$, $NB(\gamma^n_{t+1}, \alpha^n_t, x^n_{t+1}) := (1-\gamma^n_{t+1})\alpha^n_t + \gamma^n_{t+1}\rho(x^n_{t+1})$. This function represents the next-belief equation for $\theta^n_t$: if $\gamma^n_{t+1} = 0$, then $\theta^n_{t+1} = \alpha^n_t$, and if $\gamma^n_{t+1} = 1$, then $\theta^n_{t+1} = \rho(x^n_{t+1})$.
- $LS(p^n, \gamma^n_{t+1}) := (p^n)^{1-\gamma^n_{t+1}}(1-p^n)^{\gamma^n_{t+1}}$. If $\gamma^n_{t+1} = 0$, this function equals the link failure probability $p^n$; if $\gamma^n_{t+1} = 1$, it equals the probability that the link is active, that is, $1-p^n$.

The following theorem provides a dynamic program for optimal prescription strategies of Problem 3.4.

Theorem 3.4. Suppose there exist functions $V_t : \prod_{m=1}^{N} \Delta(\mathbb{R}^{d^m_X}) \to \mathbb{R}$ for $t = 0, 1, \dots, T+1$ such that for each $\theta^{1:N}_t \in \prod_{m=1}^{N} \Delta(\mathbb{R}^{d^m_X})$ the following are true:

- $V_{T+1}(\theta^{1:N}_t) = 0$;
- for any $t = 0, 1, \dots, T$,
$$
V_t(\theta^{1:N}_t) = \min_{\{q^n_t \in \mathcal{Q}^n(\theta^n_t)\}_{n \in \mathcal{N}}} \; \min_{\{\bar{u}^n_t \in \mathbb{R}^{d^n_U}\}_{n \in \mathcal{N}},\, u^0_t \in \mathbb{R}^{d^0_U}} \Big\{ IC(\theta^{1:N}_t, u^{\mathrm{prs}}_t) + \sum_{\gamma^1_{t+1} \in \{0,1\}} \cdots \sum_{\gamma^N_{t+1} \in \{0,1\}} \prod_{n \in \mathcal{N}} LS(p^n, \gamma^n_{t+1}) \int V_{t+1}\Big(\big\{NB(\gamma^n_{t+1}, \alpha^n_t, x^n_{t+1})\big\}_{n \in \mathcal{N}}\Big) \prod_{n \in \mathcal{N}} \alpha^n_t(dx^n_{t+1}) \Big\}, \tag{3.83}
$$
where $u^{\mathrm{prs}}_t = (u^0_t, \bar{u}^{1:N}_t, q^{1:N}_t)$.

Further, suppose there exists a feasible prescription strategy $\phi^{\mathrm{prs}*} \in \Phi^{\mathrm{prs}}$ such that for any realization $h^0_t \in \mathcal{H}^0_t$ and its corresponding common beliefs $\theta^n_t = \nu^n_t(h^0_t)$, $n \in \mathcal{N}$ (as defined in Lemma 3.7), the prescription $u^{\mathrm{prs}*}_t = (u^{0*}_t, \bar{u}^{1:N*}_t, q^{1:N*}_t) = \phi^{\mathrm{prs}*}(h^0_t)$ achieves the minimum in the definition of $V_t(\theta^{1:N}_t)$. Then, $\phi^{\mathrm{prs}*}$ is an optimal prescription strategy for Problem 3.4.

Proof. See Appendix A.4 for a proof.

If the functions $V_{0:T}$ of Theorem 3.4 can be shown to exist, then Theorem 3.4 provides a dynamic program to solve Problem 3.4. Even when such a dynamic program is available, it suffers from two significant challenges. First, it is a dynamic program on the belief space $\prod_{n=1}^{N} \Delta(\mathbb{R}^{d^n_X})$, which is infinite-dimensional. Second, each step of the dynamic program involves functional optimization over the spaces $\mathcal{Q}^n(\theta^n_t)$, $n \in \mathcal{N}$. In the next section, we show that functions satisfying (3.83) exist and that it is possible to use the dynamic program of Theorem 3.4 to obtain optimal control strategies in Problem 3.3.

3.5.3 Optimal Control Strategies

Optimal prescription strategy in Problem 3.4

The following theorem presents functions $V_{0:T}$ satisfying (3.83) and an explicit optimal solution of the dynamic program in Theorem 3.4.

Theorem 3.5. For $t = 0, 1, \dots, T$, the functions $V_t(\cdot)$ of Theorem 3.4 exist and are given by
$$
V_t(\theta^{1:N}_t) = QF\big(P_t, \operatorname{vec}(\{\mu(\theta^n_t)\}_{n \in \mathcal{N}})\big) + \sum_{n=1}^{N} \operatorname{tr}\big(\tilde{P}^{nn}_t \operatorname{cov}(\theta^n_t)\big) + e_t, \tag{3.84}
$$
where (recall that $\mu(\theta^n_t)$ and $\operatorname{cov}(\theta^n_t)$ are the mean vector and the covariance matrix of the probability distribution $\theta^n_t$)
$$
e_t = \sum_{s=t}^{T} \sum_{n=1}^{N} \operatorname{tr}\Big(\big((1-p^n)P^{nn}_{s+1} + p^n\tilde{P}^{nn}_{s+1}\big)\operatorname{cov}(\pi_{W^n_s})\Big). \tag{3.85}
$$
The matrices $P_t$ and $\tilde{P}^{nn}_t$, $n \in \mathcal{N}$, defined recursively below, are symmetric positive semi-definite (PSD), and $P^{nn}_t$ is the $n$-th diagonal block of $P_t$.
$$
P_{T+1} = 0, \tag{3.86}
$$
$$
P_t = \Omega(P_{t+1}, Q, R, A, B), \tag{3.87}
$$
$$
K_t = \Psi(P_{t+1}, R, A, B); \tag{3.88}
$$
and, for $n \in \mathcal{N}$,
$$
\tilde{P}^{nn}_{T+1} = 0, \tag{3.89}
$$
$$
\tilde{P}^{nn}_t = \Omega\big((1-p^n)P^{nn}_{t+1} + p^n\tilde{P}^{nn}_{t+1},\, Q^{nn},\, R^{nn},\, A^{nn},\, B^{nn}\big), \tag{3.90}
$$
$$
\tilde{K}^n_t = \Psi\big((1-p^n)P^{nn}_{t+1} + p^n\tilde{P}^{nn}_{t+1},\, R^{nn},\, A^{nn},\, B^{nn}\big). \tag{3.91}
$$
Furthermore, the optimal prescription strategy is given by
$$
\begin{bmatrix} u^{0*}_t \\ \bar{u}^{1*}_t \\ \vdots \\ \bar{u}^{N*}_t \end{bmatrix} = \begin{bmatrix} \phi^{0*}_t(\theta^{1:N}_t) \\ \bar{\phi}^{1*}_t(\theta^{1:N}_t) \\ \vdots \\ \bar{\phi}^{N*}_t(\theta^{1:N}_t) \end{bmatrix} = K_t \begin{bmatrix} \mu(\theta^1_t) \\ \vdots \\ \mu(\theta^N_t) \end{bmatrix}, \tag{3.92}
$$
$$
q^{n*}_t(X^n_t) = [\tilde{\phi}^{n*}_t(\theta^{1:N}_t)](X^n_t) = \tilde{K}^n_t\big(X^n_t - \mu(\theta^n_t)\big), \tag{3.93}
$$
and the optimal cost is given by
$$
J^*_T = \sum_{t=0}^{T} \sum_{n=1}^{N} \operatorname{tr}\big((1-p^n)P^{nn}_{t+1} + p^n\tilde{P}^{nn}_{t+1}\big). \tag{3.94}
$$
Proof. See Appendix A.5 for a proof.

Remark 3.4. Consider a centralized version of our problem where at each time $t$ the remote controller knows all the states $X^{1:N}_t$ and chooses all control actions $U^{0:N}_t$. The solution to this centralized problem is $U_t = K_t X_t$, where $K_t$ is as given in (3.88), and (3.86)-(3.87) are the standard Riccati recursion for the centralized problem.

Remark 3.5. The recursions of Theorem 3.5, and hence the optimal prescription strategy, do not depend on the covariances of the initial state and the noises.

Optimal control strategies in Problem 3.3

From Theorem 3.4, Theorem 3.5, and Lemma 3.5, we can explicitly compute the optimal control strategies for Problem 3.3.

Theorem 3.6. The optimal strategies of Problem 3.3 are given by
$$
\begin{bmatrix} U^{0*}_t \\ U^{1*}_t \\ \vdots \\ U^{N*}_t \end{bmatrix} = K_t \begin{bmatrix} \hat{X}^1_t \\ \vdots \\ \hat{X}^N_t \end{bmatrix} + \begin{bmatrix} 0 \\ \tilde{K}^1_t(X^1_t - \hat{X}^1_t) \\ \vdots \\ \tilde{K}^N_t(X^N_t - \hat{X}^N_t) \end{bmatrix}, \tag{3.95}
$$
where $\hat{X}^n_t$ is the estimate (conditional expectation) of $X^n_t$ based on the common information $H^0_t$. $\hat{X}^n_t$ can be computed recursively according to
$$
\hat{X}^n_0 = 0, \tag{3.96}
$$
$$
\hat{X}^n_{t+1} = \begin{cases} \big([A]_{n,:} + [B]_{n,:}K_t\big)\hat{X}_t & \text{if } Z^n_{t+1} = \emptyset,\\ X^n_{t+1} & \text{if } Z^n_{t+1} = X^n_{t+1}, \end{cases} \tag{3.97}
$$
where $\hat{X}_t = \operatorname{vec}(\hat{X}^{1:N}_t)$ and $[A]_{n,:}$, $[B]_{n,:}$ denote the $n$-th block rows of $A$ and $B$.

Proof. See Appendix A.6 for a proof.

Theorem 3.6 shows that the optimal control strategy of the remote controller $C^0$ is linear in the state estimates $\hat{X}^{1:N}_t$, and the optimal control strategy of the local controller $C^n$, $n \in \mathcal{N}$, is linear in both the state $X^n_t$ and the state estimates $\hat{X}^{1:N}_t$.

Remark 3.6. According to Theorems 3.5 and 3.6, the gain matrices $K_{0:T}$ and $\tilde{K}^n_{0:T}$, $n \in \mathcal{N}$, are calculated offline; only the estimates $\hat{X}^{1:N}_t$ are computed online at each time $t$ using (3.96)-(3.97).
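The multi-controller recursions (3.86)-(3.91) and the online estimate update (3.96)-(3.97) can be sketched in a few lines of Python. As in the earlier sketch, Omega and Psi are assumed to be the standard Riccati and gain maps (our reading of the thesis's operators, redefined here so the sketch is self-contained), and the block bookkeeping via index lists is an implementation choice.

```python
import numpy as np

def Omega(P, Q, R, A, B):
    # Assumed standard Riccati map, as in the earlier sketch.
    S = R + B.T @ P @ B
    return Q + A.T @ P @ A - A.T @ P @ B @ np.linalg.solve(S, B.T @ P @ A)

def Psi(P, R, A, B):
    return -np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)

def decentralized_gains(A, B, blocks, Q, R, p, T):
    """Backward recursions (3.86)-(3.91).

    blocks : list of tuples (Ann, Bnn, Qnn, Rnn, rows) for n = 1..N, where
             `rows` holds the indices of subsystem n's states inside X_t
    p      : list of link failure probabilities p_n
    """
    d_X = A.shape[0]
    P = np.zeros((d_X, d_X))                               # P_{T+1} = 0
    P_til = [np.zeros((len(rows), len(rows))) for *_, rows in blocks]
    K = [None] * (T + 1)
    K_til = [[None] * (T + 1) for _ in blocks]
    for t in range(T, -1, -1):
        K[t] = Psi(P, R, A, B)                             # (3.88)
        for n, (Ann, Bnn, Qnn, Rnn, rows) in enumerate(blocks):
            Pnn = P[np.ix_(rows, rows)]                    # n-th diagonal block
            M = (1 - p[n]) * Pnn + p[n] * P_til[n]
            K_til[n][t] = Psi(M, Rnn, Ann, Bnn)            # (3.91)
            P_til[n] = Omega(M, Qnn, Rnn, Ann, Bnn)        # (3.90)
        P = Omega(P, Q, R, A, B)                           # (3.87)
    return K, K_til

def update_estimates(Xhat, X_next, received, A, B, K_t, row_blocks):
    """Common-estimate update (3.96)-(3.97); received[n] is True when
    Z^n_{t+1} = X^n_{t+1}, i.e., uplink n succeeded."""
    pred = (A + B @ K_t) @ Xhat               # one-step prediction of all blocks
    for n, rows in enumerate(row_blocks):
        if received[n]:
            pred[rows] = X_next[rows]         # reset block n to the received state
    return pred
```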
3.6 Discussion

3.6.1 Common Information Approach

As stated before, our approach to Problem 3.3 is conceptually based on the common information approach of [54]. Due to the perfect downlinks, the remote controller's information $H^0_t$ is common information among all controllers, and hence the remote controller can serve as a coordinator that provides prescriptions to all controllers to compute their optimal control actions.

Although we conceptually follow the common information approach of [54], we had to develop some new technical arguments to adapt this approach to our problem. The technical argument in [54] is proven for finite state and action spaces. While the authors in [54] state that their results should apply to more general spaces, this was not explicitly proven. In our model, both the state and the action spaces are Euclidean. This has several implications:

(a) Firstly, unlike the case with finite state and action spaces, where the set of feasible strategies is a finite set, the set of feasible strategies in our problem is an infinite-dimensional space. This is not merely a difference in the size of the problem. It means that in our version of the coordinator's problem, the common belief is a conditional probability measure on a Euclidean space, and the coordinator's decision is to be selected from an infinite-dimensional space of all mappings from one Euclidean space to another.

(b) Because of the features of the coordinator's problem described above, it is not known a priori whether well-defined, measurable value functions satisfying the dynamic program of Theorem 3.4 actually exist. Note that this existence was trivially true in the finite case of [54].

(c) Further, in the dynamic program of Theorem 3.4, it is not known a priori whether a minimizing prescription for the coordinator exists at each step of the dynamic program and for each possible common belief. Even if such minimizing prescriptions were known to exist, it is still unclear whether a coordination strategy that selects the minimizing prescription for each possible common belief is even measurable. Clearly, if a minimizer-selecting strategy is not measurable, it is not feasible, because we cannot even define the expectations involved in the problem.

For these reasons, our Theorem 3.4 provides only sufficient conditions for optimality: Theorem 3.4 is useful only if well-defined, measurable value functions and minimizer-selecting strategies can be shown to exist. All of these difficulties are trivially absent from [54] due to the assumed finiteness of the spaces involved.

While other works have used the common information approach for linear strategies [32, 50], these again bypass the technical difficulties described above. This is because (i) linear strategies imply a finite-dimensional strategy space, and (ii) in the context of LQG problems, linear strategies result in Gaussian common beliefs, which can be replaced by their mean and covariance in the coordinator's dynamic program. Thus, both the belief space and the set
of prescriptions are finite-dimensional Euclidean spaces, and one can use straightforward modifications of [54] here. To the best of our knowledge, this is the first result that explicitly shows that the common information approach for decentralized control is not confined to the realm of problems where state/action spaces are finite or problems that presuppose linear strategies. Even though our strategy space allows for arbitrary measurable functions, we were able to adapt the common information approach to find explicit optimal strategies.

3.6.2 Structure of Optimal Controllers

As discussed in Section 3.5.1, the information structure of Problem 3.3 is not partially nested due to the unreliable links. Nevertheless, the information structure of Problem 3.3 behaves in a way similar to a partially nested structure in the following sense: if the states of the uplinks are fixed a priori to a certain realization, that is, $\Gamma^{1:N}_t = \gamma^{1:N}_t \in \{0,1\}^N$ for all $t$, the information structure of the problem becomes partially nested. Therefore, we would expect the optimal controllers to have a linear structure if the realization of $\Gamma^{1:N}_t$, $t = 0, \dots, T$, were known in advance. Theorem 3.6 shows that the linearity of the controllers holds even when $\Gamma^{1:N}_t$, $t = 0, \dots, T$, are not fixed in advance but only causally observed: given the realization of $\Gamma^{1:N}_{0:t}$, the common estimates $\hat{X}^{1:N}_t$ are linear in the available common information, and the optimal control action of $C^n$ is linear in the actual state $X^n_t$ and the common state estimates $\hat{X}^{1:N}_t$.

3.6.3 Special cases

3.6.3.1 No control action for some controllers

Our model can also capture the situation in which some controllers participate in the communication but do not take any control action. In particular, the situation in which controller $C^n$, $n \in \bar{\mathcal{N}}$, has no action can be captured in the system model of Section 3.5 by setting $B^{nn} = 0$ (if $n \in \mathcal{N}$), or $B^{m0} = 0$ for all $m \in \mathcal{N}$ (if $n = 0$), $R^{nn} = I$, and $R^{mn}, R^{nm}$ to be zero matrices for all $m \in \bar{\mathcal{N}} \setminus \{n\}$. Then, from Theorem 3.6, the optimal action $U^{n*}_t$ is zero, which means that controller $n$ takes no action.

3.6.3.2 Decoupled systems

Consider the system model of Section 3.5, where the dynamics of plant $n$ in (3.49) and the instantaneous cost of sub-system $n$ (that is, plant $n$ and the local controller $C^n$ collectively) in (3.6) are affected only by the $n$-th component of the remote controller's action $U^0_t$, denoted by $[U^0_t]_{n,:}$. Specifically,
$$
X^n_{t+1} = A^{nn}X^n_t + B^{nn}U^n_t + \bar{B}^{n0}[U^0_t]_{n,:} + W^n_t, \quad t = 0, \dots, T, \qquad
c_t(X_t, U_t) = \sum_{n=1}^{N} c^n_t\big(X^n_t, [U^0_t]_{n,:}, U^n_t\big), \tag{3.98}
$$
where $c^n_t$ is a quadratic function of the form (3.6). We can still use Theorem 3.6 to find optimal control strategies in this model. However, it is more efficient to consider the system as consisting of $N$ decomposed remote controllers $C^{0n}$, $n \in \mathcal{N}$, where the remote controller $C^{0n}$ is associated with only subsystem $n$. The problem of finding optimal strategies then decomposes into $N$ separate problems, each with one remote and one local controller. Each subproblem is a special case of Problem 3.3. Problems with one local and one remote controller were also investigated in our prior work [86].

3.6.3.3 Always active links

Consider an instance of Problem 3.3 where the links from the local controllers to the remote controller are always active, that is, $\Gamma^n_t = 1$ for all $n \in \mathcal{N}$ and all $t = 0, \dots, T$. Note that in this case we have $Z^n_t = X^n_t$ for all $n \in \mathcal{N}$, $t = 0, \dots, T$. Hence, Problem 3.3 effectively becomes a centralized problem. The optimal strategies of this problem can be calculated using Theorem 3.6 as $U^*_t = K_t X_t$, where $K_t$ is computed recursively using (3.86)-(3.88). These results are identical to the standard results for the centralized LQ control problem under the cost function of (3.6).

3.6.3.4 Always failed links

Consider an instance of Problem 3.3 where the links from the local controllers to the remote controller always fail, that is, $\Gamma^n_t = 0$ for all $n \in \mathcal{N}$, $t = 0, \dots, T$. In this case, the optimal control strategies are given by
$$
\begin{bmatrix} U^{0*}_t \\ U^{1*}_t \\ \vdots \\ U^{N*}_t \end{bmatrix} = K_t \begin{bmatrix} \hat{X}^1_t \\ \vdots \\ \hat{X}^N_t \end{bmatrix} + \begin{bmatrix} 0 \\ \tilde{K}^1_t(X^1_t - \hat{X}^1_t) \\ \vdots \\ \tilde{K}^N_t(X^N_t - \hat{X}^N_t) \end{bmatrix},
$$
where $K_t$ is computed recursively using (3.86)-(3.88). Furthermore, $\tilde{K}^n_t$ is computed recursively using (3.89)-(3.91) by setting $p^n = 1$ for all $n \in \mathcal{N}$. Note that in the case of always failed links we have $Z^n_t = \emptyset$ for all $n \in \mathcal{N}$, $t = 0, \dots, T$. According to (3.52), in this case $H^0_t = \{U^0_{0:t-1}\}$ and $H^n_t = \{X^n_{0:t}, U^n_{0:t-1}\} \cup H^0_t$.
Problem 3.3 now becomes an $(N+1)$-controller problem with a partially nested information structure. More specifically, in this case $C^0$'s action $U^0_{t-1}$ affects the local controller $C^n$'s information $H^n_t$, but since $H^0_{t-1} \subset H^n_t$, the information structure is partially nested. This partially nested $(N+1)$-controller problem has been studied in [87] with linear time-invariant plants and controllers, continuous-time dynamics, and the objective of finding the optimal linear time-invariant controllers that minimize an infinite-horizon quadratic cost.

Remark 3.7. A setup with a partially nested information structure similar to [87] but with finite state and action spaces was studied in [88]. However, [88] only provides a dynamic program without explicitly solving it. The finiteness of the state/action spaces, the absence of unreliable communication, and the lack of an explicit solution make this work very different from ours.

3.7 Numerical Experiments

In this section, we apply the result of Theorem 3.6 to an instance of Problem 3.3 and to its corresponding centralized LQ problem (see Section 3.6.3.3). The purpose of this example is to show that finding optimal strategies for an arbitrary instance of Problem 3.3 using our results is computationally efficient and comparable to solving the corresponding centralized LQ problem.

Consider an instance of Problem 3.3 with one remote controller and $N$ local controllers over a time horizon of duration $T = 1000$. We assume that $d^n_X = d_X = 3$ for all $n \in \mathcal{N}$ and $d^n_U = d_U = 3$ for all $n \in \bar{\mathcal{N}}$. We want to measure the running time of computing the optimal control strategies for this problem and for its corresponding centralized LQ problem. In order to make sure that our comparison does not depend on the particular choices of system matrices, we calculate the running time for 100 different instances of Problem 3.3 and their corresponding centralized LQ problems. In each iteration, each entry of the system matrices is chosen randomly and independently according to a uniform distribution on the interval [0, 20]. In particular, in each iteration we generate a matrix $A$, a matrix $B$, and a symmetric PD matrix $R$ randomly. (To generate a $d \times d$ symmetric PD matrix, we generate the $d(d+1)/2$ free entries randomly, check whether the resulting symmetric matrix is PD, and, if not, repeat this process until a PD matrix is obtained.) Further, the random variables $W^{1:N}_{0:T}$ are chosen according to independent Gaussian distributions with zero mean and identity covariance matrices. (According to Remark 3.5, the computation of the optimal strategies does not depend on the covariances of the initial states and the noises.) For this problem setup, we perform the following two experiments:

- We generate a set of random variables $\Gamma^n_t$, $n \in \mathcal{N}$, $t = 0, \dots, T$, according to a Bernoulli distribution with $\mathbb{P}(\Gamma^n_t = 0) = p^n = 0.5$ for all $n \in \mathcal{N}$. We use Theorem 3.6 to compute the optimal control strategies and measure the time required for this computation.
- Next, we fix $\Gamma^n_t = 1$ for all $n \in \mathcal{N}$, $t = 0, \dots, T$. In this case, Problem 3.3 effectively becomes a centralized problem. We use the special case of Section 3.6.3.3 to calculate the optimal control strategies and measure the time required for this computation. Note that the runtime for this experiment is simply the time required for computing the optimal control strategies for a centralized LQ problem with the aforementioned system matrices.

The experiments were performed on a MacBook Pro with an Intel 3 GHz Core i7 processor and 16 GB of memory.
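For reference, the structure of this timing experiment can be reproduced with the sketch below, which generates one random instance (including the rejection-sampled PD matrix described above) and times the gain recursions. It reuses decentralized_gains from the earlier sketch, and all sizes and entry ranges follow the text.

```python
import time
import numpy as np

rng = np.random.default_rng(3)

def random_pd(d):
    """Rejection sampling of a symmetric PD matrix, as described in the text:
    draw the free entries uniformly on [0, 20] and retry until the result is
    positive definite (this may require many retries for large d)."""
    while True:
        M = np.triu(rng.uniform(0.0, 20.0, (d, d)))
        M = M + np.triu(M, 1).T                    # symmetrize
        if np.all(np.linalg.eigvalsh(M) > 0):
            return M

def time_instance(N, d_X=3, d_U=3, T=1000, p_fail=0.5):
    A = np.zeros((N * d_X, N * d_X))
    B = np.zeros((N * d_X, (N + 1) * d_U))
    Q = random_pd(N * d_X)                         # PD is in particular PSD
    R = random_pd((N + 1) * d_U)
    blocks = []
    for n in range(N):
        rows = list(range(n * d_X, (n + 1) * d_X))
        cols = list(range((n + 1) * d_U, (n + 2) * d_U))
        Ann = rng.uniform(0, 20, (d_X, d_X))
        Bnn = rng.uniform(0, 20, (d_X, d_U))
        A[np.ix_(rows, rows)] = Ann                # block-diagonal A, as in (3.51)
        B[np.ix_(rows, range(d_U))] = rng.uniform(0, 20, (d_X, d_U))  # B^{n0}
        B[np.ix_(rows, cols)] = Bnn                # diagonal blocks B^{nn}
        blocks.append((Ann, Bnn, Q[np.ix_(rows, rows)], R[np.ix_(cols, cols)], rows))
    start = time.perf_counter()
    decentralized_gains(A, B, blocks, Q, R, [p_fail] * N, T)
    return time.perf_counter() - start
```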
Tables 3.1 and 3.2 show the average running times for instances of Problem 3.3 with unreliable links and for their corresponding centralized LQ problems (with always active links) for different values of $N$. As can be seen from Tables 3.1 and 3.2, applying our results to an arbitrary instance of Problem 3.3 is computationally comparable to finding optimal strategies in its corresponding centralized LQ problem.

Table 3.1: Average running time in seconds for computing the optimal strategies for 100 instances of Problem 3.3 with unreliable links.

N (# of local controllers):    1        10       100       1000
Average running time (s):      0.347    1.191    32.353    19163.20

Table 3.2: Average running time in seconds for computing the optimal strategies for the corresponding centralized LQ problems (with always active links).

N (# of local controllers):    1        10       100       1000
Average running time (s):      0.101    0.441    26.015    18512.49

3.8 Conclusion

We considered two problems in this chapter: a decentralized optimal control problem for a linear plant controlled by a local controller and a remote controller, and the extension of this model in which there exist a remote controller and a collection of linear plants, each associated with a local controller. For the system model with multiple local controllers, we assumed that each local controller directly observes the state of its co-located plant and can inform the remote controller of the plant's state through an unreliable uplink channel. The downlink channels from the remote controller to the local controllers are assumed to be perfect. The objective of the local controllers and the remote controller is to cooperatively minimize a quadratic performance cost over a finite time horizon. For each of these two problems, we employed the common information approach and showed that the problem is equivalent to a centralized sequential decision-making problem in which the remote controller is the only decision-maker. We provided a dynamic program to obtain optimal strategies in the equivalent problem. Then, using these optimal strategies for the equivalent problem, we obtained optimal control strategies for all local controllers and the remote controller in our original problem. In the optimal control strategies, all controllers compute common estimates of the states of the plants based on the common information obtained from the communication network. The remote controller's action is linear in the common state estimates, and the action of each local controller is linear in both the actual state of its corresponding plant and the common state estimates.

Our results sketch a solution methodology for decentralized control with unreliable communication among controllers. The methodology can potentially be generalized to other communication topologies in decentralized control, such as directed acyclic communication graphs with unreliable links.

Chapter 4

Decentralized Control of Finite State and Jump Linear Systems Connected by Markovian Channels

4.1 Introduction

Networked control systems (NCSs), wherein feedback control loops are closed via communication networks, have been extensively studied in the literature (see [57, 89–91] and references therein).
Typical models of such systems involve a single controller controlling the plant. Decentralized networked control systems (DNCSs), on the other hand, consist of multiple controllers controlling an overall system while communicating over a network [92]. The DNCS architecture provides a way for multiple controllers to coordinate their actions in order to improve the overall system performance. However, the design and analysis of DNCSs face various challenges due to the fact that decentralized control problems are generally difficult to solve (see [19, 77, 78]). These difficulties include: (i) linear strategies may not be optimal; (ii) simple sufficient statistics for control strategies may not be available; (iii) the strategy optimization problem may be a non-convex infinite-dimensional problem [23]. Existing methods for solving decentralized control problems require either specific information structures, such as static [18], partially nested [20, 32–34, 79, 80], stochastically nested [21], switched partially nested [93], or quadratically invariant [46] structures, or other specific properties such as substitutability [94], which make the decentralized control problem "simpler" than the general problem.

In this chapter, we consider a DNCS consisting of one "global" controller $C^0$ and $N$ "local" controllers $C^1$ to $C^N$ as shown in Fig. 4.1. The DNCS includes a "global" plant controlled only by the global controller and $N$ "local" plants which are controlled jointly by a co-located local controller and the global controller. All $N+1$ plants have general system dynamics. We assume that the state of a local plant is perfectly observed by its co-located local controller and the global state is perfectly observed by the global controller. Each local controller can inform the global controller of its local plant's state through an unreliable two-state Markovian communication channel. The global controller shares whatever information it has received over the unreliable channels, as well as the global state, with all local controllers. We assume that the communication channels from the global controller to the local controllers are perfect, as shown in Fig. 4.1. The objective of the controllers is to cooperatively minimize a general cost function over a finite time horizon.

In Chapter 3, we studied a closely related problem where there is no global plant, the channel states are independent over time, and the dynamics of the local plants are linear. The model considered in this chapter differs from the model of Chapter 3 in the following key respects:

• The presence of the global state allows us to model a wider range of applications. For example, as we will describe in Section 4.5, the global state can be used to capture a switching global mode for the system. This allows us to model non-linear effects such as abrupt environmental disturbances, component failures or repairs, abrupt changes in the operating point, and changes in the control objective [55].

• The Markovian channel can better model the communication system in practice, as the assumption that channel states are independent over time (made in Chapter 3) may not hold for real-world systems (see the simulation sketch after this list).

• The general system dynamics and cost function allow us to study discrete state and action space models (Section 4.4) and switched linear system models (Section 4.5) using a unified methodology.
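As a concrete illustration of the two-state Markovian channel, the following sketch simulates a time-invariant link state process $\Gamma_t$; the parameter values are ours (hypothetical) and are chosen only to contrast the bursty Markovian model with the i.i.d. channel of Chapter 3.

    import numpy as np

    rng = np.random.default_rng(1)

    def simulate_markov_channel(p01, p10, T, gamma0=1):
        # Two-state Markovian channel Gamma_t in {0, 1}:
        # p01 = P(Gamma_{t+1} = 1 | Gamma_t = 0), p10 = P(Gamma_{t+1} = 0 | Gamma_t = 1).
        gamma = np.empty(T + 1, dtype=int)
        gamma[0] = gamma0
        for t in range(T):
            if gamma[t] == 0:
                gamma[t + 1] = int(rng.uniform() < p01)
            else:
                gamma[t + 1] = int(rng.uniform() >= p10)
        return gamma

    # A "bursty" channel: failures tend to persist, unlike an i.i.d. channel
    # with the same long-run drop rate p10 / (p01 + p10).
    gamma = simulate_markov_channel(p01=0.1, p10=0.1, T=10_000)
    print("empirical drop rate:", 1 - gamma.mean())

With $p^n_t(0,1) = p^n_t(1,0) = 0.1$, the long-run drop rate is 0.5, as for an i.i.d. channel with $p^n = 0.5$, but the failures occur in long runs rather than independently at each time step.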
For the problem we consider in this chapter, we first provide a dynamic program to obtain the optimal strategies of the controllers. For the case with finite state and action spaces, it is possible to solve the dynamic program numerically using POMDP (partially observable Markov decision process) solvers. For the case with switched linear dynamics and a mode-dependent quadratic cost, we show that it is possible to explicitly solve the dynamic program and obtain explicit optimal strategies for all local controllers and the global controller.

4.1.1 Organization

The rest of the chapter is organized as follows. We introduce the system model and formulate the multi-controller DNCS problem in Section 4.2. In Section 4.3, we provide a dynamic program for this problem. Section 4.4 specializes the dynamic program to problems with finite state and action spaces. In Section 4.5, we solve the dynamic program for problems with switched linear dynamics and mode-dependent quadratic cost. In Section 4.6, we present some applications of the system model of this chapter. Section 4.7 concludes the chapter. The proofs of all the technical results of the chapter appear in Appendix B.

4.2 System Model and Problem Formulation

Consider the discrete-time system with $N$ local controllers $C^1, C^2, \ldots, C^N$ and one global controller^1 $C^0$ as shown in Fig. 4.1. The set of $N$ local controllers is denoted by $\mathcal{N}$, that is, $\mathcal{N} = \{1, 2, \ldots, N\}$. We use $\bar{\mathcal{N}}$ to denote the set of all controllers, that is, $\bar{\mathcal{N}} = \{0\} \cup \mathcal{N}$.

The dynamics of global plant 0 are given by
\[ S^0_{t+1} = f^0_t(S^0_t, U^0_t, V^0_t), \quad (4.1) \]
where $S^0_t \in \mathcal{S}^0_t$ is the state of plant 0 at time $t$, $U^0_t \in \mathcal{U}^0_t$ is the control action of the global controller $C^0$, and $V^0_t \in \mathcal{V}^0_t$ is the noise of the global plant. The dynamics of local plant $n \in \mathcal{N}$ are given by
\[ S^n_{t+1} = f^n_t(S^n_t, S^0_t, U^n_t, U^0_t, V^n_t), \quad (4.2) \]
where $S^n_t \in \mathcal{S}^n_t$ is the state of plant $n$ at time $t$, $U^n_t \in \mathcal{U}^n_t$ is the control action of controller $C^n$, and $V^n_t \in \mathcal{V}^n_t$ is the noise of plant $n$. For $n \in \bar{\mathcal{N}}$, $S^n_0$ is a random vector with distribution $\pi_{S^n_0}$ and $V^n_t \in \mathcal{V}^n_t$ is a zero-mean vector with distribution $\pi_{V^n_t}$. $S^{0:N}_0, V^{0:N}_{0:T}$ are independent random vectors with finite second moments. The spaces $\mathcal{S}^n_t$, $\mathcal{U}^n_t$, and $\mathcal{V}^n_t$ are either finite or Euclidean.

^1 The global controller is analogous to the remote controller in Chapter 3. However, in addition to controlling the local plants, here it also controls the global plant.

Figure 4.1: System model. The binary random variables $\Gamma^{1:N}_t$ indicate whether packets are transmitted successfully. Blue lines indicate perfect links and red lines indicate unreliable links. Solid lines are communication links and dotted lines are control links.

Figure 4.2: Two-state Markovian channel from $C^n$ to $C^0$, with transition probabilities $p^n_t(i,j)$ between states 0 and 1.

At each time $t$, the local controller $C^n$, $n \in \mathcal{N}$, perfectly observes the state $S^n_t$ and sends the observed state to the global controller $C^0$ through an unreliable two-state Markovian channel. Let $\Gamma^n_t$ be the state of this channel at time $t$, with $\Gamma^n_t = 1$ meaning that the channel is active and $\Gamma^n_t = 0$ meaning that the channel is inactive.
$\Gamma^n_t$ evolves according to the transition probabilities $p^n_t(i,j) = P(\Gamma^n_{t+1} = j \mid \Gamma^n_t = i)$ for $i, j \in \{0, 1\}$, as shown in Fig. 4.2. We assume that the Markov chains $\{\Gamma^1_{0:T}\}, \ldots, \{\Gamma^N_{0:T}\}$ are independent of each other and of $S^{0:N}_0$ and $V^{0:N}_{0:t}$.

Let $Z^n_t$ be the output of the channel between the local controller $C^n$ and the global controller $C^0$. Then,
\[ Z^n_t = \begin{cases} S^n_t & \text{when } \Gamma^n_t = 1, \\ \emptyset & \text{when } \Gamma^n_t = 0. \end{cases} \quad (4.3) \]
We assume that there exist perfect links from $C^0$ to $C^{1:N}$. Therefore, $C^0$ can share $Z^{1:N}_t$ as well as the global plant state $S^0_t$ with $C^{1:N}$. All controllers select their control actions after observing $S^0_t$ and $Z^{1:N}_t$.

Let $H^n_t$ denote the information available to $C^n$, $n \in \bar{\mathcal{N}}$, to make decisions at time $t$. Then^2,
\[ H^0_t = \{S^0_{0:t}, U^0_{0:t-1}, Z^{1:N}_{0:t}, \Gamma^{1:N}_{0:t}\}, \quad (4.4) \]
\[ H^n_t = \{S^n_{0:t}, U^n_{0:t-1}\} \cup H^0_t, \quad \forall n \in \mathcal{N}. \quad (4.5) \]
Note that we include $\Gamma^{1:N}_{0:t}$ in $H^0_t$ because, according to (4.3), all controllers can infer $\Gamma^n_t$ after observing $Z^n_t$. Let $\mathcal{H}^n_t$ be the space of all realizations of $H^n_t$, $n \in \bar{\mathcal{N}}$. Then, $C^n$'s actions are selected according to
\[ U^n_t = \lambda^n_t(H^n_t), \quad \forall n \in \bar{\mathcal{N}}, \quad (4.6) \]
where $\lambda^n_t : \mathcal{H}^n_t \to \mathcal{U}^n_t$ is a Borel measurable mapping. The collection of mappings $\lambda^n_0, \ldots, \lambda^n_T$ is called the strategy of controller $C^n$ and is denoted by $\lambda^n$. The collection of all controllers' strategies $\lambda^{0:N}$ is called the strategy profile.

At time $t$, the system incurs a cost $c_t(S^{0:N}_t, U^{0:N}_t)$. The system runs for a time horizon $T$. The performance of strategies $\lambda^{0:N}$ is the total expected cost given by
\[ J(\lambda^{0:N}) = \mathbb{E}^{\lambda^{0:N}}\left[ \sum_{t=0}^{T} c_t(S^{0:N}_t, U^{0:N}_t) \right]. \quad (4.7) \]

^2 $U^0_{t-1}$ is not directly observed by $C^n$ at time $t$, but $C^n$ can obtain $U^0_{t-1}$ because $U^0_{t-1} = \lambda^0_{t-1}(H^0_{t-1})$ and $H^0_{t-1} \subset H^n_t$.

Let $\Lambda^n$ be the set of all control strategies for $C^n$, $n \in \bar{\mathcal{N}}$. The optimal control problem for $C^{0:N}$ is formally defined below.

Problem 4.1. For the system model described by (4.1)–(4.7), we would like to solve the following strategy optimization problem:
\[ \inf_{\lambda^0 \in \Lambda^0, \ldots, \lambda^N \in \Lambda^N} J(\lambda^{0:N}). \quad (4.8) \]

Remark 4.1. Without loss of optimality, we can restrict attention to strategy profiles $\lambda^{0:N}$ that ensure a finite expected cost at each time step. Therefore, in the subsequent analysis we will implicitly assume that the strategy profile under consideration, $\lambda^{0:N}$, ensures that for all times $t$ and all $n \in \bar{\mathcal{N}}$, the action $U^n_t = \lambda^n_t(H^n_t)$ has a finite second moment.

4.3 Main Result for the General System Model

In this section, we present a dynamic-program-based characterization of the solution of Problem 4.1. Our approach is based on the common information approach [54] for decentralized decision-making. We first provide a structural result for the local controllers' strategies.

Lemma 4.1. Let $\hat{H}^n_t = \{S^n_t, H^0_t\}$ and $\hat{\Lambda}^n = \{\lambda^n \in \Lambda^n : \lambda^n_t \text{ depends only on } \hat{H}^n_t\}$. Then,
\[ \inf_{\lambda^0 \in \Lambda^0, \lambda^1 \in \Lambda^1, \ldots, \lambda^N \in \Lambda^N} J(\lambda^{0:N}) = \inf_{\lambda^0 \in \Lambda^0, \lambda^1 \in \hat{\Lambda}^1, \ldots, \lambda^N \in \hat{\Lambda}^N} J(\lambda^{0:N}). \quad (4.9) \]

Proof. The proof is a straightforward extension of the arguments used in the proof of Lemma 3.4 in Appendix A.2 and is therefore omitted.

Due to Lemma 4.1, we only need to consider strategies $\lambda^n \in \hat{\Lambda}^n$ for the local controller $C^n$, $n \in \mathcal{N}$. That is, $C^n$ only needs to use $\hat{H}^n_t = \{S^n_t, H^0_t\}$ to make its decision at time $t$.

Characterizing the solution of Problem 4.1

According to the information structure (4.5) and Lemma 4.1, $H^0_t$ is the common information among $C^{0:N}$.
We define the common belief $\Theta_t$ as the conditional probability distribution of $\mathrm{vec}(S^{1:N}_t)$ given $H^0_t$. That is, under any strategies $\lambda^{0:N}_{0:t-1}$ where $\lambda^0_t \in \Lambda^0$ and $\lambda^n_t \in \hat{\Lambda}^n$, $n \in \mathcal{N}$, and for any measurable set $E \subset \prod_{n \in \mathcal{N}} \mathcal{S}^n_t$,
\[ \Theta_t(E) := P^{\lambda^{0:N}_{0:t-1}}(\mathrm{vec}(S^{1:N}_t) \in E \mid H^0_t). \quad (4.10) \]
Let $\Theta^n_t$, $n \in \mathcal{N}$, denote the marginal common belief on $S^n_t$. That is, for any measurable set $E^n \subset \mathcal{S}^n_t$,
\[ \Theta^n_t(E^n) := P^{\lambda^{0:N}_{0:t-1}}(S^n_t \in E^n \mid H^0_t). \quad (4.11) \]
Then, for any realization $h^0_t$ of $H^0_t$, the realization of $\Theta_t$ is $\theta_t \in \Delta(\prod_{n \in \mathcal{N}} \mathcal{S}^n_t)$ and the realization of $\Theta^n_t$ is $\theta^n_t \in \Delta(\mathcal{S}^n_t)$, $n \in \mathcal{N}$.

Since the dynamics of the local plants are coupled only through the state of the global plant and the action of the global controller, $\Theta_t$ has the following conditional independence property.

Lemma 4.2. Consider any strategies $\lambda^{0:N}_{0:t-1}$ where $\lambda^0_t \in \Lambda^0$ and $\lambda^n_t \in \hat{\Lambda}^n$, $n \in \mathcal{N}$. Then, the random vectors $S^{1:N}_t$ are conditionally independent given the common information $H^0_t$. That is, for any measurable sets $E^n \subset \mathcal{S}^n_t$, $n \in \mathcal{N}$,
\[ \Theta_t\Big(\prod_{n=1}^{N} E^n\Big) = \prod_{n=1}^{N} \Theta^n_t(E^n), \quad (4.12) \]
where $\Theta_t$ and $\Theta^n_t$ are given by (4.10) and (4.11).

Proof. The proof is a straightforward extension of Lemma 3.6. The only difference is that in Lemma 3.6 there is no global state $S^0_t$. However, since $S^0_t$ is included in the common information $H^0_t$, one can easily follow the proof of Lemma 3.6 to show that $S^{1:N}_t$ are conditionally independent given $H^0_t$.

Due to Lemma 4.2, the joint common belief $\Theta_t$ can be represented by the collection of marginal common beliefs $\Theta^{1:N}_t$. We show in the following lemma that for any realization $h^0_t$ of $H^0_t$, the marginal common beliefs $\theta^n_t$, $n \in \mathcal{N}$, can be sequentially updated.

Lemma 4.3. For any $h^0_t \in \mathcal{H}^0_t$, let $\pi^n_t$ be a mapping from $\mathcal{S}^n_t$ to $\mathcal{U}^n_t$ defined as $\pi^n_t(\cdot) := \lambda^n_t(h^0_t, \cdot)$, $n \in \mathcal{N}$. For any strategies $\lambda^{0:N}_{0:t-1}$, where $\lambda^0_t \in \Lambda^0$ and $\lambda^n_t \in \hat{\Lambda}^n$, $n \in \mathcal{N}$, and for any $h^0_t \in \mathcal{H}^0_t$, we recursively define $\nu^n_t(h^0_t) \in \Delta(\mathcal{S}^n_t)$ as follows. For any measurable set $E^n \subset \mathcal{S}^n_0$,
\[ [\nu^n_0(h^0_0)](E^n) = \begin{cases} \pi_{S^n_0}(E^n) & \text{if } z^n_0 = \emptyset, \\ \mathbb{1}_{E^n}(s^n_0) & \text{if } z^n_0 = s^n_0. \end{cases} \quad (4.13) \]
For any measurable set $E^n \subset \mathcal{S}^n_{t+1}$,
\[ [\nu^n_{t+1}(h^0_{t+1})](E^n) = [\psi^n_t(\theta^n_t, s^0_t, u^0_t, \pi^n_t, z^n_{t+1})](E^n), \quad (4.14) \]
where $\psi^n_t(\theta^n_t, s^0_t, u^0_t, \pi^n_t, z^n_{t+1})$ is defined as follows.

• If $z^n_{t+1} = s^n_{t+1}$, then
\[ [\psi^n_t(\theta^n_t, s^0_t, u^0_t, \pi^n_t, s^n_{t+1})](E^n) = \mathbb{1}_{E^n}(s^n_{t+1}). \quad (4.15) \]

• If $z^n_{t+1} = \emptyset$, then
\[ [\psi^n_t(\theta^n_t, s^0_t, u^0_t, \pi^n_t, \emptyset)](E^n) = \int\!\!\int \mathbb{1}_{E^n}\big(f^n_t(s^n_t, s^0_t, \pi^n_t(s^n_t), u^0_t, v^n_t)\big)\, \theta^n_t(ds^n_t)\, \pi_{V^n_t}(dv^n_t). \quad (4.16) \]

Then, $\nu^n_t(H^0_t)$ is the conditional probability of $S^n_t$ given $H^0_t$, that is, $[\nu^n_t(H^0_t)](E^n) = P^{\lambda^{0:N}_{0:t-1}}(S^n_t \in E^n \mid H^0_t) = \Theta^n_t(E^n)$.

Proof. The proof is a straightforward extension of Lemma 3.7. The only difference is that $s^n_{t+1}$ here is a function of $s^n_t, s^0_t, \pi^n_t(s^n_t), u^0_t, v^n_t$, while in Lemma 3.7 there is no global state $s^0_t$. However, since $s^0_t$ is included in the common information $h^0_t$, one can easily obtain the results of Lemma 4.3 by following the arguments in the proof of Lemma 3.7 in Appendix A.3.

Lemma 4.3 implies that the realization $\theta^n_t$ of the belief $\Theta^n_t$ can be updated according to
\[ \theta^n_{t+1} = \psi^n_t(\theta^n_t, s^0_t, u^0_t, \pi^n_t, z^n_{t+1}). \quad (4.17) \]

Theorem 4.1.
Suppose there exist functions $\{V_t : \{0,1\}^N \times \mathcal{S}^0_t \times \prod_{m=1}^{N} \Delta(\mathcal{S}^m_t) \to \mathbb{R},\ t = 0, 1, \ldots, T+1\}$ such that for each $\gamma^n_t \in \{0,1\}$, $\theta^n_t \in \Delta(\mathcal{S}^n_t)$, $n \in \mathcal{N}$, and $s^0_t \in \mathcal{S}^0_t$, the following are true:

• $V_{T+1}(\gamma^{1:N}_{T+1}, s^0_{T+1}, \theta^{1:N}_{T+1}) = 0$;

• for any $t = 0, 1, \ldots, T$,
\[ V_t(\gamma^{1:N}_t, s^0_t, \theta^{1:N}_t) = \min_{u^0_t, \pi^{1:N}_t} \Big\{ \mathbb{E}\big[c_t\big(s^0_t, \bar{S}^{1:N}_t, u^0_t, \{\pi^n_t(\bar{S}^n_t)\}_{n \in \mathcal{N}}\big)\big] + \mathbb{E}\big[V_{t+1}\big(\Gamma^{1:N}_{t+1}, S^0_{t+1}, \Theta^{1:N}_{t+1}\big) \,\big|\, \gamma^{1:N}_t, s^0_t, \theta^{1:N}_t, u^0_t, \pi^{1:N}_t\big] \Big\}, \quad (4.18) \]
where $\bar{S}^n_t$, $n \in \mathcal{N}$, is a random vector distributed according to $\theta^n_t$ and the minimization is over $u^0_t \in \mathcal{U}^0_t$ and $\pi^n_t \in \mathcal{P}^n := \{\pi : \mathcal{S}^n_t \to \mathcal{U}^n_t,\ \text{Borel measurable}\}$, $n \in \mathcal{N}$.

Further, suppose there exists a strategy profile $\lambda^{0:N*}$, where for each time $t$, $\lambda^{0*}_t \in \Lambda^0$ and $\lambda^{n*}_t \in \hat{\Lambda}^n$, $n \in \mathcal{N}$, such that for any realization $h^0_t \in \mathcal{H}^0_t$ and its corresponding $\gamma^{1:N}_t$, $s^0_t$, and common beliefs $\theta^n_t = \nu^n_t(h^0_t)$, $n \in \mathcal{N}$ (as defined in Lemma 4.3), $u^{0*}_t = \lambda^{0*}_t(h^0_t)$ and $\pi^{n*}_t(\cdot) = \lambda^{n*}_t(h^0_t, \cdot)$, $n \in \mathcal{N}$, achieve the minimum in the definition of $V_t(\gamma^{1:N}_t, s^0_t, \theta^{1:N}_t)$. Then, $\lambda^{0:N*}$ is optimal for Problem 4.1.

Proof. See Appendix B.1 for a proof.

4.4 Finite Systems

We now apply the results of Section 4.3 to systems with finite state, action, and noise spaces. Consider the system model described in Section 4.2 with $\mathcal{S}^n_t, \mathcal{U}^n_t, \mathcal{V}^n_t$ being finite sets for all $n \in \bar{\mathcal{N}}$ and $0 \le t \le T$. Then, it follows from (4.4)–(4.5) that $\mathcal{H}^n_t$ are finite spaces for $n \in \bar{\mathcal{N}}$ and $0 \le t \le T$. Furthermore, since $\mathcal{S}^n_t$ and $\mathcal{U}^n_t$ are finite spaces, the function space $\mathcal{P}^n$ is finite and the belief space $\Delta(\mathcal{S}^n_t)$ is finite-dimensional for $n \in \mathcal{N}$ and $0 \le t \le T$.

By restricting attention to finite spaces, there are only a finite number of $u^0_t \in \mathcal{U}^0_t$ and a finite number of $\pi^n_t \in \mathcal{P}^n$, $n \in \mathcal{N}$. Hence, the minimization in (4.18) is well-defined and, consequently, the functions $V_{0:T}$ of Theorem 4.1 exist. As a result, when the spaces are finite, it is possible to use the dynamic program of Theorem 4.1 to obtain optimal control strategies in Problem 4.1. This result is summarized in the following theorem.

Theorem 4.2. For finite systems, the functions $V_{0:T}$ satisfying (4.18) exist and can be recursively computed. Further, for any realization $h^0_t \in \mathcal{H}^0_t$ and its corresponding $\gamma^{1:N}_t$, $s^0_t$, and common beliefs $\theta^n_t = \nu^n_t(h^0_t)$, $n \in \mathcal{N}$ (as defined in Lemma 4.3), let $u^{0*}_t = \lambda^{0*}_t(h^0_t)$ and $\pi^{n*}_t(\cdot) = \lambda^{n*}_t(h^0_t, \cdot)$, $n \in \mathcal{N}$, be minimizers of (4.18). Then, $\lambda^{0:N*}$ is optimal.

4.5 Markov Jump Linear Systems

We now use the results of Section 4.3 to find optimal decentralized control strategies for a class of Markov jump linear systems (MJLSs). These systems model situations where the plants can operate in one of several modes. We first describe our MJLS model and then show that it can be seen as a special case of the general model in Section 4.2.

Consider a discrete-time, discrete-state Markov process with state $M_t \in \mathcal{M}$. $M_t$ is referred to as the system mode at time $t$. The system mode evolves as
\[ M_{t+1} = \eta_t(M_t, \Xi_t), \quad (4.19) \]
where $\Xi_{0:T}$ are independent random variables with distributions $\pi_{\Xi_t}$. We equivalently describe (4.19) by the transition probabilities $\tilde{p}_t(i,j)$ for any $i, j \in \mathcal{M}$ as follows:
\[ \tilde{p}_t(i,j) = P(M_{t+1} = j \mid M_t = i) = P\big(\eta_t(M_t, \Xi_t) = j \mid M_t = i\big) = P\big(\eta_t(i, \Xi_t) = j\big). \quad (4.20) \]

Our MJLS model consists of $N$ local controllers $C^1, C^2, \ldots, C^N$ and one global controller $C^0$ as shown in Fig. 4.1.
The dynamics of the global plant are
\[ X^0_{t+1} = A^{00}(M_t) X^0_t + B^{00}(M_t) U^0_t + W^0_t, \quad (4.21) \]
and the dynamics of local plant $n \in \mathcal{N}$ are given by
\[ X^n_{t+1} = A^{nn}(M_t) X^n_t + A^{n0}(M_t) X^0_t + B^{nn}(M_t) U^n_t + B^{n0}(M_t) U^0_t + W^n_t, \quad (4.22) \]
where for $n \in \bar{\mathcal{N}}$, $X^n_t \in \mathbb{R}^{d^n_X}$ is the state of plant $n$ at time $t$, $U^n_t \in \mathbb{R}^{d^n_U}$ is the control action of controller $C^n$, and $W^n_t \in \mathbb{R}^{d^n_X}$ is a zero-mean noise vector with distribution $\pi_{W^n_t}$. We assume that $X^n_0 = 0$ for all $n \in \bar{\mathcal{N}}$. Furthermore, $A^{nn}(M_t), B^{nn}(M_t)$, $n \in \bar{\mathcal{N}}$, and $A^{n0}(M_t), B^{n0}(M_t)$, $n \in \mathcal{N}$, are matrices of appropriate dimensions depending on the system mode $M_t$. We assume that the collection of random variables $X^{0:N}_0, M_0, \Xi_{0:T}, W^{0:N}_{0:T}$ are independent. The overall dynamics can be written as
\[ X_{t+1} = A(M_t) X_t + B(M_t) U_t + W_t, \quad (4.23) \]
where $X_t = \mathrm{vec}(X^{0:N}_t)$, $U_t = \mathrm{vec}(U^{0:N}_t)$, $W_t = \mathrm{vec}(W^{0:N}_t)$, and $A(M_t), B(M_t)$ are given by
\[ A(M_t) = \begin{bmatrix} A^{00}(M_t) & & & \\ A^{10}(M_t) & A^{11}(M_t) & & 0 \\ \vdots & & \ddots & \\ A^{N0}(M_t) & 0 & & A^{NN}(M_t) \end{bmatrix}, \quad B(M_t) = \begin{bmatrix} B^{00}(M_t) & & & \\ B^{10}(M_t) & B^{11}(M_t) & & 0 \\ \vdots & & \ddots & \\ B^{N0}(M_t) & 0 & & B^{NN}(M_t) \end{bmatrix}. \quad (4.24) \]

The information available to the controllers at time $t$ is given by
\[ H^0_t = \{X^0_{0:t}, M_{0:t}, U^0_{0:t-1}, Z^{1:N}_{0:t}, \Gamma^{1:N}_{0:t}\}, \quad (4.25) \]
\[ H^n_t = \{X^n_{0:t}, U^n_{0:t-1}\} \cup H^0_t, \quad \forall n \in \mathcal{N}. \quad (4.26) \]
Furthermore, the instantaneous cost $c_t(M_t, X^{0:N}_t, U^{0:N}_t)$ of the system is a quadratic function given by
\[ c_t(M_t, X^{0:N}_t, U^{0:N}_t) = X_t^{\mathsf{T}} Q(M_t) X_t + U_t^{\mathsf{T}} R(M_t) U_t, \quad (4.27) \]
where
\[ R(M_t) = \begin{bmatrix} R^{00}(M_t) & \ldots & R^{0N}(M_t) \\ \vdots & \ddots & \vdots \\ R^{N0}(M_t) & \ldots & R^{NN}(M_t) \end{bmatrix} =: [R^{ij}(M_t)]_{i,j \in \bar{\mathcal{N}}}, \quad Q(M_t) = [Q^{ij}(M_t)]_{i,j \in \bar{\mathcal{N}}}. \quad (4.28) \]
$Q(m)$ is a symmetric positive semi-definite (PSD) matrix and $R(m)$ is a symmetric positive definite (PD) matrix for all $m \in \mathcal{M}$. The optimal control problem for $C^{0:N}$ is formally defined below.

Problem 4.2. For the system model described by (4.19)–(4.26), we would like to solve the following strategy optimization problem:
\[ \inf_{\lambda^0 \in \Lambda^0, \ldots, \lambda^N \in \Lambda^N} \mathbb{E}\left[ \sum_{t=0}^{T} c_t(M_t, X^{0:N}_t, U^{0:N}_t) \right]. \quad (4.29) \]

It is easy to check that Problem 4.2 can be posed as an instance of Problem 4.1. If we define $S^0_t = \mathrm{vec}(M_t, X^0_t)$, $V^0_t = \mathrm{vec}(\Xi_t, W^0_t)$, $S^n_t = X^n_t$, and $V^n_t = W^n_t$ for $n \in \mathcal{N}$, then

• the dynamics in (4.21) and (4.22) can be written as (4.1) and (4.2), respectively;

• the information structures in (4.25) and (4.26) can be written as (4.4) and (4.5), respectively;

• the objective function in (4.29) can be written as (4.7).

As a result, Theorem 4.1 is applicable to Problem 4.2 and provides a dynamic program (DP) based characterization of the solution of Problem 4.2. However, there exist three significant challenges in using Theorem 4.1. First, it is not a priori known whether the functions $V_{0:T}$ of Theorem 4.1 exist. Second, the information state of this DP (that is, the argument of the functions $V_t$) belongs to $\{0,1\}^N \times \mathcal{M} \times \mathbb{R}^{d^0_X} \times \prod_{n=1}^{N} \Delta(\mathbb{R}^{d^n_X})$, which is an infinite-dimensional space. Third, each step of the dynamic program involves a functional optimization over the spaces $\mathcal{P}^{1:N}$, where each $\mathcal{P}^n$, $n \in \mathcal{N}$, is an infinite-dimensional space of Borel-measurable functions from $\mathbb{R}^{d^n_X}$ to $\mathbb{R}^{d^n_U}$. In spite of these impediments, we will show that the functions $V_{0:T}$ of Theorem 4.1 exist and that Theorem 4.1 can be used to find the optimal control strategies in Problem 4.2.
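To make the block structure in (4.24) concrete, the following sketch assembles a mode-dependent $A(m)$ from its blocks; the helper name and toy dimensions are ours (hypothetical), and $B(m)$ would be assembled in exactly the same way. The lower-block-triangular shape encodes the fact that local plants are coupled only through the global plant.

    import numpy as np

    def assemble_A(A00, A_nn, A_n0):
        # Assemble the lower-block-triangular A(m) of (4.24) from the global
        # block A00, the local diagonal blocks A_nn[n], and the coupling
        # blocks A_n0[n] (first block column).
        N = len(A_nn)
        d0 = A00.shape[0]
        d = [blk.shape[0] for blk in A_nn]
        A = np.zeros((d0 + sum(d), d0 + sum(d)))
        A[:d0, :d0] = A00
        row = d0
        for n in range(N):
            A[row:row + d[n], :d0] = A_n0[n]               # coupling to global state
            A[row:row + d[n], row:row + d[n]] = A_nn[n]    # local dynamics
            row += d[n]
        return A

    # Toy example: N = 2 scalar local plants and a scalar global plant.
    A = assemble_A(np.array([[0.9]]),
                   [np.array([[1.1]]), np.array([[0.8]])],
                   [np.array([[0.2]]), np.array([[0.5]])])
    print(A)   # [[0.9 0 0], [0.2 1.1 0], [0.5 0 0.8]]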
4.5.1 Solving the DP

Our approach for solving the DP is based on the following proposition.

Proposition 4.1. The functions $V_{0:T}$ of Theorem 4.1 exist and, at each time $t$, the value function $V_t$ is quadratic in $x^0_t$ and in the means of the distributions $\theta^{1:N}_t$, and is linear in the covariances of the distributions $\theta^{1:N}_t$. More specifically, there exist PSD matrices $P_t(m)$ and $\tilde{P}^n_t(\gamma, m)$ for each $m \in \mathcal{M}$, $\gamma \in \{0,1\}$, $n \in \mathcal{N}$, and a constant $e_t$ such that
\[ V_t(\gamma^{1:N}_t, m_t, x^0_t, \theta^{1:N}_t) = \mathrm{QF}\big(P_t(m_t), \mathrm{vec}(x^0_t, \{\mu(\theta^n_t)\}_{n \in \mathcal{N}})\big) + \sum_{n \in \mathcal{N}} \mathrm{tr}\big(\tilde{P}^n_t(\gamma^n_t, m_t)\, \mathrm{cov}(\theta^n_t)\big) + e_t. \quad (4.30) \]

To show that this conjecture is correct, we use induction. At time $T+1$, (4.30) is correct as we have defined $V_{T+1}(\cdot) = 0$. Assume (4.30) is correct at time $t+1$. Using the assumed structure of $V_{t+1}$ in (4.18) and simplifying the right-hand side, we can show that
\[ V_t(\gamma^{1:N}_t, m_t, x^0_t, \theta^{1:N}_t) = e_t + \min_{u^0_t, \pi^{1:N}_t} \Big\{ \mathrm{QF}\big(G_t(m_t), \mathbb{E}[O_t]\big) + \mathrm{tr}\big(\tilde{G}_t(\gamma^{1:N}_t, m_t)\, \mathrm{cov}(O_t)\big) \Big\}, \quad (4.31) \]
where $O_t := \mathrm{vec}(x^0_t, \bar{X}^{1:N}_t, u^0_t, \{\pi^n_t(\bar{X}^n_t)\}_{n \in \mathcal{N}})$ and $\bar{X}^n_t$ is a random vector with distribution $\theta^n_t$. Furthermore, the matrices $G_t(m_t)$ and $\tilde{G}_t(\gamma^{1:N}_t, m_t)$ are obtained from $P_{t+1}(\cdot)$, $\tilde{P}^n_{t+1}(\cdot)$, $n \in \mathcal{N}$, and the system matrices (see (76) and (77) in Appendix B for exact descriptions). Note that while $\mathrm{cov}(O_t)$ depends only on $\pi^{1:N}_t$, $\mathbb{E}[O_t]$ depends on $\pi^{1:N}_t$ as well as $u^0_t$. This means that finding $V_t(\cdot)$ from (4.31) requires jointly optimizing over $u^0_t, \pi^{1:N}_t$.

Now, we propose a "special" change of variables that helps us find $V_t(\cdot)$ from (4.31). We write
\[ \pi^n_t(\cdot) = \bar{u}^n_t + q^n_t(\cdot), \quad (4.32) \]
where $\bar{u}^n_t = \mathbb{E}[\pi^n_t(\bar{X}^n_t)]$ and $q^n_t(\cdot) = \pi^n_t(\cdot) - \mathbb{E}[\pi^n_t(\bar{X}^n_t)]$. Note that $\mathbb{E}[q^n_t(\bar{X}^n_t)] = 0$. If we apply this change of variables to $O_t$, then $\mathbb{E}[O_t]$ depends only on $u^0_t, \bar{u}^{1:N}_t$ (not on $q^{1:N}_t$) and $\mathrm{cov}(O_t)$ depends only on $q^{1:N}_t$ (not on $u^0_t, \bar{u}^{1:N}_t$). This enables us to decompose the optimization in (4.31) into two optimization problems: in the first, we optimize over $u^0_t, \bar{u}^{1:N}_t$, and in the second, we optimize over $q^{1:N}_t$. Note that although the first optimization is a vector optimization, the second one is still a functional optimization. However, this functional optimization is simpler than optimizing over $\pi^{1:N}_t$ in (4.31) and can be solved, using the fact that $\mathbb{E}[q^n_t(\bar{X}^n_t)] = 0$ for $n \in \mathcal{N}$, to find $q^{1:N*}_t(\cdot)$. Furthermore, the first optimization problem can be solved to find $u^{0*}_t, \bar{u}^{1:N*}_t$. Then, by substituting $\bar{u}^{n*}_t + q^{n*}_t(\cdot)$ back for $\pi^{n*}_t(\cdot)$, we can find $\pi^{1:N*}_t(\cdot)$ minimizing (4.31). This result is stated in the following theorem.

Theorem 4.3. The matrices and constants of Proposition 4.1 are given by the following recursions:

1. Recursion of matrices $P_t(m)$, $m \in \mathcal{M}$:
\[ P_{T+1}(m) := 0, \quad (4.33) \]
\[ F(P_{t+1}, \tilde{p}_t, m) := \sum_{l \in \mathcal{M}} \tilde{p}_t(m, l) P_{t+1}(l), \quad (4.34) \]
\[ P_t(m) = \Omega\big(F(P_{t+1}, \tilde{p}_t, m), A(m), B(m), Q(m), R(m)\big), \quad (4.35) \]
\[ K_t(m) = \Psi\big(F(P_{t+1}, \tilde{p}_t, m), A(m), B(m), R(m)\big). \quad (4.36) \]

2.
Recursion of matrices $\tilde{P}^n_t(\gamma, m)$, $n \in \mathcal{N}$, $\gamma \in \{0,1\}$, $m \in \mathcal{M}$:
\[ \tilde{P}^n_{T+1}(\gamma, m) := 0, \quad (4.37) \]
\[ H(P^n_{t+1}, \tilde{P}^n_{t+1}, \tilde{p}_t, p^n_t, \gamma, m) := \sum_{l \in \mathcal{M}} \tilde{p}_t(m, l)\big(p^n_t(\gamma, 1) P^n_{t+1}(l) + p^n_t(\gamma, 0) \tilde{P}^n_{t+1}(0, l)\big), \quad (4.38) \]
\[ \tilde{P}^n_t(\gamma, m) = \Omega\big(H(P^n_{t+1}, \tilde{P}^n_{t+1}, \tilde{p}_t, p^n_t, \gamma, m), A^{nn}(m), B^{nn}(m), Q^{nn}(m), R^{nn}(m)\big), \quad (4.39) \]
\[ \tilde{K}^n_t(\gamma, m) = \Psi\big(H(P^n_{t+1}, \tilde{P}^n_{t+1}, \tilde{p}_t, p^n_t, \gamma, m), A^{nn}(m), B^{nn}(m), R^{nn}(m)\big). \quad (4.40) \]
Note that in the above equations, $P^n_t(m)$ is the $n$-th diagonal block of $P_t(m)$.

3. Recursion of $e_t$:
\[ e_{T+1} = 0, \quad (4.41) \]
\[ e_t = \sum_{s=t}^{T} \Big( \mathrm{tr}\big(F(P^0_{s+1}, \tilde{p}_s, m_s)\, \mathrm{cov}(\pi_{W^0_s})\big) + \sum_{n \in \mathcal{N}} \mathrm{tr}\big(H(P^n_{s+1}, \tilde{P}^n_{s+1}, \tilde{p}_s, p^n_s, \gamma^n_s, m_s)\, \mathrm{cov}(\pi_{W^n_s})\big) \Big). \quad (4.42) \]

Furthermore, the optimal strategies of Problem 4.2 are given by
\[ U^{0*}_t = K^0_t(M_t) \hat{X}_t, \quad (4.43) \]
\[ U^{n*}_t = \pi^{n*}_t(X^n_t) = K^n_t(M_t) \hat{X}_t + \tilde{K}^n_t(\Gamma^n_t, M_t)\big(X^n_t - \hat{X}^n_t\big), \quad (4.44) \]
where $K_t(m) = [K^n_t(m)]_{n \in \bar{\mathcal{N}}}$ is given in (4.36) and $\tilde{K}^n_t(\gamma, m)$, $\gamma \in \{0,1\}$, $m \in \mathcal{M}$, is given in (4.40). Further, $\hat{X}_t := \mathrm{vec}(X^0_t, \hat{X}^1_t, \ldots, \hat{X}^N_t)$, where $\hat{X}^n_t = \mu(\Theta^n_t)$ is the estimate (conditional expectation) of $X^n_t$ based on the common information $H^0_t$; it can be computed recursively according to
\[ \hat{X}^n_0 = 0, \quad (4.45) \]
\[ \hat{X}^n_{t+1} = \mu(\Theta^n_{t+1}) = \begin{cases} \big([A(M_t)]_{n,:} + [B(M_t)]_{n,:} K_t(M_t)\big) \hat{X}_t & \text{if } Z^n_{t+1} = \emptyset, \\ X^n_{t+1} & \text{if } Z^n_{t+1} = X^n_{t+1}. \end{cases} \quad (4.46) \]

Proof. See Appendix B.2 for a proof.

For the special case where there is only one mode (that is, $\mathcal{M} = \{1\}$) and the channel states are independent over time (that is, $p^n_t(\gamma, 0) = p^n$ and $p^n_t(\gamma, 1) = 1 - p^n$ for $\gamma \in \{0,1\}$), the results of the above theorem can be simplified as follows. For simplicity of presentation, we drop the mode argument $(\cdot)$ in front of the matrices.

Theorem 4.4. If there is only one mode (that is, $\mathcal{M} = \{1\}$) and the channel states are independent over time (that is, $p^n_t(\gamma, 0) = p^n$ and $p^n_t(\gamma, 1) = 1 - p^n$ for $\gamma \in \{0,1\}$), the optimal strategies of Problem 4.2 are given by
\[ U^{0*}_t = K^0_t \hat{X}_t, \quad (4.47) \]
\[ U^{n*}_t = K^n_t \hat{X}_t + \tilde{K}^n_t\big(X^n_t - \hat{X}^n_t\big), \quad (4.48) \]
where $K_t = [K^n_t]_{n \in \bar{\mathcal{N}}}$ is given in (4.53) and $\tilde{K}^n_t$ is given in (4.56). Further, $\hat{X}_t := \mathrm{vec}(X^0_t, \hat{X}^1_t, \ldots, \hat{X}^N_t)$, where $\hat{X}^n_t$ is the estimate (conditional expectation) of $X^n_t$ based on the common information $H^0_t$; it can be computed recursively according to
\[ \hat{X}^n_0 = 0, \quad (4.49) \]
\[ \hat{X}^n_{t+1} = \begin{cases} \big([A]_{n,:} + [B]_{n,:} K_t\big) \hat{X}_t & \text{if } Z^n_{t+1} = \emptyset, \\ X^n_{t+1} & \text{if } Z^n_{t+1} = X^n_{t+1}. \end{cases} \quad (4.50) \]
The matrices $P_t$, defined recursively below, are symmetric PSD:
\[ P_{T+1} = 0, \quad (4.51) \]
\[ P_t = \Omega\big(P_{t+1}, A, B, Q, R\big), \quad (4.52) \]
\[ K_t = \Psi\big(P_{t+1}, A, B, R\big). \quad (4.53) \]
For $n \in \mathcal{N}$, the matrices $\tilde{P}^n_t$, defined recursively below, are symmetric PSD, and $P^n_t$ is the $n$-th diagonal block of $P_t$:
\[ \tilde{P}^n_{T+1} = 0, \quad (4.54) \]
\[ \tilde{P}^n_t = \Omega\big((1-p^n) P^n_{t+1} + p^n \tilde{P}^n_{t+1}, A^{nn}, B^{nn}, Q^{nn}, R^{nn}\big), \quad (4.55) \]
\[ \tilde{K}^n_t = \Psi\big((1-p^n) P^n_{t+1} + p^n \tilde{P}^n_{t+1}, A^{nn}, B^{nn}, R^{nn}\big). \quad (4.56) \]

4.6 Applications

4.6.1 Decentralized Networked Control Systems with Broadcast-out Architecture

In a DNCS with broadcast-out architecture, there exist several subsystems connected as shown in Fig. 4.3, where each node indicates a controller and its co-located plant. In this architecture, the root node can affect several leaf nodes, but the leaf nodes do not affect any other nodes. The root node only has access to its own state, while each leaf node has access to both its own state and the root node's state.
This DNCS problem, studied in [87, 88], can be considered as a special case of our system model where the communication channels from the local controllers to the global controller are always inactive, that is, $\Gamma^n_t = 0$ for all $n \in \mathcal{N}$, $t = 0, \ldots, T$.

Figure 4.3: Broadcast-out architecture. Blue dotted lines indicate perfect communication links and blue solid lines indicate that the root node affects the dynamics of the leaf nodes.

4.6.2 Decentralized Networked Control Systems with Decoupled Subsystems and Coupled Costs

Consider the DNCS of Section 4.6.1 without the root node. Instead, the $N$ leaf controllers can communicate with each other through a network as shown in Fig. 4.4. An example of this DNCS is a collection of controllers connected to an access point (AP), where the AP is equipped with enough communication resources to transmit perfectly whatever it receives to all the controllers (see the perfect links from the network to the controllers shown in blue in Fig. 4.4). On the other hand, the controllers can have communication constraints and hence their packets can be dropped (see the unreliable links from the controllers to the network shown in red in Fig. 4.4). This DNCS can be considered as a special case of our model in Section 4.2 where the global controller takes no action (that is, it is absent). In this DNCS, the global plant can model an uncontrolled global mode affecting all subsystems. In particular, if the subsystems of this DNCS have switched linear dynamics, the DNCS can be modeled by the MJLS model of Section 4.5 where, for all possible modes $m \in \mathcal{M}$, $B^{n0}(m) = 0$, $R^{0n}(m)$ and $R^{n0}(m)$ are zero matrices for all $n \in \mathcal{N}$, $B^{00}(m) = I$, and $R^{00}(m) = I$. Then, from Theorem 4.3, the optimal action $U^{0*}_t$ is zero, which means that the global controller takes no action.

Figure 4.4: Communication over networks. Blue dotted lines indicate perfect communication links and red dotted lines indicate unreliable communication channels.

4.6.3 Two-Controller Decentralized Systems with Decoupled Dynamics, Coupled Costs, and Two-Way Unreliable Communication

In this DNCS, there exist two subsystems with decoupled dynamics but with coupled costs, where the controllers can share information with each other through unreliable links as shown in Fig. 4.5. This DNCS can be modeled as a special case of our system model where there exist two local controllers with their co-located plants and no action is taken by the global controller. In this case, since the links from the global controller to the local controllers are perfect, the unreliable link from local controller $n$, $n = 1, 2$, models the unreliable link from this local controller to the other one.

Figure 4.5: Two-controller system with two-way unreliable communication. Red dotted lines indicate unreliable communication links.

4.6.4 Decoupled Subsystems with Coupled Costs

In this DNCS, there exists a collection of subsystems with decoupled dynamics but with coupled costs, where there are no communication links among the controllers, as shown in Fig. 4.6. This DNCS can be modeled as a special case of our system model where no action is taken by the global controller and the links from the local controllers to the global controller always fail, that is, $\Gamma^n_t = 0$ for all $n \in \mathcal{N}$, $t = 0, \ldots, T$.

Figure 4.6: Decoupled subsystems with coupled costs.
4.7 Conclusion

We considered a decentralized networked control system (DNCS) including one "global" controller and a collection of "local" controllers. The DNCS includes a global plant controlled only by the global controller and a collection of local plants, where each local plant is controlled jointly by its co-located local controller and the global controller. The objective of the controllers is to cooperatively minimize a general cost function over a finite time horizon. For this problem, we provided a dynamic program to obtain the optimal strategies of the controllers. While for the case with finite state and action spaces it is possible to solve the dynamic program numerically using POMDP (partially observable Markov decision process) solvers, for the case with switched linear dynamics and mode-dependent quadratic cost we showed that it is possible to explicitly solve the dynamic program and obtain explicit optimal strategies for all local controllers and the global controller. In the optimal strategies, all controllers compute common estimates of the states of the local plants based on the common information obtained from the communication network. The global controller's action is linear in the state of the global plant and the common state estimates, and the action of each local controller is linear in the actual states of its co-located plant and the global plant as well as the common state estimates.

Chapter 5

Decentralized Control over Unreliable Communication – Infinite Horizon

5.1 Introduction

Many cyber-physical systems can be viewed as networked control systems (NCSs) consisting of several components, such as physical systems, controllers, actuators, and sensors, that are interconnected by communication networks. One key question in the design and operation of such systems is the following: what effect do communication limitations and imperfections, such as packet loss, delays, noise, and data rate limits, have on the system performance? A well-studied communication model in the context of NCSs is that of an unreliable communication link that randomly loses packets. This means that the receiver in this unreliable link (e.g., a controller, an actuator, etc.) receives information intermittently and has to perform its functions (selecting a control action, applying a control to the plant, etc.) despite the interruptions in communication.

Networked control and estimation problems in which there is only a single controller in the NCS and the unreliable links are from sensor(s) to the controller and/or from the controller to actuator(s) have been a focus of significant research (see, for example, [51, 52, 59–62, 68–76, 95]). In many complex NCSs, however, there are multiple controllers which may need to communicate with each other to control the overall system. In such cases, the unreliable communication may not be just between sensors and controllers or between controllers and actuators, but also among the controllers themselves. Thus, multiple controllers may need to make decentralized decisions while communicating intermittently with each other and with the sensors and actuators of the system. We will refer to such an NCS as a decentralized networked control system (DNCS), since the control decisions need to be made in a decentralized manner. The fact that multiple controllers need to make decentralized decisions means that control problems in DNCSs can be viewed as decentralized control problems.
Optimal decentralized control problems are generally difficult to solve (see [19, 77, 78, 96]). In general, linear control strategies may not be optimal, and even the problem of finding the best linear control strategies may not be a convex problem [97], [23]. Existing methods for computing optimal decentralized controllers require specific information structures and system properties, such as partial nestedness of the information structure [20], stochastic nestedness [21], quadratic invariance [46], substitutability [94], etc. A common feature of the prior work in decentralized control is that the underlying communication structure of the decentralized system is assumed to be fixed and unchanging. For example, several works assume a fixed communication graph among controllers whose (directed) edges represent perfect communication links between controllers [28, 34, 80, 87, 98–101]. Similarly, when the communication graph incorporates delays, the delays are assumed to be fixed [42, 79, 102–109]. Such models, however, do not incorporate the intermittent nature of communication over unreliable links between controllers. While some works [110, 111] have investigated unreliable controller-actuator communication in the context of a decentralized control problem, they require that the inter-controller communication be perfect.

In this chapter, we investigate a decentralized control problem with unreliable inter-controller communication. In particular, we consider a DNCS consisting of a remote controller and a collection of linear plants, each associated with a local controller. Each plant is directly controlled by a local controller which can perfectly observe the state of the plant. The remote controller can control all plants, but it does not have direct access to the states, as its name suggests. The remote controller and the local controllers are connected by a communication network in which the downlinks from the remote controller to the local controllers are perfect but the uplinks from the local controllers to the remote controller are unreliable channels with random packet drops. The objective of the local controllers and the remote controller is to cooperatively minimize an overall quadratic performance cost for the DNCS. The information structure of this DNCS does not fit into the standard definition of partially nested information structures due to the unreliable links between controllers.

For the finite horizon version of our problem, we obtained optimal decentralized controllers in Chapter 3 using ideas from the common information approach [54]. The optimal strategies in the finite horizon case were shown to be characterized by coupled Riccati recursions. Another approach, based on Pontryagin's maximum principle, was used in [112] for the finite horizon problem with only two controllers.

In this chapter, we will focus on the infinite time horizon average cost problem. The infinite horizon problem differs from its finite horizon counterpart in several key ways:

(i) In the finite horizon problem, the optimal cost is always finite. In the infinite horizon problem, however, it may be the case that no strategy can achieve a finite cost over the infinite horizon. In fact, we will show that this is the case if the link failure probabilities are above certain thresholds.
(ii) Similarly, the finite horizon problem does not have to deal with the issue of stability, since under any reasonable finite horizon strategy the system state cannot become "too large" in finite time. The stability of the state becomes a key issue over the infinite horizon. In addition to proving the optimality of control strategies, we need to make sure that the optimal strategies keep the state mean-square stable.

(iii) Finally, the analytical approaches for the finite and infinite horizon problems are fundamentally different. In the finite horizon case, we were able to use the common information approach to obtain a coordinator-based dynamic program. In the infinite horizon case, our essential task is to show that the value functions of the coordinator-based finite horizon dynamic program converge to a steady state as the horizon approaches infinity. Since the value functions were characterized by coupled Riccati recursions, this boils down to showing that these coupled recursions reach a steady state. Further, we need to show that the decentralized control strategies characterized by the steady-state coupled Riccati equations are indeed optimal. We achieve these goals by establishing a connection between our DNCS and an auxiliary (and fictitious) Markov jump linear system (MJLS)^1. An alternative approach for the two-controller version of our infinite horizon problem was used in [112] to find optimal strategies if certain coupled Riccati equations have solutions.

^1 Note that due to the presence of multiple controllers, our DNCS cannot be viewed as a standard MJLS (with one controller). Nevertheless, we show that it is still possible to use some MJLS results for our DNCS.

5.1.1 Organization

The rest of the chapter is organized as follows. In Section 5.2, we formulate the finite horizon and infinite horizon optimal control problems for a DNCS with one remote controller and one local controller. We briefly review Markov jump linear systems (MJLSs) in Section 5.3. We establish a connection between the DNCS of Section 5.2 and an auxiliary MJLS in Section 5.4 and use this connection to provide our main results for the DNCS of Section 5.2. In Section 5.5, we extend our DNCS model to the case with multiple local controllers and provide our main results for this DNCS. In Section 5.6, we extend our DNCS model to the case with multiple local controllers where there is a remote state associated with the remote controller, and provide our main results for this DNCS. We discuss some key aspects of our approach in Section 5.7. We verify our theoretical results through several simulations in Section 5.8. Section 5.9 concludes the chapter. The proofs of all technical results are in Appendix C.

5.2 System Model and Problem Formulation

Consider a discrete-time system with a local controller $C^1$ and a remote controller $C^0$ as shown in Fig. 5.1. The linear plant dynamics are given by
\[ X_{t+1} = A X_t + B^{10} U^0_t + B^{11} U^1_t + W_t = A X_t + B U_t + W_t, \quad (5.1) \]
where $X_t \in \mathbb{R}^{d_X}$ is the state of the plant at time $t$, $U_t = \mathrm{vec}(U^0_t, U^1_t)$, $U^0_t \in \mathbb{R}^{d^0_U}$ is the control action of the remote controller $C^0$, $U^1_t \in \mathbb{R}^{d^1_U}$ is the control action of the local controller $C^1$, and $W_t$ is the noise at time $t$. $A$ and $B := [B^{10}, B^{11}]$ are matrices with appropriate dimensions. We assume that $X_0 = 0$ and that $W_t$, $t = 0, 1, \ldots$, is an i.i.d. noise process with $\mathrm{cov}(W_t) = I$.
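Before formalizing the communication model, the following minimal sketch rolls out the dynamics (5.1) in closed loop. The strategy used here is a naive placeholder of our own: the local controller applies a static gain to $X_t$, the remote controller applies a static gain to a shared estimate that is reset whenever the uplink (modeled, anticipating Section 5.2.1, as an i.i.d. Bernoulli loss) delivers the state. It is not the optimal strategy derived in this chapter, and the cost weights $Q = I$, $R = I$ are only for illustration.

    import numpy as np

    rng = np.random.default_rng(2)

    def rollout(A, B10, B11, K0, K1, p1, T):
        # Roll out X_{t+1} = A X_t + B10 U0_t + B11 U1_t + W_t under a naive
        # decentralized strategy: U1_t = K1 X_t (local controller sees X_t),
        # U0_t = K0 x_hat_t, where x_hat_t is an estimate known to both.
        d = A.shape[0]
        x, x_hat, cost = np.zeros(d), np.zeros(d), 0.0
        for _ in range(T):
            u0, u1 = K0 @ x_hat, K1 @ x
            cost += x @ x + u0 @ u0 + u1 @ u1           # illustrative Q = I, R = I
            x = A @ x + B10 @ u0 + B11 @ u1 + rng.standard_normal(d)
            if rng.uniform() >= p1:                      # uplink delivers the state
                x_hat = x.copy()
            else:                                        # drop: propagate the estimate
                x_hat = A @ x_hat + B10 @ u0 + B11 @ (K1 @ x_hat)
        return cost / T

Under this placeholder strategy, the drop branch keeps the estimate equal to the conditional mean of the state given the commonly available information (the noise is zero mean), which mirrors the structure of the optimal estimator update appearing in (5.12) below.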
5.2.1 Communication Model

At each time $t$, the local controller $C^1$ observes the state $X_t$ perfectly and sends the observed state to the remote controller $C^0$ through an unreliable link with packet drop probability $p^1$. Let $\Gamma^1_t$ be a Bernoulli random variable describing the state of this link, that is, $\Gamma^1_t = 0$ if the link is broken (i.e., the packet is dropped) and $\Gamma^1_t = 1$ if the link is active. We assume that $\Gamma^1_t$, $t \ge 0$, is an i.i.d. process and is independent of the noise process $W_t$, $t \ge 0$. Let $Z^1_t$ be the output of the unreliable link. Then,
\[ \Gamma^1_t = \begin{cases} 1 & \text{with probability } 1 - p^1, \\ 0 & \text{with probability } p^1, \end{cases} \quad (5.2) \]
\[ Z^1_t = \begin{cases} X_t & \text{when } \Gamma^1_t = 1, \\ \emptyset & \text{when } \Gamma^1_t = 0. \end{cases} \quad (5.3) \]

Figure 5.1: Two-controller system model. The binary random variable $\Gamma_t$ indicates whether packets are transmitted successfully. Dashed lines indicate control links and solid lines indicate communication links. Blue links are perfect, but red links are prone to packet drops.

We assume that $Z^1_t$ is perfectly observed by $C^0$. Further, we assume that $C^0$ sends an acknowledgment to the local controller $C^1$ if it receives the state value. Thus, effectively, $Z^1_t$ is perfectly observed by $C^1$ as well. The two controllers select their control actions at time $t$ after observing $Z^1_t$. We assume that the links for sending acknowledgments, as well as the links from the controllers to the plant, are perfectly reliable.

5.2.2 Information Structure and Cost

Let $H^0_t$ and $H^1_t$ denote the information available to controllers $C^0$ and $C^1$, respectively, to make decisions at time $t$. Then,
\[ H^0_t = \{Z^1_{0:t}, U^0_{0:t-1}\}, \qquad H^1_t = \{X_{0:t}, Z^1_{0:t}, U^1_{0:t-1}, U^0_{0:t-1}\}. \quad (5.4) \]
$H^0_t$ will be referred to as the common information between the two controllers at time $t$^2. Let $\mathcal{H}^0_t$ and $\mathcal{H}^1_t$ be the spaces of all possible realizations of $H^0_t$ and $H^1_t$, respectively. Then, the control actions are selected according to
\[ U^0_t = g^0_t(H^0_t), \qquad U^1_t = g^1_t(H^1_t), \quad (5.5) \]
where the control laws $g^0_t : \mathcal{H}^0_t \to \mathbb{R}^{d^0_U}$ and $g^1_t : \mathcal{H}^1_t \to \mathbb{R}^{d^1_U}$ are measurable mappings. We use $g := (g^0_0, g^0_1, \ldots, g^1_0, g^1_1, \ldots)$ to denote the control strategies of $C^0$ and $C^1$. The instantaneous cost $c(X_t, U_t)$ of the system is a quadratic function given by
\[ c(X_t, U_t) = X_t^{\mathsf{T}} Q X_t + U_t^{\mathsf{T}} R U_t, \quad (5.6) \]
where $Q$ is a symmetric positive semi-definite (PSD) matrix and $R = \begin{bmatrix} R^{00} & R^{01} \\ R^{10} & R^{11} \end{bmatrix}$ is a symmetric positive definite (PD) matrix.

5.2.3 Problem Formulation

Let $\mathcal{G}$ denote the set of all possible control strategies of $C^0$ and $C^1$. The performance of control strategies $g$ over a finite horizon $T$ is measured by the total expected cost^3:
\[ J_T(g) := \mathbb{E}^g\left[ \sum_{t=0}^{T} c(X_t, U_t) \right]. \quad (5.7) \]
We refer to the system described by (5.1)–(5.6) as the decentralized networked control system (DNCS). We consider the problem of strategy optimization for the DNCS over finite and infinite time horizons. These two problems are formally defined below.

Problem 5.1 (Finite Horizon DNCS Optimal Control). For the DNCS described by (5.1)–(5.6), determine decentralized control strategies $g$ that optimize the total expected cost over a finite horizon of duration $T$. In other words, solve the following strategy optimization problem:
\[ \inf_{g \in \mathcal{G}} J_T(g). \quad (5.8) \]

^2 We assumed that $U^0_{0:t-1}$ is part of $H^1_t$.
This is not a restriction because even if $U^0_{0:t-1}$ is not directly observed by $C^1$ at time $t$, $C^1$ can still compute it using $C^0$'s strategy, since it knows everything $C^0$ knows.

^3 Because the cost function $c(X_t, U_t)$ is always non-negative, the expectation is well-defined on the extended real line $\mathbb{R} \cup \{\infty\}$.

Problem 5.2 (Infinite Horizon DNCS Optimal Control). For the DNCS described by (5.1)–(5.6), find decentralized strategies $g$ that minimize the infinite horizon average cost. In other words, solve the following strategy optimization problem:
\[ \inf_{g \in \mathcal{G}} J_\infty(g) := \inf_{g \in \mathcal{G}} \limsup_{T \to \infty} \frac{1}{T+1} J_T(g). \quad (5.9) \]

We make the following standard assumption on the system and cost matrices [113].

Assumption 5.1. $(A, Q^{1/2})$ is detectable and $(A, B)$ is stabilizable.

In Theorem 3.3 of Chapter 3, we described the optimal decentralized strategies for Problem 5.1. We summarize the finite horizon results below.

Lemma 5.1. (Theorem 3.3) The optimal control strategies for Problem 5.1 are given by
\[ \begin{bmatrix} U^{0*}_t \\ U^{1*}_t \end{bmatrix} = K^0_t \hat{X}_t + \begin{bmatrix} 0 \\ K^1_t \end{bmatrix} (X_t - \hat{X}_t), \quad (5.10) \]
where $\hat{X}_t = \mathbb{E}[X_t \mid H^0_t]$ is the estimate (conditional expectation) of $X_t$ based on the common information $H^0_t$. The estimate can be computed recursively according to
\[ \hat{X}_0 = 0, \quad (5.11) \]
\[ \hat{X}_{t+1} = \begin{cases} (A + B K^0_t) \hat{X}_t & \text{if } Z_{t+1} = \emptyset, \\ X_{t+1} & \text{if } Z_{t+1} = X_{t+1}. \end{cases} \quad (5.12) \]
The gain matrices are given by
\[ K^0_t = \Psi(P^0_{t+1}, R, A, B), \quad (5.13) \]
\[ K^1_t = \Psi\big((1-p^1) P^0_{t+1} + p^1 P^1_{t+1}, R^{11}, A, B^{11}\big), \quad (5.14) \]
where $P^0_t$ and $P^1_t$ are PSD matrices obtained recursively as follows:
\[ P^0_{T+1} = P^1_{T+1} = 0, \quad (5.15) \]
\[ P^0_t = \Omega(P^0_{t+1}, Q, R, A, B), \quad (5.16) \]
\[ P^1_t = \Omega\big((1-p^1) P^0_{t+1} + p^1 P^1_{t+1}, Q, R^{11}, A, B^{11}\big). \quad (5.17) \]
Furthermore, the optimal cost is given by
\[ J^*_T = \sum_{t=0}^{T} \mathrm{tr}\big((1-p^1) P^0_{t+1} + p^1 P^1_{t+1}\big). \quad (5.18) \]

Remark 5.1. Note that the remote controller's action $U^{0*}_t$ in (5.10) is a function of $\hat{X}_t$ only, while the local controller's action $U^{1*}_t$ is a function of both $\hat{X}_t$ and $X_t$. Further, as per (5.11) and (5.12), $\hat{X}_t$ is computed recursively based only on the knowledge of $Z_{0:t}$.

In this chapter, we will focus on solving the infinite horizon problem (Problem 5.2). Our solution will employ results from Markov jump linear systems (MJLSs). We provide a review of the relevant results from the theory of Markov jump linear systems before describing our solution to Problem 5.2.

5.3 Review of Markov Jump Linear Systems

A discrete-time Markov jump linear system (MJLS) is described by the dynamics
\[ \mathbf{X}_{t+1} = \mathbf{A}(M_t) \mathbf{X}_t + \mathbf{B}(M_t) \mathbf{U}_t, \quad (5.19) \]
where $\mathbf{X}_t \in \mathbb{R}^{d_{\mathbf{X}}}$ represents the state, $M_t \in \mathcal{M} = \{0, 1, \ldots, M\}$ the mode, and $\mathbf{U}_t$ the control action at time $t$. $\mathbf{A}(M_t), \mathbf{B}(M_t)$ are mode-dependent matrices. The mode $M_t$ evolves as a Markov chain described by the transition probability matrix $\Theta = [\theta_{ij}]_{i,j \in \mathcal{M}}$ such that
\[ P(M_{t+1} = j \mid M_t = i) = \theta_{ij}. \quad (5.20) \]
The initial state $\mathbf{X}_0$ and mode $M_0$ are independent and have probability distributions $\pi_{\mathbf{X}_0}$ and $\pi_{M_0}$, respectively. The information available at time $t$ to the controller of the MJLS is $\mathbf{H}_t = \{\mathbf{X}_{0:t}, M_{0:t}, \mathbf{U}_{0:t-1}\}$. The instantaneous cost incurred at time $t$ is given by
\[ \mathbf{c}(\mathbf{X}_t, \mathbf{U}_t, M_t) = \mathbf{X}_t^{\mathsf{T}} \mathbf{Q}(M_t) \mathbf{X}_t + \mathbf{U}_t^{\mathsf{T}} \mathbf{R}(M_t) \mathbf{U}_t, \quad (5.21) \]
where $\mathbf{Q}(M_t), \mathbf{R}(M_t)$ are mode-dependent matrices. An admissible control strategy is a sequence of measurable mappings $\mathbf{g} = (\mathbf{g}_0, \mathbf{g}_1, \ldots)$
such that $\mathbf{U}_t = \mathbf{g}_t(\mathbf{H}_t)$ and each $\mathbf{U}_t$ has a finite second moment. Let $\mathbf{G}$ be the set of all admissible control strategies for the MJLS. The MJLS has been extensively studied in the literature (see [55] and references therein). In the following, we state the finite horizon optimal control problem for the MJLS and provide the optimal control strategy for this problem.

Finite Horizon MJLS Optimal Control Problem 5.1. For the MJLS described by (5.19)–(5.21), solve the following finite horizon strategy optimization problem:
\[ \inf_{\mathbf{g} \in \mathbf{G}} \sum_{t=0}^{T} \mathbb{E}^{\mathbf{g}}\big[\mathbf{c}(\mathbf{X}_t, \mathbf{U}_t, M_t)\big]. \quad (5.22) \]
The solution of the finite horizon MJLS optimal control problem is given in the following lemma.

Lemma 5.2. ([55, Theorem 4.2]) The optimal controller for the finite horizon MJLS optimal control problem is given by
\[ \mathbf{U}_t = \mathbf{K}_t(M_t) \mathbf{X}_t, \quad (5.23) \]
where $\mathbf{K}_t(M_t)$ is a mode-dependent gain matrix. The gain matrices $\mathbf{K}_t(m)$, $m \in \mathcal{M}$, are given by
\[ \mathbf{K}_t(m) = \Psi\Big(\sum_{k=0}^{M} \theta_{mk} \mathbf{P}_{t+1}(k), \mathbf{R}(m), \mathbf{A}(m), \mathbf{B}(m)\Big), \quad (5.24) \]
where the matrices $\mathbf{P}_t(m)$, $m \in \mathcal{M}$, are recursively computed as follows (a numerical sketch of this recursion is given below):
\[ \mathbf{P}_{T+1}(m) = 0, \quad (5.25) \]
\[ \mathbf{P}_t(m) = \Omega\Big(\sum_{k=0}^{M} \theta_{mk} \mathbf{P}_{t+1}(k), \mathbf{Q}(m), \mathbf{R}(m), \mathbf{A}(m), \mathbf{B}(m)\Big). \quad (5.26) \]
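As a numerical illustration of (5.25)–(5.26), the sketch below iterates the mode-coupled backward recursion. We assume here that $\Omega$ and $\Psi$ are the standard Riccati update and gain operators, $\Omega(P, Q, R, A, B) = Q + A^{\mathsf{T}} P A - A^{\mathsf{T}} P B (R + B^{\mathsf{T}} P B)^{-1} B^{\mathsf{T}} P A$ and $\Psi(P, R, A, B) = -(R + B^{\mathsf{T}} P B)^{-1} B^{\mathsf{T}} P A$; this is an assumption on our part, consistent with how the operators are used in (5.13)–(5.17) and (5.24)–(5.26).

    import numpy as np

    def Omega(P, Q, R, A, B):
        # Riccati update (assumed form): Q + A'PA - A'PB (R + B'PB)^{-1} B'PA.
        S = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        return Q + A.T @ P @ A - A.T @ P @ B @ S

    def Psi(P, R, A, B):
        # Gain operator (assumed form): -(R + B'PB)^{-1} B'PA.
        return -np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)

    def coupled_riccati(A, B, Q, R, Theta, T):
        # Backward recursion (5.25)-(5.26): A, B, Q, R are lists of
        # mode-dependent matrices; Theta is the mode transition matrix.
        M = len(A)
        d = A[0].shape[0]
        P = [np.zeros((d, d)) for _ in range(M)]        # P_{T+1}(m) = 0
        gains = []
        for _ in range(T + 1):
            P_mix = [sum(Theta[m, k] * P[k] for k in range(M)) for m in range(M)]
            K = [Psi(P_mix[m], R[m], A[m], B[m]) for m in range(M)]
            P = [Omega(P_mix[m], Q[m], R[m], A[m], B[m]) for m in range(M)]
            gains.append(K)
        return P, gains[::-1]                            # gains ordered t = 0, ..., T

Iterating this recursion until the matrices stop changing also yields a numerical solution of the DCARE (5.28) below, in the spirit of Remark 5.4 later in the chapter.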
Further, the optimal cost is
\[ \mathbb{E}\big[\mathbf{X}_0^{\mathsf{T}} \mathbf{P}_0(M_0) \mathbf{X}_0\big]. \quad (5.27) \]

One interesting property of the recursion in (5.26) is that, under certain stability conditions, the matrices $\mathbf{P}_t(m)$, $m \in \mathcal{M}$, converge as $t \to -\infty$ to steady-state solutions $\mathbf{P}_*(m)$ satisfying the discrete-time coupled algebraic Riccati equations (DCARE)
\[ \mathbf{P}_*(m) = \Omega\Big(\sum_{k=0}^{M} \theta_{mk} \mathbf{P}_*(k), \mathbf{Q}(m), \mathbf{R}(m), \mathbf{A}(m), \mathbf{B}(m)\Big). \quad (5.28) \]
Before providing the results on the convergence of the sequences of matrices $\{\mathbf{P}_t(m), t = T+1, T, T-1, \ldots\}$, $m \in \mathcal{M}$, we introduce the concepts of stochastic stabilizability and stochastic detectability of the MJLS [55].

Definition 5.1. The MJLS of (5.19)–(5.20) is Stochastically Stabilizable (SS) if there exist gain matrices $\mathbf{K}(m)$, $m \in \mathcal{M}$, such that for any initial state and mode, $\sum_{t=0}^{\infty} \mathbb{E}[\|\mathbf{X}_t\|^2] < \infty$, where $\mathbf{X}_{t+1} = \mathbf{A}^s(M_t) \mathbf{X}_t$ and
\[ \mathbf{A}^s(M_t) = \mathbf{A}(M_t) + \mathbf{B}(M_t) \mathbf{K}(M_t). \quad (5.29) \]
In this case, we say the gain matrices $\mathbf{K}(m)$, $m \in \mathcal{M}$, stabilize the MJLS.

Definition 5.2. The MJLS of (5.19)–(5.21) is Stochastically Detectable (SD) if there exist gain matrices $\mathbf{H}(m)$, $m \in \mathcal{M}$, such that for any initial state and mode, $\sum_{t=0}^{\infty} \mathbb{E}[\|\mathbf{X}_t\|^2] < \infty$, where $\mathbf{X}_{t+1} = \mathbf{A}^d(M_t) \mathbf{X}_t$ and
\[ \mathbf{A}^d(M_t) = \mathbf{A}(M_t) + \mathbf{H}(M_t) \mathbf{Q}(M_t)^{1/2}. \quad (5.30) \]

From the theory of MJLSs ([55, 56]), we can obtain the following result for the convergence of the matrices $\{\mathbf{P}_t(m), t = T+1, T, T-1, \ldots\}$ to $\mathbf{P}_*(m)$ satisfying the DCARE in (5.28).

Lemma 5.3. Suppose the MJLS is stochastically detectable (SD). Then, the matrices $\mathbf{P}_t(m)$, $m \in \mathcal{M}$, converge as $t \to -\infty$ to PSD matrices $\mathbf{P}_*(m)$ that satisfy the DCARE in (5.28) if and only if the MJLS is stochastically stabilizable (SS).

Proof. See Appendix C.2.

Stochastic stabilizability (SS) and stochastic detectability (SD) of an MJLS can be verified from the system matrices and the transition matrix of the mode of the MJLS [55, 114]. Specifically, we have the following lemma.

Lemma 5.4. ([55, Theorem 3.9] and also [114, Corollary 2.6])

1. An MJLS is SS if and only if there exist matrices $\mathbf{K}(m)$, $m \in \mathcal{M}$, such that the matrix
\[ \mathcal{A}^s := \mathrm{diag}\big(\mathbf{A}^s(0) \otimes \mathbf{A}^s(0), \ldots, \mathbf{A}^s(M) \otimes \mathbf{A}^s(M)\big)\,(\Theta^{\mathsf{T}} \otimes I) \quad (5.31) \]
is Schur stable, i.e., $\rho(\mathcal{A}^s) < 1$, where $\mathbf{A}^s(M_t)$ is given by (5.29).

2.
An MJLS is SD if and only if there exist matrices $\mathbf{H}(m)$, $m \in \mathcal{M}$, such that the matrix
\[ \mathcal{A}^d := \mathrm{diag}\big(\mathbf{A}^d(0) \otimes \mathbf{A}^d(0), \ldots, \mathbf{A}^d(M) \otimes \mathbf{A}^d(M)\big)\,(\Theta^{\mathsf{T}} \otimes I) \quad (5.32) \]
is Schur stable, i.e., $\rho(\mathcal{A}^d) < 1$, where $\mathbf{A}^d(M_t)$ is given by (5.30).

5.4 Infinite Horizon Optimal Control

In centralized LQG control, the solution of the finite horizon problem can be used to solve the infinite horizon average cost problem by ensuring that the finite horizon Riccati recursions reach a steady state and that the corresponding steady-state strategies are optimal [113]. We will follow a similar conceptual approach for our problem. Unlike the centralized LQG case, however, we have to ensure that coupled Riccati recursions reach a steady state. Even if such a steady state is reached, we need to show that the corresponding decentralized strategies outperform every other choice of decentralized strategies. Because of these issues, our analysis of the infinite horizon problem, Problem 5.2, will differ significantly from that of the centralized LQG problem.

Recall that Lemma 5.1 in Section 5.2 describes the optimal control strategies for the finite horizon version of Problem 5.2 (that is, Problem 5.1). Since we know the optimal control strategies for Problem 5.1, solving Problem 5.2 amounts to answering the following three questions:

(Q1) Do the matrices $P^0_t$ and $P^1_t$ defined in (5.15)–(5.17) converge as $t \to -\infty$ to $P^0_*$ and $P^1_*$ that satisfy the coupled fixed point equations (5.33)–(5.34) below?
\[ P^0_* = \Omega(P^0_*, Q, R, A, B), \quad (5.33) \]
\[ P^1_* = \Omega\big((1-p^1) P^0_* + p^1 P^1_*, Q, R^{11}, A, B^{11}\big). \quad (5.34) \]
The above equations are steady-state versions of (5.16)–(5.17) obtained by replacing $P^0_t$, $P^0_{t+1}$ (resp. $P^1_t$, $P^1_{t+1}$) with $P^0_*$ (resp. $P^1_*$).

(Q2) If the matrices $P^0_t$ and $P^1_t$ converge and we define matrices $K^0_*$ and $K^1_*$ using $P^0_*$ and $P^1_*$ as follows,
\[ K^0_* = \Psi(P^0_*, R, A, B), \quad (5.35) \]
\[ K^1_* = \Psi\big((1-p^1) P^0_* + p^1 P^1_*, R^{11}, A, B^{11}\big), \quad (5.36) \]
are the following strategies optimal for Problem 5.2?
\[ \begin{bmatrix} U^{0*}_t \\ U^{1*}_t \end{bmatrix} = K^0_* \hat{X}_t + \begin{bmatrix} 0 \\ K^1_* \end{bmatrix} (X_t - \hat{X}_t), \quad (5.37) \]
where $\hat{X}_0 = 0$ and
\[ \hat{X}_{t+1} = \begin{cases} (A + B K^0_*) \hat{X}_t & \text{if } Z_{t+1} = \emptyset, \\ X_{t+1} & \text{if } Z_{t+1} = X_{t+1}. \end{cases} \quad (5.38) \]
The above strategies are steady-state versions of (5.10)–(5.12) obtained by replacing $K^0_t, K^1_t$ with $K^0_*, K^1_*$.

(Q3) If the matrices $P^0_t$ and $P^1_t$ do not converge, is it still possible to find control strategies with finite cost for Problem 5.2?

We answer the above three questions in the following subsections.

5.4.1 Answering Q1

In Q1, we want to know whether $P^0_t$ and $P^1_t$, defined by the coupled recursions (5.15)–(5.17), converge to $P^0_*$ and $P^1_*$ satisfying (5.33)–(5.34). Our approach for answering Q1 is based on establishing a connection between the recursions for the matrices $P^0_t$ and $P^1_t$ in our DNCS problem and the recursions for the matrices $\mathbf{P}_t(m)$, $m \in \mathcal{M}$, in the MJLS problem reviewed in Section 5.3. This approach consists of the following two steps.

Step 1: Constructing an auxiliary MJLS

Consider an auxiliary MJLS where the set $\mathcal{M}$ of modes is $\{0, 1\}$. Then, we have the following two sequences of matrices, $\mathbf{P}_t(0), \mathbf{P}_t(1)$, defined recursively using (5.25) and (5.26) for this MJLS:
\[ \mathbf{P}_{T+1}(0) = \mathbf{P}_{T+1}(1) = 0, \quad (5.39) \]
\[ \mathbf{P}_t(0) = \Omega\big(\theta_{00} \mathbf{P}_{t+1}(0) + \theta_{01} \mathbf{P}_{t+1}(1), \mathbf{Q}(0), \mathbf{R}(0), \mathbf{A}(0), \mathbf{B}(0)\big), \quad (5.40) \]
\[ \mathbf{P}_t(1) = \Omega\big(\theta_{10} \mathbf{P}_{t+1}(0) + \theta_{11} \mathbf{P}_{t+1}(1), \mathbf{Q}(1), \mathbf{R}(1), \mathbf{A}(1), \mathbf{B}(1)\big). \quad (5.41) \]
Furthermore, recall that from (5.15)–(5.17) we have the following recursions for the matrices $P^0_t$ and $P^1_t$ in our DNCS problem:
\[ P^0_{T+1} = P^1_{T+1} = 0, \quad (5.42) \]
\[ P^0_t = \Omega(P^0_{t+1}, Q, R, A, B), \quad (5.43) \]
\[ P^1_t = \Omega\big((1-p^1) P^0_{t+1} + p^1 P^1_{t+1}, Q, R^{11}, A, B^{11}\big). \quad (5.44) \]
Is it possible to find matrices $\mathbf{A}(m), \mathbf{B}(m), \mathbf{Q}(m), \mathbf{R}(m)$, $m \in \{0, 1\}$, and a transition probability matrix $\Theta$ for the auxiliary MJLS such that the recursions in (5.39)–(5.41) coincide with the recursions in (5.42)–(5.44)? By comparing (5.40)–(5.41) with (5.43)–(5.44), we find that the following definitions make the two sets of equations identical:
\[ \mathbf{A}(0) = \mathbf{A}(1) = A, \quad (5.45) \]
\[ \mathbf{B}(0) = B, \quad \mathbf{B}(1) = [0, B^{11}], \quad (5.46) \]
\[ \mathbf{Q}(0) = \mathbf{Q}(1) = Q, \quad (5.47) \]
\[ \mathbf{R}(0) = R, \quad \mathbf{R}(1) = \begin{bmatrix} I & 0 \\ 0 & R^{11} \end{bmatrix}, \quad (5.48) \]
\[ \Theta = \begin{bmatrix} \theta_{00} & \theta_{01} \\ \theta_{10} & \theta_{11} \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 1-p^1 & p^1 \end{bmatrix}. \quad (5.49) \]
To complete the definition of the auxiliary MJLS, we need to define the initial state and mode probability distributions $\pi_{\mathbf{X}_0}$ and $\pi_{M_0}$. These can be defined arbitrarily, and for simplicity we assume that the initial state and mode of the auxiliary MJLS are fixed to be $\mathbf{X}_0 = 0$ and $M_0 = 1$. The following lemma summarizes the above discussion.

Lemma 5.5. For the auxiliary MJLS described by (5.45)–(5.49), the coupled recursions in (5.39)–(5.41) are identical to the coupled recursions in (5.42)–(5.44).

Proof. The lemma can be proved by straightforward algebraic manipulations.

Remark 5.2. Note that we have not defined $\mathbf{B}(1)$ to be $B^{11}$ because the MJLS model requires that the dimensions of the matrices $\mathbf{B}(0)$ and $\mathbf{B}(1)$ be the same (see Section 5.3). Similar dimensional considerations prevent us from defining $\mathbf{R}(1)$ to be simply $R^{11}$.

Remark 5.3. It should be noted that the auxiliary MJLS is simply a mathematical device. It cannot be seen as a reformulation or another interpretation of our DNCS problem. In particular, the binary mode $M_t$ is not the same as the link state $\Gamma^1_t$. The distinction between $M_t$ and $\Gamma^1_t$ is immediately clear if one recalls that $M_t$ is the state of a Markov chain with the transition probability matrix given in (5.49), whereas $\Gamma^1_t$, $t \ge 0$, is an i.i.d. process.

Step 2: Using MJLS results to answer Q1

Now that we have constructed an auxiliary MJLS such that $\mathbf{P}_t(m) = P^m_t$ for $m \in \{0, 1\}$, we can use the MJLS results about the convergence of the matrices $\mathbf{P}_t(m)$ (that is, Lemmas 5.3 and 5.4) to answer Q1. The following lemma states this result.

Lemma 5.6. Suppose Assumption 5.1 holds. Then, the matrices $P^0_t$ and $P^1_t$ defined in (5.15)–(5.17) converge as $t \to -\infty$ to matrices $P^0_*$ and $P^1_*$ that satisfy the coupled fixed point equations (5.33)–(5.34) if and only if $p^1 < p^1_c$, where the critical threshold $p^1_c$ is given by $p^1_c := \frac{1}{(m^1_c)^2}$ and $m^1_c$ is given by
\[ m^1_c = \min_{K \in \mathbb{R}^{d^1_U \times d_X}} \rho(A + B^{11} K). \quad (5.50) \]

Proof. See Appendix C.3.

Lemma 5.7. Let $m^1_c$ be as defined in (5.50). If $(A, B^{11})$ is reachable, then $m^1_c$ is zero. Otherwise, $m^1_c$ is the largest unreachable mode of $(A, B^{11})$ (in magnitude).

Proof. See Appendix C.4.

Based on Lemma 5.7, we propose a simple algorithm (Algorithm 5.1) for calculating $m^1_c$ and use it to find the critical threshold $p^1_c$.

Algorithm 5.1 PBH test
Input: $A$ and $B^{11}$
    E = eig(A)            # the set of eigenvalues of A
    M = {}                # the set of unreachable modes of (A, B^{11})
    for λ ∈ E do
        if rank([A − λI, B^{11}]) < length(A) then
            add |λ| to M
    if M is empty then
        m^1_c = 0
    else
        m^1_c = max M
Return: $p^1_c = \frac{1}{(m^1_c)^2}$
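For concreteness, a runnable version of Algorithm 5.1 could look like the following sketch; the function name and the rank tolerance parameter are ours (hypothetical).

    import numpy as np

    def critical_threshold(A, B11, tol=1e-9):
        # Runnable version of Algorithm 5.1 (PBH test): returns (m1c, p1c).
        d = A.shape[0]
        unreachable = []
        for lam in np.linalg.eigvals(A):
            pencil = np.hstack([A - lam * np.eye(d), B11])
            if np.linalg.matrix_rank(pencil, tol=tol) < d:
                unreachable.append(abs(lam))      # |lambda| of an unreachable mode
        m1c = max(unreachable) if unreachable else 0.0
        p1c = np.inf if m1c == 0 else 1.0 / m1c**2
        return m1c, p1c

    # Matches Example 5.1 below: A = [[1, 3], [3, 1]], B11 = [[1], [1]].
    A = np.array([[1.0, 3.0], [3.0, 1.0]])
    print(critical_threshold(A, np.array([[1.0], [1.0]])))   # (2.0, 0.25)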
This algorithm (Algorithm 1) finds the set of unreachable modes of pair (A,B 11 ) through the Popov-Belovich-Hautus (PBH) test [115]. If this set is not empty, then it sets m 1 c to be the largest unreachable mode. Otherwise, it sets m 1 c to be zero. Finally, it uses (5.50) to find p 1 c from m 1 c . In this algorithm, function eig(A) returns the set of eigenvalues of matrix A. Furthermore, function length(A) returns the length of the largest dimension of matrix A and function rank(A) returns the rank of matrix A. 4 Example 5.1. If A = 1 3 3 1 and B 11 = 1 0 , then we have m 1 c = 0 because the pair (A,B 11 ) is reachable. Hence, p 1 c =∞. If A = 1 3 3 1 and B 11 = 1 1 , then m 1 c = 2. Hence, we have p 1 c = 0.25. 4 Note that all of the functions eig, length, and rank are standard functions in MATLAB. 86 Chapter 5. Decentralized Control over Unreliable Communication– Infinite horizon 5.4.2 Answering Q2 and Q3 Assuming that P n t → P n ∗ as t→−∞ for n = 0, 1, we want to know whether the control strategies of (5.35)-(5.38) are optimal for Problem 5.2. The following result shows that these control strategies are indeed optimal. Lemma 5.8. If P n t →P n ∗ as t→−∞ for n = 0, 1, then 1. Problem 5.2 has finite optimal cost, 2. The strategies described by (5.35)-(5.38) are optimal for Problem 5.2, 3. Under the strategies described by (5.35)-(5.38), X t and (X t − ˆ X t ) are mean square stable, i.e., sup t≥0 E g ∗ [||X t || 2 ]<∞ and sup t≥0 E g ∗ [||(X t − ˆ X t )|| 2 ]<∞, where g ∗ denotes the strategy described by (5.35)-(5.38). Proof. See Appendix C.5 for proof of parts 1) and 2). See Appendix C.6 for proof of part 3). The following lemma answers Q3. Lemma 5.9. If P n t , n = 0, 1, do not converge as t→−∞, then Problem 5.2 does not have finite optimal cost. Proof. See Appendix C.7. Now that we have answered Q1, Q2 and Q3, we can summarize our results for the infinite horizon DNCS problem (Problem 5.2). 5.4.3 Summary of the Infinite Horizon Results Based on the answers to Q1-Q3, the following theorem summarizes our results for Problem 5.2. 87 Chapter 5. Decentralized Control over Unreliable Communication– Infinite horizon Theorem 5.1. Suppose Assumption 5.1 holds. Then, (i) Problem 5.2 has finite optimal cost if and only if p 1 <p 1 c where the critical threshold p 1 c is given by (5.50). (ii) If p 1 < p 1 c , there exist symmetric positive semi-definite matrices P 0 ∗ ,P 1 ∗ that satisfy (5.33)-(5.34) and the optimal strategies for Problem 5.2 are given by U 0∗ t U 1∗ t =K 0 ∗ ˆ X t + 0 K 1 ∗ X t − ˆ X t , (5.51) where the estimate ˆ X t can be computed recursively using (5.38) with ˆ X 0 = 0 and the gain matrices K 0 ∗ ,K 1 ∗ are given by K 0 ∗ = Ψ(P 0 ∗ ,R,A,B), (5.52) K 1 ∗ = Ψ((1−p 1 )P 0 ∗ +p 1 P 1 ∗ ,R 11 ,A,B 11 ). (5.53) (iii) If p 1 < p 1 c , then under the strategies described in part (ii) above, X t and (X t − ˆ X t ) are mean square stable. Proof. The result follows from Lemmas 5.6, 5.8 and 5.9. If B 11 = 0, the local controller becomes just a sensor without any control ability. In this case, Theorem 5.1 gives the critical threshold as p 1 c =ρ(A) −2 and the closed-loop system is mean-square stable ifρ(A)< 1/ p p 1 . This recovers the single-controller NCS result in [51]. Thus, we have the following corollary of Theorem 5.1. Corollary 5.1 (Theorem 3 of [51] with α = 0 and β =p 1 ). Suppose the local controller is just a sensor (i.e., B 11 = 0) and the remote controller is the only controller present. 
Then, if ρ(A) < 1/ p p 1 , the optimal controller of this single-controller NCS is given by U 0∗ t in (5.51), and the corresponding closed-loop system is mean-square stable. Remark 5.4. If p 1 < p 1 c , the coupled Riccati equations in (5.33)-(5.34) can be solved by iteratively carrying out the recursions in (5.15)-(5.17) until convergence. This is similar to the procedure in [55, Chapter 7]. 88 Chapter 5. Decentralized Control over Unreliable Communication– Infinite horizon 5.5 Extension to Multiple Local Controllers In this section, we study an extension of the system model in Section 5.2 to the case where instead of 1 local controller, we have N local controllers, C 1 ,C 2 ,...,C N , each associated to a co-located plant as shown in Fig. 5.2. We useN to denote the set{1, 2,...,N} and N to denote{0, 1,...,N}. The linear dynamics of plant n∈N are given by X n t+1 =A nn X n t +B nn U n t +B n0 U 0 t +W n t ,t = 0,...,T, whereX n t ∈R d n X is the state of the plant n at timet,U n t ∈R d n U is the control action of the controller C n , U 0 t ∈ R d 0 U is the control action of the controller C 0 , and A nn ,B nn ,B n0 are matrices with appropriate dimensions. We assume thatX n 0 = 0, and thatW n t ,n∈N,t≥ 0, are i.i.d random variables with zero mean and cov(W n t ) = I. Note that we do not assume that random variables W n t , n∈N,t≥ 0, are Gaussian. The overall system dynamics can be written as X t+1 =AX t +BU t +W t , (5.54) where X t = vec(X 1:N t ),U t = vec(U 0:N t ),W t = vec(W 0:N t ) and A,B are defined as A = A 11 0 . . . 0 A NN ,B = B 10 B 11 0 . . . . . . B N0 0 B NN . (5.55) Communication Model The communication model is similar to the one described in Section 5.2.1. In particular, for each n∈N , there is an unreliable link with link failure probability p n from the local controllerC n to the remote controllerC 0 . The local controllerC n uses its unreliable link to send the state X n t of its co-located plant to the remote controller. The state of this link at timet is described by a Bernoulli random variable Γ n t and the output of this link at timet is denoted byZ n t , where Γ n t andZ n t are described by equations similar to (5.2) and (5.3). We assume that Γ 1:N 0:t , t≥ 0, are independent random variables and that they are independent of W 1:N 0:t , t≥ 0. 89 Chapter 5. Decentralized Control over Unreliable Communication– Infinite horizon Unlike the unreliable uplinks, we assume that there exist perfect links from C 0 to C n , for each n∈N . Therefore, C 0 can share Z 1:N t and U 0 t−1 with all local controllers C 1:N . All controllers select their control actions at time t after observing Z 1:N t andU 0 t−1 . We assume that for each n∈N , the links from controllers C n and C 0 to plant n are perfect. Information structure and cost Let H n t denote the information available to controller C n , n∈N , at time t. Then, H n t ={X n 0:t ,U n 0:t−1 ,Z 1:N 0:t ,U 0 0:t−1 }, n∈N, H 0 t ={Z 1:N 0:t ,U 0 0:t−1 }. (5.56) LetH n t be the space of all possible realizations of H n t . Then, C n ’s actions are selected according to U n t =g n t (H n t ), n∈N, (5.57) where g n t :H n t →R d n U is a Borel measurable mapping. We use g := (g 0 0 ,g 0 1 ,... , g 1 0 ,g 1 1 ,... , g N 0 ,g N 1 ,..., ) to collectively denote the control strategies of all N + 1 controllers. The instantaneous cost c t (X t ,U t ) of the system is a quadratic function similar to the one described in (5.6) where X t = vec(X 1:N t ),U t = vec(U 0:N t ) and Q = Q 11 ... Q 1N . . . . . . . . . Q N1 ... 
Q NN ,R = R 00 R 01 ... R 0N R 10 R 11 ... R 1N . . . . . . . . . . . . R N0 ... ... R NN . Q is a symmetric positive semi-definite (PSD) matrix andR is a symmetric positive definite (PD) matrix. Problem Formulation LetG denote the set of all possible control strategies of controllersC 0 ,...,C N . The perfor- mance of control strategies g over a finite horizon T is measured by J T (g) defined in (5.7). 90 Chapter 5. Decentralized Control over Unreliable Communication– Infinite horizon Remote Controller C 0 Local Controller C 1 Plant X 1 t U 0 t U 1 t X 1 t Z 1 t Γ 1 t Z 1:N t Local Controller C N Plant X N t U 0 t U N t X N t Z N t Γ N t Z 1:N t Figure 5.2: System model. The binary random variables Γ 1:N t indicate whether packets are transmitted successfully. Dashed lines indicate control links and solid blue lines indicate communication links. Blue links are perfects but red links are prone to packet drops. For the decentralized networked control system (DNCS) described above, we consider the problem of strategy optimization over finite and infinite time horizons. These two problems are formally defined below. Problem 5.3. For the DNCS described above, solve the following strategy optimization problem: inf g∈G J T (g). (5.58) Problem 5.4. For the DNCS described above, solve the following strategy optimization problem: inf g∈G J ∞ (g) := inf g∈G lim sup T→∞ 1 T + 1 J T (g). (5.59) Due to stability issues in the infinite horizon problem, we make the following assumptions on the system and cost matrices. Assumption 5.2. (A,Q 1/2 ) is detectable and (A,B) is stabilizable. 91 Chapter 5. Decentralized Control over Unreliable Communication– Infinite horizon Assumption 5.3. A nn , (Q nn ) 1/2 is detectable for all n∈N . In Theorem 3.6 of Chapter 3, we described the optimal decentralized strategies for Problem 5.3. We summarize the finite horizon results below. Lemma 5.10. (Theorem 3.6) The optimal control strategies of Problem 5.3 are given by U 0∗ t U 1∗ t . . . U N∗ t =K 0 t ˆ X t + 0 ... 0 K 1 t 0 . . . 0 K N t X t − ˆ X t , (5.60) where ˆ X t = vec( ˆ X 1:N t ) = vec(E[X 1 t |H 0 t ],...,E[X N t |H 0 t ]) =E[X t |H 0 t ] is the estimate (con- ditional expectation) of X t based on the common information H 0 t . The estimate ˆ X t can be computed recursively as follows: for n∈N , ˆ X n 0 =0, (5.61) ˆ X n t+1 = [A] n,: + [B] n,: K 0 t ˆ X t if Z n t+1 =∅, X n t+1 if Z n t+1 =X n t+1 . (5.62) The gain matrices K 0 t and K n t , n∈N , are given by K 0 t = Ψ(P t+1 ,R,A,B), (5.63) K n t = Ψ (1−p n )[P 0 t+1 ] n,n +p n P n t+1 ,R nn ,A nn ,B nn , (5.64) where P 0 t = [P 0 t ] 1,1 ... [P 0 t ] 1,N . . . . . . . . . [P 0 t ] N,1 ... [P 0 t ] N,N ∈ R ( P N n=1 d n X )×( P N n=1 d n X ) and P n t ∈ R d n X ×d n X , for n∈ N , are PSD matrices obtained recursively as follows: where P 0 t and P n t , n∈N , are PSD matrices obtained recursively as follows: P 0 T +1 = 0, P n T +1 = 0, (5.65) P 0 t = Ω(P 0 t+1 ,Q,R,A,B), (5.66) P n t = Ω (1−p n )[P 0 t+1 ] n,n +p n P n t+1 ,Q nn ,R nn ,A nn ,B nn . (5.67) 92 Chapter 5. Decentralized Control over Unreliable Communication– Infinite horizon Furthermore, the optimal cost is given by J ∗ T = T X t=0 N X n=1 tr (1−p n )[P 0 t+1 ] n,n +p n P n t+1 . 
(5.68) 5.5.1 Infinite Horizon Optimal Control As in Section 5.4, the infinite horizon problem (Problem 5.4) can be solved by answering the following three questions: (Q1) Do matricesP 0 t ,...,P N t , defined in (5.65)-(5.67) converge ast→−∞ toP 0 ∗ ,...,P N ∗ , that satisfy the coupled fixed point equations (5.69)-(5.70) below? P 0 ∗ = Ω P 0 ∗ ,Q,R,A,B , (5.69) P n ∗ = Ω (1−p n )[P 0 ∗ ] n,n +p n P n ∗ ,Q nn ,R nn ,A nn ,B nn . (5.70) (Q2) If matrices P 0 t ,...,P N t converge and we define matrices K 0 ∗ ,...,K N ∗ , using matrices P 0 ∗ ,...,P N ∗ as follows, K 0 ∗ = Ψ(P 0 ∗ ,R,A,B), (5.71) K n ∗ = Ψ (1−p n )[P 0 ∗ ] n,n +p n P n ∗ ,R nn ,A nn ,B nn , (5.72) are the following strategies optimal for Problem 5.4? U 0∗ t U 1∗ t . . . U N∗ t =K 0 ∗ ˆ X t + 0 ... 0 K 1 ∗ 0 . . . 0 K N ∗ X t − ˆ X t , (5.73) where ˆ X t can be computed recursively using (5.61)-(5.62) by replacing K 0 t with K 0 ∗ . (Q3) If matrices P 0 t ,...,P N t do not converge, is it still possible to find control strategies with finite cost for Problem 5.4? As in Section 5.4.1, we will answer Q1 by establishing a connection between the recursions for matrices P 0 t ,...,P N t in our DNCS problem and the recursions for matrices P t (m), 93 Chapter 5. Decentralized Control over Unreliable Communication– Infinite horizon m∈M, in the MJLS problem. One obstacle in making this connection is the fact that the matrices P 0 t ,...,P N t in our DNCS problem do not have the same dimensions while the matrices P t (m), m∈M, in the MJLS problem all have the same dimensions. This obstacle was not present in Section 5.4. To get around this difficulty, we first provide a new representation ¯ P 0 t ,..., ¯ P N t for the matricesP 0 t ,...,P N t in our DNCS problem such that the new matrices ¯ P 0 t ,..., ¯ P N t all have the same dimensions. Lemma 5.11. Define matrices ¯ P n t ∈R ( P N k=1 d k X )×( P N k=1 d k X ) , n = 0, 1,...,N, recursively as follows: ¯ P n T +1 = 0, (5.74) ¯ P 0 t = Ω( ¯ P 0 t+1 ,Q,R,A,B), (5.75) ¯ P n t = Ω (1−p n ) ¯ P 0 t+1 +p n ¯ P n t+1 ,L zero (Q,Q nn ,n,n), L iden (R,R nn ,n + 1),L zero (A,A nn ,n,n), L zero (B,B nn ,n,n + 1) , (5.76) where the operatorsL zero andL iden are as defined in Chapter 2. Then, for t≤T + 1, ¯ P 0 t =P 0 t , (5.77) ¯ P n t =L zero (P 0 t ,P n t ,n,n), n = 1,...,N. (5.78) Consequently, matricesP 0 t ,...,P N t converge ast→−∞ if and only if matrices ¯ P 0 t ,..., ¯ P N t converge as t→−∞. Proof. See Appendix C.8 for the proof of a more general case. We can now proceed with constructing an auxiliary MJLS. Consider an auxiliary MJLS where the setM of modes is{0, 1,...,N}. Then, we have the following N + 1 sequences of matrices, P t (0),P t (1),...,P t (N), defined recursively using (5.25) and (5.26) for this MJLS: P T +1 (m) = 0, ∀m∈M, (5.79) P t (0) = Ω N X k=0 θ 0k P t+1 (k),Q (0),R (0),A (0),B (0) , (5.80) 94 Chapter 5. Decentralized Control over Unreliable Communication– Infinite horizon P t (n) = Ω N X k=0 θ nk P t+1 (k),Q (n),R (n),A (n),B (n) . (5.81) Furthermore, we have the recursions of (5.74)-(5.76) for matrices ¯ P 0 t ,..., ¯ P N t in our DNCS problem. By comparing (5.80)-(5.81) with (5.75)-(5.76), we find that the following defini- tions would make the two sets of equations identical: A (0) =A,A (n) =L zero (A,A nn ,n,n), n∈N, (5.82) B (0) =B,B (n) =L zero (B,B nn ,n,n + 1),n∈N, (5.83) Q (0) =Q,Q (n) =L zero (Q,Q nn ,n,n), n∈N, (5.84) R (0) =R,R (n) =L iden (R,R nn ,n + 1), n∈N, (5.85) Θ = 0 1 2 ... N 0 1 0 ... ... 0 1 1−p 1 p 1 . . . . . . 2 1−p 2 0 p 2 . . . . . . . . . 
. . . . . . . . . . . . 0 N 1−p N 0 ... 0 p N . (5.86) To complete the definition of the auxiliary MJLS, we need to define the initial state and mode probability distributions π X 0 and π M 0 . These can be defined arbitrarily and for simplicity we assume that the initial state is fixed to be X 0 = 0 and the initial mode M 0 is uniformly distributed over the setM. The following lemma summarizes the above discussion. Lemma 5.12. For the auxiliary MJLS described by (5.82)-(5.86), the coupled recursions in (5.79)-(5.81) are identical to the coupled recursions in (5.74)-(5.76). Proof. The lemma can be proved by straightforward algebraic manipulations. We can now use the MJLS results about convergence of matrices P t (m) (that is, Lemmas 5.3 and 5.4) to answer Q1. The following lemma states this result. 95 Chapter 5. Decentralized Control over Unreliable Communication– Infinite horizon Lemma 5.13. Suppose Assumptions 5.2 and 5.3 hold. Then, the matrices P 0 t ,...,P N t defined in (5.65)-(5.67) converge ast→−∞ to matricesP 0 ∗ ,...,P N ∗ that satisfy the coupled fixed point equations (5.69)-(5.70) if and only if p n < p n c for all n∈N , where the critical thresholds p n c ,n∈N , are given by 1 √ p n c = min K∈R d n U ×d n X ρ(A nn +B nn K). (5.87) Proof. See Appendix C.9 for the proof of a more general case. The following lemmas answer Q2 and Q3. Lemma 5.14. If P n t →P n ∗ as t→−∞ for n = 0,...,N, then 1. Problem 5.4 has finite optimal cost, 2. The strategies described by (5.71)-(5.73) are optimal for Problem 5.4, 3. Under the strategies described by (5.71)-(5.73), X t and (X t − ˆ X t ) are mean square stable. Proof. See Appendix C.10 for the proof of a more general case. Lemma 5.15. If matrices P 0 t ,...,P N t do not converge as t→−∞, then Problem 5.4 does not have finite optimal cost. Proof. See Appendix C.11 for the proof of a more general case. 5.5.2 Summary of the Infinite Horizon Results Based on the answers to Q1-Q3, the following theorem summarizes our results for Problem 5.4. Theorem 5.2. Suppose Assumptions 5.2 and 5.3 hold. Then, (i) Problem 5.4 has finite optimal cost if and only if for all n∈N , p n < p n c where the critical threshold p n c is given by (5.87). 96 Chapter 5. Decentralized Control over Unreliable Communication– Infinite horizon (ii) If p n < p n c for all n ∈ N , there exist symmetric positive semi-definite matrices P 0 ∗ ,...,P N ∗ that satisfy (5.69)-(5.70) and the optimal strategies for Problem 5.4 are given by (5.73) where the estimate ˆ X t can be computed recursively using (5.61)-(5.62) by replacingK 0 t withK 0 ∗ and the gain matricesK 0 ∗ ,...,K N ∗ are given by (5.71)-(5.72). (iii) If p n < p n c for all n∈N , then under the strategies described in part (ii) above, X t and (X t − ˆ X t ) are mean square stable. Corollary 5.2. Suppose the local controllers are just sensors (i.e., B nn = 0 for n = 1,...,N) and the remote controller is the only controller present. Then, if ρ(A)< 1/ √ p n for all n = 1,...,N, the optimal controller of this multi-sensor, single-controller NCS is given by U 0∗ t in (5.73), and the corresponding closed-loop system is mean-square stable. Remark 5.5. If Assumption 5.3 is not true, define, for n = 1,...,N, p n c = min{p n d ,p n s }, where 1 √ p n s = min K∈R d n U ×d n X ρ(A nn +B nn K). (5.88) 1 p p n d = min H∈R d n X ×d n X ρ A nn +H(Q nn ) 1/2 . (5.89) Then, using arguments similar to those used for proving Theorem 5.2, we can show that if p n < p n c for all n∈N , the strategies in (5.73) are optimal for Problem 5.4. 
Moreover, Problem 5.4 has finite optimal cost and the system state is mean square stable under optimal strategies. 5.6 Extension to Multiple Local Controllers With A Global State In this section, we study an extension of the system model in Section 5.5 where there is also a global state associated with the global controller 5 as shown in Fig. 5.3. We useN to denote the set{1, 2,...,N} andN to denote{0, 1,...,N}. The linear dynamics of the plant 0 (that is, the global plant) are given by X 0 t+1 =A 00 X 0 t +B 00 U 0 t +W 0 t , (5.90) 5 The global controller is analogous to the remote controller in Section 5.5 of this chapter. However, in addition to controlling the local plants, here it also controls the global plant. 97 Chapter 5. Decentralized Control over Unreliable Communication– Infinite horizon Global Controller C 0 Global Plant X 0 t Local Controller C 1 Plant X 1 t U 0 t U 1 t X 1 t Z 1 t Γ 1 t U 0 t X 0 t X 0 t ,Z 1:N t Local Controller C N Plant X N t U 0 t U N t X N t Z N t Γ N t X 0 t ,Z 1:N t Figure 5.3: System model. The binary random variables Γ 1:N t indicates whether packets are transmitted successfully. Blue lines indicate perfect links and red lines indicate unreliable links. Solid lines are communication links and dotted lines are control links. and the linear dynamics of plant n∈N are given by X n t+1 =A nn X n t +A n0 X 0 t +B nn U n t +B n0 U 0 t +W n t ,t = 0,...,T, (5.91) where for n ∈ N , X n t ∈ R d n X is the state of the plant n at time t and U n t ∈ R d n U is the control action of the controller C n . Furthermore, A nn ,B nn , n∈N , and A n0 ,B n0 , n∈N , are matrices with appropriate dimensions. We assume that X n 0 = 0, and that W n t , n∈N,t≥ 0, are i.i.d random variables with zero mean and cov(W n t ) = I. Note that we do not assume that random variablesW n t ,n∈N,t≥ 0, are Gaussian. The overall system dynamics can be written as X t+1 =AX t +BU t +W t , (5.92) where X t = vec(X 0:N t ),U t = vec(U 0:N t ),W t = vec(W 0:N t ) and A,B are defined as A = A 00 A 10 A 11 0 . . . . . . A N0 0 A NN ,B = B 00 B 10 B 11 0 . . . . . . B N0 0 B NN . (5.93) 98 Chapter 5. Decentralized Control over Unreliable Communication– Infinite horizon Communication Model The communication model is similar to the one described in Section 5.5. In particular, for each n∈N , there is an unreliable link with link failure probability p n from the local controllerC n to the global controllerC 0 . The local controllerC n uses its unreliable link to send the state X n t of its co-located plant to the global controller. The state of this link at timet is described by a Bernoulli random variable Γ n t and the output of this link at timet is denoted byZ n t , where Γ n t andZ n t are described by equations similar to (5.2) and (5.3). We assume that Γ 1:N 0:t , t≥ 0, are independent random variables and that they are independent of W 0:N 0:t , t≥ 0. Unlike the unreliable uplinks, we assume that there exist perfect links from C 0 to C n , for each n∈N . Therefore, C 0 can share Z 1:N t and U 0 t−1 with all local controllers C 1:N . We further assume that all controllers C 0:N can perfectly observes the state X 0 t of the global plant. All controllers select their control actions at time t after observing Z 1:N t and U 0 t−1 . We assume that for each n∈N , the links from controllers C n to plant n and for each n∈N , the links from controller C 0 to plant n are perfect. Information structure and cost Let H n t denote the information available to controller C n , n∈N , at time t. 
Then, H n t ={X 0 0:t ,X n 0:t ,U n 0:t−1 ,Z 1:N 0:t ,U 0 0:t−1 }, n∈N, H 0 t ={X 0 0:t ,Z 1:N 0:t ,U 0 0:t−1 }. (5.94) LetH n t be the space of all possible realizations of H n t . Then, C n ’s actions are selected according to U n t =g n t (H n t ), n∈N, (5.95) where g n t :H n t →R d n U is a Borel measurable mapping. We use g := (g 0 0 ,g 0 1 ,... , g 1 0 ,g 1 1 ,... , g N 0 ,g N 1 ,..., ) to collectively denote the control strategies of all N + 1 controllers. 99 Chapter 5. Decentralized Control over Unreliable Communication– Infinite horizon The instantaneous cost c t (X t ,U t ) of the system is a quadratic function similar to the one described in (5.6) where X t = vec(X 0:N t ),U t = vec(U 0:N t ) and Q = Q 00 Q 01 ... Q 0N Q 10 Q 11 ... Q 1N . . . . . . . . . . . . Q N0 ... ... Q NN ,R = R 00 R 01 ... R 0N R 10 R 11 ... R 1N . . . . . . . . . . . . R N0 ... ... R NN . (5.96) Q is a symmetric positive semi-definite (PSD) matrix andR is a symmetric positive definite (PD) matrix. Problem Formulation LetG denote the set of all possible control strategies of controllersC 0 ,...,C N . The perfor- mance of control strategies g over a finite horizon T is measured by J T (g) defined in (5.7). For the decentralized networked control system (DNCS) described above, we consider the problem of strategy optimization over finite and infinite time horizons. These two problems are formally defined below. Problem 5.5. For the DNCS described above, solve the following strategy optimization problem: inf g∈G J T (g). (5.97) Problem 5.6. For the DNCS described above, solve the following strategy optimization problem: inf g∈G J ∞ (g) := inf g∈G lim sup T→∞ 1 T + 1 J T (g). (5.98) Due to stability issues in the infinite horizon problem, we make the following assumptions on the system and cost matrices. Assumption 5.4. (A,Q 1/2 ) is detectable and (A,B) is stabilizable. Assumption 5.5. A nn , (Q nn ) 1/2 is detectable for all n∈N . In Theorem 4.4 of Chapter 4, we described the optimal decentralized strategies for Problem 5.3. We summarize this result below. 100 Chapter 5. Decentralized Control over Unreliable Communication– Infinite horizon Lemma 5.16. (Theorem 4.4) The optimal control strategies of Problem 5.5 are given by U 0∗ t U 1∗ t . . . U N∗ t =K 0 t X 0 t ˆ X 1 t . . . ˆ X N t + 0 K 1 t (X 1 t − ˆ X 1 t ) . . . K N t (X N t − ˆ X N t ) , (5.99) whereX n t =E[X n t |H 0 t ] is the estimate (conditional expectation) ofX n t based on the common information H 0 t . The estimate ˆ X n t can be computed recursively as follows: for n∈N , ˆ X n 0 =0, (5.100) ˆ X n t+1 = [A] n+1,: + [B] n+1,: K 0 t ˆ X t if Z n t+1 =∅, X n t+1 if Z n t+1 =X n t+1 . (5.101) where ˆ X t := vec(X 0 t , ˆ X 1:N t ). The gain matrices K 0 t and K n t , n∈N , are given by K 0 t = Ψ(P 0 t+1 ,R,A,B), (5.102) K n t = Ψ (1−p n )[P 0 t+1 ] n+1,n+1 +p n P n t+1 ,R nn ,A nn ,B nn , (5.103) where P 0 t = [P 0 t ] 1,1 ... [P 0 t ] 1,N+1 . . . . . . . . . [P 0 t ] N+1,1 ... [P 0 t ] N+1,N+1 ∈ R ( P N n=0 d n X )×( P N n=0 d n X ) and P n t ∈ R d n X ×d n X , for n∈N , are PSD matrices obtained recursively as follows: where P 0 t and P n t , n∈N , are PSD matrices obtained recursively as follows: P 0 T +1 = 0, P n T +1 = 0, (5.104) P 0 t = Ω(P 0 t+1 ,Q,R,A,B), (5.105) P n t = Ω (1−p n )[P 0 t+1 ] n+1,n+1 +p n P n t+1 ,Q nn ,R nn ,A nn ,B nn . (5.106) Furthermore, the optimal cost is given by J ∗ T = T X t=0 h tr([P 0 t+1 ] 1,1 ) + N X n=1 [(1−p n ) tr([P 0 t+1 ] n+1,n+1 ) +p n tr(P n t+1 )] i . (5.107) 101 Chapter 5. 
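Before moving to the infinite horizon analysis in Section 5.6.1 below, note that the estimator recursion (5.100)-(5.101) of Lemma 5.16 is straightforward to implement. The following sketch, assuming numpy, propagates the stacked estimate with the closed-loop prediction and overwrites the blocks whose packets arrive; its interface (the blocks list and the Z dictionary) is an illustrative assumption, not notation from the thesis.

import numpy as np

def common_info_estimate_step(Xhat, K0, A, B, blocks, Z):
    # One step of the common-information estimator (5.100)-(5.101):
    # Xhat   : stacked estimate vec(X^0_t, Xhat^1_t, ..., Xhat^N_t)
    # K0     : remote controller gain K^0_t
    # blocks : blocks[n] = slice of the rows of subsystem n in the stack
    # Z      : dict with Z[n] = received X^n_{t+1}, or None on a drop;
    #          Z[0] is always present since X^0 is perfectly observed.
    Xpred = (A + B @ K0) @ Xhat    # row block n+1 is ([A]_{n+1,:} + [B]_{n+1,:} K^0_t) Xhat
    Xnext = Xpred.copy()
    for n, rows in enumerate(blocks):
        if Z.get(n) is not None:   # successful uplink (or n = 0)
            Xnext[rows] = Z[n]     # reset the block to the true state
        # on a drop, keep the model-based prediction, as in (5.101)
    return Xnext

On a dropped packet, block n of the estimate is exactly the corresponding row block of (A + B K^0_t) applied to the current estimate, which is the first case of (5.101).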
Decentralized Control over Unreliable Communication– Infinite horizon 5.6.1 Infinite Horizon Optimal Control As in Section 5.4, the infinite horizon problem (Problem 5.6) can be solved by answering the following three questions: (Q1) Do matricesP 0 t ,...,P N t , defined in (5.104)-(5.106) converge ast→−∞ toP 0 ∗ ,...,P N ∗ , that satisfy the coupled fixed point equations (5.108)-(5.109) below? P 0 ∗ = Ω P 0 ∗ ,Q,R,A,B , (5.108) P n ∗ = Ω (1−p n )[P 0 ∗ ] n+1,n+1 +p n P n ∗ ,Q nn ,R nn ,A nn ,B nn . (5.109) (Q2) If matrices P 0 t ,...,P N t converge and we define matrices K 0 ∗ ,...,K N ∗ , using matrices P 0 ∗ ,...,P N ∗ as follows, K 0 ∗ = Ψ(P 0 ∗ ,R,A,B), (5.110) K n ∗ = Ψ (1−p n )[P 0 ∗ ] n+1,n+1 +p n P n ∗ ,R nn ,A nn ,B nn , (5.111) are the following strategies optimal for Problem 5.6? U 0∗ t U 1∗ t . . . U N∗ t =K 0 ∗ X 0 t ˆ X 1 t . . . ˆ X N t + 0 K 1 ∗ (X 1 t − ˆ X 1 t ) . . . K N ∗ (X N t − ˆ X N t ) , (5.112) where ˆ X t can be computed recursively using (5.100)-(5.101) by replacing K 0 t with K 0 ∗ . (Q3) If matrices P 0 t ,...,P N t do not converge, is it still possible to find control strategies with finite cost for Problem 5.6? As in Section 5.4.1, we will answer Q1 by establishing a connection between the recursions for matricesP 0 t ,...,P N t in our DNCS problem and the recursions for matricesP t (m),m∈M, in the MJLS problem. Similar to Section 5.5.1, we first provide a new representation ¯ P 0 t ,..., ¯ P N t for the matrices P 0 t ,...,P N t in our DNCS problem such that the new matrices ¯ P 0 t ,..., ¯ P N t all have the same dimensions. 102 Chapter 5. Decentralized Control over Unreliable Communication– Infinite horizon Lemma 5.17. Define matrices ¯ P n t ∈R ( P N k=0 d k X )×( P N k=0 d k X ) , n = 0, 1,...,N, recursively as follows: ¯ P n T +1 = 0, (5.113) ¯ P 0 t = Ω( ¯ P 0 t+1 ,Q,R,A,B), (5.114) ¯ P n t = Ω (1−p n ) ¯ P 0 t+1 +p n ¯ P n t+1 ,L zero (Q,Q nn ,n + 1,n + 1), L iden (R,R nn ,n + 1),L zero (A,A nn ,n + 1,n + 1), L zero (B,B nn ,n + 1,n + 1) , (5.115) where the operatorsL zero andL iden are as defined in Section 2. Then, for t≤T + 1, ¯ P 0 t =P 0 t , (5.116) ¯ P n t =L zero (P 0 t ,P n t ,n + 1,n + 1), n = 1,...,N. (5.117) Consequently, matricesP 0 t ,...,P N t converge ast→−∞ if and only if matrices ¯ P 0 t ,..., ¯ P N t converge as t→−∞. Proof. See Appendix C.8 for a proof. We can now proceed with constructing an auxiliary MJLS. Consider an auxiliary MJLS where the setM of modes is{0, 1,...,N}. Then, we have the following N + 1 sequences of matrices, P t (0),P t (1),...,P t (N), defined recursively using (5.25) and (5.26) for this MJLS: P T +1 (m) = 0, ∀m∈M, (5.118) P t (0) = Ω N X k=0 θ 0k P t+1 (k),Q (0),R (0),A (0),B (0) , (5.119) P t (n) = Ω N X k=0 θ nk P t+1 (k),Q (n),R (n),A (n),B (n) . (5.120) Furthermore, we have the recursions of (5.113)-(5.115) for matrices ¯ P 0 t ,..., ¯ P N t in our DNCS problem. By comparing (5.119)-(5.120) with (5.114)-(5.115), we find that the following definitions would make the two sets of equations identical: A (0) =A,A (n) =L zero (A,A nn ,n + 1,n + 1), n∈N, (5.121) 103 Chapter 5. Decentralized Control over Unreliable Communication– Infinite horizon B (0) =B,B (n) =L zero (B,B nn ,n + 1,n + 1),n∈N, (5.122) Q (0) =Q,Q (n) =L zero (Q,Q nn ,n + 1,n + 1), n∈N, (5.123) R (0) =R,R (n) =L iden (R,R nn ,n + 1), n∈N, (5.124) Θ = 0 1 2 ... N 0 1 0 ... ... 0 1 1−p 1 p 1 . . . . . . 2 1−p 2 0 p 2 . . . . . . . . . . . . . . . . . . . . . 0 N 1−p N 0 ... 0 p N . 
(5.125) To complete the definition of the auxiliary MJLS, we need to define the initial state and mode probability distributions π X 0 and π M 0 . These can be defined arbitrarily and for simplicity we assume that the initial state is fixed to be X 0 = 0 and the initial mode M 0 is uniformly distributed over the setM. The following lemma summarizes the above discussion. Lemma 5.18. For the auxiliary MJLS described by (5.121)-(5.125), the coupled recursions in (5.118)-(5.120) are identical to the coupled recursions in (5.113)-(5.115). Proof. The lemma can be proved by straightforward algebraic manipulations. Now that we have constructed an auxiliary MJLS where P t (m) = ¯ P m t for m = 0,...,N, we can use the MJLS results about convergence of matrices P t (m) (that is, Lemmas 5.3 and 5.4) to answer Q1. The following lemma states this result. Lemma 5.19. Suppose Assumptions 5.4 and 5.5 hold. Then, the matrices P 0 t ,...,P N t defined in (5.104)-(5.106) converge as t→−∞ to matrices P 0 ∗ ,...,P N ∗ that satisfy the coupled fixed point equations (5.108)-(5.109) if and only if p n <p n c for all n∈N , where the critical thresholds p n c ,n∈N , are given by 1 √ p n c = min K∈R d n U ×d n X ρ(A nn +B nn K). (5.126) Proof. See Appendix C.9. The following lemmas answer Q2 and Q3. 104 Chapter 5. Decentralized Control over Unreliable Communication– Infinite horizon Lemma 5.20. If P n t →P n ∗ as t→−∞ for n = 0,...,N, then 1. Problem 5.6 has finite optimal cost, 2. The strategies described by (5.110)-(5.112) are optimal for Problem 5.6, 3. Under the strategies described by (5.110)-(5.112), X t and (X t − ˆ X t ) are mean square stable. Proof. See Appendix C.10. Lemma 5.21. If matrices P 0 t ,...,P N t do not converge as t→−∞, then Problem 5.6 does not have finite optimal cost. Proof. See Appendix C.11. 5.6.2 Summary of the Infinite Horizon Results Based on the answers to Q1-Q3, the following theorem summarizes our results for Problem 5.6. Theorem 5.3. Suppose Assumptions 5.4 and 5.5 hold. Then, (i) Problem 5.6 has finite optimal cost if and only if for all n∈N , p n < p n c where the critical threshold p n c is given by (5.126). (ii) If p n < p n c for all n ∈ N , there exist symmetric positive semi-definite matrices P 0 ∗ ,...,P N ∗ that satisfy (5.108)-(5.109) and the optimal strategies for Problem 5.6 are given by U 0∗ t U 1∗ t . . . U N∗ t =K 0 ∗ X 0 t ˆ X 1 t . . . ˆ X N t + 0 K 1 ∗ (X 1 t − ˆ X 1 t ) . . . K N ∗ (X N t − ˆ X N t ) , (5.127) 105 Chapter 5. Decentralized Control over Unreliable Communication– Infinite horizon where the estimate ˆ X t can be computed recursively using (5.100)-(5.101) by replacing K 0 t with K 0 ∗ and the gain matrices K 0 ∗ ,...,K N ∗ are given by K 0 ∗ = Ψ(P 0 ∗ ,R,A,B), (5.128) K n ∗ = Ψ (1−p n )[P 0 ∗ ] n+1,n+1 +p n P n ∗ ,R nn ,A nn ,B nn . (5.129) (iii) If p n < p n c for all n∈N , then under the strategies described in part (ii) above, X t and (X t − ˆ X t ) are mean square stable. Remark 5.6. If Assumption 5.5 is not true, define, for n = 1,...,N, p n c = min{p n s ,p n d }, where 1 √ p n s = min K∈R d n U ×d n X ρ(A nn +B nn K). (5.130) 1 p p n d = min H∈R d n X ×d n X ρ A nn +H(Q nn ) 1/2 . (5.131) Then, using arguments similar to those used for proving Theorem 5.3, we can show that if p n < p n c for all n∈N , the strategies in (5.127) are optimal for Problem 5.6. Moreover, Problem 5.6 has finite optimal cost and the system state is mean square stable under optimal strategies. 
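The thresholds in Remark 5.6 can be computed with the same PBH machinery as Algorithm 5.1. The sketch below, assuming numpy, evaluates p^n_s via the unreachable modes of (A^nn, B^nn) and p^n_d via the standard duality between detectability of (A^nn, (Q^nn)^{1/2}) and reachability of the transposed pair; it is an illustration under these duality facts, not code from the thesis.

import numpy as np

def largest_unreachable_mode(A, B, tol=1e-9):
    # PBH test, as in Algorithm 5.1: |lam| over eigenvalues where
    # [A - lam*I, B] drops rank; returns 0 if (A, B) is reachable.
    n = A.shape[0]
    mags = [abs(lam) for lam in np.linalg.eigvals(A)
            if np.linalg.matrix_rank(np.hstack([A - lam * np.eye(n), B]),
                                     tol=tol) < n]
    return max(mags, default=0.0)

def remark_5_6_threshold(Ann, Bnn, Qnn):
    # p_s^n from (5.130): largest unreachable mode of (A^nn, B^nn).
    m_s = largest_unreachable_mode(Ann, Bnn)
    # p_d^n from (5.131): the undetectable modes of (A^nn, (Q^nn)^{1/2})
    # are the unreachable modes of the transposed pair.
    w, V = np.linalg.eigh(Qnn)
    Qroot = V @ np.diag(np.sqrt(np.clip(w, 0.0, None))) @ V.T  # PSD square root
    m_d = largest_unreachable_mode(Ann.T, Qroot.T)
    p = lambda m: np.inf if m == 0.0 else 1.0 / m**2
    return min(p(m_s), p(m_d))

Under Assumption 5.5, every undetectable mode of (A^nn, (Q^nn)^{1/2}) is stable, so p^n_d exceeds one and is never the binding term; the threshold then reduces to p^n_s, consistent with (5.126).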
5.7 Discussion

5.7.1 Summary of the Approach

The analysis in Sections 5.4, 5.5, and 5.6 suggests a general approach for solving infinite horizon decentralized control/DNCS problems. This can be summarized as follows:

1. Solve the finite horizon version of the DNCS/decentralized control problem (for instance by using the common information approach [54]). Suppose the optimal strategies are characterized by matrices P^m_t, m ∈ M = {1, 2, ..., M}, which satisfy M coupled Riccati recursions

P^m_t = Ω(Σ_j θ_mj P^j_{t+1}, Q^m, R^m, A^m, B^m),   (5.132)

for some matrices Q^m, R^m, A^m, B^m and positive numbers θ_mj for m, j ∈ M. Note that we can scale the θ's such that Σ_{j∈M} θ_mj = 1 by appropriately scaling A^m and R^m for all m ∈ M.

2. Construct an M-mode auxiliary MJLS with transition probabilities θ_mj and system matrices Q^m, R^m, A^m, B^m so that the Riccati recursions associated with optimal control of the MJLS coincide with the Riccati recursions (5.132).

3. Analyze stability criteria of the auxiliary MJLS to find conditions under which the Riccati recursions of the DNCS reach a steady state.

4. Verify that the decentralized strategies characterized by the steady state DNCS Riccati equations are optimal.

5.7.2 The Information Switching

Even though the auxiliary MJLS we used in our analysis is an artificial system without apparent physical meaning (see Remark 5.3), a general DNCS with random packet drops (or random packet delays) does have some aspects of a switched system. In particular, the information at a controller (e.g., the remote controller in our problem) switches between different patterns based on the state of the underlying communication network. The information of the remote controller in Section 5.2 clearly switches between two patterns: no observation when the packet is dropped, and perfect observation when the transmission is successful. The number of such patterns seems related to (but not always equal to) the number of modes in the MJLS used to analyze the DNCS. For the two-controller DNCS of Section 5.2, the number of patterns between which the remote controller's information switches is the same as the number of modes in its auxiliary MJLS. For the N+1 controller DNCS in Section 5.5, the remote controller's information appears to switch between 2^N patterns (depending on the state of the N links) but its auxiliary MJLS has only N+1 modes. This difference between the number of information patterns and the number of modes is due to the nature of the plant dynamics, which ensure that the remote controller's estimate of the nth plant is not affected by the state of the mth link if m ≠ n.

5.8 Simulation Results

In this section, we demonstrate our theoretical results through several simulations.

Figure 5.4: Σ_{i=0}^{1} ‖P^i_{t+1} − P^i_t‖_F versus the number of iterations, for p^1 = 0.24 (top panel) and p^1 = 0.25 (bottom panel); in both cases p^1_c = 0.25.

To verify Lemma 5.6, we study the convergence of the matrices P^0_t and P^1_t given by (5.15)-(5.17). To this end, we consider the following example:

A = [1 3; 3 1],   B = [B^10, B^11] = [1 1; 0 1],   Q = I_2,   R = 0.01 I_2.   (5.133)

For this example, we can use Algorithm 5.1 to find the critical probability threshold as p^1_c = 0.25. (A short numerical sketch of this convergence check is given after this paragraph.)
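The following sketch reproduces the convergence check behind Figure 5.4. It assumes numpy and assumes the standard discrete-time form of the Ω operator (the operator's exact definition is given in Chapter 2; the form Ω(P, Q, R, A, B) = Q + A^⊺PA − A^⊺PB(R + B^⊺PB)^{−1}B^⊺PA used below is the usual Riccati update and is an assumption of this sketch).

import numpy as np

def omega(P, Q, R, A, B):
    # standard discrete-time Riccati update (an assumed form of Omega)
    G = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
    return Q + A.T @ P @ A - A.T @ P @ B @ G

A = np.array([[1.0, 3.0], [3.0, 1.0]])
B = np.array([[1.0, 1.0], [0.0, 1.0]])   # B = [B^10, B^11], as in (5.133)
B11 = B[:, [1]]
Q, R = np.eye(2), 0.01 * np.eye(2)
R11 = R[1:, 1:]                          # cost weight on U^1 only

def iterate(p1, iters=200):
    # coupled recursions (5.15)-(5.17), run backwards from zero
    P0 = P1 = np.zeros((2, 2))
    gaps = []
    for _ in range(iters):
        P0n = omega(P0, Q, R, A, B)
        P1n = omega((1 - p1) * P0 + p1 * P1, Q, R11, A, B11)
        gaps.append(np.linalg.norm(P0n - P0) + np.linalg.norm(P1n - P1))
        P0, P1 = P0n, P1n
    return gaps

print(iterate(0.24)[-1])  # increments shrink toward zero
print(iterate(0.25)[-1])  # increments blow up at p^1 = p^1_c

With these inputs, the increments vanish for p^1 = 0.24 and grow without bound for p^1 = 0.25, which is the behavior reported in Figure 5.4.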
Lemma 5.6 states that the matrices P^0_t and P^1_t converge (as t → −∞) if and only if p^1 < p^1_c. Since the critical probability threshold p^1_c is 0.25, we study the convergence of the matrices P^0_t and P^1_t for the two values 0.24 and 0.25 of the packet drop probability p^1. As we can see from Figure 5.4, for the packet drop probability p^1 = 0.24, ‖P^0_{t+1} − P^0_t‖_F + ‖P^1_{t+1} − P^1_t‖_F goes to zero as the number of iterations increases. This means that both matrices P^0_t and P^1_t converge in this case. However, for p^1 = 0.25, ‖P^0_{t+1} − P^0_t‖_F + ‖P^1_{t+1} − P^1_t‖_F grows arbitrarily large (on the order of 10^44 after 200 iterations) as the number of iterations increases. This means that at least one of the matrices P^0_t and P^1_t does not converge, which agrees with the statement of Lemma 5.6.

Figure 5.5: Average cost versus time, for p^1 = 0.24 (top panel) and p^1 = 0.25 (bottom panel); in both cases p^1_c = 0.25.

To verify part (i) of Theorem 5.1 (which states that the optimal infinite-horizon average cost is finite if and only if p^1 < p^1_c), we consider an instance of Problem 5.2 where the system and cost matrices are described by (5.133) and the noise variables W_t are assumed to be Gaussian with zero mean and covariance matrix I_2. We first calculate the matrices P^0_• and P^1_• for p^1 = 0.24 and p^1 = 0.25 by iterating the recursions (5.15)-(5.17) 200 times. Then, for both cases (p^1 = 0.24 and p^1 = 0.25), we substitute the matrices P^0_• and P^1_• for P^0_* and P^1_* and use (5.35)-(5.38) to calculate the control actions U^{0*}_t and U^{1*}_t for 0 ≤ t ≤ T, where T = 1000. We ran this experiment N_E = 1000 times, and Figure 5.5 shows the mean of the average cost, (1/N_E) Σ_{i=1}^{N_E} (1/t) Σ_{j=0}^{t} c(X_j, U^*_j), versus time t for 0 ≤ t ≤ T. As we can see from this figure, while the average cost remains finite for p^1 = 0.24, it grows arbitrarily large for p^1 = 0.25. This agrees with part (i) of Theorem 5.1 since the critical probability threshold p^1_c for this example is 0.25.

5.9 Conclusion

We considered the infinite horizon optimal control problem of a decentralized networked control system (DNCS) with unreliable communication links. We showed that if the link failure probabilities are below certain critical thresholds, then the solutions of certain coupled Riccati recursions reach a steady state and the corresponding decentralized strategies are optimal. Above these thresholds, we showed that no strategy can achieve finite cost. Our main results in Theorems 5.1, 5.2, and 5.3 explicitly identify the critical thresholds for the link failure probabilities and the corresponding optimal decentralized strategies when all link failure probabilities are below their thresholds. These results were obtained by exploiting a connection between our DNCS Riccati recursions and the coupled Riccati recursions of an auxiliary Markov jump linear system (MJLS).

Chapter 6

Decentralized Control with Partially Unknown Systems

6.1 Introduction

Many modern control systems such as networked control systems and teams of autonomous systems consist of a group of agents acting in collaboration with each other to achieve a common goal under uncertainty [116].
Such systems have motivated the investigation of multi-agent (decentralized) control problems under various information structures [50, 117– 120]. Most of these works assume that the system model is known precisely to all the agents in the system. However, for most real-world systems the model and its parameters are often not known perfectly to the agents. Reinforcement learning provides a framework for controlling a dynamical system in the absence of perfect knowledge of system parameters. There exists a rich body of work in the field of multi-agent reinforcement learning where the system is usually modeled as a multi-agent Markov Decision Process (MDP) or a team Markov game [121–127]. However, these works mostly deal in a finite state space and action space setting and cannot be extended trivially to a system with continuous state/action space. The adaptive control of a single-agent (centralized) linear quadratic (LQ) control problem has been well-studied [128–131]. However, many of the available results are asymptotic in nature and do not take into account the performance during learning. Recently [1, 132– 134] have used online learning methods for single-agent LQ control problems which provide 111 Chapter 6. Decentralized Control with Partially Unknown Systems finite-time guarantees on the cost achieved by the learning algorithm. Among these is the idea of Thompson Sampling (TS) which has gained wide attention due to its computational efficiency and performance. TS based algorithms for single-agent LQ control problems have been proposed in [1, 134–136] which achieve a regret of ˜ O( √ T ) over a time horizon of T . Here ˜ O(·) hides constants and logarithmic factors. This regret scaling is believed to be optimal for single-agent control LQ problems except for logarithmic factors. In this chapter, we consider a multi-agent LQ control problem consisting of three systems, a “global” system and two “local” systems. In this problem, there are three agents – the actions of agent 1 can affect the global system as well as the local systems while the actions of agents 2 and 3 can only affect their respective co-located local systems. Further, the global system’s state can affect the local systems’ state evolution. Variations of this problem setting where the dynamics of all systems are known have been studied in the literature [31, 137, 138]. We are interested in minimizing the infinite-horizon average cost incurred when the dynamics of the global system are not known to the agents. We propose a Thompson Sampling (TS)-based multi-agent learning algorithm where each agent learns the global system’s dynamics independently. We construct an auxiliary single-agent LQ control problem and show that the expected (Bayesian) regret achieved by our algorithm is upper bounded by the expected (Bayesian) regret for the auxiliary single-agent problem under a TS algorithm. Since the TS algorithm of [1] achieves a ˜ O( √ T ) expected regret for single-agent LQ problems, our result indicates that the expected regret of our algorithm is upper bounded by ˜ O( √ T ) under certain assumptions. Our numerical experiments indicate that this bound is matched in practice. 6.1.1 Organization The rest of the chapter is organized as follows. We introduce the system model and formulate the multi-agent learning problem in Section 6.2. In Section 6.3, we construct an auxiliary single-agent LQ control problem based on the problem of Section 6.2. 
This auxiliary single- agent LQ control problem will be used later for the regret analysis of the problem of Section 6.2. In Section 6.4, we state our main result which provides a learning algorithm with sub- linear regret for the problem of Section 6.2. Section 6.5 provides the proof for the main result of Section 6.4. We show the correctness of our theoretical results through some simulations in Section 6.6. Section 6.7 concludes the chapter. The proofs of all the technical results of the chapter appear in the Appendix D. 112 Chapter 6. Decentralized Control with Partially Unknown Systems 6.2 Problem Formulation Consider a multi-agent linear system consisting of a global system (System 1) and two local systems (Systems 2 and 3) as shown in Figure 6.1. The linear dynamics of the global system are given by X 1 t+1 =A 11 X 1 t +B 11 U 1 t +W 1 t , (6.1) and the linear dynamics of the local systems are given by X 2 t+1 =A 21 X 1 t +A 22 X 2 t +B 21 U 1 t +B 22 U 2 t +W 2 t , X 3 t+1 =A 31 X 1 t +A 33 X 3 t +B 31 U 1 t +B 33 U 3 t +W 3 t , (6.2) where, for n∈{1, 2, 3}, X n t ∈ R d n X is the state of system n and U n t ∈ R d n U is the action of agent n. The matrices A n1 ,A nn ,B n1 ,B nn , n∈{2, 3}, of the local systems are known matrices with appropriate dimensions. However, A 11 ∈ R d 1 X ×d 1 X and B 11 ∈ R d 1 X ×d 1 U are unknown matrices of the global system. We assume that the initial states X 1:3 1 are zero and for n∈{1, 2, 3}, W n t , t≥ 1, are i.i.d random variables with zero-mean and covariance matrix cov(W n t ) = I. Furthermore, the collection of random variables W 1:3 1:t , t≥ 1, are independent. The overall system dynamics can be written as, X t+1 =AX t +BU t +W t (6.3) where we have defined A = A 11 0 0 A 21 A 22 0 A 31 0 A 33 , B = B 11 0 0 B 21 B 22 0 B 31 0 B 33 , (6.4) and X t = vec(X 1:3 t ),U t = vec(U 1:3 t ),W t = vec(W 1:3 t ). At each time t, the state X 1 t of the global system is directly observed by all the agents. Also, agents 2 and 3 perfectly observe the state of their respective co-located local systems. Agentn’s actionU n t at timet is a functionπ n t of its informationH n t , that is,U n t =π n t (H n t ) 113 Chapter 6. Decentralized Control with Partially Unknown Systems Agent 2 Agent 3 Agent 1 System 2 A 22 ,B 22 known System 3 A 33 ,B 33 known System 1 A 11 ,B 11 unknown U 2 t U 3 t U 1 t U 1 t U 1 t X 2 t X 3 t X 1 t X 1 t X 1 t Figure 6.1: Three-agent system model. Solid lines indicate communication links, dashed lines indicate control links, and dash-dot lines indicate that one system can affect another one. where H 1 t ={X 1 1:t ,U 1 1:t−1 }, H n t ={X 1 1:t ,X n 1:t ,U 1 1:t−1 ,U n 1:t−1 }, n∈{2, 3}. (6.5) Let π = (π 1 ,π 2 ,π 3 ) where π n = (π n 1 ,π n 2 ,...). At time t, the system incurs an instantaneous cost c(X t ,U t ), which is a quadratic function given by c(X t ,U t ) =X | t QX t +U | t RU t , (6.6) where Q is a known symmetric positive semi-definite (PSD) matrix and R is a known symmetric positive definite (PD) matrix with the following structure, Q = Q 11 Q 12 Q 13 Q 21 Q 22 Q 23 Q 31 Q 32 Q 33 , R = R 11 R 12 R 13 R 21 R 22 R 23 R 31 R 32 R 33 . (6.7) 6.2.1 The Optimal Multi-Agent Linear-Quadratic Problem Let Θ := [A 11 ,B 11 ] be the dynamics parameter of the global system. When Θ is known to the agents, minimizing the infinite horizon average cost is a multi-agent (decentralized) stochastic Linear-Quadratic (LQ) problem. LetJ(Θ) be the optimal infinite horizon average 114 Chapter 6. 
Decentralized Control with Partially Unknown Systems cost under Θ, that is, J(Θ) = inf π lim sup T→∞ 1 T T X t=1 E π [c(X t ,U t )|Θ]. (6.8) We make the following assumption about the multi-agent stochastic LQ problem. Assumption 6.1. (A,Q 1/2 ) is detectable and (A,B) is stabilizable. Furthermore, for n∈ {2, 3}, (A nn , (Q nn ) 1/2 ) is detectable and (A nn ,B nn ) is stabilizable. The optimal infinite horizon cost J(Θ) for the above multi-agent stochastic LQ problem can be obtained from the results of Theorem 5.3 in Chapter 5 by assuming that there are two local controllers and the communication links from the local controllers to the remote controller are always failed (that is, p 1 =p 2 = 1). We summarize this result below. Lemma 6.1. Under Assumption 6.1, the optimal infinite horizon cost J(Θ) is given by J(Θ) = tr [P (Θ)] 1,1 + tr( ˜ P 2 ) + tr( ˜ P 3 ), (6.9) where P (Θ), ˜ P 2 , and ˜ P 3 are the unique PSD solutions to the following Ricatti equations: P (Θ) = Ω(P (Θ),Q,R,A,B), (6.10) ˜ P n = Ω( ˜ P n ,Q nn ,R nn ,A nn ,B nn ), n∈{2, 3}. (6.11) The optimal strategies π ∗ are given by U 1 t U 2 t U 3 t = K 1 (Θ) K 2 (Θ) K 3 (Θ) X 1 t ˆ X 2 t ˆ X 3 t + 0 ˜ K 2 (X 2 t − ˆ X 2 t ) ˜ K 3 (X 3 t − ˆ X 3 t ) , (6.12) where the gain matrices K(Θ) := K 1 (Θ) K 2 (Θ) K 3 (Θ) , ˜ K 2 , and ˜ K 3 are given by K(Θ) = Ψ(P (Θ),R,A,B), (6.13) ˜ K n = Ψ( ˜ P n ,R nn ,A nn ,B nn ), n∈{2, 3}. (6.14) 115 Chapter 6. Decentralized Control with Partially Unknown Systems Furthermore ˆ X n t =E π ∗ [X n t |H 1 t , Θ], n∈{2, 3}, is the estimate (conditional expectation) of X n t given H 1 t and Θ. The estimates ˆ X n t , n∈{2, 3}, can be computed recursively according to ˆ X n 1 = 0, (6.15) ˆ X n t+1 =A n1 X 1 t +A nn ˆ X n t + B n1 K 1 (Θ) +B nn K n (Θ) X 1 t ˆ X 2 t ˆ X 3 t . (6.16) 6.2.2 The Multi-Agent Reinforcement Learning Problem The problem we are interested in is to minimize the infinite horizon average cost when the matrices A 11 and B 11 of the global system are unknown. In this case, the control problem can be seen as a Multi-Agent Reinforcement Learning (MARL) problem where all the three agents need to learn the system parameter Θ = [A 11 ,B 11 ] in order to minimize the infinite horizon average cost. We adopt a Bayesian setting and assume that there is a prior distributionμ 1 for Θ. Since the actual parameter Θ is unknown, we define the expected regret of a (potentially randomized) policy π = (π 1 ,π 2 ,π 3 ) up to time T as follows: R(T,π) =E π " T X t=1 c(X t ,U t )−TJ(Θ) # , (6.17) which is the expected difference between the performance of the agents under policy π and the optimal infinite horizon cost under full information about the parameter Θ of the global system. Thus, the regret can be interpreted as a measure of the cost of not knowing the global system. The above expectation is with respect to the random noise of the overall system (W 1:T ), the prior distribution μ 1 , and randomization in the agents’ strategies. The learning objective is to find a multi-agent strategy that minimizes the expected regret. Due to stability issues in the infinite horizon problem, we make the following assumption on the system and cost matrices. Assumption 6.2. For any Θ = [A 11 ,B 11 ] in the support of prior distributionμ 1 , (A,Q 1/2 ) is detectable and (A,B) is stabilizable. 116 Chapter 6. Decentralized Control with Partially Unknown Systems 6.3 A Single-Agent LQ Problem In this section, we construct an auxiliary single-agent LQ control problem based on the MARL problem of Section 6.2. 
This auxiliary single-agent LQ control problem will be used later for the regret analysis of the MARL problem. Consider a single-agent system with dynamics X t+1 =AX t +BU t + W 1 t 0 0 , (6.18) where X t ∈ R d 1 X +d 2 X +d 3 X is the state of the system, U t ∈ R d 1 U +d 2 U +d 3 U is the action of the auxiliary agent, W 1 t is the noise vector of the global system defined in (6.1), and matrices A and B are as defined in (6.4). The initial state X 1 is assumed to be zero. The action U t =π t (H t ) at timet is a function of the history of observationsH t ={X 1:t ,U 1:t−1 }. The auxiliary agent’s strategy is denoted byπ = (π 1 ,π 2 ,...). The instantaneous costc(X t ,U t ) of the system is a quadratic function given by c(X t ,U t ) = (X t ) | QX t + (U t ) | RU t , (6.19) where matrices Q and R are as defined in (6.6). When Θ = [A 11 ,B 11 ] (note thatA 11 andB 11 are sub-block matrices ofA andB as described in (6.4)) is known to the auxiliary agent, minimizing the infinite horizon average cost is a single-agent stochastic Linear-Quadratic (LQ) control problem. Let J (Θ) be the optimal infinite horizon average cost under Θ, that is, J (Θ) = inf π lim sup T→∞ 1 T T X t=1 E π [c(X t ,U t )|Θ]. (6.20) Then, the following lemma summarizes the result for the optimal infinite horizon single- agent LQ control problem. Lemma 6.2 ([128, 139]). Under Assumption 6.1, the optimal infinite horizon cost J (Θ) is given by J (Θ) = tr([P (Θ)] 1,1 ) where P (Θ) is as defined in (6.10). Furthermore, the optimal strategy π ∗ is given by U t =K(Θ)X t where K(Θ) is as defined in (6.13). 117 Chapter 6. Decentralized Control with Partially Unknown Systems Algorithm 6.1 TS-SARL Input: μ 1 Initialize X 1 to be zero, τ = 1, L = 0 for t = 1, 2,... do ifC is true then #C:sampling condition Sample Θ t from μ t L←t−τ #L:previous episode length τ←t #τ:last sampled time else Θ t = Θ t−1 Compute K(Θ t ) from (6.13) Compute U t =K(Θ t )X t and execute it Observe new state X t+1 Update μ t with X t ,K(Θ t ),X t+1 to obtain μ t+1 When the actual parameter Θ is unknown, this single-agent stochastic LQ control problem becomes a Single-Agent Reinforcement Learning (SARL) problem. We define the expected regret of a policy π up to time T compared with the optimal infinite horizon cost J (Θ) to be R (T,π ) =E π " T X t=1 c(X t ,U t )−TJ (Θ) # . (6.21) The above expectation is with respect to the random noise of the global system (W 1 1:T ), the prior distribution μ 1 , and randomization in the auxiliary agent’s strategy. Thompson Sampling (TS) or posterior sampling has recently been applied to minimize the regret in the single-agent LQ control problem [1, 134–136]. Based on the TS approach, at each time t, the controller (i.e. the auxiliary agent in SARL problem) maintains a posterior belief μ t on the unknown parameter Θ based on its observations so far. Then, at certain carefully chosen times, the controller generates a random sample Θ t from the posterior μ t and uses it to compute the gain matrix K(Θ t ). The control action U t =K(Θ t )X t is executed and as a result, the next state X t+1 is observed. Then, X t+1 together with X t ,K(Θ t ) is used to update μ t . Existing TS-based algorithms for the single-agent LQ control problem [1, 134–136] differ in when a sample should be generated. LetC be a sampling condition which specifies when a new sample should be generated. 
Further, let τ be the last time a sample has been generated and L be the time interval between the last two successive generated samples (which is known as an episode). Then, in spite of the differences among the existing proposed 118 Chapter 6. Decentralized Control with Partially Unknown Systems algorithms, all can be generally described as the TS-SARL algorithm (TS-based algorithm for the SARL problem). The regret analysis for the single-agent LQ control problem can be done either in the Bayesian setting [1, 135] (i.e., for an unknown environment that comes from the prior μ 1 ) or in the frequentist setting [134, 136] (i.e., for a fixed unknown environment with some arbitrary priorμ 1 ). For the Bayesian setting, the TS-based algorithm of [1] achieve a ˜ O( √ T ) expected regret for the SARL problem (Here ˜ O(·) hides constants and logarithmic factors). Remark 6.1. If the prior distribution on Θ is Gaussian and the underlying system noise is Gaussian, the posterior at each time is also a Gaussian distribution since the system dynamics are linear. TS-based algorithms in the literature [1, 134–136] adopt a Gaussian posterior (say ¯ μ t ) projected on some set Υ where the set Υ varies in each paper based on their assumptions. This results in a posterior μ t = ¯ μ t | Υ where·| Υ is the projection operator on set Υ. Example 6.1. TS-based algorithm of [136] can be considered as an instance of the TS-SARL algorithm where the sampling condition C is satisfied when t = bγ m c for some m = 0, 1, 2,.... Here γ > 1 is an arbitrary initial parameter of the algorithm. Further, for this algorithm the projection set Υ is the set of all d 1 X × (d 1 X +d 1 U ) matrices. Example 6.2. TS-based algorithm of [1] can be considered as an instance of the TS-SARL algorithm where the sampling conditionC holds true when time t satisfies the following condition t>τ +L or det(Ξ t )< 0.5 det(Ξ τ ). (6.22) Here Ξ t is the covariance of the unprojected posterior ¯ μ t . Further, for this algorithm the projection set Υ is such that for any Θ = [A 11 ,B 11 ] and ˜ Θ in Υ, the closed-loop matrix A +BK( ˜ Θ) has spectral norm less than δ, that is,kA +BK( ˜ Θ)k≤ δ where δ < 1 is an initial parameter of the algorithm. Example 6.3. TS-based algorithm of [134] can be considered as an instance of the TS-SARL algorithm where the sampling condition C is satisfied at each time t. Further, for this algorithm the projection set Υ is as follows Υ ={Θ :J (Θ)≤D,kA +BK(Θ)k≤δ, tr(Θ | Θ)≤S}, (6.23) where D, S, and δ< 1 are initial parameters of the algorithm. 119 Chapter 6. Decentralized Control with Partially Unknown Systems 6.4 Main Results We will now propose the TS-MARL algorithm for the multi-agent RL (MARL) problem formu- lated in Section 6.2 based on the TS-SARL algorithm. Algorithm 6.2 TS-MARL Input: μ 1 Initialize X 1:3 1 and ˇ X 2:3 1 all to be zero, τ = 1, and L = 0 for t = 1, 2,... 
do ifC is true then Sample Θ t from μ t L←t−τ τ←t else Θ t = Θ t−1 Compute K(Θ t ) from (6.13) Compute U n t from (6.24) and execute it Observe new global state X 1 t+1 if n = 2, 3 then Observe new state X n t+1 Compute ˇ X 2 t+1 and ˇ X 3 t+1 using (6.25) Update μ t with X 1 t ,K 1 (Θ t ),X 1 t+1 to obtain μ t+1 U 1 t U 2 t U 3 t = K 1 (Θ t ) K 2 (Θ t ) K 3 (Θ t ) X 1 t ˇ X 2 t ˇ X 3 t + 0 ˜ K 2 (X 2 t − ˇ X 2 t ) ˜ K 3 (X 3 t − ˇ X 3 t ) , (6.24) ˇ X m t+1 =A m1 X 1 t +A mm ˇ X m t + B m1 K 1 (Θ t ) +B mm K m (Θ t ) vec(X 1 t , ˇ X 2 t , ˇ X 3 t ) m∈{2, 3} (6.25) TS-MARL algorithm is a multi-agent algorithm which is performed independently by all three agents. In the TS-MARL algorithm, each agent keeps a posterior on the parameter Θ of the global system. When the sampling conditionC holds true, each agent generates a random sample Θ t from its posterior μ t and computes the gain matrix K(Θ t ) from (6.13). Agent n uses the gain matrix K(Θ t ) to compute its action U n t according to (6.24). Note that agents 2 and 3 need ˜ K 2 and ˜ K 3 respectively to calculate their actionsU 2 t andU 3 t . However, we know from (6.14) that ˜ K 2 and ˜ K 3 are independent of the unknown parameter Θ and hence, they can be calculated prior to the beginning of the algorithm. After the execution 120 Chapter 6. Decentralized Control with Partially Unknown Systems of the actions U 1:3 t by the agents, all the agents observe the new global state X 1 t+1 and the agents 2 and 3 further observe the new states X 2 t+1 and X 3 t+1 of their co-located systems, respectively. Then, each agent n independently uses (6.25) to compute ˇ X 2 t+1 and ˇ X 3 t+1 . Finally, the independently calculated gain matrix K(Θ t ) together with X 1 t and X 1 t+1 is used by each agent n to update μ t . Remark 6.2. Note that ˇ X 2 t+1 and ˇ X 3 t+1 in the TS-MARL algorithm (given by (6.25)) are proxies for ˆ X 2 t+1 and ˆ X 3 t+1 of (6.16) where instead of the unknown parameter Θ, we have Θ t . Remark 6.3. Due to the independent execution of the TS-MARL algorithm, agents might generate different samples Θ t from μ t . As a result, the computed gain matrices K(Θ t ) by the agents can be different. Since each agent uses its calculated K(Θ t ) to compute ˇ X 2 t+1 and ˇ X 3 t+1 using (6.25), the computed ˇ X 2 t+1 and ˇ X 3 t+1 can be different from one agent to the other. Further, since each agent uses its own K(Θ t ) to update μ t , the new posterior μ t+1 can be different among the agents. This difference in μ t+1 among the agents can lead to different τ and L in the subsequent steps of the algorithm. In order to avoid issues pointed out in Remark 6.3, we make an assumption about how samples are generated by the agents. Assumption 6.3. All agents use the same sampling seed for generating samples from their posteriors μ t . Now, we present our main result which is based on Assumption 6.3. Theorem 6.1. Under Assumption 6.3, letR(T,TS-MARL) be the expected regret for the MARL problem under the policy of the TS-MARL algorithm andR (T,TS-SARL) be the expected regret for the auxiliary SARL problem under the policy of the TS-SARL algorithm. Then, R(T,TS-MARL)≤R (T,TS-SARL). (6.26) This result shows that under the policy of the TS-MARL algorithm, the expected regret for the MARL problem is upper-bounded by the expected regret for the auxiliary SARL problem constructed in Section 6.3 under the policy of the TS-SARL algorithm. This theorem fur- ther implies that expected regret bounds for TS-SARL algorithm also hold for the TS-MARL algorithm. 
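To see concretely why the shared seed of Assumption 6.3 keeps the agents synchronized, consider the following self-contained toy sketch. The scalar global system, the conjugate least-squares posterior, and the deadbeat "gain" are all illustrative assumptions and not the thesis's model; the point is only that identical posteriors plus an identical seed yield identical samples Θ_t, and hence identical gains, at every step.

import numpy as np

d = 2                                 # z_t = (x^1_t, u^1_t); Theta = (a11, b11)
mu0, Lam0 = np.zeros(d), np.eye(d)    # Gaussian prior: mean and precision

def update(mu, Lam, z, x_next):
    # conjugate update for x_next = Theta' z + w, w ~ N(0, 1)
    Lam_new = Lam + np.outer(z, z)
    mu_new = np.linalg.solve(Lam_new, Lam @ mu + z * x_next)
    return mu_new, Lam_new

def sample(mu, Lam, seed):
    rng = np.random.default_rng(seed)         # shared seed (Assumption 6.3)
    return rng.multivariate_normal(mu, np.linalg.inv(Lam))

theta_true, x1 = np.array([0.9, 1.0]), 0.0
rng_sys = np.random.default_rng(0)
agents = [(mu0.copy(), Lam0.copy()) for _ in range(2)]   # two posterior copies
for t in range(5):
    thetas = [sample(mu, Lam, seed=100 + t) for (mu, Lam) in agents]
    assert np.allclose(thetas[0], thetas[1])  # same sample -> same gain K(Theta_t)
    a_hat, b_hat = thetas[0]
    u1 = -(a_hat / b_hat) * x1                # toy deadbeat "gain", for illustration
    z = np.array([x1, u1])
    x1 = float(theta_true @ z) + rng_sys.normal()
    agents = [update(mu, Lam, z, x1) for (mu, Lam) in agents]
print("posteriors and samples stayed synchronized across agents")

Remark 6.3 describes exactly what fails without this assumption: different samples lead to different gains, different proxy estimates X̌^n_t, and eventually different posteriors.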
Remark 6.2. Note that $\check{X}^2_{t+1}$ and $\check{X}^3_{t+1}$ in the TS-MARL algorithm (given by (6.25)) are proxies for $\hat{X}^2_{t+1}$ and $\hat{X}^3_{t+1}$ of (6.16) where, instead of the unknown parameter $\Theta$, we have $\Theta_t$.

Remark 6.3. Due to the independent execution of the TS-MARL algorithm, agents might generate different samples $\Theta_t$ from $\mu_t$. As a result, the gain matrices $K(\Theta_t)$ computed by the agents can be different. Since each agent uses its calculated $K(\Theta_t)$ to compute $\check{X}^2_{t+1}$ and $\check{X}^3_{t+1}$ using (6.25), the computed $\check{X}^2_{t+1}$ and $\check{X}^3_{t+1}$ can differ from one agent to another. Further, since each agent uses its own $K(\Theta_t)$ to update $\mu_t$, the new posterior $\mu_{t+1}$ can be different among the agents. This difference in $\mu_{t+1}$ among the agents can lead to different $\tau$ and $L$ in the subsequent steps of the algorithm.

In order to avoid the issues pointed out in Remark 6.3, we make an assumption about how samples are generated by the agents.

Assumption 6.3. All agents use the same sampling seed for generating samples from their posteriors $\mu_t$.

Now, we present our main result, which is based on Assumption 6.3.

Theorem 6.1. Under Assumption 6.3, let $\mathcal{R}(T, \text{TS-MARL})$ be the expected regret for the MARL problem under the policy of the TS-MARL algorithm and $\bar{\mathcal{R}}(T, \text{TS-SARL})$ be the expected regret for the auxiliary SARL problem under the policy of the TS-SARL algorithm. Then,
$$\mathcal{R}(T, \text{TS-MARL}) \le \bar{\mathcal{R}}(T, \text{TS-SARL}). \quad (6.26)$$

This result shows that under the policy of the TS-MARL algorithm, the expected regret for the MARL problem is upper-bounded by the expected regret for the auxiliary SARL problem constructed in Section 6.3 under the policy of the TS-SARL algorithm. This theorem further implies that expected regret bounds for the TS-SARL algorithm also hold for the TS-MARL algorithm.

Corollary 6.1. The TS-MARL algorithm with the sampling condition $\mathcal{C}$ of [1] achieves a $\tilde{O}(\sqrt{T})$ regret for the MARL problem.

6.5 Proof of Theorem 6.1

We first prove some preliminary results in the following lemmas, which will be used in the proof of Theorem 6.1.

Lemma 6.3. Under Assumption 6.3, at each time $t$ of the TS-MARL algorithm, the quantities $\Theta_t$, $\tau$, $L$, $U^1_t$, $\check{X}^2_t$, $\check{X}^3_t$, and $\mu_t$ calculated independently by the agents are all equal.

Proof. See Appendix D.1 for a proof.

Lemma 6.4. Let $S^n_t$ be a random process that evolves as
$$S^n_{t+1} = C^n S^n_t + W^n_t, \quad S^n_1 = 0, \quad (6.27)$$
where $C^n = A^{nn} + B^{nn}\tilde{K}^n$. Define $\Sigma^n_t = \mathrm{cov}(S^n_t)$. Then the sequence of matrices $\Sigma^n_t$, $t \ge 1$, is increasing (in the sense of the positive semi-definite partial order, that is, $\Sigma^n_1 \preceq \Sigma^n_2 \preceq \Sigma^n_3 \preceq \cdots$) and it converges to a PSD matrix $\Sigma^n$ as $t \to \infty$.

Proof. See Appendix D.2 for a proof.

We now proceed in two steps:
• Step 1: Showing the connection between the auxiliary SARL problem and the MARL problem
• Step 2: Using the SARL problem to bound the regret of the MARL problem

Step 1: Showing the connection between the auxiliary SARL problem and the MARL problem

First, we present the following lemma that connects the optimal infinite horizon cost $\bar{J}(\Theta)$ of the auxiliary SARL problem when $\Theta$ is known (that is, the auxiliary single-agent LQ problem of Section 6.3) and the optimal infinite horizon cost $J(\Theta)$ of the MARL problem when $\Theta$ is known (that is, the multi-agent LQ problem of Section 6.2.1).

Lemma 6.5. Let $\bar{J}(\Theta)$ be the optimal infinite horizon cost for the auxiliary single-agent LQ problem of Section 6.3, $J(\Theta)$ be the optimal infinite horizon cost for the multi-agent LQ problem of Section 6.2.1, and $\Sigma^n$, $n \in \{2, 3\}$, be as defined in Lemma 6.4. Then,
$$J(\Theta) = \bar{J}(\Theta) + \mathrm{tr}(D^2 \Sigma^2) + \mathrm{tr}(D^3 \Sigma^3), \quad (6.28)$$
where we have defined $D^n := Q^{nn} + (\tilde{K}^n)^\top R^{nn} \tilde{K}^n$ for $n \in \{2, 3\}$.

Proof. See Appendix D.3 for a proof.

Next, we provide the following lemma that shows the connection between the expected cost $\mathbb{E}[c(X_t, U_t)]$ in the MARL problem under the policy of the TS-MARL algorithm and the expected cost $\mathbb{E}[c(X_t, U_t)]$ in the auxiliary SARL problem under the policy of the TS-SARL algorithm.

Lemma 6.6. At each time $t$, the following equality holds between the expected costs under the policies of the TS-SARL and the TS-MARL algorithms:
$$\mathbb{E}_{\text{TS-MARL}}[c(X_t, U_t)] = \mathbb{E}_{\text{TS-SARL}}[c(X_t, U_t)] + \mathrm{tr}(D^2 \Sigma^2_t) + \mathrm{tr}(D^3 \Sigma^3_t). \quad (6.29)$$

Proof. See Appendix D.4 for a proof.
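The limit covariance $\Sigma^n$ of Lemma 6.4, which enters the cost decomposition (6.28), is the fixed point of a discrete Lyapunov equation. The following Python sketch computes it by iterating the covariance recursion induced by (6.27); the noise covariance argument Sigma_w is an assumption about the interface, and convergence relies on $C^n$ being stable.

```python
import numpy as np

def limit_covariance(C_n, Sigma_w, tol=1e-12, max_iter=10**6):
    """Iterate Sigma_{t+1} = C_n Sigma_t C_n^T + Sigma_w from Sigma_1 = 0
    (the covariance recursion of the process (6.27)). For stable C_n the
    increasing sequence converges to the PSD limit Sigma^n of Lemma 6.4."""
    Sigma = np.zeros_like(Sigma_w)
    for _ in range(max_iter):
        nxt = C_n @ Sigma @ C_n.T + Sigma_w
        if np.max(np.abs(nxt - Sigma)) < tol:
            return nxt
        Sigma = nxt
    return Sigma
```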
Step 2: Using the SARL problem to bound the regret of the MARL problem

In this step, we use the connection between the auxiliary SARL problem and our MARL problem, which was established in Step 1, to show that the expected regret of the policy of the TS-MARL algorithm is bounded by the expected regret of the policy of the TS-SARL algorithm. Note that from the definition of the expected regret in the MARL problem given by (6.17), we have
$$\mathcal{R}(T, \text{TS-MARL}) = \mathbb{E}_{\text{TS-MARL}}\Big[\sum_{t=1}^{T} c(X_t, U_t) - T J(\Theta)\Big]$$
$$= \sum_{t=1}^{T} \mathbb{E}_{\text{TS-SARL}}[c(X_t, U_t)] + \sum_{t=1}^{T}\big[\mathrm{tr}(D^2 \Sigma^2_t) + \mathrm{tr}(D^3 \Sigma^3_t)\big] - T\,\mathbb{E}[\bar{J}(\Theta)] - T\,\mathrm{tr}(D^2 \Sigma^2) - T\,\mathrm{tr}(D^3 \Sigma^3)$$
$$= \sum_{t=1}^{T} \mathbb{E}_{\text{TS-SARL}}\big[c(X_t, U_t) - \bar{J}(\Theta)\big] + \sum_{t=1}^{T}\big[\mathrm{tr}(D^2 (\Sigma^2_t - \Sigma^2)) + \mathrm{tr}(D^3 (\Sigma^3_t - \Sigma^3))\big]$$
$$= \bar{\mathcal{R}}(T, \text{TS-SARL}) + \sum_{t=1}^{T}\big[\mathrm{tr}(D^2 (\Sigma^2_t - \Sigma^2)) + \mathrm{tr}(D^3 (\Sigma^3_t - \Sigma^3))\big] \le \bar{\mathcal{R}}(T, \text{TS-SARL}), \quad (6.30)$$
where the second equality is correct because of Lemma 6.5, Lemma 6.6, and the fact that $J(\Theta)$ is independent of the policy of the TS-MARL algorithm, that is, $\mathbb{E}_{\text{TS-MARL}}[J(\Theta)] = \mathbb{E}[J(\Theta)]$. Furthermore, the third equality is correct due to the fact that $\bar{J}(\Theta)$ is independent of the policy of the TS-SARL algorithm, that is, $\mathbb{E}[\bar{J}(\Theta)] = \mathbb{E}_{\text{TS-SARL}}[\bar{J}(\Theta)]$; the fourth equality is correct by the definition of the expected regret in the SARL problem; and the last inequality is correct because, from Lemma 6.4, the sequence of matrices $\Sigma^n_t$ is increasing, that is, $\Sigma^n - \Sigma^n_t \succeq 0$, and $D^n$ is positive semi-definite, and consequently $\mathrm{tr}(D^n(\Sigma^n_t - \Sigma^n)) \le 0$, $n \in \{2, 3\}$. This proves the statement of Theorem 6.1.

6.6 Experiments

In this section, we illustrate the performance of the TS-MARL algorithm through numerical experiments. Our proposed algorithm requires a sampling condition $\mathcal{C}$. As the algorithm in [1] achieves a $\tilde{O}(\sqrt{T})$ regret for a SARL problem, we use the sampling condition $\mathcal{C}$ of this algorithm, described in Example 6.2, with the parameter $\delta = 0.99$. We consider an instance of the MARL problem where the global system (which is unknown to the agents in our problem) has the following parameters (which are the same as the model studied in [140–142]) with $d^1_x = d^1_u = 3$:
$$A^{11} = \begin{bmatrix} 1.01 & 0.01 & 0 \\ 0.01 & 1.01 & 0.01 \\ 0 & 0.01 & 1.01 \end{bmatrix}, \quad B^{11} = I_3, \quad (6.31)$$
and the local systems are one-dimensional, that is, $d^2_x = d^3_x = d^2_u = d^3_u = 1$, with the following parameters:
$$A^{21} = B^{21} = \begin{bmatrix} 1 & 0 & 1 \end{bmatrix}, \quad A^{22} = 1.01, \quad B^{22} = 1, \quad (6.32)$$
$$A^{31} = B^{31} = \begin{bmatrix} 0 & 1 & 1 \end{bmatrix}, \quad A^{33} = 1.01, \quad B^{33} = 1. \quad (6.33)$$
Further, we consider the following matrices (with the same structure as the model in [140–142]) for the cost function:
$$Q = 10^{-3} I_5, \quad R = I_5. \quad (6.34)$$

[Figure 6.2: TS-MARL algorithm with sampling condition $\mathcal{C}$ of [1]. The figure plots the mean regret against the time horizon $t$ for the same-sampling-seed and arbitrary-sampling-seed cases, together with a $20\sqrt{t}$ reference curve.]

While the theoretical result of Theorem 6.1 required the same sampling seed among the agents (i.e., Assumption 6.3), we consider both the case of the same sampling seed and the case of arbitrary sampling seeds in the experiments. We ran 50 simulations and show the mean regret with the 95% confidence interval for each scenario. As can be seen from Figure 6.2, for both of these cases, our proposed algorithm with the sampling condition $\mathcal{C}$ of [1] achieves a $\tilde{O}(\sqrt{T})$ regret for our MARL problem, which matches the theoretical result of Corollary 6.1.
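For reference, the experimental parameters (6.31)-(6.34) can be written down directly; the following Python snippet is simply a transcription of the matrices above.

```python
import numpy as np

# Global system (6.31), unknown to the agents in the experiment.
A11 = np.array([[1.01, 0.01, 0.00],
                [0.01, 1.01, 0.01],
                [0.00, 0.01, 1.01]])
B11 = np.eye(3)

# One-dimensional local systems (6.32)-(6.33).
A21 = B21 = np.array([[1.0, 0.0, 1.0]])
A22, B22 = 1.01, 1.0
A31 = B31 = np.array([[0.0, 1.0, 1.0]])
A33, B33 = 1.01, 1.0

# Cost matrices (6.34).
Q = 1e-3 * np.eye(5)
R = np.eye(5)
```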
6.7 Conclusion

We considered a multi-agent Linear-Quadratic (LQ) control problem consisting of three systems, a "global" system and two "local" systems, and three agents. The goal was to minimize the infinite-horizon average cost incurred when the dynamics of the global system are not known to the agents. We proposed a Thompson Sampling (TS)-based multi-agent learning algorithm where each agent learns the global system's dynamics independently. We showed that the expected regret of our algorithm is upper bounded by $\tilde{O}(\sqrt{T})$ under certain assumptions, where $\tilde{O}(\cdot)$ hides constants and logarithmic factors.

Chapter 7

Dynamic Teams and Decentralized Control Problems with Substitutable Actions

7.1 Introduction

The difficulty of finding optimal strategies in dynamic teams and decentralized control problems has been well established in the literature [20, 77, 143, 144]. In general, the optimization of strategies can be a non-convex problem over infinite dimensional spaces [23]. Even the celebrated linear quadratic Gaussian (LQG) model of centralized control presents difficulties in the decentralized setting [77, 143, 145]. There has been significant interest in identifying classes of problems that are more tractable. Information structures, which describe what information is available to which member/controller, have been closely associated with tractability. Problems with partially nested [20] or stochastically nested information structures [21] and problems that satisfy quadratic invariance [46] or funnel causality [48] properties have been identified as being "simpler" than the general problems.

In this chapter, we first look at the nature of the cost function and the information dynamics in an LQG dynamic team problem. We define a property called substitutability, which means that the effects of one member's action on the cost function and the information dynamics can be achieved by the action of another member. Although the problem we formulate is not partially nested, our result shows that, under certain conditions, linear strategies are optimal. We then consider a decentralized LQG problem and show that the idea of substitutability can be used in such problems as well. Substitutability in a decentralized problem can be interpreted as follows: the effects of one controller's action on the instantaneous cost and the state dynamics can be achieved by the action of another controller. Even though the problem we formulate does not belong to one of the simpler classes mentioned earlier, our results show that linear strategies are optimal. Further, we provide a complete state-space characterization of optimal strategies and identify a family of information structures that all achieve the same cost as the centralized information structure. These results suggest that substitutability can work as a counterpart of the information structure requirements that enable simplification of dynamic teams and decentralized control problems.

Our work shares conceptual similarities with the work on internal quadratic invariance [146, 147], which identified problems that are not quadratically invariant but can still be reduced to (infinite dimensional) convex programs. In contrast to this work, we explicitly identify optimal strategies in the decentralized control problem. The interplay of information structure and cost in relation to the complexity of dynamic team problems has also been observed for variations of the Witsenhausen counterexample [143] in [148, 149].

7.1.1 Organization

The rest of this chapter is organized as follows. In Section 7.2, we introduce a non-partially-nested LQG dynamic team problem and show that under certain conditions, called "substitutability", an optimal strategy of each team member is linear in its information. Then, in Section 7.3, we consider a decentralized LQG problem and show that the idea of substitutability can be used in such problems as well. Section 7.4 concludes this chapter. The proofs of some of the technical results of the chapter appear in Appendix E.
7.2 Dynamic Team with Non-partially-nested Information Structure

7.2.1 Team Model and Information Structure

We consider a team composed of $N$ members. The set $\mathcal{N} = \{1, 2, \dots, N\}$ denotes the collection of team members. The random vector $\Xi$, taking values in $\mathbb{R}^{d_\xi}$, denotes all the exogenous uncertainties which are not controlled by any of the members. The probability distribution of $\Xi$ is assumed to be $N(0, \Sigma)$, where $\Sigma$ is a positive definite matrix.

The information available to member $i$ is denoted by $Z^i \in \mathbb{R}^{d^i_z}$. Member $i$ chooses action/decision $U^i \in \mathbb{R}^{d^i_u}$ as a function of the information available to it. Specifically, for $i \in \mathcal{N}$, $U^i = g^i(Z^i)$ where $g^i$ is the decision strategy of member $i$. The collection $g = (g^1, g^2, \dots, g^N)$ is called the team strategy. The performance of the team strategy $g$ is measured by the expected cost
$$J(g) = \mathbb{E}^g\left[(M\Xi + PU)^\top (M\Xi + PU)\right] \quad (7.1)$$
where $U = \mathrm{vec}(U^1, \dots, U^N)$ and $P = [P^1\ \cdots\ P^N]$.

The information $Z^i$ available to member $i$ includes what it has observed and what other members have communicated to it. We assume that $Z^i$ is a known linear function of $\Xi$ and the decisions taken by some other members, that is,
$$Z^i = H^i \Xi + \sum_{j \in \mathcal{N}\setminus\{i\}} D^{ij} U^j \quad \forall i \in \mathcal{N}, \quad (7.2)$$
where $H^i$ and $D^{ij}$ are matrices with appropriate dimensions. We assume that members make decisions sequentially according to their index and that the information of member $i$ can depend only on the decisions of members indexed $1$ to $i-1$. Thus, we assume that
$$D^{ij} = 0 \quad \text{for } j \ge i. \quad (7.3)$$
The matrices $H^i, D^{ij}$, $i, j \in \mathcal{N}$, in the information structure, the matrices $M, P^i$, $i \in \mathcal{N}$, in the cost function, and the probability distribution of the random vector $\Xi$ are known to all team members.

Following [20], we define the following relationships among team members.

Definition 7.1. We say that member $s$ is related to member $t$ and denote this by $sRt$ if $D^{ts} \neq 0$. Further, we say that member $s$ is a precedent of member $t$ and denote this by $s \to t$ if (a) $sRt$, or (b) there exist distinct $k_1, k_2, \dots, k_m \in \mathcal{N}$ such that $sRk_1$, $k_1Rk_2, \dots, k_mRt$. We denote the set of all precedents of member $t$ by $\mathcal{P}^t$.

The team is said to have a partially nested information structure if for each member $t$ and each $s \in \mathcal{P}^t$, $Z^s \subset_v Z^t$. In other words, whenever the decision of member $s$ affects the information of member $t$, then $t$ knows whatever $s$ knows. For a partially nested information structure, optimal strategies can be obtained using the method described in [20]. We will focus on teams where the information structure is not partially nested.

Definition 7.2. We say $(s, t)$ is a critical pair with respect to partial nestedness if $s \to t$ but $Z^s \not\subset_v Z^t$. We denote the set of all members $s \in \mathcal{P}^t$ for which $(s, t)$ is a critical pair by $\mathcal{C}^t$.

According to the above definitions, an information structure is not partially nested if there exists $t \in \mathcal{N}$ for which $\mathcal{C}^t \neq \emptyset$.

7.2.2 Substitutability Assumption

We make the following assumption about the team model.

Assumption 7.1. For every critical pair $(s, t)$, there exists a member $k$ such that
1. $Z^s \subset_v Z^k$, and
2. for every $u^t \in \mathbb{R}^{d^t_u}$, there exists a $u^k \in \mathbb{R}^{d^k_u}$ such that
$$P^t u^t = P^k u^k, \quad (7.4a)$$
$$D^{mt} u^t = D^{mk} u^k \quad \forall m \in \mathcal{N}. \quad (7.4b)$$
We refer to member $k$ as the substituting member for the critical pair $(s, t)$.
Example 7.1. Consider Problem 7.1 with the following information structure:
$$Z^1 = H^1\Xi, \quad Z^2 = H^2\Xi + D^{21}U^1 \quad (D^{21} \neq 0,\ Z^1 \not\subset_v Z^2),$$
$$Z^3 = H^3\Xi + D^{31}U^1, \quad Z^4 = H^4\Xi + D^{41}U^1 + D^{42}U^2 + D^{43}U^3, \quad (7.5)$$
where the block structures of $H^3, D^{31}$ and of $H^4, D^{41}$ are such that $Z^3$ contains $Z^1$ as a sub-vector (so that $Z^1 \subset_v Z^3$) and $Z^4$ contains $Z^3$ and $Z^2$ as sub-vectors. In this information structure, $(1, 2)$ is a critical pair since $D^{21} \neq 0$ but $Z^1 \not\subset_v Z^2$. Since $Z^1 \subset_v Z^3$, the substitutability assumption may be satisfied in this example if for every $u^2$ there exists a $u^3$ satisfying $P^2 u^2 = P^3 u^3$ and $D^{42} u^2 = D^{43} u^3$.

Remark 7.1. The substituting member $k$ for the critical pair $(s, t)$ can be the member $s$ itself as long as condition 2 of Assumption 7.1 can be satisfied.

Remark 7.2. In order to check condition 2 of Assumption 7.1, we need to verify that the column space of the matrix $\begin{bmatrix} P^t \\ [D^{mt}]_{m\in\mathcal{N}} \end{bmatrix}$ is contained in the column space of the matrix $\begin{bmatrix} P^k \\ [D^{mk}]_{m\in\mathcal{N}} \end{bmatrix}$. This can be easily done, for instance, by projecting the columns of the first matrix onto the column space of the second and verifying that the projection leaves the columns unchanged. Alternatively, for each column $c$ of the first matrix, we can check whether the following equation has a solution for $x$:
$$\begin{bmatrix} P^k \\ [D^{mk}]_{m\in\mathcal{N}} \end{bmatrix} x = c.$$

The following lemma is immediate from the theory of pseudo-inverses [150].

Lemma 7.1. If a solution $u^k$ to (7.4a) and (7.4b) exists, it can be written as
$$u^k = \Lambda^{kst} u^t = \begin{bmatrix} P^k \\ [D^{mk}]_{m\in\mathcal{N}} \end{bmatrix}^\dagger \begin{bmatrix} P^t \\ [D^{mt}]_{m\in\mathcal{N}} \end{bmatrix} u^t.$$

The optimization problem is defined as follows.

Problem 7.1. For the model described in Section 7.2.1, given that the information structure is not partially nested and the substitutability assumption (Assumption 7.1) holds, find the team strategy $g = (g^1, \dots, g^N)$ that minimizes the expected cost given by (7.1).
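Remark 7.2 and Lemma 7.1 suggest a direct numerical check. The following Python sketch verifies condition 2 of Assumption 7.1 for a candidate substituting member and returns $\Lambda^{kst}$; the function name, argument layout, and error handling are illustrative assumptions.

```python
import numpy as np

def substitution_gain(P_t, D_t_stack, P_k, D_k_stack, tol=1e-9):
    """Return Lambda^{kst} of Lemma 7.1 for a critical pair (s, t) with
    candidate substituting member k. D_t_stack and D_k_stack are the stacked
    blocks [D^{mt}]_{m in N} and [D^{mk}]_{m in N}."""
    F_t = np.vstack([P_t, D_t_stack])
    F_k = np.vstack([P_k, D_k_stack])
    Lam = np.linalg.pinv(F_k) @ F_t
    # Condition 2 of Assumption 7.1 holds iff substituting back reproduces
    # F_t, i.e., col(F_t) is contained in col(F_k), cf. Remark 7.2.
    if not np.allclose(F_k @ Lam, F_t, atol=tol):
        raise ValueError("condition 2 of Assumption 7.1 fails for this pair")
    return Lam
```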
7.2.3 Partially Nested Expansion of the Information Structure

In order to solve Problem 7.1, we will consider a partially nested expansion of the information structure. This expansion is constructed by simply providing each team member $i$ with the information of the members in $\mathcal{C}^i$. We thus formulate the following problem.

Problem 7.2. Solve Problem 7.1 under the assumption that the information available to member $i$, $i \in \mathcal{N}$, is
$$\tilde{Z}^i = \mathrm{vec}(Z^i, Z^{\mathcal{C}^i}). \quad (7.6)$$

Lemma 7.2. The information structure of Problem 7.2 is partially nested.

Proof. It can be easily established that Problems 7.1 and 7.2 have the same precedence relationships. That is, $j$ is a precedent of $i$ in Problem 7.1 if and only if $j$ is a precedent of $i$ in Problem 7.2. Further, if $k$ is a precedent of $i$ (in both problems), then, by construction (see (7.6)), $\tilde{Z}^i$ contains $Z^k$. Now, suppose that $j$ is a precedent of $i$ in Problem 7.2. To establish partial nestedness of the information structure in Problem 7.2, we need to establish that $\tilde{Z}^j = \mathrm{vec}(Z^j, Z^{\mathcal{C}^j}) \subset_v \tilde{Z}^i$. We already know that $Z^j \subset_v \tilde{Z}^i$. Further, any $k \in \mathcal{C}^j$ is a precedent of $j$, and since $j$ is a precedent of $i$, it follows that $k$ is a precedent of $i$. Thus, $\tilde{Z}^i$ must contain $Z^k$. This establishes that $\tilde{Z}^j \subset_v \tilde{Z}^i$ and hence the information structure in Problem 7.2 is partially nested.

Remark 7.3. The idea of considering an expanded information structure and using strategies in the expansion to investigate optimal strategies in the original information structure has been used before [23, Section 3.5.2], [21, 22, 24]. In some cases [23, Section 3.5.2], [21], it is shown that the expansion is redundant as far as strategy optimization is concerned: an optimal strategy is found in the expanded information structure that is implementable in the original information structure. In [22, 24], an optimal strategy found in the expanded structure may not be directly implementable in the original information structure but, under some conditions, it can be used to construct an optimal strategy in the original information structure (see Section 7.2.6). As discussed below, our use of information structure expansion differs from both these approaches.

7.2.4 Main Results

Our main result relies on showing that we can construct an optimal team strategy in Problem 7.1 from an optimal team strategy in the partially nested expansion of this problem (Problem 7.2). We start with the following assumption.

Assumption 7.2. In Problem 7.2 with the partially nested information structure, there exists an optimal team strategy $\gamma_0 = (\gamma^1_0, \gamma^2_0, \dots, \gamma^N_0)$ that is a linear function of the members' information and is given as
$$U^i = \gamma^i_0(\tilde{Z}^i) = K^{ii}_0 Z^i + \sum_{j \in \mathcal{C}^i} K^{ij}_0 Z^j, \quad i \in \mathcal{N}. \quad (7.7)$$

Remark 7.4. The information available to each member of Problem 7.2 can be written as
$$\tilde{Z}^i = \tilde{H}^i \Xi + \sum_{j \in \mathcal{N}} \tilde{D}^{ij} U^j \quad \forall i \in \mathcal{N}, \quad (7.8)$$
where $\tilde{H}^i = [H^m]_{m \in \{i\}\cup\mathcal{C}^i}$ and $\tilde{D}^{ij} = [D^{mj}]_{m \in \{i\}\cup\mathcal{C}^i}$. Since Problem 7.2 is a partially nested LQG problem, [20, Theorem 1] shows that it is equivalent to a static LQG team problem with the following information structure:
$$\hat{Z}^i = \tilde{H}^i \Xi \quad \forall i \in \mathcal{N}. \quad (7.9)$$
In particular, a linear strategy is optimal for Problem 7.2 iff a linear strategy is optimal for the equivalent static team. If the static LQG team has a cost function that is strictly convex in the team decision, then the optimal strategy of each team member is linear in the information of this member [18]. However, the cost function of (7.1) is not strictly convex since $P^\top P$ is not positive definite. Hence, the result of [18] cannot be applied here. According to [151], for static LQG team problems with a cost function that is convex (not necessarily strictly convex) in the team decision, the linear team strategy $\gamma^i(\hat{Z}^i) = \Pi^i \hat{Z}^i$ for all $i \in \mathcal{N}$ is optimal if the following linear system of equations has a solution for $\Pi^i$, $i \in \mathcal{N}$:
$$\sum_{j=1}^N (P^i)^\top P^j \Pi^j \Sigma_{\hat{Z}^j \hat{Z}^i} = -(P^i)^\top M \Sigma_{\Xi \hat{Z}^i} \quad \forall i \in \mathcal{N}, \quad (7.10)$$
where $\Sigma_{\hat{Z}^j \hat{Z}^i} = \mathbb{E}[\hat{Z}^j (\hat{Z}^i)^\top]$ and $\Sigma_{\Xi \hat{Z}^i} = \mathbb{E}[\Xi (\hat{Z}^i)^\top]$. Therefore, Assumption 7.2 is true as long as (7.10) has a solution.

Remark 7.5. Since team members in Problem 7.2 have more information than the corresponding members in Problem 7.1, it follows that the optimal expected cost in Problem 7.2 is a lower bound on the optimal expected cost in Problem 7.1.

Theorem 7.1. Under Assumptions 7.1 and 7.2, there exist linear strategies in Problem 7.1, given as $U^i = \Gamma^i Z^i$, $i \in \mathcal{N}$, that achieve the same expected cost as the optimal strategies in Problem 7.2. Consequently, the strategies $U^i = \Gamma^i Z^i$, $i \in \mathcal{N}$, are optimal strategies in Problem 7.1.
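Assumption 7.2 can be checked numerically: (7.10) is a linear system in the gains $\Pi^i$. Below is a Python sketch that assembles and solves it with Kronecker products, using the column-major identity $\mathrm{vec}(AXB) = (B^\top \otimes A)\mathrm{vec}(X)$; the data layout of the covariance arguments is an illustrative assumption.

```python
import numpy as np

def static_team_gains(P, M, Sigma_ZZ, Sigma_XiZ):
    """Solve the linear system (7.10) for the static-team gains Pi^i.
    P is the list of blocks P^i, Sigma_ZZ[j][i] = E[Zhat^j (Zhat^i)^T],
    and Sigma_XiZ[i] = E[Xi (Zhat^i)^T]. Each unknown Pi^j is vectorized
    (column-major) and the blocks are assembled via Kronecker products."""
    N = len(P)
    du = [Pi.shape[1] for Pi in P]
    dz = [Sigma_XiZ[i].shape[1] for i in range(N)]
    rows, rhs = [], []
    for i in range(N):
        row = []
        for j in range(N):
            A = P[i].T @ P[j]                  # (P^i)^T P^j
            B = Sigma_ZZ[j][i]                 # E[Zhat^j (Zhat^i)^T]
            row.append(np.kron(B.T, A))        # coefficient of vec(Pi^j)
        rows.append(np.hstack(row))
        rhs.append((-(P[i].T @ M @ Sigma_XiZ[i])).reshape(-1, order='F'))
    sol, *_ = np.linalg.lstsq(np.vstack(rows), np.concatenate(rhs), rcond=None)
    gains, ofs = [], 0
    for j in range(N):
        gains.append(sol[ofs:ofs + du[j] * dz[j]].reshape(du[j], dz[j], order='F'))
        ofs += du[j] * dz[j]
    return gains
```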
7.2.5 Proof of Theorem 7.1

Let $\gamma_0 := (\gamma^1_0, \dots, \gamma^N_0)$ be an optimal team strategy of Problem 7.2 as given in (7.7). These strategies may violate the information structure of Problem 7.1 since they use some information not available to members in Problem 7.1. For $i \in \mathcal{N}$ and $r \in \mathcal{C}^i$, we say that $\gamma^i_0$ uses $Z^r$ if the matrix $K^{ir}_0 \neq 0$. We define $\mathcal{E}^i_0 = \{r : r \in \mathcal{C}^i \text{ and } \gamma^i_0 \text{ uses } Z^r\}$. The cardinality of this set is referred to as the number of information structure violations of the strategy $\gamma^i_0$. Clearly, $|\mathcal{E}^i_0| \le |\mathcal{C}^i|$.

We will use the optimal team strategy $\gamma_0$ in Problem 7.2 to construct an optimal team strategy in Problem 7.1. We will proceed iteratively. At each step of the iteration, we will construct an equivalent team strategy for which the total number of information structure violations across all members is one less than in the previous team strategy. This iterative process, described in Algorithm 7.1, can be summarized as follows. At the beginning of the $l$th iteration, we are given a linear team strategy $\gamma_l$ of Problem 7.2. We consider members $t$ and $s$ such that $s \in \mathcal{C}^t$ and $\gamma^t_l$ uses $Z^s$. This represents an information structure violation for Problem 7.1. We carry out a strategy transformation, referred to as Procedure 1 and described in detail below, to obtain a new team strategy $\gamma_{l+1}$ which has the same performance as $\gamma_l$ but for which $\gamma^t_{l+1}$ does not use $Z^s$. Thus, the number of information structure violations is reduced by one. The $0$th iteration starts with the optimal team strategy of Problem 7.2 as given in (7.7). The process terminates after $l^* = \sum_{j=1}^N |\mathcal{E}^j_0| \le \sum_{j=1}^N |\mathcal{C}^j|$ iterations, at which point the number of information structure violations has been reduced to 0. As a result, in the team strategy $\gamma_{l^*}$, $\gamma^i_{l^*}$ only uses $Z^i$ (which is available to member $i$ in Problem 7.1). Furthermore, $\gamma_{l^*}$ has the same performance as $\gamma_0$. Therefore, the team strategy $\gamma_{l^*}$ is optimal for Problem 7.1 and, for each team member $i$, $\gamma^i_{l^*}$ is linear in the information of member $i$.

Algorithm 7.1
1: $l = 0$; (iteration number)
2: $\gamma_0$ from (7.7);
3: $\mathcal{E}^i_0 = \{r : r \in \mathcal{C}^i \text{ and } \gamma^i_0 \text{ uses } Z^r\}$, $i = 1, \dots, N$;
4: for $t = 1$ to $N$
5:   while $\mathcal{E}^t_l \neq \emptyset$
6:     $s = \min\{r : r \in \mathcal{E}^t_l\}$;
7:     Find the new team strategy $\gamma_{l+1}$ from $\gamma_l$ according to Procedure 1;
8:     $\mathcal{E}^t_{l+1} := \mathcal{E}^t_l \setminus \{s\}$;
9:     $\mathcal{E}^j_{l+1} := \mathcal{E}^j_l$, $j = 1, \dots, N$, $j \neq t$;
10:    $l = l + 1$;
11:  end while
12: end for

We now describe Procedure 1 and then show that it preserves the expected cost.

Procedure 1: We are given a linear team strategy $\gamma_l$ in Problem 7.2 and team members $t$ and $s$ such that $s \in \mathcal{C}^t$ and $\gamma^t_l$ uses $Z^s$. The set $\mathcal{E}^t_l = \{r : r \in \mathcal{C}^t \text{ and } \gamma^t_l \text{ uses } Z^r\}$ represents the information structure violations for member $t$ under $\gamma^t_l$. The strategy of member $t$ can be written as
$$\gamma^t_l(\tilde{Z}^t) = K^{tt}_l Z^t + \sum_{j \in \mathcal{E}^t_l \setminus\{s\}} K^{tj}_l Z^j + K^{ts}_l Z^s. \quad (7.11)$$
According to Assumption 7.1, there exists a substituting member $k$ for the critical pair $(s, t)$. We construct new strategies for members $t$ and $k$ as follows:
$$\gamma^t_{l+1}(\tilde{Z}^t) = \gamma^t_l(\tilde{Z}^t) - K^{ts}_l Z^s = K^{tt}_l Z^t + \sum_{j \in \mathcal{E}^t_l \setminus\{s\}} K^{tj}_l Z^j, \quad (7.12)$$
$$\gamma^k_{l+1}(\tilde{Z}^k) = \gamma^k_l(\tilde{Z}^k) + \Lambda^{kst} K^{ts}_l Z^s = K^{kk}_l Z^k + \sum_{j \in \mathcal{E}^k_l} K^{kj}_l Z^j + \Lambda^{kst} K^{ts}_l Z^s = K^{kk}_{\mathrm{new}} Z^k + \sum_{j \in \mathcal{E}^k_l} K^{kj}_l Z^j, \quad (7.13)$$
where $K^{kk}_{\mathrm{new}}$ is derived from $K^{kk}_l$ and $\Lambda^{kst} K^{ts}_l$ because $Z^s \subset_v Z^k$. At the end of the procedure, the strategies of the members can be written as follows.

• For member $t$, $\gamma^t_{l+1}(\tilde{Z}^t) = K^{tt}_{l+1} Z^t + \sum_{j \in \mathcal{E}^t_l\setminus\{s\}} K^{tj}_{l+1} Z^j$, where $K^{tt}_{l+1} = K^{tt}_l$ and $K^{tj}_{l+1} = K^{tj}_l$ for $j \in \mathcal{E}^t_l\setminus\{s\}$.
• For member $k$, $\gamma^k_{l+1}(\tilde{Z}^k) = K^{kk}_{l+1} Z^k + \sum_{j \in \mathcal{E}^k_l} K^{kj}_{l+1} Z^j$, where $K^{kk}_{l+1} = K^{kk}_{\mathrm{new}}$ and $K^{kj}_{l+1} = K^{kj}_l$ for $j \in \mathcal{E}^k_l$.
• For all other members $r \in \mathcal{N}\setminus\{t, k\}$, $\gamma^r_{l+1}(\tilde{Z}^r) = K^{rr}_{l+1} Z^r + \sum_{j \in \mathcal{E}^r_l} K^{rj}_{l+1} Z^j = \gamma^r_l(\tilde{Z}^r)$, where $K^{rr}_{l+1} = K^{rr}_l$ and $K^{rj}_{l+1} = K^{rj}_l$ for $j \in \mathcal{E}^r_l$.

By construction, member $t$'s new strategy no longer uses $Z^s$, while every other member's new strategy uses the same information as before. Thus, the total number of information structure violations has been reduced by one.
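One step of Procedure 1 amounts to a simple gain reassignment. The following Python sketch illustrates it under the assumption that $Z^s$ is extracted from $Z^k$ by a known selection matrix (which exists because $Z^s \subset_v Z^k$); the container layout is illustrative.

```python
import numpy as np

def procedure_1_step(K, t, s, k, Lam_kst, E_s):
    """One application of Procedure 1: remove member t's gain on Z^s,
    cf. (7.12), and fold the substituted effect into member k's gain on
    Z^k, cf. (7.13). K[i] is a dict mapping member j to the gain K^{ij};
    E_s is the selection matrix with Z^s = E_s Z^k."""
    K_ts = K[t].pop(s)                          # member t stops using Z^s
    K[k][k] = K[k][k] + Lam_kst @ K_ts @ E_s    # K^{kk}_new of (7.13)
    return K
```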
To show that Procedure 1 preserves the expected cost, we need to show that the team strategy $\gamma_{l+1}$ achieves the same expected cost as the team strategy $\gamma_l$. We start with the following claim.

Claim 7.1. Let us denote the team decision under team strategies $\gamma_{l+1}$ and $\gamma_l$ in Problem 7.2 by $U_{\gamma_{l+1}}$ and $U_{\gamma_l}$, respectively. Then,
$$PU_{\gamma_{l+1}} = PU_{\gamma_l}. \quad (7.14)$$

Proof. See Appendix E.1.

Remark 7.6. Under the team strategies $\gamma_{l+1}$ and $\gamma_l$, $U_{\gamma_{l+1}}$ and $U_{\gamma_l}$ are linear functions of $\Xi$ and are, therefore, well-defined random vectors. The equality in (7.14) should be interpreted as follows: for every realization $\xi$ of $\Xi$, the realizations of $PU_{\gamma_{l+1}}$ and $PU_{\gamma_l}$ are equal to each other.

Based on Claim 7.1, the following equality holds for every realization of the random vectors involved:
$$M\Xi + PU_{\gamma_{l+1}} = M\Xi + PU_{\gamma_l}. \quad (7.15)$$
Consequently, the expected costs under the team strategies $\gamma_{l+1}$ and $\gamma_l$ are identical.

7.2.6 Discussion

The proof of Theorem 7.1 shows that, under Assumptions 7.1 and 7.2, an optimal team strategy in the partially nested Problem 7.2 that violates the information structure of Problem 7.1 can be transformed into an equivalent strategy that can be implemented in Problem 7.1. The idea of utilizing optimal strategies in a partially nested expansion of a team problem to construct equivalent strategies in the original problem was used in [24] as well. We paraphrase [24, Theorem 2] below.

[24, Theorem 2]: Consider the setup of Problem 7.1 but without Assumption 7.1. Let $\gamma_0 := (\gamma^1_0, \dots, \gamma^N_0)$ be an optimal team strategy in its partially nested expansion. We will assume that $\gamma_0$ violates the information structure of Problem 7.1. (This assumption is not made in [24], but it is clear that if $\gamma_0$ does not violate the information structure of Problem 7.1, then it is an optimal strategy in that problem and no further construction is needed.) Under this team strategy, let $p^{i*}$ be the composite control function from $\Xi$ to $U^i$ defined such that $p^{i*}(\Xi) = \gamma^i_0(\tilde{Z}^i)$. Define functions $g^i : \mathbb{R}^{d_\xi} \mapsto \mathbb{R}^{d^i_z}$ for $i \in \mathcal{N}$ as
$$g^i(\xi) := \eta^i(\xi, p^{1*}(\xi), p^{2*}(\xi), \dots, p^{(i-1)*}(\xi)) := H^i \xi + \sum_{j<i} D^{ij} p^{j*}(\xi). \quad (7.16)$$
Suppose there exist functions $r = (r^1, \dots, r^N)$, where $r^i : \mathbb{R}^{d^i_z} \mapsto \mathbb{R}^{d^i_u}$ for $i \in \mathcal{N}$, such that
$$p^{i*}(\Xi) = r^i(g^i(\Xi)) \quad \forall i \in \mathcal{N}. \quad (7.17)$$
Then, [24, Theorem 2] states that $r$ is an optimal team strategy for the original non-partially-nested problem.

In comparing our result to [24, Theorem 2], the following key observations can be made.

1. The substitutability assumption required for our result is a condition placed on the information structure of Problem 7.1 and on the parameters in the cost and observation equations (namely, the matrices $P^i, D^{ij}$, $i, j \in \mathcal{N}$). The condition required for the result in [24], on the other hand, is a requirement that an optimal team strategy in the partially nested expansion must satisfy. Clearly, our result and the result in [24] require conditions of very different natures.

2. Using an optimal strategy $\gamma_0$ in the expanded structure, the result in [24] constructs a team strategy $\gamma_{\mathrm{eq}}$ for the original team problem in a manner that ensures that for each $i \in \mathcal{N}$, $U^i_{\gamma_0} = U^i_{\gamma_{\mathrm{eq}}}$. In contrast, the strategies constructed in our proof ensure that $PU_{\gamma_{l+1}} = PU_{\gamma_l}$ (see Claim 7.1). In other words, the transformation in [24] ensures that each member's action $U^i$ is the same random variable under the original and transformed strategies. Under our transformation, a member's action may become a different random variable, but the combined effect of the team members' actions on the cost, as captured by the term $PU$, remains unchanged.
3. We next consider the following question. Suppose we solve Problem 7.2 under Assumptions 7.1 and 7.2 and find an optimal team strategy $\gamma_0$ (there may be many optimal strategies; we pick one arbitrarily) that violates the information structure of Problem 7.1. We then construct the composite control functions $p^{i*}$, $i \in \mathcal{N}$, under $\gamma_0$. Then, do there always exist functions $r = (r^1, \dots, r^N)$ satisfying (7.17)? In other words, are our assumptions sufficient conditions for $\gamma_0$ to satisfy the conditions imposed in [24, Theorem 2]? The answer is no, as the following examples demonstrate.

Example 7.2. Consider Problem 7.1 where the control actions are one-dimensional, $\Xi \sim N(0, 1)$, and $\mathcal{N} = \{1, 2, 3\}$,
$$P = \begin{bmatrix} 0 & 1 & 1 \end{bmatrix}, \quad M = 1, \quad Z^1 = \Xi, \quad Z^2 = U^1, \quad Z^3 = \Xi. \quad (7.18)$$
It is straightforward to see that $(1, 2)$ is a critical pair and that member 3 is a substituting member for this critical pair. The information structure for the partially nested expansion of this example is
$$\tilde{Z}^1 = Z^1 = \Xi, \quad \tilde{Z}^2 = \mathrm{vec}(Z^2, Z^1) = \mathrm{vec}(U^1, \Xi), \quad \tilde{Z}^3 = Z^3 = \Xi. \quad (7.19)$$
An optimal strategy in the partially nested expansion is
$$U^1 = \gamma^1_0(\tilde{Z}^1) = 0, \quad U^2 = \gamma^2_0(\tilde{Z}^2) = -0.5 Z^1, \quad U^3 = \gamma^3_0(\tilde{Z}^3) = -0.5 Z^3. \quad (7.20)$$
The above strategy results in the lowest possible expected cost of 0. The composite control functions under the above strategy are
$$p^{1*}(\Xi) = 0, \quad p^{2*}(\Xi) = -0.5\Xi, \quad p^{3*}(\Xi) = -0.5\Xi, \quad (7.21)$$
and the functions $g^i$ defined in (7.16) are
$$g^1(\Xi) = \Xi, \quad g^2(\Xi) = 0, \quad g^3(\Xi) = \Xi. \quad (7.22)$$
It is clear that there is no function $r^2 : \mathbb{R} \mapsto \mathbb{R}$ such that
$$p^{2*}(\Xi) = r^2(g^2(\Xi)). \quad (7.23)$$

In the above example, one can easily find other optimal strategies in the partially nested expansion that would satisfy the conditions of [24, Theorem 2]. The point we wish to make is that, for an example that meets our assumptions, there may be some optimal strategies in the partially nested expansion that do not satisfy the conditions of [24, Theorem 2]. Our Algorithm 7.1, on the other hand, works with any linear optimal strategy in the partially nested expansion.
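As a quick sanity check on Example 7.2, the following Python snippet evaluates the expected cost (7.1) under the strategies (7.20) by Monte Carlo; it returns (exactly, in this case) the minimum value 0.

```python
import numpy as np

rng = np.random.default_rng(0)
Xi = rng.standard_normal(100_000)   # Xi ~ N(0, 1)
U1 = np.zeros_like(Xi)              # gamma^1_0 = 0
U2 = -0.5 * Xi                      # gamma^2_0(Ztilde^2) = -0.5 Z^1 = -0.5 Xi
U3 = -0.5 * Xi                      # gamma^3_0(Ztilde^3) = -0.5 Z^3 = -0.5 Xi
# Cost (7.1) with M = 1, P = [0 1 1]: (Xi + 0*U1 + U2 + U3)^2.
print(np.mean((Xi + 0 * U1 + U2 + U3) ** 2))   # 0.0
```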
Example 7.3. Consider Problem 7.1 where the control actions are one-dimensional, $\Xi = \mathrm{vec}(\Xi_1, \Xi_2) \sim N\left(0, \begin{bmatrix} 2 & -1 \\ -1 & 2 \end{bmatrix}\right)$ and $\mathcal{N} = \{1, 2, 3\}$,
$$P = \begin{bmatrix} 2 & -1 & -1 \\ 1 & 2 & 2 \end{bmatrix}, \quad M = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}, \quad Z^1 = \Xi_2, \quad Z^2 = U^1, \quad Z^3 = \Xi_2. \quad (7.24)$$
It is straightforward to see that $(1, 2)$ is a critical pair and that member 3 is a substituting member for this critical pair. The information structure for the partially nested expansion of this example is
$$\tilde{Z}^1 = Z^1 = \Xi_2, \quad \tilde{Z}^2 = \mathrm{vec}(Z^2, Z^1) = \mathrm{vec}(U^1, \Xi_2), \quad \tilde{Z}^3 = Z^3 = \Xi_2. \quad (7.25)$$
Since this information structure is partially nested, it is equivalent to a static team with the following information structure:
$$\hat{Z}^1 = \Xi_2, \quad \hat{Z}^2 = \Xi_2, \quad \hat{Z}^3 = \Xi_2. \quad (7.26)$$
According to Remark 7.4, the linear team strategy $U^i = \gamma^i(\hat{Z}^i) = \Pi^i \hat{Z}^i = \Pi^i \Xi_2$ for $i \in \mathcal{N}$ is optimal if the following linear system of equations has a solution for $\Pi^i$, $i \in \mathcal{N}$:
$$10\Pi^1 = 0, \quad 10\Pi^2 + 10\Pi^3 = -5. \quad (7.27)$$
One solution provides the following strategies:
$$U^1 = 0, \quad U^2 = \Xi_2, \quad U^3 = -1.5\Xi_2. \quad (7.28)$$
Under the information structure of (7.25), (7.28) can be written as follows:
$$U^1 = \gamma^1_0(\tilde{Z}^1) = 0, \quad U^2 = \gamma^2_0(\tilde{Z}^2) = Z^1, \quad U^3 = \gamma^3_0(\tilde{Z}^3) = -1.5 Z^3. \quad (7.29)$$
The team strategies of (7.29) cannot be implemented in the original non-partially-nested information structure because $U^2$ uses $Z^1$ while $Z^1 \not\subset_v Z^2$. We now follow the procedure of Algorithm 7.1 and use $\gamma_0 = (\gamma^1_0, \gamma^2_0, \gamma^3_0)$ from (7.29) to find optimal team strategies that can be implemented in the original information structure. Since there is only one information structure violation under $\gamma_0$, we obtain the desired strategies after one iteration:
$$U^1 = \gamma^1_1(\tilde{Z}^1) = 0, \quad U^2 = \gamma^2_1(\tilde{Z}^2) = 0, \quad U^3 = \gamma^3_1(\tilde{Z}^3) = -0.5 Z^3. \quad (7.30)$$
To compare our approach with that of [24], note that the composite control functions under $\gamma_0$ are
$$p^{1*}(\Xi) = 0, \quad p^{2*}(\Xi) = \Xi_2, \quad p^{3*}(\Xi) = -1.5\Xi_2, \quad (7.31)$$
and the functions $g^i$ defined in (7.16) are
$$g^1(\Xi) = \Xi_2, \quad g^2(\Xi) = 0, \quad g^3(\Xi) = \Xi_2. \quad (7.32)$$
Clearly, there is no function $r^2$ such that $p^{2*}(\Xi) = r^2(g^2(\Xi))$.

4. Under our assumptions, the strategies $\Gamma^i Z^i$, $i \in \mathcal{N}$, of Theorem 7.1 are optimal for both Problems 7.1 and 7.2. If these are used as $\gamma_0$ in [24, Theorem 2], then it can be shown that $r^i = \Gamma^i$ will satisfy (7.17). Of course, if we know $\Gamma^i$, $i \in \mathcal{N}$, already, then there is no need to carry out the transformation of [24, Theorem 2].

5. Finally, [24, Problem A in Section IV] presents an example where the conditions of [24, Theorem 2] hold but the substitutability assumption does not. Thus, our assumptions do not provide necessary conditions for the conditions imposed in [24, Theorem 2].

The core idea of substitutability is also conceptually different from the conditional-independence-related properties of stochastic nestedness [21] and P-quasiclassical information structures [22] that have been used for some non-partially-nested problems. In the non-partially-nested models of [21] and [22], one can identify an agent's "missing information" that prevents the information structure from being partially nested, i.e., if the agents knew their missing information, then the information structure would be partially nested. These papers then rely on conditional-independence-like properties (of the relevant cost or state variables) to argue that, given an agent's actual information, the missing information is irrelevant for making optimal decisions. We believe that this is very different from the essence of substitutability. Under our assumptions, it is not the case that the missing information of agent $i$ is irrelevant for its decision. It is just that there is another agent present that knows the information missing at agent $i$ and can reproduce any effects on cost and observations that agent $i$ could have produced had it known its missing information. Let us reconsider Example 7.2, where the information structure is $Z^1 = \Xi$, $Z^2 = U^1$, $Z^3 = \Xi$. Suppose the following strategies are being used under this information structure:
$$U^1 = \gamma^1(Z^1) = 0, \quad U^2 = \gamma^2(Z^2) = 1, \quad U^3 = \gamma^3(Z^3) = 0. \quad (7.33)$$
Under the above strategies, the conditional expectation of the cost terms that involve $U^2$, given $Z^2, U^2$, can be computed to be
$$\mathbb{E}^\gamma\left[U^2 U^2 + 2U^2\Xi + 2U^2 U^3 \mid Z^2, U^2\right] = 1. \quad (7.34)$$
On the other hand, if the same expectation is computed given $Z^1, U^1, Z^2, U^2$, we get
$$\mathbb{E}^\gamma\left[U^2 U^2 + 2U^2\Xi + 2U^2 U^3 \mid Z^1, U^1, Z^2, U^2\right] = 1 + 2\Xi. \quad (7.35)$$
If the information structure were P-quasiclassical, then the two conditional expectations above would have been identical (see [22, Definition 2]). This demonstrates that Example 7.2 violates the definition of P-quasiclassical information structures even though it satisfies our substitutability assumption.

7.3 Substitutability in Decentralized LQG Control

Decentralized control problems in discrete time can be viewed as dynamic team problems by viewing a controller's actions at different time instants as the actions of distinct team members [20]. Thus, a decentralized control problem with $N$ controllers acting over a time horizon of duration $T$ can be seen as a dynamic team with $NT$ members, each member responsible for one control action. We will denote the team member corresponding to controller $i$'s action at time $t$ as member $i.t$. We can then verify whether this dynamic team satisfies the assumptions of Section 7.2 and, if it does, we can use an optimal team strategy in its partially nested expansion to find an optimal team strategy in the original team. The optimal strategy for member $i.t$ then naturally becomes the control strategy for controller $i$ at time $t$. Thus, non-partially-nested decentralized control problems whose dynamic team representations satisfy Assumptions 7.1 and 7.2 can be solved using the analysis of Section 7.2. It is possible, however, to exploit the state structure in control problems to (a) simplify the verification of the substitutability assumption and (b) find compact control strategies with recursively update-able sufficient statistics. We demonstrate this by considering the following problem.

7.3.1 System Model and Information Structure

We consider a decentralized control problem with $N$ controllers where:

1. The state dynamics are given as
$$X_{t+1} = AX_t + BU_t + W_t, \quad t = 1, \dots, T-1, \quad (7.36)$$
where $X_t, W_t \in \mathbb{R}^{d_x}$, $U_t \in \mathbb{R}^{d_u}$, and $U_t = \mathrm{vec}(U^1_t, \dots, U^N_t)$.

2. Each controller makes a noisy observation of the system state given as
$$Y^i_t = C^i X_t + V^i_t, \quad i = 1, \dots, N. \quad (7.37)$$
Combining (7.37) for all controllers gives
$$Y_t = CX_t + V_t, \quad (7.38)$$
where $Y_t$ denotes $\mathrm{vec}(Y^1_t, Y^2_t, \dots, Y^N_t)$, $V_t$ denotes $\mathrm{vec}(V^1_t, V^2_t, \dots, V^N_t)$, and $C$ is a matrix composed of $C^1, \dots, C^N$ as row blocks.

The initial state $X_1$ and the noise variables $W_t$, $t = 1, \dots, T-1$, and $V_t$, $t = 1, \dots, T-1$, are mutually independent and jointly Gaussian with the following probability distributions: $X_1 \sim N(0, \Sigma_x)$, $W_t \sim N(0, \Sigma_w)$, $V_t \sim N(0, \Sigma_v)$.

The information available to the $i$th controller at time $t$ is
$$I^i_t = \{Y^i_{1:t}, U^i_{1:t-1}\}, \quad i = 1, \dots, N. \quad (7.39)$$
Each controller $i$ chooses its action $U^i_t$ according to $U^i_t = g^i_t(I^i_t)$. The collection $g^i = (g^i_1, \dots, g^i_T)$ is called the control strategy of controller $i$. The performance of the control strategies of all controllers, $g = (g^1, \dots, g^N)$, is measured by the total expected cost over a finite time horizon:
$$J(g) = \mathbb{E}^g\left[\sum_{t=1}^T (MX_t + PU_t)^\top (MX_t + PU_t)\right]. \quad (7.40)$$
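For concreteness, the following Python sketch rolls out the model (7.36)-(7.38) under given decentralized strategies and accumulates the cost (7.40). The interface is an illustrative assumption, and, for simplicity, the observation noise is assumed block-diagonal across controllers so that each $V^i_t$ can be drawn separately.

```python
import numpy as np

def rollout(A, B, C_blocks, M, P, Sig_x, Sig_w, Sig_v_blocks, policies, T, rng):
    """Simulate the decentralized system and return the realized cost (7.40).
    policies[i] maps controller i's information I^i_t = {Y^i_{1:t}, U^i_{1:t-1}}
    of (7.39) to its action U^i_t."""
    dx = A.shape[0]
    X = rng.multivariate_normal(np.zeros(dx), Sig_x)   # X_1 ~ N(0, Sigma_x)
    infos = [{"Y": [], "U": []} for _ in C_blocks]
    cost = 0.0
    for t in range(T):
        U_parts = []
        for i, Ci in enumerate(C_blocks):
            Vi = rng.multivariate_normal(np.zeros(Ci.shape[0]), Sig_v_blocks[i])
            infos[i]["Y"].append(Ci @ X + Vi)          # Y^i_t, cf. (7.37)
            u = policies[i](infos[i])                  # U^i_t = g^i_t(I^i_t)
            infos[i]["U"].append(u)
            U_parts.append(u)
        U = np.concatenate(U_parts)
        e = M @ X + P @ U
        cost += e @ e                                  # summand of (7.40)
        W = rng.multivariate_normal(np.zeros(dx), Sig_w)
        X = A @ X + B @ U + W                          # (7.36)
    return cost
```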
The optimization problem is defined as follows.

Problem 7.3. For the model described above, find the control strategy $g = (g^1, \dots, g^N)$ that minimizes the expected cost given by (7.40).

7.3.2 Substitutability Assumption

We make the following assumption about the system.

Assumption 7.3. For every vector $u = \mathrm{vec}(u^1, u^2, \dots, u^N)$, there exist control actions $v^i = l^i(u)$ for controller $i$, $i = 1, \dots, N$, such that
$$Bu = B\,\mathrm{vec}(0, \dots, v^i, \dots, 0) \quad \text{and} \quad Pu = P\,\mathrm{vec}(0, \dots, v^i, \dots, 0). \quad (7.41)$$

We can write the $B$ and $P$ matrices in terms of their blocks as $B = [B^1\ \cdots\ B^N]$ and $P = [P^1\ \cdots\ P^N]$. An example of a system satisfying Assumption 7.3 is a two-controller LQG problem where the dynamics and the cost are functions only of the sum of the control actions, that is, $(u^1_t + u^2_t)$. This happens if $B^1 = B^2$ and $P^1 = P^2$. In this case, using $v^1_t = v^2_t = u^1_t + u^2_t$ satisfies (7.41).

Remark 7.7. More generally, Assumption 7.3 is satisfied iff the column spaces of the matrices $\begin{bmatrix} B^i \\ P^i \end{bmatrix}$, $i = 1, \dots, N$, are identical.

Remark 7.8. The substitutability assumption above (Assumption 7.3) is really just a compact representation of the substitutability assumption of Section 7.2 (Assumption 7.1) with a specified substituting member for each critical pair. To see this, first note that in the dynamic team representation of the control problem, members $i.s$ and $j.t$ form a critical pair when $j \neq i$ and $s < t$. Secondly, member $i.t$ knows all the information of member $i.s$. To show that member $i.t$ is a substituting member for the critical pair $(i.s, j.t)$, we just need to argue that for any action $u^j_t$, we can find an action $u^i_t$ that produces the same effect on the total cost and future observations. Since the effect of $u^j_t$ on the cost at time $t$ is only through the term $P^j u^j_t$, and its effects on the future costs and observations are only through $B^j u^j_t$, it suffices to ensure that for any $u^j_t$, there exists $u^i_t$ such that $P^j u^j_t = P^i u^i_t$ and $B^j u^j_t = B^i u^i_t$. Combining the above for all $j \neq i$ gives the substitutability conditions of Assumption 7.3.

The following lemma is immediate from the theory of pseudo-inverses [150].

Lemma 7.3. If a solution $v^i$ to (7.41) exists, it can be written as $v^i = \Lambda^i u$, where
$$\Lambda^i = \begin{bmatrix} B^i \\ P^i \end{bmatrix}^\dagger \begin{bmatrix} B \\ P \end{bmatrix}. \quad (7.42)$$
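To illustrate Assumption 7.3 and Lemma 7.3, the following Python snippet works through the two-controller example above ($B^1 = B^2$, $P^1 = P^2$); the specific matrix entries are arbitrary illustrative choices.

```python
import numpy as np

B1 = B2 = np.array([[1.0], [0.5]])   # B = [B^1 B^2]: dynamics see u^1 + u^2
P1 = P2 = np.array([[2.0], [1.0]])   # P = [P^1 P^2]: cost sees u^1 + u^2
B = np.hstack([B1, B2])
P = np.hstack([P1, P2])
Lam1 = np.linalg.pinv(np.vstack([B1, P1])) @ np.vstack([B, P])   # (7.42)
u = np.array([0.3, -1.2])
v1 = Lam1 @ u                        # controller 1's substitute action
assert np.allclose(B @ u, B1 @ v1) and np.allclose(P @ u, P1 @ v1)
print(v1)                            # [-0.9] = u^1 + u^2, as expected
```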
7.3.3 A Centralized Problem

In order to solve Problem 7.3, we would like to consider a partially nested expansion of its information structure. Because members $i.s$ and $j.t$ form a critical pair in the dynamic team representation of the control problem when $j \neq i$ and $s < t$, a partially nested expansion must give controller $j$ at time $t$ all the information of controller $i$ at any time $s < t$. A convenient expansion that meets this requirement is the information structure of the centralized problem described below.

Problem 7.4. For the model described above, assume that the information available to each controller is
$$\tilde{I}_t = \{Y_{1:t}, U_{1:t-1}\}. \quad (7.43)$$
Controller $i$ chooses its action according to the strategy $U^i_t = g^i_t(\tilde{I}_t)$. The objective is to select control strategies that minimize (7.40).

The following lemma follows directly from the problem descriptions above and well-known results for the centralized LQG problem with output feedback [152].

Lemma 7.4.
1. The optimal cost in Problem 7.4 (with the centralized information structure) is a lower bound on the optimal cost in Problem 7.3 (with the decentralized information structure).
2. The optimal strategies in Problem 7.4 have the form $U_t = K_t Z_t$, where $Z_t = \mathbb{E}[X_t \mid \tilde{I}_t]$. $Z_t$ evolves according to the following equations:
$$Z_1 = L_1 Y_1, \quad Z_{t+1} = (I - L_{t+1}C)(AZ_t + BU_t) + L_{t+1}Y_{t+1}. \quad (7.44)$$
The matrices $L_t$, $t = 1, \dots, T$, can be computed a priori from the problem parameters.

7.3.4 Main Results

In this section, we show that it is possible to construct optimal strategies in Problem 7.3 from the optimal control strategy of Problem 7.4.

Theorem 7.2. Consider Problems 7.3 and 7.4, and consider the optimal strategy, $U_t = K_t Z_t$, of Problem 7.4. We write $L_{t+1}$ of Lemma 7.4 as $L_{t+1} = [L^1_{t+1}\ L^2_{t+1}\ \cdots\ L^N_{t+1}]$. The optimal control strategies of Problem 7.3 can be written as
$$U^i_t = \Lambda^i K_t S^i_t, \quad (7.45)$$
where $\Lambda^i$ is given by (7.42) and $S^i_t$ satisfies the following update equations:
$$S^i_1 = L^i_1 Y^i_1, \quad S^i_{t+1} = (I - L_{t+1}C)(AS^i_t + B^i U^i_t) + L^i_{t+1} Y^i_{t+1}. \quad (7.46)$$
Moreover, the optimal strategies in Problem 7.3 achieve the same cost as the optimal strategies in Problem 7.4.

Observe that the strategies given by (7.45) and (7.46) are valid control strategies under the information structure of Problem 7.3 because they depend only on $Y^i_{1:t}, U^i_{1:t-1}$, which are included in $I^i_t$. The states $S^i_t$ defined in (7.46) are related to the centralized estimate $Z_t$ by the following result.

Lemma 7.5. The centralized state estimate $Z_t$ and the states $S^i_t$ defined in (7.46) satisfy the following equation:
$$Z_t = \sum_{i=1}^N S^i_t. \quad (7.47)$$

Proof. We prove the result by induction. For $t = 1$, from (7.44) we have $Z_1 = L_1 Y_1$, and according to (7.46),
$$\sum_{i=1}^N S^i_1 = L^1_1 Y^1_1 + L^2_1 Y^2_1 + \dots + L^N_1 Y^N_1 = L_1 Y_1. \quad (7.48)$$
Now assume that $Z_t = \sum_{i=1}^N S^i_t$. We need to show that $Z_{t+1} = \sum_{i=1}^N S^i_{t+1}$. From (7.44), it follows that
$$Z_{t+1} = (I - L_{t+1}C)(AZ_t + BU_t) + L_{t+1}Y_{t+1}. \quad (7.49)$$
From (7.46), we have
$$\sum_{i=1}^N S^i_{t+1} = \sum_{i=1}^N \left[(I - L_{t+1}C)(AS^i_t + B^i U^i_t) + L^i_{t+1} Y^i_{t+1}\right] = (I - L_{t+1}C)\Big(A\sum_{i=1}^N S^i_t + \sum_{i=1}^N B^i U^i_t\Big) + \sum_{i=1}^N L^i_{t+1} Y^i_{t+1} = (I - L_{t+1}C)(AZ_t + BU_t) + L_{t+1}Y_{t+1}. \quad (7.50)$$
Therefore, $Z_{t+1} = \sum_{i=1}^N S^i_{t+1}$.

Remark 7.9. If $X_t = \mathrm{vec}(X^1_t, \dots, X^N_t)$ and for each $i$, $Y^i_t = X^i_t$, it can easily be shown that $S^i_t = \mathrm{vec}(0, \dots, X^i_t, \dots, 0)$.

The following result is an immediate consequence of Theorem 7.2.

Corollary 7.1. For the model described in Section 7.3.1, consider any information structure under which the information of controller $i$ at time $t$, $\hat{I}^i_t$, satisfies $\{Y^i_{1:t}, U^i_{1:t-1}\} \subseteq \hat{I}^i_t \subseteq \{Y_{1:t}, U_{1:t-1}\}$ for all $i = 1, \dots, N$ and $t = 1, \dots, T$. Then, the optimal strategies in this information structure are the same as in Theorem 7.2.
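The recursion (7.46) is straightforward to implement; the following Python sketch gives one update step (the argument names are illustrative). By Lemma 7.5, summing the outputs over all controllers reproduces the centralized estimate update (7.44).

```python
import numpy as np

def local_stat_update(S_i, U_i, Y_i_next, A, B_i, C, L_next, L_i_next):
    """Controller i's update (7.46):
    S^i_{t+1} = (I - L_{t+1} C)(A S^i_t + B^i U^i_t) + L^i_{t+1} Y^i_{t+1}.
    Only controller i's own observation and action enter, so this statistic
    is computable under the decentralized information structure (7.39)."""
    I = np.eye(A.shape[0])
    return (I - L_next @ C) @ (A @ S_i + B_i @ U_i) + L_i_next @ Y_i_next
```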
7.3.5 Proof of Theorem 7.2

For notational convenience, we will describe the proof for $N = 2$. If $U_t = K_t Z_t$ is the optimal control strategy of Problem 7.4, then from Lemma 7.5 we have
$$U_t = K_t Z_t = K_t (S^1_t + S^2_t). \quad (7.51)$$
We claim that the decentralized control strategies defined in Theorem 7.2, that is,
$$U_t = \begin{bmatrix} U^1_t \\ U^2_t \end{bmatrix} = \begin{bmatrix} \Lambda^1 K_t S^1_t \\ \Lambda^2 K_t S^2_t \end{bmatrix}, \quad t = 1, \dots, T, \quad (7.52)$$
yield the same expected cost as the optimal centralized control strategies $U_t = K_t Z_t$, $t = 1, \dots, T$.

We first consider the control system under the centralized strategies. We proceed sequentially to establish the claim by successively changing the control strategies at each time step. Under the control strategies $U_t = K_t Z_t$, $t = 1, \dots, T$, the controlled system can be viewed as a linear system with $\mathrm{vec}(X_t, Z_t)$ as the state. We first change the control strategy at time $t = 1$ from $U_1 = K_1 Z_1$ to the one given by (7.52) and show that it does not change the instantaneous cost or the future evolution of the linear system. Under the control action $U_1 = K_1 Z_1$, we have $PU_1 = PK_1 Z_1$. Under the control actions $U^1_1 = \Lambda^1 K_1 S^1_1$ and $U^2_1 = \Lambda^2 K_1 S^2_1$, we have
$$PU_1 = [P^1\ P^2]\, U_1 = P^1 U^1_1 + P^2 U^2_1 = P^1 \Lambda^1 K_1 S^1_1 + P^2 \Lambda^2 K_1 S^2_1. \quad (7.53)$$
From the substitutability assumption (Assumption 7.3) and Lemma 7.3, for any vector $u$, $Pu = P^i \Lambda^i u$. Therefore,
$$P^1 \Lambda^1 K_1 S^1_1 = PK_1 S^1_1, \quad P^2 \Lambda^2 K_1 S^2_1 = PK_1 S^2_1. \quad (7.54)$$
(7.53) can now be written as
$$P^1 \Lambda^1 K_1 S^1_1 + P^2 \Lambda^2 K_1 S^2_1 = P(K_1 S^1_1 + K_1 S^2_1) = PK_1 Z_1, \quad (7.55)$$
where the last equality is true because $Z_1 = S^1_1 + S^2_1$. Thus, the change in strategies at time $t = 1$ does not affect the cost at time $t = 1$. The change in strategies at time $t = 1$ affects the next state $\mathrm{vec}(X_2, Z_2)$ only through the term $BU_1$. From the substitutability assumption (Assumption 7.3) and Lemma 7.3, for any vector $u$, $Bu = B^i \Lambda^i u$. Therefore,
$$B^1 \Lambda^1 K_1 S^1_1 + B^2 \Lambda^2 K_1 S^2_1 = B(K_1 S^1_1 + K_1 S^2_1) = BK_1 Z_1. \quad (7.56)$$
The future state evolution is unaffected by the change in strategies at time $t = 1$. Therefore, changing the strategies at time $t = 1$ from the centralized strategy to the one given by (7.52) does not change the expected cost. Proceeding in the same manner for all successive time instants establishes the claim.

7.4 Conclusion

We considered two problems in this chapter, an LQG dynamic team problem and a decentralized LQG control problem, and defined a property called substitutability in these problems. For the non-partially-nested LQG dynamic team problem, we showed that under certain conditions an optimal strategy of each team member is linear in its information. For the non-partially-nested decentralized control problem under the substitutability assumption, we showed that linear strategies are optimal and we provided a complete state-space characterization of optimal strategies. Our results suggest that substitutability can work as a counterpart of the information structure requirements that enable simplification of dynamic teams and decentralized control problems.

Chapter 8

Concluding Remarks

8.1 Summary

In this dissertation, we studied several classes of decentralized control problems where some of the limitations of the team-theoretic framework were present and developed solution approaches that circumvent these limitations.

Chapter 3: Decentralized Control over Unreliable Communication – Finite Horizon

In this chapter, we considered a decentralized networked control system (DNCS) consisting of a remote controller and a collection of linear plants, each associated with a local controller. Each local controller directly observes the state of its co-located plant and can inform the remote controller of the plant's state through an unreliable communication channel.
The communication channels from the remote controller to the local controllers are assumed to be perfect. The objective of the local controllers and the remote controller is to cooperatively minimize a quadratic performance cost. This multi-controller DNCS problem is not a partially nested LQG problem, hence we could not directly use prior results in decentralized control to conclude that linear strategies are optimal.

We employed the common information approach for this problem and showed that it is equivalent to a centralized sequential decision-making problem where the remote controller is the only decision-maker. We provided a dynamic program to obtain optimal strategies in the equivalent problem. Then, using these optimal strategies for the equivalent problem, we obtained optimal control strategies for all local controllers and the remote controller in our original problem. In the optimal control strategies, all controllers compute common estimates of the states of the plants based on the common information obtained from the communication network. The remote controller's action is linear in the common state estimates, and the action of each local controller is linear in both the actual state of its corresponding plant and the common state estimates.

Although we conceptually followed the common information approach of [54], we had to come up with some new technical arguments to adapt this approach to our problem. The technical argument in [54] was proven for finite state and action spaces, which makes this approach not directly applicable to our problem, where both the state and the action spaces are Euclidean. While [54] states that the results should apply to more general spaces, this was not explicitly proven. To the best of our knowledge, our result is the first one that explicitly shows that the common information approach for decentralized control is not confined to the realm of problems where state/action spaces are finite or problems which pre-suppose linear strategies. Even though our strategy space allows for arbitrary measurable functions, we were able to adapt the common information approach to find explicit optimal strategies.

Our results sketch a solution methodology for decentralized control with unreliable communication among controllers. The methodology can potentially be generalized to other communication topologies in decentralized control, such as directed acyclic communication graphs with unreliable links.

Chapter 4: Decentralized Control of Finite State and Jump Linear Systems Connected by Markovian Channels

In this chapter, we considered a decentralized networked control system (DNCS) consisting of one global controller and multiple local controllers. The DNCS includes a global plant controlled only by the global controller and multiple local plants, where each local plant is controlled jointly by a co-located local controller and the global controller. The global controller is analogous to the remote controller in Chapter 3. However, in addition to controlling the local plants, here it also controls the global plant (which was absent in Chapter 3). Unlike the problem in Chapter 3, where all the local plants had linear system dynamics, here we let all plants (the global one and all local ones) have general system dynamics.
We assumed that the state of a local plant is perfectly observed by its co-located local controller and that the global state is perfectly observed by the global controller. Each local controller can inform the global controller of its local plant's state through an unreliable two-state Markovian communication channel. The global controller shares whatever information it has received over the unreliable channels, as well as the global state, with all local controllers. We assumed that the communication channels from the global controller to the local controllers are perfect. The objective of the controllers is to cooperatively minimize a general cost function over a finite time horizon.

While this problem is closely related to the problem of Chapter 3, the presence of the global state, the Markovian channels, and the general system dynamics and cost function make this problem different. In spite of this difference, we used the methodology we proposed in Chapter 3 and showed that it can be extended to our problem here. We first provided a dynamic program to obtain the optimal strategies of the controllers. For the case with finite state and action spaces, it is possible to solve the dynamic program numerically using POMDP (Partially Observable Markov Decision Process) solvers. For the case with switched linear dynamics and mode-dependent quadratic cost, we showed that it is possible to explicitly solve the dynamic program and obtain explicit optimal strategies for all local controllers and the global controller.

As we showed in this chapter, the generality of the architecture of our DNCS problem allows one to use it for capturing various applications, such as DNCSs with broadcast-out architecture; DNCSs with decoupled subsystems and coupled costs; and two-controller DNCSs with decoupled dynamics, coupled costs, and two-way unreliable communication.

Chapter 5: Decentralized Control over Unreliable Communication – Infinite Horizon

In this chapter, we considered the same DNCS architecture as the one studied in Chapter 3. While Chapter 3 focused on the finite time horizon problem, in this chapter we attempted to find decentralized control strategies that minimize an infinite horizon average cost. For this problem, the standard team-theoretic results cannot be used, as the corresponding team problem version of our infinite horizon problem will have an infinite number of members, and no general results are known about the solution of dynamic team decision problems with infinitely many members.

For the finite horizon version of this problem, we obtained optimal decentralized controllers in Chapter 3 using ideas from the common information approach. The optimal strategies in the finite horizon case are characterized by coupled Riccati recursions. In contrast to the finite horizon problem, stability is an important issue for the infinite horizon problem, and this makes it non-trivial to extend finite horizon results to the infinite horizon. In order to study the possibility of such an extension, one needs to answer two questions: (i) Does there exist a solution for the fixed point version of the coupled Riccati recursions obtained in the finite horizon case? (ii) If it does exist, can this solution be used to find optimal strategies that minimize the infinite horizon average cost?

In order to answer the first question, we exploited a connection between our problem and an auxiliary Markov Jump Linear System (MJLS).
This connection enables us to use the theory of Markov jump linear systems to investigate the existence of a solution to the fixed point version of the coupled Riccati recursions obtained in the finite horizon case. We characterized a critical failure probability for each unreliable link and showed that when all link failure probabilities are below their critical thresholds, a solution does exist for the fixed point version of the coupled Riccati recursions. Further, when the solution exists, we also answered the second question by providing an explicit characterization of the optimal strategies. Our approach for answering the first question above sketches a methodology for solving infinite horizon decentralized control/DNCS problems.

Chapter 6: Decentralized Control with Partially Unknown Systems

In this chapter, we considered a multi-agent (or multi-controller) linear-quadratic (LQ) control problem consisting of three systems, a global system and two local systems. In this problem, there are three agents: the actions of the global agent can affect the global system as well as the local systems, while the actions of the local agents can only affect their respective co-located local systems. Further, the global system's state can affect the local systems' state evolution. We were interested in minimizing the infinite-horizon average cost incurred when the dynamics of the global system are not known to the controllers. For this multi-agent reinforcement learning (MARL) problem, the standard team-theoretic results cannot be used, as these results apply only to known models.

Our problem is a decentralized online learning problem where agents need to learn model parameters and control the system simultaneously. The asymmetry of the information among agents can cause them to have different views about the unknown system from their different learning processes. This difference in perspectives makes it challenging for the agents to coordinate their actions for minimizing the cost. In order to address this challenge, we proposed a multi-agent learning algorithm based on the construction of an auxiliary single-agent LQ problem. The auxiliary single-agent problem serves as an implicit coordination mechanism among the learning agents. In our proposed Thompson Sampling (TS)-based multi-agent learning algorithm, each agent learns the global system's dynamics independently. We showed that the expected (Bayesian) regret achieved by our algorithm is upper bounded by the expected (Bayesian) regret for the auxiliary single-agent problem under a TS algorithm. Consequently, using existing results for single-agent LQ regret, our algorithm provides a $\tilde{O}(\sqrt{T})$ bound on the expected regret of our MARL problem. Our approach above sketches a methodology for designing coordination mechanisms among agents in MARL problems.

Chapter 7: Dynamic Teams and Decentralized Control Problems with Substitutable Actions

In this chapter, we considered a linear-quadratic-Gaussian (LQG) dynamic team problem and also an LQG decentralized control problem for which the information structure is not partially nested. Our objective was to find optimal control strategies for these two problems. We assumed that the pattern of information propagation and the cost function enjoy a special property that we call "substitutability".
Chapter 7: Dynamic Teams and Decentralized Control Problems with Substitutable Actions

In this chapter, we considered a linear-quadratic-Gaussian (LQG) dynamic team problem and an LQG decentralized control problem for which the information structure is not partially nested. Our objective was to find optimal control strategies for these two problems. We assumed that the pattern of information propagation and the cost function enjoy a special property that we call "Substitutability". Since the information structure of these two problems is not partially nested, we could not a priori restrict attention to linear strategies, and this makes the problem of finding optimal control strategies hard. We showed that the substitutability property allows us to circumvent the difficulties arising from the non-partially nested structure. For our non-partially nested LQG dynamic team problem, we showed that under certain conditions linear strategies are optimal. For our non-partially nested decentralized LQG control problem, we showed that the state structure can be exploited to obtain optimal control strategies with recursively updatable sufficient statistics.

These results suggest that substitutability can serve as a counterpart of the information structure requirements that enable simplification of dynamic teams and decentralized control problems. The substitutability assumption required for our result is a condition placed on the information structure of the problem and on the parameters of the cost and information functions. In contrast, the partially nested assumption of [20] places a condition only on the information structure of the problem, and the condition required for the result in [24] is a requirement that the optimal team strategy of the partially nested expansion of the problem must satisfy. Clearly, our result and the results in [20, 24] require conditions of different natures.

8.2 Future Directions

We discuss three main future directions based on the topics studied in this thesis.

8.2.1 Decentralized Control over Unreliable Communication with Output Feedback

For the decentralized networked control system (DNCS) with unreliable communication (and its extensions and related problems) studied in Chapters 3-5, we assumed that the state of the plant(s) is perfectly observed by the controller(s). In practice, however, sensors observe a noisy version of the state of the plant(s). In this case, the observation available to the local controller $C^n$ at time $t$ is
$$Y^n_t = C^n X^n_t + V^n_t, \qquad (8.1)$$
where $V^n_t$ is the noise in the observation of local controller $C^n$.

For this problem, it would be interesting to understand how the noisy observations affect the overall performance of the system and to study the problem of finding optimal control strategies in this scenario. One question is whether the well-known "separation principle" of control theory holds here as well. The separation principle, more formally known as the principle of separation of estimation and control, states that under some assumptions, the problem of designing an optimal feedback controller for a stochastic system can be solved by designing an optimal estimator for the state of the system, which feeds into an optimal controller for the deterministic system. The problem of designing an optimal feedback controller can therefore be broken into two separate parts, which facilitates the design. We would like to investigate whether similar results can be obtained for the output-feedback versions of the problems described in Chapters 3-5.

8.2.2 Stochastic Teams with Randomized Information Structures

In Chapter 1, we briefly discussed team problems and stated the well-known result for problems with partially nested information structure [20]: when the underlying state of the system is Gaussian, the observation functions are linear, and the loss function is quadratic (the partially nested LQG case), affine control strategies are optimal.
The partial nestedness of the information structure of a problem requires that whenever the action of an agent/team member (say $i$) affects the information of another agent (say $j$), then $i$ should be able to communicate perfectly to $j$. This condition is violated in networked control and decentralized networked control problems where agents are connected through unreliable communication. A natural question, therefore, is: are there analogues of partially nested information structures that enable simplification of team problems with unreliable communication among agents?

We have made some progress in this direction by introducing the concept of "randomized information structures", in which the information collection and sharing structure is randomly selected by nature. With this randomization, we extend the two classical information structures, namely the static and the partially nested information structures, to their randomized counterparts: the randomized static (RS) and the randomized partially nested (RPN) information structures. The stochastic nature of RS and RPN allows us to model randomness in data collection and communication, such as network-induced packet dropouts and random delays [57, 153-155], in decentralized systems. Investigating such information structures may allow us to identify a tractable class of team problems with unreliable communication among agents.

8.2.3 Learning in Decentralized Control

In Chapter 6, we studied a decentralized linear-quadratic (LQ) control problem with a partially unknown system. We proposed a multi-agent learning algorithm based on the construction of an auxiliary single-agent LQ problem, which serves as an implicit coordination mechanism among the learning agents. In our proposed Thompson Sampling (TS)-based multi-agent learning algorithm, each agent learns the global system's dynamics independently, and the expected (Bayesian) regret achieved by the algorithm is upper bounded by the expected (Bayesian) regret of the auxiliary single-agent problem under a TS algorithm; consequently, the algorithm achieves an $\tilde{O}(\sqrt{T})$ bound on the expected regret. Two important extensions that one can consider are as follows.

• For the problem we studied in Chapter 6, we considered the expected (Bayesian) regret, which assumes that the unknown parameter of the system belongs to a set of possible values with a prior probability distribution over this set. Another related and more common notion of regret is frequentist regret, which assumes that the unknown parameter of the system is fixed. Since this notion of regret does not require the existence of a prior distribution, it is better suited to practical situations. A natural question, therefore, is: is it possible to design an efficient online learning algorithm similar to the one proposed in Chapter 6 with provable guarantees for the frequentist regret?

• For the problem we studied in Chapter 6, we assumed that only the global system is unknown and all agents receive feedback about the state of this system. For this pattern of information sharing, known as broadcast-out, our approach was to construct an auxiliary single-agent (centralized) LQ control problem and use it for the regret analysis of our multi-agent decentralized LQ control problem.
An interesting question is whether our approach can be applied to a wider class of MARL problems with different information sharing patterns.

Appendices

A Decentralized Control over Unreliable Communication – Finite horizon

In the Appendices, we use $\{X^m\}$ to denote $\{X^m\}_{m\in\mathcal{A}}$ when the set $\mathcal{A}$ is clear from the context.

A.1 Preliminary Results

In this section, we state and prove a set of claims which are useful in proving the main results of Chapter 3.

Claim .1. Let $\mathcal{F}^0$, $\mathcal{F}^{1:N}$ and $\mathcal{G}^{1:N}$ be $\sigma$-algebras such that $\mathcal{F}^{1:N}$ are conditionally independent given $\mathcal{F}^0$, and $\mathcal{G}^n\subset\mathcal{F}^n$, $n\in\mathcal{N}$. Then, for $A^n\in\mathcal{F}^n$, $n\in\mathcal{N}$,
$$\mathbb{P}\big(\{A^n\}\mid\mathcal{F}^0,\{\mathcal{G}^m\}\big) = \prod_{n\in\mathcal{N}}\mathbb{P}\big(A^n\mid\mathcal{F}^0,\{\mathcal{G}^m\}\big). \qquad (2)$$

Proof. Showing the correctness of (2) is the same as showing
$$\mathbb{E}\Big[\prod_{n\in\mathcal{N}}\mathbb{1}_{A^n}\,\Big|\,\mathcal{F}^0,\{\mathcal{G}^m\}\Big] = \prod_{n\in\mathcal{N}}\mathbb{E}\big[\mathbb{1}_{A^n}\mid\mathcal{F}^0,\{\mathcal{G}^m\}\big]. \qquad (3)$$
The left hand side of (3) can be written as
$$\mathbb{E}\Big[\prod_{n\in\mathcal{N}}\mathbb{1}_{A^n}\,\Big|\,\mathcal{F}^0,\{\mathcal{G}^m\}\Big] = \mathbb{E}\Big[\mathbb{E}\Big[\prod_{n\in\mathcal{N}}\mathbb{1}_{A^n}\,\Big|\,\mathcal{F}^0,\mathcal{G}^k,\{\mathcal{F}^m\}_{m\neq k}\Big]\,\Big|\,\mathcal{F}^0,\{\mathcal{G}^m\}\Big] = \mathbb{E}\Big[\prod_{n\neq k}\mathbb{1}_{A^n}\,\mathbb{E}\big[\mathbb{1}_{A^k}\mid\mathcal{F}^0,\mathcal{G}^k,\{\mathcal{F}^m\}_{m\neq k}\big]\,\Big|\,\mathcal{F}^0,\{\mathcal{G}^m\}\Big] = \mathbb{E}\Big[\prod_{n\neq k}\mathbb{1}_{A^n}\,\mathbb{E}\big[\mathbb{1}_{A^k}\mid\mathcal{F}^0,\mathcal{G}^k\big]\,\Big|\,\mathcal{F}^0,\{\mathcal{G}^m\}\Big] = \mathbb{E}\Big[\prod_{n\neq k}\mathbb{1}_{A^n}\,\Big|\,\mathcal{F}^0,\{\mathcal{G}^m\}\Big]\,\mathbb{E}\big[\mathbb{1}_{A^k}\mid\mathcal{F}^0,\mathcal{G}^k\big], \qquad (4)$$
where the first equality holds by the tower property of conditional expectation; the second equality holds by the "pulling out known factors" property; the third equality is obtained by first using the chain rule property to show that $\mathcal{F}^k$ is conditionally independent of $\{\mathcal{F}^m\}_{m\neq k}$ given $\mathcal{F}^0,\mathcal{G}^k$ and then using Doob's conditional independence property [156, Chapter 5]; and the fourth equality again holds by the "pulling out known factors" property. Repeating the procedure of (4) for each $k\in\mathcal{N}$ one by one, we get
$$\mathbb{E}\Big[\prod_{n\neq k}\mathbb{1}_{A^n}\,\Big|\,\mathcal{F}^0,\{\mathcal{G}^m\}\Big]\,\mathbb{E}\big[\mathbb{1}_{A^k}\mid\mathcal{F}^0,\mathcal{G}^k\big] = \prod_{k\in\mathcal{N}}\mathbb{E}\big[\mathbb{1}_{A^k}\mid\mathcal{F}^0,\mathcal{G}^k\big] = \prod_{n\in\mathcal{N}}\mathbb{E}\big[\mathbb{1}_{A^n}\mid\mathcal{F}^0,\{\mathcal{G}^m\}\big], \qquad (5)$$
where the last equality holds by the chain rule property and Doob's conditional independence property.

Claim .2. 1. Consider feasible strategies $g = g^{0:N}$, $g^n\in\mathcal{G}^n$, $n\in\mathcal{N}$, in Problem 3.3. Then, the random vectors $X^n_{0:t}$ are conditionally independent of $X^m_{0:t}$ for $n,m\in\mathcal{N}$, $n\neq m$, given $H^0_t$. That is, for any measurable sets $E^n_{0:t}\subset\prod_{s=0}^t\mathbb{R}^{d^n_X}$, $n\in\mathcal{N}$,
$$\mathbb{P}^g\big(\{X^n_{0:t}\in E^n_{0:t}\}\mid H^0_t\big) = \prod_{n\in\mathcal{N}}\mathbb{P}^g\big(X^n_{0:t}\in E^n_{0:t}\mid H^0_t\big). \qquad (6)$$
2. The same result holds under any feasible fixed prescription strategy $\phi^{prs}\in\Phi^{prs}$ in Problem 3.4.

Proof. We prove (6) by induction. At time 0, (6) is true because the random vectors $X^{1:N}_0$ are independent. Suppose (6) is true at time $t$. At time $t+1$, for all $n\in\mathcal{N}$, define
$$\tilde f^n_t(X^n_{0:t},W^n_t,H^0_t) = X^n_{t+1} = A^{nn}X^n_t + B^{nn}g^n_t(X^n_{0:t},H^0_t,U^n_{0:t-1}) + B^{n0}g^0_t(H^0_t) + W^n_t = A^{nn}X^n_t + B^{nn}\check g^n_t(X^n_{0:t},H^0_t) + B^{n0}g^0_t(H^0_t) + W^n_t, \qquad (7)$$
where $\check g^n_t$ is obtained from $g^n_{0:t}$ by recursively substituting for $U^n_{0:t-1}$. Then, the left hand side of (6) at time $t+1$ becomes
$$\mathbb{P}^g\big(\{X^n_{0:t+1}\in E^n_{0:t+1}\}\mid H^0_{t+1}\big) = \mathbb{P}^g\big(\{X^n_{0:t+1}\in E^n_{0:t+1}\}\mid H^0_t,\{\Gamma^n_{t+1},\Gamma^n_{t+1}X^n_{t+1}\}\big) = \mathbb{P}^g\big(\{\tilde f^n_t(X^n_{0:t},W^n_t,H^0_t)\in E^n_{t+1},\ X^n_{0:t}\in E^n_{0:t}\}\mid H^0_t,\{\Gamma^n_{t+1},\Gamma^n_{t+1}\tilde f^n_t(X^n_{0:t},W^n_t,H^0_t)\}\big). \qquad (8)$$
Note that 1) $\Gamma^{1:N}_{t+1}, W^{1:N}_t$ are independent of all other variables at time $t$, and 2) $X^{1:N}_{0:t}$ are conditionally independent given $H^0_t$ by the induction hypothesis.
Hence, if we define $\mathcal{F}^0 = \sigma(H^0_t,\{\Gamma^n_{t+1}\})$, $\mathcal{F}^n = \sigma(X^n_{0:t},W^n_t,H^0_t,\Gamma^n_{t+1})$, and $\mathcal{G}^n = \sigma\big(\Gamma^n_{t+1}\tilde f^n_t(X^n_{0:t},W^n_t,H^0_t)\big)$, then it can be shown that $\mathcal{F}^{1:N}$ are conditionally independent given $\mathcal{F}^0$. Then, using Claim .1, we can write
$$\mathbb{P}^g\big(\{\tilde f^n_t(X^n_{0:t},W^n_t,H^0_t)\in E^n_{t+1},\ X^n_{0:t}\in E^n_{0:t}\}\mid H^0_t,\{\Gamma^n_{t+1},\Gamma^n_{t+1}\tilde f^n_t(X^n_{0:t},W^n_t,H^0_t)\}\big) = \prod_{n\in\mathcal{N}}\mathbb{P}^g\big(\tilde f^n_t(X^n_{0:t},W^n_t,H^0_t)\in E^n_{t+1},\ X^n_{0:t}\in E^n_{0:t}\mid H^0_t,\{\Gamma^n_{t+1},\Gamma^n_{t+1}\tilde f^n_t(X^n_{0:t},W^n_t,H^0_t)\}\big) = \prod_{n\in\mathcal{N}}\mathbb{P}^g\big(X^n_{0:t+1}\in E^n_{0:t+1}\mid H^0_{t+1}\big). \qquad (9)$$
Therefore, (6) is true at time $t+1$ and the proof of the first part is complete. The second part can be proved in a similar way.

Corollary .1. For any feasible prescription strategy $\phi^{prs}\in\Phi^{prs}$ in Problem 3.4 (respectively, $g = g^{0:N}$, $g^n\in\mathcal{G}^n$, $n\in\mathcal{N}$, in Problem 3.3), $(X^n_{t+1},Z^n_{t+1})$ is conditionally independent of $\{Z^m_{t+1}\}_{m\neq n}$ given any realization of $H^0_t$.

Claim .3. 1) For any constant vector $x\in\prod_{n=1}^N\mathbb{R}^{d^n_X}$,
$$\min_{\{u^n\in\mathbb{R}^{d^n_U}\}_{n\in\mathcal{N}}} QF\big(G_t,\mathrm{vec}(x,\{u^n\}_{n\in\mathcal{N}})\big) = QF(P_t,x),$$
where $P_t := G^{XX}_t - G^{XU}_t(G^{UU}_t)^{-1}G^{UX}_t$ is the Schur complement of $G^{UU}_t$ in
$$G_t = \begin{bmatrix} G^{XX}_t & G^{XU}_t\\ G^{UX}_t & G^{UU}_t\end{bmatrix},$$
and the optimal solution is given by $\mathrm{vec}(\{u^{n*}\}_{n\in\mathcal{N}}) = -(G^{UU}_t)^{-1}G^{UX}_t\,x$.

2) For any $\theta^{1:N}\in\prod_{m=1}^N\Delta(\mathbb{R}^{d^m_X})$, let $X^{\theta^1},\dots,X^{\theta^N}$ be independent random variables such that $X^{\theta^n}$ has distribution $\theta^n$, $n\in\mathcal{N}$. Then
$$\min_{\{q^n\in\mathcal{Q}^n(\theta^n)\}_{n\in\mathcal{N}}} \mathrm{tr}\Big(G_t\,\mathrm{cov}\big(\mathrm{vec}(\{X^{\theta^n}\}_{n\in\mathcal{N}},0,\{q^n(X^{\theta^n})\}_{n\in\mathcal{N}})\big)\Big) = \sum_{n=1}^N\mathrm{tr}\big(P^{nn}_t\,\mathrm{cov}(X^{\theta^n})\big), \qquad (10)$$
where
$$P^{nn}_t := G^{X^nX^n}_t - G^{X^nU^n}_t(G^{U^nU^n}_t)^{-1}G^{U^nX^n}_t, \qquad (11)$$
and the optimal solution for $n\in\mathcal{N}$ is given by
$$q^{n*}(X^{\theta^n}) = -(G^{U^nU^n}_t)^{-1}G^{U^nX^n}_t\big(X^{\theta^n}-\mu(\theta^n)\big). \qquad (12)$$

Proof. The first part of Claim .3 can be obtained by a simple completing-the-square argument. Now consider the functional optimization problem (10) in the second part of Claim .3. Using properties of trace and covariance matrices, we can write
$$\mathrm{tr}\Big(G_t\,\mathrm{cov}\big(\mathrm{vec}(\{X^{\theta^n}\},0,\{q^n(X^{\theta^n})\})\big)\Big) = \mathbb{E}\Big[QF\Big(G_t,\ \mathrm{vec}(\{X^{\theta^n}\},0,\{q^n(X^{\theta^n})\}) - \mathbb{E}\big[\mathrm{vec}(\{X^{\theta^n}\},0,\{q^n(X^{\theta^n})\})\big]\Big)\Big] = \mathbb{E}\Big[QF\Big(G_t,\ \mathrm{vec}(\{X^{\theta^n}-\mu(\theta^n)\},0,\{q^n(X^{\theta^n})\})\Big)\Big] = \sum_{n\in\mathcal{N}}\mathbb{E}\Big[QF\Big(\tilde G^n_t,\ \mathrm{vec}\big(X^{\theta^n}-\mu(\theta^n),\,q^n(X^{\theta^n})\big)\Big)\Big], \qquad (13)$$
where
$$\tilde G^n_t = \begin{bmatrix} G^{X^nX^n}_t & G^{X^nU^n}_t\\ G^{U^nX^n}_t & G^{U^nU^n}_t\end{bmatrix}.$$
The last equality in (13) is true because all off-diagonal terms are zero: $\mathbb{E}[q^n(X^{\theta^n})] = 0$, $\mathbb{E}[X^{\theta^n}-\mu(\theta^n)] = 0$, and $X^{\theta^n}$ and $X^{\theta^m}$ are independent for all $n\neq m$. Note that each term in (13) depends on only one $q^n$, $n\in\mathcal{N}$. Therefore, the functional optimization problem (10) is equivalent to solving the $N$ optimization problems
$$\min_{q^n\in\mathcal{Q}^n(\theta^n)} \mathbb{E}\Big[QF\Big(\tilde G^n_t,\ \mathrm{vec}\big(X^{\theta^n}-\mu(\theta^n),\,q^n(X^{\theta^n})\big)\Big)\Big]. \qquad (14)$$
Since $\theta^n$ is the distribution of $X^{\theta^n}$, we have
$$\mathbb{E}\Big[QF\Big(\tilde G^n_t,\ \mathrm{vec}\big(X^{\theta^n}-\mu(\theta^n),\,q^n(X^{\theta^n})\big)\Big)\Big] = \int QF\Big(\tilde G^n_t,\ \mathrm{vec}\big(y-\mu(\theta^n),\,q^n(y)\big)\Big)\,\theta^n(dy). \qquad (15)$$
Note that the function inside the integral in (15) is quadratic. As in the first part of Claim .3, for any $y\in\mathbb{R}^{d^n_X}$ we have
$$QF\Big(\tilde G^n_t,\ \mathrm{vec}\big(y-\mu(\theta^n),\,q^n(y)\big)\Big) \geq QF\Big(\tilde G^n_t,\ \mathrm{vec}\big(y-\mu(\theta^n),\,q^{n*}(y)\big)\Big) = QF\big(P^{nn}_t,\ y-\mu(\theta^n)\big),$$
where $P^{nn}_t$ is given by (11) and $q^{n*}$ is the function given by (12). Note that $q^{n*}\in\mathcal{Q}^n(\theta^n)$ because $q^{n*}$ is measurable and
$$\int q^{n*}(x^n)\,\theta^n(dx^n) = \int -(G^{U^nU^n}_t)^{-1}G^{U^nX^n}_t\big(x^n-\mu(\theta^n)\big)\,\theta^n(dx^n) = 0.$$
Thus $q^{n*}$ is the optimal solution of the optimization problem in (14) for each $n\in\mathcal{N}$, and the optimal value is $\mathrm{tr}\big(P^{nn}_t\,\mathrm{cov}(X^{\theta^n})\big)$. Using (13), the optimal value in (10) then becomes $\sum_{n=1}^N\mathrm{tr}\big(P^{nn}_t\,\mathrm{cov}(X^{\theta^n})\big)$.

Remark .1. The functional optimization in part 2 of Claim .3 can be thought of as a static team problem [157] in which players are constrained to use zero-mean strategies.

A.2 Proof of Lemma 3.4

To show that the local controller $C^n$, $n\in\mathcal{N}$, can use only $\hat H^n_t = \{X^n_t, H^0_t\}$ to make its decision at time $t$ without loss of optimality, we proceed using the person-by-person approach. For any fixed feasible strategies $g^0$ of the remote controller and $g^m$, $m\in\mathcal{N}\setminus\{n\}$, of the other local controllers, the problem of finding the optimal strategy of local controller $n$ becomes a centralized problem with state $\hat X_t = \mathrm{vec}(X^{-n}_{0:t-1}, X^{1:N}_t, H^0_t)$. From the theory of centralized control problems with imperfect information [158], we know that we can restrict controller $C^n$'s strategy to be of the form
$$U^{n*}_t = \sigma^n_t\big(\mathbb{P}^{g^0,g^{-n}}(\hat X_t\mid H^n_t)\big). \qquad (16)$$
Then, denoting $\tilde g = (g^0, g^{-n})$, for any measurable sets $F\subset\mathcal{H}^0_t$, $E^m_s\subset\mathbb{R}^{d^m_X}$, $m\in\mathcal{N}\setminus\{n\}$, $s = 0,1,\dots,t-1$, and $E^m_t\subset\mathbb{R}^{d^m_X}$, $m\in\mathcal{N}$,
$$\mathbb{P}^{\tilde g}\big(\hat X_t\in\mathrm{vec}(E^{-n}_{0:t-1},E^{1:N}_t,F)\mid H^n_t\big) = \mathbb{P}^{\tilde g}\big(X^{-n}_{0:t}\in E^{-n}_{0:t},\ X^n_t\in E^n_t,\ H^0_t\in F\mid X^n_{0:t},H^0_t\big) = \mathbb{1}_F(H^0_t)\,\mathbb{P}^{\tilde g}\big(X^{-n}_{0:t}\in E^{-n}_{0:t},\ X^n_t\in E^n_t\mid X^n_{0:t},H^0_t\big) = \mathbb{1}_F(H^0_t)\,\mathbb{P}^{\tilde g}\big(X^n_t\in E^n_t\mid X^n_{0:t},H^0_t\big)\prod_{m\neq n}\mathbb{P}^{\tilde g}\big(X^m_{0:t}\in E^m_{0:t}\mid H^0_t\big) = \mathbb{P}^{\tilde g}\big(\hat X_t\in\mathrm{vec}(E^{-n}_{0:t-1},E^{1:N}_t,F)\mid X^n_t,H^0_t\big), \qquad (17)$$
where the second equality holds by the "pulling out known factors" property, the third equality follows from Claim .2, and the last equality follows from the same reasoning as the first three equalities. Therefore, the local controller $C^n$ can use only $\hat H^n_t = \{X^n_t, H^0_t\}$ to make the optimal decision at time $t$.

An alternative proof based on Markov decision processes (MDPs): To show that the local controller $C^n$, $n\in\mathcal{N}$, can use only $\hat H^n_t = \{X^n_t, H^0_t\}$ to make its decision at time $t$ without loss of optimality, we again proceed using the person-by-person approach. For any fixed feasible strategies $g^0$ of the remote controller and $g^m$, $m\in\mathcal{N}\setminus\{n\}$, of the other local controllers, the problem of finding the optimal strategy of local controller $n$ can be reduced to a Markov decision problem with $(X^n_t, H^0_t)$ as the (perfectly observed) state. In particular, it can be shown that this state evolves in a controlled Markovian fashion with $U^n_t$ as the control action. Moreover, by averaging over $X^{-n}_{0:t}$, the expected cost at time $t$ can be written as a function of this state and the action $U^n_t$. From the theory of centralized control problems with perfect state information [158], we know that we can restrict to control strategies for $C^n$ of the form
$$U^{n*}_t = \sigma^n_t(X^n_t, H^0_t). \qquad (18)$$

A.3 Proof of Lemma 3.7

Note that from (3.75)-(3.79), $[\nu^n_t(\cdot)](E^n): \mathcal{H}^0_t\to\mathbb{R}$ is a measurable function. To show that $[\nu^n_t(H^0_t)](E^n) = \mathbb{P}^{\phi^{prs}_{0:t-1}}(X^n_t\in E^n\mid H^0_t)$, first note that for any $t$,
$$\mathbb{P}^{\phi^{prs}_{0:t-1}}(X^n_t\in E^n\mid H^0_t) = \mathbb{P}^{\phi^{prs}_{0:t-1}}(X^n_t\in E^n\mid H^0_{t-1},\{Z^m_t\}) = \mathbb{P}^{\phi^{prs}_{0:t-1}}(X^n_t\in E^n\mid H^0_{t-1},Z^n_t), \qquad (19)$$
where the second equality holds because of Corollary .1. We now prove by induction that
$$[\nu^n_t(H^0_t)](E^n) = \mathbb{P}^{\phi^{prs}_{0:t-1}}(X^n_t\in E^n\mid H^0_{t-1},Z^n_t). \qquad (20)$$
At time $t = 0$, since $\Gamma^n_0\in\{0,1\}$, consider two cases:

• If $\Gamma^n_0 = 1$,
$$\mathbb{P}(X^n_0\in E^n\mid Z^n_0)\,\mathbb{1}_{\{\Gamma^n_0=1\}} = \mathbb{P}(X^n_0\in E^n\mid X^n_0,\Gamma^n_0)\,\mathbb{1}_{\{\Gamma^n_0=1\}} = \mathbb{P}(X^n_0\in E^n\mid X^n_0)\,\mathbb{1}_{\{\Gamma^n_0=1\}} = \mathbb{1}_{E^n}(X^n_0)\,\mathbb{1}_{\{\Gamma^n_0=1\}}. \qquad (21)$$

• If $\Gamma^n_0 = 0$,
$$\mathbb{P}(X^n_0\in E^n\mid Z^n_0)\,\mathbb{1}_{\{\Gamma^n_0=0\}} = \mathbb{P}(X^n_0\in E^n\mid\Gamma^n_0)\,\mathbb{1}_{\{\Gamma^n_0=0\}} = \mathbb{P}(X^n_0\in E^n)\,\mathbb{1}_{\{\Gamma^n_0=0\}} = \pi_{X^n_0}(E^n)\,\mathbb{1}_{\{\Gamma^n_0=0\}}. \qquad (22)$$

Hence, (20) holds at time 0. Assume that (20) holds at time $t$. This means that $\mathbb{P}^{\phi^{prs}_{0:t-1}}(dx^n_t\mid H^0_t) = [\nu^n_t(H^0_t)](dx^n_t)$, and since $W^n_t$ is independent of all random variables at and before time $t$, we get
$$\mathbb{P}^{\phi^{prs}_{0:t-1}}(dx^n_t\,dw^n_t\mid H^0_t) = [\nu^n_t(H^0_t)](dx^n_t)\,\pi_{W^n_t}(dw^n_t). \qquad (23)$$
At time $t+1$, since $\Gamma^n_{t+1}\in\{0,1\}$, consider two cases:

• If $\Gamma^n_{t+1} = 1$, similarly to (21) we obtain
$$\mathbb{P}^{\phi^{prs}_{0:t}}(X^n_{t+1}\in E^n\mid H^0_t,Z^n_{t+1})\,\mathbb{1}_{\{\Gamma^n_{t+1}=1\}} = \mathbb{1}_{E^n}(X^n_{t+1})\,\mathbb{1}_{\{\Gamma^n_{t+1}=1\}} = [\nu^n_{t+1}(H^0_{t+1})](E^n)\,\mathbb{1}_{\{\Gamma^n_{t+1}=1\}}. \qquad (24)$$

• If $\Gamma^n_{t+1} = 0$,
$$\mathbb{P}^{\phi^{prs}_{0:t}}(X^n_{t+1}\in E^n\mid H^0_t,Z^n_{t+1})\,\mathbb{1}_{\{\Gamma^n_{t+1}=0\}} = \mathbb{P}^{\phi^{prs}_{0:t}}(X^n_{t+1}\in E^n\mid H^0_t,\Gamma^n_{t+1})\,\mathbb{1}_{\{\Gamma^n_{t+1}=0\}} = \mathbb{P}^{\phi^{prs}_{0:t}}\big(f^n_{t+1}(X^n_t,W^n_t,\phi^{prs}_t(H^0_t))\in E^n\mid H^0_t\big)\,\mathbb{1}_{\{\Gamma^n_{t+1}=0\}} = \mathbb{E}^{\phi^{prs}_{0:t}}\big[\mathbb{1}_{E^n}\big(f^n_{t+1}(X^n_t,W^n_t,\phi^{prs}_t(H^0_t))\big)\mid H^0_t\big]\,\mathbb{1}_{\{\Gamma^n_{t+1}=0\}} = \int\!\!\int\mathbb{1}_{E^n}\big(f^n_{t+1}(x^n_t,w^n_t,\phi^{prs}_t(H^0_t))\big)\,\mathbb{P}^{\phi^{prs}_{0:t-1}}(dx^n_t\,dw^n_t\mid H^0_t)\,\mathbb{1}_{\{\Gamma^n_{t+1}=0\}} = \int\!\!\int\mathbb{1}_{E^n}\big(f^n_{t+1}(x^n_t,w^n_t,\phi^{prs}_t(H^0_t))\big)\,[\nu^n_t(H^0_t)](dx^n_t)\,\pi_{W^n_t}(dw^n_t)\,\mathbb{1}_{\{\Gamma^n_{t+1}=0\}} = [\nu^n_{t+1}(H^0_{t+1})](E^n)\,\mathbb{1}_{\{\Gamma^n_{t+1}=0\}}, \qquad (25)$$
where the second equality holds by (3.79) and the fact that $\Gamma^n_{t+1}$ is independent of $X^n_{t+1}$ and $H^0_t$, the fourth equality holds by the disintegration theorem [156], and the fifth equality holds by (23).

Hence, (20) holds at time $t+1$, and the proof of Lemma 3.7 is complete.

A.4 Proof of Theorem 3.4

For any $\phi^{prs}\in\Phi^{prs}$ and any realization $h^0_t\in\mathcal{H}^0_t$, let the realization of the common belief $\Theta^n_t$ be $\theta^n_t = \nu^n_t(h^0_t)$, $n\in\mathcal{N}$, as defined in Lemma 3.7. Suppose the prescription strategy $\phi^{prs*}\in\Phi^{prs}$ achieves the minimum in (3.83) for $\theta^{1:N}_t$, $t = 0,\dots,T$, and let $u^{prs*}_t = (u^{0*}_t,\bar u^{1:N*}_t,q^{1:N*}_t) = \phi^{prs*}_t(h^0_t)$ for any realization $h^0_t\in\mathcal{H}^0_t$. We prove by induction that $V_t\big(\{\nu^n_t(h^0_t)\}\big)$ is a measurable function of $h^0_t$, and that for any $h^0_t\in\mathcal{H}^0_t$ we have
$$\mathbb{E}^{\phi^0_t}\Big[\sum_{s=t}^T c^{prs}_s(X^{1:N}_s,U^{prs}_s)\,\Big|\,h^0_t\Big] = V_t\big(\{\nu^n_t(h^0_t)\}\big) \qquad (26)$$
$$\leq \mathbb{E}^{\phi^{prs}}\Big[\sum_{s=t}^T c^{prs}_s(X^{1:N}_s,U^{prs}_s)\,\Big|\,h^0_t\Big], \qquad (27)$$
where $\phi^0_t = \{\phi^{prs}_{0:t-1},\phi^{prs*}_{t:T}\}$. Note that the above equation at $t = 0$ gives the optimality of $\phi^{prs*}$ for Problem 3.4.

We first consider (26). At $T+1$, (26) is true (all terms are defined to be 0 at $T+1$). Assume $V_{t+1}\big(\{\nu^n_{t+1}(h^0_{t+1})\}\big)$ is a measurable function of $h^0_{t+1}$ and (26) is true at $t+1$. From the tower property of conditional expectation, we have
$$\mathbb{E}^{\phi^0_t}\Big[\sum_{s=t}^T c^{prs}_s(X^{1:N}_s,U^{prs}_s)\,\Big|\,h^0_t\Big] = \mathbb{E}^{\phi^0_t}\big[c^{prs}_t(X^{1:N}_t,U^{prs}_t)\mid h^0_t\big] + \mathbb{E}^{\phi^0_t}\Big[\mathbb{E}^{\phi^0_t}\Big[\sum_{s=t+1}^T c^{prs}_s(X^{1:N}_s,U^{prs}_s)\,\Big|\,H^0_{t+1}\Big]\,\Big|\,h^0_t\Big]. \qquad (28)$$
Note that the first term in (28) is equal to
$$\int c^{prs}_t(x^{1:N}_t,u^{prs*}_t)\prod_{n\in\mathcal{N}}\theta^n_t(dx^n_t) = IC(\theta^{1:N}_t,u^{prs*}_t). \qquad (29)$$
From the induction hypothesis, $V_{t+1}\big(\{\nu^n_{t+1}(h^0_{t+1})\}\big)$ is measurable with respect to $h^0_{t+1}$, and (26) holds at $t+1$.
Since $\nu^n_{t+1}(h^0_{t+1}) = \psi^n_t(\theta^n_t,u^{prs*}_t,z^n_{t+1})$, the second term in (28) can be written as
$$\mathbb{E}^{\phi^0_t}\big[V_{t+1}\big(\{\nu^n_{t+1}(H^0_{t+1})\}\big)\mid h^0_t\big] = \mathbb{E}^{\phi^0_t}\big[V_{t+1}\big(\{\psi^n_t(\theta^n_t,u^{prs*}_t,Z^n_{t+1})\}\big)\mid h^0_t\big] = \sum_{\gamma^1_{t+1}\in\{0,1\}}\cdots\sum_{\gamma^N_{t+1}\in\{0,1\}}\prod_{n=1}^N LS(p^n,\gamma^n_{t+1})\,\mathbb{E}^{\phi^0_t}\big[V_{t+1}\big(\{\psi^n_t(\theta^n_t,u^{prs*}_t,Z^n_{t+1})\}\big)\mid h^0_t,\{\Gamma^n_{t+1}=\gamma^n_{t+1}\}\big], \qquad (30)$$
where (30) follows from the fact that $\mathbb{P}(\Gamma^n_{t+1}=0) = 1-\mathbb{P}(\Gamma^n_{t+1}=1) = p^n$. From Lemma 3.7 we have
$$\psi^n_t(\theta^n_t,u^{prs*}_t,Z^n_{t+1}) = (1-\Gamma^n_{t+1})\,\alpha^{n*}_t + \Gamma^n_{t+1}\,\rho(X^n_{t+1}) = NB(\Gamma^n_{t+1},\alpha^{n*}_t,X^n_{t+1}), \qquad (31)$$
where $\alpha^{n*}_t = \psi^n_t(\theta^n_t,u^{prs*}_t,\varnothing)$. Consequently, each inner term in (30) can be written as
$$\mathbb{E}^{\phi^0_t}\big[V_{t+1}\big(\{NB(\Gamma^n_{t+1},\alpha^{n*}_t,X^n_{t+1})\}\big)\mid h^0_t,\{\Gamma^n_{t+1}=\gamma^n_{t+1}\}\big] = \mathbb{E}^{\phi^0_t}\big[V_{t+1}\big(\{NB(\gamma^n_{t+1},\alpha^{n*}_t,X^n_{t+1})\}\big)\mid h^0_t\big] = \int V_{t+1}\big(\{NB(\gamma^n_{t+1},\alpha^{n*}_t,x^n_{t+1})\}\big)\prod_{n\in\mathcal{N}}\alpha^{n*}_t(dx^n_{t+1}). \qquad (32)$$
The first equality in (32) is true because $X^{1:N}_{t+1}$ are independent of $\Gamma^{1:N}_{t+1}$, and the last equality in (32) follows from Lemma 3.7.

Combining (29), (30), and (32), the right hand side of (28) is $V_t(\theta^{1:N}_t)$ from the definition of the value function (3.83), which is equal to $V_t\big(\{\nu^n_t(h^0_t)\}\big)$. Hence, (26) is true at time $t$. The measurability of $V_t\big(\{\nu^n_t(h^0_t)\}\big)$ with respect to $h^0_t$ follows from the fact that $V_t\big(\{\nu^n_t(h^0_t)\}\big)$ is equal to the conditional expectation $\mathbb{E}^{\phi^0_t}\big[\sum_{s=t}^T c^{prs}_s(X^{1:N}_s,U^{prs}_s)\mid h^0_t\big]$, which is measurable with respect to $h^0_t$.

Now consider (27). At $T+1$, (27) is true (all terms are defined to be 0 at $T+1$). Assume (27) is true at $t+1$. Let $u^{prs}_t = (u^0_t,\bar u^{1:N}_t,q^{1:N}_t) = \phi^{prs}_t(h^0_t)$. Following an argument similar to that of (28)-(32),
$$\mathbb{E}^{\phi^{prs}}\Big[\sum_{s=t}^T c_s(X^{1:N}_s,U^{0:N}_s)\,\Big|\,h^0_t\Big] \geq IC(\theta^{1:N}_t,u^{prs}_t) + \sum_{\gamma^1_{t+1}\in\{0,1\}}\cdots\sum_{\gamma^N_{t+1}\in\{0,1\}}\prod_{n=1}^N LS(p^n,\gamma^n_{t+1})\int V_{t+1}\big(\{NB(\gamma^n_{t+1},\alpha^n_t,x^n_{t+1})\}\big)\prod_{n\in\mathcal{N}}\alpha^n_t(dx^n_{t+1}) \geq V_t(\theta^{1:N}_t),$$
where the last inequality follows from the definition of the value function (3.83). This completes the proof of the induction step and the proof of the theorem.

A.5 Proof of Theorem 3.5

The proof is by induction. First note that (3.84) is true for $t = T+1$ since $P_{T+1} = 0$, $\tilde P^{nn}_{T+1} = 0$ for all $n\in\mathcal{N}$ and, by definition, $e_{T+1} = 0$. Now, suppose (3.84) is true at $t+1$ and the matrices $P_{t+1}$ and $\tilde P^{nn}_{t+1}$, for all $n\in\mathcal{N}$, are all PSD. Let us compute the right hand side of (3.83) in Theorem 3.4.

We first consider the $V_{t+1}$ term on the right hand side of (3.83). From the induction hypothesis, we have
$$V_{t+1}\big(\{NB(\gamma^n_{t+1},\alpha^n_t,x^n_{t+1})\}\big) = QF\Big(P_{t+1},\ \mathrm{vec}\big(\{\mu(NB(\gamma^n_{t+1},\alpha^n_t,x^n_{t+1}))\}\big)\Big) + \sum_{n\in\mathcal{N}}\mathrm{tr}\Big(\tilde P^{nn}_{t+1}\,\mathrm{cov}\big(NB(\gamma^n_{t+1},\alpha^n_t,x^n_{t+1})\big)\Big) + e_{t+1} = QF\Big(P_{t+1},\ \mathrm{vec}\big(\{\mu(\alpha^n_t)+\gamma^n_{t+1}(x^n_{t+1}-\mu(\alpha^n_t))\}\big)\Big) + \sum_{n\in\mathcal{N}}(1-\gamma^n_{t+1})\,\mathrm{tr}\big(\tilde P^{nn}_{t+1}\,\mathrm{cov}(\alpha^n_t)\big) + e_{t+1}, \qquad (33)$$
where the last equality in (33) is true because $NB(\gamma^n_{t+1},\alpha^n_t,x^n_{t+1}) = (1-\gamma^n_{t+1})\alpha^n_t + \gamma^n_{t+1}\rho(x^n_{t+1})$, $\mu(\rho(x^n_{t+1})) = x^n_{t+1}$, and $\mathrm{cov}(\rho(x^n_{t+1})) = 0$. The first term on the right hand side of (33) can be further decomposed as
$$QF\Big(P_{t+1},\ \mathrm{vec}\big(\{\mu(\alpha^n_t)+\gamma^n_{t+1}(x^n_{t+1}-\mu(\alpha^n_t))\}\big)\Big) = QF\Big(P_{t+1},\ \mathrm{vec}\big(\{\mu(\alpha^n_t)\}\big)\Big) + 2\,\mathrm{vec}\big(\{\mu(\alpha^n_t)\}\big)^\top P_{t+1}\,\mathrm{vec}\big(\{\gamma^n_{t+1}(x^n_{t+1}-\mu(\alpha^n_t))\}\big) + QF\Big(P_{t+1},\ \mathrm{vec}\big(\{\gamma^n_{t+1}(x^n_{t+1}-\mu(\alpha^n_t))\}\big)\Big). \qquad (34)$$
Note that $\int(x^n_{t+1}-\mu(\alpha^n_t))\,\alpha^n_t(dx^n_{t+1}) = 0$ for all $n\in\mathcal{N}$, and $\int\!\int(x^n_{t+1}-\mu(\alpha^n_t))(x^m_{t+1}-\mu(\alpha^m_t))\,\alpha^n_t(dx^n_{t+1})\,\alpha^m_t(dx^m_{t+1}) = 0$ for all $n\neq m$. Consequently, integrating the right hand side of (33) with respect to $\prod_{n\in\mathcal{N}}\alpha^n_t(dx^n_{t+1})$, we get
$$\int V_{t+1}\big(\{NB(\gamma^n_{t+1},\alpha^n_t,x^n_{t+1})\}\big)\prod_{n\in\mathcal{N}}\alpha^n_t(dx^n_{t+1}) = QF\Big(P_{t+1},\ \mathrm{vec}\big(\{\mu(\alpha^n_t)\}\big)\Big) + \sum_{n\in\mathcal{N}}\gamma^n_{t+1}\,\mathrm{tr}\big(P^{nn}_{t+1}\,\mathrm{cov}(\alpha^n_t)\big) + \sum_{n\in\mathcal{N}}(1-\gamma^n_{t+1})\,\mathrm{tr}\big(\tilde P^{nn}_{t+1}\,\mathrm{cov}(\alpha^n_t)\big) + e_{t+1}. \qquad (35)$$
Substituting (35) back into (3.83), the second term (the term after the plus sign) on the right hand side of (3.83) can be written as
$$QF\Big(P_{t+1},\ \mathrm{vec}\big(\{\mu(\alpha^n_t)\}\big)\Big) + \sum_{n\in\mathcal{N}}(1-p^n)\,\mathrm{tr}\big(P^{nn}_{t+1}\,\mathrm{cov}(\alpha^n_t)\big) + \sum_{n\in\mathcal{N}}p^n\,\mathrm{tr}\big(\tilde P^{nn}_{t+1}\,\mathrm{cov}(\alpha^n_t)\big) + e_{t+1}. \qquad (36)$$
Let $S^{\theta_t}_t := \mathrm{vec}\big(\{X^{\theta^n_t}\},u^0_t,\{\bar u^n_t+q^n_t(X^{\theta^n_t})\}\big)$, where $X^{\theta^n_t}$ is a random vector with distribution $\theta^n_t$, and $\{X^{\theta^n_t}\}$ and $W^{1:N}_t$ are independent. Let $Y^{\theta^n_t}_t$ be the random vector defined by
$$Y^{\theta^n_t}_t := [A\ \ B]_{n,:}\,S^{\theta_t}_t + W^n_t = A^{nn}X^{\theta^n_t} + B^{nn}\big(\bar u^n_t+q^n_t(X^{\theta^n_t})\big) + B^{n0}u^0_t + W^n_t.$$
From (3.78) in Lemma 3.7, we know that $Y^{\theta^n_t}_t$ has distribution $\alpha^n_t$ for all $n\in\mathcal{N}$. Then, (36) becomes
$$QF\Big(P_{t+1},\ \mathrm{vec}\big(\{\mathbb{E}[Y^{\theta^n_t}_t]\}\big)\Big) + \sum_{n\in\mathcal{N}}(1-p^n)\,\mathrm{tr}\big(P^{nn}_{t+1}\,\mathrm{cov}(Y^{\theta^n_t}_t)\big) + \sum_{n\in\mathcal{N}}p^n\,\mathrm{tr}\big(\tilde P^{nn}_{t+1}\,\mathrm{cov}(Y^{\theta^n_t}_t)\big) + e_{t+1} = QF\big(L_t,\ \mathbb{E}[S^{\theta_t}_t]\big) + \sum_{n\in\mathcal{N}}\mathrm{tr}\Big(\big((1-p^n)\hat L^{nn}_t+p^n\tilde L^{nn}_t\big)\,\mathrm{cov}(S^{\theta_t}_t)\Big) + \sum_{n\in\mathcal{N}}\mathrm{tr}\Big(\big((1-p^n)P^{nn}_{t+1}+p^n\tilde P^{nn}_{t+1}\big)\,\mathrm{cov}(\pi_{W^n_t})\Big) + e_{t+1} = QF\big(L_t,\ \mathbb{E}[S^{\theta_t}_t]\big) + \sum_{n\in\mathcal{N}}\mathrm{tr}\Big(\big((1-p^n)\hat L^{nn}_t+p^n\tilde L^{nn}_t\big)\,\mathrm{cov}(S^{\theta_t}_t)\Big) + e_t, \qquad (37)$$
where we have defined
$$L_t = [A\ \ B]^\top P_{t+1}\,[A\ \ B], \qquad (38)$$
$$\hat L^{nn}_t = \big([A\ \ B]_{n,:}\big)^\top P^{nn}_{t+1}\,[A\ \ B]_{n,:}, \qquad (39)$$
$$\tilde L^{nn}_t = \big([A\ \ B]_{n,:}\big)^\top \tilde P^{nn}_{t+1}\,[A\ \ B]_{n,:}. \qquad (40)$$
The first equality in (37) is true because $S^{\theta_t}_t$ and $W^{1:N}_t$ are independent, and the second equality in (37) follows from the definition of $e_t$ in (3.85).

Using the random vector $S^{\theta_t}_t$, we can write the first term on the right hand side of (3.83) as
$$\mathbb{E}\Big[QF\Big(\begin{bmatrix} Q & 0\\ 0 & R\end{bmatrix},\ S^{\theta_t}_t\Big)\Big] = QF\Big(\begin{bmatrix} Q & 0\\ 0 & R\end{bmatrix},\ \mathbb{E}[S^{\theta_t}_t]\Big) + \mathrm{tr}\Big(\begin{bmatrix} Q & 0\\ 0 & R\end{bmatrix}\,\mathrm{cov}(S^{\theta_t}_t)\Big). \qquad (41)$$
Now, putting (41) and (37) (that is, the first and second terms of the right hand side of (3.83)) together into the right hand side of (3.83), we get
$$e_t + \min_{\{q^n_t\in\mathcal{Q}^n(\theta^n)\}_n}\ \min_{\{\bar u^n_t\in\mathbb{R}^{d^n_U}\},\,u^0_t\in\mathbb{R}^{d^0_U}}\Big\{QF\big(G_t,\mathbb{E}[S^{\theta_t}_t]\big) + \mathrm{tr}\big(\tilde G_t\,\mathrm{cov}(S^{\theta_t}_t)\big)\Big\}, \qquad (42)$$
where we have defined
$$G_t = \begin{bmatrix} Q & 0\\ 0 & R\end{bmatrix} + L_t, \qquad (43)$$
$$\tilde G_t = \begin{bmatrix} Q & 0\\ 0 & R\end{bmatrix} + \sum_{n\in\mathcal{N}}\big((1-p^n)\hat L^{nn}_t + p^n\tilde L^{nn}_t\big). \qquad (44)$$
Note that $\mathbb{E}[q^n_t(X^{\theta^n})] = 0$ for all $n\in\mathcal{N}$; consequently, $\mathbb{E}[S^{\theta_t}_t] = \mathrm{vec}\big(\{\mu(\theta^n_t)\},u^0_t,\bar u^{1:N}_t\big)$ depends only on $u^0_t,\bar u^{1:N}_t$. Furthermore, $\mathrm{cov}(S^{\theta_t}_t) = \mathrm{cov}\big(\mathrm{vec}(\{X^{\theta^n_t}\},0,\{q^n_t(X^{\theta^n_t})\})\big)$ depends only on the choice of $q^{1:N}_t$. Consequently, the optimization problem in (3.83) can be further simplified to
$$e_t + \min_{u^0_t,\bar u^{1:N}_t} QF\Big(G_t,\ \mathrm{vec}\big(\{\mu(\theta^n_t)\},u^0_t,\bar u^{1:N}_t\big)\Big) + \min_{\{q^n_t\in\mathcal{Q}^n(\theta^n)\}}\mathrm{tr}\Big(\tilde G_t\,\mathrm{cov}\big(\mathrm{vec}(\{X^{\theta^n_t}\},0,\{q^n_t(X^{\theta^n_t})\})\big)\Big). \qquad (45)$$
Now we need to solve the two optimization problems
$$\min_{u^0_t,\bar u^{1:N}_t} QF\Big(G_t,\ \mathrm{vec}\big(\{\mu(\theta^n_t)\},u^0_t,\bar u^{1:N}_t\big)\Big), \qquad (46)$$
$$\min_{\{q^n_t\in\mathcal{Q}^n(\theta^n)\}}\mathrm{tr}\Big(\tilde G_t\,\mathrm{cov}\big(\mathrm{vec}(\{X^{\theta^n_t}\},0,\{q^n_t(X^{\theta^n_t})\})\big)\Big). \qquad (47)$$
We first consider the optimization in (46). According to (43) and (38), we have
$$G_t = \begin{bmatrix} G^{XX}_t & G^{XU}_t\\ G^{UX}_t & G^{UU}_t\end{bmatrix} = \begin{bmatrix} Q & 0\\ 0 & R\end{bmatrix} + \begin{bmatrix} A^\top\\ B^\top\end{bmatrix} P_{t+1}\,[A\ \ B]. \qquad (48)$$
Since $R$ is PD and, further, $P_{t+1}$ is PSD by induction, $G^{UU}_t$ is PD.
It then follows from the first part of Claim .3 that the optimal solution of (46) is given by (3.92) and
$$\min_{u^0_t,\bar u^{1:N}_t} QF\Big(G_t,\ \mathrm{vec}\big(\{\mu(\theta^n_t)\},u^0_t,\bar u^{1:N}_t\big)\Big) = QF\Big(P_t,\ \mathrm{vec}\big(\{\mu(\theta^n_t)\}\big)\Big), \qquad (49)$$
where $P_t = G^{XX}_t - G^{XU}_t(G^{UU}_t)^{-1}G^{UX}_t$ and $K_t = -(G^{UU}_t)^{-1}G^{UX}_t$. From (48), it is straightforward to see that $P_t = \Omega(P_{t+1},Q,R,A,B)$ and $K_t = \Psi(P_{t+1},R,A,B)$. Furthermore, since $P_{t+1}$ is PSD, according to (48), $G_t$ is PSD. Then $P_t$, and consequently $P^{nn}_t$ for all $n\in\mathcal{N}$, are PSD because $P_t$ is the Schur complement of $G^{UU}_t$ in the matrix $G_t$.

Now, we consider the optimization in (47). We define the matrix $\tilde G^n_t$ as follows:
$$\tilde G^n_t := \begin{bmatrix} \tilde G^{X^nX^n}_t & \tilde G^{X^nU^n}_t\\ \tilde G^{U^nX^n}_t & \tilde G^{U^nU^n}_t\end{bmatrix} = \begin{bmatrix} Q^{nn} & 0\\ 0 & R^{nn}\end{bmatrix} + \begin{bmatrix} (A^{nn})^\top\\ (B^{nn})^\top\end{bmatrix}\big((1-p^n)P^{nn}_{t+1}+p^n\tilde P^{nn}_{t+1}\big)\,[A^{nn}\ \ B^{nn}], \qquad (50)$$
where the last equality follows from (39), (40), and (44). Since $R^{nn}$ is PD and, further, $P^{nn}_{t+1}$ and $\tilde P^{nn}_{t+1}$ are PSD by induction, $\tilde G^{U^nU^n}_t$ is PD. Then, the second part of Claim .3 implies that the optimal solution of (47) is given by (3.93) and
$$\min_{\{q^n_t\in\mathcal{Q}^n(\theta^n)\}}\mathrm{tr}\Big(\tilde G_t\,\mathrm{cov}\big(\mathrm{vec}(\{X^{\theta^n_t}\},0,\{q^n_t(X^{\theta^n_t})\})\big)\Big) = \sum_{n=1}^N\mathrm{tr}\big(\tilde P^{nn}_t\,\mathrm{cov}(\theta^n_t)\big), \qquad (51)$$
where $\tilde P^{nn}_t = \tilde G^{X^nX^n}_t - \tilde G^{X^nU^n}_t(\tilde G^{U^nU^n}_t)^{-1}\tilde G^{U^nX^n}_t$ and $\tilde K^n_t = -(\tilde G^{U^nU^n}_t)^{-1}\tilde G^{U^nX^n}_t$. From (50), it is straightforward to see that $\tilde P^{nn}_t = \Omega\big((1-p^n)P^{nn}_{t+1}+p^n\tilde P^{nn}_{t+1},Q^{nn},R^{nn},A^{nn},B^{nn}\big)$ and $\tilde K^n_t = \Psi\big((1-p^n)P^{nn}_{t+1}+p^n\tilde P^{nn}_{t+1},R^{nn},A^{nn},B^{nn}\big)$. Furthermore, since $\tilde P^{nn}_{t+1}$ is PSD, according to (50), $\tilde G^n_t$ is PSD. Then $\tilde P^{nn}_t$ is PSD because $\tilde P^{nn}_t$ is the Schur complement of $\tilde G^{U^nU^n}_t$ in the matrix $\tilde G^n_t$.

Finally, substituting (49) and (51) into (45), we obtain that $V_t$ defined by (3.84) is equal to the right hand side of (3.83). This completes the proof of the induction step and the proof of the theorem.

A.6 Proof of Theorem 3.6

Let $\hat X^n_t$, $n\in\mathcal{N}$, be the estimate (conditional expectation) of $X^n_t$ based on the common information $H^0_t$. Then, for any realization of the marginal common belief $\theta^n_t$, $\hat x^n_t = \mu(\theta^n_t)$ for all $n\in\mathcal{N}$. This, together with Theorems 3.4 and 3.5, gives (3.95). To show (3.96) and (3.97), note that at time $t = 0$, for any $n\in\mathcal{N}$ and for any realization $h^0_t$ of $H^0_t$,
$$\hat x^n_0 = \mu(\theta^n_0) = \int y\,\theta^n_0(dy) = \begin{cases} \int y\,\pi_{X^n_0}(dy) = \mu(\pi_{X^n_0}) = 0 & \text{if } z^n_0 = \varnothing,\\ x^n_0 & \text{if } z^n_0 = x^n_0.\end{cases} \qquad (52)$$
Therefore, (3.96) is true. Furthermore, at time $t+1$ and for any realization $h^0_{t+1}$ of $H^0_{t+1}$, let $\theta^{1:N}_{t+1}$ be the corresponding common beliefs and $u^{prs*}_t = \phi^{prs*}_t(h^0_t)$; then
$$\hat x^n_{t+1} = \mu(\theta^n_{t+1}) = \int y\,\big[\psi^n_t(\theta^n_t,u^{prs*}_t,z^n_{t+1})\big](dy).$$
If $z^n_{t+1} = x^n_{t+1}$, then $\hat x^n_{t+1} = x^n_{t+1}$. If $z^n_{t+1} = \varnothing$, then
$$\hat x^n_{t+1} = \int y\,\big[\psi^n_t(\theta^n_t,u^{prs*}_t,\varnothing)\big](dy) = \int y\int\!\!\int\mathbb{1}_{\{y\}}\big(f^n_t(x^n_t,w^n_t,u^{prs*}_t)\big)\,\theta^n_t(dx^n_t)\,\pi_{W^n_t}(dw^n_t)\,(dy) = \int\!\!\int f^n_t(x^n_t,w^n_t,u^{prs*}_t)\,\theta^n_t(dx^n_t)\,\pi_{W^n_t}(dw^n_t) = \big([A]_{n,:}+[B]_{n,:}K_t\big)\hat x_t, \qquad (53)$$
where the third equality is true because $\int y\,\mathbb{1}_{\{y\}}\big(f^n_t(x^n_t,w^n_t,u^{prs*}_t)\big)\,dy = f^n_t(x^n_t,w^n_t,u^{prs*}_t)$, and the last equality is true because $q^n_t\in\mathcal{Q}^n(\theta^n)$ and $W^n_t$ is a zero-mean random vector. Therefore, (3.97) is true and the proof is complete.
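The estimator characterized by (3.96)-(3.97) is simple to implement: on a successful transmission the common estimate of a local state resets to the received value, and on a dropout it propagates through the optimal closed loop. The following is a minimal sketch under illustrative assumptions, not the thesis's code: A, B, and the gain K_t are the stacked matrices of Theorem 3.5, and blocks (a hypothetical helper argument) gives the row range of each plant inside the stacked state.

```python
import numpy as np

def update_estimate(x_hat, K_t, A, B, z_next, blocks):
    """One step of (3.96)-(3.97): x_hat is the stacked estimate vec(x^1,...,x^N);
    z_next[n] is the state of plant n received at t+1, or None on a dropout."""
    x_hat_next = (A + B @ K_t) @ x_hat          # closed-loop propagation (3.97)
    for n, z in enumerate(z_next):
        if z is not None:                       # packet received: reset block n
            s, e = blocks[n]
            x_hat_next[s:e] = z
    return x_hat_next
```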
B Decentralized Control of Finite State and Jump Linear Systems Connected by Markovian Channels

B.1 Proof of Theorem 4.1

Suppose that functions $V_t$, $t = 0,1,\dots,T$, and a strategy $\lambda^*$ satisfying the conditions of Theorem 4.1 exist. For any strategy $\lambda$, where for each time $t$, $\lambda^0_t\in\Lambda^0$ and $\lambda^n_t\in\hat\Lambda^n$, $n\in\mathcal{N}$, and for any $h^0_t\in\mathcal{H}^0_t$, let the realization of the common beliefs $\Theta^{1:N}_t$ be $\theta^n_t = \nu^n_t(h^0_t)$, $n\in\mathcal{N}$ (defined by Lemma 4.3).

To show the correctness of Theorem 4.1, note that since $\gamma^{1:N}_t$ and $s^0_t$ are included in $h^0_t$, and $\theta^n_t = \nu^n_t(h^0_t)$, $n\in\mathcal{N}$, are functions of $h^0_t$, the value function $V_t(\gamma^{1:N}_t,s^0_t,\theta^{1:N}_t)$ can be equivalently written as a function of $h^0_t$ alone, that is,
$$\tilde V_t(h^0_t) := V_t(\gamma^{1:N}_t,s^0_t,\theta^{1:N}_t). \qquad (54)$$
Now, we prove by induction that $\tilde V_t(h^0_t)$ is a measurable function of $h^0_t$, and that for any $h^0_t\in\mathcal{H}^0_t$ we have
$$\mathbb{E}^{\lambda^0_t}\Big[\sum_{s=t}^T c_s(S^{0:N}_s,U^{0:N}_s)\,\Big|\,h^0_t\Big] = \tilde V_t(h^0_t) \qquad (55)$$
$$\leq \mathbb{E}^{\lambda}\Big[\sum_{s=t}^T c_s(S^{0:N}_s,U^{0:N}_s)\,\Big|\,h^0_t\Big], \qquad (56)$$
where $\lambda^0_t = \{\lambda_{0:t-1},\lambda^*_{t:T}\}$. Note that the above equation at $t = 0$ gives the optimality of $\lambda^*$ for Problem 4.1.

We first consider (55). At $T+1$, (55) is true (all terms are defined to be 0 at $T+1$). Assume $\tilde V_{t+1}(h^0_{t+1})$ is a measurable function of $h^0_{t+1}$ and (55) is true at $t+1$ (induction hypothesis). Then, from the induction hypothesis and the tower property of conditional expectation, we have
$$\mathbb{E}^{\lambda^0_t}\Big[\sum_{s=t}^T c_s(S^{0:N}_s,U^{0:N}_s)\,\Big|\,h^0_t\Big] = \mathbb{E}^{\lambda^0_t}\big[c_t(S^{0:N}_t,U^{0:N}_t)\mid h^0_t\big] + \mathbb{E}^{\lambda^0_t}\big[\tilde V_{t+1}(H^0_{t+1})\mid h^0_t\big]. \qquad (57)$$
The first term on the right hand side of (57) is equal to
$$\mathbb{E}^{\lambda^0_t}\big[c_t(S^{0:N}_t,U^{0:N}_t)\mid h^0_t\big] = \mathbb{E}^{\lambda^0_t}\big[c_t\big(s^0_t,S^{1:N}_t,u^{0*}_t,\{\pi^{n*}_t(S^n_t)\}_{n\in\mathcal{N}}\big)\mid h^0_t\big] = \mathbb{E}\big[c_t\big(s^0_t,\bar S^{1:N}_t,u^{0*}_t,\{\pi^{n*}_t(\bar S^n_t)\}_{n\in\mathcal{N}}\big)\big], \qquad (58)$$
where the last equality is correct because, from Lemma 4.3, given $h^0_t$, $S^n_t$ has distribution $\theta^n_t$, and we have defined $\bar S^n_t$ to be a random vector distributed according to $\theta^n_t$. For the second term on the right hand side of (57), from (54) we have
$$\mathbb{E}^{\lambda^0_t}\big[\tilde V_{t+1}(H^0_{t+1})\mid h^0_t\big] = \mathbb{E}^{\lambda^0_t}\big[V_{t+1}(\Gamma^{1:N}_{t+1},S^0_{t+1},\Theta^{1:N}_{t+1})\mid h^0_t\big]. \qquad (59)$$
Next, we show that the relevant part of the information $h^0_t$ is $i_t := \{\gamma^{1:N}_t,s^0_t,\theta^{1:N}_t,u^{0*}_t,\pi^{1:N*}_t\}$. To this end, note that:

• $\Gamma^n_{t+1}$ is described by $\gamma^n_t$ and the transition probability $p^n_t$.
• From (4.1), $S^0_{t+1}$ is described by $s^0_t$, $u^{0*}_t$, $V^0_t$.
• From Lemma 4.3, $\Theta^n_{t+1}$ is described by $\theta^n_t$, $s^0_t$, $u^{0*}_t$, $\pi^{n*}_t$, and $Z^n_{t+1}$. From (4.3), $Z^n_{t+1}$ is a function of $\Gamma^n_{t+1}$ and $S^n_{t+1}$. From (4.2), $S^n_{t+1}$ is described by $S^n_t$, $s^0_t$, $\pi^{n*}_t(S^n_t)$, $u^{0*}_t$, and $V^n_t$.
• From Lemma 4.3, given $h^0_t$, $S^n_t$ has distribution $\theta^n_t$.
• $V^{0:N}_t$ are independent of the part of the information $h^0_t$ that is not included in $i_t$.

From the above points and (59), the second term on the right hand side of (57) is equal to
$$\mathbb{E}^{\lambda^0_t}\big[\tilde V_{t+1}(H^0_{t+1})\mid h^0_t\big] = \mathbb{E}^{\lambda^0_t}\big[V_{t+1}(\Gamma^{1:N}_{t+1},S^0_{t+1},\Theta^{1:N}_{t+1})\mid h^0_t\big] = \mathbb{E}\big[V_{t+1}(\Gamma^{1:N}_{t+1},S^0_{t+1},\Theta^{1:N}_{t+1})\mid \gamma^{1:N}_t,s^0_t,\theta^{1:N}_t,u^{0*}_t,\pi^{1:N*}_t\big]. \qquad (60)$$
Combining (58) and (60), the right hand side of (57) is $\tilde V_t(h^0_t) = V_t(\gamma^{1:N}_t,s^0_t,\theta^{1:N}_t)$ from the definition of the value function (4.18). Hence, (55) is true at time $t$.
The measurability of $\tilde V_t(h^0_t)$ with respect to $h^0_t$ follows from the fact that $\tilde V_t(h^0_t)$ is equal to the conditional expectation $\mathbb{E}^{\lambda^0_t}\big[\sum_{s=t}^T c_s(S^{0:N}_s,U^{0:N}_s)\mid h^0_t\big]$, which is measurable with respect to $h^0_t$.

Now consider (56). At $T+1$, (56) is true (all terms are defined to be 0 at $T+1$). Assume (56) is true at $t+1$. Let $u^0_t = \lambda^0_t(h^0_t)$ and $u^n_t = \lambda^n_t(h^0_t,S^n_t) = \pi^n_t(S^n_t)$ for any realization $h^0_t\in\mathcal{H}^0_t$. Following an argument similar to that of (57)-(60),
$$\mathbb{E}^{\lambda}\Big[\sum_{s=t}^T c_s(S^{0:N}_s,U^{0:N}_s)\,\Big|\,h^0_t\Big] = \mathbb{E}\big[c_t\big(s^0_t,\bar S^{1:N}_t,u^0_t,\{\pi^n_t(\bar S^n_t)\}_{n\in\mathcal{N}}\big)\big] + \mathbb{E}\big[V_{t+1}(\Gamma^{1:N}_{t+1},S^0_{t+1},\Theta^{1:N}_{t+1})\mid \gamma^{1:N}_t,s^0_t,\theta^{1:N}_t,u^0_t,\pi^{1:N}_t\big] \geq V_t(\gamma^{1:N}_t,s^0_t,\theta^{1:N}_t), \qquad (61)$$
where the last inequality follows from the definition of the value function (4.18). This completes the proof of the induction step and the proof of the theorem.

B.2 Proof of Theorem 4.3

The proof is by induction.

• At time $T+1$: (4.30) is clearly true since $V_{T+1}(\cdot) = 0$, $P_{T+1}(\cdot) = 0$, $\tilde P^n_{T+1}(\cdot,\cdot) = 0$ for $n\in\mathcal{N}$, and $e_{T+1} = 0$.
• At time $t+1$: Suppose (4.30) is true at time $t+1$ and the matrices $P_{t+1}(\gamma)$, $\tilde P^n_{t+1}(\gamma,m)$, for $n\in\mathcal{N}$, $\gamma\in\{0,1\}$, $m\in\mathcal{M}$, are all PSD.
• At time $t$: We now compute the value function at $t$ given by (4.18) in Theorem 4.1 (where we have replaced $s^0_t$ with $(m_t,x^0_t)$ and $\bar S^n_t$ with $\bar X^n_t$). Define $i_t := \{\gamma^{1:N}_t,m_t,x^0_t,\theta^{1:N}_t,u^0_t,\pi^{1:N}_t\}$. We must find $u^0_t,\pi^{1:N}_t$ minimizing $\mathcal{T}$, where
$$\mathcal{T} = \underbrace{\mathbb{E}\big[c_t\big(m_t,x^0_t,\bar X^{1:N}_t,u^0_t,\{\pi^n_t(\bar X^n_t)\}_{n\in\mathcal{N}}\big)\big]}_{\mathcal{T}_1} + \underbrace{\mathbb{E}\big[V_{t+1}\big(\Gamma^{1:N}_{t+1},M_{t+1},X^0_{t+1},\Theta^{1:N}_{t+1}\big)\mid i_t\big]}_{\mathcal{T}_2}, \qquad (62)$$
where $\bar X^n_t$ is a random vector with distribution $\theta^n_t$. In order to find $u^0_t,\pi^{1:N}_t$ minimizing $\mathcal{T}$, we first calculate $\mathcal{T}_1$ and $\mathcal{T}_2$ and use them to calculate $\mathcal{T}$.

Calculating $\mathcal{T}_1$: Let $O_t := \mathrm{vec}\big(x^0_t,\bar X^{1:N}_t,u^0_t,\{\pi^n_t(\bar X^n_t)\}_{n\in\mathcal{N}}\big)$ and $J(m_t) = \begin{bmatrix} Q(m_t) & 0\\ 0 & R(m_t)\end{bmatrix}$. Then, according to (4.27),
$$\mathcal{T}_1 = \mathbb{E}\big[QF(J(m_t),O_t)\big] = QF\big(J(m_t),\mathbb{E}[O_t]\big) + \mathrm{tr}\big(J(m_t)\,\mathrm{cov}(O_t)\big). \qquad (63)$$

Calculating $\mathcal{T}_2$: First note that from Lemma 4.3 (with $s^0_t$ replaced by $(m_t,x^0_t)$),
$$\theta^n_{t+1} = \psi^n_t(\theta^n_t,u^0_t,\pi^{1:N}_t,m_t,x^0_t,z^n_{t+1}) = (1-\gamma^n_{t+1})\,\alpha^n_t + \gamma^n_{t+1}\,\delta(x^n_{t+1}) =: NB(\gamma^n_{t+1},\alpha^n_t,x^n_{t+1}), \qquad (64)$$
where we have defined $\alpha^n_t := \psi^n_t(\theta^n_t,u^0_t,\pi^{1:N}_t,m_t,x^0_t,\varnothing)$ and we use $\delta(x^n_{t+1})$ to denote the Dirac delta distribution at $x^n_{t+1}$. From (62) and (64), $\mathcal{T}_2$ can be written as
$$\mathcal{T}_2 = \sum_{m\in\mathcal{M}}\tilde p_t(m_t,m)\sum_{\gamma\in\{0,1\}^N}\prod_{n\in\mathcal{N}}p^n_t(\gamma^n_t,\gamma^n)\,\mathbb{E}\big[V_{t+1}\big(\gamma^{1:N},m,X^0_{t+1},\{NB(\gamma^n,\alpha^n_t,X^n_{t+1})\}_N\big)\mid i_t\big] = \sum_{m\in\mathcal{M}}\tilde p_t(m_t,m)\sum_{\gamma\in\{0,1\}^N}\prod_{n\in\mathcal{N}}p^n_t(\gamma^n_t,\gamma^n)\int\underbrace{V_{t+1}\big(\gamma^{1:N},m,x^0,\{NB(\gamma^n,\alpha^n_t,x^n)\}_N\big)}_{\mathcal{T}_3}\prod_{n\in\mathcal{N}}\alpha^n_t(dx^n), \qquad (65)$$
where we have defined $\gamma = \mathrm{vec}(\gamma^{1:N})$ and $\alpha^0_t := \psi^0_t(m_t,x^0_t,u^0_t)$ as follows, for any $E^0\in\mathcal{X}^0_t$:
$$[\psi^0_t(m_t,x^0_t,u^0_t)](E^0) := \mathbb{P}^{\lambda^{0:N}_{0:t}}(X^0_{t+1}\in E^0\mid h^0_t) = \int\mathbb{1}_{E^0}\big(A^{00}(m_t)x^0_t + B^{00}(m_t)u^0_t + w^0_t\big)\,\pi_{W^0_t}(dw^0_t). \qquad (66)$$
Now, from the induction hypothesis, we know that $V_{t+1}$ has the structure of (4.30). Hence, $\mathcal{T}_3$ can be written as
$$\mathcal{T}_3 = QF\Big(P_{t+1}(m),\ \mathrm{vec}\big(x^0,\{\mu(NB(\gamma^n,\alpha^n_t,x^n))\}_N\big)\Big) + \sum_{n\in\mathcal{N}}\mathrm{tr}\Big(\tilde P^n_{t+1}(\gamma^n,m)\,\mathrm{cov}\big(NB(\gamma^n,\alpha^n_t,x^n)\big)\Big) + e_{t+1}. \qquad (67)$$
To simplify (67) further, define $\gamma^0 := 1$. Then we can write $x^0 = \mu(\alpha^0_t) + \gamma^0\big(x^0 - \mu(\alpha^0_t)\big)$.
Furthermore, note that:

• $\mu(\delta(x^n)) = x^n$ and hence $\mu\big(NB(\gamma^n,\alpha^n_t,x^n)\big) = \mu(\alpha^n_t) + \gamma^n\big(x^n-\mu(\alpha^n_t)\big)$.
• $\mathrm{cov}\big(NB(1,\alpha^n_t,x^n)\big) = \mathrm{cov}(\delta(x^n)) = 0$ and $\mathrm{cov}\big(NB(0,\alpha^n_t,x^n)\big) = \mathrm{cov}(\alpha^n_t)$.
• $\int\big(x^n-\mu(\alpha^n_t)\big)\,\alpha^n_t(dx^n) = 0$ for all $n\in\mathcal{N}$.
• $\int\!\int\big(x^n-\mu(\alpha^n_t)\big)\big(x^m-\mu(\alpha^m_t)\big)\,\alpha^n_t(dx^n)\,\alpha^m_t(dx^m) = 0$ for all $n\neq m$.

Now, by considering (67) and the above points, $\int\mathcal{T}_3\prod_{n\in\mathcal{N}}\alpha^n_t(dx^n)$ can be simplified as follows:
$$\int\mathcal{T}_3\prod_{n\in\mathcal{N}}\alpha^n_t(dx^n) = QF\Big(P_{t+1}(m),\ \mathrm{vec}\big(\{\mu(\alpha^n_t)\}_N\big)\Big) + \sum_{n\in\mathcal{N}}\gamma^n\,\mathrm{tr}\big(P^n_{t+1}(m)\,\mathrm{cov}(\alpha^n_t)\big) + \sum_{n\in\mathcal{N}}(1-\gamma^n)\,\mathrm{tr}\big(\tilde P^n_{t+1}(0,m)\,\mathrm{cov}(\alpha^n_t)\big) + e_{t+1}. \qquad (68)$$
Substituting (68) back into (65), $\mathcal{T}_2$ can be written as
$$\mathcal{T}_2 = QF\Big(\mathcal{F}(P_{t+1},\tilde p_t,m_t),\ \mathrm{vec}\big(\{\mu(\alpha^n_t)\}_N\big)\Big) + \sum_{n\in\mathcal{N}}\mathrm{tr}\Big(\mathcal{H}(P^n_{t+1},\tilde P^n_{t+1},\tilde p_t,p^n_t,\gamma^n_t,m_t)\,\mathrm{cov}(\alpha^n_t)\Big) + \mathrm{tr}\Big(\mathcal{F}(P_{t+1},\tilde p_t,m_t)\,\mathrm{cov}(\alpha^0_t)\Big) + e_{t+1}, \qquad (69)$$
where the operators $\mathcal{F}$ and $\mathcal{H}$ are as defined in (4.34) and (4.38). Let $Y^n_t$ be the random vector defined, for $n\in\mathcal{N}$, by
$$Y^n_t := \big[A(m_t)\ \ B(m_t)\big]_{n,:}\,O_t + W^n_t. \qquad (70)$$
From (4.16) in Lemma 4.3 and (66), we know that $Y^n_t$ has distribution $\alpha^n_t$ for all $n\in\mathcal{N}$. Let $D(m_t) = [A(m_t)\ \ B(m_t)]$; then, for all $n\in\mathcal{N}$,
$$\mu(\alpha^n_t) = \mathbb{E}[Y^n_t] = [D(m_t)]_{n,:}\,\mathbb{E}[O_t], \qquad (71)$$
and since $O_t$ and $W^{0:N}_t$ are independent, we have $\mathrm{cov}(Y^0_t) = \mathrm{cov}(W^0_t)$ and, for all $n\in\mathcal{N}$,
$$\mathrm{cov}(\alpha^n_t) = \mathrm{cov}(Y^n_t) = [D(m_t)]_{n,:}\,\mathrm{cov}(O_t)\,[D(m_t)]^\top_{n,:} + \mathrm{cov}(W^n_t). \qquad (72)$$
Then, using (71) and (72), (69) can be written as
$$\mathcal{T}_2 = QF\big(\bar F,\ \mathbb{E}[O_t]\big) + \sum_{n\in\mathcal{N}}\mathrm{tr}\big(\bar H^n\,\mathrm{cov}(O_t)\big) + e_t, \qquad (73)$$
where we have defined
$$\bar F = D(m_t)^\top\,\mathcal{F}(P_{t+1},\tilde p_t,m_t)\,D(m_t), \qquad \bar H^n = [D(m_t)]^\top_{n,:}\,\mathcal{H}(P^n_{t+1},\tilde P^n_{t+1},\tilde p_t,p^n_t,\gamma^n_t,m_t)\,[D(m_t)]_{n,:}. \qquad (74)$$

Calculating $\mathcal{T}$: Now that we have calculated $\mathcal{T}_1$ and $\mathcal{T}_2$, we use them to obtain $\mathcal{T}$ as follows:
$$\mathcal{T} = QF\big(G_t(m_t),\ \mathbb{E}[O_t]\big) + \mathrm{tr}\big(\tilde G_t(\gamma^{1:N}_t,m_t)\,\mathrm{cov}(O_t)\big) + e_t, \qquad (75)$$
where we have defined
$$G_t(m_t) = J(m_t) + \bar F, \qquad (76)$$
$$\tilde G_t(\gamma^{1:N}_t,m_t) = J(m_t) + \sum_{n\in\mathcal{N}}\bar H^n. \qquad (77)$$

Finding $u^0_t,\pi^{1:N}_t$ minimizing $\mathcal{T}$: Here, we apply our proposed change of variables, that is, we write $\pi^n_t(\bar X^n) = \bar u^n_t + q^n_t(\bar X^n)$, where $\bar u^n_t = \mathbb{E}[\pi^n_t(\bar X^n)]$ and $q^n_t(\bar X^n) = \pi^n_t(\bar X^n) - \mathbb{E}[\pi^n_t(\bar X^n)]$. Note that $\mathbb{E}[q^n_t(\bar X^n)] = 0$. Applying this change of variables to $O_t = \mathrm{vec}(x^0_t,\{\bar X^n\}_N,u^0_t,\{\pi^n_t(\bar X^n)\}_N)$, we have
$$\mathbb{E}[O_t] = \mathrm{vec}\big(x^0_t,\{\mu(\theta^n_t)\}_N,u^0_t,\{\bar u^n_t\}_N\big), \qquad (78)$$
which depends only on $u^0_t$ and $\bar u^{1:N}_t$. Further, $\mathrm{cov}(O_t)$ depends only on $q^{1:N}_t$. This means that we can write
$$V_t(\gamma^{1:N}_t,m_t,x^0_t,\theta^{1:N}_t) = \min_{u^0_t,\bar u^{1:N}_t,q^{1:N}_t}\mathcal{T} = e_t + \min_{u^0_t,\bar u^{1:N}_t} QF\big(G_t(m_t),\mathbb{E}[O_t]\big) + \min_{q^{1:N}_t}\mathrm{tr}\big(\tilde G_t(\gamma^{1:N}_t,m_t)\,\mathrm{cov}(O_t)\big). \qquad (79)$$
It then follows from part 1 of Claim .3 in Appendix A that
$$\min_{u^0_t,\bar u^{1:N}_t} QF\big(G_t(m_t),\mathbb{E}[O_t]\big) = QF\Big(P_t(m_t),\ \mathrm{vec}\big(x^0_t,\{\mu(\theta^n_t)\}_N\big)\Big), \qquad (80)$$
where the minimizers are $u^{0*}_t = K^0_t(m_t)\,\mathrm{vec}(x^0_t,\{\mu(\theta^n_t)\}_N)$ and $\bar u^{n*}_t = K^n_t(m_t)\,\mathrm{vec}(x^0_t,\{\mu(\theta^n_t)\}_N)$, $n\in\mathcal{N}$. Here the matrices $P_t(m_t)$ and $K_t(m_t)$ are as described in (4.35) and (4.36). Further, part 2 of Claim .3 in Appendix A implies that
$$\min_{q^{1:N}_t}\mathrm{tr}\big(\tilde G_t(\gamma^{1:N}_t,m_t)\,\mathrm{cov}(O_t)\big) = \sum_{n\in\mathcal{N}}\mathrm{tr}\big(\tilde P^n_t(\gamma^n_t,m_t)\,\mathrm{cov}(\theta^n_t)\big), \qquad (81)$$
where the minimizer is $q^{n*}_t(x^n_t) = \tilde K^n_t(\gamma^n_t,m_t)\big(x^n_t-\mu(\theta^n_t)\big)$, $n\in\mathcal{N}$. Further, the matrices $\tilde P^n_t(\gamma^n_t,m_t)$ and $\tilde K^n_t(\gamma^n_t,m_t)$ are as described in (4.39) and (4.40). By substituting (80) and (81) into (79), we obtain (4.30) at $t$. This completes the proof of the induction step.

Furthermore, from Theorem 3.6 we can show that $\mu(\theta^n_t)$ is $\hat x^n_t$ (that is, the conditional expectation of $X^n_t$ given $h^0_t$) and $x^n_t - \mu(\theta^n_t)$ is $x^n_t - \hat x^n_t$. Considering this fact and substituting $\pi^{n*}_t(\cdot)$ back for $\bar u^{n*}_t + q^{n*}_t(\cdot)$, we get (4.43)-(4.44). This completes the proof of the theorem.

C Decentralized Control over Unreliable Communication – Infinite horizon

C.1 Properties of the Operators

Lemma .1. Consider matrices $P,Q,R,A,B$ of appropriate dimensions, with $P,Q$ PSD and $R$ PD. Define $\Phi(P,K,Q,R,A,B) := Q + K^\top R K + (A+BK)^\top P(A+BK)$. Then:

(i)
$$\Omega(P,Q,R,A,B) = \Phi\big(P,\Psi(P,R,A,B),Q,R,A,B\big) = \min_K\Phi(P,K,Q,R,A,B). \qquad (82)$$
Note that the minimization is in the sense of the partial order; that is, the minimum value satisfies $\Omega(P,Q,R,A,B) \preceq \Phi(P,K,Q,R,A,B)$ for all $K$.

(ii) Furthermore, for PSD matrices $Y_1$ and $Y_2$ such that $Y_1 \preceq Y_2$, we have
$$\Omega(Y_1,Q,R,A,B) \preceq \Omega(Y_2,Q,R,A,B). \qquad (83)$$

Proof. The statements of the above lemma can be found in the literature (see, for example, [159, Chapter 2]). We provide a proof for completeness. It can be established by straightforward algebraic manipulation that
$$\Phi(P,K,Q,R,A,B) = \Omega(P,Q,R,A,B) + \big(K-\Psi(P,R,A,B)\big)^\top\mathcal{R}\,\big(K-\Psi(P,R,A,B)\big), \qquad (84)$$
with $\mathcal{R} = R + B^\top P B$. Then (84) implies that $\Phi(P,K,Q,R,A,B)$ is minimized when $K = \Psi(P,R,A,B)$, and the minimum value is $\Omega(P,Q,R,A,B)$. For PSD matrices $Y_1$ and $Y_2$ such that $Y_1\preceq Y_2$, it is straightforward to see that $\Phi(Y_1,K,Q,R,A,B)\preceq\Phi(Y_2,K,Q,R,A,B)$ for any $K$. Hence,
$$\min_K\Phi(Y_1,K,Q,R,A,B) \preceq \min_K\Phi(Y_2,K,Q,R,A,B). \qquad (85)$$
From (85) and (82), it follows that (83) holds.

C.2 Proof of Lemma 5.3

If the matrices $P_t(m)$, $m\in\mathcal{M}$, converge as $t\to-\infty$ to PSD matrices $P_*(m)$, then by continuity the collection of PSD matrices $P_* = \{P_*(0),\dots,P_*(M)\}$ satisfies the DCARE in (5.28). Since the DCARE (5.28) has a PSD solution $P_* = \{P_*(0),\dots,P_*(M)\}$, it follows from [56, Proposition 7] and the SD assumption on the MJLS that this is also a stabilizing solution of the DCARE ([56, Definition 3] and [55, Definition 4.4]). Then the MJLS is SS from the definition of the stabilizing solution.

Conversely, if the MJLS is SS, then under the SD assumption on the MJLS, [55, Corollary A.16] ensures the existence of a stabilizing solution of the DCARE in (5.28). This solution is also the unique PSD solution by [55, Theorem A.15] (taking $X = 0$ in Theorem A.15). Then, from [55, Proposition A.23], the matrices $P_t(m)$, $m\in\mathcal{M}$, converge as $t\to-\infty$ to the PSD matrices $P_*(m)$.

C.3 Proof of Lemma 5.6

Because of Lemma 5.5, $P^0_t = P_t(0)$ and $P^1_t = P_t(1)$, where the matrices $P_t(0),P_t(1)$ are defined by (5.39)-(5.41) for the auxiliary MJLS. Thus, we can focus on the convergence of the matrices $P_t(0),P_t(1)$. To investigate their convergence, we first show that the auxiliary MJLS described by (5.45)-(5.49) is SS if and only if $p^1 < p^1_c$, where $p^1_c$ is the critical threshold given by (5.50).
According to Lemma 5.4, the MJLS is SS if and only if there exist matrices $K(m)$, $m\in\{0,1\}$, such that $\rho(\mathcal{A}_s) < 1$. For the MJLS described by (5.45)-(5.49), we can find $\mathcal{A}_s$ from Lemma 5.4 as
$$\mathcal{A}_s = \begin{bmatrix} A_s(0)\otimes A_s(0) & (1-p^1)\,A_s(1)\otimes A_s(1)\\ 0 & p^1\,A_s(1)\otimes A_s(1)\end{bmatrix}, \qquad (86)$$
where $A_s(m) = A(m) + B(m)K(m)$, $m\in\{0,1\}$. Since the matrix $\mathcal{A}_s$ is upper-triangular, it is Schur stable if and only if all its diagonal blocks are Schur stable. Since $A(0) = A$, $B(0) = B$, and $(A,B)$ is stabilizable by Assumption 5.1, there exists $K(0)$ such that $\rho\big(A_s(0)\otimes A_s(0)\big)$, which equals $\rho(A_s(0))^2$, is less than 1. Therefore, the MJLS is SS if and only if $\rho\big(p^1 A_s(1)\otimes A_s(1)\big) < 1$ for some $K(1)$. Note that $\rho\big(p^1 A_s(1)\otimes A_s(1)\big) = p^1\,\rho(A_s(1))^2$. Therefore, the MJLS is SS if and only if $\frac{1}{\sqrt{p^1}} > \rho(A_s(1))$ for some $K(1)$. Since $A(1) = A$ and $B(1) = [0,\ B^{11}]$, it follows that the MJLS is SS if and only if
$$\frac{1}{\sqrt{p^1}} > \rho\big(A + B^{11}\tilde K(1)\big) \quad\text{for some } \tilde K(1).$$
This condition is equivalent to $p^1 < p^1_c$, where $p^1_c$ is the critical threshold given by (5.50).

Next, we show that the auxiliary MJLS described by (5.45)-(5.49) is SD. To this end, we can follow an argument similar to the one above for establishing SS and use part 2 of Lemma 5.4 to show that the MJLS is SD if and only if there exist matrices $H(0)$ and $H(1)$ such that $\rho\big(A_d(0)\otimes A_d(0)\big) < 1$ and $\rho\big(p^1 A_d(1)\otimes A_d(1)\big) < 1$. Since $A(0) = A(1) = A$, $Q(0) = Q(1) = Q$, and $(A,Q)$ is detectable by Assumption 5.1, such matrices $H(0)$ and $H(1)$ exist. Hence, the MJLS is SD.

Thus, the MJLS of (5.45)-(5.49) is SD for any $p^1$, and it is SS if and only if $p^1 < p^1_c$. It then follows from Lemma 5.3 that the matrices $P_t(m)$, $m\in\{0,1\}$, converge as $t\to-\infty$ to PSD matrices $P_*(m)$, $m\in\{0,1\}$, that satisfy the steady state version of (5.40)-(5.41) (i.e., equations (5.33)-(5.34)) if and only if $p^1 < p^1_c$. This proves the lemma.

C.4 Proof of Lemma 5.7

If $(A,B^{11})$ is reachable, then $\rho(A+B^{11}K)$ can be made zero by a suitable choice of $K$, so $m^1_c = 0$. If $(A,B^{11})$ is not reachable, then there exists a similarity transformation $T$ [160], with $T^\top = T^{-1}$, such that
$$\hat A = TAT^\top = \begin{bmatrix} \hat A_{uc} & 0\\ \hat A_{21} & \hat A_c\end{bmatrix}, \qquad \hat B^{11} = TB^{11} = \begin{bmatrix} 0\\ \hat B^{11}_c\end{bmatrix}, \qquad (87)$$
where the pair $(\hat A_c,\hat B^{11}_c)$ is reachable and all eigenvalues of $\hat A_{uc}$ are unreachable. Now, to show that $m^1_c$ is the largest unreachable mode of $(A,B^{11})$, first note that since $T$ is invertible and $T^\top = T^{-1}$, we have $\rho(TMT^\top) = \rho(M)$ for any matrix $M$. Hence, writing $KT^\top = [\bar K_1\ \ \bar K_2]$ (partitioned conformably with (87)), we can write
$$m^1_c = \min_K\rho(A+B^{11}K) = \min_K\rho(TAT^\top + TB^{11}KT^\top) = \min_K\rho(\hat A + \hat B^{11}KT^\top) = \min_K\rho\begin{bmatrix} \hat A_{uc} & 0\\ \hat A_{21}+\hat B^{11}_c\bar K_1 & \hat A_c+\hat B^{11}_c\bar K_2\end{bmatrix} = \min_K\max\big\{\rho(\hat A_{uc}),\ \rho(\hat A_c+\hat B^{11}_c\bar K_2)\big\} = \max\big\{\rho(\hat A_{uc}),\ \min_K\rho(\hat A_c+\hat B^{11}_c\bar K_2)\big\} = \rho(\hat A_{uc}), \qquad (88)$$
where the fifth equality holds because of the spectral radius property of block lower-triangular matrices; the sixth equality holds because the minimization does not affect $\rho(\hat A_{uc})$; and the last equality holds because the pair $(\hat A_c,\hat B^{11}_c)$ is reachable, so there is a matrix $\bar K_2$ that makes $\rho(\hat A_c+\hat B^{11}_c\bar K_2)$ arbitrarily small, and in particular smaller than $\rho(\hat A_{uc})$. Note that since $T$ is invertible, a $K$ achieving any desired $[\bar K_1\ \ \bar K_2] = KT^\top$ exists; hence $\rho(\hat A_{uc}) > \min_K\rho(\hat A_c+\hat B^{11}_c\bar K_2)$. Therefore, $m^1_c = \rho(\hat A_{uc})$; that is, $m^1_c$ is the largest unreachable mode of $(A,B^{11})$.
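Lemma 5.7 also suggests a direct way to compute the critical threshold numerically. The following is a minimal sketch, not the thesis's code: it finds the largest unreachable mode of $(A, B^{11})$ with the PBH rank test, and the formula $p^1_c = 1/(m^1_c)^2$ is the form implied by the condition $1/\sqrt{p^1} > \rho(A + B^{11}\tilde K(1))$ derived above (the exact expression (5.50) is not reproduced in this excerpt).

```python
import numpy as np

def largest_unreachable_mode(A, B, tol=1e-9):
    """PBH test: an eigenvalue lam of A is unreachable iff rank([A - lam I, B]) < n."""
    n = A.shape[0]
    m_c = 0.0
    for lam in np.linalg.eigvals(A):
        pbh = np.hstack([A - lam * np.eye(n), B])
        if np.linalg.matrix_rank(pbh, tol) < n:   # lam is an unreachable mode
            m_c = max(m_c, abs(lam))
    return m_c                                    # 0.0 when (A, B) is reachable

def critical_drop_probability(A, B11):
    m_c = largest_unreachable_mode(A, B11)
    # p1_c = 1/m_c^2; if m_c <= 1 (in particular if (A, B11) is reachable),
    # every drop probability p1 in [0, 1) is below the threshold.
    return np.inf if m_c <= 1.0 else 1.0 / m_c**2
```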
C.5 Proof of Lemma 5.8, parts 1 and 2

Let $g^*$ denote the strategies described by (5.35)-(5.38). We want to show that for any $g\in\mathcal{G}$, $J_\infty(g)\geq J_\infty(g^*)$, and that $J_\infty(g^*)$ is finite. We will make use of the following claim.

Claim .4. For the strategies $g^*$ described by (5.35)-(5.38), the following equation is true:
$$J_T(g^*) = (T+1)\,\mathrm{tr}(\Lambda_*) - \mathbb{E}^{g^*}[V_{T+1}], \qquad (89)$$
where $J_T(g^*)$ is the finite horizon cost of $g^*$ over a horizon of duration $T$, $\Lambda_* = (1-p^1)P^0_* + p^1 P^1_*$, and for any $t\geq 0$,
$$V_t = \hat X_t^\top P^0_*\hat X_t + \mathrm{tr}\big(P^1_*\,\mathrm{cov}(X_t\mid H^0_t)\big). \qquad (90)$$

Proof. The above claim is a special case of Claim .5 in the proof of Lemma 5.20.

Based on Claim .4, the infinite horizon average cost of $g^*$ is given by
$$J_\infty(g^*) = \limsup_{T\to\infty}\frac{1}{T+1}J_T(g^*) = \mathrm{tr}(\Lambda_*) - \liminf_{T\to\infty}\frac{\mathbb{E}^{g^*}[V_{T+1}]}{T+1} \leq \mathrm{tr}(\Lambda_*), \qquad (91)$$
where the last inequality holds because $V_{T+1}\geq 0$. For $n = 0,1$, define $Y^n_0 = 0$ and, for $k = 0,1,2,\dots$,
$$Y^0_{k+1} = \Omega(Y^0_k,Q,R,A,B), \qquad (92)$$
$$Y^1_{k+1} = \Omega\big((1-p^1)Y^0_k + p^1 Y^1_k,\ Q,R^{11},A,B^{11}\big). \qquad (93)$$
It is easy to check that for $n = 0,1$, $Y^n_k = P^n_{T+1-k}$ for all $k\geq 0$, and that $\lim_{k\to\infty}Y^n_k = \lim_{t\to-\infty}P^n_t = P^n_*$. Further, define $\Lambda_k = (1-p^1)Y^0_k + p^1 Y^1_k$. From (5.18) of Lemma 5.1, we know that the optimal finite horizon cost is given by
$$J^*_T = \sum_{t=0}^T\mathrm{tr}\big((1-p^1)P^0_{t+1}+p^1 P^1_{t+1}\big) = \sum_{k=0}^T\mathrm{tr}\big((1-p^1)P^0_{T+1-k}+p^1 P^1_{T+1-k}\big) = \sum_{k=0}^T\mathrm{tr}\big((1-p^1)Y^0_k+p^1 Y^1_k\big) = \sum_{k=0}^T\mathrm{tr}(\Lambda_k). \qquad (94)$$
We can therefore write
$$\lim_{T\to\infty}\frac{1}{T+1}J^*_T = \lim_{T\to\infty}\frac{1}{T+1}\sum_{k=0}^T\mathrm{tr}(\Lambda_k) = \mathrm{tr}(\Lambda_*), \qquad (95)$$
where the last equality is correct because $\lim_{k\to\infty}Y^n_k = P^n_*$ for $n = 0,1$. Now, for any $g\in\mathcal{G}$,
$$J_\infty(g) = \limsup_{T\to\infty}\frac{1}{T+1}J_T(g) \geq \limsup_{T\to\infty}\frac{1}{T+1}J^*_T = \mathrm{tr}(\Lambda_*) \geq J_\infty(g^*), \qquad (96)$$
where the first inequality is true because, by definition, $J^*_T = \inf_{g'\in\mathcal{G}}J_T(g')\leq J_T(g)$ for any $g\in\mathcal{G}$; the second equality is true because of (95); and the last inequality is true because of (91). Hence, $g^*$ is optimal for Problem 5.2, and the optimal cost is finite and equal to $\mathrm{tr}(\Lambda_*)$.

C.6 Proof of Lemma 5.8, part 3

Let $\tilde X_t := X_t - \hat X_t$ denote the estimation error. It suffices to show that $\hat X_t$ and $\tilde X_t$ are mean square stable. The optimal strategies can be written as
$$U_t = K^0_*\hat X_t + \begin{bmatrix} 0\\ K^1_*\end{bmatrix}\tilde X_t. \qquad (97)$$
Then, from (5.38) we have
$$\tilde X_{t+1} = (1-\Gamma^1_{t+1})\big(A_s(1)\tilde X_t + W_t\big), \qquad (98)$$
where $A_s(1) = A + B^{11}K^1_*$. If $p^1 = 0$ or $1$, the stability result follows from standard linear system theory. If $0 < p^1 < 1$, the estimation error $\tilde X_t$ is an MJLS with an i.i.d. switching process (note that this MJLS is not the same as the auxiliary MJLS constructed in Section 5.4.1). From [55, Theorem 3.33], the estimation error process is mean square stable if the corresponding noiseless system (i.e., with $W_t = 0$) is mean square stable. Because $\Gamma^1_{t+1}$ is an i.i.d. process, from [114, Corollary 2.7], the noiseless system is mean square stable if $p^1\rho\big(A_s(1)\otimes A_s(1)\big) < 1$. Note that the gain matrices $K^0_*,K^1_*$ are obtained from the solution of the DCARE in (5.28) for the SD and SS auxiliary MJLS described by (5.45)-(5.49), so the corresponding MJLS gains (obtained from a steady state version of (5.24)) stabilize the auxiliary MJLS [55, Corollary A.16], [55, Theorem A.15].
That is, the matrix
$$\mathcal{A}_s = \begin{bmatrix} A_s(0)\otimes A_s(0) & (1-p^1)\,A_s(1)\otimes A_s(1)\\ 0 & p^1\,A_s(1)\otimes A_s(1)\end{bmatrix} \qquad (99)$$
has spectral radius less than one (see the proof of Lemma 5.6), where $A_s(0) = A + BK^0_*$. Thus, $\rho(A_s(0)) < 1$ and $p^1\rho\big(A_s(1)\otimes A_s(1)\big) < 1$. Consequently, the estimation error $\tilde X_t$ is mean square stable. Now, note that the estimate evolution can be written as
$$\hat X_{t+1} = A_s(0)\hat X_t + \tilde W_t, \qquad (100)$$
where $\tilde W_t = \Gamma^1_{t+1}\big(A_s(0)\tilde X_t + W_t\big)$ can be viewed as a "noise" process. The process $\tilde W_t$ is mean square stable because $\Gamma^1_{t+1}\leq 1$ and $\tilde X_t$ and $W_t$ are both mean square stable. Since $\rho(A_s(0)) < 1$, we conclude that $\hat X_t$ is mean square stable using standard linear system arguments [161, Theorem 3.4].

C.7 Proof of Lemma 5.9

Consider the matrices $Y^n_k$, $n = 0,1$, $k = 0,1,\dots$, defined by $Y^n_0 = 0$ and the recursions in (92) and (93). Since the matrices $P^n_t$, $n = 0,1$, do not converge as $t\to-\infty$, it follows that the matrices $Y^n_k$, $n = 0,1$, do not converge as $k\to\infty$ (recall that $Y^n_k = P^n_{T+1-k}$ for $n = 0,1$ and $k\geq 0$). Recall from (94) that $J^*_T = \sum_{k=0}^T\mathrm{tr}(\Lambda_k)$, where $\Lambda_k = (1-p^1)Y^0_k + p^1 Y^1_k$. Also, from the first inequality in (96), recall that $J_\infty(g)\geq\limsup_{T\to\infty}\frac{1}{T+1}J^*_T$ for any strategy $g$. Therefore, to show that no strategy can achieve finite cost, it suffices to show that
$$\limsup_{T\to\infty}\frac{1}{T+1}J^*_T = \limsup_{T\to\infty}\frac{1}{T+1}\sum_{k=0}^T\mathrm{tr}(\Lambda_k) = \infty. \qquad (101)$$
To do so, we first show that the sequence $\{Y^n_k,\ k = 0,1,\dots\}$ is monotonically increasing (in the sense of the partial order) for $n = 0,1$. To this end, note that $Y^n_1\succeq Y^n_0 = 0$ for $n\in\{0,1\}$. Furthermore, the monotonicity of the operator $\Omega(\cdot)$ (proved in part (ii) of Lemma .1 in Appendix C.1) implies that for $n = 0,1$ and all $k\geq 0$, $Y^n_{k+1}\succeq Y^n_k$. Now, if the sequences $\{Y^n_k,\ k\geq 0\}$, $n = 0,1$, were bounded, they would converge due to their monotone behavior. This contradicts the fact that these sequences do not converge as $k\to\infty$. Therefore, at least one of the two sequences $\{Y^0_k,\ k\geq 0\}$, $\{Y^1_k,\ k\geq 0\}$ is unbounded. Consequently, the sequence $\{\Lambda_k,\ k\geq 0\}$ is unbounded. Hence, $\limsup_{T\to\infty}\frac{1}{T+1}\sum_{k=0}^T\mathrm{tr}(\Lambda_k) = \infty$. This completes the proof.

C.8 Proof of Lemma 5.17

By comparing (5.104)-(5.105) with (5.113)-(5.114), it is straightforward to observe that $P^0_t = \bar P^0_t$ for all $t$. We now show by induction that at any time $t$, $P^n_t = \bar P^n_t$ for $n\in\mathcal{N}$. First note that, by definition, $P^n_{T+1} = 0$ and $\bar P^n_{T+1} = 0$ for $n\in\mathcal{N}$. Hence, (5.117) is correct at time $T+1$. Now, assume that (5.117) is correct at time $t+1$ (induction hypothesis). Then, from (5.115) and the induction hypothesis, we have for $n\in\mathcal{N}$,
$$\bar P^n_t = \Omega\Big((1-p^n)P^0_{t+1} + p^n\mathcal{L}_{zero}(P^0_{t+1},P^n_{t+1},n{+}1,n{+}1),\ \mathcal{L}_{zero}(Q,Q^{nn},n{+}1,n{+}1),\ \mathcal{L}_{iden}(R,R^{nn},n{+}1),\ \mathcal{L}_{zero}(A,A^{nn},n{+}1,n{+}1),\ \mathcal{L}_{zero}(B,B^{nn},n{+}1,n{+}1)\Big) = \mathcal{L}_{zero}(Q,Q^{nn},n{+}1,n{+}1) + T_1 - T_2(T_3)^{-1}(T_2)^\top, \qquad (102)$$
where
$$T_1 = \mathcal{L}_{zero}(A,A^{nn},n{+}1,n{+}1)^\top\,\bar{\bar P}^n_{t+1}\,\mathcal{L}_{zero}(A,A^{nn},n{+}1,n{+}1), \qquad (103)$$
$$T_2 = \mathcal{L}_{zero}(A,A^{nn},n{+}1,n{+}1)^\top\,\bar{\bar P}^n_{t+1}\,\mathcal{L}_{zero}(B,B^{nn},n{+}1,n{+}1), \qquad (104)$$
$$T_3 = \mathcal{L}_{iden}(R,R^{nn},n{+}1) + \mathcal{L}_{zero}(B,B^{nn},n{+}1,n{+}1)^\top\,\bar{\bar P}^n_{t+1}\,\mathcal{L}_{zero}(B,B^{nn},n{+}1,n{+}1), \qquad (105)$$
and we have defined $\bar{\bar P}^n_{t+1} = (1-p^n)P^0_{t+1} + p^n\mathcal{L}_{zero}(P^0_{t+1},P^n_{t+1},n{+}1,n{+}1)$.
Note that from the definitions of the operators $\mathcal{L}_{zero}$ and $\mathcal{L}_{iden}$ in Chapter 2, it is straightforward to observe that the block dimensions of $T_1,T_2,T_3$ are the same as the block dimensions of $A$, $B$, $B^\top B$, respectively (they are all block matrices of size $(N+1)\times(N+1)$). Therefore, through straightforward algebraic manipulation, we get
$$T_1 = \mathcal{L}_{zero}(A,\tilde T_1,n{+}1,n{+}1), \qquad (106)$$
$$T_2 = \mathcal{L}_{zero}(B,\tilde T_2,n{+}1,n{+}1), \qquad (107)$$
$$T_3 = \mathcal{L}_{iden}(B^\top B,\tilde T_3,n{+}1), \qquad (108)$$
where
$$\tilde T_1 = (A^{nn})^\top\big[(1-p^n)[P^0_{t+1}]_{n+1,n+1}+p^n P^n_{t+1}\big]A^{nn}, \qquad (109)$$
$$\tilde T_2 = (A^{nn})^\top\big[(1-p^n)[P^0_{t+1}]_{n+1,n+1}+p^n P^n_{t+1}\big]B^{nn}, \qquad (110)$$
$$\tilde T_3 = R^{nn} + (B^{nn})^\top\big[(1-p^n)[P^0_{t+1}]_{n+1,n+1}+p^n P^n_{t+1}\big]B^{nn}. \qquad (111)$$
Further, since $T_3$ is block diagonal, we have
$$(T_3)^{-1} = \mathcal{L}_{iden}\big(B^\top B,(\tilde T_3)^{-1},n{+}1\big). \qquad (112)$$
Now, using (106)-(112) and the fact that the matrices $A,Q,BB^\top$ have the same size as the matrix $P_t$ (they are block matrices of size $(N+1)\times(N+1)$), (102) can be simplified to
$$\bar P^n_t = \mathcal{L}_{zero}\big(P^0_t,\ Q^{nn}+\tilde T_1-\tilde T_2(\tilde T_3)^{-1}(\tilde T_2)^\top,\ n{+}1,n{+}1\big) = \mathcal{L}_{zero}(P^0_t,P^n_t,n{+}1,n{+}1), \qquad (113)$$
where the last equality holds because of the definition of $P^n_t$ in (5.106). Hence, (5.117) is true at time $t$. This completes the proof.

C.9 Proof of Lemma 5.19

From Lemma 5.17, we know that the convergence of the matrices $P^0_t,P^n_t$, $n\in\mathcal{N}$, is equivalent to the convergence of the matrices $\bar P^n_t$, $n\in\mathcal{N}$. Further, because of Lemma 5.18, $\bar P^n_t = P_t(n)$, $n\in\mathcal{N}$, where the matrices $P_t(n)$, $n\in\mathcal{N}$, are defined by (5.118)-(5.120) for the auxiliary MJLS. Thus, in order to study the convergence of the matrices $\bar P^n_t$, $n\in\mathcal{N}$, we can focus on the convergence of the matrices $P_t(n)$, $n\in\mathcal{N}$.

To investigate the convergence of $P_t(n)$, $n\in\mathcal{N}$, we first show that the auxiliary MJLS described by (5.121)-(5.125) is SS if and only if $p^n < p^n_c$ for all $n\in\mathcal{N}$. According to part 1 of Lemma 5.4, the MJLS is SS if and only if there exist matrices $K(n)$, $n\in\mathcal{N}$, such that $\rho(\mathcal{A}_s) < 1$. For the MJLS described by (5.121)-(5.125), we can find $\mathcal{A}_s$ from (5.31) as in (114):
$$\mathcal{A}_s = \begin{bmatrix} A_s(0)\otimes A_s(0) & (1-p^1)A_s(1)\otimes A_s(1) & (1-p^2)A_s(2)\otimes A_s(2) & \cdots & (1-p^N)A_s(N)\otimes A_s(N)\\ 0 & p^1 A_s(1)\otimes A_s(1) & 0 & \cdots & 0\\ \vdots & & p^2 A_s(2)\otimes A_s(2) & \ddots & \vdots\\ \vdots & & & \ddots & 0\\ 0 & \cdots & & 0 & p^N A_s(N)\otimes A_s(N)\end{bmatrix} \qquad (114)$$
Since the matrix $\mathcal{A}_s$ is upper-triangular, it is Schur stable if and only if there exist matrices $K(0)$ and $K(n)$, $n\in\mathcal{N}$, such that $\rho\big(A_s(0)\otimes A_s(0)\big) < 1$ and $\rho\big(p^n A_s(n)\otimes A_s(n)\big) < 1$, where $A_s(n) = A(n)+B(n)K(n)$, $n\in\mathcal{N}$, is zero except in its $(n{+}1)$-st block row, whose blocks are
$$[A_s(n)]_{n+1,1:n} = B^{nn}[K(n)]_{n+1,1:n}, \quad [A_s(n)]_{n+1,n+1} = A^{nn}+B^{nn}[K(n)]_{n+1,n+1}, \quad [A_s(n)]_{n+1,n+2:N+1} = B^{nn}[K(n)]_{n+1,n+2:N+1}. \qquad (115)$$
Since $A(0) = A$, $B(0) = B$, and $(A,B)$ is stabilizable by Assumption 5.4, there exists $K(0)$ such that $\rho\big(A_s(0)\otimes A_s(0)\big)$, which equals $\rho(A_s(0))^2$, is less than 1. Therefore, the MJLS is SS if and only if, for each $n\in\mathcal{N}$, $\rho\big(p^n A_s(n)\otimes A_s(n)\big) < 1$ for some $K(n)$. Note that $\rho\big(p^n A_s(n)\otimes A_s(n)\big) = p^n\,\rho(A_s(n))^2$. Therefore, the MJLS is SS if and only if $\frac{1}{\sqrt{p^n}} > \rho(A_s(n))$ for some $K(n)$. Because of the spectral radius property of block matrices, from (115) we have
$$\rho(A_s(n)) = \rho\big(A^{nn}+B^{nn}[K(n)]_{n+1,n+1}\big). \qquad (116)$$
Therefore, the MJLS is SS if and only if, for all $n\in\mathcal{N}$,
$$\frac{1}{\sqrt{p^n}} > \rho\big(A^{nn}+B^{nn}\tilde K(n)\big) \quad\text{for some } \tilde K(n).$$
Next, we can use part 2 of Lemma 5.4 to show that the auxiliary MJLS described by (5.121)-(5.125) is SD if and only if $\mathcal{A}_d$ defined as in (117) is Schur stable:
$$\mathcal{A}_d = \begin{bmatrix} A_d(0)\otimes A_d(0) & (1-p_1)A_d(1)\otimes A_d(1) & (1-p_2)A_d(2)\otimes A_d(2) & \cdots & (1-p_N)A_d(N)\otimes A_d(N) \\ 0 & p_1 A_d(1)\otimes A_d(1) & 0 & \cdots & 0 \\ \vdots & & p_2 A_d(2)\otimes A_d(2) & \ddots & \vdots \\ \vdots & & & \ddots & 0 \\ 0 & \cdots & \cdots & 0 & p_N A_d(N)\otimes A_d(N) \end{bmatrix} \qquad (117)$$
Since the matrix $\mathcal{A}_d$ is block upper-triangular, it is Schur stable if and only if there exist matrices $\mathcal{H}(0)$ and $\mathcal{H}(n)$ for $n \in \mathcal{N}$ such that $\rho\big(A_d(0)\otimes A_d(0)\big) < 1$ and $\rho\big(p_n A_d(n)\otimes A_d(n)\big) < 1$, where $A_d(n)$, $n \in \mathcal{N}$, is defined as in (118), with block rows and columns indexed by $1{:}n$, $n{+}1$, and $n{+}2{:}N{+}1$ (only the $(n{+}1)$-st block column is nonzero):
$$A_d(n) = \mathcal{A}(n) + \mathcal{H}(n)\mathcal{Q}(n)^{1/2} = \begin{bmatrix} 0 & [\mathcal{H}(n)]_{1:n,\,n+1}(Q^{nn})^{1/2} & 0 \\ 0 & A^{nn} + [\mathcal{H}(n)]_{n+1,\,n+1}(Q^{nn})^{1/2} & 0 \\ 0 & [\mathcal{H}(n)]_{n+2:N+1,\,n+1}(Q^{nn})^{1/2} & 0 \end{bmatrix} \qquad (118)$$
The existence of these matrices follows from the detectability of $(A, Q)$ and of $\big(A^{nn}, (Q^{nn})^{1/2}\big)$ for $n \in \mathcal{N}$ (see Assumptions 5.4 and 5.5). Hence, the MJLS is SD.

It then follows from Lemma 5.3 that the matrices $\mathcal{P}_t(n)$, $n \in \mathcal{N}$, converge as $t \to -\infty$ if and only if $p_n < p^n_c$ for all $n \in \mathcal{N}$. Consequently, the matrices $P^0_t, P^n_t$, $n \in \mathcal{N}$, converge as $t \to -\infty$ to matrices $P^0_*, P^n_*$, $n \in \mathcal{N}$, that satisfy the coupled fixed point equations (5.108)-(5.109) if and only if $p_n < p^n_c$ for all $n \in \mathcal{N}$. This proves the lemma.

C.10 Proof of Lemma 5.20

Let $g^*$ denote the strategies defined by (5.110)-(5.112). We want to show that for any $g \in \mathcal{G}$, $J_\infty(g) \ge J_\infty(g^*)$ and that $J_\infty(g^*)$ is finite. We will make use of the following claim.

Claim .5. For the strategies $g^*$ described by (5.110)-(5.112), the following equation is true:
$$J_T(g^*) = (T+1)\Lambda_* - \mathbb{E}^{g^*}[V_{T+1}], \qquad (119)$$
where $J_T(g^*)$ is the finite horizon cost of $g^*$ over a horizon of duration $T$, $\Lambda_* = \operatorname{tr}([P^0_*]_{1,1}) + \sum_{n=1}^{N}\big[(1-p_n)\operatorname{tr}([P^0_*]_{n+1,n+1}) + p_n \operatorname{tr}(P^n_*)\big]$, and for any $t \ge 0$,
$$V_t = \hat{X}_t^{\top} P^0_* \hat{X}_t + \sum_{n=1}^{N} \operatorname{tr}\big(P^n_*\, \operatorname{cov}(X^n_t \mid H^0_t)\big). \qquad (120)$$
Proof. See Appendix C.12 for a proof of Claim .5.

Along the lines of the proof of Lemma 5.8 in Appendix C.5, we define $Y^0_0 = 0$ and, for $k = 0, 1, 2, \ldots$,
$$Y^0_{k+1} = \Omega(Y^0_k, Q, R, A, B), \qquad (121)$$
and for $n = 1, \ldots, N$,
$$Y^n_0 = 0, \qquad (122)$$
$$Y^n_{k+1} = \Omega\big((1-p_n)[Y^0_k]_{n+1,n+1} + p_n Y^n_k,\ Q^{nn}, R^{nn}, A^{nn}, B^{nn}\big). \qquad (123)$$
It is easy to check that for $n = 0, 1, \ldots, N$, $Y^n_k = P^n_{T+1-k}$ for all $k \ge 0$, and that $\lim_{k\to\infty} Y^n_k = \lim_{t\to-\infty} P^n_t = P^n_*$. The rest of the proof for parts 1 and 2 follows the same arguments as in Appendix C.5 for the proof of parts 1 and 2 of Lemma 5.8. For the proof of part 3, define for $n = 1, \ldots, N$, $\bar{K}^n_* := \Psi\big((1-p_n)\bar{P}^0_* + p_n \bar{P}^n_*,\ \mathcal{R}(n), \mathcal{A}(n), \mathcal{B}(n)\big)$, where $\bar{P}^{0:N}_*$ are the limits of $\bar{P}^{0:N}_t$ (see Lemmas 5.17 and 5.19 and the auxiliary MJLS in Section 5.6). Then, it can be shown that (i) $\bar{K}^n_* = \mathcal{L}_{\mathrm{zero}}(K^0_*, K^n_*, n{+}1, n{+}1)$ and hence (ii) $p_n\,\rho\big((A^{nn}+B^{nn}K^n_*)\otimes(A^{nn}+B^{nn}K^n_*)\big) = p_n\,\rho\big((\mathcal{A}(n)+\mathcal{B}(n)\bar{K}^n_*)\otimes(\mathcal{A}(n)+\mathcal{B}(n)\bar{K}^n_*)\big)$, which is less than 1 since the auxiliary MJLS of (5.121)-(5.125) is SD and SS (see the proof of Lemma 5.19). The rest of the proof uses similar arguments as in Appendix C.6 for the proof of part 3 of Lemma 5.8.
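When $p_n < p^n_c$, the fixed points $P^0_*, P^n_*$ and the optimal average cost $\Lambda_*$ of Claim .5 can be computed by simply iterating the recursions (121)-(123) from zero. The sketch below assumes that $\Omega$ is the standard discrete-time Riccati update (only its monotonicity is quoted in this appendix, so the explicit form is an assumption here), and the block-partition bookkeeping is schematic.

```python
import numpy as np

def Omega(P, Q, R, A, B):
    # Standard discrete-time Riccati update (assumed form of Omega).
    S = R + B.T @ P @ B
    return Q + A.T @ P @ A - A.T @ P @ B @ np.linalg.solve(S, B.T @ P @ A)

def block(P, sizes, i, j):
    # Extract block (i, j) of a matrix partitioned by `sizes` (0-indexed,
    # so block (0, 0) corresponds to [P]_{1,1} in the text).
    r = np.cumsum([0] + sizes)
    return P[r[i]:r[i+1], r[j]:r[j+1]]

def fixed_points(A, B, Q, R, subsys, p, iters=500):
    """Iterate (121)-(123) toward the coupled fixed points (5.108)-(5.109).

    `subsys` lists (A^{nn}, B^{nn}, Q^{nn}, R^{nn}) and `p` the drop
    probabilities; returns P^0_*, the P^n_*, and the average cost Lambda_*.
    """
    sizes = [A.shape[0] - sum(a.shape[0] for a, *_ in subsys)] \
            + [a.shape[0] for a, *_ in subsys]
    Y0 = np.zeros_like(A, dtype=float)
    Yn = [np.zeros_like(a, dtype=float) for a, *_ in subsys]
    for _ in range(iters):
        Y0 = Omega(Y0, Q, R, A, B)
        for n, (An, Bn, Qn, Rn) in enumerate(subsys):
            Yn[n] = Omega((1 - p[n]) * block(Y0, sizes, n + 1, n + 1)
                          + p[n] * Yn[n], Qn, Rn, An, Bn)
    lam = np.trace(block(Y0, sizes, 0, 0)) + sum(
        (1 - p[n]) * np.trace(block(Y0, sizes, n + 1, n + 1))
        + p[n] * np.trace(Yn[n]) for n in range(len(subsys)))
    return Y0, Yn, lam

# Hypothetical 2-block example: one remote block, one local subsystem.
A = np.array([[0.9, 0.1], [0.0, 1.1]]); B = np.eye(2)
Q = np.eye(2); R = np.eye(2)
subsys = [(A[1:, 1:], B[1:, 1:], Q[1:, 1:], R[1:, 1:])]
P0, Pn, lam = fixed_points(A, B, Q, R, subsys, p=[0.3])
```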
C.11 Proof of Lemma 5.21

The proof can be obtained by following the arguments in the proof of Lemma 5.9 and defining $\Lambda_k = \operatorname{tr}([Y^0_k]_{1,1}) + \sum_{n=1}^{N}\big[(1-p_n)\operatorname{tr}([Y^0_k]_{n+1,n+1}) + p_n \operatorname{tr}(Y^n_k)\big]$, where $Y^0_k, Y^n_k$ are as defined in Appendix C.10.

C.12 Proof of Claim .5

In order to show that (119) holds, it suffices to show that the following equation is true for all $t \ge 0$:
$$\mathbb{E}^{g^*}[c(X_t, U^*_t) \mid H^0_t] = \Lambda_* + \mathbb{E}^{g^*}[V_t - V_{t+1} \mid H^0_t], \qquad (124)$$
where $U^*_t$ are the control actions at time $t$ under $g^*$. This is because by taking the expectation of (124) and summing it from $t = 0$ to $T$, we obtain
$$J_T(g^*) = (T+1)\Lambda_* - \mathbb{E}^{g^*}[V_{T+1}]. \qquad (125)$$
Now, to show that (124) holds, note that $\mathbb{E}^{g^*}[V_t \mid H^0_t] = V_t$ since $V_t$ is a function of $H^0_t$. Hence, (124) is equivalent to
$$\mathbb{E}^{g^*}[c(X_t, U^*_t) \mid H^0_t] + \mathbb{E}^{g^*}[V_{t+1} \mid H^0_t] = \Lambda_* + V_t. \qquad (126)$$
In the following subsections we will calculate $\mathbb{E}^{g^*}[c(X_t, U^*_t) \mid H^0_t]$ and $\mathbb{E}^{g^*}[V_{t+1} \mid H^0_t]$ and then simplify the left hand side of (126). To do so, we define for $n = 1, \ldots, N$,
$$\hat{X}^n_{t+1|t} := \mathbb{E}[X^n_{t+1} \mid H^0_t], \qquad (127)$$
$$\Sigma^n_{t+1|t} := \operatorname{cov}(X^n_{t+1} \mid H^0_t), \qquad (128)$$
$$\Sigma^n_t := \operatorname{cov}(X^n_t \mid H^0_t), \qquad (129)$$
and recall that $\hat{X}^n_t = \mathbb{E}[X^n_t \mid H^0_t]$.

Calculating $\mathbb{E}^{g^*}[c(X_t, U^*_t) \mid H^0_t]$

Note that
$$\mathbb{E}^{g^*}[c(X_t, U^*_t) \mid H^0_t] = \underbrace{\mathbb{E}^{g^*}[X_t^{\top} Q X_t \mid H^0_t]}_{T_4} + \underbrace{\mathbb{E}^{g^*}[U_t^{*\top} R U^*_t \mid H^0_t]}_{T_5}. \qquad (130)$$
$T_4$ can be written as
$$T_4 = \hat{X}_t^{\top} Q \hat{X}_t + \mathbb{E}\big[(X_t - \hat{X}_t)^{\top} Q (X_t - \hat{X}_t) \mid H^0_t\big] = \hat{X}_t^{\top} Q \hat{X}_t + \sum_{n=1}^{N} \operatorname{tr}(Q^{nn} \Sigma^n_t), \qquad (131)$$
where the second equality is true because according to [162, Lemma 3], $X^n_t$ and $X^m_t$, $m \neq n$, are conditionally independent given $H^0_t$ and because $\hat{X}^0_t = X^0_t$. Similarly, $T_5$ can be written as
$$T_5 = \hat{X}_t^{\top} (K^0_*)^{\top} R K^0_* \hat{X}_t + \sum_{n=1}^{N} \operatorname{tr}\big((K^n_*)^{\top} R^{nn} K^n_* \Sigma^n_t\big). \qquad (132)$$
Thus,
$$\mathbb{E}^{g^*}[c(X_t, U^*_t) \mid H^0_t] = \hat{X}_t^{\top} Q \hat{X}_t + \sum_{n=1}^{N} \operatorname{tr}(Q^{nn} \Sigma^n_t) + \hat{X}_t^{\top} (K^0_*)^{\top} R K^0_* \hat{X}_t + \sum_{n=1}^{N} \operatorname{tr}\big((K^n_*)^{\top} R^{nn} K^n_* \Sigma^n_t\big). \qquad (133)$$

Calculating $\mathbb{E}^{g^*}[V_{t+1} \mid H^0_t]$

From the definition of $V_{t+1}$ (see (120)) we have
$$\mathbb{E}^{g^*}[V_{t+1} \mid H^0_t] = \underbrace{\mathbb{E}^{g^*}\big[\hat{X}_{t+1}^{\top} P^0_* \hat{X}_{t+1} \mid H^0_t\big]}_{T_6} + \underbrace{\mathbb{E}^{g^*}\Big[\textstyle\sum_{n=1}^{N} \operatorname{tr}(P^n_* \Sigma^n_{t+1}) \mid H^0_t\Big]}_{T_7}. \qquad (134)$$
Note that if $\Gamma^n_{t+1} = 1$, then $\hat{X}^n_{t+1} = X^n_{t+1}$ and $\Sigma^n_{t+1} = 0$, and if $\Gamma^n_{t+1} = 0$, then $\hat{X}^n_{t+1} = \hat{X}^n_{t+1|t}$ and $\Sigma^n_{t+1} = \Sigma^n_{t+1|t}$ (if $\Gamma^n_{t+1} = 0$, the remote controller gets no new information about $X^n_{t+1}$; hence, its belief on $X^n_{t+1}$ given $H^0_{t+1}$ remains the same as its belief on $X^n_{t+1}$ given $H^0_t$). Let $\Delta$ be a random block diagonal matrix defined as follows:
$$\Delta := \operatorname{diag}\big(I_{d^0_X},\ \Gamma^1_{t+1} I_{d^1_X},\ \ldots,\ \Gamma^N_{t+1} I_{d^N_X}\big). \qquad (135)$$
Then, we can write
$$\hat{X}_{t+1} = \hat{X}_{t+1|t} + \Delta(X_{t+1} - \hat{X}_{t+1|t}), \qquad (136)$$
$$\Sigma^n_{t+1} = (1 - \Gamma^n_{t+1})\Sigma^n_{t+1|t}, \quad n \in \mathcal{N}. \qquad (137)$$
Now, we use (136) and (137) to calculate the terms $T_6$ and $T_7$ in (134). It can be shown through some straightforward manipulations that
$$T_6 = \hat{X}_{t+1|t}^{\top} P^0_* \hat{X}_{t+1|t} + \operatorname{tr}([P^0_*]_{1,1}) + \sum_{n=1}^{N} (1-p_n)\operatorname{tr}\big([P^0_*]_{n+1,n+1}\Sigma^n_{t+1|t}\big). \qquad (138)$$
Similarly, it can be shown that
$$T_7 = \sum_{n=1}^{N} p_n \operatorname{tr}\big(P^n_* \Sigma^n_{t+1|t}\big). \qquad (139)$$
Combining (134), (138) and (139), we get
$$\mathbb{E}^{g^*}[V_{t+1} \mid H^0_t] = \hat{X}_{t+1|t}^{\top} P^0_* \hat{X}_{t+1|t} + \operatorname{tr}([P^0_*]_{1,1}) + \sum_{n=1}^{N} (1-p_n)\operatorname{tr}\big([P^0_*]_{n+1,n+1}\Sigma^n_{t+1|t}\big) + \sum_{n=1}^{N} p_n \operatorname{tr}\big(P^n_* \Sigma^n_{t+1|t}\big). \qquad (140)$$
Since the right hand side of (140) involves $\hat{X}_{t+1|t}$ and $\Sigma^n_{t+1|t}$, we will now write these in terms of $\hat{X}_t$ and $\Sigma^n_t$. It can be easily established that
$$\hat{X}_{t+1|t} = A_s(0)\hat{X}_t, \qquad (141)$$
$$\Sigma^n_{t+1|t} = I + A_s(n)\Sigma^n_t A_s(n)^{\top}, \qquad (142)$$
where $A_s(0) = A + BK^0_*$ and $A_s(n) = A^{nn} + B^{nn}K^n_*$ for $n = 1, \ldots, N$. Using (141), (142) and the definition of $\Lambda_*$ in (140), we get
$$\mathbb{E}^{g^*}[V_{t+1} \mid H^0_t] = \hat{X}_t^{\top} A_s(0)^{\top} P^0_* A_s(0) \hat{X}_t + \Lambda_* + \sum_{n=1}^{N}\Big[(1-p_n)\operatorname{tr}\big([P^0_*]_{n+1,n+1} A_s(n)\Sigma^n_t A_s(n)^{\top}\big) + p_n \operatorname{tr}\big(P^n_* A_s(n)\Sigma^n_t A_s(n)^{\top}\big)\Big]. \qquad (143)$$

Simplifying the left hand side of (126)

Adding (143) and (133) and grouping together the terms involving $\hat{X}_t$ and those involving $\Sigma^n_t$, we can write
$$\mathbb{E}^{g^*}[c(X_t, U^*_t) \mid H^0_t] + \mathbb{E}^{g^*}[V_{t+1} \mid H^0_t] = \Lambda_* + \hat{X}_t^{\top}\Phi(P^0_*, K^0_*)\hat{X}_t + \sum_{n=1}^{N}\operatorname{tr}\big(\Phi^n\big((1-p_n)[P^0_*]_{n+1,n+1} + p_n P^n_*,\ K^n_*\big)\,\Sigma^n_t\big), \qquad (144)$$
where
$$\Phi(P^0_*, K^0_*) := Q + (K^0_*)^{\top} R K^0_* + (A + BK^0_*)^{\top} P^0_* (A + BK^0_*), \qquad (145)$$
and
$$\Phi^n\big((1-p_n)[P^0_*]_{n+1,n+1} + p_n P^n_*,\ K^n_*\big) := Q^{nn} + (K^n_*)^{\top} R^{nn} K^n_* + (A^{nn} + B^{nn}K^n_*)^{\top}\big((1-p_n)[P^0_*]_{n+1,n+1} + p_n P^n_*\big)(A^{nn} + B^{nn}K^n_*). \qquad (146)$$
Using the fact that $K^0_* = \Psi(P^0_*, R, A, B)$ and that $P^0_* = \Omega(P^0_*, Q, R, A, B)$, it can be shown that
$$\Phi(P^0_*, K^0_*) = P^0_*. \qquad (147)$$
Similarly, using the fact that $K^n_* = \Psi\big((1-p_n)[P^0_*]_{n+1,n+1} + p_n P^n_*,\ R^{nn}, A^{nn}, B^{nn}\big)$ and that $P^n_* = \Omega\big((1-p_n)[P^0_*]_{n+1,n+1} + p_n P^n_*,\ Q^{nn}, R^{nn}, A^{nn}, B^{nn}\big)$, it can be shown that
$$\Phi^n\big((1-p_n)[P^0_*]_{n+1,n+1} + p_n P^n_*,\ K^n_*\big) = P^n_*. \qquad (148)$$
Using (147) and (148) in (144), we get
$$\mathbb{E}^{g^*}[c(X_t, U^*_t) \mid H^0_t] + \mathbb{E}^{g^*}[V_{t+1} \mid H^0_t] = \Lambda_* + \hat{X}_t^{\top} P^0_* \hat{X}_t + \sum_{n=1}^{N}\operatorname{tr}(P^n_* \Sigma^n_t) = \Lambda_* + V_t. \qquad (149)$$
This establishes (126) and hence completes the proof of the claim.
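The estimator logic underlying (136)-(137) and (141)-(142) is simple to state in code: predict with the closed-loop matrix, and on a successful transmission the remote estimate jumps to the true state while the covariance resets to zero. The following Python sketch simulates one subsystem; the matrices, the noise model for the true state, and the drop probability are hypothetical placeholders, not the thesis's numerical example.

```python
import numpy as np

rng = np.random.default_rng(0)

def remote_estimate_step(x_hat, Sigma, As_n, gamma, x_next):
    """One step of the remote controller's estimate of subsystem n.

    Implements the update behind (136)-(137) with the prediction
    equations (141)-(142): gamma = 1 means the packet was received.
    """
    x_pred = As_n @ x_hat                                    # (141)
    Sigma_pred = np.eye(len(x_hat)) + As_n @ Sigma @ As_n.T  # (142)
    if gamma == 1:
        return x_next, np.zeros_like(Sigma)  # estimate jumps to the state
    return x_pred, Sigma_pred                # packet dropped: predict only

# Hypothetical one-subsystem illustration.
As_n = np.array([[0.9, 0.1], [0.0, 0.7]])    # stands in for A^{nn}+B^{nn}K^n_*
x = np.zeros(2); x_hat = np.zeros(2); Sigma = np.zeros((2, 2))
for t in range(50):
    x = As_n @ x + rng.standard_normal(2)    # schematic true-state dynamics
    gamma = int(rng.random() > 0.3)          # packet arrives w.p. 0.7
    x_hat, Sigma = remote_estimate_step(x_hat, Sigma, As_n, gamma, x)
```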
D Decentralized Control with Partially Unknown Systems

D.1 Proof of Lemma 6.3

First, note that the only source of randomness in the TS-MARL algorithm is the step where the agents generate samples from their posteriors. Further, all three agents begin with the same prior $\mu_1$ and they all observe the state of the central system. If the sampling seed is the same among the agents (i.e., under Assumption 6.3), then the information that the agents need to update $\check{X}^2_t$, $\check{X}^3_t$, and the posterior $\mu_t$ is the same. This leads to the same $\tau$, $L$, $\check{X}^2_t$, $\check{X}^3_t$, $\Theta_t$, and $\mu_t$ among the agents.

D.2 Proof of Lemma 6.4

First, note that $\Sigma^n_t$ can be sequentially calculated as $\Sigma^n_{t+1} = I + C^n \Sigma^n_t (C^n)^{\top}$ with $\Sigma^n_1 = 0$. Now, we use induction to show that the sequence of matrices $\Sigma^n_t$, $t \ge 1$, is increasing. First, we can write $\Sigma^n_{t+1} - \Sigma^n_t = C^n(\Sigma^n_t - \Sigma^n_{t-1})(C^n)^{\top}$. Then, since $\Sigma^n_1 = 0$ and $\Sigma^n_2 = I \succeq 0$, we have $\Sigma^n_2 - \Sigma^n_1 \succeq 0$. Now, assume that $\Sigma^n_t - \Sigma^n_{t-1} \succeq 0$. Then, it is easy to see that $\Sigma^n_{t+1} - \Sigma^n_t = C^n(\Sigma^n_t - \Sigma^n_{t-1})(C^n)^{\top} \succeq 0$.

To show that the sequence of matrices $\Sigma^n_t$, $t \ge 1$, converges to $\Sigma^n$ as $t \to \infty$, we first show that $C^n$ is stable, that is, $\rho(C^n) < 1$. Note that $C^n = A^{nn} + B^{nn}\tilde{K}^n$, where from (6.14) we have
$$\tilde{K}^n = \Psi(\tilde{P}^n, R^{nn}, A^{nn}, B^{nn}), \quad \tilde{P}^n = \Omega(\tilde{P}^n, Q^{nn}, R^{nn}, A^{nn}, B^{nn}). \qquad (150)$$
Then, since from Assumption 6.1, $(A^{nn}, B^{nn})$ is stabilizable, we know from [55, Theorem 2.21] that $\rho(C^n) < 1$. Since $C^n$ is stable, the convergence of the sequence of matrices $\Sigma^n_t$, $t \ge 1$, can be concluded from [128, Chapter 3.3].
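The limit $\Sigma^n$ in Lemma 6.4 is the unique solution of a discrete Lyapunov equation, which the monotone recursion approaches from below. A minimal Python check, with a hypothetical stable matrix standing in for $C^n$:

```python
import numpy as np
from scipy.linalg import solve_discrete_lyapunov

# Monotone recursion from the proof of Lemma 6.4:
# Sigma_{t+1} = I + C Sigma_t C^T, starting from Sigma_1 = 0.
C = np.array([[0.8, 0.1], [0.0, 0.6]])   # hypothetical stable C^n
Sigma = np.zeros((2, 2))
for _ in range(200):
    Sigma = np.eye(2) + C @ Sigma @ C.T

# Since rho(C) < 1, the iterates converge to the unique solution of
# the discrete Lyapunov equation Sigma = C Sigma C^T + I.
Sigma_inf = solve_discrete_lyapunov(C, np.eye(2))
print(np.allclose(Sigma, Sigma_inf))  # True
```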
D.3 Proof of Lemma 6.5

Let $\pi^*$ be the optimal policy for the auxiliary SARL problem when $\Theta$ is known. Then, from (6.20) we have
$$\mathcal{J}(\Theta) = \limsup_{T\to\infty}\frac{1}{T}\sum_{t=1}^{T}\mathbb{E}^{\pi^*}[c(X_t, U_t) \mid \Theta]. \qquad (151)$$
From Lemma 6.2, we know that under the optimal policy $\pi^*$, $U_t = K(\Theta)X_t$ and hence the dynamics of $X_t$ in (6.18) can be written as
$$X_{t+1} = \big(A + BK(\Theta)\big)X_t + \operatorname{vec}(W^1_t, 0, 0). \qquad (152)$$
Further, let $\pi^*$ be the optimal policy for the MARL problem when $\Theta$ is known. Then, from (6.8) we have
$$J(\Theta) = \limsup_{T\to\infty}\frac{1}{T}\sum_{t=1}^{T}\mathbb{E}^{\pi^*}[c(X_t, U_t) \mid \Theta]. \qquad (153)$$
From Lemma 6.1, we know that under the optimal policy $\pi^*$,
$$U_t = K(\Theta)\bar{X}_t + \operatorname{vec}(0, \tilde{K}^2 E^2_t, \tilde{K}^3 E^3_t), \qquad (154)$$
where we have defined $\bar{X}_t := \operatorname{vec}(X^1_t, \hat{X}^2_t, \hat{X}^3_t)$ and $E^n_t := X^n_t - \hat{X}^n_t$, $n \in \{2,3\}$. Then, from the dynamics of $X_t$ in (6.3) and the update equations for $\hat{X}^2_t, \hat{X}^3_t$ in (6.16), we can write
$$X_{t+1} = \bar{X}_{t+1} + \operatorname{vec}(0, E^2_{t+1}, E^3_{t+1}), \qquad (155)$$
where
$$\bar{X}_{t+1} = \big(A + BK(\Theta)\big)\bar{X}_t + \operatorname{vec}(W^1_t, 0, 0), \qquad (156)$$
$$E^2_{t+1} = C^2 E^2_t + W^2_t, \quad E^3_{t+1} = C^3 E^3_t + W^3_t. \qquad (157)$$
Note that we have defined $C^n = A^{nn} + B^{nn}\tilde{K}^n$. Now, by comparing (152) and (156) and using the fact that both $X_1$ and $\bar{X}_1$ are zero, we can see that for any time $t$,
$$\bar{X}_{t+1} = X_{t+1}. \qquad (158)$$
Now, we can use the above results to write $\mathbb{E}^{\pi^*}[c(X_t, U_t) \mid \Theta]$ as follows:
$$\begin{aligned} \mathbb{E}^{\pi^*}[c(X_t, U_t) \mid \Theta] &= \mathbb{E}^{\pi^*}[X_t^{\top} Q X_t + U_t^{\top} R U_t \mid \Theta] \\ &= \mathbb{E}^{\pi^*}\big[\bar{X}_t^{\top} Q \bar{X}_t + (K(\Theta)\bar{X}_t)^{\top} R K(\Theta)\bar{X}_t \mid \Theta\big] + \mathbb{E}\big[(E^2_t)^{\top} D^2 E^2_t + (E^3_t)^{\top} D^3 E^3_t + (E^2_t)^{\top} D^{23} E^3_t \mid \Theta\big] \\ &= \mathbb{E}^{\pi^*}\big[X_t^{\top} Q X_t + (K(\Theta)X_t)^{\top} R K(\Theta)X_t \mid \Theta\big] + \mathbb{E}\big[(E^2_t)^{\top} D^2 E^2_t + (E^3_t)^{\top} D^3 E^3_t + (E^2_t)^{\top} D^{23} E^3_t\big] \\ &= \mathbb{E}^{\pi^*}[c(X_t, U_t) \mid \Theta] + \mathbb{E}\big[(E^2_t)^{\top} D^2 E^2_t + (E^3_t)^{\top} D^3 E^3_t\big] \\ &= \mathbb{E}^{\pi^*}[c(X_t, U_t) \mid \Theta] + \operatorname{tr}(D^2 \Sigma^2_t) + \operatorname{tr}(D^3 \Sigma^3_t), \end{aligned} \qquad (159)$$
where we have defined $D^{23} := Q^{23} + (\tilde{K}^2)^{\top} R^{23} \tilde{K}^3$. Note that the first equality follows from (6.6), the second equality holds because of (154) and (155), and the third equality holds because of (158) and the fact that $E^n_t$, $n = 2, 3$, described in (157) does not depend on the matrices $A^{11}, B^{11}$ and hence is independent of $\Theta$. Further, the fourth equality holds because $E^n_t$ is only a function of $W^n_{1:t-1}$, and since $W^2_{1:t-1}$ and $W^3_{1:t-1}$ are independent, $E^2_t$ and $E^3_t$ are independent. Finally, the last equality holds because $E^n_t$ has the same dynamics as $S^n_t$ in Lemma 6.4, and consequently, $\operatorname{cov}(E^n_t) = \Sigma^n_t$. Now, by substituting (159) in (153), considering (151) and the fact from Lemma 6.4 that $\Sigma^n_t$ converges to $\Sigma^n$ for $n \in \{2,3\}$ as $t \to \infty$, the statement of the lemma follows.
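The gap $J(\Theta) - \mathcal{J}(\Theta) = \operatorname{tr}(D^2\Sigma^2) + \operatorname{tr}(D^3\Sigma^3)$ identified above is computable offline from the known subsystem parameters. The Python sketch below computes one gap term; the quadratic form assumed for $D^n$ mirrors the cross term $D^{23}$ defined above and is an assumption here (only $D^{23}$ is spelled out in this appendix), as is the sign convention used for the Riccati gain.

```python
import numpy as np
from scipy.linalg import solve_discrete_are, solve_discrete_lyapunov

def local_gain(Ann, Bnn, Qnn, Rnn):
    # Riccati-based local gain (standard LQR form assumed for Psi in (6.14);
    # the negative sign reflects the convention U = K X with K stabilizing).
    P = solve_discrete_are(Ann, Bnn, Qnn, Rnn)
    return -np.linalg.solve(Rnn + Bnn.T @ P @ Bnn, Bnn.T @ P @ Ann)

def cost_gap_term(Ann, Bnn, Qnn, Rnn):
    """tr(D^n Sigma^n) for one subsystem, with the assumed form
    D^n = Q^{nn} + K^T R^{nn} K (by analogy with D^{23}) and Sigma^n
    the limiting error covariance from Lemma 6.4."""
    K = local_gain(Ann, Bnn, Qnn, Rnn)
    Cn = Ann + Bnn @ K
    Sigma = solve_discrete_lyapunov(Cn, np.eye(Ann.shape[0]))
    Dn = Qnn + K.T @ Rnn @ K
    return np.trace(Dn @ Sigma)

# Hypothetical subsystem data.
gap = cost_gap_term(np.array([[1.1]]), np.array([[1.0]]),
                    np.array([[1.0]]), np.array([[1.0]]))
```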
D.4 Proof of Lemma 6.6

First note that under the policy of the TS-SARL algorithm, $U_t = K(\Theta_t)X_t$ and hence the dynamics of $X_t$ in (6.18) can be written as
$$X_{t+1} = \big(A + BK(\Theta_t)\big)X_t + \operatorname{vec}(W^1_t, 0, 0). \qquad (160)$$
Further, note that under the policy of the TS-MARL algorithm,
$$U_t = K(\Theta_t)\bar{X}_t + \operatorname{vec}(0, \tilde{K}^2 E^2_t, \tilde{K}^3 E^3_t), \qquad (161)$$
where we have defined $\bar{X}_t := \operatorname{vec}(X^1_t, \check{X}^2_t, \check{X}^3_t)$ and $E^n_t := X^n_t - \check{X}^n_t$, $n \in \{2,3\}$. Then, from the dynamics of $X_t$ in (6.3) and the update equations for $\check{X}^2_t, \check{X}^3_t$ in (6.25), we can write
$$X_{t+1} = \bar{X}_{t+1} + \operatorname{vec}(0, E^2_{t+1}, E^3_{t+1}), \qquad (162)$$
where
$$\bar{X}_{t+1} = \big(A + BK(\Theta_t)\big)\bar{X}_t + \operatorname{vec}(W^1_t, 0, 0), \qquad (163)$$
$$E^2_{t+1} = C^2 E^2_t + W^2_t, \quad E^3_{t+1} = C^3 E^3_t + W^3_t. \qquad (164)$$
Now, by comparing (160) and (163) and using the fact that both $X_1$ and $\bar{X}_1$ are zero, we can see that for any time $t$,
$$\bar{X}_{t+1} = X_{t+1}. \qquad (165)$$
Now, we can use the above results to write $\mathbb{E}^{\text{TS-MARL}}[c(X_t, U_t)]$ as follows:
$$\begin{aligned} \mathbb{E}^{\text{TS-MARL}}[c(X_t, U_t)] &= \mathbb{E}^{\text{TS-MARL}}[X_t^{\top} Q X_t + U_t^{\top} R U_t] \\ &= \mathbb{E}^{\text{TS-MARL}}\big[\bar{X}_t^{\top} Q \bar{X}_t + (K(\Theta_t)\bar{X}_t)^{\top} R K(\Theta_t)\bar{X}_t\big] + \mathbb{E}\big[(E^2_t)^{\top} D^2 E^2_t + (E^3_t)^{\top} D^3 E^3_t + (E^2_t)^{\top} D^{23} E^3_t\big] \\ &= \mathbb{E}^{\text{TS-SARL}}\big[X_t^{\top} Q X_t + (K(\Theta_t)X_t)^{\top} R K(\Theta_t)X_t\big] + \mathbb{E}\big[(E^2_t)^{\top} D^2 E^2_t + (E^3_t)^{\top} D^3 E^3_t + (E^2_t)^{\top} D^{23} E^3_t\big] \\ &= \mathbb{E}^{\text{TS-SARL}}[c(X_t, U_t)] + \mathbb{E}\big[(E^2_t)^{\top} D^2 E^2_t + (E^3_t)^{\top} D^3 E^3_t\big] \\ &= \mathbb{E}^{\text{TS-SARL}}[c(X_t, U_t)] + \operatorname{tr}(D^2 \Sigma^2_t) + \operatorname{tr}(D^3 \Sigma^3_t), \end{aligned} \qquad (166)$$
where we have defined $D^{23} := Q^{23} + (\tilde{K}^2)^{\top} R^{23} \tilde{K}^3$. Note that the first equality follows from (6.6), the second equality holds because of (161) and (162), and the third equality holds because of (165). Further, the fourth equality holds because $E^n_t$ is only a function of $W^n_{1:t-1}$, and since $W^2_{1:t-1}$ and $W^3_{1:t-1}$ are independent, $E^2_t$ and $E^3_t$ are independent. Finally, the last equality holds because $E^n_t$ has the same dynamics as $S^n_t$ in Lemma 6.4, and consequently, $\operatorname{cov}(E^n_t) = \Sigma^n_t$.

E Dynamic Teams and Decentralized Control Problems with Substitutable Actions

E.1 Proof of Claim 7.1

We want to show that the term $PU$ is the same under the team strategies $\gamma_{l+1}$ and $\gamma_l$ of Problem 7.2. Under different team strategies, the information available to team members changes. Hence, we first need to show that $\tilde{Z}^i$, $\forall i \in N$, is the same under $\gamma_{l+1}$ and $\gamma_l$. If we denote the information available to member $i$ in Problem 7.2 under team strategies $\gamma_{l+1}$ and $\gamma_l$ by $\tilde{Z}^i(\gamma_{l+1})$ and $\tilde{Z}^i(\gamma_l)$ respectively, we want to show that
$$\tilde{Z}^i(\gamma_{l+1}) = \tilde{Z}^i(\gamma_l) \quad \forall i \in N. \qquad (167)$$
According to (7.6), $\tilde{Z}^i$ is obtained from $\{Z^r, r \le i\}$. Therefore, to show that (167) holds, it suffices to show that
$$Z^r(\gamma_{l+1}) = Z^r(\gamma_l) \quad \forall r \in N. \qquad (168)$$
According to Procedure 1, $\gamma^j_{l+1}$ is the same as $\gamma^j_l$ for $j \in N \setminus \{t, k\}$. We therefore categorize team members into two groups:

- Group 1: $\{r \in N : r \le \min\{t,k\}\}$
- Group 2: $\{r \in N : r > \min\{t,k\}\}$

For $r$ in Group 1, $Z^r$ does not depend on the strategies of members $t$ and $k$. Therefore, for $r$ in Group 1, (168) holds.

We will show inductively that for all $r \le h$, $Z^r$ is the same under $\gamma_l$ and $\gamma_{l+1}$. For $h = \min\{t,k\}$, the statement holds because we have shown that it holds for $r$ in Group 1. Now, assume that for some $\alpha \ge \min\{t,k\}$, (168) holds for all $r \le \alpha$ (induction hypothesis). This implies that (167) also holds for $r \le \alpha$. We now need to show that (168) holds for $r = \alpha + 1$. Suppose $\alpha + 1 > t$ and $\alpha + 1 > k$. Under team strategy $\gamma_{l+1}$, $Z^{\alpha+1}$ can be written as
$$\begin{aligned} Z^{\alpha+1}(\gamma_{l+1}) &= H^{\alpha+1}\Xi + \sum_{j<\alpha+1} D^{\alpha+1,j} U^j \\ &= H^{\alpha+1}\Xi + \sum_{\substack{j<\alpha+1 \\ j\neq t,\, j\neq k}} D^{\alpha+1,j}\gamma^j_{l+1}\big(\tilde{Z}^j(\gamma_{l+1})\big) + D^{\alpha+1,t}\gamma^t_{l+1}\big(\tilde{Z}^t(\gamma_{l+1})\big) + D^{\alpha+1,k}\gamma^k_{l+1}\big(\tilde{Z}^k(\gamma_{l+1})\big). \end{aligned} \qquad (169)$$
The terms $\tilde{Z}^j$, $\tilde{Z}^t$ and $\tilde{Z}^k$ appearing on the right hand side of (169) are the same under the strategies $\gamma_l$ and $\gamma_{l+1}$ by the induction hypothesis. Further, for $j \neq t, k$, $\gamma^j_{l+1} = \gamma^j_l$. Using these observations and (7.12) and (7.13), (169) can be written as
$$\begin{aligned} Z^{\alpha+1}(\gamma_{l+1}) &= H^{\alpha+1}\Xi + \sum_{\substack{j<\alpha+1 \\ j\neq t,\, j\neq k}} D^{\alpha+1,j}\gamma^j_l\big(\tilde{Z}^j(\gamma_l)\big) \\ &\quad + D^{\alpha+1,t}\Big(\gamma^t_l\big(\tilde{Z}^t(\gamma_l)\big) - K^{ts}_l Z^s(\gamma_l)\Big) + D^{\alpha+1,k}\Big(\gamma^k_l\big(\tilde{Z}^k(\gamma_l)\big) + \Lambda^{kst} K^{ts}_l Z^s(\gamma_l)\Big). \end{aligned} \qquad (170)$$
According to Lemma 7.1 and the substitutability assumption,
$$D^{\alpha+1,k}\Lambda^{kst} K^{ts}_l Z^s(\gamma_l) = D^{\alpha+1,t} K^{ts}_l Z^s(\gamma_l). \qquad (171)$$
Using (171), (170) can be simplified as
$$Z^{\alpha+1}(\gamma_{l+1}) = H^{\alpha+1}\Xi + \sum_{\substack{j<\alpha+1 \\ j\neq t,\, j\neq k}} D^{\alpha+1,j}\gamma^j_l\big(\tilde{Z}^j(\gamma_l)\big) + D^{\alpha+1,t}\gamma^t_l\big(\tilde{Z}^t(\gamma_l)\big) + D^{\alpha+1,k}\gamma^k_l\big(\tilde{Z}^k(\gamma_l)\big) = Z^{\alpha+1}(\gamma_l). \qquad (172)$$
Thus, $Z^{\alpha+1}(\gamma_{l+1}) = Z^{\alpha+1}(\gamma_l)$ if $\alpha + 1 > t$ and $\alpha + 1 > k$. If $t < \alpha + 1 \le k$ (alternatively, $k < \alpha + 1 \le t$), we can employ arguments similar to the above along with the fact that $D^{\alpha+1,k} = 0$ (alternatively, $D^{\alpha+1,t} = 0$) to show (168) for $r = \alpha + 1$.

Hence, by induction, (168) holds for all $r$ from 1 to $n$. Therefore, $Z^r$, and consequently $\tilde{Z}^r$, for $r \in N$ are the same under the team strategies $\gamma_l$ and $\gamma_{l+1}$.

Now, we show that $PU$ is the same under the team strategies $\gamma_l$ and $\gamma_{l+1}$. Under $\gamma_{l+1}$, $PU$ can be written as follows:
$$\begin{aligned} PU(\gamma_{l+1}) &= \sum_{j=1}^{N} P^j \gamma^j_{l+1}(\tilde{Z}^j) = \sum_{j\in N\setminus\{t,k\}} P^j \gamma^j_{l+1}(\tilde{Z}^j) + P^t \gamma^t_{l+1}(\tilde{Z}^t) + P^k \gamma^k_{l+1}(\tilde{Z}^k) \\ &= \sum_{j\in N\setminus\{t,k\}} P^j \gamma^j_l(\tilde{Z}^j) + P^t\big(\gamma^t_l(\tilde{Z}^t) - K^{ts}_l Z^s\big) + P^k\big(\gamma^k_l(\tilde{Z}^k) + \Lambda^{kst} K^{ts}_l Z^s\big) \\ &= \sum_{j\in N\setminus\{t,k\}} P^j \gamma^j_l(\tilde{Z}^j) + P^t \gamma^t_l(\tilde{Z}^t) + P^k \gamma^k_l(\tilde{Z}^k) = PU(\gamma_l), \end{aligned} \qquad (173)$$
where the penultimate equality is true because Lemma 7.1 and the substitutability assumption provide that
$$P^k \Lambda^{kst} K^{ts}_l Z^s = P^t K^{ts}_l Z^s. \qquad (174)$$
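The substitutability condition invoked in (171) and (174) is a finite set of linear matrix equations in $\Lambda^{kst}$, so it can be checked numerically for a given problem instance. A small Python sketch follows; the stacked least-squares formulation is an illustrative construction for finding a certificate, not the thesis's procedure, and the example data are hypothetical.

```python
import numpy as np

def find_substitution(Dk_list, Dt_list, Pk, Pt):
    """Look for a matrix Lambda with D^{j,k} Lambda = D^{j,t} for all j
    and P^k Lambda = P^t, i.e., a certificate that member k's action can
    substitute for member t's.  Stacking all equations vertically turns
    the condition into one linear least-squares problem.
    """
    top = np.vstack(Dk_list + [Pk])
    rhs = np.vstack(Dt_list + [Pt])
    Lam, *_ = np.linalg.lstsq(top, rhs, rcond=None)
    ok = np.allclose(top @ Lam, rhs, atol=1e-8)
    return Lam, ok

# Hypothetical example in which k's action enters through D^{j,t} M and
# P^t M, so the certificate is Lambda = M^{-1}.
Dt = [np.array([[1.0, 0.0]])]
M = np.array([[2.0, 0.0], [0.0, 1.0]])
Dk = [Dt[0] @ M]
Pt = np.array([[0.5, 0.5]]); Pk = Pt @ M
Lam, ok = find_substitution(Dk, Dt, Pk, Pt)
print(ok)  # True: a substitution certificate exists
```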
Bibliography

[1] Y. Ouyang, M. Gagrani, and R. Jain, "Control of unknown linear systems with Thompson sampling," in 2017 55th Annual Allerton Conference on Communication, Control, and Computing (Allerton), pp. 1198–1205, IEEE, 2017.
[2] Y. Wang, D. J. Hill, and G. Guo, "Robust decentralized control for multimachine power systems," IEEE Transactions on Circuits and Systems I: Fundamental Theory and Applications, vol. 45, pp. 271–279, Mar 1998.
[3] A. L. Dimeas and N. D. Hatziargyriou, "Operation of a multiagent system for microgrid control," IEEE Transactions on Power Systems, vol. 20, pp. 1447–1455, Aug 2005.
[4] A. Vaccaro, G. Velotto, and A. F. Zobaa, "A decentralized and cooperative architecture for optimal voltage regulation in smart grids," IEEE Transactions on Industrial Electronics, vol. 58, no. 10, pp. 4593–4602, 2011.
[5] J. P. Lynch, K. H. Law, et al., "Decentralized control techniques for large-scale civil structural systems," in Proceedings of the 20th International Modal Analysis Conference, pp. 4–7, 2002.
[6] V. Chandan and A. G. Alleyne, "Decentralized architectures for thermal control of buildings," in American Control Conference (ACC), pp. 3657–3662, June 2012.
[7] V. Chandan and A. Alleyne, "Optimal partitioning for the decentralized thermal control of buildings," IEEE Transactions on Control Systems Technology, vol. 21, pp. 1756–1770, Sept 2013.
[8] A. V. Savkin, "Analysis and synthesis of networked control systems: Topological entropy, observability, robustness and optimal control," Automatica, vol. 42, pp. 51–62, Jan. 2006.
[9] R. A. Gupta and M.-Y. Chow, "Networked control system: overview and research trends," IEEE Transactions on Industrial Electronics, vol. 57, no. 7, pp. 2527–2535, 2010.
[10] W. Zhang, M. S. Branicky, and S. M. Phillips, "Stability of networked control systems," IEEE Control Systems, vol. 21, no. 1, pp. 84–99, 2001.
[11] P. Seiler and R. Sengupta, "An H∞ approach to networked control," IEEE Transactions on Automatic Control, vol. 50, no. 3, pp. 356–364, 2005.
[12] P. Ogren, E. Fiorelli, and N. E. Leonard, "Cooperative control of mobile sensor networks: Adaptive gradient climbing in a distributed environment," IEEE Transactions on Automatic Control, vol. 49, no. 8, pp. 1292–1302, 2004.
[13] J. Wolfe, D. Chichka, and J. Speyer, "Decentralized controllers for unmanned aerial vehicle formation flight," AIAA paper, pp. 96–3833, 1996.
[14] P. J. Seiler, Coordinated control of unmanned aerial vehicles. PhD thesis, University of California, Berkeley, 2001.
[15] A. Jadbabaie, J. Lin, and A. S. Morse, "Coordination of groups of mobile autonomous agents using nearest neighbor rules," IEEE Transactions on Automatic Control, vol. 48, no. 6, pp. 988–1001, 2003.
[16] S. S. Stankovic, M. J. Stanojevic, and D. D. Siljak, "Decentralized overlapping control of a platoon of vehicles," IEEE Transactions on Control Systems Technology, vol. 8, pp. 816–832, Sep 2000.
[17] J. Marschak, "Elements for a theory of teams," Management Science, vol. 1, pp. 127–137, Jan. 1955.
[18] R. Radner, "Team decision problems," Annals of Mathematical Statistics, vol. 33, pp. 857–881, 1962.
[19] H. S. Witsenhausen, "A counterexample in stochastic optimum control," SIAM Journal on Control, vol. 6, no. 1, pp. 131–147, 1968.
[20] Y.-C. Ho and K.-C. Chu, "Team decision theory and information structures in optimal control problems–Part I," IEEE Transactions on Automatic Control, vol. 17, no. 1, pp. 15–22, 1972.
[21] S. Yüksel, "Stochastic nestedness and the belief sharing information pattern," IEEE Transactions on Automatic Control, pp. 2773–2786, Dec. 2009.
[22] A. Mahajan and S. Yüksel, "Measure and cost dependent properties of information structures," in American Control Conference (ACC), pp. 6397–6402, 2010.
[23] S. Yüksel and T. Başar, Stochastic Networked Control Systems: Stabilization and Optimization under Information Constraints. Boston, MA: Birkhäuser, 2013.
[24] K.-C. Chu, "Team decision theory and information structures in optimal control problems–Part II," IEEE Transactions on Automatic Control, vol. 17, pp. 22–28, Feb 1972.
[25] H. S. Witsenhausen, "A standard form for sequential stochastic control," Mathematical Systems Theory, vol. 7, no. 1, pp. 5–11, 1973.
[26] A. Mahajan, "Sequential decomposition of sequential dynamic teams: Applications to real-time communication and networked control systems," PhD thesis, University of Michigan, 2008.
[27] A. Nayyar and D. Teneketzis, "Common knowledge and sequential team problems," IEEE Transactions on Automatic Control, 2019.
[28] J. Swigart and S. Lall, "An explicit state-space solution for a decentralized two-player optimal linear-quadratic regulator," in Proc. American Control Conference (ACC), pp. 6385–6390, 2010.
[29] J. Swigart and S. Lall, Lecture Notes in Control and Information Sciences, vol. 406: Networked Control Systems, pp. 179–201, 2010.
[30] P. Shah and P. A. Parrilo, "An optimal controller architecture for poset-causal systems," in IEEE Conference on Decision and Control, pp. 5522–5528, IEEE, 2011.
[31] L. Lessard, "Decentralized LQG control of systems with a broadcast architecture," in 2012 IEEE 51st IEEE Conference on Decision and Control (CDC), pp. 6241–6246, IEEE, 2012.
[32] L. Lessard and A. Nayyar, "Structural results and explicit solution for two-player LQG systems on a finite time horizon," in IEEE 52nd Annual Conference on Decision and Control (CDC), 2013, pp. 6542–6549, Dec 2013.
[33] A. Nayyar and L. Lessard, "Structural results for partially nested LQG systems over graphs," in American Control Conference (ACC), 2015, pp. 5457–5464, July 2015.
[34] L. Lessard and S. Lall, "Optimal control of two-player systems with output feedback," IEEE Transactions on Automatic Control, vol. 60, pp. 2129–2144, Aug. 2015.
[35] A. Mishra, C. Langbort, and G. E. Dullerud, "Team optimal control of stochastically switched systems with local parameter knowledge," IEEE Transactions on Automatic Control, vol. 60, no. 8, pp. 2086–2101, 2015.
[36] Y.-S. Wang, N. Matni, and J. C. Doyle, "A system level approach to controller synthesis," IEEE Transactions on Automatic Control, 2019.
[37] A. Lamperski and J. C. Doyle, "On the structure of state-feedback LQG controllers for distributed systems with communication delays," in 2011 50th IEEE Conference on Decision and Control and European Control Conference, pp. 6901–6906, IEEE, 2011.
[38] H. R. Feyzmahdavian, A. Alam, and A. Gattami, "Optimal distributed controller design with communication delays: Application to vehicle formations," in 51st IEEE Conference on Decision and Control (CDC), pp. 2232–2237, IEEE, 2012.
[39] A. Lamperski and J. Doyle, "Dynamic programming solutions for decentralized state-feedback LQG problems with communication delays," in Proc. American Control Conference (ACC), pp. 6322–6327, June 2012.
[40] N. Matni and J. C. Doyle, "Optimal distributed LQG state feedback with varying communication delay," in 52nd IEEE Conference on Decision and Control, pp. 5890–5896, IEEE, 2013.
[41] N. Matni, A. Lamperski, and J. C. Doyle, "Optimal two player LQR state feedback with varying delay," IFAC Proceedings Volumes, vol. 47, no. 3, pp. 2854–2859, 2014.
[42] A. Lamperski and L. Lessard, "Optimal decentralized state-feedback control with sparsity and delays," Automatica, vol. 58, pp. 143–151, 2015.
[43] N. Nayyar, D. Kalathil, and R. Jain, "Optimal decentralized control with asymmetric one-step delayed information sharing," IEEE Transactions on Control of Network Systems, vol. 5, no. 1, pp. 653–663, 2018.
[44] Y. Wang, J. Xiong, and W. Ren, "Decentralised output-feedback LQG control with one-step communication delay," International Journal of Control, vol. 91, no. 8, pp. 1920–1930, 2018.
[45] Y. Wang and J. Xiong, "Optimal decentralized output-feedback LQG control with random communication delay," IEEE Transactions on Cybernetics, 2018.
[46] M. Rotkowitz and S. Lall, "A characterization of convex problems in decentralized control," IEEE Transactions on Automatic Control, vol. 51, no. 2, pp. 274–286, 2006.
[47] L. Lessard and S. Lall, "Quadratic invariance is necessary and sufficient for convexity," in Proceedings of the 2011 American Control Conference, pp. 5360–5362, IEEE, 2011.
[48] B. Bamieh and P. Voulgaris, "A convex characterization of distributed control problems in spatially invariant systems with communication constraints," Systems and Control Letters, vol. 54, no. 6, pp. 575–583, 2005.
[49] X. Qi, M. V. Salapaka, P. G. Voulgaris, and M. Khammash, "Structured optimal and robust control with multiple criteria: A convex solution," IEEE Transactions on Automatic Control, vol. 49, no. 10, pp. 1623–1640, 2004.
[50] A. Mahajan and A. Nayyar, "Sufficient statistics for linear control strategies in decentralized systems with partial history sharing," IEEE Transactions on Automatic Control, vol. 60, no. 8, pp. 2046–2056, 2015.
[51] O. C. Imer, S. Yüksel, and T. Başar, "Optimal control of LTI systems over unreliable communication links," Automatica, vol. 42, no. 9, pp. 1429–1439, 2006.
[52] L. Schenato, B. Sinopoli, M. Franceschetti, K. Poolla, and S. S. Sastry, "Foundations of control and estimation over lossy networks," Proceedings of the IEEE, vol. 95, pp. 163–187, Jan 2007.
[53] A. Mahajan, N. C. Martins, and S. Yüksel, "Static LQG teams with countably infinite players," in 52nd IEEE Conference on Decision and Control, pp. 6765–6770, Dec 2013.
[54] A. Nayyar, A. Mahajan, and D. Teneketzis, "Decentralized stochastic control with partial history sharing: A common information approach," IEEE Transactions on Automatic Control, vol. 58, no. 7, pp. 1644–1658, 2013.
[55] O. L. V. Costa, M. D. Fragoso, and R. P. Marques, Discrete-Time Markov Jump Linear Systems. Springer Science & Business Media, 2006.
[56] O. L. Costa and M. D. Fragoso, "Discrete-time LQ-optimal control problems for infinite Markov jump parameter systems," IEEE Transactions on Automatic Control, vol. 40, no. 12, pp. 2076–2088, 1995.
[57] J. P. Hespanha, P. Naghshtabrizi, and Y. Xu, "A survey of recent results in networked control systems," Proceedings of the IEEE, vol. 95, pp. 138–162, Jan 2007.
[58] N. Lu, N. Cheng, N. Zhang, X. Shen, and J. W. Mark, "Connected vehicles: Solutions and challenges," IEEE Internet of Things Journal, vol. 1, no. 4, pp. 289–299, 2014.
[59] B. Sinopoli, L. Schenato, M. Franceschetti, K. Poolla, and S. S. Sastry, "Optimal control with unreliable communication: the TCP case," in American Control Conference (ACC), 2005, pp. 3354–3359, June 2005.
[60] B. Sinopoli, L. Schenato, M. Franceschetti, K. Poolla, and S. Sastry, "Optimal linear LQG control over lossy networks without packet acknowledgment," in IEEE 45th Annual Conference on Decision and Control (CDC), 2006, pp. 392–397, Dec 2006.
[61] N. Elia and J. N. Eisenbeis, "Limitations of linear remote control over packet drop networks," in IEEE 43rd Annual Conference on Decision and Control (CDC), 2004, vol. 5, pp. 5152–5157, Dec 2004.
[62] E. Garone, B. Sinopoli, and A. Casavola, "LQG control over lossy TCP-like networks with probabilistic packet acknowledgements," in IEEE 47th Annual Conference on Decision and Control (CDC), 2008, pp. 2686–2691, Dec 2008.
[63] V. Gupta, "On estimation across analog erasure links with and without acknowledgements," IEEE Transactions on Automatic Control, vol. 55, pp. 2896–2901, Dec 2010.
[64] R. Horowitz and P. Varaiya, "Control design of an automated highway system," Proceedings of the IEEE, vol. 88, no. 7, pp. 913–925, 2000.
[65] G. M. Lipsa and N. Martins, "Remote state estimation with communication costs for first-order LTI systems," IEEE Transactions on Automatic Control, vol. 56, pp. 2013–2025, Sept. 2011.
[66] A. Nayyar, T. Basar, D. Teneketzis, and V. Veeravalli, "Optimal strategies for communication and remote estimation with an energy harvesting sensor," IEEE Transactions on Automatic Control, vol. 58, no. 9, pp. 2246–2260, 2013.
[67] M. Nourian, A. S. Leong, and S. Dey, "Optimal energy allocation for Kalman filtering over packet dropping links with imperfect acknowledgments and energy harvesting constraints," IEEE Transactions on Automatic Control, vol. 59, pp. 2128–2143, Aug 2014.
[68] S. Knorn and S. Dey, "Optimal sensor transmission energy allocation for linear control over a packet dropping link with energy harvesting," in IEEE 54th Annual Conference on Decision and Control (CDC), 2015, pp. 1199–1204, Dec 2015.
[69] V. Gupta, N. C. Martins, and J. S. Baras, "Optimal output feedback control using two remote sensors over erasure channels," IEEE Transactions on Automatic Control, vol. 54, pp. 1463–1476, July 2009.
[70] V. Gupta, A. F. Dana, J. P. Hespanha, R. M. Murray, and B. Hassibi, "Data transmission over networks for estimation and control," IEEE Transactions on Automatic Control, vol. 54, pp. 1807–1819, Aug 2009.
[71] V. Gupta, B. Hassibi, and R. M. Murray, "Optimal LQG control across packet-dropping links," Systems & Control Letters, vol. 56, no. 6, pp. 439–446, 2007.
[72] R. Bansal and T. Başar, "Simultaneous design of measurement channels and control strategies for stochastic systems with feedback," Automatica, vol. 25, pp. 679–694, 1989.
[73] S. Tatikonda, A. Sahai, and S. K. Mitter, "Stochastic linear control over a communication channel," IEEE Transactions on Automatic Control, vol. 49, pp. 1549–1561, Sept. 2004.
[74] G. N. Nair, F. Fagnani, S. Zampieri, and R. J. Evans, "Feedback control under data rate constraints: An overview," Proceedings of the IEEE, vol. 95, pp. 108–137, Jan. 2007.
[75] A. Molin and S. Hirche, "On the optimality of certainty equivalence for event-triggered control systems," IEEE Transactions on Automatic Control, vol. 58, no. 2, pp. 470–474, 2013.
[76] M. Rabi, C. Ramesh, and K. H. Johansson, "Separated design of encoder and controller for networked linear quadratic optimal control," arXiv preprint arXiv:1405.0135, 2014.
[77] G. M. Lipsa and N. C. Martins, "Optimal memoryless control in Gaussian noise: A simple counterexample," Automatica, vol. 47, no. 3, pp. 552–558, 2011.
[78] V. D. Blondel and J. N. Tsitsiklis, "A survey of computational complexity results in systems and control," Automatica, vol. 36, no. 9, pp. 1249–1274, 2000.
[79] A. Lamperski and J. Doyle, "On the structure of state-feedback LQG controllers for distributed systems with communication delays," in Proc. 50th IEEE Conf. Decision and Control and European Control Conf. (CDC-ECC), pp. 6901–6906, Dec. 2011.
[80] P. Shah and P. Parrilo, "H₂-optimal decentralized control over posets: A state-space solution for state-feedback," IEEE Transactions on Automatic Control, vol. 58, pp. 3084–3096, Dec. 2013.
[81] S. M. Asghari and A. Nayyar, "Decentralized control problems with substitutable actions," in 54th IEEE Conference on Decision and Control (CDC), pp. 5302–5307, Dec 2015.
[82] S. M. Asghari and A. Nayyar, "Dynamic teams and decentralized control problems with substitutable actions," IEEE Transactions on Automatic Control, vol. 62, pp. 5302–5309, Oct 2017.
[83] C. C. Chang and S. Lall, "An explicit solution for optimal two-player decentralized control over TCP erasure channels with state feedback," in Proceedings of the 2011 American Control Conference, pp. 4717–4722, June 2011.
[84] C. C. Chang and S. Lall, "Synthesis for optimal two-player decentralized control over TCP erasure channels with state feedback," in 2011 50th IEEE Conference on Decision and Control and European Control Conference, pp. 3824–3829, Dec 2011.
[85] A. Mahajan, "Optimal decentralized control of coupled subsystems with control sharing," IEEE Transactions on Automatic Control, vol. 58, no. 9, pp. 2377–2382, 2013.
[86] Y. Ouyang, S. M. Asghari, and A. Nayyar, "Optimal local and remote controllers with unreliable communication," in IEEE 55th Conference on Decision and Control (CDC), pp. 6024–6029, Dec 2016.
[87] L. Lessard, "Decentralized LQG control of systems with a broadcast architecture," in IEEE 51st Annual Conference on Decision and Control (CDC), 2012, pp. 6241–6246, Dec 2012.
[88] J. Wu and S. Lall, "A dynamic programming algorithm for decentralized Markov decision processes with a broadcast structure," in 49th IEEE Conference on Decision and Control (CDC), pp. 6143–6148, Dec 2010.
[89] X.-M. Zhang, Q.-L. Han, and X. Yu, "Survey on recent advances in networked control systems," IEEE Transactions on Industrial Informatics, vol. 12, no. 5, pp. 1740–1752, 2016.
[90] G. C. Walsh, H. Ye, and L. G. Bushnell, "Stability analysis of networked control systems," IEEE Transactions on Control Systems Technology, vol. 10, no. 3, pp. 438–446, 2002.
[91] L. Schenato, B. Sinopoli, M. Franceschetti, K. Poolla, and S. S. Sastry, "Foundations of control and estimation over lossy networks," Proceedings of the IEEE, vol. 95, no. 1, pp. 163–187, 2007.
[92] S. Oh and S. Sastry, "Distributed networked control system with lossy links: state estimation and stabilizing communication control," in Decision and Control, 2006 45th IEEE Conference on, pp. 1942–1947, IEEE, 2006.
[93] Y. Ouyang, S. M. Asghari, and A. Nayyar, "Stochastic teams with randomized information structures," in IEEE 56th Annual Conference on Decision and Control (CDC), pp. 4733–4738, Dec 2017.
[94] S. M. Asghari and A. Nayyar, "Dynamic teams and decentralized control problems with substitutable actions," IEEE Transactions on Automatic Control, vol. 62, pp. 5302–5309, Oct 2017.
[95] B. Sinopoli, L. Schenato, M. Franceschetti, K. Poolla, M. I. Jordan, and S. S. Sastry, "Kalman filtering with intermittent observations," IEEE Transactions on Automatic Control, vol. 49, no. 9, pp. 1453–1464, 2004.
[96] A. Mahajan, N. C. Martins, M. C. Rotkowitz, and S. Yüksel, "Information structures in optimal decentralized control," in IEEE 51st IEEE Conference on Decision and Control (CDC), pp. 1291–1306, Dec 2012.
[97] A. Mahajan and A. Nayyar, "Sufficient statistics for linear control strategies in decentralized systems with partial history sharing," IEEE Transactions on Automatic Control, vol. 60, pp. 2046–2056, Aug 2015.
[98] J.-H. Kim and S. Lall, "Separable optimal cooperative control problems," in American Control Conference (ACC), 2012, pp. 5868–5873, IEEE, 2012.
[99] J. Swigart and S. Lall, "Optimal controller synthesis for a decentralized two-player system with partial output feedback," in Proceedings of the 2011 American Control Conference, pp. 317–323, June 2011.
[100] T. Tanaka and P. A. Parrilo, "Optimal output feedback architecture for triangular LQG problems," in American Control Conference (ACC), pp. 5730–5735, IEEE, 2014.
[101] A. Nayyar and L. Lessard, "Optimal control for LQG systems on graphs - Part I: structural results," CoRR, vol. abs/1408.2551, 2014.
[102] P. Varaiya and J. Walrand, "On delayed sharing patterns," IEEE Transactions on Automatic Control, vol. 23, no. 3, pp. 443–445, 1978.
[103] B. Z. Kurtaran and R. Sivan, "Linear-quadratic-Gaussian control with one-step-delay sharing pattern," IEEE Transactions on Automatic Control, vol. 19, pp. 571–574, Oct 1974.
[104] T. Yoshikawa, "Dynamic programming approach to decentralized stochastic control problems," IEEE Transactions on Automatic Control, vol. 20, pp. 796–797, Dec. 1975.
[105] A. Nayyar, A. Mahajan, and D. Teneketzis, "Optimal control strategies in delayed sharing information structures," IEEE Transactions on Automatic Control, vol. 56, pp. 1606–1620, July 2011.
[106] N. Nayyar, D. Kalathil, and R. Jain, "Optimal decentralized control in unidirectional one-step delayed sharing pattern with partial output feedback," in American Control Conference (ACC), 2014, pp. 1906–1911, June 2014.
[107] A. Rantzer, "A separation principle for distributed control," in Proc. 45th IEEE Conf. on Decision and Control, pp. 3609–3613, Dec. 2006.
[108] H. R. Feyzmahdavian, A. Gattami, and M. Johansson, "Distributed output-feedback LQG control with delayed information sharing," in Proc. IFAC Workshop on Distributed Estimation and Control in Networked Systems (NECSYS), 2012.
[109] A. Lamperski and J. C. Doyle, "The H₂ control problem for quadratically invariant systems with delays," IEEE Transactions on Automatic Control, vol. 60, pp. 1945–1950, July 2015.
[110] C.-C. Chang and S. Lall, "An explicit solution for optimal two-player decentralized control over TCP erasure channels with state feedback," in American Control Conference (ACC), 2011, pp. 4717–4722, IEEE, 2011.
[111] C.-C. Chang and S. Lall, "Synthesis for optimal two-player decentralized control over TCP erasure channels with state feedback," in Decision and Control and European Control Conference (CDC-ECC), 2011 50th IEEE Conference on, pp. 3824–3829, IEEE, 2011.
[112] X. Liang and J. Xu, "Control for networked control systems with remote and local controllers over unreliable communication channel," arXiv preprint arXiv:1803.01336, 2018.
[113] D. P. Bertsekas, Dynamic Programming and Optimal Control, vol. 1. Belmont, MA: Athena Scientific, 1995.
[114] Y. Fang and K. A. Loparo, "Stochastic stability of jump linear systems," IEEE Transactions on Automatic Control, vol. 47, no. 7, pp. 1204–1208, 2002.
[115] J. P. Hespanha, Linear Systems Theory. Princeton University Press, 2009.
[116] X. Ge, F. Yang, and Q.-L. Han, "Distributed networked control systems: A brief overview," Information Sciences, vol. 380, pp. 117–131, 2017.
[117] M. Rotkowitz and S. Lall, "A characterization of convex problems in decentralized control," IEEE Transactions on Automatic Control, vol. 50, no. 12, pp. 1984–1996, 2005.
[118] A. Lamperski and J. C. Doyle, "The H₂ control problem for quadratically invariant systems with delays," IEEE Transactions on Automatic Control, vol. 60, pp. 1945–1950, July 2015.
[119] A. Mishra, C. Langbort, and G. E. Dullerud, "Team optimal control of stochastically switched systems with local parameter knowledge," IEEE Transactions on Automatic Control, vol. 60, pp. 2086–2101, Aug 2015.
[120] S. M. Asghari and A. Nayyar, "Dynamic teams and decentralized control problems with substitutable actions," IEEE Transactions on Automatic Control, vol. 62, no. 10, pp. 5302–5309, 2017.
[121] S. Kar, J. M. F. Moura, and H. V. Poor, "QD-learning: A collaborative distributed strategy for multi-agent reinforcement learning through consensus + innovations," IEEE Transactions on Signal Processing, vol. 61, pp. 1848–1862, April 2013.
[122] M. L. Littman, "Markov games as a framework for multi-agent reinforcement learning," in Machine Learning Proceedings 1994, pp. 157–163, Elsevier, 1994.
[123] J. Foerster, I. A. Assael, N. de Freitas, and S. Whiteson, "Learning to communicate with deep multi-agent reinforcement learning," in Advances in Neural Information Processing Systems, pp. 2137–2145, 2016.
[124] M. Lanctot, V. Zambaldi, A. Gruslys, A. Lazaridou, K. Tuyls, J. Pérolat, D. Silver, and T. Graepel, "A unified game-theoretic approach to multiagent reinforcement learning," in Advances in Neural Information Processing Systems, pp. 4190–4203, 2017.
[125] L. Bu, R. Babu, B. De Schutter, et al., "A comprehensive survey of multiagent reinforcement learning," IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews), vol. 38, no. 2, pp. 156–172, 2008.
[126] K. Zhang, Z. Yang, H. Liu, T. Zhang, and T. Başar, "Fully decentralized multi-agent reinforcement learning with networked agents," arXiv preprint arXiv:1802.08757, 2018.
[127] M. Gagrani and A. Nayyar, "Thompson sampling for some decentralized control problems," in 2018 IEEE Conference on Decision and Control (CDC), pp. 1053–1058, IEEE, 2018.
[128] P. R. Kumar and P. Varaiya, Stochastic Systems: Estimation, Identification, and Adaptive Control, vol. 75. SIAM, 2015.
[129] M. C. Campi and P. Kumar, "Adaptive linear quadratic Gaussian control: the cost-biased approach revisited," SIAM Journal on Control and Optimization, vol. 36, no. 6, pp. 1890–1907, 1998.
[130] H.-F. Chen and L. Guo, "Optimal adaptive control and consistent parameter estimates for ARMAX model with quadratic cost," SIAM Journal on Control and Optimization, vol. 25, no. 4, pp. 845–867, 1987.
[131] T. E. Duncan, L. Guo, and B. Pasik-Duncan, "Adaptive continuous-time linear quadratic Gaussian control," IEEE Transactions on Automatic Control, vol. 44, no. 9, pp. 1653–1662, 1999.
[132] Y. Abbasi-Yadkori and C. Szepesvári, "Regret bounds for the adaptive control of linear quadratic systems," in Proceedings of the 24th Annual Conference on Learning Theory, pp. 1–26, 2011.
[133] S. Dean, H. Mania, N. Matni, B. Recht, and S. Tu, "Regret bounds for robust adaptive control of the linear quadratic regulator," in Advances in Neural Information Processing Systems, pp. 4192–4201, 2018.
[134] M. Abeille and A. Lazaric, "Improved regret bounds for Thompson sampling in linear quadratic control problems," in International Conference on Machine Learning, pp. 1–9, 2018.
[135] Y. Abbasi-Yadkori and C. Szepesvári, "Bayesian optimal control of smoothly parameterized systems," in Proceedings of the Thirty-First Conference on Uncertainty in Artificial Intelligence, pp. 2–11, AUAI Press, 2015.
[136] M. K. S. Faradonbeh, A. Tewari, and G. Michailidis, "On optimality of adaptive linear-quadratic regulators," arXiv preprint arXiv:1806.10749, 2018.
[137] S. M. Asghari, Y. Ouyang, and A. Nayyar, "Optimal local and remote controllers with unreliable uplink channels," IEEE Transactions on Automatic Control, pp. 1–1, 2018.
[138] Y. Ouyang, S. M. Asghari, and A. Nayyar, "Optimal infinite horizon decentralized networked controllers with unreliable communication," arXiv preprint arXiv:1806.06497, 2018.
[139] D. P. Bertsekas, Dynamic Programming and Optimal Control, vol. 1. Athena Scientific, Belmont, MA, 1995.
[140] S. Tu and B. Recht, "Least-squares temporal difference learning for the linear quadratic regulator," arXiv preprint arXiv:1712.08642, 2017.
[141] Y. Abbasi-Yadkori, N. Lazic, and C. Szepesvári, "Regret bounds for model-free linear quadratic control," arXiv preprint arXiv:1804.06021, 2018.
[142] S. Dean, H. Mania, N. Matni, B. Recht, and S. Tu, "On the sample complexity of the linear quadratic regulator," arXiv preprint arXiv:1710.01688, 2017.
[143] H. S. Witsenhausen, "A counterexample in stochastic optimum control," SIAM Journal on Control, vol. 6, no. 1, pp. 131–147, 1968.
[144] V. D. Blondel and J. N. Tsitsiklis, "A survey of computational complexity results in systems and control," Automatica, vol. 36, no. 9, pp. 1249–1274, 2000.
[145] A. Mahajan, N. Martins, M. Rotkowitz, and S. Yüksel, "Information structures in optimal decentralized control," in IEEE Conference on Decision and Control (CDC), pp. 1291–1306, 2012.
[146] L. Lessard and S. Lall, "Internal quadratic invariance and decentralized control," in American Control Conference (ACC), 2010, pp. 5596–5601.
[147] L. Lessard, Tractability of Complex Control Systems. PhD thesis, Stanford University, 2011.
[148] R. Bansal and T. Başar, "Stochastic teams with nonclassical information revisited: When is an affine law optimal?," IEEE Transactions on Automatic Control, vol. 32, pp. 554–559, Jun 1987.
[149] M. Rotkowitz, "Linear controllers are uniformly optimal for the Witsenhausen counterexample," in Decision and Control, 2006 45th IEEE Conference on, pp. 553–558, Dec 2006.
[150] A. Ben-Israel and T. Greville, Generalized Inverses: Theory and Applications. CMS Books in Mathematics, Springer New York, 2006.
[151] S. M. Asghari and A. Nayyar, "Static LQG team with convex function," technical report, Department of Electrical Engineering, University of Southern California, Mar. 2016.
[152] P. Kumar and P. Varaiya, Stochastic Systems: Estimation, Identification and Adaptive Control. Prentice-Hall, 1986.
[153] Y. Shi and B. Yu, "Output feedback stabilization of networked control systems with random delays modeled by Markov chains," IEEE Transactions on Automatic Control, vol. 54, no. 7, pp. 1668–1674, 2009.
[154] J. P. Hespanha, "Modeling and analysis of networked control systems using stochastic hybrid systems," Annual Reviews in Control, vol. 38, no. 2, pp. 155–170, 2014.
[155] Y. Ouyang, S. M. Asghari, and A. Nayyar, "Optimal local and remote controllers with unreliable communication," in IEEE 55th Annual Conference on Decision and Control (CDC), 2016, Dec 2016.
[156] O. Kallenberg, Foundations of Modern Probability. Springer New York, 2006.
[157] R. Radner, "Team decision problems," The Annals of Mathematical Statistics, vol. 33, no. 3, pp. 857–881, 1962.
[158] D. P. Bertsekas and S. E. Shreve, Stochastic Optimal Control: The Discrete Time Case, vol. 23. Academic Press, New York, 1978.
[159] P. Whittle, Optimal Control: Basics and Beyond. Wiley, 1996.
[160] H. H. Rosenbrock, State-Space and Multivariable Theory. Wiley Interscience Division, 1970.
[161] P. R. Kumar and P. Varaiya, Stochastic Systems: Estimation, Identification and Adaptive Control. Prentice Hall, 1986.
[162] S. M. Asghari, Y. Ouyang, and A. Nayyar, "Optimal local and remote controllers with unreliable uplink channels," IEEE Transactions on Automatic Control, 2018. Accepted.