THREE FUNDAMENTAL PILLARS OF DECISION-CENTERED TEAMWORK

by Leandro Soriano Marcolino

A Dissertation Presented to the
FACULTY OF THE GRADUATE SCHOOL
UNIVERSITY OF SOUTHERN CALIFORNIA
In Partial Fulfillment of the Requirements for the Degree
DOCTOR OF PHILOSOPHY (COMPUTER SCIENCE)

August 2016

Copyright 2016 Leandro Soriano Marcolino

To my father

Acknowledgements

When a chess grandmaster performs a brilliant move, she may think of herself as a genius. But that move would never have been made without the lessons of her previous mentors and the pressure of her opponent. When an athlete wins a gold medal, she may think of herself as superior. But nothing would have been achieved without the training of her coaches and the support of family and friends. When a scientist has a breakthrough, she may think of it as her greatest achievement. However, that breakthrough is the result of her standing "on the shoulders of giants", as Newton brilliantly realized.

No achievement in life is performed alone. We are a product of our interactions with our family, our friends, our masters, our colleagues and collaborators. We learn and grow with everyone around us, whether or not they are intentionally teaching us. We survive thanks to everyone around us, whether they act as friends or as antagonists. Life is a product of our struggles, and of the hands that pull us up when we fall, the hands that point the way forward when we are lost, the hands that push us ahead when we stop.

It is impossible to cite names without being unfair. It is also impossible to write an acknowledgment without citing names. The list of everyone to whom I owe my gratitude would perhaps grow larger than this thesis itself. Hence, excuse my unfairness by first writing the names of my masters. I thank my advisor, Milind Tambe, for guiding me during these five years, not only in my research, but also in my formation as a professional researcher. I must also thank my previous advisors, Hitoshi Matsubara and Luiz Chaimowicz, who helped in shaping me as a researcher, and in preparing me to start my PhD. I also thank the members of the committee, Gaurav Sukhatme, William Swartout, Craig Knoblock and Nicholas Weller, for their time and guidance. In particular, Milind, Gaurav and William went beyond their usual duties to guide me in securing an academic position.

I also must thank my students, who believed in me to guide their work: Vaishnavh Nagarajan, Aravind Lakshminarayanan, Samori Price, Boian Kolev, Alvaro Souza, Douglass Chen. They taught me so much; I would be glad if they learned with me at least half of what I myself learned from them.

I thank my friends from the School of Architecture: Sreerag Palangat Veetil, Evangelos Pantazis, David Gerber. They gave me new and exciting ways to think about my own research, besides very interesting discussions and conversations.

I thank my collaborators, and the members of the AAMAS community. In particular, I thank Ariel Procaccia, Tuomas Sandholm, Maria Gini, Shimon Whiteson, Albert Xin Jiang, Chris Kiekintveld, Emma Bowring, William Haskell, Michael Wooldridge, who helped me in my research, and in finding a position to start my academic career.

Of course I must thank my wife for being with me throughout this journey, far from her family and friends in her own country, just to offer me her love and support. I also thank my family, who always support me, and care about me, no matter where I am.
For all the names that I failed to list, I offer you my sincere gratitude, in exchange for your forgiveness...

The end is nothing; the road is all. (Willa Cather)

Table of Contents

Acknowledgements
List of Figures
List of Tables
Abstract

Part I: Introduction and Background
Chapter 1: Introduction
  1.1 Summary of Contributions
  1.2 Guide to thesis
Chapter 2: Background and Related Work
  2.1 Social Choice
  2.2 Teamwork
  2.3 Social Science
  2.4 Machine Learning
  2.5 Team Assessment
  2.6 Other Multiple Algorithms Techniques
  2.7 Social Networks

Part II: Agent Selection
Chapter 3: Diversity Beats Strength?
  3.1 Introduction
  3.2 Methodology
  3.3 Results
  3.4 Detailed Study: Why Does a Team of Diverse Agents Perform Better?
  3.5 Conclusion and Discussions
Chapter 4: Give a Hard Problem to a Diverse Team
  4.1 Model for Analysis of Diversity in Teams
  4.2 Experimental Analysis
  4.3 Conclusion and Discussions
Chapter 5: So Many Options, but We Need Them All
  5.1 Introduction
  5.2 Design Domains
  5.3 Agent Teams for Design Problems
  5.4 Synthetic Experiments
  5.5 Experiments in Architectural Design
  5.6 Discussion
  5.7 Conclusion

Part III: Aggregation of Opinions
Chapter 6: Ranked Voting
  6.1 Introduction
  6.2 Team Generation
  6.3 Ranked Voting
  6.4 PSINET
  6.5 Results
  6.6 Conclusion
Chapter 7: Simultaneous Influencing and Mapping
  7.1 Introduction
  7.2 Influencing and Mapping
  7.3 Results
  7.4 Discussion
  7.5 Conclusion

Part IV: Team Assessment
Chapter 8: Every Team Deserves a Second Chance
  8.1 Introduction
  8.2 Prediction Method
  8.3 Theory
  8.4 Results
  8.5 Analysis of Coefficients
  8.6 Discussion
  8.7 Conclusion

Part V: Discussions and Conclusions
Chapter 9: Conclusions

References
Appendices
Appendix A: Simultaneous Influencing and Mapping: Additional Results
  A.1 Results for each network
  A.2 Additional results for power law distribution
Appendix B: Every Team Deserves a Second Chance: Additional Results

List of Figures

1.1 Domains where I explored decision-centered teamwork.
2.1 "Classical" voting model. Each agent has a noisy perception of the truth, or correct outcome. Hence, its vote is influenced by the correct outcome.
3.1 1200 random teams of 4 agents.
3.2 Histogram of the agents, using real data.
3.3 Expected size of the set of agents that vote for the winning move, with 6 agents and no opening database.
3.4 Results in the Computer Go domain. The error bars show the confidence interval, with 99% significance.
3.5 First example, the diverse team plays as white without the opening database against Fuego. White wins by resignation.
3.6 Second example, the diverse team plays as white without the opening database against Fuego. White wins by resignation.
3.7 Third example, the diverse team plays as white with the opening database against Fuego. White wins by resignation.
4.1 Comparing diverse and uniform when uniform also increases d_m.
4.2 p_best of a diverse team as the number of agents increases.
4.3 Winning rate in the real Computer Go system.
4.4 Winning rates for 4- and 6-agent teams.
4.5 Histograms of agents for different board sizes.
4.6 Verifying the assumptions in the real system.
5.1 A parametric design of a building, showing two parameters: X1 and Y1.
5.2 Illustrative example of the probability distribution functions of two diverse agents.
5.3 Illustrative example of Equation 5.1. Here I show 6 agents (n := 6) with 2 preferred actions each (r := 2). Each action is in the list of preferences of 4 agents (k := 4). As an example, I mark with a dashed circle one of the actions, a_2.
5.4 Illustrative example of the probability distribution functions of two agents with multiple Good_i sets.
5.5 Percentage of optimal solutions found by uniform teams as max^+ grows. Note how max^+ := 0 gives the best result.
5.6 Percentage of optimal solutions as the number of agents grows. The uniform teams decrease in performance, while multiple variations of diverse teams improve, but with diminishing returns.
5.7 Percentage of optimal solutions found by large teams of diverse agents.
5.8 Parametric designs with increasing complexity used in our experiments.
5.9 Percentage of optimal solutions of each system. The teams find a much larger percentage than the individual agents.
5.10 Some building designs generated by the teams.
5.11 Additional analysis. I show here that many falsely reported optimal solutions are eliminated by the team of agents, and also that the teams provide a large number of optimal solutions to the designer.
5.12 All the optimal solutions in the objectives space.
6.1 Winning rates for Diverse (continuous line) and Uniform (dashed line), for a variety of team sizes, using the plurality voting rule.
6.2 Winning rate of Fuego and of the parametrized agents.
6.3 Evaluation of the diversity of the parametrized agents, and the fraction of states in which all of them have a low probability of playing the optimal action. The error bars show 99% confidence intervals.
6.4 Winning rates for Diverse (continuous line) and Uniform (dashed line), for a variety of team sizes and voting rules.
6.5 All voting rules, for Diverse with 5 agents, using the new ranking methodology.
6.6 Comparison on BTER graphs.
6.7 One of the friendship-based social networks of homeless people visiting My Friend's Place.
6.8 Solution quality for real-world networks.
7.1 A graph where the traditional greedy algorithm has arbitrarily low performance.
7.2 Results of 4 real-world networks across many interventions, for p = 0.5 and φ = 0.5 (uniform distribution).
7.3 Results of Influence and Knowledge for different teaching and influence probabilities (uniform distribution).
7.4 Regret for different teaching and influence probabilities (uniform distribution). Lower results are better.
7.5 Results of 4 real-world networks across many interventions, for p = 0.5 and φ = 0.5 (power law distribution).
7.6 Influence and Knowledge for different teaching and influence probabilities (power law distribution).
7.7 Regret for different teaching and influence probabilities (power law distribution). Lower results are better.
8.1 "Classical" voting model. Each agent has a noisy perception of the truth, or correct outcome. Hence, its vote is influenced by the correct outcome.
8.2 My main model. I assume that the subset of agents that decided the action of the team at each world state H_i determines whether the team will be successful or not (W).
8.3 Winning rates of the three teams, under four different board sizes.
8.4 Performance metrics over all turns of 9x9 Go games.
8.5 ROC curves, analyzing different thresholds in 9x9 Go.
8.6 AUC for diverse, uniform and intermediate teams, in 9x9 Go.
8.7 ROC curves for diverse and uniform, for different board sizes.
8.8 AUC for different teams and board sizes, organized by teams.
8.9 AUC for different teams and board sizes, organized by board sizes.
8.10 Differences in prediction quality for the diverse and uniform teams.
8.11 Performance metrics over all turns of 9x9 and 21x21 Go games.
8.12 Comparison of prediction quality with the full and reduced representation.
8.13 Performance metrics over a set of items in ensemble system classification (full feature vector).
8.14 Performance metrics over a set of items in ensemble system classification (reduced feature vector).
8.15 Normalized coefficients of all teams and board sizes, organized by board sizes.
8.16 Normalized coefficients of all teams and board sizes, organized by teams.
A.1 Results for network A across many interventions, for influence probability p = 0.5, teaching probability φ = 0.5, assuming uniform distribution.
A.2 Results of Influence and Knowledge in network A for different teaching and influence probabilities, assuming uniform distribution.
A.3 Regret in network A for different teaching and influence probabilities, assuming uniform distribution.
A.4 Results for network B across many interventions, for influence probability p = 0.5, teaching probability φ = 0.5, assuming uniform distribution.
A.5 Results of Influence and Knowledge in network B for different teaching and influence probabilities, assuming uniform distribution.
A.6 Regret in network B for different teaching and influence probabilities, assuming uniform distribution.
A.7 Results for Facebook network across many interventions, for influence probability p = 0.5, teaching probability φ = 0.5, assuming uniform distribution.
A.8 Results of Influence and Knowledge in Facebook network for different teaching and influence probabilities, assuming uniform distribution.
A.9 Regret in Facebook network for different teaching and influence probabilities, assuming uniform distribution.
A.10 Results for MySpace network across many interventions, for influence probability p = 0.5, teaching probability φ = 0.5, assuming uniform distribution.
A.11 Results of Influence and Knowledge in MySpace network for different teaching and influence probabilities, assuming uniform distribution.
A.12 Regret in MySpace network for different teaching and influence probabilities, assuming uniform distribution.
A.13 Results for network A across many interventions, for influence probability p = 0.5, teaching probability φ = 0.5, assuming power law distribution.
A.14 Results of Influence and Knowledge in network A for different teaching and influence probabilities, assuming power law distribution.
A.15 Regret in network A for different teaching and influence probabilities, assuming power law distribution.
A.16 Results for network B across many interventions, for influence probability p = 0.5, teaching probability φ = 0.5, assuming power law distribution.
A.17 Results of Influence and Knowledge in network B for different teaching and influence probabilities, assuming power law distribution.
A.18 Regret in network B for different teaching and influence probabilities, assuming power law distribution.
A.19 Results for Facebook network across many interventions, for influence probability p = 0.5, teaching probability φ = 0.5, assuming power law distribution.
A.20 Results of Influence and Knowledge in Facebook network for different teaching and influence probabilities, assuming power law distribution.
A.21 Regret in Facebook network for different teaching and influence probabilities, assuming power law distribution.
A.22 Results for MySpace network across many interventions, for influence probability p = 0.5, teaching probability φ = 0.5, assuming power law distribution.
A.23 Results of Influence and Knowledge in MySpace network for different teaching and influence probabilities, assuming power law distribution.
A.24 Regret in MySpace network for different teaching and influence probabilities, assuming power law distribution.
A.25 Results of 4 real-world networks across many interventions, for p = 0.5 and φ = 0.5, assuming power law distribution with a = 1.2.
A.26 Results of Influence and Knowledge for different teaching and influence probabilities, assuming power law distribution with a = 1.2.
A.27 Regret for different teaching and influence probabilities, assuming power law distribution with a = 1.2. Lower results are better.
B.1 Performance metrics over all turns of 9x9 Go games. (Alternative baseline)
B.2 ROC curves, analyzing different thresholds in 9x9 Go. (Alternative baseline)
B.3 ROC curves for diverse and uniform, for different board sizes. (Alternative baseline)
B.4 AUC for different teams and board sizes, organized by teams. (Alternative baseline)
B.5 AUC for different teams and board sizes, organized by board sizes. (Alternative baseline)
B.6 Differences in prediction quality for the diverse and uniform teams. (Alternative baseline)
B.7 Performance metrics over all turns of 9x9 and 21x21 Go games. (Alternative baseline)
B.8 Comparison of prediction quality with the full and reduced representation. (Alternative baseline)

List of Tables

3.1 A team of deterministic agents that can reach perfect play under simple voting. "1" indicates the agent plays the perfect action.
3.2 A team of non-deterministic agents that can overcome copies of the best agent.
3.3 Probability to select the best move of each player and the teams.
3.4 Weak agents can play better in some board states. In parentheses, I show when the difference in P_best is 99% significant.
3.5 Probability of playing the moves in the first example. * indicates the better move.
3.6 Probability of playing the moves in the second example. Some results are unavailable due to lack of memory. * indicates the better move.
3.7 Probability of playing the moves in the third example. * indicates the better move.
4.1 Performance of the diverse team increases when the number of actions increases.
4.2 Average winning rates of the team members across different board sizes. Note that these are not the winning rates of the teams.
4.3 Winning rates of each one of the agents across different board sizes.
5.1 Probability distribution function of the agents in my example.
5.2 Probability of outputting each possible optimal solution, for the diverse and uniform teams.
5.3 Probability of outputting each possible optimal solution, for different sizes of the diverse team.
5.4 GA parameters for the diverse team. Initial Population and Maximum Iteration were kept as constants: 10 and 5, respectively. PZ = Population Size, SZ = Selection Size, CR = Crossover Ratio, MR = Mutation Ratio.
6.1 Parameters sampled to generate different versions of Fuego.
8.1 A simple example of voting profiles after three iterations of problem-solving.
8.2 Example of the full feature vector after three iterations of problem-solving.
8.3 Example of the reduced feature vector after three iterations of problem-solving.

Abstract

This thesis introduces a novel paradigm in artificial intelligence: decision-centered teamwork. Decision-centered teamwork is the analysis of agent teams that iteratively take joint decisions to solve complex problems.
Although teams of agents have been used to take decisions in many important domains, such as machine learning, crowdsourcing, forecasting systems, and even board games, a study of a general framework for decision-centered teamwork has never been presented in the literature before. I divide decision-centered teamwork into three fundamental challenges: (i) Agent Selection, which consists of selecting a set of agents from an exponential universe of possible teams; (ii) Aggregation of Opinions, which consists of designing methods to aggregate the opinions of different agents into joint team decisions; (iii) Team Assessment, which consists of designing methods to identify whether a team is failing, allowing a "coordinator" to take remedial procedures. In this thesis, I handle all these challenges.

For Agent Selection, I introduce novel models of diversity for teams of voting agents. My models rigorously show that teams made of the best agents are not necessarily optimal, and also clarify in which situations diverse teams should be preferred. In particular, I show that diverse teams get stronger as the number of actions increases, by analyzing how the agents' probability distribution functions over actions change. This has never been presented before in the ensemble systems literature. I also show that diverse teams have great applicability for design problems, where the objective is to maximize the number of optimal solutions for human selection, combining for the first time social choice with number theory. All of these theoretical models and predictions are verified in real systems, such as Computer Go and architectural design. In particular, for architectural design I optimize the design of buildings with agent teams not only for cost and project requirements, but also for energy efficiency, making it an essential domain for sustainability.

Concerning Aggregation of Opinions, I evaluate classical ranked voting rules from social choice in Computer Go, only to discover that plurality leads to the best results. This happens because real agents tend to have very noisy rankings. Hence, I create a ranking-by-sampling extraction technique, leading to significantly better results with the Borda voting rule. A similar study is also performed in the social networks domain, in the context of influence maximization. Additionally, I study a novel problem in social networks: I assume only a subgraph of the network is initially known, and we must spread influence and learn the graph simultaneously. I analyze a linear combination of two greedy algorithms, outperforming both of them. This domain has great potential for health, as I run experiments in four real-life social networks from the homeless population of Los Angeles, aiming at spreading HIV prevention information.

Finally, with regard to Team Assessment, I develop a domain-independent team assessment methodology for teams of voting agents. My method is within a machine learning framework, and learns a prediction model over the voting patterns of a team, instead of learning over the possible states of the problem. The methodology is tested and verified in Computer Go and Ensemble Learning.

Part I: Introduction and Background

Chapter 1: Introduction

They may sound your praise and call you great,
They may single you out for fame,
But you must work with your running mate
Or you'll never win the game;
Oh, never the work of life is done
By the man with a selfish dream,
For the battle is lost or the battle is won
By the spirit of the team.
(Edgar A. Guest)
Teams of agents have been used for taking decisions while solving complex problems in many important domains. For instance, in machine learning, ensemble systems have been widely studied, where the outputs of multiple classifiers are aggregated into a final classification [Polikar, 2012]. Recently, in the artificial intelligence literature, aggregating the opinions of multiple people has been receiving considerable attention in the study of crowdsourcing [Mao et al., 2013, Bachrach et al., 2012]. Similarly, multiple agents have also been used to make predictions, in the study of forecasting systems [Isa et al., 2010]. Such decision-centered teams have been explored even for board games [Obata et al., 2011, Soejima et al., 2010].

One of the first works to study team decision-making dates from 1785, when Condorcet [Condorcet, 1785] analyzed the guarantees of a jury doing majority voting to reach a correct verdict, assuming a binary problem (guilty or not guilty) and identical jury members. More recent works in the social sciences literature emphasize the importance of diversity when forming (human) teams, such as Hong and Page [2004], and LiCalzi and Surucu [2012], which study models where agents are able to know the utility of the solutions, and the team can simply pick the best solution found by one of its members.

In machine learning, however, the main focus has been on how to best divide the training set across different learners [Polikar, 2012]. In social choice, the complexity of computing the output of voting rules and of manipulating elections has been widely studied [Bartholdi III et al., 1989a,b, Davies et al., 2011, Yang, 2015], but that still does not guide us on how to form the best voting team. Teamwork models are also a major part of multi-agent systems research, but the focus is on assigning tasks to agents with different capabilities, in order to maximize the expected utility of the team [Nair and Tambe, 2005, Guttmann, 2008].

Hence, in this thesis I introduce the concept of decision-centered teamwork to study domains where a set of fully cooperative agents is used to take decisions towards accomplishing a common goal. Decision-centered teamwork is the analysis of agent teams that iteratively take joint decisions to solve complex problems. This thesis is the first general study of decision-centered teamwork. When forming such teams, we must not only select agents, but also design aggregation mechanisms and domain-independent methods to assess the performance of a given team. Hence, I divide decision-centered teamwork into three fundamental challenges: Agent Selection, Aggregation of Opinions and Team Assessment.

Agent Selection is picking a limited number of agents to form a team. Previous work in social choice considered a single world state [Conitzer and Sandholm, 2005, List and Goodin, 2001], which would lead us to expect the best team to be the one composed of the best possible agents. I show [Marcolino et al., 2013] that this is not true, and it is fundamental to also consider diversity when forming teams. However, my first diversity model only presents necessary conditions for such a phenomenon; therefore I present a second, more general model of diversity [Marcolino et al., 2014b], where I can predict that diverse teams perform better than uniform teams (i.e., copies of the best agent) in problems with a large action space.
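To make the voting setting concrete, the sketch below (my own illustration, not the thesis's actual Go system) simulates a single decision by plurality voting, comparing a uniform team of three copies of a strong agent with a diverse team. The probability distributions over five actions are invented for illustration; the point is only the intuition formalized in the following chapters: copies of the best agent tend to repeat the same mistake, while diverse agents rarely agree on a wrong action.

    import random
    from collections import Counter

    def plurality(votes):
        """Plurality rule: the most-voted action wins; ties are broken uniformly at random."""
        counts = Counter(votes)
        top = max(counts.values())
        return random.choice([a for a, c in counts.items() if c == top])

    def p_best(team, trials=20000, best_action=0):
        """Estimate the probability that the team's plurality choice is the best action."""
        n_actions = len(team[0])
        hits = 0
        for _ in range(trials):
            votes = [random.choices(range(n_actions), weights=pdf, k=1)[0] for pdf in team]
            hits += plurality(votes) == best_action
        return hits / trials

    # Illustrative distributions over 5 actions; action 0 is assumed to be the best one.
    best_agent = [0.45, 0.45, 0.04, 0.03, 0.03]   # strong, but concentrates its mistakes
    weaker_1   = [0.40, 0.05, 0.45, 0.05, 0.05]   # weaker, but its mistakes fall elsewhere
    weaker_2   = [0.40, 0.05, 0.05, 0.45, 0.05]

    print("uniform:", p_best([best_agent] * 3))                 # copies of the best agent
    print("diverse:", p_best([best_agent, weaker_1, weaker_2]))

With distributions of this shape, the diverse team usually obtains a slightly higher estimate, even though two of its members are individually weaker than the best agent.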
Additionally, I perform a study of diverse and uniform teams for design problems [Marcolino et al., 2016a], where the focus is to maximize the number of optimal solutions for human selection (according to aesthetics or non-formalized compromises).

Aggregation of Opinions, in the context of decision-centered teamwork, is combining the opinions of each member of the team into a final team decision. I study ranking extraction and ranked voting rules [Jiang et al., 2014, Yadav et al., 2015] using existing agents in Computer Go and influence maximization. Additionally, I study a linear combination of two greedy algorithms for simultaneously learning and influencing social networks [Marcolino et al., 2016c].

Team Assessment is verifying the performance of a given team. In particular, it is fundamental to predict whether a team is going to succeed or fail in problem solving. Existing methods are tailored for specific domains, such as robot soccer [Ramos and Ayanegui, 2008]. Hence, I propose [Nagarajan et al., 2015, Marcolino et al., 2016b] a novel domain-independent technique, which learns a prediction function for estimating the performance of a team using only its voting patterns. Based on such a prediction, it is possible to take remedial procedures to increase a team's performance.

This thesis presents theoretical, methodological and experimental contributions for these facets of decision-centered teamwork. I run experiments in three domains: Computer Go [Marcolino et al., 2013, 2014b, Jiang et al., 2014, Nagarajan et al., 2015, Marcolino et al., 2016b], Architectural Design [Marcolino et al., 2016a] and influence maximization in social networks [Yadav et al., 2015, Marcolino et al., 2016c] (Figure 1.1). As I detail later, these domains have a positive societal impact for health and sustainability.

1.1 Summary of Contributions

1.1.1 Agent Selection

A naive solution for agent selection would be to simply select the set of the individually best agents to form a team. I show, however, that team diversity is also an important aspect of decision-centered teamwork. Hence, this thesis presents three models of diversity in teamwork. In my first model, I show that a diverse team can outperform a uniform team if at least one agent has a higher probability of playing the best action than the best agent in at least one world state [Marcolino et al., 2013]. However, since only necessary conditions are provided, I develop a second model [Marcolino et al., 2014b], which studies the effect of increasing the number of actions available to choose from. I define spreading tail (ST) agents, which have an increasingly larger number of actions assigned a non-zero probability as the number of actions in the domain increases. A diverse team is modeled as a team of ST agents. I show that the probability of a diverse team picking the best action increases as the action space increases, and also converges to 1 when the team size grows in large action spaces. The main idea of my model is that diverse agents are less likely to agree on the same mistakes when the action space is large, and therefore only two agents voting for the optimal solution is sufficient. I run experiments in the Computer Go domain, using six existing Computer Go agents, and analyze the predictions and the assumptions of my models.
[Figure 1.1: Domains where I explored decision-centered teamwork: playing better Go by voting among multiple diverse agents; optimally designing buildings by voting among agents and using human selection, yielding energy-efficient solutions for architecture; and preventing HIV in uncertain social networks (the Los Angeles homeless population social network) by combining multiple algorithms.]

My third diversity model focuses on agent teams for design problems. This thesis introduces design problems as a novel problem in social choice, where the performance of a voting system is evaluated by the number of optimal solutions found by an agent team. For maximum applicability, I study agents that are queried for a single opinion, and multiple solutions are obtained by multiple voting iterations. I show that diverse teams composed of agents with different preferences maximize the number of optimal solutions, while uniform teams composed of multiple copies of the best agent are in general suboptimal. I run experiments with diverse and uniform teams in a real system for architectural design optimization. The designs of buildings are optimized not only for cost and conformity with project requirements, but also for energy efficiency. Hence, this is an important domain for sustainability.

1.1.2 Aggregation of Opinions

In the context of decision-centered teamwork, team coordination is aggregating the opinions of multiple agents into a final decision. I study the performance of four classical ranked voting rules from social choice in Computer Go. First, I build a ranking for each agent by directly extracting its estimated utilities of each action from its search tree. However, after aggregating such rankings, I find that plurality still has the best winning rates. Such a surprising result may seem discouraging at first, but I introduce a novel method to extract rankings from existing agents, based on the frequency with which actions are played when sampling an agent multiple times. I show that using this method, the Borda ranked voting rule outperforms plurality. While performing this study, I also introduce a technique for generating diverse teams by creating random parametrizations of a base agent [Jiang et al., 2014], in order to analyze larger diverse teams of voting agents. Similarly, I analyze the performance of ranked voting aggregation in the social networks domain, in the context of influence maximization [Yadav et al., 2015].

Additionally, I analyze a linear combination of two greedy algorithms in a novel social networks problem: I assume only a subgraph of the network is initially known, and we must spread influence and learn the graph simultaneously [Marcolino et al., 2016c]. Although influence maximization has been extensively studied [Kempe et al., 2003, Cohen et al., 2014, Golovin and Krause, 2010, Maghami and Sukthankar, 2012, Li and Jiang, 2014], the main motivation of previous works is viral marketing, and hence they assume that the social network graph is fully known beforehand, generally taken from some social media network (such as Facebook or MySpace). However, the graphs recorded in social media do not really represent all the people and all the connections of a population. Most critically, when performing interventions in real life, we must deal with a large degree of missing knowledge about the network.
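As a reference point for the discussion that follows, here is a minimal sketch (mine, not the thesis's implementation) of the traditional greedy influence maximization algorithm under the independent cascade model of Kempe et al. [2003]: the expected spread of a seed set is estimated by Monte Carlo simulation, and seeds are added one at a time by largest estimated marginal gain. Note that it assumes the full graph is known, which is exactly the assumption questioned below; the toy graph, the influence probability p and the number of trials are arbitrary choices for illustration.

    import random

    def simulate_spread(graph, seeds, p=0.5, trials=200):
        """Monte Carlo estimate of the expected number of influenced nodes under the
        independent cascade model: each newly influenced node gets one chance to
        influence each of its neighbors, independently with probability p."""
        total = 0
        for _ in range(trials):
            influenced = set(seeds)
            frontier = list(seeds)
            while frontier:
                node = frontier.pop()
                for neighbor in graph.get(node, []):
                    if neighbor not in influenced and random.random() < p:
                        influenced.add(neighbor)
                        frontier.append(neighbor)
            total += len(influenced)
        return total / trials

    def greedy_seeds(graph, k, p=0.5):
        """Classical greedy algorithm: repeatedly add the node with the largest
        estimated marginal gain in expected spread (assumes the full graph is known)."""
        seeds = []
        for _ in range(k):
            best = max((n for n in graph if n not in seeds),
                       key=lambda n: simulate_spread(graph, seeds + [n], p))
            seeds.append(best)
        return seeds

    # Toy adjacency-list graph (undirected edges listed in both directions).
    graph = {1: [2, 3], 2: [1, 3, 4], 3: [1, 2], 4: [2, 5], 5: [4]}
    print(greedy_seeds(graph, k=2))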
I run extensive experiments over four real-life social networks of the homeless population of Los Angeles. In uence maximization has a great potential to positively impact society, as we can modify the behavior of a community. For example, we can increase the overall health of a population; my main motivation is to spread information about HIV prevention in the homeless population of Los Angeles. 1.1.3 Team Assessment Team assessment is essential for decision-centered teamwork. We must be able to assess if a team is likely to succeed or fail in order to take remedial procedures when appropriate. For instance, we could switch team members or the aggregation mechanisms according to the current assessment, in order to increase the team performance in an online manner. Hence, it is important to develop methods to quickly assess a team's performance. I propose a model for team assessment [Nagarajan et al., 2015, Marcolino et al., 2016b], which allows me to develop a novel domain independent technique to predict online the nal reward of a team. The nal reward is dened by a random variable; which is in uenced by a set of variables representing the subset of agents that agreed on the chosen action at each world state. Based on that, I show that the nal reward can be predicted by linear models, and I derive a domain-independent prediction function using only the frequencies of agreement of each subset of agents. Additionally, I show that the quality of the prediction increases as the action space grows larger. I evaluate the quality of the predictions using a real system of Computer Go playing agents, over a varying of board sizes. I study the predictions at every turn of the games, and compare with an analysis performed by using an in-depth search. I achieve an accuracy of 71% for a diverse team in 9 9 Go, and of 81% when I increase the action space size to 21 21 Go. For a uniform team, I obtain 62% accuracy in 9 9, and 75% accuracy in 21 21 Go. I also predict the nal performance of an ensemble system that votes to classify hand-written digits, demonstrating the domain independence of my method. 7 1.2 Guide to thesis This thesis is organized as follows: Chapter 2 introduces the related work and the nec- essary background for the research presented in this thesis. Part II introduces the work on agent selection: Chapter 3 presents my rst diversity model; Chapter 4 shows my second model, where diverse teams in large action spaces are analyzed; and Chapter 5 presents my model about agent teams for design problems. Part III introduces my work on aggregation of opinions: Chapter 6 presents an experimental study on ranked voting rules in Computer Go and Social Networks, and Chapter 7 studies a linear combination of two algorithms for simultaneously in uencing and mapping social networks. Part IV focuses on team assessment, where Chapter 8 introduces a domain independent technique to predict success or failure in teams of voting agents. Finally, Chapter 9 presents my conclusions and discussions for future work. 8 Chapter 2 Background and Related Work If I have seen further it is by standing on the shoulders of Giants. (Isaac Newton) In this chapter I discuss relevant background and related work for this thesis. The study of decision-centered teamwork relates to dierent areas of study in Computer Sci- ence, and even in other elds, such as Social Science. 
Hence, in this section I will discuss the literature in Multi-agent Systems, focusing on the study of Social Choice (Section 2.1) and Teamwork (Section 2.2); Machine Learning, especially Ensemble Systems (Section 2.4); and Social Science (Section 2.3), where many models have been proposed to study diversity in human teams. Since the thesis also introduces the problem of simultaneously influencing and mapping social networks (in Chapter 7), I will also discuss the relevant literature for this problem in Section 2.7.

2.1 Social Choice

Social choice studies theoretical frameworks for analyzing the combination of individual preferences or opinions into a collective decision. One of the most common methods to aggregate the opinions of different agents into a decision is voting. Hence, voting is extensively studied in the social choice literature. Normally, voting is seen under one of two possible views: (i) as a way to decide on a fair outcome, given the preferences of different individuals; (ii) as a way to use the opinions of different agents to estimate a correct outcome, or a "truth".

The classical work in the second view is Condorcet's Jury Theorem [Condorcet, 1785], published in the 18th century. Condorcet developed the theorem to justify the use of voting in the jury system. According to his theorem, when facing a binary decision, as long as the average probability of each individual being correct is higher than 1/2, a group of independent individuals doing majority voting will have a higher probability of being correct than the individuals alone. In fact, as the number of voters goes to infinity, the probability of taking a correct decision converges to 1. This theorem is extended to the case of k options in List and Goodin [2001], where it is shown that if each individual has a higher probability of choosing the best answer than of choosing any other answer, the group performing majority voting will still be stronger than the individuals alone.

[Figure 2.1: "Classical" voting model. Each agent has a noisy perception of the truth, or correct outcome. Hence, its vote is influenced by the correct outcome.]

Recently, social choice has been considered extensively in the artificial intelligence literature. Each agent is modeled as having a noisy perception of the truth (or correct decision) [Conitzer and Sandholm, 2005]. Hence, the correct outcome influences how each agent is going to vote, as shown in the model in Figure 2.1. Therefore, given a voting profile and a noise model (the probability of voting for each action, given the correct outcome) of each agent, we can estimate the likelihood of each action being the best by a simple (albeit computationally expensive) probabilistic inference. Any voting rule is going to be optimal if it corresponds to always picking the action that has the maximum likelihood of being correct (i.e., the action with maximum likelihood of being the best action), according to the assumed noise model of the agents. That is, the output of an optimal voting rule always corresponds to the output of actually computing, by the probabilistic inference method mentioned above, which action has the highest likelihood of being the best one. Classically, however, this view of voting considers only teams of identical agents [List and Goodin, 2001, Conitzer and Sandholm, 2005], and thus does not consider the importance of diversity when selecting agents.
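The binary statement of the jury theorem can be checked directly. The following minimal sketch (my own, not part of the thesis) computes the exact probability that a strict majority of n independent voters is correct when each voter is correct with probability p > 1/2, showing the convergence to 1 as n grows.

    from math import comb

    def majority_correct(n, p):
        """Probability that a strict majority of n independent voters is correct,
        when each voter is correct with probability p (n odd, binary decision)."""
        return sum(comb(n, i) * p**i * (1 - p)**(n - i)
                   for i in range((n // 2) + 1, n + 1))

    # With p = 0.6 > 1/2, the majority's accuracy increases with the jury size:
    for n in (1, 5, 25, 101):
        print(n, round(majority_correct(n, 0.6), 4))

The k-option extension of List and Goodin [2001] behaves analogously under plurality voting, as long as the correct option is each voter's most likely choice.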
More recent works also consider agents with different probability distribution functions. Caragiannis et al. [2013] study which voting rules converge to a true ranking as the number of agents (not necessarily identical) goes to infinity. In Soufiani et al. [2012] the problem of inferring the true ranking is studied, assuming agents with different pdfs, but drawn from the same family. Therefore, team formation is not the focus of their work.

Since the classical works of Bartholdi III et al. [1989a,b], many works in social choice also study the computational complexity of computing the winner in elections [Aziz et al., 2015, Amanatidis et al., 2015], and/or of manipulating the outcome of an election, in general by disguising an agent's true preference [Conitzer and Sandholm, 2003, Davies et al., 2011, Yang, 2015]. The main motivation of such works is finding voting rules which are easy to compute (i.e., the winner can be computed in polynomial time), but are hard to manipulate (i.e., an agent would have to solve an NP-complete problem in order to calculate a voting profile, different from its true preference, that changes the final outcome of the voting procedure). There are also works studying the aggregation of partial [Xia and Conitzer, 2011a,b] or non-linear rankings (such as pairwise comparisons among alternatives) [Elkind and Shah, 2014], since it could be costly or impossible to request a full linear ranking over all possible actions from an agent. Some very recent works in social choice also analyze probabilistic voting rules, where the agents' votes affect a probability distribution over outcomes [Brandl et al., 2015, Chatterji et al., 2014].

2.1.1 Large Solutions Pool

In this thesis I also consider agent teams for design problems, where the objective is to maximize the number of optimal solutions. In Chapter 5 I discuss in detail why it is important to obtain a large number of solutions for design problems. In this chapter I present the current state of the art in social choice with regard to obtaining a large solution pool.

Most of the social choice literature is about finding a correct ranking in domains where there is a linear order over the alternatives, and hence a unique optimal decision [Conitzer and Sandholm, 2005, Caragiannis et al., 2013, List and Goodin, 2001, Soufiani et al., 2012, Baharad et al., 2011]. Recent works, however, are considering more complex domains. Farfel and Conitzer [2011] analyze the case where, instead of having a single preferred action, voters have a preference over an interval of real numbers. Xia and Conitzer [2011a] study the problem of finding k optimal solutions, where k is known beforehand, by aggregating rankings from each agent. However, not only do they need strong assumptions about the quality of the rankings of such agents, but they also show that computing the maximum likelihood estimate (MLE) from the rankings is an NP-hard problem. Procaccia et al. [2012] study a similar perspective, where the objective is to find the top k options given rankings from each agent, where, again, k is known in advance. However, in their case, they assume there still exists one unique truly optimal choice, hidden among these top k alternatives. Elkind and Shah [2014], motivated by the crowdsourcing domain, study the case where, instead of rankings, the voters output pairwise comparisons among all actions, which may not follow transitivity. However, their final objective is still to pick a single winner, not to maximize the set of optimal solutions found by a voting system.
However, their nal objective is still to pick a single winner, not to maximize the set of optimal solutions found by a voting 11 system. Oren and Lucier [2014] study the problem of nding the top k options in an on-line setting, where we must take a decision concerning adding a candidate to ak-sized pool at each time we see the (ranked) vote of an agent; hence the problem of maximizing the number of optimal solutions is not considered. Finally, outputting a full comparison among all actions can be a burden for an agent [Boutilier, 2002, Kalech et al., 2011]. In this thesis I show that actual agents can have very noisy rankings, and therefore do not follow the assumptions of previous works in social choice. Hence, as any agent is able to output at least one action (i.e., a single vote), I study here systems where agents vote across multiple iterations, which brings the model closer to real applications for design. 2.2 Teamwork Teamwork is an important topic of research in the multi-agent systems literature. Many theoretical frameworks of teamwork have been developed. For example, Cohen and Levesque [1991] introduced the Joint Intentions theory. In that framework, a team has a joint mental state. All agents work to achieve a certain objective in the joint mental state. If one of the agents discovers that the objective has been achieved, or became irrelevant/impossible, then it must communicate with its teammates in order to pass this belief to the joint mental state. In the SharedPlans [Grosz and Kraus, 1996] framework, there is a set of possible recipes for achieving one action, which are composed by subac- tions, forming a hierarchy; and agents may have individual plans to perform some of the subactions. These ideas are combined in an actual implemented framework in STEAM [Tambe, 1997], where agents build a hierarchy of joint intentions when performing tasks in three dierent domains. STEAM is further extended in Tambe et al. [2002], where a markov decision process (MDP) model is proposed, enabling agents to autonomously decide when to transfer control (i.e., decision-making) to humans or other agents. A common approach to coordination is to consider a centralized controller, that plans beforehand the best action each agent should take at each world state (i.e., plans oine); and the agents merely execute the actions ascribed to them in the plan. These are nor- mally studied in the Dec-POMDP framework (decentralized partially observable Markov decision process) [Bernstein et al., 2002], and policy iteration is one of the main methods to compute the global plan (i.e., a policy) [Bernstein et al., 2009]. Task allocation is also an important approach to coordinate a team. A problem or goal is distributed in a set of tasks, and each agent must execute a subset of those. The contract net protocol [Smith, 1980] is a very common technique for task allocation, where agents can be manager and contractors. A manager receives bids and allocate a task to 12 the most appropriate agent. Upon being allocated a task, the agent (contractor) must execute it, but it can divide those in subtasks and also act as a manager to allocate those. A similar approach is the auction based task allocation mechanism [Bertsekas, 1992], where agents submit bids to compete for tasks, like in actual auctions. 
For large teams, especially swarms, a common approach is to specify individual be- haviors for each agent, in order for a desired global behavior to emerge in a distributed fashion [Marcolino and Chaimowicz, 2008, 2009a,b]. Designing the individual behaviors in an automatic way, however, is still an open challenge. Decision-centered teamwork is a fundamentally dierent form of teamwork from the ones previously described. Instead of each agent executing one action, or solving a certain subtask of a common goal, the team must decide on a single team action at each world state towards problem solving. However, it is still a form of teamwork, as all agents are fully cooperative, and they share a common goal. Aggregating their opinions at each step of problem-solving is their method of achieving that common goal. 2.2.1 Team Formation Team formation is also an active topic of research in multi-agent teamwork. However, the main focus has been in modeling tasks as requiring a set of skills; and the team formation problem is considered as selecting a set of agents with all the necessary skills but the minimum cost [He and Ioerger, 2003, Guttmann, 2008]. More recent work go beyond a simple sum of skills and also models the synergy of a group [Liemhetcharat and Veloso, 2012], or how to automatically congure a network of agents [Gaston and desJardins, 2005]. In Matthews et al. [2012], a team formation procedure is presented for a class of online football prediction games, and the system is able to play successfully against a large number of human players. Recently, ad-hoc teamwork is also surging as an important area of study in the multi- agent literature [Stone et al., 2013], where agents must coordinate without being pre- programmed to work together. Researchers have considered, for example, how to lead a group to the optimal joint action with a new ad-hoc agent [Agmon and Stone, 2012], or how to in uence a ock of agents [Genter and Stone, 2014]. Barret and Stone [2015] have demonstrated ad-hoc teamwork in practice in the simulated robot soccer domain. These works, however, do not analyze the importance of diversity when forming agent teams. 2.3 Social Science Forming teams to take decisions has also been studied in the social sciences. For example, Hong and Page [2004] is an impactful work showing the importance of diversity when 13 forming (human) teams. Even though recently some of the mathematical arguments were put into question [Thompson, 2014], it remains as a mile-stone on the study of the importance of diversity, as many researchers were in uenced by their work [Luan et al., 2012, Lakhani et al., 2007, Krause et al., 2011], showing the importance of diversity in dierent settings. In their model, each agent has a set of local minima that they reach while trying to maximize an objective function. The agents can improve the solution from the local minima of their team members, therefore the search of a team stops only in the intersection of the local minima of all agents. By using a large number of diverse agents the system is able to converge to the optimal solution. Their model, however, does not cover situations where agents are unable to improve the solution from their team members' local minima. This can happen, for example, when we use existing software, that were not architectured to collaborate in this way or when there are time constraints. Therefore, there are many situations where the agents have to collaborate in other ways, such as voting. 
If a team of agents votes, the system will not necessarily converge to an option in the intersection of their local minima. However, as I will show, it is still possible for a diverse team to play better than a uniform strong team. A more recent model to analyze diversity was proposed in LiCalzi and Surucu [2012]. It is an equivalent model to Page's and still do not overcome the limitations previously described. In Braouezec [2010], the authors show the benets of diverse agents voting to estimate the optimum of a single peaked function. In this thesis I am dealing with a harder problem, as the function to be optimized changes at every iteration. Another work that uses voting to study diversity is West and Dellana [2009], but they assumed that Page's model would work in a voting context, and do not propose a new model. Lamberson and Page [2012] study diversity in the context of forecasts. They assume that solutions are represented by real numbers, and a team converges to the average of the opinion of its members. Hence, they do not capture domains with discrete solutions, and the model also does not cover teams of voting agents. Prediction markets are also studied in the social sciences. They are markets where the users bet on possible outcomes of real world events. For example, if a useru believes that an eventx will take place with probabilityp u (for instance, a certain candidate winning an election), then she can oer to buy a share onx with a pricep u . Similarly, a user can also oer to buy a share that an event x will not take place, and the price p u will correspond to the probability of the event not taking place. The nal market price of the event x can be interpreted as the probability ofx actually taking place in the world. Erikson and Wlezien [2008], for example, compare the accuracy of prediction markets in determining the nal outcome of elections with pools. Manski [2006] presents the rst theoretical study concerning to what extent the market prices can be actually understood as the 14 probabilities of events occurring. A historical analysis of political prediction markets in the USA is presented at Rhode and Strumpf [2004]. In this thesis, however, I will focus on voting as a way to aggregate opinions, rather than markets. 2.4 Machine Learning Combining multiple classiers through voting is a very common technique in machine learning [Polikar, 2012]. Such systems are called Ensemble Systems. Traditionally, each classier runs the same algorithm, but trained with dierent subsets of items, or dierent features. The way that each subset of items is selected for each classier denes dierent ensemble system approaches, for example: bagging, boosting, or ada-boost; while mixture of experts trains each classier in a dierent subspace of the feature space. Combining multiple classiers has been a very active research area. For example, Sylvester and Chawla [2005] uses a Genetic Algorithm to learn an optimal set of weights when combining multiple classiers, Chiu and Webb [1998] uses voting to predict the future actions of an agent, and AL-Malaise et al. [2014] use ensembles to predict the performance of a student. Diversity is known to be important when forming an ensemble, and some systems try to minimize the correlation between the classiers while training [Chen and Yao, 2009]. Still, an important problem is how to form the ensemble system, i.e., how to pick the classiers that lead to the best predictions [Fu et al., 2012]. 
My models allow us to make many predictions about teams as the action space and/or the number of agents changes, to compare the rates of change of the performance of different teams, and to analyze agent teams in the context of design problems. To the best of my knowledge, there are no models similar to mine in the machine learning literature. Additionally, team assessment is also important for ensembles, as we need techniques to evaluate whether the current ensemble is performing well in order to change it if necessary. In Chapter 8 I will perform experiments in predicting the final performance of an ensemble system.

2.5 Team Assessment

As in Chapter 8 I discuss a technique for assessing the performance of a team of voting agents, in this section I discuss relevant literature on team assessment. Traditional team assessment methods rely heavily on tailoring for specific domains. Raines et al. [2000] present a method to build automated assistants for post-hoc, offline team analysis, but domain knowledge is necessary for such assistants. Other methods for team analysis are heavily tailored for robot soccer, such as Ramos and Ayanegui [2008], who present a method to identify the tactical formation of soccer teams (number of defenders, midfielders, and forwards). Mirchevska et al. [2014] present a domain-independent approach, but they are still focused on identifying opponent tactics, not on assessing the current performance of a team.

In the multi-agent systems community, we can see many recent works that study how to identify agents that present faulty behavior [Khalastchi et al., 2014, Lindner and Agmon, 2014, Tarapore et al., 2013]. Other works focus on verifying correct agent implementation [Doan et al., 2014] or monitoring the violation of norms in an agent system [Bulling et al., 2013]. Some works go beyond the agent level and verify whether the system as a whole conforms to a certain specification [Kouvaros and Lomuscio, 2013], or verify properties of an agent system [Hunter et al., 2013]. However, a team can still have a poor performance and fail to solve a problem even when the individual agents are correctly implemented, no agent presents faulty behavior, and the system as a whole conforms to all specifications. Sometimes even correct agents might fail to solve a task, especially embodied agents (robots), which can suffer sensing or actuation problems. Kaminka and Tambe [1998] present a method to detect clear failures in an agent team by social comparison (i.e., each agent compares its state with its peers). Such an approach is fundamentally different from this work, as we are detecting a tendency towards failure for a team of voting agents (caused, for example, by simple lack of ability, or of processing power, to solve the problem), not a clearly problematic situation that could be caused by imprecision or failure of the sensors or actuators of an agent/robot. Later, Kaminka [2006], Kalech and Kaminka [2007], and Kalech et al. [2011] study the detection of failures by identifying disagreement among the agents. In my case, however, disagreements are inherent in the voting process. They are easy to detect, but they do not necessarily mean that a team is immediately failing, or that an agent presents faulty behavior or a faulty perception of the current state.

There is also a body of work that focuses on analyzing (and predicting the performance of) human teams playing sports games.
For example, Quenzel and Shea [2014] learn a prediction model for tied NFL American football games, where they use logistic regression to predict the final winner. They study the coefficients of the regression model to determine which factors affect the final outcome with statistical significance. Heiny and Blevins [2011] use discriminant analysis to predict which strategy an American football team will adopt during the game. In soccer, Bialkowski et al. [2014] analyze data from games to automatically identify the roles of each player, and Lucey et al. [2015] also use logistic regression to predict the likelihood of a shot scoring a goal. We can also find examples in basketball. Maheswaran et al. [2012] use a logistic regression model to predict which team will be able to capture the ball in a rebound, while Lucey et al. [2014] study which factors are important when predicting whether a team will be able to perform an open 3-point shot or not.

2.6 Other Multiple Algorithms Techniques

There is a vast literature on using multiple algorithms to solve complex problems. For example, in algorithm portfolios [Gomes and Selman, 2001], multiple programs run in parallel to solve a single problem instance, as one of them may be the fastest for the particular instance. In multi-start local search algorithms [Gyorgy and Kocsis, 2011], many searches run in parallel, and we must dynamically decide whether to allocate computational resources to a search process or to initialize a new search. There are also works about multiple experts systems, where predictions are performed by choosing one among multiple experts' advice [Cesa-Bianchi and Lugosi, 2006]. In the robotics literature, multi-heuristic A* search has recently been proposed [Aine et al., 2014], where multiple heuristics are used simultaneously to find a bounded suboptimal solution.

Concerning distributed optimization, Chapter 5 is related to the study of distributed genetic algorithms [Knysh and Kureichik, 2010]. The experimental section on agent teams for design (Section 5.5) relates to the "island model", where populations evolve concurrently. This model has been improved recently by Osaba et al. [2015], with a technique where some of the populations stop evolving temporarily in order to focus the search on promising ones. Normally, however, the populations interact by transferring offspring, not by voting, and a theoretical study of voting teams that must maximize the number of optimal solutions was never performed. Other recent works study how to run genetic algorithms on GPUs in order to solve classical NP-hard problems or navigate UAVs [Cekmez et al., 2013, 2014].

2.7 Social Networks

As in Chapter 7 I will focus on the problem of influencing social networks that are not fully known in advance, I discuss here related works in influence maximization. The influence maximization problem has recently been a very popular topic of research. Normally, the main motivation is viral marketing in social media (like Facebook or MySpace). Hence, previous works assume full knowledge of the social network graph. The classical result is Kempe et al. [2003], where they study the approximation guarantee of a greedy algorithm for the "independent cascade model", which will be described in Chapter 7. Golovin and Krause [2010] extended that result to the case where we are able to observe which nodes are already influenced or not before picking each one. However, they still assume full knowledge of the graph. Similarly, Dhamal et al.
[2015] study the specific situation where we select two subsets of nodes at two different phases, and we can observe the network between the phases. Cohen et al. [2014] focus on how to quickly estimate the potential spread of one node, since running simulations (as needed by the greedy algorithm) in large graphs is computationally expensive. A different view was studied by Yadav et al. [2015], who analyze a model where nodes try to influence their neighbors multiple times, and each edge has an existence probability. Here, however, I will deal with a different kind of uncertainty, as in my model whole portions of the graph are completely unknown.

Some works in the AAMAS (multi-agent systems) community study variations of the original influence model. For example, Pasumarthi et al. [2015] study the problem where the objective is to maximize influence not in the whole social network graph, but only in a subset of nodes which represent targeted consumers of a certain product. Anagnostopoulos et al. [2015] study the case where multiple entities compete to spread influence in order to convince consumers to buy their respective products. Maghami and Sukthankar [2012] consider influence maximization under more realistic assumptions than traditional models: they assume agents belong to social groups, and the ones of the same group are more likely to influence each other. Moreover, they assume multiple products, and that each agent has a certain probability of buying a particular one. Li and Jiang [2014] study a different way to increase the realism: they consider graphs with multiple edge types, modeling different connection types between people (for example, Facebook or real-life interaction). Tsang and Larson [2014], instead of focusing on the problem of selecting a subset of nodes, run several experiments to study the evolution of social networks, assuming that each node has a degree of adoption (instead of a binary value representing influenced or not influenced), and nodes are more likely to be influenced by others with a similar adoption degree. However, even though all these works are more realistic than the traditional models, they still assume full knowledge of the social network graph.

Chapter 7 is also related to the problem of submodular optimization (as influence is a submodular function) by selecting elements without knowledge of the whole set. Badanidiyuru et al. [2014] present an algorithm for selecting the best subset, given elements arriving from a "stream". Hence, in their case, given enough time the whole set would be seen, and which elements are discovered does not depend on which ones are selected. My problem is also related to the classical max-k-cover problem [Khuller et al., 1999], where we must pick k subsets that maximize our coverage of a set. In that case, however, the elements of each subset are known. There is also an online version of the max-k-cover problem [Alon et al., 2009], where we pick one subset at a time, but an adversary fixes which element must be covered next. However, the set and the available subsets are still known in advance. Similarly, Grandoni et al. [2008] study the case of covering a set whose elements are randomly chosen from a universe of possible elements. Another important related problem is the submodular secretary problem [Bateni et al., 2013], where we again pick a subset to optimize a submodular function. However, in that case, we receive one element at a time, and we must make an irrevocable decision of either keeping it or not.
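Since the greedy algorithm of Kempe et al. [2003] and the independent cascade model recur throughout this discussion (and again in Chapter 7), the sketch below illustrates the standard approach on a fully known graph: Monte Carlo simulation of independent cascades inside a greedy seed-selection loop. This is a generic illustration under my own simplifying assumptions (a single propagation probability per edge and a small number of simulations), not the algorithm developed in this thesis.

```python
import random

def simulate_cascade(graph, seeds, p=0.1):
    """One independent-cascade run; graph maps node -> list of neighbors."""
    active, frontier = set(seeds), list(seeds)
    while frontier:
        node = frontier.pop()
        for neighbor in graph.get(node, []):
            if neighbor not in active and random.random() < p:
                active.add(neighbor)
                frontier.append(neighbor)
    return len(active)

def expected_spread(graph, seeds, p=0.1, runs=200):
    return sum(simulate_cascade(graph, seeds, p) for _ in range(runs)) / runs

def greedy_influence_max(graph, k, p=0.1, runs=200):
    """Pick k seeds, each time adding the node with the largest estimated marginal gain."""
    seeds = []
    for _ in range(k):
        best_node, best_gain = None, -1.0
        for node in graph:
            if node in seeds:
                continue
            gain = expected_spread(graph, seeds + [node], p, runs)
            if gain > best_gain:
                best_node, best_gain = node, gain
        seeds.append(best_node)
    return seeds

# Hypothetical toy network.
toy = {0: [1, 2], 1: [2, 3], 2: [3], 3: [4], 4: []}
print(greedy_influence_max(toy, k=2))
```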
The problem of learning a graph online was studied in the context of security games by Barth et al. [2010], where there is a possible attack graph with unknown edges. They learn edges when they are used by an attacker for the first time, and repeatedly update the probability of protecting each edge using an online learning algorithm [Freund and Schapire, 1999]. In my case, however, we cannot update weights iteratively, as we pick a node at most one time.

Finally, Chapter 7 also relates to sequential decision making with multiple objectives [Roijers et al., 2013]. Here I do not aim at computing an optimal policy (which is computationally expensive), but at studying a greedy method, similar to other works in influence maximization. My algorithm is a scalarization over two objectives, but such a method was never studied before to influence and map social networks.

Part II

Agent Selection

Chapter 3

Diversity Beats Strength?

I exclude no one
I am strengthened by all
My name is Diversity and yes I stand tall.
Recognize me and keep me in the mix
Together there's no problem that we can't fix.
I am your best hope towards true innovation
And to many, I reflect hope and inspiration.

(Charles Bennafield)

3.1 Introduction

Team formation is essential when dealing with a multi-agent system. Given limited resources, we must select a strong team to deal with a complex problem. Many works model team formation as selecting a team that accomplishes a certain task with the maximum expected value, given a model of the capabilities of each agent [Nair and Tambe, 2005, Guttmann, 2008]. Other works go beyond a simple sum of skills, for example by considering synergistic effects in a team of agents [Liemhetcharat and Veloso, 2012] or studying how to automatically configure a network of agents [Gaston and desJardins, 2005].

After forming a team, its members must work together. There are many different ways for a team to coordinate. One common and simple way is to use voting. By voting, a team of agents can get closer to finding the best possible decision in a given situation [List and Goodin, 2001]. One voting iteration might not be enough; sometimes the agents must vote continuously in many different scenarios. Consider, for example, agents that are cooperating in a board game [Obata et al., 2011, Soejima et al., 2010], deciding together on stock purchases across different economic scenarios, or even picking items to recommend to a large number of users [Burke, 2002]. This situation imposes a conflict for team formation: should we focus on the diversity of the team or on the strength of each individual member?

Previous works do not address this issue. Diversity is proposed as an important concept for team formation in the fields of Economics and Social Science [Hong and Page, 2004, LiCalzi and Surucu, 2012]. However, Hong and Page [2004] and LiCalzi and Surucu [2012] assume a model where each agent brings more information, and the system converges to one of the best options known by the group. When a team votes to decide its final opinion, their model and theorems no longer hold. The current literature on voting assumes a model where agents have a fixed probability of taking the best action [Condorcet, 1785, List and Goodin, 2001, Young, 1995, Conitzer and Sandholm, 2005, Xia, 2011], and under that model it is not possible to show any advantage in having a diverse team of agents. My experiments show, however, that a diverse team can outperform a uniform team of stronger agents.
It is necessary to develop, therefore, a new model to analyze a team of voting agents. In this chapter, I present a new model of diversity and strength for a team of voting agents. The fundamental novelty of my model is to consider a setting with multiple world states, where each agent has different performance levels (characterized by different probability distributions) across world states. Under this model, I can show that a team of diverse agents can perform better than a uniform team composed of strong agents. I present the necessary conditions for a diverse team to play better than a uniform team, and study optimal voting rules for a diverse team. I show synthetic experiments with a large number of teams that demonstrate that both diversity and strength are important to the performance of a team.

I also show results in one of the main challenges for Artificial Intelligence: Computer Go. Go is an iterative game, and the possible board states can represent a great variety of different situations, in such a way that the relative strength of different Go-playing software changes according to the board state. Therefore, we can use my model to study a team of agents voting to play Computer Go. By using a diverse team I am able to increase the winning rate against Fuego (one of the strongest Go programs) by 18.7%, and the diverse team could play 11% better than a team of copies of Fuego. Moreover, the diverse team plays 15.8% better than one of the versions of parallelized Fuego. I also improve the performance of the diverse team by 12.7% using one of my proposed voting rules. Therefore, I effectively show that a team of diverse agents can have competitive strength, and even play better, than a uniform team composed of stronger agents. My new model provides a theoretical explanation for my results.

3.2 Methodology

Let $\Phi$ be a set of agents $\phi_i$ voting to decide an action $a$ in the set of possible actions $A$, and let $\Omega$ be a set of world states $\omega_j$. I assume that we can rank the actions from best to worst, and $U_j$ is the vector of expected utilities of the actions in world state $\omega_j$, ordered by rank. The agents do not know the ranking of the actions, and will vote according to some decision procedure, characterized by a probability distribution function (pdf) over action ranks. Hence, each agent $\phi_i$ has a pdf $V_{i,j}$ for deciding which action to vote for in state $\omega_j$. Agents that have the same $V_{i,j}$ in all world states will be referred to as copies of the same agent.

Let $\pi_j$ be the likelihood of world state $\omega_j$. If we expect the world states to be equally frequent, we can use $\pi_j = 1/|\Omega|$. I define strength as the weighted average of the expected utility of an agent or a team. It is given by the following dot product: $s = \sum_{\omega_j \in \Omega} \pi_j \, V_j \cdot U_j$, where $V_j$ is the pdf of the agent/team in world state $\omega_j$. $V_j$ can be calculated given a team of agents and a voting rule. A voting rule is a function that, given the (single) votes of a team of agents, outputs an action.

I define the team formation problem as selecting from the space of all possible agents a set of $n$ agents that has the maximum strength in the set of world states $\Omega$. An application does not necessarily know $V_{i,j}$ for all agents and for all world states. In this chapter, I focus on showing that the naive solution of forming a team by selecting the strongest agents (or copies of the best agent) is not necessarily the optimal solution. Therefore, I am introducing a new problem to the study of team formation.
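As a concrete reading of these definitions, the sketch below computes the strength $s = \sum_{\omega_j \in \Omega} \pi_j \, V_j \cdot U_j$ of an agent (or of a team, once its pdf per world state is known) and then brute-forces the team formation problem over a small agent pool. It is only an illustrative sketch under my own assumptions (pdfs stored as plain lists, and a user-supplied function that turns a team into a pdf, e.g., the plurality enumeration sketched a little further below); it is not an implementation from the thesis.

```python
from itertools import combinations

def strength(pdfs_by_state, utilities_by_state, priors):
    """s = sum_j pi_j * (V_j . U_j) for one agent or one already-aggregated team."""
    return sum(
        prior * sum(v * u for v, u in zip(pdf, util))
        for pdf, util, prior in zip(pdfs_by_state, utilities_by_state, priors)
    )

def best_team(agent_pool, n, team_pdf, utilities_by_state, priors):
    """Brute-force team formation: pick the n agents whose joint pdf maximizes strength.

    `team_pdf(team)` is assumed to return the team's pdf for each world state under
    some voting rule."""
    return max(
        combinations(agent_pool, n),
        key=lambda team: strength(team_pdf(team), utilities_by_state, priors),
    )
```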
I define diversity as how different the probability distributions of the agents in $\Phi$ are over the set of world states $\Omega$:
$$d = \frac{1}{|\Phi|^2} \sum_{\omega_j \in \Omega} \sum_{\phi_i \in \Phi} \sum_{\phi_k \in \Phi} \pi_j \, H(V_{i,j}, V_{k,j}),$$
where $H$ is a distance measure between two pdfs. In this chapter, I use the Hellinger distance [Hellinger, 1909], given by:
$$H(V_{i,j}, V_{k,j}) = \frac{1}{\sqrt{2}} \sqrt{\sum_{a \in A} \left(\sqrt{V_{i,j}(a)} - \sqrt{V_{k,j}(a)}\right)^2}.$$

At each iteration, each agent will examine the current world state and submit its (single) opinion about which one should be the next action. The opinions are then combined using plurality voting, which picks as the winner the option that received the most votes. I consider in this chapter three different voting rules: simple - break ties randomly; static - break ties in favor of the strongest agent overall; optimal - break ties in favor of the strongest agent of each world state. I consider the static voting rule because in some applications we might have a clear idea of which is the strongest agent overall, but the information of which is the strongest agent for a given world state might not be available. I will encounter this situation in the Computer Go domain, as will be clear in Section 3.3.2. This voting procedure repeats at every iteration, until the end, when the system can obtain a reward.

Agent     State 1   State 2   State 3   State 4   Strength
Agent 1      1         0         1         1        0.75
Agent 2      0         1         1         0        0.5
Agent 3      1         1         0         0        0.5
Agent 4      1         1         0         1        0.75
Agent 5      0         0         1         1        0.5

Table 3.1: A team of deterministic agents that can reach perfect play under simple voting. "1" indicates that the agent plays the perfect action.

3.2.1 Diversity Beats Strength

I first present examples to demonstrate that a diverse team can play better than a uniform team. Let's consider the simplest case first, when all agents are deterministic. The team made of copies of the strongest agent will play only as well as the strongest agent, no matter how many members we add to the team. However, a team of diverse agents can overcome the strongest agent, and even reach perfect play, as we increase the number of agents. Consider, for example, the team in Table 3.1. This diverse team of 5 agents will reach perfect play under simple voting, while copies of the best agent (Agent 1 or Agent 4) will be able to play well in only 3 out of 4 world states, no matter how many agents we use in the team.

We can easily change the example to non-deterministic agents, by slightly decreasing the probability of them playing their deterministic action. An example is shown in Table 3.2, where I show the pdf of the agents for each world state. I considered the utility vector $\langle 1, 0, 0 \rangle$ for all world states. The resulting strength of the teams is very similar to the deterministic case. Assuming all world states are equally likely, the strength of the diverse team is 0.9907, while copies of the best agent have strength 0.7499. Therefore, it is possible for a team of weak but diverse agents to overcome a uniform team of stronger agents, when in certain states the individual agents are stronger than the overall strongest agent. Even if we make the number of agents go to infinity, copies of the best agent will still be unable to perform the best action in one world state, and will play worse than the diverse team of only five agents. This situation is not considered in the Condorcet Jury Theorem, neither in the classical nor in the extended version, because they assume independent agents with a fixed pdf. Therefore, in the previous models, we would not be able to show the importance of diversity.
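The following sketch makes the voting rules concrete: it enumerates every joint vote of a (small) team and accumulates the team's pdf under plurality voting, with either random tie-breaking (simple) or tie-breaking in favor of the strongest agent involved in the tie (static/optimal, depending on whether the agent ordering is global or per world state). It is my own illustrative code, not the implementation used in the thesis; combined with the strength function sketched above, it can be used to check the example strengths reported for Tables 3.1 and 3.2.

```python
from itertools import product
from collections import Counter

def team_pdf_plurality(agent_pdfs, agent_order=None):
    """Exact pdf of a team's choice under plurality voting, for one world state.

    `agent_pdfs` is a list of per-agent pdfs over the same action ranks.
    With `agent_order=None`, ties are broken uniformly at random ("simple").
    Otherwise `agent_order` lists agent indices from strongest to weakest, and a
    tie is broken in favor of the strongest agent involved in it."""
    n_actions = len(agent_pdfs[0])
    team = [0.0] * n_actions
    supports = [[a for a, p in enumerate(pdf) if p > 0] for pdf in agent_pdfs]
    for votes in product(*supports):
        prob = 1.0
        for agent, action in enumerate(votes):
            prob *= agent_pdfs[agent][action]
        counts = Counter(votes)
        top = max(counts.values())
        tied = [a for a, c in counts.items() if c == top]
        if len(tied) == 1 or agent_order is None:
            for a in tied:
                team[a] += prob / len(tied)
        else:
            winner = next(votes[i] for i in agent_order if votes[i] in tied)
            team[winner] += prob
    return team

# Example: state 1 of Table 3.2 (three actions, five agents).
state1 = [[0.99, 0.01, 0.0], [0.0, 0.99, 0.01], [0.99, 0.005, 0.005],
          [0.99, 0.01, 0.0], [0.0, 0.3, 0.7]]
print(team_pdf_plurality(state1))  # most of the probability mass falls on the best action
```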
Agent     State 1                State 2                State 3                State 4
Agent 1   ⟨0.99, 0.01, 0⟩        ⟨0, 0.99, 0.01⟩        ⟨0.99, 0, 0.01⟩        ⟨0.99, 0.01, 0⟩
Agent 2   ⟨0, 0.99, 0.01⟩        ⟨0.99, 0.01, 0⟩        ⟨0.99, 0, 0.01⟩        ⟨0, 0.01, 0.99⟩
Agent 3   ⟨0.99, 0.005, 0.005⟩   ⟨0.99, 0.005, 0.005⟩   ⟨0, 0.5, 0.5⟩          ⟨0, 0.5, 0.5⟩
Agent 4   ⟨0.99, 0.01, 0⟩        ⟨0.99, 0.004, 0.006⟩   ⟨0, 0.4, 0.6⟩          ⟨0.99, 0.003, 0.007⟩
Agent 5   ⟨0, 0.3, 0.7⟩          ⟨0, 0.7, 0.3⟩          ⟨0.99, 0.005, 0.005⟩   ⟨0.99, 0.002, 0.008⟩
(a) Agents' pdfs

Agent     Strength
Agent 1   0.7425
Agent 2   0.4950
Agent 3   0.4950
Agent 4   0.7425
Agent 5   0.4950
(b) Agents' strength

Table 3.2: A team of non-deterministic agents that can overcome copies of the best agent.

3.2.1.1 Necessary Conditions

I present a formal proof of the conditions necessary for a diverse team to play better than copies of the best agent, under the simple voting rule. If the conditions of the theorem are not met, we can simply use copies of the best agent as the optimal team. To simplify the presentation of the proof, we will consider a utility function with a value of 1 for the optimal action and 0 for the other actions. That is, we will consider the optimal team in a fixed world state as the team that has the highest probability of performing the optimal action. Let $\phi_{best}$ be the strongest agent in $\Phi$, and $a_{best}$ be the best action in a given world state.

Theorem 3.2.1 For a diverse team to be the optimal team under the simple voting rule it is necessary that at least one agent in $\Phi$ has a higher probability of taking the best action than $\phi_{best}$ in at least one world state, or a lower probability of taking a suboptimal action than $\phi_{best}$ in at least one world state.

Proof: I develop the proof by showing that copies of the best agent of a given world state will be the optimal team in that world state. Therefore, it is necessary that the agents in the diverse team play better than the best agent overall in at least one world state. Let $\phi_{best,j}$ be the strongest agent in world state $\omega_j$. Let's define the pdf of this agent as $\langle p_1, \ldots, p_k \rangle$, where $p_1$ is the probability of taking the best action. I will show that a team of $n$ copies of $\phi_{best,j}$ doing simple voting will have a higher probability of taking the best action than a team of $n$ agents composed of $x$ copies of $\phi_{best,j}$ and $m$ agents $\phi_i$ doing simple voting, where the probabilities of each $\phi_i$ are given by $\langle p_1 - \epsilon_i, p_2 + \epsilon_{i,2}, \ldots, p_k + \epsilon_{i,k} \rangle$, with $\epsilon_{i,l} \geq 0\ \forall l \in (2,k)$ and $\sum_{l=2}^{k} \epsilon_{i,l} = \epsilon_i$.

Given a team of agents, let them all vote. We will start with a team of $x$ copies of agent $\phi_{best,j}$. We will perform $m$ iterations, and at each one we will add either another agent $\phi_{best,j}$ or agent $\phi_i$, where $i$ is the current iteration. Let $v_{i-1}$ be the current vote result. The result of $v_{i-1}$ is either: (i) victory for $a_{best}$, (ii) tie between $a_{best}$ and other options, (iii) defeat for $a_{best}$.

(i) If $v_{i-1}$ is a victory for $a_{best}$, the new agent can change the result only when it votes for another option. Suppose $a_l$ is an option that, upon receiving one more vote, will change a victory for $a_{best}$ into a tie between $a_{best}$ and $a_l$. Agent $\phi_{best,j}$ will vote for option $a_l$ with probability $p_l$, while agent $\phi_i$ will vote for option $a_l$ with probability $p_l + \epsilon_{i,l}$. Therefore, if $v_{i-1}$ is such that one vote can change a victory for $a_{best}$ into a tie between $a_{best}$ and other options, agent $\phi_i$ will have a higher probability of changing a victory for $a_{best}$ into a tie between $a_{best}$ and other options.

(ii) If $v_{i-1}$ is a tie between $a_{best}$ and other options, agent $\phi_{best,j}$ will break the tie in favor of $a_{best}$ with probability $p_1$, while agent $\phi_i$ will do so with probability $p_1 - \epsilon_i$.
Therefore, agent $\phi_{best,j}$ will have a higher probability of breaking the tie in favor of $a_{best}$. Moreover, if $a_l$ is an option that is currently tied with $a_{best}$, agent $\phi_{best,j}$ will vote for $a_l$ with probability $p_l$, while agent $\phi_i$ will do so with probability $p_l + \epsilon_{i,l}$. Therefore, agent $\phi_i$ will have a higher probability of changing a tie between $a_{best}$ and other options into a defeat for $a_{best}$.

(iii) If $v_{i-1}$ is a defeat for $a_{best}$, agent $\phi_{best,j}$ will vote for $a_{best}$ with probability $p_1$, while agent $\phi_i$ will vote for $a_{best}$ with probability $p_1 - \epsilon_i$. Therefore, if $v_{i-1}$ is such that one vote can change a defeat for $a_{best}$ into a tie between $a_{best}$ and other options, agent $\phi_{best,j}$ will have a higher probability of changing a defeat for $a_{best}$ into a tie between $a_{best}$ and other options.

In all three cases, agent $\phi_{best,j}$ leads to a higher increase in the probability of picking $a_{best}$ than agent $\phi_i$. Therefore, up to any iteration $i$, copies of $\phi_{best,j}$ will have a higher probability of playing the best action than a diverse team. Hence, if $\phi_{best,j} = \phi_{best}\ \forall j$, then copies of the best agent $\phi_{best}$ will be the best team in all world states, and therefore it will be the optimal team. Therefore, for a diverse team to perform better, at least one agent must have either a higher probability of taking the best action or a lower probability of taking a suboptimal action than $\phi_{best}$ in at least one world state.

This theorem, however, only gives the necessary conditions for a diverse team to be stronger than a non-diverse team. The sufficient conditions will depend on which specific game the agents are playing. Basically, given the pdfs of the agents for a set of world states, we can calculate the pdf of both the diverse team and the team made of copies of the best agent. If the diverse team has a higher probability of taking the best action in a subset of the world states that is enough for it to play better, considering that it will have a lower probability of taking the best action in the complementary subset, then the diverse team will play better than copies of the best agent.

3.2.1.2 Optimal Voting Rules

In my next theorem, I show that, given some conditions, the optimal voting rule for a diverse team is to use plurality voting, but break ties in favor of the strongest agent that participates in the tie. Basically, we have to assume that all agents are strong enough to contribute to the team, so no agent should be ignored. If there are harmful agents in the team, we can try to remove them until the conditions of the theorem are satisfied. Again, we consider a utility function with a value of 1 for the optimal action and 0 for the other actions. Given a team with size $n$, our conditions are:

Assumption 1 (Weak agents do not harm) For any subset of $\Phi$ with an even number of agents $n'$, and for a fixed world state $\omega_j$, let $\phi'_{best,j}$ be the best agent of the subset. We divide the agents into two sets: $Weak$, containing the $n'/2 - 1$ agents that have the lowest probability of taking the best action and the highest probability of taking a suboptimal action, and $Strong$, containing the $n'/2$ agents that have the highest probability of playing the best action and the lowest probability of taking a suboptimal action (except for the best agent $\phi'_{best,j}$, which is in neither one of the sets). We assume that when all agents in $Weak$ and $\phi'_{best,j}$ vote together for an option $a_x$, and all agents in $Strong$ vote together for another option $a_y$, the probability of $a_x$ being the best action is higher than the probability of $a_y$ being the best action.
Assumption 2 (Strong agents are not overly strong) Given a fixed world state $\omega_j$, I assume that if $m_1$ agents voted for an action $a_x$ and $m_2$ agents voted for an action $a_y$, the probability of $a_x$ being the best action is higher than that of $a_y$ being the best action, if $m_1 > m_2$. If there is a situation where the opinion of a set of agents always dominates the opinion of another set, we can try to remove the dominated agents until the assumption holds true.

Theorem 3.2.2 The optimal voting rule for a team is to consider the votes of all agents, but break ties in favor of the strongest agent, if the above assumptions are satisfied.

Proof: By Assumption 2 we know that we are looking for a tie-breaking rule, as the action chosen by most of the votes should always be taken. Let's consider the sets and the voting result described in Assumption 1. Let $\langle p_1, \ldots, p_k \rangle$ be the pdf of agent $\phi'_{best,j}$, and let the pdfs of the other agents of the subset be $\langle p_1 - \epsilon_i, p_2 + \epsilon_{i,2}, \ldots, p_k + \epsilon_{i,k} \rangle$, with $\epsilon_{i,l} \geq 0\ \forall l \in (2,k)$ and $\sum_{l=2}^{k} \epsilon_{i,l} = \epsilon_i$. Let $b$ be a rank in $(2,k)$. The probability of $a_x$ being the best action is given by:
$$P_1 = \alpha \, p_1 \prod_{\phi_i \in Weak} (p_1 - \epsilon_i) \prod_{\phi_t \in Strong} (p_b + \epsilon_{t,b}),$$
where $\alpha$ is a constant (according to Bayes' theorem). The probability that $a_y$ is the best action is given by:
$$P_2 = \alpha \, p_b \prod_{\phi_i \in Weak} (p_b + \epsilon_{i,b}) \prod_{\phi_t \in Strong} (p_1 - \epsilon_t).$$

By Assumption 1, we have that $P_1 > P_2$. We can generate another voting pattern by making one agent $\phi_{weak} \in Weak$ vote for $a_y$ and one agent $\phi_{strong} \in Strong$ vote for $a_x$. The probability of $a_x$ being the best action will change to:
$$P'_1 = P_1 \, \frac{(p_1 - \epsilon_{strong})(p_b + \epsilon_{weak,b})}{(p_1 - \epsilon_{weak})(p_b + \epsilon_{strong,b})},$$
while the probability of $a_y$ being the best action will change to:
$$P'_2 = P_2 \, \frac{(p_1 - \epsilon_{weak})(p_b + \epsilon_{strong,b})}{(p_1 - \epsilon_{strong})(p_b + \epsilon_{weak,b})}.$$

As $(p_1 - \epsilon_{strong}) > (p_1 - \epsilon_{weak})$ and $(p_b + \epsilon_{weak,b}) > (p_b + \epsilon_{strong,b})$ by Assumption 1, we have that $P'_1 > P_1$. Similarly, as $(p_1 - \epsilon_{weak}) < (p_1 - \epsilon_{strong})$ and $(p_b + \epsilon_{strong,b}) < (p_b + \epsilon_{weak,b})$ by Assumption 1, we have that $P'_2 < P_2$. Therefore, assuming that $P_1 > P_2$, we have that $P'_1 > P'_2$. Hence, for all modifications that can be generated by switching one of the agents, it is better to break ties in favor of the strongest agent. We can use all these voting patterns as a base and apply the same process recursively, to generate all possible voting patterns with a tie. Therefore, it will always be better to break ties in favor of the strongest agent.

Now I consider voting patterns with a tie between more than two options. Let's suppose that in this case breaking ties in favor of the strongest agent ($\phi'_{best,j}$) is not the optimal voting rule. Therefore, we should break the tie in favor of some option $a_y$. This implies that $a_y$ has a higher probability of being the best action than $a_x$, the option chosen by the best agent. Now let's remove the agents that voted for options other than $a_x$ and $a_y$. This affects the probability of $a_x$ and of $a_y$ being the best action in the same way. Therefore, we should still break ties in favor of option $a_y$. However, we already showed that when there are two options we should break ties in favor of the strongest agent. Hence, we should break the tie in favor of option $a_x$. So, by contradiction, we see that if there is a tie between more than two options we should still break ties in favor of the strongest agent.
If the strongest agent of the team is not one of the agents involved in the tie, we can ignore the opinion of the strongest agent according to Assumption 2, and break the tie in favor of the strongest agent among the ones involved in the tie, because Assumption 1 applies to any subset of the agents.

An application may not have knowledge of the pdfs of the agents in individual world states. Therefore, I also study an approximation of the optimal voting rule, which breaks ties in favor of the strongest agent overall, instead of breaking ties in favor of the strongest agent in a given world state. In the next section we will see that both the optimal voting rule and this approximation improve the performance of a diverse team.

3.3 Results

3.3.1 Synthetic

I perform synthetic experiments using the quantal response (QR) model for the agents [McKelvey and Palfrey, 1995]. The quantal response model is a pdf from behavioral game theory used to approximate how human beings (or non-rational players) behave while playing a game. It states that the probability of playing the best action is the highest, and that this probability decays exponentially as the utility of the action gets worse. I use the QR model in my experiment because it is a convenient way to represent non-rational agents with different strengths playing a game with a great number of options. The pdf depends on a parameter, $\lambda$, that defines how rational (i.e., strong) the agent is. As $\lambda$ gets higher, the agent provides a closer approximation to a perfect player. I define a $\lambda_{ij}$ for each agent $i$ and world state $j$. I generated 1200 random teams of 4 agents, playing in 10 world states, and with 82 possible actions. I define each $\lambda_{ij}$ as a random number in the interval $(0, 7)$, according to a uniform distribution. For each team, we can calculate the diversity and the average strength of the agents, according to the equations defined earlier. In Figure 3.1, we can see the performance of each team, as a function of diversity and the strength of its members. The strength of a team can be calculated after we generate the pdf of the team, by calculating the probability of all possible situations where the system would pick a particular ranking position. I assume that all world states are equally likely, hence the strength of a team is the average over all world states. I used a utility vector that gives a value close to 1 to the best action, and a low value to the other actions.

Figure 3.1: 1200 random teams of 4 agents. Panels: (a) Simple Voting, (b) Static Rule, (c) Optimal Rule.

I performed a multiple linear regression for each voting rule. The following models were found: simple: $z = 0.09 + 1.48s + 0.45d$; static: $z = 0.03 + 1.36s + 0.55d$; optimal: $z = 0.09 + 0.92s + 1.29d$. The variable $s$ is the average strength of the team members, $d$ is the diversity of the team, and $z$ is the strength of the team. The coefficients of multiple determination ($R^2$) of the models are 0.96, 0.81, and 0.88, respectively. As can be seen, both diversity and strength had a positive weight. This shows that groups with more diversity are stronger, given a fixed strength for their members. It is interesting to note that the impact of diversity increases as we change the voting rule from simple to static, and from static to optimal. The mean strengths of all teams are 0.56 (0.08), 0.61 (0.08), and 0.74 (0.06), respectively. We can note that, as expected, simple had the lowest strength, followed by static, and optimal had the highest strength. The optimal voting rule is 30% stronger than simple voting on average.
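To illustrate the synthetic setup described above, the sketch below builds quantal-response pdfs and estimates a team's strength under simple plurality voting by Monte Carlo sampling. The team sizes, the uniform sampling of $\lambda$, and the shape of the utility vector follow the description above, but the code itself is my own approximation (the thesis computes team pdfs exactly), so treat it as a sketch rather than the experimental code; the utility values 1.0 and 0.05 are assumptions.

```python
import math
import random
from collections import Counter

def quantal_response_pdf(utilities, lam):
    """Quantal response: p(a) is proportional to exp(lam * U(a))."""
    weights = [math.exp(lam * u) for u in utilities]
    total = sum(weights)
    return [w / total for w in weights]

def plurality_winner(votes):
    """Simple rule: plurality with random tie-breaking."""
    counts = Counter(votes)
    top = max(counts.values())
    return random.choice([a for a, c in counts.items() if c == top])

def estimate_team_strength(lambdas, utilities_by_state, samples=5000):
    """Monte Carlo estimate of team strength; lambdas[i][j] is agent i's lambda in state j."""
    total = 0.0
    for _ in range(samples):
        j = random.randrange(len(utilities_by_state))  # world states equally likely
        utilities = utilities_by_state[j]
        votes = [
            random.choices(range(len(utilities)),
                           weights=quantal_response_pdf(utilities, agent[j]), k=1)[0]
            for agent in lambdas
        ]
        total += utilities[plurality_winner(votes)]
    return total / samples

# One random team in the setting above: 4 agents, 10 states, 82 actions.
utilities_by_state = [[1.0] + [0.05] * 81 for _ in range(10)]  # assumed utility vector
team = [[random.uniform(0, 7) for _ in range(10)] for _ in range(4)]
print(estimate_team_strength(team, utilities_by_state))
```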
3.3.2 Experiments in Computer Go

I also perform experiments with four Go programs: Fuego 1.1, GnuGo 3.8, Pachi 9.01, and MoGo 3, plus two (weaker) variants of Fuego (FuegoΔ and FuegoΘ), for a total of 6 different agents. These are all publicly available Go programs. Fuego is known to be the strongest among them. Fuego, Pachi and MoGo follow a UCT Monte Carlo Go algorithm [Gelly et al., 2006]. Fuego uses heuristics to simulate games during the Monte Carlo simulations. There are mainly 5 possible heuristics in Fuego's code. These heuristics have a hierarchical order, and the original Fuego agent follows the order <Atari Capture, Atari Defend, Lowlib, Pattern> (the heuristic called Nakade is not enabled by default). The first variant, FuegoΔ, follows the order <Atari Defend, Atari Capture, Pattern, Nakade, Lowlib>. The second variant, FuegoΘ, follows the order <Atari Defend, Nakade, Pattern, Atari Capture, Lowlib>. The memory available for FuegoΔ and FuegoΘ is half of the memory available for Fuego.

All results presented are obtained by playing 1000 9x9 Go games, on an HP dl165 with dual dodeca-core 2.33GHz processors and 48GB of RAM. I first present results when my system plays as white, against the original Fuego playing as black with its opening database. Then, I present results of my system playing as black, against the original Fuego playing as white with its opening database. I will compare the winning rates of different agents and teams when playing against the same opponent. When I say that a result is significantly better than another, I use a t-test with 1% significance level ($\alpha = 0.01$). I call a team composed of different Go programs "Diverse", or refer to it by the name of the voting rule that it uses ("Simple" or "Static"). The team of copies of the strongest agent (Fuego) will be called "Uniform". The copies are initialized with different random seeds; therefore, due to the nature of the search algorithms, they will not always choose the same movement. When I want to be explicit about the number of agents in a team, I add a number after the name of the team. "Diverse" is composed of Fuego, GnuGo, Pachi and MoGo when executed with 4 agents, and is composed of all agents when executed with 6 agents. I also work with a parallelized version of Fuego ("Parallel"), and I add a number after its name to indicate the number of threads.

Before introducing my results, I first analyze the agents under the classical voting theory and under my proposed theory. To simplify the analysis, I consider here the probability of playing the best move ($P_{best}$); therefore, I consider a utility vector with a value of 1 for the best move, and 0 for the other moves. I start with the classical voting theories. In order to estimate $P_{best}$, I use 1000 board states from my experiments. In 1000 games, I randomly choose a board state between the first and the last movement. I then ask Fuego to perform a movement in that state, but I give Fuego a time limit 50x higher than the default one. Therefore, Fuego is approximating how a perfect (or at least much stronger) player would play. To avoid confusion with the names, I call this agent Perfect. I then obtain Perfect's evaluation for all the positions of the board, and organize them into a ranking. I ran all agents in the selected 1000 board states and for each state I verify in which position of the ranking each agent would play.
If, instead of playing, the agent resigns, I randomly pick a different board state and regenerate the data for all agents, including Perfect's evaluation. Based on that, I can generate a histogram for all agents, shown in Figure 3.2. Assuming that the agents are independent, and that each one will choose a move according to the probability distribution corresponding to its histogram, we can calculate $P_{best}$ of any group and voting rule that we want. Basically, we have to calculate the probability of all the possible situations where the system would pick the best move. For a team of $k$ agents we have to calculate $O(n^{k-1})$ probabilities, where $n$ is the number of possible options. While for a team of 4 agents I am able to calculate the precise value, for a team of 6 agents I am going to show approximations.

Player    P_best          Team        P_best
Fuego     52.3%           Simple 4    57.5%
GnuGo     26.4%           Static 4    61.8%
Pachi     40.6%           Uniform 4   79.6%
MoGo      40.8%           Simple 6    71.1%
FuegoΔ    48.8%           Static 6    72.4%
FuegoΘ    47.7%           Uniform 6   86.6%
(a) Players               (b) Teams

Table 3.3: Probability to select the best move, for each player and each team.

In Table 3.3 we can see $P_{best}$ of each individual player and of all teams. The $P_{best}$ of the teams is higher than the $P_{best}$ of each one of the agents, and is higher for a team of 6 agents than for a team of 4 agents. This result is expected when we consider the extended version of the Condorcet Jury Theorem [List and Goodin, 2001], at least for a uniform team. According to the theorem, $P_{best}$ approaches 1 when the number of agents goes to infinity. However, we would also expect Uniform to perform better than Diverse. Would it be possible, then, for a diverse team to perform better than a uniform team?

Intuitively, we would expect that a uniform team would agree on certain moves much more often than a diverse team. And indeed, when we look at the graph of the frequency of the size of the set of agents that voted for the winning move (Figure 3.3), we can see that they are very different. On the x-axis I show the number of agents that agreed on the selected movement, and on the y-axis the frequency of each number considering all moves in the 1000 games. The expected size of the set for Diverse is 3.50, while for Uniform it is 4.43. Therefore, if Fuego plays badly in a certain board state, all copies of Fuego will also tend to vote for the same bad moves. In a diverse team, however, some agents could be able to play better in that particular situation. The extended Condorcet Jury Theorem assumes that agents are independent, but in fact their relative performances might change according to the state of the board.

Figure 3.2: Histogram of the agents, using real data. Panels (a)-(f) show the frequency of each ranking position for Fuego, GnuGo, Pachi, MoGo, FuegoΔ and FuegoΘ, respectively.

Figure 3.3: Expected size of the set of agents that vote for the winning move, with 6 agents and no opening database. Panels (a) Uniform and (b) Diverse plot the frequency of each size of the set of winning agents.
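A cheap way to approximate the team $P_{best}$ values in Table 3.3 from the individual histograms, under the same independence assumption described above, is Monte Carlo simulation rather than the exact $O(n^{k-1})$ enumeration. The sketch below is my own illustration of that idea; the histogram values in the example are placeholders, not the measured ones.

```python
import random
from collections import Counter

def simulate_p_best(rank_pdfs, samples=20_000):
    """Estimate the probability that plurality voting (random tie-break) picks rank 0.

    `rank_pdfs[i]` is agent i's probability distribution over ranking positions,
    estimated from its histogram; agents are assumed independent."""
    ranks = range(len(rank_pdfs[0]))
    hits = 0
    for _ in range(samples):
        votes = [random.choices(ranks, weights=pdf, k=1)[0] for pdf in rank_pdfs]
        counts = Counter(votes)
        top = max(counts.values())
        winners = [r for r, c in counts.items() if c == top]
        if random.choice(winners) == 0:
            hits += 1
    return hits / samples

# Placeholder histograms over 5 ranking positions (not the measured ones).
fuego = [0.52, 0.20, 0.12, 0.09, 0.07]
gnugo = [0.26, 0.25, 0.20, 0.16, 0.13]
print(simulate_p_best([fuego, fuego, fuego, fuego]))  # a "uniform" team of copies
print(simulate_p_best([fuego, gnugo, fuego, gnugo]))  # a mixed team
```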
Player    # Higher P_best
GnuGo     17% (12%)
Pachi     21% (11%)
MoGo      20% (7%)
FuegoΔ    25% (6%)
FuegoΘ    26% (6%)

Table 3.4: Weak agents can play better in some board states. In parentheses, I show when the difference in $P_{best}$ is 99% significant.

Now I analyze the agents according to my proposed theory. I will use Theorem 3.2.1 to justify that it is worth exploring a diverse team. If Fuego, the strongest agent, were always stronger in all board positions, then we could just use copies of Fuego as the optimal team. Therefore, I test whether all agents are able to play better than Fuego in some board positions. I selected 100 board states, and I played all agents 50 times for each board state. Based on my estimate of the best move (obtained from Perfect), I can calculate $P_{best}$ for each agent and for each board state. In Table 3.4, we can see in how many board states the agents have a higher $P_{best}$ than Fuego (in its default time limit). As can be observed, all agents are able to play better than Fuego in some board positions; therefore, it is possible for a diverse team to play better than copies of the best agent. As the number of board states where an agent plays better is not small, we can expect that a diverse team should be able to overcome the uniform team.

According to Theorem 3.2.2, if we assume that the weak agents (like GnuGo) are not weak enough to harm the system, and the strong agents (like Fuego and its variants) are not strong enough to dominate a subset of the agents, then the optimal voting rule is to break ties in favor of the strongest agent. However, during a game the system does not have access to the pdfs of the agents, and has no way to identify which is the strongest agent. Therefore, I present results using the static voting rule, which breaks ties in favor of the strongest agent overall. Based on my synthetic results, we can predict that static should perform better than simple. I also tried a weighted voting rule, which allowed me to empirically learn the best weights by a hill-climbing algorithm. The resulting rule was equivalent to the static voting rule.

We can see my results for white in Figure 3.4(a,b). Diverse plays significantly better than Fuego, with 6 agents or with the static voting rule. When I keep the opening database, Diverse plays significantly better than Uniform and Parallel with 6 agents. Without the opening database, Diverse still plays significantly better than Parallel with 6 agents, but the difference between Diverse and Uniform is not significant. Static is either significantly better than Simple, or the difference between them is not significant. In Figure 3.4(c,d) we can see the results for black. Again, Diverse plays significantly better than Fuego when using the static voting rule. This time, however, Diverse (with 6 agents or using the static voting rule) is able to play significantly better than Uniform without the opening database, but with the opening database the difference between them is not significant. Again, Static is either significantly better than Simple, or the difference between them is not significant. Static is always significantly better than Parallel. To verify the generality of improving the results by the static voting rule and by adding more agents, I also played my system as white against Pachi as black, without the opening database. Simple 4 won 56.2% of the games, Static 4 won 65.5% and Simple 6 won 66.8%. Therefore, these techniques can improve the results in other situations.

By the classical view of voting, my experimental result is not expected.
If we view each agent as having a fixed pdf, we would predict that copies of the best agent would perform much better than a diverse team with weaker agents. However, in my results I showed that the diverse team has competitive strength, and is able to play even better than copies of the best agent in some situations. My new model provides a theoretical explanation for my experimental results.

Figure 3.4: Results in the Computer Go domain. The error bars show the confidence interval, with 99% significance. Panels: (a) results for white, single agents and the diverse team (Simple/Static); (b) results for white, the uniform team, the diverse team (Simple/Static), and a parallelized agent (Parallel); (c) results for black, single agents and the diverse team; (d) results for black, the uniform team, the diverse team, and a parallelized agent. Each panel shows winning rates with and without the opening database.

3.4 Detailed Study: Why Does a Team of Diverse Agents Perform Better?

I study in detail three games from my experiments, in order to better understand why a team of weak players can perform as well as, or better than, a team made of copies of the best player. I study games with 6 agents, using the simple voting rule. According to my theoretical work, at least one agent must play better than the strongest agent in at least one world state for a diverse team to overcome a uniform team. These are only necessary conditions; for a diverse team to effectively play better, this must happen in many world states, especially in critical situations that can decide the game. Here I show that this really happens in Computer Go, based on an analysis by an expert human player. As Go is a complex game, note that some expert readers might not agree completely with all points of this analysis. Although I present results in the Computer Go domain, this phenomenon should also occur in other complex domains, where the relative strength of the agents changes according to the world state.

These games are analyzed by Chao Zhang, a 4-dan amateur Go player. In order to show that the weak agents are not playing better simply by chance, I estimate the probability of all agents playing all analyzed moves by repeatedly playing them 100 times in the board state under consideration. Based on these probabilities, I calculate the probabilities of the diverse team and the uniform team, to show that the diverse team would perform better in these board states. An important point to note is that it is not the case that a certain subset of the agents always votes for a better move; the set of agents that can find a better move than Fuego changes according to each board state.

This analysis requires some Go knowledge to be fully understood. Go is a turn-based game between two players: black and white. At each turn, the players must place a stone in an empty intersection of the board. If a group of stones is surrounded by the opponent's stones, they are removed from the board (i.e., they are "killed").
The stones that surround an area form a territory, whose value is counted by the number of empty intersections inside. At the end of the game, the score is defined by the amount of territory minus the number of captured stones, and the player with the highest score wins. A detailed description of the rules can be found in Pandanet [2016].

Figure 3.5: First example, the diverse team plays as white without the opening database against Fuego; white wins by resignation. The panels show the board at moves 11, 23, 45, 63 and 75, and the full game.

I first analyze the Go game in Figure 3.5. In some positions, the weak agents vote for better moves than Fuego, the strongest agent. Move 11 is a very interesting situation. Here, Fuego, Pachi and MoGo vote for move D4, while GnuGo votes for E8. Even though GnuGo is the weakest agent, in this situation it is able to find a better move than all the other agents. E8 is better because it allows white to get the territory in the upper left corner. Besides, white can aim at G7 to kill the black group in the upper right. If white plays D4, black can play E8 to kill white, aiming at the upper left corner. Unfortunately, GnuGo loses the vote in this situation. In all other positions, I show situations where the weak agents vote together for a better move than Fuego. For example, in move 23, Fuego votes for B4, while Pachi, MoGo, FuegoΔ and FuegoΘ vote for B7. If white chooses B7, white can kill C7&D7 or B5&C5&D5&E5. If black saves C7&D7, white can use B4 to kill the other group; if black saves B5&C5&D5&E5, white can use C8 to kill C7&D7. If white chooses B4, black will use B7 to kill the white group in the upper left. Fuego's mistake is critical in this situation, and would lead to losing the game. In move 45, Fuego would make another mistake. Fuego votes for B9, while GnuGo and Pachi vote for H3. B9 wastes a move: it cannot affect the final result and wastes a chance for further developments. H3, on the other hand, aims at killing black in the bottom right. In this case one of the Fuego variants also votes for B9, so the static voting rule would choose the worse move. In move 63, Fuego would play E3, while GnuGo, Pachi, FuegoΔ and FuegoΘ vote for G2. If white plays G2, the black group in the bottom right dies, while if white plays E3, white cannot kill it. This is another critical mistake, which would make white lose the game. Finally, in move 75, Fuego votes for A7, while Pachi and one of the Fuego variants vote for G2. G2 is better than A7, as it allows white to have a larger territory.

As can be seen, there are many situations where the weaker agents vote together for a better move than Fuego. The probabilities of each agent playing the analyzed moves can be seen in Table 3.5. It is clear that Fuego did not choose the worse move by accident: in many cases it has a lower probability than the other agents of playing the best move between the two options. Consequently, the uniform team is still not able to perform well in these situations: it still has a low probability of playing the best move, and it is always outperformed by the diverse team. In some situations, the probability of playing the worst move even increases by using multiple copies of Fuego.
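Tables 3.5-3.7 report, for each analyzed position, the probability that the diverse team and the uniform team select the better of the two candidate moves. Under the independence assumption used throughout this section, such entries can be approximated from the per-agent probabilities by simulating the vote; the sketch below is my own illustration of that computation. Lumping each agent's remaining probability mass into a single "other" option is a simplification, so the output only approximates the table entries.

```python
import random
from collections import Counter

def team_two_move_probability(p_good, p_bad, samples=50_000):
    """Estimate P(team plays the better move) when each agent independently votes for
    the better move, the worse move, or something else, with plurality voting and
    random tie-breaking. p_good[i], p_bad[i]: agent i's probabilities for the two moves."""
    hits = 0
    for _ in range(samples):
        votes = [
            random.choices(["good", "bad", "other"],
                           weights=[good, bad, max(0.0, 1 - good - bad)], k=1)[0]
            for good, bad in zip(p_good, p_bad)
        ]
        counts = Counter(votes)
        top = max(counts.values())
        winners = [m for m, c in counts.items() if c == top]
        if random.choice(winners) == "good":
            hits += 1
    return hits / samples

# Per-agent probabilities for move 11 of the first example (E8 vs. D4), read from Table 3.5.
good = [0.02, 1.00, 0.06, 0.02, 0.24, 0.35]
bad  = [0.51, 0.00, 0.75, 0.61, 0.19, 0.09]
print(team_two_move_probability(good, bad))
```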
Consequently, the uniform team is still not able to perform well in these situations, it still has a low probability of playing the best move, and it is always outperformed by the diverse team. In some situations, the probability of playing the worst move even increases by using multiple copies of Fuego. 38 Agent Move 11 Move 23 Move 45 Move 63 Move 75 E8* D4 B7* B4 H3* B9 G2* E3 G2* A7 Fuego 2% 51% 83% 14% 1% 76% 53% 29% 22% 16% GnuGo 100% 0% 0% 0% 100% 0% 100% 0% 0% 0% Pachi 6% 75% 30% 70% 46% 0% 78% 1% 35% 1% MoGo 2% 61% 100% 0% 0% 0% 0% 84% 53% 0% Fuego 24% 19% 100% 0% 16% 19% 76% 13% 24% 7% Fuego 35% 9% 99% 0% 12% 30% 78% 10% 31% 11% Diverse 15% 57% 99% 0% 20% 28% 88% 7% 45% 5% Uniform 0% 73% 95% 4% 0% 98% 63% 26% 23% 21% Table 3.5: Probability of playing the moves in the rst example. * indicates the better move. I now analyze the game in Figure 3.6. In move number 4, Fuego and Pachi vote for C7, while Fuego votes for move G3 (). G3 is a bad opening for white, because the two white groups would be split by black. Another example is in move 7, when GnuGo, Pachi and Fuego vote for B6, while Fuego votes for G7 (). Black and white are ghting in the upper left corner. If white plays G7, it waives the ght and plays in a place that is not immediately important. White should choose B6 to continue the ght in order to win. Even GnuGo, the weakest agent, knows that B6 is a better move. In move 25, GnuGo and Mogo choose A8, while Fuego chooses F2 (). If white does not play A8, black will play A5 to kill the white group in the left side. White has to kill with A8. This time Fuego's mistake is critical, and could lead to losing the whole game. In this situation GnuGo helps avoid a critical mistake, because Fuego also votes for F2. Moreover, it is an example of a case where the static voting rule fails, as it would break the tie in favor of Fuego. I expect that signicant improvements in game play would be possible if we learn which is the strongest agent in a given situation, and better approximate the optimal voting rule. Another interesting move is 37. Fuego and Fuego vote for D2, while MoGo and Fuego vote for E3. Both moves are equally good, as they get the same territory. However, GnuGo might have a better move: F6. If white plays F6, it can aim at both G6 and F4 for the next moves, which will cause great harm to black's territory. This is another example of a situation where the weakest agent has a better move than all other agents. The probabilities of each agent playing the analyzed moves can be seen in Table 3.6. Again, we can see that the diverse team would have a higher probability of nding the better moves than the uniform team. In the games with the opening database, an interesting one is in Figure 3.7. In move 29, GnuGo, Pachi and MoGo choose D2, while Fuego votes for B8 (). D2 can protect the lower left, while B8 cannot kill the black group in the upper left, and ends up making 39 A A B B C C D D E E F F G G H H J J 1 1 2 2 3 3 4 4 5 5 6 6 7 7 8 8 9 9 0 1 2 3 Move3 A A B B C C D D E E F F G G H H J J 1 1 2 2 3 3 4 4 5 5 6 6 7 7 8 8 9 9 4 5 6 7 Move7 A A B B C C D D E E F F G G H H J J 1 1 2 2 3 3 4 4 5 5 6 6 7 7 8 8 9 9 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 Move25 A A B B C C D D E E F F G G H H J J 1 1 2 2 3 3 4 4 5 5 6 6 7 7 8 8 9 9 26 27 28 2930 31 32 33 34 35 36 37 38 39 40 41 42 43 FullGame Figure 3.6: Second example, the diverse team plays as white without the opening database against Fuego. White wins by resignation. 
Agent      Move 3         Move 7         Move 25        Move 37
           C7*    G3      B6*    G7      A8*    F2      F6*    D2     E3
Fuego      20%    2%      7%     41%     11%    30%     1%     53%    19%
GnuGo      0%     0%      100%   0%      100%   0%      100%   0%     0%
Pachi      27%    14%     99%    1%      28%    19%     26%    27%    0%
MoGo       0%     8%      1%     0%      89%    0%      0%     41%    45%
FuegoΔ     20%    0%      34%    20%     28%    10%     0%     83%    7%
FuegoΘ     25%    4%      50%    7%      37%    11%     0%     80%    12%
Diverse    —      —       77%    0%      70%    1%      1%     90%    5%
Uniform    19%    0%      2%     30%     7%     27%     0%     84%    8%

Table 3.6: Probability of playing the moves in the second example. Some results are unavailable due to lack of memory. * indicates the better move.

Figure 3.7: Third example, the diverse team plays as white with the opening database against Fuego; white wins by resignation. The panels show the board at moves 29, 31, 45, 51 and 67, and the full game.

In the games with the opening database, an interesting one is shown in Figure 3.7. In move 29, GnuGo, Pachi and MoGo choose D2, while Fuego votes for B8. D2 can protect the lower left, while B8 cannot kill the black group in the upper left, and ends up making it more solid. In move 31, Pachi and MoGo vote for B3, and Fuego votes for B1. Even if both moves might be able to kill the black stone at B2, B3 can kill it for sure. If white plays B1, black can play B3, which would lead to complications. This mistake could make white lose the game. If black survives, it can kill the white group in the lower left. In move 45, Pachi, MoGo and a Fuego variant vote for F4, while Fuego votes for A4. F4 splits black into two groups and can make use of this division in the future. A4 just wastes a move and gives black more territory. In move 51, Pachi and MoGo choose E9, and Fuego chooses H6. E9 makes the white group on the left survive, while H6 wastes a move and will lead to the death of the white group. This is a critical mistake, which would make white lose the game. In move 67, GnuGo and MoGo vote for B4, while Fuego votes for G2. B4 is better, as it can get more territory. G2 just wastes a move. The probabilities of each agent playing the analyzed moves can be seen in Table 3.7. Again, in all these situations the diverse team has a higher probability of playing the better move than the uniform team. In some cases, the probability of playing the worse move even increases with multiple copies of Fuego.

Agent      Move 29        Move 31        Move 45        Move 51        Move 67
           D2*    B8      B3*    B1      F4*    A4      E9*    H6      B4*    G2
Fuego      3%     16%     44%    26%     17%    40%     0%     35%     0%     12%
GnuGo      100%   0%      0%     0%      0%     0%      0%     0%      100%   0%
Pachi      77%    18%     64%    0%      78%    17%     90%    0%      3%     0%
MoGo       91%    0%      98%    0%      92%    0%      51%    0%      46%    4%
FuegoΔ     6%     4%      11%    1%      51%    12%     0%     1%      0%     9%
FuegoΘ     5%     7%      13%    0%      50%    21%     0%     2%      0%     5%
Diverse    82%    3%      75%    0%      54%    12%     37%    0%      9%     1%
Uniform    0%     12%     56%    32%     5%     53%     0%     44%     0%     4%

Table 3.7: Probability of playing the moves in the third example. * indicates the better move.

3.5 Conclusion and Discussion

I showed that diverse teams can outperform teams composed of copies of the best player. However, it is still a challenge to find the best possible teams. In an open multi-agent system the pdfs of the agents are generally not available. Moreover, in many complex scenarios we cannot even easily enumerate all the possible states of the world.
Hence, given a world state, how can we quickly and automatically know the relative strength of the different agents? This is still an important open problem. I gave an initial step by studying in detail different scenarios where diverse agents are able to outperform the best agent. One possible direction for future work is to identify common characteristics of world states where a certain agent is able to play better than the best agent. Given a new world state, we would then be able to estimate the strongest agent for that specific world state and better approximate the optimal voting rule. In addition, we could also dynamically change the team in order to have the best (or close to the best) possible one for each different scenario.

In real-life scenarios, like robot teams, the problem is even more challenging. We can always estimate the pdf of an agent by running it multiple times in a given world state, if we have at least an estimate of the ground truth. However, for an embodied agent, the number of times we can sample might be very limited. A similar challenge is faced in Evolutionary Robotics [Nolfi and Floreano, 2001], where a great range of robots/controllers must be constantly evaluated. One common approach is to perform the evaluation in simulation, and implement in real life the best performing solution. Likewise, we could sample the pdf of different robots in simulation, in order to estimate their pdfs in the real world. Of course, the accuracy of the pdf estimation would depend on the accuracy of the simulation environment.

In general, however, even without knowledge of the pdfs of the agents, this chapter shows that a team composed of strong but very similar agents is not necessarily optimal. Hence, if an operator is not able to estimate the pdfs, she should at least evaluate the performance of diverse teams before picking only the strongest agents as the chosen team for a certain multi-agent application. In the next chapter I will present a second model of diversity, which will allow an operator to better identify in which situations diverse teams should be preferred.

Chapter 4 Give a Hard Problem to a Diverse Team

Together, we form a necessary paradox; not a senseless contradiction. (Criss Jami)

Team formation is crucial when deploying a multi-agent system [Nair and Tambe, 2005, Guttmann, 2008, Liemhetcharat and Veloso, 2012, Matthews et al., 2012]. Many researchers emphasize the importance of diversity when forming teams [LiCalzi and Surucu, 2012, Lamberson and Page, 2012, Hong and Page, 2004]. However, there are many important questions about diversity that were not asked before, and are not explored in such models. LiCalzi and Surucu [2012] and Hong and Page [2004] propose models where the agents know the utility of the solutions, and the team converges to the best solution found by one of its members. Clearly, in complex problems the utility of solutions would not be available, and agents would have to resort to other methods, such as voting, to reach a common decision. Lamberson and Page [2012] study diversity in the context of forecasts, where the solutions are represented by real numbers and the team takes the average of the opinions of its members. Domains where the possible solutions are discrete, however, are not captured by such a model. In the previous chapter, I studied teams of agents that vote in discrete solution spaces. I showed that a diverse team of weaker agents can overcome a uniform team made of copies of the best agent.
However, this does not always occur, and the previous model does not provide ways to know when we should use diverse teams. Moreover, it lacks a formal study of how the performance of diverse teams changes as the number of agents and/or actions increases. In this chapter I shed new light on this problem, by presenting a new, more general model of diversity for teams of voting agents. My model captures better than the previous ones the notion of a diverse team as a team of agents that tend to not agree on the same actions, and allows us to make new predictions. My main insight is based on the notion of spreading tail (ST) and non-spreading tail (NST) agents. As I will show, a team of ST agents has a diverse behavior, i.e., its members tend to not agree on the same actions. Hence, I can model a diverse team as a team of ST agents, and show that its performance improves as the size of the action space gets larger. I also prove upper and lower bounds on how fast different teams converge. The improvement can be large enough to overcome a uniform team of NST agents, even if individually the ST agents are weaker. As it is generally hard to find good solutions for problems with a large number of actions, it is important to know which teams to use in order to tackle such problems. Moreover, I show that the performance of a diverse team converges to the optimal one exponentially fast as the team grows. My synthetic experiments provide further insights into my model: even though the diverse team overcomes the uniform team in a large action space, the uniform team will eventually play better than the diverse team again as the action space keeps increasing, if the best agent does not behave exactly like an NST agent. Finally, I test my predictions by studying a system of voting agents in the Computer Go domain. I show that a uniform team made of copies of the best agent plays better on smaller board sizes, but is overcome by a diverse team as the board gets larger. Moreover, I analyze the agents and verify that weak agents have a behavior closer to ST agents, while the best agent is closer to an NST agent. Therefore, I show that my predictions are verified in a real system, and can effectively be used when forming a multi-agent team.

4.1 Model for Analysis of Diversity in Teams

Consider a problem defined by choosing an action a from a set of possible actions A. Each action a has a utility U(a), and our goal is to maximize the utility. I always list the actions in order from best to worst, therefore U(a_j) > U(a_{j+1}) for all j (a_0 is the best action). In some tasks (like in Section 4.2), a series of actions are chosen across different states, but here I focus on the decision process in a given state. Consider a set of agents, voting to decide over actions. The agents do not know the utility of the actions, and vote for the action they believe to be the best according to their own decision procedure, characterized by a probability distribution function (pdf). I write p_{i,j} for the probability of agent i voting for action a_j, and p_{i,j}(m) when I explicitly refer to p_{i,j} for an action space of size m. If the pdf of one agent is identical to the pdf of another agent, they will be referred to as copies of the same agent. The action that wins by plurality voting is taken by the team. Ties are broken randomly, except when I explicitly discuss a tie-breaking rule. Let D_m be the set of suboptimal actions (a_j, j != 0) assigned a nonzero probability in the pdf of an agent i, and let d_m = |D_m|.
I assume that 45 there is a bound in the ratio of the suboptimal action with highest probability and the one with lowest nonzero probability, i.e., letp i;min =min j2Dm p i;j andp i;max =max j2Dm p i;j ; there is a constant such that p i;max p i;min 8 agents i. I dene strength as the expected utility of an agent and/or a team. The probability of a team playing the best action will be called p best . I rst consider a setting where U(a 0 ) U(a j )8j6= 0, hence I can use p best as my measure of performance. I will later consider more general settings, where the rst r actions have a high utility. I dene team formation as selecting from the space of all agents a limited number of agents that has the maximum strength by voting together to decide on actions. I study the eect of increasing the size m of the set of possible actions on the team formation problem. Intuitively, the change in team performance as m increases will be aected by how the pdf of the individual agentsi change whenm gets higher. As we increasem,d m can increase or not change. Hence, I classify the agents as spreading tail (ST) agents or non-spreading tail agents (NST). I dene ST agents as agents whose d m is non-decreasing on m and d m ! 1 as m!1. I consider that there is a constant > 0, such that for all ST agents i,8m, p i;0 . I assume thatp i;0 does not change withm, although later I discuss what happens when p i;0 changes. I dene NST agents as agents whose pdf does not change as the number of actionsm increases. Hence, let m i0 be the minimum number of actions necessary to dene the pdf of an NST agent i. We have that8m;m 0 m i0 ,8jm i0 p i;j (m) =p i;j (m 0 ),8j >m i0 p i;j (m) = 0. I rst give an intuitive description of the concept of diversity, then dene formally diverse teams. By diversity, I mean agents that tend to disagree. In the previous chapter, a diverse team is dened as a set of agents with dierent pdfs. Hence, they disagree because of having dierent probabilities of playing certain actions. In this chapter, I generalize the previous denition to capture cases where agents disagree on actions, re- gardless of whether their pdfs are the same or not. Formally, I dene a diverse team to be one consisting of a set of ST agents (either dierent ST agents or copies of the same ST agent). In my theoretical development I will show that this denition captures the notion of diversity: a team of ST agents will tend to not agree on the same suboptimal actions. I call uniform team as the team composed by copies of an NST agent. This is an idealization to perform my initial analysis. I will later discuss more complex domains, where the agents of the uniform team also behave like ST agents. 46 Agents Action 1 Action 2 Agent 1 0.6 0.4 Agent 2 0.55 0.45 Agent 3 0.55 0.45 Uniform p best : 0.648 Diverse p best : 0.599 (a) With 2 actions, uniform team plays better than diverse team. Agents Action 1 Action 2 Action 3 Agent 1 0.6 0.4 0 Agent 2 0.55 0.25 0.2 Agent 3 0.55 0.15 0.3 Uniform p best : 0.648 Diverse p best : 0.657 (b) When we add one more action, diverse team plays better than uniform team. Table 4.1: Performance of diverse team increases when the number of actions increases. 4.1.1 A Hard Problem to a Diverse Team I start with an example, to give an intuition about my model. Consider the agents in Table 4.1(a), where I show the pdf of the agents, and p best of the uniform team (three copies of agent 1) and the diverse team (one copy of each agent). I assume agent 1 is an NST agent, while agent 2 and 3 are ST agents. 
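The p_best values in Table 4.1 can be checked directly by enumerating all vote profiles. The sketch below is illustrative (it is not code from this dissertation) and assumes exactly the procedure defined above: independent votes drawn from each agent's pdf, plurality aggregation, and uniformly random tie-breaking.

```python
# Exact enumeration of p_best for small voting teams, as in Table 4.1.
from itertools import product

def p_best(pdfs):
    """Probability that plurality voting (random tie-breaking) selects action 0,
    given one pdf (a list of per-action probabilities) per agent."""
    n_actions = len(pdfs[0])
    total = 0.0
    for profile in product(range(n_actions), repeat=len(pdfs)):
        prob = 1.0
        for agent, action in enumerate(profile):
            prob *= pdfs[agent][action]
        if prob == 0.0:
            continue
        counts = [profile.count(a) for a in range(n_actions)]
        winners = [a for a in range(n_actions) if counts[a] == max(counts)]
        if 0 in winners:
            total += prob / len(winners)   # random tie-breaking
    return total

# Table 4.1(a): two actions.
agent1 = [0.6, 0.4]
agent2 = [0.55, 0.45]
agent3 = [0.55, 0.45]
print(p_best([agent1] * 3))              # uniform team: ~0.648
print(p_best([agent1, agent2, agent3]))  # diverse team: ~0.599

# Table 4.1(b): a third action is added and the ST agents spread their tails.
agent1b = [0.6, 0.4, 0.0]
agent2b = [0.55, 0.25, 0.2]
agent3b = [0.55, 0.15, 0.3]
print(p_best([agent1b] * 3))                 # uniform team: still ~0.648
print(p_best([agent1b, agent2b, agent3b]))   # diverse team: ~0.657
```

Up to rounding, the enumeration reproduces the values reported in the table: 0.648 and 0.599 with two actions, and 0.648 and 0.657 once the third action is added.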
In this situation the uniform team plays better than the diverse team. Now let's add one more action to the problem. Because agent 2 and 3 are ST agents, the probability mass on action 2 scatters to the newly added action (Table 4.1(b)). Hence, while before the ST agents would always agree on the same suboptimal action if they both did not vote for the optimal action, now they might vote for dierent suboptimal actions, creating a tie between each suboptimal action and the optimal one. Because ties are broken randomly, when this happens there will be a 1=3 chance that the tie will be broken in favor of the optimal action. Hence, p best increases when the probability of the ST agents agreeing on the same suboptimal actions decreases, and the diverse team now plays better than the uniform team, even though individually agents 2 and 3 are weaker than agent 1. I now present my theoretical work. First I show that the performance of a diverse team converges when m!1, to a value that is higher than the performance for any other m. Theorem 4.1.1 p best (m) of a diverse team of n agents converges to a certain value ~ p best as m!1. Furthermore, ~ p best p best (m),8m. 47 Proof: Letp i;min = min j2Dm p i;j ,p i;max = max j2Dm p i;j and T be the set of agents in the team. By my assumptions, there is a constant such thatp i;max p i;min for all agents i. Then, we have that 1 1p i;0 = P j2Dm p i;j d m p i;min . Therefore, p i;min 1 dm ! 0 as d m tends to1 with m. Similarly, p i;min ! 0 as d m !1. As p i;j p i;min we have that8j p i;j ! 0 as d m !1. I show that this implies that when m!1, weak agents never agree on the same suboptimal action. Leti 1 andi 2 be two arbitrary agents. Without loss of generality, assume i 2 's d m (d (i 2 ) m ) is greater than or equal i 1 's d m (d (i 1 ) m ). The probability ( i 1 ;i 2 ) of i 1 and i 2 agreeing on the same suboptimal action is upper bounded by i 1 ;i 2 = P a j 2Ana 0 p i 1 ;j p i 2 ;j d (i 2 ) m p i 1 ;max p i 2 ;max d (i 2 ) m p i 2 ;min p i 1 ;max p i 1 ;max (as d (i 2 ) m p i 2 ;min 1). We have that p i 1 ;max ! 0 as p i 1 ;max ! 0, because is a constant. Hence the probability of any two agents agreeing on a suboptimal action is P i 1 2T P i 2 2T;i 2 6=i 1 i 1 ;i 2 2 n(n1) 2 max i 1 ;i 2 i 1 ;i 2 ! 0, as n is a constant. Hence, whenm!1, the diverse team only chooses a suboptimal action if all agents vote for a dierent suboptimal action or in a tie between the optimal action and subop- timal actions (because ties are broken randomly). Therefore, p best converges to: ~ p best = 1 n Y i=1 (1p i;0 ) n X i=1 (p i;0 n Y j=1;j6=i (1p j;0 )) n 1 n ; (4.1) that is, the total probability minus the cases where the best action is not chosen: the second term covers the case where all agents vote for a suboptimal action and the third term covers the case where one agent votes for the optimal action and all other agents vote for suboptimal actions. When m is nite, the agents might choose a suboptimal action by agreeing over that suboptimal action. Therefore, we have that p best (m) ~ p best 8m. Letp uniform best (m) bep best of the uniform team, withm actions. A uniform team is not aected by increasingm, as the pdf of an NST agent will not change. Hence,p uniform best (m) is the same,8m. If ~ p best is high enough so that ~ p best p uniform best (m), the diverse team will overcome the uniform team, when m!1. Therefore, the diverse team will be better than the uniform team when m is large enough. 
In practice, a uniform team made of copies of the best agent might not behave exactly like a team of NST agents, as the best agent could also increase its d m as m gets larger. I discuss this situation in Section 4.2. In order to perform that study, I derive in the following corollary how fast p best converges to ~ p best , as a function of d m . Corollary 4.1.2 p best (m) of a diverse team increases to ~ p best in the order of O( 1 d min m ) and ( 1 d max m ), where d max m is the highest and d min m the lowest d m of the team. 48 Proof: I assume here the notation that was used in the previous proof. First I show a lowerbound on p best (m). We have that p best (m) = 1 1 , where 1 is the probability of the team picking a suboptimal action. 1 = 2 + 3 , where 2 is the probability of no agent agreeing and the team picks a suboptimal action and 3 is the probability of at least two agents agreeing and the team picks a suboptimal action. Hence, p best (m) = 1 2 3 = ~ p best 3 ~ p best 4 , where 4 is the probability of at least two agents agreeing. Let max = max i 1 ;i 2 i 1 ;i 2 , and i 1 and i 2 are the agents whose i 1 ;i 2 = max . We have that p best (m) ~ p best n(n1) 2 max ~ p best n(n1) 2 d (i 2 ) m p i 1 ;max p i 2 ;max ~ p best n(n1) 2 d (i 2 ) m p i 1 ;min p i 2 ;min ~ p best n(n1) 2 2 1 d (i 1 ) m (as p i;min 1 dm ). Hence, p best (m) ~ p best n(n1) 2 2 1 d min m ~ p best p best (m)O( 1 d min m ). Now I show an upper bound: p best (m) = ~ p best 3 ~ p best 5 , where 5 is the probability of at least two agents agreeing and no agents vote for the optimal action. Let min = min i 1 ;i 2 i 1 ;i 2 ; i 1 and i 2 are the agents whose i 1 ;i 2 = min ; and p max;0 = max i2T p i;0 . Without loss of generality, I assume that d (i 2 ) m d (i 1 ) m . Hence, p best (m) ~ p best n(n1) 2 min (1p max;0 ) n2 ~ p best n(n1) 2 d (i 1 ) m p i 1 ;min p i 2 ;min (1p max;0 ) n2 ~ p best n(n1) 2 d (i 1 ) m p i 1 ;max p i 2 ;max 2 (1p max;0 ) n2 ~ p best n(n1) 2 2 1 d i 2 m (1p max;0 ) n2 ~ p best n(n1) 2 2 1 d max m (1p max;0 ) n2 ~ p best p best (m) ( 1 d max m ). Hence, agents that change their d m faster will converge faster to ~ p best . This is an important result when I consider later more complex scenarios where thed m of the agents of the uniform team also change. Note that ~ p best depends on the number of agents n (Equation 4.1). Now I show that the diverse team tends to always play the optimal action, as n!1. Theorem 4.1.3 ~ p best converges to 1, as n!1. Furthermore, 1 ~ p best converges expo- nentially to 0, that is,9 constant c, such that 1 ~ p best c(1 2 ) n ,8n 2 . However, the performance of the uniform team improves as n!1 only if p s;0 = max j p s;j , where s is the best agent. Proof: By the previous proof, we know that when m!1 the diverse team plays the optimal action with probability given by ~ p best . I show that 1 ~ p best ! 0 exponentially as n ! 1 (this naturally induces ~ p best ! 1). I rst compute an upper bound for P n i=1 (p i;0 Q n j=1;j6=i (1p j;0 )): P n i=1 p i;0 Q n j=1;j6=i (1p j;0 ) P n i=1 p i;0 (1p min;0 ) n1 np max;0 (1p min;0 ) n1 n(1) n1 for p max;0 = max i p i;0 , p min;0 = min j p j;0 . Since Q n i=1 (1p i;0 ) (1) n , thus we have that 1~ p best (1) n +n(1) n1 . So we only need to prove that there exists a constantc such that (1) n +n(1) n1 c(1 2 ) n , as follows: (1) n+1 +(n+1)(1) n (1) n +n(1) n1 = (1) 1+n+1 1+n = 1 + 1 1+n 1 1 2 , if n 2 (by 49 setting 1 1+n 2 ). Hence,9c, such that (1) n +n(1) n1 c(1 2 ) n whenn 2 . 
Therefore, the performance converges exponentially. For the uniform team, the probability of playing the action that has the highest probability in the pdf of the best agent converges to 1 asn!1 [List and Goodin, 2001]. Therefore, the performance only increases asn!1 if the optimal action is the one that has the highest probability. Now I show that we can achieve further improvement in a diverse team by breaking ties in favor of the strongest agent. Theorem 4.1.4 When m ! 1, breaking ties in favor of the strongest agent is the optimal tie-breaking rule for a diverse team. Proof: Lets be one of the agents. If we break ties in favor ofs, the probability of voting for the optimal choice will be given by: ~ p best = 1 n Y i=1 (1p i;0 ) (1p s;0 )( n X i=1;i6=s p i;0 n Y j=1;j6=i;j6=s (1p j;0 )) (4.2) It is clear that Equation 4.2 is maximized by choosing agent s with the highest p s;0 . However, I still have to show that it is better to break ties in favor of the strongest agent than breaking ties randomly. That is, I have to show that Equation 4.2 is always higher than Equation 4.1. Equation 4.2 diers from Equation 4.1 only on the last term. Therefore, I have to show that the last term of Equation 4.2 is smaller than the last term of Equation 4.1. Let's begin by rewriting the last term of Equation 4.1 as: n1 n P n i=1 p i;0 Q n j=1;j6=i (1p j;0 ) = n1 n (1p s;0 ) P n i=1;i6=s p i;0 Q n j=1;j6=i;j6=s (1p j;0 ) + n1 n p s;0 Q n j=1;j6=s (1p j;0 ) This implies that: n1 n P n i=1 p i;0 Q n j=1;j6=i (1p j;0 ) n1 n (1p s;0 ) P n i=1;i6=s p i;0 Q n j=1;j6=i;j6=s (1p j;0 ). We know that: (1p s;0 ) P n i=1;i6=s p i;0 Q n j=1;j6=i;j6=s (1p j;0 ) = n1 n (1p s;0 ) P n i=1;i6=s p i;0 Q n j=1;j6=i;j6=s (1 p j;0 ) + 1 n (1p s;0 ) P n i=1;i6=s p i;0 Q n j=1;j6=i;j6=s (1p j;0 ) Therefore, for the last term of Equation 4.2 to be smaller than the last term of Equation 4.1 I have to show that: n1 n p s;0 Q n j=1;j6=s (1p j;0 ) 1 n (1p s;0 ) P n i=1;i6=s p i;0 Q n j=1;j6=s;j6=i (1p j;0 ) It follows that this equation will be true if: 50 p s;0 (1p s;0 ) P n i=1;i6=s p i;0 Q n j=1;j6=i;j6=s (1p j;0 ) (n1) Q n j=1;j6=s (1p j;0 ) p s;0 (1p s;0 ) 1 n1 P n i=1;i6=s p i;0 (1p i;0 ) p s;0 (1p s;0 ) P n i=1;i6=s p i;0 (1p i;0 ) n1 As s is the strongest agent the previous inequality is always true. This is because p s;0 1p s;0 = P n i=1;i6=s p s;0 (1p s;0 ) n1 and p s;0 1p s;0 p i;0 (1p i;0 ) 8i6= s. Therefore, it is always better to break ties in favor of the strongest agent than breaking ties randomly. Next I show that with one additional assumption, not only the diverse team converges to ~ p best , but alsop best monotonically increases withm. My additional assumption is that higher utility actions have higher probabilities, i.e., if U(a j )U(a j 0), then p i;j p i;j 0. Theorem 4.1.5 The performance of a diverse team monotonically increases with m, if U(a j )U(a j 0) implies that p i;j p i;j 0. Proof: Let an event be the resulted choice set of actions of these n agents. I denote by P (V ) the probability of occurrence of any event in V (hence, P (V ) = P v2V p(v)). I call it a winning event if in the event the action chosen by plurality is the best action a 0 (including ties). I assume that for all agents i, if U(a j )U(a j 0), then p i;j p i;j 0. I show by mathematical induction that we can divide the probability of multiple suboptimal actions into a new action and p best (m + 1)p best (m). Let be the number of actions whose probability is being divided. The base case holds trivially when = 0. 
That is, there is a new action, but all agents have a 0 probability of voting for that new action. In this case we have thatp best does not change, thereforep best (m + 1)p best (m). Now assume that we divided the probability of actions and it is true thatp best (m + 1)p best (m). I show that it is also true for + 1. Hence, let's pick one more action to divide the probability. Without loss of generality, assume it is action a dm , for agent c, and its probability is being divided into action a dm+1 . Therefore, p 0 c;dm =p c;dm and p 0 c;dm+1 =p c;dm+1 +, for 0p c;dm . Let p after best (m + 1) be the probability of voting for the best action after this new division, and p before best (m + 1) the probability before this new division. I show that p after best (m + 1)p before best (m + 1). Let be the set of all events where all agents voted, except for agent c (the order does not matter, so we can consider agent c is the last one to post its vote). If 2 will be a winning event no matter if agent c votes fora dm ora dm+1 , then changing agent c's pdf will not aect the probability of these winning events. Hence, let 0 be the set of all events that will become a winning event depending if agent c does not vote for a dm or a dm+1 . Given that 2 0 already happened, the probability of winning or losing is equal to the probability of agent c not voting for a dm or a dm+1 . Now let's divide 0 in two exclusive subsets: dm+1 0 , where for each 2 dm+1 action a dm+1 is in tie with action a 0 , so if agent c does not vote for a dm+1 , will be a 51 winning event; dm 0 , where for each 2 dm actiona dm is in tie with actiona 0 , so if agentc does not votes fora dm , will be a winning event. I do not consider events where botha dm+1 anda dm are in tie witha 0 , as in that case the probability of a winning event does not change (it is given by 1p 0 c;dm p 0 c;dm+1 = 1p c;dm p c;dm+1 ). Note that for each 2 dm+1 , the probability of a winning event equals 1p 0 c;dm+1 . Therefore, after changing the pdf of agent c, for each 2 dm+1 , the probability of a wining event decreases by . Similarly, for each 2 dm , the probability of a winning event equals 1p 0 c;dm . Therefore, after changing the pdf of agent c, for each 2 dm , the probability of a winning event increases by . Therefore,p after best (m+1)p before best (m+1) if and only ifP ( dm )P ( dm+1 ). Note that 8 2 dm+1 there are more agents that voted fora dm+1 than fora dm . Also,8 2 dm there are more agents that voted for a dm than for a dm+1 . If, for all agents i, p i;dm p i;dm+1 , we have that P ( dm )P ( dm+1 ). Therefore, p after best (m + 1)p before best (m + 1), so we still have that p best (m + 1) p best (m). Also note that for the next step of the induction be valid, so that we can still divide the probability of one more action, it is necessary that p 0 c;dm p 0 c;dm+1 . 4.1.2 Generalizations In the previous theorems I focused on the probability of playing the best action, assuming that U(a 0 ) U(a j )8j6= 0. I show now that the theorems still hold in more general domains where r actions (A r A) have a signicant high utility, i.e., U(a j 1 ) U(a j 2 ) 8j 1 < r;j 2 r. Hence, I now focus on the probability of playing any action in A r . I assume that my assumptions are also generalized, i.e., p i;j > 8j < r, and the number d m of suboptimal actions (a j , jr) in the D m set increases with m for ST agents. Theorem 4.1.6 The previous theorems generalize to settings where U(a j 1 ) U(a j 2 ) 8j 1 <r;j 2 r. Proof Sketch: I give here a proof sketch. 
We just have to generate new pdfs p 0 i;j , such that p 0 i;0 = P r1 j=0 p i;j , and p 0 i;b = p i;b+r1 ;8b6= 0. We can then reapply the proofs of the previous theorems, but replacing p i;j by p 0 i;j . Note that this does not guarantee that all agents will tend to agree on the same action in A r ; but the team will still tend to pick any action in A r , since the agents are more likely to agree on actions in A r than on actions in An A r . Now I discuss a dierent generalization: what happens when p i;0 decreases as m increases (8 agentsi). Ifp i;0 ! ~ p i;0 asm!1, the performance in the limit for a diverse team will be ~ p best evaluated at ~ p i;0 . Moreover, even if p i;0 ! 0, my conclusions about 52 0 50 100 150 200 250 300 Number of Option s 0.00 0.05 0.10 0.15 0.20 p b est - pr ed ict ion Diverse Uniform (a) Convergence of p best to predicted value. 0 50 100 150 200 250 300 Number of Option s 0.65 0.70 0.75 0.80 0.85 0.90 0.95 p b est Diverse Uniform (b) Actual performance value. Figure 4.1: Comparing diverse and uniform when uniform also increases d m . relative team performance are not aected as long as we are comparing two ST teams that have similar p i;0 : the same argument as in Corollary 1 implies that the team with faster growing d m will perform better. 4.2 Experimental Analysis 4.2.1 Synthetic Experiments I present synthetic experiments, in order to better understand what happens in real systems. I generate agents by randomly creating pdfs and calculate the probability of playing the best action (p best ) of the generated teams. I use a uniform distribution to generate all random numbers. When creating a pdf, I rescale the values assigned randomly, so that the overall sum of the pdf is equal to 1. As I said earlier, uniform teams composed by NST agents is an idealization. In more complex domains, the best agent will not behave exactly like an NST agent; the number of suboptimal actions with a non-zero probability (d m ) will also increase as the action space gets larger. I perform synthetic experiments to study this situation. I consider that the best agent is still closer to an NST agent, therefore it increases its d m at a slower rate than the agents of the diverse team. In my rst experiment, I use teams of 4 agents. For each agent of the diverse team, p i;0 is chosen randomly between 0.6 and 0.7. The remaining is distributed randomly from 10% to 20% of the next best actions (the number of actions that will receive a positive probability is also decided randomly). For the uniform team, I make copies of the best agent (with highest p i;0 ) of the diverse team, but distribute the remaining probability randomly from 1% to 3% of the next best actions. 53 2 3 4 5 6 7 8 9 10 11 Number of Agents 0.05 0.10 0.15 0.20 0.25 0.30 0.35 0.40 p b est Diverse Pre d iction Diverse Figure 4.2: p best of a diverse team as the number of agents increases. We can see the average result for 200 random teams in Figure 4.1, where in Figure 4.1(a) I show the dierence between the performance in the limit (~ p best ) and the actual p best (m) for the diverse and the uniform teams; in Figure 4.1(b) I show the average p best (m) of the teams. As can be seen, when the best agents increase theird m at a slower rate than the agents of the diverse team, the uniform teams converge slower to ~ p best . Even though they play better than the diverse teams for a small m, they are surpassed by the diverse teams as m increases. 
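The generator just described can be summarized in a short sketch. This is a hedged reconstruction rather than the experimental code itself: the helper names (make_pdf, sample_p_best), the rescaling details, and the number of Monte Carlo samples are assumptions, but the recipe follows the description above, with p_{i,0} drawn from [0.6, 0.7] and the residual mass spread over 10-20% of the next-best actions for the diverse agents, versus 1-3% for the copied best agent.

```python
# Sketch of the synthetic experiment: random agent pdfs and Monte Carlo p_best.
import random
from collections import Counter

def make_pdf(m, p0, tail_lo, tail_hi):
    """pdf over m ranked actions: mass p0 on the best action, the rest spread
    randomly over a random fraction (tail_lo..tail_hi) of the next-best actions."""
    d = max(1, int(random.uniform(tail_lo, tail_hi) * (m - 1)))
    weights = [random.random() for _ in range(d)]
    scale = (1.0 - p0) / sum(weights)
    return [p0] + [w * scale for w in weights] + [0.0] * (m - 1 - d)

def sample_p_best(pdfs, iters=20000):
    """Monte Carlo estimate of the probability that plurality voting with random
    tie-breaking selects the best action (index 0)."""
    actions = range(len(pdfs[0]))
    wins = 0
    for _ in range(iters):
        votes = [random.choices(actions, weights=pdf)[0] for pdf in pdfs]
        counts = Counter(votes)
        top = max(counts.values())
        winners = [a for a, c in counts.items() if c == top]
        if random.choice(winners) == 0:
            wins += 1
    return wins / iters

random.seed(0)
for m in (20, 50, 100, 300):
    diverse = [make_pdf(m, random.uniform(0.6, 0.7), 0.10, 0.20) for _ in range(4)]
    best_p0 = max(pdf[0] for pdf in diverse)
    narrow = make_pdf(m, best_p0, 0.01, 0.03)   # best agent with a slowly spreading tail
    uniform = [narrow] * 4                      # copies of that agent
    print(m, round(sample_p_best(diverse), 3), round(sample_p_best(uniform), 3))
```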
However, because ~ p best of the uniform teams is actually higher than the one of the diverse teams, eventually the performance of the uniform teams get closer to the performance of the diverse teams, and will be better than the one of the diverse teams again for a large enough m. This situation is expected according to Theorem 4.1.1. If the d m of the best agent also increases asm gets larger, the uniform team will actually behave like a diverse team and also converge to ~ p best . ~ p uniform best ~ p diverse best , as the best agent has a higher probability of playing the optimal action. Hence, in the limit the uniform team will play better than the diverse team. However, as we saw in Corollary 4.1.2, the speed of convergence is in the order of 1=d m . Therefore, the diverse team will converge faster, and can overcome the uniform team for moderately large m. As Theorem 4.1.3 only holds when m!1, I also explore the eect of increasing the number of agents for a large m. The ~ p best of a team of agents is shown as the dashed line in Figure 4.2. I am plotting for agents that have a probability of playing the best action of only 10%, but as we can see the probability quickly grows as the number of agents increases. I also calculate p best for random teams from 2 to 6 agents (shown as the continuous line), when there are 300 available actions. Each agent has a probability of playing the best action of 10%, and the remaining probability is randomly distributed 54 over the 10% next best actions. As can be seen, the teams have a close performance to the expected. I only show up to 6 agents because it is too computationally expensive to calculate the pdfs of larger teams. 4.2.2 Computer Go I present now results in a real system. I use in my experiments 4 dierent Go software: Fuego 1.1, GnuGo 3.8, Pachi 9.01, MoGo 4, and two (weaker) variants of Fuego (Fuego and Fuego), in a total of 6 dierent, publicly available, agents. Fuego is considered the strongest agent among all of them. Fuego is an implementation of the UCT Monte Carlo Go algorithm, therefore it uses heuristics to simulate games in order to evaluate board congurations. Fuego uses mainly 5 heuristics during these simulations, and they are executed in a hierarchical order. The original Fuego agent follows the order <Atari Capture, Atari Defend, Lowlib, Pattern> (The heuristic called Nakade is not enabled by default). My variation called Fuego follows the order <Atari Defend, Atari Cap- ture, Pattern, Nakade, Lowlib>, while Fuego follows the order<Atari Defend, Nakade, Pattern, Atari Capture, Lowlib>. Also, Fuego and Fuego have half of the memory available when compared with the original Fuego. All my results are obtained by playing either 1000 games (to evaluate individual agents) or 2000 games (to evaluate teams), in a HP dl165 with dual dodeca core, 2.33GHz processors and 48GB of RAM. I compare results obtained by playing against a xed opponent. Therefore, I evaluate systems playing as white, against the original Fuego playing as black. I removed all databases and specic board size knowledge of the agents, including the opponent. I call Diverse as the team composed of all 6 agents, and Uniform as the team composed of 6 copies of Fuego. Each agent is initialized with a dierent random seed, therefore they will not vote for the same action all the time in a given world state, due to the characteristics of the search algorithms. 
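The run-time side of such a voting team is simple to sketch. The block below is illustrative only: the move_suggesters are stand-ins for wrappers around the real engines (which the experiments drive through their own protocols), but the aggregation step is the plurality rule with random tie-breaking used throughout this chapter.

```python
# Illustrative harness: one move suggestion per team member, plurality aggregation.
import random
from collections import Counter

def team_move(move_suggesters, board_state, rng=random):
    """Collect one move per member and return the plurality winner,
    breaking ties uniformly at random."""
    votes = [suggest(board_state) for suggest in move_suggesters]
    counts = Counter(votes)
    top = max(counts.values())
    winners = [move for move, c in counts.items() if c == top]
    return rng.choice(winners)

# Stand-ins only: each real member would be a callable wrapping an engine
# (Fuego, GnuGo, Pachi, MoGo, or a Fuego variant) initialized with its own seed.
suggesters = [lambda board: "D4", lambda board: "D4", lambda board: "C3"]
print(team_move(suggesters, board_state=None))   # -> "D4"
```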
In all the graphs I present in this section, the error bars show the condence interval, with 99% of condence (p = 0:01). I evaluate the performance of the teams over 7 dierent board sizes. I changed the time settings of individual agents as I increased the board size, in order to keep their strength as constant as possible. The average winning rates of the team members is shown in Table 4.2, while Table 4.3 show the winning rates of the individual agents 1 . We can see my results in Figure 4.3 (a). Diverse improves from 58:1% on 9x9 to 72:1% on 21x21, an increase in winning rate that is statistically signicant withp< 2:210 16 . This result is expected according to Theorem 4.1.1. Uniform changes from 61:0% to 1 In my rst experiment, Diverse improved from 56:1% on 9x9 to 85:9% on 19x19. I noted, however, that some of the diverse agents were getting stronger in relation to the opponent as the board size increased. Hence, by changing the time setting to keep the strength constant, I am actually making my claims harder to show, not easier. 55 Team 9x9 11x11 13x13 15x15 17x17 19x19 21x21 Diverse 32.2% 30.8% 29.6% 29.4% 31.5% 31.9% 30.3% Uniform 48.1% 48.6% 46.1% 48.0% 49.3% 46.9% 46.6% Table 4.2: Average winning rates of the team members across dierent board sizes. Note that these are not the winning rates of the teams. Agent 9x9 11x11 13x13 15x15 17x17 19x19 21x21 Fuego 48.1% 48.6% 46.1% 48.0% 49.3% 46.9% 46.6% GnuGo 1.1% 1.1% 1.9% 1.9% 4.5% 6.8% 6.1% Pachi 25.7% 22.9% 25.8% 26.9% 23.5% 20.8% 11.0% MoGo 27.6% 26.4% 22.7% 22.0% 27.1% 30.1% 27.1% Fuego 45.7% 45.8% 42.2% 40.4% 43.0% 44.5% 47.4% Fuego 45.5% 40.2% 39.2% 37.6% 41.8% 42.3% 43.6% Table 4.3: Winning rates of each one of the agents across dierent board sizes. 65:8%, a statistically signicant improvement with p = 0:0018. As we saw before, an increase in the performance of Uniform can also be expected, as the best agent might not be a perfect NST agent. A linear regression of the results of both teams gives a slope of 0:010 for the diverse team (adjusted R 2 : 0.808, p = 0:0036) and 0:005 for the uniform team (adjusted R 2 : 0.5695, p = 0:0305). Therefore, the diverse team improves its winning rate faster than the uniform team. To check if this is a signicant dierence, I evaluate the interaction term in a linear regression with multiple variables. We nd that the in uence of board size is higher on Diverse than on Uniform with p = 0:0797 (estimated coecient of \size of the board group type":10:321, adjustedR 2 : 0:7437). Moreover, on the 9x9 board Diverse is worse than Uniform (p = 0:0663), while on the 21x21 board Diverse is better with high statistical signicance (p = 1:941 10 5 ). I also analyze the performance of the teams subtracted by the average strength of their members (Figure 4.3 (b)), in order to calculate the increase in winning rate achieved by \teamwork" and compensate uctuations on the winning rate of the agents as we change the board size. Again, the diverse team improves faster than the uniform team. A linear regression results in a slope of 0:0104 for Diverse (adjusted R 2 : 0.5549, p = 0:0546) and 0:0043 for Uniform (adjusted R 2 : 0.1283, p = 0:258). I also evaluate the performance of teams of 4 agents (Diverse 4 and Uniform 4). For Diverse 4, I removed Fuego and Fuego from the Diverse team. As can be seen in Figure 4.4, the impact of adding more agents is higher for the diverse team in a larger board size (21x21). In the 9x9 board, the dierence between Diverse 4 and Diverse 6 is only 4.4%; while in 21x21 it is 14%. 
Moreover, we can see a higher impact of adding 56 8 10 12 14 16 18 20 22 Board S ize 0.55 0.60 0.65 0.70 0.75 Win ning R ate Diverse Uniform (a) Absolute winning rates. 8 10 12 14 16 18 20 22 Board S ize 0.05 0.10 0.15 0.20 0.25 0.30 0.35 0.40 0.45 Relativ e W innin g Rate Diverse Uniform (b) Relative to the average strength of team members. Figure 4.3: Winning rate in the real Computer Go system. Diverse Diverse Uniform Uniform 0.0 0.2 0.4 0.6 0.8 1.0 Win ning R ate 9x9 21x21 4 4 6 6 Figure 4.4: Winning rates for 4 and 6 agents teams. 57 agents for the diverse team, than for the uniform team. These results would be expected according to Theorem 4.1.3. As can be seen, the predictions of my theory holds: the diverse team improves signi- cantly as I increase the action space. The improvement is enough to make it change from playing worse than the uniform team on 9x9 to playing better than the uniform team with statistical signicance on the 21x21 board. Furthermore, I show a higher impact of adding more agents when the size of the board is larger. 4.2.3 Analysis To test the assumptions of my model, I estimate a pdf for each one of the agents. For each board size, and for each one of 1000 games from my experiments, I randomly choose a board state between the rst and the last movement. I make Fuego evaluate the chosen board, but I give it a time limit 50x higher than the default one. Therefore, I use this much stronger version of Fuego to approximate the true ranking of all actions. For each board size, I run all agents in each board sample and check in which position of the approximated true ranking they play. This allows me to build a histogram for each agent and board size combination. Some examples can be seen in Figure 4.5. We can see that a strong agent, like Fuego, has most of its probability mass on the higher ranked actions, while weaker agents, like GnuGo, has the mass of its pdf distributed over a larger set of actions, creating a larger tail. Moreover, the probability mass of GnuGo is spread over a larger number of actions when I increase the size of the board. I study how the pdfs of the agents change as we increase the action space. My hypothesis is that weaker agents will have a behavior closer to ST agents, while stronger agents to NST agents. In Figure 4.6 (a) I show how many actions receive a probability higher than 0. As can be seen, Fuego does not behave exactly like an NST agent. However, it does have a slower growth rate than the other agents. A linear regression gives the following slopes: 13.08, 19.82, 19.05, 15.82, 15.69, 16.03 for Fuego, Gnugo, Pachi, Mogo, Fuego and Fuego, respectively (R 2 : 0.95, 0.98, 0.94, 0.98, 0.98, 0.98, respectively). It is clear, therefore, that the probability mass of weak agents is distributed into bigger sets of actions as we increase the action space, and even though the strongest agent does not behave in the idealized way it does have a slower growth rate. I also verify how the probability of playing the best action changes for each one of the agents as the number of actions increase. Figure 4.6 (b) shows that even though all agents experience a decrease inp i;0 , it does not decrease much. From 9x9, all the way to 21x21, I measure the following decrease: 20%, 23%, 39%, 26%, 28%, 22%, for Fuego, Gnugo, Pachi, Mogo, Fuego and Fuego, respectively. Hence, on average, they decreased about 25% from 9x9 to 21x21. 
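The measurement procedure used in this analysis, looking up each agent's chosen move in the ranking produced by the much stronger reference player and building a histogram of those ranks, can be summarized in a short sketch. This is a hypothetical reconstruction, not the actual analysis scripts; the ranks list below is illustrative data standing in for the positions returned by the strengthened reference.

```python
# Sketch: empirical pdf over action ranks, and the derived quantities d_m and p_{i,0}.
from collections import Counter

def empirical_pdf(ranks, n_actions):
    """ranks: for each sampled board, the position of the agent's move in the
    reference ranking (0 = the reference's best move)."""
    counts = Counter(ranks)
    return [counts[r] / len(ranks) for r in range(n_actions)]

def tail_size(pdf):
    """d_m: number of suboptimal ranks with nonzero empirical probability."""
    return sum(1 for p in pdf[1:] if p > 0)

# Illustrative data: ranks at which a hypothetical agent played over 10 sampled boards.
ranks = [0, 0, 1, 0, 3, 0, 2, 0, 1, 5]
pdf = empirical_pdf(ranks, n_actions=81)   # e.g., 81 intersections on a 9x9 board
print(pdf[0])         # estimate of p_{i,0}
print(tail_size(pdf)) # estimate of d_m
```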
Even though my assumption about p i;0 does not hold perfectly, the 58 0 100 200 Ranking 0 100 200 Ranking 0.00 0.05 0.10 Probability 0 100 200 Ranking 0.00 0.05 0.10 Probability Probability 0 100 200 Ranking 0 100 200 Ranking 0.00 0.05 0.10 Probability 0 100 200 Ranking 0 100 200 Ranking Fuego - 9x9 GnuGo - 9x9 Fuego - 21x21 GnuGo - 21x21 Pachi - 9x9 0 100 200 Ranking 0.00 0.05 0.10 Mogo - 9x9 0 100 200 Ranking Mogo - 21x21 0 100 200 Ranking 0.00 0.05 0.10 Probability Fuego∆ - 9x9 0 100 200 Ranking Fuego∆ - 21x21 0 100 200 Ranking 0.00 0.05 0.10 Probability FuegoΘ - 9x9 0 100 200 Ranking FuegoΘ - 21x21 Fuego - 13x13 0 100 200 Ranking Fuego - 17x17 GnuGo - 13x13 0 100 200 Ranking GnuGo - 17x17 0 100 200 Ranking Pachi - 21x21 0 100 200 Ranking Pachi - 13x13 0 100 200 Ranking Pachi - 17x17 0 100 200 Ranking Mogo - 13x13 0 100 200 Ranking Mogo - 17x17 0 100 200 Ranking Fuego∆ - 13x13 0 100 200 Ranking Fuego∆ - 17x17 0 100 200 Ranking FuegoΘ - 13x13 0 100 200 Ranking FuegoΘ - 17x17 Figure 4.5: Histograms of agents for dierent board sizes. 59 8 10 12 14 16 18 20 22 Bo a rd S i z e 0 50 100 150 200 250 300 Nu mb er o f Ac ti o n s F ueg o GnuGo P a ch i F ueg o F ueg o M o G o (a) Size of the set of actions that receive a nonzero probability. 8 10 12 14 16 18 20 22 Bo a rd S i z e 0. 0 0. 2 0. 4 0. 6 0. 8 1. 0 F ueg o GnuGo P a ch i p i, 0 M o G o F ueg o F ueg o (b) pi;0 as the size of the board grows. Figure 4.6: Verifying the assumptions in the real system. predictions of my model are still veried. Therefore, the amount of decrease experienced is not enough to avoid that the diverse team increases in performance as the action space grows. 4.3 Conclusion and Discussions Diversity is an important point to consider when forming teams. In this chapter I present a new model that captures better than previous ones the intuitive notion of diverse agents as agents that tend to disagree. This model allows me to make new predictions. I show that the performance of diverse teams increases as the size of the action space gets larger. Uniform teams may also increase in performance, but at a slower pace than diverse teams. Therefore, even though a diverse team may start playing worse than a uniform team, it can eventually outperform the uniform team as the action space increases. Besides, I show that in large action spaces the performance of a diverse team converges exponentially fast to the optimal one as the number of agents increases. I start my model with the notion of spreading tail (ST) and non-spreading tail (NST) agents. ST agents are agents that have a non-zero probability over a larger set of actions as the action space increases, while NST agents always have a constant number of actions with non-zero probability. I dene a diverse team as a team of ST agents, and a uniform team as a team of NST agents. Therefore, my focus change from modeling diverse teams as teams with dierent agents (as in Chapter 3), to focusing on diverse teams as teams where the agents tend to disagree. This change allows me to make new predictions that were not possible before. 60 Note that my model does not say that an NST agent will never vote for a new action. I dene the pdfs of the agents by the rankings of the actions. Hence, when the number of actions increases from a certain number x 0 to a new numberx 1 , a new actiona may be the action with highest utility. Therefore, an agent will assign to a the same probability that it assigned before to the previously best action when the number of actions was only x 0 . 
A uniform team made of copies of the best agent also does not mean that the agents always vote for the same actions. The vote of each agent is a sample from a pdf, so copies of a single agent may or may not vote for the same action. In fact, we observe an increase in performance by voting among multiple copies of a single agent, both theoretically and experimentally. The division of agents into two types (ST and NST) is, however, only an idealization, that allows me to isolate and study in detail the eect of diversity. A very strong agent will normally assign most of its probability mass to the actions with the highest utility, so in the extreme its pdf would never change by adding new actions. In reality, however, it may also consider a larger set of actions as the action space grows. Therefore, I relax my model, and introduce the hypothesis that the best agent spreads the tail of its pdf at a slower pace than weaker agents. I show that because of this eect, a diverse team increases in performance faster than uniform teams, and I illustrate this phenomenon with synthetic experiments. Hence, even in a relaxed model where both diverse and uniform teams are composed of ST agents, a diverse team still outperforms a uniform team as the action space grows. The eect, however, is transient, as a uniform team may still have a higher convergence point than a diverse team, so in extreme large action spaces it would again outperform the diverse team. If the agents have the same probability of playing the best action, however, then it is clear that in the limit the diverse team will always be better than the uniform team. My model needs one strong assumption: that the probability of the individual agents voting for the best action does not change as the action space increases. This assumption allows my analysis to be cleaner, although it may not hold perfectly in a real system. In fact, in my Computer Go experiments we did observe a decrease in the probability of the agents voting for the best action. However, even though the assumption did not hold perfectly, the predictions of my theory holds: a diverse team signicantly increased in performance as the action space got larger. Clearly, a decrease in the probability of the individual agents voting for the best action will decrease the performance of a team, while the eects studied in this paper will increase the performance. Therefore, as long as the decrease is not large enough to counter-balance the eect under study, we are still going to observe an increase in performance as the action space gets larger. Moreover, as I discuss in my generalizations, the argument that teams that spread the tail faster 61 converge faster is still valid when the assumption does not hold; hence if the agents are equally strong (i.e., the individual agents have the same probability of voting for the best action) the team with faster growing tail will always perform better. As mentioned, I veried my theory in a real system of Computer Go playing agents. Not only a real diverse team of agents eectively increased in performance as the board size increased, but I also veried that the strongest agent indeed spreads the tail of its pdf at a slower rate than other weaker agents. I also veried that both diverse and uniform teams increase in performance, but the diverse team increased two times faster. This is explained by the relaxed version of my model, when I predict diverse teams to converge faster than uniform teams, as illustrated by my synthetic experiments. 
In the next chapter, I will study diverse and uniform teams in the context of design problems, where the number of optimal solutions must be maximized, allowing a human to choose according to aesthetics or compromises that cannot be formalized. 62 Chapter 5 So Many Options, but We Need Them All Creativity is allowing yourself to make mistakes. Art is knowing which ones to keep. (Scott Adams) 5.1 Introduction Teams of voting agents are a powerful tool for nding the optimal solution in many applications [Mao et al., 2013, Bachrach et al., 2012, Soejima et al., 2010, Polikar, 2012, Isa et al., 2010]. Voting is a popular approach since it is easily parallelizable, it allows the re-use of existing agents, and there are theoretical guarantees for nding one optimal choice [Conitzer and Sandholm, 2005, List and Goodin, 2001]. For design problems, however, nding one optimal solution is not enough. For ex- ample, it could be mathematically optimal under measurable metrics but lack aesthetic qualities or social acceptance by the target public. Besides, the solution could have a poor performance in some key objective of a multi-objective optimization problem. Essentially, designers need to explore a large set of optimal alternatives, to pick one solution not only according to her aesthetic taste (and/or the one of the target public), but also according to preferences that may be unknown or not formalized, especially when there are multi- ple optimization objectives leading to intricate trade-os and compromises [Gerber, 2007, Woodbury and Burrow, 2006, Radford and Gero, 1980, van Langen and Brazier, 2006, Gero and Sosa, 2008]. Hence, we actually need systems that nd as many optimal solutions as possible, allowing a human to explore such optimal alternatives to make a choice. Even if a user does not want to consider too many solutions, they can be ltered and clustered [Erhan et al., 2014], and be presented in manageable ways [Smith et al., 2010], allowing her to 63 easily make an informed choice. Therefore, a system of voting agents that produces a unique optimal solution is insucient, and I propose the novel social choice problem of maximizing the number of optimal alternatives found by a voting system. As ranked voting may suer from noisy rankings when using existing agents (Chapter 6), I study multiple plurality voting iterations, allowing great applicability and re-use of existing agents. Traditionally, social choice studies the optimality of voting rules, assuming a certain noise models for the agents, and rankings composed of a linear order over alternatives [Conitzer and Sandholm, 2005, Caragiannis et al., 2013, List and Goodin, 2001]. Hence, there is a single optimal choice, and a system is successful if it can return that optimal choice with high probability. More recently, several works have been considering cases where there is a partial order over alternatives [Xia and Conitzer, 2011a, Procaccia et al., 2012], or where the agents output pairwise comparisons instead of rankings [Elkind and Shah, 2014]. However, these works still focus on nding an optimal alternative, or a xed-sized set of optimal alternatives (where the size is known beforehand). Therefore, they still provide no help in nding the maximum set of optimal solutions. Moreover, they assume agents that are able to output comparisons among all actions with fairly good precision, and the use of multiple voting iterations has never been studied. 
When considering agents with dierent preferences, the eld is focused on verifying if voting rules satisfy a set of axioms that are considered to be important to achieve fairness [Nurmi, 1987]. Meanwhile, the computational design literature has not yet found the potential of teams of voting agents. They mainly study traditional optimization techniques to nd optimal solutions, such as genetic algorithms [Miles et al., 2001, Yi and Malkawi, 2009, Gerber and Lin, 2013], particle swarm optimization [Luh et al., 2011, Felkner et al., 2013], ant colony optimization [Luh and Lin, 2009], immune systems [Zhao et al., 2014], etc. Many researchers also explore the potential of swarms of agents that interact on the geometric space to emerge aesthetically complex shapes (but lacking any optimization attempt) [Snooks, 2011, Vehlken, 2014, Aranda and Lasch, 2006, Baharlou and Menges, 2013, Ireland, 2009, Carranza and Coates, 2000, Hanna, 2005]. Therefore, the literature still lacks a deep study of other multi-agent techniques and ideas. Hence, in this chapter I bring together the social choice and computational design elds, oering new perspectives to both literatures. I present a theoretical study of which kinds of teams are desirable for design problems, and how their size may eect optimality. In doing so, I show many novel results for the study of multi-agent systems. For example, instead of studying agents with dierent preferences in order to verify fairness axioms, as in traditional social choice, I show here that agents with dierent preferences are actually 64 fundamental when voting to nd a \truth" (i.e., optimal decisions). On the other hand, agents with the same set of preferences signicantly harm the performance, and in general the number of optimal solutions decreases as the size of the team grows. Such results were never seen before in the social choice literature, as voting to estimate optimality is normally studied separated from voting as a way to aggregate agents with dierent preferences. My theoretical development draws a novel connection between social choice and num- ber theory, instead of the traditional connections with bayesian probability theory. This novel connection allows me to show, for instance, that the optimal diverse team size is constant with high probability, and a prime number of optimal actions may impose prob- lems. I also show that we can maximize the number of optimal solutions with agents with dierent preferences as the team size grows, as long as the team size grows carefully. Moreover, I simulate design agents in synthetic experiments to further study my model, conrming the predictions of my theory and providing realistic insights into what happens when systems run with bounded computational time. Finally, I present exper- iments in a highly relevant domain: architectural design, where I show teams of real design agents that vote to choose the best qualifying and energy-ecient design solutions for buildings. Such domain is fundamental in the current scenario where we must nd energy-ecient solutions for our modern life-style, since it is known that the early design of a building has a major impact in its performance through-out its whole life-span [Lin and Gerber, 2014, Bogenst atter, 2000, Echenagucia et al., 2015]. I study actual teams of voting agents, and show that by aggregating their opinions, we are able to nd a large percentage of optimal solutions, signicantly overcoming single agents. 
I also discuss how my model can explain and make predictions about the real system, and how it can guide researchers in computational design when developing novel agents for design. 5.2 Design Domains I consider in this chapter domains where the objective is to nd the highest number of optimal solutions. I show that design is one of such domains. With the rapid development of computation, algorithmic techniques have been emerging as an important approach in design [Terzidis, 2006]. For example, Gero [2000] argues that computational approaches can be used to increase the space of design exploration and the creativity in designs. One of the most common computational design approaches is to use parametric designs [Vierlinger and Bollinger, 2014, Globa et al., 2014, Erhan et al., 2014], where a human designer creates an initial design of a product using computer-aided design tools. However, instead of manually deciding all aspects of the product, she leaves free parameters, whose 65 Building Site 3D Plan Building Plan Site (0,0) X1 Y1 Range X1 Range Y1 Figure 5.1: A parametric design of a building, showing two parameters: X1 and Y 1. values can be modied to change the design of the product. It is up to the designer to decide which parameters are going to be available, their valid types and their valid range. This approach is used because design is an inherently complex problem [Simon, 1973]. Although a human is able to test and evaluate a few solutions looking for optimality, the number of dierent possibilities that she can manually create is highly limited, especially under the (common) hard time-constraints. In Figure 5.1, I show a simple example in the context of architectural design, where the parameters X1 and Y 1 are being used to specify the position of the lower left corner of the building relative to the site boundary. The design of a product normally occurs over multiple phases, where increasing levels of details are decided and optimized. My work is focused on the initial design phase, when multiple possible design alternatives are analyzed in order to choose one for further study and optimization. This initial design phase is, however, very important to the nal performance of a product [Smith et al., 2010, Holzer et al., 2007]. For example, in the context of architectural design (as how I explore later in my experiments), it has been acknowledged that it has a high impact on the overall building performance [Bogenst atter, 2000, Lin and Gerber, 2014, Echenagucia et al., 2015, Yi and Malkawi, 2009]. Design problems are in general multi-objective [Lin and Gerber, 2012, Keough and Benjamin, 2010], since a product normally must be optimized across dierent objectives. For example, a product should have a low cost, but at the same time high quality, two highly-contradictory objectives. Hence, there are a large number of optimal solutions, all tied in a Pareto frontier. For the computational system, these optimal solutions are all equivalent. However, a human may have unknown preferences, may dynamically decide to value some objective over another when handling intricate trade-os, and/or may choose the option that most pleases her own aesthetic taste or the one of the target public/client. Note that choosing a design according to aesthetics is an undened problem, since there are no formal denitions to compare among dierent options. 
Hence, the best that a 66 system can do is to provide a human with a large number of optimal solutions (according to other measurable factors), allowing her to freely decide among equally optimal solutions | but most probably with dierent aesthetic qualities. Therefore, it is natural that in design problems we are going to have many possible solutions, and we want to nd as many optimal ones as possible. In fact, the exploration of a large space of possible alternatives is essential in design, as recently shown by Woodbury and Burrow [2006], van Langen and Brazier [2006], and Gero and Sosa [2008]. There are many benets in discovering a large number of optimal solutions, and I list some of them below: Knowledge \Does not Hurt": I argue that having more optimal solutions to choose from is not worse than having less. Although some works in psychology show that humans may get frustrated in the face of too many options, especially under time pressure [Haynes, 2009, Iyengar and Lepper, 2000], I argue that if a designer has enough time or motivation to analyze onlyx solutions, she can do so with a system that provides more than x optimal solutions by sampling the exact amount that she desires. However, she will never be able to do so with a system that provides less than x optimal solutions. Note also that the works in psychology [Haynes, 2009, Iyengar and Lepper, 2000] were taken in the context of consumers deciding among products to purchase, not in the context of design exploration. As mentioned before, in design the necessity of large exploration spaces is recognized [Gerber, 2007, Woodbury and Burrow, 2006, van Langen and Brazier, 2006, Gero and Sosa, 2008]. Moreover, as I discuss in detail later, voting systems could be combined with another system that identify and eliminate solutions that are similar by applying clustering and analysis techniques, and that presents the optimal alternatives to a human in a manage- able way [Erhan et al., 2014, Smith et al., 2010], so that every solution that the human looks at is meaningful. Knowledge Increases Condence in Optimality: In general design problems, the true Pareto frontier is unknown. Genetic algorithms are widely used in order to es- timate it. The only knowledge available for the system to evaluate the optimality is in comparison with the other solutions that are also being evaluated during the optimization process [Lin and Gerber, 2014]. Many apparently \optimal" solutions are actually dis- covered to be sub-optimal as we nd more solutions. Hence, nding a higher number of optimal solutions decreases the risk of a designer picking a wrong choice that was initially outputted as \optimal" by a system (for example, the single agents, as I will show later). Knowledge Increases Aesthetic Qualities: If a human has a larger set of optimal solutions to choose from, there is a greater likelihood that at least one of these solutions 67 is going to be of high aesthetic quality according to her preferences, or the ones of the target public [Gero and Sosa, 2008]. Knowledge Increases Diversity of Options: In general, when a system x has more optimal solutions available than a system y, it does not necessarily imply that the solutions in the system x are more similar, while the optimal solutions in y are more dierent/diverse. In fact, all things equal (i.e., the algorithms are equally able to nd unique solutions), the greater the amount of optimal solutions, the higher the likelihood that we have more diverse solutions available. 
Of course we could have some algorithm x that produces many optimal solutions by creating small variations of one unique solution, but here I do not consider these potentially misleading systems. Again, I assume that such solutions could be identified and filtered by another system [Erhan et al., 2014, Smith et al., 2010].

5.3 Agent Teams for Design Problems

I present my theory of agent teams for design problems. I consider teams that vote together at each possible decision point of the design of a product (for example, they may vote for the value of each parameter in a parametric design). Before presenting my theoretical development, I first show an example to give an intuitive idea of my results. Then, in Sections 5.3.2 and 5.3.3, I present my formal results.

5.3.1 Example

Consider a parametric design problem with two different parameters, $\omega_1$ and $\omega_2$. Say that for $\omega_1$ there are two possible optimal values, $a_0$ or $a_1$, and that for $\omega_2$ there are three possible optimal values: $a_1$, $a_2$, $a_3$. I consider any solution with an optimal value for each parameter to be optimal, and hence there are 6 (and only 6) possible optimal vectors: $\langle a_0,a_1\rangle$, $\langle a_0,a_2\rangle$, $\langle a_0,a_3\rangle$, $\langle a_1,a_1\rangle$, $\langle a_1,a_2\rangle$, $\langle a_1,a_3\rangle$. Of course, the information about which actions are optimal is not known beforehand; otherwise we would not need a design system at all. I list the optimal values here only for the purpose of my example.

I study teams where each agent outputs a value for each parameter, and these opinions are aggregated by voting. Voting across each parameter, however, produces only a single solution. Therefore, I consider multiple voting iterations (where one iteration goes across all parameters). At each iteration, we may find a new optimal solution, we may repeat a solution that was already found, or we may fail to find an optimal solution altogether.

Table 5.1: Probability distribution functions of the agents in my example.

(I) Uniform agent
  (a) First parameter
      Action        a_0    a_1
      Probability   0.3    0.7
  (b) Second parameter
      Action        a_0    a_1    a_2    a_3
      Probability   0      0.2    0.6    0.2

(II) Diverse agents
  (a) First parameter
      Action     a_0    a_1
      Agent 1    0.3    0.7
      Agent 2    0.3    0.7
      Agent 3    0.3    0.7
      Agent 4    0.7    0.3
      Agent 5    0.7    0.3
      Agent 6    0.7    0.3
  (b) Second parameter
      Action     a_0    a_1    a_2    a_3
      Agent 1    0.1    0.1    0.4    0.4
      Agent 2    0.1    0.4    0.4    0.1
      Agent 3    0.1    0.4    0.4    0.1
      Agent 4    0.1    0.1    0.4    0.4
      Agent 5    0.1    0.4    0.1    0.4
      Agent 6    0.1    0.4    0.1    0.4

I study in this chapter the performance of different kinds of teams. Consider first that we have a strong agent that always votes for optimal values, but according to the probabilities shown in Table 5.1 (I). Each time we run such an agent, it has a greater tendency of voting for $\omega_1 := a_1$ and $\omega_2 := a_2$ than for the other possible optimal values.

Now, let's first consider a uniform team composed of multiple copies of this strong agent. The vote of each agent is a different sample from the same probability distribution function (pdf) in Table 5.1 (I). The team aggregates the opinions of its members by plurality voting for each parameter; that is, for each parameter the team takes the decision voted for by the largest number of agents. As each individual agent has a greater tendency of voting for $\omega_1 := a_1$ and $\omega_2 := a_2$, we expect to see the optimal solution $\langle a_1,a_2\rangle$ more often than the other possible optimal solutions. In fact, in Table 5.2 (a) I calculate the probability of finding each possible optimal solution for uniform teams of 6 and 12 agents.
Table 5.2: Probability of outputting each possible optimal solution, for the uniform and diverse teams.

(a) Uniform Team
  Solution    <a_0,a_1>  <a_0,a_2>  <a_0,a_3>  <a_1,a_1>  <a_1,a_2>  <a_1,a_3>
  6 Agents    0.0160     0.1310     0.0160     0.0822     0.6724     0.0822
  12 Agents   0.0033     0.0715     0.0033     0.0393     0.8430     0.0393

(b) Diverse Team
  Solution    <a_0,a_1>  <a_0,a_2>  <a_0,a_3>  <a_1,a_1>  <a_1,a_2>  <a_1,a_3>
  6 Agents    0.1590     0.1590     0.1590     0.1590     0.1590     0.1590
  12 Agents   0.1638     0.1638     0.1638     0.1638     0.1638     0.1638

As we can see, even though the probability of outputting some optimal solution remains constant at 1, the larger the number of agents, the higher the likelihood that the team will output the solution $\langle a_1,a_2\rangle$. Of course, we have limited time to find optimal design alternatives. Consider, for example, that we can run the whole system only 10 times. Ideally, we would like to find all 6 possible optimal solutions within these 10 iterations. In expectation, however, with 6 agents we would repeat the solution $\langle a_1,a_2\rangle$ 6.7 times, and with 12 agents 8.4 times. Hence, we would end up being able to find only around 3 optimal solutions.

Consider now a diverse team composed of agents with different preferences, as shown in Table 5.1 (II). We can see the resulting pdfs when aggregating the opinions of such agents with plurality voting in Table 5.2 (b). For a 6-agent team I consider one copy of each agent, and for a 12-agent team I consider two copies of each agent. As we can notice, even though the probabilities of voting for optimal solutions now sum up to only 0.954 (rather than 1) with 6 agents, they are evenly distributed over all optimal solutions. Hence, if we run 10 voting iterations, each option is expected to be seen 1.5 times. Moreover, when we increase the number of agents to 12, the team improves, as the probability of voting for each optimal solution increases (summing up now to 0.982), but the probabilities remain evenly distributed. Hence, each option is now expected to be seen 1.6 times.

As we can see, teams of agents with different preferences have great potential for increasing the number of optimal solutions that we can find, as there is a greater probability of finding a new solution each time we run the system, even though the agents themselves may have a lower probability of finding optimal solutions than the ones in the uniform team. However, the performance of the team also depends on the team size, and it will not necessarily increase as the team grows (even for the diverse team). For example, in Table 5.3 we can see the probabilities for the diverse team at other team sizes. Even though the team size increases, for several optimal solutions the probability of voting for them decreases; for example, the solution $\langle a_0,a_3\rangle$ goes from 0.1590 with 6 agents all the way down to 0.0787 with 9 agents. Hence, we will see a lower number of optimal solutions after 10 voting iterations. However, when the number of agents reaches 12, we again have the situation where the total probability is equally divided across all possible optimal solutions.

Table 5.3: Probability of outputting each possible optimal solution, for different sizes of the diverse team.

  Solution    <a_0,a_1>  <a_0,a_2>  <a_0,a_3>  <a_1,a_1>  <a_1,a_2>  <a_1,a_3>
  6 Agents    0.1590     0.1590     0.1590     0.1590     0.1590     0.1590
  7 Agents    0.1107     0.1515     0.1515     0.1466     0.2007     0.2007
  8 Agents    0.1118     0.1468     0.1118     0.1806     0.2371     0.1806
  9 Agents    0.1046     0.1333     0.0787     0.2171     0.2766     0.1632
  10 Agents   0.1065     0.1713     0.1065     0.1641     0.2638     0.1641
  11 Agents   0.1339     0.1687     0.1339     0.1666     0.2099     0.1666
  12 Agents   0.1638     0.1638     0.1638     0.1638     0.1638     0.1638

Hence, the diverse team will, in general, improve as the team size grows, but not for all team sizes. Therefore, we have to increase the team size carefully to guarantee optimality. A short simulation sketch of this example follows; in the next section I formalize my theory.
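The probabilities in Table 5.2 can be checked with a short Monte Carlo sketch of the example in Python. This is only an illustration (the function names and the number of samples are mine, and it is not part of the system studied later in the chapter):

import random
from collections import Counter

def plurality(votes):
    # Plurality rule with uniformly random tie-breaking, as described in the text.
    counts = Counter(votes)
    top = max(counts.values())
    return random.choice([a for a, c in counts.items() if c == top])

def estimate_solution_probabilities(team, iterations=200000):
    # team: for each parameter, a list with one (actions, weights) pdf per agent.
    outcomes = Counter()
    for _ in range(iterations):
        solution = tuple(
            plurality([random.choices(actions, weights)[0] for actions, weights in pdfs])
            for pdfs in team
        )
        outcomes[solution] += 1
    return {s: c / iterations for s, c in outcomes.items()}

# Uniform team: 6 copies of the strong agent of Table 5.1 (I).
param1 = (["a0", "a1"], [0.3, 0.7])
param2 = (["a0", "a1", "a2", "a3"], [0.0, 0.2, 0.6, 0.2])
uniform_6 = [[param1] * 6, [param2] * 6]
print(estimate_solution_probabilities(uniform_6))  # close to row "6 Agents" of Table 5.2 (a)

Replacing the per-parameter pdfs by those of Table 5.1 (II), one copy per agent, gives estimates close to Table 5.2 (b).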
5.3.2 Theory

I consider here a team of agents that vote together at each decision point of the design of a product. For the sake of clarity and precision, I present in this section an idealized model. In Section 5.3.3 I generalize my model to more complex situations, and in Section 5.4 I generalize further by performing synthetic experiments.

Let $\Phi$ be a set of agents $\phi_i$, and $\Omega$ a set of world states $\omega$. Each $\omega$ has an associated set of possible actions $A_\omega$. For example, each world state may represent a parameter of a parametric design problem, and each action may represent a possible value for that parameter. At each world state, each agent outputs an action $a$: an optimal action according to the agent's imperfect evaluation, which may or may not be a truly optimal action. Hence, there is a probability $p_j$ that an agent outputs a certain action $a_j$. The team takes the action decided by plurality voting (i.e., as mentioned in the previous section, the team takes the decision voted for by the largest number of agents; I consider ties to be broken uniformly at random).

I assume first that the world states are independent, and that by taking an optimal action at all world states we find an optimal solution for the entire problem. That is, I assume first that by taking locally optimal decisions at each design decision point, a globally optimal solution is obtained. I generalize this assumption later, in Proposition 5.3.10 (in Section 5.3.3), where I consider design problems with correlated parameters. I also further discuss this assumption in Section 5.6.

In this chapter my objective goes beyond finding one optimal solution: I want to maximize the number of optimal solutions that we can find. For greater applicability, I consider here agents that output a single action. Hence, we generate multiple solutions by re-applying the voting procedure across all world states multiple times (these are called voting iterations; one iteration goes across all world states, forming one solution). Formally, let $S$ be the set of (unique) optimal solutions that we find by re-applying the voting procedure through $z$ iterations. Our objective is to maximize $|S|$. I will show that, under some conditions, we can achieve that as $z \to \infty$ (I study bounded time in Section 5.4).

I consider that at each world state $\omega$ there is a subset $Good_\omega \subseteq A_\omega$ of optimal actions in $\omega$. An optimal solution is composed by assigning any $a \in Good_\omega$ in world state $\omega$, for all world states. Conversely, I consider the complementary subset $Bad_\omega \subseteq A_\omega$, such that $Good_\omega \cup Bad_\omega = A_\omega$ and $Good_\omega \cap Bad_\omega = \emptyset$. I drop the subscript $\omega$ when it is clear that I am referring to a certain world state.

One fundamental problem is selecting which agents should form a team. By the classical voting theories, one would expect the best teams to be uniform teams composed of multiple copies of the best agent [Conitzer and Sandholm, 2005, List and Goodin, 2001].
Here I show, however, that for design problems uniform teams need very strong assumptions to be optimal, and in most cases they actually converge to always outputting a single solution { an undesirable outcome. However, diverse teams are optimal as long as the team size grows carefully, as I explain later in Theorem 5.3.6. I call a team optimal when: (i)jSj! Q ! jGood ! j as z!1, and (ii) all optimal solutions are chosen by the team with the same probability 1= Q ! jGood ! j. Otherwise, even though the team still produces all optimal solutions, it would tend to repeat already generated solutions whose probability is higher. Since in practice there are time bounds, such condition is fundamental to have as many optimal solutions as possible in limited time. Also note that condition (ii) subsumes condition (i), but I keep both for clarity. I rst consider agents whose pdfs are independent and identically distributed. Let p Good j be the probability of voting for a j 2Good, and p Bad k be the probability of voting fora k 2Bad. Letn :=jj be the size of the team, andN l be the number of agents that vote for a l in a certain voting iteration. If8a j 2 Good;a k 2 Bad, p Good j > p Bad k , the team is going to nd one optimal solution with probability 1 as n!1, as I show in the following observation: Observation 5.3.1 The probability of a team outputting one optimal solution goes to 1 as n!1, if p Good j >p Bad k ,8a j 2Good;a k 2Bad. Note that as the agents are independent and identically distributed, we can model the process of pooling the opinions of n agents as a multinomial distribution with n trials (and the probability of any class k of the multinomial corresponds to the probability p k of voting for an action a k ). Hence, for each action a l , the expected number of votes is given by E[N l ] = np l . Therefore, by the law of large numbers, if p Good j >p Bad k 8a j 2Good;a k 2Bad, we have 72 that N j >N k . Hence, the team will pick an action a j 2Good, in all world states, if n is large enough (i.e., n!1). However, with a team made of copies of the same agent, the system is likely to lose the ability to generate new solutions as n increases. If, for each !, we have an action a ! m such that p Good m > p Good j 8a ! m 6= a ! j , the team converges to picking only action a ! m . Hence,jSj = 1, which is a very negative result for design problems. Therefore, contrary to traditional social choice, here it is not the case that increasing the team size always improves performance. I formalize this notion in Proposition 5.3.2 below, where I also show the conditions for a uniform team to be optimal. Let p Good := P j p Good j be the probability of picking any action in Good. I re-write the probability of an action a Good j as: p Good j := p Good jGoodj + j , where P j j := 0. Hence, some j are positive, and some are negative (unless they are all equal to 0). Let + be the set of j > 0. Let High be the maximum possible value for j 2 + , such that the relation p Good j >p Bad k ,8a j 2Good;a k 2Bad is preserved. I show that when z!1,jSj is the highest as max + ! 0, and the lowest (i.e., one) as min + ! High . Note that max + ! 0 represents the situation where the probability is equally divided among all optimal actions, and min + ! High represents the case where one optimal action receives a high probability in comparison with the other optimal actions. Proposition 5.3.2 The maximum value forjSj is Q ! jGood ! j. When z;n!1, as max + ! 0,jSj! Q ! jGood ! j. Conversely, as min + ! High ,jSj! 1. 
Proof: As max + ! 0, j ! 0,8a j . Hence,E[N j ]!n p Good jGoodj ,8a j 2Good. Because ties are broken randomly, at each world state !, each a j 2 Good ! is selected by the team with equal probability 1 jGood!j . AsE[N j ] =E[N k ]8a j ;a k 2Good, we have that at each ! it is possible to choosejGood ! j dierent actions. Hence, there are Q ! jGood ! j possible combinations of solutions. At each voting iteration, ties are broken at each ! randomly, and one possible combination is generated. As z!1, eventually we cover all possible combinations, andjSj! Q ! jGood ! j. Conversely, as min + ! High ,E[N j ]!np Good j for one xeda j such thatp Good j > p Good k ;8a j 6=a k 2Good. Consequently,E[N j ]>E[N k ], at each!. Hence, there is no tie in any world state, and the team picks a xed a ! j at each world state. Therefore, even if z!1,jSj! 1. Note that I do not say here that the same action is picked across world states (as a ! j may dier for each !), but that the same optimal solution is picked for all voting iterations. Therefore, uniform teams need a very strong assumption to satisfy condition (i): the probability of voting for optimal actions must be uniformly distributed over all optimal 73 Probability Good 1 a 2 a 3 GoodnGood 1 a 1 Bad a 0 a 5 a 4 Agent 1 Probability Good 2 a 1 a 2 GoodnGood 2 a 3 Bad a 0 a 5 a 4 Agent 2 Figure 5.2: Illustrative example of the probability distribution functions of two diverse agents. actions (i.e., max + ! 0). If max + ! 0, condition (ii) is also satised as n grows, because of Observation 5.3.1 (i.e., the probability of outputting a suboptimal solution goes to 0) and because of the fact that all actions are equally likely to be chosen; hence each solution is chosen with equal probability 1= Q ! jGood ! j. I show that, alternatively, we can use agents with dierent \preferences" (i.e., \di- verse" agents), to maximizejSj. I consider here agents that have about the same ability in problem-solving, but they prefer dierent optimal actions. As the agents have simi- lar ability, I consider here the probabilities to be the same across agents, except for the actions in Good, as each agent i has a subset Good i Good consisting of its pre- ferred actions (which are more likely to be chosen than other actions). I denote by p ij the probability of agent i voting for action a j . Hence, I dene the pdf of the diverse agents as: 8a j 2 Good i , let p Good i := P j p ij , p ij := p Good i jGood i j ; 8a j 2 GoodnGood i , p ij := p Good p Good i jGoodnGood i j ; and8a k = 2Good i ;a j 2Good i ,p ij >p ik . Good i \Good l (of agents i and l ) is not necessarily;. The pdfs are strictly dened in this section for the sake of clarity and precision, but in the next section and in my synthetic experiments I generalize further. In Figure 5.2 I show an illustrative example of the pdf of two agents. Let's consider we can draw diverse agents from a distributionF. Each agent i has r<jGoodj actions in itsGood i , and I assume that all actions inGood are equally likely to be selected to formGood i (since they are all equally optimal). Note thatr is the same for all agents (as, again, I assume they have the same pdfs, but dierent preferences), 74 r a 2 a 3 Agent 1 a 1 a 2 Agent 2 a 1 a 2 Agent 3 a 2 a 3 Agent 4 a 1 a 3 Agent 5 a 1 a 3 Agent 6 n k := 4 Figure 5.3: Illustrative example of Equation 5.1. Here I show 6 agents (n := 6) with 2 preferred actions each (r := 2). Each action is in the list of preferences of 4 agents (k := 4). 
As an example, I mark with a dashed circle one of the actions, a 2 . and that I also cover the case where each agent prefers a single action (which would be r := 1). I will show that by drawing n agents fromF, the team is optimal for large n with probability 1, as long as n is a multiple of a divisor (> 1) of eachjGood ! j. I will also show that the minimum necessary optimal team size is constant with high probability as the number of world states grow. I start with the following proposition: Proposition 5.3.3 If a team of sizen is optimal at a world state, thengcd(n;jGoodj)> 1. That is, n andjGoodj are not co-prime. Proof: (By contradiction). By the optimality requirement (ii), each action must be in theGood i set of the same number of agents. Otherwise, if an action a i is preferred by a larger number of agents than another action a j , the team would pick a i with a larger probability than a j . Hence, we must have that: nr =kjGoodj; (5.1) wherek is a constant2N >0 . k represents the number of agents that have a given action a j in itsGood i . Note that it must be the same for all optimal actions, and therefore we have a single constant. Ifn andjGoodj are co-prime, then it must be the case that r is divisible byjGoodj. However, this yields rjGoodj, which contradicts our assumption. Therefore, n and jGoodj are not co-prime. I illustrate Equation 5.1 with an example in Figure 5.3. In the gure I show 6 agents (n := 6), with 2 preferred actions each (r := 2), equivalent to the example that I showed before in Table 5.1 (II-b). Note that each action is preferred by 4 agents, and hence I show a case wherek := 4. As an example, I mark with a dashed circle one of the actions, a 2 . In such case, the team will have an equal probability of picking all optimal actions, 75 and optimality condition (ii) would be satised if the probability of picking suboptimal actions is 0. If, for example, we now change agent 5 to prefer actions a 2 and a 3 (replacing action a 1 by a 2 ), then the team would be more likely to pick action a 2 by plurality voting than any other action, and it would be less likely to pick action a 1 than any other action. As the number of voting iterations is limited in actual applications, this situation is not desirable. Note that we could also have a case where one agent prefers a larger number of actions than others. For example, we could change agent 5 to prefer actions a 1 , a 2 and a 3 . However, as the agent has a limited amount of probability distributed over the actions in the Good i set (i.e., p Good i), we have that necessarily the probability of the agent voting for a 1 and a 3 would drop; hence, the team would pick a 2 more often than the other actions, and a 1 and a 3 less often than the other actions. This does not mean, however, that there is a single optimal conguration for each number of optimal actionsjGoodj. There are multiple possible solutions for Equation 5.1, but in any possible solution we will nd that the size of the team n andjGoodj are not co-prime. Proposition 5.3.3 is a necessary but not sucient condition for optimality. That is, if Equation 5.1 is satised, all optimal actions will be selected with the same probability, but it is still necessary for the probability of picking suboptimal actions to go to 0 in order to fully satisfy condition (ii). That will be the case ifp Good i = 1, or ifn!1, since p Good j >p Bad k ,8a j 2Good;a k 2Bad. 
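To make the counting argument behind Equation 5.1 concrete, here is a minimal Python check using the numbers of Figure 5.3; the helper name is mine and the snippet is only an illustration of the arithmetic, not part of any system in this thesis:

from math import gcd

def balanced_assignment_possible(n, r, n_good):
    # Every optimal action can be preferred by the same number k of agents
    # only if n*r is a multiple of |Good| (Equation 5.1).
    return (n * r) % n_good == 0

n, r, n_good = 6, 2, 3          # the configuration of Figure 5.3
k = (n * r) // n_good           # k = 4 agents prefer each optimal action
print(balanced_assignment_possible(n, r, n_good), k, gcd(n, n_good) > 1)

# A team size that is co-prime with |Good| can never be balanced
# unless r is a multiple of |Good|, which contradicts r < |Good|:
print(balanced_assignment_possible(7, 2, 3))   # False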
Note that Proposition 5.3.3 implies hard restrictions for world states wherejGoodj is prime, or for teams with prime size n: if n is prime,jGoodj must be a multiple of n; and ifjGoodj is prime, n must be a multiple ofjGoodj. Now let's analyze across a set of world states . For a team of xed sizen, Proposition 5.3.3 applies across all world states. Hence, the team size must be a multiple of a divisor (> 1) of eachjGood ! j. Note that the pdfs of the agents (and alsor) may change according to !. Let D be a set containing one divisor of each world state (if two or more world states have a common divisor x, it will be representable by only one x2D). Hence,8!, 9d2D, such thatd jGood ! j; and8d2D,9Good ! , such thatd jGood ! j. There are multiple possibleD sets, from the superset of all possibilitiesD. Therefore, we can now study the minimum size necessary for an optimal team. Ap- plying Proposition 5.3.3 at each world state !, we have that the minimum size necessary for an optimal team isn = min D2D Q d2D d. Hence, our worst case is when eachjGood ! j is a unique prime, as the team will have to be a product of all (unique) optimal action space sizes. This means that: 76 Proposition 5.3.4 In the worst case, the minimum team size is exponential in the size of the world statesj j. In the best case, the minimum necessary team size is a constant withj j. Proof: In the worst case, each added world state ! has a unique prime optimal action space size. Hence, the minimum team size is at least the product of the rstj j primes, which, by the prime number theorem, has growth rate exp((1 +o(1))j j logj j). In the best case, each addedGood ! has a common divisor with previous ones, and the minimum necessary team size does not change. However, I show that the worst case happens with low probability, and the best case with high probability. Let G be the maximum possiblejGoodj, and M :=j j. Assume that each world state! j will have a uniformly randomly drawn number of optimal actions, denoted as m j , for all j = 1;:::;M (i.e.,8!2 ). I assume that G is large enough, so that the probability that a given m j has factor p is 1=p. Proposition 5.3.5 The probability that the minimum necessary team size grows expo- nentially tends to 0, and the probability that it is constant tends to 1, as M!1. Proof: It is sucient to show that the probability that m 1 ;:::;m M1 are all co-prime with m M tends to 0 as M!1. That is, I show that when adding a new world state ! M , itsjGood ! j will have a common factor with the size of theGood set of some of the other world states with high probability. Given any prime p, the probability that at least one of any independently randomly generated M 1 numbers m 1 ;:::;m M1 has factor p is 1 (1 1 p ) M1 , while the proba- bility that one independently randomly generated numberm M has factorp is 1 p (for large enough G). Therefore, the probability m M shares common factor p with at least one of m 1 ;:::;m M1 is 1(1 1 p ) M1 p . The probability that m M is co-prime with all m 1 ;:::;m M1 is: Y all primes p [1 1 (1 1 p ) M1 p ]; which, as M!1, tends to: Y all primes p (1 1 p ) = 1 (1) = 0; where (s) is the Riemann zeta function. The last equality holds true since: (1) = Y all primes p 1 1p 1 = 1 X i=1 1 i !1 77 (as shown by Euler). Hence, with high probability, when adding a new world state !, jGood ! j will share a common factor with a world state already in . 
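Proposition 5.3.5 can also be visualized numerically. The sketch below mirrors the proposition's setup by drawing the optimal-action-space sizes uniformly at random (the bound G and the number of trials are arbitrary choices of mine) and estimates how often the newest size is co-prime with all previous ones:

import random
from math import gcd

def prob_new_size_coprime(M, G=10**6, trials=5000):
    # Estimate the probability that the M-th randomly drawn size shares
    # no common factor with any of the previous M-1 sizes.
    hits = 0
    for _ in range(trials):
        sizes = [random.randint(2, G) for _ in range(M)]
        if all(gcd(sizes[-1], m) == 1 for m in sizes[:-1]):
            hits += 1
    return hits / trials

for M in (2, 5, 10, 20):
    print(M, prob_new_size_coprime(M))   # the estimate decreases as M grows, tending to 0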
Finally, in the next theorem I show that a diverse team of agents is always optimal as the team grows, as long as it grows carefully. That is, I show that for large diverse teams we will be able to satisfy the optimality conditions (i) and (ii), as long as the team size is a multiple of a divisor ofjGood ! j,8!2 . Again, I assume that G is large enough, so that the probability that a given m j has factor p is 1=p. Theorem 5.3.6 Let D 2 D be a set containing one factor from each Good ! . For arbitraryn, the probability that we generate (by drawing from a distributionF) an optimal team of sizen converges to 0 asj j!1. However, ifn =c Q d2D d, then the probability that the team is optimal tends to 1 as c!1. Proof: For an arbitrary team size n, let P be the set of its prime factors. Given one p2P , the probability that p is not a factor ofjGood ! j is 1 1=p. The probability that all p2P are not factors is: Q p (1 1=p). As 0< Q p (1 1=p)< 1, the probability that at least one p2 P is a factor ofjGood ! j is 1 Q p (1 1=p) < 1. Forj j tests, the probability that at least one p is a factor in all of them is: 1 Y p (1 1=p) ! j j ; which tends to 0, asj j!1. Hence, the probability that gcd(n;jGood ! j) = 1 for at least one ! tends to 1, and the probability that the team can be optimal tends to 0. However, if: n =c Y d2D d; thengcd(n;jGood ! j)6= 18!2 , satisfying the necessary condition in Proposition 5.3.3 at all world states. Let n j be the number of agents i that have a j in itsGood i , and P (n j =n k ) be the probability that n j = n k (that is, the probability that the same number of agents have a j and a k in theirGood i ). As each a j has equal probability of being in aGood i , for a large number of drawings fromF (i.e., c!1), we have that P (n j =n k )! 1;8a j ;a k 2 Good ! ;8!, by the law of large numbers. Hence, each optimal solution will be selected with the same probability. Moreover, as p Good j > p Bad k ,8a j 2 Good;a k 2 Bad, the probability of picking a suboptimal solution converges to 0 (asn!1 withc!1), and hence the probability of 78 picking each of the optimal solutions converges to 1= Q ! jGood ! j (satisfying optimality condition (ii)). If it is expensive to test values forn such that Theorem 5.3.6 is satised, we can choose n =c Q ! jGood ! j, as it immediately implies the conditions of the theorem. Moreover, if we know the size of alljGood ! j, we can check ifn andjGood ! j are co-prime inO(h) time (where h is the number of digits in the smaller number), using the Euclidean algorithm. Hence, we can test all world states in O(j jh) time. I presented in this section an idealized version of my theory. In practice, the agents' pdfs may dier further than what was considered, and the world states (i.e., parameters of a design problem) may not be completely independent. Hence, in the next section I generalize the theory to more complex situations. 5.3.3 Generalizations In this section I present several generalizations from my initial idealized model, in order to cover more realistic situations. I start by generalizing my theory to cases where the agents do not have only a probability of p ij := p Good i jGood i j or p ij := p Good p Good i jGoodnGood i j to vote for actions inGood (depending if the action is inGood i or not), but now can have dierent probabilities distributed over the actions in Good. Hence, I now model each agent as having a set ofGood i sets, each with its own probability distributed over the actions in the set. 
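The co-primality test mentioned above is immediate with the Euclidean algorithm. A minimal sketch (the sizes below and the helper name are illustrative only):

from math import gcd, prod

good_sizes = [2, 3, 5, 5, 5]      # example |Good_w| sizes, one per world state

def admissible_team_size(n, sizes):
    # Necessary condition from Proposition 5.3.3, applied at every world state:
    # n must share a factor greater than 1 with each |Good_w|.
    return all(gcd(n, m) > 1 for m in sizes)

print(admissible_team_size(25, good_sizes))                   # False: 25 is co-prime with 2 and 3
print(admissible_team_size(30, good_sizes))                   # True: 30 = 2 * 3 * 5
print(admissible_team_size(2 * prod([2, 3, 5]), good_sizes))  # n = c * prod(D) with c = 2 -> True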
For this generalization, I still consider that the agents have the same pdf, but dierent preferences. That is, the agents may have dierent actions at each Good i set, but their size and the number of sets is the same across agents. Hence, I denote each Good i set j as j Good i . Each also has its own p j Good i total probability, that will be equally distributed among all actions in j Good i , in a similar fashion as before. As mentioned, the content of each j Good i set may dier across agents, but I consider the p j Good i to be the same across agents. Note that the case where each action has a dierent probability is dened as the situation where eachj j Good i j := 1. Similarly as before, I consider that each agent i has j r <jGoodj actions at each j Good i , and all actions inGood are equally likely to be selected to form each j Good i . In Figure 5.4 I show an illustrative example of the pdf of two agents with multiple j Good i sets. Proposition 5.3.7 Theorem 5.3.6 still applies under the more general model stated above. That is, ifn =c Q d2D d, then the probability that the team is optimal tends to 1 asc!1. Proof: Similarly as before, for each j Good i we must have that: 79 Probability 1 Good 1 a 5 a 7 a 2 2 Good 1 a 10 3 Good 1 a 1 a 6 Bad a 3 a 4 a 8 a 9 Agent 1 Probability 1 Good 2 a 10 a 2 a 1 2 Good 2 a 6 3 Good 2 a 7 a 5 Bad a 3 a 4 a 8 a 9 Agent 2 Figure 5.4: Illustrative example of the probability distribution functions of two agents with multiple j Good i sets. n j r = j kjGoodj; (5.2) so that for each j Good i we have that j k agents have a given action a in its j Good i . As the total probability p j Good i of each set is the same across agents, we have that each optimal action will be selected by the team with the same probability when Equation 5.2 is satised for all j Good i . Hence, across world states, each optimal solution will also have the same probability of being selected. Similarly as in Proposition 5.3.3, for Equation 5.2 to be satised, we must have that n andjGoodj are not co-prime, and that will be true when n =c Q d2D d. Let j n l be the number of agents i that have a l in its j Good i , and P ( j n l = j n m ) be the probability that j n l = j n m . Like before, as each a l has equal probability of being in a j Good i , for a large number of drawings fromF (i.e., c!1), we have that P ( j n l = j n m )! 1;8a l ;a m 2Good ! ;8!, by the law of large numbers. Notice that this happens for all j Good i sets. Hence, all optimal actions will be selected with the same probability by the team. Like before, as p Good l > p Bad m ,8a l 2 Good;a m 2 Bad, the probability of picking a suboptimal action converges to 0 (as n!1 with c!1), and hence the probability of picking each of the optimal solutions converges to 1= Q ! jGood ! j (satisfying optimality condition (ii)). Now I present my second generalization. I show that Theorem 5.3.6 still applies for agents i with dierent probabilities over optimal actions p Good i. I consider here a more 80 general denition of optimal team: the dierence between the probabilities of picking each optimal solution and 1= Q ! jGood ! j must be as small as possible. Hence, let p j be the probability of team picking optimal action a j , the optimal team is such that # := P a j jp j 1=jGood ! jj,8a j 2 Good ! is minimized (8!2 ). I focus here in a single world state !, as by minimizing # in each world state we are also making the dierence between the probability of picking each optimal solution and 1= Q ! jGood ! j as small as possible. 
Hence, the original denition in the previous section is the case where # := 0. Proposition 5.3.8 Theorem 5.3.6 still applies whenjp Good ip Good jj ,8 i ; j , for small enough > 0. Proof: Let be an optimal team, where p Good i is the same for all agents i . Hence, the probability of all actions in Good being selected by the team is the same. I.e., p k = p l ;8a k ;a l 2Good, and # := 0. Let := P a k 2Good P a l 2Good jp k p l j be the dierence between the probabilities of the team taking each optimal action. In the rest of the proof we will disturb the probabilities p Good i of sets of agents, which will change . I focus in studying the variation in , as minimizing the variation in also minimizes the variation in #. I prove by mathematical induction. Assume we change the p Good i ofx agents i , and is as small as possible. Now we will change x + 1 agents. Let's pick one agent i and increase its p Good i by . It follows that p k >p l ;8a k 2Good i ;a l = 2Good i , and the new 0 := P a k 2Good P a l 2Good jp k p l j> . If we add one more agent j , such that Good j \Good i =;, the probability of voting for actions a m 2Good j increases. For small enough , p Good j will be too large to precisely equalize the probabilities, and it follows thatp m >p k >p l ;8a m 2Good j ;a k 2 Good i ;a l = 2Good i [Good j , and 00 := P a k 2Good P a l 2Good jp k p l j> 0 . The same applies for each newly added agent, until we have a new team such that n = c Q d2D d (again, satisfying the conditions of the theorem). The base case follows trivially. If we did not change the probability of any agent (i.e., x := 0), and we now increase p Good i of a single agent i , p k > p l ;8a k 2 Good i ;a l = 2 Good i , and 0 > . By the same argument as before, adding more agents will only increase 0 , until n =c Q d2D d. Thirdly, I also generalize to the case where the number of preferred actions r changes for each agent. I consider that the number of actions in theGood i of each agent i (r i ) is decided according to a uniform distribution on the interval [1;r 0 ]. Proposition 5.3.9 If n =r 0 c Q d2D d, the probability that the team is optimal! 1 as c!1. 81 Proof: For largen, the number of agents withr i = 1;:::;r 0 is the same. Therefore, if for each subset i , such thatr =i;82 i , we have thatp i k =p i l ,8a k ;a l 2Good, we will have that p k =p l ;8a k ;a l 2Good. Given an optimal team of size n, we have r 0 subsets i of size n=r 0 each. It follows by Theorem 5.3.6 that n=r 0 =c Q d2D d, and: n =r 0 n=r 0 =r 0 c Y d2D d; hencen also follows the necessary conditions in Proposition 5.3.3. Similarly as in Theorem 5.3.6, asn!1 withc!1, the probability of picking a suboptimal solution converges to 0, and the probability of picking each of the optimal solutions converges to 1= Q ! jGood ! j (satisfying optimality condition (ii)). Lastly, I discuss the assumption that world states are independent. In design problems they could actually be correlated. Hence, I present below a constructive proof showing that we can still use our model to study design problems with correlated parameters. Proposition 5.3.10 The previous results still apply for design problems with correlated parameters. Proof: Let's consider a design problem with a set of parameters. We can divide in k sets, where all 2 k are correlated, but i and j are independent,8 i 2 i ; j 2 j ;i6= j. That is, all parameters in a k set are correlated, but the parameters between two dierent k sets are independent. 
This can always be performed since, in the worst case where all parameters are correlated, we can have a single set containing all the parameters. Now, instead of modeling each design parameter as a world state $\omega$ (as in my original model), we model each set of correlated parameters as a world state $\omega$. Hence, instead of an action $a$ being one value assigned to a single parameter, an action $a$ now represents one full combination of values for the parameters in a set. That is, instead of voting at each parameter, each agent $\phi_i$ now votes for one combination of value assignments (of correlated parameters) at each set. As all sets are mutually independent, we still have agents voting over independent world states $\omega$, and the previous results still apply. In the worst case, where all parameters of the problem are correlated, we would have agents voting for entire solutions, and the model would have a single world state $\omega$.

In the next section I perform synthetic experiments with agents whose pdfs differ, to further generalize my theory, and I show that diverse teams still significantly outperform uniform teams.

5.4 Synthetic Experiments

I run synthetic experiments, where I simulate design agents and evaluate diverse and uniform teams (henceforth diverse and uniform). I randomly create pdfs for the agents, and simulate voting iterations across a series of world states. I repeat all my experiments 100 times, and in the graphs I plot the average and the confidence interval of my results (according to a t-test with $p := 0.01$). I run 1000 voting iterations ($z$), and measure how many optimal solutions the team is able to find. I study a scenario where the number of actions $|A| := 100$, and the number of optimal actions per world state ($|Good_\omega|$) is, respectively, $\langle 2, 3, 5, 5, 5 \rangle$, for a total of 750 optimal solutions. At each repetition of my experiment, I randomly create a pdf for the agents.

I start by studying the impact of $\max \Lambda^+$ on uniform. When creating the uniform team, the total probability of playing any of the optimal actions (i.e., $p^{Good}$) is randomly assigned (uniform distribution) between 0.6 and 0.8. I fix the size of the team (25) and evaluate different values of $\max \Lambda^+$ in Figure 5.5. As expected from Proposition 5.3.2, for $\max \Lambda^+ := 0$ the system finds the highest number of optimal solutions; and as $\max \Lambda^+$ increases, the number quickly drops.

Figure 5.5: Percentage of optimal solutions found by uniform teams as $\max \Lambda^+$ grows. Note how $\max \Lambda^+ := 0$ gives the best result.

I then study the impact of increasing the number of agents, for uniform and diverse. To generate a diverse team, I randomly draw an $r_\omega$ in an interval $U$ for each world state, which will be the size of $Good_i$. I study three variants: diverse*, where $U := (0, |Good_\omega|]$; diverse, where $U := (0, |Good_\omega|)$; and diverse$\Delta$, where I allow agents to have different $r^i_\omega$, also drawn from $(0, |Good_\omega|)$. I independently create pdfs randomly for each agent $\phi_i$: for each agent I draw a number between 0.6 and 0.8 to distribute over the set of optimal actions, and randomly choose $r_\omega$ actions to compose its $Good_i$ set; 80% of the probability of voting for optimal actions is then distributed equally over the actions of that set, as sketched below.
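A concrete reading of this pdf-generation recipe for one agent at one world state could look as follows. This is a simplified illustration with names of my own choosing; in particular, spreading the remaining 20% of the optimal mass equally over the non-preferred optimal actions is an assumption, and the corner case where the agent prefers every optimal action is handled by giving the whole optimal mass to the preferred set:

import random

def diverse_agent_pdf(actions, good, r_w):
    # p_Good drawn in [0.6, 0.8]; r_w preferred optimal actions get 80% of it,
    # the remaining optimal actions share 20%, suboptimal actions share the rest.
    p_good = random.uniform(0.6, 0.8)
    preferred = set(random.sample(sorted(good), r_w))
    others = [a for a in good if a not in preferred]
    bad = [a for a in actions if a not in good]
    pdf = {}
    for a in preferred:
        pdf[a] = (0.8 * p_good if others else p_good) / len(preferred)
    for a in others:
        pdf[a] = 0.2 * p_good / len(others)
    for a in bad:
        pdf[a] = (1.0 - p_good) / len(bad)
    return pdf

# Example with |A| = 100 actions and |Good_w| = 5, as in the scenario above.
pdf = diverse_agent_pdf(range(100), set(range(5)), r_w=3)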
Figure 5.6: Percentage of optimal solutions as the number of agents grows. The uniform teams decrease in performance, while multiple variations of diverse teams improve, but with diminishing returns.

As we can see in Figure 5.6, the number of solutions decreases for uniform as the number of agents grows. Normally, in social choice, we expect performance to improve as teams get larger, so this is a novel result. It is, however, expected from Proposition 5.3.2. Diverse, on the other hand, improves in performance for all three versions, as predicted by my theory. However, the system seems to converge for a fixed $z$, as the performance does not increase much after around 20 agents.

Hence, in Figure 5.7 I study larger diverse (continuous lines) and diverse$\Delta$ (dashed lines) teams, going all the way up to 1800 agents. I also study four different numbers of voting iterations ($z$, shown in the figure by different lines): 1000, 2000, 3000, 4000. As we can see, although adding more agents was not really improving the performance in the experimental scenario under study, there is clearly a statistically significant improvement ($p < 0.01$) from increasing the number of voting iterations, with the system improving from finding around 53% of the optimal solutions all the way up to finding more than 80% of them. However, there is a diminishing-returns effect, as the impact of adding more iterations decreases as the actual number of iterations grows larger. We also note that diverse$\Delta$ is better than diverse, and the difference increases as $z$ grows.

Figure 5.7: Percentage of optimal solutions found by large teams of diverse agents.

As we can see, although theoretically possible, it is still a challenge to have a system that can find all the possible optimal solutions. Moreover, it would be expensive to pool the votes of agents through a large number of voting iterations. However, as I show next, we can actually approximate this process in a real system, by pooling only a small number of solutions from each agent and executing many voting iterations by aggregating different combinations of these solutions. In the next section I show executions in a real design system.

5.5 Experiments in Architectural Design

5.5.1 Architectural Design Domain

I study a real system for architectural building design. This is a fundamental domain, since the design of a building impacts its energy usage during its whole life-span [Bogenstätter, 2000, Lin and Gerber, 2014, Echenagucia et al., 2015]. I use Beagle [Gerber and Lin, 2013], a multi-objective design optimization software that assists users in the early-stage design of buildings. Hence, the experiments presented here were run in an actual system that performs expensive energy evaluations over complex architectural designs, and represent months of experimental work.

First, the designer creates a parametric design, containing (as discussed in Section 5.2) a set of parameters that can be modified within a specified range, allowing the creation of many variations. The ranges are defined according to legislation (e.g., setback, maximum height) or the intention of the designer (for example, the general shape of the building).

I use designs from Gerber and Lin [2013]: base, a simple building type with uniform program (i.e., tenant type); office park, a multi-tenant grouping of towers; and contemporary, a double "twisted" tower that includes multiple occupancy types, relevant to contemporary architectural practices. I show the designs in Figure 5.8.

Figure 5.8: Parametric designs with increasing complexity used in our experiments: (a) Base, (b) Office Park, (c) Contemporary.
Beagle uses a genetic algorithm (GA) to optimize the building design based on three objectives: energy efficiency, financial performance and area requirements. In detail, the objective functions are: $S_{obj}$: max SPCS; $E_{obj}$: min EUI; $F_{obj}$: max NPV. SPCS is the Spatial Programming Compliance Score, EUI is the Energy Use Intensity and NPV is the Net Present Value, defined as follows.

SPCS defines how well a building conforms to the project requirements (by measuring how close the area dedicated to different activities is to a given specification). Let $L$ be a list of activities (in our designs, $L = \langle$Office, Hotel, Retail, Parking$\rangle$), $area(l)$ be the total area in a building dedicated to activity $l$, and $requirement(l)$ be the area for activity $l$ given in the project specification. SPCS is defined as:

$$SPCS := 100 \left( 1 - \frac{\sum_{l \in L} |area(l) - requirement(l)|}{|L|} \right)$$

EUI captures the overall energy performance of the building: it is the estimated overall building energy consumption in relation to the overall building floor area. The process of obtaining the energy analysis result is automated in Beagle through the Autodesk Green Building Studio (GBS) web service.

Finally, NPV is a commonly used financial evaluation. It measures the financial performance over the whole building life cycle, and is given by:

$$NPV := \sum_{t=1}^{T} \frac{c_t}{(1+r)^t} - c_0,$$

where $T$ is the cash-flow time span, $r$ is the annual rate of return, $c_0$ is the construction cost, and $c_t :=$ Revenue $-$ Operation Cost.

Many options affect the execution of the GA, including: initial population size, population size, selection size, crossover ratio, mutation ratio, and maximum number of iterations. Further details about Beagle can be found in Gerber and Lin [2013].

At the end of the optimization process, the GA outputs a set of solutions. These are considered "optimal" according to the internal evaluation of the GA, but are not necessarily so. As in my theory, for each parameter the assigned value is going to be one of the optimal ones with a certain probability. In fact, most of the solutions output by the GAs are later identified as sub-optimal and eliminated in comparison with better ones found by the teams.

Table 5.4: GA parameters for the diverse team. Initial Population and Maximum Iteration were kept constant at 10 and 5, respectively. PZ = Population Size, SZ = Selection Size, CR = Crossover Ratio, MR = Mutation Ratio.

  Agent      PZ    SZ    CR     MR
  Agent 1    12    10    0.8    0.1
  Agent 2    18    8     0.6    0.2
  Agent 3    24    16    0.55   0.15
  Agent 4    30    20    0.4    0.25

I model each run of the GA as an agent. Each parameter of the parametric design is a world state $\omega$, where the agents decide among different actions $A_\omega$ (i.e., possible values for the current parameter). My model assumes multiple independent voting iterations across all world states. However, in general it could be expensive to pool the votes of agents over a large number of iterations. Therefore, in order to test the applicability of the predictions of my model in more realistic scenarios, in my experiments I actually pool only 3 solutions per agent, but run multiple voting iterations by aggregating over all possible combinations of them. That is, at each combination I pick one solution per agent, and vote across all the design parameters, for a total of 81 voting iterations with 4 agents. Nevertheless, the predictions of my model are verified in my empirical experiments, presented next.

5.5.2 Empirical Results

I run experiments across the different parametric designs shown in Figure 5.8. These are designs with increasing complexity.
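Before turning to the results, note that the two closed-form objectives of Section 5.5.1 transcribe directly into code. The sketch below is only a literal reading of the formulas above (function names, input structures and units are my own choices, not Beagle's implementation):

def spcs(area, requirement):
    # Spatial Programming Compliance Score over the activity list L.
    activities = list(requirement)
    deviation = sum(abs(area[l] - requirement[l]) for l in activities) / len(activities)
    return 100 * (1 - deviation)

def npv(yearly_cash_flows, rate, construction_cost):
    # Net Present Value: discounted yearly cash flows (c_t = revenue - operation cost)
    # minus the construction cost c_0.
    return sum(c_t / (1 + rate) ** t
               for t, c_t in enumerate(yearly_cash_flows, start=1)) - construction_cost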
More details about the designs and the meanings of each parameter are available in Gerber and Lin [2013]. I create 4 dierent agents, using dierent options for the GA, as shown in Table 5.4. Contrary to the previous synthetic experiments, we are dealing here with real (and consequently complex) design problems. Hence, the true set of optimal solutions is un- known. I approach the problem in a comparative fashion: when evaluating dierent systems, I consider the union of the set of solutions of all of them. That is, letH x be the set of solutions of system x; I consider the setH := S x H x . I compare all solutions in H, and consider as optimal the best solutions inH, forming the set of optimal solutions O. I use the concept of Pareto dominance: the best solutions inH are the ones that dominate all other solutions (i.e., they are better in all 3 objectives). As I know which 87 system generated each solution o2O, I estimate the set of optimal solutionsS x of each system. Although my theory focuses on plurality voting as the aggregation methodology, in order to have a more thorough experimental study, I also present results using the mean and the median of the opinions of the agents. That is, given one combination (a set of one solution from each agent), I also generate a new solution by calculating the mean or the median of the values from each agent across all parameters (i.e., world states). Also, when performing the voting aggregation (vote), I consider values that are the same up to 3 decimal places as equal. Concerning uniform, I evaluate a team composed of copies of the \best" agent. By \best", I mean the agent that nds the highest number of optimal solutions. According to Proposition 5.3.2, such an agent should be the one with the lowest max + , and we can predict that voting among copies of that agent generates a large number of optimal solutions. Hence, for each design, I rst compare all solutions of all agents (i.e., construct H as the union of the solutions of all agents), to estimate which one has the largest set of optimal solutions S. I, then, run that agent multiple times, creating uniform. For diverse, I consider one copy of each agent in Table 5.4. I aggregate the solutions of diverse and uniform. I run 81 aggregation iterations (across all parameters/world states), by selecting 3 solutions from each agent i , in its set of solutionsH i , and aggregating all possible combinations of these solutions. I evaluate together the solutions of all agents and all teams (i.e., I constructH with the solutions of all systems), in order to estimate the size ofS x of each system. Since the true optimal solutions set is unknown, I rst plot the percentage of unique solutions found by each system in relation to the total number of unique optimal solutions inH. Hence, in Figure 5.9 (a), I show the percentage of optimal solutions for all systems, in relation tojOj. For clarity, I represent the result of the individual agents by the one that had the highest percentage. As we can see, in all parametric designs the teams nd a signicantly larger percentage of optimal solutions than the individual agents. The agents nd less than 1% of the solutions, while the teams are in general always close to or above 15%. In total (considering all aggregation methods and all agents), for all three parametric designs the agents nd only about 1% of the optimal solutions, while uniform nds around 51% and diverse 47%. 
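The construction of the optimal set O from the pooled set H described above amounts to a standard non-dominated filter. A minimal sketch, assuming each solution is represented as a tuple of objective values oriented so that larger is better (EUI would be negated first); the example numbers are purely illustrative:

def dominates(u, v):
    # u Pareto-dominates v: at least as good in every objective, strictly better in one.
    return all(a >= b for a, b in zip(u, v)) and any(a > b for a, b in zip(u, v))

def pareto_front(solutions):
    # Keep the solutions of H that are not dominated by any other solution in H.
    return [u for u in solutions if not any(dominates(v, u) for v in solutions if v != u)]

H = [(96.0, -43.5, 8.1e8), (97.5, -43.3, 7.9e8), (95.0, -44.0, 7.0e8)]
print(pareto_front(H))   # the third solution is dominated and is filtered out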
Looking at vote, in base diverse nds a larger percentage of optimal solutions than uniform (around 9:4% for uniform, while 11:6% for diverse). In oce park and contemporary, however, uniform nds more solutions than diverse. Based on Proposition 5.3.2, we expect that this is caused by the best agent having a lower max + in oce park and contemporary than in base. 88 Base Office Contemporary 0.00 0.05 0.10 0.15 0.20 0.25 0.30 0.35 0.40 % of Optimal Uniform Diverse Uniform Diverse Uniform Diverse Agents Vote Mean Median (a) Percentage in relation to all solutions found by all systems. Base Office Contemporary 0.0 0.2 0.4 0.6 0.8 1.0 #Optimal/Solution Set Uniform Diverse Uniform Diverse Uniform Diverse (b) Percentage in relation to the number of solutions of each system. Figure 5.9: Percentage of optimal solutions of each system. The teams nd a much larger percentage than the individual agents. 89 Figure 5.9 (b) shows the percentage of optimal solutions found, in relation to the size of the set of evaluated solutions of each system. That is, let O x be the set of optimal solutions of system x, inO. I show jOxj jHxj . Concerning vote, the teams are able to nd a new optimal solution around 20% of the time for base, around 73% of the time for oce park and around 36% of the time for contemporary. Meanwhile, for the individual agents it is close to 0%. We can see that teams have great potential for generating new optimal solutions, as expected from my theory. However, as studied in my synthetic experiments, we can expect some diminishing returns when increasing the number of voting iterations. I show examples of solutions created by the teams in Figure 5.10. It is interesting to note that the performance of the teams is much higher for oce park than for the other two parametric designs. In base and contemporary, the building mass is parametrized into a single volume, while in oce park the building mass has multiple volumes. Hence, a possible explanation is that the division in multiple volumes facilitated the generation of multiple optimal solutions, since these can be combined in many dierent ways. I also plot in Figure 5.11 (a) the percentage of solutions that were reported to be optimal by each agent, but were later discovered to be suboptimal by evaluatingH. A large amount of solutions are eliminated, close to 100%, helping the designer to avoid making a poor decision, and increasing her condence that the set of optimal solutions found represent well the \true" Pareto frontier. Moreover, I test for duplicated solutions across dierent aggregation methods, dierent teams and dierent agents. The number is small: only 4 in contemporary, and none in base and oce park. Hence, we are providing a high coverage of the Pareto frontier for the designer. I show the total number of optimal solutions in Figure 5.11 (b). We reconrm here that in the oce park design, where the building mass is divided in multiple volumes, we could generate a larger number of optimal solutions. Finally, to better study the solutions proposed by the agents and teams, I plot all the optimal solutions in the objectives space in Figure 5.12, where I show that the solutions give a good coverage of the Pareto frontier. 5.6 Discussion I present in this chapter a new model of teams of voting agents for design problems. I propose as the main objective of such system to present for human evaluation as many optimal solutions as possible. 
Hence, in this chapter I present a view of the role of articial intelligence (AI) in the creative process as generating a large number of optimal 90 (a) Base. Shaded area shows variance of building's footprint in relation to site. Dashed line indicates height variance. (b) Oce Park. Dashed line shows variance in volume. (c) Contemporary. Line shows variance in orientation. Figure 5.10: Some building designs generated by the teams. 91 Base Office Contemporary 0.0 0.2 0.4 0.6 0.8 1.0 % of False Agent 1 Agent 2 Agent 3 Agent 4 (a) False optimal solutions that are eliminated. Base OfficeContemporary 0 50 100 150 200 250 300 350 400 # Optimal (b) Number of unique optimal solutions. Figure 5.11: Additional analysis. I show here that many falsely reported optimal solutions are eliminated by the team of agents, and also that the teams provide a large number of optimal solutions to the designer. solutions, so that a human can later take an aesthetic choice (also based on a quantitative evaluation of dierent objectives, solving complex trade-os). A human designer also needs to input an initial parametric design (such as the ones in Figure 5.8) for the agents to vote for dierent combinations of values for such design. Hence, this chapter views the role of AI in the creative process as enabling a collaborative work between a human and algorithms. I study here the potential of voting systems to be creative by generating a large number of optimal solutions, but the entire creative process is actually a collaboration between the voting system and the designer, and the aesthetic evaluation of quality/beauty is assigned by a human. Hence, the AI system is still in the role of optimizing quantiable metrics (cost, energy eciency, etc), that would be too burdensome for a human to optimize, while the human designer assumes the aesthetic evaluation work (which I argue that cannot be properly taken by the machine), besides solving trade-os between objectives that may not be well formalized, and hence cannot be handled by the computer. The role of AI in the creative process is an important topic of current discussion. For example, d'Inverno and McCormack [2015], in a very recent position paper, divide the AI research in two dierent types: what they ironically call \Heroic AI", when researchers try to build systems that are in the role of the whole creative process; and \Collaborative AI", when a computational system works in collaboration with a human in the creative process. This chapter, therefore, is aligned with the second classication of d'Inverno and McCormack. Another important topic of discussion is how to evaluate AI systems that handle creative processes. There is not yet a widely accepted evaluation metric in the literature. 
Some researchers use tests inspired by the "Turing test", and evaluate whether humans can distinguish if a creative output (such as music) was produced by a human or by a computer [Burnett et al., 2012]. Other works evaluate in qualitative terms, by reporting the impressions of a human when using the system to create [Berman and James, 2015]. Others, such as Machado et al. [2015], use evolutionary algorithms to generate creative objects (in their case, ambiguous images), and use the fitness function of the evolutionary algorithm as a quantitative evaluation metric. However, in the end they still need to manually look over the "optimal" solutions found by the algorithm and make a qualitative judgment of whether the system was really able to create what it was supposed to. Hence, the quantitative evaluation still fails to truly represent a measurable quality of the system.

Figure 5.12: All the optimal solutions in the objectives space (NPV, EUI and Design Score), for (a) Base, (b) Office Park and (c) Contemporary.

In this chapter I propose the number of optimal alternatives as a quantitative evaluation metric for a "Collaborative AI" system. Of course, that may not be the best quantitative metric for all creative processes where AI is involved, but it is one possible evaluation metric that could be used for systems where the aesthetic choice is left to the human.

It could still be an issue that we may not want to overload a human with too many solutions. Although mathematically all the solutions are of equal "optimal" quality, the human may prefer one objective over another (in a multi-objective optimization problem), or may want to evaluate only highly different solutions in the aesthetic sense. This situation is, however, much better than if the human had to pick one solution in the space of all possible solutions (optimal and suboptimal). The proposed system already helps to reduce the space to only the solutions that matter, by presenting only optimal solutions for human evaluation.

I propose, hence, that such voting systems that maximize the number of optimal solutions could be used as a building block of a bigger "Collaborative AI" system. Their output could be the input of another system that clusters and organizes the proposed solutions, in order to present them to a human designer in a manageable way. As mentioned previously, systems that organize a large number of solutions for human evaluation are currently an active research topic in computational design [Erhan et al., 2014, Smith et al., 2010].

Additionally, my model assumes independent world states, which are initially modeled as the parameters of a parametric design problem. Some design problems, however, may have correlated parameters. I discuss an extension of my model in Proposition 5.3.10, where the parameters are grouped into independent groups.
In the worst case of such an extension, if all parameters are correlated, the agents would vote for entire solutions, rather than for each parameter. Besides, a theoretical model allows us to make predictions about a real system, but most often it is only an approximation of reality (as actual systems are normally too complex to be modeled and analyzed in detail). Hence, even though the parameters of a problem may not be completely independent, they (or some subsets of them, as in my extension) may still have some degree of independence, so that my model will still approximate the real execution.

In light of Proposition 5.3.10, one may also question the need for world states in my model, instead of simply always voting for entire solutions, which would work for any design problem. I argue, however, that we should use multiple world states whenever possible and reasonable, as this has the greatest potential for creating new solutions. With a single world state, we will always pick one solution proposed by one of the agents, while with multiple states we can generate solutions that were not individually proposed by any of the agents (since at each world state we may select actions proposed by different subsets of agents).

There are, of course, even more complex problems whose solutions cannot even be properly evaluated by any metric, such as the "wicked problems" coined by Rittel and Webber [1973]. Such problems seem to be beyond the scope of my theory, as we need to be able to evaluate a set of solutions with some well defined metric, so that a set of "optimal" solutions can be found. At the very least, we would need an approximate measurable quality for each solution, which could be obtained, for example, with computer simulations.

Finally, this chapter discusses the importance of having diverse agents, but not how such a set of agents with different preferences can be effectively obtained. In Section 5.5 I create different agents by using different settings in a GA software (Beagle), but that may not yet be the best way to obtain such diverse teams. Creating and evaluating different diversification strategies is still an avenue for future work. Perhaps the evaluation function of each GA could be tweaked to bias its search towards different portions of the search space. Some of my preliminary results [Marcolino et al., 2014a], however, show that this should be done with care, as giving one unique objective for each agent to optimize was not as effective as using different parametrizations for each GA in the way that I present in this chapter.

Essentially, the best diversification strategy may depend on each specific design problem. For some problems a bias towards specific kinds of solutions could be programmed for each agent, or different algorithmic strategies may converge to different solutions. A deeper study on how diverse agents can be effectively generated for different design problems is, however, beyond the scope of this thesis. Nevertheless, this chapter sets formal and theoretical guidelines for the community to better develop, in future work, strong diverse voting teams for design.

5.7 Conclusion

Design imposes a novel problem on social choice: maximize the number of optimal solutions. This problem is of fundamental importance, since designers need a large space of alternatives in order to make a choice.
Ideally, every solution a designer examines should be optimal according to measurable metrics, so that she can then solve complex trade-offs that may not be formalized and/or decide according to aesthetics. Hence, I present a new model for agent teams, which shows the potential of a system of voting agents to be creative, by generating a large number of optimal solutions for the designer. My analysis, which builds a new connection with number theory, presents several novel results: (i) uniform teams are in general suboptimal, and converge to a unique solution; (ii) diverse teams are optimal as long as the team size is increased carefully; (iii) the minimum optimal team size is constant with high probability; (iv) the worst case for teams is a prime number of optimal actions. I also extend my theory to more general situations, covering cases where agents have greater variability in their probabilities of voting for each optimal action, considering not only variability across different actions but also across different agents. Most importantly, I also show one way to generalize my theory to cover design problems with correlated parameters, by considering independent parameter sets.

I further study my model with synthetic experiments, which evaluate teams of agents with bounded time and relaxed assumptions, as I allow the probability distribution functions of the agents to vary. The experiments explore three variations of diverse teams and show that they all perform well, increasing in performance as the number of agents and/or the number of voting iterations grows, although with diminishing returns. Uniform teams, on the other hand, decreased in performance as the number of agents increased, as expected by the theory.

I present results in conceptual architectural design, where I study a real team of genetic algorithm agents, which propose different alternatives for the design of buildings, optimizing for energy efficiency, cost and project requirements. I show that teams find a large number of solutions for designing energy-efficient buildings across three different parametric designs of increasing complexity. Moreover, I find that, when voting, a diverse team is able to find a larger set of solutions than a uniform team in the real system for one of the design problems, a result that could be explained by my theory. I also noticed that we could find a larger number of optimal solutions in the parametric design where the building mass was divided into multiple volumes, which could indicate that these kinds of designs are preferable when using voting systems.

Finally, I present an extensive discussion of how this work fits in the context of the study of creativity within AI, and I argue that my model can be considered "Collaborative AI", where humans and algorithms work together in the process of creation. I also propose that voting systems which maximize the number of optimal solutions could be combined with another system that clusters and filters the solutions, and presents them in a manageable way to a designer. Such systems are a current topic of research in the computational design literature.

This chapter, hence, lays a first theoretical foundation for the study of multi-agent voting teams for design problems. Further extensions of this model, and a deeper study on how to methodically create diverse agents for design, are exciting avenues of future work for the multi-agent and/or computational design communities.
Part III
Aggregation of Opinions

Chapter 6
Ranked Voting

A difference of opinions just means you and I
On different subjects do not see eye to eye
And who is wrong or right is not for me to say
Since we do look at life in a different way.
(Francis Duggan)

6.1 Introduction

Ranked voting is an active topic of research in the social choice literature [Caragiannis et al., 2013, Xia and Conitzer, 2011a,b, Soufiani et al., 2012, Baharad et al., 2011, Mao et al., 2013]. However, there are not many systems actually using ranked voting approaches, and plurality is still the most common aggregation methodology. Additionally, it is not yet clear how to extract rankings from existing agents, and how to methodically generate diverse teams of agents in order to aggregate their opinions.

In this chapter, I study the performance of ranked voting approaches in two real systems: Computer Go and influence maximization in social networks using PSINET agents. I analyze experimentally the performance of classical ranked voting rules, and discuss different ways to extract rankings from existing agents. Additionally, I study the performance of large diverse teams of agents, generated as random parametrizations of one base agent.

This chapter encompasses two of my publications: Jiang et al. [2014] and Yadav et al. [2015]. Additional experimental and theoretical results can be found in those publications, but in this thesis I focus on my main contributions, and hence only some of the results of those publications are presented and discussed here. Moreover, contrary to the rest of the thesis, this chapter does not introduce new models, but rather presents techniques for team generation and ranking extraction, and experimental results in real systems. The main focus of the experiments is the Computer Go domain, but I also present results in social networks, when handling the influence maximization problem.

This chapter is organized in the following way: I briefly introduce the social networks domain considered in this chapter below (Section 6.1.1). Then, Section 6.2 and Section 6.3 introduce the team generation and ranking extraction methodologies, in the context of Computer Go. Section 6.4 discusses the social network domain in more detail, including the generation of voting agents and the aggregation methodologies in that context. Section 6.5 presents my experimental results: first, results in the Computer Go domain in Sections 6.5.1 and 6.5.2; then, in Section 6.5.3, results in the social networks domain.

6.1.1 Social Networks

Homelessness affects 2 million youths in the USA annually, 11% of whom are HIV positive, which is 10 times the rate of infection in the general population [Aidala and Sumartojo, 2007]. Peer-led HIV prevention programs such as POL [Kelly et al., 1997] try to spread HIV prevention information through network ties and recommend selecting intervention participants based on Degree Centrality (i.e., highest degree nodes first). Such peer-led programs are highly desirable to agencies working with homeless youth, as these youth are often disengaged from traditional health care settings and are distrustful of adults [Rice and Rhoades, 2013, Rice, 2010].

Agencies working with homeless youth prefer a series of small-size interventions deployed sequentially, as they have limited manpower to direct towards these programs.
This fact, along with the emotional and behavioral problems of these youth, makes managing groups of more than 5-6 youth at a time very difficult [Rice et al., 2012]. Strategically choosing intervention participants is important so that information percolates through their social network in the most efficient way.

In this chapter, I use PSINET to study ranked voting aggregation. PSINET (POMDP based Social Interventions in Networks for Enhanced HIV Treatment) is a Partially Observable Markov Decision Process (POMDP) based system which chooses the participants of successive interventions in a social network [Yadav et al., 2015]. The main motivation of PSINET is to choose participants for a sequence of interventions, where they would be educated about HIV prevention, in order to spread that knowledge across the social network. As I explain in detail in Section 6.4, PSINET considers the uncertainty over the edges (i.e., friendship connections) of the social network. Such uncertainty is very common when handling real life social networks, such as homeless populations.

6.2 Team Generation

I use a novel methodology for generating large teams. It is fundamentally different from that of Chapters 3 and 4, where I created a diverse team by combining four different, independently developed Go programs. Here I automatically create arbitrarily many diverse agents by parametrizing one Go program. Specifically, I use different parametrizations of Fuego 1.1 [Enzenberger et al., 2010]. Fuego is a state-of-the-art, open source, publicly available Go program; it won first place in 19x19 Go in the Fourth Computer Go UEC Cup, 2010, and also won first place in 9x9 Go in the 14th Computer Olympiad, 2009.

I sample random values for a set of parameters for each generated agent, in order to change its behavior. In Table 6.1 I present the parameters that were sampled to generate parametrized versions of Fuego. For each random draw, I used a uniform random distribution, defined on the interval shown in the column "Range". Also, depending on the domain of each parameter, I sample integers or floating point numbers. A detailed description of these parameters is available in the Fuego documentation, at http://fuego.sourceforge.net/fuego-doc-1.1/. Using this approach, I generate larger diverse teams than those shown in previous chapters of this thesis. I evaluate such teams in Section 6.5.1.

6.3 Ranked Voting

6.3.1 Ranking Extraction

Fuego (and, in general, any program using Monte Carlo tree search algorithms) is not originally designed to output a ranking over all possible moves (alternatives), but rather to output a single move: the best one according to its search tree (there is no guarantee that the selected move is in fact the best one). Hence, we need to study ways to obtain a ranking from the agents. I study two different methodologies.

First, I modified Fuego to make it directly output a ranking over moves. When asked to find the best move, Fuego builds a search tree. Each node of the tree corresponds to a move in the current board state (given by the current level in the tree). In order to estimate the value of each node, Fuego runs simulations. Hence, each node has two values associated with it: p and n, where p is the probability of that move being the best one and n is the number of simulations used to estimate this probability. Due to the nature of Monte Carlo tree search algorithms, different nodes have wildly different n values.
Hence, the comparison of moves according to the p values is unstable, and by default Fuego outputs the move with the highest n value. As a natural generalization, I inspect the first level of the final search tree, and rank the moves according to their n values.

Table 6.1: Parameters sampled to generate different versions of Fuego.

  Parameter                                       Domain    Range
  uct param globalsearch mercy rule               Integer   [0, 1]
  uct param globalsearch territory statistics     Integer   [0, 1]
  uct param globalsearch length modification      Float     [0, 0.5]
  uct param globalsearch score modification       Float     [0, 0.5]
  uct param player forced opening moves           Integer   [0, 1]
  uct param player reuse subtree                  Integer   [0, 1]
  uct param player use root filter                Integer   [0, 1]
  uct param policy nakade heuristic               Integer   [0, 1]
  uct param policy fillboard tries                Integer   [0, 5]
  uct param rootfilter check ladders              Integer   [0, 1]
  uct param search check float precision          Integer   [0, 1]
  uct param search prune full tree                Integer   [0, 1]
  uct param search rave                           Integer   [0, 1]
  uct param search virtual loss                   Integer   [0, 1]
  uct param search weight rave updates            Integer   [0, 1]
  uct param search bias term constant             Float     [0, 1.0]
  uct param search expand threshold               Integer   [1, 4]
  uct param search first play urgency             Integer   [1, 10000]
  uct param search knowledge threshold            Integer   [0, 10000]
  uct param search number playouts                Integer   [1, 3]
  uct param search prune min count                Integer   [1, 128]
  uct param search randomize rave frequency       Integer   [0, 200]
  uct param search rave weight final              Integer   [1000, 10000]
  uct param search rave weight initial            Integer   [0, 999]

I also introduce a novel ranking methodology, which I call "ranking by sampling". For each board state, I sample moves from each agent, and rank the moves according to how frequently they were played by each agent. I generate the samples by repeatedly asking the agents where they would play, but without actually changing the current board state (I do not let the agents reuse parts of previous search trees, so that the samples are as independent as possible). In Section 6.5.2, I compare these two ranking procedures.

6.3.2 Ranked Voting Rules

In this chapter I study the performance of 5 different ranked voting rules: Plurality, Borda, Harmonic, Maximin, and Copeland. For completeness, I briefly introduce them below.

Plurality: Plurality is the simplest voting rule: given one vote per agent, the action with the highest number of votes is taken by the team. If the agents submit rankings, the plurality voting rule considers only the top position of the rankings.

Borda: The Borda voting rule assigns a score to each position in the ranking of the agents. Given a set of rankings (one per agent), the total score obtained by each action is calculated (by summing up the scores assigned to the action at each ranking, according to the position of the action in that ranking). The action with the highest total score is taken by the team. Traditionally, the scoring vector is (m, m-1, ..., 1), where m is the size of the ranking. That is, at each ranking, the top action receives score m, the second action receives score m-1, and so on, until reaching the final action, which receives a score of 1. In this chapter I limit Borda to the top 6 positions in the rankings.

Harmonic: The harmonic rule is similar to Borda: each position in the ranking is assigned a score, and rankings are aggregated by summing up the scores of each action. However, the scoring vector of the harmonic rule is defined as (1, 1/2, ..., 1/m) [Boutilier et al., 2012].
Maximin: Maximin follows a different approach from the previous rules. Instead of giving scores to each ranking position, it considers pairwise comparisons among all actions. Let n(a_i, a_j) be the number of agents that rank action a_i higher than a_j. The maximin score of each action a_i is its worst pairwise score: min_{j != i} n(a_i, a_j). The final result ranks the actions by their maximin score (i.e., the top action has the highest maximin score).

Copeland: The Copeland rule follows a similar approach to the maximin rule: it considers pairwise comparisons among all actions. Again, let n(a_i, a_j) be the number of agents that rank action a_i higher than a_j. An action a_i beats an action a_j if more agents rank a_i higher than a_j, that is, if n(a_i, a_j) > n(a_j, a_i). The Copeland score of an action a_i is defined as the number of actions that a_i beats. Again, the final result ranks the actions by their Copeland score (i.e., the top action has the highest Copeland score).

6.4 PSINET

I also study ranked voting for influence maximization, using PSINET [Yadav et al., 2015]. In this section I briefly describe the system; additional details are available in Yadav et al. [2015].

PSINET was developed to solve the influence maximization problem under uncertainty. Formally, let G := (V, E) be a graph with a set of nodes V and edges E. We perform a sequence of interventions, where at each intervention we pick x nodes. A node may be either influenced or uninfluenced. An uninfluenced node may change to influenced, but an influenced node will never change back to uninfluenced. Each time we pick a node for an intervention, it will change to influenced. When a node changes from uninfluenced to influenced, it will "spread" the influence to its neighbors with some probability. That is, at each edge e there is a probability p_e. If a node v_1 is influenced, and there is an edge e = (v_1, v_2), the node v_2 will also change to influenced with probability p_e. Similarly, if v_2 changes to influenced, it will spread the influence to its neighbors by the same process. Our objective is to maximize the number of influenced nodes after all interventions.

Contrary to the traditional independent cascade model [Kempe et al., 2003], Yadav et al. [2015] consider that each edge also has an existence probability q_e. Hence, if an edge e actually exists (with probability q_e), the influence will spread along the edge with probability p_e. If the edge does not exist (with probability 1 - q_e), influence will not spread. Additionally, unlike the traditional model [Kempe et al., 2003], Yadav et al. [2015] consider that an influenced node v_i may spread influence to its neighbors after each time step, instead of only at the moment when it changed to influenced. Moreover, each time a node v_i is selected for an intervention, we are able to observe its edges. That is, we will know which edges connected to v_i actually exist, and which edges do not.

6.4.1 Aggregation Methodologies

PSINET works by generating multiple agents before each intervention. Each agent votes for one action (or a ranking over actions), where in this context an action represents the subset of nodes that the agent thinks should be called for an intervention. The system takes an action by aggregating the opinions of all agents. I study three different aggregation methodologies, which are described later in this section. Each agent is created by sampling one possible graph instantiation. That is, for each edge e, we decide whether it will exist or not with probability q_e; a small sketch of this sampling step follows.
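The sketch below only illustrates the idea of drawing one possible network per agent; it is not PSINET's actual implementation, and the function and variable names (sample_instance, certain_edges, uncertain_edges, q) are mine.

    import random

    def sample_instance(certain_edges, uncertain_edges, q, rng=random):
        """Draw one possible network: certain edges are always kept, and each
        uncertain edge e is kept independently with its existence probability q[e]."""
        kept = set(certain_edges)
        for e in uncertain_edges:
            if rng.random() < q[e]:
                kept.add(e)
        return kept

    # Tiny example: two uncertain edges, each existing with probability 0.5,
    # and 20 agents, each reasoning over its own sampled instance.
    uncertain = [(2, 3), (3, 4)]
    q = {e: 0.5 for e in uncertain}
    team_instances = [sample_instance({(1, 2)}, uncertain, q) for _ in range(20)]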
After going through all edges, we will have one possible network, which is given to an agent. The agent then runs simulations in its sampled network, in order to decide which action to recommend. I study three different aggregation methodologies:

PSINET-S: Uses the usual plurality voting, where the vote of each agent has equal weight.

PSINET-W: Uses weighted plurality voting. Let m be the number of uncertain edges. An agent whose sampled instance removes x uncertain edges has a vote weight of

    W(x) := x,      if x <= m/2
            m - x,  if x > m/2

This weighting scheme approximates the probabilities of occurrence of real world events by giving low weights to instances which remove either too few or too many uncertain edges, since those events are less likely to occur. Instances which remove m/2 uncertain edges get the highest weight, since that event is most likely.

PSINET-C: Uses the Copeland voting rule, described in the previous section. In order to use this aggregation methodology, it is necessary to have a ranking for each agent. I use the same method as in the previous section: each agent is queried multiple times, and the actions are ranked according to how frequently they are chosen by the agent (the most frequent action is assigned to the top position of the ranking). The rankings of the team are then aggregated using the Copeland voting rule.

In Section 6.5.3 I perform an experimental evaluation of these aggregation methodologies.

6.5 Results

6.5.1 Team Generation

I start by presenting the results in the Computer Go domain. All results were obtained by simulating 1000 9x9 Go games, on an HP dl165 with dual dodeca-core, 2.33GHz processors and 48GB of RAM. I compare the winning rates of games played against a fixed opponent. In all games the system under evaluation plays as white, against the original Fuego playing as black. I evaluate two types of teams: Diverse is composed of different agents, and Uniform is composed of copies of a specific agent (with different random seeds). In order to study the performance of the uniform team, for each sample (which is an entire Go game) I construct a team consisting of copies of a randomly chosen agent from the diverse team. Hence, the results presented for Uniform are approximately the mean behavior of all possible uniform teams, given the set of agents in the diverse team. In all graphs, the error bars show 99% confidence intervals.

Figure 6.1 shows the winning rates of Diverse and Uniform for a varying number of agents using the plurality voting rule. The winning rates of both teams increase as the number of agents increases. Diverse and Uniform start with similar winning rates, around 35% with 2 agents and 40% with 5 agents, but with 25 agents Diverse reaches 57%, while Uniform only reaches 45.9%. The improvement of Diverse over Uniform is not statistically significant with 5 agents (p = 0.5836), but is highly statistically significant with 25 agents (p = 8.592 x 10^-7). I perform linear regression on the winning rates of the two teams to compare their rates of improvement in performance as the number of agents increases. Linear regression (shown as the dotted lines in Figure 6.1) gives the function y = 0.0094x + 0.3656 for Diverse (R^2 = 0.9206, p = 0.0024) and y = 0.0050x + 0.3542 for Uniform (R^2 = 0.8712, p = 0.0065). In particular, the linear approximation of the winning rate of Diverse increases roughly twice as fast as that of Uniform as the number of agents increases.
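For reference, the diverse team evaluated here is obtained by sampling the Table 6.1 parameters uniformly at random. The sketch below only illustrates that sampling step: the parameter subset shown is a fraction of Table 6.1, and how a configuration is actually passed to Fuego (e.g., through its configuration interface) is not shown.

    import random

    # A few of the (parameter, type, range) rows of Table 6.1; the full table has 24.
    PARAMETER_SPACE = [
        ("uct param globalsearch mercy rule",          int,   (0, 1)),
        ("uct param globalsearch length modification", float, (0.0, 0.5)),
        ("uct param search expand threshold",          int,   (1, 4)),
        ("uct param search first play urgency",        int,   (1, 10000)),
        ("uct param search rave weight final",         int,   (1000, 10000)),
    ]

    def sample_agent_config(rng):
        """One diverse agent: each parameter is drawn uniformly from its range,
        as an integer or a float depending on its domain."""
        return {name: (rng.randint(lo, hi) if kind is int else rng.uniform(lo, hi))
                for name, kind, (lo, hi) in PARAMETER_SPACE}

    rng = random.Random(0)
    diverse_team = [sample_agent_config(rng) for _ in range(25)]  # e.g., a 25-agent team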
I now analyze the parametrized agents, in order to better understand the performance of the diverse team. First, I show that the original Fuego is stronger than the parametrized agents. I ran 1000 9x9 Go games, with the system under evaluation playing as white, against the original Fuego playing as black. In Figure 6.2 we can see the winning rate of Fuego and of each one of the parametrized agents. The original Fuego is the strongest agent (with p < 0.01 for all but 3 agents), having a winning rate close to 50%. The parametrized agents, on average, have a winning rate of 32.3% (std: 10.4%).

[Figure 6.1: Winning rates for Diverse (continuous line) and Uniform (dashed line), for a variety of team sizes, using the plurality voting rule.]

[Figure 6.2: Winning rate of Fuego and of the parametrized agents.]

I also evaluate the diversity of a team of parametrized agents, by analyzing a sample of 10 parametrized agents. I use the metric proposed in Chapter 3, where diversity is defined as the average Hellinger Distance [Hellinger, 1909] between the probability distribution functions (pdfs) of all possible combinations of pairs of agents across different world states. I show three different results: Control compares each agent with a second sample of itself, in order to measure the noise in my evaluation; Parametrized Agents compares all possible pairs of parametrized agents, in order to estimate the diversity of our team; and Independent Agents compares each parametrized agent with Pachi [Baudiš and Gailly, 2011], an independently developed Computer Go program. In order to perform the analysis, I estimate the pdfs of Pachi and of 10 agents from the diverse team, using 100 different board states. For each board state I sample 100 moves for each agent. The results are shown in Figure 6.3(a). These results indicate that the level of diversity is especially high when the parametrized agents are compared with Pachi, suggesting that the current parametrization methodology falls short of creating an idealized diverse team. That said, the methodology does lead to some diversity, as indicated by the statistically significant difference between the Control bar and the Parametrized Agents bar.

I also evaluate the level of diversity by testing whether there is a set of board states where all parametrized agents have a low probability of playing the best action. Again, I evaluate a sample of 10 agents from the diverse team. I first estimate the best move for each of 100 board states. To this end, I use Fuego to evaluate the given board state, but with a time limit 50x higher than the default one. Then, based on the previously estimated pdfs of the parametrized agents, we can obtain the probability of each agent playing the optimal action. Finally, I calculate the proportion of board states in which all parametrized agents play the best action with probability below a certain threshold. The results are shown in Figure 6.3(b). It turns out that all parametrized agents play the optimal action with probability smaller than 1/2 in 40% of the board states. Moreover, in 10% of the board states, the probability of playing the optimal action is lower than 10%. Hence, there is still a large set of board states in which all agents play badly, regardless of the parametrization.
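Before turning to the ranked-voting results, a minimal sketch of the machinery used in the next subsection may be helpful: building a ranked ballot by repeatedly sampling an agent, and aggregating ballots with the rules of Section 6.3.2. This is an illustrative implementation under my own tie-breaking and scoring assumptions (e.g., unranked moves are treated as ranked last, and Borda scores the top 6 positions 6, 5, ..., 1), not the exact code used in the experiments.

    import itertools
    from collections import Counter, defaultdict

    def ranking_by_sampling(ask_agent, board, samples=10):
        """Ranked ballot for one agent: query it repeatedly on the same board
        state and order the moves by how often each is returned."""
        counts = Counter(ask_agent(board) for _ in range(samples))
        return [move for move, _ in counts.most_common()]

    def positional(ballots, scores):
        """Generic positional rule: scores[i] is the score of ranking position i."""
        total = defaultdict(float)
        for ballot in ballots:
            for pos, move in enumerate(ballot[:len(scores)]):
                total[move] += scores[pos]
        return max(total, key=total.get)          # ties broken arbitrarily

    def plurality(ballots):
        return positional(ballots, [1.0])

    def borda(ballots, top=6):
        return positional(ballots, list(range(top, 0, -1)))

    def harmonic(ballots, m):
        return positional(ballots, [1.0 / (i + 1) for i in range(m)])

    def pairwise_counts(ballots, candidates):
        n = defaultdict(int)                      # n[(a, b)] = #agents ranking a above b
        for ballot in ballots:
            pos = {move: i for i, move in enumerate(ballot)}
            last = len(ballot)                    # unranked moves count as last
            for a, b in itertools.permutations(candidates, 2):
                if pos.get(a, last) < pos.get(b, last):
                    n[(a, b)] += 1
        return n

    def maximin(ballots, candidates):
        n = pairwise_counts(ballots, candidates)
        return max(candidates, key=lambda a: min(n[(a, b)] for b in candidates if b != a))

    def copeland(ballots, candidates):
        n = pairwise_counts(ballots, candidates)
        return max(candidates,
                   key=lambda a: sum(n[(a, b)] > n[(b, a)] for b in candidates if b != a))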
6.5.2 Ranked Voting

I first study the performance of the ranked voting rules when extracting rankings directly from the search trees of the agents. Figure 6.4 compares the results across different voting rules. As mentioned, to generate ranked votes, I used the internal data in the search tree of an agent's run (in particular, I rank using the number of simulations per alternative). We can see that increasing the number of agents has a positive impact for all voting rules under consideration. Moving from 5 to 15 agents for Diverse, plurality has a 14% increase in the winning rate, whereas the other voting rules have a mean increase of only 6.85% (std = 2.25%), close to half the improvement of plurality. For Uniform, the impact of increasing the number of agents is much smaller: moving from 5 to 15 agents, the increase for plurality is 5.3%, while the mean increase for the other voting rules is 5.70% (std = 1.45%). Plurality surprisingly seems to be the best voting rule in these experiments, even though it uses less information from the submitted rankings. This suggests that the ranking method used does not typically place good alternatives in high positions other than the very top.

[Figure 6.3: Evaluation of the diversity of the parametrized agents, and the fraction of states in which all of them have a low probability of playing the optimal action. (a) Diversity of the parametrized agents, compared with a second sample and with the diversity between independently developed agents. (b) Percentage of world states where all parametrized agents have probability of playing the best action below the given threshold. The error bars show 99% confidence intervals.]

[Figure 6.4: Winning rates for Diverse (continuous line) and Uniform (dashed line), for a variety of team sizes and voting rules.]

Hence, I now study the performance of my novel ranking extraction methodology. As mentioned, to generate a ranked vote from an agent on a given board state, I run the agent on the board state 10 times (each run is independent of the other runs), and rank the moves by the number of times they are played by the agent. I use these votes to compare plurality with the four other voting rules, for Diverse with 5 agents. Figure 6.5 shows the results. All voting rules outperform plurality; Borda and maximin are statistically significantly better (p < 0.007 and p = 0.06, respectively). All ranked voting rules are also statistically significantly better than the non-sampled (single run) version of plurality.

[Figure 6.5: All voting rules, for Diverse with 5 agents, using the new ranking methodology.]

6.5.3 PSINET

I also evaluate aggregation methodologies for PSINET, in the context of influence maximization. I provide two sets of results. First, I show results on artificial networks, to understand the algorithms' properties in abstract settings and to gain insights on a range of networks. Next, I show results on two real world homeless youth networks.

[Figure 6.6: Comparison on BTER graphs.]
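Before the results, the weighted-plurality aggregation of PSINET-W (defined in Section 6.4.1) can be made concrete with a small sketch. This is an illustration under my own naming, not the system's actual code.

    from collections import defaultdict

    def psinet_w_weight(x, m):
        """Vote weight of an agent whose sampled instance removed x of the m
        uncertain edges: W(x) = x if x <= m/2, and m - x otherwise."""
        return x if x <= m / 2 else m - x

    def weighted_plurality(votes, removed_counts, m):
        """votes[i] is agent i's chosen action (a tuple of nodes); removed_counts[i]
        is how many uncertain edges were absent in agent i's sampled instance."""
        score = defaultdict(float)
        for action, x in zip(votes, removed_counts):
            score[action] += psinet_w_weight(x, m)
        return max(score, key=score.get)

    # Example with m = 10 uncertain edges: weights are 1, 5 and 4, so (1, 5) wins.
    winner = weighted_plurality([(1, 5), (1, 5), (2, 7)], [9, 5, 4], 10)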
In all experiments, I select 2 nodes per round and average over 20 runs. PSINET-S and PSINET-W use 20 agents (possible network instances) and PSINET-C uses 5 agents (each agent is sampled 5 times, in order to build a ranking). The propagation and existence probability values were set to 0.5 in all experiments (based on findings by Kelly et al. [1997]). In this section, a ⟨X, Y, Z⟩ network refers to a network with X nodes, Y certain and Z uncertain edges. I use a metric of "indirect influence spread" (IIS) throughout this section, which is the number of nodes "indirectly" influenced by the intervention participants. For example, on a 30 node network, by selecting 2 nodes each for 10 interventions (horizon), 20 nodes (a lower bound for any strategy) are influenced with certainty. However, the total number of influenced nodes might be 26 (say), and thus the IIS is 6. All comparison results are statistically significant under bootstrap-t (α = 0.05).

6.5.3.1 Artificial networks

First, I compare all algorithms on Block Two-Level Erdos-Renyi (BTER) networks (having degree distribution X_d ∝ d^(-1.2), where X_d is the number of nodes of degree d) of several sizes, as they accurately capture observable properties of real-world social networks [Seshadhri et al., 2012]. In Figure 6.6, I compare the solution qualities of Degree Centrality (DC), POMCP [Silver and Veness, 2010] (since this problem can be modeled as a POMDP, see Yadav et al. [2015]), PSINET-S, PSINET-W and PSINET-C on BTER networks of varying sizes. In DC, nodes are selected in subsequent rounds in decreasing order of out-degrees, where every uncertain edge adds its existence probability to the node degrees. I choose DC as my baseline as it is the current modus operandi of agencies working with homeless youth.

This figure shows that all algorithms beat DC by roughly 60%. Further, it shows that PSINET-W beats PSINET-S and PSINET-C. Also, POMCP runs out of memory on 30 node graphs. Hence, for PSINET, plurality outperforms the Copeland ranked voting rule. However, weighted plurality obtains better results than simple plurality voting.

6.5.3.2 Real world networks

Figure 6.7 shows one of the two real-world friendship based social networks of homeless youth (created by my collaborators through surveys and interviews of homeless youth attending My Friend's Place), where each numbered node represents a homeless youth. Figure 6.8 compares the PSINET variants and DC on these two real-world social networks (each of size ⟨155, 120, 190⟩). This figure clearly shows that all PSINET variants beat DC on both real world networks by roughly 60%, which shows that PSINET works equally well on real-world networks. Also, PSINET-W (weighted plurality) beats PSINET-S (simple plurality), in accordance with the previous results.

6.6 Conclusion

In this chapter I explored the automatic generation of agent teams, and ranked voting rules. I presented results in two different domains: Computer Go and social networks. In Computer Go I show that we can improve performance with large agent teams, but the gain in performance decreases as the team grows. How to better generate diverse teams, in order to obtain even higher performance with large teams, is still an open area for further study.
[Figure 6.7: One of the friendship based social networks of homeless people visiting My Friend's Place.]

[Figure 6.8: Solution quality for the real world networks (influence spread of DC, PSINET-S and PSINET-W on the two graphs).]

I also show that when directly extracting rankings from the search trees of the agents, the result of classical ranked voting rules is actually quite poor, and they are easily outperformed by simply playing plurality voting. However, it is possible to obtain better results by using a different ranking extraction technique, where rankings are built according to how frequently each action is played when an agent is sampled multiple times. When using this approach, I show that Borda outperforms plurality with statistical significance.

In the social networks domain, I study agents that vote to select subsets of nodes for influence maximization. I experimentally study the performance of plurality, weighted plurality and the Copeland ranked voting rule. In this domain plurality outperforms the Copeland voting rule, but we can obtain better performance by using weighted plurality instead of simple plurality voting.

Chapter 7
Simultaneous Influencing and Mapping

'Another may be sadly deceived
By the words you say.
And another, believing and trusting you,
May be led astray by the things you do.'
'For much that never you'll see or know
Will mark your days as you come and go.
And in countless lives that you'll never learn
The best and the worst of you will return.'
(Edgar Albert Guest)

7.1 Introduction

Influencing a social network is an important technique, with great potential to positively impact society, as we can modify the behavior of a community. For example, we can increase the overall health of a population; Yadav et al. [2015], for instance, spread information about HIV prevention in homeless populations. However, although influence maximization has been extensively studied [Kempe et al., 2003, Cohen et al., 2014, Golovin and Krause, 2010], the main motivation of these works is viral marketing, and hence they assume that the social network graph is fully known, generally taken from some social media network (such as Facebook). However, the graphs recorded in social media do not really represent all the people and all the connections of a population. Most critically, when performing interventions in real life, we deal with large degrees of lack of knowledge. Normally the social agencies have to perform several interviews in order to learn the social network graph [Marsden, 2005].

These highly unknown networks, however, are exactly the ones we need to influence in order to have a positive impact in the real world, beyond product advertisement. Additionally, learning a social network graph is very valuable per se. Agencies also need data about a population in order to perform future actions to enhance its well-being, and to better actuate in their practices [Marsden, 2005]. As mentioned, however, the works in influence maximization currently ignore this problem. Each person in a social network actually knows other people, including the ones she cannot directly influence. Hence, each time we select someone for an intervention (to spread influence), we also have an opportunity to obtain knowledge from that person.

Therefore, in this chapter I present for the first time the problem of simultaneously influencing and mapping a social network.
I study the performance of the classical greedy influence maximization algorithm in this context, and show that it can be arbitrarily low. Hence, I study a class of algorithms for this problem, and show that we can effectively influence and map a network when independence of objectives holds. For the interventions where it does not hold, I give an upper bound on our loss, which converges to 0. I study an approximation of my main algorithm, which works as well but requires fewer assumptions. I perform a large scale experimentation using four real life social networks of homeless populations, where I show that my algorithm is competitive with previous approaches in terms of influence (even outperforming them in hard cases), and is significantly better in terms of mapping.

This chapter is the only one of the thesis that does not consider voting. Here the aggregation is performed by a linear combination of two greedy algorithms, in order to simultaneously influence and map a social network.

7.2 Influencing and Mapping

I consider the problem of maximizing the influence in a social network. However, we start by knowing only a subgraph of the social network. Each time we pick a node to influence, it may teach us about subgraphs of the network. Our objective is to spread influence while at the same time learning the network graph (i.e., mapping). I call this problem "Simultaneous Influencing and Mapping" (SIAM). In this chapter, I consider a version of SIAM where we only need to map the nodes that compose the network; I assume that we always know all the edges between the nodes of the known subgraph. For clarity, I formally define here only the version of SIAM that I handle in this chapter. Therefore, unless otherwise noted, henceforth by SIAM I mean the version of the problem that is formally defined below.

Let G := (V, E) be a graph with a set of nodes V and edges E. We perform a sequence of interventions, where at each one we pick one node. The selected node is used to spread influence and map the network. I assume we do not know the graph G; we only know a subgraph G_k = (V_k, E_k) of G, where k is the current intervention number. G_k starts as G_0 ⊆ G. For each node v_i, there is a subset of nodes V_i ⊆ V, which will be called its "teaching list". Each time we pick a node v_i, the known subgraph changes to G_k := (V_{k-1} ∪ V_i, E_k), where E_k contains all edges of G between the nodes in V_{k-1} ∪ V_i. Our objective is to maximize |V_k|, given the available number of interventions.

For each node v_i, I assume we can observe a number τ_i, which indicates the size of its teaching list. I study two versions: in one, τ_i is the number of nodes in V_i that are not yet in G_k (hence, the number of new nodes that will be learned when picking v_i). I refer to this version as "perfect knowledge". In the other, τ_i := |V_i|, and thus we cannot know how many nodes in V_i are going to be new or intersect with already known nodes in V_k. I refer to this version as "partial knowledge". The partial knowledge version is more realistic, as from previous experience we may have estimations of how many people a person with a certain profile usually knows. I study both versions in order to analyze how much we may lose with partial knowledge. Note that we may also have nodes with empty teaching lists (τ_i = 0). The teaching list of a node v_i is the set of nodes that v_i will teach us about once picked, and is not necessarily as complete as the true set of all nodes known by v_i. Some nodes could simply refuse to provide any information.
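A small sketch may make the bookkeeping of this model concrete (my own illustrative names; the thesis does not prescribe an implementation): picking a node adds its teaching list to the known nodes, all edges of G among known nodes become visible, and the observed value τ is either the number of new nodes (perfect knowledge) or the full list size |V_i| (partial knowledge).

    def observe(known_nodes, full_edges, teaching_list, picked):
        """Update the known subgraph G_k after picking a node."""
        nodes = set(known_nodes) | set(teaching_list.get(picked, ())) | {picked}
        edges = {(u, v) for (u, v) in full_edges if u in nodes and v in nodes}
        return nodes, edges

    def tau(node, teaching_list, known_nodes, perfect=True):
        """Observed teaching-list size: new nodes only (perfect knowledge),
        or the full list size |V_i| (partial knowledge)."""
        listed = set(teaching_list.get(node, ()))
        return len(listed - set(known_nodes)) if perfect else len(listed)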
Additionally, note that I am assuming the teaching list and the neighbor list to be independent. That is, a node may teach us about nodes that it is not able to directly influence. For instance, it is common to know people with whom we do not have direct contact, or whom we are not "close" enough to be able to influence. Similarly, a person may not tell us about all her close friends, due to limitations of an interview process, or even "shame" in describing some connections. However, some readers could argue that people would be more likely to teach us about their direct connections. Hence, I handle the case where the independence does not hold in my empirical experiments in Section 7.3.2.

Simultaneously with the problem of mapping, we also want to maximize the spread of influence over the network. I consider here the traditional independent cascade model, with observation, as in Golovin and Krause [2010]. That is, a node may be either influenced or uninfluenced. An uninfluenced node may change to influenced, but an influenced node will never change back to uninfluenced. Each time we pick a node for an intervention, it will change to influenced. When a node changes from uninfluenced to influenced, it will "spread" the influence to its neighbors with some probability. That is, at each edge e there is a probability p_e. When a node v_1 changes to influenced, if there is an edge e = (v_1, v_2), the node v_2 will also change to influenced with probability p_e. Similarly, if v_2 changes to influenced, it will spread the influence to its neighbors by the same process. Influence only spreads at the moment a node changes from uninfluenced to influenced; that is, a node may only "try" one time to spread influence to its neighbors. As in Golovin and Krause [2010], I consider that we have knowledge about whether a node is influenced or not (but in my case, we can only know this for nodes in the current known subgraph G_k). Let I_k be the number of influenced nodes after k interventions. Our objective is to maximize I_k given the available number of interventions. Influence may spread beyond G_k; hence, I consider I_k as the number of influenced nodes in the full graph G. I denote by σ_i the expected number of nodes that will be influenced when picking v_i (usually calculated by simulations).

As mentioned, we want to attend to both objectives simultaneously. Hence, we must maximize both |V_k| and I_k. It is easy to show that SIAM is an NP-Complete problem:

Proposition 7.2.1 SIAM is NP-Complete.

Proof: Consider an instance of the influence maximization problem, with graph G. Consider now a SIAM problem where no node carries information and G_0 := G. If we can solve this SIAM problem, we can also solve the original influence maximization instance. Therefore, SIAM is NP-Complete.

As SIAM is NP-Complete, similarly to previous influence maximization works [Kempe et al., 2003, Golovin and Krause, 2010], I study greedy solutions. Like the exploration vs. exploitation dilemmas in online learning [Valizadegan et al., 2011], the fundamental problem of SIAM is whether to focus on influencing or on mapping the network. Hence, I propose as a general framework to select the node v_i such that:

    v_i = argmax (c_1 σ_i + c_2 τ_i)    (7.1)

Constants c_1 and c_2 control the balance between influencing and mapping. c_1 = 1, c_2 = 0 is the classical influence maximization algorithm ("influence-greedy"); c_1 = 0, c_2 = 1, on the other hand, only maximizes the knowledge gain at each intervention ("knowledge-greedy"). c_1 = c_2 = 1 is an algorithm where both objectives are equally balanced ("balanced").
Different weights may also be used. Remember that I defined two versions for the τ values: perfect knowledge, where we know how many new nodes a node will teach us about; and partial knowledge, where we do not know how many nodes will be new. In order to better handle the partial knowledge case, I also propose the "balanced-decreasing" algorithm, where c_2 constantly decreases until reaching 0. Hence, I define c_2 as:

    c_2 := c_2^0 - (c_2^0 / d) k,  if k <= d
           0,                      otherwise    (7.2)

where c_2^0 is the desired value for c_2 at the very first iteration, and d controls how fast c_2 decays to 0.

[Figure 7.1: A graph where the traditional greedy algorithm has arbitrarily low performance (nodes A, A', B, B', C, and a connected graph of z nodes).]

7.2.1 Analysis

I begin by studying influence-greedy. It was shown that when picking, at each intervention, the node v that maximizes σ_v, we achieve a solution that is a (1 - 1/e) approximation of the optimal solution, as long as our estimation of σ_v (by running simulations) is "good enough" [Kempe et al., 2003]. However, even though the actual influence spread may go beyond the known graph G_k, we can only run simulations to estimate σ_v in the current G_k. Hence, the previous results are no longer valid. In fact, in the next observation I show that we can obtain arbitrarily low-performing solutions by using influence-greedy.

Observation 7.2.2 The performance of influence-greedy can be arbitrarily low in a SIAM problem.

I show this with an example. Consider the graph in Figure 7.1, and assume we will run 2 interventions (i.e., pick 2 nodes). There is probability 1 of spreading influence along any edge. Our initial knowledge is V_0 = {A, A', B, B', C}. A and B can influence A' and B', respectively. However, C cannot influence any node. A, B, A' and B' have empty teaching lists. C, on the other hand, can teach us about a connected graph of z nodes. Influence-greedy, by running simulations on the known graph, picks nodes A and B, since each can influence one more node. The optimal solution, however, is to pick node C, which will teach us about the connected graph of z nodes. Then, we can pick one node in that graph, and influence z + 1 nodes in total. Hence, the influence-greedy solution is only 4/(z + 1) of the optimal. As z grows, influence-greedy will be arbitrarily far from the optimal solution.
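The whole family of greedy algorithms above fits in a few lines; the sketch below is only an illustration of Equations 7.1 and 7.2 (assuming interventions are indexed from k = 0, so that c_2 starts at c_2^0), not the code used in the experiments.

    def c2_schedule(k, c2_0=1.0, d=5):
        """Balanced-decreasing weight (Eq. 7.2): linear decay from c2_0 to 0 over d interventions."""
        return c2_0 - (c2_0 / d) * k if k <= d else 0.0

    def pick_node(candidates, sigma, tau, c1=1.0, c2=1.0):
        """General greedy rule (Eq. 7.1): argmax of c1*sigma + c2*tau over candidate nodes.
        c1=1, c2=0 gives influence-greedy; c1=0, c2=1 gives knowledge-greedy;
        c1=c2=1 gives balanced."""
        return max(candidates, key=lambda v: c1 * sigma[v] + c2 * tau[v])

    # At intervention k, balanced-decreasing would call:
    # pick_node(known_nodes, sigma, tau, c1=1.0, c2=c2_schedule(k))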
Solving for E[n k ] = n gives that the ex- pected number of interventions is: k full = log( n 0 n ) log(1 'u 2n ) . k full quickly increases as ' (or u) decreases. In Section 7.3, I study experimentally the impact of ' on the performance of in uence-greedy. Now, let's look at balanced. Clearly, it will learn the full graph with a lower number of expected interventions than in uence-greedy. However, although intuitively balanced may seem reasonable, its performance may also quickly degrade if we assume partial knowledge (i.e., i =jV i j). Proposition 7.2.4 The performance of the balanced algorithm degrades as n k ! n, if i =jV i j. Proof: Each node in the teaching list of a node v i has probability nn k n of being a yet unknown node. Hence, the expected number of unknown nodes that will be learned by picking a node with teaching list size i is: E[new] = i nn k n . Asn k !n,E[new]! 0. Hence, when n k ! n, balanced picks a node v that maximizes v + v , thus missing to select nodes v o (if available) with o > v , o + o < v + v , with no actual gains in mapping. This problem does not happen in the perfect knowledge version. Since the values only include new nodes, ! 0 as n k ! n, for all . Hence, in the perfect knowledge version, balanced converges to the same behavior as in uence-greedy as k increases. In 120 order to approximate this behavior for the partial knowledge case, I propose the balanced- decreasing algorithm, where the constantly decreasing c 2 \simulates" the decreasing values. I now show that balanced can match the performance of in uence-greedy in terms of in uence, but at the same time mapping better the network. As I just discussed that perfect knowledge can be approximated by using balanced-decreasing, I focus here on the perfect knowledge case. I show that when the independence of objectives hypothesis holds (dened below), balanced plays the same as in uence-greedy or better, while in uence- greedy may still fail in terms of mapping. If the hypothesis does not hold, our in uence loss at one intervention will be bounded by u=2! 0. LetV k be a subset ofV k where eachv2V k maximizes in the current interventionk. Similarly, letV k V k , where eachv2V k maximizes in the current intervention k. As before, I consider that the teaching list size of a node is given by a uniform distribution, but since tends to decrease at each intervention, I denote the interval as [0;u k ]. Clearly, any node in the set V Good k :=V k \V k should be selected, as they maximize both objectives. Hence, when V Good k 6=; it is possible to simultaneously maximize both objectives, and thus we say that the independence of objectives hypothesis holds. Since we are studying greedy-algorithms, both balanced and in uence-greedy lack optimality guarantees. Hence, I focus here on a \local" analysis, and show that given a set of k possible interventions (with the same graph state across both algorithms at each inter- vention), balanced is able to pick nodes that spread as much in uence as in uence-greedy. Moreover, when balanced picks a dierent node, our loss is bounded by u k =2. Asu k ! 0 with k!1, our loss also converges to 0. Proposition 7.2.5 Balanced selects nodes that spread as much in uence as in uence- greedy, ifjV k j > n k =2 andjV k j > n k =2, or as k!1. In uence-greedy, on the other hand, selects worse nodes than balanced in terms of mapping with probability 1 jV k \V k j jV k j . 
Moreover, when balanced selects a node with worse than in uence-greedy, the expected in uence loss is bounded by u k =2, which! 0 as k!1. Proof: As balanced plays argmax( + ), if there is a node v2 V Good k , balanced picks v. In uence-greedy, however, selects an arbitrary node in V k . Hence, it picks a node v2 V Good k with probability jV k \V k j jV k j . Therefore, for all interventions where V Good k 6=;, balanced selects a node in V Good k , while in uence-greedy makes a mistake in terms of mapping with probability 1 jV k \V k j jV k j . We consider now the probability of V Good k 6=; across k interventions. Clearly, if jV k j>n k =2, andjV k j>n k =2, we have V Good k 6=;. If not, note that as k!1, n k !n. 121 Therefore,V k !V k (since all ! 0, all nodes will have the same teaching list size), thus V Good k !V k 6=;. Hence, the probability of V Good k 6=; goes to 1 as k!1. Let's study now the case when V Good k =;. Let v 1 be the node in V k picked by in uence-greedy, and v 2 be the node in V k nV k with the largest 2 . Since V Good k =;, we must have that 1 > 2 , and 2 > 1 . However, as long as 2 1 < 1 2 , balanced still selectsv 1 (or an even better node). In the worst case, the expected value for 2 is the expected maximum of the uniform distribution: E[ 2 ] = u k u k =(n k + 1) u k . 1 , on the other hand, has the expected value of the uniform distribution E[ 1 ] =u k =2. Hence, as long as 1 2 >u k =2, in expectation balanced still picksv 1 (or an even better node). Moreover, when balanced does not pick v 1 , our loss in terms of in uence at intervention k is at most u k =2. Since ! 0 as n k !n, u k =2! 0 as k!1. Proposition 7.2.5 shows that we may experience loss in one intervention, when com- paring in uence-greedy with balanced. However, the loss is bounded by u k =2, which goes to 0 as the number of interventions grows. Moreover, when we do not update the values, we can use the balanced-decreasing algorithm to simulate the same eect. Additionally, in Proposition 7.2.5 I considered the same graph states at each intervention across both algorithms. In practice, however, since balanced is able to map the network faster, any loss experienced in the beginning when k is low can be compensated by playing better later with full knowledge of the graph, while in uence-greedy may still select nodes with lower due to lack of knowledge. As noted in Observation 7.2.2, lack of knowledge of the full graph can make in uence-greedy play with arbitrarily low performance. In Section 7.3.2 I perform an empirical analysis assuming a power law model for the teaching lists, and I note here that my main results still hold. 7.3 Results I run experiments using four real life social networks of the homeless population of Los Angeles, provided by Eric Rice, from the School of Social Work of the University of South- ern California. All the networks are friendship-based social networks of homeless youth who visit a social agency. The rst two networks (A, B) were created through surveys and interviews. The third and fourth networks (Facebook, MySpace) are online social networks of these youth created from their Facebook and MySpace proles, respectively. I run 100 executions per network. At the beginning of each execution, 4 nodes are randomly chosen to compose our initial subgraph (G 0 ). As mentioned, I consider that we always know the edges between the nodes of our current knowledge graph (G k ). I noticed similar tendencies in the results across all four social networks. 
For clarity, I plot here the results considering all networks simultaneously (that is, I average over all 400 executions). In Appendix A I show the individual results for each network. In all graphs, the error bars show the confidence interval, with α = 0.01. When I say that a result is significantly better than another, I mean with statistical significance according to a t-test with α <= 0.01, unless noted otherwise. The sizes of the networks are 142, 188, 33 and 105 nodes, for A, B, Facebook and MySpace, respectively. I evaluate up to 40 interventions.

I measure the percentage of influence in the network ("Influence") and the percentage of known nodes ("Knowledge") for influence-greedy, knowledge-greedy, balanced and balanced-decreasing (with c_2^0 = 1.0 and d = 5). In order to estimate the expected influence spread (σ_v) of each node, I run 1000 simulations before each intervention. Estimating the expected influence through simulations is a common method in the literature. In our case, the simulations are run in the current known subgraph G_k, although the actual influence may go beyond G_k. Influenced nodes in G \ G_k will be considered when I measure Influence, but will not be considered in my estimation of σ_v. Concerning the teaching list size (τ_v), I consider it to hold the number of new nodes that would be learned if v is selected, for balanced and knowledge-greedy (i.e., perfect knowledge). For balanced-decreasing, I consider τ_v to hold the full teaching list size, including nodes that are already known (i.e., partial knowledge). Therefore, we can evaluate whether balanced-decreasing approximates balanced well, when perfect knowledge is not available.

I simulate the teaching lists, since no real world data are available yet (I only have data about the connections in the four real life social networks). I study two models: (i) uniform, which follows the assumptions of my theoretical analysis; (ii) power law, which considers that nodes are more likely to teach us about others which are close to them in the social network graph. I present the second model to show that my conclusions hold irrespective of the uniform assumption. For each node, we decide whether it will have a non-empty teaching list according to a probability φ. I run experiments using different combinations of φ, probability of influence p, and c_1 and c_2 values.

7.3.1 Uniform Model

Under the uniform model, if a node has a teaching list, I fix its size according to a uniform distribution from 0 to 0.5|V|. Each node in the graph is also equally likely to be in the teaching list of a node v_i. I consider here the teaching list and the neighbor list to be independent, as people may know others that they cannot influence, and they may also not tell us all their connections, as described before. The case where the teaching list and the neighbor list are not independent is considered in Section 7.3.2.

I ran several parametrizations. Figure 7.2 shows the result at each intervention for φ = 0.5 and p = 0.5. As we see in Figure 7.2 (a), the Influence obtained by influence-greedy, balanced, and balanced-decreasing is very similar.
In fact, out of all 40 interventions, their result is not significantly different in any of them (and they are significantly better than knowledge-greedy in around 75% of the interventions). This shows that balanced is able to successfully spread influence in the network, while at the same time mapping the graph. We can also notice that perfect knowledge about the number of new nodes in the teaching lists is not necessary, as balanced-decreasing obtained results close to balanced.

[Figure 7.2: Results of 4 real world networks across many interventions, for p = 0.5 and φ = 0.5 (uniform distribution). Panels: (a) Influence, (b) Knowledge, (c) Influence + Knowledge; x-axis: intervention number.]

Figure 7.2 (b) shows the results in terms of Knowledge. All other algorithms clearly outperform influence-greedy with statistical significance. Moreover, the results for knowledge-greedy, balanced and balanced-decreasing are not significantly different in any of the interventions. This shows that we are able to successfully map the network (as well as knowledge-greedy), while at the same time spreading influence successfully over the network (as well as influence-greedy), even in the partial knowledge case. Hence, the independence of objectives hypothesis seems to hold at most interventions in the networks, since we could maximize both objectives simultaneously, as predicted in Proposition 7.2.5. Given enough interventions, however, influence-greedy is also able to map the network, as I discussed in Proposition 7.2.3.

It is also interesting to note that even though influence-greedy has much less information about the network (with significantly lower mapping performance in around 16 interventions), it is still able to perform as well as the other algorithms in terms of Influence. Observation 7.2.2, however, showed that its Influence performance can be arbitrarily low. As I discuss later, for some parametrizations we actually found that influence-greedy has significantly lower results than the other algorithms in terms of Influence as well.

In order to compare the results across different parametrizations, we calculate the area under the curve (AUC) of the graphs. The closer the curves are to 1.0 the better; hence an AUC of 39 (that is, always at 1.0 across all 40 interventions) would be an "ideal" result. In Figure 7.3 (a) I show the results for a fixed influence probability value (p = 0.5), but different teaching probability (φ) values. First I discuss the results in terms of Influence (left-hand side of the graph). As we can see, except for knowledge-greedy, all algorithms obtain very similar results. However, for φ = 0.1, the Influence for balanced and balanced-decreasing is slightly better than influence-greedy, on the borderline of statistical significance (α = 0.101 and 0.115, respectively). Moreover, we can see that φ does impact the influence that we obtain over the network, although the impact is not big. For influence-greedy, from φ = 0.5 to φ = 1.0, the difference is only statistically significant with α = 0.092.
However, from φ = 0.1 to φ = 0.5 the difference is statistically significant with α = 3.26 × 10⁻²⁷. Similarly, for all other algorithms there is a significant difference from φ = 0.1 to φ = 0.5, while from φ = 0.5 to φ = 1.0 the difference is only significant with α < 0.1 (except for knowledge-greedy, whose difference is not significant between φ = 0.5 and φ = 1.0).

Let's look at the results in terms of Knowledge, on the right-hand side of Figure 7.3 (a). We can see that φ has a much bigger impact on our mapping, as expected. Knowledge-greedy, balanced and balanced-decreasing are all significantly better than influence-greedy. However, we can notice that the difference between influence-greedy and the other algorithms decreases as φ increases. Similarly, when comparing knowledge-greedy, balanced and balanced-decreasing, we can notice that the difference between the algorithms also decreases as φ increases. For both φ = 0.1 and φ = 0.5, however, the algorithms are not significantly different. Interestingly, when φ = 1, because of the lower variance, knowledge-greedy and balanced become significantly better than balanced-decreasing, even though the differences between the algorithms decrease.

In Figure 7.3 (b), I keep φ = 0.5 and change p. On the left-hand side we see the results for Influence. As expected, there is clearly a significant difference when p changes from 0.1 to 0.5. However, we can notice that the difference between the algorithms does not change significantly when p changes. In both cases, the differences between influence-greedy, balanced and balanced-decreasing are not significant. Additionally, in both cases all algorithms are significantly better than knowledge-greedy. In terms of Knowledge (right-hand side of the figure) we see that the influence probability has no impact on any algorithm, as would be expected. For all algorithms, the difference between p = 0.1 and p = 0.5 is not statistically significant.

[Figure 7.3: Results of Influence and Knowledge for different teaching and influence probabilities (uniform distribution). Panel (a): changing teaching probability (φ = 0.1, 0.5, 1); panel (b): changing influence probability (p = 0.1, 0.5); y-axis: area under curve.]

I also compare the regret obtained by the different algorithms at different influence probabilities and teaching probability values. First, I run the influence-greedy algorithm, but considering that we know the full graph (that is, G_k := G). Although that solution is not optimal, it is the best known approximation of the optimal, hence I call it "perfect". I calculate the AUC for perfect, and define the regret of an algorithm x as: AUC_Perfect − AUC_x. I analyze the regret in terms of Influence in Figure 7.4. Note that, in the figure, the lower the result the better. On the left-hand side I show the regret for p = 0.1 and different φ values, while on the right-hand side I show it for p = 0.5.

[Figure 7.4: Regret for different teaching and influence probabilities (uniform distribution). Lower results are better.]

All algorithms (except knowledge-greedy) have a similar regret, as would be expected based on the previous results.
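The AUC and regret comparison just defined can be computed directly from the per-intervention curves. A minimal sketch, assuming numpy and hypothetical curve arrays averaged over the 400 executions (names are illustrative only):

```python
import numpy as np

def area_under_curve(values):
    """AUC of a per-intervention curve; with 40 interventions all at height
    1.0 the trapezoidal area over x = 0..39 is 39, the 'ideal' value."""
    return np.trapz(values, dx=1.0)

def regret(perfect_curve, algorithm_curve):
    """Regret of an algorithm: AUC_Perfect - AUC_x, where 'perfect' is
    influence-greedy run with full knowledge of the graph (G_k := G)."""
    return area_under_curve(perfect_curve) - area_under_curve(algorithm_curve)

# Hypothetical usage, with curves averaged over all executions:
# influence_curves = {"influence-greedy": [...], "balanced": [...], ...}
# regrets = {name: regret(perfect_curve, c) for name, c in influence_curves.items()}
```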
However, we can notice here that the regret for balanced and balanced-decreasing when φ = 0.1 and p = 0.5 is actually lower than influence-greedy. The difference is statistically significant, with α = 0.069 and α = 0.092, respectively. Hence, we can actually have a significantly better influence on the social network graph than the traditional greedy algorithm, when the teaching probability is low. However, when φ = 0.1 and p = 0.1, even though the regret for balanced and balanced-decreasing is still lower than influence-greedy, it is not significant anymore (as there is larger variance when p = 0.1). Additionally, we note that the difference in regret between balanced and balanced-decreasing is never significant. It is also interesting to note that for some parametrizations the regret of influence-greedy is actually close to 0, which means that in some cases lack of knowledge of the full graph does not significantly harm the influence performance. When p = 0.1, and φ = 0.5 or φ = 1.0, the regret is not significant (α = 0.410 and α = 0.78, respectively). For p = 0.5 and φ = 1.0, the regret is on the borderline of not being significant (α = 0.102). In all other cases, the regret is significant.

I discussed 4 algorithms, but my framework can actually generate a variety of behaviors by using different c_1 and c_2 values. I tested 6 more combinations: {(0.5, 2), (1, 0.5), (1, 1.5), (1, 2), (2, 0.5), (2, 1)}, but I did not observe significant differences in comparison with the previous algorithms in the four social network graphs. Hence, finding a good parametrization of c_1 and c_2 does not seem to be a crucial problem for the balanced algorithm.

7.3.2 Power Law Model

In order to show that my conclusions still hold under different models, I also run experiments considering a power law distribution for the teaching lists. The power law is a very suitable model for a range of real-world phenomena. In fact, Andriani and McKelvey [2007], in a very comprehensive literature survey, list 80 different kinds of phenomena which are modeled in the literature by power law distributions, and half of them are social phenomena. For example, it has been shown to be a good model for social networks, co-authorship networks, the structure of the world wide web and actor movie participation networks. A power law model also seems suitable in our case, as we can expect that a person will be very likely to teach us about the people who she has a direct connection with, and less and less likely to report people that are further away in the graph.

Hence, when generating the teaching list of a node v_i in my experiments, each node v_o (v_o ≠ v_i) will be in its teaching list according to the following probability: p_o := (a − 1.0) · h_o^(−a), where 1.0 < a ≤ 2.0, and h_o is the shortest path distance between node v_i and v_o. Here a − 1.0 represents the probability of a neighbor node v_o (i.e., h_o = 1) being selected. If node v_i and v_o are completely disconnected, I set h_o := |V|. Under this model the probability of a person teaching us about another is always strictly greater than 0, even though it may be very small if the respective nodes are very distant in the graph. I fix a = 1.8 (80% probability of each of a node's neighbors being in its teaching list). I show results for a = 1.2 in Appendix A, for the interested reader (and my conclusions still hold in the alternative parametrization).

Similarly as before, Figure 7.5 shows the result at each intervention for φ = 0.5 and p = 0.5. As we can see, my main conclusions still hold in the power law model.
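A minimal sketch of the power-law teaching-list generation just described, assuming the network is given as an adjacency-list dictionary and that the teaching probability φ is applied per call; function names are illustrative only:

```python
import random
from collections import deque

def shortest_path_lengths(adj, source):
    """BFS hop distances from source; unreachable nodes get distance |V|."""
    dist = {v: len(adj) for v in adj}
    dist[source] = 0
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if dist[w] == len(adj):
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist

def power_law_teaching_list(adj, v_i, a=1.8, phi=0.5):
    """With probability phi the node has a teaching list at all; each other
    node v_o enters it with probability (a - 1) * h_o**(-a), where h_o is the
    hop distance (so a - 1 = 0.8 for direct neighbors when a = 1.8)."""
    if random.random() > phi:
        return []
    dist = shortest_path_lengths(adj, v_i)
    return [v_o for v_o in adj
            if v_o != v_i and random.random() < (a - 1.0) * dist[v_o] ** (-a)]
```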
The Influence obtained by influence-greedy, balanced, and balanced-decreasing are very similar. Out of 40 interventions, their results are not significantly different in 39 (α ≥ 0.05), and they are significantly better than knowledge-greedy in around 60% of the interventions (α ≤ 0.05).

This time, however, balanced-decreasing obtained worse results than balanced in terms of Knowledge. Although up to iteration 4 knowledge-greedy, balanced and balanced-decreasing are not significantly different, both knowledge-greedy and balanced are significantly better than balanced-decreasing after that iteration. It is harder to obtain Knowledge under the power law model than under the uniform model (all algorithms converge to 1.0 more slowly than before). Hence, balanced-decreasing would require a slower decay speed (i.e., higher d) in this case, in order to perform better.

We can also notice that all algorithms are significantly better than influence-greedy in all iterations, in terms of Knowledge. Note that under the uniform model, influence-greedy was not significantly worse than the other algorithms after iteration 20 (α ≥ 0.1). Hence, as expected, influence-greedy becomes relatively worse than the other algorithms when we assume a model where mapping is harder.

I calculate the AUC, in order to compare different parametrizations. Figure 7.6 (a) shows the result for a fixed influence probability value (p = 0.5), and different teaching probability (φ) values.

[Figure 7.5: Results of 4 real world networks across many interventions, for p = 0.5 and φ = 0.5 (power law distribution). Panels: (a) Influence, (b) Knowledge, (c) Influence + Knowledge; x-axis: intervention number.]

As before, except for knowledge-greedy, all algorithms in general obtain similar results in terms of Influence. We notice, however, that for φ = 0.1, the result for balanced and balanced-decreasing is actually significantly better than influence-greedy (α = 0.064 and 0.038, respectively). Again, we also notice that φ significantly impacts the influence that we obtain over the network, although the impact is small. For all algorithms, the impact is significant from φ = 0.1 to φ = 0.5. For influence-greedy, the difference is statistically significant with α = 1.39 × 10⁻³¹, while for knowledge-greedy, balanced and balanced-decreasing, there is a statistically significant difference with α = 4.529 × 10⁻⁶, 3.258 × 10⁻¹⁴ and 8.429 × 10⁻²⁰, respectively. However, the impact of φ increasing from 0.5 to 1 is not that significant for knowledge-greedy, balanced and balanced-decreasing (α = 0.41, 0.02, 0.03, respectively), while for influence-greedy the change has a significant impact (α = 10⁻⁶).

In terms of Knowledge, we can see that all algorithms are significantly better than influence-greedy for all φ values (with α ≤ 3.464015 × 10⁻¹⁹). However, this time we notice that knowledge-greedy and balanced are significantly better than balanced-decreasing for all φ. As mentioned, a different decay speed d is necessary in this case.

In Figure 7.6 (b), I show different values of p for φ = 0.5.
As before, in terms of Influence the difference between influence-greedy, balanced and balanced-decreasing is not significant, and all algorithms are significantly better than knowledge-greedy. In terms of Knowledge, the influence probability does not affect knowledge-greedy, balanced nor balanced-decreasing significantly, as expected. This time, however, influence-greedy obtains a significantly better result for p = 0.1 than for p = 0.5. This may happen because for p = 0.1 influence-greedy has a higher tendency of selecting nodes with a high number of neighbors, which also tend to be the ones with large teaching lists. Note that this does not happen when the teaching and neighbor lists are independent.

Figure 7.7 shows the regret (lower results are better). We can notice similar results as before: all algorithms have similar regret (except for knowledge-greedy), and the regret for balanced and balanced-decreasing when φ = 0.1 and p = 0.5 is again significantly lower than influence-greedy (α = 0.019 and 0.009, respectively). This time, however, we can notice that for some parametrizations balanced-decreasing is actually the algorithm with the lowest regret. For p = 0.5 and φ = 0.5, balanced-decreasing is better than balanced with α = 4.7 × 10⁻⁴. Hence, even though balanced-decreasing performed relatively worse than under the uniform model in terms of Knowledge, it is actually the best algorithm in terms of Influence for some parametrizations.

[Figure 7.6: Influence and Knowledge for different teaching and influence probabilities (power law distribution). Panel (a): changing teaching probability (φ = 0.1, 0.5, 1); panel (b): changing influence probability (p = 0.1, 0.5); y-axis: area under curve.]

[Figure 7.7: Regret for different teaching and influence probabilities (power law distribution). Lower results are better.]

7.4 Discussion

I present, for the first time, the problem of simultaneously influencing and mapping a social network. This is a fundamental step towards a greater applicability of influencing social networks, for example to increase the adoption of health-conscious behavior.

My model, however, currently has two main shortcomings. First, I focus in this chapter on the problem of learning the network nodes, and I assume to always know the edges of the current known subgraph. This assumption was necessary to keep the clarity and precision of my theoretical analysis. A simple extension would be to consider the teaching lists to contain both nodes and edges. That is, each time we pick a node v_i, we would learn a set of nodes V_i and a set of edges E_i. We must enforce, however, that v_a, v_b ∈ V_i for all (v_a, v_b) ∈ E_i. Otherwise, we could run into the situation of learning an edge e without knowing the edge end-points, which would be inconsistent. My theoretical results would still apply, but this constraint makes the theoretical analysis unnecessarily more intricate. Another alternative would be to always consider fully connected graphs, but with a very low influence probability. We could, then, increase the influence probability of the edges whose existence we are certain of.
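The end-point consistency constraint of this extension can be illustrated with a small sketch (hypothetical names; reported edges violating the constraint are simply discarded rather than added to the known subgraph):

```python
def merge_teaching_report(known_nodes, known_edges, V_i, E_i):
    """Merge a reported (V_i, E_i) pair into the known subgraph, enforcing
    that every reported edge has both end-points among the reported nodes."""
    known_nodes |= set(V_i)
    for (v_a, v_b) in E_i:
        if v_a in V_i and v_b in V_i:   # constraint: end-points must be reported
            known_edges.add((v_a, v_b))
    return known_nodes, known_edges
```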
Second, there is not yet a grounded model available to simulate how much someone may be able to teach us about the social network upon being called for an interview, nor are there real-world datasets that we could use. I do have datasets of real social network graphs, but as I mentioned, the teaching list of a node is not the same as its neighbor list. People often know others who they do not have direct contact with, or are not "close" enough to have a chance of influencing their behavior. Besides, a person may not teach us about all her connections during an interview, due, for example, to shame or privacy issues. An exciting avenue for future work would be to perform experiments with human subjects to understand how the teaching lists are actually formed. Such an experiment, however, would present many challenges. To start with, we would need access to the full knowledge of a person, in order to know which persons she fails to report to us. Also, we can expect high variance in the difference between the full knowledge and the teaching lists, as it depends on the personality of each individual.

In my theoretical analysis, I assume a uniform distribution model. The uniform distribution has as its main advantage the lack of any biases, which suits this situation where the precise real-world model is yet unknown. However, intuitively many readers would expect that a person would be more likely to report people who she is directly connected to, and less likely to report people that are more distant in the graph. Hence, I complement my empirical evaluation with experiments where I assume a power law distribution when generating the teaching lists of each node. My main conclusions still hold under this alternative model, showing that my results generalize to different distributions.

I noticed, however, that although balanced-decreasing had a similar performance as balanced in terms of Knowledge under the uniform distribution model, it did have a lower performance under the power law model. I used the same decay speed in both cases, but under the power law it is harder to map the network. Hence, in that situation it would be necessary to use a lower decay speed in order to better match the performance of balanced. Balanced-decreasing, however, is not worse in terms of Influence. In fact, for some parametrizations it was actually the best algorithm, outperforming influence-greedy. Therefore, it can perform reasonably well, even without an accurate parametrization of the speed of decay.

Additionally, another interesting point of my experimental analysis is that under both models we found that influence-greedy tends to perform as well as balanced in terms of Influence. This shows that the traditional greedy influence maximization algorithm is able to perform well even without full knowledge of the social network graph. In fact, for some parametrizations it is not even statistically different from running with full knowledge of the network. There are cases, however, where it will perform badly: for a low teaching probability the other algorithms achieve lower regret, and my Observation 7.2.2 shows that it can have an arbitrarily low performance. Moreover, influence-greedy clearly performs worse in terms of Knowledge.

Besides helping to influence a network, mapping the graph is also an important action per se. Social agencies usually need knowledge about a community in order to decide their policies and programs, enhancing their ability to increase the general well-being of a population.
For instance, during an interview we will not necessarily only learn about the existence of a node v_x. When we learn about v_x we may also query for additional information about the person represented by v_x, such as gender, age group or profession. Agencies, then, will be able to use this data when deciding their educational programs.

7.5 Conclusion

I introduced the novel problem of simultaneously influencing and learning the graph of a social network. I show theoretically and experimentally that an algorithm which locally maximizes both influence and knowledge performs as well as an influence-only greedy algorithm in terms of influence, and as well as a knowledge-only greedy approach in terms of knowledge. I present an approximation of my algorithm that gradually decreases the weight given to knowledge gain, which requires fewer assumptions.

I run experiments using four real-life social networks, where I study two different ways of modeling the knowledge gained by interviewing each node: one where a uniform distribution is assumed, avoiding biases and matching my theoretical analysis; and another where a power law distribution is assumed, and thus nodes are more likely to report others which are closer to them in the social network graph. By testing two different models I validate my conclusions irrespective of the uniform distribution assumption.

My empirical results show not only that the proposed algorithms are competitive with the traditional greedy one in terms of influence, but that they can also significantly influence more nodes than the traditional algorithm when nodes have a low teaching probability. Additionally, the proposed algorithms are significantly better in terms of mapping the graph. Besides helping in influencing, learning about the social network is also important for institutions to effectively decide their policies and educational programs.

Part IV

Team Assessment

Chapter 8

Every Team Deserves a Second Chance

In the middle of the road there was a stone
there was a stone in the middle of the road
there was a stone
in the middle of the road there was a stone.

Never should I forget this event
in the life of my fatigued retinas.
Never should I forget that in the middle of the road
there was a stone
there was a stone in the middle of the road
in the middle of the road there was a stone.

(Carlos Drummond de Andrade)

8.1 Introduction

It is well known that aggregating the opinions of different agents can lead to a significant performance improvement when solving complex problems. In particular, voting has been extensively used to improve performance in machine learning [Polikar, 2012], crowdsourcing [Mao et al., 2013, Bachrach et al., 2012], and even board games [Obata et al., 2011, Soejima et al., 2010]. Additionally, it is an aggregation technique that does not depend on any domain, making it well suited for wide applicability. However, a team of voting agents will not always be successful in problem-solving. It is fundamental, therefore, to be able to quickly assess the performance of teams, so that a system operator can take actions to recover the situation in time. Moreover, complex problems are generally characterized by a large action space, and hence methods that work well in such situations are of particular interest.
Current works in the multi-agent systems literature focus on identifying faulty or erroneous behavior [Khalastchi et al., 2014, Lindner and Agmon, 2014, Tarapore et al., 2013, Bulling et al., 2013], or on verifying the correctness of systems [Doan et al., 2014]. Such approaches are able to identify if a system is not operating correctly, but provide no help if a correct system of agents is failing to solve a complex problem. Other works focus on team analysis. Raines et al. [2000] present a method to automatically analyze the performance of a team. The method, however, only works offline and needs domain knowledge. Other methods for team analysis are heavily tailored for robot soccer [Ramos and Ayanegui, 2008] and focus on identifying opponent tactics [Mirchevska et al., 2014]. In fact, many works in robotics propose monitoring a team by detecting differences in the internal state of the agents (or disagreements), mostly caused by malfunction of the sensors/actuators [Kaminka and Tambe, 1998, Kaminka, 2006, Kalech and Kaminka, 2007, Kalech et al., 2011]. In a system of voting agents, however, disagreements are inherent in the coordination process and do not necessarily mean that an erroneous situation has occurred due to such malfunction. Additionally, research in social choice is mostly focused on studying the guarantees of finding the optimal choice given a noise model for the agents and a voting rule [Caragiannis et al., 2013, List and Goodin, 2001, Conitzer and Sandholm, 2005], but provides no help in assessing the performance of a team of voting agents.

There are also many recent works presenting methods to analyze and/or make predictions about human teams playing sports games. Such works use an enormous amount of data to make predictions about many popular sports, such as American football [Quenzel and Shea, 2014, Heiny and Blevins, 2011], soccer [Bialkowski et al., 2014, Lucey et al., 2015] and basketball [Maheswaran et al., 2012, Lucey et al., 2014]. Clearly, however, these works are not applicable to analyzing the performance of a team of voting agents.

Hence, in this chapter, I show a novel method to predict the final performance (success or failure) of a team of voting agents, without using any domain knowledge. Therefore, my method can be easily applied in a great variety of scenarios. Moreover, my approach can be quickly applied online at any step of the problem-solving process, allowing a system operator to identify when the team is failing. This can be useful in many applications. For example, consider a complex problem being solved on a cluster of computers. It is undesirable to allocate more resources than necessary, but if we notice that a team is failing in problem solving, we might wish to increase the allocation of resources. Or consider a team playing a game together against an opponent (such as board games, or poker). Different teams might play better against different opponents. Hence, if we notice that a team is predicted to perform poorly, we could dynamically change it. Under time constraints, however, such prediction must be done quickly.
My approach is based on a prediction model derived from a graphical representation of the problem-solving process, where the nal outcome is modeled as a random variable that is in uenced by the subsets of agents that agreed together over the actions taken at each step towards solving the problem. Hence, my representation depends uniquely on the coordination method, and has no dependency on the domain. I explain theoretically why we can make accurate predictions, and I also show the conditions under which we can use a reduced (and scalable) representation. Moreover, my theoretical development allows us to anticipate situations that would not be foreseen by a simple application of classical voting theories. For example, my model indicates that the accuracy can be better for diverse teams composed of dierent agents than for uniform teams, and that we can make equally accurate predictions for teams that have signicant dierences in playing strength (which is later conrmed in our experiments). I also study the impact of increasing the action space in the quality of my predictions, and show that we can make better predictions in problems with large action spaces. I present experimental results in two dierent domains: Computer Go and Ensemble Learning. In the Computer Go domain, I predict the performance of three dierent teams of voting agents: a diverse, a uniform, and an intermediate team (with respect to diversity); in four dierent board sizes. I study the predictions at every turn of the games, and compare with an analysis performed by using an in-depth search. I am able to achieve an accuracy of 71% for a diverse team in 9 9 Go, and of 81% when I increase the action space size to 21 21 Go. For a uniform team, I obtain 62% accuracy in 9 9, and 75% accuracy in 21 21 Go. I evaluate dierent classication thresholds using Receiver Operating Characteristic (ROC) curves, and compare the performance for dierent teams and board sizes according to the area under the curves (AUC). I experimentally show in such analysis that: (i) we can eectively make high-quality predictions for all teams and board sizes; (ii) the quality of my predictions is better for the diverse and intermediate teams than uniform (irrespective of their strength), across all thresholds; (iii) the quality of the predictions increases as the board size grows. Moreover, the impact of increasing the action space on the prediction quality occurs earlier in the game for the diverse team than for the uniform team. Finally, I study the learned prediction functions, and how they change 138 across dierent teams and dierent board sizes. My analysis shows that the functions are not only highly non-trivial, but in fact even open new questions for further study. In the Ensemble Learning domain, I predict the performance of classiers that vote to assign labels to set of items. I use the scikit-learn's digits dataset [Pedregosa et al., 2011], and teams vote to correctly identify hand-written digits. I am also able to obtain high- quality predictions online about the nal performance of two dierent teams of classiers, showing the applicability of my approach to dierent domains. 8.2 Prediction Method I start by presenting my prediction method, and in Section 8.3 I will explain why the method works. I consider scenarios where agents vote at every step (i.e., world state) of a complex problem, in order to take common decisions at every step towards problem- solving. 
Formally, let T be a set of agents t_i, A be a set of actions a_j and S be a set of world states s_k. The agents must vote for an action at each world state, and the team takes the action decided by the plurality voting rule, which picks the action that received the highest number of votes (I assume ties are broken randomly). The team obtains a final reward r upon completing all world states. In this chapter, I assume two possible final rewards: "success" (1) or "failure" (0).

I define the prediction problem as follows: without using any knowledge of the domain, identify the final reward that will be received by a team. This prediction must be executable at any world state, allowing a system operator to take remedial procedures in time.

I now explain my algorithm. The main idea is to learn a prediction function, given the frequencies of agreements of all possible agent subsets over the chosen actions. In order to learn such a function, we need to define a feature vector to represent each problem-solving instance (for example, a game). My feature vector records the frequency with which each subset of agents was the one that determined the action taken by the team (i.e., the subset whose action was selected as the action of the team by the plurality voting rule). When learning the prediction function I calculate the feature vector considering the whole history of the problem-solving process (for example, from the first to the last turn of a game). When using the learned prediction function to actually perform a prediction, we can simply compute the feature vector considering all history from the first world state to the current one (for example, from the first turn of a game up to the current turn), and use it as input to our learned function. Note that my feature vector does not hold any domain information, and uses solely the voting patterns to represent the problem-solving instances.

Formally, let P(T) = {T_1, T_2, ...} be the power set of the set of agents, a_i be the action chosen in world state s_j, and H_j ⊆ T be the subset of agents that agreed on a_i in that world state. Consider the feature vector x = (x_1, x_2, ...) computed at world state s_j, where each dimension (feature) has a one-to-one mapping with P(T). I define x_i as the proportion of times that the chosen action was agreed upon by the subset of agents T_i. That is,

    x_i = (1 / |S_j|) · Σ_{k=1}^{|S_j|} I(H_k = T_i),

where I is the indicator function and S_j ⊆ S is the set of world states from s_1 to the current world state s_j.

Hence, given a set X such that for each feature vector x_t ∈ X we have the associated reward r_t, we can estimate a function f̂ that returns an estimated reward between 0 and 1 given an input x. I classify estimated rewards above a certain threshold ϑ (for example, 0.5) as "success", and below it as "failure". In order to learn the classification model, the features are computed at the final world state. That is, the feature vector will record the frequency with which each possible subset of agents won the vote, calculated from the first world state to the last one. For each feature vector, we have (for learning) the associated reward: "success" (1) or "failure" (0), according to the final outcome of the problem-solving process. In order to execute the prediction, the features are computed at the current world state (i.e., all history of the current problem-solving process from the first world state to the current one).
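A minimal sketch of the feature computation just defined, assuming each world state's votes are given as a dictionary from agent to action; the plurality winner is chosen with random tie-breaking and the subset that determined it is recorded (the reduced variant described below would key the counts on the subset size instead of the subset itself):

```python
import random
from collections import Counter

def winning_subset(votes):
    """votes: dict agent -> action for one world state.  Returns the chosen
    action (plurality, random tie-breaking) and the subset of agents whose
    vote determined it."""
    counts = Counter(votes.values())
    top = max(counts.values())
    chosen = random.choice([a for a, c in counts.items() if c == top])
    agents = frozenset(t for t, a in votes.items() if a == chosen)
    return chosen, agents

def feature_vector(history):
    """Full features: frequency with which each subset of agents won the
    vote, over the world states seen so far (history: list of vote dicts)."""
    freqs = Counter()
    for votes in history:
        _, agents = winning_subset(votes)
        freqs[agents] += 1
    n = len(history)
    return {subset: count / n for subset, count in freqs.items()}
```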
I use classification by logistic regression, which models f̂ as

    f̂(x) = 1 / (1 + e^(−(α + βᵀx))),

where α and β are parameters that will be learned given X and the associated rewards. While training, I eliminate two of the features. The feature corresponding to the subset ∅ is dropped because an action is chosen only if at least one of the agents voted for it. Also, since the rest of the features sum up to 1, and are hence linearly dependent, I also drop the feature corresponding to all agents agreeing on the chosen action.

I also study a variant of this prediction method, where we use only information about the number of agents that agreed upon the chosen action, but not which agents exactly were involved in the agreement. For that variant, I consider a reduced feature vector y = (y_1, y_2, ...), where I define y_i to be the proportion of times that the chosen action was agreed upon by any subset of i agents:

    y_i = (1 / |S_j|) · Σ_{k=1}^{|S_j|} I(|H_k| = i),

where I is the indicator function and S_j ⊆ S is the set of world states from s_1 to the current world state s_j. I compare the two approaches in Section 8.4.

8.2.1 Example of Features

I give a simple example to illustrate my proposed feature vectors. Consider a team of three agents: t_1, t_2, t_3. Let's assume three possible actions: a_1, a_2, a_3. Consider that, in three iterations of the problem-solving process, the voting profiles were as shown in Table 8.1, where I show which action each agent voted for at each iteration.

                  Agent 1   Agent 2   Agent 3
    Iteration 1     a_1       a_1       a_2
    Iteration 2     a_2       a_2       a_1
    Iteration 3     a_1       a_2       a_2

    Table 8.1: A simple example of voting profiles after three iterations of problem-solving.

Based on the plurality voting rule, the actions chosen for the respective iterations would be a_1, a_2, and a_2. We can see an example of how the full feature vector will be defined at each iteration in Table 8.2, where each column represents a possible subset of the set of agents, and I mark the frequency with which each subset agreed on the chosen action. Note that the frequency of the subsets {t_1}, {t_2} and {t_3} remains 0 in this example. This happens because we only count the subset where all agents involved in the agreement are present. If there was a situation where, for example, agent t_1 votes for a_1, agent t_2 votes for a_2 and agent t_3 votes for a_3, then we would select one of these agents by random tie-breaking. After that, we would increase the frequency of the corresponding subset containing only the agent that was chosen (i.e., either {t_1}, {t_2} or {t_3}).

If the problem has only three iterations in total, we would use the feature vector at the last iteration and the corresponding result ("success", i.e. 1, or "failure", i.e. 0) while learning the function f̂ (that is, the feature vectors at Iterations 1 and 2 would be ignored). If, however, we already learned a function f̂, then we could use the feature vector at Iteration 1, 2 or 3 as input to f̂ to execute a prediction. Note that at Iterations 1 and 2 the output (i.e., the prediction) of f̂ will be exactly the same, as the feature vector did not change. At Iteration 3, however, we may have a different output/prediction.
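A minimal sketch of the learning and prediction steps, assuming scikit-learn, with X holding the feature vectors computed at the final world state of each training problem (the empty and all-agents columns already dropped) and r the associated 0/1 rewards; these variable names are illustrative, not from the thesis code:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def train_predictor(X, r):
    """Fit the logistic model: learns alpha (intercept) and beta (coefficients)."""
    model = LogisticRegression()
    model.fit(X, r)
    return model

def predict_outcome(model, x_current, threshold=0.5):
    """Online prediction at the current world state: the estimated reward is
    the probability of the 'success' class, classified against the threshold."""
    p_success = model.predict_proba(np.asarray(x_current).reshape(1, -1))[0, 1]
    return "success" if p_success >= threshold else "failure"
```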
                  {t_1}  {t_2}  {t_3}  {t_1,t_2}  {t_1,t_3}  {t_2,t_3}
    Iteration 1     0      0      0       1          0          0
    Iteration 2     0      0      0       1          0          0
    Iteration 3     0      0      0       2/3        0          1/3

    Table 8.2: Example of the full feature vector after three iterations of problem solving.

                    1    2
    Iteration 1     0    1
    Iteration 2     0    1
    Iteration 3     0    1

    Table 8.3: Example of the reduced feature vector after three iterations of problem solving.

In Table 8.3, I show an example of the reduced feature vector, where the column headings give the number of agents involved in an agreement over the chosen action. I consider here the same voting profiles as before, shown in Table 8.1. Note that the reduced representation is much more compact, but we have no way to represent the change in which specific agents were involved in the agreement from Iteration 2 to Iteration 3. In this case, therefore, we would always have the same prediction after Iteration 1, Iteration 2 and also Iteration 3.

8.3 Theory

I consider here the view of social choice as a way to estimate a "truth", or the correct (i.e., best) action to perform in a given world state. Hence, we can model each agent as a probability distribution function (pdf): that is, given the correct outcome, each agent will have a certain probability of voting for the best action, and a certain probability of voting for some incorrect action. These pdfs are not necessarily the same across different world states (Chapter 3). Hence, given the voting profile in a certain world state, there will be a probability p of picking the correct choice (for example, by the plurality voting rule).

I will start by developing, in Section 8.3.1, a simple explanation of why we can use the voting patterns to predict success or failure of a team of voting agents, based on classical voting theories. Such an explanation gives an intuitive idea of why the voting patterns are informative, and it can be immediately derived from the classical voting models. However, it fails to explain some of the results in Section 8.4, and it needs the assumption that plurality is an optimal voting rule. Hence, it is not enough for a deeper understanding of my prediction methodology. Therefore, I will later present, in Section 8.3.2, my main theoretical model, which provides a better

8.3.1 Classical Voting Model

I start with a simple example to show that we can use the outcome of plurality voting to predict success. Consider a scenario with two agents and two possible actions, a correct and an incorrect one. I assume, for this example, that agents have a probability of 0.6 of voting for the correct action and 0.4 of making a mistake. If both agents vote for the same action, they are either both correct or both wrong. Hence, the probability of the team being correct is given by 0.6² / (0.6² + 0.4²) = 0.69. Therefore, if the agents agree, the team is more likely correct than wrong. If they vote for different actions, however, one will be correct and the other one wrong. Given that profile, and assuming that we break ties randomly, the team will have a 0.5 probability of being correct. Hence, the team has a higher probability of making a correct choice when the agents agree than when they disagree (0.69 > 0.5).
Therefore, if across multiple iterations these agents agree often, the team has a higher probability of being correct across these iterations, and we can predict that the team is going to be successful. If they disagree often, then the probability of being correct across the iterations is lower, and we can predict that the team will not be successful.

More generally than the previous example, we can consider all cases where plurality is the optimal voting rule. In social choice, optimal voting rules are often studied as maximum likelihood estimators (MLE) of the correct choice [Conitzer and Sandholm, 2005]. That is, each agent is modeled as having a noisy perception of the truth (or correct outcome). Hence, the correct outcome influences how each agent is going to vote, as shown in the model in Figure 8.1. For example, consider a certain agent t that has a probability 0.6 of voting for the best action, and let's say we are in a certain situation where action a* is the best action. In this situation agent t will have a probability of 0.6 of voting for a*. Therefore, given a voting profile and a noise model (the probability of voting for each action, given the correct outcome) of each agent, we can estimate the likelihood of each action being the best by a simple (albeit computationally expensive) probabilistic inference.

[Figure 8.1: "Classical" voting model. Each agent has a noisy perception of the truth, or correct outcome; hence its vote is influenced by the correct outcome. The "correct outcome" node points to each of "Agent 1's vote", "Agent 2's vote", ..., "Agent n's vote".]

Any voting rule is going to be optimal if it corresponds to always picking the action that has the maximum likelihood of being correct (i.e., the action with maximum likelihood of being the best action), according to the assumed noise model of the agents. That is, the output of an optimal voting rule always corresponds to the output of actually computing, by the probabilistic inference method mentioned above, which action has the highest likelihood of being the best one.

If plurality is assumed to be an optimal voting rule, then the action voted for by the largest number of agents has the highest probability of being the optimal action. We can expect, therefore, that the higher the number of agents that vote for an action, the higher the probability that the action is the optimal one. Hence, given two different voting profiles with a different number of agreeing agents, we expect that the team has a higher probability of being correct (and, therefore, of being successful) in the voting profile where a larger number of agents agrees on the chosen action. I formalize this idea in the following proposition, under the classical assumptions of voting models. Hence, I consider that the agents have a higher probability of voting for the best action than any other action (which makes plurality a MLE voting rule [List and Goodin, 2001]), uniform priors over all actions, and that all agents are identical and independent.

Proposition 8.3.1 The probability that a team is correct increases with the number of agreeing agents m in a voting profile, if plurality is MLE.

Proof: Let a* be the best action (whose identity we do not know) and V = v_1, v_2, ..., v_n be the votes of the n agents.
The probability of any action a being the best action, given the votes of the agents (i.e., P(a = a* | v_1, v_2, ..., v_n)), is governed by the following relation:

    P(a = a* | v_1, v_2, ..., v_n) ∝ P(v_1, v_2, ..., v_n | a = a*) · P(a = a*)    (8.1)

Let's consider two voting profiles V_1, V_2, where in one a higher number of agents agree on the chosen action than in the other (i.e., m_{V_1} > m_{V_2}). Let w_1 be the action with the highest number of votes in V_1, and w_2 the one in V_2.

Without loss of generality (since the order does not matter), let's reorder the voting profiles V_1 and V_2, such that all votes for w_1 are at the beginning of V_1 and all votes for w_2 are at the beginning of V_2. Now, let V_1^x and V_2^x be the voting profiles considering only the first x agents (after reordering). We have that P(V_1^{m_{V_2}} | w_1 = a*) = P(V_2^{m_{V_2}} | w_2 = a*), since up to the first m_{V_2} agents, for both voting profiles, we are considering the case where all agents voted for a*.

Now, let's consider the agents from m_{V_2} + 1 to m_{V_1}. In V_1, the voted action of all these agents (still w_1) is wired to a* (by the conditional probability). However, in V_2, the voted action a ≠ w_2 is not wired to a* in the conditional probability anymore. As each agent is more likely to vote for a* than for any other action, from m_{V_2} + 1 to m_{V_1}, the events in V_1 (an agent voting for a*) have higher probability than the events in V_2 (an agent voting for an action a ≠ a*). Hence, P(V_1^{m_{V_1}} | w_1 = a*) > P(V_2^{m_{V_1}} | w_2 = a*).

Now let's consider the votes after m_{V_1}. In V_1 there are no more votes for w_1, and in V_2 there are no more votes for w_2. Hence, all the subsequent votes are not wired to any ranking (as we only wire w_1 = a* and w_2 = a* in the conditional probabilities). Therefore, each vote can be assigned to any ranking position that is not the first. Since the agents are independent, any sequence of votes will thus be as likely. Hence, P(V_1 | w_1 = a*) > P(V_2 | w_2 = a*). Since I assume uniform priors, it follows that:

    P(w_1 = a* | V_1) > P(w_2 = a* | V_2)

Therefore, the team is more likely correct in profiles where a higher number of agents agree.

Hence, if across multiple voting iterations a higher number of agents agree often, we can predict that the team is going to be successful. If they disagree a lot, we can expect that they are wrong in most of the voting iterations, and we can predict that the team is going to fail.

In the next observation I show that we can increase the prediction accuracy by knowing not only how many agents agreed, but also which specific agents were involved in the agreement. Basically, I show that the probability of a team being correct depends on the agents involved in the agreement. Therefore, if we know that the best agents are involved in an agreement, we can be more certain of a team's success. This observation motivates the use of the full feature vector, instead of the reduced one.

Observation 8.3.2 Given two profiles V_1, V_2 with the same number of agreeing agents m, the probability that a team is correct is not necessarily equal for the two profiles.

We can easily show this by an example (that is, we only need one example where the probabilities are not equal to show that they will not always be equal). Consider a problem with two actions. Consider a team of three agents, where t_1 and t_2 have a probability of 0.8 of being correct, while t_3 has a probability of 0.6 of being correct.
As the probability of picking the correct action is the highest for all agents, the action chosen by the majority of the agents has the highest probability of being correct (that is, we are still covering a case where plurality is MLE). However, when only t_1 and t_2 agree, the probability that the team is correct is given by: 0.8² · 0.4 / (0.8² · 0.4 + 0.2² · 0.6) = 0.91. When only t_2 and t_3 agree, the probability that the team is correct is given by: 0.8 · 0.6 · 0.2 / (0.8 · 0.6 · 0.2 + 0.2 · 0.4 · 0.8) = 0.6. Hence, the probability that the team is correct is higher when t_1 and t_2 agree than when t_2 and t_3 agree.

However, based solely on the classical voting models, one would expect that, given two different teams, the predictions would be more accurate for the one that has greater performance (i.e., likelihood of being correct), as I formalize in the following proposition. Therefore, this model fails to explain my experimental results (as I will show later). I will use the term strength to refer to a team's performance.

Proposition 8.3.3 Under the classical voting models, given two different teams, one can expect to make better predictions for the strongest one.

Proof Sketch: Under the classical voting models, assuming the agents have a noise model such that plurality is a MLE, we have that the best team will have a greater probability of being correct given a voting profile where m agents agree than a worse team with the same number m of agreeing agents. Hence, the probability of the best team being correct will be closer to 1 in comparison with the probability of the worse team being correct. The closer the probability of success is to 1, the easier it is to make predictions. Consider a Bernoulli trial with probability of success p ≈ 1. In the learning phase, we will see many successes accordingly. In the testing phase, we will predict the majority outcome of the two for every trial, and we will go wrong only with probability |1 − p| ≈ 0. Of course, we could also have an extremely weak team, that is wrong most of the time. For such a team, it would also be easy to predict that the probability of success is close to 0.

Notice, however, that I am assuming here the classical voting models, where plurality is a MLE. In such models, the agents must play "reasonably well": classically they are assumed to have either a probability of being correct greater than 0.5, or the probability of voting for the best action is the highest one in their pdf [List and Goodin, 2001]. Otherwise, plurality is not going to be a MLE. Consider, however, that the strongest team is composed of copies of the best agent (which would often be the case, under the classical assumptions). We actually have that, in fact, such agents will not necessarily have noise models (pdfs) where the best action has the highest probability in all world states. In some world states, a suboptimal action could have the highest probability, making the agents agree on the same mistakes (Chapter 3). Therefore, when plurality is not actually a MLE in all world states, we have that Proposition 8.3.1 will not hold in the world states where this happens. Hence, we will predict that the team made a correct choice when actually the team was wrong, causing problems in our accuracy. I give more details in the next section.

[Figure 8.2: My main model. The subset of agents H_i that decided the action of the team at each world state (H_1, H_2, H_3, H_4, ..., H_S) determines whether the team will be successful or not (W); that is, each H_i points to the outcome node W.]
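The posterior values in the example above follow from a direct Bayes computation over the two equally likely actions; a minimal sketch that reproduces them, where the agent accuracies and the agreeing subset are the only inputs:

```python
def prob_correct_given_subset(accuracies, subset):
    """P(team's choice is correct | exactly the agents in `subset` agreed on
    it), for two actions and a uniform prior.  accuracies[t] is agent t's
    probability of voting for the correct action."""
    p_right = 1.0   # chosen action is the correct one
    p_wrong = 1.0   # chosen action is the incorrect one
    for t, acc in accuracies.items():
        if t in subset:
            p_right *= acc          # agreeing agents voted correctly
            p_wrong *= 1 - acc      # ... or they all picked the wrong action
        else:
            p_right *= 1 - acc      # the remaining agent missed the truth
            p_wrong *= acc          # ... or it was the one who got it right
    return p_right / (p_right + p_wrong)

accs = {"t1": 0.8, "t2": 0.8, "t3": 0.6}
print(prob_correct_given_subset(accs, {"t1", "t2"}))  # ~0.91
print(prob_correct_given_subset(accs, {"t2", "t3"}))  # 0.6
```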
8.3.2 Main Theoretical Model

I now present my main theory, which holds irrespective of plurality being an optimal voting rule (MLE) or not. Again, I consider agents voting across multiple world states. I assume that all iterations equally influence the final outcome, and that they are all independent. Let the final reward of the team be defined by a random variable W, and let the number of world states be S. I model the problem-solving process by the graphical model in Figure 8.2, where H_j represents the subset of agents that agreed on the chosen action at world state s_j. That is, I assume that the subset of agents that decided the action taken by the team at each world state of the problem-solving process will determine whether the team will be successful or not in the end. A specific problem (for example, Go games, where the next state will depend on the action taken in the current one) would call for more complex models to be completely represented. My model is a simplification of the problem-solving process, abstracting away the details of specific problems.

For any subset H, let P(H) be the probability that the chosen action was correct given the subset of agreeing agents. If the correct action is a*, P(H) is equivalent to:

    P(∀t ∈ H, t chooses a*; ∀t ∉ H, t chooses a ≠ a*) / P(∃a' : ∀t ∈ H, t chooses a'; ∀t ∉ H, t chooses a ≠ a'),

where H is the subset of agents which voted for the action taken by the team. Note that P(H) depends on both the team and the world state. However, I marginalize the probabilities to produce a value that is an average over all world states. I consider that, for a team to be successful, there is a threshold ξ such that:

    { ∏_{j=1}^{S} P(H_j) }^{1/S} > ξ    (8.2)

I use the exponent 1/S in order to maintain a uniform scale across all problems. Each problem might have a different number of world states; and for one with many world states, the raw product of probabilities would likely be low enough to fail the above test, independent of the actual subsets of agents that agreed upon the chosen actions. However, the final reward is not dependent on the number of world states.

I can show, then, that we can use a linear classification model (such as logistic regression) that is equivalent to Equation 8.2 to predict the final reward of a team.

Theorem 8.3.4 Given the model in Equation 8.2, the final outcome of a team can be predicted by a linear model over agreement frequencies.

Proof: Taking the log on both sides of Equation 8.2, we have:

    Σ_{j=1}^{S} (1/S) · log(P(H_j)) > log(ξ)

The sum over the steps (world states) of the problem-solving process can be transformed into a sum over all possible subsets of agents that can be encountered, P:

    Σ_{H∈P} (n_H / S) · log(P(H)) > log(ξ),    (8.3)

where n_H is the number of times the subset of agreeing agents H was encountered during problem solving. Hence, n_H / S is the frequency of seeing the subset H, which I will denote by f_H. Recall that T is the set of all agents. Hence, f_T (which is the frequency of all agents agreeing on the same action) is equal to 1 − Σ_{H∈P\{T}} f_H. Also, note that n_∅ = 0, since at least one agent must pick the chosen action.
Equation 8.3 can, hence, be rewritten as:

    f_T · log(P(T)) + Σ_{H∈P\T} f_H · log(P(H)) > log(ξ)

    (1 − Σ_{H∈P\{T}} f_H) · log(P(T)) + Σ_{H∈P\T} f_H · log(P(H)) > log(ξ)

    log(P(T)) + Σ_{H∈P\T} f_H · log(P(H) / P(T)) > log(ξ)

Hence, my final model will be:

    Σ_{H∈P\T} log(P(H) / P(T)) · f_H > log(ξ / P(T))    (8.4)

Note that log(ξ / P(T)) and the "coefficients" log(P(H) / P(T)) are all constants with respect to a given team, as I have discussed earlier. Considering the set of all f_H (for each possible subset of agreeing agents H) to be the characteristic features of a single problem, the coefficients can now be learned from training data that contains many problems represented using these features. Further, the outcome of a team can be estimated through a linear model.

The number of constants is exponential, however, as the size of the team grows. Therefore, in the following corollary, I show that (under some conditions) we can approximate the prediction well with a reduced feature vector that grows linearly. In order to differentiate different possible subsets, I will denote by H_i a certain subset ∈ P, and by |H_i| the size of that subset (i.e., the number of agents that agree on the chosen action).

Corollary 8.3.5 If P(H_i) ≈ P(H_j) ∀H_i, H_j such that |H_i| = |H_j|, we can approximate the prediction with a reduced feature vector that grows linearly with the number of agents. Furthermore, in a uniform team the reduced representation is equal to the full representation.

Proof: By the assumption of the corollary, there is a P_{H'_n}, defined as P_{H'_n} ≈ P(H_j), ∀H_j such that |H_j| = n. Let f_n = Σ f_{H_j}, over all |H_j| = n. Also, let N' be the set of all integers 0 < x < N, where N is the number of agents. We thus have that:

    Σ_{H∈P\T} f_H · log(P(H) / P(T)) ≈ Σ_{x∈N'} f_x · log(P_{H'_x} / P(T))    (8.5)

As P_{H'_n} depends only on the number of agents, we have that such a representation grows linearly with the size of the team. Moreover, note that for a team made of copies of the same agent, we have that P_{H'_n} = P(H_j), ∀H_j such that |H_j| = n. Hence, the left-hand side of Equation 8.5 is going to be equal to the right-hand side.

Also, notice that my model does not need any assumptions about plurality being an optimal voting rule (MLE). In fact, there are no assumptions about the voting rule at all. Hence, Proposition 8.3.1, Observation 8.3.2 and Proposition 8.3.3 do not apply, and we can still make accurate predictions irrespective of the performance of a team. We can also note that the accuracy of the predictions is not going to be the same across different teams. Given two teams where plurality is the optimal voting rule in general (i.e., P(H_i) > P(H_j), ∀H_i, H_j where |H_i| > |H_j|): one uniform, made of copies of the best agent, and one diverse, made of different agents, we have that we can actually make better predictions for the diverse team than for the uniform one, irrespective of the actual playing performance of these two teams.

Corollary 8.3.6 Given a diverse and a uniform team, the accuracy of the prediction for the diverse team can be higher, even if the diverse team has a lower probability of victory.

Proof Sketch: Note that the learned constants ĉ_H ≈ log(P(H) / P(T)) represent the marginal probabilities across all world states. However, I showed (Chapter 3) that the agent with the highest marginal probability of voting for the correct choice will not necessarily have the highest probability of being correct at all world states.
Hence, let Bad be the set of world states where the best agent does not have the highest probability of voting for the correct action in its pdf. In such world states, the agents tend to agree over the incorrect action that has the highest probability. This follows from modeling the voting of all agents as a multinomial distribution. Hence, the expected number of agents that vote for an action a_j, in a team with n agents, is given by E[|H|] = n p_j, where p_j is the probability of the agent voting for action a_j. Therefore, the action with the highest probability will tend to receive the highest number of votes. However, in my prediction model, the estimate that the team is correct gets higher as more agents agree upon the final choice. Hence, we will tend to make wrong predictions for the world states in the set Bad, and our accuracy will be lower as |Bad| gets higher. Since a diverse team is composed of agents with different pdfs, it is less likely that in a given world state they will all have the highest probability in the same incorrect action (Chapter 3). Hence, it is less likely that the situation described above happens, and we can have better predictions for a diverse team.

Although I only give here a proof sketch, in Section 8.4 I experimentally show a statistically significant higher accuracy for the predictions for a diverse team than for a uniform team, even though they have similar strength (i.e., performance in terms of winning rates). I am able to achieve a better prediction for diverse teams both at the end of the problem-solving process and also while doing online predictions at any world state.

8.3.3 Action Space Size

I present now my study concerning the quality of the predictions over large action space sizes. In order to perform this analysis, I assume the spreading tail (ST) agent model, presented in Chapter 4. The ST agent model was developed to study how teams of voting agents change in performance as the size of the action space increases. The basic assumption is that the pdf of each member of the team has a non-zero probability over an increasingly larger number of suboptimal actions as the action space grows, while the probability of voting for the optimal action remains unchanged. Chapter 4 performs an experimental validation of this model in the Computer Go domain.

As a reminder, I briefly summarize here the formal definition of the ST agent model, and I refer the reader to Chapter 4 for a more detailed description. Let D_m be the set of suboptimal actions (a_j, j ≠ 0) assigned a nonzero probability in the pdf of an agent i, and d_m = |D_m|. The ST model assumes that there is a bound on the ratio between the suboptimal action with highest probability and the one with lowest nonzero probability; i.e., let p_{i,min} = min_{j ∈ D_m} p_{i,j} and p_{i,max} = max_{j ∈ D_m} p_{i,j}; there is a constant λ such that p_{i,max}/p_{i,min} ≤ λ for all agents i. ST agents are agents whose d_m is non-decreasing in m and d_m → ∞ as m → ∞. I consider that there is a constant ε > 0 such that, for all ST agents i and all m, p_{i,0} ≥ ε. I also assume that p_{i,0} does not change with m. Let the size of the action space be |A| = ϱ, and let p_{i,j} be the probability that agent i votes for the action with rank j.
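To make these definitions concrete, the sketch below builds the pdf of a hypothetical ST agent and samples a team vote for growing action spaces. It is an illustration only: the specific values of p_{i,0}, the uniform spread of mass over the d_m suboptimal actions, and the growth rate of d_m are assumptions made for the example and are not part of the formal model.

```python
import numpy as np

def st_agent_pdf(p_opt, d_m, num_actions):
    """Pdf of a hypothetical ST agent over one world state.

    p_opt:       probability of the optimal action (rank 0); constant in the ST model.
    d_m:         number of suboptimal actions with nonzero probability.
    num_actions: size of the action space.
    """
    assert 0 < p_opt < 1 and 0 < d_m < num_actions
    pdf = np.zeros(num_actions)
    pdf[0] = p_opt
    # Spread the remaining mass uniformly over d_m suboptimal actions; a uniform
    # spread trivially satisfies the bounded ratio between the largest and the
    # smallest nonzero suboptimal probabilities.
    pdf[1:d_m + 1] = (1.0 - p_opt) / d_m
    return pdf

rng = np.random.default_rng(0)
team_p_opt = [0.30, 0.28, 0.25, 0.22, 0.20, 0.18]   # illustrative values only
for num_actions in (81, 169, 289, 441):              # e.g., Go board sizes
    d_m = num_actions // 2                            # d_m grows with the action space
    votes = [rng.choice(num_actions, p=st_agent_pdf(p, d_m, num_actions))
             for p in team_p_opt]
    largest_agreement = np.bincount(votes).max()      # size of the largest agreeing subset
    print(num_actions, votes, largest_agreement)
```

Under such an instantiation, agreements over suboptimal actions become increasingly rare as the action space grows, which is the behavior formalized next.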
In Chapter 4, I show that when ϱ → ∞, the probability that a team of n ST agents will play the optimal action converges to:

\[
\tilde{p}_{\mathrm{best}} = 1 - \prod_{i=1}^{n} (1 - p_{i,0}) - \sum_{i=1}^{n} \left( p_{i,0} \prod_{j=1, j \neq i}^{n} (1 - p_{j,0}) \right) \frac{n-1}{n}, \qquad (8.6)
\]

that is, the probability of two or more agents agreeing over suboptimal actions converges to zero, and the agents can only agree over the optimal choice (note that a suboptimal action can still be taken, as we may have situations where no agent agrees). Hence, Equation 8.6 calculates the total probability minus the cases where the best action is not chosen: the second term covers the case where all agents vote for a suboptimal action, and the third term covers the case where one agent votes for the optimal action and all other agents vote for suboptimal actions (the factor (n − 1)/n accounts for losing the random tie-break in that case).

Before proceeding to my study, I am going to make a few definitions and then two weak assumptions. I consider now any action space size. Let γ be the probability of a team taking the optimal action when all agents disagree. Since we can only take the optimal action if one agent votes for that action, γ is a function of the probability of each agent voting for the optimal action. That is, we may have voting profiles where all agents disagree and no agent voted for the optimal action, or we may have voting profiles where all agents disagree but there is one agent that voted for the optimal action (and, hence, we may still take the optimal action due to random tie breaking). Let β be the probability of a team taking the optimal action when there is some agreement in the voting profile. β may be different according to each voting profile, but we assume that we always have β < 1 if ϱ < ∞, and β = 1 if ϱ → ∞, according to the ST agent model. That is, if two or more agents agree, there is always some probability q > 0 that they are agreeing over a suboptimal action, and q → 0 as ϱ → ∞.

I will make the following weak assumptions: (i) If there is no agreement, the team is more likely to take a suboptimal action than an optimal action, i.e., γ < 1 − γ; (ii) If there is agreement, there is at least one voting profile where the team is more likely to take an optimal action than a suboptimal action; that is, there is at least one β such that β > 1 − β. Assumption (i) is weak, since γ ≤ 1/n (as we break ties randomly and there may be cases where no agent votes for the optimal action). Clearly, 1/n < 1 − 1/n for n > 2. Assumption (ii) is also weak, because if we are given a team that is always more likely to take suboptimal actions than an optimal action for any voting profile, then a trivial predictor that always outputs "failure" would be optimal (and, hence, we would not need a prediction at all). Therefore, assumptions (i) and (ii) are satisfied for all situations of interest. I present now my result:

Theorem 8.3.7 Let T be a set of ST agents. The quality of our prediction about the performance of T is the highest as ϱ → ∞.

Proof: Let us fix the problem to predicting performance at one world state. Hence, as we consider a single decision, there is a single H_i such that f_{H_i} = 1, and f_{H_j} = 0 for all j ≠ i. In order to simplify the notation, I denote by H the subset H_i corresponding to f_{H_i} = 1. I also consider the performance of the team as "success" on that fixed world state if it takes the optimal action, and as "failure" otherwise. Let a voting event be the process of querying the agents for their votes, obtaining the voting profile and the corresponding final decision. Hence, it has a unique correct label ("success" or "failure").
A voting event will be mapped to a point in the feature space, according to the subset of agents that agreed on the chosen action. Multiple voting events, however, will be mapped to the same point (exactly the same subset can agree in different situations; sometimes they may be agreeing over the optimal action, and sometimes they may be agreeing over suboptimal actions). Hence, given a point in the feature space, there is a certain probability that the team was successful, and a certain probability that the team failed. Therefore, by assigning a label to that point, our predictor will also be correct with a certain probability. With enough data, the predictor will output the more likely of the two events. That is, if, given a profile, the team has a probability p of taking the optimal action and p > 1 − p, the predictor will output "success", and it will be correct with probability p. Correspondingly, if 1 − p > p, the predictor will output "failure", and it will be correct with probability 1 − p. Hence, the probability of the prediction being correct will be max(p, 1 − p).

I first study the probability of making a correct prediction across the whole feature space, for different action space sizes, and after that I will focus on what happens with specific voting events as the action space changes. Let us start by considering the case when ϱ → ∞. By Equation 8.6, we know that every time two or more agents agree on the same action, that action will be the optimal one. Note that this is a very clear division of the feature space, as for every single point where |H| ≥ 2 the team will be successful with probability 1. Therefore, on this subspace we can make perfect predictions. The only points in the feature space where a team may still take a suboptimal action are the ones where a single agent agrees on the chosen action, i.e., |H| = 1. Hence, for such points we will make a correct prediction with probability max(γ, 1 − γ).

Let us now consider cases with a smaller action space size (i.e., ϱ < ∞). Let us first consider the subspace |H| ≥ 2. Before, our predictor was correct with probability 1. Now, given a voting event where there is an agreement, there will be a probability β < 1 of the team taking the optimal action. Hence, the predictor will be correct with probability max(β, 1 − β), but max(β, 1 − β) < 1. Let us consider now the subspace |H| = 1. Here the quality of the prediction depends on γ, which is a function of the probability of each agent playing the best action. In the ST agent model, however, the probability of one agent voting for the best action is independent of ϱ (Chapter 4). Hence, γ does not depend on the action space size, and for these cases the quality of our prediction will be the same as before. Therefore, for all points in the feature space, the probability of making a correct prediction is either the same or worse when ϱ < ∞ than when ϱ → ∞.

However, that does not complete the proof yet, because a voting event may map to a different point when the action space changes. For instance, the number of agents that agree over a suboptimal action may surpass the number of agents that agree on the optimal action as the action space changes from ϱ → ∞ to ϱ < ∞. Therefore, we need to show that our prediction will be strictly better when ϱ → ∞ irrespective of such mapping. Hence, let us now study the voting events. As the number of actions decreases, a certain voting event ν when ϱ → ∞ will map to a voting event ν′ when ϱ < ∞ (where ν may or may not be equal to ν′). Let ρ and ρ′ be the corresponding points in the feature space for ν and ν′.
Also, let H and H′ be the respective subsets of agreeing agents. Let us consider now the four possible cases:

(i) |H| = |H′| = 1. For such events, the performance of the predictor will remain the same; that is, for both cases we will make a correct prediction with probability max(γ, 1 − γ). Note that this case will not happen for all events, as p_{i,0} does not go to 0 when ϱ → ∞; hence there will be at least one event where |H| ≥ 2.

(ii) |H| ≥ 2, |H′| ≥ 2. For such events the performance of the predictor will be higher when ϱ → ∞, as we can make a correct prediction for a point ρ with probability 1, while for a point ρ′ only with probability max(β, 1 − β) < 1.

(iii) |H| ≥ 2, |H′| = 1. This case will not happen under the ST agent model. If there was a certain subset H of agreeing agents when ϱ → ∞, when we decrease the number of actions the new subset of agreeing agents H′ will either have the same size or will be larger. This follows from the fact that we may have a larger subset agreeing over some suboptimal action when the action space decreases, but the original subset that voted for the optimal action will not change.

(iv) |H| = 1, |H′| ≥ 2. We know that in this case ν′ is an event where the team fails (otherwise the same subset would also have agreed when ϱ → ∞). Hence, for 1 − γ > γ (weak assumption (i)), we make a correct prediction for such a case when ϱ → ∞. When ϱ < ∞, we make a correct prediction if 1 − β > β. β, however, depends on the voting profile of the event ν′. By weak assumption (ii), there will be at least one event where the team is more likely to be correct than wrong (that is, β > 1 − β). Hence, there will be at least one event where our predictor changes from making a correct prediction (when ϱ → ∞) to making an incorrect prediction (when ϱ < ∞).

Hence, for all voting events, the probability of making a correct prediction will either be the same or worse when ϱ < ∞ than when ϱ → ∞, and there will be at least one voting event where it will be worse, completing the proof. Hence, ϱ → ∞ is strictly the best case for our prediction. As we assume that all world states are independent, if ϱ → ∞ is the best case for a single world state, it will also be the best case for a set of world states.

8.4 Results

8.4.1 Computer Go

I first test my prediction method in the Computer Go domain. I use four different Go programs: Fuego 1.1 [Enzenberger et al., 2010], GnuGo 3.8 [FSF, 2009], Pachi 9.01 [Baudiš and Gailly, 2011], MoGo 4 [Gelly et al., 2006], and two (weaker) variants of Fuego, in a total of six different, publicly available, agents. Fuego is the strongest agent among all of them (Chapter 3). The description of the two Fuego variants is available in Chapter 3.

Figure 8.3: Winning rates of the three teams (Diverse, Intermediate, Uniform), under four different board sizes (9x9, 13x13, 17x17, 21x21).

I study three different teams: Diverse, composed of one copy of each agent¹; Uniform, composed of six copies of the original Fuego (initialized with different random seeds, as in Soejima et al. [2010]); Intermediate, composed of six randomly parametrized versions of Fuego (from Chapter 6). In all teams, the agents vote together, playing as white, in a series of Go games against the original Fuego playing as black. I study four different board sizes for diverse and uniform: 9x9, 13x13, 17x17 and 21x21. For intermediate, I study only 9x9, since the random parametrizations of Fuego do not work on larger boards.
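At each turn, the team's joint move is decided by plurality voting over the agents' suggestions, with ties broken randomly, as assumed in the analysis of Section 8.3. A minimal sketch of such an aggregation step is shown below; the function and variable names are placeholders, not the interface of any of the Go programs above.

```python
import random
from collections import Counter

def plurality_vote(votes, rng=random):
    """Return the plurality winner and the agents that voted for it.

    votes: one proposed action per agent (e.g., a board intersection index).
    Ties are broken uniformly at random.
    """
    counts = Counter(votes)
    top = max(counts.values())
    winners = [action for action, c in counts.items() if c == top]
    chosen = rng.choice(winners)
    # H: the subset of agents that agreed on the chosen action, which is the
    # only information the prediction method of this chapter records per turn.
    agreeing_agents = [i for i, v in enumerate(votes) if v == chosen]
    return chosen, agreeing_agents

# Example: six agents voting over intersections identified by indices.
chosen, H = plurality_vote([42, 42, 17, 42, 17, 63])
print(chosen, H)
```

Recording the agreeing subset at every turn is all the information the prediction method needs.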
In Go a player is allowed to place a stone at any empty intersection of the board, so the largest number of possible actions at each board size is, respectively: 81, 169, 289, 441. In order to evaluate my predictions, I use a dataset of 1000 games for each team and board size combination (in a total of 9000 games, all played from the beginning). For all results, I used repeated random sub-sampling validation. I randomly assign 20% of the games to the testing set (and the rest to the training set), keeping approximately the same ratio as the original distribution. The whole process is repeated 100 times. Hence, in all graphs I show the average results, and the error bars show the 99% confidence interval (p = 0.01), according to a t-test. If the error bars cannot be seen at a certain point in a graph, it is because they are smaller than the symbol used to mark that point. Moreover, when I say that a certain result is significantly better than another, I mean statistically significantly better, according to a t-test where p < 0.01, unless I explicitly give a p value.

First, I show the winning rates of the teams in Figure 8.3. This result is not yet evaluating the quality of my prediction; it is merely background information that I will use when analyzing my prediction results later. On 9x9 Go, uniform is better than diverse with statistical significance (p = 0.014), and both teams are clearly significantly better than intermediate (p < 2.2 × 10⁻¹⁶). On 13x13 and 17x17 Go, the difference between diverse and uniform is not statistically significant (p = 0.9619 and 0.5377, respectively). On 21x21 Go, the diverse team is significantly better than uniform (p = 0.03897).

¹ Except for GnuGo, all the other agents use Monte Carlo Tree Search. However, each program was developed by a different group, and uses different heuristics and implementation strategies. In comparison with the other teams studied, this is the team with the greatest diversity.

In order to verify my online predictions, I used the evaluation of the original Fuego, but I give it a time limit 50 times longer. I will refer to this version as "Baseline". I then use the Baseline's evaluation of a given board state to estimate its probability of victory, allowing a comparison with my approach. Considering that an evaluation above 0.5 is "success" and below is "failure", I compare my predictions with the ones given by the Baseline's evaluation, at each turn of the games. I use this method because the likelihood of victory changes dynamically during a game. That is, a team could be in a winning position at a certain stage, after making several good moves, but suddenly change to a losing position after committing a mistake. Similarly, a team could be in a losing position after several good moves from the opponent, but suddenly change to a winning position after the opponent makes a mistake. Therefore, simply comparing with the final outcome of the game would not be a good evaluation. However, for the interested reader, I show in Appendix B how the evaluation would look when comparing with the final outcome of the game, and I note here that our prediction quality is still high in that alternative.

Since the games have different lengths, I divide all games into 20 stages, and show the average evaluation of each stage, in order to be able to compare the evaluation across all games uniformly. Therefore, a stage is defined as a small set of turns (on average, 1.35 ± 0.32 turns in 9x9; 2.76 ± 0.53 in 13x13; 4.70 ± 0.79 in 17x17; 7.85 ± 0.87 in 21x21).
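To connect this experimental setup with the model of Section 8.3.2, the sketch below shows one way the agreement-frequency features and a logistic-regression predictor could be assembled from such game records. It is a minimal sketch under assumptions: the `games` structure (a list of per-turn agreeing subsets paired with a success label) is a hypothetical layout rather than the actual data files used in the experiments, and scikit-learn's LogisticRegression stands in for the linear classification model.

```python
from itertools import combinations
import numpy as np
from sklearn.linear_model import LogisticRegression

N_AGENTS = 6

# All possible agreeing subsets (the full representation; the full-agreement
# subset could equivalently be dropped, since its frequency is determined by
# the others, as in Equation 8.4).
ALL_SUBSETS = [frozenset(c) for r in range(1, N_AGENTS + 1)
               for c in combinations(range(N_AGENTS), r)]
SUBSET_INDEX = {s: i for i, s in enumerate(ALL_SUBSETS)}

def full_features(agreeing_subsets):
    """Frequency f_H of every agreeing subset H over the turns observed so far."""
    x = np.zeros(len(ALL_SUBSETS))
    for H in agreeing_subsets:
        x[SUBSET_INDEX[frozenset(H)]] += 1
    return x / len(agreeing_subsets)

def reduced_features(agreeing_subsets):
    """Frequency of each agreement size |H| (the linearly growing representation)."""
    x = np.zeros(N_AGENTS)
    for H in agreeing_subsets:
        x[len(H) - 1] += 1
    return x / len(agreeing_subsets)

# games: hypothetical records, each a (list of per-turn agreeing subsets, label) pair.
games = [([{0, 1, 2}, {3}, {0, 1, 2, 3, 4, 5}], 1),
         ([{2}, {4}, {1, 5}], 0)]
X = np.array([reduced_features(subsets) for subsets, _ in games])
y = np.array([label for _, label in games])
clf = LogisticRegression().fit(X, y)     # linear model over agreement frequencies
print(clf.predict_proba(X)[:, 1])        # estimated probability of "success"
```

For online prediction at a given stage, the same features are simply computed over the turns observed so far.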
For all games, I also skip the first 4 moves, since my baseline returns corrupted information in the beginning of the games.

I will present my results in four different sections. First, I will show the analysis for a fixed threshold (that is, the value above which the output of my prediction function f̂ will be considered "success"). Secondly, I am going to analyze across different thresholds using receiver operating characteristic (ROC) curves, and the area under such curves (AUC). Thirdly, I will present the results for different board sizes. Finally, I will study the performance of the reduced feature vector.

8.4.1.1 Single Threshold Analysis

I start by showing my results for a fixed threshold, since it gives a more intuitive understanding of the quality of my prediction technique. Hence, in these results I will consider that when my prediction function f̂ returns a value above 0.5 for a certain game, it will be classified as "success", and when the value is below 0.5, it will be classified as "failure". In this section I also restrict myself to the 9x9 Go case.

I will evaluate my prediction results according to five different metrics: Accuracy, Failure Precision, Success Precision, Failure Recall, Success Recall. Accuracy is defined as the sum of the true positives and true negatives, divided by the total number of tests (i.e., true positives, true negatives, false positives, false negatives). Hence, it gives an overall view of the quality of the classification. Precision gives the percentage of data points classified with a certain label ("success" or "failure") that are correctly labeled. Recall denotes the percentage of data points that truly pertain to a certain label that are correctly classified.

I show the results in Figure 8.4. As we can see, we were able to obtain a high accuracy very quickly, already crossing the 0.5 line in the 2nd stage for all teams. In fact, the accuracy is significantly higher than the 0.5 mark for all teams after the 2nd stage (and for diverse and intermediate since the 1st stage). From around the middle of the games (stage 10), the accuracy for diverse and uniform already gets close to 60% (with intermediate only close behind). Although we can see some small drops (which could be explained by sudden changes in the games), overall the accuracy increases with the game stage number, as expected. Moreover, for most of the stages, the accuracy is higher for diverse than for uniform. The prediction for diverse is significantly better than for uniform in 90% of the stages. It is also interesting to note that the prediction for intermediate is significantly better than for uniform in 60% of the stages, even though intermediate is a significantly weaker team. In fact, we can see that in the last stage, the accuracy, the failure precision and the failure recall are significantly better for intermediate than for the other teams.

Figure 8.4: Performance metrics over all turns of 9x9 Go games (panels: (a) Accuracy, (b) Failure Precision, (c) Success Precision, (d) Failure Recall, (e) Success Recall; teams: Uniform, Intermediate, Diverse).

8.4.1.2 Multiple Threshold Analysis

I now measure my results under a variety of thresholds (that is, the value above which the output of our prediction function f̂ will be considered "success"). In order to perform this study, I use receiver operating characteristic (ROC) curves. An ROC curve shows the true positive and the false positive rates of a binary classifier at different thresholds. The true positive rate is the number of true positives divided by the sum of true positives and false negatives (i.e., the total number of items of a given label).
That is, the true positive rate shows the percentage of "success" cases that are correctly labeled as such. The false positive rate, on the other hand, is the number of false positives divided by the sum of false positives and true negatives (i.e., the total number of items that are not of a given label). That is, the false positive rate shows the percentage of "failure" cases that are wrongly classified as "success".

Hence, ROC curves allow us to better understand the trade-off between the true positives and the false positives of a given classifier. By varying the threshold, we can increase the number of items that receive a given label, thereby increasing the number of items of that label that are correctly classified, but at the cost of also increasing the number of items that incorrectly receive the same label. The ideal classifier is the one in the upper left corner of the ROC graph. Such a point (true positive rate = 1, false positive rate = 0) indicates that every item classified with a certain label truly pertains to that label, and every item that truly pertains to that label receives the correct classification.

In this chapter, I do not aim only at studying the performance of a single classifier, but rather at comparing the performance of our prediction technique in a variety of situations, changing the team and the action space size (i.e., the board size). Hence, I also study the area under the ROC curve (AUC), as a way to synthesize the quality information from the curve into a single number. The higher the AUC, the better the prediction quality of my technique in a given situation. That is, a completely random classifier would have an AUC of 0.5, and as the ROC curve moves towards the top-left corner of the graph, the AUC gets closer and closer to 1.0. In fact, the AUC metric has been shown to be equal to the probability of, given a pair of items with different labels (one "success" case and one "failure" case, in our situation), correctly considering the item that was truly a "success" as more likely to be a "success" case than the other item. It has also been shown to be related to other important statistical metrics, such as the Wilcoxon test of ranks, the Mann-Whitney U and the Gini coefficient [Hanley and McNeil, 1982, Mason and Graham, 2002, Hand and Till, 2001].

I start by studying multiple thresholds at a fixed board size (9x9), and in the next section I study the effect of increasing the action space. Hence, in Figure 8.5 I show the ROC curves for all teams in the 9x9 Go games, for 4 different stages of the game.
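The ROC curves and AUC values reported in the remainder of this section can be computed directly from the predictor's scores; a minimal sketch (assuming scikit-learn's metrics module, with `labels` and `scores` as placeholder arrays for one team and one game stage) is:

```python
import numpy as np
from sklearn.metrics import roc_curve, roc_auc_score

# Placeholder data: predicted probability of "success" at one game stage,
# together with the true outcome labels for the test games.
labels = np.array([1, 0, 1, 1, 0, 0, 1, 0])
scores = np.array([0.81, 0.43, 0.67, 0.55, 0.38, 0.52, 0.74, 0.30])

fpr, tpr, thresholds = roc_curve(labels, scores)   # one (FPR, TPR) point per threshold
auc = roc_auc_score(labels, scores)                # area under the ROC curve
print(auc)
```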
Although in the beginning of the game (stage 5) it is hard to distinguish the results for the different teams, we can note that from the middle of the games to the end (stages 15 and 20), there is a clear distinction between the prediction quality for diverse and intermediate, when compared with the one for uniform, across many different thresholds. As mentioned, to formally compare these results across all stages I use the AUC metric. Hence, in Figure 8.6, I show the AUC for the three different teams in 9x9 Go. As we can see, the three teams have similar AUCs up to stage 10, but from that stage on we can get better AUCs for both diverse and intermediate, significantly outperforming the AUC for uniform in all stages. We also find that, considering all stages, we have a significantly better AUC for diverse (than for uniform) in 85% of the cases, and for intermediate in 55% of the cases. Curiously, we can also note that even though intermediate is the weakest team, we can obtain for it the (significantly) best AUC in the last stage of the games, surpassing the AUC found for the other teams. Overall, these results show that my hypothesis that we can make better predictions for teams that have higher diversity holds irrespective of the threshold used for classification. In the next section, I study the effect of increasing the action space size.

Figure 8.5: ROC curves, analyzing different thresholds in 9x9 Go (panels: stages 5, 10, 15 and 20; teams: Uniform, Intermediate, Diverse).

Figure 8.6: AUC for diverse, uniform and intermediate teams, in 9x9 Go.

8.4.1.3 Action Space Size

Let us start by showing, in Figure 8.7, ROC curves for diverse and uniform under different board sizes. It is harder to distinguish the curves at stages 5 and 10, but we can notice that the curve for 21x21 tends to dominate the others at stages 15 and 20. Again, to better study these results, we look at how the AUC changes for different teams and board sizes. In Figure 8.8 we can see the AUC results. For the diverse team (Figure 8.8 (a)), we start observing the effect of increasing the action space after stage 5, when the curves for 17x17 and 21x21 tend to dominate the other curves. In fact, the AUC for 17x17 is significantly better than for smaller boards in 60% of the stages, and in 80% of the stages after stage 5. Moreover, after stage 5 no smaller board is significantly better than 17x17. Concerning 21x21, we can see that from stage 14 its curve completely dominates all the other curves. In all stages from 14 to 20 the result for 21x21 is significantly better than for all other smaller boards. Hence, we can note that the effect of increasing the action space seems to depend on the stage of the game, and it gets more evident as the stage number increases.
Figure 8.7: ROC curves for diverse and uniform, for different board sizes (panels: stages 5, 10, 15 and 20; curves: 9x9, 13x13, 17x17, 21x21).

Figure 8.8: AUC for different teams and board sizes, organized by teams (panels: (a) Diverse Team, (b) Uniform Team).

Concerning the uniform team (Figure 8.8 (b)), up to 17x17 we cannot observe a positive impact of the action space size on the prediction quality; but for 21x21 there is clearly an improvement from the middle game when compared with smaller boards. In all 8 stages from stage 13 to stage 20, the result for 21x21 is significantly better than for the other board sizes. In terms of the percentage of stages where the result for 21x21 is significantly better than for 9x9, we find that it is 40% for the uniform team, while it is 85% for the diverse team. Hence, the impact of increasing the action space occurs earlier in the game for diverse, and over a larger number of stages.

Now, in order to compare the performance for diverse and uniform under different board sizes, I show the AUCs in Figure 8.9 organized by the size of the board (Figure 8.6 is repeated here in Figure 8.9 (a) to make it easier to observe the difference between board sizes). It is interesting to observe that the quality of the predictions for diverse is better than for uniform, irrespective of the size of the action space. Moreover, while for 9x9 and 13x13 the prediction for diverse is only always significantly better than for uniform after around stage 10, we can notice that for 17x17 and 21x21 the prediction for diverse is always significantly better than for uniform, irrespective of the stage (except for stage 1 in 17x17). In fact, I can also show that the difference between the teams is greater on larger boards. In Figure 8.10 we can see the difference between diverse and uniform, in terms of area under the AUC graph, and also in terms of the percentage of stages where diverse is significantly better than uniform, for 9x9 and 21x21. The difference between the areas in 9x9 and 21x21 is statistically significant, with p = 0.0003337.
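Significance statements such as the one above are based on t-tests over the repetitions of the random sub-sampling validation; a minimal sketch of such a comparison (with `auc_9x9` and `auc_21x21` as placeholder per-repetition values, not the actual experimental numbers) is:

```python
import numpy as np
from scipy.stats import ttest_ind

# Placeholder per-repetition results (one value per random sub-sampling run).
auc_9x9 = np.array([0.61, 0.63, 0.60, 0.62, 0.64])
auc_21x21 = np.array([0.70, 0.72, 0.69, 0.71, 0.73])

t_stat, p_value = ttest_ind(auc_21x21, auc_9x9)
print(p_value, p_value < 0.01)   # significance at the level used throughout the chapter
```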
Figure 8.9: AUC for different teams and board sizes, organized by board sizes (panels: 9x9, 13x13, 17x17, 21x21).

Figure 8.10: Differences in prediction quality for the diverse and uniform teams (panels: (a) by area, (b) by percentage of stages).

I study again the accuracy, precision and recall in 21x21, as the results in these metrics may be more intuitive to understand for most readers (although limited to a fixed threshold, in this case 0.5). I show these results in Figure 8.11, where for comparison I also plot the results for 9x9. For the diverse team, the accuracy in 21x21 is consistently significantly better than in 9x9 in all stages from stage 12. We also have a significantly better accuracy in the first 5 stages. Concerning uniform, the accuracy for 21x21 is only consistently better from stage 15 (and also in stage 1). Hence, again, we can notice a better prediction quality on larger boards, and also a higher impact of increasing the action space size on diverse than on uniform (in terms of the number of stages, and also how early the performance improves).

8.4.1.4 Reduced Feature Vector

Finally, I study the reduced feature vector. The results are very similar to the ones using the full feature vector, so I do not repeat them here. I note, however, that we can obtain similar results with a much more scalable representation². In fact, in Figure 8.12 I study the area under the AUC graphs of the full and reduced representations, for all combinations of teams and board sizes. Surprisingly, the reduced representation turns out to be significantly better for all teams on the 9x9 and 13x13 boards. This may happen because it is easier to learn a model with fewer features. For the diverse team, however, the importance of the full representation increases as the action space grows. On 17x17, the difference between the representations is not statistically significant (p = 0.9156), while on 21x21 the full representation is actually significantly better than the reduced one (p = 0.001794).

² All my main conclusions still hold, except that this time the AUC for 21x21 is significantly better than for 9x9 around stage 15 for both teams.

Figure 8.11: Performance metrics over all turns of 9x9 and 21x21 Go games (panels: Accuracy, Failure Precision, Success Precision, Failure Recall, Success Recall).

Figure 8.12: Comparison of prediction quality with the full and reduced representation.

8.4.2 Ensemble Learning

In this section, I demonstrate that my approach also applies to other domains, by using my technique to predict the performance of an ensemble system. Note that I am not using an ensemble system to predict the performance of a team. I am still using a single predictor function, but now the team of agents whose final performance we want to predict is a set of classifiers voting in order to assign labels to sets of items. Hence, an agent here corresponds to one classifier, an action corresponds to a label assignment, and a world state corresponds to an item.

I use scikit-learn's digits dataset [Pedregosa et al., 2011], which is composed of 1797 8x8 images. Each image is a hand-written digit, and the objective of the ensemble is to correctly identify the digit.
Hence, for each item (i.e., image), there are 10 possible labels, corresponding to each possible digit, and only one of these labels is the correct classification. I randomly select 1000 items for training the agents. Among the chosen 1000 samples, 400 randomly selected items are used in training each individual agent. I use the remaining 797 items for the testing phase, where the agents will vote to assign labels to sets of 50 items randomly sampled from those 797. To avoid confusion, note that I am referring here to the training process of the individual agents, not of my team prediction methodology (which I will discuss later in this section).

I use 6 classification algorithms: SVM, Logistic Regression, Decision Tree, Nearest Neighbour, Bernoulli Naive Bayes, Gaussian Naive Bayes. Each classification algorithm corresponds to one agent. I study two different teams: Diverse 1, composed of copies of the SVM agent with different parametrizations; and Diverse 2, composed of one copy of each agent. Unlike in the Go domain, where multiple copies of the same agent (like Fuego) would still vote for different actions due to the randomness involved in the rollouts of the UCT Monte Carlo algorithm, a classifier is deterministic given its algorithm and parametrization. Therefore, I do not present results for Uniform in this domain, and instead I present two possible variations of a diverse team.

I define one instance of the problem solving process as a set of 50 items to be classified. Similar to the previous domain, where we have a large set of games (each with a set of turns), we have here multiple sets of 50 items to be classified (each item corresponding to one world state). Each set of items is constructed by randomly sampling 50 items from our testing set (i.e., the 797 items that were not used to train the agents).
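A minimal sketch of this ensemble setup on the digits dataset is given below (assuming scikit-learn; the parametrizations are library defaults used as placeholders rather than the ones from the experiments, and ties are broken arbitrarily here instead of randomly). The team's plurality vote over each item follows the protocol described in the remainder of this section.

```python
import numpy as np
from collections import Counter
from sklearn.datasets import load_digits
from sklearn.svm import SVC
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import BernoulliNB, GaussianNB

rng = np.random.default_rng(0)
X, y = load_digits(return_X_y=True)                    # 1797 8x8 images
train_idx = rng.choice(len(X), size=1000, replace=False)
test_idx = np.setdiff1d(np.arange(len(X)), train_idx)

# "Diverse 2": one copy of each algorithm; each agent is trained on 400 items
# sampled from the 1000 training items.
agents = [SVC(), LogisticRegression(max_iter=1000), DecisionTreeClassifier(),
          KNeighborsClassifier(), BernoulliNB(), GaussianNB()]
for agent in agents:
    sub = rng.choice(train_idx, size=400, replace=False)
    agent.fit(X[sub], y[sub])

# One problem instance: 50 test items, each labeled by plurality voting.
items = rng.choice(test_idx, size=50, replace=False)
votes = np.array([agent.predict(X[items]) for agent in agents])   # agents x items
team_labels = [Counter(votes[:, i]).most_common(1)[0][0] for i in range(len(items))]
accuracy = np.mean(np.array(team_labels) == y[items])
print(accuracy)
```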
The agents vote to assign labels to each item, and the problem solving process is completed when the team has assigned a label to each one of the 50 items. If the team classifies correctly more than a threshold percentage of the items, I consider it a "success". Otherwise, the performance of the team is classified as "failure". I explain below how the threshold is calculated. I randomly generate 2500 sets in this fashion.

I will now discuss the training of my team prediction methodology. I randomly select 2000 sets for training, where the agents voted to classify each item, as discussed (note that the training set previously discussed in this section was used to train the individual agents). I use the mean performance over these 2000 sets as our threshold. Hence, if the team is able to classify a higher number of items than its average performance, I consider it a "success". Otherwise, the problem solving process is classified as "failure". The remaining sets are used to test my team prediction methodology.

I again evaluate accuracy, precision and recall. One run consists of the ensemble evaluating one set of items, and the performance metric at each stage is calculated as the average over 500 runs (similarly to the Computer Go case, where I evaluated across multiple games). As before, I use random sub-sampling evaluation. Hence, the whole training and testing procedure is repeated 50 times; each time, the training and the testing set (of my team prediction methodology) are randomly sampled from the full database of runs. Again, in all graphs I show the average results, and the error bars show the 99% confidence interval (p = 0.01), according to a t-test. When I say that a certain result is significantly better than another, I mean statistically significantly better, according to a t-test where p < 0.01, unless I explicitly give a p value.

I show the results in Figure 8.13, where each stage corresponds to an item to be classified. As we can see, we are able to obtain high quality predictions for both teams quickly. The accuracy for both teams is always significantly higher than 50% after Stage 3. Around the middle of the runs (i.e., Stage 25), the accuracy of Diverse 1 is already around 60%, with Diverse 2 only close behind. Towards the end of the runs we obtain around 70% accuracy for Diverse 1 and 60% accuracy for Diverse 2. Note that we also obtain results significantly better than 50% for most of the stages in the other performance metrics. In Figure 8.14, we can see the results using the reduced feature vector. As we can see, we can still obtain high-quality predictions in a more compact representation. This holds especially for Diverse 1, as we obtain an accuracy higher than 70% towards the end of the runs.

Figure 8.13: Performance metrics over a set of items in ensemble system classification (full feature vector); panels: Accuracy, Failure Precision, Success Precision, Failure Recall, Success Recall.

Figure 8.14: Performance metrics over a set of items in ensemble system classification (reduced feature vector); panels: Accuracy, Failure Precision, Success Precision, Failure Recall, Success Recall.

8.5 Analysis of Coefficients

I analyze the coefficients that are learned in my experiments, in order to better understand the prediction functions.
I focus here on analyzing the reduced feature vector in the Computer Go domain, as its lower number of coefficients makes the analysis more practicable (and its results are similar to the ones with the full feature vector). Hence, I study the different coefficients learned for each subset size (i.e., the size of the subset of agents that agrees on the chosen action), under different teams and board sizes. As a way to compare the coefficients under different situations, I plot the normalized coefficients, obtained by dividing each coefficient by their overall sum. This allows us to study how the relative importance of each coefficient in determining whether a team will be successful changes under different situations. Similarly to the previous section, I also plot error bars showing the 99% confidence interval (i.e., p = 0.01) by performing a t-test over the results of my random sub-sampling validation.

I start by plotting, in Figure 8.15, the coefficients organized according to the board sizes, to better compare how they change for each team. Note that the closer to 0 a coefficient is (i.e., the lower the bar in the graphs), the higher the importance of the corresponding subset size in determining that a game will be a victory for the team. Also, the numbers on the x axis indicate the respective subset size. We can make several observations from these graphs. First, note that in general the learned coefficients are quite stable, as the low error bars indicate a low variance across different samples. Second, for almost all the coefficients we have a statistically significant difference across the different teams. This indicates that the learned functions are not trivial, as the appropriate values for each coefficient strongly depend on the team members. Furthermore, we would normally expect the coefficients to increase (i.e., get closer to 0) as the respective subset size increases, capturing the notion that the higher the agreement, the higher the likelihood of success. However, that is not always the case, and we can, in fact, observe many surprising coefficients, especially on larger board sizes. On 21x21 Go, the "strict increase" hypothesis fails for both the diverse and the uniform teams.
The coefficient for subsets of size 1 for the uniform team is the most surprising result, as it seems to indicate some tendency of winning the games when the agents fully disagree. For intermediate, we can notice such a non-strict increase even on 9x9 Go (as the coefficient for subsets of size 4 is significantly higher than the one for subsets of size 5), which may be caused by this being the weakest team (and, hence, more agents agreeing on the same action may not really indicate a higher likelihood of the action being optimal).

Figure 8.15: Normalized coefficients of all teams and board sizes, organized by board sizes (panels: 9x9, 13x13, 17x17, 21x21; x axis: subset sizes 1 to 5).

In order to better analyze how the coefficients change as the board size increases, I plot in Figure 8.16 the coefficients organized by teams. It is interesting to note that the coefficients evolve in different ways for the different teams. For diverse, we notice an increase in the importance of subsets of size 3 in 21x21 Go, curiously significantly surpassing subsets of size 4. One possible explanation could be that at such a board size the team is more likely to be correct when the three strongest agents are in (exclusive) agreement. For the uniform team the results are even more surprising. We can notice a consistent increase in the importance of subsets of size 1. This seems to indicate that the team is more likely to be winning when it is in full disagreement as the board size grows, which does not correspond well with the ST agent model. The ST agent model, however, was originally created to model diverse teams (Chapter 4), hence a deviation for uniform teams could be expected. Perhaps this result happens because the games where the agents fully disagree tend to be more complex, and the uniform team may be at a higher advantage in dealing with such complex positions than its opponent.

It is worthy of notice that even though the prediction functions for these teams are so different, and evolve in such different ways as the number of actions increases, we could still learn them adequately for all these different situations, always obtaining high-quality prediction results. This shows that my learning methodology is robust, as it is able to learn good prediction functions for a variety of situations.

8.6 Discussion

I show, both theoretically and experimentally, that we can make high-quality predictions about the performance of a team of voting agents, using only information about the frequency of agreements among them. I present two kinds of feature vectors: one that includes information about which specific agents were involved in an agreement, and one that only uses information about how many agents agreed. Although the number of features in the former increases exponentially with the number of agents, causing scalability concerns, the latter representation scales much better, as it increases linearly. Theoretically, the full feature vector should have better results in general, but as I discuss in Corollary 8.3.5, the reduced representation approximates the full one well under some conditions. In fact, however, in my experiments we find that, although the results are similar, the reduced representation is statistically significantly better in most cases.
This may happen because it is easier to learn a model with fewer features. Hence, for large teams we can safely use the reduced feature vector, avoiding scalability problems. Moreover, in real applications we usually do not have extremely large teams of voting agents. Unless we have an "idealized" diverse team, the performance is expected to converge after a certain number of agents (Chapter 6). In Chapters 3 and 4, significant improvements are already obtained with only 6 agents, while Chapter 6 shows little improvement as teams grow larger than 15 agents. Therefore, even in cases where Corollary 8.3.5 does not apply, and the reduced vector is not a good approximation, the scalability of the full feature vector might not be a real concern.

Figure 8.16: Normalized coefficients of all teams and board sizes, organized by teams (panels: (a) Diverse Team, (b) Uniform Team, (c) Intermediate Team).

Based on classical voting theory, and on my Proposition 8.3.1, Observation 8.3.2 and Proposition 8.3.3, we would expect the predictions to work better for the uniform team, if it is composed of copies of the best agent. However, I present a more general model in Theorem 8.3.4 that does not depend on plurality being a maximum likelihood estimator (MLE). In fact, I show in my experiments that the prediction works significantly better for the diverse team, and I explain this phenomenon in Corollary 8.3.6. Moreover, the prediction for intermediate works as well as for the other teams, and even significantly better than for the other teams towards the last stages of the games, even though it is a significantly weaker team. We would not expect this result based on classical voting theory and Proposition 8.3.1, Observation 8.3.2 and Proposition 8.3.3. We can, however, expect such a result based on my more general model in Theorem 8.3.4, as I show that the prediction does not depend at all on plurality being an MLE. Hence, it can still work well in cases where the team is weaker, and the MLE assumption does not hold well.

We also observed that the quality of my prediction increases significantly as the board size increases, for both the diverse and the uniform team, but the impact for diverse occurs earlier in the game and over a larger number of stages. I explain this phenomenon in Theorem 8.3.7. As complex problems are normally characterized by having a large action space, this shows that we can make better predictions for harder problems, when it is actually more useful to be able to make predictions about the team performance.

As I study the Receiver Operating Characteristic (ROC) curves, and the respective Areas Under the Curve (AUC), I can also show that my experimental conclusions hold irrespective of how we interpret the output of the prediction function. Hence, we find that across many different thresholds we can still make better predictions for the diverse team, and the prediction quality still improves on larger boards. We also notice that the difference in prediction quality between the diverse and uniform teams increases on larger boards.

I analyze the coefficients of the reduced feature vector, and note that they are highly non-trivial.
The way the coefficient for subsets of size 1 evolves for the uniform team is the most surprising result, and seems to suggest that the performance improves for that team for a different reason than the one presented in Theorem 8.3.7. As the theorem assumes ST agents, however, we can expect it to cover the situation of the diverse team better. A better understanding of why the coefficients turned out the way they did is an interesting direction for future work. Nevertheless, it is a positive result that my learning technique could work well and learn adequate prediction functions across all these different situations.

I also applied my technique to the Ensemble Learning domain, where I was able to predict the final performance of a team of classifiers on a dataset of hand-written digit recognition. This shows that my model is really domain independent, as it worked well in two completely different domains.

However, although I showed a great performance in prediction, I did not present in this chapter what an operator should actually do as the prediction of failure grows. Possible remedial procedures (and the relative quality of each one of them) vary according to each domain, but here I discuss some possible situations. For example, consider a complex problem being solved in a cluster of computers. We do not want to over-allocate resources, as that would be a waste of computational power that could be allocated to other tasks (or a waste of electrical energy, at the very least). However, we could increase the allocation of resources for solving the current problem when it becomes necessary, according to our prediction.

While playing a game, it is well known that the best strategy changes according to the specific opponent we are facing [Ganzfried and Sandholm, 2011, Southey et al., 2005, Lockett et al., 2007, Schadd et al., 2007, Bakkes et al., 2009]. However, it is in general hard to know which player is our current antagonist. Therefore, we could start playing the game with the team that works best against general opponents, and dynamically change the team as our prediction of failure grows, trying to adapt to the current situation. Moreover, it is known that the best voting rule depends on the noise model of the agents [Conitzer and Sandholm, 2005]. However, in general, such a model is not known for existing agents (Chapter 3). Therefore, we could start by playing a game with a very general rule, such as plurality voting, and dynamically try different voting rules according to our current prediction. Note that although Proposition 8.3.1, Observation 8.3.2 and Proposition 8.3.3 refer to plurality voting, my general model presented in Theorem 8.3.4 does not depend on the voting rule.

Finally, in this chapter I always trained and tested my classifier under similar conditions, as in traditional machine learning approaches. Training and testing under different conditions leads to a transfer learning problem, which is a current topic of research in the machine learning literature [Banerjee and Stone, 2007, Konidaris et al., 2012, Taylor and Stone, 2009]. Hence, exploring transfer learning in the context of my technique is an interesting direction for future work.

8.7 Conclusion

Voting is a widely applied, domain independent technique that has been used in a variety of domains, such as machine learning, crowdsourcing, board games, and forecasting systems.
In this chapter, I present a novel method to predict the performance of a team of agents that vote together at every step of a complex problem. My method does not use any domain knowledge and is based only on the frequencies of agreement among the agents. I explain theoretically why my prediction works. First, I present an explanation based on classical voting theories, but such an explanation fails to fully explain my experimental results. Hence, I also present a more general model that is independent of the voting rule, and also independent of any optimality assumptions about the voting rule (i.e., the voting rule does not need to be a maximum likelihood estimator across all world states). Such a model allows a deeper understanding of the prediction methodology, and a more complete comprehension of my experimental results.

I perform experiments in the Computer Go domain with three different teams, each having a different level of diversity and strength (i.e., performance), and four different board sizes. I show that the prediction works online at each step of the problem solving process, and often matches an in-depth analysis (which takes orders of magnitude longer). Hence, I could achieve a high prediction quality for all teams, despite their differences. In particular, I show that, contrary to what would be expected based on classical voting models, my prediction is not better for stronger teams, and can actually work equally well (even significantly better in some cases) for a very weak team. Moreover, I showed that I could achieve a higher prediction quality for a diverse team than for a uniform team. Furthermore, I verified experimentally that the prediction quality increases as the action space (board size) grows (and the impact occurs earlier in the game for the diverse team). All of this could be expected based on my more general model.

I also study the Receiver Operating Characteristic curves of my results, and their respective Areas Under the Curve (AUC). Hence, I verified that my conclusions hold across many different thresholds (i.e., different ways to interpret the output of my prediction function). Moreover, I tested my approach when predicting the final performance of a team of classifiers, a domain that is very different from Computer Go. I was still able to obtain high-quality predictions, showing that my approach really works across different domains.

Finally, I studied in detail the prediction functions learned. My analysis showed that these functions are highly non-trivial, and vary significantly according to each team and board size. In fact, better understanding how and why these coefficients evolve the way they do as the board size grows for different teams is an open direction for further study. Overall, however, my prediction technique is very robust, as it could learn adequate functions across all these situations.

Therefore, a system operator can use my technique to take remedial procedures if the team is not performing well, or it can be incorporated within an automatic procedure to dynamically change the team and/or the voting rule. I discussed in detail how that could be executed in different domains. Hence, this chapter is a significant step towards not only a deeper understanding of voting systems, but also a greater applicability of voting in a variety of domains, by combining the potential of voting for finding correct answers with the ability to assess the performance of teams and the predictability of the final outcome.
Part V
Discussions and Conclusions

Chapter 9
Conclusions

This thesis presented theoretical and experimental results in decision-centered teamwork, a novel paradigm in artificial intelligence. Decision-centered teamwork is the analysis of agent teams that iteratively make joint decisions while solving complex problems. I divide it into three fundamental challenges: Agent Selection, Aggregation of Opinions and Team Assessment.

In Agent Selection, my main focus was on the importance of diversity when forming teams. I presented three models: the first shows that a diverse team can outperform a uniform team, and gives the necessary conditions for that to happen. The second shows that diverse teams get stronger in large action spaces, allowing one to better identify when diverse teams should be used. It introduces the novel model of spreading tail agents, which assign a non-zero probability to a larger set of actions as the action space increases. The third model studies agent teams for design problems, where the focus is on providing a large set of optimal actions, for a human to select according to aesthetics or other factors. I show that diverse teams improve as the number of agents grows, while uniform teams actually decrease in performance.

In Aggregation of Opinions, I studied ranking extraction techniques, and showed in the Computer Go domain that the Borda voting rule outperforms plurality when using the ranking by sampling technique, which ranks actions according to how frequently they are played when sampling an agent multiple times. However, in the context of influencing social networks, weighted plurality was still the best voting rule. I also studied the simultaneous influencing and mapping problem, where we must iteratively discover a network and spread influence at the same time. That is the only part of the thesis that does not concern voting; it relies instead on a linear combination of two greedy algorithms. I showed that under some conditions, we can spread influence as well as the classical greedy influence maximization algorithm, while mapping the network much better.

Concerning Team Assessment, I presented a novel technique to predict whether a team of voting agents will be successful or will fail while solving a complex problem. My prediction can be performed online, and it is completely domain independent. I demonstrated its effectiveness in the Computer Go domain and in the Ensemble Learning domain, and I also showed that the quality of the prediction increases in large action spaces.

Although this thesis presents several contributions, much more must be developed in order to effectively use multi-agent teams that jointly make decisions for solving complex problems. Some topics I will investigate in my future work include:

Dynamic Teams: For effective teamwork, we must dynamically change teams. Hence, we need to further develop assessment mechanisms, and also methods to decide how to switch members and coordination methods according to the assessment results and the problem state.

Agent Adaptation for Teamwork: Although some agents may already exist for a certain problem, most often they were not originally designed to coordinate with others, or were not designed to coordinate using our desired mechanism. Therefore, enabling an easy re-use of agents for teamwork will open the doors to a large number of agents available to choose from.

Team Generation: For many problems we may not even have a few agents available to choose from; in fact, there could be only one agent.
My research shows, however, that diversity is an important factor in teamwork. Therefore, given a "basic" agent, automatic methods to create diverse variations of it are very desirable, as a way to easily form diverse teams. As I showed in this thesis, generating random parametrizations of an agent is not yet a good solution, as the resulting team may still have limited diversity, besides the risk of the individual members being too weak in comparison with the "basic" agent.

Mixed Teams: In order to effectively use human input, AI systems must present complex and large-scale information in a manageable way, while simultaneously gaining user trust. Additionally, we must not assume that an "optimal" AI system is always "correct", since any model has limitations. Hence, the system must also respect and trust the user's opinion, delegating when necessary and even changing its own behavior accordingly. Besides, I believe that an interaction between a human and the computer is essential when handling projects related to arts and creativity.

Simultaneous Influencing and Mapping: In Chapter 7, I studied the problem of learning a social network graph while spreading influence. However, in order to effectively solve this problem, a deeper study is necessary. In particular, it is important to better understand how we can estimate beforehand the amount of information that we will obtain when interviewing one node (according, for example, to its profile). We must also better understand how the information provided by a node correlates with its list of neighbors in the social network graph. That is, we must understand how likely someone is to teach us about each one of his/her friends, and how likely he/she would be to teach us about nodes that are further away in the social network graph; this would allow us to better model the knowledge gain at each intervention, and hence improve our performance.

References

Noa Agmon and Peter Stone. Leading ad hoc agents in joint action settings with multiple teammates. In Proceedings of the 11th International Conference on Autonomous Agents and Multiagent Systems, AAMAS '12, pages 341–348, Richland, SC, 2012.

Angela A. Aidala and Esther Sumartojo. Why housing? AIDS and Behavior, 11(2):1–6, 2007.

Sandip Aine, Siddharth Swaminathan, Venkatraman Narayanan, Victor Hwang, and Maxim Likhachev. Multi-heuristic A*. In Proceedings of Robotics: Science and Systems X, RSS, 2014.

Abdullah AL-Malaise, Areej Malibari, and Mona Alkhozae. Students' performance prediction system using multi agent data mining technique. International Journal of Data Mining & Knowledge Management Process, 4(5), September 2014.

Noga Alon, Baruch Awerbuch, Yossi Azar, Niv Buchbinder, and Joseph (Seffi) Naor. The online set cover problem. SIAM Journal on Computing, 39(2), 2009.

Georgios Amanatidis, Nathanaël Barrot, Jérôme Lang, Evangelos Markakis, and Bernard Ries. Multiple referenda and multiwinner elections using hamming distances: Complexity and manipulability. In Proceedings of the 14th International Conference on Autonomous Agents and Multiagent Systems, AAMAS, 2015.

Aris Anagnostopoulos, Diodato Ferraioli, and Stefano Leonardi. Competitive influence in social networks: Convergence, submodularity, and competition effects. In Proceedings of the 14th International Conference on Autonomous Agents and Multiagent Systems, AAMAS, 2015.

Pierpaolo Andriani and Bill McKelvey. Beyond gaussian averages: Redirecting international business and management research toward extreme events and power laws.
Journal of International Business Studies, 38(7), 2007.

B. Aranda and C. Lasch. Flocking, in Tooling. Princeton Architectural Press, 2006.

Haris Aziz, Serge Gaspers, Joachim Gudmundsson, Simon Mackenzie, Nicholas Mattei, and Toby Walsh. Computational aspects of multi-winner approval voting. In Proceedings of the 14th International Conference on Autonomous Agents and Multiagent Systems, AAMAS, 2015.

Yoram Bachrach, Thore Graepel, Gjergji Kasneci, Michal Kosinski, and Jurgen Van Gael. Crowd IQ: Aggregating opinions to boost performance. In Proceedings of the 11th International Conference on Autonomous Agents and Multiagent Systems, AAMAS, pages 535–542, 2012.

Ashwinkumar Badanidiyuru, Baharan Mirzasoleiman, Amin Karbasi, and Andreas Krause. Streaming submodular maximization: Massive data summarization on the fly. In Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD, 2014.

Eyal Baharad, Jacob Goldberger, Moshe Koppel, and Shmuel Nitzan. Distilling the wisdom of crowds: weighted aggregation of decisions on multiple issues. Journal of Autonomous Agents and Multi-Agent Systems, 22:31–42, 2011.

E. Baharlou and A. Menges. Generative agent-based design computation, in computation and performance. In Proceedings of the 31st Education and research in Computer Aided Architectural Design in Europe Conference, eCAADe, 2013.

Sander Bakkes, Pieter Spronck, and Jaap van den Herik. Opponent modelling for case-based adaptive game AI. Entertainment Computing, 1(1):27–37, 2009.

Bikramjit Banerjee and Peter Stone. General game learning using knowledge transfer. In Proceedings of the 20th International Joint Conference on Artificial Intelligence, IJCAI, 2007.

Samuel Barrett and Peter Stone. Cooperating with unknown teammates in complex domains: A robot soccer case study of ad hoc teamwork. In Proceedings of the Twenty-Ninth Conference on Artificial Intelligence, AAAI, 2015.

Adam Barth, Benjamin I. P. Rubinstein, Mukund Sundararajan, John C. Mitchell, Dawn Song, and Peter L. Bartlett. A learning-based approach to reactive security. In Radu Sion, editor, Financial Cryptography and Data Security – Revised Selected Papers of the Proceedings of the Financial Cryptography and Data Security Conference, volume 6052 of Lecture Notes in Computer Science. Springer Berlin Heidelberg, 2010.

J. J. Bartholdi III, C. A. Tovey, and M. A. Trick. The computational difficulty of manipulating an election. Social Choice and Welfare, 6(3):227–241, 1989a.

J. J. Bartholdi III, C. A. Tovey, and M. A. Trick. Voting schemes for which it can be difficult to tell who won the election. Social Choice and Welfare, 6(3):157–165, 1989b.

Mohammadhossein Bateni, Mohammadtaghi Hajiaghayi, and Morteza Zadimoghaddam. Submodular secretary problem and extensions. ACM Transactions on Algorithms, 9(4), 2013.

Petr Baudiš and Jean-loup Gailly. Pachi: State of the Art Open Source Go Program. In Advances in Computer Games 13, November 2011.

Alexander Berman and Valencia James. Kinetic imaginations: Exploring the possibilities of combining AI and dance. In Proceedings of the Twenty-Fourth International Joint Conference on Artificial Intelligence, IJCAI, 2015.

D. S. Bernstein, C. Amato, E. Hansen, and S. Zilberstein. Policy iteration for decentralized control of Markov decision processes. Journal of Artificial Intelligence Research, 34:89–132, 2009.

Daniel S. Bernstein, Robert Givan, Neil Immerman, and Shlomo Zilberstein. The complexity of decentralized control of Markov decision processes.
Mathematics of Operations Research, 27:819–840, 2002.

Dimitri P. Bertsekas. Auction algorithms for network flow problems: A tutorial introduction. Computational Optimization and Applications, 1(1):7–66, 1992.

Alina Bialkowski, Patrick Lucey, Peter Carr, Yisong Yue, Sridha Sridharan, and Iain Matthews. Large-scale analysis of soccer matches using spatiotemporal tracking data. In Proceedings of the 2014 IEEE International Conference on Data Mining, ICDM, 2014.

U. Bogenstätter. Prediction and optimization of life-cycle costs in early design. Building Research & Information, 28:376–386, 2000.

C. Boutilier, I. Caragiannis, S. Haber, T. Lu, A. D. Procaccia, and O. Sheffet. Optimal social choice functions: A utilitarian view. In Proceedings of the 13th ACM Conference on Electronic Commerce, pages 197–214, 2012.

Craig Boutilier. A POMDP formulation of preference elicitation problems. In Proceedings of the 18th National Conference on Artificial Intelligence, AAAI, 2002.

Florian Brandl, Felix Brandt, and Johannes Hofbauer. Incentives for participation and abstention in probabilistic social choice. In Proceedings of the 14th International Conference on Autonomous Agents and Multiagent Systems, AAMAS, 2015.

Yann Braouezec. Committee, expert advice, and the weighted majority algorithm: An application to the pricing decision of a monopolist. Computational Economics, 35(3):245–267, March 2010.

Nils Bulling, Mehdi Dastani, and Max Knobbout. Monitoring norm violations in multi-agent systems. In Proceedings of the 12th International Conference on Autonomous Agents and Multiagent Systems, AAMAS, 2013.

Robin Burke. Hybrid recommender systems: Survey and experiments. User Modeling and User-Adapted Interaction, 12(4):331–370, November 2002.

Adam Burnett, Evon Khor, Philippe Pasquier, and Arne Eigenfeldt. Validation of harmonic progression generator using classical music. In Proceedings of the Third International Conference on Computational Creativity, ICCC, 2012.

Ioannis Caragiannis, Ariel D. Procaccia, and Nisarg Shah. When do noisy votes reveal the truth? In Proceedings of the 14th ACM Conference on Electronic Commerce, 2013.

Pablo Miranda Carranza and Paul Coates. Swarm modelling: the use of swarm intelligence to generate architectural form. In Proceedings of the 3rd International Generative Art Conference, Generative Art, 2000.

U. Cekmez, M. Ozsiginan, and O. K. Sahingoz. Adapting the GA approach to solve traveling salesman problems on CUDA architecture. In Proceedings of the IEEE 14th International Symposium on Computational Intelligence and Informatics, CINTI, 2013.

U. Cekmez, M. Ozsiginan, M. Aydin, and O. K. Sahingoz. UAV path planning with parallel genetic algorithms on CUDA architecture. In International Conference of Parallel and Distributed Computing, 2014.

Nicolo Cesa-Bianchi and Gabor Lugosi. Prediction, Learning, and Games. Cambridge University Press, 2006.

Shurojit Chatterji, Arunava Sen, and Huaxia Zeng. Random dictatorship domains. Games and Economic Behavior, 86:212–236, 2014.

H. Chen and X. Yao. Regularized negative correlation learning for neural network ensembles. IEEE Transactions on Neural Networks, 20(12), 2009.

Bark Cheung Chiu and Geoffrey I. Webb. Using decision trees for agent modeling: Improving prediction performance. User Modeling and User-Adapted Interaction, 8:131–152, 1998.

Edith Cohen, Daniel Delling, Thomas Pajor, and Renato F. Werneck. Sketch-based influence maximization and computation: Scaling up with guarantees. Technical report, Microsoft Research, 2014.

P. R.
Cohen and H. J. Levesque. Confirmation and joint action. In Proceedings of the International Joint Conference on Artificial Intelligence, IJCAI, 1991.

Marquis de Condorcet. Essai sur l'application de l'analyse à la probabilité des décisions rendues à la pluralité des voix. L'Imprimerie Royale, 1785.

V. Conitzer and T. Sandholm. Universal voting protocol tweaks to make manipulation hard. In Proceedings of the 18th International Joint Conference on Artificial Intelligence, IJCAI, 2003.

Vincent Conitzer and Tuomas Sandholm. Common voting rules as maximum likelihood estimators. In Proceedings of the Twenty-First Conference on Uncertainty in Artificial Intelligence, UAI, 2005.

J. Davies, G. Katsirelos, N. Narodytska, and T. Walsh. Complexity of and algorithms for Borda manipulation. In Proceedings of the 25th AAAI Conference on Artificial Intelligence, AAAI, 2011.

Swapnil Dhamal, Prabuchandran K. J., and Yadati Narahari. A multi-phase approach for improving information diffusion in social networks. In Proceedings of the 14th International Conference on Autonomous Agents and Multiagent Systems, AAMAS, 2015.

Mark d'Inverno and Jon McCormack. Heroic versus Collaborative AI for the arts. In Proceedings of the Twenty-Fourth International Joint Conference on Artificial Intelligence, IJCAI, 2015.

Thu Trang Doan, Yuan Yao, Natasha Alechina, and Brian Logan. Verifying heterogeneous multi-agent programs. In Proceedings of the 13th International Conference on Autonomous Agents and Multiagent Systems, AAMAS, 2014.

T. M. Echenagucia, A. Capozzoli, Y. Cascone, and M. Sassone. The early design stage of a building envelope: Multi-objective search through heating, cooling and lighting energy performance analysis. Applied Energy, 154:577–591, 2015.

Edith Elkind and Nisarg Shah. Electing the most probable without eliminating the irrational: Voting over intransitive domains. In Proceedings of the 30th Conference on Uncertainty in Artificial Intelligence, UAI, 2014.

M. Enzenberger, M. Müller, B. Arneson, and R. Segal. Fuego - An open-source framework for board games and Go engine based on Monte Carlo Tree Search. IEEE Transactions on Computational Intelligence and AI in Games, 2(4):259–270, December 2010.

Halil Erhan, Ivy Wang, and Naghmi Shireen. Interacting with thousands: A parametric-space exploration method in generative design. In Proceedings of the 2014 Conference of the Association for Computer Aided Design in Architecture, ACADIA, 2014.

Robert S. Erikson and Christopher Wlezien. Are political markets really superior to polls as election predictors? The Public Opinion Quarterly, 72(2):190–215, 2008.

Joseph Farfel and Vincent Conitzer. Aggregating value ranges: Preference elicitation and truthfulness. Journal of Autonomous Agents and Multi-Agent Systems, 22:127–150, 2011.

J. Felkner, E. Chatzi, and T. Kotnik. Interactive particle swarm optimization for the architectural design of truss structures. In Proceedings of the 2013 IEEE Symposium in Computational Intelligence for Engineering Solutions, CIES, pages 15–22, 2013.

Y. Freund and R. E. Schapire. Adaptive game playing using multiplicative weights. Games and Economic Behavior, 29, 1999.

FSF. Gnugo. http://www.gnu.org/software/gnugo/, 2009.

Bin Fu, Zhihai Wang, Rong Pan, Guandong Xu, and Peter Dolog. An integrated pruning criterion for ensemble learning based on classification accuracy and diversity. In Lorna Uden, Francisco Herrera, Javier Bajo Perez, and Juan M.
Corchado Rodriguez, editors, Proceedings of the 7th International Conference on Knowledge Management in Organizations: Service and Cloud Computing, KMO'12, volume 172 of Advances in Intelligent Systems and Computing, pages 47–58. Springer, 2012.

Sam Ganzfried and Tuomas Sandholm. Game theory-based opponent modeling in large imperfect-information games. In Proceedings of the 10th International Conference on Autonomous Agents and Multiagent Systems, AAMAS, 2011.

Matthew E. Gaston and Marie desJardins. Agent-organized networks for dynamic team formation. In Proceedings of the Fourth International Joint Conference on Autonomous Agents and Multiagent Systems, AAMAS, pages 230–237, New York, NY, USA, 2005. ACM.

Sylvain Gelly, Yizao Wang, Rémi Munos, and Olivier Teytaud. Modification of UCT with patterns in Monte-Carlo Go. Technical report, Institut National de Recherche en Informatique et en Automatique, 2006.

Katie Genter and Peter Stone. Influencing a flock via ad hoc teamwork. In Proceedings of the Ninth International Conference on Swarm Intelligence, ANTS, 2014.

D. J. Gerber and S.-H. E. Lin. Designing in complexity: Simulation, integration, and multidisciplinary design optimization for architecture. Simulation, April 2013.

David J. Gerber. Parametric practices: Models for design exploration in architecture. PhD thesis, Harvard University, 2007.

J. S. Gero. Computational models of innovative and creative design processes. Technological Forecasting and Social Change, 64:183–196, 2000.

J. S. Gero and R. Sosa. Complexity measures as a basis for mass customization of novel designs. Environment and Planning B: Planning and Design, 35(1):3–15, 2008.

Anastasia Globa, Michael Donn, and Jules Moloney. Abstraction versus case-based: A comparative study of two approaches to support parametric design. In Proceedings of the 2014 Conference of the Association for Computer Aided Design in Architecture, ACADIA, 2014.

Daniel Golovin and Andreas Krause. Adaptive submodularity: A new approach to active learning and stochastic optimization. In Proceedings of the 23rd International Conference on Learning Theory, 2010.

Carla P. Gomes and Bart Selman. Algorithm portfolios. Artificial Intelligence, 126:43–62, 2001.

F. Grandoni, A. Gupta, S. Leonardi, P. Miettinen, P. Sankowski, and M. Singh. Set covering with our eyes closed. In Proceedings of the 49th Annual IEEE Symposium on Foundations of Computer Science, FOCS, 2008.

B. Grosz and S. Kraus. Collaborative plans for complex group actions. Artificial Intelligence, 86:269–358, 1996.

Christian Guttmann. Making allocations collectively: Iterative group decision making under uncertainty. In Ralph Bergmann, Gabriela Lindemann, Stefan Kirn, and Michal Pechoucek, editors, Proceedings of the 6th German Conference on Multiagent System Technologies, volume 5244 of Lecture Notes in Computer Science, pages 73–85, Kaiserslautern, Germany, 2008. Springer.

Andras Gyorgy and Levente Kocsis. Efficient multi-start strategies for local search algorithms. Journal of Artificial Intelligence Research, 41:407–444, 2011.

David J. Hand and Robert J. Till. A simple generalization of the area under the ROC curve for multiple class classification problems. Machine Learning, 45:171–186, 2001.

James A. Hanley and Barbara J. McNeil. The meaning and use of the area under a receiver operating characteristic (ROC) curve. Radiology, 143(1):29–36, 1982.

Sean Hanna. Where creativity comes from? The social spaces of embodied minds.
In Proceedings of the International Conference of Computational and Cognitive Models of Creative Design VI, 2005.

Graeme A. Haynes. Testing the boundaries of the choice overload phenomenon: The effect of number of options and time pressure on decision difficulty and satisfaction. Psychology & Marketing, 26(3):204–212, 2009.

Linli He and Thomas R. Ioerger. A quantitative model of capabilities in multi-agent systems. In Hamid R. Arabnia, Rose Joshua, and Youngsong Mun, editors, Proceedings of the International Conference on Artificial Intelligence, IC-AI, pages 730–736, Las Vegas, USA, 2003. ISBN 1-932415-13-0.

Erik L. Heiny and David Blevins. Predicting the Atlanta Falcons play-calling using discriminant analysis. Journal of Quantitative Analysis in Sports, 7(3), July 2011.

Ernst Hellinger. Neue Begründung der Theorie quadratischer Formen von unendlichvielen Veränderlichen. Journal für die reine und angewandte Mathematik, 136:210–271, 1909.

D. Holzer, R. Hough, and M. Burry. Parametric design and structural optimisation for early design exploration. International Journal of Architectural Computing, 5:625–643, 2007.

Lu Hong and Scott E. Page. Groups of diverse problem solvers can outperform groups of high-ability problem solvers. Proceedings of the National Academy of Sciences of the USA, 101(46):16385–16389, 2004.

Josie Hunter, Franco Raimondi, Neha Rungta, and Richard Stocker. A synergistic and extensible framework for multi-agent system verification. In Proceedings of the 12th International Conference on Autonomous Agents and Multiagent Systems, AAMAS, 2013.

T. Ireland. Emergent space diagrams: The application of swarm intelligence to the problem of automatic plan generation. In Proceedings of the 13th International CAAD Futures Conference, 2009.

I. S. Isa, S. Omar, Z. Saad, N. M. Noor, and M. K. Osman. Weather forecasting using photovoltaic system and neural network. In 2010 Second International Conference on Computational Intelligence, Communication Systems and Networks, CICSyN, pages 96–100, July 2010.

S. Iyengar and M. Lepper. When choice is demotivating: Can one desire too much of a good thing? Journal of Personality and Social Psychology, 79:995–1006, 2000.

A. X. Jiang, L. S. Marcolino, A. D. Procaccia, T. Sandholm, N. Shah, and M. Tambe. Diverse randomized agents vote to win. In Proceedings of the Neural Information Processing Systems Conference, NIPS, 2014.

Meir Kalech and Gal A. Kaminka. On the design of coordination diagnosis algorithms for teams of situated agents. Artificial Intelligence, 171:491–513, 2007.

Meir Kalech, Sarit Kraus, Gal A. Kaminka, and Claudia V. Goldman. Practical voting rules with partial information. Journal of Autonomous Agents and Multi-Agent Systems, 22:151–182, 2011.

Gal A. Kaminka. Coordination of Large-Scale Multiagent Systems, chapter Handling Coordination Failures in Large-Scale Multi-Agent Systems. Springer, 2006.

Gal A. Kaminka and Milind Tambe. What is wrong with us? Improving robustness through social diagnosis. In Proceedings of the National Conference on Artificial Intelligence, AAAI, 1998.

Jeffrey A. Kelly, Debra A. Murphy, Kathleen J. Sikkema, Timothy L. McAuliffe, Roger A. Roman, Laura J. Solomon, Richard A. Winett, and Seth C. Kalichman. Randomised, controlled, community-level HIV-prevention intervention for sexual-risk behaviour among homosexual men in US cities. The Lancet, 350(9090):1500, 1997.

David Kempe, Jon Kleinberg, and Eva Tardos. Maximizing the spread of influence through a social network.
In Proceedings of the Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD, 2003.

Ian Keough and David Benjamin. Multi-objective optimization in architectural design. In Proceedings of the 2010 Spring Simulation Multiconference, SpringSim, 2010.

Eliahu Khalastchi, Meir Kalech, and Lior Rokach. A hybrid approach for fault detection in autonomous physical agents. In Proceedings of the 13th International Conference on Autonomous Agents and Multiagent Systems, AAMAS, 2014.

Samir Khuller, Anna Moss, and Joseph (Seffi) Naor. The budgeted maximum coverage problem. Information Processing Letters, 70(1), 1999.

D. S. Knysh and V. M. Kureichik. Parallel genetic algorithms: A survey and problem state of the art. Journal of Computer and Systems Sciences International, 49(4):579–589, 2010.

George Konidaris, Ilya Scheidwasser, and Andrew G. Barto. Transfer in reinforcement learning via shared features. The Journal of Machine Learning Research, 13(1), 2012.

Panagiotis Kouvaros and Alessio Lomuscio. Automatic verification of parameterised interleaved multi-agent systems. In Proceedings of the 12th International Conference on Autonomous Agents and Multiagent Systems, AAMAS, 2013.

Stefan Krause, Richard James, Jolyon J. Faria, Graeme D. Ruxton, and Jens Krause. Swarm intelligence in humans: diversity can trump ability. Animal Behaviour, 81(5):941–948, May 2011.

Karim R. Lakhani, Lars Bo Jeppesen, Peter A. Lohse, and Jill A. Panetta. The value of openness in scientific problem solving. HBS Working Paper, 07-050, 2007. URL http://hbswk.hbs.edu/item/5612.html.

P. J. Lamberson and Scott E. Page. Optimal forecasting groups. Management Science, 58(4):805–810, 2012.

Zhaofeng Li and Yichuan Jiang. Cross-layers cascade in multiplex networks. In Proceedings of the 13th International Conference on Autonomous Agents and Multiagent Systems, AAMAS, 2014.

Marco LiCalzi and Oktay Surucu. The power of diversity over large solution spaces. Management Science, 58(7):1408–1421, July 2012. URL http://dx.doi.org/10.1287/mnsc.1110.1495.

Somchaya Liemhetcharat and Manuela Veloso. Modeling and learning synergy for team formation with heterogeneous agents. In Proceedings of the 11th International Conference on Autonomous Agents and Multiagent Systems, AAMAS, pages 365–374, Richland, SC, 2012.

Shih-hsin Eve Lin and David Jason Gerber. Designing-in performance: A framework for evolutionary energy performance feedback in early stage design. Automation and Construction, 38:59–73, 2012.

Shih-Hsin Eve Lin and David Jason Gerber. Evolutionary energy performance feedback for design. Energy and Buildings, 84:426–441, 2014.

Michael Q. Lindner and Noa Agmon. Effective, quantitative, obscured observation-based fault detection in multi-agent systems. In Proceedings of the 13th International Conference on Autonomous Agents and Multiagent Systems, AAMAS, 2014.

Christian List and Robert E. Goodin. Epistemic democracy: Generalizing the Condorcet Jury Theorem. Journal of Political Philosophy, 9:277–306, 2001.

Alan J. Lockett, Charles L. Chen, and Risto Miikkulainen. Evolving explicit opponent models in game playing. In Proceedings of the 9th Annual Conference on Genetic and Evolutionary Computation, GECCO, 2007.

Shenghua Luan, Konstantinos V. Katsikopoulos, and Torsten Reimer. When does diversity trump ability (and vice versa) in group decision making? A simulation study. PLoS One, 7(2):e31043, 2012.

Patrick Lucey, Alina Bialkowski, Peter Carr, Yisong Yue, and Iain Matthews.
How to get an open shot: Analyzing team movement in basketball using tracking data. In Proceedings of the 8th Annual MIT SLOAN Sports Analytics Conference, 2014.

Patrick Lucey, Alina Bialkowski, Mathew Monfort, Peter Carr, and Iain Matthews. Quality vs quantity: Improved shot prediction in soccer using strategic features from spatiotemporal data. In Proceedings of the 9th Annual MIT SLOAN Sports Analytics Conference, 2015.

Guan-Chun Luh and Chun-Yi Lin. Structural topology optimization using ant colony optimization algorithm. Applied Soft Computing, 9:1343–1353, 2009.

Guan-Chun Luh, Chun-Yi Lin, and Yu-Shu Lin. A binary particle swarm optimization for continuum structural topology optimization. Applied Soft Computing, 11:2833–2844, 2011.

Penousal Machado, Adriano Vinhas, João Correia, and Anikó Ekárt. Evolving ambiguous images. In Proceedings of the Twenty-Fourth International Joint Conference on Artificial Intelligence, IJCAI, 2015.

Mahsa Maghami and Gita Sukthankar. Identifying influential agents for advertising in multi-agent markets. In Proceedings of the 11th International Conference on Autonomous Agents and Multiagent Systems, AAMAS, 2012.

R. Maheswaran, Y. Chang, A. Henehan, and S. Danesis. Deconstructing the rebound with optical tracking data. In Proceedings of the 6th Annual MIT SLOAN Sports Analytics Conference, 2012.

Charles F. Manski. Interpreting the predictions of prediction markets. Economics Letters, 91(3):425–429, June 2006.

Andrew Mao, Ariel D. Procaccia, and Yiling Chen. Better Human Computation Through Principled Voting. In Proceedings of the Twenty-Seventh AAAI Conference on Artificial Intelligence, AAAI, 2013.

L. S. Marcolino, B. Kolev, S. Price, S. P. Veetil, D. Gerber, J. Musil, and M. Tambe. Aggregating opinions to design energy-efficient buildings. In 8th Multidisciplinary Workshop on Advances in Preference Handling, M-PREF, 2014a.

L. S. Marcolino, H. Xu, D. Gerber, B. Kolev, S. Price, E. Pantazis, and M. Tambe. Multi-agent team formation for design problems. In Coordination, Organizations, Institutions and Norms in Agent Systems XI. Springer-Verlag Lecture Notes in AI, 2016a. (To Appear).

Leandro Soriano Marcolino and Luiz Chaimowicz. No robot left behind: Coordination to overcome local minima in swarm navigation. In Proceedings of the 2008 IEEE International Conference on Robotics and Automation, ICRA, 2008.

Leandro Soriano Marcolino and Luiz Chaimowicz. Traffic control for a swarm of robots: Avoiding target congestion. In Proceedings of the 2009 IEEE International Conference on Intelligent Robots and Systems, IROS, 2009a.

Leandro Soriano Marcolino and Luiz Chaimowicz. Traffic control for a swarm of robots: Avoiding group conflicts. In Proceedings of the 2009 IEEE International Conference on Intelligent Robots and Systems, IROS, 2009b.

Leandro Soriano Marcolino, Albert Xin Jiang, and Milind Tambe. Multi-agent team formation: Diversity beats strength? In Proceedings of the Twenty-Third International Joint Conference on Artificial Intelligence, IJCAI, 2013.

Leandro Soriano Marcolino, Haifeng Xu, Albert Xin Jiang, Milind Tambe, and Emma Bowring. Give a hard problem to a diverse team: Exploring large action spaces. In Proceedings of the Twenty-Eighth AAAI Conference on Artificial Intelligence, AAAI, 2014b.

Leandro Soriano Marcolino, Aravind Lakshminarayanan, Vaishnavh Nagarajan, and Milind Tambe. Every team deserves a second chance: An extended study on predicting team performance. Journal of Autonomous Agents and Multi-Agent Systems, 2016b. (Under Review).
Leandro Soriano Marcolino, Aravind Lakshminarayanan, Amulya Yadav, and Milind Tambe. Simultaneous influencing and mapping social networks (short paper). In Proceedings of the 15th International Conference on Autonomous Agents and Multiagent Systems, AAMAS, 2016c.

P. V. Marsden. Recent developments in network measurement. In Peter J. Carrington, John Scott, and Stanley Wasserman, editors, Models and methods in social network analysis. Cambridge University Press, 2005.

Simon J. Mason and Nicholas E. Graham. Areas beneath the relative operating characteristics (ROC) and relative operating levels (ROL) curves: Statistical significance and interpretation. Quarterly Journal of the Royal Meteorological Society, 128:2145–2166, 2002.

Tim Matthews, Sarvapali D. Ramchurn, and Georgios Chalkiadakis. Competing with humans at Fantasy Football: Team formation in large partially-observable domains. In Proceedings of the Twenty-Sixth AAAI Conference on Artificial Intelligence, AAAI, pages 1394–1400, 2012.

Richard D. McKelvey and Thomas R. Palfrey. Quantal response equilibria for normal form games. In Games and Economic Behavior, volume 10, pages 6–38. Elsevier, 1995.

J. C. Miles, G. M. Sisk, and C. J. Moore. The conceptual design of commercial buildings using a genetic algorithm. Computers & Structures, 79(17):1583–1592, 2001.

Violeta Mirchevska, Mitja Luštrek, Andraž Bežek, and Matjaž Gams. Discovering strategic behaviour of multi-agent systems in adversary settings. Computing and Informatics, 2014.

Vaishnavh Nagarajan, Leandro Soriano Marcolino, and Milind Tambe. Every team deserves a second chance: Identifying when things go wrong. In Proceedings of the 14th International Conference on Autonomous Agents and Multiagent Systems, AAMAS, 2015. (Double 1st author).

Ranjit Nair and Milind Tambe. Hybrid BDI-POMDP framework for multiagent teaming. Journal of Artificial Intelligence Research, 23(1):367–420, April 2005.

S. Nolfi and D. Floreano. Evolutionary Robotics. The Biology, Intelligence, and Technology of Self-organizing Machines. MIT Press, Cambridge, MA, 2001.

H. Nurmi. Comparing Voting Systems. Springer, 1987.

Takuya Obata, Takuya Sugiyama, Kunihito Hoki, and Takeshi Ito. Consultation algorithm for Computer Shogi: Move decisions by majority. In Computer and Games'10, volume 6515 of Lecture Notes in Computer Science, pages 156–165. Springer, 2011.

Joel Oren and Brendan Lucier. Online (budgeted) social choice. In Proceedings of the 28th AAAI Conference on Artificial Intelligence, AAAI, 2014.

E. Osaba, E. Onieva, F. Diaz, R. Carballedo, P. Lopez, and A. Perallos. A migration strategy for distributed evolutionary algorithms based on stopping non-promising subpopulations: A case study on routing problems. International Journal of Artificial Intelligence, 2015. In Press.

Pandanet. Introduction to Go. http://www.pandanet.co.jp/English/introduction_of_go/, 2016.

Rama Kumar Pasumarthi, Ramasuri Narayanam, and Balaraman Ravindran. Near optimal strategies for targeted marketing in social networks. In Proceedings of the 14th International Conference on Autonomous Agents and Multiagent Systems, AAMAS, 2015.

F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12:2825–2830, 2011.

Robi Polikar. Ensemble Machine Learning: Methods and Applications, chapter Ensemble Learning.
Springer, 2012.

Ariel D. Procaccia, Sashank J. Reddi, and Nisarg Shah. A maximum likelihood approach for selecting sets of alternatives. In Proceedings of the 28th Conference on Uncertainty in Artificial Intelligence, UAI, 2012.

Jared Quenzel and Paul Shea. Predicting the winner of tied NFL games: Do the details matter? Journal of Sports Economics, 2014. Forthcoming, available online.

A. D. Radford and J. S. Gero. Tradeoff diagrams for the integrated design of the physical environment in buildings. Building and Environment, 15(1), 1980.

T. Raines, M. Tambe, and S. Marsella. Automated assistants to aid humans in understanding team behaviors. In AGENTS, 2000.

Fernando Ramos and Huberto Ayanegui. Discovering tactical behavior patterns supported by topological structures in soccer-agent domains. In Proceedings of the 7th International Joint Conference on Autonomous Agents and Multiagent Systems, AAMAS, 2008.

Paul Rhode and Koleman Strumpf. Historical presidential betting markets. Journal of Economic Perspectives, 18(2):127–142, 2004.

Eric Rice. The positive role of social networks and social networking technology in the condom-using behaviors of homeless young people. Public Health Reports, 125(4):588, 2010.

Eric Rice and Harmony Rhoades. How should network-based prevention for homeless youth be implemented? Addiction, 108(9):1625, 2013.

Eric Rice, Eve Tulbert, Julie Cederbaum, Anamika Barman Adhikari, and Norweeta G. Milburn. Mobilizing homeless youth for HIV prevention: a social network analysis of the acceptability of a face-to-face and online social networking intervention. Health Education Research, 27(2):226, 2012.

Horst W. J. Rittel and Melvin M. Webber. Dilemmas in a general theory of planning. Policy Sciences, 4:155–169, 1973.

Diederik Roijers, Peter Vamplew, Shimon Whiteson, and Richard Dazeley. A survey of multi-objective sequential decision-making. Journal of Artificial Intelligence Research, 48, 2013.

F. Schadd, S. Bakkes, and P. Spronck. Opponent modeling in real-time strategy games. In Proceedings of the 2007 Simulation and AI in Games Conference, GAMEON, pages 61–68, 2007.

C. Seshadhri, Tamara G. Kolda, and Ali Pinar. Community structure and scale-free collections of Erdős-Rényi graphs. Physical Review E, 85(5):056109, 2012.

David Silver and Joel Veness. Monte-Carlo planning in large POMDPs. In Advances in Neural Information Processing Systems, NIPS, pages 2164–2172, 2010.

Herbert A. Simon. The structure of ill-structured problems. Artificial Intelligence, 4:181–201, 1973.

Brittany N. Smith, Anbang Xu, and Brian P. Bailey. Improving interaction models for generating and managing alternative ideas during early design work. In Proceedings of the Graphics Interface Conference, 2010.

Reid G. Smith. The contract net protocol: High-level communication and control in a distributed problem solver. IEEE Transactions on Computers, C-29(12), December 1980.

Roland Snooks. Encoding behavioral matter. In Proceedings of the International Symposium on Algorithmic Design for Architecture and Urban Design, ALGODE, 2011.

Yusuke Soejima, Akihiro Kishimoto, and Osamu Watanabe. Evaluating root parallelization in Go. IEEE Transactions on Computational Intelligence and AI in Games, 2(4):278–287, 2010.

Hossein Azari Soufiani, David C. Parkes, and Lirong Xia. Random utility theory for social choice. In Peter L. Bartlett, Fernando C. N. Pereira, Christopher J. C. Burges, Léon Bottou, and Kilian Q.
Weinberger, editors, Proceedings of the Neural Information Processing Systems Conference, NIPS, pages 126–134, 2012.

Finnegan Southey, Michael Bowling, Bryce Larson, Carmelo Piccione, Neil Burch, Darse Billings, and Chris Rayner. Bayes' bluff: Opponent modeling in poker. In Proceedings of the Twenty-First Conference on Uncertainty in Artificial Intelligence, UAI, 2005.

Peter Stone, Gal A. Kaminka, Sarit Kraus, Jeffrey S. Rosenschein, and Noa Agmon. Teaching and leading an ad hoc teammate: Collaboration without pre-coordination. Artificial Intelligence Journal, 203:35–65, October 2013.

Jared Sylvester and Nitesh V. Chawla. Evolutionary ensembles: Combining learning agents using genetic algorithms. In Proceedings of the 20th National Conference on Artificial Intelligence, AAAI, 2005.

Milind Tambe. Towards flexible teamwork. Journal of Artificial Intelligence Research, 7:83–124, 1997.

Milind Tambe, Paul Scerri, and David V. Pynadath. Adjustable autonomy for the real world. Journal of Artificial Intelligence Research, 17(1):171–228, 2002.

Danesh Tarapore, Anders Lyhne Christensen, Pedro U. Lima, and Jorge Carneiro. Abnormality detection in multiagent systems inspired by the adaptive immune system. In Proceedings of the 12th International Conference on Autonomous Agents and Multiagent Systems, AAMAS, 2013.

Matthew E. Taylor and Peter Stone. Transfer learning for reinforcement learning domains: A survey. The Journal of Machine Learning Research, 10, 2009.

Kostas Terzidis. Algorithmic Architecture. Routledge, 2006.

Abigail Thompson. Does diversity trump ability? An example of the misuse of mathematics in the social sciences. Notices of the AMS, 61(9), 2014.

Alan Tsang and Kate Larson. Opinion dynamics of skeptical agents. In Proceedings of the 13th International Conference on Autonomous Agents and Multiagent Systems, AAMAS, 2014.

Hamed Valizadegan, Rong Jin, and Shijun Wang. Learning to trade off between exploration and exploitation in multiclass bandit prediction. In Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2011.

P. H. G. van Langen and F. M. T. Brazier. Design space exploration revisited. Artificial Intelligence for Engineering Design, Analysis, and Manufacturing, 20:113–119, 2006.

Sebastian Vehlken. Computational swarming: A cultural technique for generative architecture. Footprint – Delft Architecture Theory Journal, 15, 2014.

Robert Vierlinger and Klaus Bollinger. Accommodating change in parametric design. In Proceedings of the 2014 International Conference of the Association for Computer-Aided Design in Architecture, ACADIA, 2014.

David West and Scott Dellana. Diversity of ability and cognitive style for group decision processes. Information Sciences, 179(5):542–558, 2009.

Robert F. Woodbury and Andrew L. Burrow. Whither design space? Artificial Intelligence for Engineering Design, Analysis and Manufacturing, 20:63–82, 2006.

Lirong Xia. Computational voting theory: game-theoretic and combinatorial aspects. PhD thesis, Duke University, Durham, NC, USA, 2011.

Lirong Xia and Vincent Conitzer. Determining possible and necessary winners under common voting rules given partial orders. Journal of Artificial Intelligence Research (JAIR), 41, 2011a.

Lirong Xia and Vincent Conitzer. A maximum likelihood approach towards aggregating partial orders. In Proceedings of the 22nd International Joint Conference on Artificial Intelligence, IJCAI, 2011b.

A. Yadav, L. S. Marcolino, E. Rice, R. Petering, H. Winetrobe, H. Rhoades, M. Tambe, and H.
Carmichael. Preventing HIV spread in homeless populations using PSINET. In Proceedings of the Twenty-Seventh Innovative Applications of Artificial Intelligence Conference, IAAI, 2015.

Yongjie Yang. Manipulation with bounded single-peaked width: A parameterized study. In Proceedings of the 14th International Conference on Autonomous Agents and Multiagent Systems, AAMAS, 2015.

Y. K. Yi and A. M. Malkawi. Optimizing building form for energy performance based on hierarchical geometry relation. Automation in Construction, 18, 2009.

Peyton Young. Optimal voting rules. Journal of Economic Perspectives, 9(1):51–64, 1995.

Fengqiang Zhao, Guangqiang Li, Chao Yang, Ajith Abraham, and Hongbo Liu. A human-computer cooperative particle swarm optimization based immune algorithm for layout design. Neurocomputing, 132:68–78, 2014.

Appendices

Appendix A
Simultaneous Influencing and Mapping: Additional Results

A.1 Results for each network

Since in Section 7.3 I presented my results across 4 different social network graphs, in this section I present them individually for each network, for the interested reader. Assuming a uniform model for the teaching lists, the results for network A are shown in Figures A.1, A.2 and A.3; for network B in Figures A.4, A.5 and A.6; for the Facebook network in Figures A.7, A.8 and A.9; and finally for the MySpace network in Figures A.10, A.11 and A.12. Assuming a power law model, the results for network A are shown in Figures A.13, A.14 and A.15; for network B in Figures A.16, A.17 and A.18; for the Facebook network in Figures A.19, A.20 and A.21; and finally for the MySpace network in Figures A.22, A.23 and A.24. As we can see, the results for each network show similar tendencies to the results across all networks presented in the thesis.

A.2 Additional results for power law distribution

In Section 7.3.2 I presented results for the power law distribution with a = 1.8. In this section, I present results for a = 1.2 (that is, each neighbor has a 20% probability of being in a teaching list). I show in Figure A.25 the result at each intervention for φ = 0.5 and p = 0.5. In Figure A.26, I show the AUC results for different parametrizations of p and φ. Finally, I present the regret for this case in Figure A.27. As we can see, my main conclusions still hold for a different parametrization of a.

[Figure A.1 (plots): Results for network A across many interventions, for influence probability p = 0.5, teaching probability φ = 0.5, assuming uniform distribution. Panels: (a) Influence, (b) Knowledge, (c) Influence + Knowledge; curves: Influence-greedy, Knowledge-greedy, Balanced, Balanced-decreasing.]

[Figure A.2 (plots): Results of Influence and Knowledge in network A for different teaching and influence probabilities, assuming uniform distribution. Panels: (a) Changing teaching probability, (b) Changing influence probability.]
[Figure A.3 (plot): Regret in network A for different teaching and influence probabilities, assuming uniform distribution.]

[Figure A.4 (plots): Results for network B across many interventions, for influence probability p = 0.5, teaching probability φ = 0.5, assuming uniform distribution. Panels: (a) Influence, (b) Knowledge, (c) Influence + Knowledge.]

[Figure A.5 (plots): Results of Influence and Knowledge in network B for different teaching and influence probabilities, assuming uniform distribution. Panels: (a) Changing teaching probability, (b) Changing influence probability.]

[Figure A.6 (plot): Regret in network B for different teaching and influence probabilities, assuming uniform distribution.]

[Figure A.7 (plots): Results for Facebook network across many interventions, for influence probability p = 0.5, teaching probability φ = 0.5, assuming uniform distribution. Panels: (a) Influence, (b) Knowledge, (c) Influence + Knowledge.]

[Figure A.8 (plots): Results of Influence and Knowledge in Facebook network for different teaching and influence probabilities, assuming uniform distribution. Panels: (a) Changing teaching probability, (b) Changing influence probability.]

[Figure A.9 (plot): Regret in Facebook network for different teaching and influence probabilities, assuming uniform distribution.]
[Figure A.10 (plots): Results for MySpace network across many interventions, for influence probability p = 0.5, teaching probability φ = 0.5, assuming uniform distribution. Panels: (a) Influence, (b) Knowledge, (c) Influence + Knowledge.]

[Figure A.11 (plots): Results of Influence and Knowledge in MySpace network for different teaching and influence probabilities, assuming uniform distribution. Panels: (a) Changing teaching probability, (b) Changing influence probability.]

[Figure A.12 (plot): Regret in MySpace network for different teaching and influence probabilities, assuming uniform distribution.]

[Figure A.13 (plots): Results for network A across many interventions, for influence probability p = 0.5, teaching probability φ = 0.5, assuming power law distribution. Panels: (a) Influence, (b) Knowledge, (c) Influence + Knowledge.]

[Figure A.14 (plots): Results of Influence and Knowledge in network A for different teaching and influence probabilities, assuming power law distribution. Panels: (a) Changing teaching probability, (b) Changing influence probability.]

[Figure A.15 (plot): Regret in network A for different teaching and influence probabilities, assuming power law distribution.]
[Figure A.16 (plots): Results for network B across many interventions, for influence probability p = 0.5, teaching probability φ = 0.5, assuming power law distribution. Panels: (a) Influence, (b) Knowledge, (c) Influence + Knowledge.]

[Figure A.17 (plots): Results of Influence and Knowledge in network B for different teaching and influence probabilities, assuming power law distribution. Panels: (a) Changing teaching probability, (b) Changing influence probability.]

[Figure A.18 (plot): Regret in network B for different teaching and influence probabilities, assuming power law distribution.]

[Figure A.19 (plots): Results for Facebook network across many interventions, for influence probability p = 0.5, teaching probability φ = 0.5, assuming power law distribution. Panels: (a) Influence, (b) Knowledge, (c) Influence + Knowledge.]

[Figure A.20 (plots): Results of Influence and Knowledge in Facebook network for different teaching and influence probabilities, assuming power law distribution. Panels: (a) Changing teaching probability, (b) Changing influence probability.]

[Figure A.21 (plot): Regret in Facebook network for different teaching and influence probabilities, assuming power law distribution.]
[Figure A.22 (plots): Results for MySpace network across many interventions, for influence probability p = 0.5, teaching probability φ = 0.5, assuming power law distribution. Panels: (a) Influence, (b) Knowledge, (c) Influence + Knowledge.]

[Figure A.23 (plots): Results of Influence and Knowledge in MySpace network for different teaching and influence probabilities, assuming power law distribution. Panels: (a) Changing teaching probability, (b) Changing influence probability.]

[Figure A.24 (plot): Regret in MySpace network for different teaching and influence probabilities, assuming power law distribution.]

[Figure A.25 (plots): Results of 4 real world networks across many interventions, for p = 0.5 and φ = 0.5, assuming power law distribution with a = 1.2. Panels: (a) Influence, (b) Knowledge, (c) Influence + Knowledge.]

[Figure A.26 (plots): Results of Influence and Knowledge for different teaching and influence probabilities, assuming power law distribution with a = 1.2. Panels: (a) Changing teaching probability, (b) Changing influence probability.]

[Figure A.27 (plot): Regret for different teaching and influence probabilities, assuming power law distribution with a = 1.2. Lower results are better.]

Appendix B
Every Team Deserves a Second Chance: Additional Results

In this appendix I present the results of my team assessment methodology (Chapter 8) when comparing with the actual final outcome of the games, instead of using Fuego's evaluation. I show these results because some readers could be interested in seeing this alternative evaluation methodology. In Figure B.1 I show the accuracy, precision and recall for 9x9 Go games, while in Figure B.2 I show the ROC curves for such games.
In Figure B.3 we can see the ROC results for all 4 board sizes, while in Figure B.4 I plot the AUC results with one graph per team. I also show the AUC results with one graph per board size in Figure B.5. In Figure B.6 I plot the difference between the areas under the AUC graphs for diverse and uniform, as well as the percentage of stages where the prediction for diverse is significantly better. Finally, in Figure B.7 I show the accuracy, precision and recall for 21x21 Go (and also 9x9 Go, for comparison). As we can see, the results are similar to the previous ones, and my main conclusions still hold: we can still make better predictions for the diverse team than for the uniform team, and prediction quality is better for 21x21 Go than for smaller board sizes. We notice, however, that this time the difference between diverse and uniform (both in terms of area under the AUC curves and percentage of stages where the AUC for diverse is significantly better) is higher on 9x9 Go than on 21x21 Go.

I also compare the full and the reduced representation under this alternative baseline. In Figure B.8 I show the area under the AUC curves for all teams and board sizes under consideration. Again, the results of both representations are similar¹, but the reduced one is significantly better in almost all cases. A minimal sketch of how these per-stage metrics can be computed is given after the figure captions below.

¹ My main results still hold, except that the AUC for 21x21 becomes significantly better than for 9x9 earlier for the uniform team than for the diverse team under the reduced representation: for uniform, from stage 12 on; for diverse, from stage 15 on.

Figure B.1: Performance metrics over all turns of 9x9 Go games. (Alternative baseline) Panels: (a) Accuracy, (b) Failure Precision, (c) Success Precision, (d) Failure Recall, (e) Success Recall.
Figure B.2: ROC curves, analyzing different thresholds in 9x9 Go. (Alternative baseline) Panels: (a) Stage 5, (b) Stage 10, (c) Stage 15, (d) Stage 20.
Figure B.3: ROC curves for diverse and uniform, for different board sizes (9x9, 13x13, 17x17, 21x21). (Alternative baseline) Panels: (a) Stage 5, (b) Stage 10, (c) Stage 15, (d) Stage 20.
Figure B.4: AUC for different teams and board sizes, organized by teams. (Alternative baseline) Panels: (a) Diverse Team, (b) Uniform Team.
Figure B.5: AUC for different teams and board sizes, organized by board sizes. (Alternative baseline) Panels: (a) 9x9, (b) 13x13, (c) 17x17, (d) 21x21.
Figure B.6: Differences in prediction quality for the diverse and uniform teams. (Alternative baseline) Panels: (a) By area, (b) By percentage of stages.
Figure B.7: Performance metrics over all turns of 9x9 and 21x21 Go games. (Alternative baseline) Panels: (a) Accuracy, (b) Failure Precision, (c) Success Precision, (d) Failure Recall, (e) Success Recall; each comparing Uniform 21x21, Uniform 9x9, Diverse 21x21, Diverse 9x9.
Figure B.8: Comparison of prediction quality with the full and reduced representation, for all teams and board sizes. (Alternative baseline)
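The per-stage metrics reported in this appendix can be reproduced from two inputs per game: the team's predicted probability of success at each game stage and the actual final outcome. Below is a minimal sketch of such a computation, not the code used for the experiments; it assumes NumPy and scikit-learn are available, and the function and variable names (stage_metrics, preds_by_stage, area_under_auc_curve) are hypothetical.

import numpy as np
from sklearn.metrics import precision_recall_fscore_support, roc_auc_score

def stage_metrics(preds_by_stage, outcomes, threshold=0.5):
    # preds_by_stage: dict mapping a game stage to an array of predicted
    # success probabilities, one per game (assumed input format).
    # outcomes: 0/1 array with the actual final result of each game.
    outcomes = np.asarray(outcomes)
    per_stage = []
    for stage in sorted(preds_by_stage):
        scores = np.asarray(preds_by_stage[stage])
        labels = (scores >= threshold).astype(int)
        # Precision and recall for the failure (0) and success (1) classes,
        # matching the panels of Figures B.1 and B.7.
        prec, rec, _, _ = precision_recall_fscore_support(
            outcomes, labels, labels=[0, 1], zero_division=0)
        per_stage.append({
            "stage": stage,
            "accuracy": float(np.mean(labels == outcomes)),
            "failure_precision": prec[0], "success_precision": prec[1],
            "failure_recall": rec[0], "success_recall": rec[1],
            # Threshold-free prediction quality, as in the ROC/AUC figures.
            "auc": roc_auc_score(outcomes, scores),
        })
    return per_stage

def area_under_auc_curve(per_stage):
    # Summary used to compare teams and representations: the area under
    # the AUC-versus-stage curve (trapezoidal rule); higher is better.
    stages = [m["stage"] for m in per_stage]
    aucs = [m["auc"] for m in per_stage]
    return float(np.trapz(aucs, stages))

Under this sketch, the comparison of Figure B.6 reduces to the difference between the two teams' area_under_auc_curve values and to the fraction of stages where the AUC for the diverse team is significantly higher; the choice of significance test is left open here.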
Abstract
This thesis introduces a novel paradigm in artificial intelligence: decision-centered teamwork. Decision-centered teamwork is the analysis of agent teams that iteratively take joint decisions in order to solve complex problems. Although teams of agents have been used to take decisions in many important domains, such as machine learning, crowdsourcing, forecasting systems, and even board games
Linked assets
University of Southern California Dissertations and Theses
Conceptually similar
Speeding up distributed constraint optimization search algorithms
Toward human-multiagent teams
The human element: addressing human adversaries in security domains
Not a Lone Ranger: unleashing defender teamwork in security games
Real-world evaluation and deployment of wildlife crime prediction models
The interpersonal effect of emotion in decision-making and social dilemmas
Automated negotiation with humans
Towards addressing spatio-temporal aspects in security games
Interaction and topology in distributed multi-agent coordination
Hierarchical planning in security games: a game theoretic approach to strategic, tactical and operational decision making
Team decision theory and decentralized stochastic control
The power of flexibility: autonomous agents that conserve energy in commercial buildings
Leveraging prior experience for scalable transfer in robot learning
Characterizing and improving robot learning: a control-theoretic perspective
Learning social sequential decision making in online games
Robust loop closures for multi-robot SLAM in unstructured environments
Sample-efficient and robust neurosymbolic learning from demonstrations
Program-guided framework for your interpreting and acquiring complex skills with learning robots
Dynamic pricing and task assignment in real-time spatial crowdsourcing platforms
Rethinking perception-action loops via interactive perception and learned representations
Asset Metadata
Creator: Soriano Marcolino, Leandro (author)
Core Title: Three fundamental pillars of decision-centered teamwork
School: Viterbi School of Engineering
Degree: Doctor of Philosophy
Degree Program: Computer Science
Publication Date: 06/16/2016
Defense Date: 03/25/2016
Publisher: University of Southern California (original), University of Southern California. Libraries (digital)
Tag: collective intelligence, distributed problem solving, OAI-PMH Harvest, single and multiagent learning, social choice theory, team formation, teamwork, voting
Format: application/pdf (imt)
Language: English
Contributor: Electronically uploaded by the author (provenance)
Advisor: Tambe, Milind (committee chair), Knoblock, Craig (committee member), Sukhatme, Gaurav (committee member), Swartout, William (committee member), Weller, Nicholas (committee member)
Creator Email: leandromarcolino@gmail.com, sorianom@usc.edu
Permanent Link (DOI): https://doi.org/10.25549/usctheses-c40-251722
Unique identifier: UC11281120
Identifier: etd-SorianoMar-4438.pdf (filename), usctheses-c40-251722 (legacy record id)
Legacy Identifier: etd-SorianoMar-4438.pdf
Dmrecord: 251722
Document Type: Dissertation
Rights: Soriano Marcolino, Leandro
Type: texts
Source: University of Southern California (contributing entity), University of Southern California Dissertations and Theses (collection)
Access Conditions: The author retains rights to his/her dissertation, thesis or other graduate work according to U.S. copyright law. Electronic access is being provided by the USC Libraries in agreement with the a...
Repository Name: University of Southern California Digital Library
Repository Location: USC Digital Library, University of Southern California, University Park Campus MC 2810, 3434 South Grand Avenue, 2nd Floor, Los Angeles, California 90089-2810, USA