Learning Social Sequential Decision Making in Online Games

by

Yilei Zeng

A Dissertation Presented to the
FACULTY OF THE GRADUATE SCHOOL
UNIVERSITY OF SOUTHERN CALIFORNIA
In Partial Fulfillment of the
Requirements for the Degree
DOCTOR OF PHILOSOPHY
(COMPUTER SCIENCE)

May 2024

Copyright 2024 Yilei Zeng

I dedicate this thesis to my family for their support through thick and thin.

Acknowledgements

I extend my deepest appreciation to my advisor, Prof. Emilio Ferrara, whose guidance, support, encouragement, and expertise were invaluable throughout this journey. His mentorship served as a professional anchor and a profound personal inspiration, guiding me toward lifelong advancement and helping to unlock my full potential. I am also immensely thankful to my committee members, Prof. Dmitri Williams and Prof. Michael Zyda, for their insightful feedback and unwavering support throughout my PhD journey, which significantly shaped the direction and execution of this research.

I owe a great debt of gratitude to all of my friends, who have provided companionship throughout my PhD. Their presence and reassurance helped lighten the load of my doctoral pursuits. I would like to thank my family, whose endless love and encouragement have sustained me in my academic endeavors. Their belief in my capabilities has been a constant source of strength and motivation.

I want to express my sincere gratitude to all those who have contributed to completing this dissertation. First and foremost, I acknowledge the partial financial support provided by DARPA, without which this research would not have been possible. This dissertation stands as a milestone not just in my academic career but as a testament to the collective effort and support of everyone mentioned above.

Table of Contents

Dedication
Acknowledgements
List of Tables
List of Figures
Abstract

Chapter 1: Introduction
  1.1 Learning Sequential and Team Play
  1.2 Learning Heterogeneous and Multi-modal Representations
  1.3 Learning Human AI Collaborations

Chapter 2: Learning Sequential Play: Gaming Sessions and Team Performance
  2.1 Introduction
  2.2 Data & Methods
    2.2.1 League of Legends and Data Collection
    2.2.2 Gaming Sessions
    2.2.3 Prediction Methods
  2.3 Results
    2.3.1 RQ1. Long-term Performance
    2.3.2 RQ2. Short-term Performance
    2.3.3 RQ3. Effect of Experience on Performance Deterioration
    2.3.4 RQ4. Short-term Engagement Prediction
  2.4 Related Work
    2.4.1 Individual and Team Performance in Games
    2.4.2 Team-based Online Games and Engagement
    2.4.3 Performance Deterioration
  2.5 Conclusion

Chapter 3: Learning Team Play: The Influence of Social Ties on Performance
  3.1 Introduction
  3.2 Data & Statistics
    3.2.1 Match log data from Dota 2
    3.2.2 Friendship data from Steam
    3.2.3 Final dataset
  3.3 Methods
    3.3.1 RQ1: Overview of the influence of social ties on individuals
    3.3.2 RQ2: Overview of the influence of social ties on teams
    3.3.3 RQ3: Influence of social ties on individuals over sessions
    3.3.4 RQ4: Influence of social ties on teams over sessions
  3.4 Results
    3.4.1 RQ1: Influence of social ties on individual players' activity
    3.4.2 RQ2: Influence of social ties on team dynamics
    3.4.3 RQ3: Influence of social ties on individuals over sessions
    3.4.4 RQ4: Influence of social ties on teams over sessions
  3.5 Related Work
    3.5.1 Social ties in teams
    3.5.2 Social ties in online games
    3.5.3 Performance deterioration effects
  3.6 Conclusion

Chapter 4: Learning Purchase Decision Representation: Purchase Sequence Generation in Round-based Games
  4.1 Introduction
  4.2 Related Work
    4.2.1 Learning to Learn & Few-Shot Learning
    4.2.2 Gaming Machine Learning Datasets
  4.3 Task
    4.3.1 Few-shot Learning
    4.3.2 Problem Formulation
  4.4 Dataset
    4.4.1 Parsing Replays
    4.4.2 Statistics
  4.5 Methodology
    4.5.1 Meta-learning Algorithm
    4.5.2 Atomic Action and Embedding
    4.5.3 State Encoder
      4.5.3.1 Weapon Encoder
      4.5.3.2 Team Encoder
      4.5.3.3 Round Attribute Encoder
      4.5.3.4 Economy Encoder
      4.5.3.5 State Representation
    4.5.4 Multi-Task Decoder
      4.5.4.1 Gate Network
      4.5.4.2 Task-Specific Decoder
    4.5.5 Learning Objective
    4.5.6 Evaluation Metrics
  4.6 Experiments
    4.6.1 Greedy Algorithm Baseline
    4.6.2 Multi-Sequence Reasoner
      4.6.2.1 Round Attribute Encoder
    4.6.3 Results
      4.6.3.1 Ablation Study
  4.7 Conclusion

Chapter 5: Learning Multi-Modal Share: Sequential Multi-Modal Social Media Gamer Embedding
  5.1 Motivations
  5.2 Hypothesis
  5.3 Contributions
  5.4 Related work
  5.5 Dataset
  5.6 Methodology
    5.6.1 Data Preparation
    5.6.2 Text Embeddings
    5.6.3 Image Embeddings
    5.6.4 Graph Embeddings
    5.6.5 Triplet Loss
  5.7 Experimental Setup
  5.8 Results
  5.9 Conclusion and Future Work

Chapter 6: Learning Sequential Advancement: Human-in-the-loop Curriculum Reinforcement Learning for Game AI
  6.1 Introduction
  6.2 Related Work
    6.2.1 Curriculum Reinforcement Learning
    6.2.2 Human-in-the-Loop Reinforcement Learning
  6.3 Interactive Curriculum Guided by Human
    6.3.1 Interactive Platform
    6.3.2 A Simple Interactive Curriculum Framework
  6.4 Experiments
    6.4.1 Effect of Interactive Curriculum
    6.4.2 Generalization Ability
  6.5 Conclusion

Chapter 7: Conclusions, Implications, and Future Work
  7.1 Findings Summary and Implications
    7.1.1 Learning Sequential and Team Play
      7.1.1.1 Summary of Findings
      7.1.1.2 Impact and Implications
    7.1.2 Learning Heterogeneous and Multi-modal Representations
      7.1.2.1 Summary of Findings
      7.1.2.2 Impact and Implications
    7.1.3 Learning Human AI Collaborations
      7.1.3.1 Summary of Findings
      7.1.3.2 Impact and Implications
  7.2 Future Directions
    7.2.1 Future Directions for Learning Sequential and Team Play
    7.2.2 Future Directions for Learning Heterogeneous and Multi-modal Representations
    7.2.3 Future Directions for Learning Human AI Collaborations
    7.2.4 Final Remarks

Bibliography

List of Tables

2.1 Feature statistics summary.
2.2 Classification performance metrics scores.
2.3 Feature importance table.
3.1 Percentage difference of 4 categories of teams.
4.1 Description of the extracted information of a player for each round.
4.2 The distribution of purchasing action count for each type of weapon.
4.3 The results of different methods including ablation study.
5.1 DeepWalk hyper-parameters for graph embedding.
5.2 Model hyper-parameters.
5.3 Prediction scores for different experiments.
5.4 Multi-modal top k=1 Nearest Neighbours Similarity.

List of Figures

2.1 Original Sessions and Randomized Index Sessions.
2.2 Relationship between experience and player performance.
2.3 Performance deterioration over the course of a gaming session.
2.4 Performance Deterioration in High vs. Low Experience Players.
2.5 Comparison of highly-experienced vs. inexperienced players.
3.1 Distribution plot of our dataset.
3.2 Social ties' impact on individuals playing with and without friends.
3.3 KDA Trajectories of Players Gaming Exclusively with Friends over a whole session.
3.4 Rate of KDA Change in Sessions of Varying Lengths.
3.5 Social ties' impact on teams over gaming sessions.
4.1 Purchase Sequence Generation Model Architecture.
4.2 Atomic action Embedding t-SNE visualization.
5.1 Structure of coordinated representations.
5.2 Architecture of the multimodal system.
5.3 Architecture of BERT.
5.4 Image embedding framework.
5.5 Triplet Loss Mechanism.
5.6 t-SNE Visualization of retweet network's DeepWalk embedding.
5.7 Multi-modality representation visualization with t-SNE.
6.1 Adaptive Human Strategies in Curriculum Training.
6.2 Interactive platform design.
6.3 Example of our interactive platform training in parallel.
6.4 Three tasks in our interactive platform.
6.5 Learning curve facing a high wall.
6.6 Effect of interactive curriculum evaluated on the ultimate task.
6.7 The generalization ability of interactive curriculums.

Abstract

A paradigm shift toward human-centered intelligent gaming systems is gradually setting in. This dissertation explores the complexities of social sequential decision-making within online gaming environments and presents comprehensive AI solutions to enhance personalized single- and multi-agent experiences. The three core contributions of the dissertation are intricately interrelated, creating a cohesive framework for understanding and improving AI in gaming. First, I delve into the dynamics of gaming sessions and sequential in-game individual and social decision-making, which establishes a baseline of how decisions evolve and provides the necessary context for the subsequent integration of diverse information sources. Second, I integrate heterogeneous information and multi-modal trajectories, which enhances decision-making generation models. Third, I create a reinforcement learning with human feedback framework to train gaming AIs that effectively align with human preferences and strategies, enabling the system not only to learn from humans but also to interact with them. Collectively, this dissertation combines innovative data-driven, generative AI, representation learning, and human-AI collaboration solutions to help advance both computational social science and artificial intelligence applications in gaming.

Chapter 1: Introduction

The gaming industry is moving into an era where re-engineered systems replace old-style game engines with embedded machine learning technologies to operate, analyze, and understand gameplay. Human-centered AI takes a path that bridges the interests of both the gaming industry and the artificial intelligence research community. In this dissertation, I divide my research on constructing intelligent gaming systems into three sub-topics: Learning Sequential and Team Play, Learning Heterogeneous and Multi-modal Representations, and Learning Human AI Collaborations.
My research primarily focuses on building representations of players, leveraging multi-modal machine learning to extract insight from heterogeneous sources, and building interactive reinforcement learning frameworks to model the community's human decision-making processes.

1.1 Learning Sequential and Team Play

Social games, such as the Battle-Royale genre and Animal Crossing, have recently gained popularity. Combined with heterogeneous data from social media and streaming platforms, understanding and predicting players' behavior patterns by considering graph structures has become increasingly important. Moreover, complex real-world challenges are often solved through teamwork and collaboration, so research on gaming teamwork [106, 142] might inspire a broader research audience. One part of my research focuses on understanding the social dynamics and collaboration strategies of the gaming community. I devote Chapters 2 and 3 to this research question.

Chapter 2, Learning Sequential Play: Gaming Sessions and Team Performance, features my research from Individual Performance in Team-based Online Games [106]. To understand the evolution of individual performance within ad hoc teams, I analyze player performance in successive matches of a gaming session and show that a player's success deteriorates over the course of the session, although this effect is mitigated by the player's experience. I also find no significant long-term improvement in individual performance for most players. Modeling the short-term performance dynamics allows us to accurately predict when players choose to continue playing or to end the session.

Chapter 3, Learning Team Play: The Influence of Social Ties on Performance, deepens the previous chapter's research by adding social dynamics to sequential play, as published in my research The Influence of Social Ties on Performance in Team-based Online Games [141]. Social ties are the invisible glue that holds human ecosystems together. This research aims to elucidate the influence of social ties on individual and team performance dynamics, focusing on a popular Multiplayer Online Battle Arena (MOBA) collaborative team-based game, Defense of the Ancients 2 (Dota 2). The research reveals that, when playing with their friends, individuals are systematically more active in the game than when taking part in a team of strangers. However, I find that increased activity does not homogeneously lead to an improvement in players' performance: while beneficial to low-skill players, playing with friends negatively affects the performance of high-skill players. Our findings shed light on the mixed influence of social ties on performance and can inform new perspectives on virtual team management and behavioral incentives.

1.2 Learning Heterogeneous and Multi-modal Representations

Human priors in gaming exist primarily in four sources: replays and gaming logs record detailed in-game behaviors; walk-through blog posts and gaming wikis show summarized strategies; social media uncovers gaming networks, crowds, and marketing; and video streaming, with its multi-modal nature, is both informative and interactive. The second part of my research focuses on mining and learning from heterogeneous data to reason about humans' decision-making processes with multi-modal machine learning. I devote Chapters 4 and 5 to this research question.
Chapter 4, featuring my research from Learning to Reason in Round-based Games: Multi-task Sequence Generation for Purchasing Decision Making in First-person Shooters [140], aims to learn state representations of players by leveraging heterogeneous data. Sequential reasoning is a complex human ability. While extensive previous research has focused on gaming AI within a single continuous game, round-based decision-making that extends over a sequence of games remains less explored. Leveraging Counter-Strike: Global Offensive (CS:GO), a round-based first-person shooter game, this research aims to learn top-tier players' sequential strategies for spending virtual digital currency and to reason about the components that drive those decisions. It sheds light on modeling and reasoning about broader temporal virtual purchasing behavior in gamified online systems, for both solo and team users. Specifically, I propose a Sequence Reasoner with a Round Attribute Encoder and a Multi-Task Decoder to interpret the strategies. The model adopts few-shot learning to sample multiple rounds in a match and modifies the model-agnostic meta-learning algorithm Reptile for the meta-learning loop. I formulate each round as a multi-task sequence generation problem. Our state representations combine an action encoder, a team encoder, player features, a round attribute encoder, and an economy encoder to help our agent learn to reason in this specific multi-player, round-based scenario.
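To make the meta-learning loop concrete, below is a minimal sketch of a Reptile-style outer update in PyTorch, assuming a model that exposes a loss(...) method and a sample_round() task sampler; both names are illustrative placeholders rather than the actual interfaces of the Chapter 4 model.

```python
import copy
import torch

def reptile_step(meta_model, sample_round, inner_steps=5,
                 inner_lr=1e-3, meta_lr=0.1):
    """One Reptile outer-loop update: adapt a copy of the model on a
    sampled round (treated as a few-shot task), then move the meta
    parameters a small step toward the adapted parameters."""
    task_model = copy.deepcopy(meta_model)
    optimizer = torch.optim.SGD(task_model.parameters(), lr=inner_lr)
    for _ in range(inner_steps):            # inner loop: plain SGD on one task
        loss = task_model.loss(sample_round())   # placeholder loss API
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    with torch.no_grad():                   # outer loop: theta += eps * (W - theta)
        for p, w in zip(meta_model.parameters(), task_model.parameters()):
            p.add_(meta_lr * (w - p))
```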
Chapter 5, Learning Multi-Modal Share: Sequential Multi-Modal Social Media Gamer Embedding, focuses on creating a representation learning method that can jointly leverage fine-tuned embeddings from three modalities, i.e., natural language from text, visual signals from images, and graphs of relational connections on social media. This method further enhances the performance of decision-making generation models. The research features the esports community and a coordinated representation learning method over three modalities: written natural language, visual signals represented with images, and graphs derived from social media interactions. I choose multi-modal, social-media-based user profiling as the downstream task, predicting team affiliations. The resulting multi-modal embedding space can be visualized with t-SNE and reveals higher quality and better predictions than uni-modal representations.

1.3 Learning Human AI Collaborations

Chapter 6, Learning Sequential Advancement: Human-in-the-loop Curriculum Reinforcement Learning for Game AI, discusses the third part of my research, which features human interactions with bots and, more broadly, with automated gaming and AI systems. The motivation is to leverage human priors, expressed through interactions, to improve learning algorithms. This research topic features my work in Human Decision Makings on Curriculum Reinforcement Learning with Difficulty Adjustment [139]. As the question of how to leverage human world knowledge in reinforcement learning becomes a prominent open problem [52], I created a platform that enables users to interact with the agent online by manipulating the task difficulty. I investigate the effect of human priors on the design of a general reinforcement learning curriculum and compare it against existing approaches to automatic curriculum generation. In an extension of this project, I will draw on the established research field of human-robot interaction and extend it to human-virtual-bot interaction.

These three parts all contribute to the broader domain of human-centered AI [99] in gaming, with the general goal of understanding players, which will benefit the construction of intelligent gaming systems.

Chapter 2: Learning Sequential Play: Gaming Sessions and Team Performance

2.1 Introduction

Solving today's complex challenges increasingly calls for collaborating with others. People are often brought together in temporary ad hoc teams to achieve a common goal before moving on to the next problem, likely with a different team. An example of such ad hoc teams can be found in Multiplayer Online Battle Arena (MOBA) games. In this popular genre, two teams are assembled and face each other, with individuals collaborating with strangers to complete a series of complex, fast-paced tasks (e.g., kill enemies, destroy towers, conquer the enemy base) to win the game.

Previous studies [67] showed that strangers collaborate in online games through communication and coordination, often trying to exert influence over their teammates. Players understand that how they interact with teammates affects collaboration, and thus they must discipline themselves to facilitate successful social interaction with their team. Players must reach a mutual understanding of changing situations, work closely together, continuously devise new strategies, build and maintain team cohesiveness, and deal with deviant players. In addition, game designers dynamically assemble players to match the skill levels of opposing teams. Several factors affect these ad hoc teams' performance, such as communication [71], social ties [94], and composition [60, 61]. However, the performance of individuals within teams, and of the teams themselves, may evolve over time as individuals improve and perfect their skills or learn how to work with others on a given shared task. Understanding how individual and team performance changes over time can then provide suitable insights on how to assemble successful teams. To this aim, I study the performance of players in League of Legends (LoL), a popular MOBA game. Data from MOBA games like LoL enable us to explore the following four research questions:

RQ1: Do players improve over time as they acquire skills and experience through teamwork?
RQ2: Are there noticeable changes in individual performance during the course of a single team-playing session?
RQ3: If performance does change over a session, does experience mitigate its variation?
RQ4: What factors predict a player's choice to continue playing or end a given session?

The data I study contains records of nearly 242K solo-queue matches played by 16,665 of the most active League of Legends players. After segmenting matches into sessions, i.e., periods of gameplay activity without an extended break, we track player performance over the course of the session. We measure performance at two levels: the overall team performance and the individual player's performance. The former is defined as the fraction of matches during a session won by the player's team. The latter is defined on the basis of three main player actions during the game: the number of kills (K), the number of assists (A), and the number of deaths (D). We compute the kill-death-assist (KDA) ratio of the player, a value commonly used by players to compare their performance. Interestingly, both measures show that performance generally declines over the course of a single game-playing session.
This is surprising for two reasons: first, players in solo-queue matches do not choose their in-game teammates (we indeed consider this type of match to avoid the possible influence of playing with friends); second, the game is designed to match opposing teams' skills and yield an equal probability of winning to each team. However, I systematically observe that the team to which a player is assigned wins, on average, fewer matches if that player has already played other matches without taking a break. While similar short-term performance deterioration was observed in the context of different online activities, such as commenting on Reddit [114] or Twitter [65], this is the first time a depletion effect has been observed in the context of teamwork, and in particular in online games. Moreover, I find that deterioration is more pronounced for novices than for veteran players, potentially reflecting the benefits of experience and learning within the game.

To identify features predictive of player behavior, I train a classifier to predict whether the player will end the gaming session after the current match. We consider different sets of features related to various aspects of the game: match information, actions carried out by the player in the game, and features related to their performance. I find that the most predictive features correspond to how many matches the player has played in the current session and the win rate of the player, both in the last match and throughout the session.

2.2 Data & Methods

2.2.1 League of Legends and Data Collection

League of Legends is a multiplayer online game that combines elements of the role-playing, real-time strategy, and tower defense game genres. A single match consists of a strategic, fast-paced battle between two teams composed of five people, who are usually strangers. A team wins by destroying the opposing team's nexus, a large structure fortified by defensive towers. While the destruction of the enemy nexus is the main goal, teams also aim to fulfill subgoals, which may be necessary for, or conducive to, victory; individual players also strive to achieve personal goals, such as a high kill/death ratio.

We collected data about League of Legends by using the Riot Games API (https://developer.riotgames.com/). With the aim of studying individual performance, I collected information on solo-queue matches, in which players cannot select their teammates. These specific matches allow us to avoid any influence that playing with friends might have on the final performance of players. I additionally require that each player in the dataset has at least 10 matches, for two main reasons. First, I want to avoid biases related to players who try the game a few times and never play again. Second, I will focus the analysis on performance evolution in gaming sessions (as described in the following), so I need each player to have played at least a few sessions in their history.

The final dataset consists of about 242K solo-queue matches played by a sample of 16,665 players between May 2014 and January 2016. The data contains information about matches, including match time and duration, and the number of deaths, kills, earned gold, gold spent, etc., for each player in each match. Additional information about the dataset, such as the number of matches and sessions per player and average match duration, is reported in Tab. 2.1.
Table 2.1: Feature statistics summary (number of matches = 242,352; number of players = 16,665).

Statistic                         Mean    St. Dev.    Min     Max
Kills                              6.3       4.8      0.0    42.0
Deaths                             6.2       3.3      0.0    51.0
Assists                            9.2       6.1      0.0    46.0
KDA                                2.8       2.6      0.0    39.0
Accumulated Kills                 13.7      12.4      0.0   207.0
Accumulated Assists               20.0      16.4      0.0   284.0
Accumulated Deaths                13.3      10.3      0.0   152.0
Accumulated KDA                    2.7       2.0      0.0    49.0
Session Mean Kills                 6.2       3.8      0.0    36.0
Session Mean Assists               9.1       4.9      0.0    46.0
Session Mean Deaths                6.0       2.7      0.0    32.5
Session Mean KDA                   2.6       1.9      0.0    35.0
Quit                               0.4       0.5      0.0     1.0
Experience                       258.8     273.7      1.0  1575.0
Win                                0.5       0.5      0.0     1.0
Current Win Rate                   0.5       0.4      0.0     1.0
Session Win Rate                   0.5       0.3      0.0     1.0
Match                              2.2       1.4      1.0    22.0
Session                           88.7      89.8      1.0   535.0
Match Duration (minutes)          33.5       7.8      5.6    78.6
Accumulated Duration (minutes)    72.7      48.8      9.3   753.1
Mean Duration (minutes)           33.1       6.1      9.3    76.5

2.2.2 Gaming Sessions

To address RQ2 and RQ3, I need to identify sessions of continuous player activity. The time series of a player's matches can be decomposed into gaming sessions, i.e., periods of activity without an extended break. The sessions can be identified by examining the time intervals between consecutive matches: cases where this interval exceeds some predefined threshold are used to separate matches into different sessions [39, 114]. Here, I define a gaming session of length n as the temporally-ordered sequence of n matches with no more than a 15-minute break between matches. The break length, corresponding to the median of the distribution of break times between matches, is computed over the most active players of our dataset (i.e., players having at least 10 matches in their history).

[Figure 2.1: Original Sessions and Randomized Index Sessions.]

To check the robustness of our findings regarding individual performance and verify that they are not due to chance, I also carry out an analysis on randomized session data, i.e., sessions where the order of matches of individual players was randomly shuffled according to the strategy depicted in Figure 2.1. The results of this test are presented later, in Figures 2.3a and 2.3b.
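As a concrete illustration, the following pandas sketch segments a player's match log into sessions with the 15-minute threshold and builds the shuffled control of Figure 2.1; column names such as player_id, start_time, and end_time are assumptions about the data layout, not the dataset's exact schema.

```python
import numpy as np
import pandas as pd

BREAK = pd.Timedelta(minutes=15)  # median break time between matches

def add_session_ids(matches: pd.DataFrame) -> pd.DataFrame:
    """Assign a per-player session id: a new session starts whenever the
    gap since the player's previous match exceeds the 15-minute break."""
    df = matches.sort_values(["player_id", "start_time"]).copy()
    prev_end = df.groupby("player_id")["end_time"].shift()
    gap = df["start_time"] - prev_end
    new_session = gap.isna() | (gap > BREAK)  # first match or long break
    df["session_id"] = new_session.groupby(df["player_id"]).cumsum()
    return df

def randomize_sessions(df: pd.DataFrame, seed: int = 0) -> pd.DataFrame:
    """Control condition of Figure 2.1: shuffle each player's match
    records while keeping the original session boundaries."""
    rng = np.random.default_rng(seed)
    def shuffle_one(g: pd.DataFrame) -> pd.DataFrame:
        g = g.copy()
        cols = g.columns.difference(["player_id", "session_id"])
        g[cols] = g[cols].to_numpy()[rng.permutation(len(g))]
        return g
    return df.groupby("player_id", group_keys=False).apply(shuffle_one)
```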
2.2.3 Prediction Methods

To address RQ4, I present a prediction task that leverages the three methods described below.

Random forest is an ensemble-based learning method for classification and prediction that operates by constructing a multitude of decision trees at training time and outputs the class that is the mode of the classes, or the mean prediction, of the individual trees [47]. Random forests increase the generalization accuracy of decision-tree-based classifiers without compromising accuracy on training data [46]. In particular, random forests correct for the problem of decision trees over-fitting the training data [32].

Gradient boosting is a machine learning technique that produces a prediction model in the form of an ensemble of weak prediction models, typically decision trees. Gradient boosting produces competitive, highly robust, interpretable procedures for both regression and classification [33].

Adaptive boosting is a machine learning meta-algorithm that produces a prediction model combining weak learners (typically decision trees) into a weighted sum that represents the final output of the boosted classifier [109, 31]. The term adaptive means that subsequent weak learners are adjusted in favor of those instances misclassified by previous classifiers. Even if such an approach is sensitive to noisy data and outliers, as long as the performance of each weak classifier is slightly better than random guessing, the final predictive model can be proven to converge to a strong learner [32].

For each classification method, I learn three models, in which I incrementally add different sets of features: (i) match metadata, such as player id, match position in a session, match duration, etc.; (ii) players' actions, such as kills, deaths, assists, etc.; and finally, (iii) players' performance measures, such as the KDA and the binary information about whether the player won the match or not.

2.3 Results

In this chapter, I study the performance of a set of League of Legends players who played at least 10 solo-queue matches. I require at least 10 matches to consider players who engaged in the game long enough to play a few sessions in their history, and to avoid the bias that might occur when considering players who tried the game a few times and quit. Importantly, I only select solo-queue matches, in which players cannot decide their team, or part of their team, thus avoiding possible influences of friends in the game. Our dataset comprises about 242K matches played by 16,665 different players. In the following, I address the research questions previously defined and provide some insights into the possible mechanisms underlying our observations.

2.3.1 RQ1. Long-term Performance

First, I examine how performance changes with experience (RQ1); thus, I compute the long-term performance of players by taking into account their entire history in the dataset, i.e., the total number of matches of each player. Here, I consider two measures of performance. First, I define a team performance measure, computed as the fraction of wins. Second, I define an individual performance measure, namely the kill-death-assist ratio KDA, defined as (k + a) / max(1, d), where k is the number of kills, a is the number of assists, and d is the number of deaths of a player in a given match (see http://leagueoflegends.wikia.com/wiki/Kill_to_Death_Ratio).

[Figure 2.2: Relationship between experience and player performance.]

Figure 2.2 reports how performance, measured by the overall fraction of wins (top panels) and KDA (bottom panels), changes for each player as they play more matches. As we can observe, there is no long-term team performance improvement with experience (ρ = 0.02). The longer users play, the more their teams' performance reverts to the mean, which is approximately 0.5 (Fig. 2.2, top panels). A possible explanation might be related to the game's design. In fact, players are given Elo-like ratings, a method used to calculate the relative skill of players in competitor-versus-competitor games such as chess, and these ratings are used to assemble teams of players with comparable skills. In other words, if a player's skill improves, he/she will be paired up against players with similar skill levels, and analogously if the skill level decreases. Thus, the likelihood of winning each match is not significantly better than 50%. I noticed the same effect when studying the KDA ratio, whose values revert to the mean score of 2.7 (Fig. 2.2, bottom panels).
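For concreteness, the two performance measures used throughout this chapter can be written as small helpers; this is a sketch, and the max(1, d) guard is exactly the definition above.

```python
def kda(kills: int, assists: int, deaths: int) -> float:
    """Individual measure: kill-death-assist ratio (k + a) / max(1, d).
    The max() guard avoids division by zero in deathless matches."""
    return (kills + assists) / max(1, deaths)

def win_rate(wins) -> float:
    """Team-level measure: fraction of the player's matches won."""
    wins = list(wins)
    return sum(wins) / len(wins) if wins else 0.0

assert kda(6, 9, 0) == 15.0  # a deathless match scores k + a
assert kda(6, 9, 6) == 2.5
```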
2.3.2 RQ2. Short-term Performance

Our second question (RQ2) explores short-term performance over the course of one session. In contrast to long-term performance, the player's performance, whether measured by the fraction of matches the player's team won or by the player's KDA in each match, degrades measurably over the course of a single session. Figures 2.3a and 2.3b (left panels) compare the performance achieved by players in sessions of different lengths (with the number of matches going from 1 to 5). We can observe that both types of performance are lower at the end of a session than at its beginning. Moreover, the longer the session, the larger the performance decline: for sessions with three or more matches, the win rate and the KDA value deteriorate by more than 10% and 8%, respectively, between the first and the last match of the session. Such short-term performance deterioration is not present in the randomized data (Fig. 2.3a and 2.3b, right panels), suggesting the presence of a real effect and not simply a byproduct of data heterogeneity.

Performance declines over the course of a session according to both measures (win rate and KDA). The only difference is the initial improvement during longer game-playing sessions: this pattern might reflect a "warm-up" period. The pattern is stronger for the team performance measure (win rate) than for the individual performance measure (KDA). The decline in team performance suggests that the teams a player is assigned to later in the session do not perform as well as the teams the player is assigned to earlier in the session. On the other hand, deterioration is also observed in individual performance. This phenomenon might be associated with cognitive effects, such as mental fatigue, boredom, and attention decline (relevant research in this area is reported in the Related Work section).

[Figure 2.3: Performance deterioration over the course of a gaming session. Each line reports the average (a) win rate or (b) KDA ratio for each successive match of a gaming session of a given length. Matches played later in the session have lower performance (left plots), but not when play data has been randomized (right plots). Error bars represent standard deviations (standard errors would be almost invisible due to large sample sizes).]

[Figure 2.4: Comparison of performance deterioration in high- versus low-experience players. (a) Win rate and KDA of highly experienced players (top 5th percentile); (b) win rate and KDA of less experienced players (bottom 5th percentile).]
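The per-position averages behind Figure 2.3 can be computed with a straightforward groupby; this sketch assumes the sessionized frame from Section 2.2.2, with win as a 0/1 column and kda precomputed per match (both assumed column names).

```python
import pandas as pd

def deterioration_curves(df: pd.DataFrame, max_len: int = 5) -> pd.DataFrame:
    """Average win rate and KDA at each within-session match position,
    computed separately for sessions of length 1..max_len (cf. Figure 2.3).
    Assumes df is sorted by time within each (player_id, session_id)."""
    df = df.copy()
    grp = df.groupby(["player_id", "session_id"])
    df["match_index"] = grp.cumcount() + 1          # position in the session
    df["session_len"] = grp["win"].transform("size")
    in_range = df["session_len"].between(1, max_len)
    return (df[in_range]
            .groupby(["session_len", "match_index"])[["win", "kda"]]
            .mean())
```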
2.3.3 RQ3. Effect of Experience on Performance Deterioration

Does experience mitigate performance declines? To answer our third research question (RQ3), I studied how deterioration is linked to players' experience. To this end, I ranked players by the number of matches they played and compared highly experienced players (those in the 95th percentile or above) with less experienced players (those below the 5th percentile by number of matches played). Figure 2.4 shows the magnitude of performance deterioration over the course of sessions played by the highly experienced players (left panel) and the less experienced ones (right panel). The performance of the latter group declines far more than that of the experienced players, and comparison to randomized data suggests that these trends are not due to chance. This suggests that player experience mitigates the mechanisms that lead to short-term performance deterioration. For example, experienced players may use their available cognitive resources more efficiently and stretch them over more games.

The analysis also supports the hypothesis that highly experienced players tend to engage in longer gaming sessions than less experienced players. Boxplots in Figure 2.5a show that the average length of sessions played by these two groups of players is significantly different (Wilcoxon test, p-value < 0.0005). The difference remains statistically significant even when only a player's first 20 sessions are taken into account (Wilcoxon test, p-value < 0.0005), indicating that highly experienced players differ from other players already at the beginning of their tenure. These players not only play more games during a session, they also play longer: boxplots in Figure 2.5b show that the session durations (in seconds) of the highly versus less experienced players are also significantly different (Wilcoxon test, p-value < 0.0005). Although the reason why the more experienced players are able to play longer is still unknown, the net effect is to partially shield these players from the effects of performance deterioration.

[Figure 2.5: Comparison of highly experienced vs. inexperienced players. (a) Average session length and (b) session duration (in seconds) for these players.]
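The group comparison could be run as in the sketch below; note that SciPy exposes the rank-based test for two independent samples as mannwhitneyu (the Wilcoxon rank-sum test), which is used here as a stand-in for the Wilcoxon tests reported above.

```python
import pandas as pd
from scipy.stats import mannwhitneyu

def compare_session_lengths(df: pd.DataFrame):
    """Compare the average session length of the top-5% and bottom-5%
    players, ranked by total number of matches played (cf. Figure 2.5a)."""
    matches_per_player = df.groupby("player_id").size()
    hi_cut = matches_per_player.quantile(0.95)
    lo_cut = matches_per_player.quantile(0.05)
    # average session length (in matches) per player
    session_len = df.groupby(["player_id", "session_id"]).size()
    avg_len = session_len.groupby("player_id").mean()
    hi = avg_len[matches_per_player >= hi_cut]
    lo = avg_len[matches_per_player <= lo_cut]
    return mannwhitneyu(hi, lo)  # rank-based test, two independent groups
```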
• mean match duration: average match duration in the current session; • sessions: total number of sessions played until now; • player id: the unique identification of each player; • experience: total number of matches played until current match; Players’ actions (henceforth, ACTIONS) in the game include: • kills: number of kills a player performed in the current match; • deaths: number of deaths a player suffered in the current match; • assists: number of assists a player carried out in the current match; • accumulated kills: total number of kills a player performed in the current session; • accumulated deaths: total number of deaths a player suffered in the current session; • accumulated assists: total number of assists a player helped in the current session; • mean kills: average kills a player performed per match in the current session; • mean deaths: average deaths a player suffered per match in the current session; • mean assists: average assists a player carried out per match in the current session; Finally, I characterize players performance (henceforth, PERFORMANCE) through the following features: • KDA: kill-death-assist (KDA) ratio of a player in the current match; • accumulated KDA: KDA ratio of a player in the current session; 17 • mean KDA: average KDA a player achieved per match in the current session; • win: binary variable indicating whether the player won or lost the current match; • session win rate: fraction of wins in the current session; • current win rate: the fraction of wins until the current match in the current session; I label each match in the data set as a positive outcome if that match is the last match of the player’s session, and a negative outcome if the player keeps playing after that match. Our dataset is mildly unbalanced, containing 145,169 positive labels and 261,037 negative ones. This is consistent with the presence of several sessions of length greater than 1 (i.e., with at least two matches). In machine learning, standard evaluation metrics that do not account for uneven class distribution can be misleading. To address this challenge, I perform two different predictive tasks: (i) I use the full (unbalanced) dataset to evaluate the performance of three prediction models by means of the Area Under the Receiving Operator Characteristic Curve (AUC), providing an evaluation for the true and false positive rates of the models’ predictions (where AUC = 1 represents a perfect test); (ii) I under-sample the original data to obtain a balanced dataset and evaluate the performance of our prediction models through standard metrics such as precision (i.e. the fraction of true predicted positive outcomes overall positive predictions), recall (i.e. the fraction of true predicted positive outcomes overall positive outcomes), accuracy (i.e. the fraction of correctly predicted outcomes overall outcomes), and F1 (which combines precision and recall measures). In both prediction tasks, I compare the performance of three ensemble-based prediction models: Random Forest (RF), Gradient Boosting (GB), and Adaptive Boosting (AB). I perform a 10-fold cross-validated grid search over the hyperparameter space to find the best combination of hyperparameters for each classifier. To prove the robustness of the results, I report mean scores and standard deviations obtained via Monte Carlo cross-validation. Here, I use 90% of the data samples to train and the remaining 10% to test our models. 
For each classification algorithm (RF, GB, and AB), I learn three distinct predictive models in which I cumulatively add the different sets of features: (1) I only consider match metadata (MATCH); (2) I additionally take into account the action features (MATCH+ACTIONS); (3) finally, I add the features related to performance (MATCH+ACTIONS+PERFORMANCE). This procedure is commonly called model nesting.

Table 2.2: Classification performance metrics scores.

Model 1 (MATCH)
Metric      RF             GB             AB
AUC         0.830±0.003    0.837±0.002    0.837±0.003
F1          0.803±0.002    0.818±0.002    0.818±0.003
Precision   0.709±0.004    0.702±0.003    0.701±0.004
Recall      0.926±0.002    0.981±0.001    0.982±0.001
Accuracy    0.773±0.002    0.783±0.002    0.783±0.003

Model 2 (MATCH+ACTIONS)
Metric      RF             GB             AB
AUC         0.827±0.003    0.839±0.001    0.836±0.002
F1          0.813±0.002    0.819±0.002    0.818±0.002
Precision   0.703±0.004    0.704±0.003    0.701±0.003
Recall      0.965±0.002    0.979±0.001    0.981±0.001
Accuracy    0.779±0.003    0.783±0.002    0.782±0.003

Model 3 (MATCH+ACTIONS+PERFORMANCE)
Metric      RF             GB             AB
AUC         0.968±0.001    0.976±0.001    0.914±0.002
F1          0.962±0.001    0.959±0.001    0.888±0.003
Precision   0.927±0.002    0.922±0.002    0.824±0.004
Recall      0.999±0.000    0.999±0.000    0.962±0.003
Accuracy    0.960±0.001    0.957±0.001    0.878±0.003

In the first prediction task (unbalanced data), the best performance is obtained by model (3), where all 22 features are used (MATCH+ACTIONS+PERFORMANCE). As shown in Tab. 2.2, the best result is obtained by Gradient Boosting (AUC = 0.976±0.001), followed by Random Forest (AUC = 0.968±0.001, over 512 different decision trees) and Adaptive Boosting (AUC = 0.914±0.002). The most significant features used by the GB classifier, whose Gini index (a score indicating the relevance of each specific feature in the prediction task) is reported in Tab. 2.3, are session win rate (feature importance = 0.163), current win rate (feature importance = 0.286), and match (feature importance = 0.087). The importance of the match index in the session, which indicates how much time players have already spent in the game, suggests that people have a finite budget, whether of time or cognitive resources, for gameplay. At the same time, the overall team performance (current and session win rate) also decreases during the session. The perception of a decreasing win rate, combined with the exhaustion of a finite budget, may lead to a player's decision to quit the game.

In the second prediction task (balanced data), the highest accuracy is again achieved by Model 3 (MATCH+ACTIONS+PERFORMANCE). The best results, shown in Tab. 2.2, are provided by RF (accuracy = 0.960±0.001), followed by GB (accuracy = 0.957±0.001) and AB (accuracy = 0.878±0.003). Consistent with the results of the first prediction task, the features identified by the RF classifier as most predictive are match (feature importance = 0.364), current win rate (feature importance = 0.335), and session win rate (feature importance = 0.111).
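The nested-model comparison itself can be sketched with scikit-learn as below; the feature groupings mirror the MATCH, ACTIONS, and PERFORMANCE sets, while hyperparameters (other than the 512 trees reported for the Random Forest) are left at placeholder defaults rather than the grid-searched values.

```python
import pandas as pd
from sklearn.ensemble import (AdaBoostClassifier, GradientBoostingClassifier,
                              RandomForestClassifier)
from sklearn.model_selection import cross_val_score

MATCH = ["match", "match_duration", "accumulated_match_duration",
         "mean_match_duration", "sessions", "player_id", "experience"]
ACTIONS = ["kills", "deaths", "assists",
           "accumulated_kills", "accumulated_deaths", "accumulated_assists",
           "mean_kills", "mean_deaths", "mean_assists"]
PERFORMANCE = ["kda", "accumulated_kda", "mean_kda",
               "win", "session_win_rate", "current_win_rate"]

def evaluate_nested_models(df: pd.DataFrame) -> None:
    """Cross-validated AUC of the three classifiers on the three nested
    feature sets (model nesting), on the unbalanced data."""
    nested = {
        "MATCH": MATCH,
        "MATCH+ACTIONS": MATCH + ACTIONS,
        "MATCH+ACTIONS+PERFORMANCE": MATCH + ACTIONS + PERFORMANCE,
    }
    models = {
        "RF": RandomForestClassifier(n_estimators=512),  # 512 trees, as reported
        "GB": GradientBoostingClassifier(),
        "AB": AdaBoostClassifier(),
    }
    for set_name, cols in nested.items():
        for model_name, model in models.items():
            auc = cross_val_score(model, df[cols], df["quit"],
                                  cv=10, scoring="roc_auc")
            print(f"{set_name:>28} {model_name}: AUC = {auc.mean():.3f}")

# Gini feature importances of a fitted ensemble, as in Table 2.3:
# model.fit(X, y); ranking = sorted(zip(cols, model.feature_importances_),
#                                   key=lambda t: -t[1])
```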
2.4 Related Work

2.4.1 Individual and Team Performance in Games

Various recent studies have explored human performance and activity in online games. Several authors investigated aspects of team performance [71, 8, 60, 61], as well as individual performance [54, 128, 103, 105, 89], in multiplayer team-based games. In [79], an extensive review of team effectiveness is provided; the authors analyze different aspects of teamwork, such as team outcomes (team performance, members' affect, and viability), mediator-team outcome relationships, and team composition. Other aspects of social and group phenomena in virtual environments are covered in the review by [115]. In this work, the authors identified four major topics related to virtual environment studies: testing whether the laws of social behavior in real life also apply in virtual environments, finding social behavior norms, focusing on micro-level social phenomena, and filling the gaps in well-established theoretical discussions and paradigms within social science.

Table 2.3: Feature importance table. Ranking based on the Gini splitting index; for each nested model, features are listed in decreasing order of importance for each classifier.

Model 1 (MATCH)
Random Forest: match (0.368), match duration (0.131), player id (0.113), mean match duration (0.105), experience (0.103), cum. match duration (0.101), session (0.079)
Gradient Boosting: cum. match duration (0.249), experience (0.183), session (0.148), match duration (0.117), player id (0.112), mean match duration (0.096), match (0.095)
Adaptive Boosting: experience (0.371), session (0.354), match duration (0.080), player id (0.076), mean match duration (0.063), cum. match duration (0.043), match (0.014)

Model 2 (MATCH+ACTIONS)
Random Forest: match (0.364), match duration (0.069), player id (0.061), experience (0.060), mean match duration (0.054), session (0.046), cum. match duration (0.046), mean assists (0.038), assists (0.037), mean kills (0.035), mean deaths (0.034), kills (0.034), cum. assists (0.033), cum. kills (0.030), deaths (0.030), cum. deaths (0.028)
Gradient Boosting: cum. match duration (0.141), experience (0.141), session (0.139), player id (0.091), match (0.080), match duration (0.078), cum. kills (0.046), cum. deaths (0.045), mean match duration (0.044), cum. assists (0.043), assists (0.033), kills (0.032), mean deaths (0.025), mean kills (0.024), deaths (0.020), mean assists (0.020)
Adaptive Boosting: experience (0.342), session (0.334), player id (0.063), match duration (0.059), mean match duration (0.039), cum. kills (0.023), cum. match duration (0.021), cum. assists (0.021), mean assists (0.020), mean kills (0.016), mean deaths (0.016), kills (0.012), cum. deaths (0.012), match (0.008), deaths (0.008), assists (0.008)

Model 3 (MATCH+ACTIONS+PERFORMANCE)
Random Forest: match (0.364), current win rate (0.335), session win rate (0.111), match duration (0.020), player id (0.018), experience (0.016), mean match duration (0.014), KDA (0.013), cum. match duration (0.012), session (0.012), mean KDA (0.010), cum. KDA (0.010), assists (0.009), kills (0.008), mean assists (0.008), mean kills (0.008), cum. assists (0.007), cum. kills (0.007), deaths (0.007), mean deaths (0.006), cum. deaths (0.006), win (0.000)
Gradient Boosting: current win rate (0.301), session win rate (0.194), match (0.087), cum. match duration (0.072), experience (0.058), session (0.051), match duration (0.036), player id (0.029), mean match duration (0.022), cum. assists (0.021), cum. kills (0.021), cum. deaths (0.017), mean assists (0.014), KDA (0.013), cum. KDA (0.011), mean KDA (0.010), deaths (0.010), mean kills (0.008), kills (0.008), assists (0.008), mean deaths (0.007), win (0.003)
Adaptive Boosting: session win rate (0.367), current win rate (0.209), experience (0.135), session (0.129), match duration (0.035), cum. match duration (0.020), player id (0.018), match (0.016), KDA (0.012), mean match duration (0.010), deaths (0.008), mean deaths (0.008), assists (0.006), mean assists (0.006), cum. kills (0.004), cum. deaths (0.004), cum. assists (0.004), cum. KDA (0.004), mean KDA (0.004), mean kills (0.002), win (0.002), kills (0.000)

The "optimal" composition of temporary teams has also attracted a lot of research: Kim et al. studied League of Legends to determine how team composition affects team performance [60, 61]. Using mixed-methods approaches, the authors studied in-game role proficiency, generality, and congruency to determine the influence of these constructs on team performance. Proficiency in tacit cooperation and verbal communication highly correlates with team victories, and learning ability and speed of skill acquisition differentiate novices from elite players. The importance of communication and its effects on team performance has been extensively studied by Leavitt and collaborators, once again in LoL [71]: the authors studied both explicit and implicit (nonverbal, i.e., pings) communication, highlighting differences based on player styles and different extents of effectiveness in increasing individual performance.

Finally, the topic of individual performance in online games has been studied on different platforms. Shen et al. [111] suggested that gender-based performance disparities do not exist in massively multiplayer online games (MMOs). The authors operationalized game performance as a function of character advancement and voluntary play time, based on [117], and showed how character levels correlate with other types of performance metrics. Other works looking at individual performance analyze first-person shooter games: Microsoft researchers studied the performance trajectories of Halo players, as well as the effect that taking prolonged breaks from playing has on their skills [54]. Analyzing individual game performance allowed them to categorize players into groups exhibiting different trajectories and then study how other variables (demographics, in-game activity, etc.) relate to game performance. This analysis reveals the most common performance patterns associated with first-person online games, and it allows us to model skill progression and learning mechanisms. Finally, Vicencio-Moreira and coauthors studied individual performance as a tool to balance game design and game-play [128]: the authors defined several statistical models of player performance and associated them with multiple dimensions of game proficiency, demonstrating a proof-of-concept algorithm aimed at balancing individual skills by providing different levels of assistance (e.g., aim assistance, character level assistance) to make the game-play experience more balanced and satisfactory by matching players of different skill levels.

To the best of our knowledge, ours is the first study to focus on individual performance within temporary teams, to analyze the effect of performance deterioration over the short term, and to determine its interplay with engagement.

2.4.2 Team-based Online Games and Engagement

Video games represent a natural setting to study human behavior. Prior to this study, several works were devoted to analyzing the behavior and activity of players in multiplayer games. In particular, the behavioral dynamics of team-based online games have been extensively studied in role-playing games like World of Warcraft [85, 6], in battle arena games like League of Legends [67, 68, 103], and in other games [55, 125, 89]. The earlier studies focused on massively multiplayer online games like World of Warcraft, which exhibit both a strong component of individual game-play (e.g., solo quests aimed at increasing one's character level and skills) and collaborative instances (e.g., raid bosses). Nardi and Harris first [85], and Bardzell and collaborators shortly after [6], analyzed the 5-person raid-boss instance runs to determine the ingredients of successful cooperative game-play.
By means of a mixture of survey-based and data-driven analysis, the authors illustrated how the social component (i.e., chatting with teammates and guild-based activity) was the leading factor in satisfaction and engagement. Later studies focused on MOBAs: Kuo et al. investigated engagement mechanisms in League of Legends [67, 68] by means of semi-structured interviews with players, aiming to unveil the elements behind successful team composition in temporary teams. Communication (written and oral) and effective collaboration strategies are linked to a satisfactory game experience. Similar results hold for other MOBAs [55, 125]. Finally, a recent study investigated the relationship between brain activity and game-play experience in multiplayer games: playing with human teammates yields higher levels of satisfaction but lower overall performance and coordination than playing with computer-controlled teammates [56].

Although our work does not focus on the analysis of engagement in team-based online games, the results I found could be leveraged to design incentives to increase players' engagement over time, and used to prevent players from quitting the game.

2.4.3 Performance Deterioration

Performance deterioration following a period of sustained engagement has been demonstrated in a variety of contexts, such as student performance [112], driving [14], data entry [42], self-control [84] and, more recently, online activity [65, 114]. In particular, in vigilance tasks, i.e., tasks that require monitoring visual displays or auditory systems for infrequent signals, performance was shown to decrease over time, with concomitant increases in perceived mental effort [107]. For example, after long periods in flight simulators, pilots are more easily distracted by non-critical signals and less able to detect critical signals [131]. The factors leading to deteriorating performance are still debated [13, 70, 77]. However, deterioration has been shown to be associated with physiological brain changes [76, 73, 90], suggesting a cognitive origin, whether due to mental fatigue, boredom, or strategic choices to limit attention. In particular, mental fatigue refers to the effects that people experience during and after prolonged periods of demanding cognitive activity requiring sustained mental efficiency [76]. Persistent mental fatigue has been shown to lead to burnout at work, lower motivation, increased distractibility, and poor information processing [22, 51, 11, 76, 101, 127, 12, 50]. Moreover, mental fatigue is detrimental to individuals' judgments and decisions, including those of experts: e.g., judges are more likely to deny a prisoner's request as they advance through the sequence of cases without breaks on a given day [20], and evidence for the same type of cognitive fatigue has been documented in consumers making choices among different alternatives [130] and physicians prescribing unnecessary antibiotics [75]. Recent studies indicate that cognitive fatigue destabilizes economic decision-making, resulting in inconsistent preferences and informational strategies that may significantly reduce decision quality [87].

Short-term deterioration of individual performance was previously observed on other online platforms. It has been shown that the quality of comments posted by users on the Reddit social platform [114], the answers provided on StackExchange question-answering forums [28], and the messages written on Twitter [65] decline over the course of an activity session.
In all previously studied platforms, users worked individually to produce content or achieve some result, while in the present work, I considered both a measure of individual performance (i.e., KDA) and the performance achieved by the team (i.e., win rate). I can interpret the KDA ratio of a player as the quality of his/her playing style during a match, which can thus be compared to the results previously obtained on other types of platforms.

2.5 Conclusion

This research addressed four questions about modeling individual performance within temporary teams. To this aim, I studied players of a team-based online game, League of Legends, and measured performance at the level of the team, as the fraction of matches the player's team won, and at the individual level, by computing the KDA ratio of the player at the end of each match.

In the long term, I observed that there is no evident performance improvement (for either team or individual measures) with experience, and that both measures of performance stabilize around their mean value. This observation might be linked to the game design: the team composition balancing strategy limits individual performance variance and thus reduces individual contributions to team performance. In the short term, i.e., over the course of a single game-playing session, our performance measures showed a strong deterioration pattern: the longer a player's session, the more performance decreases, with metrics decreasing on average by 8-10% between the beginning and end of a session. Our findings are consistent with observations made on different online platforms and social networks, where performance deterioration was observed over the course of sessions. We found, however, that experience modulates short-term performance changes, potentially reducing the effects of performance depletion. Player experience (i.e., the overall number of matches played by each individual) indeed appeared to mitigate some of the effects of performance deterioration: the more experienced players showed less performance decline over the course of a game session than the less experienced ones. Other factors that are not investigated in the present work can also influence performance in team-based games: the presence of friends in the team could trigger more collaborative behavior; players' performance in MOBA games can be affected by the role they are impersonating; and the composition of the team can have an effect on players' decisions during the game.

We have shown, through the analysis of performance in the short term, that players tend to quit the game session after a certain number of matches in which their performance declines. I also investigated the factors that are predictive of a player quitting a game session. To this aim, I designed a prediction task in which I defined three sets of features, each describing a specific aspect of the game: features related to matches, players' actions, and performance. I found that the features that best predict whether the player will quit the session are those associated with the match histories (e.g., session length, match duration, etc.). These findings are consistent with the hypothesis that players have a finite "cognitive budget" for playing, which they deplete with gameplay. While our work does not address the origins of depletion, whether through growing boredom or cognitive fatigue, we have shown that this phenomenon has different effects on experienced and inexperienced players.
By leveraging our findings, individualized incentive strategies could be designed to identify different classes of performers and reward them dynamically and differently, based on personalized, relative performance assessments. This would allow us to overcome the issues related to long-term performance and game design by guaranteeing a satisfactory game experience for both experienced and inexperienced players. Moreover, incentives that enhance players' engagement in the game could be combined with our predictions to prevent a player from quitting the session, or to counter the frustration that may drive them to quit the game altogether. Our future efforts will thus be devoted to further research in the science of individualized incentives.

Chapter 4 begins after Chapter 3, which follows.

Chapter 3
Learning Team Play: The Influence of Social Ties on Performance

3.1 Introduction

Unveiling the interplay between social ties and complex human behavioral dynamics has become increasingly essential for understanding human decision-making in the virtual world. Currently, the study of social ties and the study of human performance are two independent research fields. Research on social ties is either devoted to illuminating the benefits that such connections can bring to mental and physical health, such as lower levels of stress and increased longevity [58, 18, 126, 122], or interested in comparing online and offline social ecosystems, studying the formation of online social ties and showing how these ties evolve into social networks [2, 27, 120, 135, 21], including in video games [143, 124]. Human performance researchers have been focusing on evaluating how physical and mental factors impact performance in education or sports contexts [72]. Therefore, little attention has been devoted to studying the interplay between social ties and human performance dynamics. Shedding light on the interdisciplinary intersection of these two fields is the subject of this study.

In the age of big data, humans leave behind traces of their online activity in the form of digital behavioral data, which facilitates our research and bestows us with new data-centric perspectives to study social ties. Understanding the relationship between social ties, in particular preexisting connections within team members, and (individual or team) performance is a question of broad relevance across education, psychology, and management sciences [136, 37, 15, 19, 83]. However, these studies depend on established methods like interviews, surveys, or ethnographic observations of small populations. In this chapter, our main interest is exploiting data-intensive methods to study the influence that social ties exert on human performance in collaborative team-based settings, more specifically in Multiplayer Online Battle Arena (MOBA) video games. Our research will focus on a popular MOBA collaborative team-based game, Dota 2, a rich dataset allowing us to study millions of players and their co-play matches. Dota 2 is one of the most successful MOBA games: according to the official Dota 2 website (http://blog.dota2.com/), more than ten million unique players participate in the games each month. Dota 2 not only hosts a huge user base but also innately incorporates mechanisms that stress the impact of social ties. Since two opposing teams, each consisting of 5 players, have to compete against each other and demonstrate themselves as the better team, preexisting friendships are put to the test, and strangers are brought together.
Each player has the autonomy to befriend other players, and these constructed social ties are stored in a friendship list on Steam (https://steamcommunity.com/), the online game distribution platform that hosts Dota 2 and hundreds of other games and their associated communities. In each list, both the time of friendship creation and the IDs of the players involved in each dyad are recorded. Social ties are the connections among people used for propagating all forms of information. Depending on the intensity of interaction between the two participants, social ties can be categorized as latent, weak, or strong [35]. Therefore, for an accurate and systematic analysis, in this research I only focus on one particular type of social tie: the friendships recorded in the lists provided by the Steam community. I jointly leverage the behavioral data provided by the log of Dota 2 matches to evaluate human performance. Motivated by the need for a thorough investigation of the influence of social ties on performance, and in light of recent advancements in network science and team science, I analyze our data from four different perspectives:

Figure 3.1: Distribution plots of our dataset. (a) Number of matches per player; (b) duration (in seconds) per match; and (c) time gap distribution (in hours) between consecutive matches.

RQ1: What is the influence of social ties on individual players' activity? I will test whether the presence of social ties affects the activity of individuals within a team. Our hypothesis is that the presence of preexisting friendship ties within a team will increase teammates' activity. I will also test whether there exists a spillover effect (an effect in one context that occurs because of something happening in a seemingly unrelated context) by which even individuals who do not have friendship connections with other teammates, but who play in a team where some players are friends with each other, experience this effect. If social ties have an effect, I will also characterize which dimensions of activity they affect.

RQ2: What is the influence of social ties on team dynamics? I will investigate whether preexisting social ties affect the team's performance as a whole. I will further investigate the subsets of teams composed of high/low experience players and high/low performing players. Our hypothesis is that preexisting social ties improve team performance. I will test whether this is the case, and if so, I will characterize how performance is affected.

While the former two questions focus on measuring effects within single matches, the next two questions focus on effects that span the course of a gaming session (i.e., a nearly uninterrupted sequence of consecutive matches):

RQ3: What is the influence of social ties on individuals over gaming sessions? I will study whether playing game sessions within teams with preexisting social ties affects individuals' short-term activity. I hypothesize that the presence of such ties can mitigate the known effects of deterioration in individual performance over the course of the sessions.

RQ4: What is the influence of social ties on teams over gaming sessions? I will determine whether the short-term performance of team members is affected by the presence of social ties.
Our hypothesis is that social ties can influence team performance and mitigate known session-level deterioration effects.

This chapter is organized as follows: I will first explain the data gathering and preprocessing steps (Section §3.2). In Section §3.3, I will elucidate the methods used to answer our four research questions. The results will be presented and discussed in Section §3.4. I will also provide an overview of the literature concerning social ties, online games, and performance dynamics in Section §3.5. In Section §3.6, I will conclude our study and shed light on its potential applications and future extensions. Below, I summarize the contributions of this work by highlighting the novelties I brought about with respect to previous studies:

• I proposed a methodological framework based on stratified statistical testing that is capable of untangling the role of friendship at the individual and team levels. Our methodology can support future research aiming to unveil patterns in complex systems and team studies.

• Our study reveals the hidden influence social ties exert on different types of players or teams, which are categorized not only by friendship connections but also by experience and skill. Such analysis can lead to new theoretical frameworks to optimize team formation for real-world scenarios.

• I further characterize the effect of performance deterioration over several consecutive matches (gaming sessions) for individual players and teams. Our findings can be applied to mitigate human performance deterioration over repetitive tasks for both the online and offline workforce.

3.2 Data & Statistics

3.2.1 Match log data from Dota 2

Defense of the Ancients 2 (Dota 2) is a multiplayer online battle arena (MOBA) video game. We acquired match log data of Dota 2 from the OpenDota API (https://docs.opendota.com/). This service provides information such as match duration, team members, action statistics of each player, matchmaking type, etc., for millions of Dota 2 matches. The Dota 2 gaming system provides four mechanisms to construct the two opposing teams (matchmaking): normal match, ranked match, practice 1-vs-1 match, and bot match. Since ranked matches are governed by the Matchmaking Rating (MMR) system, friends cannot freely group together—the goal is to create artificially balanced opposing teams, so teams are often composed of random strangers. Practice 1-vs-1 matches and bot matches do not meet our need to evaluate how social ties affect human performance in team-based human environments. Therefore, in our analysis, I only focus on normal matches, where ten human players participate in one 5-vs-5 match.

In such matches, two teams, each composed of five human players, compete to destroy the opposing team's fortified home base, known as the "Ancient", while defending their own. Each player drafts a virtual avatar, known as a hero, to play in each match. The game is designed with an internal nudging mechanism that fosters cooperation, since heroes have complementary abilities (e.g., Pudge is popular for its strength, Sniper is recognized for agility, Invoker is known for intelligence, etc.). Thus, to increase the probability of winning, teammates must coordinate to form balanced teams during the draft phase.
Then, during the ensuing match, teammates need to collaborate by filling various desirable roles (e.g., Carry, Disabler, Support), coordinate through constant communication, and support each other by harvesting resources (e.g., collecting gold by killing AI-controlled mobs called creeps) or casting spells, in order to defend their base and towers, attack and defeat the enemies, and destroy the rivals' towers and base.

Due to players' data privacy settings, some of the match records I collected are incomplete. After discarding these unusable records, our dataset contains 3,566,804 matches, comprising 1,940,047 unique players, and spans from July 17, 2013, to December 14, 2015. Figure 3.1a shows the number of matches per player in our dataset. The average match duration in our data is about 41.8 minutes, and its distribution is displayed in Figure 3.1b.

3.2.2 Friendship data from Steam

Steam is currently the world's largest digital game distribution platform, where registered users can not only purchase and manage a variety of games but also join gaming communities. It is worth noting that the Steam platform and Dota 2 can be synchronized by converting the Steam account ID to the Dota 2 player account ID. Therefore, I am able to link each Dota 2 player's in-game behavioral data acquired from the Dota 2 API with their friend list (and other account metadata) on Steam. According to Steam's official statistics (https://store.steampowered.com/stats/), over 10 million players are active on the platform on a daily basis.

The Steam platform, with its open API, has provided researchers with access to a massive amount of data that has been leveraged to analyze various aspects of players' behaviors. For instance, previous studies have analyzed play-time-related, cross-game behavior of Steam users [113], and statistical abstractions of the different components of game achievements have been proposed using Steam data [40]. Beyond cross-game behaviors, the gamers' social network provided by the Steam community has also caught the attention of the research community: researchers have analyzed the evolution patterns of the Steam community network [7] and utilized the network structure of Steam to identify cheaters [10]. Despite these macro-level analyses, the influence of social ties (i.e., online friendships) on individual and team performance remains largely unexplored. Therefore, I will utilize the friendship lists of Dota 2 players provided by Steam to reconstruct the player social network and closely examine the impact of social ties on players' in-game performance and behavior.

We construct the in-game friendship network using the following steps. First, I identified 2 million players' IDs from the match log data. Then, I requested the friendship lists for all discovered players from the Steam API. After making sure that each friendship pair was formed before the starting time of each match, I could construct the exact team-wise friendship network structure within each match. I describe the teams in each match as a network with 5 nodes, i.e., I use 5 nodes to represent the 5 players and an edge to represent a preexisting friendship (a pairwise friendship formed before the starting time of the match), as sketched below.
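As a minimal sketch of this per-match network construction, assuming `friend_since` maps a sorted player-ID pair to the timestamp at which the Steam friendship was created (names are illustrative, not the actual pipeline):

```python
from itertools import combinations

def match_friendship_edges(team, match_start, friend_since):
    """Edges among the five teammates whose friendship predates the match."""
    edges = set()
    for a, b in combinations(sorted(team), 2):
        created = friend_since.get((a, b))
        # Keep only friendships created before the match started.
        if created is not None and created < match_start:
            edges.add(frozenset((a, b)))
    return edges  # an empty set indicates a team of strangers
```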
Similar to the Dota 2 API, the Steam API also respects each player's data privacy preferences. I requested friendship information for all 1,940,047 distinct Dota 2 players and found that data for 227,045 players was unavailable due to privacy restrictions.

3.2.3 Final dataset

We now combine the information provided by the Dota 2 API and the Steam API. To this aim, I make sure that for each team in our final dataset, all features about the 5 players' friendships and match actions are openly accessible (i.e., not restricted by privacy settings). Our final dataset, therefore, contains 954,731 players, 673,864 teams, and 621,629 matches. I use this final dataset in all our experiments, discussed next. Within it, I have records of 365,412 teams, consisting of 652,215 unique players, who participated in 337,043 normal matchmaking matches; this dataset starts on July 14, 2014, and ends on December 14, 2015, and includes the match features, players' actions, and social ties related to each team.

3.3 Methods

3.3.1 RQ1: Overview of the influence of social ties on individuals

We first introduce the types of social structures I will study. There exist three types of teams in our setting:

• Teams of strangers, where no preexisting friendship ties exist among any players prior to the match;

• Teams of friends, where each team member has at least one friend in the team; and,

• Mixed teams, where some members have at least one friend among their teammates, while others do not.

It is worth noting that our categorization has no distinct definition for teams that are cliques, i.e., where each player is friends with everyone else, because these instances are exceptionally rare in the data at hand. Furthermore, I consider individuals who are playing in teams of friends, as well as in mixed teams, as conditions potentially affected by the influence of social ties. Conversely, I hold teams of strangers as control groups where players cannot be affected by social ties, due to the absence of friendships.

Since our first research question focuses on analyzing the influence of social ties on individual players, for each instance of a match, I divide players into three types (a sketch of this categorization follows the list):

• Null players, i.e., those playing in a team of strangers; due to the absence of social ties, these players are used as null models (thus the name "null players"), or baselines, to compare and contrast with other player types.

• In-friendship players, i.e., those playing in teams of friends, as well as those playing in mixed teams who have preexisting social ties with some teammates (both sets of players may directly experience the influence of social ties).

• Out-friendship players, i.e., those playing in mixed teams who do not have preexisting social ties with any of their teammates (yet may indirectly benefit from their teammates' preexisting friendships).
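This categorization can be sketched as follows, reusing the edge set from the previous sketch; the labels mirror the three player types just defined:

```python
def categorize_players(team, edges):
    """Map each of the five teammates to 'null', 'in-friendship',
    or 'out-friendship', given the team's preexisting friendship edges."""
    if not edges:                   # all-stranger team: everyone is a null player
        return {p: "null" for p in team}
    tied = set().union(*edges)      # players touched by at least one tie
    return {p: "in-friendship" if p in tied else "out-friendship"
            for p in team}
```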
In Section §3.4.1, I analyze the influence of social ties on players with friends and players without friends separately. Take the study on teams of friends, for instance: I compare the statistical distributions of in-game actions (kills, assists, and deaths) of in-friendship players with those of the null players, contrasting the distributions through both statistical analysis and visual analysis (using so-called violin plots). I carry out t-tests to support or reject our hypothesis: if the t-test results are statistically significant (i.e., p-value < 0.05) across all observed distribution pairs, our hypothesis is confirmed, and thus I observe an effect of social ties on in-game activity. It is worth noting that the data of null players are randomly sampled with reshuffling to yield samples of the exact same size as the samples of in-friendship players. Likewise, I use the same null model strategy to analyze the impact of social ties on out-friendship players.

3.3.2 RQ2: Overview of the influence of social ties on teams

After answering our first research question, I proceed to include not only in-game actions but also performance and experience in our analysis. I use the kill-death-assist ratio (KDA) to measure both the performance of individual players and the performance of teams. KDA can be formalized as (k + a)/max{1, d}, where k, d, and a represent kills, deaths, and assists, respectively, for a player or team in a given match.

Teams composed of players with very high/low experience and very high/low skill are of particular interest for our analysis, as they may exhibit noteworthy behavioral patterns: for example, they may exacerbate the effect of social ties' influence in one direction or another. For this purpose, I add two match-based features for each team, i.e., the average KDA and the experience. I calculate the average KDA by averaging all five players' KDA in the current match, and I calculate the experience by summing the number of past matches of all five players up to the current match. I select the top 25% in either feature as a high-level category and the bottom 25% as a low-level category. Having divided the teams into the four resulting categories, I further compare the actions and performance of the whole team, of the players with friends (in-friendship players), and of the players without friends in mixed teams (out-friendship players) with those of the null players in each category. I compute the difference in each case as (Y − X)/X, where Y is the mean of the actions or performance of players (or teams) who may be subject to the influence of social ties, and X is the mean of the actions or performance of null players (teams of strangers). Thus, to summarize, I select four categories of teams as follows (a sketch of the KDA measure and of this stratification follows the list):

• Low Experience & Low KDA: teams in the bottom 25th percentile by experience (number of played matches summed over the team) as well as by performance (team members' average KDA).

• High Experience & Low KDA: teams in the top 25th percentile by experience and in the bottom 25th percentile by performance.

• Low Experience & High KDA: teams in the bottom 25th percentile by experience and in the top 25th percentile by performance.

• High Experience & High KDA: teams in the top 25th percentile by experience as well as by performance.
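A compact sketch of these computations follows; `teams` is an assumed pandas DataFrame with one row per team-match, holding `avg_kda` and `experience` columns (the column names are illustrative):

```python
import pandas as pd

def kda(kills: int, deaths: int, assists: int) -> float:
    """KDA = (k + a) / max{1, d}."""
    return (kills + assists) / max(1, deaths)

def pct_diff(observed_mean: float, null_mean: float) -> float:
    """(Y - X) / X, the difference against the null model of all-stranger teams."""
    return (observed_mean - null_mean) / null_mean

def stratify_teams(teams: pd.DataFrame) -> pd.DataFrame:
    """Keep only teams in the top/bottom 25% of experience and average KDA."""
    df = teams.copy()
    lo_kda, hi_kda = df["avg_kda"].quantile([0.25, 0.75])
    lo_exp, hi_exp = df["experience"].quantile([0.25, 0.75])
    df["kda_level"] = pd.NA
    df.loc[df["avg_kda"] <= lo_kda, "kda_level"] = "Low"
    df.loc[df["avg_kda"] >= hi_kda, "kda_level"] = "High"
    df["exp_level"] = pd.NA
    df.loc[df["experience"] <= lo_exp, "exp_level"] = "Low"
    df.loc[df["experience"] >= hi_exp, "exp_level"] = "High"
    # Teams in the middle 50% of either feature fall outside the four categories.
    return df.dropna(subset=["kda_level", "exp_level"])
```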
3.3.3 RQ3: Influence of social ties on individuals over sessions

Often, individual players tend to complete a sequence of matches, rather than a single match, before they decide to stop playing. Playing consecutive matches may bring tiredness and boredom, which could, in turn, affect players' performance; on the other hand, playing consecutive matches could also help train proficiency. Due to this dichotomy, this aspect warrants further investigation. Therefore, I formalize consecutive playing patterns as individual gaming sessions.

Since I don't know exactly when players start or interrupt a gaming session, I need to infer such sessions from the start/end times recorded in each match's meta-data. We set 1 hour as the threshold to split gaming sessions: if the time gap between the end of a match and the beginning of the next match, for a given player, is shorter than one hour, I assume that these two matches belong to the same gaming session; otherwise, I split the two neighboring matches into separate gaming sessions. I calculate all the time gaps—the time intervals between the end of a match and the beginning of the subsequent match—for each player and concatenate all players' time gaps together. Fig. 3.1c shows the distribution of time gaps shorter than 24 hours in our dataset: among these 84K time gaps, the median is 1.265 hours, supporting our choice of a 1-hour threshold to split sessions. Each gaming session consists of a list of consecutive matches ordered by their starting time. The index of a match in this sequence is called the match position: for example, in a session of four matches, the first match is the match in position one, and the last is the match in position four.
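A minimal sketch of this splitting rule, assuming a player's matches are given as (start_time, end_time) datetime pairs sorted by start time:

```python
from datetime import timedelta

def split_sessions(matches, gap=timedelta(hours=1)):
    """A new session starts whenever the gap between a match's end and the
    next match's start is one hour or longer."""
    sessions, current = [], []
    for start, end in matches:
        if current and start - current[-1][1] >= gap:
            sessions.append(current)
            current = []
        current.append((start, end))
    if current:
        sessions.append(current)
    return sessions  # the match position is the 1-based index within a session
```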
To isolate the effect of social ties on gaming sessions, in RQ3 I focus on individuals who only play with friends throughout the entire session. In this way, I reduce the variability that may arise from mixed sessions in which users play both with and without friends. Of course, this filter also reduces the number of sessions suitable for analysis. We use two strategies to analyze social ties' impact on these individuals:

• First, I study the individuals' KDA trajectories throughout the gaming sessions. I only utilize gaming sessions of length 1 to 4, as data about sessions of length greater than 4 are very sparse (for reference, a gaming session of length 4 usually spans between 3 and 5 consecutive hours of uninterrupted playing; anecdotally, in our data, I observe isolated instances of sessions that last up to 20 consecutive hours). For gaming sessions of different lengths, I separately aggregate the KDA at each match position and use separate line plots to visualize the trajectories over the course of the sessions. Then, I randomly reshuffle the sequence of matches in all gaming sessions and reconstruct the trajectories based on the shuffled data—this is used as a randomized null model. By comparing the trajectories with original match positions against trajectories with randomized match positions, I exclude the possibility that any emerging trend is produced just by chance.

• Second, I compute the KDA difference between the last and the first match of a session, expressed as (Y − X)/X, where Y is the KDA performance in the last match of a gaming session and X is the KDA performance in the first match of that session. I adopt the KDA difference to capture the variation of overall performance throughout the whole session, i.e., the overall size of such an effect.

3.3.4 RQ4: Influence of social ties on teams over sessions

In the previous section, I introduced the notion of gaming sessions for individual players. For each match, I define a team's gaming session position as the average session position of its 5 individual players (see the sketch at the end of this subsection). For example, for a given team in a given match, three players may be playing the first match of their session, one player may be playing the second match of their personal session, and one player may be playing the fourth match of their personal session: in this case, the average session position for this team in this match would be (1+1+1+2+4)/5 = 1.8. Therefore, due to the employed averaging strategy, the length of a team's gaming session is expressed as a range, i.e., gaming sessions of length [1-2), [2-3), and [3-4). Sessions of average length greater than 4 are exceptionally rare and, therefore, excluded from our analysis.

To answer RQ4, I analyzed the kills, assists, and deaths, as well as the KDA performance, of players subject to the influence of social ties, over the teams' gaming sessions. I use one pair of violin plots to visualize the distributions of each type of action (or of KDA) for in-friendship versus out-friendship players in each team, and I then use t-tests to verify the statistical difference of each pair. To investigate the trend of teams' actions and performance as the length of gaming sessions increases, I organize the plots by comparing each type of action (or KDA) across teams of different session lengths. Consider, for example, the plot tracking team kills (Fig. 3.5a): from left to right, the three pairs of violin plots belong, respectively, to teams with average session lengths of [1-2), [2-3), and [3-4).
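The team-level session position just described can be sketched as follows; `positions` holds the five players' individual match positions within their own sessions:

```python
def team_session_position(positions):
    """E.g., [1, 1, 1, 2, 4] -> 1.8, which falls in the [1-2) bucket."""
    return sum(positions) / len(positions)

def session_bucket(avg_position):
    """Bin the average position into the ranges used in the analysis."""
    for lo in (1, 2, 3):
        if lo <= avg_position < lo + 1:
            return f"[{lo}-{lo + 1})"
    return None  # averages of 4 or more are too sparse and are excluded
```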
3.4 Results

In this section, I present the results in four parts, corresponding to the four proposed research questions.

3.4.1 RQ1: Influence of social ties on individual players' activity

Our hypothesis is that if social ties have some form of influence on players' activity, in-friendship players will experience this effect directly (since these are the players who are playing with some friends), while out-friendship players may experience it indirectly (a spillover effect), even without playing with friends, by playing with teammates who are friends with each other. We evaluate observed players' actions (kills, assists, and deaths) against the null model, i.e., we compare individuals' actions in the presence of social ties with those in all-stranger teams. The data of players in all-stranger teams are randomly shuffled and then under-sampled (or over-sampled) to match the number of players in the data of our observed conditions.

The plot for in-friendship players is shown in Figure 3.2a; the plot for out-friendship players can be found in Figure 3.2b.

Figure 3.2: Social ties' impact on individuals playing with friends and individuals playing without friends. Violin plots convey the statistical distribution distinctions for (a) individuals playing with friends and (b) individuals playing without friends, in teams of friends and mixed teams, compared against null players (those who play in all-stranger teams). Shuffled null players are displayed as orange violins (right violin of each pair), and observed individuals with and without friends are shown as green violins (left violin of each pair). Stars representing t-test statistical significance are shown in all subplots (∗∗∗ means p-value ≤ 0.001).

Note that in Figure 3.2a, I use "With Friends" to label the distributions associated with in-friendship players, while in Figure 3.2b, I use "Without Friends" to mark the distributions associated with out-friendship players. Stars in all plots represent the t-test statistical significance obtained by comparing observed conditions against the null model (null players, i.e., those in all-stranger teams where no social ties exist). For all the plots in this chapter, I consistently represent statistical significance by stars, where ∗ means p-value < 0.05, ∗∗ means p-value < 0.01, ∗∗∗ means p-value < 0.001, and ns means not significant.

By inspecting Figure 3.2a, I observe that, in comparison to null players (players in all-stranger teams), in-friendship players have a higher number of kills and assists. However, deaths also rise along with kills and assists. In other words, in-friendship players are more engaged and active in the game, which leads to an increased number of in-game actions, both positive (kills and assists) and negative (deaths). Such an effect suggests that in-friendship players may tend to adopt more aggressive or impulsive strategies. Figure 3.2b shows the reverse pattern: contrary to in-friendship players, out-friendship players have relatively fewer actions than null players. Such a decrease suggests that players without friends may tend to act in their own best interest, adopting a more conservative play style. Alternatively, they may be left out of the team's coordination and, therefore, be exposed to less game action, thus having fewer opportunities throughout a match to accomplish both positive and negative actions.

3.4.2 RQ2: Influence of social ties on team dynamics

We divide teams with preexisting friendships into four categories: (i) Low Experience & Low KDA, (ii) High Experience & Low KDA, (iii) Low Experience & High KDA, and (iv) High Experience & High KDA teams. By comparing each category (observation) with our null model (all-stranger teams), I analyze the actions (kills, deaths, assists) and the performance (KDA) statistically by calculating the percentage difference, defined as (Y − X)/X, where Y is the mean action/performance (KDA) of individual players in teams with social ties, and X is the mean action/performance (KDA) of individual players in the null model. Take Low Experience & Low KDA teams, for instance, with the percentage difference computed at the whole-team level: representing the players' average kills in teams with preexisting friendships as k, and the null players' average kills in all-stranger teams as k′, the kills percentage difference is (k − k′)/k′. In a similar manner, I express the percentage differences of deaths, assists, and KDA as (d − d′)/d′, (a − a′)/a′, and (KDA − KDA′)/KDA′. Note that the average KDA is calculated over the individual players' KDA in the team for a given match, so it cannot be inversely deduced from the average k, d, and a calculated over all matches of the entire category.

Team Category                 Condition         Kills   Deaths   Assists   KDA
Low Experience & Low KDA      Whole Team          87%    129%      19%      79%
                              In-Friendship       96%     15%      27%     474%
                              Out-Friendship      79%     97%       9%     990%
High Experience & Low KDA     Whole Team         109%    125%      18%      89%
                              In-Friendship      185%    209%      54%     437%
                              Out-Friendship      48%     56%      -1%     392%
Low Experience & High KDA     Whole Team          17%     23%      94%     -36%
                              In-Friendship       25%     35%     109%     -32%
                              Out-Friendship       1%      8%      79%     -28%
High Experience & High KDA    Whole Team          39%     52%     151%     -32%
                              In-Friendship       67%     85%     208%     -26%
                              Out-Friendship       9%     13%      89%     -23%

Table 3.1: Percentage difference of the 4 categories of teams' actions/performance compared with the null model of all-stranger teams.

In Table 3.1, I report the results I obtained. I can observe that low-experience & low-KDA teams have kills, assists, and death actions all higher than those of all-stranger teams.
In terms of positive percentage gains in actions, in-friendship players are the biggest winners, since they almost double their number of kills and show a 27% increase in assists. However, out-friendship players are the largest beneficiaries in terms of KDA performance: although in-friendship players gained a 474% performance boost by collaborating with friends, out-friendship players received roughly twice that benefit through an indirect effect. This effect is not consistent for high-experience & low-KDA teams, whose in-friendship players are the biggest gainers in all actions as well as in KDA performance; moreover, their out-friendship players show an increased unwillingness to help their teammates, since their assists are even lower than those of null players. For the remaining two categories with high KDA, in-friendship players have the largest percentage gains in actions, but the whole team is the biggest loser in terms of KDA performance.

When comparing across the four categories, in-friendship players and the whole team have consistently positive gains in kills, deaths, and assists. I observe that in-friendship players in high-experience teams double their actions in comparison to their low-experience counterparts. However, this is not the case for out-friendship players, as their actions are not obviously affected by experience differences. This observation reveals that, for players with friends, experience boosts activity, but it is ineffective on performance. I also observe that in high-KDA scenarios, teams with preexisting social ties experience drastic KDA drops in comparison with all-stranger teams: the sharpest percentage loss concerns the whole team's performance, followed by losses of roughly 20-30 percent for in-friendship and out-friendship players. For low-KDA teams with preexisting social ties, instead, the KDA improves drastically compared with teams of all strangers: at the whole-team level, performance rose by almost 80 percent, while in-friendship players exhibit over 4 times higher KDA.

In summary, low-performance players, regardless of their experience, exhibit the highest gains in KDA when preexisting social ties are present in the team. Conversely, high-skill players exhibit significant decreases in KDA, regardless of their experience, when social ties are present in the team. In other words, playing with friends benefits almost exclusively low-performance players, who drag down the performance of their better-skilled friends.

3.4.3 RQ3: Influence of social ties on individuals over sessions

To answer RQ3, I focus on analyzing the impact of social ties on individuals who only play with friends throughout the entire gaming session. Figure 3.3 displays the KDA trajectories of such individuals: the left plot shows the actual data, suggesting the presence of individual performance deterioration over the course of gaming sessions. For example, for sessions of length 3, the average KDA in the first match of such sessions is above 3.3, while the average drops below 3.2 in the third and last match. This effect is visible across the three conditions with session length greater than one. I then verified our findings via randomization, to exclude the possibility that the performance deterioration phenomenon was created by chance (random effect).
The right plot of Figure 3.3 shows the reshuffled data, where the effect of the match position is disrupted: as expected, the lines flatten out, suggesting that the position of a match in a session indeed has an effect on performance, corroborating the performance deterioration hypothesis in line with recent research results [102, 104, 106].

Figure 3.3: KDA trajectories of individuals who only play with friends throughout the entire gaming session. The left plot shows the actual data, suggesting the presence of individual performance deterioration over the course of gaming sessions. The right plot shows the reshuffled null model, where the effect of match position is disrupted (therefore, the lines are expected to become flat).

To quantify the effect size of this performance deterioration, I compute the KDA percentage change between the last and the first game in each session. Figure 3.4 shows the percentage change in KDA performance for sessions of length 1, 2, 3, and 4. I observe a greater effect size for sessions of length 3 and 4, where the KDA drops by approximately 5% (the randomized model, as expected, shows a flat line, suggesting the absence of such an effect in the null model).

Figure 3.4: Changing rate of KDA performance in sessions of different lengths for individuals who only play with friends throughout the entire gaming session. This plot shows the KDA percentage change between the first and the last game of a session for in-friendship players (those who played the entire session with some friend(s) in their team).

Summarizing, the results in Figures 3.3 and 3.4 reveal that players who only play with friends in a gaming session display an apparent trend of performance deterioration. I conducted exactly the same procedures for players who play in teams where friendships exist but who have no friends themselves, and for players who only play with strangers; however, I did not observe such a strong trend of performance deterioration for these two groups.

3.4.4 RQ4: Influence of social ties on teams over sessions

We now analyze how actions and performance change with different session positions in the presence of social ties over the entire team. As mentioned in Section §3.3.4, the average gaming session length of a team falls into three ranges: [1-2), [2-3), and [3-4); data of gaming sessions beyond that average length are excluded due to high sparsity and low significance. Furthermore, I concentrate exclusively on teams with social ties. In Figure 3.5, I compare in-friendship players with out-friendship players: Figures 3.5a, 3.5b, 3.5c, and 3.5d convey, respectively, the kills, assists, deaths, and average KDA distributions of in-friendship versus out-friendship players at different session positions. All the distributions of out-friendship players are labeled as "out", whereas those of in-friendship players are labeled as "in".

Figure 3.5: Social ties' impact on teams over gaming sessions. Violin plots convey the statistical distributions of players with friends versus players without friends in teams with preexisting social ties, for four aspects: (a) kills, (b) assists, (c) deaths, and (d) KDA performance. In-friendship players' data are displayed as green violins (left violins), while out-friendship players' data are shown as orange violins (right violins). Stars represent t-test statistical significance (∗ means p-value < 0.05, ∗∗ means p-value < 0.01, ∗∗∗ means p-value < 0.001, ns means not significant).
Our results overall reveal that, throughout a gaming session, individuals playing with friends have gradually increasing kills, assists, deaths, and KDA, while individuals playing without friends have decreasing actions and performance. These results suggest that, in addition to experience, the presence of social ties in a team can also help players mitigate performance deterioration over the short term, throughout a gaming session.

3.5 Related Work

3.5.1 Social ties in teams

Team science is essential to organizations, informal groups, and individuals [136, 37, 15, 19]. Considerable attention has been paid to teams across a range of interdisciplinary challenges. However, the factors affecting team performance in complex, realistic task environments remain scarcely understood, both in theory and in practice. A recent work by Mukherjee et al. [83] explores the impact of previous collaboration history on the performance of teams by analyzing both sports (football, cricket, baseball) and esports (Dota 2). Their results suggest that success shared in prior team experiences is an excellent predictor of future team success. However, their research focused on a specific co-play connection preexisting in the team; social ties in the form of personal relatedness, which are the main topic of this chapter, were not explored.

Prior research in the context of business psychology and organizational management examined the effect of friends (vs. strangers) working together as a team. Chung et al. [17] revealed that friendship has a significant positive effect on group task performance; however, Pillemer and Rothbard [92] explained the downsides associated with workplace friendships. This line of work has led to conflicting conclusions about the role of friendship in teams in the workplace. To clarify this complex scenario, our research tries to untangle both the positive and negative impacts of friendship on teams. To obtain a more refined investigation of the impact of preexisting social ties on teams, I decided to hierarchically dissect teams: for RQ2, I divided teams by high/low skill and experience and examined the comparative difference with all-stranger teams at the same level. Although this chapter focuses on individual and team performance in online games, I hope that the knowledge gained here can be transferred to broader teamwork scenarios, including workplace teams.

3.5.2 Social ties in online games

Previous research on a Massively Multiplayer Online Role-Playing Game (MMORPG), Dragon Nest [132], revealed that successful and unsuccessful teams were homogeneous in terms of several characteristics, but that successful teams were more often formed on the basis of friendship than unsuccessful ones. Instead of treating friendship as a feature to predict team success, I work in the opposite direction, analyzing the data to unveil the hidden patterns of friendship's impact on team performance. Mason and Clauset [78] used in-game data from Halo: Reach, a multiplayer first-person shooter game, coupled with survey responses, to show that players who name each other as friends in the survey tend to perform better together than apart.
These self-reported friends showed improved overall team performance and an increased rate of pro-social behaviors. These results align with our findings, suggesting that teams with preexisting friendships, based on Steam's social network (friendship list) data, exhibit many more actions (kills, assists, and deaths) than all-stranger teams.

3.5.3 Performance deterioration effects

A recent research thread is concerned with quantifying the temporal dynamics of performance in techno-social systems. Short-term deterioration of individual performance was previously observed in real-world (offline) tasks. Recent studies investigate this phenomenon by drawing a parallel with online platforms: research shows that the quality of comments posted by users on Reddit [114], the answers provided on StackExchange question-answering forums [28], and the messages written on Twitter [65] and Facebook [66] decline throughout an activity session. Beyond individual online behaviors, short-term deterioration effects have also been found in virtual teams in MOBA games [102, 104, 106]. These results form the basis for RQ3 and RQ4. Our analysis adds to these studies by revealing that social ties can play a role in mitigating performance deterioration throughout an activity session. Such mitigation, however, is not homogeneous across all individuals in a team with preexisting friendships: it tends to benefit in-friendship players more, while out-friendship individuals may not be affected by social ties when it comes to mitigating performance deterioration.

3.6 Conclusion

In this chapter, I investigated four research questions concerned with measuring the influence of social ties within single matches and over the course of a gaming session. Differently from prior studies, which utilized co-play history and self-reported friendship as features to predict team success [37, 83], our work explains how explicit friendship ties impact both individual and team performance in online games.

In summary, our results reveal that preexisting friendships within a team increase the actions of in-friendship teammates and create a spillover effect that decreases the actions of out-friendship players. Additionally, friendship does not necessarily guarantee the performance of a team: low-performance players (regardless of their experience) benefit the most, while high-skill players exhibit significant performance decreases. By tracing individual trajectories, our analyses suggest that social ties may relate to performance deterioration; while tracing teams with preexisting social ties, I found that social ties help mitigate in-friendship players' performance deterioration. Even though our macro-level analysis is limited to in-friendship and out-friendship interactions, it suggests a promising research direction for further inquiry: due to a lack of micro-level interaction data (e.g., voice communication, chat logs, etc.), I was unable here to fully unveil the intricate patterns of in-friendship and out-friendship connections and the causal mechanisms that drive the observed effects.

Our research suggests that preexisting social ties among team members are critical to team actions and performance. It is noteworthy that the empirical evidence of our research transcends the specific characteristics of Dota 2: although our analysis is restricted to the domain of online games, it could be extended to other team-based environments such as sports, organizational teamwork, and social functions.
This study advances our understanding of the factors that may contribute to the competitive edge of a team. Prior research [83, 17, 92] has focused on the roles of previously shared success, individual skills, and personality in making teams more or less competitive. In contrast to the extant literature, this study demonstrates the competitive advantage derived from preexisting social ties among team members. To generalize our results, analogous analyses along the lines of what I proposed in this chapter should be applied to other techno-social systems; to do so, researchers would require data capturing who interacts with whom at what point in time, alongside historical and current friendship information. Preferably, researchers would also collect different performance and outcome variables, to test whether specific interaction patterns are associated with differential performance levels (e.g., of groups, of the individuals in the groups, or of systems of groups). Since, in this context, there is often no established theoretical framework to predict exactly when or for how long an outcome is expected to occur, the ideal data would include sufficiently long longitudinal observations. We plan to carry out some such studies ourselves in the future, targeting other types of games, as well as virtual teams in online and offline task-specific settings, individual- and team-based settings in virtual reality, and, more in general, both competitive and collaborative endeavors, to study how social network dynamics may affect human behavior and performance, including in teamwork.

Chapter 4
Learning Purchase Decision Representation: Purchase Sequence Generation in Round-based Games

4.1 Introduction

Over the last couple of years, I have seen the gaming AI community moving towards training agents in more sophisticated games, like Doom [59], StarCraft [74], Minecraft [38], and Dota 2 [88]. These online games are match-based, fast-paced, highly strategic, and involve adversarial real-time battles. However, existing studies are restricted to treating the complete continuous game as a single task, which does not generalize to the case of round-based games. I define a round-based game as a meta-game that can be decomposed into multiple games, which may be independent of each other and governed by different rules. An example of round-based game reasoning: a professional Dota 2 player plays five games in total but has already lost two games in a row. How will he play the third one? Will he choose an aggressive strategy? Such round-based game reasoning is a fundamental problem in building video game AI, as its strategies can also be applied to continuous games. For example, the death of a player's character in Dota 2 can be considered a round, or micro-game, that contributes non-equally to the entire meta-game. Utilizing information at the round level could be crucial to the complete game reasoning process.

In this work, I am particularly interested in tackling the new challenges of round-based games by proposing a simple scenario: each round only requires one action to receive a result, and each round contributes equally to the entire meta-game. To this end, I introduce a round-based dataset collected from CS:GO professional match replays, which consists of 5167 matches; each match contains a maximum of 30 rounds and exactly 10 players on two sides. Winning the entire game requires winning at least 16 rounds.
The agent learns the reasoning behind the players' weapon purchasing decisions at the beginning of each round, given observations of previous rounds' match statistics together with the current round's weapon information, described in Section 4.4. I aim to build a human-centered AI that learns to reason from human decision-making and, in return, helps interpret the process, rather than achieving the best in-game performance.

We propose three approaches to deal with this reasoning challenge:

• Greedy Algorithm: this model buys the most expensive affordable weapon sequence in each round.

• Sequence Reasoner: since a player buys a sequence of weapons in each round, and each weapon belongs to one of three types (gun, grenade, or equipment), I consider this task a multi-task sequence generation problem, with weapon embeddings pre-trained from the context of weapon sequences.

• Sequence Reasoner with Round Attribute Encoder: this model additionally encodes the player's previous round history through a Round Attribute Encoder (RAE) into an auxiliary round attribute for the Sequence Reasoner.

Extensive experiments demonstrate the effectiveness of our proposed third method. However, the results are still not close to the level of the original professional players. Thus, I believe that the proposed CS:GO weapon purchasing dataset can be an important new benchmark, and our model sheds light on future work on round-based AI reasoning.

4.2 Related Work

4.2.1 Learning to Learn & Few-Shot Learning

Existing learning-to-learn or meta-learning studies [49, 123] mainly focus on supervised learning problems. A particularly challenging problem is learning with few training examples, i.e., few-shot learning. Generally, few-shot learning datasets contain several tasks, and for each task, there is a limited number of examples with supervised information. Few-shot learning algorithms can improve on new tasks using the provided supervision. Within the learning process, the meta-knowledge is extracted by a meta-learner, which learns to generalize the meta-knowledge to each specific task. Vinyals et al. [129] use Matching Networks with attention and memory to enable rapid learning. Snell et al. [116] propose Prototypical Networks. While Ravi and Larochelle [97] use LSTM-based meta-learning to learn an update rule for training a neural network learner, Model-Agnostic Meta-Learning (MAML) [29] learns a good model parameter initialization that can quickly adapt to similar tasks. Reptile [86] is a first-order approximation of MAML, which is remarkably simple and performs similarly well. In this study, I adapt Reptile to our framework; a sketch of its meta-update follows.
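The sketch below illustrates one generic Reptile meta-step, assuming a PyTorch model exposing a `loss(batch)` method and a `sample_task()` function yielding support batches; both names are assumptions, and this is a simplification rather than the exact training code used in this chapter.

```python
import copy
import torch

def reptile_step(model, sample_task, inner_steps=5, inner_lr=1e-2, meta_lr=0.1):
    task_model = copy.deepcopy(model)
    opt = torch.optim.SGD(task_model.parameters(), lr=inner_lr)
    for _ in range(inner_steps):              # inner loop: adapt to one task
        opt.zero_grad()
        task_model.loss(sample_task()).backward()
        opt.step()
    with torch.no_grad():                     # outer loop: move the meta-
        for p, q in zip(model.parameters(),   # parameters toward the task-
                        task_model.parameters()):
            p += meta_lr * (q - p)            # adapted weights (first-order)
```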
In our work, I introduce a novel round-based gaming dataset based on CS:GO. Each round can be considered an independent gaming episode that contributes equally to a match in which multi-round meta-strategies exist. Players can strategically lose some gaming episodes in exchange for winning a long-term goal.

4.3 Task

4.3.1 Few-shot Learning

Each team may develop several multi-round economy strategies that deliberately let go of some disadvantaged rounds temporarily to save money and build up comparative advantages for future rounds. Within each game, each player also has their own preference for weapon purchasing, and their financial status profoundly impacts their purchasing policy in each round. Due to the complexity and diversity of each player's attributes (policies), I cast the task into the learning-to-learn framework. For each game, the first few rounds are observed, and the model learns to predict the later rounds. This few-shot task setting gives agents more opportunities to learn players' dynamic attributes during inference, and challenges them to learn more generalized policies that can quickly adapt to the current player after a few observations.

4.3.2 Problem Formulation

I treat each match as a separate data point. Each match consists of 10 different tasks, one from each of the 10 players' perspectives. Since each player has their own weapon preferences, I formulate the problem as a few-shot learning task, fostering the agent's ability to capture those preferences from the few support shots. In each match $M_i$, player $i$ goes through $n_i$ rounds, denoted $R_{i,j}$, $j \in [1, n_i]$, so that $M_i = \{R_{i,1}, R_{i,2}, \dots, R_{i,n_i}\}$. I use the first $K$ rounds as $K$-shot training examples: the model adapts on these $K$ shots (the support set) and is asked to behave well on the other $n_i - K$ rounds (the target set).

I formulate the problem in a reinforcement learning setup. At the beginning of each round, an agent (a player) estimates the state and takes a single action that stands for a weapon-purchasing set. The state of the agent includes the weapons and money of all players.

I introduce the formulation for a match $M_i$ and, for simplicity, omit the subscript $i$. For the match $M$, the agent's possessions at the $j$-th round are composed of the weapons it owns, $X_j = \{x_{j,1}, x_{j,2}, \dots, x_{j,m_j}\}$, and its money $c_j$. At round $j$, the history information $H_j = \{E_1, E_2, \dots, E_{j-1}\}$ contains empirical information from past rounds. The empirical information of the $j$-th round, $E_j$, consists of the final weapons after purchasing, $X'_j = \{x'_{j,1}, x'_{j,2}, \dots\}$, and the performance score $s_j$. For the $j$-th round, given the agent's own weapons $X^s_j$, the team's weapons $X^t_j = \{X^t_{j,1}, X^t_{j,2}, X^t_{j,3}, X^t_{j,4}, X^t_{j,5}\}$, and the opponents' weapons $X^o_j = \{X^o_{j,1}, X^o_{j,2}, X^o_{j,3}, X^o_{j,4}, X^o_{j,5}\}$, along with the history information $H_j = \{E_1, E_2, \dots, E_{j-1}\}$ and all players' money, the agent needs to generate an action $A_j$ that approaches the ground-truth label $\hat{A}_j$.
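To make the support/target split concrete, the following is a minimal sketch of how one few-shot episode could be assembled from a parsed match. The field names (players, rounds) are illustrative and not the dataset's actual schema.

```python
# Minimal sketch of the K-shot episode split; field names are assumptions, not the dataset schema.
def make_episode(match: dict, player_id: int, k: int = 5):
    rounds = match["players"][player_id]["rounds"]  # one record per round for this player
    support = rounds[:k]   # first K rounds: the model adapts on these
    target = rounds[k:]    # remaining n_i - K rounds: the model is evaluated (and, in training, updated) on these
    return support, target
```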
4.4 Dataset

I aim to build a dataset dedicated to round-based game reasoning. To build it, I collect professional CS:GO players' match replays from 2019.

4.4.1 Parsing Replays

I design a systematic procedure to process the replays. First, I parse the replays using the demofile parser (https://saul.github.io/demofile/index.html). I then filter out anomalous data to ensure quality. For all replays, I extract all information related to weapon purchasing. Specifically, I capture players' weapon pickup and weapon removal actions. I also extract each player's state three times per round: at round start, at the end of the weapon purchasing period, and at round end. Table 4.1 gives a detailed description of the extracted states. The performance score is provided by the CS:GO in-game scoreboard, which is based on the player's kills and bombs planted/defused. I use the normalized score in our Round Attribute Encoder to encode past rounds.

Attribute            Description
account              current cash
cash spent           cash the player has spent this round
weapons              all weapons held by this player
items value          sum of current items' prices
performance score    player's score in the scoreboard

Table 4.1: Description of the extracted information of a player for each round. The performance score is described in Section 4.4.1.

After parsing, I convert the replays into a structured JSON format. Data cleaning is subsequently performed to ensure data quality: I drop data whose purchased weapons are inconsistent with the money spent, obtaining consistent data for 5167 matches. I randomly shuffle the matches and split them into training, development, and test sets in the ratio 8:1:1, consisting of 4133, 517, and 517 matches, respectively.

4.4.2 Statistics

In CS:GO, there are 44 different weapons in total, including 34 guns, 6 grenades, and 4 pieces of equipment. Guns fall into 6 types: pistols, shotguns, SMGs, automatic rifles, LMGs, and sniper rifles. Equipment includes the helmet, vest, defuse kit, and Zeus x27. In order to obtain high-level representations of their intrinsically diverse attributes, I use a self-supervised learning method to train the weapon embeddings. I treat guns, grenades, and equipment as three types and perform generation separately in our model.

Type        0        1        2        3        4
Gun         35.9%    61.6%    2.4%     0.1%     0%
Grenade     19.4%    12.6%    14.6%    16.4%    37.0%
Equipment   38.3%    50.3%    10.7%    0.7%     0%

Table 4.2: The distribution of purchasing action counts for each type of weapon. Guns and equipment purchased 4 times in one purchasing sequence are very rare, so they are rounded down and shown as 0% in this table.

Table 4.2 shows the frequency distribution of the three categories of purchasable items, i.e., gun, grenade, and equipment, in the purchasing sequences per round; each row shows how often each type of weapon is purchased within a purchasing sequence. Table 4.2 does not contain information about the exact position within a sequence: I sort all weapon sequences in the order gun, grenade, equipment, based on human prior knowledge. Although a player can only carry 1 primary gun and 1 pistol, there are cases in which players buy more than 2 guns, presumably for their teammates. As this chapter studies the task as a single-agent round-based problem, I do not remove these cases and leave learning collaborative purchasing to future work. A player can carry at most 4 grenades and 1 of each type of equipment, so the maximum action length for purchasing is 4.
4.5 Methodology

Figure 4.1: Model architecture. The weapon encoder generates a weapon summary based on a set of weapon features of a player. The team encoder encodes a set of weapon summaries into a single team summary. The round attribute encoder encodes the player's information from previous rounds. The state representation is then fed into three LSTMs separately. Three gate classifiers are trained with strong supervision to determine whether the generation of a certain type of action is suitable.

4.5.1 Meta-learning Algorithm

The detailed procedure is described in Algorithm 1. I investigated MAML and its first-order simplification [29] and moved to Reptile [86], since it is also first-order and performs reasonably well in many tasks. Note that the original algorithm is used for image classification, where each task/class contains several examples. In our case, however, there are more tasks (matches) and less data to support each. Thus, I update the model parameters on the target sets of the training data as well, whereas the original approach only performs evaluation there. I use vanilla SGD for the meta-learning loop, which samples a single task at each step. In the inner loop for adapting to each task, I use Adam [62] as the optimizer.

Algorithm 1: Our modified model-agnostic Reptile meta-learning algorithm
  k: number of shots in few-shot learning; ε: meta-learning step size.
  Initialize model parameters θ.
  repeat
      Sample a single match M_i (with repetition) and initialize θ' ← θ.
      for j = 1 to k do
          Sample a single round R_{i,j} and compute the loss L.
          θ' ← θ' − ∇_{θ'} L
      end for
      Compute the target-set loss L' on the remaining rounds.
      θ'' ← θ' − ∇_{θ'} L'
      Update θ ← θ + ε (θ'' − θ).
  until convergence

4.5.2 Atomic Action and Embedding

Although each round only requires a one-time purchase to receive a result, the purchasing action space is huge and complex: it contains all combinations of the 44 weapons for each player. I therefore approximate the complex joint action by splitting it into a sequence of atomic actions, where each atomic action represents a single weapon purchase. According to the authors' human priors, professional players mostly show the same purchasing pattern: they prefer to buy guns first, then grenades, and equipment last if there is still money left. I therefore sort each action sequence in this order, in accordance with professional purchasing habits, formulate the one-time purchase as a sequence of atomic actions, and treat the task as a sequence generation problem. This narrows down the action space. I pre-train the atomic action embeddings of purchasing each weapon with a continuous bag-of-words (CBOW) model [81], using the context information in the training set.
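A minimal sketch of this pre-training step is shown below, using gensim's Word2Vec in CBOW mode (sg=0). The corpus loader and the 128-dimensional embedding size are illustrative assumptions.

```python
# Hedged sketch: pre-training weapon (atomic action) embeddings with CBOW; the loader is hypothetical.
from gensim.models import Word2Vec

# Each training example is one round's sorted purchase sequence,
# e.g. ["AK-47", "Smoke Grenade", "vest", "End"].
sequences = load_purchase_sequences("train.json")  # hypothetical helper

model = Word2Vec(sentences=sequences, vector_size=128, window=5,
                 min_count=1, sg=0)  # sg=0 selects the CBOW variant
ak47_vector = model.wv["AK-47"]      # embedding later consumed by the decoders
```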
By visually inspecting the t-SNE projection of the atomic action embeddings (Figure 4.2), I verify that purchases of similar weapons lie close to each other, which can support the sequence decoding process. Guns are highlighted in different colors based on their subtypes. The "End" action terminates the purchase.

Figure 4.2: Atomic action embedding t-SNE visualization. The different colors stand for purchasing different types of weapons; guns are further categorized into 6 types for visualization only.

4.5.3 State Encoder

The state of the agent includes its own weapons and those of its teammates and enemies. I use hierarchical attention [137] to build aggregated representations of each player and each team: it fuses the order-independent weapon representations into player representations, and player representations into team representations. With the round attribute encoder, I leverage round information by incorporating the agent's weapon history and performance scores from past rounds, pushing the agent to learn from those rounds.

4.5.3.1 Weapon Encoder

Not all weapons contribute equally to the representation of a player's attributes. Hence, I apply an attention mechanism to attend to the more important weapons of a player and aggregate the weapon representations to vectorize the player. Specifically, the weapon representation $p_i$ for player $i$ is

$$u_{i,t} = \tanh(W_1 x_{i,t} + b_1), \qquad (4.1)$$
$$\alpha_{i,t} = \frac{\exp(v_1^\top u_{i,t})}{\sum_t \exp(v_1^\top u_{i,t})}, \qquad (4.2)$$
$$p_i = \sum_t \alpha_{i,t} \, x_{i,t}. \qquad (4.3)$$

That is, I compute a hidden representation $u_{i,t}$ from the embedding $x_{i,t}$ of the $t$-th weapon of player $i$, then measure the weapon's weight $\alpha_{i,t}$ by the similarity between $u_{i,t}$ and a weapon importance vector $v_1$. The aggregated representation of a player's weapons is then the weighted sum of the weapon embeddings. $W_1$, $b_1$, and $v_1$ are trainable parameters shared across all players.

4.5.3.2 Team Encoder

Similarly, I use the attention mechanism to assign weights to the players in each team and thereby compute the aggregated team representation $z$.
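Equations (4.1)-(4.3) amount to a learned soft attention pool over a set of embeddings. A minimal PyTorch sketch is given below; the module name and dimensions are illustrative. The same module can in principle be reused at the team level, pooling player vectors instead of weapon vectors.

```python
# Hedged PyTorch sketch of the attention pooling in Eqs. (4.1)-(4.3).
import torch
import torch.nn as nn

class AttentionPool(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        self.proj = nn.Linear(dim, dim)          # W_1 and b_1
        self.v = nn.Parameter(torch.randn(dim))  # importance vector v_1

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (num_items, dim), e.g. one embedding per weapon of a player
        u = torch.tanh(self.proj(x))                 # Eq. (4.1)
        alpha = torch.softmax(u @ self.v, dim=0)     # Eq. (4.2)
        return (alpha.unsqueeze(-1) * x).sum(dim=0)  # Eq. (4.3)
```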
4.5.3.3 Round Attribute Encoder

In the proposed round-based task, previous rounds' actions and feedback are explicitly perceived and should be carefully reflected upon by the agent; round information is therefore a key feature that needs to be utilized effectively. Specifically, in the $j$-th round, the agent knows its final weapons after purchasing in previous rounds, $\{X'_1, \dots, X'_{j-1}\}$, and the performance scores $\{s_1, \dots, s_{j-1}\}$. The past weapons are passed into the weapon encoder to obtain aggregated weapon representations of previous rounds. I then normalize the performance scores into weights and compute the weighted sum of the past weapon representations as the round representation $h^r$. As a result, the agent attends more to rounds with better performance.

4.5.3.4 Economy Encoder

To take the economy into account, I encode the normalized money features of all players through a multi-layer perceptron (MLP) into a dense economy vector $h^c$.

4.5.3.5 State Representation

I concatenate the agent's player representation $p^s$, the two team representations $z^a$ and $z^e$, the round representation $h^r$, and the money representation $h^c$, and obtain the overall initial state representation with an MLP:

$$h = W_{h_2} \, \mathrm{ReLU}(W_{h_1} [p^s; z^a; z^e; h^r; h^c]). \qquad (4.4)$$

4.5.4 Multi-Task Decoder

Given the player's state, the agent is asked to take a sequence of atomic actions and receives a reward from the environment. Atomic actions can be classified into three categories by type: purchasing guns, grenades, or equipment. Since the attributes and prices of each weapon type are diverse, the strategies for generating different types of weapons should also differ. I therefore formulate purchasing as a multi-task atomic action sequence generation problem.

4.5.4.1 Gate Network

Before the decoding step, I train three gate networks to control the atomic action generation of each task. Each gate is a binary classifier, a simple MLP, which decides whether to generate actions for its task. The gates are trained independently throughout the entire training procedure with strong supervision signals, namely whether the label contains actions for that task. The gates facilitate action generation and are easy to train.

4.5.4.2 Task-Specific Decoder

Based on the initial state representation, the agent generates atomic actions sequentially and transitions between states using an LSTM [48]. Each LSTM-based decoder is designed for one task, and all decoders share the same player money information and the initial state representation given by the encoder. This multi-task design also ensures better generalization of the encoder, thanks to training signals from different domains [16]. The agent takes the state representation $h$ to initialize the LSTM hidden state $h_0$ and uses the hidden state $h_t$ at each time step $t$ to generate a distribution over atomic actions:

$$h_t = \mathrm{LSTM}(h_{t-1}, a_{t-1}), \qquad (4.5)$$
$$P(\cdot \mid a_1, \dots, a_{t-1}, h) = \sigma(W_{c_2} \, \mathrm{ReLU}(W_{c_1} h_t)), \qquad (4.6)$$

where $\sigma$ is the softmax function.

4.5.5 Learning Objective

Since I use an LSTM to generate the atomic actions sequentially, training the model with the "teacher forcing" algorithm [134] would inevitably result in the exposure bias problem [96]: maximizing the likelihood of a sequence of atomic actions requires the ground-truth atomic action sequence during training, but such a supervision signal is not available at test time, so errors accumulate while generating the sequence. To address the issue, I use the self-critical sequence training (SCST) method [98]. SCST is a form of the REINFORCE [133] algorithm designed for tackling sequence generation as an RL problem. I first sample an atomic action $a^s_t \sim P(\cdot \mid a^s_1, \dots, a^s_{t-1}, h)$ from the atomic action distribution at each generation step $t$ to obtain an atomic action sequence $A^s$. Another sequence, $A^g$, is generated using greedy search to maximize the output probability distribution $P(\cdot \mid a^g_1, \dots, a^g_{t-1}, h)$ at each step and serves as a baseline. I define $r(A)$ as the reward function and compute the F1 score against the ground truth, since I do not consider the order of the atomic action sequence and compare only the formed atomic action set. The objective function is

$$L = \big(r(A^g) - r(A^s)\big) \sum_t \log p(a^s_t \mid a^s_1, \dots, a^s_{t-1}). \qquad (4.7)$$

The gate networks are trained separately with a strong supervision signal: I compute the cross-entropy loss for each binary gate.
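Concretely, the reward treats the generated sequence as an unordered multiset and scores it with F1 against the ground truth, and the SCST update weights the sampled sequence's log-likelihood by the greedy-minus-sampled advantage of Eq. (4.7). The sketch below is illustrative; the function names are assumptions.

```python
# Hedged sketch of the set-based F1 reward and the SCST objective of Eq. (4.7).
from collections import Counter
import torch

def f1_reward(pred: list, truth: list) -> float:
    overlap = sum((Counter(pred) & Counter(truth)).values())  # multiset intersection
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(truth)
    return 2 * precision * recall / (precision + recall)

def scst_loss(logps_sampled: torch.Tensor, r_sampled: float, r_greedy: float) -> torch.Tensor:
    # logps_sampled: (T,) log-probabilities of the sampled atomic actions
    return (r_greedy - r_sampled) * logps_sampled.sum()  # minimizing this follows Eq. (4.7)
```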
4.5.6 Evaluation Metrics

I evaluate the methods by computing the F1 score between the model's output atomic action sequence and the ground truth, the same measure used as the reward function in the learning objective.

4.6 Experiments

In this section, I evaluate the proposed methods on the test set, based on their highest performance on the development set. During the atomic action generation step, for all methods, I mask out weapons the player cannot currently afford, to avoid invalid purchases in both the training and testing phases. In addition, some weapons, such as grenades, have a quantity limit; I also mask out weapons that have reached their limit. I perform an ablation study on our multi-sequence reasoner to measure its effectiveness.

4.6.1 Greedy Algorithm Baseline

The greedy method buys weapons following the type order. For each type, it performs a sequence of purchases prioritized by price: at each purchasing step, it buys the most expensive affordable weapon of that type. It buys only one gun, then the maximum number of grenades up to the quantity limit, and finally equipment. I consider it the baseline, as it requires no training and is hard to generalize to different gaming scenarios.

4.6.2 Multi-Sequence Reasoner

The Multi-Sequence Reasoner follows the model architecture described in Section 4.5. In the first round, all players start with no weapons and their low starting money effectively restricts them to pistols. Data from the first round is therefore not generalizable, and its information cannot be utilized for the second round. Since useful empirical information from past rounds is needed and the second round only has first-round data available, these two rounds are removed. In the 16th round, the two teams switch sides and start from scratch, so the 16th and 17th rounds are excluded for the same reason.

I set the few-shot number K to 5 and the batch size to 10, generating the atomic actions of the 10 players of a match independently and simultaneously. I tackle this task as a single-agent problem and leave team-based multi-agent purchase prediction to future work. At inference time, I use beam search to generate optimal atomic actions, with a beam size of 1.

4.6.2.1 Round Attribute Encoder

I evaluate the effectiveness of the round attribute encoder by concatenating its output to the state representation encoded by the original sequence reasoner, without modifying the original model architecture. For all experiments, I run the model in two settings: with and without the round attribute encoder.

4.6.3 Results

I report the performance of the different methods, broken down by weapon type, in Table 4.3. First, the naive Greedy Algorithm does not achieve good performance compared to the deep learning models. Second, the multi-sequence reasoner with round attribute encoder, in the last row of Table 4.3, achieves the highest F1.

Method                                      F1       F1-gun   F1-grenade   F1-equip
Greedy Algorithm                            0.2612   0.0338   0.3831       0.2890
Single-Sequence Reasoner w/ Gate            0.5109   0.3487   0.5734       0.7119
Single-Sequence Reasoner + RAE w/ Gate      0.5206   0.3494   0.5870       0.7177
Multi-Sequence Reasoner w/o Gate            0.5028   0.3216   0.5912       0.5370
Multi-Sequence Reasoner + RAE w/o Gate      0.5114   0.3216   0.6008       0.5435
Multi-Sequence Reasoner w/ Gate             0.5475   0.4524   0.6608       0.5386
Multi-Sequence Reasoner + RAE w/ Gate       0.5670   0.3920   0.6639       0.6006

Table 4.3: The results of the different methods, including the ablation study. Weapon outputs are categorized by gun, grenade, and equipment (equip) for more insight. The multi-sequence reasoner with round attribute encoder (RAE) and gate classifier achieves the best result; both the gate classifier and the RAE improve model performance in different circumstances.

4.6.3.1 Ablation Study

To test the importance of the gate network (Gate), the round attribute encoder (RAE), and the multi-task decoder (Multi-Sequence Reasoner), I perform an ablation study in which I remove the gate network or the round attribute encoder, and turn the multi-task decoder into a single decoder (Single-Sequence Reasoner). As shown in Table 4.3, ablating the gate network, round attribute encoder, and multi-task decoder from the integrated model impairs performance, leading to F1 decreases of 5.56%, 1.95%, and 4.64%, respectively.
More importantly, the consistent performance drop across all three ablation settings when the RAE is removed shows the importance of utilizing round meta-information. I thus believe round-based games are fundamentally different from the conventionally studied continuous games; how to learn an effective representation of round meta-information, and how to utilize it, is an important topic for future game AI studies.

4.7 Conclusion

This chapter explored the challenges of round-based games, in which each data point contains a long sequence of dependent episodes. I introduced a new round-based dataset based on CS:GO; its dynamic environment and the connections between rounds make it suitable for studying round-based games. I presented a few-shot learning task that encourages the agent to learn general policies that can quickly adapt to players' personal preferences in specific scenarios. Experimentally, I showed that our proposed model, the Multi-Sequence Reasoner, is effective. I found that using round empirical information leads to a nontrivial improvement in the results, testifying to the importance of round history for the task. I believe this research will open doors to building interpretable AI that understands episodic and long-term behavioral strategies, not only for the gaming community but also for broader online platforms.

Chapter 5
Learning Multi-Modal Share: Sequential Multi-Modal Social Media Gamer Embedding

5.1 Motivations

"With great power comes great responsibility." Understanding hate speech, reducing misinformation and polarization, and developing unbiased user profiling for recommendation systems have become essential topics for social media platforms (https://about.fb.com/news/2019/06/social-media-and-conflict/). However, compared to the extensive efforts devoted to question answering by combining vision and language, research on harnessing the complementary predictive power of multiple modalities for user profiling on social media is limited. In this chapter, I utilize fine-tuned embeddings from three modalities: natural language from text, visual signals from images, and graphs capturing relational connections on social media. By employing a multi-modal triplet loss, I aim to map these three modalities onto similar locations in a high-dimensional space. This technology is crucial for creating high-performance models of human traits and behavior on social media, as ground truth for assessing latent human traits and behavior is often costly to acquire at scale.

5.2 Hypothesis

• The proposed methods facilitate more accurate predictive power through multi-modal fusion. For instance, given a user and a set of posts, we can predict which posts were made by that user, and given a profile picture, we can identify the corresponding user.
• The proposed methods are applicable to downstream classification tasks, including team profiling to predict the winning team, user profiling to determine fan affiliation, and data augmentation through co-learning and inter-modality mapping, by concealing some modalities and making predictions.
• The proposed methods generate a multi-modal embedding space that can be visualized using t-SNE.

5.3 Contributions

• I collect a benchmark dataset for multi-modal machine learning on social media (Section 5.5);
• I design a novel coordinated representation learning framework leveraging three modalities (Section 5.6); I release our implementation, an extension of a stable TensorFlow implementation of triplet loss to a multi-modal setting, at https://github.com/yileizeng/tensorflow-triplet-loss;
• I propose a general multi-modality triplet loss for learning the representations (Section 5.6.5);
• I propose a novel flow for user profiling and content classification applications.

5.4 Related work

The field of multi-modal machine learning [4] presents unique challenges for computational researchers due to the heterogeneity of the data. The five main challenges are:

• Representation: exploiting the complementarity and redundancy of multiple modalities.
• Fusion: integrating different modalities with varying predictive power, noise levels, and missing information.
• Co-learning: transferring knowledge between modalities through co-training, conceptual grounding, and zero-shot learning.
• Alignment: aligning multiple modalities at the same timestamp presents significant challenges, since each modality, whether text, image, or graph data, often operates on a different temporal scale and may exhibit varying levels of granularity [119].
• Translation: each modality carries unique nuances and context-specific details that may not be easily transferable to another modality.

In this project, I aim to propose solutions for the challenges of representation, fusion, and co-learning.

Figure 5.1: Structure of coordinated representations.

While a significant amount of work has focused on uni-modal representation, most multi-modal representations have traditionally relied on the simple concatenation of uni-modal ones. However, this approach is rapidly evolving. Currently, multi-modal representations can generally be divided into two categories. Joint representations are projected into the same space using all modalities as input. In contrast, coordinated representations, as illustrated in Figure 5.1, exist in their own spaces but are linked through a similarity measure (e.g., Euclidean distance) or a structural constraint (e.g., canonical correlation analysis (CCA)). Joint representations have been employed to construct representations involving more than two modalities, whereas coordinated spaces have predominantly been limited to two modalities. This chapter aims to construct coordinated representations of three modalities: text, image, and user graph. I evaluate the effectiveness of these representations through a downstream task focused on user profiling on social media.

5.5 Dataset

The dataset features three of the most renowned multiplayer online games: CS:GO, Dota 2, and League of Legends. I collected esports world championship data from Twitter, covering the "2019 CS:GO StarLadder Berlin Major Championships," "The International 2019 (TI9) Dota 2 World Championships," and "The 2019 League of Legends World Championship (Worlds 2019)" events. The dataset includes 2.3 million tweets with 4 million images for CS:GO, 1.6 million tweets with 2.7 million images for Dota 2, and 2.5 million tweets with 1.7 million images for League of Legends. Additionally, I collected a 1 percent sample of Twitter's streaming data on all competition days as a complementary resource.
For the experiments, I used the textual content of tweets as the text modality, the images embedded in tweets as the image modality, and the interaction network, including replies, retweets, and quotes, as the graph modality. I focused on the Dota 2 esports event as the sample for the experiments. Although the experiments feature gaming social media, the methods explored are transferable to broader domains beyond gaming.

Figure 5.2: Architecture of the multimodal system.

5.6 Methodology

Figure 5.2 illustrates the architecture employed to train our multi-modal representation. It comprises two primary stages:

• Offline training: In this stage, I first obtain embeddings for each modality. BERT is used to convert text into embeddings (Section 5.6.2), ResNet is utilized to obtain embeddings for images (Section 5.6.3), and DeepWalk is employed to generate embeddings for the user graph (Section 5.6.4).
• Online training: In the second stage, I implement a fusion module using dense layers to combine embeddings from different modalities, each with its own dimensionality, into a common dimension d. This fused representation is then used to backpropagate on a team-classification downstream task.

5.6.1 Data Preparation

I begin by retaining only the columns relevant to our experiments and prediction tasks, specifically full text, links, user ids, tweet ids, and the 16 teams ('PSG.LGD', 'Team Secret', 'TNC Predator', 'Alliance', 'Newbee', 'Mineski', 'Team Liquid', 'Keen Gaming', 'OG', 'Vici Gaming', 'Evil Geniuses', 'Virtus.pro', 'Infamous', 'Royal Never Give Up', 'Fnatic', and 'Natus Vincere'). Since the provided links encompass a diverse range of multimedia, including animations and videos, I clean the data by retaining only downloadable images. For each tweet, I replicate its full text across multiple rows based on the number of linked images.

5.6.2 Text Embeddings

To generate embeddings for each tweet, I use contextualized embeddings based on the Bidirectional Encoder Representations from Transformers (BERT) architecture [24], chosen for its state-of-the-art performance on various Natural Language Understanding (NLU) tasks. (Although newer and more performant models like RoBERTa exist, such embeddings are also based on BERT; see https://github.com/UKPLab/sentence-transformers.) To prepare the tweets for the sub-word tokenizer used by BERT pre-trained models, I start by appending a [CLS] token at the beginning and a [SEP] token at the end of each tweet. Next, I feed the tweets into the BERT model, freezing its pre-trained weights without further fine-tuning at this stage, to obtain contextualized encoded states for each sub-word. To derive the sentence embeddings, I take the pooled output over the encoded states, which corresponds to the embedding of the [CLS] token. Figure 5.3 illustrates this process.

Figure 5.3: Architecture of BERT.
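A minimal sketch of this frozen-BERT embedding step, using the Hugging Face transformers library and the bert-base-cased checkpoint named in Section 5.7, is shown below. The tokenizer inserts the [CLS]/[SEP] tokens automatically.

```python
# Hedged sketch: frozen bert-base-cased tweet embeddings via the pooled [CLS] output.
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
bert = AutoModel.from_pretrained("bert-base-cased").eval()  # weights stay frozen

def embed_tweet(text: str) -> torch.Tensor:
    batch = tokenizer(text, return_tensors="pt", truncation=True)  # adds [CLS] and [SEP]
    with torch.no_grad():
        outputs = bert(**batch)
    return outputs.pooler_output.squeeze(0)  # pooled [CLS]-based sentence embedding
```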
5.6.3 Image Embeddings

An image feature vector is a feature map obtained from the output of a neural network layer. This vector provides a dense representation of the input image and can be utilized for tasks such as ranking, classification, or clustering. It is crucial to decide which layer to extract features from. In traditional convolutional neural networks, the backbone layers, typically the initial and middle layers, learn generic lower-level features from the training data. In contrast, the head layers, generally the final layers, learn high-level task-specific features, such as the exact classes in image classification. Backbone layers trained on a large dataset can therefore be used to extract general feature representations that are useful across image tasks. To obtain the image embedding features, I used a ResNet-18 model [41] pre-trained on ImageNet [23] as the backbone; Figure 5.4 illustrates the backbone structure employed as the embedding network. I removed the final layers designated for the classification task and retained the remaining parts of ResNet-18 as the image embedding network.

Figure 5.4: Image embedding framework.

5.6.4 Graph Embeddings

I employ DeepWalk [91] to generate graph embeddings. DeepWalk learns latent representations of the vertices in a network through truncated random walks. It begins by generating short random walks, treating each walk as a sentence in a corpus and each vertex as a word in its vocabulary. DeepWalk then uses the SkipGram model to update the latent representation of each vertex. Considering the heterogeneity of interactions on Twitter, I created three graphs corresponding to the reply, retweet, and quote interactions around the 16 participating teams.

5.6.5 Triplet Loss

Figure 5.5: Triplet loss mechanism.

I train the architecture using a triplet loss that takes an anchor, a positive example, and a negative example as input. As illustrated in Figure 5.5, the objective is to learn embeddings such that the anchor is closer to the positive example than to the negative example by a specified margin. The triplet loss is shown in Equation (5.1), where $a$, $p$, and $n$ represent the anchor, the positive example, and the negative example, respectively; $f_1$ and $f_2$ denote the trainable embedding networks for anchors and samples, which vary across tasks; and $m$ is the margin, adjustable per implementation:

$$L_{\mathrm{triplet}}(a, p, n) = \max\big(0, \; \lVert f_1(a) - f_2(p) \rVert_2^2 - \lVert f_1(a) - f_2(n) \rVert_2^2 + m\big). \qquad (5.1)$$

Triplet loss in one modality. To illustrate the implementation details of the uni-modal triplet network, I use the image modality as an example. After a dense layer, the image embeddings are converted to a fixed 128-dimensional vector. The triplet network then transforms this 128-dimensional vector into a new 128-dimensional vector using the ReLU activation function, yielding the final embedding of each image. In this scenario, the input data consists entirely of images: the anchor image, a positive image (from the same team class as the anchor), and a negative image (from a different team class than the anchor). Here, $f_1$ and $f_2$ are identical, comprising a non-trainable image embedding network followed by a trainable single-layer triplet network.
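A minimal PyTorch rendering of Equation (5.1) is shown below, with the margin defaulting to the 0.5 used in Table 5.2; batching and the batch-hard mining strategy are omitted for brevity.

```python
# Hedged sketch of the triplet objective in Eq. (5.1): squared Euclidean distances, hinge at margin m.
import torch

def triplet_loss(f1_a: torch.Tensor, f2_p: torch.Tensor, f2_n: torch.Tensor,
                 margin: float = 0.5) -> torch.Tensor:
    d_pos = (f1_a - f2_p).pow(2).sum(dim=-1)  # ||f1(a) - f2(p)||^2
    d_neg = (f1_a - f2_n).pow(2).sum(dim=-1)  # ||f1(a) - f2(n)||^2
    return torch.clamp(d_pos - d_neg + margin, min=0.0).mean()
```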
Triplet loss in multi-modality. In the multi-modal context, the implementation of the triplet loss differs from that of a single modality in two ways. First, the network is more complex: each modality has a specific non-trainable embedding network, followed by a trainable triplet network that converts each sample into a fixed 128-dimensional vector. As shown in Figure 5.5, the non-trainable embedding networks for different modalities have distinct structures, whereas the trainable single-layer triplet networks have similar structures but do not share weights as in the single-modality scenario. Second, the combination of inputs varies. Given the set of modalities $S = \{T, I, G\}$, representing tweets, images, and user graphs respectively, I randomly select one modality from $S$ as the anchor modality (e.g., graph) and draw the anchor sample $a$ from it. I then randomly select a sample modality from the complement of the anchor modality, e.g., image from $\{T, I\}$. Finally, I draw a positive example $p$ from the sample modality belonging to the same profile as the anchor sample $a$, and a negative example $n$ that does not belong to the same profile as $a$.
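The sampling procedure above reduces to a few lines of code; the nested dictionary layout (modality, then profile, then a list of embeddings) is an illustrative assumption about how the pre-computed embeddings are stored, and it presumes every profile has samples in every modality.

```python
# Hedged sketch of the cross-modal triplet sampling in Section 5.6.5; the data layout is assumed.
import random

MODALITIES = ["text", "image", "graph"]

def sample_triplet(embeddings: dict, profiles: list):
    anchor_mod = random.choice(MODALITIES)
    sample_mod = random.choice([m for m in MODALITIES if m != anchor_mod])
    prof = random.choice(profiles)
    other = random.choice([q for q in profiles if q != prof])
    a = random.choice(embeddings[anchor_mod][prof])   # anchor from the anchor modality
    p = random.choice(embeddings[sample_mod][prof])   # positive: same profile, other modality
    n = random.choice(embeddings[sample_mod][other])  # negative: different profile
    return a, p, n
```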
5.7 Experimental Setup

I use pre-trained models with their default parameters for the text and image embeddings: bert-base-cased for text (Hugging Face implementation, https://github.com/huggingface/transformers) and the Img2Vec implementation for images (https://github.com/christiansafka/img2vec). To tune the hyperparameters for our classification task, I fixed the window size and walk length to emphasize local structure, then experimented with different values for the latent embedding dimension (d), the number of walks initiated per vertex (γ), and the total number of training samples in the dataset to assess their impact on the results. The settings are provided in Table 5.1.

Parameter              Value
Window size            w = 5
Embedding dimensions   d = 128
Walks per vertex       γ = 80
Walk length            t = 10
Training ratio         TR = 0.8

Table 5.1: DeepWalk hyper-parameters for the graph embedding.

To evaluate the performance of our multi-modal representation, I conduct several experiments comparing uni-modal and various combinations of multi-modal approaches; the hyperparameters used in these experiments are detailed in Table 5.2. Each experiment is run three times, and the results remain consistent across runs, eliminating the need to report confidence intervals. The hypothesis being tested is whether multi-modal approaches enhance the accuracy of predicting certain variables that are present in the data but hidden to simulate a self-supervision task. In these preliminary experiments, the focus is on predicting team affiliation.

Parameter                       Value
learning rate                   1e-4
batch size                      64
# epochs                        20
# channels                      32
batch normalization momentum    0.9
margin                          0.5
embedding size                  128
triplet strategy                batch hard

Table 5.2: Model hyper-parameters.

I project the fine-tuned embeddings from the different modalities, originally in a high-dimensional space (128 dimensions), into a 2-dimensional space using t-SNE. I use the visualizations to assess the inherent ability of the embeddings of tweets, images, and user graphs to cluster into semantically related groups, each associated with a specific team, independent of the modality; Figure 5.7 shows the visualization of the multi-modal representation. In addition to visualizations, I perform a comparative analysis of classification accuracies for per-tweet team prediction, comparing uni-modal and multi-modal representations.

5.8 Results

Table 5.3 presents an ablation study highlighting the impact of different subsets of modality representations on team prediction performance. The prediction accuracy over the 16 team classes surpasses the random baseline of 6.25% in every setting. Notably, the final experiment (#4), which incorporates all modalities, achieves the highest accuracy, an improvement of 11.36% over the Text and Graph modalities and 14.35% over the Image modality. This finding suggests that the multi-modal representation captures more useful features than any single modality alone. Among the three modalities, the image modality performs worst, likely because most images provided in the links are profile pictures, which may not directly correlate with the team the user is tweeting about.

Experiment   Text   Image   Graph   Team Accuracy
# 1          ✓      ✗       ✗       36.36%
# 2          ✗      ✓       ✗       33.37%
# 3          ✗      ✗       ✓       36.36%
# 4          ✓      ✓       ✓       47.72%

Table 5.3: Prediction scores for the different experiments.

Table 5.4 presents the pairwise accuracy of the top k=1 nearest neighbors between every two modalities in the multi-modal setting. Each entry $a_{ij}$ represents the proportion of instances in which the top-1 most similar embedding of modality $i$ within modality $j$ shares the same team affiliation. The mean accuracies indicate that the image and graph embeddings cluster with the other modalities by team affiliation. The accuracy of predicting team affiliation using the Euclidean distance between a sample and the anchor image (19.31%) or graph (18.56%) is substantially higher than random guessing (6.25%), demonstrating that our proposed multi-modal triplet loss effectively preserves the original relationships after embedding.

         Image    Text     Graph    Mean
Image    -        15.90%   22.72%   19.31%
Text     4.16%    -        4.54%    4.35%
Graph    6.06%    31.06%   -        18.56%

Table 5.4: Multi-modal top k=1 nearest-neighbour similarity.
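Each entry of Table 5.4 can be computed as in the following sketch; the array shapes and team-label encoding are illustrative.

```python
# Hedged sketch of one entry of Table 5.4: top-1 cross-modal nearest-neighbour team accuracy.
import numpy as np

def top1_cross_modal_accuracy(X_i, teams_i, X_j, teams_j):
    # X_i: (n_i, d) embeddings of modality i; X_j: (n_j, d) embeddings of modality j
    hits = 0
    for x, team in zip(X_i, teams_i):
        nn = np.argmin(((X_j - x) ** 2).sum(axis=1))  # Euclidean top-1 neighbour in modality j
        hits += int(teams_j[nn] == team)
    return hits / len(X_i)
```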
Our t-SNE visualization in Figure 5.7 demonstrates the clustering behavior of the multi-modal representation. Ideally, we expect to see homogeneous clusters of tweets about a particular team, regardless of the feature modality: tweets (represented as circles) and images (represented as crosses) of the same color (indicating the same team) should lie close together in the vector space. This desired clustering is evident for a significant number of points, with well-defined clusters for six teams enriched by incorporating the different modalities. These clusters are better separated and less cluttered than the clusters in the t-SNE visualization based solely on retweet graph features, shown in Figure 5.6.

Figure 5.6: t-SNE visualization of the retweet network's DeepWalk embedding.

Figure 5.6 shows the t-SNE visualization of the uni-modal, graph-based representation. Comparing it with Figure 5.7 makes it evident that, before the triplet loss optimization, the embeddings generated using only the pre-trained model exhibit considerable noise and show little distinction between teams. After applying our multi-modal triplet loss to optimize the one-layer triplet network, the representations become clustered by team, highlighting the effectiveness of the multi-modal triplet loss in enhancing the clarity and separation of the representations.

Figure 5.7: Multi-modality representation visualization with t-SNE.

5.9 Conclusion and Future Work

In conclusion, this study explores multi-modal representation across three different modalities, image, text, and graph, within social media datasets. I propose a multi-modal triplet loss and a framework to learn a unified multi-modal representation that retains the original relationships present in the raw data. For future work, I plan to investigate the following:

Unimodal embeddings. In all of the experiments above, the embeddings for each modality were obtained independently offline. Future work will focus on leveraging more advanced open-source transformers, fine-tuning the unimodal embeddings, or training them jointly with the task at hand.

Multi-modality along the time dimension. I also plan to investigate more comprehensively the role of different combinations of modalities in multi-modal training and to test the predictive capacity of the representations on a broader range of variables. Additionally, developing a multi-dimensional demonstration spanning time could help visualize the dynamism of the representations, providing further insight into the specific conditions under which certain modalities are most useful.

Complex triplet loss verification. To further validate the efficacy of our multi-modal representation, I intend to predict relationships across different modalities. For example, given a user and a set of posts, I can determine which posts belong to the user by calculating the distances between their embeddings. This approach allows us to evaluate the predictive power and coherence of the multi-modal representation. Additionally, employing top-K accuracy as a metric will provide a quantitative measure of performance, highlighting the representation's ability to accurately associate content across modalities. This in-depth analysis will not only reinforce the robustness of our model but also offer deeper insight into the intricate relationships within the multi-modal data.

Chapter 6
Learning Sequential Advancement: Human-in-the-loop Curriculum Reinforcement Learning for Game AI

6.1 Introduction

Humans make billions of decisions in games, and how to leverage this wealth of resources to build better adaptive and personalized systems has been a perpetual pursuit. Humans are both quick and impatient learners: they lose interest when they outgrow once-challenging games or stagnate too early. By learning the human learning process through gaming feedback loops, AI can better create a flow channel that is neither too challenging nor too boring. A curriculum organizes the learning process in an upward spiral by gradually mastering more complex skills and knowledge [9]. When combined with reinforcement learning, a curriculum has been shown to improve convergence or performance compared to learning the target task from scratch [121, 36, 30], and thus enables finer adjustments to be made faster. Previous works [9, 44] focus on reaping the advantages of a curriculum strategy to train the best-performing AI agent by automatically proposing a curriculum through another RL agent, as in the teacher-student framework [80, 95], self-play [118, 5, 3], or Goal GAN [44]. One way of interpreting these approaches is that the curriculum evolves through the adversarial interplay between the two agents, similar to GANs [34, 25].

Figure 6.1: Given specific scenarios during curriculum training, humans can adaptively decide whether to be "friendly" or "adversarial" by observing the progress the agent is able to make. In cases where performance degrades, a user may flexibly adjust the strategy, as opposed to an automatic assistive agent.
Compared to an automatic agent, humans have an innate ability to improvise and adapt when confronted with different scenarios; to design more personalized experiences, I must capture these human indications so as to improve the explainability and flexibility of curriculum reinforcement learning. In Figure 6.1, a user can intuitively understand the learning progress and dynamically manipulate the task difficulty by changing the height of the wall. With new challenging environments, I show how human inductive bias can help solve three nontrivial tasks at various difficulty levels that are otherwise unsolvable by learning from scratch, or even by an auto-curriculum. Another key motivation is assistive agents, such as autonomous driving systems, language-based virtual systems, and robotic companions: such agents should provide services adjusted to human preferences and personal needs [138].

In Section 6.2, I give a brief introduction to related work. In Section 6.3, I introduce our interactive curriculum platform, with which I identify the "inertia" problem in an "easy-to-hard" automatic curriculum. In Section 6.4, I show preliminary results of user studies on our environments, which require millions of interactions. I conclude and discuss future work in Section 6.5.

6.2 Related Work

6.2.1 Curriculum Reinforcement Learning

Apart from the previously mentioned automatic learning methods, the work most related to ours is [43], which shows empirically how a rich environment can help promote the learning of complex behavior without explicit reward guidance. In comparison, I evolve environments by leveraging human inductive bias in curriculum design.

6.2.2 Human-in-the-Loop Reinforcement Learning

As learning agents move from research labs to the real world, it becomes increasingly important for human users, especially those without programming skills, to teach agents desired behavior. A large amount of work focuses on imitation learning [108, 100, 45, 93], where demonstrations from an expert act as direct supervision. Humans can also interactively shape training with only positive or negative reward signals [63], or combine manual feedback with rewards from the MDP [64, 1]. A recent work formulates human-robot interaction as an adversarial game [26] and shows improvements in grasping success and robustness when the robot trains with a human adversary. In this chapter, I aim to close the loop between these two fields by studying the effect of an interactive curriculum on reinforcement learning. To achieve this, I have designed three challenging environments that are nontrivial to solve even for state-of-the-art RL methods [110], described in the next section.

6.3 Interactive Curriculum Guided by Humans

6.3.1 Interactive Platform

I build this interactive platform with three goals in mind: 1) real-time online interaction with flexibility; 2) parallelizability for human-in-the-loop training; and 3) seamless control between reinforcement learning and the human-guided curriculum.

Figure 6.2: General design of our interactive platform and the associations among the environment container, the RL trainer, and the interactive interface.

To achieve the first goal, I run an event-driven environment container separated from the training process, allowing the user to send control signals (e.g., UI controls, scene layout, task difficulty) to the environment during training via the interactive interface. The framework in Figure 6.2 shows where the users, the environment, and the training algorithms stand.
To achieve efficiency similar to automatic training, I integrate human-interactive signals into RL parallelization; Figure 6.3 shows an example of parallel training. I perform centralized SGD updates with decentralized experience collection, as agents of the same kind share the same network policy [82]. I also enable controlling the environment parameters of different instantiations simultaneously via a unified interactive interface, making it possible to solve tasks requiring millions of interactions. For the third goal, I display real-time instructions and allow users to inspect the learning progress when designing the curriculum.

Figure 6.3: Example of our interactive platform training in parallel.

Figure 6.4 shows our released environments for curriculum reinforcement learning, in which users can manipulate the task difficulty. The agents must reach the green target in GridWorld, navigate to land on the green mat in Wall-Jumper, and reach the dynamic green box in SparseCrawler, respectively. As shown in Figure 6.4, the user formulated the curriculum so that it was neither too hard nor too easy for the agent, maximizing the efficiency-quality trade-off. During the interaction, the user can pause, play, or save the current configuration. The locations of objects in the arena are customizable with the cursor, and the height of the wall is tunable for difficulty transitions. Our interactive interface is the same across the environments listed below.

Figure 6.4: Our interactive platform for curriculum reinforcement learning allows the user to manipulate the task difficulty via a unified interface (slider and buttons). All three tasks receive only sparse rewards. The manipulable variables for the three environments are, respectively, the number of red obstacles (GridWorld, top row), the height of the wall (Wall-Jumper, middle row), and the radius of the target (SparseCrawler, bottom row). The task difficulty gradually increases from left to right.

GridWorld: The agent (represented as a blue square) is tasked with reaching the goal position (green plus) by navigating around obstacles (red crosses, at most 5). All objects are randomly spawned on a 2D plane. The reward is +1 for reaching the goal, -1 for hitting a cross, and -0.01 per step. Movements are in the cardinal directions.

Wall-Jumper: The goal is to get over a wall (maximum height 8) by jumping or, possibly, leveraging a block (white box). The reward is +1 for a successful landing on the goal location (green mat), -1 for falling outside or reaching the maximum allowed time, and a penalty of -0.0005 per step taken. The observation space is 74-dimensional, corresponding to 14 ray casts each detecting 4 possible objects, plus the global position of the agent and whether or not the agent is grounded. Allowed actions include translation, rotation, and jumping.

SparseCrawler: A crawler is an agent with 4 arms and 4 forearms. The aim is to reach a randomly located target on the ground (maximum radius of 40). The state is a vector of 117 variables corresponding to the position, rotation, velocity, and angular velocity of each limb, plus the acceleration and angular acceleration of the body. The action space is 20-dimensional, corresponding to the target rotations of the joints. Only a sparse reward is provided, when the target is reached.
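For concreteness, if such environments were built on Unity ML-Agents (whose example suite contains similar GridWorld, Wall-Jumper, and Crawler tasks; this is an assumption, since the chapter does not name the engine), a difficulty knob could be exposed to the Python trainer through an environment-parameters side channel, roughly as follows. The binary name and the "wall_height" key are hypothetical.

```python
# Hedged sketch: driving a curriculum knob through Unity ML-Agents' side channel.
from mlagents_envs.environment import UnityEnvironment
from mlagents_envs.side_channel.environment_parameters_channel import (
    EnvironmentParametersChannel,
)

channel = EnvironmentParametersChannel()
env = UnityEnvironment(file_name="WallJumper", side_channels=[channel])  # name assumed
env.reset()
channel.set_float_parameter("wall_height", 4.0)  # the user raises/lowers difficulty here
```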
6.3.2 A Simple Interactive Curriculum Framework

Curriculum reinforcement learning is an adaptation strategy that improves RL training by ordering a set of related tasks to be learned [9]. The most natural ordering gradually increases the task difficulty with an automatic curriculum. However, as shown in Figure 6.5a, the auto-curriculum quickly mastered skills while the walls were low but failed to adapt when a dramatic change of skill was required (Figure 6.5c), leading to degraded performance on the ultimate task (Figure 6.5b). The reason is that the agent must use the box to get over a high wall, in contrast to low-wall scenarios, where the additional steps needed to locate the box are penalized.

Figure 6.5: The "inertia" problem of an auto-curriculum that gradually increases the difficulty at fixed intervals: (a) training curve, (b) testing curve, (c) high wall. The performance of the auto-curriculum (orange curve) drops significantly when navigation requires jumping over the box first, but the learning inertia prevents it from adapting to the new task. Note that the testing curve is evaluated on the ultimate task unless otherwise stated.

Our results corroborate what [9] observed in their curriculum for a supervised classification task: the curriculum should be designed to focus on "interesting" examples. In our case, the curriculum that stayed at an easy level for the first 3M steps "overfitted" the previous skill, which prevented adaptation. Although a comprehensive IF-ELSE rule is possible, in the real world, where situations can be arbitrarily complex, adaptable behavior guided by a human is desired. Following this spirit, I test the ability of a human interactive curriculum using a simple framework (Algorithm 2), in which the human (function H) provides feedback by adjusting the task difficulty at a fixed interval in the training loop: after evaluating the agent's learning progress at the current difficulty, the user can tune the task to be easier or harder, or leave it unchanged. I show in the next section that with this simple interactive curriculum, tasks that are otherwise unsolvable can be guided toward success by humans, with the additional property of better generalization.

Algorithm 2: Human-Guided Interactive Curriculum
  Result: agent's policy π_R
  Initialize difficulty ← 0.
  while step ≤ total_steps do
      π_R_new ← Train(π_R_old, difficulty)
      if step mod interval == 0 then
          difficulty ← H(π_R_new, difficulty)
      end if
      π_R_old ← π_R_new
  end while
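In code, Algorithm 2 reduces to a short loop; the sketch below is illustrative, with train() standing in for the RL updates (e.g., PPO) and ask_human() for the interface through which the user inspects progress and returns the adjusted difficulty.

```python
# Hedged sketch of Algorithm 2; train() and ask_human() are hypothetical stand-ins.
def human_guided_curriculum(policy, total_steps: int, interval: int):
    difficulty = 0
    step = 0
    while step <= total_steps:
        policy = train(policy, difficulty, num_steps=interval)  # standard RL updates
        step += interval
        # The user inspects the learning progress and tunes the task
        # easier, harder, or leaves it unchanged (function H in Algorithm 2).
        difficulty = ask_human(policy, difficulty)
    return policy
```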
6.4 Experiments

I train the agents for the three tasks using the training method described previously. I aim to show that a human-in-the-loop interactive curriculum can leverage human priors during adaptation, allowing agents to build on past experiences. For all experiments, I fix the interaction interval (e.g., at 0, 0.1, 0.2, ..., 0.9 of the total steps) and allow users to inspect the learning progress twice before adjusting the curriculum; the user can choose to make the task easier or harder, or leave it unchanged. Our baseline is PPO with the optimized parameters as in [57]. I train GridWorld, Wall-Jumper, and SparseCrawler for 50K, 5M, and 10M steps, respectively.

Figure 6.6: Effect of the interactive curriculum, evaluated on the ultimate task: (a) GridWorld (5 obstacles), (b) Wall-Jumper (wall height 8), (c) SparseCrawler (target radius 40).

6.4.1 Effect of Interactive Curriculum

Section 6.3.1 introduced three tasks that are challenging due to the sparsity of rewards. For example, in Figure 6.6a, I observed that agents learning from scratch (green and red curves) had little chance of success with obstacles scattered around the grid, and thus failed to reinforce any desired behavior. Users, on the other hand, could gradually load or remove obstacles by inspecting the learning progress; eventually, the models trained with our framework can solve GridWorld with all 5 obstacles present. Inspired by this, I further tested our framework on the SparseCrawler task (Figure 6.6c), which requires 10M steps of training. Thanks to our parallel design (Section 6.3.1), I was able to reduce the training time from 10 hours to 3, during which users would interact ten times. When trained with dynamically moving targets of increasing radius, the crawlers gradually learned to align themselves in the right direction. In the Wall-Jumper task (Figure 6.6b), I noticed variance in performance across users: one run (blue curve) outperformed learning from scratch by a noticeable margin, while another (orange curve) performed less well but still converged to the level of learning from scratch. Nevertheless, both trials are much better than the auto-curriculum, which suffers from the over-fitting described in Section 6.3.2.

6.4.2 Generalization Ability

Overfitting to a particular dataset is a common problem in supervised learning; similar problems can occur in reinforcement learning when there is little or no variation in the environment. To deal with this problem, I rely on 1) randomness in how the grid is generated, in the layout of blocks and jumpers, and in the locations of crawlers and targets; and 2) entropy regularization in our PPO implementation, making for a strong baseline.

I compare models trained with our framework against ones trained from scratch on a set of tasks in the three environments. In GridWorld, the agents were tested with the number of obstacles increasing from 1 to 5; in Wall-Jumper, the heights of the wall rise discretely from 0 to 8 during testing; and in SparseCrawler, the radius of the moving target transitions from 5 to 40 in steps of 5 (Figure 6.7).

Figure 6.7: The generalization ability of an interactive curriculum evaluated on a set of tasks: (a) GridWorld (obstacles from 1 to 5), (b) Wall-Jumper (heights from 0 to 8), (c) SparseCrawler (radius from 5 to 40). The average performance over these tasks is plotted at different time steps.

One common observation is that our model consistently outperforms learning from scratch. Second, there is a large gap between the curves of the curriculum-learning model and learning from scratch (Figure 6.7a), indicating that curriculum models "warm up" more quickly with easy tasks than by jumping directly into the difficult task; the learning process is analogous to how a human learns by building on past experiences. Interestingly, the curves of the curriculum model and the scratch model eventually converge in Wall-Jumper (Figure 6.7b). Finally, I observed that the performance of our model in SparseCrawler (Figure 6.7c) continually rose, reaching the target with 1 to 2 more successes, as opposed to the Wall-Jumper environment; in SparseCrawler, the environment is reset only when the maximum number of time steps in a single round is reached. In qualitative tests, our model solves GridWorld with varying numbers of obstacles, whereas the learning-from-scratch model fails when the number of obstacles exceeds 3.
When performing qualitative tests, our model solves GridWorld with varying numbers of obstacles, whereas the learning-from-scratch model fails when the number of obstacles exceeds 3. For Wall-Jumper, our model can reach the goal in the minimum number of steps, while the scratch model invariably uses the block, which is necessary only for heights over 6.5. In the SparseCrawler environment, our model moves faster and succeeds more often, whereas the scratch model could only reach proximal targets.

6.5 Conclusion

To learn a difficult task, humans have developed an easy-to-hard transition strategy that eases the learning curve. Similarly, curriculum reinforcement learning leverages experience across many easy tasks before adapting its skills to more challenging ones. However, questions such as "What metric should be used to quantify task difficulty?" or "How should the curriculum be designed?" remain unanswered.

In this research, I experimented with and demonstrated how human decision-making can help curriculum reinforcement learning agents make very fine-grained difficulty adjustments. I released a portable, interactive, and parallelizable multi-platform tool featuring three non-trivial tasks that are challenging to solve (sparse rewards, transfer between skills, and large training budgets of up to 10M steps), with varying curriculum spaces (discrete and continuous). I identified a phenomenon of overfitting in the auto-curriculum that leads to deteriorating performance during skill transfer in this environment. I then proposed a simple interactive curriculum framework facilitated by our unified user interface. The experiments show the promise of a more explainable and generalizable curriculum transition achieved by involving a human in the loop on tasks that are otherwise nontrivial to solve. For future work, I would like to explore more efficient methods for collecting users' decisions.

Chapter 7
Conclusions, Implications, and Future Work

7.1 Findings Summary and Implications

This dissertation has explored diverse aspects of human-centered AI in games, incorporating team dynamics, representation learning, and interactive systems within the context of online multiplayer games, and offering insights into both theoretical and practical applications of these findings. The following sections summarize the conclusions drawn from each chapter and discuss their broader implications.

7.1.1 Learning Sequential and Team Play

7.1.1.1 Summary of Findings

Chapter 2 focused on analyzing players' performance within temporary teams in a team-based online game, using both team and individual performance metrics. The research revealed no significant long-term improvement in performance with experience, suggesting that the game's design and team-balancing strategies significantly limit performance variability and individual contributions. Additionally, it was observed that performance deteriorates over the course of a single game session, although experienced players exhibited a slower rate of decline.

Chapter 3 delved into the influence of social ties on team performance within online games, highlighting how preexisting friendship ties affect individual and collective actions. The research demonstrated that while in-friendship players tend to perform better, out-friendship players often experience a reduction in their performance levels. The study also showed that friendship ties do not uniformly guarantee improved team performance, suggesting that while social connections can enhance cooperation among friends, they might also lead to imbalances and potential underperformance within the broader team context.
7.1.1.2 Impact and Implications

The findings from Chapter 2 have several important implications for game design and player management. First, the lack of long-term performance improvement despite experience highlights a potential ceiling effect imposed by game mechanics, which could discourage player engagement over time. Recognizing this, game developers might consider modifying team-balancing algorithms to allow for more variability and greater potential for individual impact, thus maintaining player interest and engagement. Furthermore, the observation of performance deterioration within sessions due to cognitive factors like fatigue suggests that games could be designed to incorporate features that encourage breaks or vary gameplay intensity, potentially improving overall player experience and performance retention.

These insights also extend beyond gaming into any dynamic team-based environment, such as project teams in a corporate setting, where understanding the balance between team composition and individual performance can lead to more effective team management and task assignment. Implementing strategies that mitigate fatigue and maintain performance could enhance productivity and reduce turnover in such contexts.

The modulation of short-term performance changes by player experience could also inform personalized player support systems. For example, game platforms could develop targeted interventions that help less experienced players manage their session times better, or provide adaptive challenges that keep them engaged without overwhelming them, thereby optimizing learning and enjoyment.

The implications of the findings in Chapter 3 extend beyond gaming into broader areas of team dynamics and organizational behavior. In corporate settings, understanding the dual effects of social ties can help in structuring teams more effectively. Managers could use insights from this research to balance teams, ensuring that social bonds among team members do not overshadow overall team objectives or marginalize those outside existing social circles.

Furthermore, these findings can inform the design of team-based online platforms, where algorithms could be developed to foster optimal team compositions that balance social ties and individual capabilities, potentially enhancing both performance and user satisfaction. This could be particularly relevant in collaborative work environments that rely on virtual team interactions, where understanding the dynamics of social ties can contribute to more effective collaboration and project management.

In educational settings, the research suggests that group assignments could benefit from a strategic mix of social ties and academic capabilities, potentially increasing engagement and collective performance. Educators could use these insights to form student groups that are both socially balanced and academically effective, enhancing learning outcomes.

Additionally, this research offers a valuable perspective for the development of social network analysis tools that can predict and optimize the impact of social relationships on team performance. Such tools could be used in various contexts, from sports teams to strategic business units, helping leaders make informed decisions about team composition and management strategies.
Overall, the nuanced understanding of how social ties impact team dynamics provided by this chapter adds depth to our knowledge of interpersonal relationships and their effects on group performance, offering actionable insights for multiple domains where teamwork is crucial.

7.1.2 Learning Heterogeneous and Multi-modal Representations

7.1.2.1 Summary of Findings

In Chapter 4, we explored the dynamics of round-based games by introducing a new dataset from CS:GO, focusing on the interactions and dependencies between rounds. Our research introduced the Multi-Sequence Reasoner, an AI model designed to adapt quickly to player-specific preferences and behaviors in this context. The model leverages round history to effectively anticipate and respond to player actions, demonstrating the significant role that empirical round data plays in enhancing predictive capabilities.

Chapter 5 explored the integration of multiple modalities (image, text, and graph data) to develop a unified multi-modal representation that preserves the inherent relationships within the data. This research proposed a multi-modal triplet loss and a comprehensive framework to ensure that the learned representations are dimensionally consistent and reflect the original relational dynamics of the data. The experimental results validated the efficacy of this framework, demonstrating its capability to enhance the understanding and predictive accuracy of multi-modal datasets in social media.

7.1.2.2 Impact and Implications

The findings from Chapter 4 have profound implications for the development of AI in interactive environments, particularly in gaming. By highlighting the importance of round history, this research suggests that AI systems can be made more adaptable and responsive to user behavior, leading to more engaging and personalized gaming experiences. This could help in developing AI that not only reacts in real time but also anticipates future player actions based on past behaviors, enhancing both the challenge and the enjoyment of the game.

Beyond gaming, these insights are applicable to any scenario involving sequential decision-making processes, such as financial trading systems, where predicting sequences of events can lead to more informed and strategic decision-making. The ability of AI to adapt to a user's unique behavior pattern could also find applications in educational technologies, where a system might adjust its teaching strategies based on the learning pace and style of each student.

Additionally, the methodology and findings could influence how developers approach the design of AI in other round-based interactive systems, encouraging a focus on dynamic adaptation rather than static response patterns. This approach could be particularly beneficial in the development of virtual reality (VR) training simulations, where adaptive AI can provide tailored challenges to users, thereby improving learning outcomes and user engagement. Furthermore, the implications for data science are significant, as the techniques developed could be applied to improve the analysis and prediction of any temporal data sequences, enhancing the granularity and accuracy of insights derived from such data.

For Chapter 5, the advancements in multi-modal representation learning have broad and impactful implications for several fields, particularly for how complex data is analyzed and utilized.
In social media, these findings can significantly improve content recommendation systems and targeted advertising by providing a more holistic understanding of user behavior and preferences across various content types. This could lead to more engaging user experiences and more effective marketing strategies, as content delivery can be finely tuned to the multi-faceted interests of users.

In the realm of artificial intelligence and machine learning, this research contributes to the development of more sophisticated models that can process and interpret data from multiple sources simultaneously. This capability is crucial for applications such as autonomous vehicles, where integrating visual, textual, and sensor data is essential for safe operation.

The healthcare industry could also benefit from these techniques, particularly in patient monitoring and diagnosis, where combining medical images, patient records, and real-time health data could lead to more accurate diagnoses and personalized treatment plans. By understanding the relationships between different types of data, medical professionals can gain a more comprehensive view of a patient's health status.

Furthermore, the ability to integrate and analyze different data modalities can enhance security systems, where combining visual surveillance data with textual and other sensor data can lead to more robust security measures. This approach can improve anomaly detection and threat recognition in sensitive environments.

Lastly, the implications for research in cognitive science and psychology are significant, as this approach allows for a better understanding of how different types of information are processed and integrated in the human brain. This could inform the development of new cognitive models and therapeutic strategies, particularly in understanding and treating disorders that affect information processing and integration.

Overall, the methodological innovations presented in this chapter provide a foundational framework that can be adapted and extended to a wide range of applications, enhancing our ability to make informed decisions based on complex and diverse data sources.

7.1.3 Learning Human AI Collaborations

7.1.3.1 Summary of Findings

Chapter 6 focused on the application of curriculum reinforcement learning to address the challenge of mastering complex tasks by employing an easy-to-hard progression strategy. This approach was demonstrated to facilitate the AI's learning process, making it more efficient and effective in adapting to increasingly difficult tasks. The research also highlighted the potential of integrating human input to refine the learning process, suggesting that human judgment can significantly enhance the adaptability and effectiveness of AI learning strategies.

7.1.3.2 Impact and Implications

The findings from this chapter have profound implications for the future of AI training methodologies and their application across various domains. In educational technology, curriculum learning can be applied to develop systems that adapt to the learning pace of students, potentially offering a more personalized and effective educational experience. By modeling the progression of learning tasks according to the individual's capability, educational platforms can maximize engagement and learning outcomes.

In the field of robotics, employing a curriculum learning approach could significantly improve the efficiency of training robots to perform complex tasks.
By structuring the learning process in stages, from simple to complex, robots can develop a more robust understanding and skill set, enhancing their performance and versatility in real-world applications.

Additionally, the integration of human decision-making into the training process could revolutionize how we approach AI development in safety-critical systems, such as autonomous driving and medical diagnostics. Human experts can provide nuanced insights that guide the AI's learning process, ensuring that the system not only learns efficiently but also adheres to safety and ethical standards.

The concept of curriculum learning also extends to AI development for game design, where it can be used to create more sophisticated AI opponents that adapt to the player's skill level, enhancing the gaming experience by providing appropriate challenges and learning opportunities for players at all levels.

Moreover, this approach has implications for workforce training, where curriculum learning can be applied to better prepare employees for the complexities of their roles. By simulating a progression of task difficulties, training programs can more effectively equip employees with the skills needed to handle challenging situations, thereby improving job readiness and performance.

Overall, the methodology and insights gained from this chapter offer a scalable and adaptable framework that could influence a broad spectrum of AI applications, making the process of learning complex tasks more manageable and effective. This could lead to broader adoption and more rapid development of AI systems capable of handling diverse and challenging environments.

7.2 Future Directions

This dissertation has covered a wide range of topics related to team dynamics, individual and team performance, and the integration of complex datasets into actionable models. The future research directions stemming from each chapter are rich and varied, promising to expand our understanding and application of these initial findings.

7.2.1 Future Directions for Learning Sequential and Team Play

The research in Chapter 2 raises several intriguing questions for future investigation. One key area is exploring alternative team composition strategies that allow for greater variance in player performance and a more significant individual impact. Further studies could also examine the specific game design elements that might be modified to enhance long-term player engagement and performance improvement. Additionally, examining the psychological and physiological factors contributing to performance deterioration during game sessions could yield insights into optimal session lengths and the design of games that help manage player fatigue.

Chapter 3's findings on the impact of social ties within teams suggest several future research directions. Investigations could focus on how different types of online and offline social ties (e.g., professional connections vs. personal friendships) affect team performance in various settings. There is also a need to explore mechanisms to mitigate the negative impacts of out-friendship relationships on team performance, possibly through team-building exercises or organizational development interventions.

7.2.2 Future Directions for Learning Heterogeneous and Multi-modal Representations

The introduction of the Multi-Sequence Reasoner in Chapter 4 opens up several avenues for future research.
One promising direction is to test this model in other round-based games and contexts, such as strategic business simulations or military planning exercises, to validate and possibly extend its applicability. Another area of interest could be the integration of real-time adaptive learning, where the AI adjusts its strategies within a single gameplay session based on the player's actions and outcomes.

For Chapter 5, future work could look into expanding the multi-modal framework to include additional types of data, such as audio or physiological signals, to enrich the understanding of user interactions and behaviors. Another direction could involve exploring the applicability of the developed models to different domains, such as predictive healthcare or real-time crisis management, where rapid and accurate interpretation of complex data is crucial.

7.2.3 Future Directions for Learning Human AI Collaborations

The curriculum reinforcement learning framework presented in Chapter 6 offers several exciting future directions. Research could explore scaling the framework to more complex, multi-agent environments or applying it to real-world learning situations, such as skill development in professional settings. Additionally, integrating more granular human feedback into the learning process could further refine the adaptability and effectiveness of AI systems.

7.2.4 Final Remarks

Understanding user behaviors will continue to attract attention and help bridge the gaming industry and the AI research community. This dissertation focused on three main interests: learning sequential and team play, learning heterogeneous and multi-modal representations, and learning human-AI collaborations. Across all chapters, there is a recurring theme of combining innovative data-driven, generative AI, representation learning, and reinforcement learning solutions to help advance and build more human-centered, safe, and robust intelligent gamified online systems. A cross-disciplinary approach involving psychology, computer science, and organizational behavior could greatly enrich the research and lead to more comprehensive models of human and AI interaction. Moreover, developing generalized frameworks that can be adapted to domains beyond games could significantly impact both theoretical research and practical implementations, bridging the gap between academia and industry applications.

Bibliography

[1] David Abel, John Salvatier, Andreas Stuhlmüller, and Owain Evans. Agent-agnostic human-in-the-loop reinforcement learning. arXiv preprint arXiv:1701.04079, 2017.

[2] Lars Backstrom, Dan Huttenlocher, Jon Kleinberg, and Xiangyang Lan. Group formation in large social networks: membership, growth, and evolution. In Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining, pages 44–54. ACM, 2006.

[3] Bowen Baker, Ingmar Kanitscheider, Todor Markov, Yi Wu, Glenn Powell, Bob McGrew, and Igor Mordatch. Emergent tool use from multi-agent autocurricula. arXiv preprint arXiv:1909.07528, 2019.

[4] Tadas Baltrušaitis, Chaitanya Ahuja, and Louis-Philippe Morency. Multimodal machine learning: A survey and taxonomy. IEEE Transactions on Pattern Analysis and Machine Intelligence, 41(2):423–443, 2018.

[5] Trapit Bansal, Jakub Pachocki, Szymon Sidor, Ilya Sutskever, and Igor Mordatch. Emergent complexity via multi-agent competition. arXiv preprint arXiv:1710.03748, 2017.

[6] Shaowen Bardzell, Jeffrey Bardzell, Tyler Pace, and Kayce Reed.
Blissfully productive: grouping and cooperation in World of Warcraft instance runs. In Proceedings of the 2008 ACM conference on Computer supported cooperative work, pages 357–360. ACM, 2008.

[7] Roi Becker, Yifat Chernihov, Yuval Shavitt, and Noa Zilberman. An analysis of the Steam community network evolution. In Electrical & Electronics Engineers in Israel (IEEEI), 2012 IEEE 27th Convention of, pages 1–5. IEEE, 2012.

[8] Grace A Benefield, Cuihua Shen, and Alex Leavitt. Virtual team networks: How group social capital affects team success in a massively multiplayer online game. In Proceedings of the 19th ACM Conference on Computer-Supported Cooperative Work & Social Computing, pages 679–690. ACM, 2016.

[9] Yoshua Bengio, Jérôme Louradour, Ronan Collobert, and Jason Weston. Curriculum learning. In Proceedings of the 26th annual international conference on machine learning, pages 41–48, 2009.

[10] Jeremy Blackburn, Ramanuja Simha, Nicolas Kourtellis, Xiang Zuo, Matei Ripeanu, John Skvoretz, and Adriana Iamnitchi. Branded with a scarlet C: cheaters in a gaming social network. In Proceedings of the 21st international conference on World Wide Web, pages 81–90. ACM, 2012.

[11] Maarten Boksem, Theo Meijman, and Monicque Lorist. Effects of mental fatigue on attention: an ERP study. Cognitive brain research, 25(1):107–116, 2005.

[12] Maarten Boksem, Theo Meijman, and Monicque Lorist. Mental fatigue, motivation and action monitoring. Biological psychology, 72(2):123–132, 2006.

[13] Maarten Boksem and Mattie Tops. Mental fatigue: costs and benefits. Brain Research Reviews, 59(1):125–139, 2008.

[14] Gianluca Borghini, Laura Astolfi, Giovanni Vecchiato, Donatella Mattia, and Fabio Babiloni. Measuring neurophysiological signals in aircraft pilots and car drivers for the assessment of mental workload, fatigue and drowsiness. Neuroscience & Biobehavioral Reviews, 44:58–75, 2014.

[15] Katy Börner, Noshir Contractor, Holly J Falk-Krzesinski, Stephen M Fiore, Kara L Hall, Joann Keyton, Bonnie Spring, Daniel Stokols, William Trochim, and Brian Uzzi. A multilevel systems perspective for the science of team science. Science Translational Medicine, 2(49):49cm24, 2010.

[16] Rich Caruana. Multitask learning. Machine learning, 28(1):41–75, 1997.

[17] Seunghoo Chung, Robert B Lount Jr, Hee Man Park, and Ernest S Park. Friends with performance benefits: A meta-analysis on the relationship between friendship and group performance. Personality and Social Psychology Bulletin, 44(1):63–79, 2018.

[18] Sheldon Cohen. Social relationships and health. American psychologist, 59(8):676, 2004.

[19] Noshir Contractor. Some assembly required: leveraging web science to understand and enable team assembly. Phil. Trans. R. Soc. A, 371(1987):20120385, 2013.

[20] Shai Danziger, Jonathan Levav, and Liora Avnaim-Pesso. Extraneous factors in judicial decisions. Proceedings of the National Academy of Sciences, 108(17):6889–6892, 2011.

[21] Pasquale De Meo, Emilio Ferrara, Giacomo Fiumara, and Alessandro Provetti. On Facebook, most ties are weak. Communications of the ACM, 57(11):78–84, 2014.

[22] Evangelia Demerouti, Arnold B Bakker, Friedhelm Nachreiner, and Wilmar B Schaufeli. The job demands-resources model of burnout. Journal of Applied Psychology, 86(3):499, 2001.

[23] Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. ImageNet: A large-scale hierarchical image database. In Proceedings of IEEE Computer Vision and Pattern Recognition, 2009.

[24] Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova.
BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805, 2018.

[25] Jiali Duan, Xiaoyuan Guo, Yuhang Song, Chao Yang, and C-C Jay Kuo. PortraitGAN for flexible portrait manipulation. arXiv preprint arXiv:1807.01826, 2018.

[26] Jiali Duan, Qian Wang, Lerrel Pinto, C-C Jay Kuo, and Stefanos Nikolaidis. Robot learning via human adversarial games. CoRR, 2019.

[27] Nicole B Ellison, Charles Steinfield, and Cliff Lampe. The benefits of Facebook "friends:" social capital and college students' use of online social network sites. Journal of Computer-Mediated Communication, 12(4):1143–1168, 2007.

[28] Emilio Ferrara, Nazanin Alipourfard, Keith Burghardt, Chiranth Gopal, and Kristina Lerman. Dynamics of content quality in collaborative knowledge production. In Proceedings of 11th AAAI International Conference on Web and Social Media, pages 520–523. AAAI, 2017.

[29] Chelsea Finn, Pieter Abbeel, and Sergey Levine. Model-agnostic meta-learning for fast adaptation of deep networks. In Proceedings of the 34th International Conference on Machine Learning-Volume 70, pages 1126–1135. JMLR.org, 2017.

[30] Carlos Florensa, David Held, Markus Wulfmeier, Michael Zhang, and Pieter Abbeel. Reverse curriculum generation for reinforcement learning. arXiv preprint arXiv:1707.05300, 2017.

[31] Yoav Freund, Robert Schapire, and N Abe. A short introduction to boosting. Journal of the Japanese Society For Artificial Intelligence, 14(771-780):1612, 1999.

[32] Jerome Friedman, Trevor Hastie, and Robert Tibshirani. The elements of statistical learning, volume 1. Springer Series in Statistics. Springer, Berlin, 2001.

[33] Jerome H Friedman. Greedy function approximation: a gradient boosting machine. Annals of statistics, pages 1189–1232, 2001.

[34] Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. Generative adversarial nets. In Advances in neural information processing systems, pages 2672–2680, 2014.

[35] Mark S Granovetter. The strength of weak ties. In Social networks, pages 347–367. Elsevier, 1977.

[36] Alex Graves, Marc G Bellemare, Jacob Menick, Remi Munos, and Koray Kavukcuoglu. Automated curriculum learning for neural networks. In Proceedings of the 34th International Conference on Machine Learning-Volume 70, pages 1311–1320. JMLR.org, 2017.

[37] Roger Guimera, Brian Uzzi, Jarrett Spiro, and Luis A Nunes Amaral. Team assembly mechanisms determine collaboration network structure and team performance. Science, 308(5722):697–702, 2005.

[38] William H Guss, Brandon Houghton, Nicholay Topin, Phillip Wang, Cayden Codel, Manuela Veloso, and Ruslan Salakhutdinov. MineRL: a large-scale dataset of Minecraft demonstrations. arXiv preprint arXiv:1907.13440, 2019.

[39] Aaron Halfaker, Oliver Keyes, Daniel Kluver, Jacob Thebault-Spieker, Tien Nguyen, Kenneth Shores, Anuradha Uduwage, and Morten Warncke-Wang. User session identification based on strong regularities in inter-activity time. In Proceedings of the 24th International Conference on World Wide Web, pages 410–418, 2015.

[40] Juho Hamari and Veikko Eranti. Framework for designing and evaluating game achievements. In DiGRA conference. Citeseer, 2011.

[41] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 770–778, 2016.

[42] Alice F Healy, James A Kole, Carolyn J Buck-Gengler, and Lyle E Bourne.
Effects of prolonged work on data entry speed and accuracy. Journal of Experimental Psychology: Applied, 10(3):188, 2004.

[43] Nicolas Heess, Dhruva TB, Srinivasan Sriram, Jay Lemmon, Josh Merel, Greg Wayne, Yuval Tassa, Tom Erez, Ziyu Wang, SM Eslami, et al. Emergence of locomotion behaviours in rich environments. arXiv preprint arXiv:1707.02286, 2017.

[44] David Held, Xinyang Geng, Carlos Florensa, and Pieter Abbeel. Automatic goal generation for reinforcement learning agents. 2018.

[45] Jonathan Ho and Stefano Ermon. Generative adversarial imitation learning. In Advances in neural information processing systems, pages 4565–4573, 2016.

[46] Tin Ho. Random decision forests. In Proceedings of the Third International Conference on Document Analysis and Recognition, pages 278–282. IEEE, 1995.

[47] Tin Ho. The random subspace method for constructing decision forests. IEEE Transactions on Pattern Analysis and Machine Intelligence, 20(8):832–844, 1998.

[48] Sepp Hochreiter and Jürgen Schmidhuber. Long short-term memory. Neural computation, 9(8):1735–1780, 1997.

[49] Sepp Hochreiter, A Steven Younger, and Peter R Conwell. Learning to learn using gradient descent. In International Conference on Artificial Neural Networks, pages 87–94. Springer, 2001.

[50] G Robert J Hockey, A John Maule, Peter J Clough, and Larissa Bdzola. Effects of negative mood states on risk in everyday decision making. Cognition & Emotion, 14(6):823–855, 2000.

[51] Robert Hockey. Stress and fatigue in human performance, volume 3. John Wiley & Sons Inc, 1983.

[52] Katja Hofmann. Minecraft as AI playground and laboratory. In Proceedings of the Annual Symposium on Computer-Human Interaction in Play, pages 1–1, 2019.

[53] Hengyuan Hu, Denis Yarats, Qucheng Gong, Yuandong Tian, and Mike Lewis. Hierarchical decision making by generating and following natural language instructions. In Advances in neural information processing systems, pages 10025–10034, 2019.

[54] Jeff Huang, Thomas Zimmermann, Nachiappan Nagappan, Charles Harrison, and Bruce C Phillips. Mastering the art of war: how patterns of gameplay influence skill in Halo. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, pages 695–704. ACM, 2013.

[55] Yun Huang, Wenyue Ye, Nicholas Bennett, and Noshir Contractor. Functional or social?: exploring teams in online games. In Proc. Conference on Computer supported cooperative work, pages 399–408. ACM, 2013.

[56] Daniel Johnson, Peta Wyeth, Madison Clark, and Christopher Watling. Cooperative game play with avatars and agents: Differences in brain activity and the experience of play. In Proceedings of the 33rd Annual ACM Conference on Human Factors in Computing Systems, pages 3721–3730. ACM, 2015.

[57] Arthur Juliani, Vincent-Pierre Berges, Esh Vckay, Yuan Gao, Hunter Henry, Marwan Mattar, and Danny Lange. Unity: A general platform for intelligent agents. arXiv preprint arXiv:1809.02627, 2018.

[58] Ichiro Kawachi and Lisa F Berkman. Social ties and mental health. Journal of Urban health, 78(3):458–467, 2001.

[59] Michał Kempka, Marek Wydmuch, Grzegorz Runc, Jakub Toczek, and Wojciech Jaśkowski. ViZDoom: A Doom-based AI research platform for visual reinforcement learning. In 2016 IEEE Conference on Computational Intelligence and Games (CIG), pages 1–8. IEEE, 2016.

[60] Jooyeon Kim, Brian C Keegan, Sungjoon Park, and Alice Oh. The proficiency-congruency dilemma: Virtual team design and performance in multiplayer online games.
In Proceedings of the 2016 CHI Conference on Human Factors in Computing Systems, pages 4351–4365. ACM, 2016.

[61] Young Ji Kim, David Engel, Anita Williams Woolley, Jeffrey Yu-Ting Lin, Naomi McArthur, and Thomas W Malone. What makes a strong team?: Using collective intelligence to predict team performance in League of Legends. In Proceedings of the 2017 ACM Conference on Computer Supported Cooperative Work and Social Computing, pages 2316–2329. ACM, 2017.

[62] Diederik P Kingma and Jimmy Ba. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014.

[63] W Bradley Knox and Peter Stone. Interactively shaping agents via human reinforcement: The TAMER framework. In Proceedings of the fifth international conference on Knowledge capture, pages 9–16, 2009.

[64] W Bradley Knox and Peter Stone. Combining manual feedback with subsequent MDP reward signals for reinforcement learning. In Proceedings of the 9th International Conference on Autonomous Agents and Multiagent Systems: Volume 1, pages 5–12. Citeseer, 2010.

[65] Farshad Kooti, Esteban Moro, and Kristina Lerman. Twitter session analytics: Profiling users' short-term behavioral changes. In Proceedings of the 8th International Conference, pages 71–86. Springer, 2016.

[66] Farshad Kooti, Karthik Subbian, Winter Mason, Lada Adamic, and Kristina Lerman. Understanding short-term changes in online activity sessions. In Proceedings of the 26th International Conference on World Wide Web Companion, pages 555–563, 2017.

[67] Yubo Kou and Xinning Gui. Playing with strangers: Understanding temporary teams in League of Legends. In Proceedings of the Symposium on Computer-Human Interaction in Play, pages 161–169, 2014.

[68] Yubo Kou, Xinning Gui, and Yong Ming Kow. Ranking practices and distinction in League of Legends. In Proceedings of the Symposium on Computer-Human Interaction in Play, pages 4–9. ACM, 2016.

[69] Vitaly Kurin, Sebastian Nowozin, Katja Hofmann, Lucas Beyer, and Bastian Leibe. The Atari grand challenge dataset. arXiv preprint arXiv:1705.10998, 2017.

[70] Robert Kurzban, Angela Duckworth, Joseph W Kable, and Justus Myers. An opportunity cost model of subjective effort and task performance. Behavioral and Brain Sciences, 36(06):661–679, 2013.

[71] Alex Leavitt, Brian C Keegan, and Joshua Clark. Ping to win?: Non-verbal communication and team performance in competitive online multiplayer games. In Proceedings of the 2016 CHI Conference on Human Factors in Computing Systems, pages 4337–4350. ACM, 2016.

[72] Thomas C Leonard. Richard H. Thaler, Cass R. Sunstein, Nudge: Improving decisions about health, wealth, and happiness. 2008.

[73] Julian Lim, Wen-chau Wu, Jiongjiong Wang, John A Detre, David F Dinges, and Hengyi Rao. Imaging brain fatigue from sustained mental workload: an ASL perfusion study of the time-on-task effect. Neuroimage, 49(4):3426–3435, 2010.

[74] Zeming Lin, Jonas Gehring, Vasil Khalidov, and Gabriel Synnaeve. StarData: A StarCraft AI research dataset. In Thirteenth Artificial Intelligence and Interactive Digital Entertainment Conference, 2017.

[75] Jeffrey A Linder, Jason N Doctor, Mark W Friedberg, Harry Reyes Nieva, Caroline Birks, Daniella Meeker, and Craig R Fox. Time of day and the decision to prescribe antibiotics. JAMA internal medicine, 174(12):2029–2031, 2014.

[76] Monicque Lorist, Maarten Boksem, and Richard Ridderinkhof. Impaired cognitive control and reduced cingulate activity during mental fatigue. Cognitive Brain Research, 24(2):199–205, 2005.
[77] Samuele M Marcora, Walter Staiano, and Victoria Manning. Mental fatigue impairs physical performance in humans. Journal of Applied Physiology, 106(3):857–864, 2009.

[78] Winter Mason and Aaron Clauset. Friends FTW! Friendship and competition in Halo: Reach. In Proceedings of the 2013 conference on Computer supported cooperative work, pages 375–386. ACM, 2013.

[79] John Mathieu, M Travis Maynard, Tammy Rapp, and Lucy Gilson. Team effectiveness 1997-2007: A review of recent advancements and a glimpse into the future. Journal of management, 34(3):410–476, 2008.

[80] Tambet Matiisen, Avital Oliver, Taco Cohen, and John Schulman. Teacher-student curriculum learning. IEEE Transactions on Neural Networks and Learning Systems, 2019.

[81] Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. Efficient estimation of word representations in vector space, 2013.

[82] Volodymyr Mnih, Adria Puigdomenech Badia, Mehdi Mirza, Alex Graves, Timothy Lillicrap, Tim Harley, David Silver, and Koray Kavukcuoglu. Asynchronous methods for deep reinforcement learning. In International conference on machine learning, pages 1928–1937, 2016.

[83] Satyam Mukherjee, Yun Huang, Julia Neidhardt, Brian Uzzi, and Noshir Contractor. Prior shared success predicts victory in team competitions. Nature Human Behaviour, 2018.

[84] Mark Muraven and Roy F Baumeister. Self-regulation and depletion of limited resources: Does self-control resemble a muscle? Psychological bulletin, 126(2):247, 2000.

[85] Bonnie Nardi and Justin Harris. Strangers and friends: Collaborative play in World of Warcraft. In Proceedings of the 2006 20th anniversary conference on Computer supported cooperative work, pages 149–158. ACM, 2006.

[86] Alex Nichol and John Schulman. Reptile: a scalable metalearning algorithm. arXiv preprint arXiv:1803.02999, 2:2, 2018.

[87] O'Dhaniel A Mullette-Gillman, Ruth LF Leong, and Yoanna A Kurnianingsih. Cognitive fatigue destabilizes economic decision making preferences and strategies. PLoS ONE, 10(7):e0132022, 2015.

[88] OpenAI: Christopher Berner, Greg Brockman, Brooke Chan, Vicki Cheung, Przemysław Dębiak, Christy Dennison, David Farhi, Quirin Fischer, Shariq Hashme, Chris Hesse, Rafal Józefowicz, Scott Gray, Catherine Olsson, Jakub Pachocki, Michael Petrov, Henrique Pondé de Oliveira Pinto, Jonathan Raiman, Tim Salimans, Jeremy Schlatter, Jonas Schneider, Szymon Sidor, Ilya Sutskever, Jie Tang, Filip Wolski, and Susan Zhang. Dota 2 with large scale deep reinforcement learning, 2019.

[89] Kunwoo Park, Meeyoung Cha, Haewoon Kwak, and Kuan-Ta Chen. Achievement and friends: Key factors of player retention vary across player levels in online multiplayer games. arXiv:1702.08005, 2017.

[90] Nathalie Pattyn, Xavier Neyt, David Henderickx, and Eric Soetens. Psychophysiological investigation of vigilance decrement: boredom or cognitive fatigue? Physiology & Behavior, 93(1):369–378, 2008.

[91] Bryan Perozzi, Rami Al-Rfou, and Steven Skiena. DeepWalk: Online learning of social representations. In Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining, pages 701–710, 2014.

[92] Julianna Pillemer and Nancy P Rothbard. Friends without benefits: Understanding the dark sides of workplace friendship. Academy of Management Review, 2018.

[93] Lerrel Pinto and Abhinav Gupta. Supersizing self-supervision: Learning to grasp from 50k tries and 700 robot hours. In 2016 IEEE international conference on robotics and automation (ICRA), pages 3406–3413. IEEE, 2016.
[94] Nataliia Pobiedina, Julia Neidhardt, Maria del Carmen Calatrava Moreno, Laszlo Grad-Gyenge, and Hannes Werthner. On successful team formation: Statistical analysis of a multiplayer online game. In Business Informatics (CBI), 2013 IEEE 15th Conference on, pages 55–62. IEEE, 2013.

[95] Rémy Portelas, Cédric Colas, Katja Hofmann, and Pierre-Yves Oudeyer. Teacher algorithms for curriculum learning of deep RL in continuously parameterized environments. arXiv preprint arXiv:1910.07224, 2019.

[96] Marc'Aurelio Ranzato, Sumit Chopra, Michael Auli, and Wojciech Zaremba. Sequence level training with recurrent neural networks. arXiv preprint arXiv:1511.06732, 2015.

[97] Sachin Ravi and Hugo Larochelle. Optimization as a model for few-shot learning. 2016.

[98] Steven J Rennie, Etienne Marcheret, Youssef Mroueh, Jerret Ross, and Vaibhava Goel. Self-critical sequence training for image captioning. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 7008–7024, 2017.

[99] Mark O Riedl. Human-centered artificial intelligence and machine learning. Human Behavior and Emerging Technologies, 1(1):33–36, 2019.

[100] Stéphane Ross, Geoffrey Gordon, and Drew Bagnell. A reduction of imitation learning and structured prediction to no-regret online learning. In Proceedings of the fourteenth international conference on artificial intelligence and statistics, pages 627–635, 2011.

[101] Andries F Sanders and Andries Sanders. Elements of human performance: Reaction processes and attention in human skill. Psychology Press, 2013.

[102] A. Sapienza, H. Peng, and E. Ferrara. Performance dynamics and success in online games. In 2017 IEEE International Conference on Data Mining Workshops (ICDMW), pages 902–909, 2017.

[103] Anna Sapienza, Alessandro Bessi, and Emilio Ferrara. Non-negative tensor factorization for human behavioral pattern mining in online games. Information, 9(3):66, 2018.

[104] Anna Sapienza, Alessandro Bessi, and Emilio Ferrara. Non-negative tensor factorization for human behavioral pattern mining in online games. Information, 9(3):66, 2018.

[105] Anna Sapienza, Hao Peng, and Emilio Ferrara. Performance dynamics and success in online games. In 2017 IEEE International Conference on Data Mining Workshops (ICDMW), pages 902–909, 2017.

[106] Anna Sapienza, Yilei Zeng, Alessandro Bessi, Kristina Lerman, and Emilio Ferrara. Individual performance in team-based online games. Royal Society open science, 5(6):180329, 2018.

[107] Mark W Scerbo. Stress, workload, and boredom in vigilance: a problem and an answer. Stress, workload, and fatigue, 2001.

[108] Stefan Schaal. Is imitation learning the route to humanoid robots? Trends in cognitive sciences, 3(6):233–242, 1999.

[109] Robert E Schapire and Yoram Singer. Improved boosting algorithms using confidence-rated predictions. Machine learning, 37(3):297–336, 1999.

[110] John Schulman, Filip Wolski, Prafulla Dhariwal, Alec Radford, and Oleg Klimov. Proximal policy optimization algorithms. arXiv preprint arXiv:1707.06347, 2017.

[111] Cuihua Shen, Rabindra Ratan, Y Dora Cai, and Alex Leavitt. Do men advance faster than women? Debunking the gender performance gap in two massively multiplayer online games. Journal of Computer-Mediated Communication, 21(4):312–329, 2016.

[112] Hans Sievertsen, Francesca Gino, and Marco Piovesan. Cognitive fatigue influences students' performance on standardized tests. Proceedings of the National Academy of Sciences, 113(10):2621–2624, 2016.

[113] Rafet Sifa, Anders Drachen, and Christian Bauckhage.
Large-scale cross-game player behavior analysis on Steam. In Proceedings of the AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment, 2015.

[114] Philipp Singer, Emilio Ferrara, Farshad Kooti, Markus Strohmaier, and Kristina Lerman. Evidence of online performance deterioration in user sessions on Reddit. PloS one, 11(8):e0161636, 2016.

[115] Anu Sivunen and Marko Hakonen. Review of virtual environment studies on social and group phenomena. Small Group Research, 42(4):405–457, 2011.

[116] Jake Snell, Kevin Swersky, and Richard Zemel. Prototypical networks for few-shot learning. In Advances in neural information processing systems, pages 4077–4087, 2017.

[117] Constance Steinkuehler and Sean Duncan. Scientific habits of mind in virtual worlds. Journal of Science Education and Technology, 17(6):530–543, 2008.

[118] Sainbayar Sukhbaatar, Zeming Lin, Ilya Kostrikov, Gabriel Synnaeve, Arthur Szlam, and Rob Fergus. Intrinsic motivation and automatic curricula via asymmetric self-play. arXiv preprint arXiv:1703.05407, 2017.

[119] Chen Sun, Austin Myers, Carl Vondrick, Kevin Murphy, and Cordelia Schmid. VideoBERT: A joint model for video and language representation learning. In Proceedings of the IEEE/CVF international conference on computer vision, pages 7464–7473, 2019.

[120] Michael Szell, Renaud Lambiotte, and Stefan Thurner. Multirelational organization of large-scale social networks in an online world. Proceedings of the National Academy of Sciences, 107(31):13636–13641, 2010.

[121] Matthew E Taylor and Peter Stone. Transfer learning for reinforcement learning domains: A survey. Journal of Machine Learning Research, 10(Jul):1633–1685, 2009.

[122] Peggy A Thoits. Mechanisms linking social ties and support to physical and mental health. Journal of health & social behavior, 52(2):145–161, 2011.

[123] Sebastian Thrun and Lorien Pratt. Learning to learn. Springer Science & Business Media, 2012.

[124] Sabine Trepte, Leonard Reinecke, and Keno Juechems. The social side of gaming: How playing online computer games creates online and offline social support. Computers in Human Behavior, 28(3):832–839, 2012.

[125] April Tyack, Peta Wyeth, and Daniel Johnson. The appeal of MOBA games: What makes people start, stay, and stop. In Proceedings of the 2016 Annual Symposium on Computer-Human Interaction in Play, pages 313–325. ACM, 2016.

[126] Debra Umberson and Jennifer Karas Montez. Social relationships and health: A flashpoint for health policy. Journal of health and social behavior, 51(1_suppl):S54–S66, 2010.

[127] Dimitri Van der Linden, Michael Frese, and Theo F Meijman. Mental fatigue and the control of cognitive processes: effects on perseveration and planning. Acta Psychologica, 113(1):45–65, 2003.

[128] Rodrigo Vicencio-Moreira, Regan L Mandryk, and Carl Gutwin. Now you can compete with anyone: Balancing players of different skill levels in a first-person shooter game. In Proceedings of the 33rd ACM Conference on Human Factors in Computing Systems, pages 2255–2264. ACM, 2015.

[129] Oriol Vinyals, Charles Blundell, Timothy Lillicrap, Daan Wierstra, et al. Matching networks for one shot learning. In Advances in neural information processing systems, pages 3630–3638, 2016.

[130] Kathleen Vohs, Roy Baumeister, Brandon Schmeichel, Jean Twenge, Noelle Nelson, and Dianne Tice. Making choices impairs subsequent self-control: a limited-resource account of decision making, self-regulation, and active initiative. 2014.

[131] Joel S Warm, Gerald Matthews, and Victor S Finomore Jr. Vigilance, workload, and stress. Performance under stress, pages 115–41, 2008.
[132] Amy Wax, Leslie A DeChurch, and Noshir S Contractor. Self-organizing into winning teams: Understanding the mechanisms that drive successful collaborations. Small Group Research, 48(6):665–718, 2017.

[133] Ronald J Williams. Simple statistical gradient-following algorithms for connectionist reinforcement learning. Machine learning, 8(3-4):229–256, 1992.

[134] Ronald J Williams and David Zipser. A learning algorithm for continually running fully recurrent neural networks. Neural computation, 1(2):270–280, 1989.

[135] Robert E Wilson, Samuel D Gosling, and Lindsay T Graham. A review of Facebook research in the social sciences. Perspectives on psychological science, 7(3):203–220, 2012.

[136] Stefan Wuchty, Benjamin F Jones, and Brian Uzzi. The increasing dominance of teams in production of knowledge. Science, 316(5827):1036–1039, 2007.

[137] Zichao Yang, Diyi Yang, Chris Dyer, Xiaodong He, Alex Smola, and Eduard Hovy. Hierarchical attention networks for document classification. In Proceedings of the 2016 conference of the North American chapter of the association for computational linguistics: human language technologies, pages 1480–1489, 2016.

[138] Yilei Zeng. How human-centered AI will contribute towards intelligent gaming systems. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 35, pages 15742–15743, 2021.

[139] Yilei Zeng, Jiali Duan, Yang Li, Emilio Ferrara, Lerrel Pinto, C-C Jay Kuo, and Stefanos Nikolaidis. Human decision makings on curriculum reinforcement learning with difficulty adjustment. arXiv preprint arXiv:2208.02932, 2022.

[140] Yilei Zeng, Deren Lei, Beichen Li, Gangrong Jiang, Emilio Ferrara, and Michael Zyda. Learning to reason in round-based games: Multi-task sequence generation for purchasing decision making in first-person shooters. In Proceedings of the AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment, volume 16, pages 308–314, 2020.

[141] Yilei Zeng, Anna Sapienza, and Emilio Ferrara. The influence of social ties on performance in team-based online games. IEEE Transactions on Games, 2019.

[142] Yilei Zeng, Aayush Shah, Jameson Thai, and Michael Zyda. Applied machine learning for games: A graduate school course. The Eleventh Symposium on Educational Advances in Artificial Intelligence (EAAI-21), 2020.

[143] Zhi-Jin Zhong. The effects of collective MMORPG (massively multiplayer online role-playing games) play on gamers' online and offline social capital. Computers in human behavior, 27(6):2352–2363, 2011.
Abstract
A paradigm shift towards human-centered intelligent gaming systems is gradually setting in. This dissertation explores the complexities of social sequential decision-making within online gaming environments and presents comprehensive AI solutions to enhance personalized single- and multi-agent experiences. The three core contributions of the dissertation are intricately interrelated, creating a cohesive framework for understanding and improving AI in gaming: one, an analysis of the dynamics of gaming sessions and of sequential in-game individual and social decision-making, which establishes a baseline of how decisions evolve and provides the necessary context for the subsequent integration of diverse information sources; two, the integration of heterogeneous information and multi-modal trajectories, which enhances decision-making generation models; and three, the creation of a reinforcement-learning-with-human-feedback framework to train gaming AIs that effectively align with human preferences and strategies, which enables the system not only to learn from but also to interact with humans. Collectively, this dissertation combines innovative data-driven, generative AI, representation learning, and human-AI collaboration solutions to help advance both computational social science and artificial intelligence applications in gaming.
Linked assets
University of Southern California Dissertations and Theses

Conceptually similar
No-regret learning and last-iterate convergence in games
Socially-informed content analysis of online human behavior
Robust and adaptive online reinforcement learning
Sequential Decision Making and Learning in Multi-Agent Networked Systems
Emotional appraisal in deep reinforcement learning
Predicting and planning against real-world adversaries: an end-to-end pipeline to combat illegal wildlife poachers on a global scale
Artificial Decision Intelligence: integrating deep learning and combinatorial optimization
Decision making in complex action spaces
Leveraging cross-task transfer in sequential decision problems
Modeling emotional effects on decision-making by agents in game-based simulations
Behavior-based approaches for detecting cheating in online games
Improving decision-making in search algorithms for combinatorial optimization with machine learning
Understanding goal-oriented reinforcement learning
Robust and adaptive online decision making
Artificial intelligence for low resource communities: Influence maximization in an uncertain world
High-throughput methods for simulation and deep reinforcement learning
Reducing unproductive learning activities in serious games for second language acquisition
Online reinforcement learning for Markov decision processes and games
Automated alert generation to improve decision-making in human robot teams
Machine learning in interacting multi-agent systems
Asset Metadata
Creator: Zeng, Yilei (author)
Core Title: Learning social sequential decision making in online games
School: Viterbi School of Engineering
Degree: Doctor of Philosophy
Degree Program: Computer Science
Degree Conferral Date: 2024-05
Publication Date: 06/10/2024
Defense Date: 05/07/2024
Publisher: Los Angeles, California (original); University of Southern California (original); University of Southern California. Libraries (digital)
Tag: artificial intelligence, computational social science, OAI-PMH Harvest, online games, reinforcement learning, representation learning
Format: theses (aat)
Language: English
Contributor: Electronically uploaded by the author (provenance)
Advisor: Ferrara, Emilio (committee chair); Williams, Dmitri (committee member); Zyda, Michael (committee member)
Creator Email: yilei.zeng@gmail.com, yileizen@usc.edu
Permanent Link (DOI): https://doi.org/10.25549/usctheses-oUC113992539
Unique identifier: UC113992539
Identifier: etd-ZengYilei-13071.pdf (filename)
Legacy Identifier: etd-ZengYilei-13071
Document Type: Dissertation
Rights: Zeng, Yilei
Internet Media Type: application/pdf
Type: texts
Source: 20240610-usctheses-batch-1166 (batch); University of Southern California (contributing entity); University of Southern California Dissertations and Theses (collection)
Access Conditions: The author retains rights to his/her dissertation, thesis or other graduate work according to U.S. copyright law. Electronic access is being provided by the USC Libraries in agreement with the author, as the original true and official version of the work, but does not grant the reader permission to use the work if the desired use is covered by copyright. It is the author, as rights holder, who must provide use permission if such use is covered by copyright.
Repository Name: University of Southern California Digital Library
Repository Location: USC Digital Library, University of Southern California, University Park Campus MC 2810, 3434 South Grand Avenue, 2nd Floor, Los Angeles, California 90089-2810, USA
Repository Email: cisadmin@lib.usc.edu