ESSAYS ON THE ECONOMICS OF DIGITAL ENTERTAINMENT

by

Wensi Zhang

A Dissertation Presented to the
FACULTY OF THE USC GRADUATE SCHOOL
UNIVERSITY OF SOUTHERN CALIFORNIA
In Partial Fulfillment of the
Requirements for the Degree
DOCTOR OF PHILOSOPHY
(BUSINESS ADMINISTRATION)

May 2023

Copyright 2023 Wensi Zhang

Dedication

This dissertation is dedicated to my loving and supportive family. To my parents, Derong Zheng and Xianming Zhang, who have always believed in me and provided selfless support throughout my life. To my dearest husband, Guanshengrui Hao, who has been my rock throughout this journey. And finally, to my lovely and sweet daughter, Emily Hao, who delays but enriches this process.

Acknowledgements

My dissertation quantifies numerous things. But it is impossible to quantify how grateful I am for the help and support I have received throughout the years.

I would like to express my deepest gratitude to my supervisor, Prof. Sha Yang, for her unwavering guidance and support throughout my Ph.D. journey. You have been more than just a supervisor to me; you have been a mentor, a friend, and a family member. I am truly grateful for your confidence in me, especially during the times when I faced personal challenges and struggled to find the right balance between my responsibilities as a Ph.D. student and as a new mom. I could not have thrived in either role without you.

I am sincerely thankful to the members of my dissertation committee, Professor Yanhao (Max) Wei, Professor Anthony Dukes, and Professor Tianshu Sun, for their constructive feedback and insightful suggestions, which have significantly contributed to the improvement of this dissertation and helped me land my dream job.
I would also like to extend my gratitude to other faculty members of the Marketing Department at USC Marshall, particularly Lan Luo, Nikhil Malik, Gerard Tellis, Sivaramakrishnan Siddarth, Shantanu Dutta, Kalinda Ukanwa, Kristin Diehl, and Stephanie Tully, for their support during my studies at USC Marshall.

In addition, I would like to thank Julie Phaneuf, Elizabeth Mathew, Doris Meunier, and Jennifer Lim for their administrative support and assistance, which made my life at Marshall smooth and enjoyable.

My Ph.D. journey would not have been so wonderful without my friends. I am deeply grateful to Yao Yao, Jisu Cao, and Yanyan Li for the many engaging and enjoyable discussions we have shared over the years. Your encouragement and support helped me through the most challenging times. I thank Gizem Ceylan and Sajeev Nair for being fantastic officemates and providing invaluable help during my job search. I am also thankful to Mengxia Zhang, Jihoon Hong, and Chaumanix Dutton for their support throughout the years.

Finally, I would like to thank my family for their love and support. I am deeply indebted to my parents, Derong Zheng and Xianming Zhang, for their love and confidence in me. Thank you for flying to the United States and taking care of us during the worst of the COVID-19 pandemic; this dissertation would not have been possible without your support. I also want to thank my husband, Guanshengrui Hao, for always being there for me, loving me, and providing everything he could to support me. And my sweet, lovely, and beautiful girl, Emily Hao: thank you for inspiring me with your exceptional creativity and eagerness to explore.

Table of Contents

Dedication
Acknowledgements
List of Tables
List of Figures
Abstract
Chapter 1: Online Social Network with Communities: Evolvement of Within- and Cross-Community Ties
  1.1 Introduction
  1.2 Literature
  1.3 Data and Descriptive Analysis
    1.3.1 Data
    1.3.2 Descriptive Analysis
  1.4 Model
    1.4.1 Friendship Formation
    1.4.2 Community Choice
  1.5 Estimation
    1.5.1 Estimation Challenges
    1.5.2 Neural Network Estimator
    1.5.3 Moments and Identification
    1.5.4 Test Datasets and Monte Carlo
  1.6 Model Estimates
  1.7 Implications
    1.7.1 Within- versus Cross-Community Links
    1.7.2 Mobility and Network Connectivity
  1.8 Conclusion
Chapter 2: Learning to Create on Content-Sharing Platforms
  2.1 Introduction
  2.2 Literature
  2.3 Data
  2.4 Model
    2.4.1 Utility Specification
    2.4.2 Learning
  2.5 Identification and Estimation
  2.6 Estimation Results
  2.7 Counterfactual Analysis: Smoothing Reward Over Time
    2.7.1 Effects on Content Creation
    2.7.2 Effects on Platform Revenue
  2.8 Conclusion
Chapter 3: Beauty, Effort, and Earnings: Empirical Evidence from Live Streaming
  3.1 Introduction
  3.2 Literature
  3.3 Data and Empirical Evidence
    3.3.1 Creator Information
    3.3.2 Live Streaming Session Information
    3.3.3 Earning and Effort by Attractiveness
  3.4 Empirical Strategy
    3.4.1 Model
    3.4.2 Instrumenting Effort Using the Temporal Distance to Level-ups
  3.5 Estimation Results
    3.5.1 Main Results
    3.5.2 Robustness Check: Single Level-up Subsample
    3.5.3 Robustness Check: Other Forms of Instruments
    3.5.4 Robustness Check: Rookies vs. Experienced Creators
  3.6 Implications
  3.7 Conclusion
Bibliography
Appendices
  A Moments for NNE
  B Tie Transition Probabilities
  C COVID-19 Induced Lockdowns in China
  D Estimation Results: Learning Model
  E Policy Results: Impacts on Content Variety Measured by Entropy

List of Tables

1.1 Explanation of Model Variables
1.2 Monte Carlo Results
1.3 Estimation Results
1.4 Evolution of Within- vs. Cross-Community Ties
2.1 Explanation of Model Variables
2.2 Choice Model Estimates
2.3 Summary of Individual Parameters
2.4 Content Creation Under Policy
2.5 Platform Revenue Under Policy (k)
3.1 Explanation of Model Variables
3.2 Main Estimation Results
3.3 Estimation Results: Single Level-up Subsample
3.4 Estimation Results: Alternative IVs
3.5 Estimation Results: Rookie vs. Experienced Creators
B.1 Evolvement of Within- and Cross-Community Ties
C.1 Lockdown Timing
E.1 Content Variety

List of Figures

1.1 Descriptive Analysis: Static Properties
1.2 Descriptive Analysis: Dynamics
1.3 Network Distance and Identification
1.4 Estimation Errors on Test Datasets
1.5 Community Switching Creates Cross-Community Ties
1.6 Player Mobility and Cross-Community Tie Density
1.7 Player Mobility and Network Density
2.1 Examples of Live Streaming Sessions
2.2 Content Frequency
2.3 Individual Preference on Reward
2.4 Individual Preference on Risk
2.5 Individual Preference on Content Produced in Recent Past
2.6 Income Smoothing Policies
2.7 Effects of Income Smoothing on Content Creation
2.8 Effects of Income Smoothing on Platform Revenue
3.1 Examples of Attractiveness Ratings
3.2 Keeping Track of Level-ups
3.3 Tipping Income by Attractiveness
3.4 Tipping Income Per Hour by Attractiveness
3.5 Effort by Attractiveness
3.6 Effort and Income around Level-ups
3.7 Normalized Effort and Income around Level-ups
3.8 Effort and Income around Level-ups: Single Level-up Subsample
3.9 Decomposing Beauty Premium
3.10 Quantifying Income Gaps
D.1 Learning Model Estimates

Abstract

The digital entertainment industry has revolutionized how consumers entertain themselves and socialize, and has created careers for tens of millions of people worldwide.
My dissertation studies the managerial and social implications of this industry.

Chapter 1 studies how social networks with communities evolve. Online communities are a widely adopted marketing practice to encourage user engagement, but they raise the concern of potentially fragmenting the network. Using data from online gaming, we build and estimate a model that captures the co-evolution of social networks and communities, and we find that ties are much more likely to form within a community than across communities. However, in the long term, a large fraction of cross-community ties actually come from within-community tie formation. In this process, the frequency with which individuals switch communities plays a crucial role. Hence, the platform could achieve stronger connections by harnessing tie formation within communities.

Chapter 2 examines how monetary rewards and reward fluctuations affect creators' content creation choices on content-sharing platforms. Using data from a leading live streaming platform, we build and estimate a model capturing how creators learn about their reward evolution patterns through their past creation and consumption experience, and use the expected reward and reward variance to make content creation choices. Our results suggest that reward fluctuations bring significant disutility. To manage this, we propose an income smoothing policy that, if well designed, could achieve an additional revenue gain and promote content creation. Moreover, the policy also promotes content diversity on the platform by encouraging creators to tap into less popular content categories.

Chapter 3 quantifies the beauty premium while incorporating the role of effort. In the rising live streaming industry, where physical appearance is highly prominent, we measure effort by streaming time and address its endogeneity by instrumenting effort with creators' temporal distance to level-ups.
In addition to directly controlling for effort, we model the interaction of effort and beauty to capture attractive creators' advantage in efficiency. This allows us to decompose the beauty premium into an effort-independent component and a component enhanced by effort. We find a substantial overall beauty premium, 93.5% of which is attributed to the effort-enhanced component. Contrary to the perception of beauty as an effortless asset, our findings suggest that beauty needs to be capitalized through effort. These findings provide important implications for both content creators and platforms.

Chapter 1

Online Social Network with Communities: Evolvement of Within- and Cross-Community Ties

1.1 Introduction

Online social networks have continued to grow in popularity and have become integrated into a variety of digital platforms (e.g., LinkedIn, Reddit, Steam, and Quora). A common feature of online social networks is the community structure, where a network is divided into "communities" that are much more densely connected within than between each other.¹ Communities can have complex effects on online social networks. On one hand, a community brings together users with similar interests, which can increase user interactions and foster strong ties. On the other hand, self-selection into homogeneous groups is a major source of concern for social network fragmentation, where individuals fall into silos of knowledge or behavior.²

1. See Girvan and Newman (2002), Newman and Girvan (2004), and Java et al. (2007).
2. See Van Alstyne and Brynjolfsson (2005), Lawrence, Sides, and Farrell (2010), Henry, Prałat, and Zhang (2011), Conover et al. (2011), Bessi et al. (2015), Del Vicario et al. (2016), Schmidt et al. (2018), and Hwang and Krackhardt (2020). Also see discussion in media such as Mims (2020) and Horwitz and Seetharaman (2020).

Network fragmentation
3 Underlying the concern of network fragmentation is the opposition between two types of ties: within- and cross-community ties. Intuitively, because maintaining ties is costly, if a person develops many ties with people in her own community, she may not keep as many ties connecting to other communities as she would otherwise. This dichotomous view, while common, relies on a fairly static perspective of communities. From a dynamic perspective, a person’s within- community ties become cross-community when she switches communities (for an illustration, see Figure 1.5 in Section 1.7). Similarly, cross-community ties can evolve into within-community ties. This paper’s goal is to explore and emphasize this mutual inclusivity between the two types of ties. We document and quantify the evolution between within- and cross-community ties. We study an online game that features explicit communities (known as guilds in the game). The unique dataset includes multiple snapshots of the network and communities over three months. We construct and estimate a model to capture the dynamics of the network and communities, allowing for flexible unobserved individual heterogeneity. The model allows us to evaluate the long-term evolution of ties. We find that within-community tie formation is a major source of cross-community ties, and the main source of strong cross-community ties. The reverse is not true: cross-community tie formation contributes to only a small fraction of within-community ties. 3. See Watts and Strogatz (1998), Morris (2000), Onnela et al. (2007), Centola and Macy (2007), and Easley, Klein- berg, et al. (2010). 2 Specifically, we carry out our study following three research questions: (i) How do we capture the evolution of friendships and community memberships over time, accounting for unobserved user heterogeneity? (ii) How do within-community and cross-community ties form and evolve over time on an online social network? 
(iii) What policy can a platform exploit to promote the connectivity of a social network with communities?

These questions pose several challenges. The first challenge is to measure the mutual influence between the social ties and communities, which raises two requirements. The first requirement is a clear separation between communities and social ties in the data. A common theme in studies of community structure is to infer communities from social ties, as data on community memberships are rarer than data on social ties. However, most such inferences already impose a relation between communities and ties. To this end, we examine data where explicit communities are observed. The second requirement is a separation between the two directions of mutual influence. A user may join a community because her friends are in that community; meanwhile, the user may form a tie with someone because they are in the same community. Observing multiple snapshots of the network and communities helps us separate these two directions of effects. We model each user's friendship and community choices in a given period as affected by the ties and communities in the previous period.

The second challenge is unobserved heterogeneity. In general, we can think of two types of heterogeneity in the context of social networks. The first type is homophily (Jackson 2010). In our context, homophily is reflected in the fact that players with similar traits or interests in the game form ties or join the same community. These traits or interests, however, are unobserved. Thus, we introduce a set of latent individual parameters to capture them. Because it is the distances between traits/interests that matter here, we think of this heterogeneity as "horizontal." The second type of heterogeneity is that certain players are more sociable and more likely to form ties with any other player (Graham 2017). Thus, we introduce another set of latent individual parameters that capture each player's tendency to form ties.
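To make these two forms of unobserved heterogeneity concrete, the sketch below illustrates in Python how a tie-formation probability of this kind might combine a same-community effect, homophily in latent traits, and individual sociability. This is an illustrative toy specification with made-up parameter names, not the dissertation's actual model (which is presented in Section 1.4).

```python
import numpy as np

rng = np.random.default_rng(0)

def tie_prob(alpha_i, alpha_j, z_i, z_j, same_community,
             beta_same=1.0, beta_dist=1.0):
    """Toy tie-formation probability between players i and j.

    alpha_*: sociability parameters (higher -> more ties with anyone)
    z_*:     latent trait positions (homophily: closer -> tie more likely)
    same_community: 1 if i and j are currently in the same community
    All coefficient values here are hypothetical.
    """
    v = (alpha_i + alpha_j                         # degree heterogeneity
         + beta_same * same_community              # within-community boost
         - beta_dist * np.linalg.norm(z_i - z_j))  # distance between traits
    return 1.0 / (1.0 + np.exp(-v))                # logistic link

# Two players with identical latent traits: being in the same
# community should raise the probability that a tie forms.
z = rng.normal(size=2)
p_within = tie_prob(0.2, -0.1, z, z, same_community=1)
p_cross = tie_prob(0.2, -0.1, z, z, same_community=0)
assert p_within > p_cross
```

In a specification like this, the trait positions and sociability parameters would be latent and integrated out (or otherwise estimated), which is precisely what makes estimation hard.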
We think of this heterogeneity as "vertical."

The third challenge is computational difficulty in estimation, especially with the flexible unobserved heterogeneity. To evaluate the standard likelihood, all users' latent parameters need to be integrated out.⁴ On top of this, the evolution of the network and communities before the first snapshot of data needs to be integrated out, too (the first snapshot cannot simply be conditioned on, because it is informative of the latent parameters). These integrations are infeasible even for networks of modest sizes. To this end, we follow a trend in the literature of exploiting machine learning in econometric problems. Specifically, we train a neural net to "recognize" the parameters of our network model.⁵ The estimation strategy works well in Monte Carlo experiments.

4. Graham (2017) is not applicable due to the unobserved horizontal heterogeneity.
5. General properties of this estimation strategy, including consistency, are provided in Wei and Jiang (2021). For the literature that applies machine learning to econometric problems, see, e.g., Chiong, Galichon, and Shum (2016), Wager and Athey (2018), and Farrell, Liang, and Misra (2020).

The estimates of our model indicate that community and friendship choices reinforce each other, as expected. Specifically, in our model players maintain, develop, and dissolve ties simultaneously as they choose which community to stay in or switch to. We find that the probability of forming a tie between two players increases if they are in the same community. Meanwhile, the chance that a player joins (or stays in) a community increases if a larger proportion of her friends belong to that community.

We use the estimated model to carry out the central exercise of the paper, i.e., to quantify the evolution between within-community and cross-community ties. The model allows us to simulate the evolution for longer than the short span of the data (3 months), which then allows us to track whether each tie was initially born as within-community or cross-community. We find that over 1/3 of cross-community ties were born as within-community ties. This portion increases to over 1/2 if we give more weight to stronger ties.⁶ In other words, within-community tie formation is the main source of strong cross-community ties. The reverse relation is not true: only about 5% of within-community ties were born cross-community. These findings have important implications for models of community structure. In particular, when a model treats all cross-community ties in a snapshot as formed cross-community (e.g., stochastic block models), it will greatly inflate the cross-community tie formation probability.

6. We measure tie strength by friend overlap; see Easley, Kleinberg, et al. (2010) and Li et al. (2012).

The mechanism behind the above results lies in the fact that users carry ties with them when switching communities. As a result, if the platform can regulate the mobility of users (i.e., the frequency of switching communities), it may exploit this mechanism to achieve stronger cross-community connections on its social network. In light of this, we study a counterfactual platform policy where users must stay in a community for some time (i.e., a freeze time) after moving into that community and before being allowed to move again. We discover an inverse U-shaped relation between mobility and cross-community connectivity. If players are too mobile, they do not have sufficient time to develop strong ties within a community. If players are immobile, their within-community ties never become cross-community ties.

Overall, our results bring an important perspective to understanding and managing community structure in networks. Communities are typically characterized by a clear division between (dense) within-community ties and (sparse) cross-community ties. However, we show that there is substantial migration from one type of tie to the other over time. Thus, one may actually make use of within-community tie formation (with a balanced mobility level) to foster connections across different communities. For platforms, a less fragmented network implies better information diffusion, less polarization, and ultimately more added value for the platform.

The rest of the paper is organized as follows. Section 1.2 reviews the literature. Section 1.3 describes the empirical context and data. Section 1.4 presents the model. Section 1.5 outlines estimation. Section 1.6 presents estimates. Section 1.7 presents implications and counterfactuals. Section 1.8 concludes.

1.2 Literature

Online Community in Marketing. Our work builds on the marketing literature studying online virtual communities, which investigates whether and how community participation affects consumer behavior and generates economic and social values (Algesheimer, Dholakia, and Herrmann 2005; Algesheimer et al. 2010; He, Kuksov, and Narasimhan 2012; Zhu et al. 2012; Lu, Jerath, and Singh 2013; Manchanda, Packard, and Pattabhiramaiah 2015; Goh, Gao, and Agarwal 2016; Park et al. 2018; Yildirim et al. 2020; Chen, Xu, and Whinston 2011; Chamakiotis, Petrakaki, and Panteli 2021). Using data from a large North American retailer, Manchanda, Packard, and Pattabhiramaiah (2015) quantify the economic benefit of a firm-sponsored online community and find a significant increase in customer spending attributable to consumers joining the community. Using data from eBay Germany, Algesheimer et al. (2010) find that community participants become more conservative in bidding as buyers and more selective in listing as sellers. Through field and lab experiments, Zhu et al. (2012) find that community participation can lead to risk-seeking tendencies in financial decisions and behaviors when the participants form relatively strong ties with each other. In Park et al.
(2018), the authors examine the effects of online social connections on users' in-game purchases and find that social interaction between gamers in the community increases their in-game product purchases. He, Kuksov, and Narasimhan (2012) distinguish between "intraconnectivity" between customers of the same platform and "interconnectivity" between customers of different platforms, and use a theoretical model to examine how platforms should strategically provide one of the two types of connectivity to customers. As the first empirical study focusing on the social (non-economic) values of online communities, Goh, Gao, and Agarwal (2016) study how online health communities address rural-urban health disparities via improved health capabilities in the empirical context of a rare-disease community.

Our research contributes to this literature and makes several distinctions. First, we focus on the communities within a platform. The marketing literature mostly focuses on the "community" that represents an entire platform. For example, Park et al. (2018) also take online gaming as the empirical context; they examine an online gaming community and how player interactions in this community bring in additional revenues for the game. Our study zooms in on the guilds within a game as distinctive communities. Second, the aforementioned studies take social ties as given and focus on their impact on platform success. Our study focuses on the formation (or evolution) of social ties and communities. Third, our results speak to the structure of an online social network as shaped by distinctive communities.⁷

7. Related to our work are studies on how social network properties relate to the success of online communities (Hinds and Lee 2008; Dover, Goldenberg, and Shapira 2020).

Social Networks and Communities. There is a large literature on social networks across disciplines, in which several streams are most related to our study. First, there is a stream of studies on the importance of unobserved heterogeneity in social network modeling. Manski (1993) and Bramoullé, Djebbari, and Fortin (2009) discuss how to estimate peer effects when treating the social network as exogenously given. A series of more recent econometric models propose to incorporate the formation of networks (Goldsmith-Pinkham and Imbens 2013; Boucher 2016; Hsieh and Lee 2016; Ameri, Honka, and Xie 2017). These models recognize that networks carry additional information on the underlying similarities between individuals (e.g., homophily) not directly observed in the data. Graham (2017) and Dzemski (2019) emphasize that there can be additional heterogeneity in how likely individuals are to form ties with others (i.e., degree heterogeneity). We extend this stream of work by allowing both types of unobserved heterogeneity in both social network formation and community choices.

Second, there is a growing literature on online gaming in marketing, information systems, and economics. Video games have been studied as an important example of products with network externalities (Katz and Shapiro 1985, 1994). More recently, online gaming became a popular research context due to its rich and accurate measures of social ties (Zhao, Gao, and Xie 2017; Gu et al. 2017; Zhang et al. 2017; Park et al. 2018). We contribute to this literature by extending the examination of social networks to their co-evolution with within-game communities (e.g., guilds).

Third, in network science and statistics, there is a line of literature that tries to detect community structures in networks (Girvan and Newman 2002; Newman 2006; Karrer and Newman 2011; Ho, Yin, and Xing 2016). Recent works extend this literature to accommodate dynamic networks (Lin et al. 2009; Greene, Doyle, and Cunningham 2010; Yang et al. 2011).
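These detection methods rest on the premise that ties are denser within communities than between them. As a rough, self-contained illustration (not any particular paper's algorithm), the following Python snippet simulates a two-block stochastic block model with hypothetical tie probabilities and checks that the realized within-community density exceeds the cross-community density.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical two-community stochastic block model: ties form with
# higher probability within a block than across blocks.
n = 200                              # players
block = np.repeat([0, 1], n // 2)    # community labels (observed here;
                                     # inferred in the detection literature)
p_within, p_cross = 0.10, 0.01       # illustrative tie probabilities

same = block[:, None] == block[None, :]
p = np.where(same, p_within, p_cross)

# Sample a symmetric adjacency matrix with no self-ties.
upper = np.triu(rng.random((n, n)) < p, k=1)
adj = upper | upper.T

off_diag = ~np.eye(n, dtype=bool)
within_density = adj[same & off_diag].mean()
cross_density = adj[~same].mean()
assert within_density > cross_density  # the assumption detection relies on
```

Note that in a generative model like this, every cross-community tie is formed cross-community by construction; the dissertation's point is that real cross-community ties often start as within-community ties whose owners later switch communities.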
Communities are not directly observed in the data but are inferred from the network, relying on the assumption that nodes are more densely connected within than between communities. In contrast, the communities in our study are explicitly defined in the data. This distinction helps us measure how the network and communities affect each other.

Closely related to the community structure are studies that examine its implications. One stream of literature shows that networks fragmented by communities obstruct diffusion and adoption, and that bridges between communities are essential for removing the obstruction. This idea is argued by Granovetter (1973), Watts and Strogatz (1998), and Morris (2000). Empirical investigations include Onnela et al. (2007), Centola and Macy (2007), and Weng, Menczer, and Ahn (2013), among others. Another stream of literature examines how segregation and polarization manifest themselves along community lines. Examples include Schmidt et al. (2018), Bessi et al. (2015), and Del Vicario et al. (2016) on the Facebook network, and Conover et al. (2011), Guerra et al. (2013), and Shore, Baek, and Dellarocas (2018) on the Twitter network. Our study contributes to these lines of literature by offering insights on how platforms can achieve stronger connections that bridge communities.

Machine Learning in Analyzing Social Networks. Traditionally, economic and statistical models have been extensively applied to model social network formation and social influence (e.g., Manski 2000; Goldsmith-Pinkham and Imbens 2013; Shriver, Nair, and Hofstetter 2013; Christakis et al. 2020; Chen, Lans, and Trusov 2021). Meanwhile, there is a growing body of research in computer and data science that uses machine learning techniques to model social networks (Tang et al. 2015; Grover and Leskovec 2016; Zhu et al. 2016; Yuan, Alabdulkareem, et al. 2018; Qiu et al. 2018).
In particular, these works aim to represent nodes in a network by latent positions in a vector space, known as the embedding of the network (similar to the word embeddings extracted from a corpus). For instance, Zhu et al. (2016) propose a temporal latent space model that predicts link formation in a dynamic social network setting.

We also apply a machine learning technique (i.e., we train a neural net to recognize the parameters of our network-community model). The purpose of the machine learning is not to model social networks. Rather, the purpose is to estimate the parameters of a social network model, taking the latter as given. The application of machine learning makes the estimation of our model feasible. In general, a network model coupled with rich unobserved heterogeneity creates a substantial computational burden for estimation. We show how one can take advantage of machine learning techniques to alleviate this burden.

1.3 Data and Descriptive Analysis

1.3.1 Data

We work with a unique dataset from a multiplayer online game. The game was launched in mainland China in 2010 and became popular soon after. Similar to many modern role-playing online games, this game sets up rich story lines for various in-game characters. Inside the game, players could pick a character and participate in various types of combat to achieve victories. In completing the combat, players could either play on their own or team up with other players.

The game embeds social features of both friendship networks and online communities. A community in the game is known as a "guild" and serves as an information and social platform. Though each guild may have institutional features that differ slightly from the others, the overall goals for a guild are to share information within the guild, provide access to regular teams for group battles, and compete against other guilds in guild wars.⁸
Additionally, guild members may enjoy benefits such as earning guild points for equipment purchases and gaining additional storage quota. By fostering interactions inside the group, communities likely promote link formation within them.

⁸ In group battles, a number of players team up and play against computer opponents. Players can either be matched randomly or form a group by themselves. In the latter case, a guild essentially serves as an information disseminator where players can conveniently find teammates. As for guild wars, members from two different guilds play against each other. As such, guild wars highlight the competitive nature between guilds in the game.

Each player can join at most one guild at a time (i.e., monoguilding). However, players are able to switch from one guild to another. There is a cost of switching: if a player decides to leave her current guild, her guild points will be reset to zero.

Friendship ties in the game establish informational links between two players. Each player can form as many friendship ties as desired and may de-friend at any time. Once players become virtual friends, they can share their in-game status, complete gaming tasks together, or simply discuss their gaming experiences.

Our data come from a specific server of the game and cover January to March 2011. During our data period, player activities were restricted within servers. That is, players could only combat or cooperate with others on the same server; friendship ties and guilds involved only players from the same server. Therefore, we observe the complete friendship network and community (guild) memberships. For our analysis, we focus on four snapshots of the network and communities. The snapshots are spaced at intervals of about three weeks (i.e., the lapse between the 1st and 2nd snapshots is about three weeks, as is the lapse between the 2nd and 3rd snapshots).

Because our research focus is the co-evolution of online communities and the network, we filter out observations that do not fit this focus. First, we remove players who, during our data period, were inactive (i.e., who did not "level up" at all), had never been in a community, or had never been friends with other players. Second, we only keep active communities that consisted of more than one active member at some point during our data period. Altogether, the above procedures provide us with a dataset consisting of 2,511 individuals and 537 communities.

1.3.2 Descriptive Analysis

Figure 1.1 and Figure 1.2 display a handful of important model-free patterns from the data. Figure 1.1 focuses on the "static" properties of the network and communities that hold generally across all periods of our data. Figure 1.2 focuses on the dynamics that describe how the network and communities evolve. These patterns motivate our modeling choices in Section 1.4.

Network and community properties. Plot (a) in Figure 1.1 presents the distribution of degrees. The degree of a player is the number of friends (or ties) of the player in the network. Overall, the degrees have a long-tail distribution, showing considerable variation across players. The largest degree in our data (after removing inactive players) is 64. An average player has 4.47 friends. This large variation across players motivates us to introduce, later in our model, unobserved heterogeneity that captures the possibility that players may differ greatly in how sociable they are in the game.

Plot (b) in Figure 1.1 shows the distribution of community sizes. The largest community in our data reaches 42 members, while an average community consists of 4.68 active players (again after removing inactive players). This long-tail distribution can be explained by player heterogeneity.
Intuitively, communities are driven by the interests of their members: large communities typically consist of players with interests that are prevalent in the game, whereas small communities typically consist of players with niche interests. We will incorporate this feature in our model.

Plot (c) in Figure 1.1 shows the within-community and cross-community network densities. Network density is defined as the ratio between the total number of ties and the total number of player pairs (Jackson 2010). We see that the within-community network density is much higher than the cross-community density. In words, a tie is more likely to be observed between two players from the same community than between two players from different communities. This result is consistent with the intended purpose of communities: fostering interactions, and subsequently tie formation, within the community. However, to measure the community effect on tie formation, we must also take into account that (i) some ties existed before players joined a community, and (ii) some ties formed because community members have similar interests rather than because of the community effect per se. We account for these two factors later in the model.

Plot (d) in Figure 1.1 shows the clustering coefficients (Jackson 2010). The clustering coefficient captures the extent to which players "clump" together in the network. Specifically, it measures the frequency with which two friends of a player are also friends themselves. We see that the network shows positive clustering overall (0.12). More importantly, there is more clustering within communities. This result is consistent with the expectation that a community comprises players with similar interests.

Network and community dynamics. Plot (a) in Figure 1.2 classifies the ties in each period either as already existing in the last period or as newly formed. On average, existing ties account for about 80% of the ties observed in each period.
The result suggests that there is substantial state dependence in friendship formation, so that once a social tie is formed, it is likely to remain for some time into the future. However, we do see new ties being formed, showing that the network changes over time.

Figure 1.1: Descriptive Analysis: Static Properties. Panels: (a) Degree Distribution; (b) Community Size Distribution; (c) Density of Ties; (d) Clustering.

Figure 1.2: Descriptive Analysis: Dynamics. Panels: (a) Existing vs. New Ties; (b) Stay vs. Switch Communities; (c) Friend Presence and Community Choice; (d) Friendship Ties after Community Switching.

Plot (b) in Figure 1.2 classifies the community choices in each period either as staying in the same community as last period or as switching to a different community. Similar to plot (a), we find that community decisions also exhibit substantial state dependence. At the same time, we do see switching, which shows that there is some level of mobility in community memberships.

Plot (c) in Figure 1.2 shows how the social network can predict community choices in the next period. It shows that a player is more likely to choose (stay in or join) a community if that community had a larger fraction of the player's friends. While this result is intuitive, more interesting is how large a difference the friend presence makes. The probability for a player to choose a community is near zero when less than 1/3 of her friends belonged to that community (the left bar). This probability increases to 0.6 when more than 2/3 of her friends belonged to the community (the right bar). This result suggests that existing friendships are an important factor in players' community choices. However, to measure the effect of friendships on community choices, we must take into account that a community choice can also be driven by the similarity between the player's and the community members' interests (likely reflected in the large presence of the player's friends in the community).
We will try to account for this factor in the model.

Plot (d) in Figure 1.2 shows how community switching affects friendships over time. In the upper panel, we track the number of a player's friends in a previous community after the player leaves for a different community. We see that the longer it has been since a player left a community, the fewer friends she has in that community. However, this decline is gradual and slow, implying that players maintain ties with their previous communities for a substantial time. In the lower panel, we track the number of a player's friends in her current community. We see that the longer a player has been a member of a community, the more friends she has in that community. Overall, the result suggests that players trade ties with their previous communities for ties with their current communities, but do so only gradually. Later, our model will try to capture this pattern by allowing the dependence of tie formation on community memberships, state dependence in social ties, and a cost of maintaining ties.

1.4 Model

To quantify the evolution of within- and cross-community ties, we want to track the long-term dynamics of the network and communities. Yet the data cover a relatively short window of these dynamics, with moderate period-to-period changes in the network and community memberships (see Section 1.3.2). To this end, we build a model to capture the dynamics. We specify two individual-level choices: (i) whom to be friends with, and (ii) which community to be a member of. Both choices in period $t$ depend on the network and communities in period $t-1$. In other words, we characterize the co-evolution of the social network and online communities.⁹

There are two reasons to use the above approach to jointly model tie formation and community choice. First, it captures the intuition that tie formation and community choice may reinforce each other.
An individual player may decide to join a community because that is where many of her friends are. Meanwhile, she may be more likely to develop ties with members of her community. Second, this approach allows us to better control for unobserved individual heterogeneity. Particularly, the same set of unobserved individual factors can affect both tie formation and community choice. For example, the unobserved interest of a player will likely steer her to find friends who share this interest. Meanwhile, the same unobserved interest can also nudge her to join a community where members share this interest.

⁹ "Dynamics" here does not refer to the inclusion of forward-looking behaviors. In this paper we rely on state dependence to capture dynamics. Incorporating forward-looking choices could offer new insights, but doing so in network models remains a difficult challenge in the literature.

1.4.1 Friendship Formation

We represent the player network using its adjacency matrix $y_t$, where the entry $y_{ijt}$ indicates whether or not there is a tie between individuals $i$ and $j$ in period $t$. Specifically, $y_{ijt} = y_{jit} = 1$ if players $i$ and $j$ are friends at $t$, and $y_{ijt} = y_{jit} = 0$ otherwise. Similarly, we represent the community choices with a membership matrix $g_t$, where $g_{ikt} = 1$ if player $i$ is a member of community $k$ at $t$ and $g_{ikt} = 0$ otherwise. (In the data, the lapse between two periods is about 3 weeks; see Section 1.3.)

We assume that, in each period $t$, players $i$ and $j$ will form a tie if and only if the utility $U^F_{ijt}$ exceeds a threshold. The superscript $F$ denotes "friendship" to distinguish the term from the utility for community choice, which we will describe later. We normalize the threshold to be zero.
Letting $I(\cdot)$ denote the indicator function, we have
$$y_{ijt} = y_{jit} = I\left(U^F_{ijt} > 0\right), \tag{1.1}$$
where we separate the utility term into two parts, $U^F_{ijt} = V^F_{ij,t-1} + \varepsilon^F_{ijt}$, with
$$V^F_{ij,t-1} = \alpha^F \cdot y_{ij,t-1} + \beta^F \cdot \mathrm{SameComm}_{ij,t-1} - \gamma^F \cdot \mathrm{SumDegree}_{ij,t-1} - \delta^F \cdot |\omega_i - \omega_j| + \theta^F \cdot (\eta_i + \eta_j) + \lambda^F. \tag{1.2}$$

In the above, $y_{ij,t-1}$ indicates whether $i$ and $j$ were friends in the last period (state dependence). The variable $\mathrm{SameComm}_{ij,t-1}$ is a dummy variable indicating whether $i$ and $j$ belonged to the same community. Let $m$ be the total number of communities. Then
$$\mathrm{SameComm}_{ij,t-1} = \sum_{k=1}^{m} g_{ik,t-1} \cdot g_{jk,t-1}.$$
$\mathrm{SumDegree}_{ij,t-1}$ is the sum of the log degrees of $i$ and $j$ in the network from the last period. Let $n$ be the total number of players. Then
$$\mathrm{SumDegree}_{ij,t-1} = \log\left(1 + \sum_{k=1}^{n} y_{ik,t-1}\right) + \log\left(1 + \sum_{k=1}^{n} y_{jk,t-1}\right).$$
As a result, parameter $\alpha^F$ represents the own state dependence in friendship tie formation. Parameter $\beta^F$ represents the effect of existing community membership on tie formation. Parameter $\gamma^F$ captures the cost of maintaining friendships; a positive $\gamma^F$ reduces the probability of a new tie between individuals who already have many friends.

In addition, Equation (1.2) includes two sets of individual latent parameters for unobserved heterogeneity, $\omega \equiv \{\omega_i\}_{i=1}^{n}$ and $\eta \equiv \{\eta_i\}_{i=1}^{n}$. Parameters $\omega$ capture the "horizontal" aspect of player heterogeneity. A larger distance $|\omega_i - \omega_j|$ between players $i$ and $j$ reduces the probability of a tie. One way to think of $\omega_i$ is that it represents the "interest" of player $i$. Players with similar interests are more likely to form a tie, a pattern known as homophily (Jackson 2010). We specify $\omega_i \sim N(0,1)$. The mean is normalized to zero because only the pairwise distances enter the utility.
However, we do not specify this function because we are not interested in it per se. Parametersη capture the “vertical” heterogeneity. A largerη i increases the probability fori to form ties with all other players, leading to a higher degree for i in the network. One way to think ofη i is that it captures how sociable a player is in the game. This set of latent parameters is sometimes known in literature as degree heterogeneity (Graham 2017). We specifyη i ∼ N(0,1). The mean can be normalized to zero because we have an intercept term λ F in Equation (1.2). Similar toω i , the latent parameterη i can further be a function of player characteristics. However, we do not specify this function because we are not interested in it per se. It is worthwhile to point out that one would expect a positive correlation between(η i +η j ) and SumDegree ij,t− 1 across player pairs. However, it is important to note they capture different effects. Specifically, (η i +η j ) is time-invariant and captures the underlying qualities of players, whereas SumDegree ij,t− 1 is time-variant and captures the cost of maintaining ties. Assuming that the idiosyncratic error termε F ijt follows the logistic distribution, the probability that playeri andj will form a tie in periodt is p F ijt = exp(V F ij,t− 1 ) 1+exp(V F ij,t− 1 ) . (1.3) 20 Note that the above probabilityp F ijt is conditional on the data from periodt− 1 and unobserved heterogeneity (ω,η ). To compute the likelihood for estimation, we must integrate out the indi- vidual heterogeneity parameters. We discuss the computational aspect of this integral in Section 1.5. 1.4.2 CommunityChoice Recall that in our empirical setting, the game adopts a competitive monoguilding design where each player could join only one community (guild) at a time. In periodt, playeri chooses com- munity k such that the utility with that community U C ikt is higher than U C iℓt for every ℓ ̸= k. 
The superscript $C$ denotes "community" to distinguish these utility terms from the friendship utilities described earlier. Recall that we use $g_{ikt}$ to denote whether $i$ is a member of community $k$ at time $t$. Thus,
$$g_{ikt} = \begin{cases} 1, & \text{if } U^C_{ikt} > U^C_{i\ell t} \text{ for all } \ell \neq k; \\ 0, & \text{otherwise.} \end{cases}$$
We again separate the utility term into two parts, $U^C_{ikt} = V^C_{ik,t-1} + \varepsilon^C_{ikt}$, where
$$V^C_{ik,t-1} = \alpha^C \cdot g_{ik,t-1} + \beta^C \cdot \mathrm{PresenceFriend}_{ik,t-1} - \delta^C \cdot \widetilde{\omega}_{ik,t-1} + \theta^C \cdot \widetilde{\eta}_{k,t-1}. \tag{1.4}$$

In the above, $g_{ik,t-1}$ indicates whether $i$ belonged to community $k$ in the last period (state dependence). $\mathrm{PresenceFriend}_{ik,t-1}$ equals the fraction of player $i$'s friends who were members of community $k$ in the last period:
$$\mathrm{PresenceFriend}_{ik,t-1} = \frac{\sum_{j=1}^{n} y_{ij,t-1} \cdot g_{jk,t-1}}{\sum_{j=1}^{n} y_{ij,t-1}}.$$
Because of this specification, $\alpha^C$ captures the own state dependence in community choice and $\beta^C$ captures the effect of friendship ties on community choices. Intuitively, a player is more likely to join or stay with a community if most of the player's social ties belong to that community.

Importantly, the individual heterogeneity $(\omega, \eta)$ also enters Equation (1.4). Intuitively, a player may lean towards joining a community where members' interests are close to hers. We capture this intuition by $\widetilde{\omega}_{ik,t-1}$, which is defined as the average latent distance between player $i$ and community $k$'s members:
$$\widetilde{\omega}_{ik,t-1} = \frac{\sum_{j=1}^{n} |\omega_i - \omega_j| \cdot g_{jk,t-1}}{\sum_{j=1}^{n} g_{jk,t-1}}. \tag{1.5}$$
In addition to horizontal heterogeneity, vertical heterogeneity also enters community choice. Intuitively, players may tend to favor a community whose members are more sociable. We capture this via $\widetilde{\eta}_{k,t-1}$ in Equation (1.4). Specifically, $\widetilde{\eta}_{k,t-1}$ takes the simple average across community members:
$$\widetilde{\eta}_{k,t-1} = \frac{\sum_{j=1}^{n} \eta_j \cdot g_{jk,t-1}}{\sum_{j=1}^{n} g_{jk,t-1}}. \tag{1.6}$$
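The community-choice side can be sketched in the same way: compute PresenceFriend, $\widetilde{\omega}$, and $\widetilde{\eta}$, then form choice probabilities. Under the type I extreme value assumption on $\varepsilon^C$ stated in the text, the probabilities take the familiar multinomial-logit form. Inputs and parameter values below are hypothetical, for illustration only.

```python
import math

def v_community(i, k, y_prev, g_prev, omega, eta, par):
    """Deterministic utility V^C_{ik,t-1} from Equation (1.4)."""
    n = len(y_prev)
    members = [j for j in range(n) if g_prev[j][k] == 1]
    deg_i = sum(y_prev[i])
    presence = sum(y_prev[i][j] for j in members) / deg_i if deg_i else 0.0
    w_tilde = sum(abs(omega[i] - omega[j]) for j in members) / len(members)
    e_tilde = sum(eta[j] for j in members) / len(members)
    return (par["alpha_C"] * g_prev[i][k]
            + par["beta_C"] * presence
            - par["delta_C"] * w_tilde
            + par["theta_C"] * e_tilde)

def choice_probs(i, y_prev, g_prev, omega, eta, par):
    """Multinomial-logit choice probabilities over communities."""
    m = len(g_prev[0])
    v = [v_community(i, k, y_prev, g_prev, omega, eta, par) for k in range(m)]
    denom = sum(math.exp(vk) for vk in v)
    return [math.exp(vk) / denom for vk in v]

# Toy inputs (hypothetical): 3 players, 2 communities.
y_prev = [[0, 1, 0], [1, 0, 0], [0, 0, 0]]
g_prev = [[1, 0], [1, 0], [0, 1]]
omega = [0.0, 0.1, 1.0]
eta = [0.2, 0.0, -0.1]
par = {"alpha_C": 3.0, "beta_C": 1.5, "delta_C": 1.0, "theta_C": 1.0}

probs = choice_probs(0, y_prev, g_prev, omega, eta, par)
```

With state dependence ($\alpha^C$) and a large friend presence in community 0, player 0's probability of staying is close to one in this toy example.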
Assuming that the idiosyncratic term $\varepsilon^C_{ikt}$ follows the type I extreme value distribution, the probability that individual $i$ chooses community $k$ is given as
$$p^C_{ikt} = \frac{\exp(V^C_{ik,t-1})}{\sum_{\ell=1}^{m} \exp(V^C_{i\ell,t-1})}. \tag{1.7}$$
Similar to Equation (1.3), the above probability $p^C_{ikt}$ is conditional on the data from period $t-1$ and the unobserved heterogeneity $(\omega, \eta)$. To compute the likelihood for estimation, we need to integrate out the individual heterogeneity parameters. We discuss the computational aspect of this integral in Section 1.5.

Finally, to complete our model, we specify that the initial period starts with an empty network and empty communities. We add a discrete parameter $\tau \in \{1, 2, \ldots\}$. It represents the number of periods it takes for the network and communities to evolve from the empty ones to the ones that we observe in the first period of the data. The exact value of $\tau$ is estimated from the data together with the other main parameters of the model. Table 1.1 provides a summary of the main variables included in our model.

1.5 Estimation

This section describes our estimation procedure. The main challenge in estimation comes from the rich unobserved heterogeneity ($\omega$ and $\eta$) that is important for both the network and community models. The unobserved heterogeneity must be integrated out to evaluate the tie formation and community choice probabilities. Because of the network, the integral does not break down to the individual level as it does in standard cross-sectional settings. Further complicating this challenge is that we need to integrate out the model's initial periods that are not observed in the data (see the end of Section 1.4).

Table 1.1: Explanation of Model Variables

Friendship
  $y_{ijt}$: a dummy indicating whether players $i$ and $j$ are friends at time $t$.
  $\mathrm{SameComm}_{ijt}$: a dummy indicating whether players $i$ and $j$ are members of the same community at time $t$.
  $\mathrm{SumDegree}_{ijt}$: a sum measuring the number of friends that $i$ and $j$ have at time $t$.
  $|\omega_i - \omega_j|$: distance between players $i$'s and $j$'s interests (latent horizontal heterogeneity).
  $\eta_i + \eta_j$: how sociable players $i$ and $j$ are (latent vertical heterogeneity).

Community
  $g_{ikt}$: a dummy indicating whether player $i$ is a member of community $k$ at time $t$.
  $\mathrm{PresenceFriend}_{ikt}$: fraction of player $i$'s friends who are members of community $k$.
  $\widetilde{\omega}_{ikt}$: average distance between player $i$'s interest and the interests of community $k$'s members at time $t$.
  $\widetilde{\eta}_{kt}$: how sociable, on average, the members of community $k$ are at time $t$.

Initial Condition
  $\tau$: the initial period of the data is modeled as the $(\tau+1)$th period of the model.

Below, we first discuss the computational burden of traditional estimation methods. Next, we describe an alternative estimation method that offers substantial computational savings. This alternative method exploits machine learning techniques (specifically, neural nets). However, it is important to note that the machine learning technique is only used to obtain parameter estimates of our model; it does not replace or change any part of the model in Section 1.4.

1.5.1 Estimation Challenges

We first write down the likelihood function. This exercise will allow us to see the computational difficulty in applying a direct MLE approach. We then extend the discussion to Bayesian MCMC and the method of moments. Before the discussion, we note for readers familiar with Graham (2017) that his method is not readily applicable here due to the unobserved horizontal heterogeneity in our model.

Specifically, the probability of observing the network and communities in period $t$, conditional on the observed network and communities in $t-1$ and the individual latent parameters, is
$$P\left(y_t, g_t \mid y_{t-1}, g_{t-1}, \omega, \eta\right) = \prod_{i<j} \left[ y_{ijt} \cdot p^F_{ijt} + (1 - y_{ijt}) \cdot (1 - p^F_{ijt}) \right] \times \prod_{i=1}^{n} \left( \sum_{k=1}^{m} g_{ikt} \cdot p^C_{ikt} \right). \tag{1.8}$$
On the right-hand side, the first product is for the tie formation probabilities across all the player pairs (conditional on the unobserved heterogeneity).
The second product is for the community choice probabilities (again conditional on the unobserved heterogeneity). For the definitions of $p^F_{ijt}$ and $p^C_{ikt}$, see Equations (1.3) and (1.7).

The above likelihood is for one snapshot. Next, we write down the likelihood for observing the multiple snapshots $\{y_t, g_t\}_{t=\tau+1}^{\tau+T}$. Recall that the first snapshot in the data is modeled as $\tau$ periods after the empty network and communities:
$$P\left(\{y_t, g_t\}_{t=\tau+1}^{\tau+T} \mid \omega, \eta\right) = P\left(y_{\tau+1}, g_{\tau+1} \mid \omega, \eta\right) \times \prod_{t=\tau+2}^{\tau+T} P\left(y_t, g_t \mid y_{t-1}, g_{t-1}, \omega, \eta\right).$$

At this point, we are ready to discuss the two computational problems if one is to apply MLE. Both problems stem from the unobserved heterogeneity.

First, the above likelihood is conditional on $\omega$ and $\eta$, which we need to integrate out to obtain the likelihood for estimation. The integral over $\omega$ and $\eta$ is high-dimensional. In standard cross-sectional settings, the integral over heterogeneity can break down to the individual level; individuals' choices are independent of each other, and each individual's choice depends only on her own latent parameters. This separability greatly simplifies computation. However, this is not the case in a network setting. The choices are inter-dependent through the unobserved heterogeneity. For example, $\omega_i$ affects the tie formation probabilities between $i$ and all the other $n-1$ players; the tie formation between $i$ and any particular player $j$ also depends on $\omega_j$.

Second, the probability for the first snapshot of data, $P(y_{\tau+1}, g_{\tau+1} \mid \omega, \eta)$, is not simple to evaluate. It requires us to integrate over the first $\tau$ periods of the model, which are not observed in the data. Specifically, let $y_0$ and $g_0$ denote the initial empty network and communities. We have
$$P\left(y_{\tau+1}, g_{\tau+1} \mid \omega, \eta\right) = \int \prod_{t=1}^{\tau+1} P\left(y_t, g_t \mid y_{t-1}, g_{t-1}, \omega, \eta\right) \, d(y_1, g_1, \ldots, y_\tau, g_\tau). \tag{1.9}$$
In its basic form, this integral enumerates over all possible networks and communities from period 1 to period $\tau$. Needless to say, the number of possibilities is astronomical.
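To see where the burden lies, note that a single evaluation of the conditional snapshot probability in Equation (1.8) is cheap; the sketch below computes its log, taking (hypothetical) probabilities $p^F$ and $p^C$ as given. The difficulty is that the integrals above require an enormous number of such evaluations.

```python
import math
from itertools import combinations

def snapshot_loglik(y, g, pF, pC):
    """Log of Equation (1.8): the probability of one snapshot (y_t, g_t),
    conditional on last period's state and on (omega, eta). The tie
    probabilities pF (Eq. 1.3) and choice probabilities pC (Eq. 1.7)
    are taken as already computed."""
    n = len(y)
    ll = 0.0
    for i, j in combinations(range(n), 2):
        ll += math.log(y[i][j] * pF[i][j] + (1 - y[i][j]) * (1 - pF[i][j]))
    for i in range(n):
        ll += math.log(sum(g[i][k] * pC[i][k] for k in range(len(g[i]))))
    return ll

# Toy snapshot (hypothetical): 2 players, 2 communities.
y = [[0, 1], [1, 0]]                     # the observed tie
g = [[1, 0], [0, 1]]                     # observed community choices
pF = [[0.0, 0.5], [0.5, 0.0]]
pC = [[0.5, 0.5], [0.25, 0.75]]
lik = math.exp(snapshot_loglik(y, g, pF, pC))   # 0.5 * 0.5 * 0.75
```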
One may attempt to sidestep this issue by conditioning the likelihood on the first snapshot of data. However, it is not difficult to see that doing so requires one to evaluate $P(\omega, \eta \mid y_{\tau+1}, g_{\tau+1})$, which in turn requires one to evaluate Equation (1.9) again. More fundamentally, the first snapshot of data is informative of $\omega$ and $\eta$ (for example, $y_{ij,t-1} = 1$ indicates that $|\omega_i - \omega_j|$ is small). This information cannot simply be "thrown away" but needs to be incorporated into the likelihood.

One common approach to the first problem (i.e., integration over individual latent parameters) is Bayesian MCMC (Rossi, Allenby, and McCulloch 2012). In MCMC, individual latent parameters are updated iteration by iteration together with the main parameters. However, MCMC is still computationally costly in general, and is especially so in our setting. In standard cross-sectional settings, the update of an individual $i$'s latent parameter affects only $i$'s choices. In our setting, however, the update of $i$'s latent parameter affects all friendships involving $i$ as well as all players' community choices. Thus, each update involves calculating $O(nT)$ choice probabilities instead of $O(T)$ probabilities. In addition, MCMC does not address the second problem (i.e., integration over the initial periods).

Another approach is SMM (simulated method of moments). SMM does not rely on the difficult-to-evaluate likelihood function. Instead, one specifies a set of moments to be matched between the real data and model-simulated data. In this respect, SMM is close to the NNE that we will describe in a moment; NNE also relies on a set of moments. However, SMM has several disadvantages in our setting. First, estimation errors are large when the moments have large variances (likely the case with rich unobserved heterogeneity). Using a larger number of simulations reduces the errors but leads to higher computational costs. Second, substantial biases can arise with redundant moments.
This requires researchers to make careful choices when there is an abundance of candidate moments (typical in the case of network data). Third, there are no well-established formulas for inference in some settings (such as with network data). One may have to bootstrap, which is computationally heavy. NNE has better properties in these three aspects.

1.5.2 Neural Network Estimator

Now we describe the estimation approach aided by machine learning techniques. The basic idea is to train a neural net capable of "recognizing" our model parameters from data (similar to how neural nets recognize objects in images). In the rest of the paper, we refer to it as the neural network estimator (NNE). NNE requires integrating out neither the unobserved heterogeneity nor the initial periods. We find NNE (i) computationally light and (ii) able to recover parameter values reasonably well in a Monte Carlo study (shown later in Section 1.5.4). As a result, NNE has the potential to carry our model to even larger-scale network and community data.

In general, given any econometric model, NNE proceeds as follows to estimate the model's parameters. (In this paper, "parameters" always refers to the parameters of the econometric model, not of the neural net.) We first simulate, say, 5000 datasets, each under a different set of parameter values. Then, we train a neural net that maps from these simulated datasets to the corresponding parameter values. Finally, we "plug" the real data into the trained neural net to obtain the parameter estimates for the real data. In an alternative form, one may "compress" each simulated dataset into a set of summary moments and train the neural net to map from these moments (instead of the datasets) to the parameter values.

It is useful to note that there are general theoretical guarantees for NNE to give parameter estimates that are meaningful in the traditional sense.
Wei and Jiang (2021) show that NNE converges to Bayesian posteriors as the number of simulated training datasets increases. Thus, NNE inherits properties of Bayesian estimators, such as consistency. For readers familiar with approximate Bayesian computation (ABC), one may see NNE as an implementation under that framework. However, NNE helps address two fundamental issues in ABC: (i) how to properly choose the distance between simulated datasets and the real dataset, and (ii) how to properly choose a distance cutoff to reject or accept simulated datasets.¹⁰

¹⁰ See Gelman et al. (2013) for a discussion of ABC and these two issues.

Next, we give the specifics of implementing NNE for our model. Denote the collection of the main parameters in our model as $\phi = (\alpha^F, \beta^F, \gamma^F, \delta^F, \theta^F, \lambda^F, \alpha^C, \beta^C, \delta^C, \theta^C, \tau)$. Recall that we have 4 periods of observation in the data, and that the first period in the data is assumed to be the $(\tau+1)$th period of the model, which starts with an empty network and empty communities.

1. Draw $R$ sets of parameter values, $\phi^{(r)}$, $r = 1, \ldots, R$.
2. For each $r$, simulate the model under $\phi^{(r)}$ for $\tau + 4$ periods. Denote the last 4 periods of simulated data as $\{y_t^{(r)}, g_t^{(r)}\}_{t=1}^{4}$.
3. Compute a set of summary moments, denoted as $m^{(r)}$, for $\{y_t^{(r)}, g_t^{(r)}\}_{t=1}^{4}$. (We will discuss the exact moments a bit later.)
4. Train a neural network $f$ that predicts $\phi^{(r)}$ from $m^{(r)}$, for $r = 1, \ldots, R$.
5. Compute the summary moments of the real data. Denote these moments as $m^{(0)}$. Obtain the parameter estimates as $\hat{\phi} = f(m^{(0)})$.

If we apply the general convergence result of NNE to this specific context, it says that $f$ equals $E(\phi \mid m)$ if $R$ is sufficiently large. $E(\phi \mid m)$ is known as the limited-information Bayesian posterior mean. Here, the extra qualifier "limited-information" simply signifies that the posterior is conditional on a set of summary moments (i.e., $m$) instead of the entire data (i.e., $\{y_t, g_t\}_{t=1}^{4}$).
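The five steps above can be sketched end to end on a deliberately tiny stand-in model: a single parameter $\phi$ and Bernoulli "ties". Everything here, the simulator, the moment (a sample mean), and especially the least-squares fit standing in for the trained neural net of step 4, is a hypothetical simplification for illustration, not the paper's actual implementation.

```python
import math
import random

random.seed(0)

# Toy stand-in for the network-community model: one parameter phi and a
# "dataset" of binary ties drawn with probability sigmoid(phi).
def simulate(phi, n_pairs=500):
    p = 1 / (1 + math.exp(-phi))
    return [1 if random.random() < p else 0 for _ in range(n_pairs)]

def moments(data):
    """Step 3: a summary moment (here, just the mean tie rate)."""
    return sum(data) / len(data)

# Steps 1-2: draw R parameter values and simulate a dataset under each.
R = 1000
draws = [random.uniform(-2, 2) for _ in range(R)]
ms = [moments(simulate(phi)) for phi in draws]

# Step 4: learn the mapping from moments to parameters. A real application
# trains a neural net; a one-feature least-squares fit stands in here.
m_bar = sum(ms) / R
p_bar = sum(draws) / R
slope = (sum((m - m_bar) * (p - p_bar) for m, p in zip(ms, draws))
         / sum((m - m_bar) ** 2 for m in ms))
def f(m):
    return p_bar + slope * (m - m_bar)

# Step 5: plug the moments of the "real" data into the trained mapping.
real_data = simulate(0.5)              # pretend the true phi is 0.5
phi_hat = f(moments(real_data))
```

Even this crude stand-in recovers the true parameter to rough accuracy; the point is only to show the simulate-train-plug structure of the estimator.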
In the above procedure, the neural net outputs point estimates for the parameters. Going beyond point estimates, one can modify step 4 to obtain measures of statistical accuracy. The modification is to ask the neural net $f$ to produce an additional output for each parameter. This additional output aims to capture the discrepancy between the point estimate and the true parameter value (which is known for simulated datasets). We omit the technical details here and refer interested readers to Wei and Jiang (2021). They show that these additional outputs converge to $\mathrm{Var}(\phi \mid m)$, the limited-information Bayesian posterior variances, when $R$ is sufficiently large.

1.5.3 Moments and Identification

We now describe the summary moments used in our estimation. The description here attempts to be non-technical and instead focuses on identification (which dictates the moment choices). We give the technical definitions in Appendix A.

A central question for identification is: when estimating the tie formation probabilities, how do we separate the effect of community memberships from the effect of the unobserved horizontal heterogeneity? In words, when we see a pair of players in the same community forming a tie, do we attribute it to (i) the two players being in the same community, or (ii) a likely high level of unobserved similarity between the two players, as indicated by the fact that the two players have joined the same community? In our model, factor (i) is captured by the indicator $\mathrm{SameComm}_{ij,t-1}$, whereas factor (ii) is captured by a small distance $|\omega_i - \omega_j|$.

The challenge in the above question is that we do not observe the latent heterogeneity. However, we note that a smaller $|\omega_i - \omega_j|$ tends to result in a shorter distance between $i$ and $j$ in the network (e.g., the length of the shortest path between them).

Figure 1.3: Network Distance and Identification. Notes: We consider the link formation probability between the two nodes marked by green boxes. Background colors indicate community memberships.
In words, network distances reflect unobserved similarities between players. The observation of the network distances allows identification. Figure 1.3 offers an illustration. Plot (a) shows a pair of players in the same community and close in the network. Plot (b) shows a pair in the same community but far apart in the network. Plot (c) shows a pair in different communities but close in the network. If the data show that the tie formation probabilities conditional on the three scenarios satisfy (a) ≃ (b) > (c), then we may say that membership in the same community has a significant effect on tie formation but the unobserved similarity does not. If the conditional probabilities satisfy (a) ≃ (c) > (b), then the unobserved similarity has a significant effect but membership in the same community does not.

A similar intuition applies to identifying the effect of friend presence on community choices. Network distances help us separate out the effect of unobserved horizontal heterogeneity on community choices. In particular, the average network distance between a player and the members of a community serves as a proxy for ω̃_{ik,t−1} (see Equation 1.5).

We include three sets of moments for NNE. First, we include a set of network summary statistics: the mean and variance of the degree distribution and the clustering coefficient. Second, we include the mean and covariance matrix of the variables in Equation (1.2). We proxy |ω_i − ω_j| with the network distance between i and j, and η_i with the degree of i. Third, we include the mean and covariance matrix of the variables in Equation (1.4). We construct proxies for ω̃_{ikt} and η̃_{kt} based on network distances and degrees. See Appendix A for detailed definitions.

1.5.4 Test Datasets and Monte Carlo

There are two approaches to verify the performance of NNE in our application. The first approach is to test the neural net out-of-sample (an approach used more in machine learning).
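To fix ideas before describing the tests, the simulate-train-apply pipeline can be sketched in miniature. The toy below is our own stand-in, not the paper's implementation: the "model" is i.i.d. normal data with parameters ϕ = (μ, σ), the summary moments are just the sample mean and standard deviation, and a linear least-squares map substitutes for the neural net f to keep the sketch short.

```python
import numpy as np

rng = np.random.default_rng(0)
R, n = 2000, 500  # number of simulated datasets, observations per dataset

# Steps 1-3: draw parameters from a prior, simulate a dataset under each
# draw, and compute the summary moments of each simulated dataset.
phis = np.column_stack([rng.uniform(-2, 2, R),      # mu
                        rng.uniform(0.5, 2.0, R)])  # sigma
moments = np.empty((R, 2))
for r in range(R):
    x = rng.normal(phis[r, 0], phis[r, 1], n)
    moments[r] = [x.mean(), x.std()]

# Step 4: learn the moments -> parameters mapping. The paper trains a
# neural net f; a linear map is used here purely as a simple stand-in.
X = np.column_stack([np.ones(R), moments])
coef, *_ = np.linalg.lstsq(X, phis, rcond=None)

# Step 5: apply the trained mapping to the moments of the "real" dataset.
real = rng.normal(1.0, 1.5, n)
m_real = np.array([1.0, real.mean(), real.std()])
phi_hat = m_real @ coef
print(phi_hat)  # close to the true (1.0, 1.5)
```

In the actual application the moments are the network statistics listed above and f is a neural net, but the shape of the procedure is the same.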
We set aside 10% of the R datasets simulated in step 2 above. We use the remaining 90% of the datasets to train the neural net (in step 4), and then apply the trained neural net to the 10% "test" datasets. We use R = 1e4. The result is reported in Figure 1.4, where we plot the parameter estimates by the neural net against the true parameter values. We see that the points are concentrated around the 45-degree lines, indicating that NNE recovers the parameter values well.

The second approach is to conduct a Monte Carlo study (an approach used more in econometrics). We fix one set of parameter values ϕ* and simulate, say, 100 datasets under ϕ*. Then, for each simulated dataset, we treat it as a real dataset and carry out the entire procedure (steps 1 to 5) of NNE. This exercise gives us 100 estimates of ϕ*. We use these 100 estimates to assess the bias and variance of NNE. The result is reported in Table 1.2. We see that NNE recovers all parameter values with relatively small biases.

Table 1.2: Monte Carlo Results

Model              Parameter   True Value   Mean of Estimates   S.E. of Mean
Friendship         α_F          12.00        12.3280             (0.0884)
                   β_F           6.00         5.8968             (0.0600)
                   γ_F           0.25         0.2547             (0.0046)
                   δ_F           4.00         3.9212             (0.0529)
                   θ_F           0.75         0.7538             (0.0078)
                   λ_F          −6.50        −6.5215             (0.0225)
Community          α_C           9.00         9.0071             (0.0453)
                   β_C           2.50         2.4875             (0.0594)
                   δ_C           1.50         1.6167             (0.0323)
                   θ_C           3.50         3.5065             (0.0459)
Initial Condition  τ             4.00         3.9387             (0.0416)

Note: Results are based on parameters recovered from 100 simulated datasets.

1.6 Model Estimates

This section discusses our model estimates. We present the model estimates in Table 1.3; the description of each variable is summarized in Table 1.1. The estimates presented are for our full model, which allows for both types of unobserved heterogeneity (vertical and horizontal).

On friendship formation, our model estimates reveal several driving forces. First, the positive estimated coefficient of y_{ij,t−1} indicates that friendship ties have a high degree of state dependence.
Second, the estimated coefficient of SameComm_{ij,t−1} is positive and statistically significant, indicating that being in the same community does increase the chance of tie formation.

Figure 1.4: Estimation Errors on Test Datasets. Note: 1. We simulate R datasets and use 90% of them to train NNE, with R = 1e4. 2. The plots show how well NNE recovers parameters on the remaining 10% of the simulated datasets (the test datasets). 3. Each scatter plot corresponds to one parameter in our model; it compares the estimated parameter values (vertical axis) against their true values (horizontal axis). The red line is the 45-degree line.

Table 1.3: Estimation Results

Model              Parameter   Variable                   Estimate      S.E.
Friendship         α_F         y_{ij,t−1}                 12.0710***    (0.5542)
                   β_F         SameComm_{ij,t−1}           5.3006***    (0.2011)
                   γ_F         SumDegree_{ij,t−1}          0.1805***    (0.0322)
                   δ_F         |ω_i − ω_j|                 4.9283***    (0.4041)
                   θ_F         η_i + η_j                   0.6551***    (0.0535)
                   λ_F         Constant                   −6.3838***    (0.1573)
Community          α_C         g_{ik,t−1}                  8.9165***    (0.2283)
                   β_C         PresenceFriend_{ik,t−1}     2.6709***    (0.3660)
                   δ_C         ω̃_{ik,t−1}                  1.7006***    (0.2941)
                   θ_C         η̃_{k,t−1}                   3.7315***    (0.3329)
Initial Condition  τ                                       3.3160***    (0.3764)

Note: *** indicates that results are significant at the 99% level.

It is important to note that this estimate is obtained after controlling for unobserved heterogeneity, which can drive tie formation too (see Section 1.4 for discussion). Third, SumDegree_{ij,t−1} has a significant negative effect on tie formation (the model has a negative sign for the coefficient in front of SumDegree_{ij,t−1}; see Equation 1.2). This result captures that there is a cost to maintaining social ties. It is consistent with Dunbar (2016), which shows that people have limited cognitive resources and time constraints even for managing online social ties.

We find substantial unobserved heterogeneity, as reflected by the coefficients in front of |ω_i − ω_j| and (η_i + η_j).
Recall that ω_i captures a player's horizontal heterogeneity (e.g., personal interests) and η_i captures a player's vertical heterogeneity (e.g., how sociable the player is). We can get some sense of the magnitude of these effects of unobserved heterogeneity. First, in terms of the latent distance |ω_i − ω_j|, the probability for an average pair of individuals to form a tie is only 0.004 times the probability for two individuals with the same latent parameter (ω_i = ω_j). Second, in terms of η_i, an increase of one standard deviation makes the player 1.93 times as likely to form ties with others.¹¹ Overall, these substantial effects highlight the importance of controlling for unobserved heterogeneity.

On players' community choices, our model estimates again reveal several driving forces. First, the positive coefficient of g_{ik,t−1} indicates that community choices have a high degree of state dependence. Second, PresenceFriend_{ik,t−1} has a positive and statistically significant coefficient, which indicates that a player tends to join or stay in the community where a large proportion of her friends are members. Note that this estimate is obtained after controlling for unobserved heterogeneity, which can drive the player's community choice, too (see Section 1.4.2 for discussion). Third, ω̃_{ik,t−1} has a significant coefficient. This result indicates that horizontal heterogeneity plays a role in community choice: a player prefers a community whose members have interests similar to her own. Similarly, η̃_{k,t−1} has a significant coefficient too, indicating that vertical heterogeneity plays a role in community choice: players prefer communities whose members are more sociable on average.

1.7 Implications

We use the estimated model to explore two questions. The first question is: to what extent are cross-community ties initially born within-community, and vice versa?
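Before turning to these questions, note that the magnitudes quoted two paragraphs above (and in footnote 11) can be reproduced with a few lines of arithmetic. The snippet below only re-does that back-of-the-envelope calculation, using the point estimates δ_F = 4.9283 and θ_F = 0.6551 from Table 1.3.

```python
import math

delta_F, theta_F = 4.9283, 0.6551  # point estimates from Table 1.3

# E|omega_i - omega_j| when omega_i, omega_j ~ N(0,1) i.i.d.: the
# difference is N(0,2), whose mean absolute value is sqrt(2 * 2 / pi).
mean_abs_diff = math.sqrt(2 * 2 / math.pi)
print(round(mean_abs_diff, 3))  # 1.128

# Ratio of the tie probability for an average pair vs. an identical pair
# (omega_i = omega_j), using the exponential approximation that applies
# when tie formation probabilities are small:
print(round(math.exp(-delta_F * mean_abs_diff), 3))  # 0.004

# Multiplier on tie formation from a one-s.d. increase in eta_i:
print(round(math.exp(theta_F), 2))  # 1.93
```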
The model enables us to answer this question because it allows us to track the birth (and death) of all ties. Building on the answer, we explore the second question: how can we use a platform policy to increase cross-community ties and possibly also the overall connectivity of the social network?

11. Note that both ω_i and η_i are specified to follow N(0,1). On average, |ω_i − ω_j| equals about 1.128 (i.e., the mean absolute deviation of N(0,2)). We can approximate Equation (1.3) with the exponential function because the tie formation probabilities are small in general. Thus, 0.004 ≃ exp(−4.928 × 1.128) and 1.93 ≃ exp(0.655).

Table 1.4: Evolution of Within- vs. Cross-Community Ties

                     Unweighted                  Weighted by Tie Strength
                     Born Within   Born Cross    Born Within   Born Cross
T = 25   Within      96.09%        3.91%         97.47%        2.53%
                     (0.06)        (0.06)        (0.05)        (0.05)
         Cross       33.41%        66.59%        64.80%        35.20%
                     (0.21)        (0.21)        (0.25)        (0.25)
T = 50   Within      94.17%        5.83%         96.01%        3.99%
                     (0.07)        (0.07)        (0.09)        (0.09)
         Cross       39.82%        60.18%        58.79%        41.21%
                     (0.16)        (0.16)        (0.20)        (0.20)
T = 75   Within      92.73%        7.27%         94.39%        5.61%
                     (0.09)        (0.09)        (0.10)        (0.10)
         Cross       42.38%        57.62%        53.72%        46.28%
                     (0.13)        (0.13)        (0.16)        (0.16)

Notes: A tie is classified as "Born Within" if it connects two players from the same community in the period that the tie is formed, and as "Born Cross" otherwise. The numbers reported are averaged over 100 simulations. Numbers in parentheses are standard errors.

1.7.1 Within- versus Cross-Community Links

An important feature of our model is that it allows us to examine the dynamics of networks and communities over (potentially a long period of) time. We simulate the network formation and community choices for T periods. During the process, when each tie is formed, we record whether it is within- or cross-community (i.e., whether the pair of players are from the same or different communities).
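The bookkeeping just described can be sketched as follows: each tie is stamped with its type in the period it is born, carried while it survives, and finally cross-tabulated against its type in the last period (as in Table 1.4). The toy two-period history below, including the player who switches communities, is our own illustration, not the paper's simulation code.

```python
from collections import Counter

def tie_type(comm, i, j):
    """Classify a tie as within- or cross-community under membership comm."""
    return 'within' if comm[i] == comm[j] else 'cross'

birth = {}  # tie -> its type in the period it was formed

def record_period(ties_now, comm):
    for e in ties_now:
        if e not in birth:            # newly formed tie: stamp its birth type
            birth[e] = tie_type(comm, *e)
    for e in list(birth):
        if e not in ties_now:         # dissolved tie: drop it
            del birth[e]

# Toy history: player 2 switches from community A to B between periods.
record_period({(0, 1), (1, 2)}, {0: 'A', 1: 'A', 2: 'A', 3: 'B'})
record_period({(0, 1), (1, 2), (2, 3)}, {0: 'A', 1: 'A', 2: 'B', 3: 'B'})

final_comm = {0: 'A', 1: 'A', 2: 'B', 3: 'B'}
table = Counter((tie_type(final_comm, *e), birth[e]) for e in birth)
# The (1, 2) tie is now cross-community although it was born within-community.
print(table)
```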
Then, in period T, we look at each within-community tie existing at that time and check whether the tie was initially formed within- or cross-community. We do the same examination for each cross-community tie in period T. We carry out this exercise for T = 25, 50, and 75 (corresponding to roughly 1.5, 3, and 4.5 years) to examine the long-term effects.

Table 1.4 reports the results. The reported numbers are averaged across 100 simulation runs, and the standard errors are reported in parentheses. The key observation is that a large fraction of cross-community ties evolve from initially within-community ties. Take the results at T = 25 for example: 33.41% of the cross-community ties observed at T = 25 were initially formed within-community. This fraction increases to 39.82% for the network at T = 50, and 42.38% for the network at T = 75. In contrast, much smaller fractions (less than 10%) of within-community ties were initially formed as cross-community ties. The results indicate that within-community tie formation is actually an important source of cross-community connections. In Appendix B, we report results from a different perspective, conditioning on the tie type at t = 1 rather than at T. The same conclusion holds.

Why can within-community tie formation lead to cross-community ties? The explanation lies with the dynamic aspect of friendship and community choices: as a player moves from one community to another, she carries her ties with the former community to the latter one. These ties then become cross-community ties. An illustration is given in Figure 1.5. Overall, our exercise here shows that within-community tie formation and cross-community tie formation are not antithetical (as perhaps is often the case in static network models).

The above results become even more prominent if we take into account the "strength" of ties. We measure the strength of a tie as in Easley, Kleinberg, et al. (2010) and Li et al. (2012).
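This strength measure is the tie's neighborhood overlap: the common neighbors of the two endpoints divided by the union of their other neighbors. A minimal helper (our own code, following that definition):

```python
def tie_strength(neighbors, i, j):
    """Neighborhood overlap of the tie (i, j): nodes adjacent to both i and
    j, divided by nodes adjacent to either, excluding i and j themselves."""
    ni = neighbors[i] - {j}
    nj = neighbors[j] - {i}
    union = ni | nj
    return len(ni & nj) / len(union) if union else 0.0

# Toy network: nodes 1 and 2 share neighbor 3; node 4 is only 1's neighbor.
neighbors = {1: {2, 3, 4}, 2: {1, 3}, 3: {1, 2}, 4: {1}}
print(tie_strength(neighbors, 1, 2))  # common {3} / union {3, 4} -> 0.5
```

A tie with no surrounding neighbors gets strength 0, the weakest possible "inner circle" overlap.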
Specifically, given a tie between i and j, the strength is the number of nodes who are neighbors of both i and j, divided by the number of nodes who are neighbors of either i or j. In words, the strength measures the extent of friend overlap between two individuals. Intuitively, a larger overlap indicates that it is more likely that the two individuals belong to each other's "inner circle." A long line of research has argued that stronger ties with more friend overlap tend to be associated with a higher degree of interaction as well as trust (Coleman 1988; Bakshy et al. 2012; Aral and Walker 2014).

Figure 1.5: Community Switching Creates Cross-Community Ties. Note: Colors of the nodes indicate the communities that they belong to. The two plots show the event where a node (marked with a green box) switches from the purple community to the yellow community. This movement transforms four within-community ties into cross-community ties.

The right side of Table 1.4 reports the results that weight each tie by its strength. We see much higher fractions of cross-community ties (64.80%, 58.79%, and 53.72%) that were initially formed as within-community ties. The results indicate that within-community tie formation is not only an important source but in fact the main source of strong connections across communities.

The results above have an important implication for modeling social networks with community structures. They point to potential pitfalls in using static models, for example, a stochastic block model. A standard stochastic block model estimates two link formation probabilities, one for within-community pairs and the other for cross-community pairs (Holland, Laskey, and Leinhardt 1983; Anderson, Wasserman, and Faust 1992; Karrer and Newman 2011). The estimation typically relies on a snapshot of the network.
Conceptually, this would lead to a substantial over-estimate of the cross-community link formation probability, because it ignores that many cross-community links could have been initially formed as within-community links.

1.7.2 Mobility and Network Connectivity

The exercise in Section 1.7.1 above shows that within-community tie formation is an important source of cross-community ties. Because the explanation of this seemingly unintuitive result lies in community switching, a key factor is the mobility of players, i.e., the frequency at which players switch communities. Below, we examine the effects of mobility on the network.

For the platform to influence the mobility of players, one possible policy is to impose a "freeze time" after players move to a new community, during which they are not allowed to move again.¹² Denote the freeze time by f ∈ {0, 1, 2, ...}, where f = 0 means players can change communities at any time (as in the data), and f = 1 means that a player must stay in the community for at least one period after the period in which she joins it. (In practice, the platform may implement more flexible versions of this policy, where players are still allowed to move during the freeze time but at a cost, e.g., losing game points.)

One caveat in this policy exercise is that players may take the freeze time into account before making community choices. Unfortunately, accounting for forward-looking behaviors is very challenging with social networks and beyond the scope of our model. However, even with forward-looking players, it seems reasonable that the policy will influence mobility; in particular, a larger f should lead to lower mobility. Thus, we expect that our qualitative results will continue to hold.

Figure 1.6 displays the results. The vertical axis of each plot represents the cross-community tie density, which measures the connectivity between communities.
The three plots on the left side show the results for the networks at T = 25, 50, and 75, respectively. We see that the cross-community connectivity initially changes slowly with f, and then starts to decline as f becomes longer.

12. One can think of this policy as similar to a job contract or rental lease that penalizes early termination.

Figure 1.6: Player Mobility and Cross-Community Tie Density. Panels (a), (c), and (e) show unweighted results at T = 25, 50, and 75; panels (b), (d), and (f) weight ties by tie strength. Note: The bands show the 95%-confidence intervals.

Figure 1.7: Player Mobility and Network Density. Panels (a), (c), and (e) show unweighted results at T = 25, 50, and 75; panels (b), (d), and (f) weight ties by tie strength. Note: The bands show the 95%-confidence intervals.

The three plots on the right side weight each cross-community tie by the tie strength (as defined in Section 1.7.1). Very interestingly, we see clear inverted-U relations between mobility and the resulting cross-community densities. Take T = 50 for example: the densest weighted cross-community connection is achieved at f = 3, and that density is 12.7% higher than in the no-policy case (f = 0).

The above result is perhaps surprising yet intuitive, especially considering the mechanism found in Section 1.7.1. On one hand, staying in a community helps a player build within-community ties with the community members. On the other hand, community switching "transforms" these within-community ties into cross-community ties. A balance between these two aspects is needed to achieve the greatest cross-community connectivity. When mobility is too high, players do not stay in a community long enough to develop strong ties. When mobility is too low, players are "locked" in their respective communities, without sufficient community switching to transform within-community ties into cross-community connections.
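The freeze-time rule in this exercise is mechanically simple. The sketch below (the function name and the toy choice sequence are our own assumptions) gates a player's desired switches: a desired switch is executed only if more than f periods have passed since her last executed switch, otherwise she stays put.

```python
def simulate_mobility(choices, f):
    """Apply a freeze-time rule of length f to a sequence of desired
    community choices, returning the realized community in each period."""
    current, last_switch = choices[0], 0
    realized = [current]
    for t in range(1, len(choices)):
        want = choices[t]
        # A switch is allowed only after more than f periods have passed
        # since the player last joined her current community.
        if want != current and t - last_switch > f:
            current, last_switch = want, t
        realized.append(current)
    return realized

desired = ['A', 'B', 'A', 'A', 'C', 'C']
print(simulate_mobility(desired, 0))  # free movement: desires are realized
print(simulate_mobility(desired, 2))  # f = 2 blocks the quick A->B->A flip
```

In the full counterfactual, sweeping f and re-simulating the network traces out the inverted-U relationship reported above.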
In our specific context, the highest density of strong cross-community ties is achieved when the platform moderately limits the mobility of players (through the freeze-time policy).

The above result and intuition carry over to the overall network density (instead of just the cross-community density). Figure 1.7 shows the results. Thus, a policy with the appropriate freeze time benefits not only cross-community connectivity but also the overall connectivity of the network.

1.8 Conclusion

Social networks often feature community structure. In this paper, we model and study the co-evolution of an online social network and its communities, with unique data that track social ties as well as explicitly defined communities over time. We estimate the mutual effects between friendship and community choices. Because unobserved individual heterogeneity (such as latent interests) may manifest itself as dependence between friendship and community choices in the data, we allow for rich unobserved heterogeneity in the model. The rich unobserved heterogeneity coupled with a network structure introduces significant computational challenges for estimation. To overcome this challenge, we apply a novel estimation approach (NNE) that uses machine learning techniques.

Our model delivers two key substantive messages. First, the model shows that a large fraction of cross-community ties come from within-community ties (while the reverse does not hold). In our context, within-community tie formation is in fact the main source of strong cross-community ties. This result has implications for our understanding of community structure. It says that many cross-community ties that we observe in snapshots of networks were actually not formed as cross-community ties. This result emphasizes the importance of taking a dynamic perspective when modeling social networks and their community structure.
A static network model (e.g., a stochastic block model) typically takes an antithetical view of within- and cross-community ties, and would thus overestimate the tie formation probability across communities.

Second, platforms can tap into within-community tie formation to achieve more and stronger connections between communities. This result provides a new perspective on managing online social networks: directly encouraging cross-community interactions is not the only way to bridge different communities. Specifically, our model shows that a key factor is the mobility of individuals. With a balanced level of mobility, individuals have sufficient time to develop ties within a community, and then "transform" these ties into cross-community ties when they move to a different community.

The insight should be particularly relevant for modern platforms with built-in social network features. These platforms (e.g., LinkedIn and Twitter) build their value from not only the number of users but also the connections between users. It is these connections that give rise to faster diffusion of information and subsequently new marketing opportunities (e.g., viral marketing). A long-standing, well-known obstacle to diffusion is a lack of bridges between communities (see, e.g., Easley, Kleinberg, et al. 2010). As such, the value of an online social network very much depends on how well distinct communities are connected. By obstructing information dissemination, disconnected communities are often associated with knowledge or behavior silos that are unhealthy for the platform's long-term growth. Given this context, understanding the formation and dynamics of cross-community ties is ever more important.

There are limitations of our study that open possible directions for future research. First, our dataset features a "mono-community" design: individuals can join one but not multiple communities at a time.
It is not difficult to extend our model and estimation method to data where individuals are allowed to join multiple communities (e.g., whether to join a particular community is modeled as a separate choice). It is important to note that even with a "multi-community" design, a community structure, where the social network is divided into distinct clusters, is likely still present. However, it will be interesting to examine whether a multi-community design can mitigate segregation and lead to more connections between communities. Second, our study focuses on the dynamics of within- vs. cross-community ties, without formalizing an analysis of their implications for information diffusion and product adoption. With additional data, one may estimate a model of diffusion/adoption and examine how the diffusion process interacts with the evolving community structure.

Chapter 2

Learning to Create on Content-Sharing Platforms

2.1 Introduction

Digital content creation is a multi-billion-dollar industry undergoing massive growth, with its market size estimated to reach $104 billion in 2022 (Geyser 2022). Central to the thriving industry are 50 million independent content creators (2 million of them full-time) all over the world (Murthy 2021), who produce, share, and monetize their creations on content-sharing platforms such as YouTube, Instagram, and Twitch. Content creators on YouTube, for instance, shared 55% of the $28.8 billion ad revenue in 2021 (Hutchinson 2022). Twitch live streamers also receive considerable monetary reward through subscriptions, tipping, or ads (Geyser 2022). Indeed, creating content for revenue is a rising trend that distinguishes content creators from other UGC contributors.¹

1.
For instance, Toubia and Stephen (2013) study the motivations of non-commercial users on Twitter and find that image-related motivations are important drivers of participation; on an online review platform, Sun, Dong, and McIntyre (2017) find that consumer participation is largely driven by intrinsic motivation, such that consumers contribute less after monetary reward is introduced.

For independent content creators, however, uncertainty around monetizing their content is a persistent challenge (Jarrett 2022). Unlike employees in traditional industries, content creators face unstable reward income streams over which they have little control. Regardless of whether creators receive monetary reward through ad-revenue sharing, tipping, or subscriptions, their incomes are essentially determined by how positively consumers think about the content they produce. Yet consumers' likings are hard to predict and change from time to time, leading to highly volatile revenue streams. As a result, it is not uncommon for content creators to feel stressed and frustrated; reward volatility (i.e., risk) is in fact the leading cause of content creators' mental health struggles and even "burn-out" (Lorenz 2021; Hall 2022). For instance, Mark Shust, an online course creator, shared his own experience of "making nearly $20,000 in a single month, and then not even hitting $4,000 the following month" and described that as "times of great despair" (Shust 2021). Such an averse attitude toward income risk is in accordance with the literature on income smoothing, where numerous studies in economics, finance, and accounting have addressed why and how individuals and firms smooth out their incomes over time (Beidleman 1973; Morduch 1995; Koch 1981; Hanna and Lindamood 2010).

The reward income volatility likely brings negative consequences for the platform through content creation choices, too.
To survive under reward volatility, content creators have to learn about the potential of their creations, stay vigilant about feedback from consumers, and adapt their content production choices accordingly. Anecdotal evidence suggests that, to cope with risk, creators tend to produce content that appeals to popular tastes and attracts more audiences, possibly leading to content concentration on the platform. If the risk becomes too high to manage, creators may even withdraw from the platform (Lorenz 2021; Hall 2022). As facilitators of content creation, platforms depend crucially on creator-contributed content to remain profitable and to build a prosperous content ecosystem (Parker, Alstyne, and Jiang 2017; Hukal et al. 2020). It is thus essential for the platform not only to understand creators' content production choices but also to be able to manage the reward-related risk in content creation.

Our study quantifies the importance of both reward and risk in content creation. While the effects of monetary reward on content creation are well researched,² the existing literature remains largely silent on how reward fluctuations (or risk) affect content creation and, further, on how to manage the risk. Our study revolves around three questions: (i) How do reward and risk affect content creation? (ii) How can we capture content creators' learning about reward and risk from their own creations and from observing others' performance? (iii) How can platforms leverage the understanding of creators' learning and decision-making to promote content creation and increase revenue?

Answering these questions requires an understanding of content creation choices at the individual level. To this end, we model creators' choice among content categories in a mean-variance utility framework, where we explicitly incorporate the effects of both reward and risk using the expected reward and its variance.
Because creators are uncertain about their reward beforehand, we impose a learning structure within the choice framework to capture how creators deal with this uncertainty. In particular, creators in our model expect their monetary reward to depend on their intrinsic qualities, their own content creation, and their content consumption. What remains to be resolved is how important those contributing factors are (Chylinski, Roberts, and Hardie 2012; Zhao, Yang, et al. 2022), which is updated in a Bayesian framework upon receiving reward signals (DeGroot 2005). Specifically, our model incorporates two types of observable learning signals: creators' own monetary reward from content creation and the performance they observe during content consumption. Learning about how past experiences associate with current reward (instead of the level of reward) deviates from a conventional learning model. It not only captures the evolving nature of reward, but also features "learning of learning" in content creation: creators learn about how their reward evolves over time depending on their own creation (learning-by-doing; Arrow 1971; Thornton and Thompson 2001) and their observation of others' performance (learning-from-observation; Thornton and Thompson 2001; Zhang 2010; Chan, Li, and Pierce 2014). In addition, our model incorporates rich individual heterogeneity (both observed and unobserved) and dynamics, which enables us to further analyze the differential reward and risk preferences among subgroups of creators over time.

We collaborate with a leading live streaming platform in China, where creators receive monetary reward directly from audience tipping³ and observe the reward in real time.
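The Bayesian updating step can be illustrated with the textbook normal-normal case (DeGroot 2005): a creator holds a normal prior over a reward coefficient and shrinks her belief toward each noisy signal in precision-weighted fashion. The function, prior, signal values, and noise variance below are illustrative assumptions, not the paper's actual specification.

```python
def bayes_update(mu, var, signal, noise_var):
    """One normal-normal Bayesian update of a belief (mu, var) after
    observing a noisy signal with known noise variance: the posterior
    precision is the sum of the two precisions, and the posterior mean is
    the precision-weighted average of prior mean and signal."""
    post_var = 1.0 / (1.0 / var + 1.0 / noise_var)
    post_mu = post_var * (mu / var + signal / noise_var)
    return post_mu, post_var

mu, var = 0.0, 1.0          # diffuse prior over the reward coefficient
for s in [0.8, 1.1, 0.9]:   # reward signals (own creation or observation)
    mu, var = bayes_update(mu, var, s, noise_var=0.5)
print(round(mu, 3), round(var, 3))  # 0.8 0.143
```

Each signal tightens the belief (the posterior variance falls monotonically), which is what lets the model capture both learning from one's own reward and learning from observed performance.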
Our model estimates reveal several interesting findings. First, both financial motivation and risk aversion play important roles in content creation: creators generally prefer to produce in a category that yields greater monetary reward but involves less reward fluctuation. We quantify the effects of risk and find that, at the mean level, reward fluctuations cause disutility that amounts to 1/3 of the average reward; to make up for the negative effects of an additional unit of reward variance, creators are willing to give up 23% of their reward.⁴ Second, there is significant heterogeneity in reward and risk preferences across creators. In terms of risk attitude, creators divide into two groups, with 30% of creators being highly risk averse and the rest being only slightly averse to risk (or even risk-loving); moreover, female creators and creators who are physically attractive tend to be more risk averse than the rest.

In addition, our unique dataset covers the initial outbreak of the COVID-19 pandemic and the first round of national-level physical lockdown in China, which allows us to empirically study how physical lockdowns affect creators' content choices. Our findings suggest that after lockdown creators not only place heavier weight on monetary reward, but also have more diverse preferences over risk: the highly risk averse creators become even more risk averse, while some creators who used to be moderately risk averse now enjoy risk.

3. Tipping (also referred to as donation) is an important reward stream for live streaming platforms and represents a "pay-what-you-want" pricing strategy (Lu et al. 2021; Ma, Wang, and Liu 2022).
4. Reward here is measured in log-scale.

Leveraging these findings, we propose an income smoothing policy for the platform to encourage content creation by mitigating individual reward fluctuations over time.
Under the policy, the platform collects a fixed proportion of monetary reward (a "tax") every time creators produce⁵ and subsidizes creators whenever their reward is low. Our counterfactual simulations deliver several important messages. First, by carefully designing the tax and subsidy rates, the platform can earn additional revenue while helping creators cope with risk. Second, we find an inverted-U relationship between the tax rate and the platform revenue, such that the platform achieves an optimal level of revenue by collecting a moderate level of tax while fully subsidizing creators' reward gap over time. Intuitively, a moderate level of tax balances between collecting sufficient funding for the subsidy and encouraging content creation that is potentially profitable. Last but not least, such profitability comes hand-in-hand with a boost in content creation: not only do creators produce more content under the policy, but they also diversify the content they produce. The policy with a 30% tax rate that maximizes the platform revenue also leads to a 10% increase in the content produced and an improvement in content variety. Hence, a moderate-to-high level of tax discourages not only unprofitable creations but also creations that concentrate on popular categories.

5. This is in addition to the existing commission, whereby the platform collects 70% of the tipping reward income from content creators. See Section 2.3 for details.

Overall, our study contributes to understanding and managing the reward uncertainty in content creation. Theoretically, we add to the literature by proposing a model of creators' content choice under reward uncertainty. In resolving the uncertainty, we characterize creators' "learning of learning," where creators learn about the evolution of reward through their past creation and consumption experiences.
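The tax-and-subsidy scheme explored in the counterfactuals can be sketched in miniature. The function, rates, and the reward stream below (which echoes the $20,000-then-$4,000 anecdote from the introduction) are our own illustrative assumptions; the paper's actual policy design is richer.

```python
def smooth_income(rewards, tax_rate, floor):
    """Hypothetical income-smoothing scheme: the platform withholds a fixed
    share of each period's reward into a fund and tops the creator up to
    `floor` (as far as the fund allows) whenever after-tax reward is low."""
    fund, stream = 0.0, []
    for r in rewards:
        net = (1 - tax_rate) * r
        fund += tax_rate * r
        if net < floor:
            subsidy = min(floor - net, fund)
            fund -= subsidy
            net += subsidy
        stream.append(net)
    return stream, fund

stream, leftover = smooth_income([20_000, 4_000, 12_000], 0.3, 8_000)
print([round(x) for x in stream], round(leftover))  # [14000, 8000, 8400] 5600
```

The smoothed stream never dips below the floor as long as the fund holds, which is the risk-mitigation channel the policy targets; the leftover fund is what the platform can retain as additional revenue.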
Our results highlight the impeding role of risk in content production, where creators in general disfavor content categories with greater risk. Managerially, our work not only uncovers how risk and reward affect creators' content choices, but also provides a new perspective on how to help creators cope with the risk. We propose an easy-to-implement income smoothing policy that, if carefully designed, could encourage content creation, improve content variety on the platform, and increase profitability at the same time.

The rest of the paper is organized as follows. Section 2.2 reviews the literature. Section 2.3 describes the empirical context and summarizes the data. Section 2.4 develops our model. We outline the model estimation in Section 2.5 and discuss the model estimates in Section 2.6. Section 2.7 presents counterfactuals. Section 2.8 concludes.

2.2 Literature

User-Generated Content (UGC). Our study relates to the rich literature on user-generated content in several ways. First, there is a stream of literature studying the financial incentives of content creation and how platform policies of monetary reward affect content creation (to name just a few: Sun and Zhu 2013; Sun, Dong, and McIntyre 2017; Cong, Zhao, and Zhang 2018; Burtch et al. 2018; Khern-am-nuai, Kannan, and Ghasemkhani 2018; Kuang et al. 2019; Wang, Li, and Hui 2021; Jain and Qian 2021). For instance, Sun and Zhu (2013) study how ad revenue sharing between bloggers and the platform affects content creation and find that, although ad-revenue-sharing improves the quality of content created, it makes popular content even more dominant, leading to content concentration on the platform. Sun, Dong, and McIntyre (2017) study how monetary reward affects product review contributions and find an overall decrease in reviews posted.
They further examine the moderating role of social connectedness and find that, although less-connected members increase their contributions, members with more friends contribute fewer reviews under monetary reward. In contrast, Burtch et al. (2018) find that introducing financial incentives induces greater review volumes, but financial incentives have to be combined with social norms to yield more, and longer, online reviews. On a knowledge-sharing platform, Cong, Zhao, and Zhang (2018) find that allowing content creators to price their content contributions increases the free content supply, but at lower quality. Using analytical models, Jain and Qian (2021) study the design of revenue-sharing policies on platforms; they also examine the effects of receiving tipping (in addition to ad-revenue-sharing) and find an improvement in content quality. While the effects of providing monetary reward (i.e., the level of reward) are well researched, the literature remains silent on how fluctuations (or variability) of reward affect content creation, especially given the highly volatile reward income, the resulting mental struggles for creators, and the potential impacts on their content supply. We contribute to the literature by highlighting the role of reward fluctuations and studying the effects of both the level of monetary reward (reward mean) and the reward-related risk (reward variance) on content creation. In examining the effects on content creation, we account for both the content volume and the content variety on the platform. Methodologically, we propose a structural model and conduct policy simulations (i.e., smoothing out creators' income over time) without incurring the costs of actually implementing the policy, which adds to the studies of designing monetary reward policies.
Second, following the COVID-19 pandemic and the worldwide physical lockdowns, several studies look into the effects of physical lockdowns on content consumption (Marzouki, Aldossari, and Veltri 2021; Lemenager et al. 2021) and content creation (Wang, Mousavi, and Hong 2020). For instance, Wang, Mousavi, and Hong (2020) find that creators experiencing physical lockdowns produce more content, but the content created is less novel and less optimistic compared to pre-lockdown periods. Our study also provides empirical evidence on how lockdowns affect creators' reward and risk preferences in content creation, and how different groups of creators (attractive vs. unattractive; female vs. male) respond differently to lockdowns.

Live Streaming. Our research adds to the emerging literature that studies marketing problems on live streaming platforms. On the demand side (i.e., content consumption on live streaming platforms), various studies have looked into consumers' motivation to participate (Lin, Yao, and Chen 2021; Wu et al. 2022; Liu, Tan, and Pawar 2022), consumers' tipping behavior under the pay-what-you-want pricing strategy (Lu et al. 2021; Ma, Wang, and Liu 2022; Liu, Tan, and Pawar 2022), and consumers' demand for live streaming products (Cong, Liu, and Manchanda 2021), among others. For instance, Lu et al. (2021) examine how the revenue a live streaming session generates changes with its audience size and find an overall positive and concave relationship between the two, which highlights the social image-related utility in paying for the content. In contrast, the supply side of the industry remained much less studied until recently (Zhao, Lu, et al. 2022; Qian and Xie 2022). Using data from Twitch, Zhao, Lu, et al. (2022) study the impacts of content switching on the viewership of incumbent live streaming content creators.
They find two positive spillover effects of content switching: the direct effect of bringing more audiences to the new content category and the indirect effect of increasing the exposure of the new content category. Qian and Xie (2022) empirically investigate the effects of top creators on both the demand and supply of content on a live streaming platform. They find that the exit of top creators leads to decreases not only in content consumption but also in content production from incumbent non-top creators. Our study also operates under the pay-what-you-want strategy, where reward from tipping is highly volatile, and focuses on the supply side. We study how the tipping reward level and its fluctuations affect creators' live streaming content creation. To help cope with reward fluctuations, we also propose an income smoothing policy that potentially not only generates more revenue for the live streaming platform, but also promotes content creation and content variety on the platform.

Learning. First, our work builds upon the rich literature on decision-making under uncertainty (consumer learning), where people are uncertain about product attributes and collect information signals to learn more about the product over time.⁶ Since the seminal work of Erdem and Keane (1996), numerous studies have applied the learning framework to marketing problems, including but not limited to advertising and brand loyalty (Erdem and Keane 1996; Ackerberg 2003; Mehta, Rajiv, and Srinivasan 2004; Erdem, Keane, and Sun 2008), product adoption and diffusion (Coscelli and Shum 2004; Chintagunta, Jiang, and Jin 2009; Narayanan and Manchanda 2009), online reviews (Zhao et al. 2013; Wu et al. 2015), and crowd-sourcing ideas and content (Huang, Vir Singh, and Srinivasan 2014), among many others.

6. See Ching, Erdem, and Keane (2013) for an excellent review and assessment of learning models in marketing.
In the aforementioned studies, individuals mainly learn about an attribute that is fixed over time, such as the product quality in Erdem and Keane (1996), the product quality and credibility of online reviews in Zhao et al. (2013), and the ability to come up with high-potential ideas as well as the cost of implementing new ideas in Huang, Vir Singh, and Srinivasan (2014). But the evolving nature of reward income for content creators in our study calls for more. We distinguish ourselves from the majority of the literature by modeling creators' learning about the evolution patterns of reward, not just the level of reward. Closely related to our paper are Chylinski, Roberts, and Hardie (2012) and Zhao, Yang, et al. (2022). Chylinski, Roberts, and Hardie (2012) propose a framework where consumers learn about the evolving importance of a binary product attribute by updating beliefs about the association between the presence/absence of the attribute and utility. Zhao, Yang, et al. (2022) model and estimate players' level progress in online gaming, where players learn about the evolution of their operation efficiencies before making play-or-quit decisions. Our work adds to the literature by modeling content creators' learning about how their reward evolves over time, where creators collect information and update beliefs on how their reward is associated with their reward in the past, the reward they observe from consumption, and their intrinsic quality.

Second, another line of literature looks into how learning behaviors affect production and performance (Thornton and Thompson 2001; Chan, Li, and Pierce 2014; Riedl and Seidel 2018). For instance, Thornton and Thompson (2001) investigate both learning from own experience and learning from others in the context of shipbuilding during World War II. Chan, Li, and Pierce (2014) examine the effects of peer-based learning and learning-by-doing on worker productivity among salespeople.
The authors find that peer-based learning substantially increases the long-term productivity growth of new salespeople, and that peer-based learning is more important than learning-by-doing. In an online innovation community, Riedl and Seidel (2018) empirically characterize and explain the learning curve of contributors to innovation contests, where contributors receive feedback and learn through direct experience and by observing others. Our work contributes to this literature by incorporating both learning from content creation (i.e., learning-by-doing; Arrow 1971; Thornton and Thompson 2001) and learning from content consumption (i.e., learning-from-observation; Thornton and Thompson 2001; Zhang 2010; Chan, Li, and Pierce 2014) into how content creators predict their reward evolution patterns over time.

2.3 Data

We work with a unique dataset provided by a leading live streaming platform in China. Operating in a similar way to Facebook Live and Periscope, the focal platform encourages content creators (also referred to as "streamers" or "broadcasters") to produce and share live streaming videos of any content, such as dancing, singing, talk shows, or video/audio chats with the audience and other streamers.⁷ Figure 2.1 presents three examples of live streaming sessions: the leftmost streamer was singing while playing the piano, the streamer in the middle was sharing a seaside walk, and the streamer on the right was chatting with the audience. The creator's view of the session is essentially the same as the audience's view.

7. Unlike traditional performers on TV or in theaters, live streaming content creators do not have to be experts to perform. For instance, they do not need to be a singer to sing a song during live streaming sessions.

Figure 2.1: Examples of Live Streaming Sessions

Similar to most live streaming platforms (e.g., Twitch, Douyin, Kuaishou), tipping is the major income source for content creators on our focal platform.
One can think of this revenue model as similar to the "pay-what-you-want" strategy (Lu et al. 2021; Ma, Wang, and Liu 2022). On the focal platform, tipping exists in the form of virtual gifts and operates in a similar way to "Bits" on Twitch and "Stars" on Facebook. In particular, during a live streaming session, audiences can send virtual gifts to the creator. These virtual gifts are purchased using in-app currency (or "points") and show as fancy short animations on the screen for the creator and all audiences watching the session. Immediately after receiving the gifts, the creator's reward account is updated to reflect the value of the gifts. Therefore, a creator on the platform not only has information on her own reward income but also observes the reward performance of other sessions she watches. Once a live streaming session finishes, its tipping reward no longer accumulates. On the focal platform, 30% of the tipping income goes to the creator; the rest (70%) is collected by the platform as commission.⁸ Our data have several features that fit our research scheme. First, our dataset covers histories of both content creation and content consumption activities of the sampled content creators, enabling us to study the two-sidedness of content-sharing platforms. Second, the major source of reward for both the creators and the platform is tipping. Thus the dataset provides a relatively simple and clean empirical context to study the effects of reward and risk on content creation: there are no interactive effects from other sources of income, such as advertisement, endorsement, or subscription. On the other hand, tipping income is highly volatile: it yields large variations in income both across individuals and over time; the volatility also calls for the platform to help monitor and manage income risk.
Third, our data period covers the initial outbreak of the COVID-19 pandemic and the first round of national-level physical lockdown in China, which allows us to analyze how reward and risk affect content creation differently during a major external shock. We carefully filter out observations that either are invalid or do not fit our research scheme. First, we remove sessions that last less than 60 seconds from the creation records because they are too short for the company to conduct content analysis and thus have no content information. Second, we filter out watching histories that last less than 60 seconds, since streamers are unlikely to learn anything significant in such a short period. Third, we remove content creators with no more than 30 days of valid streaming or watching records, because we use the first 14 days to initialize our learning model and the rest to estimate content choices. Altogether, we obtain information on 5,177 valid creators, with 1.08 million streaming sessions from October 2019 to February 2020. We conduct analyses at the daily level. On average, a creator in our sample produces 2.6 sessions on an active streaming day. To characterize live streaming sessions, we utilize information on the monetary reward received and the content of the session. Overall, approximately 86% of creators who produce receive reward from tipping each day. The average income received per day is CNY 265 (approximately $40), with a large standard deviation of 5,142, demonstrating large income dispersion across live streaming sessions.

8. This is operationalized by setting different buying and selling prices for the in-app currency ("points"): when an audience buys virtual gifts, 100 in-app points are worth CNY 10 ($1.5); when a creator transfers the in-app points to her bank account, 100 in-app points are equivalent to CNY 3 ($0.45).
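The effective commission implied by these point prices is easy to verify with back-of-the-envelope arithmetic (variable names are ours):

```python
# Back-of-the-envelope check of the revenue split described in Section 2.3,
# using the point prices reported above: audiences buy 100 in-app points for
# CNY 10, while creators redeem 100 points for CNY 3.
buy_price_per_100_points = 10.0   # CNY paid by an audience member
payout_per_100_points = 3.0       # CNY received by the creator

creator_share = payout_per_100_points / buy_price_per_100_points
platform_share = 1.0 - creator_share
print(creator_share, platform_share)  # 0.3 0.7
```

This matches the 30%/70% creator/platform split stated in the text.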
Tipping reward demonstrates significant fluctuations over time: if we look at the standard deviation of reward along each individual's creation history and divide it by the individual's mean reward over time, an average creator has a coefficient of variation (CV) of 2.15.⁹ This indicates that, on average, a creator's reward standard deviation over time amounts to more than twice her average reward. In our data, each valid live streaming session (i.e., a session that lasts at least 60 seconds) receives a content label¹⁰ from the platform. Such content labels are mined from the live streaming videos and images, audiences' live comments, and self-reported hashtags. Altogether, we classify the content labels into six mutually exclusive content categories: (1) movie/TV/gaming, (2) social, (3) music talent, (4) co-hosted podcast, (5) solo podcast, and (6) all other contents. For instance, the leftmost panel in Figure 2.1 would be categorized into "music talent" and the session in the middle falls into the "other" category. Figure 2.2 presents the overall distribution of content categories, where music talent and solo podcast are popular choices for creators, and movie/TV/gaming is relatively less popular.

Figure 2.2: Content Frequency

In our data, approximately 75% of content creators produced more than one category during our sample period. Over time, we observed 32,857 content switches (about 8% of the total streaming occasions at the daily level), where content creators shift from one content category to another.

9. The coefficient of variation (CV) measures the magnitude of the standard deviation relative to the mean. Cases with CV > 1 are considered high-variance. The average individual-level reward standard deviation is 419.65.
10. We collaborated with the platform in labeling the content but, unfortunately, we do not have access to the original video recordings.
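Concretely, the per-creator coefficient of variation can be computed as follows. The reward history below is made up for illustration; the 2.15 average CV comes from the actual data.

```python
import statistics

def coefficient_of_variation(rewards):
    """CV = standard deviation of a creator's reward over time / her mean reward."""
    mean = statistics.mean(rewards)
    sd = statistics.pstdev(rewards)
    return sd / mean

# Hypothetical daily tipping rewards (CNY) for one creator: tipping income is
# heavy-tailed, so a few large days dominate the dispersion.
rewards = [5, 0.5, 300, 12, 2, 80, 1500, 7]
cv = coefficient_of_variation(rewards)
print(cv > 1)  # high-variance by the CV > 1 rule of thumb
```

A CV above one means the creator's reward standard deviation exceeds her mean reward, the situation faced by the average creator in our sample.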
For individual content creators, we also observe their gender and their level of physical attractiveness, both of which are salient features of the live streaming sessions. Overall, 68% of creators are female. For physical attractiveness, the platform hires human raters to rate how attractive a creator looks and categorizes creators as attractive (34%) or unattractive (66%). The attractiveness rating is not observed by the creators. Our data period covers the initial outbreak of the COVID-19 pandemic and the first round of physical lockdown in mainland China. A creator is considered to be in the lockdown status if her geographic location during the live streaming session is in a province that declares a level I public health emergency¹¹ regarding the COVID-19 pandemic. Across the nation, all provinces in mainland China started physical lockdown between January 23rd and 30th, 2020. We list the timing of physical lockdown in each province in Appendix C.

11. For details on the first-level emergency response, see summaries at Wikipedia and policies from the Chinese central government.

2.4 Model

We model creators as individuals who face uncertainty in their monetary reward from producing each content and try to predict their reward using factors including their own intrinsic quality, their own performance in the past, and the performance they observe as audiences. Creators gather information on how important each factor is for a successful creation and update their beliefs through both creation and consumption experiences in a Bayesian fashion. Both the mean and the variance of reward income affect creators' utility of producing a content. Our model has several features. First, we adopt a mean-variance utility framework and explicitly model how reward-related risk (in addition to reward) affects creators' content creation choices. Second, we model the "learning of learning".
In our model, creators learn about how their reward evolves over time instead of learning about a fixed level of reward. While the latter type of learning has been extensively studied in the consumer learning literature,¹² learning about an association or an evolution pattern (Chylinski, Roberts, and Hardie 2012; Zhao, Yang, et al. 2022) provides a new modeling framework that fits our study. By including past creation and consumption experiences as two sources of learning, our learning process captures both learning-by-doing (Arrow 1971; Thornton and Thompson 2001) and learning-from-observation (Thornton and Thompson 2001; Zhang 2010; Chan, Li, and Pierce 2014). Third, we model the switching costs of producing new and different contents, which adds another type of state dependence in addition to learning (Israel 2005; Osborne 2011; Goettler and Clay 2011). Depending on the sign of the estimated coefficient, our creators could exhibit either inertia or a tendency toward variety seeking.

12. For instance, Erdem and Keane (1996), Ackerberg (2003), Coscelli and Shum (2004), Erdem, Keane, and Sun (2008), Zhao, Zhao, and Helsen (2011), and Zhao et al. (2013).

2.4.1 Utility Specification

We model a creator's choice as follows. In each period t, creator i chooses to produce content j if content j yields a higher utility U_{ijt} than any other choice l ≠ j, including the outside option J of not producing anything.¹³ Let the binary variable y_{ijt} denote the content choice, where y_{ijt} = 1 if creator i produces a live streaming session with content j at time t and y_{ijt} = 0 otherwise. Then,

y_{ijt} = \begin{cases} 1, & \text{if } U_{ijt} > U_{i\ell t} \text{ for all } \ell \neq j; \\ 0, & \text{otherwise.} \end{cases}

We separate the utility into two parts: V_{ijt} and the idiosyncratic error term \epsilon_{ijt}. We assume that \epsilon_{ijt} follows the extreme value distribution. The utility of not producing any content, U_{iJt}, is normalized to zero:

U_{ijt} = \begin{cases} V_{ijt} + \epsilon_{ijt}, & \text{for } j = 1, 2, \ldots, J-1; \\ 0, & \text{for } j = J. \end{cases}
We specify V_{ijt} as follows:

V_{ijt} = \alpha_{ijt0} + \alpha_{it1} \cdot R_{ijt} + \alpha_{it2} \cdot Risk_{ijt} + \alpha_{it3} \cdot PastCreate_{ijt}.   (2.1)

In the above, the variable R_{ijt} stands for the log of the monetary reward that creator i receives for producing content j at time t. The variable Risk_{ijt} denotes the risk associated with the monetary reward. Such a mean-variance specification of expected utility is widely adopted in the economics and finance literature; it is equivalent to a Constant Relative Risk Aversion (CRRA) utility model under a normality assumption (Sargent 1987). PastCreate_{ijt} is a binary variable indicating whether content j has been produced by creator i during the past week: it equals one if content j has been produced during the past week and zero otherwise. It controls for state dependence beyond that resulting from learning (Israel 2005; Osborne 2011; Goettler and Clay 2011). We define the variable PastCreate_{ijt} based on a creator's recent creation history (as opposed to the whole history) for two reasons. First, we do not observe the complete creation histories of all streamers since their very first streaming session, so a definition based on the "whole" history is likely biased for experienced streamers. Second, a rolling-based definition of history is a simple way of capturing "forgetting" without adding complexity to the learning process.¹⁴ The parameter \alpha_{ijt0} is an individual- and time-specific content dummy that captures the intrinsic utility or cost of production creator i derives from streaming content j at time t. Parameters \alpha_{it1} and \alpha_{it2} capture the effects of reward and risk on creators' content choice, respectively.

13. To distinguish between the choice of the outside option and the choice of exiting the platform, we consider creators who did not create or watch for 14 consecutive days as inactive participants (i.e., no longer on the platform). The 14-day window covers 99% of valid live streaming sessions.
In particular, \alpha_{it1} is the weight creator i attaches to her monetary reward. Parameter \alpha_{it2} is the individual-specific risk coefficient of creator i toward monetary reward. Depending on the value \alpha_{it2} takes, creators in our model can be risk averse, risk neutral, or risk seeking: \alpha_{it2} < 0 implies that individual i is risk averse, and \alpha_{it2} > 0 suggests that individual i is risk seeking. Parameter \alpha_{it3} reflects creators' intrinsic valuation of producing familiar contents versus exploring new contents. In particular, a positive \alpha_{it3} indicates a preference for producing familiar content, whereas \alpha_{it3} < 0 means streamer i derives additional utility from producing content that is new to her recent creation history. As such, \alpha_{it3} captures inertia versus a tendency for variety seeking for individual i (e.g., Bawa 1990): if \alpha_{it3} is positive, individual i is inertial in content creation; if negative, \alpha_{it3} suggests individual i enjoys variety seeking.¹⁵ Previous literature on risk and financial decision making has shown interesting results regarding how females versus males may have different risk attitudes (Byrnes, Miller, and Schafer 1999; Schubert et al. 1999; Weber, Blais, and Betz 2002; Filippin and Crosetto 2016), and how crises such as natural disasters, pandemics, and financial crises may affect people's risk preferences (Cohn et al. 2015; Hanaoka, Shigeoka, and Watanabe 2018; Tsutsui and Tsutsui-Kimura 2022).

14. "Forgetting" happens since memories of prior evaluations weaken, which makes them harder to retrieve over time (Anderson 2000). In the marketing literature on consumer learning, Mehta, Rajiv, and Srinivasan (2004) model forgetting in consumers' purchase decisions by assuming that consumers recall their prior evaluations with noise; Zhao, Zhao, and Helsen (2011) capture consumer forgetting through decreasing consumer confidence in quality beliefs.
To account for the heterogeneous reward and risk preferences among different groups of content creators and the impact of the COVID-19 pandemic, we further model each \alpha_{it} parameter¹⁶ as a combination of observed and unobserved individual heterogeneity. We use the superscript m to distinguish between different \alpha_{it} parameters:

\alpha^{m}_{it} = \beta^{m}_{0} + \beta^{m}_{1} \cdot Attractive_{i} + \beta^{m}_{2} \cdot Female_{i} + \beta^{m}_{3} \cdot Lockdown_{it} + \beta^{m}_{4} \cdot Lockdown_{it} \cdot Attractive_{i} + \beta^{m}_{5} \cdot Lockdown_{it} \cdot Female_{i} + \beta^{m}_{6} \cdot Time_{t} + \kappa^{m}_{i}.   (2.2)

In the above, Attractive_{i} is the binary variable of physical attractiveness: it equals one if creator i is physically attractive and zero otherwise. Female_{i} is a binary variable indicating whether i is a female creator. Lockdown_{it} captures the state of physical lockdown, defined based on creator i's geographic location of production at t:

Lockdown_{it} = \begin{cases} 1, & \text{if creator } i \text{ is experiencing physical lockdown at time } t; \\ 0, & \text{otherwise.} \end{cases}

In addition, we include the interactions of physical attractiveness and gender with the physical lockdown indicator. As such, our utility specification not only allows us to see the heterogeneous preferences among different groups (defined by physical attractiveness and gender) before and after the COVID-19 physical lockdowns, but also uncovers the possibly different before- and after-lockdown preferences among different groups of creators.

15. Drawing from the psychology literature, a negative \alpha_{it3} also captures creators' need for varied and novel sensations or experiences ("sensation seeking"; Zuckerman 2014), which applies to risk-seeking behaviors in entertainment-related contexts such as gaming (Zhao, Yang, et al. 2022).
16. For each individual i, we have (J - 1) + 3 sets of \alpha_{it}, which include J - 1 content dummies (\alpha_{ijt0}), the income parameter (\alpha_{it1}), the risk parameter (\alpha_{it2}), and the state dependence parameter (\alpha_{it3}).
To control for time effects, we include a normalized time trend Time_{t}:

Time_{t} = \frac{t - 1}{T - 1},

where T is the total number of days in the data. Finally, we also include the unobservable part \kappa^{m}_{i} and assume that it follows a normal distribution:

\kappa^{m}_{i} \sim \text{i.i.d. } N(0, \xi^{2}_{m}).

Since creators cannot observe their reward and risk ex ante, they make choices based on their expectations of both reward and risk. The expected utility of creator i producing content j, conditional on information available at time t, is:

E_{t}V_{ijt} = \alpha_{ijt0} + \alpha_{it1} \cdot (E_{t}R_{ijt}) + \alpha_{it2} \cdot (V_{t}R_{ijt}) + \alpha_{it3} \cdot PastCreate_{ijt} + \epsilon_{ijt},   (2.3)

where E_{t}R_{ijt} and V_{t}R_{ijt} stand for the expectation and variance of income, respectively. We will discuss in a moment how individual creators form and update their beliefs on their reward evolution and obtain its mean (E_{t}R_{ijt}) and variance (V_{t}R_{ijt}). After obtaining the expected income and income variance, the conditional probability of creator i choosing to produce content j at time t is:

P(y_{ijt} = 1) = \frac{\exp(E_{t}V_{ijt})}{\sum_{l=1}^{J} \exp(E_{t}V_{ilt})}.   (2.4)

2.4.2 Learning

When deciding what content to produce at time t, creator i is uncertain ex ante about the monetary reward (R_{ijt}) that she will receive. Yet creators understand that their income relates to their own intrinsic quality and their past experiences, and they learn about the association between those aspects and their future income. Creators update their beliefs upon receiving two types of signals: (i) their own reward income from creation in the past and (ii) reward income information they gather by watching other creators' sessions; both are content specific. The first type of signal captures how a creator learns from her own previous experience (learning-by-doing), while the second type of signal represents how a creator learns by observing others' sessions (learning-from-observation).
As such, our model delineates how individuals learn from their previous learning experiences. Specifically, before making content choices at t, creator i is assumed to hold the following belief on how her monetary reward for creating content j evolves over time¹⁷:

R_{ijt} = \theta_{i0} + \theta_{i1} \cdot R_{ij,t-1} + \theta_{i2} \cdot W_{ij,t-1} + \eta_{ijt},   (2.5)

where R_{ijt} is the monetary reward creator i receives by producing a session with content j,¹⁸ R_{ij,t-1} denotes the reward creator i received for producing content j at time t - 1, and W_{ij,t-1} denotes the log of the monetary reward of the sessions individual i consumed in the previous period. All reward variables are in log scale. In the reward learning equation (Equation 2.5), the parameter \theta_{i0} stands for the intrinsic aspect of individual i, or quality, that generates (greater or less) monetary reward. Parameters \theta_{i1} and \theta_{i2} capture the importance of each information source in helping individual i forecast her monetary reward, namely her past creating and watching experience regarding category j. The idiosyncratic error term \eta_{ijt} is assumed to follow a normal distribution: \eta_{ijt} \sim \text{i.i.d. } N(0, \sigma^{2}_{i}). For simplicity of notation, let \theta_{i} denote the vector of learning parameters that individual i is uncertain about:

\theta_{i} = [\theta_{i0} \; \theta_{i1} \; \theta_{i2}]^{T}.

Denote the vector of income signals M_{ijt} and the matrix of income signals stacked over contents M_{it} as follows:

M_{ijt} = [1 \; R_{ij,t-1} \; W_{ij,t-1}]^{T},
M_{it} = [M_{i1,t-1} \; M_{i2,t-1} \; \ldots \; M_{i,J-1,t-1}]^{T}.

At the beginning of time t, we assume that streamer i holds the following prior belief on \theta_{i}, based upon her current information set I_{it}:

\theta_{i} | I_{it} \sim MVN(\mu_{it}, \Sigma_{it}).

After completing her live streaming sessions at t, creator i observes the realized monetary reward immediately.

17. Note that in our model, all reward is in log scale.
18. Since each streamer may create and watch multiple sessions on each day (i.e., for each t), we assume that she aggregates the signals by content category for each day.
Note that in our case, the learning signals (own and others' reward income) are observed. In contrast, most consumer learning studies deal with unobserved information signals (e.g., Erdem and Keane 1996; Mehta, Rajiv, and Srinivasan 2004; Erdem, Keane, and Sun 2008; Zhao, Zhao, and Helsen 2011). Observing information signals not only helps identify our model under fewer identification assumptions, but also avoids the computational costs of integrating out the unobserved signals through simulation (Chintagunta, Jiang, and Jin 2009; Ching, Erdem, and Keane 2013; Zhao, Yang, et al. 2022). With new information from creation and consumption at t, the creator then updates her belief on the parameters \theta_{i} in a Bayesian fashion (DeGroot 2005) at the beginning of the next period (t + 1). Under our distributional assumptions, the updated evaluation of \theta_{i} also follows a multivariate normal distribution:

\theta_{i} | I_{i,t+1} \sim MVN(\mu_{i,t+1}, \Sigma_{i,t+1}),

where

\mu_{i,t+1} = \Sigma_{i,t+1} \left[ \Sigma_{it}^{-1} \mu_{it} + \frac{(M_{it} \circ \tilde{y}_{i,t-1})^{T} R_{it}}{\sigma^{2}_{i}} \right],
\Sigma_{i,t+1} = \left[ \Sigma_{it}^{-1} + \frac{(M_{it} \circ \tilde{y}_{i,t-1})^{T} (M_{it} \circ \tilde{y}_{i,t-1})}{\sigma^{2}_{i}} \right]^{-1}.

In the above, R_{it} is the column vector consisting of income from creation at t, and \tilde{y}_{i,t-1} is the matrix indicating whether contents were created or consumed in the last period t - 1. Intuitively, creators only update their beliefs upon receiving new income signals:

\tilde{y}_{i,t-1} = [\tilde{y}_{i1,t-1} \; \tilde{y}_{i2,t-1} \; \ldots \; \tilde{y}_{i,J-1,t-1}]^{T},

where \tilde{y}_{ij,t-1} is a vector indicating whether content j was created or consumed in the previous period, with binary variables y_{ij,t-1} and y^{W}_{ij,t-1} standing for creation and consumption, respectively, and the symbol "\circ" denoting the element-by-element product:

\tilde{y}_{ij,t-1} = [I(y_{ij,t-1} + y^{W}_{ij,t-1} > 0) \; y_{ij,t-1} \; y^{W}_{ij,t-1}]^{T}.

Finally, creator i's anticipation of reward before the content choice is:

R_{ijt} | I_{it} \sim N(M_{ijt}^{T} \mu_{it}, \; M_{ijt}^{T} \Sigma_{it} M_{ijt} + \sigma^{2}_{i}).
And the expectation for reward-related terms in Equation 2.3, conditional on the information at the beginning of time t, is:

    E_t R_ijt = M_ijt^T μ_it,
    V_t R_ijt = M_ijt^T Σ_it M_ijt + σ_i².    (2.6)

Altogether, the expected utility of creating content j at time t is:

    E_t V_ijt = α_ijt0 + α_it1 · (M_ijt^T μ_it) + α_it2 · [M_ijt^T Σ_it M_ijt + σ_i²] + α_it3 · PastCreate_ijt + ϵ_ijt.    (2.7)

Table 2.1 explains the variables in our model.

Table 2.1: Explanation of Model Variables

Utility:
    E_t Inc_ijt      Creator i’s expected log-income of producing content j at time t.
    V_t Inc_ijt      Creator i’s variance of log-income of producing content j at time t.
    PastCreate_ijt   A dummy indicating whether creator i produced content j in the week before time t.
    Attractive_i     A dummy indicating whether creator i is physically attractive.
    Female_i         A dummy indicating whether creator i is female.
    Lockdown_it      A dummy indicating whether creator i experiences physical lockdown at time t.
    Time_t           The time trend, normalized to between zero and one.

Learning:
    R_ijt            The log of income that creator i receives from creating content j at time t.
    W_ijt            Average log-income of sessions watched by creator i for content j at time t.

2.5 Identification and Estimation

In terms of estimation, our proposed framework of content creation essentially boils down to a choice model with a learning structure. Unlike most studies in consumer learning, where individuals do not directly observe the learning signals (e.g., product quality), creators and the researcher in our model observe all reward signals. Accordingly, the creator’s learning (Equation 2.5) simplifies to a standard regression model for each individual i (Zhao, Yang, et al. 2022).
Observing learning signals brings the advantages of not only identifying the model under fewer identification assumptions, but also avoiding the computational costs of integrating out the unobserved signals through simulation (Chintagunta, Jiang, and Jin 2009; Ching, Erdem, and Keane 2013; Zhao, Yang, et al. 2022).

Our main challenge in empirical identification comes from the lack of complete content creation and consumption histories. In our data, we do not observe the very start of content creation and consumption (and hence learning) for all creators. Thus creators with more experience (but unobserved to the researcher) possibly have more prior information than others at the beginning of our data period. This is known as the initial condition (or data truncation) problem in the learning literature (Mehta, Rajiv, and Srinivasan 2004; Zhao, Yang, et al. 2022). Following the literature, we address the challenge as follows. We split the data into two parts: an initialization sample (including the first 14 streaming occasions) and an estimation sample (including the rest of the data). We estimate the learning parameters in Equation 2.5 using the initialization sample and use them as the initial priors for our estimation. With the initial conditions and observed reward signals, the reward expectation and variance are then uniquely determined by Equation 2.6. Altogether, it suffices to obtain σ_i² and the initial priors (θ_i0^Init, θ_i1^Init, θ_i2^Init) to estimate the learning model and derive the mean and variance of reward.

Once we derive the mean and variance of reward from the learning process, we are estimating a mixed logit model of content choices.
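Because the learning signals are observed, the per-creator learning regression (and the residual variance σ_i² it yields) can be sketched directly with synthetic data. The variable names and parameter values below are illustrative, not from our data:

```python
import numpy as np

def learning_ols(R_lag, W_lag, R):
    """Per-creator OLS of realized log reward on an intercept, own lagged
    reward, and lagged watched reward (the form of Equation 2.5). Returns the
    coefficient estimates (candidate initial prior means) and the residual
    variance sigma_i^2, which is held fixed during the learning process."""
    X = np.column_stack([np.ones_like(R), R_lag, W_lag])
    theta_hat, *_ = np.linalg.lstsq(X, R, rcond=None)
    resid = R - X @ theta_hat
    return theta_hat, resid.var(ddof=X.shape[1])

# Synthetic check: data generated from known parameters should be recovered
rng = np.random.default_rng(1)
R_lag = rng.normal(2.0, 1.0, 200)
W_lag = rng.normal(2.0, 1.0, 200)
R = 0.5 + 0.6 * R_lag + 0.2 * W_lag + rng.normal(0.0, 0.3, 200)
theta_hat, sigma2_i = learning_ols(R_lag, W_lag, R)
```

With 200 observations and noise σ = 0.3, the recovered coefficients land close to the generating values (0.5, 0.6, 0.2) and σ_i² close to 0.09.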
Rich data variation both across individuals and over time allows us to empirically identify the choice parameters, including the observed individual heterogeneity parameters {β_0^m, β_1^m, ..., β_6^m}_{m=1,2,...,9} as well as the standard deviations of unobserved individual heterogeneity {ξ^m}_{m=1,2,...,9}, which totals 72 parameters.

We carry out the estimation in three steps. First, we estimate σ_i² by obtaining the variance of the residuals in Equation 2.5 for each individual i. For each creator i, we regress her realized monetary reward on i’s own reward in the last period (R_ij,t−1), reward information observed through content consumption in the previous period (W_ij,t−1), and an intercept, after which we calculate the variance of the residuals. We keep σ_i² constant during the learning process. Second, given σ_i², the observed learning signals, and the initial priors (θ_i0^Init, θ_i1^Init, θ_i2^Init), we derive the mean and variance of reward following Equation 2.6. Finally, with all pieces together, we estimate creators’ content choices following the standard simulated MLE procedure (Train 2009). We discuss model estimates in detail in Section 2.6.

2.6 Estimation Results

This section discusses our choice model estimates. Estimation results of the learning model are included in Appendix D. Table 2.2 presents the estimation results from the full model. Explanations of the variables are listed in Table 2.1. Using the model estimates and the observed choice sequences, we follow Train (2009) and make inference on the parameters at the individual level (i.e., α_it^m for each individual i at time t). Intuitively, one can think of the individual parameters as posteriors from an empirical Bayes model, where the priors are the estimated population-level parameters presented in Table 2.2. We summarize the means and standard deviations of the individual (posterior) parameters and present the results in Table 2.3.
Overall, the model estimates highlight the importance of accounting for risk (i.e., the variance of reward) in addition to reward in creators’ content choice. Table 2.3 indicates that overall, creators prefer content categories with higher monetary reward but less risk. At the mean level, the disutility caused by unstable reward amounts to 32% of the average reward (in log scale);19 and to offset the disutility brought by one additional unit of risk (i.e., one unit increase in the variance of log-reward), one needs to receive an additional reward that equals 22% of the average log-reward.20 Second, reward and risk preferences are highly heterogeneous across creators, as indicated by the large standard deviations (relative to the means) of the individual parameters. This calls for further analysis into the heterogeneity, which we discuss below. Lastly, creators in general gain utility from producing content that they have produced in the recent past, as suggested by the large positive estimate of PastCreate_ijt. Such preference comes from a different perspective (e.g., psychological) than the reward-learning mechanism we model explicitly.

To gain more insights into content creators’ choices, we visualize individual creators’ preferences over monetary reward and risk based on the individual posteriors. The rich heterogeneity in our model enables us to further analyze the preferences of sub-populations defined by physical attractiveness, gender, as well as before-and-after physical lockdowns. Figures 2.3 and 2.4 present the distribution of individual preference for reward and risk, respectively. Figure 2.5 plots the

19. The average of the individual log-reward variance over time is 1.47 and the average of individual log-reward over time is 2.54. Thus at the mean level, the disutility caused by risk amounts to (1.47 × 0.14) / (2.54 × 0.25) ≈ 32% of the average log-reward.
20. Again, the average of individual log-reward over time is 2.54.
One additional unit of risk brings 0.14 disutilities, which is 0.14 / (0.25 × 2.54) ≈ 22% of the average individual reward in log scale.

Table 2.2: Choice Model Estimates

Variable                     Estimate       S.E.
E_t Inc
  Constant                    0.2158***    (0.0112)
  Attractive                 −0.0734***    (0.0132)
  Female                      0.0123       (0.0148)
  Lockdown                   −0.0150*      (0.0089)
  Lockdown × Attractive      −0.0002       (0.0121)
  Lockdown × Female           0.0260**     (0.0115)
  Time                        0.1027***    (0.0117)
  Unobs Hetero                0.2059***    (0.0037)
V_t Inc
  Constant                    0.0396***    (0.0048)
  Attractive                 −0.2173***    (0.0100)
  Female                     −0.0647***    (0.0038)
  Lockdown                    0.0684***    (0.0089)
  Lockdown × Attractive      −0.0145       (0.0229)
  Lockdown × Female          −0.0553***    (0.0176)
  Time                       −0.1415***    (0.0169)
  Unobs Hetero                0.0957***    (0.0042)
PastCreate
  Constant                    3.4609***    (0.0394)
  Attractive                 −0.4040***    (0.0361)
  Female                      0.1902***    (0.0366)
  Lockdown                    0.1227***    (0.0401)
  Lockdown × Attractive       0.1681***    (0.0479)
  Lockdown × Female          −0.2373***    (0.0478)
  Time                       −1.0224***    (0.0582)
  Unobs Hetero                0.8575***    (0.0153)
Content 1: Movie/TV and Gaming
  Constant                   −5.0300***    (0.0817)
  Attractive                  0.0352       (0.1016)
  Female                     −0.2535***    (0.0873)
  Lockdown                    0.3619***    (0.0662)
  Lockdown × Attractive      −0.0796       (0.1135)
  Lockdown × Female          −0.0999       (0.0852)
  Time                       −0.5047***    (0.1101)
  Unobs Hetero                1.9767***    (0.0392)
Content 2: Social
  Constant                   −6.9155***    (0.1336)
  Attractive                 −2.2364***    (0.1274)
  Female                     −0.6528***    (0.0973)
  Lockdown                    0.2025***    (0.0701)
  Lockdown × Attractive       0.1069       (0.1583)
  Lockdown × Female          −0.0159       (0.0810)
  Time                       −0.7594***    (0.1115)
  Unobs Hetero                3.8543***    (0.0742)

Table 2.2: Choice Model Estimates (Continued)

Variable                     Estimate       S.E.
Content 3: Music Talent
  Constant                   −5.1270***    (0.0717)
  Attractive                  2.7506***    (0.0738)
  Female                      0.9263***    (0.0659)
  Lockdown                   −0.0185       (0.0691)
  Lockdown × Attractive      −0.0592       (0.0856)
  Lockdown × Female           0.1483*      (0.0783)
  Time                       −0.3238***    (0.0907)
  Unobs Hetero                1.3355***    (0.0306)
Content 4: Co-hosted Podcast
  Constant                   −3.7534***    (0.0676)
  Attractive                 −0.8135***    (0.0866)
  Female                     −0.2141***    (0.0684)
  Lockdown                    0.1482***    (0.0517)
  Lockdown × Attractive      −0.1421       (0.0914)
  Lockdown × Female           0.0915       (0.0623)
  Time                       −0.5090***    (0.0826)
  Unobs Hetero                2.0704***    (0.0370)
Content 5: Solo Podcast
  Constant                   −2.7709***    (0.0522)
  Attractive                  0.4382***    (0.0654)
  Female                     −0.1084**     (0.0516)
  Lockdown                    0.1480***    (0.0496)
  Lockdown × Attractive      −0.0795       (0.0794)
  Lockdown × Female           0.1000*      (0.0597)
  Time                       −0.5992***    (0.0776)
  Unobs Hetero                1.0366***    (0.0195)
Content 6: All Others
  Constant                   −3.9318***    (0.0610)
  Attractive                  1.3015***    (0.0702)
  Female                      0.4149***    (0.0605)
  Lockdown                    0.3570***    (0.0616)
  Lockdown × Attractive      −0.0702       (0.0842)
  Lockdown × Female           0.0444       (0.0723)
  Time                       −0.9937***    (0.0901)
  Unobs Hetero                0.7730***    (0.0263)
Model Fit
  Log-Likelihood            −317333.26

Note: Results significant at the 90%, 95%, and 99% levels are indicated by (*), (**), and (***), respectively.

Table 2.3: Summary of Individual Parameters

Variable     Individual Parameter    Mean       Std
E_t Inc      α_it1                   0.2503    0.1560
V_t Inc      α_it2                  −0.1437    0.1252
PastCreate   α_it3                   2.8667    0.7008
Content 1    α_i1t0                 −5.6061    1.2139
Content 2    α_i2t0                 −8.4759    2.7709
Content 3    α_i3t0                 −3.8468    1.8596
Content 4    α_i4t0                 −4.3981    1.7194
Content 5    α_i5t0                 −2.9869    0.8500
Content 6    α_i6t0                 −3.7643    0.8796

Note: Individual parameters are calculated based on 10,000 draws from the population-level model estimates.

individual preference over recent creation (state dependence). In what follows, we discuss our findings on reward preference, risk attitude, and preference over recent creation; we also discuss the preference changes after the physical lockdowns.
Reward preference. Our results suggest that monetary reward is indeed an important motivation for content creation. Creators in general gain positive utility from receiving monetary reward, as suggested by the positive reward coefficient for almost 95% of the creators in Figure 2.3 (a). This is in accordance with previous studies on UGC where providing reward incentivizes creators to produce more (Sun and Zhu 2013; Burtch et al. 2018; Wang, Li, and Hui 2021). Moreover, reward preference varies between attractive and unattractive creators. Figure 2.3 (b) indicates that attractive creators place less importance on their monetary reward while the less attractive creators value reward more. Interestingly, we do not find much gender difference in reward preference; only that the preference over reward is more centered for male creators, suggesting that reward preference is comparatively homogeneous among male creators, as shown in Figure 2.3 (c).

Physical lockdown impacts creators’ reward preference significantly. Creators generally show a stronger preference for monetary reward, with the distribution of reward preference estimates shifting to the right after the lockdown (Figure 2.3 d). Intuitively, creators’ income sources outside the platform might be compromised during physical lockdown, and hence creators gain more utility from the same reward amount on the platform. For creators who are physically attractive, physical lockdown does not impact them differently compared with the less attractive ones. The lockdown effects exhibit gender differences, such that female creators are more affected (i.e., get comparatively more utility from receiving monetary reward) than males.

Risk preference. There are two main findings regarding the overall attitude toward risk, displayed in Figure 2.4 (a). First, our estimation results indicate that the majority of creators are risk-averse, as approximately 93% of them derive disutility from their reward variance.
While previous literature shows that consumers might exhibit risk-seeking tendencies in entertainment activities such as gaming (Zhao, Yang, et al. 2022), our results suggest that the importance of monetary incentives dominates creators’ entertainment motivation on the live streaming platform, so that creators are generally averse to reward income volatility. Second, the individual risk coefficient has a bimodal distribution, with approximately 30% of creators being highly risk averse and the rest showing moderate-to-low levels of risk aversion (or even risk-seeking).

Moreover, creators with different levels of physical attractiveness exhibit different levels of risk aversion. Particularly, Figure 2.4 (b) shows that creators who are physically attractive are more risk averse; the small group of creators who are risk-neutral or risk-loving almost all come from the group of unattractive creators. One possible reason behind the difference is that attractive creators (as opposed to the less attractive ones) are more likely to be full-time content creators on the platform, making them more susceptible and sensitive to risk.

[Figure 2.3: Individual Preference on Reward. Panels: (a) Overall; (b) Attractive vs. Unattractive; (c) Female vs. Male; (d) Before vs. After Lockdown; (e) Lockdown Effect by Attractiveness; (f) Lockdown Effect by Gender. Note: The bands show the 95%-confidence intervals.]

Gender matters, too. Figure 2.4 (c) indicates that female creators tend to be more risk averse than male creators; in contrast, most male creators either gain little disutility from risk or even enjoy it. The results are in accordance with most existing literature on gender differences in risk attitude (Byrnes, Miller, and Schafer 1999; Weber, Blais, and Betz 2002). In addition, the risk attitudes of female creators follow a similar bimodal distribution as the overall sample of creators.
During physical lockdowns, creators show risk attitudes that are more diverse than before: the two groups (highly risk averse and moderately risk averse) of creators split into three segments. Particularly, Figure 2.4 (d) shows that the highly risk averse group of creators becomes even more risk averse (a “shifting” pattern), while creators who used to be moderately risk averse divide into two classes (“division” into mildly risk averse and risk-seeking groups). Moreover, if we look at the different impacts of lockdown on attractive versus unattractive creators in Figure 2.4 (e), the aforementioned shifting pattern is possibly driven by creators who look attractive. Unattractive creators, on the other hand, show a division of risk preference during lockdown that is similar to the moderately risk averse group in Figure 2.4 (d). Finally, lockdowns also have different effects on female versus male creators’ risk attitudes: while female creators become more risk averse after lockdown, male creators now derive more utility (or less disutility) from risk, as suggested by the patterns in Figure 2.4 (f). Previous literature has found mixed results (either more risk-averse or more risk-seeking) on how people’s risk attitudes change during financial crises (Cohn et al. 2015), natural disasters (Hanaoka, Shigeoka, and Watanabe 2018), and the pandemic (Tsutsui and Tsutsui-Kimura 2022). Duclos, Wan, and Jiang (2013) study financial risk-taking behaviors under social exclusion and examine how feeling isolated causes consumers to pursue riskier financial opportunities. Our study adds new empirical evidence on how creators’ risk preferences diversify during the COVID-19 lockdowns and how different sub-groups of creators react differently.

Preference over content produced in the recent past (PastCreate). Figure 2.5 displays the distribution of individual preference on content produced in the recent past, which is captured by the variable PastCreate.
All creators in our analysis gain additional utility from content that has been produced during the preceding week. This tendency is more salient for unattractive (as opposed to attractive) creators, as indicated by the different distributions of attractive versus unattractive creators in Figure 2.5 (b). Female and male creators have similar state dependence estimates in Figure 2.5 (c), suggesting that there is no significant gender difference regarding this valuation. After physical lockdown, creators’ content choices tend to be less dependent on recently produced content.

2.7 Counterfactual Analysis: Smoothing Reward Over Time

Our model estimates highlight the overall negative role of risk in creators’ content choices. In this regard, how could the platform leverage the knowledge regarding creators’ learning and decision-making mechanisms to cope with the reward-related risk? We propose income smoothing policies in the hope of alleviating creators’ risk in reward, promoting content creation, and increasing revenue for the platform.21

21. In practice, some platforms and management companies also adopt other measures to help smooth creators’ reward income over time, including providing a base salary or minimum wage for content creators, especially for rookies with potential.

[Figure 2.4: Individual Preference on Risk. Panels: (a) Overall; (b) Attractive vs. Unattractive; (c) Female vs. Male; (d) Before vs. After Lockdown; (e) Lockdown Effect by Attractiveness; (f) Lockdown Effect by Gender. Note: The bands show the 95%-confidence intervals.]

[Figure 2.5: Individual Preference on Content Produced in Recent Past. Panels: (a) Overall; (b) Attractive vs. Unattractive; (c) Female vs. Male; (d) Before vs. After Lockdown; (e) Lockdown Effect by Attractiveness; (f) Lockdown Effect by Gender. Note: The bands show the 95%-confidence intervals.]
Under the income smoothing policy, the platform collects a fixed portion of reward (“commission”, τ) from each live streaming session22 and subsidizes the creator (“subsidy”, s_it) if her creation reward on a particular day is too low. In particular, we assume the following rule for subsidy: if a creator receives an after-tax tipping reward that is less than her average total reward in the previous week, she will automatically receive a subsidy from the platform that makes up for a proportion (“subsidy rate”, ρ) of the reward gap:

    s_it = ρ · max(0, R̄_i,t−1 − (1 − τ) · Tipping_it).

In the above, s_it is the subsidy to receive, ρ is the subsidy rate, R̄_i,t−1 is the average (after-policy) reward from creation in the previous week, τ is the tax rate, and Tipping_it is the tipping reward i receives from the audience at time t. As such, the platform has two dimensions to manage the policy, namely the commission rate (τ) and the subsidy rate (ρ).

We simulate under different levels of commission τ ∈ {0.7, 0.75, 0.8, 0.85, 0.9} and subsidy rate ρ ∈ {0.25, 0.50, 0.75, 1} and analyze their effects on (i) content volume, (ii) content variety, and (iii) platform revenue. For each case, we sample 5,000 creators from the data and initiate their learning with the priors estimated from our data. After that, we simulate creators’ learning and content choices for T = 30 days using our model estimates in Table 2.2. We repeat the above procedure 100 times and summarize the total number of sessions produced, the content concentration (lack of variety), and the platform revenue during the 30 days.

In our analysis, we assume that creators are informed about the policy: creators not only understand the policy rule (including the commission rate and the subsidy rule), but also observe

22. Note that even in the benchmark case of zero tax and zero subsidy, the platform still collects 70% of tipping reward as observed from data (as described in Section 2.3).
The tax and subsidy we discuss here are on top of the current revenue-sharing between creators and the platform.

[Figure 2.6: Income Smoothing Policies]

the tipping reward and the platform subsidy separately. Creators will learn about how their tipping reward evolves. However, they will take the policy into consideration when deriving the expectation and variance of their total reward, based on which content choices are then made. Subsidy is private information, so that creator i cannot observe the subsidy other creators receive. Figure 2.6 outlines our policy.

Overall, our results suggest that, under appropriate levels of tax and subsidy rates, smoothing income over time is beneficial. Under the policy that maximizes the revenue gain for the platform (i.e., 80% commission rate and 100% subsidy rate), the platform sees a 10.35% improvement in content volume, a 0.23% decrease in content concentration, as well as an 8.3% increase in platform revenue. In what follows, we discuss the effects on content volume and variety in Section 2.7.1 and the impacts on platform revenue in Section 2.7.2.

2.7.1 Effects on Content Creation

Table 2.4 presents the content volume and variety resulting from different combinations of commission and subsidy rates.
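The subsidy rule of the income smoothing policy reduces to one line of arithmetic; a minimal sketch (the numbers below are hypothetical, not from our simulations):

```python
def subsidy(avg_reward_prev_week, tipping, tau, rho):
    """Income-smoothing subsidy:
    s_it = rho * max(0, Rbar_{i,t-1} - (1 - tau) * Tipping_it),
    where tau is the commission rate and rho the subsidy rate."""
    return rho * max(0.0, avg_reward_prev_week - (1.0 - tau) * tipping)

# Last week's average reward 100, today's tipping 80, tau = 0.8, rho = 1.0:
# after-commission tipping = 16, reward gap = 84, so the subsidy is 84.
print(subsidy(100.0, 80.0, tau=0.8, rho=1.0))  # 84.0
```

When the after-commission tipping already exceeds last week’s average reward, the gap is negative and the subsidy is zero.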
To better visualize the changes, Figure 2.7 plots the content volume and variety under subsidy rates of 50% and 100% relative to the benchmark case.

Table 2.4: Content Creation Under Policy

Content Volume (k)
Subsidy Rate    Comm. 70%        75%              80%              85%              90%
0%              94.57 (0.07)     —                —                —                —
25%             103.72 (0.07)    101.57 (0.07)    98.78 (0.07)     95.82 (0.07)     92.78 (0.07)
50%             106.95 (0.07)    104.33 (0.06)    101.24 (0.07)    97.6 (0.07)      94.07 (0.06)
75%             108.73 (0.07)    106.15 (0.07)    102.98 (0.07)    99.36 (0.07)     95.23 (0.07)
100%            109.98 (0.07)    107.30 (0.07)    104.37 (0.07)    100.62 (0.07)    96.32 (0.07)

Content Variety (HHI)
Subsidy Rate    Comm. 70%        75%              80%              85%              90%
0%              2279.46 (3.34)   —                —                —                —
25%             2283.89 (3.16)   2279.34 (2.76)   2267.86 (2.65)   2263.16 (3.04)   2262.59 (2.98)
50%             2285.14 (3.07)   2281.44 (2.93)   2273.73 (3.03)   2268.95 (3.33)   2265.61 (3.04)
75%             2283.63 (2.91)   2277.18 (2.93)   2276.24 (3.27)   2263.56 (2.74)   2261.21 (2.49)
100%            2283.28 (2.77)   2276.46 (2.49)   2274.11 (3.18)   2273.12 (3.23)   2264.87 (2.88)

Overall, our analyses suggest that under appropriate levels of commission and subsidy rates, smoothing tipping reward over time helps promote both content volume and variety, compared to the status quo.

Content Volume. We find that introducing the income smoothing policy in general encourages content creation. While providing monetary reward is shown to promote UGC volume in some situations (Sun and Zhu 2013; Burtch et al. 2018; Wang, Li, and Hui 2021), our results suggest that reducing reward-related risk by smoothing out reward volatility also encourages creators to produce more.

[Figure 2.7: Effects of Income Smoothing on Content Creation. Panels: (a) Content Volume Change (k), s = 50%; (b) Content Volume Change (k), s = 100%; (c) Change in Content Variety (HHI), s = 50%; (d) Change in Content Variety (HHI), s = 100%. Note: The bands show the 95%-confidence intervals.]
According to Table 2.4, with the policy combination that maximizes platform revenue (i.e., 80% commission rate with 100% subsidy rate for the reward gap), 105,068 sessions are produced on the platform; this represents a 10.35% increase in content volume compared to the baseline case without the policy (94,574 sessions). In addition, Figure 2.7 shows that, as the subsidy rate increases, creators will produce more sessions. On the other hand, content volume decreases with the tax rate. At a high tax rate of 60% with a relatively low subsidy rate of 25%, creators produce less compared to the baseline case.

Content Variety. Of the content created, we quantify its concentration (or equivalently, lack of variety) using the Herfindahl-Hirschman Index (HHI; Rhoades 1993):

    HHI = Σ_{j=1}^{6} s_j²,

where s_j is the share percentage of content category j. A large HHI value indicates that the platform is dominated by a small number of popular content categories and lacks variety in content, similar to the case of a monopoly in an industry. Accordingly, a decrease in HHI indicates that the contents on the platform become less concentrated (i.e., have more variety).

Our results in Table 2.4 indicate that under proper levels of tax rates, our policy will not lead to more concentration; it can improve content variety on the platform by imposing additional commission at a moderate-to-high level. This provides an alternative perspective to the literature, where introducing monetary reward (again, without managing risk) has been shown to cause content concentration (Sun and Zhu 2013). Taking the policy combination that maximizes platform revenue (i.e., 80% commission rate with 100% subsidy rate for the reward gap) as an example, the HHI index is lower (though not statistically significantly lower) than the index without the policy. As the tax rate further increases to 60%, the platform actually has a significantly more balanced content distribution.
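The HHI computation itself is straightforward. A short illustration with shares measured in percent (so the maximum, a single dominant category, is 10,000); the shares below are made-up examples:

```python
def hhi(shares_pct):
    """Herfindahl-Hirschman Index over content-category shares (in percent).
    Higher values mean more concentration, i.e., less content variety."""
    return sum(s ** 2 for s in shares_pct)

# One dominant category: highly concentrated
print(hhi([80, 4, 4, 4, 4, 4]))        # 6480
# Six equal categories (maximal variety with J = 6): 6 * (100/6)^2
print(round(hhi([100 / 6] * 6), 1))    # 1666.7
```

The tabulated HHI values around 2,260-2,285 thus sit between the fully balanced benchmark of about 1,667 and the single-category extreme of 10,000.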
Interestingly, as the tax rate increases, the content produced generally becomes more diverse. Intuitively, such a relationship suggests that a higher commission rate discourages not only creations that are not profitable, but also sessions that concentrate on popular content categories. Figure 2.7 further illustrates the results, where negative values imply improvement in content diversity. As a robustness check, we also analyze content entropy and present the results in Appendix E. Both the HHI and content entropy measures yield the same conclusions.

2.7.2 Effects on Platform Revenue

We aggregate the platform revenue from the policy (i.e., tax minus subsidy) within 30 days after the policy. Table 2.5 presents the platform revenue under the policy, where a positive number in the table indicates a relative revenue gain compared with the benchmark case (70% commission without subsidy) and a negative number stands for a loss of revenue. For instance, if the platform collects 10% of tipping reward from each session and subsidizes 25% of creators’ reward gap, then the total revenue loss for the platform is 3.65 k; if the platform increases the tax rate to 20%, there will be a revenue gain of 1217.80 k. The benchmark case has zero revenue gain/loss.

Table 2.5: Platform Revenue Under Policy (k)

Subsidy Rate    Comm. 70%          75%              80%              85%              90%
0%              0.00 (0.00)        —                —                —                —
25%             −2741.18 (215.85)  938.59 (9.11)    1592.68 (17.75)  1200.44 (22.12)  603.07 (7.40)
50%             −5866.35 (127.39)  457.67 (22.70)   1747.95 (18.64)  1326.29 (12.37)  649.59 (9.27)
75%             −8581.91 (120.13)  453.92 (44.36)   2168.35 (32.65)  1512.88 (12.86)  742.36 (21.59)
100%            −9417.99 (177.80)  1851.76 (29.10)  3296.47 (55.35)  2016.83 (23.55)  839.54 (10.26)

Our results highlight two findings. First, our results suggest that the platform can achieve revenue gains under appropriate levels of commission and subsidy rates.
Across all rates of subsidy, the platform will earn positive revenue if it imposes a tax of no less than 20% of the tipping reward from every live streaming session.

Second, there is an optimal policy combination that maximizes platform revenue, under which the platform collects 80% of the tipping income and subsidizes 100% of the creators’ reward gap. On one hand, for a fixed level of subsidy rate, we find an inverse U-shaped revenue gain from the policy. To see this, we plot the revenue changes in Figure 2.8. As the commission rate rises, the policy-related revenue first increases with the commission rate before it decreases. Intuitively, since the platform sends out a subsidy as long as the creator finds a reward gap (compared to her own reward history), the platform needs to balance encouraging content creation against discouraging creations that may lose too much money. If the tax rate is too low, then all creators tend to create more, even if they receive low levels of reward; if the commission rate is too high, it might discourage content creation, including the potentially profitable sessions.

[Figure 2.8: Effects of Income Smoothing on Platform Revenue. Panels: (a) Platform Revenue Change (k), s = 50%; (b) Platform Revenue Change (k). Note: The bands show the 95%-confidence intervals.]

On the other hand, if we compare the maximum revenue received under different levels of subsidy rates, fully smoothing out creators’ reward (i.e., a 100% subsidy rate) gives the best revenue outcome for the platform.

2.8 Conclusion

As the creator economy has grown rapidly during the past several years, monetary reward has become an increasingly important motivation for content production, especially for full-time independent creators. However, content monetization comes with the challenge of unstable revenue streams over time, which is among the leading causes of creators’ mental struggles and possibly distorts creators’ content production.
As facilitators of content creation, platforms rely heavily on their content supply to remain profitable and maintain a healthy content ecosystem. It is thus crucial for platforms to not only understand but also be able to manage the impacts of reward fluctuations on content creation.

This paper studies the effects of monetary reward and risk on creators’ content choices. We model content creators as active learners who face uncertainty in the reward they will receive. To resolve the uncertainty, creators learn about how their reward evolves over time by gathering information both from their own past creations and by observing others’ performance. Based on the expected reward and reward variance, creators then decide what content to produce.

Our results highlight the importance of risk in content creation. In the empirical context of a live streaming platform, we quantify the impacts of risk and find that the disutility brought by risk at the mean level amounts to approximately 1/3 of the average reward. Additionally, we uncover significant heterogeneity in creators’ reward and risk preferences and interesting dynamics during the COVID-19 pandemic.

Managerially, our work not only uncovers how reward and risk affect creators’ decision-making, but also provides insights on how to help creators cope with reward uncertainty and manage content creation. In particular, we propose smoothing creators’ reward over time by collecting a fixed proportion of reward from each creation and subsidizing the creator when she has low reward. We find that when the platform collects an additional 10% of tipping reward income and fully subsidizes the individual’s reward gap over time, the platform can achieve an optimal level of revenue. Moreover, such profitability is compatible with a healthy and booming content ecosystem.
While previous studies raise the concern that directly providing monetary reward might harm content variety, our policy aimed at smoothing income over time provides an alternative solution. Under some combinations of commission and subsidy policy that yield a revenue gain for the platform, not only does content volume increase, but content variety also improves. Although we analyze the empirical context of a live streaming platform where tipping (“pay-what-you-want”) from the audience is the major income source, our modeling framework and policy suggestions readily extend to accommodate other types of reward, such as ad revenue sharing, sponsorship, and subscriptions.

Chapter 3
Beauty, Effort, and Earnings: Empirical Evidence from Live Streaming

3.1 Introduction

Having an attractive physical appearance is advantageous. This widely recognized phenomenon, often referred to as the “beauty premium” or “beauty bias,” has been extensively studied in various contexts, including job performance and career development (to name a few: Hamermesh and Biddle 1994; Biddle and Hamermesh 1998; Mobius and Rosenblat 2006; Peng et al. 2020). The question of whether and why physically attractive individuals are more likely to earn higher incomes, in particular, has captured the attention of researchers across many fields, including marketing and economics. To quantify the beauty premium, the common practice in the literature is to model beauty as an observed individual heterogeneity that is independent of one’s effort. Effort, however, is an unarguably crucial determinant of performance and earnings (Brown and Peterson 1994; Barlevy and Neal 2019). When it comes to measuring the beauty premium on earnings, effort not only directly contributes to performance, but could also moderate the impacts of an attractive appearance. Without either role, the measured beauty premium could be biased up or down, depending on whether attractive individuals are more or less effortful.
The second aspect, which focuses on the interaction of effort and appearance, takes into account the potential difference in marginal productivity for attractive individuals. It captures a new type of beauty premium that is enhanced by effort and reflects their advantage in efficiency. This is in contrast to the existing studies that treat the beauty premium as a stand-alone individual heterogeneity.[1] If a significant effort-enhanced beauty premium exists relative to the baseline premium studied in the literature, highlighting the role of effort delivers two important new messages: (i) beauty is a valuable endowment that requires effort to capitalize on, and (ii) people without the innate advantage could narrow the gap by devoting more effort. This paper aims to quantify the beauty premium while controlling for effort and taking into account the interplay of effort and beauty. Substantively, we extend the investigation from more traditional occupations (such as salespersons, lawyers, and politicians) to the rising live streaming industry, where tens of millions of people now engage in careers that may benefit from their physical appearance to promote their content and achieve higher earnings. Indeed, the rise of digital content-sharing platforms, such as YouTube, Twitch, Instagram, and TikTok, has generally amplified the reach and visibility of physical attractiveness by placing a greater emphasis on visual elements such as images and videos. To carry out our research, we collaborate with a leading Chinese live streaming platform with 503,911 live streaming records from 14,317 content creators.

[1] Throughout our paper, we refer to the effort-free beauty premium studied in the literature as the "baseline beauty premium." In contrast, we define the "effort-enhanced beauty premium" to be the advantage of having higher marginal productivity (i.e., efficiency), captured by a higher coefficient on the beauty-effort interaction term.
In particular, our study revolves around three main research questions: (i) How does the beauty premium affect the earnings of live streaming content creators? (ii) To what extent is the beauty premium contributed by effort? (iii) What are the implications for content creators and the platforms?

Addressing these research questions poses two main challenges. To start with, we need accurate measures of both effort and beauty. Notably, it is challenging to separate people's effort using observational data, since effort is generally unobserved in the traditional labor market. In our study, we take advantage of our live streaming empirical setting to tackle the measurement problem. In the context of live streaming, the amount of time a creator spends on presenting the content is accurately recorded, which we utilize to proxy her effort. Moreover, the validity of this effort measure is strengthened by the "live" feature of live streaming: unlike short video producers on YouTube or Instagram, live streaming creators' effort and income do not extend beyond the live streaming sessions. Once a session is finished, creators do not invest any additional input, nor do they receive income from their previous creations.[2] To measure beauty, the common practice in the literature is to use either human ratings (Hamermesh and Biddle 1994; Biddle and Hamermesh 1998; Andreoni and Petrie 2008) or machine learning algorithm-based ratings (Malik, Singh, and Srinivasan 2019; Peng et al. 2020) on the basis of (real or synthetic) profile photos. In our study, all live streaming content creators are evaluated by a group of employees who consider both creators' profile photos and videos of their initial live streaming sessions.

[2] In fact, our focal platform does not offer recordings of the live streaming sessions. Once a session is finished, it is no longer available to audiences.

The second challenge comes from the endogeneity concern related to effort. In particular, effort could be a strategic decision based on some unobserved productivity, such as social skills or creativity. As such, the unobserved individual productivity factors could affect both a creator's earnings and her effort input, thereby creating an endogeneity problem. To address this problem, we construct an instrument for effort by exploring the variations in creators' effort and earnings around their promotion to a higher level. Particularly, we observe that as individuals approach the level-up threshold, the amount of time they spend on live streaming increases significantly and then drops drastically right after their level-up; the income patterns closely mirror the fluctuations in effort, too. Note that such patterns in both effort and income persist even after being normalized by the individual averages to control for individual differences. Thus we propose using the (absolute) temporal distance to the nearest level-up as an instrument for effort.

The validity of our instrument is rooted in the level-up designs. All live streaming content creators on the platform have assigned levels based on cumulative streaming duration and income. As they devote more time and earn more, creators level up, experience new features, and gain rewards. During this process, creators not only anticipate the rewards and privileges associated with level-ups, but can also monitor their progress toward the next level. Thus the intuition for increased effort before level-up lies in creators' motivation and expectation to achieve the milestones for rewards and new features. On the other hand, creators lose motivation and decrease their time input after leveling up and experiencing the new features. To validate our findings from the main instrument, we also perform a series of robustness tests using alternative IVs and on a subset of individuals experiencing only one level-up.
Our analysis suggests that there is a premium for attractive content creators, such that they earn more income from live streaming compared to their peers, even after accounting for factors such as effort, gender, levels, the content created, and individual fixed effects. To match the earnings of an attractive creator during a one-hour live streaming session, an average-looking creator needs to spend approximately 2 hours, whereas an unattractive creator would have to devote 5.8 hours to close the income gap.

We uncover more intriguing findings as we decompose this beauty premium into an effort-independent component and a premium enhanced by effort. As discussed, the former type of premium, extensively studied in the literature, serves as a baseline; the latter captures the increased efficiency (i.e., higher marginal productivity of labor) enjoyed by attractive creators. First, we find that ignoring effort leads to an inflated baseline beauty premium estimate and an underestimated total premium for attractive content creators. Including effort alone, without the effort-beauty interactions, explains away approximately 17.4% of the baseline beauty premium. Factoring in the efficiency difference further cuts the baseline premium to 10.2% of its original scale. Second, the major driver of the earning disparity between attractive creators and their peers is the effort-enhanced beauty premium, which accounts for approximately 93.5% of the total premium. This finding indicates that attractive creators enjoy a substantial premium mostly because they have higher marginal productivity of labor, that is, higher efficiency.

Overall, our results underscore the interplay of effort and beauty in determining the beauty premium for live streaming content creators. Incorporating effort into the analysis not only reduces biases in measuring the beauty premium, but also conveys valuable insights to both the creators and the platform.
Contrary to perceptions of beauty as an effortless gift, especially in live streaming where one's physical appearance plays a prominent role, our findings suggest that, albeit an important asset, beauty needs to be capitalized on through effort. For attractive creators, this motivates them to leverage their advantage of high efficiency by working harder. Indeed, we observe in our data that attractive creators spend more time on live streaming compared with their peers. For creators lacking in attractiveness, on the other hand, the message is encouraging: the income disparity is not unbridgeable; indeed, they can close the income gap by devoting more effort. In light of such findings, platforms should recognize that being attractive alone does not guarantee success when recruiting creators and identifying high potentials. It is crucial to avoid discrimination against less attractive creators without considering their potential effort and devotion.

The rest of this paper is organized as follows. Section 3.2 summarizes the related literature and discusses our contributions. Section 3.3 presents the data and empirical evidence on earning and effort gaps between attractive individuals and others. In Section 3.4, we discuss our empirical strategy, in particular how we utilize creators' effort patterns around level-ups and construct an instrument for effort. Section 3.5 presents the estimation results. Section 3.6 quantifies the beauty premium and discusses the implications. Section 3.7 concludes.

3.2 Literature

Existing literature on the beauty premium suggests that being physically attractive is in general advantageous (Eagly et al.
1991; Agthe, Spörrle, and Maner 2011) and manifests its influence in many aspects, including the marriage market (Elder Jr 1969; Feingold 1988; McNulty, Neff, and Karney 2008), the job market (Hamermesh and Biddle 1994; Biddle and Hamermesh 1998; Hamermesh 2006; Mobius and Rosenblat 2006; Cipriani and Zago 2011; Stinebrickner, Stinebrickner, and Sullivan 2019; Malik, Singh, and Srinivasan 2019), as well as marketing and business practices (Argo, Dahl, and Morales 2008; Peng et al. 2020; Halford and Hsu 2020). Among them, a large body of literature studies the magnitude of the beauty premium in determining earnings and the mechanism behind it.

For example, Hamermesh and Biddle (1994) examine the impact of people's physical appearance on their earnings and find that both a beauty premium and a plainness penalty exist, such that better-looking people earn more; they also find that such premium and penalty exist across occupations, suggesting the existence of pure appearance-based discrimination. In Biddle and Hamermesh (1998), the authors document a large beauty premium for lawyers and find evidence of strategic sorting into the private sector, where the premium for attractive individuals is higher. Using an experimental method where the job task is unrelated to physical attractiveness, Mobius and Rosenblat (2006) further identify three transmission channels of the beauty premium, including greater confidence, being regarded as more able, and better oral skills during negotiation. Using data on students' physical attractiveness ratings and their examination scores, Cipriani and Zago (2011) evaluate the effects of beauty on productivity in an academic setting and find that the beauty premium reflects differences in productivity. The marketing field, too, has seen growing interest in this topic recently.
For instance, Malik, Singh, and Srinivasan (2019) investigate the long-term dynamics of beauty bias in the job market and find an attractiveness premium due to preference-based bias (as opposed to belief-based bias arising from a lack of information) toward beauty. In a recent work, Peng et al. (2020) extend the discussion of the beauty premium from traditional jobs to sellers on e-commerce platforms, where they analyze the effects of profile picture attractiveness on e-commerce sales and find that both attractive and unattractive people sell more than average-looking sellers.

In measuring the beauty premium, one important factor missing from the existing studies is effort. While studies have shown that effort is crucial in determining job performance (Brown and Peterson 1994) and income gaps (Fuentes and Leamer 2019), the literature on the beauty premium is surprisingly silent on the role of effort. To fill this gap in the literature, however, one has to address the immediate challenges of properly measuring effort and handling its endogeneity concern, where both effort and earnings could be driven by unobserved productivity at the individual level. Our paper tackles these problems and makes three contributions to the literature.

To start with, we are the first to emphasize the role of effort in the beauty premium and decompose the premium into an effort-independent baseline component and a component enhanced by effort. The effort-enhanced component is the leading cause of the beauty premium, which is new to the literature and gains us distinctive insights compared to the baseline premium, in addition to mitigating the bias in beauty premium measurement. Second, our work extends the discussion of the beauty premium to the live streaming industry, which provides jobs for tens of millions of creators.
Our findings carry managerial implications for the platform to caution against discriminating against less attractive content creators without evaluating their effort and devotion. Methodologically, we utilize a unique dataset in live streaming to measure effort and propose a novel instrument that utilizes content creators' effort variations around their level-up promotions on the platform. Such an instrument can be generalized to many settings to address effort endogeneity concerns.

3.3 Data and Empirical Evidence

We conduct our analysis in the live streaming industry, in collaboration with a leading live streaming platform in China, which is comparable to popular U.S. live streaming platforms such as Facebook Live and YouTube Live.[3] The platform encourages content creators, both professional and amateur, to share their talents and interests, such as music, outdoor activities, gaming, and more. Figure 2.1 displays screenshots of typical live streaming sessions on the platform: the leftmost creator was singing while playing the keyboard, the creator in the middle was sharing a beach walk, and the creator on the right was chatting with the audience.

Our dataset covers 14,317 individual content creators and their 503,911 valid live streaming sessions over 99 days from October 2019 to January 2020.[4] The dataset provides explicit measures of creators' physical attractiveness, effort, and earnings. In what follows, we discuss information on creators' individual traits, including their beauty rating, gender, and their "levels" on the platform, as well as summaries of the live streaming activities, including the time spent each day, tipping income, and content labels. We carry out our analysis on a daily basis.

3.3.1 Creator Information

Attractiveness ratings. On our focal platform, a group of well-trained human raters (employees) rate how attractive a creator is based on her profile photo and her first several live streaming sessions.
The rating scale ranges from 1 to 3, with 1 indicating attractive creators, 2 indicating average-looking creators, and 3 standing for a lack of attractiveness. In case raters disagree on the ratings, the majority rule applies. Note that creators and audiences remain unaware of these ratings, and the platform does not update the ratings over time. Figure 3.1 displays examples of creators with varying beauty ratings on the platform. We follow the platform's practice and categorize content creators with a rating of 1 as "attractive" and those with a rating of 3 as "unattractive." Overall, 24.6% of our sampled creators are described as attractive, and 57.1% of them as unattractive. The remaining creators, who receive a rating of 2, are considered our average-looking baseline group for the study.

[3] Note that Chapter 2 and Chapter 3 share the same empirical background but with slightly different sample periods and valid sample content creators.

[4] We filter out content creators who did not experience any level-up during our sample period. Valid sessions are defined based on their length: only sessions that last longer than 60 seconds are included in our analysis.

Figure 3.1: Examples of Attractiveness Ratings

Creator levels. Each live streaming content creator has a "creator level," which reflects her progress or experience on the platform and is similar to the player level or rank in mobile and online gaming (Huang, Jasin, and Manchanda 2019; Zhao, Yang, et al. 2022). A creator's level is determined by her total time spent on live streaming in the past and her cumulative earnings. A higher level indicates both more effort and higher earnings in the past. As a salient indicator of a creator's status, her level is displayed alongside her profile photo. In addition, upon reaching certain levels, creators unlock rewards and features such as personalized virtual gifts, increased platform assistance, and being listed as "featured creators," among others.
Figure 3.2 illustrates the rewards and privileges tied to different levels and shows how creators can track their progress toward the next level and the rewards to expect after leveling up.

Figure 3.2: Keeping Track of Level-ups

We utilize the creators' level information in two ways. First, we construct an instrumental variable for effort using the temporal distance to creators' level-ups, motivated by creators' effort variations around level-ups. We discuss the details in Section 3.4.2. Second, we incorporate the current level of a creator to capture her cumulative earnings and experience, which controls for her current status on the platform. Our data include creator levels ranging from 1 to 99, with an average of 39. Because our empirical strategy depends crucially on the level-up history, we carefully filter out individual creators who did not experience any level-up during our sample period. The average creator experiences 4.8 level-ups; meanwhile, the maximum number of level-ups we observe is 36.

Gender. Existing studies show a significant gender difference in the magnitude and even the direction of the beauty premium (Mobius and Rosenblat 2006; Ruffle and Shtudiner 2015; Póvoa et al. 2020). Thus we also consider the gender of live streaming content creators, using this information as a control. Overall, 61.1% of the creators are female and 38.5% are male.

3.3.2 Live Streaming Session Information

Tipping income. Virtual tipping is the major earning method for both individual content creators and the platform. Specifically, tipping on the platform takes the form of virtual gifts.[5] Audiences can purchase virtual gifts and send them to creators at any time during the live streaming. Virtual gifts then appear as fancy animations on the screen and are visible to all audiences watching the same live streaming session. Creators receive in-app credit instantly, which can be transferred to their bank accounts at any time.
The platform retains a 70% commission on tipping income. Note that on the live streaming platform, unlike for short video producers on YouTube or Instagram, live streaming creators' income does not extend beyond the live streaming sessions. Once a session is finished, creators do not receive any income from their previous creations. Approximately 81.4% of live streaming sessions receive tips. The average earning aggregated at the daily level is approximately CNY 399.82 ($59.97), and the median daily income is CNY 14.78 ($2.22). Earnings vary significantly across individual content creators. The tipping income has a standard deviation of 6,294.08, which amounts to 15.7 times the average earning. In Section 3.3.3, we further discuss the income difference between attractive content creators and the others.

[5] The way virtual tipping operates on our focal platform is similar to "Bits" on Twitch and "Stars" on Facebook.

Session duration. Our data sample consists of live streaming sessions that last over 60 seconds. The same criterion is used for mining content labels on the platform. An average content creator spends approximately 1.81 hours on live streaming daily. Time spent on live streaming demonstrates substantial variation across individuals, with a standard deviation of about 4.14. In our analysis, we use the total time spent on live streaming to measure creators' effort. Like income, live streaming creators' effort does not extend beyond the live streaming sessions either. This "live" feature of live streaming further strengthens the validity of our effort measure.

Content labels. We collaborate with the platform and categorize all valid live streaming sessions into 24 content categories[6] based on live streaming videos, audience chat histories, and creators' self-reported content hashtags.
Content labels are non-exclusive, such that the first creator from our example in Figure 2.1 receives both "vocal" and "music instruments" for the same session. We include the content labels in our analysis to control for the effects of creators' expertise and areas of interest. Content label analysis is conducted at the daily level.

[6] Unfortunately, we do not have access to the original live streaming video recordings or chat histories.

3.3.3 Earning and Effort by Attractiveness

We further explore how tipping income differs among attractive, average-looking, and unattractive live streaming creators. Figure 3.3 compares the average earnings across different groups of creators conditional on active streaming. As shown in Figure 3.3, attractive individuals enjoy better earnings, suggesting the existence of a beauty premium for live streaming creators on the platform. On average, the tipping income of an average-looking individual is roughly 1/2 of the earnings of an attractive creator; an unattractive creator receives less than 1/6 of an attractive creator's income. Such a striking income gap could arise for multiple reasons: attractive content creators may be endowed with a large unexplained beauty premium in this context, work more efficiently, or simply exert more effort on live streaming.

Figure 3.3: Tipping Income by Attractiveness

We measure efficiency using the tipping income earned per hour conditional on streaming actively. Intuitively, the more an individual can earn within the same amount of time, the higher her productivity of labor. Figure 3.4 summarizes the efficiency for different groups of content creators. We observe that attractive content creators are over twice as efficient at converting their effort into earnings compared with the average-looking group, and over six times as efficient compared to unattractive individuals.
Such a difference in efficiency across attractiveness groups also reflects the benefit of being good-looking, but it differs from the beauty premium discussed in the existing literature. To capitalize on their advantage in efficiency, attractive creators need to exert effort; in contrast, the beauty premium discussed in the existing literature is independent of effort. To distinguish between the two types of premium, we refer to the effort-independent premium as the "baseline beauty premium" and the benefit from high efficiency as the "effort-enhanced beauty premium."

Due to their higher efficiency, we expect that rational attractive content creators will exert more effort to capitalize on their beauty. We plot the number of hours spent on live streaming across different groups in Figure 3.5. As shown in the plot, attractive creators are indeed the most effortful, followed by unattractive creators and then the average-looking individuals.

Figure 3.4: Tipping Income Per Hour by Attractiveness

Figure 3.5: Effort by Attractiveness

To further disentangle and quantify the importance of the different sources of the earning gaps between attractive creators and others, we build a model that incorporates (i) the baseline beauty premium as measured in the literature, (ii) the effort-enhanced beauty premium (i.e., efficiency), and (iii) effort. We also address the potential endogeneity concern where both effort and earnings could be driven by some unobserved individual productivity.

3.4 Empirical Strategy

3.4.1 Model

Following the literature, we analyze the effects of physical attractiveness and its interplay with effort using the following linear model.
LogIncome_it = α · Attractive_i + β · Unattractive_i + γ · LogStreamHour_it + δ · Attractive_i · LogStreamHour_it + η · Unattractive_i · LogStreamHour_it + θ · Control_it + ξ_i + ε_it    (3.1)

In this equation, i indexes the live streaming content creator and t denotes time (day). LogIncome_it is creator i's earnings from tipping on day t. The attractiveness dummy Attractive_i equals one if creator i receives an attractiveness rating of 1 and zero otherwise; the dummy indicating less attractive creators, Unattractive_i, equals one if the platform rates i as lacking attractiveness (receiving a rating of 3). LogStreamHour_it denotes how many hours creator i spends on live streaming on day t; it serves as the measure of her effort. Attractive_i · LogStreamHour_it and Unattractive_i · LogStreamHour_it are interaction terms between creators' (un)attractiveness measures and their effort. We also include creator i's gender, creator level, content label dummies, and a time trend as controls in the term Control_it. The variable ξ_i is the set of individual fixed effects, and ε_it is the random error term. Table 3.1 explains the model variables.

As such, the parameter α captures how being attractive directly contributes to a creator's tipping income and is interpreted as the "beauty premium" defined in the existing literature (Hamermesh and Biddle 1994; Biddle and Hamermesh 1998; Malik, Singh, and Srinivasan 2019). It takes a form similar to a fixed effect and is independent of effort. Similarly, the parameter β is the direct effect of lacking attractiveness on tipping and captures the "plainness penalty" (Peng et al. 2020). Note that both parts are independent of effort. Throughout our paper, we refer to α as the "baseline beauty premium" and β as the "baseline plainness penalty." Another component of the premium (penalty) in our model is captured by the parameter δ (η).
In particular, they capture how being (un)attractive translates one's effort into earnings more or less efficiently; i.e., they represent the different marginal productivity of (un)attractive content creators. Alternatively, one could interpret δ as the part of the beauty premium enhanced by effort. We refer to δ as the "effort-enhanced beauty premium" in our paper to distinguish it from the baseline premium. Note that δ (η) is the part of the beauty premium (plainness penalty) that has not been separately studied before. Separating the effort-enhanced beauty premium from the baseline premium is important in two respects. First, depending on the signs of δ and η as well as the effort level, the beauty premium studied in the literature may be biased up or down. Second, having a high fixed-effect type of beauty premium versus a high effort-enhanced beauty premium has different implications: while the former indicates that attractive individuals earn privilege without paying effort, the latter encourages them to be more effortful and capitalize on the beauty premium by putting more effort into their work.

At this point, we also want to clarify our definition of the baseline beauty premium captured by α. The beauty premium could build on preferential treatment from both the platform and audiences at the very early stage of creators' live streaming careers. For instance, the platform may utilize beauty ratings to recommend rookie content creators as a solution to the cold start problem; when deciding whom to watch and tip, audiences, too, could rely on creators' appearance as a cue. This is similar to the belief-based beauty bias studied in the existing beauty premium literature (Deryugina and Shurchkov 2015; Malik, Singh, and Srinivasan 2019). In our paper, such effects are absorbed into the (direct) beauty premium α. We do not intend to completely isolate the initial forces or study the dynamics of the beauty premium in our model.
But as a robustness test, we utilize a discontinuity in the recommender system design for rookie versus experienced creators that adds to this discussion in Section 3.5.4.

In our model, we also include factors such as the creators' gender, their progress or status on the platform (creator level), the content they create, the individual fixed effects, and the time trend. Incorporating these factors allows us to control for influences other than creators' beauty and effort. For instance, a creator's past experience and her expertise (e.g., in dancing or musical instruments) will likely impact her earnings; those influences are captured by the creator's level on the platform and the content she is creating, respectively. Another factor possibly affecting earnings is the quality of the live streaming session. Such quality differences are mainly observed at the individual level (i.e., some creators tend to produce higher-quality live streaming sessions), and they are absorbed into the individual fixed effect terms.

3.4.2 Instrumenting Effort Using the Temporal Distance to Level-ups

Even after controlling for observed and unobserved characteristics of creators and their live streaming sessions, an endogeneity issue concerning effort persists. In particular, creators might have factors affecting their productivity that are not observed by researchers, such as social skills or creativity. This unobserved productivity could impact both the level of effort (i.e., how much time to spend on live streaming) and the creators' tipping income, leading to an endogeneity problem. To address this concern, we construct an instrument for effort by exploiting variations in creators' effort patterns around level-ups. As discussed in Section 3.3.2, live streaming creators can monitor their progress (via experience points) toward the next level. We anticipate that level-up progress impacts creators' effort input.
Intuitively, expecting a promotion in the near future may motivate creators to live stream more to unlock the new features and rewards. After a level-up, creators might spend time enjoying their higher level (which is shown next to their profile photos) or exploring the new features. We expect creators' effort levels to decline during the post-level-up periods, along with their excitement about leveling up.

Figure 3.6 displays streaming hours and log tipping income, both averaged across the number of days until level-up. The x-axis represents the time until the next level-up, where negative numbers indicate the days before level-ups and positive numbers the days after level-ups. Note that in the case of multiple level-ups, we look at the distance to the closest level. For instance, suppose a creator was promoted to Level 2 two days ago and is 5 days away from the next level (Level 3); then her data point is plotted as −2 on the x-axis. Note that the data patterns we find are not sensitive to the way we deal with the overlaps in level-up progress. Alternative rules, such as including the creator as two data points (i.e., both −2 and 5) or looking at the distance to the farther level (i.e., 5 on the axis), both yield very similar data patterns. To further verify the consistency of such data patterns and the resulting estimation results, we conduct a robustness test in Section 3.5.2, where we use a subset of data consisting of individuals who have experienced only one level-up.

Figure 3.6: Effort and Income around Level-ups. (a) Effort; (b) Income

Figure 3.6 reveals inverse V-shaped effort and income patterns around level-ups. As shown in Figure 3.6 (a), creators generally exert more effort, spending more time on live streaming, as they get closer to the level-up threshold. Immediately after creators get promoted to the next level, however, they tend to exert much less effort.
Figure 3.6 (b) illustrates a very similar pattern for tipping income, with creators earning much more when they are closer to the level-up cutoffs. Moreover, a comparison between the two plots in Figure 3.6 indicates that tipping income variations closely mirror those in effort. Although Figure 3.6 offers insights into the validity of our IV, questions may arise regarding the extent to which the patterns are driven by individual differences. For instance, certain creators might be more effortful, efficient, or productive and frequently experience level-ups, which consistently places them at the center of Figure 3.6. To account for this heterogeneity, we present the normalized effort and reward in Figure 3.7. Here, effort is normalized by the creator's average time spent on live streaming (Figure 3.7a), and earnings are normalized by the creator's average tipping income over time (Figure 3.7b).

Figure 3.7: Normalized Effort and Income around Level-ups. (a) Normalized Effort; (b) Normalized Income

The two primary observations still hold after accounting for these individual differences. First, the temporal distance to level-ups closely correlates with effort: the closer a creator is to a level-up, the more time she spends on live streaming. Second, the tipping income patterns around level-ups mirror those of effort. In our main analysis, we use the temporal distance (i.e., absolute days) to level-up as the instrument for creators' effort. Accordingly, we construct IVs for the effort-attractiveness interaction terms (Attractive_i · LogStreamHour_it and Unattractive_i · LogStreamHour_it) using the interactions of absolute days and physical attractiveness (Attractive_i · DistLevelUp_it and Unattractive_i · DistLevelUp_it). We also conduct robustness tests with other IV variants based on the same data patterns, including (i) the signed number of days before and after level-ups, and (ii) whether a creator is within a 3/7/15/30-day band of level-ups.
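As a sketch of the instrument construction described above (the function and variable names are ours, not the platform's), the signed distance to the nearest level-up, its absolute value, and the band indicator can be computed as follows:

```python
import numpy as np

def dist_to_nearest_levelup(obs_days, levelup_days):
    """Signed days to the nearest level-up: negative before, positive after."""
    obs = np.asarray(obs_days)
    lv = np.asarray(levelup_days)
    diffs = obs[:, None] - lv[None, :]      # >0 after a level-up, <0 before
    nearest = np.abs(diffs).argmin(axis=1)  # index of the closest level-up
    return diffs[np.arange(len(obs)), nearest]

# a creator observed on days 1, 3, 6, 9 who levels up on days 4 and 10
signed = dist_to_nearest_levelup([1, 3, 6, 9], [4, 10])  # [-3, -1, 2, -1]
abs_dist = np.abs(signed)        # DistLevelUp, the main IV (absolute days)

# IVs for the interaction terms, e.g. Attractive_i * DistLevelUp_it
attractive_i = 1
iv_attractive = attractive_i * abs_dist

# binary variant: indicator of any level-up within a +/- w day band
in_band_3 = (abs_dist <= 3).astype(int)
```

The same logic, applied creator by creator, yields the alternative instruments used in the robustness checks.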
Our main results are consistent across the different instruments. Across the analysis, all of our proposed instruments pass the weak instrument tests (Stock, Wright, and Yogo 2002).

3.5 Estimation Results

3.5.1 Main Results

Table 3.2 presents the estimation results of our main analysis. Models 1 and 2 report estimates without and with instruments, respectively. For each model, we include three specifications: (a) no effort, (b) effort but no attractiveness-effort interactions, and (c) effort plus attractiveness-effort interactions. Across all models and specifications, we also include gender, creator level, content dummies, a time trend, and individual fixed effects. Note that our beauty measures are constant over time and thus cannot be separately estimated from the individual fixed effects. We therefore obtain estimates of the baseline beauty premium in two steps (Pesaran and Zhou 2018). First, we estimate the main model and back out all individual fixed effects. Second, we regress the recovered fixed effects on all time-constant variables (including the beauty measures and gender) and report the resulting estimates in Table 3.2. A comparison between the estimates in Models 1 and 2 reveals significant differences due to instrumenting the effort variables. With instruments, the effort-related coefficients in Model 2 have smaller magnitudes than their counterparts without IVs in Model 1. Taking specification (c) as an example, the effect of one additional log-hour of streaming (i.e., scaling streaming hours up by a factor of e ≈ 2.718) on tipping income decreases from 1.614 (S.E. = 0.024) to 1.562 (S.E. = 0.017). Intuitively, there are indeed unobserved productivity factors at the individual level that correlate with both a creator's effort input and her performance, even after controlling for observed individual traits and fixed effects. Ignoring them tends to overestimate the contribution of effort to earnings.
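The second step of the two-step procedure can be sketched as an ordinary least-squares regression of the recovered fixed effects on the time-constant covariates. This is an illustrative sketch on synthetic data (step 1, the within/IV panel estimation that backs out the fixed effects, is assumed to have been run already; the coefficient values below are made up, not our estimates):

```python
import numpy as np

def second_step(alpha, attractive, unattractive, female):
    """Regress recovered fixed effects alpha_i on time-constant covariates."""
    Z = np.column_stack([np.ones_like(alpha), attractive, unattractive, female])
    coefs, *_ = np.linalg.lstsq(Z, alpha, rcond=None)
    return coefs  # [intercept, baseline premium, plainness penalty, female gap]

# synthetic check: build fixed effects from known coefficients, then recover them
att = np.array([1, 0, 0, 1, 0], dtype=float)
unatt = np.array([0, 1, 0, 0, 1], dtype=float)
fem = np.array([1, 1, 0, 0, 1], dtype=float)
alpha = 0.2 + 0.6 * att - 0.26 * unatt + 0.02 * fem
print(second_step(alpha, att, unatt, fem))  # ~ [0.2, 0.6, -0.26, 0.02]
```

In practice the second-stage standard errors would also need to account for the first-stage estimation error, as in Pesaran and Zhou (2018).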
For the remaining discussion, we focus on results from the instrumented analysis in Model 2. Overall, there are three main findings.

First, both a beauty premium and a plainness penalty exist on the platform, so that, ceteris paribus, attractive content creators earn more. The baseline beauty premium in Model 2 (a), which is the measurement adopted in the existing literature, suggests that being good-looking generates CNY 1.32 more tipping income per streaming day compared with ordinary-looking creators and CNY 3.22 more compared with creators lacking attractiveness, regardless of effort input or live streaming quality. These findings are in accordance with the literature. Moreover, the appearance-based baseline disparity persists even after we take into consideration creators' effort level (Column 2b) and the difference in attractiveness-related efficiency (Column 2c), although the magnitude changes.

Second, incorporating effort explains a significant part of the beauty premium, and ignoring effort leads to an overestimated beauty premium. As highlighted by the comparison between Columns 2 (a) and 2 (b), adding effort alone (without introducing the effort-enhanced premium) decreases the baseline beauty premium from 0.609 (S.E. = 0.003) to 0.503 (S.E. = 0.002), explaining away 17.4% of the baseline premium. This implies that the beauty premium measured in the existing literature may be biased when effort is not considered.

Third, the effort-enhanced beauty premium is an important driving force of both the beauty premium and the earnings gap between attractive content creators and others. Based on the model estimates in Column 2 (c), the estimated baseline beauty premium is 0.062 (S.E. = 0.002) and the effort-enhanced beauty premium is 0.810 (S.E. = 0.013) at the log-hour scale.
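The 17.4% figure follows directly from the two baseline estimates; a quick check:

```python
baseline_no_effort = 0.609    # baseline beauty premium, Column 2 (a)
baseline_with_effort = 0.503  # baseline beauty premium, Column 2 (b)

# share of the baseline premium explained away by adding effort
share_explained = (baseline_no_effort - baseline_with_effort) / baseline_no_effort
print(f"{share_explained:.1%}")  # 17.4%
```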
In fact, the baseline beauty premium and plainness penalty become almost negligible once we introduce their effort-enhanced counterparts into the model, as suggested by the comparison between Columns 2 (b) and 2 (c). In addition, there is a large discrepancy between the effort-enhanced beauty premium (0.810, S.E. = 0.013) and plainness penalty (−0.664, S.E. = 0.013), suggesting that it could be an important explanation for the income gap between attractive and unattractive individuals on the platform. We further quantify the roles of the baseline and effort-enhanced beauty premiums and discuss the implications in Section 3.6.

3.5.2 Robustness Check: Single Level-up Subsample

As discussed in Section 3.4.2, because individual creators can achieve multiple level-ups, we construct our instruments using the distance to the closest level. To test the robustness of this construction and validate our findings, we replicate our analysis using a subsample consisting of individuals who experience only one level-up during our data period. This leaves us with 4,076 creators and their 403,524 live streaming records. Figure 3.8 displays the (un)normalized time devoted to live streaming and the earnings of these creators as they approach and pass their single level-up milestone. Overall, both the unnormalized and normalized data patterns in this subsample largely resemble those illustrated in Figures 3.6 and 3.7: individuals with a single level-up (and thus no overlapping pre- and post-level-up experience) also exert more effort as they approach the level promotion and decrease their time input right afterwards. Their income patterns again mimic the variations in effort. We then estimate our full model with both effort and efficiency and present the model estimates in Table 3.3.
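The subsample construction itself is a simple filter; as a toy sketch (the data structure is ours, for illustration only):

```python
# level-up dates observed per creator during the sample period (toy data)
levelups = {"c1": [12], "c2": [5, 40], "c3": [88], "c4": []}

# keep only creators who experience exactly one level-up
single = {c for c, days in levelups.items() if len(days) == 1}
print(sorted(single))  # ['c1', 'c3']
```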
Compared with the results in Table 3.2, Column 2 (c), estimation using this subsample yields very similar results, particularly for the baseline premium, the effort-enhanced premium, and the effects of effort. Our analysis thus remains valid in both single and multiple level-up situations.

Figure 3.8: Effort and Income around Level-ups: Single Level-up Subsample. (a) Effort; (b) Income; (c) Normalized Effort; (d) Normalized Income

3.5.3 Robustness Check: Other Forms of Instruments

As another robustness check, we construct a set of different instruments based on the data patterns described in Section 3.4.2, namely (i) the signed number of days before and after level-ups, and (ii) whether the creator experiences any level-up during certain time windows. Table 3.4 presents the results. Column (a) shows the model estimates using the signed number of days before and after level-ups as the instrument for effort, where a negative number indicates days before a level-up and a positive number days after. Overall, the results discussed in Section 3.5.1 also apply here, especially the relative magnitude of the effort-enhanced beauty premium versus the baseline premium. For the results in Column (b), the instruments are indicators of whether a live streaming creator experiences any level-up within a ±3/7/15/30-day window. Take the ±3-day window as an example: the binary instrument takes the value one at time t if the creator has at least one level-up during the [t − 3, t + 3] window. The estimation results in these columns indicate that our findings are robust across the different instruments; even the numerical values of the beauty premium and effort coefficients are similar to those presented in Section 3.5.1.

3.5.4 Robustness Check: Rookies vs. Experienced Creators

As briefly discussed in Section 3.4.1, our research does not aim to disentangle the dynamics of the beauty premium from the start-up effect of attractiveness.
However, we do leverage a variation in the recommender system design to illustrate how the baseline and effort-enhanced beauty premiums differ across rookie and experienced content creators on the platform. In particular, the platform considers a content creator a rookie once she enters the platform and starts live streaming. As she spends more time and accumulates more tipping income, she passes a threshold and is regarded as an experienced creator by the platform. The threshold is based on cumulative days of active streaming and cumulative income earned. Overall, about 10.41% of the streaming sessions are contributed by rookies. Note that creators are not informed of this status change. The platform uses this information to build its recommender system. To deal with the “cold start” problem, the system relies heavily on attractiveness when recommending rookies, so that attractive rookies are listed relatively near the top and receive more exposure. Once creators are considered experienced, the platform minimizes the use of attractiveness information. Accordingly, we expect a higher baseline beauty premium for rookies due to this design. Table 3.5 presents model estimates where we allow rookie vs. experienced differences in both the baseline premium and the effort-enhanced premium. The results suggest that being a rookie increases the baseline beauty premium by 464% but does not affect the baseline plainness penalty, in accordance with the recommender system design. Interestingly, attractive rookies have a lower effort-enhanced premium than attractive creators who are experienced. Altogether, the results indicate that although attractive rookie creators are favored by the platform, they are not yet as efficient as experienced creators in translating their effort into earnings.
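The 464% figure can be recovered, approximately, from the rounded Table 3.5 estimates (the small discrepancy comes from rounding in the printed coefficients):

```python
attractive_base = 0.017      # Table 3.5: Attractive (baseline premium)
rookie_x_attractive = 0.079  # Table 3.5: Rookie x Attractive

# percentage increase in the baseline beauty premium for rookies
pct_increase = 100 * rookie_x_attractive / attractive_base
print(round(pct_increase))   # close to the reported 464%
```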
3.6 Implications

In this section, we quantify the importance of the baseline vs. the effort-enhanced components in driving the beauty premium and the income gap between attractive creators and others. Altogether, our results suggest that the effort-enhanced beauty premium is the main driving force. The calculations in this section are based on Column 2 of Table 3.2.

Quantifying the beauty premium. Based on the results in Table 3.2, Column 2 (c), simple algebra suggests that the baseline beauty premium amounts to only 2.4 minutes of streaming for an attractive live streaming content creator, a minimal amount of time compared with the average daily streaming hours among attractive individuals (2.01 hours). Based on this average effort level, we further decompose the beauty premium into the baseline premium and the effort-enhanced premium (efficiency). Note that only the baseline premium is accounted for in the existing literature. Figure 3.9 plots the decomposition under three specifications: (i) without effort or efficiency (Table 3.2, Column 2a), (ii) with effort but not efficiency (Table 3.2, Column 2b), and (iii) with both effort and efficiency (Table 3.2, Column 2c). As Figure 3.9 shows, the effort-enhanced beauty premium (the orange bar) is the dominant component, accounting for 93.5% of the total premium according to our full model. Moreover, two findings arise as we compare across the models. First, failing to consider effort and/or efficiency leads to an overestimated baseline beauty premium (the blue bar): while the true baseline premium is about 0.062 in our model, a model ignoring effort and efficiency gives an inflated estimate of 0.609. Second, looking at the overall premium (orange + blue bars), models ignoring effort and efficiency actually underestimate the total premium for attractive creators on the platform.
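The 93.5% share can be reproduced from the Column 2 (c) estimates at the average effort level, under our assumption that the streaming-hours variable enters as log(1 + hours) (an assumption, but one that matches the reported share):

```python
import math

baseline = 0.062         # baseline beauty premium, Table 3.2, Column 2 (c)
effort_enhanced = 0.810  # effort-enhanced premium per log-hour
avg_hours = 2.01         # average daily streaming hours of attractive creators

# assumed transform: LogStreamHour = log(1 + hours)
enhanced_part = effort_enhanced * math.log1p(avg_hours)
share = enhanced_part / (enhanced_part + baseline)
print(f"{share:.1%}")    # 93.5%
```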
Table 3.1: Explanation of Model Variables

Variable             Meaning
LogIncome_it         Creator i's log income at time t.
Attractive_i         A dummy indicating whether creator i is physically attractive.
Unattractive_i       A dummy indicating whether creator i is physically unattractive.
LogStreamHour_it     Log hours creator i spends on live streaming at time t.
Female_i             A dummy indicating whether creator i is female.
CreatorLevel_it      Creator i's level on the platform at time t.
Content_it           Dummies for the content creator i streams at time t.
Time_t               The time trend, normalized to between zero and one.

Figure 3.9: Decomposing Beauty Premium

Table 3.2: Main Estimation Results

(1) No Instrument
                                (a) No Effort,   (b) With Effort,   (c) With Effort
                                No Efficiency    No Efficiency      and Efficiency
Appearance
  Attractive                    0.609***         0.492***           0.037***
                                (0.003)          (0.002)            (0.002)
  Unattractive                  −0.263***        −0.2364***         0.005***
                                (0.002)          (0.002)            (0.002)
Effort
  LogStreamHour                 –                1.495***           1.614***
                                                 (0.014)            (0.024)
  Attractive × LogStreamHour    –                –                  0.851***
                                                                    (0.027)
  Unattractive × LogStreamHour  –                –                  −0.707***
                                                                    (0.027)
Control
  Female                        0.017***         0.006***           0.061**
                                (0.001)          (0.001)            (0.001)
  CreatorLevel                  0.004***         0.003***           0.003***
                                (0.001)          (0.000)            (0.000)
  Intercept                     0.213***         0.011              −0.002
                                (0.022)          (0.018)            (0.016)
  Content Dummies               YES              YES                YES
  TimeTrend                     YES              YES                YES
  FE                            YES              YES                YES
Model Fit
  R-squared                     0.597            0.701              0.754

Table 3.2: Main Estimation Results (Continued)

(2) Instrumented
                                (a) No Effort,   (b) With Effort,   (c) With Effort
                                No Efficiency    No Efficiency      and Efficiency
Appearance
  Attractive                    0.609***         0.503***           0.062***
                                (0.003)          (0.002)            (0.002)
  Unattractive                  −0.263***        −0.238***          −0.011***
                                (0.002)          (0.002)            (0.002)
Effort
  LogStreamHour                 –                1.361***           1.562***
                                                 (0.0033)           (0.017)
  Attractive × LogStreamHour    –                –                  0.810***
                                                                    (0.013)
  Unattractive × LogStreamHour  –                –                  −0.664***
                                                                    (0.013)
Control
  Female                        0.017***         0.007***           0.059***
                                (0.001)          (0.001)            (0.001)
  CreatorLevel                  0.004***         0.003***           0.003***
                                (0.001)          (0.000)            (0.000)
  Intercept                     0.213***         0.029              0.005
                                (0.022)          (0.018)            (0.016)
  Content Dummies               YES              YES                YES
  TimeTrend                     YES              YES                YES
  FE                            YES              YES                YES
Model Fit
  R-squared                     0.597            0.701              0.755

Note: 1. Results significant at the 90%, 95%, and 99% levels are indicated by (*), (**), and (***), respectively. 2. For variables constant over time (including Attractive, Unattractive, and Female), we obtain estimates as follows: first back out all the individual fixed effects, and then regress the individual fixed effects on those variables.

Table 3.3: Estimation Results: Single Level-up Subsample

Variable                        Estimate      S.E.
Appearance
  Attractive                    0.030***      (0.003)
  Unattractive                  −0.027***     (0.002)
Effort
  LogStreamHour                 1.304***      (0.136)
  Attractive × LogStreamHour    1.176***      (0.129)
  Unattractive × LogStreamHour  −0.529***     (0.109)
Control
  Female                        −0.003**      (0.001)
  CreatorLevel                  0.002***      (0.000)
  Intercept                     −0.008        (0.018)
  Content                       YES
  Time                          YES
  FE                            YES
Model Fit
  R-squared                     0.698

Note: 1. Results significant at the 90%, 95%, and 99% levels are indicated by (*), (**), and (***), respectively. 2. For variables constant over time (including Attractive, Unattractive, and Female), we obtain estimates as follows: first back out all the individual fixed effects, and then regress the individual fixed effects on those variables.

Note that by incorporating the beauty premium in a form similar to an individual fixed effect, the existing literature suggests that the beauty premium is effort-independent, such that attractive individuals may benefit without putting in much effort. Our results suggest that while there is indeed a small amount of unexplained premium from being good-looking, the major part of the beauty premium does not come free: attractive individuals need to exert effort to capitalize on their beauty endowment. Indeed, in our data, we observe them investing more time in live streaming.

Quantifying the income gap.
Table 3.4: Estimation Results: Alternative IVs

                                (a) Days Before/  (b) Any Level-up During the Time Window
                                After Level-up    3 day       7 day       15 day      30 day
Appearance
  Attractive                    0.143***          0.090***    0.070***    0.060***    0.045***
                                (0.002)           (0.002)     (0.002)     (0.002)     (0.002)
  Unattractive                  −0.003*           0.000       −0.001      −0.007***   −0.017***
                                (0.002)           (0.002)     (0.002)     (0.002)     (0.002)
Effort
  LogStreamHour                 1.461***          1.575***    1.621***    1.660***    1.555***
                                (0.0238)          (0.029)     (0.030)     (0.036)     (0.055)
  Attractive × LogStreamHour    0.673***          0.755***    0.786***    0.800***    0.843***
                                (0.173)           (0.030)     (0.031)     (0.035)     (0.044)
  Unattractive × LogStreamHour  −0.678***         −0.688***   −0.685***   −0.669***   −0.651***
                                (0.159)           (0.029)     (0.030)     (0.032)     (0.041)
Control
  Female                        0.057***          0.058***    0.059***    0.058**     0.059***
                                (0.001)           (0.001)     (0.001)     (0.001)     (0.001)
  CreatorLevel                  0.003***          0.003***    0.003***    0.003***    0.003***
                                (0.000)           (0.000)     (0.000)     (0.000)     (0.001)
  Intercept                     0.025             0.007       −0.001      −0.008      0.003
                                (0.036)           (0.016)     (0.016)     (0.016)     (0.017)
  Content Dummies               YES               YES         YES         YES         YES
  TimeTrend                     YES               YES         YES         YES         YES
  FE                            YES               YES         YES         YES         YES
Model Fit
  R-squared                     0.754             0.755       0.755       0.755       0.755

Note: 1. Results significant at the 90%, 95%, and 99% levels are indicated by (*), (**), and (***), respectively. 2. For variables constant over time (including Attractive, Unattractive, and Female), we obtain estimates as follows: first back out all the individual fixed effects, and then regress the individual fixed effects on those variables.

Table 3.5: Estimation Results: Rookie vs. Experienced Creators

Variable                                  Estimate     S.E.
Rookie
  Rookie                                  −0.025       (0.023)
  Rookie × Attractive                     0.079*       (0.041)
  Rookie × Unattractive                   −0.020       (0.033)
  Rookie × LogStreamHour                  −0.063       (0.040)
  Rookie × Attractive × LogStreamHour     −0.476***    (0.054)
  Rookie × Unattractive × LogStreamHour   0.071        (0.052)
Appearance
  Attractive                              0.017***     (0.002)
  Unattractive                            −0.005***    (0.002)
Effort
  LogStreamHour                           0.002***     (0.000)
  Attractive × LogStreamHour              0.903***     (0.040)
  Unattractive × LogStreamHour            −0.687***    (0.037)
Control
  Female                                  0.062***     (0.001)
  CreatorLevel                            0.003***     (0.000)
  Intercept                               0.018        (0.017)
  Content                                 YES
  Time                                    YES
  FE                                      YES
Model Fit
  R-squared                               0.756

Note: 1. Results significant at the 90%, 95%, and 99% levels are indicated by (*), (**), and (***), respectively. 2. For variables constant over time (including Attractive, Unattractive, and Female), we obtain estimates as follows: first back out all the individual fixed effects, and then regress the individual fixed effects on those variables.

Since the effort-enhanced premium is the dominant component, our results suggest that less attractive creators could catch up by investing more effort — but by how much? Figure 3.10 quantifies the number of hours an average-looking and an unattractive creator needs to invest in order to achieve earnings equivalent to one hour of live streaming by an attractive individual. Here we look at the income gap implied by our model specifications with (i) effort but no efficiency (Table 3.2, Column 2b) and (ii) both effort and efficiency (Table 3.2, Column 2c). Note that it is not possible to quantify the income gap in terms of hours to spend if effort is excluded from the analysis.

Figure 3.10: Quantifying Income Gaps

As illustrated by Figure 3.10, if we only consider effort (but not efficiency differences), ordinary-looking and unattractive creators need to spend 1.9 and 2.4 hours, respectively, to achieve earnings equivalent to one hour of tipping income for attractive individuals. If we also take efficiency into account, an ordinary-looking content creator needs to spend almost two hours for every hour spent by an attractive creator to fill the income gap; that number increases to 5.8 hours for unattractive creators. Taking the difference in efficiency into account thus expands the additional effort required of creators who do not belong to the attractive group.
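The hour figures above can be reproduced from the Table 3.2 estimates. As before, we assume LogStreamHour = log(1 + hours) (our assumption; it matches the reported numbers):

```python
import math

def hours_to_match(att_base, att_slope, other_base, other_slope, att_hours=1.0):
    """Hours a non-attractive creator needs to match an attractive creator's
    log tipping income from att_hours of streaming (log(1+hours) assumed)."""
    target = att_base + att_slope * math.log1p(att_hours)
    return math.expm1((target - other_base) / other_slope)

# Column 2 (b): effort but no efficiency (same effort slope for everyone)
print(hours_to_match(0.503, 1.361, 0.0, 1.361))     # ~1.9 (ordinary-looking)
print(hours_to_match(0.503, 1.361, -0.238, 1.361))  # ~2.4 (unattractive)

# Column 2 (c): effort and efficiency (attractiveness-specific slopes)
print(hours_to_match(0.062, 1.562 + 0.810, 0.0, 1.562))            # ~2.0
print(hours_to_match(0.062, 1.562 + 0.810, -0.011, 1.562 - 0.664)) # ~5.8
```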
Indeed, the disparity in efficiency (or equivalently, the effort-enhanced premium) is the driving force behind the income difference. But one important message we hope to deliver with this analysis is that, despite the need for increased effort, the income gap between attractive creators and the others can still be bridged.

3.7 Conclusion

In this paper, we study the beauty premium for live streaming content creators. We contribute to the literature by highlighting the role of effort in quantifying and explaining the beauty premium. While the baseline, effort-independent premium has been intensively studied, we introduce a new component of the premium that is enhanced by effort. The effort-enhanced premium captures the efficiency advantage of attractive individuals due to higher marginal productivity. According to our analysis, there is a large total premium for attractive content creators and a penalty for less attractive individuals: to match the earnings of an attractive creator during a one-hour live stream, an unattractive creator would have to devote 5.8 hours to close the income gap. More importantly, we show that the effort-enhanced premium is actually the dominant component of the beauty premium, accounting for about 93.5% of the total premium. The baseline premium decreases drastically in magnitude once we incorporate the effects of effort and the effort-beauty interaction. Accounting for effort in beauty premium studies therefore not only reduces bias in the measurement of the beauty premium, but also reveals that the efficiency advantage is the main driver of the income disparity between attractive creators and their less attractive peers. Unlike the effort-independent baseline premium, which allows individuals to benefit solely from their beauty, a dominant effort-enhanced premium indicates that beauty is not an effortless asset and requires effort to fully capitalize on. These findings have significant implications for both content creators and the platform.
Attractive creators must recognize that effort is necessary to activate their efficiency advantage, whereas less attractive creators can view this as encouraging evidence: the attractiveness-related income disparity can be closed by exerting more effort. Platforms, for their part, should not discriminate against less attractive individuals solely based on appearance when hiring and identifying creators with high potential; based on our study, that is a decision that would both foster inclusivity and lead to better earnings for all involved. Our paper has some limitations that present opportunities for future research. First, our definition of the beauty premium does not isolate the premium granted by audiences (e.g., higher tipping income during live streaming) from the advantage provided by the platform, such as favorable exposure from the recommender system. With experiments, researchers could estimate the relative contributions of the two sources. Second, while we have controlled for creators' level on the platform, we do not formally model the dynamics of the beauty premium over time. Further investigation could explore how the premium accumulates over time and the relative importance of the factors that shape it at different stages.

Bibliography

Ackerberg, Daniel A. 2003. “Advertising, learning, and consumer choice in experience good markets: an empirical examination.” International Economic Review 44 (3): 1007–1040.
Agthe, Maria, Matthias Spörrle, and Jon K Maner. 2011. “Does being attractive always help? Positive and negative effects of attractiveness on social decision making.” Personality and Social Psychology Bulletin 37 (8): 1042–1054.
Algesheimer, René, Sharad Borle, Utpal M Dholakia, and Siddharth S Singh. 2010. “The impact of customer community participation on customer behaviors: An empirical investigation.” Marketing Science 29 (4): 756–769.
Algesheimer, René, Utpal M Dholakia, and Andreas Herrmann. 2005.
“The social influence of brand community: Evidence from European car clubs.” Journal of Marketing 69 (3): 19–34.
Ameri, Mina, Elisabeth Honka, and Ying Xie. 2017. “A Structural Model of Network Dynamics: Tie Formation, Product Adoption, and Content Generation.” Working Paper.
Anderson, Carolyn J, Stanley Wasserman, and Katherine Faust. 1992. “Building stochastic blockmodels.” Social Networks 14 (1-2): 137–161.
Anderson, John Robert. 2000. Learning and memory: An integrated approach. John Wiley & Sons Inc.
Andreoni, James, and Ragan Petrie. 2008. “Beauty, gender and stereotypes: Evidence from laboratory experiments.” Journal of Economic Psychology 29 (1): 73–93.
Aral, Sinan, and Dylan Walker. 2014. “Tie strength, embeddedness, and social influence: A large-scale networked experiment.” Management Science 60 (6): 1352–1370.
Argo, Jennifer J, Darren W Dahl, and Andrea C Morales. 2008. “Positive consumer contagion: Responses to attractive others in a retail context.” Journal of Marketing Research 45 (6): 690–701.
Arrow, Kenneth Joseph. 1971. “The economic implications of learning by doing.” In Readings in the Theory of Growth, 131–149. Springer.
Bakshy, Eytan, Dean Eckles, Rong Yan, and Itamar Rosenn. 2012. “Social influence in social advertising: Evidence from field experiments.” In Proceedings of the 13th ACM Conference on Electronic Commerce, 146–161.
Barlevy, Gadi, and Derek Neal. 2019. “Allocating effort and talent in professional labor markets.” Journal of Labor Economics 37 (1): 187–246.
Bawa, Kapil. 1990. “Modeling inertia and variety seeking tendencies in brand choice behavior.” Marketing Science 9 (3): 263–278.
Beidleman, Carl R. 1973. “Income smoothing: The role of management.” The Accounting Review 48 (4): 653–667.
Bessi, Alessandro, Mauro Coletto, George Alexandru Davidescu, Antonio Scala, Guido Caldarelli, and Walter Quattrociocchi. 2015. “Science vs conspiracy: Collective narratives in the age of misinformation.” PloS One 10 (2): e0118093.
Biddle, Jeff E, and Daniel S Hamermesh. 1998. “Beauty, productivity, and discrimination: Lawyers’ looks and lucre.” Journal of Labor Economics 16 (1): 172–201.
Boucher, Vincent. 2016. “Conformism and self-selection in social networks.” Journal of Public Economics 136:30–44.
Bramoullé, Yann, Habiba Djebbari, and Bernard Fortin. 2009. “Identification of peer effects through social networks.” Journal of Econometrics 150 (1): 41–55.
Brown, Steven P, and Robert A Peterson. 1994. “The effect of effort on sales performance and job satisfaction.” Journal of Marketing 58 (2): 70–80.
Burtch, Gordon, Yili Hong, Ravi Bapna, and Vladas Griskevicius. 2018. “Stimulating online reviews by combining financial incentives and social norms.” Management Science 64 (5): 2065–2082.
Byrnes, James P, David C Miller, and William D Schafer. 1999. “Gender differences in risk taking: A meta-analysis.” Psychological Bulletin 125 (3): 367.
Centola, Damon, and Michael Macy. 2007. “Complex contagions and the weakness of long ties.” American Journal of Sociology 113 (3): 702–734.
Chamakiotis, Petros, Dimitra Petrakaki, and Niki Panteli. 2021. “Social value creation through digital activism in an online health community.” Information Systems Journal 31 (1): 94–119.
Chan, Tat Y, Jia Li, and Lamar Pierce. 2014. “Learning from peers: Knowledge transfer and sales force productivity growth.” Marketing Science 33 (4): 463–484.
Chen, Jianqing, Hong Xu, and Andrew B Whinston. 2011. “Moderated online communities and quality of user-generated content.” Journal of Management Information Systems 28 (2): 237–268.
Chen, Xi, Ralf van der Lans, and Michael Trusov. 2021. “Efficient estimation of network games of incomplete information: Application to large online social networks.” Management Science 67 (12): 7575–7598.
Ching, Andrew T, Tülin Erdem, and Michael P Keane. 2013. “Learning models: An assessment of progress, challenges, and new developments.” Marketing Science 32 (6): 913–938.
Chintagunta, Pradeep K, Renna Jiang, and Ginger Z Jin. 2009. “Information, learning, and drug diffusion: The case of Cox-2 inhibitors.” Quantitative Marketing and Economics 7 (4): 399–443.
Chiong, Khai Xiang, Alfred Galichon, and Matt Shum. 2016. “Duality in dynamic discrete-choice models.” Quantitative Economics 7 (1): 83–115.
Christakis, Nicholas, James Fowler, Guido W Imbens, and Karthik Kalyanaraman. 2020. “An empirical model for strategic network formation.” In The Econometric Analysis of Network Data, 123–148. Elsevier.
Chylinski, Mathew B, John H Roberts, and Bruce GS Hardie. 2012. “Consumer learning of new binary attribute importance accounting for priors, bias, and order effects.” Marketing Science 31 (4): 549–566.
Cipriani, Giam Pietro, and Angelo Zago. 2011. “Productivity or discrimination? Beauty and the exams.” Oxford Bulletin of Economics and Statistics 73 (3): 428–447.
Cohn, Alain, Jan Engelmann, Ernst Fehr, and Michel André Maréchal. 2015. “Evidence for countercyclical risk aversion: An experiment with financial professionals.” American Economic Review 105 (2): 860–85.
Coleman, James S. 1988. “Free riders and zealots: The role of social networks.” Sociological Theory 6 (1): 52–57.
Cong, Ziwei, Jia Liu, and Puneet Manchanda. 2021. “The role of "live" in livestreaming markets: Evidence using orthogonal random forest.” arXiv preprint arXiv:2107.01629.
Cong, Ziwei, Ying Zhao, and Zilei Zhang. 2018. “Understanding users’ content contribution behavior when content can be priced.” Thesis. The Hong Kong University of Science and Technology.
Conover, Michael D, Jacob Ratkiewicz, Matthew Francisco, Bruno Gonçalves, Filippo Menczer, and Alessandro Flammini. 2011. “Political polarization on Twitter.” In Fifth International AAAI Conference on Weblogs and Social Media.
Coscelli, Andrea, and Matthew Shum. 2004. “An empirical model of learning and patient spillovers in new drug entry.” Journal of Econometrics 122 (2): 213–246.
DeGroot, Morris H. 2005.
Optimal statistical decisions. John Wiley & Sons. Del Vicario, Michela, Alessandro Bessi, Fabiana Zollo, Fabio Petroni, Antonio Scala, Guido Caldarelli, H Eugene Stanley, and Walter Quattrociocchi. 2016. “The spreading of misinformation online.” Proceedings of the National Academy of Sciences 113 (3): 554–559. Deryugina, Tatyana, and Olga Shurchkov. 2015. “Now you see it, now you don’t: The vanishing beauty premium.” Journal of Economic Behavior & Organization 116:331–345. Dover, Yaniv, Jacob Goldenberg, and Daniel Shapira. 2020. “Sustainable online communities exhibit distinct hierarchical structures across scales of size.” Proceedings of the Royal Society A 476 (2239): 20190730. Duclos, Rod, Echo Wen Wan, and Yuwei Jiang. 2013. “Show me the honey! Effects of social exclusion on financial risk-taking.” Journal of Consumer Research 40 (1): 122–135. Dunbar, Robin IM. 2016. “Do online social media cut through the constraints that limit the size of offline social networks?” Royal Society Open Science 3 (1): 150292. Dzemski, Andreas. 2019. “An empirical model of dyadic link formation in a network with unobserved heterogeneity.” Review of Economics and Statistics 101 (5): 763–776. Eagly, Alice H, Richard D Ashmore, Mona G Makhijani, and Laura C Longo. 1991. “What is beautiful is good, but...: A meta-analytic review of research on the physical attractiveness stereotype.” Psychological Bulletin 110 (1): 109. Easley, David, Jon Kleinberg, et al. 2010. Networks, crowds, and markets. Vol. 8. Cambridge University Press. Elder Jr, Glen H. 1969. “Appearance and education in marriage mobility.” American Sociological Review, 519–533. Erdem, Tülin, and Michael P Keane. 1996. “Decision-making under uncertainty: Capturing dynamic brand choice processes in turbulent consumer goods markets.” Marketing Science 15 (1): 1–20. Erdem, Tülin, Michael P Keane, and Baohong Sun. 2008. 
“A dynamic model of brand choice when price and advertising signal product quality.” Marketing Science 27 (6): 1111–1125. 134 Farrell, Max H, Tengyuan Liang, and Sanjog Misra. 2020. “Deep learning for individual heterogeneity.” arXiv preprint arXiv:2010.14694. Feingold, Alan. 1988. “Matching for attractiveness in romantic partners and same-sex friends: A meta-analysis and theoretical critique.” Psychological Bulletin 104 (2): 226. Filippin, Antonio, and Paolo Crosetto. 2016. “A reconsideration of gender differences in risk attitudes.” Management Science 62 (11): 3138–3160. Fuentes, J Rodrigo, and Edward E Leamer. 2019. Effort: The unrecognized contributor to US income inequality. Technical report. National Bureau of Economic Research. Gelman, Andrew, John B Carlin, Hal S Stern, David B Dunson, Aki Vehtari, and Donald B Rubin. 2013. Bayesian data analysis. Chapman & Hall/CRC. Geyser, Werner. 2022. “The state of influencer marketing 2022: benchmark report,” March. https://influencermarketinghub.com/influencer-marketing-benchmark-report/. Girvan, Michelle, and Mark EJ Newman. 2002. “Community structure in social and biological networks.” Proceedings of the National Academy of Sciences 99 (12): 7821–7826. Goettler, Ronald L, and Karen Clay. 2011. “Tariff choice with consumer learning and switching costs.” Journal of Marketing Research 48 (4): 633–652. Goh, Jie Mein, Guodong Gao, and Ritu Agarwal. 2016. “The creation of social value: Can an online health community reduce rural-urban health disparities?” MIS Quarterly 40 (1): 247–264. Goldsmith-Pinkham, Paul, and Guido W Imbens. 2013. “Social networks and the identification of peer effects.” Journal of Business & Economic Statistics 31 (3): 253–264. Graham, Bryan S. 2017. “An econometric model of network formation with degree heterogeneity.” Econometrica 85 (4): 1033–1063. Granovetter, Mark S. 1973. “The strength of weak ties.” American Journal of Sociology 78 (6): 1360–1380. 
Greene, Derek, Donal Doyle, and Padraig Cunningham. 2010. “Tracking the evolution of communities in dynamic social networks.” In 2010 International Conference on Advances in Social Networks Analysis and Mining, 176–183. IEEE. Grover, Aditya, and Jure Leskovec. 2016. “node2vec: Scalable feature learning for networks.” In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 855–864. 135 Gu, Liyi, Ilya Ryzhov, Shawn Mankad, and Bin Han. 2017. “Social behavior and user engagement in competitive online gaming: An empirical analysis.” Working Paper. Guerra, Pedro Calais, Wagner Meira Jr, Claire Cardie, and Robert Kleinberg. 2013. “A measure of polarization on social media networks based on community boundaries.” In Seventh international AAAI conference on weblogs and social media. Halford, Joseph T, and Hung-Chia S Hsu. 2020. “Beauty is wealth: CEO attractiveness and firm value.” Financial Review 55 (4): 529–556. Hall, Sarah Lindenfeld. 2022. “5 burnout triggers for creators and how to get support,” January. https://www.thetilt.com/content-entrepreneur/creator-burnout-support. Hamermesh, Daniel S. 2006. “Changing looks and changing “discrimination”: The beauty of economists.” Economics Letters 93 (3): 405–412. Hamermesh, Daniel S, and Jeff Biddle. 1994. “Beauty and the labor market.” American Economic Review 84 (5): 1174–1194. Hanaoka, Chie, Hitoshi Shigeoka, and Yasutora Watanabe. 2018. “Do risk preferences change? Evidence from the great east Japan earthquake.” American Economic Journal: Applied Economics 10 (2): 298–330. Hanna, Sherman D, and Suzanne Lindamood. 2010. “Quantifying the economic benefits of personal financial planning.” Financial Services Review 19 (2). He, Tingting, Dmitri Kuksov, and Chakravarthi Narasimhan. 2012. “Intraconnectivity and interconnectivity: When value creation may reduce profits.” Marketing Science 31 (4): 587–602. Henry, Adam Douglas, Paweł Prałat, and Cun-Quan Zhang. 2011. 
“Emergence of segregation in evolving social networks.” Proceedings of the National Academy of Sciences 108 (21): 8605–8610. Hinds, David, and Ronald M Lee. 2008. “Social network structure as a critical success condition for virtual communities.” In Proceedings of the 41st Annual Hawaii International Conference on System Sciences (HICSS 2008), 323–323. IEEE. Ho, Qirong, Junming Yin, and Eric P Xing. 2016. “Latent space inference of internet-scale networks.” The Journal of Machine Learning Research 17 (1): 2756–2796. Holland, Paul W, Kathryn Blackmond Laskey, and Samuel Leinhardt. 1983. “Stochastic blockmodels: First steps.” Social networks 5 (2): 109–137. 136 Horwitz, Jeff, and Deepa Seetharaman. 2020. “Facebook Executives Shut Down Efforts to Make the Site Less Divisive Facebook Executives Shut Down Efforts to Make the Site Less Divisive Facebook Executives Shut Down Efforts to Make the Site Less Divisive,” May. https://www.wsj.com/articles/facebook-knows-it-encourages-division-top-executives- nixed-solutions-11590507499. Hsieh, Chih-Sheng, and Lung Fei Lee. 2016. “A social interactions model with endogenous friendship formation and selectivity.” Journal of Applied Econometrics 31 (2): 301–319. Huang, Yan, Stefanus Jasin, and Puneet Manchanda. 2019. ““Level up”: Leveraging skill and engagement to maximize player game-play in online video games.” Information Systems Research 30 (3): 927–947. Huang, Yan, Param Vir Singh, and Kannan Srinivasan. 2014. “Crowdsourcing new product ideas under consumer learning.” Management Science 60 (9): 2138–2159. Hukal, Philipp, Ola Henfridsson, Maha Shaikh, and Geoffrey Parker. 2020. “Platform signaling for generating platform content.” MIS Quarterly 44 (3): 1177–1205. Hutchinson, Andrew. 2022. “YouTube generated $28.8 Billion in ad revenue in 2021, fueling the creator economy,” February. https://influencermarketinghub.com/influencer-marketing-benchmark-report/. Hwang, Elina H, and David Krackhardt. 2020. 
“Online knowledge communities: Breaking or sustaining knowledge silos?” Production and Operations Management 29 (1): 138–155. Israel, Mark. 2005. “Services as experience goods: An empirical examination of consumer learning in automobile insurance.” American Economic Review 95 (5): 1444–1463. Jackson, Matthew O. 2010. Social and economic networks. Princeton university press. Jain, Sanjay, and Kun Qian. 2021. “Compensating online content producers: A theoretical analysis.” Management Science 67 (11): 7075–7090. Jarrett, Kylie. 2022. Digital labor (digital Media and society). 1st ed. John Wiley & Sons. Java, Akshay, Xiaodan Song, Tim Finin, and Belle Tseng. 2007. “Why we Twitter: Understanding microblogging usage and communities.” In Proceedings of the 9th WebKDD and 1st SNA-KDD 2007 workshop on Web mining and social network analysis, 56–65. Karrer, Brian, and Mark EJ Newman. 2011. “Stochastic blockmodels and community structure in networks.” Physical Review E 83 (1): 016107. Katz, Michael L, and Carl Shapiro. 1985. “Network externalities, competition, and compatibility.” American Economic Review 75 (3): 424–440. 137 Katz, Michael L, and Carl Shapiro. 1994. “Systems competition and network effects.” The Journal of Economic Perspectives 8 (2): 93–115. Khern-am-nuai, Warut, Karthik Kannan, and Hossein Ghasemkhani. 2018. “Extrinsic versus intrinsic rewards for contributing reviews in an online platform.” Information Systems Research 29 (4): 871–892. Koch, Bruce S. 1981. “Income smoothing: An experiment.” Accounting Review, 574–586. Kuang, Lini, Ni Huang, Yili Hong, and Zhijun Yan. 2019. “Spillover effects of financial incentives on non-incentivized user engagement: Evidence from an online knowledge exchange platform.” Journal of Management Information Systems 36 (1): 289–320. Lawrence, Eric, John Sides, and Henry Farrell. 2010. “Self-segregation or deliberation? 
Blog readership, participation, and polarization in American politics.” Perspectives on Politics 8 (1): 141–157. Lemenager, Tagrid, Miriam Neissner, Anne Koopmann, Iris Reinhard, Ekaterini Georgiadou, Astrid Müller, Falk Kiefer, and Thomas Hillemacher. 2021. “COVID-19 lockdown restrictions and online media consumption in Germany.” International Journal of Environmental Research and Public Health 18 (1): 14. Li, Kun, Xiaofeng Gong, Shuguang Guan, and C-H Lai. 2012. “Efficient algorithm based on neighborhood overlap for community identification in complex networks.” Physica A: Statistical Mechanics and its Applications 391 (4): 1788–1796. Lin, Yu-Ru, Yun Chi, Shenghuo Zhu, Hari Sundaram, and Belle L Tseng. 2009. “Analyzing communities and their evolutions in dynamic social networks.” ACM Transactions on Knowledge Discovery from Data (TKDD) 3 (2): 1–31. Lin, Yan, Dai Yao, and Xingyu Chen. 2021. “Happiness begets money: emotion and engagement in live streaming.” Journal of Marketing Research 58 (3): 417–438. Liu, Haoyu, Kim Hua Tan, and Kulwant Pawar. 2022. “Predicting viewer gifting behavior in sports live streaming platforms: the impact of viewer perception and satisfaction.” Journal of Business Research 144:599–613. Lorenz, Taylor. 2021. “Young creators are burning out and breaking down,” September. https://www.nytimes.com/2021/06/08/style/creator-burnout-social-media.html. Lu, Shijie, Dai Yao, Xingyu Chen, and Rajdeep Grewal. 2021. “Do larger audiences generate greater revenues under pay what you want? evidence from a live streaming platform.” Marketing Science 40 (5): 964–984. 138 Lu, Yingda, Kinshuk Jerath, and Param Vir Singh. 2013. “The emergence of opinion leaders in a networked online community: A dyadic model with time dynamics and a heuristic for fast estimation.” Management Science 59 (8): 1783–1799. Ma, Xuejing, Zetao Wang, and Hongju Liu. 2022. “Do long-life customers pay more in pay-what-you-want pricing? 
Evidence from live streaming.” Journal of Business Research 142:998–1009. Malik, Nikhil, Param Vir Singh, and Kannan Srinivasan. 2019. “A dynamic analysis of beauty premium.” Working Paper. Manchanda, Puneet, Grant Packard, and Adithya Pattabhiramaiah. 2015. “Social dollars: The economic impact of customer participation in a firm-sponsored online customer community.” Marketing Science 34 (3): 367–387. Manski, Charles F. 1993. “Identification of endogenous social effects: The reflection problem.” The Review of Economic Studies 60 (3): 531–542. . 2000. “Economic analysis of social interactions.” Journal of Economic Perspectives 14 (3): 115–136. Marzouki, Yousri, Fatimah Salem Aldossari, and Giuseppe A Veltri. 2021. “Understanding the buffering effect of social media use on anxiety during the COVID-19 pandemic lockdown.” Humanities and Social Sciences Communications 8 (1): 1–10. McNulty, James K, Lisa A Neff, and Benjamin R Karney. 2008. “Beyond initial attraction: physical attractiveness in newlywed marriage.” Journal of Family Psychology 22 (1): 135. Mehta, Nitin, Surendra Rajiv, and Kannan Srinivasan. 2004. “Role of forgetting in memory-based choice decisions: A structural model.” Quantitative Marketing and Economics 2 (2): 107–140. Mims, Christopher. 2020. “Why Social Media Is So Good at Polarizing Us,” October. https: //www.wsj.com/articles/why-social-media-is-so-good-at-polarizing-us-11603105204. Mobius, Markus M, and Tanya S Rosenblat. 2006. “Why beauty matters.” American Economic Review 96 (1): 222–235. Morduch, Jonathan. 1995. “Income smoothing and consumption smoothing.” Journal of Economic Perspectives 9 (3): 103–114. Morris, Stephen. 2000. “Contagion.” The Review of Economic Studies 67 (1): 57–78. Murthy, Naimeesha. 2021. “The continuous growth and future of the creator economy,” August. https://www.forbes.com/sites/forbesbusinessdevelopmentcouncil/2021/08/30/the- continuous-growth-and-future-of-the-creator-economy/?sh=211a64467c9c. 
139 Narayanan, Sridhar, and Puneet Manchanda. 2009. “Heterogeneous learning and the targeting of marketing communication for new products.” Marketing Science 28 (3): 424–441. Newman, Mark EJ. 2006. “Modularity and community structure in networks.” Proceedings of the National Academy of Sciences 103 (23): 8577–8582. Newman, Mark EJ, and Michelle Girvan. 2004. “Finding and evaluating community structure in networks.” Physical Review E 69 (2): 026113. Onnela, J-P, Jari Saramäki, Jorkki Hyvönen, György Szabó, David Lazer, Kimmo Kaski, János Kertész, and A-L Barabási. 2007. “Structure and tie strengths in mobile communication networks.” Proceedings of the National Academy of Sciences 104 (18): 7332–7336. Osborne, Matthew. 2011. “Consumer learning, switching costs, and heterogeneity: A structural examination.” Quantitative Marketing and Economics 9 (1): 25–70. Park, Eunho, Rishika Rishika, Ramkumar Janakiraman, Mark B Houston, and Byungjoon Yoo. 2018. “Social dollars in online communities: The effect of product, user, and network characteristics.” Journal of Marketing 82 (1): 93–114. Parker, Geoffrey, Marshall Van Alstyne, and Xiaoyue Jiang. 2017. “Platform ecosystems: How developers invert the firm.” MIS Quarterly 41 (1): 255–266. Peng, Ling, Geng Cui, Yuho Chung, and Wanyi Zheng. 2020. “The faces of success: Beauty and ugliness premiums in e-commerce platforms.” Journal of Marketing 84 (4): 67–85. Pesaran, M Hashem, and Qiankun Zhou. 2018. “Estimation of time-invariant effects in static panel data models.” Econometric Reviews 37 (10): 1137–1171. Póvoa, Angela Cristiane Santos, Wesley Pech, Juan José Camou Viacava, and Marcos Tadeu Schwartz. 2020. “Is the beauty premium accessible to all? An experimental analysis.” Journal of Economic Psychology 78:102252. Qian, Kun, and Ying Xie. 2022. “The power of star creators: Evidence from the live streaming industry.” Available at SSRN 4123516. Qiu, Jiezhong, Jian Tang, Hao Ma, Yuxiao Dong, Kuansan Wang, and Jie Tang. 2018. 
“Deepinf: Social influence prediction with deep learning.” In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, 2110–2119. Rhoades, Stephen A. 1993. “The herfindahl-hirschman index.” Federal Reserve Bulletin 79:188. Riedl, Christoph, and Victor P Seidel. 2018. “Learning from mixed signals in online innovation communities.” Organization Science 29 (6): 1010–1032. 140 Rossi, Peter E, Greg M Allenby, and Rob McCulloch. 2012. Bayesian statistics and marketing. John Wiley & Sons. Ruffle, Bradley J, and Ze’ev Shtudiner. 2015. “Are good-looking people more employable?” Management Science 61 (8): 1760–1776. Sargent, Thomas J. 1987. Macroeconomic Theory (Economic Theory, Econometrics, and Mathematical Economics Series). Emerald Group Publishing Limited, United Kingdom. Schmidt, Ana Lucıa, Fabiana Zollo, Antonio Scala, Cornelia Betsch, and Walter Quattrociocchi. 2018. “Polarization of the vaccination debate on Facebook.” Vaccine 36 (25): 3606–3612. Schubert, Renate, Martin Brown, Matthias Gysler, and Hans Wolfgang Brachinger. 1999. “Financial decision-making: are women really more risk-averse?” American Economic Review 89 (2): 381–385. Shore, Jesse, Jiye Baek, and Chrysanthos Dellarocas. 2018. “Network structure and patterns of information diversity on Twitter.” MIS Quarterly 42 (3): 849–972. Shriver, Scott K, Harikesh S Nair, and Reto Hofstetter. 2013. “Social ties and user-generated content: Evidence from an online social network.” Management Science 59 (6): 1425–1443. Shust, Mark. 2021. “Creator tip: Handling inconsistent revenue streams as a full-time course creator,” December. https://teachable.com/blog/handling-inconsistent-revenue-streams. Stinebrickner, Ralph, Todd Stinebrickner, and Paul Sullivan. 2019. “Beauty, job tasks, and wages: A new conclusion about employer taste-based discrimination.” Review of Economics and Statistics 101 (4): 602–615. Stock, James H, Jonathan H Wright, and Motohiro Yogo. 2002. 
“A survey of weak instruments and weak identification in generalized method of moments.” JournalofBusiness&Economic Statistics 20 (4): 518–529. Sun, Monic, and Feng Zhu. 2013. “Ad revenue and content commercialization: Evidence from blogs.” Management Science 59 (10): 2314–2331. Sun, Yacheng, Xiaojing Dong, and Shelby McIntyre. 2017. “Motivation of user-generated content: Social connectedness moderates the effects of monetary rewards.” Marketing Science 36 (3): 329–337. Tang, Jian, Meng Qu, Mingzhe Wang, Ming Zhang, Jun Yan, and Qiaozhu Mei. 2015. “Line: Large-scale information network embedding.” In Proceedings of the 24th International Conference on World Wide Web, 1067–1077. 141 Thornton, Rebecca Achee, and Peter Thompson. 2001. “Learning from experience and learning from others: An exploration of learning and spillovers in wartime shipbuilding.” American Economic Review 91 (5): 1350–1368. Toubia, Olivier, and Andrew T Stephen. 2013. “Intrinsic vs. image-related utility in social media: Why do people contribute content to twitter?” Marketing Science 32 (3): 368–392. Train, Kenneth E. 2009. Discrete choice methods with simulation. Cambridge University Press. Tsutsui, Yoshiro, and Iku Tsutsui-Kimura. 2022. “How does risk preference change under the stress of COVID-19? Evidence from Japan.” Journal of Risk and Uncertainty, 1–22. Van Alstyne, Marshall, and Erik Brynjolfsson. 2005. “Global village or cyber-balkans? Modeling and measuring the integration of electronic communities.” Management Science 51 (6): 851–868. Wager, Stefan, and Susan Athey. 2018. “Estimation and inference of heterogeneous treatment effects using random forests.” Journal of the American Statistical Association 113 (523): 1228–1242. Wang, Jing, Gen Li, and Kai-Lung Hui. 2021. “Monetary incentives and knowledge spillover: Evidence from a natural experiment.” Management Science. Wang, Xunyi, Reza Mousavi, and Yili Hong. 2020. 
“The unintended consequences of stay-at-home policies on work outcomes: The impacts of lockdown orders on content creation.” arXiv preprint arXiv:2011.15068. Watts, Duncan J, and Steven H Strogatz. 1998. “Collective dynamics of ‘small-world’ networks.” Nature 393 (6684): 440–442. Weber, Elke U, Ann-Renee Blais, and Nancy E Betz. 2002. “A domain-specific risk-attitude scale: Measuring risk perceptions and risk behaviors.” Journal of Behavioral Decision Making 15 (4): 263–290. Wei, Yanhao’Max’, and Zhenling Jiang. 2021. “Estimating parameters of structural models using neural networks.” USC Marshall School of Business Research Paper. Weng, Lilian, Filippo Menczer, and Yong-Yeol Ahn. 2013. “Virality prediction and community structure in social networks.” Scientific Reports 3 (1): 1–6. Wu, Chunhua, Hai Che, Tat Y Chan, and Xianghua Lu. 2015. “The economic value of online reviews.” Marketing Science 34 (5): 739–754. Wu, Yang, Gengfeng Niu, Zhenzhen Chen, and Dongjing Zhang. 2022. “Purchasing social attention by tipping: Materialism predicts online tipping in live-streaming platform through self-enhancement motive.” Journal of Consumer Behaviour 21 (3): 468–480. 142 Yang, Tianbao, Yun Chi, Shenghuo Zhu, Yihong Gong, and Rong Jin. 2011. “Detecting communities and their evolutions in dynamic social networks—a Bayesian approach.” Machine Learning 82 (2): 157–189. Yildirim, Pinar, Yanhao Wei, Christophe Van den Bulte, and Joy Lu. 2020. “Social network design for inducing effort.” Quantitative Marketing and Economics 18 (4): 381–417. Yuan, Yuan, Ahmad Alabdulkareem, et al. 2018. “An interpretable approach for social network formation among heterogeneous agents.” Nature Communications 9 (1): 1–9. Zhang, Cheng, Chee Wei Phang, Qingsheng Wu, and Xueming Luo. 2017. “Nonlinear effects of social connections and interactions on individual goal attainment and spending: Evidences from online gaming markets.” Journal of Marketing 81 (6): 132–155. Zhang, Juanjuan. 2010. 
“The sound of silence: Observational learning in the US kidney market.” Marketing Science 29 (2): 315–335. Zhao, Huazhong, Haibing Gao, and Jinhong Xie. 2017. “The competitiveness of social interactions as a marketing variable in social gaming.” Working Paper. Zhao, Keran, Yingda Lu, Yuheng Hu, and Yili Hong. 2022. “Direct and indirect spillovers from content providers’ switching: Evidence from online livestreaming.” Information Systems Research. Zhao, Yi, Sha Yang, Vishal Narayan, and Ying Zhao. 2013. “Modeling consumer learning from online product reviews.” Marketing Science 32 (1): 153–169. Zhao, Yi, Sha Yang, Matthew Shum, and Shantanu Dutta. 2022. “A dynamic model of player level-progression decisions in online gaming.” Management Science 68 (11): 8062–8082. Zhao, Yi, Ying Zhao, and Kristiaan Helsen. 2011. “Consumer learning in a turbulent market environment: Modeling consumer choice dynamics after a product-harm crisis.” Journal of Marketing Research 48 (2): 255–267. Zhu, Linhong, Dong Guo, Junming Yin, Greg Ver Steeg, and Aram Galstyan. 2016. “Scalable temporal latent space inference for link prediction in dynamic social networks.” IEEE Transactions on Knowledge and Data Engineering 28 (10): 2765–2777. Zhu, Rui, Utpal M Dholakia, Xinlei Chen, and René Algesheimer. 2012. “Does online community participation foster risky financial behavior?” JournalofMarketingResearch 49 (3): 394–407. Zuckerman, Marvin. 2014. Sensation seeking (psychology revivals): Beyond the optimal level of arousal. Psychology Press. 143 Appendices A MomentsforNNE Below, we describe the summary momentsm (see Section 1.5.2). First, we include a set of network summaries, including: (i) the average degree, (ii) the variance of the degree distribution, and (iii) the clustering coefficient, of each period t = 1,2,3,4. This amounts to 12 moments. Second, we include the mean and covariance of the variables in Equation 1.2. 
Specifically, let

$$x^F_{ij,t-1} = \left(y_{ij,t-1},\ \text{SameComm}_{ij,t-1},\ \text{SumDegree}_{ij,t-1},\ \text{NetDistance}_{ij,t-1}\right).$$

In the above, NetDistance_{ij,t-1} is a measure of the network distance between i and j in y_{t-1}. It is defined as follows. Let S_{ij}(y_t) be the length of the shortest path between i and j in y_t. Then

$$\text{NetDistance}_{ijt} = \frac{S_{ij}(y_t)}{1 + S_{ij}(y_t)}.$$

We adopt this definition because S_{ij} can take the value +∞ when there is no path connecting i and j. In that case, the network distance above still takes a well-defined finite value (equal to 1).

A natural choice of moments here would be the mean vector and covariance matrix of (y_{ijt}, x^F_{ij,t-1}). We find that under this choice, NNE recovers the parameters reasonably well. However, we obtain better performance when separating these moments by the value of y_{ijt}. That is, we include: (i) the mean vector and covariance matrix of x^F_{ij,t-1} conditional on y_{ijt} = 1, and (ii) the mean vector and covariance matrix of x^F_{ij,t-1} conditional on y_{ijt} = 0. This alternative choice amounts to (4 + 10) × 2 = 28 moments for each t. Because we estimate time-invariant parameters, we average each of these moments across t = 2, 3, 4 and include only the averages in m.

Third, we include the mean and covariance of the variables in Equation 1.4. Specifically, let

$$x^C_{ik,t-1} = \left(g_{ik,t-1},\ \text{PresenceFriend}_{ik,t-1},\ \text{AvgDistance}_{ik,t-1},\ \text{AvgDegree}_{k,t-1}\right).$$

In the above, AvgDistance_{ik,t-1} is the average network distance between i and the members of community k in period t−1. That is,

$$\text{AvgDistance}_{ikt} = \frac{\sum_{j=1}^{n} g_{jkt} \cdot \text{NetDistance}_{ijt}}{\sum_{j=1}^{n} g_{jkt}}.$$

AvgDegree_{k,t-1} is the average of the log degrees of community k's members in period t−1. Similar to the choice of our second set of moments above, we include: (i) the mean vector and covariance matrix of x^C_{ik,t-1} conditional on g_{ikt} = 1, and (ii) the mean vector and covariance matrix of x^C_{ik,t-1} conditional on g_{ikt} = 0. This amounts to (4 + 10) × 2 = 28 moments.
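As an illustration, the bounded network distance S/(1+S) can be computed with a breadth-first search over the period-t network. The sketch below is a minimal pure-Python version; the adjacency-dict representation and function names are illustrative assumptions, not the dissertation's actual code:

```python
from collections import deque

def shortest_path_length(adj, i, j):
    """BFS shortest-path length in an undirected graph given as an adjacency dict.
    Returns float('inf') when i and j are not connected."""
    if i == j:
        return 0
    seen, frontier, dist = {i}, deque([i]), {i: 0}
    while frontier:
        u = frontier.popleft()
        for v in adj.get(u, ()):
            if v not in seen:
                seen.add(v)
                dist[v] = dist[u] + 1
                if v == j:
                    return dist[v]
                frontier.append(v)
    return float("inf")

def net_distance(adj, i, j):
    """Bounded distance S/(1+S); equals 1 exactly when no path exists."""
    s = shortest_path_length(adj, i, j)
    return 1.0 if s == float("inf") else s / (1 + s)

# Toy network: path 0-1-2 plus an isolated node 3
adj = {0: [1], 1: [0, 2], 2: [1], 3: []}
print(net_distance(adj, 0, 2))  # S = 2, so distance = 2/3
print(net_distance(adj, 0, 3))  # no path, so distance = 1.0
```

Because S/(1+S) is increasing in S and approaches 1 as S grows, unreachable pairs can be assigned the finite value 1 without special-casing them in the downstream moment computations.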
We again average each moment across t = 2, 3, 4 and include only the averages in m.

There is one detail on the computation of moments. When computing the moments conditional on y_{ijt} = 0, we randomly sample 3% of the (i, j) pairs with y_{ijt} = 0 to calculate the moments. This is because there is a very large number of pairs with y_{ijt} = 0 in a sparse network, and we only require a fraction of them to obtain sufficiently accurate moments. (If the simulated network is dense under a set of parameters, that set of parameters can be considered unlikely and dropped when training the NNE.) The same idea applies to the moments conditional on g_{ikt} = 0, where we sample 10% of the (i, k) pairs with g_{ikt} = 0. To verify that these sampling rates are sufficient, we have tried larger rates (5% and 15%, respectively); the performance of NNE does not change significantly.

B Tie Transition Probabilities

Table B.1 is based on the same model simulations used in Table 1.4, but it reports results from a different perspective: it conditions on the type of ties at t = 1 instead of at time T. Take the results at T = 25 for example. The table says that, among the ties that were born within-community at t = 1 and survive to T = 25, 82.97% become cross-community ties at T = 25. The lower panel of the table weights links by their strengths at t = 1.
Table B.1: Evolvement of Within- and Cross-Community Ties

                                    T = 25             T = 50             T = 75
                                Within    Cross    Within    Cross    Within    Cross
Unweighted       Born Within    17.03%   82.97%     5.43%   94.57%     2.67%   97.33%
                                (0.39)   (0.39)    (0.31)   (0.31)    (0.18)   (0.18)
                 Born Cross      1.58%   98.42%     1.55%   98.45%     1.40%   98.60%
                                (0.03)   (0.03)    (0.04)   (0.04)    (0.04)   (0.04)
Weighted by      Born Within    21.33%   78.67%     7.41%   92.59%     3.66%   96.34%
Link Strength                   (1.02)   (1.02)    (0.82)   (0.82)    (0.58)   (0.58)
                 Born Cross      1.32%   98.68%     1.46%   98.54%     1.55%   98.45%
                                (0.24)   (0.24)    (0.20)   (0.20)    (0.21)   (0.21)

C COVID-19 Induced Lockdowns in China

We define a "physical lockdown" as a province or equivalent administrative unit ("province" for short) declaring a Level I public health emergency (the highest possible level) regarding the COVID-19 pandemic. Although such an emergency state is announced at the province level, the Chinese central government has unified rules for when and how to declare it. The central government also coordinates and provides guidance to local governments on specific policies; thus the overall policy environment under a Level I emergency is consistent across provinces. From January 23 to January 30, 2020, all provinces activated a Level I public health emergency response toward COVID-19.⁷ Table C.1 lists the starting time of the physical lockdown in each province.

Table C.1: Lockdown Timing

Start Date   Province
01/23/2020   Zhejiang, Hunan, Guangdong
01/24/2020   Beijing, Tianjin, Hebei, Shanghai, Jiangsu, Anhui, Fujian, Jiangxi, Shandong, Hubei, Guangxi, Chongqing, Sichuan, Guizhou, Yunnan
01/25/2020   Shanxi, Inner Mongolia, Liaoning, Jilin, Heilongjiang, Henan, Shaanxi, Gansu, Qinghai, Ningxia, Xinjiang
01/30/2020   Tibet

7. http://www.scio.gov.cn/m/zfbps/32832/Document/1681809/1681809.htm

D Estimation Results: Learning Model

Figure D.1 displays the estimation results of the learning parameters. Figure D.1 (a) displays the distribution of the θ_{i0} estimates, i.e., the importance of creator i's intrinsic aspect in her reward evolvement.
Figure D.1 (b) and (c) plot the distributions of the θ_{i1} and θ_{i2} estimates, which capture how creators learn from their content creation and consumption, respectively. A comparison between Figure D.1 (b) and (c) reveals that learning from creation (learning-by-doing), as opposed to learning from consumption (learning-by-observation), is more effective in predicting creators' monetary rewards over time. Figure D.1 (d) displays the distribution of the σ²_i estimates, i.e., the variance of the prediction error η_{ijt}, which measures the level of uncertainty.

Figure D.1: Learning Model Estimates — (a) distribution of θ_{i0}; (b) distribution of θ_{i1}; (c) distribution of θ_{i2}; (d) distribution of σ²_i.

Note that across all plots in Figure D.1, we replace an individual's learning parameters with the population-level estimates if the individual content creator does not have sufficient earning information to form a belief about the parameters (e.g., does not receive any monetary reward from any of their live streaming creations).

E Policy Results: Impacts on Content Variety Measured by Entropy

Table E.1 displays the content entropy under different combinations of subsidy and tax rates, where the entropy is defined as

$$H = -\sum_{j=1}^{6} P_j \log P_j,$$

where P_j is the relative frequency of content j. The lower the concentration, the greater the entropy H. For instance, under the benchmark case (without smoothing incomes over time), the content entropy is 1.62. If the platform collects 50% of the monetary reward from each creation and fully levels out the risk (i.e., with a 100% subsidy rate), the content entropy increases to 1.63, suggesting an improvement in content variety (i.e., less concentration) on the platform under the policy. The overall conclusions remain the same as discussed in Section 2.7.1.
Table E.1: Content Variety

Subsidy                                Tax Rate
Rate         0%       10%       20%       30%       40%       50%       60%
0%         1.6221      −         −         −         −         −         −
          (0.0008)
25%        1.6225    1.6240    1.6247    1.6257    1.6249    1.6260    1.6274
          (0.0008)  (0.0008)  (0.0008)  (0.0007)  (0.0009)  (0.0008)  (0.0008)
50%        1.6217    1.6245    1.6235    1.6248    1.6260    1.6272    1.6278
          (0.0009)  (0.0007)  (0.0008)  (0.0007)  (0.0007)  (0.0007)  (0.0007)
75%        1.6237    1.6228    1.6235    1.6239    1.6254    1.6258    1.6274
          (0.0007)  (0.0008)  (0.0008)  (0.0007)  (0.0007)  (0.0008)  (0.0007)
100%       1.6227    1.6228    1.6240    1.6252    1.6251    1.6273    1.6281
          (0.0007)  (0.0008)  (0.0008)  (0.0008)  (0.0008)  (0.0008)  (0.0008)
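The entropy measure used in Appendix E can be computed in a few lines. The sketch below assumes natural logarithms (consistent with the reported values staying below the six-type maximum of ln 6 ≈ 1.79) and uses made-up content counts for illustration:

```python
import math

def content_entropy(counts):
    """Shannon entropy H = -sum_j P_j * log(P_j) of a content mix.
    P_j is the relative frequency of content type j; empty types contribute 0."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)

# A uniform mix over 6 content types attains the maximum entropy log(6):
print(content_entropy([1, 1, 1, 1, 1, 1]))   # ≈ 1.7918
# A more concentrated mix yields lower entropy, i.e., less content variety:
print(content_entropy([50, 10, 10, 10, 10, 10]))
```

Under this convention, the move from 1.62 in the benchmark to 1.63 under the 50% tax / 100% subsidy policy corresponds to the content mix shifting slightly toward the uniform benchmark of ln 6.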