Improving information diversity on social media
Improving Information Diversity on Social Media

by Ho-Chun Herbert Chang

A Dissertation Presented to the FACULTY OF THE USC GRADUATE SCHOOL, UNIVERSITY OF SOUTHERN CALIFORNIA, In Partial Fulfillment of the Requirements for the Degree DOCTOR OF PHILOSOPHY (COMMUNICATION)

May 2023

Copyright 2023 Ho-Chun Herbert Chang

Dedication

To my grandparents Chang Ming-Feng and Lu Ming-Chu, parents Chang Chien-Hsien and Mei-Hong Wen, and sister Helena Chang, who have supported me in every endeavor unconditionally.

Acknowledgements

The pandemic was an incredibly lonely time, and this dissertation would not have been possible without the patience, compassion, and warmth of so many important people.

I would like to express my deepest gratitude to my committee. Thank you, Dr. Jamie Druckman, for sharpening my theoretical application of research (at an Evanston café), and for showing how rigor and kindness can be combined. Thank you, Dr. Pablo Barbera, for showing me how to sustain research through humor, empathy, and humility. Thank you, Dr. Marlon Twyman, for always identifying what my work lacks most, brainstorming connective tissue, and affirming intuitions for the right way to do things. Last but not least, thank you to my advisor and chair Dr. Emilio Ferrara, for teaching me most of what I know about computational social science, being my greatest supporter at USC, and believing in my work through these four years.

Special thanks to the ANN Group (Peter, Janet, Aimei, Lindsay, and Dmitri Williams) for always being generous with their time and firm feedback. I have immense gratitude for my mentors Robb Willer, Allissa Richardson, and Kristina Lerm, and for Feng Fu and Spencer Topel, who have been my constant supporters throughout my research career.
Special thanks to the great friends I've made in graduate school, as intellectual, affective, and dinner companions: Alex, Atharva, Becky, Chris, Donna, Eugene J., Eugene L., Feixue, Goran, Jack, Julie, Junyi, Kim, Lichen, Lucas, Max, Melisa, Martin, Mingxuan, Nick, Pete, Sierra, Soyun, Sukyoung, and Sunny.

Lastly, I'd like to thank my family Hank, May, Helena, Evelyn, Ming-Feng, Yue-Bi, my friends in Taiwan, and girlfriend Ashley for all the love and support, and for making these four years all the brighter.

Table of Contents

Dedication
Acknowledgements
List of Tables
List of Figures
Abstract
Chapter 1: Introduction
  1.1 A Bar Joke from Reddit (2011)
  1.2 Deliberative Democracies and Participatory Politics
  1.3 Information Diversity
  1.4 Social Media: Networks of Deliberation
  1.5 Three case studies
    1.5.1 Bar Patrons: Group Dynamics of Online Communication
    1.5.2 Counter, Glass, Ambiance: Information Multiplexity and Modal Affordances
    1.5.3 The Bartender: Algorithms and Social Networks
    1.5.4 How these studies relate theoretically
Chapter 2: Bar Fights on Social Media: Ideological Asymmetries, Group Dynamics, and Toxicity
  2.1 Introduction
  2.2 Data
  2.3 Results
    2.3.1 Policy Engagement
    2.3.2 Toxicity
    2.3.3 Supply-Side
  2.4 Discussion
  2.5 Method
  2.6 Supplementary Information
Chapter 3: Interior Design: Multimodal and Multiplex Affordances from the 2020 BLM Movement
  3.1 Abstract
  3.2 Introduction
    3.2.1 Black Lives Matter and Social Media
    3.2.2 A Political Instagram
    3.2.3 Organizational Dynamics: Modality and Opinion Leadership
  3.3 Literature Review
    3.3.1 Richness and Affordances
    3.3.2 Framing
    3.3.3 Geography and Networks
    3.3.4 Research Questions
  3.4 Materials and Methods
    3.4.1 Data Collection and Description
    3.4.2 Visual Content Analysis through Perceptual Hashing
    3.4.3 Network Analysis
  3.5 Results
    3.5.1 Temporal characterization of the movement
      3.5.1.1 Geographic Network Analysis
    3.5.2 Injustice Symbols and Legitimization
      3.5.2.1 Visual Content Analysis
    3.5.3 Networked Flow and Opinion Leaders
  3.6 Conclusion
Chapter 4: The Bartending Algorithm: A Mathematical Model of Social Communication
  4.1 Abstract
  4.2 Literature
  4.3 Design Principles
  4.4 Formalism
  4.5 Tie-Level Analysis
    4.5.1 Strong Ties and Weak Ties
    4.5.2 Closed-form visualizations
  4.6 Complex Systems: Adding Variance
    4.6.1 Variance via Social Networks
    4.6.2 Variance via Multinomial Sampling
  4.7 Group Dynamics: Dichotomous Polarization
  4.8 Conclusion
  4.9 Appendix
    4.9.1 Explorations of Closed-form Solutions
Chapter 5: Conclusions
  5.1 Simple Open Questions
  5.2 Takeaways
Bibliography

List of Tables

2.1 Policy versus non-toxic/toxic content, across Democratic and Republican MCs. Democrats tweet about policy with more toxicity than Republicans (4.4 versus 7.3 civility ratio).

2.2 Accuracy and F1 values per topic from the deep learning model (using the BERTweet architecture). Each policy topic is labeled and derived from the Comparative Agendas Project.

2.3 BERTweet classifier accuracy for policy versus not policy.

3.1 Top states by number of posts and comparison with actual population statistics. The percentage by population shows the actual percentage relative to the entire population of the United States.
As such, the post-to-pop. ratio describes the level of over-representation a certain state has, with D.C. leading at 20.14 times the representation.

3.2 Top hashtags of 540,591 unique hashtags.

3.3 Interstate network structure and proportion of posts seen within-state, exported to other states, and imported from other states.

3.4 Top 10 accounts by likes in the dataset. Opinion leaders include magazines and meme pages, in addition to institutional or established celebrities.

List of Figures

2.1 Density estimate of audience response by Republican MCs (red) and Democrat MCs (blue). Republican MCs garner a more diverse audience: more liberal users tune in.

2.2 Left-/right-leaning agenda items versus audience diversity. Negative values indicate Democrat MCs tweet more; positive values indicate Republican MCs tweet more. Republican topic labels are equivalent to the Democrat labels vertically below. Republican MCs elicit greater response from liberal users. When Democrat MCs tweet about defense or macroeconomics, diversity is high. When Democrat MCs tweet about civil rights or the environment, diversity is low. Republican MCs have high audience diversity across all topics, which indicates the presence of Democrat re-shares. Thus, Democrat cross-cutting engagement is overall higher.

2.3 Liberals are exposed to and engage with more policy topics than conservatives, shown through a) unique policy topics retweeted and b) aggregate measures of entropy (the further right the sigmoid curve, the more diversity). c) plots topic entropy against ideological extremism with LOESS; we observe that topic entropy decreases with extremism, until a pullback at higher levels of extremism.

2.4 Virality versus toxicity, split by (a) partisan in-/out-group mentions and (b) user-level response. Out-group tweets are in general more viral and uncivil (a). Republican out-group tweets can be viral (red) due to response from the left (a). Democrat users respond much more drastically to toxicity than Republicans (b).

2.5 In-/out-group effects on (a) toxicity, (b) in-group composition, (c) political position, and (d) virality.

2.6 CatBoost regression on the four variables, ranked top to bottom in terms of feature importance, separated by MC tweets (Democrats/left, Republicans/right). We find toxicity and group status more predictive for Democratic MC tweets and policy topic more predictive for Republican MC tweets.

2.7 Partisan agenda items versus sensitivity to toxicity (slope of toxicity to virality), broken down by liberal and conservative users. Democrats seem to discuss international affairs with less toxicity. Labeled dots are explicitly discussed.

2.8 Average political position of audience, by topic and source partisanship (party of the MC). Validates "starting points" for diversity values in Figure 2.2.

2.9 Raw values for the supply in the axes of Figure 2.2 and Figure 2.3. Blue denotes the number of tweets from Democratic MCs; red denotes the number of tweets from Republican MCs.

3.1 Volume of Instagram posts plotted on an hourly basis, separated by top hashtags shared during the George Floyd protests.

3.2 Network of state-state exposure based on aggregated user engagement. Direction of attention is shown by arrows, colored by general regions in the United States.

3.3 Top photos that emerged from the 2020 George Floyd protests. From top to bottom and left to right, we have the Black Out Tuesday Square (a), logos and icons of the Black Lives Matter movement (b, d, i), portraits of George Floyd (c, e, g), and photos of protest (h).

3.4 Time series of top icons diffused during the George Floyd protests. Figure 3.4a) shows the diffusion of the three BLM logos and the funding infographic. Figure 3.4b) shows the three iterations of George Floyd's portrait. We observe much earlier volume in the portraits as compared to BLM and protest organizations.

3.5 Network of users (n=713,209), where links are constructed between commenting users and the original poster. Small, dense communities in the center indicate diverse consumption of content.

4.1 Heat map of the number of topics K versus the topic homophily constant. Fig. 4.1a) shows the overall entropy derived from the information environment. Fig. 4.1b) shows the difference in entropy across strong ties and weak ties.

4.2 Heat map of the number of topics K versus the strong-tie amplification constant β. Fig. 4.2a) shows the overall entropy derived from the information environment. Fig. 4.2b) shows the difference in entropy across strong ties and weak ties.

4.3 Kernel density estimates on varied message supply (n = 2, 5, 10, 25, 50), with K = 50.

4.4 Interaction of number of messages with entropy. Fig. 4.4a) shows the mean entropy and Fig. 4.4b) the variance, across a = 1, 5, 10, 15. The horizontal line in a) denotes the theoretical limit given by log K.

4.5 Information diversity by the distribution of a) unique topics and b) the cumulative density functions for entropy, across strong ties and weak ties. The distribution of underlying topics is uniform, then adjusted by homophily factor α.

4.6 Information diversity by the distribution of a) unique topics and b) the cumulative density functions for entropy, across strong ties and weak ties. The distribution of underlying topics undergoes perturbation by a lognormal factor, is sorted, then adjusted by α.

4.7 Entropy ratios between G1 and G2, enumerating the four case-combinations of α₁, α₂, β₁, β₂. The second quadrant (a₂ < a₁, b₂ > b₁) produces all three possibilities, hence a space where intervention is possible.

4.8 Simulated results of the information environments between liberals and conservatives. Assuming conservatives have greater perception of in-group ties (β₂ > β₁) and are topically less heterogeneous (α₂ > α₁), conservatives end up with less information diversity.

Abstract

This dissertation examines information diversity on social media. Through three case studies, I investigate information diversity's relationship with media richness theory, algorithm-driven framing, affective versus attitudinal diffusion, and parasocial interaction. The first case study examines whether liberals and conservatives respond differently to elite communication on Twitter, in light of long-standing questions about (a)symmetric behavior based on ideology. I find a demand-side asymmetry, where liberals engage with more diverse policy issues, with more toxic content, and, as a result, across the aisle; conservatives engage with Republican-owned issues with less toxicity. The second case study investigates how Instagram facilitated the 2020 Black Lives Matter movement, following the murder of George Floyd.
I find a shift toward non-institutional, entertainment-based opinion leaders, and a subsequently more positive framing of the protesters. I argue these affordances produce a divergence from the typical framing produced by traditional media, and facilitated global solidarity and greater coalition formation. The third case study proposes a general framework for measuring information diversity at and within the user level, adaptable to multiplex and multimodal settings. Given K topics and a percentage r of tie divisions, I derive closed-form equations for the mean information diversity based on topic homophily and strong-tie amplification. I then show that the problem of computing the entropy distribution is isomorphic to the integer composition problem. After simulations, I find tie-level and group-level comparisons comparable with empirical data. Together, these case studies illustrate how group dynamics, language, policy issues, modality, algorithms, and tie strength can interact to influence our information environments.

I offer three takeaways. First, information diversity is one of the most holistic frames for better understanding diffusion phenomena. Second, high-quality opinions and protest-friendly framing emerge from the interaction of media-rich messages and the algorithm. Third, media-rich networks with parasocial ties and localized accounts generate sustained diffusion during social movements.

Chapter 1: Introduction

1.1 A Bar Joke from Reddit (2011)

A Democrat, a Republican, and an Independent walk into a bar. The bartender asks what they'd like to order.

The Democrat says, "I'd like whatever drink promotes the most social justice, provides equitably for those in need, respects the physical toll it takes on the planet, and is produced by individuals subject to fair conditions and wages."

The bartender says, "I'm not playing this game."

"What do you mean?" the Republican asks.

"I see where this is going.
You guys are going to pick on the Independent," the bartender said.

The Democrat asks him politely to keep taking drink orders.

"I'm not doing this," the bartender says. "First of all, I'd consider myself a political independent, but I don't think your drink order accurately summarizes the Democratic party's views. 'I don't care what it costs?' That's BS. Plenty of Democrats care about fiscal issues."

The Democrat says, "It's just a drink order."

"No, it's not. It's a dumb political joke. Hell, this isn't even a real bar, it's a reddit post."

"At least listen to what the Republican wants," the Democrat says. "It helps set it up."

"Hell with that. I want you out of my bar."

The Republican says, "I'd like a drink that's produced as cheaply as possible, but nonetheless provides contractors with funds that–"

"Out! Get out!"

"Can I just–"

"No, get out of my bar." The bartender pulls out a shotgun as the three hurry out. He shouts after them, "And don't let me see you come back in here in the comments."

- A bar joke from Reddit.

* * *

The modern bar joke includes several people walking into the bar together, often a Rabbi, Minister, and Priest. Through a comedic trinity, the first two produce a contrast, then the third subverts expectation. The Independent, for instance, had the three not been kicked out, would be sadly ignored. Humor is created by the satirically irreconcilable views of the bargoers.

This is far from the truth. The bar is a great place to meet new people. During a year in Edinburgh, I pub crawled through the Fringe Festival, along with 200,000 summer visitors to the Scottish capital. Along these countertops, I experienced a great expansion of my worldview. As in airplane conversations, people seemed to converse with an unexpected tolerance.

In March 2020, the coronavirus pushed human activity to the limits of virtual spaces: to Facebook walls, Instagram stories, Twitter feeds, Snaps, and the YouTube comment section.
The pandemic saw unprecedented patronage of these online "bars," with their own rules, modes of interaction, and bartenders in the form of algorithms. And, along with the billions of other people around the world suffering Zoom fatigue, I felt a distinct lack of human connection despite speaking with more people than I ever had.

This dissertation is the product of hours of Zoom fatigue and of asking myself daily: how can we design social media to be healthier? What are the barriers that arise from our own cognition, biases, and social influence?

This dissertation has two goals. The first is to probe what healthy, information-rich online spaces would look like. We do so through three case studies. The first focuses on ten years of congresspeople tweeting on Twitter, and how their constituents respond. Second, we shrink the time scale and focus on the Black Lives Matter movement in 2020. Third, inspired by my work experience at Meta Platforms, I present a general model of information diversity across different ties and groups. These three case studies each explore a fundamental tension in social media design: how algorithms decide between personalizing and diversifying one's feed.

The second goal is a bar joke in itself: what happens when a social scientist, computer scientist, and mathematician write a thesis together? This dissertation was generated from many collaborations over four years of graduate school. Computational social science draws from the social sciences, computer science, and mathematics (though less compared to the others), each with its own foundations and pedagogies. And while these perspectives appear largely synergistic, opposing epistemologies can cause differences in conclusions to emerge. The three aforementioned case studies are each written in one dominant style; the first two explore mechanisms that influence information diversity, which are then summarized in a model in the final chapter.
1.2 Deliberative Democracies and Participatory Politics

If you were a man in Athens, 500 B.C.E., a typical day may look like this. You wake and fire up your kiln. Then you remember you are one of the 500 citizens tapped to vote on the new law, proposed by your friend the Cobbler, regarding beehive placement after he was stung the prior month. With a toss of water you quench the embers in the combustion chamber, then rush over to cast your vote, lest you be fined by the magistrate.

Democracies come in many flavors. Direct democracies, as was the case in Athens, allowed Athenian men over twenty to partake directly in decision-making. Today, most societies lean toward representative democracies, where elected officials (senators, representatives, and mayors) represent larger groups of people. The criticism is familiar. Representatives turn into an elite class (see the Iron Law of Oligarchy; Michels, 2019), which exposes the political system to corruption and influence. Most modern systems grow out of this tension between decision-making from the top down and bottom up, primarily in two flavors: deliberative democracies and participatory democracies, which allow direct and representative democracies to co-exist, with the intent to empower citizens to engage with the political process (Leib, 2010).

Under a deliberative democracy, deliberation is central to decision-making. For a democratic decision to be legitimate, it should be preceded by authentic deliberation. Authentic means free from distortions due to unequal political power, such as lobbyists or foreign interference from state actors (Bohman & Rehg, 1997; McKay & Tenove, 2021). As such, the elements that influence the deliberation and decision-making process become crucial in this model. This diverges from the traditional theory of democracy: it is not the mere act of voting that imparts legitimacy to the law (Gutmann & Thompson, 2004).
One problem with deliberative democracy is that its theories are hard to test, compared to the hard signals of voting. Advocates thus call for "middle-range theories," or theories that are not too far off the ground and operational, where specific constituent theories related to public opinion can be tested (Mutz, 2008).

Participatory politics, on the other hand, focuses on empowerment and action. Harkening back to Athens, citizens decide on policy proposals and politicians are in charge of implementation. Participatory politics is broadly split into five types of activities: investigation, dialogue, circulation, production, and mobilization. Under participatory politics, the act of doing generates civic engagement (Kahne et al., 2015), and is often evoked in tandem with social movements. Moreover, this sequence of action articulates the stages of agenda setting, opinion formation, collective political action, and political influence.

The tension between deliberative and participatory democracies is what the Citizen Labs call "thick versus thin" engagement (Lodewijckx, 2022). Deliberation is thick, as careful thought through debate is crucial to better understand policy issues, but it is not scalable en masse. Participation, while it can more easily scale, may be temporary and shallow in nature. These movements are framed as populistic mass hysteria by their opponents, especially when investigation and dialogue are not evident.

Regardless of these debates on the distinctions between deliberation and participation, one crucial element lives at the heart of these processes: information. How this information is conveyed and in what media may have drastic implications for both deliberation and participation.

1.3 Information Diversity

Health is a vague term. In this dissertation, I focus on the diversity of information as one measure of health, particularly relative to politics (Bakshy et al., 2015).
Broadly speaking, information diversity describes the variety of information at an individual's disposal when making decisions: purchasing a car, voting for a candidate, or finding an international trade partner (Iselin, 1988). Exposure to cross-cutting information (information outside your own party) is crucial to deliberative democracies (Mutz, 2002; Riker, 1993).

Information diversity isn't the only important part of democratic decision-making. The quality of information naturally plays a role. The contemporary preoccupation with misinformation is largely over truth value as a quality; in the language of deliberative democracy, misinformation injects unwanted distortion via erroneous information and affects voter decision-making. Like diversity, quality is also a critical factor in producing healthy environments (Raghunathan, 1999).

However, diversity of information is a necessary one. It is also a minimalist concept associated with deliberation (Mutz) and is suitably testable as a mid-range theory, especially on social media. That said, there are also results that show the opposite. Made popular by Brendan Nyhan and Jason Reifler, the backfire effect describes the phenomenon where exposure to the opposition can reinforce existing beliefs (Nyhan & Reifler, 2010). Bail and colleagues found this to be true on social media, implying that exposure to opposing views can increase polarization (Bail et al., 2018). Though the backfire effect has since been disputed, it stimulated significant research efforts around how context, media, and status interact to moderate our beliefs, especially on social media.

1.4 Social Media: Networks of Deliberation

As of 2019, the majority of Americans receive some form of political news from social media (Center, 2022). Facebook and YouTube are the most responsible, followed by Twitter and Instagram.
While initially heralded as a driver for participatory politics due to the Arab Spring (Harlow & Johnson, 2011), press regarding these tech giants turned mostly negative following Brexit and the election of Donald Trump (Gorodnichenko et al., 2021; Hall et al., 2018; Hänska & Bauchowitz, 2017). Since then, large efforts have been placed into understanding social media's role in exacerbating social division.

Like many other complex systems, the information environment is unruly in its complexity. The focus on information diversity in social media isn't to circumvent other potential factors, but to be precise. Socio-technical systems have such endogenous complexity that understanding their dynamics in full would be largely infeasible, and pursuing multiple objectives at once (such as quality, source diversity, accuracy, and distributional parity) is often impossible.

In terms of themes, Tucker and colleagues provide a good road map for the current scientific literature regarding social media and the most pressing future directions (Tucker et al., 2018).

1. Measures to estimate the effect of exposure to (dis)information online.
2. Cross-platform and multi-platform research.
3. Information spread through images and videos.
4. Generalizability of U.S. based findings.
5. Ideological asymmetries in mediating exposure to disinformation and polarization.
6. Effects of laws and regulations intended to limit disinformation spread.
7. Understand the strengths and weaknesses of bot detection methods.
8. Role of political elites (in spreading misinformation or polarization).

This dissertation builds on these frontiers and investigates how three dueling concepts modulate the diversity within social media ecosystems.

• Affective versus attitudinal diffusion.
• Multimodality and multiplexity.
• Tie strength (social networks) and algorithms.

1.5 Three case studies

1.5.1 Bar Patrons: Group Dynamics of Online Communication

Bars are places for both old friends and new friends.
Sometimes you go with a group of friends, just to unwind; other times, it's an academic conference and it's the only time to meet other students from other schools. There is a simultaneous feeling of competition and collegiality. Pervading all fields of social science is social identity theory (Tajfel & Turner, 2004). We perceive ourselves in coarse categories of in-groups and out-groups, as "us" and "them." There can be many out-groups, but belonging to and self-identifying with a group is a fundamental human process.

In politics, the notion of polarization describes a sharp division into contrasting groups through sets of opinions or beliefs, which is particularly stark in two-party systems such as the United States (Dalton, 2008). Polarization is split into two types. Attitudinal polarization describes disagreement over specific policy issues. Affective polarization describes division caused by either in-group favoritism (liking your own party) or out-group animosity (dislike of opposing groups) (Enders, 2021). Despite being sourced from 2011, the bar joke from Reddit depicts the issue of polarization quite well even today.

In my first case study, I investigate how liberals and conservatives respond differently to political elites from their in- and out-groups. Central to this work is a long-standing question in politics: do liberals and conservatives respond to messages the same way? Using a dataset of more than 3.5 billion tweets to model the ideology of 1.29 million retweeters over ten years, we investigate the role of in-/out-group dynamics, message toxicity, and policy issues in the diffusion of messages from congresspeople. This is a paper done in collaboration with James Druckman, Robb Willer, and Emilio Ferrara. I have retained my portions and written an alternative opening.

1.5.2 Counter, Glass, Ambiance: Information Multiplexity and Modal Affordances

When we enter a bar, we immediately take note of the furnishing, the lighting, and the soundscape.
Are there pints of beer on standing tables, or neon-fluorescent cocktails in champagne glasses? Can we have a quiet, intimate conversation, or will we be drowned out by boisterous guffaws, or “the game”?

Coined by psychologist Gibson (1977), the term affordances describes all “action possibilities” latent within an environment. While the concept was initially passive, Norman (1999) adopted it for design thinking, extending it beyond the person-object level to social dimensions as well. Naturally, as online platforms grew popular in the 2000s, with binary code dictating these action possibilities in strict structures, significant efforts have been put into understanding digital affordances (Autio et al., 2018).

In particular, the ability of social networks to facilitate collective social behavior at rapid speeds was unprecedented before the 21st century. Beginning with the Arab Spring in 2011, social media was heralded as generating democratic movements through decentralized coordination (Harlow & Johnson, 2011). Half a decade later, views toward these tech giants turned sour with the Cambridge Analytica scandal and the election of Donald Trump (Isaak & Hanna, 2018). Significant energy is now focused on how platform design itself influences the outcome of collective behavior, and what becomes possible as we articulate affordances through design.

In my second case study, I investigate how Instagram facilitated one of the largest civil rights movements in the United States: the 2020 Black Lives Matter Movement. Following the murder of George Floyd and the upload of the video, Instagram surprisingly became the focus of attention, which marked the first time the platform found itself at the center of civil politics. This is a paper done in collaboration with Allissa Richardson and Emilio Ferrara, published in PLOS (Chang, Richardson, et al., 2022). I have retained my portions and written an alternative opening.
1.5.3 The Bartender: Algorithms and Social Networks

At Japanese bars, especially nicer ones in Ginza, it is a faux pas to engage in conversation with other guests. If this happens, experienced bartenders will catch your attention by conversing with you, and leave guests to their own conversations or thoughts. Apart from directing conversation (when needed), they dictate where you sit, how welcome you feel, and whether you take the time to discuss your drink, or what a perfect drink would be for you.

Unlike bartenders, the algorithm is an invisible hand that decides what you see and what you do not. Like the bar-going mores of distinct countries, different platform algorithms deliver via different sensibilities. Yet most people don’t take the same time to understand the algorithm, and how it curates the information they receive. Algorithms are not included in the previous section because they have grown so complex that it would be inappropriate to treat them solely as an artifact of affordances. So much so that even engineers at these tech giants expend huge amounts of resources to understand the algorithms that they designed. Following the previously outlined importance of social media and polarization, a mechanistic understanding of how algorithms mediate information from one’s connections is crucial.

As an example, Mark Granovetter famously demonstrated in The Strength of Weak Ties that acquaintances, rather than close friends, are critical to helping you find jobs (Granovetter, 1973). They serve as important bridges that increase information diversity. This notion has been taken up and tested on various social media sites, including Facebook, LinkedIn, and, in my own work, Instagram. The interaction between algorithms and different ties has become particularly interesting, given the interest in information bubbles, or, more controversially, echo chambers. A generalized way of measuring and comparing across platforms is still lacking.
In my third case study, I propose a general framework for understanding information diversity on social networks. The core question is how we can measure information diversity with any type of network, modality, or platform. First, I construct a model that relies on only two parameters: topic homophily and the perception of strong ties (or any set of ties, e.g., in-group ties). After defining measures for entropy at the user level, I show how information diversity can be computed at the tie level (such as strong ties versus weak ties, or messages containing text versus images) and at the group level (e.g., liberals versus conservatives). I also show theoretical limitations to what can be computed through closed-form solutions (mathematical equations).

1.5.4 How these studies relate theoretically

The feed-ranking algorithm is a mysterious entity, the mastermind that decides everything shown on your feed. Its design is often an ensemble of multiple inference techniques, resulting in opaque, stochastic, and non-deterministic outcomes. The general concern surrounds how algorithms can drive increased exposure to like-minded people, which is a fundamentally information-limiting phenomenon based on feedback from behavior (Cinelli et al., 2021).

This type of feedback loop can produce significant “unhealthy” patterns of behavior across multiple dimensions of civil society. This is most evident in the discussion of modern politics. As previously mentioned, passive reinforcement of biased framing can restrict our ability to deliberate over policy issues. With journalism in general, the filter bubble framework is reductive in that it assumes users are diversity-averse and algorithms are diversity-blind. Rather, the preference for diversity often arises through feed-ranking algorithms and users interacting. Who the user is, and which algorithm they face, matters. For instance, younger, less educated individuals are less concerned with the diversity of news (Bodó et al., 2019).
This is because many younger users have had little exposure to non-personalized news. Similarly, users who subscribe to news organizations that prioritize reader loyalty, trust, and quality often appreciate diversity in news. This dissertation puts emphasis on characterizing the rich interaction of agential users and non-deterministic algorithms, and what configurations produce the most information diversity. For instance, chapter one discusses how different predilections for uncertainty between liberals and conservatives produce asymmetric reactions to issue engagement.

Other political processes can also be influenced. Most research in media and data studies examines the "infrastructural turn" brought by tech platforms. Less attention has been placed on how algorithms interact with human agency, particularly in how activists leverage algorithms for visibility within social movements (Treré & Bonini, 2022). The most common example is the use of hashtags on Twitter to build networks of dissent, such as #MeToo, #GirlsLikeUs, and #BlackLivesMatter (Jackson et al., 2020). The less obvious include ways to avoid content moderation, which Treré and Bonini call algorithmic evasion. Examples include vaccine-opposition (Moran et al., 2021) and far-right discourse practices (Bhat & Klein, 2020) on Twitter. Lastly, algorithmic hijacking denotes the use of spam or content to erase the visibility of another movement. For instance, the hashtag #myNYPD was used against the New York City Police Department’s PR campaign to produce a counter-public. Since feed-ranking algorithms take into account temporal popularity, their design can be leveraged to influence social movements (Jackson & Foucault Welles, 2015). Each of these examples shows how online coordination drives the visibility of a public or counter-public.

As such, many researchers attempt to “audit” these algorithms experimentally.
NYU’s Cybersecurity for Democracy Project utilized an opt-in browser tool to collect how ads were served and whether there were cases of racial targeting. Though not exactly feed-ranking algorithms, ad-ranking algorithms are similarly motivated: finding the best way to fill up digital real estate to increase engagement via likes or clicks for ad revenue. In particular, targeted advertising in housing and job opportunity cannot contain biases based on race (Chang, Bui, & McIlwain, 2021). Ultimately, NYU’s use of third-party plug-ins resulted in a lawsuit from Facebook. Companies like Twitter have recently also revealed the composition of their algorithm to the public (Twitter, 2023).

Even so, companies often do not know exactly how their algorithms work, or how changing certain parameters influences feed-ranking decisions. As such, they rely on extensive randomized experiments (i.e., A/B testing). For instance, internal research led by Huszár and colleagues attempted to evaluate the extent to which conservative voices were being amplified by the Twitter algorithm (Huszár et al., 2022). Similarly, my own work at Meta Platforms followed this approach with Instagram.

Are empirical approaches the only way? An alternative framework is how we can apply “targeted” amplification, based on information-diverse outcomes. A core goal of this dissertation is to develop such a simulation framework, one that is applicable to interventions within tech platforms but also has sufficient generality to provide rich theory building. It explicitly encodes the agential interaction of users and the algorithm (Chapter 1 on asymmetries between liberals and conservatives), with forays into how diffusion-based phenomena can influence these outcomes (Chapter 2 on BLM2020). This falls within a larger push to understand how algorithms shape collective behavior (Bak-Coleman et al., 2021).
We will focus particularly on visibility, attitudinal vs affective diffusion, media richness (especially equivocality), algorithm-mediated framing, parasocial ties, and localized online networks. Visibility in particular emerges as the central idea, and the gap between diversity and diffusion is what dictates what becomes visible.

∗∗∗

Together, these three case studies highlight how social media has shifted over the past decade, and touch on some of the most crucial questions in social media design. The dissertation is also an exercise in epistemology, with each chapter written in a different style: as a social scientist, a data scientist, and then a mathematician. It is an homage to how researchers of different backgrounds can work together to understand these complex systems, and potentially design them to become healthier.

I also want to acknowledge the studies from my doctoral years, which provided this dissertation with methodological maturity and theoretical bases. I had the chance to help first identify COVID-19 and U.S. election misinformation on Twitter (E. Chen et al., 2022; Z. Chen et al., 2021; Ferrara et al., 2020) as part of our lab effort. I also identified multi-platform misinformation during the Taiwanese elections (Chang, Haider, et al., 2021), gender-based diffusion phenomena (Jiang et al., 2022), ideological differences of partisan bots on Twitter (Chang & Ferrara, 2022), and entertainment-based amplification of health messaging (Chang, Pham, et al., 2021) on Twitter. On the machine learning front, I developed game theoretic frameworks for negotiation games (Chang, 2021) and "community-in-the-loop" approaches to debiasing language models for LGBTQ+ users (Felkner et al., 2022). A small foray into health communication also led to the multimodal and multiplex analysis of tobacco influence through visuals (Kennedy et al., 2021).
Similar to Chapter 4, I also developed theoretical models for multiplex network diffusion (Chang & Fu, 2019), and extended diversity analysis to networks beyond social media, including mentorship and financial networks (Chang & Fu, 2021; Chang et al., 2023). While any of these studies could have been added to this dissertation, these three case studies represented not just the most cohesive set around information diversity, but also the wonderful opportunities to learn from mentors and colleagues from all across the campus.

Changing the algorithm is the biggest decision a company makes. It is the engine that drives not just revenue but also the health of the platform, since it impacts what people engage with. Most strategies at mature companies involve A/B testing, where certain groups of users are selected for different versions of the algorithm. However, the magnitude of experimentation pales in comparison with simulation. In an ideal world, social media companies can iterate between user data and simulations to understand intervention strategies. This dissertation aims to fill that gap. Without further delay, let’s see how we can redesign our bar.

Chapter 2

Bar Fights on Social Media: Ideological Asymmetries, Group Dynamics, and Toxicity

Abstract

The rise of social media gives citizens direct access to information shared by political elites, but more than ever before, citizens now play a critical role in diffusing this content as well. What kinds of elite-generated content spread on social media, and how do conservative and liberal citizens differ in the content they engage? These questions relate to long-standing academic and popular debates about whether political behavior is symmetric or asymmetric with respect to political ideology.
Here, we analyze the content of 13,581,100 user retweets of messages by US Members of Congress on Twitter from 2009 to 2019, leveraging citizens’ estimated political ideology constructed from the users’ retweet histories (N = 3,522,734,792 citizen tweets). Contrary to ideological symmetry accounts, we find limited evidence that the strength of ideology predicts diffusion choices similarly on the left and right. In contrast, we find robust support for ideological asymmetry accounts, as liberals and conservatives differed in both the style and substance of the elite communications they engaged. Consistent with prior work showing liberals’ greater preferences for diverse and novel stimuli, we find liberals retweeted topics central to both liberal and conservative agendas, while conservatives avoided retweeting topics central only to the liberal agenda. Additionally, consistent with prior work demonstrating that liberals are less conventional and norm-following than conservatives, we find liberals were much more likely than conservatives to share toxic messages. Overall, for Democrat elites, group status and toxicity were about 1.09 and 1.03 times as strong as policy topic, respectively, in determining virality; for Republican elites, policy topic was 2.31 and 1.66 times as strong as toxicity and group status, respectively. The diffusion patterns imply that liberals are exposed to more toxic and politically diverse elite-generated content on social media, while conservatives receive more conventional and politically homogeneous information.

2.1 Introduction

At least a third of the world’s population uses social media platforms (Ortiz-Ospina & Roser, 2023). Within individual countries, they play a critical role in political information. What distinguishes social media from the traditional media is the ability for audience members to create, share, and thus dictate the flow of information.
As such, while new media technology is thought to facilitate access to elites who can directly communicate with voters (Neblo et al., 2018), the reality is that most political information may come from other users (Wojcieszak et al., 2022). This harkens back to the idea of a two-step communication flow where “ideas, often, seem to flow from [elites] to opinion leaders and from them to the less active sections of the population” (Katz, 1957). This gives audience members, collectively, considerable influence over the political information environment (Rathje et al., 2021).

A long-standing question in the United States’ context is whether liberals and conservatives make decisions in the same way, or if ideological asymmetries exist. Although there has been significant research on misinformation (C. X. Chen et al., 2021; Guay & Johnston, 2022; Osmundsen et al., 2021; Pennycook & Rand, 2021) and message-based diffusion (Brady et al., 2019; Frimer et al., 2023; Rathje et al., 2021), very little work has been done at the audience level based on ideology. This is largely due to the unavailability of public data, though it is surprising given the debates on how liberals and conservatives behave.

In this chapter, we investigate the political information environment from both the supply and demand perspective. We do so with a seed-set of 1.29 million tweets from Members of Congress (MC) between 2009 and 2019, and 3.5 billion timeline events of their retweeters. Our analysis thus requires looking at the data from both the audience’s and the MCs’ perspective, to see if liberals/Democrats and conservatives/Republicans behave differently. For clarification, we use liberal and conservative to denote ideology, and Democrat and Republican to denote party affiliation.

There are two reasons why we might expect liberals and conservatives to react differently. First, in regards to political issues, ideologues and partisans tend to engage with some issues more than others.
Issue ownership denotes the idea that certain parties and candidates prioritize certain issue bundles, on which they have a political advantage, more than others (Egan, 2013; Petrocik, 1996). Democrats tend to champion the environment, education, and civil rights, whereas Republicans tend to champion defense, macroeconomics, and international affairs (Van der Brug, 2004). Additionally, conservatives and liberals react differently to uncertainty: conservatives show less openness to new experiences, including policy issues (Jost, 2021). In particular, Rogers and Jost (2022) show that conservatives participate in fewer cultural activities and listen to less music, due partially to lower openness. Thus, conservatives would theoretically engage with fewer policies that come from the other party’s (i.e., liberals’ / Democrats’) agenda. Conversely, from the supply side, we would expect Democrat MCs to have a less diverse audience, as conservatives would not tune in. In contrast, Republican MCs would draw engagement from both conservatives and liberals, and thus a more diverse audience.

Second, we expect liberals and conservatives to respond differently to toxic or uncivil content. From the supply side, Rathje et al. (2021) find that discussing the political out-group largely increases the likelihood of sharing content from MCs, particularly when it includes negative emotions. This motivated tweeting is consistent with other work on moral contagion (Barberá et al., 2019). Frimer et al. (2022) find that citizens are substantially more likely to retweet messages from members of Congress when they are uncivil – and that this retweeting, in turn, incentivizes members to become more uncivil. However, this may not arise symmetrically from the demand side.
Conservatives place higher value on order and tradition (Jost, 2017), and Mutz finds “Republicans and conservatives may be particularly sensitive to norm-violating threats to the social structure.” As toxicity violates social norms (Bormann et al., 2022) and generates negative emotional arousal (Druckman et al., 2019; Mutz, 2006), conservatives may be less likely to want to share toxic communications. Rathje et al. (2021) show that toxicity (or animosity) toward the other party is correlated with virality, which suggests conservatives will be less likely to retweet Republican MCs’ tweets about Democrats than liberals are to retweet Democrat MCs’ tweets about Republicans.

As a corollary, we would expect ideological extremists to also exhibit less diversity of information compared to more centrist users. This is because ideologues tend to obtain information from like-minded ideologues on social media (Boutyline & Willer, 2017), and extremists show the same predilection for certainty. As such, we test two hypotheses:

1. Relative to liberals, when conservatives encounter issues that are not part of their agenda in a tweet from an MC, they will be less likely to retweet it.
2. Relative to liberals, conservatives will be less likely to retweet issues that contain toxicity.

In sum, we expect asymmetry: ideologues on the left and right will show different responses to the political information they receive from elites. More importantly, this would mean liberals and conservatives exist in different information environments. Liberals would be exposed to more policy topics (more information diversity) and more toxicity than conservatives, and thus have distinct bases on which to make decisions and interact.

By information diversity, we refer to how diversely a user engages with topics common to the political process, such as health care, civil rights, national security, and macroeconomics. We chose information diversity as it pairs well with retweeting.
As many scholars note, retweets do not imply endorsement. Rather, retweeting is a form of engagement that implies the diffusion of information. In Congress, a well-defined set of policy issues is debated by political elites, with robust code-books developed over the last two decades (Baumgartner et al., 2006). By observing how engaged retweeters share this information, we can characterize the downstream information environments curated by these intermediaries.

The principal novelty of this study lies in its ability to observe the supply-demand dynamics of political elites and their audience, which provides a rare scenario to test the clear, testable middle theory of issue ownership. The deeper strand that underpins this is the tension between attitudinal and affective polarization. Topic engagement explores mechanisms that lead to issue exposure within local networks; group animosity studies how source moderates this exposure. Though not one-to-one with polarization, this case study is able to investigate diffusion based on attitudinal versus affective factors.

2.2 Data

We use Frimer et al. (2023) as the seed dataset (N=1,293,753). Frimer et al. (2023) contains toxicity (or incivility) scores based on the Google Perspective API. We add the following variables: policy topic (machine-learned labels with accuracy of 88% based on the Comparative Agenda Project) and user ideology based on user timeline data (a weighted score based on media domains tweeted). The dataset contains a total of 10,451,080 retweet events, 1,435,132 unique tweeters, and 4.04 billion timeline tweets. Our dataset differs from Rathje et al. (2021) primarily in timespan. Our dataset includes MC tweets from between 2009 and 2019. Notably, this includes early online discussion of the 2016 primaries (which does not seem to be as present in Rathje et al. (2021), whose data are from 2016 to 2020), and contains the subsequent presidential election and COVID-19 discourse.
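The user ideology variable, described above as a weighted score based on the media domains a user tweets, can be illustrated with a minimal sketch. The domain names, their bias values, the [-2, 2] scale, and the function names below are all hypothetical placeholders chosen for illustration; the actual domain list and weighting used in this chapter are not reproduced here. Averaging over every shared link makes the score a frequency-weighted mean of domain scores, which is one plausible reading of "weighted score".

```python
from urllib.parse import urlparse

# Hypothetical domain-level bias scores (negative = liberal-leaning,
# positive = conservative-leaning). These values are placeholders, not real ratings.
HYPOTHETICAL_DOMAIN_BIAS = {
    "left.example": -1.5,
    "center.example": 0.0,
    "right.example": 1.5,
}

def domain_of(url):
    """Return the lower-cased host of a URL, without a leading 'www.'."""
    host = urlparse(url).netloc.lower()
    return host[4:] if host.startswith("www.") else host

def media_diet_score(urls, domain_bias=HYPOTHETICAL_DOMAIN_BIAS):
    """Average the bias of every shared link whose domain has a known score.

    A domain shared many times counts many times, so the result is a
    frequency-weighted mean. Returns None when no link can be scored.
    """
    scores = [domain_bias[d] for d in (domain_of(u) for u in urls) if d in domain_bias]
    return sum(scores) / len(scores) if scores else None

timeline_links = [
    "https://left.example/story-a",
    "https://www.left.example/story-b",
    "https://right.example/story-c",
]
score = media_diet_score(timeline_links)
```

Under this sketch, a user would be labeled liberal when the score is negative and conservative when it is positive; any thresholding beyond the sign is left open.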
As mentioned, we label audience members as liberals (Democrats) or conservatives (Republicans) based on their media diet over time. While we suggest this captures ideology (partisanship), an alternative (less presumptive) interpretation is to think of it as a political diet.

Twitter is also a deliberate choice. There are many social media platforms, each with its own structure, and the platform a researcher uses for a given study depends on a variety of considerations including the function of the data, the availability of the data, and the hypotheses under study (A. M. Guess, 2021). Our interest lies in how individuals with varying ideologies react to data from elite sources. This requires data on the audience’s ideology – which is the endpoint of the information flow. We obtain this information by extracting timelines of users to then label their ideology. In this regard, Twitter is unique, among relatively widely used social media platforms, in that it allows public and retroactive analysis detailed at the user level. Our scope is 10 years, which falls outside the range of Facebook’s data retention. More generally, Facebook user timelines are much more privacy restrictive, which poses challenges to user-level ideological modeling (Sharma & Verma, 2018). As Steinert-Threlkeld (2018) states, “researchers interested in diffusion processes and emergent behavior find Twitter a natural resource.” We recognize that, like other studies of a single platform, there are invariably questions about generalizability. However, existing theories of partisanship and ideology can be tested with our Twitter dataset, and if the data cohere with our predictions, that means the continued accumulation of evidence for theories of partisan asymmetry (Jost, 2017).

As shown above, testing our hypotheses requires that we identify the political predilections of both elites – i.e., Members of Congress – and Twitter users.
We rely on partisanship for the former, since it is easily identifiable. As intimated, users’ partisan affiliations on Twitter, in contrast, are not explicitly declared but can be accurately inferred by relying on patterns of Twitter use over time to categorize users as either liberals or conservatives (Chang, Chen, et al., 2021; E. Chen et al., 2021; Ferrara et al., 2020). We discuss our approach below. Consequently, we employ partisanship and ideology as proxies for one another, assuming that Democratic (Republican) MCs share an outlook and identity with liberal (conservative) users. We will use the terms interchangeably, noting the high degree of ideological sorting among partisans (Levendusky, 2009).

Figure 2.1: Density estimate of audience response by Republican MCs (red) and Democrat MCs (blue). Republican MCs garner a more diverse audience – more liberal users tune in.

2.3 Results

We begin with a crucial descriptive figure. Figure 2.1 shows the ideology of those who retweet MCs’ tweets. The x-axis provides the political position of the retweeter, with lower (higher) scores being more liberal (conservative). That is, negative values (between -2 and 0) indicate liberal/Democrat leaning while positive values (between 0 and 2) indicate conservative/Republican leaning. The y-axis provides the probability density estimate, with the solid blue line indicating Democratic MCs and the dashed red line indicating Republican MCs. The figure reveals that for Democrat MCs, the retweeters are much more likely to be homogeneously liberal; in contrast, for Republican MCs, the retweeters are much more likely to be widely distributed, including a non-trivial proportion of liberals (i.e., only a slight skew to the right). This implies liberals are more likely to tune into Republican MCs than conservatives are to tune into Democrat MCs (Barberá et al., 2015): an asymmetry in engagement with other-party MCs. We next turn to uncover whether our predictions can explain the sources of the asymmetry.

Figure 2.2: Left-/right-leaning agenda items versus audience diversity. Negative values indicate Democrat MCs tweet more; positive values indicate Republican MCs tweet more. Republican topic labels correspond to the Democrat labels directly below them. Republican MCs elicit greater response from liberal users. When Democrat MCs tweet about defense or macroeconomics, diversity is high. When Democrat MCs tweet about civil rights or the environment, diversity is low. Republican MCs have high audience diversity across all topics, which indicates the presence of Democrat re-shares. Thus, Democrat cross-cutting engagement is overall higher.

2.3.1 Policy Engagement

Our first hypothesis suggests that conservatives will be less likely to retweet Democratic MCs than liberals are to retweet Republican MCs. This occurs because conservatives avoid engaging with liberal policies that Democratic MCs likely discuss, more so than liberals avoid conservative topics, based on preferences for certainty (Boutyline & Willer, 2017). To identify issues, we trained a supervised machine learning classifier using labeled tweets from Russell (2018), with the assumption that the linguistic characteristics of tweets from Representatives and Senators are similar. The dataset includes a total of 68,398 tweets: 45,402 tweets labeled with codes from the Comparative Agenda Project’s (CAP’s) codebook (Bevan, 2019) and 22,996 labeled as non-related tweets. We implemented a variant of the large BERTweet architecture, a pre-trained model from Nguyen et al. (2020), and trained two classifiers: one for multiclass classification of 20 topics, and one for binary classification of whether a tweet relates to policy or not.
Our first (multiclass) model produces an accuracy of 0.864 for classification across all 20 policy topics (from CAP). Our second model (binary) produces a score of 88.5%. This beats the current best model at 79% (Hemphill & Schöpke-Gonzalez, 2020). Methodological details are further discussed in the Methods section.

We then classified issues as liberal/Democratic or conservative/Republican, based on their relative supply, with the theoretical rationale being that partisans emphasize issues on which they are advantaged (Druckman & Jacobs, 2015). For Figure 2.2, the x-axis shows the difference in tweet supply for each policy (Republicans minus Democrats). This operationalizes right and left agenda items. Positive values indicate Republican MCs tweet more; negative values indicate Democrat MCs tweet more. The y-axis shows political diversity via the standard deviation of the political position of the audience – that is, the political diversity for the given topic (and thus the results are not affected by the frequency with which a given topic is tweeted by MCs). The blue dots are aggregate tweets from Democratic MCs; the red dots are from Republican MCs. The figure is consistent with work on issue ownership: Democrat MCs tweet Democratic topics including civil rights, health, environment, immigration, and education, while Republican MCs tweet Republican topics such as defense, macroeconomics, and international affairs (Craig & Cossette, 2020; Fagan, 2021; Petrocik, 1996). To further ensure the validity of the variance, we also plot the average political position (Figure 2.9 in the SI), and show the direction of diversity does indeed come from the out-group.
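The two axes of Figure 2.2 can be sketched in a few lines. This is an illustration under stated assumptions, not the chapter's actual pipeline: the tuple layouts, the function names, the use of raw tweet counts for supply (rather than proportions), and the choice of population standard deviation are all assumptions made here.

```python
from collections import defaultdict
from statistics import pstdev

def supply_difference(mc_tweets):
    """x-axis: per-topic tweet supply, Republicans minus Democrats.

    mc_tweets is an iterable of (party, topic) pairs with party "R" or "D".
    Raw counts are an assumption; normalized shares would work the same way.
    """
    diff = defaultdict(int)
    for party, topic in mc_tweets:
        if party == "R":
            diff[topic] += 1
        elif party == "D":
            diff[topic] -= 1
    return dict(diff)

def audience_diversity(retweets):
    """y-axis: standard deviation of retweeter ideology per (MC party, topic).

    retweets is an iterable of (mc_party, topic, retweeter_ideology) triples.
    """
    groups = defaultdict(list)
    for mc_party, topic, ideology in retweets:
        groups[(mc_party, topic)].append(ideology)
    return {key: pstdev(values) for key, values in groups.items()}

# Toy data: a homogeneous audience for one topic, a mixed audience for another.
supply = supply_difference(
    [("R", "defense"), ("R", "defense"), ("D", "defense"), ("D", "civil rights")]
)
diversity = audience_diversity(
    [
        ("D", "civil rights", -1.0),
        ("D", "civil rights", -1.0),
        ("R", "defense", -1.0),
        ("R", "defense", 1.0),
    ]
)
```

Because the standard deviation is computed within each (party, topic) group, a topic that is tweeted more often does not mechanically receive a higher diversity value, which matches the remark that the results are not affected by topic frequency.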
The average-position check is useful confirmation since our definition of liberal and conservative issues is based on the supply difference (see Figure 2.8 in the SI); it also shows that MCs of a given party pursue the strategy we suggested by emphasizing their owned issues (the one exception is Democrat MCs’ relative emphasis on law and crime, despite it being a traditional Republican issue. This likely reflects Democrats using the terms “criminal” or “crime” to discuss Trump during the Russia probe, which would inflate the supply of this topic for Democrats).

The figure clearly supports Hypothesis 1; when Republican MCs tweet, the audience diversity is always very high (consistently around 45%, with a slight downslope). In contrast, when Democrat MCs tweet on issues usually owned by liberals (e.g., civil rights), diversity is low, meaning cross-cutting engagement is low. Moreover, when Democrat MCs tweet about issues usually owned by conservatives (e.g., defense), diversity is high, which suggests cross-cutting engagement is high. Liberals respond consistently to Republican MCs across policy topics, whereas conservatives are more selective and only engage with Democrat MCs for conservative agenda items. Conservatives do not appear to engage Democrat MCs on liberal issues, but liberals engage Republican MCs on conservative issues.

We further summarize these results in Figure 2.3. Figure 2.3a shows the user distribution of unique topics. The x-axis represents the number of unique topics with which a user has engaged, and the y-axis the proportion. The heavier tail for liberals shows that, in general, liberals engage more diversely. This, however, does not take into account the skew within the topic categories: for instance, a user can engage with two topics, but fifty times with topic 1 and just once with topic 2.
For every user, we then attach a score for topic diversity based on Shannon entropy:

H(x) = -\sum_{x} p(x) \log p(x) \qquad (2.1)

Here, p(x) denotes the probability of an event (in our case, the relative frequency of a topic) and log p(x) the information content. Thus, a higher Shannon entropy in this context means greater diversity. Figure 2.3b shows the cumulative distribution: the further right the S-shaped curve is, the more aggregate diversity. This shows once again that liberals engage more diversely. The extent of the partisan difference is sizable, captured by the yellow shading in Figure 2.3a: when aggregated, it is equivalent to a 19.4% divergence where liberals engage more diversely than conservatives. Put another way, liberals retweeted more diverse topics than conservatives by 19.4% (a ratio of 1.19).

Figure 2.3: Liberals are exposed to and engage with more policy topics than conservatives, shown through a) unique policy topics retweeted and b) aggregate measures of entropy (the further right the sigmoid curve, the more diversity). c) scales topic entropy against ideological extremism with LOESS, and we observe that topic entropy decreases with extremism, until a pullback at higher levels of extremism.

We can further show this scales with ideological extremism. Figure 2.3c shows, using LOESS smoothing, that topic entropy decreases as the absolute value of ideology increases, and this happens faster for conservatives. This affirms a prior hypothesis from Boutyline and Willer (2017) that homogeneity increases with extremism, and conservatives exhibit this more acutely (suggesting that there is not symmetry among extremists on each side). The compounding of homophilous ties and lower topic diversity for these intermediaries would have downstream effects for who generates virality, especially with toxicity.
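As an illustration of the entropy score in Equation 2.1, the following is a minimal sketch rather than the code used in the study. It assumes each user is represented by a list of the topic labels they retweeted, and it assumes the natural logarithm (the text does not state the base). The function name `topic_entropy` and the toy data are hypothetical.

```python
import math
from collections import Counter


def topic_entropy(topics):
    """Shannon entropy of a user's retweeted-topic labels.

    p(x) is the relative frequency of topic x; the natural log is an
    assumption, since the base is not stated in the text.
    """
    counts = Counter(topics)
    total = sum(counts.values())
    return -sum((c / total) * math.log(c / total) for c in counts.values())


# Toy users: both engage with two topics, but with different skew.
balanced = ["health"] * 25 + ["defense"] * 25
skewed = ["health"] * 50 + ["defense"]  # fifty times topic 1, once topic 2

balanced_score = topic_entropy(balanced)
skewed_score = topic_entropy(skewed)
```

Both toy users have the same count of unique topics, yet the skewed user receives a lower entropy, which is the distinction the unique-topic count in Figure 2.3a cannot capture.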
One further extension is the relationship between bots and media diversity, as bots may share a select few media sources by design and, as they share in higher volumes, may be over-represented at certain extremity bins with biases toward certain topics (Chang & Ferrara, 2022).

2.3.2 Toxicity

We next turn to tweets that invoke the parties regardless of issues. Figure 2.4a displays the relationship between the toxicity of an MC’s tweet about their own or the other party and the extent to which the tweet is retweeted. The figure shows, consistent with prior work, that in every case, as toxicity increases, so does retweeting (Brady et al., 2020). It also shows that Democrat MCs’ out-group tweets that are particularly toxic are most likely to be retweeted. In contrast, Republican MCs’ out-group tweets have a much lower level of retweeting. Figure 2.4b plots different audience responses to distinct tweets by MCs. It shows that liberals generally retweet much more of the toxic tweets than conservatives do. Indeed, conditioning upon the partisan source (e.g., same party or other party), the graph shows that the gap between users’ reactions grows as toxicity increases (i.e., the slopes on the liberal user lines are much steeper than those on the conservative user lines). Substantively, liberals retweeted toxicity from their own party 1.56 times more often than conservatives. Moreover, liberals retweeted toxicity from the other party -8.30 times more often than conservatives, where the negative sign indicates different directions in response.

Figure 2.4: Virality versus toxicity, split by (a) partisan in-/out-group mentions and (b) user-level response. Out-group tweets are in general more viral and uncivil (a). Republican out-group tweets can be viral (red), due to response from the left (a). Democrat users respond much more drastically to toxicity than Republicans (b).
In summary, retweeting occurs much more when Democrat MCs are toxic against Republicans than vice versa, and this reflects that liberals are more apt to retweet toxic tweets. This is consistent with Hypothesis 2. Liberals will engage in retweeting toxic critiques of Republicans but conservatives are less likely to retweet toxic tweets of Democrats.

In Figure 2.5, we provide more descriptions of the tweets. Figure 2.5a reports the average amount of toxicity in distinct types of tweets, showing that out-group tweets from either party contain more toxicity. This is particularly true among Democrat MCs. Figure 2.5b and c describe the profile of retweeters in terms of ideology, whereas Figure 2.5d displays the aggregate amount of retweeting. Figure 2.5b shows in-group audience composition of retweets, revealing – consistent with the diversity dynamic – that Democrat MCs are overwhelmingly retweeted by liberals, while Republican MCs have a smaller in-group retweet percentage (although still the clear majority are in-party). That said, for both parties, out-group tweets draw more in-party users. This is further evidenced by Figure 2.5c, which shows Democrat MCs get retweeted by an extreme set of liberal users while Republican MCs draw only a moderately conservative set of users. Looking specifically at out-group tweets, we see that the ideology of the audience is more extreme for out-group tweets from both parties. Out-group tweets are more toxic, homogeneous, and extreme. As with Figure 2.3, we thus see some extremity impact, but importantly it is not the extremes on each side that evade topic diversity or endorse toxicity (as a symmetry hypothesis might suggest) – that stems from ideological predilection.

Figure 2.5: In-/out-group effects on (a) toxicity, (b) in-group composition, (c) political position, and (d) virality.
Figure 2.5d reveals that Democrat MC out-group tweets get the most retweets, reflecting that liberals are much more apt to retweet such communications (that tend to be toxic). Republican MC in-group tweets are more viral than their out-group tweets. This reflects conservatives’ relative toxicity aversion (where toxicity is more present with out-party tweets) and the presence of Trump in the 2015 primaries and 2016 election driving more in-party retweeting. In sum, liberals engage in retweeting toxic critiques of Republicans but conservatives are less likely to retweet toxic tweets of Democrats.

(a) Democratic MCs
                        Not Policy   Policy   N.Pol. + Pol.
Non-toxic               32.3%        52.6%    84.9%
Toxic                   3.3%         11.8%    15.1%
Non-toxic/Toxic Ratio   9.739        4.447    –

(b) Republican MCs
                        Not Policy   Policy   N.Pol. + Pol.
Non-toxic               41.4%        48.7%    90.2%
Toxic                   3.1%         6.7%     9.8%
Non-toxic/Toxic Ratio   13.25        7.27     –

Table 2.1: Policy versus non-toxic/toxic content, across Democratic and Republican MCs. Democrats tweet about policy with more toxicity than Republicans (4.4 versus 7.3 civility ratio).

In the previous section, we showed that liberals are much more likely to retweet Republican issues than conservatives are to retweet Democratic issues. This section showed that liberals are much more likely to retweet discussions of Republicans that tend to be toxic than conservatives are to retweet about Democrats (which also tend to be toxic). This aligns with each MC’s incentives for maximizing virality. Therefore, liberals engage with, i.e., retweet, more diverse issue agendas and more discussion of the other side (and more toxicity). In short, our results show that the audience diversity displayed in the audiences for MCs from different parties (Figure 2.1) does in fact reflect asymmetric engagement based on policy issues and toxicity.
2.3.3 Supply-Side

It is useful to explore whether supply-side effects drive any of the asymmetry; that is, are the dynamics we are displaying not only demand-driven (e.g., Frimer et al. (2023))? In Table 2.1, we present confusion matrices of MC tweets by non-toxic/toxic and not policy/policy content, split by party. The bottom row shows the ratio of non-toxic to toxic tweets for each policy category. We classify a tweet as toxic if its toxicity score is greater than the mean plus one standard deviation (0.237). These results remain the same when using the mean alone as the threshold (see Table S4); the additional standard deviation is more conservative. The main finding can be captured by comparing the not toxic/not policy category versus the toxic/policy category. For Democrat MCs, the respective percentages are 32% and 12% whereas for Republicans, they are 41% and 7%. In short, Democrats tweet much more about policy and they do so with much more toxicity than Republicans. The overall toxicity is 15% for Democrat MCs and 10% for Republican MCs. This suggests that the supply of content differs between partisan MCs.

Figure 2.6: CatBoost regression on the four variables, then ranked top to bottom in terms of feature importance, separated by MC tweets (Democrats/left, Republicans/right). We find toxicity and group status more predictive for Democratic MC tweets and policy topic more for Republican MC tweets.

Of course, supply depends on demand and thus, we next look at the relationship of virality with MC choices. Using boosting and Shapley value analysis, we regress virality (number of retweets), separately per MC party, on the following variables:

• PolicyDifference: The supply difference of the 20 Comparative Agendas Project policy topics.
• Policy: Whether it is a policy issue or not.
• Toxicity: Toxicity level of that tweet.
• Groupstatus: Whether a tweet is an in-group, out-group, or not related.
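The toxic/non-toxic labeling rule used for Table 2.1 (a score above the mean plus one standard deviation) can be sketched as follows. This is an illustrative sketch only: the scores are made up, the function names are hypothetical, and the use of the population standard deviation is an assumption, since the text does not say which estimator was used.

```python
from statistics import mean, pstdev


def toxicity_threshold(scores):
    """Cut-off at mean + 1 SD (population SD is an assumption here)."""
    return mean(scores) + pstdev(scores)


def label_toxic(scores):
    """Return True for each score strictly above the cut-off."""
    cutoff = toxicity_threshold(scores)
    return [s > cutoff for s in scores]


# Made-up toxicity scores in [0, 1], for illustration only.
scores = [0.05, 0.10, 0.08, 0.12, 0.07, 0.90, 0.06, 0.11]
cutoff = toxicity_threshold(scores)
labels = label_toxic(scores)
```

Dropping the standard-deviation term lowers the cut-off to the mean, so more tweets are labeled toxic; this is the sense in which the extra standard deviation is the more conservative choice.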
The results are in Figure 2.6. We find a clear asymmetry with Democrat MC tweets becoming much more viral when they invoke the other group and when they are toxic. In contrast, Republican MC tweets become more viral based on policy, presumably when the policies are their own. This outlines the different incentives for MCs based on party, if they are seeking to maximize their virality. Furthermore, we show that for MCs of both parties, Democrat-dominant topics generate more virality. This explains most clearly the toxicity asymmetry in supply: Democrat MCs tweet with more toxicity, as that draws more of an audience for them. This makes sense given liberal users are relatively drawn to toxicity.

Figure 2.7: Partisan agenda items versus sensitivity to toxicity (slope of toxicity to virality), broken down by liberal and conservative users. Democrats seem to discuss international affairs with less toxicity. Labeled dots are explicitly discussed.

In Figure 2.7, we further look at the demand for toxicity. Here, we plot the effect of toxicity on retweeting on the y-axis (i.e., the slope from regressing virality on toxicity; tabular results shown in Table S3). This is a proxy for sensitivity (i.e., y = 0 indicates no change to virality based on toxicity). The x-axis shows the ideological dispersion on the given issue with each dot indicating an issue tweeted by a Republican MC (red) or a Democrat MC (blue). In line with what we have already shown, when toxicity is used, liberals are substantially more likely to retweet it than conservatives. The figure reveals a large effect. On Democratic issues, when there is toxicity, conservatives in fact are less likely to retweet it (lower beta/sensitivity). Even on Republican issues, they retweet toxic tweets at much lower rates than liberals do.
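The sensitivity measure on the y-axis of Figure 2.7 is a regression slope of virality on toxicity. A minimal sketch of such a slope, assuming a simple one-variable ordinary least squares fit (the exact specification behind Table S3 is not given in this section), is shown below with made-up numbers; `ols_slope` is a hypothetical helper.

```python
def ols_slope(x, y):
    """Slope of the least-squares line y = a + b*x (simple OLS)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / sxx


# Made-up data: toxicity scores and retweet counts for one topic.
toxicity = [0.0, 0.1, 0.2, 0.3]
retweets_sensitive = [10, 12, 14, 16]   # retweets rise with toxicity
retweets_flat = [5, 5, 5, 5]            # no response to toxicity

slope_sensitive = ols_slope(toxicity, retweets_sensitive)
slope_flat = ols_slope(toxicity, retweets_flat)
```

A slope of zero corresponds to y = 0 in the figure (no change in virality with toxicity), while a positive slope indicates an audience that retweets more as toxicity rises.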
There are no issues that conservatives are more likely to retweet when they are toxic, though this reflects somewhat that conservatives retweet less and also are less present on Twitter in general.

In sum, the asymmetry in retweets and thus information environments is a result of both supply and demand preferences. Republican MCs gain the attention of liberals for cross-cutting issues through toxicity (without compromising their in-group followers), but not vice versa.

2.4 Discussion

We investigated whether there are asymmetric fractions to social media where liberals and conservatives spread information differently. We hypothesized ideological asymmetry due to cross-ideology policy and toxicity aversion among conservatives. We tested our predictions on the only available social media data, from Twitter, that provides the relevant measures.

We have three main findings. First, Republican Members of Congress have a more diverse audience, implying that liberal users engage in more cross-cutting engagement while conservatives pass along less diverse information. Second, conservative users are more selective about which policies they retweet, avoiding topics that are not on the conservative agenda. This may drive them to exist in a policy-based information bubble. This is consistent with work showing socially driven (but not necessarily news driven) policy bubbles (Barberá et al., 2015; Cinelli et al., 2021; Mosleh et al., 2021). Ours is one of the only analyses, however, to document the asymmetry in the nature of those networks (Boutyline & Willer, 2017). A key difference is that they focus on a user subset based on follows, which is not required for retweeting. Our dataset also begins from the supply level, whereas they start from the demand level. Third, toxicity creates cross-cutting engagement asymmetrically: liberals respond to, and retweet, toxicity more than conservatives do.
Put together, our results show that, if the forces that generate engagement for one party diminish engagement for the other, then we are pushed toward an asymmetric situation where one group engages with much less information. Specifically, liberals and conservatives react differently to information and, consequently, rebroadcast varying types of information. Liberals then will view a host of policies and toxicity while conservatives will mostly see issues on the conservative agenda and relatively less toxicity. These results add more evidence on behalf of the ideological asymmetry hypothesis from a new domain of testing.

The findings also have potentially crucial downstream implications. Wojcieszak et al. (2022) find most Twitter users do not follow partisan elites, which indicates a diffusion gap between partisan elites and the general public. It is plausible, if not probable, that most users receive more political information from other users via retweets. In that sense, the individuals we studied here serve as intermediaries à la the canonical theory of two-step information flow, where the general public receives political information not directly from elites but second-hand from other citizens (Carlson, 2019; Druckman et al., 2018; Katz, 1957; Weeks et al., 2017).

While the composition of the intermediaries’ audiences is not something we measured, there has been extensive work detailing the extent of homophily across liberal and conservative lines on Twitter. Boutyline and Willer (2017) find conservative and ideologically extreme individuals exhibit greater levels of homophily, and posit this is caused by a stronger “preference for certainty.” Putting this together with our finding that liberals retweet a more diverse palette of topics suggests that this diversity is compounded with liberals’ more heterophilous social ties.
Conversely, for conservatives, less diverse sharing and homophilous ties likely create an “information bubble.” This bubble reflects liberals being more likely to engage both with heterogeneous content and with heterogeneous sources (relative to conservatives). Moreover, the compounding effects of toxicity, tie homophily, and topic diversity mean the demand for out-group animosity may not necessarily arise from their expected constituents. The consequence could be very distinct information environments for liberals and conservatives, with liberals having access to a wider array of policy topics but also more toxicity relative to conservatives. This could make cross-ideological or cross-partisan interactions more difficult since those with distinct outlooks are engaging from very different places. This would have downstream implications for how social media should be designed, if the purpose is to increase exposure to and engagement with out-groups. Symmetric approaches, such as blanket thresholds for toxicity or out-group topics, would have heterogeneous success based on the ideology of the user.

2.5 Method

We extend Frimer et al. (2023)’s data source to include the following covariates (at the MC tweet level, i.e., the supply level): Comparative Agendas Project (CAP) topic, in-/out-group labeling, MC source party labeling, and, upon scraping retweeter timelines, user ideology labeling. We use the toxicity scores provided by Frimer et al. (which are based on an independently validated operationalization of Perspective API’s “toxicity”).

Topic (Supervised): We trained a supervised machine learning classifier using labeled tweets from Russell (2018), with the assumption that the linguistic characteristics of tweets between the House and the Senate are similar. The dataset includes a total of 68,398 tweets: 45,402 tweets labeled with codes from the Comparative Agendas Project’s (CAP’s) codebook (Bevan, 2019) and 22,996 labeled as non-related tweets.
The best models from Russell feature F1-scores of 79% and include augmentation via the Linguistic Inquiry and Word Count (LIWC) scores. LIWC is a gold standard package for computerized text analysis covering a range of psychological and topical categories and social, cognitive, and affective processes (Tausczik & Pennebaker, 2010). Using these features, we implement deep learning, specifically a variant of BERT (Bidirectional Encoder Representations from Transformers), which is a variant of the transformer architecture fine-tuned for the English language. We implement the large BERTweet architecture (a specialized BERT model for Twitter and tweets), using a pre-trained model from Nguyen and colleagues (2020). We train two classifiers – one for granularly extracting topics and another for distinguishing if policies are mentioned. Our first (multiclass) model produces an accuracy of 0.864 for classification across all 20 policy topics. Our second (binary) model yields 88.5% accuracy; both outperform those by Russell. Full classification results are given in Table 2.2 and Table 2.3 in the Supplementary Information.

Group-labeling: We implemented the same approach as Rathje et al. (2021), using a list of keywords that refer to Democrats (such as left-wing, liberal, etc.), Republicans (right-wing, conservatives, etc.), and the most popular politicians based on polling from YouGov.

Source Party Labeling: A total of 831 Members of Congress had Twitter labels and were labeled. Several have changed accounts, and these were manually merged. For those who had switched parties, their party affiliation at the time of the tweet was used.

User ideology: Ideology scores were labeled based on a user’s timeline history (up to 3,200 tweets per the limit of the API). This yielded a total of 3,522,734,792 tweets. URL domains were extracted from the timeline.
Associated with each URL is a political score of [-2, -1, 0, 1, 2], corresponding to “left,” “left-center,” “center,” “right-center,” and “right,” based on Media Bias/Fact Check. A weighted political score is then calculated based on the proportion of tweets from each category. This approach has been adopted in many contexts related to Twitter ideological modeling (Chang, Chen, et al., 2021; E. Chen et al., 2021; Ferrara et al., 2020; Huszár et al., 2022). Only tweets prior to a retweet are used to attribute ideology, to allow for the possibility that a retweeter switch is small.

2.6 Supplementary Information

Figure 2.8: Average political position of audience, by topic and source partisanship (Party of the MC). Validates “starting points” for diversity values in Figure 2.2.

Figure 2.9: Raw values for the supply in the axes of Figure 2.2 and Figure 2.3. Blue denotes number of tweets from Democratic MCs; red denotes number of tweets from Republican MCs.

Label   Policy Topic             Accuracy
0       Not Policy Related       0.666
1       Macroeconomics           0.845
2       Civil Rights             0.883
3       Health                   0.948
4       Agriculture              0.893
5       Labor                    0.762
6       Education                0.929
7       Environment              0.782
8       Energy                   0.834
9       Immigration              0.960
10      Transportation           0.885
12      Law and Crime            0.919
13      Social Welfare           0.885
14      Housing                  0.925
15      Domestic Commerce        0.821
16      Defense                  0.847
17      Technology               0.808
18      Foreign Trade            0.708
19      International Affairs    0.815
20      Government Operations    0.859
21      Public Land              0.679
        Average Accuracy         0.864
        F1 Score                 0.864

Table 2.2: Accuracy and F1 values per topic from deep learning model (using the BERTweet architecture). Each policy topic is labeled and derived from the Comparative Agendas Project.

Label        Accuracy
Policy       0.926174
Not Policy   0.824503
F1 Score     0.88518

Table 2.3: BERTweet classifier accuracy for policy versus not policy.
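The user-ideology score described in Section 2.5 (a proportion-weighted average of domain ratings on the [-2, 2] scale) can be sketched as below. This is an illustrative sketch under stated assumptions, not the pipeline used in the study: the domain ratings are placeholders on reserved `.example` domains rather than actual Media Bias/Fact Check ratings, the helper name `user_ideology` is hypothetical, and skipping unrated domains is an assumption.

```python
from collections import Counter
from urllib.parse import urlparse

# Placeholder ratings on the [-2, 2] scale (left ... right).
# These are NOT real Media Bias/Fact Check ratings.
DOMAIN_SCORES = {
    "left.example": -2,
    "leftcenter.example": -1,
    "center.example": 0,
    "rightcenter.example": 1,
    "right.example": 2,
}


def user_ideology(urls, domain_scores=DOMAIN_SCORES):
    """Proportion-weighted political score of the URLs a user shared.

    Unrated domains are skipped (an assumption). Returns None when the
    user shared no rated domain.
    """
    rated = []
    for url in urls:
        domain = urlparse(url).netloc.lower()
        if domain in domain_scores:
            rated.append(domain_scores[domain])
    if not rated:
        return None
    counts = Counter(rated)
    total = len(rated)
    # Weight each category's score by its share of the rated links.
    return sum(score * (n / total) for score, n in counts.items())


timeline = [
    "https://left.example/story-a",
    "https://center.example/story-b",
    "https://left.example/story-c",
    "https://unrated.example/story-d",
]
score = user_ideology(timeline)
```

In this toy timeline two of the three rated links are from the "left" category and one from "center", so the score lands between -2 and 0.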
Chapter 3

Interior Design: Multimodal and Multiplex Affordances from the 2020 BLM Movement

3.1 Abstract

We present and analyze a database of 1.13 million public Instagram posts during the Black Lives Matter protests of 2020, which erupted in response to George Floyd’s public murder by police on May 25. Our aim is to understand the growing role of visual media, focusing on a) the emergent opinion leaders and b) the subsequent press concerns regarding frames of legitimacy. We provide a comprehensive view of the spatial (where) and temporal (when) dynamics, the visual and textual content (what), and the user communities (who) that drove the social movement on Instagram. Results reveal the emergence of non-institutional opinion leaders such as meme groups, independent journalists, and fashion magazines, which contrasts with the institutionally reinforcing nature of Twitter. Visual analysis of 1.69 million photos shows symbols of injustice are the most viral coverage, and moreover, actual protest coverage is framed positively, in contrast with combatant frames traditionally found from legacy media. Together, these factors helped facilitate the online movement through three phases, culminating with online international solidarity in #BlackOutTuesday. Through this case study, we demonstrate the precarious nature of protest journalism, and how content creators, journalists, and everyday users co-evolved with social media to shape one of America’s largest-ever human rights movements.

3.2 Introduction

On May 25, 2020, 17-year-old Darnella Frazier used her smartphone to film Minneapolis police officer Derek Chauvin kneeling on the neck of George Floyd (Hill et al., 2020). She uploaded the nine-minute footage to Facebook, and within two days, the video had been seen globally.
By June 2, 2020, two Black women, both music executives, created the viral hashtag #TheShowMustBePaused (Kaufman, 2020), calling for artists to post a single black square on social media. By Tuesday, millions of black squares had flooded Instagram. Millions of everyday people joined the initiative and did the same in what came to be known as #BlackOutTuesday (Coscarelli, 2020).

In this chapter, I examine how Instagram facilitated this movement, through its apex and eventual nadir. This case study is compelling because Twitter, with its text-based nature, has been the center of social movement and collective action research since the Arab Spring uprisings (Arafa & Armstrong, 2016; Hermida et al., 2014; Howard & Hussain, 2013; Khondker, 2011). This is the first time visuals became the center of an online social movement, and Instagram became political around civil rights. More importantly, in modern social movements, it is often an incendiary photo or video that galvanizes the public’s support for a social justice issue (Corrigall-Brown, 2012; Edrington & Gallagher, 2019; Marsh, 2018; Neumayer & Rossi, 2018), which may point toward new paradigms of organizing around civil rights.

3.2.1 Black Lives Matter and Social Media

When George Zimmerman was acquitted for shooting and killing Trayvon Martin, an unarmed Black teenager, Alicia Garza, a political organizer, wrote a love letter to Black people after the verdict was announced. She posted it on Facebook (Garza, 2014) and ended it with the sentence, “Black Lives Matter.” Her fellow organizer, Patrisse Cullors, added a hashtag to the front of the message and shared it on Twitter. The virality of #BlackLivesMatter would come a year later, in 2014, when Darren Wilson, a white police officer, shot and killed an unarmed Black teenager, Mike Brown, and left his body in the road for four hours.
In the ensuing protests (Buchanan et al., 2015), citizen journalists began using the hashtag to draw attention to Brown’s killing and the community’s outrage (Lopez, 2016). For the rest of the summer, use of the hashtag soared (Anderson et al., 2020). As the hashtag was created in response to police brutality, counternarrative hashtags also emerged, such as #AllLivesMatter (to reject claims of racism in policing) and #BlueLivesMatter (to support law enforcement) (Baptiste, 2021; Carney, 2016; Solomon et al., 2021).

An exhaustive study of Twitter discourse found that six major communities relied on Twitter to discuss police brutality in 2014 and 2015: Black Lives Matter activists, Black entertainers, conservatives, bipartisan reporters, legacy media outlets, and young Black Twitter users. Freelon et al. (2016) analyzed 40.8 million tweets, more than 100,000 web links, and 40 interviews with frontline activists and allies, and discovered that the vast majority of those who tweeted using the #BlackLivesMatter hashtag denounced police brutality. The authors also found that activists who tweeted movement-related news succeeded in educating “casual observers” who either expressed “awe and disbelief at the violent police reactions to the Ferguson protests” or “conservative admissions of police brutality,” especially in the cases of Eric Garner and Walter Scott’s public police killings. Overall, the research postulated that activists’ primary goals for using Twitter were “education, amplification of marginalized voices, and structural police reform.”

Subsequent inquiries into the Black Lives Matter movement’s use of Twitter in 2016 included how users craft counternarratives to anti-Black racism (Jackson, 2016; Richardson, 2020). The Pew Center, for example, reported in 2016 that the #BlackLivesMatter hashtag peaked during the 10 days spanning July 7-17, with nearly 500,000 tweets daily in the wake of back-to-back killings, including that of Alton Sterling. Studies such as these
Studies such as these 41 centered around Twitter as a digital public sphere (Papacharissi, 2002). In the bookHashtagActivism (Jack- son et al., 2020), for example, the authors argue that the Black Lives Matter movement, and other contem- porary Twitter-based movements then, were propelled by strong Twitter communities of Black women. Although publications such as The New York Times or The Atlantic published end-of-decade pieces on how social media shaped modern protest in the 2010s, the emphasis remained on Twitter’s impact — until this study. 3.2.2 APoliticalInstagram Darnella Frazier’s cellphone video of George Floyd’s murder reinvigorated the Black Lives Matter move- ment, not unlike the photographs of Emmett Till’s 1955 lynching rebooted the dwindling, post-World War II Civil Rights Movement (Hermida, 2010). In Bearing Witness While Black, Richardson draws these paral- lels, arguing that pictures and videos have played an outsized role in Black movement-building. The rise in smartphone video as a tool for political testimony to explain why Black Americans “press record" as a means to fight back against systemic oppression in the US (Richardson, 2020). This observation motivates the study of Instagram not just as a facilitator of the 2020 BLM movement, but a space for citizen journalists to document civil rights movements. Media pundits and scholars have in recent years begun exploring Instagram’s affordances, and early content largely fell outside of the scope of social movement media. An early probe into the world of Instagram endeavored to explain how users’ “experiences of production, sharing, and interaction with the media they create" are mediated by the “interfaces of particular social media platforms" (Hochman & Manovich, 2013). This study was one of the first to use computational analysis and visualizations to explore Instagram’s social and cultural patterns. 
The team compared the visual signatures of 13 different global cities using 2.3 million Instagram photos, and homed in on 200,000 Instagram photos that were uploaded in Tel Aviv, Israel. While the three-month study confirmed that one could ascertain people’s activities and political habits from the geotagged photos, there was not a sustained look at a particular viral moment, nor substantive advances in understanding how people used visuals to organize.

Other Instagram studies from the 2010s followed a similar pattern. Scholars often investigated what images people uploaded to Instagram (Ferrara et al., 2014) or how to detect the age of a user from their photos (Jang et al., 2015), but stopped short of analyzing Instagram photos during moments of political unrest. Still other studies elucidated the effects of pop culture (Al-Kandari et al., 2016), or the five primary social and psychological motives of Instagram use: “social interaction, archiving, self-expression, escapism, and peeking” (Lee et al., 2015).

Recent studies on political movements followed suit. Meme pages and celebrities, for instance, have been recently characterized as sparking instances of political participation in Morocco (Moreno-Almeida, 2021), maintaining partisan identity in Canada (McKelvey et al., 2021), promoting public health behavior (Chang, Pham, et al., 2021), or facilitating misinformation (Zidani & Moran, 2021). Recent work has focused on the growing role of visuals in protests directly, such as the case of the Hong Kong Protests in 2019 (Haq et al., 2022). Amid the virality of the #BlackOutTuesday hashtag on Instagram, we decided to study the visual platform, finally, in a political context.

The backdrop of the global discourse about Mr. Floyd’s controversial video was not the only reason for the shift away from Twitter and toward Instagram.
As I find, Instagram content is four times more likely to be geotagged than Twitter content, which provides us invaluable insight into when and where the #BlackOutTuesday groundswell occurred. Instagram provided the right lens for studying the visual dimension of this phenomenon. Recent studies have also shown the importance of racial presentation in mediating (mis)information dissemination (Freelon et al., 2022), which may be more evident in the visual mode.

Moreover, Instagram and its affordances are worthy of direct study. Many of the #BlackOutTuesday posts on Twitter linked back to an original post on Instagram. The platform remains a walled garden, which does not make it easy for users to hyperlink outward: users are only allowed one link from their bios. That is, Twitter can reference Instagram posts but not vice versa, creating an asymmetry in information flow. Trapping captive audiences within Instagram gives the platform enormous power over what the user sees, in that content is mediated solely through one platform.

There is certainly the perspective that #BlackOutTuesday was performative and did not directly advance the movement. For instance, a recent study found, through 20 interviews of wellness influencers, that sharing of the square was for maintaining credibility with their following base (Wellman, 2022). However, that work focuses primarily on influencers, users that have achieved a certain level of popularity, and on wellness, which is an even more specific slice of Instagram. Moreover, the greater point is that there was sufficient demand for influencers to navigate the movement, which creates a larger presence on Instagram than previously, whether solidarity is performative or not.

3.2.3 Organizational Dynamics: Modality and Opinion Leadership

Modality, taken from semiotics, refers to the format in which information is stored, prior to presentation (Bateman, 2012).
The difference between Instagram and Twitter is thus the visual and textual mode, respectively. As such, Instagram is poised to advance research on how visuals aid social movements when held in comparison with the textual mode. I am particularly interested in two dimensions: framing and opinion leadership.

First, the visual modality of engagement, through photos and video, demands a different form of participation than text. Per the adage “a picture contains a thousand words,” visuals are dense and powerful message holders. Studies have shown visual messages are more temporally efficient (Powell et al., 2015; Rodriguez & Dimitrova, 2011), amplify the affective reaction, and reinforce textual content (Dan, 2018; Paivio, 1991). Beyond photos, injustice symbols can serve as a force that extends political efforts beyond local, even national, accounts (Olesen, 2015).

It is useful to frame this study in terms of the traditional relationship between press and protest. Longstanding work has shown institutional news media portrayals delegitimize collective action (Chan & Lee, 1984; Detenber et al., 2007; Kilgo & Harlow, 2019). As conceptualized, images of protest are distributed by legacy media sources. Protestors in turn sought their attention through disruptive or combative tactics, which led to condemnation, and thus delegitimization. The participation of citizen journalists, theoretically, would remove this necessity, and recent scholarship explores this possibility. A recent experimental study finds visual frames important for increasing support and identification toward protesters (Brown & Mourão, 2021).

Apart from the modality itself, the legacy media has traditionally occupied opinion leadership, and this flow of information through elites has garnered significant attention since the proposition of two-step flow more than half a century ago (Katz, 1957).
While research that intersects visuals in social movements on digital media remains sparse compared to textual analysis, there are still numerous studies. For instance, Neumayer and Rossi (2018) studied photos and videos on Twitter during the Blockupy protests in Frankfurt. However, they found reinforcement in regard to the politics of visibility, where institutional and official accounts still garnered the most attention (retweets). Instagram, with its less-public interface and limited hyper-linkage, could serve as a counterpoint to Twitter’s organizational dynamics.

3.3 Literature Review

3.3.1 Richness and Affordances

In BLM2020, there are two intuitive ways in which richness provided Instagram an advantage over Twitter and traditional platforms for organizing. First, as it was the video of Mr. Floyd’s murder that produced the global outrage towards injustice, Instagram would be a better platform than text-based Twitter. Second, the localized network allowed greater coalescence due to grassroots opinion leadership.

We begin with the discussion on media, richness, and affordances. Media Richness Theory was developed in the 80s following the growth of electronic media, beginning with phone calls and email (Ishii et al., 2019). Since then, the development of technologies such as video conferencing affords richer transmission due to body language. In the original theory, the elevation of richness helps resolve ambiguity in certain situations. There are two elements. First, uncertainty denotes the absence of information. Second, equivocality describes the state of multiple interpretations. The product of these two dictates the efficacy of organizational communication.

Taken in context of the typology of protest, Instagram provides an environment where posts with low uncertainty but high equivocality can thrive. In the online mobilization literature, Theocharis et al.
(2015) identified four ways textual messages functioned: (1) political mobilization; (2) coordination; (3) information; and (4) conversation. A single image can contain multiple dimensions of Theocharis’s typology of platform-mediated protest. A black square is simultaneously conversation, mobilization, and coordination.

Beyond the function of posts in protest, we can consider the broader literature of organizational communication. As discussed in the introduction, the application of the algorithm fundamentally surrounds the issue of visibility. This is well theorized by Treem and Leonardi (2013), who posit from an organizational perspective that beyond visibility, social media also provides persistence, editability, and association. Whether it is in the protest literature or media affordances literature, the difference between text and image is the simultaneous presentation of multiple functions by image. A picture is worth a thousand words, largely due to the high equivocality inherent to visual media.

3.3.2 Framing

Apart from resource mobilization and political opportunity processes, collective action frames have emerged as a crucial element within social movements. Framing is a guiding theory within the study of news coverage, and in clear but banal terms, concerns how political issues are presented. In public opinion research, high-quality opinions are those that are stable, consistent, informed, and connected to principles and values. High-quality opinions are generally rare in the mass public (Chong & Druckman, 2007). Framing denotes how small changes in the presentation of an issue can generate large changes in opinion (Chong & Druckman, 2007).
A common example is how, when asked if they supported hate groups holding a political rally, 85% of people answered in favor when the question was prefaced with “Given the importance of free speech.” This drops to 45% when prefaced with “Given the risk of violence” (Sniderman & Theriault, 2004). A natural question is first, whether a similar manipulation of visuals can generate a similar change in opinion, then second, how media-rich environments may facilitate information exchange differently.

In the traditional media, content is generally theorized as emerging from the journalistic news gathering process, which is dominated by the frames of political elites. In the context of citizen journalism, framing becomes an activity that extends beyond media actors to all social actors that engage with the social network.

Our hypothesis is that the affordances of Instagram essentially provide means by which any protestor can engage with the framing of the movement. In combination with a feed-ranking algorithm, framing is no longer a top-down process produced by institutional accounts, but an evolutionary process of user-generated content and algorithm-driven popularity. Through the algorithm, the media-rich ecosystem allows more nuanced opinions and positive framing of protestors to gain visibility by shifting the source of opinion leadership. This helps protestors achieve their communication goals within political organizing.

3.3.3 Geography and Networks

Strip away the affordances that make a platform unique and you find a social network. The social structure of communities is often formalized as networks (Olson, 1989) that facilitate collective action and lower the cost of involvement. In particular, the strength of ties may be a significant contributor, as strong ties are amenable to personal recruitment. The frequency of interaction is typically correlated with tie strength (Centola, 2018), which can be a significant driver of collective action.
On the other hand, weak ties can be crucial in providing information outside of one’s immediate circle (Granovetter, 1973). Social media increases the exposure to these weak ties, such as through topic-based algorithmic filtering or the following of celebrities (Shmargad & Klar, 2020). The collective influence of weak ties thus often overcomes the individual contributions of strong ties (Bakshy et al., 2012; Lubbers et al., 2019).

Directionality and strength: However, one weakness of the strong-weak tie paradigm is that it typically describes personal connections, such as best friends versus acquaintances. It does not distinguish verified users or celebrity accounts from ordinary ones, nor does it account for the direction of interaction. Unlike on platforms such as Facebook, “follows” on Twitter and Instagram are one-directional, meaning A can follow B without B reciprocating a follow. The notion of parasocial interaction thus becomes useful in these contexts, where one person extends emotional energy, interest, and time, whereas the other party is unaware of their existence (Giles, 2002). In the context of mass communication, this is common with celebrities and their audiences. This has been highlighted in the context of YouTube, where video-based audiences were cultivated (Kurtin et al., 2018). A possible hypothesis is thus that, due to modal and media richness, these one-sided ties may be much stronger on Instagram than on Twitter, owing to greater and deeper levels of engagement a priori.

Localization of Content: Moreover, the localization of content proves relevant. As we find in our dataset, 13% of posts are geotagged, compared to the average of 2% of Tweets in comparable datasets. A corollary of media richness may be a greater sense of physical place on Instagram, hence localization of content. Thus, the journalistic coverage generated by citizens is much more local, rather than a homogenous mass.
Indeed, the protests spread far beyond the United States, with large demonstrations in the UK and France (Sandford, 2020). The localized content increases the perceived level of diversity of the movement, which may increase identification with the cause and mobilize protestors.

At the intersection of these two notions is how these networks are formed. The network structure is an intermediary notion, as it both determines the information environment and is determined by the platform’s affordances. Twitter has often been characterized as highly centralized; that is, a few users are followed by everyone. As such, Twitter’s diffusion patterns resemble “broadcasting” rather than the word-of-mouth analogy we often hear (Goel et al., 2016). However, due to Instagram’s localization of content, and potentially less centralized structure, the more interpersonal analogy may hold truer.

3.3.4 Research Questions

While offline events, such as widespread protests, certainly also occurred, contemporary social movements are coordinated significantly through online spaces, such as social media. Various reports have highlighted the importance of Instagram. If our assumptions hold, namely that a) online social platforms are important to protest organizing and b) Instagram has taken an outsized role compared to prior movements and Twitter, then answers to these questions can address how solidarity can be in part attributed to shifts in framing and shifts in opinion leaders found on Instagram. As such, our investigation into Instagram offers rich comparative insight toward a) how the modality of engagement shifts frames of legitimacy, and b) the nature and role of emergent opinion leaders. We offer the following research questions and hypotheses:

1. Characterization: What were the temporal trends of June 2020’s second wave of the Black Lives Matter movement, in terms of frequency, geography, and textual content?
2. Injustice Symbols and Legitimization: What were the top shared images on Instagram, and how do they semantically, affectively, and symbolically function?
3. Network and Opinion Leaders: Do the central actors and communities that emerged on Instagram reinforce the institutional media, or are they conducive to grassroots connective action?

3.4 Materials and Methods

3.4.1 Data Collection and Description

We monitored Instagram posts for #JusticeForGeorgeFloyd from May 28, 2020 to June 30, 2020, extracting the top shortcodes from the public hashtag page every two hours using the Instaloader package. The hashtag was chosen as it was the top trending hashtag on Twitter, Instagram, and Facebook over the three days prior. Posts were then extracted through static shortcodes, including all photos, videos, post-specific public metadata, and comments. No personal information was collected, and non-verified accounts were hashed. Data was collected using Instagram’s public page with public posts only.

The entire dataset consists of 1,147,278 posts and descriptions and 1,694,909 photos. We found 155,282 of 1.13 million culled posts (13.7%) have location tags. This is a significantly higher rate of geotagging than in a Twitter dataset, which averages 3-4% geotagged posts (Chang, Chen, et al., 2021; Chang & Ferrara, 2022; Ferrara et al., 2020). We posit this arises due to the visual, scrapbook nature of the platform, as pictures taken live are associated with a physical location. The inclusion of this metadata allows us to understand the flow of protest geographically with much higher statistical power.

Table 3.1 shows the total number of posts by state, overviewing the distribution of US-based participation. While California, Florida, New York, and Texas occupy top positions due to their large populations, Minnesota is significant since the movement originated there. Washington, DC generated a high level of participation as well, relative to its population.
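The post-to-population ratio reported in Table 3.1 is a state's share of geotagged posts divided by its share of the US population. The following is a minimal sketch of that arithmetic using four rows of the table; the function and variable names are illustrative and are not taken from the original analysis code.

```python
def post_to_pop_ratio(pct_of_posts, pct_of_population):
    """Over-representation of a state: share of posts / share of population."""
    return pct_of_posts / pct_of_population


# (share of posts in %, share of US population in %), copied from Table 3.1
table_rows = {
    "CA": (21.10, 11.91),
    "NY": (15.10, 5.86),
    "MN": (9.62, 1.70),
    "DC": (4.23, 0.21),
}

for state, (pct_posts, pct_pop) in table_rows.items():
    ratio = post_to_pop_ratio(pct_posts, pct_pop)
    print(f"{state}: {ratio:.2f}")
```

A ratio above 1 indicates a state contributes more posts than its population share alone would predict.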
State   # of Posts   % of Posts   Pop. Rank   % by Pop.   Post-to-Pop. Ratio
CA      32,705       21.10%       1           11.91%      1.77
NY      23,477       15.10%       4           5.86%       2.58
MN      14,935       9.62%        22          1.70%       5.66
TX      10,212       6.57%        2           8.74%       0.75
FL      7,501        4.83%        3           6.47%       0.75
DC      6,573        4.23%        49          0.21%       20.14
GA      6,506        4.19%        8           3.20%       1.31
IL      4,704        3.03%        5           3.86%       0.78
PA      4,685        3.02%        6           3.82%       0.79
OR      3,068        1.98%        27          1.27%       1.56

Table 3.1: Top states by number of posts and comparison with actual population statistics. The percentage by population shows the actual percentage relative to the entire population of the United States. As such, the post-to-pop. ratio describes the level of over-representation a certain state has, with D.C. leading at 20.14 times the representation.

To discern the discursive dimension of the protest, we collected the top hashtags and sorted them by usage in Table 3.2. These include iterations of George Floyd (such as #JusticeForGeorgeFloyd and #GeorgeFloyd) and the Black Lives Matter movement (#blm and #blacklivesmatter). The phrase #icantbreathe also emerged as an important hashtag, co-occurring in 11% of all posts.

Hashtag                   Counts      %
justiceforgeorgefloyd     1,147,278   100%
blacklivesmatter          719,046     63%
georgefloyd               301,076     26%
blm                       224,415     20%
blackouttuesday           198,372     17%
justiceforbreonnataylor   156,208     14%
justiceforahmaud          144,287     13%
icantbreathe              126,407     11%
justiceforfloyd           122,818     11%
nojusticenopeace          114,406     10%
justice                   109,319     10%
protest                   85,273      7%
policebrutality           61,884      5%
racism                    54,935      5%
love                      53,351      5%
minneapolis               43,066      4%
equality                  42,539      4%
stopracism                38,684      3%
peace                     37,484      3%
breonnataylor             36,603      3%
black                     35,466      3%
repost                    34,324      3%
alllivesmatter            32,894      3%
saytheirnames             32,607      3%
blackoutday2020           31,062      3%
justiceforahmaudarbery    30,918      3%
ahmaudarbery              28,020      2%
acab                      27,212      2%
usa                       26,973      2%
endracism                 26,849      2%

Table 3.2: Top hashtags of 540,591 unique hashtags.
Moreover, an important feature of this second-wave Black Lives Matter movement was that it demanded legal and moral justice for other people who died from white supremacist vigilantism and police brutality around the same time as Mr. Floyd. Dual campaigns for Ahmaud Arbery (who was killed by three White men in Georgia that spring while jogging) and Breonna Taylor (whom police killed in Kentucky after issuing a mistaken no-knock warrant to her home) also emerged. The #saytheirnames hashtag attempted to connect these cases.

Notably, some posts carrying the #JusticeForGeorgeFloyd hashtag were dated prior to his death. A quick survey indicated these hashtags were added retroactively to posts to generate attention and traffic, what scholars might refer to as the pursuit of clout. These posts were filtered out.

3.4.2 Visual Content Analysis through Perceptual Hashing

To determine the top visual content that emerged from the movement, we first identified similar figures by conducting a perceptual hash (p-hash) on each image. This converts each picture into a 64-bit string, which is then used to extract the similarity between photos and identify the most popular images based on their hashes (Zauner, 2010). The algorithm first reduces the image (usually to 32 x 32 pixels), then applies greyscaling and a discrete cosine transformation, and finally converts the result to a string, yielding the resultant p-hash.

3.4.3 Network Analysis

We then constructed an interaction network using the full set of Instagram comments. A network is a set of nodes, which are connected by a set of edges. Edges can be directed or undirected. Directed edges indicate a directional relationship (such as unreciprocated following relations on Twitter), whereas undirected edges indicate a symmetric one (such as friendships on Facebook). In our analysis, directed edges are constructed between Instagram posters (source node) and people who comment (destination node).
Additionally, associated with each edge is a weight: the number of times two users interact in the comments section. We observe users who interact hundreds of times in the comments within a timespan of only 30 days. The resulting network contains 3,337,890 unique users and 3,976,914 unique edges.

We then aggregated these networks into state-level networks, to better understand how communication flowed inside and outside local communities. We describe this formally below. Let U denote the set of users and S represent the set of States. A represents the adjacency matrix for users (columns) and States (rows). We then aggregate A into a State-level adjacency matrix. Let i be the row index and j be the column index (with a_{i,j} the element). The algorithm for generating the state-state network can then be summarized in Algorithm 3.1.

\begin{align*}
& A = \text{adjacency matrix with dimensions } |S| \times |U| \\
& A_S = \text{pre-allocated } |S| \times |S| \text{ matrix} \\
& \textbf{for } i, \text{state} \in \mathrm{enumerate}(S): \\
& \qquad E := [v_j] \quad \forall j \text{ s.t. } a_{i,j} > 0, \quad v_j \text{ is a column vector} \\
& \qquad A_S[i, *] = \sum_{j \in E} v_j^{T} \tag{3.1}
\end{align*}

In simple terms, for each user (represented by a column), we considered all the geographical sources to which the user had been exposed. If a user had engaged with a post from both New York (NY) and Minnesota (MN), then the incoming edge from MN to NY (denoted MN → NY) would be equivalent to the number of times the user engaged with content from MN. Vice versa, NY → MN denotes the number of times the user engaged with posts from NY. The user thus serves as a proxy for cross-engagement between states. The sum of all user-state aggregations is the total weight between two states.

We also share a brief note comparing Instagram and Twitter network construction. Since Instagram does not allow URL sharing and hyperlinks, exposure to a post is strictly mediated by Instagram’s algorithm. There is no direct equivalent of a retweet.
Commenting on Instagram is closer to replying on Twitter, which naturally impacts the topology of the Instagram network.

3.5 Results

3.5.1 Temporal characterization of the movement

Our first research question concerns the temporal trends of June 2020’s second wave of the Black Lives Matter movement, in terms of frequency, geography, and textual content. To address this question, we considered the frequency of posts over the month of protest. Figure 3.1 shows the distribution of post frequencies on an hourly basis between May 25, 2020 and June 25, 2020, on a log scale. Right after the killing of George Floyd on May 25, we observed a lull of activity. This was then punctuated by an increase on May 31. Activity remained consistent until exploding in volume on June 3. Its use then decayed exponentially (from the order of 10,000 to 1,000 in two days).

Figure 3.1: Volume of Instagram posts plotted on an hourly basis, separated by top hashtags shared during the George Floyd Protests.

Figure 3.1 shows the temporal behavior of top hashtags (as given in Table 3.2), and shows what generated the spikes of activity between May 31 and June 3. Hashtags can be summarized as a few distinct categories: 1) mentions of George Floyd, Breonna Taylor, and Ahmaud Arbery, and 2) the BLM movement and the slogan #icantbreathe. The one exception is #BlackOutTuesday. It was localized during the spike on June 2, which suggests that the musician-led movement was not just a critical driver of momentum, but the most viral event on Instagram between May 30 and June 3. We can thus split the second wave of the Black Lives Matter movement into two periods: the first, driven organically after the death of George Floyd, and the second, driven purposefully via the #BlackOutTuesday event.

3.5.1.1 Geographic Network Analysis

As noted in the methods, New York, California, and Minnesota claimed the highest activity, which is reflected temporally (Appendix Fig. A2).
We thus turn our attention to the interaction between various states, to understand the localization of movements, summarized in Table 3.3. The summary technique follows network flow based on the direction of edges (Bui et al., 2022; Chang, Bui, & Mcilwain, 2022; Chang & Fu, 2021). By looking at the circulation of information in, out, and within a state, we can better understand the consumption and attention dynamics from a supply and demand perspective.

High self-flow (SF) indicates significant localization of the movement, and relatively less attention to other locations. High out-flow (OF; exports) demonstrates that a state receives attention from other states. Higher in-flow (IF; imports) indicates a state pays attention to other states. In other words, this table shows the supply and demand dynamics at the state level: if IF > OF, then demand outstrips supply. If OF > IF, then supply outstrips demand. We choose this state-level normalization as otherwise, small states will be lost in the analysis.

Immediately, we noticed a high percentage of within-state flow in several states, with South Dakota at 0.805 and Vermont at 0.688. California, which had the highest frequency of posts, stood at 0.617 (rank 4) and Minnesota at 0.589 (rank 6). We reached two conclusions. For states with high levels of participation (CA and MN), we observe high levels of self-generated content within the state. Interestingly, engagement with local activism is similar for states with smaller volumes of participation, which we posit may be due to their relative isolation.
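The state-level aggregation of Algorithm 3.1 and the self-flow, out-flow, and in-flow shares can be sketched as follows. This is a minimal illustration under stated assumptions, not the original analysis code: the function names and the toy matrix are hypothetical, the entry `A_S[i][k]` is read as the weight of the edge from state k to state i (following the MN → NY example in the methods), and the three shares are normalized to sum to one, which is one plausible reading of the percentages in Table 3.3.

```python
def aggregate_state_network(A):
    """Aggregate a state-by-user matrix into a state-by-state matrix.

    A[i][j] is the number of times user j engaged with posts from state i.
    For every state i, the engagement vectors of all users who engaged with
    state i at least once are summed into row i of the result.
    """
    n_states = len(A)
    n_users = len(A[0]) if A else 0
    A_S = [[0] * n_states for _ in range(n_states)]
    for i in range(n_states):
        for j in range(n_users):
            if A[i][j] > 0:  # user j was exposed to state i
                for k in range(n_states):
                    A_S[i][k] += A[k][j]
    return A_S


def flow_shares(A_S, i):
    """Return (self-flow, out-flow, in-flow) shares for state i.

    Assumption: imports are the off-diagonal entries of row i and exports
    are the off-diagonal entries of column i.
    """
    n = len(A_S)
    self_flow = A_S[i][i]
    imports = sum(A_S[i][k] for k in range(n) if k != i)
    exports = sum(A_S[k][i] for k in range(n) if k != i)
    total = self_flow + imports + exports
    if total == 0:
        return (0.0, 0.0, 0.0)
    return (self_flow / total, exports / total, imports / total)


# Toy example (hypothetical data): rows are states, columns are users.
states = ["NY", "MN", "CA"]
A = [
    [2, 0, 1],  # engagements with NY posts by users 0, 1, 2
    [3, 1, 0],  # engagements with MN posts
    [0, 4, 0],  # engagements with CA posts
]
A_S = aggregate_state_network(A)
for idx, name in enumerate(states):
    sf, of, inflow = flow_shares(A_S, idx)
    print(f"{name}: SF={sf:.3f} OF={of:.3f} IF={inflow:.3f}")
```

In the toy data, user 0 engaged with both NY and MN, so that user contributes to both the NY and MN rows, which is how a single user acts as a proxy for cross-state engagement.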
Top Self-Flow
State   % within State   % exported   % imported   Total posts from state
SD      81%              11%          8%           447
VT      69%              18%          13%          474
ID      64%              20%          16%          1,760
CA      62%              16%          23%          358,948
WY      60%              21%          18%          179
MN      59%              20%          21%          158,753

Top Exporters
State   % within State   % exported   % imported   Total posts from state
WV      25%              60%          15%          965
NE      36%              50%          14%          1,968
ND      35%              47%          18%          603
NH      37%              44%          20%          1,471
MS      31%              43%          26%          2,495

Top Importers
State   % within State   % exported   % imported   Total posts from state
DC      43%              28%          29%          83,710
MD      42%              29%          29%          20,912
IA      33%              39%          28%          3,052
NY      55%              18%          27%          268,169
MI      39%              35%          26%          19,452

Table 3.3: Interstate network structure and proportion of posts seen within-state, exported to other states, and imported from other states.

We contrasted this with States with comparatively high levels of content “exports” and “imports.” What we mean by exports is the proportion of posts engaged by users connected to other States. What we mean by imports is the proportion of posts users engaged with that hailed from other States. Washington, DC is the highest importer, indicating the region had limited amounts of self-generated content. While DC is politically active, it also has a small population. Because small states have fewer people, less content is generated internally than externally. These results suggest that for small states, connective action is demand-driven. Smaller states will feel like part of a bigger whole, because most of their interaction originates from outside.

For the largest states, there is significant self-content. New York is also a top importer. However, it also boasted high levels of engagement within its own State, indicating high levels of participation. As such, East Coast centers, while having considerable self-content, also pay attention to what is happening in other states.

We show these dependencies further in Figure 3.2, which shows each state’s top destination based on all user engagement. We observe three epicenters: California, New York, and Minnesota.
California received the most content from New York and, reciprocally, New York from California. New York seemed to hold the attention of East Coast states, such as New Jersey, Maine, Rhode Island, and Connecticut, along with Southern States closer to the East Coast, such as West Virginia. The rest of the states, except for Mississippi, co-occurred most frequently with California. Mississippi co-occurred most frequently with Texas. These results hark back to prior US-based case studies. During Occupy Wall Street, Conover et al. (2013) observed a similar hub-spoke structure on Twitter, although its activity was localized in California, New York, and Washington, DC.

Figure 3.2: Network of state-state exposure based on aggregated user engagement. Directions of attention are shown by arrows, then colored by general regions in the United States.

To summarize, we identify two interesting trends. First, the time series suggests we break the movement into two waves, starting from May 31 and June 3 respectively. Second, while we observe two epicenters in New York and California, we also observe highly localized, generative activity in Minnesota and smaller states, with geography-based correlations. These would suggest different extents to which messages are propagated (Olesen, 2015). Thus, to understand this dynamic more clearly, we now move into the critical parts of our analysis: visual content and network analysis.

3.5.2 Injustice Symbols and Legitimization

3.5.2.1 Visual Content Analysis

The focus of this section is the top shared images on Instagram, and how they semantically, affectively, and symbolically functioned. Figure 3.3 shows the most shared photos during the month of protest. We see a few themes, which we used to construct a typology. The most popular photo during the period we observed was the #BlackOutTuesday full black square. The second type of popular posts were three iterations of George Floyd’s portrait.
The first is his original selfie (GF original), the second a stenciled style (GF portrait), and the third a floral version in remembrance (GF floral). The third type of common posts were official logos for the BLM movement, which we denoted as BLM grayscale and BLM yellow. The fourth type of common posts were the edited pictures of protest, such as the one in the bottom center. Note, these were prevalent further down the list of top photos. Lastly, the fifth common post type focused on information sharing and organizing, such as the infographic found in the center left. This infographic contains six places to donate money, such as foundations, bail for protestors, or medical fees for those harmed during the protests.

Per our discussion of injustice symbols, none are as iconic as the raised fist, which was among the top 9 most diffused icons. In general, 7 of the top 9 are abstractions or calls to a greater movement. For instance, the most popular of Mr. Floyd’s portraits is the floral rendition, one that emphasizes memorial.

Figure 3.3: Top photos that emerged from the 2020 George Floyd protests. From top to bottom and left to right, we have the Black Out Tuesday Square (a), logos and icons of the Black Lives Matter movement (b, d, i), portraits of George Floyd (c, e, g), and photos of protest (h).

As previously theorized for the world before social media, while protesters would complain about being misconstrued as angry and combative, this was one of the only ways of gaining media attention and thereby generating awareness through cable television (Bonilla & Rosa, 2015; Chan & Lee, 1984). Instead, in this Instagram-led protest, coverage of the protest seems to have come secondary to these injustice symbols. Furthermore, the logic that brings about this coverage of protest action is based on the algorithm, and thus grassroots popularity. The effects of this change in logic are most clear in how protesters are framed.
Instead of the deviant protesters, we see poignant, positive frames of protesters in action.

How do these photos act together? At a rudimentary level, these photos resemble prior typologies of protest content based on Twitter (Theocharis et al., 2015): (1) political mobilization; (2) coordination; (3) information; and (4) conversation. However, unlike Tweets, these individual photos resist singular placement into these four categories. The #BlackOutTuesday square is simultaneously a call for action, and a form of coordination to incite conversation. Similarly, the George Floyd portrait is political mobilization, information, and conversation, as its form shifts over time. Early forms of George Floyd’s portrait were more targeted toward information sharing, whereas later ones were for conversation. Only the one for funding sources has an explicit informational categorization.

Here, we experience a distinction with theories based on Twitter. Whereas prior classification is based upon clear, textual information, the four-pronged typology cannot be applied neatly to images, since the categorization depends more on how an image is used (the coordinated flooding of images) and its actual contents (as part of a broader piece of social justice discourse). Note, while coordinated flooding has happened on Twitter (Chang, Pham, et al., 2021; Kirkland, 2021), this has been primarily textual.

We can further analyze the evolution of these photos temporally. Figure 3.4 shows the BLM logos, the full black squares of #BlackOutTuesday, and the funding source. Before May 31, there was little volume from the BLM logos, until the grayscale logo exploded in volume, along with information about funding. This indicates some form of coordination (whether organic or pre-determined). Then, on June 2 (the day of #BlackOutTuesday), another eruption of posts emerged. We observed these beginning as early as midday on June 1, yet the small notch in the black line right before June 2 suggested that users waited for the actual campaign day before posting.

Figure 3.4: Time series of top icons diffused during the George Floyd protests. Figure 3.4a) shows the diffusion of the three BLM logos and the funding infographic. Figure 3.4b) shows the three iterations of George Floyd’s portrait. We observe much earlier volume in the portraits as compared to BLM and protest organizations.

Figure 3.4b) shows the permutations of the portraits of George Floyd. Notably, the sharing of these photos began much earlier than the BLM logos. The first post that emerged at scale was the realistic portrait, rather than the stylized, floral version of him. Then, on the night of May 31, there was a great acceleration of his floral portrait. The floral portrait, designed in remembrance rather than simple outrage, became the most popular icon involving him as a person. Similar portraits of remembrance were made for Breonna Taylor. In other words, the shift in portrait preference also indicates a shift from mobilization and information to conversation and memorial.

By comparing panels a) and b), we observe an early focus on George Floyd as an individual, as there was volume for his portrait early on. The Black Lives Matter logos emerged on May 31, coinciding with the massive protests on the following weekend and thus generating critical mass. We thus make a temporal observation: the deaths of individual Black men coalesce once they relate to symbols of injustice.

In summary, by shifting to algorithmically curated content, based on grassroots popularity rather than institutional agenda setting, the visual frames shift from combative to memorial, and protestors from deviant to poignant. Visual frames also shift over time, as victims of social injustice are affectively connected to prior symbols of injustice.
Next, for us to claim these symbols emerge through algorithmic logic, we need to investigate the central users, namely whether this content was dictated by the institutional media or not.

3.5.3 Networked Flow and Opinion Leaders

We now investigate the central actors and communities that emerged on Instagram. Table 3.4 aggregates the top 10 accounts by the number of likes. What is most interesting about these top users is that their differences largely outweigh their similarities. The top-ranked user is The Shade Room, an online publication that specializes in celebrity gossip specifically within the Black community, which sets the tone for our focus on entertainment-based groups, specifically meme-related pages. Accounts that fall into this category include thedisappointingexperience, which puts out a large volume of posts, and tenth-ranked DomisLiveNews, which combines world news in a meme format, often with a hyper-focus on trivial events such as a Black rapper changing into a pair of new Yeezy sneakers. Their presence on the list is likely due to their large, pre-existing audience base.

Username | Likes | Name | Account Description/Occupation
theshaderoom | 7,849,147 | The Shade Room | Publication of celebrity gossip predominantly within the Black community
_stak5_ | 6,405,100 | Stephen Jackson, Sr. | Former NBA player
midianinja | 3,741,410 | midia NINJA | Independent narratives and journalists based in Brazil
the_female_lead | 2,611,844 | The Female Lead | Account of an education charity
iamjamiefoxx | 2,191,800 | Jamie Foxx | African-American actor
thedisappointingexperience | 1,816,197 | N/A | Semi-automated meme account
gentlemanmodern | 1,610,376 | Gentleman Modern | Independent photographer based in New York
diet_prada | 1,421,634 | Diet Prada | A fashion watchdog group that emerged as a serious voice campaigning for integrity and accountability within the industry
shaunking | 1,382,424 | Shaun King | American writer and civil rights activist
domislivenews | 1,372,158 | DomisLiveNews | Meme group for hip-hop and world news

Table 3.4: Top 10 accounts by likes in the dataset. Opinion leaders include magazines and meme pages, in addition to institutional or established celebrities.

Surprisingly, there are only a few real individuals in comparison to the organizations. We observed a former NBA player (Stephen Jackson, Sr.), an actor (Jamie Foxx), an activist/writer (Shaun King), and a photographer (Gentleman Modern). Apart from King, most of these users are celebrities that have a high follower count (a pre-existing audience). The photographer stands apart as he is not a public figure. His prominence can be attributed to his viral post of two children (one white and one Black) running toward each other and hugging. This lone video garnered more than 1.5 million likes.

On Twitter, Neumayer and Rossi (2018) found that users with institutional support rise to the top, such as politicians, local police forces, and media outlets. Instagram’s slate of citizen journalists, fashion magazines, and celebrities stands in contrast. What this shows is that Instagram can push the content of non-public figures and organizations to the forefront of a movement. The photographer’s presence is significant as it highlights the role of individual content creators, serving in a similar capacity to citizen journalists. Anyone who participates can become an opinion leader.
Apart from these entertainment-oriented groups, we also observe other social justice-oriented organizations. For instance, Diet Prada is a fashion-related watchdog organization that began from the work of two fashion industry professionals and grew into a significant voice in critiquing the industry’s business practices, including racism. They are most well-known for identifying Dolce & Gabbana’s racist ad in Shanghai, which features an Asian man struggling to eat Italian food with chopsticks.

Lastly, midianinja deserves its own separate mention. Based in Brazil, its website is almost completely in Portuguese. Instead of an individual or an organization, midianinja is a collection of journalists, photographers, and media creators focused rather on a theme. The following excerpt appears on their website:

Based on the collaborative logic of production that emerges from the networked society, we connect journalists, photographers, videographers, designers, and enable the exchange of knowledge between those involved (translated).

The presence of a non-American entity in the top percentile of influencers speaks not only to the global reach of the movement, but also to the role that international organizations took in collaborating with US-based journalists and photographers. Rather than individual journalists networking, existing coalitions own their own Instagram account, which instantiates quite directly the concept of “networks as actors” (Bennett & Segerberg, 2015).

Opinion leaders exist in every social network. Understanding differences across platforms provides valuable insight into how platform affordances can engender different types of leaders and information environments.
Specifically, Neumayer and Rossi find that “visuals in political protest in social media reproduces existing hierarchies than challenges them” (Neumayer & Rossi, 2018). They find traditional opinion leaders with institutional power in the form of politicians and the local police forces, which share expressions of latent violence in their visuals. In contrast, entertainment organizations rise to the top on Instagram. On Twitter, perhaps due to an algorithm that facilitates public diffusion, the leaders are official accounts and opinion leaders. On Instagram, the most popular accounts are fashion magazines, independent journalists, and meme pages: a collection of opinion leaders that represent everyday content creators and citizen journalists.

To drive this comparison home, we close this section with a broad overview of the Instagram public sphere. Figure 3.5 shows the entire Instagram network, visualized through ForceAtlas, a force-based network layout algorithm (Jacomy et al., 2014). The listed users are the top users by in-degree, meaning the ones whose posts featured the most interaction via commenting. The top users are associated with a certain amount of white space, which corresponds to their level of influence.

Figure 3.5: Network of users (n=713,209), where links are constructed between commenting users and the original poster. Small, dense communities in the center indicate diverse consumption of content.

One important qualitative observation is the large number of everyday users located in the middle. This indicates a diverse diet from a central core of users, driven by everyday users. Users are also colored by the state for which they consumed the most content. The graph appears heterogeneous (colors are well mixed), which is a testament to its international reach. Most importantly, topologies where users are found clustered in the middle indicate many connected small worlds, which stands in contrast with many Twitter visualizations (Chang, Chen, et al., 2021).
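As a minimal sketch of how a ranking by in-degree could be computed, assume each comment is recorded as a (commenter, poster) pair, so that every comment adds one link pointing at the original poster. The function name `top_by_in_degree` and the toy account names are hypothetical and are not taken from the dataset; counting every comment once, rather than every unique commenter, is likewise an assumption.

```python
from collections import Counter


def top_by_in_degree(comment_edges, n=3):
    """Rank posters by in-degree, i.e. the number of comment links pointing at them.

    comment_edges: iterable of (commenter, poster) pairs, one pair per comment.
    Returns a list of (poster, in_degree) tuples, highest in-degree first.
    """
    in_degree = Counter(poster for _commenter, poster in comment_edges)
    return in_degree.most_common(n)


# Toy data with hypothetical account names (not from the dataset).
edges = [
    ("user_a", "poster_x"),
    ("user_b", "poster_x"),
    ("user_c", "poster_y"),
    ("user_a", "poster_y"),
    ("user_d", "poster_x"),
]

ranking = top_by_in_degree(edges, n=2)
print(ranking)
```

Counting unique commenters instead would only require deduplicating the pairs (for example with `set(edges)`) before counting.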
On Twitter, direct visualizations lead to “hair-balls”: opinion leaders occupy the center while users retweet or reply to only one of these.

To summarize our answer to RQ3, the most prominent users are non-institutional accounts: existing Black audience-serving publications, individual journalists or content creators, social justice organizations, and meme groups. Unlike Twitter, where preexisting organizations garner more attention, on Instagram entertainers that specialize in content production with an existing audience base generate the most engagement throughout the protests, especially those who have a history of engaging with social issues.

3.6 Conclusion

The purpose of this study was to understand how Instagram mediated the Black Lives Matter protests in 2020, as a counterpoint to Twitter-based research, while intersecting the rich literature on social movements, visual framing, and connective action. We narrowed our focus to a) how the modality of engagement shifts frames of legitimacy, and b) the nature and role of emergent opinion leaders who dictate the flow of information. To do so, we analyzed the movement across the spatial, temporal, semantic (both visual and textual), and communal dimensions.

Spatially, three epicenters arose in New York, California, and Minnesota. While the movement garnered large-scale heterogeneous reach, localized activity in Minnesota and smaller states emerged. The movement itself consisted of three phases. The first, latent phase arose from the sharing of George Floyd’s portrait following his killing. The second phase began with the explicit mobilization with BLM. The third phase, which generated the largest level of engagement, was from the #BlackOutTuesday campaign. We identified five types of posts: (1) portraits of George Floyd, (2) BLM logos, (3) protest activism, (4) organizational infographics, and (5) the #BlackOutTuesday square.
While these posts resemble Theocharis et al.’s (2015) characterizations of protest communication, these functions shift as the pictures themselves evolve. For instance, the portrait of George Floyd shifted from informational to a means of remembrance. Furthermore, several of these functions can fold into a single visual.

With this, we offer a few takeaways. Visuals, when coupled with Instagram’s affordances, allow non-institutional opinion leaders to emerge. This stands in contrast with Neumayer and Rossi’s (2018) study on Twitter, which finds official accounts of pre-existing institutions still holding the most clout. Furthermore, these opinion leaders are independent journalists and entertainment-based accounts, such as content creators and fashion magazines, many of which embody the “networks as actors” paradigm described in the logics of connective action (Bennett & Segerberg, 2015).

In regard to the modality and images themselves, the iconography evolves as the protest evolves. Moreover, due to the emergence of non-institutional leaders, the visual content of protest differs dramatically. First, instead of actual imagery of protest, the most popular images were symbols of injustice (Olesen, 2015). Second, framing shifts from combatant to memorial, deviant to poignant (Chan & Lee, 1984). Positive framing has the ability to increase support and identification toward protestors, thus gaining legitimacy. Given that social media is the largest source of news, it would not be unrealistic to think most users obtained most of their information about the protest through platforms like Instagram. Together, these factors that emerged from Instagram’s unique ecosystem helped facilitate the Black Lives Matter movement of 2020.

Our study has some limitations. Since our monitoring is done daily at regular intervals, there may be biases toward the content depending on how Instagram updates its public pages.
It is also impossible to tell how Instagram’s filtering algorithm works. For instance, we observed that posts by other celebrities were suppressed somewhat on the public page, which is more likely to induce a grassroots perception of the platform instead of reinforcing the presence of popular personas. Additionally, there is the perspective that Instagram and #BlackOutTuesday were a mere instance of hashtag activism. However, regardless of whether such attention was performative or not, this drove traffic disproportionately to Instagram instead of Twitter. As such, the ambient journalism being projected would be distinct from the institutional media or Twitter. A natural follow-up study is in the works, tracing the flow of information from Instagram to Twitter.

This case study on Instagram refines our notion of tie strength in terms of parasocial relationships. These parasocial ties are strong because of the prior engagement, investment, and exposure generated by individual users, but weak in that they are one-directional. Moreover, the media richness of Instagram makes these ties appear much more personal, frequent, and visceral than the text-based ones on Twitter, while providing local contexts of protest. Interaction is richer in direction, physical location, and media richness.

We find that these parasocial ties generated by entertainment accounts may have been crucial to diffusing information in ways that personal strong and weak ties cannot.

As a generalization, media richness is conferred by the mode (visual vs. text), while semantic richness emerges from the network. The network’s configuration dictates how many simultaneous messages it can transmit.
This effectively ties together the literature on information diffusion and diversity in a network setting, by framing the question as follows: what configurations of network and media allow the production of sustained movements? Based on our results, media-rich networks with diverse local accounts are what generate sustained and deep diffusion.

Overall, our research pushes the field forward in a few important ways. Our study confirms observations from recent work: that groups often considered frivolous in their creation and dissemination of content, such as meme groups and fashion magazines, can spark instances of political participation. Entertainers can get serious, and when they do, they can command the attention and shape the behavior of an entire nation.

Chapter 4
The Bartending Algorithm: A Mathematical Model of Social Communication

4.1 Abstract

This chapter formalizes a model for measuring information diversity on any multiplex social network, given a set number of topics. It provides a general strategy for cross-sectional analysis at the tie level (i.e., strong ties and weak ties) and group level (i.e., liberals versus conservatives). After the formal model, we discuss the dynamics of information diversity through both analytic and empirical approaches via simulation. I combine measurements from prior chapters of group dynamics, tie strength, and multiplexity into a unifying model. The two broad objectives are to characterize a) the information environment of social networks as multinomial distributions under power-law sampling (i.e., the distribution of messages), and b) the effect of algorithmic amplification of strong ties. For group-level comparisons, tie distribution is a dual, where instead of strong/weak ties we have in-group and out-group ties. The four-way product of topic homophily and strong-tie amplification for each group determines asymmetries in the information environment.
4.2 Literature

Researchers have recently devoted significant attention to understanding information-limiting systems, such as “echo chambers,” “filter bubbles,” and “rabbit holes” (Anandhan et al., 2018; Flaxman et al., 2016; Garrett, 2009). Recommender systems have borne the brunt of the blame, as they dictate what people see and interact with on online platforms (Overgaard & Woolley, 2022). This is somewhat ironic, as recommender systems help users curate their own information environment, based on their tastes. Increased engagement is a hallmark of being effective. No recommender in the private sector operates without some form of engagement optimization, as this is the primary driver of sustained user retention (Buchbinder et al., 2007).

The implications of limited information environments have been explored as early as the 1950s, from Philip K. Dick’s homeostatic newspaper to mass media as “agents of reinforcement” (Dick, 2012; Klapper, 1957). In other words, anxiety around information-limiting environments, and the effects of cutting out relevant discourse, existed more than 60 years ago.

One of the contemporary issues is that these terms are nebulous and subject to redefinition in every paper. As a result, some studies find evidence of echo chambers (Chitra & Musco, 2019; Kitchens et al., 2020), while others do not (Allcott et al., 2020; A. Y. Chen et al., 2022; A. Guess et al., 2018).

A better way to characterize these systems is as feedback loops across exposure (what is seen), engagement (what is interacted with), and belief updating (what is thought). For this study, we focus on exposure and engagement, the two that can be directly measured with digital trace data. For most social networks, the exposure funnel describes different stages of system-level content selection:

• The inventory is any content that can possibly be randomly selected.
• The network inventory is content from one’s immediate social network, via shares or direct content generation (i.e., content facilitated through one’s network connections).
• Exposed content is content chosen by the algorithm.
• Engaged content is content a user engages with.

A feed-ranking algorithm is an algorithm that prioritizes the types of content it believes a user would like to see or engage with the most (Meta, n.d.). This study focuses on modeling the interaction of feed-ranking algorithms and social networks, that is, the step from the network inventory to exposed content. We expect an interaction because social relationships are not made equal. Within a person’s social network, some social connections are stronger than others. People not only have friends, but best friends (Jones et al., 2013). And if the algorithm’s goal is to increase engagement, then content from best friends may be pushed to a greater degree. This may come at the cost of lost information diversity.

In The Strength of Weak Ties (1973), Mark Granovetter empirically showed that weak ties, in the form of acquaintances, were bridges that were crucial to finding jobs outside one’s immediate network. Not only did they give individuals a more diverse information set, but they also led to higher job satisfaction and compensation (Granovetter, 1973). His work resulted in a contemporary paradigm split between strong ties (friends and family) and weak ties (acquaintances or those who share a similar cultural background). Weak ties in particular have been shown time and time again to be critical to information diffusion (Bakshy et al., 2012). Bakshy and colleagues show that on the social media platform Facebook, the collective influence of weak ties may play a dominant role in information diffusion. In terms of engagement, Holtz and colleagues recently noted the trade-off between diversity and engagement on Spotify (Holtz et al., 2020).
Another open question is whether people can have multiple strong ties across different platforms. It is natural to act differently with our childhood friends than with managers at the workplace. In response to the rise of TV shows, Erving Goffman noted that new broadcasting technologies eliminated the ability of performers to pick their audience and act in front of them. Context collapse describes the inability to act differently in front of different audiences. The concept extended naturally with the rise of social media, as platforms like Twitter do not allow the determination of distinct audiences (Marwick & Boyd, 2011; Spurk & Straub, 2020). Individuals then learn to navigate this shared larger audience through targeting, concealment, and signaling authenticity. On the other hand, media multiplexity refers to cases where different communication media are not necessarily substitutes; rather, the closer two friends are, the more likely they will interact on different media (Garton et al., 1997; Haythornthwaite, 2001; Haythornthwaite & Wellman, 1998). Substitution is a reasonable assumption: one could argue that the more time you spend with a friend in real life, the less you spend online. With their results, Jones and colleagues argued that this is not necessarily the case (Jones et al., 2013): close friends interact just as much, if not more, in online spaces.

Most studies investigating information diversity are empirical, such as those on Twitter (Han et al., 2017) and Facebook (Bakshy et al., 2015). Theoretical models (in the graph-theoretic sense) tend to focus exclusively on message diffusion (Bao et al., 2016), with extensive surveys available (Guille et al., 2013). Even those that study information diversity with tie strength tend to have diffusion as a focus (Bakshy et al., 2012; Zhao et al., 2010). Moreover, these models focus more on identity diversity than on message diversity (Han et al., 2017; Sanz-Cruzado & Castells, 2018).
As such, there is a literature gap in studies that focus on tie strength and information diversity, especially ones that take a more formal-theoretical approach. To fully address these different scenarios, a model of information diversity that works across tie strength, multiplex networks, and group affiliation is needed.

4.3 Design Principles

Why does this model matter? Or rather, how do we design a model that matters? As mentioned in the introduction, changing the algorithm is the biggest decision a company makes, and A/B testing is limited by the magnitude of experimentation. Simulations offer a possible way out, to better understand intervention strategies. However, constraints in feasibility prevent this. This model was designed to suitably address this issue, while remaining rooted in social scientific inquiry. To this end, there are three main benefits.

Feasibility: Having 100,000 nodes sampling millions of messages across 20 topics already takes up considerable space. Simulating 1 million would be extremely time-expensive. Simulating a 3 billion user network like Facebook is impossible. This model makes computation much more efficient by taking the combination of topic entropy rather than the permutation, by assuming all people’s interests follow some aggregate long-tail distribution. This allows sampling from just one distribution, rather than a separate, topic-fixed distribution for every user, as only the aggregation of entropy matters.

Balancing Specificity and Generality: When working with mathematical modeling, there is always a balance between specificity and generality. Hyper-specific models are rarely useful because they “over-model” their case study and are thus not extendable to other analogous systems. On the other hand, overly general models are either too simplistic or require context-adaptation to be useful. This model offers a balance of specificity and generality, through the distribution of messages and topics.
It is specific in that the actual proportion of topics matters, but general in that we do not distinguish specific topics from each other. For the case of social systems, a good starting point is the fundamental cognitive biases or processes that humans share, as the underlying kernel for modeling assumptions (such as those prescribed by Page, 2018). My model is rooted in canonical theories of preferential attachment and the long tail (retail) in tastes.

Practical Iteration: Many complex models are hard to apply to real-life scenarios. For instance, simulating a complex contagion diffusing on an actual tech platform would be very difficult. For one, fitting the user-level parameters into a user-specific model takes a lot of resources on distributed computing. On the other hand, fitting user-level data to a roughly linear simulation is much more efficient.

4.4 Formalism

Let $G = (V, E)$ denote a network, where $V$ is the set of vertices indexed by $i$, with $|V| = N$, and $E$ the set of edges. Here, the edge weights denote the number of interactions, in the form of messages, between two users $i$ and $j$. Let $K$ denote the number of unique topics messages can encode. Each user $i$ is associated with a topic profile $u_i$, and for the sake of initial simplicity, assume only one element of $u_i$ is equal to one: a topic the user has a particular affinity toward. The information environment $v(i)$, written $\mathrm{Inf}(i)$ in the equations that follow, is then defined as a vector of the number of messages from all neighbors, by topic. Note that this implicitly encodes a multiplex weighted network, where each layer of edges is subset by topic. The information diversity of user $i$ is then defined by the Shannon entropy:

\[ H(i) = -\sum_{k=1}^{K} p_k \log p_k \tag{4.1} \]

where $p_k$ is the proportion of topic $k$ in the information environment. For the first part of this analysis, we will evaluate the mean behavior. This is the profile for the average user.
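To make the definition in Eq. 4.1 concrete, the following sketch computes the Shannon entropy of a single information environment given as message counts per topic. The function name `shannon_entropy` and the example counts are illustrative assumptions rather than part of the model specification; the natural logarithm is used by default, and topics with zero messages contribute nothing to the sum.

```python
import math


def shannon_entropy(topic_counts, base=math.e):
    """Shannon entropy of an information environment (Eq. 4.1).

    topic_counts: number of messages received per topic (length K).
    Counts are normalized to proportions p_k before applying -sum p_k log p_k.
    """
    total = sum(topic_counts)
    if total == 0:
        return 0.0  # no messages, so no measurable diversity
    entropy = 0.0
    for count in topic_counts:
        if count > 0:  # the term p log p tends to 0 as p -> 0
            p = count / total
            entropy -= p * math.log(p, base)
    return entropy


# Hypothetical environments over K = 4 topics, 20 messages each.
h_uniform = shannon_entropy([5, 5, 5, 5])   # evenly spread messages
h_skewed = shannon_entropy([17, 1, 1, 1])   # one dominant topic
h_single = shannon_entropy([20, 0, 0, 0])   # only one topic ever seen

print(h_uniform, h_skewed, h_single)
```

An evenly spread environment attains the maximum value for its number of topics, while an environment with a single topic has zero entropy.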
We will later evaluate and measure the effects of additional variance, specifically from the social network and the categorical variance of topics. As such, we assume each topic has an equal chance of being encoded as a message:

\[ p_k = \frac{1}{K} \quad \forall k \tag{4.2} \]

Thus, for a random user, their information environment will look like this:

\[ \mathrm{Inf}(i) = |\deg(i)| \begin{bmatrix} \frac{1}{K} & \frac{1}{K} & \cdots & \frac{1}{K} \end{bmatrix} \tag{4.3} \]

To get the normalized proportion, we divide by the total degree (total number of messages), leaving us with a vector of $\frac{1}{K}$. The mean entropy is then given as:

\[ H(i) = -\sum_{k=1}^{K} \frac{1}{K} \log K^{-1} = -K \times \frac{1}{K} \log K^{-1} = \log K \tag{4.4} \]

4.5 Tie-Level Analysis

4.5.1 Strong Ties and Weak Ties

People not only have friends, but they have best friends. We formalize this notion for information diversity on social media, and introduce a few more variables. Let $r$ denote the proportion of ties that are strong ties. For instance, $r = 0.1$ means 10% of ties are close friends. Next, let $\alpha$ be the correlation of topic between user $i$ and their neighbors. Specifically, it denotes the increased probability that your friends share something you like. Without loss of generality (WLOG), assume this is the first topic (indexed at 0). The proportion increase is then $\alpha K^{-1}$. After adjusting the other terms, we arrive at:

\[ \begin{bmatrix} \alpha K^{-1} & (K-\alpha) K^{-1} (K-1)^{-1} & \cdots & (K-\alpha) K^{-1} (K-1)^{-1} \end{bmatrix} \tag{4.5} \]

Note that while this can be written additively, recasting it as equivalent expressions in multiplication allows for more elegant expressions in logarithms. Additionally, $\alpha$ must satisfy:

\[ 0 < \frac{\alpha}{K} \le 1 \;\Rightarrow\; 0 < \alpha \le K \tag{4.6} \]

As this correlation describes messages sent by a neighbor, this is a purely exposure-level parameter. So far, this covers the exposure portion of the model. Next, we designate the amplification parameter $\beta$. There are two primary reasons why responses to strong ties may be amplified. First is the engagement effect: people tend to engage with their best friends more than normal friends.
Given some form of homophily, you then interact more frequently with people more similar to yourself. Second is an exposure effect: as these online platforms are governed by feed-ranking algorithms, they will push more content from strong ties. This is known as algorithmic amplification (Huszár et al., 2022). The purpose here is to split the increased exposure due to the topic and due to social ties, barring nonlinear effects between the two. Strictly speaking, there is no mathematical need for a separate variable $\beta$. Its presence is for conceptual clarity: $r$ is the base division between strong ties and weak ties, whereas $\beta$ denotes the amplification on the perceived stock of strong-tie messages. Varying $\beta$ is equivalent to varying the extent of algorithmic amplification, which increases the perceived content of strong ties.

We can now put all these elements together. Under this strong-weak tie framework, the information environment can be described as:

\[ \mathrm{Inf}(i) = |\deg(i)| \left( \frac{1 - r\beta}{K} \begin{bmatrix} 1 & 1 & \cdots & 1 \end{bmatrix} + r\beta \begin{bmatrix} \alpha K^{-1} & (K-\alpha) K^{-1} (K-1)^{-1} & \cdots & (K-\alpha) K^{-1} (K-1)^{-1} \end{bmatrix} \right) \tag{4.7} \]

In plain language, the information environment of a user is the sum of weak-tie exposure and strong-tie exposure, with the strong-tie share set by $r$ and amplified by $\beta$. Weak-tie (WT) exposure is the same as the random case (Eq. 4.3), thus the WT entropy of the average user is $H(WT) = \log K$, per Eq. 4.4. We can now explicitly state strong-tie exposure. We define the variable $\gamma = \frac{\alpha}{K}$. Then the entropy of strong-tie exposure alone is:

\begin{align}
H(ST) &= \frac{\alpha}{K} \log \frac{\alpha}{K} + \sum_{k=1}^{K-1} \frac{K-\alpha}{K(K-1)} \log \frac{K-\alpha}{K(K-1)} \tag{4.8} \\
&= \frac{\alpha}{K} \log \frac{\alpha}{K} + (K-1) \frac{K-\alpha}{K(K-1)} \log \frac{K-\alpha}{K(K-1)} \tag{4.9} \\
&= \gamma \log(\gamma) + (1-\gamma) \log \frac{1-\gamma}{K-1} \tag{4.10}
\end{align}

Here, $\gamma$ is a useful synthetic variable as it denotes how large $\alpha$ is relative to the total number of topics. This closed form suggests that what matters most in these entropy calculations is not the absolute value of $\alpha$, but $K$ and the ratio of $\alpha$ to $K$.
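The construction above can be sketched numerically. The code below is an illustrative reading of Eqs. 4.5 and 4.7, not the implementation used for the figures: it builds the strong-tie topic distribution, mixes it with the uniform weak-tie distribution using weight $r\beta$, and compares the resulting entropies. The function names and parameter values are assumptions chosen for illustration, and the code additionally assumes $0 \le r\beta \le 1$ so that the mixture stays a valid probability vector. It applies the leading minus sign of Eq. 4.1 throughout, so the reported entropies are non-negative.

```python
import math


def entropy(probs):
    """Shannon entropy with the leading minus sign of Eq. 4.1 (natural log)."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def strong_tie_distribution(K, alpha):
    """Topic distribution of strong-tie messages (Eq. 4.5).

    The favored topic gets alpha / K; the remaining K - 1 topics share the rest.
    """
    if K < 2 or not 0 < alpha <= K:
        raise ValueError("need K >= 2 and 0 < alpha <= K (Eq. 4.6)")
    favored = alpha / K
    other = (K - alpha) / (K * (K - 1))
    return [favored] + [other] * (K - 1)


def information_environment(K, alpha, r, beta):
    """Normalized information environment (Eq. 4.7 divided by the degree)."""
    weight = r * beta  # perceived share of strong-tie content
    if not 0 <= weight <= 1:
        raise ValueError("this sketch assumes 0 <= r * beta <= 1")
    strong = strong_tie_distribution(K, alpha)
    return [(1 - weight) / K + weight * s for s in strong]


# Illustrative (hypothetical) parameter values.
K, alpha, r, beta = 20, 5.0, 0.1, 2.0

environment = information_environment(K, alpha, r, beta)
h_weak = math.log(K)                                   # uniform case, Eq. 4.4
h_strong = entropy(strong_tie_distribution(K, alpha))  # strong ties alone
h_mixed = entropy(environment)                         # blended environment

# Closed form in terms of gamma = alpha / K, with the minus sign applied.
gamma = alpha / K
closed_form = -(gamma * math.log(gamma)
                + (1 - gamma) * math.log((1 - gamma) / (K - 1)))

print(h_weak, h_mixed, h_strong, closed_form)
```

Under these assumptions the blended entropy falls between the strong-tie and weak-tie values, and setting `alpha = 1` recovers the uniform case.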
The expected difference in diversity between weak ties and strong ties is thus:

\[ H(WT) - H(ST) = \log K - \gamma\beta \log(\gamma\beta) - \beta(1-\gamma) \log \frac{\beta(1-\gamma)}{K-1} \tag{4.11} \]

We can then designate two probabilities that arise out of this model:

\[ p(i) = \begin{cases} p_1 = \frac{1}{K}\left(1 - r\beta + r\alpha\beta\right) & \text{if } i = 1 \\ p_2 = \frac{1}{K}\left(1 - r\beta + \frac{r\beta(K-\alpha)}{K-1}\right) & \text{if } i > 1 \end{cases} \tag{4.12} \]

The general entropy of the average user is then:

\[ H(i) = p_1 \log p_1 + (K-1)\, p_2 \log p_2 \tag{4.13} \]

The distinction between strong ties and weak ties can be generalized to multimodal messaging as well: for instance, we can expect increased engagement or amplification of visual content over pure textual content. The resulting dynamics would be equivalent.

Figure 4.1: Heat map of the number of topics $K$ versus the topic homophily constant. Fig. 4.1a) shows the overall entropy derived from the information environment. Fig. 4.1b) shows the difference in entropy across strong ties and weak ties.

4.5.2 Closed-form visualizations

Figure 4.1 shows how entropy varies with the number of topics ($K$) and the topic homophily factor ($\alpha$). In line with known properties of Shannon entropy, the diversity increases as we increase the number of topics, as shown in Figure 4.1a), more so than with increments to the amplification factor. At higher levels of $\alpha$ (near the slope), we observe darkening, which indicates an entropy close to 1 (since when $\alpha = K$ only one topic is ever messaged). Figure 4.1b) shows the divergence in distribution between weak ties and strong ties. It is roughly linear, and per Eq. 4.11 is scaled by a factor of $K$.

Figure 4.2 shows the same panels but for the relationship between $K$ and $\beta$. Figure 4.2a) shows that as algorithmic amplification increases, the level of diversity decreases, as a greater proportion of inventory appears in the feed. On the other hand, the difference between the strong and weak ties doesn’t change all that much, as seen in Figure 4.2b).
This is because while $\beta$ changes the perceived information environment, the distributions at the stock level are unchanged. Figures 4.1 and 4.2 show an important distinction between $\alpha$ and $\beta$. Homophily $\alpha$ alters the information environment by shifting the underlying multinomial distribution, adding skew. The amplification factor works by influencing the perceived presence of strong ties, but the underlying distributions are unchanged (hence no change when comparing across strong ties and weak ties in Fig. 4.2b).

Figure 4.2: Heat map of the number of topics $K$ versus the strong-tie amplification constant $\beta$. Fig. 4.2a) shows the overall entropy derived from the information environment. Fig. 4.2b) shows the difference in entropy across strong ties and weak ties.

Note on the Diversity-Engagement Trade-off: If we assume strong ties generate more engagement, then we can expect diversity to go down. In other words, the boost from manipulating inventory will increase engagement but decrease diversity according to these relationships on entropy.

4.6 Complex Systems: Adding Variance

4.6.1 Variance via Social Networks

While this model may appear reductive, the general dynamics in topic homophily and amplification of strong ties hold across non-uniform multinomial distributions. For instance, the amplification may occur across two or three topics, and a similar $\alpha$ would be distributed away from other topics. However, the mean behavior is rarely interesting; the next step is to consider information diversity from a distributional perspective. The goal here is less realistic modeling than developing ways of measuring information diversity in any general setting. Here, I identify two sources of variance: in the number of messages, then in the distribution of topics. The first source of variance is that users have a different number of friends.
Social networks often exhibit power laws, which can be generated through processes like preferential attachment (Barabási, 2009). From a supply perspective, this means some users may receive one or two messages; others will receive hundreds. For users who receive only one message, there can only be one topic, so we measure this as having no diversity. We would expect discretization issues at lower numbers of messages. It is thus useful to investigate how changing the number of messages influences information diversity.

Figure 4.3 shows the kernel density estimates of entropy values across a range of message counts, with 50 total topics. At n = 2 (blue, solid), we see two peaks that correspond to the possible values of 0 and 1. Entropy equal to 0 occurs much less frequently, as this requires both messages to fall on the same topic (p = 0.025), whereas an entropy of 1 occurs much more frequently, at 97.5%. This type of discretization also occurs at n = 5 and n = 10. Eventually, it smooths out at higher numbers of messages.

Figure 4.3: Kernel density estimates on varied message supply (n = 2, 5, 10, 25, 50), with K = 50.

We also observe interaction with the homophily parameter α in Fig. 4.4, across 10,000 samples each. In Fig. 4.4a), as the number of messages increases, we observe an increase in diversity as well. This is because the more samples one draws, the closer one gets to the theoretical underlying distribution, which in the case of the uniform distribution maximizes entropy. Simultaneously, Fig. 4.4b) shows a decrease in variance as the sample size increases (aligned with the theoretical value of $\sqrt{n}$). As α increases, diversity decreases and variance increases. This is the expected behavior given how homophily α skews the underlying distribution.

Having explored some basic behavior when increasing n, we now explore the possibility of a closed-form solution. Suppose we have n messages across K topics.
The goal is to see what the expected entropy is given all the different ways the n messages can be distributed. This is equivalent to the balls-in-bins problem: given K bins, how many ways can we store n balls in them? Rather than enumerating the ways, we are interested in the distribution of entropy values. In other words, for fixed n, how can we characterize the entropy?

Unfortunately, this problem is equivalent to the composition problem in combinatorics: given K integers, how many ways can they sum to n? It is a known problem that has no known closed-form solution and whose generating algorithm grows exponentially, due to the combinatorial nature of the coefficients (Heubach & Mansour, 2004). I show in the Appendix that if we can find (or approximate) the distribution of integer occurrences, we can elegantly compute the expected entropy through a closed-form equation. The idea is that the distribution of integers may be simpler (or more tractable) to approximate than enumerating the combinations themselves. However, this would also make headway into proving P ≠ NP, which falls outside the scope of this dissertation. As such, we proceed using the Monte Carlo method of sampling.

Figure 4.4: Interaction of number of messages with entropy. Fig. 4.4a) shows the mean entropy and Fig. 4.4b) the variance, across α = 1, 5, 10, 15. The horizontal line in a) denotes the theoretical limit given by log K.

Another way to see this problem is as sampling n times from a multinomial distribution on K categories, which represents the information environment. To extend this to fit a social network, we investigate the behavior of this system when n follows a power-law distribution. Networks with such degree distributions are known canonically as scale-free networks, and follow:

$$P(k) \sim k^{-\gamma}$$

Many social phenomena follow power-law distributions, such as the distribution of CD sales or even the choice of financial managers (Barabási, 2009).
One of the mechanisms that explains this is preferential attachment, where new nodes "attach" to the most popular nodes with higher probability. The Barabási-Albert model is commonly used to generate these networks, and is parametrized by i) the number of nodes and ii) the number of new connections each node creates. To simulate this, we generate 4,000 sample networks using the Barabási-Albert model, initialized with 10,000 nodes and 20 initial connections. This guarantees at least 20 connections with other nodes ($|\mathrm{nei}(i)| > K \;\forall i \in V$) to avoid discretization of topics.

Figure 4.5: Information diversity by the distribution of a) unique topics and b) the cumulative density functions for entropy, across strong ties and weak ties. The distribution of underlying topics is uniform, then adjusted by homophily factor α.

We can then visualize topic diversity in two ways: a) topic-level distributions and b) entropy-based measures. Figure 4.5a) shows the topic distribution across strong ties (blue) and weak ties (red). The X-axis shows the number of unique topics and the Y-axis the number of users who receive that many unique topics. The distribution for strong ties sits further left than that for weak ties, which indicates weak ties generate more unique topics. These results hold even with r = 0.5, which denotes an even number of strong ties and weak ties. In other words, topic homophily α alone is responsible for generating this discrepancy.

However, how topics are distributed should matter as well. For instance, a user who consumes one topic 99% of the time and every other topic only once would still show up as having consumed many unique topics. To consider topic-relative distributions we use entropy. Figure 4.5b) shows the cumulative distribution for strong ties (blue) and weak ties (red). The X-axis shows the topic entropy, and the Y-axis shows the cumulative frequency of users who have that entropy.
The more rightward the S-shaped curve is, the more topic diversity there is. Due to the frequency of nodes with degree 20 (nodes that connect via preferential attachment but were not chosen themselves) and the discretization of entropy values (only $p_1$ and $p_2$), there are extremely high frequencies at certain entropy values.

We can be more precise about how these curves diverge. To do so, we use Earth Mover's Distance (EMD). To gain an intuition for EMD, suppose there are two ways of piling one pound of dirt. The EMD is the minimum cost required, under a metric space, to move one pile into the other. Naturally, each of the two piles of dirt describes a probability mass function, which we can formally call $f_u$ and $f_v$. For the case of a one-dimensional probability distribution, the EMD is simply the integrated absolute difference between the two cumulative distribution functions $F_u$ and $F_v$:

$$\mathrm{EMD}(f_u, f_v) = \int |F_u - F_v| \tag{4.14}$$

The EMD for this is found to be 1.343.

These two figures, user-level topic distributions and entropy CDFs, form the core of this mathematical model of social communication. Together they allow for a standardized way to understand information diversity across mean measures, split across different types of ties, modalities, or groups.

4.6.2 Variance via Multinomial Sampling

Continuing with our set-up of measuring information environments as sampling multinomial distributions, one glaring shortcoming of our current approach is that we rely on adjustments to an underlying uniform distribution, a strong assumption not rooted in reality. To adjust for this, we add a log-normal perturbation to the existing probabilities. Let $x_i \sim N(0,1)$ with $i \in \{1, \dots, K\}$ and $\vec{p}$ a probability vector with K elements. Then we adjust every element as follows:

$$p'_i = p_i e^{x_i} \tag{4.15}$$

The reason is two-fold. First, in expectation, the underlying distribution is still normalized.
$$E[\vec{p} \cdot \exp(\vec{x})] = E\left[\sum_{i=1}^{K} p_i \times e^{x_i}\right] = \sum_{i=1}^{K} p_i \times E[e^{x_i}] = \sum_{i=1}^{K} p_i \times 1 = 1 \tag{4.16}$$

Second, the lognormal distribution is often used as an alternative to the Pareto distribution to describe long-tail phenomena. Both are exponentials of underlying random variables, and can capture the underlying social phenomena. After this, we sort the multinomial probability vector, then adjust it by α.

Figure 4.6 shows the distributions from Fig. 4.5 after applying the perturbation. We observe significantly more smoothing on the weak ties (tail end) and smaller jumps on the strong ties. While this does not completely remove the discretization issue, a higher cut-off for the number of messages, a smaller α, and a larger sample size would likely remove it.

It may seem strange to not distinguish between individuals; surely not everyone will like the same topic. However, our procedure merely guarantees that one of these topics is the most popular, not necessarily the same one. Our assumption here is that, on average, the distribution is the same, even if every individual's information environment can be very different. The most popular topic may change, but the overall entropy need not. This allows us to side-step sampling based on user-level modeling by drawing from a global distribution instead.

Figure 4.6: Information diversity by the distribution of a) unique topics and b) the cumulative density functions for entropy, across strong ties and weak ties. The distribution of underlying topics undergoes perturbation by a lognormal factor, is sorted, then adjusted by α.

The core strength of this model is thus highlighted: we may accurately model the dynamics of tie strength and modality through just two parameters, α and β. Many forms of behavior can be predicted through closed-form solutions, and simulations can be performed in low dimensions (thus circumventing the curse of high dimensionality).
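The sampling procedure described in this section can be sketched in a few lines. The following is a minimal illustration using only Python's standard library, not the simulation code behind the figures. The function names (`perturbed_topic_probs`, `sample_entropy`, `emd_1d`) and the sample sizes are hypothetical. The sketch renormalizes the perturbed vector explicitly, which is an added assumption: for a standard normal $x_i$, $E[e^{x_i}] = e^{1/2}$, so the raw perturbed values need not sum to one. The adjustment by α and the network-driven message counts are left out for brevity.

```python
import math
import random
from collections import Counter


def perturbed_topic_probs(k, rng):
    """Uniform probabilities over k topics, each scaled by exp(x_i), x_i ~ N(0, 1).

    Assumption of this sketch: the result is renormalized to sum to one,
    then sorted in descending order.
    """
    raw = [(1.0 / k) * math.exp(rng.gauss(0.0, 1.0)) for _ in range(k)]
    total = sum(raw)
    return sorted((p / total for p in raw), reverse=True)


def sample_entropy(probs, n, rng):
    """Shannon entropy (base 2) of the topic shares among n sampled messages."""
    topics = rng.choices(range(len(probs)), weights=probs, k=n)
    counts = Counter(topics)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def emd_1d(xs, ys):
    """1-D Earth Mover's Distance: integral of |F_u - F_v| over the value axis."""
    xs, ys = sorted(xs), sorted(ys)
    grid = sorted(set(xs) | set(ys))
    total, i, j = 0.0, 0, 0
    for left, right in zip(grid, grid[1:]):
        # Advance the counters so i and j hold the number of points <= left.
        while i < len(xs) and xs[i] <= left:
            i += 1
        while j < len(ys) and ys[j] <= left:
            j += 1
        total += abs(i / len(xs) - j / len(ys)) * (right - left)
    return total


# Hypothetical usage: compare entropy distributions for small and large message supply.
rng = random.Random(0)
probs = perturbed_topic_probs(50, rng)
few = [sample_entropy(probs, 5, rng) for _ in range(1000)]
many = [sample_entropy(probs, 50, rng) for _ in range(1000)]
print("EMD between n=5 and n=50 entropy samples:", emd_1d(few, many))
```

Because entropy here is computed in base 2, a user with n messages can reach at most log2(n) bits, which is one way to see the discretization effect discussed for small n.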
4.7 Group Dynamics: Dichotomous Polarization

From our existing model, one thing to note is that r only encodes a partition of ties. This r can be extended to characterize ties within one's own group and outgroup(s). Furthermore, these two groups form a duality ($r_2 = 1 - r_1$).

Formally, suppose we have two groups G1 and G2, each with their own homophily factors $\alpha_1, \alpha_2$ and strong-tie amplification $\beta_1, \beta_2$ (we can interpret this as how much the inventory is amplified via the algorithm). Then, following our existing model, the components for G1 ($p_1, q_1$) and G2 ($p_2, q_2$) can be written as:

$$\begin{aligned} p_1 &= e^{N(0,1)}\, \frac{1 - r\beta_1 + r\beta_1\alpha_1}{K} \\ q_1 &= e^{N(0,1)} \left( \frac{1 - r\beta_1}{K} + r\alpha_1\beta_1 \frac{K - \alpha_1}{K(K-1)} \right) \\ p_2 &= e^{N(0,1)}\, \frac{r\beta_2 + (1 - r\beta_2)\alpha_2}{K} \\ q_2 &= e^{N(0,1)}\, \frac{r\beta_2 + (1 - r\beta_2)(K - \alpha_2)}{K(K-1)} \end{aligned} \tag{4.17}$$

These values can then be used to compute the mean entropy of G1 and G2, based on Eq. 4.13. The solution space can then be described by four cases:

• $\alpha_1 > \alpha_2, \beta_1 > \beta_2$: G1 is more homophilous topically and has greater amplification of in-group ties than G2.
• $\alpha_1 > \alpha_2, \beta_1 < \beta_2$: G1 is more homophilous topically but has less amplification of in-group ties than G2.
• $\alpha_1 < \alpha_2, \beta_1 > \beta_2$: G1 is less homophilous topically but has more amplification of in-group ties than G2.
• $\alpha_1 < \alpha_2, \beta_1 < \beta_2$: G1 is less homophilous topically and has less amplification of in-group ties than G2. Equal to Case 1 WLOG.

The heatmap in Figure 4.7 shows these four cases. The color describes the ratio of information diversity between G2 and G1. The more blue, the less relative information diversity G2 has; the more red, the greater the relative information diversity. As expected, when G2 is more homophilous with in-group ties amplified, it has less diversity. In general, when G2 is less homophilous than G1, G2 has greater information diversity.

Figure 4.7: Entropy ratios between G1 and G2, enumerating the four case-combinations of $\alpha_1, \alpha_2, \beta_1, \beta_2$. The second quadrant ($\alpha_2 < \alpha_1, \beta_2 > \beta_1$) produces all three possibilities, hence a space where intervention is possible.
The most exciting space is the second quadrant, as both $H_1 > H_2$ and $H_2 > H_1$ exist. This indicates the possibility of platform-based interventions. For instance, if G2 is more homophilous, the reduction in information diversity can be offset by de-amplifying its own in-group ties and amplifying out-group ties from G1.

As an example, we attempt to simulate what we found in Chapter 2. From there, we know conservatives have a greater perception of in-group ties ($\beta_2 > \beta_1$) and are topically less heterogeneous ($\alpha_2 > \alpha_1$). Setting these values for another Monte Carlo simulation (10,000 nodes and 4,000 trials), Fig. 4.8 shows the user-level topic distribution (Fig. 4.8a) and entropy CDFs (Fig. 4.8b). This largely matches our empirical results in Chapter 2.

Figure 4.8: Simulated results of the information environments between liberals and conservatives. Assuming conservatives have a greater perception of in-group ties ($\beta_2 > \beta_1$) and are topically less heterogeneous ($\alpha_2 > \alpha_1$), conservatives end up with less information diversity.

4.8 Conclusion

In this paper, we develop a mathematical theory for social communication, with particular emphasis on entropy-based information diversity. As a small homage to Claude Shannon's model, I provide a framework where everyone is an encoder and receiver, where information is not produced as a mere pairwise interaction but mediated through a social network. The model can be used to simulate behavior on social networks, and to fit empirical data at the tie level and group level, with individual users as the unit of analysis.

The model can be simply characterized by a known number of topics K, distribution of ties r, topic homophily α, and algorithmic amplification factor β. Example tie distributions of r can be interpreted across strong-weak ties, modality (Instagram Stories versus feed, text versus photos), and in-/out-group types.
In the latter case, two groups are defined and r serves as a dual for both systems. Thus, dynamics of behavior can be characterized by α and β alone, without needing the user-level multinomial calculations common in complex agent-based models (ABMs). Comparisons between two groups can be similarly derived. Comparisons at the group or tie level can be easily measured by user-level distributions on a) unique topics and b) entropy-based CDFs.

A natural question that may arise is why we chose to binarize the ties. While the model can be changed to ties with continuous weights, this may not be completely feasible in the large SQL databases of large tech companies. A weight threshold based on interaction to perform partitioned aggregations is much more efficient with millions of users and billions of log events daily. Additionally, many affordances are binary by design. For instance, within Instagram, users have the option to further partition the people they follow with a "close friends list." This allows them to control what their audience sees in a discrete fashion.

Taking a step back, this chapter is just as much about the simulation design as it is about the findings. As mentioned in the introduction, the feed-ranking algorithm is the engine of a tech platform, dictating what its users see. Not only does this have implications for revenue, it also has legal ramifications, such as content about race for housing and employment (Chang, Bui, & McIlwain, 2021). Being able to predict how the algorithm behaves when parameters are changed is crucial, but randomized experiments require significant resources to run, with a limited spectrum of possible conditions.

As such, a good model to understand information diversity should be feasible, parametrically general and specific, and practically iterable. Feasibility denotes the underlying algorithmic design, such as how numbers are sampled, or the time and space complexity of computation.
Generality and specificity refer to our ability to answer social science questions while addressing immediate needs. Practical iteration simply means that the model inputs can be derived (feasibly!) from a tech platform's database, run, then output measures that are also measurable by the platform. In particular, fast iteration is critical because it allows the simulation of interventions. For instance, suppose we derive different correlations of topic homophily for liberals and conservatives as our values for $\alpha_1$ and $\alpha_2$. Then we can fit levels of $\beta_1$ and $\beta_2$ to identify group-level amplification strategies that ensure parity in information diversity.

In sum, this paper provides a mathematically grounded framework to analyze empirical data and understand information diversity. Future work includes better ways of reducing the discretization of strong-tie CDFs, and methods to fit values of α and β without resorting to numerical optimization. If the distribution of integers under the composition problem can be solved or computed in a tractable manner, then closed-form solutions to the distribution of entropy can be produced.

4.9 Appendix

4.9.1 Explorations of Closed-form Solutions

First, assume our basic set-up: we have K topics that are uniformly distributed, with the first topic amplified. We can partition the multinomial distribution into the first topic (the one that is amplified), and the rest of the topics (which are distributed uniformly). Then, the weighted entropy of the system would be:

$$H(n,K) = (1-P_1)^n H(n,K-1) + \sum_{j=1}^{n} P_1^j (1-P_1)^{n-j} \left[ \frac{j}{n}\log\frac{j}{n} + H(n-j,K-1) \right] \tag{4.18}$$

The first component is the entropy of the amplified component, whereas the summation is the aggregation on the uniform, multinomial component. Thus, our approach is to first understand the sample entropy of a uniform distribution (WLOG the case of K − 1).

Suppose a person only receives 3 messages, and these three messages can come from K topics, and for now, assume K > 3.
Then we have the following possibilities: 3, 1+2, 2+1, and 1+1+1. These combinations can further be distributed on the K topics. Neither the composition problem nor the balls-in-bins question has a closed-form solution; both are computationally intractable (P=NP) and grow exponentially. However, if there is a way to compute the distribution of integers without necessarily enumerating the compositions, this may become feasible. Taking the case of n = 3 we have:

• freq(3) = 1
• freq(2) = 2
• freq(1) = 5

Let T denote the total number of integers in the composition problem. Then, we can simply compute the entropy as follows:

$$\sum_{i=1}^{n} \frac{\mathrm{freq}(i)}{T}\, \frac{i}{n} \log \frac{i}{n} \tag{4.19}$$

where $\frac{\mathrm{freq}(i)}{T}$ effectively weights each entropy component by its frequency. As such, it isn't necessary to even generate the full series of combinations, just the frequency of entropy components. It isn't clear whether the task of computing the frequency is simpler; if it is, it would contribute a general advancement to understanding multinomial entropy.

Another way to understand this infeasibility is to consider the known expression for multinomial entropy:

$$H = -\log(n!) - n\sum_{i=1}^{k} p_i \log(p_i) + \sum_{i=1}^{k} \sum_{x_i=0}^{n} \binom{n}{x_i} p_i^{x_i} (1-p_i)^{n-x_i} \log(x_i!) \tag{4.20}$$

For the case of the uniform distribution, where $p_i = \frac{1}{K} \;\forall i$, this expression can be simplified into:

$$H\left(p_i = \tfrac{1}{K}\right) = -\log(n!) + n\log K + \sum_{i=1}^{K} \sum_{x_i=0}^{n} \binom{n}{x_i} \frac{(K-1)^{n-x_i}}{K^n} \log x_i! \tag{4.21}$$

The term $\frac{(K-1)^{n-x_i}}{K^n}$ can be simplified using polynomial expansion. However, even then, the initial term $\binom{n}{x_i}$ and the leading polynomial term grow too quickly to be computationally feasible through numerical methods.

Chapter 5 Conclusions

A computer scientist, a social scientist, and a mathematician walk into a bar.
“How would you find the best cocktail here?” the computer scientist asks.
“Hmm,” the social scientist muses, “the theory of taste would suggest this concept as a social process, and I believe it derives itself from the elites of society.”
“What about the chemical factors?” the mathematician asks back.
“It’s not really relevant to the theory, so I don’t really care,” the social scientist replies.
“I would build a machine learning model (probably a large neural network model that takes in a variety of data types to encode the different ingredients of all the cocktails here), then predict the most popular cocktail. Once I do some hyperparameter tuning, I can run an even larger model using Nvidia’s newest GPU,” the computer scientist replies.
“Couldn’t you just look at what cocktail was most popular?” the mathematician asks.
“Well then I wouldn’t be able to build a machine-learning model,” the computer scientist replies, oblivious to the jab. “What about you then?”
“. . . I’d probably just ask the bartender,” the mathematician replies.
- An original joke.

∗∗∗

Moderating social media is a difficult task precisely because of the multitude of design choices and affordances that can be built in. From the ways we interact to decisions over what we see, every single variable can produce significant downstream effects.

In the first study, we saw that liberals respond to more toxic content and diverse policy issues, whereas conservatives respond to less toxic content and Republican-owned policy issues. This is significant as it means the information environments of liberals and conservatives are different, which makes cross-talk difficult. Moreover, it also shows that if social media platforms want to increase engagement diversity, blanket policies do not work. Apart from this recommendation, I contribute to long-standing debates on ideological asymmetry (Jost, 2021) and to a new attention towards the audience perspective in diffusion studies, looking at both supply and demand (Munger & Phillips, 2022).
In the second study, the 2020 BLM movement on Instagram showed a shift in the opinion leaders that emerged, contrary to the leaders from text-based movements. Instead of institutional accounts such as politicians, the top users were entertainment magazines and meme pages. In turn, the framing of the protest became much more positive. While we did not make any cross-platform comparisons, the role of Instagram was indispensable to the scale and reach of the 2020 BLM movement. Modality can be a critical force in shifting power to everyday users, augmenting two-step flow, and allowing traditional out-groups to form coalitions. This contributes to a better understanding of movements facilitated through visual-based social media, and contrasts with the behavior of text-based movements (Neumayer & Rossi, 2018). It also fits within a greater push to characterize visual communication along similar axes as traditional text-based communication research (Peng et al., 2023).

In the third study, I proposed a general framework to measure information diversity theoretically and empirically. Given a known number of topics K and distribution of ties r, it is then possible to simulate or fit empirically the diversity measures using topic homophily α and algorithmic amplification factor β. This simplifies modeling approaches and circumvents high-dimensional, user-level modeling. It contributes to the literature gap that intersects tie strength and information diversity beyond diffusion, especially formal-theoretical models of information diversity (Guille et al., 2013). As evidenced by the work with social media data, it provides a foundational basis for studying information environments across modality, tie strength, and group dynamics.

5.1 Simple Open Questions

From this dissertation, there are a few immediate open questions. Between case studies 1 and 2, how does the increased cost of content creation (i.e.
multimedia), or lowered cost due to mass market software, push opinion leaders to become less institutional? What is the relationship between parasocial interaction (the phenomenon of treating entertainers as friends) and opinion leadership? Importantly, do we see the same type of asymmetry with non-institutional political accounts, such as YouTubers, compared to actual political elites? If so, then non-institutional opinion leaders could potentially be better at generating cross-talk.

Between case studies 2 and 3, how can we use the information diversity framework to compare cross-platform information diversity and diffusion? Namely, for large diffusion events such as #BlackOutTuesday, do strong ties or weak ties matter more?

Between case studies 1 and 3, how can we effectively fit values of α and β, and what do their ratios tell us about opposing or adjacent groups? How can we simulate interventions, based on predefined goals in information diversity?

One pragmatic implication is whether we should design social media to specifically target users based on ideology or other attributes, to push certain types of content based on modality or topic, and the legal, ethical, and political implications that follow. This will diverge based on a country’s culture, political system, and user base. While I don’t prescribe normative judgments about social media design here, I do provide some intuition for designers on how to weigh trade-offs for their platforms. Want to equalize information diversity across two ideological groups? De-bias one but not the other. Want an egalitarian blanket policy? Simulate and find a suitable toxicity threshold to maximize out-group engagement.

Given the ubiquity of online social platforms, scholars from different fields all study information environments; however, a divergence in method, theory, and approach is to be expected. Each field brings puzzle pieces to help solve an obviously multi-objective domain.
The social scientist clarifies the significance. The data scientist provides the evidence, often with new tools. The mathematician explains the mechanism.

5.2 Takeaways

Theoretically, the combination of these case studies provides an understanding of information diversity through the following themes: visibility, attitudinal vs. affective diffusion, media richness, algorithm-mediated framing, parasocial ties, and localized online networks. I offer three takeaways.

∗∗∗

Take-away one: Information diversity is a holistic frame for understanding information diffusion, as it accounts for both the inventory of exposure and people’s reactions to it.

The role of the algorithm cannot be overstated. Whether it is interacting with the underlying social network or acting within an election or social movement, it is the principal entity that dictates the visibility of content, opinions, or a cause. Once users are exposed, various factors such as topic and group dynamics can influence the subsequent diffusion. From another angle, this dissertation takes an incredibly close look at how diversity impacts diffusion decisions.

Simulation approaches, such as that in Case Study 3, broaden the experimental possibilities, so long as they satisfy computational feasibility, parametric generality and specificity, and practical iterability. This ensures our results are scalable, answer interesting research questions, and fit with real databases for simulation.

∗∗∗

Take-away two: Media-rich messages high in equivocality interact with the algorithm to promote high-quality opinions that advance communication goals in political organizing.

Secondly, this dissertation works to advance public opinion research by combining media richness theory (MRT) and framing theory. Under MRT, visual communication during social movements is marked by high equivocality and low uncertainty.
This provides a theoretical explanation for “a picture is worth a thousand words.” In terms of framing theory, high-quality opinions are canonically thought to be generated by elites. However, with the popularity-driven feed-ranking algorithm, high-quality opinions from the masses can also bubble up. Through this bottom-up process, protestors can achieve communication goals in political organizing (Theocharis et al., 2020).

∗∗∗

Take-away three: Media-rich networks with diverse local accounts generate sustained and deep diffusion.

MRT interacts with the underlying social network in nontrivial ways. Media-rich environments naturally engender greater levels of parasocial interaction, where normal people interact in an unrequited fashion with entertainers. These well-cultivated audiences provide an extant critical mass when social movements erupt. More importantly, evoking parasocial interaction refines the strong-weak tie paradigm. Parasocial relations are “strong” due to the intensity and multimodality of interaction between users and entertainers. Yet, parasocial relations are “weak” in that they are one-directional. The high levels of geolocation mean highly diverse content, which frames the movement as connected, localized protest. Put together, topically and modally diverse information is allowed to traverse entire networks in a sustained fashion.

This framework of analysis also extends to more content-based platforms like YouTube and TikTok. With virtual reality and augmented reality technologies on the horizon, these dynamics will play an even greater role in the information environments of online, even hybrid, spaces.

Bibliography

Al-Kandari, A. J., Al-Hunaiyyan, A. A., & Al-Hajri, R. (2016). The influence of culture on instagram use. Journal of Advances in Information Technology, 7(1), 54, 57.
Allcott, H., Braghieri, L., Eichmeyer, S., & Gentzkow, M. (2020). The welfare effects of social media. American Economic Review, 110(3), 629–676.
Anandhan, A., Shuib, L., Ismail, M. A., & Mujtaba, G. (2018). Social media recommender systems: Review and open research issues. IEEE Access, 6, 15608–15628.
Anderson, M., Barthel, M., Perrin, A., & Vogels, E. A. (2020). #Blacklivesmatter surges on twitter after george floyd’s death. Pew Research Center.
Arafa, M., & Armstrong, C. (2016). "Facebook to mobilize, twitter to coordinate protests, and youtube to tell the world": New media, cyberactivism, and the arab spring. Journal of Global Initiatives: Policy, Pedagogy, Perspective, 10(1), 6.
Autio, E., Nambisan, S., Thomas, L. D., & Wright, M. (2018). Digital affordances, spatial affordances, and the genesis of entrepreneurial ecosystems. Strategic Entrepreneurship Journal, 12(1), 72–95.
Bail, C. A., Argyle, L. P., Brown, T. W., Bumpus, J. P., Chen, H., Hunzaker, M. F., Lee, J., Mann, M., Merhout, F., & Volfovsky, A. (2018). Exposure to opposing views on social media can increase political polarization. Proceedings of the National Academy of Sciences, 115(37), 9216–9221.
Bak-Coleman, J. B., Alfano, M., Barfuss, W., Bergstrom, C. T., Centeno, M. A., Couzin, I. D., Donges, J. F., Galesic, M., Gersick, A. S., Jacquet, J., et al. (2021). Stewardship of global collective behavior. Proceedings of the National Academy of Sciences, 118(27), e2025764118.
Bakshy, E., Messing, S., & Adamic, L. A. (2015). Exposure to ideologically diverse news and opinion on facebook. Science, 348(6239), 1130–1132.
Bakshy, E., Rosenn, I., Marlow, C., & Adamic, L. (2012). The role of social networks in information diffusion. Proceedings of the 21st international conference on World Wide Web, 519–528.
Bao, Q., Cheung, W. K., Zhang, Y., & Liu, J. (2016). A component-based diffusion model with structural diversity for social networks. IEEE transactions on cybernetics, 47(4), 1078–1089.
Baptiste, N. (2021). The mob at the capitol proves that blue lives have never mattered to trump supporters. Mother Jones.
Barabási, A.-L. (2009).
Scale-free networks: A decade and beyond. Science, 325(5939), 412–413.
Barberá, P., Casas, A., Nagler, J., Egan, P. J., Bonneau, R., Jost, J. T., & Tucker, J. A. (2019). Who leads? who follows? measuring issue attention and agenda setting by legislators and the mass public using social media data. American Political Science Review, 113(4), 883–901.
Barberá, P., Jost, J. T., Nagler, J., Tucker, J. A., & Bonneau, R. (2015). Tweeting from left to right: Is online political communication more than an echo chamber? Psychological science, 26(10), 1531–1542.
Bateman, J. A. (2012). The decomposability of semiotic modes. In Multimodal studies (pp. 37–58). Routledge.
Baumgartner, F. R., Green-Pedersen, C., & Jones, B. D. (2006). Comparative studies of policy agendas. Journal of European public policy, 13(7), 959–974.
Bennett, W. L., & Segerberg, A. (2015). The logic of connective action: Digital media and the personalization of contentious politics. In Handbook of digital politics (pp. 169–198). Edward Elgar Publishing.
Bevan, S. (2019). The creation of the comparative agendas project master codebook. Comparative policy agendas: Theory, tools, data, 17.
Bhat, P., & Klein, O. (2020). Covert hate speech: White nationalists and dog whistle communication on twitter. Twitter, the public sphere, and the chaos of online deliberation, 151–172.
Bodó, B., Helberger, N., Eskens, S., & Möller, J. (2019). Interested in diversity: The role of user attitudes, algorithmic feedback loops, and policy in news personalization. Digital journalism, 7(2), 206–229.
Bohman, J., & Rehg, W. (1997). Deliberative democracy: Essays on reason and politics. MIT press.
Bonilla, Y., & Rosa, J. (2015). #Ferguson: Digital protest, hashtag ethnography, and the racial politics of social media in the united states. American ethnologist, 42(1), 4–17.
Bormann, M., Tranow, U., Vowe, G., & Ziegele, M. (2022).
Incivility as a violation of communication norms—a typology based on normative expectations toward political communication. Communication Theory, 32(3), 332–362. Boutyline, A., & Willer, R. (2017). The social structure of political echo chambers: Variation in ideological homophily in online networks. Political psychology, 38(3), 551–569. Brady, W. J., Crockett, M. J., & Van Bavel, J. J. (2020). The mad model of moral contagion: The role of motivation, attention, and design in the spread of moralized content online. Perspectives on Psychological Science, 15(4), 978–1010. 100 Brady, W. J., Wills, J. A., Burkart, D., Jost, J. T., & Van Bavel, J. J. (2019). An ideological asymmetry in the diffusion of moralized content on social media among political leaders. Journal of Experimental Psychology: General, 148(10), 1802. Brown, D. K., & Mourão, R. R. (2021). Protest coverage matters: How media framing and visual communication affects support for black civil rights protests. Mass Communication and Society, 24(4), 576–596. Buchanan, L., Fessenden, F., Lai, K. R., Park, H., Parlapiano, A., Tse, A., Wallace, T., Watkins, D., & Yourish, K. (2015). What happened in ferguson. The New York Times, 10. Buchbinder, N., Jain, K., & Naor, J. (2007). Online primal-dual algorithms for maximizing ad-auctions revenue. Algorithms–ESA 2007: 15th Annual European Symposium, Eilat, Israel, October 8-10, 2007. Proceedings 15, 253–264. Bui, M., Chang, H.-C. H., & McIlwain, C. (2022). Carlson, T. N. (2019). Through the grapevine: Informational consequences of interpersonal political communication. American Political Science Review, 113(2), 325–339. Carney, N. (2016). All lives matter, but so does race: Black lives matter and the evolving role of social media. Humanity & Society, 40(2), 180–199. Center, P. (2022). Social media and news fact sheet. https://www.pewresearch.org/journalism/fact-sheet/social-media-and-news-fact-sheet/ Centola, D. (2018). 
How behavior spreads: The science of complex contagions (Vol. 3). Princeton University Press Princeton, NJ. Chan, J. M., & Lee, C.-C. (1984). The journalistic paradigm on civil protests: A case study of hong kong. The news media in national and international conflict , 183–202. Chang, H.-C. H. (2021). Multi-issue negotiation with deep reinforcement learning. Knowledge-Based Systems, 211, 106544. Chang, H.-C. H., Bui, M., & McIlwain, C. (2021). Targeted ads and/as racial discrimination: Exploring trends in new york city ads for college scholarships. arXiv preprint arXiv:2109.15294. Chang, H.-C. H., Bui, M., & Mcilwain, C. (2022). Targeted ads and/as racial discrimination: Exploring trends in new york city ads for college scholarships. Proceedings of the 55th Hawaii International Conference on System Sciences. Chang, H.-C. H., Chen, E., Zhang, M., Muric, G., & Ferrara, E. (2021). Social bots and social media manipulation in 2020: The year in review. arXiv preprint arXiv:2102.08436. Chang, H.-C. H., & Ferrara, E. (2022). Comparative analysis of social bots and humans during the covid-19 pandemic. Journal of Computational Social Science, 1–17. 101 Chang, H.-C. H., & Fu, F. (2019). Co-contagion diffusion on multilayer networks. Applied Network Science, 4(1), 1–15. Chang, H.-C. H., & Fu, F. (2021). Elitism in mathematics and inequality. Humanities and Social Sciences Communications, 8(1), 1–8. Chang, H.-C. H., Haider, S., & Ferrara, E. (2021). Digital civic participation and misinformation during the 2020 taiwanese presidential election. Media and Communication, 9(1), 144–157. Chang, H.-C. H., Harrington, B., Fu, F., & Rockmore, D. N. (2023). Complex systems of secrecy: The offshore networks of oligarchs. PNAS nexus, 2(3), pgad051. Chang, H.-C. H., Pham, B., & Ferrara, E. (2021). Kpop fandoms drive covid-19 public health messaging on social media. arXiv preprint arXiv:2110.04149. Chang, H.-C. H., Richardson, A., & Ferrara, E. (2022). 
# Justiceforgeorgefloyd: How instagram facilitated the 2020 black lives matter protests. PLoS one, 17(12), e0277864. Chen, A. Y., Nyhan, B., Reifler, J., Robertson, R. E., & Wilson, C. (2022). Subscriptions and external links help drive resentful users to alternative and extremist youtube videos. arXiv preprint arXiv:2204.10921. Chen, C. X., Pennycook, G., & Rand, D. (2021). What makes news sharable on social media? Chen, E., Chang, H., Rao, A., Lerman, K., Cowan, G., & Ferrara, E. (2021). Covid-19 misinformation and the 2020 us presidential election. Harvard Kennedy School Misinformation Review. Chen, E., Jiang, J., Chang, H.-C. H., Muric, G., & Ferrara, E. (2022). Charting the information and misinformation landscape to characterize misinfodemics on social media: Covid-19 infodemiology study at a planetary scale. Jmir Infodemiology, 2(1), e32378. Chen, Z., Oh, P., & Chen, A. (2021). The role of online media in mobilizing large-scale collective action. Social Media+ Society, 7(3), 20563051211033808. Chitra, U., & Musco, C. (2019). Understanding filter bubbles and polarization in social networks. arXiv preprint arXiv:1906.08772. Chong, D., & Druckman, J. N. (2007). Framing theory. Annu. Rev. Polit. Sci., 10, 103–126. Cinelli, M., De Francisci Morales, G., Galeazzi, A., Quattrociocchi, W., & Starnini, M. (2021). The echo chamber effect on social media. Proceedings of the National Academy of Sciences, 118(9), e2023301118. Conover, M. D., Ferrara, E., Menczer, F., & Flammini, A. (2013). The digital evolution of occupy wall street. PloS one, 8(5), e64679. Corrigall-Brown, C. (2012). The power of pictures: Images of politics and protest. American Behavioral Scientist, 56(2), 131–134. 102 Coscarelli, J. (2020). # Blackouttuesday: A music industry protest becomes a social media moment. New York Times, 2. Craig, S. C., & Cossette, P. S. (2020). Who owns what, and why? the origins of issue ownership beliefs. Politics & Policy, 48(1), 107–134. Dalton, R. J. (2008). 
The quantity and the quality of party systems: Party system polarization, its measurement, and its consequences. Comparative political studies, 41(7), 899–920. Dan, V. (2018). Integrative framing analysis: Framing health through words and visuals. Taylor & Francis. Detenber, B. H., Gotlieb, M. R., McLeod, D. M., & Malinkina, O. (2007). Frame intensity effects of television news stories about a high-visibility protest issue. Mass Communication & Society, 10(4), 439–460. Dick, P. K. (2012). We can build you. Houghton Mifflin Harcourt. Druckman, J. N., Gubitz, S., Lloyd, A. M., & Levendusky, M. S. (2019). How incivility on partisan media (de) polarizes the electorate. The Journal of Politics, 81(1), 291–295. Druckman, J. N., & Jacobs, L. R. (2015). Who governs? presidents, public opinion, and manipulation. University of Chicago Press. Druckman, J. N., Levendusky, M. S., & McLain, A. (2018). No need to watch: How the effects of partisan media can spread via interpersonal discussions. American Journal of Political Science, 62(1), 99–112. Edrington, C. L., & Gallagher, V. J. (2019). Race and visibility: How and why visual images of black lives matter. Visual Communication Quarterly, 26(4), 195–207. Egan, P. J. (2013). Partisan priorities: How issue ownership drives and distorts american politics. Cambridge University Press. Enders, A. M. (2021). Issues versus affect: How do elite and mass polarization compare? The Journal of Politics, 83(4), 1872–1877. Fagan, E. (2021). Issue ownership and the priorities of party elites in the united states, 2004–2016. Party Politics, 27(1), 149–160. Felkner, V. K., Chang, H.-C. H., Jang, E., & May, J. (2022). Towards winoqueer: Developing a benchmark for anti-queer bias in large language models. arXiv preprint arXiv:2206.11484. Ferrara, E., Chang, H., Chen, E., Muric, G., & Patel, J. (2020). Characterizing social media manipulation in the 2020 us presidential election. First Monday. Ferrara, E., Interdonato, R., & Tagarelli, A. (2014). 
Online popularity and topical interests through the lens of instagram. Proceedings of the 25th ACM conference on Hypertext and social media, 24–34. Flaxman, S., Goel, S., & Rao, J. M. (2016). Filter bubbles, echo chambers, and online news consumption. Public opinion quarterly, 80(S1), 298–320. 103 Freelon, D., Bossetta, M., Wells, C., Lukito, J., Xia, Y., & Adams, K. (2022). Black trolls matter: Racial and ideological asymmetries in social media disinformation. Social Science Computer Review, 40(3), 560–578. Freelon, D., McIlwain, C. D., & Clark, M. (2016). Beyond the hashtags:# ferguson,# blacklivesmatter, and the online struggle for offline justice. Center for Media & Social Impact, American University, Forthcoming. Frimer, J. A., Aujla, H., Feinberg, M., Skitka, L. J., Aquino, K., Eichstaedt, J. C., & Willer, R. (2023). Incivility is rising among american politicians on twitter. Social Psychological and Personality Science, 14(2), 259–269. Garrett, R. K. (2009). Echo chambers online?: Politically motivated selective exposure among internet news users. Journal of computer-mediated communication, 14(2), 265–285. Garton, L., Haythornthwaite, C., & Wellman, B. (1997). Studying online social networks. Journal of computer-mediated communication, 3(1), JCMC313. Garza, A. (2014). A herstory of the# blacklivesmatter movement. The Feminist Wire. Gibson, J. J. (1977). The theory of affordances. Hilldale, USA, 1(2), 67–82. Giles, D. C. (2002). Parasocial interaction: A review of the literature and a model for future research. Media psychology, 4(3), 279–305. Goel, S., Anderson, A., Hofman, J., & Watts, D. J. (2016). The structural virality of online diffusion. Management Science, 62(1), 180–196. Gorodnichenko, Y., Pham, T., & Talavera, O. (2021). Social media, sentiment and public opinions: Evidence from# brexit and# uselection. European Economic Review, 136, 103772. Granovetter, M. S. (1973). The strength of weak ties. American journal of sociology, 78(6), 1360–1380. 
Guay, B., & Johnston, C. D. (2022). Ideological asymmetries and the determinants of politically motivated reasoning. American Journal of Political Science, 66(2), 285–301. Guess, A., Nyhan, B., Lyons, B., & Reifler, J. (2018). Avoiding the echo chamber about echo chambers. Knight Foundation, 2(1), 1–25. Guess, A. M. (2021). Experiments using social media data. Advances in experimental political science, 184. Guille, A., Hacid, H., Favre, C., & Zighed, D. A. (2013). Information diffusion in online social networks: A survey. ACM Sigmod Record, 42(2), 17–28. Gutmann, A., & Thompson, D. (2004). Why deliberative democracy? Princeton University Press. Hall, W., Tinati, R., & Jennings, W. (2018). From brexit to trump: Social media’s role in democracy. Computer, 51(1), 18–27. 104 Han, X., Cao, S., Shen, Z., Zhang, B., Wang, W.-X., Cressman, R., & Stanley, H. E. (2017). Emergence of communities and diversity in social networks. Proceedings of the National Academy of Sciences, 114(11), 2887–2891. Hänska, M., & Bauchowitz, S. (2017). Tweeting for brexit: How social media influenced the referendum. Haq, E.-U., Braud, T., Yau, Y.-P., Lee, L.-H., Keller, F. B., & Hui, P. (2022). Screenshots, symbols, and personal thoughts: The role of instagram for social activism. Proceedings of the ACM Web Conference 2022, 3728–3739. Harlow, S., & Johnson, T. J. (2011). The arab spring| overthrowing the protest paradigm? how the new york times, global voices and twitter covered the egyptian revolution. International journal of Communication, 5, 16. Haythornthwaite, C. (2001). Exploring multiplexity: Social network structures in a computer-supported distance learning class. The information society, 17(3), 211–226. Haythornthwaite, C., & Wellman, B. (1998). Work, friendship, and media use for information exchange in a networked organization. Journal of the american society for information science, 49(12), 1101–1114. Hemphill, L., & Schöpke-Gonzalez, A. M. (2020). 
Two computational models for analyzing political attention in social media. Proceedings of the International AAAI Conference on Web and Social Media, 14, 260–271. Hermida, A. (2010). Twittering the news: The emergence of ambient journalism. Journalism practice, 4(3), 297–308. Hermida, A., Lewis, S. C., & Zamith, R. (2014). Sourcing the arab spring: A case study of andy carvin’s sources on twitter during the tunisian and egyptian revolutions. Journal of computer-mediated communication, 19(3), 479–499. Heubach, S., & Mansour, T. (2004). Compositions of n with parts in a set. Congressus Numerantium, 168, 127. Hill, E., Tiefenthäler, A., Triebert, C., Jordan, D., Willis, H., & Stein, R. (2020). How george floyd was killed in police custody. The New York Times, 31. Hochman, N., & Manovich, L. (2013). Zooming into an instagram city: Reading the local through social media. First Monday. Holtz, D., Carterette, B., Chandar, P., Nazari, Z., Cramer, H., & Aral, S. (2020). The engagement-diversity connection: Evidence from a field experiment on spotify. Proceedings of the 21st ACM Conference on Economics and Computation, 75–76. Howard, P. N., & Hussain, M. M. (2013). Democracy’s fourth wave?: Digital media and the arab spring. Oxford University Press. 105 Huszár, F., Ktena, S. I., O’Brien, C., Belli, L., Schlaikjer, A., & Hardt, M. (2022). Algorithmic amplification of politics on twitter. Proceedings of the National Academy of Sciences, 119(1), e2025334119. Isaak, J., & Hanna, M. J. (2018). User data privacy: Facebook, cambridge analytica, and privacy protection. Computer, 51(8), 56–59. Iselin, E. R. (1988). The effects of information load and information diversity on decision quality in a structured decision task. Accounting, organizations and Society, 13(2), 147–164. Ishii, K., Lyons, M. M., & Carr, S. A. (2019). Revisiting media richness theory for today and future. Human Behavior and Emerging Technologies, 1(2), 124–131. Jackson, S. J. (2016). 
(re) imagining intersectional democracy from black feminism to hashtag activism. Women’s Studies in Communication, 39(4), 375–379. Jackson, S. J., Bailey, M., & Welles, B. F. (2020). # Hashtagactivism: Networks of race and gender justice. Mit Press. Jackson, S. J., & Foucault Welles, B. (2015). Hijacking# mynypd: Social media dissent and networked counterpublics. Journal of communication, 65(6), 932–952. Jacomy, M., Venturini, T., Heymann, S., & Bastian, M. (2014). Forceatlas2, a continuous graph layout algorithm for handy network visualization designed for the gephi software.PloSone,9(6), e98679. Jang, J. Y., Han, K., Shih, P. C., & Lee, D. (2015). Generation like: Comparative characteristics in instagram. Proceedings of the 33rd Annual ACM Conference on Human Factors in Computing Systems, 4039–4042. Jiang, J., Chen, E., Luceri, L., Murić, G., Pierri, F., Chang, H.-C. H., & Ferrara, E. (2022). What are your pronouns? examining gender pronoun usage on twitter. arXiv preprint arXiv:2207.10894. Jones, J. J., Settle, J. E., Bond, R. M., Fariss, C. J., Marlow, C., & Fowler, J. H. (2013). Inferring tie strength from online directed behavior. PloS one, 8(1), e52168. Jost, J. T. (2017). Ideological asymmetries and the essence of political psychology. Political psychology, 38(2), 167–208. Jost, J. T. (2021). Left and right: The psychological significance of a political distinction . Oxford University Press. Kahne, J., Middaugh, E., & Allen, D. (2015). Youth, new media, and the rise of participatory politics. From voice to influence: Understanding citizenship in a digital age , 35. Katz, E. (1957). The two-step flow of communication: An up-to-date report on an hypothesis. Public opinion quarterly, 21(1), 61–78. Kaufman, G. (2020). ’the show must be paused’: What to know about the music industry’s response to george floyd’s death. Billboard. 106 Kennedy, C. J., Vassey, J., Chang, H.-C. H., Unger, J. B., & Ferrara, E. (2021). 
Tracking e-cigarette warning label compliance on instagram with deep learning. arXiv preprint arXiv:2102.04568. Khondker, H. H. (2011). Role of the new media in the arab spring. Globalizations, 8(5), 675–679. Kilgo, D. K., & Harlow, S. (2019). Protests, media coverage, and a hierarchy of social struggle. The International Journal of Press/Politics, 24(4), 508–530. Kirkland, J. (2021). Inside k-pop stans’ social media war against white supremacists. https://www.esquire.com/entertainment/music/a32754772/k-pop-stans-fight-white-blue-all- lives-matter-twitter-hashtags/ Kitchens, B., Johnson, S. L., & Gray, P. (2020). Understanding echo chambers and filter bubbles: The impact of social media on diversification and partisan shifts in news consumption. MIS quarterly, 44(4). Klapper, J. T. (1957). What we know about the effects of mass communication: The brink of hope. Public opinion quarterly, 21(4), 453–474. Kurtin, K. S., O’Brien, N., Roy, D., & Dam, L. (2018). The development of parasocial interaction relationships on youtube. The Journal of Social Media in Society, 7(1), 233–252. Lee, E., Lee, J.-A., Moon, J. H., & Sung, Y. (2015). Pictures speak louder than words: Motivations for using instagram. Cyberpsychology, behavior, and social networking, 18(9), 552–556. Leib, E. J. (2010). Deliberative democracy in america: A proposal for a popular branch of government. Penn State Press. Levendusky, M. (2009). The partisan sort: How liberals became democrats and conservatives became republicans. University of Chicago Press. Lodewijckx, I. (2022). What’s the difference between deliberation and participation? https://www.citizenlab.co/blog/civic-engagement/whats-the-difference-between-deliberative- and-participatory-democracy/ Lopez, G. (2016). What were the 2014 ferguson protests about? vox. Lubbers, M. J., Molina, J. L., & Valenzuela-Garcı*************************************a, H. (2019). When networks speak volumes: Variation in the size of broader acquaintanceship networks. 
Social Networks, 56, 55–69. Marsh, W. T. (2018). Pictures are worth a thousand words: An analysis of visual framing in civil rights and black lives matter protest photography. Marwick, A. E., & Boyd, D. (2011). I tweet honestly, i tweet passionately: Twitter users, context collapse, and the imagined audience. New media & society, 13(1), 114–133. McKay, S., & Tenove, C. (2021). Disinformation as a threat to deliberative democracy. Political Research Quarterly, 74(3), 703–717. 107 McKelvey, F., DeJong, S., & Frenzel, J. (2021). Memes, scenes and# elxn2019s: How partisans make memes during elections. New Media & Society, 14614448211020690. Meta. (n.d.). https://ai.facebook.com/tools/system-cards/instagram-feed-ranking/ Michels, R. (2019). The iron law of oligarchy. In Power in modern societies (pp. 111–124). Routledge. Moran, R. E., Koltai, K., Grasso, I., Schafer, J., & Klentschy, C. (2021). Content moderation avoidance strategies. https://www.viralityproject.org/rapid-response/content-moderation-avoidance- strategies-used-to-promote-vaccine-hesitant-content Moreno-Almeida, C. (2021). Memes as snapshots of participation: The role of digital amateur activists in authoritarian regimes. New Media & Society, 23(6), 1545–1566. Mosleh, M., Martel, C., Eckles, D., & Rand, D. G. (2021). Shared partisanship dramatically increases social tie formation in a twitter field experiment. Proceedings of the National Academy of Sciences, 118(7), e2022761118. Munger, K., & Phillips, J. (2022). Right-wing youtube: A supply and demand perspective. The International Journal of Press/Politics, 27(1), 186–219. Mutz, D. C. (2002). Cross-cutting social networks: Testing democratic theory in practice. American Political Science Review, 96(1), 111–126. Mutz, D. C. (2006). Hearing the other side: Deliberative versus participatory democracy. Cambridge University Press. Mutz, D. C. (2008). Is deliberative democracy a falsifiable theory? Annu. Rev. Polit. Sci., 11, 521–538. Neblo, M. 
A., Esterling, K. M., & Lazer, D. M. (2018). Politics with the people: Building a directly representative democracy (Vol. 555). Cambridge University Press. Neumayer, C., & Rossi, L. (2018). Images of protest in social media: Struggle over visibility and visual narratives. New Media & Society, 20(11), 4293–4310. Nguyen, D. Q., Vu, T., & Nguyen, A. T. (2020). Bertweet: A pre-trained language model for english tweets. arXiv preprint arXiv:2005.10200. Norman, D. A. (1999). Affordance, conventions, and design. interactions, 6(3), 38–43. Nyhan, B., & Reifler, J. (2010). When corrections fail: The persistence of political misperceptions. Political Behavior, 32(2), 303–330. Olesen, T. (2015). Global injustice symbols and social movements. Springer. Olson, M. (1989). Collective action. The invisible hand, 61–69. Ortiz-Ospina, E., & Roser, M. (2023). The rise of social media. https://ourworldindata.org/rise-of-social-media 108 Osmundsen, M., Bor, A., Vahlstrup, P. B., Bechmann, A., & Petersen, M. B. (2021). Partisan polarization is the primary psychological motivation behind political fake news sharing on twitter. American Political Science Review, 115(3), 999–1015. Overgaard, C. S. B., & Woolley, S. (2022). How social media platforms can reduce polarization. https://www.brookings.edu/techstream/how-social-media-platforms-can-reduce-polarization/ Page, S. E. (2018). The model thinker: What you need to know to make data work for you. Basic Books. Paivio, A. (1991). Dual coding theory: Retrospect and current status. Canadian Journal of Psychology/Revue canadienne de psychologie, 45(3), 255. Papacharissi, Z. (2002). The virtual sphere: The internet as a public sphere. New media & society, 4(1), 9–27. Peng, Y., Lu, Y., & Shen, C. (2023). An agenda for studying credibility perceptions of visual misinformation. Political Communication, 1–13. Pennycook, G., & Rand, D. G. (2021). The psychology of fake news. Trends in cognitive sciences, 25(5), 388–402. Petrocik, J. R. (1996). 
Issue ownership in presidential elections, with a 1980 case study. American journal of political science, 825–850. Powell, T. E., Boomgaarden, H. G., De Swert, K., & de Vreese, C. H. (2015). A clearer picture: The contribution of visuals and text to framing effects. Journal of communication, 65(6), 997–1017. Raghunathan, S. (1999). Impact of information quality and decision-maker quality on decision quality: A theoretical model and simulation analysis. Decision support systems, 26(4), 275–286. Rathje, S., Van Bavel, J. J., & Van Der Linden, S. (2021). Out-group animosity drives engagement on social media. Proceedings of the National Academy of Sciences, 118(26), e2024292118. Richardson, A. V. (2020).Bearingwitnesswhileblack:Africanamericans,smartphones,andthenewprotest# journalism. Oxford University Press, USA. Riker, W. H. (1993). Agenda formation. University of Michigan press. Rodriguez, L., & Dimitrova, D. V. (2011). The levels of visual framing. Journal of visual literacy, 30(1), 48–65. Rogers, N., & Jost, J. T. (2022). Liberals as cultural omnivores. Journal of the Association for Consumer Research, 7(3), 255–265. Russell, A. (2018). Us senators on twitter: Asymmetric party rhetoric in 140 characters. American Politics Research, 46(4), 695–723. 109 Sandford, A. (2020). George floyd: Protests continue in europe despite virus restrictions. https://www.euronews.com/2020/06/06/black-lives-matter-protesters-take-to-streets-in- europe-despite-pandemic-restrictions Sanz-Cruzado, J., & Castells, P. (2018). Enhancing structural diversity in social networks by recommending weak ties. Proceedings of the 12th ACM conference on recommender systems, 233–241. Sharma, S., & Verma, H. (2018). Social media marketing: Evolution and change. Social media marketing: Emerging concepts and applications, 19–36. Shmargad, Y., & Klar, S. (2020). Sorting the news: How ranking by popularity polarizes our politics. Political Communication, 37(3), 423–446. Sniderman, P. M., & Theriault, S. 
M. (2004). The structure of political argument and the logic of issue framing. Studies in public opinion: Attitudes, nonattitudes, measurement error, and change, 133–65. Solomon, J., Kaplan, D., & Hancock, L. E. (2021). Expressions of american white ethnonationalism in support for “blue lives matter”. Geopolitics, 26(3), 946–966. Spurk, D., & Straub, C. (2020). Flexible employment relationships and careers in times of the covid-19 pandemic. Steinert-Threlkeld, Z. C. (2018). Twitter as data. Cambridge University Press. Tajfel, H., & Turner, J. C. (2004). The social identity theory of intergroup behavior. In Political psychology (pp. 276–293). Psychology Press. Tausczik, Y. R., & Pennebaker, J. W. (2010). The psychological meaning of words: Liwc and computerized text analysis methods. Journal of language and social psychology, 29(1), 24–54. Theocharis, Y., Barberá, P., Fazekas, Z., & Popa, S. A. (2020). The dynamics of political incivility on twitter. Sage Open, 10(2), 2158244020919447. Theocharis, Y., Lowe, W., Van Deth, J. W., & Garcia-Albacete, G. (2015). Using twitter to mobilize protest action: Online mobilization patterns and action repertoires in the occupy wall street, indignados, and aganaktismenoi movements. Information, Communication & Society, 18(2), 202–220. Treem, J. W., & Leonardi, P. M. (2013). Social media use in organizations: Exploring the affordances of visibility, editability, persistence, and association. Annals of the International Communication Association, 36(1), 143–189. Treré, E., & Bonini, T. (2022). Amplification, evasion, hijacking: Algorithms as repertoire for social movements and the struggle for visibility. Social Movement Studies, 1–17. Tucker, J. A., Guess, A., Barberá, P., Vaccari, C., Siegel, A., Sanovich, S., Stukal, D., & Nyhan, B. (2018). Social media, political polarization, and political disinformation: A review of the scientific literature. 
Political polarization, and political disinformation: a review of the scientific literature (March 19, 2018). 110 Twitter. (2023). Twitter’s recommendation algorithm. https://blog.twitter.com/engineering/en_us/topics/open-source/2023/twitter-recommendation- algorithm Van der Brug, W. (2004). Issue ownership and party choice. Electoral studies, 23(2), 209–233. Weeks, B. E., Ardèvol-Abreu, A., & Gil de Zúñiga, H. (2017). Online influence? social media use, opinion leadership, and political persuasion. International journal of public opinion research, 29(2), 214–239. Wellman, M. L. (2022). Black squares for black lives? performative allyship as credibility maintenance for social media influencers on instagram. Social Media+ Society, 8(1), 20563051221080473. Wojcieszak, M., Casas, A., Yu, X., Nagler, J., & Tucker, J. A. (2022). Most users do not follow political elites on twitter; those who do show overwhelming preferences for ideological congruity. Science advances, 8(39), eabn9418. Zhao, J., Wu, J., & Xu, K. (2010). Weak ties: Subtle role of information diffusion in online social networks. Physical Review E, 82(1), 016105. Zidani, S., & Moran, R. (2021). Memes and the spread of misinformation: Establishing the importance of media literacy in the era of information disorder. Teaching Media Quarterly, 9(1). 111
Abstract
This dissertation examines information diversity on social media. Through three case studies, I investigate information diversity's relationship with media richness theory, algorithm-driven framing, affective versus attitudinal diffusion, and parasocial interaction.
The first case study examines whether liberals and conservatives respond differently to elite communication on Twitter, in light of long-standing questions about (a)symmetric behavior based on ideology. I find a demand-side asymmetry: liberals engage with more diverse policy issues, with more toxicity, and, as a result, across the aisle; conservatives engage with Republican-owned issues with less toxicity.
The second case study investigates how Instagram facilitated the 2020 Black Lives Matter movement following the murder of George Floyd. I find a shift toward non-institutional, entertainment-based opinion leaders and a subsequently more positive framing of the protesters. I argue that these affordances produced a divergence from the typical framing of traditional media and facilitated global solidarity and greater coalition formation.
The third case study proposes a general framework for measuring information diversity at and within the user level, adaptable to multiplex and multimodal settings. Given $K$ topics and a percentage $r$ of tie divisions, I derive closed-form equations for the mean information diversity based on topic homophily and strong tie amplification. I then show that the problem of computing the entropy distribution is isomorphic to the integer composition problem. After running simulations, I find that tie-level and group-level comparisons are comparable with empirical data.
Together, these case studies illustrate how group dynamics, language, policy issues, modality, algorithms, and tie strength can interact to influence our information environments. I offer three takeaways. First, information diversity is one of the most holistic frames for understanding diffusion phenomena. Second, high-quality opinions and protest-friendly framing emerge from the interaction of media-rich messages and the algorithm. Third, media-rich networks with parasocial ties and localized accounts generate sustained diffusion during social movements.
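The entropy framing in the third case study can be made concrete with a small sketch. The abstract does not give the dissertation's equations, so everything below is an assumption made for illustration: diversity is taken to be Shannon entropy (in bits) over a user's per-topic exposure counts, and the link to integer compositions is read as enumerating every way to split n items across K topics, where a topic may receive zero items (weak compositions). The function names are hypothetical and are not taken from the dissertation.

```python
import math
from collections import Counter
from itertools import combinations


def shannon_entropy(counts):
    """Shannon entropy (bits) of a topic-count vector; assumed diversity measure."""
    total = sum(counts)
    if total == 0:
        return 0.0
    # Sum p * log2(1/p) over topics that actually appear.
    return sum((c / total) * math.log2(total / c) for c in counts if c > 0)


def weak_compositions(n, k):
    """Yield every k-tuple of non-negative integers that sums to n.

    Uses the stars-and-bars construction: choose k - 1 bar positions
    among n + k - 1 slots; the gaps between bars are the parts.
    """
    for bars in combinations(range(n + k - 1), k - 1):
        parts = []
        prev = -1
        for b in bars:
            parts.append(b - prev - 1)
            prev = b
        parts.append(n + k - 1 - prev - 1)
        yield tuple(parts)


def entropy_distribution(n, k):
    """Count how many ways of splitting n items over k topics give each entropy."""
    dist = Counter()
    for comp in weak_compositions(n, k):
        dist[round(shannon_entropy(comp), 6)] += 1
    return dist


if __name__ == "__main__":
    # Hypothetical example: 4 items seen by a user, 3 possible topics.
    for entropy, ways in sorted(entropy_distribution(4, 3).items()):
        print(f"entropy={entropy:.3f} bits, compositions={ways}")
```

Brute-force enumeration grows as C(n + K - 1, K - 1), so this is only practical for small n and K; it is meant to show the shape of the counting problem, not to stand in for the closed-form results the abstract describes.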
Linked assets
University of Southern California Dissertations and Theses
Conceptually similar
Socially-informed content analysis of online human behavior
Three text-based approaches to the evolution of political values and attitudes
Modeling information operations and diffusion on social media networks
Analysis and prediction of malicious users on online social networks: applications of machine learning and network analysis in social science
The impact of social media on the diabetes industry
Building social Legoland through collaborative crowdsourcing: marginality, functional diversity, and team success
Crowdsourcing for integrative and innovative knowledge: knowledge diversity, network position, and semantic patterns of collective reflection
Staying ahead of the digital tsunami: strategy, innovation and change in public media organizations
Climate change communication: challenges and insights on misinformation, new technology, and social media outreach
Relationship formation and information sharing to promote risky health behavior on social media
Get @ me: comedy in the age of social media
Rebels with a cause: Youth, social movements, and media
Centering impact, not intent: reframing everyday racism through storytelling, participatory politics, and media literacy
Mining and modeling temporal structures of human behavior in digital platforms
Measuring and mitigating exposure bias in online social networks
Connected: information and state power between the United States and South Africa
Empowering equity: an exploration of how Black women-owned brands can harness social media to overcome public relations’ equity gap to build influence
The data dilemma: sensemaking and cultures of research in the media industries
Social media in community policing: a vehicle for communication
Upvoting the news: breaking news aggregation, crowd collaboration, and algorithm-driven attention on reddit.com
Asset Metadata
Creator
Chang, Ho-Chun Herbert (author)
Core Title
Improving information diversity on social media
School
Annenberg School for Communication
Degree
Doctor of Philosophy
Degree Program
Communication
Degree Conferral Date
2023-05
Publication Date
11/11/2023
Defense Date
05/02/2023
Publisher
University of Southern California (original), University of Southern California. Libraries (digital)
Tag
information diversity, network science, OAI-PMH Harvest, polarization, social media
Format
theses (aat)
Language
English
Contributor
Electronically uploaded by the author (provenance)
Advisor
Ferrara, Emilio (committee chair), Barberá, Pablo (committee member), Druckman, James (committee member), Twyman, Marlon II (committee member)
Creator Email
herbert.hc.chang@gmail.com, hochunhe@usc.edu
Permanent Link (DOI)
https://doi.org/10.25549/usctheses-oUC113120729
Unique identifier
UC113120729
Identifier
etd-ChangHoChu-11826.pdf (filename)
Legacy Identifier
etd-ChangHoChu-11826
Document Type
Dissertation
Rights
Chang, Ho-Chun Herbert
Internet Media Type
application/pdf
Type
texts
Source
20230511-usctheses-batch-1042 (batch), University of Southern California (contributing entity), University of Southern California Dissertations and Theses (collection)
Access Conditions
The author retains rights to his/her dissertation, thesis or other graduate work according to U.S. copyright law. Electronic access is being provided by the USC Libraries in agreement with the author, as the original true and official version of the work, but does not grant the reader permission to use the work if the desired use is covered by copyright. It is the author, as rights holder, who must provide use permission if such use is covered by copyright.
Repository Name
University of Southern California Digital Library
Repository Location
USC Digital Library, University of Southern California, University Park Campus MC 2810, 3434 South Grand Avenue, 2nd Floor, Los Angeles, California 90089-2810, USA
Repository Email
cisadmin@lib.usc.edu