GLOBAL CONSEQUENCES OF LOCAL INFORMATION BIASES IN COMPLEX NETWORKS

by Xin-Zeng Wu

A Dissertation Presented to the FACULTY OF THE GRADUATE SCHOOL, UNIVERSITY OF SOUTHERN CALIFORNIA, In Partial Fulfillment of the Requirements for the Degree DOCTOR OF PHILOSOPHY (PHYSICS)

December 2018

Copyright 2018 Shin-Chieng Ngo

Contents

Acknowledgements
Abstract
1 Introduction
  1.1 Motivation
  1.2 Research questions
  1.3 Dissertation overview and contributions
2 Background and Related Work
  2.1 Friendship paradox
  2.2 Related work on the friendship paradox
  2.3 The dK series network measures
  2.4 Log-normal joint degree distribution
  2.5 Related work on social contagion process
3 Neighbor Assortativity and Correlation
  3.1 Higher-order structures on networks
  3.2 Correlation between neighbors
  3.3 Visualization on Karate Club network
  3.4 Neighbor assortativity in real-world networks
  3.5 Impact of neighbor assortativity
4 The "Majority Illusion"
  4.1 Problem setup
  4.2 Impact of degree-attribute correlations
  4.3 Impact of degree assortativity
  4.4 Binomial model
  4.5 Numerical results in synthetic networks
  4.6 Numerical results in real-world networks
  4.7 Discussion
5 Strong Friendship Paradox
  5.1 Problem setup
  5.2 Effects of 2K structure
  5.3 Numerical results
  5.4 Effects of 3K structure
  5.5 Collective behaviors of the neighbors
  5.6 Discussion
6 Generating Function Formulation
  6.1 Percolation and generating functions
  6.2 Subcritical cascades
  6.3 The onset of global cascades
  6.4 Supercritical regime and node influence
7 Network Structure and Dynamics
  7.1 Problem setup
  7.2 Bivariate log-normal model
  7.3 Cascades in synthetic networks
  7.4 Cascades in real-world networks
  7.5 Discussion
8 Conclusion
A Symbols
  A.1 Network measures
  A.2 Generating functions and cascading
B Data and Networks
  B.1 Synthetic networks
  B.2 Real-world networks
Reference List

Acknowledgements

My path toward the degree of Doctor of Philosophy in the University of Southern California was an intellectual and colorful journey to self-actualization.
This journey is remarkable to my life because of the wonderful faculty members and incredible fellow students I worked with during these years. In the first place, I would like to thank my research advisor Dr. Kristina Lerman for her academic guidance and the freedom given to me to pursue my course work and travel. In particular, I would like to thank her for her unreserved support when I faced research challenges and troublesome school issues. Without these, this dissertation would not have been possible. I would also like to thank the members of my qualification and dissertation committee, Dr. Aiichiro Nakano, Dr. Stephan Haas, Dr. Hubert Saleur and Dr. Kayla de la Haye, for all their suggestions and constructive criticism. I would like to thank the funding agencies who supported the research during the long PhD years. The work presented in this dissertation was funded by the Army Research Office under contract W911NF-16-1-0306 and by the National Science Foundation under grant SMA-1360058. Last but not least, I would like to dedicate this dissertation to my parents, my family, and my beloved homeland, Taiwan.

"This doctoral dissertation is dedicated to my parents, my family, and my most beloved land of Taiwan." (translated from Taiwanese)

Shin-Chieng Ngo

Abstract

Complex systems can be represented as networks of interacting entities or nodes. In numerous physical models on networks, dynamics are based on interactions that exclusively involve properties of a node's nearest neighbors. Local interactions among nodes in a complex network can lead to an astounding array of global behaviors. Examples include viral outbreaks and social contagions in social networks, cascading failures in the power grid and financial networks, synchronization of coupled oscillators, and opinion dynamics and consensus formation in human groups. A node's local view of the network, however, may be systematically different from the global ground truth, which may affect global phenomena.
Social scientists have identified one source of bias, the friendship paradox, which states that, on average, nodes have fewer connections, or smaller degree, than their neighbors. Recently, more interesting variations of the paradox were discovered. The strong friendship paradox states that most nodes have fewer connections than most of their neighbors. Unlike the original friendship paradox and its generalizations to attributes other than degree, the strong friendship paradox does not arise trivially as a result of sampling from skewed distributions.

The strong friendship paradox can dramatically distort local information in a network, leading to the "majority illusion" paradox, in which a globally rare attribute may be dramatically overrepresented in the local neighborhoods of many nodes. As a consequence, many nodes will observe that the majority of their neighbors have the attribute, which can affect not only an individual node's behavior but also the collective phenomena unfolding on the network. Two key network properties determine the strength of the paradox in a network:

1. nodes with high connectivity are more likely to bear that specific property;
2. nodes with high connectivity prefer to connect to nodes with low connectivity.

By using these two correlations, we can quantify the global visibility of that specific property in a complex network.

We also investigate the strength of the strong friendship paradox to estimate the amount of information distortion that a node with a given connectivity experiences. Accurately modeling the strength of the strong friendship paradox requires a new structural property beyond the connectivity preference between two nodes mentioned above: what we define as the correlation between a node's neighbors, or the neighbor assortativity.

These paradoxes can influence the results of social contagion. To verify this, we employ Watts' threshold model, in which a node becomes active when some fraction of its neighbors are active.
A mathematical theory based on generating functions and a tree-like assumption describes the size of contagious outbreaks. In most cases, there is a critical threshold that controls whether the outbreak is local or global (i.e., reaches a sizable fraction of the network).

Interestingly, we found that the outbreak behavior is non-monotonic with respect to network assortativity. To explain this, we show that the weight of links between nodes that can be activated by a single neighbor is directly related to this phenomenon. The vulnerable correlation can then be defined in the same manner. The outbreak threshold increases monotonically with the vulnerable correlation.

Of the measures that define the properties of a network, most are based on the connectivity distribution of a single node or a pair of connected nodes. Higher-order properties of networks have received little discussion so far. Our work elucidates how these properties affect the local information bias a node experiences and its consequences for dynamic phenomena in networks, including the dynamics of spreading processes.

Chapter 1 Introduction

1.1 Motivation

Many systems in our universe take the form of a network. A network represents individuals and the interactions between them as nodes and edges, respectively. In a network, each individual tends to behave or make decisions based on the status of the individuals it interacts with. Studies of networks began with observations of human society. In past decades, the unveiling of the scale-free network structure and other topics related to statistical mechanics also gained the attention of the physics community.

Recent technological developments have amplified and accelerated the spread of information inside networks. Generally, information propagation is based on some individuals accepting and then promoting or sharing the opinion acquired from the network.
[Schelling, 1973, Granovetter, 1978, Salganik et al., 2006, Centola, 2010, Young, 2011, Centola and Baronchelli, 2015] This phenomenon is manifested daily in the decisions people make to adopt a new technology [Valente, 1995, Rogers, 2003] or idea [Bettencourt et al., 2006, Young, 2011], listen to music [Salganik et al., 2006], engage in risky behavior [Bearak, 2014], abuse alcohol [Baer et al., 1991, Berkowitz, 2005], or join a social movement [Granovetter, 1978, Schelling, 1973].

However, an individual's local observation of its network neighbors is sometimes inconsistent with the global ground truth in the network. When this effect is large, the dominant opinion of the network may change significantly. Thus, understanding the process and the factors that quantify it is important for understanding the dynamics of information spread and opinion change.

Social scientists identified the "friendship paradox" in the early 1990s [Feld, 1991]. In its original form, it states that "your friends have more friends than you do, on average." Together with many other statistical biases discovered in complex networks, the paradox systematically distorts a node's local view of the network. We refer to this effect as local information bias; it affects the emergence of collective phenomena and our understanding of networks.

The dynamic process of social contagion has also attracted the attention of scientists. Such effects have been studied in numerous contexts, including critical phenomena [Goltsev et al., 2008] and percolation [Karrer et al., 2014]. One of the simplest models used to describe these interactions is the threshold model [Granovetter, 1978, Watts, 2002], in which a node changes its state to one held by a sufficiently large fraction of its neighbors. Under some conditions, even a single node can trigger a global cascade that affects a significant portion of the network [Watts, 2002].
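The threshold rule just described can be sketched in a few lines. This is a minimal illustration rather than the simulation code used in this dissertation: the function name `threshold_cascade`, the uniform threshold `phi`, the convention that a node activates when its active fraction is at least `phi`, and the five-node path graph are all assumptions made for the example.

```python
from collections import deque


def threshold_cascade(adjacency, seeds, phi):
    """Run a threshold cascade until no further node can activate.

    adjacency: dict mapping each node to the set of its neighbors.
    seeds: initially active nodes.
    phi: uniform threshold (an assumption of this sketch); an inactive
         node activates once at least a fraction phi of its neighbors
         is active. Activation is permanent.
    """
    active = set(seeds)
    queue = deque(active)
    while queue:
        node = queue.popleft()
        # Only neighbors of a newly active node can change state.
        for nb in adjacency[node]:
            if nb in active:
                continue
            n_active = sum(1 for x in adjacency[nb] if x in active)
            if n_active >= phi * len(adjacency[nb]):
                active.add(nb)
                queue.append(nb)
    return active


# Hypothetical toy network: a path 0 - 1 - 2 - 3 - 4.
path = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}

global_cascade = threshold_cascade(path, seeds=[0], phi=0.5)
local_cascade = threshold_cascade(path, seeds=[0], phi=0.6)
```

With `phi = 0.5` every interior node is activated by a single active neighbor, so one seed reaches the whole path; raising the threshold to `0.6` stops the cascade at the seed.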

Despite its simplicity, the threshold model has been used to study a surprisingly wide range of social, biological, and technological phenomena. For example, in social systems, the local perturbation could represent the adoption of an innovation, e.g., a new product, action, or idea, that spreads throughout society as individuals adopt the behavior of their friends [Granovetter, 1978, Watts, 2002, Centola, 2010]. In technological systems, the perturbation could represent the failure of a single component, which triggers a cascade of failures in connected components. If small shocks can become pandemic, then nodes that can seed global outbreaks will have outsize importance [Kempe et al., 2003, Goltsev et al., 2008, Karrer et al., 2014]. Such seeds represent influential individuals in social systems, who can help an innovation become widely adopted, or "weak links" in technological systems, whose failure compromises the robustness of the entire system. For example, cascading failures were implicated in widespread blackouts in the power grid [Watts and Strogatz, 1998] and in market crashes in financial systems [Haldane and May, 2011].

Network structure is known to affect both the size of outbreaks in networks and the ease of generating them [Gleeson, 2008, Payne et al., 2009]. Given the relation between network structure and local information biases, it is important to explore their effects on the final outcomes of these multi-step processes.

1.2 Research questions

Local information bias in networks manifests itself in different ways: it can distort a node's local view of network structure, or of the configuration of information or node traits. Local information bias due to the structure of the underlying network can also affect the collective behaviors of the community. In this dissertation, we develop statistical models, based on network structure, that quantify the bias. New network structural measures are also proposed.
These measures aim to determine the magnitude of the local information bias and the contagion pattern based on it. In particular, this dissertation is devoted to answering the following questions.

Q0 How does network structure create biases?

The local prevalence of some attribute among a node's network neighbors can be very different from its global prevalence, creating an illusion that the attribute is far more common than it actually is. In a social network, this illusion may cause people to reach wrong conclusions about how common a behavior is, leading them to accept as a norm a behavior that is globally rare. In addition, it may also explain how global outbreaks can be triggered by very few initial adopters. This may also explain why the observations and inferences individuals make of their peers are often incorrect. Psychologists have, in fact, documented a number of systematic biases in social perceptions [Miller and Prentice, 1994]. The "false consensus" effect arises when individuals overestimate the prevalence of their own features in the population [Marks and Miller, 1987], believing their type to be more common. Thus, Democrats believe that most people are also Democrats, while Republicans think that the majority are Republican. "Pluralistic ignorance" is another social perception bias. This effect arises in situations when individuals incorrectly believe that a majority has an attribute or accepts a norm that they themselves do not share. Pluralistic ignorance was invoked to explain why bystanders fail to act in emergencies [Latané and Darley, 1969], and why college students tend to overestimate alcohol use among their peers [Baer et al., 1991, Prentice and Miller, 1993, Berkowitz, 2005].

Psychologists proposed several explanations for these biases (see [Kitts, 2003] for a concise review), many based on emotional or cognitive mechanisms.
For example, when making social inferences, individuals may use themselves as examples for estimating the states of others (using the "availability" heuristic [Tversky and Kahneman, 1973]). This leads them to mistakenly believe that the majority shares their attitudes and behaviors. However, if individuals use their peers, rather than themselves, as examples to generalize about the population as a whole, network-based explanations for social perception bias are also possible. "Selective exposure" [Kitts, 2003] is one such explanation. Social networks are homophilous [McPherson et al., 2001], meaning that socially linked individuals tend to be similar. Homophily exposes people to a biased sample of the population, creating the false consensus effect [Marks and Miller, 1987]. A related mechanism is "selective disclosure" [Kitts, 2003, Salganik et al., 2011], in which people selectively divulge or conceal their attributes or behaviors to peers, especially if these deviate from prevailing norms. This too can bias social perceptions, leading individuals to incorrectly infer the prevalence of the behavior in the population.

Q1 How can we quantitatively estimate the magnitude of this phenomenon, which we call local information bias?

Network structure distorts social perceptions by biasing individuals' observations. One of these network biases is the friendship paradox, which states that, on average, most people have fewer friends than their friends have [Feld, 1991]. Despite its almost nonsensical nature, the friendship paradox has been used to design efficient strategies for vaccination [Cohen et al., 2003], social intervention [Kim et al., 2015], and early detection of contagious outbreaks [Christakis and Fowler, 2010, Garcia-Herranz et al., 2014].
In a nutshell, rather than monitoring random people to catch a contagious outbreak in its early stages, the friendship paradox suggests monitoring their random network neighbors, because they are likely to be better connected and therefore not only to get sick earlier, but also to infect more people once sick.

Having understood the possible reasons for such skewness in the local distribution of information, we define the paradox holding probability as a measure of this distortion. If the distribution is unbiased, we expect around 50% of nodes to experience the paradox, since the criterion for the paradox is defined by the mean or median. If the global probability of the paradox is over 50%, then local information bias exists: in other words, nodes will observe that a majority of their neighbors have some trait or information. The strength of the paradox can alternatively be defined as the probability that a node is in the paradox regime: the more probable it is that a node is in the paradox, the more distorted the information is. Various collective properties of the network were considered as factors that quantify the strength of the paradox, including the degree distribution of the network, the degree-attribute correlation, and the degree-degree correlation (assortativity). We proposed a rigorous mathematical model that incorporates these three factors and can accurately predict the global probability of the paradox [Lerman et al., 2016].

Q2 How does higher-order structure of the network, including terms describing the inhomogeneity of connectivity and attribute, distort local information available to nodes in complex networks?

Local information bias arises from what we call the "strong friendship paradox" in networks: namely, most of a node's neighbors have higher degrees than the node itself [Hodas et al., 2013]. In fact, any attribute that is correlated with degree will produce a paradox [Eom and Jo, 2014, Kooti et al., 2014].
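As an illustration of the paradox holding probability for a binary attribute, the sketch below counts the nodes that see a strict majority of their neighbors carrying the attribute. The function name `paradox_fraction`, the strict-majority convention, and the six-node star graph are assumptions chosen for the example, not definitions taken from the later chapters.

```python
def paradox_fraction(adjacency, attribute):
    """Fraction of nodes that see the attribute in a strict majority of neighbors.

    adjacency: dict mapping each node to the set of its neighbors.
    attribute: dict mapping each node to 0 or 1.
    Nodes without neighbors are skipped, since they observe nothing.
    """
    in_paradox = 0
    observers = 0
    for node, neighbors in adjacency.items():
        if not neighbors:
            continue
        observers += 1
        carriers = sum(attribute[nb] for nb in neighbors)
        if 2 * carriers > len(neighbors):
            in_paradox += 1
    return in_paradox / observers


# Hypothetical toy network: a star with hub 0 and leaves 1..5.
star = {0: {1, 2, 3, 4, 5}}
for leaf in range(1, 6):
    star[leaf] = {0}

# Only the hub carries the attribute, so the attribute is globally rare
# but perfectly correlated with degree.
attribute = {node: int(node == 0) for node in star}

global_prevalence = sum(attribute.values()) / len(attribute)
local_fraction = paradox_fraction(star, attribute)
```

Here the attribute is held by one node in six, yet five nodes in six observe it in all of their neighbors, which is the kind of gap between global prevalence and local observation that the paradox holding probability is meant to capture.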
To accurately quantify the strong friendship paradox, we need to account for a property of the network that has not been widely investigated before, called neighbor-neighbor correlation [Wu et al., 2017]. This correlation measures the inhomogeneity of the neighbor degree distribution for nodes in a single degree class. Collective behavior among neighbors was shown to be important for describing network structure in recent publications. Our work showed empirically that nodes tend to have similar-degree neighbors, whether with a higher or lower degree than the node's own degree. This leads us to define a general measure on neighbors: the neighbor assortativity. A related phenomenon, named monophily [Altenburger and Ugander, 2018], describes a situation where a node's preference for neighbors of a certain type introduces correlations between their attributes. Social networks, in particular, often have significant monophily, even in the absence of homophily.

Q3 Can we leverage local information bias to affect the dynamics of global phenomena in complex networks, i.e., for network intervention?

Under what conditions does a local shock, e.g., a change of state of a node, spread globally? Researchers have examined how the dynamics of such cascades are affected by network structure, including degree distribution [Watts, 2002] and degree correlations between connected nodes [Gleeson, 2008, Payne et al., 2009]. Surprisingly, networks with both sufficiently positive and sufficiently negative assortativity have been found to be vulnerable to global outbreaks. Such non-monotone behavior was reported in Erdős-Rényi random networks [Payne et al., 2009] and in k-core networks [Gleeson, 2008]. However, these works did not explain the anomalous relationship between degree assortativity and cascade size, nor did they provide a general mathematical framework for quantifying how degree correlations affect the properties of cascades under the threshold model.
Our work addresses these topics and also demonstrates that assortativity does not adequately capture some important aspects of degree correlations in networks. Specifically, in real-world networks, assortativity is heavily skewed by nodes with a single neighbor, which act as "dead ends" to spreading cascades. To address this shortcoming, we introduce a new measure, the vulnerable correlation, which accounts for degree correlations of nodes that participate in a spreading cascade. We demonstrate that cascade size is monotonically related to this measure.

1.3 Dissertation overview and contributions

In our current work, we have completed the following:

- We proposed a structural model that precisely predicts the strength of the "majority illusion", which relates to the generalized friendship paradox with binary attributes. Proofs of how degree-attribute correlation and network assortativity affect the strength of the "majority illusion" are presented along with the model.
- We proposed a statistical model that precisely predicts the strength of the strong friendship paradox. The model incorporates information about higher-order network structures. The influence of higher-order network structures is observed in real-world networks.
- We proposed new measures, neighbor-neighbor correlation and assortativity, that summarize the influence of third-order network structures and are directly related to the strength of the paradox, as well as to the network's vulnerability to contagion outbreaks.
- We investigated the complex threshold contagion process and identified the effect of network assortativity on network vulnerability, and the importance of connectivity between vulnerable nodes as a key factor for amplifying cascades.

In the following chapters, we first review the discovery and development of the friendship paradox, as well as the terminology for describing complex network structure, in Chapter 2.
A detailed discussion of the higher-order structural effects and the proposed measures is given in Chapter 3. We then present research on the generalized friendship paradox in Chapter 4 and on the strong friendship paradox in Chapter 5. Finally, we verify the importance of the proposed measures for the complex threshold contagion process in Chapter 6 and Chapter 7.

Chapter 2 Background and Related Work

2.1 Friendship paradox

The node degree distribution $p(k)$ gives the probability that a randomly chosen node in an undirected network has $k$ neighbors or edges. The neighbor degree distribution $q(k)$ gives the probability that a randomly chosen edge in an undirected network is connected to a node of degree $k$. It is easy to demonstrate that the average neighbor degree $\langle k \rangle_q$ is larger than the average node degree $\langle k \rangle$. The difference between these quantities is

$$\langle k \rangle_q - \langle k \rangle = \sum_k \frac{k^2 p(k)}{\langle k \rangle} - \langle k \rangle = \frac{\langle k^2 \rangle - \langle k \rangle^2}{\langle k \rangle} = \frac{\sigma_k^2}{\langle k \rangle},$$

where $\sigma_k$ is the standard deviation of the degree distribution $p(k)$. Since $\sigma_k \geq 0$, $\langle k \rangle_q - \langle k \rangle \geq 0$. This confirms that the friendship paradox, which says that the average degree of a node's neighbors is larger than the node's own degree, has its origins in the heterogeneous degree distribution [Feld, 1991], and is more pronounced in networks with larger degree heterogeneity $\sigma_k$.

2.2 Related work on the friendship paradox

Recently, more interesting variations of the friendship paradox were discovered. The strong friendship paradox [Kooti et al., 2014] states that nodes tend to have fewer connections than most of their neighbors. Unlike the original friendship paradox, the strong friendship paradox does not arise trivially as a result of sampling from skewed distributions [Kooti et al., 2014]. The strong friendship paradox can dramatically distort local information in a network, leading to the "majority illusion" [Lerman et al., 2016], in which a globally rare attribute may be overrepresented in the local neighborhoods of many nodes.
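Both the identity of Section 2.1 and the strong form of the paradox can be checked by direct counting on a small graph. The sketch below is only illustrative: the seven-node edge list, the helper name `strong_paradox_fraction`, and the strict-majority convention for "most neighbors" are assumptions of the example. Exact rational arithmetic is used so that the identity holds without rounding error.

```python
from fractions import Fraction

# Hypothetical toy graph: hub 0 linked to nodes 1..6, plus one edge (1, 2).
edges = [(0, 1), (0, 2), (0, 3), (0, 4), (0, 5), (0, 6), (1, 2)]

adjacency = {}
for u, v in edges:
    adjacency.setdefault(u, set()).add(v)
    adjacency.setdefault(v, set()).add(u)
degree = {node: len(nbrs) for node, nbrs in adjacency.items()}

n_nodes = len(degree)
mean_k = Fraction(sum(degree.values()), n_nodes)
mean_k2 = Fraction(sum(k * k for k in degree.values()), n_nodes)
variance_k = mean_k2 - mean_k ** 2

# <k>_q: average the degree found at each of the 2E edge ends.
edge_end_degrees = [degree[node] for edge in edges for node in edge]
mean_k_q = Fraction(sum(edge_end_degrees), len(edge_end_degrees))


def strong_paradox_fraction(adjacency, degree):
    """Fraction of nodes with a strict majority of higher-degree neighbors."""
    hits = 0
    for node, nbrs in adjacency.items():
        higher = sum(1 for nb in nbrs if degree[nb] > degree[node])
        if 2 * higher > len(nbrs):
            hits += 1
    return Fraction(hits, len(adjacency))


strong_fraction = strong_paradox_fraction(adjacency, degree)
```

For this graph the mean degree is 2 while the mean neighbor degree is 24/7, and the gap of 10/7 equals the degree variance divided by the mean degree. Four of the seven nodes (the degree-1 leaves) have a majority of better-connected neighbors.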
The friendship paradox has been verified in various social networks, including friendship networks of school and university students [Zuckerman and Jost, 2001, Grund, 2014], scientific collaboration networks [Eom and Jo, 2014], and user networks of online social media including Facebook and Twitter [Ugander et al., 2011, Hodas et al., 2013, Kooti et al., 2014].

2.3 The dK series network measures

Questions in network science often turn on how much information is required to describe the properties of a network. One framework for describing network structure is the dK series, which specifies the joint degree distributions of subgraphs involving up to $d$ nodes [Mahadevan et al., 2006].

Consider an undirected network with a specified number of nodes $N$ and edges $E$. We can immediately write the average degree of the network as $\langle k \rangle = 2E/N$. The 1K distribution is often called the degree distribution of the network, denoted by $p(k)$, which gives the probability of finding a node with $k$ neighbors in the network.

The 2K distribution specifies the degree-degree correlation of pairs of connected nodes. In a network, a node may prefer to connect with nodes of similar degree or with nodes of different degree. The joint distribution is denoted by $e(k, k')$, a matrix of probabilities of finding an edge linking two nodes with degrees $k$ and $k'$. From here, one often defines the neighbor degree distribution $q(k)$, which is the probability that a neighbor reached by following an edge has degree $k$. The 2K distribution has the following normalization conditions:

$$\sum_{k'} e(k, k') = q(k) = \frac{k}{\langle k \rangle} p(k) = \frac{N}{2E} k\, p(k), \qquad \sum_{k, k'} e(k, k') = 1. \tag{2.1}$$

It is worth mentioning here that the measure of the degree-degree correlation on the two sides of an edge is called assortativity, which is defined as

$$r = \frac{1}{\sigma_q^2} \sum_{k, k'} k k' \left[ e(k, k') - q(k) q(k') \right] = \frac{1}{\sigma_q^2} \left[ \left( \sum_{k, k'} k k' e(k, k') \right) - \langle k \rangle_q^2 \right], \tag{2.2}$$

where $\sigma_q^2 = \sum_k k^2 q(k) - \left[ \sum_k k\, q(k) \right]^2$.
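Equations (2.1) and (2.2) translate directly into a counting procedure on an edge list. The following is a minimal sketch under the assumption of a simple undirected graph given as a list of node pairs; the function names and the two toy graphs are hypothetical, and exact fractions are used so that the normalization and the value of $r$ can be checked exactly.

```python
from collections import Counter
from fractions import Fraction


def joint_degree_distribution(edges):
    """Return e(k, k') as a dict {(k, k'): probability}.

    Each undirected edge is counted in both directions, so e is
    symmetric and sums to 1 over the 2E directed edge ends.
    """
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    weight = Fraction(1, 2 * len(edges))
    e = Counter()
    for u, v in edges:
        e[(degree[u], degree[v])] += weight
        e[(degree[v], degree[u])] += weight
    return dict(e)


def assortativity(e):
    """Degree assortativity r computed from e(k, k') as in Eq. (2.2)."""
    q = Counter()
    for (k, _), prob in e.items():
        q[k] += prob  # q(k) is the marginal of e(k, k'), Eq. (2.1)
    mean_q = sum(k * prob for k, prob in q.items())
    var_q = sum(k * k * prob for k, prob in q.items()) - mean_q ** 2
    mixed = sum(k * kp * prob for (k, kp), prob in e.items())
    return (mixed - mean_q ** 2) / var_q


# Hypothetical toy graphs.
star_edges = [(0, 1), (0, 2), (0, 3)]
mixed_edges = [(0, 1), (0, 2), (0, 3), (1, 2), (3, 4)]

e_star = joint_degree_distribution(star_edges)
r_star = assortativity(e_star)
r_mixed = assortativity(joint_degree_distribution(mixed_edges))
```

A star is perfectly disassortative, since every edge joins the hub to a degree-1 leaf, so the sketch returns $r = -1$ for it. The second graph mixes edge types and gives the weaker value $r = -1/9$.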
Networks with positive values of $r$ are said to be assortative: nodes tend to connect with other nodes of similar degree. Otherwise, the network is disassortative: nodes tend to connect with other nodes of very different degree.

The 3K distribution is the joint distribution involving the degrees of three connected nodes, written as $t(k'_i, k, k'_j)$. This is the probability of finding a connected ordered triplet of nodes with degrees $(k'_i, k, k'_j)$, where $k$ is the degree of the central node, and $k'_i$, $k'_j$ are the degrees of its two distinct neighbors. The number of triplets in a network is $A = \frac{1}{2} N \langle k^2 - k \rangle$. The 3K distribution has the following normalization conditions:

$$\sum_{k'_j} t(k'_i, k, k'_j) = \frac{\langle k \rangle (k - 1)}{\langle k^2 - k \rangle} e(k, k'_i) = \frac{E}{A} (k - 1)\, e(k, k'_i),$$
$$\sum_{k'_i, k'_j} t(k'_i, k, k'_j) = \frac{\langle k \rangle (k - 1)}{\langle k^2 - k \rangle} q(k) = \frac{k^2 - k}{\langle k^2 - k \rangle} p(k) = \frac{N}{2A} k (k - 1)\, p(k),$$
$$\sum_{k'_i, k, k'_j} t(k'_i, k, k'_j) = 1. \tag{2.3}$$

Some publications split the 3K distribution into a wedge distribution and a triangle distribution. For the friendship paradox problem, we are concerned only with the degrees of the neighbors of a degree-$k$ node, regardless of whether the neighbors are themselves connected.

2.4 Log-normal joint degree distribution

To better illustrate the impact of 2K structure on some phenomena, we model the joint distribution $e(k, k')$ as a bivariate log-normal distribution, a long-tailed distribution defined on the positive domain of $k$, with equal means $m$, equal variances $s^2$, and correlation coefficient $c$. The distribution function can be written explicitly as

$$e(k, k') = \frac{1}{2 \pi s^2 \sqrt{1 - c^2}\, k k'} \exp\left\{ -\frac{1}{2 s^2 (1 - c^2)} \left[ (\log k - m)^2 - 2c (\log k - m)(\log k' - m) + (\log k' - m)^2 \right] \right\}. \tag{2.4}$$

This form of the distribution allows for analytical treatment of the problems.
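The bivariate log-normal model can be explored numerically by sampling degree pairs and measuring their Pearson correlation, which plays the role of the assortativity $r$. The sketch below compares the sampled value with the closed form $r = (e^{c s^2} - 1)/(e^{s^2} - 1)$ that this model implies. The parameter values, sample size, seed, and function names are assumptions chosen for illustration, and the sampled degrees are continuous rather than integer.

```python
import math
import random


def sample_lognormal_pairs(m, s, c, n, seed=0):
    """Draw n pairs (k, k') whose logarithms are bivariate normal.

    Both logarithms have mean m and standard deviation s, and their
    correlation coefficient is c.
    """
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        # y has unit variance and correlation c with x.
        y = c * x + math.sqrt(1.0 - c * c) * rng.gauss(0.0, 1.0)
        pairs.append((math.exp(m + s * x), math.exp(m + s * y)))
    return pairs


def pearson(pairs):
    """Sample Pearson correlation of the two coordinates."""
    n = len(pairs)
    mean_a = sum(a for a, _ in pairs) / n
    mean_b = sum(b for _, b in pairs) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in pairs) / n
    var_a = sum((a - mean_a) ** 2 for a, _ in pairs) / n
    var_b = sum((b - mean_b) ** 2 for _, b in pairs) / n
    return cov / math.sqrt(var_a * var_b)


def predicted_r(s, c):
    """Closed-form correlation of the log-normal pair."""
    return math.expm1(c * s * s) / math.expm1(s * s)


m, s, c = 1.0, 0.5, 0.5  # illustrative parameter choices
r_sampled = pearson(sample_lognormal_pairs(m, s, c, n=100_000))
r_formula = predicted_r(s, c)
```

For these parameters the closed form gives roughly 0.47, slightly below $c = 0.5$, and the sampled value should agree up to sampling noise. Setting $c = -1$ in `predicted_r` gives $-e^{-s^2}$, the most negative assortativity the model can produce. For large $s$ the heavy tail makes the sample correlation converge slowly, so agreement then needs far more samples.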
Under this assumption, the conditional distribution of the degrees of linked nodes is

$$\log k' \mid \log k \sim \mathcal{N}\left( m + c (\log k - m),\ (1 - c^2) s^2 \right). \tag{2.5}$$

The 1K degree distribution can be written explicitly as

$$p(k) = \frac{e^{m - \frac{1}{2} s^2}}{\sqrt{2 \pi s^2}\, k^2} \exp\left[ -\frac{(\log k - m)^2}{2 s^2} \right] \tag{2.6}$$

and

$$q(k) = \frac{1}{\sqrt{2 \pi s^2}\, k} \exp\left[ -\frac{(\log k - m)^2}{2 s^2} \right]. \tag{2.7}$$

From these we can derive the variance and covariance of linked degrees as $\mathrm{Var}_q(k) = e^{2m + s^2} \left( e^{s^2} - 1 \right)$ and $\mathrm{Cov}_q(k, k') = e^{2m + s^2} \left( e^{c s^2} - 1 \right)$. Thus, the assortativity can be written as

$$r = \frac{\mathrm{Cov}_q(k, k')}{\mathrm{Var}_q(k)} = \frac{e^{c s^2} - 1}{e^{s^2} - 1}. \tag{2.8}$$

Note that the assortativity is bounded by $-e^{-s^2} \leq r \leq 1$ and increases with $c$.

2.5 Related work on social contagion process

Mathematical models of social contagion processes can be classified into simple and complex contagion. While simple contagion processes are heavily used in compartmental models in epidemiology [Kermack and McKendrick, 1927], complex contagion processes have gained more attention in social network analysis [Centola et al., 2007, Mønsted et al., 2017]. A complex contagion process can be viewed as the effect of peers on an individual's own behavior; an example of this phenomenon appears in research on adolescents who overestimate their peers' alcohol consumption and drug use [Prentice and Miller, 1993, Baer et al., 1991, Berkowitz, 2005]. Researchers have examined how the dynamics of such cascades are affected by network structure, including degree distribution [Watts, 2002] and degree correlations between connected nodes [Gleeson, 2008, Payne et al., 2009]. Surprisingly, networks with both sufficiently positive and sufficiently negative assortativity have been found to be vulnerable to global outbreaks. Such non-monotone behavior was reported in Erdős-Rényi random networks [Payne et al., 2009] and in k-core networks [Gleeson, 2008].

Chapter 3 Neighbor Assortativity and Correlation

3.1 Higher-order structures on networks

The structure of networks in large part determines the phenomena taking place on them.
As a characterization of network structure, the degree distribution gives the number of neighbors a randomly chosen node is expected to have. The degree distribution is one of the fundamental building blocks of network models, and it affects many network properties, such as connectivity [Newman, 2003], searchability [Kleinberg, 2000], and robustness to disruptions due to individual node failure [Callaway et al., 2000, Gao et al., 2016]. Most real-world biological, social and technological networks have a markedly heterogeneous degree distribution, with a few hubs (extremely well-connected nodes) and a multitude of poorly connected nodes with few neighbors [Barabási and Pósfai, 2016].

Networks often have structure beyond that given by the degree distribution. Nodes in real-world networks are not connected at random, but instead tend to link to other nodes with similar, or in some cases very dissimilar, degrees. These tendencies are formally represented by the joint degree distribution, from which a global measure of correlation between the degrees of connected nodes, called degree assortativity [Newman, 2002], can be computed. An analogous property, known as homophily [McPherson et al., 2001], describes the propensity of nodes to link to other nodes with similar attributes. Degree assortativity and homophily are necessary to describe the structure of real-world networks, and they govern a variety of phenomena on networks, such as their stability to contagious outbreaks [Dodds and Payne, 2009, Payne et al., 2009] and the formation of echo chambers in social networks [Lee et al., 2017].

Recently, correlations beyond those of linked node pairs were shown to be important for describing network structure. Specifically, to explain the strong friendship paradox, the observation that most neighbors of a node are better connected than the node itself [Kooti et al., 2014], one also has to account for degree correlations among a node's neighbors [Wu et al., 2017].
The same work showed empirically that nodes tend to have similar-degree neighbors, whether with a higher or lower degree than the node's own degree. A related phenomenon, named monophily [Altenburger and Ugander, 2018], describes a situation where a node's preference for neighbors of a certain type introduces correlations between their attributes. Social networks, in particular, often have significant monophily, even in the absence of homophily. The preference of nodes to connect to neighbors with similar properties can also be explained by graph structural balance theory [Cartwright and Harary, 1956, Hummon and Doreian, 2003, Facchetti et al., 2011, Rawlings and Friedkin, 2017].

In this Chapter we introduce neighbor assortativity as a measure of higher-order degree structure in complex networks and a new tool for network analysis. Like assortativity, this quantity measures correlation of degrees, but here among neighbors of a node, regardless of whether they are themselves connected. By analyzing several real-world networks, we show that neighbor assortativity can be substantially large. Shuffling the links to change neighbor assortativity, while keeping degree distribution and assortativity fixed, can dramatically alter the structure of a network.

3.2 Correlation between neighbors

The 3K structure specifies the behavior of neighbors of a degree-k node, i.e., of connected subgraphs with three nodes forming a wedge or a triangle. Assuming that each neighbor is independent, the joint degree distribution of the neighbors of a degree-k node follows the sum of independent random variables with distribution $P(k' \mid k) = e(k,k')/q(k)$. In many networks, however, the variance of the neighbor degree distribution is higher than that given by the independence assumption [Wu et al., 2017]. This suggests that neighbors' degrees are correlated.
From here, we can define a correlation coefficient between pairs of random neighbors by comparing the measured variance of neighbor degrees with the variance estimated under the independence assumption. The variance of the average neighbor degree is $\mathrm{Var}\!\left(\frac{1}{k}\sum_{i=1}^{k} k'_i\right)$, while the variance for a single random neighbor is $\mathrm{Var}(k'_i)$. We have the following derivation:

$$\sigma^2_{3K}(k) = \frac{1}{k^2}\left[\sum_{i=1}^{k}\mathrm{Var}(k'_i) + 2\sum_{i=1}^{k-1}\sum_{j=i+1}^{k}\mathrm{Cov}(k'_i,k'_j)\right] = \frac{1}{k^2}\left[k\,\mathrm{Var}(k'_i) + k(k-1)\,\mathrm{Cov}(k'_i,k'_j)\right] = \frac{1}{k}\,\mathrm{Var}(k'_i)\left[1 + (k-1)\,c(k)\right], \tag{3.1}$$

where $\sigma^2_{3K}(k)$ is the measured variance of the average neighbor degree of the degree-k nodes within the network. This variance reduces to $\frac{1}{k}\mathrm{Var}(k'_i)$ if the assumption holds that the neighbors are independent random variables following the 2K conditional distribution. More specifically, to calculate $\sigma^2_{3K}(k)$ we need information about the joint degree distribution of connected three-node subgraphs. This distribution is denoted by $t(k_1,k,k_2)$ and involves a focal node and two neighbors. From this distribution we can explicitly define the neighbor assortativity, in a manner similar to Eq. (2.2), as

$$c(k) = \frac{1}{\mathrm{Var}(k' \mid k)} \sum_{k_1,k_2} k_1 k_2 \left[w(k_1,k_2 \mid k) - P(k_1 \mid k)\,P(k_2 \mid k)\right], \tag{3.2}$$

where $P(k' \mid k) = e(k,k')/q(k)$ is the distribution of neighbor degrees of a degree-k node, $\mathrm{Var}(k' \mid k)$ is its variance, and $w(k_1,k_2 \mid k) = t(k_1,k,k_2) / \sum_{k'_1,k'_2} t(k'_1,k,k'_2)$ is the joint distribution of two random neighbors of a degree-k node. From here, we can also define a global measure of neighbor assortativity as the sum of each degree's value weighted by the degree distribution, $c_{\mathrm{global}} = \sum_k p(k)\,c(k)$.

3.3 Visualization on Karate Club network

We illustrate the impact of neighbor assortativity on network structure using a widely studied benchmark, Zachary's Karate Club network [Zachary, 1977]. This network contains 34 members of a karate club with 78 social ties between them. There are two hubs, the club administrator and the club instructor, who serve as foci of the two communities.
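As a rough illustration of Eq. (3.2) in Section 3.2, the sketch below estimates c(k) from an edge list as the Pearson correlation between the degrees of two distinct neighbors of the same degree-k node, pooled over all nodes of that degree. The function names `build_adjacency` and `neighbor_assortativity`, the dictionary-of-sets graph representation, and the toy edge list are assumptions made for this example, not the code used in the original analysis. Degree classes whose neighbor degrees have zero variance are skipped, since c(k) is undefined there.

```python
from collections import defaultdict


def build_adjacency(edges):
    """Build an undirected adjacency map {node: set(neighbors)} from an edge list."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return dict(adj)


def neighbor_assortativity(adj):
    """Return {k: c(k)} for degree classes k >= 2 (hypothetical helper).

    c(k) is the correlation between the degrees of two distinct neighbors
    of a degree-k node, pooled over all nodes of degree k.
    """
    deg = {v: len(nbrs) for v, nbrs in adj.items()}
    # Per degree class: [neighbor slots, sum d, sum d^2,
    #                    sum over ordered pairs d_i * d_j, number of ordered pairs]
    acc = defaultdict(lambda: [0, 0, 0, 0, 0])
    for v, nbrs in adj.items():
        k = deg[v]
        if k < 2:
            continue  # a node needs two neighbors to form a wedge
        d = [deg[u] for u in nbrs]
        s, q = sum(d), sum(x * x for x in d)
        a = acc[k]
        a[0] += k
        a[1] += s
        a[2] += q
        a[3] += s * s - q  # sum over i != j of d_i * d_j
        a[4] += k * (k - 1)
    result = {}
    for k, (n, s, q, pair_sum, n_pairs) in acc.items():
        mean = s / n
        var = q / n - mean ** 2
        if var > 1e-12:  # c(k) is undefined when all neighbor degrees coincide
            result[k] = (pair_sum / n_pairs - mean ** 2) / var
    return result


# Toy graph (an assumption for illustration): two degree-3 nodes and one degree-2 node.
edges = [(0, 1), (0, 2), (0, 3), (2, 4), (3, 5), (3, 6)]
c = neighbor_assortativity(build_adjacency(edges))
```

In this toy graph the degree-2 class contains a single node, so its value sits at the lower bound of -1/(k - 1) = -1. The same function could be applied to the Karate Club edge list once it is loaded into the same format.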
Figure 3.1 shows the joint degree distribution of pairs of neighbors, or "wedges", of degree-k nodes, where k comes from different quintiles of the neighbor degree distribution q(k). The upper left panel is for wedges rooted in the highest-degree nodes, and the lower right panel is for the lowest-degree nodes. The wedge degree distribution shows more assortative structure for higher k and more disassortative structure for lower k.

Figure 3.1: Joint degree distribution of "wedges" in Zachary's Karate Club, i.e., joint degree distribution of pairs of neighbors of degree-k nodes, where k comes from different quintiles of the degree distribution q(k). Plots are ordered from highest k to lowest k. Lighter colors correspond to bigger probability mass. (Panel title: Karate (n = 35, e = 78).)

We use a rewiring algorithm to change neighbor assortativity while preserving the network's 2K and 1K structure. The rewiring algorithm randomly chooses two edges that have an endpoint of equal degree, and swaps their connections so as to change the global neighbor assortativity in the desired direction. The original Karate Club network has a global weighted neighbor assortativity value of -0.098. By implementing the above-mentioned algorithm, we created two networks with extremely positive and negative neighbor assortativity values of 0.4068 and -0.4595 respectively. The original and rewired networks are shown in Fig. 3.2. The visualization demonstrates how neighbor assortativity can change the emergent structure of networks. When neighbor degree assortativity is negative, the network acquires a core-periphery structure, while for positive degree assortativity, it splinters into communities.

Figure 3.2: Visualizations of the original and rewired Zachary's Karate Club network. Panels: (a) Negative, (b) Original, (c) Positive. These have been rewired to obtain global neighbor assortativity (a) -0.4021, (b) -0.098 (original network) and (c) 0.4068.
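The rewiring step described in this section can be sketched as a greedy edge swap. The code below is a minimal sketch under stated assumptions, not the thesis implementation: the names `rewire_2k` and `degree_pairs`, the greedy accept rule, and the `score` callback are all hypothetical. In practice `score` would return the global neighbor assortativity; the demo uses an arbitrary label-based score only to keep the example short. Swapping edges (a, b) and (c, d) into (a, d) and (c, b) when a and c have equal degree leaves both the degree sequence (1K) and the multiset of end-degree pairs (2K) unchanged.

```python
import random
from collections import Counter


def degree_pairs(edges, deg):
    """Multiset of sorted end-degree pairs, i.e. the empirical 2K structure."""
    return Counter(tuple(sorted((deg[u], deg[v]))) for u, v in edges)


def rewire_2k(edges, score, n_trials=2000, seed=0):
    """Greedy 2K-preserving rewiring that only accepts swaps increasing score(edges)."""
    rng = random.Random(seed)
    edges = [tuple(e) for e in edges]
    deg = Counter(x for e in edges for x in e)
    present = {frozenset(e) for e in edges}
    current = score(edges)
    for _ in range(n_trials):
        i, j = rng.sample(range(len(edges)), 2)
        # Pick a random orientation for each of the two edges.
        a, b = edges[i] if rng.random() < 0.5 else edges[i][::-1]
        c, d = edges[j] if rng.random() < 0.5 else edges[j][::-1]
        # Require equal degree at the matched ends and four distinct nodes.
        if deg[a] != deg[c] or len({a, b, c, d}) < 4:
            continue
        new1, new2 = frozenset((a, d)), frozenset((c, b))
        if new1 in present or new2 in present:
            continue  # keep the graph simple (no multi-edges)
        trial = list(edges)
        trial[i], trial[j] = (a, d), (c, b)
        s = score(trial)
        if s > current:
            present -= {frozenset(edges[i]), frozenset(edges[j])}
            present |= {new1, new2}
            edges, current = trial, s
    return edges


# Demo on an 8-node ring with an arbitrary stand-in score (an assumption).
ring = [(i, (i + 1) % 8) for i in range(8)]
toy_score = lambda es: sum(1 for u, v in es if (u + v) % 2 == 0)
new_edges = rewire_2k(ring, toy_score, n_trials=500, seed=1)
```

To push the score in the negative direction instead, one could pass the negated score function. A greedy rule can get stuck in local optima; an annealing-style accept rule is a common alternative.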
3.4 Neighbor assortativity in real-world networks

To further investigate neighbor assortativity, Figure 3.3 shows c(k), the neighbor assortativity aggregated over neighbor pairs of same-degree nodes, for various real-world networks. These networks represent a variety of social, biological, technological and other networks (see Appendix). The real-world networks usually have significant positive values of global neighbor assortativity. The lower-degree nodes usually have significant positive neighbor assortativity values. This is consistent with the neighbor-neighbor correlation values explored in the strong friendship paradox problem [Wu et al., 2017]. The assortativity has more significant positive values because the distribution of node degrees is usually long-tailed, and connections to hubs have a more dominant effect on the neighbor correlation than on the neighbor-neighbor correlation. Some peaks of positive neighbor assortativity exist in the middle degree classes for all networks.

Figure 3.3: The neighbor-neighbor degree correlations in (a) social and (b) non-social networks. Axes: degree k (horizontal, logarithmic) and neighbor assortativity (vertical). Panel (a) shows HepPh, Facebook, Digg, HepTh and GR; panel (b) shows WordNet, Google, HepPhCit and Reactome.

The Facebook network is an interesting case, as it has very negative values of neighbor assortativity. In most of the observations, the neighbor assortativities are positive. The variance formula Eq. (5.10) gives a theoretical lower bound on the correlation, $c(k)_{\min} = -\frac{1}{k-1}$. Neighbor assortativity can be negative because the baseline assumes that neighbors are independent random variables: if the neighbors vary less than under this fully random assumption, the neighbor correlation has a value smaller than 0.
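The lower bound quoted above also follows in one step from the variance decomposition of Eq. (3.1). The short derivation below is a sketch that assumes $\mathrm{Var}(k'_i) > 0$; it also shows why the bound is attained whenever the measured variance of the average neighbor degree vanishes, as it must when a degree class contains a single node.

```latex
% A variance cannot be negative, so Eq. (3.1) gives
\sigma^2_{3K}(k) = \frac{1}{k}\,\mathrm{Var}(k'_i)\,\bigl[1 + (k-1)\,c(k)\bigr] \ge 0
\quad\Longrightarrow\quad
c(k) \ge -\frac{1}{k-1}.

% If only one node has degree k, its average neighbor degree is a single number,
% so the variance across degree-k nodes is zero and the bound is attained:
\sigma^2_{3K}(k) = 0
\quad\Longrightarrow\quad
1 + (k-1)\,c(k) = 0
\quad\Longrightarrow\quad
c(k) = -\frac{1}{k-1}.
```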
Note that the lower bound is always reached if there is only one node in the network with that degree. Positive neighbor assortativity indicates that nodes with the same degree occupy different relative positions in the network; conversely, negative neighbor assortativity indicates that nodes with the same degree have similar neighborhoods.

3.5 Impact of neighbor assortativity

Neighbor-neighbor correlations were shown to affect the strong friendship paradox problem, amplifying it for the high-degree nodes while suppressing it in the low-degree nodes [Wu et al., 2017]. Neighbor assortativity also affects social contagion on networks. A popular model of social contagion is the Watts threshold model [Watts, 2002], which assumes that nodes can be activated when a large enough fraction of their neighbors is active. This fraction is specified by a threshold value between zero and one. If a node can be activated by a single active neighbor, then it is called vulnerable. Vulnerable nodes are those whose degree is lower than the inverse of their threshold. Several works have shown that the giant vulnerable component, with the largest number of connected vulnerable nodes, determines the size of the global cascades [Dodds and Payne, 2009, Payne et al., 2009]. To analyze the impact of neighbor assortativity on global cascades, we use the configuration model to generate a synthetic power-law network with exponent α = 2.4. Then we apply the same rewiring procedure as described in the Karate Club section. Since the configuration model does not have any specified higher-order structure, the neighbor assortativity is zero for the original network. The rewired networks have neighbor assortativities ranging up to 0.3, and all networks in this series share the same 2K distribution. We then plot the size of each 3K-rewired graph's giant vulnerable component as a function of the threshold in Fig. 3.4. In Fig.
3.4, neighbor assortative networks tend to be more vulnerable and can easily generate small cascades, but the cascade size grows slowly as the threshold is lowered. Conversely, neighbor disassortative networks are more robust to threshold cascades, but once cascading happens, a small decrease in the threshold can significantly magnify the cascade size. This result is consistent with the visualization of the Karate Club, as neighbor assortative networks have a large number of lower-degree nodes connected together without any link to hubs.

Figure 3.4: Cascading phenomena on a rewired power-law network with 10,000 nodes. The original network is constructed by the configuration model with assortativity value -0.17. Axes: threshold φ (horizontal) and giant vulnerable cluster size (vertical). Curves: Original, c = 0.1, c = 0.2, c = 0.3.

In this Chapter we have derived a property that captures the collective behavior of a node's neighbors as a function of its own degree. We name it neighbor assortativity because of its similarity to the assortativity introduced in earlier publications [Newman, 2002]. We also noticed that, under the constraint of the same 2K structure, neighbor assortative networks tend to have nodes that are farther apart on average. This phenomenon agrees with the neighbor-neighbor correlations that are critical to solving the strong friendship paradox problem [Wu et al., 2017]. It is also possible to extend the definition of neighbor assortativity to the correlation between attributes of neighbors. This measure will also tend to split the network into separate clusters when neighbors are similar.

The effect of neighbor assortativity on cascading phenomena is also significant. Neighbor assortativity has an impact on both the critical threshold and the sensitivity of the cascade size to the threshold.
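The quantity plotted in Fig. 3.4 can be sketched in a few lines. The function below is a minimal illustration, not the thesis code: the name `giant_vulnerable_fraction`, the dictionary-of-sets graph, and the toy graph are assumptions. It also assumes a uniform threshold φ and treats a node of degree k as vulnerable when k·φ ≤ 1, which is one common boundary convention for "a single active neighbor suffices".

```python
from collections import deque


def giant_vulnerable_fraction(adj, phi):
    """Fraction of all nodes in the largest connected cluster of vulnerable nodes.

    adj: {node: set(neighbors)} for an undirected graph
    phi: uniform activation threshold (assumed the same for every node)
    """
    # Assumed convention: a node is vulnerable if one active neighbor out of k
    # already meets the threshold, i.e. k * phi <= 1.
    vulnerable = {v for v, nbrs in adj.items() if nbrs and len(nbrs) * phi <= 1}
    seen, best = set(), 0
    for start in vulnerable:
        if start in seen:
            continue
        seen.add(start)
        queue, size = deque([start]), 0
        while queue:  # breadth-first search restricted to vulnerable nodes
            v = queue.popleft()
            size += 1
            for u in adj[v]:
                if u in vulnerable and u not in seen:
                    seen.add(u)
                    queue.append(u)
        best = max(best, size)
    return best / len(adj)


# Toy graph (an assumption): a path 0-1-2-3-4 plus a hub 5 linked to every path node.
toy = {
    0: {1, 5}, 1: {0, 2, 5}, 2: {1, 3, 5},
    3: {2, 4, 5}, 4: {3, 5}, 5: {0, 1, 2, 3, 4},
}
sizes = {phi: giant_vulnerable_fraction(toy, phi) for phi in (0.4, 0.3, 0.15)}
```

Sweeping `phi` over a grid and repeating the computation for each rewired network would give curves of the kind shown in Fig. 3.4.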
Chapter 4 The "Majority Illusion"

4.1 Problem setup

In this Chapter, we describe a novel variation of the friendship paradox that is essential for understanding social contagion. The paradox applies to networks in which individuals have attributes, in the simplest case a binary attribute, such as "has red hair" vs "does not have red hair," "purchased an iPhone" vs "did not purchase an iPhone," "Democrat" vs "Republican." We refer to individuals with this attribute as "active", and the rest as "inactive." We show that under some conditions many individuals will observe a majority of their neighbors in the active state, even when it is globally rare. For example, though few people have red hair, many may observe that a majority of their friends are red-headed. For this reason, we call this effect the "majority illusion."

As a simple illustration of the "majority illusion" paradox, consider the two networks in Fig 4.1. The networks are identical, except for which of the few nodes are colored. Imagine that colored nodes are active and the rest of the nodes are inactive. Despite this apparently small difference, the two networks are profoundly different: in the first network, every inactive node will examine its neighbors to observe that "at least half of my neighbors are active," while in the second network no node will make this observation. Thus, even though only three of the 14 nodes are active, it appears to all the inactive nodes in the first network that most of their neighbors are active.

Figure 4.1: An illustration of the "majority illusion" paradox. The two networks are identical, except for which three nodes are colored. These are the "active" nodes and the rest are "inactive." In the network on the left, all "inactive" nodes observe that at least half of their neighbors are "active," while in the network on the right, no "inactive" node makes this observation.
The "majority illusion" can dramatically impact collective phenomena in networks, including social contagions. One of the more popular models describing the spread of social contagions is the threshold model [Granovetter, 1978, Watts, 2002, Centola et al., 2007]. At each time step in this model, an inactive individual observes the current states of its k neighbors, and becomes active if more than φk of the neighbors are active; otherwise, it remains inactive. The fraction 0 ≤ φ ≤ 1 is the activation threshold. It represents the amount of social proof an individual requires before switching to the active state [Granovetter, 1978]. A threshold of φ = 0.5 means that to become active, an individual has to have a majority of neighbors in the active state. Though the two networks in Fig 4.1 have the same topology, when the threshold is φ = 0.5, all nodes will eventually become active in the network on the left, but not in the network on the right. This is because the "majority illusion" alters local neighborhoods of the nodes, distorting their observations of the prevalence of the active state. Thus, the "majority illusion" provides an alternate mechanism for social perception biases. For example, if heavy drinkers also happen to be more popular (they are the red nodes in the figure above), then, while most people drink little at parties, many people will examine their friends' alcohol use to observe a majority drinking heavily. This may explain why adolescents overestimate their peers' alcohol consumption and drug use [Prentice and Miller, 1993, Baer et al., 1991, Berkowitz, 2005].

The magnitude of the "majority illusion" paradox, which we define as the fraction of nodes more than half of whose neighbors are active, depends on structural properties of the network and the distribution of active nodes.
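The two definitions just given, the threshold dynamics and the magnitude of the illusion, can be sketched as follows. This is a minimal illustration on a hypothetical toy graph (two mutually linked hubs, each connected to six leaves), not the network of Fig 4.1; the function names and the synchronous update rule are assumptions made for the example.

```python
def majority_illusion(adj, active, phi=0.5):
    """Fraction of all nodes with more than phi * degree active neighbors."""
    hits = sum(
        1
        for nbrs in adj.values()
        if nbrs and sum(u in active for u in nbrs) > phi * len(nbrs)
    )
    return hits / len(adj)


def threshold_cascade(adj, seeds, phi=0.5):
    """Run the threshold model with synchronous updates until no node changes."""
    active = set(seeds)
    while True:
        newly = {
            v
            for v, nbrs in adj.items()
            if v not in active
            and nbrs
            and sum(u in active for u in nbrs) > phi * len(nbrs)
        }
        if not newly:
            return active
        active |= newly


# Hypothetical toy graph: hubs 0 and 1 are linked to each other and to leaves 2..7.
leaves = range(2, 8)
adj = {0: {1, *leaves}, 1: {0, *leaves}}
adj.update({leaf: {0, 1} for leaf in leaves})

hubs_active = majority_illusion(adj, {0, 1})    # both hubs active
leaves_active = majority_illusion(adj, {2, 3})  # two leaves active
final_from_hubs = threshold_cascade(adj, {0, 1})
final_from_leaves = threshold_cascade(adj, {2, 3})
```

With the same number of active nodes, activating the hubs places every leaf in the illusion and the cascade reaches the whole graph, while activating two leaves changes nothing.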
As shown below, network configurations that exacerbate the paradox include those in which low-degree nodes tend to connect to high-degree nodes (i.e., networks are disassortative by degree). Activating the high-degree nodes in such networks biases the local observations of many nodes, which in turn impacts collective phenomena emerging in networks, including social contagions and social perceptions. We develop a statistical model that quantifies the strength of this effect in any network and evaluate the model using synthetic networks. These networks allow us to systematically investigate how network structure and the distribution of active nodes affect observations of individual nodes. We also show that the structure of many real-world networks creates conditions for the "majority illusion" paradox.

4.2 Impact of degree-attribute correlations

Heterogeneous degree distribution also contributes to nodes perceiving that their neighbors have more of some attribute than they themselves have, which is referred to as the generalized friendship paradox [Eom and Jo, 2014]. Let's consider again a network where nodes have a binary attribute x. For convenience, we will refer to nodes with the attribute

$$x = \begin{cases} 1 & \text{as active} \\ 0 & \text{as inactive} \end{cases} \tag{4.1}$$

The probability that a random node is active is $P(x=1) = \sum_k P(x=1 \mid k)\,p(k)$, and the probability that a random neighbor is active is $Q(x=1) = \sum_k P(x=1 \mid k)\,q(k)$. The joint distribution of x and k can be used to compute $\rho_{kx}$, the correlation between node degrees and attributes:

$$\rho_{kx} \equiv \frac{1}{\sigma_x \sigma_k} \sum_{x,k} x\,k\,\left[P(x,k) - P(x)\,p(k)\right] = \frac{1}{\sigma_x \sigma_k} \sum_{k} k\,\left[P(x=1,k) - P(x=1)\,p(k)\right] = \frac{P(x=1)}{\sigma_x \sigma_k}\left[\langle k \rangle_{x=1} - \langle k \rangle\right]. \tag{4.2}$$

In the equations above, $\sigma_k$ and $\sigma_x$ are the standard deviations of the degree and attribute distributions respectively, and $\langle k \rangle_{x=1}$ is the average degree of active nodes.
Using Bayes' rule, the probability that a random neighbor is active can be rewritten as

$$Q(x=1) = \sum_k \frac{P(x=1,k)}{p(k)}\,\frac{k\,p(k)}{\langle k \rangle} = \sum_k \frac{P(x=1,k)}{P(x=1)}\,\frac{k\,P(x=1)}{\langle k \rangle} = \frac{P(x=1)}{\langle k \rangle} \sum_k k\,P(k \mid x=1) = P(x=1)\,\frac{\langle k \rangle_{x=1}}{\langle k \rangle},$$

where $\langle k \rangle_{x=1}$ is the average degree of active nodes. This quantity and the average degree $\langle k \rangle$ are related via the correlation coefficient (Eq. (4.2)). Hence, the strength of the generalized friendship paradox is

$$Q(x=1) - P(x=1) = \rho_{kx}\,\frac{\sigma_x \sigma_k}{\langle k \rangle},$$

which is positive when node degree and attribute are positively correlated ($\rho_{kx} > 0$) and increases with this correlation [Eom and Jo, 2014].

Like the friendship paradox, the "majority illusion" is rooted in differences between degrees of nodes and their neighbors [Feld, 1991, Gupta et al., 2015]. These differences result in nodes observing that, not only are their neighbors better connected [Feld, 1991] on average, but that they also have more of some attribute than they themselves have [Hodas et al., 2013]. The latter paradox, which is referred to as the generalized friendship paradox, is enhanced by correlations between node degrees and attribute values $\rho_{kx}$ [Eom and Jo, 2014]. In binary attribute networks, where nodes can be either active or inactive, a configuration in which higher-degree nodes tend to be active causes the remaining nodes to observe more of their neighbors to be active.

4.3 Impact of degree assortativity

While heterogeneous degree distribution and degree-attribute correlations give rise to friendship paradoxes even in random networks, other elements of network structure, such as degree assortativity r [Jo and Eom, 2014], may also affect observations nodes make of their neighbors. To understand why, we need a more detailed model of network structure that includes the correlation between degrees of connected nodes $e(k,k')$. Consider a node with degree k that has a neighbor with degree $k'$ and attribute $x'$.
The probability that the neighbor is active is:

$$P(x'=1 \mid k) = \sum_{k'} P(x'=1 \mid k')\,P(k' \mid k) = \sum_{k'} P(x'=1 \mid k')\,\frac{e(k,k')}{q(k)}. \tag{4.3}$$

In the equation above, $e(k,k')$ is the joint degree distribution. Globally, the probability that any node has an active neighbor is

$$P(x'=1) = \sum_k P(x'=1 \mid k)\,p(k) = \sum_k \sum_{k'} P(x'=1 \mid k')\,\frac{e(k,k')}{q(k)}\,p(k) = \sum_k \sum_{k'} \frac{P(x'=1,k')}{p(k')}\,e(k,k')\,\frac{\langle k \rangle}{k} = \sum_{k'} \frac{P(x'=1,k')}{q(k')} \sum_k \frac{k'}{k}\,e(k,k').$$

Given two networks with the same degree distribution p(k), their neighbor degree distribution q(k) will be the same even when they have different degree correlations $e(k,k')$. For the same configuration of active nodes, the probability that a node in each network observes an active neighbor $P(x'=1)$ is a function of $\sum_{k,k'} (k'/k)\,e(k,k')$. Since degree assortativity r is a function of $\sum_{k,k'} k\,k'\,e(k,k')$, the two expressions weigh the $e(k,k')$ term in opposite ways. This suggests that the probability of having an active neighbor increases as degree assortativity decreases and vice versa. Thus, we expect stronger paradoxes in disassortative networks.

4.4 Binomial model

To quantify the "majority illusion" paradox, we calculate the probability that a node of degree k has more than a fraction φ of active neighbors, i.e., neighbors with attribute value $x' = 1$:

$$P_{>\phi}(k) = \sum_{n > \phi k}^{k} \binom{k}{n}\,P(x'=1 \mid k)^{n}\,\left[1 - P(x'=1 \mid k)\right]^{k-n}. \tag{4.4}$$

Here $P(x'=1 \mid k)$ is the conditional probability of having an active neighbor, given the node has degree k, and is specified by Eq. (4.3). Although the threshold φ in Eq. (4.4) could be any fraction, in this Chapter we focus on $\phi = \frac{1}{2}$, which represents a straight majority. Thus, the fraction of all nodes most of whose neighbors are active is

$$P_{>\frac{1}{2}} = \sum_k p(k) \sum_{n > \frac{k}{2}}^{k} \binom{k}{n}\,P(x'=1 \mid k)^{n}\,\left[1 - P(x'=1 \mid k)\right]^{k-n}. \tag{4.5}$$

Using Eq.
(4.5), we can calculate the strength of the "majority illusion" paradox for any network whose degree sequence, joint degree distribution $e(k,k')$, and conditional attribute distribution $P(x \mid k)$ are known. The solid lines in Figs 4.3-4.5 report these calculations for each network. The conditional probability $P(x=1 \mid k) = P(x'=1 \mid k')$ required to calculate the strength of the "majority illusion" using Eq. (4.5) can be specified analytically only for networks with "well-behaved" degree distributions, such as scale-free distributions of the form $p(k) \sim k^{-\alpha}$ with α > 3 or the Poisson distributions of the Erdős-Rényi random graphs with near-zero degree assortativity. For other networks, including the real-world networks with a more heterogeneous degree distribution, we use the empirically determined joint probability distribution $P(x,k)$ to calculate both $P(x=1 \mid k)$ and $\rho_{kx}$. For the Poisson-like degree distributions, the probability $P(x'=1 \mid k')$ can be determined by approximating the joint distribution $P(x',k')$ as a multivariate normal distribution:

$$\langle P(x' \mid k') \rangle = \langle P(x') \rangle + \rho_{kx}\,\frac{\sigma_x}{\sigma_k}\,(k' - \langle k \rangle),$$

resulting in

$$P(x'=1 \mid k') = \langle x \rangle + \rho_{kx}\,\frac{\sigma_x}{\sigma_k}\,(k' - \langle k \rangle).$$

Fig 4.2 reports the "majority illusion" in the same synthetic scale-free networks as Fig 4.3, but with theoretical lines (dashed lines) calculated using the Gaussian approximation for estimating $P(x'=1 \mid k')$. The Gaussian approximation fits results quite well for the network with degree distribution exponent α = 3.1. However, the theoretical estimate deviates significantly from data in a network with a heavier-tailed degree distribution with exponent α = 2.1. The approximation also deviates from the actual values when the network is strongly assortative or disassortative by degree.

4.5 Numerical results in synthetic networks

Synthetic networks allow us to systematically study how network structure affects the strength of the "majority illusion" paradox.
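Before turning to the simulations, the binomial model of Section 4.4 (Eqs. (4.3) to (4.5)) can be evaluated numerically with a short routine. The sketch below assumes the three inputs are given as plain dictionaries: `p[k]` for the degree distribution, `e[(k, k2)]` for the symmetric joint degree distribution of edge ends, and `px1[k]` for P(x = 1 | k). The function names and the two toy inputs are assumptions made for this illustration.

```python
from math import comb, floor


def prob_active_neighbor(k, e, px1):
    """Eq. (4.3): P(x'=1 | k) = sum over k' of P(x'=1 | k') * e(k, k') / q(k)."""
    row = {k2: w for (k1, k2), w in e.items() if k1 == k}
    q_k = sum(row.values())  # q(k) is the marginal of e(k, k')
    return sum(px1[k2] * w for k2, w in row.items()) / q_k


def majority_probability(p, e, px1, phi=0.5):
    """Eq. (4.5) for a general threshold phi: fraction of nodes with more than
    phi * k active neighbors under the binomial (independent neighbor) model."""
    total = 0.0
    for k, p_k in p.items():
        a = prob_active_neighbor(k, e, px1)
        n_min = floor(phi * k) + 1  # smallest integer n with n > phi * k
        tail = sum(
            comb(k, n) * a ** n * (1 - a) ** (k - n) for n in range(n_min, k + 1)
        )
        total += p_k * tail
    return total


# Toy input 1 (an assumption): a star with one hub of degree 4 and four leaves,
# where only the hub is active.
star = majority_probability(
    p={1: 0.8, 4: 0.2},
    e={(1, 4): 0.5, (4, 1): 0.5},
    px1={1: 0.0, 4: 1.0},
)

# Toy input 2 (an assumption): a 3-regular network with 20% of nodes active.
regular = majority_probability(p={3: 1.0}, e={(3, 3): 1.0}, px1={3: 0.2})
```

In the star example the model returns 0.8: every leaf sees an active majority although only one node in five is active. Because the model treats neighbors as independent draws, it ignores the neighbor-neighbor correlations discussed in Chapter 3.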
First, we looked at networks with a highly heterogeneous degree distribution, which contain a few high-degree hubs and many low-degree nodes. Such networks are usually modeled with a scale-free degree distribution of the form $p(k) \sim k^{-\alpha}$. To create a heterogeneous network, we first sampled a degree sequence from a distribution with exponent α, where the exponent α took three different values (2.1, 2.4, and 3.1), and then used the configuration model to create an undirected network with N = 10,000 nodes and that degree sequence. We used the edge rewiring procedure described above to create a series of networks that have the same degree distribution p(k) but different values of degree assortativity r. Then, we activated a fraction P(x = 1) = 0.05 of nodes and used the attribute swapping procedure to achieve different values of degree-attribute correlation $\rho_{kx}$.

Figure 4.2: Gaussian approximation. Panels (a) to (c): power-law networks with exponent α (α = 2.1, 2.4, and 3.1). Panels (d) to (f): Erdős-Rényi random networks. Axes: k-x correlation (horizontal) and probability of majority, $P_{>1/2}$ (vertical). Symbols show the empirically determined fraction of nodes in the paradox regime, while dashed lines show theoretical estimates using the Gaussian approximation.

Fig 4.3 shows the fraction of nodes with more than half of active neighbors in these scale-free networks as a function of the degree-attribute correlation $\rho_{kx}$.
The fraction of nodes experiencing the "majority illusion" can be quite large. For α = 2.1, 60%-80% of the nodes will observe that more than half of their neighbors are active, even though only 5% of the nodes are, in fact, active. The "majority illusion" is exacerbated by three factors: it becomes stronger as the degree-attribute correlation increases, and as the network becomes more disassortative (i.e., r decreases) and heavier-tailed (i.e., α becomes smaller). However, even when α = 3.1, under some conditions a substantial fraction of nodes will experience the paradox. The lines in the figure show theoretical estimates of the paradox using Eq. (4.5), as described in Section 4.4.

Figure 4.3: "Majority illusion" in scale-free networks. Plots show the magnitude of the illusion in scale-free networks as a function of degree-attribute correlation $\rho_{kx}$ and for different values of degree assortativity r. Panels: (a) α = 2.1, (b) α = 2.4, (c) α = 3.1. Axes: k-x correlation (horizontal) and probability of majority, $P_{>1/2}$ (vertical). Each network has 10,000 nodes and degree distribution of the form $p(k) \sim k^{-\alpha}$. The fraction of active nodes in all cases is 5%. The lines represent calculations using the statistical model of Eq. (4.5).

The "majority illusion" can also be observed in networks with a more homogeneous, e.g., Poisson, degree distribution. We used the Erdős-Rényi model to generate networks with N = 10,000 and average degrees $\langle k \rangle = 5.2$ and $\langle k \rangle = 2.5$. We randomly activated 5%, 10%, and 20% of the nodes, and used edge rewiring and attribute swapping to change r and $\rho_{kx}$ in these networks. Fig 4.4 shows the fraction of nodes in the paradox regime.
Though much reduced compared to scale-free networks, we still observe some amount of the paradox, especially in networks with a greater fraction of active nodes.

Figure 4.4: "Majority illusion" in random networks. Magnitude of the illusion in Erdős-Rényi-type random networks as a function of degree-attribute correlation $\rho_{kx}$ and for different values of degree assortativity r. Panels: (a) to (c) $\langle k \rangle = 5.2$ with P(x = 1) = 0.05, 0.10, 0.20; (d) to (f) $\langle k \rangle = 2.5$ with P(x = 1) = 0.05, 0.10, 0.20. Axes: k-x correlation (horizontal) and probability of majority, $P_{>1/2}$ (vertical). Each network has 10,000 nodes with $\langle k \rangle = 5.2$ (top row) or $\langle k \rangle = 2.5$ (bottom row), and different fractions of active nodes. The lines represent calculations using the statistical model of Eq. (4.5).

4.6 Numerical results in real-world networks

We also examined whether the "majority illusion" can be manifested in real-world networks. We looked at six different networks: the co-authorship network of high energy physicists (HepTh) [Leskovec et al., 2007], a protein-protein interaction network (Reactome) [Joshi-Tope et al., 2005], social media follower graphs (Digg [Hogg and Lerman, 2012] and Twitter [Smith et al., 2013]), the Enron email network [Klimt and Yang, 2004], and the network representing links between political blogs (blogs) [Adamic and Glance, 2005].
All six networks are undirected. To make the Digg and Twitter follower graphs undirected, we kept only the mutual follow links, and further reduced the graph by extracting the largest connected component. For the Enron email network, we removed duplicate email communication links between users. The degree assortativity of these networks spans a broad range, from r = 0.27 (HepTh) to r = -0.22 (political blogs).

Fig 4.5 shows the fraction of nodes experiencing the "majority illusion" for different fractions of active nodes P(x = 1) = 0.05, 0.1, 0.2 and 0.3. As the degree-attribute correlation $\rho_{kx}$ increases (using the attribute swapping procedure), a substantial fraction of nodes experience the paradox in almost all networks. The effect is larger in the disassortative political blogs, Twitter and Enron email networks, where for high enough correlation, as many as 60%-70% of nodes have more than half of their neighbors in the active state, even though only 20% of the nodes are active. The effect also exists in the Digg network of mutual followers, and to a lesser degree in the HepTh co-authorship and Reactome protein interaction networks. Although positive degree assortativity reduces the magnitude of the effect, compared with synthetic networks, local perceptions of nodes in real-world networks can also be substantially skewed. If the attribute represents an opinion, under some conditions, even a minority opinion can appear to be extremely popular locally.

Overall, our statistical model that uses the empirically determined joint distribution $P(x,k)$ does a good job explaining most observations. However, although the global degree assortativity r is an important contributor to the "majority illusion," a more detailed view of the structure using the joint degree distribution $e(k,k')$ is necessary to accurately estimate the magnitude of the paradox. In fact, many different structures are possible with an identical degree sequence and r. These structural differences may affect the "majority illusion."
Here, we report some comparisons of selected degree sequences in scale-free synthetic networks. As demonstrated in Fig. 4.6, two networks with the same p(k) and r (but different degree correlation matrices $e(k,k')$) can display different amounts of the paradox.

Figure 4.5: "Majority illusion" in real-world networks. Magnitude of the illusion in real-world networks as a function of degree-attribute correlation $\rho_{kx}$ for different fractions of active nodes P(x = 1). Panels: (a) HepTh, (b) Reactome, (c) Digg, (d) Enron email, (e) Twitter, (f) Political blogs. Axes: k-x correlation (horizontal) and probability of majority, $P_{>1/2}$ (vertical). The lines represent calculations using the statistical model of Eq. (4.5). The plots are arranged in order of degree assortativity, from highest (a) to lowest (f). Blue circles correspond to the fraction of active nodes P(x = 1) = 0.3, red triangles to P(x = 1) = 0.2, green squares to P(x = 1) = 0.1, and black crosses to P(x = 1) = 0.05.

Figure 4.6: Structural differences. Strength of the majority illusion in synthetic networks with identical degree sequence and assortativity, but with higher-order structural differences. Panels: (a) α = 2.1, (b) α = 2.4, (c) α = 3.1. Axes: k-x correlation (horizontal) and probability of majority, $P_{>1/2}$ (vertical).
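The degree assortativity r that is held fixed in Fig. 4.6 can be computed from an edge list as the Pearson correlation between the degrees found at the two ends of an edge, in the sense of [Newman, 2002]. The sketch below is a minimal illustration; the function name `degree_assortativity` and the two toy graphs are assumptions made for this example.

```python
from collections import Counter


def degree_assortativity(edges):
    """Pearson correlation of the degrees at the two ends of an edge.

    Each undirected edge is counted in both directions so that the
    measure is symmetric. Returns NaN for regular graphs, where the
    variance of end degrees is zero and r is undefined.
    """
    deg = Counter(x for e in edges for x in e)
    pairs = [(deg[u], deg[v]) for u, v in edges]
    pairs += [(deg[v], deg[u]) for u, v in edges]
    n = len(pairs)
    mean = sum(x for x, _ in pairs) / n
    cov = sum(x * y for x, y in pairs) / n - mean ** 2
    var = sum(x * x for x, _ in pairs) / n - mean ** 2
    return cov / var if var > 0 else float("nan")


# Toy graphs (assumptions): a 4-node path and a star with four leaves.
r_path = degree_assortativity([(0, 1), (1, 2), (2, 3)])
r_star = degree_assortativity([(0, 1), (0, 2), (0, 3), (0, 4)])
```

A star is maximally disassortative because every edge joins the hub to a leaf. Two networks can share this single number and still differ in the full matrix e(k, k'), which is the point made by Fig. 4.6.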
We generated scale-free networks with a given degree sequence and assortativity. We then used edge rewiring to change the network's structure while keeping the degree assortativity $r$ (and degree sequence) the same. Existing structural constraints restrict the range of degree assortativity values that a given degree sequence may attain. Thus there are fewer choices for extreme values of degree assortativity. Fig. 4.6 reports the fraction of nodes in the paradox regime in these networks. Results do not change much in cases where structural constraints prevent varying the structure while keeping degree assortativity fixed. On the other hand, the fraction of nodes in the paradox regime can vary somewhat in the mid-assortativity range.

4.7 Discussion

The paradox described in this Chapter provides an alternate network-based mechanism for biases in social perceptions. We showed that under some conditions, individuals will grossly overestimate the prevalence of some attribute, making it appear more popular than it is. We quantified this paradox, which we call the "majority illusion", and studied its dependence on network structure and attribute configuration. As in the friendship paradox [Feld, 1991, Hodas et al., 2013, Eom and Jo, 2014, Kooti et al., 2014], the "majority illusion" can ultimately be traced to the power of high degree nodes to skew the observations of many others. This is because such nodes are overrepresented in the local neighborhoods of other nodes. This, by itself, is not surprising, given that high degree nodes are expected to have more influence and are often targeted by influence maximization algorithms [Kempe et al., 2003]. However, the ability of high degree nodes to bias the observations of others depends on other aspects of network structure. Specifically, we showed that the paradox is much stronger in disassortative networks, where high degree nodes tend to link to low degree nodes.
In other words, given the same degree distribution, the high degree nodes in a disassortative network will have greater power to skew the observations of others than those in an assortative network. This suggests that some network structures are more susceptible than others to influence manipulation and the spread of external shocks [Watts, 2002]. Furthermore, small changes in network topology, degree assortativity and degree-attribute correlation may further exacerbate the paradox even when there are no actual changes in the distribution of the attribute. This may explain the apparently sudden shifts in public attitudes witnessed during the Arab Spring and on the question of gay marriage.

The "majority illusion" is an example of the class size bias effect. When sampling data to estimate average class or event size, more popular classes and events will be over-represented in the sample, biasing estimates of their average size [Feld and Grofman, 1977]. Thus, the average class size that students experience at college is larger than the college's average class size. Similarly, people experience highways, restaurants, and events to be more crowded than they normally are. In networks, sampling bias affects estimates of network structure, including its degree distribution [Achlioptas et al., 2005, Gupta et al., 2015]. Our work suggests that network bias also affects an individual's local perceptions. Further work is required to understand how this bias impacts the dynamics of collective social phenomena.

Chapter 5 Strong Friendship Paradox

5.1 Problem setup

Feld's friendship paradox states that the number of connections, or degree, of a node is, on average, smaller than the average of its neighbors' degrees [Feld, 1991]. Recently, more subtle forms of the paradox have been proposed. The strong friendship paradox [Kooti et al., 2014] states that the degree of a node tends to be smaller than the median of its neighbors' degrees.
Roughly speaking, this is equivalent to the node having fewer neighbors than do a majority of its neighbors. But unlike the original friendship paradox and some recent generalizations [Hodas et al., 2013, Eom and Jo, 2014, Jo and Eom, 2014, Cao and Ross, 2016], the strong friendship paradox does not arise as a straightforward result of sampling from skewed distributions [Kooti et al., 2014]. Physical systems whose dynamics are governed by majority rule, from Ising spin interactions [Krapivsky and Redner, 2003] to more complex voting models [Liggett, 1999], may be affected by this paradox.

In this Chapter, we develop a stochastic model to predict the magnitude of the strong friendship paradox. Specifically, we show that: a) increasingly disassortative networks exhibit a larger paradox, and b) accurately modeling it requires considering degree correlations one step beyond those of neighboring nodes.

Given a network with degree distribution $p(k)$, we define the global probability of the strong friendship paradox as $P_{\mathrm{paradox}} = \sum_k p(k) f(k)$, where $f(k)$ is the probability that a randomly chosen node with degree $k$ experiences the paradox. Formally, we define

\[ f(k) \equiv P\big(\mathrm{Median}\{k'_1, \dots, k'_k\} > k \mid k\big) \approx P\big(k'_{(\lceil k/2 \rceil)} > k \mid k\big), \tag{5.1} \]

where $k'_i$ is the degree of the node's $i$th neighbor, and here we use the notation of order statistics, for example, $k'_{(1)} = \min(k'_1, \dots, k'_k)$, $k'_{(k)} = \max(k'_1, \dots, k'_k)$.

5.2 Effects of 2K structure

The strong friendship paradox depends only on the comparison between the degrees of a node and its neighbors. The probability $Q_>$ that a node sees a neighbor with degree larger than its own can be written as:

\[ Q_> = \sum_k \sum_{k' > k} P(k' \mid k)\, p(k) = \langle k \rangle \sum_k \sum_{k' > k} \frac{e(k,k')}{k}, \tag{5.2} \]

since the neighbor degree distribution of a degree-$k$ node is $P(k' \mid k) = e(k,k')/q(k)$. This expression uses information about the network's 2K structure, which is globally measured by the assortativity coefficient.
[Newman, 2002] In assortative networks ($r > 0$), nodes preferentially link to other nodes with similar degree, while in disassortative networks ($r < 0$), they prefer to link to others with dissimilar degree, e.g., high to low degree nodes. Since $k$ is in the numerator of the sum for $r$ but in the denominator of Eq. (5.2), given the normalization $\sum_{k,k'} e(k,k') = 1$, we may expect disassortativity to magnify the paradox in networks, and assortativity to suppress it. Previous numerical results for the conventional friendship paradox [Jo and Eom, 2014] support this prediction.

Given a randomly chosen node with degree $k$, define an indicator function $x_i$, $i = 1, \dots, k$, to track the degree of the node's $i$th neighbor:

\[ x_i = \mathbf{1}_{k'_i > k} = \begin{cases} 1 & \text{if } k'_i > k \\ 0 & \text{if } k'_i \le k \end{cases} \tag{5.3} \]

To a close approximation (and exactly, for odd $k$), the node is in the paradox regime if $\bar{x} \equiv \frac{1}{k} \sum_{i=1}^{k} x_i > \frac{1}{2}$. To understand how network structure affects the strong friendship paradox, we now examine $\mu_x(k)$, the probability that a neighbor (say the $i$th one) of a randomly chosen degree-$k$ node has degree greater than $k$:

\[ \mu_x(k) = P(x_i = 1 \mid k) = P(k'_i > k \mid k) = \sum_{k' > k} \frac{e(k,k')}{q(k)} \tag{5.4} \]

If we assume that degrees of neighbors are independent and identically distributed random variables, the probability for a degree-$k$ node to observe the strong friendship paradox is then given by the binomial distribution:

\[ f(k) = P\Big(\bar{x} > \frac{1}{2}\Big) = P\Big(\sum_{i=1}^{k} x_i > \frac{k}{2}\Big) = \sum_{i = \lceil (k+1)/2 \rceil}^{k} \binom{k}{i}\, \mu_x(k)^i\, [1 - \mu_x(k)]^{k-i}. \tag{5.5} \]

For large $k$, the distribution of $\bar{x}$ is close to Gaussian, with mean and variance

\[ \mu_x(k) = \sum_{k' > k} \frac{e(k,k')}{q(k)}, \qquad \sigma_x^2(k) = \frac{1}{k}\, \mu_x(k)\,[1 - \mu_x(k)]. \tag{5.6} \]

In terms of the normal distribution's cumulative distribution function $\Phi(\cdot)$,

\[ f(k) \approx 1 - \Phi\Big(\frac{\frac{1}{2} - \mu_x(k)}{\sigma_x(k)}\Big) = 1 - \Phi\Big(\frac{1 - 2\mu_x(k)}{2\sqrt{\mu_x(k)[1 - \mu_x(k)]}}\, \sqrt{k}\Big). \tag{5.7} \]

5.3 Numerical results

To demonstrate how assortativity modifies the strong friendship paradox, we consider a network with $e(k,k')$ that has a bivariate log-normal distribution, a long-tailed distribution defined on the positive domain of $k$, with equal means $m$, equal variances $s^2$, and correlation coefficient $c$. We can then express $\mu_x(k)$ analytically as

\[ \begin{aligned} \mu_x(k) &= P(k' > k \mid k) = P(\log k' > \log k \mid k) \\ &= 1 - \Phi\Big(\frac{\log k - E(\log k' \mid \log k)}{\sqrt{\mathrm{Var}(\log k' \mid \log k)}}\Big) \\ &= 1 - \Phi\Big(\frac{(1-c)(\log k - m)}{\sqrt{(1-c^2)\,s^2}}\Big) = 1 - \Phi\Big(\frac{\log k - m}{s} \sqrt{\frac{1-c}{1+c}}\Big). \end{aligned} \tag{5.8} \]

It follows that $f(k)$ decreases with $k$. As the network becomes more disassortative ($c < 0$), $f(k)$ undergoes an increasingly sharp transition from 1 to 0 around $k = e^m$ (Fig. 5.1(a)). Given that most nodes have low degree, this leads to a globally stronger paradox in more disassortative networks (Fig. 5.1(b)), consistent with our prediction.

The structure of real-world networks creates conditions for the paradox. Table 5.1 reports the observed fraction of nodes in these networks who see a majority of their neighbors with a larger degree. This fraction is very large in all networks, ranging from 75% to 90%.

[Figure 5.1: plots not reproduced. Panel (a): paradox probability $f(k)$ versus degree $k$, for $c = -0.75$ ($r = -0.183$), $c = -0.25$ ($r = -0.086$), $c = 0.25$ ($r = 0.127$), and $c = 0.75$ ($r = 0.591$). Panel (b): global probability $P_{\mathrm{paradox}}$ versus assortativity $r$.]

Figure 5.1: Impact of assortativity on strong friendship paradox. The matrix $e(k,k')$ has bivariate log-normal distribution with parameters $m = 2.5$, $s = 1.25$; $c$ ranges from $-0.75$ to $0.75$, which corresponds to assortativity $r$ in the range $-0.18$ to $0.6$.
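As a numerical illustration of Eqs. (5.5), (5.7), and (5.8), the following sketch evaluates the 2K paradox probability for the bivariate log-normal model. All function names are our own and purely illustrative. The last helper converts $c$ into $r$ using the standard correlation formula for a bivariate log-normal with equal variances (the expression stated later as Eq. (7.1)); it should roughly reproduce the $(c, r)$ pairs listed in the legend of Fig. 5.1.

```python
from math import comb, erf, exp, log, sqrt


def normal_cdf(z: float) -> float:
    """Standard normal cumulative distribution function, Phi(z)."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def mu_lognormal(k: float, m: float, s: float, c: float) -> float:
    """Eq. (5.8): probability that a neighbor of a degree-k node has degree > k.

    Requires -1 < c < 1 so that the conditional variance is positive.
    """
    return 1.0 - normal_cdf((log(k) - m) / s * sqrt((1.0 - c) / (1.0 + c)))


def f_binomial(k: int, mu: float) -> float:
    """Eq. (5.5): exact binomial probability that more than k/2 neighbors
    have larger degree, assuming independent neighbors."""
    first = k // 2 + 1  # equals ceil((k + 1) / 2) for integer k
    return sum(comb(k, i) * mu**i * (1.0 - mu) ** (k - i) for i in range(first, k + 1))


def f_normal(k: int, mu: float) -> float:
    """Eq. (5.7): Gaussian approximation to Eq. (5.5)."""
    if mu <= 0.0:
        return 0.0
    if mu >= 1.0:
        return 1.0
    z = (1.0 - 2.0 * mu) / (2.0 * sqrt(mu * (1.0 - mu))) * sqrt(k)
    return 1.0 - normal_cdf(z)


def assortativity_from_c(c: float, s: float) -> float:
    """Pearson correlation of two log-normal variables with equal variance s^2
    and log-space correlation c."""
    return (exp(c * s * s) - 1.0) / (exp(s * s) - 1.0)


# Parameters taken from the caption of Fig. 5.1.
m, s = 2.5, 1.25
for c in (-0.75, 0.75):
    low = f_normal(5, mu_lognormal(5, m, s, c))     # degree below e^m
    high = f_normal(50, mu_lognormal(50, m, s, c))  # degree above e^m
    print(f"c={c:+.2f}  r={assortativity_from_c(c, s):+.3f}  f(5)={low:.3f}  f(50)={high:.3f}")
```

For a given $k$, `f_binomial` and `f_normal` can be compared directly to judge how quickly the Gaussian approximation becomes adequate.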
Network          Type       Observed   3K model   2K model
LiveJournal      Social     83.71%     84.43%     86.95%
Youtube          Social     89.94%     88.51%     90.34%
Skitter          Internet   88.62%     90.35%     95.79%
Google           Web        77.31%     78.25%     84.36%
ArXiv HEP        Citation   78.71%     79.67%     83.83%
English words    Semantic   75.23%     71.00%     71.05%

Table 5.1: Observed fraction of nodes in real-world networks that experience the strong friendship paradox, compared to predictions of the two proposed models.

Table 5.1 shows that the observed fractions of nodes experiencing the paradox are close to the global probabilities predicted by the 2K model, when $\mu_x(k)$ is set to the actual frequency with which a neighbor of a degree-$k$ node has larger degree. However, a breakdown by degree class reveals significant deviations. Fig. 5.2 plots the paradox probability $f(k)$ for a degree-$k$ node (blue dots). We define the degree at which the 2K estimate (Eq. (5.7)) of paradox probability is 0.5 as the critical degree $k_c$ of the network. By construction, $k_c = \mathrm{Median}(q(k))$. Nodes with degree $k < k_c$ are likely to experience the paradox, while those with $k > k_c$ are unlikely to do so.

[Figure 5.2: plots not reproduced. Panels: LiveJournal Communities, Youtube Communities, Skitter Internet Topology, Google Hyperlink, ArXiv HEP Citations, English Words. Horizontal axis: degree $k$. Vertical axis: probability of paradox.]

Figure 5.2: Probability of the strong friendship paradox in six real-world networks, comparing observed fraction of degree-$k$ nodes that are in the paradox regime (blue dots) to predictions of the 2K model (dotted red line) and the 3K model (solid red line).

The 2K model (dotted line) overestimates the paradox for low-degree nodes
and underestimates it for high-degree nodes. This suggests that the 2K model is insufficient, and we need to take into account structure beyond degree correlations of connected pairs of nodes.

5.4 Effects of 3K structure

If neighbor degrees are identically distributed but correlated random variables, Eq. (5.7) must be modified to represent a multivariate rather than a single binomial distribution. To deal with the correlation, we now consider a pair of neighbors, with degrees $k'_i$ and $k'_j$, of a single degree-$k$ node, and their indicator functions $x_i$ and $x_j$ as defined in Eq. (5.3). The corresponding multivariate normal approximation then gives

\[ f(k) = 1 - \Phi\Big(\frac{\frac{1}{2} - \mu_x(k)}{\sigma_x(k)}\Big), \tag{5.9} \]

where the variance $\sigma_x^2(k)$ is now

\[ \begin{aligned} \sigma_x^2(k) = \mathrm{Var}(\bar{x}) &= \mathrm{Var}\Big(\frac{1}{k} \sum_{i=1}^{k} x_i\Big) \\ &= \frac{1}{k^2} \Big[ \sum_{i=1}^{k} \mathrm{Var}(x_i) + 2 \sum_{i=1}^{k-1} \sum_{j=i+1}^{k} \mathrm{Cov}(x_i, x_j) \Big] \\ &= \frac{1}{k^2} \big[ k\, \mathrm{Var}(x_i) + k(k-1)\, \mathrm{Cov}(x_i, x_j) \big] \\ &= \frac{1}{k}\, \mu_x(k)\,[1 - \mu_x(k)] + \frac{k-1}{k}\, \mathrm{Cov}(x_i, x_j). \end{aligned} \tag{5.10} \]

Unlike in Eq. (5.7), where $f(k)$ is completely determined by $\mu_x(k)$, the 3K model requires the covariance term to be specified. Using values determined empirically from real-world networks as in the 2K model, we obtain very accurate paradox probability estimates (solid line in Fig. 5.2). These estimates also improve on the global 2K results shown in Table 5.1 for all cases except Youtube and English words, where the two estimates are nearly identical due to their close agreement for low degree values that represent a large fraction of nodes in the network.

To understand the effect of the covariance term, consider the 3K distribution $t(k'_i, k, k'_j)$, the joint degree distribution of a connected ordered triplet of nodes with degrees $(k'_i, k, k'_j)$. Conditioning on the degree $k$ of the focal node gives the joint degree distribution of its two neighbors:

\[ P(k'_i, k'_j \mid k) = P(k'_i, k, k'_j \mid k) = \frac{t(k'_i, k, k'_j)}{g(k)}, \qquad g(k) = \sum_{k'_i, k'_j} t(k'_i, k, k'_j). \tag{5.11} \]

The indicator function covariance term in Eq.
(5.10) is

\[ \mathrm{Cov}(x_i, x_j) = P(x_i = 1, x_j = 1 \mid k) - P(x_i = 1 \mid k)^2 = P(k'_i > k, k'_j > k \mid k) - P(k'_i > k \mid k)^2, \tag{5.12} \]

where

\[ P(k'_i > k, k'_j > k \mid k) = \frac{1}{g(k)} \sum_{k'_i > k,\, k'_j > k} t(k'_i, k, k'_j), \tag{5.13} \]

and $P(k'_i > k \mid k)$ is given by Eq. (5.4). Thus, the covariance takes into account correlations only up to the level of chains $(k'_i, k, k'_j)$. Any higher-order correlations beyond 3K, such as those involving connected subgraphs of four nodes, would no longer be consistent with a normal approximation for $f(k)$, since they would involve information beyond the second moment of the indicator function. The remarkable success of the 3K model in Fig. 5.2 suggests that such higher-order correlations are not needed to explain the paradox, or that they are negligible in real-world networks.

5.5 Collective behaviors of the neighbors

[Figure 5.3: plots not reproduced. Left panel: $\rho_x(k)$ versus $k/k_c$ (logarithmic horizontal axis). Right panel: $P(\bar{x})$ versus $\bar{x}$. Networks: LiveJournal, Youtube, Skitter, Google, HEP Physics, WordNet.]

Figure 5.3: (Left) Neighbor-neighbor correlation coefficient by degree class for each network discussed in this Chapter. (Data have been smoothed.) (Right) Distribution of $\bar{x}$ at the critical degree $k_c$.

Define the neighbor-neighbor correlation as

\[ \rho_x(k) = \frac{\mathrm{Cov}(x_i, x_j)}{\sqrt{\mathrm{Var}(x_i)\, \mathrm{Var}(x_j)}} = \frac{\mathrm{Cov}(x_i, x_j)}{\mu_x(k)\,[1 - \mu_x(k)]}. \tag{5.14} \]

Note that this correlation, like $\mu_x(k)$, is based not on the neighbors' degrees but on the indicator function comparing them to the node's degree. Substituting this expression into Eq. (5.10), we get

\[ \sigma_x^2(k) = \frac{1}{k}\, \mu_x(k)\,[1 - \mu_x(k)]\,[1 + (k-1)\,\rho_x(k)]. \tag{5.15} \]

Now the probability for a degree-$k$ node to experience the paradox can be written using Eq. (5.7), with the variance given by Eq.
(5.15):

\[ f(k) = 1 - \Phi\Big(\frac{1 - 2\mu_x(k)}{2\sqrt{\mu_x(k)[1 - \mu_x(k)]}}\, \sqrt{\frac{k}{1 + (k-1)\,\rho_x(k)}}\Big) \tag{5.16} \]

This gives a very accurate estimate of paradox probability in real-world networks, as shown by the solid line in Figure 5.2 and the 3K model column in Table 5.1, which use the neighbor-neighbor correlation values $\rho_x(k)$ empirically determined for each network. Equation (5.16) reduces to equation (5.7) if we set $\rho_x(k) = 0$.

Fig. 5.3 shows empirically determined values of $\rho_x(k)$ for the real-world networks we studied. Recall that in the 2K model, the probability that a degree-$k$ node has a neighbor with degree greater than $k$ is determined completely by $e(k,k')$ and is unrelated to the degrees of the other neighbors. One might reasonably expect low-degree nodes to have mostly neighbors of higher degree, high-degree nodes to have mostly neighbors of lower degree, and medium-degree nodes to have a mix of both. Fig. 5.3, however, depicts a different scenario: medium-degree nodes prefer to have neighbors with similar degree to one another, whether those neighbors have higher or lower degree.

To see how these correlations may be indicative of the macroscopic organization of a network, we plot the distribution of $\bar{x}$, the fraction of higher-degree neighbors, for nodes with $k = k_c$. In the technological networks of Skitter and Google, such medium-degree nodes link more often to high-degree nodes, possibly reflecting a hierarchical network structure with medium-degree nodes at the top level and high-degree nodes at the next level. The remaining networks show a broad distribution of $\bar{x}$, consistent with a core-periphery network structure where medium-degree nodes link to higher-degree nodes in the core and to lower-degree nodes in the periphery [Rombach et al., 2014, Zhang et al., 2015].

5.6 Discussion

The connection between local measurement bias and network structure revealed by the strong friendship paradox is crucial for several reasons.
It is often impractical to observe large networks in their entirety: instead, researchers estimate network properties by exploring local neighborhoods of select nodes. The paradox, however, may systematically bias local views of network structure, including the sampled degree distribution [Achlioptas et al., 2005]. The strong friendship paradox also affects measurements of information in networks. Consider a network where nodes have attributes and estimate their prevalence from local observations. When attribute and degree are correlated, the paradox can create an illusion that the attribute is common even when it is globally rare [Lerman et al., 2016]. Finally, quantifying measurement bias may be necessary for predicting the evolution of dynamic processes such as domain formation by majority rule in interacting spin systems [Krapivsky and Redner, 2003], or synchronization of frequencies in complex networks such as electrical power grids [Dörfler et al., 2013]. Accounting for neighbor-neighbor correlations could be instrumental to the success of network models for such systems.

In this Chapter, we have studied the strong friendship paradox in networks, a phenomenon that distorts nodes' observations of local network structure. The paradox leads most nodes to observe that a majority of their neighbors have a larger degree than their own. We have developed an analytical model of the strong friendship paradox, enabling highly accurate predictions of its strength in networks. In contrast to Feld's friendship paradox [Feld, 1991], which exists in any network with variance in the degree distribution, the strong friendship paradox requires information about higher-order network structure. Specifically, negative correlations between degrees of connected nodes (given by the network's 2K structure) will magnify the paradox, especially in networks with a skewed degree distribution.
The impact of disassortativity, however, is modulated by degree correlations between nodes' neighbors. These correlations (given by the network's 3K structure) are necessary to accurately quantify the paradox. The success of the 3K model in explaining the paradox is consistent with the observation [Mahadevan et al., 2006] that it is sufficient to capture known network properties. In order to mitigate the effects of local measurement bias in networks, it is important to account for the strong friendship paradox and how it is impacted by higher-order network structure.

Chapter 6 Generating Function Formulation

6.1 Percolation and generating functions

On artificial networks defined by degree and edge distributions, the density of loops scales as $N^{-1}$ and thus goes to zero in the limit of infinite network size. Thus, the network is locally tree-like. In this case, for a sufficiently small seed size, the cascade problem can be mapped to a site percolation problem, where we look at the size of the vulnerable components to which the seed nodes belong. Vulnerable nodes are those with threshold satisfying $\phi < 1/k$, and thus require the presence of only one activated neighbour to become active. It is straightforward to see that on a tree-like network, the size of the cascade generated by a seed node is exactly equal to the size of the vulnerable component to which the seed node belongs. To calculate the probability distribution of the sizes of the component to which a seed node belongs, we turn to generating functions, which have long been used to study such problems. Here we recap generating functions and some of their important properties.

A convenient way of expressing a probability distribution is through its generating function.
Given a probability distribution $p(k)$, the probability generating function (PGF) is given by

\[ G_0(x) = \sum_{k=0}^{\infty} p(k)\, x^k. \tag{6.1} \]

Many properties of the distribution can easily be obtained from the generating function, such as the mean:

\[ \langle k \rangle = G_0'(1). \tag{6.2} \]

An important property of the generating function is the power rule. If we perform $n$ different realizations of the random process governing the probability distribution, then the probability distribution for the sum of the outcomes of the $n$ realizations has a generating function that is the original generating function to the power of $n$. For example, the probability distribution for the sum of the degrees of $n$ randomly chosen nodes is generated by $[G_0(x)]^n$.

$G_0(x)$ was introduced above as the generating function for the degree distribution. We also introduce the family of generating functions $G_{1,k}(x)$ for the degrees of neighbours of a degree-$k$ node:

\[ G_{1,k}(x) = \sum_{k'=1}^{\infty} \frac{e(k,k')}{q(k)}\, x^{k'}. \tag{6.3} \]

6.2 Subcritical cascades

Consider a network with $N$ nodes, in the limit where $N \to \infty$. We adopt the setting of [Dodds and Payne, 2009], distinguishing between a local cascade that reaches at most a vanishing fraction of nodes (such as some countable number), and a global cascade that spreads to a nonvanishing fraction of the nodes. Given a seed node of degree $k$, we denote by $\pi_{k,n}$ the probability that it generates a local cascade of size $n$, and we denote by $g_k$ the probability that it generates a global cascade. We represent the distribution $\pi_{k,n}$ through its generating function $H_{0,k}(x) = \sum_{n=0}^{\infty} \pi_{k,n}\, x^n$. Generating functions easily allow calculating many key properties of cascades, such as the total probability of a local cascade $H_{0,k}(1) = \sum_{n=0}^{\infty} \pi_{k,n}$ or its mean size $H_{0,k}'(1) = \sum_{n=0}^{\infty} n\, \pi_{k,n}$. Since a seed node can generate either a local or a global cascade, $H_{0,k}(1) + g_k = 1$.
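The two generating-function properties recalled in Section 6.1, the mean of Eq. (6.2) and the power rule, can be checked numerically by storing a PGF as its list of coefficients. The sketch below is illustrative only; the helper names and the toy degree distribution are assumptions made for this example.

```python
from typing import List


def pgf_mean(p: List[float]) -> float:
    """Mean of the distribution, G_0'(1) = sum_k k * p(k), as in Eq. (6.2).

    p[k] is the probability of outcome k, i.e. the coefficient of x^k.
    """
    return sum(k * pk for k, pk in enumerate(p))


def pgf_power(p: List[float], n: int) -> List[float]:
    """Coefficients of [G_0(x)]^n, obtained by multiplying the polynomial
    by itself n times (equivalently, convolving the distribution)."""
    result = [1.0]  # the constant polynomial 1, i.e. [G_0(x)]^0
    for _ in range(n):
        product = [0.0] * (len(result) + len(p) - 1)
        for i, a in enumerate(result):
            for j, b in enumerate(p):
                product[i + j] += a * b
        result = product
    return result


# Hypothetical degree distribution: half the nodes have degree 1, half degree 2.
degree_dist = [0.0, 0.5, 0.5]

# Distribution of the total degree of two randomly chosen nodes.
sum_of_two = pgf_power(degree_dist, 2)
```

The coefficients of `sum_of_two` give the probabilities of total degree 2, 3, and 4, and its mean is twice the mean degree, as the power rule requires.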
First, we consider the subcritical regime, where only local cascades exist and $g_k = 0$. In this regime, the size of a local cascade generated by a seed node of degree $k$ can be decomposed as one (the seed itself) plus the collective size of cascades generated by its $k$ neighbors. We denote as $H_{1,k}(x)$ the generating function for the size of the local cascade created by the neighbor of a $k$-degree node. The power rule of generating functions [Newman, 2002] states that if $k$ independent realizations of a random process with generating function $G(x)$ are created, then the probability distribution for the sum of the outcomes has generating function $[G(x)]^k$. Therefore,

\[ H_{0,k}(x) = x\,[H_{1,k}(x)]^k, \tag{6.4} \]

where the leading factor $x$ on the right hand side of Eq. (6.4) has the effect of adding one to the combined size of the cascade generated by the neighbors of the seed node (and thus accounting for the seed node itself).

To determine $H_{1,k}(x)$, we first note that the neighbor of the $k$-degree seed node can only generate a cascade if it itself is vulnerable, i.e., it can be activated by a single active neighbor. This will occur if the neighbor's degree $k'$ satisfies $k' \le 1/\phi$. Given that the neighbor is activated, the size of the cascade generated by that neighbor is one plus the size of the cascade generated by its neighbors down the tree (that is, not including the seed). Hence, $H_{1,k}(x)$ is given by

\[ H_{1,k}(x) = x\, \frac{e(k,1)}{q(k)} + \sum_{k' > 1/\phi} \frac{e(k,k')}{q(k)} + x \sum_{k'=2}^{\lfloor 1/\phi \rfloor} \frac{e(k,k')}{q(k)}\, [H_{1,k'}(x)]^{k'-1}, \tag{6.5} \]

where the first term on the right hand side of Eq. (6.5) is the probability that the neighbor of a degree-$k$ node has no other outgoing links (i.e., has degree $k' = 1$), the second term is the probability that the neighbor is not vulnerable, and the third term is the generating function for the size of the cascade generated from its $k' - 1$ down-tree neighbors. The right hand side of Eq.
(6.5) only contains $H_{1,k'}(x)$ terms, and indeed if we were to consider the next level of the tree (i.e., neighbors of the seed's neighbors) then we would obtain the same equations, as our network model only considers correlations between nearest neighbors. Thus, the full system of equations describing the distribution of the sizes of local cascades generated by a degree-$k$ seed is given by Eqs. (6.4) and (6.5) for $k_{\min} \le k \le k_{\max}$.

6.3 The onset of global cascades

In the subcritical regime, the mean cascade size $\langle s_{k'} \rangle$ generated by a $k'$-degree seed node is $H_{0,k'}'(1)$ (as $g_k = 0$). Differentiating Eqs. (6.4) and (6.5) and evaluating at $x = 1$ gives a system of equations for $\langle s_{k'} \rangle$,

\[ \langle s_{k'} \rangle = 1 + k'\, H_{1,k'}'(1), \tag{6.6} \]

\[ H_{1,k}'(1) = \sum_{k'=2}^{\lfloor 1/\phi \rfloor} \frac{e(k,k')}{q(k)} \big[ 1 + (k'-1)\, H_{1,k'}'(1) \big], \tag{6.7} \]

where we have used the identity $H_{0,k}(1) = H_{1,k}(1) = 1$ in the subcritical regime. Eq. (6.7) holds for all values of $k$, and can be written in matrix form as $\mathbf{h}_1'(1) = \mathbf{B}\mathbf{1} + \mathbf{B}\mathbf{U}\mathbf{h}_1'(1)$. Here, $\mathbf{h}_1'(1)$ is the state vector with $(\mathbf{h}_1'(1))_k = H_{1,k+1}'(1)$, $\mathbf{B}$ is a matrix with elements $B_{kk'} = e(k,k')/q(k)$ for $2 \le k' \le \lfloor 1/\phi \rfloor$ and 0 otherwise, $\mathbf{1}$ is a vector of ones, and $\mathbf{U}$ is a diagonal matrix with elements $U_{kk} = (k-1)$. Notice from the upper limit of the sum in Eq. (6.7) that the threshold affects the system of equations discretely, through the integer value $\lfloor 1/\phi \rfloor$. The linear system can be rearranged to give

\[ (\mathbf{I} - \mathbf{B}\mathbf{U})\, \mathbf{h}_1'(1) = \mathbf{B}\mathbf{1}. \tag{6.8} \]

In the subcritical regime, $\mathbf{I} - \mathbf{B}\mathbf{U}$ can be inverted to give an expression for $\mathbf{h}_1'(1)$ that is positive and finite. However, there exists a critical value of $\phi$ marking the onset of global cascades where $\mathbf{I} - \mathbf{B}\mathbf{U}$ is no longer invertible, and hence there is no solution to Eq. (6.8). This critical point can be expressed formally as

\[ \det(\mathbf{I} - \mathbf{B}\mathbf{U}) = 0. \tag{6.9} \]

The critical point marks the transition to the supercritical regime, where the network is vulnerable to global outbreaks (i.e., $g_k \neq 0$). Note that Eq.
(6.9) corresponds to the largest eigenvalue of $\mathbf{B}\mathbf{U}$ being equal to one, and from Eq. (6.8) we can see that the supercritical regime exists if and only if the largest eigenvalue of $\mathbf{B}\mathbf{U}$ is greater than one. Since the critical threshold $\phi_c$ is the largest value of $\phi$ for which this condition holds, it will be the inverse of an integer degree.

In general, the critical condition depends on the threshold $\phi$, the degree distribution $p(k)$, and the joint degree distribution $e(k,k')$ through the matrix $\mathbf{B}$. Thus, for a given degree distribution and threshold, degree-degree correlations determine whether the network is vulnerable to global outbreaks. To illustrate this, we consider two idealized cases: a network in which the degrees of nodes are completely independent, and one in which they are perfectly correlated. The first case is a network with no degree-degree correlations; therefore, $e(k,k') = q(k)\,q(k')$. As a consequence of Eq. (6.9), the critical point exists when

\[ \sum_{k=2}^{\lfloor 1/\phi \rfloor} \frac{k\, p(k)}{\langle k \rangle}\, (k-1) = 1. \tag{6.10} \]

The sum on the left hand side of Eq. (6.10) is a decreasing function of the threshold $\phi$, and if the degree distribution satisfies $\sum_{k=1}^{\infty} k^2 p(k) / \langle k \rangle > 2$, then there will always exist a critical threshold $\phi_c$ for which global cascades can occur for values $\phi \le \phi_c$ but will never occur for $\phi > \phi_c$. Moreover, in this scenario $H_{1,k}'(1)$ is independent of $k$ (Eq. (6.7)), and thus from Eq. (6.6) we can see that nodes of any degree can generate global cascades in the supercritical regime.

The second case we consider is a perfectly assortative network in which nodes are connected only to other nodes of the same degree: $e(k,k') = \delta(k,k')\, q(k')$, where $\delta(k,k') = 1$ if $k = k'$ and 0 otherwise. While this limiting case may seem artificial in that it results in a collection of disconnected regular graph components (note that more than one degree value is needed in order for assortativity to be well-defined), it allows Eq.
(6.7) to reduce to the set of independent equations

\[ H_{1,k}'(1) = \begin{cases} 1 + (k-1)\, H_{1,k}'(1) & 2 \le k \le \lfloor 1/\phi \rfloor \\ 0 & k = 1 \text{ or } k > \lfloor 1/\phi \rfloor \end{cases} \tag{6.11} \]

which, rearranged, gives $H_{1,k}'(1) = 1/(2-k)$ for $2 \le k \le \lfloor 1/\phi \rfloor$. This expression is either infinite or negative for all values of $k$, and since $H_{1,k}'(1)$ should be positive and $H_{1,k}'(1) = 0$ is not a solution to Eq. (6.11), the only possible solution for these degree classes $k$ is the global cascade condition $H_{1,k}'(1) \to \infty$. Thus, in perfectly assortative networks, there will always be global cascades when $\phi \le 1/2$, but from Eq. (6.6) we see that these cascades can only be triggered by seed nodes that are vulnerable (i.e., $2 \le k \le \lfloor 1/\phi \rfloor$).

The two scenarios above illustrate not only that degree-degree correlations affect the onset of global cascades in networks, but also that such correlations affect the ability of seeds in different degree classes to trigger global outbreaks.

6.4 Supercritical regime and node influence

To identify influential nodes, we calculate the expected size of the cascades they trigger. A node can be considered influential if it is able to seed a global outbreak [Kempe et al., 2003]. In the supercritical regime, the network is composed of a single vulnerable component, and thus a node will trigger a global outbreak if and only if it belongs to this giant vulnerable component (GVC).

The size $S$ of the giant vulnerable component can be calculated as follows. A $k$-degree node will be in the GVC if and only if it triggers a global cascade, and so the fraction of $k$-degree nodes that are in the GVC is the fraction that will generate global cascades, i.e., $g_k$. Summing over all degree classes, the size of the giant vulnerable component is

\[ S = \sum_{k=1}^{\infty} p(k)\, g_k. \tag{6.12} \]

Because a $k$-degree node either belongs to the GVC or not, the expected fraction of the network triggered by a $k$-degree seed is $S_k = S$ for nodes in the GVC, and $S_k = 0$ for nodes outside of the GVC, and thus we have that $S_k = S g_k$. Then, as
Then, as 63 g k = 1H 0;k (1), and H 0;k (1) = [H 1;k (1)] k (from Eq. (6.4)), we have a system of equations that give us the expected outbreak size triggered by a node of degree k: S k = 1 [H 1;k (1)] k S (6.13) H 1;k (1) = e(k; 1) q(k) + X k 0 > 1 e(k;k 0 ) q(k) + b 1 c X k 0 =2 e(k;k 0 ) q(k) [H 1;k 0(1)] k 0 1 (6.14) S = 1 X k 0 =1 p(k 0 ) 1 [H 1;k 0(1)] k 0 (6.15) Equation (6.14), for all values ofk, gives a set of polynomial equations that can be solved for H 1;k (1), and substituting these values into Eqs. (6.13) and (6.15) gives us the expected cascade size S k triggered by nodes of degree k. 64 Chapter 7 Network Structure and Dynamics 7.1 Problem setup Under what conditions does a local shock, e.g. a change of state of a node, spread globally? Researchers have examined how the dynamics of such cascades are aected by network structure, including degree distribution [Watts, 2002] and degree correlations between connected nodes [Gleeson, 2008, Payne et al., 2009]. Surprisingly, networks with both suciently positive and suciently neg- ative assortativity have been found to be vulnerable to global outbreaks. Such non-monotone behavior was reported in Erd} os-R enyi random networks [Payne et al., 2009], and for k-core networks [Gleeson, 2008]. However, these works did not explain the anomalous relationship between degree assortativity and cascade size, nor did they provide a general mathematical framework for quantifying how degree correlations aect the properties of cascades under the threshold model. Our work addresses these topics and also demonstrates that assortativity does not adequately capture some important aspects of degree correlations in networks. Specically, in real-world networks, assortativity is heavily skewed by nodes with a single neighbor, which act as \dead ends" to spreading cascades. To address this shortcoming, we introduce a new measure, which accounts for degree correlations of nodes that participate in a spreading cascade. 
We demonstrate that cascade size is monotonically related to this measure.

In this Chapter, we study dynamics of cascades that spread on networks according to the Watts threshold model [Watts, 2002]. Every node in this model can be in one of two possible states: either active or inactive (e.g., adopting an innovation or not). Nodes change their state based on the states of their neighbors, with activated nodes remaining active in the later updating steps. Specifically, an inactive node with $k$ neighbors becomes active if at least $\phi k$ of its neighbors are active. The activation threshold takes values $0 \le \phi \le 1$, with smaller values of $\phi$ rendering a node more susceptible to the influence of neighbors. Following Watts, we call a node vulnerable if it can be activated by a single active neighbor. In a threshold model this is equivalent to a node of degree $k$ having an activation threshold $\phi \le 1/k$. We ignore correlations beyond two neighboring nodes, so structures of higher order than $e(k,k')$ are assumed to be random.

We use the generating function approach [Dodds and Payne, 2009] to derive the expected size of cascades triggered by a single active node. In the subcritical regime, cascades never reach an appreciable fraction of the network, but when the system transitions to the supercritical regime, global cascades are possible. We derive the condition for this supercritical transition to occur. The supercritical formulas yield the expected size of cascades given the seed's degree.

To better understand how degree correlations affect cascades, we model the joint degree distribution using a bivariate log-normal distribution [Wu et al., 2017]. Strong assortative behavior renders networks vulnerable to global outbreaks. Surprisingly, the same holds for strongly disassortative behavior, as long as there are enough links between vulnerable nodes. This highlights the limits of using assortativity to measure degree correlation.
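The update rule of the threshold model described above can be sketched in a few lines. This is a minimal illustration rather than the simulation code used in this work: the function names `watts_cascade` and `is_vulnerable` and the adjacency-dictionary input are assumptions made for the example. Because active nodes never deactivate, the final active set does not depend on the order in which nodes are processed, so a simple queue is enough.

```python
from collections import deque


def watts_cascade(adj, seed, phi):
    """Return the set of active nodes once a single-seed cascade has stopped.

    adj  : dict mapping each node to a list of its neighbours (undirected graph)
    seed : the one node that is active at the start
    phi  : activation threshold, 0 <= phi <= 1 (hypothetical argument name)
    """
    active = {seed}
    active_nbrs = {v: 0 for v in adj}  # active neighbours counted so far
    queue = deque([seed])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v in active:
                continue
            active_nbrs[v] += 1
            # v activates once at least phi * k_v of its k_v neighbours are active
            if active_nbrs[v] >= phi * len(adj[v]):
                active.add(v)
                queue.append(v)
    return active


def is_vulnerable(adj, v, phi):
    """A node is vulnerable if one active neighbour suffices, i.e. phi <= 1/k."""
    return 1 >= phi * len(adj[v])
```

On a star graph with four leaves and threshold 0.5, for example, seeding a leaf activates nothing else because the hub is not vulnerable, whereas seeding the hub activates every leaf. Thresholds that are exact reciprocals of integers may be passed as `fractions.Fraction` values to avoid floating-point ties.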
Additionally, we show that in some networks when assortativity is strongly positive (or negative), lower degree nodes are more influential, as they can trigger larger outbreaks. As assortativity approaches zero from either direction, influence shifts to higher degree nodes. This phenomenon can be explained by the fraction of edges that link vulnerable nodes.

We replicate these findings by simulating cascades on synthetic random networks with power-law degree distribution, which have been rewired to obtain a range of degree-degree correlations. Both the onset of global outbreaks and their size agree with theory. However, in real-world networks drawn from diverse domains, cascade size is systematically smaller than theoretical predictions. This reflects the fact that real-world networks have structure beyond that given by degree-degree correlations [Gleeson, 2008, Wu et al., 2017]. After rewiring the networks so as to eliminate the higher-order structure, while preserving the degree and joint degree distributions, we find that the agreement between theoretical predictions and simulated outbreak sizes is restored. This suggests that higher-order network structure suppresses outbreaks. Despite this, theory predicts well who the influential nodes are.

7.2 Bivariate log-normal model

To understand better how the properties of cascades depend on network structure, we model $e(k,k')$ as a bivariate log-normal distribution [Wu et al., 2017] with parameters $(\mu, \sigma^2, c)$, where $\mu$ is the location parameter, $\sigma^2$ is the scale parameter, and $c$ is the correlation coefficient. The assortativity of the network is then

$$r = \frac{\mathrm{Cov}(k,k')}{\mathrm{Var}(k)} = \frac{e^{c\sigma^2} - 1}{e^{\sigma^2} - 1}. \tag{7.1}$$

Note that, for a given value of $\sigma^2$, the assortativity is bounded by $-e^{-\sigma^2} \le r \le 1$. We identify the location of the critical point defining the onset of global cascades by inserting the log-normal distribution into Eq.
(6.9) and varying $c$ to produce a range of assortativity values.

Figure 7.1: Critical threshold $\phi^*$ as a function of the network's degree assortativity $r$, for $(\mu, \sigma)$ = (2.5, 2.5), (1.5, 2.5), (2.5, 1.5) and (1.5, 1.5). Global cascades exist in the parameter region $\phi < \phi^*$, and only local cascades exist in the region $\phi > \phi^*$.

Fig. 7.1 shows the series of critical points obtained from the theory using parameters $\mu = 1.5$ and 2.5, $\sigma = 1.5$ and 2.5 as a function of assortativity $r$. The critical point is defined by the threshold $\phi^*$, such that global cascades exist in the parameter region $\phi < \phi^*$, and only local cascades exist in the region $\phi > \phi^*$. A maximum degree in the joint degree distribution needs to be introduced here in order to solve for the outbreak size by Eq. (6.14). We set the cutoff to be $k_{\max} = 1{,}000$.

As expected, assortative ($r > 0$) networks are more vulnerable to global outbreaks, since with increasing assortativity, vulnerable nodes are more likely to connect to other vulnerable nodes, which helps create the giant vulnerable component on which global outbreaks spread. Interestingly, as observed by Payne et al. [Payne et al., 2009], disassortative ($r < 0$) networks are vulnerable to global outbreaks as well. Fig. 7.1 shows that, for certain values of $\mu$ and $\sigma$, the critical point displays non-monotone behavior and in fact increases with disassortativity. At first this may appear puzzling, since in the disassortative regime, vulnerable nodes are less likely to connect to other vulnerable nodes, which would seem to inhibit the formation of a giant vulnerable component on which cascades spread. That intuition is flawed, however, because of the heterogeneous nature of the joint degree distribution: even in networks that are globally disassortative, nodes of lower degree can be linked assortatively.
Some numerical evidence of this has been seen in [Payne et al., 2009] for the $r = -0.8$ case of a random network, where nodes with degrees $k = 2$ through $k = 4$ have a strong tendency to link to one another.

Figure 7.2: Expected outbreak size $S(k)$, as a function of seed degree $k$, on a network with log-normal joint degree distribution $e(k,k')$. The distribution has parameters $\mu = 1.5$ and $\sigma = 1.5$, and the assortativity parameter $c$ is tunable between $-1$ and $1$; the curves shown correspond to $r$ = 0.7844, 0.4963, 0.2273, 0.0312, $-0.0820$ and $-0.2605$. The vertical dashed line indicates the chosen threshold $\phi = 1/6$.

We also consider the impact of assortativity on how influential nodes of a given degree are, as measured by their ability to initiate large cascades. Fig. 7.2 shows the expected size of cascades triggered by seeds in different degree classes. Perhaps surprisingly, the best-connected nodes (hubs) are not always the most influential. Instead, we again see a non-monotone effect. For highly assortative networks, the largest cascades are triggered by lower-degree nodes. As assortativity decreases, influence shifts to higher-degree nodes, and lower-degree nodes lose their ability to trigger large outbreaks. But when assortativity becomes negative, the picture changes: influence shifts back to lower-degree nodes. In highly disassortative networks, similar to the highly assortative networks, the degree of the most influential nodes is not far from the inverse of the threshold ($1/\phi$).

Figure 7.3: Block probability weight of $e(k,k')$ as a function of assortativity $r$, for $\mu = 1.5, 2.5$ and $\sigma = 1.5, 2.5$. The weight is the sum of the elements of $e(k,k')$ over the range $2 \le k \le 1/\phi_0$, $2 \le k' \le 1/\phi_0$, where $1/\phi_0$ solves Eq. (6.10).
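The quantities discussed in this section can be explored numerically with a short sketch. The code below is illustrative only and rests on assumptions that are not taken from the text: the continuous bivariate log-normal density is simply sampled at integer degrees $1, \dots, k_{\max}$ and renormalized, and the function names and the argument `k_vul_max` (standing for the largest vulnerable degree) are hypothetical. Because of the discretization and the cutoff, the assortativity measured from the tabulated matrix generally differs from the closed form of Eq. (7.1).

```python
import math


def lognormal_assortativity(sigma, c):
    """Closed form of Eq. (7.1) for the continuous bivariate log-normal."""
    return (math.exp(c * sigma ** 2) - 1.0) / (math.exp(sigma ** 2) - 1.0)


def lognormal_joint(mu, sigma, c, k_max):
    """Tabulate e(k, k') on 1..k_max (row and column 0 are unused).

    Assumption: the density is sampled at integer degrees and renormalised,
    so constant prefactors are dropped. Requires -1 < c < 1.
    """
    if not -1.0 < c < 1.0:
        raise ValueError("c must lie strictly between -1 and 1")
    z = [0.0] + [(math.log(k) - mu) / sigma for k in range(1, k_max + 1)]
    norm = 2.0 * (1.0 - c * c)
    e = [[0.0] * (k_max + 1) for _ in range(k_max + 1)]
    total = 0.0
    for k in range(1, k_max + 1):
        for kp in range(1, k_max + 1):
            quad = (z[k] ** 2 - 2.0 * c * z[k] * z[kp] + z[kp] ** 2) / norm
            e[k][kp] = math.exp(-quad) / (k * kp)
            total += e[k][kp]
    return [[x / total for x in row] for row in e]


def assortativity(e):
    """Pearson correlation of the degrees at the two ends of an edge."""
    ks = range(1, len(e))
    q = [sum(row) for row in e]  # neighbour degree distribution q(k)
    mean = sum(k * q[k] for k in ks)
    var = sum(k * k * q[k] for k in ks) - mean ** 2
    cov = sum(k * kp * e[k][kp] for k in ks for kp in ks) - mean ** 2
    return cov / var


def vulnerable_block_mass(e, k_vul_max):
    """Probability mass of e(k, k') with both degrees in 2..k_vul_max."""
    top = min(k_vul_max, len(e) - 1)
    return sum(e[k][kp] for k in range(2, top + 1) for kp in range(2, top + 1))
```

Evaluating `lognormal_assortativity` at $c = -1$ and $c = 1$ reproduces the bounds $-e^{-\sigma^2}$ and $1$ stated after Eq. (7.1), and scanning `vulnerable_block_mass` over a range of $c$ gives the kind of curve shown in Fig. 7.3.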
In order to explain these effects, consider how degree correlations among nodes in disassortative networks can promote global outbreaks. From Eq. (6.9), it is clear that the matrix BU dictates the occurrence of global outbreaks. This matrix has entries $e(k,k')(k'-1)/q(k)$ for $2 \le k' \le \lfloor 1/\phi \rfloor$, and as $\phi$ decreases, the nonzero part of the matrix increases. At some point, it may reach a size for which the largest eigenvalue is greater than one, at which point the network becomes unstable to global outbreaks. The interpretation of this is that decreasing $\phi$ makes an increasing number of nodes vulnerable, and when a sufficient number of links exist between vulnerable nodes (or when a sufficient amount of probability mass is contained in the submatrix $e(k,k')$ for $2 \le k, k' \le \lfloor 1/\phi \rfloor$), then a giant vulnerable component will form. Fig. 7.3 shows how the probability mass of $e(k,k')$ in the vulnerable regime (i.e., $e(k,k')$ for $2 \le k, k' \le \lfloor 1/\phi \rfloor$) changes as a function of the global assortativity coefficient. Comparing this plot to Fig. 7.1, we can see that the parameter regime where the block probability mass is large corresponds to networks that are more unstable to global outbreaks, even when they are disassortative at the global level.

Figure 7.4: Critical threshold $\phi^*$ as a function of the network's vulnerable node degree correlation defined by Eq. (7.2), for $(\mu, \sigma)$ = (2.5, 2.5), (1.5, 2.5), (2.5, 1.5) and (1.5, 1.5). Global cascades exist in the parameter region $\phi < \phi^*$, and only local cascades exist in the region $\phi > \phi^*$.

As an alternative to assortativity, we propose a measure that quantifies degree correlation between vulnerable nodes. We introduce an indicator for node $i$ to be vulnerable as $v_i = \mathbb{1}\left[2 \le k_i \le \lfloor 1/\phi_0 \rfloor\right]$, where $1/\phi_0$ solves Eq. (6.10). Here $P(v_i = 1) = \sum_{k=2}^{\lfloor 1/\phi_0 \rfloor} q(k) = Q_v$.
Then the correlation between degrees of vulnerable nodes is

$$\rho_{\mathrm{vul}} = \frac{\sum_{k=2}^{\lfloor 1/\phi_0 \rfloor} \sum_{k'=2}^{\lfloor 1/\phi_0 \rfloor} e(k,k') - Q_v^2}{Q_v\,(1 - Q_v)}. \tag{7.2}$$

When we plot the critical threshold as a function of this new measure, the vulnerable node degree correlation, we observe a monotone behavior (Fig. 7.4). That suggests that this quantity, rather than degree assortativity, is an appropriate control parameter determining the size of outbreaks. Both the most disassortative and the most assortative networks in Fig. 7.1 have high vulnerable correlation in Fig. 7.4. This also explains the effect seen in Fig. 7.2. While high-degree nodes can influence more nodes once they are activated, low-degree nodes are more likely to be vulnerable. Thus, when the vulnerable node degree correlation is high, large cascades are more likely to be initiated by nodes of lower degree.

We use the theoretical framework to study vulnerability of synthetic and real-world networks to global outbreaks and the properties of cascades in such networks.

7.3 Cascades in synthetic networks

Using the configuration model, we generate networks with $N = 10{,}000$ nodes and a power law-like degree sequence with exponents $\alpha = 2.1$ and 2.4. The resulting networks have assortativity $r = -0.17$ and $-0.07$ respectively. We then change the degree correlations in the networks by rewiring them according to Newman's method [Newman, 2002]. The rewiring procedure picks two pairs of linked nodes at random and exchanges their edges if doing so changes the assortativity in the desired direction. Note that this procedure only changes the so-called 2K structure of the network [Wu et al., 2017], that is, its joint degree distribution matrix $e(k,k')$, without changing its degree distribution. Through this process, we obtain a series of networks that share the same degree sequence but span a range of assortativity values from $r = -0.1$ to $-0.3$ for $\alpha = 2.1$, and $r = 0.05$ to $-0.15$ for $\alpha = 2.4$.
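The rewiring step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code used to produce the networks in this chapter: the function name `rewire_toward`, its arguments, and the rule of rejecting swaps that would create self-loops or repeated edges are choices made for the example. It relies on the fact that, when every node keeps its degree, the assortativity is an increasing function of the sum of $k_u k_v$ over edges, so a swap moves $r$ in the desired direction exactly when it changes that sum with the right sign.

```python
import random


def rewire_toward(edges, direction, n_attempts, seed=0):
    """Degree-preserving edge swaps that raise (+1) or lower (-1) assortativity.

    edges      : list of (u, v) pairs of a simple undirected graph
    direction  : +1 to make the network more assortative, -1 for less
    n_attempts : number of proposed swaps (most proposals may be rejected)
    Returns a new edge list with the same degree sequence.
    """
    rng = random.Random(seed)
    edges = [tuple(e) for e in edges]
    deg = {}
    for u, v in edges:
        deg[u] = deg.get(u, 0) + 1
        deg[v] = deg.get(v, 0) + 1
    present = {frozenset(e) for e in edges}

    for _ in range(n_attempts):
        i, j = rng.sample(range(len(edges)), 2)
        a, b = edges[i]
        c, d = edges[j]
        if rng.random() < 0.5:  # consider both orientations of the second edge
            c, d = d, c
        # proposed swap: (a, b), (c, d) -> (a, d), (c, b)
        if len({a, b, c, d}) < 4:
            continue  # would create a self-loop
        if frozenset((a, d)) in present or frozenset((c, b)) in present:
            continue  # would create a repeated edge
        # change in the sum over edges of k_u * k_v caused by the swap
        delta = deg[a] * deg[d] + deg[c] * deg[b] - deg[a] * deg[b] - deg[c] * deg[d]
        if delta * direction <= 0:
            continue  # swap does not move r in the desired direction
        present -= {frozenset((a, b)), frozenset((c, d))}
        present |= {frozenset((a, d)), frozenset((c, b))}
        edges[i], edges[j] = (a, d), (c, b)
    return edges
```

Stopping after a fixed number of attempts is only one possible design; in practice one would more likely stop once a target assortativity is reached, which requires recomputing $r$ (or the running sum of $k_u k_v$) as swaps are accepted.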
Note that we choose a degree sequence whose maximum degree is not too large, to allow the greatest flexibility in the rewiring process.

Figure 7.5: Theoretical calculation (lines) and observations (points) of the giant vulnerable component size, as a function of the threshold $\phi$, for synthetic power-law networks with exponent $\alpha = 2.4$ and number of nodes $N = 10{,}000$. Curves are shown for $r$ = 0.05, 0.02, 0.00, $-0.07$ and $-0.15$.

Figure 7.6: Theoretical calculation and observations of single-seed cascades on synthetic power-law networks with $\alpha = 2.1$ and 2.4, comparing the observed average outbreak size $\langle S_k \rangle$ by seed degree $k$ (blue dots) with the predictions of the generating function formulation (red line). Panels: $\alpha = 2.1$, $\phi = 1/10$, $r = -0.1$; $\alpha = 2.1$, $\phi = 1/10$, $r = -0.15$; $\alpha = 2.4$, $\phi = 1/5$, $r = 0.05$; $\alpha = 2.4$, $\phi = 1/5$, $r = 0.01$.

Fig. 7.5 reports the size of the giant vulnerable component for different values of the threshold $\phi$. This was calculated by plugging the $e(k,k')$ matrix of the network into the nonlinear equations Eq. (6.14). The critical point signaling the onset of global outbreaks occurs at the inflection point, where the line departs from the x-axis. As assortativity of the rewired network increases ($r > -0.07$), the critical point shifts to ever larger values of $\phi$. Degree correlations destabilize the network, creating conditions for outbreaks to spread. This is because in assortative networks, similar degree nodes are connected, with vulnerable nodes more likely to be linked to one another, forming a giant vulnerable component on which outbreaks spread. However, as assortativity increases, the network also becomes more fragmented. Since cascades are limited to the giant component of the network,
assortativity decreases their maximum size. In contrast, as the network becomes more disassortative ($r < -0.07$), we also observe that the upper bound of the onset of global outbreaks increases. This non-monotonicity is similar to that observed in numerical experiments with the log-normal distribution (Fig. 7.1). Although the assortativity of the unrewired network (black line, $r = -0.07$) is slightly negative due to the structural cutoff [Boguñá et al., 2004], this network corresponds to the neutral assortativity networks in Fig. 7.1 with lowest values of critical threshold. When the network is more assortative or more disassortative, the critical threshold for the onset of global cascades shifts to larger values, just as in Fig. 7.1.

Next, we study node influence, which, as explained earlier, we measure by its ability to trigger large cascades. We simulate cascades using the threshold that puts the network in a slightly supercritical regime. Fig. 7.6 shows expected size of cascades predicted by theory, together with the average size of the simulated outbreaks triggered by nodes of different degree. Each plot in Fig. 7.6 with $\alpha = 2.4$ produces a single point in Fig. 7.5 via a weighted sum by network degree distribution.

Again, the observations made in synthetic networks are similar to those for the log-normal model. In the more disassortative networks (right hand plots in Fig. 7.6), the high-degree nodes are more influential. However, when assortativity increases, low-degree nodes also become influential. In the more assortative networks (left hand plots in Fig. 7.6), both the low-degree and high-degree nodes initiate larger cascades, on average, than the middle-degree nodes.

7.4 Cascades in real-world networks

Figure 7.7: Theoretical calculation (lines) and observations (points) of the giant vulnerable component size, as a function of the threshold $\phi$, for selected real-world networks (Facebook, Digg, HepTh, WordNet).
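The theoretical curves for the giant vulnerable component are obtained by solving Eq. (6.14) for a given joint degree matrix. One simple way to do this is fixed-point iteration, sketched below. The text does not say which solver was used, so the iteration scheme, the starting point, the function name `solve_outbreak_sizes` and the argument `k_vul_max` (the largest vulnerable degree) are all assumptions of this example. The code implements $H_{1,k}(1) = \sum_{k'} \frac{e(k,k')}{q(k)} w_{k'}$, with $w_{k'} = [H_{1,k'}(1)]^{k'-1}$ for vulnerable degrees $2 \le k' \le$ `k_vul_max` and $w_{k'} = 1$ otherwise, followed by $S = \sum_k p(k)\,(1 - [H_{1,k}(1)]^k)$ and $S_k = (1 - [H_{1,k}(1)]^k)\,S$. Since $H_{1,k}(1) = 1$ always satisfies the equations, the iteration starts from zero so that it converges from below to the smallest non-negative solution.

```python
def solve_outbreak_sizes(e, p, k_vul_max, tol=1e-12, max_iter=100_000):
    """Solve for H_{1,k}(1) by fixed-point iteration and return (S, S_k).

    e         : nested list, e[k][kp] is the joint degree distribution
                (indices 0..k_max; row and column 0 are unused and zero)
    p         : list, p[k] is the degree distribution
    k_vul_max : largest vulnerable degree (hypothetical argument name)
    """
    k_max = len(e) - 1
    ks = range(1, k_max + 1)
    q = [sum(row) for row in e]  # neighbour degree distribution q(k)
    h = [0.0] * (k_max + 1)      # h[k] approximates H_{1,k}(1)
    for _ in range(max_iter):
        new_h = [0.0] * (k_max + 1)
        for k in ks:
            if q[k] == 0.0:
                new_h[k] = 1.0   # degree class absent from the network
                continue
            acc = 0.0
            for kp in ks:
                if 2 <= kp <= k_vul_max:
                    acc += e[k][kp] * h[kp] ** (kp - 1)  # vulnerable neighbour
                else:
                    acc += e[k][kp]  # dead end or non-vulnerable neighbour
            new_h[k] = acc / q[k]
        change = max(abs(a - b) for a, b in zip(new_h, h))
        h = new_h
        if change < tol:
            break
    g = [0.0] + [1.0 - h[k] ** k for k in ks]  # prob. of a global cascade
    S = sum(p[k] * g[k] for k in ks)
    S_k = [S * g_k for g_k in g]
    return S, S_k
```

As a hand-checkable case, take an uncorrelated network in which half the nodes have degree 1 and half have degree 3, so that $q(1) = 1/4$, $q(3) = 3/4$ and $e(k,k') = q(k)q(k')$. With all degrees up to 3 vulnerable, the equation reduces to $H = 1/4 + (3/4)H^2$, whose smallest root is $H = 1/3$, giving $S = 22/27$. Convergence slows down close to the critical point, which is why `max_iter` is generous.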
Figure 7.8: Theoretical calculation and observations of single-seed cascades on real-world networks, comparing the observed average outbreak size $\langle S_k \rangle$ by seed degree $k$ (blue dots), the predictions of the generating function formulation (red line) and the observations on the reconstructed graph (green crosses). Panels: Facebook, $\phi = 1/10$; Reactome, $\phi = 1/9$; Digg, $\phi = 1/9$; ArXiv HepPh, $\phi = 1/7$; ArXiv HepTh, $\phi = 1/6$; WordNet, $\phi = 1/6$.

Finally, we study cascade dynamics on real-world networks, which include biological, social, and semantic networks, ranging in size from about 4,000 to about 150,000 nodes. The basic properties of these networks are listed in Table B.1. Their assortativity ranges from mildly disassortative ($r = -0.0623$) to strongly assortative ($r = 0.6322$).

Fig. 7.7 shows the size of the giant vulnerable component as a function of the threshold $\phi$. Unlike in synthetic networks, the theory (lines) does not agree well with the results of simulations (symbols).

We simulate cascading dynamics on real-world networks and measure their size. The thresholds for these simulations are chosen slightly below the theoretical global cascading threshold given by Eq. (6.9). As shown in Fig. 7.8, the average size of cascades triggered by seeds of a given degree (blue dots in Fig. 7.8) is smaller than theoretical predictions (red lines). This is likely because real-world networks have structure beyond that given by the joint degree distribution $e(k,k')$, which our theory does not take into account, and which confines cascades within portions of the network.
To test the hypothesis, we rewire the network in such a way as to preserve the $e(k,k')$ and $p(k)$ distributions, but destroy higher-order structure such as clustering, community structure, or neighbor degree correlations [Wu et al., 2017]. In each step of the rewiring, we randomly choose two edges subject to the constraint that one endpoint from each edge shares the same degree value. Then we swap the edges so that each of those endpoints instead links to the other node [Mahadevan et al., 2006]. This step does not change the two edges' contribution toward the joint degree distribution $e(k,k')$. The step is then repeated a large number of times, roughly equal to the total number of edges. Since the edge chosen depends only on the degrees of its endpoints, the procedure is sufficient to eliminate the higher-order structures in the network. The average cascade size on such rewired networks (green crosses in Fig. 7.8) is in agreement with theoretical predictions. This suggests that higher-order structure beyond degree-degree correlations suppresses outbreaks in real-world networks.

At the same time, we notice that the theory does accurately predict the seed degree at which the outbreak size peaks in these real-world networks. We explain this as follows. In the log-normal joint distribution model, we have seen that low-degree nodes (degree near $1/\phi$) are influential when the network is either highly assortative or highly disassortative, and a similar peak phenomenon also exists in assortative synthetic networks. In real-world networks, on the other hand, the presence of higher-order structure means that there is a mixing of more assortative and more disassortative network elements. Our analysis of the log-normal model illustrates that both of these elements, assortative and disassortative, can contribute to a peak of influence around $1/\phi$.
Thus, even if the theory does not reproduce the size of cascades well in real-world networks, it correctly identifies the location of the peak and thereby identifies the influential nodes in the network.

7.5 Discussion

In this Chapter, we have explored how the structure of networks affects the dynamics of cascades. We have used a tree-like approximation to calculate the expected size of cascades spreading on networks according to the Watts threshold model. The mathematical formulation allows us to explicitly model the impact of degree correlations, specified by the joint degree distribution $e(k,k')$, on the size of outbreaks triggered by a single node. Global outbreaks are more likely in strongly assortative networks, where the degrees of connected nodes are highly correlated. In such networks, nodes that are vulnerable to changing state tend to be connected to other vulnerable nodes, forming a giant connected component on which cascades spread. Surprisingly, strongly disassortative networks are also unstable to global outbreaks, but only when enough vulnerable nodes are connected. Outside of this important block of assortativity, the probability mass of the joint degree distribution matrix is dominated by non-spreading nodes, leading to an overall negative assortativity. We have introduced a new measure, the vulnerable node degree correlation, that better captures the size of outbreaks of the Watts threshold model.

We have also explored the role of seed node degree in cascades. While low-degree nodes are the most vulnerable, high-degree nodes are typically the most influential as they can trigger the largest outbreaks. On the other hand, in sufficiently assortative as well as sufficiently disassortative networks, the low-degree nodes turn out to be the most influential. We have found that this, too, relates closely to the correlation between degrees of vulnerable nodes.
When that correlation is sufficiently high, which corresponds to both the highly assortative and the highly disassortative case, the vulnerable low-degree nodes are themselves able to initiate the largest cascades.

Our theory, which is based only on degree correlations of connected nodes, accurately predicts the seed degree at which a local maximum in cascade size can occur in a real-world network: near the inverse threshold $1/\phi$. However, for the theory to correctly predict the cascade size itself, these networks must be rewired so as to randomize any structure beyond the joint degree distribution. We know that local assortativity is more heterogeneous in real-world networks than in synthetic ones, with certain parts of a given network being assortative and other parts being disassortative. This suggests that manipulating a network's higher-order structure may allow us to tune the size of cascades, with an appropriate rewiring strategy offering a valuable tool for tailoring its stability to outbreaks.

Chapter 8
Conclusion

The structure of a network constrains the flow of information and ultimately affects how information is distributed within it. In our theoretical frameworks, we link the information propagation process with structural measures of the networks. Specifically, we provided a mathematical proof connecting the "majority illusion" phenomenon with degree-attribute correlation and network assortativity. [Lerman et al., 2016] In further investigating the strength of the strong friendship paradox, we discovered that nodes in real-world networks tend to befriend nodes with similar degrees or attributes. [Altenburger and Ugander, 2018] This leads to the definitions of neighbor assortativity and neighbor-neighbor correlation. The newly proposed measures have been successfully integrated into the mathematical model, which improves the accuracy of the predictions of the strength of the paradox.
[Wu et al., 2017] The dynamic process, which may be understood as a multi-step version of a biased propagation, involves more complex interactions between structure and information distribution. The structural effects on the dynamic process of complex contagion are studied using message-passing and percolation approaches. [Cohen et al., 2000, Watts, 2002, Morone and Makse, 2015] Some lower-level network measures are embedded into percolation theory to improve its predictions. [Galstyan and Cohen, 2007, Goltsev et al., 2008, Dodds and Payne, 2009] In this work, we proposed a new measure, the vulnerable correlation, which serves as a correction to the widely known assortativity measure as a predictor of the size of cascades from a single seed. [Wu et al., 2018]

Through our theoretical analysis and statistical models, we have explained local information biases through characteristic phenomena such as the strong friendship paradox and the majority illusion. These networks are assumed to be friendship-based undirected graphs, in which information is free to flow through edges in both directions. In the real world, a significant amount of information can flow in only one direction along an edge, which forms a directed graph. There is evidence that local information biases also exist in such graphs. The theory presented in this work can be extended to model directed graphs.

Generating function approaches have proven successful in explaining percolation and complex contagion problems. However, there are currently few studies that integrate higher-order structural measures into the theory. In this work, we showed that higher-order structures affect the final cascade size, and that neighbors are usually correlated with each other. There is a clear rationale for incorporating the collective behavior of neighbors into the mathematical formulation.
Appendix A
Symbols

A.1 Network measures

$N$: number of nodes in the network
$E$: number of edges in the network
$\langle k \rangle$: average degree of nodes in the network
$p(k)$: node degree distribution of the network
$q(k)$: neighbor degree distribution of the network
$g(k)$: connected triplet center degree distribution of the network
$e(k,k')$: 2K joint degree distribution of the edges in the network
$t(k'_i, k, k'_j)$: 3K joint degree distribution of the connected triplets in the network
$r$: network assortativity, correlation of the degrees between two connected nodes
$\rho_{kx}$: correlation between node degrees and attributes in the network
$\rho_x(k)$: correlation between two neighbor indicator functions of a degree $k$ node
$c(k)$: neighbor assortativity, correlation of the degrees between two neighbor degrees of a degree $k$ node

A.2 Generating functions and cascading

$\rho_{\mathrm{vul}}$: vulnerable correlation
$\langle s_{k'} \rangle$: expected subcritical outbreak size generated by a degree $k'$ seed
$H_{0,k}$: generating function of the size of the vulnerable cluster that a degree $k$ node belongs to
$H_{1,k}$: generating function of the size of the vulnerable cluster that a degree $k$ node connects to
$S_k$: expected supercritical outbreak size generated by a degree $k$ seed
$S$: expected supercritical outbreak size generated by a random seed
$(\cdot)_{k,n}$: probability that a degree $k$ seed generates a local cascade of size $n$
$g_k$: probability that a degree $k$ seed generates a global cascade

Appendix B
Data and Networks

B.1 Synthetic networks

We used the configuration model [Bender and Canfield, 1978, Molloy and Reed, 1995], as implemented by the SNAP library [Leskovec and Krevl, 2014], to create a scale-free network with a specified degree sequence. We generated a degree sequence from a power law of the form $p(k) \sim k^{-\alpha}$. Here, $p_k$ is the fraction of nodes that have $k$ half-edges. The configuration model proceeded by linking a pair of randomly chosen half-edges to form an edge. The linking procedure was repeated until all half-edges had been used up or there were no more ways to form an edge.
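The half-edge pairing described above can be illustrated with a self-contained sketch. It does not use the SNAP library and is not the code used for this dissertation: the function names, the degree cutoffs `k_min` and `k_max`, and the choice to silently discard self-loops and repeated edges are assumptions of the example, so the realized degrees can fall slightly below the drawn ones.

```python
import random


def power_law_degree_sequence(n, alpha, k_min, k_max, rng):
    """Draw n degrees with P(k) proportional to k**(-alpha) on k_min..k_max."""
    ks = list(range(k_min, k_max + 1))
    weights = [k ** (-alpha) for k in ks]
    degrees = rng.choices(ks, weights=weights, k=n)
    if sum(degrees) % 2:
        degrees[0] += 1  # the number of half-edges must be even
    return degrees


def configuration_model(degrees, rng):
    """Pair half-edges uniformly at random and return a sorted edge list.

    Simplification (an assumption of this sketch): pairs that would form a
    self-loop or repeat an existing edge are dropped rather than re-drawn.
    """
    stubs = [node for node, k in enumerate(degrees) for _ in range(k)]
    rng.shuffle(stubs)
    edges = set()
    for u, v in zip(stubs[0::2], stubs[1::2]):
        if u != v:
            edges.add((min(u, v), max(u, v)))
    return sorted(edges)
```

A typical call would be `rng = random.Random(1)`, then `degrees = power_law_degree_sequence(10_000, 2.4, 1, 100, rng)` and `edges = configuration_model(degrees, rng)`; the cutoff of 100 here is arbitrary and only serves the example.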
To create Erdős-Rényi-type networks, we started with $N = 10{,}000$ nodes and linked pairs at random with some fixed probability. These probabilities were chosen to produce an average degree similar to the average degree of the scale-free networks.

B.2 Real-world networks

In this dissertation we also applied our statistical models to real-world networks from various domains. These networks vary in size from 34 nodes (Karate Club) to almost 4M nodes (LiveJournal), and in assortativity from 0.63 (ArXiv HepPh) to $-0.4756$ (Karate Club). The basic properties of the real-world networks we used in this dissertation are listed in Table B.1.

Networks used in Chapter 4, The "Majority Illusion", include: collaboration network of high energy physicists (ArXiv HepTh), human protein-protein interactions network (Reactome), Digg follower graph (Digg), Enron email network (Enron), Twitter user voting graph (Twitter), and a network of political blogs (Political blogs).

Networks used in Chapter 5, Strong Friendship Paradox, include: friendship links on LiveJournal blogging site (LiveJournal), community structure on Youtube (Youtube), Internet topology from Skitter project (Skitter), Google web hyperlink graph (Google), scientific citations graph (ArXiv HepCit), and semantic relationships between English words (WordNet).

Networks used in Chapter 7, Network Structure and Dynamics, include: Facebook friendship network (Facebook), Digg follower graph (Digg), human protein-protein interactions network (Reactome), two collaboration networks of high energy physicists (ArXiv HepPh, ArXiv HepTh), and semantic relationships between English words (WordNet).
Sources of the network datasets are:

SNAP: Stanford network analysis project (LiveJournal, Skitter, Youtube, Google, ArXiv HepCit, ArXiv HepPh, ArXiv HepTh, Facebook) [Leskovec and Krevl, 2014]
WordNet: A lexical database for English (WordNet) [Fellbaum and Tengi, 2005]
Enron email network (Enron) [http://www.cs.cmu.edu/~enron/]
Digg follower graph (Digg) [http://dx.doi.org/10.6084/m9.figshare.2062467]
Twitter user voting graph (Twitter) [Smith et al., 2013]
Reactome project (Reactome) [http://www.reactome.org/pages/download-data/]
Network of political blogs (Political blogs) [http://www-personal.umich.edu/~mejn/netdata/]
Zachary's Karate Club (Karate Club) [Zachary, 1977]

| Network | Type | Nodes | Edges | $\langle k \rangle$ | Assortativity |
|---|---|---|---|---|---|
| LiveJournal | Social | 3,997,962 | 34,681,189 | 17.35 | 0.045145 |
| Skitter | Internet | 1,696,415 | 11,095,298 | 13.08 | -0.0814 |
| Youtube | Social | 1,134,890 | 2,987,624 | 5.27 | -0.0369 |
| Google | Hyperlink | 875,713 | 4,322,051 | 9.87 | -0.0551 |
| WordNet | Semantic | 146,005 | 656,999 | 9.00 | -0.0623 |
| Enron email | Social | 36,692 | 367,662 | 20.04 | -0.1108 |
| ArXiv HepCit | Citation | 34,546 | 420,877 | 24.37 | -0.0059 |
| Digg | Social | 27,567 | 175,892 | 12.76 | 0.1660 |
| Twitter | Social | 23,025 | 336,262 | 29.21 | -0.1375 |
| ArXiv HepPh | Co-authorship | 12,008 | 118,489 | 19.74 | 0.6322 |
| ArXiv HepTh | Co-authorship | 9,877 | 25,998 | 5.26 | 0.2679 |
| Reactome | Biological | 6,327 | 146,160 | 46.64 | 0.2449 |
| Facebook | Social | 4,039 | 88,234 | 43.69 | 0.1660 |
| Political blogs | Social | 1,490 | 19,090 | 25.62 | -0.2212 |
| Karate Club | Social | 34 | 78 | 6.88 | -0.4756 |

Table B.1: List of real-world networks and their basic profiles. Note that directed edges, if they exist, are treated as undirected edges.

Reference List

Dimitris Achlioptas, Aaron Clauset, David Kempe, and Cristopher Moore. On the bias of traceroute sampling: or, power-law degree distributions in regular graphs. In Proc. 37th ACM Symposium on Theory of Computing (STOC), 2005.
Lada A Adamic and Natalie Glance. The political blogosphere and the 2004 US election: Divided they blog.
In Proceedings of the 3rd international workshop on Link discovery, pages 36-43. ACM, 2005.
Kristen M. Altenburger and Johan Ugander. Monophily in social networks introduces similarity among friends-of-friends. Nature Human Behaviour, 2:284-290, 2018.
J. S. Baer, A. Stacy, and M. Larimer. Biases in the perception of drinking norms among college students. Journal of Studies on Alcohol, 52(6):580-586, 1991.
Albert-László Barabási and Márton Pósfai. Network Science. Cambridge University Press, Cambridge, 2016. ISBN 1107076269.
Jonathan M. Bearak. Casual contraception in casual sex: Life-cycle change in undergraduates' sexual behavior in hookups. Social Forces, 93(2):483-513, 2014.
Edward A Bender and E Rodney Canfield. The asymptotic number of labeled graphs with given degree sequences. Journal of Combinatorial Theory, Series A, 24(3):296-307, 1978.
Alan D Berkowitz. An overview of the social norms approach. Changing the culture of college drinking: A socially situated health communication campaign, pages 193-214, 2005.
Luís M. A. Bettencourt, Ariel Cintrón-Arias, David I. Kaiser, and Carlos Castillo-Chávez. The power of a good idea: Quantitative modeling of the spread of ideas from epidemiological models. Physica A: Statistical Mechanics and its Applications, 364:513-536, 2006.
Marián Boguñá, Romualdo Pastor-Satorras, and Alessandro Vespignani. Cut-offs and finite size effects in scale-free networks. European Physical Journal B, 38(2):205-209, 2004.
Duncan S. Callaway, Mark E. J. Newman, Steven H. Strogatz, and Duncan J. Watts. Network robustness and fragility: Percolation on random graphs. Physical Review Letters, 85:5468-5471, 2000.
Yang Cao and Sheldon M. Ross. The friendship paradox. Mathematical Scientist, 41(1):61-64, 2016.
Dorwin Cartwright and Frank Harary. Structural balance: A generalization of Heider's theory. Psychological Review, 63(5):277-293, 1956.
Damon Centola. The spread of behavior in an online social network experiment.
Science, 329(5996):1194-1197, 2010.
Damon Centola and Andrea Baronchelli. The spontaneous emergence of conventions: An experimental study of cultural evolution. Proceedings of the National Academy of Sciences, 112(7):1989-1994, 2015. ISSN 1091-6490.
Damon Centola, Víctor M. Eguíluz, and Michael W. Macy. Cascade dynamics of complex propagation. Physica A: Statistical Mechanics and its Applications, 374(1):449-456, 2007.
Nicholas A. Christakis and James H. Fowler. Social network sensors for early detection of contagious outbreaks. PLoS ONE, 5(9):e12948, 2010.
Reuven Cohen, Keren Erez, Daniel ben Avraham, and Shlomo Havlin. Resilience of the internet to random breakdowns. Physical Review Letters, 85:4626, 2000.
Reuven Cohen, Shlomo Havlin, and Daniel ben Avraham. Efficient immunization strategies for computer networks and populations. Physical Review Letters, 91:247901, 2003.
Peter S. Dodds and Joshua L. Payne. Analysis of a threshold model of social contagion on degree-correlated networks. Physical Review E, 79:066115, 2009.
Florian Dörfler, Michael Chertkov, and Francesco Bullo. Synchronization in complex oscillator networks and smart grids. Proceedings of the National Academy of Sciences, 110(6):2005-2010, 2013.
Young-Ho Eom and Hang-Hyun Jo. Generalized friendship paradox in complex networks: The case of scientific collaboration. Scientific Reports, 4, 2014. ISSN 2045-2322.
Giuseppe Facchetti, Giovanni Iacono, and Claudio Altafini. Computing global structural balance in large-scale signed social networks. Proceedings of the National Academy of Sciences, 108(52):20953-20958, 2011.
Scott L. Feld. Why your friends have more friends than you do. American Journal of Sociology, 96(6):1464-1477, 1991.
Scott L. Feld and Bernard Grofman. Variation in class size, the class size paradox, and some consequences for students. Research in Higher Education, 6(3), 1977.
Christiane Fellbaum and Randee Tengi. Wordnet: A lexical database of English.
http://wordnet.princeton.edu/, 2005.
Aram Galstyan and Paul Cohen. Cascading dynamics in modular networks. Physical Review E, 75:036109, 2007.
Jianxi Gao, Baruch Barzel, and Albert-László Barabási. Universal resilience patterns in complex networks. Nature, 530:307, 2016.
Manuel Garcia-Herranz, Esteban Moro, Manuel Cebrian, Nicholas A. Christakis, and James H. Fowler. Using friends as sensors to detect global-scale contagious outbreaks. PLoS ONE, 9(4):e92413, 2014.
James P. Gleeson. Cascades on correlated and modular random networks. Physical Review E, 77:046117, 2008.
Alexander V. Goltsev, Sergey N. Dorogovtsev, and José F. F. Mendes. Percolation on correlated networks. Physical Review E, 78:051105, 2008.
Mark Granovetter. Threshold models of collective behavior. American Journal of Sociology, 83(6):1420-1443, 1978.
Thomas Grund. Why your friends are more important and special than you think. Sociological Science, 1:128-140, 2014. ISSN 23306696.
Sidharth Gupta, Xiaoran Yan, and Kristina Lerman. Structural properties of ego networks. In International Conference on Social Computing, Behavioral Modeling and Prediction, 2015.
Andrew G Haldane and Robert M May. Systemic risk in banking ecosystems. Nature, 469(7330):351-355, 2011.
Nathan Hodas, Farshad Kooti, and Kristina Lerman. Friendship paradox redux: Your friends are more interesting than you. In Proc. 7th Int. AAAI Conf. on Weblogs And Social Media, 2013.
Tad Hogg and Kristina Lerman. Social dynamics of Digg. EPJ Data Science, 1(5), 2012.
Norman P. Hummon and Patrick Doreian. Some dynamics of social balance processes: Bringing Heider back into balance theory. Social Networks, 25(1):17-49, 2003.
Hang-Hyun Jo and Young-Ho Eom. Generalized friendship paradox in networks with tunable degree-attribute correlation. Physical Review E, 90:022809, 2014.
G. Joshi-Tope, Marc Gillespie, Imre Vastrik, Peter D'Eustachio, Esther Schmidt, Bernard de Bono, Bijay Jassal, G. R. Gopinath, G. R. Wu, Lisa Matthews, et al.
Reactome: A knowledgebase of biological pathways. Nucleic Acids Research, 33(suppl 1):D428-D432, 2005.
Brian Karrer, Mark E. J. Newman, and Lenka Zdeborová. Percolation on sparse networks. Physical Review Letters, 113:208702, 2014.
David Kempe, Jon Kleinberg, and Eva Tardos. Maximizing the spread of influence through a social network. KDD '03: Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining, pages 137-146, 2003.
William O. Kermack and Anderson G. McKendrick. A contribution to the mathematical theory of epidemics. Proceedings of the Royal Society A, 115(772):700-721, 1927.
David A. Kim, Alison R. Hwong, Derek Stafford, D. Alex Hughes, A. James O'Malley, James H. Fowler, and Nicholas A. Christakis. Social network targeting to maximise population behaviour change: A cluster randomised controlled trial. The Lancet, 2015.
James A. Kitts. Egocentric bias or information management? Selective disclosure and the social roots of norm misperception. Social Psychology Quarterly, 66(3):222-237, 2003.
Jon M. Kleinberg. Navigation in a small world. Nature, 406(6798):845, 2000.
Bryan Klimt and Yiming Yang. The Enron corpus: A new dataset for email classification research. In Proceedings of European Conference on Machine Learning, pages 217-226, 2004.
Farshad Kooti, Nathan O. Hodas, and Kristina Lerman. Network weirdness: Exploring the origins of network paradoxes. In International Conference on Weblogs and Social Media (ICWSM), 2014.
Pavel L. Krapivsky and Sidney Redner. Dynamics of majority rule in two-state interacting spin systems. Physical Review Letters, 90:238701, 2003.
Bibb Latané and John M. Darley. Bystander 'apathy'. American Scientist, 57(2):244-268, 1969.
Eun Lee, Fariba Karimi, Hang-Hyun Jo, Markus Strohmaier, and Claudia Wagner. Homophily explains perception biases in social networks, 2017.
Kristina Lerman, Xiaoran Yan, and Xin-Zeng Wu. The "majority illusion" in social networks. PLoS ONE, 11(2):e0147617, 2016.
Jure Leskovec and Andrej Krevl. SNAP Datasets: Stanford large network dataset collection. http://snap.stanford.edu/data, 2014.
Jure Leskovec, Jon Kleinberg, and Christos Faloutsos. Graph evolution: Densification and shrinking diameters. ACM Transactions on Knowledge Discovery from Data (TKDD), 1(1):2, 2007.
Thomas M. Liggett. Stochastic Interacting Systems: Contact, Voter and Exclusion Processes. Springer-Verlag, Berlin, 1 edition, 1999. ISBN 3540659951.
Priya Mahadevan, Dmitri Krioukov, Kevin Fall, and Amin Vahdat. Systematic topology analysis and generation using degree correlations. In ACM SIGCOMM Computer Communication Review, volume 36, pages 135-146. ACM, 2006.
Gary Marks and Norman Miller. Ten years of research on the false-consensus effect: An empirical and theoretical review. Psychological Bulletin, 102(1):72, 1987.
Miller Mcpherson, Lynn Smith-Lovin, and James M. Cook. Birds of a feather: Homophily in social networks. Annual Review of Sociology, 27:415-444, 2001.
Dale T. Miller and Deborah A. Prentice. Collective errors and errors about the collective. Personality and Social Psychology Bulletin, 20(5):541-550, 1994.
Bjarke Mønsted, Piotr Sapieżyński, Emilio Ferrara, and Sune Lehmann. Evidence of complex contagion of information in social media: An experiment using Twitter bots. PLoS ONE, 12(9):e0184148, 2017.
Michael Molloy and Bruce Reed. A critical point for random graphs with a given degree sequence. Random structures & algorithms, 6(2-3):161-180, 1995.
Flaviano Morone and Hernan A. Makse. Influence maximization in complex networks through optimal percolation. Nature, 524:65, 2015.
Mark E. J. Newman. Assortative mixing in networks. Physical Review Letters, 89:208701, 2002.
Mark E. J. Newman. The structure and function of complex networks. SIAM Review, 45:167-256, 2003.
Joshua L. Payne, Peter S. Dodds, and Margaret J. Eppstein. Information cascades on degree-correlated random networks. Physical Review E, 80:026125, 2009.
Deborah A.
Prentice and Dale T. Miller. Pluralistic ignorance and alcohol use on campus: Some consequences of misperceiving the social norm. Journal of Personality and Social Psychology, 64(2):243-256, 1993.
Craig M. Rawlings and Noah E. Friedkin. The structural balance theory of sentiment networks: Elaboration and test. American Journal of Sociology, 123(2):510-548, 2017.
Everett M. Rogers. Diffusion of Innovations, 5th Edition. Free Press, 5th edition, 2003. ISBN 0743222091.
M. Puck Rombach, Mason A. Porter, James H. Fowler, and Peter J. Mucha. Core-periphery structure in networks. SIAM Journal on Applied Mathematics, 74(1):167-190, 2014.
Matthew J. Salganik, Peter S. Dodds, and Duncan J. Watts. Experimental study of inequality and unpredictability in an artificial cultural market. Science, 311(5762):854-856, 2006.
Matthew J. Salganik, Maeve B. Mello, Alexandre H. Abdo, Neilane Bertoni, Dimitri Fazito, and Francisco I. Bastos. The game of contacts: Estimating the social visibility of groups. Social Networks, 33(1):70-78, 2011.
Thomas C. Schelling. Hockey helmets, concealed weapons, and daylight saving: A study of binary choices with externalities. The Journal of Conflict Resolution, 17(3), 1973.
Laura M. Smith, Linhong Zhu, Kristina Lerman, and Zornitsa Kozareva. The role of social media in the discussion of controversial topics. In ASE/IEEE International Conference on Social Computing, 2013.
Amos Tversky and Daniel Kahneman. Availability: A heuristic for judging frequency and probability. Cognitive Psychology, 5(2):207-232, 1973.
Johan Ugander, Brian Karrer, Lars Backstrom, and Cameron Marlow. The anatomy of the Facebook social graph, 2011.
Thomas W. Valente. Network models of the diffusion of innovations (Quantitative methods in communication subseries). Hampton Press (NJ), 1995. ISBN 1881303225.
Duncan J. Watts. A simple model of global cascades on random networks. Proceedings of the National Academy of Sciences, 99(9):5766-5771, 2002.
Duncan J.
Watts and Steven H. Strogatz. Collective dynamics of 'small-world' networks. Nature, 393(6684):440-442, 1998.
Xin-Zeng Wu, Allon G. Percus, and Kristina Lerman. Neighbor-neighbor correlations explain measurement bias in networks. Scientific Reports, 7:5576, 2017.
Xin-Zeng Wu, Peter G. Fennell, Allon G. Percus, and Kristina Lerman. Degree correlations amplify the growth of cascades in networks. Physical Review E, 98:022321, 2018.
H. Peyton Young. The dynamics of social innovation. Proceedings of the National Academy of Sciences, 108(Supplement 4):21285-21291, 2011. ISSN 1091-6490.
Wayne W. Zachary. An information flow model for conflict and fission in small groups. Journal of Anthropological Research, 33(4):452-473, 1977.
Xiao Zhang, Travis Martin, and Mark E. J. Newman. Identification of core-periphery structure in networks. Physical Review E, 91:032803, 2015.
Ezra W. Zuckerman and John T. Jost. What makes you think you're so popular? Self-evaluation maintenance and the subjective side of the "friendship paradox". Social Psychology Quarterly, 64(3):207-223, 2001.
Abstract
Complex systems can be represented as networks of interacting entities, or nodes. In many physical models on networks, the dynamics depend only on interactions between a node and its nearest neighbors. Such local interactions among nodes in a complex network can lead to an astounding array of global behaviors. Examples include viral outbreaks and social contagions in social networks, cascading failures in power grids and financial networks, synchronization of coupled oscillators, and opinion dynamics and consensus formation in human groups.

A node's local view of the network, however, may be systematically different from the global ground truth, which may affect global phenomena. Social scientists have identified one source of bias, the friendship paradox, which states that, on average, nodes have fewer connections (smaller degree) than their neighbors. Recently, more interesting variations of the paradox were discovered. The strong friendship paradox states that most nodes have fewer connections than most of their neighbors. Unlike the original friendship paradox and its generalizations to attributes other than degree, the strong friendship paradox does not arise trivially as a result of sampling from skewed distributions.

The strong friendship paradox can dramatically distort local information in a network, leading to the "majority illusion" paradox, in which a globally rare attribute may be dramatically overrepresented in the local neighborhoods of many nodes. As a consequence, many nodes will observe that the majority of their neighbors have the attribute, which can affect not only an individual node's behavior but also the collective phenomena unfolding on the network. Two key network properties determine the strength of the paradox in a network:
1. nodes with high connectivity are more likely to bear the attribute;
2. nodes with high connectivity prefer to connect to nodes with low connectivity.
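As a rough illustration of the quantities named above, the following Python sketch measures the friendship paradox, the strong friendship paradox, and the majority illusion on a tiny graph. The toy graph, the function names, the use of the median to express "most of their neighbors", and the strict "more than half" rule are all assumptions made for this example; they are not taken from the dissertation.

```python
from statistics import mean, median


def build_adjacency(edges):
    """Return an undirected adjacency map {node: set of neighbors}."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def friendship_paradox_stats(adj):
    """Return (mean degree, mean neighbor degree, strong-paradox fraction)."""
    degree = {n: len(nbrs) for n, nbrs in adj.items()}
    mean_degree = mean(degree.values())
    # Average, over nodes, of the mean degree of each node's neighbors.
    mean_neighbor_degree = mean(
        mean(degree[m] for m in adj[n]) for n in adj
    )
    # Strong paradox (assumed reading): a node's degree is below the
    # median degree of its neighbors.
    strong = sum(
        degree[n] < median(degree[m] for m in adj[n]) for n in adj
    )
    return mean_degree, mean_neighbor_degree, strong / len(adj)


def majority_illusion_fraction(adj, active):
    """Fraction of nodes with more than half of their neighbors active."""
    fooled = sum(
        sum(m in active for m in adj[n]) > len(adj[n]) / 2 for n in adj
    )
    return fooled / len(adj)


# Hypothetical toy graph: hub 0 linked to five low-degree nodes, plus edge 1-2.
edges = [(0, 1), (0, 2), (0, 3), (0, 4), (0, 5), (1, 2)]
adj = build_adjacency(edges)

mean_k, mean_nbr_k, strong_fraction = friendship_paradox_stats(adj)
# Same global prevalence (one active node out of six), different placement.
hub_active = majority_illusion_fraction(adj, active={0})
leaf_active = majority_illusion_fraction(adj, active={3})
print(mean_k, mean_nbr_k, strong_fraction, hub_active, leaf_active)
```

In this toy case a single active node is seen as a local majority by half of the network when it is the hub and by nobody when it is a leaf, which is the role the two properties listed above play: the attribute sits on a high-degree node, and that node is linked mainly to low-degree nodes.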
By using these two correlations, we can quantify the global visibility of the attribute in a complex network.

We also investigate the strength of the strong friendship paradox to estimate how much information distortion a node with a given connectivity experiences. Accurately modeling this strength requires a new structural property beyond the connectivity preference between two nodes mentioned above: the correlation between a node's neighbors, which we define as the neighbor assortativity.

These paradoxes can influence the outcome of social contagion. To verify this, we employ Watts' threshold model, in which a node becomes active when some fraction of its neighbors are active. A mathematical theory based on generating functions and a tree-like assumption describes the size of contagious outbreaks. In most cases, there is a critical threshold that controls whether the outbreak is local or global (i.e., reaches a sizable fraction of the network).

Interestingly, we find that the outbreak behavior is non-monotonic with respect to the network's assortativity. To explain this, we show that the phenomenon is directly related to the weight of links between nodes that can be activated by a single neighbor. The vulnerable correlation can then be defined in the same manner. The outbreak threshold increases monotonically with the vulnerable correlation.

Most measures that define the properties of a network are based on the connectivity distribution of a single node or of a pair of connected nodes. Higher-order properties of networks have received little discussion so far. Our work elucidates how these properties affect the local information bias a node experiences and its consequences for dynamic phenomena in networks, including the dynamics of spreading processes.
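The threshold dynamics mentioned in the abstract can be sketched in a few lines of Python. This is a minimal simulation under stated assumptions, not the generating-function theory of the dissertation: every node shares one threshold, a node activates once the active fraction of its neighbors reaches that threshold, activation is permanent, and the six-node ring and the name `watts_cascade` are hypothetical choices made for illustration.

```python
def watts_cascade(adj, seeds, threshold):
    """Run a threshold cascade to its fixed point and return the active set.

    adj       -- adjacency map {node: set of neighbors}
    seeds     -- initially active nodes
    threshold -- fraction of active neighbors needed to activate a node
    """
    active = set(seeds)
    changed = True
    # Activation is permanent, so repeated sweeps converge to the same
    # final set regardless of the order in which nodes are visited.
    while changed:
        changed = False
        for node, nbrs in adj.items():
            if node in active or not nbrs:
                continue
            active_fraction = sum(m in active for m in nbrs) / len(nbrs)
            if active_fraction >= threshold:
                active.add(node)
                changed = True
    return active


# Hypothetical example: a ring of six nodes, each with exactly two neighbors.
n = 6
ring = {i: {(i - 1) % n, (i + 1) % n} for i in range(n)}

# One active neighbor out of two gives a fraction of 0.5.
global_outbreak = watts_cascade(ring, seeds={0}, threshold=0.5)
local_outbreak = watts_cascade(ring, seeds={0}, threshold=0.6)
print(len(global_outbreak), len(local_outbreak))
```

On the ring every node can be activated by a single neighbor when the threshold is 0.5, so one seed reaches the whole network; raising the threshold slightly above 0.5 leaves the seed isolated. This is a small-scale picture of the local versus global outbreak distinction described above.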
Asset Metadata
Creator
Ngo, Shin-Chieng (author), Wu, Xin-Zeng (author)
Core Title
Global consequences of local information biases in complex networks
School
College of Letters, Arts and Sciences
Degree
Doctor of Philosophy
Degree Program
physics
Publication Date
11/12/2018
Defense Date
08/08/2018
Publisher
University of Southern California (original), University of Southern California. Libraries (digital)
Tag
assortativity, cascades, complex systems, degree correlations, network science, network structures, networks, OAI-PMH Harvest, social contagion, social networks
Format
application/pdf (imt)
Language
English
Contributor
Electronically uploaded by the author (provenance)
Advisor
Nakano, Aiichiro (committee chair), de la Haye, Kayla (committee member), Haas, Stephan (committee member), Lerman, Kristina (committee member), Saleur, Hubert (committee member)
Creator Email
energiya@gmail.com,xinzengw@usc.edu
Permanent Link (DOI)
https://doi.org/10.25549/usctheses-c89-104051
Unique identifier
UC11676716
Identifier
etd-WuXinZeng-6955.pdf (filename),usctheses-c89-104051 (legacy record id)
Legacy Identifier
etd-WuXinZeng-6955.pdf
Dmrecord
104051
Document Type
Dissertation
Rights
Wu, Xin-Zeng; Ngo, Shin-Chieng
Type
texts
Source
University of Southern California (contributing entity), University of Southern California Dissertations and Theses (collection)
Access Conditions
The author retains rights to his/her dissertation, thesis or other graduate work according to U.S. copyright law. Electronic access is being provided by the USC Libraries in agreement with the a...
Repository Name
University of Southern California Digital Library
Repository Location
USC Digital Library, University of Southern California, University Park Campus MC 2810, 3434 South Grand Avenue, 2nd Floor, Los Angeles, California 90089-2810, USA