Network structures: graph theory, statistics, and neuroscience applications

NETWORK STRUCTURES: GRAPH THEORY, STATISTICS, AND NEUROSCIENCE APPLICATIONS

by

Yu-Teng Chang

A Dissertation Presented to the
FACULTY OF THE USC GRADUATE SCHOOL
UNIVERSITY OF SOUTHERN CALIFORNIA
In Partial Fulfillment of the
Requirements for the Degree
DOCTOR OF PHILOSOPHY
(ELECTRICAL ENGINEERING)

May 2012

Copyright 2012 Yu-Teng Chang

Dedication

To all those who support me in my life, especially my family: my parents Chun-Yen Chang and Shu-Fen Lin, my brother Jimmy Hsiang-Chun Chang, and my wife I-Ting Chen.

Acknowledgements

Even though only my name appears on the cover of this dissertation, I would never have been able to complete this research without the contributions of a great many people. I owe my gratitude to them; they have not only made this dissertation possible, but also made my graduate experience one that I will cherish forever.

I would like to express my deepest gratitude to my advisor, Dr. Richard M. Leahy. I am extremely fortunate to have an advisor who gave me the freedom to explore ideas and thoughts on my own, and at the same time excellent guidance and care when my steps faltered on the road of research. His patience and support helped me overcome academic challenges and many other tough situations.

I am truly thankful to Dr. Dimitrios Pantazis, who spent many hours discussing and brainstorming with me with full enthusiasm and knowledge. I am deeply grateful for his extremely supportive attitude toward my research, his consistently useful advice and suggestions, and his careful reading of and thoughtful comments on countless revisions of all my manuscripts.

I would like to thank Dr. C.-C. Jay Kuo, Dr. Bosco Tjan, and Dr. Bhaskar Krishnamachari. They gave very insightful comments and constructive suggestions throughout my graduate years. They helped me not only focus my ideas, but also explore new data sets and new aspects of research. I am grateful to them for holding me to a high standard of research.
I would like to thank both the current and former lab members under Dr. Leahy, especially Dr. Quanzheng Li, Dr. Justin Haldar, and Dr. Anand A. Joshi. They always provided very supportive help and were always willing to discuss with me. They also shared their own experience of doing research and of dealing with both life and academic challenges, which helped me a great deal. I would also like to give my gratitude to Dr. Brian H. Hui, my office-mate, who taught me the attitude of knowledge pursuit and the spirit of hard work. I greatly value their friendship and appreciate all they have done for and with me.

Finally, I would like to give very special thanks to my family. My parents always supported me and encouraged me to pursue my goals with their best love and wishes. My brother is also a graduate student in the United States, and we constantly shared research experience, new knowledge, and graduate life. I would like to thank my wife, I-Ting Chen. She was always there cheering me up and stood by me through the good times and bad. I deeply appreciate her love for and belief in me.

Table of Contents

Dedication
Acknowledgements
List of Tables
List of Figures
Abstract
Chapter 1: Introduction
Chapter 2: Modularity-Based Partitioning for Undirected Graphs
  2.1 Modularity and Expected Graph Models
    2.1.1 Modularity
    2.1.2 Random Network Partially Conditioned on Node Degrees
    2.1.3 Random Network Fully Conditioned on Node Degrees
    2.1.4 Gaussian Random Network
    2.1.5 Gaussian Random Network with Independent Identically Distributed Edges
    2.1.6 Non-Gaussian Random Networks and BLUE
  2.2 Partition Implementation
  2.3 Partition Properties
    2.3.1 Node Degrees
    2.3.2 Network Topology
    2.3.3 Isolated Nodes
    2.3.4 Resolution Limit
  2.4 Partition Evaluation
  2.5 Results
    2.5.1 Gaussian Random Networks
    2.5.2 Binary Random Networks
    2.5.3 Real World Networks
  2.6 Discussion
  2.7 Conclusion
  2.8 Supplementary Materials
    2.8.1 Random Network Conditioned on Node Degrees
    2.8.2 Gaussian Random Network with Independent Identically Distributed Edges
    2.8.3 Derivation of Best Linear Unbiased Estimator (BLUE)
Chapter 3: Modularity-Based Partitioning for Directed Graphs
  3.1 Method
    3.1.1 Null Models for Directional Graphs
      3.1.1.1 Conditional Expected Model
      3.1.1.2 Directed Gaussian Random Network and BLUE
    3.1.2 Maximization of Information Flow and Modularity
    3.1.3 Tikhonov Weighted Hybrid Clustering
    3.1.4 Hybrid Partitioning Implementation
  3.2 Results
    3.2.1 MEG Simulated Networks
    3.2.2 MEG Resting Networks
    3.2.3 Hybrid Network Partitioning
  3.3 Conclusion
Chapter 4: Assessing the Significance of Modularity-Based Graph Partitioning
  4.1 Introduction
  4.2 Method
    4.2.1 Modularity and Null Models
    4.2.2 Distribution of the Largest Eigenvalue
    4.2.3 Approximation with Gamma Distribution Family
    4.2.4 Fitting with Different Network Sizes
  4.3 Results
  4.4 Conclusions
Chapter 5: On the Largest Eigenvalue for Modularity-Based Graph Partitioning
  5.1 Introduction
  5.2 Methods
    5.2.1 Gaussian Random Ensembles
    5.2.2 Modularity
    5.2.3 Modularity Significance and Gaussian Ensembles
  5.3 Results
  5.4 Conclusion
Chapter 6: Significance of the Modular Structures in Binary Graphs
  6.1 Introduction
  6.2 Method
    6.2.1 Monte-Carlo Experiment Setup
    6.2.2 Normalization on Modularity
    6.2.3 Addressing Significance
  6.3 Results
    6.3.1 Structured Binary Networks
    6.3.2 Fuzzy Structured Binary Networks
    6.3.3 Real Data Networks
  6.4 Discussion
  6.5 Conclusion
Chapter 7: Conclusions and Future Work
  7.1 Conclusions
  7.2 Future Work
    7.2.1 Exhaustive Assignment
    7.2.2 Hybrid Network Edge Distributions
    7.2.3 Hypernetworks
    7.2.4 Overlapping Networks
    7.2.5 Inclusion of Prior Clustering Knowledge
    7.2.6 Applications in Other Disciplines
References

List of Tables

2.1 Expected edge strength for several different types of random networks.
2.2 Results of karate club binary network partitioning using three methods: Q_B, Q_A^Bernoulli, and Q_A^BLUE. Values represent the modularity achieved by the three partition results on the columns of the table. Bold font indicates maximum modularity, which also represents the final partition result achieved by each method.
5.1 Top row: estimated coefficients in equations (5.15) and (5.16). Middle and bottom rows: 95% confidence intervals.

List of Figures

2.1 Random graph models and modularity. First column: nodes i and j of the original graph are connected with an edge A_ij.
Second column: random graph A has the same node degrees k_i and k_j as the original graph, but the edge strength is replaced by its conditional expected value E(A_ij | k), with analytic expressions given in Table 1. The contribution of this edge to modularity is A_ij - E(A_ij | k) when the two nodes are assigned to the same group, and the total modularity involves a sum over all edges of the graph. Third column: similarly for random graph B, with the exception that R_ij in equation (2.1) is used instead of the conditional expected edge strength.
2.2 Two simple 4-node complete networks with different topological configurations.
2.3 Partition results for Gaussian graphs. a) Performance against different values of the variance (σ²) of the Gaussian distribution and zero additive noise (σ_N² = 0); b) affinity matrix; c) difference between the affinity matrix of the original graph and the null model A_NULL | k; d) same for the R null model. Top row: 12 vs. 8 cluster size; bottom row: 10 vs. 10 cluster size. Method Q_A accurately partitions the graph for all values of σ², whereas method Q_B fails for low values of σ².
2.4 Partition results of Gaussian graphs for several levels of edge variance σ², noise variance σ_N², and cluster size ratio N_1/N. Method Q_A is more robust to noise, and method Q_B fails for small values of σ² but performs better for large values of σ².
2.5 Partitioning graphs with both positive and negative connections. a) Affinity matrix; b) difference between the affinity matrix of the original graph and the null model A_NULL | k; c) same for the R null model. In the presence of negative connections, the null model R fails to partition the graph, as indicated by the lack of structure of the matrix A - R.
2.6 Comparison of null graph models against a random rewiring scheme.
Distance is expressed as the root mean square difference between the elements of the affinity matrices of the graphs (equation (2.28)). For each box plot, the central mark is the median, the edges of the box are the 25th and 75th percentiles, and the whiskers extend to the most extreme data points not considered outliers. Left: distance does not depend on the number of rewiring steps N_r. Right: as the size of the graph N increases, the distance becomes smaller. Overall, Q_A null models are closer to an actual rewiring scheme than Q_B models.
2.7 Partition results for the binary network described in [53]. Top row: higher values of k_o lead to less community structure in the network; bottom row: partition results for different values of k_o. The performance of all methods is practically equal.
2.8 Results of the benchmark in [85] for Q_A and Q_B methods. The performance of all methods is almost identical for all values of the mixing parameter and average degree ⟨k⟩.
2.9 Partition results for a binary network for different values of cluster size ratio. a) Affinity matrix for 10 vs. 10 cluster size; b) affinity matrix for 17 vs. 3 cluster size; c) performance is very similar for all methods, across all cluster size ratio values.
2.10 Karate club network. Nodes indicate 34 club members and color indicates the actual split of the group after a conflict between its members.
2.11 Graph analysis of a structural network based on diffusion brain imaging. a) Affinity matrix with the upper-left and lower-right quadrants corresponding to regions of interest on the right and left hemispheres respectively; b) structural brain network with edges greater than 0.1; c) partition results for method Q_A.
Color indicates cluster membership; d) same as before for method Q_B; e) partition results for method Q_A, but now spheres indicate ROIs and larger spheres indicate network hubs; f) same as before for method Q_B.
2.12 Partitioning results with the Q_A, Q_B, and Q_B [54] methods on a real functional brain network. Different modules detected by the algorithm are labeled by color. The first row indicates the symmetric across-hemisphere results, while the second row highlights the asymmetric ROIs.
3.1 Left: a 3-node complete directional network. Right: forward mapping matrix H. Rows above the red line are related to the out-degrees of all nodes; rows below are related to the in-degrees.
3.2 Maximizing information flow (a) and modularity (b). The plus sign indicates regions we maximize and the minus sign regions we minimize.
3.3 Top: topography of interacting cortical sources. Bottom: spectral Granger causality between pairs of sources.
3.4 Partition results based on modularity and information flow. a) Affinity matrices for low and high frequencies, calculated using Granger causality; b) clusters based on modularity; c) sources (blue) and sinks (red) of information within each cluster; d) maps of k, representing the contribution of each vertex to information flow.
3.5 Partitioning results for a single-subject resting MEG study in which we looked at delta, theta, and alpha band interactions. Each row represents a subnetwork identified by modularity using our null model.
3.6 Resting state MEG partitioning results for beta and gamma interactions measured by Granger causality, and the partition result for interactions measured by the correlation coefficient. Each row represents a subnetwork identified by modularity using our null model.
3.7 Hybrid network clustering. Left: simulated network of 32 nodes.
Middle top: reordered network adjacency matrix after partitioning with weight 0. Middle bottom: reordered network adjacency matrix after partitioning with weight 3. Right top and bottom: graphical representations of the corresponding partitioning results.
4.1 Top: distribution of the largest eigenvalue of B for several values of the mean and fixed standard deviation σ = 0.8. Bottom: distribution of the largest eigenvalue of B for several values of the standard deviation σ.
4.2 Monte Carlo distribution of λ_1 for σ = 1 and its best fit with a Gamma distribution. The maximum likelihood estimates of the two Gamma distribution parameters were 117.99 and 0.06321, with estimator variances 0.0277 and 8 x 10^-9 respectively.
4.3 Fitted parameters of the Gamma distribution as a function of the standard deviation σ for network sizes N = 20, N = 30, and N = 40.
4.4 Left: the fitted shape parameter as a function of network size N. Right: ratio of the fitted scale parameter with respect to σ as a function of network size N.
4.5 Estimation of Gamma distribution parameters when the original networks used to create the random networks have a community structure of two clusters with N_1 and N_2 nodes such that N_1 + N_2 = 20. For each N_1 value, we generated 100 original networks, each of which further produced 10^4 random networks. The Gamma parameters estimated from the histogram of those random networks are in close agreement with those predicted by equations (4.6) and (4.7).
4.6 Bi-partitioning results for two Gaussian networks: (a) shows a clear network structure while (b) is less structured.
4.7 Modularity-based partitioning of the structural brain network in [61]. Each color represents a different subnetwork. All clusters are statistically significant.
4.8 Partition results of a resting state fMRI network. All 3 clusters are significant.
4.9 Partition of the Karate Club network [165].
5.1 Tracy-Widom distributions for the three types of Gaussian random ensembles. Left: cumulative density function; right: probability density function.
5.2 Empirical cumulative density functions F_T(t) constructed by Monte-Carlo simulations for networks of different size N and edge standard deviation σ.
5.3 Illustration of the minimization of the Kullback-Leibler divergence for a specific network size N = 100 and standard deviation σ = 3. The distribution of the random variable S is almost identical to the transformed distribution of T.
5.4 Surfaces of log10(w_1) and w_2 computed by equations (5.15) and (5.16).
6.1 Scatter plots of modularities for different Erdős-Rényi random graph sizes. Each circle represents a realization. We observe that the results change with respect to the network size. The bends of the curves shrink in width (range of modularity) and approach zero in larger random networks.
6.2 Scatter plots of modularities for different Erdős-Rényi random graph sizes. Each circle represents a realization. The y-axis is the normalized modularity Q̃_2. Compared with Figure 6.1, the locations of the bends of the curves from different N settings are roughly aligned.
6.3 Different percentile curves as functions of the empirical Bernoulli distribution parameter p. Different colors indicate the curves for different network sizes N. The curves are approximately aligned by scaling the modularity to Q̃_2 by equation (6.3).
6.4 (a) Scatter plots of 5000 realizations of structured graphs.
Graphs are generated to have a 20-20 cluster structure by randomly choosing the within-cluster Bernoulli distribution parameter p_in in (0.5, 1) and the between-cluster connection probability p_out in (0, 0.5). The adjacency matrix of one such realization is given in (b); the top-left and bottom-right quadrants denote the within-cluster connections for the two modules (p_in = 0.85), while the top-right and bottom-left quadrants denote the between-cluster connections (p_out = 0.15).
6.5 (a) Scatter plots of 5000 realizations of structured graphs. They are generated to have a 20-20 cluster structure by randomly choosing the within-cluster Bernoulli distribution parameter p_in in (0.4, 1) and the between-cluster connection probability p_out = p_in - 0.4. The adjacency matrix of one such realization is given in (b).
6.6 Different setups for test bench graph realizations: (a) network size N = 250, cluster sizes between 50 and 120; (b) network size N = 250, cluster sizes between 80 and 130. Results for various mixing parameters are plotted in colors. We observe that when the network structure is weak, indicated by a large mixing parameter, the normalized modularity Q̃_2 decreases and can fall below the F_0.95 curve.
6.7 Real data networks: the Karate Club network [165] has 34 nodes. Its normalized modularity value Q̃_2 is above the F_0.95 curve, indicating that the community structure is significant. So is that of the Dolphin network [91] of 62 nodes.
6.8 The fMRI resting connectome has 96 nodes, and we used the F_0.95 curve from N = 100 for reference. The complete weighted graph is thresholded and binarized to preserve different portions of the strongest edges. We calculate the normalized modularity Q̃_2 as well as the empirical Bernoulli parameter p for each of them to test the significance of their community structure.
Different colors indicate the different sparsity levels as specified in the color bar; for example, 0.85 means the network preserves the top 85% of edges according to their strengths. All the thresholded networks have Q̃_2 above the F_0.95 curve, indicating that the community structures are significant.
6.9 Frequency plots of the normalized modularity Q̃_2 for a different empirical Bernoulli distribution parameter p value in each panel. Different network sizes are shown in colors. In each of the different p cases, the variance of Q̃_2 decreases as the network size N increases.

Abstract

Network modeling and graph theory have been widely studied and applied in a variety of modern research areas. The detection of network structures in these contexts is a fundamental and challenging problem. "Community structures" (where nodes are clustered into densely connected subnetworks) appear in a wide spectrum of disciplines. Examples include groups of individuals sharing common interests in social networks and groups of proteins carrying out similar biological functions. Detecting network structure also enables us to perform other kinds of network analysis, such as hub identification.

Modularity-based graph partitioning is an approach that is specifically designed to detect the community structures in a network. Though modularity methods have achieved some success in detecting modular structures, many related problems have not yet been solved. Central to modularity-based graph partitioning is the null model: a statistical representation of a network with no structure. In this work, I will present a novel approach to designing null models. This new null model approach resolves many of the existing problems, including dealing with non-negativity, topological consistency, etc. Null models are presented for binary/weighted graphs and undirected/directed graphs.
I will also present several new methods to assess the statistical significance of the detected community structures. Several potential future work directions, as well as the position of the modular detection problem in a broader network analysis scheme, are given at the end of this thesis.

Chapter 1: Introduction

In 1736, Euler provided a mathematical solution to a problem, now known as Euler's Puzzle, of finding a path by which a person could cross each of the seven bridges in the city of Königsberg exactly once and return to the starting point [39]. He not only proved that no such path exists, but also gave a general solution that could be applied to any arbitrary arrangement of bridges and landmasses. Euler pointed out that the physical distances and geographical positions are not important in solving this problem. For this important finding, which he referred to as the "geometry of position", he is usually credited as the founder of the field of graph theory.

A graph is a mathematical representation of a network or a system comprised of interconnected components, denoted as nodes, with connections denoted as edges. A number of different graph representations exist: undirected graphs assign no directional information to the connections, as opposed to directed graphs, where edges are represented with directionality; binary graphs have edges denoting either the presence (1) or absence (0) of a connection, while weighted graphs also quantify the interconnection strength. The density of connections can range from fully connected graphs (complete graphs) to very sparse graphs.

Graph theory methods have been applied to study the structure and properties of a wide range of systems, including the World Wide Web [15, 44, 71], social networks [53, 87, 143, 165], biological networks [63, 104, 112, 130], brain and neural networks [18, 19, 43, 60, 73, 99, 120], and many others. At least two reasons explain why network approaches attract more and more attention. First, as data increase in size and networks become more complex, inferring system properties from studying individual elements no longer remains unbiased and accurate. Second, the network approach provides more information to explain the functional specialization of each element, considering both its tendency to make connections and the global importance of each connection in the network.

A famous example is the challenge of identifying functions for each protein in molecular biology systems. Conventional methods involve studying each individual protein structure to find its characteristics. Similar or modular structural features become the basis for predicting the interactions and functions of unknown proteins. Network approaches, on the other hand, utilize the known protein-protein interaction information. The function of individual proteins in a protein complex can be hypothesized because many proteins work jointly to carry out specific functions [27,130,131,154]. This is usually carried out by identifying modules comprised of proteins involved in common biological functions. Proteins with unknown functions are assigned functional predictions according to their modular classified identity. This example indicates how a network approach can expand the analysis scope from a local level to a global, interaction-based system approach. Network tools are instrumental to this analysis.

Two broad classes of algorithms have been used for detecting network structure: divisive and agglomerative techniques. Divisive techniques partition the network into multiple sub-networks by removing edges between them, whereas agglomerative techniques start with individual nodes and progressively join them into clusters using similarity criteria. Both approaches are popular and successful in analyzing networks [16,88,161]; however, they also suffer from shortcomings.
For example, the minimum-cut method [116], a divisive algorithm that minimizes the sum of the weights of the removed edges, has the disadvantage of often dividing the network very unevenly [159]. To deal with this problem, researchers have proposed methods with modified cost functions that normalize the cost of the removed edges. This is achieved by using either the cardinality of the resulting clusters, as with average cuts and ratio cuts [59], or the ratio of the within-cluster connections to the total cluster connections, as with normalized cuts [133, 160, 164]. However, while minimizing the cost of removed connections, these methods are not specifically designed to preserve another important feature of the network: its community structure.

Real-life networks divide into modules (communities, groups) within which the network connections are strong, but between which the connections are weak. Modules are groups of nodes that share the same properties or play similar roles in the whole network. Going beyond node-level analysis of the graph, detecting community structure can be of great value in identifying the substructures of a network that have distinguishable and important functions. For example, web pages dealing with the same topic form a web community [44], while in biology modules are defined as groups of proteins or mRNA associated with specific cellular functions [63], or the functional cycles and pathways in metabolic networks [56]. Populations can be clustered according to their shopping behavior for efficient advertising and customized recommendation systems [82]; Internet providers can enhance server performance by analyzing user habits and geographical location data [83]. Storing and searching queries of large networks can benefit from identification of the community structure of the data [1,158]. In brain imaging, modules may consist of brain regions that are densely connected with neuronal fibers, or are functionally highly correlated [61].
In neuroscience, community detection has recently stimulated new findings, in both research and clinical applications. Evidence from synaptic connectivity suggests that stronger connections tend to be highly clustered in the rat visual cortex, which favors more densely connected structural motifs [135]. Clusters of cortical areas were retrieved, and found to resemble known functional divisions in cat and macaque, using a multidimensional scaling method applied by Young and collaborators to connectivity data [162, 163]; Hilgetag applied another stochastic method to large-scale mammalian connectivity data to identify clusters [66]. Clustering results have also been studied on networks constructed with different connectivity measures, such as cortical thickness correlation data [23, 24, 40], diffusion connectivity [55,60,61,147], temporal correlation from fMRI time series [65,121,149,156], mutual information based MEG interactions [32], or Granger-causality based functional interactions [31,89,125].

The importance of community detection to neuroscience is manifold: cluster structure could help predict and discover previously unknown connections and components [124]. Once the communities are identified, each individual component (for example, a brain area in large-scale brain networks) can be classified according to its relative location in the community, such as at the center of a cluster or at its boundary. Hub identification can be performed with detected communities. Hub nodes usually have special integrative functions or influence the information flow within the network. This feature indicates both the importance [55, 61, 137] and the vulnerability of hubs; their failure often causes entire system breakdowns [64,72].

Another popular research finding is the small-world property of brain networks [89,147,150]. A network is defined to have the small-world property if it has a relatively short characteristic path length while being more clustered than random networks.
The latter metric depends on how the network is clustered; thus it is related to the identification of community structure as well.

Recent research is also directed towards how the community structure of brain networks changes in response to cognitive processing [80, 119], learning [11], lesions [4], traumatic brain injury [20], schizophrenia [3,96,134], autism [5,123], psychopathology [96], normal aging [24,97], and development [35,41,62,114]. This graphical approach, often involving community structure detection, provides a more global inference of not only where changes occur, but also how the brain responds, compensates, or develops as a collective and integrative system.

The modular structure also poses new questions to neuroscientists. For example, how do those community structures occur and develop [67, 113, 136]? How are clusters associated structurally and functionally [68,69,134,139,166,168]? Or, more generally, how can we relate the community structures identified from different modalities? How do clusters interact with each other to give rise to high-level cognitive behavior? Such questions are built on top of a reliable detection algorithm for community structures and a good understanding of how to interpret the clustering results.

There are multiple formal definitions of a community structure [46], ranging from local definitions, such as n-clique [2,36,90], n-clan, n-club [101], k-plex [129], k-core [128], clique percolation [111], motif analysis [135, 137, 138], and strong or weak community [115], to those using global measures on the graph [108,109]. Newman and Girvan introduced a quality measure of a particular division of a network, called modularity [109], and later presented a spectral graph partitioning algorithm that maximizes modularity [108].
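Concretely, the Newman-Girvan modularity of a division with labels c is Q = (1/2m) Σ_ij [A_ij - k_i k_j / (2m)] δ(c_i, c_j), where k_i are the node degrees and m is the total edge weight. A minimal sketch in Python (the toy "two triangles joined by a bridge" graph and its labelings are invented here purely for illustration; they are not an example from this dissertation):

```python
import numpy as np

def modularity(A, labels):
    """Newman-Girvan modularity Q of a division of an undirected graph.

    A      : symmetric adjacency (or affinity) matrix
    labels : community assignment for each node
    """
    A = np.asarray(A, dtype=float)
    k = A.sum(axis=1)                       # node degrees
    two_m = k.sum()                         # 2m = sum of all degrees
    expected = np.outer(k, k) / two_m       # configuration-model null
    same = np.equal.outer(labels, labels)   # delta(c_i, c_j)
    return ((A - expected) * same).sum() / two_m

# Toy graph: two triangles (nodes 0-2 and 3-5) joined by one bridge edge.
A = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    A[i, j] = A[j, i] = 1

good = modularity(A, np.array([0, 0, 0, 1, 1, 1]))  # natural split
poor = modularity(A, np.array([0, 1, 0, 1, 0, 1]))  # arbitrary split
print(good, poor)  # the natural split scores higher; the arbitrary one is negative
```

Connections within a module that are stronger than their expected values contribute positively to Q, which is why the natural two-triangle split scores well while an arbitrary split falls below zero.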
Unlike traditional clustering methods that seek to minimize weighted combinations of the number of edges running between the modules, such as minimum cuts or normalized cuts [133], modularity-driven clustering methods compare each edge against its expected value when clustering nodes into corresponding modules. If a natural division of a network exists, we should expect connections within a module to be stronger than their expected values, and the opposite should hold true for connections between modules.

Central to this modularity idea is the expected network, or "null model", which is defined as a random network conditioned to have the same number of nodes, edges, and degree distribution as the original graph but in which the links are randomly placed [110]. In graph theory, the set of random networks with predetermined node degrees is called the configuration model and has been extensively studied [50, 102]. Starting from the "null model" for undirected binary graphs [108], many efforts have been devoted to null model construction in order to extend the applicability of the modularity method: for weighted graphs [42, 52, 105]; for directed graphs [6, 86]; for graphs with overlapping communities [110, 132]; null models constructed by modified rewiring of edges [94], a local version of modularity [103], or from the spin-glass model [117]; for bipartite graphs [10, 58]; and for graphs with negative edges, typically found in correlation-based networks [9, 54, 75, 122, 144]. Most of these extensions are based on and modified from the null model originally proposed by Newman and Girvan [108], and are usually constructed similarly in terms of the product of the associated degrees.

In this dissertation, we focus on modularity-based methods to identify community structures.
We first propose a novel null model, the statistically based conditional expected network, which is designed to have the same number of nodes, edges, and degree distribution as the original graph, but with randomly placed links. We begin with derivations for undirected graphs and estimate different equations depending on the random network distributions. We carefully examine the properties of this new null model and observe many desirable properties (Chapter 2).

We further extend this work to directed graphs, where edges also carry directional information. We develop the general formula for conditional expected directed graphs. In addition to studying the null model properties in relation to undirected networks, we also look at the directional information specific to directed networks. A unified approach is applied to analyze the information flow. We further propose a Tikhonov-weighted method to combine modularity and information flow (Chapter 3).

We also address the statistical significance of community structure in networks. We propose to use the largest eigenvalue as a surrogate of modularity and compute a parametric estimate of the largest eigenvalue under the Gaussian assumption (Chapter 4). We then relate this work to random matrix theory and Gaussian random ensembles (Chapter 5). For binary graphs, we use Monte Carlo simulations to compute non-parametric distributions of maximum modularity in the Erdős-Rényi random graph framework (Chapter 6). All of the above approaches provide results for assessing the statistical significance of structure detected by modularity-based graph partitioning algorithms.

In Chapter 7, we conclude with several remaining challenges faced by modularity-based partitioning methods and indicate future research directions.
Chapter 2

Modularity-Based Partitioning for Undirected Graphs

In this chapter, we propose a new null graph model that can be used for modularity-based graph partitioning, and provide analytic solutions for specific parametric distributions. We first provide closed-form solutions for the expected strength of an edge when it is conditioned only on the degrees of its two neighboring nodes. We then provide an improved estimate of the expected network, where we condition the strength of an edge on the degrees of all nodes comprising the network. We analytically compute the expected network under the assumptions of Gaussian and Bernoulli distributions. When the Gaussian assumption is violated, we prove that our expression is the best linear unbiased estimator. Finally, we use our conditional expected network to partition graphs, and demonstrate its performance on simulated and real-world networks.

Figure 2.1: Random graph models and modularity. First column: nodes $i$ and $j$ of the original graph are connected with an edge $A_{ij}$. Second column: random graph A has the same node degrees $k_i$ and $k_j$ as the original graph, but the edge strength is replaced by its conditional expected value $E(A_{ij}|\mathbf{k})$, with analytic expressions given in Table 2.1. The contribution of this edge to modularity is $A_{ij} - E(A_{ij}|\mathbf{k})$ when the two nodes are assigned to the same group, and the total modularity involves a sum over all edges of the graph. Third column: similarly for random graph B, with the exception that $R_{ij}$ in equation (2.1) is used instead of the conditional expected edge strength.

2.1 Modularity and Expected Graph Models

In this section we first describe modularity and the null model used in [108] for the estimation of modularity. We then introduce our null models, which are analytically computed for specific probability distributions for the edges of the network.
We assume an undirected network with $N$ nodes and $L$ edges, with the weight of the edge connecting nodes $i$ and $j$ denoted as the $A_{ij}$ element of a weighted affinity matrix $\mathbf{A}$. If the network is unweighted, then $\mathbf{A}$ is a binary affinity matrix (also called an adjacency matrix) with every edge of unit strength. We extend the definition of the degree of node $i$ as $k_i = \sum_j A_{ij}$, i.e. the sum of the weights of the edges associated with node $i$. This definition is consistent with the definition of degree for binary graphs, and allows us to apply our method to both binary and weighted networks, similarly to [105]. We also denote the total sum of the weights of all edges in the network as $m = \frac{1}{2}\sum_{i,j} A_{ij} = \frac{1}{2}\sum_i k_i$.

2.1.1 Modularity

Optimal partitioning of a network requires specification of an appropriate cost function. Among the most popular is "modularity" [109], which uses the idea that if a natural division of a network exists, connections within a module should be stronger than their expected values, and the opposite should hold true for connections between modules. If an edge is stronger than its expected value, it contributes positively to modularity, provided that the two nodes connected by the edge belong to the same module. Divisions that increase modularity are preferred, because they lead to modules with high community structure.

Evaluation of modularity requires the computation of an expected network, or "null model", which has the same configuration as the original network but contains no community structure because of a random placement of its edges. Newman [108] considered a random network where the probability of having a connection between two nodes $i$ and $j$ is proportional to the product of their degrees:

$$R_{ij} = \frac{k_i k_j}{2m}, \quad \forall\, i, j \leq N \qquad (2.1)$$

Under this random graph model, modularity is expressed as [108]:

$$Q = \frac{1}{2m} \sum_{i,j} (A_{ij} - R_{ij})\, \delta(C_i, C_j) \qquad (2.2)$$

where $C_i$ indicates the group membership of node $i$.
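The quantities above (degrees, total weight $m$, the null model $R$, and modularity $Q$) are direct to compute. The following is an illustrative NumPy sketch, not code from the dissertation; the function names are our own.

```python
import numpy as np

def degrees_and_weight(A):
    """Node degrees k_i = sum_j A_ij and total edge weight m = (1/2) sum_ij A_ij."""
    k = A.sum(axis=1)
    m = k.sum() / 2.0
    return k, m

def newman_null_model(A):
    """R_ij = k_i * k_j / (2m), the configuration-model expectation of eq. (2.1)."""
    k, m = degrees_and_weight(A)
    return np.outer(k, k) / (2.0 * m)

def modularity(A, labels):
    """Q = (1/2m) sum_ij (A_ij - R_ij) * delta(C_i, C_j), eq. (2.2).

    `labels` is an integer array of group memberships C_i."""
    k, m = degrees_and_weight(A)
    R = np.outer(k, k) / (2.0 * m)
    same_group = labels[:, None] == labels[None, :]   # Kronecker delta
    return ((A - R) * same_group).sum() / (2.0 * m)
```

For two disconnected triangles split into their natural two communities, this returns the familiar value $Q = 1/2$, and the rows of `newman_null_model(A)` sum to the original degrees, illustrating the degree-preservation property discussed later in this chapter.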
The Kronecker delta function $\delta(C_i, C_j)$ equals 1 when nodes $i$ and $j$ belong to the same group, and is 0 otherwise. Therefore, modularity increases when $A_{ij} - R_{ij}$ (edge strength minus expected edge strength) is positive for within-module edges (third column of figure 2.1).

2.1.2 Random Network Partially Conditioned on Node Degrees

We denote by $E(A_{ij})$ the expected value of the edge strength between nodes $i$ and $j$. Instead of using $R_{ij}$, we propose a null graph model whose expected edge strength $E(A_{ij}|k_i, k_j)$ is conditioned on the degrees $k_i$ and $k_j$ of the neighboring nodes. The idea of conditioning on the degrees of the neighboring nodes is based on the observation that the significance of an edge is directly related to the total connections of its nodes. If two nodes have high degrees, there is a high chance they are connected even in a random network without a community structure. The opposite holds true for nodes with low degrees, where even weak connections could be important.

Table 2.1: Expected edge strength for several different types of random networks

Expected Bernoulli random network conditioned on the degrees of the associated nodes:
$$E(A_{ij}|k_i, k_j) = \begin{cases} \dfrac{k_i k_j}{k_i k_j + (N-1-k_i)(N-1-k_j)\,(p/(1-p))} & i \neq j \\ 0 & i = j \end{cases}$$

Expected Gaussian random network conditioned on the degrees of the associated nodes:
$$E(A_{ij}|k_i, k_j) = \begin{cases} \dfrac{k_i + k_j - (N-2)\mu}{N} & i \neq j \\ 0 & i = j \end{cases}$$

Expected Gaussian random network conditioned on the whole degree sequence (also the BLUE null model in the non-Gaussian case):
$$E(\mathbf{x}|\mathbf{Hx} = \mathbf{k}) = \boldsymbol{\mu}_x + \boldsymbol{\Sigma}_{xk}\boldsymbol{\Sigma}_k^{-1}(\mathbf{k} - \boldsymbol{\mu}_k)$$

Expected i.i.d. Gaussian random network conditioned on the whole degree sequence:
$$E(A_{ij}|\mathbf{k}) = \begin{cases} \dfrac{k_i + k_j}{N-2} - \dfrac{2m}{(N-1)(N-2)} & i \neq j \\ 0 & i = j \end{cases}$$

Even though $R_{ij}$ also has a dependency on the node degrees (equation (2.1)), it is not in the explicit form of a conditional expected value.
The conditional expected value of edge $A_{ij}$ can be calculated using the Bayesian formulation:

$$E(A_{ij}|k_i, k_j) = \int t\, P(A_{ij} = t \,|\, k_i, k_j)\, dt = \int t\, \frac{P(k_i, k_j \,|\, A_{ij} = t)\, P(A_{ij} = t)}{\int P(k_i, k_j \,|\, A_{ij} = u)\, P(A_{ij} = u)\, du}\, dt \qquad (2.3)$$

To solve equation (2.3), we need to specify the joint distribution of the network edges. Section 2.8.1 provides detailed derivations for the cases of Gaussian and Bernoulli distributions. For the case of Gaussian random networks with independent and identically distributed (i.i.d.) edges with mean $\mu$ and variance $\sigma^2$, we obtain the following analytic expression:

$$E(A_{ij}|k_i, k_j) = \begin{cases} \dfrac{k_i + k_j - (N-2)\mu}{N} & i \neq j \\ 0 & i = j \end{cases} \qquad (2.4)$$

For binary networks, we can solve equation (2.3) for edges following an i.i.d. Bernoulli distribution with parameter $p$, where $p$ is the probability of having a non-zero edge:

$$E(A_{ij}|k_i, k_j) = \begin{cases} \dfrac{k_i k_j}{k_i k_j + (N-1-k_i)(N-1-k_j)\,(p/(1-p))} & i \neq j \\ 0 & i = j \end{cases} \qquad (2.5)$$

2.1.3 Random Network Fully Conditioned on Node Degrees

In the previous section, we conditioned an edge only on the neighboring node degrees $k_i$ and $k_j$. A better representation of the null model structure is to condition an edge on the degrees of all nodes comprising the network:

$$E(A_{ij}|k_1, k_2, k_3, \ldots, k_N) = E(A_{ij}|\mathbf{k}) \qquad (2.6)$$

To find the above expectation, we first concatenate the elements of the adjacency matrix $\mathbf{A}$ into a column vector $\mathbf{x}$. When the graph topology does not allow self-loop connections, we use the transformation:

$$A_{ij} = x_l \qquad (2.7)$$

where

$$l = \frac{(2N - j)(j - 1)}{2} + (i - j), \quad \forall\, 0 < j < i \leq N \qquad (2.8)$$

This transformation simply takes into account the symmetric nature of the adjacency matrix and concatenates only the elements of the lower triangle, excluding the main diagonal. To allow self-loop connections, we use a similar transformation, but also include the diagonal elements.
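The closed-form partial-conditioning expressions of equations (2.4) and (2.5) can be evaluated in one line each. The sketch below is our own illustration (with the Gaussian mean written as `mu`), not code from the dissertation.

```python
import numpy as np

def expected_edge_gaussian(ki, kj, N, mu):
    """Eq. (2.4): E(A_ij | k_i, k_j) for i.i.d. Gaussian edges with mean mu.

    Sanity check: if every edge equals mu, then k_i = k_j = (N-1)*mu and the
    expression reduces to mu itself."""
    return (ki + kj - (N - 2) * mu) / N

def expected_edge_bernoulli(ki, kj, N, p):
    """Eq. (2.5): E(A_ij | k_i, k_j) for i.i.d. Bernoulli(p) edges."""
    num = ki * kj
    den = ki * kj + (N - 1 - ki) * (N - 1 - kj) * (p / (1.0 - p))
    return num / den
```

As expected, the Bernoulli expression returns 0 when either node is isolated ($k_i = 0$) and 1 when both nodes have full degree $N-1$.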
We now consider the linear mapping:

$$\mathbf{Hx} = \mathbf{k} \qquad (2.9)$$

where $\mathbf{H}$ is the $N \times L$ incidence matrix of the graph [34, 151], a uniquely defined matrix that connects the edge strengths $\mathbf{x}$ to the node degrees $\mathbf{k}$. Figure 2.2 shows two simple 4-node complete networks with different topological configurations: (a) a network without self-loops and (b) a network with self-loops. For the example graph of Figure 2.2a, consisting of 4 nodes, the corresponding matrix $\mathbf{H}$, edge vector $\mathbf{x}$, and degree vector $\mathbf{k}$ are:

$$\mathbf{H} = \begin{pmatrix} 1 & 1 & 1 & 0 & 0 & 0 \\ 1 & 0 & 0 & 1 & 1 & 0 \\ 0 & 1 & 0 & 1 & 0 & 1 \\ 0 & 0 & 1 & 0 & 1 & 1 \end{pmatrix}, \quad \mathbf{x} = \begin{pmatrix} x_1 & x_2 & x_3 & x_4 & x_5 & x_6 \end{pmatrix}^T, \quad \mathbf{k} = \begin{pmatrix} k_1 & k_2 & k_3 & k_4 \end{pmatrix}^T \qquad (2.10)$$

The incidence matrix $\mathbf{H}$ represents the permissible structure of a null graph model, which can be, for example, a fully connected network with or without self-loops, or even a network with missing edges. Figure 2.2b gives an example of a network with self-loops; it is straightforward to update the incidence matrix $\mathbf{H}$ and edge vector $\mathbf{x}$ for this case.

With the above notation, the conditional expectation of the edge strengths now becomes:

$$E(\mathbf{x}|\mathbf{k}) = E(\mathbf{x}|\mathbf{Hx} = \mathbf{k}) = \int \mathbf{x}\, P(\mathbf{x}|\mathbf{Hx} = \mathbf{k})\, d\mathbf{x} \qquad (2.11)$$

2.1.4 Gaussian Random Network

While there is no general analytical solution to $P(\mathbf{x}|\mathbf{Hx} = \mathbf{k})$, the conditional probability in the above equation, a closed form exists for the multivariate Gaussian distribution. We refer to a random network whose edges are Gaussian distributed with mean vector $\boldsymbol{\mu}_x$ and covariance $\boldsymbol{\Sigma}_x$ as a Gaussian random network. Given that $\mathbf{k}$ is a linear transformation of $\mathbf{x}$, $\mathbf{k}$ is also Gaussian distributed, with mean $\boldsymbol{\mu}_k = \mathbf{H}\boldsymbol{\mu}_x$ and covariance $\boldsymbol{\Sigma}_k = \mathbf{H}\boldsymbol{\Sigma}_x\mathbf{H}^T$. The conditional probability $P(\mathbf{x}|\mathbf{Hx} = \mathbf{k})$ is also a multivariate Gaussian distribution, with conditional expected value and covariance [77]:

$$E(\mathbf{x}|\mathbf{Hx} = \mathbf{k}) = \boldsymbol{\mu}_{x|k} = \boldsymbol{\mu}_x + \boldsymbol{\Sigma}_{xk}\boldsymbol{\Sigma}_k^{-1}(\mathbf{k} - \boldsymbol{\mu}_k) \qquad (2.12)$$

$$\boldsymbol{\Sigma}_{x|k} = \boldsymbol{\Sigma}_x - \boldsymbol{\Sigma}_{xk}\boldsymbol{\Sigma}_k^{-1}\boldsymbol{\Sigma}_{kx} \qquad (2.13)$$

where the cross-covariance matrix is $\boldsymbol{\Sigma}_{xk} = \boldsymbol{\Sigma}_x\mathbf{H}^T$. The above expression relaxes the non-negative edge weight assumption of the $\mathbf{R}$ null model in [108].
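A short sketch can make the linear map $\mathbf{Hx} = \mathbf{k}$ and the conditional expectation (2.12) concrete. The code below is our own illustration, not the dissertation's implementation: it builds the incidence matrix of a complete graph without self-loops in the lower-triangle edge order of equation (2.8) (0-based here), then evaluates equation (2.12). A pseudoinverse is used as a precaution so the same code also covers topologies where $\mathbf{H}\boldsymbol{\Sigma}_x\mathbf{H}^T$ is singular.

```python
import numpy as np

def incidence_matrix(N):
    """N x L incidence matrix H of a complete graph without self-loops.

    Edges (i, j) with j < i are stacked column-by-column over the lower
    triangle, matching the ordering of eq. (2.8)."""
    edges = [(i, j) for j in range(N) for i in range(j + 1, N)]
    H = np.zeros((N, len(edges)))
    for l, (i, j) in enumerate(edges):
        H[i, l] = H[j, l] = 1.0
    return H

def conditional_gaussian_null(H, mu_x, Sigma_x, k):
    """Eq. (2.12): E(x | Hx = k) = mu_x + Sigma_xk Sigma_k^{-1} (k - mu_k)."""
    mu_k = H @ mu_x
    Sigma_xk = Sigma_x @ H.T
    Sigma_k = H @ Sigma_x @ H.T
    # pinv for generality: H Sigma_x H^T may be rank-deficient for some topologies
    return mu_x + Sigma_xk @ np.linalg.pinv(Sigma_k) @ (k - mu_k)
```

For $N = 4$ this reproduces the $\mathbf{H}$ of equation (2.10), and the conditional mean always satisfies $\mathbf{H}\,E(\mathbf{x}|\mathbf{k}) = \mathbf{k}$, i.e. the null model reproduces the observed degrees exactly.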
Furthermore, the definition of the mean and covariance of the edge vector $\mathbf{x}$ allows us to provide prior information to the null network model. For example, network edges may be correlated with each other, or have different means and variances because of measurement considerations and noise rather than a true underlying network structure; adjusting the mean $\boldsymbol{\mu}_x$ and covariance matrix $\boldsymbol{\Sigma}_x$ can account for such effects.

2.1.5 Gaussian Random Network with Independent Identically Distributed Edges

For the special case of a Gaussian random network with independent identically distributed edges with $\boldsymbol{\mu}_x = \mu\mathbf{1}$ and $\boldsymbol{\Sigma}_x = \sigma^2\mathbf{I}$, we can simplify equation (2.12). As derived in Section 2.8.2, the conditional expectation of the $l$-th component of the edge vector $\mathbf{x}$ (or, equivalently, the $A_{ij}$ element of the adjacency matrix) becomes:

$$E(A_{ij}|\mathbf{k}) = E(x_l|\mathbf{k}) = \begin{cases} \dfrac{k_i + k_j}{N-2} - \dfrac{2m}{(N-1)(N-2)} & i \neq j \\ 0 & i = j \end{cases} \qquad (2.14)$$

The first term on the right-hand side of the above equation (when $i \neq j$) shows that the expected value of a specific edge is positively correlated with the sum of the degrees of the two associated nodes, whereas the second term shows that the expected edge strength decreases when the total weight of the network increases while the associated degrees are kept the same. In a real network this translates to the following. When two specific nodes are more densely connected to the network, we expect the link between them to be stronger. At the same time, if the degrees of the two nodes are kept the same, we expect the connection between them to be weaker when the entire network becomes more densely or heavily connected.

2.1.6 Non-Gaussian Random Networks and BLUE

Searching for a network null model can be seen as an estimation problem. Given the degree vector $\mathbf{k}$, we need to estimate the unknown edge vector $\hat{\mathbf{x}}_{|k}$. We can consider the best estimate in the minimum mean squared error (MMSE) sense.
An important property of the MMSE estimator is that it is unbiased, which is highly desirable for graph clustering because the criterion we use to partition a graph is the measured edge strength versus its expected value. The MMSE estimator is defined as:

$$\hat{\mathbf{x}}^{MMSE}_{|k} = \arg\min_{\hat{\mathbf{x}}_{|k}} F\left(\hat{\mathbf{x}}_{|k}\right) = \arg\min_{\hat{\mathbf{x}}_{|k}} E\left\{ \left\| \hat{\mathbf{x}}_{|k} - \mathbf{x} \right\|^2 \right\} \qquad (2.15)$$

and is solved by setting the derivative to zero, which gives:

$$\hat{\mathbf{x}}^{MMSE}_{|k} = E(\mathbf{x}|\mathbf{k}) = \boldsymbol{\mu}_{x|k} \qquad (2.16)$$

Therefore, the MMSE estimator of the unknown random edges $\mathbf{x}$ given the observation $\mathbf{k}$ is the conditional expectation, as in equation (2.11). As we have shown, in the Gaussian case this is found analytically using equation (2.12). In practice the MMSE estimator, even if it exists, often cannot be found [77]. In this case it is reasonable to resort to a suboptimal estimator, and a common approach is to restrict the estimator to be linear in the data. We then find the linear estimator that is unbiased and has minimum variance, which is termed the Best Linear Unbiased Estimator (BLUE) [77]. In our case, the best linear (with respect to the degree observation $\mathbf{k}$) estimator $\hat{\mathbf{x}}$ that minimizes the mean square error is:

$$\hat{\mathbf{x}}^{BLUE}_{|k} = \arg\min_{\hat{\mathbf{x}}_{|k}} F\left(\hat{\mathbf{x}}_{|k}\right) \quad \text{such that} \quad \hat{\mathbf{x}}_{|k} = \mathbf{Lk} + \mathbf{b} \qquad (2.17)$$

for some matrix $\mathbf{L}$ and vector $\mathbf{b}$. The solution $\hat{\mathbf{x}}^{BLUE}_{|k}$ of the above minimization problem is the same as equation (2.12):

$$\hat{\mathbf{x}}^{BLUE}_{|k} = \hat{\mathbf{L}}\mathbf{k} + \hat{\mathbf{b}} = \boldsymbol{\mu}_x + \boldsymbol{\Sigma}_{xk}\boldsymbol{\Sigma}_k^{-1}(\mathbf{k} - \boldsymbol{\mu}_k) \qquad (2.18)$$

with the derivation given in Section 2.8.3. This equivalence indicates that our null model under the Gaussian assumption is also the BLUE estimator of the null model under any probability distribution. Notice that, in general, $\hat{\mathbf{x}}^{BLUE}_{|k}$ is not the expected network $\boldsymbol{\mu}_{x|k}$. However, $\hat{\mathbf{x}}^{BLUE}_{|k}$ best explains the observed degree vector $\mathbf{k}$ among all linear estimators, while at the same time remaining unbiased.
2.2 Partition Implementation

There are multiple methods to maximize modularity through graph partitioning: greedy agglomerative hierarchical clustering [14, 26, 28, 106, 127], spectral partitioning [107, 108], extremal optimization [37], simulated annealing [56], and many others. Comparison of these methods is beyond the scope of this dissertation; rather, we are interested in investigating the effectiveness of our novel null models. To achieve this goal, we chose the sequential spectral partitioning method [108] to perform clustering; a brief description follows below.

We use two expressions for modularity: $Q_A$, based on our null models (Table 2.1), which we now jointly call $\mathbf{A}^{NULL}_{|k}$, and $Q_B$, based on the null model $\mathbf{R}$ (equation (2.1)). Modularity is maximized over an indicator vector $\mathbf{s}_A$ (or $\mathbf{s}_B$) denoting group membership:

$$\hat{\mathbf{s}}_A = \arg\max_{\mathbf{s}_A} Q_A = \arg\max_{\mathbf{s}_A} \frac{\mathbf{s}_A^T \left(\mathbf{A} - \mathbf{A}^{NULL}_{|k}\right) \mathbf{s}_A}{4m} \qquad (2.19)$$

$$\hat{\mathbf{s}}_B = \arg\max_{\mathbf{s}_B} Q_B = \arg\max_{\mathbf{s}_B} \frac{\mathbf{s}_B^T (\mathbf{A} - \mathbf{R})\, \mathbf{s}_B}{4m} \qquad (2.20)$$

Specifically, the $i$-th element of $\mathbf{s}_A$ ($\mathbf{s}_B$) is $1$ or $-1$, depending on which of the two groups the $i$-th node belongs to after one partition. Partitioning proceeds by selecting vectors $\mathbf{s}_A$ and $\mathbf{s}_B$ that maximize modularity. Based on spectral graph theory, when $\mathbf{s}_A$ and $\mathbf{s}_B$ are allowed to take continuous values, maximization of $Q_A$ and $Q_B$ is achieved by selecting the maximum eigenvalues and corresponding eigenvectors of the matrices $\mathbf{A} - \mathbf{A}^{NULL}_{|k}$ and $\mathbf{A} - \mathbf{R}$, respectively. The elements of the vectors $\mathbf{s}_A$ and $\mathbf{s}_B$ are then discretized to $\{-1, 1\}$ by setting a zero threshold. Because of the discretization, $\mathbf{s}_A$ and $\mathbf{s}_B$ no longer align with the eigenvector of the largest eigenvalue, and further fine-tuning is necessary to approach the global maximum, which can be done using the Kernighan-Lin algorithm [79]. We have further optimized this algorithm by randomizing the sequence of nodes and allowing them to change group membership more than once. This still does not guarantee a global optimum, but represents a trade-off between computational cost and accuracy.
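One dichotomization step of the spectral method can be sketched as follows. This is our own minimal illustration (without the Kernighan-Lin fine-tuning stage), applicable to either modularity matrix, $\mathbf{A} - \mathbf{A}^{NULL}_{|k}$ or $\mathbf{A} - \mathbf{R}$.

```python
import numpy as np

def spectral_bipartition(B, tol=1e-12):
    """One spectral dichotomization step for a modularity matrix B.

    Threshold the eigenvector of the largest eigenvalue of B at zero,
    as in eqs. (2.19)-(2.20). If the largest eigenvalue is not positive,
    the graph is inseparable and all nodes stay in a single group
    (the stopping criterion discussed in section 2.3.1)."""
    vals, vecs = np.linalg.eigh(B)          # B is symmetric
    if vals.max() <= tol:
        return np.ones(B.shape[0], dtype=int)
    leading = vecs[:, np.argmax(vals)]
    return np.where(leading >= 0, 1, -1)
```

For a graph made of two disconnected triangles, the returned indicator vector separates the two triangles (up to an irrelevant global sign flip of the eigenvector).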
To partition the network into more than two groups, we recursively dichotomize the resulting sub-networks by maximizing equations (2.19) and (2.20) for each sub-network separately. After sufficiently many partitioning steps, $Q_A$ and $Q_B$ stop increasing, at which point we have reached maximum modularity for the entire network and no further sequential partitioning should be performed. This corresponds to a formal partition stopping criterion, which is an attractive property of modularity not shared by other clustering methods, as we discuss in Section 2.6.

2.3 Partition Properties

2.3.1 Node Degrees

An important property of the null graph models $\mathbf{A}^{NULL}_{|k}$ and $\mathbf{R}$ is that they have the same node degrees as the original graph. For the model $\mathbf{A}^{NULL}_{|k}$ this is enforced by construction, because it is conditioned on the node degrees $\mathbf{k}$ of the original graph. For the model $\mathbf{R}$ we can show this property as follows. Since the degree of a node is defined as the sum of the edges that connect to this node, multiplication of the affinity matrix $\mathbf{A}$ with a unit vector results in the degrees of all the nodes of the network: $\mathbf{A1} = \mathbf{k}$. Similarly, for the expected network $\mathbf{R}$, the degree of node $i$ is the $i$-th element of $\mathbf{R1}$:

$$(\mathbf{R1})_i = \sum_{j=1}^N R_{ij} = \sum_{j=1}^N \frac{k_i k_j}{2m} = k_i \sum_{j=1}^N \frac{k_j}{2m} = k_i \qquad (2.21)$$

Considering all nodes, we have $\mathbf{R1} = \mathbf{k}$, which implies that the degrees of all nodes of the expected network $\mathbf{R}$ are the same as those of the original network $\mathbf{A}$. It can also be shown that $\mathbf{A}^{NULL}_{|k}\mathbf{1} = \mathbf{k}$. For example, consider the i.i.d. Gaussian random network in equation (2.14):

$$\left(\mathbf{A}^{NULL}_{|k}\mathbf{1}\right)_i = \sum_{j=1}^N \left(\mathbf{A}^{NULL}_{|k}\right)_{ij} = \sum_{j=1,\, j \neq i}^N \left( \frac{k_i + k_j}{N-2} - \frac{2m}{(N-1)(N-2)} \right) = k_i \qquad (2.22)$$

As a consequence of preserving the node degrees, the two networks satisfy:

$$\left(\mathbf{A} - \mathbf{A}^{NULL}_{|k}\right)\mathbf{1} = \mathbf{0} \qquad (2.23)$$

$$(\mathbf{A} - \mathbf{R})\,\mathbf{1} = \mathbf{0} \qquad (2.24)$$

As described in the previous section, maximization of modularity is performed by selecting the maximum eigenvalues and eigenvectors of the matrices $\mathbf{A} - \mathbf{A}^{NULL}_{|k}$ and $\mathbf{A} - \mathbf{R}$.
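The degree-preservation identities (2.21)-(2.24) are easy to verify numerically. The sketch below (our own code, not the dissertation's) builds the i.i.d. Gaussian null model of equation (2.14) for an arbitrary weighted graph without self-loops and checks that its rows sum to the observed degrees.

```python
import numpy as np

def iid_gaussian_null(A):
    """Eq. (2.14): conditional expected network for i.i.d. Gaussian edges,
    no self-loops. Depends only on the observed degrees and total weight."""
    N = A.shape[0]
    k = A.sum(axis=1)               # node degrees
    m = k.sum() / 2.0               # total edge weight
    E = (k[:, None] + k[None, :]) / (N - 2) - 2.0 * m / ((N - 1) * (N - 2))
    np.fill_diagonal(E, 0.0)        # topology without self-loops
    return E
```

For any symmetric zero-diagonal `A`, `iid_gaussian_null(A).sum(axis=1)` equals the degree vector `k`, so equation (2.23) holds and the all-ones vector is an eigenvector of $\mathbf{A} - \mathbf{A}^{NULL}_{|k}$ with zero eigenvalue.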
Based on the above equations, the vector $\mathbf{1}$ is always an eigenvector of these matrices, with a zero eigenvalue and hence zero contribution to modularity. This is reminiscent of the matrix known as the graph Laplacian [151] and is important in the spectral graph cut algorithm for the following reason. The unit vector indicates a trivial partitioning, because it groups all nodes into one cluster while the other cluster is left empty. This property gives a clear stopping criterion for dividing a graph: when the largest eigenvalue of $\mathbf{A} - \mathbf{A}^{NULL}_{|k}$ or $\mathbf{A} - \mathbf{R}$ is zero, there is no way to further divide the nodes into two clusters to increase modularity.

2.3.2 Network Topology

Even though both null models maintain the node degrees of the original network, network $\mathbf{R}$ does not maintain the same topology. In particular, even though a real network often involves nodes where self-loops are not allowed or are not meaningful, network $\mathbf{R}$ always includes self-loops:

$$R_{ii} = \frac{k_i^2}{2m} \neq 0$$

The positive values assigned to self-loops lead to a bias in the $\mathbf{R}$ random model, such that the diagonal values of the affinity matrix are overestimated, whereas the rest of the connections are generally underestimated. This does not happen with the $\mathbf{A}^{NULL}_{|k}$ model, because the allowed network topology is already included in the matrix $\mathbf{H}$ by construction. For example, equation (2.14) was derived for an expected network where self-loops are not allowed, and therefore:

$$\left(\mathbf{A}^{NULL}_{|k}\right)_{ii} = 0$$

2.3.3 Isolated Nodes

A network node $i$ is isolated if it does not connect to other nodes in the graph, which implies $A_{ij} = 0, \forall j \neq i$. The random network models $\mathbf{A}^{NULL}_{|k}$ and $\mathbf{R}$ treat isolated nodes differently. Null model $\mathbf{R}$ leaves these nodes isolated, and it does not matter where they are eventually assigned. In contrast, model $\mathbf{A}^{NULL}_{|k}$ assigns non-zero expected connections between the isolated nodes and the rest of the network, based on the underlying probability distribution.
As a result, isolated nodes are eventually assigned to clusters rather than treated as "don't-cares". Connecting isolated nodes to specific clusters may seem counterintuitive at first glance, but there is an argument that supports this behavior. An isolated node is part of the network, so the fact that it did not connect to any node can be considered an unexpected and noteworthy event. Consider two examples: a) a network with two clusters of unequal size, $N_1 \gg N_2$, but with equal within-cluster edge strength; b) a network with two clusters of equal size, $N_1 = N_2$, but with larger within-cluster edge strength in the first cluster. In the former case, it is more surprising that the isolated node did not connect to the larger cluster rather than the smaller cluster, because many potential connections exist in the larger cluster. In the latter case, it is more surprising that the isolated node did not connect to the nodes with overall strong connections. Our partition algorithm favors the least surprising of the above events, so it tends to assign isolated nodes to small clusters consisting of nodes with overall small degrees. Furthermore, the assignment of isolated nodes to clusters is non-trivial for networks with both positive and negative edges, for instance correlations vs. anti-correlations in functional brain networks [48], friends vs. foes in social networks [87], ferromagnetic vs. antiferromagnetic couplings in Ising/spin-glass models [92], etc. In such cases, a zero edge is stronger than a negative edge.

2.3.4 Resolution Limit

Modularity has a resolution limit that may prevent it from detecting clusters that are relatively small with respect to the entire graph, even though such small clusters can be defined as communities using local properties, for example cliques [7, 45, 46].
One solution to this problem, originally derived using a Potts model approach, is to introduce a resolution parameter $\gamma$ that weights the null model when computing modularity [84, 117]:

$$Q_B(\gamma) = \frac{1}{2m} \sum_{i,j} (\mathbf{A} - \gamma\mathbf{R})_{ij}\, \delta(C_i, C_j) \qquad (2.25)$$

and then solve for the partitioning results $C_i$ that maximize the above equation. The same approach can be applied with our null models, in which case we maximize:

$$Q_A(\gamma) = \frac{1}{2m} \sum_{i,j} \left(\mathbf{A} - \gamma\mathbf{A}^{NULL}_{|k}\right)_{ij}\, \delta(C_i, C_j) \qquad (2.26)$$

and similarly tune the resolution parameter to focus either on local structures ($\gamma > 1$) or global structures ($\gamma < 1$).

2.4 Partition Evaluation

To evaluate the accuracy of partition algorithms, rather than requiring perfectly accurate partition results, we assess the overall similarity between the resulting and correct partitions using the normalized mutual information (NMI) measure [29, 33, 70, 167]. Denoting the set of true communities as $C_T$ and the set of communities resulting from either equation (2.19) or (2.20) as $C_R$, NMI is defined as:

$$\mathrm{NMI}(C_T, C_R) = \frac{-2 \sum_{i \in C_T} \sum_{j \in C_R} \frac{N_{ij}}{N} \log\left( \frac{N_{ij}\, N}{N_i N_j} \right)}{\sum_{i \in C_T} \frac{N_i}{N} \log\left( \frac{N_i}{N} \right) + \sum_{j \in C_R} \frac{N_j}{N} \log\left( \frac{N_j}{N} \right)} \qquad (2.27)$$

where $N_{ij}$ is the number of nodes in true community (cluster) $i$ that appear in resulting community $j$. For the case where an algorithm is unable to perform a partition and incorrectly finds the whole graph to be a single cluster (an inseparable graph), we define $\mathrm{NMI} = 0$.

2.5 Results

We assessed the performance of the $Q_A$ and $Q_B$ modularity-based algorithms in partitioning simulated graphs as well as real-world networks. Modularity $Q_A$ assumes several null models, depending on the underlying edge strength distribution (Table 2.1), whereas $Q_B$ is based on the null model in equation (2.1). We simulated graphs that follow a Gaussian or Bernoulli distribution of edge strength, as assumed by the null models in Table 2.1. Moreover, to test whether the BLUE estimator performs well in non-Gaussian cases, we measured its performance on Bernoulli networks.
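The NMI measure of equation (2.27) can be sketched from the confusion matrix $N_{ij}$ of the two partitions. This is our own illustrative implementation, not the dissertation's code; by construction it returns 0 when one partition places all nodes in a single cluster, consistent with the treatment of inseparable graphs above.

```python
import numpy as np

def nmi(true_labels, found_labels):
    """Normalized mutual information, eq. (2.27):
    NMI = 2 I(C_T; C_R) / (H(C_T) + H(C_R))."""
    t, f = np.asarray(true_labels), np.asarray(found_labels)
    n = float(len(t))
    ts, fs = np.unique(t), np.unique(f)
    # Confusion matrix: C[i, j] = number of nodes in true cluster i and found cluster j
    C = np.array([[np.sum((t == a) & (f == b)) for b in fs] for a in ts], dtype=float)
    ni, nj = C.sum(axis=1), C.sum(axis=0)
    mi = sum(C[i, j] / n * np.log(C[i, j] * n / (ni[i] * nj[j]))
             for i in range(len(ts)) for j in range(len(fs)) if C[i, j] > 0)
    h = -np.sum(ni / n * np.log(ni / n)) - np.sum(nj / n * np.log(nj / n))
    return 0.0 if h == 0 else 2.0 * mi / h
```

NMI equals 1 for identical partitions (including relabeled ones) and 0 when the found partition carries no information about the true one.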
Figure 2.3: Partition results for Gaussian graphs. a) Performance against different values of the variance ($\sigma^2$) of the Gaussian distribution with zero additive noise ($\sigma_N^2 = 0$); b) affinity matrix $\mathbf{A}$; c) difference between the affinity matrix of the original graph and the null model, $\mathbf{A} - \mathbf{A}^{NULL}_{|k}$; d) same for the $\mathbf{R}$ null model. Top row: cluster sizes 12 vs. 8 ($N_1/N = 0.6$); bottom row: cluster sizes 10 vs. 10 ($N_1/N = 0.5$). Method $Q_A$ accurately partitions the graph for all values of $\sigma^2$, whereas method $Q_B$ fails for low values of $\sigma^2$.

The real-world networks include the Karate Club Network of [165], a structural brain network from [61], and a resting-state functional brain network from the 1000 Functional Connectomes Project (http://www.nitrc.org/projects/fcon_1000/).

2.5.1 Gaussian Random Networks

We simulated Gaussian graphs by drawing from an i.i.d. Gaussian distribution with fixed mean $\mu = 8$ and several levels of variance $\sigma^2$. For each level of variance, we simulated two clusters with variable sizes $N_1 \geq N_2$, such that $N_1 + N_2 = N$, where $N = 20$ is the total number of nodes. This community structure was enforced by randomly allocating the stronger values of the Gaussian distribution as intra-cluster edges and the weaker values as inter-cluster edges. To test robustness against noisy measurements, we added i.i.d. Gaussian noise with mean zero and variance $\sigma_N^2$. Our goal is to evaluate the performance of the partition algorithms for several levels of edge variance $\sigma^2$, noise variance $\sigma_N^2$, and cluster size ratio $N_1/N$. For each configuration of the above parameters, we simulated 1000 random network realizations and then used equations (2.19) and (2.20), which maximize modularity $Q_A$ and $Q_B$ respectively, in order to partition the graphs. For $Q_A$ modularity, we used the null model in equation (2.14), which is optimal for i.i.d. Gaussian networks. Partition quality is measured with NMI and averaged across the 1000 realizations.
Figure 2.3a displays the NMI similarity metric, averaged over the 1000 random networks, across several levels of edge variance $\sigma^2$, for the case of no noise ($\sigma_N^2 = 0$) and cluster size ratios $N_1/N = 0.6$ (top row; 12 vs. 8 cluster size) and $N_1/N = 0.5$ (bottom row; 10 vs. 10 cluster size). Figures 2.3b-d display sample realizations of the network affinity matrix $\mathbf{A}$ and its differences from the null models used for $Q_A$ and $Q_B$, $\mathbf{A} - \mathbf{A}^{NULL}_{|k}$ and $\mathbf{A} - \mathbf{R}$, respectively.

Partition results based on $Q_A$ are practically identical to the underlying structure of the simulated network, as indicated by NMI = 1, for all values of $\sigma^2$. Conversely, for the configuration parameters in Figure 2.3, method $Q_B$ shows a considerable performance drop with decreasing variance $\sigma^2$, eventually reaching NMI = 0. In fact, for low values of the variance it was unable to divide the network, considering it inseparable most of the time.

Figure 2.4 shows the performance of the two partition methods for several levels of edge variance $\sigma^2$, noise variance $\sigma_N^2$, and cluster size ratio $N_1/N$. As expected, additive Gaussian noise deteriorates the partition performance, as indicated by the drop in NMI when $\sigma_N^2$ increases. However, the $Q_A$ method is much more robust to noise interference than $Q_B$. Changing the edge variance $\sigma^2$ has little effect on the performance of method $Q_A$. This is not the case with method $Q_B$, which fails completely for small values of $\sigma^2$, although it performs better than $Q_A$ for large values of $\sigma^2$. Both methods deteriorate when the cluster sizes are very asymmetric, with very low NMI values when $N_1/N$ is close to 1.

Figure 2.4: Partition results (NMI) for Gaussian graphs across several levels of edge variance $\sigma^2$ (0.1, 0.5, 1, 5), noise variance $\sigma_N^2$ (0, 0.1, 0.5, 1, 5), and cluster size ratio $N_1/N$. Method $Q_A$ is more robust to noise, and method $Q_B$ fails for small values of $\sigma^2$ but performs better for large values of $\sigma^2$.

Figure 2.5: Partitioning graphs with both positive and negative connections. a) Affinity matrix $\mathbf{A}$; b) difference between the affinity matrix of the original graph and the null model, $\mathbf{A} - \mathbf{A}^{NULL}_{|k}$; c) same for the $\mathbf{R}$ null model. In the presence of negative connections, null model $\mathbf{R}$ fails to partition the graph, as indicated by the lack of structure in the matrix $\mathbf{A} - \mathbf{R}$.

The null model $\mathbf{R}$ in modularity $Q_B$ requires positive edge strengths to ensure that the product of the degrees of two nodes is a meaningful measure of their expected connection. There is no such constraint for the null models in $Q_A$. Figure 2.5 displays a network with negative connections. It consists of two equal-size clusters of 10 nodes each. Connections follow a Gaussian distribution with mean $\pm 0.4$ (positive for intra-cluster and negative for inter-cluster connections) and variance 0.04. When subtracting the null model from the affinity matrix $\mathbf{A}$, network structure is still visible in the $\mathbf{A} - \mathbf{A}^{NULL}_{|k}$ case, but not in the $\mathbf{A} - \mathbf{R}$ case. We further address the issue of networks with negative connections in the Discussion section.

2.5.2 Binary Random Networks

Random rewiring schemes, involving permutation of the existing edges of a network, have been proposed to construct binary random networks [93]. Here we explore the similarity of our null graph models to these networks.

We consider the random rewiring scheme proposed in [93], which keeps the degrees of all network nodes constant. A numerical algorithm first selects a pair of edges, $e_{AB}$ and $e_{CD}$, connecting nodes A-B and C-D, respectively. The two edges are then rewired to connect nodes A-C and B-D, effectively eliminating the original two edges and replacing them with $e_{AC}$ and $e_{BD}$. To avoid multiple edges connecting the same nodes, the rewiring step is aborted if at least one of the edges, $e_{AC}$ or $e_{BD}$, already exists. Following a modification in [94], the constraint of no self-loops is added to the rewiring algorithm by requiring nodes A, B, C, and D to be different.
To produce a sufficiently random binary graph, the rewiring step is repeated multiple times. Our Monte Carlo simulation includes generating 200 different binary undirected networks of size N without self-loops. For each network, we create 1000 randomly rewired graphs, each produced by multiple rewiring steps, and then average them to produce a mean affinity matrix W̄. The number of rewiring steps was set equal to N_r times the total number of edges in the graph, for values of N_r from 25 to 400. We compute the distance d(W̄, A^NULL_jk) between W̄ and our null model A^NULL_jk using the root mean square difference between the elements of the two corresponding adjacency matrices:

d(W̄, A^NULL_jk) = √( Σ_{i,j} (A^NULL_jk − W̄)²_{ij} / N² )   (2.28)

The above procedure results in 200 estimates of the distance metric for each null model tested. We evaluated the null models in equations (2.1) and (2.5), as well as the BLUE null model in equation (2.18). In the latter case, we assumed i.i.d. edges with a diagonal covariance matrix, so the BLUE model is the same as equation (2.14).

For a fixed network size N = 10, we repeated the above analysis for various values of N_r. As shown in Figure 2.6a, N_r does not affect the distance between the rewired networks and the null models. Furthermore, the Bernoulli null model is the most similar to the rewiring procedure, followed by the BLUE model and then the R model, with the diagonal terms of equation (2.28) either included (Q_B) or excluded (Q_B′). We consider both Q_B and Q_B′ to test whether the deviation of the R model from the random rewiring graphs is attributable solely to the diagonal terms (cf. section 2.3.2, which indicates that the topology of network R always requires self-loops, even though they are not necessarily present in the original network). Although a considerable amount of the dissimilarity is explained by the diagonal terms, Q_B′ is still more distant from a rewired graph than our null models.
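The distance metric of equation (2.28) is straightforward to compute; a sketch (the function name is illustrative):

```python
import numpy as np

def rms_distance(W_bar, A_null):
    """Root mean square difference of eq. (2.28) between two N x N matrices."""
    N = W_bar.shape[0]
    return np.sqrt(((A_null - W_bar) ** 2).sum() / N ** 2)

print(rms_distance(np.zeros((4, 4)), np.ones((4, 4))))  # 1.0
```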
We also evaluated networks of various sizes N, as shown in Fig. 2.6b, while fixing N_r = 50. Overall, the Bernoulli and BLUE null models were closer to the rewiring scheme than the R model for all values of N. However, the difference dissipates with increased N. The BLUE estimator is best for network sizes N < 8, whereas the Bernoulli estimator is best for larger networks.

To test the performance of the partition methods Q_A and Q_B on binary networks, we simulated a graph introduced in [56]. The graph consists of 128 nodes, arranged in four clusters of 32 nodes each. Edges follow a Bernoulli distribution with probability p_i for within-cluster connections and p_o for between-cluster connections. The average degree k of each node has two components: k_i = 31·p_i from connections within the same cluster, and k_o = 96·p_o from connections to nodes in other clusters. Probabilities p_i and p_o are selected such that the average degree of each node is

Figure 2.6: Comparison of null graph models against a random rewiring scheme. Distance is expressed as the root mean square difference between the elements of the affinity matrices of the graphs (equation (2.28)). For each box plot, the central mark is the median, the edges of the box are the 25th and 75th percentiles, and the whiskers extend to the most extreme data points not considered outliers. Left: Distance does not depend on the number of rewiring steps N_r. Right: As the size of the graph N increases, the distance becomes smaller. Overall, Q_A null models are closer to an actual rewiring scheme than Q_B models.

k = k_i + k_o = 16. We created 100 realizations of this network, each for different values of k_o, ranging from 1 to 15. High values of k_o lead to less community structure, as displayed in the top row of Fig. 2.7. We quantified the performance of method Q_B, as well as method Q_A using the Bernoulli null model, with probability value p = 16/127, and the BLUE model.
The probability value was selected because each node on average connects to 16 out of the 127 other nodes of the graph. All three methods had roughly the same performance, as shown in the bottom row of Fig. 2.7. Partition accuracy drops as k_o increases, given that community structure is less detectable when nodes from different clusters become more and more densely connected.

We further tested all methods with another benchmark introduced in [85], with online code available at http://sites.google.com/site/andrealancichinetti/files. This benchmark generates random binary graphs with a fixed number of

Figure 2.7: Partition results for the binary network described in [53]. Top row: higher values of k_o (shown for k_o = 1, 5, 8) lead to less community structure in the network; bottom row: partition results for different values of k_o. The performance of all methods is practically equal.

nodes and a desired average degree. Node degrees, as well as the sizes of communities, were sampled from power-law distributions. Each node had a fraction 1 − μ of its links within its community, where μ is a mixing parameter with values between 0 and 1. We used a network size of N = 500 and averaged results over 100 realizations of networks. Methods Q_A^BLUE, Q_A^Bernoulli, and Q_B have almost identical performance, as shown in Figure 2.8.

We also explored the performance of partitioning binary networks with variable cluster size ratio. Similar to the Gaussian case described earlier, we simulated N = 20 node graphs and varied the cluster size ratio N₁/N, but now edge strength followed a Bernoulli distribution with parameters p = 0.8 and 0.1 for within- and between-cluster connections, respectively. Parameters were selected so that the

Figure 2.8: Results of the benchmark in [85] for the Q_A and Q_B methods (shown for average degree ⟨k⟩ = 15, 20, 25). The performance of all methods is almost identical for all values of the mixing parameter μ and average degree ⟨k⟩.
(a) N₁/N = 0.5; (b) N₁/N = 0.85; (c) performance.
Figure 2.9: Partition results for a binary network for different values of cluster size ratio. a) Affinity matrix for 10 vs. 10 cluster size; b) affinity matrix for 17 vs. 3 cluster size; c) performance is very similar for all methods, across all cluster size ratio values.

graph has community structure in the weak sense [115], which requires that for every cluster the sum of within-cluster connections is larger than the number of between-cluster connections divided by two. This was true for the cluster size ratios simulated, which included clusters of size 17. For each cluster size ratio, we simulated 1000 random graph realizations and then applied the Q_A and Q_B methods to segment the graph. For Q_A, we used both the BLUE model and the Bernoulli model with the p value estimated from the data. Fig. 2.9 shows that all methods have practically the same performance for all tested cluster size ratios.

2.5.3 Real World Networks

We tested the Q_A and Q_B modularity partition methods on the karate club network given in [165]. In this binary network, individual members are represented by nodes on the graph, and edges connect nodes if a relationship exists between the corresponding individuals. After a conflict between the club officers and the instructor, the club was divided into two organizations. We tested whether the Q_A and Q_B partition methods predict the actual division of the group. Q_A^Bernoulli is the Q_A method based on the Bernoulli random network model in equation (2.5), and Q_A^BLUE is the Q_A method based on the BLUE null model in equation (2.18).

The original paper that presented the karate club network [165] describes two possible partitions, one based on the factions of its members and the other on the actual split of the network.
The two partitions are almost identical, with the exception of person 9, who was friendly to both groups, with two friends in the instructor's group (relationship strengths 2 and 5) and three friends in the officers' group (relationship strengths 3, 4, and 3). The faction partition weakly assigns person 9 to the officers' group; however, after the split he actually joined the instructor's group, for reasons explained as a personal interest in obtaining his black belt. In this dissertation, we use the actual split of the network as the reference partition, because it is the most objective measure describing the network; as described in [165], "The factions were merely ideological groupings, however, and were never organizationally crystallized. There was an overt sentiment in the club that there was no political division, and the factions were not named or even recognized to exist by club members".

Figure 2.10 shows the karate club network, with color indicating the actual subsequent club split due to conflicts. Table 2.2 shows the modularity achieved by

Figure 2.10: Karate club network. Nodes indicate 34 club members and color indicates the actual split of the group after a conflict between its members.

Table 2.2: Results of karate club binary network partitioning using 3 methods: Q_B, Q_A^Bernoulli, and Q_A^BLUE. Values represent the modularity achieved for the 3 partition results in the columns of the table. Bold font indicates maximum modularity, which also represents the final partition result achieved by each method.

                 Club split    Club split except node 9    Club split except nodes 9 & 10
Q_B              0.3582        0.3715                      0.3718
Q_A^Bernoulli    0.4671        0.4667                      0.4662
Q_A^BLUE         0.3741        0.3872                      0.3869

three methods, Q_B, Q_A^Bernoulli, and Q_A^BLUE, for three different ways of bi-partitioning the network: the club split; the club split with the exception of node 9, classified into the other group; and the club split with the exception of nodes 9 and 10, classified into the other group.
Bold font indicates the maximum modularity for each method, which also represents the final partition result achieved with our implementation of each method. Modularity values are shown based on the binary values of the indicator vectors s_A and s_B. Only Q_A^Bernoulli precisely predicted the true split of the club. However, all methods produced very similar results, and inspection of Figure 2.10 shows that nodes 9 and 10 are actually very close to both groups.

We additionally applied the above methods to the weighted version of the karate club network (Figure 3 in [165]). All methods achieved maximum modularity for the actual club split with the exclusion of node 9, which effectively corresponds to the faction partition of the network. We also point out that modularity can increase further, both for the unweighted and weighted versions of the karate club network, with subsequent partition into four communities.

We further applied the Q_A and Q_B clustering methods to the structural brain network data described in [61]. In this undirected weighted network, each node represents one of the 66 FreeSurfer-parcellated regions of the cerebral cortex, and edges represent the connection of regions in terms of axonal fiber density. Axonal fibers were extracted from diffusion spectrum imaging data and then averaged across 5 subjects as described in [61]. Figure 2.11ab displays the complete network, and Figure 2.11cd shows our clustering results with methods Q_A and Q_B. For method Q_A we used the BLUE null model in equation (2.18). Nodes of different groups are plotted using different colors. We also label hub nodes with larger spheres. Hub nodes denote regions that are highly connected to the network, and were identified as those with a participation coefficient greater than 0.5 [61]. Although the ground truth for this network is unknown, we observe that method Q_A produced a more symmetric between-hemisphere sub-network structure than method Q_B (Figure 2.11cd).
Furthermore, several resting state studies [48] have identified the central role of the precuneus and posterior cingulate in regulating brain activity in the default network. Method Q_A identified several hub nodes in this area, with a more symmetric organization than method Q_B (Figure 2.11ef).

To test the Q_A and Q_B methods on a network with both positive and negative connections, we used the Beijing data set from the 1000 Functional Connectomes Project in NITRC (http://fcon_1000.projects.nitrc.org/). The data set consists of 191 subjects of age 18-26, each having 7.5 minutes of resting state fMRI recordings. The available data was already motion corrected, registered to the Harvard-Oxford probabilistic atlas, segmented, spatially smoothed, bandpass filtered, and subjected to nuisance analysis [78]. We constructed a functional network using a total of 96 available cortical ROIs as nodes and computed the correlation coefficient between the fMRI recordings in every pair of ROIs. The correlation coefficients were then transformed to z-values using the Fisher transform z = arctanh(r) [51], which produces a weighted adjacency matrix A.

This real world network contains both positive and negative connections, the former representing correlated spontaneous fluctuations within a network, and the latter anti-correlations between networks [47, 48]. We applied the Q_A and Q_B methods to partition the network, as well as a generalization of Q_B, denoted as Q_B^G, that can deal with negative connections [54]. Figure 2.12 shows that Q_A is the only method that achieves perfect across-hemisphere symmetry.
The dark blue spheres represent the task-negative network, which includes a set of regions often termed the "default network", namely the posterior cingulate, precuneus, medial prefrontal areas, and others [48]; the cyan spheres represent the task-positive network, which includes the insula, sensory and motor cortices, and others; the red spheres represent the visual cortex, which has no intrinsic preference for either network.

Method Q_B has unpredictable behavior in the presence of negative connections, and does not accurately cluster the task-negative network. Method Q_B^G produces very similar results to Q_A; however, there are 4 asymmetric nodes and an additional cluster of yellow spheres which encompasses the left/right middle medial frontal cortex and the right Broca's area.

2.6 Discussion

We have proposed several null models for modularity-based graph partitioning. Apart from being optimal for specific parametric distributions of edge strength, these models are also not limited by two constraints of the R null model, namely non-negative edge strength and the assumption of self-loops.

Model R uses the product of the degrees of two nodes to evaluate their expected connection strength. This measure becomes meaningless when negative edges are allowed, because it can arbitrarily change sign. As a result, A − R has a random structure and is not useful for partitioning graphs, as indicated in Figure 2.5c. This is a well known constraint, and there are multiple studies generalizing the modularity definition to accommodate negative values [54, 75, 122]. The basic idea behind these methods is to calculate modularity based on a weighted summation of two terms, each based on a separate null model derived either from the positive or the negative edges. Although they overcome the problem of negative edges, these methods represent heuristic solutions with a required selection of weighting coefficients that affects clustering results [122].
In contrast, the Gaussian and BLUE null models require no model modifications or selection of mixing parameters. Nodes with negative connections are treated not only as disjoint, but also as repelling each other. The functional brain network in Fig. 2.12 illustrates such an example, with negative edges representing anti-correlations. Unlike the conventional method (Q_B) and its extension for negative edges (Q_B^G), the BLUE null model achieves perfectly symmetric partition results.

As discussed in Section 2.3, model R always assumes self-loops, because the diagonal elements of its affinity matrix are non-zero. However, real world networks often do not allow self-loops, for example traffic networks, brain networks, etc. As a result, model R often over-estimates self-loop connections. This by itself may not seem important for partitioning graphs. But, given that model R has the same node degrees as the original graph (the total sum of edges is constant), this also means that it under-estimates all between-node connections. Figure 2.3d displays A − R, which has a negative bias in the diagonal elements and a positive bias in the off-diagonal elements. In particular, diagonal elements are strongly negative, while off-diagonal elements are mostly positive, even in the terms corresponding to inter-cluster connections (top right and bottom left quadrants). The latter causes the graph to often become inseparable with method Q_B, leading to NMI = 0 as indicated in our results. Method Q_A does not suffer from a similar bias, and network topology can be properly modeled using matrix H.

Figures 2.3 and 2.4 show that method Q_B greatly benefits from increased values of σ². This is because increased σ² amplifies the difference between inter- and intra-cluster connections, which is a consequence of our graph simulation approach: we randomly draw from the Gaussian distribution and assign stronger values as intra-cluster edges and weaker values as inter-cluster edges.
After some threshold, the difference between inter- and intra-cluster connections is so large that the aforementioned bias in the Q_B approach no longer makes the graph inseparable (Figure 2.3a). On the other hand, method Q_A does not benefit as much from increased σ², because the null model (2.14) does not depend on the variance of the Gaussian distribution.

Both the Q_A and Q_B methods have decreased performance when cluster sizes become very asymmetric (N₁/N very large). This is because the community structure becomes less prominent, eventually not even satisfying the weak sense community structure property [115].

The rewiring results in Figure 2.6 show that method Q_B is not as close as Q_A to an actual rewiring scheme. This is true regardless of whether we include (Q_B) or exclude (Q_B′) the diagonal terms, and is explained by the aforementioned bias of the conventional null model towards increased self-loop connections and decreased inter-cluster connections. The Q_A^Bernoulli model outperforms the Q_A^BLUE model for all cases other than N < 8, where the small size of the network severely limits the possible rewiring selections using the algorithm in [93].

Our partition results on binary networks show that all methods perform similarly. This includes the BLUE model, which is statistically optimal for Gaussian distributions of edge strength, but still performs well for the Bernoulli distribution. This may be indicative of a more general robustness of the BLUE model in cases that deviate from the multivariate Gaussian assumption.

Our karate network partition results with the Q_B method are different from the ones reported in [109], even though both use the same null model R. This is because [109] used a different implementation of the Kernighan-Lin algorithm (discretization of the elements of vector s_B), which may have resulted in a local maximum that produces different results for node 10.
However, method Q_B achieves higher modularity for the partition result in the third column of Table 2.2. Given that nodes 9 and 10 are about equally connected to both clusters and modularity is very close for all partitions, these results seem rather coincidental. For the karate club network, even higher modularity can be achieved by subsequent partitioning of the network into four subnetworks. However, by tuning the resolution parameter λ in equation (2.26), results can be restricted to two clusters by selecting a value λ < 0.59 for the BLUE null models (a value smaller than 1 biases towards a more global structure).

To perform clustering, we selected the spectral partitioning method proposed in [108]. This method performs sequential bipartitions of the graph until maximum modularity is achieved. A variant of this method would be to use multiple eigenvectors to directly estimate multiple clusters [107, 151, 152, 155]. While such methods can potentially achieve better clustering results, they require a priori knowledge of the number of clusters, a general problem dealt with by many methods with different levels of success [12, 25, 100, 140, 142].

Selecting which null model to use for clustering depends on the network at hand. In general, we recommend using the BLUE model in equation (2.19). Our results indicate that it performs reliably in a variety of networks regardless of the probability distribution of the network edges. It is also the optimal estimator (in the MMSE sense) when edges follow a Gaussian distribution. In the most common case of no prior information, the covariance matrix Σ_x is the identity and the BLUE model is estimated by equation (2.14). Furthermore, if the network topology allows for self-loops, the BLUE model can be estimated using equation (2.12) with an appropriate incidence matrix H. For the specific case of binary networks, the Bernoulli expected network in equation (2.5) and the BLUE model in equation (2.14) perform nearly equivalently.
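The sequential spectral bipartition described above can be sketched in a few lines. This is an illustrative stand-in: the leading-eigenvector rule follows [108], but the matrix B here is built from a simple mean-subtracted null model rather than the null models of this chapter, and `spectral_bipartition` is a hypothetical helper name:

```python
import numpy as np

def spectral_bipartition(B):
    """One modularity bipartition step in the spirit of [108]: split according
    to the sign pattern of the leading eigenvector of the modularity matrix B."""
    w, V = np.linalg.eigh((B + B.T) / 2)   # symmetrize for numerical safety
    s = np.sign(V[:, np.argmax(w)])
    s[s == 0] = 1.0
    return s

# two dense 4-node blocks with weak coupling should split into the two blocks
A = np.block([[np.ones((4, 4)), 0.1 * np.ones((4, 4))],
              [0.1 * np.ones((4, 4)), np.ones((4, 4))]])
np.fill_diagonal(A, 0.0)
B = A - A.mean()   # mean-subtracted stand-in for A minus a null model
s = spectral_bipartition(B)
print(abs(s[:4].sum()) == 4.0 and abs(s[4:].sum()) == 4.0 and s[0] != s[4])  # True
```

In a full implementation, this step would be applied recursively to each resulting cluster until modularity stops increasing.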
Regarding the estimation of parameters for the null models, the mean μ in equation (2.4) can be estimated as μ = 2m/(N(N−1)), while the probability parameter p in equation (2.5) can be estimated as p = 2m/(N(N−1)). The mean μ_x and covariance Σ_x in equation (2.19) depend on the prior information about the network. The commonly used null model in equation (2.14) does not require any parameter estimation.

2.7 Conclusion

Graph partition algorithms based on modularity have become increasingly popular in identifying network community structure. In this chapter, we introduced null models that are consistent with their assumed underlying distributions, and in some cases lead to improved estimation of modularity and graph partitioning. The BLUE model performed well in all cases, and therefore can be considered a general method for graph partitioning, not restricted in its use to Gaussian edge distributions. Our models do not have some of the limitations of the R model, namely they do not assume self-loops and they can deal with negative connections. In addition, they accommodate possible network topology changes and can be more robust to noise. They performed well in our simulations and also appeared to successfully detect the community structure of real world networks.

2.8 Supplementary Materials

2.8.1 Random Network Conditioned on Node Degrees

Here we derive equations (2.4) and (2.5), namely the conditional expected network for the cases of Gaussian and binary networks, respectively. First, consider the conditional expected network in equation (2.3) for a Gaussian random network with independent identically distributed edge strengths with mean μ and variance σ². We assume a complete graph, i.e. a graph where every pair of distinct nodes is connected by a unique edge. We exclude self-loops; therefore the degree of a node in the graph is the sum of N − 1 Gaussian variables.
For two nodes i and j with degrees k_i and k_j, connected with edge strength A_ij = t, we define two new random variables: k′_i = k_i − t and k′_j = k_j − t. These variables represent the sum of all edges connected to nodes i and j, respectively, excluding the common connection A_ij. Therefore, their distribution is the sum of N − 2 independent Gaussian random variables with mean μ and variance σ². The conditional probability on the right side of equation (2.3) becomes:

P(k_i, k_j | A_ij = t) = P(k′_i = k_i − t, k′_j = k_j − t) = P(k′_i = k_i − t) · P(k′_j = k_j − t)   (2.29)

where the first equality uses the newly defined random variables and the second equality results from the independence of k′_i and k′_j, since they do not share any common edges.

We can now substitute into the right hand side of equation (2.3) the known distributions:

P(A_ij = t) = (1/√(2πσ²)) · exp( −(t − μ)² / (2σ²) )   (2.30)

P(k′_i = k_i − t) = (1/√(2π(N−2)σ²)) · exp( −(k_i − t − (N−2)μ)² / (2(N−2)σ²) )   (2.31)

which produces the expression in equation (2.4).

For the binary network, the Bayesian formula (2.3) can be written as:

E(A_ij | k_i, k_j) = Σ_{t=0}^{1} t · P(A_ij = t | k_i, k_j)
                   = Σ_{t=0}^{1} t · P(k_i, k_j | A_ij = t) P(A_ij = t) / Σ_{u=0}^{1} P(k_i, k_j | A_ij = u) P(A_ij = u)   (2.32)

Following the same analysis as above for the Gaussian case, if we assume all edges in the binary network follow an i.i.d. Bernoulli distribution with parameter p, we have:

P(k_i, k_j | A_ij = t) P(A_ij = t) = { P(k′_i = k_i − 1) P(k′_j = k_j − 1) · p,   t = 1
                                       P(k′_i = k_i) P(k′_j = k_j) · (1 − p),    t = 0 }   (2.33)

In this case, the corresponding node degrees k′_i follow a binomial distribution, as the sum of N − 2 i.i.d. Bernoulli random variables. For example:

P(k′_i = k_i − 1) = C(N−2, k_i−1) · p^(k_i−1) · (1−p)^(N−1−k_i)   (2.34)

To obtain equation (2.5) for a binary network whose edges follow an i.i.d. Bernoulli distribution, we apply equations (2.33) and (2.34) to equation (2.32).
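A direct numerical sketch of this construction (the function name is illustrative, not from the dissertation's code):

```python
from math import comb

def bernoulli_null(ki, kj, N, p):
    """Conditional expected edge E(A_ij | k_i, k_j) for i.i.d. Bernoulli(p) edges,
    assembled from equations (2.32)-(2.34)."""
    def pk(k):
        # P(k'_i = k): binomial, the sum of N-2 i.i.d. Bernoulli(p) variables
        if k < 0 or k > N - 2:
            return 0.0
        return comb(N - 2, k) * p**k * (1 - p)**(N - 2 - k)
    num = pk(ki - 1) * pk(kj - 1) * p
    den = num + pk(ki) * pk(kj) * (1 - p)
    return num / den

print(bernoulli_null(4, 3, 5, 0.5))  # 1.0 : node i already has full degree N-1, so the edge must exist
print(bernoulli_null(0, 3, 5, 0.5))  # 0.0 : node i is isolated, so the edge cannot exist
```

The two printed cases illustrate that, unlike the R model, this conditional expectation respects hard degree constraints exactly.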
2.8.2 Gaussian Random Network with Independent Identically Distributed Edges

For the case of Gaussian random networks with independent identically distributed edge strengths, we have μ_x = μ·1 and Σ_x = σ²·I, which simplify the covariance matrices:

Σ_k = H Σ_x H^T = σ² HH^T
Σ_xk = Σ_x H^T = σ² H^T

Therefore, equation (2.12) now becomes:

E(x | Hx = k) = μ·1 + H^T (HH^T)^{-1} k̄   (2.35)

where k̄ ≡ k − μ_k. The product HH^T has the structure:

(HH^T)_ij = { N − 1,  i = j
              1,      i ≠ j }   (2.36)

and we can write HH^T in the following form:

HH^T = (N − 2) I_N + 1_N 1_N^T   (2.37)

The physical meaning of (HH^T)_ij is the number of common edges that nodes i and j share. In a complete graph without self-loops there is only one edge that links two distinct nodes, while there are N − 1 edges associated with each node (on the diagonal of HH^T). To calculate the inverse of HH^T, we apply the matrix inversion lemma [146] to equation (2.37):

(A + UCV)^{-1} = A^{-1} − A^{-1} U (C^{-1} + V A^{-1} U)^{-1} V A^{-1}   (2.38)

with A = (N − 2) I_N, U = 1_N, V = 1_N^T, and C = 1. We denote by I_N the identity matrix of dimension N and by 1_N a unit column vector of dimension N. The inverse becomes:

(HH^T)^{-1}_ij = { 1/(N−2) − 1/(2(N−1)(N−2)),  i = j
                   −1/(2(N−1)(N−2)),           i ≠ j }   (2.39)

For the l-th element of x, the conditional expectation in equation (2.35) is:

E(x_l | Hx = k) = μ + [Σ_xk Σ_k^{-1} k̄]_l
                = μ + H_l^T (HH^T)^{-1} k̄
                = μ + (k̄_i + k̄_j)/(N−1) − (1/((N−1)(N−2))) Σ_{s=1, s≠i, s≠j}^{N} k̄_s
                = (k_i + k_j)/(N−2) − 2m/((N−1)(N−2))   (2.40)

where H_l^T denotes the l-th row of H^T, and the μ terms cancel exactly in the last step. In the above we used the fact that the i-th and j-th elements are the only non-zero elements of row H_l^T. They are equal to 1, indicating which nodes are associated with edge x_l. Therefore, a left multiplication by H_l^T results in the addition of the i-th and j-th rows of matrix (HH^T)^{-1}.
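The cancellation of the μ terms in (2.40) can be checked numerically against the general form (2.35); a sketch, where the edge ordering and the value of μ are arbitrary example choices:

```python
import numpy as np

rng = np.random.default_rng(0)
N, mu = 6, 0.3
pairs = [(i, j) for i in range(N) for j in range(i + 1, N)]   # complete graph, no self-loops
E = len(pairs)

# incidence-style matrix H with Hx = k (node degrees)
H = np.zeros((N, E))
for l, (i, j) in enumerate(pairs):
    H[i, l] = H[j, l] = 1.0

x = rng.normal(mu, 1.0, E)   # i.i.d. Gaussian edge strengths
k = H @ x                    # node degrees
m = x.sum()                  # total edge strength, so sum(k) = 2m

# general form (2.35): mu*1 + H^T (H H^T)^{-1} (k - mu_k)
mu_k = (N - 1) * mu * np.ones(N)
general = mu * np.ones(E) + H.T @ np.linalg.solve(H @ H.T, k - mu_k)

# closed form (2.40): (k_i + k_j)/(N-2) - 2m/((N-1)(N-2)); mu has cancelled
closed = np.array([(k[i] + k[j]) / (N - 2) - 2 * m / ((N - 1) * (N - 2))
                   for (i, j) in pairs])

print(np.allclose(general, closed))  # True
```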
2.8.3 Derivation of Best Linear Unbiased Estimator (BLUE)

To solve for the BLUE estimator x̂_BLUE|k = L̂k + b̂ in equation (2.17), we set the corresponding derivatives with respect to L̂ and b̂ to zero:

∂F(x̂|k)/∂b̂ |_{x̂|k = x̂_BLUE|k = L̂k + b̂} = 0   (2.41)

∂F(x̂|k)/∂L̂ |_{x̂|k = x̂_BLUE|k = L̂k + b̂} = 0   (2.42)

The objective function F(x̂|k) can be written as:

F(x̂|k) = E[(x̂|k − x)^T (x̂|k − x)] = E[tr((x̂|k − x)(x̂|k − x)^T)]   (2.43)

To compute (2.41) and (2.42), we need to estimate derivatives of the trace of several matrix products. We will therefore use the following formulas, which are true for any matrices A, B, C, and X that result in an n×n argument for the trace tr(·):

∂tr(AXB)/∂X = ∂tr(B^T X^T A^T)/∂X = A^T B^T   (2.44)

∂tr(AXBX^T C)/∂X = A^T C^T X B^T + CAXB   (2.45)

A proof of these formulas is given later in this section. The derivatives are now estimated as follows:

∂F(x̂|k)/∂b̂ = ∂/∂b̂ E[tr((L̂k + b̂ − x)(L̂k + b̂ − x)^T)]
            = E[∂/∂b̂ tr((L̂k + b̂ − x)(L̂k + b̂ − x)^T)]
            = E[2L̂k + 2b̂ − 2x]
            = 2L̂μ_k + 2b̂ − 2μ_x   (2.46)

∂F(x̂|k)/∂L̂ = ∂/∂L̂ E[tr((L̂k + b̂ − x)(L̂k + b̂ − x)^T)]
            = E[∂/∂L̂ tr((L̂k + b̂ − x)(L̂k + b̂ − x)^T)]
            = E[2L̂kk^T + 2b̂k^T − 2xk^T]
            = 2L̂(Σ_k + μ_k μ_k^T) + 2b̂μ_k^T − 2(Σ_xk + μ_x μ_k^T)   (2.47)
Consequently, d=dx ij (tr(AXB)) = tr(Ae i e T j B) = tr(e T j BAe i ) = e T j BAe i = (BA) ji = A T B T ij (2.50) 49 Similarly, d=dx ij (tr(AXBX T C)) = tr(Ae i e T j BX T C + AXBe j e T i C) = tr(Ae i e T j BX T C) +tr(AXBe j e T i C) = tr(e T j BX T CAe i ) +tr(e T i CAXBe j ) = (A T C T XB T + CAXB) ij (2.51) 50 (a)A (b) Brain Network (c) Method Q A (d) Method Q B (e) Method Q A (f) Method Q B Figure 2.11: Graph analysis of a structural network based on diusion brain imag- ing. a) Anity matrix with the upper-left and lower-right quadrants corresponding to regions of interest on the right and left hemispheres respectively; b) structural brain network with edges greater than 0.1; c) partition results for method Q A . Color indicates cluster membership; d) same as before for method Q B ; e) partition results for method Q A , but now spheres indicate ROIs and larger spheres indicate network hubs; f) same as before for method Q B 51 Q A Q B Q G B Figure 2.12: Partitioning results with Q A , Q B , and Q B [54] methods of a real functional brain network. Dierent modules detected by the algorithm are labeled by color. First row indicates the symmetric across-hemisphere results while the second row highlights the asymmetric ROIs. 52 Chapter 3 Modularity-Based Partitioning for Directed Graphs Mapping the human connectome has received considerable attention recently, as it promises to construct a detailed map of the structural and functional neural connections of the human brain. Network analysis methods have a prominent role in this eort, as they have successfully revealed the community structure of brain networks, including functional MR connectivity maps [148] and ber tract networks estimated from diusion imaging [22,61]. Optimal partitioning of a network requires specication of an appropriate cost function. 
As we have seen in Chapter 2, among the most popular is modularity [109], which uses the idea that if a natural division of a network exists, connections within a module should be stronger than their expected values, and the opposite should hold true for connections between modules. Evaluation of modularity requires the computation of an expected network, or null model, which has the same configuration as the original network but contains no community structure because of the random placement of its edges. For an undirected network, Newman [108] considered a random network where the probability of having a connection between two nodes i and j is proportional to the product of their degrees, k_i k_j. For a directed network, [6] and [86] extended the above idea by defining a random network whose edge from node i to j is proportional to the product of the out- and in-degrees of the two nodes, k^out_i k^in_j.

Partitioning a directed network based on modularity reveals its community structure, but it does not reveal causal relationships or how information flows within the network. Several studies have investigated how information flows between active brain regions [81, 126] or brain hemispheres [13, 141]. However, to our knowledge, a global search to find information flow throughout the network has not been presented.

In this chapter, we propose an accurate null model for partitioning directed graphs: the expected network conditioned on the in- and out-degrees of all nodes. This model is directly analogous to our work described in Chapter 2 on partitioning undirected graphs (also in [21, 22]), where we proposed the expected network conditioned on the degrees of all nodes. We also present a novel method to partition graphs based on information flow.
To investigate the organizational structure of directed graphs, we combine the two methods in a two-step procedure, where we initially partition a graph based on modularity, and then measure information flow within each module.

3.1 Method

3.1.1 Null Models for Directional Graphs

3.1.1.1 Conditional Expected Model

Consider a directed network consisting of N nodes, where the edge from node i to node j is the A_ij element of the adjacency matrix A. The in- and out-degrees of node i, representing the sums of incoming and outgoing edges respectively, are denoted as Σ_{j=1}^{N} A_ji = k^in_i and Σ_{j=1}^{N} A_ij = k^out_i. In vector notation, A^T 1 = k^in and A1 = k^out.

To partition directed graphs based on modularity, [6] and [86] proposed the null model whose edge from i to j is given by:

R_ij = k^out_i k^in_j / m   (3.1)

In Chapter 2, we introduced a more accurate null model for undirected graphs, the conditional expected network E(A_ij | k). Here, we introduce an analogous null model for directed graphs:

E(A_ij | k^in_1, k^in_2, ..., k^in_N, k^out_1, k^out_2, ..., k^out_N) = E(A_ij | k^in, k^out)   (3.2)

which is the conditional expected network given the in- and out-degrees of all nodes. To estimate the above expectation, we concatenate the elements of the adjacency matrix A into a column vector x with the transformation A_ij = x_l, where

l = { (2N − j)(j − 1) + 2(i − j),       j < i   (3.3a)
      (2N − i)(i − 1) + 2(j − i) − 1,   i < j   (3.3b) }

This transformation maps all non-diagonal elements of A into a column vector. We now consider the linear mapping Hx = k, where H is a uniquely defined matrix that connects the edge strengths x to the node degrees k = [k^out^T  k^in^T]^T. Figure 3.1a illustrates this with a 3-node, 6-edge complete directed graph. Our conditional expected model in equation (3.2) becomes:

E(x | k) = E(x | Hx = k) = ∫ x P(x | Hx = k) dx   (3.4)

Figure 3.1: Left: A 3-node complete directional network. Right: Forward mapping matrix H.
Rows above the red line correspond to the out-degrees of all nodes; rows below correspond to the in-degrees.

3.1.1.2 Directed Gaussian Random Network and BLUE

While there is no general analytical solution to the conditional probability in equation (3.4), a closed form exists for the multivariate Gaussian distribution. We refer to the random network whose edges are Gaussian distributed with mean \mu_x and covariance \Sigma_x as a Gaussian random network, and under this assumption equation (3.4) becomes:

E(x | Hx = k) = \mu_{x|k} = \mu_x + \Sigma_{xk} \Sigma_k^+ (k - \mu_k)    (3.5)

where \Sigma_{xk} = \Sigma_x H^T, \Sigma_k = H \Sigma_x H^T, and \mu_k = H \mu_x. We use the pseudoinverse \Sigma_k^+ because H is rank deficient (since \sum_{j=1}^N k_j^{in} = \sum_{j=1}^N k_j^{out} = m).

The null model in equation (3.5) is more flexible than the one in equation (3.1): negative connections are allowed; the matrix H allows for different network topologies with or without self loops; and prior information can be introduced in the form of \mu_x and \Sigma_x. For the special case of Gaussian independent identically distributed edges, we have \mu_x = \mu 1 and \Sigma_x = \sigma^2 I, and with some manipulation the conditional expectation becomes:

E(A_{ij} | Hx = k) = E(x_l | k) = (k_i^{out} + k_j^{in})/N + (k_i^{all} + k_j^{all})/(N(N-2)) - m/((N-1)(N-2)) for i \neq j, and 0 for i = j,    (3.6)

where k_i^{all} = k_i^{in} + k_i^{out} is the total degree of node i. For the general case of non-Gaussian distributions, the solution in equation (3.5) is still useful. Similarly to [21], we can show that it is the best linear unbiased estimator \hat{x}_{BLUE|k} of the edges x given the node degree sequence k, which we have fixed in the configuration model [102].

3.1.2 Maximization of Information Flow and Modularity

We define information flow within a network naturally as follows: the amount of information flowing from node i to node j is A_{ij}. Our goal is to identify clusters of nodes that are information sources or information sinks. Schematically, this is shown in Figure 3.2a. Assume the nodes in the affinity matrix are ordered into two clusters.
Then, the top-right quadrant denotes connections from cluster 1 to cluster 2, and the opposite holds true for the bottom-left quadrant (even though clusters can be of unequal size, we use the term quadrant for convenience). Maximizing information flow between the two clusters is equivalent to finding the proper configuration of nodes so that elements in the top-right quadrant are maximized while elements in the bottom-left quadrant are simultaneously minimized.

For comparison, Figure 3.2b shows maximization of modularity, which uses B = A - A_{BLUE|k}, because in this case the graph is compared against its null model. Maximizing modularity is equivalent to maximizing intra-cluster connections (top-left and bottom-right quadrants) and minimizing inter-cluster connections (top-right and bottom-left quadrants). In the following we demonstrate how this maximization leads to modularity.

Figure 3.2: Maximizing information flow using A (a) and modularity using B = A - A_{BLUE|k} (b). The plus sign indicates regions we maximize and the minus sign regions we minimize.

Let y_1 and y_2, with elements equal to 1 or 0, be indicator vectors for the two clusters. Motivated by Figure 3.2, we define the following objective functions that maximize modularity and information flow, respectively:

J_M(y_1, y_2) = (1/4)(y_1^T B y_1 + y_2^T B y_2 - y_2^T B y_1 - y_1^T B y_2)    (3.7)
J_IF(y_1, y_2) = y_1^T A y_2 - y_2^T A y_1    (3.8)

Each term in the objective functions corresponds to one quadrant in Figure 3.2. These expressions are simplified with a transformation of variables, y_1 = (1 + s)/2 and y_2 = (1 - s)/2, which replaces the indicator vectors y_1 and y_2 with a single indicator vector s with elements 1 and -1. The objective functions now become:

J_M(s) = (1/4) s^T B s    (3.9)
J_IF(s) = (1/2)(s^T A 1 - 1^T A s) = (1/2) s^T \Delta k    (3.10)

with \Delta k = k^{out} - k^{in}. Notice that the modularity expression above is the same, up to a constant, as the original expression of modularity (cf. [86]), but has a different null model.
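The change of variables can be checked numerically. The sketch below, on an arbitrary random directed graph, confirms that the quadratic form of equation (3.8) collapses to the linear form of equation (3.10), and that maximizing the linear form over vectors with entries plus or minus one gives the sign solution later stated in equation (3.16):

```python
import numpy as np

rng = np.random.default_rng(1)
N = 6
A = rng.random((N, N))
np.fill_diagonal(A, 0.0)

s = rng.choice([-1, 1], size=N)
y1, y2 = (1 + s) / 2, (1 - s) / 2

# Equation (3.8) in the y variables ...
J_quad = y1 @ A @ y2 - y2 @ A @ y1
# ... equals equation (3.10) in the s variable:
dk = A.sum(axis=1) - A.sum(axis=0)        # delta k = k_out - k_in
assert np.isclose(J_quad, 0.5 * s @ dk)

# Maximizing (1/2) s^T dk over s in {-1, +1}^N is solved elementwise by
# the sign of dk, the closed-form information-flow partition of (3.16):
s_if = np.where(dk >= 0, 1, -1)
assert np.isclose(s_if @ dk, np.abs(dk).sum())
```

The first assertion is exactly the algebraic identity behind (3.10); the second shows why the information-flow objective has a closed-form maximizer.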
We seek the partitions \hat{s}_M and \hat{s}_IF that maximize the above objective functions:

\hat{s}_M = arg max_s J_M(s) = arg max_s s^T B s    (3.11)
\hat{s}_IF = arg max_s J_IF(s) = arg max_s s^T \Delta k    (3.12)

According to spectral graph theory, maximization of equation (3.11) proceeds by allowing the elements of vector s to have continuous values and constraining its norm ||s||_2^2 = 1. Applying the method of Lagrange multipliers, we have:

J_M' = s^T B s - \lambda (s^T s - 1)    (3.13)
\partial J_M' / \partial s = s^T (B + B^T) - 2\lambda s^T = 0    (3.14)
(B + B^T) s = 2\lambda s    (3.15)

Therefore, maximum modularity is achieved when 2\lambda and s are the largest eigenvalue and corresponding eigenvector of B + B^T, respectively.

The matrix B + B^T is symmetric, which implies that modularity-based clustering of directed graphs has no directional preference. When an edge is weaker than its expected value, it contributes equally to modularity regardless of whether it runs from node i to node j or the opposite. The vector s is then discretized to {-1, 1} by setting a zero threshold. Because of this discretization, further fine tuning is necessary to approach the global maximum, and this is done with the Kernighan-Lin algorithm [79]. We have further optimized this algorithm by randomizing the sequence of nodes and allowing them to change group membership more than once. This still does not guarantee a global optimum as simulated annealing does, but it offers a trade-off between computation time and accuracy.

To partition the network into more than two groups, we recursively dichotomize the resulting sub-networks by maximizing equation (3.11) for each of them separately. When modularity stops increasing after a partition step, we have reached the maximum modularity for the whole network, and at this point no further sequential partitioning should be performed. We maximize information flow by solving equation (3.12) in a similar way.
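The spectral step for equation (3.11) can be sketched as follows. This is a minimal implementation that omits the Kernighan-Lin refinement stage; the toy two-block network and the uniform null model standing in for A_{BLUE|k} are illustrative assumptions:

```python
import numpy as np

def modularity_bipartition(A, E):
    """Spectral bipartition maximizing s^T B s with B = A - E: take the
    leading eigenvector of the symmetrized B + B^T (equation 3.15) and
    threshold at zero.  Kernighan-Lin fine tuning is omitted."""
    B = A - E
    S = B + B.T
    w, V = np.linalg.eigh(S)     # symmetric matrix: eigh, ascending order
    v = V[:, -1]                 # eigenvector of the largest eigenvalue
    return np.where(v >= 0, 1, -1)

# Toy directed network: two 4-node groups, dense within, empty between.
A = np.zeros((8, 8))
A[:4, :4] = 1.0
A[4:, 4:] = 1.0
np.fill_diagonal(A, 0.0)
# Illustrative uniform null model with the same total edge weight:
E = np.full((8, 8), A.sum() / (8 * 7))
np.fill_diagonal(E, 0.0)

s = modularity_bipartition(A, E)
# Each group ends up on one side of the cut (the overall sign is arbitrary):
assert len(set(s[:4])) == 1 and len(set(s[4:])) == 1 and s[0] != s[4]
```

Because the eigenvector's overall sign is arbitrary, only the grouping of the entries of s is meaningful, not which cluster receives the label +1.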
For information flow, the solution of equation (3.12) is much simpler:

\hat{s}_{IF,i} = +1 if \Delta k_i \geq 0, and -1 if \Delta k_i < 0.    (3.16)

To combine the two methods, we propose a two-step procedure: first perform modularity-based clustering on the directed network, and then perform information flow analysis within each cluster. This approach not only reveals the community structure of a network, but also identifies the information sources and sinks within each module.

3.1.3 Tikhonov Weighted Hybrid Clustering

Here we propose a hybrid objective function for bipartitioning the graph. We aim to achieve a partition with adjustable weighting between purely directional information and purely modularity considerations. Following the definitions of y_1 and y_2 and of the objective functions J_M and J_IF, we can combine the objective functions using a Tikhonov weighting factor \gamma:

J_H(y_1, y_2) = (1/4)(y_1^T B y_1 + y_2^T B y_2 - y_2^T B y_1 - y_1^T B y_2) + \gamma (y_1^T A y_2 - y_2^T A y_1)    (3.17)

with the single Tikhonov parameter \gamma adjusting between them. We then use the transformation y_1 = (1 + s)/2 and y_2 = (1 - s)/2 to convert to a single indicator vector s that has elements 1 and -1 for the nodes grouped in the two subgraphs, respectively:

J_H(s) = c s^T B s + (\gamma/2)(s^T A 1 - 1^T A s) = c s^T B s + (\gamma/2)(s^T k^{out} - k^{in T} s) = c s^T B s + (\gamma/2) s^T \Delta k    (3.18)

with \Delta k = k^{out} - k^{in} and the scaling factor c = 1/4. We can favor information flow by setting \gamma very large, favor modularity by setting \gamma to zero, or choose an intermediate value to balance the contributions of the two.

3.1.4 Hybrid Partitioning Implementation

To achieve a desirable partition, we look for the best indicator vector that optimizes the hybrid objective function in equation (3.18). Instead of directly optimizing over the discrete domain, we relax the solution s to the continuum with a constraint on its norm, ||s||_2^2 = 1. Applying the method of Lagrange multipliers (and dropping the immaterial scaling c), we have:

J_H'(s, \lambda) = s^T B s + (\gamma/2) s^T \Delta k - \lambda (s^T s - 1)    (3.19)

where \lambda > 0 is the Lagrange multiplier.
By setting the partial derivatives of J_H' with respect to s and \lambda to zero, the optimal solution satisfies:

(\lambda I - G) s = C,    ||s||_2^2 = 1    (3.20)

where C = (\gamma/4) \Delta k and G = (B + B^T)/2. Compared with [21], this is no longer an eigenvalue problem because C is nonzero in general. However, we can use an iterative numerical method proposed in [8] together with the unbiased property of the null model A_{BLUE|k}. We easily obtain A_{BLUE|k} 1 = k^{out} and A_{BLUE|k}^T 1 = k^{in}, so B 1 = 0 and B^T 1 = 0. This implies that 1 is an eigenvector of G with eigenvalue zero. Recall that this is the unbiased property on the degree sequence for the BLUE null model. Denote the largest eigenvalue of G as \varphi(G), which is guaranteed to be non-negative because zero is one of its eigenvalues. The iterative method is given as follows:

1. Set \lambda_min = \varphi(G) and choose \lambda_max sufficiently large.
2. Compute \lambda = (\lambda_min + \lambda_max)/2.
3. Compute f(\lambda) = s^T s, where s = (\lambda I - G)^{-1} C.
4. If f(\lambda) > 1 + \epsilon, set \lambda_min = \lambda; if f(\lambda) < 1 - \epsilon, set \lambda_max = \lambda; otherwise, return \lambda.
5. Go to step 2.

Here \epsilon controls the stopping point of the iterative algorithm, and \lambda_max is set to a value large enough to ensure f(\lambda_max) < 1. The iterative method works because the function f(\lambda) is proven to be monotonically decreasing on the range (\varphi(G), \infty). Once we obtain the continuous solution s by the above method, we convert the indicator vector to discrete values (1 or -1) by simple thresholding. Again, the Kernighan-Lin algorithm is applied to fine-tune the solution.

For partitions involving more than two clusters, we can sequentially perform the bipartition scheme on each of the subnetworks. Notice that, unlike the purely modularity-driven case, which has a natural stopping criterion, this is not necessarily the case for sequential hybrid clustering: if we examine the information flow part, we can see that there is always an increase for one additional cut whenever \Delta k \neq 0.
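The five-step bisection above can be sketched as follows. This is a minimal sketch in which the initial \lambda_max, the iteration cap, and the tolerance (playing the role of \epsilon) are illustrative choices, and the random G and C simply stand in for (B + B^T)/2 and (\gamma/4)\Delta k:

```python
import numpy as np

def solve_hybrid(G, C, lam_max=1e6, eps=1e-10, max_iter=200):
    """Bisection for the stationarity condition (3.20): find lambda in
    (phi(G), lam_max) such that s = (lambda*I - G)^{-1} C has unit norm.
    f(lambda) = s^T s is monotonically decreasing on this interval."""
    N = G.shape[0]
    lam_min = np.linalg.eigvalsh(G).max()  # phi(G), the largest eigenvalue
    lam, s = lam_max, None
    for _ in range(max_iter):
        lam = 0.5 * (lam_min + lam_max)
        s = np.linalg.solve(lam * np.eye(N) - G, C)
        f = s @ s
        if f > 1.0 + eps:
            lam_min = lam        # lambda too small: s is too long
        elif f < 1.0 - eps:
            lam_max = lam        # lambda too large: s is too short
        else:
            break
    return lam, s

rng = np.random.default_rng(0)
M = rng.standard_normal((5, 5))
G = (M + M.T) / 2                # symmetric, like (B + B^T)/2
C = rng.standard_normal(5)       # stands in for (gamma/4) * delta k
lam, s = solve_hybrid(G, C)
assert abs(s @ s - 1.0) < 1e-6                   # unit-norm solution
assert np.allclose((lam * np.eye(5) - G) @ s, C) # satisfies (3.20)
```

In practice the returned s would then be thresholded to plus or minus one and refined with Kernighan-Lin, exactly as described above.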
Thus, if the decrease in modularity is outweighed by a larger \gamma, the algorithm will favor additional cuts.

3.2 Results

3.2.1 MEG Simulated Networks

We demonstrate the performance of our directed graph partitioning method on a simulated network of interacting cortical sources estimated from MEG (magnetoencephalography) data. Using a 4th-order AR (autoregressive) model, time series of 4 sources were simulated with the topography and Granger causality configuration shown in Figure 3.3. Interactions were limited to within hemispheres, in either low or high frequencies. Simulated data consisted of 100 trials of 4 seconds each for a 306-channel MEG system with an overlapping-spheres head model, and Tikhonov-regularized minimum-norm imaging was used to compute cortical current density maps from these data. For computational reasons, network analysis was performed on data averaged over each of 742 cortical regions of interest (ROIs) symmetrically located across hemispheres, but results are interpolated on a higher resolution cortical surface for display purposes.

Figure 3.3: Top: topography of interacting cortical sources. Bottom: spectral Granger causality between pairs of sources.

Figure 3.4 shows the results of our modularity and information flow analysis. MEG measurements were filtered and interactions were analyzed separately for low frequencies f_L (0-20 Hz) and high frequencies f_H (40-60 Hz). Row (a) shows the affinity matrices of the directed cortical networks, and row (b) shows clustering based on modularity using our conditional expected network. As expected, the method resulted in two clusters corresponding to the two pairs of interacting sources, 1-2 and 3-4. Even though our results have low spatial resolution due to MEG imaging limitations, the network organizational structure is clearly visible. In row (c), subsequent information flow analysis revealed sources (red) and sinks (blue) of information within each cluster.
Even though it was not explicitly simulated, a very weak causal relationship exists between sources 3-4 in the low frequency band, which was detected in the binary maps of the third row. To measure the contribution of each vertex to information flow, we plot maps of \Delta k values in row (d), which accurately indicate minimal communication between sources 3-4 in low frequencies.

3.2.2 MEG Resting Networks

As a real data example, we used 130 seconds of MEG data sampled at 2000 Hz, recorded from one subject in an eyes-open resting experiment. Using Granger causality to measure interactions, we constructed 5 different directed networks (delta 1-4 Hz, theta 4-7 Hz, alpha 8-12 Hz, beta 12-30 Hz, gamma 30-60 Hz) and performed graph partitioning on the individual networks. Results are given in Figures 3.5 and 3.6. Despite the unknown ground truth, the symmetry of the detected subnetworks across hemispheres in this very preliminary study is promising and worth further investigation.

3.2.3 Hybrid Network Partitioning

We demonstrate the hybrid network partitioning method on a 32-node binary network with a built-in structure of two 16-node clusters (nodes 1-16 and 17-32). The structure is deliberately introduced using a larger Bernoulli distribution parameter for edges within clusters (p = 0.5) and a lower one for edges between clusters (p = 0.2). However, the connections are set to allow only uni-directional edges between two sets of nodes: {1-8, 25-32} and {9-24}. When we set the Tikhonov weighting factor \gamma = 0, as shown in the top row of Figure 3.7, we extract the community structure. In contrast, when we set \gamma = 3 we find a different result that reveals only information flow (bottom row of Figure 3.7). This implies that we can balance the contributions of modularity and information flow by changing the Tikhonov parameter \gamma and optimizing the corresponding hybrid objective function to achieve a clustering result we consider physically meaningful.
3.3 Conclusion

We have proposed two novel methods to partition directed networks. The first method uses a conditional expected network to partition graphs based on modularity, and the second method detects information flow within the network. Despite limited MEG spatial resolution, we have successfully demonstrated our methods on MEG simulated cortical networks based on Granger causality, and our next goal is to apply these methods to experimental data, including a resting state study, for which we show only preliminary results here. In addition, the Tikhonov weighted hybrid partitioning method allows us to combine the two different kinds of information when each one is meaningful on its own.

Figure 3.4: Partition results based on modularity and information flow. (a) Affinity matrices for low (0-20 Hz) and high (40-60 Hz) frequencies, calculated using Granger causality; (b) clusters based on modularity; (c) sources (blue) and sinks (red) of information within each cluster; (d) maps of \Delta k, representing the contribution of each vertex to information flow.

Figure 3.5: Partitioning results for a single-subject resting MEG study in which we looked at delta, theta, and alpha band interactions. Each row represents a subnetwork identified by modularity using our null model.

Figure 3.6: Resting state MEG partitioning results for beta and gamma interactions measured by Granger causality, and the partition result for interactions measured by the correlation coefficient. Each row represents a subnetwork identified by modularity using our null model.

Figure 3.7: Hybrid network clustering. Left: simulated network of 32 nodes. Middle top: reordered network adjacency matrix after partitioning with \gamma = 0. Middle bottom: reordered network adjacency matrix after partitioning with \gamma = 3. Right top and bottom: graphical representations of the corresponding partitioning results.
Chapter 4

Assessing the Significance of Modularity-Based Graph Partitioning

Multivariate analysis of structural and functional brain imaging data can be used to produce network models of interaction or similarity between different brain structures. Graph partitioning methods can then be used to identify distinct subnetworks that may provide insight into the organization of the human brain. Although several efficient partitioning algorithms have been proposed, and their properties studied thoroughly, there has been limited work addressing the statistical significance of the resulting partitions. We present a new method to estimate the statistical significance of a network structure based on modularity. We derive a numerical approximation of the distribution of modularity on random graphs, and use this distribution to calculate a threshold that controls the type I error rate in partitioning graphs. We demonstrate the technique in application to brain subnetworks identified from diffusion-based fiber tracking data and from resting state fMRI data.

4.1 Introduction

The study of the community structure of graphs has revealed interesting patterns in the structure and function of the human brain [18], with features that vary across neurological diseases, aging, and cognitive processes [3, 98]. A large number of methods have been proposed to identify natural divisions of networks into groups. Perhaps the most popular is modularity [109], which compares the network against a null model and favors within-module connections when edges are stronger than their expected values. Divisions that increase modularity are preferred because they lead to modules with high community structure.

Despite the popularity of modularity methods, the statistical significance of network partitions has received less attention. Random networks can exhibit high modularity because of incidental concentration of edges, even though they have no underlying organizational structure [57].
This is even more evident in large networks, where the number of possible divisions increases rapidly with the network size [76]. Therefore, significant divisions of a network should have higher modularity than random graphs [57, 118]. An alternative approach was proposed in [76] to evaluate the significance of a partition: a community structure should be robust against small perturbations of the network.

In this chapter, we propose a statistical procedure to test the significance of a community structure based on its modularity value. As a surrogate for modularity, we use the largest eigenvalue of the difference between the affinity matrices of the network and its null model. Based on our previous work on null models [21], we show that the distribution of the largest eigenvalue can be well approximated by a Gamma distribution. We derive an empirical formula for the parameters of the Gamma distribution with respect to the size of the network and the variance of its edges. Based on this distribution we compute a p-value for the community structure, which can be used as a threshold criterion when partitioning a graph. We demonstrate our method with simulated and real brain networks.

4.2 Method

4.2.1 Modularity and Null Models

Consider an undirected weighted graph with N nodes, adjacency matrix A, and connection strength between nodes i and j denoted as A_{ij}. The degree vector k has elements k_i = \sum_{j=1}^N A_{ij}, indicating the sum of all edge strengths associated with node i. Define the edge vector x containing all possible edges in A, according to the mapping A_{ij} = x_l, where l = (2N - j)(j - 1)/2 + (i - j) for all 1 \leq j < i \leq N.
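The index mapping can be checked directly. A quick sketch (using the 1-based indices of the text) confirms that l enumerates the N(N-1)/2 lower-triangular entries exactly once:

```python
def edge_index(i, j, N):
    """1-based vector index l of edge (i, j) with j < i, per the mapping
    l = (2N - j)(j - 1)/2 + (i - j)."""
    return (2 * N - j) * (j - 1) // 2 + (i - j)

N = 6
ls = sorted(edge_index(i, j, N)
            for j in range(1, N + 1)
            for i in range(j + 1, N + 1))
# The mapping is a bijection onto 1..N(N-1)/2:
assert ls == list(range(1, N * (N - 1) // 2 + 1))
```

Each column j of the lower triangle occupies a consecutive block of indices, so the mapping is both injective and gap-free.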
In Chapter 2 and in [21, 22], we derived the expected network conditioned on the degree vector k:

E(x | k) = \mu_{x|k} = \mu_x + \Sigma_{xk} \Sigma_k^{-1} (k - \mu_k)    (4.1)

with the conditional covariance matrix:

\Sigma_{x|k} = \Sigma_x - \Sigma_{xk} \Sigma_k^{-1} \Sigma_{kx}    (4.2)

As indicated in [21], the conditional expected network satisfies the configuration model criteria (same number of nodes and equal degrees), while at the same time preserving network topology and allowing negative connections. Furthermore, it is in general the best linear unbiased estimator given the node degrees.

Define B to be the difference matrix between the original and the expected null network, with elements B_{ij} = A_{ij} - E(A_{ij} | k) = A_{ij} - E(x_l | k). Modularity Q [109] is then computed as:

Q = (1/2m) \sum_{i,j} B_{ij} \delta(C_i, C_j)    (4.3)

where C_i indicates the group membership of node i, \delta(C_i, C_j) = 1 only when nodes i and j are clustered within the same group, and m is the total sum of the weights of all edges in the network. The best bi-partition of the graph that maximizes modularity can be described by an indicator vector s, having values 1 and -1 for the two subnetworks, respectively:

\hat{s} = arg max_s Q = arg max_s s^T B s    (4.4)

Spectral graph theory solves equation (4.4) in the continuous domain while constraining the norm of s. Maximization is achieved by selecting the eigenvector corresponding to the largest eigenvalue \lambda_1 of B. The elements of s are then discretized to {-1, 1} by setting a zero threshold. Because of this discretization, further fine tuning is necessary to locally maximize Q, which can be done using, for example, the Kernighan-Lin algorithm. Given that the final maximum value of modularity Q is close to \lambda_1, we can use the latter as a surrogate for modularity.

4.2.2 Distribution of the Largest Eigenvalue

We consider a graph partition statistically significant if its modularity Q is substantially higher than that achieved by partitioning random graphs.
Therefore, we seek the distribution of modularity for random graphs; hypothesis testing then proceeds by selecting a 5% threshold in the right tail of this distribution. Instead of modularity itself, we use the largest eigenvalue of B as a surrogate. Here we assume the network edges follow a jointly Gaussian distribution with mean \mu 1 and covariance \sigma^2 I. Although the largest eigenvalue distribution of simple Gaussian random networks (GRN) has been studied, there is no closed form solution for the case of correlated edges caused by conditioning on k. We therefore follow a Monte Carlo approach. (We examine the connection with Gaussian random ensembles in the next chapter.)

For a network of size N = 20, we generate 100 Gaussian random networks A for given values of \mu and \sigma^2. For each network, instead of randomly permuting network edges, we generate 10^6 random networks A_{Random} by sampling from equations (4.1)-(4.2), and then compute the largest eigenvalue of B_{Random} = A_{Random} - E(A_{Random} | k). The largest eigenvalue distribution is shown in Figure 4.1 for several values of the mean \mu and variance \sigma^2, and it is evident that it depends only on the value of \sigma^2.

4.2.3 Approximation with the Gamma Distribution Family

The empirical distribution of the largest eigenvalue in Figure 4.1 is skewed for all values of \sigma^2. This distribution can be accurately approximated by a Gamma distribution:

f(x | \alpha, \beta) = x^{\alpha - 1} e^{-x/\beta} / (\beta^\alpha \Gamma(\alpha))    (4.5)

where \alpha determines the shape and \beta the scale of the distribution, and \Gamma(\alpha) is the Gamma function with parameter \alpha. An example of the approximation is shown in Figure 4.2. Therefore, the empirical distribution of the largest eigenvalue can be well represented by the pair of parameters (\alpha, \beta).

Figure 4.1: Top: distribution of the largest eigenvalue of B for several values of the mean \mu and fixed \sigma = 0.8. Bottom: distribution of the largest eigenvalue of B for several values of the standard deviation \sigma.

These parameters are functions of
the network size N and the variance of the edges \sigma^2; in the next section we identify the form of this functional relationship.

4.2.4 Fitting for Different Network Sizes

We repeated the above Monte Carlo procedure and the estimation of the Gamma distribution parameters (\alpha, \beta) for different values of the network size N and edge standard deviation \sigma. Figure 4.3 shows that \alpha does not depend on \sigma, whereas \beta increases linearly with \sigma. The network size N does not change the nature of these relationships.

Figure 4.2: Monte Carlo distribution of \lambda_1 for \sigma = 1 and its best fit with a Gamma distribution. The maximum likelihood estimates of the Gamma distribution parameters were \hat{\alpha} = 117.99 and \hat{\beta} = 0.06321, with estimator variances \sigma^2_{\hat{\alpha}} = 0.0277 and \sigma^2_{\hat{\beta}} = 8 \times 10^{-9}.

Figure 4.4 plots the value of \alpha and the ratio \beta/\sigma for different values of the network size N. Simple curve fitting produced the following formulas for the parameters of the Gamma distribution:

\hat{\alpha}(N) = 2.459 N^{1.335} - 16.453    (4.6)
\hat{\beta}(N, \sigma) = 0.84 N^{-0.87} \sigma + 0.0015    (4.7)

For every bi-partition of a graph, we can determine the significance of the cut by comparing the largest eigenvalue of B against the distribution of eigenvalues of B_{Random}. So rather than accepting a partition whenever we find an increase in modularity, we instead test whether the increase is statistically significant at a 5% threshold. The procedure is as follows. For a network A, estimate the edge variance \sigma^2. Then apply spectral graph partitioning and compute the largest eigenvalue of the matrix B [21]. Use equations (4.6) and (4.7) to estimate the Gamma distribution parameters, and accept the partition if the largest eigenvalue is above the 5% threshold in the right tail of the resulting Gamma distribution.

Figure 4.3: Fitted parameters \hat{\alpha} and \hat{\beta} of the Gamma distribution as functions of the standard deviation \sigma, for network sizes N = 20, N = 30, and N = 40.
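The significance test can be sketched as follows. This sketch plugs the fitted formulas (4.6) and (4.7) into a sampling-based p-value (a Monte Carlo stand-in for the Gamma survival function, so no special-function library is needed):

```python
import numpy as np

def partition_pvalue(lam1, N, sigma, n_samples=200_000, seed=0):
    """P-value of the largest eigenvalue lam1 of B under the fitted
    Gamma model of equations (4.6)-(4.7), estimated by sampling."""
    alpha = 2.459 * N**1.335 - 16.453           # shape, equation (4.6)
    beta = 0.84 * N**-0.87 * sigma + 0.0015     # scale, equation (4.7)
    rng = np.random.default_rng(seed)
    samples = rng.gamma(shape=alpha, scale=beta, size=n_samples)
    return np.mean(samples >= lam1)

# Sanity checks: the Gamma mean is alpha*beta, so an eigenvalue at the
# mean is unremarkable, while one far above it is highly significant.
N, sigma = 20, 1.0
alpha = 2.459 * N**1.335 - 16.453
beta = 0.84 * N**-0.87 * sigma + 0.0015
assert partition_pvalue(alpha * beta, N, sigma) > 0.3
assert partition_pvalue(3 * alpha * beta, N, sigma) < 0.01
```

A partition would be accepted when the returned p-value falls below 0.05, which is exactly the 5% right-tail criterion described in the text.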
4.3 Results

To test whether the above formulas hold for all networks irrespective of their community structure, we generated a number of different networks whose edges were i.i.d. Gaussian random variables with mean 4 and variance 1. For each network we simulated two clusters with variable sizes N_1 and N_2, such that N_1 + N_2 = N, where N = 20 is the total number of nodes. This community structure was enforced by randomly allocating the stronger values of the Gaussian distribution as intra-cluster edges and the weaker values as inter-cluster edges. In all cases, we repeated the procedure of Section 4.2 to generate random graphs A_{Random} and re-estimate the parameters of the Gamma distribution. The results are shown in Figure 4.5 and are in close agreement with equations (4.6) and (4.7). This result agrees with our expectations; irrespective of the community structure of the original networks, A_{Random} is a randomized network with no community structure to affect the distribution of the largest eigenvalue.

Figure 4.4: Left: \hat{\alpha} as a function of the network size N. Right: ratio of \hat{\beta} to \sigma as a function of the network size N.

Figure 4.5: Estimation of the Gamma distribution parameters when the original networks used to create the A_{Random} networks have a community structure of two clusters with N_1 and N_2 nodes such that N_1 + N_2 = 20. For each N_1 value, we generated 100 original networks, which further produced 10^4 random networks each. The Gamma parameters estimated from the histograms of those random networks are in close agreement with those predicted by equations (4.6) and (4.7).

Figure 4.6a shows a Gaussian network with mean 1, variance 4, and a community structure of two 10-node clusters. To blur its community structure, we randomly permuted a portion of the within and between edges, producing the network in Figure 4.6b. The black lines in the adjacency matrices represent the clustering results of modularity-based partitioning.
The computed modularity values are Q_a = 0.098 and Q_b = 0.038, indicating that the original unperturbed network has the stronger community structure. Our statistical procedure produced p-values p_a \approx 0 and p_b = 0.73, indicating that only the first network has a significant community structure.

Figure 4.6: Bi-partitioning results for two Gaussian networks; (a) shows a clear network structure while (b) is less structured.

We applied significance-based modularity partitioning to the structural brain network reported in [61]. The network consists of 66 nodes representing FreeSurfer-parcellated cortical regions, and edges representing the neuronal fiber densities between pairs of regions. In [21] we reported partition results for this network, and using the method described above we confirm that all bi-partitions leading to 5 clusters are significant. The first bipartition, separating the two hemispheres (the medial subnetwork goes with the right hemisphere), had p \approx 0, and the subsequent bipartitions had p = 1.33 \times 10^{-6}, p = 2.17 \times 10^{-9}, and p = 8.96 \times 10^{-5}. The structural data reveal subnetworks that largely follow the classical lobe-based partitioning of the cerebral cortex.

Figure 4.7: Modularity-based partitioning of the structural brain network in [61]. Each color represents a different subnetwork. All clusters are statistically significant.

We also investigated subnetwork partitioning using networks extracted from resting state fMRI data [48]. The data consist of 191 subjects from the Beijing data set of the 1000 Functional Connectomes Project in NITRC [78]. The 96 nodes represent ROIs defined on the Harvard-Oxford atlas, and edges indicate the correlation coefficient between fMRI signals for each pair of ROIs. Correlations were computed after bandpass filtering in the range 0.005-0.1 Hz. We detected 3 clusters: the default mode network, a motor-sensory subnetwork, and a visual-related subnetwork. All bipartitions had p-values p \approx 0.
In contrast to the structural subnetworks, the functional subnetworks involve larger scale interactions and follow well known functional subdivisions.

Figure 4.8: Partition results for a resting state fMRI network. All 3 clusters are significant.

Finally, we partitioned the widely-studied Karate Club network of [165] into four clusters, as shown in Figure 4.9. Only the first bipartition (separating groups 1-2 from groups 3-4) is significant, with p-value p = 2.12 \times 10^{-12}. This bipartition exactly predicts the subsequent split of the Karate Club into 2 groups. The remaining two bipartitions are insignificant: p = 0.067 for groups 1-2 and p = 0.486 for groups 3-4.

Figure 4.9: Partition of the Karate Club network [165].

4.4 Conclusions

Even though graph partition methods are becoming increasingly popular, assessment of the statistical significance of the resulting partitions has drawn little attention. We believe this will change in the future, in the same way that statistical thresholding procedures became paramount in analyzing brain imaging studies. Using Monte Carlo procedures, we have derived equations that can control the type I error rate in bipartitions of graphs. It is therefore no longer necessary to perform computationally intensive permutations of edges and randomizations of graphs every time a new graph is to be partitioned. The Gamma distribution provides an excellent fit to the empirical data, so a parametric approach based on these distributions should be sufficient for assessing subnetwork significance.

Chapter 5

On the Largest Eigenvalue for Modularity-Based Graph Partitioning

5.1 Introduction

In the previous chapter, we described an empirical study of the distribution of the maximum eigenvalue of the matrix used to optimize modularity for network partitioning. While we showed that the Gamma distribution provides an excellent fit to our data, we did not give a theoretical basis for this observation.
In this chapter we take an alternative approach, in which we relate our problem to existing results for families of matrices for which the distribution of the largest eigenvalue is known.

Spectral clustering algorithms [155] use the extreme eigenvalues and corresponding eigenvectors of matrices, such as the graph Laplacian [133] or the adjacency matrix [21, 108], to partition data into groups with similar properties. To evaluate the statistical significance of spectral clustering results, we need to compare the spectral decomposition of a given network against those of random networks. Since networks are represented by their adjacency matrices, a connection between random networks and random matrix theory is natural.

Random matrix theory provides analytical formulas for a family of matrices called Gaussian random ensembles. The eigenvalue distribution of random matrices, especially for Gaussian random ensembles, has been thoroughly studied [30, 38, 145]. However, random matrix theory typically assumes independent identically distributed matrix elements. Even though this is satisfied by random networks, it is violated by the matrices used for spectral clustering. Examples include the graph Laplacian of random networks, used in the normalized cut, and the deviation of random networks from their null model, used in modularity-based methods.

In this chapter, we focus on modularity-based methods and relate Gaussian random ensembles to modularity partitioning. We demonstrate that the Tracy-Widom mapping of the largest eigenvalue of Gaussian random ensembles can be modified to predict the distribution of the largest eigenvalue of the matrices used for modularity-based spectral clustering. Using this finding we derive formulas that control the type I error rate of modularity-based partitions.
Using Gaussian random ensembles, we derive formulas for the maximum eigenvalue that allow us to directly evaluate the statistical significance of a modularity partition without the need for Monte Carlo simulations.

5.2 Methods

In this section, we first discuss Gaussian random ensembles and provide analytical formulas for the distribution of their largest eigenvalue [145]. We then describe the modularity partitioning method and its maximization using spectral decomposition. Finally, we derive formulas for the distribution of the maximum eigenvalue of a random matrix minus its null model, and use these results to threshold partitions based on modularity.

5.2.1 Gaussian Random Ensembles

Tracy and Widom [145] derived analytical solutions for the distribution of the largest eigenvalue of three types of random matrices $G_\beta$: the Gaussian Orthogonal Ensembles (GOE, $\beta = 1$), the Gaussian Unitary Ensembles (GUE, $\beta = 2$), and the Gaussian Symplectic Ensembles (GSE, $\beta = 4$):

$$G_1 = \frac{A_1 + A_1^T}{2} \quad (5.1)$$
$$G_2 = \frac{A_2 + A_2^H}{2} \quad (5.2)$$
$$G_4 = \frac{A_4 + A_4^D}{2} \quad (5.3)$$

where $A_1$ is $N \times N$ with real elements following an i.i.d. $\mathcal{N}(0, 2\sigma^2)$ normal distribution, $A_2$ is $N \times N$ with complex elements whose real and imaginary parts follow an i.i.d. $\mathcal{N}(0, \sigma^2)$ normal distribution, $D$ denotes the self transpose of a quaternion matrix, and $A_4$ is $2N \times 2N$, obtained from:

$$A_4 = \begin{pmatrix} X & Y \\ -\mathrm{conj}(Y) & \mathrm{conj}(X) \end{pmatrix}$$

where $X$ and $Y$ are two independent realizations of $A_2$ of size $N \times N$. The standard deviation of the off-diagonal elements of all matrices $G_1$, $G_2$ and $G_4$ is $\sigma$. Of special interest to modularity-based partitioning is the GOE case, because undirected networks have real and symmetric adjacency matrices.

Let $T_\beta = \lambda_1(G_\beta)$ be a random variable equal to the largest eigenvalue $\lambda_1$ of any of the above random ensembles.
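The GOE construction above can be sketched numerically; a minimal example assuming NumPy (function and variable names are illustrative, not taken from the dissertation's code):

```python
import numpy as np

def sample_goe(N, sigma, rng):
    """G1 = (A1 + A1^T)/2 with A1 i.i.d. N(0, 2*sigma^2), as in equation (5.1)."""
    A1 = rng.normal(0.0, np.sqrt(2.0) * sigma, size=(N, N))
    return (A1 + A1.T) / 2.0

rng = np.random.default_rng(0)
N, sigma = 200, 1.0
G = sample_goe(N, sigma, rng)

# Off-diagonal entries of G1 have standard deviation sigma, as stated above
offdiag_std = G[~np.eye(N, dtype=bool)].std()

# The largest eigenvalue T = lambda_1(G1) concentrates near 2*sigma*sqrt(N)
T = np.linalg.eigvalsh(G)[-1]
```

A quick sanity check on a single realization: `offdiag_std` lands near $\sigma$, and `T` lands within a few fluctuation scales of $2\sigma\sqrt{N}$.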
Tracy and Widom [145] derived an important scaling formula that predicts the distribution of $T_\beta$:

$$S = \frac{1}{\sigma} (N_\beta)^{1/6}\, T_\beta - 2 (N_\beta)^{2/3} \quad (5.4)$$

where

$$N_\beta = \begin{cases} N, & \beta = 1 \\ N, & \beta = 2 \\ 2N + 1, & \beta = 4 \end{cases} \quad (5.5)$$

and $S$ follows the Tracy-Widom distributions below. Note that $\beta = 2$ is the simplest case and is used to derive the other cases:

$$F_1(s)^2 = F_2(s)\, e^{-\int_s^\infty q(x)\,dx} \quad (5.6)$$
$$F_2(s) = \exp\left(-\int_s^\infty (x - s)\, q(x)^2\, dx\right) \quad (5.7)$$
$$F_4\!\left(\frac{s}{\sqrt{2}}\right)^2 = F_2(s) \left(\frac{e^{\frac{1}{2}\int_s^\infty q(x)dx} + e^{-\frac{1}{2}\int_s^\infty q(x)dx}}{2}\right)^2 \quad (5.8)$$

where $q$ is the solution to the Painlevé II equation $q'' = sq + 2q^3$, satisfying the condition $q(s) \sim \mathrm{Ai}(s)$ as $s \to \infty$, with Ai the Airy function [145]. Figure 5.1 shows the distribution of $S$ for the three Gaussian random ensembles. Using equation (5.4), we can derive the distribution of the largest eigenvalue for any Gaussian ensemble (denoted by $T_\beta$) for different values of $N$ and $\sigma$.

Figure 5.1: Tracy-Widom distributions for the three types of Gaussian random ensembles. Left: cumulative density function; Right: probability density function.

5.2.2 Modularity

Consider an undirected weighted graph with $N$ nodes, adjacency matrix A with the connection strength between nodes $i$ and $j$ denoted as $A_{ij}$, and degree vector k with elements $k_i = \sum_j A_{ij}$, indicating the sum of all edge strengths associated with node $i$. In [21], we derived the expected network conditioned on the degree vector k:

$$E(A_{ij} \mid \mathbf{k}) = \begin{cases} \dfrac{k_i + k_j}{N - 2} - \dfrac{2m}{(N-1)(N-2)}, & i \neq j \\ 0, & i = j \end{cases} \quad (5.9)$$

This equation assumes the network A has independent identically distributed edges with a Gaussian distribution, but it is also the best linear unbiased estimate of the expected network for any other distribution. Define B as the difference matrix between the original network and the expected null network, with elements $B_{ij} = A_{ij} - E(A_{ij} \mid \mathbf{k})$.
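The null model of equation (5.9) and the difference matrix B are straightforward to compute; a minimal sketch assuming NumPy (helper names are illustrative). A useful property visible in the formula is that the expected network reproduces the degree sequence, i.e. $\sum_j E(A_{ij} \mid \mathbf{k}) = k_i$:

```python
import numpy as np

def blue_null_model(A):
    """Expected network E(A_ij | k) from equation (5.9); zero on the diagonal."""
    N = A.shape[0]
    k = A.sum(axis=1)            # degree vector k_i
    two_m = k.sum()              # 2m, twice the total edge weight
    E = (k[:, None] + k[None, :]) / (N - 2) - two_m / ((N - 1) * (N - 2))
    np.fill_diagonal(E, 0.0)
    return E

def difference_matrix(A):
    """B = A - E(A | k), the matrix whose largest eigenvalue is studied here."""
    return A - blue_null_model(A)

A = np.array([[0., 1., 1., 0.],
              [1., 0., 0., 1.],
              [1., 0., 0., 1.],
              [0., 1., 1., 0.]])
E = blue_null_model(A)
B = difference_matrix(A)
# Each row of E sums to k_i, so the null model preserves the degree sequence
```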
Modularity $Q$ is then computed as:

$$Q = \frac{1}{2m} \sum_{i,j} B_{ij}\, \delta(C_i, C_j) \quad (5.10)$$

where $C_i$ indicates the group membership of node $i$, $\delta(C_i, C_j) = 1$ only when nodes $i$ and $j$ are clustered within the same group, and $m$ is the total sum of the weights of all edges in the network [21, 108]. The best bipartition of the graph that maximizes modularity can be described by an indicator vector y having 1 and $-1$ for the two subnetworks, respectively [108]:

$$\hat{\mathbf{y}} = \arg\max_{\mathbf{y}} Q = \arg\max_{\mathbf{y}} \mathbf{y}^T B \mathbf{y} \quad (5.11)$$

Spectral graph theory solves equation (5.11) in the continuous domain while constraining the norm of y. Maximization is achieved by setting y equal to the eigenvector corresponding to the largest eigenvalue $\lambda_1$ of B. The elements of y are then discretized to $-1$ and 1 by setting a zero threshold. Even though this discretization leads to a somewhat different value of modularity, the maximum eigenvalue $\lambda_1$ is a close surrogate of modularity.

5.2.3 Modularity Significance and Gaussian Ensembles

Random networks can exhibit high modularity because of incidental concentration of edges, even though they have no underlying organizational structure. Therefore, partitions of a network should be deemed significant only if they lead to modularity appreciably higher than partitions of random graphs. In other words, we need to test whether the maximum eigenvalue $\lambda_1$ of matrix B of the original network is appreciably higher (say above a 5% significance level) than the distribution of eigenvalues of matrices $B_{\mathrm{Random}}$ of random graphs.

To develop this test, we investigated whether the Tracy-Widom mapping in equation (5.4) for $\beta = 1$ can predict the distribution of eigenvalues of $B_{\mathrm{Random}}$ matrices. We consider the relationship:

$$S = w_1(N, \sigma)\, T + w_2(N, \sigma) \quad (5.12)$$

where $S$ follows the distribution in equation (5.6), i.e. $f_S(s) = f_1(s) = dF_1(s)/ds$, and $T$ is a new random variable sampling the largest eigenvalue of $B_{\mathrm{Random}}$ matrices.
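The spectral maximization of equation (5.11) that produces these eigenvalues — take the eigenvector of B with the largest eigenvalue and split by sign — can be sketched on a toy graph (this is an illustrative example, not the dissertation's code):

```python
import numpy as np

def spectral_bipartition(B):
    """Leading-eigenvector bipartition of equation (5.11): the indicator
    vector y is the sign of the eigenvector for the largest eigenvalue."""
    evals, evecs = np.linalg.eigh(B)      # B is symmetric
    lam1 = evals[-1]                      # largest eigenvalue, surrogate of Q
    y = np.where(evecs[:, -1] >= 0, 1, -1)
    return y, lam1

# Toy example: two 3-node cliques, weakly coupled by weight-0.1 edges
A = np.ones((6, 6)); np.fill_diagonal(A, 0)
A[:3, 3:] = A[3:, :3] = 0.1

# Difference matrix B = A - E(A|k) using equation (5.9)
k = A.sum(axis=1); two_m = k.sum(); N = 6
E = (k[:, None] + k[None, :]) / (N - 2) - two_m / ((N - 1) * (N - 2))
np.fill_diagonal(E, 0.0)

y, lam1 = spectral_bipartition(A - E)
# y separates nodes {0,1,2} from {3,4,5} (up to a global sign flip)
```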
We seek to identify the functional form of $w_1(N, \sigma)$ and $w_2(N, \sigma)$, and check to what extent they conform to the Tracy-Widom mapping in equation (5.4), namely $\frac{1}{\sigma} N^{1/6}$ and $-2N^{2/3}$, respectively.

Using a Monte Carlo approach, we simulate $10^6$ Gaussian random graphs $A_{\mathrm{Random}}$ for a given value of $N$ and $\sigma$. For each graph, we compute $B_{\mathrm{Random}}$ as described in Section 5.2.2 and then estimate the empirical distribution of the largest eigenvalue, termed $f_T(t)$. The functional transformation between $S$ and $T$ in equation (5.12) indicates that $f_S(s) = \frac{1}{w_1} f_T\!\left(\frac{s - w_2}{w_1}\right)$; we drop the dependency on $N$ and $\sigma$ to ease notation. We use the total Kullback-Leibler divergence to measure the distance between the two distributions:

$$D_{KL}(S, T, w_1, w_2) = \int_{-\infty}^{\infty} \left[ f_1(s) \log \frac{f_1(s)}{\frac{1}{w_1} f_T\!\left(\frac{s - w_2}{w_1}\right)} + \frac{1}{w_1} f_T\!\left(\frac{s - w_2}{w_1}\right) \log \frac{\frac{1}{w_1} f_T\!\left(\frac{s - w_2}{w_1}\right)}{f_1(s)} \right] ds \quad (5.13)$$

We then minimize the total Kullback-Leibler divergence over $w_1$ and $w_2$:

$$(\hat{w}_1, \hat{w}_2) = \arg\min_{w_1, w_2} D_{KL}(S, T, w_1, w_2) \quad (5.14)$$

Figure 5.2: Empirical cumulative density functions $F_T(t)$ constructed by Monte Carlo simulations for networks of different size $N$ and edge standard deviation $\sigma$.

Every minimization of the above equation produces a pair of values for $w_1(N, \sigma)$ and $w_2(N, \sigma)$ for any specific $(N, \sigma)$. By repeating the above procedure multiple times, we sample the two functions across the $(N, \sigma)$ plane.

5.3 Results

We performed the above Monte Carlo procedure for multiple network sizes $N \in [10, 300]$ and standard deviations $\sigma \in [0.05, 100]$, sufficiently sampling the functions $w_1(N, \sigma)$ and $w_2(N, \sigma)$. We then fit these samples to power functions similar in form to the Tracy-Widom scaling formula in equation (5.4). We achieved accurate fits with the functional forms below:

$$w_1(N, \sigma) = \frac{1}{\sigma}\left(\alpha_1 N^{\beta_1} + \gamma_1\right) \quad (5.15)$$
$$w_2(N, \sigma) = \alpha_2 N^{\beta_2} \quad (5.16)$$

The values of the constants $\alpha_1$, $\beta_1$, $\gamma_1$, $\alpha_2$, and $\beta_2$, as well as their 95% confidence intervals, are given in Table 5.1.
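The fitted mappings can be evaluated directly. The sketch below uses the point estimates from Table 5.1; the grouping of $\sigma$ in $w_1$ is chosen to match the Tracy-Widom analogue $w_1 \approx \frac{1}{\sigma} N^{1/6}$, so the exact functional form is an assumption of this sketch:

```python
# Point estimates from Table 5.1; functional forms follow equations
# (5.15)-(5.16), with sigma grouped as in the Tracy-Widom mapping (assumption).
ALPHA1, BETA1, GAMMA1 = 0.1646, 0.3569, 1.364
ALPHA2, BETA2 = -2.072, 0.6614

def w1(N, sigma):
    return (ALPHA1 * N**BETA1 + GAMMA1) / sigma

def w2(N, sigma):
    return ALPHA2 * N**BETA2

# Over the sampled range N in [10, 300], the fitted curves stay close to the
# Tracy-Widom factors N^(1/6)/sigma and -2*N^(2/3)
N, sigma = 100, 3.0
ratio1 = w1(N, sigma) / (N**(1 / 6) / sigma)
ratio2 = w2(N, sigma) / (-2 * N**(2 / 3))
```

At $N = 100$ both ratios sit within a few percent of 1, consistent with the chapter's observation that $w_2$ is practically identical to its Tracy-Widom analogue while $w_1$ needs the extra constant $\gamma_1$.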
Figure 5.3: Illustration of the minimization of the Kullback-Leibler divergence for a specific network size $N = 100$ and standard deviation $\sigma = 3$. The distribution of random variable $S$ is almost identical to the transformed distribution of $T$.

Table 5.1: Top row: estimated coefficients in equations (5.15) and (5.16). Middle and bottom rows: 95% confidence intervals.

              $\alpha_1$   $\beta_1$   $\gamma_1$   $\alpha_2$   $\beta_2$
estimate      0.1646       0.3569      1.364        -2.072       0.6614
95% CI low    0.1565       0.3502      1.349        -2.077       0.6609
95% CI high   0.1726       0.3636      1.379        -2.068       0.6618

We also define the proportional estimation errors as $e_1 = (\hat{w}_1 - w_1)/\hat{w}_1$ and $e_2 = (\hat{w}_2 - w_2)/\hat{w}_2$ for the two functions, respectively. The root mean square errors were $\mathrm{rms}(e_1) = 0.0042$ and $\mathrm{rms}(e_2) = 0.0105$, indicating that equations (5.15) and (5.16) closely match the sampled data.

These results indicate that the Tracy-Widom mapping in equation (5.4) is very close to the mapping required for the largest eigenvalue of matrix $B_{\mathrm{Random}}$. In fact, function $w_2(N, \sigma)$ is practically identical, with values different only below the 2nd decimal point, most probably due to numerical errors. However, function $w_1(N, \sigma)$ is different, as we had to include a constant $\gamma_1$ to achieve a good fit.

Figure 5.4: Surfaces of $\log_{10}(w_1(N, \sigma))$ and $w_2(N, \sigma)$ computed by equations (5.15) and (5.16).

We can now use equations (5.15) and (5.16) to test the significance of a modularity bipartition as follows: For each bipartition of a graph A, we compute the largest eigenvalue $t = \lambda_1(B) = \lambda_1(A - E(A \mid \mathbf{k}))$. We then linearly transform $t$ into $s$ through equation (5.12), and compute the value $F_1(s)$. The bipartition is accepted as significant only if $1 - F_1(s) < 0.05$, which controls the type I error rate at a 5% level.

We applied the above statistical procedure to the Karate Club network [165], which is widely accepted to have two clear modular subnetworks, but is partitioned into four clusters when maximization of modularity is used as the sole criterion.
Our proposed significance test resulted in a very small p-value $p \approx 0$ for the first bipartition, which is deemed significant. Even though the subsequent two partitions showed further increases in modularity, they had p-values 0.0895 and 0.509 and thus were not significant.

Finally, we provide some details on the numerical optimization of equation (5.14). Direct minimization using gradient descent often leads to local minima rather than the global minimum. To overcome this problem, we used an informative initialization of the gradient descent algorithm, taking advantage of the direct mapping between the cumulative density functions $F_1(s)$ and $F_T(t)$. For a number of values $x_i \in [0, 1]$ we derived the corresponding $s_i$ and $t_i$ such that $\{s_i \mid F_1(s_i) = x_i\}$ and $\{t_i \mid F_T(t_i) = x_i\}$. We then estimated the initialization values $w_1^{\mathrm{init}}$ and $w_2^{\mathrm{init}}$ with a least-squares minimization:

$$\left(w_1^{\mathrm{init}}, w_2^{\mathrm{init}}\right) = \arg\min_{w_1, w_2} \sum_i \left(s_i - (w_1 t_i + w_2)\right)^2 \quad (5.17)$$

The values $w_1^{\mathrm{init}}$ and $w_2^{\mathrm{init}}$ were then used to initialize the gradient descent algorithm in equation (5.14).

5.4 Conclusion

We have derived formulas that enable us to test modularity-based bipartitions for statistical significance, and demonstrated our method on a real-world network, the Karate Club network. Furthermore, we have shown that there is a very close relationship between the Tracy-Widom mapping in equation (5.4) and the mapping required for modularity-based partitions in equation (5.12). This indicates that the analytical derivations used for the Tracy-Widom mapping could be modified or extended to other matrices, such as the B matrix used for modularity-based partitions.

We have introduced two different methods to model the null distribution: Tracy-Widom distribution modeling in equation (5.6) with the corresponding mapping in this chapter, and the Gamma distribution modeling described in Chapter 4.
Given the close fit of both models to the actual null distribution of the largest eigenvalue, we note that they should be very close to each other. Directly fitting a Gamma distribution to equation (5.6) does not work because a Gamma random variable can only be non-negative. Further investigation is needed to directly link these two models together.

Chapter 6
Significance of the Modular Structures in Binary Graphs

6.1 Introduction

We have already explained that modularity is a non-negative quantity measuring the extent of community structure of a particular graph. Zero modularity corresponds to the trivial partitioning result: a single cluster without any partition at all. Conversely, however, a network with high modularity does not necessarily have a well-structured community of clusters. One example is random graphs, which may occasionally have high modularity values [57, 117]. This can be explained by random variations in different graph realizations from the same distribution with i.i.d. edges, some of which happen to be inhomogeneous across the network, resulting in apparent clusters within a graph. As pointed out by [45], community structures are detected by comparing a graph against its null model satisfying the configuration model. Therefore, a community structure is deemed significant only if its modularity is appreciably larger than the modularity of corresponding random graphs of the same size and expected degree sequence.

In the previous chapters our work addressed random graphs with a Gaussian distribution, and we used the maximum eigenvalue as a surrogate for modularity. In this chapter we focus on Erdős-Rényi (ER) random graphs of the second type. In such an undirected binary random graph, edges follow an i.i.d. Bernoulli distribution with a certain parameter $p$. We study the characteristics of the modularity of such ER realizations under different settings, such as network size and distribution parameter $p$.
We find that with proper normalization, the distribution of the maximum modularity of an ER-type random graph is roughly a function of network size $N$ and Bernoulli parameter $p$.

6.2 Method

6.2.1 Monte Carlo Experiment Setup

As in the study of the maximum eigenvalue of random graphs (Chapters 4 and 5), we start by generating Erdős-Rényi random graphs where the edges follow an i.i.d. Bernoulli distribution with parameter $p$ in a network of size $N$. We then calculate the maximum modularity $Q_2$ through bipartitions using the general BLUE null model as in equation (2.12). The average degree can be calculated as:

$$\langle k \rangle = \frac{\sum_{i,j} A_{ij}}{N} = \frac{2m}{N} \quad (6.1)$$

Because these random graphs have no self loops, the average degree is approximately equal to the sum of $N - 1$ Bernoulli random variables:

$$\langle k \rangle \approx p(N - 1) \;\Rightarrow\; p \approx \frac{\langle k \rangle}{N - 1} \quad (6.2)$$

For a fixed network size $N$, we uniformly pick $p \in (0, 1)$ as the Bernoulli distribution parameter and generate one Erdős-Rényi random graph realization. By construction, these random networks have no community structure. We generate 100,000 such random graphs and calculate the modularity through bipartitioning. We perform this Monte Carlo experiment for different network sizes from 20 to 1000.

Because in real network analysis we do not know the exact parameter $p$, we use the empirical estimate from equation (6.2) to study the result, as shown in Figure 6.1. We observe that the modularity of ER random graphs seems to be a function of the average degree $\langle k \rangle$ normalized roughly by the network size $N - 1$. We can view this as the empirical Bernoulli distribution parameter $p$. For any specific network size, the modularity, as the measurement of modular structure, decreases to zero as $p$ approaches 1. This is because any community structure arising from random fluctuations or edge inhomogeneity across the graph will vanish when more edges are present in the graph.
On the other hand, when the random graph becomes more and more sparse, we expect edge inhomogeneity to occur more often and give rise to community structures.

We also observe that the network size has an impact on how the modularity varies in such random graphs. For a fixed $p$ value, the mean and the variance of modularity $Q_2$ decrease with increasing network size $N$. For a size-$N$ network there are $\frac{N(N-1)}{2}$ Bernoulli random variable realizations. This indicates that either we need to perform Monte Carlo simulations for every different $N$, or we need to normalize the modularity to counter the effect of the size of the network.

6.2.2 Normalization of Modularity

Inspired by [118], we conjecture that modularity properly scaled according to the average degree information of the graph could reveal important information. Define the normalized modularity $\tilde{Q}_2$ as follows:

$$\tilde{Q}_2 = Q_2\, \frac{\langle k \rangle}{\langle \sqrt{k} \rangle} \quad (6.3)$$

where $\langle \sqrt{k} \rangle$ is calculated for each graph:

$$\langle \sqrt{k} \rangle = \frac{\sum_i \sqrt{k_i}}{N} \quad (6.4)$$

The scatter plots of $\tilde{Q}_2$ versus $\frac{\langle k \rangle}{N - 1}$ are shown in Figure 6.2. Although the variance of $\tilde{Q}_2$ for a fixed empirical parameter $p$ varies with the network size $N$, the values of $\tilde{Q}_2$ lie at approximately the same level for different network sizes $N$. This scaling factor on modularity is a function of the average degree, which is a specific network property and is easy to compute. With this normalization, we can approximate the curve over a wide range of applicable network sizes.

6.2.3 Addressing Significance

The null hypothesis is that the random graphs have no community structure. From the Monte Carlo experiment, we can calculate different percentile curves as functions of the empirical Bernoulli distribution parameter $p$. For example, the 0.95-percentile curve $\mathcal{F}_{0.95}$ sits on top of 95% of the $\tilde{Q}_2$ values from the Erdős-Rényi random graph realizations:

$$\mathrm{Prob}\left(\tilde{Q}_2(t) > \mathcal{F}_{0.95}(t)\right) = 0.05, \quad \forall t \in (0, 1) \quad (6.5)$$

and thus gives us control of the type I error. Figure 6.3 shows several different percentile curves.
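The normalization of equations (6.3)-(6.4) and the empirical parameter of equation (6.2) are cheap to compute from the adjacency matrix; a minimal sketch assuming NumPy (function names are illustrative):

```python
import numpy as np

def empirical_p(A):
    """Empirical Bernoulli parameter <k>/(N-1), equation (6.2)."""
    N = A.shape[0]
    return A.sum(axis=1).mean() / (N - 1)

def normalized_modularity(Q2, A):
    """Scale Q2 by <k>/<sqrt(k)>, equations (6.3)-(6.4)."""
    k = A.sum(axis=1)
    return Q2 * k.mean() / np.sqrt(k).mean()

# Erdos-Renyi graph: i.i.d. Bernoulli(p) edges, symmetric, no self loops
rng = np.random.default_rng(1)
N, p = 200, 0.3
U = rng.random((N, N)) < p
A = np.triu(U, 1)
A = (A | A.T).astype(float)

p_hat = empirical_p(A)   # should land close to the true p used above
```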
As mentioned above, edge inhomogeneity gives rise to high modularity, and this inhomogeneity has a larger impact in a smaller graph and when the graph becomes more sparse. As a result, the leftmost end of the percentile curves has some bumps and may be less useful in assessing significance.

For any particular binary graph $G$, we calculate the modularity using the spectral bipartitioning method. We also calculate the average degree, the empirical Bernoulli distribution parameter, denoted $p_G$, and the average square root of the degree. After proper scaling, we have the normalized modularity $\tilde{Q}_2^G$. Under the null hypothesis that the graph $G$ has no community structure, the significance test on the community structure of graph $G$ is:

$$\begin{cases} \tilde{Q}_2^G > \mathcal{F}_{0.95}(p_G), & \text{reject the null hypothesis} \\ \tilde{Q}_2^G < \mathcal{F}_{0.95}(p_G), & \text{accept the null hypothesis} \end{cases} \quad (6.6)$$

6.3 Results

We tested the proposed method for assessing the statistical significance of community structures in binary graphs in several different settings, both for simulations and real data examples: the Karate Club network [165], the dolphin network [91], and binarized fMRI resting state networks (http://www.nitrc.org/projects/fcon_1000/).

6.3.1 Structured Binary Networks

We designed two ways to create structured binary network realizations. We illustrate them using the following setting: network size $N = 40$ and two 20-node
The results are shown in Figure 6.4: (a) shows the scatter plot for each realization G on the space of empirical-parameter-p and normalized-modularity- ~ Q 2 whileF 0:95 as equation (6.5) are also plotted; (b) gives an example of such realization. The second approach also proceeds by creating dierent connection probabili- ties for nodes within cluster p in versus those between clusters p out . To ensure the community structure, for each individual graph these two probabilities are exactly 0.4 away from each other, i.e. p in p out = 0:4. The results are shown in Figure 6.5. In the two aforementioned simulations we create graph realizations G of clear community structures. They all lie above theF 0:95 (t) so we can say the structures detected by modularity-based partitioning are signicant. In the rst model we can have random graph realizations ranging from very strong modular structure (eg. p out = 0:01 and p in = 0:99) to quite weak modular structure (eg. p out = 0:49 and p in = 0:51). This is the reason that their normalized modularity, although all above theF 0:95 , can substantially exceed that value. In contrast, the second theme creates roughly the same extent of community structure by setting a xed dierence between p in and p out . So normalized modularity value ~ Q 2 are clustered at about the same distance away from theF 0:95 lines. 100 6.3.2 Fuzzy Structured Binary Networks In this simulation we aim to verify that when we destroy the network modular structure, ~ Q 2 decreases and become smaller than the threshold curveF 0:95 . We create random graphs using the \test bench" approach proposed in [85]. The mix- ing parameter denotes the portion of edges in such a graph connecting nodes from dierent clusters. In other words, the larger the value of , the less modular structure the network has. We generate 500 realizations for each values and their normalized modularity ~ Q 2 values are shown in Figure 6.6 for two specic test bench settings. 
We observe that when the network community structure collapses with increased mixing parameter $\mu$, the normalized modularity not only decreases but eventually falls below the decision curve $\mathcal{F}_{0.95}$. Note that because the test bench can contain more than two clusters, $\mu = 0.5$ is not necessarily the border between the absence and presence of modular structure. For example, in the setting of Figure 6.6a, the cluster size is set between 50 and 120, so there are 3 to 5 clusters in such networks. In such a case, $\mu = 0.55$ still yields graphs with significant community structures.

6.3.3 Real Data Networks

We tested the method on three real data sets. One is the Karate Club [165] and another is the Dolphin network [91]. Both of them are believed to have community structures in social science. Using the aforementioned procedure, we compute the average degree for each of them. We also perform modularity-based partitioning to compute their modularity $Q_2$ after bipartitioning. After scaling of $Q_2$, we show them together with the $\mathcal{F}_{0.95}$ curves in Figure 6.7. Both networks lie above the decision curve, indicating that the structures detected through graph partitioning are significant.

We also tested the method on resting fMRI functional connectome data. In some graph analysis cases, the connectome data are thresholded and the weights of edges are dropped [3, 74, 153]. Unweighted connectome networks can simplify the calculation and interpretation of network features, and a variety of different construction techniques have been studied. The most common is to set a threshold; any edge strength less than the threshold is set to zero, meaning the absence of that edge. Those above the threshold are set to one, representing the presence of those edges. Selecting this threshold is itself a potentially difficult question, but beyond the scope of this chapter.
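Binarizing a weighted connectome to preserve a given fraction of the strongest edges can be sketched as follows (a minimal example; the quantile rule is one common choice, not necessarily the exact construction used for Figure 6.8):

```python
import numpy as np

def binarize_top_fraction(W, fraction):
    """Keep the strongest `fraction` of off-diagonal edges, set them to 1."""
    N = W.shape[0]
    iu = np.triu_indices(N, 1)
    weights = W[iu]
    thresh = np.quantile(weights, 1.0 - fraction)   # cut below this value
    A = np.zeros_like(W)
    A[iu] = (weights >= thresh).astype(float)
    return A + A.T                                   # symmetric binary graph

# Toy weighted "connectome": symmetric random weights on 96 nodes
rng = np.random.default_rng(3)
W = rng.random((96, 96)); W = (W + W.T) / 2

A = binarize_top_fraction(W, 0.85)
density = A[np.triu_indices(96, 1)].mean()
# density should be close to the preserved fraction, here 0.85
```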
In this resting fMRI connectome data (http://www.nitrc.org/projects/fcon_1000/) we applied different thresholds, each preserving a different portion of the strongest edges in the network. For each binary network constructed, we tested how significant the structure was after modularity-based partitioning. The results are shown in Figure 6.8. We observe that this resting fMRI network seems to have significant community structures regardless of the threshold.

6.4 Discussion

Compared with the work in the previous chapters (Chapters 4 and 5), we give another method to test the statistical significance of the community structures of graphs. This chapter focused on undirected binary graphs, and we used Monte Carlo simulations to create a non-parametric distribution of the modularity. Unlike the maximum eigenvalue approach, we do not yet have a quantitative approximation for the p-value of the modular structure. However, the 0.95-percentile curves $\mathcal{F}_{0.95}$ also provide a way to control the type I error in modularity-based partitioning.

The normalization of modularity into $\tilde{Q}_2$ is essential to extend applicability to a wider range of network sizes. Unlike the previous research [118], which only looks at very sparse graphs, our approach gives curves for $\mathcal{F}_{0.95}$ rather than a constant value. This allows us to perform the significance test on network structures for different types of graphs besides those that are very sparse. We also note that the modularity is calculated with respect to our proposed BLUE null model rather than the conventional null model proposed by Newman and Girvan [108, 109]. The value of $Q_2$ could be different if this alternative model is used.

We do observe that the distribution of $\tilde{Q}_2$ has a larger variance for any specific empirical parameter $p$ in a smaller network. This is shown in Figure 6.9. This indicates two things.
First, we should be very careful in interpreting results near the boundary of $\mathcal{F}_{0.95}$, especially when we are using a unified $\mathcal{F}_{0.95}$ rather than one specific to the exact network size. Although the $\mathcal{F}_{0.95}$ curves overlap for a range of different network sizes, the tails of the empirical distribution still vary with network size. Second, we need to be cautious when addressing the statistical significance level: the distance from $\mathcal{F}_{0.95}$ needs to be interpreted as a function of network size.

6.5 Conclusion

This work complements the significance study of network community structure by modularity-based graph partitioning. Although it has the limitation that it lacks a complete probability distribution model (or p-value), hypothesis testing based on a non-parametric distribution of properly scaled modularity from random graphs is practically useful and can control the type I error. The $\mathcal{F}_{0.95}$ curve not only broadens the applicable network size range, but also covers graphs from dense to sparse. The networks from the simulations and real-world data such as the Karate Club, the Dolphin network, and the resting state fMRI connectome illustrate the practical utility of the proposed method.

Figure 6.1: Scatter plots of modularities of Erdős-Rényi random graphs of different sizes. Each circle represents a realization. We observe that the results change with respect to the network size: the bands of curves shrink in width (range of modularity) and approach zero in larger random networks.

Figure 6.2: Scatter plots of modularities of Erdős-Rényi random graphs of different sizes. Each circle represents a realization. The y-axis is the normalized modularity $\tilde{Q}_2$. Compared with Figure 6.1, the locations of the bands of curves for different $N$ settings are roughly aligned.

Figure 6.3: Different percentile curves ($\mathcal{F}_{0.50}$, $\mathcal{F}_{0.75}$, $\mathcal{F}_{0.90}$, $\mathcal{F}_{0.95}$) as functions of the empirical Bernoulli distribution parameter $p$. Different colors indicate curves for different network sizes $N$.
We can see the curves are approximately aligned by scaling the modularity to $\tilde{Q}_2$ by equation (6.3).

Figure 6.4: (a) Scatter plot of 5000 realizations of structured graphs. Graphs are generated to have a 20-20 cluster structure by randomly choosing the within-cluster Bernoulli distribution parameter $p_{\mathrm{in}} \in (0.5, 1)$ and the between-cluster connection probability $p_{\mathrm{out}} \in (0, 0.5)$. The adjacency matrix of one such realization is given in (b); the top left and bottom right quadrants denote the within-cluster connections for the two modules ($p_{\mathrm{in}} = 0.85$) while the top right and bottom left quadrants denote the between-cluster connections ($p_{\mathrm{out}} = 0.15$).

Figure 6.5: (a) Scatter plot of 5000 realizations of structured graphs. They are generated to have a 20-20 cluster structure by randomly choosing the within-cluster Bernoulli distribution parameter $p_{\mathrm{in}} \in (0.4, 1)$ and the between-cluster connection probability $p_{\mathrm{out}} = p_{\mathrm{in}} - 0.4$. The adjacency matrix of one such realization is given in (b).

Figure 6.6: Different setups for test bench graph realizations: (a) Network size $N = 250$, cluster sizes between 50 and 120. (b) Network size $N = 250$, cluster sizes between 80 and 130. Results for various mixing parameters $\mu$ are plotted in colors. We observe that when the network structure is weak, indicated by a large $\mu$, the normalized modularity $\tilde{Q}_2$ decreases and can fall below the $\mathcal{F}_{0.95}$ curve.

Figure 6.7: Real data networks: the Karate Club network [165] has 34 nodes. Its normalized modularity value $\tilde{Q}_2$ lies above the $\mathcal{F}_{0.95}$ curve, indicating that the community structure is significant. So does that of the Dolphin network [91] of 62 nodes.

Figure 6.8: The fMRI resting connectome has 96 nodes; we used the $\mathcal{F}_{0.95}$ curve from $N = 100$ for reference. The complete weighted graph is thresholded and binarized to preserve different portions of the strongest edges.
We calculate the normalized modularity $\tilde{Q}_2$ as well as the empirical Bernoulli parameter $p$ for each of them to test the significance of their community structure. Different colors indicate the different sparsity levels as specified in the color bar; for example, 0.85 means the network preserves the top 85% of the edges according to their strengths. All the thresholded networks have $\tilde{Q}_2$ above the $\mathcal{F}_{0.95}$ curve, indicating that the community structures are significant.

Figure 6.9: Frequency plots of the normalized modularity $\tilde{Q}_2$ for a different empirical Bernoulli distribution parameter $p$ value in each panel: (a) $p \approx 0.388$, (b) $p \approx 0.490$, (c) $p \approx 0.592$, (d) $p \approx 0.694$. Networks of different sizes are shown in colors. In each $p$ case, the variance of $\tilde{Q}_2$ decreases as the network size $N$ increases.

Chapter 7
Conclusions and Future Work

7.1 Conclusions

Graph partition algorithms based on modularity have become increasingly popular for identifying network community structure. We have seen that their performance relies heavily on the null model, which should have no underlying structure but the same configuration as the true network. In this dissertation we proposed several different null models derived statistically under different distributions and conditions for both undirected and directed graphs. These models offer several advantages, such as node degree preservation, and are more flexible than conventional models, allowing for topological flexibility and the introduction of prior information. Application of these models to both simulations and experimental data demonstrated their ability to detect true community structure. We also proposed two types of methods (one based on the largest eigenvalue and the other based on modularity) to test for community structure significance while controlling the type I error rate. In the following we indicate several challenges that remain ahead.
7.2 Future Work

Modularity-based partitioning methods have several inherent limitations in detecting network structure. In the following we review some of these limitations and point toward future research directions. We also seek new applications for our network partition algorithms.

7.2.1 Exhaustive Assignment

A property of most network partitioning algorithms is that after partitioning, every node has to be assigned to a specific subnetwork. However, we should admit the case that some of the nodes do not participate in any structure, and their subnetwork assignments can be misleading. A potential remedy is to impose a threshold on the participation of each node in modules. Consider bipartitioning as an example: if a specific node $a$ has a smaller than threshold ($\epsilon$) contribution to modularity when it is assigned to each of the resulting subnetworks:

$$\left| Q(C, C_a = 1) - Q(C, C_a = 2) \right| < \epsilon \quad (7.1)$$

then we should not include the node in either side of the resulting subnetwork.

7.2.2 Hybrid Network Edge Distributions

We have developed null models for Gaussian random networks and Bernoulli random networks. However, in some brain connectome cases the edge distributions do not conform to either case. For example, a connectome constructed by partial correlation using graphical lasso [49, 95] or the PC* method [157] will make networks
7.2.3 Hypernetworks

In brain network analysis, a possible structure may have an ROI (region of interest) participate in different modules in different frequency bands. To allow such structures, we can construct interaction networks for different frequency bands. However, this does not capture the possible cross-frequency couplings, which have recently drawn considerable attention and play an important role in brain coupling analysis. We can use a hyper-network approach to address cross-frequency coupling. In this setting, each ROI is represented not by a single node but by a series of nodes, each modeling the signal extracted from a particular frequency band at that ROI. Edges between nodes follow the previous definitions, such as correlations between the time series of the associated ROIs. Module analysis using modularity-based partitioning on such a hyper-network should identify meaningful modules not only across spatial locations, but also across different frequency bands.

7.2.4 Overlapping Networks

Another limitation of existing partitioning methods is that each node must be assigned to a single subnetwork. However, we know that nodes in some real networks can participate in different modules at the same time; this is the concept of overlapping clustering. Some approaches incorporate fuzzy clustering outputs, or focus on the graphical layout and try to accommodate the possibility of overlapping clusters. We plan to investigate physically meaningful overlapping partitioning methods that reveal the underlying structure. In brain interaction networks, this may be carried out through analysis of the interaction pattern. Canonical interaction analysis is one possible approach because it gives not only the interaction value but also the coupling vectors of both signals. These coupling vectors should carry some information and may assist in developing an overlapping partitioning scheme.
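One way to assemble such a hyper-network is sketched below, under the assumption that band-limited time series are already available for each ROI; the function name and array layout are illustrative. Each (ROI, band) pair becomes one node, so cross-frequency couplings appear as ordinary edges and any modularity-based partitioner can be applied unchanged.

```python
import numpy as np

def hypernetwork_adjacency(band_signals):
    """Build a hyper-network adjacency matrix from band-limited signals.

    band_signals : array of shape (n_rois, n_bands, n_samples), the
                   time series of each ROI in each frequency band.

    Returns an (n_rois * n_bands) square matrix of absolute correlations
    between all (ROI, band) node pairs, with a zeroed diagonal.
    """
    n_rois, n_bands, n_samples = band_signals.shape
    X = band_signals.reshape(n_rois * n_bands, n_samples)  # one row per node
    A = np.abs(np.corrcoef(X))                             # |correlation| edges
    np.fill_diagonal(A, 0.0)                               # no self-loops
    return A
```

For 3 ROIs and 2 bands this yields a 6 x 6 symmetric weighted graph in which an edge between (ROI 1, alpha) and (ROI 2, gamma) directly encodes a cross-frequency interaction.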
7.2.5 Inclusion of Prior Clustering Knowledge

In some applications we may need to include prior information on the network structure. Equivalently, during the partitioning phase, we may wish to have part of the network pre-clustered because of that prior information. These can be seen as constraints on the partitioning solutions while optimizing modularity. One example is protein-protein interaction networks, where each node represents a unique protein. Prior studies may have already shown that several proteins form a known cluster and jointly perform some biological function. Including such prior information as a constraint is therefore meaningful and could help us focus on other parts of the network where we lack a priori information.

7.2.6 Applications in Other Disciplines

Our resting MEG study is preliminary but encouraging. We plan to extend this work with more real data and analyze the patterns across different subjects. Network analysis comparing different populations (e.g., patient vs. control) is also interesting and may give us insight into how the network is affected under different conditions. The study of protein networks is another promising research direction, with new challenges such as working with vast sparse networks.

References

[1] R. Agrawal and H.V. Jagadish. Algorithms for searching massive graphs. Knowledge and Data Engineering, IEEE Transactions on, 6(2):225-238, 1994.
[2] R.D. Alba. A graph-theoretic definition of a sociometric clique. Journal of Mathematical Sociology, 3(1):113-126, 1973.
[3] A.F. Alexander-Bloch, N. Gogtay, D. Meunier, R. Birn, L. Clasen, F. Lalonde, R. Lenroot, J. Giedd, and E.T. Bullmore. Disrupted modularity and local connectivity of brain functional networks in childhood-onset schizophrenia. Frontiers in Systems Neuroscience, 4:doi:10.3389, 2010.
[4] J. Alstott, M. Breakspear, P. Hagmann, L. Cammoun, and O. Sporns. Modeling the impact of lesions in the human brain. PLoS Computational Biology, 5(6):e1000408, 2009.
[5] J.S. Anderson, J.A. Nielsen, A.L. Froehlich, M.B. DuBray, T.J. Druzgal, A.N. Cariello, J.R. Cooperrider, B.A. Zielinski, C. Ravichandran, P.T. Fletcher, et al. Functional connectivity magnetic resonance imaging classification of autism. Brain, 134:3742-3754, 2011.
[6] A. Arenas, J. Duch, A. Fernández, and S. Gómez. Size reduction of complex networks preserving modularity. New Journal of Physics, 9:176, 2007.
[7] A. Arenas, A. Fernández, and S. Gómez. Analysis of the structure of complex networks at different resolution levels. New Journal of Physics, 10:053039, 2008.
[8] A.E. Ashari, R. Nikoukhah, and S.L. Campbell. A numerical method to solve a quadratic constrained maximization. Center for Scientific Computation technique reports, 2009.
[9] N. Bansal, A. Blum, and S. Chawla. Correlation clustering. Machine Learning, 56(1):89-113, 2004.
[10] M.J. Barber. Modularity and community detection in bipartite networks. Physical Review E, 76(6):066102, 2007.
[11] D.S. Bassett, N.F. Wymbs, M.A. Porter, P.J. Mucha, J.M. Carlson, and S.T. Grafton. Dynamic reconfiguration of human brain networks during learning. Proceedings of the National Academy of Sciences, 108(18):7641, 2011.
[12] S. Ben-David, U. Von Luxburg, and D. Pál. A sober look at clustering stability. Learning Theory, pages 5-19, 2006.
[13] M. Bertini, M. Ferrara, L. De Gennaro, G. Curcio, F. Moroni, F. Vecchio, M. De Gasperis, P.M. Rossini, and C. Babiloni. Directional information flows between brain hemispheres during presleep wake and early sleep stages. Cerebral Cortex, 17(8):1970, 2007.
[14] V.D. Blondel, J.L. Guillaume, R. Lambiotte, and E. Lefebvre. Fast unfolding of communities in large networks. Journal of Statistical Mechanics: Theory and Experiment, 2008:P10008, 2008.
[15] A. Borodin, G.O. Roberts, J.S. Rosenthal, and P. Tsaparas. Finding authorities and hubs from link structures on the world wide web. In Proceedings of the 10th international conference on World Wide Web, pages 415-429.
ACM, 2001.
[16] Y. Boykov and V. Kolmogorov. An experimental comparison of min-cut/max-flow algorithms for energy minimization in vision. Pattern Analysis and Machine Intelligence, IEEE Transactions on, 26(9):1124-1137, 2004.
[17] M. Brookes. The matrix reference manual, 2005.
[18] E. Bullmore and O. Sporns. Complex brain networks: graph theoretical analysis of structural and functional systems. Nature Reviews Neuroscience, 10(3):186-198, 2009.
[19] E.T. Bullmore and D.S. Bassett. Brain graphs: graphical models of the human brain connectome. Annual Review of Clinical Psychology, 7:113-140, 2011.
[20] N.P. Castellanos, I. Leyva, J.M. Buldú, R. Bajo, N. Paúl, P. Cuesta, V.E. Ordóñez, C.L. Pascua, S. Boccaletti, F. Maestú, et al. Principles of recovery from traumatic brain injury: Reorganization of functional networks. Neuroimage, 55(3):1189-1199, 2011.
[21] Y.T. Chang, R.M. Leahy, and D. Pantazis. Modularity-based graph partitioning using conditional expected models. Physical Review E, 85(1):016109, 2012.
[22] Y.T. Chang, D. Pantazis, H.B. Hui, and R.M. Leahy. Statistically optimal graph partition method based on modularity. In Biomedical Imaging: From Nano to Macro, 2010 IEEE International Symposium on, pages 1193-1196. IEEE, 2010.
[23] Z.J. Chen, Y. He, P. Rosa-Neto, J. Germann, and A.C. Evans. Revealing modular architecture of human brain structural networks by using cortical thickness from MRI. Cerebral Cortex, 18(10):2374-2381, 2008.
[24] Z.J. Chen, Y. He, P. Rosa-Neto, G. Gong, and A.C. Evans. Age-related alterations in the modular organization of structural cortical network by using cortical thickness from MRI. NeuroImage, 56(1):235-245, 2011.
[25] F.R.K. Chung. Spectral Graph Theory. Number 92. American Mathematical Society, 1997.
[26] A. Clauset, M.E.J. Newman, and C. Moore. Finding community structure in very large networks. Physical Review E, 70(6):066111, 2004.
[27] M.E. Cusick, N. Klitgord, M. Vidal, and D.E. Hill.
Interactome: gateway into systems biology. Human Molecular Genetics, 14(suppl 2):R171, 2005.
[28] L. Danon, A. Díaz-Guilera, and A. Arenas. The effect of size heterogeneity on community identification in complex networks. Journal of Statistical Mechanics: Theory and Experiment, 2006:P11010, 2006.
[29] L. Danon, A. Díaz-Guilera, J. Duch, and A. Arenas. Comparing community structure identification. Journal of Statistical Mechanics: Theory and Experiment, 2005:P09008, 2005.
[30] P. Deift and D. Gioev. Random Matrix Theory: Invariant Ensembles and Universality, volume 18. American Mathematical Society, 2009.
[31] G. Deshpande, X. Hu, R. Stilla, and K. Sathian. Effective connectivity during haptic perception: a study using Granger causality analysis of functional magnetic resonance imaging data. Neuroimage, 40(4):1807-1814, 2008.
[32] L. Deuker, E.T. Bullmore, M. Smith, S. Christensen, P.J. Nathan, B. Rockstroh, and D.S. Bassett. Reproducibility of graph metrics of human brain functional networks. Neuroimage, 47(4):1460-1468, 2009.
[33] I.S. Dhillon, Y. Guan, and B. Kulis. Kernel k-means: spectral clustering and normalized cuts. In Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining, pages 551-556. ACM, 2004.
[34] R. Diestel. Graph Theory. Graduate Texts in Mathematics, vol. 173. Springer, 2005.
[35] N.U.F. Dosenbach, B. Nardos, A.L. Cohen, D.A. Fair, J.D. Power, J.A. Church, S.M. Nelson, G.S. Wig, A.C. Vogel, C.N. Lessov-Schlaggar, et al. Prediction of individual brain maturity using fMRI. Science, 329(5997):1358, 2010.
[36] N. Du, B. Wu, X. Pei, B. Wang, and L. Xu. Community detection in large-scale social networks. In Proceedings of the 9th WebKDD and 1st SNA-KDD 2007 workshop on Web mining and social network analysis, pages 16-25. ACM, 2007.
[37] J. Duch and A. Arenas. Community detection in complex networks using extremal optimization. Physical Review E, 72(2):027104, 2005.
[38] A. Edelman.
The probability that a random real Gaussian matrix has k real eigenvalues, related distributions, and the circular law. Journal of Multivariate Analysis, 60(2):203-232, 1997.
[39] L. Euler. Solutio problematis ad geometriam situs pertinentis. Commentarii academiae scientiarum Petropolitanae, 8:128-140, 1741.
[40] A.C. Evans, J.M. Lee, S.I. Kim, H. Fukuda, R. Kawashima, Y. He, T. Jiang, J.S. Kim, Z. Chen, K. Im, et al. Human cortical anatomical networks assessed by structural MRI. Brain Imaging and Behavior, 2(4):289-299, 2008.
[41] Y. Fan, F. Shi, J.K. Smith, W. Lin, J.H. Gilmore, and D. Shen. Brain anatomical networks in early human brain development. Neuroimage, 54(3):1862-1871, 2011.
[42] Z. Feng, X. Xu, N. Yuruk, and T. Schweiger. A novel similarity-based modularity function for graph partitioning. Data Warehousing and Knowledge Discovery, 4654:385-396, 2007.
[43] L. Ferrarini, I.M. Veer, E. Baerends, M.J. van Tol, R.J. Renken, N.J.A. van der Wee, D.J. Veltman, A. Aleman, F.G. Zitman, B.W.J.H. Penninx, et al. Hierarchical functional modularity in the resting-state human brain. Human Brain Mapping, 30(7):2220-2231, 2009.
[44] G.W. Flake, S. Lawrence, C.L. Giles, and F.M. Coetzee. Self-organization and identification of web communities. Computer, 35(3):66-70, 2002.
[45] S. Fortunato. Community detection in graphs. Physics Reports, 486(3-5):75-174, 2010.
[46] S. Fortunato and M. Barthelemy. Resolution limit in community detection. Proceedings of the National Academy of Sciences, 104(1):36, 2007.
[47] M.D. Fox and M.E. Raichle. Spontaneous fluctuations in brain activity observed with functional magnetic resonance imaging. Nature Reviews Neuroscience, 8(9):700-711, 2007.
[48] M.D. Fox, A.Z. Snyder, J.L. Vincent, M. Corbetta, D.C. Van Essen, and M.E. Raichle. The human brain is intrinsically organized into dynamic, anticorrelated functional networks. Proceedings of the National Academy of Sciences of the United States of America, 102(27):9673, 2005.
[49] J.
Friedman, T. Hastie, and R. Tibshirani. Sparse inverse covariance estimation with the graphical lasso. Biostatistics, 9(3):432-441, 2008.
[50] A.M. Frieze and T. Luczak. On the independence and chromatic numbers of random regular graphs. Journal of Combinatorial Theory, Series B, 54(1):123-132, 1992.
[51] A.K. Gayen. The frequency distribution of the product-moment correlation coefficient in random samples of any size drawn from non-normal universes. Biometrika, 38(1/2):219-247, 1951.
[52] R. Ghosh and K. Lerman. Community detection using a measure of global influence. Advances in Social Network Mining and Analysis, 5498:20-35, 2010.
[53] M. Girvan and M.E.J. Newman. Community structure in social and biological networks. Proceedings of the National Academy of Sciences, 99(12):7821, 2002.
[54] S. Gómez, P. Jensen, and A. Arenas. Analysis of community structure in networks of correlated data. Physical Review E, 80(1):016114, 2009.
[55] G. Gong, Y. He, L. Concha, C. Lebel, D.W. Gross, A.C. Evans, and C. Beaulieu. Mapping anatomical connectivity patterns of human cerebral cortex using in vivo diffusion tensor imaging tractography. Cerebral Cortex, 19(3):524, 2009.
[56] R. Guimerà and L.A.N. Amaral. Functional cartography of complex metabolic networks. Nature, 433(7028):895-900, 2005.
[57] R. Guimerà, M. Sales-Pardo, and L.A.N. Amaral. Modularity from fluctuations in random graphs and complex networks. Physical Review E, 70(2):025101, 2004.
[58] R. Guimerà, M. Sales-Pardo, and L.A.N. Amaral. Module identification in bipartite and directed networks. Physical Review E, 76(3):036102, 2007.
[59] L. Hagen and A.B. Kahng. New spectral methods for ratio cut partitioning and clustering. Computer-Aided Design of Integrated Circuits and Systems, IEEE Transactions on, 11(9):1074-1085, 1992.
[60] P. Hagmann, L. Cammoun, X. Gigandet, S. Gerhard, P. Ellen Grant, V. Wedeen, R. Meuli, J.P. Thiran, C.J. Honey, and O. Sporns. MR connectomics: Principles and challenges.
Journal of Neuroscience Methods, 194(1):34-45, 2010.
[61] P. Hagmann, L. Cammoun, X. Gigandet, R. Meuli, C.J. Honey, V.J. Wedeen, and O. Sporns. Mapping the structural core of human cerebral cortex. PLoS Biology, 6(7):e159, 2008.
[62] P. Hagmann, O. Sporns, N. Madan, L. Cammoun, R. Pienaar, V.J. Wedeen, R. Meuli, J.P. Thiran, and P.E. Grant. White matter maturation reshapes structural connectivity in the late developing human brain. Proceedings of the National Academy of Sciences, 107(44):19067-19072, 2010.
[63] L.H. Hartwell, J.J. Hopfield, S. Leibler, and A.W. Murray. From molecular to modular cell biology. Nature, 402(6761 Suppl):C47-C52, 1999.
[64] B.J. He, A.Z. Snyder, J.L. Vincent, A. Epstein, G.L. Shulman, and M. Corbetta. Breakdown of functional connectivity in frontoparietal networks underlies behavioral deficits in spatial neglect. Neuron, 53(6):905-918, 2007.
[65] Y. He, J. Wang, L. Wang, Z.J. Chen, C. Yan, H. Yang, H. Tang, C. Zhu, Q. Gong, Y. Zang, et al. Uncovering intrinsic modular organization of spontaneous brain activity in humans. PLoS One, 4(4):e5226, 2009.
[66] C.C. Hilgetag, G.A.P.C. Burns, M.A. O'Neill, J.W. Scannell, and M.P. Young. Anatomical connectivity defines the organization of clusters of cortical areas in the macaque and the cat. Philosophical Transactions of the Royal Society of London. Series B: Biological Sciences, 355(1393):91-110, 2000.
[67] C.C. Hilgetag and M. Kaiser. Clustered organization of cortical connectivity. Neuroinformatics, 2(3):353-360, 2004.
[68] C.J. Honey, R. Kötter, M. Breakspear, and O. Sporns. Network structure of cerebral cortex shapes functional connectivity on multiple time scales. Proceedings of the National Academy of Sciences, 104(24):10240, 2007.
[69] C.J. Honey, J.P. Thivierge, and O. Sporns. Can structure predict function in the human brain? Neuroimage, 52(3):766-776, 2010.
[70] X. Hu, X. Zhang, C. Lu, E.K. Park, and X. Zhou. Exploiting Wikipedia as external knowledge for document clustering.
In Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining, pages 389-396. ACM, 2009.
[71] M.H. Jackson. Assessing the structure of communication on the world wide web. Journal of Computer-Mediated Communication, 3(1):0-0, 1997.
[72] H. Jeong, S.P. Mason, A.L. Barabási, and Z.N. Oltvai. Lethality and centrality in protein networks. Nature, 411(6833):41, 2001.
[73] M. Kaiser. A tutorial in connectome analysis: topological and spatial features of brain networks. Neuroimage, 57(3):892-907, 2011.
[74] M. Kaiser. A tutorial in connectome analysis: topological and spatial features of brain networks. Neuroimage, 57(3):892-907, 2011.
[75] T.D. Kaplan and S. Forrest. A dual assortative measure of community structure. arXiv preprint arXiv:0801.3290, 2008.
[76] B. Karrer, E. Levina, and M.E.J. Newman. Robustness of community structure in networks. Physical Review E, 77(4):046119, 2008.
[77] S.M. Kay. Fundamentals of Statistical Signal Processing, Volume I: Estimation Theory. Prentice Hall Signal Processing Series. Prentice Hall PTR, 1993.
[78] C. Kelly, Z. Shehzad, M. Mennes, and M. Milham. Functional connectome processing scripts. http://www.nitrc.org/projects/fcon_1000, 2011.
[79] B.W. Kernighan and S. Lin. An efficient heuristic procedure for partitioning graphs. Bell System Technical Journal, 49(2):291-307, 1970.
[80] M.G. Kitzbichler, R.N.A. Henson, M.L. Smith, P.J. Nathan, and E.T. Bullmore. Cognitive effort drives workspace configuration of human brain functional networks. The Journal of Neuroscience, 31(22):8259-8270, 2011.
[81] A. Korzeniewska, M. Manczak, M. Kaminski, K.J. Blinowska, and S. Kasicki. Determination of information flow direction among brain structures by a modified directed transfer function (dDTF) method. Journal of Neuroscience Methods, 125(1-2):195-207, 2003.
[82] P. Krishna Reddy, M. Kitsuregawa, P. Sreekanth, and S. Srinivasa Rao.
A graph-based approach to extract a neighborhood customer community for collaborative filtering. Databases in Networked Information Systems, pages 188-200, 2002.
[83] B. Krishnamurthy and J. Wang. On network-aware clustering of web clients. In ACM SIGCOMM Computer Communication Review, volume 30, pages 97-110. ACM, 2000.
[84] J.M. Kumpula, J. Saramäki, K. Kaski, and J. Kertesz. Limited resolution in complex network community detection with Potts model approach. The European Physical Journal B - Condensed Matter and Complex Systems, 56(1):41-45, 2007.
[85] A. Lancichinetti, S. Fortunato, and F. Radicchi. Benchmark graphs for testing community detection algorithms. Physical Review E, 78(4):046110, 2008.
[86] E.A. Leicht and M.E.J. Newman. Community structure in directed networks. Physical Review Letters, 100(11):118703, 2008.
[87] J. Leskovec, D. Huttenlocher, and J. Kleinberg. Predicting positive and negative links in online social networks. In Proceedings of the 19th international conference on World wide web, pages 641-650. ACM, 2010.
[88] K. Li, X. Wu, D.Z. Chen, and M. Sonka. Optimal surface segmentation in volumetric images - a graph-theoretic approach. Pattern Analysis and Machine Intelligence, IEEE Transactions on, 28(1):119-134, 2006.
[89] W. Liao, J. Ding, D. Marinazzo, Q. Xu, Z. Wang, C. Yuan, Z. Zhang, G. Lu, and H. Chen. Small-world directed networks in the human brain: Multivariate Granger causality analysis of resting-state fMRI. NeuroImage, 54(4):2683-2694, 2011.
[90] R.D. Luce. Connectivity and generalized cliques in sociometric group structure. Psychometrika, 15(2):169-190, 1950.
[91] D. Lusseau, K. Schneider, O.J. Boisseau, P. Haase, E. Slooten, and S.M. Dawson. The bottlenose dolphin community of Doubtful Sound features a large proportion of long-lasting associations. Behavioral Ecology and Sociobiology, 54(4):396-405, 2003.
[92] D.J.C. MacKay. Information Theory, Inference, and Learning Algorithms. Cambridge University Press, 2003.
[93] S. Maslov and K. Sneppen. Specificity and stability in topology of protein networks. Science, 296(5569):910, 2002.
[94] C.P. Massen and J.P.K. Doye. Identifying communities within energy landscapes. Physical Review E, 71(4):046101, 2005.
[95] N. Meinshausen and P. Bühlmann. High-dimensional graphs and variable selection with the lasso. The Annals of Statistics, 34(3):1436-1462, 2006.
[96] V. Menon. Large-scale brain networks and psychopathology: a unifying triple network model. Trends in Cognitive Sciences, 15(10):483-506, 2011.
[97] D. Meunier, S. Achard, A. Morcom, and E. Bullmore. Age-related changes in modular organization of human brain functional networks. Neuroimage, 44(3):715-723, 2009.
[98] D. Meunier, S. Achard, A. Morcom, and E. Bullmore. Age-related changes in modular organization of human brain functional networks. Neuroimage, 44(3):715-723, 2009.
[99] D. Meunier, R. Lambiotte, and E.T. Bullmore. Modular and hierarchically modular organization of brain networks. Frontiers in Neuroscience, 4: article 200, 2010.
[100] B. Mohar. Some applications of Laplace eigenvalues of graphs. Graph Symmetry: Algebraic Methods and Applications, 497:227-275, 1997.
[101] R.J. Mokken. Cliques, clubs and clans. Quality & Quantity, 13(2):161-173, 1979.
[102] M. Molloy and B. Reed. A critical point for random graphs with a given degree sequence. Random Structures & Algorithms, 6(2-3):161-180, 1995.
[103] S. Muff, F. Rao, and A. Caflisch. Local modularity measure for network clusterizations. Physical Review E, 72(5):056107, 2005.
[104] S. Navlakha and C. Kingsford. Exploring biological network dynamics with ensembles of graph partitions. In Proc. 15th Intl. Pacific Symposium on Biocomputing (PSB), volume 15, pages 166-177, 2010.
[105] M.E.J. Newman. Analysis of weighted networks. Physical Review E, 70(5):056131, 2004.
[106] M.E.J. Newman. Fast algorithm for detecting community structure in networks. Physical Review E, 69(6):066133, 2004.
[107] M.E.J. Newman.
Finding community structure in networks using the eigenvectors of matrices. Physical Review E, 74(3):036104, 2006.
[108] M.E.J. Newman. Modularity and community structure in networks. Proceedings of the National Academy of Sciences, 103(23):8577, 2006.
[109] M.E.J. Newman and M. Girvan. Finding and evaluating community structure in networks. Physical Review E, 69(2):026113, 2004.
[110] V. Nicosia, G. Mangioni, V. Carchiolo, and M. Malgeri. Extending the definition of modularity to directed graphs with overlapping communities. Journal of Statistical Mechanics: Theory and Experiment, 2009:P03024, 2009.
[111] G. Palla, I. Derényi, I. Farkas, and T. Vicsek. Uncovering the overlapping community structure of complex networks in nature and society. Nature, 435(7043):814-818, 2005.
[112] B.O. Palsson, N.D. Price, and J.A. Papin. Development of network-based pathway definitions: the need to analyze real metabolic networks. Trends in Biotechnology, 21(5):195-198, 2003.
[113] I.J.G. Portillo and P.M. Gleiser. An adaptive complex network model for brain functional networks. PLoS One, 4(9):e6863, 2009.
[114] J.D. Power, D.A. Fair, B.L. Schlaggar, and S.E. Petersen. The development of human functional brain networks. Neuron, 67(5):735-748, 2010.
[115] F. Radicchi, C. Castellano, F. Cecconi, V. Loreto, and D. Parisi. Defining and identifying communities in networks. Proceedings of the National Academy of Sciences of the United States of America, 101(9):2658, 2004.
[116] A. Raj and C.H. Wiggins. An information-theoretic derivation of min-cut-based clustering. Pattern Analysis and Machine Intelligence, IEEE Transactions on, 32(6):988-995, 2010.
[117] J. Reichardt and S. Bornholdt. Statistical mechanics of community detection. Physical Review E, 74(1):016110, 2006.
[118] J. Reichardt and S. Bornholdt. Partitioning and modularity of graphs with arbitrary degree distribution. Physical Review E, 76(1):015102, 2007.
[119] J.C. Reijneveld, S.C. Ponten, H.W. Berendse, and C.J. Stam.
The application of graph theoretical analysis to complex networks in the brain. Clinical Neurophysiology, 118(11):2317-2331, 2007.
[120] M. Rubinov and O. Sporns. Complex network measures of brain connectivity: uses and interpretations. Neuroimage, 52(3):1059-1069, 2010.
[121] M. Rubinov and O. Sporns. Weight-conserving characterization of complex functional brain networks. Neuroimage, 56(4):2068-2079, 2011.
[122] M. Rubinov and O. Sporns. Weight-conserving characterization of complex functional brain networks. Neuroimage, 56(4):2068-2079, 2011.
[123] J.D. Rudie, Z. Shehzad, L.M. Hernandez, N.L. Colich, S.Y. Bookheimer, M. Iacoboni, and M. Dapretto. Reduced functional integration and segregation of distributed neural systems underlying social and emotional information processing in autism spectrum disorders. Cerebral Cortex, 2011.
[124] R. Salvador, J. Suckling, M.R. Coleman, J.D. Pickard, D. Menon, and E.D. Bullmore. Neurophysiological architecture of functional magnetic resonance images of human brain. Cerebral Cortex, 15(9):1332-1342, 2005.
[125] J.R. Sato, A. Fujita, E.F. Cardoso, C.E. Thomaz, M.J. Brammer, and E. Amaro Jr. Analyzing the connectivity between regions of interest: An approach based on cluster Granger causality for fMRI data analysis. Neuroimage, 52(4):1444-1455, 2010.
[126] M.B. Schippers, A. Roebroeck, R. Renken, L. Nanetti, and C. Keysers. Mapping the information flow from one brain to another during gestural communication. Proceedings of the National Academy of Sciences, 107(20):9388, 2010.
[127] P. Schuetz and A. Caflisch. Efficient modularity optimization by multistep greedy algorithm and vertex mover refinement. Physical Review E, 77(4):046112, 2008.
[128] S.B. Seidman. Network structure and minimum degree. Social Networks, 5(3):269-287, 1983.
[129] S.B. Seidman and B.L. Foster. A graph-theoretic generalization of the clique concept. Journal of Mathematical Sociology, 6(1):139-154, 1978.
[130] R. Sharan and T. Ideker.
Modeling cellular machinery through biological network comparison. Nature Biotechnology, 24(4):427-433, 2006.
[131] R. Sharan, I. Ulitsky, and R. Shamir. Network-based prediction of protein function. Molecular Systems Biology, 3(1), 2007.
[132] H. Shen, X. Cheng, K. Cai, and M.B. Hu. Detect overlapping and hierarchical community structure in networks. Physica A: Statistical Mechanics and its Applications, 388(8):1706-1712, 2009.
[133] J. Shi and J. Malik. Normalized cuts and image segmentation. Pattern Analysis and Machine Intelligence, IEEE Transactions on, 22(8):888-905, 2000.
[134] P. Skudlarski, K. Jagannathan, K. Anderson, M.C. Stevens, V.D. Calhoun, B.A. Skudlarska, and G. Pearlson. Brain connectivity is not only lower but different in schizophrenia: a combined anatomical and functional approach. Biological Psychiatry, 68(1):61-69, 2010.
[135] S. Song, P.J. Sjöström, M. Reigl, S. Nelson, and D.B. Chklovskii. Highly nonrandom features of synaptic connectivity in local cortical circuits. PLoS Biology, 3(3):e68, 2005.
[136] O. Sporns, D.R. Chialvo, M. Kaiser, and C.C. Hilgetag. Organization, development and function of complex brain networks. Trends in Cognitive Sciences, 8(9):418-425, 2004.
[137] O. Sporns, C.J. Honey, and R. Kötter. Identification and classification of hubs in brain networks. PLoS One, 2(10):e1049, 2007.
[138] O. Sporns and R. Kötter. Motifs in brain networks. PLoS Biology, 2(11):e369, 2004.
[139] C.J. Stam. Characterization of anatomical and functional connectivity in the brain: A complex networks perspective. International Journal of Psychophysiology, 77(3):186-194, 2010.
[140] S. Still and W. Bialek. How many clusters? An information-theoretic perspective. Neural Computation, 16(12):2483-2506, 2004.
[141] V. Swarnkar, U.R. Abeyratne, and A.S. Karunajeewa. Left-right information flow in the brain during EEG arousals. In Engineering in Medicine and Biology Society, 2006. EMBS'06.
28th Annual International Conference of the IEEE, pages 6133-6136. IEEE, 2006.
[142] R. Tibshirani, G. Walther, and T. Hastie. Estimating the number of clusters in a data set via the gap statistic. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 63(2):411-423, 2001.
[143] N.M. Tichy, M.L. Tushman, and C. Fombrun. Social network analysis for organizations. Academy of Management Review, 4(4):507-519, 1979.
[144] V.A. Traag and J. Bruggeman. Community detection in networks with positive and negative links. Physical Review E, 80(3):036115, 2009.
[145] C.A. Tracy and H. Widom. The distribution of the largest eigenvalue in the Gaussian ensembles. Calogero-Moser-Sutherland Models, CRM Series in Mathematical Physics, 4:461-472, 2000.
[146] D.J. Tylavsky and G.R.L. Sohie. Generalization of the matrix inversion lemma. Proceedings of the IEEE, 74(7):1050-1052, 1986.
[147] M.J. Vaessen, P.A.M. Hofman, H.N. Tijssen, A.P. Aldenkamp, J.F.A. Jansen, and W.H. Backes. The effect and reproducibility of different clinical DTI gradient sets on small world brain connectivity measures. Neuroimage, 51(3):1106-1116, 2010.
[148] M. Van Den Heuvel, R. Mandl, and H.H. Pol. Normalized cut group clustering of resting-state fMRI data. PLoS One, 3(4):e2001, 2008.
[149] M.P. van den Heuvel and H.E. Hulshoff Pol. Exploring the brain network: a review on resting-state fMRI functional connectivity. European Neuropsychopharmacology, 20(8):519-534, 2010.
[150] M.P. Van den Heuvel, C.J. Stam, M. Boersma, and H.E. Hulshoff Pol. Small-world and scale-free organization of voxel-based resting-state functional connectivity in the human brain. Neuroimage, 43(3):528-539, 2008.
[151] P. Van Mieghem. Graph Spectra for Complex Networks. Cambridge University Press, 2011.
[152] P. Van Mieghem, X. Ge, P. Schumm, S. Trajanovski, and H. Wang. Spectral graph analysis of modularity and assortativity. Physical Review E, 82(5):056113, 2010.
[153] B.C.M. van Wijk, C.J. Stam, and A. Daffertshofer.
Comparing brain networks of different size and connectivity density using graph theory. PLoS One, 5(10):e13701, 2010.
[154] A. Vazquez, A. Flammini, A. Maritan, and A. Vespignani. Global protein function prediction from protein-protein interaction networks. Nature Biotechnology, 21(6):697-700, 2003.
[155] U. Von Luxburg. A tutorial on spectral clustering. Statistics and Computing, 17(4):395-416, 2007.
[156] J. Wang, X. Zuo, and Y. He. Graph-based network analysis of resting-state functional MRI. Frontiers in Systems Neuroscience, 4: article 16, 2010.
[157] D. Wheland, A.A. Joshi, K. McMahon, N. Hansell, N. Martin, M. Wright, P. Thompson, D. Shattuck, and R.M. Leahy. Comparison of partial-correlation network inference methods with applications to cortical thickness data. 2012.
[158] A.Y. Wu, M. Garland, and J. Han. Mining scale-free networks using geodesic clustering. In Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining, pages 719-724. ACM, 2004.
[159] Z. Wu and R. Leahy. An optimal graph theoretic approach to data clustering: Theory and its application to image segmentation. Pattern Analysis and Machine Intelligence, IEEE Transactions on, 15(11):1101-1113, 1993.
[160] L. Xu, W. Li, and D. Schuurmans. Fast normalized cut with linear constraints. In Computer Vision and Pattern Recognition, 2009. CVPR 2009. IEEE Conference on, pages 2866-2873. IEEE, 2009.
[161] M.S. Yang and K.L. Wu. A similarity-based robust clustering method. Pattern Analysis and Machine Intelligence, IEEE Transactions on, 26(4):434-448, 2004.
[162] M.P. Young. The organization of neural systems in the primate cerebral cortex. Proceedings of the Royal Society of London. Series B: Biological Sciences, 252(1333):13-18, 1993.
[163] M.P. Young et al. Objective analysis of the topological organization of the primate cortical visual system. Nature, 358(6382):152-155, 1992.
[164] S.X. Yu and J. Shi.
Segmentation given partial grouping constraints. Pattern Analysis and Machine Intelligence, IEEE Transactions on, 26(2):173-183, 2004.
[165] W.W. Zachary. An information flow model for conflict and fission in small groups. Journal of Anthropological Research, 33:452-473, 1977.
[166] L. Zemanová, C. Zhou, and J. Kurths. Structural and functional clusters of complex brain networks. Physica D: Nonlinear Phenomena, 224(1-2):202-212, 2006.
[167] S. Zhong and J. Ghosh. Generative model-based document clustering: a comparative study. Knowledge and Information Systems, 8(3):374-384, 2005.
[168] C. Zhou, L. Zemanová, G. Zamora-Lopez, C.C. Hilgetag, and J. Kurths. Structure-function relationship in complex brain networks expressed by hierarchical synchronization. New Journal of Physics, 9:178, 2007.
Abstract

Network modeling and graph theory have been widely studied and applied in a variety of modern research areas. The detection of network structures in these contexts is a fundamental and challenging problem. "Community structures" (where nodes are clustered into densely connected subnetworks) appear in a wide spectrum of disciplines. Examples include groups of individuals sharing common interests in social networks and groups of proteins carrying out similar biological functions. Detecting network structure also enables other kinds of network analysis, such as hub identification.

Modularity-based graph partitioning is an approach specifically designed to detect community structures in a network. Although modularity methods have achieved some success in detecting modular structures, many related problems remain unsolved.

Central to modularity-based graph partitioning is the null model: a statistical representation of a network with no structure. In this work, I present a novel approach to designing null models. This approach resolves many existing problems, including non-negativity and topological consistency. Null models are presented for binary and weighted graphs, both undirected and directed. I also present several new methods to assess the statistical significance of detected community structures. The thesis concludes with potential directions for future work and a discussion of how the module detection problem fits within a broader network analysis framework.
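To make the role of the null model concrete, the following is a minimal sketch of the standard modularity measure with the usual Newman-Girvan configuration-model null model, in which the expected weight of an edge between nodes i and j is k_i k_j / 2m. This illustrates the baseline formulation that the abstract refers to, not the novel null model proposed in the thesis; the function name and example graph are illustrative.

```python
import numpy as np

def modularity(A, labels):
    """Newman-Girvan modularity:
        Q = (1/2m) * sum_ij (A_ij - k_i*k_j/(2m)) * delta(c_i, c_j)

    A      : symmetric adjacency matrix (binary or weighted, undirected).
    labels : community assignment for each node.

    The k_i*k_j/(2m) term is the configuration-model null model: the
    expected edge weight between i and j if edges were rewired at random
    while preserving every node's degree.
    """
    A = np.asarray(A, dtype=float)
    k = A.sum(axis=1)                  # node degrees (strengths if weighted)
    two_m = k.sum()                    # 2m = total degree
    B = A - np.outer(k, k) / two_m     # modularity matrix: observed - expected
    labels = np.asarray(labels)
    same = labels[:, None] == labels[None, :]  # delta(c_i, c_j)
    return (B * same).sum() / two_m

# Toy example: two triangles joined by a single bridging edge.
A = np.zeros((6, 6))
for i, j in [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]:
    A[i, j] = A[j, i] = 1.0

Q = modularity(A, [0, 0, 0, 1, 1, 1])  # each triangle as one community
```

Partitioning the two triangles into separate communities gives Q = 5/14, well above zero, whereas placing all nodes in a single community gives Q = 0, reflecting no structure beyond the null model's expectation.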
Linked assets
University of Southern California Dissertations and Theses
Conceptually similar
Efficient graph learning: theory and performance evaluation
Applications of graph theory to brain connectivity analysis
Multivariate statistical analysis of magnetoencephalography data
Graph-based models and transforms for signal/data processing with applications to video coding
Critically sampled wavelet filterbanks on graphs
Lifting transforms on graphs: theory and applications
Human activity analysis with graph signal processing techniques
Modeling intermittently connected vehicular networks
Learning contour statistics from natural images
Learning and control for wireless networks via graph signal processing
Estimation of graph Laplacian and covariance matrices
Structured codes in network information theory
Sampling theory for graph signals with applications to semi-supervised learning
Realistic modeling of wireless communication graphs for the design of efficient sensor network routing protocols
Graph embedding algorithms for attributed and temporal graphs
Advanced knowledge graph embedding techniques: theory and applications
Applications of estimation and detection theory in decentralized networks
Fundamental limits of caching networks: turning memory into bandwidth
Graph machine learning for hardware security and security of graph machine learning: attacks and defenses
Human motion data analysis and compression using graph based techniques
Asset Metadata
Creator Chang, Yu-Teng (author) 
Core Title Network structures: graph theory, statistics, and neuroscience applications 
School Andrew and Erna Viterbi School of Engineering 
Degree Doctor of Philosophy 
Degree Program Electrical Engineering 
Publication Date 05/10/2012 
Defense Date 03/07/2012 
Publisher University of Southern California (original), University of Southern California. Libraries (digital) 
Tag brain connectome,graph theory,modularity,network structures,OAI-PMH Harvest,random matrix theory,statistical significance 
Language English
Contributor Electronically uploaded by the author (provenance) 
Advisor Leahy, Richard M. (committee chair), Kuo, C.-C. Jay (committee member), Tjan, Bosco S. (committee member) 
Creator Email yutengch@usc.edu,yutengchang@gmail.com 
Permanent Link (DOI) https://doi.org/10.25549/usctheses-c3-39102 
Unique identifier UC11289235 
Identifier usctheses-c3-39102 (legacy record id) 
Legacy Identifier etd-ChangYuTen-842.pdf 
Dmrecord 39102 
Document Type Dissertation 
Rights Chang, Yu-Teng 
Type texts
Source University of Southern California (contributing entity), University of Southern California Dissertations and Theses (collection) 
Access Conditions The author retains rights to his/her dissertation, thesis or other graduate work according to U.S. copyright law.  Electronic access is being provided by the USC Libraries in agreement with the a... 
Repository Name University of Southern California Digital Library
Repository Location USC Digital Library, University of Southern California, University Park Campus MC 2810, 3434 South Grand Avenue, 2nd Floor, Los Angeles, California 90089-2810, USA