Asymptotic Properties of Two Network Problems with Large Random Graphs

by

Yusheng Wu

A Dissertation Presented to the
FACULTY OF THE GRADUATE SCHOOL
UNIVERSITY OF SOUTHERN CALIFORNIA
In Partial Fulfillment of the Requirements for the Degree
DOCTOR OF PHILOSOPHY
(APPLIED MATHEMATICS)

May 2022

Copyright 2022 Yusheng Wu

To my family.

Acknowledgements

I would like to express my deepest gratitude to my advisor Prof. Xin Tong for his great care, exceptional patience and unwavering support. I would also like to express my sincere thanks to Prof. Sergey Lototsky for his valuable help throughout the completion of my degree and various discussions during the probability seminar course. I am extremely grateful to my collaborator Prof. Xiao Han, without whom the completion of my dissertation would not be possible. I also wish to thank Prof. Yiqing Xing for his constructive advice on the second problem of my dissertation. I very much appreciate the help from my temporary advisor Prof. Kenneth Alexander, my dissertation committee member Prof. Steven Heilman and my oral exam committee members Prof. Jianfeng Zhang and Prof. Jay Bartroff. I owe my thanks to my friends Bowen Gang, Wenhan Jiang, Linfeng Li, Jiajun Luo, Man Luo, Ying Tan, Wenqian Wu and Jian Zhou, for the valuable discussions we had and all the fun we shared together. Last but not least, I would like to thank my parents and my brother for their love and support.

Table of Contents

Dedication
Acknowledgements
List of Figures
Abstract
Chapter 1: Introduction
  1.1 Limit Theorems for Vertex-Centered Random Subgraphs
  1.2 Asymptotic Behavior of the Degree-Weighted DeGroot Model
  1.3 Notations
Chapter 2: Limit Theorems for Vertex-Centered Random Subgraphs
  2.1 Graph Structure
    2.1.1 Vertex-Centered Subgraphs
    2.1.2 Regularity Assumptions
  2.2 Main Theorems
    2.2.1 Asymptotic Expansion and CLT
    2.2.2 Sketch of Proof of Theorem 2.2.1
    2.2.3 Key Lemmas and Their Role in Proofs
  2.3 Proof of Theorem 2.2.1
  2.4 Proof of Theorem 2.2.2
  2.5 An Improvement of Condition 1
  2.6 Proof of Technical Lemmas
    2.6.1 Proof of Lemma 2.2.3
      2.6.1.1 For $W_{-1}$ with Zeros on its Diagonal
      2.6.1.2 Generalization to $W$
    2.6.2 Lemma 2.6.1, Lemma 2.6.2 and their Proofs
      2.6.2.1 Proof of Lemma 2.6.1
      2.6.2.2 Proof of Lemma 2.6.2
    2.6.3 Lemma 2.6.3: An Improvement of Lemma 2.6.2 for $e_i$
    2.6.4 Proof of Lemma 2.2.4
  2.7 Additional Technical Details
    2.7.1 Lemma 2.7.1 and its Proof
    2.7.2 Proof of Corollary 2.6.1
    2.7.3 Corollary 2.7.1 and its Proof
    2.7.4 Lemma 2.7.2 and its Proof
    2.7.5 Lemma 2.7.3: A Modification of Lemma 2.2.4
Chapter 3: Asymptotic Behavior of the Degree-Weighted DeGroot Model
  3.1 Network Structure and Updating Rule
    3.1.1 The Stochastic Block Model
    3.1.2 Degree-Weighted Linear Updating
    3.1.3 The Function $\varphi$
  3.2 Results
    3.2.1 Preliminary Results
      3.2.1.1 Results for Networks Conditioning on $A$
      3.2.1.2 Results for "Expectation" – Replacing $A$ by $\mathbb{E}A$
      3.2.1.3 Unbalanced Case for Lemma 3.2.1
    3.2.2 Main Results: a Concentration Inequality and the Monotonicity Result
    3.2.3 Theorem 3.2.3 under Perturbation
  3.3 Proofs
    3.3.1 Proof of Proposition 3.2.1
      3.3.1.1 Upper Bound
      3.3.1.2 Lower Bound
    3.3.2 Proof of Proposition 3.2.2
    3.3.3 Proof of Lemma 3.2.1
    3.3.4 Proof of Corollary 3.2.1
    3.3.5 Proof of Lemma 3.2.2
    3.3.6 Proof of Theorem 3.2.3
    3.3.7 Proof of Corollary 3.2.2
    3.3.8 Proof of Lemma 3.3.5
    3.3.9 Proof of Lemma 3.3.6
References

List of Figures

2.1 An example with the original graph on the left and the vertex-centered subgraph on the right
2.2 A diagram illustrating the graph visualization of the sum with $m=3$, $r_1=2$, $r_2=4$, $r_3=3$, $Q(6)=\{\{1,6\},\{2,3\}\}$, $s_1=s_6=2$, $s_2=s_3=3$, $s_4=5$ and $s_5=6$
2.3 Example of a star with one internal vertex and 5 leaves
3.1 Graph of $|\lambda_2(T^*)|$ vs. $\alpha$, with $\varphi(\alpha,d)=d^{\alpha}$, $n_1=500$, $n_2=300$, $p=0.3$, $q=0.2$ and $m=2,4,6$
3.2 Scatter plot of $\lim_{\alpha\to 0}|\lambda_2(T^*)| \,/\, \lim_{\alpha\to-\infty}|\lambda_2(T^*)|$ vs. $m$, with $\varphi(\alpha,d)=d^{\alpha}$, $n_1=500$, $n_2=300$, $p=0.3$, $q=0.2$
3.3 Graph of $|\lambda_2(T^*)|$ vs. $\alpha$ with $n_1=300$, $n_2=500$, $p_1=0.3$, $p_2=0.2$ and $q=0.5$

Abstract

Random graphs, created by connecting graph theory and probability theory, are studied in mathematics and other related subjects.
Nowadays, many problems involve the analysis of very large graphs, and a mathematical understanding of them has become important for both theory and applications. In this dissertation, we explore two related problems. The first problem, motivated by Han and Tong (2021), focuses on a graph structure named the vertex-centered subgraph. We establish asymptotic joint normality for entries of the eigenvectors of the corresponding random adjacency matrix by computing the asymptotic expansion of the entries of the eigenvectors, followed by an application of the Lyapunov central limit theorem. In the second problem, we study the convergence speed of the DeGroot model for random networks modeled by the stochastic block model. We generalize the updating rule, represented by a row-stochastic matrix $T$, to be degree-weighted, and capture the change in the updating rule by a parameter $\alpha\in\mathbb{R}$ such that greater $\alpha$ puts higher weights on vertices of higher degrees. By deriving a concentration inequality for the second largest eigenvalue in magnitude of the matrix $T$, we establish monotonicity, showing that the convergence speed increases as $\alpha$ increases in some region $D_\alpha$.

Chapter 1
Introduction

A graph, defined as an ordered pair consisting of a set of vertices and a set of edges, is an object that connects with many areas of mathematics, such as combinatorics, algebra and topology. Graph theory also applies to other fields, including physics, biology, computer science and economics, in which graphs are commonly referred to as networks. By making a connection between graph theory and probability theory, Erdős and Rényi introduced a random graph model (Erdős and Rényi, 1960) in which edges are added independently between vertices with some probability $p$. Random graph models are especially suitable for large networks (i.e., the number of vertices $n$ is large) and can provide meaningful insights under standard settings. In terms of asymptotic theory, if it can be shown that some property of a random graph holds with probability approaching one as the number of vertices goes to infinity, then nearly all realizations will have this property when $n$ is large enough.
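As a side illustration of the Erdős–Rényi model just described (an aside, not part of the original dissertation), the following sketch samples the adjacency matrix of $G(n,p)$ by filling the strict upper triangle with independent Bernoulli($p$) entries and symmetrizing; it assumes Python with numpy, and the function name is ours.

```python
import numpy as np

def sample_erdos_renyi(n, p, seed=None):
    """Sample the adjacency matrix of an Erdos-Renyi graph G(n, p).

    Edges are added independently with probability p; the matrix is
    symmetric with a zero diagonal (no self-loops in this sketch).
    """
    rng = np.random.default_rng(seed)
    upper = rng.random((n, n)) < p           # independent Bernoulli(p) draws
    A = np.triu(upper, k=1)                  # keep the strict upper triangle
    return (A | A.T).astype(int)             # symmetrize

# Example: for large n the empirical edge density concentrates near p.
A = sample_erdos_renyi(2000, 0.05, seed=0)
print(A.sum() / (A.shape[0] * (A.shape[0] - 1)))   # close to 0.05
```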
In this dissertation, we study two related problems. In the first problem, we have a new random graph structure, referred to as the vertex-centered random subgraph. Our main result establishes the asymptotic joint normality of the entries of the eigenvectors of the random adjacency matrix. In the second problem, we study an updating rule under a generalized DeGroot model with the network structure given by the stochastic block model. For a random network with large enough $n$, we show a monotonicity result for the change in convergence speed corresponding to a change in the updating rule.

Random graphs are closely connected to random matrix theory: the adjacency matrix of a random graph is a symmetric matrix with upper triangular entries being independent Bernoulli random variables. Random matrix theory gained attention after being applied to nuclear physics by Wigner (1955), and two major fields of ongoing study are the asymptotic behaviors of eigenvalues and of eigenvectors of random matrices. In the literature, there have been extensive studies of eigenvalues. Some notable results include the semicircular law (Arnold, 1967, 1971) and the Tracy-Widom law (Tracy and Widom, 1994, 1996). More recent findings have been covered in textbooks (Tao, 2012; Bai and Silverstein, 2010). On the other hand, progress towards understanding the asymptotic behavior of eigenvectors of random matrices has been made only recently, with leading work including Dekel et al. (2007); Erdős et al. (2009); Bordenave and Guionnet (2013); Vu and Wang (2015); Rudelson and Vershynin (2015, 2016); Bourgade and Yau (2017). In this dissertation, the first problem can be viewed as an eigenvector problem and the second as an eigenvalue problem. Details of the two problems are introduced in the following sections.

1.1 Limit Theorems for Vertex-Centered Random Subgraphs

The adjacency matrix of a graph captures all of the graph's information. Hence, studying a random graph is equivalent to studying its random adjacency matrix. Several recent works establish limit theorems for eigenvectors of random adjacency matrices and other related matrices. In particular, Tang and Priebe (2018) derived central limit theorems for the adjacency matrix and the normalized Laplacian matrix. Fan et al. (2020) introduced a new method of proof and established a central limit theorem for the eigenvectors of spiked random matrices under the setting of diverging spikes. The result was later applied in Fan et al. (2021) to inference in network theory. Other related works in the literature include Capitaine and Donati-Martin (2021) and Bao et al. (2021).

The present work is motivated by Han and Tong (2021), in which the idea of using only subgraphs rather than the original graphs is implemented in a clustering algorithm. Under their setting, by fixing a center vertex 1, the target subgraph is the spanning subgraph whose edges either have both endpoints adjacent to vertex 1 or have vertex 1 as an endpoint; all other edges in the original graph are deleted. While this specific random graph structure has brought new insights in practical applications, it also holds interest from a mathematical point of view.

The main result of our work establishes a central limit theorem for eigenvectors of the corresponding random adjacency matrix. Our approach begins with the standard procedure of obtaining an asymptotic expansion of the entries of the eigenvectors, followed by an application of the Lyapunov central limit theorem. Our proof of the asymptotic expansion is inspired by the architecture of Fan et al. (2020); however, our setting has a major difference: the entries of the random adjacency matrix $B$ in our work are not independent. The value of this work is two-fold: we increase our mathematical understanding of a special random graph structure, in addition to shedding light upon the dependency structure between entries of the random adjacency matrix.

1.2 Asymptotic Behavior of the Degree-Weighted DeGroot Model

With the purpose of studying how collective behaviors of individuals can affect whole networks, economists have raised an interesting question: if each individual is assigned an initial state and a specific updating rule is given, what will the results be in the long run? There are two existing frameworks for addressing this problem, namely the Bayesian learning model and the DeGroot model. While Bayesian learning (Gale and Kariv, 2003; Acemoglu et al., 2011; Mueller-Frank, 2013, 2014; Mossel et al., 2015; Lobel and Sadler, 2015) is based on an assumption of rationality, the DeGroot model (Degroot, 1974; Chatterjee and Seneta, 1977; Friedkin and Johnsen, 1999; Demarzo et al., 2003; Jackson and Golub, 2010; Golub and Jackson, 2012) simply takes the weighted average of the current states of an individual's neighbors as the updated state.

The DeGroot model is simple but effective, and is especially well-suited to more mathematically complicated investigations. For an initial vector $b_0\in\mathbb{R}^n$ of states, the updated vector at time $t$ is given by $b(t)=T^t b_0$, where $T$ is some row-stochastic matrix. Under the DeGroot model, convergence (i.e., all states stabilize as the time $t$ approaches infinity) is a standard result (Demarzo et al., 2003; Jackson and Golub, 2010), so the main focus is the speed with which convergence is reached.
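To make the updating rule concrete, here is a minimal sketch of the plain DeGroot iteration $b(t)=T^t b_0$ with $T$ obtained by row-normalizing the adjacency matrix. This is an illustrative aside (Python with numpy assumed; the function and variable names are ours), not the degree-weighted construction of this dissertation, which is introduced below.

```python
import numpy as np

def degroot_states(A, b0, t):
    """Iterate the DeGroot update b(s+1) = T b(s) for t steps.

    T is the row-stochastic matrix obtained by dividing each row of the
    adjacency matrix A by its row sum (each vertex averages its neighbors).
    Assumes every vertex has at least one neighbor.
    """
    A = np.asarray(A, dtype=float)
    T = A / A.sum(axis=1, keepdims=True)   # row-normalize
    b = np.asarray(b0, dtype=float)
    for _ in range(t):
        b = T @ b                          # one round of neighbor averaging
    return b

# Toy example: a triangle with a pendant vertex; the states approach a common value.
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]])
print(degroot_states(A, b0=[1.0, 0.0, 0.0, 1.0], t=100))
```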
Representing a network by a graph with adjacency matrix $A$ and defining $T$ entrywise by $t_{ij}=a_{ij}\big/\sum_j a_{ij}$, Golub and Jackson (2012) defined a quantity called "spectral homophily" to capture the structure of the network, and discussed how it affects the convergence speed. Our work goes in an orthogonal direction, focusing on the updating rule represented by the matrix $T$ rather than on the graph structure. Our approach is to generalize the original definition of $T$ to a degree-weighted version, given by
\[
t_{ij} = \frac{a_{ij}\,\varphi(\alpha, d_j(A))}{\sum_j a_{ij}\,\varphi(\alpha, d_j(A))},
\]
where $d_j(A)$ is the degree of vertex $j$, $\alpha\in\mathbb{R}$, and $\varphi:\mathbb{R}^2\to\mathbb{R}$ assigns the weights that a vertex puts on its neighbors. This new definition of $T$ aims to capture changes in the updating rule through the change in the parameter $\alpha$, and thus allows us to investigate how such changes could affect the convergence speed.

We model the network using the stochastic block model (Holland et al., 1983), a random graph model that has been widely applied. Our study of the convergence speed proceeds in three steps. First, for any nonrandom network with adjacency matrix $A$, we show that the convergence speed is dominated by the size of $|\lambda_2(T)|$, the second largest eigenvalue of $T$ in magnitude. The smaller the magnitude, the faster the speed, aligning with standard results in the Markov chain literature (Levin and Peres, 2017). To study random networks, we consider an appropriate "expectation" $T^*$ and show that $|\lambda_2(T^*)|$ is monotonically decreasing in $\alpha$ in the region $D_\alpha$. Then, by deriving a concentration inequality for $\lambda_2(T)$, we see that $|\lambda_2(T)-\lambda_2(T^*)|$, the gap between the "expectation" and the random case, is arbitrarily small when $n$ is sufficiently large. Our main monotonicity result, given by Theorem 3.2.3, follows as a consequence.
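The sketch below (again an illustrative aside assuming Python with numpy; the helper names are ours) builds the degree-weighted matrix $T$ defined above with the choice $\varphi(\alpha,d)=d^{\alpha}$ used in the figures of Chapter 3, and reports $|\lambda_2(T)|$, the quantity that governs the convergence speed, so the effect of varying $\alpha$ can be observed numerically.

```python
import numpy as np

def degree_weighted_T(A, alpha):
    """Degree-weighted DeGroot matrix: t_ij proportional to a_ij * d_j**alpha.

    Assumes no isolated vertices, so every row sum of the weights is positive.
    """
    A = np.asarray(A, dtype=float)
    d = A.sum(axis=1)                           # vertex degrees d_j(A)
    weights = A * d[np.newaxis, :] ** alpha     # a_ij * phi(alpha, d_j), phi = d^alpha
    return weights / weights.sum(axis=1, keepdims=True)

def second_eigenvalue_magnitude(T):
    """|lambda_2(T)|: second largest eigenvalue of T in magnitude."""
    return np.sort(np.abs(np.linalg.eigvals(T)))[-2]

# Example on a small Erdos-Renyi graph: scan alpha and print |lambda_2(T)|.
rng = np.random.default_rng(1)
A = np.triu((rng.random((200, 200)) < 0.1).astype(float), 1)
A = A + A.T                                     # symmetric, zero diagonal
for alpha in (-2.0, 0.0, 2.0):
    print(alpha, second_eigenvalue_magnitude(degree_weighted_T(A, alpha)))
```

Here $\alpha=0$ recovers the plain row normalization $t_{ij}=a_{ij}/\sum_j a_{ij}$, which makes it easy to compare the degree-weighted variants against the classical DeGroot matrix.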
1.3 Notations

In this section, we introduce some notations that will be used throughout this dissertation.

Sets
Denote by $|\cdot|$ the cardinality of a set. Let $[n]=\{1,\dots,n\}$.

Matrices
For a matrix $M$, let $m_{ij}$ be its $(i,j)$-th entry. When the matrix has a subscript, for example $M_1$, we also use the notation $(M_1)_{ij}$ to denote its $(i,j)$-th entry. Let $\mathbb{E}M$ denote the expectation of a random matrix $M$. For a vector $v$, let $\|v\|_2=(\sum_i v_i^2)^{1/2}$ be its Euclidean norm. Let $\|M\|=\sup_{\|x\|_2=1}\|Mx\|_2$ be the operator norm of $M$. This can equivalently be defined as the square root of the largest eigenvalue of $MM^{\top}$, sometimes referred to as the spectral norm. Denote the Frobenius norm by $\|M\|_F=[\mathrm{tr}(MM^{\top})]^{1/2}$. Let the submatrix $M_{-k}$ of an $m\times n$ matrix $M$ be defined as the $(m-1)\times(n-1)$ matrix formed by deleting the $k$-th row and the $k$-th column of $M$. Similarly, for a vector $x\in\mathbb{R}^n$, denote by $x_{-k}\in\mathbb{R}^{n-1}$ the vector formed by deleting the $k$-th entry of $x$. Let the maximum entry in absolute value of a matrix $M$ and that of a vector $x$ be denoted by $d_M=\|M\|_{\max}$ and $d_x=\|x\|_{\infty}$, respectively.

Asymptotics
If two positive sequences $a_n$, $b_n$ satisfy $\limsup_{n\to\infty}(a_n/b_n)<\infty$, then we write $a_n\lesssim b_n$ or $b_n\gtrsim a_n$. We write $a_n\ll b_n$ or $b_n\gg a_n$ if $\lim_{n\to\infty}(a_n/b_n)=0$. If $a_n\lesssim b_n$ and $b_n\lesssim a_n$, we write $a_n\sim b_n$. Let $a_n=O(b_n)$ mean that there exist $N\in\mathbb{N}$ and $C>0$ such that $a_n\le Cb_n$ for all $n\ge N$. For a sequence of random variables $X_n$, we write $X_n=O_p(a_n)$ if for all $\epsilon>0$ there exist $N\in\mathbb{N}$ and $C>0$ such that $\mathbb{P}\{|X_n|>Ca_n\}\le\epsilon$ for all $n\ge N$. An event $E_n$ depending on $n\in\mathbb{N}$ holds with high probability if it holds with probability $\mathbb{P}\{E_n\}\ge 1-Cn^{-c}$ for some $C>0$ and $c>0$ independent of $n$. Throughout, the constants $C$ and $c$ may differ from line to line.

Chapter 2
Limit Theorems for Vertex-Centered Random Subgraphs

2.1 Graph Structure

Consider a graph $G=(V,E)$, where $V=[n]$ is the set of vertices and $E$ the set of edges. The graph $G$ is also referred to as a labeled graph, in which each vertex is given a unique label from the set $[n]$.

Definition 2.1.1. Denote an edge between vertices $i$ and $j$ by $(i,j)$. The adjacency matrix $A$ associated with $G$ is defined entrywise by
\[
a_{ij} = \begin{cases} 1, & \text{if } (i,j)\in E,\\ 0, & \text{otherwise.} \end{cases}
\]

It is clear that there is a one-to-one correspondence between labeled graphs and adjacency matrices. To study random graphs, we consider random adjacency matrices, whose entries are Bernoulli random variables. In particular, we restrict ourselves to random graphs with adjacency matrices of the following structure:

1. They are symmetric (by default).
2. Entries of the upper triangular part and the main diagonal are independent Bernoulli random variables.

Let the expectation $\mathbb{E}A$ of the random matrix $A$ have constant rank $K>0$ (meaning that $K$ does not vary with the number of vertices $n$). Hence, $\mathbb{E}A$ has the eigendecomposition
\[
\mathbb{E}A = U\Lambda U^{\top}, \qquad (2.1)
\]
where $\Lambda=\mathrm{diag}(\lambda_1,\dots,\lambda_K)$ with $\lambda_i$ being the $i$-th largest eigenvalue of $\mathbb{E}A$ in magnitude, and $U$ is the $n\times K$ matrix $(u_1,\dots,u_K)$ with each column $u_i$ being the eigenvector associated with the eigenvalue $\lambda_i$.

2.1.1 Vertex-Centered Subgraphs

Motivated by Han and Tong (2021), which proposes clustering by using a subgraph of the original graph with a center vertex, we aim to derive a central limit theorem for the eigenvectors of the corresponding adjacency matrix. While there could be various subgraphs to consider, we focus on the structure defined in Han and Tong (2021), given by the following definition.

Definition 2.1.2. Given a graph $G$ on $n$ vertices with adjacency matrix $A$, the corresponding vertex-centered (or individual-centered) subgraph of $G$ is the spanning subgraph with adjacency matrix
\[
B = -SAS + AS + SA, \qquad (2.2)
\]
where $S=\mathrm{diag}(1,a_{12},\dots,a_{1n})$. It is equivalent to define $B$ entrywise by
\[
b_{ij} = a_{ij}\left(1 - \mathbb{1}_{\{a_{1i}=0\}}\mathbb{1}_{\{a_{1j}=0\}}\right), \qquad i,j\in[n]. \qquad (2.3)
\]

The definition in (2.3) gives a clear description of the structure of the subgraph: any edge in the original graph $G$ with neither endpoint adjacent to vertex 1 is deleted. It should additionally be noted that the resulting subgraph depends on the choice of the vertex labeled 1. Furthermore, the adjacency matrix $A$ will be a random matrix throughout this work, which makes the matrix $B$ random as well. Note that we let $a_{11}=1$ for easier treatment of the random matrices $B_E$ and $W$ defined by (2.4) and (2.5) later in this section.

[Figure 2.1: An example with the original graph on the left and the vertex-centered subgraph on the right.]
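As a concrete check of Definition 2.1.2, the short sketch below (illustrative only; Python with numpy assumed, and the function name is ours) forms $B=-SAS+AS+SA$ on a small random graph and verifies that it agrees with the entrywise description (2.3).

```python
import numpy as np

def vertex_centered_subgraph(A):
    """Adjacency matrix B = -S A S + A S + S A of the vertex-centered subgraph.

    S = diag(1, a_12, ..., a_1n), so edges with neither endpoint adjacent to
    vertex 1 are removed (vertex 1 is index 0 in this sketch).
    """
    A = np.asarray(A, dtype=int)
    s = A[0].copy()
    s[0] = 1                                  # convention a_11 = 1
    S = np.diag(s)
    return -S @ A @ S + A @ S + S @ A

# Small random example: compare with the entrywise definition (2.3).
rng = np.random.default_rng(0)
n = 8
A = np.triu((rng.random((n, n)) < 0.4).astype(int), 1)
A = A + A.T
B = vertex_centered_subgraph(A)
s = A[0].copy(); s[0] = 1
B_entrywise = A * (1 - np.outer(1 - s, 1 - s))    # a_ij * (1 - 1{a_1i=0} 1{a_1j=0})
print(np.array_equal(B, B_entrywise))             # True
```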
2.1.2 Regularity Assumptions

In this section, we list our assumptions and some technical conditions. Following Han and Tong (2021), we make assumptions that ensure the existence of a major term $B_E$ (see Definition 2.1.3) of the matrix $B$, which will play a central role in the proofs of our main results. Let $p_n := \max_{i,j}\mathbb{P}\{a_{ij}=1\}$. We assume the following conditions:

Condition 1. $\min_{j\ge 2}\mathbb{P}\{a_{1j}=1\}\sim p_n$ and $1-c > p_n \gg n^{-1/2+\epsilon}$ for some constants $c>0$ and $\epsilon\in(0,1/2)$.

Condition 2. $d_U := \|U\|_{\max} = O(1/\sqrt{n})$.

Condition 3. $|\lambda_1|\sim|\lambda_2|\sim\cdots\sim|\lambda_K|\sim np_n$.

Condition 1 is a natural assumption: if there are too few edges with vertex 1 as one endpoint, then the observed subgraph would be nearly empty and the structure becomes pointless. Condition 1 is further weakened in Section 2.5 by a more delicate truncation of matrix series. Conditions 2 and 3, as pointed out in Han and Tong (2021), are strong conditions imposed for ease of technical analysis, but could possibly be relaxed in future works.

Definition 2.1.3. Let $M$ be an $m\times n$ matrix. A major term $M_E$ of $M$ is a matrix of the same size such that $\|M-M_E\|\ll\min_i\sigma_i(M_E)$, where $\sigma_i(M_E)$ is the $i$-th singular value of $M_E$.

In Definition 2.1.3, the dimensionality $n$ is implicitly sent to $\infty$, which means $M$ and $M_E$ are sequences of matrices, and so are $\|M-M_E\|$ and $\min_i\sigma_i(M_E)$. The following result in Han and Tong (2021) establishes a major term of the matrix $B$.

Theorem (Han and Tong 2021). Let the matrix $B_E$ be defined by
\[
B_E = -S(\mathbb{E}A)S + (\mathbb{E}A)S + S(\mathbb{E}A). \qquad (2.4)
\]
Then, under Conditions 1-3, $B_E$ is a major term of $B$ with high probability. In addition, $B_E$ has rank $2K$ and its eigenvalues satisfy $d_i\sim np_n^{3/2}$ for all $i\in[2K]$.

This result is critical, as we can now express the random matrix $B$ as
\[
B = B_E + W, \qquad (2.5)
\]
where we let $W = B - B_E$ for simplicity. This can be viewed as a main signal term $B_E$ plus a noise term $W$. The fact that $B_E$ is a major term of $B$ provides enough of a gap between the eigenvalues of $B$ so that we can apply existing methods (Fan et al., 2020) to compute the asymptotic expansion of the eigenvectors of $B$. To facilitate further discussion, we denote the eigenvalues and eigenvectors of $B_E$ by $d_1,\dots,d_{2K}$ and $v_1,\dots,v_{2K}$, respectively. In addition, denote the eigendecomposition of $B_E$ by
\[
B_E = VDV^{\top}, \qquad (2.6)
\]
where $D=\mathrm{diag}(d_1,\dots,d_{2K})$ and $V$ is the $n\times 2K$ matrix $(v_1,\dots,v_{2K})$.

Similarly, let the eigenvalues and eigenvectors of $B$ be $\hat\lambda_1,\dots,\hat\lambda_{2K}$ and $\hat v_1,\dots,\hat v_{2K}$, respectively. As assumed in Fan et al. (2020), we require sufficient gaps between the eigenvalues $\hat\lambda_1,\dots,\hat\lambda_{2K}$, given by the following condition:

Condition 4. $d_i/d_{i+1} > 1 + c_0$ for some constant $c_0>0$, $i\in[2K]$.

This is implied by a condition in Han and Tong (2021), which requires $\mu_i/\mu_{i+1} > 1+c$ for some constant $c>0$, $i\in[K-1]$, where $\mu_1,\dots,\mu_K$ are eigenvalues of the matrix $U^{\top}(\mathbb{E}S)U\Lambda\,\bigl(I-U^{\top}(\mathbb{E}S)U\bigr)\Lambda$.

2.2 Main Theorems

2.2.1 Asymptotic Expansion and CLT

In this section, we present our main result: for $k\in[2K]$, any finite subset of the entries of the eigenvector $\hat v_k$ is asymptotically jointly Gaussian. One issue with eigenvectors is that if $\hat v_k$ is an eigenvector, then so is $-\hat v_k$. Hence, we select the direction of $\hat v_k$ such that $\hat v_k^{\top}v_k\ge 0$. We first give the asymptotic expansion of $e_i^{\top}\hat v_k$, where $e_i$ is the $i$-th element of the standard basis of $\mathbb{R}^n$, for $i\in[n]$.

Theorem 2.2.1. Under Conditions 1-4, for $k\in[2K]$ and $i\in[n]\setminus\{1\}$, we have
\[
e_i^{\top}\hat v_k = e_i^{\top}v_k + d_k^{-1}e_i^{\top}Wv_k + O_p\bigl(|d_k|^{-2}\sqrt{p_n}\bigr), \qquad (2.7)
\]
\[
e_1^{\top}\hat v_k = e_1^{\top}v_k + d_k^{-1}e_1^{\top}Wv_k + O_p\bigl(|d_k|^{-2}\sqrt{np_n}\bigr). \qquad (2.8)
\]

Theorem 2.2.1 can be understood as follows: each entry $e_i^{\top}\hat v_k$ can be expanded as a mean term $e_i^{\top}v_k$ (which depends on $S$), plus the random term $d_k^{-1}e_i^{\top}Wv_k$ containing the first power of $W$, which will contribute to the central limit theorem.
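The expansion in Theorem 2.2.1 lends itself to a numerical sanity check. The sketch below (an illustrative aside, not part of the dissertation's proofs; it assumes Python with numpy and a rank-one $\mathbb{E}A$, i.e., $K=1$) compares one coordinate of $\hat v_k$ against the first-order approximation $e_i^{\top}v_k + d_k^{-1}e_i^{\top}Wv_k$.

```python
import numpy as np

rng = np.random.default_rng(42)
n, p = 1000, 0.15                    # K = 1 below, so B_E has rank 2

# Rank-one expectation EA = p * 1 1^T and a symmetric Bernoulli adjacency matrix A.
EA = np.full((n, n), p)
A = np.triu((rng.random((n, n)) < p).astype(float))    # upper triangle incl. diagonal
A = A + np.triu(A, 1).T                                # symmetrize

S = np.diag(np.concatenate(([1.0], A[0, 1:])))         # S = diag(1, a_12, ..., a_1n)
B  = -S @ A  @ S + A  @ S + S @ A
BE = -S @ EA @ S + EA @ S + S @ EA
W  = B - BE

# Leading eigenpairs of B_E and B, signs aligned so that v_hat^T v_k >= 0.
dE, VE = np.linalg.eigh(BE)
j = np.argmax(np.abs(dE)); d_k, v_k = dE[j], VE[:, j]
dB, VB = np.linalg.eigh(B)
v_hat = VB[:, np.argmax(np.abs(dB))]
if v_hat @ v_k < 0:
    v_hat = -v_hat

i = 5                                                  # any coordinate other than the center vertex
approx = v_k[i] + (W @ v_k)[i] / d_k                   # first-order expansion from Theorem 2.2.1
print(v_hat[i], approx)                                # the two values should be close for large n
```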
Note that $\hat v_k$ is a high-dimensional vector, since it has dimension $n$. Hence, when a central limit theorem is derived, it is reasonable to consider a finite subset of the entries of $\hat v_k$. For $N\in\mathbb{N}$, let $I\subset[n]\setminus\{1\}$ be a set of indices such that $|I|=N$. For $k\in[2K]$ and $i_1,\dots,i_N\in I$, let $\hat v_I=(e_{i_1}^{\top}\hat v_k,\dots,e_{i_N}^{\top}\hat v_k)^{\top}$, $v_I=(e_{i_1}^{\top}v_k,\dots,e_{i_N}^{\top}v_k)^{\top}$, $Y_I=(e_{i_1},\dots,e_{i_N})$, and let $\Sigma_I$ be the covariance matrix of the vector $(e_{i_1}^{\top}Wv_k,\dots,e_{i_N}^{\top}Wv_k)$ conditioned on $S$. We then apply the Lyapunov central limit theorem (see Billingsley (2008), Theorem 27.3) to the asymptotic expansion in Theorem 2.2.1 to establish the asymptotic normality of $\hat v_k$.

Theorem 2.2.2. Suppose that there exists a positive sequence $h_n$ such that
\[
h_n^{-1}\Sigma_I \to \Sigma_I^{*} \ \text{in probability}, \qquad (2.9)
\]
and $\Sigma_I^{*}$ is nonzero. In addition, let $h_n\gg\max\bigl((np_n)^{-1},|d_k|^{-2}p_n\bigr)$. Then, under Conditions 1-4, we have the corresponding central limit theorem
\[
h_n^{-1/2}d_k\,(\hat v_I - v_I) \xrightarrow{\ d\ } \mathcal{N}(0,\Sigma_I^{*}). \qquad (2.10)
\]

Prior to Theorem 2.2.2, we have not made direct assumptions on the size of the $p_{ij}$'s for $i,j\ne 1$, and they could possibly be of orders of magnitude much smaller than $p_n$. As such, the assumption made in Theorem 2.2.2 on the size of $h_n$ can be seen as a requirement on those $p_{ij}$'s, but in a more general form.

2.2.2 Sketch of Proof of Theorem 2.2.1

Here, we were inspired by the architecture of the proof in Fan et al. (2020). For $i\in[n]$, we consider the bilinear forms $e_i^{\top}\hat v_k\hat v_k^{\top}v_k$ and $v_k^{\top}\hat v_k\hat v_k^{\top}v_k$. By applying Cauchy's integral formula and the residue formula, we can express $e_i^{\top}\hat v_k\hat v_k^{\top}v_k$ as a contour integral and then convert it back to the fraction
\[
e_i^{\top}\hat v_k\hat v_k^{\top}v_k = \frac{e_i^{\top}[G(\hat t_k)-F_k(\hat t_k)]v_k \; v_k^{\top}[G(\hat t_k)-F_k(\hat t_k)]v_k}{v_k^{\top}[G'(\hat t_k)-F_k'(\hat t_k)]v_k},
\]
where $G(z)=(W-zI_n)^{-1}$, $F_k(z)=G(z)V_{-k}[D_{-k}^{-1}+V_{-k}^{\top}G(z)V_{-k}]^{-1}V_{-k}^{\top}G(z)$, $G'(z)$ and $F_k'(z)$ are the entrywise derivatives of $G(z)$ and $F_k(z)$, and $\hat t_k$ is the solution of the equation $1+d_k v_k^{\top}[G(z)-F_k(z)]v_k=0$. After this, we expand the bilinear forms $e_i^{\top}G(z)v_k$, $v_k^{\top}G(z)v_k$, $e_i^{\top}G'(z)v_k$, $e_i^{\top}F_k(z)v_k$, $v_k^{\top}F_k(z)v_k$ and $v_k^{\top}F_k'(z)v_k$. In particular, they are in the form of matrix series. With the aid of Lemma 2.2.4, presented in the next section, we are able to cut the tails of these series and pick out the leading terms. The quantity $v_k^{\top}\hat v_k\hat v_k^{\top}v_k$ can be expanded similarly. The result of Theorem 2.2.1 follows from here. In short, the proof of Theorem 2.2.1 is more a sequence of delicate computations than a conceptual argument.

2.2.3 Key Lemmas and Their Role in Proofs

In the previous section, we saw that the treatment of several matrix series is the key ingredient of the proof of Theorem 2.2.1. The technical lemmas below take care of those series and allow us to pick out the leading terms that contribute to the central limit theorem given by Theorem 2.2.2.

Lemma 2.2.3. For any unit vectors $x,y\in\mathbb{R}^n$ and any integer $l\ge 1$, we have
\[
x^{\top}\bigl(W^l-\mathbb{E}W^l\bigr)y = O_p\bigl(\max\{\alpha_n,\beta_n\}\bigr), \qquad (2.11)
\]
where $\alpha_n$ and $\beta_n$ are defined by (2.74) and (2.75), respectively.

Lemma 2.2.3 serves as a starting point, as the unit vectors $x$ and $y$ are nonrandom. What we truly need is Lemma 2.2.4, in which the bilinear forms contain the random vector $v_k$. To fill in the gap between Lemma 2.2.3 and Lemma 2.2.4, we apply the nontrivial decomposition of $v_k$ given in Han and Tong (2021), providing better outcomes than would be achieved with standard methods that bound the operator norm of $W^l-\mathbb{E}W^l$. We then derive Lemma 2.6.1, Lemma 2.6.2 and Lemma 2.6.3, which can be combined to obtain Lemma 2.2.4.

Lemma 2.2.4.
For any integers l≥ 2, k∈[2K] and i∈[n]\{1}, we have e ⊤ 1 W l v k =O p ((np n ) (l− 1)/2 ), (2.12) e ⊤ i W l v k =O p ((np n ) (l− 1)/2 p n ), (2.13) v ⊤ k W l v k =O p ((np n ) l/2 ). (2.14) In addition, e ⊤ 1 Wv k =O p ( √ np n ), e ⊤ i Wv k =O p ( √ p n ) and v ⊤ k Wv k =O p (1). We demonstrate the application of Lemma 2.2.4 by considering the bilinear form e ⊤ 1 G(z)v k that contains the resolventG(z)=(W− zI) − 1 . Note thatG(z) is in the form of a Neumann series. Hence, by expanding G(z), we have e ⊤ 1 G(z)v k =− e ⊤ 1 ∞ X l=0 z − (l+1) W l v k . (2.15) By Lemma 2.7.2, for any integer L≥ 0, it holds with high probability that ∞ X l=L+1 z − (l+1) e ⊤ 1 W l v k ≤ ∞ X l=L+1 ∥W∥ l |z| l+1 ≤ (6np n logn) (L+1)/2 |z| L+2 . (2.16) In addition, by Lemma 3.2.2, we have L X l=2 z − (l+1) e ⊤ 1 W l v k ≤ L X l=2 z − (l+1) O p ((np n ) (l− 1)/2 )=O p (np n ) 1/2 |z| 3 . (2.17) Hence, by (2.15)-(2.17), we have e ⊤ 1 G(z)v k =− z − 1 e ⊤ 1 v k − z − 2 e ⊤ 1 Wv k − L X l=2 z − (l+1) e ⊤ 1 W l v k − ∞ X l=L+1 z − (l+1) e ⊤ 1 W l v k =− z − 1 e ⊤ 1 v k − z − 2 e ⊤ 1 Wv k +O p (np n ) 1/2 |z| 3 +O p (np n logn) (L+1)/2 |z| L+2 , (2.18) 14 where |z| ∼ d k ∼ np 3/2 n in the proof of Theorem 2.2.1. Using straightforward calculations, we can see that if we let L be large enough, say L=⌈ 1 4ϵ + 3 2 ⌉, where ϵ is a positive constant from condition 1, then (2.18) becomes e ⊤ 1 G(z)v k =− z − 1 e ⊤ 1 v k − z − 2 e ⊤ 1 Wv k +O p (np n ) 1/2 |z| 3 . (2.19) In this way, the matrix series can be truncated so that only the first few (finitely many) terms need to be retained. 2.3 Proof of Theorem 2.2.1 We will show (2.7) for i = 1 and the rest could follow from the same calculations. We first repeat the setup as in Fan et al. (2020). By defining the contour Ω k as the circle centered at (a k +b k )/2 with radius|a k − b k |/2, where a k and b k are defined by a k = d k /(1+2 − 1 c 0 ), d k >0 (1+2 − 1 c 0 )d k , d k <0 and b k = (1+2 − 1 c 0 )d k , d k <0 d k /(1+2 − 1 c 0 ), d k >0 , and c 0 is the constant in condition 4. Because of the gap between the eigenvalues d k of B E provided by condition 4, the contours Ω k encloses only d k but not d i ’s with i ̸= k. More importantly, by the fact that B E is a major term of B, the same holds for ˆ λ k ’s (Ω k encloses only ˆ λ k ). By applying Cauchy’s integral, the bilinear form can be expressed as a contour integral e ⊤ 1 ˆ v k ˆ v ⊤ k v k =− 1 2πi I Ω k e ⊤ 1 (B− zI n ) − 1 v k dz. (2.20) 15 Then, we write B = B E +W and define G(z) = (W− zI n ) − 1 , usually referred to as the resolvent and F k (z) = G(z)V − k [D − 1 − k +V ⊤ − k G(z)V − k ] − 1 V ⊤ − k G(z). By some manipulations of matrices, (2.20) can rewritten as e ⊤ 1 ˆ v k ˆ v ⊤ k v k = 1 2πi I Ω k e ⊤ 1 [G(z)− F k (z)]v k v ⊤ k [G(z)− F k (z)]v k 1+d k v ⊤ k [G(z)− F k (z)]v k dz. (2.21) Finally, by applying the residue formula, we reach e ⊤ 1 ˆ v k ˆ v ⊤ k v k = e ⊤ 1 [G( ˆ t k )− F k ( ˆ t k )]v k v ⊤ k [G( ˆ t k )− F k ( ˆ t k )]v k v ⊤ k [G ′ ( ˆ t k )− F ′ k ( ˆ t k )]v k , (2.22) whereG ′ (z) andF ′ k (z) are entrywise derivatives of G(z) andF k (z) and ˆ t k is the solution of the equation 1+d k v ⊤ k [G(z)− F k (z)]v k =0. Note that|z|∼ d k ∼ np 3/2 n . ThisisthesamesetupasinFanetal.(2020)(refertotheirequations(70)-(74)and(100) for more details). We then perform computations to get the asymptotic expansion given in (2.7). The expansion of e ⊤ 1 G(z)v k is shown in section 2.2.3 to be e ⊤ 1 G(z)v k =− z − 1 e ⊤ 1 v k − z − 2 e ⊤ 1 Wv k +O p |z| − 3 √ np n . 
(2.23) Similarly, we have e ⊤ 1 G(z)V − k =− z − 1 e ⊤ 1 V − k − z − 2 e ⊤ 1 WV − k +O p (|z| − 3 √ np n ). (2.24) We truncate v ⊤ k G(z)v k in the same way. However, the result will not be exactly the same since the order of (2.14) is higher than (2.12) by a factor of √ np n . Hence, we will need to keep some extra remainder terms. In particular, we have v ⊤ k G(z)v k =− z − 1 − z − 2 v ⊤ k Wv k − L X l=2 z − (l+1) v ⊤ k W l v k +O p |z| − 3 √ np n =− z − 1 − z − 2 R(v k ,v k )+O p |z| − 3 √ np n , (2.25) 16 where we denote R(v k ,v k )= L X l=1 z − (l− 1) v ⊤ k W l v k . (2.26) By Lemma 3.2.2, R(v k ,v k )=O p (1). Similarly, we have v ⊤ k G(z)V − k =− z − 2 R(v k ,V − k )+O p (|z| − 3 √ np n ), (2.27) V ⊤ − k G(z)V − k =− z − 1 I− z − 2 R(V − k ,V − k )+O p (|z| − 3 √ np n ). (2.28) where R(v k ,V − k ) and R(V − k ,V − k ) are the remainders replacing the corresponding v k by V − k in (2.26). The O p notation in (2.27) and (2.28) are for the entries of the vector and matrix, respectively. In addition, we compute [D − 1 − k +V − k G(z)V − k ] − 1 =[D − 1 − k − z − 1 I− z − 2 R(V − k ,V − k )+O p (|z| − 3 √ np n )] − 1 =D − k [I− z − 1 D − k − D − k (z − 2 R(V − k ,V − k )+O p (|z| − 2 √ np n ))] − 1 =D − k − z − 1 D 2 − k +z − 2 D 2 − k R(V − k ,V − k )+O p (|z| − 1 √ np n ). (2.29) Therefore, it implies that e ⊤ 1 F k (z)v k =(e ⊤ 1 G(z)V − k )[D − 1 − k +V − k G(z)V − k ] − 1 (V ⊤ − k G(z)v k ) = − z − 1 e ⊤ 1 V − k − z − 2 e ⊤ 1 WV − k D − k − z − 1 D 2 − k +z − 2 D 2 − k R(V − k ,V − k ) − z − 2 R(V − k ,v k ) +O p (|z| − 3 √ np n ) =z − 3 (e ⊤ 1 V − k − z − 1 e ⊤ 1 WV − k )(D − k − z − 1 D 2 − k )R(V − k ,v k )+O p (|z| − 4 √ np n ), (2.30) 17 and similarly v ⊤ k F k (z)v k =z − 4 R(v k ,V − k )(D − k − z − 1 D 2 − k )R(V − k ,v k )+O p (|z| − 4 ). (2.31) For the derivatives, we have v ⊤ k G ′ (z)v k =v ⊤ k ∞ X l=0 (l+1)z − (l+2) W l v k =z − 2 +z − 3 R ′ (v k ,v k )+O p (|z| − 4 √ np n ), (2.32) v ⊤ k G ′ (z)V − k =z − 3 R ′ (v k ,V − k )+O p (|z| − 4 √ np n ), (2.33) where we denote R ′ (v k ,v k )=z 3 (− z − 2 R(v k ,v k )) ′ = L X l=1 (l+1)z − (l− 1) v ⊤ k W l v k . (2.34) R ′ (v k ,V − k ) and R ′ (V − k ,V − k ) are similarly defined by replacing the corresponding v k by V − k in (2.34). We then compute [D − 1 − k +V − k G(z)V − k ] − 1 ′ =(D − k − z − 1 D 2 − k +z − 2 D 2 − k R(V − k ,V − k )) ′ +O p (|z| − 1 √ np n ) ′ =z − 2 D 2 − k − z − 3 D 2 − k R ′ (V − k ,V − k )+O p (|z| − 2 √ np n ). (2.35) 18 Hence, by (2.33) and (2.35), we have v ⊤ k F ′ k (z)v k =2v ⊤ k G ′ (z)V − k [D − 1 − k +V − k G(z)V − k ] − 1 V ⊤ − k G(z)v k +v ⊤ k G(z)V − k [D − 1 − k +V − k G(z)V − k ] − 1 ′ V ⊤ − k G(z)v k =2z − 3 R ′ (v k ,V − k ) D − k − z − 1 D 2 − k − z − 2 R(V − k ,v k ) +(− z − 2 R(v k ,V − k ))(z − 2 D 2 − k ) − z − 2 R(V − k ,v k ) +O p (|z| − 5 √ np n ) = − 2z − 5 R ′ (v k ,V − k ) D − k − z − 1 D 2 − k R(V − k ,v k ) +z − 6 R(v k ,V − k )D 2 − k R(V − k ,v k )+O p (|z| − 5 √ np n ) =O p (|z| − 4 ). (2.36) Using (2.23) and (2.30), we have e ⊤ 1 [G(z)− F k (z)]v k =− z − 1 e ⊤ 1 v k − z − 2 e ⊤ 1 WV − k − z − 3 (e ⊤ 1 V − k − z − 1 e ⊤ 1 WV − k )(D − k − z − 1 D 2 − k )R(V − k ,v k ) +O p (|z| − 3 √ np n ). (2.37) Using (2.25)-(2.31), we have v ⊤ k [G(z)− F k (z)]v k =− z − 1 − z − 2 R(v k ,v k )+O p (|z| − 3 ). (2.38) Using (2.32)-(2.36), we have v ⊤ k [G ′ (z)− F ′ k (z)]v k − 1 =[z − 2 +z − 3 R ′ (v k ,v k )+O p (|z| − 4 √ np n )] − 1 =z 2 (1+z − 1 R ′ (v k ,v k )) − 1 +O p ( √ np n ) =z 2 − zR ′ (v k ,v k )+O p ( √ np n ). 
(2.39) 19 Hence, by (2.22) and (2.37)-(2.39), we have e ⊤ 1 ˆ v k ˆ v ⊤ k v k = − ˆ t − 1 k e ⊤ 1 v k − ˆ t − 2 k e ⊤ 1 WV − k − ˆ t − 3 k (e ⊤ 1 V − k − ˆ t − 1 k e ⊤ 1 WV − k ) (D − k − ˆ t − 1 k D 2 − k )R(V − k ,v k ) − ˆ t − 1 k − ˆ t − 2 k R(v k ,v k ) ˆ t 2 k − ˆ t k R ′ (v k ,v k ) +O p (| ˆ t k | − 2 √ np n ) =e ⊤ 1 v k + ˆ t − 1 k e ⊤ 1 Wv k +O p (| ˆ t k | − 2 √ np n ), (2.40) v ⊤ k ˆ v k ˆ v ⊤ k v k = − ˆ t − 1 k − ˆ t − 2 k R(v k ,v k ) 2 ˆ t 2 k − ˆ t k R ′ (v k ,v k ) +O p (| ˆ t k | − 2 √ np n ) =1+ ˆ t − 1 k (2R(v k ,v k )− R ′ (v k ,v k ))+O p (| ˆ t k | − 2 √ np n ). (2.41) Recall the choice of ˆ v ⊤ k v k ≥ 0. Note that ˆ v ⊤ k v k ̸=0 with high probability. Hence, by (2.41), we have (ˆ v ⊤ k v k ) − 1 =(v ⊤ k ˆ v k ˆ v ⊤ k v k ) − 1/2 =1− 1 2 ˆ t − 1 k (2R(v k ,v k )− R ′ (v k ,v k ))+O p (| ˆ t k | − 2 √ np n ). (2.42) By (2.40) and (2.42), we have e ⊤ 1 ˆ v k =e ⊤ 1 ˆ v k ˆ v ⊤ k v k (ˆ v ⊤ k v k ) − 1 =e ⊤ 1 v k + ˆ t − 1 k e ⊤ 1 Wv k +O p (| ˆ t k | − 2 √ np n ). (2.43) By using Lemma 3 and equation (96) in Fan et al. (2020), (2.43), ˆ t k can be replaced by d k , as ˆ t k /d k = O p (1). This concludes the proof of Theorem 2.2.1 for e 1 . For e i , i ̸= 1, note that although (2.12) and (2.13) differs by a factor of p n , the orders of the remainder terms in (2.40) and (2.41) are determined by e ⊤ i Wv k , R(v k ,v k ) and R ′ (v k ,v k ). Thus, the result for e ⊤ i Wv k is achieved by replacing the order of the remainder by O p (|d k | − 2 √ p n ). ■ 20 2.4 Proof of Theorem 2.2.2 ToshowTheorem2.2.2, wefirstshowtheasymptoticjointnormalityof Y ⊤ I Wv k − Y ⊤ I w 1,v k , which is equivalent to show that for any unit vector x ∈ R N , the linear combination h − 1/2 n x ⊤ (Y ⊤ I Wv k − Y ⊤ I w 1,v k ) is asymptotically normal, where w 1,v k ∈R n is a vector with entries (w 1,v k ) i =w 1i v 1 . In other words, we want to show h − 1/2 n x ⊤ (Y ⊤ I Wv k − Y ⊤ I w 1,v k ) d − →N (0,x ⊤ Σ ∗ I x). (2.44) With a slight abuse of notation, we let y =h − 1/2 n Y I x. Note that y has only N (i.e., finitely many) nonzero entries with order O(h − 1/2 n ). We first condition on F ={all S} and we will apply the Lyapunov central limit theorem (see Billingsley (2008), Theorem 27.3). We compute y ⊤ Wv k − y ⊤ w 1,v k = X 2≤ i<j≤ n (y i w ij v j +y j w ji v i )+ X 2≤ i≤ n y i w ii v i , (2.45) where the j-th entry of v k is denoted as v j . Let s 2 n = X 2≤ i<j≤ n IE(y i w ij v j +y j w ji v i |F) 2 + X 2≤ i≤ n IE(y i w ii v i |F) 2 = X 2≤ i<j≤ n (y i v j +y j v i ) 2 (a 1i +a 1j − a 1i a 1j )IE(a ij − IEa ij ) 2 + X 2≤ i≤ n (y i v i ) 2 a 1i IE(a ii − IEa ii ) 2 . (2.46) 21 We further notice that x ⊤ Σ I x= X i 1 ,i 2 ∈I 2≤ j 1 ,j 2 ≤ n IE(x i 1 w i 1 j 1 v j 1 x i 2 w i 2 j 2 v j 2 |F) =h n X i 1 ,i 2 ∈I 2≤ j 1 ,j 2 ≤ n IE(y i 1 w i 1 j 1 v j 1 y i 2 w i 2 j 2 v j 2 |F) =h n s 2 n . (2.47) Therefore, s 2 n =h − 1 n x ⊤ Σ I x converges to x ⊤ Σ ∗ I x in probability. We also compute that X 2≤ i<j≤ n IE |y i w ij v j +y j w ji v i | 3 |F + X 2≤ i≤ n IE |y i w ii v i | 3 |F = X 2≤ i<j≤ n |y i v j +y j v i | 3 |a 1i +a 1j − a 1i a 1j |IE|a ij − IEa ij | 3 + X 2≤ i≤ n |y i v i | 3 a 1i IE|a ii − IEa ii | 3 . (2.48) Since ∥v k ∥ ∞ = O p ((np n ) − 1/2 ) and ∥y∥ ∞ ∼ h − 1/2 n , we have ∥y∥ ∞ ∥v k ∥ ∞ ≪ s n , under the assumption that h n ≫ max{(np n ) − 1 ,|d k | − 2 p n }. 
Then, since|a ii − IEa ii |≤ 1, we have 1 s 3 n X 2≤ i<j≤ n |y i v j +y j v i | 3 |a 1i +a 1j − a 1i a 1j |IE|a ij − IEa ij | 3 + X 2≤ i≤ n |y i v i | 3 a 1i IE|a ii − IEa ii | 3 ≤ C s 3 n X 2≤ i<j≤ n |y i v j +y j v i | 3 |a 1i +a 1j − a 1i a 1j |IE|a ij − IEa ij | 3 + X 2≤ i≤ n |y i v i | 3 a 1i IE|a ii − IEa ii | 3 ≪ C s 2 n X 2≤ i<j≤ n (y i v j +y j v i ) 2 (a 1i +a 1j − a 1i a 1j )IE(a ij − IEa ij ) 2 + X 2≤ i≤ n (y i v i ) 2 a 1i IE(a ii − IEa ii ) 2 =O(1). (2.49) 22 This shows that if we condition onF ={all S}, we have that y ⊤ Wv k − y ⊤ w 1,v k s n d − →N (0,1). (2.50) Then, by Slutsky’s Theorem and conditioning onF ={all S}, we achieve that y ⊤ Wv k − y ⊤ w 1,v k d − →N (0,x ⊤ Σ ∗ I x), (2.51) as needed. Note that the limiting distribution is independent of S. Hence, we can removing the condition onF ={all S} and the normal distribution in (2.51) becomes unconditional. In addition, we have that y ⊤ w 1,v k =h − 1/2 n x ⊤ Y ⊤ I w 1,v k =h − 1/2 n × O(∥v k ∥ ∞ )=o(1), almost surely. Applying Slutsky’s Theorem completes the proof of Theorem 2.2.2. ■ 2.5 An Improvement of Condition 1 In this section, we discuss an improvement that replaces condition 1 by a weaker condition, given by Condition 5. min j≥ 2 IP{a 1j = 1} ∼ p n and 1− c > p n ≫ p logn/n for some constant c>0. The potential problem with this new condition is that the existence of such large enough positive constant L in (2.18) is not guaranteed. That is, for any L∈N, the order of (np n logn) (L+1)/2 |z| L+2 ∼ logn np 2 n L/2 (logn) 1/2 n − 3/2 p − 5/2 n 23 mayexceedthatofz − 2 e ⊤ 1 Wv k . Thiscanbesolvedbyconsideringamoredelicatetruncation e ⊤ 1 G(z)v k = − z − 1 e ⊤ 1 v k − z − 2 e ⊤ 1 Wv k + L X l=2 z − (l+1) e ⊤ 1 W l v k + ⌈(logn) 1/2 ⌉ X l=L+1 z − (l+1) e ⊤ 1 W l v k + ∞ X l=⌈(logn) 1/2 ⌉+1 z − (l+1) e ⊤ 1 W l v k , (2.52) where L≥ 2 is some constant that can be specified later. Condition 1 implies np n ≫ logn. Hence, for any constant C >1, there is N ∈N such that for all n≥ N, we have (np n logn) (l+1)/2 |z| l+2 ≤ C − l (logn) 1/2 n − 3/2 p − 5/2 n . (2.53) Let l =⌈(logn) 1/2 ⌉+1 in (2.53). By picking a large enough C, we can achieve an order as tiny as needed and therefore we have ∞ X l=⌈ √ logn⌉+1 z − (l+1) e ⊤ 1 W l v k =O p (np n ) 1/2 |z| 3 , (2.54) asin(2.19). Now, toboundthesumfromL+1to⌈(logn) 1/2 ⌉, Lemma3.2.2isnotsufficient, since the number of terms in the sum diverges as n approaches ∞. We fix this issue by deriving Lemma 2.7.3, a modification of Lemma 2.2.4, through a more detailed counting. By Lemma 2.7.3 and the union bound, we see that IP ⌈(logn) 1/2 ⌉ \ l=L+1 e ⊤ 1 W l v k ≤ logn(4 l l l+1/2 )(np n ) (l− 1)/2 ≥ 1− C(logn) − 3/2 →1, as n→∞. In addition, we have the ratio l(np n ) 1/2 |z| ≤ ⌈(logn) 1/2 ⌉ √ np n ≪ 1. 24 By taking L large enough, we have ⌈(logn) 1/2 ⌉ X l=L+1 z − (l+1) e ⊤ 1 W l v k =O p |z| − (L+2) logn(np n ) L/2 =O p (np n ) 1/2 |z| 3 . (2.55) Hence, by (2.54)-(2.55), we are able to get the same result as (2.19) from the new truncation (2.52). The seriese ⊤ i G(z)v k , i̸=1 andv ⊤ k G(z)v k have the same truncation as before. As a result, the proof of Theorem 2.2.1 is unchanged and we can reach the following proposition: Proposition 2.5.1. Theorem 2.2.1 and Theorem 2.2.2 hold if condition 1 is replaced by 5. 2.6 Proof of Technical Lemmas 2.6.1 Proof of Lemma 2.2.3 Intheproof,wetakeuseofthetechniquesintheproofofLemma4inFanetal.(2020). When possible, we try to adopt the original notations in Fan et al. (2020) for readers’ convenience. 
However, the main difference results from the fact that the joint independence of the upper triangular part of the matrix does not hold in our case, which requires extra efforts to for such dependency to be dealt with. Recall that W =B− B E . The (i,j)-th entry of W takes the form w ij =(a ij − IEa ij ) 1− 1I {a 1i =0} 1I {a 1j =0} , (2.56) for i,j =1,...,n. Also recall the notation that W − 1 is the submatrix ofW by deleting the first row and column. Similar to Fan et al. (2020), we first consider W − 1 with zeros on the main diagonal. Then, we take away this restriction and consider the full matrix W after that. 25 2.6.1.1 For W − 1 with Zeros on its Diagonal Let x = (x 1 ,...,x n ) ⊤ and y = (y 1 ,...,y n ) ⊤ be two unit vectors in R n and l ≥ 1 an integer. Let x − 1 = (x 2 ,...,x n ) and y − 1 = (y 2 ,...,y n ). Suppose the diagonal entries of W − 1 are zeros. In the following part of the proof, we find the order of IE( x ⊤ − 1 W l − 1 y − 1 − x ⊤ − 1 IEW l − 1 y − 1 ) 2 . Expanding this, we have IE(x ⊤ − 1 W l − 1 y − 1 − x ⊤ − 1 IEW l − 1 y − 1 ) 2 = X 2≤ i 1 ,...i l+1 ,j 1 ...j l+1 ≤ n is̸=i s+1 ,js̸=j s+1 ,1≤ s≤ l IE (x i 1 w i 1 i 2 w i 2 i 3 ...w i l i l+1 y i l+1 − IEx i 1 w i 1 i 2 w i 2 i 3 ...w i l i l+1 y i l+1 ) (x j 1 w j 1 j 2 w j 2 j 3 ...w j l j l+1 y j l+1 − IEx j1 w j 1 j 2 w j 2 j 3 ...w j l j l+1 y j l+1 ) . (2.57) In fact, many terms are zeros in (2.57). A graph visualization of the sum is commonly used to identify those terms. See Tao (2012) section 2.3.4 for example. For vector i of indices, letG i be the graph with vertices{i 1 ,...,i l+1 } and edges{(i 1 ,i 2 ),...(i l ,i l+1 )}. Vertices and edges are possibly repeated. It is clear that a term in the sum in (2.57) is nonzero only if the associated graphG i ∪G j is connected and each distinct edge has at least one copy. We then take note that for each term in the sum in (2.57), w i 1 i 2 w i 2 i 3 ...w i l i l+1 and w j 1 j 2 w j 2 j 3 ...w j l j l+1 are either independent (when the sets {w i 1 i 2 ,...,w i l i l+1 } and {w j 1 j 2 ,...,w j l j l+1 } are disjoint) or positively correlated (when the two sets are not dis- joint). The same holds for |x i 1 w i 1 i 2 w i 2 i 3 ...w i l i l+1 y i l+1 | and |x j 1 w j 1 j 2 w j 2 j 3 ...w j l j l+1 y j l+1 |. We further see that for any two positively correlated random variables X,Y, we have 26 IE[XY] > IEXIEY. Applying this fact by letting X = |x i 1 w i 1 i 2 w i 2 i 3 ...w i l i l+1 y i l+1 | and Y =|x j 1 w j 1 j 2 w j 2 j 3 ...w j l j l+1 y j l+1 |, we have IE |x i 1 w i 1 i 2 w i 2 i 3 ...w i l i l+1 y i l+1 |− IE|x i 1 w i 1 i 2 w i 2 i 3 ...w i l i l+1 y i l+1 | |x j 1 w j 1 j 2 w j 2 j 3 ...w j l j l+1 y j l+1 |− IE|x j1 w j 1 j 2 w j 2 j 3 ...w j l j l+1 y j l+1 | ≤ 2IE x i 1 w i 1 i 2 w i 2 i 3 ...w i l i l+1 y i l+1 x j 1 w j 1 j 2 w j 2 j 3 ...w j l j l+1 y j l+1 , (2.58) for any term in the sum in (2.57). Using (2.58), we then bound (2.57) from above by (2.57)≤ 2 X 2≤ i 1 ,...i l+1 ,j 1 ...j l+1 ≤ n is̸=i s+1 ,js̸=j s+1 ,1≤ s≤ l IE x i 1 w i 1 i 2 w i 2 i 3 ...w i l i l+1 y i l+1 x j 1 w j 1 j 2 w j 2 j 3 ...w j l j l+1 y j l+1 . (2.59) To facilitate further discussion, we introduce some notations (with modifications) used in Fan et al. (2020). We combine the two vectors of indices i and j into one by letting ˜ i=(i 1 ,...,i l+1 ,j 1 ,...,j l+1 )=(i 1 ,...,i 2l+2 ). LetG ˜ i bethegraphwithvertices{i 1 ,...,i 2l+2 } and edges {(i 1 ,i 2 ),...(i l ,i l+1 ),(i l+2 ,i l+3 ),...,i 2l+1 ,i 2l+2 )}. 
Notice that only the ˜ i’s associ- ated with connectedG ˜ i need to be considered. Suppose there are m distinct edges. Let the set of those edges be denoted as P( ˜ i) = {(s 1 ,s 2 ),(s 3 ,s 4 ),...,(s 2m− 1 ,s 2m )}. Then, let the partition of the set {1,2,...,2m} be Q(2m) = {Q 1 ,...,Q h }, where h is the number of blocks in the partition. We also de- note s i Q ∼ s j if i,j ∈ Q k for some k ∈ {1,2,...,h} and we require that under a specific Q(2m), s i = s j if and only if s i Q ∼ s j . The partition Q(2m) can be understood as gluing vertices of the distinct edges. Since it is only meaningful to consider connected graphs, as 27 seen above, we assume by default that all the partitions Q(2m) in the following proof are the ones that create connected graphs. Using the above notations, we have X 2≤ i 1 ,...i l+1 ,j 1 ...j l+1 ≤ n is̸=i s+1 ,js̸=j s+1 ,1≤ s≤ l IE x i 1 w i 1 i 2 w i 2 i 3 ...w i l i l+1 y i l+1 x j 1 w j 1 j 2 w j 2 j 3 ...w j l j l+1 y j l+1 ≤ l X |P( ˜ i)|=m=1 X r 1 ,...,rm≥ 2 r 1 +...+rm=2l X Q(2m) X 2≤ s 1 ,...,s 2m ≤ n s k 1 =s k 2 iff s k 1 Q ∼ s k 2 |x i 1 y i l+1 x i l+2 y i 2l+2 |IE m Y j=1 |w s 2j− 1 s 2j | r j , (2.60) which is essentially the same expression as (A.23) in Fan et al. (2020). We note that the four sums from left to right correspond to choosing the number of distinct edges, choosing the number of repetitions of each edge, gluing distinct vertices and assigning numbers to the vertices, respectively. Thus, each term in the sum should represent a graph G ˜ i and hence a vector ˜ i as well. Figure 2.2: A diagram illustrating the graph visualization of the sum with m = 3, r 1 = 2, r 2 = 4, r 3 =3, Q(6)={{1,6},{2,3}}, s 1 =s 6 =2, s 2 =s 3 =3, s 4 =5 and s 5 =6. We further denote ˜ w ij = a ij − IEa ij for all i,j = 1,...,n. Then, w ij = ˜ w ij 1− 1I {a 1i =0} 1I {a 1j =0} and (2.60) becomes l X |P( ˜ i)|=m=1 X r 1 ,...,rm≥ 2 r 1 +...+rm=2l X Q(2m) X 2≤ s 1 ,...,s 2m ≤ n s k 1 =s k 2 iff s k 1 Q ∼ s k 2 |x i 1 y i l+1 x i l+2 y i 2l+2 | × IE " m Y j=1 |˜ w s 2j− 1 s 2j | r j 1− 1I {a 1s 2j− 1 =0} 1I {a 1s 2j =0} r j # . (2.61) 28 We see that the product on the second line of (2.61) can be reduced to m Y j=1 IE|˜ w s 2j− 1 s 2j | r j ! IE " m Y j=1 1− 1I {a 1s 2j− 1 =0} 1I {a 1s 2j =0} # , by independence and by the fact that 1− 1I {a 1s 2j− 1 =0} 1I {a 1s 2j =0} is boolean. Then, the first three sums (starting from left) are finite sums, which reduces the problem to finding the order of X 2≤ s 1 ,...,s 2m ≤ n s k 1 =s k 2 iff s k 1 Q ∼ s k 2 |x i 1 y i l+1 x i l+2 y i 2l+2 | m Y j=1 IE|˜ w s 2j− 1 s 2j | r j ! IE " m Y j=1 1− 1I {a 1s 2j− 1 =0} 1I {a 1s 2j =0} # . (2.62) Putting the expectation of the boolean expression aside, we have X 2≤ s 1 ,...,s 2m ≤ n s k 1 =s k 2 iff s k 1 Q ∼ s k 2 |x i 1 y i l+1 x i l+2 y i 2l+2 | m Y j=1 IE|˜ w s 2j− 1 s 2j | r j . (2.63) The order of this sum has been considered in Fan et al. (2020). The arguments made to reach [A.30] in the proof of Lemma 4 in Fan et al. (2020) builds on the fact that we can pick an order of deleting edges and vertices (i.e., deleting s j ’s from the sum in (2.63)) such that the graph is connected till it only contains one vertex and no edge. By Cauchy-Schwartz inequality P i,j a i b j ≤ P i a i P j b j , removing each distinct edge together with an endpoint (equivalent to removing an index s j from the sum(2.63)) contributes to an order of X s 2j− 1 IE|˜ w s 2 j − 1 s 2j | r j ≤ IE(W− IEW) 2 =O(np n ), where the equality is by Lemma 2.2.3. 
This gives a total order of n(np n ) m , where the extra n comes from the last vertex left in the procedure of deleting edges. Then, by using the fact that x and y are unit vectors, the extra n can be removed by using either P i x 2 i = 1 or 29 P i y 2 i =1. Then, the other can be used to reduce the total order by np n or provide an extra factor of d x =∥x∥ ∞ or d y =∥y∥ ∞ . Hence, the order of (2.63), depending on the choice of m can be improved to be (2.63)=O p min (np n ) m− 1 ,d 2 x (np n ) m ,d 2 y (np n ) m . (2.64) To find the order of IE[ Q m j=1 (1− 1I {a 1s 2j− 1 =0} 1I {a 1s 2j =0} )], we observe that by using the trivial bound 1− 1I {a 1s 2j− 1 =0} 1I {a 1s 2j =0} ≤ 1 for any j∈[m], IE " m Y j=1 1− 1I {a 1s 2j− 1 =0} 1I {a 1s 2j =0} # ≤ IE 1− 1I {a 1s 1 =0} 1I {a 1s 2 =0} =O(p n ). (2.65) This is, however, the finest order we can get. We observe this by searching through all connected graphs (there are only finitely many of them) with m distinct edges and pick out thelargest order. Consider atree withone internal vertex and m leaves, referred toasa star in graph theory. Without loss of generality, let the internal vertex be s 2 = s 4 = ... = s 2m s 2 Figure 2.3: Example of a star with one internal vertex and 5 leaves 30 andtheleavesbes 1 ,s 3 ,...,s 2m− 1 andnoneofthemareequal. ByDeMorgan’slaw, wehave IE " m Y j=1 1− 1I {a 1s 2 =0} 1I {a 1s 2j− 1 =0} # =IE " m Y j=1 1I {a 1s 2 =1} ∨1I {a 1s 2j− 1 =1} # =IP ( m Y j=1 1I {a 1s 2 =1} ∨1I {a 1s 2j− 1 =1} =1 ) =IP m \ k=1 n 1I {a 1s 2 =1} ∨1I {a 1s 2j− 1 =1} =1 o ! ≥ IP(a 1s 2 =1)=O(p n ), (2.66) where a∨b=max{a,b}. (2.65) and (2.66) imply that for any positive integer m, the contribution of Q m j=1 IE(1− 1I {a 1s 2j− 1 =0} 1I {a 1s 2j =0} ) is constantly O(p n ). Letting m=l, we can conclude from (2.64) that (2.61)=O min (np n ) l− 1 ,d 2 x (np n ) l ,d 2 y (np n ) l p n . (2.67) 2.6.1.2 Generalization to W The generalization from W − 1 with non-zero diagonal entries to W is less trivial. We now update the expansion given by (2.57) and the bound (2.59). IE(x ⊤ W l y− x ⊤ IEW l y) 2 = X 1≤ i 1 ,...i l+1 ,j 1 ...j l+1 ≤ n is̸=i s+1 ,js̸=j s+1 ,1≤ s≤ l IE (x i 1 w i 1 i 2 w i 2 i 3 ...w i l i l+1 y i l+1 − IEx i 1 w i 1 i 2 w i 2 i 3 ...w i l i l+1 y i l+1 ) (x j 1 w j 1 j 2 w j 2 j 3 ...w j l j l+1 y j l+1 − IEx j1 w j 1 j 2 w j 2 j 3 ...w j l j l+1 y j l+1 ) (2.68) ≤ 2 X 1≤ i 1 ,...i l+1 ,j 1 ...j l+1 ≤ n is̸=i s+1 ,js̸=j s+1 ,1≤ s≤ l IE x i 1 w i 1 i 2 w i 2 i 3 ...w i l i l+1 y i l+1 x j 1 w j 1 j 2 w j 2 j 3 ...w j l j l+1 y j l+1 . (2.69) Similar to (2.57) and (2.59), terms in (2.69) are associated with graphsG ˜ i that is connected and the edges (i j ,i j+1 ) should have at least one copy if i j ,i j+1 ̸= 1. The difference now is thatanedgewithendpoint1donothavetoberepeated, aslongasitsotherendpointisalso an endpoint of another edge with both endpoints not 1. So, it is to be determined whether 31 such difference could raise the total order. We first consider the most extreme case that every edge has an endpoint 1. Let l be odd. Then, either i 1 = i 3 = ... = i l = 1 or i 2 = i 4 = ... = i l+1 = 1. Similarly, either j 1 = j 3 = ... = j l = 1 or j 2 = j 4 = ... = j l+1 = 1. There are four cases in total and we first consider one case for which we have the sum X 1≤ i 2 ,i 4 ,...i l+1 ,j 2 ,j 4 ...j l+1 ≤ n is̸=i s+1 ,js̸=j s+1 ,1≤ s≤ l IE x 1 w 1i 2 w i 2 1 ...w 1i l+1 y i l+1 x 1 w 1j 2 w j 2 1 ...w 1j l+1 y j l+1 . (2.70) Suppose there are m distinct vertices (same as number of distinct edges) other than 1. 
Each such term has an order of O(p m n ) and there are O(n m ) such terms in total. This implies a total order of O(x 2 1 d 2 y (np n ) m ). On the other hand, if we use the fact that y is unit and the inequality that |y i l+1 y i l+1 | ≤ y 2 i l+1 +y 2 j l+1 , we can reduce the number of vertices by 1. This gives an order of O(x 2 1 (np n ) m− 1 ). We notice that all edges except (1,i l+1 ) and (1,j l+1 ) are repeated at least once. To get a nonzero value, we need these two edges to be repeated and the two chains to be connected by a vertex other than 1. This can be achieved by letting i l+1 = j l+1 and if all other edges are repeated exactly once, we achieve the most number of distinct edges, l in total. Hence, we have that (2.70)= l X m=1 O(min{x 2 1 (np n ) m− 1 ,x 2 1 d 2 y (np n ) m })=O(min{x 2 1 (np n ) l− 1 ,x 2 1 d 2 y (np n ) l }). (2.71) Repeat the calculation for the other three cases, we achieve the final order O max min{x 2 1 (np n ) l− 1 ,x 2 1 d 2 y (np n ) l },min{y 2 1 (np n ) l− 1 ,y 2 1 d 2 x (np n ) l } (2.72) For even l, the one possibility is given by the case i 1 = i 3 = ... = i l+1 = 1, j 1 = j 3 = ... = j l+1 =1, which yields an order of O(x 2 1 y 2 1 (np n ) l− 1 )≤ O((2.72)). (2.73) 32 Another possibility is i 2 = i 4 = ... = i l = 1 and j 2 = j 4 = ... = j l = 1, which yields O(min{(np n ) l− 2 ,d 2 x (np n ) l− 1 ,d 2 y (np n ) l− 1 }) smaller than (2.72). Thus, (2.72) is the order for this extreme case. Wethenturntothegeneralcase. Thereare2l edgesinthegraph, whichcanbeclassified into the following types: 1. Edges with neither endpoints 1. Let the number of distinct edges of this type be m 1 . 2. Edges with endpoints 1 and have common endpoints with some edges of type 1. Let the number of distinct edges of this type be m 2 . 3. Edges with endpoints 1 and have no common endpoints with any edge of type 1. Let the number of distinct edges of this type be m 3 . In addition, those edges should satisfy the following rules: 1. Edges of types 1 and 3 have at least one copy and to get the order of (2.69), we only need to consider the case that these edges have exactly one copy. 2. If m 1 ,m 3 >0, then m 2 >0 so that the graph is connected. 3. 2m 1 +m 2 +2m 3 =2l. Note that the extreme case above is with m 1 = m 2 = 0 and m 3 = l. Then, repeating the arguments we took from Fan et al. (2020) to reach (2.64), we see that types 1-3 edges contribute orders of O((np n ) m 1 ), O(p m 2 n ) and (np n ) m 3 , respectively. This gives a total order of O((np n ) (l− m 2 2 ) p m 2 n ), by rule 3. Note that rule 3 also forces m 2 to be even. Since we are only interested in the case where m 1 ̸= 2l (this case has been treated), the largest possible order is given by m 2 =0, which is essentially the extreme case. Taking m 2 =0 and applying the improvement of order by the unit vectorsx andy, the result agrees with our calculation in (2.72). The last step is to generalize toW with non-zero diagonal entries. This follows from the 33 argument in the proof of Lemma 4 in Fan et al. (2020), which shows that allowing self-edges makes no difference. To conclude the proof, we denote α 2 n =min (np n ) l− 1 ,d 2 x (np n ) l ,d 2 y (np n ) l p n , (2.74) β 2 n =max min{x 2 1 (np n ) l− 1 ,x 2 1 d 2 y (np n ) l },min{y 2 1 (np n ) l− 1 ,y 2 1 d 2 x (np n ) l } . (2.75) Then we have IE(x ⊤ W l y− x ⊤ IEW l y) 2 =O(max{α n ,β n }) . (2.76) The result of Lemma 2.2.3 follows from Chebyshev’s inequality. 
■ 2.6.2 Lemma 2.6.1, Lemma 2.6.2 and their Proofs Theorem 2 of Han and Tong (2021) gives a decomposition of the random vector v k as v k =SUq 1k +(I− S)Uq 2k , (2.77) where q 1k = λ − 1 k ΛU ⊤ v k and q 2k = λ − 1 k ΛU ⊤ Sv k . Based on this decomposition, we derive Lemma2.6.1andLemma2.6.2. TosimplifythenotationsusedinthetwoLemmas,wedefine the following sequences. γ n =min (np n ) (l− 1)/2 ,d x (np n ) l/2 ,d U (np n ) l/2 , (2.78) δ n =max min{x 1 (np n ) (l− 1)/2 ,x 1 d U (np n ) l/2 },min{d U (np n ) (l− 1)/2 ,d U d x (np n ) l/2 } , (2.79) τ n =min (np n ) (l− 1)/2 ,d U (np n ) l/2 , (2.80) η n =max min{x 1 (np n ) l/2− 1 ,x 1 d U (np n ) l/2 },min{d U (np n ) l/2− 1 ,d U d x (np n ) l/2 } , (2.81) κ n =min{d U (np n ) l/2− 1 ,d 2 U (np n ) l/2 }. (2.82) 34 Lemma 2.6.1. For any integer l≥ 1, k∈[2K] and any unit vector x∈R n , we have x ⊤ W l SU− IEW l SU q 1k =O p (max{γ n ,δ n p − 1/2 n }), (2.83) x ⊤ W l (I− S)U− IEW l (I− S)U q 2k =O p (max{γ n p n ,δ n p 1/2 n }), (2.84) q ⊤ 1k U ⊤ SW l SU− IEU ⊤ SW l SU q 1k =O p (max{τ n ,τ n d U p − 1 n }), (2.85) q ⊤ 1k U ⊤ SW l (I− S)U− IEU ⊤ SW l (I− S)U q 2k =O p (max{τ n p 1/2 n ,τ n d U }), (2.86) q ⊤ 2k U ⊤ (I− S)W l (I− S)U− IEU ⊤ (I− S)W l (I− S)U q 2k =O p (max{τ n p 3/2 n ,τ n d U p n }), (2.87) where q 1k and q 2k are vectors defined in (2.77). Lemma 2.6.2. For any integer l≥ 2, k∈[2K] and any unit vector x∈R n , we have x ⊤ IEW l SU q 1k =O(max{(np n ) l/2 p 1/2 n ,η n p − 1/2 n }), (2.88) x ⊤ IEW l (I− S)U q 2k =O(max{(np n ) l/2 p 3/2 n ,η n p 1/2 n }), (2.89) q ⊤ 1k IEU ⊤ SW l SU q 1k =O(max{(np n ) l/2 ,κ n p − 1 n }), (2.90) q ⊤ 2k IEU ⊤ (I− S)W l (I− S)U q 2k =O(max{(np n ) l/2 p 2 n ,κ n p n }), (2.91) q ⊤ 1k IEU ⊤ SW l (I− S)U q 2k =O(max{(np n ) l/2 p n ,κ n }). (2.92) ThepointofderivingLemma2.6.1isthatthevectorv k israndom. Lemma2.2.3isderived by bounding the variance of the random variable x ⊤ W l − IEW l y, which is essentially an expectation. However, if the unit vectors x or y is replaced by v k which is not independent of W, the proof of Lemma 2.2.3 fails. Such dependency suggests that a gap needs to be filled between the Lemma 2.2.3 and Lemma 2.6.1. The value of (2.77) is that it allows us to express a high-dimensional random vector v k ∈ R n in terms of two low-dimensional random vectors q 1k ,q 2k ∈ R K . We give the following Corollary to take advantage of this fact in the proof of Lemma 2.6.1. 35 Corollary 2.6.1. The entries of q 1k are with order O(1/ √ p n ) almost surely and the entries of q 2k are with order O( √ p n ) almost surely. The proof is left in section 2.7.1. 2.6.2.1 Proof of Lemma 2.6.1 We first find the order of x ⊤ W l SU− IEW l SU q 1k . Since we know the size of entries in q 1k , we are left to find the size of entries in the row vector x ⊤ W l SU− IEW l SU . Denote itsh-thentryas x ⊤ W l SU− IEW l SU h , forh∈[K]. Weproceed inthesameprocedure as in the proof of Lemma 2.2.3. 
Similar to (2.57), the second moment, with W replaced by W − 1 with zero diagonal entries, can be expanded as IE (x ⊤ − 1 W l − 1 S − 1 U − 1 ) h − (x ⊤ − 1 IEW l − 1 S − 1 U − 1 ) h 2 = X 2≤ i 1 ,...i l+1 ,j 1 ...j l+1 ≤ n is̸=i s+1 ,js̸=j s+1 ,2≤ s≤ l IE x i 1 w i 1 i 2 w i 2 i 3 ...w i l i l+1 s i l+1 i l+1 u i l+1 h − IEx i 1 w i 1 i 2 w i 2 i 3 ...w i l i l+1 s i l+1 i l+1 u i l+1 h x j 1 w j 1 j 2 w j 2 j 3 ...w j l j l+1 s j l+1 i l+1 u j l+1 h − IEx j1 w j 1 j 2 w j 2 j 3 ...w j l j l+1 s j l+1 j l+1 u j l+1 h , (2.93) The sum in (2.93) is already very close to (2.57), except that we have the first column of U instead of a unit vector y and we have an extra diagonal entry of the matrix S. By the assumption that d U =∥U∥ max =O(1/ √ n), we can treat the any column of the matrixU as if it is an unit vector because P n i=1 u 2 ij = O(1) for any j ∈ [K]. So, what remains is to find 36 outwhethertheextras i l+1 i l+1 =a 1i l+1 ands j l+1 j l+1 =a j l+1 couldbringanychange. Following the same arguments to reach (2.61) in the proof of Lemma 2.2.3, we have (2.93)≤ C X 2≤ i 1 ,...i l+1 ,j 1 ...j l+1 ≤ n is̸=i s+1 ,js̸=j s+1 ,2≤ s≤ l IE x i 1 w i 1 i 2 w i 2 i 3 ...w i l i l+1 s i l+1 i l+1 u i l+1 h × x j 1 w j 1 j 2 w j 2 j 3 ...w j l j l+1 s j l+1 j l+1 u j l+1 h (2.94) ≤ C l X |P( ˜ i)|=m=1 X r 1 ,...,rm≥ 2 r 1 +...+rm=2l X Q(2m) X 2≤ s 1 ,...,s 2m ≤ n s k 1 =s k 2 iff s k 1 Q ∼ s k 2 |x i 1 u i l+1 h x i l+2 u i 2l+2 h | × IE " a 1i l+1 a 1i 2l+2 m Y j=1 |˜ w s 2j− 1 s 2j | r j 1− 1I {a 1s 2j− 1 =0} 1I {a 1s 2j =0} r j # , (2.95) where ˜ w ij =a ij − IEa ij definedinproofofLemma2.2.3andthecorresponding s j oftheindices i l+1 and i 2l+2 are determined by the partition Q(2m). The indices s 2j− 1 and s 2j in (2.95) should not be confused with the entries of the matrix S. Again, we put Q m j=1 |˜ w s 2j− 1 s 2j | r j aside. Since there are only finitely many partitions Q(2l), we need to determine the one among all that gives the largest order. We consider partitions that corresponds to a star (see figure 2.3). In the proof of Lemma 2.2.3, we have the star formed by letting s 2 = s 4 = ... = s 2m and the rest s j ’s all distinct and different from s 2 . We modify this by: 1. If i l+1 and i 2l+2 are even, then no modification is needed. 2. If i l+1 is odd, then replace s i l+1 +1 in the equality s 2 =s 4 =...=s 2m by s i l+1 . 3. Repeat for i 2l+2 if it is odd. 37 For the resulting star, we have IE " a 2 1s 2 m Y j=1 1− 1I {a 1s 2 =0} 1I {a 1s 2j− 1 =0} # =IE " a 1s 2 m Y j=1 1I {a 1s 2 =1} ∨1I {a 1s 2j− 1 =1} # =IP {a 1s 2 =1} \ ( m Y j=1 1I {a 1s 2 =1} ∨1I {a 1s 2j− 1 =1} =1 )! =IP m \ k=1 n 1I {a 1s 2 =1} ∨1I {a 1s 2j− 1 =1} =1 o \ {a 1s 2 =1} ! ≥ IP(a 1s 2 =1)=O(p n ). (2.96) This means that the entries of the diagonal matrix S do not reduce the order of (2.95) by a factor p n . Hence, (2.95) and (2.61) have the same order, given by (2.67). Together with (2.93)-(2.95), we have (2.93)=O p min (np n ) (l− 1) ,d 2 x (np n ) l ,d 2 U (np n ) l p n =O p (γ 2 n p n ). (2.97) We can then generalize toW, following the same argument in the proof of Lemma 2.2.3. By the same analysis of the extreme case in which all edges have an endpoint 1, we reach the conclusion that IE (x ⊤ W l SU) h − (x ⊤ IEW l SU) h 2 =O(max{γ 2 n p n ,δ 2 n }). (2.98) Therefore, we can conclude that x ⊤ W l SU− IEW l SU q 1k =O p (max{γ n p 1/2 n ,δ n })× O(p − 1/2 n )=O p (max{γ n ,δ n p − 1/2 n }). (2.99) We then look at x ⊤ W l (I− S)U− IEW l (I− S)U q 2k . 
The difference between this term and (2.83) is that the matrix S is now I− S and the entries of the low-dimensional vector q 2k is with a smaller order O( √ p n ). This makes the analysis easier since the matrix I− S 38 can now be ignored, as its nonzero entries 1− a ii =O(1) for all i∈[n]. Repeating the same calculations, we can achieve (2.84), which is essentially the order of (2.83) multiplied by a factor of p n . (2.85)-(2.86)canbeachievedbythesamecalculationsasabove. Wewillonlyshow(2.85). We expand IE (U ⊤ − 1 S ⊤ − 1 W l − 1 S − 1 U − 1 ) h 1 h 2 − (U ⊤ − 1 S ⊤ − 1 IEW l − 1 S − 1 U − 1 ) h 1 h 2 2 = X 2≤ i 1 ,...i l+1 ,j 1 ...j l+1 ≤ n is̸=i s+1 ,js̸=j s+1 ,2≤ s≤ l IE u h 1 i 1 s i 1 i 1 w i 1 i 2 w i 2 i 3 ...w i l i l+1 s i l+1 i l+1 u i l+1 h 2 − IEu h 1 j 1 s j 1 j 1 w i 1 i 2 w i 2 i 3 ...w i l i l+1 s i l+1 i l+1 u i l+1 h 2 u h 1 j 1 w j 1 j 2 w j 2 j 3 ... w j l j l+1 s j l+1 i l+1 u j l+1 h 2 − IEu h 1 j 1 w j 1 j 2 w j 2 j 3 ...w j l j l+1 s j l+1 j l+1 u j l+1 h 2 ≤ C l X |P( ˜ i)|=m=1 X r 1 ,...,rm≥ 2 r 1 +...+rm=2l X Q(2m) X 2≤ s 1 ,...,s 2m ≤ n s k 1 =s k 2 iff s k 1 Q ∼ s k 2 |u h 1 i 1 u i l+1 h 2 u h 1 i l+2 u i 2l+2 h 2 | × IE " a 1i l+1 a 1i 2l+2 m Y j=1 |˜ w s 2j− 1 s 2j | r j 1− 1I {a 1s 2j− 1 =0} 1I {a 1s 2j =0} r j # . (2.100) We can ignore the s ii ’s and reach the same order as (2.97). However, we can make the following argument to get a finer order. We take note that in (2.100), a 1i 1 , a 1i l+1 , a 1i l+2 and a 1i 2l+2 are possibly correlated with the boolean term Q m j=1 (1− 1I {a 1s 2j− 1 =0} 1I {a 1s 2j =0} ), for example when i 1 is one of the s 2j− 1 or s 2j . Hence, to find the order of (2.100), we either keep a 1i 1 a 1i l+1 a 1i l+2 a 1i 2l+2 or Q m j=1 (1− 1I {a 1s 2j− 1 =0} 1I {a 1s 2j =0} ). If we keep the second, then by (2.96), this contributes to an order of p n . If we keep the first (which indeed gives a finer order), then several cases are to be considered. 39 1. Suppose two pairs of indices are equal and without loss of generality let i 1 = i l+1 ̸= i l+2 = i 2l+2 . Then, IEa 1i 1 a 1i l+1 a 1i l+2 a 1i 2l+2 = IEa 1i 1 a 1i l+2 contributes to an order of p 2 n , which gives an extra factor of p n . Hence, this implies (2.100)=O min (np n ) (l− 1) ,d 2 U (np n ) l p 2 n =τ 2 n p 2 n . (2.101) 2. Suppose i 1 = i l+1 = i l+2 = i 2l+2 . Then IEa 1i 1 a 1i l+1 a 1i l+2 a 1i 2l+2 = IEa 1i 1 contributes to an order of p n . On the other hand, setting all four indices equal reduces the total number of distinct edges of the corresponding graph by 1. Compared to the previous case, this gives an increase in order by a factor of 1/p n with a compensate of losing a factor of np n , corresponding to the order contributed by an edge. Since (np 2 n ) − 1 ≪ 1, we have in the case an order smaller than (2.101). 3. Allotherpossibilitieshaveorderslessthanorequalto(2.101), asmoredistinctindices gives more factors of p n . Therefore, (2.101) gives the finest order for (2.100). Going through the reg- ular procedure for generalization to W, the extreme case gives an order of O p (min{d U (np n ) (l− 1)/2 ,d 2 U (np n ) l/2 })=O p (τ n d U ). We can conclude that q ⊤ 1k U ⊤ SW l SU− IEU ⊤ SW l SU q 1k =O(p − 1/2 n )× O p (max{τ n p n ,τ n d U })× O(p − 1/2 n )=O p (max{τ n ,τ n d U p − 1 n }), which shows (2.85). It could be seen that (2.87) and (2.86) follow from similar calculations. ■ 40 2.6.2.2 Proof of Lemma 2.6.2 The proof of Lemma 2.6.2 is the same as that of Lemma 2.6.1. We will show (2.88) and the rest follows from the same argument. 
We expand (x ⊤ − 1 IEW l − 1 S − 1 U − 1 ) h ≤ X 2≤ i 1 ,...i l+1 ≤ n is̸=i s+1 ,2≤ s≤ l IE|x i 1 w i 1 i 2 ...w i l i l+1 s i l+1 i l+1 u i l+1 h |, (2.102) which is in the same form as (2.94), with half number of edges. It is shown by (2.96) and (2.97) in the proof Lemma 2.6.1 that the s i l+1 i l+1 has no effect on the order and can be replaced by 1. By|x i 1 u i l+1 h |≤ (x 2 i 1 +u 2 i l+1 h )/2 and repeating the counting, we achieve that (2.102)=O (np n ) l/2 p n . (2.103) Then, we generalize to W similar to the proofs of Lemma 2.6.1 and Lemma 2.6.2. The extreme case gives an order of O(η n ), defined in (2.81). Together we have x ⊤ (IEW l SU)q 1k =O(max{(np n ) l/2 p n ,η n })× O(p − 1/2 n )=O(max{(np n ) l/2 p 1/2 n ,η n p − 1/2 n }), which shows (2.88). ■ 2.6.3 Lemma 2.6.3: An Improvement of Lemma 2.6.2 for e i Lemma 2.6.3. For i∈ [n], let x =e i , the i-th element in the standard basis of R n . Then, for l≥ 2, we have e ⊤ i IEW l SU q 1k =O(max{d U (np n ) l/2 p 1/2 n ,η n p − 1/2 n }), (2.104) e ⊤ i IEW l (I− S)U q 2k =O(max{d U (np n ) l/2 p 3/2 n ,η n p 1/2 n }), (2.105) where η n is defined in (2.81). 41 Proof. This follows from the same proof as Lemma 2.6.2. The only difference is that by letting x = e i , the first index in the sum (2.102) is fixed so that u i l+1 h can be factored out to give the improvement of a factor of d U in (2.104). For the same reason, we get the same improvement for (2.105). ■ 2.6.4 Proof of Lemma 2.2.4 Lemma 2.2.4 is a combination of the results in Lemma 2.6.1, Lemma 2.6.2 and Lemma 2.6.3. We first show (2.12). By the decomposition in (2.77) and the assumption that d U = O(1/ √ n), we have e ⊤ 1 W l v k =e ⊤ 1 W l SUq 1k +e ⊤ 1 W l (I− S)Uq 2k =e ⊤ 1 W l SU− IEW l SU q 1k +e ⊤ 1 IEW l SU q 1k +e ⊤ 1 W l (I− S)U− IEW l (I− S)U q 2k +e ⊤ 1 IEW l (I− S)U q 2k =O p (n − 1/2 (np n ) l/2 )+O((np n ) (l− 1)/2 )+O p (n − 1/2 (np n ) l/2 p 1/2 n ) +O((np n ) (l− 1)/2 p 1/2 n ) =O p ((np n ) (l− 1)/2 ). It could be easily verified that (2.13) and (2.14) follow from similar calculations. We then show the case that l =1. Use the same decomposition above. By Lemma 2.6.1, we have e ⊤ 1 (WSU− IEWSU)q 1k =O p ( √ p n ). Then, for h∈[2K], we compute e ⊤ 1 IEWSU h = n X j=1 IEw 1j s jj u jh =O( √ np n ). Hence,multipliedbyanorderof1/ √ p n givenbyentriesofq 1k ,wehavee ⊤ 1 Wv k =O p ( √ np n ). The result fore i , i̸=1 can be computed similarly. Note that the expectation (e ⊤ 2 IEWSU) h 42 is O(p n / √ n) so that the order is given by e ⊤ 2 (WSU− IEWSU)q 1k = O p ( √ p n ). For v k , we rewrite v ⊤ k Wv k =q ⊤ 1k U ⊤ SW l SU q 1k +q ⊤ 2k U ⊤ (I− S)W l (I− S)U q 2k +2q ⊤ 1k U ⊤ SW l (I− S)U q 2k . (2.106) We then compute for h 1 ,h 2 ∈[K], IEU ⊤ SW l SU h 1 h 2 = X 1≤ i 1 ,i 2 ≤ n IEu h 1 i 1 s i 1 i 1 w i 1 i 2 s i 2 i 2 u i 2 h 2 =O(p n ), since when i 1 ,i 2 ̸= 1, the corresponding expectation is 0. So, there are only O(n) nonzero terms, each with order O(p n /n). Hence, q ⊤ 1k IEU ⊤ SW l SU q 1k = O(1) together with Lemma 2.6.1 equation (2.85), we have that q ⊤ 1k U ⊤ SW l SU q 1k =O p (1). By similar calculations, the rest terms in (2.106) have smaller orders. This concludes the proof of Lemma 2.2.4. ■ 2.7 Additional Technical Details 2.7.1 Lemma 2.7.1 and its Proof Lemma 2.7.1. For the matrix W define by (2.56), we have IE(W− IEW) 2 1/2 =O( √ np n ) . IE(W− IEW) 2 is a matrix of the same size asW. Let the k-th column of the matrixW be denoted as the vector w k . It is also the k-th row since W is symmetric. 
We compute all 43 of its entries and there are four cases to consider: IE[w ⊤ k 1 w k 2 ], with k 1 =k 2 =1, k 1 =1̸=k 2 , k 1 =k 2 ̸=1 and 1̸=k 1 ̸=k 2 ̸=1. We have the first case as the following IE (w 1 − IEw 1 ) ⊤ (w 1 − IEw 1 ) =IE w ⊤ 1 w 1 =IE " n X j=1 (a 1j − IEa 1j ) 2 # = n X j=1 var(a 1j )=O(np n ). (2.107) For k̸=1, we have IE (w 1 − IEw 1 ) ⊤ (w k − IEw k ) =IE w ⊤ 1 w k =IE " n X j=1 (a 1j − IEa 1j )(a jk − IEa jk ) 1− 1I {a 1k =0} 1I {a 1j =0} # =0. Similarly, for k̸=1, we have IE (w k − IEw k ) ⊤ (w k − IEw k ) =IE w ⊤ k w k =IE " n X j=1 (a kj − IEa kj ) 2 1− 1I {a 1k =0} 1I {a 1j =0} 2 # =O(np n ). (2.108) Lastly, for 1̸=k 1 ̸=k 2 ̸=1, IE (w k 1 − IEw k 1 ) ⊤ (w k 2 − IEw k 2 ) =IE w ⊤ k 1 w k 2 =0. (2.109) Combining equations (2.107)-(2.109), we can easily conclude that ∥IE(W − IEW) 2 ∥ = O(np n ). ■ 44 2.7.2 Proof of Corollary 2.6.1 We first find the order of the entries of q 1k by using its expression q 1k =λ − 1 k ΛU ⊤ v k . (2.110) The rows of U ⊤ and v k are unit vectors. Hence, by the Cauchy-Schwarz inequality, the entries of the vector U ⊤ v k are of order O(1). By assumption, λ − 1 k ∼ n − 1 p − 3/2 n and the entries of the diagonal matrix Λ are with order O(np n ). Thus, by (2.110), the entries of q 1k is with order O(1/ √ p n ) almost surely. Similarly, for the vector q 2k , we have q 2k =λ − 1 k ΛU ⊤ Sv k . (2.111) Compared to (2.110), there is an extra S contributing a factor of p n in the order. Entries of v k have order O(1) and entries of ΛU ⊤ S have order O(np 2 n ). As a result, the entries of the vector q 2k are with order O( √ p n ) almost surely. ■ 2.7.3 Corollary 2.7.1 and its Proof Corollary 2.7.1 is a result of the decomposition given in (2.77) and Corollary 2.6.1. Corollary 2.7.1. For any k ∈ [2K], the largest entry of the random vector v k has order ∥v k ∥ ∞ =O((np n ) − 1/2 ) almost surely. Proof. Consider the decomposition (2.77). The i-th entry of SUq 1k has order (SUq 1k ) i =a 1i u ⊤ i q 1k ≤ C √ p n a 1i ∥U∥ max =O(∥U∥ max )=O((np n ) − 1/2 ), almost surely. 45 Similarly, for the i-th entry of (I− S)Uq 2k , we have ((I− S)Uq 2k ) i =(1− a 1i )u ⊤ i q 2k ≤ C √ p n ∥U∥ max =O(p 1/2 n n − 1/2 ), almost surely. Therefore, denote the i-th entry of v k as v i , we have v i =(SUq 1k ) i +((I− S)Uq 2k ) i =O((np n ) − 1/2 ), almost surely. ■ 2.7.4 Lemma 2.7.2 and its Proof Lemma 2.7.2. The operator norm∥W∥ satisfies the concentration inequality IP{∥W∥≥ 6 p np n logn}≤ 1/n. Proof. By triangular inequality, ∥W∥≤∥ W 0 − 1 ∥+∥W− W 0 − 1 ∥, (2.112) wherewedenoteW 0 − 1 then× nmatrixwithzerosonitsfirstrowandfirstcolumnandtherest entries are the same as W. Both terms can be bounded by the matrix Bernstein inequality (Vershynin (2018) Theorem 5.4.1), which states that for a sequence of independent mean zero n× n symmetric random matricesX 1 ,...,X N , such that∥X i ∥≤ K for all i. Then, for every t≥ 0, we have IP ( X i X i ≥ t ) ≤ 2nexp − t 2 /2 σ 2 +Kt/3 , (2.113) 46 where σ 2 =∥ P i IEX 2 i ∥. The matrix W 0 − 1 can be expressed as W 0 − 1 =Y+ X 2≤ i<j≤ n X ij , where Y =diag(0,w 22 ,...,w nn ) and X ij =W 0 − 1 (e i e ⊤ j +e j e ⊤ i ). Straightforwardcomputationsgive σ 2 ≤ np n . Inaddition, itisobviousthatK =1. Take t=3 √ np n logn in (2.113), we have IP{∥W 0 − 1 ∥≥ 3 p np n logn}≤ 2nexp − 9np n logn/2 np n + √ np n logn ≤ 2/n. (2.114) Repeating the same calculations for W− W 0 − 1 , we get IP{∥W− W 0 − 1 ∥≥ 3 p np n logn}≤ 2/n. 
(2.115) Denote events E 1 , E 2 , E 3 by E 1 ={∥W 0 − 1 ∥≥ 3 p np n logn}, E 2 ={∥W− W 0 − 1 ∥≥ 3 p np n logn}, E 3 ={∥W 0 − 1 ∥+∥W− W 0 − 1 ∥≥ 6 p np n logn}. Notethat{∥W∥≥ 6 √ np n logn}⊂ E 3 =(E 3 ∩E 1 )∪(E 3 ∩E C 1 ). Inaddition,(E 3 ∩E C 1 )⊂ E 2 . Hence, by (2.112), (2.115) and (2.115), we have IP{∥W∥≥ 6 p np n logn}≤ IP(E 3 ∩E 1 )+IP(E 3 ∩E C 1 )≤ IP(E 1 )+IP(E 2 )≤ 1/n. ■ 47 2.7.5 Lemma 2.7.3: A Modification of Lemma 2.2.4 Lemma 2.7.3. For any integers l≥ 2, k∈[2K] and i∈[n]\{1}, we have IP{e ⊤ 1 W l v k ≥ logn(4 l l l+1/2 )(np n ) (l− 1)/2 }≤ C (logn) 2 , (2.116) IP{e ⊤ i W l v k ≥ logn(4 l l l+1/2 )(np n ) (l− 1)/2 p n }≤ C (logn) 2 , (2.117) IP{v ⊤ k W l v k ≥ logn(4 l l l+1/2 )(np n ) l/2 }≤ C (logn) 2 , (2.118) for some constant C >0. Proof. The main idea of the proof is to achieve fast enough rates with which the tail probabilities in (2.116)-(2.118) converge to 0 by sacrificing the order of the upper bound of e ⊤ i W l v k and v ⊤ k W l v k by logn. In addition, as l is potentially unbounded, we will need a more detailed counting to deal with all the quantities associated with l. Recall that Lemma 2.2.4 follows from Lemma 2.6.1, Lemma 2.6.2 and Lemma 2.6.3, which are essentially proved under the same procedure as the proof of Lemma 2.2.3. We observe that the only extra work to be done is to find out the order (in terms of l) of the sum C l := l X |P( ˜ i)|=m=1 X r 1 ,...,rm≥ 2 r 1 +...+rm=2l X Q(2m) 1≤ l max m∈[l] X r 1 ,...,rm≥ 2 r 1 +...+rm=2l 1 X Q(2l) 1 . (2.119) We can turn the counting into two classical combinatorics problems: the number of com- positions of a positive integer and the number of partitions of a set. It is easy to see that counting r 1 ,...,r m ≥ 2, with the constraint that r 1 +...+r m = 2l is essentially counting the m-compositions of 2l− m, which is trivially upper-bounded by 2 m− 1 . See Stanley (2011) Ch1.2 for more details of compositions. Counting the number of Q(2l) corresponds to counting the number of partitions of a set 48 with 2l elements. This is known as the Bell number. It is shown by De Bruijn (1981) that B n , the n-th Bell number, admits the asymptotic expansion logB n n =logn+O(loglogn). (2.120) Therefore, combining the two results from combinatorics, we can conclude that C l =O(2 2l− 1 (2l) 2l l)=O(2 4l l 2l+1 ). (2.121) It is clear that this can be directly incorporated into the proofs of Lemma 2.2.3, Lemma 2.6.1, Lemma 2.6.2 and Lemma 2.6.3: every order in the result of these lemmas should now be multiplied by C 1/2 l . Note that this is unnecessary when l is a fixed integer. Hence, by Markov’s inequality, we have a modification of Lemma 2.2.4 that for any C > 0 and i∈[n]\{1}, IP{e ⊤ 1 W l v k ≥ C4 l l l+1/2 (np n ) (l− 1)/2 }≤ C 1 C 2 , IP{e ⊤ i W l v k ≥ C4 l l l+1/2 (np n ) (l− 1)/2 p n }≤ C 2 C 2 , IP{v ⊤ k W l v k ≥ C4 l l l+1/2 (np n ) l/2 }≤ C 3 C 2 , for some constants C 1 ,C 2 ,C 3 > 0. Lemma 2.7.3 follows from replacing C by logn and naming the maximum of C 1 ,C 2 ,C 3 as the new constant C. ■ 49 Chapter 3 Asymptotic Behavior of the Degree-Weighted DeGroot Model 3.1 Network Structure and Updating Rule 3.1.1 The Stochastic Block Model Consideranetworkofnverticesrepresentedbyanundirectedgraph(i.e. verticesandedges). The associated n× n adjacency matrix A is defined in Definition 2.1.1 and the structure of the network is completely characterized by the adjacency matrix A. 
We model the network using the Stochastic Block model (Holland et al., 1983; Wang and Wong, 1987; Abbe, 2017), a random graph model that has been extensively studied and is widely used throughout the network analysis literature (Decelle et al., 2011; Rohe et al., 2011; Golub and Jackson, 2012; Abbe et al., 2015). Suppose there are m groups (or blocks) of vertices and let P be an m×m symmetric matrix. Then, for i ≤ j (i.e., on the upper triangular part of A), let the A_ij be independent Bernoulli random variables with A_ij ~ Bernoulli(P_kl) if vertices i and j belong to groups k and l respectively; for i > j, set A_ij = A_ji. Let n = (n_1,...,n_m), with each n_k equal to the number of vertices in group k ∈ {1,...,m}. For clarity, the notation A(P,n) will be used for the random matrix generated by the Stochastic Block model.¹

¹ We work on the same network generative model as in Golub and Jackson (2012).

3.1.2 Degree-Weighted Linear Updating

Under DeGroot's model (Degroot, 1974), the vector b(t) ∈ R^n at time t is given by

b(t) = T^t b_0,

where b_0 ∈ R^n with ∥b_0∥_2 = 1 is an initial vector and T is an n×n row-stochastic matrix whose entry t_ij represents the weight that vertex i puts on vertex j.

In Golub and Jackson (2012), for a network with adjacency matrix A, the matrix T is defined entrywise by

t_{ij} = \frac{a_{ij}}{d_i(A)},

where d_i(A) = \sum_j a_{ij} is the degree of vertex i. This definition assumes that every vertex puts equal weight on all of its neighbors. Such equal weights may be an over-simplification, and the main innovation of our work is to generalize this updating rule to be degree-weighted. Our new matrix T is defined by

t_{ij} = \frac{a_{ij}\,\phi(\alpha, d_j(A))}{\sum_j a_{ij}\,\phi(\alpha, d_j(A))},   (3.1)

where ϕ : R² → R is a function that represents the dependency of the weights on the degrees of the vertices, and the parameter α ∈ R gives an extra layer of freedom for discussing how the limit, the convergence, and the rate of convergence change when the matrix T changes; in other words, the change in T is captured by the change in α. The definition in (3.1) aims to reflect the heterogeneity of vertices through their degrees, and the function ϕ is introduced to ensure this level of generality.

3.1.3 The Function ϕ

Under the rationale that higher degree vertices are considered more "important" and have a larger impact on their neighbors, we should naturally assume that ϕ is monotonically increasing in the degree d. On the other hand, it is equally interesting to study the opposite case, in which ϕ is monotonically decreasing in d, meaning that higher degree vertices are given lower weights. To facilitate the discussion of both cases, we let ϕ be increasing in d for α ∈ (0,∞), decreasing in d for α ∈ (−∞,0), and identically equal to 1 at α = 0, acting as a reference point.² In addition, our intention to study the effect of putting larger weights on more "important" vertices in the network suggests assuming monotonicity of ϕ in α for α ∈ R (i.e., an increase in α means that larger weights are put on more "important" vertices throughout the domain of α). Although our study is not restricted to any specific form of ϕ, we assume ϕ to be "nice" (avoiding, for example, discontinuities and singularities) by requiring some extra properties of ϕ, listed below together with the monotonicity mentioned above:

1. Monotonicity and Differentiability: ϕ(α,d) ∈ C²(R×[0,∞)) is nonnegative and ϕ(0,d) ≡ 1 for all d ∈ [0,∞). In addition, ϕ is monotonically increasing in α for α ∈ R, monotonically increasing in d for α ∈ (0,∞) and monotonically decreasing in d for α ∈ (−∞,0).

2.
Diverging Difference : For any two degrees d 1 > d 2 , the ratio ϕ (α,d 2 )/ϕ (α,d 1 ) is strictly decreasing in α : d dα ϕ (α,d 2 ) ϕ (α,d 1 ) <0. (3.2) In addition, lim α →∞ ϕ (α,d 2 ) ϕ (α,d 1 ) = lim α →−∞ ϕ (α,d 1 ) ϕ (α,d 2 ) =0. 3. Finite Elasticity: ϕ satisfies limsup d→∞ ∂ϕ (α,d )/∂d ϕ (α,d )/d <∞, (3.3) and limsup d→∞ ∂ 2 ϕ (α,d )/∂d 2 (∂ϕ (α,d )/∂d)/d <∞. (3.4) 2 Technically,wecanconsidertwofunctionsϕ 1 ,ϕ 2 suchthatϕ 1 ismonotonicallyincreasingindforα ∈D 1 and ϕ 2 decreasing for α ∈D 2 , for arbitrary domains D 1 ,D 2 ⊂ R. However, the more convenient form of ϕ that we take is sufficient for later discussions. 52 Properties 1 and 2 are the basic requirements for ϕ in order for our discussion to be meaningful. Property 3 consists of two technical conditions needed for the concentration inequality that we derive in Lemma 3.2.2. Roughly speaking, property 3 says that we do not want a tiny change in the degree d to cause a huge increase in the ϕ . All results in the paper are derived for matrix T, defined in terms of this general function ϕ . However, this should not conceal the main target of this paper, which is to study the change in convergence speed in response to a change in weights that is represented by a change in the parameter α . Remark 3.1.1. The properties that we assume are quite general and one typical example of a class of elementary functions is the exponential functions of the form p(d) q(α ) , for α ∈R, where p(d) and q(α ) are two polynomials with orders k 1 and an odd k 2 respectively and with positive leading coefficients. The simplest example in this form is d α . Due to its simplicity, ϕ (α,d ) = d α is our representative example when needed. Specifically, we use it for all the related plots. 3.2 Results Our study follows three steps across three settings: for networks conditioning on A, for an appropriate “expectation”, and finally for random networks. For nonrandom networks, the adjacency matrix A is deterministic. In Proposition 3.2.1, we show that the conver- gence speed of the limit lim t→∞ b(t) is determined by|λ 2 (T)|, the size of the second largest eigenvalue ofT in magnitude. In addition, the explicit expression of the limit lim t→∞ b(t) is giveninProposition3.2.2. Beforestudyingrandomnetworks,wefirstlookatadeterministic version of the matrix T in (3.1), denoted as T ∗ . In Lemma 3.2.1, we find λ 2 (T ∗ ) for this “expectation”T ∗ andshowthat|λ 2 (T ∗ )| ismonotonicallydecreasingin α . Finally, inTheo- rem 3.2.2, we show that the difference between the random network and its “expectation” is arbitrarily small, for sufficiently large n. In this section, the results are stated in the above order. 53 We restrict our work to networks where the limit lim t→∞ T t exists. In fact, this is really a minor issue. By our assumption of minimum density (assumption 1 of Theorem 3.2.2), convergence is guaranteed with high probability. 3 3.2.1 Preliminary Results 3.2.1.1 Results for Networks Conditioning on A Many studies such as Jackson and Golub (2010) and Levin and Peres (2017) show that the magnitude of the second largest eigenvalue of T is the dominant factor that determines the convergence speed. Our first result generalizes this insight, with adjustments for degree heterogeneities, to the newly proposed updating rule. Proposition 3.2.1. Denote T ∞ =lim t→∞ T t . Then at time t, max ∥b 0 ∥ 2 =1 ∥(T t − T ∞ )b 0 ∥ 2 =|λ 2 (T)| t , (3.5) where λ 2 (T) is the second largest eigenvalue in magnitude of T. 
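The following numerical sketch illustrates the proposition (it is an illustration only, not part of the formal development): it draws an adjacency matrix from the Stochastic Block model of Section 3.1.1, builds the degree-weighted matrix T from (3.1) with the representative choice ϕ(α,d) = d^α from Remark 3.1.1, and compares the decay of ∥(T^t − T^∞)b_0∥_2 with |λ_2(T)|^t. The helper names (sbm_adjacency, degree_weighted_T), the group sizes, and the probabilities are assumptions made for this sketch; T^∞ is approximated by a large matrix power.

```python
import numpy as np

rng = np.random.default_rng(1)

def sbm_adjacency(sizes, P):
    """Draw A(P, n) from the stochastic block model of Section 3.1.1."""
    labels = np.repeat(np.arange(len(sizes)), sizes)
    probs = P[np.ix_(labels, labels)]                   # n x n matrix of block probabilities
    upper = np.triu(rng.random((labels.size, labels.size)) < probs, 1).astype(float)
    return upper + upper.T                              # symmetric 0/1 adjacency, zero diagonal

def degree_weighted_T(A, alpha):
    """Row-stochastic T from (3.1) with phi(alpha, d) = d^alpha (assumes no isolated vertices)."""
    d = A.sum(axis=1)
    weights = A * d[None, :] ** alpha                   # a_ij * d_j(A)^alpha
    return weights / weights.sum(axis=1, keepdims=True)

sizes = np.array([500, 300, 300])                       # illustrative n = (n_1, n_2, n_2), m = 3
P = np.array([[0.30, 0.20, 0.20],
              [0.20, 0.30, 0.20],
              [0.20, 0.20, 0.30]])                      # in-group p = 0.3, cross-group q = 0.2
A = sbm_adjacency(sizes, P)
T = degree_weighted_T(A, alpha=1.0)

lam2 = np.sort(np.abs(np.linalg.eigvals(T)))[-2]        # second largest magnitude (largest is 1)
T_inf = np.linalg.matrix_power(T, 2000)                 # crude numerical proxy for T^infinity
b0 = rng.standard_normal(len(T)); b0 /= np.linalg.norm(b0)
for t in (5, 10, 20):
    gap = np.linalg.norm((np.linalg.matrix_power(T, t) - T_inf) @ b0)
    print(t, gap, lam2 ** t)                            # gap is at most lam2^t up to numerical error
```

For any unit vector b_0 the distance is bounded by |λ_2(T)|^t, with equality for the worst-case initial vector, which is exactly what Proposition 3.2.1 asserts.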
The term max ∥b 0 ∥ 2 =1 ∥(T t − T ∞ )b 0 ∥ 2 represents the distance between the vector b(t) and its limit. The definition of distance contains a maximum because the convergence speed with the worst possible initial vectorb 0 best represents the network’s ability to converge. In contrast, cases such as b 0 = (1/ √ n,...,1/ √ n) ⊤ that imply instant convergence would be meaningless. As t goes to ∞, the right-hand side of (3.5) shrinks to zero and the speed is determined by|λ 2 (T)|. This prompts the question of what the limit is, and how it is affected by α , which can be answered by the following proposition. 3 With the minimum density assumed, the network is connected with high probability. Restricting to this high probability event, the directed graph G(T) corresponding to the matrix T is strongly connected. In addition, T is aperiodic with high probability. With the known fact that strongly connectivity and aperiodicity together implies convergence, we see that the limit exists with high probability. We have the same settings as Golub and Jackson (2012). 54 Proposition 3.2.2. The limit T ∞ as t→∞ is given by: T ∞ = T 1 T 2 ... T n T 1 T 2 ... T n . . . . . . . . . . . . T 1 T 2 ... T n , where T j = P i a ij ϕ (α,d j (A))ϕ (α,d i (A)) P i,j a ij ϕ (α,d j (A))ϕ (α,d i (A)) , for j = 1,...,n. This means that the limit b(∞) = T ∞ b 0 would shift towards the initial states of vertices with high degrees and high degree neighbors as α increases. 4 3.2.1.2 Results for “Expectation”–Replacing A by IEA LetA=A(P,n) andR=IEA(P,n). In other words, R is an n× n matrix, with r ij =p kl , if vertex i belongs to group k and vertex j belongs to group l. Let T ∗ denote the linear updating mechanism defined entrywise in terms of R: t ∗ ij = r ij ϕ (α,d j (R)) P j r ij ϕ (α,d j (R)) , (3.6) where the expected degree d i (R) is defined as: d i (R) = IEd i (A) = P j r ij for i = 1,...,n. Note in the definition of T ∗ , we replace allA withR in the expression in (3.1). It is indeed not the actual expectation of T but this expression can be dealt with more easily in this theoreticalcontext. Ourgoalhereistofinditssecondlargesteigenvalueandtheninvestigate how a change of α affects its magnitude. However, for the most general Stochastic Block model (i.e., for general n and P), the difficulty lies in solving for the eigenvalues, since 4 Of course, if all degrees d j (A) are equal for all j = 1,...,n, then b(∞) is independent of α , but this happens with negligible probability. 55 there is no explicit formula for zeros of polynomials with orders higher than or equal to 5. Therefore, we consider a less general case where the group sizes are n 1 >n 2 =...=n m . (3.7) In addition, the matrix P is of the form: P= p q ... q q p ... q . . . . . . . . . . . . q q ... p , (3.8) wherepisthein-grouplinkingprobability,q isthelinkingprobabilityacrossdifferentgroups, and p>q. We then have the following Lemma: Lemma 3.2.1. Let d ∗ 1 =n 1 p+(m− 1)n 2 q and d ∗ 2 =n 1 q+n 2 p+(m− 2)n 2 q be the expected degrees 5 and let g : R → R be the function g(α ) = ϕ (α,d ∗ 2 ) ϕ (α,d ∗ 1 ) . 
Under the above assumptions on the group sizes and linking probabilities in (3.7) and (3.8), the second largest (in magnitude) eigenvalue is⁶

|λ_2(T^*)| =
\begin{cases}
\dfrac{n_1 p\,\phi(\alpha,d_1^*)}{n_1 p\,\phi(\alpha,d_1^*) + (m-1)n_2 q\,\phi(\alpha,d_2^*)} - \dfrac{n_1 q\,\phi(\alpha,d_1^*)}{n_1 q\,\phi(\alpha,d_1^*) + (n_2 p + (m-2)n_2 q)\,\phi(\alpha,d_2^*)}, & \alpha \ge g^{-1}(n_1/n_2),\\[1ex]
\dfrac{n_2 (p-q)\,\phi(\alpha,d_2^*)}{n_1 q\,\phi(\alpha,d_1^*) + (n_2 p + (m-2)n_2 q)\,\phi(\alpha,d_2^*)}, & \alpha < g^{-1}(n_1/n_2).
\end{cases}   (3.9)

Furthermore, |λ_2(T^*)| is monotonically decreasing in α for α ∈ D_α, where

D_α = (-\infty,\, g^{-1}(n_1/n_2)] \cup \left( g^{-1}\!\left( \frac{n_1}{n_2}\left(\frac{p}{(m-1)(p+(m-2)q)}\right)^{1/2} \right),\, \infty \right).   (3.10)

⁵ Note that the expected degree of vertex i is d_i(R) = d_1^* if vertex i belongs to group 1; otherwise d_i(R) = d_2^*.
⁶ Note that the existence of the inverse of g is guaranteed: property 2 of ϕ implies that ϕ(α,d_2^*)/ϕ(α,d_1^*) is strictly monotone and thus invertible.

In particular, if m = 2, then D_α = R.

Figure 3.1: Graph of |λ_2(T^*)| vs. α, with ϕ(α,d) = d^α, n_1 = 500, n_2 = 300, p = 0.3, q = 0.2 and m = 2, 4, 6.

Remark 3.2.1. When m is large and if p, q and n_1, n_2 are relatively close, there is a nontrivial rise in |λ_2| on the right side of 0 in the plot. Intuitively, in this case, when α increases from 0, the level of dominance of the higher degree vertices is so weak that the effect of sharing higher weights is overwhelmed by those multiple blocks of lower degree vertices. In addition, the kink on the left side of 0 is the result of a change in the expression of |λ_2(T^*)|.

Lemma 3.2.1 implies that within D_α the convergence speed is faster as α increases, that is, when vertices put more weight on higher degree neighbors. It is clear that lim_{α→∞} |λ_2(T^*)| = 0 and lim_{α→−∞} |λ_2(T^*)| = (p−q)/(p+(m−2)q). While the limit towards ∞ is easy to understand, there is more to discuss with respect to the limit towards −∞. We notice that when α approaches −∞, the block of higher degree vertices shares no weight in the network, so that the convergence depends only on the blocks of low degree vertices. This is identical to the island model studied in Golub and Jackson (2012), and taking the limit α → −∞ replicates their result. Another interesting point is the comparison between |λ_2(T^*)| at 0 and at −∞. We see that when α = 0, all neighboring vertices share the same weight, independent of their degrees. So, the difference here is only the weight shared by higher degree vertices. Such a difference is nontrivial even when higher degree vertices are extremely sparse: the ratio of |λ_2(T^*)| in these two scenarios approaches a constant not equal to 1 as the number of blocks m increases. This effect is illustrated in figure 3.2.

Figure 3.2: Scatter plot of lim_{α→0} |λ_2(T^*)| / lim_{α→−∞} |λ_2(T^*)| vs. m, with ϕ(α,d) = d^α, n_1 = 500, n_2 = 300, p = 0.3, q = 0.2.

3.2.1.3 Unbalanced Case for Lemma 3.2.1

There is one potential variant of the result of Lemma 3.2.1. Let m = 2 and n_1 < n_2. We consider a different linking probability matrix

P_{unbalanced} = \begin{pmatrix} p_1 & q \\ q & p_2 \end{pmatrix},   (3.11)

where q > p_1 > p_2. The difference between this new probability matrix P_{unbalanced} and the one in (3.8) is that it allows the diagonal elements to be unequal. Under the above settings, we have the following corollary:

Corollary 3.2.1. Let d_1^* = n_1 p_1 + n_2 q, d_2^* = n_1 q + n_2 p_2 and g(α) = ϕ(α,d_2^*)/ϕ(α,d_1^*). Suppose n_1 < n_2, q > p_1 > p_2 and the linking probability matrix P = P_{unbalanced}.
Then, we have

|λ_2(T^*)| =
\begin{cases}
-\dfrac{n_1 p_1\,\phi(\alpha,d_1^*)}{n_1 p_1\,\phi(\alpha,d_1^*) + n_2 q\,\phi(\alpha,d_2^*)} + \dfrac{n_1 q\,\phi(\alpha,d_1^*)}{n_1 q\,\phi(\alpha,d_1^*) + n_2 p_2\,\phi(\alpha,d_2^*)}, & \alpha \ge g^{-1}\!\left(\frac{n_1(q-p_1)}{n_2(q-p_2)}\right),\\[1ex]
\dfrac{n_2(q-p_2)\,\phi(\alpha,d_2^*)}{n_1 q\,\phi(\alpha,d_1^*) + n_2 p_2\,\phi(\alpha,d_2^*)}, & \alpha < g^{-1}\!\left(\frac{n_1(q-p_1)}{n_2(q-p_2)}\right).
\end{cases}   (3.12)

In addition, |λ_2(T^*)| is monotonically decreasing in α for α ∈ R.

Figure 3.3: Graph of |λ_2(T^*)| vs. α with n_1 = 300, n_2 = 500, p_1 = 0.3, p_2 = 0.2 and q = 0.5.

The value of Corollary 3.2.1 is that it allows us to extend the result of Lemma 3.2.1 to the unbalanced case with m = 2.

3.2.2 Main Results: a Concentration Inequality and the Monotonicity Result

Lemma 3.2.1 summarizes our findings with regard to the convergence speed for the "expectation" T^*. To see the full picture, we need to show that the result for the random case is arbitrarily "close" to the "expectation" when n is sufficiently large. To address this issue, we first make some assumptions:⁷

⁷ We put our assumptions in the same form as the ones in Golub and Jackson (2012).

Assumption 3.2.1. (Minimum Density) Let τ_n = min_i d_i(R)/n, where R = IE A(P,n). Assume that

lim_{n→∞} τ_n \sqrt{\frac{n}{\log n}} = ∞.   (3.13)

Assumption 3.2.2. (No Vanishing Groups) For all k = 1,...,m,

liminf_n \frac{n_k}{n} > 0.   (3.14)

Assumption 3.2.3. (Comparable Densities⁸)

limsup_n \frac{p_n}{q_n} < ∞.   (3.15)

⁸ Here, we consider increasing n, so p, q in (3.8) carry the subscript n.

With those assumptions in place, we achieve the following concentration inequality:

Lemma 3.2.2. Suppose A(P,n) is generated by the Stochastic Block model with P and n. In addition, suppose assumptions 3.2.1, 3.2.2 and 3.2.3 hold. Then, there exists a positive constant⁹ \tilde{C}, independent of n, such that for all n > 0,

IP\left( |λ_2(T) − λ_2(T^*)| ≥ \tilde{C}\, \frac{\sqrt{\log n}}{\tau_n \sqrt{n}} \right) ≤ \frac{16}{n^2}.

⁹ The constant \tilde{C} may depend on α and ϕ.

Note that for this lemma we do not require the structure assumed in Lemma 3.2.1. In fact, our result is much stronger than the one in Golub and Jackson (2012): we achieve a rate, and it is for the degree-weighted setting, which is more general. By Lemma 3.2.2, we see that the error gets arbitrarily small as n goes to infinity. Moreover, the monotonicity seen in the case of the "expectation" does not necessarily occur in the case of random networks, but we can still determine how large an increase in α must be so that it decreases the eigenvalue by an amount exceeding the error created by randomness. Given an α_0, suppose α rises from α_0 to α_1; how large does α_1 need to be to see an increase in the convergence speed with high probability? We answer this question by applying Lemma 3.2.2, and this results in the main theorem.

Theorem 3.2.3. Let α_0 be a given constant and let α_1 be bounded away from ∞. If α_0, α_1 ∈ D_α defined in (3.10) and

α_1 − α_0 > \hat{C}\, \frac{\sqrt{\log n}}{\tau_n \sqrt{n}} \left( \frac{\partial\phi/\partial\alpha\,(\alpha_1, d_1^*)}{\phi(\alpha_1, d_1^*)} − \frac{\partial\phi/\partial\alpha\,(\alpha_1, d_2^*)}{\phi(\alpha_1, d_2^*)} \right)^{-1},   (3.16)

where \hat{C} > 0 is a constant¹⁰ independent of n and d_1^*, d_2^* are the expected degrees defined in Lemma 3.2.1, then there exists a constant n_0 > 0 such that for all n > n_0,

IP\{ |λ_2(T(α_0))| − |λ_2(T(α_1))| > 0 \} ≥ 1 − \frac{16}{n^2}.

¹⁰ \hat{C} depends on limsup_n p_n/q_n, limsup_n n_2/n_1, limsup_n ϕ(α_1,d_2)/ϕ(α_1,d_1) and the constant \tilde{C} in Lemma 3.2.2.

Remark 3.2.2. The condition in (3.16) may seem complicated, but it is not restrictive at all: the right-hand side of the inequality approaches zero as n goes to ∞. The purpose of this condition is to give an expression for the size of α_1 − α_0. In addition, we view α_1 as a sequence in n in Theorem 3.2.3. If α_1 is a constant, then (3.16) can be replaced by α_1 − α_0 > 0.
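As a complement to Lemma 3.2.1 and Lemma 3.2.2, the sketch below (illustrative only) evaluates the closed-form |λ_2(T^*)| of (3.9) for ϕ(α,d) = d^α in the balanced setting (3.7)–(3.8), and compares it with the second eigenvalue of the random T over a few stochastic block model draws. The helpers repeat those from the sketch after Proposition 3.2.1, and the sample sizes, probabilities, α, and number of draws are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)

def sbm_adjacency(sizes, P):
    labels = np.repeat(np.arange(len(sizes)), sizes)
    probs = P[np.ix_(labels, labels)]
    upper = np.triu(rng.random((labels.size, labels.size)) < probs, 1).astype(float)
    return upper + upper.T

def degree_weighted_T(A, alpha):
    d = A.sum(axis=1)
    weights = A * d[None, :] ** alpha
    return weights / weights.sum(axis=1, keepdims=True)

def lambda2_star(alpha, n1, n2, m, p, q):
    """Closed-form |lambda_2(T*)| from (3.9), specialized to phi(alpha, d) = d^alpha."""
    d1 = n1 * p + (m - 1) * n2 * q                      # d_1^* from Lemma 3.2.1
    d2 = n1 * q + n2 * p + (m - 2) * n2 * q             # d_2^*
    phi1, phi2 = d1 ** alpha, d2 ** alpha
    if n1 * phi1 >= n2 * phi2:                          # equivalent to alpha >= g^{-1}(n1/n2)
        a = n1 * p * phi1 / (n1 * p * phi1 + (m - 1) * n2 * q * phi2)
        e = n1 * q * phi1 / (n1 * q * phi1 + (n2 * p + (m - 2) * n2 * q) * phi2)
        return abs(a - e)
    return n2 * (p - q) * phi2 / (n1 * q * phi1 + (n2 * p + (m - 2) * n2 * q) * phi2)

n1, n2, m, p, q, alpha = 500, 300, 3, 0.30, 0.20, 0.5
sizes = np.array([n1] + [n2] * (m - 1))
P = np.full((m, m), q) + (p - q) * np.eye(m)            # p on the diagonal, q off the diagonal

draws = [np.sort(np.abs(np.linalg.eigvals(degree_weighted_T(sbm_adjacency(sizes, P), alpha))))[-2]
         for _ in range(10)]
print("closed form |lambda_2(T*)|:", lambda2_star(alpha, n1, n2, m, p, q))
print("empirical   |lambda_2(T)| :", np.mean(draws), "+/-", np.std(draws))
```

The gap between the empirical values and the closed form shrinks as n grows, which is the content of the √(log n)/(τ_n √n) rate in Lemma 3.2.2.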
This theorem determines the size of increase from α 0 to α 1 such that the change in the magnitude of the eigenvalue would be greater than the error, whose size is given by Lemma 3.2.2. Therefore, Theorem 3.2.3 tells us that if α increases from α 0 to α 1 that satisfy the conditions in Theorem 3.2.3, then we will see a decrease in |λ 2 (T)| and thus an increase in convergence speed with a probability tending to one as n goes to∞. Remark 3.2.3. Theorem 3.2.3 is a result under the setting of Lemma 3.2.1. The same result holds for the unbalanced case under the setting of Corollary 3.2.1. 10 ˆ C dependsonlimsup n p n /q n , limsup n n 2 /n 1 , limsup n ϕ (α 1 ,d 2 )/ϕ (α 1 ,d 1 )andtheconstant ˜ C inLemma 3.2.2. 61 3.2.3 Theorem 3.2.3 under Perturbation In this section, the effect of perturbation on our main result is discussed. A perturbation term is added to each entry of the adjacency matrixA=A(P,n) and the impact of this on Theorem3.2.3isinvestigated. 11 Morespecifically,for i,j =1,2,...,n,letϵ ij beindependent and identically distributed random variables supported on [0,1]. Then, let the perturbed adjancency matrix ˜ A be defined entrywise by: ˜ a ij =(1− δ )a ij +δϵ ij , for δ ∈[0,1]. Replacing A by ˜ A in (3.1), we define the perturbed weight matrix ˜ T by: ˜ t ij = ˜ a ij ϕ (α,d j ( ˜ A)) P j ˜ a ij ϕ (α,d j ( ˜ A)) . Similarly, let ˜ R=IE ˜ A and define the deterministic matrix ˜ T ∗ by: ˜ t ∗ ij = ˜ r ij ϕ (α,d j ( ˜ R)) P j ˜ r ij ϕ (α,d j ( ˜ R)) . Then, we have the following Corollary: Corollary 3.2.2. Lemma 3.2.1 and Theorem 3.2.3 hold for the perturbed random matrix ˜ T. It is clear that the extra perturbation term does not change the structure assumed in Lemma 3.2.1. The concentration result of Lemma 3.2.2 also holds since the perturbation ϵ ij is bounded so that the concentration inequalities as our main tools for proving of Lemma 3.2.2 are unaffected. The result of Theorem 3.2.3 follows. 11 Ideally, we would add the perturbation to the matrix T but the structure of the row-stochastic matrix can easily be broken in this way. Therefore, we pass the perturbation to the adjancency matrix A for theoretical simplicity. 62 3.3 Proofs 3.3.1 Proof of Proposition 3.2.1 By (3.1), our updating rule T is equivalently defined by: T=D − 1 1,A AD 2,A , (3.17) where D 1,A and D 2,A are diagonal matrices with diagonal entries: (D 1,A ) ii = X j a ij ϕ (α,d j (A)), (D 2,A ) ii =ϕ (α,d i (A)). Recall that in section 3.2, we have restrictedG(T) to be strongly connected, which is equiv- alent to sayT is irreducible. In addition, we study only the case in which lim t T t converges. Proposition3.2.1isageneralizationoftheresultinGolubandJackson(2012). Fortheproof, we proceed by finding an upper bound and a lower bound and show that they are indeed the same. The main part of the proof is the same as that of Golub and Jackson (2012) but some details are adjusted to work for our matrixT and the norm∥·∥ 2 . The key component of the proof is to apply the spectral theorem to get a decomposition of the matrix T. We first show that such decomposition exists by the following Lemma: Lemma 3.3.1. The matrix T in (3.17) is diagonalizable. Proof. First, note that (D 1,A D 2,A ) 1/2 T(D 1,A D 2,A ) − 1/2 =D − 1/2 1,A D 1/2 2,A TD 1/2 2,A D − 1/2 1,A . (3.18) The equality holds because the diagonal matrices D 1,A and D 2,A commute. Equation (3.18) says T is similar to a symmetric matrix. Then, since any real symmetric matrix is 63 diagonalizable, T is also diagonalizable. 
■ Lemma 3.3.1 allows us to apply spectral theorem for diagonalizable matrices (Meyer, 2000) to T. Let the spectral decomposition of T be: T= n X i=1 λ i U i , (3.19) where λ i are the eigenvalues ofT in decreasing order (in magnitude) andU i are the orthog- onal projections onto the eigenspace of T associated with λ i . The next lemma, commonly referredtoasthePerron-FronbeniusTheorem,givesanimportantpropertyoftheeigenvalues of the matrix T that is applied in the proof of Proposition 3.2.1. Lemma3.3.2. For a nonnegative irreducible stochastic matrixT, its spectral radius ρ (T)= 1 is a simple eigenvalue. In addition, the limit T ∞ exists and takes the form: T ∞ =λ 1 U 1 =vw ⊤ , where v and w are left and right eigenvectors of T corresponding to λ 1 , normalized so that w ⊤ v =1. The proof is given in Meyer (2000). 3.3.1.1 Upper Bound Inthispart, wewanttoachieveanupperboundofthedistance∥(T t − T ∞ )b∥ 2 byapplying the spectral decomposition of the matrix T in (3.19). Note that T t − T ∞ = n X i=2 λ t i U i . (3.20) The sum from 2 to n is justified by Lemma 3.3.2. The largest eigenvalue of T in magnitude is 1 and all other eigenvalues have magnitude less than 1. Then, the i = 1 part in the sum 64 T t = P n i=1 λ t i U i cancels with T ∞ , since U 1 = (vw ⊤ ) t = v(w ⊤ v) t− 1 w ⊤ = vw ⊤ = T ∞ for t∈N. Applying (3.20), we have T t − T ∞ b 2 2 = n X i=2 λ t i U i b 2 2 (3.21) = n X i=2 |λ i | 2t ∥U i b∥ 2 2 (U i U j =0 for i̸=j) ≤| λ 2 | 2t n X i=2 ∥U i b∥ 2 2 =|λ 2 | 2t n X i=2 U i b 2 2 . (U i U j =0 for i̸=j) View U = P n i=2 U i as a single orthogonal projection and it has the property that: U = U ⊤ =U 2 . Then, for any vector x, we have ∥Ux∥ 2 2 =⟨Ux,Ux⟩=⟨U ⊤ Ux,x⟩=⟨Ux,x⟩≤∥ Ux∥ 2 ∥x∥ 2 . The last inequality is the Cauchy-Schwarz inequality. By (3.21), we have T t − T ∞ b 2 2 ≤| λ 2 | 2t ∥b∥ 2 2 =|λ 2 | 2t . The upper bound is obtained by taking square root on both sides. 3.3.1.2 Lower Bound Here, we obtain a lower bound by considering a specific b. Let b be the eigenvector of T corresponding to its second largest eigenvalue λ 2 in magnitude, with ∥b∥ 2 2 = 1. Note (T t − T ∞ )b=λ t 2 b. This is true because U i b=0 for all i̸=2. Then, we have T t − T ∞ b 2 2 = λ t 2 b 2 2 =|λ 2 | 2t . 65 By taking square root on both sides, we see that the lower bound is the same as the upper bound. ■ 3.3.2 Proof of Proposition 3.2.2 We first show a simple lemma that is used in the proof of Proposition 3.2.2. Lemma 3.3.3. Consider similar n× n matrices M 1 and M 2 such that M 2 = SM 1 S − 1 , where S is some invertible n× n matrix. If v is an eigenvector of M 1 corresponding to eigenvalue λ , then Sv is an eigenvector of M 2 corresponding to the same eigenvalue λ . Proof. Note that M 2 (Pv)= SM 1 S − 1 (Sv)=SM 1 v =λ Sv. Bythedefinitionofeigenvectors, Sv isaneigenvectorofM 2 correspondingtotheeigenvalue λ . ■ We want to find the limit: T ∞ = lim t→∞ T t = lim t→∞ D − 1/2 1,A D − 1/2 2,A D 1/2 2,A D − 1/2 1,A AD 1/2 2,A D − 1/2 1,A t D 1/2 1,A D 1/2 2,A . (3.22) Given that the limit converges, lim t→∞ D 1/2 2,A D − 1/2 1,A AD 1/2 2,A D − 1/2 1,A t =λ 1 v 1 v ⊤ 1 , (3.23) where λ 1 =1 is the largest eigenvalue in magnitude and v 1 is the corresponding unit eigen- vector of the matrix D 1/2 2,A D − 1/2 1,A AD 1/2 2,A D − 1/2 1,A . The limit being in this form is a result of the Perron-Frobenius Theorem, which is the same as what we apply in equality (3.21) in the proof of Proposition 3.2.1. Note that since the matrix in (3.23) is symmetric, the 66 left and right eigenvectors are the same. 
To find v 1 , apply Lemma 3.3.3: it’s easy to see that D − 1 1,A AD 2,A has eigenvector e = (1,...,1) ⊤ , corresponding to the eigenvalue 1. Then, D 1/2 2,A D − 1/2 1,A AD 1/2 2,A D − 1/2 1,A has eigenvector w 1 =D 1/2 2,A D 1/2 1,A e. To make it into a unit vector, we divide it by its magnitude∥w 1 ∥ 2 , which has the expression: ∥w 1 ∥ 2 = D 1/2 2,A D 1/2 1,A e 2 = X i,j A ij ϕ (α,d j (A))ϕ (α,d i (A)) ! 1/2 . Since v 1 =w 1 /∥w 1 ∥ 2 , (3.22) becomes: T ∞ =D − 1/2 1,A D − 1/2 2,A ∥w 1 ∥ − 1 2 D 1/2 2,A D 1/2 1,A e ∥w 1 ∥ − 1 2 D 1/2 2,A D 1/2 1,A e ⊤ D 1/2 1,A D 1/2 2,A =∥v 1 ∥ − 2 2 E n D 1,A D 2,A , where E n is an n× n matrix with all entries equal to 1. This leads us to the result of the limit: t ∞ ij = P i a ij ϕ (α,d j (A))ϕ (α,d i (A)) P i,j a ij ϕ (α,d j (A))ϕ (α,d i (A)) , for each i,j =1,...,n. ■ 3.3.3 Proof Lemma 3.2.1 By the definition of the nonrandom matrix T ∗ in (3.6), T ∗ is equivalently defined by: T ∗ =D − 1 1,R RD 2,R , (3.24) where R=IEA(P,n) and D 1,R , D 2,R are diagonal matrices defined by: (D 1,R ) ii = X j r ij ϕ (α,d j (R)), (D 2,R ) ii =ϕ (α,d i (R)). 67 Before the proof of Lemma 3.2.1, we first show a Lemma: Lemma 3.3.4. Let the m× m matrix F T ∗ be defined as: (F T ∗ ) kl = n l p kl ϕ (α, P h n h p lh ) P l n l p kl ϕ (α, P h n h p lh ) . (3.25) Then, F T ∗ has the same eigenvalues as T ∗ . Proof. First, note R is a matrix with m 2 blocks. Within each block, the entries are identical. So isT ∗ . From the definition of T ∗ , we see that if vertex i belongs to group k and vertex j belongs to group l, t ∗ ij = r ij ϕ (α,d j (R)) P j r ij ϕ (α,d j (R)) = p kl ϕ (α, P h n h p lh ) P l n l p kl ϕ (α, P h n h p lh ) . After suitable rearrangement of the vertices, this matrix T ∗ is in the following block form: B 11 B 12 ... B 1m B 21 B 22 ... B 2m . . . . . . . . . . . . B m1 B m2 ... B mm , where each B kl is a block matrix and within each block the entries are identical. De- note the entries in block B kl by b kl , which takes the value of t ∗ ij in (3.25), if vertex i is in group k and vertex j is in group l. Consider eigenvectors in the form of v = (v 1 ,v 1 ,...,v 2 ,v 2 ,...,v m ,v m ...) ⊤ . T ∗ v =λ v implies: n 1 b 11 v 1 +n 2 b 12 v 2 +...+n m b 1m v m =λv 1 , . . . n 1 b m1 v 1 +n 2 b m2 v 2 +...+n m b mm v m =λv m , 68 which completes the proof. ■ By applying Lemma 3.3.4, we are able to reduce the n × n matrix T ∗ to an m × m matrix F T ∗ . By our assumptions of n and P in (3.7) and (3.8), we see there are two expected degrees, denoted as d ∗ 1 and d ∗ 2 . For vertices in group 1 (the group with size n 1 ), d ∗ 1 = n 1 p+(m− 1)n 2 q and for vertices in the rest groups, d ∗ 2 = n 1 q +n 2 p+(m− 2)n 2 q. Then, the matrix F T ∗ is in the following form: F T ∗ = a b b b ... b e c d d ... d e d c d ... d . . . . . . . . . . . . . . . . . . e d d ... c d e d d ... d c , (3.26) where a= n 1 pϕ (α,d ∗ 1 ) n 1 pϕ (α,d ∗ 1 )+(m− 1)n 2 qϕ (α,d ∗ 2 ) , b= n 2 qϕ (α,d ∗ 2 ) n 1 pϕ (α,d ∗ 1 )+(m− 1)n 2 qϕ (α,d ∗ 2 ) , c= n 2 pϕ (α,d ∗ 2 ) n 1 qϕ (α,d ∗ 1 )+n 2 pϕ (α,d ∗ 2 )+(m− 2)n 2 qϕ (α,d ∗ 2 ) , d= n 2 qϕ (α,d ∗ 2 ) n 1 qϕ (α,d ∗ 1 )+n 2 pϕ (α,d ∗ 2 )+(m− 2)n 2 qϕ (α,d ∗ 2 ) , e= n 1 qϕ (α,d ∗ 1 ) n 1 qϕ (α,d ∗ 1 )+n 2 pϕ (α,d ∗ 2 )+(m− 2)n 2 qϕ (α,d ∗ 2 ) . We perform row operations on the matrix F T ∗ − λ I m before finding the zeros of the charac- teristic polynomial: det(F T ∗ − λ I m )=0. (3.27) 69 Subtract last row from rows 2,3,...,m− 1 and equation (3.27) becomes det a− λ b b b ... b 0 c− d− λ 0 0 ... − (c− d− λ ) 0 0 c− d− λ 0 ... − (c− d− λ ) . . . . . 
. . . . . . . . . . . . . 0 0 0 ... c− d− λ − (c− d− λ ) e d d ... d c− λ =0. We see now one eigenvalue is λ =c− d, with algebraic multiplicity m− 2. Suppose λ ̸=c− d. Multiply rows 2,3,...,m− 1 by 1/(c− d− λ ), we have det a− λ b b b ... b 0 1 0 0 ... − 1 0 0 1 0 ... − 1 . . . . . . . . . . . . . . . . . . 0 0 0 ... 1 − 1 e d d ... d c− λ =0. Then, subtract multiplies of rows 2,3,...,m− 1 from the first and last row, we have det a− λ 0 0 0 ... (m− 1)b 0 1 0 0 ... − 1 0 0 1 0 ... − 1 . . . . . . . . . . . . . . . . . . 0 0 0 ... 1 − 1 e 0 0 ... 0 c− λ +(m− 2)d =0. (3.28) 70 Computing the determinant (3.28), we see the eigenvalues different from c− d should satisfy the equation (a− λ )(c− λ +(m− 2)d)− e(m− 1)b=0. Note whether m is even or odd does not change the equation. By solving the quadratic equation, we see that the other two eigenvalues are λ =1 and λ =a− e. To get the second largest eigenvalue, we compare a− e and c− d, both being positive. So, in the computation below, |λ 2 | = λ 2 and the absolute value is omitted. We then compare the two candidates a− e and c− d. Denoting the denominators of a and c as D 1 and D 2 respectively, we have D 1 D 2 (a− e− c+d) =n 1 pϕ (α,d ∗ 1 )[n 1 qϕ (α,d ∗ 1 )+n 2 pϕ (α,d ∗ 2 )+(m− 2)n 2 qϕ (α,d ∗ 2 )] − (n 2 (p− q)ϕ (α,d ∗ 2 )+n 1 qϕ (α,d ∗ 1 ))[n 1 pϕ (α,d ∗ 1 )+(m− 1)n 2 qϕ (α,d ∗ 2 )] =(m− 1)n 2 qϕ (α,d ∗ 2 )(p− q)(n 1 ϕ (α,d ∗ 1 )− n 2 ϕ (α,d ∗ 2 )). (3.29) This proves (3.9) in Lemma 3.2.1, since n 1 ϕ (α,d ∗ 1 ) ≥ n 2 ϕ (α,d ∗ 2 ) is equivalent to α ≥ g − 1 (n 1 /n 2 ). To show the monotonicity, we first consider α ≥ g − 1 (n 1 /n 2 ), for which the second largest eigenvalue is λ 2 =a− e = n 1 pϕ (α,d ∗ 1 ) n 1 pϕ (α,d ∗ 1 )+(m− 1)n 2 qϕ (α,d ∗ 2 ) − n 1 qϕ (α,d ∗ 1 ) n 1 qϕ (α,d ∗ 1 )+(n 2 p+(m− 2)n 2 q)ϕ (α,d ∗ 2 ) , (3.30) From this expression, we see that the limit is 0 as α goes to ∞. We rewrite the above expression of λ 2 as λ 2 = 1 1+C 1 g(α ) − 1 1+C 2 g(α ) , (3.31) 71 where C 1 = (m− 1)n 2 q n 1 p , C 2 = n 2 p+(m− 2)n 2 q n 1 q and g(α,d ∗ 1 ,d ∗ 2 ) = ϕ (α,d ∗ 2 ) ϕ (α,d ∗ 1 ) . Since d ∗ 1 and d ∗ 2 are fixed here, we omit them in the notation below and write g(α ). Note that 0<C 1 <C 2 . Then, the derivative with respect to α is ∂λ 2 ∂α =− C 1 dg/dα (1+C 1 g(α )) 2 + C 2 dg/dα (1+C 2 g(α )) 2 = dg dα (C 2 − C 1 )(1− C 1 C 2 g 2 (α )) (1+C 1 g(α )) 2 (1+C 2 g(α )) 2 . (3.32) Property 2 in (3.2) implies that dg dα < 0. Thus, the ∂λ 2 ∂α ≤ 0 for 1− C 1 C 2 g 2 (α )≥ 0, which is equivalent to α ≥ g − 1 1 √ C 1 C 2 =g − 1 n 1 n 2 p (m− 1)(p+(m− 2)q) 1/2 ! . (3.33) Similarly, for α<g − 1 (n 1 /n 2 ), we have λ 2 =c− d= n 2 (p− q)g(α ) n 1 q+n 2 (p+(m− 2)q)g(α ) . (3.34) Then, taking the derivative and simplify, we get ∂λ 2 ∂α = ∂g ∂α n 1 n 2 (p− q)q (n 1 q+n 2 (p+(m− 2)q)g(α )) 2 <0, (3.35) since ∂g ∂α <0. Weareleftwiththespecialcaseof m=2. Weseethatifm=2, thecondition in (3.33) becomes α ≥ g − 1 (n 1 /n 2 ). This completes the computations in Lemma 3.2.1. ■ 3.3.4 Proof of Corollary 3.2.1 Two expressions of the second largest eigenvalue in (3.12) follows directly from equations (3.30) and (3.34) after replacing the corresponding p by p 1 and p 2 and taking the absolute value. 72 To get the threshold of α in (3.12), we repeat the calculation in (3.29). 
After simplifica- tions, we have − n 1 p 1 ϕ (α,d ∗ 1 ) n 1 p 1 ϕ (α,d ∗ 1 )+n 2 qϕ (α,d ∗ 2 ) + n 1 qϕ (α,d ∗ 1 ) n 1 qϕ (α,d ∗ 1 )+n 2 p 2 ϕ (α,d ∗ 2 ) ≥ n 2 (q− p 2 )ϕ (α,d ∗ 2 ) n 1 qϕ (α,d ∗ 1 )+n 2 p 2 ϕ (α,d ∗ 2 ) , if and only if n 2 qϕ (α,d ∗ 2 )[n 1 ϕ (α,d ∗ 1 )(q− p 1 )− n 2 ϕ (α,d ∗ 2 )(q− p 2 )]≥ 0, which is equivalent to α ≥ g − 1 ( n 1 (q− p 1 ) n 2 (q− p 2 ) ). To show the monotoncity result, we repeat the same calculations in equations (3.32)- (3.35) and get that|λ 2 | is monotonically decreasing in the region −∞ ,g − 1 n 1 (q− p 1 ) n 2 (q− p 2 ) [ g − 1 n 1 √ p 1 n 2 √ p 2 ,∞ . This is equivalent to the whole real lineR, since g − 1 n 1 (q− p 1 ) n 2 (q− p 2 ) >g − 1 n 1 √ p 1 n 2 √ p 2 . ■ 3.3.5 Proof of Lemma 3.2.2 Recall that R = IEA(P,n). It is easy to see from the definition of matrices in (3.17) and (3.24) that to compare the eigenvalues λ 2 (T) and λ 2 (T ∗ ), it is equivalent to compare the eigenvalues of D 1/2 2,A D − 1/2 1,A AD 1/2 2,A D − 1/2 1,A and D 1/2 2,R D − 1/2 1,R RD 1/2 2,R D − 1/2 1,R because they are similar to T and T ∗ correspondingly. For simplicity, we let D A =D − 1 2,A D 1,A , D R =D − 1 2,R D 1,R . 73 Then, by Weyl’s inequality, we have |λ 2 (T)− λ 2 (T ∗ )|≤ D − 1/2 A AD − 1/2 A − D − 1/2 R RD − 1/2 R , (3.36) where∥·∥ is the spectral norm. Therefore, it is sufficient to bound the spectral norm on the right and this is done in the proof of Lemma 3.2.2. Due to the assumptions (3.2.2) and (3.2.3) made in Lemma 3.2.2, the expected degrees satisfy: 1<liminf n min i d i (R) max i d i (R) ≤ limsup n max i d i (R) min i d i (R) <∞. which implies that there exists a positive constant C such that for all n, we have min i d i (R)<max i d i (R)<Cmin i d i (R). (3.37) In addition, we have min i (D R ) ii =min i P j r ij ϕ (α,d j (R)) ϕ (α,d i (R)) =min i X j r ij ϕ (α,d j (R)) ϕ (α,d i (R)) ≥ min i d i (R). (3.38) We list two technical lemmas that are used in the following proof of Lemma 3.2.2. The proofs the two lemmas are left at the end of the section. Lemma 3.3.5. ∥A− R∥≤ 3 p nlogn, with probability at least 1− 2 n 2 . Lemma 3.3.6. With the assumptions (3.2.1)-(3.2.3) of Lemma 3.2.2, D − 1/2 A − D − 1/2 R ≤ C p nlogn min i d i (R) − 3/2 , with probability at least 1− 14 n 2 . 74 Then, note that the spectral norm in (3.36) can be split as: D − 1/2 A AD − 1/2 A − D − 1/2 R RD − 1/2 R ≤ D − 1/2 A AD − 1/2 A − D − 1/2 R AD − 1/2 A + D − 1/2 R AD − 1/2 A − D − 1/2 R AD − 1/2 R + D − 1/2 R AD − 1/2 R − D − 1/2 R RD − 1/2 R ≤ D − 1/2 A − D − 1/2 R ·∥A∥· D − 1/2 A + D − 1/2 A − D − 1/2 R ·∥A∥· D − 1/2 R +∥A− R∥· D − 1 R . By triangular inequality, we have ∥A∥≤∥ A− R∥+∥R∥. The spectral norm is bounded above by the Frobenius norm: ∥R∥≤∥ R∥ F = s X i,j r 2 ij ≤ Cnmax i,j p ij ≤ Cmin j d j (R), for all n > 0, where max i,j p ij is the largest entry of the matrix P in the stochastic block model. Then applying Lemma 3.3.5 together with assumption (3.2.1), we get ∥A∥≤ Cmin j d j (R), with probability at least 1− 2 n 2 . Similarly, by Lemma 3.3.6, we have D − 1/2 A ≤ C min j d j (R) − 1/2 , with probability at least 1− 14 n 2 , for n large enough. Conditioning on the event n ∥A− R∥≤ 3 p nlogn o \ D − 1/2 A − D − 1/2 R ≤ C p nlogn min i d i (R) − 3/2 , 75 which takes place with probability at least 1− 16 n 2 , we get D − 1/2 A AD − 1/2 A − D − 1/2 R RD − 1/2 R ≤ C √ nlogn min i d i (R) =C √ logn τ n √ n , (3.39) which finishes the proof. ■ 3.3.6 Proof of Theorem 3.2.3 Recall that in Lemma 3.2.1, we computed |λ 2 (T ∗ (α ))| and ∂|λ 2 (T ∗ (α ))| ∂α . 
Since |λ 2 (T ∗ (α ))| is positive and monotonically decreasing onD α , for α 1 >α 0 , we have |λ 2 (T ∗ (α 0 ))|−| λ 2 (T ∗ (α 1 ))|=λ 2 (T ∗ (α 0 ))− λ 2 (T ∗ (α 1 )). (3.40) By mean value theorem, we have λ 2 (T ∗ (α 0 ))− λ 2 (T ∗ (α 1 ))≥ (α 0 − α 1 ) ∂λ 2 (T ∗ (α )) ∂α α =α 1 . (3.41) We want to find the size of the derivative. Let C 1 = (m− 1)n 2 q n 1 p , C 2 = n 2 p+(m− 2)n 2 q n 1 q . Then, for α ≥ g − 1 (n 1 /n 2 ), using the expression (3.32) in the proof of Lemma 3.2.1, we have ∂λ 2 (T ∗ (α )) ∂α α =α 1 =− C 3 n 2 n 1 , q p , ϕ (α 1 ,d ∗ 2 ) ϕ (α 1 ,d ∗ 1 ) ∂ϕ/∂α (α 1 ,d ∗ 1 ) ϕ (α 1 ,d ∗ 1 ) − ∂ϕ/∂α (α 1 ,d ∗ 2 ) ϕ (α 1 ,d ∗ 2 ) , (3.42) 76 where C 3 is some positive constant which depends on the fractions n 2 n 1 , q p and ϕ (α 1 ,d ∗ 2 ) ϕ (α 1 ,d ∗ 1 ) and it does not depend on n when n is large. By assumption, we have α 1 − α 0 > ˆ C n 2 n 1 , q p , ϕ (α 1 ,d ∗ 2 ) ϕ (α 1 ,d ∗ 1 ) , ˜ C √ logn τ n √ n ∂ϕ/∂α (α 1 ,d ∗ 1 ) ϕ (α 1 ,d ∗ 1 ) − ∂ϕ/∂α (α 1 ,d ∗ 2 ) ϕ (α 1 ,d ∗ 2 ) − 1 , where ˜ C is the constant in Lemma 3.2.2. Combining with (3.40) – (3.42), we get |λ 2 (T ∗ (α 0 ))|−| λ 2 (T ∗ (α 1 ))|> ˆ CC 3 √ logn τ n √ n , (3.43) for α ≥ g − 1 (n 1 /n 2 ). On the other hand, for α<g − 1 (n 1 /n 2 ), using expression (3.35) in the proof of Lemma 3.2.1, we have equation (3.42), for a positive constant C 4 possibly different from C 3 . Similar to (3.43), we then get |λ 2 (T ∗ (α 0 ))|−| λ 2 (T ∗ (α 1 ))|> ˆ CC 4 √ logn τ n √ n , (3.44) for α<g − 1 (n 1 /n 2 ). We then notice that |λ 2 (T(α 0 ))|−| λ 2 (T(α 1 ))|=|λ 2 (T ∗ (α 0 ))|−| λ 2 (T ∗ (α 1 ))|+(|λ 2 (T(α 0 ))|−| λ 2 (T ∗ (α 0 ))|) +(|λ 2 (T ∗ (α 1 ))|−| λ 2 (T(α 1 ))|) >|λ 2 (T ∗ (α 0 ))|−| λ 2 (T ∗ (α 1 ))|+C 5 √ logn τ n √ n , withprobabilityatleast1− 16 n 2 forsomeconstantC 5 ,byLemma3.2.2. Letting ˆ C = (1− C 5 ) min(C 3 ,C 4 ) , we see that the result of Theorem 3.2.3 follows. ■ 3.3.7 Proof of Corollary 3.2.2 The fact that Lemma 3.2.1 and Theorem 3.2.3 holds for ˜ T follows trivially from the proofs of Lemma 3.2.1, Lemma 3.2.2 and Theorem 3.2.3. ■ 77 3.3.8 Proof of Lemma 3.3.5 To bound∥A− R∥, rewrite it as: A− R=Y+ X 1≤ i<j≤ n X i,j , whereX i,j =(A− R)(e i e ⊤ j +e j e ⊤ i ),Y =diag(a 11 − r 11 ,...,a nn − r nn )ande i isthestandard basisofR n . NoteX i,j areindependentwithmeanzeroandtheyareindependentofY,which also has mean zero. Then, the matrix Bernstein inequality, Vershynin (2018) Theorem 5.4.1 implies IP ( Y+ X 1≤ i<j≤ n X i,j ≥ t ) ≤ 2nexp − t 2 /2 σ 2 +Kt/3 , (3.45) where ∥Y∥, X i,j ≤ K =1 and σ 2 = IEY 2 + X 1≤ i<j≤ n IE(X i,j ) 2 . Note (X i,j ) 2 =(a ij − r ij ) 2 (e i e ⊤ i +e j e ⊤ j ) so that IE(X i,j ) 2 =(r ij − r 2 ij )(e i e ⊤ i +e j e ⊤ j ), IEY 2 = n X i=1 (r ii − r 2 ii )e i e ⊤ i . Since each a ij is a Bernoulli random variable, r ij ∈ [0,1]. Therefore, r ij − r 2 ij ≤ 1/4, we see σ 2 ≤ n/4. Let t=3 √ nlogn. By (3.45) we get: IP ( X 1≤ i<j≤ n Y+X i,j ≥ 3 p nlogn ) ≤ 2nexp − 9nlogn 2 n 4 + 3 √ nlogn 3 ! ≤ 2nexp(− 3logn)= 2 n 2 . ■ 78 3.3.9 Proof of Lemma 3.3.6 The proof of Lemma 3.3.6 can be broken into proofs of several lemmas. We present the statements and proofs of these lemmas below. Lemma 3.3.7. Let K >1 be a constant independent of n. Then, for α ≥ 0, we have ϕ (α, min i d i (R))≤ ϕ (α,K min i d i (R))≤ C 1 ϕ (α, min i d i (R)), (3.46) C 2 ∂ϕ ∂d (α, min i d i (R))≤ ∂ϕ ∂d (α,K min i d i (R))≤ C 3 ∂ϕ ∂d (α, min i d i (R)), (3.47) where C 1 ,C 2 ,C 3 are some positive constants independent of n. 
On the other hand, for α< 0, we have C 1 ϕ (α, min i d i (R))≤ ϕ (α,K min i d i (R))≤ ϕ (α, min i d i (R)), (3.48) C 3 ∂ϕ ∂d (α, min i d i (R))≤ ∂ϕ ∂d (α,K min i d i (R))≤ C 2 ∂ϕ ∂d (α, min i d i (R)). (3.49) Proof. We first prove (3.46). ϕ (α, min i d i (R)) ≤ ϕ (α,K min i d i (R)) is by the mono- tonicity in property 1 of ϕ for α ≥ 0. For the other side, we have log ϕ (α,K min i d i (R)) ϕ (α, min i d i (R)) =logϕ (α,K min i d i (R))− logϕ (α, min i d i (R)) =(K− 1)min i d i (R) ∂ ∂d logϕ (α,d 0 ), for some d 0 ∈[min i d i (R),Kmin i d i (R)]. Then, by property 3 in (3.3), we have (K− 1)min i d i (R) ∂ ∂d logϕ (α,d 0 )=(K− 1)min i d i (R) ∂ϕ/∂d (α,d 0 ) ϕ (α,d 0 ) ≤ (K− 1)Kd 0 ∂ϕ/∂d (α,d 0 ) ϕ (α,d 0 ) ≤ C. 79 Then, we have ϕ (α,K min i d i (R)) ϕ (α, min i d i (R)) =exp log ϕ (α,K min i d i (R)) ϕ (α, min i d i (R)) ≤ e C . Theproofof(3.47)isthesameusingproperty3in(3.4). Then, (3.48)and(3.49)areimplied by(3.46)and(3.47),respectively,byreversingalltheinequalities,since∂ϕ/∂d ≤ 0forα< 0. ■ Lemma 3.3.8. For a fixed j, we have |d j (A)− d j (R)|≤ p 2nlogn, (3.50) with probability at least 1− 2 n 4 . In addition, we have |ϕ (α,d j (A))− ϕ (α,d j (R))|≤ p 2nlogn ∂ϕ ∂d (α,c j,1 ) , (3.51) for some c j,1 ∈ d j (R)− √ 2nlogn,d j (R)+ √ 2nlogn for each j =1,...,n, with probabil- ity at least 1− 2 n 4 . Proof. Recall d j (A)= P j i=1 a ij and d j (R)= P j i=1 r ij . Then, by Hoeffding’s inequality, we have IP{|d j (A)− d j (R)|≥ t}=IP ( n X i=1 (a ij − r ij ) ≥ t ) ≤ IP ( n X i=1 (a ij − r ij )≥ t ) +IP ( n X i=1 (r ij − a ij )≥ t ) ≤ 2exp − 2t 2 n . Letting t= √ 2nlogn, we get IP{|d j (A)− d j (R)|≥ p 2nlogn}≤ 2e − 4logn = 2 n 4 , (3.52) 80 which is (3.50). For (3.51), notice that by mean value theorem, we have |ϕ (α,d j (A))− ϕ (α,d j (R))|=|d j (A)− d j (R)| ∂ϕ ∂d (α,c j ) , where c j ∈[d j (R)−| d j (A)− d j (R)|,d j (R)+|d j (A)− d j (R)|]. Then by the previous bound (3.52), we have |ϕ (α,d j (A))− ϕ (α,d j (R))|≤ p 2nlogn ∂ϕ ∂d (α,c j,1 ) , for c j,1 ∈ d j (R)− √ 2nlogn,d j (R)+ √ 2nlogn for each j = 1,...,n, with probability at least 1− 2 n 4 . ■ Lemma 3.3.9. With the assumptions 3.2.1, 3.2.2 and 3.2.3 of Lemma 3.2.2, we have ∥D A − D R ∥≤ C p nlogn, with probability at least 1− 14 n 2 and C is some positive constant independent of n. Proof. Note both D A and D R are diagonal matrices. We first achieve an entrywise bound. For any i=1,...,n, we have P j a ij ϕ (α,d j (A)) ϕ (α,d i (A)) − P j r ij ϕ (α,d j (R)) ϕ (α,d i (R)) ≤ P j a ij ϕ (α,d j (A)) ϕ (α,d i (A)) − P j r ij ϕ (α,d j (R)) ϕ (α,d i (A)) + P j r ij ϕ (α,d j (R)) ϕ (α,d i (A)) − P j r ij ϕ (α,d j (R)) ϕ (α,d i (R)) =X +Y . (3.53) 81 Define d j,− i (A) = d j (A)− a ij . Note d j,− i (A) is independent of a ij . Then, the numerator of the first term X in (3.53) can be split as: X j a ij ϕ (α,d j (A))− X j r ij ϕ (α,d j (A)) ≤ X j a ij (ϕ (α,d j (A))− ϕ (α,d j,− i (A))) + X j (a ij − r ij )ϕ (α,d j,− i (A)) + X j r ij [ϕ (α,d j,− i (A))− ϕ (α,d j,− i (R))] + X j r ij [ϕ (α,d j (R))− ϕ (α,d j,− i (R))] =G+H +J +K. (3.54) Bound for G in (3.54): By mean value theorem and Lemma 3.3.8, |{ϕ (α,d j (A))− ϕ (α,d j,− i (A))}−{ ϕ (α,d j (R))− ϕ (α,d j,− i (R))}| ≤|{ ϕ (α,d j (A))− ϕ (α,d j (R))}−{ ϕ (α,d j,− i (A))− ϕ (α,d j,− i (R))}| ≤ 2max(|ϕ (α,d j (A))− ϕ (α,d j (R))|,|ϕ (α,d j,− i (A))− ϕ (α,d j,− i (R))|) ≤ 2 p 2nlogn ∂ϕ ∂d (α,c j,2 ) , for some c j,2 ∈ d j (R)− √ 2nlogn− 1,d j (R)+ √ 2nlogn for each j =1,...,n, with prob- ability at least 1− 2 n 4 . 
Lemma 3.3.9. Under assumptions 3.2.1, 3.2.2 and 3.2.3 of Lemma 3.2.2, we have
\[
\|D_A - D_R\| \le C \sqrt{n \log n}
\]
with probability at least $1 - 14/n^2$, where $C$ is a positive constant independent of $n$.

Proof. Note that both $D_A$ and $D_R$ are diagonal matrices, so we first establish an entrywise bound. For any $i = 1, \ldots, n$, we have
\[
\left|\frac{\sum_j a_{ij}\, \phi(\alpha, d_j(A))}{\phi(\alpha, d_i(A))} - \frac{\sum_j r_{ij}\, \phi(\alpha, d_j(R))}{\phi(\alpha, d_i(R))}\right|
\le \left|\frac{\sum_j a_{ij}\, \phi(\alpha, d_j(A))}{\phi(\alpha, d_i(A))} - \frac{\sum_j r_{ij}\, \phi(\alpha, d_j(R))}{\phi(\alpha, d_i(A))}\right|
+ \left|\frac{\sum_j r_{ij}\, \phi(\alpha, d_j(R))}{\phi(\alpha, d_i(A))} - \frac{\sum_j r_{ij}\, \phi(\alpha, d_j(R))}{\phi(\alpha, d_i(R))}\right|
= X + Y. \tag{3.53}
\]
Define $d_{j,-i}(A) = d_j(A) - a_{ij}$, and note that $d_{j,-i}(A)$ is independent of $a_{ij}$. Then the numerator of the first term $X$ in (3.53) can be split as
\[
\begin{aligned}
\Big|\sum_j a_{ij}\, \phi(\alpha, d_j(A)) - \sum_j r_{ij}\, \phi(\alpha, d_j(R))\Big|
&\le \Big|\sum_j a_{ij}\, \big(\phi(\alpha, d_j(A)) - \phi(\alpha, d_{j,-i}(A))\big)\Big|
+ \Big|\sum_j (a_{ij} - r_{ij})\, \phi(\alpha, d_{j,-i}(A))\Big| \\
&\quad + \Big|\sum_j r_{ij}\, \big[\phi(\alpha, d_{j,-i}(A)) - \phi(\alpha, d_{j,-i}(R))\big]\Big|
+ \Big|\sum_j r_{ij}\, \big[\phi(\alpha, d_j(R)) - \phi(\alpha, d_{j,-i}(R))\big]\Big| \\
&= G + H + J + K. \tag{3.54}
\end{aligned}
\]

Bound for $G$ in (3.54): By the mean value theorem and Lemma 3.3.8,
\[
\begin{aligned}
\big|\{\phi(\alpha, d_j(A)) - \phi(\alpha, d_{j,-i}(A))\} - \{\phi(\alpha, d_j(R)) - \phi(\alpha, d_{j,-i}(R))\}\big|
&\le \big|\{\phi(\alpha, d_j(A)) - \phi(\alpha, d_j(R))\} - \{\phi(\alpha, d_{j,-i}(A)) - \phi(\alpha, d_{j,-i}(R))\}\big| \\
&\le 2 \max\big(|\phi(\alpha, d_j(A)) - \phi(\alpha, d_j(R))|,\ |\phi(\alpha, d_{j,-i}(A)) - \phi(\alpha, d_{j,-i}(R))|\big) \\
&\le 2 \sqrt{2 n \log n}\, \left|\frac{\partial \phi}{\partial d}(\alpha, c_{j,2})\right|,
\end{aligned}
\]
for some $c_{j,2} \in \big[\,d_j(R) - \sqrt{2 n \log n} - 1,\, d_j(R) + \sqrt{2 n \log n}\,\big]$ for each $j = 1, \ldots, n$, with probability at least $1 - 2/n^4$. Then, by the triangle inequality, we have
\[
|\phi(\alpha, d_j(A)) - \phi(\alpha, d_{j,-i}(A))|
\le |\phi(\alpha, d_j(R)) - \phi(\alpha, d_{j,-i}(R))| + 2\sqrt{2 n \log n}\, \left|\frac{\partial \phi}{\partial d}(\alpha, c_{j,2})\right|
\le \left|\frac{\partial \phi}{\partial d}(\alpha, c_{j,3})\right| + 2\sqrt{2 n \log n}\, \left|\frac{\partial \phi}{\partial d}(\alpha, c_{j,2})\right|,
\]
for some $c_{j,3} \in [d_{j,-i}(R), d_j(R)]$ for each $j = 1, \ldots, n$, with probability at least $1 - 2/n^4$. The bound obtained here is not optimal, but it is more than enough for the proof of the lemma. Consider the event
\[
\Lambda = \bigcap_j \left\{ |\phi(\alpha, d_j(A)) - \phi(\alpha, d_{j,-i}(A))| \le \left|\frac{\partial \phi}{\partial d}(\alpha, c_{j,3})\right| + 2\sqrt{2 n \log n}\, \left|\frac{\partial \phi}{\partial d}(\alpha, c_{j,2})\right| \right\}.
\]
By the union bound (if $\mathbb{P}(E_j) \ge 1 - c_j$ for events $E_1, \ldots, E_n$, then $\mathbb{P}(\cap_j E_j) = 1 - \mathbb{P}(\cup_j E_j^C) \ge 1 - \sum_j \mathbb{P}(E_j^C) \ge 1 - \sum_j c_j$), we have $\mathbb{P}\{\Lambda\} \ge 1 - 2/n^3$. Denote $\mathbb{P}_S\{T\} = \mathbb{P}\{S \cap T\}$ for events $S$ and $T$, and let
\[
t = \left[\, 2 \log n \sum_j \left(\left|\frac{\partial \phi}{\partial d}(\alpha, c_{j,3})\right| + 2\sqrt{2 n \log n}\, \left|\frac{\partial \phi}{\partial d}(\alpha, c_{j,2})\right|\right)^2 \right]^{1/2}.
\]
Then, by Hoeffding's inequality, we have
\[
\begin{aligned}
\mathbb{P}\left\{\Big|\sum_j (a_{ij} - r_{ij})\big(\phi(\alpha, d_j(A)) - \phi(\alpha, d_{j,-i}(A))\big)\Big| \ge t\right\}
&\le \mathbb{P}_\Lambda\left\{\Big|\sum_j (a_{ij} - r_{ij})\big(\phi(\alpha, d_j(A)) - \phi(\alpha, d_{j,-i}(A))\big)\Big| \ge t\right\} + \mathbb{P}\{\Lambda^C\} \\
&\le 2 \exp\!\left(-\frac{4 \log n \sum_j \big(\big|\frac{\partial \phi}{\partial d}(\alpha, c_{j,3})\big| + 2\sqrt{2 n \log n}\, \big|\frac{\partial \phi}{\partial d}(\alpha, c_{j,2})\big|\big)^2}{\sum_j \big(\big|\frac{\partial \phi}{\partial d}(\alpha, c_{j,3})\big| + 2\sqrt{2 n \log n}\, \big|\frac{\partial \phi}{\partial d}(\alpha, c_{j,2})\big|\big)^2}\right) + \frac{2}{n^3} \\
&\le 2 \exp(-4 \log n) + \frac{2}{n^3} \le \frac{4}{n^3}. \tag{3.55}
\end{aligned}
\]

Bound for $H$ in (3.54): By the mean value theorem and Lemma 3.3.8,
\[
|\phi(\alpha, d_{j,-i}(A)) - \phi(\alpha, d_{j,-i}(R))|
\le \sqrt{2 (n-1) \log (n-1)}\, \left|\frac{\partial \phi}{\partial d}(\alpha, c_{j,4})\right|
\le \sqrt{2 n \log n}\, \left|\frac{\partial \phi}{\partial d}(\alpha, c_{j,4})\right|,
\]
for some $c_{j,4} \in \big[\,d_j(R) - \sqrt{2 n \log n} - 1,\, d_j(R) + \sqrt{2 n \log n}\,\big]$ for each $j = 1, \ldots, n$, with probability at least $1 - 2/n^4$. Consider the event
\[
\Gamma = \bigcap_j \left\{ \phi(\alpha, d_{j,-i}(A)) \le \phi(\alpha, d_j(R)) + \sqrt{2 n \log n}\, \left|\frac{\partial \phi}{\partial d}(\alpha, c_{j,4})\right| \right\}.
\]
With exactly the same proof as in Lemma 3.3.8 and the union bound, $\mathbb{P}\{\Gamma\} \ge 1 - 2/n^3$. Let
\[
t = \left[\, 2 \log n \sum_j \left(\phi(\alpha, d_j(R)) + \sqrt{2 n \log n}\, \left|\frac{\partial \phi}{\partial d}(\alpha, c_{j,4})\right|\right)^2 \right]^{1/2}.
\]
Then, by Hoeffding's inequality, we have
\[
\begin{aligned}
\mathbb{P}\left\{\Big|\sum_j (a_{ij} - r_{ij})\, \phi(\alpha, d_{j,-i}(A))\Big| \ge t\right\}
&\le \mathbb{P}_\Gamma\left\{\Big|\sum_j (a_{ij} - r_{ij})\, \phi(\alpha, d_{j,-i}(A))\Big| \ge t\right\} + \mathbb{P}\{\Gamma^C\} \\
&\le 2 \exp\!\left(-\frac{4 \log n \sum_j \big(\phi(\alpha, d_j(R)) + \sqrt{2 n \log n}\, \big|\frac{\partial \phi}{\partial d}(\alpha, c_{j,4})\big|\big)^2}{\sum_j \big(\phi(\alpha, d_j(R)) + \sqrt{2 n \log n}\, \big|\frac{\partial \phi}{\partial d}(\alpha, c_{j,4})\big|\big)^2}\right) + \frac{2}{n^3} \\
&\le 2 \exp(-4 \log n) + \frac{2}{n^3} \le \frac{4}{n^3}. \tag{3.56}
\end{aligned}
\]

Bound for $J$ in (3.54): Under the event $\Gamma$, we have
\[
\Big|\sum_j r_{ij}\, \big[\phi(\alpha, d_{j,-i}(A)) - \phi(\alpha, d_{j,-i}(R))\big]\Big|
\le \sqrt{2 n \log n} \sum_j r_{ij}\, \left|\frac{\partial \phi}{\partial d}(\alpha, c_{j,4})\right|. \tag{3.57}
\]
Thus, the inequality holds with probability at least $1 - 2/n^3$.

Bound for $K$ in (3.54): By the mean value theorem, we have
\[
\Big|\sum_j r_{ij}\, \big[\phi(\alpha, d_j(R)) - \phi(\alpha, d_{j,-i}(R))\big]\Big|
\le \sum_j r_{ij}\, r_{ij}\, \left|\frac{\partial \phi}{\partial d}(\alpha, c_{j,5})\right|
\le \sum_j r_{ij}\, \left|\frac{\partial \phi}{\partial d}(\alpha, c_{j,5})\right|, \tag{3.58}
\]
for some $c_{j,5} \in [d_{j,-i}(R), d_j(R)]$ for each $j = 1, \ldots, n$.

Bound for $X$ in (3.53): The assumptions of Lemma 3.2.2 are applied here. Note that assumptions 3.2.2 and 3.2.3 imply that the $d_j(R)$ are of the same order for all $j$. The same is true for $\phi(\alpha, d_j(R))$ and $\frac{\partial \phi}{\partial d}(\alpha, d_j(R))$ for all $j$, by Lemma 3.3.7. Hence, by assumption 3.2.1 of Lemma 3.2.2 together with property 3 (3.3) of the function $\phi$, we combine the four bounds (3.55)–(3.58) to obtain a bound for the numerator of $X$:
\[
\Big|\sum_j a_{ij}\, \phi(\alpha, d_j(A)) - \sum_j r_{ij}\, \phi(\alpha, d_j(R))\Big| \le C \sqrt{n \log n}\, \phi(\alpha, d_i(R)), \tag{3.59}
\]
with probability at least $1 - 10/n^3$, where $C$ is a fixed constant independent of $n$. By Lemma 3.3.8 and the triangle inequality, we have
\[
\frac{1}{\phi(\alpha, d_i(A))}
\le \frac{1}{\phi(\alpha, d_i(R)) - |\phi(\alpha, d_i(A)) - \phi(\alpha, d_i(R))|}
\le \frac{1}{\phi(\alpha, d_i(R)) - \sqrt{2 n \log n}\, \big|\frac{\partial \phi}{\partial d}(\alpha, c_{i,1})\big|},
\]
with probability at least $1 - 2/n^4$. Therefore, we have
\[
X = \left|\frac{\sum_j a_{ij}\, \phi(\alpha, d_j(A))}{\phi(\alpha, d_i(A))} - \frac{\sum_j r_{ij}\, \phi(\alpha, d_j(R))}{\phi(\alpha, d_i(A))}\right| \le C \sqrt{n \log n},
\]
with probability at least $1 - 12/n^3$.

Bound for $Y$ in (3.53): By Lemma 3.3.8 and the bound for $1/\phi(\alpha, d_i(A))$,
\[
\begin{aligned}
Y &= \left|\frac{\sum_j r_{ij}\, \phi(\alpha, d_j(R))}{\phi(\alpha, d_i(A))} - \frac{\sum_j r_{ij}\, \phi(\alpha, d_j(R))}{\phi(\alpha, d_i(R))}\right|
= \sum_j r_{ij}\, \phi(\alpha, d_j(R))\, \frac{|\phi(\alpha, d_i(A)) - \phi(\alpha, d_i(R))|}{\phi(\alpha, d_i(A))\, \phi(\alpha, d_i(R))} \\
&\le \sum_j r_{ij}\, \phi(\alpha, d_j(R))\, \frac{\sqrt{2 n \log n}\, \big|\frac{\partial \phi}{\partial d}(\alpha, c_{i,0})\big|}{\phi(\alpha, d_i(R)) \big(\phi(\alpha, d_i(R)) - \sqrt{2 n \log n}\, \big|\frac{\partial \phi}{\partial d}(\alpha, c_{i,0})\big|\big)} \\
&\le \frac{C\, d_i(R) \sqrt{n \log n}\, \big|\frac{\partial \phi}{\partial d}(\alpha, c_{i,0})\big|}{\phi(\alpha, d_i(R))}
\le C \sqrt{n \log n}, \tag{3.60}
\end{aligned}
\]
with probability at least $1 - 2/n^4$.

Bound for $X + Y$ in (3.53): Combining (3.59) and (3.60), we obtain the entrywise bound: for any $i$,
\[
\left|\frac{\sum_j a_{ij}\, \phi(\alpha, d_j(A))}{\phi(\alpha, d_i(A))} - \frac{\sum_j r_{ij}\, \phi(\alpha, d_j(R))}{\phi(\alpha, d_i(R))}\right| \le C \sqrt{n \log n},
\]
with probability at least $1 - 14/n^3$. Finally, by the union bound, we have
\[
\|D_A - D_R\| \le C \sqrt{n \log n},
\]
with probability at least $1 - 14/n^2$. ■
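The entrywise quantity controlled in Lemma 3.3.9 can also be simulated. The sketch below is illustrative only: it assumes the concrete weight function $\phi(\alpha, d) = d^{\alpha}$ (just one example of a function with the properties used above) and a simple two-block $R$; the lemma's bound carries an unspecified constant $C$, so the comparison with $\sqrt{n \log n}$ is only qualitative.

import numpy as np

# Illustrative check of Lemma 3.3.9: the diagonal entries
#   (D_A)_ii = sum_j a_ij * phi(alpha, d_j(A)) / phi(alpha, d_i(A))
# should stay within roughly sqrt(n log n) of their population counterparts.
# Assumptions for this sketch only: phi(alpha, d) = d**alpha, two-block R.
rng = np.random.default_rng(2)
n, alpha, p, q = 1000, 0.5, 0.4, 0.1
n1 = int(0.7 * n)
R = np.full((n, n), q)
R[:n1, :n1] = p
R[n1:, n1:] = p

upper = np.triu(rng.random((n, n)) < R, k=1).astype(float)
A = upper + upper.T
np.fill_diagonal(A, (rng.random(n) < np.diag(R)).astype(float))

def weighted_diag(M):
    d = M.sum(axis=1)          # degrees d_i(M)
    w = d ** alpha             # phi(alpha, d_i(M)) under the assumed phi
    return (M @ w) / w         # entries sum_j M_ij phi(d_j(M)) / phi(d_i(M))

dev = np.max(np.abs(weighted_diag(A) - weighted_diag(R)))
print(f"max_i |(D_A)_ii - (D_R)_ii| = {dev:.1f},  sqrt(n log n) = {np.sqrt(n * np.log(n)):.1f}")

The printed deviation is expected to be of the same order as, or smaller than, $\sqrt{n \log n}$ for graphs of this density, in line with the lemma.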
We now finish the proof of Lemma 3.3.6 by applying Lemma 3.3.9. Rewrite the left-hand side as
\[
\begin{aligned}
\big\|D_A^{-1/2} - D_R^{-1/2}\big\|
&= \big\|\big(D_A^{-1} - D_R^{-1}\big)\big(D_A^{-1/2} + D_R^{-1/2}\big)^{-1}\big\|
= \big\|(D_A - D_R)(D_A D_R)^{-1}\big(D_A^{-1/2} + D_R^{-1/2}\big)^{-1}\big\| \\
&\le \|D_A - D_R\| \cdot \big\|D_A^{-1}\big\| \cdot \big\|D_R^{-1}\big\| \cdot \big\|\big(D_A^{-1/2} + D_R^{-1/2}\big)^{-1}\big\|.
\end{aligned}
\]
Conditioning on the event $\|D_A - D_R\| \le C \sqrt{n \log n}$, we have
\[
\begin{aligned}
\big\|\big(D_A^{-1/2} + D_R^{-1/2}\big)^{-1}\big\|
&= \max_i \big((D_A)_{ii}^{-1/2} + (D_R)_{ii}^{-1/2}\big)^{-1}
= \Big(\min_i \big((D_A)_{ii}^{-1/2} + (D_R)_{ii}^{-1/2}\big)\Big)^{-1} \\
&\le \Big(\min_i \Big((D_R)_{ii}^{-1/2} + \big((D_R)_{ii} + |(D_A)_{ii} - (D_R)_{ii}|\big)^{-1/2}\Big)\Big)^{-1}
\le C \Big(\min_i (D_R)_{ii}^{-1/2}\Big)^{-1} \\
&\le C \max_i \sqrt{d_i(R)} \le C \min_i \sqrt{d_i(R)}.
\end{aligned}
\]
Similarly,
\[
\big\|D_A^{-1}\big\| = \Big(\min_i (D_A)_{ii}\Big)^{-1} \le C \Big(\min_i d_i(R)\Big)^{-1}.
\]
Combining these bounds and applying Lemma 3.3.9, we have
\[
\big\|D_A^{-1/2} - D_R^{-1/2}\big\| \le C \sqrt{n \log n}\, \Big(\min_i d_i(R)\Big)^{-3/2},
\]
with probability at least $1 - 14/n^2$. ■
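The monotonicity statement of Theorem 3.2.3 can be explored numerically. The sketch below is an illustration only and makes assumptions that are not taken from the text: the weight function is taken to be $\phi(\alpha, d) = d^{\alpha}$, and the updating matrix is taken to be the row-stochastic matrix with entries proportional to $a_{ij}\, \phi(\alpha, d_j(A))$, so that a larger $\alpha$ places more weight on higher-degree neighbours; the block sizes and edge probabilities are arbitrary example values.

import numpy as np

# Exploratory sketch for Theorem 3.2.3 (illustration only; the construction of
# T(alpha) below is an assumption made for this example, not the text's definition).
#   phi(alpha, d) = d**alpha
#   T(alpha)_ij  = a_ij * phi(alpha, d_j(A)) / sum_k a_ik * phi(alpha, d_k(A))
rng = np.random.default_rng(3)
n, p, q = 800, 0.3, 0.03
n1 = int(0.75 * n)                          # unequal blocks -> heterogeneous degrees
R = np.full((n, n), q)
R[:n1, :n1] = p
R[n1:, n1:] = p

upper = np.triu(rng.random((n, n)) < R, k=1).astype(float)
A = upper + upper.T

def second_eigenvalue_magnitude(alpha):
    d = A.sum(axis=1)
    w = d ** alpha                          # phi(alpha, d_j(A))
    W = A * w                               # broadcasts over columns: a_ij * phi(d_j)
    T = W / W.sum(axis=1, keepdims=True)    # row-stochastic updating matrix T(alpha)
    ev = np.sort(np.abs(np.linalg.eigvals(T)))[::-1]
    return ev[1]                            # second largest eigenvalue in magnitude

for alpha in [-1.0, 0.0, 1.0, 2.0]:
    print(f"alpha = {alpha:+.1f}:  |lambda_2(T(alpha))| = {second_eigenvalue_magnitude(alpha):.4f}")

On a single finite sample the printed values are only indicative; the theorem concerns the population matrix $T^*(\alpha)$ and asserts monotonicity of $|\lambda_2|$ on the region $\mathcal{D}_\alpha$ in the large-$n$ regime.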
References

Abbe, E. (2017). Community detection and stochastic block models: recent developments. The Journal of Machine Learning Research, 18(1):6446–6531.

Abbe, E., Bandeira, A. S., and Hall, G. (2015). Exact recovery in the stochastic block model. IEEE Transactions on Information Theory, 62(1):471–487.

Acemoglu, D., Dahleh, M. A., Lobel, I., and Ozdaglar, A. (2011). Bayesian learning in social networks. Review of Economic Studies, 78(4):1201–1236.

Arnold, L. (1967). On the asymptotic distribution of the eigenvalues of random matrices. Journal of Mathematical Analysis and Applications, 20(2):262–268.

Arnold, L. (1971). On Wigner's semicircle law for the eigenvalues of random matrices. Probability Theory and Related Fields, 19(3):191–198.

Bai, Z. and Silverstein, J. W. (2010). Spectral Analysis of Large Dimensional Random Matrices, volume 20. Springer.

Bao, Z., Ding, X., and Wang, K. (2021). Singular vector and singular subspace distribution for the matrix denoising model. The Annals of Statistics, 49(1):370–392.

Billingsley, P. (2008). Probability and Measure. John Wiley & Sons.

Bordenave, C. and Guionnet, A. (2013). Localization and delocalization of eigenvectors for heavy-tailed random matrices. Probability Theory and Related Fields, 157(3-4):885–953.

Bourgade, P. and Yau, H.-T. (2017). The eigenvector moment flow and local quantum unique ergodicity. Communications in Mathematical Physics, 350(1):231–278.

Capitaine, M. and Donati-Martin, C. (2021). Non universality of fluctuations of outlier eigenvectors for block diagonal deformations of Wigner matrices. Latin American Journal of Probability and Mathematical Statistics, 18:129.

Chatterjee, S. and Seneta, E. (1977). Towards consensus: Some convergence theorems on repeated averaging. Journal of Applied Probability, 14(1):89–97.

De Bruijn, N. G. (1981). Asymptotic Methods in Analysis, volume 4. Courier Corporation.

Decelle, A., Krzakala, F., Moore, C., and Zdeborová, L. (2011). Asymptotic analysis of the stochastic block model for modular networks and its algorithmic applications. Physical Review E, 84(6):066106.

DeGroot, M. H. (1974). Reaching a consensus. Journal of the American Statistical Association, 69(345):118–121.

Dekel, Y., Lee, J. R., and Linial, N. (2007). Eigenvectors of random graphs: Nodal domains. In Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques, pages 436–448. Springer.

DeMarzo, P., Vayanos, D., and Zwiebel, J. (2003). Persuasion bias, social influence, and unidimensional opinions. The Quarterly Journal of Economics, 118:909–968.

Erdős, L., Schlein, B., and Yau, H.-T. (2009). Local semicircle law and complete delocalization for Wigner random matrices. Communications in Mathematical Physics, 287(2):641–655.

Erdős, P. and Rényi, A. (1960). On the evolution of random graphs. Publ. Math. Inst. Hung. Acad. Sci, 5(1):17–60.

Fan, J., Fan, Y., Han, X., and Lv, J. (2020). Asymptotic theory of eigenvectors for random matrices with diverging spikes. Journal of the American Statistical Association, 0(0):1–14.

Fan, J., Fan, Y., Han, X., and Lv, J. (2021). SIMPLE: Statistical inference on membership profiles in large networks. arXiv preprint arXiv:1910.01734.

Friedkin, N. and Johnsen, E. (1999). Social influence networks and opinion change. Advances in Group Processes, 16.

Gale, D. and Kariv, S. (2003). Bayesian learning in social networks. Games and Economic Behavior, 45(2):329–346.

Golub, B. and Jackson, M. (2012). How homophily affects the speed of learning and best response dynamics. The Quarterly Journal of Economics, 127.

Han, X. and Tong, X. (2021). Individual-centered partial information in social networks. arXiv preprint arXiv:2010.00729.

Holland, P. W., Laskey, K. B., and Leinhardt, S. (1983). Stochastic blockmodels: First steps. Social Networks, 5(2):109–137.

Jackson, M. and Golub, B. (2010). Naïve learning in social networks and the wisdom of crowds. American Economic Journal: Microeconomics, 2:112–49.

Levin, D. A. and Peres, Y. (2017). Markov Chains and Mixing Times, volume 107. American Mathematical Soc.

Lobel, I. and Sadler, E. (2015). Information diffusion in networks through social learning. Theoretical Economics, 10(3):807–851.

Meyer, C. D. (2000). Matrix Analysis and Applied Linear Algebra, volume 71. SIAM.

Mossel, E., Sly, A., and Tamuz, O. (2015). Strategic learning and the topology of social networks. Econometrica, 83(5):1755–1794.

Mueller-Frank, M. (2013). A general framework for rational learning in social networks. Theoretical Economics, 8(1).

Mueller-Frank, M. (2014). Does one Bayesian make a difference? Journal of Economic Theory, 154(C):423–452.

Rohe, K., Chatterjee, S., and Yu, B. (2011). Spectral clustering and the high-dimensional stochastic blockmodel. The Annals of Statistics, 39(4):1878–1915.

Rudelson, M. and Vershynin, R. (2015). Delocalization of eigenvectors of random matrices with independent entries. Duke Mathematical Journal, 164(13):2507–2538.

Rudelson, M. and Vershynin, R. (2016). No-gaps delocalization for general random matrices. Geometric and Functional Analysis, 26(6):1716–1776.

Stanley, R. P. (2011). Enumerative Combinatorics, Volume 1, second edition. Cambridge Studies in Advanced Mathematics.

Tang, M. and Priebe, C. E. (2018). Limit theorems for eigenvectors of the normalized Laplacian for random graphs. The Annals of Statistics, 46(5):2360–2415.

Tao, T. (2012). Topics in Random Matrix Theory, volume 132. American Mathematical Soc.

Tracy, C. A. and Widom, H. (1994). Level-spacing distributions and the Airy kernel. Communications in Mathematical Physics, 159(1):151–174.
Tracy, C. A. and Widom, H. (1996). On orthogonal and symplectic matrix ensembles. Communications in Mathematical Physics, 177(3):727–754.

Vershynin, R. (2018). High-Dimensional Probability: An Introduction with Applications in Data Science. Cambridge Series in Statistical and Probabilistic Mathematics. Cambridge University Press.

Vu, V. and Wang, K. (2015). Random weighted projections, random quadratic forms and random eigenvectors. Random Structures & Algorithms, 47(4):792–821.

Wang, Y. J. and Wong, G. Y. (1987). Stochastic blockmodels for directed graphs. Journal of the American Statistical Association, 82(397):8–19.

Wigner, E. P. (1955). Characteristic vectors of bordered matrices with infinite dimensions. Annals of Mathematics, pages 548–564.
Abstract
Random graphs, which connect graph theory and probability theory, are studied in mathematics and many related subjects. Many modern problems involve the analysis of graphs of very large size, and a mathematical understanding of such graphs has become important for both theory and applications. In this dissertation, we explore two related problems. The first problem, motivated by Han and Tong (2021), focuses on a graph structure named the vertex-centered subgraph. We establish asymptotic joint normality for entries of eigenvectors of the corresponding random adjacency matrix by computing the asymptotic expansion of the entries of the eigenvectors, followed by an application of the Lyapunov central limit theorem. In the second problem, we study the convergence speed of the DeGroot model on random networks modeled by the stochastic block model. We generalize the updating rule, represented by a row-stochastic matrix T, to be degree-weighted, and capture the change in the updating rule by a parameter α ∈ ℝ such that a greater α places higher weights on vertices of higher degrees. By deriving a concentration inequality for the second largest eigenvalue in magnitude of the matrix T, we establish monotonicity, showing that the convergence speed increases as α increases in some region $\mathcal{D}_{\alpha}$.