Applications of Stein's Method on Statistics of Random Graphs

by Radoslav Marinov

A Dissertation Presented to the FACULTY OF THE USC GRADUATE SCHOOL, UNIVERSITY OF SOUTHERN CALIFORNIA, in Partial Fulfillment of the Requirements for the Degree DOCTOR OF PHILOSOPHY (Applied Mathematics). August 2013. Copyright 2013 Radoslav Marinov.

Abstract

We look at the background of Stein's Method, its ties with the field of Markov Chains, and the connections of both of those subjects with the Neighbourhood Attack Voter-type model. On the basis of that background, we provide two examples of applications of Stein's Method, one in the context of a pair of Voter Models, and one in the context of the uniform recursive tree.

Table of Contents

Part I Background
Chapter 1 Stein's Method
1.1 Introduction to Stein's Method
1.1.1 Preliminaries
1.1.2 A few words regarding metrics
1.2 Stein's Method and the Gaussian Distribution
1.2.1 Setup
1.2.2 A sum of independent random variables
1.2.3 Dependency Neighborhoods
1.2.4 Exchangeable pairs
1.3 Stein's Method and the Poisson Distribution
1.3.1 Setup
1.3.2 A basic example - sums of i.i.d. indicators
1.3.3 Size-biasing, with an example
Chapter 2 Markov Chains
2.1 Overview of Markov Chains
2.1.1 Definitions
2.1.2 Examples
Chapter 3 "Voter" and Other Models on Graphs
3.1 Overview of Voter-type Models
3.2 Voter-models, Markov Chains, and Stein's Method
3.2.1 Other aspects of voter models
Part II Applications
Chapter 4 Counting Vertices in the Neighbourhood Attack Model Given Abundance of Symmetry
4.1 Introduction
4.1.1 Problem:
4.1.2 Approach and some background:
4.1.3 Setup:
4.1.4 Results:
4.2 Preliminaries
4.2.1 Proving E(W'|W) = (1 - λ)W:
4.2.2 Röllin's Result:
4.3 Calculations and Proofs
4.3.1 Bounding the Variance of Y
4.3.2 Reducing and bounding Var E[(Y' - Y)^2 | Y] Part 1:
4.3.3 Reducing and bounding Var E[(Y' - Y)^2 | Y] Part 2:
4.4 Conclusions and Questions
4.4.1 Examples concerning specific families of graphs
4.5 Further Questions
Chapter 5 A More General Case of the Neighborhood Attack Model; the Mercurial Model
5.1 Results
5.2 Neighborhood Attack: A More General Case
5.3 The Mercurial Voter Model
5.3.1 λ and Var(Y)
5.3.2 Upper bound to Var E[(Y' - Y)^2 | Y]
5.4 Relaxing some conditions in the Mercurial Model
Chapter 6 A Stein's Method Application to the Uniform Recursive Tree
6.1 Definitions
6.2 Problem
6.3 Result
6.4 Coupling in d = 1 case
6.5 Bound in d = 1 case
6.6 d > 1 Case; Coupling
6.7 The Bounds in d > 1 case
6.8 The λ Term
6.9 Numerical Results
6.10 Pictures
6.11 Bounding max p_i
6.12 The Overall Bound; the Bivariate Poisson Distribution
Bibliography

Part I BACKGROUND

In this part, we review the established work which comprises the foundation of the results presented in Part II. In particular, we survey: Stein's Method; Markov Chains; and Voter-type models.

1 Stein's Method

Charles Stein (b. 1920) of Stanford University developed his method in the early 1970s. He came up with the ideas we will examine shortly while attempting to prove the Combinatorial Central Limit Theorem in a manner closer to his liking than the one he found in a paper he was reading. Stein first published his results in [57].

In effect, Stein's Method provides an infrastructure for the estimation of the distances between certain classes of random variables and certain classic distributions, most notably the Gaussian and the Poisson distributions. For practical purposes, we can break Stein's Method into three key steps: first, one uses Stein's identities to establish a bound on the distance between a class of random variables and a specific distribution expected to be close to the given class; second, one satisfies the conditions generated in the preceding step; and third, one evaluates the acquired bound. The last step typically involves something along the lines of reducing an expression involving a function of the variance of the given random variable. Details and examples follow below.

1.1 Introduction to Stein's Method

For a recent introductory-level overview of Stein's Method, refer to [53]. Also useful is [13]. For a recent expert-level - though still clear and accessible - book on applications of Stein's Method in relation to the Gaussian distribution, refer to [18]. For a classic book-length presentation of Stein's Method as applied to the Poisson distribution, refer to [9]. Unless otherwise indicated, proofs of the results presented in this section can be found in the references listed above. The first two references contain proofs of the important general results, while the other two contain proofs of the respective Gaussian- and Poisson-related specific results.

1.1.1 Preliminaries

Lemma 1.1.1.
For the functional operator A dened as Af(x) = f 0 (x)xf(x), 1) EAf(Z) = 0 for all absolutely continuous f with Ejf 0 (Z)j<1, where Z is the standard normal; and 2) if for some random variable W we have EAf(W ) = 0 for all absolutely continuous functions f withjf 0 j <1, then W has the standard normal distribution. This rst lemma goes along with the following second lemma: 4 Lemma 1.1.2. If (x) is the c.d.f. of the standard normal, then the unique bounded solution f x of f 0 x (w)wf x (w) = 1 wx (x) (1.1.1) is f x (w) =e w 2 =2 Z 1 w e t 2 =2 ((x) 1 tx )dt =e w 2 =2 Z 1 w e t 2 =2 ((x) 1 tx )dt: In other words, given a reasonably nice function f,E(f 0 (W )xf(W )) is close to zero if and only ifW is close to the standard normal. Moreover, we can quantify the proximity between W and the standard normal via Lemma 1.1.2. We have: Corollary 1.1.3. Given f x dened as in Lemma 1.1.2, we have for all random variables W , jP (Wx) (x)j =jE[f 0 x (W )Wf x (W )]j: (1.1.2) Hence, bounding the distance to the normal for specic classes of W amounts to bounding the right-hand-side of 1.1.2. 5 1.1.2 A few word regarding metrics A \distance" only has meaning in the context of a metric. Specically, given two probability measures and , we dene the distance between them as follows: d H (;) = sup h2H Z h(x)d(x) Z h(x)d(x) ; (1.1.3) whereH is a family of \test" functions. For r.v.'s (random variables) X, Y , pos- sessing laws and , we regard d H (X;Y ) as equivalent to d H (;). There are three standard metrics on which we will focus: ForH =f1 x : x2 Rg, we have the Kolmogorov metric: d K . That is the maximum distance between the distribution functions. Convergence in this metric implies weak convergence. ForH =fh :R!R :jh(x)h(y)jjxyjg we have the Wasserstein metric: d W . This metric is useful when one works with continuous distributions. ForH =f1 2A :A2 Borel(R)g, we have the total variation metric: d TV . The total variation distance is congenial to discrete distributions; it is also useful for continuous distributions. Moreover, Proposition 1.1.4. (Prop 1.2 in [53].) Under the above notation, 1) for r.v.'s X and Y , d K (X;Y )d TV (X;Y ); 6 2) if the r.v. X has Lebesgue density bounded by C, then for all r.v.'s Y we have d K (X;Y ) p 2Cd W (X;Y ); and 3) for X and Y discrete on the space , d TV (X;Y ) = 1 2 X !2 jP (X =x)P (Y =y)j: In practical terms, this means that bounds in the Wasserstein and TV met- rics also yield (somewhat inferior) bounds in the Kolmogorov metric. Hence, we work with the Wasserstein metric when trying to bound distances to continuous distributions (specically the Gaussian), and with Total Variation when targeting distances to discrete distributions (particularly the Poisson). 1.2 Stein's Method and the Gaussian Distribution 1.2.1 Setup Withd H (X;Y ) = sup h2H jEh(X)Eh(Y )j for r.v.'sX,Y , and givenh2H, letf h solve f 0 h (w)wf h (w) =h(w) (h) where (h) =Eh(Z), Z being the standard normal. Then, drawing on the results presented above, it follows that: 7 Proposition 1.2.1. If W is a random variable and Z has the standard normal distribution, then d H (W;Z) = sup h2H jE[f 0 h (W )Wf h (W )]j: (1.2.1) To bound the right-hand-side of 1.2.1 we need to look more closely at W and f h . As to the latter, Lemma 1.2.2. Let f h be the solution to f 0 h (w)wf h (w) = h(w) (h). If h is bounded, then 1) jf h j r 2 jh() (h)j; and jf 0 h jh() (h)j: 2) If h is absolutely continuous, then jf h j 2jh 0 j; jf 0 h j r 2 jh 0 j; jf 00 h j 2jh 0 j: Since, by 1.1.4, for W an r.v. 
and Z a standard normal we have d K (W;Z) 2 1 4p d W (W;Z); we opt for the Wasserstein over the Kolmogorov distance in the context of the application of Stein's Method in regard to the Gaussian distribution. One key reason for preferring the Wasserstein metric is that for Wasserstein test functions h, the crucial f h function has two bounded derivatives (by Theorem 1.2.2); while 8 h-Kolmogorov guarantees only one bounded derivative for f h . The signicance of this will become clear below. Theorem 1.2.3. SupposeW is an r.v. andZ has the standard normal distribution. Further, letF =ff :jfj;jf 00 j 2;jf 0 j p 2=g. Then d W (W;Z) sup f2F jE[f 0 (W )Wf(W )]j: (1.2.2) We now possess sucient infrastructure to consider specic examples of appli- cations of Stein's Method. 1.2.2 A sum of independent random variables Drawing on the above results, we claim that: Theorem 1.2.4. (Theorem 3.2 in [53]; also see Ch.3 and Ch.4.1 of [18]) Let X 1 ;:::;X n be independent mean-zero r.v.'s with EjX i j 4 <1 and EX 2 i = 1. For W = P i X i p n and Z-standard-normal, we have d W (W;Z) 1 n 3=2 n X i=1 EjX i j 3 + p 2 p n v u u t n X i=1 E[X 4 i ]: In particular, if the X i 's are identically distributed, we have d W (W;Z) (A +B)n 1=2 ; A =EjX 1 j 3 ; B = p 2E[X 4 1 ] p : (1.2.3) 9 In comparison, the classic Berry-Esseen theorem produces a similar result, the key dierences being: 1) that Berry-Esseen only requires EjX 1 j 3 <1 (as opposed to EjX 1 j 4 <1); and 2) in the coecients. In regard to the rst point, one can drop theEjX 1 j 4 <1 condition and restrict oneself to the Barry-EsseenEjX 1 j 3 <1 condition within the context of Stein's Method (See Lecture 3 in [13]). Proof. Let p nW i = P i6=j X i : Then X i is independent of W i and E[X i f(W i )] = 0. It follows that E[Wf(W )] =E " 1 p n X i X i f(W ) # = =E " 1 p n X i (X i f(W )X i f(W i )) # = =E " 1 p n X i X i (f(W )f(W i ) (WW i )f 0 (W )) # + +E " 1 p n X i X i (WW i )f 0 (W ) # Next, jE [f 0 (W )Wf(W )]j E " 1 p n X i X i (f(W )f(W i ) (WW i )f 0 (W )) # + + E " f 0 (W ) 1 1 p n X i X i (WW i ) !# 10 Using Taylor's expansion and the triangle inequality, we nd that E " 1 p n X i X i (f(W )f(W i ) (WW i )f 0 (W )) # jf 00 j 2 p n X i EjX i (WW i ) 2 j: Via Cauchy-Shwartz, we get E " f 0 (W ) 1 1 p n X i X i (WW i ) !# jf 0 j n Ej X i (1X 2 i )j jf 0 j n s Var( X i X 2 i ): Finally, combining the results, and noting that theX i 's are i.i.d. and that Var(X 2 i ) E[X 4 i ], we obtain jE[f 0 (W )Wf(W )]j jf 00 j 2n 3=2 X i EjX 3 i j + jf 0 j n s X i E[X 4 i ] (1.2.4) as desired. One can build generally applicable theorems on the basis of the above argument. For specic examples, consult [18]. 11 1.2.3 Dependency Neighborhoods We now look at what happens when one introduces some dependency into the model. For the complete details, consult Chapter 4.7 of [18], Chapter 3.2 of [53], and [20] (also see the original results found in [6] and [7]). Below are three forms of a local dependency neighborhood bound, ordered from most complicated (and powerful) to most accessible. Denition 1.2.5. LetJ be a nite index set of cardinality n, and letfX i ;i2Jg be an indexed collection of mean-zero nite-variance random variables. For W = P i2J X i and Var(W ) = 1, for any AJ dene A C =fj2J : j62 Ag and X A =fX i :i2Ag.jAj is the cardinality of A. Denition 1.2.6. We dene a condition for local dependence (LD1): For each i2J , there exists an A i J such that X i and X A C i are independent. Theorem 1.2.7. 
(Version 1, [20]) Under (LD1), we obtain the bound sup z jF (z) (z)jr 1 + 4r 2 + 8r 3 +r 4 + 4:5r 5 + 1:5r 6 ; where r 1 =Ej X i2J (X i Y i EX i Y i )j; r 2 = X i2J EjX i Y i j1 jY i j>1 ; r 3 = X i2J EjX i j(Y 2 i ^ 1); r 4 = X i2J EfjWX i j(Y 2 i ^ 1)g; 12 r 5 = Z jtj1 Var( b K(t))dt; r 6 = ( Z jtj1 jtj Var( b K(t))dt) 1=2 ; b K(t) = X i2J b K i (t); b K i (t) =X i f1 Y i t<0 1 0tY i g; K(t) =E b K(t) = X i2J K i (t); K i (t) =E b K i (t): Theorem 1.2.8. (Version 2, Chapter 4.7 of [18]) Dene N i = P j2A i X i . Then, given (LD1), we have: kL(W )L(Z)k 1 r 2 E X i2J fX i N i E(X i N i )g + X i2J EjX i N 2 i j: Theorem 1.2.9. (Version 3, Chapter 3.2 of [53]) Suppose the X i 's have nite fourth moments and zero means, and let 2 = Var( P i X i ) and W = P i X i =. Let N i be the dependency neighborhoods around the nodes X i . Let D be the maximal cardinality of the N i 's. Then: d W (W;Z) D 2 3 n X i=1 EjX i j 3 + p 28D 3=2 p 2 v u u t n X i=1 E[X 4 i ]: (1.2.5) We provide an application of local dependency in Part II. 1.2.4 Exchangeable pairs We consider an introduction to the exchangeable-pairs technique for Stein's Method. Charles Stein wrote of exchangeable pairs already in 1986 ([58]). We rely on the 13 sources mentioned above; for a recent extended summary also see Nathan Ross's doctoral dissertation on the subject, [52]. Denition 1.2.10. The ordered pair (W;W 0 ) of random variables is called an exchangeable pair if (W;W 0 ) = d (W 0 ;W ). Moreover, if we also have (linearity condition) E[W 0 jW ] = (1)W (1.2.6) for some 0< 1, then we call (W;W 0 ) a -Stein pair. Next, Proposition 1.2.11. Given an exchangeable pair (W;W 0 ), we have: 1) IfF :R 2 ! R is anti-symmetric (F (x;y) =F (y;x)), then E[F (W;W 0 )] = 0. 2) If the pair (W;W 0 ) is not only exchangeable but -Stein with Var(W ) = 2 , then E[W ] = 0 and E[(W 0 W ) 2 ] = 2 2 . On the basis of these key results, we obtain the following bound: Theorem 1.2.12. (Theorem 3.9 in [53]) If (W;W 0 ) is-Stein withE[W 2 ] = 1 and Z is of the standard normal distribution, then d W (W;Z) p Var(E[(W 0 W ) 2 jW ]) p 2 + EjW 0 Wj 3 3 : (1.2.7) Thus, if we can construct an appropriate W 0 on the same space as W , we can obtain a bound. In eect, the key steps in applications of exchangeable pairs to Stein's method are: 1) The construction of the pair, along with the proofs 14 for the presence of the exchangeability and linearity conditions; and 2) the often challenging reduction or bounding of the Var(E[(W 0 W ) 2 jW ]) term. (Typically, the non-conditional term EjW 0 Wj 3 is tractable.) So how do we construct a-Stein pair? It turns out that givenf - an appropriate function on the state space of a stationary reversible Markov Chain - we can often construct a - Stein pair by letting W and W 0 be the values of f at consecutive turns of the Markov Chain. We discuss this approach in the respective section further below. For now, consider the following basic example: Example 1.2.13. Let X 1 ;:::;X n be mean-0-variance-1 independent random variables with nite fourth moments. Dene W as W = P i X i p n . To construct W 0 , uniformly-at-random choose an integerI fromf1;:::;ng, produce a (X 0 1 ;X 0 2 ;:::;X 0 n ) of the same distribu- tion as and independent of (X 1 ;:::;X n ), and let W 0 =W X I p n + X 0 I p n : W and W 0 are clearly exchangeable. To show that the linearity condition holds, observe that E[W 0 WjW ]E[W 0 Wj(X 1 ;:::;X n )]; (1.2.8) sincefX i g i=1;:::;n is a sigma-algebra ofW . 
If we can show thatE[W 0 Wj(X 1 ;:::;X n )] equalsW , we can use the fact that E[E[W 0 Wj(X 1 ;:::;X n )]jW ] = E[W 0 15 WjW )] to procure our desired result. This is a standard trick which recurs across the literature. Now, E[W 0 Wj(X 1 ;:::;X n )] = 1 p n E[X 0 I X I j(X 1 ;:::;X n )] = 1 p n X i 1 n E[X 0 i X i j(X 1 ;:::;X n )] = 1 n X i X i p n = W n : It follows that (W;W 0 ) is -Stein with = 1=n. To get our rate, we need to bound Var(E[(W 0 W ) 2 jW ]) andEjW 0 Wj 3 (from 1.2.12). In regard to the latter: EjW 0 Wj 3 = 1 n 3=2 X i EjX i X 0 i j 3 8 n 3=2 X i EjX i j 3 Next, E[(W 0 W ) 2 j(X 1 ;:::;X n )] = 1 n 2 X i E[(X 0 i X i ) 2 jX i ] = 1 n 2 X i (1 +X 2 i ): 16 We take variance to obtain Var(E[(W 0 W ) 2 jW ]) 1 n 4 X i E[X 4 i ]: In sum, we have the following bound on the distance between W and the standard normal: d W (W;Z) r 2 p P i E[X 4 i ] 2n + 2 3n X i EjX i j 3 : (1.2.9) If theX i 's are i.i.d., this translates to an ordern 1=2 distance, which is the optimal (in order) distance. 1.3 Stein's Method and the Poisson Distribution For a thorough overview of the subject, consult [9]. [16] comprises a useful recent survey. [3] is a classic paper on the subject. Herein, I continue to draw on [53]. 1.3.1 Setup It is natural to suspect that if Stein's Method works when applied to the Gaussian distribution, one can adapt it to other distributions. That is indeed the case. In regard the the Poisson distribution, observe that: Lemma 1.3.1. For > 0 (so that it can be the parameter of a Poisson variable), dene A as follows - Af(k) =f(k + 1)kf(k): (1.3.1) 17 We have: 1) If Z is Poisson(), then EAf(Z) = 0 for all bounded f. 2) If, for a non-negative integer-valued r.v. W , EAf(W ) = 0 for all bounded f, then W is Poisson(). The operatorA is called the characterizing operator of the Poisson distribution. Next, Lemma 1.3.2. Let P denote probabilities drawn from a Poisson() distribution; and let Af0; 1; 2;:::g. The unique solution f A of f A (k + 1)kf A (k) = 1 k2A P (A) (f A (0) = 0) (1.3.2) is given by f A (k) = k e (k 1)![P (A\U k )P (A)P (U k )]; (1.3.3) where U k =f0; 1;:::;k 1g. Hence, Corollary 1.3.3. For W 0 an integer-valued r.v. with mean , we have jP (W2A)P (A)j =jE[f A (W + 1)Wf A (W )]j: (1.3.4) Continuing as in our examination of Stein's Method in the Gaussian context, we seek bounds for f A the solutions to 1.3.2: 18 Lemma 1.3.4. If f A solves 1.3.2, then jf A j minf1; 1=2 g and jf A j 1e minf1; 1 g; (1.3.5) where f(k) :=f(k + 1)f(k): Combining all of the above, we procure a distance to the Poisson distribution: Theorem 1.3.5. LetF be the set of functions satisfying 1.3.2. If W 0 is an integer-valued r.v. with mean , and Z is Poisson(), then d TV (W;Z) sup f2F jE[f(W + 1)Wf(W )]j: (1.3.6) Intuitively,W will typically be a sum of potentially dependant, potentially non- identically-distributed indicator variables. 1.3.2 A basic example - sums of i.i.d. indicators Theorem 1.3.6. (Theorem 4.6 in [53]) Let X 1 ;:::;X n be i.i.d. indicators with P (X i = 1) =p i and P i p i =. For W = P i X i and Z Poisson(), we have d TV (W;Z) minf1; 1 g X i p 2 i 19 Proof. 
For f a solution to 1.3.2, note that, for W i = W X i , and given the independence between the X i 's, E[Wf(W )] = X i E[X i f(W )] = X i E[f(W )jX i = 1]P (X i = 1) = X i p i E[f(W i + 1)]: Next, since f(W + 1) = P i p i f(W + 1), we nd that jE[f(W + 1)Wf(W )]j =j X i=1 p i E[f(W + 1)f(W i + 1)]j X i p i jfjEjWW i j = minf1; 1 g X i p i E[X i ] = minf1; 1 g X i p 2 i : 1.3.3 Size-biasing, with an example The most convenient approach to applying Stein's Method in the Poisson-setting is size-biasing. For an in-depth overview of the subject, see [9]. Also see [33] and Ch.2.3.4 of [18]. 20 Denition 1.3.7. For a non-negative, nite-mean () r.v. X, we say that the r.v. X s has the size-bias distribution with respect to X, if for all f s.t. (such that) EjXf(X)j is nite, we have E[Xf(X)] =E[f(X s )]: (1.3.7) Given that, consider the following: Theorem 1.3.8. Let W 0 be an integer-valued r.v. s.t. EW = > 0. Let W s be a size-bias coupling of W . If Z is Poisson(), then d TV (W;Z) minf1;gEjW + 1W s j: (1.3.8) Proof. Take f bounded,kfk minf1; 1 g. Then jE[f(W + 1)Wf(W )]j =jE[f(W + 1)f(W s )]j =jE[f(W + 1)f(W s )]j kfkEjW + 1W s j = minf1;gEjW + 1W s j: Corollary 1.3.9. In particular, if W is the sum of the indicators X 1 ;:::;X n , = P i EX i , we can attempt the following construction: Take I an r.v. independent of all else with P (I = i) = p i =. Dene W s = P j6=I X (I) j + 1. If for all i = 1;:::;n we can show that (X (i) j ) has the law of (X j ) j6=i conditional on X i = 1, then W s has the siaz-bias distribution of W . 21 Now, if we can make sure that W + 1>W s or W + 1<W s , we can drop the absolute value in 1.3.8. Couplings satisfying either of those conditions are called monotone. Let us take a look at the general Stein's Method result for the decreasing monotone size-bias coupling, along with a standard example. Theorem 1.3.10. LetX 1 ;:::;X n be indicators with expectationsp i ; P i p i =. For eachi = 1;:::;n, let (X (i) j ) j6=i have the law of (X j ) j6=i conditional onX i = 1, and let I be an r.v. independent of all else withP (I =i) =p i =. ThenW s = P j6=I X (I) j +1 has the size-bias distribution of W . If also X (i) j X j for all i6= j, and Z is Poisson(), we have d TV (W;Z) minf1;g 1 Var(W ) : (1.3.9) Proof. Let W i = P j6=i X (i) j + 1. We drop the absolute value in 1.3.8: d TV (W;Z) minf1;gEjW + 1W s j = minf1;g X i EjW + 1W i jP (I =i) = = minf1;g X i p i EjW + 1W i j = minf 1 ; 1g X i p i EjW + 1W i j = = minf 1 ; 1g X i p i E[ X j6=i (X j X (i) j ) +X i ] = = minf 1 ; 1g X i p i E[WW i + 1] = = minf 1 ; 1g( 2 E[W s ] +) = = minf 1 ; 1g( (E[W 2 ] 2 )) = = minf1;g 1 Var(W ) : 22 Consider the following example as an illustration: Example 1.3.11. (Hypergeometric distribution example) Suppose we have a set of N elements, m of which are special, the rest being ordinary. We draw a set ofn elements uniformly-at-random out of the N n possible such sets. We know that for N n;N m, the number of special elements in the random subset of size n is approximately Poisson(mn=N). Suppose Z has the Poisson(mn=N) distribution. Let us determine the distance between W and Z using a decreasing coupling. Arbitrarily order the special elements. Let X i be the indicator variable for the event \special element i was selected in the draw." Clearly, W = P m i=1 X i , p i =EX i =n=N, and P m i=1 p i = nm N . To construct the coupling, uniformly-at-random select one of the special ele- ments (element I). If it is in the draw, do nothing. 
If it is not in the draw, insert special element I into the draw in the place of some uniformly-at-random selected element in the draw. Let X (I) j;j6=I be the indicator of the event \element j is in the draw after the coupling." Since the coupling may have removed a special element from the sample, it is clear that X (I) j;j6=I X j . Moreover, it is easy to see that P (X (I) j6=I = 1) = n N mn N 1 n = m(n 1) N =P (X j6=i = 1jX i = 1): 23 We hence have our coupling, and therefore d TV (W;Z) minf1;g 1 Var(W ) = = min n 1; mn N o 1 Var(W ) mn=N = = min n 1; mn N o 1 nm(Nm)(Nn) N 2 (N 1) N nm = = min n 1; mn N o N(N 1) (Nm)(Nn) N(N 1) = = min n 1; mn N o n N 1 + m N 1 nm N(N 1) 1 N 1 24 2 Markov Chains The main source on Markov Chains that I follow here is [38]. Unless otherwise specied, proofs omitted below are to be found in that source. Other useful com- prehensive sources include [59], [22], [35], [42], and [1]. 2.1 Overview of Markov Chains 2.1.1 Denitions Denition 2.1.1. A sequence of random variables (X 0 ;X 1 ;:::) is a Markov chain with state space and transition matrix P if for all x;y2 and all t 1 and all events H t1 =\ t1 s=0 fX s =x s g satisfying P(H t1 \fX t =xg)> 0, we have P(H t1 \fX t =xgg =PfX t+1 =yjX t =xg =P(x;y): (2.1.1) In other words, we jump around a state space via the matrixP (a square matrix of dimension equal to the size of the state space), which gives us the probabilities 25 of going from one state to another. Observe that the only information used each turn is the last state - that is the \Markov Property." Clearly, P has to satisfy P y2 P (x;y) = 1 8x2 : One natural question to ask is the following: under what conditions can we predict (in the form of a probability distribution) the state of the chain at some advanced turn t? It turns out that a certain large class of Markov Chains attains a stationary distribution after running for suciently many turns. The stationary distribution satises: =P: (2.1.2) To specify exactly which class of Markov Chains has a stationary distribution, we need a few more denitions: Denition 2.1.2. A chainP (here we identify the chain with its transition matrix) is called irreducible if for any two states x;y 2 there exists an integer t s.t. P t (x;y)> 0. I.e. a chain is irreducible if one could conceivably go, perhaps over a few turns, from any state to any other state at any time. Denition 2.1.3. Let RT (x) :=ft 1 :P t (x;x)> 0g be the set of times when it is possible for the chain to return to a starting position x. The period of the state x is the greatest common divisor of RT (x). Lemma 2.1.4. If P is irreducible, then gcdRT (x) = gcdRT (y) for all x;y2 . 26 Denition 2.1.5. The period of an entire chain is dened as the period which is common to all states. An irreducible chain is aperiodic if all states have period 1. A non-aperiodic chain is periodic. Some periodic chains can be made aperiodic via the trick of introducing waiting times (i.e. force the chain to stay put each turn with certain probability). Proposition 2.1.6. If P is aperiodic and irreducible, then there is an integer r such that P r (x;y)> 0 for all x;y2 . Next we consider a set of sucient conditions for the existence of a stationary distribution for a given chain: Proposition 2.1.7. LetP be the transition matrix of an irreducible Markov Chain. Then 1) there exists a probability distribution on s.t. =P and(x)> 0 for all x2 . Moreover, 2) (x) = 1 Ex( + x ) . Here + x := minft 1 :X t =xg is the rst return time. Furthermore, Denition 2.1.8. 
If P is the transition matrix of an irreducible Markov Chain, then that Markov chain has a unique stationary distribution . Of some interests in the context of Stein's Method is the notion of reversibility. Proposition 2.1.9. Given a Markov chain with transition matrix P , any distribu- tion satisfying (x)P (x;y) =(y)P (y;x) 8x;y2 (2.1.3) 27 is stationary for P . A chain satisfying 2.1.3 is called reversible. The idea here is that if we start a chain at stationarity, we can \run it backwards" with the same distribution the chain ordinarily has; i.e. (X 0 ;X 1 ;:::;X n ) is of the same law as is (X n ;X n1 ;:::;X 0 ). Also, Proposition 2.1.10. Let (X t ) be an irreducible Markov Chain with transition ma- trix P and stationary distribution . Let ( b X t ) be the time-reversed chain with transition matrix b P . Then is stationary for b P , and for x 0 ;:::;x t 2 we have P fX 0 =x 0 ;:::;X t =x t g =P f b X 0 =x t ;:::; b X t =x 0 g: (2.1.4) 2.1.2 Examples We provide an example of exchangeable pairs applied to a non-reversible Markov Chain. From Jason Fulman's 2004 paper [29]: Example 2.1.11. (Descents and inversions of a permutation.) LetW () be either the number of descents or inversions of a permutation2S n . We will show, via Stein's method, thatW satises a central limit theorem with error raten 1=2 . In particular, our approach features the construction of an exchangeable pair on top of a non-trivial, non-reversible Markov Chain. The number of descents statistic Des() of a permutation from the symmetric group S n , is dened as the number of pairs (i;i + 1) with 1 i n 1 s.t. 28 (i)>(i + 1). The number of permutations in S n with k + 1 descents is called the Eulerian number A(n;k). The other statistic we consider, Inv(), counts the number of inversions in 2 S n . An inversion is a pair (i;j), ii<jn s.t. (i)>(j). We can generalize both statistics as follows: Let M = (M i;j ) be a real, anti- symmetric,nn matrix; letX be an r.v. onS n dened byX() = P i<j M (i);(j) . Let M i;j =1 if j = i + 1, M i;j = 1 if j = i 1, and M i;j = 0 otherwise. That yields X() = 2 Des( 1 ) (n 1). If, on the other hand, we let M i;j =1 for i<j and M i;j = +1 for i>j and M i;i = 0, we obtain X() = 2 Inv( 1 ) n 2 . Dene W as the normalized X: W = X p Var(X): Thus, W is mean-0, variance-1. The bound on the distance betweenW and the normal is given by the following theorem by Rinott and Rotar; we will see that theorem again in the next section. Theorem 2.1.12. Let W;W 0 be an exchangeable pair of real random variables s.t. E(W 0 jW ) = (1)W with 0 < < 1. Suppose moreover thatjW 0 Wj A for some constant A. Then for all real x, jPfWxg (x)j 12 p Var(E[(W 0 W ) 2 jW ] + 48 A 3 + 8 A 2 p : (2.1.5) Here is the standard normal distribution. 29 From the above result, we derive the following: Theorem 2.1.13. Let Des() and Inv() be the number of descents and inversions of 2S n . Then for all real x, P 8 < : Des n1 2 q n+1 12 9 = ; (x) C n 1=2 P 8 > > < > > : Inv n 2 =2 q n(n1)(2n+5) 72 x 9 > > = > > ; (x) C n 1=2 where C is a constant independent of n. We next look at the non-trivial construction of the exchangeable Stein-pair. To obtain an appropriate W 0 : Pick I uniformly at random between 1 and n and dene 0 as (I;I + 1;:::;n), where (I;I + 1;:::;n) cycles by mappingI!I + 1! ::: ! n ! I, and where permutation multiplication is from left to right. Let W 0 () =W ( 0 ). 
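Before verifying the linearity and exchangeability conditions for this pair, the central limit statement of Theorem 2.1.13 is easy to probe numerically. The sketch below is my own illustration (not part of Fulman's argument): it samples uniform permutations, normalizes the descent and inversion counts using the means and variances displayed in Theorem 2.1.13, and estimates the Kolmogorov distance to the standard normal. Function names and sample sizes are my choices, and the estimate carries Monte Carlo error of order reps^(-1/2).

```python
import math
import random

def descents(p):
    """Number of positions i with p[i] > p[i+1]."""
    return sum(1 for i in range(len(p) - 1) if p[i] > p[i + 1])

def inversions(p):
    """Number of pairs i < j with p[i] > p[j] (O(n^2), fine for small n)."""
    n = len(p)
    return sum(1 for i in range(n) for j in range(i + 1, n) if p[i] > p[j])

def normal_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def kolmogorov_distance_to_normal(samples):
    """sup_x |F_n(x) - Phi(x)|, computed over the sample points."""
    xs = sorted(samples)
    n = len(xs)
    return max(
        max(abs((k + 1) / n - normal_cdf(x)), abs(k / n - normal_cdf(x)))
        for k, x in enumerate(xs)
    )

def simulate(n=50, reps=4000, seed=0):
    rng = random.Random(seed)
    # Normalizations quoted in Theorem 2.1.13:
    #   Des: mean (n-1)/2, variance (n+1)/12
    #   Inv: mean n(n-1)/4, variance n(n-1)(2n+5)/72
    des_mu, des_sd = (n - 1) / 2, math.sqrt((n + 1) / 12)
    inv_mu, inv_sd = n * (n - 1) / 4, math.sqrt(n * (n - 1) * (2 * n + 5) / 72)
    des_w, inv_w = [], []
    for _ in range(reps):
        p = list(range(n))
        rng.shuffle(p)
        des_w.append((descents(p) - des_mu) / des_sd)
        inv_w.append((inversions(p) - inv_mu) / inv_sd)
    print("n =", n)
    print("d_K(Des, normal) ~", round(kolmogorov_distance_to_normal(des_w), 4))
    print("d_K(Inv, normal) ~", round(kolmogorov_distance_to_normal(inv_w), 4))

if __name__ == "__main__":
    simulate()
```

With that sanity check noted, we return to the exchangeable pair (W, W').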
Showing that the linearity condition holds with = 2=n is not hard: E(W 0 Wj) = 1 Var(X) 1 n n X i=1 X j:j>i 2M (i);(j) = = 1 p Var(X) 1 n X 1i<jn 2M (i);(j) = 2 n W: Notice the trick, typical in the related literature, of conditioning on a large sigma- algebra instead of on W . 30 The exchangeability, which holds though not obviously so, is encoded in the follow- ing theorem (rendered unnecessary by the result found in [50]): Theorem 2.1.14. Given a subset S off1;:::;ng, for each i2 S dene a i;S = P j2S:j>i M i;j and b i;S = P j2S:j<i M j;i . Suppose that for all subsets S off1;:::;ng, there is a bijection :S!S satisfying the following conditions: 1. For each i2S, a i;S b i;S =b (i);S a (i);S : 2. For each i2 S, there is a bijection i : Sfig) Sf(i)g s.t. M j;k = M i (j); i (k) for all j;k2Sfig. Then (W;W 0 ) is an exchangeable pair of random variables. The proof can be found in the cited reference. Having established the conditions for possessing the bound, our next step is to reduce the variance term in the right-hand-side of equation 2.1.5. To that end, Lemma 2.1.15. E(X) = 0 and Var(X) = P i<j(M i;j ) 2 + P n i=1 (A i B i ) 2 3 . Hence, Var(Des()) = n+1 12 and Var(Inv()) = n(n1)(2n+5) 72 . Next, it can be shown thatE(W 0 W ) 2 = 4=n. The next result follows from Jensen's inequality: Lemma 2.1.16. E[E[(W 0 W ) 2 jW ]]E[E[(W 0 W ) 2 j]] 2 : 31 Combining these results gives us the following: Var(E(W 0 W ) 2 j) B 0 n 5 Var(X) 2 n 2 B n 3 : Here B;B 0 are universal constants. The result is non-trivial and takes some eort to establish; the details can be found in [29]. Thus, we end up with a rate of O(n 1=2 ). Letting n rise to innity yields the central-limit-type result. 32 3 \Voter" and Other Models on Graphs In this chapter, we look at Voter-type models, particularly in the context of Stein's Method and its relation to Markov chains. A useful book-length treatise on Voter- models and related subjects can be found in Thomas Liggett's [39]. We start with a look at the anti-voter model, as discussed in Ch. 6.4 in [18], and in [49]. 3.1 Overview of Voter-type Models The original voter model (introduced independently in the 1970s by Cliord and Sudbury in 1973, and by Holley and Liggett in 1975, as mentioned in the Introduc- tion of [39]) can be formulated as follows: Take a connected, r-regular (each vertex hasr edges) graph of sizen. Assign 1's and1's to the nodes of the graph. Run a Markov chain on the graph with the following transition procedure: each turn, pick a node at random (under some distribution; usually we take the uniform), pick one of its neighbors at random (usually uniformly), and switch the value of the selected neighbor-node to the value of the originally selected node. Under uniformity of 33 node and neighbor selection, this chain converges to one of two absorbing states, in which all nodes have the same values. The \anti-voter" model, introduced in [41], has the selected neighbour node adopt a value opposite to that of the originally selected node. Under uniformity - again, of node and neighbor selection - the resulting chain has a stationary distri- bution. The mix of the two models obtained by adding an additional coin-throw in choosing whether to convert the neighbor node to the value of the original or to the opposite of the value of the original has been proposed by Goldstein and Rinott (pri- vate communication); they have dubbed this model the \mercurial voter model." 
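The three transition rules just described are easy to prototype. The following sketch is my own illustration, not taken from the text: it implements one move of the voter, anti-voter, and mercurial dynamics on the n-cycle, chosen only because it is the simplest 2-regular graph (note that the stationarity analysis quoted below for the anti-voter model excludes bipartite graphs and n-cycles, so this is meant only to show the mechanics of the moves). All names and parameters are assumptions.

```python
import random

def cycle_neighbors(i, n):
    """Neighbours of vertex i on the n-cycle (a simple 2-regular graph)."""
    return ((i - 1) % n, (i + 1) % n)

def step(x, model, p=0.5, rng=random):
    """One transition: pick a vertex, pick one of its neighbours uniformly,
    and update the neighbour according to the chosen model."""
    n = len(x)
    i = rng.randrange(n)
    j = rng.choice(cycle_neighbors(i, n))
    if model == "voter":          # neighbour copies the selected vertex
        x[j] = x[i]
    elif model == "anti-voter":   # neighbour takes the opposite value
        x[j] = -x[i]
    elif model == "mercurial":    # extra coin toss decides copy vs. oppose
        x[j] = x[i] if rng.random() < p else -x[i]
    return x

def run(model, n=101, steps=200_000, seed=1):
    rng = random.Random(seed)
    x = [rng.choice((-1, 1)) for _ in range(n)]
    for _ in range(steps):
        step(x, model, rng=rng)
    return sum(x)  # the statistic U_n = sum_i X_i after many transitions

if __name__ == "__main__":
    for m in ("voter", "anti-voter", "mercurial"):
        print(m, "U_n after burn-in:", run(m))
```

On a connected graph the voter dynamics eventually absorb in an all-(+1) or all-(-1) state, whereas the anti-voter and mercurial rules keep the configuration moving, which is what makes their stationary behaviour worth studying.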
Persi Diaconis and Christos Athanasiadis proposed the following variation of the voter-model in [4]: upon selecting a node, instead of picking one of its neighbors, ip a coin (with weightp, perhaps taken to be a half), and, according to the result of the cointoss, assign either 1 or -1 to the selected nodes and all its neighbors. The model has been labeled the \Neighborhood Attack" model. The following example is from Ch. 6.4 in [18], and from [49]. We follow the latter, original, source by Rinott and Rotar. (Also see [23].) 3.2 Voter-models, Markov Chains, and Stein's Method Example 3.2.1. (Anti-voter model.) Preliminary denitions: LetX (t) =fX (t) i g i2V ,t = 0; 1:;:::, whereV is the vertex set of ann-vertexr-regular graphG, andX (t) i =1, andX (t) is the antivoter chain. 34 The chain's transition occurs as follows: at time t, choose a random vertex i and a random neighbor of it j, and set X (t+1) i = X (t) j and X (t+1) k = X (t) k for all k6=i. Starting at stationarity, we study the normal approximation of the statistic U n =U (t) n = P i X (t) i for largen. Our goal is to show that under certain conditions, U n is asymptotically normal. We use Stein's Method through the following theorem (Theorem 1.1 in [49]): Theorem 3.2.2. For a function h : R! R, set h = R 1 1 h(z)(dz), where is the standard normal measure. Given that, for any exchangeable pair (W;W 0 ) satisfyingW is mean-0-variance-1, andE(W 0 jW ) = (1)W , and any continuous dierentiable bounded function h, we have jEh(W ) hj 1 (supjh hj) p VarfE[(W 0 W ) 2 jW ]g+ + 1 4 (supjh 0 j)EjW 0 Wj 3 ; where h 0 is the derivative of h. Furthermore, for all real x, sup x jP(Wx) (x)j 2 p VarfE[(W 0 W ) 2 jW ]g+ + 1 (2) 1=4 r 1 EjW 0 Wj 3 : Rinott and Rotar improve the above result to derive the following: 35 Theorem 3.2.3. Let (W;W 0 ) be exchangeable andW be mean-0-variance-1. Dene R =R(W ) from E(W 0 jW ) = (1)W +R, where 0<< 1. Then, := supfjEh(W ) hj :h2Hg 6 p VarfE[(W 0 W ) 2 jW ]g + 19 p ER 2 + 6 r a EjW 0 Wj 3 ; where a is a specic constant dened in the cited paper. If, furthermore,jW 0 WjA for a constant A, then 12 p VarfE[(W 0 W ) 2 jW ]g + 37 p ER 2 + aA 3 + 8 aA 3 p : (3.2.1) The construction of the exchangeable pair follows the standard technique of takingW;W 0 from successive turns of the underlying Markov Chain. Moreover, we have the following: Lemma 3.2.4. Let X (t) be a stationary process, T (X (t) ) assume nonnegative in- teger values, and suppose T (X (t+1) )T (X (t) ) = +1; 0 or 1. Set W =f(T (X (t) )), W 0 = f(T (X (t+1) )) where f is a measurable function. Then (W;W 0 ) is an ex- changeable pair. Assume thatG is neither bipartite nor an n-cycle. Then, the set of 2 n 2 congurations in which not all nodes hold the same values is irreducible and provides the support for the stationary distribution of the Markov Chain. Assume we start at stationarity, and thus X (t) is stationary. 36 Let U =U n (X) = P n i=1 X i and 2 = 2 n = VarU n . In many cases 2 n is of order n. Let W be the normalized U: W = W n = U n = n . Using the above results, we have: Theorem 3.2.5. For any n-vertex, r-regular graphG and any function h2H, supfjEh(W ) hj :h2HgC p VarQ r 2 +a n 3 ; (3.2.2) where Q =Q n = n X i=1 X j2N i X i X j ; (3.2.3) where a is the constant mentioned above. Thus, one of the main problems to be dealt with is the bounding of VarQ. We discuss that further below. Let (i;j) be the distance between vertices of i;j on our graphG, which we assume to be connected. 
This distance is the number of edges traversed in the shortest walk joining vertices i;j. The diameter of the graph, d, is the maximal pairwise distance. Suppose also thatG is distance regular, i.e. for all m = 1;:::;d and any pair of vertices (i;j) with (i;j) = m, the number of vertices k2 N i for which (k;j) = m 1, as well as the number of those for which (k;j) =m + 1, does not depend on the position of the vertices (i;j) but only on the distance m. Since we assume G is regular, the number of k2N i for which (k;j) =m also depends only on m. Denote the last number by a m . Note that if r is the degree of the graph, a m =r is 37 the probability that a random neighbor of i has the same distance to j as has i. Let = minfa m =r;m = 1;:::;dg. Then, (Lemma 1.2 in [49]) Lemma 3.2.6. For anyG as above, VarQC(2=) d n 2 r: (3.2.4) Then we have Proposition 3.2.7. LetG n be a sequence of distance regular graphs having diame- ters d n , characteristics = n , as dened above, and degrees r n . Let r n !1 and for some absolute strictly positive constants d;;, d n d; n ; 2 n 2 n: (3.2.5) Then W n converges to the standard normal in distribution as n rises to innity. The proof can be found in the source. The authors provide a few other analyses of special cases. But let us focus on the proof of Theorem 3.2.3. Taking (W;W 0 ) as the respective statistics over successive turns of the given stationary Markov Chain, we observe that the chain is not reversible; however, Lemma 3.2.4 shows that the selected pair is exchangeable anyway. For the linearity condition, dene a(X) as the number of edgesfi;jg for which X i = X j = 1; b(X) as the number of edgesfi;jg for which X i = X j =1; and c(X) as the number of edgesfi;jg with X i 6=X j . 38 Given that, we have T (X) = [2a(X) +c(X)]=r; nT (X) = [2b(X) +c(X)]=r: We deal with U;U 0 instead of with the W 's to deal with fewer terms. For the U's, E[(U 0 U)jX] = 4b(X)=(rn) 4a(X)=(rn) = 2(n 2T )=n =2U=n: Divide by n and condition on W to obtain the Stein-pair linearity condition with = 2=n and R = 0. Moreover, A as dened is the theorem above can be taken with A = 2= n , since for any n, U 0 can dier by U by at most 2. After satisfying the exchangeability and linearity conditions, one must reduce the resulting Stein bound, and in particular, one must deal with the term VarE[(W 0 W ) 2 jW ]. Focus on E[(U 0 U) 2 jX] = 4[2a(X)=(rn) + 2b(X)=(rn)]: Next, notice that 2a(X) + 2b(X) + 2c(X) =rn; Q = 2a(X) + 2b(X) 2c(X); from which we obtain 4[a(X) +b(X)] =Q +rn. 39 Hence,E[(U 0 U) 2 jX] = 2(Q +rn)=(rn) and VarE[(W 0 W ) 2 jX] =C Var Q rn 2 n = C (rn) 2 4 n VarQ: To wrap up, we note that since W is a function of X, VarE[(W 0 W ) 2 jW ] VarE[(W 0 W ) 2 jX]. It remains to examine Q for various graphs. We omit the rather technical results. 3.2.1 Other aspects of voter models The preceding example shows that a certain statistic on the vertices of a voter-type model goes to the normal distribution as the size of the selected family of graphs goes to innity. Naturally, one can examine other aspects of the voter-type Markov chains. In particular, one could try to obtain results on the stationary distribution of the chain associated with the model. What are its eigenvalues? Mixing times? What are the probabilities for specic states of the stationary distribution? A recent paper by Ron Graham and Fan Chung ([21]) examines the problem. Example 3.2.8. (Flipping edges and vertices in graphs.) Consider the following coloring process on a given connected graphG: AssumeG starts out uncolored. 
At each step, select an edge uniformly at random and recolor the two vertices at its ends. Pick blue for both with probability p and red for both 40 with probability q = (1p). We let the process run for a while, and examine the resulting colorings. LetH be the associated state graph. Its nodes are the color patterns of vertices ofG. There is a directed edge from one state ofH to another state ofH if one gets from the former to the latter in a single turn. In eect, we are dealing with a random walk onH. Indeed, one can examine the problem as a random walk on semigroups. Let us examine some of the details: Denition 3.2.9. A left-regular band (LRB) is a semigroupS in which every element is idempotent, and in addition, the following holds: xyx =xy (3.2.6) for all x;y2S. IfS is an RLB and is nitely generated, thenS is nite. For a probability distri- butionfw x g dened onS, we can dene a random walk with transition probability matrix P given by P (s;t) = X x2S;xs=t w x : Many problems can be formulated as random walks on RLB semigroups, an example being hyperplane chamber walks. Next, let us look at the results in [21], which deal with the spectra of \ ipping games" (or voter-type Markov Chains) on graphs and hypergraphs. 41 Theorem 3.2.10. For a graphG withn vertices, suppose we have a \deck" (set) of \cards" c v , each of which species a color c inf0; 1g and a vertex v. Let w x denote the probability we move from a state to x which means, for x = c v , we assign the color c to the vertex v and all its neighbors. Suppose the sum of all w x satises P x w x = 1. The associated random walk of vertex ipping on the state graph ofG has eigenvalue T = X suppxT w v for a subsetT of the letter setV wheresuppx for the cardx consists of its associated vertex v and all the neighbors of v. For each T , the multiplicity m T of T is 1. For the special case of w x = 1=(2n), we have T =(T )=m where (T ) denotes the number of vertices v with all its neighbors in T . We shift gears, and adopt a model, in which the ipping of vertices occurs without replacement. For a specic example, consider the cycle-graph: We ip edges on an augmented pathP n , meaning a path ofn1 vertices 1; 2;:::;n1 with edges between consecutive nodes, and with two additional half-edges attached to nodes 1 and n 1. The additional edges are treated the same as the other edges, except that their missing endpoints are ignored. Consider the probability u(n) that all the n 1 vertices have the value 0. Set u(0) =u(1) = 1 and take u(m) = 0 for m< 0. Then 42 Theorem 3.2.11. u(n) is given by U(z) = X n0 u(n)z n = 1 +z +pz 2 + 1 2 p(2p + 1)z 3 +::: = r q p tan z p pq arctan r p q : In addition, C(n) - the asymptotic probability that this process on the n-cycle has all its vertices assigned 0 - is order r n . What happens in the more general case, on some general undirected graph G = (V;E)? Recall that the process selects edges uniformly at random without replacement and then assigns a value of 0 (with probabilityp) or 1 (with probability q = 1p) to each unassigned endpoint of the selected edge. Let : V !f0; 1g be some assignment to the vertex setV ofG. For a xed x2V, let x :Vfxg!f0; 1g be the assignment of restricted toVfxg. LetE be the event that the terminal assignment is given by x onVfxg, with x arbitrary. Fori = 0; 1, letE i denote the event that the terminal assignment agrees with x on V fxg and (x) =i. Then Lemma 3.2.12. 
(Reduction Lemma) Pr(E) =Pr(E 0 ) +Pr(E 1 ): (3.2.7) As an example of this Lemma, letC(n1; 1) be the probability that the terminal assignment of an n-cycle is all zeros except on the vertex n, which is 1. By the 43 Lemma, C(n 1; 1) +C(n) = u(n), and so C(n 1; 1) = u(n)C(n) is of order 1 p r n+1 r n = ( r p 1)r n . Applying the Lemma to the augmented paths yields additional results; those can be found in the cited paper. We conclude this section with a look at Graham and Chung's examination of a model similar to the one examine in the next chapter. The authors' version of the \Neighborhood Attack Model" (called so by Persi Diaconis and Christos Athanasiadis in [4]) diers from the one suggested below in that it assigns values to the modes without replacement (as opposed to with replacement). Specically, take a pathP n withn vertices. Select vertices uniformly at random without replacement. Upon selected a vertex v, we give that vertex and its un- marked neighbors the value 0 with probability p, and the value 1 with probability q = 1p. Let P(n) be the probability that at the end of the process al vertices have value 0. By induction, P(n) = p n 2P(n 2) + n3 X k=0 P(k)P(n 3k) ! ; n 3: (3.2.8) Dene the generating function P(z) = P n0 P (n)z n . From this, one can produce a dierential equation, the solution to which is P(z) = p p tanh z p p + 1 2 ln 1+ p p 1 p p z p p : (3.2.9) 44 This implies that, asymptotically, P(n) is of order cr n 0 for a constant c, where r 0 = 1 0 is the inverse to the root of the denominator in the right-hand-side of 3.2.9. Specically, c = r 3 0 p . Hence, P (n) goes to 1 p p r n+3 0 as n!1. Let us stop here; the point is that the voter-type, edge- ipping models pro- vide plenty of non-trivial problems criss-crossing across a number of (sometimes dramatically) dierent elds in mathematics. 45 Part II APPLICATIONS We provide two disparate applications of Stein's Method, one in the context of the standard normal and the Voter models, and the other in the context of the Poisson distribution and the uniform recursive tree. 47 4 Counting Vertices in the Neighbourhood Attack Model Given Abundance of Symmetry We show, via a Stein's Method argument involving exchangeable pairs, that for certain (highly symmetric) families of graphs the number of 1's in the Neighbour- hood Attack model goes to the normal distribution asymptotically as the number of nodes tends to innity. 4.1 Introduction 4.1.1 Problem: We apply the Neighbourhood attack model on a given family of (nite) graphs. The model does the following each turn: Selects a random node (according to a particular non-degenerate probability distribution; usually the uniform distribution; I use the uniform distribution). 48 Turns the node and all its immediate neighbours into 1's or1's according to some distribution (Bernoulli(p) with 0<p< 1; I use p = 1=2). Given 1) a connected graph; 2) positive probability of selection for all nodes; 3) positive probabilities of turning into 1 or -1 for the selected node and its neighbours, the underlying Markov Chain, the states of which are the possible permutations of 1's and1's, is irreducible and everywhere recurrent on (a subset of) its state space, and therefore possesses a stationary distribution. Assume we begin at this stationary distribution. Let X be the number of 1's at stationarity. Then NX equals the number of 1's, where N is the number of nodes. We want to use Stein's method to show XEX X ! 
N!1 Z (4.1.1) whereZ is the standard normal, andEX and X are the expectation and standard deviation of X. 4.1.2 Approach and some background: We seek to apply Stein's Method. To that end we use -Stein pairs, since for Markov chains the pair (f(X);f(X 0 )), where X;X 0 are the sums of the nodes at consecutive turns at stationarity, can be made Stein for appropriate f. 49 A pair of r.v.'s (W;W 0 ) is a-Stein pair, if (W;W 0 ) = d (W 0 ;W ) (exchangeability condition) andE(W 0 jW ) = (1)W for some 2 [0; 1]. For an application of Stein's method in a similar context, see [49]. In that paper, Yosef Rinott and Vladimir Rotar show that, in a setting almost the same as the one examined here, the sum of the values of the nodes in the anti-voter model at stationarity is asymptotically normal. We examined [49] in rst part of the present work, in the review of literature. For an application in a dierent context, see the paper [29], in which Jason Fulman shows that the number of descents or inversions in permutations complies to a central limit theorem. Both the current problem and the one examined in [29] can be viewed as random walks on hyperplanes; and hence there is a structural similarity between the approach adopted here, and the one in [29]. For further details on the results of [29], see Ch.2 in Part I of this work. Other useful references can be found in the rst, background, part of the present work. For more results on the Neighbourhood Attack model, see [4], [21]. The former paper introduces the model and presents some results on random walks on hyper- plane arrangements. The latter paper studies some properties of the distributions of the implicit Markov Chains in models similar to the Neighbourhood Attack model. For more on Stein's method, see [18], [9], [53]. The rst two books provide a comprehensive overview of Stein's Method in regard to its applications to Normal 50 and Poisson approximations re exively. The latter document is an up-to-date sur- vey of Stein's Method literature and a useful entry-level source on the subject. For a brief introduction to Stein's Method, see Ch. 1 of the present work. 4.1.3 Setup: Let X be the number of 1's at stationarity. Let Y = 2XN = N X i=1 i : Here N is the total number of nodes and (t) i is the value of node i at time t. Examining Y is equivalent to examining X. We want mean-0 variance-1 variables. To that end, normalize Y : W (t) := YEY Y : (4.1.2) Note Y is a constant dependant on N: 2 Y = Var X i i = X i Var i + 2 X i<j Cov i j (4.1.3) Now, W is mean-0 variance-1. To get the condition for the theorem we want to apply, let W =W (t) and W 0 =W (t+1) , and examineE(W 0 jW ): 51 4.1.4 Results: Theorem. In 4.2.1, we show that = r+1 N : In 4.3.1, we establish the bound on VarY : (r + 1)N 4 2 Y (r + 1)N 2 : Finally, in 4.3.2 and 4.3.3, we derive the bound on the distance betweenW and the standard normal, 48 r 2 p r + 1 p N + (2 19=2 + 48) p r + 1 p N : We draw conclusions in 4.4.1. 4.2 Preliminaries 4.2.1 Proving E(W 0 jW ) = (1)W: Y E(W 0 jW ) =E N X i=1 (t+1) i E N X i=1 (t+1) i j N X i=1 (t) i E N X i=1 (t) i ! (4.2.1) Assumer-regularity for the graph (i.e. every node has exactlyr neighbours). Then the sum P N i=1 i changes each turn by between2(r + 1) and 2(r + 1). A basic example of a graph of this type is the circle (2-regular) graph, in which we have a set of nodes arranged in a circle, each node with two neighbours. Further assume uniformity in choosing nodes and choosing \colors" (i.e 1's and 1's). Under such conditions E (t) i = 0, i.e. 
at stationarity each node is 1 or 1 with equal probability. The sum of the node values will tend toward 0 (under 52 certain conditions; one of which, clearly, has to do with the number of neighbors each node has, since our model takes only the extreme values over the complete graph), since if elements of a certain \color" (\1" or \-1") dominate the graph, we are less likely to see an increase in the number of the elements of that color. LetC be a particular conguration of i 's. By the \Law of Total Expectation," E[E(Y 0 jC)jY ] =E[Y 0 jY ]. Denea 1;i as the number of nodes which equal 1 and havei neighbours equal to 1; denea 1;i as the number of nodes which equal1 and havei neighbours equal to -1. Observe that r X i=0 (a 1;i +a 1;i ) =N; r X i=0 (a 1;i a 1;i ) =Y: (4.2.2) At time (t + 1) we select a node of type a 1;i with probability a 1;i =N, and turn it and its neighbours to 1s or -1s with probability 1=2. Therefore, E(Y 0 jC) = r X i=0 [ a 1;i N 1 2 (Y + 2(ri)) + a 1;i N 1 2 (Y 2i 2) + a 1;i N 1 2 (Y 2(ri)) + a 1;i N 1 2 (Y + 2i + 2)] = = r X i=0 h a 1;i N (Y +r 1 2i) + a 1;i N (Yr + 1 + 2i) i = =Y r X i=0 a 1;i +a 1;i N + (r 1) r X i=0 a 1;i a 1;i N 2 r X i=0 i N (a 1;i a 1;i ) = =Y + (r 1) Y N 2 r X i=0 i N (a 1;i a 1;i ) 53 To nd the third term, note that P i ia 1;i is the number of positive neighbours to positive nodes; P i (ri)a 1;i is the number of negative neighbours to positive nodes; P i ia 1;i is the number of negative neighbours to negative nodes; and P i (ri)a 1;i is the number of positive neighbours to negative nodes. In adding the four sums we count each node once for each of its neighbours - for a total of r times. Therefore, the following equality holds: Y = 1 r r X i=0 [ia 1;i + (ri)a 1;i (ri)a 1;i ia 1;i ] = 1 r X i (2ia 1;i 2ia 1;i +ra 1;i ra 1;i ) = 1 r X i (a 1;i a 1;i )(2ir) )Y + 1 r X i r(a 1;i a 1;i ) = 1 r X i 2i(a 1;i a 1;i ) 2Y = 1 r X i 2i(a 1;i a 1;i ) Y = 1 r X i i(a 1;i a 1;i ) Yr = X i i(a 1;i a 1;i ) Going back toE(Y 0 jC): E(Y 0 jC) =Y + (r 1) Y N 2 r X i=0 i N (a 1;i a 1;i ) =Y + (r 1)Y N 2Yr N = 1 r + 1 N Y 54 Taking expectation conditioned on Y yields E(Y 0 jY ) =E[E(Y 0 jC)jY ] =E 1 r + 1 N YjY = 1 r + 1 N Y (4.2.3) So, as desired, E(W 0 jW ) = 1 r + 1 N W; (4.2.4) which complies with the Stein linearity condition E(W 0 jW ) = (1)W +R; (4.2.5) where 0 1 and R is a random variable. In our case conveniently R = 0. As for lambda, = r + 1 N : (4.2.6) 4.2.2 R ollin's Result: In general, the next step is to show thatW andW 0 are exchangeable, i.e. (W;W 0 ) = d (W 0 ;W ). Exchangeability clearly holds when the Markov Chain underlying W and W 0 is stationary and reversible. Reversibility is not always available or easily proved. For example, our chain is clearly not necessarily reversible: Take the circle graph. It is easy to see that for N large, Y can take the value N 2 - i.e. there is an attainable at stationarity arrangement of values for the nodes in which all nodes but one have the value of 1. Now, the probability of going from that arrangement 55 to the all 1's arrangement for whichY =N is positive; but the probability of going from Y =N to Y =N 2 is zero, and hence our chain fails to satisfy the detailed balance equations (x)P (x;y) =(y) =P (y;x). However, a recent result by Adrian R ollin removes the necessity for exchange- ability. R ollin's theorem - see [50, Theorem 2.1] - states: Theorem. Assume W;W 0 are r.v.s on the same probability space, s.t. L(W 0 ) = L(W ) (L for 'law'), EW = 0, Var(W ) = 1. 
Given E(W 0 jW ) = (1)W +R, for := sup h2H jEh(W )Eh(Z)j (4.2.7) (hereZ is the standard normal distribution, andH is the family of functions asso- ciated with the Wasserstein distance), we have 6 p VarE W (W 0 W ) 2 + 19 p ER 2 + 4 r aEjW 0 Wj 3 : (4.2.8) If also there exists a constant A s.t.jW 0 WjA a.s., we have 12 p VarE W (W 0 W ) 2 + 37 p ER 2 + 32 A 3 + 6 A 2 p : (4.2.9) Proof. See [50, Theorem 2.1]. In our case R = 0; so the bound is 12 p VarE[(W 0 W ) 2 jW ] + 32 A 3 + 6 A 2 p : (4.2.10) 56 The next step is to boundjW 0 Wj. NotejY 0 Yj 2(r + 1). So jW 0 Wj 2(r + 1) Y =A: Thus, the bound becomes 12N (r + 1) 2 Y p VarE[(Y 0 Y ) 2 jY ] + 32 8(r + 1) 2 N 3 Y + 6 4(r + 1) 3=2 p N 2 Y : (4.2.11) 4.3 Calculations and Proofs 4.3.1 Bounding the Variance of Y In eect, the next goal is to bound the two terms Var [E(Y 0 Y ) 2 jY ] and Var(Y 0 ) = Var(Y ) = 2 Y . For an explicit formula for the value of the corresponding Y in the anti-voter model, consult Ch.14 of [1]. Another relevant paper dealing with the anti-voter case can be found in [25]. Let us rst try to nd VarY . Dene t Y =Y 0 Y to obtain Y 0 =Y + Y . By stationarity, it follows that 0 = Var(Y 0 ) Var(Y ): Hence: 0 =E(Y 0 ) 2 EY 2 = (4.3.1) =E Y 2 + 2Y Y + (Y ) 2 Y 2 = (4.3.2) = 2E(Y t Y ) +E(Y ) 2 : (4.3.3) 57 To continue, we need to obtain a rmer grip on the r.v. t Y . It is easy to see that t Y takes values between2(r + 1) and 2(r + 1), and that the probability distribution of t Y is a function of certain edge and vertex counts on the coloured graph, themselves random variables. Specically: Dene q i (or q (t) i at turn t; we omit the superscript to avoid excess clutter) as the number of nodes s.t. the sum of the values at the node and all its vertices equalsi. Clearly,i takes integer values (all odd or all even depending on the parity of r) between(r + 1) and (r + 1). Specically, if r is odd, i takes the values (r + 1);(r 1);:::;2; 0; 2;:::; (r 1); (r + 1); and ifr is even,i takes the values (r + 1);(r 1);:::;1; 1;:::; (r 1); (r + 1). In each casei takes (r + 2) distinct values. Simple counting produces two useful identities involving the q i 's: X i q i =N and X i iq i = (r + 1)Y: Now, at each turn of the Neighborhood-Attack process we pick a node uniformly at random (i.e. with probability 1=N), and turn its value and the value of all its neighbors to either 1 or -1 uniformly at random (i.e. with probability 1=2). It thus follows that t Y has the (conditional onfq i g) p.d.f.: t Y = 8 > > > < > > > : (r + 1)i with probability q i 1 2N (r + 1)i with probability q i 1 2N 58 So for example, t Y takes the value of 2(r+1) = (r+1)((r+1)) with probability q r+1 =2N; and the value2(r + 1) =(r + 1) (r + 1) with probabilityq (r+1) =2N. Thus, we have E [ t YjY ] = X i [(r + 1)i] q i 2N + X i [(r + 1)i] q i 2N = =2 X i i q i 2N = (r + 1)Y N ; which coincides with our earlier calculations for . We continue from 4.3.3: 0 =E(Y 0 ) 2 EY 2 = (4.3.4) = 2E(Y Y ) +E(Y ) 2 = (4.3.5) = 2(r + 1) N E(Y 2 ) +EE X i (r + 1i) 2 + ((r + 1)i) 2 q i 2N jY ! (4.3.6) 59 Let us focus on theE P i [(r + 1i) 2 + ((r + 1)i) 2 ] q i 2N jY term: E X i (r + 1i) 2 + ((r + 1)i) 2 q i 2N jY ! = =E X i (r + 1) 2 2(r + 1)i +i 2 + (r + 1) 2 + 2(r + 1)i +i 2 q i 2N jY ! = = 1 2N E 2 X i (r + 1) 2 +i 2 q i jY ! 
= =E " (r + 1) 2 N X i q i jY # + 1 N E " X i i 2 q i jY # = = (r + 1) 2 + 1 N E " X i i 2 q i jY # (r + 1) 2 + 1 N E " X i (r + 1) 2 q i jY # = = (r + 1) 2 + (r + 1) 2 = 2(r + 1) 2 And therefore, continuing from 4.3.6, 0 =E(Y 0 ) 2 EY 2 = = 2(r + 1) N E(Y 2 ) +EE X i (r + 1i) 2 + ((r + 1)i) 2 q i 2N jY ! 2(r + 1) N E(Y 2 ) + 2(r + 1) 2 ; meaning 2 Y = Var(Y ) =E(Y 2 ) (r + 1)N: 60 However, since the Y terms appear in the denominators of the terms in 4.2.11, we need either a lower bound of Y or the exact variance of Y . Observe that we have: Var(Y ) =EY 2 = N 2(r + 1) EE X i (r + 1i) 2 + ((r + 1)i) 2 q i 2N jY ! = = N 2(r + 1) " (r + 1) 2 + 1 N E( X i i 2 q i )jfq i g # (r + 1)N 2 : Thus (r + 1)N 2 2 Y (r + 1)N: (4.3.7) 4.3.2 Reducing and bounding VarE[(Y 0 Y ) 2 jY ] Part 1: Now we have to evaluate or bound VarE[(Y 0 Y ) 2 jY ] = VarE Y [(Y 0 Y ) 2 ]. (Also see [49], [29].) LetC be the specic set off (t) 1 ;:::; (t) N g underlying Y . Since Y is a function ofC, Var E Y (Y 0 Y ) 2 Var E C (Y 0 Y ) 2 : (4.3.8) LetY 0 i+ be theY 0 of the case when nodei and its neighbours turn to 1s during the transition from turn t to turn t + 1, with Y 0 i dened similarly. Then E C (Y 0 Y ) 2 = 1 N N X i=1 1 2 (Y 0 i+ Y ) 2 + 1 2 (Y 0 i Y ) 2 61 (i j are the indices of the updated nodes; i 0 =i) = 1 2 1 N N X i=1 2 4 (r + 1) r X j=0 (t) i j ! 2 + (r + 1) r X j=0 (t) i j ! 2 3 5 = = 1 2N N X i=1 [(r + 1) 2 + 2(r + 1) r X j=0 i j + r X j=0 i j ! 2 + + (r + 1) 2 2(r + 1) r X j=0 i j + r X j=0 i j ! 2 ] = = (r + 1) 2 + 1 N N X i=1 r X j=0 i j ! 2 The constant is invariant; therefore Var E C (Y 0 Y ) 2 = Var 0 @ 1 N N X i=1 r X j=0 (t) i j ! 2 1 A = (4.3.9) = 1 N 2 Var 0 @ N X i=1 r X j=0 (t) i j ! 2 1 A (4.3.10) We need to look at the N(r + 1) 2 pairs i j with i;j2f1; 2;:::;Ng: It is easy to see that N(r + 1) of the pairs will be of the type 2 i = 1 and will not contribute to the variance term. The rest of the pairs are of elements at distances 1 or 2 from each other. For all neighbouring nodes i;j, the product i j participates 4 times. This gives 4 Nr 2 = 2Nr terms (one of N and r must be even). The remaining Nr(r 1) terms correspond to the pairs i j , where for each path of length of 2 from nodei to node j we get one such pair with a coecient 2. 62 4.3.3 Reducing and bounding VarE[(Y 0 Y ) 2 jY ] Part 2: Let us consider the term Var[E(Y 0 Y ) 2 jY ] from a slightly dierent perspective. Var E(Y 0 Y ) 2 jY = Var E( t Y ) 2 jY Var E( t Y ) 2 jfq i g = = Var (r + 1) 2 + 1 N " X i i 2 q i jfq i g #! = = 1 N 2 Var X i i 2 q i ! Now, we can express each i as the sum P j2N k j , where k is a node such that its value and the values of its neighbors sum to i, andN k is the neighborhood of that node including the node itself. Since q i counts all such i's, the sum P i i 2 q i equals P N k=1 P j2N k j 2 . So we end up with what we had in 4.3.10. Next, Var X i i 2 q i ! = Var 0 @ N X k=1 X j2N k j ! 2 1 A = = Var 0 @ N(r + 1) + X 0i;jN;i;j:d(i;j)=1;2 i j 1 A = = Var (2()) = Var (4( +)): Here d(i;j) is the distance between nodes i and j. For the last line, observe that N(r + 1) is invariant, and that the sum P 0i;jN;i;j:d(i;j)=1;2 i j can be interpreted as a sort of an edge count over our graph, with each pair of neighbors or near 63 neighbors of the same sign participating as a +1, and each pair of opposite values participating as a1. Each such pair gets counted twice. 
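The closed form obtained above for $E_{\mathcal{C}}(Y'-Y)^2$, namely $(r+1)^2 + \frac{1}{N}\sum_i\big(\sum_{j}\sigma_{i_j}\big)^2$ with the inner sum running over node $i$ and its neighbours, is easy to sanity-check numerically. The sketch below is an illustration only and plays no role in the proof; it assumes the cycle graph (a convenient 2-regular example) and plain Python, enumerates the $2N$ equally likely moves of the Neighborhood Attack chain from a fixed configuration, and compares the exact conditional second moment with the closed form.

```python
import random

def neighbors_cycle(i, N):
    # Cycle graph C_N: node i is adjacent to i-1 and i+1 (mod N), so r = 2.
    return [(i - 1) % N, (i + 1) % N]

N, r = 20, 2
sigma = [random.choice([-1, 1]) for _ in range(N)]

# Exact conditional second moment of Y' - Y: enumerate the 2N equally likely
# moves (choose a node uniformly, set it and its neighbours all to +1 or all to -1).
second_moment = 0.0
for i in range(N):
    block = [i] + neighbors_cycle(i, N)
    for new_val in (+1, -1):
        delta = sum(new_val - sigma[j] for j in block)
        second_moment += delta ** 2 / (2 * N)

# Closed form derived above: (r+1)^2 + (1/N) * sum_i (value at i plus values at its neighbours)^2.
closed_form = (r + 1) ** 2 + sum(
    sum(sigma[j] for j in [i] + neighbors_cycle(i, N)) ** 2 for i in range(N)
) / N

print(second_moment, closed_form)  # the two numbers agree
```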
Thus let r be the number of neighbors or near-neighbors (meaning nodes at distances one or two) each node has; be the number of pairs of neighbors or near-neighbors with equal node-values; and be the number of pairs with opposite node-values. Moreover, corresponds to the count of pairs of neighbors or near- neighbors with values both equal to 1, and is the count of pairs of1's. Note that we assume r is some xed quantity: i.e. the underlying graph pos- sesses sucient symmetry so that each node has the same number of neighbors or near-neighbors. For example each node in the circle-graph has 4 neighbors/ near-neighbors. The assumption that r i is xed for all i is not particularly gratuitous, since, either way, r i r 2 i , and under the present assumptions, r i =r is xed. Next, since 2( +) =r N, we have 2() = 2 (r N 2) = 4r N. Therefore, Var(2()) = Var(4) = Var(4(+)), where is the number of pairs of neighbors and near neighbors with node-values 1, and is the count of pairs with values1. On the other hand, we have 2( ) = r Y , and therefore Var( ) = (r ) 2 4 Var(Y ). 64 One naturally wonders if we can use the bound we established for Var(Y ) to bound Var( +). From the denition, Var(+) = Var()+2 Cov(;)+Var(); Var() = Var()2 Cov(;)+Var() It suces to show that Cov(;) 0 to obtain Var( +) Var(). For and to be negatively correlated, an increase in one would have to imply a decrease in the other - meaning, in our setting, that an increase in the number of edges (i.e. pairs of nodes at distance 1) with ones at both ends would have to imply a decrease in the edges with negative ones at both ends - and vice versa. That is indeed the case in our game, in which each move picks a node and its neighbors and turns them to ones or to negative ones. In the former case, we may increase , but we won't increase . The latter case is symmetric to the former. It follows that Cov(;) 0, and hence that Var( +) Var(). Therefore, Var X i i 2 q i ! = Var (2()) = 4 Var (2( +)) 4 Var (2()) = 4(r ) 2 Var(Y ) 4(r ) 2 (r + 1)N: It follows that Var[E(Y 0 Y ) 2 jY ] 4(r ) 2 (r + 1) N (4.3.11) 65 We thus arrive at the overall bound (4.2.11): 12N (r + 1) 2 Y p VarE[(Y 0 Y ) 2 jY ] + 32 8(r + 1) 2 N 3 Y + 6 4(r + 1) 3=2 p N 2 Y 24N (r + 1)N r 4(r ) 2 (r + 1) N + 32 16 p 2(r + 1) 2 N ((r + 1)N) 3=2 + 6 8(r + 1) 3=2 p N (r + 1)N = = 48 r p r + 1 p N + 2 19=2 p r + 1 p N + 48 p r + 1 p N Thus our overall bound is: Theorem 4.3.1. Given the stationary Neighborhood Attack model on a family of r-regular graphs of size N, the distance between the normalized sum of the values of the nodes of the graph and the standard normal is bounded by: 48 r p r + 1 p N + (2 19=2 + 48) p r + 1 p N (4.3.12) Proof. Above. The bound on the distance is of O r p r N 1 2 . 4.4 Conclusions and Questions 4.4.1 Examples concerning specic families of graphs The bound in 4.3.12 implies that (under stationarity) the normalized sum of values of the nodes of the graph, W , goes to the standard normal as the size of the graph rises given (r ) 2 rN ! 0. Note that rr r 2 . 66 Let us consider four specic families of graphs. First, the complete graph, in which r = N 1. On the complete graph, Y = Y W clearly has the uniform binary distribution taking valuesN. Thus it is to no surprise that our bound on the distance to the normal rises to innity with N. 
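Before turning to the remaining families, a rough simulation of the chain itself can make these contrasts concrete. The sketch below (plain Python, illustration only) runs the Neighborhood Attack dynamics on the cycle graph, with a crude burn-in and thinning standing in for exact stationarity, and compares the empirical variance of $Y$ with the two-sided bound $(r+1)N/2 \le \sigma_Y^2 \le (r+1)N$ of (4.3.7); pointing the same routine at the complete graph reproduces the degenerate $\pm N$ behaviour just described. The graph size, run length, and thinning interval are arbitrary choices, and consecutive samples from one trajectory are correlated, so this is a sanity check rather than a verification.

```python
import random

def run_chain(adj, steps, seed=1):
    # Neighborhood Attack dynamics: pick a node uniformly at random, then set
    # it and all of its neighbours to +1 or -1 with probability 1/2 each.
    rng = random.Random(seed)
    N = len(adj)
    sigma = [rng.choice([-1, 1]) for _ in range(N)]
    ys = []
    for _ in range(steps):
        i = rng.randrange(N)
        val = rng.choice([-1, 1])
        sigma[i] = val
        for j in adj[i]:
            sigma[j] = val
        ys.append(sum(sigma))
    return ys

# Cycle graph C_N (r = 2); crude burn-in and thinning along a single trajectory.
N, r = 200, 2
adj = [[(i - 1) % N, (i + 1) % N] for i in range(N)]
ys = run_chain(adj, steps=100_000)[20_000::20]

mean = sum(ys) / len(ys)
var = sum((y - mean) ** 2 for y in ys) / len(ys)
# Empirical Var(Y) versus the lower and upper bounds of (4.3.7).
print(round(mean, 2), round(var, 1), (r + 1) * N / 2, (r + 1) * N)
```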
From the other side of the spectrum of regular graphs, we can take the circuit (or circle or simple cycle) graph, in which we haveN ordered nodes, each connected to its predecessor and its successor, with node N connected to nodes N 1 and 1. Here r = 2, and hence r 3=2 =N 1=2 goes to 0 as N increases to innity. The argument can be extended to circulant 1 graphs: as long asr stays constant as N rises, Y would converge to the normal. For a slightly more complicated example, consider the hypercube graph: the hypercube of dimension 2 is the square; the hypercube of dimension 3 is the cube; and so on. One can index the nodes of the n-dimensional hypercube graph with a string ofn zeros and ones, with nodes diering in exactly one digit being neighbors. It is easy to see that for an n-dimensional cube, r =n, and N = 2 n . Since r 3=2 N 1=2 = n 3=2 2 n=2 ! n!1 0; we can conclude that W goes to the standard normal for the hypercube family of graphs. 1 A circulant graph is such that we can arbitrarily index its nodes with 0,1,...,N1, in such a way that if the nodes corresponding to two indices x and y are adjacent, then any two nodes indexed by z and (zx+y) mod N are adjacent. Here N is the number of nodes and adjacency of two nodes means they are connected by an undirected edge. 67 Finally, consider the (complete) bipartite graph of size N = 2M, with M a natural number. For this family, r =M, andN = 2M. On such a graph, Y would frequently take values nearN; 0 and N, and hence cannot be expected to go to the normal. Indeed, we have r 3=2 N 1=2 = M 3=2 (2M) 1=2 ! M!1 1: The argument can clearly extend to multipartite graphs of a xed number of par- titions. 4.5 Further Questions We list some potentially interesting questions related to the present paper: What does the stationary distribution of the Markov chain on the state space of the Neighborhood Attack model look like? What states are the most and the least likely? What can one say of the transition matrix of the same Markov chain? What are its eigenvalues? Mixing times? How would our convergence results change if we relax some assumptions? What if we abandon uniformity in the selection of nodes in the Neighborhood Attack model? What if we remove the requirement for regularity of the given family of graphs, and replace it with some requirement on the measure of bipartiteness of the graph, or on the diameter of the graph? 68 What other statistics of the state-space of the Neighborhood Attack model could be of interest? What if one considers the Mercurial Voter model instead of the Neighborhood Attack model? 69 5 A More General Case of the Neighborhood Attack Model; the Mercurial Model We examine a more general case of the problem discussed in the previous chapter; and we consider a similar problem in another voter-type model. 5.1 Results Below, we prove the following theorems: Theorem. Suppose we have a family of (connected, undirected) graphs of size N, with r i the number of neighbors of node i (taking any arbitrary ordering of the nodes), such that 1) r 3 max p N goes to 0 with N; 2) we have 1 (r min + 1) 1 2 X i (r i + 1) 2 + (C N N) X C N (r i r min ) ! > 0; where C N is the number of nodes of the graph endowed with more than r min neigh- bors. 70 Then, the normalizedY -statistic (Y := P i i ) of the Neighborhood Attack Model ran at stationarity over the given family of graphs goes to the standard normal with a rate of at most O( r 3 max p N ) as N rises toward innity. Theorem. 
The normalized sum of values of the nodes in the stationary Mercurial Voter model over a family of r-regular simple connected graphs - W - goes to the standard normal as the size of the underlying graph increases, given that the pa- rameter p2 (0; 1) of the model remains xed. The distance between W and the standard normal is bounded above byC p N 1=2 ; whereC p is the computable constant dependent solely on p given by 5.3.6. Theorem. Suppose we have a family of (connected, undirected) graphs, for which k 2 max N goes to 0 as N goes to innity. (Here k i = P j2N i 1 r j , where r i is the number of neighbors node i has under an arbitrary ordering of the nodes, andN i is the set of neighbors of node i.) Suppose also, that C N , the number of nodes i of the graph of size N for which k i > 1 and ~ C N , the number of nodes i for which k i 6= 1, are such that k max C N is of O(1) and ( P i (k i 1) 2 ) is also of O(1). Suppose, moreover, we run the Mercurial Model with parameter p> 1 2 , in stationarity, on the family of graphs. Then the normalized Y := P i i - the sum of the values associated with the nodes of the graph at stationarity - goes to the standard normal with N, with rate of at most k 2 max p N . 71 5.2 Neighborhood Attack: A More General Case We now try to relax the Neighborhood Attack model. There are at least three conditions which beg for relaxation: 1) the uniformity condition in picking a node at each turn; 2) the uniformity condition in picking 1 or -1 at each turn; 3) and the r-regularity of the graph. The most natural way to relax the rst condition would be to adopt weights for the probabilities of nodes selection according to the number of neighbors each node has - but that pre-supposes that the third of the given conditions has already been relaxed. I don't think that relaxing the second condition would do much. Hence, let us relax the r-regularity condition - in short, suppose each node i in our graph has r i neighbors. Let, moreover Q = P i r i =rN. We proceed along the general scheme established in the rst section, with some necessary alterations. To begin with, let us drop the \q i " terms we had in the previous section, and instead dene u i as the number of 1's which neighbor node i or are node i, i.e. u i = P i2N j 1 j =1 . Here, as before,N i is the neighborhood of nodei, including node i. Clearly, u i takes the integer values between 0 and (r i + 1). Now, Yjfu i g = 2 8 > > > < > > > : (r i + 1)u i with probability 1 2N u i with probability 1 2N 72 And so EYjf i g = 2 N X i 1 2 X j2N i (1 1 i =1 ) + 1 2 X j2N i (1 i =1 ) ! = = 2 2N X i X j2N i ( i ) = 1 N X i (r i + 1) i = = 1 N (r min + 1)Y 1 N X i (r i r min ) i We see that = (r min +1) N and R = 1 Y E Y 1 N P i (r i r min ) i . Furthermore, E(Y ) 2 jfu i g = 4 2N X i (r i + 1) 2 2u i (r i + 1) + 2u 2 i : Basic calculus shows that for g(u) = (r + 1u) 2 +u 2 on u2 [0;r + 1] with r> 0, argmax(g) = 0; (r+1) and argmin(g) = (r+1)=2, and max(g) = (r+1) 2 and min(g) = (r + 1) 2 =2. And so 1 2 P i (r i + 1) 2 P i ((r i + 1) 2 2u i (r i + 1) + 2u 2 i ) P i (r i + 1) 2 . Hence, 1 N X i (r i + 1) 2 E(Y ) 2 2 N X i (r i + 1) 2 Now, 2EY Y =E(Y ) 2 2 N EY (r min + 1)Y + X i (r i r min ) i ! =E(Y ) 2 73 2(r min + 1) N EY 2 =E(Y ) 2 2 N E X i i ! X i (r i r min ) i ! Next, let C N be the count of nodes with non-minimal number of neighbors in the graph at size N. Then (C N N) X C N (r i r min )E X i i ! X i (r i r min ) i ! (NC N ) X C N (r i r min ) 2 N (NC N ) X C N (r i r min ) 2 N E X i i ! X i (r i r min ) i ! 
2 N (C N N) X C N (r i r min ) And so, 1 N X i (r i + 1) 2 + 2 N (C N N) X C N (r i r min ) 2(r min + 1) N EY 2 2 N X i (r i + 1) 2 + 2 N (NC N ) X C N (r i r min ) 74 Hence: 1 (r min + 1) 1 2 X i (r i + 1) 2 + (C N N) X C N (r i r min ) ! EY 2 1 (r min + 1) X i (r i + 1) 2 + (NC N ) X C N (r i r min ) ! For a quick heuristic verication of our calculations, let us try to see what happens at r i =r =r. We end up with N(r + 1) 2 = 1 2 P i (r + 1) 2 (r + 1) Var(Y ) P i (r + 1) 2 (r + 1) =N(r + 1): So things add up so far. To continue, we have to assume that 1 (r min + 1) 1 2 X i (r i + 1) 2 + (C N N) X C N (r i r min ) ! > 0: How weighty is that assumption, and what does it amount to? In eect, we are requesting that P C N (r i r min ) is not too large, and that amounts to a pair of requirements: First, that no single node has too many neighbors; and second, that not too many nodes have more neighbors than the minimum number of neighbors for a node of the graph. In essence, we are asking that the graph, while not r-regular, is \almost" r-regular. 75 Thus, for example, if we take the circle graph and add a single edge somewhere, we still obtain the desired convergence to the normal with the current approach. If, however, I let C N stay at N=2, or I let (r max r min ) stay near some xed fraction ofN along withN, my approach fails. It is fairly clear that letting (r max r min ) stay, say, near N=2 would preclude a convergence to the normal for the Y - statistic, as I explained in the previous section and reiterate again at the end of the current section. It is not clear, however, that the Y -statistic in a family of graphs in which half the nodes have 3 neighbors and the other have have 4 neighbors should not converge to the normal. Indeed, my approach fails to prove that such a convergence doesn't occur; perhaps the convergence is there, but my approach is too weak to achieve the result. We next consider Var(E X (X) 2 ), or, rather, the non-smaller entity Var(E(X) 2 jf i g). Note that since Y = 2X N, we have Y = 2X, and conditioning onX is equivalent to condition onY . Hence bounding Var(E X (X) 2 ) is equivalent to bounding Var(E Y (Y ) 2 ) up to an insignicant scalar factor of 4. Var E X (X) 2 Var E(X) 2 jf (t) i g = = Var 1 2N X i (r i + 1) 2 2u i (r i + 1) + 2u 2 i ! = = 1 N 2 Var X i u i (u i r i 1) ! 76 Now, 0u i r i + 1, and so u i r i 1 0. Hence, continuing from above, Var E X (X) 2 1 N 2 Var X i u i (r i + 1u i ) ! = = 1 N 2 Var X i X j2N i 1 j =1 ! X k2N i 1 k =1 !! ; where, againN i is the neighborhood of node i, meaning (the set of) all nodes at distance 1 to i, including i itself. Thus we end up with taking the variance of a count over the pairs of nodes of opposite signs at distances 1 (neighbor) or 2 (near-neighbor) from each other. Each node participates twice for each neighbor or near neighbor of opposite sign. We need to introduce some more notation. Let i be the number of neighbors or near-neighbors to node i with a value same as i's; let i be the number of neighbors or near-neighbors to node i with a value opposite i's; and i = i + i , with i corresponding to pairs of 1's, and i corresponding to pairs of -1's. Let, moreover, Q = P i i + i = P i r i , where r i is the number of nodes at a distance of at most 2 from nodei. Note thatQ is xed along withN andQ. Now, 77 Var X i ( X j2N i 1 j =1 )( X k2N i 1 k =1 ) ! = Var(2 X i i ) = = Var(2 X i i ) = Var 2 X i ( i + i ) ! = 4 Var X i i + X i i ! 4 Var(2 X i i ) 16(r max ) 2 Var(X); Note that r i r i r 2 i . 
To get the last line, we use the fact that for identically distributed r.v.'s X;Y , we have Var(X +Y ) Var(X +X) - and in our case, by symmetry, P i i must have the same distribution as P i i . Finally, Var(E X (X) 2 ) 16(r max ) 2 N 2 Var(X) 16(r max ) 2 N 2 1 4(r min + 1) 1 2 X i (r i + 1) 2 + (C N N) X C N (r i r min ) ! : Next, we want to consider ER 2 . Note that here R = 1 Y E Y 1 N P i (r i r min ) i : Hence, using Jensen and Cauchy-Schwarz, 1 2 Y E ~ R 2 =ER 2 1 2 Y 1 N 2 E X i (r i r min ) i ! 2 1 2 Y 1 N 2 X C N (r i r min ) ! 2 : 78 We can now evaluate the overall bound to the normal. Recall that 12 p VarE W (W 0 W ) 2 + 37 p ER 2 + 32 A 3 + 6 A 2 p : We can take A = 2(rmax+1) Y . Note that Q = P i r i =Nr, and recall that = r min +1 2N . So the above expression is equivalent to: 24N (r min + 1) 2 Y p VarE Y (Y ) 2 + 74N p E ~ R 2 (r min + 1) Y + + 64 8 N(r max + 1) 3 (r min + 1) 3 Y + 6 4 p 2 p N(r max + 1) 2 p (r min + 1) 2 Y : Patching the whole mess together yields something of at most O( r 3 max p N ). So we can posit a sucient condition for the asymptotic normality of X (and hence also of Y ) in the current form of the model - we require, on top of the previously established assumptions, that r 3 max p N goes to zero as N rises to innity in the given family of graphs. This outcome make sense heuristically. Consider the star graph, in which one central node has (N 1) neighbors, and all other nodes have only 1 neighbor - the central node. Every turn, with probability 1=N, the nodes of the entire graph adopt the same value. Such behavior places great weight in the tails of the distribution ofX, greater in particular than one would have ifX were approximately Gaussian. In particular, we improve on the r-regularity result by having shown that, for example, the simple circuit family of graphs endowed with one additional edge at 79 each value ofN, has the property that the normalizedX-statistic of the Neighbor- hood Attack Model in stationarity on that family of graphs goes to the standard normal as N rises. And so, we have the following theorem: Theorem 5.2.1. Suppose we have a family of (connected, undirected) graphs of size N, with r i the number of neighbors of node i (taking any arbitrary ordering of the nodes), such that 1) r 3 max p N goes to 0 with N; 2) we have 1 (r min + 1) 1 2 X i (r i + 1) 2 + (C N N) X C N (r i r min ) ! > 0; where C N is the number of nodes of the graph endowed with more than r min neigh- bors. Then, the normalizedY -statistic (Y := P i i ) of the Neighborhood Attack Model ran at stationarity over the given family of graphs goes to the standard normal with a rate of at most O( r 3 max p N ) as N rises toward innity. Proof. Above. 5.3 The Mercurial Voter Model Let us now turn our attention to the Mercurial Voter Model. The goal again is to determine sucient conditions under which the distribution of the sum of the values of the nodes of the Mercurial model goes to the normal under stationarity as the sizes of the graphs of a given family of underlying graphs rises to innity. 80 The Mercurial model mixes the Voter and Anti-Voter models as follows: The Voter Model assigns 1's and -1's to the nodes of a given (nite, connected, undirected) simple graph. We then run a Markov chain by (uniformly at random) picking a node at each turn, and then assigning it a value equal to that of one of its neighbors chosen uniformly at random. 
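For concreteness, the update rule just described is a one-line operation. The minimal sketch below (plain Python; the graph is supplied as an adjacency list, an assumption made purely for illustration) implements a single Voter move and runs the chain on a small cycle while tracking the sum of the node values.

```python
import random

def voter_step(sigma, adj, rng=random):
    # One Voter move: pick a node uniformly at random and copy the value of a
    # uniformly chosen neighbour.
    i = rng.randrange(len(sigma))
    sigma[i] = sigma[rng.choice(adj[i])]

# Run the dynamics on a small cycle and track Y, the sum of the node values.
N = 30
adj = [[(i - 1) % N, (i + 1) % N] for i in range(N)]
sigma = [random.choice([-1, 1]) for _ in range(N)]
for _ in range(100_000):
    voter_step(sigma, adj)
print(sum(sigma))  # long runs typically end in the all +1 or all -1 state
```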
One can see that the chain of the Voter model would eventually sink into one of the two absorbing states on which all nodes of the graph are of the same value. A detailed analysis of the Voter Model can be found in [39]. The Anti-Voter Model works as the Voter Model, the key dierence being that the selected node aligns itself in opposition to its neighbors rather than in agreement. The associated Markov Chain over the relevant subset of the state spacef1; 1g N (with N the size of the graph) is thus irreducible. The problem of obtaining sucient conditions for the convergence of the sum of the nodes in the Anti-Voter Model to the normal was tackled in [49]. The Mercurial Voter Model mixes the Voter and Anti-Voter Models by having the selected node adopt its neighbor's value with probability (1p) = q2 (0; 1), and the opposite value with probability p. Forp = 1=2, one expects that the value of each node at stationarity does not depend on the values of the other nodes in the graph, and is1 with probability a half each, implying that the number of nodes with value 1 is Binomial with parameters 1=2 and N - which goes to the normal as N rises. 81 Let us therefore consider the general case p6= 0; 1=2; 1. Dene, as in the previous example, (t) i =1 the value of node i at turn t; Y :=Y (t) = P N i=1 (t) i ; Y 0 =Y (t+1) ; 2 Y = Var(Y ); W =Y= Y ; W 0 =Y 0 = Y . Assume the given family of graphs is r-regular. The symmetry of the model implies that at stationarity each node is equally likely to be 1 or -1. ThereforeEY = 0 and Var(Y ) =EY 2 . 5.3.1 and Var(Y ) Let (with (t) meaning at turn t) a(t) := number of edges with the same values at either end b(t) := number of edges with dierent values at either end c(t) := number of edges with 1's at either end d(t) := number of edges with -1's at either end: Counting provides the equalities: a =c +d; a +b =rN=2; 2(cd) =rY (5.3.1) 82 Suppose we know b;c; and d at turn t. Then, for Y = Y 0 Y , we have the following possible recongurations after we move at turn t: 8 > > > > > > > > > > > > > > > > > > > > > > > > > > < > > > > > > > > > > > > > > > > > > > > > > > > > > : Y = 2 :b; + +c with probability q 2 b b+c+d Y =2 :b; + +d with probability q 2 b b+c+d Y = 0 : no change with probability p b b+c+d Y =2 :c; + +b with probability p c b+c+d Y = 0 : no change with probability q c b+c+d Y = 2 :d; + +b with probability p d b+c+d Y = 0 : no change with probability q d b+c+d Herex means \x decreases by 1," and + +x means \x increases by 1." It follows that (using b +c +d =a +b =rN=2) Yjb;c;d = 8 > > > > > > > > < > > > > > > > > : 2 with probability 2 rN [ qb 2 +pd] 0 with probability 2 rN [pb +qc +qd] 2 with probability 2 rN [ qb 2 +pc] 83 And therefore, E[Y 0 YjY ] =E[YjY ] =E [E(Yjb;c;d)jY ] = =E 2 rN (qb + 2pd (qb + 2pc))jY =E 4p rN (dc)jY = =E 4p rN (cd))jY =E 4p rN rY 2 jY = = 2p N Y; implying (W;W 0 ) is -Stein with = 2p N . At p = 1, we should have the for the Anti-Voter model; which is indeed the case, as shown in [49]. 
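The identity $E(Y'-Y \mid \text{configuration}) = -\frac{2p}{N}Y$ on an $r$-regular graph can also be checked directly by enumerating one step of the chain, with no simulation involved. The sketch below (plain Python, illustration only; the cycle graph is an arbitrary 2-regular example, and the values $N=40$, $p=0.3$ are arbitrary) computes the exact one-step conditional drift of the Mercurial model for a random configuration and compares it with $-\frac{2p}{N}Y$.

```python
import random

def exact_drift(sigma, adj, p):
    # Exact one-step E[Y' - Y | configuration] for the Mercurial model: a node
    # is picked uniformly, one of its neighbours is picked uniformly, and the
    # node copies the neighbour's value with probability q = 1 - p or adopts
    # the opposite value with probability p.  Only the picked node can change.
    N, q = len(sigma), 1.0 - p
    drift = 0.0
    for i in range(N):
        for j in adj[i]:
            expected_new_value = q * sigma[j] + p * (-sigma[j])
            drift += (expected_new_value - sigma[i]) / (N * len(adj[i]))
    return drift

N, p = 40, 0.3
adj = [[(i - 1) % N, (i + 1) % N] for i in range(N)]  # cycle graph, r = 2
sigma = [random.choice([-1, 1]) for _ in range(N)]
Y = sum(sigma)
print(exact_drift(sigma, adj, p), -2 * p * Y / N)  # the two values coincide
```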
Next, 0 =E(Y 0 ) 2 EY 2 =E(Y + Y ) 2 EY 2 =EY 2 + 2EY Y +E(Y ) 2 EY 2 = = 2EY Y +E(Y ) 2 = 2E[E(Y Yjb;c;d)] +E(Y ) 2 = = 4p N EY 2 +E[E(Y ) 2 jb;c;d] We expand the rightmost term: E[(Y ) 2 jb;c;d] = 4 2 rN qb 2 +pd + qb 2 +pc = 8 rN (qb +pc +pd) Since (using b;c;d 0) min(p;q) rN 2 = min(p;q)(b+c+d) (qb+pc+pd) max(p;q)(b+c+d) = max(p;q) rN 2 ; 84 we see that 4 min(p;q)E[(Y ) 2 jb;c;d] 4 max(p;q); and hence N min(p;q) p EY 2 N max(p;q) p (5.3.2) 5.3.2 Upper bound to Var[E(Y 0 Y ) 2 jY ] Since (by Jensen) Var(E[(Y 0 Y ) 2 jY ]) Var(E[(Y ) 2 jb;c;d]), we obtain Var E(Y 0 Y ) 2 jY Var 8 rN (qb +pc +pd) = = 64 r 2 N 2 Var q rN 2 (c +d) +p(c +d) = = 64(pq) 2 r 2 N 2 Var(c +d): From denition, Var(c +d) = Var(c) + Var(d) + 2 Cov(c;d). On the other hand, Var(Y ) = Var( 2 r (cd)) = 4 r 2 Var(cd); and Var(cd) = Var(c) + Var(d) 2 Cov(c;d). Given Cov(c;d) 0, we would have Var(c +d) Var(cd) = r 2 4 Var(Y ) r 2 N max(p;q) 4p : 85 And Cov(c;d) 0 follows from the fact that at each turn, raising either c or d implies a certainty of non-increase in d or c respectively, with a possibility of decrease. We have therefore obtained the bound Var E(Y 0 Y ) 2 jY 64(pq) 2 r 2 N 2 Var(c +d) 64(pq) 2 r 2 N 2 r 2 N max(p;q) 4p = (5.3.3) = 16(pq) 2 max(p;q) Np : (5.3.4) We can now bound the distance () between W =Y= Y and the normal using 4.2.10: 12 p VarE[(W 0 W ) 2 jW ] + 32 A 3 + 6 A 2 p ; (5.3.5) whereA is some xed upper bound onjW 0 Wj. In our case we can putA = 2= Y . Then: 12 p VarE[(W 0 W ) 2 jW ] + 32 A 3 + 6 A 2 p 12N 2p 2 Y p VarE[(Y 0 Y ) 2 jY ] + 32 8N 2p 3 Y + 6 4 p N p 2p 2 Y 12Np 2pN min(p;q) s 16(pq) 2 max(p;q) Np + 2 7 Np 3=2 p(N min(p;q)) 3=2 + 24 p Np p 2pN min(p;q) = = 24jpqj p max(p;q) p 1=2 N 1=2 min(p;q) + 2 7 p 1=2 N 1=2 (min(p;q)) 3=2 + 12 p 2p 1=2 N 1=2 min(p;q) : 86 Thus, N 1=2 24jpqj p max(p;q) p 1=2 min(p;q) + 2 7 p 1=2 (min(p;q)) 3=2 + 12 p 2p 1=2 min(p;q) ! : (5.3.6) Thejpqj factor in the rst (and typically heaviest) of the three terms conrms our conjecture that the case p =q = 1=2 is somehow special. We can summarize our main result in a theorem. Theorem 5.3.1. W , the normalized sum of values of the nodes in the stationary Mercurial Voter model over a family of r-regular simple connected graphs goes to the standard normal as the size of the underlying graph increases, given that the parameter p2 (0; 1) of the model remains xed. The distance between W and the standard normal is bounded above byC p N 1=2 ; whereC p is the computable constant dependent solely on p given by 5.3.6. Proof. Above. 5.4 Relaxing some conditions in the Mercurial Model Let us now drop the r-regularity condition in the Mercurial Voter model. Keeping notation as above, additionally dene Q := P i r i the number of all edges in the graph; and r i the number of neighbors of node i. We need to introduce some new statistics to manage the additional structure we have introduced. Lets i be the number of neighbors to nodei whose values equal 1 at turn t in stationarity. 87 Let us moreover do some work with X = P i 1 i =1 rather than with Y = P i i , since X is non-negative. The conditional distribution of X becomes: Xjfs i g = 8 > > > > > > > > > > > > < > > > > > > > > > > > > : 1 with probability 1 N P i =1 p r i si i r i + P i =1 q s i r i 0 with probability 1 N ( P i =1 q r i si i r i + P i =1 q s i r i + + P i =1 p r i si i r i + P i =1 p s i r i ) 1 with probability 1 N P i =1 q r i si i r i + P i =1 p s i r i Note that Y = 2X. 
Before we continue, observe that, forN j the set of neighbors of nodej (excluding node j itself), X i r i 2s i r i = X i P j2N i j r i = X i i X j2N i 1 r j ! = X i k i i ; where k i = P j2N i 1 r j . It is easy to see that k i 0 and P i k i =N. And also, X i s i r i = X i 1 i =1 X j2N i 1 r j ! = X i k i 1 i =1 : 88 Proceeding as before, we nd that E(YjY ) =E[E(Yjf i g)jY ] = = 2 N E X " X i =1 p r i s i r i + X i =1 q s i r i ! X i =1 p s i r i + X i =1 q r i s i r i !# = = 2 N E Y " X i p r i s i r i X i =1 r i s i r i X i p s i r i + X i =1 s i r i # = = 2 N E Y " X i p r i 2s i r i + X i s i r i X i 1 i =1 # = = 2 N E Y " p X i k i i + X i (k i 1)1 i =1 # = = 2pY N + 2 N E Y " p X i (k i 1) i + X i (k i 1)1 i =1 # That means = 2p N , as before. We also have an R, which equals R = 1 Y 2 N E Y [p P i (k i 1) i + P i (k i 1)1 i =1 ]. Observe that when all the r i 's are equal, we have k i = 18i, and so we end up with exactly the same and R we had in the previous section (in which we discussed the r-regular case for the Mercurial model). It can also be shown that E(XjX) =E[E(Xjf i g)jX] = = X N + 1 N E X (qp) X i s i r i +pN ! 89 Next, E (X) 2 jfs i g =E 1 N " X i =1 p r i s i r i + X i =1 q s i r i + X i =1 p s i r i + X i =1 q r i s i r i # : It follows that min(p;q)E(X) 2 max(p;q): We want to boundEX 2 . To that end, note that 2EXX =E(X) 2 The left-hand-side expands to: 2EXX = 2 N EX 2 2pN N EX + 2 pq N EX X i k i 1 i =1 : Force p>q. Then, with C N the number of nodes for which k i > 1: 2EXX 2 N EX 2 pN + 2 (pq)k min N EX 2 + 2 (k max 1)C N N N 2 = = 2(1 (1 2p)k min ) N EX 2 pN + (k max 1)C N and 2EXX 2 N EX 2 pN + 2 (pq)k min N EX 2 : 90 Thus, (min(p;q) +pN (k max 1)C N )N 2(1 (1 2p)k min ) EX 2 (max(p;q) +pN)N 2 (1 + (pq)k min ) : Now, we are actually interested in Var(X) =EX 2 (EX) 2 . Straightforward calcu- lations show that in the case where all r i are equal, we get the O(N) upper bound on Var(X) that we had in the previous section; and if the r i 's are not equal, then k min < 1 < k max , and the upper bound is O(N 2 ) - meaning that we need to push the lower bound to also be of O(N 2 ) to succeed in obtaining our overall goal. C N also has to be assumed to be of order less thanO(N) (preferablyO(1)); and moreover, indeed C N k max must be of order less than O(N). So consider our current lower bound to Var(X): Var(X) (min(p;q) +pN (k max 1)C N )N 2(1 (1 2p)k min )N 2 =4 2(1 (1 2p)k min ) = = N(q (k max 1)) +N 2 (p 1 2 (1 (1 2p)k min )) 2(1 (1 2p)k min ) Now, k min < 1 unless r i = r8i, and we have already examined that case, so let us suppose not allr i are equal. Givenk min < 1, we have 1 2 (1 (1 2p)k min ))<p, and so the coecient of the N 2 term is positive. To have that term dominate the sum for large N, we must force the assumption that k max is of order less than O(N) (which we would have to assume later anyway, as explained below). 91 Observe that if the coecient of theN 2 term is of orderO(N 1 ), i.e., if (1k min ) is of order O(N 1 ), then both the lower and upper bounds of Var(X) are (as long ask max is of order less thanO(N)) of orderO(N), and the current approach works. Finally, recall that Var(Y ) = 4 Var(X), and that therefore we have bounded Var(Y ) as well as Var(X). Next, we want to take care of Var(E X (X) 2 ). We have: Var(E X (X) 2 ) (5.4.1) E Var 1 N " X i =1 p r i s i r i + X i =1 q s i r i + X i =1 p s i r i + X i =1 q r i s i r i #! = (5.4.2) = 1 N 2 Var p X i i X j2N i 1 r j 1 j = i ! +q X i i X j2N i 1 r j 1 j 6= i !! 
= (5.4.3) = 1 N 2 Var p X i i X j2N i 1 r j 1 j = i 1 j 6= i ! + X i i X j2N i 1 r j 1 j 6= i !! (5.4.4) Now, 1 j = i 1 j 6= i equals i . Hence the rst term inside the variance in the last line is p P i P j2N i 1 r j . That equals pN and is invariant. We obtained line 5.4.4 by substituting (1p) for q in the preceding line. Sub- stituting (1q) for p leads us to: Var(E X (X) 2 ) 1 N 2 Var X i i X j2N i 1 r j 1 j 6= i ! = 1 N 2 Var X i i X j2N i 1 r j 1 j = i ! 4k 2 max N 2 Var(X) 4k 2 max N 2 (max(p;q) +pN)N N 2 4 2 (1 + (pq)k min ) 2 (1 + (pq)k min ) 92 Note that conditioning on X is the same as conditioning on Y , and that Y = 2X. So we have bounded Var(E Y (Y ) 2 ) as well as Var(E X (X) 2 ). Next,ER 2 . Recall that R = 1 Y 2 N E Y " p X i (k i 1) i + X i (k i 1)1 i =1 # : Hence, with some use of Jensen, ER 2 1 2 Y N 2 E " p X i (k i 1) i + X i (k i 1)1 i =1 # 2 1 2 Y N 2 0 @ (p + 1) X ~ C N (k i 1) 1 A 2 : For this last term to behave appropriately, we want to let ~ C N be the number ofk i 's that do not equal one; and to require that ~ C N is xed. This assumption implies that (p + 1) P ~ C N (k i 1) 2 is of O(1). More generally, any sucient assumption providing (p + 1) P ~ C N (k i 1) 2 O(1) would work. Putting everything together yields a bound on the distance to normality of order O( k 2 max p N ): What is going on? In a way, k i = P j2N i 1 r j measures the amount of \lonely" neighbors a node has. Thus the central node in the star graph has N 1 neighbors who only have one neighbor each; and the k associated with the central node is (N 1), while the k's associated with each of the lonely nodes all equal 1 N1 . 93 Hence, for extreme values of p (and hence q), the values of the lonely nodes in the star graph will align themselves with (or against) the value of the central node. The value of the central node could change, on average, only once every N turns - meaning that the X-statistics will take large and small (tail) values disproportionately more often than it would under a Normal postulation. Thus our sucient conditions for the convergence of the distribution of X to the normal (at stationarity, as N rises for a specic family of graphs), is that k max is not too large, in the sense that k 2 max p N goes to 0 asN rises to innity; and that ~ C N , the number of nodes in a graph for which k i 6= 1, is such that C N k max is of O(1) and that ( P i (k i 1) 2 ) is also of O(1). In sum, Theorem 5.4.1. Suppose we have a family of (connected, undirected) graphs, for which k 2 max p N goes to 0 as N goes to innity. (Here k i = P j2N i 1 r j , where r i is the number of neighbors node i has under an arbitrary ordering of the nodes, andN i is the set of neighbors of node i.) Suppose also, that C N , the number of nodes i of the graph of size N for which k i > 1 and ~ C N , the number of nodes i for which k i 6= 1, are such thatk max C N is ofO(1) and ( P i (k i 1) 2 ) is also ofO(1). Suppose, moreover, we run the Mercurial Model with parameterp> 1 2 , in stationarity, on the family of graphs. Then the normalizedY := P i i - the sum of the values associated with the nodes of the graph at stationarity - goes to the standard normal with N, with rate of at most k 2 max p N . Proof. Above. 94 6 A Stein's Method Application to the Uniform Recursive Tree We use size-bias techniques to bound the distance to a Poisson random variable in the context of the Uniform Recursive Tree model. 6.1 Denitions Denition 6.1.1. 
Uniform Recursive Tree: Start with a ("root") node 0; and then at each turn, uniformly connect a new node to an existing node. Thus, at turn 1, connect a node 1 to node 0. At turn 2, connect a node 2 to either node 1 or node 0 with probabilities 1/2 each. Etc.

Definition 6.1.2. The level of a node is its distance to the root node (node 0). The set of level $i$ nodes is denoted by $L_n(i)$.

Definition 6.1.3. The degree of a node is the number of nodes connected to it (i.e. the number of edges to the node). There is at most one edge between any two nodes, and there are no edges between nodes whose levels differ by anything other than 1. Each node is of degree at least 1. The sum of degrees of all nodes is $2n$. The degree of node $i$ is denoted by $\deg_n(i)$. Here, $n$ is the number of turns/steps that have taken place.

Now, Theorem 3.1 in [5] tells us that:

Theorem. For $X[n,d] = |\{i \in L_n(1) : \deg_n(i) = d\}|$, the variables $X[n,1], X[n,2], \ldots$ are asymptotically i.i.d. Poisson with mean 1 as $n\to\infty$.

Thus, $X[n,d]$ is the number of nodes of level 1 with degree $d$ at time $n$.

6.2 Problem

We want to use Stein's method to get a bound on the distance between the Poisson(1) distribution and the $X[n,d]$'s. To do that, let us use one of the available size-bias Stein bounds, for example Theorem 4.31 in [53]:

Theorem. (Decreasing Size-Bias Coupling) Let $X_1,\ldots,X_n$ be indicator variables with $P(X_i=1)=p_i$, $W=\sum_{i=1}^{n}X_i$, and $\lambda = E[W] = \sum_i p_i$. For each $i=1,\ldots,n$, let $(X^{(i)}_j)_{j\ne i}$ have the distribution of $(X_j)_{j\ne i}$, conditional on $X_i=1$, and let $I$ be a random variable independent of all else, such that $P(I=i)=p_i/\lambda$, so that $W^s = \sum_{j\ne I}X^{(I)}_j + 1$ has the size-bias distribution of $W$. If $X^{(i)}_j \le X_j$ for all $i\ne j$, and $Z$ has distribution Poisson($\lambda$), then
\[
d_{TV}(W,Z) \le \min\{1,\lambda\}\left(1 - \frac{\operatorname{Var}(W)}{\lambda}\right). \tag{6.2.1}
\]

The increasing coupling version changes $X^{(i)}_j \le X_j$ to $X^{(i)}_j \ge X_j$ and yields the bound:
\[
d_{TV}(W,Z) \le \min\Big\{1,\frac{1}{\lambda}\Big\}\left(\operatorname{Var}(W) - \lambda + 2\sum_{i=1}^{n}p_i^2\right), \tag{6.2.2}
\]
where $p_i = EX_i$.

In our case, it is hard to construct either an increasing or a decreasing coupling, and we resort to the following result:

Theorem. (Theorem 4.13 in [53]) Let $W\ge 0$ be an integer-valued random variable such that $EW = \lambda > 0$ and let $W^s$ be a size-bias coupling of $W$. If $Z$ is Poisson($\lambda$), then
\[
d_{TV}(W,Z) \le \min\{1,\lambda\}\,E|W+1-W^s|. \tag{6.2.3}
\]
Here $W^s = \sum_{j\ne I}X^{(I)}_j + 1$ is a size-bias coupling of $W$ if for each $i=1,\ldots,n$, $(X^{(i)}_j)_{j\ne i}$ has the distribution of $(X_j)_{j\ne i}$ conditional on $X_i=1$, with $I$ an r.v. independent of the rest, $P(I=i)=p_i/\lambda$.

6.3 Result

Before we continue, let us summarize our main result:

Theorem. The distance in total variation between the Poisson distribution with parameter $d$ and the distribution of the number of level-1-degree-less-than-or-equal-to-$d$ nodes in the recursive tree of size $n$ is bounded above by a term of the form $O\!\big(C\,\frac{(\ln n)^{d-1}}{n}\big)$, where $C$ is some constant not dependent on $n$.

The proof follows below.

6.4 Coupling in d = 1 case

Express $X[n,d]$ as
\[
X[n,d] = \sum_{i=1}^{n}\mathbf{1}_{A^{(n)}_i}\mathbf{1}_{B^{(n)}_{i,d}} = \sum_{i=1}^{n}\mathbf{1}_{A^{(n)}_i\cap B^{(n)}_{i,d}} = \sum_{i=1}^{n}\mathbf{1}_{C^{(n)}_{i,d}}, \tag{6.4.1}
\]
where event $A^{(n)}_i$ is "node $i$ is level 1," and event $B^{(n)}_{i,d}$ is "node $i$ is degree $d$." Here $\lambda_1(n)$ is the expected value of the number of level-1, degree-1 nodes in the recursive tree of size $n$. Let us focus on $X[n,1]$.
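The objects just defined are easy to generate, which gives a quick empirical view of the theorem of [5] quoted above. The sketch below (plain Python, illustration only) builds uniform recursive trees as in Definition 6.1.1 and compares the empirical distribution of $X[n,1]$ with the Poisson(1) probabilities; the particular tree size and the number of repetitions are arbitrary.

```python
import math
import random
from collections import Counter

def sample_X(n, d, rng=random):
    # Uniform recursive tree on nodes 0..n (node 0 is the root): at turn t,
    # node t attaches to a uniformly chosen node in {0, ..., t-1}.
    parent = [0] * (n + 1)
    degree = [0] * (n + 1)
    for t in range(1, n + 1):
        p = rng.randrange(t)
        parent[t] = p
        degree[t] += 1
        degree[p] += 1
    # X[n, d]: number of level-1 nodes (children of the root) of degree d.
    return sum(1 for i in range(1, n + 1) if parent[i] == 0 and degree[i] == d)

n, reps = 2000, 5000
counts = Counter(sample_X(n, 1) for _ in range(reps))
for k in range(6):
    empirical = counts[k] / reps
    poisson = math.exp(-1) / math.factorial(k)  # Poisson(1) probability of k
    print(k, round(empirical, 3), round(poisson, 3))
```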
Observe that P(C (n) i;1 ) = 1 i i i + 1 ::: n 1 n = 1 n ) 1 (n) = n X i=1 1 n = 1 (6.4.2) Moreover, since knowing that C (n) i;1 has occurred tells us that 1) node i is level 1, and 2) all nodes after i are not attached to i, P(C (n) j;1 jC (n) i;1 ;j >i) = 1 j 1 j 1 j j j + 1 ::: n 2 n 1 = 1 n 1 (6.4.3) 98 P(C (n) j;1 jC (n) i;1 ;j <i) = 1 j j 1 j j j + 1 ::: i 2 i 1 i i i 1 i i i + 1 ::: n 2 n 1 = 1 n 1 (6.4.4) Intuitively, the coupling should be along the following lines: Pick a node i uniformly out of all nodes. (I.e. take I s.t. P(I = i) = 1=n = 1=n). Connect node i to the root node if necessary. If there are any nodes connected to i, \reconnect" them uniformly elsewhere, i.e. if node j is connected to i, uniformly pick a number k from the set f0; 1;:::;i 1;i + 1;:::;jg; and connect j to that k. Under this construction, we do have the desired conditional distribution: LetE i;j be the event that there is an edge between nodesi andj. We do haveP(E s i;j ;i;j6= I) =P(E i;j j1 C (n) I;1 = 1). Fori<j,j <I that probability is 1=j; and fori<j;j >I, the probability is 1 j1 = 1 j + 1 j 1 (j1) . Both X [n;1] and the 1 C (n) i;1 's are functions of the E i;j 's, and therefore the coupling holds. The problem is that the coupling is neither increasing nor decreasing: a \recon- nected" node may go to a level-1 node that had no branches; or it could connect to the root node, and have no branches of its own. Hence we make do with theEjW + 1W s j bound. 99 6.5 Bound in d = 1 case We have d TV minf1;gEjW + 1W s j =EjW + 1W s j 1 n + n 1 n Ej n X k=I k j1 C I;1 6= 1j 1 n +E n X k=I j k j1 C I;1 6= 1j; where k is the change in the given sum caused by a coupling perturbation at node i I. These changes can occur in three cases: 1) at coupling step 2, in the case when node I is the only node connected to a level-1 node; 2) at coupling step 3, in the case when a branch of I turns into a lvl-1-deg-1 node upon coupling; 3) at coupling step 3, in the case when a branch ofI attaches to a lvl-1-deg-1 node upon coupling. The probability for the occurrence of the rst case is: P 1 = I1 X i=1 1 i i i + 1 ::: I 2 I 1 1 I I I + 1 ::: n 1 n = = I1 X i=1 1 I 1 1 n = 1 n I1 X i=1 1 I 1 = 1 n : The idea is that nodeI can be connected to any of its predecessors except the root node - so we sum over nodes 1 to I 1. For each element of the sum, we look at the chance of its being connected to both the root node and I but to nothing else. For the second case, the probability that node j > I was connected (prior to the coupling) specically to node I, and was of degree one, is the same as the 100 probability of node j being connected to the root node and of degree one - namely 1=n. The probability of node j's hitting the root node upon its detachment from node I is 1=(j 1) - since node j must reattach to any of its predecessors except I, and the root node is one out of (j 1) uniform options. Hence, the case 2 perturbation at node j >I occurs with probability 1 n(j1) . For the third case, the probability that nodej >I was connected toI before the coupling is 1=j. Upon coupling, node j connects to a non-root-non-I node k with probability j2 j1 . That node k was originally lvl-1-deg-1 with probability slightly less than 1=n - since we are conditioning on I not being lvl-1-deg-1, which slightly raises the chance of I originally being a branch of k. So the probability of seeing a case 3 perturbation is bounded above by 1 jn . 
Hence, 1 n +E n X k=I j k j1 C I;1 6= 1j = 2 n +E X I<j ( 1 (j 1) 1 n + 1 j 1 n )< (6.5.1) < 2 n +E 2 n X I<j 1 j 1 = 2 n + 2 n n X i=1 1 n X i<j 1 j 1 = (6.5.2) = 2 n + 2 n 2 1 1 + 1 2 +::: + 1 n 1 + 1 2 + 1 3 +::: + 1 n 1 +::: = (6.5.3) = 2 n + 2 n 2 (n 1)< 4 n (6.5.4) Which produces our bound. The bound does go to 0 as n rises to innity. 101 6.6 d> 1 Case; Coupling 1) Pick a node i with probability p C (n) i;d P n j=1 p C (n) j;d = p C (n) i;d d (n) . 2) Move node i, along with all its branches, to the root node. 3) Node i either has too many or is has not enough branches. 3a) If i has too many branches: We want to shed the extra branches. Pick d 1 branches connected to i, making sure to use appropriate weights: for example, ford = 2, supposei has two branches, starting at nodes j and k. Pick node j with probability 1 j 1 j + 1 k and nodek with probability one minus the above. The selected nodes are the ones we keep. Discard the extra nodes by reassigning them uniformly away from i: for ex- ample, if we are discarding node j, we would uniformly (with probabilities 1=(j 1)) connect it to any of the nodes 0; 1;:::;i 1;i + 1;:::;j 1: 3b) If i has too few branches: With appropriately weighted probabilities, select as many nodes as we need from the nodes arriving after node i. For example, supposed d = 3, and only nodek (withk>i) is connected to i, and we need one more node to connect to i. Pick a node j >i with probability 1 j 1 i+1 + 1 i+2 +:::+ 1 k1 + 1 k+1 +:::+ 1 n . 102 This produces a non-monotone size-bias coupling, provided that we can show that we have obtained the required conditional probability distribution. We should have that, since we possess the same amount of information under both the coupling and the conditioning - namely, that nodeI is level-1, and has exactlyd1 branches. For d > 1, it is hard to explicitly calculate the conditional distributions, though examining the events E (n;d) I;j j[1 C (n) I;d = 1] may do the job. One can get rid of the "not enough branches" case by working with Y [n;d] = P d k=1 X [n;k] instead of withX [n;d] . We look at (takei<j)P(E s i;j ) andP(E i;j j1 C I;d = 1). For j <I,P(E s i;j ) =P(E i;j j1 C I;d = 1) = 1=j. For j =I,P(E s i;j ) =P(E i;j j1 C I;d = 1) = 0 unless i = 0. For j > I;i6= I, P(E s i;j ) = P(E i;j j1 C I;d = 1) = 1 j + 1 j P(F I;j ) 1 j1 = j1+P(F I;j ) j(j1) : HereP(F I;j ) is the probability that j is an "excess node" on I. For j >I;i =I,P(E s i;j ) =P(E i;j j1 C I;d = 1) = (1P(F I;j )) 1 j : So, since Y [n;d] is a function of these E i;j 's, we do have our coupling. 6.7 The Bounds in d> 1 case Take d (n) =EY [n;d] . d TV minf1; d (n)gEjW + 1W s j (6.7.1) We look at later. Focus onEjW + 1W s j: 103 Examine the case where we have too many branches. Adopt the same tactic as before - break the original expression into the \coupling not required" and "coupling required" cases, and proceed from there: d TV minf1; d (n)gEjW + 1W s j = d (n)EjW + 1W s j d (n)E[P(D I;d )] + d (n)E[(1P(D I;d ))j n X k=I k j1 D I;1 6= 1j] d (n) n X i=1 P(D i;d )P(I =i)+ + d (n) n X i=1 P(I =i)(1P(D i;d ))Ej n X k=i k j1 D i;1 6= 1j d (n) n X i=1 P(D i;d ) P(D i;d ) d (n) + + n X i=1 d (n) P(D i;d ) d (n) (1P(D i;d ))Ej n X k=i k j1 D i;1 6= 1j n X i=1 (P (D i;d )) 2 + n X i=1 P(D i;d )(1P(D i;d )) n X k=i Ej k j1 D i;1 6= 1j Here the event D i;d is the event \node i is level-1-degree-at-most-d." Note D i;d = [ i k=1 C k;d . 
Nowj j j,j >I, is non-zero in the following cases: ifj is an \excess" branch of I, and moreover 1)j had at-mostd branches and connected to the root node upon coupling; 2) j went to a level-1-degree-d node upon the coupling. Case 1 occurs with probability P(F I;j )jP(D (n) j;d ) 1 j1 . Here P(F I;j ) 1=j is the probability that initially nodej is an excess branch of nodeI;jP(D (n) j;d ) = P(D (n) j;d ) P(E 0;j ) is the probability that nodej is of degreed; and 1 j1 is the probability 104 that node j hooks to the root node upon coupling. We get an additional 1=j term upon summing over the possible values of k as dened below. Case 2 occurs with probabilityP(F I;j ) P j1 k=1;k6=I 1 j j j1 P(C (n) k;d ). The reshuing at coupling of nodeI itself can also cause a perturbation in the sum. That occurs in the case when node I is connected to a lvl-1-deg-(d + 1) prior to coupling. This Case 3 occurs with probability (conditioned on I =i) P i1 j=1 1 i P(C (n) j;d+1 ). HereP(C (n) j;d+1 ) is the probability of nodej being lvl-1-deg-1; and 1=i is the probability of node i being connected to j prior to coupling. We sum over all possible j's. We end up with d TV n X i=1 h P(D (n) i;d ) i 2 + n X i=1 " i1 X j=1 1 i P(C (n) j;d+1 ) # P(D (n) i;d )+ + n X i=1 " n X j=i+1 P(F i;j )P(D (n) j;d ) j j 1 # P(D (n) i;d )+ + n X i=1 " n X j=i+1 P(F i;j ) j1 X k=1;i6=k 1 j 1 P(C (n) k;d ) # P(D (n) i;d ) n X i=1 h P(D (n) i;d ) i 2 + n X i=1 [ 1 i i1 X j=1 P(C (n) j;d+1 ) + n X j=i+1 1 j P(D (n) j;d ) j j 1 + + n X j=i+1 1 j j1 X k=1;i6=k 1 j 1 P(C (n) k;d )]P(D (n) i;d ) Examine the three summands of the second sum: n X i=1 1 i i1 X j=1 P(C (n) j;d+1 ) n X i=2 max ji1 P(C (n) j;d+1 ) 105 Next, n X i=1 n X j=i+1 1 j P(D (n) j;d ) j j 1 P(D (n) 2;d ) + 1 2 P(D (n) 3;d ) +::: + 1 n 1 P(D (n) n;d ) + + 1 2 P(D (n) 3;d ) +::: + 1 n 1 P(D (n) n;d ) +::: + 1 n 1 P(D (n) n;d ) = n X i=2 P(D (n) i;d ) The third summand: n X i=1 n X j=i+1 1 j j1 X k=1;i6=k 1 j 1 P(C (n) k;d ) n X i=1 n X j=i+1 1 j max kj1 P(C (n) k;d ) = n X i=2 max ki P(C (n) k;d ) So d TV 2 n X i=1 (P(D i;d )) 2 + n X i=1 max ki P(C (n) k;d ) + max ki P(C (n) k;d+1 ) (P(D i;d )) (6.7.2) By establishing some convenient relationship between max ki P(C (n) i;d ) andP(D i;d ), we should be able to show that d TV is bounded byC P n i=1 [P(D (n) i;d )] 2 for a xed C. In any case, bounding P n i=1 [P(D (n) i;d )] 2 (and P n i=1 [P(C (n) i;d )] 2 ) is of interest. 6.8 The Term Let p (n) i;d =E1 C (n) i;d . 106 Since we are looking at Y [n;d] rather than at X [n;d] , let us consider d (n) =EY [n;d] =E d X i=1 X [n;i] = n X j=1 d X i=1 p (n) j;i : We focus on d (n) =EX n;d = P n i=1 p (n) i;d . We have already seen that p (n) i;1 = 1 n . Getting the values of for d > 1 is challenging; let us employ the following recursive formula: P(C (n) i;d ) =P(C (n1) i;d1 ) 1 n +P(C (n1) i;d ) n 1 n p (n) i;d =p (n1) i;d1 1 n +p (n1) i;d n 1 n p (n1) i;d1 =np (n) i;d (n 1)p (n1) i;d Next, sum over i: n1 X i=1 p (n) i;d = 1 n n1 X i=1 p (n1) i;d1 + n 1 n n1 X i=1 p (n1) i;d Observe thatp (n) n;d = 0 ford> 1; and in our case we do haved> 1 for (d 1) to make sense. Hence n X i=1 p (n) i;d = 1 n n1 X i=1 p (n1) i;d1 + n 1 n n1 X i=1 p (n1) i;d 107 And so if we dene d (n) = P n i=1 p (n) i;d , we obtain the functional equation d (n) = 1 n d1 (n 1) + n 1 n d (n 1) n d (n) (n 1) d (n 1) = d1 (n 1) We know 1 (n) = 1. So n 2 (n) (n 1) 2 (n 1) = 1 (n 2 (n) (n 1) 2 (n 1)) + ((n 1) 2 (n 1) (n 2) 2 (n 2)) + +::: + (2 2 (2) 2 (1)) =n 1 Clearly, for d>n, d (n) = 0. 
Thus, n 2 (n) =n 1) 2 (n) = n 1 n : The result above is easy to verify with direct calculations. The iterative func- tional equation becomes more complicated for d> 2: n 3 (n) (n 1) 3 (n 1) = 2 (n 1) = n 2 n 1 108 We can use the following general formula, derived as above: d (n) = 1 n n1 X i=d1 d1 (i) For d = 3: 3 (n) = 1 n n 2 n 1 + n 3 n 2 +::: + 1 2 = = 1 n (n 2) 1 n 1 + 1 n 2 +::: + 1 3 + 1 2 = n 2 n H n1 1 n ; where H n = P n i=1 1 i is the harmonic series. The harmonic series can be approximated with H n ' ln(n) + , where ' 0:57721 is the Euler-Mascheroni constant. The error is about 1=2n. Thus, 3 (n)' n 1 n ln(n 1) + n (6.8.1) Next, 4 (n) = 1 n n1 X i=3 3 (i)' 1 n n1 X i=3 ( i 1 i ln(i 1) + i ) = (6.8.2) ' 1 n [(n 3) 1 3 + 1 4 +::: + 1 n 1 (6.8.3) 1 3 + 1 4 +::: + 1 n 1 n1 X i=3 ln(i 1) i ] (6.8.4) 109 To approximate the integral, observe that R (lnx) n x dx = (lnx) n+1 n+1 : So 4 (n)' nc n (1 + )H n1 n 1 2 (lnn) 2 n ' nc n (1 + ) ln(n 1) + n (lnn) 2 2n (6.8.5) Here c = 3 ( 3 2 (1 + )). We see that a general approximation for d (n), for nd, looks like d (n) = nc n k 1 lnn n k 2 (lnn) 2 n :::k d2 (lnn) d2 n ; (6.8.6) where k i are coecients, which look something like 1 i! . Given that, then, clearly, we have d (n)! 1 as n!1. One can also see the convergence to 1 from the expression d (n) = 1 n P n1 i=d1 d1 (i). We know 1 (n) = 1. Hence, 2 (n) = n1 n , and that goes to 1 as n goes to innity. Next, as n rises, upon averaging, the largest terms in 1 n P n1 i=2 2 (i) dominate the averaged sum, and those go to 1. Inductively, for xedd, the lambdas for the higher d's also go to 1. Hence,EX n;d ! 1; andEY n;d !d. 6.9 Numerical Results Follow the specic probabilities related to indicator functions (calculated with Mat- lab). For d = 1;n = 1;::; 8: 110 1.0000 0 0 0 0 0 0 0 0.5000 0.5000 0 0 0 0 0 0 0.3333 0.3333 0.3333 0 0 0 0 0 0.2500 0.2500 0.2500 0.2500 0 0 0 0 0.2000 0.2000 0.2000 0.2000 0.2000 0 0 0 0.1667 0.1667 0.1667 0.1667 0.1667 0.1667 0 0 0.1429 0.1429 0.1429 0.1429 0.1429 0.1429 0.1429 0 0.1250 0.1250 0.1250 0.1250 0.1250 0.1250 0.1250 0.1250 For d = 2;n = 1;:::; 8: 0 0 0 0 0 0 0 0 0.5000 0 0 0 0 0 0 0 0.5000 0.1667 0 0 0 0 0 0 0.4583 0.2083 0.0833 0 0 0 0 0 0.4167 0.2167 0.1167 0.0500 0 0 0 0 0.3806 0.2139 0.1306 0.0750 0.0333 0 0 0 0.3500 0.2071 0.1357 0.0881 0.0524 0.0238 0 0 0.3241 0.1991 0.1366 0.0949 0.0637 0.0387 0.0179 0 For d = 3;n = 1;:::; 8: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.1667 0 0 0 0 0 0 0 0.2500 0.0417 0 0 0 0 0 0 0.2917 0.0750 0.0167 0 0 0 0 0 0.3125 0.0986 0.0333 0.0083 0 0 0 0 0.3222 0.1151 0.0472 0.0179 0.0048 0 0 0 111 0.3257 0.1266 0.0583 0.0266 0.0107 0.0030 0 0 For d = 4;n = 1;:::; 8: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.0417 0 0 0 0 0 0 0 0.0833 0.0083 0 0 0 0 0 0 0.1181 0.0194 0.0028 0 0 0 0 0 0.1458 0.0308 0.0071 0.0012 0 0 0 0 0.1679 0.0413 0.0122 0.0033 0.0006 0 0 0 Follow, i (n) for i = 1; 2; 3; 4 as columns and n = 1;:::; 8 as rows. 1.0000 0 0 0 1.0000 0.5000 0 0 1.0000 0.6667 0.1667 0 1.0000 0.7500 0.2917 0.0417 1.0000 0.8000 0.3833 0.0917 1.0000 0.8333 0.4528 0.1403 1.0000 0.8571 0.5071 0.1849 1.0000 0.8750 0.5509 0.2252 112 6.10 Pictures 113 6.11 Bounding maxp i We want to bound the P i (p (n) i;d ) 2 = P n i=1 [P(C (n) i;d )] 2 term. We have P i p (n) i;d = d (n) 1. In a more general setting, take a set of non-negative numbers a i such that P n i=1 a i = 1. Assume a 1 a 2 a 3 ::: a n 0. Suppose in addition we have b a 1 and a n c, i.e. b is an upper bound for the a i 's, and c is a lower bound. We try to boundS = P n i=1 a 2 i . 
It is easy to see thatS is the smallest when a 1 =a 2 =::: =a n . 114 Beyond that, let us try to get an upper bound to S as a function of b and c. The sum would be largest if as many as possible terms achieve the upper bound b. Disregarding remainders, we obtain the conditions: mc +kb = 1 m +k =n Using those, it is possible to obtain the following bound: b +cnbcS (()a max +a min na max a min S) In our setting, we do have p (n) 1;d p (n) 2;d ::: p (n) nd+1;d . Consider the following combinatorial argument. For a xedn, there aren! possible constructions of recur- sive trees. There are (nd)! trees for which 1 C nd+1;d = 1, since there is only one option for the placement of each of the lastd nodes. There are (nd1)!( P n i=nd i) trees for which 1 C nd;d = 1. Clearly (nd 1)!( P n i=nd i) > (nd)! for d > 1. Inductively, one can see that for i < j, there are more tree for which node i is level-1-degree-d, than there are trees for which node j is level-1-degree-d. Hence, in our casep max =p (n) 1;d andp min =p (n) nd+1;d . Moreover, though nding an explicit expression for p (n) 1;d is not trivial, we have p (n) nd+1;d = (nd)! n! ' 1 n d for nd. Unfortunately, this p min is too small to help us signicantly - and hence we end up resorting to the basic unsatisfying inequality n X i=1 a 2 i a max : 115 In eect, we have to bound p max =p (n) 1;d . From the recursive formula we know that p (n) 1;d = 1 n p (n1) 1;d1 + n 1 n p (n1) 1;d = 1 n p (n1) 1;d1 + n 1 n 1 n 1 p (n2) 1;d1 + + n 1 n n 2 n 1 p (n2) 1;d =::: = = 1 n n1 X i=1 (p (ni) 1;d1 ) = 1 n n1 X i=d1 (p (i) i;d1 ) Now, p (n) 1;1 = 1 n p (n) 1;2 = 1 n n1 X i=1 p (i) 1;1 = 1 n 1 1 + 1 2 +::: + 1 n 1 = 1 n H n1 ' ln(n 1) + n p (n) 1;3 ' 1 n Z n 1 lnx + x dx = 1 n (lnn) 2 2 + ln(n) Using a more accurate approximation we would also have a term c n . In general, we see that p (n) 1;d is O(C (lnn) d1 n ). That goes to 0 as n increases. In light of the bound we have (expression 6.7.2), we are more interested in bound- ing P n i=1 [P(D (n) i;d )] 2 : Examine 1 d 2 P n i=1 [P(D (n) i;d )] 2 instead. The sum P n i=1 1 d P(D (n) i;d ) is less than equal to 1, and all its terms are non-negative. The same arguments as in the case that dealt with the P(C (n) i;d )'s apply. In particular, max 1in P(D (n) i;d ) = P(D (n) 1;d ) = P d P(C (n) 1;d ). SinceP(C (n) 1;d ) goes to zero asn rises, then maxP(D (n) i;d ) also goes to zero, and hence the entire sum of squares converges to zero. 116 This immediately takes care of the rst summand ( P n i=1 [P(D (n) i;d )] 2 ) in expression 6.7.2. What to do with the other two summands - the ones of type P n i=1 max ki P(C (n) k;d )P(D i;d )? Observe that n X i=1 max ki P(C (n) k;d P(D i;d ) max 1kn P(C (n) k;d n X i=1 P(D i;d )d max 1kn P(C (n) k;d ): And so we end up with another bound of the form Const d maxP(C (n) k;d ), where Const d depends on d but not on n. Therefore the right-hand-side of the bound in expression 6.7.2 tends to zero asn goes to innity, meaning that the number of level- 1-degree-less-than-or-equal-to-d nodes in the recursive tree goes to a Poisson(d) distribution as the number of nodes n rises to innity. 6.12 The Overall Bound; the Bivariate Poisson Distribution In sum, Theorem 6.12.1. The distance in total variation between the Poisson distribution with parameter d and the distribution of the number of level-1-degree-less-than-or- equal-to-d nodes in the recursive tree of size n is bounded above by a term of the form O(C (lnn) d1 n ), where C is some constant not dependant on n. 
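The statement can also be examined empirically. The sketch below (plain Python, illustration only) simulates recursive trees, computes $Y[n,d]$ for $d=2$, and estimates the total variation distance to Poisson($d$) for a few values of $n$; the choice $d=2$ and the sample sizes are arbitrary, and the estimates carry sampling noise of a few percent, so this is only a rough consistency check of the decay in $n$, not a measurement of the constant.

```python
import math
import random
from collections import Counter

def sample_Y(n, d, rng=random):
    # Y[n, d]: number of level-1 nodes of a uniform recursive tree on nodes
    # 0..n whose degree is at most d.
    parent = [0] * (n + 1)
    degree = [0] * (n + 1)
    for t in range(1, n + 1):
        p = rng.randrange(t)
        parent[t] = p
        degree[t] += 1
        degree[p] += 1
    return sum(1 for i in range(1, n + 1) if parent[i] == 0 and degree[i] <= d)

def tv_to_poisson(samples, lam):
    # Empirical total variation distance to Poisson(lam); the far Poisson tail
    # beyond the observed range is negligible here and is ignored.
    counts = Counter(samples)
    support = range(max(counts) + 10)
    return 0.5 * sum(
        abs(counts[k] / len(samples) - math.exp(-lam) * lam ** k / math.factorial(k))
        for k in support
    )

d, reps = 2, 10_000
for n in (100, 400, 1600):
    samples = [sample_Y(n, d) for _ in range(reps)]
    print(n, round(tv_to_poisson(samples, d), 3))  # decreases with n, down to sampling noise
```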
Regarding the bound on the distance between the Poisson distribution and the variablesY [n;d] : The bound isO(C (lnn) d1 n ), and does go to zero asn rises. Moreover, EY [n;d] !d as n goes to innity. 117 We are still interested in trying to show that not only the Y s but also the Xs converge to Poisson variables. AssumingY [n;d] goes to Poisson(d), how can we show that theX [n;d] 's are independent and converge to Poisson variables with parameter 1? Assuming that theX [d] 's are independent Poisson(1) variables is consistent with what we have found so far. In particular, supposing that Y [d] and X [d+1] are inde- pendent Poisson variables with respective means d and 1, it has been shown (see [37] for an overview of the Bivariate Poisson Distribution) that the variablesY [d] and Y [d+1] =Y [d] +X [d+1] are jointly distributed with the Bivariate Poisson Distribution with parameters 1 = 0; 2 = 1; and 3 =d. One of the properties of the Bivariate Poisson Distribution is that its marginal distributions are Poisson variables with means 1 + 3 and 2 + 3 . In eect, going the other way, it suces to show that (Y [d] ;Y [d+1] ) is Bivariate Poisson in order to prove that theX [d] 's are independent Poisson(1) variables. One way of establishing the Bivariate Poisson distribution for (Y [d] ;Y [d+1] ) would be to establish the Bivariate Poisson recurrence relations: kP(Y [d] =k;Y [d+1] =l) = 1 kP(Y [d] =k1;Y [d+1] =l)+ 3 kP(Y [d] =k1;Y [d+1] =l1) (6.12.1) lP(Y [d] =k;Y [d+1] =l) = 2 kP(Y [d] =k;Y [d+1] =l1)+ 3 kP(Y [d] =k1;Y [d+1] =l1) (6.12.2) 118 It turns out that, in our case, proving the above relations is equivalent to directly proving the recurrence relation kP(X [d+1] =k) =P(X [d+1] =k 1) (see [36]). But having that relation would mean that X [d+1] is Poisson(1) anyway. 119 Bibliography [1] D. Aldous and Fill J. A., Reversible Markov chains and random walks on graphs, 1994-2002, http://www.stat.berkeley.edu/ aldous/RWG/book.html. [2] Richard Arratia and Larry Goldstein, Size bias, sampling, the waiting time paradox, and innite divisibility: when is the increment independent?, (2010), http://arxiv.org/abs/1007.3910. [3] Richard Arratia, Larry Goldstein, and Louis Gordon, Poisson approximation and the Chen-Stein Method, Statist. Sci. 5 (1990), no. 4, 403{424. [4] Christos A. Athanasiadis and Persi Diaconis, Functions of random walks on hyperplane arrangements, Advances in Applied Mathematics 45 (2010), no. 3, 410{437. [5] Agnes Backhausz and Tam as F. M ori, Degree distribution in the lower levels of the uniform recursive tree, Annales Univ. Sci. Budapest To appear (2011), http://arxiv.org/abs/1112.1250. [6] Pierre Baldi and Yosef Rinott, On normal approximations of distributions in terms of dependency graphs, Ann. Probab. 17 (1989), no. 4. 120 [7] Pierre Baldi, Yosef Rinott, and Charles Stein, A normal approximation for the number of local maxima of a random function on a graph, Probability, statistics, and mathematics (1989). [8] A. D. Barbour, Stein's Method and Poisson process convergence, Journal of Applied Probability 25 (1988), no. A Celebration of Applied Probability. [9] A.D. Barbour, Lars Holst, and Svante Janson, Poisson approximation, Oxford Studies in Probability, vol. 2, Clarendon Press, Oxford, 1992. [10] Andrew D. Barbour, Multivariate Poisson-binomial approximation using Stein's Method, University of Zurich Preprints 4 (2005). [11] Timothy C. Brown and M.J. Phillips, Negative binomial approximation with Stein's method, Methodology and Computing in Applied Probability 1 (1999), no. 
Bibliography

[1] D. Aldous and J. A. Fill, Reversible Markov chains and random walks on graphs, 1994-2002, http://www.stat.berkeley.edu/~aldous/RWG/book.html.

[2] Richard Arratia and Larry Goldstein, Size bias, sampling, the waiting time paradox, and infinite divisibility: when is the increment independent?, (2010), http://arxiv.org/abs/1007.3910.

[3] Richard Arratia, Larry Goldstein, and Louis Gordon, Poisson approximation and the Chen-Stein Method, Statist. Sci. 5 (1990), no. 4, 403–424.

[4] Christos A. Athanasiadis and Persi Diaconis, Functions of random walks on hyperplane arrangements, Advances in Applied Mathematics 45 (2010), no. 3, 410–437.

[5] Ágnes Backhausz and Tamás F. Móri, Degree distribution in the lower levels of the uniform recursive tree, Annales Univ. Sci. Budapest, to appear (2011), http://arxiv.org/abs/1112.1250.

[6] Pierre Baldi and Yosef Rinott, On normal approximations of distributions in terms of dependency graphs, Ann. Probab. 17 (1989), no. 4.

[7] Pierre Baldi, Yosef Rinott, and Charles Stein, A normal approximation for the number of local maxima of a random function on a graph, Probability, Statistics, and Mathematics (1989).

[8] A. D. Barbour, Stein's Method and Poisson process convergence, Journal of Applied Probability 25 (1988), A Celebration of Applied Probability.

[9] A. D. Barbour, Lars Holst, and Svante Janson, Poisson approximation, Oxford Studies in Probability, vol. 2, Clarendon Press, Oxford, 1992.

[10] Andrew D. Barbour, Multivariate Poisson-binomial approximation using Stein's Method, University of Zurich Preprints 4 (2005).

[11] Timothy C. Brown and M. J. Phillips, Negative binomial approximation with Stein's method, Methodology and Computing in Applied Probability 1 (1999), no. 4, 407–421.

[12] Sourav Chatterjee, Concentration inequalities with exchangeable pairs (Ph.D. thesis), 2005, http://arxiv.org/abs/math/0507526.

[13] Sourav Chatterjee, Stein's Method and applications (Fall 2007 lecture notes), 2007, http://www.stat.berkeley.edu/~sourav/stat206Afall07.html.

[14] Sourav Chatterjee, Stein's Method for concentration inequalities, Probab. Theory Related Fields 138 (2007), no. 1-2, 305–321.

[15] Sourav Chatterjee and Partha S. Dey, Applications of Stein's Method for concentration inequalities, Ann. Probab. 38 (2010), no. 6, 2443–2485.

[16] Sourav Chatterjee, Persi Diaconis, and Elizabeth Meckes, Exchangeable pairs and Poisson approximation, Probability Surveys 2 (2005), 64–106.

[17] Sourav Chatterjee, Jason Fulman, and Adrian Röllin, Exponential approximation by Stein's Method and spectral graph theory, ALEA Lat. Am. J. Probab. Math. Stat. 8 (2011), 197–223.

[18] Louis H. Y. Chen, Larry Goldstein, and Qi-Man Shao, Normal approximation by Stein's Method, Probability and its Applications, Springer-Verlag Berlin Heidelberg, 2011.

[19] Louis H. Y. Chen and Adrian Röllin, Stein couplings for normal approximation, (2010), http://arxiv.org/abs/1003.6039.

[20] Louis H. Y. Chen and Qi-Man Shao, Normal approximation under local dependence, The Annals of Probability 32 (2004), no. 3A, 1985–2028.

[21] Fan Chung and Ron Graham, Flipping edges and vertices in graphs, Advances in Applied Mathematics 48 (2012), 37–63.

[22] Persi Diaconis, The Markov chain Monte Carlo revolution, Bull. Amer. Math. Soc. 46 (2009), 179–205.

[23] Persi Diaconis, Jason Fulman, Susan Holmes, Mark Huber, Gesine Reinert, and Charles Stein, Stein's Method: Expository lectures and applications, Lecture Notes - Monograph Series, vol. 46, Institute of Mathematical Statistics, 2004.

[24] Christian Döbler, Stein's Method of exchangeable pairs for absolutely continuous, univariate distributions with applications to the Polya urn model, (2012), http://arxiv.org/abs/1207.0533.

[25] Peter Donnelly and Dominic Welsh, The antivoter problem: Random 2-colourings of graphs, Graph Theory and Combinatorics (1984).

[26] Werner Ehm, Binomial approximation to the Poisson binomial distribution, Statistics & Probability Letters 11 (1991), no. 1, 7–16.

[27] Xiao Fang, Multivariate, combinatorial and discretized normal approximations by Stein's Method (Ph.D. dissertation), 2012.

[28] Xiao Fang and Adrian Röllin, Rates of convergence for multivariate normal approximation with applications to dense graphs and doubly indexed permutation statistics, (2012), http://arxiv.org/abs/1206.6586.

[29] Jason Fulman, Stein's Method and non-reversible Markov chains, Lecture Notes - Monograph Series 46 (2004), 66–74.

[30] Jason Fulman and Nathan Ross, Exponential approximation and Stein's Method of exchangeable pairs, (2012), http://arxiv.org/abs/1207.5073.

[31] Subhankar Ghosh, Larry Goldstein, and Martin Raic, Concentration of measure for the number of isolated vertices in the Erdős-Rényi random graph by size bias couplings, Statistics & Probability Letters 81 (2011), no. 11.

[32] Larry Goldstein and Gesine Reinert, Stein's Method for the Beta distribution and the Pólya-Eggenberger Urn, (2012), http://arxiv.org/abs/1207.1460.

[33] Larry Goldstein and Yosef Rinott, Multivariate normal approximations by Stein's Method and size bias couplings, Journal of Applied Probability 33 (1996), 1–17.

[34] F. Götze, On the rate of convergence in the multivariate CLT, Ann. Probab. 19 (1991), no. 2, 724–739.
[35] Olle Häggström, Finite Markov chains and algorithmic applications, London Mathematical Society Student Texts, vol. 52, Cambridge University Press, 2002.

[36] Kazumoto Kawamura, A note on the recurrence relations for the bivariate Poisson distribution, Kodai Math. J. 8 (1985), no. 1, 70–78.

[37] Subrahmaniam Kocherlakota and Kathleen Kocherlakota, Bivariate discrete distributions, Statistics: Textbooks and Monographs, Marcel Dekker, Inc., 1992.

[38] David A. Levin, Yuval Peres, and Elizabeth L. Wilmer, Markov chains and mixing times, American Mathematical Society, 2008.

[39] Thomas M. Liggett, Interacting particle systems, Classics in Mathematics, vol. 276, Springer Berlin Heidelberg, 2005 (reprint of the 1985 original).

[40] H. M. Luk, Stein's Method for the gamma distribution and related statistical applications, Ph.D. thesis, 1994.

[41] Norman S. Matloff, Ergodicity conditions for a dissonant voting model, Ann. Probab. 5 (1977), no. 3, 371–386.

[42] Sean Meyn and Richard L. Tweedie, Markov chains and stochastic stability, 2nd ed., Cambridge University Press, 2009.

[43] Erol Peköz, Adrian Röllin, and Nathan Ross, Total variation error bounds for geometric approximation, (2010-2011), http://arxiv.org/abs/1005.2774.

[44] Erol A. Peköz and Adrian Röllin, New rates for exponential approximation and the theorems of Rényi and Yaglom, Ann. Probab. 39 (2011), no. 2, 587–608.

[45] Erol A. Peköz, Adrian Röllin, and Nathan Ross, Degree asymptotics with rates for preferential attachment random graphs, (2012), http://arxiv.org/abs/1108.5236.

[46] Gesine Reinert and Adrian Röllin, Multivariate normal approximation with Stein's Method of exchangeable pairs under a general linearity condition, Ann. Probab. 37 (2009), no. 6, 2150–2173.

[47] Gesine Reinert and Adrian Röllin, Random subgraph counts and U-statistics: multivariate normal approximation via exchangeable pairs and embedding, J. Appl. Probab. 47 (2010), no. 2.

[48] Yosef Rinott and Vladimir Rotar, A multivariate CLT for local dependence with n^{-1/2} log n rate and applications to multivariate graph related statistics, Journal of Multivariate Analysis 56 (1996), no. 2, 333–350.

[49] Yosef Rinott and Vladimir Rotar, On coupling constructions and rates in the CLT for dependent summands with applications to the antivoter model and weighted U-statistics, The Annals of Applied Probability 7 (1997), no. 4, 1080–1105.

[50] Adrian Röllin, A note on the exchangeability condition in Stein's Method, Statistics & Probability Letters 78 (2008), no. 13, 1800–1806.

[51] Adrian Röllin, Stein's Method in high dimensions with applications, (2011), http://arxiv.org/abs/1101.4454.

[52] Nathan Ross, Exchangeable pairs in Stein's Method of distributional approximation (Ph.D. thesis), 2009.

[53] Nathan Ross, Fundamentals of Stein's Method, Probability Surveys 8 (2011), 210–293.

[54] Nathan Ross, Power laws in preferential attachment graphs and Stein's Method for the negative binomial distribution, (2012), http://arxiv.org/abs/1208.1558.

[55] Matthias Schulte, A Central Limit Theorem for the Poisson-Voronoi approximation, (2011), http://arxiv.org/abs/1111.6466v2.

[56] Spario Y. T. Soon, Binomial approximation for dependent indicators, Statist. Sinica 6 (1996), 703–714.

[57] Charles Stein, A bound for the error in the normal approximation to the distribution of a sum of dependent random variables, Proc. Sixth Berkeley Symp. on Math. Statist. and Prob. 2 (1972), 583–602.

[58] Charles Stein, Approximate computation of expectations, Lecture Notes - Monograph Series, no. 7, Institute of Mathematical Statistics, 1986.
[59] Daniel W. Stroock, An introduction to Markov processes, Graduate Texts in Mathematics, vol. 230, Springer-Verlag Berlin Heidelberg, 2005.

[60] K. Teerapabolarn and P. Wongkasem, On pointwise binomial approximation by w-functions, International Journal of Pure and Applied Mathematics 71 (2011), no. 1, 57–66.

[61] Remco van der Hofstad, Random graphs and complex networks, lecture notes, available (as of September 2012) at http://www.win.tue.nl/~rhofstad/NotesRGCN.pdf.