STEIN COUPLINGS FOR BERRY-ESSEEN BOUNDS AND CONCENTRATION INEQUALITIES

by Subhankar Ghosh

A Dissertation Presented to the FACULTY OF THE USC GRADUATE SCHOOL, UNIVERSITY OF SOUTHERN CALIFORNIA, In Partial Fulfillment of the Requirements for the Degree DOCTOR OF PHILOSOPHY (APPLIED MATHEMATICS)

May 2012

Copyright 2012 Subhankar Ghosh

Acknowledgements

One of my friends from India once exclaimed that our lives are nothing but collections of several five year plans. That was at the beginning of probably the most important `five year plan' of my life, otherwise known as the PhD. Today, at the end of this journey, I can think of several people who have helped me in myriad ways, big and small, significant and petty. I shall humbly take this opportunity to pay my tribute of gratitude and respect to them.

At the outset, I would like to thank my parents for helping me through my foundational years. From kindergarten to the high school years, they helped me with my studies, encouraged me to take part in science fairs and merit tests and, most of all, taught me the value of education. My mother, a physics graduate, and my father, an engineer, imbued their passion for numbers in me right in my childhood and made maths seem attractive to me. So this thesis is partly their handiwork as well. In my moments of disappointment and achievement, they have always been there by my side.

Sitting ten thousand miles away from home, I would never have been able to retain my sanity and enthusiasm and complete my PhD had it not been for Larry Goldstein's advising. From introducing me to the fascinating arena of Stein's method and going through the minute details of my write-ups to constantly assuring me of my competency, Larry has been a guide in the truest sense of the term. While he rebuked me from time to time for my error-prone proofreading, he also helped me cope with the anxiety of graduate student life. I cannot thank him enough for keeping my morale up on so many occasions.

Prof. Igor Kukavica was probably the main reason why I could circumvent a technical roadblock and got a chance to study at USC. All through these five years, he has found time for me in his busy schedule to advise me on issues ranging from teaching to PDE and from job hunting to budget proposal writing.

Mark Twain said that a cauliflower is nothing but a cabbage with a college education. In my life, the role of the Indian Statistical Institute (ISI), Kolkata, India is much more than just exposing me to excellence; it has moulded me, it has chiselled me, it has crafted me! I would take this opportunity to thank Dr. Alok Goswami and Dr. Krishanu Maulik, my professors from ISI, for kindling my interest in probability theory. I would also like to thank Parimal babu and Shyamal babu, my high school maths teachers, whose rigorous and intense tutelage saw me through the gruelling ISI entrance test ten years back. I have a lot of fond memories of my college days with friends like Abhishek Banerjee, Rahul Mazumder, Chiranjit Mukherjee, Amit Gupta, GP and Abhisek Saha. Thanks to you guys for apprising me of statistics, vital statistics and maths problems, all of which have enriched me significantly!

I strongly believe that nobody from the USC Maths Department would have successfully graduated without the aid and assistance of Amy Yung and Arnold Deal. I would have ended up paying hundreds of dollars in late fees if not for Amy's reminders. Arnold has helped me with everything from getting printouts to setting up room reservations.
Academics apart, I have been extremely lucky to have some really great friends who have made my life blissful in so many different ways. My two super awesome roommates, Chiru and Sivaditya, have stuck by me through all these years. I will always miss our endless discussions on life, the world, politics and math puzzles over mugs of tea. Among my friends in Los Angeles, Abhishek Ghosh, Sunil Narang, Shanu Sushmita, Adarsh Shekhar, Prithviraj Banerjee, Rumi Ghosh and Maheshwaran Sathyamurthy have given me immense relief from boring food and the pain of cooking by inviting me to the dinner parties they organised.

I thank USC for giving me the opportunity of pursuing my PhD. I shall always cherish the memories, my pedagogic experiences and scholastic attainments of these five years spent at USC, and the people who made it possible and enjoyable. I should rest my case by thanking Kamalika for standing by me, so proximately distant.

Table of Contents

Acknowledgements
Abstract
Chapter 1: The Basic Premise and Organization
Chapter 2: Fundamentals of Stein's Method
  2.1 The Stein Equation
  2.2 Zero Bias Coupling
  2.3 Size Bias Couplings
Chapter 3: Combinatorial Central Limit Theorem for Involutions
  3.1 Statement of the Problem and Literature Review
  3.2 Notation and Statement of Main Result
  3.3 Zero Bias Transformation
    3.3.0.1 Defining the Auxiliary Permutation
  3.4 $L^1$ Bounds
  3.5 $L^\infty$ Bounds
  3.6 Comment Regarding the Sharpness of the Bounds
Chapter 4: A Brief Overview of the Concentration of Measure Phenomenon
  4.1 Some Well Known Concentration of Measure Inequalities
  4.2 An Application of Concentration of Measure Inequalities
    4.2.1 Skip List
    4.2.2 Randomized Skip List
Chapter 5: Concentration of Measure Results Using Bounded Size Bias Couplings
  5.1 The Main Result
  5.2 Proof of the Main Result
  5.3 Applications
    5.3.1 Relatively Ordered Sub-sequences of a Random Permutation
    5.3.2 Local Dependence
      5.3.2.1 Sliding m Window Statistics
      5.3.2.2 Local Extrema on a Lattice
    5.3.3 Urn Allocation
    5.3.4 An Application to Coverage Processes
    5.3.5 The Lightbulb Problem
Chapter 6: Unbounded couplings
  6.1 A Concentration of Measure Result for Isolated Vertices
  6.2 Proof of Theorem 6.1.1
  6.3 Other Applications
    6.3.0.1 Infinitely Divisible Distributions
    6.3.0.2 Compound Poisson Distribution
References

Abstract

Stein's method is one of the cornerstones of modern limit theory in probability. While working on the problem of inadmissibility of the multivariate sample mean for the Gaussian distribution, Charles Stein used a characterization identity of the normal distribution. In 1972, he showed that the same identity could be used to prove the Central Limit Theorem in cases where the summands were not even independent, and also to yield the rate of convergence to the normal distribution. For probability theorists, this new method was appealing as it did not use Fourier transforms. Within a few years many new CLT error bounds were obtained for cases where the summands were not independent. Louis Chen showed that similar ideas could be used for obtaining error bounds in the realm of Poisson approximation as well. From the early days of Stein's method, coupled random variables have played an important role, and some of the most useful coupling techniques were devised in the late eighties and nineties. In this thesis, we will look at two of these couplings, namely the zero and size bias couplings. In particular, we will show how zero bias couplings can be used to obtain error bounds in a combinatorial central limit theorem for involutions. We furthermore show that the couplings are useful not only for obtaining error bounds but for obtaining concentration of measure inequalities as well. We illustrate the use of our results through several nontrivial examples.

Chapter 1
The Basic Premise and Organization

One of the cornerstones of modern probability theory is the central limit theorem (CLT). In its most basic form, the CLT states that when $X_1, X_2, \ldots, X_n$ are independent and identically distributed (i.i.d.) random variables having finite mean and variance, the centered and scaled sum
$$W_n = \frac{S_n - nE(X_1)}{\sqrt{n\,\mathrm{Var}(X_1)}} \xrightarrow{\ \mathcal{L}\ } \mathcal{N}(0,1) \quad \text{as } n \to \infty,$$
where $\xrightarrow{\ \mathcal{L}\ }$ denotes convergence in distribution, $S_n = \sum_{i=1}^n X_i$ and $\mathcal{N}(0,1)$ is the standard normal distribution. Interestingly, the CLT does not depend on the exact distribution of the individual summands.

It is natural to ask how well the standard normal distribution approximates the distribution of $W_n$. In other words, if $F_n$ is the distribution function of $W_n$ and $\Phi$ is the distribution function of a $\mathcal{N}(0,1)$ random variable, then one is interested in knowing how well $\Phi$ approximates $F_n$ with respect to one of the standard function space distance metrics, the $L^\infty$ distance, for example. Recall that the $L^\infty$ distance between $F_n$ and $\Phi$ is given by
$$\|F_n - \Phi\|_\infty = \sup_x |F_n(x) - \Phi(x)|.$$
Although the earliest version of the CLT, which considered only the case of $\{0,1\}$ valued $X_i$'s, was proved by De Moivre as early as 1733, bounds on the $L^\infty$ error were not obtained for another two centuries. The first result in this direction was proved by Berry (1941) and Esseen (1942), who showed that for $X_i$'s having finite third moment,
$$\|F_n - \Phi\|_\infty \le \frac{C\,E|X_1|^3}{\sqrt{n}}, \qquad (1.1)$$
for a universal constant $C$. The best possible value of $C$ is an ongoing topic of research. Esseen's original upper bound $C \le 7.59$ has been markedly reduced to $C \le 0.4748$ recently [48]. It should be noted that Esseen [15] had proved that $C \ge 0.4097$. The result of Berry and Esseen and its subsequent improvements provide a satisfactory error bound in the CLT when the summands are i.i.d. The obvious next step of making the summand distributions non-identical is also amenable to techniques similar to Berry and Esseen's. However, the cases where the summands are allowed to be dependent are in general harder.
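As a quick illustration of the Berry-Esseen bound (1.1) (a simulation sketch of my own, not part of the development here; the summand distribution, sample sizes and the constant 0.4748 below are chosen only for illustration), one can estimate the $L^\infty$ distance empirically and compare it with the right hand side of (1.1):

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
n, reps = 50, 200_000

# i.i.d. centered exponential summands: X = Exp(1) - 1, so E X = 0, Var X = 1
X = rng.exponential(1.0, size=(reps, n)) - 1.0
W = X.sum(axis=1) / np.sqrt(n)            # the standardized sum W_n

# empirical estimate of sup_x |F_n(x) - Phi(x)| over a grid
grid = np.linspace(-4, 4, 2001)
F_emp = np.searchsorted(np.sort(W), grid, side="right") / reps
dist = np.abs(F_emp - norm.cdf(grid)).max()

EX3 = np.mean(np.abs(rng.exponential(1.0, 10**6) - 1.0) ** 3)  # E|X_1|^3 by Monte Carlo
print(dist, 0.4748 * EX3 / np.sqrt(n))    # observed distance vs. the Berry-Esseen bound
```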
As the number of ways variables can be dependent is countless, a general result like the aforementioned CLT is not possible in this case. However, in different special cases where the summands exhibit dependence among themselves, CLTs have been proven by several authors.

Berry and Esseen proved their results using characteristic function techniques. The presence of dependence makes the application of these techniques harder. Stein's method is one way of getting Berry-Esseen type bounds by using a characterization of the normal distribution rather than resorting to complicated characteristic functions. As it does not use characteristic functions, Stein's method works in certain cases involving dependent summands with relative ease. Stein's characterization states that $Z \sim \mathcal{N}(0,1)$ if and only if
$$E(f'(Z)) = E(Zf(Z))$$
for every differentiable function $f$ for which these moments exist. The key insight of Stein's method is that if a random variable $W$ is sufficiently close to the $\mathcal{N}(0,1)$ distribution, then one would expect $E(f'(W)) \approx E(Wf(W))$ for a sufficiently large class of differentiable functions $f$. We will look more closely at this intuition in Chapters 2 and 3. As we shall see later, one needs to bound the quantity $\sup_{f \in \mathcal{C}} |E(f'(W) - Wf(W))|$ for a sufficiently large class $\mathcal{C}$ of differentiable functions in order to obtain a bound on the distance between the distribution function $F_W$ of $W$ and $\Phi$. How this bounding is carried out depends on the particular problem and may involve the use of a multitude of ideas and techniques. One particular technique that has been used quite successfully in this regard is the zero bias coupling [22]. In this thesis we use zero bias couplings to obtain the error in the normal approximation for $W_n = \sum_{i=1}^n e_{i\pi(i)}$, where $((e_{ij}))_{n \times n}$ is a fixed matrix and $\pi$ is an involution free of any fixed points, that is, $\pi(i) \ne i$ for all $1 \le i \le n$. Recall that an involution is a permutation $\pi$ on $\{1, 2, \ldots, n\}$ such that $\pi^2(i) = i$ for all $1 \le i \le n$. Statistics such as $W_n$ arise in matched control experiments where we are interested in testing the significance of the effect of a particular set of pairings on the outcome of an experiment. We discuss an application of statistics like $W_n$ in a practical situation in Chapter 3.

Techniques pertinent to Stein's method for distributional approximation have been shown to be useful in answering some other questions as well, questions which cannot be answered by a distance bound between the normal and another distribution. We will consider concentration of measure inequalities in particular. Informally, concentration of measure means that the random variable's mass decays rapidly on regions far away from its mean. The normal distribution exhibits a very strong concentration of measure phenomenon. This strong concentration of the normal variable, however, does not imply that another random variable which closely approximates the normal distribution in the Kolmogorov distance will also exhibit a similarly strong concentration phenomenon.

While Stein's method deals with the question of how well the $\mathcal{N}(0,1)$ distribution fits the overall distribution of $W_n$, it does not readily yield any information regarding the concentration property of $W_n$, or the rate with which $P(W_n \ge x)$ decreases as a function of $x$. Concentration bounds are often needed in analyzing the time complexity of some randomized algorithms. We will briefly illustrate one such use of concentration inequalities for querying a particular data structure called the skip list in Section 4.2.1.
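To see why closeness in the Kolmogorov distance carries essentially no tail information, consider the following toy example (my own illustration, not from the thesis; the mixture is chosen only to make the point). Let
$$W_n = \begin{cases} Z & \text{with probability } 1 - n^{-1},\\ n & \text{with probability } n^{-1},\end{cases} \qquad Z \sim \mathcal{N}(0,1) .$$
Then $\sup_x |P(W_n \le x) - \Phi(x)| \le 1/n$, so $W_n$ is very close to normal in the Kolmogorov distance, yet $P(W_n \ge n) \ge 1/n$, which is vastly larger than the Gaussian tail $1 - \Phi(n) \approx e^{-n^2/2}/(n\sqrt{2\pi})$.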
Concentration bounds are most useful when they are Gaussian or subgaussian, that is, when the tail decay of $W_n$ is only `slightly' worse than that of the Gaussian distribution for $n$ sufficiently large.

The literature on concentration of measure inequalities for functions of independent random variables is extensive, and some of the rather general results, like the Chernoff-Hoeffding and Talagrand inequalities, work quite well in several cases. We will state some of these well known concentration results in Chapter 4. However, there are very few results available for random variables which are functions of a collection of dependent random variables. Chatterjee [8] and Raič [45] proved concentration type inequalities for the first time using tools from the Stein's method literature. Continuing in the same spirit as in [8], we show that size bias couplings, one of the cornerstones of Stein's method, can be used quite effectively to obtain concentration of measure results as well. This is explained in Chapters 5 and 6.

Chapter 2
Fundamentals of Stein's Method

In this chapter, we will briefly go over the fundamentals of Stein's method. In particular, we will review the Stein equation, the zero and size bias couplings, and their constructions. Most of the results in this chapter are foundational in the Stein's method literature and can be found in the recent work of Chen et al. [11]. Also, in the rest of the thesis $\stackrel{\mathcal{L}}{=}$ denotes equality in distribution.

2.1 The Stein Equation

The whole of Stein's method has its roots in the following characterization of a standard normal variable $Z$, due to Stein [49].

Lemma 2.1.1. If $W \stackrel{\mathcal{L}}{=} Z$, then
$$E(f'(W)) = E(Wf(W)), \qquad (2.1)$$
for all absolutely continuous functions $f: \mathbb{R} \to \mathbb{R}$ with $E|f'(Z)| < \infty$. Conversely, if (2.1) holds for all bounded, continuous and piecewise continuously differentiable functions $f$ with $E|f'(Z)| < \infty$, then $W$ has a standard normal distribution.

The key intuition behind Stein's method is that if $E(f'(W)) \approx E(Wf(W))$ for a sufficiently large class of differentiable functions $f$, then the distribution of $W$ should be close to the standard normal distribution. This is a reasonable expectation as identity (2.1) is a necessary condition for a distribution to be standard normal. As we shall see soon, not only is this intuition correct, it also yields error bounds similar to the Berry-Esseen type error bounds in (1.1). We need the following result, which is essential in making the intuition behind Stein's method concrete.

Let $\Phi(z) = P(Z \le z)$, the cumulative distribution function of $Z$. For fixed $z \in \mathbb{R}$, consider the equation
$$f'(w) - wf(w) = \mathbf{1}(w \le z) - \Phi(z). \qquad (2.2)$$

Lemma 2.1.2. For fixed $z \in \mathbb{R}$, the unique bounded solution $f(w) := f_z(w)$ of (2.2) is given by
$$f_z(w) = \begin{cases} \sqrt{2\pi}\, e^{w^2/2}\, \Phi(w)\,[1 - \Phi(z)] & \text{if } w \le z\\ \sqrt{2\pi}\, e^{w^2/2}\, \Phi(z)\,[1 - \Phi(w)] & \text{if } w > z. \end{cases}$$

Lemma 2.1.2 guarantees the existence and uniqueness of the solution to (2.2). Since
$$|P(W \le z) - \Phi(z)| = |E(f_z'(W) - Wf_z(W))|,$$
we have
$$\|F_W - \Phi\|_\infty = \sup_z |P(W \le z) - \Phi(z)| = \sup_z |E(f_z'(W) - Wf_z(W))|. \qquad (2.3)$$
Note that the right hand side of (2.3) is zero for all $z \in \mathbb{R}$ if $W \sim \mathcal{N}(0,1)$. Clearly, bounding $|E(f'(W) - Wf(W))|$ for all $f \in \mathcal{C}$, where $\mathcal{C}$ is a class of differentiable functions containing $f_z$ for all $z \in \mathbb{R}$, is enough to obtain bounds for the $L^\infty$ distance $\|F_W - \Phi\|_\infty$. Although Lemma 2.1.2 allows the question regarding the $L^\infty$ error bounds to be translated in terms of the solution of the differential equation (2.2), how exactly the quantity $|E(f'(W) - Wf(W))|$ can be bounded depends on the particular problem.
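As a quick numerical sanity check of Lemma 2.1.2 (my own sketch, not part of the thesis; the test points, step size and tolerance are arbitrary), one can verify by finite differences that $f_z$ satisfies the Stein equation (2.2) away from the discontinuity at $w = z$:

```python
import numpy as np
from scipy.stats import norm

def f_z(w, z):
    # the bounded solution of the Stein equation from Lemma 2.1.2
    if w <= z:
        return np.sqrt(2 * np.pi) * np.exp(w**2 / 2) * norm.cdf(w) * (1 - norm.cdf(z))
    return np.sqrt(2 * np.pi) * np.exp(w**2 / 2) * norm.cdf(z) * (1 - norm.cdf(w))

z, h = 0.7, 1e-6
for w in [-2.0, -0.3, 0.2, 1.5, 3.0]:       # points away from w = z
    lhs = (f_z(w + h, z) - f_z(w - h, z)) / (2 * h) - w * f_z(w, z)
    rhs = float(w <= z) - norm.cdf(z)
    print(w, lhs, rhs)                      # the two sides agree closely
```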
In this thesis, we will be concerned with two techniques that are widely used: the zero bias couplings and the size bias couplings. These two couplings are described in the next two sections.

2.2 Zero Bias Coupling

The zero bias transformation was introduced in [22]. For a random variable $W$ with zero mean and finite, positive variance $\sigma^2$, we say that $W^*$ follows the $W$-zero bias distribution if
$$E(Wf(W)) = \sigma^2 E(f'(W^*)), \qquad (2.4)$$
for all differentiable functions $f$ for which the above expectations are well defined. In [22], it has been shown that the zero bias distribution always exists as long as $E(W) = 0$ and $0 < \sigma^2 < \infty$. In fact, irrespective of the distribution of $W$, $W^*$ always has a continuous distribution. Lemma 2.1.1 shows that when $W$ is normally distributed, $W^* \stackrel{\mathcal{L}}{=} W$, and if the zero bias distribution is viewed as a transformation of the original distribution of $W$, the normal distribution is the only fixed point of this transformation.

Owing to (2.4), we see that with $f_z$ as in Lemma 2.1.2 and $W$ having unit variance,
$$|P(W \le z) - \Phi(z)| = |E(f_z'(W) - Wf_z(W))| = |E(f_z'(W) - f_z'(W^*))|.$$
Clearly, if $W$ and $W^*$ can be coupled in such a way that their difference is small, then we can expect that $W$ will be close to the standard normal distribution in law.

The following result from Chapter 8 of [11] shows that the intuition from the previous paragraph is indeed correct.

Theorem 2.2.1. Let $W$ be a mean zero, variance one random variable, and suppose that there exists $W^*$, having the $W$-zero biased distribution, defined on the same space as $W$ and satisfying $|W - W^*| \le \delta$. Then
$$\sup_{z \in \mathbb{R}} |P(W \le z) - \Phi(z)| \le c\,\delta \quad \text{where } c = 1 + 1/\sqrt{2\pi} + \sqrt{2\pi}/4 \approx 2.03.$$

Thus a small value of $\delta$ ensures that $W$ and $Z$ are close in distribution. Whenever we can construct $W^*$ in such a way that $W$ and $W^*$ are defined on the same probability space, we call $(W, W^*)$ a zero bias coupling of $W$. From the Stein's method perspective, zero bias couplings are typically more interesting than the zero bias distribution alone.

While traditionally the $L^\infty$ bounds have been the most studied error metric in the probability community, there is no reason why one could not consider general $L^p$ metrics. Recall that for two real valued functions $f, g: \mathbb{R} \to \mathbb{R}$, the $L^p$ distance between them is defined by
$$\|f - g\|_p = \left( \int_{-\infty}^{\infty} |f(x) - g(x)|^p\,dx \right)^{1/p}.$$
In [20], the following result was proved regarding the $L^1$ distance between a random variable $W$ and the normal distribution.

Theorem 2.2.2. Let $W$ be a mean zero, variance one random variable with distribution function $F$, and let $W^*$ have the $W$-zero biased distribution and be defined on the same probability space as $W$. Then
$$\|F - \Phi\|_1 = \int_{-\infty}^{\infty} |F(x) - \Phi(x)|\,dx \le 2E|W^* - W|.$$

Theorem 2.2.2 shows that closeness of $W$ and $W^*$ guarantees closeness of $F$ and $\Phi$ in the $L^1$ metric as well. While Theorems 2.2.1 and 2.2.2 assert that a `good' zero bias coupling will ensure good bounds on the accuracy of the normal approximation, we have not yet presented any result showing how to create a zero bias coupling for a particular given $W$. While there is no recipe available in general, in certain special cases it is possible to construct zero bias couplings after some work. The easiest of these cases is probably the construction of a zero bias coupling for a sum of independent variables, as illustrated in the following result, the proof of which can be found in [11].

Theorem 2.2.3. Let $\xi_i$, $1 \le i \le n$, be a collection of independent mean zero random variables with variances $\sigma_i^2$ summing to one, and let $W = \sum_{i=1}^n \xi_i$.
If $I$ is a random index, distributed independently of the $\xi_i$'s, with distribution
$$P(I = i) = \sigma_i^2,$$
and $\xi_I^*$ has the $\xi_I$-zero bias distribution, generated independently of the other variables, then the random variable
$$W^* = W - \xi_I + \xi_I^*$$
follows the $W$-zero bias distribution.

Another case where we can construct zero bias couplings explicitly occurs when a $\lambda$-Stein pair is available. A $\lambda$-Stein pair is a pair of random variables $(W, W')$ which are exchangeable and satisfy
$$E(W - W' \mid W) = \lambda W \quad \text{for some } \lambda \in (0,1). \qquad (2.5)$$
The interested reader can see [10] for more on Stein pairs. Before proceeding any further, we define the square bias distribution for a $\lambda$-Stein pair $(W, W')$ as the one with the joint distribution function
$$dF^\dagger(w, w') = \frac{(w - w')^2}{E(W - W')^2}\,dF(w, w'). \qquad (2.6)$$
The following proposition from [20] shows how Stein pairs can be used to create a $W$-zero biased variable.

Proposition 2.2.1. Let $(W, W')$ be an exchangeable pair with $\mathrm{Var}(W) = \sigma^2 \in (0, \infty)$ and distribution $F(w, w')$ satisfying the linearity condition (2.5). Then
$$E(W - W')^2 = 2\lambda\sigma^2. \qquad (2.7)$$
Suppose $(W^\dagger, W^\ddagger)$ has the square bias distribution given by (2.6) and $U \sim U[0,1]$ is independent of $(W^\dagger, W^\ddagger)$. Then the variable
$$W^* = U W^\dagger + (1 - U) W^\ddagger$$
has the $W$-zero biased distribution.

Among the results stated in this section, Proposition 2.2.1 and Theorem 2.2.2 are used in Chapter 3 to obtain the zero bias coupling and the $L^1$ error bounds, respectively, for a specific combinatorial statistic.

2.3 Size Bias Couplings

Like the zero bias transformation, the size bias transformation is defined using a characterizing equation similar to (2.4). However, defining the $Y$-size bias transformation only requires that $Y \ge 0$ and $0 < E(Y) < \infty$. Thus, unlike the zero bias transformation, we do not need the finite variance assumption to define size bias transformations. For a nonnegative random variable $Y$ with positive finite mean $\mu$, we say that $Y^s$ follows the $Y$-size biased distribution if it satisfies
$$E(Yf(Y)) = \mu E(f(Y^s)), \qquad (2.8)$$
for all functions $f$ for which $E(Yf(Y))$ exists. The distribution of $Y^s$ is given by the distribution function
$$dF^s(y) = \frac{y\,dF(y)}{\mu}, \qquad (2.9)$$
where $F$ is the distribution function of $Y$. As $Y$ is supported only on the nonnegative half line and $0 < \mu < \infty$, we see that $F^s$ as in (2.9) is well defined and is a distribution function.

From (2.9), we see that size biasing of random variables is essentially sampling them proportionally to their size. Of the many contexts in which size biasing appears, perhaps the most well known is the waiting time paradox, so clearly described in Feller [16], Section I.4. Here, a paradox is generated by the fact that in choosing a time interval `at random' in which to wait for, say, buses, it is more likely that an interval with a longer interarrival time is selected. In statistical contexts it has long been known that size biasing may affect a random sample in adverse ways, though at times this same phenomenon may also be used to correct for certain biases [35].

Similar to the zero bias transformation, whenever the $Y$-size biased random variable $Y^s$ can be defined on the same probability space as $Y$, we say $Y$ and $Y^s$ are coupled, and under these circumstances the tuple $(Y, Y^s)$ is called a size bias coupling of $Y$. Unlike the zero bias coupling though, the construction of a size bias coupling is often easier, especially when $Y$ is a sum of possibly dependent Bernoulli random variables. We now review the discussion in [23], which gives a procedure for the construction of size bias couplings when $Y$ is a sum; the method has its roots in the work of Baldi et al. [1].
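Before turning to that construction, here is a concrete instance of (2.8) and (2.9) (a standard computation, included only as an illustration): size biasing a Poisson variable simply shifts it by one. If $Y \sim \mathrm{Poisson}(\lambda)$, then for $k \ge 1$,
$$P(Y^s = k) = \frac{k\,P(Y = k)}{EY} = \frac{k\,e^{-\lambda}\lambda^k}{k!\,\lambda} = \frac{e^{-\lambda}\lambda^{k-1}}{(k-1)!} = P(Y + 1 = k),$$
so that $Y^s \stackrel{\mathcal{L}}{=} Y + 1$.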
The construction depends on being able to size bias a collection of nonnegative random variables in a given coordinate, as described in Definition 2.3.1.

Definition 2.3.1. Let $\mathcal{A}$ be an arbitrary index set and let $\{X_\alpha : \alpha \in \mathcal{A}\}$ be a collection of nonnegative random variables with finite, nonzero expectations $EX_\alpha = \mu_\alpha$ and joint distribution $dF(\mathbf{x})$, where $\mathbf{x} = (x_\alpha, \alpha \in \mathcal{A})$. For $\beta \in \mathcal{A}$, we say that $\mathbf{X}^\beta = \{X_\alpha^\beta : \alpha \in \mathcal{A}\}$ has the $\mathbf{X}$-size bias distribution in coordinate $\beta$ if $\mathbf{X}^\beta$ has joint distribution
$$dF^\beta(\mathbf{x}) = x_\beta\,dF(\mathbf{x})/\mu_\beta.$$

Just as (2.9) is related to (2.8), the random vector $\mathbf{X}^\beta$ has the $\mathbf{X}$-size bias distribution in coordinate $\beta$ if and only if $E[X_\beta f(\mathbf{X})] = \mu_\beta E[f(\mathbf{X}^\beta)]$ for all functions $f$ for which these expectations exist. Now letting $f(\mathbf{X}) = g(X_\beta)$ for some function $g$, one recovers (2.8), showing that the $\beta$-th coordinate of $\mathbf{X}^\beta$, that is, $X_\beta^\beta$, has the $X_\beta$-size bias distribution. The factorization
$$P(\mathbf{X} \in d\mathbf{x}) = P(\mathbf{X} \in d\mathbf{x} \mid X_\beta = x_\beta)\,P(X_\beta \in dx_\beta)$$
of the joint distribution of $\mathbf{X}$ suggests a way to construct $\mathbf{X}^\beta$. First generate $X_\beta^\beta$, a variable with distribution $P(X_\beta^\beta \in dx_\beta)$. If $X_\beta^\beta = x$, then generate the remaining variates $\{X_\alpha^\beta, \alpha \ne \beta\}$ with distribution $P(\mathbf{X} \in d\mathbf{x} \mid X_\beta = x)$. Now, by the factorization of $dF(\mathbf{x})$, we have
$$dF^\beta(\mathbf{x}) = x_\beta\,dF(\mathbf{x})/\mu_\beta = P(\mathbf{X} \in d\mathbf{x} \mid X_\beta = x_\beta)\,x_\beta\,P(X_\beta \in dx_\beta)/\mu_\beta = P(\mathbf{X} \in d\mathbf{x} \mid X_\beta = x_\beta)\,P(X_\beta^\beta \in dx_\beta). \qquad (2.10)$$
Hence, to generate $\mathbf{X}^\beta$ with distribution $dF^\beta$, first generate a variable $X_\beta^\beta$ with the $X_\beta$-size bias distribution; then, when $X_\beta^\beta = x$, generate the remaining variables according to their original conditional distribution given that the $\beta$-th coordinate takes on the value $x$.

Definition 2.3.1 and the following proposition will be applied in the subsequent constructions. The interested reader is referred to Section 2 of [23] for the simple proof.

Proposition 2.3.1. Let $\mathcal{A}$ be an arbitrary index set, and let $\mathbf{X} = \{X_\alpha, \alpha \in \mathcal{A}\}$ be a collection of nonnegative random variables with finite means. For any subset $B \subseteq \mathcal{A}$, set
$$X_B = \sum_{\beta \in B} X_\beta \quad \text{and} \quad \mu_B = EX_B.$$
Suppose $B \subseteq \mathcal{A}$ with $0 < \mu_B < \infty$, and for $\beta \in B$ let $\mathbf{X}^\beta$ have the $\mathbf{X}$-size biased distribution in coordinate $\beta$ as in Definition 2.3.1. If $\mathbf{X}^B$ has the mixture distribution
$$\mathcal{L}(\mathbf{X}^B) = \sum_{\beta \in B} \frac{\mu_\beta}{\mu_B}\,\mathcal{L}(\mathbf{X}^\beta),$$
then $EX_B f(\mathbf{X}) = \mu_B\,Ef(\mathbf{X}^B)$ for all real valued functions $f$ for which these expectations exist. Hence, for any $A \subseteq \mathcal{A}$, if $f$ is a function of $X_A = \sum_{\alpha \in A} X_\alpha$ only,
$$EX_B f(X_A) = \mu_B\,Ef(X_A^B) \quad \text{where} \quad X_A^B = \sum_{\alpha \in A} X_\alpha^B. \qquad (2.11)$$
Taking $A = B$ in (2.11) we have $EX_A f(X_A) = \mu_A\,Ef(X_A^A)$, and hence $X_A^A$ has the $X_A$-size biased distribution, as in (2.8).

In our examples we use Proposition 2.3.1 and (2.10) to obtain a variable $Y^s$ with the size bias distribution of $Y$, where $Y = \sum_{\beta \in \mathcal{A}} X_\beta$, as follows. First choose a random index $I \in \mathcal{A}$ with probability
$$P(I = \beta) = \mu_\beta/\mu_{\mathcal{A}}, \quad \beta \in \mathcal{A}. \qquad (2.12)$$
Next generate $X_I^I$ with the size bias distribution of $X_I$. If $I = \beta$ and $X_\beta^\beta = x$, generate $\{X_\alpha^\beta : \alpha \in \mathcal{A}\setminus\{\beta\}\}$ using the (original) conditional distribution
$$P(X_\alpha, \alpha \ne \beta \mid X_\beta = x);$$
the sum $Y^s = \sum_{\alpha \in \mathcal{A}} X_\alpha^I$ has the $Y$-size biased distribution.
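The recipe above can be tried out numerically. The sketch below is my own illustration (anticipating the Bernoulli case discussed next): the indicators are taken independent so that the conditional regeneration step is trivial; for dependent indicators that step would instead draw from the conditional law given $X_I = 1$. It builds $Y^s$ and checks the characterization (2.8) for one test function by Monte Carlo.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, reps = 10, 0.3, 200_000
f = lambda y: y**2                      # any test function works in (2.8)

mu = n * p                              # E Y for Y = sum of n Bernoulli(p) indicators

lhs = 0.0   # Monte Carlo estimate of E[Y f(Y)] / mu
rhs = 0.0   # Monte Carlo estimate of E[f(Y^s)]
for _ in range(reps):
    X = rng.random(n) < p               # the indicators X_1, ..., X_n
    lhs += X.sum() * f(X.sum()) / mu

    # size bias the sum: pick I proportionally to the means (uniform here),
    # force X_I = 1, and regenerate the rest from their conditional law
    # (by independence, the other coordinates are simply left unchanged)
    I = rng.integers(n)
    Xs = X.copy(); Xs[I] = True
    rhs += f(Xs.sum())

print(lhs / reps, rhs / reps)           # the two estimates agree up to Monte Carlo error
```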
Since the size bias transforma- tion of a nontrivial Bernoulli variable is degenerate at one, we generatefX ;6=g 18 following the conditional distribution offX ;6= jX = 1g. Thus, the random variable Y = P 6= X + 1 is one particular realization of Y s . 19 Chapter 3 Combinatorial Central Limit Theorem for Involutions One of the most important aspects of Stein's method is the relative ease with which it can be applied to certain cases where the summands in a CLT are dependent. In this chapter, we will see one such case where we get the explicit Berry-Esseen type bound of the optimal order. 3.1 Statement of the Problem and Literature Review Let E = ((e ij )) be an nn array of real numbers. The study of combinatorial central limit theorems, that is, central limit theorems for random variables of the form Y E = n X i=1 e i(i) (3.1) 20 focuses on the case where is a permutation chosen uniformly from a subset A n of S n , the permutation group of order n. Some of the well studied choices for A n areS n itself [29][28][7][20], the collection n of xed point free involutions [24] [47], and the collection of permutations having one long cycle [32]. The last two cases of A n are examples of distributions over S n that are constant on same cycle type, considered in [19]. A probability distributionP dened onS n is said to be constant over cycle type if for any two permutations and , P ( 1 ) =P (): A probability thats constant over cycle types gives equal mass to permutations which have the same number of disjoint cycles in their unique cycle decompositions. Recall from Chapter 1 that the collection of involutions is given by n =f2S n : 2 = id;8i :(i)6=ig (3.2) with id denoting the identity permutation. However, before focusing on the case of A n = n , we rst brie y review the existing results pertaining to the much more studied case A n =S n , otherwise commonly known as the Hoeding combinatorial CLT. Approximating the distribution ofY E by the normal when is chosen uniformly from S n began with the work of Wald and Wolfowitz [53], who, motivated by 21 approximating null distributions for permutation test statistics, proved the central limit theorem as n!1 for the case where the factorization e ij = b i c j holds. In this special case, when b i are numerical characteristics of some population, k of the c j 's are 1, and the remaining nk are zero, Y E has the distribution of a sum obtained by sampling a set of size k uniformly from the population. More general arrays which are not separable in functions of i and j were handled in the work of Hoeding [29]. Motoo obtained Lindeberg-type conditions which are sucient for the normal limit in [37]. A number of later authors rened these limiting results and obtained information on the rate of convergence and bounds on the error in the normal approximation, usually in the L 1 norm. Ho and Chen [28] and von Bahr [52] derived L 1 bounds when the matrixE is random, the former using a concentration inequality approach and Stein's method, yielding the correct rate O(n 1=2 ) under certain boundedness assumptions on sup i;j je i;j j. Goldstein [19], employing the zero bias version of Stein's method obtained bounds of the correct order with an explicit constant, but in terms of sup i;j je i;j j. The work of Bolthausen [7], proceeding inductively, is the only one which yields anL 1 bound in terms of third moment type quantities on E without the need for conditions on sup i;j je i;j j. More recently, Goldstein [20] obtained L 1 bounds for this case using zero biasing. 
The case $A_n = \Pi_n$ was considered much more recently. In [47], a permutation test is considered for a certain matched pair experiment designed to answer the question of whether there is an unusually high degree of similarity in a distinguished pairing, when there is some unknown background or baseline similarity between all pairs. In such a case, one considers $Y_E$ as in (3.1) when $\pi$ is chosen uniformly from $\Pi_n$.

One illustrative example where the involution statistic can be used is testing for the presence of cheating in a classroom with $n$ students, labeled $1, 2, \ldots, n$. We assume that $n$ is even and that during the exam the students are seated in pairs. Also, it is possible for each student to see what his or her neighbouring student is writing. Suppose $\pi$ denotes the particular seating arrangement, that is, the student labeled $\pi(i)$ is the neighbour of student $i$. As the students are seated in pairs, it follows that $\pi$ is an involution without fixed points. If we propose a similarity measure $e_{ij}$ that is an increasing function of the similarity between the answers given by students $i$ and $j$, then we can conclude that we have evidence of cheating if $\sum_{i=1}^n e_{i\pi(i)}$ computed for the actual seating arrangement is significantly high compared to a realization of $Y_E = \sum_{i=1}^n e_{i\pi(i)}$ when $\pi$ is chosen uniformly from $\Pi_n$. For this comparison we need to have an idea of the distribution of $Y_E$ when $\pi$ is chosen uniformly from $\Pi_n$. While this example is not a very significant one, similar ideas have been proposed in clinical medicine in [47].

Situations similar to the aforementioned example arise naturally in environmental and medical studies, where subjects in a given study group are matched by a certain common background of interest, such as having lived in the same neighborhood during a given period or having certain common medical conditions, among other possibilities. Since the distribution of $Y_E$ is complicated, bounds for the error in the normal approximation enable one to test whether a distinguished pairing sharing the common background shows an unusually high level of similarity. The interested reader can look into [24] for a discussion of these applications.

A Berry-Esseen type bound for $Y_E$ was provided in the $L^\infty$ norm by [24], for $\pi$ chosen uniformly from $\Pi_n$, but under a boundedness assumption. In this chapter, we use techniques similar to those in [7] and [20] to obtain $L^p$ error bounds in the normal approximation. We manage to relax the assumptions of [24] so that $L^p$ bounds to the normal for the involution case can be obtained for arrays $E$ which are possibly not uniformly bounded in $n$. These $L^p$ bounds are obtained in terms of third moment type quantities on the matrix $E$. In particular, in Theorem 3.2.1 we show that if $\pi \in \Pi_n$ uniformly, $W$ denotes the variable $Y_E$ appropriately standardized and $K_p$ is given explicitly as
$$K_p = (379)^{1/p}\,(61{,}702{,}446)^{1-1/p}, \qquad (3.3)$$
then the $L^p$ norm of the difference between $W$ and the normal satisfies
$$\|F_W - \Phi\|_p \le \frac{K_p\,\gamma_E}{n}$$
for all $n \ge 9$, where $\gamma_E$ is given in (3.12).

This error bound yields a rate of $O(n^{-1/2})$ in the case of arrays bounded uniformly in $n$, and as indicated in the appendix, for bounded arrays this rate cannot be improved uniformly over all arrays. Although the constant $K_p$ is too large in magnitude to be applied in a practical example, it is the first of its kind in the literature. Also, we improve upon Goldstein and Rinott's [24] result, since we obtain bounds of order $O(n^{-1/2})$ under milder conditions, such as $\gamma_E/\sqrt{n}$ being bounded instead of $\sup|e_{ij}|$ being bounded.
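To make the permutation test sketched above concrete, here is a small simulation of my own (the similarity matrix is random placeholder data; a uniformly random fixed point free involution is obtained by pairing up a random shuffle, which yields a uniform perfect matching):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20                                   # even number of students

# symmetric similarity scores e_ij with zero diagonal (placeholder data)
A = rng.random((n, n)); E = (A + A.T) / 2; np.fill_diagonal(E, 0.0)

def random_involution(n, rng):
    """Uniform fixed point free involution: pair up a random shuffle of 0..n-1."""
    idx = rng.permutation(n)
    pi = np.empty(n, dtype=int)
    for a, b in zip(idx[0::2], idx[1::2]):
        pi[a], pi[b] = b, a
    return pi

def Y(E, pi):
    return sum(E[i, pi[i]] for i in range(len(pi)))

# null distribution of Y_E under a uniformly chosen seating, and a p-value
# for an observed seating pi_obs (here itself drawn uniformly, so no cheating)
pi_obs = random_involution(n, rng)
y_obs = Y(E, pi_obs)
null = np.array([Y(E, random_involution(n, rng)) for _ in range(5000)])
print((null >= y_obs).mean())            # a small p-value would suggest cheating
```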
It should be noted that the method applied here can be adapted to give $L^p$ estimates in Hoeffding's combinatorial CLT as well, and will yield a bound of the same form as the one obtained in [7]. The main tools that we use to obtain $L^p$ error bounds for $Y_E$ are zero biasing and an inductive argument similar in flavour to [7]. While Bolthausen's method [7] yields optimal results for the Hoeffding combinatorial CLT, that is, the case $A_n = S_n$, it is not immediately clear to the author whether the exact same method can be extended to other classes of permutations such as involutions, that is, $A_n = \Pi_n$. The main problem is that the auxiliary permutations $\pi_1, \pi_2, \pi_3$ considered on page 383 of [7] are unrestricted, but similar permutations for the involutions CLT have to be involutions without a fixed point, and it is unclear to the author how to produce auxiliary permutations satisfying these restrictions. As we shall see, this difficulty can be overcome in a natural way by using zero biasing. The auxiliary variables $Y^\dagger$ and $Y^\ddagger$ produced by Proposition 2.2.1 have distributions that are absolutely continuous with respect to $Y_E$ as in (3.1) with $\pi \in \Pi_n$. This, in particular, ensures that the corresponding auxiliary permutations $\pi^\dagger$ and $\pi^\ddagger$ which we obtain are also involutions. A second novelty of using the zero bias transformation is that it yields not only the optimal $L^\infty$, or Berry-Esseen, bounds, but general $L^p$ bounds holding for all $1 \le p \le \infty$.

The rest of this chapter is organized as follows. In Section 3.2, we introduce some notation and state our main result. In Section 3.3 the construction of the zero bias transformation is reviewed, and an outline is provided that illustrates how to obtain zero bias couplings in some cases of interest. In Section 3.4, $L^1$ bounds are obtained for the normal approximation of $Y_E$. Lastly, in Section 3.5 we use the calculations of Section 3.4 along with the recursive argument of [7] to obtain $L^\infty$ bounds. From the $L^1$ and $L^\infty$ bounds, the following simple inequality allows for the computation of $L^p$ bounds for all intermediate $p \in (1, \infty)$:
$$\|f\|_p^p \le \|f\|_1\,\|f\|_\infty^{p-1}. \qquad (3.4)$$

3.2 Notation and Statement of Main Result

For $n$ an even positive integer, let $\pi$ be a permutation chosen uniformly from $\Pi_n$ in (3.2), the set of involutions with no fixed points. Since for $\pi \in \Pi_n$ the terms $e_{i\pi(i)}$ and $e_{\pi(i)i}$ always appear together in (3.1), and $e_{ii}$ never appears, we may assume without loss of generality that
$$e_{ij} = e_{ji} \quad \text{and} \quad e_{ii} = 0 \quad \text{for all } i, j = 1, 2, \ldots, n. \qquad (3.5)$$
For an array $E = ((e_{ij}))_{1 \le i,j \le n}$ satisfying the conditions in (3.5), define
$$e_{i+} = \sum_{j=1}^n e_{ij}, \quad e_{+j} = \sum_{i=1}^n e_{ij} \quad \text{and} \quad e_{++} = \sum_{i,j=1}^n e_{ij}.$$
Then, as shown in [24],
$$\mu_E = EY_E = \frac{e_{++}}{n-1}, \qquad \sigma_E^2 = \frac{2}{(n-1)(n-3)}\left[(n-2)\sum_{1 \le i,j \le n} e_{ij}^2 + \frac{1}{n-1}e_{++}^2 - 2\sum_{i=1}^n e_{i+}^2\right]. \qquad (3.6)$$
Again following [24], letting
$$\hat e_{ij} = \begin{cases} e_{ij} - \dfrac{e_{i+}}{n-2} - \dfrac{e_{+j}}{n-2} + \dfrac{e_{++}}{(n-1)(n-2)} & i \ne j\\ 0 & i = j, \end{cases} \qquad (3.7)$$
we have
$$\hat e_{+i} = \hat e_{j+} = \hat e_{++} = 0 \quad \text{for all } i, j = 1, \ldots, n, \qquad (3.8)$$
and $Y_{\hat E} = Y_E - \mu_E$, where $\hat E$ is obtained from $E$ by (3.7). We consider bounds to the normal $\mathcal{N}(0,1)$ for the standardized variable
$$W = \frac{Y_E - \mu_E}{\sigma_E}. \qquad (3.9)$$
Since $Y_E$ and $Y_{\hat E}$ differ by a constant, (3.6) and (3.8) yield
$$\sigma_E^2 = \sigma_{\hat E}^2 = \frac{2(n-2)}{(n-1)(n-3)}\sum_{1 \le i,j \le n} \hat e_{ij}^2. \qquad (3.10)$$
In particular, the mean zero, variance one random variable $W$ in (3.9) can be written as
$$W = \sum_{i=1}^n d_{i\pi(i)} \quad \text{where} \quad d_{ij} = \hat e_{ij}/\sigma_E, \qquad (3.11)$$
and moreover, the array $\hat E$ inherits properties (3.5) and (3.8) from $E$, as then does $D$ from $\hat E$.

For any array $E = ((e_{ij}))_{n \times n}$, let
$$\gamma_E = \sum_{i \ne j} \frac{|\hat e_{ij}|^3}{\sigma_E^3} \qquad (3.12)$$
with $\hat e_{ij}$ as in (3.7) and $\sigma_E^2$ as in (3.6).
As $\sigma_E$ appears in the denominator in definition (3.9), in order to guarantee that $W$ is well defined we need $\sigma_E^2 > 0$, and to achieve this we henceforth impose the following condition without further mention.

Condition 3.2.1. The value $\hat e_{ij} \ne 0$ for some $i \ne j$, or, equivalently,
$$e_{ij} - \frac{e_{i+}}{n-2} - \frac{e_{+j}}{n-2} + \frac{e_{++}}{(n-1)(n-2)} \ne 0 \quad \text{for some } i \ne j.$$

Since $d_{ij}$ and $\hat e_{ij}$ are linearly related, Condition 3.2.1 is equivalent to the condition that $d_{ij} \ne 0$ for some $i \ne j$.

We provide bounds on the accuracy of the normal approximation of $W$ using the zero bias transformation. Recall that for any random variable $W$ with mean zero and variance $\sigma^2$, there exists a unique distribution for a random variable $W^*$, called the zero bias distribution, satisfying identity (2.4). Also, from Theorem 2.2.2, if $W$ and $W^*$ are defined on the same space, that is, $(W, W^*)$ is a zero bias coupling, then
$$\|F_W - \Phi\|_1 \le 2E|W^* - W|. \qquad (3.13)$$
The following is our main result, which we prove using the zero bias coupling.

Theorem 3.2.1. Let $E = ((e_{ij}))_{n \times n}$ be an array satisfying $e_{ij} = e_{ji}$, $e_{ii} = 0$ for all $i, j$, and let $\pi$ be an involution chosen uniformly from $\Pi_n$. If
$$Y_E = \sum_{i=1}^n e_{i\pi(i)}$$
and $W = (Y_E - \mu_E)/\sigma_E$, then for $n \ge 9$ and $p \in [1, \infty]$, with $\gamma_E$ as in (3.12), we have
$$\|F_W - \Phi\|_p \le \frac{K_p\,\gamma_E}{n}.$$
Here $F_W$ denotes the distribution function of $W$, $\Phi$ is the distribution function of a standard normal variate and $K_p$ is as in (3.3).

As $W$ in the theorem is given by (3.11) with
$$d_{ij} = d_{ji},\quad d_{ii} = 0,\quad d_{i+} = 0,\quad \sigma_D^2 = 1 \quad \text{and} \quad \gamma_E = \gamma_D, \qquad (3.14)$$
we assume in what follows that all subsequent occurrences of $((d_{ij}))$ satisfy these conditions and, instead of working with $E$, work with the centered and scaled array $D$ only.

In the next section, we review the construction of zero bias couplings in certain cases of interest that include the present problem.

3.3 Zero Bias Transformation

We prove Theorem 3.2.1 by constructing a zero bias coupling using a Stein pair, as introduced in equation (2.5) of Chapter 2, and Proposition 2.2.1.

We review the construction of $(W^\dagger, W^\ddagger)$ from a pair $(W, W')$ satisfying (2.5). Suppose we have a Stein pair $(W, W')$ which is a function of some collection of random variables $\{\xi_\alpha, \alpha \in \Lambda\}$, and that for a possibly random index set $\mathbf{I}$, independent of $\{\xi_\alpha, \alpha \in \Lambda\}$, the difference $W - W'$ depends only on $\mathbf{I}$ and on $\{\xi_\alpha, \alpha \in \Lambda_{\mathbf{I}}\}$, where $\Lambda_{\mathbf{I}}$ depends on $\mathbf{I}$. That is, for some function $b(\mathbf{i}, \xi_\alpha, \alpha \in \Lambda_{\mathbf{i}})$ defined for each $\mathbf{i}$, and $\mathbf{I}$ a random index set,
$$W - W' = b(\mathbf{I}, \xi_\alpha, \alpha \in \Lambda_{\mathbf{I}}). \qquad (3.15)$$
We now show how, under this framework, the pair $(W, W')$ can be constructed; the pair $(W^\dagger, W^\ddagger)$ will then be constructed in a similar fashion. First generate $\mathbf{I}$, then independently generate $\{\xi_\alpha, \alpha \in \Lambda_{\mathbf{I}}\}$, and finally $\{\xi_\alpha, \alpha \in \Lambda_{\mathbf{I}}^c\}$ conditioned on $\{\xi_\alpha, \alpha \in \Lambda_{\mathbf{I}}\}$. That is, first generate the indices $\mathbf{I}$ on which the difference $W - W'$ depends, then the underlying variables $\xi_\alpha, \alpha \in \Lambda_{\mathbf{I}}$ which make up that difference, and lastly the remaining variables.
This construction corresponds to the following factorization of the joint distribution of $\mathbf{I}$ and $\{\xi_\alpha, \alpha \in \Lambda\}$ as the product
$$dF(\mathbf{i}, \xi_\alpha, \alpha \in \Lambda) = P(\mathbf{I} = \mathbf{i})\,dF_{\mathbf{i}}(\xi_\alpha, \alpha \in \Lambda_{\mathbf{i}})\,dF_{\mathbf{i}^c|\mathbf{i}}(\xi_\alpha, \alpha \notin \Lambda_{\mathbf{i}} \mid \xi_\alpha, \alpha \in \Lambda_{\mathbf{i}}). \qquad (3.16)$$
For $dF^\dagger$ we consider the joint distribution of $\mathbf{I}$ and $\{\xi_\alpha, \alpha \in \Lambda\}$, biased by the squared difference $(w - w')^2$, that is,
$$dF^\dagger(\mathbf{i}, \xi_\alpha, \alpha \in \Lambda) = \frac{(w - w')^2}{E(W - W')^2}\,dF(\mathbf{i}, \xi_\alpha, \alpha \in \Lambda). \qquad (3.17)$$
From (2.7), (3.15) and the independence of $\mathbf{I}$ and $\{\xi_\alpha, \alpha \in \Lambda\}$ we obtain
$$\sum_{\mathbf{i}} P(\mathbf{I} = \mathbf{i})\,E b^2(\mathbf{i}, \xi_\alpha, \alpha \in \Lambda_{\mathbf{i}}) = 2\lambda\sigma^2. \qquad (3.18)$$
Hence we can define a probability distribution for an index set $\mathbf{I}^\dagger$ by
$$P(\mathbf{I}^\dagger = \mathbf{i}) = \frac{r_{\mathbf{i}}}{2\lambda\sigma^2} \quad \text{where} \quad r_{\mathbf{i}} = P(\mathbf{I} = \mathbf{i})\,E b^2(\mathbf{i}, \xi_\alpha, \alpha \in \Lambda_{\mathbf{i}}). \qquad (3.19)$$
From (3.15), (3.17) and (3.19), we obtain
$$dF^\dagger(\mathbf{i}, \xi_\alpha, \alpha \in \Lambda) = \frac{b^2(\mathbf{i}, \xi_\alpha, \alpha \in \Lambda_{\mathbf{i}})}{2\lambda\sigma^2}\,P(\mathbf{I} = \mathbf{i})\,dF_{\mathbf{i}}(\xi_\alpha, \alpha \in \Lambda_{\mathbf{i}})\,dF_{\mathbf{i}^c|\mathbf{i}}(\xi_\alpha, \alpha \notin \Lambda_{\mathbf{i}} \mid \xi_\alpha, \alpha \in \Lambda_{\mathbf{i}})$$
$$= \frac{r_{\mathbf{i}}}{2\lambda\sigma^2}\,\frac{b^2(\mathbf{i}, \xi_\alpha, \alpha \in \Lambda_{\mathbf{i}})}{E b^2(\mathbf{i}, \xi_\alpha, \alpha \in \Lambda_{\mathbf{i}})}\,dF_{\mathbf{i}}(\xi_\alpha, \alpha \in \Lambda_{\mathbf{i}})\,dF_{\mathbf{i}^c|\mathbf{i}}(\xi_\alpha, \alpha \notin \Lambda_{\mathbf{i}} \mid \xi_\alpha, \alpha \in \Lambda_{\mathbf{i}})$$
$$= P(\mathbf{I}^\dagger = \mathbf{i})\,dF_{\mathbf{i}}^\dagger(\xi_\alpha, \alpha \in \Lambda_{\mathbf{i}})\,dF_{\mathbf{i}^c|\mathbf{i}}(\xi_\alpha, \alpha \notin \Lambda_{\mathbf{i}} \mid \xi_\alpha, \alpha \in \Lambda_{\mathbf{i}}), \qquad (3.20)$$
where
$$dF_{\mathbf{i}}^\dagger(\xi_\alpha, \alpha \in \Lambda_{\mathbf{i}}) = \frac{b^2(\mathbf{i}, \xi_\alpha, \alpha \in \Lambda_{\mathbf{i}})}{E b^2(\mathbf{i}, \xi_\alpha, \alpha \in \Lambda_{\mathbf{i}})}\,dF_{\mathbf{i}}(\xi_\alpha, \alpha \in \Lambda_{\mathbf{i}}). \qquad (3.21)$$
Note that (3.20) gives a representation of the distribution $F^\dagger$ which is of the same form as (3.16). The parallel forms of $F$ and $F^\dagger$ allow us to generate variables having distribution $F^\dagger$ parallel to those for distribution $F$: first generate the random index $\mathbf{I}^\dagger$, then $\{\xi_\alpha^\dagger, \alpha \in \Lambda_{\mathbf{I}^\dagger}\}$ according to $dF_{\mathbf{I}^\dagger}^\dagger$, and lastly the remaining variables according to $dF_{\mathbf{i}^c|\mathbf{i}}(\xi_\alpha, \alpha \notin \Lambda_{\mathbf{i}} \mid \xi_\alpha, \alpha \in \Lambda_{\mathbf{i}})$. That the last step is the same in the construction of pairs of variables having the $F$ and $F^\dagger$ distributions allows an opportunity for a coupling between $(W, W')$ and $(W^\dagger, W^\ddagger)$ to be achieved, and from Proposition 2.2.1 this suffices to couple $W$ and $W^*$.

One coupling may be accomplished using the following outline, as was done in [20]. First generate $\mathbf{I}$ and $\{\xi_\alpha, \alpha \in \Lambda\}$, yielding the pair $W, W'$. Next generate $\mathbf{I}^\dagger$ and then the variables $\{\xi_\alpha^\dagger : \alpha \in \Lambda_{\mathbf{I}^\dagger}\}$ following $dF_{\mathbf{i}}^\dagger$ if the realization of $\mathbf{I}^\dagger$ is $\mathbf{i}$. Lastly, when constructing the remaining variables, that is $\{\xi_\alpha^\dagger : \alpha \notin \Lambda_{\mathbf{i}}\}$, which make up $W^\dagger, W^\ddagger$, use as much of the previously generated variables $\{\xi_\alpha, \alpha \notin \Lambda_{\mathbf{i}}\}$ as possible so that the pairs $(W, W')$ and $(W^\dagger, W^\ddagger)$ will be close.

We review the construction of $(W, W')$ in [24] for the case at hand, and then show how it agrees with the outline above. For distinct $i, j \in \{1, \ldots, n\}$ let $\tau_{i,j}$ be the permutation which transposes the elements $i$ and $j$, that is, $\tau_{i,j}(i) = j$, $\tau_{i,j}(j) = i$ and $\tau_{i,j}(k) = k$ for all $k \notin \{i, j\}$. Furthermore, given $\pi \in \Pi_n$, let
$$\pi_{i,j} = \pi\,\tau_{i,\pi(j)}\,\tau_{j,\pi(i)}.$$
Note that for any given $\pi \in \Pi_n$, the permutation $\pi_{i,j}$ will again belong to $\Pi_n$. In particular, whereas $\pi$ has the cycle(s) $(i, \pi(i))$ and $(j, \pi(j))$, the permutation $\pi_{i,j}$ has cycle(s) $(i, j)$ and $(\pi(i), \pi(j))$, and all other cycles in common with $\pi$.

Now, with $\pi$ chosen uniformly from $\Pi_n$ and $W$ given by (3.11), we construct an exchangeable pair $(W, W')$, as in [24], as follows. Choose two distinct indices $\mathbf{I} = (I, J)$ uniformly from $\{1, 2, \ldots, n\}$, that is, having distribution
$$P(\mathbf{I} = \mathbf{i}) = P(I = i, J = j) = \frac{1}{n(n-1)}\mathbf{1}(i \ne j), \qquad (3.22)$$
and let
$$\pi' = \pi_{I,J}. \qquad (3.23)$$
Since $(I, J)$ is chosen uniformly over all distinct pairs, $\pi$ and $\pi'$ are exchangeable, and hence, letting $W' = \sum d_{i\pi'(i)}$, so are $(W, W')$. Moreover, as $\pi'$ has the cycle(s) $(I, J)$ and $(\pi(I), \pi(J))$, and shares all other cycles with $\pi$, we have
$$W - W' = 2\bigl(d_{I\pi(I)} + d_{J\pi(J)} - (d_{IJ} + d_{\pi(I)\pi(J)})\bigr). \qquad (3.24)$$
Using (3.24) it is shown in [24] that
$$E(W - W' \mid W) = \frac{4}{n}W, \qquad (3.25)$$
that is, (2.5) is satisfied with $\lambda = 4/n$.

To put this construction in the framework above, so as to be able to apply the decomposition (3.20) for the construction of a zero bias coupling, let $\Lambda = \{1, 2, \ldots, n\}$ and $\xi_\alpha = \pi(\alpha)$.
From (3.24), we see that the difference $W - W'$ depends on a pair of randomly chosen indices and their images. Hence, regarding these indices, let $\mathbf{i} = (i, j) \in \Lambda^2$ and let $\mathbf{I} = (I, J)$, where $I$ and $J$ have the joint distribution given in (3.22), specifying $P(\mathbf{I} = \mathbf{i})$, the first term in (3.16). Also, in this case $\Lambda_{\mathbf{i}} = \{i, j\}$.

Next, for given distinct $i, j$ and $\pi \in \Pi_n$, we have $\pi(i) \ne \pi(j)$, $\pi(i) \ne i$ and $\pi(j) \ne j$. As $\pi$ is chosen uniformly from $\Pi_n$, the distribution of the images $k = \xi_i$ and $l = \xi_j$ of distinct $i$ and $j$ under $\pi$ is given by
$$dF_{\mathbf{i}}(\xi_\alpha, \alpha \in \Lambda_{\mathbf{i}}) = dF_{i,j}(k, l) \propto \mathbf{1}(k \ne l,\ k \ne i,\ l \ne j) \quad \text{for } i \ne j, \qquad (3.26)$$
specifying the second term of (3.16). The last term in (3.16) is given by
$$dF_{\mathbf{i}^c|\mathbf{i}}(\xi_\gamma, \gamma \notin \Lambda_{\mathbf{i}} \mid \xi_\alpha, \alpha \in \Lambda_{\mathbf{i}}) = \frac{P(\pi(i) = k, \pi(j) = l, \pi(\gamma) = \xi_\gamma, \gamma \notin \{i,j,k,l\})}{P(\pi(i) = k, \pi(j) = l)}, \qquad (3.27)$$
when $i \ne j$ and $k \ne l$. The equality in (3.27) follows from the fact that $\pi$ is an involution, implying $\xi_k = \pi(k) = \pi(\pi(i)) = i$ and similarly $\xi_l = j$, and thus
$$\{\pi(i) = k, \pi(j) = l, \pi(\gamma) = \xi_\gamma, \gamma \notin \{i,j\}\} = \{\pi(i) = k, \pi(j) = l, \pi(k) = i, \pi(l) = j, \pi(\gamma) = \xi_\gamma, \gamma \notin \{i,j,k,l\}\} = \{\pi(i) = k, \pi(j) = l, \pi(\gamma) = \xi_\gamma, \gamma \notin \{i,j,k,l\}\}.$$

The conditional distribution in (3.27) can be simplified further by considering the two cases $k = j$ (or equivalently $l = i$) and $k \ne j$ (or equivalently $|\{i,j,k,l\}| = 4$) separately. If $k = j$ then we have $l = i$ and thus we obtain
$$dF_{\mathbf{i}^c|\mathbf{i}}(\xi_\gamma, \gamma \notin \Lambda_{\mathbf{i}} \mid \xi_i = j, \xi_j = i) = \frac{P(\pi(i) = j, \pi(j) = i, \pi(\gamma) = \xi_\gamma, \gamma \notin \{i,j\})}{P(\pi(i) = j, \pi(j) = i)} = \frac{P(\pi(i) = j, \pi(\gamma) = \xi_\gamma, \gamma \notin \{i,j\})}{P(\pi(i) = j)} = \frac{|\Pi_n|^{-1}}{(n-1)^{-1}} = |\Pi_{n-2}|^{-1}. \qquad (3.28)$$
For (3.28), we note $P(\pi(i) = j) = 1/(n-1)$ since $\pi$ is chosen uniformly from $\Pi_n$. The last equality in (3.28) simply indicates that to obtain $\pi$ in its entirety we follow a two step procedure. In the first step we fix $\pi(i) = j$, that is, the cycle $(i, j)$ in $\pi$, and then in the next step we only have to choose an involution uniformly at random on the rest of the indices, that is, essentially sample uniformly from $\Pi_{n-2}$ to construct $\pi$. This argument yields the recursion
$$|\Pi_n| = (n-1)|\Pi_{n-2}|, \quad \text{yielding} \quad |\Pi_n| = (n-1)(n-3)\cdots 1. \qquad (3.29)$$
When we have $k \ne j$, and hence $l \ne i$, or equivalently $|\{i,j,k,l\}| = 4$ in (3.27), we obtain
$$dF_{\mathbf{i}^c|\mathbf{i}}(\xi_\gamma, \gamma \notin \Lambda_{\mathbf{i}} \mid \xi_\alpha, \alpha \in \Lambda_{\mathbf{i}}) = \frac{P(\pi(i) = k, \pi(j) = l, \pi(\gamma) = \xi_\gamma, \gamma \notin \{i,j,k,l\})}{P(\pi(i) = k, \pi(j) = l)} = \frac{|\Pi_n|^{-1}}{((n-1)(n-3))^{-1}} = |\Pi_{n-4}|^{-1}. \qquad (3.30)$$
In (3.30) we have used the equality
$$P(\pi(i) = k, \pi(j) = l) = P(\pi(i) = k)\,P(\pi(j) = l \mid \pi(i) = k) = \frac{1}{(n-3)(n-1)},$$
which follows from the fact that $\pi$ is an involution chosen uniformly at random. From (3.30) we see that the conditional distribution $dF_{\mathbf{i}^c|\mathbf{i}}$ is uniform over all values of $\xi_\gamma, \gamma \in \Lambda$, for which $\xi_i = k$, $\xi_j = l$ and $P(\pi(\gamma) = \xi_\gamma, \gamma \in \Lambda) > 0$, when $|\{i,j,k,l\}| = 4$. Hence we may construct $(W, W')$ following (3.16), that is, first choosing $\mathbf{I} = (I, J)$, then the images $\mathbf{K} = (K, L)$ of $I$ and $J$ under $\pi$ and $\pi'$, then the remaining images uniformly over all possible values for which the resulting permutations lie in $\Pi_n$.

We may construct $(W^\dagger, W^\ddagger)$ from (3.20) quite easily now. In view of (3.15) and (3.24), for pairs of distinct indices $i, j$, let
$$b(i, j, \xi_i, \xi_j) = 2\bigl(d_{i\xi_i} + d_{j\xi_j} - (d_{ij} + d_{\xi_i\xi_j})\bigr), \qquad (3.31)$$
where again we may also write $k = \xi_i$ and $l = \xi_j$. Now, considering the first two factors in (3.20), using (3.19), (3.21), (3.22) and (3.26) we obtain
$$P(\mathbf{I}^\dagger = \mathbf{i})\,dF_{\mathbf{i}}^\dagger(\xi_\alpha, \alpha \in \Lambda_{\mathbf{i}}) = P(I^\dagger = i, J^\dagger = j)\,dF_{i,j}^\dagger(K^\dagger = k, L^\dagger = l)$$
$$\propto P(I = i, J = j)\,[d_{ik} + d_{jl} - (d_{ij} + d_{kl})]^2\,\mathbf{1}(k \ne l, k \ne i, l \ne j)$$
$$\propto [d_{ik} + d_{jl} - (d_{ij} + d_{kl})]^2\,\mathbf{1}(k \ne l, k \ne i, l \ne j, i \ne j). \qquad (3.32)$$
By Lemma 3.3.3 below, relation (3.32) specifies a joint distribution, say $p(i,j,k,l)$, on the pairs $\mathbf{I}^\dagger = (I^\dagger, J^\dagger)$ and their images $(\xi_{I^\dagger}, \xi_{J^\dagger}) = (K^\dagger, L^\dagger) =: \mathbf{K}^\dagger$, say.
Since the squared term vanishes when $j = k$ (or, equivalently for an involution, $i = l$), we may write
$$p(i,j,k,l) = c_n\,[d_{ik} + d_{jl} - (d_{ij} + d_{kl})]^2\,\mathbf{1}(|\{i,j,k,l\}| = 4), \qquad (3.33)$$
where the constant of proportionality $c_n$ is provided in Lemma 3.3.3 below. Next, note that since $i, j, k, l$ have to be distinct in the definition (3.33), the third term in (3.20), $dF_{\mathbf{i}^c|\mathbf{i}}(\xi_\gamma, \gamma \notin \Lambda_{\mathbf{i}} \mid \xi_\alpha, \alpha \in \Lambda_{\mathbf{i}})$, reduces to the uniform distribution over all values of $\xi_\gamma, \gamma \notin \Lambda_{\mathbf{i}}$, for which $\xi_i = k$, $\xi_j = l$ and $P(\pi(\gamma) = \xi_\gamma, \gamma \in \Lambda) > 0$, owing to (3.30). Once we form $(W^\dagger, W^\ddagger)$ following the square bias distribution (2.6), it is easy to produce $W^*$, which has the $W$-zero bias distribution, using Proposition 2.2.1. We summarize the conclusions above in the following lemma.

Lemma 3.3.1. Let $dF(w, w')$ be the joint distribution of a Stein pair $(W, W')$ where
$$W = \sum_{i=1}^n d_{i\pi(i)} \quad \text{and} \quad W' = \sum_{i=1}^n d_{i\pi'(i)},$$
with $\pi$ chosen uniformly from $\Pi_n$ and $\pi'$ as in (3.23), with $I, J$ having distribution (3.22). Then a pair $(W^\dagger, W^\ddagger)$ with the joint distribution (2.6) can be constructed by setting
$$W^\dagger = \sum_{i=1}^n d_{i\pi^\dagger(i)} \quad \text{and} \quad W^\ddagger = \sum_{i=1}^n d_{i\pi^\ddagger(i)},$$
where $\pi^\dagger$ and $\pi^\ddagger = \pi^\dagger_{I^\dagger,J^\dagger}$ are constructed by first sampling $\mathbf{I}^\dagger = (I^\dagger, J^\dagger)$ and the respective images $\mathbf{K}^\dagger = (K^\dagger, L^\dagger)$ under $\pi^\dagger$ according to (3.33), and then selecting the remaining images of $\pi^\dagger$ uniformly from among the choices for which it lies in $\Pi_n$. Furthermore, if $U \sim U[0,1]$ is independent of $(W^\dagger, W^\ddagger)$, then $W^* = UW^\dagger + (1-U)W^\ddagger$ has the $W$-zero bias distribution.

In the next section we define $\pi^\dagger$ on a case by case basis and also prove that it indeed gives us the desired distribution.

3.3.0.1 Defining the Auxiliary Permutation

Given the permutations $\pi$ and $\pi'$ from which the pair $(W, W')$ is constructed, we would like to form $(W^\dagger, W^\ddagger)$ as close to $(W, W')$ as possible, thus making $W^*$ close to $W$. Towards this end, we follow the construction noted after (3.20), using many of the already chosen variables which form $\pi$ and $\pi'$ to make the two pairs close. In particular, begin the construction of $(W^\dagger, W^\ddagger)$ by choosing $\mathbf{I}^\dagger = (I^\dagger, J^\dagger)$ and $\mathbf{K}^\dagger = (K^\dagger, L^\dagger)$ with joint distribution (3.33), independent of $\pi$ and $\pi'$. Let
$$R_1 = |\{\pi(I^\dagger), \pi(J^\dagger)\} \cap \{K^\dagger, L^\dagger\}| \quad \text{and} \quad R_2 = |\{\pi(I^\dagger), \pi(K^\dagger)\} \cap \{J^\dagger, L^\dagger\}|;$$
clearly $R_1, R_2 \in \{0, 1, 2\}$. Define $\pi^\dagger$ by
$$\pi^\dagger = \begin{cases}
\pi_{J^\dagger,L^\dagger} & \text{if } \pi(I^\dagger) = K^\dagger \text{ and } \pi(J^\dagger) \ne L^\dagger,\ (R_1, R_2) = (1, 0)\\
\pi_{I^\dagger,K^\dagger} & \text{if } \pi(I^\dagger) \ne K^\dagger \text{ and } \pi(J^\dagger) = L^\dagger,\ (R_1, R_2) = (1, 0)\\
\pi_{J^\dagger,K^\dagger}\,\tau_{I^\dagger,J^\dagger}\,\tau_{K^\dagger,L^\dagger} & \text{if } \pi(I^\dagger) = L^\dagger \text{ and } \pi(J^\dagger) \ne K^\dagger,\ (R_1, R_2) = (1, 1)\\
\pi_{I^\dagger,L^\dagger}\,\tau_{I^\dagger,J^\dagger}\,\tau_{K^\dagger,L^\dagger} & \text{if } \pi(I^\dagger) \ne L^\dagger \text{ and } \pi(J^\dagger) = K^\dagger,\ (R_1, R_2) = (1, 1)\\
\pi_{K^\dagger,L^\dagger}\,\tau_{I^\dagger,L^\dagger}\,\tau_{J^\dagger,K^\dagger} & \text{if } \pi(I^\dagger) = J^\dagger \text{ and } \pi(K^\dagger) \ne L^\dagger,\ (R_1, R_2) = (0, 1)\\
\pi_{I^\dagger,J^\dagger}\,\tau_{I^\dagger,L^\dagger}\,\tau_{J^\dagger,K^\dagger} & \text{if } \pi(I^\dagger) \ne J^\dagger \text{ and } \pi(K^\dagger) = L^\dagger,\ (R_1, R_2) = (0, 1)\\
\pi & \text{if } \pi(I^\dagger) = K^\dagger \text{ and } \pi(J^\dagger) = L^\dagger,\ (R_1, R_2) = (2, 0)\\
\pi\,\tau_{I^\dagger,L^\dagger}\,\tau_{J^\dagger,K^\dagger} & \text{if } \pi(I^\dagger) = J^\dagger \text{ and } \pi(K^\dagger) = L^\dagger,\ (R_1, R_2) = (0, 2)\\
\pi\,\tau_{I^\dagger,J^\dagger}\,\tau_{K^\dagger,L^\dagger} & \text{if } \pi(I^\dagger) = L^\dagger \text{ and } \pi(J^\dagger) = K^\dagger,\ (R_1, R_2) = (2, 2)\\
(\pi_{I^\dagger,K^\dagger})_{J^\dagger,L^\dagger} & \text{when } R_1 = R_2 = 0.
\end{cases} \qquad (3.34)$$

The partition in display (3.34) is based on the possible values of $(R_1, R_2)$; it does not include the cases $(R_1, R_2) = (2, 1)$ or $(R_1, R_2) = (1, 2)$ because these two events are impossible. If $R_1 = 2$ and $\pi(I^\dagger) = K^\dagger$, $\pi(J^\dagger) = L^\dagger$, then $R_2 = 0$, while if $\pi(I^\dagger) = L^\dagger$, $\pi(J^\dagger) = K^\dagger$, then $R_2 = 2$. Similarly one can rule out $(R_1, R_2) = (1, 2)$. Similar arguments show that the cases described above are indeed exhaustive. Clearly any two cases in (3.34) with differing values of the $(R_1, R_2)$ tuple are exclusive.
Also, it can easily be checked that any two cases with the same tuple value are also exclusive. For example, in case one we have $\pi(I^\dagger) = K^\dagger$, whereas in case two we have $\pi(I^\dagger) \ne K^\dagger$, making these two cases disjoint. In summary, $\pi^\dagger$ is well defined, and this construction specifies the pairs $(\pi, \pi')$ and $(\pi^\dagger, \pi^\ddagger)$ on the same space. The following lemma shows that the $\pi^\dagger$ so obtained is an involution.

Lemma 3.3.2. For $\pi \in \Pi_n$, the permutation $\pi^\dagger$ defined in (3.34) belongs to $\Pi_n$ and has the cycles $(I^\dagger, K^\dagger)$ and $(J^\dagger, L^\dagger)$. Moreover, with $\pi^\ddagger$ as in Lemma 3.3.1, the permutations $\pi, \pi^\dagger, \pi^\ddagger$ are involutions when restricted to the set $\mathcal{I} = \{I^\dagger, J^\dagger, K^\dagger, L^\dagger, \pi(I^\dagger), \pi(J^\dagger), \pi(K^\dagger), \pi(L^\dagger)\}$, and agree on the complement $\mathcal{I}^c$.

Proof. If we prove that $\pi^\dagger$ has the cycles $(I^\dagger, K^\dagger)$ and $(J^\dagger, L^\dagger)$, then from (3.34) and $\pi^\ddagger = \pi^\dagger_{I^\dagger,J^\dagger}$ we have that $\pi, \pi^\dagger, \pi^\ddagger$ all agree on the complement of $\mathcal{I}$, which proves the last claim in the lemma. Note that $\pi$ maps $\mathcal{I}$ onto itself and is an involution when restricted to $\mathcal{I}$. Therefore $\pi$ is an involution when restricted to $\mathcal{I}^c$, and hence so are $\pi^\dagger$ and $\pi^\ddagger$. So we only need to prove that $\pi^\dagger$ has the cycles as claimed.

Since $\pi^\ddagger = \pi^\dagger_{I^\dagger,J^\dagger}$, it suffices now to show that $\pi^\dagger$ is an involution on $\mathcal{I}$ and has cycles $(I^\dagger, K^\dagger)$ and $(J^\dagger, L^\dagger)$, which can be achieved by examining the cases in (3.34) one by one. For instance, in case one, where $\pi(I^\dagger) = K^\dagger$ and $\pi(J^\dagger) \ne L^\dagger$, we have $\mathcal{I} = \{I^\dagger, J^\dagger, K^\dagger, L^\dagger, \pi(J^\dagger), \pi(L^\dagger)\}$ and
$$\pi^\dagger(I^\dagger) = \pi_{J^\dagger,L^\dagger}(I^\dagger) = \pi(I^\dagger) = K^\dagger,$$
$$\pi^\dagger(J^\dagger) = \pi_{J^\dagger,L^\dagger}(J^\dagger) = \pi\,\tau_{J^\dagger,\pi(L^\dagger)}(J^\dagger) = \pi(\pi(L^\dagger)) = L^\dagger,$$
and
$$\pi^\dagger(\pi(J^\dagger)) = \pi_{J^\dagger,L^\dagger}(\pi(J^\dagger)) = \pi\,\tau_{J^\dagger,\pi(L^\dagger)}\,\tau_{L^\dagger,\pi(J^\dagger)}(\pi(J^\dagger)) = \pi(L^\dagger).$$
Hence $\pi^\dagger$ is an involution on $\mathcal{I}$ and has cycles $(I^\dagger, K^\dagger)$, $(J^\dagger, L^\dagger)$, $(\pi(J^\dagger), \pi(L^\dagger))$. In case ten, $|\mathcal{I}| = 8$, and $\pi^\dagger$ will be an involution on $\mathcal{I}$ with cycles $(I^\dagger, K^\dagger)$, $(J^\dagger, L^\dagger)$, $(\pi(I^\dagger), \pi(K^\dagger))$, $(\pi(J^\dagger), \pi(L^\dagger))$. As an illustration we note
$$\pi^\dagger(I^\dagger) = (\pi_{I^\dagger,K^\dagger})_{J^\dagger,L^\dagger}(I^\dagger) = \pi_{I^\dagger,K^\dagger}(I^\dagger) = \pi\,\tau_{I^\dagger,\pi(K^\dagger)}\,\tau_{K^\dagger,\pi(I^\dagger)}(I^\dagger) = \pi\,\tau_{I^\dagger,\pi(K^\dagger)}(I^\dagger) = \pi(\pi(K^\dagger)) = K^\dagger.$$
That $\pi^\dagger$ has the other cycles as claimed can be shown similarly. So in these two cases $\pi^\dagger$ is an involution restricted to $\mathcal{I}$ and has the cycles $(I^\dagger, K^\dagger)$, $(J^\dagger, L^\dagger)$. That $\pi^\dagger$ is an involution on $\mathcal{I}$ with cycles $(I^\dagger, K^\dagger)$, $(J^\dagger, L^\dagger)$ can be shown similarly for the other cases, completing the proof.

Henceforth we will simply write $\pi_{i,j}$ for the permutation $\pi\,\tau_{i,\pi(j)}\,\tau_{j,\pi(i)}$ unless otherwise mentioned. The utility of the construction (3.34) as a coupling is indicated by the following result.

Theorem 3.3.1. Suppose $\pi$ is chosen uniformly at random from $\Pi_n$ and $(I^\dagger, J^\dagger, K^\dagger, L^\dagger)$ has joint distribution $p(\cdot)$ as in (3.33). If $\pi^\dagger$ is obtained from $\pi$ according to (3.34) above, then $\pi$ and $\pi^\dagger$ are constructed on a common space and $\pi^\dagger$ satisfies the conditions specified in Lemma 3.3.1.

Proof. By hypothesis the indices $(I^\dagger, J^\dagger, K^\dagger, L^\dagger)$ have the distribution in (3.33). From Lemma 3.3.2, we see that $\pi^\dagger$ is an involution and has cycles $(I^\dagger, K^\dagger)$, $(J^\dagger, L^\dagger)$. It only remains to verify that the distribution of $\pi^\dagger$ is uniform over all involutions in $\Pi_n$ having cycles $(I^\dagger, K^\dagger)$ and $(J^\dagger, L^\dagger)$. That is, recalling that $\mathbf{I}^\dagger = (I^\dagger, J^\dagger)$ and $\mathbf{K}^\dagger = (K^\dagger, L^\dagger)$, and letting
$$\Pi_{n,\mathbf{I}^\dagger,\mathbf{K}^\dagger} = \{\sigma \in \Pi_n : \sigma(I^\dagger) = K^\dagger, \sigma(J^\dagger) = L^\dagger\},$$
we need to verify that
$$P(\pi^\dagger = \sigma \mid \mathbf{I}^\dagger, \mathbf{K}^\dagger) = \frac{1}{|\Pi_{n,\mathbf{I}^\dagger,\mathbf{K}^\dagger}|} = \frac{1}{|\Pi_{n-4}|} \quad \text{for all } \sigma \in \Pi_{n,\mathbf{I}^\dagger,\mathbf{K}^\dagger}. \qquad (3.35)$$
Since $I^\dagger, J^\dagger, K^\dagger, L^\dagger$ are distinct, with $\mathcal{I}$ as in Lemma 3.3.2, the size of $\mathcal{I}$ satisfies $4 \le |\mathcal{I}| \le 8$. In addition, since $\pi$ is an involution we see that $|\{I^\dagger, J^\dagger, K^\dagger, L^\dagger\} \cap \{\pi(I^\dagger), \pi(J^\dagger), \pi(K^\dagger), \pi(L^\dagger)\}|$ is even. Hence so is $|\mathcal{I}|$, and we conclude that $|\mathcal{I}| \in \{4, 6, 8\}$.
For $\sigma \in \Pi_{n,\mathbf{I}^\dagger,\mathbf{K}^\dagger}$, independence of $\pi$ and $(\mathbf{I}^\dagger, \mathbf{K}^\dagger)$ yields
$$P(\pi^\dagger = \sigma \mid \mathbf{I}^\dagger, \mathbf{K}^\dagger) = \sum_{\ell \in \{4,6,8\}} P(\pi^\dagger = \sigma \mid |\mathcal{I}| = \ell, \mathbf{I}^\dagger, \mathbf{K}^\dagger)\,P(|\mathcal{I}| = \ell \mid \mathbf{I}^\dagger, \mathbf{K}^\dagger) = \sum_{\ell \in \{4,6,8\}} P(\pi^\dagger = \sigma \mid |\mathcal{I}| = \ell, \mathbf{I}^\dagger, \mathbf{K}^\dagger)\,P(|\mathcal{I}| = \ell). \qquad (3.36)$$
For $\rho \in \Pi_n$ let $\bar\rho$ denote the restriction of $\rho$ to the complement of $\{I^\dagger, J^\dagger, K^\dagger, L^\dagger\}$.

First consider the case $\ell = 4$, that is, $\mathcal{I} = \{I^\dagger, J^\dagger, K^\dagger, L^\dagger\}$. Since $\pi^\dagger \in \Pi_{n,\mathbf{I}^\dagger,\mathbf{K}^\dagger}$, the permutation $\pi^\dagger$ agrees with every $\sigma \in \Pi_{n,\mathbf{I}^\dagger,\mathbf{K}^\dagger}$ on $\{I^\dagger, J^\dagger, K^\dagger, L^\dagger\}$, and, as $\pi^\dagger$ and $\pi$ agree on $\mathcal{I}^c$,
$$P(\pi^\dagger = \sigma \mid |\mathcal{I}| = 4, \mathbf{I}^\dagger, \mathbf{K}^\dagger) = P(\bar\pi^\dagger = \bar\sigma \mid |\mathcal{I}| = 4, \mathbf{I}^\dagger, \mathbf{K}^\dagger) = P(\bar\pi = \bar\sigma \mid |\mathcal{I}| = 4, \mathbf{I}^\dagger, \mathbf{K}^\dagger) = \frac{1}{|\Pi_{n-4}|}.$$

Now suppose $\ell = 6$. In this case, the set $\mathcal{J} = \mathcal{I} \setminus \{I^\dagger, J^\dagger, K^\dagger, L^\dagger\}$ has size 2, say $\mathcal{J} = \{i_1, i_2\}$. We claim that $\pi^\dagger$ has the cycle $(i_1, i_2)$ with $\{i_1, i_2\} = \{\pi(a), \pi(b)\}$ for some $a, b \in \{I^\dagger, J^\dagger, K^\dagger, L^\dagger\}$, and that conditional on $|\mathcal{I}| = 6$ and $\{I^\dagger, J^\dagger, K^\dagger, L^\dagger\}$ the values $i_1, i_2$ are uniform over all pairs of distinct values in $\{I^\dagger, J^\dagger, K^\dagger, L^\dagger\}^c$. That $\pi^\dagger$ has the cycle $(i_1, i_2)$ follows from Lemma 3.3.2. Suppose $(i_1, i_2) = (\pi(J^\dagger), \pi(L^\dagger))$ as in case 1 in (3.34); since $J^\dagger, L^\dagger$ do not form a cycle, and their images under $\pi$ in case 1 are constrained exactly to lie outside $\{I^\dagger, J^\dagger, K^\dagger, L^\dagger\}$, as $\pi$ is uniform over $\Pi_n$, these images are uniform over $\{I^\dagger, J^\dagger, K^\dagger, L^\dagger\}^c$. One can show that these properties hold similarly in the remaining cases. The remaining cycles of $\pi^\dagger$ are in $\mathcal{I}^c$ and thus are the same as those of $\pi$ restricted to $\mathcal{I}^c$. Thus, the cycles of $\pi^\dagger$ are conditionally uniform, that is, for $\sigma \in \Pi_{n,\mathbf{I}^\dagger,\mathbf{K}^\dagger}$,
$$P(\pi^\dagger = \sigma \mid |\mathcal{I}| = 6, \mathbf{I}^\dagger, \mathbf{K}^\dagger) = P(\bar\pi^\dagger = \bar\sigma \mid |\mathcal{I}| = 6, \mathbf{I}^\dagger, \mathbf{K}^\dagger) = \frac{1}{|\Pi_{n-4}|}.$$

The case $\ell = 8$, that is, where $R_1 = R_2 = 0$, is handled similarly to $\ell = 6$. Here $\pi^\dagger$ will have the cycles $(\pi(I^\dagger), \pi(K^\dagger))$, $(\pi(J^\dagger), \pi(L^\dagger))$. These two cycles are both of the form $(\pi(a), \pi(b))$ with $a, b \in \{I^\dagger, J^\dagger, K^\dagger, L^\dagger\}$ and hence, as in the case $\ell = 6$, uniform random transpositions. Since $\pi$ and $\pi^\dagger$ agree on $\mathcal{I}^c$, we see that $\pi^\dagger$ has uniform random transpositions on $\{I^\dagger, J^\dagger, K^\dagger, L^\dagger\}^c = \mathcal{I}^c \cup \{\pi(I^\dagger), \pi(J^\dagger), \pi(K^\dagger), \pi(L^\dagger)\}$, yielding
$$P(\pi^\dagger = \sigma \mid |\mathcal{I}| = 8, \mathbf{I}^\dagger, \mathbf{K}^\dagger) = P(\bar\pi^\dagger = \bar\sigma \mid |\mathcal{I}| = 8, \mathbf{I}^\dagger, \mathbf{K}^\dagger) = \frac{1}{|\Pi_{n-4}|}.$$
Thus (3.36) now yields
$$P(\pi^\dagger = \sigma \mid \mathbf{I}^\dagger, \mathbf{K}^\dagger) = \frac{1}{|\Pi_{n-4}|},$$
verifying (3.35) and proving the theorem.

So the tuple $(\pi^\dagger, \pi^\ddagger)$ obtained in Theorem 3.3.1 satisfies the conditions in Lemma 3.3.1. Hence $(W^\dagger, W^\ddagger)$ constructed from $(\pi^\dagger, \pi^\ddagger)$ as in Lemma 3.3.1 has the required square bias distribution.

We conclude this section with the calculation of the normalization constant for the distribution $p(\cdot)$ in (3.33).

Lemma 3.3.3. For $n \ge 4$ and $D = ((d_{ij}))_{n \times n}$ satisfying (3.14), we have
$$c_n \sum_{|\{i,j,k,l\}|=4} [d_{ik} + d_{jl} - (d_{ij} + d_{kl})]^2 = 1 \quad \text{where} \quad c_n = \frac{1}{2(n-1)^2(n-3)} = O\!\left(\frac{1}{n^3}\right), \qquad (3.37)$$
and in particular
$$c_n \le \frac{1}{n^3} \quad \text{when } n \ge 9. \qquad (3.38)$$

Proof. From (3.24), we have
$$W - W' = 2\bigl(d_{I\pi(I)} + d_{J\pi(J)} - (d_{IJ} + d_{\pi(I)\pi(J)})\bigr),$$
where $(I, J)$ are two distinct indices selected uniformly from $\{1, 2, \ldots, n\}$. Since $\pi$ is an involution chosen uniformly, we have
$$E(W - W')^2 = \frac{1}{n(n-1)}\sum_{i \ne j} 4E\bigl(d_{i\pi(i)} + d_{j\pi(j)} - (d_{ij} + d_{\pi(i)\pi(j)})\bigr)^2 = \frac{4}{n(n-1)^2(n-3)}\sum_{|\{i,j,k,l\}|=4} (d_{ik} + d_{jl} - (d_{ij} + d_{kl}))^2. \qquad (3.39)$$
Using (3.39), (2.7), (3.25) and $\sigma_D^2 = 1$, we obtain
$$\frac{4}{n(n-1)^2(n-3)}\sum_{|\{i,j,k,l\}|=4} (d_{ik} + d_{jl} - (d_{ij} + d_{kl}))^2 = 2\lambda\sigma_D^2 = \frac{8}{n}.$$
On simplification, we obtain
$$\frac{1}{2(n-1)^2(n-3)}\sum_{|\{i,j,k,l\}|=4} (d_{ik} + d_{jl} - (d_{ij} + d_{kl}))^2 = 1,$$
proving (3.37). The verification of (3.38) is direct.
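Before turning to the $L^1$ bounds, the linearity condition (3.25) for this Stein pair can be checked numerically. The following sketch is my own illustration; it works directly with the centered array of (3.7), since rescaling by $\sigma_E$ does not affect the linear identity. It averages the difference (3.24) over all ordered pairs $(I, J)$ for a fixed $\pi$ and compares the result with $4W/n$.

```python
import itertools
import numpy as np

rng = np.random.default_rng(1)
n = 8                                        # any even n >= 4 works here

# a symmetric array with zero diagonal, as in (3.5), then centered as in (3.7)
A = rng.normal(size=(n, n)); E = A + A.T; np.fill_diagonal(E, 0.0)
ei, epp = E.sum(axis=1), E.sum()
D = E - ei[:, None]/(n - 2) - ei[None, :]/(n - 2) + epp/((n - 1)*(n - 2))
np.fill_diagonal(D, 0.0)                     # D now has zero row sums and zero diagonal

# a fixed point free involution pi: pair up a random shuffle of 0,...,n-1
idx = rng.permutation(n)
pi = np.empty(n, dtype=int)
for a, b in zip(idx[0::2], idx[1::2]):
    pi[a], pi[b] = b, a

W = sum(D[i, pi[i]] for i in range(n))

# E(W - W' | pi): average the difference (3.24) over all ordered pairs (I, J)
diffs = [2*(D[i, pi[i]] + D[j, pi[j]] - D[i, j] - D[pi[i], pi[j]])
         for i, j in itertools.permutations(range(n), 2)]
print(np.mean(diffs), 4*W/n)                 # the two values agree, as in (3.25)
```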
Then with D as in (3.12), W = P n i=1 d i(i) satises jjF W jj 1 D n 224 + 1344 1 n + 384 1 n 2 when n 9. In particular, jjF W jj 1 379 D n when n 9. We will need the following inequalities in order to prove Theorem 3.4.1. To avoid writing down the indices over which we are summing, the summation will be taken over the same index set as the one immediately preceding it, unless otherwise specied . With p() as in (3.33), in what follows, we will apply bounds such as X jfi;j;k;lgj=4 jd ik jp(i;j;k;l) = c n X jd ik j[d ik +d jl (d ij +d kl )] 2 c n X i;j;k;l jd ik j[d ik +d jl (d ij +d kl )] 2 48 = c n X jd ik j(d 2 ik +d 2 jl +d 2 ij +d 2 kl ) 4c n n 2 D 4 D n when n 9: (3.40) The rst nontrivial equality above uses the special form of the term inside squares while the last inequality follows from from (3.38). Elaborating on the rst inequal- ity, whenever we encounter a cross term we always get a free index to sum over which gives us zero since d i+ = 08i. The second inequality uses the fact that for any choices 1 ; 2 ; 1 ; 2 2fi;j;k;lg with 1 6= 1 and 2 6= 2 , X i;j;k;l jd 1 1 jd 2 2 2 ( X jd ij j 3 ) 1 3 ( X jd kl j 3 ) 2 3 n 2 D : (3.41) Generally the exponent of n in such an inequality will be 2 less the number of indices over which we are summing up. For instance, if we are summing up over 5 indices the exponent of n will be 3 and so on. In particular, X jfi;j;k;l;sgj=5 jd s jp(i;j;k;l) 4n 3 c n D where 2fi;j;k;lg 4 D when n 9, using (3.38). (3.42) Also, X jfi;j;k;l;s;tgj=6 jd st jp(i;j;k;l) 4n 4 c n D 4n D when n 9: (3.43) 49 Theorem 3.4.2 ([20]). SupposeD = ((d ij )) is an array satisfying (3.14) and and y are as in Theorem 3.3.1, and z ;W y ;W z and W are as in Lemma 3.3.1. Then W;W y ;W z can be decomposed as W =S +T W y =S +T y W z =S +T z ; (3.44) where S = X i= 2I d i(i) ; T = X i2I d i(i) ; T y = X i2I d i y (i) and T z = X i2I d i z (i) ; (3.45) whereI is as in Lemma 3.3.2. Also, W has the W zero bias distribution and satises EjWW j 112 D n + 672 D n 2 + 192 D n 3 ; (3.46) when n 9. In view of (3.13) Theorem 3.4.2 implies Theorem 3.4.1. Proof. Lemma 3.3.1 guarantees that W has the W -zero biased distribution. Re- callingI =fI y ;J y ;K y ;L y ;(I y );(J y );(K y );(L y )g and that ; y and z agree onI c by Lemma 3.3.2, we obtain decomposition (3.44). 50 From W =UW y + (1U)W z and (3.44), we obtain EjWW j =EjUT y + (1U)T z Tj: (3.47) Using the fact thatE(U) = 1=2, and thatU is independent ofT y andT z , we obtain EjWW j 1 2 (EjT y j +EjT z j) +EjTj =EV where V =jT y j +jTj; (3.48) where the equality follows from the fact that y ; z , and therefore T y ;T z , are ex- changeable. Thus our goal is to bound the L 1 norms of T and T y and we proceed in a case by case basis, much along the lines of Section 6 in [20]. In summary, we group the ten cases in (3.34) into the following ve cases: R 1 = 1; R 1 = 0;R 2 = 1; R 1 = 2; R 1 = 0;R 2 = 2;R 1 = 0;R 2 = 0. Computation on R 1 = 1: The event R 1 = 1, which we indicate by 1 1 , can occur in four dierent ways, corresponding to the rst four cases in the denition of y in (3.34). With V as in (3.48), we can decompose 1 1 to yield V1 1 =V1 1;1 +V1 1;2 +V1 1;3 +V1 1;4 ; (3.49) 51 where 1 1;1 = 1((I y ) = K y ;(J y )6= L y ) and 1 1;m for m = 2; 3; 4 similarly corre- sponding to the other three cases in (3.34), in their respective order. 
On 1 1;1 , we haveI =fI y ;J y ;K y ;L y ;(J y );(L y )g and (I y ) =K y , yielding T1 1;1 = X i2I d i(i) 1 1;1 = 2(d I y K y +d J y (J y ) +d L y (L y ) )1 1;1 : (3.50) By Lemma 3.3.2 y has cycles (I y ;K y ); (J y ;L y ) and is an involution restricted to I, hence T y 1 1;1 = X i2I d i y (i) 1 1;1 = 2(d I y K y +d J y L y +d (J y )(L y ) )1 1;1 : (3.51) So, we obtain EjTj1 1;1 2(Ejd I y K yj1 1;1 +Ejd J y (J y ) j1 1;1 +Ejd L y (L y ) j1 1;1 ) (3.52) EjT y j1 1;1 2(Ejd I y ;K yj1 1;1 +Ejd J y ;L yj1 1;1 +Ejd (J y );(L y ) j1 1;1 ): (3.53) Because of the indicator in (3.52) and (3.53), we need to consider the joint distri- bution p 2 (i;j;k;l;s;t) =P ((I y ;J y ;K y ;L y ;(I y );(J y )) = (i;j;k;l;s;t)); which includes the images of I y ;J y under , say s and t respectively. 52 With c n as in Lemma 3.3.3, we claim p 2 () is given by p 2 (i;j;k;l;s;t) = 8 > > > > > > > > > > > > > < > > > > > > > > > > > > > : 1 n1 p(i;j;k;l) when s =j;t =i 0 when s =j;t6=i or s6=j;t =i 0 when s =i or t =j 0 when s =t 1 (n1)(n3) p(i;j;k;l) when t = 2fj;s;ig,s62fi;t;jg. (3.54) To justify (3.54), note rst thats =j if and only ift =i, for example,s =j implies t =(j) =(s) =((i)) =i, and therefore fs =jg =fs =j;t =ig =ft =ig: (3.55) Thus the second case of (3.54) has zero probability. The remaining trivial cases can be discarded using the fact that 2 n . Leaving these out, the rst probability is derived using (3.55), since the image of I y under is uniform overfI y g c and independent of (I y ;J y ;K y ;L y ), so in particular takes the value s = j with proba- bility 1=(n 1). In the last case it is easy to see that t = 2fj;s;ig and s62fi;t;jg are equivalent. The image of I y is uniform over all available n 1 choices, and, conditional on (I y )6= J y , the n 3 remaining choices for the image of J y fall in fI y ;(I y );J y g c uniformly. Next we bound each of the summands in (3.52) and (3.53) separately. First note that under 1 1;1 only the last form of p 2 (i;j;k;l;s;t) in (3.54) is relevant. In 53 particular, s6= t always, and since i;j;k;l are distinct, s = k implies s62fi;jg. Hence, for the rst summand in (3.52), we obtain Ejd I y K yj1 1;1 = X i;j;k;l;s;t jd ik jp 2 (i;j;k;l;s;t)1(s =k;t6=l) = X jfi;j;k;l;tgj=5 jd ik jp 2 (i;j;k;l;k;t) = n 4 (n 1)(n 3) X jfi;j;k;lgj=4 jd ik jp(i;j;k;l) 4 n(n 1) D using (3.40) 8 D n 2 when n 9: (3.56) Similarly, we can estimate the second summand in (3.52) as Ejd J y (J y ) j1 1;1 = X i;j;k;l;s;t jd jt jp 2 (i;j;k;l;s;t)1(s =k;t6=l) = X jfi;j;k;l;tgj=5 jd jt jp 2 (i;j;k;l;k;t) = 1 (n 1)(n 3) X jd jt jp(i;j;k;l) 4 (n 1)(n 3) D using (3.42), when n 9 8 D n 2 when n 9: (3.57) The last summand in (3.52) is Ejd L y (L y ) j1 1;1 . Using the fact that (I y ;J y ;K y ;L y ;(I y );(J y )) L = (K y ;L y ;I y ;J y ;(K y );(L y )); 54 (3.57) yields, Ejd L y (L y ) j1 1;1 =Ejd J y (J y ) j1 1;1 8 D n 2 when n 9: (3.58) Thus combining the bounds in (3.56),(3.57),(3.58) we obtain the following bound on the term (3.52), EjTj1 1;1 48 D n 2 when n 9: (3.59) Next, we note that the rst summand in (3.53) is the same as the rst summand of (3.52), and so can be bounded by (3.56). We can bound Ejd J y ;L yj1 1;1 , which is the second summand in (3.53) in a similar fashion as in (3.56) through the following calculation Ejd J y L yj1 1;1 = X jfi;j;k;l;tgj=5 jd jl jp 2 (i;j;k;l;k;t) = n 4 (n 1)(n 3) X jfi;j;k;lgj=4 jd jl jp(i;j;k;l) 8 D n 2 when n 9, using (3.40). (3.60) So we are only left with the last summand of (3.53) which is Ejd (J y )(L y ) j1 1;1 . 
For this we will need to introduce the joint distribution p 3 (i;j;k;l;s;t;r) =P ((I y ;J y ;K y ;L y ;(I y );(J y );(L y )) = (i;j;k;l;s;t;r)): (3.61) 55 The case 1 1;1 is equivalent to s =k;t6=l;r6=j, and, since 2 n it is equivalent to s =k andjfi;j;k;l;r;tgj = 6. We claim that p 3 (i;j;k;l;s;t;r) = 1 (n 1)(n 3)(n 5) p(i;j;k;l) (3.62) when s =k andjfi;j;k;l;r;tgj = 6. To justify (3.62), we note that since is independent offI y ;J y ;K y ;L y g, the image ofI y is uniform overn1 choices fromfI y g c and conditional on(I y ) =K y , then3 choices for(J y ) fall infI y ;J y ;K y g c uniformly. Conditioned on these two images, when (J y )6=L y , (L y ) is distributed uniformly over the n 5 choices of fI y ;J y ;K y ;L y ;(J y )g c . Now we can bound Ejd (J y )(L y ) j1 1;1 in the following way, Ejd (J y )(L y ) j1 1;1 = X i;j;k;l;s;t;r jd tr jp 3 (i;j;k;l;s;t;r)1(s =k;t6=l;r6=j) = X jfi;j;k;l;t;rgj=6 jd tr jp 3 (i;j;k;l;k;t;r) = 1 (n 1)(n 3)(n 5) X jd tr jp(i;j;k;l) 4n (n 1)(n 3)(n 5) D using (3.43) 16 D n 2 when n 9: (3.63) So, adding the bounds in (3.56),(3.60) and (3.63), and using (3.53) we obtain, EjT y j1 1;1 64 D n 2 when n 9: (3.64) 56 From (3.59) and (3.64), using the denition of V in (3.48), we obtain the following bound on the rst term of (3.49), EV1 1;1 112 D n 2 ; (3.65) when n 9. Next, on 1 1;2 , indicating the event (I y ) 6= K y ;(J y ) = L y , we have I = fI y ;J y ;K y ;L y ;(I y );(K y )g and hence, by denition (3.45), T1 1;2 = 2(d I y (I y ) +d K y (K y ) +d J y L y)1 1;2 (3.66) We further observe, (I y ;J y ;K y ;L y ;(I y );(J y );(K y );(L y )) L = (J y ;I y ;L y ;K y ;(J y );(I y );(L y );(K y )); (3.67) which because of (3.59) yields T1 1;2 L =T1 1;1 )EjTj1 1;2 =EjTj1 1;1 48 D n 2 for n 9 (3.68) Furthermore, the distributional equality in (3.67) implies T y 1 1;2 = 2(d I y K y +d J y L y +d (I y )(K y ) )1 1;2 L =T y 1 1;1 ; (3.69) 57 yielding EjT y j1 1;2 =EjT y j1 1;1 64 D n 2 when n 9: (3.70) Thus combining (3.68),(3.70) we obtain EV1 1;2 112 D n 2 ; (3.71) when n 9. Next on 1 1;3 , indicating (I y ) = L y ;(J y )6= K y , we haveI =fI y ;J y ;K y ;L y ; (J y );(K y )g and T1 1;3 = 2(d I y L y +d J y (J y ) +d K y (K y ) )1 1;3 ; (3.72) T y 1 1;3 = 2(d I y K y +d J y L y +d (J y )(K y ) )1 1;3 : (3.73) On 1 1;3 , we have s = l;t6= k which is equivalent to s = l andjfi;j;k;l;tgj = 5. Hence we may bound the rst summand in (3.72) as follows Ejd I y L yj1 1;3 = X i;j;k;l;s;t jd il jp 2 (i;j;k;l;s;t)1(s =l;t6=k) = X jfi;j;k;l;tgj=5 jd il jp 2 (i;j;k;l;l;t) = n 4 (n 1)(n 3) X jfi;j;k;lgj=4 jd il jp(i;j;k;l) 4 n(n 1) D using (3.40) 58 8 D n 2 when n 9. Continuing in this manner we arrive, as in (3.71), at EV1 1;3 112 D n 2 ; (3.74) when n 9. Symmetries between 1 1;1 and 1 1;2 such as (3.67) hold as well between 1 1;3 and 1 1;4 , yielding EV1 1;4 112 D n 2 ; (3.75) when n 9. Combining the bounds from (3.65), (3.71), (3.74) and (3.75), we obtain EV1 1 448 D n 2 : (3.76) when n 9. Computation on R 1 = 0;R 2 = 1: Now we need to make the following decom- position V1 2 =V1 2;1 +V1 2;2 ; 59 where 1 2 =1(R 1 = 0;R 2 = 1), 1 2;1 =1((I y ) =J y ;(K y )6=L y ), 1 2;2 =1((I y )6= J y ;(K y ) =L y ). 
On 1 2;1 we haveI =fI y ;J y ;K y ;L y ;(K y );(L y )g which gives T1 2;1 = 2(d I y J y +d K y (K y ) +d L y (L y ) )1 2;1 and T y 1 2;1 = 2(d I y K y +d J y L y +d (K y )(L y ) )1 2;1 : So, we obtain, EjTj1 2;1 2(Ejd I y J yj1 2;1 +Ejd K y (K y ) j1 2;1 +Ejd L y (L y ) j1 2;1 ); (3.77) EjT y j1 2;1 2(Ejd I y K yj1 2;1 +Ejd J y L yj1 2;1 +Ejd (K y )(L y ) j1 2;1 ): (3.78) Since p(i;j;k;l) =p(i;k;j;l) and is chosen independently offI y ;J y ;K y ;L y g, we have (I y ;K y ;J y ;L y ;(I y );(K y );(J y );(L y )) L = (I y ;J y ;K y ;L y ;(I y );(J y );(K y );(L y )): (3.79) Hence,we obtain T1 2;1 L =T1 1;1 ; which, by (3.59) gives EjTj1 2;1 =EjTj1 1;1 48 D n 2 when n 9: (3.80) 60 To begin boundingEjT y j1 2;1 , we bound the rst summand in (3.78),Ejd I y K yj1 2;1 , as follows Ejd I y K yj1 2;1 = X jfi;j;k;l;rgj=5 jd ik jP ((I y ;K y ;J y ;L y ;(I y );(K y )) = (i;k;j;l;j;r)) = X jd ik jP ((I y ;J y ;K y ;L y ;(I y );(J y )) = (i;k;j;l;j;r)) = X jd ik jp 2 (i;k;j;l;j;r) = n 4 (n 1)(n 3) X jfi;j;k;lgj=4 jd ik jp(i;j;k;l) 8 D n 2 when n 9, using (3.40): (3.81) Also using the distributional equality (I y ;J y ;K y ;L y ;(I y );(K y )) L = (J y ;I y ;L y ;K y ;(J y );(L y )) and the bound in (3.81) above, we obtain Ejd J y L yj1 2;1 =Ejd I y K yj1 2;1 8 D n 2 when n 9: (3.82) Using the distributional equality in (3.79) and the bound in (3.63) Ejd (K y )(L y ) j1 2;1 =Ejd (J y )(L y ) j1 1;1 16 D n 2 : (3.83) Combining (3.82),(3.83) and using (3.78) we obtain EjT y j1 2;1 64 D n 2 when n 9: (3.84) 61 Adding the two bounds in (3.80) and (3.84) we have EV1 2;1 112 D n 2 ; (3.85) when n 9. Next, on 1 2;2 , where (I y ) 6= J y ;(K y ) = L y , we haveI = fI y ;J y ;K y ;L y ; (I y );(J y )g and hence T1 2;2 = 2(d I y (I y ) +d J y (J y ) +d K y L y)1 2;2 (3.86) T y 1 2;2 = 2(d I y K y +d J y L y +d (I y )(J y ) )1 2;2 : (3.87) Noting the distributional equality (I y ;J y ;K y ;L y ;(I y );(J y );(K y );(L y )) L = (K y ;L y ;I y ;J y ;(K y );(L y );(I y );(J y )) we obtain T1 2;1 L =T1 2;2 and T y 1 2;1 L =T y 1 2;2 ; which yields EjTj1 2;2 =EjTj1 2;1 48 D n 2 and EjT y j1 2;2 =EjT y j1 2;1 64 D n 2 : 62 Hence we have EV1 2;2 112 D n 2 when n 9: (3.88) Combining (3.85) with (3.88) we obtain EV1 2 224 D n 2 ; (3.89) when n 9 Computation on R 1 = 2: Here we need the decomposition V1 3 =V1 3;1 +V1 3;2 ; where 1 3 = 1(R 1 = 2), 1 3;1 = 1((I y ) = K y ;(J y ) = L y ) and 1 3;2 = 1((I y ) = L y ;(J y ) = K y ). Note that 1 3;1 and 1 3;2 correspond to the cases in (3.34) where (R 1 ;R 2 ) = (2; 0) and (R 1 ;R 2 ) = (2; 2), respectively. On both1 3;1 and1 3;2 , we have I =fI y ;J y ;K y ;L y g and T y 1 3;1 =T1 3;1 = 2(d I y K y +d J y L y)1 3;1 since 1 3;1 = y 1 3;1 . (3.90) 63 From (I y ;J y ;K y ;L y ;(I y );(J y )) L = (J y ;I y ;L y ;K y ;(J y );(I y )), it is clear that Ejd I y K yj1 3;1 = Ejd J y L yj1 3;1 and hence it is enough to bound any one of the two summands in (3.90). We bound Ejd I y K yj1 3;1 using (3.40) as follows Ejd I y K yj1 3;1 = X i;j;k;l;s;t jd ik jp 2 (i;j;k;l;s;t)1(s =k;t =l) = X jfi;j;k;lgj=4 jd ik jp 2 (i;j;k;l;k;l) = 1 (n 1)(n 3) X jd ik jp(i;j;k;l) 8 D n 3 when n 9: (3.91) Thus, using (3.90) and (3.91),we obtain EV1 3;1 64 D n 3 ; (3.92) when n 9. On 1 3;2 , we have T1 3;2 = 2(d I y L y +d J y K y) and T y 1 3;2 = 2(d I y K y +d J y L y): (3.93) To obtain bounds for E(V1 3;2 ), we bound the rst summand in (3.93) as follows, Ejd I y L yj1 3;2 = X jfi;j;k;lgj=4 jd il jp 2 (i;j;k;l;l;k) = 1 (n 1)(n 3) X jd il jp(i;j;k;l) 8 D n 3 (3.94) 64 when n 9, using(3.40). 
Similarly its easy to obtain bounds on the other sum- mands also and conclude as in (3.92), EV1 3;2 64 D n 3 for n 9: (3.95) Combining (3.92) and (3.95), we obtain EV1 3 128 D n 3 ; (3.96) when n 9. Computation for R 1 = 0;R 2 = 2: This event is indicated by 1 4 = 1((I y ) = J y ;(K y ) =L y ). On 1 4 , we have T1 4 = 2(d I y J y +d K y L y)1 4 ; (3.97) T y 1 4 = 2(d I y K y +d J y L y)1 4 : (3.98) For the rst term in (3.97), Ejd I y J yj1 4 = X jfi;j;k;lgj=4 jd ij jP ((I y ;K y ;J y ;L y ;(I y );(K y ) = (i;k;j;l;j;l)) = X jd ij jp 2 (i;k;j;l;j;l) using (3.79) 1 (n 1)(n 3) X jd ij jp(i;j;k;l) 8 D n 3 when n 9: (3.99) 65 For the other summand in (3.97) and (3.98), we can follow similar calculations as in (3.99) above and obtain the same bounds. Thus we will nally obtain EjT1 4 j;EjT y 1 4 j 32 D n 3 when n 9: (3.100) So we have EV1 4 64 D n 3 ; (3.101) when n 9. Computation on R 1 = R 2 = 0: Now we bound the L 1 contribution from the last case in (3.34) denoted by 1 5 = 1(R 1 = R 2 = 0). Here we haveI = fI y ;J y ;K y ;L y ;(I y );(J y );(K y );(L y )g and T1 5 = 2 d I y (I y ) +d J y (J y ) +d K y (K y ) +d L y (L y ) (3.102) T y 1 5 = 2 d I y K y +d J y L y +d (I y )(K y ) +d (J y )(L y ) : (3.103) Since we are on 1 5 , we need to consider p 3 () as introduced in (3.61). On 1 5 , p 3 () is given by p 3 (i;j;k;l;s;t;r) = 1 (n 1)(n 3)(n 5) p(i;j;k;l); (3.104) 66 whenjfi;j;k;l;s;t;rgj = 7. The justication for (3.104) is essentially the same as that for (3.62). Using p 3 (), we begin to bound the summands in (3.102). Ejd I y (I y ) j1 5 = X jfi;j;k;l;s;t;rgj=7 jd is jp 3 (i;j;k;l;s;t;r) = (n 6)(n 5) (n 1)(n 3)(n 5) X jfi;j;k;l;sgj=5 jd is jp(i;j;k;l) 8 D n when n 9: (3.105) It is easy to see that I y ;J y ;K y ;L y have identical marginal distributions and since is chosen independently of these indices, we have Ejd N(N) j1 5 is constant over fI y ;J y ;K y ;L y g. Thus we obtain EjTj1 5 64 D n when n 9: (3.106) Now bounding the L 1 norm of the rst summand in (3.103), Ejd I y K yj1 5 = X jfi;j;k;l;s;t;rgj=7 jd ik jp 3 (i;j;k;l;s;t;r) = (n 4)(n 5)(n 6) (n 1)(n 3)(n 5) X jfi;j;k;lgj=4 jd ik jp(i;j;k;l) 4 D n when n 9, using (3.40): (3.107) Now consider the last summand of (3.103), Ejd (J y )(L y ) j1 5 = X jfi;j;k;l;s;t;rgj=7 jd tr jp 3 (i;j;k;l;s;t;r) 67 n 6 (n 1)(n 3)(n 5) X jfi;j;k;l;t;rgj=6 jd tr jp(i;j;k;l) 8 D n when n 9, using (3.43): (3.108) Since (I y ;J y ;K y ;L y ;(I y );(J y );(K y );(L y )) L = (J y ;I y ;L y ;K y ;(J y );(I y ); (L y );(K y )), we see that Ejd I y K yj1 5 =Ejd J y L yj1 5 and Ejd (I y )(K y ) j1 5 =Ejd (J y )(L y ) j1 5 : (3.109) So, using (3.107),(3.108) along with (3.109), we obtain EjT y j1 5 48 D n when n 9: (3.110) Combining (3.106) and (3.110), we obtain EV1 5 112 D n when n 9: (3.111) Combining the bounds from (3.76),(3.89),(3.96),(3.101) and (3.111), we obtain EjWW jEV 112 D n + 672 D n 2 + 192 D n 3 when n 9: (3.112) This completes the proof of Theorem 3.4.2. 68 3.5 L 1 Bounds In this section we will use Theorem 3.4.2 from the previous section to obtain L 1 bounds using arguments similar to those in [7]. It is worth noting that we can use L 1 along withL 1 bounds to obtainL p bounds for anyp 1 using (3.4). The main theorem of this section is the following Theorem 3.5.1. Suppose we have an nn array D = ((d ij )) satisfying d ij = d ji ;d ii = d i+ = 08 i;j and 2 D = 1. If W = P d i(i) where is an involution chosen uniformly at random from n , then for n 9 jjF W jj 1 K D n (3.113) Here K = 61; 702; 446 is a universal constant. 
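As a quick empirical companion to Theorem 3.5.1 (a sanity check of the qualitative behaviour only, not a verification of the constant), one can simulate $W = \sum_i d_{i\pi(i)}$ for $\pi$ a uniformly chosen fixed point free involution and estimate the Kolmogorov distance to the normal. The sketch below builds a symmetric array with zero diagonal and vanishing row sums by applying a centering of the form $d_{ij} - d_{i+}/(n-2) - d_{+j}/(n-2) + d_{++}/((n-1)(n-2))$, as in the transformation referred to around (3.7), and standardizes $W$ empirically instead of through (3.10). NumPy and SciPy are assumed available, and all function names are ours.

```python
import numpy as np
from scipy.stats import kstest

rng = np.random.default_rng(0)

def centered_symmetric_array(n):
    """Symmetric array with zero diagonal whose off-diagonal row sums vanish."""
    e = rng.standard_normal((n, n))
    e = (e + e.T) / 2.0
    np.fill_diagonal(e, 0.0)
    row = e.sum(axis=1)                      # d_{i+}
    tot = e.sum()                            # d_{++}
    d = (e - row[:, None] / (n - 2) - row[None, :] / (n - 2)
           + tot / ((n - 1) * (n - 2)))
    np.fill_diagonal(d, 0.0)
    return d

def random_fixed_point_free_involution(n):
    idx = rng.permutation(n)
    pi = np.empty(n, dtype=int)
    pi[idx[0::2]] = idx[1::2]
    pi[idx[1::2]] = idx[0::2]
    return pi

def empirical_kolmogorov_distance(n=50, reps=20000):
    d = centered_symmetric_array(n)
    w = np.array([d[np.arange(n), random_fixed_point_free_involution(n)].sum()
                  for _ in range(reps)])
    w = (w - w.mean()) / w.std()             # empirical standardization
    return kstest(w, "norm").statistic

print(empirical_kolmogorov_distance())       # small for moderate n
```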
Theorem 3.5.1 readily implies Theorem 3.2.1 by (3.4) and Theorem 3.4.1. Remark 3.5.1. We claim that to prove Theorem 3.5.1 it is enough to consider arrays with D =n 0 = 1=90, and nn 0 = 1000. To prove the claim, rst note that 1 3 D = ( X 1i;jn jd ij j 3 ) 1 3 n 1=3 ( X jd ij j 2 ) 1 2 =n 1=3 (n 1)(n 3) 2(n 2) 1 2 ; (3.114) which is greater than 1=2 for n 4. In (3.114), the rst inequality follows from H older's inequality and the next equality follows from the fact that 2 D = 1. So if 69 n < n 0 then D =n > 1=(8n 0 ). Since K maxf1= 0 ; 8n 0 g inequality (3.113) holds if D =n> 0 or n<n 0 . One useful inequality that will be used repeatedly in the proof is the following j k X i=1 a i j ! 3 X ja i j 3 k 2 X ja i j 3 : (3.115) The proof of Theorem 3.5.1 proceeds by rst proving several auxiliary lemmas. The rst lemma helps bound the error created when truncating the array as in [7], page 382. For the array D = ((d ij )) nn , we dene d 0 ij =d ij 1 jd ij j 1 2 : (3.116) Letting =f(i;j) :jd ij j> 1 2 g and i =fj : (i;j)2 g we have j i j 8 X j jd ij j 3 and hence (3.117) jj 8 D : (3.118) Inequality (3.118) has the following useful consequence, that is, P (Y D 06=Y D ) P X i 1 (i;(i)) 1 ! E( X i 1 (i;(i))) = jj n 1 < 2jj n 16 D n : (3.119) 70 Lemma 3.5.1. Suppose ((d ij )) nn satises the conditions in Theorem 3.5.1. Then with ((d 0 ij )) nn dened in (3.116), D 0 n and nn 0 , we have, jd 0 ++ j 4 D and jd 0 i+ j 4 X j jd ij j 3 4 D ; (3.120) and therefore j D 0j 8 D n ; j 2 D 0 1j 10 D n and D 0 22 D : Proof. Using d ++ = 0 and (3.118), we have jd 0 ++ j =j X (i;j):jd ij j1=2 d ij j =j X (i;j)2 d ij j X (i;j)2 jd ij jjj 2 3 D 1 3 4 D : (3.121) Similarly, as d i+ = 0 for all i2f1;:::;ng, we obtain using (3.117), jd 0 i+ j =j X j= 2 i d ij j =j X j2 i d ij jj i j 2=3 X j2 i jd ij j 3 ! 1=3 4 X j jd ij j 3 4 D :(3.122) Now, using (3.6) and (3.121), j D 0j = d 0 ++ n 1 4 D n 1 8 D n : To prove the last assertion, rst note that by (3.122) we have n X i=1 jd 0 i+ j 2 4 D n X i=1 jd 0 i+ j 16 D X i X j jd ij j 3 = 16 2 D : (3.123) 71 Similarly one can obtain n X i=1 jd 0 i+ j 3 64 3 D : (3.124) From (3.6) and the fact that 2 D = 1 we obtain the following, 2 D 0 1 = 2 (n 1)(n 3) (n 2) X 1i;jn (d 02 ij d 2 ij ) + 1 n 1 d 02 ++ 2 n X i=1 d 02 i+ ! 2 (n 1)(n 3) 0 @ (n 2) X (i;j)2 d 2 ij + 1 n 1 d 02 ++ + 2 n X i=1 d 02 i+ 1 A 2 (n 1)(n 3) 0 @ (n 2) X (i;j)2 d 2 ij + 16 2 D (n 1) + 32 2 D 1 A using(3.125) 8 D n 1 + 32 2 D (n 1) 2 (n 3) + 64 2 D (n 1)(n 3) (3.126) 10 D n < 1 9 since D =n 1=90 and n 1000. (3.127) In the above sequence of inequalities, (3.125) follows from (3.121) and (3.123) while (3.126) holds since X (i;j)2 d 2 ij 2 3 D jj 1 3 2 D : Hence, we obtainj 2 D 0 1j< 1=9. Thus using (3.7) with notation as in (3.12), and inequality (3.115), we obtain D 0 = 1 3 D 0 X i6=j d 0 ij d 0 i+ n 2 d 0 +j n 2 + d 0 ++ (n 1)(n 2) 3 72 16 1 3 D 0 X i6=j jd 0 ij j 3 + jd 0 i+ j 3 (n 2) 3 + jd 0 +j j 3 (n 2) 3 + jd 0 ++ j 3 ((n 1)(n 2)) 3 16 1 3 D 0 D + 128 3 D (n 2) 3 + 64n 2 3 D ((n 1)(n 2)) 3 22 D sincej 2 D 0 1j< 1=9; D =n 1=90; where in the second inequality we use (3.121) and (3.124). Lemma 3.5.2. Let D = ((d ij )) be an array as in Theorem 3.5.1 and D 0 = ((d 0 ij )) be as in (3.116). Let us denote e d ij = c d 0 ij D 0 : If D =n 0 and nn 0 , we havej e d ij j 1. Proof: Because 2 D 0 > 1=2 from Lemma 3.5.1, D 0 > 2=3. 
Using this inequality, denition (3.7) and (3.120) we obtain j e d ij j 1 D 0 jd 0 ij j + 1 n 2 (jd 0 i+ j +jd 0 +j j) + jd 0 ++ j (n 1)(n 2) 3 4 + 8 D 0 D n 2 + 4 D 0 D (n 1)(n 2) < 3 4 + 16 D n + 4 D n since n 1000 = 3 4 + 20 D n < 1 since D =n 1=90: For E = ((e ij )) nn let F E be the distribution function of Y E and E =jjF E jj 1 : (3.128) 73 For > 0 let M n ( ) be the set of nn matrices E = ((e ij )) satisfying 2 E = 1;e ij =e ji ;e ii =e i+ = 0 8i;j and E : Let us dene ( ;n) = sup E2Mn( ) E : (3.129) Also, we dene 1 ( ;n) = sup E2M 1 n ( ) E where M 1 n ( ) =fE2M n ( ) : sup i;j je ij j 1g: (3.130) Lemma 3.5.3. When nn 0 , with D and 1 ( ;n) dened in (3.128) and (3.130), for all > 0 supf D :D2M n ( ); D =n 0 g 1 (22 ;n) + 36 n : Proof: Let D = ((d ij ))2 M n ( ) with D =n 1=90. Hence, by (3.119) and Lemmas 3.5.2 and 3.5.1, D =jjF D jj 1 sup t jP (Y D 0t) (t)j +P (Y D 6=Y D 0) sup t jP (Y D 0t) (t)j + 16 D n 1 (22 D ;n) + sup t j( t D 0 D 0 ) (t)j + 16 D n : (3.131) 74 Since 1 ( ;n) is monotone in , 1 (22 D ;n) 1 (22 ;n) when D ; similarly, we bound the last term as 16 D =n 16 =n. So, we are only left with verifying that the second term is bounded by 20 =n. From Lemma 3.5.1, we obtainj 2 D 0 1j 1=9 yielding in particular D 0 2 [2=3; 4=3]. First consider the case wherejtj 8 D =n, and for a given t let t 1 = (t D 0)= D 0. Sincej D 0j 8 D =n by Lemma 3.5.1,t andt 1 will be on the same side of the origin. Next, it is easy to show that fora> 0 we havejt exp(at 2 =2)j 1= p a. Hence t exp 9 32 (t D 0) 2 (t D 0) exp 9 32 (t D 0) 2 +j D 0j 4 3 +j D 0j 4 3 (1 +j D 0j): (3.132) Since D 0 2=3 and Lemma 3.5.1 givesj 2 D 0 1j 10 D =n, we nd that j D 0 1j = j 2 D 0 1j D 0 + 1 10 D n : Now, by the mean value theorem , D 02 [2=3; 4=3], and Lemma 3.5.1 j(t 1 ) (t)j max ( t D 0 D 0 );(t) t D 0 D 0 t where = 0 75 1 p 2 max exp( 9 32 (t D 0) 2 ); exp( t 2 2 ) t(1 D 0) D 0 + 1 p 2 D 0 D 0 3 2 p 2 j D 0 1j max t exp( 9 32 (t D 0) 2 ) ; t exp( t 2 2 ) + 1 p 2 D 0 D 0 3 2 p 2 j D 0 1j( 4 3 (1 +j D 0j)) + 3 4 j D 0j using (3.132) 2 p 2 j D 0 1j(1 +j D 0j) + 3 4 j D 0j (3.133) 2 p 2 10 D n (1 + 8 D n ) + 6 D n 17 D n since D =n 1=90: Whenjtj< 8 D =n, the bound is easier. Since t 1 lies in the interval with boundary points 3(t D 0)=2 and 3(t D 0)=4, we have jt 1 j 3(jtj +j D 0j) 2 : (3.134) Now, usingjtj< 8 D =n,j D 0j< 8 D =n and (3.134), we obtain j(t 1 ) (t)j 1 p 2 jt 1 tj 1 p 2 (3jtj + 2j D 0j) 1 p 2 40 D n < 20 D n : 76 Thus we obtain sup t j( t D 0 D 0 ) (t)j 20 D n 20 n : In view of Lemma 3.5.3, to prove Theorem 3.5.1 it remains only to show 1 ( ;n) c =n for an explicit c which we eventually determine. Hence in the following cal- culations we consider only arrays D = ((d ij )) nn in M 1 n ( ), and so sup i;j jd ij j 1. We will need the following technical lemma. Lemma 3.5.4. IfD2M 1 n ( ) andB = ((b ij )) nlnl is the array formed by removing from D the l rows and l columns indexed byT f1; 2;:::;ng, where l is even and l =jTj 8, then for nn 0 and D =n 0 =50,j B j 8:07,j 2 B 1j :52, and B 50 D . Proof. To prove the rst claim, we note since d i+ = 0;jd ij j 1 and l 8, letting n =nl, jb i+ j =j X j= 2T d ij j =j X j2T d ij j 8 and hence jb ++ j =j X i= 2T b i+ j 8n: (3.135) This inequality impliesj B j =jb ++ j=(n 1) 8:07 proving the rst claim. Using 2 D = 1 and (3.6) we obtain 2 n 2 (n 1)(n 3) X 1i;jn d 2 ij = (n 2)(n 1)(n 3) (n 2)(n 1)(n 3) 2 [1; 1:01] (3.136) 77 when n 1000. 
Using the above bound, (3.6) and (3.135), we obtain 2 B 2 n 2 (n 1)(n 3) X 1i;jn d 2 ij (3.137) 2 (n 1)(n 3) 0 @ (n 2)j X fi= 2Tg\fj= 2Tg d 2 ij n X i;j=1 d 2 ij j + 1 n 1 b 2 ++ +2 X i= 2T b 2 i+ ! 2 (n 1)(n 3) 0 @ (n 2) X fi2Tg[fj2Tg d 2 ij + 64n 2 n 1 + 128n 1 A 32n 272(n 3) + 128n 2 (n 1) 2 (n 3) + 256n (n 1)(n 3) (3.138) :51 when n 1000; (3.139) where for (3.138), we usejTj 8, D n=(50 90) and H older's inequality to obtain X fi2Tg[fj2Tg d 2 ij X i2T n X j=1 d 2 ij + X j2T n X i=1 d 2 ij = 2 X i2T n X j=1 d 2 ij 2 8 2 3 D n 1 3 as P n j=1 d 2 ij ( P n j=1 jd ij j 3 ) 2 3 n 1 3 2 3 D n 1 3 andjTj 8 16 n 272 since D n=4500: From (3.139) and (3.136), we obtainj 2 B 1j:52 for n 1000. This proves the second claim of the lemma. 78 To prove the nal claim we observejb i+ j =j P j2T d ij j. Thus, by (3.115) and jTj 8, P i= 2T jb i+ j 3 = P i= 2T j P j2T d ij j 3 64 P i= 2T P j2T jd ij j 3 64 D and similarly jb ++ j 3 =j P i= 2T b i+ j 3 n 2 P i= 2T jb i+ j 3 64n 2 D : (3.140) These observations, along with the fact that 2 B :48, yield B = P fi= 2Tg\fj= 2Tg j b b ij j 3 3 B 3:1 X fi= 2Tg\fj= 2Tg j b b ij j 3 = 3:1 X fi= 2Tg\fj= 2Tg b ij b i+ (n 2) b +j (n 2) + b ++ (n 1)(n 2) 3 3:1 4 2 X jb ij j 3 + 2n X i= 2T jb i+ j 3 (n 2) 3 +n 2 jb ++ j 3 ((n 1)(n 2)) 3 ! by (3.115) 49:6 D + 128 n (n 2) 3 D + 64 D n 4 ((n 1)(n 2)) 3 using (3.140) 50 D when n 1000, since l 8: This completes the proof. Lemma 3.5.5. Consider a nonnegative sequencefa n ;n2Zg such that a n = 0 for all n 0 and a n c + max l2f4;6;8g a nl for all n 1; (3.141) 79 where c 0 and 2 (0; 1=3). Then, with b = maxfc;a 1 (1 3)g, for all n 1 a n b 1 3 :for all n. Proof. Letting b n ;n 1 be given by the recursion b n+1 = 3b n +b for n 1, with b 1 =a 1 , explicitly solving yields for n> 1 b n =c 1 (3) n +c 2 where c 1 = a 1 (13)b 3(13) and c 2 = b 13 : Sincec 1 0 and 3< 1,b n is increasing and bounded above byc 2 . Hence it suces to show a n b n . Since a n is nonnegative, (3.141) implies that a n c + X l2f4;6;8g a nl for all n 1: (3.142) We show a m b m for all 1mn by induction. When n 4, since a nl = 0 for l2f4; 6; 8g we have a n cbb n : 80 Now supposing the claim is true for some n 4, by (3.142) and the monotonicity of b n ;n 1, we obtain a n+1 c + X l2f4;6;8g a n+1l b + X l2f4;6;8g b n+1l b + 3b n3 b + 3b n =b n+1 : This completes the proof. Lemma 3.5.6. With 1 ( ;n) dened as in (3.130), 1 ( ;n) 2804655 n when nn 0 = 1000: Proof. We will consider a smoothed family of indicator functions indexed by> 0, namely h z; (x) = 8 > > > > > < > > > > > : 1 if xz 1 + (zx)= if x2 (z;z +] 0 if x>z +: (3.143) Also, dene h z;0 (x) = 1 (1;z] (x): Let f z; denote the solution to the following Stein equation f 0 (x)xf(x) =h z; (h z; ); (3.144) 81 where (h z; ) = E(h z; (Z)) and Z N (0; 1). We will need the following key inequality about f z; from [7] that holds for any > 0, jf 0 z; (x +y)f 0 z; (x)jjyj 1 + 2jxj + 1 Z 1 0 1 [z;z+] (x +ry)dr : (3.145) We considerD2M 1 n ( ) with D =n 0 =51 andn 1000. Let W = P n i=1 d i(i) as before. 
Using the fact that h z;0 h z; h z+;0 ; and recalling the denition of D in (3.128), we obtain D sup z jE(h z; (W )) (h z; )j += p 2: (3.146) From (3.144), (2.4) and Var(W ) = 1, we obtain sup z jE(h z; (W ) (h z; )j = sup z jE(f 0 z; (W ))E(Wf z; (W ))j = sup z jE(f 0 z; (W ))E(f 0 z; (W ))j sup z Ejf 0 z; (W )f 0 z; (W )j: (3.147) From (3.146), (3.147) and (3.145), we obtain the following D sup z Ejf 0 z; (W )f 0 z; (W )j += p 2 sup z E jW Wj 1 + 2jWj + 1 Z 1 0 1 [z;z+] (W +r(W W ))dr 82 += p 2 = EjW Wj + 2E(jWjjW Wj) + sup z 1 E jW Wj Z 1 0 1 [z;z+] (W +r(W W ))dr += p 2 := A 1 +A 2 +A 3 += p 2 say: (3.148) First, A 1 is bounded using Theorem 3.4.2, as follows A 1 = EjW Wj 112 D n + 672 D n 2 + 192 D n 3 113 D n since nn 0 = 1000: (3.149) Next we bound A 2 . From Theorem 3.4.2, we obtain, W W = (UW y + (1U)W z )W = (U(S +T y ) + (1U)(S +T z )) (S +T ) = UT y + (1U)T z T: (3.150) Let I = (I y ;J y ;K y ;L y ;(I y );(J y );(K y );(L y )) andI be as dened in Lemma 3.3.2. Thus, by (3.45), the right hand side of (3.150), and hence W W , is measurable with respect to I =fI;Ug. Furthermore, since sup i;j jd ij j 1 and jIj 8, we have jWj =jS +TjjSj +jTj =jSj +j X i2I d i(i) jjSj + X i2I jd i(i) jjSj + 8: 83 Now, by the denition of A 2 , and that U is independent of S and I, A 2 = 2E (jWjjW Wj) = 2E (jW WjE(jWjjI )) 2E (jW WjE(jSj + 8jI )) 2E jW Wj p E(S 2 jI) + 16EjW Wj: (3.151) In the following, fori a realization ofI, letjij denote the number of distinct elements of i. Since S = P i= 2I d i(i) and is chosen from n uniformly at random, condi- tioned onI =i,S has the same distribution as P i= 2i b i(i) , whereB = ((b ij )) njijnjij is the matrix formed by removing rows and columns of D that occur in i and is chosen uniformly from njij . Sincejij 8 and D =n 0 =51 < 0 =50, Lemma 3.5.4 yieldsj B j 8:07 and 2 B 1:52, and hence EjY B j 2 1:52 + 8:07 2 = 66:6449 when n 1000: (3.152) Using (3.152), we obtain E(S 2 jI =i) =EjY B j 2 67 for all i. Thus using (3.151) and (3.149), we obtain A 2 33EjW Wj 3729 D n : (3.153) 84 Finally, we are left with bounding A 3 . First we note that W +r(W W ) = rW + (1r)W = r(S +UT y + (1U)T z ) + (1r)(S +T ) = S +rUT y +r(1U)T z + (1r)T = S +g r where g r =rUT y +r(1U)T z + (1r)T: Now, from the denition of A 3 , again using that WW isI measurable, A 3 = sup z 1 E jWW j Z 1 0 1 [z;z+] (W +r(W W ))dr = sup z 1 E jWW jE( Z 1 0 1 [z;z+] (W +r(W W ))drjI ) = sup z 1 E jWW j Z 1 0 P (W +r(W W )2 [z;z +]jI )dr = sup z 1 E(jWW j Z 1 0 P (S +g r 2 [z;z +]jI )dr) = sup z 1 E(jWW j Z 1 0 P (S2 [zg r ;z +g r ]jI )dr) 1 E(jWW j Z 1 0 sup z P (S2 [zg r ;z +g r ]jI )dr) = 1 E(jWW j Z 1 0 sup z P (S2 [z;z +]jI )dr) (3.154) = 1 E(jWW j sup z P (S2 [z;z +]jI )) = 1 E(jWW j sup z P (S2 [z;z +]jI)); (3.155) where to obtain equality in (3.154) we have used the fact thatg r is measurable with respect toI for all r. 85 It remains only to bound P (S 2 [z;z + ]jI). In the following calculations e b ij = b b ij = B as before. Since D =n 0 =51, Lemma 3.5.4 yields B > 1=2. 
Hence, sup z P (S2 [z;z +]jI =i)) = sup z P ( X i= 2i b i(i) 2 [z;z +]) = sup z P ( X i= 2i b i(i) B 2 [ z B ; z + B ]) sup z P ( X i= 2i b i(i) B 2 [ z B ; z B + 2]) = sup z P ( X i= 2i b i(i) B 2 [z;z + 2]) = sup z P ( X i= 2i e b i(i) 2 [z;z + 2]) = sup z P (Y e B 2 [z;z + 2]): (3.156) The equality (3.156) holds as when computing b b ij we have that P i= 2i b i+ and P i= 2i b +(i) = P j= 2i b +j do not depend on : Recalling that the distribution function of Y e B is de- noted by F e B , we have, from the denition of ( ;n) in (3.129), P (Y e B 2 [z;z + 2]) jF e B (z + 2) (z + 2)j +jF e B (z) (z)j +j(z + 2) (z)j 2 e B + 2 p 2 : (3.157) Note that e B = B and by Lemma 3.5.4 e B njij = B njij 50 D njij 51 D n 0 : 86 Hence using the fact that e B2M njij ( B ) and applying Lemma 3.5.3, we obtain for n 1008 e B 1 (22 e B ) + 36 e B njij = 1 (22 B ;njij) + 36 B njij : (3.158) Inequalities (3.157) and (3.158) imply P (Y e B 2 [z;z + 2]) 2 e B + 2 p 2 2 1 (22 B ;njij) + 72 B njij + 2 p 2 2 1 (22 50 D ;njij) + 72 50 D njij + 2 p 2 2 max l2f4;6;8g 1 (k 1 D ;nl) + k 2 D n + 2 p 2 ; (3.159) where k 1 = 1100 and k 2 = 3630. As (3.159) does not depend on z or i, it bounds sup z P (S2 [z;z +]jI)) in (3.155). Now using (3.155), (3.159) and (3.149), we obtain A 3 1 (2 max l2f4;6;8g 1 (k 1 D ;nl) +k 2 D n + 2 p 2 )EjWW j 1 (2 max l2f4;6;8g 1 (k 1 D ;nl) +k 2 D n + 2 p 2 )113 D n : (3.160) Combining (3.149), (3.153), (3.160) and using (3.148) we obtain for n 1008, D 3842 D n + 1 (2 max l2f4;6;8g 1 (k 1 D ;nl) +k 2 D n + 2 p 2 )113 D n + p 2 3842 n + 113 n (2 max l2f4;6;8g 1 (k 1 D ;nl) +k 2 n + 2 p 2 ) + p 2 ; 87 since D for all D 2 M 1 n ( ). Setting = (113 8)k 1 =n, we obtain for n 1008, D c n + 1 4 max l2f4;6;8g 1 (k 1 D ;nl) k 1 ; (3.161) where c = 400; 665. Since c > 51= 0 , its clear that if we consider D2 M 1 n ( ), with D =n> 0 =51, then D <c =n. Hence (3.161) holds for all D2M 1 n ( ) with n 1008. Taking supremum over D2M 1 n ( ), we have, for n 1008, 1 ( ;n) c n + 1 4 max l2f4;6;8g 1 (k 1 ;nl) k 1 ; where c = 400; 665. Now multiplying by n= and taking supremum over we obtain sup n 1 ( ;n) c + 2 7 max l2f4;6;8g sup (nl) 1 ( ;nl) for all n 1008: (3.162) IfD2M 1 n ( ) andn 1000 then (3.114) shows that D 2, and henceD62M 1 n ( ) for all < 2. Since 1 ( ;n) 1 for all n2N, for 1000n 1008 we have sup n 1 ( ;n) = sup 2 n 1 ( ;n) 1008 sup 2 1 ( ;n) 504: Since c> 504, we conclude (3.162) holds for n 1000, that is sup n 1 ( ;n) c + 2 7 max l2f4;6;8g sup (nl) 1 ( ;nl) for all n 1000: (3.163) 88 Letting s n = sup n 1 ( ;n)= and a n = s n+999 for n 1, and a n = 0 for n 0, (3.163) gives a n c + 2 7 max l2f4;6;8g a nl for n 1. Using Lemma 3.5.5 with = 2=7,c = 400; 665 and noting thata 1 =n 1 ( ; 1000)= 500 since 2, we obtain a 1 (1 3) 500(1 6=7)<c which yields b = maxfc;a 1 (1 3)g =c; and therefore a n b 1 3 = c 1 3 = 2804655: This completes the proof of the lemma. Lemmas 3.5.6, 3.5.3 and Remark 3.5.1 yield jjF W jj 1 K D n with n 1000; where we can takeK = 22 2804655 + 36 = 61; 702; 446. Using Remark 3.5.1, this proves Theorem 3.5.1 and hence completes the proof of Theorem 3.2.1 as well. 89 3.6 Comment Regarding the Sharpness of the Bounds Here we brie y indicate that the order D =n in Theorem 3.5.1 can not be improved uniformly over all arraysD satisfying the conditions as in the theorem. For example, dene the symmetric arrayE given as follows where because of symmetry we dene the entries e ij for ji only. 
$$
e_{ij} =
\begin{cases}
0 & \text{if } i = j, \text{ or } i \text{ is odd and } j = i+1,\\
1 & \text{if } i \neq j \text{ and } i - j \text{ is even},\\
-1 & \text{otherwise.}
\end{cases}
$$
Clearly $e_{i+} = 0$ for all $1 \le i \le n$, and for $i = 2k-1$,
$$\sum_{j \ge i+1} e_{i+1,j}^2 = \sum_{j \ge i} e_{ij}^2 = n - 2k.$$
Using symmetry again, we obtain $\sum_{1\le i,j\le n} e_{ij}^2 = 2\sum_{i}\sum_{j\ge i} e_{ij}^2 = O(n^2)$, and using (3.10), we have $\sigma_E^2 = O(n)$. Also, since $|e_{ij}|\in\{0,1\}$, we have $\gamma_E = f_n/g_n$, where $f_n = \sum |e_{ij}|^3 = O(n^2)$ and $g_n = \sigma_E^3 = O(n^{3/2})$. Collecting all these facts together, we obtain $\gamma_E/n = O(n^{-1/2})$. Also, define $D$ by $d_{ij} = e_{ij}/\sigma_E$.

Now, along the lines of [21], fix $\epsilon \in (0,1)$ and define $t = (1-\epsilon)/\sigma_E$. Then we have
$$\Phi(t) - \Phi(0) \ge t\,\varphi(t) \ge \frac{1-\epsilon}{\sigma_E}\,\varphi\!\left(\frac{1}{\sigma_E}\right).$$
Using notation as in (3.1) and (3.9), we observe that $Y_E$ is integer valued, implying $F_W(0) = F_W(t)$, and hence
$$\|F_W - \Phi\|_\infty \ge \frac{1}{2}\,\frac{1-\epsilon}{\sigma_E}\,\varphi\!\left(\frac{1}{\sigma_E}\right).$$
Multiplying by $n^{1/2}$ on both sides yields
$$n^{1/2}\,\|F_W - \Phi\|_\infty \ge \frac{n^{1/2}}{2}\,\frac{1-\epsilon}{\sigma_E}\,\varphi\!\left(\frac{1}{\sigma_E}\right). \qquad (3.164)$$
Since $\sigma_E = O(n^{1/2})$, letting $n \to \infty$ and then taking $\epsilon \to 0$, we see that $\liminf_{n\to\infty} n^{1/2}\|F_W - \Phi\|_\infty$ is bounded away from zero. Since $\gamma_E/n = O(n^{-1/2})$, we conclude that $\gamma_E/n$ provides the correct rate of convergence.

Chapter 4
A Brief Overview of the Concentration of Measure Phenomenon

Concentration of measure is concerned with bounding quantities like $P(|Y - E(Y)| \ge r)$ for a random variable $Y$ as a function of $r$. This type of question is typically important when we expect that most of a random variable's mass is concentrated near its mean. In the following sections we will briefly go over some of the well known concentration of measure inequalities and also discuss an interesting application.

Probably the simplest concentration of measure bounds can be obtained by using Chebychev's inequality. For example, if $E(|Y|^p) < \infty$ for $p > 1$, then
$$P(|Y - E(Y)| \ge r) \le \frac{E|Y - E(Y)|^p}{r^p}.$$
In the next two chapters we will discuss some new cases where we obtained concentration inequalities using tools from Stein's method.

4.1 Some Well Known Concentration of Measure Inequalities

Although Chebychev's inequality is a rather general way of obtaining tail bounds and can be quite tight in certain cases, for some random variables this inequality does not yield the best possible concentration bounds. For example, if $Z \sim \mathcal{N}(0,1)$, then it can be shown that
$$\frac{x}{\sqrt{2\pi}\,(1+x^2)}\,e^{-x^2/2} \;<\; P(Z \ge x) \;<\; \frac{1}{\sqrt{2\pi}\,x}\,e^{-x^2/2}.$$
Thus the correct order of the tail bound is much better than the one given by Chebychev's inequality. Motivated by the example of a normal variate, it is often interesting to see if one can obtain bounds of the order $O(e^{-cx^2/2})$. This question is even more important as the standard Berry-Esseen type results do not yield decaying tail bounds. Tail bounds that exhibit a decay rate of $O(e^{-cx^2/2})$ are sometimes referred to as subgaussian bounds. In this chapter we will give a brief overview of some of the well known theorems which yield subgaussian tail bounds. A detailed treatment of this topic can be found in the texts [34] and [12].

While results using the Chebychev inequality are easy to obtain, sharper, subgaussian concentration of measure results for functions of independent random variates were obtained starting with the seminal Hoeffding-Chernoff bound, which states the following.

Theorem 4.1.1. Let $X_1, X_2, \ldots, X_n$ be independently distributed random variables satisfying $0 \le X_i \le 1$ almost surely. Then $S_n = \sum_{i=1}^n X_i$ satisfies the following inequality:
$$P(S_n - E(S_n) \ge t),\; P(S_n - E(S_n) \le -t) \;\le\; \exp\!\left(-\frac{2t^2}{n}\right).$$
The main tool in proving Theorem 4.1.1 is the use of moment generating functions. Clearly, the assumption of independence is quite crucial for this reason.
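The bound of Theorem 4.1.1 is easy to compare against simulation. The following sketch is our own illustration (the Bernoulli(1/2) summands and all names are our choices); it estimates the upper tail of $S_n - E(S_n)$ and prints it next to $\exp(-2t^2/n)$.

```python
import math
import numpy as np

rng = np.random.default_rng(1)

def hoeffding_demo(n=200, reps=100000):
    """Compare the empirical upper tail of S_n - E(S_n) with exp(-2 t^2 / n)."""
    # S_n is a sum of n independent Bernoulli(1/2) variables, so 0 <= X_i <= 1.
    s = rng.binomial(1, 0.5, size=(reps, n)).sum(axis=1)
    for t in (5, 10, 15, 20):
        empirical = np.mean(s - n / 2 >= t)
        bound = math.exp(-2 * t * t / n)
        print(f"t = {t:2d}   empirical = {empirical:.4f}   bound = {bound:.4f}")

hoeffding_demo()
```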
Theorem 4.1.1 was one of the earliest subgaussian concentration of measure results. A more general result was proved later, known as the Azuma-Hoeffding inequality. Suppose we have a probability space $(\Omega, \mathcal{F}, P)$ and consider a filtration
$$\{\emptyset, \Omega\} = \mathcal{F}_0 \subseteq \mathcal{F}_1 \subseteq \cdots \subseteq \mathcal{F}_n = \mathcal{F}.$$
Suppose $Y$ is an integrable random variable, measurable with respect to $\mathcal{F}$. Denote the martingale difference sequence by
$$d_i = E^{\mathcal{F}_i}(Y) - E^{\mathcal{F}_{i-1}}(Y) \quad \text{for } 1 \le i \le n.$$
Then the Azuma-Hoeffding inequality states the following.

Theorem 4.1.2. For every $x \ge 0$,
$$P(Y \ge E(Y) + x) \le e^{-x^2/(2D^2)},$$
where $D^2 \ge \sum_{i=1}^n \|d_i\|_\infty^2$.

Clearly, if in Theorem 4.1.2 we take $\mathcal{F}_i = \sigma(X_1, X_2, \ldots, X_i)$ for $1 \le i \le n$ for a collection of independent random variables $X_1, X_2, \ldots, X_n$, then we recover Theorem 4.1.1. On closer inspection, we see that if the $X_i$'s are identically distributed, then the tail bound is of the form $e^{-ct^2/n}$, where $c$ is a constant. Hence, if $t$ is fixed and $n \to \infty$, as is often the case, the bound is not too useful, although as a function of $t$ it still goes to zero exponentially fast.

Talagrand's inequality provides sharper bounds for the so called self bounding functions. Suppose the $X_i$'s are independent random variables taking values in $\Omega_i$, a probability space. Suppose a positive function $f : \prod_{i=1}^n \Omega_i \to \mathbb{R}^+$ satisfies the following two conditions:

1. If $x, y \in \prod_{i=1}^n \Omega_i$ are such that $x_i = y_i$ for all $i \neq k$ for some $1 \le k \le n$, then $|f(x) - f(y)| \le c$.

2. If $f(x) \ge s$, then there is a set $I \subseteq \{1, 2, \ldots, n\}$ with $|I| \le rs$ so that if $x' \in \prod_{i=1}^n \Omega_i$ and $x'_i = x_i$ for all $i \in I$, then $f(x') \ge s$.

Talagrand's theorem states that if $m$ is the median of $f(X)$, where $X$ stands for $X = (X_1, X_2, \ldots, X_n)$, then under the product probability measure $P$ defined on $\prod_{i=1}^n \Omega_i$,
$$P(f(X) \ge m + t) \le 2\exp\!\left(-\frac{t^2}{4c^2 r(m+t)}\right)
\quad \text{and} \quad
P(f(X) \le m - t) \le 2\exp\!\left(-\frac{t^2}{4c^2 r m}\right).$$
In several important examples, Talagrand's inequality yields sharp concentration bounds.

4.2 An Application of Concentration of Measure Inequalities

Concentration of measure inequalities have proved useful in several areas, including the analysis of randomized algorithms and random matrix theory. In this section we discuss one such application, the skip list, in detail. This and several other interesting examples are discussed in detail in [12]. We would also like to point out that in Chapters 5 and 6 we discuss some other scenarios where our results are applicable.

4.2.1 Skip List

A skip list is a data structure that allows efficient implementation of insertion, deletion and search of an element in a data set. A skip list uses the ordering of the data members crucially and hence works only on a totally ordered set. Suppose we have a totally ordered set of $n$ elements. We arrange the elements in ascending order and call it the list at level 0, or $L_0$. Also, for convenience, we add the element $-\infty$ in front of the list. Thus the list $L_0$ looks like
$$-\infty \to x_1 \to x_2 \to \cdots \to x_n.$$
Next, we form the list $L_1$ by skipping every odd element, or in other words, picking every second element. So $L_1$ looks like
$$-\infty \to x_2 \to x_4 \to \cdots \to x_m,$$
where $m = n$ if $n$ is even and $m = n-1$ if $n$ is odd. Continuing this process, at every step a new list is formed by skipping the odd indexed elements of the list in the previous step. Once the list with just one element, that is $-\infty$, is formed, the process stops. At this point we have $O(\log(n))$ lists. In the discussion that follows, we suppose that in total we form $t$ lists. Thus the final list is $L_t$, and $t$ is $O(\log(n))$.

Once the skip list is formed, searching for an element $q$ is quite simple. We start with the final list $L_t$ and find the maximum of the set of elements that are not greater than $q$.
Call this $e_t$. Then we go one step up, to $L_{t-1}$, and search to the right of $e_t$ to find the maximum element less than or equal to $q$. The ordering in the lists ensures that we always have to search only to the right of $e_t$. We thus obtain $e_{t-1}$. Continuing this way, and connecting the identical elements using vertical arrows, we obtain a list $e_t, e_{t-1}, e_{t-2}, \ldots, e_0$ represented by a zig-zag path of downturns and right turns as in Figure 4.1. The query $q$ is present in the original list if and only if $e_0 = q$. While this type of search operation is relatively inexpensive in a skip list, insertion of a new element and deletion of an existing one are more cumbersome. To mitigate this problem, a randomized version is often used in practice, and we discuss that version now.

[Figure 4.1: A typical search pattern in the skip list]

4.2.2 Randomized Skip List

In a randomized skip list, rather than skipping every odd element, elements are skipped using a probabilistic rule. If an element $x_i$ is present in list $L_k$, it is kept in list $L_{k+1}$ with probability $p$. Thus $H_i$, the highest level to which an element $x_i$ belongs, follows a geometric($p$) distribution. In other words,
$$P(H_i = k) = p^k(1-p).$$
To remove an element from the list, the element is first searched and then all the previous occurrences of that element are deleted. Suppose we are interested in knowing how expensive these search and delete operations can be. Clearly, the larger the maximum of the heights, the more expensive the operations become. If we construct the levels one on top of another with the highest level on top, then the search path follows only downturns and right turns as indicated in Figure 4.1. Every time an element $x$ at level $L_{k+1}$ is entered from the top, it means that element was retained while skipping elements from list $L_k$ to list $L_{k+1}$, a probability $p$ event. Every time an element is encountered from the left, it means that the element was dropped at that level, a probability $1-p$ event. If
$$H = \max_i H_i$$
denotes the maximum height of the elements, then the number of levels in the skip list is $H$, and using the right turn and downturn interpretation, we see that the number of arrows as in Figure 4.1 is bounded by a random variable following the negative binomial distribution with parameters $H$ and $p$, denoted by $NB(H,p)$. Recall that the $NB(k,p)$ variable denotes the number of trials required for the realisation of $k$ successes when the success probability is $p$; in the present case a downturn plays the role of a success, and the search path makes $H$ downturns in total, so the number of arrows is bounded by $NB(H,p)$. Thus, to analyze the performance of a randomized skip list, we need to see how large an $NB(H,p)$ random variable can be with high probability. In fact, we can show that this random variable is $O(\log(n))$ with very high probability. For simplicity, we consider the case $p = 1/2$ in what follows, and use the following two lemmas; the reader can see [12] for their proofs.

Lemma 4.2.1. For any $a > 0$,
$$P(H \ge a \log_{1/p}(n)) \le n^{-a+1}.$$
In particular, $H = O(\log(n))$ with large probability.

The following lemma connects the negative binomial distribution to the more familiar binomial distribution, enabling us to apply results like the Hoeffding-Chernoff bound, Theorem 4.1.1, directly.

Lemma 4.2.2.
For any k;m integers P (NB(k;p)m) =P (Bin(m;p)k): Equivalently, P (NB(k;p)>m) =P (Bin(m;p)<k): Conditioning on H yields for any k, xed P (NB(H;p)>m) = P (NB(H;p)>m;Hk) +P (NB(H;p)>mjH >k)P (H >k) 101 X ik P (NB(i;p)>m)P (H =i) +P (H >k) = X ik P (Bin(m;p)<i)P (H =i) +P (H >k) P (Bin(m;p)<k) +P (H >k) Let k = 2 log p (n);m = 8 log p (n). Using Theorem 4.1.1, we obtain P (Bin(m;p)mpkmp) = P (Bin(m;p) 4 log(n)2 log(n)) = exp 2 log 2 (n) m =n 1=4 : (4.1) Using Lemma 4.2.1, we have P (H > 2 log p (n)) 1 n : Thus with probability of at mostn 1=4 +n 1 , the search takes more thanO(log(n)) steps. 102 Chapter 5 Concentration of Measure Results Using Bounded Size Bias Couplings Though Stein's method has been used mostly for assessing the accuracy of distribu- tional approximations, recently related ideas have been used successfully in deriving concentration of measure inequalities. For example, Rai c obtained large deviation bounds for certain graph related statistics in [45] using the Cram er transform and Chatterjee [8] derived Gaussian and Poisson type tail bounds for Hoeding's com- binatorial CLT and the net magnetization in the Curie-Weiss model in statistical physics in [8]. While the rst paper employs the Stein equation, the later applies constructions which are related to the exchangeable pair in Stein's method (see [50]). In this chapter, we will prove a theorem that shows how bounded size bias couplings can be used to prove concentration of measure results. The techniques are similar in avour to the approach used in [8]. 103 5.1 The Main Result For a given nonnegative random variableY with nite nonzero mean, recall from (2.8) that Y s has the Y -size biased distribution if E[Yf(Y )] =E[f(Y s )] (5.1) for all functions f for which these expectations exist. The main result in [8] was proved using exchangeable pairs as in (2.5). Motivated by the complementary connections that exist between the exchangeable pair method and size biasing in Stein's method, we prove the following theorem which shows that parallel exists in the area of concentration of measures, and that size biasing can be used to derive one sided deviation results for nonnegative variables Y if it can be closely coupled to a variable Y s following the Y -size biased distribution. Our rst result requires the coupling to be bounded. Theorem 5.1.1. Let Y be a nonnegative random variable with mean and variance and 2 respectively, both nite and positive. Suppose there exists a coupling of Y to a variable Y s having the Y -size bias distribution which satisesjY s Yj C for some C > 0 with probability one. If Y s Y with probability one, then P Y t exp t 2 2A for all t> 0, where A =C= 2 . (5.2) 104 If the moment generating function m() =E(e Y ) is nite at = 2=C, then P Y t exp t 2 2(A +Bt) for all t> 0; (5.3) where A =C= 2 and B =C=2. The monotonicity hypothesis for inequality (5.2), that Y s Y , is natural since Y s is stochastically larger than Y . Therefore there always exists a coupling for whichY s Y . There is no guarantee, however, that for such a monotone coupling, the dierenceY s Y is bounded. In the following chapter, we deal with some cases where the size bias coupling is unbounded. For the niteness condition preceding (5.3), we note that the moment generating function is nite everywhere when Y is bounded. In typical examples the variable Y is indexed by n, and the ones we consider have the property that the ratio= 2 remains bounded asn!1, andC does not depend onn. 
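Before moving on to size bias couplings, the skip list analysis of Section 4.2.2 can be summarized with a short simulation. The sketch below is our own illustration, with $p = 1/2$ and the cost threshold $8\log_{1/p}(n)$ chosen to match the computation above; it draws the geometric heights $H_i$, forms $H = \max_i H_i$, and estimates how often an $NB(H,p)$ search cost exceeds that threshold. The observed frequency should be far below the $n^{-1/4} + n^{-1}$ bound derived above.

```python
import numpy as np

rng = np.random.default_rng(2)

def skip_list_cost_tail(n=10000, p=0.5, reps=2000):
    """Estimate P(NB(H, p) > 8 log_{1/p}(n)) with H = max_i H_i."""
    threshold = 8 * np.log(n) / np.log(1.0 / p)
    exceed = 0
    for _ in range(reps):
        # P(H_i = k) = p^k (1 - p): number of successful promotions of element i.
        heights = rng.geometric(1.0 - p, size=n) - 1
        H = int(heights.max())
        # NB(H, p): trials needed for H successes, i.e. failures plus H.
        cost = (rng.negative_binomial(H, p) + H) if H > 0 else 0
        exceed += cost > threshold
    return exceed / reps

print(skip_list_cost_tail())   # should be far below n**(-1/4) + 1/n (about 0.1 here)
```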
In such cases the bound in (5.2) decreases at rate exp(ct 2 ) for some c > 0, and if !1 as n!1, the bound in (5.3) is of similar order, asymptotically. Examples covered by Theorem 5.1.1 are given in Section 5.3, and include the number of relatively ordered subsequences of a random permutation, sliding window statistics including the number ofm-runs in a sequence of coin tosses, the number of local maximum of a random function on the lattice, the number of urns containing exactly one ball in the uniform urn allocation model, the volume covered by the 105 union ofn balls placed uniformly over a volumen subset ofR d , and the number of bulbs switched on at the terminal time in the so called lightbulb problem. As we already saw, a number of results in Stein's method for normal approxi- mation rest on the fact that if a variable Y of interest can be closely coupled to some related variable, then the distribution ofY is close to normal. An advantage, therefore, of the Stein method is that dependence can be handled in a direct man- ner, by the construction of couplings on the given collection of random variables related to Y . In [45] and [8], ideas related to Stein's method were used to obtain concentration of measure inequalities in the presence of dependence. Of the two, the technique used by Chatterjee in [8], based on Stein's exchangeable pair [50], is the one closer to the approach taken here. Recall, from (2.5) in Chapter 2 that we say Y;Y 0 is a -Stein pair if these variables are exchangeable and satisfy the linearity condition E(YY 0 jY ) =Y for some 2 (0; 1): (5.4) The -Stein pair is clearly the special case of the more general identity E(F (Y;Y 0 )jY ) =f(Y ) for some antisymmetric function F , 106 specialized to F (Y;Y 0 ) =YY 0 and f(y) =y. Chatterjee in [8] considers a pair of variables satisfying this more general identity, and, with (Y ) = 1 2 E((f(Y )f(Y 0 ))F (Y;Y 0 )jY ); obtains a concentration of measure inequality for Y under the assumption that (Y )Bf(Y ) +C for some constants B and C. For normal approximation, as seems to be the case here also, the areas in which pair couplings such as (5.4) apply, and those for which size bias coupling of Theorem 5.1.1 succeed, appear to be somewhat disjoint. In particular, (5.4) seems to be more suited to variables which arise with mean zero, while the size bias couplings work well for variables, such as counts, which are necessarily nonnegative. Indeed, for the problems we consider, there appears to be no natural way by which to nd exchangeable pairs satisfying the conditions of [8]. On the other hand, the size bias couplings applied here are easy to obtain. After proving Theorem 5.1.1 in Section 5.2, we move to the examples already mentioned. 107 5.2 Proof of the Main Result In the sequel we make use of the following inequality, which depends on the con- vexity of the exponential function; e y e x yx = Z 1 0 e ty+(1t)x dt Z 1 0 (te y + (1t)e x )dt = e y +e x 2 for all x6=y. (5.5) We now move to the proof of Theorem 5.1.1. The proof uses the above inequality and relates the boundedness of the coupling to the limited growth of the mgf of Y . Proof. RecallY s is given on the same space asY , and has theY size biased distri- bution. 
By (5.5), for all 2R, sincejY s YjC, je Y s e Y j 1 2 j(Y s Y )j(e Y s +e Y ) Cjj 2 (e Y s +e Y ): (5.6) Recalling that if the moment generating function m() =E[e Y ] exists in an open interval containing then we may dierentiate under the expectation, we obtain m 0 () =E[Ye Y ] =E[e Y s ]: (5.7) To prove (5.2), let < 0 and note that since the coupling is monotone exp(Y s ) exp(Y ). Now (5.6) yields e Y e Y s Cjje Y : 108 Since Y 0 the moment generating function m() exists for all < 0, so taking expectation and rearranging yields Ee Y s (1Cjj)Ee Y = (1 +C)E(e Y ); and now, by (5.7), m 0 ()(1 +C)m() for all < 0: (5.8) To consider standardized deviations of Y , that is, deviations ofjYj=, let M() =Ee (Y)= =e = m(=): (5.9) Now rewriting (5.8) in terms of M(), we obtain for all < 0, M 0 () = (=)e = m(=) +e = m 0 (=)= (=)e = m(=) + (=)e = 1 + C m(=) = (= 2 )CM(): (5.10) Since M(0) = 1, by (5.10) logM() = Z 0 M 0 (s) M(s) ds Z 0 Cs 2 ds = C 2 2 2 ; 109 so exponentiation gives us M() exp C 2 2 2 when < 0: Hence for a xed t> 0, for all < 0, P Y t = P Y t = P e ( Y ) e t e t M() exp t + C 2 2 2 : (5.11) Substituting =t 2 =(C) into (5.11) completes the proof of (5.2). Moving on to the proof of (5.3), taking expectation in (5.6) with > 0, we obtain Ee Y s Ee Y C 2 Ee Y s +Ee Y ; so in particular, when 0<< 2=C, E[e Y s ] 1 +C=2 1C=2 E[e Y ]: (5.12) As m(2=C)<1, (5.7) applies and (5.12) yields m 0 () 1 +C=2 1C=2 m() for all 0<< 2=C: (5.13) 110 Now letting 2 (0; 2=C), from (5.9), M() is dierentiable for all < 2=C and (5.13) yields, M 0 () = (=)e = m(=) +e = m 0 (=)= (=)e = m(=) + (=)e = 1 +C=(2) 1C=(2) m(=) = (=)e = m(=) 1 +C=(2) 1C=(2) 1 = (= 2 ) C 1C=(2) M(): Dividing by M() we may rewrite the inequality as d d logM() (= 2 ) C 1C=(2) : Noting that M(0) = 1, setting A =C= 2 andB =C=(2), integrating we obtain logM() = Z 0 d ds logM(s)ds (= 2 ) Z 0 Cs 1B ds = (= 2 ) C 2 2(1B) = A 2 2(1B) : (5.14) Hence, for t> 0, P Y t = P ( Y )t = P e ( Y ) e t e t M() e t exp A 2 2(1B) : (5.15) 111 Note that =t=(A +Bt) lies in (0; 2=C) for all t> 0. If we substitute this value for that yields the bound P Y t < exp t 2 2(A +Bt) for all t> 0, completing the proof. 112 5.3 Applications We now consider the application of Theorem 5.1.1 to derive concentration of mea- sure results for the number of relatively ordered subsequences of a random per- mutation, the number of m-runs in a sequence of coin tosses, the number of local extrema on a graph, the number of nonisolated balls in an urn allocation model, the covered volume in binomial coverage process, and the number of bulbs lit at the terminal time in the so called lightbulb process. Without further mention we will use the fact that when (5.2) and (5.3) hold for some A and B then they also hold when these values are replaced by any larger ones, which may also be denoted by A and B. 5.3.1 Relatively Ordered Sub-sequences of a Random Permutation For n m 3, let and be permutations ofV =f1;:::;ng andf1;:::;mg, respectively, and let V =f; + 1;:::; +m 1g for 2V, where addition of elements ofV is modulo n. We say the pattern appears at location 2V if the valuesf(v)g v2V andf(v)g v2V 1 are in the same relative order. Equivalently, the pattern appears at if and only if( 1 (v)+1);v2V 1 is an increasing sequence. When = m , the identity permutation of length m, we 113 say that has a rising sequence of length m at position . Rising sequences are studied in [5] in connection with card tricks and card shuing. 
Letting be chosen uniformly from all permutations off1;:::;ng, and X the indicator that appears at , X ((v);v2V ) = 1(( 1 (1) + 1)<<( 1 (m) + 1)); the sum Y = P 2V X counts the number of m-element-long segments of that have the same relative order as . For 2V we may generate X =fX ;2Vg with the X =fX ;2Vg distribution size biased in direction, following [22]. Let be the permutation of f1;:::;mg for which ( (1) + 1)<<( (m) + 1); and set (v) = 8 > > < > > : ( ((v + 1)) + 1); v2V (v) v62V : In other words is the permutation with the values (v);v2V reordered so that ( ) for 2V are in the same relative order as . Now let X =X ( (v);v2V ); 114 the indicator that appears at position in the reordered permutation . As and agree except perhaps for the m values inV , we have X =X ((v);v2V ) for alljjm. Hence, as jY Yj X jjm1 jX X j 2m 1: (5.16) we may take C = 2m 1 as the almost sure bound on the coupling of Y s and Y . Regarding the mean ofY , clearly for any, as all relative orders of(v);v2V are equally likely, EX = 1=m! and therefore =n=m!: (5.17) To compute the variance, letI k be the indicator that(1);:::;(mk) and(k + 1);:::;(m) are in the same relative order, for 0km 1. ClearlyI 0 = 1, and for rising sequences, as (j) =j, I k = 1 for all k. In general for 0km 1 we have X X +k = 0 if I k = 0, as the joint event in this case demands two dierent relative orders on the segment of of length mk of which both X and X +k are a function. If I k = 1 then a given, common, relative order is demanded for this same length of , and relative orders also for the two segments of length k 115 on which exactly one of X and X depend, and so, in total a relative order on mk + 2k =m +k values of , and therefore EX X +k =I k =(m +k)! and Cov(X ;X +k ) =I k =(m +k)! 1=(m!) 2 : As the relative orders of non-overlapping segments of are independent, now taking n 2m, the variance 2 of Y is given by 2 = X 2V Var(X ) + X 6= Cov(X ;X ) = X 2V Var(X ) + X 2V X :1jjm1 Cov(X ;X ) = X 2V Var(X ) + 2 X 2V m1 X k=1 Cov(X ;X +k ) = nVar(X 1 ) + 2n m1 X k=1 Cov(X 1 ;X 1+k ) = n 1 m! 1 (m!) 2 + 2n m1 X k=1 I k (m +k)! ( 1 m! ) 2 = n 1 m! 1 2m 1 m! + 2 m1 X k=1 I k (m +k)! ! : Clearly Var(Y ) is maximized for the identity permutation (k) =k;k = 1;:::;m, as I m = 1 for all 1 m m 1, and as mentioned, this case corresponds to counting the number of rising sequences. In contrast, the variance lower bound 2 n m! 1 2m 1 m! (5.18) 116 is attained at the permutation (j) = 8 > > > > > < > > > > > : 1 j = 1 j + 1 2jm 1 2 j =m which has I k = 0 for all 1km 1. In particular, the bound (5.3) of Theorem 5.1.1 holds with A = 2m 1 1 2m1 m! and B = 2m 1 2 q n m! 1 2m1 m! : 5.3.2 Local Dependence The following lemma shows how to construct a collection of variables X having the X distribution biased in direction whenX is some function of a subset of a collection of independent random variables. Lemma 5.3.1. LetfC g ;g2Vg be a collection of independent random variables, and for each 2V letV V and X =X (C g ;g2V ) be a nonnegative random variable with a nonzero, nite expectation. Then iffC g ;g2V g has distribution dF (c g ;g2V ) = X (c g ;g2V ) EX (C g ;g2V ) dF (c g ;g2V ) 117 and is independent offC g ;g2Vg, letting X =X (C g ;g2V \V ;C g ;g2V \V c ); the collection X =fX ;2Vg has the X distribution biased in direction . 
Furthermore, with I chosen proportional to EX , independent of the remaining variables, the sum Y s = X 2V X I has the Y size biased distribution, and when there exists M such that X M for all , jY s YjbM where b = max jf :V \V 6=;gj: (5.19) Proof. By independence, the random variables fC g ;g2V g[fC g ;g62V g have distribution dF (c g ;g2V )dF (c g ;g62V ): Thus, with X as given, we nd EX f(X) = Z x f(x)dF (c g ;g2V) = EX Z f(x) x dF (c g ;g2V ) EX (C g ;g2V ) dF (c g ;g62V ) = EX Z f(x)dF (c g ;g2V )dF (c g ;g62V ) = EX Ef(X ): 118 That is,X has theX distribution biased in direction, as in Denition 2.3.1. The claim on Y s follows from Proposition 2.3.1, and nally, since X = X whenever V \V =;, jY s Yj X :V \V I 6=; jX I X jbM: This completes the proof. 5.3.2.1 Slidingm Window Statistics For n m 1, letV =f1;:::;ng considered modulo n,fC g : g2Vg i.i.d. real valued random variables, and for each 2V set V =fv2V :v +m 1g: Then for X : R m ! [0; 1], say, Lemma 5.3.1 may be applied to the sum Y = P 2V X of the m-dependent sequence X = X(C ;:::;C +m1 ), formed by ap- plying the function X to the variables in the `m-window'V . As for all we have X 1 and max jf :V \V 6=;gj = 2m 1; we may take C = 2m 1 in Theorem 5.1.1, by Lemma 5.3.1. For a concrete example letY be the number ofm runs of the sequence 1 ; 2 ;:::; n ofn i.i.d Bernoulli(p) random variables withp2 (0; 1), given byY = P n i=1 X i where X i = i i+1 i+m1 , with the periodic convention n+k = k . In [46], the authors 119 develop smooth function bounds for normal approximation for the case of 2-runs. Note that the construction given in Lemma 5.3.1 for this case is monotone, as for any i, letting 0 j = 8 > > < > > : j j62fi;:::;i +m 1g 1 j2fi;:::;i +m 1g; the number of m runs off 0 j g n i=1 , that is Y s = P n i=1 0 i 0 i+1 0 i+m1 , is at least Y . For the mean of Y clearly =np m . For the variance, now letting n 2m and using the fact that non-overlapping segments of the sequence are independent, 2 = n X i=1 Var( i i+1 i+m1 ) + 2 X i<j Cov( i i+m1 ; j j+m1 ) = np m (1p m ) + 2 n X i=1 m1 X j=1 Cov( i i+m1 ; i+j i+j+m1 ): For the covariances, Cov( i i+m1 ; i+j i+j+m1 ) = E( i i+j1 i+j i+m1 i+j+m1 )p 2m = p m+j p 2m ; and therefore 2 = np m (1p m ) + 2 pp m 1p (m 1)p m = np m 1 + 2 pp m 1p (2m 1)p m : 120 Hence (5.2) and (5.3) of Theorem 5.1.1 hold with A = 2m 1 1 + 2 pp m 1p (2m 1)p m and B = 2m 1 2 r np m 1 + 2 pp m 1p (2m 1)p m : 5.3.2.2 Local Extrema on a Lattice Size biasing the number of local extrema on graphs, for the purpose of normal approximation, was studied in [1] and [22]. For a given graphG =fV;Eg, let G v =fV v ;E v g;v2V, be a collection of isomorphic subgraphs ofG such thatv2V v and for allv 1 ;v 2 2V the isomorphism fromG v 1 toG v 2 mapsv 1 tov 2 . LetfC g ;g2Vg be a collection of independent and identically distributed random variables, and let X v be dened by X v (C w ;w2V v ) = 1(C v >C w ;w2V v ); v2V: Then the sumY = P v2V X v counts the number local maxima. In general one may dene the neighbor distance d between two vertices v;w2V by d(v;w) = minfn : there9 v 0 ;:::;v n inV such that v 0 =v;v n =w and (v k ;v k+1 )2E for 0kng: Then for v2V and r = 0; 1;:::, V v (r) =fw2V :d(w;v)rg 121 is the set of vertices ofV at distance at most r from v. We suppose that the given isomorphic graphs are of this form, that is, that there is somer such thatV v =V v (r) for all v2V. 
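To make the local extrema setup concrete before the lattice example that follows, here is a short simulation sketch; it is ours rather than the thesis's, and it assumes NumPy and the choice V_v = V_v(1), so that each vertex is compared with its 2p immediate torus neighbours. Since the C_g are i.i.d. with a continuous distribution, any fixed vertex is the strict maximum of the 2p + 1 values in its neighbourhood with probability 1/(2p+1), which the simulation checks.

```python
import numpy as np

def local_maxima_count(vals):
    """Count sites of a p-dimensional periodic lattice whose value strictly
    exceeds the values at all 2p immediate neighbours."""
    is_max = np.ones(vals.shape, dtype=bool)
    for axis in range(vals.ndim):
        for shift in (1, -1):
            is_max &= vals > np.roll(vals, shift, axis=axis)
    return int(is_max.sum())

rng = np.random.default_rng(1)
n, p, reps = 20, 2, 2000
counts = [local_maxima_count(rng.random((n,) * p)) for _ in range(reps)]
num_vertices = n ** p
print(np.mean(counts), num_vertices / (2 * p + 1))   # each site is a local max with prob 1/(2p+1)
```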
Then if d(v 1 ;v 2 )> 2r, and (w 1 ;w 2 )2V v 1 V v 2 , rearranging 2r<d(v 1 ;v 2 )d(v 1 ;w 1 ) +d(w 1 ;w 2 ) +d(w 2 ;v 2 ) and using d(v i ;w i )r;i = 1; 2; yields d(w 1 ;w 2 )> 0. Hence, d(v 1 ;v 2 )> 2r implies V v 1 \ V v 2 =;; so by (5.19) we may take b = max v jV v (2r)j: (5.20) For example, for p2f1; 2;:::g and n 5 consider the latticeV =f1;:::;ng p modulo n inZ p andE =ffv;wg :d(v;w) = 1g; in this case d is the L 1 norm d(v;w) = p X i=1 jv i w i j: Considering the case where we call vertex v a local extreme value if the value C v exceeds the values C w over the immediate neighbors w of v, we take V v =V v (1) and that jV v (1)j = 1 + 2p; 122 the 1 accounting forv itself, and then 2p for the number of neighbors at distance 1 from v, which dier from v by either +1 or1 in exactly one coordinate. Lemma 5.3.1, (5.20), andjX v j 1 yield jY s Yj max v jV v (2)j = 1 + 2p + 2p + 4 p 2 = 2p 2 + 2p + 1; (5.21) Now letting C v have a continuous distribution, without loss of generality we can assume C v U[0; 1]. As any vertex has chance 1=jV v j of having the largest value in its neighborhood, for the mean of Y we have = n 2p + 1 : (5.22) To begin the calculation of the variance, note that when v and w are neighbors they cannot both be maxima, so X v X w = 0 and therefore, for d(v;w) = 1, Cov(X v ;X w ) =(EX v ) 2 = 1 (2p + 1) 2 : If the distance between v and w is 3 or more, X v and X w are functions of disjoint sets of independent variables, and hence are independent. Whend(w;v) = 2 there are two cases, asv andw may have either 1 or 2 neighbors in common, and EX v X w = 123 P ((U >U j ;V >V j ; 1jmk) \ (U >U j ;V >U j ;mk + 1jm)); where m is the number of vertices over which v and w are extreme, so m = 2p, and k = 1 and k = 2 for the number of neighbors in common. For k = 1; 2;:::, letting M k = maxfU mk+1 ;:::;U m g, as the variables X v and X w are conditionally independent given U mk+1 ;:::;U m E(X v X w jU mk+1 ;:::;U m ) = P (U >U j ;j = 1;:::;mjU mk+1 ;:::;U m ) 2 = 1 (mk + 1) 2 (1M mk+1 k ) 2 ; (5.23) as P (U >U j ;j = 1;:::;mjU mk+1 ;:::;U m ) = Z 1 M k Z u 0 Z u 0 du 1 du mk du = Z 1 M k u mk du = 1 mk + 1 (1M mk+1 k ): Since P (M k x) =x k on [0; 1], we have EM mk+1 k = k Z 1 0 x mk+1 x k1 dx = k m + 1 and E(M mk+1 k ) 2 = k Z 1 0 x 2(mk+1) x k1 dx = k 2mk + 2 : 124 Hence, averaging (5.23) over U mk+1 ;:::;U m yields EX v X w = 2 (m + 1)(2(m + 1)k) : For n 3, when m = 2p, for k = 1 and 2 we obtain Cov(X v ;X w ) = 1 (2p + 1) 2 (2(2p + 1) 1) and Cov(X v ;X w ) = 2 (2p + 1) 2 (2(2p + 1) 2) ; respectively. For n 5, of the 2p + 4 p 2 vertices w that are at distance 2 from v, 2p of them share 1 neighbor in common with v, while the remaining 4 p 2 of them share 2 neighbors. Hence, 2 = X v2V Var(X v ) + X v6=w Cov(X v ;X w ) = X v2V Var(X v ) + X d(v;w)=1 Cov(X v ;X w ) + X d(v;w)=2 Cov(X v ;X w ) = n 2p (2p + 1) 2 2p 1 (2p + 1) 2 + 2p 1 (2p + 1) 2 (2(2p + 1) 1) + 4 p 2 2 (2p + 1) 2 (2(2p + 1) 2) (5.24) = n 2p (2p + 1) 2 1 (2(2p + 1) 1) + 2(p 1) (2(2p + 1) 2) = n 4p 2 p 1 (2p + 1) 2 (4p + 1) : (5.25) 125 We conclude that (5.2) of Theorem 5.1.1 holds with A = C= 2 and B = C=2 with , 2 and C given by (5.22), (5.25) and (5.21), respectively, that is, A = (2p + 1)(4p + 1)(2p 2 + 2p + 1) 4p 2 p 1 and B = 2p 2 + 2p + 1 2 r n 4p 2 p1 (2p+1) 2 (4p+1) : 5.3.3 Urn Allocation In the classical urn allocation model n balls are thrown independently into one of m urns, where, for i = 1;:::;m, the probability a ball lands in the i th urn is p i , with P m i=1 p i = 1. 
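The urn model just described is also simple to simulate. The sketch below is ours (names and parameters are illustrative, not from the thesis); it draws the uniform case p_i = 1/m, counts the non-isolated balls Y, and compares the empirical mean with n(1 − (1 − 1/m)^{n−1}), the exact mean recorded below in (5.26), which follows since a given ball is isolated exactly when none of the other n − 1 balls lands in its urn.

```python
import numpy as np

def non_isolated_balls(n, m, rng):
    """Throw n balls uniformly into m urns and return the number of balls
    sharing their urn with at least one other ball."""
    urns = rng.integers(m, size=n)
    occupancy = np.bincount(urns, minlength=m)
    return int(np.sum(occupancy[urns] > 1))

rng = np.random.default_rng(2)
n, m, reps = 500, 300, 5000
sims = [non_isolated_balls(n, m, rng) for _ in range(reps)]
exact_mean = n * (1 - (1 - 1 / m) ** (n - 1))
print(np.mean(sims), exact_mean)
```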
A much studied quantity of interest is the number of nonempty urns, for which Kolmogorov distance bounds to the normal were obtained in [14] and [43]. In [14], bounds were obtained for the uniform case where p_i = 1/m for all i = 1,…,m, while the bounds in [43] hold for the nonuniform case as well. In [41] the author considers the normal approximation for the number of isolated balls, that is, the number of urns containing exactly one ball, and obtains Kolmogorov distance bounds to the normal. Using the coupling provided in [41], we derive right tail inequalities for the number of non-isolated balls, or, equivalently, left tail inequalities for the number of isolated balls.

For i = 1,…,n let X_i denote the location of ball i, that is, the number of the urn into which ball i lands. The number Y of non-isolated balls is given by

Y = Σ_{i=1}^n 1(M_i > 0) where M_i = −1 + Σ_{j=1}^n 1(X_j = X_i).

We first consider the uniform case. A construction in [41] produces a coupling of Y to Y^s, having the Y size biased distribution, which satisfies |Y^s − Y| ≤ 2. Given a realization of X = {X_1, X_2, …, X_n}, the coupling proceeds by first selecting a ball I, uniformly from {1,2,…,n}, and independently of X. Depending on the outcome of a Bernoulli variable B, whose distribution depends on the number of balls found in the urn containing I, a different ball J will be imported into the urn that contains ball I.

In some additional detail, let B be a Bernoulli variable with success probability P(B = 1) = π_{M_I}, where

π_k = (P(N > k | N > 0) − P(N > k)) / (P(N = k)(1 − k/(n−1))) if 0 ≤ k ≤ n − 2, and π_k = 0 if k = n − 1,

with N ∼ Bin(1/m, n − 1). Now let J be uniformly chosen from {1,2,…,n}\{I}, independent of all other variables. Lastly, if B = 1, move ball J into the same urn as I. It is clear that the number Y′ of non-isolated balls in the new configuration satisfies |Y′ − Y| ≤ 2, as at most the occupancy of two urns can be affected by the movement of a single ball. We also note that if M_I = 0, which happens when ball I is isolated, π_0 = 1, so that I becomes no longer isolated after relocating ball J. We refer the reader to [41] for a full proof that this procedure produces a coupling of Y to a variable with the Y size biased distribution.

For the uniform case, the following explicit formulas for μ and σ² can be found in Theorem II.1.1 of [31]:

μ = n(1 − (1 − 1/m)^{n−1})

and

σ² = (n − μ) + ((m−1)n(n−1)/m)(1 − 2/m)^{n−2} − (n − μ)²
   = n(1 − 1/m)^{n−1} + ((m−1)n(n−1)/m)(1 − 2/m)^{n−2} − n²(1 − 1/m)^{2n−2}.    (5.26)

Hence, with μ and σ² as in (5.26), we can apply (5.3) of Theorem 5.1.1 for Y, the number of non-isolated balls, with C = 2, A = 2μ/σ² and B = 1/σ.

Taking limits in (5.26), if m and n both go to infinity in such a way that n/m → α ∈ (0,∞), the mean μ and variance σ² obey

μ ∼ n(1 − e^{−α}) and σ² ∼ n g(α)², where g(α)² = e^{−α} − e^{−2α}(α² − α + 1) > 0 for all α ∈ (0,∞),    (5.27)

where for positive functions f and h depending on n we write f ∼ h when lim_{n→∞} f/h = 1. Hence, in this limiting case A and B satisfy

A ∼ 2(1 − e^{−α}) / (e^{−α} − e^{−2α}(α² − α + 1)) and B ∼ 1/(√n g(α)).

In the nonuniform case similar results hold with some additional conditions. Letting ||p|| = sup_{1≤i≤m} p_i and θ = θ(n) = max(n||p||, 1), in [41] it is shown that when ||p|| ≤ 1/11 and n ≥ 83θ²(1 + 3θ + 3θ²)e^{1.05θ}, there exists a coupling such that

|Y^s − Y| ≤ 3 and σ² ≥ μ/(8165 θ² e^{2.1θ}).

Now also using Theorem 2.4 in [41] for a bound on σ², we find that (5.3) of Theorem 5.1.1 holds with

A = 24,495 θ² e^{2.1θ} and B = 1.5 √7776 e^{1.05θ} / (n √(Σ_{i=1}^m p_i²)).

5.3.4 An Application to Coverage Processes

We consider the following coverage process, and associated coupling, from [21].
Given a collection U = fU 1 ;U 2 ;:::;U n g of independent, uniformly distributed points in thed dimensional torus of volumen, that is, the cubeC n = [0;n 1=d ) d R d with periodic boundary conditions, let V denote the total volume of the union of the n balls of xed radius centered at these n points, and S the number of balls isolated at distance, that is, those points for which none of the othern 1 points lie within distance . The random variables V and S are of fundamental interest in stochastic geometry, see [26] and [40]. If n!1 and remains xed, both V and S satisfy a central limit theorem [26, 36, 42]. The L 1 distance of V , properly standardized, to the normal is studied in [9] using Stein's method. The quality of 129 the normal approximation to the distributions of bothV andS, in the Kolmogorov metric, is studied in [21] using Stein's method via size bias couplings. In more detail, for x2 C n and r > 0 let B r (x) denote the ball of radius r centered at x, and B i;r =B(U i ;r). The covered volume V and number of isolated balls S are given, respectively, by V = Volume( n [ i=1 B i; ) and S = n X i=1 1f(U n \B i; =fU i gg: (5.28) We will derive concentration of measure inequalities for V and S with the help of the bounded size biased couplings in [21]. Assume d 1 and n 4. Denote the mean and variance of V by V and 2 V , respectively, and likewise for S, leaving their dependence on n and implicit. Let d = d=2 =(1 +d=2), the volume of the unit sphere in R d , and for xed let = d d . For 0r 2 let! d (r) denote the volume of the union of two unit balls with centers r units apart. We have ! 1 (r) = 2 +r, and ! d (r) = d + d1 Z r 0 (1 (t=2) 2 ) (d1)=2 dt; for d 2. From [21], the means of V and S are given by V =n (1 (1=n) n ) and S =n(1=n) n1 ; (5.29) 130 and their variances by 2 V =n Z B 2 (0) 1 d ! d (jyj=) n n dy +n(n 2 d ) 1 2 n n n 2 (1=n) 2n ; (5.30) and 2 S = n(1=n) n1 (1 (1=n) n1 ) +(n 1) Z B 2 (0)nB(0) 1 d ! d (jyj=) n n2 dy +n(n 1) 1 2 d n 1 2 n n2 1 n 2n2 ! : (5.31) It is shown in [21], by using a coupling similar to the one brie y described for the urn allocation problem in Section 5.3.3, that one can construct V s with the V size bias distribution which satisesjV s Vj. Hence (5.2) of Theorem 5.1.1 holds for V with A V = V 2 V and B V = 2 V ; where V and 2 V are given in (5.29) and (5.30), respectively. Similarly, with Y = nS the number of non-isolated balls, it is shown that Y s with Y size bias distribution can be constructed so thatjY s Yj d + 1, where d denotes the maximum number of open unit balls in d dimensions that can be packed so 131 they all intersect an open unit ball in the origin, but are disjoint from each other. Hence (5.2) of Theorem 5.1.1 holds for Y with A Y = ( d + 1)(n S ) 2 S and B Y = d + 1 2 S : To see how the A V ;A Y and B V ;B Y behave as n!1, let J r;d () =d d Z r 0 exp( d ! d (t))t d1 dt; and dene g V () = d J 2;d () (2 d + 2 )e 2 and g S () = e (1 + (2 d 2) + 2 )e 2 + d (J 2;d ()J 1;d ()): Then, again from [21], lim n!1 n 1 V = lim n!1 (1n 1 S ) = 1e ; lim n!1 n 1 2 V = g V ()> 0; and lim n!1 n 1 2 S = g S ()> 0: Hence, B V and B Y tend to zero at rate n 1=2 , and lim n!1 A V = (1e ) g V () ; and lim n!1 A Y = ( d + 1)(1e ) g S () : 132 5.3.5 The Lightbulb Problem The following stochastic process, known informally as the `lightbulb process', arises in a pharmaceutical study of dermal patches, see [44]. Changing dermal receptors to lightbulbs allows for a more colorful description. Considern lightbulbs, each op- erated by a switch. 
At day zero, none of the bulbs are on. At day r, for r = 1,…,n, the positions of r of the n switches are selected uniformly at random to be changed, independent of the past. One is interested in studying the distribution of the number of lightbulbs which are switched on at the terminal time n. The process just described is Markovian, and is studied in some detail in [54]. In [25] the authors use Stein's method to derive a bound to the normal via a monotone, bounded size bias coupling. Borrowing this coupling here allows for the application of Theorem 5.1.1 to obtain concentration of measure inequalities for the lightbulb problem.

We begin with a more detailed description of the process. For r = 1,…,n, let {X_{rk}, k = 1,…,n} have distribution

P(X_{r1} = e_1, …, X_{rn} = e_n) = (n choose r)^{−1} for all e_k ∈ {0,1} with Σ_{k=1}^n e_k = r,

and let these collections of variables be independent over r. These `switch variables' X_{rk} indicate whether or not on day r bulb k had its status changed. With

Y_k = (Σ_{r=1}^n X_{rk}) mod 2

therefore indicating the status of bulb k at time n, the number of bulbs switched on at the terminal time is

Y = Σ_{k=1}^n Y_k.

From [44], the mean μ and variance σ² of Y are given by

μ = (n/2)(1 − ∏_{i=1}^n (1 − 2i/n))    (5.32)

and

σ² = (n/4)(1 − ∏_{i=1}^n (1 − 4i/n + 4i(i−1)/(n(n−1))))
   + (n²/4)(∏_{i=1}^n (1 − 4i/n + 4i(i−1)/(n(n−1))) − ∏_{i=1}^n (1 − 2i/n)²).    (5.33)

Note that when n is even μ = n/2 exactly, as the product in (5.32) is zero, containing the term i = n/2. By results in [44], in the odd case μ = (n/2)(1 + O(e^{−n})), and in both the even and odd cases σ² = (n/4)(1 + O(e^{−n})).

The following construction, given in [25] for the case where n is even, couples Y to a variable Y^s having the Y size bias distribution such that

Y ≤ Y^s ≤ Y + 2,    (5.34)

that is, the coupling is monotone, with difference bounded by 2. For every i ∈ {1,…,n} construct the collection of switch variables X^i = {X^i_{rk}: r,k = 1,…,n} from X = {X_{rk}: r,k = 1,…,n} as follows. If Y_i = 1, that is, if bulb i is on, let X^i = X. Otherwise, with J_i uniformly chosen from {j : X_{n/2,j} = 1 − X_{n/2,i}}, let

X^i_{rk} = X_{rk} if r ≠ n/2,
X^i_{rk} = X_{n/2,k} if r = n/2 and k ∉ {i, J_i},
X^i_{rk} = X_{n/2,J_i} if r = n/2 and k = i,
X^i_{rk} = X_{n/2,i} if r = n/2 and k = J_i,

and let Y^i = Σ_{k=1}^n Y^i_k where

Y^i_k = (Σ_{r=1}^n X^i_{rk}) mod 2.

Then, with I uniformly chosen from {1,…,n} and independent of all other variables, it is shown in [25] that the mixture Y^s = Y^I has the Y size biased distribution, essentially due to the fact that L(Y^i) = L(Y | Y_i = 1) for all i = 1,…,n.

It is not difficult to see that Y^s satisfies (5.34). If Y_I = 1 then X^I = X, and so in this case Y^s = Y. Otherwise Y_I = 0, and for the given I the collection X^I is constructed from X by interchanging the stage n/2, unequal, switch variables X_{n/2,I} and X_{n/2,J_I}. If Y_{J_I} = 1 then after the interchange Y^I_I = 1 and Y^I_{J_I} = 0, in which case Y^s = Y. If Y_{J_I} = 0 then after the interchange Y^I_I = 1 and Y^I_{J_I} = 1, yielding Y^s = Y + 2. We conclude that for the case n even C = 2, and (5.2) and (5.3) of Theorem 5.1.1 hold with

A = n/σ² and B = 1/σ,    (5.35)

where σ² is given by (5.33).

For the coupling in the odd case, n = 2m + 1 say, due to the parity issue, [25] considers a random variable V close to Y, constructed as follows. In all stages but stages m and m + 1, the switch variables which yield V are the same as those for Y. In stage m, however, with probability 1/2 one applies an additional switch variable, and in stage m + 1, with probability 1/2, one switch variable fewer; before completing the description of the odd case we pause for a short simulation sketch of the even case construction.
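The following Python sketch is our own illustration rather than the original formulation in [25]. It simulates the even-n lightbulb process and the stage n/2 interchange just described, checks the almost sure bracketing Y ≤ Y^s ≤ Y + 2 of (5.34), and compares the empirical mean of Y^s with E[Y²]/E[Y], which is what the size bias characterization (5.1) gives when f is the identity. Function names and parameter choices are assumptions made for the example.

```python
import numpy as np

def lightbulb_switches(n, rng):
    """Switch matrix X (n stages by n bulbs): in stage r exactly r
    uniformly chosen switches are toggled, independently over stages."""
    X = np.zeros((n, n), dtype=int)
    for r in range(1, n + 1):
        X[r - 1, rng.choice(n, size=r, replace=False)] = 1
    return X

def size_bias_pair(X, rng):
    """Return (Y, Y_s) for even n: if the uniformly chosen bulb I is off,
    swap its stage-n/2 switch value with that of a uniformly chosen bulb J
    whose stage-n/2 value is opposite, then recount the lit bulbs."""
    n = X.shape[0]
    status = X.sum(axis=0) % 2
    Y = int(status.sum())
    I = rng.integers(n)
    if status[I] == 1:
        return Y, Y
    half = n // 2 - 1                        # row index of stage n/2
    J = rng.choice(np.flatnonzero(X[half] == 1 - X[half, I]))
    Xs = X.copy()
    Xs[half, I], Xs[half, J] = X[half, J], X[half, I]
    return Y, int((Xs.sum(axis=0) % 2).sum())

rng = np.random.default_rng(3)
n, reps = 20, 4000
pairs = np.array([size_bias_pair(lightbulb_switches(n, rng), rng) for _ in range(reps)])
Y, Ys = pairs[:, 0], pairs[:, 1]
assert np.all((Ys >= Y) & (Ys <= Y + 2))          # the bracketing (5.34)
print(Ys.mean(), (Y ** 2).mean() / Y.mean())      # E[Y^s] vs E[Y^2]/E[Y]
```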
Returning to the odd case, in this way the switch variables in these two stages have the same, symmetric distribution and are close to the switch variables for Y. In particular, as at most two switch variables differ between the configurations for V and Y, we have |V − Y| ≤ 2. Helped by the symmetry, one may couple V to a variable V^s with the V size bias distribution as in the even case, obtaining V ≤ V^s ≤ V + 2. Hence (5.2) and (5.3) of Theorem 5.1.1 hold for V as in the even case with the values given in (5.35), where μ = n/2 and σ² = (n/4)(1 + O(e^{−n})). Since |V − Y| ≤ 2, by replacing t by t + 2/σ in the bounds for V one obtains bounds for the odd case Y.

Chapter 6

Concentration of Measure Results for Unbounded Couplings

One of the main drawbacks of the concentration of measure results of Chapter 5 is that they are applicable only to bounded size bias couplings. In this chapter, we show how similar techniques can be applied, on a somewhat case by case basis, to some problems where the boundedness condition is violated. In particular, we derive a concentration of measure inequality for the number of isolated vertices in the Erdős-Rényi random graph model, and we also derive concentration results for examples drawn from the infinitely divisible distributions. In Section 6.1, we describe the setup and state the main result, Theorem 6.1.1, concerning the number of isolated vertices in the Erdős-Rényi random graph model. In Section 6.2, we give the detailed proof of Theorem 6.1.1. In Section 6.3, we deal with some other examples where the size bias coupling is unbounded, including the case of infinitely divisible and compound Poisson distributions.

6.1 A Concentration of Measure Result for Isolated Vertices

For some n ∈ {1,2,…} and p ∈ (0,1) let K be the Erdős-Rényi random graph on the vertices V = {1,2,…,n} with edge success probability p, that is, with edge indicators

X_{uv} = 1(u,v ∈ V: uv is an edge in K),

independent random variables with the Bernoulli(p) distribution for all u ≠ v. We set X_{vv} = 0 for all v ∈ V. Recall that the degree of a vertex v ∈ V, denoted by d(v), is the number of edges incident on v. Hence,

d(v) = Σ_{w∈V} X_{vw}.

Many authors have studied the distribution of

Y = Σ_{v∈V} 1(d(v) = d),    (6.1)

counting the number of vertices v of K having some fixed degree d. We derive upper bounds, for fixed n and p, on the tail probabilities of the number of isolated vertices of K, that is, for Y in (6.1) for the case d = 0, which counts the number of vertices having no incident edges. For d in general, and p depending on n, the asymptotic normality of Y was shown previously in [30] when n^{(d+1)/d} p → ∞ and np → 0, or when np → ∞ and np − log n − d log log n → −∞; see also [39] and [6]. For the case d = 0 of isolated vertices, [2] and [4] show that Y is asymptotically normal if and only if n²p → ∞ and
His results yield that with W = (Y)= P (Wt) 1 (t) e t 3 (t)=6 (1 +Q(t)(t)) for all t 0, (6.3) where (t) denotes the distribution function of a standard normal variate, Q(t) = 12 p 2 + 23 2 t + 11 p 2 2 t 2 ; 139 and (t) = n 6 3 (13 + 43np + 27(np) 2 ) exp (8 + 4np)t + 2np(e t= 1) : Still from [45], when np!c as n!1, (6.3) holds for all n suciently large with (t) = C 1 p n exp C 2 t p n +C 3 (e C 4 t= p n 1) (6.4) for some unspecied constantsC 1 ;C 2 ;C 3 andC 4 depending only onc. Fort of order p n, for instance, the function (t) will be of order 1= p n as n!1, allowing an asymptotic approximation of the deviation probability P (W t) by the normal, to within some factors. In Theorem 6.1.1 we supply a bound that likewise holds for all n, and that also gives somewhat more explicit information on the rate of tail decay. In particular, we see from (6.7) that the standardized variable W has a left tail that is bounded above by expft 2 2 =(4)g. Moreover, the right tail also exhibits similar bounds over some parameter regions, with a worst case bound there of order expftg by (6.8) for some > 0. Theorem 6.1.1. For n2f2; 3;:::g and p2 (0; 1) let K denote the random graph on n vertices where each edge is present with probability p, independently of all other edges, and let Y denote the number of isolated vertices in K, having mean 140 and variance 2 , as given in (6.2). Let M() =E exp((Y)=) be the moment generating function of the standardized Y variable. Then, letting s = e s (pe s + 1p) n2 (npe s + 1p) + (n 1)p + 1 and H() = 2 2 Z 0 s s= ds (6.5) we have M() expH() for all 0, and for all t> 0, P Y t inf 0 exp(t +H()): (6.6) For all 0 we have M() exp( 2 = 2 ), and for all t> 0, P Y t exp t 2 2 4 : (6.7) Though integration shows that we may explicitly write H() = 2 2 np((n 1)p + 1) 2 + 2 2 2np 2 pe = (n) +(1p) p e = 1 + 1 n1 2np ! ; the integral formula for H() in the theorem appears simpler to handle. Useful bounds for the minimization in (6.6) may be obtained by restricting to 2 [0; 0 ] for some 0 . In this case, as s= is an increasing function of s, we have H() 4 2 0 = 2 for 2 [0; 0 ]. 141 The quadratict + 0 = 2 =(4 2 ) in is minimized at = 2t 2 =( 0 = ). When this value falls in [0; 0 ] we obtain the rst bound in (6.8), and setting = 0 yields the second, thus, P Y t 8 > > < > > : exp( t 2 2 0 = ) for t2 [0; 0 0 = =(2 2 )] exp( 0 t + 0 = 2 0 4 2 ) for t2 ( 0 0 = =(2 2 );1). (6.8) Inequality (6.8) and the boundedness of Y yields the following useful corollary. Corollary 6.1.1. For all c2 (0;1) there exists a positive constant k depending only on c such that when p2 (0; 1) and np!c2 (0;1) as n!1, P ((Y)=t) exp(kt 2 ) for all t 0 and all n 2. Proof. Since Y can be no more than n, and 2 increases at rate n when np! c, there exists a positive constant a 0 such that Y n a 0 p n: Hence P ((Y)=)t) = 0 for all ta 0 p n. For any given n let n = a 0 p n 2 =. Then, as s 2 for all s 0, we have ( n 0 = )=(2 2 )a 0 p n, so the rst bound in (6.8) applies for allta 0 p n. Note that n = =a 0 p n= converges to a positive constant, implying the convergence of 142 n= , and hence that of 2 =( n= ), also to a positive constant. Since 2 =( n= ) is positive for alln 2, we see that the claim of the corollary holds for all k in the nonempty interval (0; inf n 2 =( n= )). In the asymptotic of Corollary 6.1.1, for, sayt =a p n, the function(t) of (6.4) behaves likeC= p n, so the bound (6.3) also gives useful information for some range of positive values of a up to some upper limit. 
However as exp(t 3 (t)=6) behaves like exp(Cna 3 =6), when multiplied by 1 (t), of exponential order exp(a 2 n=2), the product tends to innity for all suciently large a, so the bound in (6.3) may explode before the right tail of W vanishes. The main tool used in proving Theorem 6.1.1 is size bias coupling, that is, the construction of Y and Y s on the same space where Y s has the Y -size biased distribution characterized by identity (5.1). In the previous chapter, size bias couplings were used to prove concentration of measure inequalities whenjY s Yj can be almost surely bounded by a constant independent of the problem size. Here, in contrast, we apply the coupling for the number of isolated vertices of K from [23], which violates the boundedness condition. Unlike the theorem used in [17] and [18], which can be applied to a wide variety of situations under a bounded coupling assumption, it seems that cases where the coupling is unbounded, such as the one we consider here, need application specic treatment, and cannot be handled by one single general result. 143 6.2 Proof of Theorem 6.1.1 For any graph with vertex setV, forv2V we letN(v) denote the set of neighbors of v, N(v) =fw2V :X vw = 1g; where X vw is the indicator that there exists an edge connecting vertices v and w. We now present the proof of Theorem 6.1.1. Proof. Following Proposition 2.3.1, we rst construct a coupling of Y s , having the Y -size bias distribution, toY . LetK be given, and letY be the number of isolated vertices in K. From (6.1) with d = 0 we see that Y is the sum of exchangeable indicators. Let V be uniformly chosen fromV, independent of the remaining vari- ables. If V is already isolated, do nothing and set K s =K. Otherwise, let K s be the graph obtained by deleting all the edges connected to V in K. By Proposition 2.3.1, the variableY s counting the number of isolated vertices ofK s has theY -size biased distribution. Since all edges incident to the chosen V are removed in order to form K s , any neighbor of V which had degree one thus becomes isolated, and V also becomes 144 isolated if it was not so earlier. As 1(d(w) = 0) is unchanged for allw62fVg[N(V ), we have Y s Y =d 1 (V ) + 1(d(V )6= 0) where (6.9) for any v2V we let d 1 (v) = P w2N(v) 1(d(w) = 1): In particular the coupling is monotone, that is, Y s Y . Further, since d 1 (V )d(V ), by (6.9) we have 0Y s Yd(V ) + 1: (6.10) Let 0. Using (5.5) and (6.10), we have E(e Y s e Y ) 2 E (Y s Y )(e Y s +e Y ) = 2 E e Y (Y s Y )(e (Y s Y ) + 1) 2 E e Y (d(V ) + 1)(e (d(V )+1) + 1) : (6.11) Clearly the number of isolated vertices Y is a nonincreasing function of the edge indicatorsX vw , whiled(V )+1 is a nondecreasing function of these same indicators. HenceY andd(V ) + 1 have negative correlations, that is, by the inequality of [27], E[f(Y )g(d(V ) + 1)]E[f(Y )]E[g(d(V ) + 1)] (6.12) 145 for any two nondecreasing real functions f and g. 
In particular, when f(x) = e x and g(x) =x(e x + 1) with x2 [0;1), by (6.11) and (6.12) we obtain E(e Y s e Y ) 2 Ee Y E (d(V ) + 1)(e (d(V )+1) + 1) = 2 Ee Y e E d(V )e d(V ) +e d(V ) +E(d(V )) + 1 : (6.13) To handle the terms in (6.13), note that for any vertex v the degree d(v) has the Binomial(n 1;p) distribution, and in particular E(d(v)) = (n 1)p and E(e d(v) ) = where = (pe + 1p) n1 : Hence, as V is chosen uniformly over the vertices v of K, E(d(V )) = (n 1)p and E(e d(V ) ) = ; (6.14) and now dierentiation under the second expectation above, allowed since d(V ) is bounded, yields E(d(V )e d(V ) ) = where = (n 1)pe (pe + 1p) n2 : (6.15) Substituting (6.14) and (6.15) into (6.13) yields, for all 0, E(e Y s e Y ) 2 E(e Y ) where =e ( + ) + (n 1)p + 1: (6.16) 146 Letting m() =E(e Y ), using that Y is bounded to dierentiate under the expec- tation, along with (5.1) and (6.16), we obtain m 0 () =E(Ye Y ) =E(e Y s ) 1 + 2 m(): (6.17) Standardizing Y , we set M() =E(exp((Y)=)) =e = m(=); (6.18) and now by dierentiating and applying (6.17), we obtain M 0 () = 1 e = m 0 (=) e = m(=) e = 1 + = 2 m(=) e = m(=) = e = = 2 2 m(=) = = 2 2 M(): Since M(0) = 1, integrating M 0 (s)=M(s) over [0;] yields the bound log(M())H(); or that M() exp(H()) where H() = 2 2 Z 0 s s= ds; proving the claim on M() for 0. Moreover, for nonnegative, P Y t P exp (Y) e t e t M() exp(t +H()): As the inequality holds for all 0, it holds for the inmum over 0, proving (6.6). 147 To demonstrate the left tail bound let < 0. Since Y s Y and < 0, using (5.5), (6.10) and that Y is a function of K we obtain E(e Y e Y s ) jj 2 E (e Y +e Y s )(Y s Y ) jjE(e Y (Y s Y )) =jjE(e Y E(Y s YjK)): (6.19) Note that X v2V d 1 (v) = X v2V X w2N(v) 1(d(w) = 1) = X w2V X v2N(w) 1(d(w) = 1) = X w2V jN(w)j1(d(w) = 1) = X w2V 1(d(w) = 1); (6.20) the number of degree one vertices in K. Hence, using (6.9) we obtain E(Y s YjK) = 1 n X v2V (d 1 (v) + 1(d(v)6= 0)) 1 n X v2V d 1 (v) + 1 2: (6.21) Now, by (6.19) and (6.21), E(e Y e Y s ) 2jjE(e Y ) and therefore, justifying dierentiating under the expectation as before, applying (5.1) yields m 0 () =E(Ye Y ) =E(e Y s ) (1 + 2)m(): 148 Again with M() as in (6.18), M 0 () = 1 e = m 0 (=) e = m(=) e = ((1 + 2=)m(=)) e = m(=) = 2 2 M(): Dividing by M(), integrating over [; 0] and exponentiating yields M() exp 2 2 ; (6.22) showing the claimed bound on M() for < 0. The inequality in (6.22) implies that for all t> 0 and < 0, P Y t exp t + 2 2 : Taking =t 2 =(2) we obtain (6.7). 149 6.3 Other Applications In this section we derive concentration of measure inequalities for another example whereY s Y is not bounded: the nonnegative innitely divisible distributions with certain associated moment generating functions which satisfy a boundedness con- dition. As an example for nonnegative innitely divisible distribution, compound Poisson distributions will be our main illustration. 6.3.0.1 Innitely Divisible Distributions When Y is Poisson then Y s =Y + 1 and we may write Y s =Y +X (6.23) withX andY independent. Theorem 5.3 of [51] shows that ifY is nonnegative with nite mean then (6.23) holds if and only if Y is innitely divisible. Hence, in this case, a coupling ofY toY s may be achieved by generating the independent variable X and adding it toY . SinceY s is always stochastically larger thanY we must have X 0, and therefore this coupling is monotone. In addition Y s Y = X so the coupling is bounded if and only ifX is bounded. 
When X is unbounded, Theorem 6.3.1 provides concentration of measure inequalities for Y under appropriate growth conditions on two generating functions in Y and X. We assume without further mention that Y is nontrivial, and note that therefore the means of both Y and X are positive.

Theorem 6.3.1. Let Y have a nonnegative infinitely divisible distribution and suppose that there exists θ* > 0 so that E(e^{θ*Y}) < ∞. Let X have the distribution such that (6.23) holds when Y and X are independent, and assume E(X e^{θ*X}) = C < ∞. Letting μ = E(Y), σ² = Var(Y), δ = E(X) and K = (C + δ)/2, the following concentration of measure inequalities hold for all t > 0:

P((Y − μ)/σ ≥ t) ≤ exp(−t²σ²/(2μK)) for t ∈ [0, θ*μK/σ²),
P((Y − μ)/σ ≥ t) ≤ exp(−θ*t + μKθ*²/(2σ²)) for t ∈ [θ*μK/σ², ∞),

and

P((Y − μ)/σ ≤ −t) ≤ exp(−t²σ²/(2δμ)).

Proof. Since Y^s = Y + X with Y and X independent and X ≥ 0, using (5.5) with θ ∈ (0, θ*) we have

E(e^{θY^s} − e^{θY}) = E(e^{θ(X+Y)} − e^{θY}) ≤ (θ/2) E(X(e^{θ(X+Y)} + e^{θY})) = (θ/2) E(X(e^{θX} + 1) e^{θY})
 = (θ/2) E(X(e^{θX} + 1)) E(e^{θY}) ≤ (θ/2)(E(X e^{θ*X}) + E(X)) E(e^{θY}) = θK m(θ),

where K = (C + δ)/2 and m(θ) = E(e^{θY}). Now adding m(θ) to both sides yields

E(e^{θY^s}) ≤ (1 + θK) m(θ),

and therefore

m'(θ) = E(Y e^{θY}) = μ E(e^{θY^s}) ≤ μ(1 + θK) m(θ).    (6.24)

Again, with M(θ) the moment generating function of (Y − μ)/σ,

M(θ) = E e^{θ(Y−μ)/σ} = e^{−θμ/σ} m(θ/σ),

by (6.24) we have

M'(θ) = −(μ/σ) e^{−θμ/σ} m(θ/σ) + e^{−θμ/σ} m'(θ/σ)/σ
 ≤ −(μ/σ) e^{−θμ/σ} m(θ/σ) + (μ/σ) e^{−θμ/σ} (1 + Kθ/σ) m(θ/σ)
 = (θμ/σ²) K M(θ).    (6.25)

Integrating, and using the fact that M(0) = 1, yields

M(θ) ≤ exp(μKθ²/(2σ²)) for θ ∈ (0, θ*).

Hence for a fixed t > 0, for all θ ∈ (0, θ*),

P((Y − μ)/σ ≥ t) ≤ e^{−θt} M(θ) ≤ exp(−θt + μKθ²/(2σ²)).

The infimum of the quadratic in the exponent is attained at θ = tσ²/(μK). When this value lies in (0, θ*) we obtain the first, right tail bound, for t in the bounded interval, while setting θ = θ* yields the second.

Moving on to the left tail bound, using (5.5) for θ < 0 yields

E(e^{θY} − e^{θY^s}) ≤ (|θ|/2) E((Y^s − Y)(e^{θY} + e^{θY^s})) ≤ |θ| E(X e^{θY}) = |θ| E(X) E(e^{θY}).

Rearranging we obtain

m'(θ) = μ E(e^{θY^s}) ≥ μ(1 + θδ) m(θ).

Following calculations similar to (6.25) one obtains

M'(θ) ≥ (θμδ/σ²) M(θ) for all θ < 0,

which upon integration over [θ, 0] yields

M(θ) ≤ exp(μδθ²/(2σ²)) for all θ < 0.

Hence for any fixed t > 0, for all θ < 0,

P((Y − μ)/σ ≤ −t) ≤ e^{θt} M(θ) ≤ exp(θt + μδθ²/(2σ²)).    (6.26)

Substituting θ = −tσ²/(δμ) in (6.26) yields the lower tail bound, thus completing the proof.

Though Theorem 6.3.1 applies in principle to all nonnegative infinitely divisible distributions with generating functions for Y and X that satisfy the given growth conditions, we now specialize to the subclass of compound Poisson distributions, over which it is always possible to determine the independent increment X. Not too much is sacrificed in narrowing the focus to this case, since a nonnegative infinitely divisible random variable Y has a compound Poisson distribution if and only if P(Y = 0) > 0.

6.3.0.2 Compound Poisson Distribution

One important subfamily of the infinitely divisible distributions is the class of compound Poisson distributions, that is, those distributions that are given by

Y = Σ_{i=1}^N Z_i, where N ∼ Poisson(λ), and {Z_i}_{i=1}^∞ are i.i.d. with Z_i =_d Z.    (6.27)

Compound Poisson distributions are popular in several applications, such as insurance mathematics, seismological data modelling, and reliability theory; the reader is referred to [3] for a detailed review. Although Z is not in general required to be nonnegative, in order to be able to size bias Y we restrict ourselves to this situation. It is straightforward to verify that when the moment generating function m_Z(θ) = E e^{θZ} of Z is finite, then the moment generating function m(θ) of Y is given by

m(θ) = exp(−λ(1 − m_Z(θ))).

In particular m(θ) is finite whenever m_Z(θ) is finite.
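As a quick numerical sanity check of the last formula, the sketch below (ours, with arbitrary parameter choices) takes Z exponential with scale β, so that m_Z(θ) = 1/(1 − θβ) for θ < 1/β, simulates the compound Poisson sum (6.27), and compares the empirical value of E e^{θY} with exp(−λ(1 − m_Z(θ))).

```python
import numpy as np

rng = np.random.default_rng(5)
lam, beta, theta, reps = 3.0, 2.0, 0.1, 200_000     # Z ~ Exponential(scale=beta)

N = rng.poisson(lam, size=reps)                     # Poisson number of summands
Y = np.array([rng.exponential(beta, size=k).sum() for k in N])

m_Z = 1.0 / (1.0 - theta * beta)                    # E exp(theta Z), valid for theta < 1/beta
m_Y_exact = np.exp(-lam * (1.0 - m_Z))              # compound Poisson mgf
print(np.exp(theta * Y).mean(), m_Y_exact)          # should agree up to Monte Carlo error
```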
As Y in (6.27) is innitely divisible the equality (6.23) holds for some X; the following lemma determines the distribution of X in this particular case. 155 Lemma 6.3.1. Let Y have the compound Poisson distribution as in (6.27) where Z is nonnegative and has nite, positive mean. Then Y s =Y +Z s ; has the Y size biased distribution, where Z s has the Z size bias distribution and is independent of N andfZ i g 1 i=1 . Proof. Let V (u) =Ee iuV for any random variable V . If V is nonnegative and has nite positive mean, using f(y) =e iuy in (5.1) results in V s(u) = 1 EV EVEe iuV s = 1 EV EVe iuV = 1 iEV 0 V (u): (6.28) It is easy to check that the characteristic function of the compound Poisson Y in (6.27) is given by Y (u) = exp((1 Z (u))); (6.29) and letting EZ =#, that EY =#. Now applying (6.28) and (6.29) results in Y s(u) = 1 i# 0 Y (u) = 1 i# Y (u) 0 Z (u) = Y (u) Z s(u): To illustrate Lemma 6.3.1, consider the Cram er-Lundberg model [13] from insur- ance mathematics. Suppose an insurance company starts with an initial capital u 0 , and premium is collected at the constant rate . Claims arrive according to a 156 homogenous Poisson processfN g 0 with rate, and the claim sizes are indepen- dent with common distributionZ. The aggregate claimsY made by time 0 is therefore given by (6.27) with N and replaced by N and , respectively. Distributions for Z which are of interest for applications include the Gamma, Weibull, and Pareto, among others. For concreteness, if Z Gamma(;) then Z s Gamma( + 1;), and the mean of the incrementZ s , and the mean and variance 2 of Y , are given by = ( + 1); = and 2 = 2 : The conditions of Theorem 6.3.1 are satised with any 2 (0; 1=) sinceE(e Y )< 1 andE(Z s e Z s )<1 for all< 1=. Taking = 1=(M) forM > 1 for example, yields C =E(Z s e Z s ) = ( + 1)( M M 1 ) +2 : For instance, the lower tail bound of Theorem 6.3.1 now yields a bound on the probability that the aggregate claims by time will be `small', of P Y t exp t 2 2( + 1) : It should be noted that in some applications one may be interested in Z which are heavy tailed. Hence, these cases do not satisfy the conditions in Theorem 6.3.1. 157 References [1] P. Baldi, Y. Rinott, and C. Stein. A normal approximation for the number of local maxima of a random function on a graph. In Probability, statistics, and mathematics, pages 59{81. Academic Press, Boston, MA, 1989. [2] A. D. Barbour. Poisson convergence and random graphs. Math. Proc. Cam- bridge Philos. Soc., 92(2):349{359, 1982. [3] A. D. Barbour and O. Chryssaphinou. Compound Poisson approximation: a user's guide. Ann. Appl. Probab., 11(3):964{1002, 2001. [4] A. D. Barbour, Micha l Karo nski, and Andrzej Ruci nski. A central limit the- orem for decomposable random variables with applications to random graphs. J. Combin. Theory Ser. B, 47(2):125{145, 1989. [5] Dave Bayer and Persi Diaconis. Trailing the dovetail shue to its lair. Ann. Appl. Probab., 2(2):294{313, 1992. [6] B ela Bollob as. Random graphs. Academic Press Inc. [Harcourt Brace Jo- vanovich Publishers], London, 1985. [7] E. Bolthausen. An estimate of the remainder in a combinatorial central limit theorem. Z. Wahrsch. Verw. Gebiete, 66(3):379{386, 1984. [8] Sourav Chatterjee. Stein's method for concentration inequalities. Probab. Theory Related Fields, 138(1-2):305{321, 2007. [9] Sourav Chatterjee. A new method of normal approximation. Ann. Probab., 36(4):1584{1610, 2008. 158 [10] L.H.Y. Chen and Q.M. Shao. Stein's metod for normal approximation. In An Introduction to Stein's Method, pages 1{59. 
Singapore University Press and World Scientific, 2005.
[11] L.H.Y. Chen, Q.M. Shao, and L. Goldstein. Normal approximation by Stein's method. Springer-Verlag, 2011.
[12] Devdatt P. Dubhashi and Alessandro Panconesi. Concentration of measure for the analysis of randomized algorithms. 2005.
[13] P. Embrechts and C. Klüppelberg. Some aspects of insurance mathematics. Teor. Veroyatnost. i Primenen., 38(2):374–416, 1993.
[14] Gunnar Englund. A remainder term estimate for the normal approximation in classical occupancy. Ann. Probab., 9(4):684–692, 1981.
[15] C. G. Esseen. A moment inequality with an application to the central limit theorem. Skand. Aktuarietidskr., 39:160–170 (1957), 1956.
[16] William Feller. An introduction to probability theory and its applications. Vol. II. Second edition. John Wiley & Sons Inc., New York, 1971.
[17] S. Ghosh and L. Goldstein. Concentration of measures via size biased couplings. Probab. Theory Related Fields, 149:271–278, 2011.
[18] Subhankar Ghosh and Larry Goldstein. Applications of size biased couplings for concentration of measures. Electron. Commun. Probab., 16:70–83, 2011.
[19] Larry Goldstein. Berry-Esseen bounds for combinatorial central limit theorems and pattern occurrences, using zero and size biasing. J. Appl. Probab., 42(3):661–683, 2005.
[20] Larry Goldstein. L_1 bounds in normal approximation. Ann. Probab., 35(5):1888–1930, 2007.
[21] Larry Goldstein and Mathew D. Penrose. Normal approximation for coverage models over binomial point processes. Ann. Appl. Probab., 20(2):696–721, 2010.
[22] Larry Goldstein and Gesine Reinert. Stein's method and the zero bias transformation with application to simple random sampling. Ann. Appl. Probab., 7(4):935–952, 1997.
[23] Larry Goldstein and Yosef Rinott. Multivariate normal approximations by Stein's method and size bias couplings. J. Appl. Probab., 33(1):1–17, 1996.
[24] Larry Goldstein and Yosef Rinott. A permutation test for matching and its asymptotic distribution. Metron, 61(3):375–388 (2004), 2003.
[25] Larry Goldstein and Haimeng Zhang. A Berry-Esseen theorem for the lightbulb problem. Adv. in Appl. Probab., 43:875–898, 2011.
[26] Peter Hall. Introduction to the theory of coverage processes. Wiley Series in Probability and Mathematical Statistics: Probability and Mathematical Statistics. John Wiley & Sons Inc., New York, 1988.
[27] T. E. Harris. A lower bound for the critical probability in a certain percolation process. Proc. Cambridge Philos. Soc., 56:13–20, 1960.
[28] Soo Thong Ho and Louis H. Y. Chen. An L_p bound for the remainder in a combinatorial central limit theorem. Ann. Probability, 6(2):231–249, 1978.
[29] Wassily Hoeffding. A combinatorial central limit theorem. Ann. Math. Statistics, 22:558–566, 1951.
[30] Michał Karoński and Andrzej Ruciński. Poisson convergence and semi-induced properties of random graphs. Math. Proc. Cambridge Philos. Soc., 101(2):291–300, 1987.
[31] Valentin F. Kolchin, Boris A. Sevast'yanov, and Vladimir P. Chistyakov. Random allocations. V. H. Winston & Sons, Washington, D.C., 1978. Translated from the Russian, translation edited by A. V. Balakrishnan, Scripta Series in Mathematics.
[32] V.F. Kolchin and V.P. Chistyakov. On a combinatorial central limit theorem. Theory Probab. Appl., 18:728–739, 1973.
[33] Wojciech Kordecki. Normal approximation and isolated vertices in random graphs. In Random graphs '87 (Poznań, 1987), pages 131–139. Wiley, Chichester, 1990.
[34] Michel Ledoux.
The concentration of measure phenomenon, volume 89 of Mathematical Surveys and Monographs. American Mathematical Society, Providence, RI, 2001.
[35] Hiroshi Midzuno. On the sampling system with probability proportionate to sum of sizes. Ann. Inst. Statist. Math., Tokyo, 3:99–107, 1952.
[36] P. A. P. Moran. The random volume of interpenetrating spheres in space. J. Appl. Probability, 10:483–490, 1973.
[37] Minoru Motoo. On the Hoeffding's combinatorial central limit theorem. Ann. Inst. Statist. Math. Tokyo, 8:145–154, 1957.
[38] Neil O'Connell. Some large deviation results for sparse random graphs. Probab. Theory Related Fields, 110(3):277–285, 1998.
[39] Zbigniew Palka. On the number of vertices of given degree in a random graph. J. Graph Theory, 8(1):167–170, 1984.
[40] Mathew Penrose. Random geometric graphs, volume 5 of Oxford Studies in Probability. Oxford University Press, Oxford, 2003.
[41] Mathew D. Penrose. Normal approximation for isolated balls in an urn allocation model. Electron. J. Probab., 14: no. 74, 2156–2181, 2009.
[42] Mathew D. Penrose and J. E. Yukich. Central limit theorems for some graphs in computational geometry. Ann. Appl. Probab., 11(4):1005–1041, 2001.
[43] M. P. Quine and J. Robinson. A Berry-Esseen bound for an occupancy problem. Ann. Probab., 10(3):663–671, 1982.
[44] C. Radhakrishna Rao, M. Bhaskara Rao, and Haimeng Zhang. One bulb? Two bulbs? How many bulbs light up? A discrete probability problem involving dermal patches. Sankhyā, 69(2):137–161, 2007.
[45] Martin Raič. CLT-related large deviation bounds based on Stein's method. Adv. in Appl. Probab., 39(3):731–752, 2007.
[46] Gesine Reinert and Adrian Röllin. Multivariate normal approximation with Stein's method of exchangeable pairs under a general linearity condition. Ann. Probab., 37(6):2150–2173, 2009.
[47] A. Schiffman, S. Cohen, R. Nowik and D. Selinger. Initial diagnostic hypotheses: factors which may distort physicians' judgement. Organisational Behaviour and Human Performance, 21:305–315, 1978.
[48] I. Shevtsova. On the absolute constants in the Berry-Esseen type inequalities for identically distributed summands. Preprint, 2011.
[49] Charles Stein. A bound for the error in the normal approximation to the distribution of a sum of dependent random variables. In Proceedings of the Sixth Berkeley Symposium on Mathematical Statistics and Probability (Univ. California, Berkeley, Calif., 1970/1971), Vol. II: Probability theory, pages 583–602, Berkeley, Calif., 1972. Univ. California Press.
[50] Charles Stein. Approximate computation of expectations. Institute of Mathematical Statistics Lecture Notes - Monograph Series, 7. Institute of Mathematical Statistics, Hayward, CA, 1986.
[51] F. W. Steutel. Some recent results in infinite divisibility. Stochastic Processes Appl., 1:125–143, 1973.
[52] Bengt von Bahr. Remainder term estimate in a combinatorial limit theorem. Z. Wahrscheinlichkeitstheorie und Verw. Gebiete, 35(2):131–139, 1976.
[53] A. Wald and J. Wolfowitz. Statistical tests based on the permutations of the observations. Ann. Math. Statistics, 15:358–372, 1944.
[54] Hua Zhou and Kenneth Lange. Composition Markov chains of multinomial type. Adv. in Appl. Probab., 41(1):270–291, 2009.
Abstract
Stein's method is one of the cornerstones of modern limit theory in probability. While working on the problem of inadmissibility of the multivariate sample mean for the Gaussian distribution, Charles Stein used a characterization identity of the normal distribution. In 1972, he showed that the same identity could be used to prove the Central Limit Theorem in cases where the summands were not even independent, and also to yield the rate of convergence to the normal distribution. For probability theorists, this new method was appealing as it did not use Fourier transforms. Within a few years many new CLT error bounds were obtained for cases where the summands were not independent. Louis Chen showed that similar ideas could be used to obtain error bounds in the realm of Poisson approximation as well. Coupled random variables have played an important role in Stein's method from its early days, and some of its most useful coupling techniques were devised in the late eighties and nineties. In this thesis we consider two of these couplings, namely the zero bias and size bias couplings. In particular, we show how zero bias couplings can be used to obtain error bounds in a combinatorial central limit theorem using involutions. We furthermore show that the couplings are useful not only for obtaining error bounds but also for obtaining concentration of measure inequalities. We illustrate the use of our results through several nontrivial examples.