STEIN'S METHOD VIA APPROXIMATE ZERO BIASING AND POSITIVE ASSOCIATION WITH APPLICATIONS TO COMBINATORIAL CENTRAL LIMIT THEOREM AND STATISTICAL PHYSICS

by

Nathakhun Wiroonsri

A Dissertation Presented to the
FACULTY OF THE USC GRADUATE SCHOOL
UNIVERSITY OF SOUTHERN CALIFORNIA
In Partial Fulfillment of the Requirements for the Degree
DOCTOR OF PHILOSOPHY (APPLIED MATHEMATICS)

December 2018

Copyright 2018 Nathakhun Wiroonsri

Dedication

To my mom, dad, wife and daughter.

Acknowledgments

The bridge between goals and accomplishment is always filled with inspiration, knowledge, discipline and people. I certainly owe many thanks, big and small, to colleagues, friends and family from the past five years at USC. I would like to use this opportunity to mention those who have played significant roles in my Ph.D. journey.

First I would like to express my sincere gratitude and respect to my advisor, Professor Larry Goldstein. I remember the moment in my first year when I met him and got his advice about Stein's method. That was when I learned my first lesson on Stein's method, which turned out to become my research area. From day one, he has been supportive, understanding, patient and inspiring. He guided me to my very first research questions and has been there mentoring me along the way. He has always encouraged me to begin new research on my own. Not only does he provide invaluable technical advice, he has also spent a tremendous amount of time editing and proofreading my work despite his very busy schedule. Without him, my Ph.D. journey would have been a rocky ride.

I would like to thank Professor Jason Fulman for his fruitful guidance in the very first stage of my Ph.D. study and for introducing one of the problems in my dissertation. He has always been approachable and very kind. I would also like to thank Professor Sergey Lototsky for being my second language exam mentor and for his helpful comments on my research during the probability seminar. Thanks also to both of them for serving on my qualifying exam committee.

I would like to express my appreciation to Professor Kenneth Alexander for his suggestions and comments regarding statistical physics and also for serving on my qualifying exam and dissertation committees. I would like to acknowledge Professor Fengzhu Sun as well for his kindness in agreeing to serve on my qualifying exam committee before we had actually met, and now once again on my dissertation committee.

I feel grateful for the innumerable help I have received from Amy Yung since the first day I arrived at the mathematics department. Without her, my Ph.D. life would be much more difficult. Thanks also to the other staff members who have made everything in the department very convenient.

My parents are also one of the biggest parts of my success. They are the ones who inspired me to start this Ph.D. journey. Though it is not possible to list everything they have done for me, I would like to acknowledge in a few sentences the things that have led me to where I am today. My dad is the one who taught me my first lesson in mathematics and was my most important mentor throughout my academic life prior to moving to the U.S. I thank my mom for always taking care of me in every tiny aspect of my life and giving me her endless unconditional love.

Last but not least, I would like to thank my wife, Krittiya, and my daughter, Naranat. If our lives were graphs, we would only know who really loves us when we were at a minimum, or close to one.
My wife has been so supportive and understanding, and has been with me on both the best and worst days of my life. In the past seven years, she has been the reason I grew up and became a better man. She is also a great and dedicated mom. Although my daughter is only one year old, she makes me laugh every single day and makes me smile even on the toughest days. If I consider today's accomplishment as the end of the bridge, these are the two people who have walked with me all the way to the end.

Contents

Dedication
Acknowledgments
Abstract
1 Introduction and Organization
  1.1 Introduction
  1.2 Outline and Summary
2 Overview of Stein's Method
  2.1 Introduction to Stein's Method
  2.2 Zero Bias Coupling
    2.2.1 Combinatorial Central Limit Theorems
3 Approximate Zero Bias Couplings
  3.1 Construction of Approximate Zero Bias Couplings
  3.2 Main Results
    3.2.1 L1 Bounds
    3.2.2 L∞ Bounds
  3.3 Ewens Distribution
  3.4 Combinatorial CLT under Ewens Measure
4 Stein's Method for Positively Associated Random Fields
  4.1 Positive Association
  4.2 L1 Bounds
  4.3 Multivariate Smooth Function Metric Bounds
5 Applications to Statistical Physics Models
  5.1 Second order stationary positively associated fields with exponential covariance decay
  5.2 The Ising Model
  5.3 Bond Percolations
  5.4 The Voter Model
  5.5 The Contact Process
    5.5.1 The amount of time that a bounded region contains an infected site
    5.5.2 The number of infected sites at any fixed time
6 Summary, Related Works and Further Questions
Reference List

Abstract

Stein's method is nowadays one of the most powerful methods for proving limit theorems in probability and statistics. It was first introduced in 1972 by Charles Stein and was originally used for normal approximation. Its exceptional feature over other methods is that it is based on a characteristic equation of the normal distribution and uses coupling constructions. In particular, it does not require the use of complex valued characteristic functions. Since its inception the method has grown dramatically; it has now expanded to cover other distributional approximations and is even applied in non-distributional contexts such as concentration of measure. Two additional standout features of the method are that it provides non-asymptotic bounds on the distance between distributions for finite sample sizes, and that it can handle a number of situations involving dependence.
In 1997, Larry Goldstein and Gesine Reinert introduced a coupling technique using the zero bias transformation that can be applied to the combinatorial central limit theorem in the case where the given random permutation has the uniform distribution. The first main focus of this dissertation is to generalize the zero bias coupling to approximate zero bias couplings, and also to extend the application of the combinatorial central limit theorem to the case where the random permutation has the Ewens distribution with parameter $\theta > 0$, which contains the uniform distribution as a special case when $\theta = 1$.

Stein's method also has many well-known applications related to statistical physics. Positive association among the underlying variables is often satisfied for various models in the field. Our second main focus is to develop an $L_1$ version of Stein's method for sums of positively associated random variables and to provide results in the multidimensional case where the $L_1$ metric is replaced by a smooth function metric. These results are then applied to four well-known models in statistical physics and particle systems: the ferromagnetic nearest-neighbor Ising model, the supercritical bond percolation model, the voter model and the contact process.

Chapter 1
Introduction and Organization

"It is remarkable that a science which began with the consideration of games of chance should have become the most important object of human knowledge."
Pierre-Simon Laplace

1.1 Introduction

It was in the early 18th century that the initial idea of the Central Limit Theorem, or CLT, originated. In 1733, Abraham de Moivre studied the asymptotic distribution of the binomial sum

    S_n = X_1 + X_2 + \cdots + X_n,    (1.1)

where $X_i$, $i \in [n] := \{1, \ldots, n\}$, are $n$ independent Bernoulli trials with the same probability of success. The classic version of the Central Limit Theorem was first established by Laplace about a century later, stated as follows. For independent and identically distributed (i.i.d.) random variables $X_1, X_2, \ldots, X_n$ with finite mean $\mu$ and variance $0 < \sigma^2 < \infty$,

    W_n := \frac{S_n - n\mu}{\sqrt{n}\,\sigma} \to_d Z,    (1.2)

where $S_n$ is as in (1.1), $Z$ is a standard normal variable and $\to_d$ denotes convergence in distribution. Recall that we say $Y_n$ converges in distribution to $Y$ if $\lim_{n\to\infty} P(Y_n \le y) = P(Y \le y)$ for all continuity points $y$ of $P(Y \le y)$. Shortly after that, Lyapunov extended the classical CLT to the so-called Lyapunov CLT, which allows the independent random variables $X_1, X_2, \ldots, X_n$ to be non-identically distributed.

Since then the CLT has become one of the most well-known and powerful tools in probability theory and statistics. Its theoretical basis has been one of the most studied problems in probability, and it has innumerable applications ranging from gambling to national healthcare. Although it is extremely useful that the classical CLT provides the limiting distribution for the sum of independent variables, two questions naturally arise: "How fast does it converge?" and "What if the variables are dependent?".

As the first question relates to the 'speed' of convergence, responses to it need to be based on some distance between distributions. We first recall that the $L_1$ and $L_\infty$ distances between the distributions $\mathcal{L}(X)$ and $\mathcal{L}(Y)$ of real valued random variables $X$ and $Y$ are given, respectively, by

    d_1(\mathcal{L}(X), \mathcal{L}(Y)) = \int_{-\infty}^{\infty} |P(X \le t) - P(Y \le t)|\, dt,    (1.3)

and

    d_\infty(\mathcal{L}(X), \mathcal{L}(Y)) = \sup_{t \in \mathbb{R}} |P(X \le t) - P(Y \le t)|.    (1.4)

These two distances are well known and are termed the Wasserstein and Kolmogorov distances, respectively. It is not difficult to see that either $d_1(X_n, X) \to 0$ or $d_\infty(X_n, X) \to 0$ implies $X_n \to_d X$.

The question regarding the speed of convergence was first handled by Berry in 1941 ([10]) and Esseen in 1942 ([36]), who showed that, under the third moment assumption $E|X_1|^3 < \infty$,

    d_\infty(\mathcal{L}(W_n), \mathcal{L}(Z)) \le \frac{C\, E|X_1|^3}{\sigma^3 \sqrt{n}},

where $W_n$ is defined as in (1.2) and $C$ is a universal constant. The optimal value of the constant $C$ has attracted many researchers since then, and nowadays it is known that $0.4097 \le C \le 0.4748$, where the lower bound was proved in Esseen's original paper and the upper bound was recently obtained by Shevtsova in [80]. The same type of bound for non-identically distributed variables was also shown by Esseen in [36]: with $X_i$, $i \in [n]$, independent, $EX_i = \mu_i$, $\mathrm{Var}(X_i) = \sigma_i^2 > 0$ and $W_n = (S_n - \sum \mu_i)/\sqrt{\sum \sigma_i^2}$,

    d_\infty(\mathcal{L}(W_n), \mathcal{L}(Z)) \le \frac{C_0 \sum_{i=1}^n E|X_i|^3}{(\sum_{i=1}^n \sigma_i^2)^{3/2}},

where the lower and upper bounds on $C_0$ were again shown by Esseen and Shevtsova, respectively, in the same works: $0.4097 \le C_0 \le 0.5600$.

Moving on to the second concern, regarding dependence: since the CLT was established, an extensive literature has provided CLT type results under various types of dependence. Unfortunately, there is no general way to deal with every situation. Most approaches relied on the characteristic function technique, which at times may be somewhat hard to apply, as it becomes much more complicated compared to the independent case and requires significant modifications, each tailored to a different type of dependence.

Stein's method, introduced by Charles Stein in his seminal paper [82], is nowadays one of the most powerful methods to prove convergence in distribution. Its main advantages are that it provides non-asymptotic bounds on the distance between distributions, and that it can handle a number of situations involving dependence. In our opinion, Stein's method is one of the best ways to address the two questions mentioned above. Taking advantage of these benefits, many applications in several areas such as statistics, statistical physics, combinatorics and the applied sciences have been developed using this method. Additional detail regarding the method will be given later, in Chapter 2.

Although Stein's method was originally used for normal approximation ([82] and [83]), it has now broadened to many other well known, and not so well known, distributions. We recommend two excellent references for overviews of the method; the first is the book [21] and the second is the introductory note [78]. Both of these mainly focus on normal approximation but also contain some additional results regarding distributional approximation by the Poisson, exponential and geometric distributions. Poisson approximation via Stein's method was first considered in [19], and was later covered somewhat more extensively in the book [8]. The method has developed dramatically and has been used for other distributional approximations; an incomplete list is given by: geometric ([72]), exponential ([71]), gamma ([67] and [70]), generalized gamma ([73]), beta ([26], [47] and [27]) and generalized hyperbolic distributions ([41]).

Not only useful for distributional approximations, Stein's method has also been used recently to demonstrate concentration of measure (see e.g. [16], [43], [44] and [46]), though this topic is not in the scope of the current dissertation.
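To make the bound of Berry and Esseen above concrete, here is a small numerical sketch in Python (nothing of this kind appears in the thesis; the sample size and success probability are our own illustrative choices). For Bernoulli summands $S_n$ is binomial, so the Kolmogorov distance between $W_n$ and the standard normal can be evaluated exactly at the jump points of the binomial distribution function and compared with the bound using Shevtsova's constant $0.4748$ from [80].

```python
import numpy as np
from scipy.stats import binom, norm

def kolmogorov_distance_bernoulli(n, p):
    """Exact d_inf(L(W_n), N(0,1)) when the X_i are Bernoulli(p).

    W_n = (S_n - n*p)/(sqrt(n)*sigma) is a rescaled Binomial(n, p), so the
    supremum over t is attained at the jump points of the binomial CDF.
    """
    sigma = np.sqrt(p * (1.0 - p))
    k = np.arange(n + 1)
    w = (k - n * p) / (np.sqrt(n) * sigma)      # jump points of W_n
    F = binom.cdf(k, n, p)                      # CDF value at each jump
    F_left = np.concatenate(([0.0], F[:-1]))    # left limit at each jump
    Phi = norm.cdf(w)
    return max(np.abs(F - Phi).max(), np.abs(F_left - Phi).max())

n, p = 200, 0.3
sigma = np.sqrt(p * (1 - p))
third = p * (1 - p) * (p**2 + (1 - p)**2)        # E|X_1 - mu|^3 for Bernoulli(p)
bound = 0.4748 * third / (sigma**3 * np.sqrt(n)) # upper bound with C = 0.4748
print(kolmogorov_distance_bernoulli(n, p), "<=", bound)
```

The computed distance falls below the bound, as the theorem guarantees, and rerunning with larger $n$ shows both quantities shrinking at the $n^{-1/2}$ rate.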
Most of the references mentioned above also provide interesting applications to several fields ranging from applied mathematics and statistics to the applied sciences. As the reader will see in the remainder of this dissertation, we focus on two different main topics in normal approximation that both involve dependence. In the first topic, a new technique called an approximate zero bias coupling, a generalized version of the well-known zero bias coupling, is developed. This new technique is then applied to the combinatorial central limit theorem, where the variables in consideration are not independent. In the second topic, we introduce a new way to use Stein's method for positively associated random fields. As the name may suggest, the variables we consider in this case are certainly not independent.

A tool that is typically applied in Stein's method, and one that makes it very successful, is coupling. Several coupling techniques have been developed in such a way that they remain applicable to many problems: exchangeable pairs ([83]), size bias couplings ([48]), zero bias couplings ([49]), and Stein couplings ([22]). There are also non-coupling techniques used for particular problems such as local dependence ([9], [77]) and the inductive method ([12]; see also [45] for an approach using size bias couplings). As mentioned in the previous paragraph, in this dissertation we develop an approximate zero bias coupling that we believe can be used broadly in the future, and we develop a particular technique, not requiring coupling, that can be used for positively associated random fields.

1.2 Outline and Summary

We begin Chapter 2 by introducing notation and definitions, along with the fundamentals of Stein's method, in Section 2.1. Proofs that may easily be found in the literature, e.g. [21] and [78], are omitted. Next, in Section 2.2, we state some useful results related to the well-known zero bias coupling and present an interesting application to the combinatorial central limit theorem where the random permutation has the uniform distribution.

In Chapter 3, we first generalize the zero bias coupling to an approximate zero bias coupling. The new technique is then used to obtain $L_1$ and $L_\infty$ bounds to the normal in Section 3.2. An application to the combinatorial central limit theorem where the random permutation has the Ewens distribution is presented in Section 3.4. The definition and some necessary results on the Ewens distribution are detailed in Section 3.3.

Chapter 4 develops an $L_1$ version of Stein's method for sums of bounded positively associated random variables. We also extend the results to the multidimensional case, with the $L_1$ metric replaced by a smooth function metric. The main bounds for the two cases are stated and proved in Sections 4.2 and 4.3, respectively.

Applications of the results in Chapter 4 are given to four well-known models in statistical physics and particle systems in $\mathbb{Z}^d$: the ferromagnetic nearest-neighbor Ising model, the supercritical bond percolation model, the voter model and the contact process. These results are presented in separate sections of Chapter 5.

In Chapter 6, we add some further comments, acknowledge some related works, and state a few open questions in which the reader may be interested.

The topics in this dissertation are not exactly in linear order. Chapters 4 and 5 are not at all related to Chapter 3.
For the reader who is only interested in the L 1 andL ∞ results based on zero bias couplings and approximate zero bias couplings, both the theory and applications are fully covered in the first three chapters with some further comments in the last chapter. On the other hand, if one wishes to immediately proceed to the results related to positive association, one can skip Section 2.2 and the whole Chapter 3. 6 Chapter 2 Overview of Stein’s Method “He (Charles Stein) was unique. Everyone’s unique but he was more unique than others.” Stein Jr. Back in the late 60s, Charles Stein introduced a new method to prove conver- gence in distribution to his students in a statistics lecture at Stanford University. The method has been published since 1972 (see [82]). Despite of the slow growth at the beginning, once the monograph of Stein’s method [83] was published in 1986, it started to grow more and more rapidly since then. Nowadays Stein’s method is one of the most powerful methods to prove convergence in distribution that has an extra benefit on providing bounds of the distance between distributions. In this chapter, we provide an overview of Stein’s method that is necessary for the remaining of our work and present one of the well-known couplings that is the root of our main results in the next chapter. 2.1 Introduction to Stein’s Method We begin this section by first recalling the definitions of the two distances introduced in the first chapter and adding that the L 1 distance has an alternative 7 expression that will be used later. TheL 1 andL ∞ distances are given, respectively, by d 1 L(X),L(Y ) = Z ∞ −∞ |P (X≤t)−P (Y≤t)|dt = sup h∈H 1 |Eh(X)−Eh(Y )| (2.1) whereH 1 ={h :|h(y)−h(x)|≤|y−x|}, and d ∞ L(X),L(Y ) = sup ∞<t<∞ |P (X <t)−P (X <t)| = sup h∈H∞ |Eh(X)−Eh(Y )| (2.2) whereH ∞ ={1(·≤z) :z∈R}. Stein’s method for normal approximation was originally motivated from the fact that W has the standard normal distribution, denotedN (0, 1), if and only if EWf(W ) =Ef 0 (W ) for all absolutely continuous functions f with E|f 0 (W )| <∞. This equation and the form of the distances in (2.1) and (2.2) lead to the differential equation h(w)−Nh =f 0 h (w)−wf h (w) (2.3) where Nh = Eh(Z) with Z ∼N (0, 1) and h∈H 1 (resp. h∈H ∞ ). Taking the supremum over all h∈H 1 (resp. h∈H ∞ ) to the expectation on the left hand side of (2.3) withw replaced by a variableW yields the distance betweenW and Z in (2.1) (resp. (2.2)). Thus, instead of working on the distances directly, one can handle the expectation on the right hand side using the bounded solution 8 f h of (2.3) for the given h. Using this device, Stein’s method has uncovered an alternative way to show convergence in distribution with additional information on the finite sample distance between distributions and can also deal with various kinds of dependence through the help of coupling constructions. Though the coupling techniques play a very important role in Stein’s method, in this work we only focus on the zero bias coupling and its generalization. If the reader is further interested, exchangeable pairs, size bias, and equilibrium techniques are well summarized in [21] and can be found as well in the other references mentioned in the introduction. Now we state the following useful properties of the solutionsf h of (2.3), proved in Lemmas 2.3 and 2.4 of [21]. In the following, for a real valued function ϕ(u) defined on the domainD, let|ϕ| ∞ = sup x∈D |ϕ(x)|. Lemma 2.1.1 ([21]). Let f z be the unique solution to (2.3) with h∈H ∞ . 
Then |f z | ∞ ≤ √ 2π/4, |f 0 z | ∞ ≤ 1 and for real numbers w, u and v, |(w +u)f z (w +u)− (w +v)f z (w +v)|≤ (|w| + √ 2π/4)(|u| +|v|). Lemma 2.1.2 ([21]). For a function h :R→R, let f h be the unique solution to (2.3). If h is bounded then |f h | ∞ ≤ q π/2|h(·)−Nh| ∞ and |f 0 h | ∞ ≤ 2|h(·)−Nh| ∞ . 9 Furthermore, if h is absolutely continuous, then |f h | ∞ ≤ 2|h 0 | ∞ , |f 0 h | ∞ ≤ q π/2|h 0 | ∞ and |f 00 h | ∞ ≤ 2|h 0 | ∞ . In particular, it follows directly from Lemma 2.1.2 that for h∈H 1 , |h 0 | ∞ ≤ 1 and |f h | ∞ ≤ 2, |f 0 h | ∞ ≤ q π/2 and |f 00 h | ∞ ≤ 2. (2.4) Therefore, to obtain an L 1 bound, one can simply take supre- mum to the right hand side of (2.3) over the class of functions n f :R→R :|f| ∞ ≤ 2,|f 0 | ∞ ≤ q π/2,|f 00 | ∞ ≤ 2 o . 2.2 Zero Bias Coupling In this section, we present one of the well-known couplings in the literature, the zero bias coupling which was first introduced in [49] and will be referred to in the next chapter. ForX with mean zero and varianceσ 2 ∈ (0,∞), we say thatX z has the X-zero biased distribution if EXf(X) =σ 2 Ef 0 (X z ) (2.5) for all absolutely continuous functions f for which the expectations exist. Before moving on to the construction and some interesting results of this cou- pling, westatethefollowingpropositionprovedin[21]thatshowstheexistenceand uniqueness of the zero bias variable X z of X and that it is absolutely continuous. 10 Proposition 2.2.1 ([21]). Let X be a random variable with mean zero and finite positive variance σ 2 . Then there exists a unique distribution for X z such that EXf(X) =σ 2 Ef 0 (X z ) for every absolutely continuous function f for whichE|Xf(X)|<∞. Moreover, the distribution of X z is absolutely continuous with density p z (x) =E[X1(X >x)]/σ 2 =−E[X1(X≤x)]/σ 2 and distribution function G z (x) =E[X(X−x)1(X≤x)]/σ 2 . As the zero bias variable always exists and is unique, we expect that we can obtainL 1 andL ∞ bounds based on the coupling once it is constructed. Following the proofs of Theorems 4.1 and 5.1 of [21] that showed the results for σ 2 = 1, we obtain the following two theorems for σ 2 > 0. We note that the results actually follow from the facts that Var(X/σ) = 1 and that (aX) z = aX z , however, the complete proofs are provided as we will refer to parts of them in Chapter 3. Theorem 2.2.2. Let X be a mean zero, variance σ 2 > 0 random variable and let X z have the X-zero bias distribution and be defined on the same space as X. Then, with Z∼N (0, 1), d 1 (L(X/σ),L(Z))≤ 2 σ E|X z −X|, (2.6) where Z is a standard normal random variable. 11 Proof. Forgivenh∈H 1 ,letf betheuniqueboundedsolutiontotheSteinequation f 0 (w)−wf(w) =h(w)−Nh. Now applying (2.5) in (2.3) with X replaced by X/σ, we have Eh(X/σ)−Nh =E f 0 X σ − X σ f X σ =E f 0 X σ −f 0 X z σ . Applying (2.4) to the equation above, we have E|h(X/σ)−Nh| ≤ |f 00 | ∞ E X σ − X z σ ≤ 2 σ E|X z −X|. Theorem 2.2.3. Let X be a mean zero, variance σ 2 > 0 random variable and suppose that there exists X z , having the X-zero bias distribution, defined on the same space as X, satisfying|X z −X|≤δ. Then d ∞ (L(X/σ),L(Z))≤ (1 + 1/ √ 2π + √ 2π/4)δ σ , (2.7) where Z is a standard normal random variable. Proof. We follow the proof of Theorem 5.1 of [21] which obtained the the same type of bound using zero biasing. Let z∈R, =δ/σ andf be the solution of the equation f 0 (w)−wf(w) =1 {w≤z−} −P (Z≤z−). 12 Then we have f 0 (Y z /σ)− (Y z /σ)f(Y z /σ) = 1 {Y z /σ≤z−} −P (Z≤z−) ≤ 1 {Y 0 /σ≤z} −P (Z≤z−). 
(2.8) Letting g(x) =f(x/σ), we have g 0 (x) = (1/σ)f 0 (x/σ), and P (Y 0 /σ≤z)−P (Z <z) = (P (Z <z−)−P (Z <z)) +P (Y 0 /σ≤z)−P (Z <z−) ≥ −/ √ 2π +P (Y 0 /σ≤z)−P (Z <z−) ≥ −/ √ 2π +E[f 0 (Y z /σ)− (Y z /σ)f(Y z /σ)] = −/ √ 2π + (1/σ)E[σ 2 g 0 (Y z )−Y z g(Y z )] ≥ − √ 2π + 1 σ E [Y 0 g(Y 0 )−Y z g(Y z )] = − √ 2π +E " Y 0 σ f Y 0 σ ! − Y z σ f Y z σ # . (2.9) where we have used (2.5) in the second equality. Writing Δ = Y z /σ−Y 0 /σ and applying the last inequality of Lemma 2.1.1 and that|Δ|≤ yields E " Y 0 σ f Y 0 σ ! − Y z σ f Y z σ # = E " Y 0 σ f Y 0 σ ! − Y 0 σ + Δ ! f Y 0 σ + Δ !# ≤ E " |Y 0 | σ + √ 2π 4 ! |Δ| # ≤ (1 + √ 2π/4) = δ σ (1 + √ 2π/4). (2.10) 13 Using this inequality in (2.9) yields P (Y 0 /σ≤z)−P (Z <z)≥− δ σ 1 √ 2π + 1 + √ 2π 4 ! . A similar argument yields the reverse inequality. It is easy to see that Theorems 2.2.2 and 2.2.3 allow one to simply obtain good bounds for L 1 and L ∞ distances once a zero bias coupling is constructed in such a way that the two variables are close. Unfortunately, constructing the zero bias coupling can be difficult and there are no common methods to proceed. Asanexample, wefirstshowthecouplingconstructionforasumofindependent randomvariables. LetX 1 ,...,X n beindependentmeanzerorandomvariableswith E[X 2 i ] =σ 2 i > 0,i∈ [n]and P n i=1 σ 2 i = 1. LetX z i havetheX i -zerobiasdistribution with X i independent of X j and X z j for j6= i. Further, let I be a random index, independent of X i , X z i , i∈ [n] with distribution P (I =i) =σ 2 i . Then, with W = P n i=1 X i , W z =W−X I +X z I has the W-zero bias distribution. 14 In general, one of the efficient methods, introduced in [49], is to construct the zero bias variable through an existence of a λ-Stein pair. We recall that an exchangeable pair X 0 ,X 00 form a λ-Stein pair if E(X 00 |X 0 ) = (1−λ)X 0 (2.11) for some 0 < λ < 1. The following lemma of [49] uncovers the way to construct zero bias couplings using λ-Stein pairs. Lemma 2.2.4 ([49]). Let X 0 ,X 00 be a λ-Stein pair with Var(X 0 ) = σ 2 ∈ (0,∞) and distribution F (x 0 ,x 00 ). Then when X † ,X ‡ have distribution dF † (x 0 ,x 00 ) = (x 0 −x 00 ) 2 E(X 0 −X 00 ) 2 dF (x 0 ,x 00 ), (2.12) and U∼U[0, 1] is independent of X † ,X ‡ , the variable X z =UX † + (1−U)X ‡ has the X 0 -zero biased distribution. Lemma 2.2.4 has been used in several papers and it will be discussed further in the following subsection. 2.2.1 Combinatorial Central Limit Theorems In this subsection, we present an application of the zero bias coupling to the distribution, introduced in [57], of Y = n X i=1 a i,π(i) (2.13) 15 whereA∈R n×n is a given real matrix with components{a i,j } n i,j=1 andπ∈S n has the uniform distribution. The distribution of this form has been used in statistics for long periods of time as ithas connection with nonparametric testsand sampling from a finite population (see e.g. [59], [85], [68] and [64]). The work [21] obtained an L 1 bound with explicit constant between the stan- dardized Y and the standard normalN (0, 1) with the help of Lemma 2.2.4 and Theorem 2.2.2. Using the same lemma along with Theorem 2.2.3, the authors also provided anL ∞ bound in the case where the random permutation has distribution which is constant over permutations with the same cycle type and having no fixed points. Concentration inequalities of the same setting were shown in [46]. 
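Before turning to the combinatorial setting, it may help to see the independent-sum construction given earlier in this section in executable form. The following Python sketch is ours, not from [49] or [21], and the Rademacher choice and all parameters are for illustration only. It uses the fact that for $X_i = \pm 1$ with equal probability, the density formula of Proposition 2.2.1 gives $X_i^z \sim \mathcal{U}(-1,1)$, so $W^z = W - X_I + X_I^z$ can be sampled directly and the coupling term $2E|W^z - W|$ of Theorem 2.2.2 estimated.

```python
import numpy as np

rng = np.random.default_rng(0)

def zero_bias_pair(n, reps):
    """Sample (W, W^z) for W = (X_1 + ... + X_n)/sqrt(n) with X_i = +/-1.

    Each summand X_i/sqrt(n) has variance 1/n, so the random index I in
    the construction W^z = W - X_I + X_I^z is uniform on [n]; by the
    scaling relation (aX)^z = a X^z, the replacement is Uniform(-1,1)/sqrt(n).
    """
    X = rng.choice([-1.0, 1.0], size=(reps, n))
    W = X.sum(axis=1) / np.sqrt(n)
    I = rng.integers(0, n, size=reps)
    Xz = rng.uniform(-1.0, 1.0, size=reps)     # X^z for a Rademacher variable
    Wz = W + (Xz - X[np.arange(reps), I]) / np.sqrt(n)
    return W, Wz

W, Wz = zero_bias_pair(n=100, reps=50_000)
# Theorem 2.2.2 with sigma = 1: d_1(L(W), L(Z)) <= 2 E|W^z - W|.
print("2 E|W^z - W| ~", 2 * np.abs(Wz - W).mean())   # decays like n^{-1/2}
```

Note that here $X_I^z$ may be taken independent of everything else, exactly as in the construction above, so the coupling is trivial in the independent case; the difficulty discussed next arises when the summands are dependent.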
Apart from the use of this lemma, Stein’s method with different techniques has been applied to the combinatorial central limit theorems under the uniform distribution in several works. The work [12] used an inductive method to obtain an L ∞ bound of ordern −1/2 but without explicit constant. One of the most recent papers is [20] where the exchangeable pairs technique was used to obtain L ∞ bounds between Y as in (2.13) with a fixed matrix A replaced by independent random variables {X i,j : i,j ∈ [n]} and the normal distribution. The bounds there are given in term of the third moments ofX i,j . To learn more about the the prolonged relation between Stein’s method and the combinatorial central limit theorems, see the ref- erences therein. The results were generalized to the case without third moments in [39] and to various moment conditions in [40] but both of them did not use Stein’s method. The original idea of replacing a i,j by X i,j dates back to [55] where the results are optimal only in the case that there existsC > 0 such that|X i,j |≤C for all i,j∈ [n]. Below let us summarize the idea on how [21] constructed a λ-Stein pair and the square bias distribution (2.12) but the full detail and the proof of 16 the main result are omitted. If interested, the reader may take a look further in Section 4.4 of [21]. Letting a ·· = 1 n 2 n X i,j=1 a ij , a i· = 1 n n X j=1 a i,j and a ·j = 1 n n X i=1 a i,j , by direct calculations, we can easily show that EY =na ·· and (2.14) Var(Y ) = 1 n− 1 X i,j (a i,j −a i· −a ·j +a ·· ) 2 . (2.15) We note that below we consider the symbols π and Y interchageable with π 0 and Y 0 , respectively. To construct a λ-Stein pair, we take (I,J) to be independent of π 0 with the uniform distribution over all pairs satisfying 1≤ I6= J≤ n and let π 00 = π 0 τ IJ whereτ i,j is the permutation that transposes i andj. ForY 0 andY 00 defined as in (2.13) with π replaced by π 0 and π 00 , respectively, by [21], we have E[Y 00 |Y 0 ] = 1− 2 n− 1 Y 0 , showing that (Y 0 ,Y 00 ) is a 2/(n− 1)-Stein pair. To construct the square bias distribution (2.12), following Lemmas 4.5 and 4.6 of [21], let (I † ,J † ,K † ,L † ) be independent of π with distribution p(i,j,k,l) = [(a i,k +a j,l )− (a i,l −a j,k )] 2 4n 2 (n− 1)σ 2 17 where σ 2 = Var(Y ), and let π † = πτ π −1 (K † ),J † if L † =π(I † ),K † 6=π(J † ) πτ π −1 (L † ),I † if L † 6=π(I † ),K † =π(J † ) πτ π −1 (K † ),I †τ π −1 (L † ),J † otherwise , and π ‡ = π † τ I † ,J †. Then, with Y † ,Y ‡ be as in (2.13) with π † ,π ‡ replacing π, respectively, and U be uniform on [0, 1], Y z =UY † + (1−UY ‡ ) has theY-zero bias distribution by Lemma 2.2.4. Bounding|Y z −Y| and invoking Theorem 2.2.2, the work [21] obtained the following theorem. Theorem 2.2.5 ([21]). For n≥ 3, let{a i,j } n i,j=1 be the components of an n×n matrix A, let π be a random permutation uniformly distributed overS n , and let Y be given by (2.13). Then, with μ =EY, σ 2 = Var(Y ) as given in (2.14), (2.15), respectively, γ = P i,j |a i,j −a i· −a ·j +a ·· | 3 , and W = (Y−μ)/σ, d 1 (L(W ),L(Z))≤ γ (n− 1)σ 3 16 + 56 n− 1 + 8 (n− 1) 2 ! , where Z is a standard normal random variable. There is a remark in [21] saying that when the elements of A are all of compa- rable order, σ 2 is of order n and γ of order n 2 , yielding the bound of order σ −1 or n −1/2 . 
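As a quick empirical companion to Theorem 2.2.5, the following Python sketch (again ours, with an arbitrary matrix A and replication count) samples uniform permutations, forms Y as in (2.13), standardizes with the exact $\mu$ and $\sigma^2$ of (2.14) and (2.15), and estimates the Kolmogorov distance to the standard normal.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)
n = 100
A = rng.uniform(0.0, 1.0, size=(n, n))          # matrix with entries of comparable order

a_row = A.mean(axis=1, keepdims=True)           # a_{i.}
a_col = A.mean(axis=0, keepdims=True)           # a_{.j}
a_all = A.mean()                                # a_{..}
mu = n * a_all                                  # EY, equation (2.14)
resid = A - a_row - a_col + a_all
sigma2 = (resid ** 2).sum() / (n - 1)           # Var(Y), equation (2.15)

reps = 20_000
Y = np.array([A[np.arange(n), rng.permutation(n)].sum() for _ in range(reps)])
W = np.sort((Y - mu) / np.sqrt(sigma2))
ecdf = np.arange(1, reps + 1) / reps
print(np.abs(ecdf - norm.cdf(W)).max())         # crude estimate of d_inf
```

Rerunning over a range of $n$ shows the estimate shrinking, consistent with the $\sigma^{-1} \asymp n^{-1/2}$ rate in the remark above, up to Monte Carlo error of order reps$^{-1/2}$.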
Although the combinatorial central limit theorems have attracted attention for quite sometime and have been extended to many different settings, the case where 18 the random permutation has the Ewens distribution has yet been studied. It is interesting to study the robustness and the sensitivity of normality when the usual assumption, studied in [21], are not satisfied. This is useful in the real situation as the uniform properties are sometimes believed to hold but actually do not. In the next chapter, we introduce a recent coupling technique to handle the situation when a λ-Stein pair does not exist or is hard to obtain and we finally apply the technique to Y as in (2.13) when π has the Ewens distribution. 19 Chapter 3 Approximate Zero Bias Couplings “Anyone who has never made a mistake has never tried anything new.” Albert Einstein Although the work [49] has uncovered the powerful method to construct the zero bias variable through a λ-Stein pair as presented in the last chapter, one difficulty that may arise is that it is not always possible to construct a λ-Stein pair. Our goal in this chapter is therefore to provide a treatment to this situation. To be more precise, we introduce an approximate λ,R-Stein pair and then use it to construct an approximate zero bias variable. It will be used further to obtainL 1 andL ∞ bounds parallel to the ones in (2.6) and (2.7), respectively. An example of the case that aλ-Stein pair is absent, presented in Section 3.4, is the combinatorial central limit theorem where the random permutation has the Ewens distribution. The material in this chapter is based on the recent paper [87]. 3.1 ConstructionofApproximateZeroBiasCou- plings Inthissectionwedefineanapproximatezerobiasvariablewhichisageneralized version of the zero bias variable from the last chapter and then introduce a way to construct it. Prior to moving on to that step, we define an approximateλ,R-Stein 20 pair that will be used in the construction. We call a pair of random variables (Y 0 ,Y 00 ), an approximate λ,R-Stein pair if it is exchangeable and satisfies EY 0 = 0, Var(Y 0 ) =σ 2 (3.1) with σ 2 ∈ (0,∞), and E(Y 00 |Y 0 ) = (1−λ)Y 0 +R, (3.2) for some 0<λ< 1 and R =R(Y 0 ). The approximateλ,R-Stein pair was used earlier in [76] with an application to weighted U-statistics where they obtained an L ∞ bound, d ∞ (L(Y 0 ),L(Z))≤ 6 λ q Var{E[(Y 00 −Y 0 ) 2 |Y 0 ]} + 19 √ ER 2 λ + 6 r a λ E|Y 00 −Y 0 | 3 , providing Var(Y 0 ) = 1 and Z be the standard normal variable. The more general setting can be found in [26] and [27] where the relation in (3.2) was generalized to E[Y 00 −Y 0 |Y 0 ] = λγ(Y 0 ) +R with γ be a measurable function. Therefore, there is a natural question raised here that why we do not just use this known bound. Two simple answers are that first the term Var{E[(Y 00 −Y 0 ) 2 |Y 0 ]} is sometimes hard to calculate such as the one in our application to the combinatorial central limit theorem in Section 3.4 and that we would like to present a new approach that might be useful in different situations in the future. 21 Now we proceed to define an approximate zero bias coupling. Let (Y 0 ,Y 00 ) be an approximateλ,R-Stein pair. Taking expectation in (3.2), using exchangeability and that Y 0 has mean zero yields ER = (1−λ)EY 0 +ER =EY 00 = 0. In addition, for any function f such that the following expectations exist, EY 00 f(Y 0 ) = E(E(Y 00 f(Y 0 )|Y 0 )) =E(f(Y 0 )E(Y 00 |Y 0 )) = E(f(Y 0 )((1−λ)Y 0 +R)) = (1−λ)EY 0 f(Y 0 ) +ERf(Y 0 ). 
(3.3) In particular, specializing (3.3) to the case f(y) =y yields EY 00 Y 0 = (1−λ)EY 02 +EY 0 R, and thus E(Y 00 −Y 0 ) 2 = 2 EY 02 −EY 00 Y 0 = 2 λEY 02 −EY 0 R = 2 λσ 2 −EY 0 R . (3.4) A generalized version of Lemma 2.2.4 is presented in the next lemma. Here we use an approximate λ,R-Stein pair (Y 0 ,Y 00 ) to construct a variable Y ∗ satisfying (3.5) that we name it an approximate Y 0 -zero bias variable. 22 Lemma 3.1.1. Let (Y 0 ,Y 00 ) be an approximate λ,R-Stein pair with distribution F (y 0 ,y 00 ). Then when (Y † ,Y ‡ ) has distribution dF † (y 0 ,y 00 ) = (y 00 −y 0 ) 2 E(Y 00 −Y 0 ) 2 dF (y 0 ,y 00 ), and U∼U([0, 1]) is independent of Y † ,Y ‡ , the variable Y ∗ = UY † + (1−U)Y ‡ satisfies E[Y 0 f(Y 0 )] =σ 2 Ef 0 (Y ∗ )− EY 0 R λ Ef 0 (Y ∗ ) + ERf(Y 0 ) λ (3.5) for all absolutely continuous functions f. Proof. For all absolutely continuous functions f for which the expectations below exist, σ 2 Ef 0 (Y ∗ ) = σ 2 Ef 0 (UY † + (1−U)Y ‡ ) = σ 2 E f(Y † )−f(Y ‡ ) Y † −Y ‡ ! = σ 2 2 (λσ 2 −EY 0 R) E f(Y 00 )−f(Y 0 ) Y 00 −Y 0 ! (Y 00 −Y 0 ) 2 ! = σ 2 2 (λσ 2 −EY 0 R) E((f(Y 00 )−f(Y 0 ))(Y 00 −Y 0 )) = σ 2 λσ 2 −EY 0 R E(Y 0 f(Y 0 )−Y 00 f(Y 0 )) = λσ 2 λσ 2 −EY 0 R E(Y 0 f(Y 0 ))− σ 2 λσ 2 −EY 0 R ERf(Y 0 ), where we have used (3.4) and (3.3) in the third and the last equalities, respectively. 23 Thus EY 0 f(Y 0 ) = λσ 2 −EY 0 R σ 2 λ σ 2 Ef 0 (Y ∗ ) + ERf(Y 0 ) λ = σ 2 Ef 0 (Y ∗ )− EY 0 R λ Ef 0 (Y ∗ ) + ERf(Y 0 ) λ . Next lemma shows a way to construct an approximate zero bias distribution of Y 0 through an approximate Stein pair (Y 0 ,Y 00 ) in the case that (Y 0 ,Y 00 ) is a function of some underlying random variables ξ α ,α∈ χ and a random index I. The proof is omitted as it is similar to Lemma 4.4 of [21], with a λ-Stein pair and 2λσ 2 there respectively replaced by an approximate λ,R-Stein pair and E(Y 0 −Y 00 ) 2 = 2 (λσ 2 −EY 0 R) here. Lemma 3.1.2. Let F (y 0 ,y 00 ) be the distribution of an approximate λ,R-Stein pair (Y 0 ,Y 00 ) and suppose there exist a distribution F (i,ξ α ,α∈χ) (3.6) and anR 2 valued function (y 0 ,y 00 ) =ψ(i,ξ α ,α∈χ) such that when I and{Ξ α ,α∈ χ} have distribution (3.6) then (Y 0 ,Y 00 ) =ψ(I, Ξ α ,α∈χ) has distribution F (y 0 ,y 00 ). If I † ,{Ξ † α ,α∈χ} have distribution dF † (i,ξ α ,α∈χ) = (y 0 −y 00 ) 2 E(Y 0 −Y 00 ) 2 dF (i,ξ α ,α∈χ) (3.7) 24 then the pair (Y † ,Y ‡ ) =ψ(I † , Ξ † α ,α∈χ) has distribution F † (y † ,y ‡ ) satisfying dF † (y 0 ,y 00 ) = (y 0 −y 00 ) 2 E(Y 0 −Y 00 ) 2 dF (y 0 ,y 00 ). (3.8) 3.2 Main Results Now we have all ingredients to generalize the results in Theorems 2.2.2 and 2.2.3 from the last chapter. 3.2.1 L 1 Bounds In this subsection, we provide anL 1 bound betweenY 0 and a standard normal random variable in term of R and Y ∗ introduced in the last section. Theorem 3.2.1. Let Y 0 be a mean zero, variance σ 2 > 0 random variable, and Y ∗ and R(Y 0 ) be defined on the same space as Y 0 , satisfying (3.5), then d 1 (L(Y 0 /σ),L(Z))≤ 2 σ E|Y ∗ −Y 0 | + s 2 π |EY 0 R| σ 2 λ + 2E|R| σλ , (3.9) where Z is a standard normal random variable. Proof. For givenh∈L letf be the unique bounded solution to the Stein equation f 0 (w)−wf(w) =h(w)−Nh. 25 Letting g(x) =f(x/σ), we have g 0 (x) = (1/σ)f 0 (x/σ) , and thus |Eh(Y 0 /σ)−Nh| = E f 0 Y 0 σ ! − Y 0 σ f Y 0 σ !! = E f 0 Y 0 σ ! − 1 σ Y 0 g (Y 0 ) ! = E f 0 Y 0 σ ! −σg 0 (Y ∗ ) ! + EY 0 R σλ Eg 0 (Y ∗ )− ERg(Y 0 ) σλ ≤E f 0 Y 0 σ ! 
−f 0 Y ∗ σ + EY 0 R σ 2 λ Ef 0 Y ∗ σ + ERf(Y 0 /σ) σλ ≤ 2 σ E|Y ∗ −Y 0 | + s 2 π |EY 0 R| σ 2 λ + 2E|R| σλ , where we have applied (3.5) with f replaced by g in the third equality and used the last inequality in (2.4). 3.2.2 L ∞ Bounds In this section, we obtain a similar result as in the last subsection but with the L ∞ distance. Theorem 3.2.2. LetY 0 be a mean zero, varianceσ 2 > 0 random variable, andY ∗ and R(Y 0 ) be defined on the same space as Y 0 , satisfying (3.5) and|Y ∗ −Y|≤δ. Then d ∞ (L(Y 0 /σ),L(Z))≤ δ(1 + 1/ √ 2π + √ 2π/4) σ + |EY 0 R| σ 2 λ + √ 2πE|R| 4σλ , (3.10) where Z is a standard normal random variable. 26 Proof. We follow the proof of Theorem 2.2.3 which obtained the the same type of bound using zero biasing. Letz∈R, =δ/σ andf be the solution of the equation f 0 (w)−wf(w) =1 {w≤z−} −P (Z≤z−). Then we have f 0 (Y ∗ /σ)− (Y ∗ /σ)f(Y ∗ /σ) = 1 {Y ∗ /σ≤z−} −P (Z≤z−) ≤ 1 {Y 0 /σ≤z} −P (Z≤z−). Following the proof of Theorem 2.2.3, letting g(x) = f(x/σ), we have g 0 (x) = (1/σ)f 0 (x/σ), and P (Y 0 /σ≤z)−P (Z <z)≥−/ √ 2π + (1/σ)E[σ 2 g 0 (Y ∗ )−Y ∗ g(Y ∗ )]. Then, using (3.5) and applying Lemma 2.1.1 in the last inequality, P (Y 0 /σ≤z)−P (Z <z) ≥ − √ 2π + 1 σ E [Y 0 g(Y 0 )−Y ∗ g(Y ∗ )] + EY 0 R σλ Eg 0 (Y ∗ )− ERg(Y 0 ) σλ = − √ 2π +E " Y 0 σ f Y 0 σ ! − Y ∗ σ f Y ∗ σ # + EY 0 R σ 2 λ Ef 0 Y ∗ σ − ERf(Y 0 /σ) σλ ≥ − √ 2π +E " Y 0 σ f Y 0 σ ! − Y ∗ σ f Y ∗ σ # − |EY 0 R| σ 2 λ − √ 2πE|R| 4σλ . 27 Using the inequality in (2.10) in the last expression above yields P (Y 0 /σ≤z)−P (Z <z)≥− δ σ 1 √ 2π + 1 + √ 2π 4 ! − |EY 0 R| σ 2 λ − √ 2πE|R| 4σλ . The reverse inequality can be shown by a similar argument. We end up this section by mentioning the following two remarks. Remark 3.2.3. WhenR = 0 the construction of Lemma 3.1.1 yields the construc- tion of the zero bias distribution as in Lemma 2.2.4. Furthermore, the L 1 and L ∞ bounds in Theorems 3.2.1 and 3.2.2 reduce to the bounds in Theorems 2.2.2 and 2.2.3, respectively. Remark 3.2.4. Notice that the bounds in Theorems 3.2.1 and 3.2.2 both have the term|EY 0 R|. Therefore, one will obtain good bounds only if one can construct an approximate λ,R-Stein’s pair such that Y 0 and R are not highly correlated. 3.3 Ewens Distribution In this section, we briefly define the Ewens distribution and prove some useful properties that will be used in the next section. LetS n denotes the symmetric group. The Ewens distributionE θ onS n with parameterθ> 0, was first introduced in [37] and used in population genetics to describe the probabilities associated with the number of times that different alleles are observed in the sample. For the mathematical description, the reader may check the book [5]. In the following, 28 withZ the set of integers andk∈Z, we letN k = [k,∞)∩Z and forx∈R,n∈N 1 , we denote the rising and falling factorials, respectively, by x (n) =x(x + 1)··· (x +n− 1) and x (n) =x(x− 1)··· (x−n + 1). For a permutation π∈S n , the Ewens measure is given by P θ (π) = θ #(π) θ (n) , (3.11) where #(π) denotes the number of cycles of π. When θ = 1, it is obvious thatE θ specializes to the uniform distribution over all permutations. Now let c q (π) denotes the number of q cycles of π and write c q for c q (π) for short. The Ewens measureE θ can be defined equivalently in term of c 1 ,...,c n as P θ (c 1 ,...,c n ) =1 n X j=1 jc j =n n! θ (n) n Y j=1 θ c j j c j c j ! . 
(3.12) We note that the Ewens measure was originally stated this way in [37] where, in population genetics, (3.12) is the probability that there are c q alleles represented q times for q∈ [n] in a sample of n gametes taken from a large population. Oneoftheeasiestwaytoconstructapermutationπ n ∈S n withthedistribution E θ is to use the following procedure which is ‘so-called’ the Chinese restaurant process (see e.g. [4] and [74]). For n = 1, π 1 is the unique permutation that maps 1 to 1 inS 1 . For n≥ 2, we construct π n from π n−1 by either adding n as a fixed point with probability θ/(θ +n− 1), or by insertingn uniformly into one of n− 1 locations inside a cycle of π n−1 , so each with probability 1/(θ +n− 1). Next we define two sub-permutations of σ∈S n . For B⊂ [n], we let σ\B be the reduced permutation of the elements [n]\B whose cycle representation 29 is obtained by deleting all elements of B in the cycle representation of σ. For instance, if n = 5 and the cycle representation of σ is (1)(2435) and B ={1, 2} then σ\B has representation (354). Also, let σ B be the permutation whose cycle structure is obtained by taking the cycle structure of σ and removing all cycles that contain any element of B c . Here, for instance, σ B has cycle structure (1). With #(τ) denoting the number of cycles of the permutationτ, we easily see that #(σ) = #(σ/B) + #(σ B ), as any cycle of σ either contains, or does not contain, some element of B c . For B⊂ [n], Propositions 3.3.1 and 3.3.2 that follow provide the joint uncon- ditional probability that π(i) = ξ i ,i ∈ B, and conditional probability that π(i) =ξ i ,i∈B c given{π(i),i∈B} under the Ewens distribution, respectively. Proposition 3.3.1. Let π be a permutation of [n] with distributionE θ , θ> 0 and B⊆ [n]. Then, for ξ i ,i∈B distinct elements of [n], P (π(i) =ξ i ,i∈B) = θ #(π B ) (θ +n− 1) (|B|) . Proof. WeprovethislemmabyinductiononthesizeofB. Sinceitisclearby(3.11) that the distribution of π depends only on the number of cycles, it is sufficient to prove the result forB m ={n−m+1,n−m+2,...,n} andm∈ [n]. ForB 1 ={n}, by the starting configuration in the construction of π via the Chinese restaurant process described above, we immediately have P θ (π(n) =k) = 1 θ+n−1 if k6=n, θ θ+n−1 if k =n = θ #(π B 1 ) (θ +n− 1) (|B 1 |) . Now we assume that the claim is true for B m−1 . To prove the result for B m , we recall that the Chinese restaurant process either adds n−m + 1 as a fixed point 30 with probability θ/(θ +n−m), or inserts n−m + 1 uniformly into one of n−m locations inside a cycle of π n−m , so each with probability 1/(θ +n−m). Hence, using the assumption that the result holds for B m−1 , we have P θ (π(i) =ξ i ,i∈B m ) = 1 θ+n−m θ #(π B m−1 ) (θ+n−1) (m−1) if n−m + 1 is not a fixed point θ θ+n−m θ #(π B m−1 ) (θ+n−1) (m−1) if n−m + 1 is a fixed point = θ #(π Bm ) (θ +n− 1) (m) = θ #(π Bm ) (θ +n− 1) (|Bm|) . Proposition 3.3.2. Let π be a permutation of [n] with distributionE θ , θ> 0 and B⊂ [n]. Then, for ξ i ,i∈ [n] distinct elements of [n], P (π(i) =ξ i ,i∈B c |π(k) =ξ k ,k∈B) = θ #(π\B) θ (|B c |) . Proof. Using the definition of conditional probability and (3.11) and applying Proposition 3.3.1, we have P (π(i) =ξ i ,i∈B c |π(k) =ξ k ,k∈B) = P (π(i) =ξ i ,i∈ [n]) P (π(k) =ξ k ,k∈B) = θ #(π) /θ (n) θ #(π B ) /(θ +n− 1) (|B|) = θ #(π\B) θ (|B c |) , where we have used #(π\B) = #(π)− #(π B ) in the last equality. The joint moments of c = (c 1 ,...,c n ) in the uniform case were established in [86] (See also [5]). 
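The Chinese restaurant process just described is straightforward to implement. The following Python sketch (illustrative only; the parameter choices are arbitrary) samples from $\mathcal{E}_\theta$ in the one-line-array representation, where inserting the new element after a uniformly chosen element $j$ of a cycle amounts to setting $\pi(n) = \pi(j)$ and $\pi(j) = n$. As a sanity check it compares the average number of fixed points with $E[c_1] = \theta n/(\theta + n - 1)$, a consequence of Proposition 3.3.3 below recorded in (3.33).

```python
import numpy as np

def ewens_permutation(n, theta, rng):
    """Sample pi with distribution E_theta on S_n (0-indexed: pi[i] = image of i)."""
    pi = np.zeros(n, dtype=int)          # the unique permutation of one element
    for m in range(1, n):
        if rng.random() < theta / (theta + m):
            pi[m] = m                    # m joins as a new fixed point
        else:
            j = rng.integers(0, m)       # uniform over the m earlier elements
            pi[m] = pi[j]                # insert m into the cycle just after j
            pi[j] = m
    return pi

rng = np.random.default_rng(2)
theta, n, reps = 2.0, 500, 2000
fixed = np.mean([(ewens_permutation(n, theta, rng) == np.arange(n)).sum()
                 for _ in range(reps)])
print(fixed, "vs E[c_1] =", theta * n / (theta + n - 1))   # cf. (3.33)
```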
Using the similar argument, in the following proposition, we 31 generalize the result to the joint moments ofc = (c 1 ,...,c n ) under the distribution E θ for anyθ> 0. Note that, withθ = 1, the proposition below is exactly the same as that in [86] and [5]. Proposition 3.3.3. Letc = (c 1 ,...,c n ) be a cycle type ofπ∈S n with distribution E θ with θ> 0. Then, for m 1 ,...,m n ∈N 0 with m = P n j=1 jm j , E n Y j=1 (c j ) (m j ) = n (m) (θ +n− 1) (m) n Y j=1 θ j ! m j 1 n X j=1 jc j ≤n . Proof. Using (c j ) (m j ) /c j ! = 1/(c j −m j )! when c j ≥m j , we have E n Y j=1 (c j ) (m j ) = X c∈N n 0 P θ (c) n Y j=1 (c j ) (m j ) = X c:c j ≥m j 1 n X j=1 jc j =n n! θ (n) n Y j=1 (c j ) (m j ) θ c j j c j c j ! = n (m) (θ +n− 1) (m) n Y j=1 θ j ! m j X d∈N n 0 1 n X j=1 jd j =n−m (n−m)! θ (n−m) n Y j=1 θ d j j d j d j ! = n (m) (θ +n− 1) (m) n Y j=1 θ j ! m j X d∈N n 0 1 n−m X j=1 jd j =n−m (n−m)! θ (n−m) n−m Y j=1 θ d j j d j d j ! = n (m) (θ +n− 1) (m) n Y j=1 θ j ! m j 1 n X j=1 jc j ≤n where d j corresponds to c j −m j and the sum over d in the second last line is one as it is taken over all possibilities ofS n−m . 32 3.4 Combinatorial CLT under Ewens Measure In this section, we study the distribution of Y = n X i=1 a i,π(i) (3.13) where A∈ R n×n is a given real matrix with components{a i,j } n i,j=1 and π∈S n has the Ewens distributionE θ with θ > 0. Again, it is worth mentioning that the uniform distribution onS n corresponds to the distributionE θ with θ = 1 and the distribution of Y in this particular case has been discussed in Section 2.2.1. A distribution onS n is said to be constant on cycle type if the probability of any permutationπ∈S n depends only on the cycle type (c 1 ,...,c n ). In this section we initially intended to follow Section 6.1.2 of [21] that considered the distribution of (3.13) whereπ∈S n has distribution constant on cycle type with no fixed points. It follows from (3.12) directly that the Ewens distribution is constant on cycle type and allows fixed point. Therefore, though several techniques in Section 6.1.2 of [21] apply here, the main proofs and the coupling construction are needed to be modified. Letting a •,• = 1 n(θ +n− 1) θ n X i=1 a i,i + X i6=j a ij , (3.14) applying Lemma 3.3.1 with B ={i}, we have EY = n X i=1 Ea i,π(i) = n X i=1 θ θ +n− 1 a i,i + X j,j6=i 1 θ +n− 1 a i,j =na •,• . (3.15) 33 Letting b a i,j =a i,j −a •,• , and using (3.15), we have E " n X i=1 b a i,π(i) # =E " n X i=1 (a i,π(i) −a •,• ) # =E " n X i=1 a i,π(i) −na •,• # = 0. As a consequence, replacing a i,j by b a i,j , we may without loss of generality assume that EY = 0 and a •,• = 0. (3.16) For simplicity, as in [21], we consider only the symmetric case where a i,j =a j,i for all i,j∈ [n]. To rule out trivial cases, we assume in what follows that σ 2 > 0. We later calculate σ 2 explicitly in (3.41) of Lemma 3.4.8. Moreover, in Remark 3.4.9, we discuss that it is of order n when the elements of A are well chosen in some sense and, in that case, there exists N∈N 1 such that σ 2 > 0 for n>N. Now we state our main results in this section in which we obtain upper bounds for theL 1 andL ∞ distances betweenY given in (3.13) with the distributionE θ and a standard normal random variable and lower bounds for the L ∞ distance in the special case that the matrix A is integer-valued. Below, we consider the symbols π and Y interchageable with π 0 and Y 0 , respectively. Theorem 3.4.1. Let n≥ 6 and{a i,j } n i,j=1 be an array of real numbers satisfying a i,j =a j,i . 
34 Let π 0 ∈S n be a random permutation with the distributionE θ , with θ > 0. Then, with Y 0 the sum in (3.13) with π 0 replacing π, assuming Var(Y 0 ) = σ 2 > 0, and letting W = (Y 0 −EY 0 )/σ and Z a standard normal random variable, d 1 (L(W ),L(Z))≤ α 1 (θ,M,n) σ , (3.17) and d ∞ (L(W ),L(Z))≤ α 2 (θ,M,n) σ , (3.18) where α 1 (θ,M,n) = 40M +κ θ,n,1 q 2/πM 3 + θ + 1 n− 1 ! + κ θ,n,2 q 2/πM n− 1 +θM 1.2 q 2/π + 1.2(6n + 4θ− 5) θ +n− 1 + θn (θ +n− 1) (2) ! , and α 2 (θ,M,n) = 20(1 + 1/ √ 2π + √ 2π/4)M +κ θ,n,1 M 3 + θ + 1 n− 1 ! + κ θ,n,2 M n− 1 +θM 1.2 + 0.15 √ 2π(6n + 4θ− 5) θ +n− 1 + 0.125 √ 2πθn (θ +n− 1) (2) ! with M = max i,j |a i,j −a •,• |, a •,• as in (3.14), κ θ,n,1 = v u u t θ 2 n (2) (θ +n− 1) (2) + θn θ +n− 1 (3.19) 35 and κ θ,n,2 = v u u t θ 4 n (4) (θ +n− 1) (4) + 4θ 3 n (3) (θ +n− 1) (3) + 2θ 2 n (2) (θ +n− 1) (2) . (3.20) In particular, if a i,j , i,j∈ [n] are all integers, then 1 6 √ 3σ + 3 ≤d ∞ (L(W ),L(Z))≤ α 2 (θ,M,n) σ . (3.21) Remark 3.4.2 below discusses the behavior of (3.19) and (3.20) in θ and the bounds on the rates of convergence in n. Remark 3.4.2. 1. κ θ,n,1 and κ θ,n,2 given in (3.19) and (3.20) are of constant order in n since κ θ,n,1 → √ θ 2 +θ and κ θ,n,2 → √ θ 4 + 4θ 3 + 2θ 2 as n→∞. In particular, κ 1,n,1 = √ 2 and κ 1,n,2 = √ 7. 2. Sinceκ θ,n,1 andκ θ,n,2 are of constant order inn, α 1 (θ,M,n) andα 2 (θ,M,n) are also of constant order in n for all θ > 0. Therefore, the L 1 and L ∞ bounds on the right hand side of (3.17) and (3.18) are of order σ −1 which is of the same order as the L 1 bound in the uniform case in Theorem 2.2.5 of [21]. The uniform distribution corresponds to the special case of the Ewens distribution with θ = 1 and the result in this case is presented in Corollary 3.4.3 below. The order of σ 2 in n is considered in Remark 3.4.9, where we find that if A = [a i,j ] is chosen well in some sense then σ 2 will be of order n, implying that the bounds in (3.17) and (3.18) are of order n −1/2 . 36 3. In the case that a i,j , i,j∈ [n] are interger-valued, the L ∞ upper and lower bounds in (3.21) are of the same order σ −1 , which is therefore the optimal order for the L ∞ distance. 4. It is easy to see thatα 1 (θ,M,n) andα 2 (θ,M,n) are increasing inθ and thus so are the L 1 and L ∞ bounds in (3.17) and (3.18). In fact, the expressions α 1 (θ,M,n) and α 2 (θ,M,n) are obtained from (3.9) and (3.10), respectively, which depend on R. This remainder R, given explicitly in Lemma 3.4.6 below, depends on the number of fixed points of π, which become more likely as θ increases. 5. As θ tends to zero, α 1 (θ,M,n)→ 40M and α 2 (θ,M,n)→ 20(1 + 1/ √ 2π + √ 2π/4)M. Therefore, theL 1 andL ∞ bounds on the right hand side of (3.17) and (3.18) converge to 40M/σ and 20(1 + 1/ √ 2π + √ 2π/4)M/σ, respectively. This corresponds to the case where a large number of cycles is unlikely. Next corollary specializes Theorem 3.4.1 to the case that θ = 1 which corre- sponds to the well-known case where the random permutation π 0 has the uniform distribution. Indeed, the result immediately follows from Theorem 3.4.1 by apply- ing the bounds α 1 (1,M,n) = 47.2 + 6/ √ π + 1.2 q 2/π + 5 + √ 14 √ π(n− 1) − 1.2 n ! M≤ 53M 37 and α 2 (1,M,n) = 21.2 + 3 √ 2 + 5.9 √ 2π + 20/ √ 2π + 2 √ 2 + √ 7 + 0.125 √ 2π n− 1 − 0.15 √ 2π n M≤ 50M, which hold for all n≥ 6. Corollary 3.4.3. Let n≥ 6 and{a i,j } n i,j=1 be an array of real numbers satisfying a i,j =a j,i . Let π 0 be a random permutation with the uniform distribution overS n . 
Then, with Y 0 the sum in (3.13) with π 0 replacing π, assuming Var(Y 0 ) =σ 2 > 0, and letting W = (Y 0 −EY 0 )/σ and Z a standard normal random variable, d 1 (L(W ),L(Z))≤ 53M σ , and d ∞ (L(W ),L(Z))≤ 50M σ , where M = max i,j |a i,j −a •,• | and a •,• is as in (3.14). In particular, if a i,j , i,j∈ [n] are all integers, then 1 6 √ 3σ + 3 ≤d ∞ (L(W ),L(Z))≤ 50M σ . 38 Section6.1.2of[21]provedthemainL 1 andL ∞ boundsbyfirstconsideringeach cycle type c := (c 1 ,...,c n ) separately and then combining all such cases. Unlike [21], fixed points are allowed in the present work and thus P θ (π(i) =i|c) varies as c changes. As of this fact, the same proof structure and coupling construction as there are not valid here and we construct a new coupling in order to use in the proof of Theorem 3.4.1. Toproveourmainresultweborrowthefollowingtwolemmasfrom[21]. Lemma 3.4.4 forms a partition of the space based on the cycle structure of π∈S n . Using that partition, Lemma 3.4.5 expresses the difference in the values taken on by the exchangeable pair coupling that will be given in Lemma 3.4.6. Below, fori,j∈ [n], we write i∼ j if i and j are in the same cycle, let|i| be the length of the cycle containing i and let τ i,j , i,j∈ [n] be the permutation that transposes i and j. Lemma3.4.4 ([21]). Letπ be a fixed permutation. For anyi6=j, distinct elements of [n], the sets A 0 ,...,A 5 form a partition of the space where, A 0 ={|{i,j,π(i),π(j)}| = 2} A 1 ={|i| = 1,|j|≥ 2}, A 2 ={|i|≥ 2,|j| = 1} A 3 ={|i|≥ 3,π(i) =j}, A 4 ={|j|≥ 3,π(j) =i} and A 5 ={|{i,j,π(i),π(j)}| = 4}. Additionally, the sets A 0,1 and A 0,2 partition A 0 where A 0,1 ={π(i) =i,π(j) =j}, A 0,2 ={π(i) =j,π(j) =i}, 39 and we may also write A 1 ={π(i) =i,π(j)6=j}, A 2 ={π(i)6=i,π(j) =j} A 3 ={π(i) =j,π(j)6=i}, A 4 ={π(j) =i,π(i)6=j}, and membership in A m , m = 0,..., 4 depends only on i,j,π(i),π(j). Lastly, the sets A 5,m , m = 1,..., 4 partition A 5 , where A 5,1 ={|i| = 2,|j| = 2,i6∼j} A 5,2 ={|i| = 2,|j|≥ 3}, A 5,3 ={|i|≥ 3,|j| = 2} A 5,4 ={|i|≥ 3,|j|≥ 3}∩A 5 , and membership in A 5,m , m = 1,..., 4 depends only on i,j,π −1 (i),π −1 (j),π(i),π(j). We now state Lemma 6.8 of [21], noting that y 0 and y 00 there are incorrectly interchanged on the left hand side of (3.22). Lemma 3.4.5 ([21]). Let π be a fixed permutation and i6=j distinct elements of [n]. Letting π(−α) =π −1 (α) for α∈ [n] set χ i,j ={−i,−j,i,j}, so that {π(α),α∈χ i,j } ={π −1 (i),π −1 (j),π(i),π(j)}. Then, for π 00 =τ i,j π 0 τ i,j , and y 0 and y 00 given by (3.13) with π 0 and π 00 replacing π, respectively, y 0 −y 00 =b(i,j,π(α),α∈χ i,j ) (3.22) 40 where b(i,j,π(α),α∈χ i,j ) = 5 X m=0 b m (i,j,π(α),α∈χ i,j )1 Am (3.23) with A m , m = 0,..., 5 as in Lemma 3.4.4, b 0 (i,j,π(α),α∈χ i,j ) = 0, b 1 (i,j,π(α),α∈χ i,j ) =a i,i +a π −1 (j),j +a j,π(j) − (a j,j +a π −1 (j),i +a j,π(i) ), b 2 (i,j,π(α),α∈χ i,j ) =a j,j +a π −1 (i),i +a i,π(i) − (a i,i +a π −1 (i),j +a i,π(j) ), b 3 (i,j,π(α),α∈χ i,j ) =a π −1 (i),i +a i,j +a j,π(j) − (a π −1 (i),j +a j,i +a i,π(j) ), b 4 (i,j,π(α),α∈χ i,j ) =a π −1 (j),j +a j,i +a i,π(i) − (a π −1 (j),i +a i,j +a j,π(i) ), and b 5 (i,j,π(α),α∈χ i,j ) =a π −1 (i),i +a i,π(i) +a π −1 (j),j +a j,π(j) −(a π −1 (i),j +a j,π(i) +a π −1 (j),i +a i,π(j) ). (3.24) Next we construct an exchageable pair satisfying (3.2). Lemma 3.4.6. For n≥ 6, let{a i,j } n i,j=1 be an array of real numbers satisfying a i,j =a j,i anda •,• = 0 wherea •,• is as in (3.14). Letπ 0 ∈S n a random permutation has the Ewens measureE θ with θ> 0. 
Further, let I,J be chosen independently of π, uniformly from all pairs of distinct elements of{1,...n}. Then, letting π 00 = τ I,J π 0 τ I,J and Y 0 and Y 00 be as in (3.13) with π 0 and π 00 replacing π, respectively, (Y 0 ,Y 00 ) is an approximate 4/n,R-Stein pair with R(Y 0 ) = 1 n(n− 1) E[T|Y 0 ] (3.25) 41 where T = 2(n +c 1 − 2(θ + 1)) X |i|=1 a i,i + 2(c 1 − 2θ) X |i|≥2 a i,i −4 X |i|,|j|=1,j6=i a i,j − 4 X |i|=1,|j|≥2 a i,j . (3.26) Proof. As Ewens measure is constant on cycle type, the exchangeability claim follows immediately from the proof of Lemma 6.9 of [21]. It remains to show that Y 0 ,Y 00 satisfies (3.2) with R given by (3.25). As Y 0 is a function of π 0 the tower property of conditional expectation yields that E[Y 0 −Y 00 |Y 0 ] =E[E[Y 0 −Y 00 |π]|Y 0 ], and we begin by computing the conditional expectation given π of the difference P 5 m=0 b m (i,j,π(α),α∈ χ i,j )1 Am in (3.23) with A 0 ,...,A 5 given in Lemma 3.4.4, with i,j replaced by I,J. First we have that b 0 = 0. As given in (6.85) of [21], the contribution to n(n− 1)E[Y 0 −Y 00 |π] from b 1 and b 2 totals to 2(n−c 1 ) X |i|=1 a i,i + 4c 1 X |i|≥2 a i,π(i) − 2c 1 (π) X |i|≥2 a i,i − 2 X |i|=1,|j|≥2 a i,j − 2 X |i|≥2,|j|=1 a i,j , (3.27) and likewise (6.88) shows b 3 and b 4 contribute 6 X |i|≥3 a i,π(i) − 4 X |i|≥3 a π −1 (i),π(i) − 2 X |i|≥3 a π(i),i , (3.28) and (6.90) shows the first four terms of b 5 contribute 4(n− 2−c 1 ) X |i|=2 a i,π(i) + 4(n− 3−c 1 ) X |i|≥3 a i,π(i) . (3.29) 42 Lastly, the contribution from the fifth term ofb 5 is given by (6.91), and separating out the cases where i6=j and i =j in the first sum there, that expression can be seen equivalent to − X |i|≥4 X j∼i,j6=i a i,j + X |i|≥4 (a i,π(i) +a π −1 (i),π(i) )− X |i|≥2,|j|≥2 X j6∼i a i,j . (3.30) To simplify (3.30), let a∧b = min(a,b), and follow (6.92) of [21], separating out the cases where j =i and j6=i, resulting in the identity θ n X i=1 a i,i + X i6=j a i,j =θ X |i|≥1 a i,i + X |i|≥4 X j∼i,j6=i a i,j + X |i|≤3 X j∼i,j6=i a i,j + X |i|≥2,|j|≥2 X j6∼i a i,j + X |i|∧|j|=1 X j6∼i a i,j . (3.31) Since θ P n i=1 a i,i + P i6=j a i,j =n(θ +n− 1)a •,• = 0 by the assumption, we replace the sum of the first and last terms in (3.30) by the sum of the first, third and fifth terms in (3.31). Now (3.30) equals θ X |i|≥1 a i,i + X |i|≤3 X j∼i,j6=i a i,j + X |i|≥4 (a i,π(i) +a π −1 (i),π(i) ) + X |i|∧|j|=1 X j6∼i a i,j . Using that π 2 (i) =π −1 (i) when|i| = 3 and there are only two points in the cycle when|i| = 2, we obtain θ X |i|≥1 a i,i + X |i|≥2 a i,π(i) + X |i|≥3 a π −1 (i),π(i) + X |i|∧|j|=1 X j6∼i a i,j . Combining this contribution with the next three terms of b 5 , each of which yields the same amount, gives the total 4θ X |i|≥1 a i,i + 4 X |i|≥2 a i,π(i) + 4 X |i|≥3 a π −1 (i),π(i) + 4 X |i|∧|j|=1 X j6∼i a i,j . (3.32) 43 Combining (3.32) with the contribution (3.29) from the first four terms in b 5 , the b 1 and b 2 terms in (3.27) and the b 3 and b 4 terms (3.28), yields n(n− 1)E[Y 0 −Y 00 |π] = 4(n− 1) X |i|=2 a i,π(i) + (4n− 2) X |i|≥3 a i,π(i) − 2 X |i|≥3 a π(i),i +2(n−c 1 + 2θ) X |i|=1 a i,i − 2(c 1 − 2θ) X |i|≥2 a i,i +4 X |i|∧|j|=1,j6∼i a i,j − 2 X |i|=1,|j|≥2 a i,j − 2 X |i|≥2,|j|=1 a i,j = 4(n− 1) X |i|=2 a i,π(i) + (4n− 2) X |i|≥3 a i,π(i) − 2 X |i|≥3 a π(i),i + 4(n− 1) X |i|=1 a i,i −2(n +c 1 − 2(θ + 1)) X |i|=1 a i,i − 2(c 1 − 2θ) X |i|≥2 a i,i +4 X |i|∧|j|=1,j6∼i a i,j − 2 X |i|=1,|j|≥2 a i,j − 2 X |i|≥2,|j|=1 a i,j , where the final two expressions are identical but for a rewriting of the coefficient of P |i|=1 a i,i . 
Now applying the assumption that a i,j = a j,i to the third term, using that π(i) =i for|i| = 1 for the fourth term, and a i,j =a j,i again to obtain 4 X |i|∧|j|=1,j6∼i a i,j − 2 X |i|=1,|j|≥2 a i,j − 2 X |i|≥2,|j|=1 a i,j = 4 X |i|=1,|j|=1,j6=i a i,j + 4 X |i|=1,|j|≥2 a i,j we have n(n− 1)E[Y 0 −Y 00 |π] = 4(n− 1) X |i|=2 a i,π(i) + (4n− 2) X |i|≥3 a i,π(i) −2 X |i|≥3 a i,π(i) + 4(n− 1) X |i|=1 a i,π(i) −T = 4(n− 1)Y 0 −T, 44 where T is given in (3.26). Now conditioning on Y 0 proves the claim that E[Y 00 |Y 0 ] = (1−λ)Y 0 +R(Y 0 ) with λ = 4/n and R(Y 0 ) as in (3.25). Next lemma provides bounds for E|R| and|EY 0 R| that will be used when applying Theorems 3.2.1 and 3.2.2. To derive the bounds, we use a consequence of Lemma 3.3.3, which can be easily verified, that E [c 1 ] = θn θ +n− 1 , E [c 1 (c 1 − 1)] = θ 2 n (2) (θ +n− 1) (2) , E h c 2 1 i = θ 2 n (2) (θ +n− 1) (2) + θn θ +n− 1 , and E h c 2 1 (c 1 − 1) 2 i = θ 4 n (4) (θ +n− 1) (4) + 4θ 3 n (3) (θ +n− 1) (3) + 2θ 2 n (2) (θ +n− 1) (2) .(3.33) Lemma 3.4.7. Let (Y 0 ,Y 00 ) be an approximate 4/n,R-Stein pair constructed as in Lemma 3.4.6 with R as in (3.25). Then E|R|≤ θM(12n + 8θ− 10) (n− 1)(θ +n− 1) + 2θ 2 M (θ +n− 1) (2) , (3.34) and |EY 0 R|≤ 10κ θ,n,1 + 4θ + 4(κ θ,n,1 (θ + 1) +κ θ,n,2 ) n ! Mσ n− 1 (3.35) where M = max i,j |a i,j −a •,• | with a •,• as in (3.14) and κ θ,n,1 and κ θ,n,2 are given in (3.19) and (3.20), respectively. Proof. By replacing a i,j by a i,j −a •,• we may assume (3.16) is satisfied, that is, thatEY 0 = 0 and a •,• = 0 and thus demonstrate the claim with M = max i,j |a i,j |. 45 We start with the first claim, using conditional Jensen’s inequality and (3.25) to obtain|R|≤E[|T||Y 0 ]/(n(n− 1)) and therefore that E|R|≤E|T|/(n(n− 1)), where T is given by (3.26). Using|a i,j |≤M for all i,j, we have E|R| ≤ 2ME|(n +c 1 − 2(θ + 1))c 1 | n(n− 1) + 2ME|(c 1 − 2θ)(n−c 1 )| n(n− 1) + 4ME[c 1 (c 1 − 1)] n(n− 1) + 4ME[c 1 (n−c 1 )] n(n− 1) = 2ME|(n− 1)c 1 +c 1 (c 1 − 1)− 2θc 1 | n(n− 1) + 2ME|(c 1 − 2θ)(n−c 1 )| n(n− 1) + 4ME[(n− 1)c 1 ] n(n− 1) ≤ 2ME[(n− 1)c 1 +c 1 (c 1 − 1) + 2θc 1 ] n(n− 1) + 2ME[nc 1 + 2θn] n(n− 1) + 4ME[c 1 ] n . Now using (3.33), we obtain E|R| ≤ 2θM θ +n− 1 + 2θ 2 M (θ +n− 1) (2) + 4θ 2 M (n− 1)(θ +n− 1) + 2θMn (n− 1)(θ +n− 1) + 4θM n− 1 + 4θM θ +n− 1 = θM(12n + 8θ− 10) (n− 1)(θ +n− 1) + 2θ 2 M (θ +n− 1) (2) . Now we consider the second claim. First note that by (3.33), E X |i|=1 a i,i 2 ≤M 2 E[c 2 1 ] =κ 2 θ,n,1 M 2 . Hence, using that n(n− 1)|E[Y 0 R(Y 0 )]| =|E[Y 0 E[T|Y 0 ]]| =|E[E[Y 0 T|Y 0 ]]| =|E[Y 0 T ]|, for the first term in the product Y 0 T, bounding c 1 by n, we obtain 46 E 2(n +c 1 − 2(θ + 1))Y 0 P |i|=1 a i,i n(n− 1) ≤ 4(n +θ + 1)E Y 0 P |i|=1 a i,i n(n− 1) ≤ 4κ θ,n,1 Mσ n− 1 + 4(θ + 1)κ θ,n,1 Mσ n(n− 1) (3.36) where here, and at similar steps later on, we apply the Cauchy–Schwarz inequality. Now, to bound the second term in the Y 0 T product, by (3.33) we have E c 2 1 X |i|≥2 a i,i 2 ≤M 2 n 2 E[c 2 1 ] =κ 2 θ,n,1 M 2 n 2 and E X |i|≥2 a i,i 2 ≤M 2 n 2 . Thus, for that second term, E (2c 1 (π)− 4θ)Y 0 P |i|≥2 a i,i n(n− 1) ≤ 2E c 1 Y 0 P |i|≥2 a i,i n(n− 1) + 4θE Y 0 P |i|≥2 a i,i n(n− 1) ≤ 2κ θ,n,1 Mσ n− 1 + 4θMσ n− 1 . (3.37) For the third term, again by (3.33), E X |i|=1,|j|=1,j6=i a i,j 2 ≤M 2 E[c 2 1 (c 1 − 1) 2 ] =M 2 κ 2 θ,n,2 , 47 and thus, 4E Y 0 P |i|=1,|j|=1,j6=i a i,j n(n− 1) ≤ 4κ θ,n,2 Mσ n(n− 1) . (3.38) For the fourth term, by (3.33), E X |i|=1,|j|≥2 a i,j 2 ≤n 2 M 2 E[c 2 1 ] =n 2 M 2 κ 2 θ,n,1 , and thus, 4E Y 0 P |i|=1,|j|≥2 a i,j n(n− 1) ≤ 4κ θ,n,1 Mσ n− 1 . 
(3.39) Hence, summing (3.36) to (3.39), we have |EY 0 R| ≤ 4κ θ,n,1 Mσ n− 1 + 4(θ + 1)κ θ,n,1 Mσ n(n− 1) + 2κ θ,n,1 Mσ n− 1 + 4θMσ n− 1 + 4κ θ,n,2 Mσ n(n− 1) + 4κ θ,n,1 Mσ n− 1 = 10κ θ,n,1 + 4θ + 4(κ θ,n,1 (θ + 1) +κ θ,n,2 ) n ! Mσ n− 1 . Next, Lemma 3.4.8 calculates E[(Y 0 − Y 00 ) 2 ] and σ 2 explicitly. For ease of notation, we write Σ α 1 ,...,αm for the sum over distinct α k ∈ [n] in its proof. Lemma 3.4.8. Let n≥ 6 and (Y 0 ,Y 00 ) be an approximate 4/n,R-Stein pair con- structed as in Lemma 3.4.6 with R as in (3.25). Then E[(Y 0 −Y 00 ) 2 ] = 2β 1 + 2β 3 +β 5,1 + 2β 5,2 +β 5,4 , (3.40) 48 and σ 2 = n 8 (2β 1 + 2β 3 +β 5,1 + 2β 5,2 +β 5,4 ) + nEY 0 R 4 , (3.41) where, with b m , m = 1,..., 5 are given in (3.24), β 1 (θ,n) = 1 n(n− 1)(θ +n− 1) (3) × θ 2 X i,j,s b 2 1 (i,j,i,s,i,s) +θ X i,j,s,l b 2 1 (i,j,i,s,i,l) , (3.42) β 3 (θ,n) = 1 n(n− 1)(θ +n− 1) (3) × θ X i,j,r b 2 3 (i,j,r,i,j,r) + X i,j,r,l b 2 3 (i,j,r,i,j,l) , (3.43) β 5,1 (θ,n) = θ 2 n(n− 1)(θ +n− 1) (4) X i,j,r,s b 2 5 (i,j,r,s,r,s), (3.44) β 5,2 (θ,n) = θ n(n− 1)(θ +n− 1) (4) X i,j,r,s,l b 2 5 (i,j,r,s,r,l), (3.45) and β 5,4 (θ,n) = 1 n(n− 1)(θ +n− 1) (4) θ X i,j,r,k b 2 5 (i,j,r,k,k,r) + X i,j,r,k,l b 2 5 (i,j,r,k,k,l) + X i,j,r,s,k,l b 2 5 (i,j,r,s,k,l) . (3.46) 49 Proof. We first calculate E[(Y 0 −Y 00 ) 2 ] with the help of Lemma 3.4.5. Note that we writeb m (i,j,π(α),α∈χ i,j ) =b m (i,j,π −1 (i),π −1 (j),π(i),π(j)) in what follows. As b 0 = 0, moving on to A 1 , and recalling that the summations in this proof are over distinct values, we have E[(Y 0 −Y 00 ) 2 ]1 A 1 = Eb 2 1 (I,J,π −1 (I),π −1 (J),π(I),π(J))1 {π(I)=I,π(J)6=J} = 1 n(n− 1)(θ +n− 1) (3) × θ 2 X i,j,s b 2 1 (i,j,i,s,i,s) +θ X i,j,s,l b 2 1 (i,j,i,s,i,l) , noting that there are n(n− 1) possibilities for I and J, applying Lemma 3.3.1 with B ={i,j,s} we see that the factor θ 2 /(θ +n− 1) (3) in the first term is the probability that π(i) = i, π(j) = s and π(s) = j and the factor θ/(θ +n− 1) (3) in the second term is the probability that π(i) = i, π(j) = l and π(s) = j. By symmetry, A 2 contributes the same amount. For A 3 , by similar reasoning, we have, E[(Y 0 −Y 00 ) 2 ]1 A 3 = Eb 2 3 (I,J,π −1 (I),π −1 (J),π(I),π(J))1 {π(I)=J,π(J)6=I} = 1 n(n− 1)(θ +n− 1) (3) × θ X i,j,r b 2 3 (i,j,r,i,j,r) + X i,j,r,l b 2 3 (i,j,r,i,j,l) . The contribution from A 4 is the same as that from A 3 . Lastly, consider the contribution from A 5 . Starting with A 5,1 , we have E[(Y 0 −Y 00 ) 2 ]1 A 5,1 = Eb 2 5 (I,J,π −1 (I),π −1 (J),π(I),π(J))1 {|I|=2,|J|=2,I6∼J} = θ 2 n(n− 1)(θ +n− 1) (4) X i,j,r,s b 2 5 (i,j,r,s,r,s). 50 For A 5,2 , we have E[(Y 0 −Y 00 ) 2 ]1 A 5,2 = Eb 2 5 (I,J,π −1 (I),π −1 (J),π(I),π(J))1 {|I|=2,|J|≥3} = θ n(n− 1)(θ +n− 1) (4) X i,j,r,s,l b 2 5 (i,j,r,s,r,l). Again by symmetry, A 5,3 contributes the same. For A 5,4 , we have E[(Y 0 −Y 00 ) 2 ]1 A 5,4 = Eb 2 5 (I,J,π −1 (I),π −1 (J),π(I),π(J))1 {|I|≥3,|J|≥3} = 1 n(n− 1)(θ +n− 1) (4) θ X i,j,r,k b 2 5 (i,j,r,k,k,r) + X i,j,r,k,l b 2 5 (i,j,r,k,k,l) + X i,j,r,s,k,l b 2 5 (i,j,r,s,k,l) . Summing all cases, yieldsE[(Y 0 −Y 00 ) 2 ] in (3.40). SinceE[(Y 0 −Y 00 ) 2 ] = 2 (λσ 2 −EY 0 R) and λ = 4/n, we have σ 2 = nE[(Y 0 −Y 00 ) 2 ] 8 + nEY 0 R 4 = n 8 (2β 1 + 2β 3 +β 5,1 + 2β 5,2 +β 5,4 ) + nEY 0 R 4 , where β 1 , β 3 , β 5,1 , β 5,2 and β 5,4 are defined as in (3.42), (3.43), (3.44), (3.45) and (3.46), respectively. 51 Remark 3.4.9. Here we consider the order of σ 2 given in (3.41). Letβ 1 , β 3 , β 5,1 , β 5,2 and β 5,4 be given as in (3.42)-(3.46), respectively. 
Using (3.35), we have σ 2 = n 8 (2β 1 + 2β 3 +β 5,1 + 2β 5,2 +β 5,4 ) + nEY 0 R 4 ≥ n 8 (2β 1 + 2β 3 +β 5,1 + 2β 5,2 +β 5,4 )− n|EY 0 R| 4 ≥ n 8 (2β 1 + 2β 3 +β 5,1 + 2β 5,2 +β 5,4 ) − 3κ θ,n,1 + 1.2θ + 1.2(κ θ,n,1 (θ + 1) +κ θ,n,2 ) n ! Mσ. As discussed in Remark 3.4.2, κ θ,n,1 and κ θ,n,2 are of constant order in n and thus σ 2 is of order n whenever at least one of β γ (θ,n), γ = 1, 3, (5, 1), (5, 2), (5, 4) are of constant order. Since the final sum of (3.46) in the expression for β 5,4 (θ,n) has n (6) terms and the denominator is of ordern 6 ,β 5,4 (θ,n) is of constant order inn if the elements of{a i,j :{i,j}⊂ [n]} are chosen so that the values do not depend on n and b 5 (i,j,r,s,k,l) with distinct i,j,r,s,k,l∈ [n] are nonzero for at leastbδn 6 c terms for some δ> 0. For instance, if a i,j ,{i,j}⊂ [n] are independent identically distributed uniform random variables on [0, 1] then these sufficient conditions hold almost surely. Now it is time to construct an approximate zero bias coupling that will be used when applying Theorems 3.2.1 and 3.2.2 in the proof of Theorem 3.4.1. We first specialize the outline in Lemma 3.1.2 to the more specific case where the random index I is chosen independently of the permutation and Y 00 −Y 0 =f(I, Ξ α ,α∈χ I ), (3.47) 52 whereI andχ I ⊂χ are vectors of small dimensions andf is a function with range being subset ofR, that is, we consider situations that Y 00 −Y 0 depends on only a few variables. Letting Y 0 ,Y 00 ,Y † ,Y ‡ be constructed as in Lemma 3.1.2 satisfying (3.47), we follow Section 4.4.1 of [21] decomposing P (i,ξ α ,α∈χ) as P (i,ξ α ,α∈χ) =P (I =i)P i (ξ α ,α∈χ i )P i c |i (ξ α ,α6∈χ i |ξ α ,α∈χ i ), (3.48) whereP i (ξ α ,α∈χ i ) is the marginal distribution of ξ α forα∈χ i , andP i c |i (ξ α ,α6∈ χ i |ξ α ,α∈χ i ) the conditional distribution of ξ α for α6∈χ i given ξ α for α∈χ i . For the square bias distribution, similarly, we decompose P † (i,ξ α ,α∈χ) as P † (i,ξ α ,α∈χ) =P † (I =i)P † i (ξ α ,α∈χ i )P i c |i (ξ α ,α6∈χ i |ξ α ,α∈χ i ) (3.49) where P † (I =i) = P (I =i)Ef 2 (i, Ξ α ,α∈χ i ) Ef 2 (I, Ξ α ,α∈χ I ) , (3.50) and P † i (ξ α ,α∈χ i ) = f 2 (i,ξ α ,α∈χ i ) Ef 2 (i, Ξ α ,α∈χ i ) P i (ξ α ,α∈χ i ). (3.51) From this point, we denote I † for I that is generated from (3.50). Notice that the representation of P † i (ξ α ,α∈ χ) in (3.49) allows us to construct I † and {Ξ † α ,α∈ χ} with distribution P † (i,ξ α ,α∈ χ) parallel to I and{Ξ α ,α∈ χ} with distribution P (i,ξ α ,α∈ χ) in (3.48). That is, we first choose I † by P † (I = i) and then generate{Ξ † α ,α∈χ i } following distribution P † i (ξ α ,α∈χ i ) given I † =i. Finally, we generate{Ξ † α ,α / ∈ χ i } according to P i c |i (ξ α ,α6∈ χ i |ξ α ,α∈ χ i ). As 53 the last term in (3.48) and (3.49) are exactly the same, the reader will see in the construction below that this equality will allow us to set Ξ α = Ξ † α for most α, yielding that UY † + (1−U)Y ‡ and Y 0 are close. Construction of an approximate zero bias coupling (Y 0 ,Y ∗ ): Now we construct an approximate zero bias variable Y ∗ from Y 0 , starting with its underlying permutations. Recall again that we consider the symbols π and Y interchageable withπ 0 andY 0 , respectively. Letπ have the EwensE θ distribution. By replacinga i,j bya i,j −a •,• we may assume that (3.16) is satisfied, that is, that EY 0 = 0 and a •,• = 0 and thus M = max i,j |a i,j |. 
Now we follow the outline described right before this construction to produce a coupling of Y 0 to a pair Y † ,Y ‡ , with the square bias distribution (3.8) and then construct a coupling ofY 0 toY ∗ satisfying (3.5), using uniform interpolation as in Lemma 3.1.1. We first provide a brief overview of the construction, providing the full details later. To specialize the outline above to the case at hand, we let f = b with b given in Lemma 3.4.5, I = (I,J), i = (i,j), Ξ α = π(α), Ξ † α = π † (α), χ = [n] and ξ α ,α ∈ [n] denote distinct elements of [n]. As in Lemma 3.4.5, we write π(−α) =π −1 (α) for α∈ [n] and let χ i =χ i,j ={−i,−j,i,j} and p i,j (ξ α ,α∈χ i,j ) =P (π(α) =ξ α ,α∈χ i,j ), the distribution of the pre and post images of i and j under π. By the decomposition in (3.48) and Lemma 3.4.6,π 0 andπ 00 can be constructed by choosing I with P (I = i) uniformly on [n], then constructing the pre and post images of I and J under π 0 and the values of π 0 on the remaining variables conditional on what has already been chosen and finally letting π 00 =τ I,J π 0 τ I,J . 54 For the distribution of the pair (Y † ,Y ‡ ) with the square bias distribution, we will first construct the underlying permutations (π † ,π ‡ ). For this purpose, we follow the parallel decomposition of P † (i,ξ α ,α∈ χ) in (3.49) beginning with the indices I † = (I † ,J † ) with distribution (3.50), P (I † =i,J † =j) = P (I =i,J =j)Eb 2 (i,j,π(α),α∈χ i,j ) Eb 2 (I,J,π(α),α∈χ I,J ) (3.52) whereb(i,j,π(α),α∈χ i,j ) is given in Lemma 3.4.5 andEb 2 (I,J,π(α),α∈χ I,J ) = E[(Y 0 −Y 00 ) 2 ] has been calculated in Lemma 3.4.8. Next, givenI † =i andJ † =j, we generate their pre and post images π −† (I † ),π −† (J † ),π † (I † ),π † (J † ) with distri- bution (3.51), p † i,j (ξ α ,α∈χ i,j ) = b 2 (i,j,ξ α ,α∈χ i,j ) Eb 2 (i,j,π(α),α∈χ i,j ) p i,j (ξ α ,α∈χ i,j ). (3.53) Next we will construct the remaining images ofπ † fromπ conditional on what has already been chosen so that I † and π † follow (3.49), and that π and π † are close. Specializing the last factor of (3.49) to the case at hand, the remaining images of π † given what has already been generated have distribution p † i c |i (ξ α ,α6∈χ i,j |ξ α ,α∈χ i,j ) =P π † (α) =ξ α ,α6∈χ i,j |π † (α) =ξ α ,α∈χ i,j (3.54) with P follows the original EwensE θ distribution. As the approximate λ,R-Stein pair (Y 0 ,Y 00 ) is defined as in (3.13) with π replaced by π 00 =τ I,J π 0 τ I,J for Y 00 , we will then letY † andY ‡ be as in (3.13) withπ replaced byπ † andπ ‡ =τ I † ,J †π † τ I † ,J †, respectively. ApplyingLemma3.1.2, (Y † ,Y ‡ )willhavethesquarebiasdistribution as in (3.8). 55 Next we construct π † from the given π 0 . Recall from Section 3.3 that, with B⊂ [n], the reduced permutationσ\B acts on [n]\B and has cycle representation obtained by deleting all elements of B in the cycle representation of σ. In view of Proposition 3.3.2, we will keep the number of cycles #(π † \χ i,j ) of the reduced permutation π † \χ i,j , the same as #(π 0 \χ i,j ) when applying (3.54). For ease of notation, we denote the values I † ,J † ,π −† (I † ),π −† (J † ),π † (I † ),π † (J † ) (3.55) generated in (3.52) and (3.53) by i,j,r,s,k,l, respectively. By (3.52), i6= j and thusr6=sandk6=l. Lemma3.4.4givesthat,forπ † ,membershipinA 0 ,...,A 4 and A 5,1 ,...,A 5,4 , defined in Lemma 3.4.4, is determined by the generatedi,j,r,s,k,l. The remaining specification ofπ † fromπ depends on which case, or subcase, of the events A 0 ,A 1 , A 2 , A 3 , A 4 , A 5 is obtained. 
In each instance, we construct π † from π by removing i,j,r,s from π’s cycle representation and inserting them into the resulting reduced permutation so that, respectively,r andk will be the pre and post images of i, and s and l will be those of j. As we only delete i,j,r,s from π and then insert these values without moving the remaining ones when constructing π † , it is clear thatπ † \{i,j,r,s} andπ\{i,j,r,s} are identical. As the distribution of I † and their pre and post images follow (3.50) and (3.51), respectively, and the conditional distribution (3.54) of the remaining values is the same as that for those of π, the pair (I † ,π † ) follows the distribution in (3.49). In the following, we explicitly construct π † from π by separating into cases, or subcases, of the events A 0 ,A 1 , A 2 , A 3 , A 4 , A 5 . To be more easily understandable, wealsoprovideanexampleinFigure3.1. Althoughthereareseveralcases, theidea in each case follows the same rule explained in the previous paragraph. Therefore, 56 we only present the full detail in the first nonzero case A 1 and remark for the reader that the last case A 5,4 is the only one that contributes to the orders of the bounds in Theorem 3.4.1 and hence it suffices for the reader to read only the first nonzero and the last cases. Below, we use the notation i→j if we change the cycle structure of π so that π † (i) =j. We say ‘deletei fromπ’ if we deletei from the cycle structure ofπ and connect π −1 (i)→ π(i) so that we end up with the reduced permutation π\{i}. We say ‘inserti in front ofk’ if we puti betweenπ −1 (k) andk, that is, we end up with π −1 (k)→i→k after the insertion. Case A 0 : As b 0 = 0 by Lemma 3.4.5, p † i,j as in (3.53) in the case A 0 is zero and therefore we need not consider this case. Case A 1 : We separate this case into two subcases, A 1,1 ={|i| = 1,|j| = 2} and A 1,2 ={|i| = 1,|j|≥ 3}. For A 1,1 , we have (i,j,r,s,k,l) = (i,j,i,s,i,s). We first recall that from (3.55) r is the pre image ofi and thusr =i meansi must be a fixed point. Similarly, since s and l are pre and post images of j, respectively, l = s6= j implies that (s,j) must be a 2-cycle. Hence, in this case, we simply delete i,j,s from π, then let i be a 1-cycle and (s,j) be a 2-cycle. For A 1,2 , we have (i,j,r,s,k,l) = (i,j,i,s,i,l). By the same reasoning as for A 1,1 , we keep l at the original place and delete i,j,s fromπ, then leti be a 1-cycle and inserts→j in front of l. For the remaining unmentioned values of the permutation, we keep them at the original places. Now we call the modified permutation π † . Since we have moved only i,j ands, the reduced permutations π † \{i,j,s} andπ\{i,j,s} are exactly the same. Applying Proposition 3.3.2 with B ={i,j,s}, we have that the remaining images of π † , conditioning on{π † (i) = i,π † (j) = l,π † (s) = j} has 57 distribution p † i c |i (ξ α ,α6∈{−i,−j,i,j}|ξ −i = i,ξ −j = s,ξ i = i,ξ j = l) in (3.54). Therefore the distribution of (I † ,π † ) in A 1 case follows (3.49) and we have π(φ) =π † (φ) for all φ6∈I 1 where I 1 ={π −1 (α),β :α∈{i,j,s,l},β∈{i,j,s}}. Hence|I 1 | is at most 7. From this point, for any indexγ, we denoteI γ for the set that π(φ) =π † (φ) for all φ6∈I γ in the case A γ . Case A 2 : Switching the role ofi andj, we follow the same construction as for A 1 and thus|I 2 | is also at most 7. Case A 3 : Again we separate this case into A 3,1 ={|i| = 3,π † (i) = j} and A 3,2 ={|i|≥ 4,π † (i) =j}. 
For A 3,1 , (i,j,r,s,k,l) = (i,j,r,i,j,r), we delete i,j,r from π and let them be a 3-cycle r → i→ j → r. For A 3,2 , (i,j,r,s,k,l) = (i,j,r,i,j,l), we delete i,j,r from π and insert r→ i→ j in front of l. By the same reasoning as in A 1 , the remaining images of π † , conditioning on{π † (i) = j,π † (j) = l,π † (r) = i} has distribution p † i c |i (ξ α ,α6∈{−i,−j,i,j}|ξ −i = r,ξ −j = i,ξ i = j,ξ j = l) in (3.54) and thus the distribution of (I † ,π † ) in A 3 case follows (3.49). In this case, π(φ) =π † (φ) for all φ6∈I 3 where I 3 ={π −1 (α),β :α∈{i,j,r,l},β∈{i,j,r}}. Hence|I 3 | is at most 7. Case A 4 : Switching the role ofi andj, we follow the same construction as for A 3 and so|I 4 | is at most 7. Moving on to A 5 , here we separate it into A 5,1 , A 5,2 , A 5,3 and A 5,4 , defined in Lemma 3.4.4. 58 Figure 3.1: This figure shows an example of the construction of π † and π ‡ from π whenn = 15 and (i,j,r,s,k,l) = (1, 5, 2, 4, 2, 6) that corresponds to the case A 5,2 . One can easily notice that the reduced permutationsπ\{1, 5, 2, 4},π † \{1, 5, 2, 4}, and π ‡ \{1, 5, 2, 4} have the same cycle structure, that is, one 1-cycle, one 2-cycle and two 4-cycles. Case A 5,1 : Here (i,j,r,s,k,l) = (i,j,r,s,r,s), we delete i,j,r,s from π and then let both (r,i) and (s,j) be 2-cycles. By the same argument as in the previous cases, the distribution of (I † ,π † ) follows (3.49) and we end up with π(φ) =π † (φ) for all φ6∈I 5,1 where I 5,1 ={π −1 (α),α :α∈{i,j,r,s}}. Thus|I 5,1 | is at most 8. 59 Case A 5,2 : As (i,j,r,s,k,l) = (i,j,r,s,r,l), we delete i,j,r,s from π and then let (r,i) be a 2-cycle and insert s→ j in front of l. In this case, again the distribution of (I † ,π † ) follows (3.49) and π(φ) =π † (φ) for all φ6∈I 5,2 where I 5,2 ={π −1 (α),β :α∈{i,j,r,s,l},β∈{i,j,r,s}}. Hence|I 5,2 | is at most 9. Case A 5,3 : Switching the role of i and j, we follow the same construction as for A 5,2 and thus|I 5,3 | is at most 9. Case A 5,4 : We again need to break this case into subcases depending on the size of{i,j,r,s,k,l}. As the case that contributes the most to the difference between π and π † is the case where i,j,r,s,k,l are all distinct, we only discuss the construction of this particular case. We delete i,j,r,s from π and then insert r→i and s→j in front of k and l, respectively. In this case, the distribution of (I † ,π † ) again follows (3.49) and π(φ) =π † (φ) for all φ6∈I 5,4 where I 5,4 ={π −1 (α),β :α∈{i,j,r,s,k,l},β∈{i,j,r,s}}. Thus|I 5,4 | is at most 10. Now the construction of π † has been specified in every case and subcase and the distribution of (I † ,π † ) follows (3.49). Therefore, setting π ‡ =τ I † ,J †π † τ I † ,J † 60 results in a collection of variablesI † ,J † and a pair of permutations with the square bias distribution (3.7). Hence, letting Y 0 ,Y † ,Y ‡ given by (3.13) with π 0 , π † and π ‡ , respectively, yields a coupling of Y 0 to the variables Y † ,Y ‡ with the square bias distribution as required in Lemma 3.1.1. Invoking Lemma 3.1.1 to Y ∗ = UY † + (1−U)Y ‡ with U∼U[0, 1] be independent of Y † ,Y ‡ , we have Y ∗ be an approximate Y 0 -zero bias variable satisfying (3.5), as desired. The following lemma provides a bound of the difference between Y ∗ and Y 0 . Lemma 3.4.10. Let Y 0 and M be defined as in the statement of Theorem 3.4.1 and Y ∗ be constructed as in the construction above. Then|Y ∗ −Y 0 |≤ 20M. Proof. 
By the same argument as in the proof of Lemma 6.10 in [21], we have |Y ∗ −Y 0 |≤ 2 max γ∈Γ |I γ |M, where Γ ={1, 2, 3, 4, (5, 1), (5, 2), (5, 3), (5, 4)}. Then the claim follows from that |I γ | is at most 10 from the construction above. Now we have all ingredients to prove our main theorem. Proof of Theorem 3.4.1 As before, by replacing a i,j by a i,j − a •,• , we may assume that (3.16) is satisfied, that is, that EY 0 = 0 and a •,• = 0 and thus proceedtheconstructionwithM = max i,j |a i,j |. Firstweconstructanapproximate 4/n,R-Stein pair (Y 0 ,Y 00 ) as in Lemma 3.4.6 with the remainderR given in (3.25). Then we construct an approximate zero bias variable Y ∗ satisfying (3.5) as in the construction right above this proof. Now we apply Theorems 3.2.1 and 3.2.2, handling three terms on the right hand side of (3.9) and (3.10). We note that the three terms from the two theorems 61 are different only on their constants so we handle both L 1 and L ∞ upper bounds at the same time. For the first term, by Lemma 3.4.10, we have 1 σ |Y ∗ −Y 0 |≤ 20M σ . (3.56) Now we handle the last two terms containing the remainderR. Applying (3.35) and (3.34) in Lemma 3.4.7, respectively, and using that λ = 4/n and n≥ 6, we obtain |EY 0 R| σ 2 λ ≤ 3κ θ,n,1 + 1.2θ + κ θ,n,1 (θ + 1) +κ θ,n,2 n− 1 ! M σ , (3.57) and E|R| σλ ≤ 0.6Mθ(6n + 4θ− 5) σ(θ +n− 1) + 0.5θ 2 Mn σ(θ +n− 1) (2) , (3.58) where κ θ,n,1 and κ θ,n,2 are given in (3.19) and (3.20), respectively. Invoking Theorems 3.2.1 and 3.2.2, using (3.56), (3.57) and (3.58), now yields the L 1 and L ∞ upper bounds in (3.17) and (3.18), respectively. Next we follow the idea in [34] to prove the L ∞ lower bound in (3.21) in the case that a i,j are all integers. By Chebyshev’s inequality, P (EY− √ 3σ<Y <EY + √ 3σ)≥ 2/3. The random variableY is discrete having at mostn! possibilities and for each pair of possibilities, the difference between them is at least 1 as a i,j are all integers. 62 Therefore the interval (EY− √ 3σ,EY + √ 3σ) contains at most l (2 √ 3σ + 1) m pos- sible values ofY which implies that the largest point massp(n) in the distribution of Y satisfies p(n)≥ 2 6 √ 3σ + 3 . As Φ(x) the distribution function of Z is continuous, we have sup ∞<x<∞ |P (Y <x)− Φ((x−EY )/σ)|≥ 1 2 p(n) = 1 6 √ 3σ + 3 . 63 Chapter 4 Stein’s Method for Positively Associated Random Fields “In order to carry a positive action we must develop here a positive vision.” Dalai Lama Positive association is one kind of dependence between random variables that has many interesting applications. In this chapter, we provide an overview of posi- tively associated random variables and bounds on distances between such variables and the normal using Stein’s method. We leave applications of our main results to four well known models in mathematical physics for the next chapter. This chapter and the next one are based on the recent paper [51]. 4.1 Positive Association A random vector ξ = (ξ 1 ,...,ξ m ) ∈ R m is said to be positively associated whenever Cov ψ(ξ),φ(ξ) ≥ 0 (4.1) for all real valued coordinate-wise nondecreasing functions ψ and φ on R m such thatψ(ξ) andφ(ξ) posses second moments. In general, a collection{ξ α :α∈I} of real valued random variables indexed by a set I is said to be positively associated 64 if all finite subcollections are positively associated. Positive association was intro- duced in [35] and has appeared frequently in probabilistic models in several areas, especially statistical physics. 
In some literature, positive association is known as the ‘FKG-inequality’ or simply ‘association’ (see [69] and [24] for examples, respec- tively). Over the last few decades, several papers established central limit theorems and rates of convergence for sums of positively associated random variables under different assumptions. To fix terms, for d a positive integer let Z d denote the d-dimensional lattice composed of vectors i = (i 1 ,...,i d ) having integer compo- nents. To state some of these results, recall that a random field{X j : j∈Z d } is called second order stationary when EX 2 j <∞ for all j∈ Z d and the covariance Cov X i ,X j depends only onj−i, that is, Cov X i ,X j =R(j−i) for alli,j∈R d , with R(·) necessarily given by R(k) = Cov X 0 ,X k . (4.2) We say the field is strictly stationary if for all m∈N 1 and k,j 1 ,...,j m ∈Z d , the vector (X j 1 ,...,X jm ) has the same distribution as (X j 1 +k ,...,X jm+k ). From the definitions, when second moments exist, strict stationarity implies second order stationarity, and if a second order stationary field is positively associated, then R(k)≥ 0 for all k∈Z d . We let 1∈Z d denote the vector with all components 1, and write inequalities such as a < b for vectors a,b∈ R d when they hold componentwise. For k∈ 65 Z d ,n∈N 1 and a random field{X j :j∈Z d }, define the ‘block sum’ variables, over a block with side length n, by S n k = X j∈B n k X j where B n k = n j∈Z d :k≤j<k +n1 o . (4.3) Note that B n k =B n 0 +k. For a second order stationary field with R(·) given by (4.2), we have Var S n k = X i,j∈B n k Cov X i ,X j =n d A n where A n = 1 n d X i,j∈B n 1 R(i−j). (4.4) With A = X k∈Z d R(k), (4.5) Lemma4of[69]showsR(0)<∞,R(k)≥ 0andA<∞imply lim n→∞ A n =A. As A n orA equals zero only if the field is trivial, we assume without loss of generality in the following that A n > 0 for all n∈N 1 and A> 0. The following result of [69] shows that the collection of properly standardized block sums S n k of a strictly stationary, positively associated random field with covariance given by (4.2) converges jointly to independent normal variables when A<∞. Theorem 4.1.1 ([69]). Let{X j :j∈Z d } be a positively associated strictly station- ary random field with finite second moments and covariance R(k) given by (4.2) that satisfies A<∞ where A is given in (4.5). Then the standardized block sums ( S n nk −ES n nk n d/2 :k∈Z d ) (4.6) 66 converge to independent mean zero normal distributions with variance A as n→ ∞. That is, the expectations of bounded continuous functions of (4.6) that depend on only finitely many coordinates converge to the expectation of that same number of independent mean zero, variance A normal variables. We note that the blocks B n nk over which sums are taken in Theorem 4.1.1 are disjoint, and that the limiting distribution has independent coordinates. As an application of our Theorem 4.3.1, in Theorem 5.1.2 we obtain bounds for the sums over overlapping blocks to an approximating dependent multivariate normal distribution. Rectangular blocks, that is, blocks with varying side lengths, can also be accommodated by our methods, at the cost of some additional complexity in our computations. The stationarity assumption was relaxed in [24], where it was shown that the block sums of a positively associated random field{X j : j∈Z d } converge to the normal whenever X j has finite third moment for every j∈Z d and u(n) converges to zero as n→∞ where u(n) = sup j∈Z d X i∈Z d :|j−i|∞≥n Cov X i ,X j . 
(4.7) The rate of convergence for sums of positively associated variables was first obtained in [11], which achieved the rate log(n)/ √ n in the L ∞ metric in the one dimensional case, assuming uniformly finite 3 +-moments for some > 0, and thatu(n) decays at an exponential rate. The following theorem of [15] generalizes the results in [11] to the multidimensional case over Z d . In the following C will denote a constant whose value may change from line to line. 67 Theorem 4.1.2 ([15]). Let > 0 and{X j : j∈ Z d } be a mean zero, positively associated random field whose elements have uniformly bounded 3 + moments. Assume that there exists λ> 0 such that u(n) as defined in (4.7) satisfies u(n)≤κ 0 e −λn for some κ 0 > 0. Then for any finite subset V⊂Z d and S(V ) = X j∈V X j with σ 2 (V ) = Var(S(V )), one has d ∞ L(S(V )/σ(V )),L(Z) ≤C|V|σ(V ) −3 (log(|V| + 1)) d where Z is a standard normal random variable, |V| denotes the size of V and C > 0 is a constant depending only on , d, κ 0 and λ. In this dissertation, our goal is to obtain L 1 results on four models in mathe- matical physics discussed in the next chapter. To reach our goal, we first develop a version of Stein’s method for sums of bounded positively associated random variables in the following two subsections. Before digging into the concrete appli- cations, in the first part of the next chapter, we will provide bounds in the L 1 distance to the normal for a sum of elements of a positively associated random field inZ d , under the condition that covariances decay at an exponential rate. The result will be applicable to three of the four models we considered. The voter model in Section 5.5 is exceptional and it illustrates that our methods apply in the absence of exponential decay. 68 4.2 L 1 Bounds Our main result in the one dimensional case is the following. Theorem 4.2.1. Letξ = (ξ 1 ,...,ξ m ) be a positively associated mean zero random vector with components obeying the bound|ξ i |≤ B, and whose sum W = P m i=1 ξ i has variance 1. Let Z be a standard normal random variable. Then d 1 L(W ),L(Z) ≤ 5B + s 8 π X i6=j σ ij where σ ij =E[ξ i ξ j ]. (4.8) To prove our result, we first recall that a pair of random variables (X,Y ) is said to be positive quadrant dependent, or PQD, if for all x,y∈R we have H(x,y)≥ 0 where H(x,y) =P (X >x,Y >y)−P (X >x)P (Y >y). It was shown in [56] (see also Lemma 2 of [63]), that if (X,Y ) is PQD and if both X and Y have finite second moments then Cov X,Y = Z R Z R H(x,y)dxdy. In particular, if (X,Y ) is PQD then Cov X,Y ≥ 0, and if Cov X,Y = 0 then X and Y are independent. The following two lemmas are invoked in the proofs of Theorem 4.2.1 and Theorem 4.3.1 in the next section. Lemma 4.2.2 is Lemma 3 of [69] which allows us to bound Cov φ(X),ψ(Y ) by a constant times Cov X,Y for a PQD pair (X,Y ) and φ and ψ sufficiently smooth. 69 Lemma 4.2.2 ([69]). Let the random variables X,Y have finite second moments and be PQD. Then for any real valued functions φ and ψ that are absolutely con- tinuous on all finite subintervals ofR, Cov φ(X),ψ(Y ) ≤|φ 0 | ∞ |ψ 0 | ∞ Cov X,Y . (4.9) As for any fixed u the indicator 1 X>u is increasing in X, when X and Y are positively associated then they are PQD. As positive association is preserved by coordinate increasing functions, the following lemma is immediate. Lemma 4.2.3. The pair (X,Y ) = (ψ(ξ),φ(ξ)) is PQD whenever ξ is positively associated and ψ(ξ) and φ(ξ) are coordinate wise increasing functions ofξ. 
Recall from (2.1) that the L 1 distance has its alternative characterization, see [75] for example: d 1 L(X),L(Y ) = sup h∈L |Eh(X)−Eh(Y )|, (4.10) where L ={h :|h(y)−h(x)|≤|y−x|} which is the form that we are going to apply in the proof that follows . Proof of Theorem 4.2.1 For given h∈L let f be the unique bounded solution to the Stein equation f 0 (w)−wf(w) =h(w)−Nh where Nh =Eh(Z), (4.11) withL(Z) the standard normal distribution. Then, (see e.g. [21] Lemma 2.4), |f 0 | ∞ ≤ s 2 π and |f 00 | ∞ ≤ 2. (4.12) 70 As Var(W ) = P i,j σ ij = 1, we obtain E[f 0 (W )] = E m X i=1 σ 2 i f 0 (W ) + X i6=j σ ij f 0 (W ) = E m X i=1 ξ 2 i f 0 (W ) + X i6=j σ ij f 0 (W ) + m X i=1 (σ 2 i −ξ 2 i )f 0 (W ) . Now letting W i =W−ξ i , write E[Wf(W )] =E m X i=1 ξ i f(W ) = E m X i=1 ξ i f(W i +ξ i ) = E m X i=1 ξ i f(W i ) +ξ 2 i Z 1 0 f 0 (W i +uξ i )du . Recalling the Stein equation (4.11) and subtracting, we obtain E[h(W )−Nh] =E[f 0 (W )−Wf(W )] =E m X i=1 ξ 2 i Z 1 0 f 0 (W )−f 0 (W i +uξ i ) du + m X i=1 (σ 2 i −ξ 2 i )f 0 (W ) + X i6=j σ ij f 0 (W )− m X i=1 ξ i f(W i ) . (4.13) Using the second inequality in (4.12), we bound the first term in (4.13) by E m X i=1 ξ 2 i Z 1 0 f 0 (W )−f 0 (W i +uξ i ) du = E m X i=1 ξ 2 i Z 1 0 Z ξ i uξ i f 00 (W i +t)dtdu ≤ 2E m X i=1 ξ 2 i Z 1 0 Z |ξ i | u|ξ i | dtdu ! =E m X i=1 |ξ i | 3 ≤BE m X i=1 ξ 2 i ≤B, (4.14) using the almost sure bound on the variablesξ i , and that their sum has mean zero and variance 1. 71 To handle the second term in (4.13), first note that W and ξ i are coordinate wise increasing functions of ξ, and hence PQD by Lemma 4.2.3. Now applying Lemma 4.2.2 with g(x) = x 2 |x|≤B B 2 |x|>B and again using the second inequality in (4.12), we have E m X i=1 f 0 (W )(σ 2 i −ξ 2 i ) = m X i=1 Cov f 0 (W ),g(ξ i ) ≤ 4B m X i=1 Cov W,ξ i = 4B m X i=1 m X j=1 σ ij = 4B, (4.15) using that Var W = P i,j σ ij = 1. For the third term in (4.13), using the nonnegativity of the covariancesσ ij and the first inequality in (4.12) we obtain E X i6=j σ ij f 0 (W ) =|Ef 0 (W )| X i6=j σ ij ≤ s 2 π X i6=j σ ij . (4.16) For the final term in (4.13), we note that the variablesW i andξ i are coordinate wise increasing in ξ, hence the pair (W i ,ξ i ) is PDQ by Lemma 4.2.3. Applying Lemma 4.2.2 and the first inequality in (4.12) now yields E m X i=1 ξ i f(W i ) = m X i=1 Cov ξ i ,f(W i ) ≤ s 2 π m X i=1 Cov ξ i ,W i = s 2 π X i6=j σ ij .(4.17) Summing the bounds (4.14)-(4.17) we find that|Eh(W )− Nh| is bounded by the right hand side of (4.8). Taking supremum over h ∈ L and using the characterization of the d 1 metric given in (4.10) completes the proof. 72 4.3 Multivariate Smooth Function Metric Bounds We also consider a version of Theorem 4.2.1 adapted to the multidimensional case, with the L 1 metric replaced by a smooth functions metric, following the development of Chapter 12 of [21]. For x∈R p let |x| 1 = p X i=1 |x i |, the L 1 vector norm, and recall that, for a real valued function ϕ(u) defined on the domainD,|ϕ| ∞ = sup x∈D |ϕ(x)|. We include in this definition the|·| ∞ norm of vectors and matrices, for instance, by considering them as real valued functions of their indices. Form∈N 0 , letL ∞ m (R p ) be the collection of all functionsh :R p →R such that for all k = (k 1 ,...,k p )∈N p 0 with|k| 1 ≤m, the partial derivative h (k) (x) = ∂ |k| 1 h ∂ k 1 x 1 ···∂ kp x p exists, and |h| L ∞ m (R p ) := max 0≤|k| 1 ≤m |h (k) | ∞ is finite. 
Let H m,∞,p ={h∈L ∞ m (R p ) :|h| L ∞ m (R p ) ≤ 1}, 73 and for random vectors X and Y inR p , define the smooth functions metric d Hm,∞,p L(X),L(Y) = sup h∈Hm,∞,p |Eh(X)−Eh(Y)|. (4.18) For a positive semidefinite matrix H we let H 1/2 denote the unique positive semidefinite square root of H. When H is positive definite, we write H −1/2 = (H 1/2 ) −1 . Our main multidimensional result is the following theorem: Theorem4.3.1. Withm,p∈N 1 , let{ξ i,j :i∈ [m],j∈ [p]} be positively associated mean zero random variables bounded in absolute value by some positive constant B. Let S = (S 1 ,S 2 ,...,S p ) where S j = P 1≤i≤m ξ i,j for j∈ [p] and assume that Σ = Var S is positive definite. Then d H 3,∞,p L(Σ −1/2 S),L(Z) ≤ 1 6 + 2 √ 2 p 3 B|Σ −1/2 | 3 ∞ p X j=1 Σ j,j + 3 √ 2 + 1 2 ! p 2 |Σ −1/2 | 2 ∞ p X j=1 X i,k∈[m],i6=k Cov (ξ i,j ,ξ k,j ) + 2 √ 2p 3 B|Σ −1/2 | 3 ∞ + 3 √ 2 + 1 2 ! p 2 |Σ −1/2 | 2 ∞ ! X j,l∈[p],j6=l Σ j,l , (4.19) where Z∼N (0,I p ), a standard normal vector inR p . To prove Theorem 4.3.1 we apply the following two lemmas. The first, con- tained in the remark explaining (12) in [69], provides a version of (4.9) for vector valued functions. Lemma 4.3.2 ([69]). Ifξ = (ξ 1 ,...,ξ n ) are positively associated then for all real valued differentiable functions φ and ψ onR n , Cov φ(ξ),ψ(ξ) ≤ 3 √ 2 n X i,j=1 ∂φ ∂ξ i ∞ ∂ψ ∂ξ j ∞ Cov ξ i ,ξ j . 74 The next lemma is a small variant of Lemma 2.6 of [21], due to [7]. Let Z be a standard normal random vector inR p . Forh :R p →R, letNh =Eh(Z) and for u≥ 0 define (T u h)(s) =Eh(se −u + √ 1−e −2u Z). We write D 2 h for the Hessian matrix of h when it exists. Lemma 4.3.3. For m≥ 3 and h∈L ∞ m (R p ) the function g(s) =− Z ∞ 0 [T u h(s)−Nh]du solves trD 2 g(s)−s·∇g(s) =h(s)−Nh, and for any 0≤|k| 1 ≤m |g (k) | ∞ ≤ 1 |k| 1 |h (k) | ∞ . Furthermore, for anyλ∈R p and positive definitep×p matrix Σ, f defined by the change of variable f(s) =g(Σ −1/2 (s−λ)) (4.20) solves trΣD 2 f(s)− (s−λ)·∇f(s) =h(Σ −1/2 (s−λ))−Nh, (4.21) 75 and satisfies |f (k) | ∞ ≤ p |k| 1 |k| 1 |Σ −1/2 | |k| 1 ∞ |h (k) | ∞ . In particular, if h∈H m,∞,p then |f (k) | ∞ ≤ p |k| 1 |k| 1 |Σ −1/2 | |k| 1 ∞ for all 0≤|k| 1 ≤m. (4.22) We apply the same technique as in the univariate case, along with Lemmas 4.3.2 and 4.3.3, to prove our main multivariate theorem. Proof of Theorem 4.3.1 Given h∈H 3,∞,p , let f be the solution of (4.21) given by (4.20) with λ = 0. Writing out the expressions in (4.21) yields E h h(Σ −1/2 S)−Nh i =E p X j=1 p X l=1 Σ j,l ∂ 2 ∂s j ∂s l f(S)− p X j=1 S j ∂ ∂s j f(S) =E p X j=1 Σ j,j ∂ 2 ∂s 2 j f(S) +E X j,l∈[p],j6=l Σ j,l ∂ 2 ∂s j ∂s l f(S)−E p X j=1 S j ∂ ∂s j f(S). (4.23) We consider the first term of (4.23) and deal with each term under the sum sep- arately for j = 1,...,p. Letting σ 2 i,j = Var ξ i,j and σ i,j;k,l = Cov ξ i,j ,ξ k,l , we have Σ j,j ∂ 2 ∂s 2 j f(S) = m X i=1 σ 2 i,j ∂ 2 ∂s 2 j f(S) + X i,k∈[m],i6=k σ i,j;k,j ∂ 2 ∂s 2 j f(S) = m X i=1 ξ 2 i,j ∂ 2 ∂s 2 j f(S) + X i,k∈[m],i6=k σ i,j;k,j ∂ 2 ∂s 2 j f(S) + m X i=1 (σ 2 i,j −ξ 2 i,j ) ∂ 2 ∂s 2 j f(S). (4.24) 76 Now, with S j∗i = S j −ξ i,j we may write the summands of the third term on the right hand side of (4.23) as S j ∂ ∂s j f(S) = m X i=1 ξ i,j ∂ ∂s j f(S) = m X i=1 ξ i,j ∂ ∂s j f(S 1 ,...,S j∗i ,...,S p ) + m X i=1 ξ 2 i,j Z 1 0 ∂ 2 ∂s 2 j f(S 1 ,...,S j∗i +uξ i,j ,...,S p )du. (4.25) Substituting (4.24) and (4.25) into (4.23), we obtain E h h(Σ −1/2 S)−Nh i =E p X j=1 m X i=1 ξ 2 i,j Z 1 0 ∂ 2 ∂s 2 j f(S)− ∂ 2 ∂s 2 j f(S 1 ,...,S j∗i +uξ i,j ,...,S p ) ! 
du +E p X j=1 m X i=1 (σ 2 i,j −ξ 2 i,j ) ∂ 2 ∂s 2 j f(S)−E p X j=1 m X i=1 ξ i,j ∂ ∂s j f(S 1 ,...,S j∗i ,...,S p ) +E p X j=1 X i,k∈[m],i6=k σ i,j;k,j ∂ 2 ∂s 2 j f(S) +E X j,l∈[p],j6=l Σ j,l ∂ 2 ∂s j ∂s l f(S). (4.26) We handle these five terms separately. For the first term in (4.26), using (4.22) we have E p X j=1 m X i=1 ξ 2 i,j Z 1 0 ∂ 2 ∂s 2 j f(S)− ∂ 2 ∂s 2 j f(S 1 ,...,S j∗i +uξ i,j ,...,S p ) ! du = E p X j=1 m X i=1 ξ 2 i,j Z 1 0 Z ξ i,j uξ i,j ∂ 3 ∂s 3 j f(S 1 ,...,S j∗i +t,...,S p )dtdu ≤ p 3 3 |Σ −1/2 | 3 ∞ E p X j=1 m X i=1 ξ 2 i,j Z 1 0 Z |ξ i,j | u|ξ i,j | dtdu = p 3 6 |Σ −1/2 | 3 ∞ p X j=1 m X i=1 E|ξ i,j | 3 ≤ p 3 6 |Σ −1/2 | 3 ∞ B p X j=1 m X i=1 Eξ 2 i,j ≤ p 3 6 |Σ −1/2 | 3 ∞ B p X j=1 Σ j,j , (4.27) 77 using the almost sure bound on the variables ξ i,j , and that their sum S j over i from 1 to m has mean zero. For the second term in (4.26), we have E p X j=1 m X i=1 (σ 2 i,j −ξ 2 i,j ) ∂ 2 ∂s 2 j f(S) = p X j=1 m X i=1 Cov ∂ 2 ∂s 2 j f(S),ξ 2 i,j ! = p X j=1 m X i=1 Cov ∂ 2 ∂s 2 j f(S,ξ i,j ),g(S,ξ i,j ) ! , where with some abuse of notation we let f(s,x) =f(s), and define g(s,x) = x 2 |x|≤B B 2 |x|>B, for all s∈R p and x∈R. Applying Lemma 4.3.2 and using the bound (4.22) and the fact that (S,ξ i,j ) are positively associated for all i,j, we obtain p X j=1 m X i=1 Cov ∂ 2 ∂s 2 j f(S),ξ 2 i,j ! ≤ 3 √ 2 p X j=1 m X i=1 p X l=1 ∂ 3 ∂s l ∂s 2 j f ∞ ∂g ∂x ∞ Cov (S l ,ξ i,j ) ≤ 2 √ 2p 3 |Σ −1/2 | 3 ∞ B p X j=1 m X i=1 p X l=1 Cov S l ,ξ i,j = 2 √ 2p 3 |Σ −1/2 | 3 ∞ B p X j,l=1 m X i,k=1 σ i,j;k,l = 2 √ 2p 3 |Σ −1/2 | 3 ∞ B p X j,l=1 Σ j,l . (4.28) 78 For the third term in (4.26), again applying Lemma 4.3.2 and arguing as for the second term, we have E p X j=1 m X i=1 ξ i,j ∂ ∂s j f(S 1 ,...,S j∗i ,...,S p ) = p X j=1 m X i=1 Cov ξ i,j , ∂ ∂s j f(S 1 ,...,S j∗i ,...,S p ) ! ≤ 3 √ 2 p X j=1 m X i=1 X l∈[p]\{j} ∂ 2 ∂s l ∂s j f ∞ Cov (ξ i,j ,S l ) + ∂ 2 ∂s 2 j f ∞ Cov (ξ i,j ,S j∗i ) ≤ 3 √ 2 p 2 |Σ −1/2 | 2 ∞ X j,l∈[p],j6=l m X i=1 Cov (ξ i,j ,S l ) + p X j=1 m X i=1 Cov (ξ i,j ,S j∗i ) = 3 √ 2 p 2 |Σ −1/2 | 2 ∞ X j,l∈[p],j6=l m X i,k=1 σ i,j;k,l + p X j=1 X i,k∈[m],i6=k σ i,j;k,j = 3 √ 2 p 2 |Σ −1/2 | 2 ∞ X j,l∈[p],j6=l Σ j,l + p X j=1 X i,k∈[m],i6=k σ i,j;k,j . (4.29) For the fourth and the fifth terms in (4.26), again using (4.22) we have p X j=1 X i,k∈[m],i6=k σ i,j;k,j ∂ 2 ∂s 2 j f(S) ≤ p 2 2 |Σ −1/2 | 2 ∞ p X j=1 X i,k∈[m],i6=k σ i,j;k,j (4.30) and E X j,l∈[p],j6=l Σ j,l ∂ 2 ∂s j ∂s l f(S) ≤ p 2 2 |Σ −1/2 | 2 ∞ X j,l∈[p],j6=l Σ j,l . (4.31) Summing the bounds (4.27)-(4.31) we find that E h h(Σ −1/2 S)−Nh i is bounded by the right hand side of (4.19). Taking supremum over h∈H 3,∞,p and using the definition (4.18) of d Hm,∞,p completes the proof. 79 Chapter 5 Applications to Statistical Physics Models “Mathematics is the door and key to the sciences.” Roger Bacon In this chapter, we focus on applications of the main results from the last chap- ter to the four well-known models in statistical physics and particle systems inZ d ; the ferromagnetic nearest-neighbor Ising model, the supercritical bond percolation model, the voter model and the contact process. In the Ising model, we obtain an L 1 bound between the total magnetization and the normal distribution at any tem- perature when the magnetic moment parameter is nonzero, and when the inverse temperature is below critical and the magnetic moment parameter is zero. 
In the percolation model we obtain such a bound for the total number of points in a finite region belonging to an infinite cluster in dimensions d≥ 2, in the voter model for the occupation time of the origin in dimensionsd≥ 7, and for finite time integrals of non-constant increasing cylindrical functions evaluated on the one dimensional supercritical contact process started in its unique invariant distribution. Stein’s method has been used previously in several papers in statistical physics. In [32], a version of Stein’s method using size bias couplings was developed for a class of discrete Gibbs measures, that is, probability measures onN 0 proportional to e V (k) for some function V : N 0 → R. This work provided bounds in the total 80 variation distance between the distribution of certain sums of strongly correlated random variables and discrete Gibbs measures, with applications to interacting particle systems. Stein’s method of exchangeable pairs was applied to the classical Curie-Weiss model for high temperatures and at the critical temperature in [30], where optimal Kolmogorov distance bounds were produced. The same result at the critical temperature was also obtained in [17]. These results were extended to the Curie-Weiss-Potts model in [31]. 5.1 Second order stationary positively associ- atedfieldswithexponentialcovariancedecay We begin with general results for a certain class of positively associated random fields which will be then applied to the Ising model, bond percolations and the contact process in the upcoming sections. Let{X j :j∈Z d } be a positively associated random field on the d-dimensional integer latticeZ d . Assume that the field is second order stationary, that is, for all j,k∈Z d the covariance Cov(X j ,X k ) exists and equals R(j−k) for some function R. With S n k defined in (4.3), consider the standardized variables W n k = S n k −ES n k √ n d A n , k∈Z d ,n∈N 1 , (5.1) that have mean zero and variance 1. The following theorem provides a bound of order n −d/(2d+2) with an explicit constant on theL 1 distance between the distribution ofW n k and the normal under the condition that the covariance function R(·) decays exponentially in the L 1 norm inR d . TheL 1 norm has been chosen for convenience in the proof of Lemma 81 5.1.4 as all norms in R d are equivalent, i.e., when inequality (5.2) holds for any norm with some λ> 0 then it holds for any other norm with λ replaced by some positive constant multiple. Theorem 5.1.1. Let d∈N 1 and{X j : j∈Z d } be a positively associated second order stationary random field with covariance function R(k) = Cov(X j ,X j+k ) for all j,k∈Z d , and suppose that for some K > 0 it holds that|X j |≤K a.s. for all j∈Z d . Assume that there exist λ> 0 and κ 0 > 0 such that R(k)≤κ 0 e −λ|k| 1 for all k∈Z d (5.2) and let μ λ = e λ (e λ − 1) 2 , υ λ = e 2λ (e λ − 1) 2 and γ λ,d = (4μ λ + 2υ λ ) d − (2υ λ ) d (5.3) and C λ,κ 0 ,d = 5Kd √ πA n √ 2κ 0 γ λ,d . Then, for anyk∈Z d , withW n k as given in (5.1) andZ a standard normal random variable, d 1 L(W n k ),L(Z) ≤ κ 1 n d/(2d+2) for all n≥ max n C 2/d λ,κ 0 ,d ,C −2/(d+2) λ,κ 0 ,d o , where, with A n is as in (4.4), κ 1 = 10Kκ d 0 γ d λ,d 2 3d/2 π d/2 A d+1/2 n ! 1/(d+1) 1 d d d+1 + 2d 1 d+1 ! . (5.4) 82 In the following we will make use of the identities n−1 X k=1 (n−k)w k = w ((n− 1)−nw +w n ) (w− 1) 2 for w6= 1, (5.5) n + n−1 X a=1 (n−a)(v a +v −a ) = v 1−n (v n − 1) 2 (v− 1) 2 for v6= 1, (5.6) and n + 2 n−1 X b=1 (n−b)u b = (1−u 2 )n− 2u + 2u n+1 (u− 1) 2 for u6= 1. 
(5.7) We have already taken A in (4.5), without loss of generality, to be positive, and the exponential decay condition (5.2) implies that it is also finite. Lemma 4 of [69] can now be invoked to yield that lim n→∞ A n =A. Hence the valuesA n are bounded away from zero and infinity as a function of n, and replacing A n by its limiting value A in (5.4) only affects the bound by a constant and does not alter the rate of convergence. In certain instances the values ofA n andA will be close even for the moderate values n, for instance if equality holds in (5.2), then using (5.7) with u = e −λ , which is not equal to one since λ> 0, as n→∞ we obtain A n = κ 0 n d X i,j∈B n 1 e −λ|i−j| 1 = κ 0 n d n X i 1 ,...,i d =1 j 1 ,...,j d =1 d Y k=1 e −λ|i k −i k | = κ 0 n d d Y k=1 n X i k ,j k =1 e −λ|i k −i k | = κ 0 n d d Y k=1 n−1 X a k =−n+1 (n−|a k |)e −λ|a k | = κ 0 n d n + 2 n−1 X b=1 (n−b)e −λb ! d =κ 0 1−e −2λ − 2e −λ n + 2e −λ(n+1) n (1−e −λ ) 2 d −→κ 0 1 +e −λ 1−e −λ ! d =κ 0 coth d (λ/2) :=A. 83 One can apply Theorem 4.1.2 of [15], to obtain a bound of order log d n/ √ n, without an explicit constant, on the Kolmogorov distance between W n k and the normal by showing thatu(n) as defined in (4.7) converges to zero at an exponential rate. Here we use Theorem 4.2.1 to obtain an L 1 result. We note that the technique in [15], first introduced in [84], presents some challenges when applied to obtain an L 1 bound. We consider the argument in the one dimensional case, as in [11], to illustrate the difficulty. Letting f n be the characteristicfunction ofW n 1 and following the proofin [11], oneobtains the bound f n (t)−e −t 2 /2 ≤C n −1/2 log(n)t 3 e −t 2 /4 +n −1 log 3/2 (n)t +n −3/2 t over the region 0≤ t≤ γn 1/2 / log(n), where γ is a constant defined in [11], and where here and in the following, C denotes a constant whose value may change from line to line. Applying Theorem 1.5.2 in [60] with T = n 1/2 / log(n), the Kolmogorov distance of order log(n)/n 1/2 was obtained in [11]. To get the bound inL 1 , one may apply Theorem 1.5.4 in [60] instead, however, the integrand of the term (3) in Theorem 1.5.4 Z T −T d dt f n (t)−e −t 2 /2 t ! 2 dt includes f n (t)−e −t 2 /2 2 t 4 , which is not integrable. WealsoextendTheorem5.1.1tothemultidimensionalcase. Foranyp∈N 1 and indicesk 1 ,...,k p inZ d satisfying a separation condition to ensure non-degeneracy of the limiting distibution, Theorem 5.1.2 provides a bound in the metric d H 3,∞,p to the multivariate normal for S n = (S n k 1 ,...,S n kp ) under exponential decay of the covariance function and a condition limiting the amount of variables common to the coordinates making up the sumsS n . Regarding multidimensional convergence, 84 under strictly stationary Theorem 4.1.1 of [69] shows that (W n nk 1 ,...,W n nkp ) con- verges to a standard normal random vector in R p , but does not provide a bound. In the following result and its proof, constants will not be tracked with precision, but will be indexed by the set of variables on which it depends. Theorem 5.1.2. For d∈ N 1 , let{X j : j∈ Z d } be a positively associated second order stationary random field with covariance function R(k) = Cov(X j ,X j+k ) for allj,k∈Z d , and suppose that there exists constantsK > 0,κ 0 > 0 andλ> 0 such that|X j |≤K a.s. for all j∈Z d and R(k)≤κ 0 e −λ|k| 1 for all k∈Z d . For p∈N 1 let k 1 ,...,k p ∈Z d be such that min q,s∈[p],q6=s |k q −k s | ∞ ≥ (1−α)n, (5.8) for some 0 < α < 1 with αn an integer. 
Let S n = (S n k 1 ,...,S n kp ), where S n k is defined as in (4.3) and assume that the covariance matrix Σ of S n is invertible, and let ψ n =n d/2 |Σ −1/2 | ∞ . 85 Then, there exists a constantC λ,κ 0 ,d,p,K such that withZ a standard normal random vector inR p , d H 3,∞,p L(Σ −1/2 (S n −ES n )),L(Z) ≤C λ,κ 0 ,d,p,K (A n +α) 1 d+1 ψ 2d+3 d+1 n 1 d d d+1 + 2d 1 d+1 ! n − d 2(d+1) +αψ 2 n ! for all n≥ max n B 2/d n,d,α ,B −2/(d+2) n,d,α o where B n,d,α =dψ n (A n +α). (5.9) For checking the hypothesis that the covariance matrix of S n is invertible, and that the quantity ψ n is of order one, and so does not affect the rate in (5.9) and also so that B n,d,α is order one, we present the following condition, proved at the end of this section. We note in addition that if ψ n = O(1) and α = O(n − d 2(d+1) ) then the bound in (5.9) will also be of order O(n − d 2(d+1) ). Lemma 5.1.3. Let{X j : j∈Z d } be a second order stationary random field with covariance function R(k) = Cov(X j ,X j+k ) for all j,k∈ Z d where R(·) satisfies (5.2). With n∈N 2 , the covariance matrix Σ of the random vector S n ∈R p ,p≥ 2 as given in Theorem 5.1.2 is invertible if for some b> 0 |k q −k s | ∞ ≥n−b for all q6=s,q,s∈ [p], and b< nA n (p− 1)κ 0 υ d λ , (5.10) with λ and κ 0 as in (5.2), and υ λ as in (5.3). If b =αn for α satisfying 0<α< min ( 1, A n (p− 1)κ 0 υ d λ ) (5.11) then the matrix Σ is invertible and |Σ −1 | ∞ ≤ 1 n d (A n − (p− 1)κ 0 υ d λ α) . (5.12) 86 The proofs of Theorems 5.1.1 and 5.1.2 proceed by decomposing the sum S n k over the blockB n k into sums over smaller, disjoint blocks whose side lengths are at most some integerl. In particular, given 1≤l≤n uniquely writen = (m−1)l +r for m≥ 1 and 1≤ r≤ l. We correspondingly decompose B n k into m d disjoint blocks D l i,k ,i∈ [m] d , where there are (m− 1) d ‘main’ blocks having all sides of lengthl, andm d − (m− 1) d remainder blocks having all sides of lengthr orl, with at least one side of length r. In detail, for k∈Z d and i∈ [m] d set D l i,k =D l i +k−1 where D l i = {j∈Z d : (i s − 1)l + 1≤j s ≤i s l,i s 6=m, (m− 1)l + 1≤j s ≤ (m− 1)l +r,i s =m}. It is straightforward to verify that for all i∈ [m] d with no component equal to m, which are the indices of the ‘main blocks’, that with B l 1 as in (4.3), we have D l i =B l 1 + (i−1)l for i∈ [m− 1] d , (5.13) and if r = l then D l i is given by (5.13) for all i∈ [m] d . Further, it is easy to see that the elements of the collection{D l i,k ,i∈ [m] d } are disjoint, and union to B n k . Letting ξ l i,k = X t∈D l i,k (X t −EX t ) and W n k = X i∈[m] d ξ l i,k √ n d A n for i∈ [m] d , (5.14) we see that ξ l i,k has mean zero, and W n k as in (5.14) agrees with its representation as given in (5.1), and has mean zero and variance one. For simplicity we will drop the index k in ξ l i,k when k =1, as we do also for D i,k , and may also suppress l. 87 By [35], when X = (X 1 ,...,X p ) is positively associated and if h k : R p → R, k ∈ [q] are coordinate-wise nondecreasing functions, then h 1 (X),...,h q (X) are positively associated. Hence, as the elements of{ξ i,k : i∈ [m] d } are increasing functions of{X j :j∈Z d } they are positively associated. We prove Theorems 5.1.1 and 5.1.2 with the help of two lemmas that follow. The first, Lemma 5.1.4 bounds the sum of the covariances between ξ l i,k and ξ l j,k , defined in (5.14), over i,j ∈ [m] d . Next, Lemma 5.1.6 bounds the covariance between two block sums of sizen d whose centers are at leastn−b apart in theL ∞ distance for some integer 1≤b≤n. 
Lemma 5.1.4. Let{X j :j∈Z d } be a second order, positively associated stationary random field with covariance functionR(k) = Cov(X j ,X j+k ) for allj,k∈Z d where R(·) satisfies the exponential decay condition (5.2). For n≥ 2 and 1≤l≤n, let n = (m− 1)l +r for integers m∈N 1 and 1≤ r≤ l. Then for k∈Z d , with ξ l i,k given by (5.14) we have X i,j∈[m] d ,i6=j E h ξ l i,k ξ l j,k i ≤ κ 0 γ λ,d n d l , where κ 0 is given in (5.2), and γ λ,d in (5.3). Proof. By second order stationarity, it suffices to consider the case k = 1. For convenience write σ i,j = E [ξ i ξ j ]. Since D l i ⊂ B l 1 + (i−1)l for all i∈ [m] d and Cov X t ,X s ≥ 0 for allt,s∈Z d by positive association, withS l i as given in (4.3), we have, for all i6=j∈ [m] d , σ i,j = Cov X t∈D l i X t , X s∈D l j X s ≤ Cov S l 1+(i−1)l ,S l 1+(j−1)l . (5.15) 88 Note that when i6=j in [m] d the pair (i,j) must lie in exactly one of the sets E m s for s∈ [d], given by E m s = n (i,j)∈ [m] d × [m] d :|{k∈ [d] :i k 6=j k }| =s o for s∈ [d]. Hence, using (5.15), X i,j∈[m] d ,i6=j σ i,j = d X s=1 X (i,j)∈E m s σ i,j ≤ d X s=1 X (i,j)∈E m s Cov S l 1+(i−1)l ,S l 1+(j−1)l .(5.16) Recalling the definition of the block sums in (4.3) and using second order sta- tionarity, for (i,j)∈E m s we have Cov S l 1+(i−1)l ,S l 1+(j−1)l = l X p 1 ,...,p d =1 q 1 ,...,q d =1 R (p 1 + (j 1 − 1)l)− (q 1 + (i 1 − 1)l) . . . (p d + (j d − 1)l)− (q d + (i d − 1)l) = l X p 1 ,...,p d =1 q 1 ,...,q d =1 R p 1 −q 1 + (j 1 −i 1 )l . . . p d −q d + (j d −i d )l ≤κ 0 l−1 X a 1 ,...,a d =−l+1 (l−|a 1 |)··· (l−|a d |) exp −λ a 1 + (j 1 −i 1 )l . . . a d + (j d −i d )l 1 , (5.17) where we have applied (5.2) in the final inequality. We now apply some invariance properties of the norm|x| 1 in order to simplify the sum of expression (5.17) when taken over pairs of indices in E m s . We say that 89 the pairs of indices (i,j) and (i 0 ,j 0 ) in E m s are equivalent if (i,j) can be made to agree with (i 0 ,j 0 ) by interchanging i k and j k for any values of k∈ [d] and then applying the same permutation to the resulting set of vectors. Note that asa k and −a k appear symmetrically in the sum (5.17), and that|·| 1 is invariant under sign changes and permutations of coordinates, equivalent vectors yield the same value of this sum. Call an index pair (i,j)∈E m s canonical ifi k <j k for allk = 1,...,s. Note that as pairs in E m s have exactly s inequalities among their coordinates that canonical indices must agree in their remaining, and last,d−s coordinates. Now associate to every pair (i,j)∈E m s the equivalent, unique canonical vector that is obtained from (i,j) by interchanging all coordinates k∈ [d] for which i k >j k , and then applying the permutation that maps the s unequal coordinates of the pair to positions 1, 2,...,s, thus mapping the equal pairs to the last d−s positions, leaving the relative order of each group unchanged. Partitioning the sum overE m s into smaller sums over all vectors that are equivalent to the same canonical vector yields a sum over all canonical vectors, with a factor of 2 s to account for the number arrangements of inequalities over the unequal pairs, and a factor of d s to account for the positions in which the unequal coordinates may occur. 
Hence, summing (5.17) over E m s , and using identities of the form X a 1 ∈A 1 ,...,a d ∈A d d Y q=1 f q (a q ) = d Y q=1 X aq∈Aq f q (a q ) that hold for any index setsI 1 ,...,I d and any functionsf q in order to interchange summations and products notating the index set to be consistent with the formula used above I s,m d = n (i,j)∈ [m] d × [m] d :i k <j k ,k∈ [s],i k =j k ,k∈ [d]/[s] o , 90 we obtain X (i,j)∈E m s Cov S l 1+(i−1)l ,S l 1+(j−1)l ≤ κ 0 d s ! 2 s X (i,j)∈I s,m d l−1 X a 1 ,...,a d =−l+1 (l−|a 1 |)··· (l−|a d |) exp −λ a 1 + (j 1 −i 1 )l . . . a d + (j d −i d )l 1 = κ 0 d s ! 2 s X (i,j)∈I s,m d l−1 X a 1 ,...,a d =−l+1 d Y q=1 (l−|a q |)e −λ|(jq−iq )l+aq| = κ 0 d s ! 2 s X (i,j)∈I s,m d d Y q=1 l−1 X aq =−l+1 (l−|a q |)e −λ|(jq−iq )l+aq| = κ 0 d s ! 2 s X (i,j)∈I s,m d s Y q=1 l−1 X aq =−l+1 (l−|a q |)e −λ(jq−iq )l−λaq d Y q=s+1 l−1 X aq =−l+1 (l−|a q |)e −λ|aq| , where in the final equality we have expressed the product in q as two separate products, and have noted in the first of these, as j k −i k ≥ 1 and a k ranges from −l + 1 to l− 1, the terms (j k −i k )l +a k appearing in the absolute value of the exponent are always positive. Now expanding the first multiple summation as an inner summation on the first s indices and outer sum over the final d−s indices, separating out the exponential terms in the first product and noting that the remaining product terms are purely mulitplicative factors, we obtain 91 κ 0 d s ! 2 s X 1≤i k =j k ≤m k=s+1,...,d X 1≤i k <j k ≤m k=1,...,s s Y q=1 e −λ(jq−iq )l × s Y q=1 l−1 X aq =−l+1 (l−|a q |)e −λaq d Y q=s+1 l−1 X aq =−l+1 (l−|a q |)e −λ|aq| = κ 0 d s ! 2 s m X i=1 1 ! d−s s Y q=1 X 1≤iq<jq≤m e −λ(jq−iq )l × s Y q=1 l−1 X aq =−l+1 (l−|a q |)e −λaq d Y q=s+1 l−1 X aq =−l+1 (l−|a q |)e −λ|aq| = κ 0 d s ! 2 s m d−s m−1 X k=1 (m−k)e −λkl ! s × l + l−1 X a=1 (l−a) e λa +e −λa ! s l + 2 l−1 X b=1 (l−b)e −λb ! d−s . By applying identity (5.5) with n = m and w = e −λl and identities (5.6) and (5.7) withn =l,v =e λ andu =e −λ , noting thatu,v andw are not one asλ> 0, the term above equals κ 0 d s ! 2 s m d−s e λ e −2λl e λl − 1 2 m− 1−me −λl +e −λlm (e λ − 1) 2 (1−e −λl ) 2 s × 1−e −2λ l− 2e −λ + 2e −λ(1+l) (1−e −λ ) 2 d−s ≤κ 0 d s ! 2 s m d e λ (e λ − 1) 2 ! s l (1−e −λ ) 2 ! d−s =κ 0 m d d s ! (2μ λ ) s (lυ λ ) d−s (5.18) 92 where μ λ and υ λ are given in (5.3). By (5.16) and (5.18), we have X i,j∈[m] d ,i6=j σ i,j ≤ κ 0 m d d X s=1 d s ! (2μ λ ) s (lυ λ ) d−s ≤ κ 0 n d l d X s=1 d s ! (4μ λ ) s (2υ λ ) d−s = κ 0 n d (4μ λ + 2υ λ ) d − (2υ λ ) d l where we have used the bounds m≤ 2n/l and 1/l s ≤ 1/l for all s∈ [d] in the second inequality. Lemma 5.1.5. For all n∈N 2 and λ> 0, n−1 X a=−n+1 (n−|a|)e −λ|q+a| is decreasing as a function of|q|∈N 0 . Proof. We note that the sequence f(a) = (n−|a|)1(|a|≤ n) is unimodal, and g(a) =e −λ|a| is log-concave, and hence their convolution h(q) = ∞ X a=−∞ f(a)g(q−a) = n−1 X a=−n+1 (n−|a|)e −λ|q−a| = n−1 X a=−n+1 (n−|a|)e −λ|q+a| is unimodal, see [13]. Since g(a) is symmetric unimodal, one can also use the fact from [81] that the convolution of two symmetric unimodal sequences is symmetric unimodal to prove the result. Below we index the coordinates of vectors k j ∈Z d as k j 1 ,...,k j d . Lemma 5.1.6. Let{X j : j∈Z d } be a second order stationary random field with covariance function R(k) = Cov(X j ,X j+k ) for all j,k∈ Z d where R(·) satisfies 93 (5.2). Let n∈N 2 and let b be an integer such that 1≤b≤n. 
Below we index the coordinates of vectors $k^j\in\mathbb{Z}^d$ as $k^j_1,\dots,k^j_d$.

Lemma 5.1.6. Let $\{X_j : j\in\mathbb{Z}^d\}$ be a second order stationary random field with covariance function $R(k)=\mathrm{Cov}(X_j,X_{j+k})$ for all $j,k\in\mathbb{Z}^d$, where $R(\cdot)$ satisfies (5.2). Let $n\in\mathbb{N}_2$ and let $b$ be an integer such that $1\le b\le n$. Then if $k^1$ and $k^2$ are vectors in $\mathbb{Z}^d$ such that $|k^1-k^2|_\infty\ge n-b$, then with $\lambda$ and $\kappa_0$ as in (5.2), and $\upsilon_\lambda$ as in (5.3),
\[
\mathrm{Cov}\big(S^n_{k^1},S^n_{k^2}\big) \le \kappa_0\,\upsilon_\lambda^d\,bn^{d-1}.
\]

Proof. Arguing as in the proof of Lemma 5.1.4 we have
\[
\mathrm{Cov}\big(S^n_{k^1},S^n_{k^2}\big) = \sum_{\substack{p_1,\dots,p_d=0\\ q_1,\dots,q_d=0}}^{n-1} R\big((p_1+k^2_1)-(q_1+k^1_1),\ \dots,\ (p_d+k^2_d)-(q_d+k^1_d)\big)
\]
\[
\le \kappa_0\sum_{a_1,\dots,a_d=-n+1}^{n-1}(n-|a_1|)\cdots(n-|a_d|)\exp\Big(-\lambda\big|\big(a_1+(k^2_1-k^1_1),\dots,a_d+(k^2_d-k^1_d)\big)\big|_1\Big)
= \kappa_0\prod_{i=1}^d\ \sum_{a_i=-n+1}^{n-1}(n-|a_i|)e^{-\lambda|(k^2_i-k^1_i)+a_i|}. \qquad (5.19)
\]
Lemma 5.1.5 yields that
\[
\sum_{a_i=-n+1}^{n-1}(n-|a_i|)e^{-\lambda|(k^2_i-k^1_i)+a_i|} \quad\text{is a decreasing function of } |k^1_i-k^2_i|. \qquad (5.20)
\]
In particular the $i^{\rm th}$ sum appearing in the product (5.19) is bounded by its value when $k^1_i=k^2_i$. As $|k^1-k^2|_\infty\ge n-b$, there must exist at least one $i$ for which $|k^2_i-k^1_i|\ge n-b$, and whose corresponding sum is maximized by its value when equality to $n-b$ is achieved, again using (5.20). The product of these sums, by (5.20) again, is maximized when there is just a single coordinate achieving $n-b$ as its absolute difference, and this difference in all other coordinates equals zero. Hence, by symmetry, the term above is bounded by the case where $k^1_i=k^2_i$ for $i\in[d-1]$ and $k^2_d-k^1_d=n-b$, and thus
\[
\mathrm{Cov}\big(S^n_{k^1},S^n_{k^2}\big) \le \kappa_0\prod_{i=1}^{d-1}\sum_{a_i=-n+1}^{n-1}(n-|a_i|)e^{-\lambda|a_i|}\ \sum_{a_d=-n+1}^{n-1}(n-|a_d|)e^{-\lambda|a_d+n-b|}
\le \kappa_0(n\upsilon_\lambda)^{d-1}\sum_{a_d=-n+1}^{n-1}(n-|a_d|)e^{-\lambda|a_d+n-b|}, \qquad (5.21)
\]
where we have applied (5.7) in the final inequality and $\upsilon_\lambda$ is given in (5.3).

Now considering the sum in (5.21) for $2\le b\le n$, we obtain
\[
\sum_{a=-n+1}^{n-1}(n-|a|)e^{-\lambda|a+n-b|} = \sum_{a=-n+1}^{-n+b-1}(n-|a|)e^{\lambda(a+n-b)} + \sum_{a=-n+b}^{n-1}(n-|a|)e^{-\lambda(a+n-b)}
\]
\[
= \sum_{a=n-b+1}^{n-1}(n-a)e^{\lambda(-a+n-b)} + \sum_{a=1}^{n-b}(n-a)e^{-\lambda(-a+n-b)} + \sum_{a=0}^{n-1}(n-a)e^{-\lambda(a+n-b)}.
\]
For the first sum, making a change of variable and applying (5.5), we obtain
\[
\sum_{a=n-b+1}^{n-1}(n-a)e^{\lambda(-a+n-b)} = e^{-\lambda b}\sum_{a=1}^{b-1}ae^{\lambda a} = e^{-\lambda b}\Big(b\sum_{a=1}^{b-1}e^{\lambda a} - \sum_{a=1}^{b-1}(b-a)e^{\lambda a}\Big)
= e^{-\lambda b}\left(\frac{b\big(e^{\lambda b}-e^\lambda\big)}{e^\lambda-1} - \frac{e^\lambda\big((b-1)-be^\lambda+e^{\lambda b}\big)}{(e^\lambda-1)^2}\right) = \frac{b}{e^\lambda-1} + \frac{e^{\lambda(1-b)}-e^\lambda}{(e^\lambda-1)^2},
\]
and similarly we may obtain
\[
\sum_{a=1}^{n-b}(n-a)e^{-\lambda(-a+n-b)} = \frac{be^\lambda}{e^\lambda-1} + \frac{e^\lambda + n(1-e^\lambda)e^{\lambda(1+b-n)} - e^{\lambda(1+b-n)}}{(e^\lambda-1)^2}
\]
and
\[
\sum_{a=0}^{n-1}(n-a)e^{-\lambda(a+n-b)} = \frac{e^{\lambda(1+b-2n)} - e^{\lambda(1+b-n)} + n(e^\lambda-1)e^{\lambda(1+b-n)}}{(e^\lambda-1)^2}.
\]
Summing these three terms yields
\[
\sum_{a=-n+1}^{n-1}(n-|a|)e^{-\lambda|a+n-b|} = \frac{b(e^{2\lambda}-1) + e^{(1-b)\lambda} - e^{-\lambda(n-b-1)}\big(2-e^{-\lambda n}\big)}{(e^\lambda-1)^2} \le \frac{be^{2\lambda}}{(e^\lambda-1)^2} = b\upsilon_\lambda,
\]
where we lastly note that this equality holds also for $b=1$. □

Now we use Theorem 4.2.1 and Lemma 5.1.4 to prove Theorem 5.1.1. In the following, for positive numbers $a$ and $b$ we will seek to minimize a quantity of the form $al^d+b/l$ over positive integers $l$. Over real values, it is easy to verify that the minimum is achieved at $l_0=(b/(ad))^{1/(d+1)}$. Taking $l=\lfloor l_0\rfloor$ when $l_0\ge 1$, and using that $l_0/2\le l\le l_0$, yields
\[
\min_{l\in\mathbb{N}_1}\Big(al^d + \frac{b}{l}\Big) \le a\Big(\frac{b}{ad}\Big)^{\frac{d}{d+1}} + 2b\Big(\frac{ad}{b}\Big)^{\frac{1}{d+1}} = a^{\frac{1}{d+1}}b^{\frac{d}{d+1}}\Big(\frac{1}{d^{\frac{d}{d+1}}} + 2d^{\frac{1}{d+1}}\Big). \qquad (5.22)
\]
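As a quick sanity check of (5.22), the following Python sketch, with names of our own choosing, compares the true discrete minimum of $al^d+b/l$ against the right hand side of (5.22) for a few parameter choices in the regime $l_0\ge 1$.

```python
# Sketch checking the discrete minimization bound (5.22): for a, b > 0 the
# integer minimum of a l^d + b/l is controlled by the stated closed form.
# Names are illustrative.
def bound_522(a, b, d):
    return a ** (1 / (d + 1)) * b ** (d / (d + 1)) * (d ** (-d / (d + 1)) + 2 * d ** (1 / (d + 1)))

def discrete_min(a, b, d, lmax=10**4):
    return min(a * l ** d + b / l for l in range(1, lmax + 1))

for (a, b, d) in [(0.1, 50.0, 1), (0.02, 300.0, 2), (1e-3, 10.0, 3)]:
    l0 = (b / (a * d)) ** (1 / (d + 1))
    assert l0 >= 1                      # the regime in which (5.22) is derived
    assert discrete_min(a, b, d) <= bound_522(a, b, d)
```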
Proof of Theorem 5.1.1: By second order stationarity, it suffices to prove the case $k=\mathbf{1}$. Let $n\ge 2$, $B^n_{\mathbf 1}$ the block of size $n^d$ as given in (4.3), and $W^n_{\mathbf 1}$ the standardized sum over that block, as in (5.1). For any $1\le l\le n$ write $n=(m-1)l+r$, $1\le r\le l$, and decompose $W^n_{\mathbf 1}$ as the sum of $\xi_i/\sqrt{n^dA_n}$ over $i\in[m]^d$, as in (5.14). We apply Theorem 4.2.1, handling the two terms on the right hand side of (4.8). For the first term, using $|X_j|\le K$, the definition (5.14) of $\xi_i$, and the fact that the side lengths of all blocks $D^l_i$ are at most $l$, we have
\[
\left|\frac{\xi_i}{\sqrt{n^dA_n}}\right| \le B \quad\text{with } B = \frac{2Kl^d}{\sqrt{n^dA_n}} \quad\text{for all } i\in[m]^d.
\]
Applying Lemma 5.1.4 for the second term, invoking Theorem 4.2.1 now yields
\[
d_1\big(\mathcal{L}(W_{\mathbf 1}),\mathcal{L}(Z)\big) \le \frac{10Kl^d}{\sqrt{n^dA_n}} + \frac{2\sqrt{2}\,\kappa_0\gamma_{\lambda,d}}{\sqrt{\pi}\,lA_n}.
\]
Recalling that $A_n>0$ for all $n\in\mathbb{N}_1$, applying the bound (5.22) yields the result, noting that $l_0 = C_{\lambda,\kappa_0,d}^{-1/(d+1)}\,n^{d/(2d+2)}$ satisfies $1\le l_0\le n$ for $n\ge\max\big\{C_{\lambda,\kappa_0,d}^{2/d},\,C_{\lambda,\kappa_0,d}^{-2/(d+2)}\big\}$. □

To prove Theorem 5.1.2 we apply Theorem 4.3.1 and use the same techniques as for Theorem 5.1.1. We remind the reader that for this result we do not explicitly compute the constants, but index them by the parameters on which they depend.

Proof of Theorem 5.1.2: We proceed as in the one dimensional case. For $n\ge 2$ and $1\le l\le n$ we write $n=(m-1)l+r$ with $m\ge 1$ and $1\le r\le l$, and decompose $S^n_{k_q}-ES^n_{k_q}$ for $q\in[p]$ as the sum over $i\in[m]^d$ of the variables $\xi_{i,k_q}$ given in (5.14). Applying Theorem 4.3.1, we handle the three terms on the right hand side of (4.19). For the first term, using the definition (5.14) of $\xi_{i,k_q}$ and $|X_t|\le K$, we have
\[
|\xi_{i,k_q}| \le B \quad\text{where } B = 2Kl^d \quad\text{for all } i\in[m]^d,\ q\in[p],
\]
and thus, noting that $|\Sigma^{-1/2}|_\infty = n^{-d/2}\psi_n$ and that $\Sigma_{jj}=n^dA_n$, we may bound the first term as
\[
\Big(\frac16 + 2\sqrt2\,p^3\Big)Bn^{-3d/2}\psi_n^3\sum_{q=1}^p\Sigma_{q,q} \le \frac{C_{p,K}\,l^dA_n\psi_n^3}{n^{d/2}}. \qquad (5.23)
\]
For the second term, by Lemma 5.1.4 we have
\[
\Big(3\sqrt2+\frac12\Big)p^2n^{-d}\psi_n^2\sum_{q=1}^p\ \sum_{i,j\in[m]^d,\,i\ne j}E\big[\xi_{i,k_q}\xi_{j,k_q}\big]
= C_p\psi_n^2\sum_{q=1}^p\left(\frac{\sum_{i,j\in[m]^d,\,i\ne j}E\big[\xi_{i,k_q}\xi_{j,k_q}\big]}{n^d}\right) \le \frac{C_{\lambda,\kappa_0,p,d}\,\psi_n^2}{l}. \qquad (5.24)
\]
Invoking Lemma 5.1.6 and assumption (5.8) we have
\[
\Sigma_{q,s} = \mathrm{Cov}\big(S^n_{k_q},S^n_{k_s}\big) \le \kappa_0\upsilon_\lambda^d\alpha n^d \quad\text{for } q\ne s\in[p],
\]
and hence we may bound the last term as
\[
\left(2\sqrt2\,p^3Bn^{-3d/2}\psi_n^3 + \Big(3\sqrt2+\frac12\Big)p^2n^{-d}\psi_n^2\right)\sum_{q,s\in[p],\,q\ne s}\Sigma_{q,s}
\le \left(\frac{C_{p,K}\,l^d\psi_n^3}{n^{3d/2}} + \frac{C_p\psi_n^2}{n^d}\right)\kappa_0\upsilon_\lambda^d\alpha n^d
\le \frac{C_{\lambda,\kappa_0,d,p,K}\,\alpha l^d\psi_n^3}{n^{d/2}} + C_{\lambda,\kappa_0,d,p}\,\alpha\psi_n^2. \qquad (5.25)
\]
By Theorem 4.3.1 and (5.23), (5.24) and (5.25),
\[
d_{\mathcal{H}_{3,\infty,p}}\big(\mathcal{L}(\Sigma^{-1/2}W),\mathcal{L}(Z)\big)
\le \big(C_{p,K}A_n + C_{\lambda,\kappa_0,d,p,K}\,\alpha\big)\frac{\psi_n^3l^d}{n^{d/2}} + \frac{C_{\lambda,\kappa_0,d,p}\,\psi_n^2}{l} + C_{\lambda,\kappa_0,d,p}\,\alpha\psi_n^2
\le C_{\lambda,\kappa_0,d,p,K}\left((A_n+\alpha)\frac{\psi_n^3l^d}{n^{d/2}} + \frac{\psi_n^2}{l} + \alpha\psi_n^2\right).
\]
Applying (5.22) to the first two terms in parentheses, we obtain
\[
l_0 = \left(\frac{1}{d\psi_n(A_n+\alpha)}\right)^{\frac{1}{d+1}}n^{\frac{d}{2d+2}},
\]
which satisfies $1\le l_0\le n$ for the range of $n$ given in (5.9), and applying (5.22) yields the result. □

We now prove the sufficient condition given above for the invertibility of the covariance matrix of $S^n$, and the bound on the norm of its inverse. Recall that a matrix $A\in\mathbb{R}^{p\times p}$ is said to be strictly diagonally dominant if $|a_{ii}|-\sum_{j\ne i}|a_{ij}|>0$ for all $i\in[p]$.

Proof of Lemma 5.1.3: By Lemma 5.1.6 and the upper bound on $b$ given in (5.10), for all $q\in[p]$ we have
\[
\Sigma_{q,q} - \sum_{1\le s\le p,\,s\ne q}|\Sigma_{q,s}| \ge n^{d-1}\big(nA_n - (p-1)\kappa_0\upsilon_\lambda^d b\big) > 0.
\]
Hence $\Sigma$ is a strictly diagonally dominant matrix, and is therefore invertible by the Gershgorin circle theorem; see for instance Theorem 15.10 of [6]. The final claim of the lemma follows from [1], where it is shown that the bound (5.12) holds for the norm $\|C\|_\infty = \max_i\sum_{j=1}^p|c_{ij}|$, which dominates $|C|_\infty$. □
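The diagonal dominance criterion used in the proof above is simple to apply in practice. The following minimal Python sketch, with illustrative names and a synthetic test matrix, checks strict diagonal dominance and the resulting bound of [1] on $\|\Sigma^{-1}\|_\infty$, which dominates $|\Sigma^{-1}|_\infty$.

```python
# Sketch of the strict diagonal dominance criterion behind Lemma 5.1.3:
# if |Sigma_qq| - sum_{s != q} |Sigma_qs| > 0 for every q, then Sigma is
# invertible and the max row sum norm of its inverse is at most one over the
# smallest such gap (the bound of [1]). Names are illustrative.
import numpy as np

def dominance_gap(sigma):
    sigma = np.asarray(sigma, dtype=float)
    off = np.abs(sigma).sum(axis=1) - np.abs(np.diag(sigma))
    return np.min(np.abs(np.diag(sigma)) - off)

rng = np.random.default_rng(0)
A = rng.normal(size=(4, 4))
sigma = A @ A.T + 10 * np.eye(4)      # pushed toward diagonal dominance
gap = dominance_gap(sigma)
if gap > 0:
    inv = np.linalg.inv(sigma)
    # max entry <= max row sum <= 1/gap
    assert np.abs(inv).max() <= np.abs(inv).sum(axis=1).max() <= 1 / gap + 1e-12
```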
5.2 The Ising Model

The Ising model was introduced by the physicist Ernst Ising in 1924 (see [61]), suggested by his thesis advisor, Wilhelm Lenz. It is a model of ferromagnetism in statistical physics. Though the model is easy to define, it is very useful on its own and is the prototype of many more complicated models. In this work we obtain an $L^1$ distance bound between the total magnetization and the normal distribution in some special cases depending on the temperature and the magnetic moment parameter, as detailed below.

To define this model, we first introduce a lattice. In this work, we consider $\mathbb{Z}^d$ with $d\in\mathbb{N}_1$. We call $i\in\mathbb{Z}^d$ a site, and for each site $i$, $\omega_i$ is a variable taking values in the set $\{-1,+1\}$ which represents the magnetic spin at site $i$. Though we consider the infinite lattice $\mathbb{Z}^d$, we first start with finite lattices and later take a limit in some sense so that the finite lattices approach $\mathbb{Z}^d$. For this purpose, let $\Lambda$ be a symmetric hypercube, that is, $\Lambda=\{i\in\mathbb{Z}^d : |i|_\infty\le k\}$ for some $k\in\mathbb{N}_0$. We say $\omega=\{\omega_i : i\in\Lambda\}$ is a configuration on $\Lambda$ when $\omega_i$ takes values in the set $\{-1,+1\}$ for each $i\in\Lambda$.

Next we define a probability measure on configurations on $\Lambda$, corresponding to the nearest-neighbor ferromagnetic Ising model. First, fix a configuration $\widetilde\omega=\{\widetilde\omega_i : i\in\mathbb{Z}^d\}$ on all of $\mathbb{Z}^d$, and the interaction strength $J\ge 0$. Now define the energy function, or Hamiltonian $H_{\Lambda,h,\widetilde\omega}$, by
\[
H_{\Lambda,h,\widetilde\omega}(\omega) = -J\Bigg(\sum_{\substack{i,j\in\Lambda\\ |i-j|_1=1}}\omega_i\omega_j + \sum_{\substack{i\in\Lambda,\,j\notin\Lambda\\ |i-j|_1=1}}\omega_i\widetilde\omega_j\Bigg) - h\sum_{i\in\Lambda}\omega_i, \qquad (5.26)
\]
where $h$ is the magnetic moment parameter. Given an inverse temperature $\beta>0$, the finite volume Ising probability measure on $\Lambda$ is given by
\[
\mu_{\Lambda,\beta,h,\widetilde\omega}(\omega) = \exp\big(-\beta H_{\Lambda,h,\widetilde\omega}(\omega)\big)\big/Z_{\Lambda,\beta,h,\widetilde\omega},
\]
for a normalizing constant $Z_{\Lambda,\beta,h,\widetilde\omega}$. The cases where $\widetilde\omega_j$ takes the value $0$, $+1$ and $-1$ for all $j\in\mathbb{Z}^d$ are typically referred to as free, positive, and negative boundary conditions, respectively; see [33] for example.

Now let $\{\Lambda_n : n\in\mathbb{N}_1\}$ be any increasing sequence of symmetric hypercubes of $\mathbb{Z}^d$ whose union is $\mathbb{Z}^d$. For fixed $\beta$ and $h$, we say that $\mu_{\Lambda_n,\beta,h,\widetilde\omega}$ converges weakly to $\mu_{\beta,h,\widetilde\omega}$ as $n\to\infty$ if
\[
\lim_{n\to\infty}\int f\,d\mu_{\Lambda_n,\beta,h,\widetilde\omega} = \int f\,d\mu_{\beta,h,\widetilde\omega}
\]
for all local functions $f$ on configurations, where we recall that a function $f$ is said to be local if it depends only on $\Lambda_n$ for some $n$. Each such weak limit is a probability measure on $\{-1,1\}^{\mathbb{Z}^d}$. The set of infinite volume Gibbs measures, denoted $\mathcal{G}_{\beta,h}$, is the closed convex hull of all weak limits of $\mu_{\Lambda_n,\beta,h,\widetilde\omega}$ for $\widetilde\omega$ any configuration on $\mathbb{Z}^d$.

The structure of $\mathcal{G}_{\beta,h}$ depends on the parameters $\beta$ and $h$. In particular, by [33], there exists a critical inverse temperature $\beta_c$ such that the limit measures depend, or do not depend, on the boundary conditions when $\beta>\beta_c$ or $\beta<\beta_c$, respectively. More precisely, for $\beta>0,h\ne 0$, and for $0<\beta<\beta_c,h=0$, the finite volume measures $\mu_{\Lambda_n,\beta,h,\widetilde\omega}$ have a unique weak limit for any choice of boundary conditions $\widetilde\omega$. Thus, in these cases, $\mathcal{G}_{\beta,h}$ consists of a unique measure, which we denote $\mu_{\beta,h}$. In the case $\beta>\beta_c,h=0$, the set $\mathcal{G}_{\beta,0}$ contains two pure phases, $\mu^+_{\beta,0}$ and $\mu^-_{\beta,0}$, which are the infinite volume measures arising as weak limits under positive and negative boundary conditions respectively. Hence, by definition, $\mathcal{G}_{\beta,0}$ also contains the convex combinations $\mu^{(\alpha)}_{\beta,0} = \alpha\mu^+_{\beta,0} + (1-\alpha)\mu^-_{\beta,0}$, $0<\alpha<1$. These measures are called mixed phases. For $d=2$, $\mathcal{G}_{\beta,0}$ consists of only pure and mixed phases. For $d\ge 3$, the set $\mathcal{G}_{\beta,0}$ contains non-translation-invariant measures for any sufficiently large $\beta$, which are not considered in this paper. For additional detail about finite and infinite volume Gibbs measures, see [33].
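Although none of our results depend on simulation, the finite volume measure $\mu_{\Lambda,\beta,h,\widetilde\omega}$ is straightforward to sample approximately, which can help build intuition for the total magnetization studied below. The following is a minimal Metropolis sketch in Python for $d=2$ with $J=1$ and positive boundary conditions; all names and parameter values are our own and purely illustrative.

```python
# Minimal Metropolis sampler for the finite volume Ising measure (5.26),
# with d = 2, J = 1 and positive boundary conditions; illustrative only.
import numpy as np

def metropolis_ising(k, beta, h, sweeps=500, seed=0):
    rng = np.random.default_rng(seed)
    n = 2 * k + 1                        # Lambda = {-k,...,k}^2
    w = np.ones((n + 2, n + 2))          # padded array; border = boundary spins +1
    w[1:-1, 1:-1] = rng.choice([-1, 1], size=(n, n))
    for _ in range(sweeps):
        for i in range(1, n + 1):
            for j in range(1, n + 1):
                nb = w[i - 1, j] + w[i + 1, j] + w[i, j - 1] + w[i, j + 1]
                dH = 2 * w[i, j] * (nb + h)   # energy change from flipping w[i, j]
                if rng.random() < np.exp(-beta * dH):
                    w[i, j] = -w[i, j]
    return w[1:-1, 1:-1]

config = metropolis_ising(k=10, beta=0.3, h=0.0)
print("total magnetization:", int(config.sum()))
```

Running the sketch at small $\beta$ produces magnetizations fluctuating around zero, while large $\beta$ with positive boundary conditions drives the sample toward the plus phase, in line with the phase picture described above.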
In this work, we focus on the Ising models on $\mathbb{Z}^d$ in the cases where the weak limit is unique, precisely, for measures in the set
\[
\mathcal{M}_1 = \big\{\mu_{\beta,h} : (\beta,h)\in\big(\mathbb{R}^+\times\mathbb{R}\setminus\{0\}\big)\cup\big((0,\beta_c)\times\{0\}\big)\big\}. \qquad (5.27)
\]
For these measures, we obtain $L^1$ bounds to the normal for the distribution of the total magnetization
\[
M_k = \sum_{i\in B^n_k}\omega_i \qquad (5.28)
\]
over the finite block $B^n_k\subset\mathbb{Z}^d$ with side length $n$, `anchored' at $k\in\mathbb{Z}^d$, defined in (4.3). See Remark 5.2.2 for a discussion regarding the applicability of our results to the pure or mixed phases when the inverse temperature is above critical, that is, for measures in the set
\[
\mathcal{M}_2 = \big\{\mu^+_{\beta,0},\,\mu^-_{\beta,0} : \beta>\beta_c,\,h=0\big\}\cup\big\{\mu^{(\alpha)}_{\beta,0} : 0<\alpha<1,\,\beta>\beta_c,\,h=0\big\}. \qquad (5.29)
\]
It is well known that the variables making up the configuration $\{\omega_i : i\in\mathbb{Z}^d\}$ with measure in $\mathcal{M}_1\cup\mathcal{M}_2$ are positively associated whenever $J$ in (5.26) is non-negative. Therefore Theorem 5.1.1 and Theorem 5.1.2 apply to $M_k$, as stated in the following corollary.

Corollary 5.2.1. The bounds of Theorem 5.1.1 (Theorem 5.1.2) apply to the total magnetization $M_k$ of (5.28) of the Ising model having infinite volume Gibbs measure in $\mathcal{M}_1$ of (5.27) (when choosing $\alpha$ in (5.8) to satisfy (5.11)).

Proof. We show that the fields in the model in question possess the properties required for the application of Theorems 5.1.1 and 5.1.2. First, each must comprise almost surely uniformly bounded variables, must be second order stationary and positively associated, and must have an exponentially decreasing covariance function. Clearly the uniform boundedness condition holds, implying the existence of second moments, so second order stationarity holds under strict stationarity. By [33], any Ising model corresponding to an infinite volume Gibbs measure in $\mathcal{M}_1$ is strictly stationary, and by [69], positive association holds whenever the interaction $J$ that appears in the Hamiltonian (5.26) is nonnegative. By [62] and [2] (see also [28]), the covariance between spins decays at an exponential rate when $\beta>0,h\ne 0$ and $\beta<\beta_c,h=0$, respectively.

Lastly, for the application of Theorem 5.1.2, assumption (5.11) on $\alpha$ and Lemma 5.1.3 yield the required invertibility of the covariance matrix. □

Remark 5.2.2. The earlier version [50] of the work [51] depended on a result of the subsequently withdrawn manuscript [3] to handle the Ising model with infinite volume Gibbs measures in $\mathcal{M}_2$ of (5.29). In particular, [50] shows how the results of Corollary 5.2.1 apply to this case of the Ising model under the condition that its covariance function is exponentially decreasing.

5.3 Bond Percolations

The bond percolation model was first used in [14] to model the probability distribution that the center of a porous stone immersed in water becomes wet. Let us first describe the model in a mathematical setting; we will revisit the connection to this physical model afterward.
We consider bond percolation on $\mathbb{Z}^d$ for $d\ge 2$ (see [52], for example). In graphical terms, nearest neighbor bonds are the edges between vertices $x$ and $y$ in $\mathbb{Z}^d$ satisfying $|x-y|_1=1$; we denote the collection of all such bonds by $\mathbb{E}^d$. For a given $\theta\in[0,1]$, we declare each bond in $\mathbb{E}^d$ to be open with probability $\theta$ and closed otherwise, independently of all other bonds. More formally, we take $G=\{0,1\}^{\mathbb{E}^d}$, elements of which are represented as $g=\{g(e) : e\in\mathbb{E}^d\}$ and called configurations. The value $g(e)=0$ corresponds to the bond $e$ being closed and $g(e)=1$ corresponds to $e$ being open. Given a bond configuration $g$ and $x\in\mathbb{Z}^d$, let $\mathcal{C}(x)$ be the set of vertices connected to $x$ through open bonds, and let $|\mathcal{C}(x)|$ denote the number of vertices in $\mathcal{C}(x)$, which may be infinite.

Since the probability $\theta$ that a bond is open is the same for every $e\in\mathbb{E}^d$ and each bond is independent of all others, $\mathcal{C}(x)$ has the same distribution for all $x\in\mathbb{Z}^d$. We define
\[
\rho(\theta) = P\big(|\mathcal{C}(\mathbf 0)|=\infty\big) \quad\text{and}\quad \theta_c = \sup\{\theta : \rho(\theta)=0\},
\]
respectively representing the probability that for a given connectivity $\theta$ a vertex belongs to an infinite cluster, and its threshold, the critical probability. Our interest is in obtaining $L^1$ bounds to the normal, in the supercritical case $\theta>\theta_c$, for the distribution of the total number of points in a finite block that belong to an infinite cluster, specifically for
\[
U_k = \sum_{x\in B^n_k}\mathbf{1}_{\{|\mathcal{C}(x)|=\infty\}} \qquad (5.30)
\]
where $B^n_k\subset\mathbb{Z}^d$ is defined in (4.3). It is well known that $\{\mathbf{1}_{\{|\mathcal{C}(x)|=\infty\}} : x\in\mathbb{Z}^d\}$ is positively associated (see [69]).
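The event $\{|\mathcal{C}(x)|=\infty\}$ cannot be observed on a finite box; a common finite-volume proxy, which we use only for illustration and not in any proof, is connection by open bonds to the boundary of a large box. The Python sketch below, with illustrative names and a union-find cluster labeling of our own devising, computes this proxy for the sites of a central block.

```python
# Illustrative finite-box proxy for U_k of (5.30) in d = 2: count sites joined
# by open bonds to the boundary of a large box (a stand-in for |C(x)| = infinity).
import numpy as np

def boundary_connected(L, theta, seed=0):
    rng = np.random.default_rng(seed)
    open_r = rng.random((L, L)) < theta      # bond (x, y) -- (x+1, y)
    open_d = rng.random((L, L)) < theta      # bond (x, y) -- (x, y+1)
    parent = np.arange(L * L + 1)            # extra node L*L represents the boundary

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]    # path compression
            a = parent[a]
        return a

    def union(a, b):
        parent[find(a)] = find(b)

    for x in range(L):
        for y in range(L):
            v = x * L + y
            if x + 1 < L and open_r[x, y]: union(v, v + L)
            if y + 1 < L and open_d[x, y]: union(v, v + 1)
            if x in (0, L - 1) or y in (0, L - 1): union(v, L * L)
    root = find(L * L)
    return np.array([[find(x * L + y) == root for y in range(L)] for x in range(L)])

flags = boundary_connected(L=60, theta=0.6)   # theta_c = 1/2 for d = 2 bonds
print("connected fraction in central 20x20 block:", flags[20:40, 20:40].mean())
```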
As promised, we now revisit the physical motivation of the bond percolation model, that is, the porous stone example. The bonds in $\mathbb{E}^d$ represent the inner passageways of the stone, where the open bonds correspond to the passages which are sufficiently large for water to pass through, and the closed bonds correspond to the contrary. A particular pore inside the stone becomes wet if and only if there exists a path of open bonds connecting the pore to the boundary of the stone. Now we focus on a very big stone, so that the probability distribution in consideration is close to the model over the infinite lattice $\mathbb{Z}^d$. In this case, a particular pore of the stone becomes wet if and only if it belongs to some infinite cluster. Therefore the variable of our interest in (5.30) pertains to the number of wet pores in a certain region.

The next corollary applies the results from Section 5.1 to the bond percolation model.

Corollary 5.3.1. The bounds of Theorem 5.1.1 (Theorem 5.1.2) apply to the total number of points $U_k$ in (5.30) that belong to an infinite cluster in the supercritical bond percolation model in dimensions $d\ge 2$ (when choosing $\alpha$ in (5.8) to satisfy (5.11)).

Proof. As in the proof of Corollary 5.2.1, we show that the field in the model possesses the properties required for the application of Theorems 5.1.1 and 5.1.2. For $d\ge 2$, strict stationarity follows from the fact that the connection probability $\theta$ is the same for each bond $e\in\mathbb{E}^d$, and that the status of each bond is independent of all others. Again by [69], the binary field $\{\mathbf{1}_{\{|\mathcal{C}(x)|=\infty\}} : x\in\mathbb{Z}^d\}$ is positively associated. By [18], the covariance between $\mathbf{1}_{\{|\mathcal{C}(x)|=\infty\}}$ and $\mathbf{1}_{\{|\mathcal{C}(y)|=\infty\}}$ decays exponentially. Lastly, for the application of Theorem 5.1.2, assumption (5.11) on $\alpha$ and Lemma 5.1.3 yield the required invertibility of the covariance matrix. □

5.4 The Voter Model

The name might suggest that this model must be used for modeling political systems; however, the voter model was not originally introduced in this manner. In fact, it was invented to fill out the basic theory of interacting particle systems (see e.g. [25] and [58]). Later, it came to be called the voter model due to a potential connection to basic voting systems. Next we describe the model in a mathematical setting, pointing out the connection to voting systems along the way.

We consider the $d$-dimensional voter model, a Markov process $\{\eta_t : t\ge 0\}$ taking values in $\{0,1\}^{\mathbb{Z}^d}$, and one of the simplest interacting particle systems; see also [23]. With a precise formulation described below, the process is specified by taking the initial distribution $\{\eta_0(x) : x\in\mathbb{Z}^d\}$ to be a family of independent, identically distributed Bernoulli random variables with parameter $\theta\in(0,1)$, and having transitions
\[
\eta_t(x)\ \to\ 1-\eta_t(x) \quad\text{at rate}\quad \frac{1}{2d}\big|\{y : |y-x|_1=1 \text{ and } \eta_t(x)\ne\eta_t(y)\}\big|. \qquad (5.31)
\]
That is, a site $x\in\mathbb{Z}^d$ changes its state with rate equal to the fraction of its $2d$ nearest neighbors that are in the opposite state. We study the occupation time
\[
T^t_s = \int_s^{s+t}\eta_u(\mathbf 0)\,du \quad\text{for } t>0,\ s\ge 0, \qquad (5.32)
\]
that is, the amount of time in $(s,s+t]$ that the origin $\mathbf 0$ spends in state 1.

In connection with voting systems, we may think of 0 and 1 as two candidates in a certain election and each $x\in\mathbb{Z}^d$ as a voter. The transition in (5.31) means that voters in each neighborhood have positive influence on each other; that is, a single voter is more likely to follow the majority in his or her neighborhood. What we study in (5.32) is the amount of time in a certain period that a voter prefers voting for a given candidate, represented by state 1.

Taking $t>0$, $m\in\mathbb{N}_1$ and
\[
X^t_{s,i} = \int_{s+(i-1)t/m}^{s+it/m}\eta_u(\mathbf 0)\,du \quad\text{for } i\in[m],
\]
we have
\[
T^t_s = \sum_{i=1}^m X^t_{s,i}, \qquad (5.33)
\]
where $[m]=\{1,2,\dots,m\}$. It was demonstrated in [24] that the family $\{X^t_{s,i} : i\in[m]\}$ is positively associated.
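For intuition only, the occupation time (5.32) is easy to approximate on a finite torus, a finite-volume stand-in for $\mathbb{Z}^d$; the Python sketch below, with illustrative names and parameters of our own choosing, realizes the rates (5.31) by letting each site, at rate one, copy the opinion of a uniformly chosen nearest neighbor.

```python
# Torus approximation of the voter model occupation time T_s^t of (5.32);
# each site at rate 1 copies a uniformly chosen nearest neighbor, which
# realizes the flip rates in (5.31). Illustrative names and parameters.
import numpy as np

def occupation_time(L=20, d=2, theta=0.5, s=0.0, t=50.0, seed=0):
    rng = np.random.default_rng(seed)
    eta = (rng.random((L,) * d) < theta).astype(np.int8)
    N, clock, occ = eta.size, 0.0, 0.0
    origin = (0,) * d
    while clock < s + t:
        dt = rng.exponential(1.0 / N)        # next update time (total rate N)
        lo, hi = max(clock, s), min(clock + dt, s + t)
        if hi > lo:
            occ += eta[origin] * (hi - lo)   # time the origin spends in state 1
        clock += dt
        x = tuple(rng.integers(0, L, size=d))
        axis, step = rng.integers(0, d), rng.choice([-1, 1])
        y = list(x); y[axis] = (y[axis] + step) % L
        eta[x] = eta[tuple(y)]               # copy a uniform neighbor's opinion
    return occ

print("approximate T_0^50 at the origin:", round(occupation_time(), 2))
```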
We note that the assumption that the covariance decays exponentially is not satisfied here, and therefore the previous results in Section 5.1 do not apply. Without using positive association, [23] showed that the occupation time $T^t_0$ of the voter model as defined in (5.32) satisfies the CLT for $d\ge 2$. In later work, [24] used the fact that $T^t_0$ is the sum of positively associated random variables and that $u(n)$ as defined in (4.7) converges to zero as $n\to\infty$ to prove the CLT for $d\ge 5$. However, error bounds and the rate of convergence appear yet unaddressed in the literature. Here we refine the results for $d\ge 7$ in Theorem 5.4.1 by applying Theorem 4.2.1 to obtain a rate of convergence in the $L^1$ metric.

Though the authors are unaware of any results in the literature that yield distributional bounds for the voter model, Stein's method has been successful in obtaining bounds for the anti-voter model; see [76]. The anti-voter model differs from the voter model in that after a uniformly chosen neighbor of a vertex is selected, the state of that vertex changes to the opposite of the state of that neighbor. The work [76] used the exchangeable pair approach to Stein's method, and positive association was not an ingredient.

By [23], for $d\ge 5$ we have $ET^t_0=\theta t$ and $\lim_{t\to\infty}A_t=\kappa^2$ for some $\kappa^2>0$, where
\[
A_t = \frac{\mathrm{Var}(T^t_0)}{t} \quad\text{for } t>0. \qquad (5.34)
\]
Letting
\[
A^t_s = \frac{\mathrm{Var}(T^t_s)}{t}, \qquad (5.35)
\]
we will show in Lemma 5.4.6 that for all $s\ge 0$, $\lim_{t\to\infty}A^t_s=\kappa^2$, and for all $t>0$ that $ET^t_s=\theta t$ and $A^t_s>0$. Standardizing $T^t_s$ to have mean zero and variance one, we obtain
\[
W^t_s = \frac{T^t_s-\theta t}{\sqrt{A^t_s\,t}}. \qquad (5.36)
\]
Now we define the last exit time of the origin in $[0,u]$ and $[0,\infty)$ by
\[
L_u = \sup\{s\le u : Y_s=\mathbf 0\} \quad\text{for } u\ge 0, \qquad (5.37)
\]
and
\[
L = \sup\{s<\infty : Y_s=\mathbf 0\}, \qquad (5.38)
\]
respectively, where $Y_s$ is a rate 1 simple symmetric random walk in $\mathbb{Z}^d$ starting at the origin at time zero. By definition, it is clear that $L_u$ is increasing in $u$ and $L_u\le L$ for all $u\ge 0$. The following theorem provides $L^1$ bounds to the normal for the standardized occupation time $W^t_s$ given in (5.36). Lemma 5.4.5 shows that the second moment of $L_u$, appearing in the bounds below, is finite.

Theorem 5.4.1. For $W^t_s$ as defined in (5.36), for all $d\ge 7$ and $s\ge 0$,
\[
d_1\big(\mathcal{L}(W^t_s),\mathcal{L}(Z)\big) \le \left(\frac{180\sqrt2\,\theta(1-\theta)EL^2_{2(s+t)}}{\sqrt\pi\,(A^t_s)^{3/2}}\right)^{1/2}t^{-1/4}
\quad\text{for all } t\ge\left(\frac{\sqrt2\,\theta(1-\theta)EL^2_{2(s+t)}}{5\sqrt{\pi A^t_s}}\right)^{2/3},
\]
with $Z\sim\mathcal{N}(0,1)$, $A^t_s$ as in (5.35) and $L_u$ as in (5.37).

We also extend Theorem 5.4.1 to the multidimensional case, obtaining a bound in Theorem 5.4.2 in the metric $d_{\mathcal{H}_{3,\infty,p}}$ to the multivariate normal for $S^t=(T^t_{s_1},\dots,T^t_{s_p})$. The results could easily be extended to the case where the occupation times are measured over intervals of varying length, as recorded in the vector $(T^{t_1}_{s_1},\dots,T^{t_p}_{s_p})$.

Theorem 5.4.2. Let $d\ge 7$ and $t>0$. For $p\in\mathbb{N}_1$ let $s_1,\dots,s_p\ge 0$ be such that
\[
\min_{k,l\in[p],\,k\ne l}|s_k-s_l| \ge (1-\alpha)t \qquad (5.39)
\]
for some $0<\alpha<1$. Let $S^t=(T^t_{s_1},\dots,T^t_{s_p})$ where $T^t_s$ is defined as in (5.32), assume that the covariance matrix $\Sigma$ of $S^t$ is invertible, and let $\psi_t=t^{1/2}|\Sigma^{-1/2}|_\infty$. Then, there exists a constant $C_{p,\theta}$ such that with $Z$ a standard normal random vector in $\mathbb{R}^p$,
\[
d_{\mathcal{H}_{3,\infty,p}}\big(\mathcal{L}(\Sigma^{-1/2}(S^t-\theta t)),\mathcal{L}(Z)\big)
\le C_{p,\theta}\left(\Big(\sum_{k=1}^p A^t_{s_k}+\alpha+\frac1t\Big)^{1/2}\psi_t^{5/2}t^{-1/4} + \psi_t^2\Big(\alpha+\frac1t\Big)\right)
\]
for all
\[
t \ge \left(\psi_t\Big(\sum_{k=1}^p A^t_{s_k}+\alpha+\frac1t\Big)\right)^{-2/3}. \qquad (5.40)
\]

From (5.35) we have $\mathrm{Var}(T^t_s)=tA^t_s$, and we show in Lemma 5.4.6 that $A^t_s\to\kappa^2$ as $t\to\infty$. For checking the hypothesis that the covariance matrix of $S^t$ is invertible and that the quantity $\psi_t$ is of order one, we present the following sufficient condition, proved at the end of this section. In addition, we see from (5.40) that if $\psi_t=O(1)$ and $\alpha=O(t^{-\frac14})$ then the bound will also be of order $O(t^{-\frac14})$.

Lemma 5.4.3. With $t>0$, $\Sigma$ the covariance matrix of the random vector $S^t\in\mathbb{R}^p$, $p\ge 2$, as given in Theorem 5.4.2, is invertible if for some $b\ge 0$
\[
|s_k-s_l|\ge t-b \quad\text{for all } k\ne l,\ k,l\in[p], \quad\text{and}\quad b < \frac{t\min_{k\in[p]}A^t_{s_k} - (p-1)\theta(1-\theta)E[L^2]}{2(p-1)\theta(1-\theta)E[L]}, \qquad (5.41)
\]
where $A^t_s$ is given in (5.35) and $L$ in (5.38). If $b=\alpha t$ for $\alpha$ satisfying
\[
0<\alpha<\min\left\{1,\ \frac{\min_{k\in[p]}A^t_{s_k} - (p-1)\theta(1-\theta)E[L^2]/t}{2(p-1)\theta(1-\theta)E[L]}\right\},
\]
then the matrix $\Sigma$ is invertible and
\[
|\Sigma^{-1}|_\infty \le \frac{1}{t\big(\min_{k\in[p]}A^t_{s_k} - (p-1)\theta(1-\theta)\big(E[L^2]/t + 2\alpha E[L]\big)\big)}.
\]

To prove Theorems 5.4.1 and 5.4.2, we apply the following result, implied by (0.8) and (0.9) of [23], shown there using a duality that connects the voter model with a system of coalescing random walks constructed by a time reversal, tracing the $\{0,1\}$ `opinion' of every site back to its genesis at time zero.

Lemma 5.4.4 ([23]). For $t\ge 0$ and $0\le u\le v$, with $L_u$ as in (5.37),
\[
\mathrm{Cov}\big(\eta_u(\mathbf 0),\eta_v(\mathbf 0)\big) = \theta(1-\theta)P\big(L_{u+v}>v-u\big). \qquad (5.42)
\]
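The quantities $L_u$ and $L$ driving the identity (5.42) are also easy to examine by simulation. The following Python sketch, with illustrative names and a finite time horizon standing in for $u=\infty$, estimates $E[L_u^2]$ for the rate 1 simple random walk, illustrating how stronger transience in higher dimensions keeps this moment small, in line with the finiteness of $E[L^2]$ for $d\ge 7$ established below.

```python
# Monte Carlo sketch of the last exit time L_u of (5.37) for the rate-1
# simple symmetric random walk on Z^d; illustrative names only.
import numpy as np

def last_exit(d, u, rng):
    pos = np.zeros(d, dtype=int)
    clock, last = 0.0, 0.0
    while clock < u:
        hold = rng.exponential(1.0)          # exponential holding time, rate 1
        if not pos.any():                    # the walk occupies the origin now
            last = min(clock + hold, u)
        clock += hold
        axis = rng.integers(d)
        pos[axis] += rng.choice([-1, 1])
    return last

rng = np.random.default_rng(1)
for d in (3, 7):
    samples = [last_exit(d, u=100.0, rng=rng) for _ in range(500)]
    print(d, "estimated E[L_u^2]:", round(float(np.mean(np.square(samples))), 2))
```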
We now use Lemma 5.4.4 to prove the following result, which will be used in several places in this section.

Lemma 5.4.5. For all $0\le r<s<t$,
\[
\int_r^s\int_s^t \mathrm{Cov}\big(\eta_u(\mathbf 0),\eta_v(\mathbf 0)\big)\,dv\,du \le \frac{\theta(1-\theta)}{2}E\big[L^2_{s+t}\big],
\]
where $L_u$ is as in (5.37). Moreover, with $L$ as in (5.38), we have $E[L^2_u]\le E[L^2]<\infty$ for all $u>0$ and $d\ge 7$.

Proof. Applying the covariance identity (5.42) from Lemma 5.4.4 and using the fact that $L_u$ is increasing in $u$, we obtain
\[
\int_r^s\int_s^t \mathrm{Cov}\big(\eta_u(\mathbf 0),\eta_v(\mathbf 0)\big)\,dv\,du
= \theta(1-\theta)\int_r^s\int_s^t P\big(L_{u+v}>v-u\big)\,dv\,du
\le \theta(1-\theta)\int_r^s\int_s^t P\big(L_{s+t}>v-u\big)\,dv\,du.
\]
By a change of variables that preserves the difference $v-u$, we have
\[
\int_r^s\int_s^t P\big(L_{s+t}>v-u\big)\,dv\,du = \int_0^{s-r}\int_{s-r}^{t-r} P\big(L_{s+t}>v-u\big)\,dv\,du
\le \int_0^{s-r}\int_{s-r}^{t} P\big(L_{s+t}>v-u\big)\,dv\,du.
\]
Hence,
\[
\int_r^s\int_s^t \mathrm{Cov}\big(\eta_u(\mathbf 0),\eta_v(\mathbf 0)\big)\,dv\,du
\le \theta(1-\theta)\int_0^{s-r}\int_{s-r}^t P\big(L_{s+t}>v-u\big)\,dv\,du
= \theta(1-\theta)\int_0^{s-r}\int_{s-r-u}^{t-u} P\big(L_{s+t}>v\big)\,dv\,du
\]
\[
\le \theta(1-\theta)\int_0^{s-r}\int_{s-r-u}^{t} P\big(L_{s+t}>v\big)\,dv\,du
= \theta(1-\theta)\int_0^{s-r}\int_u^t P\big(L_{s+t}>v\big)\,dv\,du
\le \theta(1-\theta)\int_0^t\int_u^t P\big(L_{s+t}>v\big)\,dv\,du
\]
\[
= \theta(1-\theta)\int_0^t\int_0^v P\big(L_{s+t}>v\big)\,du\,dv
= \frac{\theta(1-\theta)}{2}\int_0^t 2vP\big(L_{s+t}>v\big)\,dv
\le \frac{\theta(1-\theta)}{2}E\big[L^2_{s+t}\big],
\]
where changes of variables are applied in the first and second equalities, Fubini's theorem in the third equality, and where in the final inequality we apply the standard fact, easily shown using Fubini's theorem, that if $\phi(y)$ is a differentiable function on $[0,\infty)$ satisfying $\phi(0)=0$, then for $Y\ge 0$,
\[
E[\phi(Y)] = \int_0^\infty \phi'(y)P(Y>y)\,dy. \qquad (5.43)
\]
Since $L_u\le L$ for all $u\ge 0$, to prove the second claim it suffices to show that $E[L^2]<\infty$ for $d\ge 7$. Returning to the integral expression for the second moment, and slightly modifying the calculation for the first moment in [23] following (0.11), we have
\[
E[L^2] = \int_0^\infty 2tP(L>t)\,dt = \int_0^\infty 2t\int_t^\infty P(Y_s=\mathbf 0)\,ds\,\gamma_d\,dt, \qquad (5.44)
\]
where $\gamma_d = P(Y_t\ne\mathbf 0\ \forall\,t\ge 1)$, known to be positive and increasing for $d\ge 3$. Changing the order of integration in (5.44) yields
\[
E[L^2] = \gamma_d\int_0^\infty\int_0^s 2tP(Y_s=\mathbf 0)\,dt\,ds = \gamma_d\int_0^\infty s^2P(Y_s=\mathbf 0)\,ds,
\]
implying that $E[L^2]<\infty$ for $d\ge 7$ since, see [23] for instance, $P(Y_s=\mathbf 0)=O(s^{-d/2})$. □

The next result shows some important properties of the mean and the variance of $T^t_s$.

Lemma 5.4.6. For $d\ge 7$, $t>0$ and $s\ge 0$, with $T^t_s$ as in (5.32) and $A^t_s$ as in (5.35),
\[
ET^t_s = \theta t, \quad A^t_s>0 \quad\text{and}\quad \lim_{t\to\infty}A^t_s = \kappa^2,
\]
where $\kappa^2$ is given in (5.34).

Proof. We verify the first claim using the definition of $T^t_s$ and (5.34) to yield
\[
ET^t_s = E\big[T^{t+s}_0 - T^s_0\big] = (t+s)\theta - s\theta = t\theta.
\]
The second claim follows from (2.1) in [23] and the fact that $P(Y_u=\mathbf 0)$ is strictly positive for all $u>0$.

For the final claim, note that for $s=0$, $A^t_0=A_t$ and the result reduces to (5.34). For $s>0$, we have
\[
A^t_s = \frac{\mathrm{Var}(T^t_s)}{t} = \frac{\mathrm{Var}(T^{t+s}_0 - T^s_0)}{t}
= \frac{\mathrm{Var}(T^{t+s}_0) + \mathrm{Var}(T^s_0) - 2\mathrm{Cov}(T^s_0,T^{t+s}_0)}{t}
= \frac{(t+s)A_{t+s}}{t} + \frac{sA_s}{t} - \frac{2\mathrm{Cov}(T^{t+s}_0,T^s_0)}{t}. \qquad (5.45)
\]
The first term converges to $\kappa^2$ and the second term tends to zero as $t\to\infty$. Hence it suffices to show that the last term also tends to zero. Writing the covariance in integral form in two parts and applying Lemmas 5.4.4 and 5.4.5 to the first and second terms, respectively, we obtain
\[
\mathrm{Cov}(T^s_0,T^{t+s}_0) = \int_0^s\int_0^s \mathrm{Cov}\big(\eta_u(\mathbf 0),\eta_v(\mathbf 0)\big)\,dv\,du + \int_0^s\int_s^{s+t}\mathrm{Cov}\big(\eta_u(\mathbf 0),\eta_v(\mathbf 0)\big)\,dv\,du
\]
\[
\le 2\theta(1-\theta)\int_0^s\int_u^s P\big(L_{u+v}>v-u\big)\,dv\,du + \frac{\theta(1-\theta)}{2}E\big[L^2_{2s+t}\big] \le s^2 + E[L^2],
\]
and thus the last term in (5.45) tends to zero as $t\to\infty$, since $E[L^2]$ is finite for $d\ge 7$ by Lemma 5.4.5. □

Now we use Lemma 5.4.5 to prove Lemma 5.4.7, which bounds the sum of the covariances between $X^t_{s,i}$ and $X^t_{s,j}$ defined in (5.33) over $i\ne j\in[m]$, followed by Lemma 5.4.8, which bounds the covariance between occupation times starting at $r,s$ satisfying $|r-s|\ge t-b$.

Lemma 5.4.7. For $t>0$, $s\ge 0$ and $m\in\mathbb{N}_1$, with $X^t_{s,i}$ as in (5.33) and $L_u$ as in (5.37),
\[
\sum_{i,j\in[m],\,i\ne j}\mathrm{Cov}\big(X^t_{s,i},X^t_{s,j}\big) \le \theta(1-\theta)(m-1)E\big[L^2_{2(s+t)}\big].
\]

Proof. Using the definition of $X^t_{s,i}$ in (5.33), we have
\[
\sum_{i,j\in[m],\,i\ne j}\mathrm{Cov}\big(X^t_{s,i},X^t_{s,j}\big) = 2\sum_{i=1}^{m-1}\sum_{j=i+1}^m\mathrm{Cov}\big(X^t_{s,i},X^t_{s,j}\big)
= 2\sum_{i=1}^{m-1}\mathrm{Cov}\Big(X^t_{s,i},\sum_{j=i+1}^m X^t_{s,j}\Big)
= 2\sum_{i=1}^{m-1}\int_{s+(i-1)t/m}^{s+it/m}\int_{s+it/m}^{s+t}\mathrm{Cov}\big(\eta_u(\mathbf 0),\eta_v(\mathbf 0)\big)\,dv\,du.
\]
Applying Lemma 5.4.5 to the integrals above, and using the monotonicity of $L_u$ in $u>0$, we have
\[
\sum_{i,j\in[m],\,i\ne j}\mathrm{Cov}\big(X^t_{s,i},X^t_{s,j}\big) \le \sum_{i=1}^{m-1}\theta(1-\theta)E\big[L^2_{2(s+t)}\big] = \theta(1-\theta)(m-1)E\big[L^2_{2(s+t)}\big]. \qquad\Box
\]
Lemma 5.4.8. For $t>0$, let $0\le b\le t$ and let $r,s\ge 0$ satisfy $|r-s|\ge t-b$. Then, with $T^t_s$ as in (5.32) and $L$ as in (5.38),
\[
\mathrm{Cov}\big(T^t_s,T^t_r\big) \le \theta(1-\theta)\big(E[L^2] + 2bE[L]\big). \qquad (5.46)
\]

Proof. It suffices to consider the case $r\ge s$. First assume that $r<s+t$. Using the definition of $T^t_s$ in (5.32) and breaking the integral that expresses the covariance we wish to bound into three parts, we have
\[
\mathrm{Cov}\big(T^t_s,T^t_r\big) = \int_s^{s+t}\int_r^{r+t}\mathrm{Cov}\big(\eta_u(\mathbf 0),\eta_v(\mathbf 0)\big)\,dv\,du
\]
\[
= \int_s^r\int_r^{r+t}\mathrm{Cov}\big(\eta_u(\mathbf 0),\eta_v(\mathbf 0)\big)\,dv\,du + \int_r^{s+t}\int_{s+t}^{r+t}\mathrm{Cov}\big(\eta_u(\mathbf 0),\eta_v(\mathbf 0)\big)\,dv\,du + \int_r^{s+t}\int_r^{s+t}\mathrm{Cov}\big(\eta_u(\mathbf 0),\eta_v(\mathbf 0)\big)\,dv\,du.
\]
Applying Lemma 5.4.5 to the first two integrals, and using the fact that $L_u\le L$, accounts for the first term in (5.46). For the last integral, using (5.42) from Lemma 5.4.4, we obtain
\[
\int_r^{s+t}\int_r^{s+t}\mathrm{Cov}\big(\eta_u(\mathbf 0),\eta_v(\mathbf 0)\big)\,dv\,du
= 2\theta(1-\theta)\int_r^{s+t}\int_u^{s+t}P\big(L_{u+v}>v-u\big)\,dv\,du
\le 2\theta(1-\theta)\int_r^{s+t}\int_u^{s+t}P\big(L>v-u\big)\,dv\,du
\]
\[
= 2\theta(1-\theta)\int_r^{s+t}\int_0^{s+t-u}P(L>v)\,dv\,du
\le 2\theta(1-\theta)\int_r^{s+t}\int_0^\infty P(L>v)\,dv\,du
= 2\theta(1-\theta)\int_r^{s+t}E[L]\,du
= 2\theta(1-\theta)(t-r+s)E[L] \le 2\theta(1-\theta)bE[L],
\]
thus accounting for the second term in (5.46), where a change of variables is applied in the second equality, (5.43) in the third equality, and the assumption that $r-s\ge t-b$ in the final inequality.

When $r\ge s+t$, by the positivity of the covariance due to association, we have
\[
\mathrm{Cov}\big(T^t_s,T^t_r\big) = \int_s^{s+t}\int_r^{r+t}\mathrm{Cov}\big(\eta_u(\mathbf 0),\eta_v(\mathbf 0)\big)\,dv\,du \le \int_s^r\int_r^{r+t}\mathrm{Cov}\big(\eta_u(\mathbf 0),\eta_v(\mathbf 0)\big)\,dv\,du.
\]
Now applying Lemma 5.4.5 to the integral above, it is bounded by $\theta(1-\theta)E[L^2]/2$; hence the claim of the lemma holds in this case, as it does with $b$ replaced by zero. □

Now we use Theorem 4.2.1 and Lemmas 5.4.5 and 5.4.7 to prove Theorem 5.4.1.

Proof of Theorem 5.4.1: Let $m$ be a positive integer and
\[
\xi^t_{s,i} = \frac{X^t_{s,i} - (t/m)\theta}{\sqrt{A^t_s\,t}} \quad\text{for } i\in[m],\ s\ge 0, \qquad (5.47)
\]
where $X^t_{s,i}$ is defined as in (5.33). We apply Theorem 4.2.1 to $W^t_s=\sum_{i=1}^m\xi^t_{s,i}$, having mean zero and variance one. By [35], the components of the vector $\xi^t_s=(\xi^t_{s,1},\dots,\xi^t_{s,m})$ are positively associated, as they are increasing functions of the positively associated variables $\{X^t_{s,i} : i\in[m]\}$.

We now handle the terms on the right hand side of the bound (4.8). For the first term, using the definition of $X^t_{s,i}$ in (5.33) we have
\[
0\le X^t_{s,i} = \int_{s+(i-1)t/m}^{s+it/m}\eta_u(\mathbf 0)\,du \le \int_{s+(i-1)t/m}^{s+it/m}1\,du = \frac{t}{m},
\]
and thus, by the definition of $\xi^t_{s,i}$ in (5.47),
\[
|\xi^t_{s,i}|\le B \quad\text{where } B = \frac{2\sqrt t}{m\sqrt{A^t_s}} \quad\text{for all } i\in[m].
\]
Applying Lemma 5.4.7 for the second term, invoking Theorem 4.2.1 now yields
\[
d_1\big(\mathcal{L}(W^t_s),\mathcal{L}(Z)\big) \le \frac{10\sqrt t}{m\sqrt{A^t_s}} + \frac{2\sqrt2\,\theta(1-\theta)mE\big[L^2_{2(s+t)}\big]}{\sqrt\pi\,A^t_s\,t},
\]
where $E[L^2_{2(s+t)}]<\infty$ by Lemma 5.4.5. Applying (5.22) with $d=1$ and with $m$ in place of $l$ now yields the result, noting that the lower bound on $t$ in the theorem implies that
\[
m_0 = \left(\frac{5\sqrt{\pi A^t_s}\,t^{3/2}}{\sqrt2\,\theta(1-\theta)EL^2_{2(s+t)}}\right)^{1/2}
\]
is at least one. □

To prove Theorem 5.4.2 we apply Theorem 4.3.1 and use the same techniques as for Theorem 5.4.1. For this result we index constants by the parameters on which they depend, though we do not explicitly compute them.

Proof of Theorem 5.4.2: For $m\in\mathbb{N}_1$ decompose $T^t_{s_k}-\theta t$ for $k\in[p]$ as the sum over $i\in[m]$ of the variables $\xi^k_i = X^t_{s_k,i}-(t/m)\theta$, where $X^t_{s_k,i}$ is given in (5.33). Applying Theorem 4.3.1, we handle the three terms on the right hand side of (4.19). In the calculation below we use that $E[L]$ and $E[L^2]$ are finite for $d\ge 7$ by Lemma 5.4.5. For the first term, again using the definition of $X^t_{s_k,i}$, we have
\[
|\xi^k_i|\le B \quad\text{where } B = \frac{2t}{m} \quad\text{for all } i\in[m] \text{ and } k\in[p].
\]
Since $|\Sigma^{-1/2}|_\infty = t^{-1/2}\psi_t$ and $\Sigma_{k,k}=tA^t_{s_k}$, we may bound the first term as
\[
\Big(\frac16 + 2\sqrt2\,p^3\Big)Bt^{-3/2}\psi_t^3\sum_{k=1}^p\Sigma_{k,k} = \frac{C_p\psi_t^3\sqrt t}{m}\sum_{k=1}^p A^t_{s_k}. \qquad (5.48)
\]
For the second term, by Lemma 5.4.7 we have
\[
\Big(3\sqrt2+\frac12\Big)p^2t^{-1}\psi_t^2\sum_{k=1}^p\ \sum_{i,j\in[m],\,i\ne j}E\big[\xi^k_i\xi^k_j\big]
= C_p\psi_t^2\sum_{k=1}^p\ \sum_{i,j\in[m],\,i\ne j}\frac{\mathrm{Cov}\big(X^t_{s_k,i},X^t_{s_k,j}\big)}{t}
\le \frac{C_{p,\theta}\psi_t^2 m}{t}\sum_{k=1}^p E\big[L^2_{s_k+t}\big] \le \frac{C_{p,\theta}\psi_t^2 m}{t}, \qquad (5.49)
\]
where we have applied Lemma 5.4.5 for the final inequality. Invoking Lemma 5.4.8 and assumption (5.39), we have
\[
\Sigma_{k,l} = \mathrm{Cov}\big(T^t_{s_k},T^t_{s_l}\big) \le \theta(1-\theta)\big(E[L^2]+2\alpha tE[L]\big) \quad\text{for } k\ne l\in[p],
\]
and hence we may bound the last term as
\[
\left(2\sqrt2\,p^3Bt^{-3/2}\psi_t^3 + \Big(3\sqrt2+\frac12\Big)p^2t^{-1}\psi_t^2\right)\sum_{k,l\in[p],\,k\ne l}\Sigma_{k,l}
\le \left(\frac{C_p\,t\psi_t^3}{mt^{3/2}} + \frac{C_p\psi_t^2}{t}\right)\theta(1-\theta)\big(E[L^2]+2\alpha tE[L]\big)
\]
\[
\le \frac{C_{p,\theta}\psi_t^3}{m\sqrt t} + \frac{C_{p,\theta}\psi_t^2}{t} + \frac{C_{p,\theta}\,\alpha\sqrt t\,\psi_t^3}{m} + C_{p,\theta}\,\alpha\psi_t^2. \qquad (5.50)
\]
By Theorem 4.3.1 and (5.48), (5.49) and (5.50),
\[
d_{\mathcal{H}_{3,\infty,p}}\big(\mathcal{L}(\Sigma^{-1/2}(S^t-\theta t)),\mathcal{L}(Z)\big)
\le \left(C_p\sqrt t\sum_{k=1}^p A^t_{s_k} + \frac{C_{p,\theta}}{\sqrt t} + C_{p,\theta}\,\alpha\sqrt t\right)\frac{\psi_t^3}{m} + \frac{C_{p,\theta}\psi_t^2 m}{t} + \frac{C_{p,\theta}\psi_t^2}{t} + C_{p,\theta}\,\alpha\psi_t^2
\]
\[
\le C_{p,\theta}\left(\Big(\sqrt t\sum_{k=1}^p A^t_{s_k} + \frac1{\sqrt t} + \alpha\sqrt t\Big)\frac{\psi_t^3}{m} + \frac{\psi_t^2 m}{t} + \psi_t^2\Big(\alpha+\frac1t\Big)\right).
\]
Applying (5.22) to the first two terms in parentheses now yields the result, noting that
\[
m_0 = \left(\psi_t\Big(\sum_{k=1}^p A^t_{s_k}+\alpha+\frac1t\Big)\right)^{\frac12}t^{\frac34}
\]
is at least one for the range of $t$ given in (5.40). □

Finally, we present the sufficient condition given above for the invertibility of the covariance matrix of $S^t$.

Proof of Lemma 5.4.3: By Lemma 5.4.8 and the upper bound on $b$ given in (5.41), for all $k\in[p]$ we have
\[
\Sigma_{k,k} - \sum_{l\in[p],\,l\ne k}|\Sigma_{k,l}| \ge tA^t_{s_k} - (p-1)\theta(1-\theta)\big(E[L^2]+2bE[L]\big) > 0.
\]
Hence $\Sigma$ is a strictly diagonally dominant matrix, and the claims follow as in the proof of Lemma 5.1.3. □

5.5 The Contact Process

The contact process, introduced in [54], was originally used in other contexts unrelated to its name, such as Reggeon field theory in high energy physics. Nevertheless, the process is well known for modeling the evolution of a disease infection in a population, and most researchers use the language of infection to describe the model. In this section we treat two different quantities in the following two subsections: the amount of time that a bounded region contains at least one infected site, and the number of infected sites at any fixed time $t>0$.

5.5.1 The amount of time that a bounded region contains an infected site

In this section, we study the one dimensional contact process $\{\zeta_{\nu_\lambda}(t) : t\ge 0\}$, a continuous time Markov process with state space $\mathcal{P}(\mathbb{Z})$, the set of all subsets of $\mathbb{Z}$. The subset $\zeta_{\nu_\lambda}(t)$ models the collection of `infected' individuals at time $t$, where each infected site infects healthy neighbors at rate $\lambda$, and infected sites recover at rate 1. The distribution of $\zeta_{\nu_\lambda}(0)$ is $\nu_\lambda$, the unique invariant measure defined in [29] as the limiting distribution of $\zeta^{\mathbb{Z}}(t)$,
\[
\zeta^{\mathbb{Z}}(t) \xrightarrow{d} \nu_\lambda \quad\text{as } t\to\infty,
\]
where $\zeta^{\mathbb{Z}}(t)$ has initial distribution putting mass one on all of $\mathbb{Z}$. Define
\[
\lambda^* = \sup\big\{\lambda>0 : \zeta^{\mathbb{Z}}(t)\to\delta_\emptyset \text{ weakly as } t\to\infty\big\}. \qquad (5.51)
\]
The contact process is said to be supercritical when $\lambda>\lambda^*$, and, by [29], in this case $\nu_\lambda$ is the nontrivial, unique invariant measure of the process; $\nu_\lambda=\delta_\emptyset$ when $\lambda<\lambda^*$.

Recall that a function $f:\mathcal{P}(\mathbb{Z})\to\mathbb{R}$ is said to be cylindrical if it depends on only finitely many sites, and increasing if $f(A)\le f(B)$ whenever $A\subset B\subset\mathbb{Z}$. In the supercritical case, we study the cumulative value of a cylindrical $f$ evaluated on the process over the interval $(s,s+t]$,
\[
D^t_{s,f} = \int_s^{s+t}f\big(\zeta_{\nu_\lambda}(u)\big)\,du \quad\text{for } s\ge 0,\ t>0. \qquad (5.52)
\]
For instance, for $B$ a finite subset of $\mathbb{Z}$, letting $I_B(\cdot)$ be the cylindrical, increasing indicator function given by
\[
I_B(\eta) = \mathbf{1}\big(\eta\cap B\ne\emptyset\big) \quad\text{for } \eta\subset\mathbb{Z}, \qquad (5.53)
\]
the value $D^t_{s,I_B}$ yields the amount of time in $(s,s+t]$ that $B$ contains at least one infected site.

Taking $t>0$, $m\in\mathbb{N}_1$ and
\[
Y^{t,m}_{s,i} = \int_{s+(i-1)t/m}^{s+it/m}f\big(\zeta_{\nu_\lambda}(u)\big)\,du \quad\text{for } i\in[m],
\]
we have
\[
D^t_{s,f} = \sum_{i=1}^m Y^{t,m}_{s,i}. \qquad (5.54)
\]
We will show in Lemma 5.5.2 below, using a simple consequence of arguments in the proof of Lemma 1 of [79], that the family $\{Y^{t,m}_{s,i} : i\in[m]\}$ is positively associated when $f$ is increasing.
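For illustration only, $D^t_{s,I_B}$ may be approximated by simulating the contact process on a large finite window started from the all-infected state, which corresponds to the process $\zeta^{\mathbb{Z}}(t)$ converging to $\nu_\lambda$. The following Gillespie-type Python sketch, with all names and parameters of our own choosing, records the time in $(s,s+t]$ during which $B$ contains an infected site; nothing in the proofs below depends on it.

```python
# Gillespie sketch of the 1-d contact process on a finite window, started
# all-infected (a finite stand-in for nu_lambda); computes D_{s,I_B}^t of
# (5.52) for f = I_B of (5.53). All names are illustrative.
import numpy as np

def contact_occupation(L=100, lam=2.0, s=5.0, t=30.0, B=range(-2, 3), seed=0):
    rng = np.random.default_rng(seed)
    inf = np.ones(2 * L + 1, dtype=bool)          # sites -L..L, all infected
    Bidx = [x + L for x in B]
    clock, occ = 0.0, 0.0
    while clock < s + t:
        if not inf.any():                          # extinction: nothing happens
            break
        rate = inf.sum() * (1.0 + 2.0 * lam)       # recovery + two infection arrows
        dt = rng.exponential(1.0 / rate)
        lo, hi = max(clock, s), min(clock + dt, s + t)
        if hi > lo and inf[Bidx].any():
            occ += hi - lo
        clock += dt
        i = rng.choice(np.flatnonzero(inf))        # a uniformly chosen infected site
        if rng.random() < 1.0 / (1.0 + 2.0 * lam):
            inf[i] = False                         # recovery at rate 1
        else:
            j = i + (1 if rng.random() < 0.5 else -1)
            if 0 <= j <= 2 * L:
                inf[j] = True                      # infection attempt at rate lambda
    return occ

print("approximate D_{s,I_B}^t:", round(contact_occupation(), 2))
```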
Now we focus on the functional $D^t_{s,f}$, where $f$ is a non-constant increasing cylindrical function, for the supercritical one dimensional contact process $\{\zeta_{\nu_\lambda}(t) : t\ge 0\}$ with infection rate $\lambda$, recovery rate 1, and initial configuration having distribution $\nu_\lambda$, the unique non-trivial invariant measure of the process. It was proved in [79] that in the supercritical case $D^t_{0,f}$ satisfies the CLT for any cylindrical $f$. In Theorems 5.5.3 and 5.5.4 we provide finite sample bounds in the $L^1$ metric, and in a multivariate smooth functions metric, for this asymptotic when $f$ is increasing and non-constant, by applying Theorems 4.2.1 and 4.3.1, respectively.

For the main goal in this section, we apply Lemma 5.5.1, which provides exponential decay of the covariances of $f(\zeta_{\nu_\lambda}(\cdot))$ in time. We note the brief but important comment before the statement of Lemma 1 of [79], that the process considered there, and defined in an indirect manner, has the distribution of the contact process started in its unique invariant non-trivial distribution. We recall the definition of $\lambda^*$ given in (5.51).

Lemma 5.5.1 (Lemma 2 of [79]). For $\lambda>\lambda^*$ and for any cylindrical $f:\mathcal{P}(\mathbb{Z})\to\mathbb{R}$ there exist positive constants $\gamma=\gamma(f,\lambda)$ and $\kappa=\kappa(f,\lambda)$ depending on $\lambda$ and $f$ such that
\[
\big|\mathrm{Cov}\big(f(\zeta_{\nu_\lambda}(r)),f(\zeta_{\nu_\lambda}(s))\big)\big| \le \kappa e^{-\gamma|s-r|} \quad\text{for } \{r,s\}\subset(0,\infty). \qquad (5.55)
\]

The results of [79] make use of the fact that any cylindrical function $f$ is a finite linear combination of indicators of the form $I_B(\cdot)$ given in (5.53) with $|B|<\infty$. Hence $M_f := |f|_\infty < \infty$, implying that the moments of $f$ on the process always exist. The next lemma verifies the positive association required for the application of our main results; let $Y^{t,m}_{s,i}$ be as in (5.54).

Lemma 5.5.2. For $\lambda>\lambda^*$ and any increasing cylindrical function $f:\mathcal{P}(\mathbb{Z})\to\mathbb{R}$, the families $\{f(\zeta_{\nu_\lambda}(t)) : t\ge 0\}$ and $\{Y^{t,m}_{s,i} : i\in[m]\}$ for $s\ge 0$, $t>0$ and $m\in\mathbb{N}_1$, are positively associated.

Proof. By applying (2.21) in [65], a corollary of Harris' theorem, the proof of Lemma 1 of [79] demonstrates that $\{\zeta_{\nu_\lambda}(t) : t\ge 0\}$ is a positively associated family of random variables. Now both claims follow from the fact that both families in the statement are increasing functions of $\{\zeta_{\nu_\lambda}(t) : t\ge 0\}$. □

Using the remark above Proposition 1 in [79], the process $\{\zeta_{\nu_\lambda}(t) : t\in\mathbb{R}\}$ is strictly stationary, hence we may let
\[
\frac{\mathrm{Var}(D^t_{s,f})}{t} = A^t_f \quad\text{for all } s\ge 0,\ t>0. \qquad (5.56)
\]
Since $f$ is non-constant increasing and $\zeta_{\nu_\lambda}(u)$ is strictly stationary, by the Remark under Lemma 1 of [79], $\mathrm{Var}(f(\zeta_{\nu_\lambda}(u)))>0$ and does not depend on $u$. Since $f(\zeta_{\nu_\lambda}(u)),u\ge 0$, is positively associated, we have that $A^t_f>0$ for all $t>0$. Using the definition of $D^t_{s,f}$ in (5.52), we also have $A^t_f\le 2M_f^2<\infty$. Standardizing $D^t_{s,f}$ to have mean zero and variance one, we obtain
\[
W^t_{s,f} = \frac{D^t_{s,f}-ED^t_{s,f}}{\sqrt{tA^t_f}} = \frac{\sum_{i=1}^m\big(Y^{t,m}_{s,i}-EY^{t,m}_{s,i}\big)}{\sqrt{tA^t_f}}. \qquad (5.57)
\]
The following theorem provides a bound on the $L^1$ distance between $W^t_{s,f}$ and the standard normal.

Theorem 5.5.3. Let $\lambda>\lambda^*$, $s\ge 0$ and $t>0$. Then for any non-constant increasing cylindrical function $f:\mathcal{P}(\mathbb{Z})\to\mathbb{R}$, with $W^t_{s,f}$ as in (5.57) and $Z\sim\mathcal{N}(0,1)$,
\[
d_1\big(\mathcal{L}(W^t_{s,f}),\mathcal{L}(Z)\big) \le \left(\frac{360\sqrt2\,\kappa M_f}{\sqrt\pi\,(A^t_f)^{3/2}\gamma^2}\right)^{1/2}t^{-1/4}
\quad\text{for all } t\ge\left(\frac{2\sqrt2\,\kappa}{5\gamma^2M_f\sqrt{\pi A^t_f}}\right)^{2/3},
\]
where $M_f=|f|_\infty<\infty$, and $\kappa$ and $\gamma$ are as in (5.55).

We also prove the following multidimensional version of Theorem 5.5.3 for the $p$-vector
\[
S^t_f = \big(D^t_{s_1,f},\dots,D^t_{s_p,f}\big) \quad\text{with } s_1,\dots,s_p\ge 0, \qquad (5.58)
\]
where $D^t_{s,f}$ is defined in (5.52).

Theorem 5.5.4. Let $\lambda>\lambda^*$, $t>0$, and for $p\in\mathbb{N}_1$ let $s_1,\dots,s_p\ge 0$ satisfy
\[
\min_{k,l\in[p],\,k\ne l}|s_k-s_l| \ge (1-\alpha)t
\]
for some $0<\alpha<1$, and let $f:\mathcal{P}(\mathbb{Z})\to\mathbb{R}$ be a non-constant increasing cylindrical function. Assume that the covariance matrix $\Sigma$ of $S^t_f$ in (5.58) is invertible and let $\psi^t_f = t^{1/2}|\Sigma^{-1/2}|_\infty$. Then, there exists a constant $C_{p,f,\lambda}$ such that with $Z$ a standard normal random vector in $\mathbb{R}^p$,
\[
d_{\mathcal{H}_{3,\infty,p}}\big(\mathcal{L}\big(\Sigma^{-1/2}(S^t_f-ES^t_f)\big),\mathcal{L}(Z)\big)
\le C_{p,f,\lambda}\left(\Big(A^t_f+\alpha+\frac1t\Big)^{1/2}(\psi^t_f)^{5/2}t^{-1/4} + (\psi^t_f)^2\Big(\alpha+\frac1t\Big)\right)
\]
for all
\[
t \ge \left(\psi^t_f\Big(A^t_f+\alpha+\frac1t\Big)\right)^{-2/3}. \qquad (5.59)
\]

We prove Theorems 5.5.3 and 5.5.4 using Theorems 4.2.1 and 4.3.1. Alternatively, one may apply Theorems 5.1.1 and 5.1.2 by breaking $D^t_{s,f}$ into a sum of integrals over intervals of length one and a `remainder integral' over an interval of length $t-\lfloor t\rfloor$. However, this approach results in constants larger than the ones stated in Theorems 5.5.3 and 5.5.4, which break $D^t_{s,f}$ into integrals over intervals of equal, optimal length.

For checking the hypothesis that the covariance matrix of $S^t_f$ is invertible and that the quantity $\psi^t_f$ is of order one, we present the following sufficient condition. We see from (5.59) that if $\psi^t_f=O(1)$ and $\alpha=O(t^{-\frac14})$ then the bound will also be of order $O(t^{-\frac14})$.

Lemma 5.5.5. With $t>0$, the covariance matrix $\Sigma$ of the random vector $S^t_f\in\mathbb{R}^p$, $p\ge 2$, in (5.58) is invertible if for some $b\ge 0$
\[
|s_k-s_l|\ge t-b \quad\text{for all } k\ne l,\ k,l\in[p], \quad\text{and}\quad b < \frac{tA^t_f\,\gamma}{2(p-1)\kappa} - \frac1\gamma,
\]
where $A^t_f$ is given in (5.56) and $\kappa,\gamma$ in (5.55). If $b=\alpha t$ for some $\alpha$ satisfying
\[
0<\alpha<\min\left\{1,\ \frac{A^t_f\,\gamma}{2(p-1)\kappa} - \frac1{\gamma t}\right\},
\]
then the matrix $\Sigma$ is invertible and
\[
|\Sigma^{-1}|_\infty \le \frac{1}{t\big(A^t_f - 2\kappa(p-1)\big(\alpha/\gamma + 1/(\gamma^2t)\big)\big)}.
\]

Since Theorems 5.5.3 and 5.5.4 and Lemma 5.5.5 have the same conclusions as the voter model results in Theorems 5.4.1 and 5.4.2 and Lemma 5.4.3, respectively, differing only by constants depending on the parameters of the model, it suffices to prove `contact process' versions of Lemmas 5.4.7 and 5.4.8, and then the results in this section follow as for their voter model counterparts. Proceeding in this manner, Lemma 5.5.6 bounds the sum of the covariances between $Y^{t,m}_{s,i}$ and $Y^{t,m}_{s,j}$, defined in (5.54), over distinct $i,j\in[m]$, and Lemma 5.5.7 bounds the covariance between $D^t_{r,f}$ and $D^t_{s,f}$ where $r$ and $s$ are at least $t-b$ apart for some $0<b\le t$.

Lemma 5.5.6. For $t>0$, $s\ge 0$ and $m\in\mathbb{N}_1$, with $Y^{t,m}_{s,i}$ as in (5.54),
\[
\sum_{i,j\in[m],\,i\ne j}\mathrm{Cov}\big(Y^{t,m}_{s,i},Y^{t,m}_{s,j}\big) \le \frac{2\kappa m}{\gamma^2},
\]
where $\kappa$ and $\gamma$ are as in (5.55).

Proof. Applying Lemma 5.5.1 and using the stationarity of $\zeta_{\nu_\lambda}(\cdot)$, for $1\le i<j\le m$ we have
\[
\mathrm{Cov}\big(Y^{t,m}_{s,i},Y^{t,m}_{s,j}\big) = \int_{(i-1)t/m}^{it/m}\int_{(j-1)t/m}^{jt/m}\mathrm{Cov}\big(f(\zeta_{\nu_\lambda}(u)),f(\zeta_{\nu_\lambda}(v))\big)\,du\,dv
\le \kappa\int_{(i-1)t/m}^{it/m}\int_{(j-1)t/m}^{jt/m}e^{-\gamma(u-v)}\,du\,dv
\]
\[
= \kappa\int_{(j-1)t/m}^{jt/m}e^{-\gamma u}\,du\int_{(i-1)t/m}^{it/m}e^{\gamma v}\,dv
= \frac{\kappa}{\gamma^2}\big(e^{\gamma t/2m}-e^{-\gamma t/2m}\big)^2e^{-(j-i)\gamma t/m}.
\]
Summing $\mathrm{Cov}\big(Y^{t,m}_{s,i},Y^{t,m}_{s,j}\big)$ over $i\ne j$, we obtain
\[
\sum_{i,j\in[m],\,i\ne j}\mathrm{Cov}\big(Y^{t,m}_{s,i},Y^{t,m}_{s,j}\big) = 2\sum_{1\le i<j\le m}\mathrm{Cov}\big(Y^{t,m}_{s,i},Y^{t,m}_{s,j}\big)
\le \frac{2\kappa}{\gamma^2}\big(e^{\gamma t/2m}-e^{-\gamma t/2m}\big)^2\sum_{1\le i<j\le m}e^{-(j-i)\gamma t/m}
\]
\[
= \frac{2\kappa}{\gamma^2}\big(e^{\gamma t/2m}-e^{-\gamma t/2m}\big)^2\sum_{k=1}^{m-1}(m-k)e^{-\gamma kt/m}
= \frac{2\kappa}{\gamma^2}\big(e^{\gamma t/2m}-e^{-\gamma t/2m}\big)^2\,\frac{e^{-\gamma t/m}\big((m-1)-me^{-\gamma t/m}+e^{-\gamma t}\big)}{\big(1-e^{-\gamma t/m}\big)^2}
\]
\[
= \frac{2\kappa}{\gamma^2}\big((m-1)-me^{-\gamma t/m}+e^{-\gamma t}\big) \le \frac{2\kappa m}{\gamma^2},
\]
where in the third equality we apply the identity (5.5) with $n=m$ and $w=e^{-\gamma t/m}$. □
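The closed form for $\sum_{k=1}^{m-1}(m-k)w^k$ used above is easily checked numerically; a minimal Python sketch with illustrative names follows.

```python
# Numerical check of the closed form used in the proof of Lemma 5.5.6 for
# sum_{k=1}^{m-1} (m - k) w^k with w = exp(-gamma t / m); names illustrative.
from math import exp

def lhs(m, w):
    return sum((m - k) * w ** k for k in range(1, m))

def rhs(m, w):
    return w * ((m - 1) - m * w + w ** m) / (1 - w) ** 2

for (m, gamma, t) in [(5, 0.8, 3.0), (12, 0.3, 10.0)]:
    w = exp(-gamma * t / m)
    assert abs(lhs(m, w) - rhs(m, w)) < 1e-10
```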
Lemma 5.5.7. For $t>0$, let $0\le b\le t$ and let $r,s\ge 0$ satisfy $|s-r|\ge t-b$. Then, with $D^t_{s,f}$ as in (5.52),
\[
\mathrm{Cov}\big(D^t_{r,f},D^t_{s,f}\big) \le 2\kappa\Big(\frac b\gamma + \frac1{\gamma^2}\Big),
\]
where $\kappa$ and $\gamma$ are as in (5.55).

Proof. It suffices to consider the case $s\ge r$. First assume that $s\le r+t$. Using the definition of $D^t_{s,f}$ in (5.52) and breaking the integral that expresses the covariance we wish to bound into three parts, we have
\[
\mathrm{Cov}\big(D^t_{s,f},D^t_{r,f}\big) = \int_s^{s+t}\int_r^{r+t}\mathrm{Cov}\big(f(\zeta_{\nu_\lambda}(u)),f(\zeta_{\nu_\lambda}(v))\big)\,du\,dv
\]
\[
= \int_s^{s+t}\int_r^s\mathrm{Cov}\big(f(\zeta_{\nu_\lambda}(u)),f(\zeta_{\nu_\lambda}(v))\big)\,du\,dv
+ \int_{r+t}^{s+t}\int_s^{r+t}\mathrm{Cov}\big(f(\zeta_{\nu_\lambda}(u)),f(\zeta_{\nu_\lambda}(v))\big)\,du\,dv
+ \int_s^{r+t}\int_s^{r+t}\mathrm{Cov}\big(f(\zeta_{\nu_\lambda}(u)),f(\zeta_{\nu_\lambda}(v))\big)\,du\,dv.
\]
Now applying Lemma 5.5.1 we have
\[
\mathrm{Cov}\big(D^t_{r,f},D^t_{s,f}\big)
\le \kappa\int_s^{s+t}\int_r^s e^{-\gamma(v-u)}\,du\,dv + \kappa\int_{r+t}^{s+t}\int_s^{r+t}e^{-\gamma(v-u)}\,du\,dv
+ \kappa\int_s^{r+t}\int_s^v e^{-\gamma(v-u)}\,du\,dv + \kappa\int_s^{r+t}\int_v^{r+t}e^{-\gamma(u-v)}\,du\,dv
\]
\[
= \kappa\left(\frac{1-e^{-\gamma(s-r)}-e^{-\gamma t}+e^{-\gamma(s-r+t)}}{\gamma^2}
+ \frac{1-e^{-\gamma(r-s+t)}-e^{-\gamma(s-r)}+e^{-\gamma t}}{\gamma^2}
+ 2\left(\frac{t-(s-r)}{\gamma} - \frac{1-e^{-\gamma(r-s+t)}}{\gamma^2}\right)\right)
\]
\[
= \kappa\left(\frac{2(t-(s-r))}{\gamma} + \frac{e^{-\gamma(s-r+t)}+e^{-\gamma(r-s+t)}-2e^{-\gamma(s-r)}}{\gamma^2}\right)
\le \kappa\left(\frac{2(t-(s-r))}{\gamma} + \frac{2}{\gamma^2}\right) \le 2\kappa\Big(\frac b\gamma + \frac1{\gamma^2}\Big).
\]
When $s>r+t$ we have
\[
\mathrm{Cov}\big(D^t_{s,f},D^t_{r,f}\big) \le \kappa\int_s^{s+t}\int_r^{r+t}e^{-\gamma(v-u)}\,du\,dv
= \kappa\int_s^{s+t}e^{-\gamma v}\,dv\int_r^{r+t}e^{\gamma u}\,du
= \frac{\kappa}{\gamma^2}\big(e^{-\gamma(s-r-t)}+e^{-\gamma(s-r+t)}-2e^{-\gamma(s-r)}\big) \le \frac{2\kappa}{\gamma^2};
\]
hence the claim of the lemma holds in this case, as it does with $b$ replaced by zero. □

The proofs of Theorems 5.5.3 and 5.5.4 and Lemma 5.5.5 below closely follow those of Theorems 5.4.1 and 5.4.2 and Lemma 5.4.3, respectively, so only brief outlines are provided.

Proof of Theorem 5.5.3: Following the outline of the proof of Theorem 5.4.1, we apply Theorem 4.2.1 to the sum of
\[
\xi^t_{s,i} = \frac{Y^{t,m}_{s,i}-EY^{t,m}_{s,i}}{\sqrt{tA^t_f}},
\]
which is absolutely bounded by $B = 2\sqrt t\,M_f/\big(m\sqrt{A^t_f}\big)$. Applying Lemma 5.5.6 for the second term of (4.8) yields
\[
d_1\big(\mathcal{L}(W^t_{s,f}),\mathcal{L}(Z)\big) \le \frac{10\sqrt t\,M_f}{m\sqrt{A^t_f}} + \frac{4\sqrt2\,\kappa m}{\sqrt\pi\,\gamma^2tA^t_f}.
\]
Applying (5.22) with $d=1$ and with $m$ in place of $l$, as in the proof of Theorem 5.4.1, now yields the result. □

Proof of Theorem 5.5.4: We apply Theorem 4.3.1 as in the proof of Theorem 5.4.2. With $\xi^k_i$ defined by $\xi^k_i = Y^{t,m}_{s_k,i}-EY^{t,m}_{s_k,i}$, we have $|\xi^k_i|\le B$ where $B = 2M_ft/m$. Then bounding the right hand side of (4.19), using that $|\Sigma^{-1/2}|_\infty = t^{-1/2}\psi^t_f$ for the first term, applying Lemmas 5.5.6 and 5.5.7 for the second and last terms respectively, and invoking Theorem 4.3.1, we obtain
\[
d_{\mathcal{H}_{3,\infty,p}}\big(\mathcal{L}\big(\Sigma^{-1/2}(S^t_f-ES^t_f)\big),\mathcal{L}(Z)\big)
\le \left(C_{p,f}\sqrt t\,A^t_f + \frac{C_{p,f,\lambda}}{\sqrt t} + C_{p,f,\lambda}\,\alpha\sqrt t\right)\frac{(\psi^t_f)^3}{m}
+ \frac{C_{p,f,\lambda}(\psi^t_f)^2m}{t} + \frac{C_{p,f,\lambda}(\psi^t_f)^2}{t} + C_{p,f,\lambda}\,\alpha(\psi^t_f)^2
\]
\[
\le C_{p,f,\lambda}\left(\Big(\sqrt t\,A^t_f + \frac1{\sqrt t} + \alpha\sqrt t\Big)\frac{(\psi^t_f)^3}{m} + \frac{(\psi^t_f)^2m}{t} + (\psi^t_f)^2\Big(\alpha+\frac1t\Big)\right).
\]
Applying (5.22) to the first two terms in parentheses now yields the result. □

Finally, to verify Lemma 5.5.5, we follow the same idea as in the proof of Lemma 5.4.3, with the help of Lemma 5.5.7.

5.5.2 The number of infected sites at any fixed time

In addition to the time dependent quantity (5.52) studied in the previous subsection, the results in Section 5.1 also allow us to obtain a version of Corollary 5.2.1 for the number of infected sites at any fixed time $t>0$ in a bounded region, for the supercritical multidimensional contact process $\{\zeta^{\mathbb{Z}^d}_t(x) : x\in\mathbb{Z}^d\}$ with initial state having mass one on $\mathbb{Z}^d$. Here $\zeta^{\mathbb{Z}^d}_t(x)=1$ if $x$ is infected at time $t$ and $\zeta^{\mathbb{Z}^d}_t(x)=0$ otherwise. In particular, the values of the process at $t$ are positively associated by Theorem B17 of [66], and the covariance between $\zeta^{\mathbb{Z}^d}_t(x)$ and $\zeta^{\mathbb{Z}^d}_t(y)$ decays exponentially by Theorem 1.7 of [38]. Strict stationarity follows from the fact that at time zero all sites are infected. We state this result in the following corollary.

Corollary 5.5.8. The bounds of Theorem 5.1.1 (Theorem 5.1.2) apply to the number of infected sites at any fixed time $t>0$,
\[
\sum_{x\in B^n_k}\zeta^{\mathbb{Z}^d}_t(x),
\]
for the supercritical multidimensional contact process $\{\zeta^{\mathbb{Z}^d}_t(x) : x\in\mathbb{Z}^d\}$ (when choosing $\alpha$ in (5.8) to satisfy (5.11)).

Chapter 6 Summary, Related Works and Further Questions

In this chapter, we summarize the work of this dissertation, mention some related works, and state further questions that may interest the reader. Our contributions and some related works are as follows.

• We generalize the well-known zero bias coupling to an approximate zero bias coupling.

• We obtain $L^1$ and $L^\infty$ bounds based on approximate zero bias couplings, using techniques similar to those used in the zero biasing case in [21].

• The $L^1$ and $L^\infty$ bounds are applied to combinatorial central limit theorems where the random permutation has the Ewens distribution. We obtain the same rate of convergence as the known results for the case where the random permutation has the uniform distribution. Besides the uniform case, the work [42] considered the case where the random permutation is chosen uniformly from the set of involutions without fixed points, and the work [21] obtained results when the random permutation has distribution constant on cycle type without fixed points.

• We obtain $L^1$ bounds for sums of positively associated random variables, and also extend the results to random vectors, with the $L^1$ metric replaced by a smooth functions metric.

• The $L^1$ and smooth functions metric bounds in the one and multidimensional cases, respectively, are first applied to second order stationary positively associated random fields over $\mathbb{Z}^d$ with exponentially decaying covariance. In the multidimensional case, we also allow block sums to overlap. In [15], $L^\infty$ bounds without the stationarity assumption and with a better rate were obtained; nevertheless, the multidimensional case was not considered there, and $L^\infty$ bounds do not imply $L^1$ bounds in general. The CLT without a rate of convergence for the multidimensional case, under a strict stationarity assumption and with disjoint block sums, was proved in [69].

• Our results for second order stationary positively associated random fields with exponentially decaying covariance are applied further to well known statistical physics models: the ferromagnetic nearest-neighbor Ising model, the supercritical bond percolation model, and the one dimensional contact process.

• Our main $L^1$ and smooth functions metric bounds are also applied directly to the voter model on $\mathbb{Z}^d$ with $d\ge 7$. The CLT without a rate of convergence is known for this model for $d\ge 2$. The $L^\infty$ bounds in [15] are not applicable to this model.
• The work [88] proved results for negatively associated random fields parallel to the ones in this dissertation for positively associated random fields, using the same technique as in Chapter 2.

We end this dissertation by stating several further related questions that may be worth studying in the future.

• Are the positively associated results applicable to the random cluster models, of which the Ising model and bond percolation are special cases? For more detail about these models, see the book [53].

• Does the technique used to prove the positively associated results work in other settings?

• Does the approximate zero biasing technique work with other models?

• Is it possible to use the approximate zero biasing technique to obtain concentration inequalities for Hoeffding's statistics where the random permutation has the Ewens distribution?

• Can we generalize the result for the combinatorial CLT where the random permutation has the Ewens distribution to one where the random permutation has distribution constant on cycle type and allows fixed points? The Ewens distribution as defined in (3.12) is a special case of this type of distribution.

Reference List

[1] Ahlberg, J. H. and Nilson, E. N., Convergence properties of the spline fit, J. SIAM, 11, 95–104, 1963.

[2] Aizenman, M., Barsky, D. J. and Fernandez, R., The phase transition in a general class of Ising-type models is sharp, Journal of Statistical Physics, 47, 343–374, 1987.

[3] Aizenman, M. and Duminil-Copin, H., The truncated correlations of the Ising model in any dimension decay exponentially fast at all but the critical temperature, ArXiv:1506.00625, 2015.

[4] Aldous, D. J., Exchangeability and related topics. In P. L. Hennequin, editor, École d'Été de Probabilités de Saint-Flour XIII, Springer Lecture Notes in Mathematics, Vol. 1117, Springer-Verlag, 1985.

[5] Arratia, R., Barbour, A. D. and Tavaré, S., Logarithmic Combinatorial Structures: A Probabilistic Approach, European Mathematical Society, 2003.

[6] Banerjee, S. and Roy, A., Linear Algebra and Matrix Analysis for Statistics, Chapman and Hall, 2014.

[7] Barbour, A. D., Stein's method for diffusion approximations, Probab. Th. Rel. Fields, 84, 297–332, 1990.

[8] Barbour, A. D., Holst, L. and Janson, S., Poisson Approximation, Oxford University Press, 1992.

[9] Barbour, A. D., Karoński, M. and Ruciński, A., A central limit theorem for decomposable random variables with applications to random graphs, J. Combin. Theory Ser. B, 47, 125–145, 1989.

[10] Berry, A. C., The accuracy of the Gaussian approximation to the sum of independent variates, Transactions of the American Mathematical Society, 49, 122–136, 1941.

[11] Birkel, T., On the convergence rate in the central limit theorem for associated processes, Annals of Probability, 16, 1685–1698, 1988.

[12] Bolthausen, E., An estimate of the remainder in a combinatorial central limit theorem, Z. Wahrsch. Verw. Gebiete, 66, 379–386, 1984.

[13] Brenti, F., Unimodal, Log-concave, and Pólya Frequency Sequences in Combinatorics, Memoirs Amer. Math. Soc., no. 413, 1989.

[14] Broadbent, S. R. and Hammersley, J. M., Percolation processes I. Crystals and mazes, Proceedings of the Cambridge Philosophical Society, 53, 629–641, 1957.

[15] Bulinski, A., Rate of convergence in the central limit theorem for fields of associated random variables, Theory Probab. Appl., 40, 136–144, 1995.

[16] Chatterjee, S., Stein's method for concentration inequalities, Probab. Theory Relat. Fields, 138, 305–321, 2007.
[17] Chatterjee, S. and Shao, Q.-M., Nonnormal approximation by Stein's method of exchangeable pairs with application to the Curie-Weiss model, Annals of Applied Probability, 21, 464–483, 2011.

[18] Chayes, J. T., Chayes, L., Grimmett, G., Kesten, H. and Schonmann, R. H., The correlation length for the high-density phase of Bernoulli percolation, Annals of Probability, 17, 1277–1302, 1989.

[19] Chen, L. H. Y., On the convergence of Poisson binomial to Poisson distributions, Annals of Probability, 2, 178–180, 1974.

[20] Chen, L. H. Y. and Fang, X., On the error bound in a combinatorial central limit theorem, Bernoulli, 21(1), 335–359, 2015.

[21] Chen, L. H. Y., Goldstein, L. and Shao, Q.-M., Normal Approximation by Stein's Method, Springer, New York, 2011.

[22] Chen, L. H. Y. and Roellin, A., Stein couplings for normal approximation, ArXiv:1003.6039, 2010.

[23] Cox, T. and Griffeath, D., Occupation time limit theorems for the voter model, Annals of Probability, 11, 876–893, 1983.

[24] Cox, T. and Grimmett, G., Central limit theorems for associated random variables and the percolation model, Annals of Probability, 12, 514–528, 1984.

[25] Clifford, P. and Sudbury, A., A model for spatial conflict, Biometrika, 60, 581–588, 1973.

[26] Döbler, C., Stein's method of exchangeable pairs for absolutely continuous, univariate distributions with applications to the Polya urn model, ArXiv:1207.0533v2, 2012.

[27] Döbler, C., Stein's method of exchangeable pairs for the Beta distribution and generalizations, Electron. J. Probab., 20, 1–34, 2015.

[28] Duminil-Copin, H. and Tassion, V., A new proof of the sharpness of the phase transition for Bernoulli percolation and the Ising model, ArXiv:1502.03050v2, 2015.

[29] Durrett, R. and Griffeath, D., Supercritical contact processes on Z, Annals of Probability, 11, 1–15, 1983.

[30] Eichelsbacher, P. and Löwe, M., Stein's method for dependent random variables occurring in statistical mechanics, Electronic Journal of Probability, 30, 962–988, 2010.

[31] Eichelsbacher, P. and Martschink, B., On rates of convergence in the Curie–Weiss–Potts model with an external field, Ann. Inst. H. Poincaré Probab. Statist., 51, 252–282, 2015.

[32] Eichelsbacher, P. and Reinert, G., Stein's method for discrete Gibbs measures, Annals of Applied Probability, 18, 1588–1618, 2008.

[33] Ellis, R. S., Entropy, Large Deviations, and Statistical Mechanics, Springer, New York, 2006.

[34] Englund, G., A remainder term estimate for the normal approximation in classical occupancy, Annals of Probability, 9, 684–692, 1981.

[35] Esary, J. D., Proschan, F. and Walkup, D. W., Association of random variables, with applications, Ann. Math. Statist., 38, 1466–1474, 1967.

[36] Esseen, C. G., On the Liapunoff limit of error in the theory of probability, Arkiv för matematik, astronomi och fysik, A28, 1–19, 1942.

[37] Ewens, W., The sampling theory of selectively neutral alleles, Theoretical Population Biology, 3, 87–112, 1972.

[38] Fiocco, M. and Van Zwet, W. R., Decaying correlations for the supercritical contact process conditioned on survival, Bernoulli, 9, 763–781, 2003.

[39] Frolov, A. N., Esseen type bounds of the remainder in a combinatorial CLT, Statist. Plann. Inference, 149, 90–97, 2014.

[40] Frolov, A. N., On Esseen type inequalities for combinatorial random sums, Statist. Theory Methods, 46(12), 5932–5940, 2017.

[41] Gaunt, R. E., A Stein characterisation of the generalized hyperbolic distribution, ArXiv:1603.05675v3, 2017.
[42] Ghosh, S., $L^p$ bounds for a central limit theorem with involutions, ArXiv:0905.1150v3, 2010.

[43] Ghosh, S. and Goldstein, L., Concentration of measures via size-biased couplings, Probability Theory and Related Fields, 149, 271–278, 2011.

[44] Ghosh, S. and Goldstein, L., Applications of size biased couplings for concentration of measures, Electronic Communications in Probability, 16, 70–83, 2011.

[45] Goldstein, L., A Berry-Esseen bound with applications to vertex degree counts in the Erdős-Rényi random graph, Annals of Applied Probability, 23, 617–636, 2013.

[46] Goldstein, L. and Işlak, Ü., Concentration inequalities via zero bias couplings, Statistics and Probability Letters, 86, 17–23, 2014.

[47] Goldstein, L. and Reinert, G., Stein's method for the Beta distribution and the Pólya-Eggenberger urn, Journal of Applied Probability, 50, 1187–1205, 2013.

[48] Goldstein, L. and Rinott, Y., Multivariate normal approximations by Stein's method and size bias couplings, J. Appl. Probab., 33(1), 1–17, 1996.

[49] Goldstein, L. and Rinott, Y., Stein's method and the zero bias transformation with application to simple random sampling, Annals of Applied Probability, 7, 935–952, 1997.

[50] Goldstein, L. and Wiroonsri, N., Stein's method for positively associated random variables with applications to Ising, percolation and voter models, ArXiv:1603.05322v1, 2016.

[51] Goldstein, L. and Wiroonsri, N., Stein's method for positively associated random variables with applications to the Ising and voter models, bond percolation, and contact process, Annales de l'Institut Henri Poincaré, 54, 385–421, 2018.

[52] Grimmett, G., Percolation (2nd edition), Springer, New York, 1999.

[53] Grimmett, G., The Random-Cluster Model, Springer, New York, 2006.

[54] Harris, T. E., Contact interactions on a lattice, Annals of Probability, 2, 969–988, 1974.

[55] Ho, S. T. and Chen, L. H. Y., An $L_p$ bound for the remainder in a combinatorial central limit theorem, Annals of Probability, 6(2), 231–249, 1978.

[56] Hoeffding, W., Masstabinvariante Korrelationstheorie, Schriften Math. Inst. Univ. Berlin, 5, 181–233, 1940.

[57] Hoeffding, W., A combinatorial central limit theorem, Ann. Math. Statist., 22, 558–566, 1951.

[58] Holley, R. A. and Liggett, T. M., Ergodic theorems for weakly interacting infinite systems and the voter model, Annals of Probability, 3, 643–663, 1975.

[59] Hotelling, H. and Pabst, M., Rank correlation and tests of significance involving no assumption of normality, Ann. Math. Statist., 7, 29–43, 1936.

[60] Ibragimov, I. A. and Linnik, Yu. V., Independent and Stationary Sequences of Random Variables, Wolters-Noordhoff, Groningen, 1971.

[61] Ising, E., Beitrag zur Theorie des Ferromagnetismus, Z. Phys., 31, 253–258, 1925.

[62] Lebowitz, J. L. and Penrose, O., Analytic and clustering properties of thermodynamic functions and distribution functions for classical lattice and continuum systems, Commun. Math. Phys., 11, 99–124, 1968.

[63] Lehmann, E. L., Some concepts of dependence, Ann. Math. Statist., 37, 1137–1153, 1966.

[64] Lehmann, E. L. and Stein, C., On the theory of some nonparametric hypotheses, Ann. Math. Statist., 20, 28–45, 1949.

[65] Liggett, T. M., Interacting Particle Systems, Springer, New York, 1985.

[66] Liggett, T. M., Stochastic Interacting Systems: Contact, Voter and Exclusion Processes, Springer, Berlin Heidelberg, 1999.

[67] Luk, H. M., Stein's method for the gamma distribution and related statistical applications, PhD thesis, University of Southern California, 1994.
[68] Madow, W. G., On the limiting distributions of estimates based on samples from finite universes, Ann. Math. Statist., 19, 535–545, 1948.
[69] Newman, C., Normal fluctuations and the FKG inequality, Communications in Mathematical Physics, 74, 119–128, 1980.
[70] Nourdin, I. and Peccati, G., Stein's method on Wiener chaos, Probab. Theory Related Fields, 145, 75–118, 2009.
[71] Peköz, E. and Roellin, A., New rates for exponential approximation and the theorems of Rényi and Yaglom, Annals of Probability, 39, 587–608, 2011.
[72] Peköz, E., Roellin, A. and Ross, N., Total variation error bounds for geometric approximation, Bernoulli, 19, 610–632, 2013.
[73] Peköz, E., Roellin, A. and Ross, N., Generalized gamma approximation with rates for urns, walks and trees, Annals of Probability, 44, 1776–1816, 2016.
[74] Pitman, J., Some developments of the Blackwell-MacQueen urn scheme, in T. S. Ferguson, L. S. Shapley and J. B. MacQueen, editors, Statistics, Probability and Game Theory, Vol. 30 of IMS Lecture Notes-Monograph Series, pages 245–267, Institute of Mathematical Statistics, Hayward, CA, 1996.
[75] Rachev, S. T., The Monge-Kantorovich transference problem and its stochastic applications, Theory Probab. Appl., 29, 647–676, 1984.
[76] Rinott, Y. and Rotar, V., On coupling constructions and rates in the CLT for dependent summands with applications to the antivoter model and weighted U-statistics, Ann. Appl. Probab., 7(4), 1080–1105, 1997.
[77] Roellin, A., Symmetric and centered binomial approximation of sums of locally dependent random variables, Electron. J. Probab., 13, 756–776, 2008.
[78] Ross, N., Fundamentals of Stein's method, Prob. Surv., 8, 210–293, 2011.
[79] Schonmann, R. H., Central limit theorem for the contact process, Annals of Probability, 14, 1291–1295, 1986.
[80] Shevtsova, I., On the absolute constants in the Berry–Esseen type inequalities for identically distributed summands, ArXiv:1111.6554, 2011.
[81] Stanley, R. P., Log-concave and unimodal sequences in algebra, combinatorics, and geometry, Graph Theory and Its Applications: East and West, 576, 500–535, 1989.
[82] Stein, C., A bound for the error in the normal approximation to the distribution of a sum of dependent random variables, Proc. Sixth Berkeley Symp. Math. Statist. Prob., Univ. of California Press, 2, 583–602, 1972.
[83] Stein, C., Approximate Computation of Expectations, Institute of Mathematical Statistics Lecture Notes-Monograph Series, 7, Institute of Mathematical Statistics, Hayward, CA, 1986.
[84] Tikhomirov, A. N., On the convergence rate in the central limit theorem for weakly dependent random variables, Theory Probab. Appl., 25, 790–809, 1980.
[85] Wald, A. and Wolfowitz, J., Statistical tests based on permutations of the observations, Ann. Math. Statist., 15, 358–372, 1944.
[86] Watterson, G. A., The sampling theory of selectively neutral alleles, Advances in Applied Probability, 6, 463–488, 1974.
[87] Wiroonsri, N., Stein's method using approximate zero bias couplings with applications to combinatorial central limit theorems under the Ewens distribution, ALEA, Lat. Am. J. Probab. Math. Stat., 14, 917–946, 2017.
[88] Wiroonsri, N., Stein's method for negatively associated random variables with applications to second order stationary random fields, Journal of Applied Probability, 55, 196–215, 2018.
Abstract
Stein's method is now one of the most powerful methods for proving limit theorems in probability and statistics. It was first introduced in 1972 by Charles Stein and was originally used for normal approximation. Its exceptional feature relative to other methods is that it is based on a characterizing equation of the normal distribution and uses coupling constructions; in particular, it does not require the use of complex-valued characteristic functions. Since its inception the method has grown dramatically, expanding to cover other distributional approximations, and it is even applied in non-distributional contexts such as concentration of measure. Two additional standout features of the method are that it provides non-asymptotic bounds on the distance between distributions for finite sample sizes, and that it can handle a number of situations involving dependence.

In 1997, Larry Goldstein and Gesine Reinert introduced a coupling technique using the zero bias transformation that can be applied to the combinatorial central limit theorem in the case where the given random permutation has the uniform distribution. The first main focus of this dissertation is to generalize the zero bias coupling to approximate zero bias couplings, and to extend the application of the combinatorial central limit theorem to the case where the random permutation has the Ewens distribution with parameter θ > 0, which contains the uniform distribution as the special case θ = 1.

Stein's method also has many well-known applications related to statistical physics, where positive association among the underlying variables is often satisfied. The second main focus is to develop an L¹ version of Stein's method for sums of positively associated random variables, and to provide results in the multidimensional case where the L¹ metric is replaced by a smooth functions metric. These results are then applied to four well-known models in statistical physics and particle systems: the Ising and voter models, bond percolation, and the contact process.
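For readers of this record, the characterizing equation referred to in the abstract, and the zero bias identity built on it, can be stated briefly. The following is a standard formulation from the Stein's method literature; the symbols Z, W, σ² and W* are conventional notation rather than quotations from the thesis itself.

% Stein's characterization: Z is standard normal if and only if,
% for all absolutely continuous f with \mathbb{E}|f'(Z)| < \infty,
\[
\mathbb{E}[f'(Z)] = \mathbb{E}[Z f(Z)].
\]
% Zero bias transformation (Goldstein and Rinott [49]): for a
% mean-zero W with variance \sigma^2, W^* has the W-zero biased
% distribution when, for all such f,
\[
\mathbb{E}[W f(W)] = \sigma^2 \, \mathbb{E}[f'(W^*)],
\]
% so W^* has the same distribution as W exactly when W is normal.

Roughly speaking, the approximate zero bias couplings of the title relax the requirement that the second identity hold exactly, at the cost of an additional error term in the resulting bounds.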
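The Ewens distribution with parameter θ mentioned in the abstract has an explicit form; the expression below is the standard one consistent with reference [37], with the cycle-count notation c(π) supplied here for illustration.

% Ewens measure on permutations \pi of \{1, \dots, n\}, where
% c(\pi) denotes the number of cycles of \pi:
\[
\mathbb{P}_\theta(\pi) = \frac{\theta^{c(\pi)}}{\theta(\theta+1)\cdots(\theta+n-1)},
\qquad \theta > 0.
\]
% Setting \theta = 1 makes the numerator 1 and the denominator n!,
% recovering the uniform distribution noted in the abstract.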
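Positive association, the dependence condition underlying the second part of the dissertation, is the property introduced in reference [35]; it is stated here in its usual form, with notation supplied for illustration.

% A random vector X = (X_1, \dots, X_n) is positively associated if
\[
\operatorname{Cov}\bigl(f(X), g(X)\bigr) \ge 0
\]
% for all coordinatewise nondecreasing f and g for which the
% covariance exists. The Ising and voter models, bond percolation
% and the contact process treated in the dissertation (see [51])
% are standard examples satisfying this condition.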
Linked assets
University of Southern California Dissertations and Theses
Conceptually similar
Limit theorems for three random discrete structures via Stein's method
Stein couplings for Berry-Esseen bounds and concentration inequalities
Stein's method and its applications in strong embeddings and Dickman approximations
Finite sample bounds in group sequential analysis via Stein's method
Exchangeable pairs in Stein's method of distributional approximation
Cycle structures of permutations with restricted positions
Concentration inequalities with bounded couplings
Applications of Stein's method on statistics of random graphs
CLT, LDP and incomplete gamma functions
Finite dimensional approximation and convergence in the estimation of the distribution of, and input to, random abstract parabolic systems with application to the deconvolution of blood/breath al...
Prohorov Metric-Based Nonparametric Estimation of the Distribution of Random Parameters in Abstract Parabolic Systems with Application to the Transdermal Transport of Alcohol
Probabilistic numerical methods for fully nonlinear PDEs and related topics
Delta Method confidence bands for parameter-dependent impulse response functions, convolutions, and deconvolutions arising from evolution systems described by…
Essays on bioinformatics and social network analysis: statistical and computational methods for complex systems
M-estimation and non-parametric estimation of a random diffusion equation-based population model for the transdermal transport of ethanol: deconvolution and uncertainty quantification
Detecting joint interactions between sets of variables in the context of studies with a dichotomous phenotype, with applications to asthma susceptibility involving epigenetics and epistasis
Non-parametric models for large capture-recapture experiments with applications to DNA sequencing
Asset Metadata
Creator
Wiroonsri, Nathakhun (author)
Core Title
Stein's method via approximate zero biasing and positive association with applications to combinatorial central limit theorem and statistical physics
School
College of Letters, Arts and Sciences
Degree
Doctor of Philosophy
Degree Program
Applied Mathematics
Publication Date
10/15/2018
Defense Date
09/12/2018
Publisher
University of Southern California (original), University of Southern California. Libraries (digital)
Tag
association, Berry Esseen bounds, limit theorems, OAI-PMH Harvest, random fields, Stein's method, weak dependence
Format
application/pdf (imt)
Language
English
Contributor
Electronically uploaded by the author (provenance)
Advisor
Goldstein, Larry (committee chair), Alexander, Kenneth (committee member), Sun, Fengzhu (committee member)
Creator Email
nwiroon@gmail.com, wiroonsr@usc.edu
Permanent Link (DOI)
https://doi.org/10.25549/usctheses-c89-76925
Unique identifier
UC11672294
Identifier
etd-WiroonsriN-6834.pdf (filename), usctheses-c89-76925 (legacy record id)
Legacy Identifier
etd-WiroonsriN-6834.pdf
Dmrecord
76925
Document Type
Dissertation
Rights
Wiroonsri, Nathakhun
Type
texts
Source
University of Southern California (contributing entity), University of Southern California Dissertations and Theses (collection)
Access Conditions
The author retains rights to his/her dissertation, thesis or other graduate work according to U.S. copyright law. Electronic access is being provided by the USC Libraries in agreement with the a...
Repository Name
University of Southern California Digital Library
Repository Location
USC Digital Library, University of Southern California, University Park Campus MC 2810, 3434 South Grand Avenue, 2nd Floor, Los Angeles, California 90089-2810, USA