CONCENTRATION INEQUALITIES WITH BOUNDED COUPLINGS

by Ümit Işlak

A Dissertation Presented to the FACULTY OF THE USC GRADUATE SCHOOL, UNIVERSITY OF SOUTHERN CALIFORNIA, in Partial Fulfillment of the Requirements for the Degree DOCTOR OF PHILOSOPHY (MATHEMATICS)

May 2014

Copyright 2014 Ümit Işlak

To Sevgi

Acknowledgments

My advisor Prof. Jason Fulman has always been supportive, and very kind. He introduced me to several interesting problems in probability theory, and encouraged me to work on these during my studies. I would like to thank him for all his help throughout my PhD.

It is hard to describe how much I learned from Prof. Larry Goldstein during the last few years, both mathematical and non-mathematical. I wish I could list all of these here, but it would be hard to keep this list short once I start. So I will just say that his help is greatly appreciated and will never be forgotten.

There are two other professors who had special roles in my PhD: my master's advisor Prof. Alp Eden and Prof. Christian Houdré. Prof. Eden was one of my main motivations for starting a PhD, and he kept supporting me over the last four years. Prof. Houdré not only shared great results on mathematics and ideas for future work, but also gave a lot of tips on how to be a good supervisor. Also, I would like to thank my professors Sergey Lototsky, Susan Friedlander, Francis Bonahon, Jay Bartroff, Igor Kukavica, Gesine Reinert, Paul Newton, Yılmaz Koçer, Cymra Haskell and Peter Baxendale for their help and support during different periods of my studies.

I had several friends at USC who contributed to this dissertation directly or indirectly. Especially, I would like to thank Radoslav Marinov, John Pike and Gökhan Yıldırım. Besides, İbrahim Devrim Güner and Ümit Yücel, my neighbors at Orchard Ave., also had a deep impact on me! I also had the joy of getting support from my family, and my best friends Mehmet Demir and Alper Kanar. They are not mathematicians (or probably I should say, they have no idea about mathematics), but still their help was invaluable.

This dissertation is dedicated to my wife Sevgi Işlak; I am not even sure whether I would have chosen a career in mathematics if it were not for her. She was always there while I was hitting my head on the walls.

Table of Contents

Dedication
Acknowledgments
Abstract
Chapter 1: Introduction
Chapter 2: Overview of Stein's method and concentration inequalities
  2.1 Concentration inequalities
  2.2 Stein's method
    2.2.1 A brief introduction to Stein's method
    2.2.2 Couplings from Stein's method
  2.3 Previous work on concentration inequalities with Stein's method
Chapter 3: Concentration inequalities with size biased couplings
  3.1 Background on size biasing and construction of couplings
  3.2 Concentration results
  3.3 Basic examples
  3.4 A correlation inequality
    3.4.1 Examples
  3.5 A closely related coupling
Chapter 4: Size biasing, log-concavity and occupancy models
  4.1 Some lemmas and connections to log-concavity
  4.2 Applications on occupancy models
    4.2.1 Degree counts in Erdős–Rényi type graphs
    4.2.2 Multinomial occupancy
  4.3 Comparison of results with other techniques
Chapter 5: A multivariate concentration bound
  5.1 Main result
    5.1.1 Proof of Theorem 5.1.1
  5.2 Applications
    5.2.1 Local dependence and an application on local extremes
      5.2.1.1 Joint distribution of local extrema
    5.2.2 Pattern occurrences
Chapter 6: Use of zero biased couplings
  6.1 Zero bias transformation
  6.2 Concentration bounds and discussion
    6.2.1 Proof of the main result
  6.3 Hoeffding's permutation statistic
Chapter 7: Concluding remarks and future directions
Bibliography

Abstract

Stein's method is a technique in probability theory introduced by Charles Stein in 1972 that enables one to obtain convergence rates in distributional approximations. The method makes use of characterizing differential equations of distributions and various coupling constructions to get error bounds with respect to certain probability metrics. In 2007, Sourav Chatterjee showed that one particular coupling from Stein's method, exchangeable pairs, can be used to prove concentration of measure inequalities in a wide range of problems in dependent settings. Later, in 2011, his approach was also put to use by Subhankar Ghosh and Larry Goldstein, who proved new concentration inequalities via size biased couplings, which provide another useful framework in Stein's method. This dissertation contributes to the concentration of measure literature by following the latter of these two works and improving their results in several ways. Besides, we introduce two new directions by making use of zero biasing and equilibrium couplings. Further, we provide multivariate extensions of our results, obtain correlation inequalities by using couplings, and make the connections to Stein's method more transparent through an understanding of fixed points of distributional transformations. All of our results are illustrated through several nontrivial examples, mainly on random graphs, random permutations and occupancy models.

Chapter 1
Introduction

The central limit theorem and the strong law of large numbers are among the most fundamental results of probability theory. Letting $X_1, X_2, \ldots, X_n$ be independent and identically distributed (i.i.d.) random variables with finite mean $\mu$ and finite nonzero variance $\sigma^2$, and setting $S_n = \sum_{i=1}^n X_i$, these two results state that

$$\frac{S_n - n\mu}{\sigma\sqrt{n}} \longrightarrow_d N(0,1) \quad \text{as } n \to \infty \qquad (1.1)$$

and

$$\frac{S_n}{n} \longrightarrow \mu \quad \text{with probability 1 as } n \to \infty, \qquad (1.2)$$

respectively, where $N(0,1)$ is the standard normal distribution and $\to_d$ denotes convergence in distribution [24]. Although this dissertation is more related to (1.2), we start by giving a brief discussion of (1.1), as our results have their roots in distributional approximations.
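To make the two limit statements concrete before moving on, the short simulation below illustrates (1.1) and (1.2) numerically. It is only an illustration added here for convenience: the exponential summands, sample sizes and seed are arbitrary choices, not part of the original development.

```python
import numpy as np
from math import erf, sqrt

def normal_cdf(x):
    # Standard normal CDF, written via the error function.
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

rng = np.random.default_rng(0)
mu, sigma = 1.0, 1.0   # exponential(1) summands: mean 1, variance 1

for n in [10, 100, 1000, 10000]:
    samples = rng.exponential(scale=1.0, size=(20000, n))
    S_n = samples.sum(axis=1)
    # (1.2): S_n / n concentrates around mu as n grows.
    lln_err = np.abs(S_n / n - mu).mean()
    # (1.1): the normalized sum is approximately N(0,1); compare an
    # empirical CDF value with the normal CDF at x = 1.
    W = (S_n - n * mu) / (sigma * np.sqrt(n))
    clt_gap = abs((W <= 1.0).mean() - normal_cdf(1.0))
    print(f"n={n:6d}  E|S_n/n - mu| ~ {lln_err:.4f}  |P(W<=1) - Phi(1)| ~ {clt_gap:.4f}")
```

Both error columns shrink as $n$ grows, which is exactly what (1.1) and (1.2) assert; the rest of this dissertation is about quantifying such rates.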
Central limit theorem, which dates back to the work of De Moivre from 18th century, has now several variations which relax the i.i.d. assumption on thefX i g n i=1 sequence in a number of ways including cases where they are possibly be dependent. See [24], for example, for some basic results. However, powerful weak convergence results such as (1.1) beg an immediate question: How well does the standard normal distribution approximate the distribution of (S n n)= p n for a given xed value of n? One powerful technique for obtaining error bounds in distributional approximations was introduced by Charles Stein [65] in 1972, which is nowadays known as Stein's method. 1 The method in particular applies to the classical central limit theorem described above, and provides convergence rates in terms of metrics that can be represented as d H (L(X);L(Z)) = sup h2H jE[h(X)]E[h(Z)]j (1.3) where X;Z are random variables,L(X) is the law of X andH is a suitable class of functions. Metrics of the form (1.3) in particular include the well known Kolmogorov, total variation and Wasserstein distances as special cases. Stein's seminal work [65] was using the Kolmogorov metric, a special case of (1.3), in order to obtain convergence rates for the normal approximation of sums ofm-dependent random variables. The most useful property of Stein's approach, taking dependence into account, was even apparent in his rst application. There has been much progress since the in uential work of Stein, see [16] and [62] for two recent and wonderful overviews of the subject. In particular, Louis Chen showed in [15] that the method can also be used for Poisson approximation and the work of Arratia et al. in [3] popularized his approach by providing useful examples on local dependence. [6] is an excellent reference on Poisson approximation. Some notable recent work on other distributions include the geometric [58], exponential [59] and Laplace distribution [57], among many others. Turning to the strong law result of (1.2), analogous to our discussion above on dis- tributional approximations, we may question the convergence rate of S n =n to . In other words, we might be interested in obtaining upper bounds for the tail probabil- ity P(jS n =nj t) for a given t > 0. Such inequalities are known as concentration inequalities, and have great importance in both the theory and applications of probabil- ity theory. In this dissertation, we aim to analyze the connections of Stein's method and the concentration of S n =n, or more general random variables, around their mean. Although traces of such concentration inequalities can be found even in the 19th century, the concentration of measure phenomenon was put forward by V. Milman in the 1970's while he was investigating the asymptotic geometry of Banach spaces. Since 2 then, this notion has found use in several dierent elds of mathematics; the probabilistic method [1], information theory [60], discrete geometry [52], randomized algorithms [22], and several others. For excellent book length treatments of the concentration of measure phenomenon, see [11] and [48]. For a simple example on concentration, we may let Y be a random variable with expectation =E[Y ] and variance 2 =E[(Y) 2 ], and consider Chebychev's inequality which provides the bound P(jYjt) 2 t 2 for any t > 0: That is, for t the probability of Y deviating more than t from its mean is "small". 
However, in general the bounds obtained via Chebychev's inequality are not good enough, and one needs to show that the tails of Y are indeed decaying exponentially. We may consider Cherno's inequality [18] for the binomial distribution as an archetypical example of concentration bounds that yield such behavior. Letting X 1 ;X 2 ;:::;X n be i.i.d. random variables with P(X k = 1) = 1P(X k = 0) = 1=2 for each k, and setting S n = P n i=1 X i , Cherno bound gives P S n n 1 2 t 2e 2nt 2 ; t> 0 improving the bound that can be obtained via Chebychev's inequality signicantly. Since the inspiring work of Cherno on sums of independent random variables, there has been extensive research, and progress, towards obtaining concentration bounds in various dierent settings. Some notable work are Hoeding's inequality [40], Efron-Stein inequality [25], martingale techniques [54], the use of self bounding and certiable func- tions [55], among several others. See Section 2.1 for a brief discussion of these well known techniques. In particular, Michel Talagrand published his well known Talagrand's inequality in 1996, as \A new look at independence" [68], which provided a general framework for 3 the concentration phenomenon in product spaces. Since then his powerful inequality has found several applications in important problems (such as the longest increasing subsequences and traveling salesman problems) and provided signicant improvements over all the techniques known at that time. Talagrand [68] summarizes the intuition behind `a new look at independence' as: a random variable that smoothly depends on a large number of independent random variables (but not too much on any of them), is `essentially' constant, in a `dimension-free' way. After Talagrand's groundbreaking work, the study of concentration inequalities has boosted even more. In this dissertation, we will follow one recent approach, the use of couplings from Stein's method, to analyze the tail behavior of random variables. As we shall see below, this technique has exactly the same advantage as its counterpart in distributional approximations; obtaining concentration bounds for functions of possibly dependent random variables will be merely a task of constructing couplings that satisfy certain conditions. Although the connections between Stein's method and concentration inequalities were known for a long time (for example, see [6]), this was rst formalized by Sourav Chatter- jee [13] in 2007 using the exchangeable pairs coupling. A one line summary of Chatterjee's work can be given as: If we can construct an exchangeable pair of Y on the same proba- bility space that is close to Y in a certain sense, then we can obtain exponential bounds for the tails of Y . This is indeed quite intuitive once we compare it with the existing methods such as the bounded dierences inequality of Colin McDiarmid [54]. Later in 2011, Subhankar Ghosh and Larry Goldstein [28] employed Chatterjee's approach to show that one other important coupling, size biased couplings, can also be used to prove concentration inequalities for nonnegative random variables. We recall from [37] that for a nonnegative, integrable random variableY , another random variableY s is said to have the Y size biased distribution ifE[Yf(Y )] =E[Y ]E[f(Y s )] for all functions f for which the expectations exist. Results of [28] show that under a bounded coupling assumption, namelyjY s YjC a:s: for someC > 0,Y satises certain concentration 4 properties around its mean. 
Our starting point in this dissertation will be the work of Ghosh and Goldstein in [28] and [29]. Here is an outline for the following chapters. In Chapter 2, we review the literature on Stein's method and concentration inequal- ities very brie y. In Chapter 3, we rst provide the necessary background on size biased distributions and constructions of size biased couplings. Then we move on to improving the concentra- tion results of [28], and we demonstrate the new bounds on simple applications. Further, we show that size biased couplings can also be used to obtain correlation inequalities and include several examples for comparison purposes with other techniques. We conclude the chapter by a discussion of a closely related coupling, equilibrium coupling, and show how these couplings can also be used to derive similar bounds. In Chapter 4, we discuss the relation between size biasing and log-concavity, and show how bounded size biased couplings can be constructed for a large class of statistics in certain occupancy models with log-concave marginals. We demonstrate the concen- tration results with applications on multinomial occupancy and random graph models. A comparison of our bounds to other techniques in the literature is also included in this chapter. Chapter 5 is devoted to a study of multivariate concentration bounds via size biased couplings. After reviewing marginal size biasing brie y, we prove a concentration result for random vectors and discuss its applications on two examples. Chapter 6 is on the use of zero biased couplings for concentration inequalities. We rst provide the necessary background on the zero bias transformation and then prove several concentration bounds. The results are then applied to the Hoeding statistic in dierent settings. Finally, a weaker form of our results in a multivariate setting is also given in this chapter. We conclude the dissertation in Chapter 7 by summarizing our contributions and discussing some directions for future work. 5 Chapter 2 Overview of Stein's method and concentration inequalities The purpose of this chapter is to discuss some preliminaries for the upcoming chapters. In Section 2.1, we provide some background on well known concentration inequalities includ- ing the bounded dierences inequality, Efron-Stein inequality and Talagrand's inequality. Section 2.2 is devoted to some basics about Stein's method. A discussion of the use of couplings in distributional approximations can also be found here. Finally, previous work in the literature on connections between Stein's method and concentration is included in Section 2.3. 2.1 Concentration inequalities Concentration inequalities provide probability bounds on how a random variable devi- ates from some value (e.g. its expectation). Some of the earliest examples of concen- tration inequalities are Markov's inequality, Chebyshev's inequality and Paley-Zygmund inequality. We already mentioned Chebyshev's inequality in the Introduction which is an immediate corollary to Markov's inequality which states that for a nonnegative random variable Y , the inequality P(Y t) E[Y ] t (2.1) 6 holds for any t> 0. 
On the other hand, the Paley-Zygmund inequality takes both the mean and the variance into account; again letting $Y$ be a nonnegative random variable and $\theta \in (0,1)$, it provides the bound

$$P(Y \geq \theta E[Y]) \geq (1-\theta)^2 \frac{(E[Y])^2}{E[Y^2]}. \qquad (2.2)$$

Despite the simplicity of all these bounds, in almost all applications the estimates obtained via Chebyshev type inequalities are not sharp enough, and we actually need to show that the tails of the distribution are decaying exponentially. We will now go over some techniques that are well known in the literature and that reveal such information about the tails of random variables. We divide this discussion into four parts: (1) Basic inequalities, (2) Efron-Stein inequality, (3) Talagrand's inequality and (4) Dependent sums / Negative association.

1. Basic Inequalities: Let's start with Chernoff's inequality and its proof. The idea behind this simple proof is the basis for all other techniques for obtaining concentration bounds. Let the $X_i$'s be independent Bernoulli random variables with success probability $p$ and set $S_n = \sum_{i=1}^n X_i$. For $\theta > 0$ and $r > 0$, an application of Markov's inequality gives

$$P(S_n > r) = P(e^{\theta S_n} > e^{\theta r}) \leq \frac{E[e^{\theta S_n}]}{e^{\theta r}} = \frac{(pe^{\theta} + q)^n}{e^{\theta r}}. \qquad (2.3)$$

Now if we use the parametrization $r(t) = (p+t)n$, then (2.3) yields

$$P(S_n > (p+t)n) \leq \left(\frac{pe^{\theta} + q}{e^{\theta(p+t)}}\right)^n. \qquad (2.4)$$

Naturally, our aim is minimizing the function $g(\theta) := (pe^{\theta}+q)e^{-\theta(p+t)}$ over $\theta > 0$. Solving $g'(\theta) = 0$ for $\theta$ gives $e^{\theta} = \frac{q(p+t)}{p(q-t)}$. Substituting this into (2.4) and using some elementary manipulations, we arrive at

Theorem 2.1.1 (Chernoff's inequality, [18]) Let $X_1, \ldots, X_n$ be independent Bernoulli random variables with success probability $p$ and $S_n = \sum_{i=1}^n X_i$. Then for any $t > 0$, we have

$$P(S_n > (p+t)n) \leq \exp\left(-\left[(p+t)\ln\frac{p+t}{p} + (q-t)\ln\frac{q-t}{q}\right] n\right). \qquad (2.5)$$

The bound in (2.5) is the best possible for sums of Bernoulli random variables that we can obtain via the estimation of the moment generating function. Estimating the right hand side of (2.5), one can now obtain several useful tail inequalities for $S_n$ that are frequently used in applications. For several other versions one can attain, see [22]. Also note that, although the proof technique used for Chernoff's inequality is quite simple, it is still remarkable, as all the advanced techniques rely on estimating the moment generating function.

One other well known concentration inequality that will be mentioned a few times in this dissertation is Hoeffding's inequality.

Theorem 2.1.2 (Hoeffding's inequality, [40]) Let $X_1, X_2, \ldots, X_n$ be independent random variables that satisfy $a_i \leq X_i \leq b_i$ a.s. for $i = 1,2,\ldots,n$. Then for $S_n = \sum_{i=1}^n X_i$ and $t > 0$, we have

$$P(S_n - E[S_n] > t) \leq \exp\left(-\frac{2t^2}{\sum_{i=1}^n (b_i - a_i)^2}\right).$$

Hoeffding's inequality has been quite influential in the literature thanks to the fact that it also has a martingale version, which eventually triggered the study of concentration inequalities for functions with bounded differences. Before discussing this, we finally state Bennett's inequality, again for sums of independent random variables, which is important in the sense that it provides sub-Poissonian tail bounds.

Theorem 2.1.3 (Bennett's inequality, [9]) Let $X_1, X_2, \ldots, X_n$ be independent random variables with finite variances such that $X_i \leq C$ for some $C > 0$ a.s. for all $i = 1,\ldots,n$. Let $S_n = \sum_{i=1}^n X_i$ and $\nu = \sum_{i=1}^n E[X_i^2]$. Then for $t > 0$,

$$P(S_n - E[S_n] \geq t) \leq \exp\left(-\frac{\nu}{C^2}\, h\!\left(\frac{Ct}{\nu}\right)\right), \qquad (2.6)$$

where $h(u) = (1+u)\log(1+u) - u$ for $u > 0$.

One interesting note about Bennett's inequality is that it reveals another well known concentration bound, Bernstein's inequality, as a corollary.
Under the same setting, this states that P(S n E[S n ]t) exp t 2 2( +Ct=3) ; t> 0: (2.7) See [11] for details. Going back to the martingale version of Hoeding's inequality, we nally mention an important corollary, known as the bounded dierences inequality. This technique was popularized by Colin McDiarmid in his 1989 survey paper [53] and is still used in several applications. In the following, a function f : n ! R is said to have the bounded dierences property if for some constants c k , jf(x)f(x 0 )jc k holds whenever x; x 0 2 n dier only in k th coordinate. Functions with this special property enjoy the following concentration bound. Theorem 2.1.4 (Bounded dierences (McDiarmid's) inequality, [40], [53]) Suppose that X 1 ;X 2 ;:::;X n 2 are independent, and f : n ! R satises the bounded dif- ferences property with c 1 ;c 2 ;:::;c n . Then for t> 0, we have P(jf(X 1 ;:::;X n )E[f(X 1 ;:::;X n )]jt) 2 exp 2t 2 P n k=1 c 2 k : 9 2. The Efron-Stein Inequality : Most of the results discussed above for sums of independent random variables can be shown to apply to general functions of independent random variables. One simple, yet powerful result in this direction is the Efron-Stein inequality [25] which provides upper bounds on the variance of a general function. The inequality was rst introduced in 1981 to understand the nature of the bias in the jackknife estimate of variance, and has found use in several applications since then. Theorem 2.1.5 (Efron-Stein inequality, [25], [63]) Let Z =f(X 1 ;:::;X n ) be a function of n independent random variables X 1 ;:::;X n , and assume that X 0 i is an independent copy of X i for each i = 1;:::;n. Then Var(Z) 1 2 n X i=1 E[(f(X 1 ;:::;X i1 ;X i ;X i+1 ;:::;X n )f(X 1 ;:::;X i1 ;X 0 i ;X i+1 ;:::;X n )) 2 ]: As a simple consequence, when the function f satises the bounded dierences con- dition with c 1 ;c 2 ;:::;c n , we immediately arrive at Var(Z) (1=2) P n i=1 c 2 i . Although there is no guarantee that the upper bound obtained via Efron-Stein inequality will be of the right order, it is still useful as it can be used to obtain variance bounds for quite complicated problems by just checking a simple condition. It is also possible to obtain an exponential version of Efron-Stein inequality by using the modied log Sobolev approach for concentration inequalities. We need some notation to state a result in this direction. Let X 1 ;:::;X n be real valued independent random variables and X 0 1 ;:::;X 0 n be independent copies of them. Also let f : R n ! R be a measurable function. Set Z = f(X 1 ;:::;X n ) and Z i = f(X 1 ;:::;X i1 ;X 0 i ;X i+1 ;:::;X n ), and denote the random vector (X 1 ;:::;X n ) by X. Further, dene the random variables V + =E " n X i=1 (ZZ i ) 2 1(Z >Z i ) X # 10 and V =E " n X i=1 (ZZ i ) 2 1(Z <Z i ) X # : Theorem 2.1.6 [10] Let ;> 0 be such that < 1 and E[e V + = ]<1: Then logE[e (ZE[Z]) ] 1 logE[e V + = ]: Also if we assume that Z i Z 1 for every 1in, then for all 2 (0; 1=2), logE[e (ZE[Z]) ] 2 1 2 logE[e V = ]: For excellent applications and more information on Efron-Stein type inequalities, we refer to [11]. We conclude our discussion by noting that several other techniques for obtaining variance bounds have been introduced in the last few years that rely on simple properties to be satised by the functions. 
One example is the self-bounding property; We say that a nonnegative functionf :R n !R has the self-bounding property if there exist functions f i :R n1 !R such that for all x 1 ;:::;x n 2R and all i = 1;:::;n, 0f(x 1 ;:::;x n )f i (x 1 ;:::;x i1 ;x i+1 ;:::;x n ) 1 and also n X i=1 (f(x 1 ;:::;x n )f i (x 1 ;:::;x i1 ;x i+1 ;:::;x n ))f(x 1 ;:::;x n ): It is clear that for a self-bounding function f, we have n X i=1 (f(x 1 ;:::;x n )f i (x 1 ;:::;x i1 ;x i+1 ;:::;x n )) 2 f(x 1 ;:::;x n ): This observation combined with the Efron-Stein inequality has the following corollary. 11 Theorem 2.1.7 Keeping the notations as above, if f has the self-bounding property, then Var(Z)E[Z]: We will discuss more about self-bounding functions (and certiable functions, a sub- class of self-bounding functions) in Section 4.3. 3. Talagrand's Inequality : We start by xing notation and discussing some pre- liminaries. In the following we consider a product probability measure P on a product space n . For a given nvector = ( 1 ;:::; n ) of nonnegative real numbers, we dene the -Hamming distance between two points x = (x 1 ;:::;x n ); y = (y 1 ;:::;y n )2 n as d (x; y) = n X i=1 i 1(x i 6=y i ): If A n , then the distance of A to a point x2 n is d (x;A) = inffd (x; y) : y2Ag: With these, we can now dene the Talagrand's distance as d T (x;A) = supd (x;A) where the supremum is taken over all nonnegativen-vectors of length 1 with respect to theL 2 metric. By considering the nvector = (1 p n;:::; 1 p n), we see thatd T (x;A) (1= p n)d H (x;A) whered H is the standard Hamming distance. Thus, in particular upper bounds on Talagrand's distance also provide upper bounds for the standard Hamming distance. Before stating Talagrand's inequality, let's note that his result is a powerful gener- alization of the bounded dierences inequality discussed above. To see that this latter inequality can also be interpreted in terms of the Hamming distance, let f : n ! R 12 be a function such thatjf(x)f(y)j d (x; y) for all x; y2 n . Then the bounded dierences inequality provides the bound P(fE[f]t) exp 2t 2 P n i=1 2 i : The main weakness of bounded dierences inequality is that it allows to consider func- tions f that satisfy the Lipschitz condition for a xed . In applications, however, one also has to handle functions f which satises conditions such as f(x)f(y) n X i=1 i (x)1(x i 6=y i ) where : n !R n has certain boundedness conditions. Talagrand's inequality provides a general framework which work out for such cases and a lot more. Theorem 2.1.8 (Talagrand's inequality, [68]) Let X = (X 1 ;:::;X n ) be a family of inde- pendent random variables with X i taking values in a nite set i , and let A be a subset of n = n i=1 i . Then P(X2A)E(e 1 4 d T (X;A) 2 ) 1 for any product probability measure on n . An immediate corollary is that the inequality P(X2A)P(d T (X;A)t)e t 2 4 holds for any t 0. In particular, when P(X2A) 1=2, we obtainP(d T (X;A)t) 2e t 2 4 . The standard way of obtaining concentration bounds on specic problems via Tala- grand's inequality is as follows: Given a functionf with medianm f , letA =fx :f(x) 13 m f g so thatP(A) 1=2, and then nd a function s(t) such that f(x)m f >t implies d T (x;A)>s(t). Combining with the discussion above, P(fm f >t)P(d T (x;A)>s(t)) 2e s(t) 2 =4 : Note that the concentration bounds obtained with Talagrand's inequality are around the median instead of the mean. However, this is in general not a signicant problem as the median and mean can be shown to be close to each other. 
See [54] for a discussion of this. Since its appearance, Talagrand's result has found applications on several impor- tant problems, for example, the traveling salesman problem, Steiner trees and longest increasing subsequences. See Steele [64] and McDiarmid's [54] survey papers for very nice overviews and applications. We conclude this section by stating an important consequence of Talagrand's con- centration result. Theorem 2.1.9 For every product probability measure P on [0; 1] n , every convex 1- Lipschitz function f on R n , and every t 0, P(jfm f jt) 4e t 2 =4 ; where m f is the median of f for P. 4. Dependent sums / Negative association : Here we consider concentration inequalities for random variables of the form Y = P n i=1 X i where X 1 ;X 2 ;:::;X n are dependent random variables. Our focus will be merely on the case where Y is a sum of negatively associated random variables as this type of dependence will emerge in the following chapters. However, we note that there are several other dependence types that allow concentration bounds, some of which are the limited dependence, local dependence 14 and Markov dependence of random variables. See [22] and the references therein for more on these cases. In many statistical problems, the standard assumption that the underlying random variables are independent is not plausible. Increases in some variables are often related to decreases or increases in other variables. To understand this phenomenon partially, the concept of negative association of random variables was introduced in [21] and several properties of this family were analyzed. Formally, a nite collection of random variables X 1 ;:::;X n is said to be negatively associated if for any disjoint subsets A 1 ;A 2 [n] :=f1; 2;:::;ng, E[f(X i ;i2A 1 )g(X j ;j2A 2 )]E[f(X i ;i2A 1 )]E[g(X j ;j2A 2 )] whenever f and g are coordinate-wise nondecreasing functions for which the expecta- tions exist (The collection is said to be positively associated if the inequality is reversed). One can obtain Cherno-Hoeding type concentration bounds under negative associa- tion assumption since this special class of random variables enjoy the following useful property: If X 1 ;:::;X n are negatively associated, then for any > 0, E[e P n k=1 X k ] n k=1 E[e X k ]: (2.8) That is, the moment generating function of Y = P n k=1 X k can be estimated exactly as in the case of independent sums, and so an application of Markov's inequality yields concentration bounds just like the independent case. Three useful properties of negatively associated random variables are summarized in the following proposition. 15 Proposition 2.1.1 ([21], [23]) 1) (Closure under products) If X 1 ;:::;X n and Y 1 ;:::;Y m are two independent fami- lies of random variables that are separately negatively associated, then the family X 1 ;:::;X n ;Y 1 ;:::;Y m is also negatively associated. 2) (Disjoint monotone aggregation) If X i ;i2 [n], are negatively associated andA is a family of disjoint subsets of [n], then the random variables f A (X i ;i2A);A2A, are also negatively associated whenever f A 's are arbitrary nondecreasing (or non- increasing) functions. 3) (Zero-one law) If X 1 ;:::;X n are indicators satisfying P n i=1 X i = 1, then they are negatively associated. Here is an example demonstrating these properties. 
Example 2.1.1 (Urn statistics) For 2 [m], let the component M of the vector M = (M ) 2[m] count the number of balls in box when n balls are independently distributed into m boxes, with the position of ball j being box having probability 1=m: Dening Y = X 2[m] 1(M d ) for some nonnegative constants d , the summands of Y are negatively associated. To see that this is indeed the case, for i = 1;:::;n and j = 1;:::;m let N i;j be the indicator of the event that i th ball ends up in j th box. Observe now that m X j=1 N i;j = 1 so that by item 3 of Proposition 2.1.1,fN i;j g m j=1 are negatively associated. Using the rst part of Proposition 2.1.1, we conclude that the set of random variables fN i;j :i2 [n];j2 [m]g 16 are negatively associated. Finally by second part of Proposition 2.1.1, we arrive at the conclusion that the random variables M = n X i=1 N i; are negatively associated. To see that the summands of Y are negatively associated, dene f (M ) = 1(M d ) and use the \disjoint monotone aggregation" property to show that f (M )'s are negatively associated as each of f 's are nondecreasing functions of M 's. In conclusion, Y = X 2[m] 1(M d ) is a sum of negatively associated random variables and so one can use standard Cherno type bounds discussed above to obtain concentration inequalities. In [23], the concentration of the number of empty boxes in occupancy models is studied using above observations and Cherno type bounds. This and more general results will be discussed in Chapter 4. 2.2 Stein's method 2.2.1 A brief introduction to Stein's method Stein's method refers to a general framework for bounding the distance between the distribution of a random variable W and that of a random variable Z having some specied target distribution ([65], [66]). The metrics for which this approach is applicable are of the form d H (L(W );L(Z)) = sup h2H jE[h(W )]E[h(Z)]j for some suitable class of functionsH. Note that Kolmogorov, total variation and Wasserstein distances are all included as special cases of d H once we chooseH asH 1 =f1( x) : x2 Rg; H 2 = f1( 2 A) : A 2 Borel(R)g;H 3 = fh : R ! R : jh(x)h(y)j jxyjg; respectively. 17 Here we give a brief overview of Stein's method and the use of couplings in this technique. Our intention is to provide some motivation for our work on concentration inequalities in the following chapters. Discussion below will mostly focus on the normal approximation. Let's start with a lemma from Stein's seminal paper [65], that provides a characteri- zation of the normal distribution. This observation lies at the heart of Stein's method. Lemma 2.2.1 If the random variable Z has the standard normal distribution, then E[f 0 (Z)Zf(Z)] = 0 for all absolutely continuous function f with Ejf 0 (Z)j < 1: Conversely, if for some random variable W , E[f 0 (W )Wf(W )] = 0 for all absolutely continuous functions f with Ejf 0 (W )j<1, then W has the standard normal distribution. Lemma 2.2.1 suggests the following idea: if we can show thatE[Wf(W )f 0 (W )] is close to 0 for a large enough class of functions, then W should be close to the standard normal distribution. To make this precise, letZN (0; 1) and leth be any function so thatEjh(Z)j<1: The Stein function f h associated with h is a function satisfying f 0 h (w)wf h (w) =h(w)E[h(Z)]: (2.9) The equation given in (2.9) is often called as the Stein equation. 
Now substitutingW for w and taking expectations in the Stein equation, we obtain E[h(W )]E[h(Z)] =E[f 0 h (W )Wf h (W )] 18 showing that W is close to normal amounts to showing that E[f 0 h (W )Wf h (W )] is small. We summarize this observation in the following lemma. Lemma 2.2.2 If W is a random variable and Z has the standard normal distribution, then d H (W;Z) = sup h2H jE[f 0 h (W )Wf h (W )]j: (2.10) Thus, in order to estimate the distance between W and Z we may just focus on bounding the right hand side of (2.10) by using the structure of W and properties of the solutions f h . For this purpose we rst need to know that a solution f h for the Stein equation in (2.9) always exists and satises certain nice properties. These are the contents of the next lemma. Lemma 2.2.3 Let f h be the solution of the dierential equation f 0 h (w)wf h (w) =h(w)E[h(Z)] which is given by f h (w) =e w 2 =2 Z 1 w e t 2 =2 (E[h(Z)]h(t))dt: 1. If h is bounded, then kf h k 1 r 2 kh()E[h(Z)]k 1 ; and kf 0 h k 1 2kh()E[h(Z)]k 1 : 2. If h is absolutely continuous, then kf h k 1 2kh 0 k 1 ; kf 0 h k 1 r 2 kh 0 k 1 ; and kf 00 h k 1 2kh 0 k 1 : Proofs of these facts are standard, see for example [16]. 19 To end the discussion, let's focus on the Wasserstein distance and summarize our work above with the following result which follows immediately from Lemma 2.2.2 and Lemma 2.2.3. Theorem 2.2.1 IfW is a random variable andZ has the standard normal distribution, then d W (W;Z) sup f2F jE[f 0 (W )Wf(W )]j (2.11) whereF is the family of functionsff :kfk 1 ;kf 00 k 1 2;kf 0 k 1 p 2=g: Since the seminal work of Stein, several techniques have been introduced to estimate the quantityjE[f 0 (W )Wf(W )]j. We refer to [16] for a complete treatment of the subject. For this dissertation, we will focus only on the coupling approach in Stein's method and see how these ideas can also be extended to analyze other properties of random variables. 2.2.2 Couplings from Stein's method This section provides the necessary background on the use of couplings in distributional approximations, and some motivation for the upcoming chapters. Use of couplings in distribution approximations was initiated by Charles Stein in [66] with the exchangeable pairs coupling. Focusing on the normal approximation, the general idea behind the use of couplings is to rewrite the E[Wf(W )] term, say, in (2.11) in a suitable way so that the estimation ofjE[f 0 (W )Wf(W )]j becomes more tractable. Here we exemplify this by making use of zero biased couplings for the normal approximation of sums of independent random variables. Zero biased couplings for normal approximation : Recall from [35] that for any mean zero random variableW with positive, nite variance 2 , there exists a distribution for a random variable W satisfying E[Wf(W )] = 2 E[f 0 (W )] (2.12) 20 for all absolutely continuous functions f for which the expectation of either side exists. The variableW is said to have theW zero biased distribution. By Stein's lemma [65] it is known that a random variable W is normal if and only if W = d W : We will discuss some other properties of zero biased biased transformation in Section 6.1. Our intention here is to demonstrate how a zero biased couplingW ofW , a random variable on the same space asW withW zero biased distribution, can be used in normal approximation problems. We will do this via using the following result whose proof can be found in [16]. 
Theorem 2.2.2 LetW be a mean zero and variance 1 random variable, and assume that there exists W , having the W zero biased distribution, dened on the same probability space as W , satisfyingjW Wj. Then sup z2R jP(Wz)P(Zz)jc where c = 1 + 1= p 2 + p 2=4 2:03 and Z is a standard normal random variable. Let's now see how Theorem 2.2.2 can be used to obtain Berry-Esseen bounds for sums of independent random variables. Following [16], we letX 1 ;:::;X n be independent, mean zero random variables, all bounded by some constantC. Also we let 2 i =Var(X i );B 2 n = P n i=1 2 i and W = n X i=1 i where i =X i =B n : Now as we shall see in Section 6.1, letting I be an index independent ofX 1 ;:::;X n , with distributionP(I =i) = 2 i =B 2 n ; i = 1;:::;n, the variable W :=W I + I has the W zero biased distribution where i is a i zero biased random variable on the same space for each i = 1;:::;n. Sincej i jC=B n for each i, we obtain 21 jW Wj 2C B n (2.13) which in turn gives d K (W;Z) = sup z2R jP(Wz)P(Zz)j 4:06C B n (2.14) by making use of Theorem 2.2.2. Note in particular that when the variances of X i 's are also equal, the resulting bound in (2.14) is of order (1= p n) coinciding with the standard Berry-Esseen theorem. The result above is not surprising since the zero biased transformation has the normal distribution as its unique xed point. The bounded coupling condition of (2.13) basically tells us that W is close to normal distribution in a certain sense, and in the following chapters we will be using this intuition to study other distributional properties of W . We close this section by noting that there are several other notable coupling approaches in Stein's method besides zero biased couplings. Size biased couplings intro- duced by Goldstein and Rinott [37], used in both normal and Poisson approximation, will be discussed in detail in the next chapter. 2.3 Previous work on concentration inequalities with Stein's method Recall that for a given random variableW , our discussion on couplings from the previous section indicates that one can manipulate E[Wf(W )] in a number of ways for a given function f with EjWf(W )j <1. Letting f(x) = e x , 2 R, this suggests that the derivative of the moment generating function of W , E[We W ], and so the generating function itself, can be estimated via the use of couplings from Stein's method. This provides a nice motivation for the relation between couplings from Stein's method and concentration inequalities. 22 Connections between the Stein's method and concentration inequalities have been known for quite some time, see for example, the relevant results in [6]. However, this was rst formalized by Sourav Chatterjee in [12] (Also see [13] and [14]) where he used the well known exchangeable pair coupling of Stein's method to obtain concentration bounds. We now recall a simplied version of his result which, we believe, makes the connections to Stein's method apparent. 
A pair of random variables (W;W 0 ) is said to be an aStein pair if (W;W 0 ) is exchangeable and E[W 0 jW ] = (1a)W: Theorem 2.3.1 [13] Let (W;W 0 ) be an a-Stein pair with Var(W ) = 2 < 1: If E[e W jW 0 Wj] <1 for all 2 R and if for some sigma-algebraF (W ) there are nonnegative constants B and C such that E[(W 0 W ) 2 jF] 2a BW +C; then for all t> 0, P(Wt) exp t 2 2C and P(Wt) exp t 2 2C + 2Bt : The key observation for the proof of Theorem 2.3.1 is that an a-Stein pair (W;W 0 ) satises E[Wf(W )] = 1 2a E[(W 0 W )(f(W 0 )f(W ))]; which follows easily by using some simple implications of being a Stein pair. See [62] for details. Also note that a consideration of the bounded dierences inequality and the standard construction of an exchangeable pair for a sum of independent random variables will provide some intuition for Chatterjee's theorem. 23 Later in 2011, Subhankar Ghosh and Larry Goldstein [28] employed Chatterjee's approach by making use of size biased couplings which are very important for distribu- tional approximations within dependent settings in Stein's method literature. This will be discussed in detail in the next chapter, for now we will content ourselves just to state the result of Ghosh and Goldstein. First recall that for a nonnegative, integrable random variable Y , another random variable Y s is said to have the Y size biased distribution if E[Yf(Y )] = E[Y ]E[f(Y s )] for all functions f for which the expectations exist. Here is the main result of [28]. Theorem 2.3.2 [28] LetY be a nonnegative random variable with nonzero, nite mean , and suppose there exists a coupling of Y to a variable Y s having the Y size bias distribution that satisesjY s YjC for some C > 0 with probability one. If Y s Y a.s., then P (Yt) exp t 2 2C for all t> 0. (2.15) If the moment generating function m() =E[e Y ] is nite at = 2=C, then P (Yt) exp t 2 2C +Ct for all t> 0. (2.16) In the following chapters, we will discuss applications, generalizations and improve- ments of Theorem 2.3.2. There has been several other recent work on connections between Stein's method and concentration inequalities besides the ones discussed above. We don't have the chance to go over all of these here, but as a nal note, we would like to mention the work of [51] which is quite interesting. In this paper, the authors use a matrix version of Chatterjee's exchangeable pair approach to prove concentration inequalities for eigenvalue statistics of random matrices. For details, see [51] and the follow up papers. We will revisit this discussion in Chapter 7 when we discuss Stein couplings as one of our future directions. 24 Chapter 3 Concentration inequalities with size biased couplings In this chapter, we rst give the necessary background on size biased distributions and constructions of size biased couplings in Section 3.1. In Section 3.2, we provide our improvements over the concentration bounds of [28] and demonstrate these in several examples in Section 3.3. Section 3.4 will be on the use of size bias couplings for correlation inequalities, and the last section is devoted to a study of a closely related coupling, equilibrium coupling, for obtaining results in similar directions. 3.1 Background on size biasing and construction of cou- plings Let Y be a nonnegative and integrable random variable. The distribution of Y s is said to be Y size biased if we have E[Yf(Y )] =E[Y ]E[f(Y s )]; (3.1) for all functionsf for which the expectations exist [37]. 
Using (3.1) with $f(x) = x$, we can immediately conclude that $E[Y^s] = \frac{E[Y^2]}{E[Y]}$. Thanks to this and various other properties, size biased distributions are hidden in various well known and important problems.

One such example is the Paley-Zygmund inequality, which we mentioned earlier in the introduction of Section 2.1. Recall that for a nonnegative random variable $Y$, this inequality provides the bound

$$P(Y \geq \theta E[Y]) \geq (1-\theta)^2 \frac{(E[Y])^2}{(E[Y])^2 + \mathrm{Var}(Y)}, \qquad \theta \in (0,1),$$

which can also be stated as

$$P(Y \geq \theta E[Y]) \geq (1-\theta)^2 \frac{(E[Y])^2}{E[Y^2]} = (1-\theta)^2 \frac{E[Y]}{E[Y^s]}.$$

Thus, we can interpret the Paley-Zygmund inequality as saying that the probability in the lower tail of the distribution of $Y$ will be small if $E[Y]$ and $E[Y^s]$ are close to each other. An immediate question at this point is whether we could learn more about the concentration of $Y$ if we knew more about the closeness of the size biased coupling $Y^s$ to $Y$. For example, what can we say about the case where we have $|Y^s - Y| \leq C$ a.s. for some constant $C$? Or, going from bounded to dominated, what about the case $|Y^s - Y| \leq X$ a.s., where $X$ is some random variable on the same space? With this motivation, the works of Goldstein and Ghosh in [28], [29] and [30] showed that in various applications the construction of such couplings may provide useful concentration inequalities.

Before going into details, let us say a few more words on size biased distributions. First note that, focusing on the discrete case, one can easily see that the definition above is equivalent to the following one: for a nonnegative integer valued random variable $Y$ with mean $\mu$, some other random variable $Y^s$ is said to have the $Y$ size biased distribution if

$$P(Y^s = k) = \frac{k P(Y = k)}{\mu}, \qquad k \geq 0, \qquad (3.2)$$

is satisfied. This latter definition explains the nomenclature "size biased" immediately. Also observe that, with this alternative form, we see that the support of $Y^s$ will always be a subset of the support of $Y$.

The discussion for the discrete case can be extended to nonnegative continuous random variables easily, but for the applications included in this dissertation we will just focus on discrete valued random variables. As a general remark, we note that, with its discrete and continuous versions, size biasing has been realized to be significant in various problems such as the waiting time paradox. For this and several other interesting connections to size biasing, see the excellent survey [4].

Although the use of size biased couplings in the Stein's method literature was first initiated in [37], the underlying ideas were available in, for example, [6] and [50]; in the former reference they also make use of size biasing to obtain concentration results.

It is time to recall a general way of constructing size biased couplings in the case of sums of random variables. This construction is standard and we closely follow the discussion on page 24 of [62]. Assume $Y = \sum_{i=1}^n X_i$ where the $X_i$'s are nonnegative with $E[X_i] = \mu_i$. Then we have the following recipe for constructing a size bias coupling of $Y$ (a small simulation sketch of this recipe, for independent indicator summands, is given after the list).

1. For each $i = 1,\ldots,n$, let $X_i^s$ have the size bias distribution of $X_i$, independent of $(X_j)_{j \neq i}$ and $(X_j^s)_{j \neq i}$. Given $X_i^s = x$, define the vector $(X_j^i)_{j \neq i}$ to have the distribution of $(X_j)_{j \neq i}$ conditional on $X_i = x$.

2. Choose a random summand $X_I$, where the index $I$ is independent of all else and has distribution $P(I = i) = \mu_i/\mu$, where $\mu = E[Y]$.

3. Define $Y^s = \sum_{j \neq I} X_j^I + X_I^s$.
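When the summands are independent, step 1 of the recipe is trivial: conditioning on $X_i = x$ does not change the law of the other summands. The sketch below is a minimal illustration under exactly that assumption, with a handful of made-up Bernoulli success probabilities; it samples $Y^s$ by the three steps and compares its empirical distribution with the size biased law $kP(Y=k)/\mu$ from (3.2).

```python
import numpy as np

rng = np.random.default_rng(1)
p = np.array([0.2, 0.5, 0.3, 0.7, 0.4])   # hypothetical success probabilities
mu = p.sum()                               # mu = E[Y] for Y = sum of Bernoulli(p_i)

# Exact pmf of Y (a Poisson-binomial variable) by convolving the summands.
pmf = np.array([1.0])
for pi in p:
    pmf = np.convolve(pmf, [1.0 - pi, pi])

# Size-biased pmf from (3.2): P(Y^s = k) = k P(Y = k) / mu.
ks = np.arange(len(pmf))
pmf_sb = ks * pmf / mu

# Sample Y^s by the three-step recipe (independence makes step 1 trivial).
n_sim = 200000
X = rng.random((n_sim, len(p))) < p            # the original summands
I = rng.choice(len(p), size=n_sim, p=p / mu)   # step 2: P(I = i) = mu_i / mu
X[np.arange(n_sim), I] = True                  # size bias of an indicator is the constant 1
Ys = X.sum(axis=1)                             # step 3

emp = np.bincount(Ys, minlength=len(pmf)) / n_sim
for k in ks:
    print(f"k={k}  k*P(Y=k)/mu = {pmf_sb[k]:.4f}   empirical = {emp[k]:.4f}")
```

Up to Monte Carlo error, the sampled $Y^s$ reproduces the size biased pmf, which is the content of the proposition that follows.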
Proposition 3.1.1 Let $Y = \sum_{i=1}^n X_i$, with $X_i \geq 0$ and $E[X_i] = \mu_i \in (0,\infty)$. If $Y^s$ is constructed by Items 1-3 above, then $Y^s$ has the $Y$ size biased distribution.

Note that when $X$ is an indicator random variable, using (3.2) we see that $P(X^s = 0) = 0$. So the size biased version of an indicator is merely the constant random variable with value 1. This helps us obtain the following corollary of Proposition 3.1.1 for the case of a sum of indicators.

Corollary 3.1.1 Let $X_1, \ldots, X_n$ be indicators with $P(X_i = 1) = p_i$. For each $i = 1,\ldots,n$, let $(X_j^i)_{j \neq i}$ have the distribution of $(X_j)_{j \neq i}$ conditional on $X_i = 1$. If $Y = \sum_{i=1}^n X_i$, $\mu = E[Y]$, and $I$ is chosen independent of all else with $P(I = i) = p_i/\mu$, then $Y^s = \sum_{j \neq I} X_j^I + 1$ has the $Y$ size biased distribution.

Before moving on to the next section, let's give a few examples of size biased distributions (parts d and e below are also checked numerically in the sketch following the example).

Example 3.1.1
a. (Bernoulli) If $Y \sim \mathrm{Bern}(p)$ with $p \in (0,1]$, then $Y^s = 1$ has the $Y$ size bias distribution.
b. (Sums of independent random variables) Let $Y = X_1 + X_2 + \cdots + X_n$ where the $X_i$'s are i.i.d. random variables taking values in $[0,\infty)$ with $E[X_1] \in (0,\infty)$. Then $Y^s = X_1^s + X_2 + \cdots + X_n$ has the $Y$ size biased distribution, where $X_1^s$ is independent of $X_2,\ldots,X_n$ and has the $X_1$ size bias distribution. The same result holds for $Y^s = Y - X_I + X_I^s$, where $I$ is a random variable independent of all else with distribution $P(I = i) = 1/n$ for $i = 1,2,\ldots,n$.
c. (Binomial) If $Y \sim \mathrm{Bin}(n,p)$, then the observations in parts a and b immediately give its size biased version as $1 + \mathrm{Bin}(n-1,p)$.
d. (Poisson) If $Y \sim \mathrm{PO}(\lambda)$, then $Y^s = Y + 1$ has the $Y$ size biased distribution. This follows immediately by a direct computation of the size biased distribution.
e. (Geometric) Let $Y \sim \mathrm{Geo}(p)$ for some $p \in (0,1)$. An easy computation will show that the size bias distribution of $Y$ is equivalent to $\mathrm{NB}(2,q) + 1$, where $q = 1-p$ and $\mathrm{NB}$ is the negative binomial distribution. Letting $G =_d \mathrm{Geo}(p)$ be independent of $Y$, since $Y + G =_d \mathrm{NB}(2,q)$, we conclude that $Y^s = Y + G + 1$ has the $Y$ size biased distribution.
f. (Exponential) Let $Y \sim \exp(\lambda)$; then $Y^s = Y + X$, where $X$ is an independent copy of $Y$, has the $Y$ size bias distribution. This can be observed by computing the density of $Y^s$ as $\mathrm{Gamma}(2,\lambda)$, and recalling that a $\mathrm{Gamma}(2,\lambda)$ random variable is a sum of two independent exponential random variables with parameter $\lambda$.
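As a quick sanity check on parts d and e, the sketch below verifies the characterizing identity $E[Yf(Y)] = E[Y]E[f(Y^s)]$ by Monte Carlo for the couplings $Y^s = Y+1$ (Poisson) and $Y^s = Y+G+1$ (geometric, in the failure-counting convention with support $\{0,1,2,\ldots\}$, which is the convention under which the claim in part e holds). The parameter values and test function are arbitrary choices made only for the illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
n_sim = 500000
f = lambda y: np.exp(-0.3 * y)        # an arbitrary bounded test function

# (d) Poisson(lam): claimed coupling Y^s = Y + 1.
lam = 2.5
Y = rng.poisson(lam, n_sim)
print("Poisson  : E[Y f(Y)] ~", round((Y * f(Y)).mean(), 4),
      "  E[Y] E[f(Y+1)] ~", round(Y.mean() * f(Y + 1).mean(), 4))

# (e) Geometric(p) counting failures (support {0, 1, 2, ...}): claimed coupling
#     Y^s = Y + G + 1 with G an independent copy of Y.
p = 0.3
Y = rng.geometric(p, n_sim) - 1       # numpy's geometric counts trials, so shift by 1
G = rng.geometric(p, n_sim) - 1
print("Geometric: E[Y f(Y)] ~", round((Y * f(Y)).mean(), 4),
      "  E[Y] E[f(Y+G+1)] ~", round(Y.mean() * f(Y + G + 1).mean(), 4))
```

The two numbers in each row agree up to Monte Carlo error, as the definition (3.1) requires.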
3.2 Concentration results

We have two main goals in this section: (1) removing the monotonicity assumption ($Y^s \geq Y$ a.s.) in the lower tail inequality of (2.15), and (2) improving the order of the concentration bounds in the upper tail inequality of (2.16).

Regarding the first of these two, since $Y^s$ stochastically dominates $Y$, it is always possible to find a monotone coupling; however, this coupling will not necessarily be bounded. In particular, we note that removal of this condition will reveal lower tail inequalities for all applications in [29] which previously lacked one.

For some motivation about the second result, recall from Example 3.1.1.d that the distributional transformation $Y \mapsto Y^s - 1$ has the Poisson distribution as its unique fixed point. Thus, it is natural to expect that a random variable $Y$ will be approximately Poisson if one can find a coupling of $Y$ and $Y^s$ that are close to each other in a certain sense (which is quantified as $|Y^s - Y| \leq C$ in Ghosh and Goldstein's setting), and this will in particular imply that the tail behavior of $Y$ will resemble the tail behavior of the Poisson distribution. Now also recall that if $Z$ is a Poisson random variable with parameter $\lambda > 0$, then the moment generating function of $Z$ is given by $E[e^{\theta Z}] = \exp(\lambda(e^{\theta} - 1))$ for all $\theta \in \mathbb{R}$. So for any $\theta, t > 0$, one can obtain

$$P(Z - \lambda \geq t) \leq \exp\{\lambda(e^{\theta} - 1) - \theta(t + \lambda)\} \qquad (3.3)$$

by using the exponential Chebyshev inequality, and choosing $\theta = \log(1 + t/\lambda)$ to minimize the right hand side of (3.3) yields

$$P(Z - \lambda \geq t) \leq \exp\left(t - (\lambda + t)\log\frac{\lambda + t}{\lambda}\right), \qquad (3.4)$$

which is of smaller order than $\exp(-ct)$ for $t$ large, where $c > 0$ is any fixed number. This discussion indicates that it is natural to expect that the upper bound of order $\exp(-ct)$ in (2.16) can be improved.

An explicit example for the discussion above can be given using the number of fixed points in uniformly random permutations. It is well known that the number of fixed points in a random permutation converges to the Poisson distribution, and so it is natural to expect that a tail behavior as in (3.4) holds for this statistic. Previous concentration bounds for such permutation statistics obtained via couplings from Stein's method ([13], [33]) provide Bernstein type bounds for the upper tail, which is not optimal. We will come back to this example in Section 3.3.

Here is the main result.

Theorem 3.2.1 Let $Y$ be a nonnegative random variable with nonzero, finite mean $\mu$, and assume that the moment generating function of $Y$ exists everywhere. Suppose that there exists a coupling of $Y$ to a variable $Y^s$ having the $Y$ size bias distribution which satisfies $Y^s - Y \leq C$ for some $C > 0$ with probability one. Then

$$P(Y - \mu \leq -t) \leq \exp\left(-\frac{t^2}{2\mu C}\right) \qquad (3.5)$$

and

$$P(Y - \mu \geq t) \leq \exp\left(\frac{1}{C}\left[t - (\mu + t)\log\frac{\mu + t}{\mu}\right]\right) \qquad (3.6)$$

for all $t > 0$.

Proof: Let us start with the proof of the left tail inequality. Let $m$ be the moment generating function of $Y$ and $x < 0$. Differentiating under the expectation by dominated convergence and then applying the characterization of the size bias distribution (3.1), followed by an application of the bound $Y^s - Y \leq C$ a.s. and the inequality $1 + x \leq e^x$, we obtain

$$m'(x) = E[Y e^{xY}] = \mu E[e^{xY^s}] = \mu E[e^{xY} e^{x(Y^s - Y)}] \geq \mu E[e^{xY}(1 + x(Y^s - Y))] = \mu\big(m(x) + x E[e^{xY}(Y^s - Y)]\big) \geq \mu\big(m(x) + x C m(x)\big) = \mu m(x)(1 + xC).$$

Rearranging yields $0 \leq m'(x) - \mu m(x)(1 + xC)$, and multiplying each side by $e^{-\mu(x + Cx^2/2)}$ we have

$$0 \leq \left(m(x)\, e^{-\mu(x + Cx^2/2)}\right)' \quad \text{for all } x < 0.$$

Letting $\theta < 0$, integration gives

$$0 \leq \int_{\theta}^{0} \left(m(x)\, e^{-\mu(x + Cx^2/2)}\right)' dx = 1 - m(\theta)\, e^{-\mu(\theta + C\theta^2/2)},$$

by which we obtain $m(\theta) \leq e^{\mu(\theta + C\theta^2/2)}$. Next, letting $M(\theta) = E[e^{\theta(Y-\mu)}] = e^{-\theta\mu} m(\theta)$, we get

$$M(\theta) \leq e^{-\theta\mu}\, e^{\mu(\theta + C\theta^2/2)} = e^{\mu C \theta^2/2}.$$

Hence for fixed $t > 0$ and for all $\theta < 0$,

$$P(Y - \mu \leq -t) = P\big(e^{\theta(Y-\mu)} \geq e^{-\theta t}\big) \leq e^{\theta t} M(\theta) \leq e^{\theta t + \mu C \theta^2/2}$$

by Markov's inequality. Substituting $\theta = -t/(\mu C)$ yields the claim.

Next we prove the upper tail inequality. As above, let $m$ be the mgf of $Y$. Then for any $x > 0$, we have

$$m'(x) = E[Y e^{xY}] = \mu E[e^{xY^s}] = \mu E[e^{x(Y^s - Y)} e^{xY}] \leq \mu e^{Cx} m(x),$$

since $Y^s - Y \leq C$. This yields $\frac{m'(x)}{m(x)} \leq \mu e^{Cx}$. So for $\theta > 0$, we have

$$\log m(\theta) = \int_0^{\theta} \frac{m'(x)}{m(x)}\, dx \leq \int_0^{\theta} \mu e^{Cx}\, dx = \frac{\mu}{C}\left(e^{C\theta} - 1\right),$$

or equivalently, by exponentiating each side,

$$m(\theta) \leq \exp\left(\frac{\mu}{C}(e^{C\theta} - 1)\right).$$

Using Markov's inequality, we arrive at

$$P(Y - \mu \geq t) = P\big(e^{\theta Y} \geq e^{\theta(\mu + t)}\big) \leq \exp(-\theta(\mu + t))\, m(\theta) \leq \exp\left(\frac{\mu}{C}(e^{C\theta} - 1) - \theta(\mu + t)\right)$$

for any $t > 0$ and $\theta > 0$. Substituting $\theta = \frac{1}{C}\log\frac{\mu + t}{\mu}$, the result follows.

Remark 3.2.1 The proof above also relaxes the bounded coupling condition $|Y^s - Y| \leq C$ of [28] to $Y^s - Y \leq C$. This will turn out to be useful in Chapter 4.

Remark 3.2.2 The improvements we provide in Theorem 3.2.1 were recently observed by Arratia and Baxendale [2] as well. Although their proof is the same for the upper tail inequality, it relies on a completely different idea for the lower tail. We refer to [2] for details. It is also worth noting that (1) they have product bounds that include (3.6) as a corollary, and that (2) they show that we do not need any assumptions on the mgf, as a bounded size biased coupling ensures the existence of the mgf everywhere.
Note that the upper tail inequality of Theorem 3.2.1 can be rewritten in the more familiar form

$$P(Y - \mu \geq t) \leq \exp\left(-\frac{\mu}{C}\, h\!\left(\frac{t}{\mu}\right)\right) \quad \text{for all } t > 0,$$

where $h(x) = (1+x)\log(1+x) - x$, $x \geq -1$. Using the inequality

$$h(x) \geq \frac{x^2}{2 + 2x/3}, \qquad x \geq 0$$

(for example, see [11, Exercise 2.8]), one immediately obtains the following Bernstein type inequality as a corollary, which still provides a slight improvement over the results of Ghosh and Goldstein in (2.16).

Corollary 3.2.1 In the setting of Theorem 3.2.1,

$$P(Y - \mu \geq t) \leq \exp\left(-\frac{t^2}{2\mu C + 2Ct/3}\right) \quad \text{for all } t > 0. \qquad (3.7)$$

Remark 3.2.3 The heuristics we discussed in the introduction of this section for obtaining sub-Poissonian upper tail inequalities via size biased couplings can also be useful for other distributions, with some other transformation in place of the size bias one. For example, see Chapter 6 to see how this works out for the normal distribution via the zero bias transformation.

It is natural to ask whether one can replace the constant $C$ in the boundedness condition $Y^s - Y \leq C$ by some dominating function which is possibly unbounded. Indeed, under certain negative and positive association assumptions, one can still obtain similar concentration bounds. We content ourselves with stating just one left tail inequality in this direction, as we do not have applications of such results yet.

Theorem 3.2.2 Let $Y$ be a nonnegative random variable with finite, nonzero expectation $\mu$ and with finite mgf everywhere, and let $Y^s$ be a size biased coupling of $Y$. Further assume that there exists a random variable $X$ on the same space that is positively associated with $Y$ satisfying $Y^s - Y \leq g(X)$ a.s., where $g$ is an increasing function. Then we have

$$P(Y - \mu \leq -t) \leq \exp\left(-\frac{t^2}{2\mu C}\right) \quad \text{for all } t > 0, \quad \text{where } C = E[g(X)].$$

Proof: Let $\theta < 0$. As before we have

$$m'(\theta) = \mu E[e^{\theta Y^s}] \geq \mu m(\theta) + \mu\,\theta\, E[(Y^s - Y)e^{\theta Y}] \geq \mu m(\theta) + \mu\,\theta\, E[g(X)e^{\theta Y}].$$

Now since $X$ and $Y$ are positively associated, and since the functions $x \mapsto \theta g(x)$ and $y \mapsto e^{\theta y}$ are both decreasing (recall $\theta < 0$), we get

$$m'(\theta) \geq \mu m(\theta) + \mu\,\theta\, E[g(X)]\, E[e^{\theta Y}] = \mu(1 + \theta C)\, m(\theta).$$

The rest of the proof follows exactly as above.

Simple examples regarding the concentration bounds of this section will be discussed in the next section. Also see Chapter 4 and Chapter 5 for further applications in univariate and multivariate settings, respectively.
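To get a feel for the improvement, the sketch below evaluates the three upper tail bounds side by side: the sub-Poissonian bound (3.6), its Bernstein type relaxation (3.7), and the earlier bound (2.16) of Ghosh and Goldstein. The values of $\mu$, $C$ and $t$ are illustrative choices only.

```python
import numpy as np

def bound_3_6(t, mu, C):
    # Sub-Poissonian upper tail bound (3.6).
    return np.exp((t - (mu + t) * np.log((mu + t) / mu)) / C)

def bound_3_7(t, mu, C):
    # Bernstein-type relaxation (3.7).
    return np.exp(-t**2 / (2 * mu * C + 2 * C * t / 3))

def bound_2_16(t, mu, C):
    # Upper tail bound (2.16) of Ghosh and Goldstein.
    return np.exp(-t**2 / (2 * mu * C + C * t))

mu, C = 10.0, 1.0   # e.g. a Poisson-like count statistic with C = 1
for t in [5.0, 20.0, 50.0, 100.0]:
    print(f"t={t:6.1f}  (3.6)={bound_3_6(t, mu, C):.3e}  "
          f"(3.7)={bound_3_7(t, mu, C):.3e}  (2.16)={bound_2_16(t, mu, C):.3e}")
```

For moderate $t$ the three bounds are comparable, while for large $t$ the bound (3.6) decays like $e^{-(t/C)\log t}$ and visibly outperforms the two Bernstein type expressions.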
In particular, upper tail inequalities in (3.9) are known as Bennett and Bernstein 35 inequalities, respectively. Also note that this discussion can be generalized to non-i.i.d. and locally dependent settings with a little more eort. Example 3.3.3 (Random sums) A random sumS N is a sum of random variablesS N = P N i=1 X i where the number of terms in the summation, N, is also random. Asymptotic behavior of such sums are well studied, see for example [16] and [43]. We rst discuss how we can construct bounded size biased couplings for such sums whenX i 's are independent, and then use this to establish concentration bounds. We also mention some pointers how one can generalize the argument for locally dependent cases. Let X 1 ;X 2 ;::: be a sequence of nonnegative i.i.d. random variables with E[X 1 ] 2 (0;1), N be a nonnegative integer valued random variable with E[N] 2 (0;1), and suppose thatN is independent ofX i 's. SetS m = P m i=1 X i form 0, with the convention P 0 i=1 X i = 0: For the coupling construction, let X s be independent of all else with X 1 size biased distribution and N s be again independent of all else with N size biased distribution. We claim that S N s 1 +X s has S N size biased distribution. To prove this, rst we observe that for k 0 P(S N s 1 +X s =k) = 1 X n=1 P(S n1 +X s =k)P(N s =n) = 1 X n=1 P(S n1 +X s =k) nP(N =n) E[N] = 1 X n=1 P(S s n =k) nP(N =n) E[N] 36 where we just used the denition of size biased distribution and the standard construction for deterministic sums of i.i.d. random variables. Hence, by an application of Wald's identity, we get P(S N s 1 +X s =k) = 1 E[N] 1 X n=1 kP(S n =k) E[S n ] nP(N =n) = k E[N]E[X 1 ] 1 X n=1 P(S n =k)P (N =n) = k E[S N ] P(S N =k) so that S N s 1 +X s has indeed S N size biased distribution. Theorem 3.3.1 LetX 1 ;X 2 ;::: be a sequence of nonnegative i.i.d. random variables with X 1 M, E[X 1 ]2 (0;1). Also let N be a nonnegative integer valued random variable independent of X i 's withE[N]2 (0;1), and dene S N = P N i=1 X i . If there exists a size biased coupling N s of N withjN s NjD, then the conclusions of Theorem 3.2.1 and Corollary 3.2.1 hold with C =M(D + 1) Proof: Using the construction described just above, we have jS s N S N j = N s 1 X i=1 X i N1 X i=1 X i +X X N MjN s Nj +jX X N j MD +M =M(D + 1) =C from which the result follows immediately. For example, when N = P n i=1 Z i where Z i 's are i.i.d. Bernoulli random variables with parameter p; we havejN s Nj 1 and sojS s N S N j 2M: Thus we obtain tail bounds such as P(S N E[S n ]t) exp t 2 4ME[S N ] 37 and P(S N E[S n ]t) exp t 2 4ME[S N ] + 4Mt=3 where E[S N ] =E[N]E[X 1 ] =npE[X 1 ] by Wald's identity. Our discussion above on independent summands can also be extended to random sums where the summands satisfy certain local dependence conditions. A rigorous formulation of this requires too much notation, and we skip it here. But a combination of the ideas for the case of independent random sums above and the case with locally dependent deterministic sums in Section 5.2 provide an outline how this can be succeeded. Example 3.3.4 (Number of non-isolated balls in multinomial occupancy) The purpose of this example is to note that the results of Section 3.2 provide improvements in all applications that were previously discussed in [28] and [29]. Here we demonstrate this by borrowing an example from [29] on multinomial occupancy. Results of Chapter 4 will provide a quite general framework for related statistics in various occupancy models. 
Consider the standard balls and boxes model where n balls are independently placed into one ofm boxes, with the probability of a ball being in any box having equal probability. For i = 1; 2;:::;n, let X i denote the location of ball i, that is, the number of box where ball i ends up. The number of non-isolated balls is then given by Y = n X i=1 1(Z i > 0) where Z i =1 + n X j=1 1(X j =X i ); and results of [29] show that one can nd a size biased coupling of Y which satises jY s Yj 2; but which is not necessarily monotone (That is, Y s Y does not hold). Results of Section 3.2 improve the concentration bounds of [29] for the number of non- isolated balls in two directions: (1) We now have the lower tail inequality as well, since 38 we do not require monotonicity anymore. (2) The bounds for the upper tail inequality is now sharper. In particular, we now have sub-Poissonian bounds for the number of non-isolated balls. Example 3.3.5 (Generalized matching problems) Let be a uniformly random permu- tation in S n and dene F n;d =fi2f1;:::;ng :j(i)ijdg where we assume that n 2d + 1: Poisson approximation for random variables of the form Y =jF n;d j (and for other generalized matching problems) were previously studied in [6] in depth, and with that motivation we consider its concentration now. First note that Y = P n i=1 X i where X i = 1(j(i)ijd): Denote E[X i ] by i and E[Y ] by . One can easily check that = 2d + 1 2 n d X i=1 (di): (3.10) Now to size bias Y , let I be an independent random variable taking values inf1;:::;ng with distribution P(I =i) = i = i P n j=1 j ; i = 1;:::;n: Also let K i =fj :jjijdg for i = 1;:::;n. Now if X I = 1, then do nothing and set I = . Otherwise, let J be uniform over K I and transpose (I) and J to get a new permutation I . Dening Y I = n X i=1 1(j I (i)ijd); 39 Y I has the Y size biased distribution. Also it is easy to see thatjY I Yj 2: Hence we obtain P (jF n;d jt)exp 1 2 t ( +t) log 1 + t where is as given in (3.10). When d = 0, jF n;0 j is just the number of xed points in and we obtain a bound of order e t 2 logt . This improves the corresponding results of Chatterjee [13] where he obtains concentration inequalities for the generalized matching problem by making use of the exchangeable pairs method. As a nal note, the example just described can also be generalized to random variables dened by jF n;d;e j =jf(i;j)2f1;:::;ng 2 :jijjd;j(i)(j)jegj where d and e are xed positive integers. Example 3.3.6 (Innitely divisible distributions) This example is intended to remind the reader a connection between size biasing and innite divisibility. Let Y be a non- negative random variable with nite nonzero mean. Then we can express its size biased version Y s as Y s =Y +X for some X independent of Y , if and only if Y is innitely divisible [4]. This is quite interesting as the techniques of Section 3.2 can still be used to analyze the concentration of Y although X is not necessarily bounded. For one such simple example, we may consider the geometric distribution with parameterp2 (0; 1). ThenY s =Y +G+ 1 whereG is independent of Y and is again a geometric random variable with parameter p. Note thatG is unbounded, but we can still follow our approach, say in Theorem 3.2.2, to estimate the tails of Y . 40 The reader might also like to check the discussion in [29] on innitely divisible distri- butions. We just note here that the concentration bounds of Section 3.2 will provide improvements over their upper tail inequalities regarding this special class of random variables. 
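As a quick numerical check of the geometric case just mentioned, the sketch below (with an arbitrary choice of $p$ and of the test function) verifies the identity $E[Yf(Y)] = E[Y]E[f(Y^s)]$ for the unbounded coupling $Y^s = Y + G + 1$ by Monte Carlo.

```python
import random

random.seed(1)
p, reps = 0.3, 300_000
mu = (1 - p) / p                       # E[Y] for Y ~ Geometric(p) on {0, 1, 2, ...}

def geometric(p):
    # number of failures before the first success
    k = 0
    while random.random() > p:
        k += 1
    return k

def f(y):                              # a bounded test function
    return min(y, 10)

lhs = rhs = 0.0
for _ in range(reps):
    y = geometric(p)
    y_s = y + geometric(p) + 1         # Y^s = Y + G + 1, with G independent of Y
    lhs += y * f(y)
    rhs += mu * f(y_s)
print(f"E[Y f(Y)]      ~ {lhs / reps:.4f}")
print(f"E[Y] E[f(Y^s)] ~ {rhs / reps:.4f}")
```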
3.4 A correlation inequality Our main results in this section are: (1) Theorem 3.4.3, which gives a slightly more general version of the left tail inequality in Theorem 3.2.1, and, (2) Theorem 3.4.4, which provides a correlation inequality yielding upper bounds for the probability P(Y = 0) where Y is a nonnegative integer valued random variable. Examples and comparisons with other techniques will be included in the next subsection. Focusing on item (2) rst, a simple example can be given by considering the case where Y = P n i=1 X i where X i 's are i.i.d. (or more general) Bernoulli random variables with parameter p. For this case, one has P(Y = 0) =P n X i=1 X i = 0 ! =P n \ i=1 fX i = 0g ! justifying the nomenclature, correlation inequality. There has been several eorts to obtain such correlation bounds in the last three decades. See for example [6], for some early work in this direction. Our results here will be in the lines of the important work of Suen [67] which were later improved by Svante Janson in [42]. Janson's work is based on sums of indicators which are locally dependent. Theorem 3.4.4 below will generalize the cited results and will also improve them in terms of the quality of the upper bounds. Let us begin with recalling Janson's setting and results. We need some notation for this purpose. LetfI i g i2F be a nite family of indicators. is a dependency graph for fI i g i2F , i.e. a graph with vertex setF such that if A and B are disjoint subsets ofF, and contains no edge between A and B, then the familiesfI i g i2A andfI i g i2B are 41 independent. We write i j if there is an edge in between i and j: Also we will use the following notation: Y = P n i=1 I i , p i = P(I i = 1); = E[Y ] = P i p i ; i = P ji p j ; = max i i and nally = P fi;jg:ij E[I i I j ] summing over unordered pairs. With these notations, Janson rst proves the following tail estimate. Theorem 3.4.1 [42] LetfI i g i2F be a family of indicator random variables having a dependency graph : Then with notations as above, for any 0a 1 P(Y a) exp min (1a) 2 2 8 + 2 ; (1a) 6 : (3.11) The left tail bound we obtained via size biasing in Section 3.2 will yield improvements over Janson's result in almost all cases. However, our focus in this section is on comparing the correlation bounds of Janson. For this purpose observe that choosing a = 0 in Theorem 3.4.1 immediately gives the following estimate P(Y = 0) exp( 2 = max(8 + 2; 6)): Janson considers this as slightly weaker than the following, which is the main result of the same paper (Theorem 3). Theorem 3.4.2 [42] With assumptions as in Theorem 3.4.1, P(Y = 0) exp( 2 = max(8; 2; 6)): (3.12) This in particular says that the best possible bound one can obtain using Janson's result will be at least e =6 where = max i i , i = P ji p j and p j =E[I j ]: Note that when even one of the random variables depends on many others and when p i 's are not `small', will be too large so that the upper bound will be trivial. Theorem 3.4.4 enables one to obtain a Suen type correlation inequality taking care of this problem in various 42 cases. Also this new approach is not restricted to sums of Bernoulli random variables and can be used for any nonnegative integer valued random variable. Now let us go back to our work on size biased couplings, and show how we can weaken the boundedness condition in Theorem 3.2.1 slightly. Theorem 3.4.3 Let X = (X 1 ;:::;X n ) be random variables and assume that Y = g(X 1 ;:::;X n ) for some nonnegative measurable function g. 
Assume that = E[Y ]2 (0;1); and that there exists a size biased coupling Y s of Y that satises E[Y s YjX]C for some C > 0: Then P (Yt) exp t 2 2C for all t> 0: Proof: Let < 0 and m() =E[e Y ]. Then we have m 0 () =E[Ye Y ] =E[e (Y s Y) e Y ] E[e Y (1 +(Y s Y ))] = m() +E[e Y E[Y s YjX]] m() +Cm() = m()(1 +C): Rest of the proof is exactly the same as the rst part of the proof of Theorem 3.2.1. Now we immediately arrive at the following corollary which yields correlation bounds for nonnegative integer valued random variables. Theorem 3.4.4 Under the setting of Theorem 3.4.3, we have P(Y = 0) exp 2C : (3.13) 43 Proof: Using Theorem 3.4.3 with t = , result follows since P(Y = 0) = P(Y 0) by the assumption that Y is nonnegative and integer valued. Before discussing applications of Theorem 3.4.4, we note that this result can be quiet important for understanding the concentration of, not necessarily locally, dependent ran- dom variables. The conditionE[Y s YjX]C here can be interpreted as: The distance between Y and Y s might be arbitrarily large; however, once we x the underlying ran- dom variables, the distance between them will be small on the average. One way of looking at this problem is considering summations where the summands are dependent in a structured way. For example, a single summand may depend on all other variables, but only a few of them strongly. A formulation of this for a sum of indicators is dis- cussed in the following corollary. We use the notation from Section 3.1 in the statement ((X i 1 ;:::;X i i ;:::;X i n ) has the distribution of (X 1 ;:::;X i ;:::;X n ) conditioned on X i = 1). Corollary 3.4.1 Let X 1 ;:::;X n be nonnegative random variables with identical distri- bution and set Y = P n i=1 X i . Assume that for each i, one can decomposef1;:::;ng as A i 1 [A i 2 so thatE[X i j jX]X j D for all j2A i 1 andE[X i j jX]X j F n for all j2A i 2 . Further assume thatjA i 1 jK for all i: Setting C =DK +F we have P(Yt) exp t 2 2C for all t> 0. 44 Proof: Letting Y s be the standard size biased coupling construction of Y , we will show thatE[Y s YjX]C holds. We have E[Y s YjX] = 1 n n X i=1 E[Y i YjX] = 1 n n X i=1 n X j=1 E[X i j X j jX] = 1 n n X i=1 X j2A i 1 E[X i j X j jX] + 1 n n X i=1 X j2A i 2 E[X i j X j jX] 1 n n X i=1 KD + 1 n n X i=1 n F n =DK +F =C: Hence result follows from Theorem 3.4.4. One can give variations of this corollary easily. The moral of the story is that size biased couplings can help us to understand layered dependencies. Note that the layered dependence described above generalizesmdependence of random variables immediately. More generally, this will be useful when we have a collection of random variables for which changing only one of them will aect only a small portion of the other variables with high probability. 3.4.1 Examples Now we demonstrate the use of Theorem 3.4.4 with several examples. Example 3.4.1 (Independent and m-dependent sums) As the simplest possible case, let X i 's be i.i.d. Bernoulli random variables with success probability p and setY = P n i=1 X i . It is easy to see that for this case, one can nd the probabilityP(Y = 0) exactly as (1p) n : Next let us see the bound we attain with Theorem 3.4.4. We know thatY s = P n1 i=1 X i +1 has the Y size biased distribution and so we have Y s Y 1. Thus (3.13) yields P(Y = 0) exp np 2 : (3.14) 45 Note that the bound in (3.14) coincides with the one we can obtain via Janson's result in (3.12) for this case. 
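For a sense of how conservative (3.14) is, the following minimal sketch (with arbitrary choices of $n$ and $p$) tabulates the exact value $(1-p)^n$ next to the bound $\exp(-np/2)$.

```python
import math

# exact P(Y = 0) = (1 - p)^n for a sum of n i.i.d. Bernoulli(p) indicators,
# versus the size bias correlation bound (3.14): exp(-n*p/2)
for n, p in [(50, 0.05), (50, 0.20), (200, 0.05)]:
    exact = (1 - p) ** n
    bound = math.exp(-n * p / 2)
    print(f"n={n:4d}  p={p:4.2f}  exact={exact:.3e}  bound (3.14)={bound:.3e}")
```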
However, our result enables us to analyze non-indicator random variables in the same way. Next, as a generalization we may consider the case where X 1 ;X 2 ;:::;X n are indi- cators with mean E[X i ] = p for i = 1; 2;:::;n and further assume that X 1 ;X 2 ;:::;X n are mdependent so that X i and X j are independent wheneverjijj > m: For large enough n, it is easy to see that the size bias bound with Theorem 3.4.4 is P(Y = 0) exp 2(2m+1) whereas the best estimate one can obtain using Janson's bounds will be greater or equal to exp 6p(2m+1) . Thus, in this case size biasing bound will be better than Janson's bound when p> 1=3. Example 3.4.2 (Non-empty boxes in occupancy) For 2 [m] we let the component M of the vector M = (M ) 2[m] count the number of balls in box when n balls are independently distributed into m boxes, with the position B j of ball j being box having probability 1=m: Dene Y ge = X 2[m] 1(M 1): It is shown in Chapter 4 that one can nd a size biased coupling Y s ge of Y ge satisfying Y s ge Y ge 1: Thus, Theorem 3.4.4 yields the bound P(Y ge = 0) exp 2 where =E[Y ge ] =m(1P(Bin(n; 1=m) = 0)): In the case of Janson's approach, it is easily seen that the dependency graph is the complete graph. So = 1 = m X =2 P(M 1) = (m 1)(1P(Bin(n; 1=m) = 0)) 46 and so the best possible bound one can obtain is at least P(Y ge = 0) exp m 6(m 1) which is signicantly worse than the one we obtain via size biasing. Example 3.4.3 (Isolated vertices in Erd} os-R enyi graph) LetG m be an Erd os-R enyi ran- dom graph on m vertices, where distinct vertices and are connected by an edge with probability p f;g , independently of all other edges, and let the components M of M = (M ) 2[m] record the degree of vertex . Let Y is be the number of isolated vertices inG m ; Y is = m X =1 1(M = 0): LettingX ; be the indicator that the edge between vertices and exists, and letting X be the vector ofX ; 's, one hasY is =g(X) for some nonnegative integer valued function g: It is shown in [30] that one can nd a size biased coupling of Y that satises E[Y s is Y is jX] 2: Thus using Theorem 3.4.4 we obtain P(Y is = 0) exp 4 where =m(1p) m1 : For this example, Janson's bound will be at least exp m 6(m1) . Depending on the parameters m and p; size biasing bound will provide sharper estimates than Janson's results. Example 3.4.4 (Isolated points in germ-grain models) LetU =fU 1 ;:::;U m g where U 1 ;:::;U m are independent random vectors in C n = [0;n 1=p ) p , having the same strictly positive density on C n . Equipping C n with the Euclidean toroidal distance D, forx2C n and > 0 let B x; =fy : D(x;y) g, the sphere of radius centered at x. With 47 given positive numbers ( ) 2[m] , let n be such that there existsfu 1 ;:::;u m g such that B u; \B u ; =; for all 6= . This condition, and that the density functions of U ;2 [m], are strictly positive in C n imply that the support of d() isf0; 1;:::;m 1g for all 2 [m] where d = (d()) 2[m] is the random vector with coordinates d() = X 2[m]nfg 1(B \B 6=;): That is, d() counts the number of neighbors of U . We are interested in the number of isolated points in the conguration which is dened as Y = X 2[m] 1(d() = 0): Now, using the straightforward size biased coupling construction, where we pick one of the points uniformly and leave it to a somewhere uniformly so that it does not have any neighbors, we have Y s Y = 1(d(I)6= 0) +d 1 (I) where d 1 (I) = X j2N(U I ) 1(d(j) = 1) and N(U I ) =fj :U j 2B I g: Thus E[Y s YjU] = 1 n n X i=1 1(d(i)6= 0) +d 1 (i) ! 
1 + 1 n n X i=1 d 1 (i): 48 Now, n X i=1 d 1 (i) = n X i=1 X j2N(U i ) 1(d(j) = 1) = n X j=1 X i2N(U j ) 1(d(j) = 1) = n X j=1 jN(U j )j1(d(j) = 1) = n X j=1 1(d(j) = 1)n These observations yield E[Y s YjU] 2: which in particular implies that the correlation bound P(Y = 0) exp 4 holds where =E[Y ]: 3.5 A closely related coupling In this section, we follow the xed point transformations idea discussed in Section 3.2 and prove an upper tail inequality with the use of discrete equilibrium couplings. In Stein method literature, equilibrium transformations, which have the geometric distribution as their unique xed point, were introduced in [58]. In the following X is said to have geometric distribution with parameter p ifP(X = k) = (1p) k p; k 0. In the same spirit of Chapter 3, this time it is natural to expect an upper tail behavior of order q t with 0 < q < 1 by the nature of the xed point transformation. Let's start by recalling the denition of equilibrium couplings from [58]. Denition 3.5.1 If Y is a nonnegative integer-valued random variable with P(Y = 0) > 0; we say that an integer-valued random variable Y e 0 has the discrete equilibrium 49 distribution with respect to Y if for all bounded f and with f(x) =f(x + 1)f(x) we have E[f(Y )]f(0) =E[Y ]E[f(Y e 0 )]: Remark 3.5.1 Pek oz and Roellin also introduced a continuous version of the equilibrium transformation where exponential distribution is the unique xed point [59]. It is highly likely that the variants of the results of this section can also be given for the continuous case. Remark 3.5.2 Boundedness condition in Denition 3.5.1 can be relaxed. For example, iff is a nonnegative function, we can truncatef and use monotone convergence theorem to show that E[f(Y )]f(0) =E[Y ]E[f(Y e 0 )] when Y e 0 has the discrete equilibrium distribution. In our case, this will be used with f(x) =xe x 1(x 0) for > 0: Now we are ready to give the main result of this section. Theorem 3.5.1 Let Y be a nonnegative integer-valued random variable with P(Y = 0) > 0, nite mean = E[Y ] and nite mgf everywhere. Assume that one nd an equilibrium coupling Y e 0 of Y satisfying Y e 0 Y C and that Y e 0 is stochastically dominated by Y . Then there exists q2 (0; 1) and > 0 such that for any t> 0 we have P(Y t)q t : Remark 3.5.3 As the proof will make it clear, for any K > 1 we may indeed choose q = K 1 +K and = exp D 1 + 1 KD D 1 1 K ! 1 + 1 K 50 where D =C + 1: Note that q is less than 1 agreeing our discussion at the beginning of this section. Also, observe that the constant depends on : This will in general not be a problem since in applications we usually have 0 < C 1 < = E[Y ] = E[Y (n)] < C 2 for two universal constants C 1 and C 2 (independent of n) giving a bound of the form 0 q t with 0 a constant which is independent of : Here is the proof of Theorem 3.5.1. Proof: Letf(x) =xe x 1(x 0). By Remark 3.5.2, ifY e 0 has the equilibrium distribution w.r.t. 
Y , then for any > 0 we have E[Ye Y ] =E[(Y e 0 + 1)e (Y e 0 +1) Y e 0 e Y e 0 ]: Letting m() =E[e Y ] be the moment generating function of Y , we get m 0 () =(e 1)E[Y e 0 e Y e 0 ] +e E[e Y e 0 ]: Now sincef is an increasing function forx 0 and sinceY e 0 is stochastically dominated by Y , we obtain m 0 () (e 1)E[Ye Y ] +e E[e (Y e 0 Y) e Y ] (e 1)m 0 () +e e C m() which after rearrangement yields m 0 ()(1 +e )e (C+1) m(): Set D =C + 1: Then for 2 0; ln 1 + 1 (so that 1 +e > 0), we have m 0 () m() e D 1 +e : 51 Thus for 0<v< ln 1 + 1 , we get Z v 0 m 0 () m() d Z v 0 e D 1 +e d which immediately gives logm(v) 1 +e v 1 D (e Dv 1): Exponentiation yields m(v) exp D e Dv 1 1 +e v for any v2 0; ln 1 + 1 : Hence, using Chebyschev's inequality we arrive at P(Yt)e v(+t) m(v) exp D e Dv 1 1 +e v vtv Choosing v = ln 1 + 1 K with K > 1, this gives P(Yt) K 1 +K t where = exp D (1+ 1 KD ) D 1 1 K 1 + 1 K concluding the proof. Example 3.5.1 (Degree of a Randomly Chosen Node in a Uniform Attachment Graph) LetG n be a directed random graph onn nodes dened by the following recursive construc- tion. Initially the graph starts with one node with a single loop where one end of the loop contributes to the \in-degree" and the other to the \out-degree". Now, for 2 m n, given the graph with m 1 nodes, add node m along with an edge directed from m to 52 a node chosen uniformly at random among the m nodes present. We call a graph con- structed in this way as a uniform attachment graph. Here our interest is on the in-degree Y of a node chosen uniformly at random from a uniform attachment graphG n . We use the equilibrium coupling construction given in [58]. Let X i have a Bernoulli distribution, independent of all else, with parameter i := (ni + 1) 1 , and let N be an independent random variable that is uniform on the integers 1; 2;:::;n. If we imagine that node n + 1N is the randomly selected node, then we can write Y = P N i=1 X i : It is shown in [58] that Y e 0 := P N1 i=1 X i has the discrete equilibrium distribution with respect toY . Note thatY e 0 is stochastically dominated byY and alsojYY e 0 j 1. Also a simple computation shows that :=E[Y ] = 1 n n X i=1 ni ni + 1 : In particular, one has 1=4 1 for n 2. Thus using Theorem 3.5.1, we obtain P(Y t)q t where and q2 (0; 1) are computable constants which are independent of n. 53 Chapter 4 Size biasing, log-concavity and occupancy models In this chapter, we rst explore connections between log-concavity and size biased cou- pling constructions, and then show that threshold type counts based on multivariate occupancy models with log-concave marginals admit bounded size biased couplings under weak conditions for the following two models: 1. Counts of the number of vertices of given degrees in the Erd} os-R enyi graph, intro- duced in Section 4.2.1. 2. Counts of bin occupancy in the multinomial model, introduced in Section 4.2.2. Via the results of Section 3.2, these bounded couplings lead to new concentration of measure results for random graphs and multinomial allocation models, generalizing previous work in a number of directions, and improving them in certain cases. In addition to the two models discussed here, techniques below also apply to germ-grain models in stochastic geometry and multivariate hypergeometric sampling, but we skip these two here and refer [8] for interested reader. 
Before moving to the main discussion, let's give an idea of the avor of our results by considering the Erd} os-R enyi random graph on m vertices where each disjoint pair of vertices is independently connected by an edge with probability p2 (0; 1). Let the component M of the vector M = (M ) 2[m] record the degree of vertex . The work [30] derived concentration results for the number of isolated vertices P 2[m] 1(M = 0). Here we consider the random graph where each pair of disjoint edgesfi;jg is allowed to have its own connection probability p fi;jg . Next, we allow each vertex to have its own 54 threshold d to either meet, exceed, or dier from, which is allowed to take any value. Lastly, we allow each vertex to be weighted according to a nonnegative `importance factor,' or weight, w . Thus, in the Erd} os-R enyi random graph, and more generally, we provide sub-Poisson concentration bounds for random variables of the form Y ge = X 2[m] w 1(M d ) and Y ne = X 2[m] w 1(M 6=d ); (4.1) that is, for the weighted number of components of M having size at least d , and not equal to d , respectively. The concentration bounds provided by Theorems 4.2.1 and 4.2.2 for the variables dened in (4.1) also yield bounds for the `complementary' sums X 2[m] w 1(M <d ) = X 2[m] w Y ge and X 2[m] w 1(M =d ) = X 2[m] w Y ne ; with the mean = E[Y ] replaced by P 2[m] w and the roles of the right and left tails reversed. In fact, both results can be extended further, with essentially only a notational burden, to random variables of the form Y ? = X 2[m] w 1(M ? d ) where ? 2f;6=g; and therefore to the sums of complementary form. 4.1 Some lemmas and connections to log-concavity For any subsetS of R and any t 1 ;t 2 2 R, let t 1 S +t 2 =ft 1 s +t 2 : s2Sg. For a discrete random variable M let p x = P(M = x) and supp(M) =fx2 R : p x > 0g be the probability mass function and support of M, respectively. Recall that M is called a lattice random variable if supp(M) h 1 Z +h 2 for some real numbers h 1 6= 0, h 2 . We can without loss of generality assume our lattice random variablesM have supp(M)Z 55 by applying the transformation (Mh 2 )=h 1 . Such a lattice random variable M is log- concave (LC) if supp(M) is an integer interval; that is, if supp(M) = (k 1 ;k 2 )\Z for some k 1 ;k 2 2Z[f1g; k 1 <k 2 1; and p 2 x p x1 p x+1 for all x2Z: (4.2) Under a lattice log-concave assumption on the distribution of M, Lemma 4.1.1, Parts 1 and 2 provide bounded couplings of random variables with distributions L(MjMd) andL(MjMd), respectively, to variables with distributionsL(MjM d + 1) andL(MjM d 1). Part 3 shows that there is a bounded coupling of M to a variable having distributionL(MjM6=d), provided M is not degenerate at d. These results are extensions of [34, Lemma 3.3], which showed the d = 1 case of Part 1 when M is Bin(n;p) with p2 (0; 1). In the following we let Bern(p) denote the Bernoulli distribution giving mass 1p and p to 0 and 1 respectively. Lemma 4.1.1 Let M be a lattice LC random variable with supportS. 1. For x;d2Z dene (d) x = 8 > > < > > : P(Mx+1)P(M=d) P(Md+1)P(M=x) ; if x;d + 12S and xd 0; otherwise. Then the following hold. (a) 0 (d) x 1 for all x;d. (b) Ifd + 12S andN;B are a random variables such thatL(N) =L(MjMd) andL(BjN) = Bern( (d) N ), thenL(N +B) =L(MjMd + 1). 56 2. For x;d2Z dene (d) x = 8 > > < > > : P(Mx1)P(M=d) P(Md1)P(M=x) ; if x;d 12S and xd 0; otherwise. Then the following hold. (a) 0 (d) x 1 for all x;d. 
(b) If d 12S and N;B are random variables such thatL(N) =L(MjMd) andL(BjN) = Bern( (d) N ), thenL(NB) =L(MjMd 1). 3. Fix d2Z such thatP(M =d)< 1. Let B + ;B be conditionally independent given M withL(B + jM) = Bern( (d) M ) andL(B jM) = Bern( (d) M ). LetB be independent of B + , B , and M withL(B) = Bern(q), where q = P(Md + 1) P(M6=d) : Then L (M +BB + (1B)B ) =L(MjM6=d): (4.3) In other words, a random variable having the distribution on the left hand side of (4.3) can be formed by ipping aq-coinB and, if heads, adding 1 toM with probability (d) M , and otherwise subtracting 1 with probability (d) M . We note that whenM <d (resp. M >d), the probability (d) M of adding (resp. (d) M of subtracting) 1 is 0, and whenM =d, M is changed with probability 1 by either adding or subtracting 1. We dene the hazard function of a lattice random variable M with supportS as h x = P(M =x) P(Mx) = p x P yx p y for x2S: (4.4) We require the following result to prove Lemma 4.1.1. We will refer to [45] for other properties of lattice LC random variables that we will need. 57 Lemma 4.1.2 If M is lattice LC with supportS then the hazard function h x given in (4.4) is nondecreasing onS. Proof: For any x;y2S with xy note that by (4.2) we have p x+1 p x p x+2 p x+1 p y+1 p y : If x;x + 12S then 1=h x 1=h x+1 = X y2S:yx p y =p x X y2S:yx+1 p y =p x+1 = X y2S:yx (p y =p x p y+1 =p x+1 ) = X y2S:yx p y p x+1 p x+1 p x p y+1 p y 0: Proof of Lemma 4.1.1: Clearly (d) x 0, and to show that (d) x 1 it suces to assume that d;d + 12S since (d) x = 0 otherwise. Let h x be the hazard function of M dened by (4.4). For any dx2S, by Lemma 4.1.2 we have h d h x , and therefore (d) x = 1=h x 1 1=h d 1 1; proving Part 1a. To prove Part 1b, letting p x =P(M =x) and G x =P(Mx), for any k = 1; 2;::: we have P(N +Bd +k) =P(Nd +k) +P(N =d +k 1;B = 1) =P(Md+kjMd)+ (d) d+k1 P(M =d+k1jMd) = G d+k G d + G d+k p d G d+1 p d+k1 p d+k1 G d = G d+k G d G d+1 (G d+1 +p d ) = G d+k G d G d+1 G d = G d+k G d+1 =P(Md +kjMd + 1): 58 For Part 2a let f M =M, which is LC. For d 12S and dx2S, (d) x = P( f Mx + 1)P( f M =d) P( f Md + 1)P( f M =x) =e (d) x 2 [0; 1] by Part 1a, where e is dened with respect to f M. The rest of the proof of Part 2 is similar to that of Part 1. Moving to Part 3, lettingN denote the random variable on the LHS of (4.3), we will show that P(Ny) =P(MyjM6=d) for all y2Z; y<d; (4.5) the proof that P(Ny) =P(MyjM6=d) for all y >d being similar. Fix y <d and without loss of generality assume that y + 12S; (4.6) since otherwise (4.5) holds trivially as both sides are 0 or 1. With p x ;G x as above and F x =P(Mx), P(Ny) =P(My 1) +P(M =y; B = 0) +P(M =y; B = 1; B + = 0) +P(M =y + 1; B = 0; B = 1) =F y1 +p y (1q) +p y q(1 (d) y ) +p y+1 (1q) (d) y+1 =F y +p y+1 (1q) (d) y+1 ; (4.7) this last because (d) y = 0 since y<d. If d 12S then (4.7) is F y +p y+1 1 G d+1 1p d F y p d F d1 p y+1 =F y + F d1 1p d F y p d F d1 = F y 1p d =P(MyjM6=d): 59 Otherwised 162S so (d) y+1 = 0, hence (4.7) isF y . Ify =d 1 then minS =d by virtue of the assumption (4.6), so P(Nd 1) =F d1 = 0 =P(Md 1jM6=d): In the remaining case, d 162S and yd 2, we have maxS <d 1 again by virtue of (4.6), and in particular d62S. Then P(MyjM6=d) =P(My) =F y =P(Ny); nishing the proof. In the following we will say thatM has a Poisson Binomial distribution with param- eter p = (p j ) j2[m] , and write MPB(p), when L(M) =L 0 @ X j2[m] B j 1 A where B j are independent Bernoulli random variables with P(B j = 1) =p j for j2 [m]. 
When there exists p such that p = p for all j2 [m], then M Bin(m;p). We note that the distribution of a single Bernoulli random variable, with supportf0; 1g, trivially satises (4.2) and hence is LC. Since [45] demonstrates that LC is preserved under convolution, the claim of the following lemma is immediate. Lemma 4.1.3 The Poisson Binomial distributionPB(p) is LC. Turning to the hypergeometric distribution, it is shown in [26, Theorem A] that a hypergeometric random variable can be written as a sum of independent but non- identically-distributed Bernoulli random variables, hence the following lemma is a special case of the previous. Lemma 4.1.4 The hypergeometric distribution is LC. 60 When M has distributionPB(p) for p = (p j ) j2[m] , then for all d2Z we have P(M =d) =q eq (d; p) where q eq (d; p) = X s[m];jsj=d Y j2s p j Y j62s (1p j ) and so P(Md) =q ge (d; p) and P(M6=d) =q ne (d; p) where q ge (d; p) = m X k=d q eq (k; p) and q ne (d; p) = 1q eq (d; p): (4.8) For a `balls in urns' type occupancy model and the variable Y ge of (4.1), say, Lemma 4.1.1 demonstrates how a log-concave count M of a chosen urn having its distribution conditional onM d may be incremented or not, as determined by a Bernoulli variable whose success probability (d) M in Part 1 of Lemma 4.1.1 depends on the count M , in order to achieve its distribution conditional onM d +1. When the Bernoulli variable mandates the count of the urn be increased by one, a ball from a dierent urn must be added to it. When balls fall independently in the urns, the following lemma gives the distribution with which this additional ball should be selected from the other urns so that the conguration achieves the correct conditional distribution. We note that when the urn of interest contains every ball or no balls, the Bernoulli probability of adding or subtracting an additional ball, respectively, is zero. Lemma 4.1.5 Let B 1 ;:::;B m be independent Bernoulli random variables with respec- tive success probabilities p 1 ;:::;p m 2 (0; 1), and let R =fi :B i = 1g. 1. If J is a random variable taking values in [m] such that P(J =jjR) = p j =(1p j ) P k62R p k =(1p k ) for j62R; 61 then for any r [m] of sizejrj2 [m] 1, L(R[fJgjR =r) =L(RjRr;jRj =jrj + 1): 2. If J is a random variable taking values in [m] such that P(J =jjR) = (1p j )=p j P k2R (1p k )=p k for j2R; then for r [m] of sizejrj2 [m], L(RnfJgjR =r) =L(RjRr;jRj =jrj 1): Proof: For Part 1, x r [m] of sizejrj2 [m] 1. If s6= r[fjg for some j62 r then P(R[fJg =sjR =r) andP(R =sjRr;jRj =jrj + 1) are both zero. Otherwise, P(R[fJg =sjR =r) =P(J =jjR =r) = p j =(1p j ) P k62r p k =(1p k ) = Q i2r[fjg p i Q i62r[fjg (1p i ) P k62r Q i2r[fkg p i Q i62r[fkg (1p i ) = P(R =r[fjg) P(Rr;jRj =jrj + 1) =P(R =r[fjgjRr;jRj =jrj + 1) =P(R =sjRr;jRj =jrj + 1): Part 2 follows by applying Part 1 upon replacing R and p j , j2 [m], by [m]nR and 1p j , j2 [m], respectively. The next lemma shows that when the occupancy model M = (M ) 2[m] has lattice LC marginal distributions that are bounded below, and is such that for every 2 [m] one can closely couple M to a conguration having the distribution of M conditional on incrementing the coordinate M by one, then M can be coupled to a occupancy conguration that has the distribution of M conditional on the th coordinate being no smaller than any value in its support, and which diers from M in at most a bounded number of coordinates. 
The lemma also furnishes a similar result for coupling M to a 62 conguration having the occupancy distribution conditioned on its th coordinate not equal to some particular value. When M = (M ) 2[m] is a measurable function of some collection of random vari- ablesU we say M depends on (or corresponds to) the congurationU, or thatU has corresponding occupancy counts M. Lemma 4.1.6 Let M = (M ) 2[m] be a random vector depending on a congurationU such that for all 2 [m] the component M has a lattice, log-concave distribution with supportS , and let a = infS and b = supS . 1. Suppose for all 2 [m] that a = 0, and for all k 2 S given any U k with distribution L(V k ) :=L(UjM k); (4.9) one can constructU + k on the same space asU k such that for all (d ) 2[m] with d 2S , there exists B 0 such that L(U + k jU k ) =L(V k jN k; = minfM k; + 1;b g) and X 6= 1(M + k; d ) X 6= 1(M k; d ) +B; (4.10) where M k , M + k and N k are the occupancy counts corresponding toU k ,U + k andV k , respectively. Then there exists a coupling of the weighted threshold counts Y ge in (4.1) for the congurationU to a variable Y s ge having the Y ge size bias distribution that satises Y s ge Y ge +jwj (Bjdj + 1): (4.11) 63 2. Suppose (d ) 2[m] is such that P (M 6=d )< 1 for all 2 [m]. WithV satisfying L(V) :=L(U) and having corresponding counts N, suppose that for all 2 [m] one can constructU + with counts M + on the same space asU such that L(U + jU) =L(VjN = minfM + 1;b g) and X 6= 1(M + d ) X 6= 1(M d ) +B; (4.12) and also a congurationU , with corresponding counts M , satisfying (4.12) with M +1 andb replaced byM 1 anda , respectively. Then there exists a coupling of the weighted threshold counts Y ne given by (4.1) for the congurationU to a variable Y s ne having the Y ne size bias distribution that satises Y s ne Y ne +jwj (B + 1): (4.13) Proof: For Part 1, x 2 [m]. LettingU 0 =U, trivially for k = 0 we have L(U k ) =L(UjM k) and X 6= 1(M k; d ) X 6= 1(M 0; d ) +Bk; (4.14) and that the distribution of the th componentM k; of M k is lattice LC, where M k are the occupancy counts corresponding toU k . We show that we can construct successive congurationsU k where these statement hold for k = 1;:::;d . Assume that for some k = 0; 1;:::;d 1, a congurationU k has been constructed satisfying (4.14) and whose th corresponding component M k; of M k is lattice LC. By the hypotheses of the lemma, one can coupleU k to a congurationU + k with correspond- ing occupancy counts M + k satisfying (4.10). With (k) x given in Part 1 of Lemma 4.1.1, letU k+1 be the congurationU + k with probability (k) M k; , and otherwise letU k+1 beU k . By the log-concavity of M k; , Part 1b of Lemma 4.1.1, (4.9) and (4.10), we have that L(U k+1 ) =L(V k+1 ). It is easy to check that the conditional distribution of a lattice LC 64 random variable, conditioned on taking values in any integer interval subset of its sup- port, is again lattice LC. HenceM k+1; is lattice LC. Because M k satises the inequality in (4.14), (4.10) guarantees that the counts M k+1 corresponding toU k+1 satisfy it for k + 1, completing the induction. Let Y ge be the weighted threshold count as given in (4.1), corresponding to the nal congurationU d . Lemma 4.1.7 with X = 1(M d ) and (4.14) show that mixing Y ge , 2 [m], with distribution (4.17) results in a variable Y s ge with the Y ge size bias distribution that, now also taking account of the possible change in summand and the weights (w ) 2[m] , satises (4.13). Part 2 is shown in a manner similar to that of Part 1. 
Let (d) x ; (d) x and q be as dened in Parts 1, 2 and 3 of Lemma 4.1.1. SetU equal toU + with probabilityq (d) M 0; , U with probability (1q) (d) M 0; , andU 0 otherwise. It follows from the properties of U + ;U as in (4.12), and Part 3 of Lemma 4.1.1 that mixingU as in Part 1 results in a conguration whose weighted threshold countY s ne has theY ne size bias distribution, and satises (4.11). As a nal note in this section, we include the following lemma which discusses size biased constructions for sums of constant multiples of indicator random variables. This is a slight generalization of the construction discussed in Section 3.1 by making use of the fact that (aY ) s =aY s , which directly follows from (3.1). Also note that this can be considered as a special case of a result of [37] applied to our situation. Lemma 4.1.7 Let Y = P 2[m] w X be a nite sum of Bernoulli variables (X ) 2[m] weighted by nonnegative constants (w ) 2[m] and satisfying E[Y ] > 0. Suppose that for 2 [m] the variablesfX ;2 [m]g have joint distribution L(X ;2 [m]) =L(X ;2 [m]jX = 1): (4.15) 65 Then letting Y = X 2[m] w X (4.16) and I a random variable independent offX ;2 [m]g with distribution P(I =) = w E[X ] E[Y ] ; (4.17) the variable Y I has the Y -size bias distribution. Specializing to the case of interest here, to size bias sums of the form (4.1), say Y ge for concreteness, one selects the summand w 1(M d ) according to (4.17), and following (4.15), constructs a conguration on the same space as the one given whereM is at least d . For a `balls in urns' occupancy type model, such a construction requires the marginal distribution of M to achieve its conditional distribution given M d , and the distributions of the remaining urn occupancies to achieve their conditional distributions, given the contents of urn. Lemma 4.1.1 shows how to handle the marginal occupancy count distribution of the chosen cell when it is log-concave, and Lemma 4.1.5 shows how the correct conditionals can be achieved jointly when changing the count in the chosen urn when the marginals are Poisson Binomially distributed. 4.2 Applications on occupancy models We now present in detail the two aforementioned models, and use the constructions in Section 4.1 to prove concentration bounds for each case. Recall that the variables of interest here are the weighted occupancy counts of the form Y ge = X 2[m] w 1(M d ) and Y ne = X 2[m] w 1(M 6=d ): (4.18) 66 We may assume that any indicators in the sums (4.18) are non-trivial as those that are zero may simply be removed, and the corresponding w may be subtracted from the variable of interest for any that are identically one. In this same manner, we may assume that all the nonnegative weighting factors w are strictly positive, and that the sum is non-constant after making such reductions, as otherwise the result is trivial. In particular, without loss of generality we may assume for all 2 [m] that d 2S \ (S + 1) when considering Y ge , and that 0<P(M 6=d )< 1 when considering Y ne . When M PB(p ) for each 2 [m], by (4.8) the means ge and ne of Y ge and Y ne are given respectively by ge = X 2[m] w q ge (d ; p ) and ne = X 2[m] w q ne (d ; p ): (4.19) Throughout, when dealing with Y ge , we will assume that the inmum a of the support of M is zero for all 2 [m]. This assumption is equivalent a >1 for all 2 [m], as then M and d may be replaced by by M a and d a , respectively. 
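Since the marginals here are Poisson Binomial, the quantities $q_{ge}(d, \mathbf{p}_\alpha)$ and $q_{ne}(d, \mathbf{p}_\alpha)$ of (4.8), and hence the means in (4.19), can be computed exactly by the usual convolution recursion; the following sketch does so for toy parameter values of our own choosing (unit weights and a common threshold).

```python
def poisson_binomial_pmf(p):
    """PMF of a sum of independent Bernoulli(p_j), i.e. q_eq(., p) of (4.8)."""
    pmf = [1.0]
    for pj in p:
        nxt = [0.0] * (len(pmf) + 1)
        for k, mass in enumerate(pmf):
            nxt[k] += mass * (1 - pj)      # the j-th Bernoulli equals 0
            nxt[k + 1] += mass * pj        # the j-th Bernoulli equals 1
        pmf = nxt
    return pmf

def q_ge(d, p):
    return sum(poisson_binomial_pmf(p)[d:])

def q_ne(d, p):
    pmf = poisson_binomial_pmf(p)
    return 1.0 - (pmf[d] if 0 <= d < len(pmf) else 0.0)

# means (4.19) with unit weights and a common threshold d; each marginal is a
# Poisson Binomial on m - 1 trials with equal success probabilities (toy values)
m, d = 6, 2
p_alpha = [[0.3] * (m - 1) for _ in range(m)]
mu_ge = sum(q_ge(d, p_alpha[a]) for a in range(m))
mu_ne = sum(q_ne(d, p_alpha[a]) for a in range(m))
print(f"mu_ge = {mu_ge:.4f}   mu_ne = {mu_ne:.4f}")
```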
4.2.1 Degree counts in Erd os-R enyi type graphs The classical Erd os-R enyi random graph onm vertices is constructed by placing an edge between each pair of distinct vertices independently and with equal probability. The model was originally used in conjunction with the probabilistic method for proving the existence of graphs with certain properties (see [1]), and it became popular as well for modeling complex networks (see [20]). Here we consider the degree counts in Erd os-R enyi graphs, a quantity that has been the object of much study. Letting D m be the number of vertices of degree d in an Erd os-R enyi graph on m vertices with edge connectivity probability p depending on n, asymptotic normality was shown in [44] when m (d+1)=d !1 and mp! 0; or mp! 0 and mp logmd log logm!1: Asymptotic normality of D m when mp! c > 0 67 was obtained in [5]. Other univariate results on asymptotic normality of counts on ran- dom graphs are given in [41], and references therein. Goldstein and Rinott ([37]) obtain smooth function bounds for the vector whosek components count the number of vertices of xed degrees d 1 ;d 2 ;:::;d k when p ==(m 1)2 (0; 1) for xed , implying asymp- totic multivariate joint normality. This latter work was later extended to inhomogeneous random graph models in [49] which will be the setting in our paper. Formally, letG m be an Erd os-R enyi random graph on m vertices, where distinct vertices and are connected by an edge with probability p ; = p ; , independently of all other edges. The classical model is recovered by settingp ; =p for somep2 [0; 1]. ForY ge , with similar remarks applying toY ne , removing any edgef;g withp ; = 1, and replacingd ;d byd 1;d 1, we may assumep ; < 1 for all (;)2 [m][m]. Also reducing to the case where all the indicators in (4.18) are nontrivial allows us to assume that P :6= p ; < 1 for all 2 [m]: Let the components M of M = (M ) 2[m] record the degree of vertex . As M PB(p ), with p = (p ; ) :6= , by (4.8) the means ge and ne ofY ge andY ne are given by (4.19). Lemma 4.2.1 There exists a coupling of Y ge to Y s ge , having the Y ge -size biased distribu- tion, that satises jY s ge Y ge j (max 2[m] w )(1 + max 2[m] d ); and a coupling of Y ne to Y s ne , having the Y ne -size biased distribution, satisfying jY s ne Y ne j 2 max 2[m] w : Proof: First considerY ge . We verify that the hypotheses of the rst part of Lemma 4.1.6 hold withB = 1. The marginal distributions of M are log-concave by Lemma 4.1.3. For 2 [m], letk2S and suppose congurationU k , with corresponding occupancy counts M k , is a conguration with distributionL(V k ) =L(U 0 jM k; k). IfM k; = supS then settingU + k =U k we have that (4.10) is satised. 68 Otherwise, let R be the set of neighbors of vertex 2 [m] in the congurationU k , and letU + k be constructed by selecting a vertex 62R with probability P(J = jR ;2 [m]) = p ; =(1p ; ) P 62R p ; =(1p ; ) and connecting it to vertex. Lemma 4.1.5 and the independence of the edge indicators show that the counts corresponding the congurationU + k have distributionL(V k jN k; = M k; +1) where N k are the counts associated toV k . Also including an extra edge incident to vertex clearly aects the degree of at most one vertex, other than, yielding (4.10) with B = 1. We apply Lemma 4.1.7 with X = 1(M d ), and let I be an index chosen with distribution (4.17), independently of M. 
Lemma 4.1.6 yields the existence of occupancy counts M I with distribution that of M conditioned onM I d I , with the contents of M and M I diering in at mostjdj + 1 vertices. Letting Y s ge be given as Y in (4.18), with M replaced byM I , Lemma 4.1.7 yields thatY s ge has theY -sized biased distribution. As changing the degree of a single vertex can change sums as in (4.18) by at mostjwj, the dierence between Y s ge and Y ge is at most (max 2[m] w )(1 + max 2[m] d ) as claimed. To handleY ne we proceed similarly, now verifying that the hypotheses of the second part of Lemma 4.1.6 hold with B = 1. We have already shown in the previous part how the congurationU + required in the lemma may be constructed fromU 0 . In like manner, U can be obtained by selecting a vertex 2R with probability P(J = jR ;2 [m]) = (1p ; )=p ; P k2R (1p ; )=p ; ; and removing the edge between and . We now apply Lemma 4.1.7 with X = 1(M 6= d ), and I an index chosen with distribution (4.17), independently of M. Proceeding as before, Lemma 4.1.6 yields the existence of a conguration M I with distribution that of M conditioned on M I 6= d I , with the contents of M and M I diering in at most two vertices, accounting for vertex 69 . Lemma 4.1.7 yields that Y s ne , given as Y in (4.18) with M replaced by M I , has the Y -sized biased distribution. As the congurations corresponding to the summands Y s ne and Y ne dier in at most two vertices, the summands themselves can dier by no more than 2 max 2[m] w , and we are done. The following theorem now follows immediately. Theorem 4.2.1 Concentration of measure inequalities (3.5)-(3.7) hold for the counts of the number of vertices of given degrees in the Erd} os-R enyi graph, with, a. with Y ge ; ge and c ge given by (4.18), (4.19), and (max 2[m] w )(1 + max 2[m] d ), b. with Y ne , ne and c ne given by (4.18), (4.19) and 2 max 2[m] w . In the standard case of equal thresholds d and unit weightings, the expectations of Y ge and Y ne simplify to ge =mP(Bin(m 1;p)d) and ne =mP(Bin(m 1;p)6=d); (4.20) respectively, and the bounds (3.5)-(3.7) apply to Y ge with c = d + 1, and for Y ne with c = 2. In particular, (3.5) and (3.7) yield P(Y ge ge t) exp t 2 2(d + 1) ge and P(Y ge ge t) exp t 2 2(d + 1)( ge +t=3) : (4.21) As noted above, the special case of the number of isolated vertices Y is = X 2[m] 1(M = 0) for the standard Erd} os-R enyi model was handled in [30], using an unbounded size bias coupling, and with much greater eort. Techniques of the present paper can be used to obtain concentration bounds forY is in a much simpler way by noting thatmY is =Y ge 70 with unit weightings and equal thresholds d = 1. In particular, the bounds (4.25) hold with Y is is replacing Y ge ge and setting d = 1, reversing the roles of the left and right tail bounds, and replacing ge by m is . The left tail bound obtained in this fashion is stronger than the corresponding bound P(Y is is t) exp t 2 4 is ; given in [30], fort 6m(1p) m1 3m, with similar remarks applying to the right tail. Although the unbounded size bias coupling argument given in [30] applies only to the case of isolated vertices, rst part of Theorem 4.2.1 applies equally for all degrees d. In particular, keeping p and d xed and letting m ! 1, the left and right tail bounds forY ge provided by (3.5) and (3.7), say, will behave as exp(t 2 =(2(d+1)m)) and exp(t 2 =(2(d + 1)(m +t=3))), respectively. As another standard example regarding the asymptotics of the resulting concentra- tion bounds, we may x d and consider the case mp! 
for some > 0 so that for large m Bin(m 1;p) is close to a Poisson random variable with parameter . In this case focusing on the statistic Y ge = P 2[m] 1(M 1), for simplicity, the mean satis- es ge ! m(1e ), and the resulting left and right tail bounds are asymptotic to exp(t 2 =(4m(1e ))) and exp(t 2 =(4m(1e ) + 4t=3)), respectively as m!1. Comparisons of these tail bounds with other techniques from the literature will be dis- cussed in detail in Section 4.3. 4.2.2 Multinomial occupancy Among the many applications of multinomial occupancy models, in whichn balls are dis- tributed independently tom boxes (see [47] for an overview), are the well-known species trapping problem (see [61]) and the closely-related problem of statistical linguistics (see [69]). The study of the number of empty boxes, equivalent to studying the d = 1 case ofY ge in (4.18), was initiated in [71] where it was shown that the properly standardized 71 distribution ofY ge is asymptotically normal when balls land in boxes uniformly. Bounds in theL 1 metric between the standard normal distribution and standardized nite sam- ple distribution of the d = 1 case of Y ge was provided by [27] in the uniform case, of Y eq by [56] in the uniform and some non-uniform cases, and for all remaining d 2 versions of Y eq by [7] in the uniform case. Concentration of measure inequalities for occupancy problems were previously discussed in [23] for the number of empty boxes using negative dependence of random variables. For2 [m] let the componentM of the vector M = (M ) 2[m] count the number of balls in box whenn balls are independently distributed intom boxes, with the position B j of ball j being box having probability p ;j . As in Section 4.2.1, we may assume that p ;j < 1 for all (;j)2 [m] [n], and that P n j=1 p ;j > 0 for all boxes 2 [m]. As M PB(p ), arguing as for the multinomial occupancy problem, the mean of Y ne in (4.18) again has the form (4.19), here with p = (p j ) j2[n] . Lemma 4.2.2 There exists a coupling of Y ge to Y s ge , having the Y ge -size biased distribu- tion, that satises jY s ge Y ge j max 2[m] w ; and a coupling of Y ne to Y s ne , having the Y ne -size biased distribution, satisfying jY s ne Y ne j 2 max 2[m] w : Proof: We closely follow the proof of second part of Lemma 4.2.1. In particular, the marginal distributions of M are log-concave by Lemma 4.1.3. LetU k , with corresponding occupancy counts M k , be a conguration such that the distribution of the corresponding counts satisfyL(V k ) =L(MjM k; k) for some k2S . If M k; = supS then let 72 U + k =U k . Otherwise, letting R be the identities of the balls in box 2 [m] in the congurationU k , we formU + k by selecting a ball j62R with probability P(J =jjR ;2 [m]) = p ;j =(1p ;j ) P k62R p ;k =(1p ;k ) and placing it into box . Lemma 4.1.5 and independence between the locations of balls shows that the counts corresponding the congurationU + k have the distribution L(V k jN k; = M k; + 1) where N k are the counts associated toV k . In like manner,U can be obtained by choosing ball j2R with probability P(J =jjR ;2 [m]) = (1p ;j )=p ;j P k2R (1p ;k )=p ;k ; and placing it into box 6= with probability P(B J 2R jB J 62R ) =p ;J =(1p ;J ). We complete the remainder of the proof by arguing as in the second part of Lemma 4.2.1. Theorem 4.2.2 Concentration of measure inequalities (3.5)-(3.7) hold form multino- mial occupancy counts, a. with Y ge ; ge and c ge given by (4.18), (4.19), and max 2[m] w , b. 
with Y ne , ne and c ne given by (4.18), (4.19) and 2 max 2[m] w . In the asymptotic regime most studied, balls are uniformly distributed, thresholds are constant and the weights are taken to be identically 1. That is, p ; = 1=m;w = 1 andd =d for each2 [m] and2 [n]. For this special case, the expectations in (4.19) simplify to ge =mP(Bin(n; 1=m)d) and ne =m(1P(Bin(n; 1=m) =d)); (4.22) and the concentration bounds of Section 3.2 can be used for Y ge withc = 1, and for Y ne with c = 2. 73 The expectations in (4.22) are of a form similar to those of (4.20) for the standard Erd} os-R enyi model. Thus, by arguing as in Section 4.2.1, we can study in the same manner the behavior of the bounds we obtain for this case via Theorem 3.2.1. 4.3 Comparison of results with other techniques In this section, we compare the results given in Section 4.2 to concentration bounds obtained by other means. Our comparisons will be with the following three well known techniques for obtaining concentration inequalities: (i) Azuma-Hoeding bound, (ii) Use of negative association and (iii) Certiable functions. Of these three, the last technique is the most appropriate one with which to compare our results. For simplicity and con- creteness in our comparisons, below we will consider the unit weighting and constant threshold count Y ge = X 2[m] 1(M d): (4.23) Azuma-Hoeding bound. For an example from the present paper, let's consider an Erd os-R enyi random graphG on m vertices with xed edge probabilities p. We are interested in the concentration of the number of non-isolated vertices inG, which is dened as Y = P 2[m] 1(M 1) where M is the degree of vertex : It is easily seen that Y can be written as a function, call it f, of independent random variables X 1 ;X 2 ;:::;X ( m 2 ) where X i is the indicator of the presence of a given edge with respect to a certain labeling of all possible edges. Observing that the function f satises the bounded dierences condition given in (??) with c i = 2 for each i = 1;:::; m 2 , Azuma- Hoeding inequality yields P(Yt);P(Yt) exp t 2 4m(m 1) (4.24) 74 where =m(1 (1p) m1 ). Next recalling from Lemma 4.2.1 that we can construct a size biased coupling Y s of Y that satisesjY s Yj 2, (3.5) and (3.7) provide the bounds P(Yt) exp t 2 4 and P(Yt) exp t 2 4( +t=3) : (4.25) Focusing on, for example, the lower tail one can easily see that the size biasing bound given in (4.25) will always be better than the Azuma-Hoeding bound in (4.24) for the number of non-isolated vertices. Indeed, since =O(m) (asY is a sum ofm indicators) we see that the order of the exponent is actually improved fromO(t 2 =m 2 ) toO(t 2 =m). Similar discussions also apply for the upper tail. Further, the bounds (4.24) and (4.25), obtained by the use of Azuma-Hoeding and size biasing respectively, will remain true for inhomogeneous random graph models once the mean is changed accordingly. For such models, depending on the underlying edge probabilities, improvements provided by the size bias technique can be even more signif- icant in terms of the orders of exponents. Finally, we note that the discussion in previous paragraphs on random graphs will also be valid for the multinomial occupancy example above (and for various other problems in the literature), and Theorem 3.2.1 will provide improvements over the Azuma-Hoeding inequality in certain regimes. There are several other techniques in the literature that improve the orders of the exponents in Azuma-Hoeding bound. 
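The following minimal sketch (with toy choices of $m$, $p$ and $t$) tabulates the Azuma-Hoeffding bound (4.24) next to the size bias bounds (4.25) for the number of non-isolated vertices, making the improvement of the exponent from order $t^2/m^2$ to order $t^2/m$ visible numerically.

```python
import math

def compare(m, p, t):
    mu = m * (1 - (1 - p) ** (m - 1))                 # mean number of non-isolated vertices
    azuma = math.exp(-t * t / (4 * m * (m - 1)))      # (4.24)
    sb_lower = math.exp(-t * t / (4 * mu))             # (4.25), lower tail
    sb_upper = math.exp(-t * t / (4 * (mu + t / 3)))   # (4.25), upper tail
    return mu, azuma, sb_lower, sb_upper

for m, p in [(100, 0.02), (1000, 0.002)]:
    t = 0.3 * m
    mu, az, lo, up = compare(m, p, t)
    print(f"m={m:5d}  p={p}  mu={mu:9.2f}  t={t:7.1f}  "
          f"Azuma={az:.3e}  size-bias lower={lo:.3e}  upper={up:.3e}")
```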
Below we will discuss two of these, use of negative dependence and certiable functions, which we found most appropriate to compare with our results. Negative association. Negative association has been used successfully to obtain concentration of measure inequalities for occupancy models. We recall from Chapter 2 75 that a family of random variablesX 1 ;X 2 ;:::;X m is said to be negatively associated if for any disjoint subsets A 1 ;A 2 [m], E[f(X i ;i2A 1 )g(X j ;j2A 2 )]E[f(X i ;i2A 1 )]E[g(X j ;j2A 2 )] whenever f and g are coordinate-wise nondecreasing functions for which these expecta- tions exist. Referring to Proposition 5 of [23],when X 1 ;X 2 ;:::;X m are negatively associ- ated indicators, the random variable Y = P m i=1 X i satisfy the bounds (3.5) and (3.7), with C = 1. For the multinomial occupancy model of Section 4.2.2, it can be shown that the indicators 1(M d ) are negatively associated. Hence, both the size bias and negative association technique yield the same bounds, with the same holding for the left tail. On the other hand, the indicator summands in the multinomial occupancy count Y ne = X 2[m] 1(M 6=d ) are no longer negatively associated when the thresholds d are not all 0, so the method of [23] no longer applies, while the methods in this paper show that the concentration of measure inequalities of Section 3.2 are still valid. For instance, when d = 1 for each 2 [m] and balls are distributed uniformly, rst part of Theorem 4.2.2 yields that the concentration bounds of Section 3.2 hold with C = 2 and ge =m(1P(M 1 = 1)) =m 1 n m 1 1 m n1 ! : Lastly, we note that one cannot use negative association for our applications to random graphs in Section 4.2.1. For instance, for a standard Erd} os-R enyi graph, a simple application of Harris' inequality shows that the summand variables of Y ge are indeed positively associated. 76 Self-bounding and certiable functions. We have seen above that bounds pro- duced by size biasing may improve on the bound (??) obtained using the bounded dierence inequality as it replaces the sum P n i=1 c 2 i by some function of the mean of Y . Bounds produced by the method of self bounding functions [55], of which certiable functions are a special case, also have this advantage. We focus on the latter, as it is more straightforward to address the applications studied here in the framework of certiable functions. We begin by recalling the relevant denitions and results on certiable functions from [55]. Letc> 0,a 0, andb be given, and let the nonnegative measurable function f on the product space = n i=1 i satisfy the following two conditions. (i) For each x2 , changing any coordinate x j changes the value of f(x) by at most c. (ii) If f(x) = s then there is a set of coordinates D [n] of size at most as +b that certiesf(x)s. That is, if the coordinates i2D ofy2 agree with those of x, then f(y)s. Let X 1 ;:::;X n be independent random variables with X i taking values in i , Y = f(X 1 ;:::;X n ) where f is a certiable function, and =E[Y ]. Then for all t 0; P(Yt) exp t 2 2c 2 (a +b +t=3c) and P(Yt) exp t 2 2c 2 (a +b +at) : (4.26) Before moving to a discussion of specic examples, we note that the asymptotic Poisson order O(exp(t logt)) as t!1 of the bound (3.6) with c = 1 and = 1, is superior to the order O(exp(t)) of the bound (4.26), with c = 1 and a = 1=2, with similar types of improvement in order holding for other choices of constants. 
The order of the bounds achieved by certiable functions, and more generally self bounding functions, seems to be intrinsic. In particular, after the proof of Theorem 6.21 in [11], the authors 77 note that using the entropy method to prove concentration inequalities for self bounding functions, via log Sobolev inequalities in particular, `at least for a> 1, there is no hope to derive Poissonian bounds. . . for the upper tail.' To focus on a specic example, consider the multinomial occupancy model of Sec- tion 4.2.2, and letY ge be given by (4.23) where M is the number of balls in cell . The variableY ge can clearly be written as a functionf of the locationsX j of ballj = 1;:::;n. It is not dicult to verify that f is certiable with c = 1;a =d and b = 0. Thus, from (4.26), Y ge satises P(Y ge ge t) exp t 2 2(d ge +t=3) and P(Y ge ge t) exp t 2 2d( ge +t) : (4.27) Applying size biasing, Part 3 of Theorem 4.2.2 shows that the lower and upper tail bounds (3.5) and (3.7) hold for Y ge with c = 1, and a simple computation shows that both these inequalities strictly outperform their counterparts in (4.27). Similar remarks apply to the statisticY ne = P 2[m] 1(M 6=d), which is a certiable function with c = 2;b = n and a = 0. The left tail size bias bound exp(t 2 =(4 ne )) obtained via (3.5), will always be preferable to exp(t 2 =(8n)), a function dominated by the left tail bound obtained from (4.27), as ne n< 2n. This situation is similar for the size bias bounds for other applications studied above. For example, consider the bounds provided by the rst part of Theorem 4.2.1 for the standard Erd} os-R enyi random graph on m vertices in Section 4.2.1. The variable Y ge given by (4.23), where M is the degree of vertex and d 2, is certiable with c = 2;a = d and b = 0. One can now easily verify that both the lower and upper tail bounds (3.5) and (3.7) are superior to those obtained via (4.26). Finally, we note that the concentration of measure inequalities of Theorem 3.2.1 will always provide further improvements over the size bias bounds (3.5) and (3.7) applied in the previous paragraphs. However, the form of these latter bounds, being simpler 78 than that of (3.6), allow for an easier comparison with (4.27), and although they are not the strongest bounds of those produced by the size bias method, they still suce to demonstrate the improvements claimed. 79 Chapter 5 A multivariate concentration bound The purpose of this chapter is to show how size biased couplings can be used to obtain multivariate concentration inequalities where the coordinates are possibly depen- dent random variables. For a given random vector W = (W 1 ;W 2 ;:::;W k ) with non- negative coordinates having nite, nonzero expectations i = E[W i ], another vector W i = (W i 1 ;W i 2 ;:::;W i k ) is said to have W size biased distribution in direction i if E[W i f(W)] = i E[f(W i )] (5.1) for all functions for which these expectations exist. Note that for a univariate nonnegative random variable W , this simplies to standard size biasing. 5.1 Main result We start by xing some notations. In the following, for two vectors x; y2 R k , we will write x y = x 1 y 1 ; x 2 y 2 ;:::; x k y k for convenience. Also, we dene the partial ordering onR k by x y,x i y i ; fori = 1; 2;:::;k: 80 Accordingly, the order is dened by x y, y x, and the denitions for and are similar. Finally, for 2R k , t will stand for the transpose of andkk 2 is the l 2 norm of. Now, we are ready to state our main result. 
Theorem 5.1.1 Let W = (W 1 ;W 2 ;:::;W k ) be a random vector whereW i is nonnegative with mean i > 0 and variance 2 i 2 (0;1) for each i = 1; 2;:::;k, and suppose that the moment generating function of W exists everywhere. Assuming that we can nd couplings fW i g k i=1 of W, with W i having W size biased distribution in direction i and satisfying kW i Wk 2 K for some constant K > 0, we have for any t 0, P W t exp ktk 2 2 2K 1 (5.2) and P W t exp ktk 2 2 2(K 1 +K 2 ktk 2 ) (5.3) where K 1 = 2K (1) 2 , K 2 = K 2 (1) with (1) = min i=1;2;:::;k i , = ( 1 ; 2 ;:::; k ) and = ( 1 ; 2 ;:::; k ): Note that when the variances satisfy 1 =::: = k =, these bounds simplify to P(Wt) exp ktk 2 2 4 p kKkk 2 (5.4) and P(W t) exp ktk 2 2 4 p kKkk 2 + p kKktk 2 : (5.5) Proof of Theorem 5.1.1 will be given in Section 5.1.1. Here we note that the assump- tion on moment generating function can be relaxed to E[e t W ]<1 forkk 2=K, as can be checked easily from the proof. Indeed, Arratia and Baxendale showed recently for the univariate case that the existence of a bounded coupling for W assures the existence 81 of the mgf everywhere. See [2] for details. Although their result can be generalized to the multivariate case, we skip this for now as in applications the underlying random variables are almost always nite (so that mgf exists everywhere). Noting that the k = 1 case in Theorem 5.1.1 reduces to standard size biasing and replacing t by t=, we arrive at the following univariate corollary. Corollary 5.1.1 LetW be a nonnegative random variable with nite and nonzero mean, and assume that the moment generating function of W exists everywhere. If there exists a size biased coupling W s of W satisfyingjW s WjK for some K > 0, then for any t 0, we have P(Wt) exp t 2 4K and P(Wt) exp t 2 4K +Kt : (5.6) Remark 5.1.1 For the one dimensional case, the lower tail inequality in (5.6) again improves Ghosh and Goldstein's corresponding result (namely, inequality (1) in [28]) by removing the monotonicity condition. However, in both tails the constants are slightly worse than the ones in their theorem, but this is not too surprising as our main result is for the multivariate case. In the rest of this section we brie y review the discussion in [29] which gives a procedure to size bias a collection of nonnegative random variables in a given direction. More on construction of size biased couplings can be found in [37]. As mentioned earlier, for a random vector W = (W 1 ;W 2 ;:::;W k ) with nonnegative coordinates, a random variable W i is said to have W size bias distribution in direction i if E[W i f(W)] = i E[f(W i )] for all functions for which these expectations exist. It is well known that the denition just given is equivalent to the following one. 82 Denition 5.1.1 Let W = (W 1 ;W 2 ;:::;W k ) be a random vector whereW j 's have nite, nonzero expectations j = E[W j ] and let dF (x) be the joint distribution of W. For i2f1; 2;:::;kg, we say that W i = (W i 1 ;W i 2 ;:::;W i k ) has the W size bias distribution in direction i if W i has joint distribution dF i (x) = x i dF (x) i : (5.7) Note that in univariate case, (5.7) reduces todF (x) =xdF (x)=, the univariate size biased distribution. Also this latter denition gives insight for a way to construct size biased random variables. Following [29], by the factorization of dF (x), we have dF i (x) = x i dF (x) i = P(W2dxjW i =x) x i P(W i 2dx) i = P(W2dxjW i =x)P(W i i 2dx): where W i i has W i size biased distribution. 
Hence, to generate W i with distribution dF i , rst generate a variable W i i with W i size bias distribution. Then, when W i i = x, we generate the remaining variables according to their original conditional distribution given that i th coordinate takes on the value x: The construction just described combined with Theorem 5.1.1 can be used to prove concentration bounds for random vectors with independent coordinates. Example 5.1.1 Let W = (W 1 ;:::;W k ) be a random vector where W i 's are nonnegative, i.i.d. random variables with W i K a.s. for some K > 0, and assume that 0 < 2 = Var(W 1 ) <1. To obtain W i , we let W i i be on the same space with W i size biased distribution and also set W i j = W j for j6= i. Since coordinates of W are independent, W i = (W i 1 ;W i 2 ;:::;W i k ) has W size biased distribution in direction i: Also noting that 83 W i i K as support ofW i i is the same as ofW i , we obtainkW i Wk 2 K. Thus using Theorem 5.1.1, one can conclude that the lower tail inequality P W t exp 2 ktk 2 2 4K p k and the upper tail inequality P W t exp ktk 2 2 4K p k= 2 +Kktk 2 = hold for all t 0. 5.1.1 Proof of Theorem 5.1.1 Before we begin the proofs, we note the following inequality je y e x jjyxj e y +e x 2 (5.8) which follows from the following observation e y e x yx = Z 1 0 e ty+(1t)x dt Z 1 0 (te y + (1t)e x )dt = e y +e x 2 for all x6=y. Proof of Theorem 5.1.1. We rst prove the upper tail inequality. Let 0 = (0; 0;:::; 0)2R k withkk 2 < 2=K. Note that an application of (5.8) and Cauchy-Schwarz inequality gives for any i = 1;:::;k, E[e t W i ]E[e t W ]jE[e t W i ]E[e t W ]j E " j t (W i W)j(e t W i +e t W ) 2 # E " kk 2 kW i Wk 2 (e t W i +e t W ) 2 # Kkk 2 2 E[e t W i +e t W ]: 84 Changing sides, sincekk 2 < 2=K; we obtain E[e t W i ] 1 + Kkk 2 2 1 Kkk 2 2 E[e t W ]: (5.9) Letting m() =E[e t W ], using (5.9) and the size bias relation in (5.1) we have @m() @ i =E[W i e t W ] = i E[e t W i ] i 1 + Kkk 2 2 1 Kkk 2 2 E[e t W ] = i 2 +Kkk 2 2Kkk 2 m(): (5.10) Now, letting M() = E h exp t W i ; observe that we have M() = m exp t : Hence denoting @ i m() = @m() @ i = ; we obtain fork=k 2 < 2=K, @M() @ i = 1 i @ i m exp t i i m exp t i i 2 +Kk=k 2 2Kk=k 2 m exp t i i m exp t = i i M() 2 +Kk=k 2 2Kk=k 2 1 = i i M() 2Kk=k 2 2Kk=k 2 : This in particular gives fork=k 2 < 2=K, @ logM() @ i i i 2Kk=k 2 2Kk=k 2 : Now, using the mean value theorem, for all 02R k withk=k 2 ; log(M()) =r log(M(z)); 85 for some 0 z: Noting thatkz=k 2 k=k 2 < 2=K and using Cauchy-Schwarz inequality, we obtain logM() =r logM(z) k X i=1 2Kkz=k 2 2Kkz=k 2 i i i 2Kk=k 2 2Kk=k 2 2 kk 2 j (5.11) Next we observe that kk 2 < 1 K 2 =) 2 < 2 K : Thus ifkk 2 < 1=K 2 ; (5.11) yields logM() 2 2Kkk 2 2 = (1) (2Kkk 2 = (1) ) = K 1 kk 2 2 2(1K 2 kk 2 ) : Hence if t 0 andkk 2 < 1=K 2 , we obtain P W t P t W t t exp t t M() exp t t + K 1 kk 2 2 2(1K 2 kk 2 ) Using = t K 1 +K 2 ktk 2 0, and noting thatkk 2 < 1=K 2 , we obtain the upper tail inequality. Next we prove the left tail inequality given in (5.2). Letting 0 and using the size bias relation given in (5.1), we have @m() @ i =E[W i e t W ] = i E[e t W i ] = i E[e t (W i W) e t W ]: Using the inequality e x 1 +x, this yields @m @ i i E[(1 + t (W i W))e t W ]: (5.12) 86 By Cauchy-Schwarz inequality and thatkW i Wk 2 K, we have j t (W i W)jkk 2 kW i Wk 2 Kkk 2 which in particular gives t (W i W)Kkk 2 . 
Combining this observation with (5.12), we arrive at @m @ i i E[(1Kkk 2 )e t W ] = i (1Kkk 2 )m(): (5.13) Now, keeping the notations as in the upper tail case and using the estimate in (5.13), we get @M @ i = 1 i @ i m exp t i i m exp t = 1 i exp t @ i m i m 1 i exp t i 1K 2 m i m Manipulating the terms in the lower bound, this yields @M @ i = 1 i exp t i K 2 m = i i K 2 M() i i K (1) kk 2 M(): Now, using the mean value theorem, for 0, one can nd z 0 such that logM() =r logM(z): 87 Hence for a given 0, we have logM() =r logM(z) k X i=1 K i kk 2 (1) i i k X i=1 K i kk 2 (1) i i (5.14) where we used that i 0 for each i for the inequalities. Now, using (5.14) and an application of Cauchy-Schwarz inequality gives logM() Kkk 2 (1) k X i=1 i j i j i K (1) 2 kk 2 2 : which after exponentiation yields M() exp K (1) 2 kk 2 2 . Combining this last observation with Markov's inequality, we arrive at P W t =P t W t t exp t t + K (1) 2 kk 2 2 : Using = t 2 K (1) k k 2 0, result follows. 5.2 Applications 5.2.1 Local dependence and an application on local extremes The purpose of this section is to show that Theorem 5.1.1 can be used to obtain a concen- tration bound for a random vector W = (W 1 ;W 2 ;:::;W k ) with nonnegative coordinates that are functions of a subset of a collection of independent random variables. First, we recall the following lemma from [29] which provides the constructions for size biased couplings in given directions. Lemma 5.2.1 Let V = f1; 2;:::;kg and fC v ;v 2 Vg be a collection of independent random variables, and for each i 2 V, let V i V and W i = W i (C v ;v 2 V i ) be a nonnegative random variable with nonzero and nite mean. 88 i. [29] IffC i v ;v2V i g has distribution dF i (c v ;v2V i ) = W i (c v ;v2V i ) E[W i (C v ;v2V i )] dF (c v ;v2V i ) and is independent offC v ;v2Vg, letting W i j =W j (C i v ;v2V j \V i ;C u ;u2V j \V c i ); the collection W i =fW i j ;j2Vg has the W size biased distribution in direction i. ii. Further if we assume that W i M for each i, then we have kW i Wk 2 p bM where b = max i jfj :V j \V i 6=;gj: Proof: Proof of the fact that W i =fW i j ;j2Vg has the W size biased distribution in direction i can be found in [29]. For the second part, we note that by the construction in the statement, we have W j =W i j wheneverV j \V i =;: Thus, kW i Wk 2 = 0 @ k X j=1 jW i j W j j 2 1 A 1=2 (M 2 max i jfj :V j \V i 6=;gj) 1=2 = p bM: In conclusion, we note that in the case of local dependence as described above, we can use Theorem 5.1.1 withK = p bM. Ghosh and Goldstein provide two specic examples, sliding m window statistics and local extrema on a lattice, in [29] in the univariate sense. We now extend their work on the number of local maxima on a given graph to a multivariate setting. 89 5.2.1.1 Joint distribution of local extrema Consider an rregular graphG = (V;E) withjVj =n. LetfC g :g2Vg be a collection of i.i.d. random variables that represent the weights associated with vertices. Denoting the vertices adjacent to the vertex v2V byV v (1) and settingV v =fvg[ V v (1), we dene the functions X v and Y v as X v (C w :w2V v ) = 1(C v >C w ;w2V v (1)) and Y v (C w :w2V v ) = 1(C v <C w ;w2V v (1)): Then W 1 := X v2V X v and W 2 := X v2V Y v count the number of local maxima and the number of local minima, respectively. 
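Before turning to the coupling, a small simulation may help fix the scale of these counts. The sketch below is ours (function names and parameters are illustrative); it takes the cycle on n vertices as the simplest 2-regular example and estimates the means of W_1 and W_2, which, as noted in (5.15) below, both equal n/(r+1).

import random

def local_extreme_counts(weights, neighbors):
    # Count local maxima and local minima of the vertex weights on a graph
    n_max = sum(1 for v, w in enumerate(weights)
                if all(w > weights[u] for u in neighbors[v]))
    n_min = sum(1 for v, w in enumerate(weights)
                if all(w < weights[u] for u in neighbors[v]))
    return n_max, n_min

def cycle_neighbors(n):
    # The cycle on n vertices: a 2-regular graph
    return [[(v - 1) % n, (v + 1) % n] for v in range(n)]

if __name__ == "__main__":
    n, reps, r = 300, 2000, 2
    nbrs = cycle_neighbors(n)
    tot_max = tot_min = 0
    for _ in range(reps):
        w = [random.random() for _ in range(n)]
        w1, w2 = local_extreme_counts(w, nbrs)
        tot_max += w1
        tot_min += w2
    # Each vertex is a local maximum (or minimum) with probability 1/(r+1) = 1/3 here
    print(tot_max / reps, tot_min / reps, n / (r + 1))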
We are interested in the joint concentration of the random vector W = (W 1 ;W 2 ): (Note that the discussion here can be generalized easily to the random vector V = (V 1 ;:::;V r+1 ) where V i is the number of i th maxima inG, where a vertex v2V is an i th maxima if jfw2V v (1) :C w >C v gj =i1. The reason for focusing on just the maxima and minima is just the notational convenience.) Using Lemma 5.2.1 and the discussion in Section 5.1, to form W 1 , we rst choose a vertex v2V uniformly and independently of C g 's. Now if v is already a local maxima, then we do nothing and set W 1 = W. Otherwise, we interchange the weights of v and the vertex inV v (1) that has the maximum weight, which we call as u v . Denoting the resulting graph asG 1 v , and letting W 1 1 ;W 1 2 be the number of local maxima and minima inG 1 v , W 1 = (W 1 1 ;W 1 2 ) has the W size biased distribution in direction 1. Similarly, we can dene W 2 . 90 Next we observe that for i;j2f1; 2g, we have jW i j W j j 1 +r +r = 2r + 1; where the 1 counts v itself, the rst r is for the neighbors of v and the other r is for the neighbors of u v . Hence kW i Wk (2(2r + 1) 2 ) 1=2 = p 2(2r + 1); i = 1; 2: Also, denoting the means of W 1 ;W 2 by 1 ; 2 , we have 1 = 2 = n r + 1 (5.15) since any vertex inV v has the same chance of being a local maxima or a local minima. Finally, noting that the variances ofW 1 andW 2 are equal by the underlying symme- try, we see that the concentration bounds in (5.4) and (5.5) hold with K = p 2(2r + 1) and 1 ; 2 as in (5.15). Remark 5.2.1 A concentration bound for W 1 was previously obtained in [29] via size biasing for the caseV =f1;:::;ng modulo n inZ p andE =f(v;w) : P p i=1 jv i w i j = 1g. Their construction provides a bounded size biased couplingW 1 1 that satisesjW 1 1 W 1 j 2p 2 + 2p + 1: Note that this corresponds to a 2p-regular graph and the discussion above improves their bound tojW 1 1 W 1 j 2r + 1 = 4p + 1 which is a strict improvement for any p 2: 5.2.2 Pattern occurences Let 1 ; 2 ;:::; k 2 S m be k distinct permutations from S m ; the permutation group on m 3 elements. Also let be a uniformly random permutation inS n , wherenm and setV =f1; 2;:::;ng. DenotingV s =fs;s +1;:::;s +m 1g for s2V; where addition of elements ofV is modulo n, we say the pattern appears at location s2V if the values 91 f(v)g v2Vs andf(v)g v2V 1 are in the same relative order. Equivalently, the pattern appears ats if and only if( 1 (v)+s1);v2V 1 is an increasing sequence. Our purpose here is to prove concentration bounds using Theorem 5.1.1 for the multivariate random variable W = (W 1 ;W 2 ;:::;W k ) whereW i counts the number of times pattern i appears in . This problem was previously studied in [29] for the univariate case. For 2 S m , let I j () be the indicator that (1);:::;(mj) and (j + 1);:::;(m) are in the same relative order. Following the calculations in [29], for i = 1;:::;k, we have i =E[W i ] = n m! (5.16) and 2 i =Var(W i ) =n 0 @ 1 m! 1 2m 1 m! + 2 m1 X j=1 I j ( i ) (m +j)! 1 A (5.17) Now we are ready to give our main result. Theorem 5.2.1 With the setting as above, if W = (W 1 ;W 2 ;:::;W k ), then the conclu- sions of Theorem 5.1.1 hold with mean and variance as in (5.16) and (5.17), and K 1 = 2k(2m 1)m! m! 2m + 2 and K 2 = p k(2m 1)m! 2 p n(m! 
2m + 1) : Proof: Letting be a uniformly random permutation in S n , and X s; the indicator that appears at s, X s; ((v);v2V s ) = 1(( 1 (1) +s 1)<:::<( 1 (m) +s 1)); the sumW = P s2V X s; counts the number ofmelement-long segments of that have the same relative order as : 92 Now let s be in S m so that ( s (1) +s 1)<::::<( s (m) +s 1) and s 1 (v) = ( s ( 1 (vs + 1)) +s 1) if v 2V s and s 1 (v) = (v) if v = 2V s : In other words, s 1 is the permutation with the values (v);v 2V s reordered so that s 1 ( ) for 2V s are in the same relative order as 1 : Similarly we can dene s 2 ;:::; s k corresponding to 2 ;:::; k ; respectively. To obtain W i , the W size biased variate in directioni fori = 1; 2;:::;k, pick an index uniformly fromf1;:::;ng and set W i j = P s2V X s; j ( i ): Then W i = (W i 1 ;W i 2 ;:::;W i k ) for i = 1; 2;:::;k: The fact that we indeed obtain the desired size bias variates follows from results in [32]. Since both 1 and 2 agree with on all the indices leaving outV andjV j =m; we obtainjW i j W j j 2m 1 for i;j = 1; 2;:::;k: Hence,kW i Wk 2 p k(2m 1) for each i2f1; 2;:::;kg: Now recall from (5.17) that 2 i = n 1 m! 1 2m1 m! + 2 P m1 j=1 I j ( i ) (m+j)! for i = 1; 2;:::;k: Since 0I j 1, one can obtain a variance lower bound by setting I k = 0. In particular, this yields 2 (1) n m! 1 2m 1 m! : Since the constants K 1 and K 2 in Theorem 5.1.1 can be replaced by larger constants, result follows from simple computations. 93 Chapter 6 Use of zero biased couplings In this chapter, we rst discuss some preliminaries on zero bias distribution, including zero bias coupling constructions. Then we give our main concentration bounds in Section 6.2, and demonstrate these with an application to the Hoeding statistic in Section 6.3. This chapter is joint work with Larry Goldstein and is based on [33]. 6.1 Zero bias transformation Recall from [35] that for any mean zero random variable Y with positive, nite variance 2 , there exists a distribution for a random variable Y satisfying E[Yf(Y )] = 2 E[f 0 (Y )] (6.1) for all absolutely continuous functions f for which the expectation of either side exists. The variable Y is said to have the Y zero biased distribution. One can show that Y is absolutely continuous with the density function p (y) =E[Y 1(Y >y)]= 2 =E[Y 1(Y y)]= 2 : (6.2) By Stein's lemma [65] it is also known that a random variableY is normal if and only if Y = d Y : Thus, under a bounded zero biased coupling assumption, it is natural to assume that the tail behavior of Y will be close to the tails of normal distribution. Our main results in the next section justify this heuristics by providing concentration of measure inequalities with bounded zero biased couplings. For the use of zero bias couplings to produce bounds in normal approximations see [16] and the references therein. 94 We state two properties of the zero bias distribution that will be applied below. First, from (6.1), it is easy to see that whenever a6= 0, we have (aY ) = d aY : (6.3) Next, from [35], ifY is bounded by some constant, thenY is also bounded by the same constant, that is, jYjC implies jY jC: (6.4) In the rest of this section we discuss constructions of zero biased couplings of mean zero random variables. We start with the simplest case, whereY is a sum of independent, mean zero random variables each having a nite, positive variance. 
As formalized in the following proposition, the zero biased distribution of such a sum can be constructed by choosing a summand with probability proportional to its variance, and replacing it with a variable from its own zero bias distribution. Proposition 6.1.1 [35] Let X 1 ;:::;X n be independent, mean zero random variables with nite, positive variances 2 1 ;:::; 2 n and set Y = n X i=1 X i : Then with 2 = 2 1 + + 2 n and I a random index independent of X 1 ;:::;X n having distribution P(I =i) = 2 i 2 ; (6.5) and X i having the X i zero biased distribution, independent of X j ;j6= i and of I, the variable Y =YX I +X I has the Y zero biased distribution. 95 The next construction, due to [35], is based on the existence of -Stein pairs, that is, exchangeable variables (Y 0 ;Y 00 ) that satisfy the linearity condition E[Y 00 jY 0 ] = (1)Y 0 for some 2 (0; 1): Proposition 6.1.2 [35] Let (Y 0 ;Y 00 ) be a -Stein pair with Var(Y 0 ) = 2 2 (0;1) and distribution F (y 0 ;y 00 ). Then E[Y 0 ] = 0 and E(Y 0 Y 00 ) 2 = 2 2 ; and when (Y y ;Y z ) has distribution dF y (y 0 ;y 00 ) = (y 0 y 00 ) 2 E(Y 0 Y 00 ) 2 dF (y 0 ;y 00 ); (6.6) and UU(0; 1) is independent of (Y y ;Y z ), the variable Y =UY y + (1U)Y z (6.7) has the Y 0 zero biased distribution. In the typical uses of the coupling construction suggested by Proposition 6.1.2, includ- ing those in Section 6.3 below, given Y 0 ; we rst construct Y 00 close to Y 0 , such that (Y 0 ;Y 00 ) is a-Stein pair, and use it to form the dierenceY 0 Y 00 that appears in (6.6). Then, perhaps independently, one constructs the parts of Y y ;Y z that depend on the \square biased" term (Y 0 Y 00 ) 2 : Finally, one constructs the remaining parts of Y y ;Y z by using, as much as possible, the corresponding parts of Y 0 ;Y 00 to achieve variables Y y ;Y z that are close and have joint distribution dF y (y 0 ;y 00 ). 96 In many applications the construction just described results in S, a function of the variables which can be kept xed, and variables T 0 ;T y andT z , all on a joint space, such that Y 0 =S +T 0 ; Y y =S +T y ; and Y z =S +T z : (6.8) When T 0 ;T y and T z are all bounded by B, (6.7) gives jY Y 0 j =jUT y + (1U)T z T 0 jUjT y j + (1U)jT z j +jT 0 j 2B: (6.9) 6.2 Concentration bounds and discussion Our main results on zero biased couplings are summarized in the following theorem. Theorem 6.2.1 Let Y be a mean zero random variable with variance 2 2 (0;1) and moment generating function m(s) =E[e sY ], and let Y have the Y zero biased distribu- tion and be dened on the same space as Y . (a) If Y Y C for some C > 0 and m(s) exists for all s2 [0; 1=C), then for all t 0 P(Y t) exp t 2 2( 2 +Ct) : (6.10) The same upper bound holds for P(Y t) if Y Y C when m(s) exists for all s2 (1=C; 0]. IfjY YjC for some C > 0 and m(s) exists for all s2 [0; 2=C) then for all t 0 P(Y t) exp t 2 10 2 =3 +Ct ; (6.11) with the same upper bound holding for P(Y t) if m(s) exists in (2=C; 0]. (b) IfY Y C for some constantC > 0 andm(s) exists at = (logtlog logt)=C then for t>e P(Y t) exp t C logt log logt 2 C exp t 2C logt 2 2 C : (6.12) 97 If YY C then the same bound holds for the left tail P(Y t) when m() is nite. As regards part (a) and behavior inn, we remark that ifjY YjC andm(s) exists in [0; 2=C), then bound (6.10) is preferred over (6.11) forjtj< 4 2 =3C, a set increasing toR asymptotically in typical applications where the variance of Y increases to innity inn whileC remains constant. 
Regarding behavior int, part (b) of Theorem 6.2.1 shows that the respective asymptotic orders exp(t=(2C)) and exp(t=C) of bounds (6.10) and (6.11) as t!1, can be improved to exp(t logt=(2C)). As the bound (6.12) applies only whent>e it should be considered as a complementary result to the bounds in (a), that hold for all t 0. Remark 6.2.1 Theorem 5.1 of [16] states that when Y is a mean zero random variable with variance one for which there exists a coupling to Y such thatjY Yj C for some C then the Kolmogorov distance between Y and the standard normal distribution is bounded by 2:03C. Hence for small C the distribution of Y is close to normal, and in the limiting case where C takes the value zero inequality (6.10) is a valid bound when Y has the standard normal distribution. For such Y the inequality of [19] yields sup t0 exp(t 2 =2)P(Y t) =a; for a = 1=2, while (6.10) yields the same bound with a = 1. The constant a = 1 also results when bounding P(Y t) by inf s0 e st E[e sY ]. On the other hand, again by Theorem 5.1 of [16], when the distribution of Y is not close to normal there cannot exist a coupling ofY toY with a small value ofC, and the bounds may perform poorly in that the tail decay of Y may in fact be faster than what is indicated by (6.10). Example 6.2.1 (Independent sums) Though Theorem 6.2.1 may be invoked in the pres- ence of dependence, and for variables that may not be expressed as sums, we compare 98 the performance of our bound to comparable results in the literature whose application is limited to the case where Y is the sum of independent variables X 1 ;:::;X n with mean zero and variances 2 i = Var(X i )2 (0;1);i = 1;:::;n. Letting 2 = Var(Y ) and fol- lowing the discussion in Section 6.1, one can form a zero biased coupling of Y to Y by replacing theI th summandX I ofY by a random variableX I independent of the remain- ing summands which has theI th summand's zero bias distribution, where the indexI has distributionP(I =i) = 2 i = and is chosen independently of all else. WhenjX i jC for all i = 1;:::;n, by (6.4) this construction satises jY Yj =jX I X I j 2C; and as the moment generating function of Y exists everywhere in this case, using the bound, say, (6.10) we obtain P(Y t) exp t 2 2 2 +aCt : (6.13) with a = 4 (When the distribution of X i 's are also symmetric, it is easy to see that we can construct Y withjY Yj C improving the concentration bounds slightly). The closest classical inequality to (6.13) that holds under the conditions above is the one of Bernstein, see Corollary 2.11 of [11], which yields (6.13) with a = 2=3. Though the constant of 2=3 is superior to the 4, our results are more general as they provide concentration results in the presence of dependence. We also note that the rate for large t of the bound (6.12) is superior the rate in (6.13) for any a> 0. The tail bounds in (6.12) can also be considered as a version of Bennett's inequality for sums of independent random variables. In the same setting as for (6.13) where Y is a sum of independent variables satisfyingjX i j C, Bennett's inequality, which we discussed above in Section 2.1, provides the tail bound P(Y t)e t=C exp 2 C 2 1 + Ct 2 log 1 + Ct 2 ; t 0: (6.14) 99 We note that in the case of independent summands, Bennett's inequality will in general give better bounds than (6.12). 
Regarding other inequalities of note, when Y is the sum of n centered Bernoulli variables with success probability p2 (0; 1), the bounded dierences inequality yields the tail bound exp(2t 2 =n), which, upon taking C = 1, is improved upon by (6.13) in the range 0tn(1 4p(1p))=(2a). As the mean and variance pair (; 2 ) of a random variableY may in general take on any value inR(0;1), bounds forY expressed in terms of, such as the method of self bounding functions, see [55], and the use of size bias couplings as in previous sections, are not in general comparable to those of Theorem 6.2.1. In particular, in Remark 6.3.2, while handling an example involving dependent variables, we show how the bounds of Theorem 6.2.1, expressed in terms of the variance, may be superior to bounds expressed in terms of the mean. Our nal result in this section will be an extension of Theorem 6.2.1 to random vectors. First we recall from [36] the extension of the denition of zero bias distributions for multivariate random variables. For a square integrable mean zero random vector X = (X 1 ;:::;X n )2R n with ij =E[X i X j ] we say the collection of vectorsfX ij g i;j: ij 6=0 inR n has the X-zero bias distribution if for all dierentiable functions f i :R n !R with f ij =@ j f i , E " n X i=1 X i f i (X) # = X 1i;jn ij E f ij (X ij ) whenever the expectations exist. Here is the main result on random vectors, we skip its proof as it is very similar to the proof of Theorem 5.1.1. Theorem 6.2.2 Let X = (X 1 ;:::;X k ) t be a mean zero random vector having nite moment generating function m() = E[e t X ] for all 2 R k , and let 2 = 100 max i P k j=1 2 ij 1=2 . If there exists a family of random variablesfX ij g i;j: ij 6=0 having the X-zero biased distribution on the same space as X satisfyingkX ij Xk 2 K for each i;j2f1;:::;kg 2 , then for anyt 0, P(Xt) exp ktk 2 2 4 2 + 2Kktk 2 : 6.2.1 Proof of the main result Proof of Theorem 6.2.1: Letm(s) =E[e sY ] andm (s) =E[e sY ]. WhenY Y C for all s 0 then m (s) =E[e sY ] =E[e s(Y Y) e sY ]E[e Cs e sY ] =e Cs m(s): (6.15) In particular when m(s) is nite then so is m (s). Part (a). If m(s) exists in an open interval containing s we may interchange expec- tation and dierentiation at s to obtain m 0 (s) =E[Ye sY ] = 2 E[se sY ] = 2 sm (s); (6.16) where we have applied the zero bias relation (6.1) to yield the second equality. We rst prove (6.10). Starting with the well known inequality 1xe x , holding for all x 0, we obtain e x 1 1x for x2 [0; 1). Hence, for 2 (0; 1=C) and 0s we have m (s) =E[e sY ] =E[e s(Y Y) e sY ]e sC m(s) 1 1sC m(s): 101 Using the identity (6.16) to express m (s) in terms of m 0 (s) we obtain m 0 (s) 2 s 1sC m(s): Dividing both sides by m(s), integrating over [0;] and using that m(0) = 1 we obtain logm() = Z 0 m 0 (s) m(s) ds 2 1C Z 0 sds = 2 2 2(1C) ; and exponentiation yields m() exp 2 2 2(1C) : As (6.10) holds trivially for t = 0 consider t > 0 and apply Markov's inequality to obtain P(Y t) =P(e Y e t )e t m() exp t + 2 2 2(1C) : Setting =t=( 2 +Ct), and noting this value lies in the interval (0; 1=C), (6.10) follows. If now Y Y C then letting X =Y we see from (6.3) with a =1 that X =Y has theX-zero bias distribution, and applying (6.10) toX yields the claimed left tail inequality. Turning to (6.11), by the convexity of the exponential function we have e y e x yx = Z 1 0 e ty+(1t)x dt Z 1 0 (te y + (1t)e x )dt = e y +e x 2 for all x6=y, and hence e y e x jyxj(e y +e x ) 2 for all x and y. 
102 Hence, whenjY YjC, for all 2 (0; 2=C) and 0s, e sY e sY js(Y Y )j(e sY +e sY ) 2 Cs 2 (e sY +e sY ): Taking expectation yields m (s)m(s) Cs 2 (m (s) +m(s)); hence m (s) 1 +Cs=2 1Cs=2 m(s); and now relation (6.16) gives that m 0 (s) 2 s 1 +Cs=2 1Cs=2 m(s): Following steps similar to the ones above, we obtain m() exp 2 2 1C=2 for = 5=6 and all 2 (0; 2=C). As the result holds trivially for t = 0, x t> 0 and argue as before using Markov's inequality to obtain P(Y t) exp t + 2 2 1C=2 for all 2 (0; 2=C). Letting = 2t=(4 2 +Ct), and noting that this value lies in (0; 2=C), we obtain the asserted right tail inequality (6.11). ReplacingY byY as before now demonstrates the remaining claim. Part (b). For any s2 [0;) such that m() exists, by (6.15) we have m 0 (s) =E[Ye sY ] = 2 sE[e sY ] = 2 sm (s) 2 se Cs m(s); 103 so that (logm(s)) 0 2 se Cs : Integrating over [0;] and using that m(0) = 1 we obtain log(m()) 2 C 2 e C (C 1) + 1 and exponentiation yields m() exp 2 C 2 e C (C 1) + 1 : Applying Markov's inequality as before, P(Y t) exp t + 2 C 2 e C (C 1) + 1 : For t>e letting = (logt log logt)=C we obtain the rst claim of (6.12) by P(Y t) exp t C (logt log logt) + 2 C 2 t logt (logt log logt 1) + 1 exp t C logt log logt 2 C : The second claim follows by the inequality (logt)=2 log logt for allt> 1. The left tail bound follows as in (a). 104 6.3 Hoeding's permutation statistic Let be a random permutation in the symmetric group S n and A = (a ij ) 1i;jn an nn matrix with real entries. Hoeding's statistic, of the form Y = n X i=1 a i;(i) (6.17) arises in a number of contexts, permutation testing problems foremost among them; see [70] for a seminal reference, and [37] and references within. Hoeding's combinatorial central limit theorem [39] gives conditions under which Y , properly centered and scaled, has an asymptotic normal distribution. The rate of convergence of Y to its normal limit is well studied, see for instance [16] and references therein. Here we apply Theorem 6.2.1 to obtain concentration inequalities for Y . First we consider the case where in (6.17) is chosen uniformly over S n . Letting a i = 1 n n X j=1 a ij ; a j = 1 n n X i=1 a ij and a = 1 n 2 n X i;j=1 a ij ; straightforward calculations show that the mean A and variance 2 A of Y are given by A =na (6.18) and 2 A = 1 n 1 X 1i;jn (a 2 ij a 2 i a 2 j +a 2 ) = 1 n 1 X 1i;jn (a ij a i a j +a ) 2 : (6.19) Theorem 6.3.1 Let n 3 and A = (a ij ) 1i;jn be an array of real numbers such that, with A and 2 A given by (6.18) and (6.19) respectively, the variance 2 A is non-zero. Let 105 be a random permutation uniformly distributed over S n and Y be as in (6.17). Then the bounds of Theorem 6.2.1 hold with Y replaced by Y A , 2 by 2 A and C = 8 max 1i;jn ja ij a i j: (6.20) Proof: Relabel Y and as Y 0 and 0 , respectively, and center by replacing Y 0 by Y 0 A anda ij bya ij a i . With ij the transposition that interchangesi andj, that is, ij (i) =j; ij (j) =i and ij (k) =k for allk = 2fi;jg, it is shown in [32] that by sampling I;J uniformly fromf1;:::;ng over all distinct pairs, independently of 0 , that Y 0 and Y 00 = n X i=1 a i; 00 (j) form a -Stein pair with = 2=(n 1) when 00 = 0 I;J . 
Following [32], see also [16], and the outline given in Section 6.1, to constructY y and Y z , sampleI y ,J y ,K y ,L y , independently of the remaining variables, from the distribution p 2 (i;j;k;l) = [(a ik +a jl ) (a il +a jk )] 2 4n 2 (n 1) 2 ; and set y = 8 > > > > < > > > > : 1 (K y );J y if L y =(I y );K y 6=(J y ) 1 (L y );I y if L y 6=(I y );K y =(J y ) 1 (K y );I y 1 (L y );J y otherwise, and z = y I y ;J y. Then withI =fI y ; 1 (K y );J y ; 1 (L y )g the decomposition (6.8) holds with S = X i= 2I a i; 0 (i) ; T 0 = X i2I a i; 0 (i) ; T y = X i2I a i; y (i) and T z = X i2I a i; z (i) : 106 SinceI is a set of size at most 4, by (6.9) we nd that there exists a coupling of the centered variable Y 0 to a random variable having its zero bias distribution with the distance between the two bounded by C of (6.20). Theorem 6.2.1 now obtains. Remark 6.3.1 Lettingjjajj = max 1i;jn ja ij a i j, note that forn 3 the variance 2 A in (6.19) can be bounded as 2 A 2n 2 n 1 jjajj 2 3njjajj 2 : Thus, bound derived from (6.10), for instance, of Theorem 6.2.1 can be upper bounded in terms ofjjajj as P(jY A jt) 2 exp t 2 6njjajj 2 + 16jjajjt : Remark 6.3.2 In [13] the concentration bound P(jY A jt) 2 exp t 2 4 A + 2t (6.21) of the Hoeding statistic Y is shown under the additional condition that 0a i;j 1 for all i;j. Under this condition one can take C = 8 in (6.20), and Theorem 6.3.1 yields P(jY A jt) 2 exp t 2 2 2 A + 16t : (6.22) In particular, the bound (6.22) will be smaller than (6.21) whenever t (2 A 2 A )=7. Moreover, when a ij ; 1 i;j n are themselves independent random variables with lawL(U) having support in [0; 1], then E[ 2 A ] = (n 1)Var(U) (n 1)E[U 2 ]<nE[U] =E[ A ]; where the rst equality follows by a calculation using the rst expression for the variance in (6.19), and applying 0U 1 to yield E[U 2 ]E[U] for use in the strict inequality. 107 Hence if the array entries behave as independent and i.i.d. random variables on [0; 1], (6.22) will be asymptotically preferred to (6.21) everywhere. Finally we note that the bound (6.12) further improves on (6.22), as regards its asymptotic order in t. Remark 6.3.3 In the case where is uniform, when the rows of A are monotone, or more generally, when they have the same relative order, the summand variables fa i(i) g 1in are negatively associated and the Bernstein and Bennett inequalities hold, (6.13) with a = 2=3 and (6.14), respectively, thus improving on the bound of Theorem 6.3.1 in this special case. However, for both the uniform and constant conjugacy class distributions considered below it is easy to show that negative association does not hold in general (For example, for the uniform case, we may let A be the identity matrix so that Y corresponds to the number of xed points in . It is well known that in this case Y is indeed a sum of positively associated random variables). We now consider Hoeding's statistic Y = n X i=1 a i(i) when the distribution of is constant over cycle type. This framework includes two special cases of note, one where is a uniformly chosen xed point free involution, considered by [38], having applications to permutation testing in certain matched pair experiments, and the other where has the uniform distribution over permutations with a single cycle, considered by [46], under the additional restriction thata ij =b i c j . Bounds on the error of the normal approximation to Y when the distribution of is constant over cycle type were derived in [32]. We start by recalling some relevant denitions. 
For q = 1;:::;n letting c q () be the number of q cycles of , the vector c() = (c 1 ();:::;c n ()) 108 is the cycle type of . For instance, the permutation = ((1; 3; 7; 5); (2; 6; 4)) in S 7 consists of one 4 cycle in which 1! 3! 7! 5! 1, and one 3 cycle where 2! 6! 4! 2, and hence has cycle type (0; 0; 1; 1). We say the permutations and are of the same cycle type ifc() =c(), and that a distributionP onS n is constant on cycle type if P () depends only on c(); that is P() =P() whenever c() =c(): Clearly a vector c = (c 1 ;:::;c n ) of nonnegative integers is a cycle type of some per- mutation if and only if P i ic i =n. Given such a vectorc, a special case of a distribution constant on cycle type is one uniformly distributed over all permutations having cycle typec, denotedU(c). The situations where is uniformly chosen from the set of all xed point free involutions, and chosen uniformly from all permutations having a single cycle, are both distributions of typeU(c), the rst with c = (0;n=2; 0;:::; 0) for even n and the second with c = (0; 0;:::; 0; 1): In general we consider distributionsU(c) for which c 1 (1) = 0, that is, where has no xed points, as is true for the two special cases of most interest. Noting that under this condition no expression of the form a ii appears in the sum (6.17), let a io = 1 n 2 n X j:j6=i a ij and a oo = 1 (n 1)(n 2) X i6=j a ij : Under the symmetry condition a ij =a ji for all i6=j, from [16], the mean =E[Y ] and variance 2 c = Var(Y ) for n 4 are given by = (n 2)a oo and 2 c = 1 n 1 + 2c 2 n(n 3) X i6=j (a ij 2a io +a oo ) 2 : (6.23) 109 When n is even and c is the cycle type of a xed point free involution, then c 2 = n=2 and the variance in (6.23) specializes to 2 c = 2(n 2) (n 1)(n 3) X i6=j (a ij 2a io +a oo ) 2 : (6.24) Whenc is the cycle type of permutations that have no two cycles, such as is the case for n 4 and permutations having only one long cycle, the variance in (6.23) becomes 2 c = 1 n 1 X i6=j (a ij 2a io +a oo ) 2 : (6.25) For an array (a ij ) i6=j let a o = max i6=j ja ij 2a io +a oo j: (6.26) Theorem 6.3.2 Letn 5 and (a ij ) n i;j=1 be an array of real numbers satisfyinga ij =a ji , and for some cycle type c without xed points let 2 S n have the uniform distribution U(c). Then the bounds of Theorem 6.2.1 hold with Y replaced by Y and 2 replaced by 2 c , where and 2 c are given by (6.23), and C replaced by 40a o with a o of (6.26). In the special case when n is even and is uniformly distributed over involutions without xed points, the same bounds hold with 2 given by 2 c of (6.24), and C = 24a o . Proof: When has theU(c) distribution, the constructions of [32] and [16], similar to those used in the proof of Theorem 6.3.1, provide a zero biased coupling for Y that satises j(Y) (Y)j 40a o ; and further show that the constant 40 can be replaced by 24 in the case of xed point free involutions. Theorem 6.2.1 now obtains. 110 Chapter 7 Concluding remarks and future directions In this chapter, we conclude our discussion by rst summarizing our contributions to the literature in previous chapters, and then list some problems that are still open to pursue. 
Our main contributions in this dissertation are: improving the concentration bounds via size biased couplings of [28], providing new applications for concentration inequalities with size biased couplings, exploring connections between size biasing and log-concavity, forming a general framework to construct bounded size biased couplings for counts based on multivariate occupancy models with log-concave marginals, making the connections to Stein's method more transparent, showing that the use of couplings can also lead to concentration inequalities for multivariate random variables, proving that other coupling approaches besides exchangeable pairs and size biasing can be used to obtain concentration bounds, observing that couplings from Stein's method can also be used for correlation inequalities, providing some extra information about the underlying random vari- ables besides their concentration and distributional approximations. 111 We end the discussion by pointing out some future directions which we hope to follow in the following period. 1. The rst problem concerns about forming a general framework for concentration bounds with the coupling approach. Chen and Roellin [17] introduced a class of couplings, known as Stein or G couplings, that generalize various constructions of Stein's method including the size biased couplings and exchangeable pairs. Their approach provides one possible direction for us to form, at least, a partial frame- work. More specically, for a coupling (W;W 0 ;G) of square integrable random variables, we call (W;W 0 ;G) a Stein coupling if E[Gf(W 0 )Gf(W )] =E[Wf(W )]: In [17], they prove that error bounds in normal approximation problems can be obtained with Stein couplings, generalizing various previous eorts in this direction. Our expectation is that Stein couplings will also reveal concentration bounds for random vectors (One can dene multivariate Stein couplings in a straightforward way) and for eigenvalue statistics of random matrices. 2. Can we use couplings other than the ones used in this dissertation in order to prove concentration bounds? As discussed in detail above, this will be especially interest- ing for couplings which are related to xed points of distributional transformations, as these will provide concentration bounds that take the underlying limiting dis- tribution into account, just like Stein's method for distributional approximations. 3. Is it possible to use couplings of Stein's method to analyze properties of random variables other than their concentration and limiting distribution? One example in this direction was discussed in Section 3.4 where we showed that size biased couplings are also useful to obtain correlation inequalities. It would be interesting 112 to see the connections of couplings to other problems, and we intend to start our work here by focusing on entropy estimation. 4. Where does the method stand among other techniques? Above we compared our results with several other techniques and showed that our approach provides improvements in certain cases. However, it would be more interesting to show that results from other techniques in the literature can be obtained via a coupling approach. For instance, it would be very interesting to know if one can prove Talagrand's inequality via a suitable coupling construction of Stein's method. Exploring the connections to concentration inequalities via self-bounding functions seems one possible way to begin this study. 5. 
Negative association of random variables, as discussed above, can be used to obtain concentration bounds for several problems. It was shown above that size biasing can do as well as negative association for certain occupancy statistics, and we are wondering whether one can extend these discussions to other statistics that can be analyzed via negative dependence. Similar questions can also be formulated for positively associated random variables. For example, in Section 3.4, we showed that size biased couplings can be used to understand certain positively associated sums. We are curious if these ideas can be generalized, and also whether couplings can provide a framework to obtain upper tail inequalities for positively associated sums? This would be interesting as there is almost nothing in the literature for positively associated random variables in this direction. 113 Bibliography [1] Alon, N. and Spencer, J. H., The probabilistic method, Third edition, John Wiley & Sons, Inc., Hoboken, NJ, 2008. [2] Arratia, R. and Baxendale, P., Bounded size bias coupling: a Gamma function bound, and universal Dickman-function behavior, preprint, http://arxiv.org/abs/1306.0157. [3] Arratia, R., Goldstein, L., and Gordon, L., Two Moments Suce for Poisson Approx- imations: The Chen-Stein Method, The Annals of Probability, Vol. 17, No. 1., pp. 9-25, 1989. [4] Arratia, R., Goldstein, L. and Kochman, F., Size bias for one and all, preprint, arxiv.org/abs/1308.2729. [5] Barbour, A. D., Karonski, M. and Rucinski, A., A central limit theorem for decom- posable random variables with applications to random graphs, J. Combin. Theory Ser. B, 47, 125-145, 1989. [6] Barbour, A., Janson, S. and Holst, L., Poisson Approximation, Oxford University Press, 1992. [7] Bartro, J. and Goldstein, L., A Berry-Esseen bound for the uniform multinomial occupancy model, Electronic Journal of Probability, 18, pp. 1-29, 2013. [8] Bartro, J., Goldstein, L., and I slak, U., Bounded size biased couplings, log-concave distributions and concentration of measure for occupancy models, 2014, preprint. [9] Bennett, G., Probability inequalities for sums of independent random variables, J. Amer. Statist. Assoc., 57 33-45, 1962. [10] Boucheron, S., Lugosi, G. and Massart, P., Concentration inequalities using the entropy method, Ann. Probab., 31, No. 3, 1583-1614, 2003. [11] Boucheron, S., Lugosi, G. and Massart, P., Concentration Inequalities: A Nonasymp- totic Theory of Independence, Oxford University Press, 2013. 114 [12] Chatterjee, S., Concentration inequalities with exchangeable pairs, Ph.D. thesis. Stanford University, 2005. [13] Chatterjee, S., Stein's method for concentration inequalities, Probab. Theory Related Fields, no. 1-2, 305-321, 2007. [14] Chatterjee, S. and Partha S. Dey, Applications of Stein's method for concentration inequalities, Ann. Probab., 38 no. 6, 2443-2485, 2010. [15] Chen, L.H.Y., On the convergence of Poisson binomial to Poisson distributions, Ann. Probab., 2, 178-180, 1974. [16] Chen, L.H.Y., Goldstein, L., Shao, Q. M., Normal approximation by Stein's method, Springer, Berlin, Heidelberg, 2011. [17] Chen, L.H.Y. and Roellin, A., Stein couplings for normal approximation, 2010, preprint. [18] Cherno, H., A Measure of Asymptotic Eciency for Tests of a Hypothesis Based on the sum of Observations, Annals of Mathematical Statistics, 23 (4): 493-507, 1952. [19] Chu, J. T., On bounds for the normal integral, Biometrika, 42, 263-265, 1955. [20] Chung, F. 
and Lu, L., Complex graphs and networks, American Mathematical Soci- ety, Providence, RI, 2006. [21] Joag-Dev, K. and Proschan, F., Negative association of random variables, with appli- cations, Ann. Statist. 11, no. 1, 286-295, 1983. [22] Dubhashi, D. P. and Panconesi, A., Concentration of measure for the analysis of randomized algorithms, Cambridge University Press, 2009. [23] Dubhashi, D. and Ranjan, D., Balls and bins: a study in negative dependence, Ran- dom Structures and Algorithms, 13, no. 2, 99-124, 1998. [24] Durrett, R., Probability: theory and examples, Fourth edition. Cambridge Series in Statistical and Probabilistic Mathematics, Cambridge University Press, Cambridge, 2010. [25] Efron, B. and Stein, C., The Jacknife Estimate of Variance, Annals of Statistics, 9:586-96, 1981. [26] Ehm, W., Binomial approximation to the Poisson binomial distribution, Statistics & Probability Letters, 11, 7{16, 1991. [27] Englund, G., A remainder term estimate for the normal approximation in classical occupancy, Ann. Probab., 9, pp. 684-692, 1981. [28] Ghosh, S. and Goldstein, L., Concentration of measures via size biased couplings, Probability Theory and Related Fields, 2011. 115 [29] Ghosh, S. and Goldstein, L., Applications of size biased couplings for concentration of measures, Electronic Communications in Probability, 2011. [30] Ghosh, S., Goldstein, L. and Raic, M., Concentration of measure for the number of isolated vertices in the Erd} os-R enyi random graph by size bias couplings, Statistics and Probability Letters, 2011. [31] Ghosh, S. and I slak, U., Multivariate concentration inequalities with size biased couplings, 2013, preprint, http://arxiv.org/abs/1310.5448. [32] Goldstein, L., Berry Esseen bounds for combinatorial central limit theorems and Pattern Occurrences, using Zero and Size Biasing, Jour. of Appl. Probab, 42, pp. 661-683, 2005.. [33] Goldstein, L. and I slak, U., Concentration inequalities via zero bias couplings, Statis- tics and Probability Letters, vol 86, pp. 17-23, 2014. [34] Goldstein, L. and Penrose, M., Normal approximation for coverage models over binomial point processes, Annals of Applied Probability, 20, pp. 696-721, 2010. [35] Goldstein, L. and Reinert, G., Stein's method and the zero bias transformation with application to simple random sampling, Ann. Appl. Probab., 7, pp. 935-952. arXiv:math.PR/0510619, 1997. [36] Goldstein, L., Reinert, G., Zero Biasing in One and Higher Dimensions, and Appli- cations, Stein's method and applications, Institute for Mathematical Sciences Lecture Notes Series No. 5, World Scientic Press, Singapore, pp. 1-18, 2005. [37] Goldstein, L. and Rinott, Y., Multivariate normal approximations by Stein's method and size bias couplings, J. Appl. Probab., 33(1), 1-17, 1996. [38] Goldstein, L. and Rinott, Y., A permutation test for matching and its asymptotic distribution, Metron, 61, pp. 375-388, 2003. [39] Hoeding, W., A combinatorial central limit theorem, Ann. Math. Stat., 22, pp. 558-566, 1951. [40] Hoeding, W., Probability inequalities for sums of bounded random variables, J. Amer. Statist. Assoc., 58, pp. 13-30, 1963. [41] Janson, S. and Nowicki, K., The asymptotic distributions of generalized U-statistics with applications to random graphs, Probab. Theory Related Fields, 90, 341-375, 1991. [42] Janson, S., New versions of Suen's correlation inequality, Proceedings of the Eighth International Conference \Random Structures and Algorithms" (Poznan, 1997), Ran- dom Structures Algorithms 13, no. 3-4, 467-483, 1998. 
[43] I slak, U., Asymptotic behavior of random sums of locally dependent random vari- ables, 2013, preprint, http://arxiv.org/abs/1303.2386. 116 [44] Karonski, M. and Rucinski, A., Poisson convergence and semi-induced properties of random graphs, Math. Proc. Cambridge Philos. Soc., 101, 291-300, 1987. [45] Keilson, J. and Gerber, H., Some results for discrete unimodality, Journal of the American Statistical Association, 66, pp. 386-389, 1971. [46] Kolchin, V. F. and Chistyakov, V. P., On a combinatorial limit theorem, Theory of Probability and Its Applications, 18, 728-739, 1973. [47] Kolchin, V. F., Sevast 0 yanov, B. A. and Chistyakov, V. P., Random allocations, V. H. Winston & Sons, Washington, D.C. Translated from the Russian, Translation edited by A. V. Balakrishnan, Scripta Series in Mathematics, 1978. [48] Ledoux, M., The concentration of measure phenomenon, Amer. Math. Soc., Provi- dence, RI, 2001. [49] Lin, K. and Reinert, G., Joint vertex degrees in the inhomogeneous random graph model g(n;p ij ), Adv. in Appl. Probab., 44, pp. 139-165, 2012. [50] Lindvall, T., Lectures on the coupling method, Dover Publications, 2002. [51] Mackey, L., Jordan, M. I., Chen, R. Y., Farrell, B. and Tropp, J. A., Probability inequalities for random matrices via the method of exchangeable pairs, Annals of Probability, arXiv:1201.6002, 2012. [52] Matousek, J., Lectures on discrete geometry, Graduate Texts in Mathematics, 212, Springer-Verlag, New York, 2002. [53] McDiarmid, C., On the method of bounded dierences, Surveys in combinatorics (J. Siemons, ed.), London Math. Soc. Lecture Notes Ser., 141, 148-188, 1989. [54] McDiarmid, C., Concentration, Algorithms Combin., 16, 1998. [55] McDiarmid, C. and Reed, B., Concentration for self-bounding functions and an inequality of Talagrand, Random Structures & Algorithms, (206), 29, pp. 549-557, 2006. [56] Penrose, M., Normal approximation for isolated balls in an urn allocation model, Electron. J. Probab, 14, pp. 2156-2181, 2009. [57] Pike, J. and Ren, H., Stein's method and the Laplace distribution, preprint, arxiv.org/abs/1210.5775. [58] Pek oz, E., Roellin, A. and Ross, N., Total variation error bounds for geometric approximation, Bernoulli 19 (2), 610-632, 2013. [59] Pek oz, E. and Roellin, A., New rates for exponential approximation and the theorems of R enyi and Yaglom, Ann. Probab. 39, no. 2, 587-608, [13], 2011. 117 [60] Raginsky, M. and Sason, I., Concentration of Measure Inequalities in Information Theory, Communications and Coding, Foundations and Trends in Communications and Information Theory, vol. 10, no 1-2, 2013. [61] Robbins, H. E., Estimating the Total Probability of the Unobserved Outcomes of an Experiment, The Annals of Mathematical Statistics, 39, 256-257, 1968. [62] Ross, N. F., Fundamentals of Stein's method, Probability Surveys, 8, 210-293 (elec- tronic), 2011. [63] Steele, J. M.,An Efron-Stein inequality for nonsymmetric statistics, Ann. Statist., 14, 753-758, 1986. [64] Steele, M. J., Probability theory and combinatorial optimization, CBMS-NSF Regional Conference Series in Applied Mathematics, 69, Society for Industrial and Applied Mathematics (SIAM), Philadelphia, PA, 1997. [65] Stein, C., A bound for the error in the normal approximation to the distribution of a sum of dependent random variables, Proc. Sixth Berkeley Symp. Math. Statist. Probab., 2, 583-602, Univ. California Press, Berkeley, 1972. [66] Stein, C., Approximate Computation of Expectations, Institute of Mathematical Statistics, Hayward, CA, 1986. [67] Suen, W. C. 
S., A correlation inequality and a Poisson limit theorem for nonoverlapping balanced subgraphs of a random graph, Random Structures Algorithms, 1, no. 2, 231-242, 1990. [68] Talagrand, M., A new look at independence, Ann. Probab., 24, 1-34, 1996. [69] Thisted, R. and Efron, B., Did Shakespeare Write a Newly-discovered Poem?, Biometrika, 74, 445-455, 1987. [70] Wald, A. and Wolfowitz, J., Statistical tests based on permutations of the observations, Ann. Math. Stat., 15, 358-372, 1944. [71] Weiss, I., Limiting distributions in some occupancy problems, The Annals of Mathematical Statistics, 29, pp. 878-884, 1958.
Abstract
Stein's method is a technique in probability theory introduced by Charles Stein in 1972 that enables one to obtain convergence rates in distributional approximations. The method makes use of characterizing differential equations of distributions and various coupling constructions to get error bounds with respect to certain probability metrics. In 2007, Sourav Chatterjee showed that one particular coupling from Stein's method, exchangeable pairs, can be used to prove concentration of measure inequalities in a wide range of problems in dependent settings. Later in 2011, his approach also found use by Subhankar Ghosh and Larry Goldstein where they proved new concentration inequalities via size biased couplings which provide another useful framework in Stein's method. This dissertation contributes to the concentration of measure literature by following the latter of these two works and improving their results in several ways. Besides, we introduce two new directions by making use of zero biasing and equilibrium couplings. Further, we provide multivariate extensions of our results, obtain correlation inequalities by using couplings and make the connections to Stein's method more transparent by an understanding of fixed points of distributional transformations. All of our results are illustrated through several nontrivial examples mainly on random graphs, random permutations and occupancy models.
Linked assets
University of Southern California Dissertations and Theses
Conceptually similar
Stein couplings for Berry-Esseen bounds and concentration inequalities
Limit theorems for three random discrete structures via Stein's method
Cycle structures of permutations with restricted positions
Stein's method via approximate zero biasing and positive association with applications to combinatorial central limit theorem and statistical physics
Exchangeable pairs in Stein's method of distributional approximation
Finite sample bounds in group sequential analysis via Stein's method
Applications of Stein's method on statistics of random graphs
Obtaining breath alcohol concentration from transdermal alcohol concentration using Bayesian approaches
High order correlations in sampling and concentration bounds via size biasing
Probabilistic numerical methods for fully nonlinear PDEs and related topics
Delta Method confidence bands for parameter-dependent impulse response functions, convolutions, and deconvolutions arising from evolution systems described by…
Probabilistic divide-and-conquer -- a new method of exact simulation -- and lower bound expansions for random Bernoulli matrices via novel integer partitions
Stein's method and its applications in strong embeddings and Dickman approximations
M-estimation and non-parametric estimation of a random diffusion equation-based population model for the transdermal transport of ethanol: deconvolution and uncertainty quantification
A Bayesian approach for estimating breath from transdermal alcohol concentration
On the depinning transition of the directed polymer in a random environment with a defect line
Population modeling and Bayesian estimation for the deconvolution of blood alcohol concentration from transdermal alcohol biosensor data
Path dependent partial differential equations and related topics
Prohorov Metric-Based Nonparametric Estimation of the Distribution of Random Parameters in Abstract Parabolic Systems with Application to the Transdermal Transport of Alcohol
An abstract hyperbolic population model for the transdermal transport of ethanol in humans: estimating the distribution of random parameters and the deconvolution of breath alcohol concentration
Asset Metadata
Creator
Işlak, Ümit
(author)
Core Title
Concentration inequalities with bounded couplings
School
College of Letters, Arts and Sciences
Degree
Doctor of Philosophy
Degree Program
Applied Mathematics
Publication Date
06/16/2014
Defense Date
05/07/2014
Publisher
University of Southern California (original); University of Southern California. Libraries (digital)
Tag
concentration inequalities, correlation inequalities, couplings, OAI-PMH Harvest, Stein's method
Format
application/pdf (imt)
Language
English
Contributor
Electronically uploaded by the author (provenance)
Advisor
Fulman, Jason (committee chair), Goldstein, Larry (committee member), Kocer, Yilmaz (committee member)
Creator Email
islak@usc.edu, umitislak@gmail.com
Permanent Link (DOI)
https://doi.org/10.25549/usctheses-c3-420563
Unique identifier
UC11285962
Identifier
etd-IslakUmitI-2549.pdf (filename), usctheses-c3-420563 (legacy record id)
Legacy Identifier
etd-IslakUmitI-2549.pdf
Dmrecord
420563
Document Type
Dissertation
Rights
Işlak, Ümit; Islak, Umit
Type
texts
Source
University of Southern California (contributing entity), University of Southern California Dissertations and Theses (collection)
Access Conditions
The author retains rights to his/her dissertation, thesis or other graduate work according to U.S. copyright law. Electronic access is being provided by the USC Libraries in agreement with the a...
Repository Name
University of Southern California Digital Library
Repository Location
USC Digital Library, University of Southern California, University Park Campus MC 2810, 3434 South Grand Avenue, 2nd Floor, Los Angeles, California 90089-2810, USA