STEIN'S METHOD AND ITS APPLICATIONS IN STRONG EMBEDDINGS AND DICKMAN APPROXIMATIONS

by Chinmoy Bhattacharjee

A Dissertation Presented to the FACULTY OF THE USC GRADUATE SCHOOL, UNIVERSITY OF SOUTHERN CALIFORNIA, in Partial Fulfillment of the Requirements for the Degree DOCTOR OF PHILOSOPHY (APPLIED MATHEMATICS), May 2018

Copyright 2018 Chinmoy Bhattacharjee

How merciful that the turtle doesn't see the little bird's effortless flight. ~ Abbas Kiarostami

Acknowledgments

I would first and foremost like to thank my parents for being extremely supportive of all my pursuits and for inspiring me to follow my dreams. Their outlook on life has greatly shaped the person I am today and continues to inspire me in all my creative endeavors. I would like to thank my brother Tanmoy, whose great support and guidance have been instrumental in shaping my career and my life as a whole.

My sincere gratitude goes to my advisor, Professor Larry Goldstein, for his overwhelming support. Larry has been an excellent mentor and a father figure to me. His support has been crucial not only in realizing this work but also in my overall development as a mathematician. I cannot thank him enough for his great patience, care and diligence. Without him, this work simply would not exist. I would also like to thank Professors Peter Baxendale and Jinchi Lv for agreeing to read this dissertation. My sincere thanks also go to Professors Jason Fulman and Kenneth Alexander for many helpful discussions and their continued support. I would like to thank all the faculty members, my fellow graduate students and the staff in the department for creating a great environment where I felt at home. Last but by no means least, I would like to thank all my closest friends, whose presence and constant supply of courage made life a joy.

Contents

Acknowledgments
Abstract
Notations
Chapter 1: Introduction
  1 Summary of Thesis
Chapter 2: Review of Existing Literature
  1 Strong embedding and KMT approximation
  2 Dickman approximation
  3 Stein's method
    3.1 Independent sum
    3.2 Exchangeable pair
    3.3 Zero bias
    3.4 Size bias
    3.5 Stein coefficients and second order Poincaré inequalities
Part I: On Strong Embeddings by Stein's Method
Chapter 3: Strong Embeddings
  1 Introduction and main results
  2 Bounds for couplings to Gaussian variables
  3 The induction step
  4 Proof of Theorem 3.1.1
Part II: Dickman Approximation and its Applications
Chapter 4: Dickman Approximation in Summations, Probabilistic Number Theory
  1 Introduction
  2 Dickman approximation of sums
    2.1 Weighted Bernoulli and Poisson sums
    2.2 Dickman approximation in number theory
  3 Smoothness bounds for $f(\cdot)$
Chapter 5: Perpetuities and the $\mathcal{D}_{\theta,s}$ Family, Simulations and Distributional Bounds
  1 Introduction
  2 Existence and uniqueness of the $\mathcal{D}_{\theta,s}$ distribution
  3 Smoothness bounds for $g(\cdot)$
  4 Distributional bounds for $\mathcal{D}_{\theta,s}$ approximation and simulations
  5 Examples
Bibliography
Appendices
  Chapter A: Proof of Theorem 3.1.1 using Lemma 3.4.1
  Chapter B: Dickman approximation proofs

Abstract

The aim of this dissertation is twofold. First, in Part I, we apply Stein's method in the context of embedding problems to extend Chatterjee's novel approach to more general random walks. Specifically, we provide $\log n$ rates for the coupling of partial sums of independent variables to a Brownian motion when the steps are independent and identically distributed (i.i.d.) with mean zero, variance one, vanishing third moment, and values in a finite set not containing zero.

In Part II of the dissertation, we consider Dickman approximation using Stein's method. The generalized Dickman distribution $\mathcal{D}_\theta$ with parameter $\theta>0$ is the unique solution of the distributional equality $W =_d W^*$, where
$$W^* =_d U^{1/\theta}(W+1), \qquad (0.1)$$
with $W$ non-negative, $U \sim \mathcal{U}[0,1]$, uniform over the interval $[0,1]$ and independent of $W$, and $=_d$ denoting equality in distribution. Members of this family appear in number theory, stochastic geometry, perpetuities and the study of algorithms. We obtain bounds in Wasserstein-type distances between $\mathcal{D}_\theta$ and certain randomly weighted sums of Bernoulli and Poisson random variables. We also give simple proofs and provide rates for the Dickman convergence of certain weighted sums arising in probabilistic number theory. In addition, we broaden the class of generalized Dickman distributions by studying the fixed points of the transformation
$$s(W^*) =_d U^{1/\theta}s(W+1),$$
generalizing (0.1), which allows the introduction of non-identity utility functions $s(\cdot)$ into Vervaat perpetuities. We obtain distributional bounds for recursive methods that can be used to simulate from this family.

Notations

Throughout this work, $P(\cdot)$ denotes the probability of an event, and $E(X)$ and $\mathrm{Var}(X)$ denote the expectation and variance of a random variable $X$. Below we list some other notations that we will often use without necessarily defining them again.

$\mathbb{R}$: Set of real numbers.

$\mathbb{N}$: Set of natural numbers.

$=_d$: We write $X =_d Y$ if the random variables $X$ and $Y$ have the same probability distribution.

$\mathcal{L}(\cdot)$: We write $\mathcal{L}(X)$ to denote the law, or distribution, of the random variable $X$.

$\sim$: For a random variable $X$ and a distribution $F(\cdot)$, we write $X \sim F$ if $X$ is distributed as $F(\cdot)$.

$[\cdot]$: Floor function, i.e. $[x]$ denotes the largest integer less than or equal to $x$.

$\mathbf{1}(\cdot)$: Indicator function.

$O$: For two functions $f(\cdot)$ and $g(\cdot)$, we say $f(x)=O(g(x))$ as $x \to \infty$ if there exist a positive constant $M$ and a real number $x_0$ such that $|f(x)| \le M|g(x)|$ for all $x \ge x_0$.

$o$: For two functions $f(\cdot)$ and $g(\cdot)$, we say $f(x)=o(g(x))$ as $x \to \infty$ if for every $\epsilon>0$ there exists a constant $M$ such that $|f(x)| \le \epsilon|g(x)|$ for all $x \ge M$.

$O_p$: A sequence of random variables satisfies $X_n = O_p(a_n)$ if for every $\epsilon>0$ there exists $M>0$ such that $P(|X_n/a_n|>M)<\epsilon$ for all $n$.

$\approx$: Approximately equal.
$C_c$: Set of continuous functions with compact support.

$C^k$: Set of $k$ times continuously differentiable functions.

$L^k$: Given a measure space $(X,\Sigma,\mu)$, $L^k(\mu)$ denotes the set of all real-valued functions $f(\cdot)$ defined on $X$ with $\int_X |f|^k \, d\mu < \infty$.

$\mathcal{U}[a,b]$: Uniform distribution on the interval $[a,b]$ for $-\infty < a < b < \infty$.

$\|f\|_\infty$: For a real-valued measurable function $f(\cdot)$ on the domain $S \subset \mathbb{R}$, $\|f\|_\infty$ denotes its essential supremum norm, defined by
$$\|f\|_\infty = \operatorname{ess\,sup}_{x\in S} |f(x)| = \inf\{b \in \mathbb{R} : m(\{x : f(x) > b\}) = 0\},$$
where $m(\cdot)$ denotes the Lebesgue measure on $\mathbb{R}$.

$\|f\|_A$: For any real-valued function $f(\cdot)$ defined on its domain $S$ and $A \subset S$, we define its supremum norm on $A$ by $\|f\|_A = \sup_{x\in A}|f(x)|$.

Chapter 1: Introduction

One of the central aspects of probability theory is the notion of distributional approximation, by which one can express the distribution of a complicated random variable in terms of a relatively simple limiting distribution. The Central Limit Theorem (CLT) is one of the most fundamental and useful results in this direction. The CLT establishes that, typically, the sum of a large number of independent random variables of roughly the same size, when properly normalized, behaves like a standard normal random variable. The study of the CLT started in the work of Abraham de Moivre in the 18th century in the context of the approximation of Binomial distributions. Since then, it has been generalized to a much broader class: the question of normal approximation when the summand variables are independent but not necessarily identically distributed was answered by the Lindeberg-Feller-Lévy Theorem (see [36]).

While weak convergence results such as the CLT give a sense of how the distribution of a random variable looks asymptotically, one natural question that arises is how well the target distribution approximates the distribution of interest non-asymptotically. The heavy use of distributional approximations in practice, the Gaussian being the most pervasive, necessitates the study of the approximation error. Stein's remarkable idea for studying the normal approximation first appeared about forty years ago in his groundbreaking work [74], and since then the subject has seen an explosion in the number of ideas and applications in a variety of areas, including combinatorics, random graph theory, statistics, random matrix theory and statistical physics, to name a few. The method has also been used to prove concentration inequalities, which has provided a way to study many complicated systems with genuine dependence. Malliavin calculus combined with Stein's method has provided powerful techniques to study functionals of Gaussian fields, Rademacher sequences and Poisson point processes. In the next chapter, we give a short review of Stein's method and discuss in more detail some of the techniques that will be useful for us.

The aim of this dissertation is to further explore the possibilities of applying Stein's method in the context of strong embeddings, following Chatterjee's work in this area, and also to modify the method to extend its use to Dickman approximations. We now give a brief chapter by chapter summary of this work.

1 Summary of Thesis

We postpone a discussion of Stein's method until the next chapter. Here we present a brief description of our work, with the statements of the main theorems proved in this thesis, to give the prospective reader an overview of the work.
In Section 1 of Chapter 2, we will review the main existing results on strong embeddings and the more recent breakthrough work of Chatterjee [22], which hinted at the vast possibility of using Stein's method to prove strong embedding and which is the main inspiration for our work in this area. In Section 2 we will then review previous works proving distributional approximation by the Dickman distribution, most notably by Goldstein [40, 41], Pinsky [58] and Swan et al. [1]. This forms the basis of our work on Dickman approximation. Finally, we will briefly review Stein's method in the last section of that chapter.

Chapter 3 forms Part I of our work. In this chapter, we extend Chatterjee's novel use of Stein's method in strong embedding. To state the main results, we need the following definition.

Definition 1.1. We say Strong Embedding (SE) holds for the mean zero, variance one random variable $\varepsilon$ if there exist constants $C$, $K$ and $\lambda$ such that for all $n=1,2,\ldots$ the partial sums $S_k = \sum_{i=1}^k \varepsilon_i$, $k=1,\ldots,n$, of a sequence $\varepsilon_1,\varepsilon_2,\ldots$ of independent random variables distributed as $\varepsilon$, and a standard Brownian motion $(B_t)_{t\ge 0}$, can be constructed on a joint probability space such that
$$P\Big( \max_{0\le k\le n} |S_k - B_k| \ge C\log n + x \Big) \le K e^{-\lambda x} \quad \text{for all } x \ge 0, \qquad (1.1)$$
where we adopt the standard empty sum convention whereby $S_0 = 0$.

Theorem 3.1.1. SE holds for $\varepsilon$, any random variable with mean zero and variance 1 satisfying $E\varepsilon^3 = 0$ and taking values in a finite set $\mathcal{A}$ not containing 0.

The proof of this theorem is largely inspired by Chatterjee's work, which proved strong embedding for symmetric Rademacher summands, i.e. when $\varepsilon$ takes the values $\pm 1$ each with probability $1/2$. One should also note here that strong embedding is already known for random variables with a finite moment generating function in a neighbourhood of zero, due to the groundbreaking works of Komlós, Major and Tusnády [48, 49]. But the approach taken in their work is technically very hard and seems challenging to generalize. The main aim of Chatterjee's work and ours is to provide a way to prove strong embedding that is potentially more amenable to generalizations.

From (1.1), one can see that to prove strong embedding it is enough to bound the moment generating function of the random variable $\max_{0\le k\le n}|S_k - B_k|$. To this end, for any positive integer $n$, let $\varepsilon_1,\varepsilon_2,\ldots,\varepsilon_n$ be exchangeable random variables taking values in a finite set $\mathcal{A}\subset\mathbb{R}$ not containing zero. Define $W_k = S_k - (k/n)S_n$ and $\gamma^2 = n^{-1}\sum_{i=1}^n\varepsilon_i^2$. Then, for $(B_t)_{0\le t\le 1}$ a standard Brownian bridge, we prove a bound on the moment generating function of $\max_{0\le k\le n}|W_k - \sqrt{n}\gamma B_{k/n}|$, which in turn implies the above theorem. We note that this alternative approach is able to handle sequences of exchangeable random variables, in contrast to the usual approach in [48].

Theorem 3.1.2. For any positive integer $n$, let $\varepsilon_1,\varepsilon_2,\ldots,\varepsilon_n$ be exchangeable random variables taking values in a finite set $\mathcal{A}\subset\mathbb{R}$. Let
$$S_k = \sum_{i=1}^k \varepsilon_i, \qquad W_k = S_k - \frac{k}{n}S_n \qquad \text{and} \qquad \gamma^2 = \frac{1}{n}\sum_{i=1}^n \varepsilon_i^2.$$
Then there exists a positive universal constant $C$, and for all $\nu>0$ positive constants $K_1$, $K_2$ and $\lambda_0$ depending only on $\mathcal{A}$ and $\nu$, such that for all $n\ge 1$ and $\eta\ge\nu$, a version of $W_0,W_1,\ldots,W_n$ and a standard Brownian bridge $(B_t)_{0\le t\le 1}$ exist on the same probability space and satisfy
$$E\exp\Big(\lambda \max_{0\le k\le n}|W_k - \sqrt{n}\eta B_{k/n}|\Big) \le \exp(C\log n)\,E\exp\Big( \frac{K_1\lambda^2 S_n^2}{n} + K_2\lambda^2 n(\gamma^2-\eta^2)^2 \Big) \quad \text{for all } \lambda\le\lambda_0.$$
Moreover, if $0\notin\mathcal{A}$, then there exist positive constants $K_1$ and $\lambda_0$ depending only on $\mathcal{A}$ such that
$$E\exp\Big(\lambda \max_{0\le k\le n}|W_k - \sqrt{n}\gamma B_{k/n}|\Big) \le \exp(C\log n)\,E\exp\Big( \frac{K_1\lambda^2 S_n^2}{n} \Big) \quad \text{for all } \lambda\le\lambda_0,$$
and if in addition $\varepsilon_1,\ldots,\varepsilon_n$ are i.i.d. with zero mean, then there exists a positive $\lambda$ depending only on $\mathcal{A}$ such that
$$P\Big( \max_{0\le k\le n}|W_k - \sqrt{n}\gamma B_{k/n}| \ge \lambda^{-1}C\log n + x \Big) \le 2e^{-\lambda x} \quad \text{for all } x\ge 0.$$

Chapters 4 and 5 form the second part (Part II) of the dissertation. In Chapter 4, we consider distributional approximation by the Dickman distribution, developing on recent works such as [40, 41, 19, 1, 58]. For $\theta>0$, the generalized Dickman distribution $\mathcal{D}_\theta$ can be defined as the unique distribution of a non-negative random variable $W$ such that $W =_d W^*$, where
$$W^* =_d U^{1/\theta}(W+1), \qquad (1.2)$$
with $U$ a uniform random variable on $[0,1]$ independent of $W$. The distribution $\mathcal{D}_1$ is called the standard Dickman distribution. For $\alpha\ge 0$, denote
$$\mathrm{Lip}_\alpha = \{h : |h(x)-h(y)| \le \alpha|x-y|\}. \qquad (1.3)$$
In [40], it was shown that a non-negative random variable $W$ is distributed as $\mathcal{D}_\theta$ if and only if
$$E[(W/\theta)f'(W) + f(W) - f(W+1)] = 0 \quad \text{for all } f\in\bigcup_{\alpha\ge 0}\mathrm{Lip}_\alpha.$$
Hence we consider the differential-delay Stein equation given by
$$(x/\theta)f'(x) + f(x) - f(x+1) = h(x) - E[h(D_\theta)], \qquad (1.4)$$
where $D_\theta$ denotes a $\mathcal{D}_\theta$ distributed random variable. To apply the method, uniform bounds on the smoothness of the solution $f(\cdot)$ over test functions $h(\cdot)$ in some class $\mathcal{H}$ are required. For $\alpha\ge 0$, $\beta\ge 0$, define
$$\mathcal{H}_{\alpha,\beta} = \{h : h\in\mathrm{Lip}_\alpha,\ h'\in\mathrm{Lip}_\beta\},$$
with $\mathrm{Lip}_\alpha$ given in (1.3). We will achieve bounds on the smoothness of the solution for the class $\mathcal{H}_{1,1}$ in Section 3 of Chapter 4.

Theorem 4.1.5. For every $\theta>0$ and $h\in\mathcal{H}_{1,1}$, there exists a solution $f\in\mathcal{H}_{\theta,\theta/2}$ to (1.4) with $\|f'\|_{(0,\infty)}\le\theta$ and $\|f''\|_{(0,\infty)}\le\theta/2$.

Next, in Section 2 of Chapter 4, we study the approximation of sums that converge to Dickman. First, in Section 2.1 we consider sums of the form
$$W_n = \frac{1}{n}\sum_{k=1}^n Y_k B_k, \qquad (1.5)$$
where $\{B_1,\ldots,B_n,Y_1,\ldots,Y_n\}$ are independent, $B_k$ is a Bernoulli random variable with success probability $1/k$, and $Y_k$ is non-negative with $EY_k = k$ and $\mathrm{Var}(Y_k)=\sigma_k^2$ for all $k=1,\ldots,n$. The most well known case is the one where $Y_k = k$ almost surely (a.s.), for which
$$W_n = \frac{1}{n}\sum_{k=1}^n kB_k. \qquad (1.6)$$
To state the result we will obtain for such sums, we first define the Wasserstein-2 metric
$$d_{1,1}(X,Y) = \sup_{h\in\mathcal{H}_{1,1}}|Eh(Y)-Eh(X)|. \qquad (1.7)$$
The work [1] obtains a bound of the order $\sqrt{\log n}/n$ on the distance between $W_n$ in (1.6) and $D$, a standard Dickman random variable, in a metric weaker than the Wasserstein-2 metric. We consider the more general sum in (1.5) and provide bounds on the Wasserstein-2 distance between such sums and the standard Dickman distribution. In particular, we obtain a bound of the order $1/n$ in the special case considered in [1].

Theorem 4.1.1. Let $W_n$ be as in (1.5) and $D$ a standard Dickman random variable. Then, with the metric $d_{1,1}$ in (1.7),
$$d_{1,1}(W_n,D) \le \frac{3}{4n} + \frac{1}{2n^2}\sum_{k=1}^n \frac{1}{k}\sqrt{(\sigma_k^2+k^2)\sigma_k^2},$$
and in particular if $Y_k = k$ a.s., that is, for $W_n$ as in (1.6),
$$d_{1,1}(W_n,D) \le \frac{3}{4n}.$$

We will also prove a related result for weighted sums of independent Poisson random variables.
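To see the convergence in Theorem 4.1.1 numerically, one can simulate the weighted Bernoulli sum (1.6) alongside approximate draws of the standard Dickman distribution obtained by iterating the fixed-point recursion (1.2). The following is a minimal Monte Carlo sketch; the function names, sample sizes and iteration count are illustrative choices, not part of the thesis.

```python
import numpy as np

rng = np.random.default_rng(0)

def bernoulli_sum(n, size):
    """Simulate W_n = (1/n) * sum_{k=1}^n k * B_k with B_k ~ Bernoulli(1/k), cf. (1.6)."""
    k = np.arange(1, n + 1)
    B = rng.random((size, n)) < 1.0 / k          # independent B_k ~ Bernoulli(1/k)
    return (B * k).sum(axis=1) / n

def dickman_sample(theta=1.0, size=10**5, iters=60):
    """Approximate D_theta draws by iterating W <- U^(1/theta) * (W + 1), cf. (1.2)."""
    W = np.zeros(size)
    for _ in range(iters):
        U = rng.random(size)
        W = U ** (1.0 / theta) * (W + 1.0)
    return W

W_n = bernoulli_sum(n=200, size=10**5)
D = dickman_sample()
print(W_n.mean(), D.mean())                      # both close to 1, since E[W_n] = E[D_1] = 1
print(np.quantile(W_n, 0.9), np.quantile(D, 0.9))  # upper quantiles should also be close
```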
In Section 2.2, we study Dickman approximation of weighted geometric and Bernoulli sums that appear in probabilistic number theory. Probabilistic approaches have been very useful in furthering the understanding of the distribution of primes, and more generally in providing insights into the properties of certain classes of numbers.

Let $(p_k)_{k\ge 1}$ be an enumeration of the prime numbers in increasing order and $\Omega_n$ the set of integers whose largest prime factor is less than or equal to $p_n$. Consider the 'harmonic' probability measure $\Pi_n$ on $\Omega_n$ defined as
$$\Pi_n(m) = \frac{1}{\pi_n m} \quad \text{for } m\in\Omega_n, \qquad \text{with } \pi_n = \sum_{m\in\Omega_n}\frac{1}{m}$$
the normalizing constant. A natural question is whether this distribution can be approximated when $n$ is large. Consider the random variable $S_n = \log M_n/\log p_n$, where $M_n$ is distributed as $\Pi_n$. It was proved in [58] that $S_n$ converges in law to the standard Dickman distribution. We will prove the following rate of convergence in the Wasserstein-2 norm.

Theorem 4.1.3. For $D$ a standard Dickman random variable, with $M_n\sim\Pi_n$, we have
$$d_{1,1}(S_n,D) \le \frac{C}{\log n}$$
for some universal constant $C$.

One may instead consider the distribution $\Pi'_n$ over $\Omega'_n$, the set of square-free integers with largest prime factor less than or equal to $p_n$, with $\Pi'_n(m)$ proportional to $1/m$ for all $m\in\Omega'_n$. For $M_n$ distributed as $\Pi'_n$, it was proved in [19] that $S_n = \log M_n/\log p_n$ converges weakly to the standard Dickman distribution, and very recently a $(\log\log n)^{3/2}(\log n)^{-1}$ rate was provided in [1] in a metric defined as a supremum over a class of three times differentiable functions. We will provide the improved $(\log n)^{-1}$ convergence rate in the stronger Wasserstein-2 norm in Theorem 4.1.4.

Finally, in Chapter 5, we define a new class of Dickman type random variables and consider their connection to perpetuities. The transformation (1.2) was interpreted by Vervaat in [79] as the relation between the values of an asset at two successive times in a perpetuity. In particular, during the $n$-th time period a deposit of some fixed value, scaled to be unity, is added to the value of an asset. During that time period, a multiplicative factor in $[0,1]$, taken to be $U^{1/\theta}$, is applied, accounting for depreciation. The generalized Dickman distributions arise as fixed points of this recursion, that is, solutions to $W^* =_d W$ where $W^*$ is given in (1.2). By approaching the problem from the point of view of utility, we extend the scope of the Dickman distributions past the currently known class.

In [13], Daniel Bernoulli argued that the utility of an asset should be given as a concave function of the value of the asset. In practice, many of the utility functions used are indeed concave, see [34]. Hence, one is led to the generalized model
$$s(W_{n+1}) = U_n^{1/\theta}s(W_n+1), \qquad (1.8)$$
where $s(\cdot)$ is the utility function of an asset. As for the classical Vervaat perpetuities, one can now seek fixed points of the transformation
$$W^* =_d s^{-1}\big(U^{1/\theta}s(W+1)\big) \qquad (1.9)$$
when $s(\cdot)$ is concave. Under mild and natural conditions on the utility function $s(\cdot)$, we will prove that the transformation (1.9) has a unique fixed point, denoted by $\mathcal{D}_{\theta,s}$ here. The class of such distributional fixed points strictly extends the currently known class of generalized Dickman distributions, now depending on the two parameters $\theta$ and $s(\cdot)$. We will generalize the approximation results of [40] to this new family. In particular, we will prove the following inequality for the Wasserstein distance (2.3) between a non-negative random variable $W$ and $D_{\theta,s}$:
$$d_1(W,D_{\theta,s}) \le (1-\rho)^{-1}d_1(W^*,W), \qquad (1.10)$$
where $\rho$ is a parameter depending on $\theta$ and $s(\cdot)$. We will then apply (1.10) to assess the quality of the recursive scheme (1.8) for the simulation of variables having the $\mathcal{D}_{\theta,s}$ distribution.
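The recursive scheme (1.8)-(1.9) is straightforward to iterate. Below is a minimal simulation sketch under assumed example utilities; whether a particular $s(\cdot)$ satisfies the conditions required in Chapter 5 must be checked there, and the specific choices of $s$, sample size and iteration count here are purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)

def simulate_fixed_point(theta, s, s_inv, size=10**5, iters=80):
    """Iterate W <- s^{-1}(U^{1/theta} * s(W + 1)), cf. (1.8)-(1.9).

    With s(x) = x this is the usual recursion whose fixed point is the
    generalized Dickman distribution D_theta."""
    W = np.zeros(size)
    for _ in range(iters):
        U = rng.random(size)
        W = s_inv(U ** (1.0 / theta) * s(W + 1.0))
    return W

# Identity utility recovers D_theta, whose mean is theta.
D1 = simulate_fixed_point(theta=1.0, s=lambda x: x, s_inv=lambda y: y)
print(D1.mean())   # close to 1

# Illustration with a concave power utility s(x) = sqrt(x): here
# s^{-1}(U^{1/theta} s(W+1)) = U^{2/theta}(W+1), so the fixed point is D_{theta/2}.
Ds = simulate_fixed_point(theta=1.0, s=np.sqrt, s_inv=lambda y: y**2)
print(Ds.mean())   # close to 1/2
```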
Chapter 2: Review of Existing Literature

This chapter is dedicated to reviewing the main results in the areas that this dissertation touches upon. In the first section, we review the advances in strong embedding, starting with the breakthrough work of Komlós, Major and Tusnády and then discussing some more recent developments by Chatterjee. In Section 2, we review some recent works in the context of Dickman approximation. Finally, in Section 3 we review the basics of Stein's method and some coupling techniques that will be useful to us.

1 Strong embedding and KMT approximation

Let $\varepsilon_1,\varepsilon_2,\ldots$ be a sequence of independent random variables distributed as $\varepsilon$, a mean zero, variance one random variable. Letting $S_k = \sum_{i=1}^k\varepsilon_i$, $k=1,2,\ldots$, be the corresponding sequence of partial sums, Donsker's invariance principle [33], see also [16], implies that the random continuous function
$$X_n(t) = \frac{1}{\sqrt{n}}\big(S_{[nt]} + (nt-[nt])\varepsilon_{[nt]+1}\big), \qquad 0\le t\le 1,$$
converges weakly to a Brownian motion process $(B_t)_{0\le t\le 1}$. One way to study the quality of the approximation of $X_n(t)$ by $B_t$ is to determine a 'slowly increasing' sequence $f(n)$ such that there exists an embedding of both processes on a common probability space with
$$\max_{0\le k\le n}|S_k-B_k| = O_p(f(n)).$$
Finding the smallest achievable order of $f(n)$ has been a very important question in the literature. The rate $(n\log\log n)^{1/4}(\log n)^{1/2}$ was achieved by Skorokhod [71], see also its translation [72], and Strassen [77], assuming $E\varepsilon^4<\infty$ and using the Skorokhod embedding, and Kiefer [47] showed that this rate is optimal under the finite fourth moment condition. Csörgő and Révész [28] made improvements to the rate under additional moment assumptions. See the survey paper by Obłój [52] and [29] for a more detailed account.

The celebrated KMT approximation by Komlós, Major and Tusnády ([48], [49]) achieved the rate $\log n$ under the condition that $\varepsilon$ has a finite moment generating function in a neighborhood of zero. Recall Definition 1.1 of Strong Embedding (SE).

Theorem 2.1.1 (KMT approximation [48]). SE holds for $\varepsilon$ satisfying $E\exp(\theta|\varepsilon|)<\infty$ for some $\theta>0$.

Results by Bártfai [10], see [83], showed that the rate in (1.1) is best possible under the finite moment generating function condition. A multidimensional version of the KMT approximation was proved by Einmahl [35], from which Zaitsev [81, 82] removed a logarithmic factor. For extensions to stationary sequences see the history in [12], where dependent variables of the form $X_k = G(\ldots,\varepsilon_{k-1},\varepsilon_k,\varepsilon_{k+1},\ldots)$ for $\varepsilon_i$, $i\in\mathbb{Z}$, i.i.d. are considered. Strong embedding results have a truly extensive range of applications that includes empirical processes, non-parametric statistics, survival analysis, time series and reliability; for a sampling see the texts [70, 29], or the articles [30, 80, 53].

In our work, we take the approach to the KMT approximation introduced by Chatterjee in [22], which has its origins in Stein's method [74] and appears simpler, and possibly easier to generalize, than the dyadic approximation argument of [48]. This alternative approach depends on the use of Stein coefficients, also known as Stein kernels, which first appeared in the work of Cacoullos and Papathanasiou [18]. For $W$ a mean zero random variable with variance $\sigma^2$, we say the random variable $T$, defined on the same probability space, is a Stein coefficient for $W$ if
$$E[Wf(W)] = E[Tf'(W)]$$
for all Lipschitz functions $f(\cdot)$ and $f'(\cdot)$ any a.e. derivative of $f(\cdot)$, whenever these expectations exist.
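As a concrete illustration of the definition (a quick numerical sketch; the uniform example and test function are ours, chosen for verifiability): for $W\sim\mathcal{U}[-a,a]$ one can check that $T=(a^2-W^2)/2$ is a Stein coefficient for $W$, an identity that reappears as (2.23) in Chapter 3.

```python
import numpy as np

rng = np.random.default_rng(2)

# For W ~ U[-a, a], T = (a^2 - W^2)/2 satisfies E[W f(W)] = E[T f'(W)].
# Monte Carlo check with the test function f(w) = sin(w).
a = np.sqrt(3.0)                     # variance a^2/3 = 1
W = rng.uniform(-a, a, size=10**6)
T = (a**2 - W**2) / 2.0

lhs = np.mean(W * np.sin(W))         # E[W f(W)]
rhs = np.mean(T * np.cos(W))         # E[T f'(W)]
print(lhs, rhs)                      # agree up to Monte Carlo error
```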
Theorem 2.1.2 below, from [22], demonstrates that a coupling of $W$ and $Z\sim\mathcal{N}(0,\sigma^2)$ exists whose quality can be evaluated as a function of $T$ and $\sigma^2$ alone.

Theorem 2.1.2 (Chatterjee [22]). Let $W$ be mean zero with finite second moment and suppose that $T$ is a Stein coefficient for $W$ with $|T|$ almost surely bounded by a constant. Then, given any $\sigma^2>0$, we can construct a version of $W$ and $Z\sim\mathcal{N}(0,\sigma^2)$ on the same probability space such that
$$E\exp(\theta|W-Z|) \le 2E\exp\Big( \frac{2\theta^2(T-\sigma^2)^2}{\sigma^2} \Big) \quad \text{for all } \theta\in\mathbb{R}.$$

Theorem 2.1.3, which demonstrates Theorem 2.1.1 for the special case of simple symmetric random walk, was proved in [22] by applying this approach.

Theorem 2.1.3 (Chatterjee [22]). SE holds for $\varepsilon$ a symmetric random variable with support $\{-1,+1\}$.

In this work, using the methods of [22], we generalize Theorem 2.1.3 to prove strong embedding for $\varepsilon$, any random variable with mean zero and variance one satisfying $E\varepsilon^3=0$ and taking values in a finite set $\mathcal{A}$ not containing zero. To prove our result we employ induction, as in [22]. The induction step requires extending Theorem 2.1.4 below of [22] from the special case where $\varepsilon$ is a symmetric variable taking values in $\{-1,1\}$.

Theorem 2.1.4. There exist positive universal constants $C$, $K$ and $\lambda_0$ such that the following is true. Take any integer $n\ge 2$. Suppose $\varepsilon_1,\ldots,\varepsilon_n$ are exchangeable $\pm1$ random variables. For $k=0,\ldots,n$, let $S_k=\sum_{i=1}^k\varepsilon_i$ and let $W_k = S_k - (k/n)S_n$. It is possible to construct a version of $W_0,\ldots,W_n$ and a standard Brownian bridge $(\widetilde{B}_t)_{0\le t\le 1}$ on the same probability space such that for any $0<\lambda<\lambda_0$,
$$E\exp\Big(\lambda\max_{0\le k\le n}|W_k-\sqrt{n}\widetilde{B}_{k/n}|\Big) \le \exp(C\log n)\,E\exp\Big( \frac{K\lambda^2 S_n^2}{n} \Big).$$

The generalization of Theorem 2.1.4 depends on the 'zero-bias' smoothing method introduced in Lemma 3.2.3 in Section 2 of Chapter 3, which may be of independent interest as regards the construction of Stein coefficients. We briefly review zero-bias transforms in Section 3.

2 Dickman approximation

The Dickman distribution $\mathcal{D}$ first made its appearance in [32] in the context of smooth numbers, the class of integers with small prime factors. For $x$ and $y$ positive integers, let $\psi(x,y)$ denote the number of integers less than or equal to $x$ all of whose prime factors are less than or equal to $y$. Dickman [32] showed that
$$\lim_{x\to\infty}\frac{\psi(x,x^{1/u})}{x\rho(u)} = 1 \quad \text{for } u>0,$$
where $\rho(\cdot)$ is the Dickman-de Bruijn function, defined as the continuous function satisfying the delay differential equation
$$u\rho'(u) + \rho(u-1) = 0 \qquad (2.1)$$
with initial condition $\rho(u)=1$ for $u\in[0,1]$.
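Equation (2.1) is straightforward to integrate numerically. The following is a minimal sketch (the grid step and function name are illustrative); for instance, on $[1,2]$ one has $\rho(u)=1-\log u$, so $\rho(2)=1-\log 2\approx 0.3069$, which the numerical solution reproduces.

```python
import numpy as np

def dickman_rho(u_max=5.0, h=1e-4):
    """Solve u * rho'(u) + rho(u - 1) = 0 with rho = 1 on [0, 1], cf. (2.1),
    using a trapezoidal step for rho'(u) = -rho(u - 1) / u on a uniform grid."""
    n_per_unit = int(round(1.0 / h))
    u = np.arange(0.0, u_max + h, h)
    rho = np.ones_like(u)                      # rho(u) = 1 on [0, 1]
    for i in range(n_per_unit, len(u) - 1):
        f_i = -rho[i - n_per_unit] / u[i]          # rho'(u_i)
        f_ip1 = -rho[i + 1 - n_per_unit] / u[i + 1]  # rho'(u_{i+1}); delayed value already known
        rho[i + 1] = rho[i] + 0.5 * h * (f_i + f_ip1)
    return u, rho

u, rho = dickman_rho()
print(np.interp(2.0, u, rho), 1 - np.log(2))   # both ~ 0.3069
```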
See the more recent work [58] for a readable explanation of how the Dickman distribution arises there. Since then, it has appeared, sometimes curiously, in many problems in number theory and in probability.

For $\theta>0$, the generalized Dickman distribution $\mathcal{D}_\theta$ can be defined as the unique distribution of a non-negative random variable $W$ such that $W=_d W^*$, where
$$W^* =_d U^{1/\theta}(W+1), \qquad (2.2)$$
with $U$ a uniform random variable on $[0,1]$ independent of $W$. The broader class of generalized Dickman distributions $\mathcal{D}_\theta$ for $\theta>0$, of which $\mathcal{D}=\mathcal{D}_1$, has since been used to approximate counts in logarithmic combinatorial structures, including permutations and partitions in [3], and more generally for the quasi-logarithmic class considered in [9], for the weighted sum of edges connecting vertices to the origin in minimal directed spanning trees in [57], and for certain weighted sums of independent random variables in [59]. Simulation of the generalized Dickman distribution has been considered in [31] and, in connection with the Quickselect sorting algorithm, in [45]. We review some of the recent developments here.

Define the Wasserstein distance between two random variables $X$ and $Y$ as
$$d_1(X,Y) = \sup_{h\in\mathrm{Lip}_1}|Eh(X)-Eh(Y)|, \qquad (2.3)$$
with $\mathrm{Lip}_\alpha = \{h:|h(x)-h(y)|\le\alpha|x-y|\}$ for $\alpha\ge 0$.

Generally, distributional characterizations like (2.2) serve as a starting point for Stein's method. In [40], an $O(\log n/n)$ rate was provided, using Stein's method, for the convergence of the runtime of the Quickselect algorithm for finding the $m$-th smallest element of a list of $n$ distinct numbers. That result follows from the following theorem. Define the $\theta$-Dickman bias distribution of $W$ by (2.2). The following bound was proved in [40] and is one of the starting points of our work in Chapters 4 and 5.

Theorem 2.2.1 (Goldstein [40]). For $\theta>0$, let $D_\theta$ denote a $\mathcal{D}_\theta$ distributed random variable. For a non-negative random variable $W$, we have
$$d_1(W,D_\theta) \le (1+\theta)d_1(W^*,W).$$

The Stein equation used in [40] was of the integral type
$$g(x) - A_{x+1}g = h(x) - E[h(D_\theta)], \qquad (2.4)$$
where the averaging operator $A_xg$ is given by
$$A_xg = \begin{cases} g(0) & \text{for } x=0,\\ \dfrac{\theta}{x^\theta}\displaystyle\int_0^x g(u)u^{\theta-1}\,du & \text{for } x>0. \end{cases}$$
The following theorem from [40] gives smoothness bounds for the solutions to the Stein equation (2.4). Suppressing dependence on $\theta$, let
$$\mathrm{Lip}_{\alpha,0} = \{h:[0,\infty)\to\mathbb{R} : h\in\mathrm{Lip}_\alpha,\ E[h(D_\theta)]=0\}. \qquad (2.5)$$
For a given function $h\in\mathrm{Lip}_{\alpha,0}$ for some $\alpha\ge 0$, let
$$h^{(\star k)}(x) = A^k_{x+1}h \ \text{ for } k\ge 0, \qquad g(x)=\sum_{k\ge 0}h^{(\star k)}(x) \quad \text{and} \quad g_n(x)=\sum_{k=0}^n h^{(\star k)}(x). \qquad (2.6)$$

Theorem 2.2.2 (Goldstein [40]). For every $\theta>0$ and $h\in\mathrm{Lip}_{1,0}$, letting $\rho=\theta/(\theta+1)$, we have
$$\|h^{(\star k)}\|_{[0,a]} \le (\theta+a)\rho^k, \qquad (2.7)$$
$g_n\in\mathrm{Lip}_{(1-\rho^{n+1})/(1-\rho)}$, and $g(\cdot)$ given in (2.6) is a $\mathrm{Lip}_{1/(1-\rho)}$ solution to (2.4).

We will generalize Theorems 2.2.2 and 2.2.1 to Theorems 5.3.10 and 5.4.2, respectively, in Chapter 5, and will discuss these results further in Part II of the thesis.

Dickman approximation has also appeared in probabilistic number theory in some recent works [19, 1, 58]. Letting $(p_k)_{k\ge 1}$ be an enumeration of the prime numbers in increasing order, consider the random variable $\log M/\log p_n$ where $M\sim\Pi_n$ has mass function
$$\Pi_n(m) = \frac{1}{\pi_n m} \quad \text{for } m\in\Omega_n,$$
with normalizing constant $\pi_n$ and $\Omega_n$ the set of all positive integers having no prime factor larger than $p_n$.

Theorem 2.2.3 (Pinsky [58]). With $M\sim\Pi_n$, the random variable $\log M/\log p_n$ converges weakly to the standard Dickman distribution.

For $X_k\sim\mathrm{Geom}(1-1/p_k)$, $1\le k\le n$, it can be shown that under $\Pi_n$,
$$\frac{\log M}{\log p_n} =_d S_n := \frac{1}{\log p_n}\sum_{k=1}^n X_k\log p_k.$$
We will provide a rate for this convergence in Chapter 4. If one instead considers $M\sim\Pi'_n$, where $\Pi'_n(m)$ is proportional to $1/m$ for all $m\in\Omega'_n$, with $\Omega'_n$ the set of square-free numbers with largest prime factor less than or equal to $p_n$, then a similar weak convergence result holds, see [19]. In this case, a rate was provided in [1]. Let $d_3(X,Y)$ denote the Wasserstein-3 metric between two random variables $X$ and $Y$, defined similarly to (2.3) but with the test functions $h(\cdot)$ belonging to a class of three times differentiable functions instead of the Lipschitz class. The following theorem is due to [1].

Theorem 2.2.4 (Arras et al. [1]). $d_3(S_n,D)$ is $O\big((\log\log n)^{3/2}/\log n\big)$.

In Chapter 4 we will provide an improved rate of $(\log n)^{-1}$ in a stronger metric.
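Theorem 2.2.3 can be seen numerically through the weighted geometric representation of $S_n$ given above. The following Monte Carlo sketch is illustrative only (the sieve, sample sizes and the convention that the geometric variables count failures are our choices for the simulation).

```python
import numpy as np

rng = np.random.default_rng(3)

def primes_up_to(N):
    """Simple sieve of Eratosthenes (illustrative, not optimized)."""
    sieve = np.ones(N + 1, dtype=bool)
    sieve[:2] = False
    for q in range(2, int(N**0.5) + 1):
        if sieve[q]:
            sieve[q*q::q] = False
    return np.flatnonzero(sieve)

def sample_S_n(n, size):
    """S_n = (1/log p_n) * sum_k X_k log p_k with X_k ~ Geom(1 - 1/p_k) on {0,1,2,...},
    which has the law of log M / log p_n for M ~ Pi_n."""
    p = primes_up_to(10**5)[:n].astype(float)        # first n primes (enough for n up to ~9500)
    # numpy's geometric is supported on {1,2,...}; subtract 1 for the {0,1,...} convention
    X = rng.geometric(1.0 - 1.0 / p, size=(size, n)) - 1
    return (X * np.log(p)).sum(axis=1) / np.log(p[-1])

S = sample_S_n(n=1000, size=10**4)
print(S.mean())   # approaches E[D] = 1 as n grows, in line with Theorem 2.2.3
```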
3 Stein's method

We now present a brief overview of Stein's method. Most of the material in this section, except for very recent developments, is taken from the wonderful monograph by Chen, Goldstein and Shao [26] and the survey [66], in some places verbatim. Stein's remarkable idea for studying the normal approximation first appeared about forty years ago in his groundbreaking work [74], and since then the subject has seen an explosion in the number of ideas and applications in a variety of areas, including combinatorics, random graph theory, statistics, random matrix theory and statistical physics, to name a few. The method has also been used to prove concentration inequalities, which has provided a way to study many complicated systems with genuine dependence. Malliavin calculus combined with Stein's method has provided powerful techniques to study functionals of Gaussian fields, Rademacher sequences and Poisson point processes.

The basic idea of Stein's method is that by comparing the expectations of a random variable $W$ and of a target random variable, say $Z$, over some class of functions, one can get an idea of the closeness of the two distributions. We only consider distributional approximation by $\mathcal{N}(0,1)$ here, but the following mechanism works for several other distributions, e.g. Poisson [25, 4, 5], exponential [23, 51, 62], geometric [8, 54], etc. The application of Stein's method, as unveiled in [74] and further developed in [75], begins with a characterizing equation for a given target distribution. Such a characterization is then used as the basis for a Stein equation, which is usually a difference or differential equation involving test functions in a class corresponding to a desired probability metric, such as the class of Lipschitz functions with Lipschitz constant 1 for the Wasserstein distance.

The starting point of Stein's method for normal approximation is the following characterization: a random variable $Z\sim\mathcal{N}(0,1)$ if and only if
$$E[f'(Z) - Zf(Z)] = 0 \qquad (3.1)$$
for all absolutely continuous functions $f(\cdot)$ for which the above expectations exist. Hence, for some nice class of functions $\mathcal{H}$ and $h\in\mathcal{H}$, one can try to solve the Stein equation
$$f'(x) - xf(x) = h(x) - Eh(Z), \quad \text{where } Z\sim\mathcal{N}(0,1). \qquad (3.2)$$
If one can find a solution $f(\cdot)$ for a large class of functions $\mathcal{H}$, then to evaluate $Eh(W)-Eh(Z)$ one can instead compute $E[f'(W)-Wf(W)]$. While at first glance it might seem that the problem has not been made any easier, it turns out that the latter expectation, due to its relation to the normal characterization, is much better suited to computation when approximation of $W$ by the normal is appropriate. We will not go any deeper here; for a modern treatment of Stein's method, an interested reader can see [26] and [66] and the references therein.

There are several variations of Stein's method in which, in order to approximate the distribution of $W$, one introduces an auxiliary random variable coupled to $W$ and having certain properties. In the first part of this thesis, we use Stein coefficients and zero bias couplings to prove strong embedding for sums of i.i.d. random variables. In the second part, for proving convergence to Dickman distributions in probabilistic number theory, we make use of size bias couplings. We now discuss some of these coupling techniques.
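Before doing so, here is a quick numerical illustration of the characterization (3.1); this is only an illustrative sketch (the test function and sample distributions are ours), since (3.1) characterizes normality only over all admissible $f$, not a single one.

```python
import numpy as np

rng = np.random.default_rng(4)

def stein_discrepancy(W, f=np.sin, f_prime=np.cos):
    """Monte Carlo estimate of E[f'(W) - W f(W)]; this is 0 for W ~ N(0,1) by (3.1),
    and for a fixed smooth f its size gives a rough (informal) indication of non-normality."""
    return np.mean(f_prime(W) - W * f(W))

n = 10**6
Z = rng.standard_normal(n)
W1 = rng.random((n, 12)).sum(axis=1) - 6.0      # sum of 12 uniforms, mean 6, variance 1
W2 = rng.exponential(size=n) - 1.0              # standardized exponential

print(stein_discrepancy(Z))    # ~ 0
print(stein_discrepancy(W1))   # small
print(stein_discrepancy(W2))   # clearly away from 0 (~ 0.27)
```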
3.1 Independent sum

Let $W$ be the sum of independent random variables $\varepsilon_1,\ldots,\varepsilon_n$ with $E\varepsilon_i=0$ for all $1\le i\le n$ and $E\sum_{i=1}^n\varepsilon_i^2 = 1$. To compute the expectation $E[f'(W)-Wf(W)]$, one can use the 'leave one out' approach introduced in Stein's original paper [74]. Define $W^{(i)} = W-\varepsilon_i$. Then, letting $\varepsilon_i^*$ denote a variable with the $\varepsilon_i$-zero bias distribution (see Section 3.3 below), with $(\varepsilon_1^*,\ldots,\varepsilon_n^*)$ independent of $(\varepsilon_1,\ldots,\varepsilon_n)$, and $I$ a random index, independent of all else, taking values in $\{1,\ldots,n\}$ with $P(I=i)=E\varepsilon_i^2$ for $1\le i\le n$, it can be shown that
$$E[f'(W)-Wf(W)] = E[f'(W) - f'(W^{(I)}+\varepsilon_I^*)].$$
One can now bound this expectation using smoothness properties of the solutions $f(\cdot)$ corresponding to $h\in\mathcal{H}$ for some class of functions $\mathcal{H}$.

3.2 Exchangeable pair

We call two random variables $W,W'$ exchangeable if $(W,W')=_d(W',W)$. If they further satisfy the 'linear regression condition' $E[W'|W]=(1-\lambda)W$ with $\lambda\in(0,1)$, we say that $(W,W')$ is a $\lambda$-Stein pair. The method of exchangeable pairs [75] is often very useful for normal approximation. It can essentially be viewed as the study of the stationary distribution of a reversible Markov chain through one step of the chain, given by the pair $(W,W')$. When $EW^2<\infty$, for any absolutely continuous function $f(\cdot)$ satisfying $|f(x)|\le C(1+|x|)$ for some constant $C$, one can show that
$$E[Wf(W)] = \frac{1}{2\lambda}E[(W-W')(f(W)-f(W'))].$$
The linear regression condition can be too restrictive at times, and one can also consider
$$E[W'-W\,|\,W] = -\lambda(W-R) \quad \text{with } \lambda\in(0,1)$$
for some small random variable $R$.

3.3 Zero bias

The characterizing equation (3.1) for the standard normal can easily be extended to the mean zero normal family: a random variable $Z\sim\mathcal{N}(0,\sigma^2)$ if and only if $\sigma^2Ef'(Z)=E[Zf(Z)]$ for all absolutely continuous functions for which these expectations exist. Even though this identity is special to the normal family, it is possible to create an identity in the same spirit that holds more generally. Following [42], given a mean zero random variable $W$ with variance $\sigma^2$, we say $W^*$ has the $W$-zero biased distribution if
$$\sigma^2Ef'(W^*) = E[Wf(W)] \qquad (3.3)$$
for all absolutely continuous functions $f(\cdot)$ for which these expectations exist. One can show that the zero bias distribution exists and is unique for mean zero, positive variance random variables. When viewed as a transformation from the distribution of $W$ to that of $W^*$, the normal distribution is its unique fixed point. While it fixes the mean zero normal, the transformation, in some sense, brings non-normal distributions closer to normality. Heuristically, one then expects that if the distribution of $W$ is close to that of a normal, then the distributions of $W^*$ and $W$ should also be close. Indeed, if $W$ has variance one, then for $Z\sim\mathcal{N}(0,1)$, from (3.2) we have
$$Eh(W)-Eh(Z) = E[f'(W)-Wf(W)] = E[f'(W)-f'(W^*)].$$
Hence, when $W$ and $W^*$ are close, the left hand side will be small for a large class of functions.

Zero biasing is closely related to another transformation. Following Section 3.2 of [39], see also Proposition 4.2 of [26], for $W$ a random variable with finite, non-zero second moment, we say $W^\square$ has the $W$-square biased distribution when
$$E[f(W^\square)] = \frac{1}{EW^2}E[W^2f(W)] \qquad (3.4)$$
for all functions $f(\cdot)$ for which the expectation on the right hand side exists. If $W$ is a mean zero random variable with finite, non-zero variance $\sigma^2$, then for any $g\in C_c$, the collection of continuous functions with compact support, letting $f(x)=\int_0^x g(u)\,du$ and using (3.4), we have
$$\sigma^2Eg(UW^\square) = \sigma^2Ef'(UW^\square) = \sigma^2E\int_0^1 f'(uW^\square)\,du = \sigma^2E\Big[\frac{f(W^\square)}{W^\square}\Big] = E\Big[W^2\frac{f(W)}{W}\Big] = E[Wf(W)],$$
where $W^\square$ and $U$ are independent, $U\sim\mathcal{U}[0,1]$, and $W^\square$ has the $W$-square biased distribution. Thus, using (3.3), we have
$$\sigma^2Eg(UW^\square) = E[Wf(W)] = \sigma^2E[f'(W^*)] = \sigma^2E[g(W^*)].$$
Since the expectations of $g(W^*)$ and $g(UW^\square)$ agree for any $g\in C_c$, we obtain
$$W^* =_d UW^\square. \qquad (3.5)$$
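A quick numerical sanity check of (3.3) and (3.5) (an illustrative sketch; the choice of variable and test function is ours): for $\varepsilon$ uniform on $\{-1,+1\}$ one has $\varepsilon^\square =_d \varepsilon$, so (3.5) gives $\varepsilon^* =_d U\varepsilon$, i.e. the zero bias law is uniform on $[-1,1]$, a fact used again in Chapter 3.

```python
import numpy as np

rng = np.random.default_rng(5)

# Zero bias relation (3.3): sigma^2 E[f'(W*)] = E[W f(W)].
# For W uniform on {-1, +1} (sigma^2 = 1), W* ~ U[-1, 1].  Check with f(w) = w^3.
n = 10**6
W = rng.choice([-1.0, 1.0], size=n)
W_star = rng.uniform(-1.0, 1.0, size=n)

lhs = np.mean(W * W**3)          # E[W f(W)] = E[W^4] = 1
rhs = np.mean(3.0 * W_star**2)   # sigma^2 E[f'(W*)] = 3 E[(W*)^2] = 1
print(lhs, rhs)
```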
3.4 Size bias

The next method of rewriting $E[Wf(W)]$, for non-negative random variables, is size biasing, which first appeared in the context of Stein's method in [43]. For non-negative $W$ with finite mean $EW=\mu$, we say $W^s$ has the $W$-size biased distribution if
$$E[Wf(W)] = \mu Ef(W^s)$$
for all functions $f(\cdot)$ for which $E[Wf(W)]$ exists. Equivalently, the $W$-size biased distribution $F^s(\cdot)$ is the one that is absolutely continuous with respect to the distribution $F(\cdot)$ of $W$, with Radon-Nikodym derivative
$$\frac{dF^s(x)}{dF(x)} = \frac{x}{\mu}.$$
Note that if $\mathrm{Var}(W)=\sigma^2$, then one can rewrite the corresponding standardized expectation as
$$E\Big[\frac{W-\mu}{\sigma}f\Big(\frac{W-\mu}{\sigma}\Big)\Big] = \frac{\mu}{\sigma}E\Big[f\Big(\frac{W^s-\mu}{\sigma}\Big) - f\Big(\frac{W-\mu}{\sigma}\Big)\Big].$$
Hence, if $f(\cdot)$ is differentiable, Taylor expansion allows us to compare $E[Wf(W)]$ to $Ef'(W)$.

3.5 Stein coefficients and second order Poincaré inequalities

As discussed earlier, the key idea of Stein's method is to show that $E[Wf(W)]\approx Ef'(W)$. While there is no general method for this step, the various methods mentioned above, as well as other available variations of Stein's method (e.g. diffusion generators [7], dependency graphs [6, 27, 65] and other methods such as [38, 63, 64, 23, 24], to cite a few), need something 'nice' to happen. To devise a more general method, one can use Stein coefficients, first introduced in [18]. Heuristically, for a mean zero random variable $W$ with unit variance, if one can construct a random variable $T$ on the same space as $W$ and coupled to $W$ so that for all smooth functions $f(\cdot)$, $E[Wf(W)]\approx E[Tf'(W)]$, then, in some sense, the closeness of $T$ to 1 reflects how close $W$ is to the standard normal distribution. An explicit construction of such a random variable $T$ was provided in [20] when $W$ is a function of independent random variables with $EW=0$ and $EW^2=1$. For the special case when $W$ is a function of i.i.d. standard normal random variables, one can use second order Poincaré inequalities, as introduced in [21]. We will revisit Stein coefficients in Chapter 3.

Part I: On Strong Embeddings by Stein's Method

Chapter 3: Strong Embeddings

Strong embeddings, that is, couplings between the partial sum process of a sequence of random variables and a Brownian motion, have found numerous applications in probability and statistics. In this chapter, we extend Chatterjee's novel use of Stein's method for $\{-1,+1\}$ valued variables in [22] to a general class of discrete distributions, and provide $\log n$ rates for the coupling of partial sums of independent variables to a Brownian motion, as well as results for coupling sums of suitably standardized exchangeable variables to a Brownian bridge.

1 Introduction and main results

Let $\varepsilon_1,\varepsilon_2,\ldots$ be a sequence of independent random variables distributed as $\varepsilon$, a mean zero, variance one random variable. Let $S_k=\sum_{i=1}^k\varepsilon_i$, $k=1,2,\ldots$, be the corresponding sequence of partial sums.

Definition 1.1. We say Strong Embedding (SE) holds for the mean zero, variance one random variable $\varepsilon$ if there exist constants $C$, $K$ and $\lambda$ such that for all $n=1,2,\ldots$ the partial sums $S_k=\sum_{i=1}^k\varepsilon_i$, $k=1,\ldots,n$, of a sequence $\varepsilon_1,\varepsilon_2,\ldots$ of independent random variables distributed as $\varepsilon$, and a standard Brownian motion $(B_t)_{t\ge 0}$, can be constructed on a joint probability space such that
$$P\Big(\max_{0\le k\le n}|S_k-B_k| \ge C\log n + x\Big) \le Ke^{-\lambda x} \quad \text{for all } x\ge 0.$$
We adopt the standard empty sum convention whereby $S_0=0$.
Here we take the approach to the KMT approximation introduced by Chatterjee in [22], which has its origins in Stein's method [74]. This alternative approach makes use of Stein coefficients, also known as Stein kernels, which first appeared in the work of Cacoullos and Papathanasiou [18]. In some sense, a Stein coefficient $T$ for a mean zero random variable $W$ neatly encodes all information regarding the closeness of $W$ to the mean zero normal variable $Z$ having variance $\sigma^2$; see Theorem 2.1.2. Using the methods of [22], we generalize Theorem 2.1.3 as follows.

Theorem 3.1.1. SE holds for $\varepsilon$, any random variable with mean zero and variance 1 satisfying $E\varepsilon^3=0$ and taking values in a finite set $\mathcal{A}$ not containing 0.

To prove our result we first provide a construction in the case where we have a finite number of variables, and then extend it to derive strong approximation for an infinite sequence. Such extensions have been studied in the context of the KMT theorem for summands with finite $p$-th moment in [50] and also in [22]. For the finite case we employ induction, as in [22]. The induction step requires extending Theorem 2.1.4 from the special case where $\varepsilon$ is a symmetric variable taking values in $\{-1,1\}$. Theorem 3.1.2 here is a new result for the embedding of exchangeable random variables and a Brownian bridge.

Theorem 3.1.2. For any positive integer $n$, let $\varepsilon_1,\varepsilon_2,\ldots,\varepsilon_n$ be exchangeable random variables taking values in a finite set $\mathcal{A}\subset\mathbb{R}$. Let
$$S_k = \sum_{i=1}^k\varepsilon_i, \qquad W_k = S_k - \frac{k}{n}S_n \qquad \text{and} \qquad \gamma^2 = \frac{1}{n}\sum_{i=1}^n\varepsilon_i^2.$$
Then there exists a positive universal constant $C$, and for all $\nu>0$ positive constants $K_1$, $K_2$ and $\lambda_0$ depending only on $\mathcal{A}$ and $\nu$, such that for all $n\ge 1$ and $\eta\ge\nu$, a version of $W_0,W_1,\ldots,W_n$ and a standard Brownian bridge $(B_t)_{0\le t\le 1}$ exist on the same probability space and satisfy
$$E\exp\Big(\lambda\max_{0\le k\le n}|W_k-\sqrt{n}\eta B_{k/n}|\Big) \le \exp(C\log n)\,E\exp\Big( \frac{K_1\lambda^2S_n^2}{n} + K_2\lambda^2n(\gamma^2-\eta^2)^2 \Big) \quad \text{for all } \lambda\le\lambda_0.$$
Moreover, if $0\notin\mathcal{A}$, then there exist positive constants $K_1$ and $\lambda_0$ depending only on $\mathcal{A}$ such that
$$E\exp\Big(\lambda\max_{0\le k\le n}|W_k-\sqrt{n}\gamma B_{k/n}|\Big) \le \exp(C\log n)\,E\exp\Big( \frac{K_1\lambda^2S_n^2}{n} \Big) \quad \text{for all } \lambda\le\lambda_0,$$
and if in addition $\varepsilon_1,\ldots,\varepsilon_n$ are i.i.d. with zero mean, then there exists a positive $\lambda$ depending only on $\mathcal{A}$ such that
$$P\Big(\max_{0\le k\le n}|W_k-\sqrt{n}\gamma B_{k/n}| \ge \lambda^{-1}C\log n + x\Big) \le 2e^{-\lambda x} \quad \text{for all } x\ge 0.$$

The constant $C$ is given explicitly in (3.8) in the proof of Theorem 3.3.1; its numerical value is roughly 8.4. The constants in the second inequality of Theorem 3.1.2 are those that appear in the first inequality, specialized to a case where the lower bound $\nu$ depends only on $\mathcal{A}$.

Our extension of the Rademacher variable result of [22] requires a number of non-trivial components. Example 3 of [22] demonstrates how to smooth Rademacher variables to obtain Stein coefficients, and the author states 'we do not know yet how to use Theorem 1.2 to prove the KMT theorem in its full generality, because we do not know how to generalize the smoothing technique of Example 3.' We address this point by the zero bias method of Lemma 3.2.3 in Section 2, which shows how any mean zero, finite variance random variable may be smoothed to obtain a Stein coefficient; it may be of independent interest as regards the construction of Stein coefficients.

Additionally, dealing with variables restricted to the set $\{-1,1\}$ avoids another difficulty.
In particular, the second inequality of Theorem 3.1.2 shows that the 'natural scaling' for the approximating Brownian bridge process depends on the variance parameter $\gamma^2=n^{-1}\sum_{i=1}^n\varepsilon_i^2$, which in the case of Rademacher variables is always one. In fact, for such variables, the variance parameter remains the constant one when restricted and suitably scaled to any subset of variables. In contrast, in general, when applying induction to piece together a larger path from smaller ones, their respective variance parameters may not match. This effect gives rise to the term $(\gamma^2-\eta^2)^2$ in the exponent of the first inequality of Theorem 3.1.2, which then needs to be controlled in order for the induction to be completed. In doing so, one gains results on the comparison of the sample paths of a more general class of exchangeable variables to a Brownian bridge.

The second claim of Theorem 3.1.2 is shown under the assumption $0\notin\mathcal{A}$. This condition becomes critical precisely at (3.30), where we require that the smallest absolute value of the elements of $\mathcal{A}$ is positive, from which one then obtains a lower bound $\nu$ on $\gamma$ when invoking Theorem 3.3.1. This same phenomenon occurs in the proof of Lemma 3.4.1, on the way to demonstrating Theorem 3.1.1.

The remainder of this chapter is organized as follows. In Section 2, we prove two theorems, one for coupling sums $S_n$ of i.i.d. random variables, and one for coupling $W_n$ of Theorem 3.1.2, to Gaussians. We also prove Lemma 3.2.3, which shows how to construct Stein type coefficients using smoothing by zero bias variables. Theorems 3.3.1 and 3.1.2, the first result a conditional version of the second, are proved in Section 3, and we prove Lemma 3.4.1, implying Theorem 3.1.1, in Section 4.

2 Bounds for couplings to Gaussian variables

In this section we prove Theorems 3.2.1 and 3.2.2, generalizations of Theorems 3.1 and 3.2 of [22], and our zero bias smoothing result, Lemma 3.2.3. The first theorem gives bounds on couplings of sums $S_n$ of i.i.d. variables, and the second on couplings of certain exchangeable sums, to Gaussian random variables.

Theorem 3.2.1. For every mean zero, variance one bounded random variable $\varepsilon$ satisfying $E(\varepsilon^3)=0$ and $E(\varepsilon^4)<\infty$, there exists $\theta_1>0$ such that for every positive integer $n$ it is possible to construct a version of the sum $S_n=\sum_{i=1}^n\varepsilon_i$ of $n$ independent copies of $\varepsilon$, and $Z_n\sim\mathcal{N}(0,n)$, on a joint probability space such that
$$E\exp(\theta_1|S_n-Z_n|) \le 8.$$

For convenience, we adopt the convention that a normal random variable with mean $\mu$ and zero variance is identically equal to $\mu$.

Theorem 3.2.2. For $n\ge 1$, let $\varepsilon_1,\varepsilon_2,\ldots,\varepsilon_n$ be arbitrary elements of a finite set $\mathcal{A}\subset\mathbb{R}$, not necessarily distinct. Let $\gamma^2=n^{-1}\sum_{i=1}^n\varepsilon_i^2$, let $\pi$ be a uniform random permutation of $\{1,2,\ldots,n\}$, and for each $1\le k\le n$ let
$$S_k = \sum_{i=1}^k\varepsilon_{\pi(i)} \qquad \text{and} \qquad W_k = S_k - \frac{kS_n}{n}. \qquad (2.1)$$
Then for all $\nu>0$ there exist positive constants $c_1$, $c_2$ and $\theta_2$ depending only on $\mathcal{A}$ and $\nu$ such that for any integer $n\ge 1$, an integer $k$ with $|2k-n|\le 1$, and any $\eta\ge\nu$, it is possible to construct a version of $W_k$ and a Gaussian random variable $Z_k$ with mean 0 and variance $k(n-k)/n$ on the same probability space such that for all $\theta\le\theta_2$,
$$E\exp(\theta|W_k-\eta Z_k|) \le \exp\Big( 3 + \frac{c_1\theta^2S_n^2}{n} + c_2\theta^2n(\gamma^2-\eta^2)^2 \Big).$$

We now define Stein coefficients, the key ingredient upon which our approach depends. Let $W$ be a random variable with $E[W]=0$ and finite second moment.
We say the random variable $T$, defined on the same probability space, is a Stein coefficient for $W$ if
$$E[Wf(W)] = E[Tf'(W)] \qquad (2.2)$$
for all Lipschitz functions $f(\cdot)$ and $f'(\cdot)$ any a.e. derivative of $f(\cdot)$, whenever these expectations exist. We recall the following result from Chapter 2.

Theorem 2.1.2 (Chatterjee [22]). Let $W$ be mean zero with finite second moment and suppose that $T$ is a Stein coefficient for $W$ with $|T|$ almost surely bounded by a constant. Then, given any $\sigma^2>0$, we can construct a version of $W$ and $Z\sim\mathcal{N}(0,\sigma^2)$ on the same probability space such that
$$E\exp(\theta|W-Z|) \le 2E\exp\Big( \frac{2\theta^2(T-\sigma^2)^2}{\sigma^2} \Big) \quad \text{for all } \theta\in\mathbb{R}.$$

To prove Theorems 3.2.1 and 3.2.2, first recall the definitions (3.3) and (3.4) of the zero biased and square biased distributions, respectively, and from (3.5) that $X^* =_d UX^\square$, where $X^*$ has the $X$-zero biased distribution, $X^\square$ and $U$ are independent, $U\sim\mathcal{U}[0,1]$, and $X^\square$ has the $X$-square biased distribution.

Smoothing $X$ by adding an independent random variable $Y$ having the $X$-zero bias distribution, we obtain the following result, which will be used to construct Stein coefficients for sums.

Lemma 3.2.3. If $X$ is a mean zero random variable with finite non-zero variance, and $Y$ is an independent variable with the $X$-zero bias distribution, then
$$E[Xf(X+Y)] = E[(X^2-XY)f'(X+Y)]$$
for all Lipschitz functions $f(\cdot)$ and a.e. derivatives $f'(\cdot)$ for which these expectations exist.

Proof. Let $V$ be distributed as $X$, let $V^\square$ have the $X$-square biased distribution, let $U$ be a $\mathcal{U}[0,1]$ random variable, and set $Y=UV^\square$, where $V^\square$, $U$, $V$ and $X$ are independent; by (3.5), $Y$ then has the $X$-zero bias distribution. Note that for any bivariate function $g(\cdot,\cdot)$ for which the expectations below exist, by (3.4) we have
$$E[g(X,V^\square)] = \frac{1}{\sigma^2}E[V^2g(X,V)], \qquad (2.3)$$
where $\sigma^2$ is the variance of $X$. Hence
$$E[(X^2-XY)f'(X+Y)] = E[(X^2-XUV^\square)f'(X+UV^\square)] = E\int_0^1(X^2-XuV^\square)f'(X+uV^\square)\,du$$
$$= E\Big[ \Big[\frac{(X^2-XuV^\square)f(X+uV^\square)}{V^\square}\Big]_{u=0}^{u=1} + X\int_0^1 f(X+uV^\square)\,du \Big]$$
$$= E\Big[\frac{(X^2-XV^\square)f(X+V^\square)-X^2f(X)}{V^\square}\Big] + E[Xf(X+Y)]$$
$$= \frac{1}{\sigma^2}E[V(X^2-XV)f(X+V)-VX^2f(X)] + E[Xf(X+Y)]$$
$$= \frac{1}{\sigma^2}E[VX(X-V)f(X+V)] + E[Xf(X+Y)],$$
where we have used (2.3) in the second to last equality, as well as the independence of $V$ and $X$, and that $EV=0$, in the last. Hence, to prove the claim, it suffices to show that the first term above is zero. Since $X=_dV$, and $V$ and $X$ are independent and hence exchangeable, we have
$$VX(X-V)f(X+V) =_d VX(V-X)f(X+V) = -VX(X-V)f(X+V),$$
demonstrating that the expectation of this expression is zero. $\square$

For any mean zero $X$ with finite, non-zero variance $\sigma^2$, the distribution of $X^*$ is absolutely continuous with density function
$$p_{X^*}(x) = \frac{E[X\mathbf{1}(X>x)]}{\sigma^2}. \qquad (2.4)$$
One finds directly from (2.4) that $a\le X\le b$ for some constants $a<b$ implies
$$a\le X^*\le b. \qquad (2.5)$$
Comparing (2.2) with (3.3), we see that $T$ is a Stein coefficient for $X$ if $\sigma^{-2}E[T|X]$ is the Radon-Nikodym derivative $\frac{d\mu^*}{d\mu}$ of the probability measure $\mu^*$ of $X^*$ with respect to the measure $\mu$ of $X$. Hence, in light of (2.4), if $X$ is a random variable with mean zero and finite variance, having density function $p_X(x)$ whose support is an interval, then setting
$$h_X(x) = \frac{E[X\mathbf{1}(X>x)]}{p_X(x)}\mathbf{1}(p_X(x)>0),$$
we have
$$E[Xf(X)] = E[h_X(X)f'(X)] \qquad (2.6)$$
for all Lipschitz functions $f(\cdot)$ and a.e. derivatives $f'(\cdot)$ for which these expectations exist; that is, $h_X(X)$ is a Stein coefficient for $X$. We note that, by virtue of $E(X)=0$, the numerator in the definition of $h_X$ is non-negative, so $h_X(x)\ge 0$.
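Lemma 3.2.3 is easy to check by simulation in a concrete case (an illustrative sketch; the choice of $X$ and of the test function is ours): take $X$ uniform on $\{-1,+1\}$, whose zero bias distribution is $\mathcal{U}[-1,1]$, as noted later in this chapter.

```python
import numpy as np

rng = np.random.default_rng(6)

# Lemma 3.2.3: E[X f(X+Y)] = E[(X^2 - X Y) f'(X+Y)] for Y independent with the
# X-zero bias law.  Here X is uniform on {-1,+1}, Y ~ U[-1,1], f = sin.
n = 10**6
X = rng.choice([-1.0, 1.0], size=n)
Y = rng.uniform(-1.0, 1.0, size=n)

lhs = np.mean(X * np.sin(X + Y))
rhs = np.mean((X**2 - X * Y) * np.cos(X + Y))
print(lhs, rhs)   # agree up to Monte Carlo error
```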
Now consider a random variable $X$ having vanishing first and third moments, variance strictly between zero and infinity, and satisfying $E(X^4)<\infty$. Then the distribution of a random variable $Y$ having the $X$-zero bias distribution exists, and from (3.3) with $f(x)=x^2$ and $f(x)=x^3$ we respectively find
$$E(Y)=0 \quad \text{and} \quad E(Y^2)<\infty. \qquad (2.7)$$
Moreover, from (2.4) we see that $Y$ has a density function $p_Y(y)$ whose support is a closed interval. Hence the function $h_Y(y)$, given by the definition preceding (2.6), satisfies (2.6).

Lemma 3.2.4. Let $\varepsilon_1,\ldots,\varepsilon_n$ be independent and identically distributed as $\varepsilon$, a random variable with mean zero, finite nonzero variance, and satisfying $E(\varepsilon^3)=0$ and $E(\varepsilon^4)<\infty$, and let $Y$ have the $\varepsilon$-zero bias distribution and be independent of $\varepsilon_1,\ldots,\varepsilon_n$. Then for all Lipschitz functions $f(\cdot)$ and a.e. derivatives $f'(\cdot)$,
$$E[\widetilde{S}_nf(\widetilde{S}_n)] = E[Tf'(\widetilde{S}_n)], \qquad \text{where } \widetilde{S}_n = S_n+Y \text{ with } S_n=\varepsilon_1+\varepsilon_2+\cdots+\varepsilon_n,$$
and
$$T = \sum_{i=1}^n\varepsilon_i^2 - S_nY + h_Y(Y) \qquad \text{with } h_Y(y) = \frac{E[Y\mathbf{1}(Y>y)]}{p_Y(y)}\mathbf{1}(p_Y(y)>0).$$

Proof. With $S_n^{(i)} = S_n-\varepsilon_i$, we have
$$E[\widetilde{S}_nf(\widetilde{S}_n)] = E[S_nf(\widetilde{S}_n)+Yf(\widetilde{S}_n)] = \sum_{i=1}^n E[\varepsilon_if(\varepsilon_i+Y+S_n^{(i)})] + E[Yf(Y+S_n)]. \qquad (2.8)$$
For the first term of (2.8), using that the summands $\varepsilon_i$ are independent and applying Lemma 3.2.3 yields
$$E[\varepsilon_if(\varepsilon_i+Y+S_n^{(i)})] = E[(\varepsilon_i^2-\varepsilon_iY)f'(\varepsilon_i+Y+S_n^{(i)})] = E[(\varepsilon_i^2-\varepsilon_iY)f'(\widetilde{S}_n)].$$
Turning to the second term of (2.8), we first note that by (3.3) the assumption that the third moment of $\varepsilon$ is zero implies $E(Y)=0$. Now, using the independence of $Y$ and $S_n$, (2.6) yields
$$E[Yf(Y+S_n)] = E[h_Y(Y)f'(Y+S_n)] = E[h_Y(Y)f'(\widetilde{S}_n)].$$
Substitution into (2.8) now yields the claim. $\square$

Hoeffding's lemma, e.g. see the proof of Lemma 2.2 of [17], will be used below. It states that if $X$ is a mean zero random variable satisfying $a\le X\le b$ almost surely, then
$$E[\exp(\theta X)] \le e^{(b-a)^2\theta^2/8} \quad \text{for all } \theta\in\mathbb{R}. \qquad (2.9)$$
We also require the 'non-central $\chi^2_1$' moment generating function identity
$$E\exp(\alpha V^2+\beta V) = (1-2\alpha)^{-1/2}\exp\Big( \frac{\beta^2}{2(1-2\alpha)} \Big), \qquad (2.10)$$
valid for the standard Gaussian variable $V$, and all $\beta\in\mathbb{R}$ and $\alpha<1/2$.

For the law $\mathcal{L}(X)$ of any random variable $X$, let
$$\ell(\mathcal{L}(X)) = \inf\{b-a : P(a\le X\le b)=1\},$$
the length of the support of $X$. For notational simplicity we will write $\ell(X)$, or $\ell$ when $X$ is clear from context, for $\ell(\mathcal{L}(X))$. We use, without further mention, that $\ell(X)$ is translation invariant in the sense that $\ell(X)=\ell(X-c)$ for any real number $c$.

Lemma 3.2.5. For every almost surely bounded random variable $X$, there exists a constant $\vartheta_{\ell(X)}\in(0,\infty)$ depending only on $\ell(X)$ such that when $X_1,X_2,\ldots$ are independent random variables distributed as $X$, the sum $S_n=X_1+\cdots+X_n$ and $\mu=EX$ satisfy
$$E\Big[\exp\Big(\frac{\theta^2S_n^2}{n}\Big)\Big] \le \frac{4}{3}\exp\Big(\frac{4}{3}n\theta^2\mu^2\Big) \quad \text{for all } n\ge 1 \text{ and } |\theta|\le\vartheta_{\ell(X)}.$$

The constant $4/3$ is somewhat arbitrary, as any value greater than 1 can be achieved; the proof of Theorem 3.3.1 requires a value strictly less than $3/2$.

Proof. Let $V$ be a $\mathcal{N}(0,1)$ random variable independent of $X$. Using Hoeffding's lemma (2.9) conditionally on $V$, for any function $t(V)$ of $V$ we have
$$E[\exp(t(V)(X-\mu))\,|\,V] \le e^{\ell^2t(V)^2/8}.$$
Applying $E(\exp\theta V)=\exp(\theta^2/2)$, for $\ell\theta<\sqrt{2}$ and $V$ independent of $X_1,X_2,\ldots$, letting $t(V)=\sqrt{2}\theta V/\sqrt{n}$ we obtain
$$E\Big[\exp\Big(\frac{\theta^2S_n^2}{n}\Big)\Big] = E\exp\Big(\sqrt{2}\theta\frac{S_n}{\sqrt{n}}V\Big) = E\Big[\Big(E\Big[\exp\Big(\sqrt{2}\theta\frac{V}{\sqrt{n}}X\Big)\,\Big|\,V\Big]\Big)^n\Big] = E\Big[\big(E[\exp(t(V)X)\,|\,V]\big)^n\Big]$$
$$= E\Big[\big(E[\exp(t(V)(X-\mu)+t(V)\mu)\,|\,V]\big)^n\Big] \le E\Big[\exp\Big(\frac{2\ell^2\theta^2V^2}{8n}+\sqrt{2}\theta\mu\frac{V}{\sqrt{n}}\Big)^n\Big]$$
$$= E\Big[\exp\Big(\frac{\ell^2\theta^2}{4}V^2+\sqrt{2}\theta\mu\sqrt{n}V\Big)\Big] = \frac{1}{\sqrt{1-\ell^2\theta^2/2}}\exp\Big(\frac{n\theta^2\mu^2}{1-\ell^2\theta^2/2}\Big) \le \frac{1}{1-\ell^2\theta^2/2}\exp\Big(\frac{n\theta^2\mu^2}{1-\ell^2\theta^2/2}\Big),$$
where we have applied (2.10) in the final equality.
It is now direct to verify that the property required by the lemma holds by letting $\vartheta_{\ell(X)} = 1/(\sqrt{2}\ell(X))$, the unique positive solution of
$$\frac{1}{1-\ell(X)^2\theta^2/2} = \frac{4}{3}. \qquad \square$$

Lemma 3.2.6. Let $\varepsilon$ be a bounded, mean zero random variable with variance $\sigma^2\in(0,\infty)$ satisfying $E\varepsilon^3=0$. Then the Stein coefficient $h_Y(y)$, given by (2.6) for $Y$ with the $\varepsilon$-zero bias distribution, is bounded.

Proof. As $\varepsilon$ is a mean zero random variable with finite, nonzero variance, the zero bias distribution $\mathcal{L}(Y)$ exists. As $E\varepsilon^3=0$ and $\varepsilon$ is bounded and non-trivial, as in (2.7) one verifies that $EY=0$ and that $\mathrm{Var}(Y)$ is positive and finite. Hence, as noted below (2.6), the Stein coefficient $h_Y(y)$ given by (2.6) is nonnegative, so we need only show that it is bounded above.

From (2.4), an a.e. density of $Y$ is given by
$$p_Y(y) = \frac{1}{\sigma^2}\int_y^\infty u\,dF_\varepsilon(u), \qquad (2.11)$$
where we use $F_X(\cdot)$ to denote the distribution function of the random variable $X$. From (2.11) we may observe that the support of $Y$ is the smallest closed interval of $\mathbb{R}$ containing the support of $\varepsilon$. Since $\varepsilon$ is bounded and has mean zero, using (2.5), this interval is of the form $[a,b]$ for $-\infty<a<0<b<\infty$; hence, for $t\in[a,b]$, the upper limit of the integral in (2.11) may be replaced by $b$. In particular, for all $t\in[0,b]$, by (2.6) we have
$$h_Y(t) = \frac{\int_t^b yp_Y(y)\,dy}{p_Y(t)} = \frac{\int_t^b y\int_y^b u\,dF_\varepsilon(u)\,dy}{\sigma^2p_Y(t)} = \frac{\iint_{t\le y\le u\le b}yu\,dF_\varepsilon(u)\,dy}{\sigma^2p_Y(t)} = \frac{\int_t^b u\int_t^u y\,dy\,dF_\varepsilon(u)}{\sigma^2p_Y(t)} = \frac{\int_t^b u(u^2-t^2)\,dF_\varepsilon(u)}{2\sigma^2p_Y(t)} \le \frac{b^2\int_t^b u\,dF_\varepsilon(u)}{2\sigma^2p_Y(t)} = \frac{b^2}{2},$$
where we have used Fubini's theorem in the fourth equality, and (2.11) in the second and sixth. As $h_{-Y}(t)=h_Y(-t)$, we obtain that $h_Y(t)$ is also bounded for $t\in[a,0]$. $\square$

Proof of Theorem 3.2.1: For short we write $S=\varepsilon_1+\varepsilon_2+\cdots+\varepsilon_n$ and $\widetilde{S}=S+Y$, with $Y$ as in Lemma 3.2.4. As the third moment of $\varepsilon$ is zero and its fourth moment is finite, as in (2.7), $Y$ has mean zero and finite variance, and hence so does $\widetilde{S}$. By Lemma 3.2.4, $T=\sum_{i=1}^n\varepsilon_i^2-SY+h_Y(Y)$ is a Stein coefficient for $\widetilde{S}$. Since $\varepsilon$ is bounded and the third moment of $\varepsilon$ is zero, Lemma 3.2.6 yields that $h_Y(Y)$ is bounded. Also, $\varepsilon$ bounded implies $S$ is bounded. In addition, as $\varepsilon$ is bounded there exists some $B$ such that $|\varepsilon|\le B$, and (2.5) implies $|Y|\le B$. Thus, we conclude that $|T|$ is bounded.

Now, invoking Theorem 2.1.2, there exists a version of $\widetilde{S}$ and $Z\sim\mathcal{N}(0,\sigma^2)$ on the same probability space such that
$$E\exp(\theta|\widetilde{S}-Z|) \le 2E\exp(2\theta^2\sigma^{-2}(T-\sigma^2)^2) \quad \text{for all } \theta\in\mathbb{R}.$$
Using $|Y|\le B$ we have $|S-\widetilde{S}|\le B$. It follows that
$$E\exp(\theta|S-Z|) \le 2E\exp\big(B|\theta| + 2\theta^2\sigma^{-2}(T-\sigma^2)^2\big).$$
Letting $C_0\ge B$ be such that $|h_Y(Y)|\le C_0$, and setting $\sigma^2=n$, we obtain
$$\frac{(T-\sigma^2)^2}{\sigma^2} \le \frac{3\bar{S}^2+3C_0^2S^2+3C_0^2}{n}, \qquad \text{where } \bar{S}=\sum_{i=1}^n(\varepsilon_i^2-1).$$
Hence
$$E\exp(\theta|S-Z|) \le 2\exp\Big(B|\theta|+\frac{6C_0^2\theta^2}{n}\Big)E\exp\Big(\frac{6\theta^2(\bar{S}^2+C_0^2S^2)}{n}\Big) \le \exp\Big(B|\theta|+\frac{6C_0^2\theta^2}{n}\Big)E\Big[\exp\Big(\frac{12\theta^2\bar{S}^2}{n}\Big)+\exp\Big(\frac{12\theta^2C_0^2S^2}{n}\Big)\Big], \qquad (2.12)$$
where we applied the simple inequality $\exp(x+y)\le(e^{2x}+e^{2y})/2$. Noting that for $\bar{S}$ and $S$ the summands $\varepsilon^2-1$ and $\varepsilon$, respectively, are bounded and have mean zero, using Lemma 3.2.5 for the first two inequalities below, we see that there exists $\theta_1>0$ such that for all $|\theta|\le\theta_1$ and all positive integers $n$,
$$E\exp(12\theta^2\bar{S}^2/n)\le 2, \qquad E\exp(12\theta^2C_0^2S^2/n)\le 2 \qquad \text{and} \qquad \exp\Big(B|\theta|+\frac{6C_0^2\theta^2}{n}\Big)\le 2.$$
Theorem 3.2.1 now follows from (2.12). $\square$

We now prepare for the proof of Theorem 3.2.2 by providing a few lemmas. For $\mathcal{A}$ the finite set in which the basic variable $\varepsilon$ takes values, let
$$\mathcal{D} = \{b-a : a,b\in\mathcal{A}\} \qquad \text{and} \qquad \mathcal{D}_+ = \mathcal{D}\cap[0,\infty), \qquad (2.13)$$
the set of differences of the elements of $\mathcal{A}$, and those differences that are non-negative. We note here that $\mathcal{D}$ is symmetric, in that $\mathcal{D}=-\mathcal{D}$.
Let also B = max a∈A |a|. (2.14) Recall the definition (2.1) of W k and observe that we may write W k =S k − k n S n = k X i=1 π(i) − k n n X i=1 π(i) = n−k n k X i=1 π(i) − k n n X i=k+1 e π(i) = 1 n k X i=1 n X j=k+1 ( π(i) − π(j) ), (2.15) 30 and therefore W k = X d∈D + W k,d where W k,d = 1 n k X i=1 n X j=k+1 ( π(i) − π(j) )1(| π(i) − π(j) | =d). (2.16) Lemma 3.2.7. Under the hypotheses of Theorem 3.2.2, for any θ∈ R, 1≤ k≤ n and d∈D + we have E exp(θW k,d / √ k)≤ exp(d 2 θ 2 /2) and E exp(θW k / √ k)≤ exp(B 2 θ 2 ), (2.17) where B is as in (2.14). Further, there exists α 0 > 0 depending only onA such that E[exp(αW 2 k,d /k)]≤ 2 for all|α|≤α 0 and all d∈D + . (2.18) Proof. We may assumed> 0 as the result is otherwise trivial. Fix an integerk in [1,n] and d∈D + , and let m(θ) := E exp(θW k,d / √ k). We argue as in [22]. Since W k,d is bounded, the functionm(θ) is differentiable and differentiation and expectation may be interchanged. Hence, using (2.16) for the second equality, m 0 (θ) = 1 √ k E(W k,d exp(θW k,d / √ k)) = 1 n √ k k X i=1 n X j=k+1 E[( π(i) − π(j) )1(| π(i) − π(j) | =d) exp(θW k,d / √ k)]. (2.19) Now, leti andj satisfying 1≤i≤k<j≤n be arbitrary and letπ 0 =π◦ (i,j) where (i,j) is the transposition ofi andj. Then (π,π 0 ) is an exchangeable pair of random permutations. Let W 0 k,d be defined as in (2.16) with π 0 replacing π. Using exchangeability for the first equality and the definition of π 0 for the second, E[( π(i) − π(j) )1(| π(i) − π(j) | =d) exp(θW k,d / √ k)] =E[( π 0 (i) − π 0 (j) )1(| π 0 (i) − π 0 (j) | =d) exp(θW 0 k,d / √ k)] =E[( π(j) − π(i) )1(| π(i) − π(j) | =d) exp(θW 0 k,d / √ k)] =−E[( π(i) − π(j) )1(| π(i) − π(j) | =d) exp(θW 0 k,d / √ k)]. 31 Averaging the first and last expressions yields E[( π(i) − π(j) )1(| π(i) − π(j) | =d) exp(θW k,d / √ k)] = 1 2 E[( π(i) − π(j) )1(| π(i) − π(j) | =d)(exp(θW k,d / √ k)− exp(θW 0 k,d / √ k))]. (2.20) Note |W k,d −W 0 k,d | = 1 n X k+1≤l≤n,l6=j ( π(i) − π(l) )1(| π(i) − π(l) | =d) + X 1≤l≤k,l6=i ( π(l) − π(j) )1(| π(l) − π(j) | =d) + ( π(i) − π(j) )1(| π(i) − π(j) | =d)− X k+1≤l≤n,l6=j ( π 0 (i) − π 0 (l) )1(| π 0 (i) − π 0 (l) | =d) + X 1≤l≤k,l6=i ( π 0 (l) − π 0 (j) )1(| π 0 (l) − π 0 (j) | =d) + ( π 0 (i) − π 0 (j) )1(| π 0 (i) − π 0 (j) | =d) = 1 n n X l=1 ( π(i) − π(l) )1(| π(i) − π(l) | =d) + n X l=1 ( π(l) − π(j) )1(| π(l) − π(j) | =d) ≤ 1 n [nd +nd] = 2d. Now applying the inequality|e x −e y |≤ 1 2 |x−y|(e x +e y ) we see that (2.20) in absolute value is bounded by |θ| 4 √ k E[| π(i) − π(j) |1(| π(i) − π(j) | =d)|W k,d −W 0 k,d |((exp(θW k,d / √ k) + exp(θW 0 k,d / √ k))] ≤ |θ| 4 √ k 2d 2 E[exp(θW k,d / √ k) + exp(θW 0 k,d / √ k)] = |θ|d 2 √ k m(θ). So, from (2.19), and the fact that 1≤i≤k and k<j≤n are arbitrary, we obtain |m 0 (θ)|≤ 1 n √ k |θ|d 2 √ k k X i=1 n X j=k+1 m(θ)≤d 2 |θ|m(θ). Now, using, m(0) = 1, and that m(θ)≥ 0 for all θ∈R, for θ> 0, we obtain Z θ 0 m 0 (u) m(u) du≤ Z θ 0 d 2 udu =⇒ m(θ)≤ exp(d 2 θ 2 /2) 32 and for θ< 0, we obtain Z 0 θ −m 0 (u) m(u) du≤ Z 0 θ d 2 (−u)du =⇒ m(θ)≤ exp(d 2 θ 2 /2), proving the first inequality of (2.17). Arguing similarly, now letting m(θ) := E exp(θW k / √ k) and W 0 k as in (2.1) with π 0 replacing π, noting|W k −W 0 k | =| π(i) −e π(j) |≤ 2B, we obtain E[( π(i) − π(j) ) exp(θW k / √ k)]≤ |θ| 4 √ k E[( π(i) − π(j) ) 2 ((exp(θW k / √ k) + exp(θW 0 k / √ k))] ≤ |θ|4B 2 4 √ k 2m(θ) = 2|θ|B 2 √ k m(θ), so that|m 0 (θ)|≤ 2B 2 |θ|m(θ), implying the final inequality of (2.17). 
Turning to (2.18), lettingZ be a standard normal random variable independent ofW k,d , by (2.17) and (2.10), for all d∈D + and α< 1/(2d 2 ), we have E exp(αW 2 k,d /k) =E exp √ 2αZW k,d / √ k ≤E exp(d 2 αZ 2 )≤ 1 √ 1−2d 2 α . Now set α 0 so that the bound above is any number no greater than 2 when d is replaced by max{d :d∈D + }. Lemma 3.2.8. Under the assumptions of Theorem 3.2.2 there exists α 1 > 0 depending only onA such that for all n, all 1≤k≤ 2n/3, and all 0≤α≤α 1 , E exp αS 2 k /k ≤ exp 1 + 3αS 2 n 4n ! . Proof. The steps are the same as in the proof of Lemma 3.5 of [22]. For Z a standard normal random variable independent of π, by definition (2.1) of W k we have E exp(αS 2 k /k) =E exp r 2α k S k Z ! =E exp r 2α k W k Z + r 2α k kS n n Z ! . 33 By (2.17), with B given by (2.14), for the first term we obtain the bound E exp √ 2αZW k / √ k Z ≤ exp(2αB 2 Z 2 ). Thus, E exp(αS 2 k /k)≤E exp 2αB 2 Z 2 + r 2α k kS n n Z ! . Recalling S n is nonrandom, using the non central χ 2 1 identity (2.10), we find that E exp(αS 2 k /k)≤ 1 √ 1− 4αB 2 exp αkS 2 n (1− 4αB 2 )n 2 ! for 0<α< 1/(4B 2 ). The proof of the lemma is now completed by bounding k by 2n/3 and choosing α 1 > 0 small enough so that 1/(1− 4α 1 B 2 ) is sufficiently close to 1. Proof of Theorem 3.2.2: We assume θ > 0. Applying our convention that zero variance normal random variables are equal to their mean almost surely, when n = 1 we have S 0 =W 0 =W 1 =Z 0 =Z 1 = 0 and the result holds trivially, so we assumen≥ 2. Recalling definition (2.13) ofD + for each d > 0 inD + and thatD is symmetric, let Y d have the uniform U[−d/2,d/2] distribution, and be independent of each other and of the uniform random permutation π, and for d = 0 let Y 0 = 0. Set Y = X d∈D + Y d . For arbitrary i,j satisfying 1≤i≤k <j≤n letF ij =σ{π(l) :l6∈{i,j}}. Regarding the collection{ 1 ,..., n } as a multiset, we have {ε π(i) ,ε π(j) } ={ε i ,i = 1,...,n}\{ε π(l) ,l6∈{i,j}}, showing that{ε π(i) ,ε π(j) }, and therefore also ε π(i) +ε π(j) and d ij := |ε π(i) −ε π(j) | are measurable with respect toF ij . Further, the conditional distribution of X ij := ε π(i) −ε π(j) 2 givenF ij is uniform over the set{−d ij /2,d ij /2}. 34 Let S (i) k = S k −ε π(i) ,W (i) k = S (i) k − (k/n)S n and Y (ij) = Y−Y d ij . For ε π(i) 6= ε π(j) , applying Lemma 3.2.3 and the easily verified fact that the zero bias distribution of the variable that takes the values{−a,a} with equal probability is uniform over [−a,a], for some fixed Lipschitz function f(·), we have E[(ε π(i) −ε π(j) )f(W k +Y )|F ij ] =2E[X ij f(X ij +Y d ij +W (i) k + (ε π(i) +ε π(j) )/2 +Y (ij) )|F ij ] =2E[(X 2 ij −X ij Y d ij )f 0 (X ij +Y d ij +W (i) k + (ε π(i) +ε π(j) )/2 +Y (ij) )|F ij ] =2E[(X 2 ij −X ij Y d ij )f 0 (W k +Y )|F ij ] =2E[(d 2 ij /4−X ij Y d ij )f 0 (W k +Y )|F ij ]. We note that the equality between the first and final terms above holds also when ε π(i) = ε π(j) , both sides being zero. Taking expectation we obtain E[(ε π(i) −ε π(j) )f(W k +Y )] =E[t ij f 0 (W k +Y )] (2.21) where t ij = 2 d 2 ij 4 −X ij Y d ij ! = ε 2 π(i) +ε 2 π(j) 2 −ε π(i) ε π(j) − (ε π(i) −ε π(j) )Y d ij . (2.22) It is easy to verify using (2.6), or by integration by parts, that for U∼ U[−a,a], E[Uf(U)] = 1 2 E[(a 2 −U 2 )f 0 (U)], implying E[Yf(W k +Y )] = X d∈D + E[Y d f(Y d +W k + (Y−Y d )] = 1 2 X d∈D + E " d 2 4 −Y 2 d ! f 0 (Y d +W k + (Y−Y d ) # =E[R 4 f 0 (W k +Y )], (2.23) where R 4 = 1 2 X d∈D + d 2 4 −Y 2 d ! . 35 SinceD + is finite there exists C 0 > 0 so that |R 4 |≤C 0 . 
(2.24) From (2.15), W k = 1 n k X i=1 n X j=k+1 (ε π(i) −ε π(j) ), so letting f W k =W k +Y (2.25) and combining (2.21) and (2.23), we have E[ f W k f( f W k )] =E[Tf 0 ( f W k )], (2.26) where the Stein coefficient T , in light of (2.22), is given by T = 1 n k X i=1 n X j=k+1 t ij +R 4 =R 1 −R 2 −R 3 +R 4 , where R 1 = 1 2n (n−k) k X i=1 ε 2 π(i) +k n X j=k+1 ε 2 π(j) , R 2 = 1 n k X i=1 ε π(i) n X j=k+1 ε π(j) , and R 3 = 1 n X 1≤i≤k<j≤n ( π(i) − π(j) )Y d ij = X d∈D + Y d 1 n X 1≤i≤k<j≤n ( π(i) − π(j) )1(| π(i) − π(j) | =d) = X d∈D + Y d W k,d , with W k,d as in (2.16). Since|Y d |≤d/2, we have |R 3 |≤ X d∈D + d 2 |W k,d |. (2.27) 36 Recalling that ν > 0 is a given fixed number, and that γ 2 = 1 n n X i=1 ε 2 i , set e σ 2 = k(n−k)γ 2 n and σ 2 = k(n−k)η 2 n , for a positive constant η≥ν, noting that since n≥ 2 both σ 2 and e σ 2 are positive. Then, T−σ 2 2 σ 2 = n k(n−k)η 2 R 1 −σ 2 −R 2 −R 3 +R 4 2 ≤ n k(n−k)ν 2 R 1 −σ 2 −R 2 −R 3 +R 4 2 . (2.28) To bound this quantity, consider first R 1 −e σ 2 = 1 2n (n−k) k X i=1 ε 2 π(i) +k n X j=k+1 ε 2 π(j) −e σ 2 = 1 2n (n−k) k X i=1 ε 2 π(i) +k n X j=k+1 ε 2 π(j) − k(n−k) n 2 n X i=1 ε 2 π(i) = 1 2n (n−k) k X i=1 ε 2 π(i) +k n X j=k+1 ε 2 π(j) − 2k(n−k) n n X i=1 2 π(i) = 1 2n (n−k) k X i=1 ε 2 π(i) − k n n X i=1 ε 2 π(i) ! +k n X j=k+1 ε 2 π(j) − n−k n n X j=1 ε 2 π(j) = 1 2n (n−k) n−k n k X i=1 ε 2 π(i) − k n n X i=k+1 ε 2 π(i) +k k n n X j=k+1 ε 2 π(j) − n−k n k X j=1 ε 2 π(j) = 1 2n 2 (n−k) 2 −k(n−k) k X i=1 ε 2 π(i) − (n−k)k−k 2 n X j=k+1 ε 2 π(j) = n− 2k 2n 2 (n−k) k X i=1 ε 2 π(i) −k n X j=k+1 ε 2 π(j) . Hence, for all k = 1, 2,...,n, with B as in (2.14), |R 1 −σ 2 |≤ |n− 2k| 2n 2 (n−k) k X i=1 ε 2 π(i) +k n X i=k+1 ε 2 π(j) +|e σ 2 −σ 2 | ≤ |n− 2k| 2 γ 2 +|e σ 2 −σ 2 |≤ |n− 2k| 2 B 2 +|e σ 2 −σ 2 |. 37 Choosing k such that|n− 2k|≤ 1, we obtain |R 1 −σ 2 |≤ B 2 2 +|e σ 2 −σ 2 |. (2.29) Regarding R 2 , for any k∈{1,...,n} we have |R 2 | = 1 n k X i=1 ε π(i) n X j=k+1 ε π(j) ≤B|S k |. (2.30) Hence, for k such that|2k−n|≤ 1, from (2.28), (2.29), (2.30), (2.27) and (2.24), and that |e σ 2 −σ 2 | =| k(n−k) n (γ 2 −η 2 )|, we obtain (T−σ 2 ) 2 σ 2 ≤ n k(n−k)ν 2 B 2 /2 +|e σ 2 −σ 2 | +B|S k | + X d∈D + d 2 |W k,d | +C 0 2 ≤C 1 +n(γ 2 −η 2 ) 2 + S 2 k k + X d∈D + W 2 k,d k for some constant C depending uniquely onA and ν. We now verify that the hypotheses of Theorem 2.1.2 hold for f W k of (2.25) andT . Clearly f W k satisfiesE( f W k ) = 0 andE( f W 2 k )<∞. By (2.26), T is a Stein coefficient for f W k , and T is easily verified to be bounded. Writing Z for short for ηZ k in the statement of Theorem 3.2.2, we note that Z is distributedN (0,σ 2 ), and by Theorem 2.1.2 we can construct a version of f W k and Z on the same probability space so that for all θ, E exp(θ| f W k −Z|)≤ 2E exp(2θ 2 σ −2 (T−σ 2 ) 2 ) ≤ 2E exp 2Cθ 2 1 +n(γ 2 −η 2 ) 2 + S 2 k k + X d∈D + W 2 k,d k . 38 With D = 1 2 P d∈D +d we have|W k − f W k |≤|Y|≤ P d∈D + |Y d |≤D. Letting q =|D + | + 1, we have E exp(θ|W k −Z|) ≤ 2 exp(D|θ| + 2Cθ 2 + 2Cθ 2 n(γ 2 −η 2 ) 2 )E exp 2Cθ 2 S 2 k k + 2Cθ 2 X d∈D + W 2 k,d k ≤ 2 q exp(D|θ| + 2Cθ 2 + 2Cθ 2 n(γ 2 −η 2 ) 2 ) E exp 2Cqθ 2 S 2 k k ! + X d∈D + E exp 2Cqθ 2 W 2 k,d k ! , by the convexity of the exponential function. Using Lemmas 3.2.8 and 3.2.7, there exists θ 3 > 0 depending only onA and ν such that for all θ≤θ 3 , we obtain E exp(θ|W k −Z|)≤ 2 q exp(D|θ| + 2Cθ 2 + 2Cθ 2 n(γ 2 −η 2 ) 2 ) exp 1 + 6Cqθ 2 S 2 n 4n ! + 2(q− 1) ! ≤ 2 exp(D|θ| + 2Cθ 2 + 2Cθ 2 n(γ 2 −η 2 ) 2 ) exp 1 + 6Cqθ 2 S 2 n 4n ! + 2 ! . 
Now choose θ 4 > 0, depending only onA and ν, so that 2 exp(Dθ 4 + 2Cθ 2 4 )≤e and note exp 1 +θ 2 x + 2≤ exp 2 +θ 2 x for all x≥ 0, implying that for θ≤θ 2 :=θ 3 ∧θ 4 , E exp(θ|W k −Z|)≤ exp 1 + 2Cθ 2 n(γ 2 −η 2 ) 2 + 2 + 6Cqθ 2 S 2 n 4n ! = exp 3 + 6Cqθ 2 S 2 n 4n + 2Cθ 2 n(γ 2 −η 2 ) 2 ! , which is the desired bound. 3 The induction step In this section we present Theorem 3.3.1, which we use to prove Theorem 3.1.2 that gener- alizes Theorem 2.1.4 by [22]. Let 1 , 2 ,... n be arbitrary elements of a finite setA⊂R, 39 not necessarily distinct, and let π be a uniform random permutation of{1, 2,...,n}. For each 1≤k≤n recall S k = k X i=1 π(i) and W k =S k − kS n n . (3.1) We show (W 1 ,...,W n ) and a positive multiple of a Gaussian vector (Z 1 ,...,Z n ) ob- tained by evaluating a Brownian bridge process on [0,n] at integer time points can be coupled on the same space so that the moment generating function of their maximum abso- lute difference achieves the exponential bound (3.6) below. In place of coupling, the result of the theorem can be equivalently stated in terms of the existence of a joint probability functionρ n (s, z) on (S 1 ,...,S n ) and (Z 1 ,...,Z n ) having the correct marginals whose joint realization obeys the desired bound. It will be helpful to regard the collection ={ 1 ,..., n } as a multiset. We say s∈R n is a ‘path’ corresponding to a multiset of ‘increments’ when there exists π∈P n , the set of permutations on{1,...,n}, such that s can be achieved by summing the increments in the order given by π, that is, when s is an element of the set of all feasible paths A n :={s∈R n :s k = k X i=1 π(i) ,k = 1,...,n,π∈P n }. (3.2) Conversely, the multiset of increments corresponding to a path s is given by s ={s 1 ,s 2 −s 1 ,...,s n −s n−1 }, (3.3) so that s∈A n if and only if s = . Suppose that among arel distinct numbers, appearing with multiplicities m 1 ,...,m l , necessarily summing to n. Then letting f n (s) be the probability mass function of (S 1 ,...,S n ) as given by (3.1), we have |A n | = n! m 1 !m 2 !...m l ! and f n (s) = 1 |A n | 1(s∈A n ) = 1 |A n | 1( s = ), (3.4) that is, the distribution f n (s) is uniform overA n . The following result is a conditional version of Theorem 3.1.2. Theorem 3.3.1. Let 1 , 2 ,... n be arbitrary elements of a finite setA⊂R, not necessarily distinct, π a uniform random permutation of{1, 2,...,n}, S k and W k as in (3.1), and γ 2 = n −1 P n i=1 2 i . Then there exists a positive universal constant C, and for every ν > 0 40 positive constantsK 1 ,K 2 andλ 0 depending only onA andν such that for any integern≥ 1 and everyη≥ν one may construct a version of (W k ) 0≤k≤n and Gaussian random variables (Z k ) 0≤k≤n with zero mean and covariance Cov(Z i ,Z j ) = (i∧j)(n− (i∨j)) n (3.5) on the same probability space such that E exp(λ max 0≤i≤n |W i −ηZ i |) ≤ exp C logn + K 1 λ 2 S 2 n n +K 2 λ 2 n(γ 2 −η 2 ) 2 ! for any λ≤λ 0 . (3.6) Proof. As the result holds trivially for λ≤ 0 we need consider only λ> 0. Also, asW 0 = 0 and Z 0 = 0 by convention it suffices to consider the maximum over 1≤i≤n in (3.6). We use Theorem 3.2.2 and induction to prove the theorem. Recall the constantsα 1 from Lemma 3.2.8 depending only onA, andc 1 ,c 2 andθ 2 from Theorem 3.2.2, depending only onA andν. WithB given in (2.14), lettingθ 5 be the unique positive solution to 1 p 1−B 4 θ 2 /2 = 4 3 , (3.7) depending only onA. We will demonstrate the claim holds with C = 2 + log 4 log(3/2) , K 1 = 8c 1 , K 2 = 18c 2 and λ 0 = r α 1 32c 1 ∧ θ 2 2 ∧ θ 5 √ 72c 2 . 
(3.8) Note that any multiset ={ 1 ,..., n } of elements ofA lies in exactly one set of the form B n (a,b) ={{ 1 , 2 ,..., n } : n X i=1 i =a, 1 n n X i=1 2 i =b 2 } as a and b range over all pairs of feasible values of S n and γ, respectively. Fix one such feasible pair a,b, which may be notationally suppressed when clear from context, let ∈ B n (a,b) be arbitrary and fix any value η> 0. With f n (s) the probability mass function of (S 1 ,...,S n ) given in (3.4) and φ n (z) the probability density function of a Gaussian random vector (Z 1 ,...,Z n ) with mean zero and 41 covariance (3.5), we show that for eachn≥ 1, we can construct a joint probability function ρ n (s, z) onA n ×R n having the desired marginals X s∈A n ρ n (s, z) =φ n (z) and Z R n ρ n (s, z)dz =f n (s) (3.9) and satisfying the exponential bound Z R n X s∈A n exp λ max 1≤i≤n s i − ia n −ηz i ρ n (s, z) dz ≤ exp C logn + K 1 λ 2 a 2 n +K 2 λ 2 n(b 2 −η 2 ) 2 ! for all λ∈ (0,λ 0 ], (3.10) for all η ≥ ν, with C,K 1 ,K 2 and λ 0 as in (3.8), with C universal and the latter three constants depending only onA and ν. We will prove the claim by induction onn. Forn = 1 we note thatW 1 = 0 by (2.1) and Z 1 = 0 by convention, since it has mean zero and covariance given by (3.5). Hence (3.6) holds for n = 1 for all C, all nonnegative K 1 ,K 2 , and all λ 0 , and in particular for the set of constants specified in (3.8). Givenn≥ 2, suppose that for all l = 1, 2,...,n− 1 and all multisubsetsζ ζ ζ ofA of sizel we can construct ρ l ζ ζ ζ (s, z) satisfying (3.9) and (3.10). Take k = [n/2], lett denote multiset union and define the sets S n,k ={s : X ∈ 1 =s for some 1 , 2 such that| 1 | =k, 1 t 2 = }, and B n,k (s) ={( 1 , 2 ) : X ∈ 1 =s,| 1 1 1 | =k, 1 t 2 = } for s∈S n,k . That is,S n,k is the set of all feasible values at time k of a path having increments , and B n,k (s) is the set of all ways of dividing the n increments into sets of sizes k and n−k so that the path at time k takes the value s. Counting the number of paths that take the values∈S n,k at timek shows thatg n,k (s), the marginal density ofS k inf n (s), is given by g n,k (s) = P (ζ ζ ζ 1 ,ζ ζ ζ 2 )∈B n,k (s) |A k ζ ζ ζ 1 ||A n−k ζ ζ ζ 2 | |A n | . (3.11) Similarly, let h n,k (z) denote the marginal density function of Z k in φ n (z), that of the Gaussian distribution with mean zero and variance k(n−k)/n. By Theorem 3.2.2, there 42 exists a joint density function ψ n,k (s,z) onS n,k ×R and positive constants c 1 ,c 2 and θ 2 , depending only onA and ν, such that Z ψ n,k (s,z)dz =g n,k (s), X s∈S n,k ψ n,k (s,z) =h n,k (z), (3.12) and for θ≤θ 2 and η≥ν, Z X s∈S n,k exp θ s− ka n −ηz ψ n,k (s,z) dz≤ exp 3 + c 1 θ 2 a 2 n +c 2 θ 2 n(b 2 −η 2 ) 2 ! . (3.13) For s∈S n,k ,z∈ R, and recalling the definition (3.3) of s , s 1 , s 2 such that ( s 1 , s 2 )∈ B n,k (s), z 1 ∈R k and z 2 ∈R n−k , let γ n (s,z, s 1 , z 1 , s 2 , z 2 ) =ψ n,k (s,z)P ,s ( s 1 , s 2 )ρ k s 1 (s 1 , z 1 )ρ n−k s 2 (s 2 , z 2 ) (3.14) where P ,s ( 1 , 2 ) = |A k 1 ||A n−k 2 | P (ζ ζ ζ 1 ,ζ ζ ζ 2 )∈B n,k (s) |A k ζ ζ ζ 1 ||A n−k ζ ζ ζ 2 | 1(( 1 , 2 )∈B n,k (s)). 
Interpreting (3.14) in terms of a construction, one first samples the joint values s andz of the coupled random walk and Gaussian path at time k, then chooses increments corre- sponding to s 1 and s 2 , the first and last half of the walk according to their likelihood over the choices of those whose increments over the first half of the walk sum to s, and whose union of increments over both halves must be , and then samples coupled values of the paths with discrete Brownian bridges before and after time k. One may verify that γ n is a density function by integrating over z 1 and z 2 using the second equality in (3.9) followed by applying the second equality in (3.4), integrating overz, and then summing over all s 1 and s 2 ands, this last operation being equivalent to summing over all paths s with increments , see (3.17) and the explanation following. Now, let (S,Z, S 1 , Z 1 , S 2 , Z 2 ) be a random vector with densityγ n where S 1 = (S 1 i ) 1≤i≤k , S 2 = (S 2 i ) 1≤i≤n−k and Z 1 = (Z 1 i ) 1≤i≤k , Z 2 = (Z 2 i ) 1≤i≤n−k . Let S be obtained by ‘piecing’ the paths S 1 and S 2 together at time k according to the rule S i = S 1 i 1≤i≤k S +S 2 i−k k<i≤n, (3.15) 43 here noting S k =S, and define Z by Z i = Z 1 i + i k Z 1≤i≤k Z 2 i−k + n−i n−k Z k<i≤n, (3.16) here noting likewise that Z k = Z, since Z 1 k = 0. Now as in [22], we demonstrate that ρ n (s, z), the joint density of (S, Z), achieves the desired marginals (3.9) and exponential bound (3.10). 1. Marginal distribution of S. Let s s s be the path constructed from s, s 1 and s 2 as S is constructed from S, S 1 and S 2 in (3.15). Note that {s : s∈A n } ={s : ( s 1 s 1 s 1 , s 2 s 2 s 2 )∈B n,k (s k )}, and that S k =S almost surely. Hence, if S6∈A n then from (3.14) S has probability zero. For the marginal of γ n to be non-zero on s, s 1 , s 2 , first s must be a feasible value at time k for a path with increments , then s 1 must be a path of increments that attains the value s at time k, and finally the collection of increments determined by s 1 and s 2 must match the given set of increments. In this case we obtain from (3.9), (3.12) and (3.11), that the marginal distribution of (S, S 1 , S 2 ) is given by Z γ n (s,z, s 1 , z 1 , s 2 , z 2 )dz 2 dz 1 dz =g n,k (s)P ,s ( s 1 , s 2 )f k s 1 (s 1 )f n−k s 2 (s 2 ) = P (ζ ζ ζ 1 ,ζ ζ ζ 2 )∈B n,k (s) |A k ζ ζ ζ 1 ||A n−k ζ ζ ζ 2 | |A n | |A k s 1 ||A n−k s 2 | P (ζ ζ ζ 1 ,ζ ζ ζ 2 )∈B n,k (s) |A k ζ ζ ζ 1 ||A n−k ζ ζ ζ 2 | 1 |A k s 1 ||A n−k s 2 | = 1 |A n | =f n (s s s). (3.17) Now observing that (3.15) gives a one-to-one correspondence between (S, S 1 , S 2 ) and S we find that S has marginal density f n (s s s) as in (3.4). 2. Marginal distribution of Z. Consider A k 1 1 1 ×A n−k 2 2 2 , the set of all pairs of paths (s 1 , s 2 ) with increments 1 and 2 respectively. 
Using (3.9) and (3.12), and noting that 44 ( s 1 , s 2 ) = ( 1 , 2 ) for (s 1 , s 2 )∈A k 1 1 1 ×A n−k 2 2 2 , the marginal distribution ofZ, Z 1 , Z 2 is given by X s∈S n,k X ( 1 , 2 )∈B n,k (s) X (s 1 ,s 2 )∈A k 1 1 1 ×A n−k 2 2 2 γ n (s,z, s 1 , z 2 , s 2 , z 2 ) = X s∈S n,k ψ n,k (s,z) X ( 1 , 2 )∈B n,k (s) P ,s ( 1 , 2 ) X (s 1 ,s 2 )∈A k 1 1 1 ×A n−k 2 2 2 ρ k 1 (s 1 , z 1 )ρ n−k 2 (s 2 , z 2 ) = X s∈S n,k ψ n,k (s,z) X ( 1 , 2 )∈B n,k (s) P ,s ( 1 , 2 ) X s 1 ∈A k 1 ρ k 1 (s 1 , z 1 ) X s 2 ∈A n−k 2 ρ n−k 2 (s 2 , z 2 ) = X s∈S n,k ψ n,k (s,z) X ( 1 , 2 )∈B n,k (s) P ,s ( 1 , 2 )φ k (z 1 )φ n−k (z 2 ) =φ k (z 1 )φ n−k (z 2 ) X s ψ n,k (s,z) =φ n−k (z 2 )φ k (z 1 )h n,k (z) where we have used that P ( 1 , 2 )∈B n,k (s) P ,s ( 1 , 2 ) = 1. Hence Z, Z 1 and Z 2 are indepen- dent with densities h n,k (z), φ k (z 1 ) and φ n−k (z 2 ) respectively, implying that Z given by (3.16) is a multivariate mean zero Gaussian random vector. As in [22], one can verify that Z has covariances given by (3.5), and hence Z∼φ n (z). 3.The exponential bound. For 1≤i≤n, letting W i =S i − ia n , we show that E exp(λ max 1≤i≤n |W i −ηZ i |)≤ exp C logn + K 1 λ 2 a n +K 2 λ 2 n(b 2 −η 2 ) 2 ! for λ∈ (0,λ 0 ] where C,K 1 ,K 2 and λ 0 are as in (3.8). We continue to proceed as in [22]. Again writing S for S k , let T L := max 1≤i≤k S 1 i − iS k −ηZ 1 i ,T R := max k<i≤n S 2 i−k − i−k n−k (a−S)−ηZ 2 i−k , 45 and T := S− ka n −ηZ . Note that when 1≤i≤k we have |W i −ηZ i | = S 1 i − ia n −η Z 1 i + iZ k ≤ S 1 i − iS k −ηZ 1 i + iS k − ia n − i k ηZ ≤T L + i k T≤T L +T. Similarly for k<i≤n one can verify|W i −ηZ i |≤T R +T , proving max 1≤i≤n |W i −ηZ i |≤ max{T L +T,T R +T}. Now fixing λ≤λ 0 , the inequality exp(x∨y)≤e x +e y yields exp(λ max 1≤i≤n |W i −ηZ i |)≤ exp(λT L +λT ) + exp(λT R +λT ). (3.18) To prove that the exponential bound holds, we develop inequalities on the expectation of the two quantities on the right hand side of (3.18), starting with the expression involving T L . Note that s 1 determines S, and since is fixed s 2 is also determined, so by (3.14) the conditional density of (S 1 , Z 1 ) given ( S 1 ,Z) is ρ k S 1 (s 1 , z 1 ). Now using that the moment generating functions of T L and T are finite everywhere and that T is a function of{S,Z}, invoking the induction hypothesis and applying the Cauchy-Schwarz inequality twice, with γ 2 1 = (1/k) P k i=1 2 π(i) we obtain E exp(λT L +λT ) =E h E exp(λT L )| S 1 ,Z exp(λT ) i ≤ E E exp(λT L )| S 1 ,Z 2 E(exp(2λT )) 1/2 ≤ exp(C logk) h E exp 2K 1 λ 2 S 2 k + 2K 2 λ 2 k(γ 2 1 −η 2 ) 2 E exp(2λT ) i 1/2 ≤ exp(C logk) h E exp 4K 1 λ 2 S 2 k E exp 4K 2 λ 2 k(γ 2 1 −η 2 ) 2 i 1/4 (E exp(2λT )) 1/2 .(3.19) 46 For the first expectation in (3.19), (3.8) implies that 0≤ 4K 1 λ 2 ≤α 1 , and as|2k−n|≤ 1 we may invoke Lemma 3.2.8 to yield E exp 4K 1 λ 2 S 2 k ! ≤ exp 1 + 3K 1 λ 2 a 2 n ! . (3.20) For the second expectation in (3.19), recalling the definition of γ 2 1 , E exp 4K 2 λ 2 k(γ 2 1 −η 2 ) 2 =E exp 4K 2 λ 2 1 k k X i=1 ( 2 π(i) −η 2 ) ! 2 =E exp θ 2 U 2 k k ! ,(3.21) where θ = 2λ √ K 2 , and we write U k = k X i=1 ( 2 π(i) −η 2 ) = n X i=1 2 i 1(i∈π([k]))− k n η 2 = n X i=1 a i , where [k] ={1,...,k} so that π([k]) ={π(i) : i = 1, 2,...,k}, and a i = 2 i 1(i∈π([k]))− (k/n)η 2 . To bound (3.21), we will argue as in Lemma 3.2.5. Observe that forV a standard normal random variable independent of U k , E exp θ 2 U 2 k k ! 
=E exp √ 2θ V √ k U k =E exp √ 2θ |V|sgn(V ) √ k U k =E exp √ 2θ |V| √ k U k sgn(V ) = 1 P (sgn(V ) = 1) +E exp √ 2θ |V| √ k (−U k ) sgn(V ) =−1 P (sgn(V ) =−1). Now using the independence of|V| and sgn(V ), and that sgn(V ) is a symmetric±1 random variable, we obtain E exp θ 2 U 2 k k ! = 1 2 E exp √ 2θU k |V| √ k +E exp √ 2θ(−U k ) |V| √ k . (3.22) Recall that random variables X 1 ,X 2 ,...,X n are said to be negatively associated, see [46], if for any two disjoint index sets I and J, E[f(X i ,i∈I)g(X j ,j∈J)]≤E[f(X i ,i∈I)]E[g(X j ,j∈J)] (3.23) 47 for all coordinatewise nondecreasing functions f :R |I| →R and g :R |J| →R. Let X 1 ,...,X n be negatively associated. It is immediate that aX 1 +b,...,aX n +b are negatively associated for all a≥ 0 and b∈ R. In addition, letting Y i =−X i for all i = 1,...,n, for f(·) and g(·) coordinatewise nondecreasing functions and I and J disjoint index sets, as−f(−·) is coordinatewise nondecreasing, we have E[f(Y i ,i∈I)g(Y j ,j∈J)] =E[(−f(−X i ,i∈I))(−g(−X j ,j∈J))] ≤E[(−f(−X i ,i∈I))]E[(−g(−X j ,j∈J))] =E[f(Y i ,i∈I)]E[g(Y j ,j∈J)], demonstrating that−X 1 ,...,−X n are negatively associated. Combining these two facts, aX 1 +b,...,aX n +b are negatively associated for alla∈R andb∈R. By a direct inductive argument on (3.23), E " n Y i=1 f i (X i ) # ≤ n Y i=1 E [f i (X i )] (3.24) whenever the functions f i ,i = 1, 2,...,n are all nondecreasing. By Theorem 2.11 of [46], taking the real numbers in Definition 2.10 there to consist of k ones and n−k zeros, the indicators 1(1∈π([k])),...,1(n∈π([k])) are negatively associated; hence so are a 1 ,...,a n and−a 1 ,...,−a n . Thus, by (3.24), we have E exp √ 2θU k |V| √ k V =E " exp √ 2θ n X i=1 a i |V| √ k ! V # ≤ n Y i=1 E exp √ 2θa i |V| √ k V = k Y i=1 E exp √ 2θ 2 π(i) −η 2 |V| √ k V . (3.25) Now since−η 2 ≤ 2 π(i) −η 2 ≤B 2 −η 2 , using Hoeffding’s lemma (2.9) with μ =b 2 −η 2 , the mean of 2 π(i) −η 2 , we obtain k Y i=1 E exp √ 2θ( 2 π(i) −η 2 ) |V| √ k V ≤ exp B 4 θ 2 V 2 4k + √ 2θμ |V| √ k ! k = exp B 4 θ 2 V 2 4 + √ 2θμ √ k|V| ! ≤ exp B 4 θ 2 V 2 4 + √ 2θμ √ kV ! + exp B 4 θ 2 V 2 4 + √ 2θμ √ k(−V ) ! . 48 Using thatV and−V have the same distribution, taking expectation in (3.25) and then applying the non-central chi square identity (2.10) yields E exp √ 2θU k |V| √ k ≤ 2E " exp B 4 θ 2 V 2 4 + √ 2θμ √ kV !# = 2 p 1−B 4 θ 2 /2 exp kθ 2 μ 2 p 1−B 4 θ 2 /2 ! ≤ 8 3 exp 4 3 kθ 2 μ 2 (3.26) for all 0≤θ≤θ 5 , by (3.7). Using the fact that−a 1 ,...,−a n are negatively associated and that−a i and a i have supports over intervals of equal length for all i = 1, 2,...,n, (3.26) holds with U k replaced by−U k . Thus, by (3.22), E exp θ 2 U 2 k k ! ≤ 8 3 exp 4 3 kθ 2 μ 2 for 0≤θ≤θ 5 . (3.27) Using (3.8) we see that 0≤ 4K 2 λ 2 ≤ θ 2 5 , and as k≤ 2n 3 , by (3.21) and (3.27), and recalling that μ =b 2 −η 2 , we have E exp 4K 2 λ 2 k(γ 2 1 −η 2 ) 2 ≤ 8 3 exp 16 3 K 2 λ 2 k(b 2 −η 2 ) 2 ≤ 3 exp 32 9 K 2 λ 2 n(b 2 −η 2 ) 2 . (3.28) For the third expectation in (3.19), again by (3.8), 0≤ 2λ≤θ 2 . Hence by (3.13), E exp(2λT )≤ exp 3 + 4c 1 λ 2 a 2 n + 4c 2 λ 2 n(b 2 −η 2 ) 2 ! . (3.29) Applying bounds (3.20), (3.28) and (3.29) in (3.19), and setting Q 12 = 1 + 3K 1 λ 2 a 2 n + 32K 2 λ 2 n(b 2 −η 2 ) 2 9 and Q 3 = 3 + 4c 1 λ 2 a 2 n + 4c 2 λ 2 n(b 2 −η 2 ) 2 , we obtain E exp(λT L +λT )≤ 3 1/4 exp C logk + 1 4 Q 12 + 1 2 Q 3 ≤ 2 exp C logk + 2 + (3K 1 + 8c 1 )λ 2 a 2 4n + (8K 2 + 18c 2 ) 9 λ 2 n(b 2 −η 2 ) 2 ! . 49 Again by (3.8), 3K 1 + 8c 1 = 4K 1 and 8K 2 + 18c 2 = 9K 2 . 
Since k≤ 2n/3, we have logk = logn− log(n/k)≤ logn− log(3/2). Thus, using from (3.8) that C log(3/2) = log 4 + 2, E exp(λT L +λT )≤ 2 exp C logn−C log(3/2) + 2 + K 1 λ 2 a 2 n +K 2 λ 2 n(b 2 −η 2 ) 2 ! = 1 2 exp C logn + K 1 λ 2 a 2 n +K 2 λ 2 n(b 2 −η 2 ) 2 ! . In like manner we obtain this same bound onE exp(λT R +λT ), so (3.18), now yields exp(λ max 1≤i≤n |W i −ηZ i |)≤ exp C logn + K 1 λ 2 a 2 n +K 2 λ 2 n(b 2 −η 2 ) 2 ! . This step completes the induction, and the proof. Proof of Theorem 3.1.2: LetA be the set of the r distinct values{a 1 ,...,a r } and let 1 , 2 ,..., n be exchangeable random variables taking values inA. Let M = (M 1 ,...,M r ) where for j = 1,...,r we set M j = n X i=1 1( i =a j ), the number of components of the multiset ={ 1 ,..., n } that take on the value a j . With L denoting distribution, or law, clearly L( 1 , 2 ,..., n ) = X m≥0 L( 1 , 2 ,..., n |M = m)P (M = m) where m = (m 1 ,...,m r ) and m ≥ 0 is to be interpreted componentwise. As M is a symmetric function of 1 , 2 ,..., n , the conditional lawL( 1 , 2 ,..., n |M = m) inherits exchangeability fromL( 1 , 2 ,..., n ), that is, L( 1 , 2 ,..., n |M = m) = d L( π(1) , π(2) ,..., π(n) |M = m) where π is uniformly chosen fromP n . In particular, given M = m, k X i=1 i = d k X i=1 π(i) for all k = 1,...,n . 50 Hence, (3.6) of Theorem 3.3.1 yields the version of the first claim of Theorem 3.1.2 when conditioning on M, and taking expectation over M yields that result. We now demonstrate the second claim under the assumption that 06∈A, which together withA finite implies that ν = min a∈A |a| (3.30) is positive. With this value ofν the constantsc 1 ,c 2 andθ 2 as given by Theorem 3.2.2 depend only onA, and letC,K 1 ,K 2 andλ 0 be as given in (3.8) for thisν. Asγ≥ν, conditional on 1 ,..., n , inequality (3.6) of Theorem 3.3.1 holds forη =γ, and the argument is completed by taking expectation over M as for the proof of the first claim. For the last claim, under the hypotheses that 1 ,..., n are i.i.d. mean zero random variables, since K 1 depends only onA, by Lemma 3.2.5 there exists λ> 0 depending only onA such that E K 1 λ 2 S 2 n n ! ≤ 2. Thus from the second claim of the theorem we obtain E exp(λ max 0≤k≤n |W k − √ nγB k/n |)≤ 2 exp(C logn), and applying Markov’s inequality yields P max 0≤k≤n |W k − √ nγB k/n |≥λ −1 C logn +x ≤ E exp(λ max 0≤k≤n |W k − √ nγB k/n |) exp(C logn) e −λx ≤ 2 exp(C logn) exp(C logn) e −λx = 2e −λx . 4 Proof of Theorem 3.1.1 In this final section we prove Theorem 3.1.1 by first demonstrating a ‘finite n version’ of the desired result in the following lemma. Lemma 3.4.1. There exists a constant A > 1 such that for every finite setA of real numbers not containing zero, there exists a constantλ> 0 such that for any positive integer n, any , 1 , 2 ,..., n i.i.d. random variables with mean zero and variance one satisfying 51 E 3 = 0 and taking values inA, and S k = P k i=1 i ,k = 1,...,n, it is possible to construct a version of the sequence (S k ) 0≤k≤n and Gaussian random variables (Z k ) 0≤k≤n with mean zero and Cov(Z i ,Z j ) =i∧j on the same probability space such that E exp(λ|S n −Z n |)≤A (4.1) and E exp(λ max 0≤k≤n |S k −Z k |)≤A exp(A logn). (4.2) Proof. As in Theorem 3.3.1 it suffices to prove the result with the maximum taken over 1≤k≤n. Recall the positive constantθ 1 from Theorem 3.2.1, the valuesϑ `(X) from Lemma 3.2.5, B from (2.14), and let C,K 1 ,K 2 and λ 0 be as in Theorem 3.1.2 for ν = min a∈A |a|. Set λ = min ( θ 1 2 , λ 0 4 , ϑ `() 4 √ K 1 , ϑ `( 2 ) √ 2 , 1 B + 1 ) . 
(4.3) Let g n (s) and h n (z) denote the mass function of S n and the density of Z n respectively; in particular h n (z) is just theN (0,n) density. By Theorem 3.2.1, as 2λ≤θ 1 , withS n the support of S n , there is a joint probability function ψ n (s,z) onS n ×R such that Z R ψ n (s,z)dz =g n (s), X s∈S n ψ n (s,z) =h n (z), (4.4) and Z R " X s∈S n exp(2λ|s−z|)ψ n (s,z) # dz≤ 8. (4.5) Given any multiset of values ={ 1 ,..., n } fromA, let ρ n (s, z) be the joint density function guaranteed by Theorem 3.3.1; from that result, the marginal distributions of s and z are, respectively, f n (s) as in (3.4), and φ n (z), that of a mean zero Gaussian vector with covariance (3.5). For any s∈S n , define B n (s) ={{ 1 , 2 ,..., n } : n X i=1 i =s}. 52 Now, recalling the definition (3.3) of s , for s∈S n , s such that s ∈B n (s), z∈ R and e z∈R n , let γ n (s,z, s,e z) =ψ n (s,z)P ( = s |S n =s)ρ n s(s,e z), (4.6) where the multiset on the right hand side is composed ofn independent random variables distributed as . Interpreting (4.6) in terms of a construction, to obtain (S,Z, S, e Z) one first samples the joint values S and Z of the coupled random walk and Gaussian path at timen, then conditional on the terminal value S, one samples increments consistent with the path s from their i.i.d. distribution, and finally one couples a walk S to the discrete Brownian bridge e Z in such a way that a certain multiple of it and (W 1 ,...,W n ) given by W i =S i − i n S n (4.7) are close. To verify that (4.6) determines a probability function, recalling (3.2), note first that X s: s ∈B n (s) P ( = s |S n =s)ρ n s(s,e z) = X δ δ δ∈B n (s) X s∈A n δ δ δ P ( =δ δ δ|S n =s)ρ n δ δ δ (s,e z) = X δ δ δ∈B n (s) P ( =δ δ δ|S n =s) X s∈A n δ δ δ ρ n δ δ δ (s,e z) = X δ δ δ∈B n (s) P ( =δ δ δ|S n =s)φ n (e z) =φ n (e z). Now by (4.4), X s∈S n X s: s ∈B n (s) γ n (s,z, s,e z) =h n (z)φ n (e z), (4.8) and integrating over z ande z yields 1. Let (S,Z, S, e Z) be a random vector sampled from γ n (s,z, s,e z), and define Z = (Z 1 ,...,Z n ) by Z i = e Z i + i n Z. Using that Z and e Z are independent by (4.8), and that the latter has covariance given by (3.5), it follows that Z is a mean zero Gaussian random vector with Cov(Z i ,Z j ) =i∧j. 53 Regarding the marginals of s, integrating (4.6) over z ande z, with f n (s) given by (3.4), we obtain Z R n Z R γ n (s,z, s,e z)dzde z =g n (s)P ( = s |S n =s)f n s s s(s) =P ( = s )f n s s s(s) =P ( = s ) 1 |A n s| . The first term is the likelihood that the independently generated increments corresponding to those of s, while the second term is the chance that these increments will be arranged by the uniform permutation in an order that produces s. Hence, the marginal correspond to the distribution of S. It only remains to show that the pair (S, Z) satisfies the bounds (4.1) and (4.2). Note that for 1≤i≤n, recalling (4.7), we have |S i −Z i | = W i + i n S− e Z i + i n Z ≤ |W i − e Z i | + i n |S−Z|. (4.9) From (4.6), one can easily check that the conditional distribution of (S, e Z) given ( S ,Z) = ( ,z) is ρ n (s, ˜ z). Letγ 2 =n −1 P n i=1 2 i and recallν = min a∈A |a|> 0. Asγ≥ν and 4λ≤λ 0 by (4.3), we may invoke Theorem 3.3.1 conditional on{ ,Z}, and choosing η =γ we obtain E(exp(4λ max 1≤i≤n |W i −γ e Z i |) ,Z)≤ exp C logn + 16K 1 λ 2 S 2 n n ! , (4.10) with C and K 1 depending only onA. 
Applying the Cauchy-Schwarz inequality and (4.5), as S and Z are measurable with respect to{ ,Z}, from (4.9) we obtain E exp(λ max 1≤i≤n |S i −Z i |) ≤ h E E exp(λ max 1≤i≤n |W i − e Z i |) ,Z 2 E exp(2λ|S−Z|) i 1/2 ≤ h 8E E exp(λ max 1≤i≤n |W i − e Z i |) ,Z 2 i 1/2 . (4.11) 54 Using conditional Jensen’s inequality, the triangle inequality and the convexity of the exponential function in the first three lines below, (4.10) yields E exp(λ max 1≤i≤n |W i − e Z i |) ,Z 2 ≤E exp(2λ max 1≤i≤n |W i − e Z i |) ,Z ≤ 1 2 E exp(4λ max 1≤i≤n |W i −γ e Z i |) ,Z + 1 2 E exp(4λ max 1≤i≤n |γ e Z i − e Z i |) ,Z ≤ 1 2 exp C logn + 16K 1 λ 2 S 2 n n ! + 1 2 E exp(4λ|γ− 1| max 1≤i≤n | e Z i |) ,Z ≤ exp(C logn) + 1 2 E exp(4λ|γ− 1| max 1≤i≤n | e Z i |) ,Z . (4.12) For the first term in the fourth line, Lemma 3.2.5 yields E exp 16K 1 λ 2 S 2 n n ! ≤ 2, since 1 has mean zero,| 1 |≤B in (2.14) and 4 √ K 1 λ≤ϑ `() by (4.3). For the second term in (4.12), observe that conditional on ( ,Z), e Z is a mean zero mul- tivariate Gaussian random vector with covariance given by (3.5). Equivalently, conditional on ( ,Z), the distribution of ( e Z i / √ n) 1≤i≤n is that of a Brownian bridge on [0, 1] sampled at times 1/n, 2/n,..., 1. Thus, letting B t ,t∈ [0, 1] be a Brownian bridge independent of ( ,Z), since γ is a function of , we have E exp(4λ|γ− 1| max 1≤i≤n | e Z i |) ,Z =E exp(4 √ nλ|γ− 1| max 1≤i≤n | e Z i | √ n ) ,Z =E exp(4 √ nλ|γ− 1| max t∈[n]/n |B t |) ,Z ≤E exp(4 √ nλ|γ− 1| max 0≤t≤1 |B t |) ,Z ≤E exp(4 √ nλ|γ− 1| max 0≤t≤1 B t ) + exp(4 √ nλ|γ− 1| max 0≤t≤1 (−B t )) ,Z . From [73], the distribution of X = max 0≤t≤1 B t is given by P (X≤x) = 1− exp(−2x 2 ) for x≥ 0. 55 Using this identity, and the fact that−B t is also a Brownian bridge, it is straightforward to show that for any real number a, we have E exp(a max 0≤t≤1 B t ) + exp(a max 0≤t≤1 (−B t )) ≤ 2 + √ 2πa exp(a 2 /8). Thus, since B t and γ are respectively independent of, and a function of, , we obtain E exp(4λ|γ− 1| max 1≤i≤n | e Z i |) ,Z ≤2 + √ 2π4 √ nλ|γ− 1| exp 2λ 2 n(γ− 1) 2 ≤2 + 4(B + 1) √ 2πnλ exp 2λ 2 n(γ 2 − 1) 2 (4.13) where in the last step, we used|γ− 1|≤B + 1 where B is given by (2.14), and that γ≥ 0 implies 1≤ (γ + 1) 2 . Since E 2 1 = 1, we have n(γ 2 − 1) 2 = P n i=1 ( 2 i − E 2 i ) 2 /n and E( 2 i − E 2 i ) = 0. As 2 ≤B 2 and 0≤ √ 2λ≤ϑ `( 2 ) , by (4.3), Lemma 3.2.5 yields E exp 2λ 2 n(γ 2 − 1) 2 ≤ 2. Additionally, since λ(B + 1)≤ 1 by (4.3), taking expectation in (4.13) yields E(exp(4λ|γ− 1| max 1≤i≤n | e Z i |)) = 2 + 8(B + 1) √ 2πnλ≤ exp(C 1 logn) (4.14) for some universal constant C 1 . Thus, by (4.11), (4.12) and (4.14), we have E exp(λ max 1≤i≤n |S i −Z i |) ≤ h 8E exp (C logn) + 1 2 E exp(4λ|γ− 1| max 1≤i≤n | e Z i |) ,Z ii 1/2 ≤8 1/2 h exp(C logn) + 1 2 exp(C 1 logn) i 1/2 ≤A exp(A logn) for some universal constantA, which we may take to be at least 8. The proof of (4.2) is now complete. Lastly note that e Z n = 0 implies Z n =Z, hence (4.5) yields (4.1) as A≥ 8. 56 Theorem 3.1.1 follows from Lemma 3.4.1 in exactly the same way as Theorem 1.5 follows from Lemma 5.1 in [22], noting that the reasoning applied at this step does not depend on the support of the summand variables of the random walk; see Appendix A for the proof. 
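Before turning to Part II, we record an illustrative numerical aside, which is no part of the formal development. The constructions above are recursive: the time interval is split at its midpoint, the value of the walk there is coupled to the corresponding Gaussian bridge value, and the two halves are then handled independently. The sketch below imitates this dyadic structure for Rademacher increments, but matches the conditional (hypergeometric) law of the walk's midpoint to the Gaussian bridge midpoint by a simple quantile coupling, in place of the Stein-coefficient coupling of Theorem 3.2.2 that the proofs actually use; numpy/scipy and the specific choices below are assumptions of the illustration only.

```python
# Illustration only: couple a Rademacher random walk bridge to a Gaussian bridge by
# recursive midpoint (quantile) coupling, imitating the dyadic structure of
# Theorem 3.3.1; the proofs use the Stein-coefficient coupling of Theorem 3.2.2
# at each level rather than the quantile coupling employed here.
import numpy as np
from scipy.stats import norm, hypergeom

rng = np.random.default_rng(1)

def coupled_paths(n):
    """Return (S, Z): a +/-1 random walk S_0,...,S_n and a Gaussian vector Z with
    the bridge covariance (3.5), built jointly so that W_k = S_k - (k/n) S_n
    stays close to Z_k."""
    S = np.full(n + 1, np.nan)
    Z = np.zeros(n + 1)                            # Z_0 = Z_n = 0 for the bridge
    S[0] = 0.0
    S[n] = 2.0 * rng.binomial(n, 0.5) - n          # endpoint of the walk
    stack = [(0, n)]
    while stack:
        i, j = stack.pop()
        if j - i <= 1:
            continue
        m = (i + j) // 2
        # Gaussian bridge midpoint given the values at times i and j
        mean = Z[i] + (m - i) / (j - i) * (Z[j] - Z[i])
        sd = np.sqrt((m - i) * (j - m) / (j - i))
        Z[m] = rng.normal(mean, sd)
        u = norm.cdf((Z[m] - mean) / sd)
        # Walk midpoint: given S_i and S_j, the number of +1 steps in (i, m] is
        # hypergeometric; couple it monotonically to the same uniform u
        plus = round((j - i + S[j] - S[i]) / 2)    # number of +1 steps in (i, j]
        H = hypergeom.ppf(u, j - i, plus, m - i)
        S[m] = S[i] + 2.0 * H - (m - i)
        stack += [(i, m), (m, j)]
    return S, Z

for n in (2**8, 2**10, 2**12):
    S, Z = coupled_paths(n)
    k = np.arange(n + 1)
    W = S - k / n * S[n]
    print(n, np.max(np.abs(W - Z)))
```

The printed maxima grow very slowly with n, in line with the logarithmic coupling rate established above, although the coupling produced by this sketch is not the one constructed in the proofs.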
Part II

Dickman Approximation and its Applications

Chapter 4

Dickman Approximation in Summations, Probabilistic Number Theory

1 Introduction

Following [40], recall that for a given $\theta > 0$ and non-negative random variable $W$, we define the $\theta$-Dickman bias distribution of $W$ by
$$W^* =_d U^{1/\theta}(W + 1), \qquad (1.1)$$
where $U \sim \mathcal{U}[0,1]$ and is independent of $W$. Though the density of $\mathcal{D}_\theta$ can presently be given only by specifying it somewhat indirectly as a solution to certain differential delay equations similar to (2.1), it is well known [31] that the distributions $\mathcal{D}_\theta$ are characterized by satisfying $W^* =_d W$ uniquely, that is, $\mathcal{D}_\theta$ is the unique fixed point of the distributional transformation (1.1). Indeed, this property is the basis for simulating from this family using the recursion
$$W_{n+1} = U_n^{1/\theta}(W_n + 1) \quad \text{for } n \ge 0, \text{ with } W_0 = 0, \qquad (1.2)$$
where $U_m, m \ge 0$ are i.i.d. $\mathcal{U}[0,1]$ random variables and $U_n$ is independent of $W_n$, see [31]. Generally, distributional characterizations such as (1.1), and their associated transformations, provide an additional avenue to study distributions and their approximation, and have been considered for the normal [42], the exponential [55], and various other distributions that may be less well known, such as one arising in the study of the degrees of vertices in certain preferential attachment graphs, see [56]. In the following, $D_\theta$ will denote a $\mathcal{D}_\theta$ distributed random variable, where the subscript may be dropped when equal to 1.

Recall that in [41], the upper bound
$$d_1(W, D_\theta) \le (1 + \theta)\, d_1(W, W^*) \qquad (1.3)$$
for the Wasserstein distance between a non-negative random variable $W$ and $D_\theta$ was proved via Stein's method, where
$$d_1(X, Y) = \sup_{h \in \mathrm{Lip}_1} |Eh(X) - Eh(Y)| \qquad (1.4)$$
with
$$\mathrm{Lip}_\alpha = \{h : |h(x) - h(y)| \le \alpha|x - y|\} \quad \text{for } \alpha \ge 0. \qquad (1.5)$$
We will also apply the fact that alternatively one can write
$$d_1(X, Y) = \inf E|X - Y|, \qquad (1.6)$$
where the infimum is over all joint distributions having the given $X, Y$ marginals. The infimum is achieved for variables taking values in any Polish space, see e.g. [60], and so in particular for those that are real valued. For notational simplicity we write $d_1(X, Y)$, say, for $d_1(\mathcal{L}(X), \mathcal{L}(Y))$. In [41], inequality (1.3) was used to derive a bound on the quality of the Dickman approximation for the running time of the Quickselect algorithm.

In this chapter, our aim is twofold. First, in Section 2.1 we study the approximation of randomly weighted sums that converge to the Dickman distribution, for instance those of the form
$$W_n = \frac{1}{n} \sum_{k=1}^n Y_k B_k, \qquad (1.7)$$
where $\{B_1, \ldots, B_n, Y_1, \ldots, Y_n\}$ are independent, $B_k$ is a Bernoulli random variable with success probability $1/k$, and $Y_k$ is non-negative with $EY_k = k$ and $\mathrm{Var}(Y_k) = \sigma_k^2$ for all $k = 1, \ldots, n$. A well known special case is when $Y_k = k$ a.s., so that
$$W_n = \frac{1}{n} \sum_{k=1}^n k B_k. \qquad (1.8)$$
Recall the Wasserstein-2 metric
$$d_{1,1}(X, Y) = \sup_{h \in \mathcal{H}_{1,1}} |Eh(Y) - Eh(X)| \qquad (1.9)$$
where, for $\alpha \ge 0, \beta \ge 0$,
$$\mathcal{H}_{\alpha,\beta} = \{h : h \in \mathrm{Lip}_\alpha, h' \in \mathrm{Lip}_\beta\}, \qquad (1.10)$$
with $\mathrm{Lip}_\alpha$ given in (1.5).

Theorem 4.1.1. Let $W_n$ be as in (1.7) and $D$ a standard Dickman random variable. Then with the metric $d_{1,1}$ in (1.9),
$$d_{1,1}(W_n, D) \le \frac{3}{4n} + \frac{1}{2n^2} \sum_{k=1}^n \frac{1}{k} \sqrt{(\sigma_k^2 + k^2)\sigma_k^2},$$
and in particular if $Y_k = k$ a.s., that is, for $W_n$ as in (1.8),
$$d_{1,1}(W_n, D) \le \frac{3}{4n}. \qquad (1.11)$$

From the first bound given by the theorem, we see that $W_n \to D$ in distribution in the general case of (1.7) as soon as $\sum_{k=1}^n \frac{1}{k}\sqrt{(\sigma_k^2 + k^2)\sigma_k^2} = o(n^2)$. In particular, weak convergence to the Dickman distribution occurs if $\sigma_k^2 = O(k^{2-\epsilon})$ for some $\epsilon > 0$.
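To make the objects in Theorem 4.1.1 concrete, the following sketch (an illustration only, not used in the proofs) draws approximate samples of $D$ by iterating the recursion (1.2) a fixed number of times, simulates the weighted Bernoulli sum (1.8), and compares the two samples by a crude empirical Wasserstein-1 distance; Python/numpy, the number of iterations, and the sample sizes are choices made only for the illustration.

```python
# Illustration only: approximate Dickman samples via the recursion (1.2), samples
# of W_n from (1.8), and a crude empirical comparison of the two.
import numpy as np

rng = np.random.default_rng(0)

def dickman_samples(theta=1.0, size=10**5, iters=80):
    """Approximate D_theta samples: iterate W <- U^(1/theta) * (W + 1) starting from W = 0."""
    w = np.zeros(size)
    for _ in range(iters):
        w = rng.random(size) ** (1.0 / theta) * (w + 1.0)
    return w

def weighted_bernoulli_sums(n, size=10**5):
    """Samples of W_n = (1/n) sum_k k B_k with independent B_k ~ Bernoulli(1/k), as in (1.8)."""
    w = np.zeros(size)
    for k in range(1, n + 1):
        w += k * (rng.random(size) < 1.0 / k)
    return w / n

d = np.sort(dickman_samples())
for n in (10, 100, 1000):
    w = np.sort(weighted_bernoulli_sums(n))
    # empirical Wasserstein-1 distance between two equal-size samples
    print(n, np.mean(np.abs(w - d)))
```

The reported distances shrink with $n$, down to the Monte Carlo resolution of the samples, and both samples have mean close to 1, the common value of $E[W_n]$ and $E[D]$.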
In Section 2 we provide an application of Theorem 4.1.1 to minimal directed spanning trees in $\mathbb{R}^2$. We also show the following related result for a weighted sum of independent Poisson variables. For $\lambda > 0$, let $\mathcal{P}(\lambda)$ denote a Poisson random variable with mean $\lambda$.

Theorem 4.1.2. For $\theta > 0$, let $\{P_1, \ldots, P_n, Y_1, \ldots, Y_n\}$ be independent with $P_k \sim \mathcal{P}(\theta/k)$ and $Y_k$ non-negative with $EY_k = k$ and $\mathrm{Var}(Y_k) = \sigma_k^2$, for all $k = 1, \ldots, n$. Then
$$W_n = \frac{1}{n} \sum_{k=1}^n Y_k P_k \qquad (1.12)$$
satisfies
$$d_{1,1}(W_n, D_\theta) \le \frac{\theta}{4n} + \frac{\theta}{n} \sum_{k=1}^n \frac{\sigma_k}{k} + \frac{\theta}{2n^2} \sum_{k=1}^n \frac{1}{k} \sqrt{(\sigma_k^2 + k^2)\sigma_k^2}, \qquad (1.13)$$
and in particular, in the case $Y_k = k$ a.s.,
$$W_n = \frac{1}{n} \sum_{k=1}^n k P_k \quad \text{satisfies} \quad d_{1,1}(W_n, D_\theta) \le \frac{\theta}{4n}. \qquad (1.14)$$

Note that, as for the weighted sum of Bernoulli variables in (1.7), we have weak convergence to the Dickman distribution if $\sigma_k^2 = O(k^{2-\epsilon})$ for some $\epsilon > 0$.

In Section 2.2, we turn our attention to the Dickman approximation of weighted geometric and Bernoulli sums that appear in probabilistic number theory. For geometric variables, we write $X \sim \mathrm{Geom}(p)$ if $P(X = m) = (1-p)^m p$ for $m \ge 0$. Let $(p_k)_{k \ge 1}$ be an enumeration of the prime numbers in increasing order and $\Omega_n$ denote the set of all positive integers having no prime factor larger than $p_n$. Let $X_1, \ldots, X_n$ be independent with $X_k \sim \mathrm{Geom}(1 - 1/p_k)$ for $1 \le k \le n$, and consider
$$M_n = \prod_{k=1}^n p_k^{X_k} \quad \text{and} \quad S_n = \frac{\log M_n}{\log(p_n)} = \frac{1}{\log(p_n)} \sum_{k=1}^n X_k \log(p_k). \qquad (1.15)$$
One can check (see e.g. [58]) that $M_n \sim \Pi_n$, which has mass function $\Pi_n(m) = \frac{1}{\pi_n m}$ for $m \in \Omega_n$, with normalizing constant necessarily satisfying $\pi_n = \sum_{m \in \Omega_n} 1/m$. Distributional convergence of $S_n$ to the standard Dickman distribution was proved in [58]. In Theorem 4.1.3 below, we provide a $(\log n)^{-1}$ convergence rate in the Wasserstein-2 norm.

Theorem 4.1.3. For $D$ a standard Dickman random variable and $S_n$ as in (1.15) with $X_1, \ldots, X_n$ independent variables with $X_k \sim \mathrm{Geom}(1 - 1/p_k)$, we have
$$d_{1,1}(S_n, D) \le \frac{C}{\log n}$$
for some universal constant $C$.

If instead one considers the distribution $\Pi'_n$ over $\Omega'_n$, the set of square-free numbers with largest prime factor less than or equal to $p_n$, with $\Pi'_n(m)$ proportional to $1/m$ for all $m \in \Omega'_n$, then $M_n = \prod_{k=1}^n p_k^{X_k}$ has distribution $\Pi'_n$ when $X_k \sim \mathrm{Ber}(1/(1 + p_k))$ are independent (see e.g. [19]). That $S_n = \log M_n / \log(p_n)$ converges in distribution to the standard Dickman was proved in [19], and very recently a $(\log\log n)^{3/2} (\log n)^{-1}$ rate was provided in [1] in a metric defined as a supremum over a class of three times differentiable functions. We provide the improved $(\log n)^{-1}$ convergence rate in the stronger Wasserstein-2 norm.

Theorem 4.1.4. For $D$ a standard Dickman random variable and $S_n$ as in (1.15) with $X_1, \ldots, X_n$ independent variables with $X_k \sim \mathrm{Ber}(1/(1 + p_k))$, we have
$$d_{1,1}(S_n, D) \le \frac{C}{\log n}$$
for some universal constant $C$.

For our results in probabilistic number theory, we closely follow the arguments in [1]. We also provide such bounds when the $X_k$ are distributed as Poisson random variables with parameters $\lambda_k > 0$ given by certain functions of $p_k$. We obtain our results by extending the Stein's method framework of [40] for the Dickman distribution. As discussed in Chapter 2, to apply Stein's method one first needs to obtain a characterizing equation for the given target distribution, the Dickman distribution in our case, which is then used as the basis to form a Stein equation. One key step of the method requires bounds on the smoothness of the solution over the given class of test functions.
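As an illustration of the quantities appearing in Theorems 4.1.3 and 4.1.4 (again, no part of the proofs), one may simulate the geometric sum $S_n$ of (1.15) over the first $n$ primes and compare it with approximate Dickman samples obtained from the recursion (1.2). In the sketch below, Python/numpy, the sieve bound, and the sample sizes are choices made only for the illustration; replacing the geometric draws by $\mathrm{Ber}(1/(1+p_k))$ draws gives the square-free setting of Theorem 4.1.4.

```python
# Illustration only: the prime sums S_n of (1.15) with geometric summands, compared
# empirically with approximate Dickman samples obtained from the recursion (1.2).
import numpy as np

rng = np.random.default_rng(0)

def primes_up_to(N):
    """Sieve of Eratosthenes."""
    sieve = np.ones(N + 1, dtype=bool)
    sieve[:2] = False
    for q in range(2, int(N**0.5) + 1):
        if sieve[q]:
            sieve[q*q::q] = False
    return np.flatnonzero(sieve)

def geometric_prime_sum(n, primes, size=10**4):
    """Samples of S_n = (1/log p_n) sum_k X_k log p_k with X_k ~ Geom(1 - 1/p_k) on {0,1,2,...}."""
    p = primes[:n].astype(float)
    s = np.zeros(size)
    for k in range(n):
        # numpy's geometric is supported on {1,2,...}, so subtract 1 to match
        # P(X = m) = (1-p)^m p for m >= 0 as used in (1.15)
        s += (rng.geometric(1.0 - 1.0 / p[k], size) - 1) * np.log(p[k])
    return s / np.log(p[n - 1])

def dickman_samples(size=10**4, iters=80):
    w = np.zeros(size)
    for _ in range(iters):
        w = rng.random(size) * (w + 1.0)
    return w

primes = primes_up_to(2 * 10**4)        # more than enough primes for the values of n below
d = np.sort(dickman_samples())
for n in (100, 1000, 2000):
    s = np.sort(geometric_prime_sum(n, primes))
    print(n, np.mean(np.abs(s - d)))    # empirical Wasserstein-1 distance, decreasing slowly in n
```

The slow decrease of the reported distances reflects the $(\log n)^{-1}$ rate of Theorem 4.1.3.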
Theorem 4.1.4 improves on results of [1]. That work applies a different version of Stein's method, and in particular does not consider any form of the Stein equation, such as (1.16) or (1.18). Consequently [1] does not obtain bounds on a Stein solution for any Dickman case, as is achieved here in Theorems 4.1.5 and 5.3.10. Indeed, it is noted in [2] that this last step can be an 'extremely difficult problem'.

In [40] the Stein equation used was of the integral type
$$g(x) - A_{x+1} g = h(x) - E[h(D_\theta)] \qquad (1.16)$$
where the averaging operator $A_x g$ was given by
$$A_x g = \begin{cases} g(0) & \text{for } x = 0 \\ \dfrac{\theta}{x^\theta} \displaystyle\int_0^x g(u) u^{\theta - 1}\, du & \text{for } x > 0. \end{cases} \qquad (1.17)$$
Formulations of the form (1.16) are appropriate when the variable $W$ of interest can be coupled to some $W^*$ having its $\theta$-Dickman bias distribution. However, such direct couplings appear elusive for all our examples in Section 2, including in particular those in probabilistic number theory, and a different approach is needed. To handle these new examples we consider instead a new Stein equation, of differential-delay type, given by
$$(x/\theta) f'(x) + f(x) - f(x+1) = h(x) - E[h(D_\theta)]. \qquad (1.18)$$
To apply the method, uniform bounds on the smoothness of the solution $f(\cdot)$ over test functions $h(\cdot)$ in some class $\mathcal{H}$ are required; we achieve such bounds for the class $\mathcal{H}_{1,1}$.

Recall that for a real-valued measurable function $f(\cdot)$ on the domain $S \subset \mathbb{R}$, $\|f\|_\infty$ denotes its essential supremum norm, defined by
$$\|f\|_\infty = \operatorname*{ess\,sup}_{x \in S} |f(x)| = \inf\{b \in \mathbb{R} : m(\{x : |f(x)| > b\}) = 0\}, \qquad (1.19)$$
where $m(\cdot)$ denotes the Lebesgue measure on $\mathbb{R}$. For any real valued function defined on its domain $S$ and $A \subset S$, we define its supremum norm on $A$ by
$$\|f\|_A = \sup_{x \in A} |f(x)|. \qquad (1.20)$$

Theorem 4.1.5. For every $\theta > 0$ and $h \in \mathcal{H}_{1,1}$, there exists a solution $f \in \mathcal{H}_{\theta, \theta/2}$ to (1.18) with $\|f'\|_{(0,\infty)} \le \theta$ and $\|f''\|_{(0,\infty)} \le \theta/2$.

We will prove Theorem 4.1.5 in Section 3. Unless otherwise specifically noted, integration will be with respect to $m(\cdot)$, and for notational simplicity we will write, say, $dv$ in place of $dm(v)$.

This chapter is organized as follows. In Section 2 we focus on sums, such as the Bernoulli and Poisson weighted sums in (1.7) and (1.12) and the sums (1.15) arising in probabilistic number theory, and in Section 3 we prove Theorem 4.1.5, providing smoothness bounds on the solution to the Stein equation (1.18) considered here.

2 Dickman approximation of sums

We will prove Theorems 4.1.1 and 4.1.2, starting with a simple application of the former, in Section 2.1, and then provide the proofs of Theorems 4.1.3 and 4.1.4, in probabilistic number theory, in Section 2.2.

In this section we deal with the form (1.18) of the Stein equation. That is, in the proofs of Theorems 4.1.1, 4.1.2 and 4.2.1, we take a fixed $h \in \mathcal{H}_{1,1}$, the function class defined in (1.10), and let $f \in \mathcal{H}_{1,1/2}$ be the solution of the Stein equation (1.18) that is guaranteed by Theorem 4.1.5 in the case $\theta = 1$. Substituting our $W_n$ of interest for $x$ in (1.18) and taking expectation yields
$$E[h(W_n)] - E[h(D)] = E\left[(W_n/\theta) f'(W_n) - (f(W_n + 1) - f(W_n))\right]. \qquad (2.1)$$

2.1 Weighted Bernoulli and Poisson sums

We begin with a simple application of Theorem 4.1.1 to the minimal directed spanning tree, or MDST, following [15], first pausing to describe the construction of the MDST.
Forn∈N, consider a set ofn + 1 distinct pointsV ={(a i ,b i ), 0≤i≤n} in [0, 1]× [0, 1] where we take (a 0 ,b 0 ) = (0, 0), the origin. LetE be the set of directed edges ((a i ,b i ), (a j ,b j )) with i6= j and (a i ,b i ) (a j ,b j ). Since (0, 0) (a i ,b i ) for all i = 1,...,n, the edge set E contains all the directed edges (a 0 ,b 0 )→ (a i ,b i ) with i6= 0. Let G be the collection of all graphs G with vertex set G V =V and edge set G E ⊆E such that for any 1≤j≤n, there exists a directed path from (a 0 ,b 0 ) to (a j ,b j ) with each edge inG E . We define a MDST on V as any graph T∈G that minimizes P e∈G E |e| where|e| denotes the Euclidean length of the edge e. Clearly T is a tree and need not be unique. Now letP be a random collection of n points uniformly and independently placed in the unit square [0, 1] 2 in R 2 . In this random setting, the MDST on the point setV = P∪{(0, 0)} is uniquely defined almost surely, see [15]. By relabeling the points according to the size of their x-coordinate, without loss of generality, we may let the points inP be (X 1 ,Y 1 ),..., (X n ,Y n ) where Y 1 ,...,Y n are independentU[0, 1] random variables, and also independent of X 1 ,...,X n , where 0 < X 1 < X 2 <··· < X n < 1 have the distribution of the order statistics generated from a sample of n independentU[0, 1] variables. Though the origin is the unique minimal point ofV, the usual set of interest is the collection of minimal points ofP which has size at least one. For i = 1,...,n, observe that (X i ,Y i ) is a minimal point ofP if and only if Y j > Y i for all j < i. One much studied quantity in this context is the sumS n of theα th powers of the Euclidean distances between the minimal points of the process and the origin for some α> 0; the work [57] shows that S n converges to D 2/α in distribution as n tends to infinity. The lower record times R 1 ,R 2 ,... of the height process Y 1 ,...,Y n are also studied, see [15], and are defined by letting R 1 = 1, and for i> 1 by R i = ∞ if Y j ≥Y R i−1 for all j >R i−1 or if R i−1 ≥n, min{j >R i−1 :Y j <Y R i−1 } otherwise. In terms of these record times, the collection of the k(n) minimal points inside the unit square is given by (X R i ,Y R i ) for i = 1,...,k(n). We claim that the scaled sum of lower record times W n = 1 n k(n) X i=1 R i (2.2) 65 can be approximated by the Dickman distributionD in the Wasserstein-2 metric in (1.9) to within the bound specified by inequality (1.11) of Theorem 4.1.1. Indeed, for 1≤j≤n, letting B k =1(k∈{R 1 ,...,R k(n) }) we have that P k(n) i=1 R i = P n k=1 kB k . As Lemma 2.1 of [15] shows that B 1 ,...,B n are independent with B k ∼ Ber(1/k) for 1≤ k≤ n, Theorem 4.1.1 yields the claimed bound for the Dickman approximation of (2.2). We now present the proof of our first main result. Proof of Theorem 4.1.1: Let W n be as in (1.7) and take θ = 1 in (2.1). Letting W (k) n =W n − Y k n B k , evaluating the first term on the right hand side of (2.1) yields E[W n f 0 (W n )] =E " 1 n n X k=1 Y k B k f 0 (W n ) # = 1 n n X k=1 E Y k B k f 0 W (k) n + Y k n B k = 1 n n X k=1 E Y k f 0 W (k) n + Y k n P (B k = 1) = 1 n n X k=1 E Y k k f 0 W (k) n + Y k n . The right hand of (2.1) is therefore the expectation of 1 n n X k=1 Y k k f 0 W (k) n + Y k n − Z 1 0 f 0 (W n +u)du = 1 n n X k=1 Y k k f 0 W (k) n + Y k n −f 0 W (k) n + k n + 1 n n X k=1 f 0 W (k) n + k n −f 0 W n + k n + 1 n n X k=1 f 0 W n + k n − Z 1 0 f 0 (W n +u)du ! . 
(2.3) Now write the k th summand of the first term of (2.3) as Y k k f 0 W (k) n + Y k n −f 0 W (k) n + k n = Y k k f 0 W (k) n + Y k n −f 0 W (k) n + k n + Y k k f 0 W (k) n + k n −f 0 W (k) n + k n . 66 The expectation of the second difference is zero asE[Y k ] =k andY k is independent ofW (k) n . Thus, using that f∈H 1,1/2 , and hence in particular that f 0 (·) is Lipschitz, applying the Cauchy-Schwarz inequality to the first difference, we find that the expectation of the first term of (2.3) is bounded by kf 00 k ∞ n 2 n X k=1 E |Y k | k |Y k −k| ≤ 1 2n 2 n X k=1 1 k q (σ 2 k +k 2 )σ 2 k . For the expectation of the second term of (2.3), noting that E[Y k B k ] = 1, we similarly obtain the bound kf 00 k ∞ n n X k=1 E|W (k) n −W n |≤ 1 2n n X k=1 E Y k n B k = 1 2n . Finally, for the third expression (2.3), applying that same bound on the second derivative of f(·), almost surely 1 n n X k=1 f 0 W n + k n − Z 1 0 f 0 (W n +u)du ≤ n X k=1 Z k n k−1 n [f 0 (W n +k/n)−f 0 (W n +u)] du ≤ 1 2 n X k=1 Z k n k−1 n (k/n−u)du = 1 2 1 n 2 n X k=1 k− Z 1 0 udu ! = 1 4n . Combining these three bounds yields, via (2.1) with θ = 1, that |E[h(W n )]−E[h(D)]|≤ 3 4n + 1 2n 2 n X k=1 1 k q (σ 2 k +k 2 )σ 2 k . Taking the supremum overH 1,1 and recalling the definition of the norm d 1,1 in (1.9) now yields the theorem. The final claim (1.11) holds as σ 2 k = 0 when Y k =k a.s. We turn now to the proof of our next main result, proceeding along the same lines as in the proof of Theorem 4.1.1. We first recall the well known Stein identity for the Poisson distribution, see e.g. [25], that P∼P(λ) if and only if E[Pg(P )] =λE[g(P + 1)] (2.4) for all functions g(·) on the non-negative integers for which the expectation of either side exists. Proof of Theorem 4.1.2: Consider equation (2.1) withW n as in (1.12) andh(·) an arbitrary function inH 1,1 , and f∈H θ,θ/2 the solution of (1.18) guaranteed by Theorem 4.1.5. For 67 k = 1,...,n set W (k) n = W n −Y k P k /n. Using that P 1 ,...,P n ,Y 1 ,...,Y n are independent withP k ∼P(θ/k) and (2.4) for the second equality, letting S k ={Y j ,j∈{1,...,n},P j ,j∈ {1,...,n}\{k}}, we have E[(W n /θ)f 0 (W n )] = 1 θn n X k=1 E h Y k E h P k f 0 (W (k) n +Y k P k /n)|S k ii = 1 n n X k=1 E Y k k E h f 0 (W (k) n +Y k P k /n +Y k /n)|S k i = 1 n n X k=1 E Y k k f 0 (W n +Y k /n) . Thus, via (2.1), we obtain E[h(W n )]−E[h(D θ )] =E (W n /θ)f 0 (W n )− (f(W n + 1)−f(W n )) =E " 1 n n X k=1 Y k k f 0 W n + Y k n − Z 1 0 f 0 (W n +u)du # =E " 1 n n X k=1 Y k k f 0 W n + Y k n −f 0 W n + k n # +E " 1 n n X k=1 f 0 W n + k n − Z 1 0 f 0 (W n +u)du # . (2.5) Now for the second term in (2.5), since f∈H θ,θ/2 , as for this same term that appears in the proof of Theorem 4.1.1, we have almost surely that 1 n n X k=1 f 0 (W n +k/n)− Z 1 0 f 0 (W n +u)du ≤ θ 4n . (2.6) Now we write the first term in (2.5) as the expectation of 1 n n X k=1 Y k k f 0 W n + Y k n − Y k k f 0 W n + k n + 1 n n X k=1 Y k k f 0 W n + k n −f 0 W n + k n . (2.7) As in proof of Theorem 4.1.1, recalling thatf∈H θ,θ/2 , the expectation of the first term in (2.7) is bounded by kf 00 k ∞ n 2 n X k=1 E |Y k | k |Y k −k| ≤ θ 2n 2 n X k=1 1 k q (σ 2 k +k 2 )σ 2 k . 68 The expectation of the second term in (2.7) can be bounded by kf 0 k ∞ n n X k=1 E Y k k − 1 ≤ θ n n X k=1 σ k k . Assembling the bounds on the terms arising from (2.5), consisting of (2.6) and the two inequalities above, we obtain |E[h(W n )]−E[h(D θ )]|≤ θ 4n + θ n n X k=1 σ k k + θ 2n 2 n X k=1 1 k q (σ 2 k +k 2 )σ 2 k . 
Taking the supremum over h∈H 1,1 and applying definition (1.9) completes the proof of (1.13). The inequality in (1.14) follows by observing that σ 2 k = 0 when Y k =k a.s. 2.2 Dickman approximation in number theory Let (p k ) k≥1 be an enumeration of the prime numbers in increasing order. Let (X k ) k≥1 be a sequence of independent integer valued random variables and let S n = 1 log(p n ) n X k=1 X k log(p k ) for n≥ 1. (2.8) Weak convergence of S n to the Dickman distribution in the cases when the X k ’s are dis- tributed as geometric and Bernoulli is well known in probabilistic number theory, and [1] recently provided a rate of convergence in the Bernoulli case. We give bounds in a stronger metric and remove a logarithmic factor from their rate. We also prove such bounds when theX k ’s are distributed as geometric or Poisson with parameters given by certain functions ofp k . For our results in this area, we rely heavily on the techniques in the proof of Lemma 4.2.4 of [1]; in particular, the identity (2.9) below, without remainder, is due to [1]. We begin with the following abstract theorem. Theorem 4.2.1. Let S be a non-negative random variable with finite variance such that for some constant μ and a random variable T satisfying P (S +T = 0) = 0, E[Sφ(S)] =μE[φ(S +T )] +R φ for all φ∈ Lip 1/2 , (2.9) where the constant R φ may depend on φ(·). Then d 1,1 (S,D)≤|μ− 1| + 1 2 inf (T,U) E|T−U| + sup φ∈Lip 1/2 |R φ | (2.10) 69 where D is a standard Dickman random variable, and the infimum is over all couplings (T,U) of T and U∼U[0, 1] constructed on the same space as S, with U independent of S. Remark 4.2.2. We note the connection between the relation in (2.9) and size biasing, where for a non-negative random variable S with finite mean μ, we say S s has the S-size biased distribution when E[Sφ(S)] =μE[φ(S s )] for all functions φ(·) for which these expectations exist. In particular, when R φ in (2.9) is zero for all φ∈ Lip 1/2 , we obtain that S s = d S +T ; for an application which requires the remainder, see Lemma 4.2.3. Additionally, Section 4.3 of [3] shows that the standard DickmanD is the unique non-negative solution to the distributional equality W s = d W +U, where U isU[0, 1], and independent of W . Hence, the error term comparing T and U in Theorem 4.2.1 is natural. Proof of Theorem 4.2.1: We first show that the set of couplings over which the infimum is taken in (2.10) is non-empty. Let μ = E[S], and let S s and U be constructed on the same space as S, independently of S, with S s having the S-size biased distribution and U ∼ U[0, 1]. Then setting T = S s −S identity (2.9) is satisfied with R φ = 0 for all φ ∈ Lip 1/2 , and the pair (T,U) satisfies the conditions required of the infimum in the theorem. Invoking Theorem 4.1.5 with θ = 1, for any given h∈H 1,1 there exists a function f(·) satisfyingkf 0 k (0,∞) ≤ 1 andkf 00 k (0,∞) ≤ 1/2 such that E[h(S)]−E[h(D)] =E[Sf 0 (S) +f(S)−f(S + 1)]. Now considerμ andT satisfying (2.9) with (T,U) constructed on the same space asS, with U∼U[0, 1] and independent of S. Then, using that P (S +T = 0) = 0 and the mean value theorem for the second inequality and recalling definitions (1.19) and (1.20), we obtain |E[h(S)]−E[h(D)]| =|E[Sf 0 (S)−f 0 (S +U)]| =|E[μf 0 (S +T )−f 0 (S +U) +R f 0]| ≤|E[μf 0 (S +T )−f 0 (S +T )]| +|E[f 0 (S +T )−f 0 (S +U)]| +|R f 0|. ≤kf 0 k (0,∞) |μ− 1| +kf 00 k (0,∞) E|T−U| +|R f 0|≤|μ− 1| + 1 2 E|T−U| +|R f 0|. 
70 Now taking the infimum on the right hand side over all couplings (T,U) satisfying the conditions of the theorem yields |E[h(S)]−E[h(D)]|≤|μ− 1| + 1 2 inf (T,U) E|T−U| +|R f 0 h |, where we have written f =f h to emphasize the dependence of f on h. Taking supremum over h∈H 1,1 first on the right, and then on the left now yields the result upon applying definition (1.9). Now we will demonstrate a few applications of Theorem 4.2.1. In all these examples the conditions that the variance of S is finite and that S +T > 0 almost surely are straight- forward to check, and will not be mentioned further. For n≥ 1, let Ω n denote the set of integers with no prime factor larger than p n , and let Π n be the distribution on Ω n with mass function Π n (m) = 1 π n m for m∈ Ω n where π n = P m∈Ωn 1/m is the normalizing factor. One can check, see e.g. Proposition 1 in [58], that M n = Q n k=1 p X k k has distribution Π n , where X k ∼ Geom(1− 1/p k ) are independent for 1≤ k ≤ n ; we remind the reader that we write X ∼ Geom(p) when P (X = m) = (1−p) m p for m≥ 0. For n≥ 1, the random variable S n as in (2.8) is therefore given by S n = 1 log(p n ) n X k=1 X k log(p k ) = logM n log(p n ) . (2.11) Taking the mean, we find μ n =E[S n ] = 1 log(p n ) n X k=1 log(p k ) p k − 1 . (2.12) Now define the random variable I taking values in{1,...,n}, and independent of S n , with mass function P (I =k) = log(p k ) (p k − 1) log(p n )μ n for k∈{1,...,n}. (2.13) The next lemma very closely follows the arguments in Lemmas 3 and 5 of [1] and is included here only for completeness. In the proof, we will use the statement, equivalent [44] to 71 the prime number theorem, that lim n→∞ p n /(n logn) = 1, and Rosser’s Theorem [67], to respectively yield that logp n = logn +O(log logn) and p k >k logk. (2.14) Lemma 4.2.3. Let S n be as in (2.11) with X 1 ,...,X n independent with X k ∼ Geom(1− 1/p k ), μ n as in (2.12), I with distribution given in (2.13) and independent of S n and T n = log(p I ) log(p n ) and R n,φ = 1 log(p n ) n X k=1 log(p k ) p k − 1 E X k φ S n + log(p k ) log(p n ) −φ(S n ) . Then E[S n φ(S n )] =μ n E[φ(S n +T n )] +R n,φ for all φ∈ Lip 1/2 . Moreover sup φ∈Lip 1/2 |R n,φ | =O 1 log 2 n and μ n − 1 =O 1 logn , and there exists a coupling between U∼U[0, 1] and T n with U independent of S n , such that E|T n −U| =O 1 logn . Proof. It is easily verified that for X∼ Geom(p), E[g(X)] = 1−p p E[g(X + 1)−g(X)] (2.15) for all functions g(·) for which these expectations exist, and which satisfy g(0) = 0. Let S (k) n =S n −X k log(p k )/ log(p n ). SinceX k ∼ Geom(1−1/p k ), specializing (2.15) to the case 72 g(x) =xφ(S (k) n +x log(p k )/ log(p n )), conditioning on S (k) n in the second equality and using the independence of I and S n in the last, for φ∈ Lip 1/2 we have E[S n φ(S n )] = 1 log(p n ) n X k=1 log(p k )E X k φ S (k) n +X k log(p k ) log(p n ) = 1 log(p n ) n X k=1 log(p k )(1/p k ) 1− 1/p k E (X k + 1)φ S n + log(p k ) log(p n ) −X k φ(S n ) = 1 log(p n ) n X k=1 log(p k ) p k − 1 E φ S n + log(p k ) log(p n ) +X k φ S n + log(p k ) log(p n ) −φ(S n ) =μ n n X k=1 log(p k ) (p k − 1) log(p n )μ n E φ S n + log(p k ) log(p n ) +R n,φ =μ n n X k=1 P (I =k)E φ S n + log(p k ) log(p n ) +R n,φ =μ n E[φ(S n +T n )] +R n,φ , proving the first claim. 
Next, using mean value theorem and thatkφ 0 k ∞ ≤ 1/2 in the first inequality, we have |R n,φ | = 1 log(p n ) n X k=1 log(p k ) p k − 1 E X k φ S n + log(p k ) log(p n ) −φ(S n ) ≤ 1 2 log(p n ) n X k=1 log 2 (p k ) (p k − 1) log(p n ) EX k = 1 2 log 2 (p n ) n X k=1 log 2 (p k ) (p k − 1) 2 =O 1 log 2 n (2.16) where in the last step, we have used that the second relation in (2.14) to lower bound p n by n, and, again by (2.14), that ∞ X k=1 log 2 (p k ) (p k − 1) 2 ≤C ∞ X k=1 log 2 (k) (k− 1) 2 <∞, (2.17) where we have used the first relation there to upper bound log(p k ) by C log(k) for some positive constant in the numerator, and the second one again to lower bound p k byk in the denominator. As the final sum in (2.16) does not depend on φ, the bound is uniform over all φ∈ Lip 1/2 . The proof of the remainder of the lemma closely follows Lemma 5 of [1]. Using that P j k=1 log(p k )/p k = log(p j ) +O(1), see Proposition 1.51 in [78], we obtain j X k=1 log(p k ) p k − 1 = log(p j ) + j X k=1 log(p k ) (p k − 1)p k +O(1) = log(p j ) +O(1), (2.18) 73 where in the second sum we have used both relations in (2.14) to obtain log(p k ) p k (p k −1) = O 1 k 2 logk . Thus, using (2.18), that p n > n via (2.14), and recalling μ n in (2.12), we obtain μ n − 1 = 1 log(p n ) n X k=1 log(p k ) p k − 1 − log(p n ) ! =O 1 logn . (2.19) To prove the last claim, we sketch the coupling construction of (U,I) in Lemma 5 of [1], with I a function of the uniform U ∼U[0, 1], itself independent of X 1 ,...,X n . For j = 0, 1,...,n, set F j = j X k=1 P (I =k) = 1 μ n log(p n ) j X k=1 log(p k ) (p k − 1) , and define the random variable I by I =j if F j−1 ≤U <F j . Clearly I is independent of X 1 ,...,X n , since it only depends on U. When I = j, using |u−c| is a convex function of u for any constant c for the equality, deterministically we have U− log(p I ) log(p n ) ≤ sup u∈[F j−1 ,F j ) u− log(p j ) log(p n ) = max n F j−1 − log(p j ) log(p n ) , F j − log(p j ) log(p n ) o . (2.20) Now, using (2.14), (2.18) and (2.19), with (2.19) implying that μ n → 1 asn→∞, we have F j − log(p j ) log(p n ) = 1 log(p n ) j X k=1 log(p k ) p k − 1 − log(p j ) − 1 logp n (1−μ −1 n ) j X k=1 log(p k ) p k − 1 =O 1 logn − μ n − 1 μ n logp n (logp j +O(1)) =O 1 logn . Also, using (2.14) and again that μ n → 1, we have P (I =j) =F j −F j−1 = 1 μ n log(p n ) log(p j ) (p j − 1) =O 1 j logn . (2.21) Thus, by subtracting and adding F j , we obtain F j−1 − log(p j ) log(p n ) =O 1 j logn +O 1 logn =O 1 logn , 74 and hence, on the event I =j, from (2.20) we have U− log(p I ) log(p n ) =O 1 logn . (2.22) Now, using (2.21) and (2.22) we obtain E U− log(p I ) log(p n ) = n X j=1 P (I =j)E U− log(p I ) log(p n ) I =j =O n X j=1 1 j logn 1 logn =O 1 logn , thus proving the final claim. Proof of Theorem 4.1.3: The theorem follows directly from Theorem 4.2.1 upon invoking Lemma 4.2.3. For our next example, for n≥ 1 let Ω 0 n denote the set of square-free integers whose largest prime factor is less than or equal to p n and let Π 0 n denote the distribution on Ω 0 n with mass function Π 0 n (m) = 1 π 0 n m for m∈ Ω 0 n where π 0 n = P m∈Ω 0 n 1/m is the normalizing factor. We again consider S n as in (2.11), here for M n = Q n k=1 p X k k where X k ∼ Ber(1/(1 +p k )) are independent for 1≤k≤n. One can check, see e.g. [19], that M n ∼ Π 0 n . Following [1], let μ n =E[S n ] = 1 log(p n ) n X k=1 log(p k ) 1 +p k . (2.23) The following lemma combines Lemmas 3 and 5 of [1]. 
By following tightly the same lines of argument in [1] the bounds we obtain in (2.26) and (2.27) are O(1/ logn) whereas [1] claims only the order O(log logn/ logn). Lemma 4.2.4. Let S n be as in (2.11) with X 1 ,...,X n independent with X k ∼ Ber(1/(1 + p k )). With μ n as given in (2.23), let the random variable I take values in{1,...,n} with mass function P (I =k) = log(p k ) (1 +p k ) log(p n )μ n for k∈{1,...,n}, 75 and be independent of X 1 ,...,X n . For T n = log(p I ) log(p n ) − X I log(p I ) log(p n ) , (2.24) we have E[S n φ(S n )] =μ n E[φ(S n +T n )] for all φ∈ Lip 1/2 . (2.25) Moreover, μ n − 1 =O 1 logn and E X I log(p I ) log(p n ) =O 1 log 2 n , (2.26) and there exists a coupling between a random variableU∼U[0, 1] andI withU independent of S n such that E U− log(p I ) log(p n ) =O 1 logn . (2.27) Proof. The proof of (2.25) is exactly same as in Lemma 3 of [1] and one can follow the lines of argument in [1] to prove the second claim in (2.26); for a proof, see Appendix B. The proofs of the other two claims are similar to those of the corresponding results in Lemma 4.2.3 noting that the orders in the bounds do not change if we replace p k − 1 byp k + 1; we omit the computation. Proof of Theorem 4.1.4: The result follows directly from Theorem 4.2.1 upon invoking Lemma 4.2.4 with R φ = 0 for all φ∈ Lip 1/2 and noting that with T n and U as in (2.24) and (2.27) respectively, E|T n −U|≤E X I log(p I ) log(p n ) +E U− log(p I ) log(p n ) =O 1 logn , using (2.26) and (2.27) on these two terms, respectively. We also prove that these types of convergence results hold for S n given in (2.11) when X k ∼ Poi(λ k ), k≥ 1 for certain sequence of positive real numbers (λ k ) k≥1 . Here we take μ n equal to the mean of S n , μ n = 1 log(p n ) n X k=1 λ k log(p k ) and P (I =k) = λ k log(p k ) log(p n )μ n for k∈{1,...,n}, (2.28) 76 with I independent of S n . Under this framework, we have the following construction of a variable having the size bias distribution of S n . Lemma 4.2.5. For a sequence of positive real numbers (λ k ) 1≤k≤n and independent random variables X 1 ,...,X n with X k ∼ Poi(λ k ), let S n = 1 log(p n ) n X k=1 X k log(p k ). For μ n as in (2.28) and T n = log(p I )/ log(p n ), where I is distributed as in (2.28) and is independent of S n , we have E[S n φ(S n )] =μ n E[φ(S n +T n )] for all φ∈ Lip 1/2 . Proof. Using (2.4) in the second equality, for S (k) n =S n −X k log(p k )/ log(p n ), E[S n φ(S n )] = 1 log(p n ) n X k=1 log(p k )E[X k φ(S (k) n +X k log(p k )/ log(p n )] = 1 log(p n ) n X k=1 log(p k )λ k E[φ(S (k) n + (X k + 1) log(p k )/ log(p n )] = 1 log(p n ) n X k=1 log(p k )λ k E[φ(S n + log(p k )/ log(p n )] =μ n n X k=1 P (I =k)E[φ(S n + log(p k )/ log(p n )] =μ n E[φ(S n +T n )] where in the last step, we have used that I is independent of S n . We now present two applications of Lemma 4.2.5 with notation and assumptions as there. Example 2.1. Letλ k = 1/(1+p k ). As the mean of theX k variables are the same here as in Lemma 4.2.4, μ n and the distribution of I also correspond. Taking U∼U[0, 1] independent of S n , and coupling I and U similarly as in Lemma 4.2.4, we have that |μ n − 1| =O 1 logn and E U− log(p I ) log(p n ) =O 1 logn . Now, by Theorem 4.2.1 and Lemma 4.2.5 we obtain d 1,1 (S n ,D)≤ C logn 77 for some universal constant C. Example 2.2. Let λ 1 = 1 and λ k = 1− log(p k−1 )/ log(p k ) for k≥ 2. Then clearly μ n = 1. 
Now to obtain a coupling (T n ,U), we take U∼U[0, 1] independent of S n , and define I =k if log(p k−1 ) log(pn) ≤U < log(p k ) log(pn) for 1≤k≤n where we take p 0 = 1. Then by construction we have P (I =k) = λ k log(p k ) log(p n )μ n for 1≤k≤n. Conditioning on I, we have E|T n −U| = n X k=1 P (I =k)E log(p k ) log(p n ) −U I =k ≤ n X k=1 P (I =k) log(p k−1 ) log(p n ) − log(p k ) log(p n ) . Now using that p k /p k−1 ≤ 2 by Bertrand’s postulate (see e.g. [61]) for all k≥ 1, we obtain E|T n −U|≤ log(2) log(p n ) . Hence from Theorem 4.2.1 with μ n = 1 and R φ = 0 for all φ∈ Lip 1/2 , we have d 1,1 (S n ,D)≤ log(2) 2 log(p n ) ≤ C logn for some universal constant C. 3 Smoothness bounds for f(·) In this section, we prove Theorem 4.1.5. For some θ> 0, define the measure ν by dν/dv = θv θ−1 and consider the Stein equation (x/θ)f 0 (x) +f(x)−f(x + 1) =h(x)−E[h(D θ )]. (3.1) For notational simplicity, in what follows, let ρ i = θ/(θ +i) for i∈{1, 2}. Recall the averaging operator A x g defined in (1.17). Lemma 4.3.1. For non-negative α and β, letH α,β be as in (1.10). For every θ > 0, if h∈H α,β then A x h∈C 2 [(0,∞)] and both A x h and A x+1 h are elements ofH αρ 1 ,βρ 2 . 78 Proof. Take h∈H α,β . Since h∈ Lip α , for any a > 0, it is bounded on the interval [0,a] and hence is ν-integrable on [0,a]. For x> 0, taking derivative in (1.17), we have (A x h) 0 = θ x θ+1 Z x 0 h 0 (v)v θ dv so that|(A x h) 0 |≤ αθ (θ+1)x θ+1 R x 0 (θ + 1)v θ dv =αρ 1 and hence A x h∈ Lip αρ 1 . Taking another derivative we obtain (A x h) 00 = θ x θ+1 h 0 (x)x θ − θ + 1 x Z x 0 h 0 (v)v θ dv for x> 0. As h 0 ∈ Lip β , the function A x h is twice continuously differentiable on (0,∞) proving the first claim. Since x θ = θ + 1 x Z x 0 v θ dv we have (A x h) 00 = θ(θ + 1) x θ+2 Z x 0 (h 0 (x)−h 0 (v))v θ dv . Taking absolute value and using that h 0 ∈ Lip β now yields |(A x h) 00 |≤ θ(θ + 1) x θ+2 Z x 0 |h 0 (x)−h 0 (v)|v θ dv ≤ βθ(θ + 1) x θ+2 Z x 0 (x−v)v θ dv = βθ(θ + 1) x θ+2 x θ+2 (θ + 1)(θ + 2) = βθ θ + 2 =βρ 2 . Since both A x h and (A x h) 0 are continuous at 0 and belong in C 1 [(0,∞)], we obtain A x h∈ H αρ 1 ,βρ 2 . The final claim is a consequence of the fact that A x+1 h is a left shift of A x h. Now we are ready to prove Theorem 4.1.5. Recall from (2.6) in Chapter 2 that for some α≥ 0, we define h (?k) (x) =A k x+1 h for k≥ 0 and definition (2.5) of Lip α,0 for α≥ 0. Also recall the integral type Stein equation (1.16). Proof of Theorem 4.1.5: Takeh∈H 1,1 . By replacing h(·) byh−E[h(D θ )] we may assume E[h(D θ )] = 0. Clearly E[D θ ] =θ (see e.g. [31]). For h∈ Lip 1,0 , Theorem 2.2.2 shows that g(·) in (2.6) given by g(x) = P k≥0 h (?k) (x) is a Lip 1/(1−ρ 1 ) solution to (1.16). Since g(·) is 79 Lipschitz, we haveg∈ T a>0 L 1 ([0,a],ν). Also, one can verify thatf =A x g is differentiable and satisfies (x/θ)f 0 (x) +f(x)−f(x + 1) =g(x)−A x+1 g. Hence f =A x g is a solution to (3.1). Now for a> 0, for any function h∈L 1 ([0,a],ν), kA • hk [0,a] = sup x∈[0,a] |A x h|≤ sup x∈[0,a] 1 x θ Z x 0 |h(v)|θv θ−1 dv≤khk [0,a] . (3.2) Let g n (x) = n X k=0 h (?k) (x) and f n (x) =A x g n . Sinceg n ∈ Lip (1−ρ n+1 )/(1−ρ) by Theorem 2.2.2, it isν-integrable over [0,a]. Now using (3.2), the triangle inequality and from (2.7) of Theorem 2.2.2 thatkh (?k) k [0,a] ≤ (θ +a)ρ k 1 for all a> 0, noting E[D θ ] =θ, we have kf−f n k [0,a] =kA • g−A • g n k [0,a] ≤kg−g n k [0,a] ≤ sup x∈[0,a] X k≥n+1 kh (?k) k [0,a] ≤ (θ +a) X k≥n+1 ρ k 1 = (θ +a) ρ n+1 1 1−ρ 1 . Letting n→∞, we obtain f(x) = X n≥0 A x h (?n) . 
Lemma 4.3.1 and induction imply that A x h (?n) ∈C 2 [(0,∞)] and A x h (?n) ∈H ρ n+1 1 ,ρ n+1 2 for all n≥ 0, and hence k(A x h (?n) ) 0 k (0,∞) ≤ρ n+1 1 and k(A x h (?n) ) 00 k (0,∞) ≤ρ n+1 2 . (3.3) 80 Thus, for any a > 0, on the interval (0,a], f 0 n (x) = P n k=0 (A x h (?k) ) 0 and f 00 n (x) = P n k=0 (A x h (?k) ) 00 converge uniformly to the corresponding infinite sums respectively, not- ing that by (3.3), the infinite sums are absolutely summable. Thus we obtain (see e.g. Theorem 7.17 in [69]) f 0 (x) = lim n→∞ f 0 n (x) and f 00 (x) = lim n→∞ f 00 n (x) for all x∈ [0,a]. Hence, again using (3.3), withk·k (0,∞) the supremum norm defined as in (1.20), kf 0 k (0,∞) ≤ X n≥0 ρ n+1 1 = ρ 1 1−ρ 1 =θ and kf 00 k (0,∞) ≤ X n≥0 ρ n+1 2 = ρ 2 1−ρ 2 = θ 2 . Finally, sincef(·) andf 0 (·) are differentiable everywhere on (0,∞) with bounded derivative, they are absolutely continuous on (0,∞). Also bothf(·) andf 0 (·) are continuous at 0 since by definition, f(0) = A 0 g = g(0) = lim x↓0 f(x) and f 0 (0) = lim x↓0 f 0 (x). Now noting that if a function is absolutely continuous on (0,∞) with bounded derivative and continuous at 0, then it is Lipschitz, we obtain that f∈H θ,θ/2 . Remark 4.3.2. In contrast to the normal, one can not expect uniformly bounded second derivatives of the solutions f(·) of (3.1) in Theorem 4.1.5 assuming only a Lipschitz con- dition on the test functions h(·) in a classH. For b> 0 let h(x) = 0 x≤b x−b x>b. Clearly h∈ Lip 1 . Taking θ = 1 and s(x) = x, the function g(·) as in (3.18), with h(·) replaced by ¯ h(·) = h(·)−E[h(D)] is Lipschitz and solves (3.10) by Theorem 5.3.10, hence f(x) = A x g solves (3.1). Arguing as in the proof of Theorem 4.1.5 to interchange A x and the infinite sum, f(·) is given by f(x) = X k≥0 A x [A k •+1 ( ¯ h)]. (3.4) Consider the term k = 0 in the sum (3.4). Directly, one may verify that A x h = 0 x≤b (x−b) 2 2x x>b. (A x h) 0 = 0 x≤b 1 2 1− (b/x) 2 x>b. (3.5) 81 and (A x h) 00 = 0 x≤b b 2 x 3 x>b. (3.6) so in particular, lim x↓b (A x ¯ h) 00 = lim x↓b (A x h−Eh(D)) 00 = lim x↓b (A x h) 00 = 1/b, (3.7) which is not bounded as b↓ 0. From (3.5) and (3.6) respectively, we have that (A x+1 ¯ h) 0 ≤ 1/2 and (A x+1 ¯ h) 00 ≤b 2 /(x+ 1) 2 ≤ b 2 on (0,∞), and hence A x+1 ¯ h∈H α,β with α = 1/2 and β = b 2 , By Lemma 4.3.1 with ρ 1 = 1/2 and ρ 2 = 1/3, we have A k •+1 ( ¯ h)∈H α/2 k−1 ,β/3 k−1 for k≥ 1. Hence, again by Lemma 4.3.1, A x [A k •+1 ( ¯ h)]∈H α/2 k ,β/3 k on (0,∞) for k≥ 1. (3.8) Summing and substituting the vales of α and β, we obtain X k≥1 A x [A k •+1 ( ¯ h)]∈H 1/2,b 2 /2 . (3.9) From (3.4), (3.7) and (3.9), we find that f 00 (x) may be made arbitrarily large on a set of positive measure by choosing b> 0 sufficiently small. 82 Chapter 5 Perpetuities and theD θ,s Family, Simulations and Distributional Bounds 1 Introduction In this chapter, we will consider the connection between the class of Dickman distributions and perpetuities. By approaching from the view of utility, we extend the scope of the Dickman distributions past the currently known class. As discussed at the end of Chapter 1, the recursion (1.2) was interpreted in [79] by Vervaat as the relation between the values of a perpetuity at two successive times. In particular, during the n th time period a deposit of some fixed value, scaled to be unity, is added to the value of an asset. During that time period, a multiplicative factor in [0, 1], accounting for depreciation is applied. 
In (1.2) of Chapter 4, that factor is taken to be $U^{1/\theta}$ to obtain the recursion
\[
W_{n+1} = U_n^{1/\theta}(W_n + 1) \quad \text{for } n \ge 0, \ \text{with } W_0 = 0, \tag{1.1}
\]
where $U_m, m \ge 0$ are i.i.d. $\mathcal{U}[0,1]$ random variables and $U_n$ is independent of $W_n$. The generalized Dickman distributions arise as fixed points of this recursion, that is, solutions to $W^* =_d W$ where $W^*$ is given by
\[
W^* =_d U^{1/\theta}(W + 1) \tag{1.2}
\]
with $U \sim \mathcal{U}[0,1]$ independent of $W$. Measuring the value of an asset directly by its monetary value corresponds to the case where the utility function of an asset, denoted here by $s(\cdot)$, is taken to be the identity. We consider the generalization of (1.1) to
\[
s(W_{n+1}) = U_n^{1/\theta} s(W_n + 1). \tag{1.3}
\]
In [13], see also the translation [14], Daniel Bernoulli argued that utility should be given as a concave function of the value of an asset, typically justified by observing that receiving one unit of currency would be of more value to an individual who has very few resources than one who has resources in abundance, see [34]. We may then interpret (1.3) in a manner similar to (1.1), but now in terms of utility. Again, during the $n^{\rm th}$ time period, a constant value, scaled to be one, is added to an asset. Then, at time $n+1$, the utility of the asset is given by some discount factor applied to the incremented utility of the asset. When $s(\cdot)$ is invertible, as for the known Vervaat perpetuities, one can now ask for stable points of their long term behavior by seeking fixed points of the transformation
\[
W^* =_d s^{-1}\bigl(U^{1/\theta} s(W+1)\bigr). \tag{1.4}
\]
Theorem 5.2.3 in Section 2 shows that under mild and natural conditions on the utility function $s(\cdot)$ the transformation (1.4) has a unique fixed point, say $D_{\theta,s}$, which we say has the $(\theta,s)$-Dickman distribution, denoted here as $\mathcal{D}_{\theta,s}$. As the identity function $s(x)=x$ recovers the class of generalized Dickman distributions, this extended class strictly contains them. The parameter $\theta > 0$ here plays the same role for $\mathcal{D}_{\theta,s}$ as it does for $\mathcal{D}_\theta$, in particular in its appearance in the distributional bounds for simulation using recursive schemes. Theorem 5.4.2 generalizes the bound (1.3) of [41] to the $\mathcal{D}_{\theta,s}$ family, providing the inequality
\[
d_1(W, D_{\theta,s}) \le (1-\rho)^{-1} d_1(W^*, W) \tag{1.5}
\]
with a parameter $\rho$ given by a bound on an integral involving $\theta$ and $s(\cdot)$, see (4.2) and (4.3). We prove our results by generalizing the Stein equation (1.16) used in [40] to the new $\mathcal{D}_{\theta,s}$ family and proving smoothness bounds for its solution. To handle the $\mathcal{D}_{\theta,s}$ family, over the range $x > 0$ we generalize the form of the averaging operator (1.17) introduced in Chapter 4 to
\[
A_x g = \frac{1}{t(x)} \int_0^x g(u)\, t'(u)\,du, \quad \text{where } t(x) = s^\theta(x),
\]
and consider the Stein equation
\[
g(x) - A_{x+1} g = h(x) - E[h(D_{\theta,s})]. \tag{1.6}
\]
Smoothness bounds for solutions to this Stein equation will be given in Theorem 5.3.10 in Section 3 for a wide range of functions $s(\cdot)$. This generalization requires significant extensions of existing methods. We will apply (1.5) to assess the quality of the recursive scheme
\[
W_{n+1} = s^{-1}\bigl(U_n^{1/\theta} s(W_n + 1)\bigr) \quad \text{for } n \ge 0 \text{ and } W_0 = 0, \tag{1.7}
\]
for the simulation of variables having the $\mathcal{D}_{\theta,s}$ distribution. Simulation by these means for the $\mathcal{D}_\theta$ family was considered in [31], though no bounds on its accuracy were provided. An algorithmic method for the exact simulation from the $\mathcal{D}_\theta$ family was given in [37] with bounds on the expected running time.
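Since the scheme (1.7) only requires evaluating $s(\cdot)$ and $s^{-1}(\cdot)$, it is immediate to implement. The following minimal Python sketch is ours and illustrative only; the function name, sample size and number of iterations are arbitrary. Under the conditions of Corollary 5.4.5 below, the Wasserstein error after $n$ iterations decays at the geometric rate $(\theta/(\theta+1))^n$, so a moderate number of iterations suffices.

```python
import numpy as np

def sample_D_theta_s(theta, s, s_inv, n_iter=60, n_samples=100_000, seed=None):
    """Approximate samples from the (theta, s)-Dickman distribution by iterating
    the recursion (1.7): W_{n+1} = s^{-1}(U_n^{1/theta} * s(W_n + 1)), with W_0 = 0.
    All n_samples independent copies of the chain are advanced in parallel."""
    rng = np.random.default_rng(seed)
    w = np.zeros(n_samples)
    for _ in range(n_iter):
        u = rng.uniform(size=n_samples) ** (1.0 / theta)
        w = s_inv(u * s(w + 1.0))
    return w

# The identity utility s(x) = x recovers the generalized Dickman D_theta;
# for theta = 1 the sample mean should be close to E[D_1] = 1.
d = sample_D_theta_s(theta=1.0, s=lambda x: x, s_inv=lambda y: y, seed=0)
print(d.mean())
```

Replacing the identity pair by any utility and its inverse satisfying Condition 2.1, such as those appearing in the examples of Section 5, yields approximate samples from the corresponding $\mathcal{D}_{\theta,s}$ distribution.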
In brief, the method in [37] depends on the use of a multigamma coupler as an update function for the kernel K(x,·) :=L(U 1/θ (x + 1)), and on finding a dominating chain so that one can simulate from its stationary distribution, a shifted Geometric distribution in this case. To extend this approach to the more general familyD θ,s , one would consider the kernel K(x,·) :=L(U 1/θ s(x + 1)), and though one can generalize the multigamma coupler for use as an update function for this kernel, finding a suitable dominating chain in this generality may not be straightforward. The efficacy of a simpler recursive scheme for simulation from this family is addressed in (4.9) of Corollary 5.4.5 in Section 4 where we show that the iterates generated by (1.7) obey the inequality d 1 (W n ,D θ,s )≤ (1−ρ) −1 θ θ + 1 n E[s −1 (U 1/θ )], and which thus exhibit exponentially fast convergence. In Section 5 we will present some instances from the familyD θ,s that arise as limiting distributions for perpetuities when taking our utilities s(·) from those studied in economics. 2 Existence and uniqueness ofD θ,s distribution In this section we develop the extension of the generalized Dickman distribution to theD θ,s family for θ > 0 and a function s : [0,∞)→ [0,∞). As detailed in the Introduction, the recursion (1.2) associated with theD θ family can be interpreted as giving the successive values of a Vervaat perpetuity under the assumption that the utility function is the identity. More generally, with utility function s(·), one obtains the recursion s(W n+1 ) =U 1/θ n s(W n + 1) for n≥ 0, (2.1) 85 where U n ,n≥ 0 are independent and have the U[0, 1] distribution, U n is independent of W n , and W 0 has some given initial distribution. In the following we use the terms increasing and decreasing in the non-strict sense. Recall that≤ st denotes inequality between random variables in the stochastic order. Lemma 5.2.1. Let θ> 0 and s : [0,∞)→ [0,∞) satisfy s(x + 1)≤s(x) + 1 for all x≥ 0, (2.2) and let W 0 be a given non-negative random variable and {W n ,n ≥ 1} be generated by recursion (2.1). Then s(W n+1 )≤U 1/θ n (s(W n ) + 1) for all n≥ 0. (2.3) If in addition s(W 0 )≤ st D θ , then s(W n )≤ st D θ for all n≥ 0. (2.4) Proof. By applying (2.1) and (2.2) for the equality and inequality respectively, we have s(W n+1 ) =U 1/θ n s(W n + 1)≤U 1/θ n (s(W n ) + 1), hence the claim (2.3) holds, and when s(W n )≤ st D θ then U 1/θ n (s(W n ) + 1)≤ st U 1/θ n (D θ + 1) = d D θ , where for the final equality we have used thatD θ is fixed by the Dickman bias transformation (1.2), and taken U n independent of D θ . Induction then shows that the claim holds for all n≥ 0 when (2.4) is true for n = 0. Theorem 5.2.3, showing the existence and uniqueness of the fixed pointD θ,s to (1.4), requires the following condition to hold on the utility function s(·). Condition 2.1. The function s : [0,∞)→ [0,∞) is continuous, strictly increasing with s(0) = 0 and s(1) = 1, and satisfies s(x + 1)≤s(x) + 1 for all x≥ 0 (2.5) 86 and |s(x + 1)−s(y + 1)|≤|s(x)−s(y)| for all x,y≥ 0. (2.6) The following result shows that choice of the starting distribution in (2.1) has vanishing effect asymptotically as measured in the Wasserstein norm. Lemma 5.2.2. Letθ> 0 and Condition 2.1 be in force. AssumeW 0 andV 0 are independent of an i.i.d sequence U n ,n≥ 1 of U[0, 1] variables, are non-negative, and that the means of s(W 0 ) ands(V 0 ) are finite. Lets(V n ),n≥ 1 be generated in the same manner ass(W n ),n≥ 1 in (2.1). 
Then the variables s(W n ) and s(V n ) have finite mean for all n≥ 0, and d 1 (s(W n ),s(V n ))≤ θ θ + 1 n d 1 (s(W 0 ),s(V 0 )) for all n≥ 0. (2.7) Proof. By (2.3) of Lemma 5.2.1, the existence of E[s(W n )] implies the existence of E[s(W n+1 )]. Now induction and the assumption that E[s(W n )] is finite for n = 0 proves the expectation is finite for all n≥ 0. The claim (2.7) holds trivially for n = 0. Assuming it holds for some n≥ 0, by (2.1) and (2.6) we have |s(V n+1 )−s(W n+1 )|≤U 1/θ n |s(V n )−s(W n )|. Hence, by the independence of W n and V n from U n and definition (1.6) of the d 1 metric, we obtain d 1 (s(W n+1 ),s(V n+1 ))≤E[U 1/θ n |s(V n )−s(W n )|] = θ θ + 1 E|s(V n )−s(W n )|. Choosing the joint distribution of (s(V n ),s(W n )) to achieve the infimum in (1.6) and ap- plying the induction hypotheses, we obtain (2.7). Under Condition 2.1 on s(·), we will now prove Theorem 5.2.3 that shows that the distributional fixed pointsD θ,s of (2.1) exist and are unique. As Condition 2.1 implies that s(·) is invertible, when it is in force, we may also write (2.1) as W n+1 =s −1 U 1/θ n s(W n + 1) for n≥ 0. (2.8) 87 Define the generalized inverse of an increasing function s : [0,∞)→ [0,∞) as s − (x) = inf{y :s(y)≥x} (2.9) with the convention that inf∅ =∞. In particular for X a random variable, we consider s − (X) as a random variable taking values in the extended real line. When writing the stochastic order relation V ≤ st W between two extended value random variables, we mean that P (V ≥ t)≤ P (W≥ t) holds for all t in the extended real line. Note that s − (·) and s −1 (·) coincide on the range of s(·) when s(·) is continuous and strictly increasing. Theorem 5.2.3. Let θ > 0 and s(·) satisfy Condition 2.1. Then there exists a unique distributionD θ,s for a random variable D θ,s such that s(D θ,s ) has finite mean and satisfies D θ,s = d D ∗ θ,s , with D ∗ θ,s given by (1.4). In addition, D θ,s ≤ st s − (D θ ) where D θ ∼D θ . Proof. Generate a sequence W n ,n≥ 0 as in (2.8) with initial value W 0 = 0. We first prove that a distributional fixed point to the transformation (1.4) exists by showing the existence of a distributionD θ,s and a subsequence (n k ) k≥0 such that W n k → d D θ,s and W n k +1 → d D θ,s as k→∞ and W n k +1 = d W ∗ n k . (2.10) By Lemma 5.2.1 and the fact thats(W 0 ) =s(0) = 0, we haves(W n )≤ st D θ for alln≥ 0. As 0≤ s(W n )≤ st D θ , the sequence s(W n ),n≥ 0 is tight and therefore has a convergent subsequence s(W n k )→ d E θ,s for some distributionE θ,s . As s(·) is invertible W n k → d D θ,s whereD θ,s = d s −1 (E θ,s ) proving first claim in (2.10). As weak limits preserve stochastic order,E θ,s ≤ st D θ and henceD θ,s ≤ st s − (D θ ), as s(·) increasing implies that s − (·) given by (2.9) is also increasing. The last claim of the theorem is shown. Let the sequenceV n ,n≥ 0 be generated asW n is in (2.8) with initial valueV 0 = d W 1 and V 0 independent of U n ,n≥ 0. Note that s(V 0 ) has finite mean by Lemma 5.2.2, and hence (2.7) may be invoked to conclude thatd 1 (s(W n ),s(V n ))→ 0 asn→∞. Ass(W n k )→ d E θ,s , we have s(V n k )→ d E θ,s hence V n k → d D θ,s . As V 0 = d W 1 , we have V n = d W n+1 , implying W n k +1 → d D θ,s . The second claim in (2.10) is shown. The third claim holds by (2.8) and by definition (4.1) of the Dickman bias transform. 
By the first claim in (2.10) and the continuity of s(·) and s −1 (·), letting U∼U[0, 1] be independent of D θ,s ∼D θ,s and W n k , as k→∞ we have W ∗ n k = d s −1 U 1/θ s(W n k + 1) → d s −1 U 1/θ s(D θ,s + 1) = d D ∗ θ,s . 88 Hence, letting k→∞ in the third relation (2.10) we obtain D θ,s =D ∗ θ,s , showing thatD θ,s is a fixed point of the Dickman bias transformation (4.1). Now let W 0 and V 0 be any two fixed points of the transformation such that s(W 0 ) and s(V 0 ) have finite mean. Then the distributions of s(W n ) and s(V n ) do not depend on n, and (2.7) yields d 1 (s(W 0 ),s(V 0 )) =d 1 (s(W n ),s(V n ))→ 0 as n→∞. Hence s(W 0 ) = d s(V 0 ), and applying s −1 we conclude W 0 = d V 0 ; the fixed point is unique. 3 Smoothness bounds for g(·) In this section we turn to proving Theorem 5.3.10 which provides smoothness bounds for the solutions to (1.6) for a large class of functionss(·) corresponding to text functionsh∈ Lip 1,0 defined in (3.11). We develop the necessary tools building on [40]. For notational simplicity, in this section given (θ,s), let t(x) =s θ (x) for all x≥ 0. (3.1) Throughout this section t : [0,∞)→ [0,∞) will be strictly increasing and hence almost everywhere differentiable by Lebesgue’s Theorem, see e.g. Section 6.2 of [68], inducing the measure ν satisfying dν/dv = t 0 (v) on [0,∞), where v(·) is Lebesgue measure. For h∈L 1 ([0,a],ν) for some a> 0, define the averaging operator A x h = 1 t(x) Z x 0 h(v)t 0 (v)dv for x∈ (0,a], and A 0 h =h(0)1(t(0) = 0). (3.2) Lemma 5.3.1. Let t : [0,∞)→ [0,∞) be a strictly increasing function. If h∈L 1 ([0,a],ν) for some a> 0, then f(x) =A x h satisfies t(x) t 0 (x) f 0 (x) +f(x) =h(x) a.e. on (0,a]. (3.3) 89 Conversely, if in addition t(·) is locally absolutely continuous on [0,∞) with t(0) = 0, and f ∈ S α≥0 Lip α , then the function h(·) as given by the right hand side of (3.3) is in L 1 ([0,a],ν) for all a> 0 and f(x) =A x h for all x∈ (0,∞). (3.4) Proof. The first claim follows from the definition (3.2) of A x h by differentiation. For the second claim, noting that the caseα = 0 is trivial, fixα> 0. Sincet(·) is locally absolutely continuous and increasing, for any a> 0, Z a 0 h(v)t 0 (v)dv ≤ Z a 0 t(v)|f 0 (v)| +|f(v)|t 0 (v) dv≤αat(a) + (|f(0)| +αa)t(a)<∞ and hence h∈ L 1 ([0,a],ν) for all a > 0. Now note that the function f(x)t(x) is locally absolutely continuous on [0,∞) since both f(·) and t(·) are locally absolutely continuous and for any compact C⊂ (0,∞), the function g(u,v) = uv is Lipschitz on f(C)×t(C). Thus, for x> 0, we have A x h = 1 t(x) Z x 0 h(v)t 0 (v)dv = 1 t(x) Z x 0 (t(v)f 0 (v) +t 0 (v)f(v))dv = 1 t(x) (f(x)t(x)) =f(x). Lemma 5.3.2. If a function f : [0,∞)→ [0,∞) is increasing, continuous at 0 and locally absolutely continuous on (0,∞), then it is locally absolutely continuous on its domain. Proof. Since f(·) is absolutely continuous on any compact subset of (0,∞), by continuity of f(·) at 0, for 0 < ≤ x <∞, using absolute continuity on [,x] in the second equality and monotone convergence in the third, we have f(x)−f(0) = lim ↓0 (f(x)−f()) = lim ↓0 Z x f 0 (v)dv = Z x 0 f 0 (v)dv. Hence f(·) is locally absolutely continuous on its domain. Lemma 5.3.3. Lett : [0,∞)→ [0,∞) be given byt 1/θ (·) =s(·) fors(·) a strictly increasing locally absolutely continuous function on [0,∞) with s(0) = 0. Then t(·) is also locally absolutely continuous on [0,∞). 
Moreover, for W a non-negative random variable and W ∗ with distribution as in (1.4), for h∈ T a∈S L 1 ([0,a],ν) where S is the support of W + 1, E[h(W ∗ )] =E[A W +1 h] (3.5) 90 whenever either expectation above exists, and letting f(x) =A x h for all x∈S, E t(W ∗ ) t 0 (W ∗ ) f 0 (W ∗ ) +f(W ∗ ) =E[f(W + 1)], (3.6) when the expectation of either side exists. Proof. Since s(·) is locally absolutely continuous on [0,∞) and the function u θ is Lipschitz on any compact subset of (0,∞), we have thatt(·) is locally absolutely continuous on (0,∞), and hence the first claim of the lemma follows by Lemma 5.3.2. Next, as A x h exists for all x∈S for any h satisfying the hypotheses of the lemma and W ∗ ≤ st W +1 by (4.5), the averagesA W +1 h andA W ∗h both exist. Now let the expectation on the left hand side of (3.5) exist. Using (1.4) and (3.1) for the first equality and applying the change of variable v =ut(W + 1) in the resulting integral, we obtain E[h(W ∗ )] =E[h(t −1 [Ut(W + 1)])] =E Z 1 0 h(t −1 [ut(W + 1)])du =E " 1 t(W + 1) Z t(W +1) 0 h(t −1 (v))dv # =E " 1 t(W + 1) Z W +1 0 h(w)t 0 (w)dw # =E[A W +1 h], where in the second to last equality we have applied the change of variable t(w) = v and the fact thatt(0) = 0. When the expectation on the right hand side of (3.5) exists we apply the same argument, reading the display above from right to left. To prove the second claim of the lemma, by an argument similar to the one at the start of Section 3 of [40], the distribution of U 1/θ s(W + 1) is absolutely continuous with respect to Lebesgue measure, with density, sayp(·). By a simple change of variable, we obtain that W ∗ has density p W ∗(x) =p(s(x))s 0 (x) almost everywhere, and hence the distribution of W ∗ is also absolutely continuous with respect to Lebesgue measure. Thus by (3.3), E t(W ∗ ) t 0 (W ∗ ) f 0 (W ∗ ) +f(W ∗ ) =E[h(W ∗ )] and (3.6) follows from the first claim. 91 For an a.e. differentiable function f(·), let D t f(x) = t(x) t 0 (x) f 0 (x) +f(x)−f(x + 1). (3.7) Note that if f(x) = A x g for some g(·), then under the conditions of Lemma 5.3.1, by (3.3) we may write (3.7) as D t f(x) =g(x)−A x+1 g almost everywhere. (3.8) Condition 2.1 is assumed in some of the following statements to assure that the distri- bution of D θ,s exists uniquely. The proof of the next lemma follows using Lemmas 5.3.1 and 5.3.3, similar to the proof of Lemma 3.2 in [40]. Lemma 5.3.4. Let θ > 0 and s(·) satisfy Condition 2.1. If s(·) is locally absolutely con- tinuous on [0,∞), then, E[h(D θ,s )] =E[A D θ,s +1 h] and E[D t f(D θ,s )] = 0, for all h(·) ∈ T a∈(0,∞) L 1 ([0,a],ν) and f(·) ∈ S α≥0 Lip α for which E[D t f(D θ,s )] exists, respectively. Proof. The first equality follows from the fact that D ∗ θ,s = d D θ,s and (3.5) of Lemma 5.3.3. Forf(·)∈ Lip α withα> 0, lettingh(x) be given by as in the right hand side of (3.3), since t(·) is locally absolutely continuous on [0,∞) by Lemma 5.3.3, we have thath∈L 1 ([0,a],ν) for all a> 0 and f(x) =A x h for all x∈ (0,∞) by (3.4) of Lemma 5.3.1. Noting that the case α = 0 is trivial, the second claim now follows from (3.6). The second claim of the lemma and (3.7) suggest the Stein equation t(x) t 0 (x) f 0 (x) +f(x)−f(x + 1) =h(x)−E[h(D θ,s )], (3.9) which via (3.8) may be rewritten as g(x)−A x+1 g =h(x)−E[h(D θ,s )] (3.10) whenever g(·) is such that A x g exists for all x and f(x) =A x g. 
To use Stein’s method to bound the Wasserstein distance between a non-negative ran- dom variable and D θ,s , we first need to identify a set of broad sufficient conditions on t(·) 92 under which we can find a nice solution g(·) to (3.10) when h∈ Lip 1,0 , where, suppressing dependence on θ and s(·) for notational simplicity, for α> 0, we let Lip α,0 ={h : [0,∞)→R :h∈ Lip α ,E[h(D θ,s )] = 0}. (3.11) Note that by Lemma 5.3.3, if s(·) is strictly increasing with s(0) = 0, locally absolutely continuity of one of s(·) and t(·) implies that of the other. Hence, given that either one is locally absolutely continuous on [0,∞), as any continuous function h : [0,∞)→ R is bounded on [0,a] for all a≥ 0, we have h∈∩ a>0 L 1 ([0,a],ν). As the integrability of h(·) can thus be easily verified, it will not be given further mention. Lemma 5.3.5. Let t : [0,∞)→ [0,∞) be a strictly increasing and locally absolutely con- tinuous function on [0,∞). If h(·) is absolutely continuous on [0,a] for some a > 0 with a.e. derivative h 0 (·), then with A x h as in (3.2), (A x h) 0 = t 0 (x) t 2 (x) Z x 0 h 0 (u)t(u)du a.e. on x∈ (0,a]. (3.12) If there exists some ρ∈ [0,∞) such that ess sup x>0 I(x)≤ρ where I(x) = t 0 (x) t 2 (x) Z x 0 t(u)du, (3.13) then A x h∈ Lip αρ on [0,∞) whenever h∈ Lip α for some α≥ 0. Proof. For the first claim, first assume h(0) = 0. Using Fubini’s theorem in the third equality and then the local absolute continuity of t(·), for x∈ (0,a], we obtain A x h = 1 t(x) Z x 0 h(v)t 0 (v)dv = 1 t(x) Z x 0 Z v 0 t 0 (v)h 0 (u)dudv = 1 t(x) Z x 0 Z x u t 0 (v)h 0 (u)dvdu = 1 t(x) Z x 0 h 0 (u)[t(x)−t(u)]du, (3.14) and differentiation yields (3.12). To handle the case whereh(0) is not necessarily equal to zero, lettingh 0 (x) =h(x)−h(0) the result follows by noting that h 0 0 (·) = h 0 (·) and, by the absolute continuity of t(·), that (A x h 0 ) 0 = (A x h−h(0)) 0 = (A x h) 0 . 93 For the final claim, since h is continuous and thus h∈ T a>0 L 1 ([0,a],ν), using (3.12) and applying (3.13), for every x for which I(x)≤ρ and t 0 (x) exists, we obtain |(A x h) 0 | = t 0 (x) t 2 (x) Z x 0 h 0 (u)t(u)du ≤kh 0 k ∞ t 0 (x) t 2 (x) Z x 0 t(u)du≤αρ. (3.15) As t(·) is locally absolutely continuous, A x h, as seen by the first equality in (3.14), is a ratio of two locally absolutely continuous functions. For any fixed compact subset C of (0,∞), sinceu(x) := R x 0 h(v)t 0 (v)dv is continuous,u(C) is also compact and hence bounded. Also, since t(·) is strictly increasing with t(0)≥ 0, t(C) is bounded away from 0. Hence the function f(u,v) = u/v restricted to u(C)×t(C) is Lipschitz, implying that A x h is absolutely continuous on C. Thus, it follows that A x h∈ Lip αρ , as only x values in a set of measure zero have been excluded in (3.15). When the function s(·) is nice enough, we can actually say more about the constant ρ in (3.13) of Lemma 5.3.5. In Theorem 5.3.6 below, we provide conditions under which the integral boundkIk ∞ ≤ρ holds for some ρ∈ [0, 1). Theorem 5.3.6. Assume that θ> 0 and s : [0,∞)→ [0,∞) is concave and continuous at 0. Then with t(x) =s θ (x) and I(x) as given in (3.13), kIk ∞ ≤ θ θ + 1 . (3.16) If moreover s(·) is strictly increasing with s(0) = 0 and lim n→∞ s 0 (x n ) < ∞ for some sequence of distinct real numbers x n ↓ 0 in the domain of s 0 (·), then kIk ∞ = θ θ + 1 . (3.17) Proof. Sinces(·) is concave and continuous at 0, it is locally absolutely continuous withs 0 (·) decreasing almost everywhere on [0,∞). 
Since u θ+1 is Lipschitz on any compact interval, by composition,s θ+1 (·) is absolutely continuous on [0,x] for anyx≥ 0, and thus for almost every x, (θ + 1)I(x) θ = (θ + 1)s 0 (x) s θ+1 (x) Z x 0 s θ (v)dv≤ 1 s θ+1 (x) Z x 0 (θ + 1)s θ (v)s 0 (v)dv = s θ+1 (x)−s θ+1 (0) s θ+1 (x) ≤ 1, proving (3.16). 94 To prove the second claim, first note that 0< lim n→∞ s 0 (x n )<∞, the existence of the limit and second inequality holding by assumption, and the first inequality holding as s(·) is strictly increasing and s 0 (·) is decreasing almost everywhere. Thus, in the second equality using a version of the Stolz-Ces` aro theorem [76] adapted to accommodate s θ+1 (x n ) decreasing to zero, lim n→∞ I(x n ) =θ lim n→∞ s 0 (x n ) lim n→∞ R xn 0 s θ (v)dv s θ+1 (x n ) =θ lim n→∞ s 0 (x n ) lim n→∞ R xn x n+1 s θ (v)dv s θ+1 (x n )−s θ+1 (x n+1 ) =θ lim n→∞ s 0 (x n ) lim n→∞ R xn x n+1 s θ (v)dv (θ + 1) R xn x n+1 s θ (v)s 0 (v)dv = θ θ + 1 lim n→∞ s 0 (x n ) lim n→∞ 1 s 0 (x n ) = θ θ + 1 , where the penultimate equality follows from the fact that lim n→∞ 1 s 0 (x n ) = lim n→∞ 1 s 0 (x n+1 ) ≤ lim n→∞ R xn x n+1 s θ (v)dv R xn x n+1 s θ (v)s 0 (v)dv ≤ lim n→∞ 1 s 0 (x n ) and hence kIk ∞ ≥ θ θ + 1 which together with (3.16) proves (3.17). Remark 5.3.7. If θ > 0 and t(·) is given by t(·) = s θ (·) for s(·) concave and continuous at zero, thenkIk ∞ ≤ θ/(θ + 1) by Theorem 5.3.6. Hence ρ∈ [0, 1) always exists for such choices of t(·). Lemmas 5.3.8, 5.3.9 and Theorem 5.3.10 generalize Lemma 3.5, 3.6 and Theorem 3.1 in [40] for the generalized Dickman; hence we omit them here, see Appendix B for the proofs. Lemma 5.3.8. Letθ> 0 ands(·) satisfy Condition 2.1. Moreover assume thatμ =E[D θ,s ] exists. Then with Lip α,0 as in (3.11), for any α> 0, sup h∈Lip α,0 |h(0)| =αμ. To define iterates of the averaging operator on a function h, let A 0 x+1 h =h(x) and A n x+1 =A x+1 (A n−1 •+1 ) for n≥ 1, 95 and for a class of functionsH let A n x+1 (H) ={A n x+1 h :h∈H} for n≥ 0. Lemma 5.3.9. Lets(·) satisfy Condition 2.1 and be locally absolutely continuous on [0,∞). If there exists ρ∈ [0,∞) such that (3.13) holds, then for all θ> 0,α≥ 0 and n≥ 0, A n x+1 (Lip α,0 )⊂ Lip αρ n ,0 . In the following, by replacing h(x) byh(x)−E[h(D θ,s )], when handling the Stein equa- tions (3.9) and (3.10), without loss of generality we may assume that E[h(D θ,s )] = 0. For a given function h∈ Lip α,0 for some α≥ 0, let h (?k) (x) =A k x+1 h for k≥ 0, g(x) = X k≥0 h (?k) (x) and g n (x) = n X k=0 h (?k) (x). (3.18) Also recall definition (1.20) that for anya≥ 0 and functionf(·),kfk [0,a] = sup x∈[0,a] |f(x)|. Theorem 5.3.10. Let s(·) satisfy Condition 2.1 and be locally absolutely continuous on [0,∞). Further assume that μ = E[D θ,s ] exists. If there exists ρ∈ [0, 1) such that (3.13) holds, then for all a≥ 0 and h∈ Lip 1,0 we have kh (?k) k [0,a] ≤ (μ +a)ρ k , g n ∈ Lip (1−ρ n+1 )/(1−ρ) and g(·) given by (3.18) is a Lip 1/(1−ρ) solution to (3.10). 4 Distributional bounds forD θ,s approximation and Simula- tions In this section, we will provide distributional bounds for approximation of theD θ,s distri- bution. Theorem 5.4.2 extends the main Wasserstein bound (1.3) of [41] to d 1 (W,D θ,s )≤ (1−ρ) −1 d 1 (W ∗ ,W ) where W ∗ = d s −1 U 1/θ s(W + 1) (4.1) for U∼ U[0, 1], independent of W . The constant ρ is defined in (4.3) as a uniform bound on an integral involving (θ,s) given by (4.2). 
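In practice, whether a given utility satisfies (4.3) with a useful value of $\rho$ can also be checked numerically before appealing to Theorem 5.3.6. The sketch below is our own illustration, not part of the argument: it evaluates $I(x)$ of (4.2) on a finite grid using a crude trapezoidal quadrature, here for the exponential utility $s_\alpha$ considered in Example 5.1 of Section 5, for which Theorem 5.3.6 gives $\|I\|_\infty = \theta/(\theta+1)$ exactly.

```python
import numpy as np

def I_of_x(theta, s, ds, x, n_grid=4000):
    """Evaluate I(x) = theta * s'(x) * int_0^x s(v)^theta dv / s(x)^(theta + 1),
    as in (4.2), using a simple trapezoidal rule for the integral."""
    v = np.linspace(0.0, x, n_grid)
    fv = s(v) ** theta
    integral = (v[1] - v[0]) * (fv.sum() - 0.5 * (fv[0] + fv[-1]))
    return theta * ds(x) * integral / s(x) ** (theta + 1)

alpha, theta = 2.0, 1.0
s = lambda x: (1.0 - np.exp(-alpha * x)) / (1.0 - np.exp(-alpha))   # s_alpha of Example 5.1
ds = lambda x: alpha * np.exp(-alpha * x) / (1.0 - np.exp(-alpha))  # its derivative
grid = np.linspace(0.01, 25.0, 400)
print(max(I_of_x(theta, s, ds, x) for x in grid))  # close to, and below, theta/(theta+1) = 1/2
```

For utilities where the supremum in (4.3) is not available in closed form, such a scan gives a quick sanity check on the value of $\rho$ used in the bounds of this section.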
However, [11] shows that this quantity can be interpreted in terms of the Markov chain (2.8) and its properties connected to those of 96 its transition operator (Ph)(x) = E h s −1 U 1/θ s(x + 1) i in this, and some more general, cases. Though linear stochastic recursions are ubiquitous and are well known to be highly tractable, this special class of Markov chains, despite its non-linear transitions, seems also amenable to deeper analysis. We apply the inequality (4.1) in Corollary 5.4.5 to obtain a bound on the Wasserstein distance between the iterates W n of (2.8) and D θ,s . The following simple corollary to Lemma 5.2.2 gives a bound on how well the utility s(W n ), satisfying the recursion (2.1), approximates the long term utility of the fixed point D θ,s . Corollary 5.4.1. Let θ > 0 and Condition 2.1 be in force. Then s(W n ) given by (2.1) satisfies d 1 (s(W n ),s(D θ,s ))≤ θ θ + 1 n d 1 (s(W 0 ),s(D θ,s )) for all n≥ 0. Proof. The result follows from (2.7) of Lemma 5.2.2 by taking V 0 = d D θ,s and noting that D θ,s is fixed by the transformation (4.1) so that s(V n ) = d s(D θ,s ) for all n. For θ> 0, suppressed in the notation, and x> 0 such that s 0 (x) exists, let I(x) = θs 0 (x) s θ+1 (x) Z x 0 s θ (v)dv. (4.2) We note that the integral I(x) in (4.2) can be written as the one appearing in (3.13) when t(x) =s θ (x) as in (3.1). Corollary 5.4.1 depends on the direct coupling in Lemma 5.2.2, which constructs the variables s(W n ) and s(V n ) on the same space. Theorem 5.4.2 below, which follows from Theorem 5.3.10, gives a bound that is of use when W n itself is used to approximate the distribution of D θ,s . Though direct coupling can still be used to obtain bounds such as those in Theorem 5.4.2 for theD θ family, doing so is no longer possible for the more general D θ,s family as iterates of (2.8) can no longer be written explicitly when s(·) is non-linear. For S⊂ [0,∞), we say a function f : [0,∞)→ [0,∞) is locally absolutely continuous onS if it is absolutely continuous when restricted to any compact sub-interval of S. Unless otherwise stated, locally absolutely continuity will mean over the domain of f(·). 97 Theorem 5.4.2. Let θ > 0 and s : [0,∞)→ [0,∞) satisfying Condition 2.1 be locally absolutely continuous on [0,∞) and such that E[D θ,s ]<∞. With I(·) as in (4.2), if there exists ρ∈ [0, 1) such that kIk ∞ ≤ρ, (4.3) then for any non-negative random variable W with finite mean, d 1 (W,D θ,s )≤ (1−ρ) −1 d 1 (W ∗ ,W ). (4.4) In the special case s(x) =x, one may take ρ =θ/(θ + 1). Proof. The result follows by arguing similarly as in the proof of Theorem 1.3 of [40]. Let h∈ Lip 1 and note that h−Eh(D θ,s )∈ Lip 1,0 defined as in (3.11). Let g(·) be the solution to (3.10) guaranteed by Theorem 5.3.10 with g∈ Lip 1/(1−ρ) . As EW <∞ by assumption and g(·) is Lipschitz, Eg(W ) exists. By (3.2), it follows that A •+1 g is also Lipschitz and henceE[A W +1 g] also exists. Let (W,W ∗ ) be any coupling ofW to a random variableW ∗ having itsθ-Dickman biased distribution. Now, from the Stein equation (3.10), followed by (3.5) of Lemma 5.3.3 and the mean value theorem, we obtain |Eh(W )−Eh(D θ,s )| =|Eg(W )−EA W +1 g| =|Eg(W )−Eg(W ∗ )| ≤kg 0 k ∞ E|W−W ∗ |≤ (1−ρ) −1 E|W−W ∗ | where we have applied Theorem 5.3.10 for the final inequality. Taking supremum over h∈ Lip 1 in the left hand side and then infimum over all couplings (W,W ∗ ) on the right yields by (1.4) and (1.6) respectively. The final claim is obtained by applying Theorem 5.3.6 to s(x) =x. Remark 5.4.3. 
Note that E[s −1 (D θ )] <∞ implies E[D θ,s ] <∞ as D θ,s ≤ st s −1 (D θ ) by Theorem 5.2.3. Remark 5.4.4. By a simple argument, similar to the one in Section 3 of [41], for θ > 0 and s : [0,∞)→ [0,∞) satisfying Condition 2.1, (4.7) below and E[D θ,s ] <∞, for any non-negative random variable W with finite mean, we have d 1 (W,D θ,s )≤ (1 +θ)d 1 (W ∗ ,W ) 98 so that (4.4) holds with ρ =θ/(θ + 1). We highlight the differences between these two approaches. The use of Stein’s method in Theorem 5.4.2 does not require thats(·) satisfy (4.7) but does needs(·) to be locally absolutely continuous. In addition, the alternative approach in [41] has no scope for improvement in terms of finding the best constantρ; Example 5.2 presents a case where takingρ =θ/(θ +1) is not optimal. Theorem 5.3.6 gives a verifiable criteria by which one can show when the canonical choice ρ =θ/(θ + 1) is not improvable. Now we provide the following corollary applicable for the simulation ofD θ,s distributed random variables. Note that when s(·) is strictly increasing and continuous, for W inde- pendent of U∼ U[0, 1] the transform W ∗ as given by (1.4) satisfies W ∗ = d s −1 (U 1/θ s(W + 1))≤W + 1. (4.5) Corollary 5.4.5. Let s : [0,∞)→ [0,∞) be as in Theorem 5.4.2 and let{W n ,n≥ 1} be generated by (2.8) with W 0 non-negative and EW 0 <∞, independent of{U n ,n≥ 0}. If ρ∈ [0, 1) exists satisfying (4.3), then d 1 (W n ,D θ,s )≤ (1−ρ) −1 d 1 (W n+1 ,W n ). (4.6) Moreover, if s(·) satisfies |s −1 (as(x))−s −1 (as(y))|≤a|x−y| for a∈ [0, 1] and x,y≥ 1, (4.7) then d 1 (W n ,D θ,s )≤ (1−ρ) −1 θ θ + 1 n d 1 (W 1 ,W 0 ). (4.8) When W 0 = 0, d 1 (W n ,D θ,s )≤ (1−ρ) −1 θ θ + 1 n E[s −1 (U 1/θ )], (4.9) and in the particular the case of the generalized DickmanD θ family, d 1 (W n ,D θ )≤θ θ θ + 1 n . (4.10) 99 Proof. Identity (2.8), the inequality in (4.5) and induction show that W n ≤ W 0 +n, and hence EW n <∞, for all n≥ 0. Inequality (4.6) now follows from Theorem 5.4.2 noting from (1.4) that W ∗ n = d W n+1 for all n≥ 0. To show (4.8), recalling that the bound (1.6) is achieved for real valued random variables, for every n≥ 1 we may construct W 0 n−1 and V 0 n independent of U n such that W 0 n−1 = d W n−1 ,V 0 n = d W n and E|V 0 n −W 0 n−1 | =d 1 (W n ,W n−1 ). Now letting W 00 n =s −1 (U 1/θ n s(W 0 n−1 + 1)) and V 00 n+1 =s −1 (U 1/θ n s(V 0 n + 1)) we have W 00 n = d W n and V 00 n+1 = d W n+1 . Thus, using (1.6) followed by (4.7) we have d 1 (W n+1 ,W n )≤E|V 00 n+1 −W 00 n | =E|s −1 (U 1/θ n s(V 0 n + 1))−s −1 (U 1/θ n s(W 0 n−1 + 1))| ≤E[U 1/θ n |V 0 n −W 0 n−1 |] = θ θ + 1 d 1 (W n ,W n−1 ). Induction now yields d 1 (W n+1 ,W n )≤ θ θ + 1 n d 1 (W 1 ,W 0 ) and applying (4.6) we obtain (4.8). Inequality (4.9) now follows from (4.8) noting in this case, using s(1) = 1, that (W 0 ,W 1 ) = (0,s −1 (U 1/θ 0 )), and (4.10) is now achieved from (4.9) by takingρ to beθ/(θ+1), as provided by Theorem 5.4.2 when s(x) =x. In the remainder of this section, in Lemma 5.4.6 we present some general and easily verifiable conditions ons(·) for the satisfaction of (4.7), and lastly we show our bounds are equivalent to what can be obtained by a direct coupling method, in the cases where the latter is available. Condition 4.1. The function s : [0,∞)→ [0,∞) is continuous at 0, strictly increasing with s(0) = 0 and s(1) = 1, and concave. Lemma 5.4.6. If s : [0,∞)→ [0,∞) satisfies Condition 4.1, then it is locally absolutely continuous on [0,∞), satisfies Condition 2.1 and |s −1 (as(y))−s −1 (as(x))|≤a|y−x| for all x,y≥ 0 and a∈ [0, 1]. 
(4.11) 100 Proof. First, since s(·) is concave, it is locally absolutely continuous on (0,∞). Thus, by Lemma 5.3.2, s(·) is locally absolutely continuous on its domain. Next we show s(·) is subadditive, that is, that s(x +y)≤s(x) +s(y) for x,y≥ 0. (4.12) Taking x,y≥ 0, we may assume both x and y are non-zero as (4.12) is trivial otherwise since s(0) = 0. By concavity, y x +y s(0) + x x +y s(x +y)≤s(x) and x x +y s(0) + y x +y s(x +y)≤s(y). Since s(0) = 0, adding these two inequalities yield (4.12). Taking y = 1 and using s(1) = 1 we obtain (2.5). Next, the local absolute continuity and concavity ofs(·) on [0,∞) imply that it is almost everywhere differentiable on this domain, withs 0 (·) decreasing almost everywhere. Thus for x≥y≥ 0, we have s(x + 1)−s(x) = Z x+1 x s 0 (u)du≤ Z x+1 x s 0 (u +y−x)du = Z y+1 y s 0 (u)du =s(y + 1)−s(y), which together with the fact that s(·) is increasing implies (2.6). Hence s(·) satisfies Con- dition 2.1. Lastly, we show that s(·) satisfies (4.11). Since s(0) = 0 the inequality is trivially satisfied for a = 0, so fix some a∈ (0, 1]. Again as the result is trivial otherwise, we may take x6= y; without loss, let 0≤ x < y. The inverse function r(·) = s −1 (·) is continuous at zero and convex on the range S of s(·), a possibly unbounded convex subset [0,∞) that includes the origin. Letting u = s(x) and v = s(y), as s(·), and hence r(·), are strictly increasing and x6=y, inequality (4.11) may be written r(av)−r(au)≤a(r(v)−r(u)) or equivalently r(av)−r(au) av−au ≤ r(v)−r(u) v−u , (4.13) where all arguments of r(·) in (4.13) lie in S, it being a convex set containing{0,u,v}. The second inequality in (4.13) follows from the following slightly more general one that any convex function r : [0,∞)→ [0,∞) which is continuous at 0 satisfies by virtue of its local absolute continuity and a.e. derivative r 0 (·) being increasing: if (u 1 ,v 1 ) and (u 2 ,v 2 ) 101 are such that u 1 6= v 1 , u 1 ≤ u 2 and v 1 ≤ v 2 , and all these values lie in the range of r(·), then r(v 1 )−r(u 1 ) v 1 −u 1 = 1 v 1 −u 1 Z v 1 u 1 r 0 (w)dw = Z 1 0 r 0 (u 1 + (v 1 −u 1 )w)dw ≤ Z 1 0 r 0 (u 2 + (v 2 −u 2 )w)dw = 1 v 2 −u 2 Z v 2 u 2 r 0 (w)dw = r(v 2 )−r(u 2 ) v 2 −u 2 , as one easily has that u 1 + (v 1 −u 1 )w≤u 2 + (v 2 −u 2 )w for all w∈ [0, 1]. The bound (4.10) of Corollary 5.4.5 is obtained by specializing results for theD θ,s family, proven using the tools of Stein’s method, to the case where s(x) =x. For this special case, letting V j = U 1/θ j for j≥ 0 the iterates of the recursion (2.8), starting at W 0 = 0, can be written explicitly as W n = n−1 X k=0 n−1 Y j=k V j , allowing one to obtain bounds using direct coupling. Interestingly, the results obtained by both methods agree, as seen as follows. First, we show W n = d Y n where Y n = n−1 X k=0 k Y j=0 V j , and Y ∞ ∼D θ where Y ∞ = ∞ X k=0 k Y j=0 V j . The first claim is true since for every n≥ 1, (V 0 ,...,V n−1 ) = d (V n−1 ,...,V 0 ). For the second claim, note that the limit Y ∞ exists almost everywhere and has finite mean by monotone convergence. Now using definition (1.2), with U −1 ∼ U[0, 1] independent of U 0 ,U 1 ... and setting V −1 =U 1/θ −1 , we have Y ∗ ∞ =U 1/θ −1 (Y ∞ + 1) =V −1 ∞ X k=0 k Y j=0 V j + 1 = ∞ X k=0 k Y j=−1 V j +V −1 = ∞ X k=−1 k Y j=−1 V j = ∞ X k=0 k Y j=0 V j−1 = d ∞ X k=0 k Y j=0 V j =Y ∞ . 102 Hence Y ∞ ∼D θ . 
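As an aside, the series representation above provides a second, recursion-free way to generate approximately $\mathcal{D}_\theta$ distributed samples. The sketch below is ours, with an arbitrary truncation point and sample size; it compares the truncated series with the iterates of the recursion for $s(x)=x$, and both sample means should be close to $E[D_\theta]=\theta$.

```python
import numpy as np

rng = np.random.default_rng(2)
theta, K, m = 1.0, 60, 100_000

# Truncated series Y_K = sum_{k=0}^{K-1} prod_{j=0}^{k} V_j with V_j = U_j^{1/theta}.
V = rng.uniform(size=(m, K)) ** (1.0 / theta)
Y = np.cumprod(V, axis=1).sum(axis=1)

# Recursive scheme W_{n+1} = U_n^{1/theta} (W_n + 1), W_0 = 0, run for K steps.
W = np.zeros(m)
for _ in range(K):
    W = rng.uniform(size=m) ** (1.0 / theta) * (W + 1.0)

print(Y.mean(), W.mean())  # both close to E[D_theta] = theta = 1
```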
As (Y n ,Y ∞ ) is a coupling of a variable with the W n distribution to one with theD θ distribution, by (1.6) we obtain d 1 (W n ,D θ ) =d 1 (Y n ,Y ∞ )≤E|Y ∞ −Y n | =E ∞ X k=n k Y j=0 V j = ∞ X k=n θ θ + 1 k+1 =θ θ θ + 1 n , in agreement with (4.10). 5 Examples We now consider three new distributions that arise as special cases of the D θ,s family. Expected Utility (EU) theory has long been considered as an acceptable paradigm for decision making under uncertainty by researchers in both economics and finance, see e.g. [34]. To obtain tractable solutions to many problems in economics, one often restricts the EU criterion to a certain class of utility functions, which includes in particular the ones in Examples 5.1 and 5.3. In these two examples we apply the bounds provided in Corollary 5.4.5 for the simulation of the limiting distributions these functions give rise to via the recursion (2.8) with say, W 0 = 0. For each example we will verify Condition 4.1, implying Condition 2.1 by Lemma 5.4.6, and hence existence and uniqueness of D θ,s . Example 5.1. The exponential utility function u(x) = 1−e −αx is the only model, up to linear transformations, exhibiting constant absolute risk aversion (CARA), see [34]. Since utility is unique up to linear transformations, we consider its scaled version s α (x) = 1−e −αx 1−e −α for x≥ 0 characterized by a parameter α > 0. Clearly s α (·) is continuous at 0, strictly increasing with s α (0) = 0 and s α (1) = 1 and concave. Since lim x↓0 s 0 α (x) = α(1−e −α ) −1 ∈ (0,∞), for all θ > 0, by (3.17) of Theorem 5.3.6, one can take ρ to be θ/(θ + 1) and not strictly smaller, and (4.9) of Corollary 5.4.5 yields d 1 (W n ,D θ,sα )≤θ θ θ + 1 n−1 for all n≥ 0, using that 0≤s −1 α (U 1/θ )≤s −1 (1) = 1 almost surely. 103 Letting W α ∼D θ,sα it is easy to verify that s α (W α ) = d U 1/θ s α (W α + 1) =U 1/θ (1 +e −α s α (W α )). Using this identity, that 0≤s α (W α )≤ st D θ for all α> 0 and that lim α↓0 s α (x) =x for all x≥ 0 one can show that W α converges to D θ as α↓ 0. Hence, now setting s 0 (x) =x, the family of models D θ,sα ,α≥ 0 is parameterized by a tuneable values of α≥ 0 whose value may be chosen depending on a desired level of risk aversion, including the canonical α = 0 case where utility is linear. Example 5.2. Here we show how standard Vervaat perpetuity models can be seen to assume an implicit concave utility function, and how uncertainty in these utilities can be accommo- dated using the new families we introduce. Indeed, letting s θ (x) = x θ ,θ∈ (0, 1], it is easy to see thatD 1,s θ =D θ . To model situations where these utilities are themselves subject to uncertainty, we may letA be a random variable supported in (0, 1] and consider the mixture s(x) =E[s A (x)]. More formally, for some 0<a≤ 1, let μ be a probability measure on the interval (0,a], and define s(x) = Z a 0 s α (x)dμ(α). Clearly s(·) satisfies Condition 4.1. By (3.16) of Theorem 5.3.6, for the familyD θ,s one can take ρ =θ/(θ + 1). Let x > 0 be given and l∈ (0,x) be arbitrary. Since ∂x α /∂x = αx α−1 ≤ αl α−1 and αl α−1 is μ-integrable on [0,a], by dominated convergence we obtain s 0 (x) = Z a 0 ∂x α ∂x dμ(α) = Z a 0 αx α−1 dμ(α) for all x> 0. (5.1) Now note that fora< 1, lim x↓0 s 0 (x) diverges to infinity, and hence (3.17) of Theorem 5.3.6 cannot be invoked. We show, in fact, that one may obtain a bound better than θ/(θ + 1) in this case. 
Taking $\theta = 1$ and computing $I(x)$ directly from (4.2), using (5.1) for the first equality and Fubini's theorem for the second, we have
\[
I(x) = \frac{\left[\int_0^a \alpha x^{\alpha-1}\,d\mu(\alpha)\right]\left[\int_0^x \int_0^a v^{\alpha}\,d\mu(\alpha)\,dv\right]}{\left[\int_0^a x^{\alpha}\,d\mu(\alpha)\right]^2}
= \frac{\left[\int_0^a \alpha x^{\alpha-1}\,d\mu(\alpha)\right]\left[\int_0^a \frac{x^{\alpha+1}}{\alpha+1}\,d\mu(\alpha)\right]}{\left[\int_0^a x^{\alpha}\,d\mu(\alpha)\right]^2}
\]
\[
= \frac{\int_0^a\int_0^a \frac{\alpha}{\beta+1}\, x^{\alpha+\beta}\,d\mu(\alpha)\,d\mu(\beta)}{\left[\int_0^a x^{\alpha}\,d\mu(\alpha)\right]^2}
= \frac{\int_0^a\int_0^a \frac{1}{2}\left(\frac{\alpha}{\beta+1}+\frac{\beta}{\alpha+1}\right) x^{\alpha+\beta}\,d\mu(\alpha)\,d\mu(\beta)}{\int_0^a\int_0^a x^{\alpha+\beta}\,d\mu(\alpha)\,d\mu(\beta)}
\le \sup_{\alpha,\beta\in[0,a]} \frac{1}{2}\left(\frac{\alpha}{\beta+1}+\frac{\beta}{\alpha+1}\right).
\]
Using the simple fact that $(\beta-\alpha)^2 \le \beta-\alpha$ for $0\le\alpha\le\beta\le 1$ shows that for $0\le\alpha\le\beta\le a$,
\[
\frac{\alpha}{\beta+1}+\frac{\beta}{\alpha+1} \le \frac{2\beta}{\beta+1} \le \frac{2a}{a+1},
\]
and hence one can take $\rho = a/(a+1)$. Note that when $a = 1/2$, say, we obtain the upper bound $\rho = 1/3$, whereas the bound (3.16) of Theorem 5.3.6 gives $1/2$ when $\theta = 1$. Taking $\mu$ to be unit mass at $1$ yields $\rho = 1/2$, which recovers the bound on $\rho$ for the standard Dickman derived in [41], and as given in Theorem 5.4.2, for the value $\theta = 1$.

Example 5.3. The logarithm $u(x) = \log x$ is another commonly used utility function as it exhibits constant relative risk aversion (CRRA), which often simplifies many problems encountered in macroeconomics and finance, see [34]. Applying a shift to make it non-negative, let $s(x) = \log(x+1)/\log 2$ for $x \ge 0$. Clearly $s(\cdot)$ satisfies Condition 4.1. To apply Corollary 5.4.5 it remains to compute an upper bound $\rho$ on the integral in (4.2). Now since $\lim_{x\downarrow 0} s'(x) < \infty$, by (3.17) of Theorem 5.3.6, we may take $\rho = \theta/(\theta+1)$. Noting $s^{-1}(x) = 2^x - 1$, simulating from this distribution by the recursion
\[
W_{n+1} = (W_n + 2)^{U_n^{1/\theta}} - 1 \quad \text{for } n \ge 1
\]
with initial value $W_0 = 0$, inequality (4.9) of Corollary 5.4.5 yields
\[
d_1(W_n, D_{\theta,s}) \le \theta\left(\frac{\theta}{\theta+1}\right)^{n-1} \quad \text{for all } n \ge 0,
\]
using that $0 \le s^{-1}(U^{1/\theta}) = 2^{U^{1/\theta}} - 1 \le 1$ almost surely.

Bibliography
[1] Arras, B., Mijoule, G., Poly, G. and Swan, Y. (2016). Distances between probability distributions via characteristic functions and biasing. https://arxiv.org/abs/1605.06819v1.
[2] Arras, B., Mijoule, G., Poly, G. and Swan, Y. (2017). A new approach to the Stein-Tikhomirov method: with applications to the second Wiener chaos and Dickman convergence. https://arxiv.org/abs/1605.06819v2.
[3] Arratia, R., Barbour, A. and Tavaré, S. (2003). Logarithmic combinatorial structures: a probabilistic approach. EMS Monographs in Mathematics. European Mathematical Society (EMS), Zürich.
[4] Arratia, R., Goldstein, L. and Gordon, L. (1989). Two moments suffice for Poisson approximations: the Chen-Stein method. Ann. Prob., 9-25.
[5] Arratia, R., Goldstein, L. and Gordon, L. (1990). Poisson approximation and the Chen-Stein method. Stat. Sci., 403-424.
[6] Baldi, P., Rinott, Y. and Stein, C. (1989). A normal approximation for the number of local maxima of a random function on a graph. In Probability, statistics, and mathematics, 59-81, Academic Press, Boston, MA.
[7] Barbour, A. D. (1990). Stein's method for diffusion approximations. Probab. Theory Relat. Fields, 84(03), 297-322.
[8] Barbour, A. D. and Grübel, R. (1995). The first divisible sum. Journal of Theoretical Probability, 8(01), 39-47.
[9] Barbour, A. and Nietlispach, B. (2011). Approximation by the Dickman distribution and quasi-logarithmic combinatorial structures. Electron. J. Probab., 16, 880-902.
[10] Bártfai, P. (1966). Die Bestimmung der zu einem wiederkehrenden Prozeß gehörenden Verteilungsfunktion aus den mit Fehlern behafteten Daten einer einzigen Realisation. Studia Sci. Math. Hungar., 1, 161-168.
[11] Baxendale, P. (2017). Personal communication.
[12] Berkes, I., Liu, W. and Wu, W. B. (2014).
[12] Berkes, I., Liu, W. and Wu, W.B. (2014). Komlós-Major-Tusnády approximation under dependence. Ann. Prob., 42(02), 794-817.
[13] Bernoulli, D. (1738). Specimen theoriae novae de mensura sortis. Comentarii Academiae Scientiarum Imperiales Petropolitanae, 5, 175-192.
[14] Bernoulli, D. (1954). Exposition of a new theory on the measurement of risk. Econometrica: Journal of the Econometric Society, 23-36.
[15] Bhatt, A. and Roy, R. (2004). On a random directed spanning tree. Adv. Appl. Prob., 36(01), 19-42.
[16] Billingsley, P. (1999). Convergence of probability measures. Second ed., Wiley Series in Probability and Statistics: Probability and Statistics, John Wiley & Sons.
[17] Boucheron, S., Lugosi, G. and Massart, P. (2013). Concentration inequalities: A nonasymptotic theory of independence. Oxford University Press.
[18] Cacoullos, T. and Papathanasiou, V. (1992). Lower variance bounds and a new proof of the central limit theorem. J. Multivar. Anal., 43(02), 173-184.
[19] Cellarosi, F. and Sinai, Y. G. (2013). Non-standard limit theorems in number theory. Prokhorov and Contemporary Probability Theory, Springer Berlin, 197-213.
[20] Chatterjee, S. (2008). A new method of normal approximation. Ann. Prob., 36(04), 1584-1610.
[21] Chatterjee, S. (2009). Fluctuations of eigenvalues and second order Poincaré inequalities. Probab. Theory Relat. Fields, 143(1-2), 1-40.
[22] Chatterjee, S. (2012). A new approach to strong embeddings. Probab. Theory Relat. Fields, 152(1-2), 231-264.
[23] Chatterjee, S., Fulman, J. and Roellin, A. (2011). Exponential approximation by exchangeable pairs and spectral graph theory. ALEA, 8, 1-27.
[24] Chatterjee, S. and Meckes, E. (2008). Multivariate normal approximation using exchangeable pairs. ALEA, 4.
[25] Chen, L.H.Y. (1975). Poisson approximation for dependent trials. Ann. Prob., 534-545.
[26] Chen, L.H.Y., Goldstein, L. and Shao, Q.M. (2010). Normal approximation by Stein's method. Springer.
[27] Chen, L. H. and Shao, Q. M. (2004). Normal approximation under local dependence. Ann. Prob., 32(03), 1985-2028.
[28] Csörgő, M. and Révész, P. (1974/75). A new method to prove Strassen type laws of invariance principle. I, II. Z. Wahrscheinlichkeitstheorie und Verw. Gebiete, 31(04), 255-259; ibid. 31(04), 261-269.
[29] Csörgő, M. and Révész, P. (1981). Strong approximations in probability and statistics. Probability and Mathematical Statistics, Academic Press.
[30] Csörgő, S. and Hall, P. (1984). The Komlós-Major-Tusnády approximations and their applications. Austral. J. Statist., 26(02), 189-218.
[31] Devroye, L. and Fawzi, O. (2010). Simulating the Dickman distribution. Statistics and Probability Letters, 80(03), 242-247.
[32] Dickman, K. (1930). On the frequency of numbers containing prime factors of a certain relative magnitude. Ark. Mat. Astr. Fys., 22(10), 1-14.
[33] Donsker, M.D. (1952). Justification and extension of Doob's heuristic approach to the Kolmogorov-Smirnov theorems. Ann. Math. Stat., 23, 277-281.
[34] Eeckhoudt, L., Gollier, C. and Schlesinger, H. (2005). Economic and financial decisions under risk. Princeton University Press.
[35] Einmahl, U. (1989). Extensions of results of Komlós, Major, and Tusnády to the multivariate case. J. Multivar. Anal., 28(01), 20-68.
[36] Feller, W. (1968). An introduction to probability theory and its applications: Volume 1.
[37] Fill, J. and Huber, M. (2010). Perfect simulation of Vervaat perpetuities. Electron. J. Probab., 15(04), 96-109.
[38] Fulman, J. (2004). Stein's method and non-reversible Markov chains. In Stein's Method, 66-74. Institute of Mathematical Statistics.
[39] Goldstein, L. (2007). $L^1$ bounds in normal approximation. Ann. Prob., 35(05), 1888-1930.
[40] Goldstein, L. (2017). Non asymptotic distributional bounds for the Dickman approximation of the running time of the Quickselect algorithm. https://arxiv.org/abs/1703.00505v1.
[41] Goldstein, L. (2017). Non asymptotic distributional bounds for the Dickman approximation of the running time of the Quickselect algorithm. https://arxiv.org/abs/1703.00505v2.
[42] Goldstein, L. and Reinert, G. (1997). Stein's method and the zero bias transformation with application to simple random sampling. Ann. Appl. Prob., 7(04), 935-952.
[43] Goldstein, L. and Rinott, Y. (1996). Multivariate normal approximations by Stein's method and size bias couplings. J. Appl. Probab., 33(01), 1-17.
[44] Hardy, G. H. and Wright, E. M. (1979). An introduction to the theory of numbers. Oxford University Press.
[45] Hwang, H. K. and Tsai, T. H. (2002). Quickselect and the Dickman function. Comb. Probab. Comput., 11(04), 353-371.
[46] Joag-Dev, K. and Proschan, F. (1983). Negative association of random variables with applications. Ann. Stat., 286-295.
[47] Kiefer, J. (1969). On the deviations in the Skorokhod-Strassen approximation scheme. Z. Wahrscheinlichkeitstheorie und Verw. Gebiete, 13, 321-332.
[48] Komlós, J., Major, P. and Tusnády, G. (1975). An approximation of partial sums of independent RV's and the sample DF. I. Z. Wahrscheinlichkeitstheorie und Verw. Gebiete, 32, 111-131.
[49] Komlós, J., Major, P. and Tusnády, G. (1976). An approximation of partial sums of independent RV's and the sample DF. II. Z. Wahrscheinlichkeitstheorie und Verw. Gebiete, 34(01), 33-58.
[50] Lifshits, M. (2000). Lecture notes on strong approximation. Pub. IRMA Lille, 53(13).
[51] Luk, H. M. (1994). Stein's method for the gamma distribution and related statistical applications. Ph.D. thesis, University of Southern California.
[52] Obłój, J. (2004). The Skorokhod embedding problem and its offspring. Probab. Surv., 1, 321-390.
[53] Parzen, E. (1979). Nonparametric statistical data modeling. J. Am. Stat. Assoc., 74(365), 105-121.
[54] Peköz, E. (1996). Stein's method for geometric approximation. J. Appl. Probab., 33(03), 707-713.
[55] Peköz, E. and Röllin, A. (2011). New rates for exponential approximation and the theorems of Rényi and Yaglom. Ann. Prob., 587-608.
[56] Peköz, E., Röllin, A. and Ross, N. (2013). Degree asymptotics with rates for preferential attachment random graphs. Ann. Appl. Prob., 23(03), 1188-1218.
[57] Penrose, M. D. and Wade, A. R. (2004). Random minimal directed spanning trees and Dickman-type distributions. Adv. Appl. Prob., 36(03), 691-714.
[58] Pinsky, R. (2016). A natural probabilistic model on the integers and its relation to Dickman-type distributions and Buchstab's function. https://arxiv.org/abs/1606.02965.
[59] Pinsky, R. (2016). On the strange domain of attraction to generalized Dickman distributions for sums of independent random variables. https://arxiv.org/abs/1611.07207.
[60] Rachev, S. T. (1991). Probability metrics and the stability of stochastic models (Vol. 269). John Wiley & Sons Ltd.
[61] Ramanujan, S. (1919). A proof of Bertrand's postulate. Journal of the Indian Mathematical Society, 11(181-182), 27.
[62] Reinert, G. (2005). Three general approaches to Stein's method. In An Introduction to Stein's Method, 183-221.
[63] Rinott, Y. and Rotar, V. (1996). A multivariate CLT for local dependence with $n^{-1/2}\log n$ rate and applications to multivariate graph related statistics. J. Multivar. Anal., 56(02), 333-350.
[64] Rinott, Y. and Rotar, V. (1997). On coupling constructions and rates in the CLT for dependent summands with applications to the antivoter model and weighted U-statistics. Ann. Appl. Prob., 1080-1105.
[65] Rinott, Y. and Rotar, V. (2003). On Edgeworth expansions for dependency-neighborhoods chain structures and Stein's method. Probab. Theory Relat. Fields, 126(04), 528-570.
[66] Ross, N. (2011). Fundamentals of Stein's method. Probab. Surv., 8, 210-293.
[67] Rosser, B. (1939). The $n$th prime is greater than $n\log n$. Proceedings of the London Mathematical Society, 2(01), 21-44.
[68] Royden, H. L. and Fitzpatrick, P. (1988). Real analysis (Vol. 198, No. 8). New York: Macmillan.
[69] Rudin, W. (1964). Principles of mathematical analysis (Vol. 3). New York: McGraw-Hill.
[70] Shorack, G.R. and Wellner, J.A. (1986). Empirical processes with applications to statistics. Wiley Series in Probability and Mathematical Statistics, John Wiley & Sons.
[71] Skorokhod, A.V. (1961). Issledovaniya po teorii sluchainykh protsessov (Stokhasticheskie differentsialnye uravneniya i predelnye teoremy dlya protsessov Markova). Izdat. Kiev. Univ., Kiev.
[72] Skorokhod, A.V. (1965). Studies in the theory of random processes. Addison-Wesley.
[73] Smirnoff, N. (1939). Sur les écarts de la courbe de distribution empirique. Rec. Math. N.S. [Mat. Sbornik], 6(48), 3-26.
[74] Stein, C. (1972). A bound for the error in the normal approximation to the distribution of a sum of dependent random variables. Proc. Sixth Berkeley Symp. Math. Stat. Prob., 583-602.
[75] Stein, C. (1986). Approximate computation of expectations. Institute of Mathematical Statistics, Hayward, CA.
[76] Stolz, O. (1885). Vorlesungen über allgemeine Arithmetik: Nach den neueren Ansichten (Vol. 1). B.G. Teubner.
[77] Strassen, V. (1967). Almost sure behavior of sums of independent random variables and martingales. In Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, Vol. 2: Contributions to Probability Theory, Part 1, 315-343, The Regents of the University of California.
[78] Tao, T. and Vu, V. H. (2006). Additive combinatorics (Vol. 105). Cambridge University Press.
[79] Vervaat, W. (1979). On a stochastic difference equation and a representation of non-negative infinitely divisible random variables. Adv. Appl. Prob., 11(04), 750-783.
[80] Wu, W. B. and Zhao, Z. (2007). Inference of trends in time series. J. R. Stat. Soc. Series B Stat. Methodol., 69(03), 391-410.
[81] Zaitsev, A.Y. (1996). Estimates for the quantiles of smooth conditional distributions and the multidimensional invariance principle. Sibirsk. Mat. Zh., 37(04), 706-729.
[82] Zaitsev, A.Y. (1998). Multidimensional version of the results of Komlós, Major and Tusnády for vectors with finite exponential moments. ESAIM Probab. Statist., 2, 41-108.
[83] Zaitsev, A.Y. (2002). Estimates for the strong approximation in multidimensional central limit theorem. In Proceedings of the International Congress of Mathematicians, Vol. III (Beijing, 2002), Higher Ed. Press, Beijing, 107-116.

Appendix A

Proof of Theorem 3.1.1 using Lemma 3.4.1

Theorem 3.1.1. SE holds for $\varepsilon$, any random variable with mean zero and variance $1$ satisfying $E\varepsilon^3 = 0$, taking values in a finite set $\mathcal{A}$ not containing $0$.

Proof.
For each $r \ge 1$, let $m_r = 2^r$, $n_r = m_r - m_{r-1}$, and let $(S_k^{(r)}, Z_k^{(r)})_{0 \le k \le n_r}$ be a random vector satisfying the conclusions of Lemma 3.4.1, and suppose these random vectors are independent. Inductively define an infinite sequence $(S_k, Z_k)_{k \ge 0}$ as follows. Let $S_k = S_k^{(1)}$ and $Z_k = Z_k^{(1)}$ for $k \le m_1$. Having defined $(S_k, Z_k)_{k \le m_{r-1}}$, define $(S_k, Z_k)_{m_{r-1} \le k \le m_r}$ by
\[
S_k := S_{k - m_{r-1}}^{(r)} + S_{m_{r-1}} \quad \text{and} \quad Z_k := Z_{k - m_{r-1}}^{(r)} + Z_{m_{r-1}}.
\]
Clearly, since the increments are independent, $S_k$ and $Z_k$ are indeed random walks with increments distributed as $\varepsilon$ and Gaussian, respectively. Now recall the constants $A$ and $\lambda$ in Lemma 3.4.1. First, note that for each $r$, by Lemma 3.4.1 and independence we have
\[
E\exp(\lambda|S_{m_r} - Z_{m_r}|) \le E\exp\left(\lambda \sum_{l=1}^r |S_{n_l}^{(l)} - Z_{n_l}^{(l)}|\right) = \prod_{l=1}^r E\exp(\lambda|S_{n_l}^{(l)} - Z_{n_l}^{(l)}|) \le A^r. \qquad (0.1)
\]
Next, let
\[
C = \left(1 - A^{-1}\exp\left(-\tfrac{1}{2}A\log 4\right)\right)^{-1}.
\]
We will show by induction that for each $r$,
\[
E\exp\left(\lambda \max_{k \le m_r} |S_k - Z_k|\right) \le C A^r \exp(A\log m_r). \qquad (0.2)
\]
By Lemma 3.4.1 and the facts that $A > 1$ and $C > 1$, this holds for $r = 1$. Suppose it holds for $r-1$. By the inequality $\exp(x \vee y) \le \exp x + \exp y$, we have
\[
E\exp\left(\lambda \max_{k \le m_r} |S_k - Z_k|\right) \le E\exp\left(\lambda \max_{m_{r-1} \le k \le m_r} |S_k - Z_k|\right) + E\exp\left(\lambda \max_{k \le m_{r-1}} |S_k - Z_k|\right). \qquad (0.3)
\]
Since
\[
\max_{m_{r-1} \le k \le m_r} |S_k - Z_k| \le \max_{1 \le j \le n_r} |S_j^{(r)} - Z_j^{(r)}| + |S_{m_{r-1}} - Z_{m_{r-1}}|,
\]
by independence and Lemma 3.4.1, and the inequality (0.1), we get
\[
E\exp\left(\lambda \max_{m_{r-1} \le k \le m_r} |S_k - Z_k|\right) \le A^r \exp(A\log m_r).
\]
By the induction hypothesis and the relation $m_r = m_{r-1}^2$, we see that the second term in (0.3) satisfies
\[
E\exp\left(\lambda \max_{k \le m_{r-1}} |S_k - Z_k|\right) \le C A^{r-1}\exp(A\log m_{r-1}) = C A^{r-1}\exp\left(\frac{A\log m_r}{2}\right).
\]
Combining, we get
\[
E\exp\left(\lambda \max_{k \le m_r} |S_k - Z_k|\right) \le A^r \exp(A\log m_r)\left(1 + \frac{C}{A}\exp\left(-\frac{A\log m_r}{2}\right)\right).
\]
From the definition of $C$, it is easy to verify (since $m_r \ge 4$) that the term within the parentheses in the above expression is bounded by $C$. This completes the induction step, proving (0.2). Since $r \le M\log m_r$ for some constant $M$, this shows that there exists a constant $K$ such that for all $r$,
\[
E\exp\left(\lambda \max_{k \le m_r} |S_k - Z_k|\right) \le K\exp(K\log m_r).
\]
Now let us prove such an inequality for arbitrary $n$ instead of $m_r$. Take any $n \ge 2$ and let $r$ be such that $m_{r-1} \le n \le m_r$. Then since $m_r = m_{r-1}^2 \le n^2$, we have
\[
E\exp\left(\lambda \max_{k \le n} |S_k - Z_k|\right) \le E\exp\left(\lambda \max_{k \le m_r} |S_k - Z_k|\right) \le K\exp(K\log m_r) \le K\exp(2K\log n).
\]
It is now easy to complete the argument using Markov's inequality.
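The dyadic construction above can be mirrored in code. The sketch below is illustrative only and not from the dissertation: it stitches independent block couplings into a single pair of walks $(S_k, Z_k)$. The placeholder block_coupling does not implement the coupling of Lemma 3.4.1; it simply pairs each step with an independent standard normal, which suffices to show how the blocks are shifted and concatenated. The symmetric $\pm 1$ step law is just one admissible choice of $\varepsilon$ (mean zero, variance one, vanishing third moment, values in a finite set not containing zero).

import numpy as np

rng = np.random.default_rng(0)

def block_coupling(n, rng):
    """Hypothetical stand-in for Lemma 3.4.1: returns a coupled pair of
    partial-sum paths with n steps (n + 1 points each), started at 0."""
    eps = rng.choice([-1.0, 1.0], size=n)      # +/-1 steps for the epsilon walk
    gauss = rng.standard_normal(n)             # steps for the Gaussian walk
    S = np.concatenate(([0.0], np.cumsum(eps)))
    Z = np.concatenate(([0.0], np.cumsum(gauss)))
    return S, Z

def stitched_walks(R, rng):
    """Build (S_k, Z_k) for k <= 2**R by the dyadic scheme of the proof:
    m_r = 2**r, the r-th block has n_r = m_r - m_{r-1} steps, and each new
    block is shifted to start at the current endpoints (S_{m_{r-1}}, Z_{m_{r-1}})."""
    S, Z = block_coupling(2, rng)              # first block covers k <= m_1 = 2
    for r in range(2, R + 1):
        n_r = 2**r - 2**(r - 1)
        S_blk, Z_blk = block_coupling(n_r, rng)
        S = np.concatenate((S, S[-1] + S_blk[1:]))
        Z = np.concatenate((Z, Z[-1] + Z_blk[1:]))
    return S, Z

S, Z = stitched_walks(R=10, rng=rng)
print(len(S), np.max(np.abs(S - Z)))           # coupling discrepancy up to k = 1024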
Appendix B

Dickman approximation proofs

Lemma 4.2.4. Let the random variable $I$ take values in $\{1, \ldots, n\}$ with mass function
\[
P(I = k) = \frac{\log(p_k)}{(1 + p_k)\log(p_n)\mu_n} \quad \text{for } k \in \{1, \ldots, n\},
\]
and be independent of all other variables. For
\[
T_n = \frac{\log(p_I)}{\log(p_n)} - X_I\frac{\log(p_I)}{\log(p_n)},
\]
we have
\[
E[S_n\phi(S_n)] = \mu_n E[\phi(S_n + T_n)] \quad \text{for all } \phi \in \mathrm{Lip}_{1/2}.
\]
Moreover,
\[
\mu_n - 1 = O\left(\frac{1}{\log n}\right) \quad \text{and} \quad E\left[X_I\frac{\log(p_I)}{\log(p_n)}\right] = O\left(\frac{1}{\log^2 n}\right), \qquad (0.1)
\]
and there exists a coupling between a random variable $U \sim \mathcal{U}[0,1]$ and $I$, with $U$ independent of $S_n$, such that
\[
E\left|U - \frac{\log(p_I)}{\log(p_n)}\right| = O\left(\frac{1}{\log n}\right).
\]

Proof. To prove the first equality, we reproduce the proof of Lemma 5 in [1] here. Let $\phi(\cdot)$ be a function such that $E[S_n\phi(S_n)] < \infty$ and $E[\phi(S_n + T_n)] < \infty$. Then, letting $S_n^{(k)} = S_n - X_k\log(p_k)/\log(p_n)$ and using the definition of $I$, we have
\[
E[S_n\phi(S_n)] = \frac{1}{\log(p_n)}\sum_{k=1}^n \log(p_k) E\left[X_k\,\phi\left(S_n^{(k)} + X_k\frac{\log(p_k)}{\log(p_n)}\right)\right] = \mu_n\sum_{k=1}^n \frac{\log(p_k)}{\mu_n(p_k+1)\log(p_n)} E\left[\phi\left(S_n^{(k)} + \frac{\log(p_k)}{\log(p_n)}\right)\right]
\]
\[
= \mu_n E\left[\phi\left(S_n^{(I)} + \frac{\log(p_I)}{\log(p_n)}\right)\right] = \mu_n E[\phi(S_n + T_n)].
\]
To prove the second result in (0.1), we write
\[
E\left[X_I\frac{\log(p_I)}{\log(p_n)}\right] = \sum_{k=1}^n P(I = k)\, E[X_k]\, \frac{\log(p_k)}{\log(p_n)} = \frac{1}{\mu_n}\frac{1}{(\log p_n)^2}\sum_{k=1}^n \frac{(\log p_k)^2}{(1+p_k)^2}.
\]
The result now follows from (2.17) and the fact that $\mu_n \to 1$ as $n \to \infty$, noting that (2.17) holds true if $p_k - 1$ is replaced by $p_k + 1$. The proofs of the other two claims are similar to those of the corresponding results in Lemma 4.2.3, noting that the rates do not change if we replace $p_k - 1$ by $p_k + 1$; we omit the computation.

Lemma 5.3.8. Let $\theta > 0$ and $s(\cdot)$ satisfy Condition 2.1. Moreover assume that $\mu = E[\mathcal{D}_{\theta,s}]$ exists. Then with $\mathrm{Lip}_{\alpha,0}$ as in (3.11), for any $\alpha > 0$,
\[
\sup_{h \in \mathrm{Lip}_{\alpha,0}} |h(0)| = \alpha\mu. \qquad (0.2)
\]

Proof. By replacing $h(\cdot)$ by $-h(\cdot)$ if $h(0) < 0$, we may assume $h(0) \ge 0$. Observe that $h_\alpha(x) = \alpha(\mu - x) \in \mathrm{Lip}_{\alpha,0}$, and as $h_\alpha(0) = \alpha\mu$, the supremum in (0.2) is at least $\alpha\mu$. If $h \in \mathrm{Lip}_\alpha$ is such that $h(0) = \alpha\mu + \epsilon$ for some $\epsilon > 0$, then
\[
h(x) \ge h(0) - \alpha x = \alpha\mu + \epsilon - \alpha x = h_\alpha(x) + \epsilon,
\]
implying $E[h(\mathcal{D}_{\theta,s})] \ge \epsilon$. Hence $h \notin \mathrm{Lip}_{\alpha,0}$.

Recall that $A^0_{x+1}h = h(x)$ and $A^n_{x+1} = A_{x+1}(A^{n-1}_{\bullet+1})$ for $n \ge 1$, and for a class of functions $\mathcal{H}$ that $A^n_{x+1}(\mathcal{H}) = \{A^n_{x+1}h : h \in \mathcal{H}\}$ for $n \ge 0$.

Lemma 5.3.9. Let $s(\cdot)$ satisfy Condition 2.1 and be locally absolutely continuous on $[0,\infty)$. If there exists $\rho \in [0,\infty)$ such that (3.13) holds, then for all $\theta > 0$, $\alpha \ge 0$ and $n \ge 0$,
\[
A^n_{x+1}(\mathrm{Lip}_{\alpha,0}) \subset \mathrm{Lip}_{\alpha\rho^n,0}.
\]

Proof. The statement is trivially true for $n = 0$. Now if $h \in \mathrm{Lip}_{\beta,0}$ for some $\beta > 0$, then $h \in \mathrm{Lip}_\beta$, hence $A_{x+1}h$ exists for all $x \ge 0$ and $A_{\bullet+1}h \in \mathrm{Lip}_{\beta\rho}$ by Lemma 5.3.5 and the fact that $A_{x+1}h$ is a left shift of $A_x h$. Additionally, as $Eh(\mathcal{D}_{\theta,s}) = 0$ we conclude that $E[A_{\mathcal{D}_{\theta,s}+1}h] = 0$ by Lemma 5.3.4. Hence $A_{\bullet+1}h \in \mathrm{Lip}_{\beta\rho,0}$. Assuming the claim of the lemma holds for some $n \ge 0$ and letting $\beta = \alpha\rho^n$, we see that it also holds for $n+1$, and hence the result follows by induction.

Theorem 5.3.10. Let $s(\cdot)$ satisfy Condition 2.1 and be locally absolutely continuous on $[0,\infty)$. Further assume that $\mu = E[\mathcal{D}_{\theta,s}]$ exists. If there exists $\rho \in [0,1)$ such that (3.13) holds, then for all $a \ge 0$ and $h \in \mathrm{Lip}_{1,0}$ we have
\[
\|h^{(\star k)}\|_{[0,a]} \le (\mu + a)\rho^k, \qquad (0.3)
\]
$g_n \in \mathrm{Lip}_{(1-\rho^{n+1})/(1-\rho)}$, and $g(\cdot)$ given by (3.18) is a $\mathrm{Lip}_{1/(1-\rho)}$ solution to (3.10).

Proof. For all $a > 0$ we show that the partial sums $g_n$, $n \ge 0$, as given in (3.18) form a Cauchy sequence in the space of continuous functions on $[0,a]$ under the supremum norm. By Lemma 5.3.9 we see that $h^{(\star k)} \in \mathrm{Lip}_{\rho^k,0}$, and by Lemma 5.3.8 that $|h^{(\star k)}(0)| \le \mu\rho^k$. Hence (0.3) holds. Summing, we have $g_n \in \mathrm{Lip}_{(1-\rho^{n+1})/(1-\rho)}$, and $g_n$ is therefore continuous. In addition, the sequence $\{g_n, n \ge 0\}$ is Cauchy in the $\|\cdot\|_{[0,a]}$ norm for all $a > 0$, since for $n \ge m \ge 0$,
\[
\|g_n - g_m\|_{[0,a]} \le \sum_{k=m+1}^n \|h^{(\star k)}\|_{[0,a]} \le (\mu + a)\sum_{k \ge m+1}\rho^k = (\mu + a)\frac{\rho^{m+1}}{1-\rho},
\]
which tends to zero as $m \to \infty$. Hence the limit $g(\cdot)$ exists on $[0,\infty)$ and is an element of $\mathrm{Lip}_{1/(1-\rho)}$. By similar reasoning, see the proof of Lemma 3.6 of [40], we also have that
\[
A_{x+1}g = \sum_{k \ge 1} h^{(\star k)}(x),
\]
where the sum converges uniformly over compact intervals. Thus, $g(\cdot)$ solves (3.10), as
\[
g(x) - A_{x+1}g = \sum_{k \ge 0} h^{(\star k)}(x) - \sum_{k \ge 1} h^{(\star k)}(x) = h(x).
\]
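The first identity of Lemma 4.2.4 lends itself to a quick simulation check. The sketch below is illustrative and not from the dissertation; it assumes, as the displayed computation suggests, that the $X_k$ are independent Bernoulli$(1/(1+p_k))$ variables, that $S_n = \sum_{k \le n} X_k \log(p_k)/\log(p_n)$ with $p_k$ the $k$-th prime, and that $\mu_n = \sum_{k \le n} \log(p_k)/((1+p_k)\log(p_n))$; all function names are ours.

import numpy as np

# The first n = 10 primes; p_k = primes[k-1] in the notation of Lemma 4.2.4.
primes = np.array([2, 3, 5, 7, 11, 13, 17, 19, 23, 29], dtype=float)

def check_bias_identity(phi, n_samples=10**5, seed=1):
    """Monte Carlo check of E[S_n phi(S_n)] = mu_n E[phi(S_n + T_n)]."""
    rng = np.random.default_rng(seed)
    n = len(primes)
    w = np.log(primes) / np.log(primes[-1])            # log(p_k) / log(p_n)
    q = 1.0 / (1.0 + primes)                            # assumed P(X_k = 1)
    mu_n = np.sum(w * q)                                 # normalizer of P(I = k)
    X = (rng.random((n_samples, n)) < q).astype(float)   # Bernoulli draws
    S = X @ w
    # I has P(I = k) = w_k q_k / mu_n and is drawn independently of the X's.
    I = rng.choice(n, size=n_samples, p=w * q / mu_n)
    T = (1.0 - X[np.arange(n_samples), I]) * w[I]        # T_n = (1 - X_I) log(p_I)/log(p_n)
    return np.mean(S * phi(S)), mu_n * np.mean(phi(S + T))

lhs, rhs = check_bias_identity(lambda s: np.sqrt(1.0 + s))  # phi lies in Lip_{1/2} on [0, inf)
print(lhs, rhs)  # the two values should agree up to Monte Carlo error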
Abstract
The aim of this dissertation is twofold. First, in Part I, we apply Stein's method in the context of embedding problems to extend Chatterjee's novel approach to more general random walks. Specifically, we provide $\log n$ rates for the coupling of partial sums of independent variables to a Brownian motion when the steps are independent and identically distributed (i.i.d.) with mean zero, variance one, vanishing third moment, and taking values in a finite set not containing zero.

In Part II of the dissertation, we consider Dickman approximation using Stein's method. The generalized Dickman distribution ${\cal D}_\theta$ with parameter $\theta > 0$ is the unique solution to the distributional equality $W =_d W^*$, where
\begin{equation} \label{eq:W*.transabs}
W^* =_d U^{1/\theta}(W+1),
\end{equation}
with $W$ non-negative, $U \sim {\cal U}[0,1]$ uniform over the interval $[0,1]$ and independent of $W$, and $=_d$ denoting equality in distribution. Members of this family appear in number theory, stochastic geometry, perpetuities and the study of algorithms. We obtain bounds in Wasserstein type distances between ${\cal D}_\theta$ and certain randomly weighted sums of Bernoulli and Poisson random variables. We also give simple proofs and provide rates for the Dickman convergence of certain weighted sums arising in probabilistic number theory.

In addition, we broaden the class of generalized Dickman distributions by studying the fixed points of the transformation
\begin{equation*}
s(W^*) =_d U^{1/\theta} s(W+1),
\end{equation*}
generalizing (0.1), which allows the introduction of non-identity utility functions $s(\cdot)$ into Vervaat perpetuities. We obtain distributional bounds for recursive methods that can be used to simulate from this family.
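As a concrete illustration of the fixed-point equation described above: taking expectations in $W =_d U^{1/\theta}(W+1)$ with $U$ independent of $W$ gives $EW = \frac{\theta}{\theta+1}(EW + 1)$, so $E[{\cal D}_\theta] = \theta$. The following minimal sketch, not part of the dissertation, iterates the recursion $W_{n+1} = U_n^{1/\theta}(W_n + 1)$ from $W_0 = 0$ and compares the sample mean with this value; the function name is illustrative only.

import numpy as np

def dickman_samples(theta, n_steps, n_paths, seed=0):
    """Iterate W_{k+1} = U_k^{1/theta} (W_k + 1) from W_0 = 0; for large n_steps
    the law of W_n is close to the generalized Dickman distribution D_theta."""
    rng = np.random.default_rng(seed)
    w = np.zeros(n_paths)
    for _ in range(n_steps):
        w = rng.random(n_paths) ** (1.0 / theta) * (w + 1.0)
    return w

for theta in (0.5, 1.0, 2.0):
    w = dickman_samples(theta, n_steps=60, n_paths=10**5)
    print(theta, w.mean())   # sample mean should be close to E[D_theta] = theta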