CLT, LDP AND INCOMPLETE GAMMA FUNCTIONS

by Hongjian Zhou

A Thesis Presented to the Faculty of the Dornsife College of Letters, Arts and Sciences, University of Southern California

Master of Science, APPLIED MATHEMATICS

December 2019

Contents

1 Introduction
  1.1 Motivation
  1.2 CLT, LLN
  1.3 Large Deviation and Moderate Deviation
  1.4 Incomplete Gamma Function and Poisson Process
2 Rate Function and Large Deviation Principle
  2.1 Definition and Properties of Rate Function
  2.2 Large Deviation Principle
  2.3 Examples of Rate Functions
3 Moderate Deviation and Normal Tail
  3.1 Derivation of Moderate Deviation Approximation
  3.2 Normal Tail Bounds
  3.3 Connecting Moderate and Large Deviations
4 Laplace's Method
  4.1 Summary of Laplace's Method
  4.2 Examples of Laplace's Method
5 Numerical Experiments
  5.1 CDF of Gamma Distribution Near Zero: Low Tail
  5.2 CDF of Gamma Distribution at Infinity: Upper Tail
  5.3 An "Engineering Calculator" Algorithm
A Additional Python Codes
Bibliography

List of Tables

1.3.1 LD, MD, SLD
1.4.1 Three approximations of the right-hand side of (1.4.4)
4.1.1 Summary of Laplace's approximations

List of Figures

2.3.1 Rate Function of Bernoulli Random Variable
2.3.2 Rate Function of Poisson Random Variable
2.3.3 Rate Function of Gamma Random Variable
2.3.4 Rate Function of Normal Random Variable
3.2.1 Upper Tail Approximation of Standard Normal CDF: Relative Error
4.2.1 Normal Approximation of Gamma Distribution: n = 60
4.2.2 Normal Approximation of Gamma Distribution: n = 600
4.2.3 Normal Approximation of Gamma Distribution, n = 6.6, with the Tricomi correction
4.2.4 Normal Approximation of Gamma Distribution, n = 6.6, with the Berry-Esseen-Petrov correction
5.1.1 Incomplete Gamma function/Gamma CDF: integration-by-parts approximation
5.1.2 Incomplete Gamma function/Gamma CDF: alternating series approximation
5.2.1 Incomplete Gamma function/Gamma CDF: upper tail approximation
5.3.1 Mid-Range Approximation of Standard Normal CDF: Relative Error
5.3.2 A Global Approximation of P(25, a)
5.3.3 An Approximation of P(25, a): Lower Tail At Logarithmic Scale
5.3.4 An Approximation of P(25, a): Upper Tail At Logarithmic Scale
5.3.5 A Better Approximation of P(25.6, a): Lower Tail At Logarithmic Scale
5.3.6 A Better Approximation of P(25.6, a): Mid-range
5.3.7 A Better Approximation of P(25.6, a): Upper Tail At Logarithmic Scale

Abstract

Given a sequence of identically distributed random variables $X_1, X_2, \ldots, X_n$ with mean $\mu$ and variance $\sigma^2$, the CLT (Central Limit Theorem) provides a good approximation of the probability $P(|\bar X_n - \mu| > x)$ for $x$ that is not too large relative to $\sigma/\sqrt n$. Although the CLT works well around the "center", it is much less effective in the "tail", the regime of large deviations. For large $x$, a more delicate analysis is required. This thesis discusses three different approaches to this analysis: large deviations, moderate deviations via the Cramér-Petrov series, and asymptotic analysis via Laplace's method. The three approaches are illustrated on standard exponential random variables.

Key words: Large Deviation Principle, Cramér-Petrov Series, Rate Function, Asymptotic Expansions.

Chapter 1 Introduction

1.1 Motivation

The Poisson process is one of the most widely used counting processes. It is usually used in scenarios where we count arrivals of events that happen at a certain average rate but otherwise completely at random. Many natural phenomena can be modeled by a Poisson process. For example, suppose that, from historical data, earthquakes occur in a certain area at a rate of one every four days; apart from the rate, the occurrence of earthquakes appears completely random. What is the probability of having more than 100 earthquakes in a year (365 days)? The Poisson process gives a good model of the number of earthquakes in a year: a Poisson random variable $\mathcal P(91.25)$ with mean $91.25 = 365/4$. The answer is then
\[ P(\mathcal P(91.25) > 100) \approx 0.17. \]
Intuitively, one can use the normal approximation of the Poisson distribution,
\[ \mathcal P(\lambda) \approx \mathcal N(\lambda, \lambda), \quad \lambda \to \infty, \tag{1.1.1} \]
leading to the approximate answer, by checking the value in the table of the standard normal distribution,
\[ P(\mathcal P(91.25) > 100) \approx P(\mathcal N(91.25, 91.25) > 100) = P\Big(\mathcal N(0,1) > \frac{100 - 91.25}{\sqrt{91.25}}\Big) \approx 0.17. \tag{1.1.2} \]
What if the deviation gets larger? Suppose we enlarge the rate to one earthquake every two days. What is the probability of having more than 128 earthquakes in 132 days? The answer is
\[ P(\mathcal P(66) > 128) \approx 4.6\times10^{-12}. \tag{1.1.3} \]
Using the normal approximation,
\[ P(\mathcal P(66) > 128) \approx P(\mathcal N(66, 66) > 128) = P\Big(\mathcal N(0,1) > \frac{128 - 66}{\sqrt{66}}\Big) \approx 1.5\times10^{-14}. \tag{1.1.4} \]
Notice that (1.1.3) and (1.1.4) are both very small, but their relative difference is large, which illustrates that the CLT does not work well in this situation. Therefore, the mission is to understand the limitations of the CLT and to explore various approaches to computing such approximations.
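The numbers in this example are easy to reproduce. The following is a minimal sketch (not part of the thesis), assuming Python with numpy and scipy available; it computes the exact Poisson tails of (1.1.2) and (1.1.4) and their normal approximations (1.1.1).

# Exact Poisson tail vs the CLT approximation (1.1.1); a sketch assuming scipy.
import numpy as np
from scipy import stats

for lam, k in [(91.25, 100), (66.0, 128)]:
    exact = stats.poisson.sf(k, lam)                  # P(Poisson(lam) > k)
    normal = stats.norm.sf((k - lam) / np.sqrt(lam))  # normal approximation
    print(f"lambda={lam}: exact={exact:.3g}, normal={normal:.3g}")

For the first pair the two values agree to two digits; for the second pair they differ by two orders of magnitude, which is exactly the phenomenon studied in this thesis.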
1.2 CLT, LLN

Let $X_1, X_2, \ldots$ be iid random variables with $E(X_i) = \mu$ and $\mathrm{Var}(X_i) = \sigma^2 < \infty$, and define $\bar X_n = \frac1n\sum_{i=1}^n X_i$. By the Strong Law of Large Numbers, for every $\varepsilon > 0$,
\[ P\big(\lim_{n\to\infty}|\bar X_n - \mu| < \varepsilon\big) = 1, \tag{1.2.1} \]
that is, $\bar X_n$ converges almost surely to $\mu$. By the Central Limit Theorem,
\[ \lim_{n\to\infty}\frac{\sqrt n(\bar X_n - \mu)}{\sigma} = \mathcal N(0,1) \tag{1.2.2} \]
in distribution, that is, $\sqrt n(\bar X_n - \mu)/\sigma$ has a limiting standard normal distribution. Informally, one can express $\bar X_n$ as
\[ \bar X_n \approx \mu + \frac{\sigma\,\mathcal N(0,1)}{\sqrt n}. \]
(1.2.2) can be expressed as
\[ \lim_{n\to\infty}F_n(x) = \Phi(x), \quad x\in\mathbb R, \tag{1.2.3} \]
where $F_n(x)$ is the CDF (cumulative distribution function) of the random variable $\sqrt n(\bar X_n - \mu)/\sigma$ and $\Phi(x)$ is the CDF of the standard normal distribution. To analyze the difference between (1.1.3) and (1.1.4), we should analyze the accuracy of the CLT approximation. For that, we need the loss function
\[ L_n(x) = |F_n(x) - \Phi(x)|, \quad x\in\mathbb R. \tag{1.2.4} \]
Since $\Phi$ is continuous on $\mathbb R$, the convergence in (1.2.3) is uniform in $x$: for every $\varepsilon > 0$ there exists $N\in\mathbb N^+$ such that $L_n(x) < \varepsilon$ for all $x$ and all $n > N$. Then
\[ \lim_{n\to\infty}\sup_{x\in\mathbb R}L_n(x) = 0. \tag{1.2.5} \]
Indeed, since a CDF is non-decreasing on $\mathbb R$, approaching 0 as $x\to-\infty$ and 1 as $x\to+\infty$, given $\varepsilon$ one can find an $L > 0$ such that, for all $n$,
\[ F_n(x) < \varepsilon/5, \quad x < -L, \tag{1.2.6} \]
\[ 1 - F_n(x) < \varepsilon/5, \quad x > L, \tag{1.2.7} \]
and the uniform convergence on $\mathbb R$ is reduced to uniform convergence on the compact interval $[-L, L]$. In other words, (1.2.5), while consistent with both (1.1.1) and (1.1.3), says nothing about the behavior of the tail of the function $F_n(x)$, that is, about
\[ P(\sqrt n\,|\bar X_n - \mu| > \sigma x_n) \tag{1.2.8} \]
with $\lim_{n\to\infty}x_n = +\infty$, beyond the fact that this number becomes arbitrarily small. Therefore, we conclude that to understand the difference between (1.1.3) and (1.1.4), one has to study expressions of type (1.2.8) in more detail.

1.3 Large Deviation and Moderate Deviation

To study (1.2.8), one needs to compare the rates of growth of $\sqrt n$ and $x_n$. There are three cases as $n\to\infty$:

moderate deviation     $x_n/\sqrt n \to 0$        $x_n = o(\sqrt n)$
large deviation        $x_n/\sqrt n \to c > 0$    $x_n = O(\sqrt n)$
super-large deviation  $x_n/\sqrt n \to \infty$   $\sqrt n = o(x_n)$
Table 1.3.1: LD, MD, SLD

Since super-large deviations depend too much on the underlying distribution, we do not discuss them in this thesis. For example, if a random variable $X$ is bounded, then the probability in (1.2.8) is zero for all sufficiently large $n$ whenever $x_n/\sqrt n\to\infty$.

The key to studying both large and moderate deviations is the rate function $\Lambda^*(x)$, which is defined as the Legendre transform of the logarithm $\Lambda(\theta)$ of the moment generating function of $X$:
\[ \Lambda^*(x) := \sup_{\theta\in\mathbb R}\,[\theta x - \Lambda(\theta)]. \]
With the rate function, an approximation of (1.2.8) is given by the Large Deviation Principle:
\[ P(\bar X_n > x) \approx e^{-n\Lambda^*(x)}, \quad x > \mu, \tag{1.3.1} \]
\[ P(\bar X_n < x) \approx e^{-n\Lambda^*(x)}, \quad x < \mu. \tag{1.3.2} \]
The rate function can be computed in closed form for many distributions. For example, if $X$ has Poisson distribution with mean 1, then $\Lambda^*(x) = x\ln x - x + 1$.

Recall the exponential Chebyshev inequality: for any $\theta > 0$,
\[ P(X > x) \le e^{-\theta x}E(e^{\theta X}). \tag{1.3.3} \]
Applying it to (1.3.1) [the case (1.3.2) is similar], we get an upper bound on the probability:
\[ P(\bar X_n > x) = P\Big(\sum_{i=1}^nX_i > nx\Big) \le e^{-\theta nx}E\big(e^{\theta\sum_{i=1}^nX_i}\big) \tag{1.3.4} \]
\[ = e^{-\theta nx}E(e^{\theta X_1})\cdots E(e^{\theta X_n}) \tag{1.3.5} \]
\[ = e^{-\theta nx}e^{n\Lambda(\theta)} = e^{-n(\theta x - \Lambda(\theta))}. \tag{1.3.6} \]
The exponent $\theta x - \Lambda(\theta)$ in the last equality of (1.3.6) looks similar to the rate function $\Lambda^*(x)$; however, $\Lambda^*$ takes the supremum over $\theta$. The precise meaning of "$\approx$" is a more delicate question. To derive (1.3.1) and (1.3.2), the goal is to find matching lower bounds. The proof of the lower bounds is more involved and uses a change of measure shifting the mean of $X$ from $\mu$ to $x$; the result is known as Cramér's theorem.
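The supremum defining $\Lambda^*$ can also be evaluated numerically and compared with the closed form. Here is a minimal sketch (not from the thesis), assuming scipy.optimize is available, for the Poisson case $\Lambda(\theta) = e^\theta - 1$:

# Numerical Legendre transform of Lambda(theta) = e^theta - 1 (Poisson, mean 1),
# compared with the closed form Lambda*(x) = x ln x - x + 1; a sketch.
import numpy as np
from scipy.optimize import minimize_scalar

def rate(x, Lambda=lambda t: np.exp(t) - 1.0):
    # sup_theta [theta*x - Lambda(theta)] = -min_theta [Lambda(theta) - theta*x]
    res = minimize_scalar(lambda t: Lambda(t) - t*x, bounds=(-50, 50), method="bounded")
    return -res.fun

for x in [0.5, 1.0, 2.0, 128/66]:
    print(x, rate(x), x*np.log(x) - x + 1)

The two columns agree to optimizer precision; swapping in another $\Lambda$ gives the rate function of any distribution satisfying Cramér's condition.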
Back to the example (1.1.3): the large deviation principle already gives a good answer. Since $\mathcal P(66)$ is the sum of 66 iid $\mathcal P(1)$ random variables, (1.1.3) can be approximated as
\[ P\Big(\bar X_{66} > \frac{128}{66}\Big) \approx e^{-66\,(\frac{128}{66}\ln\frac{128}{66} - \frac{128}{66} + 1)} \approx 1.3\times10^{-10}. \tag{1.3.7} \]
In the moderate deviation regime, the approximation can be expressed as
\[ \frac{1-F_n(x_n)}{1-\Phi(x_n)} = \exp\Big\{-n\Lambda^*\Big(\mu+\frac{\sigma x_n}{\sqrt n}\Big)+\frac{x_n^2}{2}\Big\}\Big(1+O\Big(\frac{x_n+1}{\sqrt n}\Big)\Big), \tag{1.3.8} \]
\[ \frac{F_n(-x_n)}{\Phi(-x_n)} = \exp\Big\{-n\Lambda^*\Big(\mu-\frac{\sigma x_n}{\sqrt n}\Big)+\frac{x_n^2}{2}\Big\}\Big(1+O\Big(\frac{x_n+1}{\sqrt n}\Big)\Big), \tag{1.3.9} \]
where $x_n\to\infty$ and $x_n/\sqrt n\to0$ as $n\to\infty$. These equalities are part of the theory started in the late 1930s by the Swedish mathematician Harald Cramér (1893-1985) and further developed by the Russian mathematician Valentin Vladimirovich Petrov (b. 1931). It is known as the Cramér series, under the assumptions that $x_n\ge0$, $x_n = o(\sqrt n)$, and Cramér's condition is satisfied. Cramér's condition is the existence of the moment generating function in a neighborhood of zero, which implies the existence of moments of all orders.

In the large deviation regime $x_n = O(\sqrt n)$, we can pass from the Cramér series to the large deviation principle by writing
\[ x_n = \frac{\sqrt n\,|y-\mu|}{\sigma}, \tag{1.3.10} \]
ignoring the dependence of $y$ on $n$, so that
\[ y = \mu \pm \frac{\sigma x_n}{\sqrt n}, \tag{1.3.11} \]
\[ P(\bar X_n > y) = 1 - F_n(x_n), \quad y > \mu, \tag{1.3.12} \]
\[ P(\bar X_n < y) = F_n(-x_n), \quad y < \mu. \tag{1.3.13} \]
Plugging (1.3.8) and (1.3.9) into (1.3.12) and (1.3.13), respectively, and using the normal tail asymptotics below,
\[ P(\bar X_n > y) \approx \frac{\sigma}{(y-\mu)\sqrt{2\pi n}}\,e^{-n\Lambda^*(y)}, \quad y > \mu, \tag{1.3.14} \]
\[ P(\bar X_n < y) \approx \frac{\sigma}{(\mu-y)\sqrt{2\pi n}}\,e^{-n\Lambda^*(y)}, \quad y < \mu. \tag{1.3.15} \]
To pass from moderate deviations to large deviations, we push $x_n = o(\sqrt n)$ to $x_n = O(\sqrt n)$ by fixing the $y$ in (1.3.10). Thus (1.3.14) and (1.3.15) are improved versions of the LDP (1.3.1) and (1.3.2). Indeed, back to the example (1.1.3): using approximation (1.3.14) with $\Lambda^*(y) = y\ln y - y + 1$, $n = 66$, $y = 128/66$, $\mu = \sigma = 1$,
\[ P(\mathcal P(66) > 128) \approx 6.5\times10^{-12}. \]
The result is better than (1.3.7). Even an approximation of the normal tail gives a better result than (1.3.7). For a random variable $X$ with standard normal distribution,
\[ P(X > x) = \frac{1}{\sqrt{2\pi}}\int_x^\infty e^{-t^2/2}\,dt \le \frac{1}{\sqrt{2\pi}}\int_x^\infty\frac tx\,e^{-t^2/2}\,dt = \frac{e^{-x^2/2}}{x\sqrt{2\pi}}. \]
The normal tail satisfies
\[ 1-\Phi(x) \approx \frac{1}{x\sqrt{2\pi}}\,e^{-x^2/2}, \quad x\to\infty. \tag{1.3.16} \]
The result obtained with (1.3.16) is $P(\mathcal P(66) > 128) \approx 6.7\times10^{-12}$.

Once the large deviation principle is stated precisely, we will have a better understanding of "$\approx$" in (1.3.1) and (1.3.2). Moreover, we can always pass to the improved approximations (1.3.14) and (1.3.15), which are equivalent to the LDP. When the function $F_n(x)$ is known for all $n$, asymptotic analysis of $F_n(x_n)$ becomes available, for example by Laplace's method.
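The progression from (1.3.1) to (1.3.14) is easy to see numerically on this example. A minimal sketch (not from the thesis), assuming scipy:

# LDP (1.3.1) vs its refinement (1.3.14) vs the exact value of P(Poisson(66) > 128).
import numpy as np
from scipy import stats

n, y = 66, 128/66                     # X_i ~ Poisson(1), so mu = sigma = 1
rate = y*np.log(y) - y + 1            # Lambda*(y) for Poisson(1)
xn = np.sqrt(n)*(y - 1)               # standardized deviation (1.3.10)
print("exact   :", stats.poisson.sf(128, 66))
print("LDP     :", np.exp(-n*rate))                         # (1.3.7)
print("refined :", np.exp(-n*rate)/(xn*np.sqrt(2*np.pi)))   # (1.3.14)

The LDP gets the exponential order right but misses by a factor of about 30; the refinement recovers the $1/\sqrt n$ prefactor and lands within 50 percent of the exact value.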
1.4 Incomplete Gamma Function and Poisson Process

To understand the probabilistic relationship between the Poisson process and the Gamma distribution, we study the Poisson process as a special counting process. Consider a Poisson process with rate $k$, which is a random counting process satisfying:
- the count $N$ over an interval of length $T$ is $\mathcal P(kT)$;
- disjoint intervals are independent;
- the waiting times $Y_i$ between the $(i-1)$-th and $i$-th arrivals are independent $\mathrm{Exp}(1/k) = \Gamma(1, 1/k)$.
The third point is in fact a consequence of the first two. Then
\[ \text{time until the $n$th arrival} = \sum_{i=1}^nY_i \sim \Gamma(n, 1/k), \tag{1.4.1} \]
and (1.4.1) has a natural relationship with the number of arrivals in $[0,T]$, $N\sim\mathcal P(kT)$:
\[ P(N\ge n) = P(n\text{ or more counts in }[0,T]) = P\Big(\sum_{i=1}^nY_i\le T\Big). \tag{1.4.2} \]
In other words,
\[ P(\mathcal P(kT)\ge n) = P(\Gamma(n, 1/k)\le T). \tag{1.4.3} \]
Furthermore, applying this with $k = 1/2$, $T = 2m$, and remembering that $\Gamma(p/2, 2)$ is $\chi^2_p$,
\[ P(\mathcal P(m)\ge n) = P(\chi^2_{2n}\le 2m). \]
Recall that the Gamma distribution with shape parameter $\alpha > 0$ and scale parameter $\beta > 0$ has probability density function (PDF)
\[ f_X(x) = \frac{1}{\Gamma(\alpha)\beta^\alpha}\,x^{\alpha-1}e^{-x/\beta}, \quad x > 0, \]
where $\Gamma(\alpha)$ is the Gamma function,
\[ \Gamma(\alpha) = \int_0^{+\infty}t^{\alpha-1}e^{-t}\,dt, \]
with $\Gamma(n+1) = n!$ and $\Gamma(\tfrac12+n) = \binom{n-1/2}{n}\,n!\,\sqrt\pi$. If $X$ has Gamma distribution with shape $\alpha$ and scale $\beta$, then, by the scale transformation $Y = X/\beta$ with $f_Y(y) = \beta f_X(\beta y)$, the variable $Y$ has Gamma distribution with shape $\alpha$ and scale 1. Accordingly, with no loss of generality, we will assume that $k = 1$. Then (1.4.3) becomes
\[ P(\mathcal P(T) > n) = P(\Gamma(n+1, 1)\le T), \tag{1.4.4} \]
and the main object of interest in connection with (1.4.4) becomes the integral
\[ P(n+1, T) = \frac{1}{n!}\int_0^Tx^ne^{-x}\,dx, \tag{1.4.5} \]
where $P(n+1, T) = P(\Gamma(n+1,1)\le T)$ is known as the normalized incomplete Gamma function. The following relation is complementary to (1.4.4):
\[ P(\mathcal P(T)\le n) = P(\Gamma(n+1,1) > T). \tag{1.4.6} \]
Similarly, analysis of (1.4.6) reduces to the other normalized incomplete Gamma function,
\[ Q(n+1, T) = \frac{1}{n!}\int_T^{+\infty}x^ne^{-x}\,dx. \tag{1.4.7} \]
Here is an interesting observation. The left-hand side of (1.4.4) is the sum
\[ e^{-T}\sum_{k=n+1}^\infty\frac{T^k}{k!}, \]
while the right-hand side of (1.4.4) is the integral
\[ \frac{1}{n!}\int_0^Tx^ne^{-x}\,dx. \]
As a rule, analysis of integrals (continuous) is easier than analysis of sums (discrete), since continuous functions have better properties: differentiability, smoothness, integrability. In the CLT, approximating a discrete distribution by a (continuous) normal distribution introduces additional approximation errors.

As an illustration, back to the example (1.1.3): since we have expressed the probability through the Gamma distribution, we apply the LDP (1.3.1) and its improved version (1.3.14) to the right-hand side of (1.4.4). Since the Gamma probability is an integral (continuous), we expect a better approximation than on the left-hand side. Now
\[ P(\mathcal P(66) > 128) = P(X_1+\cdots+X_{129} < 66), \tag{1.4.8} \]
where $X_1, \ldots, X_{129}$ are iid exponential random variables with mean 1. Moreover, the rate function of the standard exponential random variable is
\[ \Lambda^*(x) = x - 1 - \ln x, \quad x > 0. \]
We then get the following approximations for the right-hand side of (1.4.8):

P(P(66) > 128) ≈ 6.5e-11   approximated by (1.3.1)
P(P(66) > 128) ≈ 4.6e-12   approximated by (1.3.14)
P(P(66) > 128) ≈ 6.5e-12   approximated by (1.3.14) and (1.3.16)
Table 1.4.1: Three approximations of the right-hand side of (1.4.4)

Indeed, this looks good, recalling that the exact value of the probability is close to $4.6\times10^{-12}$.

If $T < n$, additional tools apply to the approximation of the Gamma distribution. The detailed proof will be given, using Laplace's method, in Chapter 4. A sketch of the approach follows:
\[ P(\Gamma(n+1, 1)\le T) = \frac1{n!}\int_0^Tx^ne^{-x}\,dx = \frac{n^{n+1}}{n!}\int_0^{T/n}e^{n(\ln x - x)}\,dx. \]
Since the function $S(x) = \ln x - x$ is increasing on the interval $(0,1)$, for large $n$ the value of the integral is almost determined by the values of $S(x)$ in a neighborhood of $T/n$, that is,
\[ \int_0^{T/n}e^{nS(x)}\,dx \approx \int_0^{T/n}e^{nS(T/n)-nS'(T/n)(T/n-x)}\,dx \approx \frac{T}{n(n-T)}\,\exp\Big\{n\ln\frac Tn - T\Big\}. \tag{1.4.9} \]
Furthermore, using the Stirling approximation
\[ n! \approx \sqrt{2\pi n}\,\Big(\frac ne\Big)^n, \tag{1.4.10} \]
the result is
\[ P(\Gamma(n+1, 1)\le T) \approx \frac{1}{\sqrt{2\pi n}}\,\frac{T}{n-T}\,\exp\Big\{-n\Lambda^*\Big(\frac Tn\Big)\Big\}. \tag{1.4.11} \]
For the example (1.1.3), (1.4.11) gives
\[ P(X_1+\cdots+X_{129} < 66) \approx 4.8\times10^{-12}. \]
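Both the duality (1.4.4) and the Laplace approximation (1.4.11) can be verified directly. A minimal sketch (not from the thesis), assuming scipy, where special.gammainc is scipy's regularized lower incomplete Gamma function, i.e. exactly $P(n+1, T)$:

# Poisson-Gamma duality (1.4.4) and the Laplace approximation (1.4.11).
import numpy as np
from scipy import stats, special

n, T = 128, 66.0
lhs = stats.poisson.sf(n, T)         # P(Poisson(66) > 128)
rhs = special.gammainc(n + 1, T)     # P(n+1, T) = P(Gamma(n+1,1) <= T)
rate = T/n - 1 - np.log(T/n)         # Lambda*(T/n), standard exponential
laplace = T/((n - T)*np.sqrt(2*np.pi*n))*np.exp(-n*rate)   # (1.4.11)
print(lhs, rhs, laplace)

The first two numbers coincide (they are the same probability), and the third agrees with them to within a few percent, as claimed.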
Chapter 2 Rate Function and Large Deviation Principle

Cramér's theorem deals with the LDP associated with the real-valued random variables $\bar X_n = \frac1n\sum_{i=1}^nX_i$. In general, the $X_i$ can have different distributions and be weakly dependent; it is not necessary to assume that they are iid. Moreover, the $X_i$ can take values in $\mathbb R^d$, $d\in\mathbb N^+$, or in infinite dimensions. In what follows, we only discuss the situation where the $X_i$ are real-valued and iid.

In Cramér's theorem, the key is computing rate functions. To compute the rate function of a given distribution, two supporting functions are needed. One is the logarithmic moment generating function
\[ \Lambda(\theta) := \ln E(e^{\theta X_1}); \]
based on $\Lambda$, the other is its Fenchel-Legendre transform
\[ \Lambda^*(x) := \sup_{\theta\in\mathbb R}\,(\theta x - \Lambda(\theta)). \]

2.1 Definition and Properties of Rate Function

Let $X$ be a non-degenerate random variable, that is, $P(X = a) < 1$ for all $a\in\mathbb R$. In what follows, all random variables we consider are non-degenerate.

Definition 2.1.1 A random variable $X$ satisfies Cramér's condition if there exists a $\delta > 0$ such that
\[ Ee^{\theta X} < \infty, \quad |\theta| < \delta. \]

We write $M(\theta) = Ee^{\theta X}$ for the moment generating function of the random variable $X$. If $X$ satisfies Cramér's condition, the logarithmic moment generating function is defined as
\[ \Lambda(\theta) := \ln M(\theta) = \ln E(e^{\theta X}). \tag{2.1.1} \]

Definition 2.1.2 The Fenchel-Legendre transform of $\Lambda(\theta)$ is
\[ \Lambda^*(x) := \sup_{\theta\in\mathbb R}\,(\theta x - \Lambda(\theta)); \]
this function is called the rate function.

Proposition 2.1.3 $\Lambda(\theta)$ is a strictly convex function, that is,
\[ \Lambda(t\theta + (1-t)\eta) < t\Lambda(\theta) + (1-t)\Lambda(\eta), \quad t\in(0,1),\ \theta\ne\eta, \]
and admits an analytic continuation to a neighborhood of zero in the complex plane.

Proof: The convexity of $\Lambda$ follows from Hölder's inequality, since
\[ \Lambda(t\theta + (1-t)\eta) = \ln E\big[(e^{\theta X})^t(e^{\eta X})^{1-t}\big] \le \ln\big[(Ee^{\theta X})^t(Ee^{\eta X})^{1-t}\big] = t\Lambda(\theta) + (1-t)\Lambda(\eta), \quad t\in(0,1). \]
It remains to show that the inequality is strict. Equality in Hölder's inequality holds if and only if $e^{\theta X}$ and $e^{\eta X}$ are proportional with probability one, that is, if $X$ is degenerate. Next, using the Taylor series of $e^{\theta X}$,
\[ e^{\theta X} = 1 + \sum_{k=1}^\infty\frac{\theta^kX^k}{k!}. \]
By Cramér's condition, $Ee^{|\theta X|} < \infty$ for all sufficiently small $|\theta|$, and the Dominated Convergence Theorem implies
\[ M(\theta) = 1 + \sum_{k=1}^\infty\frac{\theta^kE(X^k)}{k!}. \]
Therefore $M$ admits an analytic continuation to a neighborhood of zero in the complex plane and, since $M\ne0$ there, so does $\Lambda$. □
Theorem 2.1.4 If $X$ satisfies Cramér's condition and $\mu$ is the mean of $X$, then the rate function has the following properties:
(i) $\Lambda^*$ is convex;
(ii) if $\Lambda(\theta) < \infty$ for some $\theta > 0$, then $\mu < \infty$ and, for all $x\ge\mu$,
\[ \Lambda^*(x) = \sup_{\theta\ge0}\,(\theta x - \Lambda(\theta)), \tag{2.1.2} \]
so that $\Lambda^*$ is non-decreasing for $x > \mu$; similarly, if $\Lambda(\theta) < \infty$ for some $\theta < 0$, then $\mu > -\infty$ and, for all $x\le\mu$,
\[ \Lambda^*(x) = \sup_{\theta\le0}\,(\theta x - \Lambda(\theta)); \tag{2.1.3} \]
(iii) $\Lambda^*(\mu) = 0$ if $|\mu| < \infty$;
(iv) $\Lambda^*(y) > 0$ for $y\ne\mu$, and
\[ \Lambda^*(y) = \frac{(y-\mu)^2}{2\sigma^2} + o\big((y-\mu)^2\big), \quad y\to\mu. \tag{2.1.4} \]

Proof: The proofs of (i), (ii) and (iii) are similar to the proof of [1, Lemma 2.2.5].
(i) The convexity of $\Lambda^*$ follows from its definition, since
\[ t\Lambda^*(x) + (1-t)\Lambda^*(y) = \sup_{\theta}\,(t\theta x - t\Lambda(\theta)) + \sup_{\theta}\,((1-t)\theta y - (1-t)\Lambda(\theta)) \ge \sup_{\theta}\,[\theta(tx+(1-t)y) - \Lambda(\theta)] = \Lambda^*(tx+(1-t)y) \]
for $x, y\in\mathbb R$ and $t\in(0,1)$.
(ii) If $\Lambda(\theta) = \ln M(\theta) < \infty$ for some $\theta > 0$, then, since $x\le e^{\theta x}/\theta$ for $x\ge0$, we get $E(X^+)\le M(\theta)/\theta < \infty$, which implies $\mu < \infty$. Now, for all $\theta\in\mathbb R$, by Jensen's inequality,
\[ \Lambda(\theta) = \ln E(e^{\theta X}) \ge E(\ln e^{\theta X}) = \theta\mu. \tag{2.1.5} \]
If $\mu = -\infty$, then by (2.1.5) $\Lambda(\theta) = \infty$ for every $\theta < 0$, and (2.1.2) holds trivially, because the terms with $\theta < 0$ contribute $-\infty$ while $\theta = 0$ contributes $0$. When $\mu$ is finite, $\Lambda^*(\mu) = 0$, since (2.1.5) gives $\theta\mu - \Lambda(\theta)\le0$ while $\theta = 0$ gives $0$. In this case, for any $x\ge\mu$ and $\theta < 0$,
\[ \theta x - \Lambda(\theta) \le \theta\mu - \Lambda(\theta) \le \Lambda^*(\mu) = 0, \tag{2.1.6} \]
and (2.1.2) follows. (2.1.2) implies the monotonicity of $\Lambda^*$ on $(\mu,\infty)$, since for every $\theta\ge0$ the function $x\mapsto\theta x - \Lambda(\theta)$ is non-decreasing in $x$. Similarly, (2.1.3) is proved by the same approach.
(iii) When $\mu$ is finite, $\Lambda^*(\mu) = 0$ was shown above by Jensen's inequality.
(iv) We use a Taylor expansion at $\mu$ to get (2.1.4). Set
\[ h(\theta, y) = \theta y - \Lambda(\theta), \quad\text{so that}\quad \Lambda^*(y) = \sup_\theta h(\theta, y). \]
Because $h(0, y) = 0$, it follows that $\Lambda^*(y)\ge0$. Then
\[ \frac{\partial h(\theta, y)}{\partial\theta} = y - \Lambda'(\theta) = 0 \]
if $\theta = \theta^*$, where $\theta^* = \theta^*(y)$ is the solution of the equation
\[ y = \Lambda'(\theta^*) = \frac{M'(\theta^*)}{M(\theta^*)}. \tag{2.1.7} \]
Strict convexity of $\Lambda$ implies $\Lambda'$ is strictly increasing, therefore equation (2.1.7) has a unique solution, and
\[ \Lambda^*(y) = h(\theta^*(y), y) = y\,\theta^*(y) - \Lambda(\theta^*(y)). \tag{2.1.8} \]
By Proposition 2.1.3,
\[ \mu = E(X) = M'(0) = \Lambda'(0), \quad \sigma^2 = \Lambda''(0), \tag{2.1.9} \]
and therefore $\theta^*(\mu) = 0$, which, by (2.1.8), implies $\Lambda^*(\mu) = 0$. Next, differentiating (2.1.8) with respect to $y$ and using (2.1.7) yields
\[ (\Lambda^*)'(y) = \theta^*(y) + y\,(\theta^*)'(y) - \Lambda'(\theta^*(y))\,(\theta^*)'(y) = \theta^*(y), \]
that is, the derivatives of $\Lambda^*$ and $\Lambda$ are mutually inverse functions:
\[ (\Lambda^*)'(\Lambda'(\theta)) = \theta, \qquad \Lambda'\big((\Lambda^*)'(y)\big) = y. \tag{2.1.10} \]
In particular, differentiating the first equality in (2.1.10) and using (2.1.9),
\[ (\Lambda^*)''(\mu) = \frac{1}{\Lambda''(\theta^*(\mu))} = \frac{1}{\Lambda''(0)} = \frac{1}{\sigma^2}. \]
Hence
\[ \Lambda^*(y) = \Lambda^*(\mu) + \frac{(\Lambda^*)'(\mu)}{1!}(y-\mu) + \frac{(\Lambda^*)''(\mu)}{2!}(y-\mu)^2 + o((y-\mu)^2) = \frac{(y-\mu)^2}{2\sigma^2} + o((y-\mu)^2), \quad y\to\mu. \] □

The rate function is one of the characteristics of a random variable: when we transform a random variable, we also transform its rate function. One common transformation is standardization: if $X$ has mean $\mu$ and variance $\sigma^2$, then
\[ Z = \frac{X-\mu}{\sigma} \tag{2.1.11} \]
is a random variable with mean 0 and variance 1. Write $\Lambda^*_X$ and $\Lambda^*_Z$ for the corresponding rate functions.

Proposition 2.1.5 If random variables $X$ and $Z$ satisfy (2.1.11), then
\[ \Lambda^*_Z(y) = \Lambda^*_X(\sigma y + \mu), \tag{2.1.12} \]
\[ \Lambda^*_X(y) = \Lambda^*_Z\Big(\frac{y-\mu}{\sigma}\Big). \tag{2.1.13} \]

Proof: Note that
\[ Ee^{\theta X} = e^{\theta\mu}\,Ee^{\theta\sigma Z}, \quad\text{so that}\quad \Lambda_X(\theta) = \theta\mu + \Lambda_Z(\sigma\theta). \]
By the definition of $\Lambda^*$, substituting $\theta = \sigma s$,
\[ \Lambda^*_Z(y) = \sup_{\theta\in\mathbb R}\,(\theta y - \Lambda_Z(\theta)) = \sup_{s\in\mathbb R}\,(\sigma sy - \Lambda_X(s) + s\mu) = \sup_{s\in\mathbb R}\,\big(s(\sigma y+\mu) - \Lambda_X(s)\big) = \Lambda^*_X(\sigma y+\mu), \]
so (2.1.12) follows. Similarly,
\[ \Lambda^*_X(y) = \sup_{\theta\in\mathbb R}\,(\theta y - \theta\mu - \Lambda_Z(\sigma\theta)) = \sup_{u\in\mathbb R}\,\Big(u\,\frac{y-\mu}{\sigma} - \Lambda_Z(u)\Big) = \Lambda^*_Z\Big(\frac{y-\mu}{\sigma}\Big), \]
so (2.1.13) follows. □

2.2 Large Deviation Principle

Theorem 2.2.1 Let $X, X_1, X_2, \ldots$ be iid and
\[ \bar X_n = \frac1n\sum_{k=1}^nX_k. \]
If $X$ satisfies Cramér's condition, then the Large Deviation Principle (LDP) holds:
\[ \lim_{n\to\infty}\frac1n\ln P(\bar X_n > x) = -\Lambda^*(x), \quad x > \mu, \tag{2.2.1} \]
\[ \lim_{n\to\infty}\frac1n\ln P(\bar X_n < x) = -\Lambda^*(x), \quad x < \mu. \tag{2.2.2} \]

Next, a more general statement. Let $\mu_n$ denote the law of $\bar X_n$.

Theorem 2.2.2 Let $X_i\in\mathbb R$. The sequence of measures $\{\mu_n\}$ satisfies the LDP with the convex rate function $\Lambda^*$, namely:
1. for any closed set $F\subset\mathbb R$,
\[ \limsup_{n\to\infty}\frac1n\ln\mu_n(F) \le -\inf_{x\in F}\Lambda^*(x); \tag{2.2.3} \]
2. for any open set $G\subset\mathbb R$,
\[ \liminf_{n\to\infty}\frac1n\ln\mu_n(G) \ge -\inf_{x\in G}\Lambda^*(x). \tag{2.2.4} \]

2.3 Examples of Rate Functions

For the Bernoulli random variable with probability of success $p\in(0,1)$,
\[ \Lambda(\theta) = \ln(Ee^{\theta X}) = \ln(pe^\theta + q), \quad q = 1-p. \]
Setting $h(\theta, y) = \theta y - \ln(q + pe^\theta)$,
\[ \frac{\partial h(\theta, y)}{\partial\theta} = y - \frac{pe^\theta}{q + pe^\theta} = 0. \]
The maximum of $h(\cdot, y)$ is attained at
\[ \theta = \theta^* = \ln\frac{q}{1-y} - \ln\frac py. \]
As a result,
\[ \Lambda^*(y) = \begin{cases} y\ln\frac yp + (1-y)\ln\frac{1-y}{1-p}, & y\in(0,1),\\ +\infty, & y\notin(0,1).\end{cases} \]
Note that $\Lambda^*(0+) = -\ln q$, $\Lambda^*(1) = -\ln p$, $\Lambda^*(p) = 0$, $\Lambda^*(y) > 0$ for $y\ne p$, and the function $y\mapsto\Lambda^*(y)$ is strictly convex for $y\in(0,1)$ and strictly increasing for $y\in(p,1)$. In particular, for $x\in(p,1)$,
\[ \lim_{n\to\infty}\frac1n\ln P\Big(\frac1n\sum_{k=1}^nX_k > x\Big) = -\Lambda^*(x) = -x\ln\frac xp - (1-x)\ln\frac{1-x}{1-p}, \]
so that, for large $n$,
\[ \ln P(\bar X_n > x) \approx -nx\ln\frac xp - n(1-x)\ln\frac{1-x}{1-p}, \qquad P(\bar X_n > x) \approx \Big(\frac px\Big)^{nx}\Big(\frac{1-p}{1-x}\Big)^{n(1-x)}. \]
By definition, for every $n$, $P(\bar X_n > 1) = P(\bar X_n < 0) = 0$, which is consistent with the definition of $\Lambda^*$ and the corresponding large deviation asymptotics (2.2.1) and (2.2.2).

[Figure 2.3.1: Rate Function of Bernoulli Random Variable]
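The Bernoulli estimate above can be checked against the exact binomial tail. A minimal sketch (not from the thesis), assuming scipy; note the LDP captures the exponential order, not the polynomial prefactor:

# LDP estimate vs the exact binomial tail for the Bernoulli case.
import numpy as np
from scipy import stats

p, n, x = 0.25, 200, 0.4    # P(mean of n Bernoulli(p) samples > x), p < x < 1
rate = x*np.log(x/p) + (1-x)*np.log((1-x)/(1-p))   # Lambda*(x)
exact = stats.binom.sf(int(n*x), n, p)             # P(sum > n*x)
print("exact:", exact, " LDP estimate:", np.exp(-n*rate))

The two values agree in exponential order (both are of size $e^{-n\Lambda^*(x)}$ up to a factor polynomial in $n$), which is exactly what (2.2.1) asserts.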
For the Poisson random variable with mean $\lambda > 0$, $x\in\mathbb N$,
\[ f_X(x) = \frac{\lambda^xe^{-\lambda}}{x!}. \]
Then
\[ \Lambda(\theta) = \ln E(e^{\theta X}) = \ln\sum_{x=0}^\infty\frac{\lambda^xe^{-\lambda}}{x!}e^{\theta x} = \ln\Big(e^{-\lambda}\sum_{x=0}^\infty\frac{(\lambda e^\theta)^x}{x!}\Big). \]
Using the Taylor series of the exponential,
\[ \Lambda(\theta) = \ln\big(e^{-\lambda}e^{\lambda e^\theta}\big) = \lambda(e^\theta - 1), \]
which is defined for all $\theta\in\mathbb R$. Setting $h(\theta, y) = \theta y - \lambda(e^\theta - 1)$,
\[ \frac{\partial h(\theta, y)}{\partial\theta} = y - \lambda e^\theta = 0. \]
For $y > 0$, the stationary point is $e^{\theta^*} = y/\lambda$, that is, $\theta^* = \ln(y/\lambda)$. Then the rate function for $y > 0$ is
\[ \Lambda^*(y) = y\ln\frac y\lambda - \lambda\Big(\frac y\lambda - 1\Big) = y\ln\frac y\lambda - y + \lambda. \]
Consequently,
\[ \Lambda^*(y) = \begin{cases} y\ln\frac y\lambda - y + \lambda, & y > 0,\\ +\infty, & y < 0.\end{cases} \]
Note that $\Lambda^*(0+) = \lambda$, $\Lambda^*(\lambda) = 0$, $\Lambda^*(y) > 0$ for $y\ne\lambda$, and the function $y\mapsto\Lambda^*(y)$ is strictly convex for $y > 0$ and strictly increasing for $y > \lambda$. In particular, for $x\ge\lambda$,
\[ \lim_{n\to\infty}\frac1n\ln P\Big(\frac1n\sum_{k=1}^nX_k > x\Big) = -\Lambda^*(x) = -x\ln\frac x\lambda + x - \lambda, \]
so that, for large $n$,
\[ \ln P(\bar X_n > x) \approx -nx\ln\frac x\lambda + nx - n\lambda, \qquad P(\bar X_n > x) \approx \Big(\frac\lambda x\Big)^{nx}e^{n(x-\lambda)}. \]

[Figure 2.3.2: Rate Function of Poisson Random Variable]

For the Gamma random variable with shape parameter $\alpha > 0$ and scale parameter $\beta > 0$, $x\in(0,\infty)$,
\[ f_X(x) = \frac{1}{\Gamma(\alpha)\beta^\alpha}x^{\alpha-1}e^{-x/\beta}. \]
Then
\[ \Lambda(\theta) = \ln E(e^{\theta X}) = \ln\int_0^\infty e^{\theta x}\frac{1}{\Gamma(\alpha)\beta^\alpha}x^{\alpha-1}e^{-x/\beta}\,dx = \ln\frac{1}{\Gamma(\alpha)\beta^\alpha}\int_0^\infty x^{\alpha-1}e^{-(\frac1\beta-\theta)x}\,dx. \]
Denote $y = (\frac1\beta - \theta)x$; for $\frac1\beta - \theta > 0$, that is, $\theta < \frac1\beta$,
\[ \Lambda(\theta) = \ln\frac{1}{\Gamma(\alpha)\beta^\alpha(\frac1\beta - \theta)^\alpha}\int_0^\infty y^{\alpha-1}e^{-y}\,dy = \ln\frac{1}{(1-\beta\theta)^\alpha} = -\alpha\ln(1-\beta\theta). \]
Setting $h(\theta, y) = \theta y + \alpha\ln(1-\beta\theta)$,
\[ \frac{\partial h(\theta, y)}{\partial\theta} = y - \frac{\alpha\beta}{1-\beta\theta} = 0, \quad\text{so}\quad \theta^* = \frac1\beta - \frac\alpha y. \]
As a result,
\[ \Lambda^*(y) = \begin{cases} \frac y\beta - \alpha + \alpha\ln\frac{\alpha\beta}{y}, & y > 0,\\ +\infty, & y < 0.\end{cases} \]
Note that $\Lambda^*(0+) = +\infty$, $\Lambda^*(\alpha\beta) = 0$ [the mean is $\mu = \alpha\beta$], $\Lambda^*(y) > 0$ for $y\ne\alpha\beta$, and the function $y\mapsto\Lambda^*(y)$ is strictly convex for $y > 0$ and strictly increasing for $y > \alpha\beta$. In particular, for $x > \alpha\beta$,
\[ \lim_{n\to\infty}\frac1n\ln P\Big(\frac1n\sum_{k=1}^nX_k > x\Big) = -\Lambda^*(x) = -\frac x\beta + \alpha - \alpha\ln\frac{\alpha\beta}{x}, \]
so that, for large $n$,
\[ P(\bar X_n > x) \approx \Big(\frac{x}{\alpha\beta}\Big)^{n\alpha}e^{-n(\frac x\beta - \alpha)}. \]
The formulas are especially simple for standard exponential random variables ($\alpha = \beta = 1$):
\[ \Lambda^*(y) = y - \ln y - 1, \qquad P(\bar X_n\ge x) \approx x^ne^{-n(x-1)}, \quad x > 1. \]

[Figure 2.3.3: Rate Function of Gamma Random Variable]

For the normal random variable with mean $\mu\in\mathbb R$ and variance $\sigma^2 > 0$, $x\in\mathbb R$,
\[ f_X(x) = \frac{1}{\sigma\sqrt{2\pi}}e^{-\frac{(x-\mu)^2}{2\sigma^2}}. \]
Then
\[ E(e^{\theta X}) = e^{\frac{(\mu+\theta\sigma^2)^2-\mu^2}{2\sigma^2}}\cdot\frac{1}{\sigma\sqrt{2\pi}}\int_{-\infty}^\infty e^{-\frac{[x-(\mu+\theta\sigma^2)]^2}{2\sigma^2}}\,dx = e^{\frac{(\mu+\theta\sigma^2)^2-\mu^2}{2\sigma^2}} = e^{\mu\theta + \frac{\sigma^2\theta^2}{2}}, \quad \theta\in\mathbb R. \]
The cumulant generating function of the normal random variable is
\[ \Lambda(\theta) = \mu\theta + \frac{\sigma^2\theta^2}{2}. \]
Setting $h(\theta, y) = \theta y - \mu\theta - \frac{\sigma^2\theta^2}{2}$,
\[ \frac{\partial h(\theta, y)}{\partial\theta} = y - \mu - \sigma^2\theta = 0, \quad\text{so}\quad \theta^* = \frac{y-\mu}{\sigma^2}. \]
As a result,
\[ \Lambda^*(y) = \frac{(y-\mu)^2}{2\sigma^2}. \]
Note that $\Lambda^*(\mu) = 0$, $\Lambda^*(y) > 0$ for $y\ne\mu$, and the function $y\mapsto\Lambda^*(y)$ is strictly convex on $\mathbb R$ and strictly increasing for $y > \mu$. In particular, for $x > \mu$,
\[ \lim_{n\to\infty}\frac1n\ln P\Big(\frac1n\sum_{k=1}^nX_k > x\Big) = -\frac{(x-\mu)^2}{2\sigma^2}, \qquad P(\bar X_n > x) \approx \exp\Big\{-n\frac{(x-\mu)^2}{2\sigma^2}\Big\}. \]
The formulas are especially simple for standard normal random variables ($\mu = 0$, $\sigma^2 = 1$):
\[ \Lambda^*(y) = \frac{y^2}{2}, \qquad P(\bar X_n\ge x) \approx e^{-nx^2/2}, \quad x > 0. \]

[Figure 2.3.4: Rate Function of Normal Random Variable]
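For the standard exponential case the exact value of $P(\bar X_n\ge x)$ is available through the regularized upper incomplete Gamma function, so the estimate $x^ne^{-n(x-1)}$ can be checked directly. A minimal sketch (not from the thesis), assuming scipy:

# P(mean of n standard exponentials >= x) = Q(n, n x); compare with the LDP.
import numpy as np
from scipy import special

n, x = 50, 1.5
exact = special.gammaincc(n, n*x)     # regularized upper incomplete Gamma
ldp = x**n * np.exp(-n*(x - 1))       # e^{-n Lambda*(x)} from Section 2.3
print(exact, ldp)

As in the Bernoulli case, the two values agree in exponential order; the missing $1/\sqrt n$ prefactor is recovered in Chapters 3 and 4.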
Chapter 3 Moderate Deviation and Normal Tail

Start with the traditional forms of (1.3.8) and (1.3.9):
\[ \frac{1-F_n(x_n)}{1-\Phi(x_n)} = \exp\Big\{\frac{x_n^3}{\sqrt n}\lambda\Big(\frac{x_n}{\sqrt n}\Big)\Big\}\Big(1+O\Big(\frac{x_n+1}{\sqrt n}\Big)\Big), \tag{3.0.1} \]
\[ \frac{F_n(-x_n)}{\Phi(-x_n)} = \exp\Big\{-\frac{x_n^3}{\sqrt n}\lambda\Big(-\frac{x_n}{\sqrt n}\Big)\Big\}\Big(1+O\Big(\frac{x_n+1}{\sqrt n}\Big)\Big), \tag{3.0.2} \]
where $\lambda(t) = \sum_{k=0}^\infty\lambda_kt^k$ is a power series whose coefficients $\lambda_k$ depend only on the cumulants of the random variable $X$; moreover, $\lambda(t)$ converges for sufficiently small values of $|t|$. It is known as the Cramér-Petrov series. Below, we confirm that the Cramér-Petrov series $\lambda$ satisfies
\[ y^3\lambda(y) = -\Lambda^*(y) + \frac{y^2}{2}. \]
At the end of this chapter, we confirm a more refined version of the large deviation principle derived from moderate deviations.

3.1 Derivation of Moderate Deviation Approximation

In this section, we give the proof of (1.3.8) and (1.3.9). Define $S_n = \sum_{j=1}^nX_j$ and $F_n(x) = P(S_n/\sqrt n < x)$, where the $X_j$ are standardized ($\mu = 0$, $\sigma = 1$). Then we have the following theorem.

Theorem 3.1.1 If $x_n\ge0$, $x_n = o(\sqrt n)$ and Cramér's condition is satisfied, then
\[ \frac{1-F_n(x_n)}{1-\Phi(x_n)} = \exp\Big\{-n\Lambda^*\Big(\frac{x_n}{\sqrt n}\Big)+\frac{x_n^2}{2}\Big\}\Big(1+O\Big(\frac{x_n+1}{\sqrt n}\Big)\Big), \]
\[ \frac{F_n(-x_n)}{\Phi(-x_n)} = \exp\Big\{-n\Lambda^*\Big(-\frac{x_n}{\sqrt n}\Big)+\frac{x_n^2}{2}\Big\}\Big(1+O\Big(\frac{x_n+1}{\sqrt n}\Big)\Big). \]

Proof: This part of the proof is inspired by [6, Chapter VIII, Section 2, Theorem 1]. Since Cramér's condition is satisfied, there is an $h > 0$ such that $E(e^{hX_1}) < \infty$. Denote the CDF of $X$ by $V(x)$. We introduce an auxiliary sequence of independent random variables $\bar X_1, \bar X_2, \ldots$ with the common CDF
\[ \bar V(x) = \frac{1}{M(h)}\int_{-\infty}^xe^{hy}\,dV(y), \tag{3.1.1} \]
where
\[ M(h) = E(e^{hX_1}) = \int_{-\infty}^\infty e^{hy}\,dV(y). \]
Next, we introduce the quantities
\[ v(t) = Ee^{itX_1}, \quad \bar v(t) = Ee^{it\bar X_1}, \quad \bar m = E\bar X_1, \quad \bar\sigma^2 = \mathrm{Var}\,\bar X_1, \]
\[ \bar S_n = \sum_{j=1}^n\bar X_j, \quad W_n(x) = P(S_n < x), \quad \bar W_n(x) = P(\bar S_n < x), \]
\[ w_n(t) = E(e^{itS_n}), \quad \bar w_n(t) = E(e^{it\bar S_n}), \quad \bar F_n(x) = P\Big(\frac{\bar S_n - n\bar m}{\bar\sigma\sqrt n} < x\Big). \]
Using this notation, we have $F_n(x) = W_n(x\sqrt n)$, $\bar F_n(x) = \bar W_n(n\bar m + x\bar\sigma\sqrt n)$, $w_n(t) = v^n(t)$, $\bar w_n(t) = \bar v^n(t)$, and
\[ \bar v(t) = \int_{-\infty}^\infty e^{itx}\,d\bar V(x) = \frac{1}{M(h)}\,v(t - ih). \tag{3.1.2} \]
By (3.1.2),
\[ v(t) = M(h)\,\bar v(t + ih), \tag{3.1.3} \]
\[ w_n(t) = M^n(h)\,\bar w_n(t + ih). \tag{3.1.4} \]
(3.1.4) implies that
\[ W_n(x) = M^n(h)\int_{-\infty}^xe^{-hy}\,d\bar W_n(y). \]
Since $F_n(x) = W_n(x\sqrt n)$,
\[ 1 - F_n(x) = M^n(h)\int_{x\sqrt n}^\infty e^{-hy}\,d\bar W_n(y) = M^n(h)\,e^{-hn\bar m}\int_{\frac{x\sqrt n - n\bar m}{\bar\sigma\sqrt n}}^\infty e^{-ht\bar\sigma\sqrt n}\,d\bar F_n(t). \]
From the cumulant generating function, we can write the expansion
\[ \ln M(h) = \sum_{\nu=1}^\infty\frac{\gamma_\nu}{\nu!}h^\nu, \]
where $\gamma_\nu$ is the cumulant of order $\nu$ of the random variable $X$. Since $\bar V(x) = \frac1{M(h)}\int_{-\infty}^xe^{hy}dV(y)$, we have
\[ \bar m = \frac{1}{M(h)}\int_{-\infty}^\infty xe^{hx}\,dV(x), \qquad \bar\sigma^2 = \frac{1}{M(h)}\int_{-\infty}^\infty x^2e^{hx}\,dV(x) - \bar m^2. \]
By the dominated convergence theorem, when differentiating $\ln M(h)$ with respect to $h$ we may interchange integral and derivative. Therefore,
\[ \bar m = \frac{d}{dh}\ln M(h) = \sum_{\nu=1}^\infty\frac{\gamma_\nu}{(\nu-1)!}h^{\nu-1}, \qquad \bar\sigma^2 = \frac{d^2}{dh^2}\ln M(h) = \sum_{\nu=2}^\infty\frac{\gamma_\nu}{(\nu-2)!}h^{\nu-2}. \]
Consider the equation
\[ t = \bar m(h). \]
Clearly $\bar m(0) = 0$, and the derivative of the function $\bar m(h)$ is positive in a neighborhood of $h = 0$. Therefore, for sufficiently small $t$, the equation $t = \bar m(h)$ has a unique real root, which has the same sign as $t$ and goes to zero as $t\to0$. We can then expand $h$ into a power series in $t$,
\[ h = \frac{t}{\gamma_2} - \frac{\gamma_3}{2\gamma_2^3}t^2 + \cdots, \]
and
\[ \ln M(h) - \bar mh + \frac{t^2}{2} = t^3\lambda(t), \quad\text{where}\quad \lambda(t) = \frac{\gamma_3}{6\gamma_2^3} + \frac{\gamma_2\gamma_4 - 3\gamma_3^2}{24\gamma_2^6}\,t + \cdots. \]
Next we identify $\ln M(h) - \bar mh$ with $-\Lambda^*$. Since $\bar m(h) = \frac{d}{dh}\ln M(h) = \Lambda'(h)$, the root of $t = \bar m(h)$ is exactly the $\theta^*$ of (2.1.7): the $\theta$ at which the supremum in the rate function is attained satisfies $\bar m = \Lambda'(\theta^*) = t$. Therefore,
\[ y^3\lambda(y) = -\Lambda^*(y) + \frac{y^2}{2}, \quad\text{where}\quad y = \Lambda'(\theta^*(y)). \]
Plugging in $y = x_n/\sqrt n$ gives
\[ \frac{1-F_n(x_n)}{1-\Phi(x_n)} = \exp\Big\{-n\Lambda^*\Big(\frac{x_n}{\sqrt n}\Big)+\frac{x_n^2}{2}\Big\}\Big(1+O\Big(\frac{x_n+1}{\sqrt n}\Big)\Big), \]
and similarly for the lower tail. □

To obtain a more general formula when $x_n$ is not standardized, we use Proposition 2.1.5:
\[ \frac{1-F_n(x_n)}{1-\Phi(x_n)} = \exp\Big\{-n\Lambda^*\Big(\mu+\frac{\sigma x_n}{\sqrt n}\Big)+\frac{x_n^2}{2}\Big\}\Big(1+O\Big(\frac{x_n+1}{\sqrt n}\Big)\Big), \tag{3.1.5} \]
\[ \frac{F_n(-x_n)}{\Phi(-x_n)} = \exp\Big\{-n\Lambda^*\Big(\mu-\frac{\sigma x_n}{\sqrt n}\Big)+\frac{x_n^2}{2}\Big\}\Big(1+O\Big(\frac{x_n+1}{\sqrt n}\Big)\Big). \tag{3.1.6} \]

3.2 Normal Tail Bounds

Recall that
\[ \Phi(x) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^xe^{-y^2/2}\,dy, \quad x\in\mathbb R, \]
and write $\bar\Phi(x) = 1 - \Phi(x)$ for the upper tail, so that $\Phi(x) + \bar\Phi(x) = 1$.

Proposition 3.2.1 The following equality holds:
\[ \bar\Phi(x) = \frac{1}{\sqrt{2\pi}}\int_x^{+\infty}e^{-y^2/2}\,dy = \frac{e^{-x^2/2}}{\sqrt{2\pi}}\Big(\sum_{k=0}^n\frac{(-1)^k(2k-1)!!}{x^{2k+1}} + O\big(x^{-2n-3}\big)\Big), \quad x\to+\infty, \tag{3.2.1} \]
where $(-1)!! = 1$ and $(2k-1)!! = 1\cdot3\cdots(2k-1)$ for $k\ge1$.

Proof: This follows after repeated integration by parts:
\[ \int_x^{+\infty}e^{-y^2/2}\,dy = -\int_x^{+\infty}y^{-1}\,d\big(e^{-y^2/2}\big) = x^{-1}e^{-x^2/2} - \int_x^{+\infty}y^{-2}e^{-y^2/2}\,dy; \]
more generally,
\[ (-1)^k(2k-1)!!\int_x^{+\infty}y^{-2k}e^{-y^2/2}\,dy = (-1)^{k+1}(2k-1)!!\int_x^{+\infty}y^{-(2k+1)}\,d\big(e^{-y^2/2}\big) \]
\[ = \frac{(-1)^k(2k-1)!!\,e^{-x^2/2}}{x^{2k+1}} + (-1)^{k+1}(2k+1)!!\int_x^{+\infty}y^{-(2k+2)}e^{-y^2/2}\,dy. \] □

Note that, for fixed $x$, the sum on the right-hand side of (3.2.1) diverges as $n\to\infty$. Equality (3.2.1) is thus an example of an asymptotic expansion, often written in the form
\[ \bar\Phi(x) \sim \frac{e^{-x^2/2}}{\sqrt{2\pi}}\sum_{k=0}^{+\infty}\frac{(-1)^k(2k-1)!!}{x^{2k+1}}, \quad x\to+\infty. \]
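The divergent nature of (3.2.1) is easy to observe numerically: for fixed $x$ the partial sums first improve and then deteriorate. A minimal sketch (not from the thesis), assuming scipy; the term recursion uses the ratio $-(2k+1)/x^2$ between consecutive terms:

# Truncations of the asymptotic expansion (3.2.1) of the normal tail.
import numpy as np
from scipy import stats

x = 4.0
exact = stats.norm.sf(x)
for n in [0, 1, 2, 5, 10, 20]:
    total, term = 0.0, 1.0/x                 # k = 0 term of (3.2.1) is 1/x
    for k in range(n + 1):
        total += term
        term *= -(2*k + 1)/(x*x)             # next term of the series
    approx = np.exp(-x*x/2)/np.sqrt(2*np.pi)*total
    print(n, approx/exact)

At $x = 4$ the ratio approaches 1 until roughly $2k+1 > x^2$, after which the truncations drift away, illustrating why asymptotic expansions are truncated at their smallest term.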
The following result gives both upper and lower bounds on $\bar\Phi(x)$ for all $x > 0$.

Theorem 3.2.2 If $x > 0$, then
\[ \frac{1}{\sqrt{2\pi}}\,\frac{x}{x^2+1}\,e^{-x^2/2} \le \bar\Phi(x) \le \sqrt{\frac2\pi}\,\frac1x\,e^{-x^2/2}. \tag{3.2.2} \]

Proof: For the upper bound,
\[ \int_x^{+\infty}e^{-y^2/2}\,dy \le \int_x^{+\infty}e^{-xy/2}\,dy = \frac2x\,e^{-x^2/2}. \]
For the lower bound, note that if $g(x) = x^{-1}e^{-x^2/2}$, then
\[ g'(x) = -(x^{-2}+1)e^{-x^2/2} = -\frac{1+x^2}{x^2}\,e^{-x^2/2}, \]
and the function $x\mapsto\frac{x^2}{1+x^2}$ is increasing for $x > 0$. As a result, for $y > x$,
\[ \frac{y^2}{1+y^2} \ge \frac{x^2}{1+x^2}, \]
and
\[ \int_x^{+\infty}e^{-y^2/2}\,dy = \int_x^\infty\frac{y^2}{1+y^2}\cdot\frac{1+y^2}{y^2}\,e^{-y^2/2}\,dy \ge \frac{x^2}{1+x^2}\int_x^\infty\frac{1+y^2}{y^2}\,e^{-y^2/2}\,dy = \frac{x^2}{1+x^2}\,g(x) = \frac{x}{1+x^2}\,e^{-x^2/2}. \] □

Combining (3.2.1) and (3.2.2), we see that (1.3.16) is a good approximation starting with moderately large $x$, say $x > 5$. Figure 3.2.1 presents the relative error of this approximation for $x\in[2,6]$. Both (3.2.1) and the figure suggest that the relative error decays as $x^{-2}$, but the numerical computations break down for $x > 7$ because of numerical instability related to division of two numbers that are close to zero.

[Figure 3.2.1: Upper Tail Approximation of Standard Normal CDF: Relative Error]
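The sandwich bound (3.2.2) can be checked in a couple of lines; a minimal sketch (not from the thesis), assuming scipy:

# Checking the normal-tail sandwich bound (3.2.2) numerically.
import numpy as np
from scipy import stats

for x in [0.5, 1.0, 2.0, 4.0, 6.0]:
    lower = x/((x*x + 1)*np.sqrt(2*np.pi))*np.exp(-x*x/2)
    upper = np.sqrt(2/np.pi)/x*np.exp(-x*x/2)
    tail = stats.norm.sf(x)
    print(x, lower <= tail <= upper, lower, tail, upper)

The bound holds for every tested $x$, and the two sides pinch together as $x$ grows, consistent with (1.3.16).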
3.3 Connecting Moderate and Large Deviations

If $\bar X_n$ is the sample mean of $n$ iid random variables with mean zero and variance one, then the weak law of large numbers tells us
\[ \lim_{n\to\infty}P(|\bar X_n| > \varepsilon) = 0 \]
for every $\varepsilon > 0$. More generally, given a sequence $\{\varepsilon_n,\ n\ge1\}$ of positive numbers, we refer to events of the form
\[ \{\omega : \bar X_n > \varepsilon_n\} \quad\text{and}\quad \{\omega : \bar X_n < -\varepsilon_n\} \tag{3.3.1} \]
as deviations. In particular, by the Central Limit Theorem,
\[ \lim_{n\to\infty}P(\sqrt n\,|\bar X_n| > a) = 2\big(1-\Phi(a)\big) \quad\text{for every } a > 0; \]
equivalently,
\[ \lim_{n\to\infty}\frac{P(n^{1/2}|\bar X_n| > x)}{2(1-\Phi(x))} = 1. \tag{3.3.2} \]
Accordingly, the case $\varepsilon_n = xn^{-1/2}$ in (3.3.1) is called normal deviations. If we allow $x$ in (3.3.2) to grow with $n$, then there is no guarantee that (3.3.2) will continue to hold, and the study of the corresponding limit can be broadly considered the subject of the theory of large deviations; here, "large" refers to $x = x_n$ growing with $n$, as opposed to staying fixed. A more detailed analysis shows that there are three somewhat distinct regimes:
- moderate deviations, with $x_n = o(n^{1/2})$, that is, $x_n\to\infty$ but $x_nn^{-1/2}\to0$;
- large deviations, with $x_n = O(n^{1/2})$;
- super-large deviations, with $x_nn^{-1/2}\to+\infty$.

If the random variable $X$ satisfies Cramér's condition and $x_n = o(\sqrt n)$, then we already have the normalized version of the moderate deviation approximation, (3.1.5) and (3.1.6). Recall that, by the classical Berry-Esseen theorem applied to the sum of $n$ iid standard exponentials,
\[ \sup_x\big|P(n, n+x\sqrt n) - \Phi(x)\big| \le \frac{2}{\sqrt n}; \tag{3.3.3} \]
see, for example, [7, Theorem III.11.1]. For fixed $x_n = x$, both (3.1.5) and (3.1.6) are consistent with (3.3.3).

Next, we will see that (3.1.5) and (3.1.6) are consistent with (2.2.1) and (2.2.2); recall that (2.2.1) and (2.2.2) are established for $x_n = \sqrt n\,|x-\mu|/\sigma$, that is, for $x_n = O(\sqrt n)$, and therefore, strictly speaking, (2.2.1) and (2.2.2) cannot be derived from (3.1.5) and (3.1.6).

Corollary 3.3.1 If the random variable $X$ satisfies Cramér's condition and $x_n\to+\infty$, $x_n = o(\sqrt n)$, $n\to\infty$, then
\[ \frac1n\ln P\big(\bar X_n > \mu + \sigma x_n/\sqrt n\big) \approx -\Lambda^*\big(\mu + \sigma x_n/\sqrt n\big), \tag{3.3.4} \]
\[ \frac1n\ln P\big(\bar X_n < \mu - \sigma x_n/\sqrt n\big) \approx -\Lambda^*\big(\mu - \sigma x_n/\sqrt n\big). \tag{3.3.5} \]

Proof: This follows directly from (3.1.5) and (3.1.6) after observing that, by (3.2.2),
\[ \frac1n\ln\big(1-\Phi(x_n)\big) \approx -\frac{x_n^2}{2n}, \quad n\to\infty. \] □

Corollary 3.3.2 If the random variable $X$ satisfies Cramér's condition and $0 < x_n = o(n^{1/6})$, then
\[ \lim_{n\to\infty}\frac{1-F_n(x_n)}{1-\Phi(x_n)} = 1, \tag{3.3.6} \]
\[ \lim_{n\to\infty}\frac{F_n(-x_n)}{\Phi(-x_n)} = 1. \tag{3.3.7} \]

Proof: We first prove (3.3.6); with no loss of generality, assume $\mu = 0$, $\sigma = 1$. By (3.1.5), we need to show that if $x_n/n^{1/6}\to0$, then
\[ -n\Lambda^*\big(x_n/\sqrt n\big) + \frac{x_n^2}{2} \to 0. \]
It follows from (2.1.4) that
\[ \Lambda^*\Big(\frac{x_n}{\sqrt n}\Big) = \frac{x_n^2}{2n} + O\big(x_n^3n^{-3/2}\big); \]
therefore,
\[ -n\Lambda^*\big(x_n/\sqrt n\big) + \frac{x_n^2}{2} = O\big(x_n^3n^{-1/2}\big) = O\Big(\Big(\frac{x_n}{n^{1/6}}\Big)^3\Big) \to 0. \] □

Chapter 4 Laplace's Method

Laplace's method is an approach to the asymptotic analysis of integrals
\[ \int_a^bf(x)e^{nS(x)}\,dx, \quad n\to\infty, \tag{4.0.1} \]
where the function $S : x\mapsto S(x)$ is sufficiently smooth [e.g. $S\in C^\infty$] and achieves its maximal value on $[a,b]$ at a unique point $x = c$. There are five situations to consider, namely, when $S$ is strictly increasing ($S'(x) > 0$) or strictly decreasing ($S'(x) < 0$) on $[a,b]$, and when $S'(c) = 0$ with $c = a$, $c\in(a,b)$, or $c = b$. We will state the results, outlining the proofs, and then apply them to confirm (1.4.10) and (1.4.11). We will also derive the upper-tail counterpart of (1.4.11) and the normal approximation of $\Gamma(n+1, 1)$.

4.1 Summary of Laplace's Method

Consider the integral
\[ W(n) = \int_a^bf(x)e^{nS(x)}\,dx, \quad n\to\infty, \]
where $f$ and $S$ are sufficiently smooth functions defined on $\mathbb R$, and $-\infty < a < b < +\infty$. Assume that there is a unique point $c\in[a,b]$ such that
\[ \max_{x\in[a,b]}S(x) = S(c). \]
The main idea is that the value of the integral mostly depends on the values of the integrand in the neighborhood of the maximum. If we restrict the range of integration to a neighborhood of the maximum, we can expand $f(x)$ and $S(x)$ into asymptotic series, and then enlarge the region of integration again to get a computable approximation.

For the special case $S(x) = -x$,
\[ I(n) = \int_0^bf(x)e^{-nx}\,dx, \]
where $f(x)$ is continuous and has the asymptotic expansion
\[ f(x) = x^\alpha\sum_{k=0}^\infty a_kx^k, \quad x\to0^+, \]
with $\alpha > -1$ so that the integral converges near zero. In this case, directly applying Watson's lemma,
\[ I(n) \sim \sum_{k=0}^\infty\frac{a_k\,\Gamma(\alpha+k+1)}{n^{\alpha+k+1}}, \quad n\to+\infty. \]
In general cases, one can try to transform the integral into the setting of Watson's lemma by substituting $z = -S(x)$. However, this method does not always work, because the inverse function expressing $x$ through $z$ is sometimes hard to get. To approximate $W(n)$ directly, we substitute for $S(x)$ the first few terms of its Taylor expansion in a region $|x-c| < \varepsilon$, where $\varepsilon$ is some small value, and consider the following cases.

Case 1. If $c$ is located at the endpoint $a$ and $S'(a)\ne0$, we approximate $S(x)$ by $S(a) + (x-a)S'(a)$:
\[ W(n) = \int_a^bf(x)e^{nS(x)}\,dx \approx \int_a^{a+\varepsilon}f(a)e^{n(S(a)+(x-a)S'(a))}\,dx \approx \int_a^\infty f(a)e^{n(S(a)+(x-a)S'(a))}\,dx. \]
Taking $z = x-a$ [and recalling that $S'(a) < 0$ here, so the integral converges],
\[ W(n) \approx f(a)e^{nS(a)}\int_0^\infty e^{nzS'(a)}\,dz = \frac{f(a)e^{nS(a)}}{n|S'(a)|}, \quad n\to+\infty. \]

Case 2. Similarly, if $c = b$, then
\[ W(n) \approx \frac{f(b)e^{nS(b)}}{nS'(b)}, \quad n\to+\infty. \]
Note that $S'(a) < 0$ if $c = a$ and $S'(b) > 0$ if $c = b$.
Cases 3, 4 and 5. If $S'(c) = 0$ and $S''(c)\ne0$, where $c$ is in the interval $[a,b]$, we approximate $S(x)$ by $S(c) + \frac{(x-c)^2}{2}S''(c)$. For $c\in(a,b)$,
\[ W(n) = \int_a^bf(x)e^{nS(x)}\,dx \approx \int_{c-\varepsilon}^{c+\varepsilon}f(c)e^{n(S(c)+\frac{(x-c)^2}{2}S''(c))}\,dx \approx f(c)e^{nS(c)}\int_{-\infty}^{+\infty}e^{n\frac{(x-c)^2}{2}S''(c)}\,dx. \]
Taking $z^2 = -n\frac{(x-c)^2}{2}S''(c)$ and recalling that $\int_{-\infty}^{+\infty}e^{-x^2}\,dx = \sqrt\pi$,
\[ W(n) \approx \frac{\sqrt{2\pi}}{\sqrt{-nS''(c)}}\,f(c)e^{nS(c)}. \]
More generally, suppose $S'(c) = S''(c) = \cdots = S^{(m-1)}(c) = 0$ and $S^{(m)}(c)\ne0$ for an even $m$. Then
\[ W(n) \approx \int_{c-\varepsilon}^{c+\varepsilon}f(c)e^{n(S(c)+\frac{(x-c)^m}{m!}S^{(m)}(c))}\,dx. \]
Setting $z = x-c$,
\[ W(n) \approx 2f(c)e^{nS(c)}\int_0^{+\infty}e^{n\frac{z^m}{m!}S^{(m)}(c)}\,dz. \]
Setting $k = -n\frac{z^m}{m!}S^{(m)}(c)$, so that $z = \big(\frac{km!}{-nS^{(m)}(c)}\big)^{1/m}$ and $dz = \frac1m\big(\frac{m!}{-nS^{(m)}(c)}\big)^{1/m}k^{\frac1m-1}\,dk$, and applying the Gamma function,
\[ W(n) \approx \frac{2\,\Gamma(\frac1m)\,(m!)^{1/m}}{m\,\big(-nS^{(m)}(c)\big)^{1/m}}\,f(c)e^{nS(c)}. \]
When $c$ is an endpoint (Cases 3 and 4), only half of the neighborhood of $c$ contributes, and the factor 2 disappears. The following table summarizes the results:

Case  Arg max c of S(x)     Asymptotics of \(\int_a^b f(x)e^{nS(x)}dx\), n -> infinity
1     c = a, S'(a) < 0      \(\frac{f(a)e^{nS(a)}}{n|S'(a)|}\,(1+O(n^{-1}))\)
2     c = b, S'(b) > 0      \(\frac{f(b)e^{nS(b)}}{nS'(b)}\,(1+O(n^{-1}))\)
3     c = a, S'(a) = 0      \(\frac{\Gamma(1/m)(m!)^{1/m}}{m(-nS^{(m)}(a))^{1/m}}\,f(a)e^{nS(a)}\,(1+O(n^{-1/2}))\)
4     c = b, S'(b) = 0      \(\frac{\Gamma(1/m)(m!)^{1/m}}{m(-nS^{(m)}(b))^{1/m}}\,f(b)e^{nS(b)}\,(1+O(n^{-1/2}))\)
5     c in (a,b)            \(\frac{2\Gamma(1/m)(m!)^{1/m}}{m(-nS^{(m)}(c))^{1/m}}\,f(c)e^{nS(c)}\,(1+O(n^{-1/2}))\)
Table 4.1.1: Summary of Laplace's approximations

Here $m$ is the smallest even number such that $S^{(m)}(c)\ne0$.

Main error analysis. Since
\[ W(n) = e^{nS(c)}\int_a^bf(x)e^{n(S(x)-S(c))}\,dx, \]
it suffices to consider $\int_a^bf(x)e^{n(S(x)-S(c))}\,dx$.

Cases 1 and 2. Assume $c = a$ and $S'(a) < 0$, and write $r = -S'(a) > 0$. Then
\[ \int_a^bf(x)e^{n(S(x)-S(a))}\,dx \approx f(a)\int_a^be^{-nr(x-a)}\,dx \approx f(a)\int_0^{+\infty}e^{-nrx}\,dx. \]
Since $f(x)\approx f(a) + f'(a)(x-a)$ with $f'(a)\ne0$, the error term is proportional to
\[ \int_0^{+\infty}xe^{-nrx}\,dx = \frac{1}{n^2r^2}, \]
which is smaller than the main term $f(a)/(nr)$ by a factor of order $n^{-1}$. Similarly for Case 2.

Case 3. Assume $c = a$, the first $m-1$ derivatives of $S$ vanish at $a$, $m$ is even, and $S^{(m)}(a) < 0$; write $r = -S^{(m)}(a)$. Since $f(x)\approx f(a) + f'(a)(x-a)$ with $f'(a)\ne0$, the error term is proportional to
\[ \int_0^{+\infty}xe^{-\frac{nr}{m!}x^m}\,dx = \frac1m\Big(\frac{m!}{nr}\Big)^{2/m}\Gamma\Big(\frac2m\Big), \]
by the substitution $z = \frac{nr}{m!}x^m$ and the Gamma function. This is of order $n^{-2/m}$, while the main term is of order $n^{-1/m}$; hence the relative error in Case 3 is of order $n^{-1/m}$, i.e. $n^{-1/2}$ for $m = 2$.

Cases 4 and 5. Since $m$ is even and $S^{(m)}(c) < 0$, the integrals in Cases 4 and 5 can be transformed to Case 3, so the orders of error are the same.

To get (1.4.9), we can take advantage of Case 2, since $S(x) = \ln x - x$ is increasing on $(0,1]$ and $T/n < 1$:
\[ \int_0^{T/n}e^{nS(x)}\,dx \approx \frac{e^{nS(T/n)}}{nS'(T/n)} = \frac{T}{n(n-T)}\,\exp\Big\{n\ln\frac Tn - T\Big\}. \]
Comparing the exponent with the rate function of the standard exponential random variable, $\Lambda^*(T/n) = \frac Tn - 1 - \ln\frac Tn$, we can write
\[ \int_0^{T/n}e^{nS(x)}\,dx \approx \frac{T}{n(n-T)}\,\exp\Big\{-n\Lambda^*\Big(\frac Tn\Big) - n\Big\}. \]
Using the Stirling approximation (1.4.10), $n!\approx\sqrt{2\pi n}\,(n/e)^n$, we then get (1.4.11):
\[ P(\Gamma(n+1, 1)\le T) = \frac{n^{n+1}}{n!}\int_0^{T/n}e^{n(\ln x - x)}\,dx \approx \frac{n^{n+1}}{\sqrt{2\pi n}\,(n/e)^n}\cdot\frac{T}{n(n-T)}\,e^{-n\Lambda^*(T/n)-n} = \frac{1}{\sqrt{2\pi n}}\,\frac{T}{n-T}\,\exp\Big\{-n\Lambda^*\Big(\frac Tn\Big)\Big\}. \]
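The endpoint approximation is easy to validate against quadrature. A minimal sketch (not from the thesis), assuming scipy; to avoid underflow, the maximum $e^{nS(b)}$ is factored out before integrating:

# Case 2 of Table 4.1.1 for S(x) = ln(x) - x on [0, T/n], vs quadrature.
import numpy as np
from scipy import integrate

n, T = 128, 66.0
b = T/n
S = lambda x: np.log(x) - x
num, _ = integrate.quad(lambda x: np.exp(n*(S(x) - S(b))), 0.0, b)
# Laplace: integral of e^{n(S(x)-S(b))} ~ 1/(n S'(b)), with S'(b) = (1-b)/b
print(num, b/(n*(1 - b)))

The two numbers agree to within the advertised $O(n^{-1})$ relative error.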
Z b 0 e n(ln(x)x) dx: Then the function S(x) = ln(x)x is increasing on (0;b), with S 0 (b) = 1b b > 0: As a result, with f(x) = 1, we use Case 2 in Table 4.1.1 to conclude that P (n + 1;nb) = b 1b n n n! e n(blnb) 1 +O(n 1 ) ; n!1: (4.2.1) Next, let a2 (1; +1) and consider the upper tail of Gamma distribution Q(n + 1;na) = 1 n! Z 1 na y n e y dy = n n+1 n! Z +1 a e n(ln(x)x) dx: This time, the function S(x) = ln(x)x is decreasing on (a; +1), and we use Case 1 in Table 4.1.1 to conclude that Q(n + 1;nb) = a a 1 n n n! e n(alna) 1 +O(n 1 ) ; n!1: (4.2.2) As an another example, let us derive Stirling's approximation (1.4.10). We have n! = (n + 1) = Z +1 0 y n e y dy y=nx = Z +1 0 x n e nx dx x n =e n lnx = n n+1 Z +1 0 e n(ln(x)x) dx: Now we have S(x) = ln(x)x, S 0 (x) = (1x)=x and S 00 (x) = x 2 +(1x) x 2 , c = 1, S(c) =1, S 00 (c) = 1.Using Case 5 in Table 4.1.1, n! = 2n n+1 ( 1 2 )(2!) 1 2 2n 1 2 e n = p 2n n n e n 1 +O(n 1 2 ) ; n!1: (4.2.3) 31 Recall that, for the standard exponential random variable, the rate function is (x) =x ln(x) 1: Then, combining (4.2.3) with (4.2.1) and (4.2.2) we get, respectively, P (n + 1;nb) = 1 p 2n b 1b e n (b) 1 +O(n 1 ) ; n!1; b2 (0; 1); (4.2.4) Q(n + 1;na) = 1 p 2n a a 1 e n (a) 1 +O(n 1 ) ; n!1; a2 (1; +1): (4.2.5) In particular, (4.2.4) is a more precise version of (1.4.11). To illustrate Case 5 in Table 4.1.1, let us prove the following example lim n!1 e n n X k=0 n k k! = 1 2 : (4.2.6) First, we observe that the left side in (4.2.6) is the limit of CDF of Poisson distribution, and then we can apply Poisson process e n n X k=0 n k k! =P P(n)n =P (n + 1; 1)>n ; (4.2.7) where the second equality follows from (1.4.4). Then P (n + 1; 1)>n = 1 n! Z +1 n y n e y dy = n n+1 n! Z +1 1 e n(ln(x)x) dy = n n+1 n! e n 1 2 r 2 n 1 +O(n 1=2 ) ; which works together with (4.2.3) to imply (4.2.6). Note that (4.2.6) can also be derived from (4.2.7) using CLT: P Gamma(n + 1; 1)>n CLT P( x (n + 1) n + 1 > 1 n + 1 ) n!1 P N (0; 1)> 0 = 1 2 : This observation raises a natural question about further connections between Laplace's method and the Central Limit Theorem. As rst attempt at establishing such connections, consider (4.2.4) with b = 1 x p n : Then plug b into (4.2.4) P (n + 1;nb) = 1 p 2n ( p n x 1)e n (1 x p n ) (1 +O(xn 1 2 )): Then direct computations using (2.1.4) result in P (n + 1; 1)n p n x = 1 p 2x e x 2 =2 1 +O(xn 1=2 ) ; by (1.3.16), this is consistent with normal approximation of the Gamma distribution for large x and large 32 n, as long as xn 1=2 small. In other words, we have discovered a weaker version of (3.0.2). For a better result, we need to go further back to the derivation of Laplace's method. Proposition 4.2.1 For xed x2R, P (n + 1;n +x p n) = (x) 1 +O(n 1=2 ) : (4.2.8) Proof: By direct computation, P(n + 1;n +x p n) =P((n + 1; 1)6n +x p n) = 1 n! Z n+x p n 0 y n e y dy y=nt = 1 n! Z 1+ x p n 0 n n+1 t n e nt dt = n n+1 n! Z 1+ x p n 0 e n(lntt) dt Taylor = = = = = = = = = Expansion n n+1 n!e n Z 1+ x p n 0 e n(t1) 2 2 dt Stirling 0 s = = = = = = = = = = = = Approximation p n p 2 Z 1+ x p n 0 e n(t1) 2 2 dt z=t1 = = = = = = Z x p n 1 e nz 2 2 dz r= p nz = = = = = = 1 p 2 Z x p n e r 2 2 dr n!+1 (x): 2 In the above computations, the Taylor series approximation results in the biggest error. 
The Central Limit Theorem, when applied to the sum of $n$ iid standard exponential random variables and stated in terms of the function $P$, gives
\[ P(n, n+x\sqrt n) = \Phi(x)\big(1+O(n^{-1/2})\big), \tag{4.2.9} \]
since
\[ P(\mathcal N(0,1) < x) \approx P\Big(\frac{\sum_{i=1}^nX_i - n}{\sqrt n} < x\Big) = P\Big(\sum_{i=1}^nX_i < n + \sqrt n\,x\Big) = P(n, n+x\sqrt n). \]
Of course, as $n\to\infty$, (4.2.9) is the same as (4.2.8): if $S_n$ is the sum of $n$ iid standard exponential random variables, then
\[ P(n, n+x\sqrt n) = P\Big(\frac{S_n - n}{\sqrt n}\le x\Big), \qquad P(n+1, n+x\sqrt n) = P\Big(\frac{S_{n+1}-n}{\sqrt n}\le x\Big), \]
and if $\eta$ is a standard exponential random variable, then $\eta/\sqrt n$ converges to zero as $n\to\infty$. Numerical results show that, for fixed $n$, (4.2.9) is the better approximation. Note also that, for every $y > 0$ and $m < n$, we have $P(n, y) < P(m, y)$; in particular,
\[ P(n+1, n+x\sqrt n) < P(n, n+x\sqrt n), \quad\text{for all } n = 1, 2, \ldots,\ x > 0, \]
since the Gamma family is stochastically increasing in the first parameter. In particular, the normal approximation (4.2.8) results in a visible over-estimation of the corresponding probabilities, especially near $x = 0$. The proof of Proposition 4.2.1 shows that, for $x < 0$, this over-estimation is also to be expected, due to
\[ \ln t - t + 1 < -\frac{(t-1)^2}{2}, \quad 0 < t < 1. \]
To prove this, introduce the auxiliary function, with $x = t-1$,
\[ g(x) = \ln(x+1) - x + \frac{x^2}{2}; \]
we see that $g(x) < 0$ for $x\in(-1,0)$, since $g(0) = 0$ and
\[ g'(x) = \frac{1}{1+x} - 1 + x = \frac{x^2}{x+1} > 0, \quad x\in(-1,0). \]
In other words, the Taylor-step approximation and the last "$\approx$" in the proof of Proposition 4.2.1 are, in fact, strict inequalities "<". Figures 4.2.1 and 4.2.2 illustrate the results for $n = 60$ and $n = 600$, respectively.

[Figure 4.2.1: Normal Approximation of Gamma Distribution: n = 60]
[Figure 4.2.2: Normal Approximation of Gamma Distribution: n = 600]

A result of Tricomi [10] (see also [4, Theorem 1.1]) implies
\[ P(n+1, n+x\sqrt n) = \Phi(x) - \frac{1}{3\sqrt{2\pi n}}(2+x^2)e^{-x^2/2} + O\Big(\frac1n\Big), \tag{4.2.10} \]
and consequently the inequality $P(n+1, n+x\sqrt n) < \Phi(x)$ holds for all sufficiently large $n$ and all sufficiently small $|x|$; when $n = 60$, this inequality ["stars below dots"] is clearly visible in Figure 4.2.1.

With $S_n$ denoting the sum of $n$ iid standard exponentials and $F_n$ denoting the CDF of $(S_n - n)/\sqrt n$, we can use [6, Theorem 5.22] to write
\[ F_n(x) = \Phi(x) + \frac{1}{3\sqrt{2\pi n}}(1-x^2)e^{-x^2/2} + o\Big(\frac{1}{\sqrt n}\Big), \tag{4.2.11} \]
which is an extension of the classical Berry-Esseen bound in the Central Limit Theorem; here, we also use that, for a standard exponential random variable $X$,
\[ E(X-1)^3 = 6 - 3\cdot2 + 3\cdot1 - 1 = 2. \]
In particular, the difference between $P(n, n+x\sqrt n)$ and $\Phi(x)$ changes sign at $|x| = 1$, which is also visible in Figure 4.2.1. While we are not in a position to derive either (4.2.10) or (4.2.11), we can reconcile the two results. Indeed,
\[ P(n+1, n+x\sqrt n) = P(S_{n+1}\le n+x\sqrt n) = P\Big(\frac{S_{n+1}-(n+1)}{\sqrt{n+1}}\le\frac{x\sqrt n - 1}{\sqrt{n+1}}\Big) = F_{n+1}\Big(x - \frac{1}{\sqrt{n+1}}\Big) + O(n^{-1}). \]
Then, because $n^{-1/2} - (n+1)^{-1/2} = O(n^{-3/2})$, we also have
\[ F_{n+1}(x) = \Phi(x) + \frac{1}{3\sqrt{2\pi n}}(1-x^2)e^{-x^2/2} + o\Big(\frac{1}{\sqrt n}\Big). \tag{4.2.12} \]
As a result,
\[ P(n+1, n+x\sqrt n) = \Phi\Big(x - \frac{1}{\sqrt{n+1}}\Big) + \frac{1}{3\sqrt{2\pi n}}(1-x^2)e^{-x^2/2} + o\Big(\frac{1}{\sqrt n}\Big). \tag{4.2.13} \]
A first-order Taylor approximation yields
\[ \Phi\Big(x - \frac{1}{\sqrt{n+1}}\Big) = \Phi(x) - \frac{1}{\sqrt{2\pi n}}\,e^{-x^2/2} + O(n^{-1}), \]
which shows that (4.2.13) is indeed consistent with (4.2.10), but slightly weaker [$o(n^{-1/2})$ vs $O(n^{-1})$]. This is not at all surprising, because (4.2.10) is derived specifically for the function $P$, whereas (4.2.13) is a particular case of a much more general result [6, Theorem 5.22]. For (4.2.10) to hold, we do not need $n$ to be an integer; in fact, with suitable interpretation, a version of (4.2.10) holds for complex $n$ and $x$.
Equality (4.2.11), on the other hand, is derived only for integer $n$ but, as numerical experiments suggest, can be used with non-integer values of $n$ as well. Not surprisingly, equality (4.2.11) [the Berry-Esseen-Petrov correction] results in a much better approximation of $P(n, n+x\sqrt n)$ than (3.3.3). Similarly, equality (4.2.10) [the Tricomi correction] results in a much better approximation of $P(n+1, n+x\sqrt n)$. Figures 4.2.3 and 4.2.4 illustrate the results for $n = 6.6$.

[Figure 4.2.3: Normal Approximation of Gamma Distribution, n = 6.6, with the Tricomi correction]
[Figure 4.2.4: Normal Approximation of Gamma Distribution, n = 6.6, with the Berry-Esseen-Petrov correction]

Chapter 5 Numerical Experiments

While the theoretical results give the asymptotic behavior of $P(n+1, a)$ as $n\to\infty$ and/or $a\to\infty$, the examples have specific values of $a$ and $n$, such as $a = 66$ and $n = 128$. As a result, one has to decide whether the deviation $|a - n| = 62$ should be considered $o(\sqrt{128})$, $O(\sqrt{128})$, or maybe something else, and then use the corresponding approximation. A more detailed discussion of these questions is in [9, Section 11.2.5]. Given the immediate applications in various software packages, research on this topic is on-going [2, 3, 4, 5, etc.]. In the end, it appears that many decisions remain somewhat arbitrary; for example, the algorithm in [3] considers $n = 10^4$ to be the "large" value of $n$.

In what follows, we investigate some aspects of the approximations of
\[ P(n+1, a) = \frac{1}{\Gamma(n+1)}\int_0^ay^ne^{-y}\,dy. \]
Similar to [3], we restrict our attention to $n > -1$, $a > 0$; in particular, $n$ is not necessarily an integer. Qualitatively, for every fixed $n$, we have the following cases to consider:
1. very small $a$: $a\approx0$;
2. very large $a$: $a\to\infty$;
3. intermediate $a$, covering both what we call the CLT regime (small $|a-n|$) and the large deviations regime ($|a/n - 1|$ of order one);
4. transitional regions, for example from $a = rn$, $r < 1$, to $a = n + x\sqrt n$.
Combining several approximations of $P(n+1, a)$ leads to a relatively simple algorithm for computing $P(n+1, a)$ when $a\in[0.01n, 3n]$, $n\in[20, 500]$; the algorithm requires only the four arithmetic operations together with the logarithmic and exponential functions, and can be implemented on a basic scientific calculator.

5.1 CDF of Gamma Distribution Near Zero: Low Tail

For small $a$ close to zero, the most common approach is a Taylor expansion. We illustrate two different versions:
\[ P(n+1, a) = a^{n+1}e^{-a}\sum_{k=0}^{+\infty}\frac{a^k}{\Gamma(n+k+2)}, \tag{5.1.1} \]
\[ P(n+1, a) = \frac{a^{n+1}}{\Gamma(n+1)}\sum_{k=0}^{+\infty}\frac{(-1)^ka^k}{k!\,(k+n+1)}. \tag{5.1.2} \]
To derive (5.1.1), we keep integrating by parts:
\[ \int_0^ay^ne^{-y}\,dy = \int_0^ae^{-y}\,d\Big(\frac{y^{n+1}}{n+1}\Big) = e^{-a}\frac{a^{n+1}}{n+1} + \frac{1}{n+1}\int_0^ay^{n+1}e^{-y}\,dy = \cdots \]
As a by-product, we get a recursive formula
\[ P(n+2, a) = P(n+1, a) - \frac{e^{-a}a^{n+1}}{\Gamma(n+2)}. \]
To derive (5.1.2), we expand $e^{-y}$ in its Taylor series and integrate term by term. Comparing (5.1.1) and (5.1.2), we see that the right-hand side of (5.1.2) is an alternating series, which makes it easier to control the truncation error but can potentially introduce numerical instability. On the other hand, the coefficients in (5.1.1) usually decay faster: for $n > 0$,
\[ \Gamma(n+k+2) > \Gamma(n+1)\Gamma(k+1), \]
since
\[ \frac{\Gamma(n+1)\Gamma(k+1)}{\Gamma(n+k+2)} = B(n+1, k+1) = \int_0^1t^n(1-t)^k\,dt < 1. \]
Writing (5.1.1) as
\[ P(n+1, a) = a^{n+1}e^{-a}\sum_{k=0}^{+\infty}b_k, \quad b_k = \frac{a^k}{\Gamma(n+k+2)}, \]
we see that an efficient recursive computation of $b_k$ is possible:
\[ b_0 = \frac{1}{\Gamma(n+2)}, \qquad b_{k+1} = \frac{a}{n+k+2}\,b_k, \]
because computing $b_{k+1}$ from $b_k$ avoids recomputing each coefficient in the summation from scratch; this is the same idea that reduces complexity in algorithm-design techniques such as dynamic programming and divide and conquer.
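The recursion translates directly into a few lines of code; the following is a minimal sketch (not from the thesis, and slightly different from the appendix code), using scipy's gammainc as the reference value:

# Truncation of (5.1.1) with the recursive coefficients b_k.
import math
from scipy import special

def low_tail(n, a, N=50):
    """Approximate P(n+1, a) by the first N+1 terms of (5.1.1)."""
    b = 1.0/math.gamma(n + 2)      # b_0
    total = 0.0
    for k in range(N + 1):
        total += b
        b *= a/(n + k + 2)         # b_{k+1} = a/(n+k+2) * b_k
    return a**(n + 1)*math.exp(-a)*total

print(low_tail(8, 5.0), special.gammainc(9, 5.0))

Each new term costs one multiplication and one division, which is what makes the scheme suitable for a basic calculator.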
Similarly, (5.1.2) becomes
\[ P(n+1, a) = \frac{a^{n+1}}{\Gamma(n+1)}\sum_{k=0}^{+\infty}c_k, \quad c_0 = \frac{1}{n+1}, \quad c_{k+1} = -\frac{a}{k+1}\,\frac{n+k+1}{n+k+2}\,c_k. \]
As a result, (5.1.1) can be truncated to $N+1$ terms and written as
\[ P_N(n+1, a) = e^{-a}\sum_{k=n+1}^{N+n+1}\frac{a^k}{\Gamma(k+1)}. \tag{5.1.3} \]
In the two examples below, we took $n = 8$, $a\in[0, 15]$, and $N = 50$. Figure 5.1.1 presents the results of using a truncation of (5.1.1); Figure 5.1.2 presents the results of using a truncation of (5.1.2).

[Figure 5.1.1: Incomplete Gamma function/Gamma CDF: integration-by-parts approximation]
[Figure 5.1.2: Incomplete Gamma function/Gamma CDF: alternating series approximation]

5.2 CDF of Gamma Distribution at Infinity: Upper Tail

For fixed $n$ and large values of $a$, the standard approach to computing $P(n+1, a)$ is via the asymptotic expansion
\[ P(n+1, a) \sim 1 - \frac{a^ne^{-a}}{\Gamma(n+1)}\sum_{k\ge0}\frac{(n)_k}{a^k}, \quad (n)_k = n(n-1)\cdots(n-k+1),\ (n)_0 := 1. \tag{5.2.1} \]
To derive (5.2.1), note that $P(n+1, a) + Q(n+1, a) = 1$, so
\[ P(n+1, a) = 1 - Q(n+1, a) = 1 - \frac{1}{\Gamma(n+1)}\int_a^{+\infty}y^ne^{-y}\,dy, \]
and integrate by parts as follows:
\[ \int_a^{+\infty}y^ne^{-y}\,dy = -\int_a^{+\infty}y^n\,d(e^{-y}) = a^ne^{-a} + n\int_a^{+\infty}y^{n-1}e^{-y}\,dy = a^ne^{-a}(1+na^{-1}) + n(n-1)\int_a^{+\infty}y^{n-2}e^{-y}\,dy = \cdots. \]
Note that, unlike (5.1.1) and (5.1.2), the series on the right-hand side of (5.2.1) diverges for every fixed $a$ [unless $n$ is a non-negative integer, in which case the expansion becomes a finite sum and turns into equality (1.4.4)]. Practically, the corresponding truncated version
\[ P^{(N)}(n+1, a) = 1 - \frac{a^ne^{-a}}{\Gamma(n+1)}\sum_{k=0}^N\frac{(n)_k}{a^k} \tag{5.2.2} \]
gives an approximation of $P(n+1, a)$ for large $a$; the larger $a$ gets, the better the approximation, similar to (5.1.1) and (5.1.2). Figure 5.2.1 illustrates the result for $n = 66$, $a\in[30, 90]$, and $N = 25$. For better visualization, we plot $\ln(1-f)$, where $f$ is either the exact value computed using the gammainc function or the approximation $P^{(N)}(n+1, a)$. When $a$ is not large (around 30 to 50), the approximation is poor; as $a$ increases, the result gets better.

[Figure 5.2.1: Incomplete Gamma function/Gamma CDF: upper tail approximation]

5.3 An "Engineering Calculator" Algorithm

For small integer $n$, we can always use (1.4.4) (or, equivalently, (5.2.1)) to get
\[ P(n+1, a) = 1 - e^{-a}\sum_{k=0}^n\frac{a^k}{k!}. \tag{5.3.1} \]
For larger $n$, the key object becomes the standard normal CDF, which may not be available on a basic engineering calculator. Consider the function
\[ \widetilde\Phi(x) = 2^{-22^{\,1-41^{x/10}}}, \quad x\ge0. \tag{5.3.2} \]
In particular, $\widetilde\Phi(0) = 2^{-22^0} = 2^{-1} = 1/2 = \Phi(0)$ and $\lim_{x\to+\infty}\widetilde\Phi(x) = 2^0 = 1 = \lim_{x\to+\infty}\Phi(x)$. As shown in [8], the function $\widetilde\Phi$ is a good approximation of $\Phi$. Figure 5.3.1 presents the graph of the relative error $(\widetilde\Phi(x)-\Phi(x))/\Phi(x)$ for $x\in[-6, 6]$. Recall that, for $x < 0$, $\Phi(x) = 1 - \Phi(-x)$, and, for $x > 5$,
\[ \Phi(x) \approx 1 - \frac{1}{\sqrt{2\pi}\,x}\,e^{-x^2/2} \]
is a good approximation; cf. (1.3.16), (3.2.1), and Figure 3.2.1. As a result, we can now consider the function $\Phi$ accessible on any calculator that can compute arbitrary powers.

[Figure 5.3.1: Mid-Range Approximation of Standard Normal CDF: Relative Error]

To compute $P(n+1, a)$, we can combine the CLT with the Tricomi correction (4.2.10) for $a$ near $n$, and Laplace's method (4.2.4), (4.2.5) for $a$ away from $n$: with $\Lambda^*(t) = t - \ln t - 1$,
\[ P(n+1, a) \approx \begin{cases} \dfrac{1}{\sqrt{2\pi n}}\,\dfrac{a}{n-a}\,e^{-n\Lambda^*(a/n)}, & a < n,\\[1mm] \Phi(x) - \dfrac{1}{3\sqrt{2\pi n}}(2+x^2)e^{-x^2/2}, & a = n+x\sqrt n,\\[1mm] 1 - \dfrac{1}{\sqrt{2\pi n}}\,\dfrac{a}{a-n}\,e^{-n\Lambda^*(a/n)}, & a > n. \end{cases} \tag{5.3.3} \]
Figure 5.3.2 confirms that Laplace's method fails for $a$ near $n$. To see that the CLT fails away from $n$, we need to zoom in at logarithmic scale; cf. Figures 5.3.3 and 5.3.4.
Figures 5.3.3 and 5.3.4 also show that there is a transition region where neither the CLT nor Laplace's method gives a good approximation.

[Figure 5.3.2: A Global Approximation of P(25, a)]
[Figure 5.3.3: An Approximation of P(25, a): Lower Tail At Logarithmic Scale]
[Figure 5.3.4: An Approximation of P(25, a): Upper Tail At Logarithmic Scale]

At first, it might appear that the problem cannot be resolved: Laplace's method works when $|a-n|$ is of order $n$, whereas the CLT works when $|a-n|$ is of order $n^{1/2}$. Upon closer inspection, we realize that:
1. by Corollary 3.3.2, the CLT actually works when $|a-n|$ is of order just under $n^{(1/2)+(1/6)} = n^{2/3}$;
2. when $|a-n|$ is of order just under $n$, we have the moderate deviations approximation according to (1.3.8) and (1.3.9);
3. when $|a-n| = O(n^{2/3})$ and $n\to\infty$, Laplace's method is consistent with (1.3.8) and (1.3.9).
As a result, we should be able to improve the quality of the global approximation by transitioning between the Laplace and CLT approximations via the moderate deviations approximation, as well as by making all the approximations adaptive, that is, letting the corresponding intervals depend on $n$. Accordingly, we suggest the following modification of (5.3.3), in which $\Lambda^*(t) = t - \ln t - 1$ and the number $c\in(0,1)$ is a tuning parameter. Define the functions
\[ L_n(a) = \frac{1}{\sqrt{2\pi n}}\,\frac{a}{n-a}\,e^{-n\Lambda^*(a/n)}, \]
\[ M_n(a) = \Phi\Big(\frac{a-n-1}{\sqrt{n+1}}\Big)\exp\Big\{-(n+1)\Lambda^*\Big(\frac{a}{n+1}\Big) + \frac{(a-n-1)^2}{2(n+1)}\Big\}, \]
\[ N_n(a) = \Phi(x) - \frac{1}{3\sqrt{2\pi n}}(2+x^2)e^{-x^2/2}, \quad x = \frac{a-n}{\sqrt n}, \]
\[ V_n(a) = 1 - \Big(1 - \Phi\Big(\frac{a-n-1}{\sqrt{n+1}}\Big)\Big)\exp\Big\{-(n+1)\Lambda^*\Big(\frac{a}{n+1}\Big) + \frac{(a-n-1)^2}{2(n+1)}\Big\}, \]
\[ U_n(a) = 1 - \frac{1}{\sqrt{2\pi n}}\,\frac{a}{a-n}\,e^{-n\Lambda^*(a/n)}, \tag{5.3.4} \]
and then set
\[ P(n+1, a) \approx \begin{cases} L_n(a), & a\le n\big(1 - \frac{1}{(nc)^{1/3}}\big),\\ M_n(a), & n\big(1 - \frac{1}{(nc)^{1/3}}\big) < a < n\big(1 - (\frac cn)^{1/3}\big),\\ N_n(a), & |a-n|\le n\,(\frac cn)^{1/3},\\ V_n(a), & n\big(1 + (\frac cn)^{1/3}\big) < a < n\big(1 + \frac{1}{(nc)^{1/3}}\big),\\ U_n(a), & a\ge n\big(1 + \frac{1}{(nc)^{1/3}}\big). \end{cases} \tag{5.3.5} \]
The presence of $n+1$ instead of $n$ in the definitions of $M_n$ and $V_n$ reflects the fact that, for integer $n$, the function $P(n+1,\cdot)$ is the CDF of the sum of $n+1$ standard exponential random variables; the Tricomi correction makes it possible to avoid this (minor) inconvenience in the definition of $N_n$. Numerical experiments suggest that, in the range $n\in[20,500]$, the value $c = 0.2$ works well. The value of $n$ below, namely 25.6, is chosen non-integer but otherwise arbitrary, with no particular meaning behind the number. The results are shown in Figures 5.3.5-5.3.7.

[Figure 5.3.5: A Better Approximation of P(25.6, a): Lower Tail At Logarithmic Scale]
[Figure 5.3.6: A Better Approximation of P(25.6, a): Mid-range]
[Figure 5.3.7: A Better Approximation of P(25.6, a): Upper Tail At Logarithmic Scale]
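The piecewise rule (5.3.5) is straightforward to put into code. The following is a minimal sketch (not from the thesis) with $c = 0.2$, using scipy's $\Phi$ in place of the calculator surrogate (5.3.2); the region boundaries for $a > n$ are obtained by mirroring the $a < n$ boundaries around $n$:

# A sketch of the adaptive approximation (5.3.4)-(5.3.5) of P(n+1, a).
import numpy as np
from scipy import stats, special

def rate(t):                       # Lambda*(t) = t - ln t - 1
    return t - np.log(t) - 1.0

def P_approx(n, a, c=0.2):
    Phi = stats.norm.cdf
    lo = n*(1 - 1/(n*c)**(1/3))    # Laplace / moderate-deviation boundary
    hi = n*(1 - (c/n)**(1/3))      # moderate-deviation / CLT boundary
    md = np.exp(-(n+1)*rate(a/(n+1)) + (a - n - 1)**2/(2*(n+1)))
    x = (a - n)/np.sqrt(n)
    if a <= lo:                    # L_n: Laplace, lower tail
        return a/((n - a)*np.sqrt(2*np.pi*n))*np.exp(-n*rate(a/n))
    if a < hi:                     # M_n: moderate deviations, lower tail
        return Phi((a - n - 1)/np.sqrt(n + 1))*md
    if a <= 2*n - hi:              # N_n: CLT with the Tricomi correction
        return Phi(x) - (2 + x*x)*np.exp(-x*x/2)/(3*np.sqrt(2*np.pi*n))
    if a < 2*n - lo:               # V_n: moderate deviations, upper tail
        return 1 - (1 - Phi((a - n - 1)/np.sqrt(n + 1)))*md
    return 1 - a/((a - n)*np.sqrt(2*np.pi*n))*np.exp(-n*rate(a/n))  # U_n

n = 25.6
for a in [8.0, 18.0, 25.0, 34.0, 55.0]:
    print(a, P_approx(n, a), special.gammainc(n + 1, a))

The five test points land in the five regions of (5.3.5), and the approximation tracks gammainc across all of them, including the tails where a single method would fail.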
# Rate function of Bernoulli random variable
import numpy as np
import matplotlib.pyplot as plt

x1 = np.linspace(0.01, 0.99, 100)
p = 0.25
y11 = x1 * np.log(x1/p) + (1-x1) * np.log((1-x1)/(1-p))
p = 0.5
y12 = x1 * np.log(x1/p) + (1-x1) * np.log((1-x1)/(1-p))
p = 0.75
y13 = x1 * np.log(x1/p) + (1-x1) * np.log((1-x1)/(1-p))
plt.plot(x1, y11, linestyle='solid', label='p = 0.25')
plt.plot(x1, y12, linestyle='dotted', label='p = 0.50')
plt.plot(x1, y13, linestyle='dashdot', label='p = 0.75')
plt.title('Rate Function of Bernoulli Random Variables')
plt.xlabel('y')
plt.ylabel('${\u039B^*}$(y)')
plt.legend(loc='best')
plt.show()

# Rate function of Poisson random variable
x2 = np.linspace(0.01, 10, 100)   # start above 0 to avoid log(0)
p = 1
y21 = x2 * np.log(x2/p) - x2 + p
p = 5
y22 = x2 * np.log(x2/p) - x2 + p
p = 10
y23 = x2 * np.log(x2/p) - x2 + p
plt.plot(x2, y21, linestyle='solid', label='mean = 1')
plt.plot(x2, y22, linestyle='dotted', label='mean = 5')
plt.plot(x2, y23, linestyle='dashdot', label='mean = 10')
plt.title('Rate Function of Poisson Random Variables')
plt.xlabel('y')
plt.ylabel('${\u039B^*}$(y)')
plt.legend(loc='best')
plt.show()

# Rate function of Gamma random variable
x3 = np.linspace(1, 10, 100)
a, b = 1, 1
y31 = x3/b - a + a * np.log(a*b/x3)
a, b = 4, 2
y32 = x3/b - a + a * np.log(a*b/x3)
a, b = 2, 4
y33 = x3/b - a + a * np.log(a*b/x3)
plt.plot(x3, y31, linestyle='solid', label='\u03B1 = 1, \u03B2 = 1')   # fixed: plot against x3, not x2
plt.plot(x3, y32, linestyle='dotted', label='\u03B1 = 4, \u03B2 = 2')
plt.plot(x3, y33, linestyle='dashdot', label='\u03B1 = 2, \u03B2 = 4')
plt.title('Rate Function of Gamma Random Variables')
plt.xlabel('y')
plt.ylabel('${\u039B^*}$(y)')
plt.legend(loc='best')
plt.show()

# Rate function of Normal random variable
x4 = np.linspace(-10, 10, 100)
a, b = 0, 1
y41 = (x4 - a)**2 / (2*b)   # fixed precedence: rate function is (y - mu)^2 / (2 sigma^2)
a, b = 1, 1
y42 = (x4 - a)**2 / (2*b)
a, b = 3, 2
y43 = (x4 - a)**2 / (2*b)
plt.plot(x4, y41, linestyle='solid', label='\u03BC = 0, ${\u03C3^2}$ = 1')
plt.plot(x4, y42, linestyle='dotted', label='\u03BC = 1, ${\u03C3^2}$ = 1')
plt.plot(x4, y43, linestyle='dashdot', label='\u03BC = 3, ${\u03C3^2}$ = 2')
plt.title('Rate Function of Normal Random Variables')
plt.xlabel('y')
plt.ylabel('${\u039B^*}$(y)')
plt.legend(loc='best')
plt.show()

# Upper Tail Approximation of Standard Normal CDF: Relative Error
import numpy as np
import matplotlib.pyplot as plt
import scipy.stats as st

x = np.linspace(2, 6, 100)
y1 = st.norm.cdf(-x, loc=0, scale=1)
y2 = np.exp(-x**2 / 2) / (np.sqrt(2 * np.pi) * x)   # tail approximation (3.2.1)
plt.xlabel('x')
plt.ylabel('Relative error')
plt.plot(x, (y2 - y1) / y1)
plt.show()

# Normal approximation of Incomplete Gamma function, n = 60
import numpy as np
import matplotlib.pyplot as plt
import scipy.special as ss
import scipy.stats as st

x = np.linspace(-2, 2, 50)
y = st.norm.cdf(x, loc=0, scale=1)
n = 60
y1 = ss.gammainc(n, n + x * np.sqrt(n))
y2 = ss.gammainc(n+1, n + x * np.sqrt(n))
plt.plot(x, y, label='Standard Normal CDF')
plt.scatter(x, y1, s=15, marker='o', label='Incomplete Gamma with n')
plt.scatter(x, y2, s=15, marker='*', label='Incomplete Gamma with n+1')
plt.xlabel('x')
plt.ylabel('Probability')
plt.legend(loc='best')
plt.show()

# Normal approximation of Incomplete Gamma function, n = 600
import numpy as np
import matplotlib.pyplot as plt
import scipy.special as ss
import scipy.stats as st

x = np.linspace(-2, 2, 50)
y = st.norm.cdf(x, loc=0, scale=1)
n = 600
y1 = ss.gammainc(n, n + x * np.sqrt(n))
y2 = ss.gammainc(n+1, n + x * np.sqrt(n))
plt.plot(x, y, label='Standard Normal CDF')
plt.scatter(x, y1, s=15, marker='o', label='Incomplete Gamma with n')
plt.scatter(x, y2, s=15, marker='*', label='Incomplete Gamma with n+1')
plt.xlabel('x')
plt.ylabel('Probability')
plt.legend(loc='best')
plt.show()
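The two scripts above differ only in the value of n. As a quick added check (not among the original scripts), the maximum deviation between the Gamma CDF and the standard normal CDF on [-2, 2] shrinks at the rate n^{-1/2} suggested by the Berry-Esseen bound:

# Added check: the CLT error for the Gamma CDF decays like 1/sqrt(n)
import numpy as np
import scipy.special as ss
import scipy.stats as st

x = np.linspace(-2, 2, 201)
for n in (60, 600, 6000):
    err = np.max(np.abs(ss.gammainc(n, n + x * np.sqrt(n)) - st.norm.cdf(x)))
    print(n, err, err * np.sqrt(n))   # err * sqrt(n) should stay roughly constant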
# Normal Approximation of Gamma Distribution, n = 6.6, with the Tricomi correction
import numpy as np
import matplotlib.pyplot as plt
import scipy.special as ss
import scipy.stats as st
import math

x = np.linspace(-1, 1, 30)
n = 6.6
y1 = ss.gammainc(n+1, n + x * np.sqrt(n))
y2 = []
for i in x:
    y2.append(st.norm.cdf(i, loc=0, scale=1)
              - 1/(3 * math.sqrt(2*math.pi*n)) * (2 + i**2) * math.exp(-i**2/2))
plt.plot(x, y1, label='left-hand side')
plt.scatter(x, y2, marker='*', color='r', label='right-hand side')
plt.xlabel('x')
plt.ylabel('Probability')
plt.legend(loc='best')
plt.show()

# Normal Approximation of Gamma Distribution, n = 6.6, with Berry-Esseen-Petrov correction
import numpy as np
import matplotlib.pyplot as plt
import scipy.special as ss
import scipy.stats as st
import math

x = np.linspace(-1, 1, 30)
n = 6.6
y1 = ss.gammainc(n, n + x * np.sqrt(n))
y2 = []
for i in x:
    y2.append(st.norm.cdf(i, loc=0, scale=1)
              + 1/(3 * math.sqrt(2*math.pi*n)) * (1 - i**2) * math.exp(-i**2/2))
plt.plot(x, y1, label='left-hand side')
plt.scatter(x, y2, marker='*', color='r', label='right-hand side')
plt.xlabel('x')
plt.ylabel('Probability')
plt.legend(loc='best')
plt.show()

# Approximation of Standard Normal CDF: Relative Error
import numpy as np
import matplotlib.pyplot as plt
import scipy.stats as st

x = np.linspace(0, 6, 300)
y1 = st.norm.cdf(x, loc=0, scale=1)
y2 = 2**(-22**(1 - 41**(x/10)))   # the function from (5.3.2)
plt.plot(x, (y2 - y1)/y1, label='Relative Error')
plt.xlabel('x')
plt.ylabel('Relative Error')
plt.legend(loc='best')
plt.show()

# Incomplete Gamma function/Gamma CDF: upper tail approximation
import numpy as np
import matplotlib.pyplot as plt
import scipy.special as ss
import math

a = np.linspace(30, 90, 50)
n = 66
N = 25
y1 = []
for i in a:
    y1.append(math.log(1 - ss.gammainc(n+1, i)))
y2 = []
for i in a:
    # accumulate the terms (n)_k / a^k of (5.2.2) recursively
    trunc = 0
    pre = 1
    for k in range(N+1):
        trunc += pre
        pre = (n - k)/i * pre
    y2.append(math.log(i**n * math.exp(-i)/math.gamma(n+1) * trunc))
plt.plot(a, y1, label='incomplete Gamma function')
plt.scatter(a, y2, color='r', label='upper tail approximation')
plt.xlabel('a')
plt.ylabel('ln(1 - P(67, a))')
plt.legend(loc='best')
plt.show()
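Since the series in (5.2.1) diverges, the truncation level N matters. A common heuristic for asymptotic series, sketched below as an addition (the original script uses a fixed N, and the name Q_asymptotic is ours), is to keep adding terms only while they decrease in absolute value:

# Added sketch: adaptive truncation of the divergent expansion (5.2.1)
import math

def Q_asymptotic(n, a, max_terms=100):
    # Terms t_k = (n)_k / a^k satisfy t_0 = 1, t_{k+1} = t_k * (n - k) / a;
    # stop before the terms start growing.
    total, term = 0.0, 1.0
    for k in range(max_terms):
        total += term
        nxt = term * (n - k) / a
        if abs(nxt) >= abs(term):
            break
        term = nxt
    return a**n * math.exp(-a) / math.gamma(n + 1) * total

# P(n+1, a) is then approximately 1 - Q_asymptotic(n, a) for large a.

For a well inside the upper tail the loop stops long before the terms grow; for a below n it stops almost immediately, consistent with the poor behavior near a = 30 in Figure 5.2.1.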
# Incomplete Gamma function/Gamma CDF: integration-by-parts approximation
import numpy as np
import matplotlib.pyplot as plt
import scipy.special as ss
import math

a = np.linspace(0, 15, 50)
n = 8
N = 50
y1 = ss.gammainc(n+1, a)
y2 = []
for j in a:
    # partial sum of (5.1.3): e^{-a} * sum of a^k / Gamma(k+1), k = n+1, ..., N+n+1
    trunc = 0
    for i in range(int(n+1), int(N+n+2)):
        trunc += j**i / math.gamma(i+1)
    y2.append(np.exp(-j) * trunc)
plt.plot(a, y1, label='incomplete Gamma function')
plt.scatter(a, y2, s=15, color='r', label='integration-by-parts approximation')
plt.xlabel('a')
plt.ylabel('Probability')
plt.legend(loc='best')
plt.show()

# Incomplete Gamma function/Gamma CDF: alternating series approximation
import numpy as np
import matplotlib.pyplot as plt
import scipy.special as ss
import math

a = np.linspace(0, 15, 50)
n = 8
N = 50
y1 = ss.gammainc(n+1, a)
y2 = []
for i in a:
    trunc = 0
    pre = 1/(n+1)            # c_0
    for j in range(N):       # use N terms, as set above
        trunc += pre
        pre = -(i/(j+1)) * (n+j+1)/(n+j+2) * pre   # recursion for c_{k+1}
    y2.append(i**(n+1)/math.gamma(n+1) * trunc)
plt.plot(a, y1, label='incomplete Gamma function')
plt.scatter(a, y2, s=15, color='r', label='alternating series approximation')
plt.xlabel('a')
plt.ylabel('Probability')
plt.legend(loc='best')
plt.show()

# A Global Approximation of P(25, a): CLT versus Laplace's method
import numpy as np
import matplotlib.pyplot as plt
import scipy.special as ss
import scipy.stats as st
import math

a = np.linspace(4, 75, 50)
n = 24
y = ss.gammainc(n+1, a)
y1 = []
y2 = []
for i in a:
    x = (i - n)/math.sqrt(n)   # fixed: the Tricomi correction is evaluated at x, not a
    y1.append(st.norm.cdf(x, loc=0, scale=1)
              - 1/(3 * math.sqrt(2*math.pi*n)) * (2 + x**2) * math.exp(-x**2/2))
for j in a:
    lamb = j/n - math.log(j/n) - 1
    if j < n:
        y2.append(1/math.sqrt(2*math.pi*n) * n/(n-j) * math.exp(-n*lamb))
    else:
        y2.append(1 - 1/math.sqrt(2*math.pi*n) * n/(j-n) * math.exp(-n*lamb))
plt.plot(a, y, label='incomplete gamma')
plt.scatter(a, y1, s=8, color='y', label='CLT')
plt.scatter(a, y2, s=8, marker='*', color='r', label='Laplace method')
plt.xlabel('a')
plt.ylabel('Probability')
plt.legend(loc='best')
plt.show()

# A Global Approximation of P(25, a): CLT versus Laplace's method (logarithmic), lower tail
a = np.linspace(4, 25, 50)
y = []
for k in a:
    y.append(math.log(ss.gammainc(n+1, k)))
y3 = []
y4 = []
for i in a:
    x = (i - n)/math.sqrt(n)
    w = st.norm.cdf(x, loc=0, scale=1) - 1/(3 * math.sqrt(2*math.pi*n)) * (2 + x**2) * math.exp(-x**2/2)
    if w > 0:
        y3.append(math.log(w))
    else:
        y3.append(math.log(ss.gammainc(n+1, i)))   # fall back to the exact value
for j in a:
    lamb = j/n - math.log(j/n) - 1
    v = 1/math.sqrt(2*math.pi*n) * n/(n-j) * math.exp(-n*lamb)
    if j < n and v > 0:
        y4.append(math.log(v))
    else:
        y4.append(math.log(ss.gammainc(n+1, j)))   # fall back to the exact value
plt.plot(a, y, label='incomplete gamma')
plt.scatter(a, y3, s=5, color='y', label='CLT')
plt.scatter(a, y4, s=5, marker='*', color='r', label='Laplace method')
plt.xlabel('a')
plt.ylabel('ln P(25, a)')
plt.legend(loc='best')
plt.show()

# A Global Approximation of P(25, a): CLT versus Laplace's method (logarithmic), upper tail
a = np.linspace(25, 75, 50)
y = []
for k in a:
    y.append(math.log(1 - ss.gammainc(n+1, k)))
y3 = []
y4 = []
for i in a:
    x = (i - n)/math.sqrt(n)
    # fixed: 1 - [Phi(x) - correction], with the correction evaluated at x
    w = 1 - (st.norm.cdf(x, loc=0, scale=1)
             - 1/(3 * math.sqrt(2*math.pi*n)) * (2 + x**2) * math.exp(-x**2/2))
    if w > 0:
        y3.append(math.log(w))
    else:
        y3.append(math.log(1 - ss.gammainc(n+1, i)))   # fall back to the exact value
for j in a:
    lamb = j/n - math.log(j/n) - 1
    # fixed: in the upper tail, 1 - P is approximated by n/(j-n) * exp(-n*lamb)/sqrt(2*pi*n)
    v = 1/math.sqrt(2*math.pi*n) * n/(j-n) * math.exp(-n*lamb) if j > n else 0
    if v > 0:
        y4.append(math.log(v))
    else:
        y4.append(math.log(1 - ss.gammainc(n+1, j)))   # fall back to the exact value
plt.plot(a, y, label='incomplete gamma')
plt.scatter(a, y3, s=5, color='y', label='CLT')
plt.scatter(a, y4, s=5, marker='*', color='r', label='Laplace method')
plt.xlabel('a')
plt.ylabel('ln(1 - P(25, a))')
plt.legend(loc='best')
plt.show()
# A Better Approximation of P(25.6, a): Lower Tail At Logarithmic Scale
import numpy as np
import matplotlib.pyplot as plt
import scipy.special as ss
import scipy.stats as st
import math

a1 = np.linspace(0.1, 18, 25)
n = 24.6
c = 0.2
y1 = []
for r in a1:
    y1.append(math.log(ss.gammainc(n+1, r)))
r1 = n * (1 - 1/(n*c)**(1/3))
r2 = n * (1 - (c/n)**(1/3))
a11 = []
a12 = []
y2 = []
y3 = []
for i in a1:
    if i <= r1:          # L_n
        a11.append(i)
        y2.append(math.log(1/math.sqrt(2*math.pi*n) * (n/(n-i))
                  * math.exp(-n * (i/n - math.log(i/n) - 1))))
    elif r1 < i < r2:    # M_n
        a12.append(i)
        y3.append(math.log(st.norm.cdf((i-n-1)/math.sqrt(n+1))
                  * math.exp(-(n+1)*(i/(n+1) - math.log(i/(n+1)) - 1)
                             + (i-n-1)**2/(2*(n+1)))))
plt.plot(a1, y1, label='incomplete Gamma')
plt.scatter(a11, y2, s=15, color='g', label='$L_n$')
plt.scatter(a12, y3, s=15, color='r', marker='*', label='$M_n$')
plt.xlabel('a')
plt.ylabel('ln P(25.6, a)')
plt.legend(loc='best')
plt.show()

# A Better Approximation of P(25.6, a): Mid-range
import numpy as np
import matplotlib.pyplot as plt
import scipy.special as ss
import scipy.stats as st
import math

a1 = np.linspace(0.1, 60, 50)
n = 24.6
c = 0.2
y1 = []
for r in a1:
    y1.append(ss.gammainc(n+1, r))
r1 = n * (1 - 1/(n*c)**(1/3))
r2 = n * (1 - (c/n)**(1/3))
r3 = n * (1 + (c/n)**(1/3))
r4 = n * (1 + 1/(n*c)**(1/3))
a11 = []
a12 = []
a13 = []
y2 = []
y3 = []
y4 = []
for i in a1:
    if i <= r1:          # L_n
        a11.append(i)
        y2.append(1/math.sqrt(2*math.pi*n) * (n/(n-i))
                  * math.exp(-n * (i/n - math.log(i/n) - 1)))
    elif i < r2:         # M_n
        a12.append(i)
        y3.append(st.norm.cdf((i-n-1)/math.sqrt(n+1), loc=0, scale=1)
                  * math.exp(-(n+1)*(i/(n+1) - math.log(i/(n+1)) - 1) + (i-n-1)**2/(2*(n+1))))
    elif i <= r3:        # N_n (fixed: the condition in (5.3.5) is r2 <= a <= r3, i.e. |a - n| <= n(c/n)^(1/3))
        a13.append(i)
        x = (i-n)/math.sqrt(n)
        y4.append(st.norm.cdf(x) - 1/(3*math.sqrt(2*math.pi*n)) * (2 + x**2) * math.exp(-x**2/2))
    elif i < r4:         # V_n
        a12.append(i)
        y3.append(1 - (1 - st.norm.cdf((i-n-1)/math.sqrt(n+1), loc=0, scale=1))
                  * math.exp(-(n+1)*(i/(n+1) - math.log(i/(n+1)) - 1) + (i-n-1)**2/(2*(n+1))))
    else:                # U_n
        a11.append(i)
        y2.append(1 - 1/math.sqrt(2*math.pi*n) * (n/(i-n))
                  * math.exp(-n * (i/n - math.log(i/n) - 1)))
plt.plot(a1, y1, label='incomplete Gamma')
plt.scatter(a11, y2, s=20, color='g', label='$L_n$ and $U_n$')
plt.scatter(a12, y3, s=20, color='r', marker='*', label='$M_n$ and $V_n$')
plt.scatter(a13, y4, s=20, color='c', marker='+', label='$N_n$')
plt.xlabel('a')
plt.ylabel('P(25.6, a)')
plt.legend(loc='best')
plt.show()

# A Better Approximation of P(25.6, a): Upper Tail At Logarithmic Scale
import numpy as np
import matplotlib.pyplot as plt
import scipy.special as ss
import scipy.stats as st
import math

a1 = np.linspace(30, 80, 50)
n = 24.6
c = 0.2
y1 = []
for r in a1:
    y1.append(math.log(ss.gammainc(n+1, r)))
r3 = n * (1 + (c/n)**(1/3))
r4 = n * (1 + 1/(n*c)**(1/3))
a11 = []
a12 = []
y2 = []
y3 = []
for i in a1:
    if r3 < i < r4:      # V_n
        a12.append(i)
        y3.append(math.log(1 - (1 - st.norm.cdf((i-n-1)/math.sqrt(n+1), loc=0, scale=1))
                  * math.exp(-(n+1)*(i/(n+1) - math.log(i/(n+1)) - 1) + (i-n-1)**2/(2*(n+1)))))
    elif i >= r4:        # U_n
        a11.append(i)
        y2.append(math.log(1 - 1/math.sqrt(2*math.pi*n) * (n/(i-n))
                  * math.exp(-n * (i/n - math.log(i/n) - 1))))
plt.plot(a1, y1, label='incomplete Gamma')
plt.scatter(a11, y2, s=20, color='g', label='$U_n$')
plt.scatter(a12, y3, s=20, color='r', marker='*', label='$V_n$')
plt.xlabel('a')
plt.ylabel('ln P(25.6, a)')
plt.legend(loc='best')
plt.show()

Bibliography

[1] A. Dembo and O. Zeitouni, Large deviations techniques and applications, second ed., vol. 38, Springer-Verlag, New York, 1998.

[2] T. M. Dunster, Asymptotic solutions of second-order linear differential equations having almost coalescent turning points, with an application to the incomplete gamma function, Proc. Roy. Soc. London Ser. A 452 (1996), no. 1949, 1331-1349.

[3] P. Greengard and V. Rokhlin, An algorithm for the evaluation of the incomplete gamma function, Adv. Comput. Math. 45 (2019), no. 1, 23-49.

[4] G. Nemes and A. B. Olde Daalhuis, Asymptotic expansions for the incomplete gamma function in the transition regions, Math. Comp. 88 (2019), no. 318, 1805-1827.

[5] R. B. Paris, A uniform asymptotic expansion for the incomplete gamma function, J. Comput. Appl. Math. 148 (2002), no. 2, 323-339.
[6] V. V. Petrov, Sums of independent random variables, Springer-Verlag, New York, 1975.

[7] A. N. Shiryaev, Probability, Graduate Texts in Mathematics, second ed., vol. 95, Springer-Verlag, New York, 1996.

[8] A. Soranzo and E. Epure, Very simply explicitly invertible approximations of normal cumulative and normal quantile function, Applied Mathematical Sciences 8 (2014), no. 87, 4323-4341.

[9] N. M. Temme, Special functions: An introduction to the classical functions of mathematical physics, A Wiley-Interscience Publication, John Wiley & Sons, Inc., New York, 1996.

[10] F. G. Tricomi, Asymptotische Eigenschaften der unvollständigen Gammafunktion, Math. Z. 53 (1950), 136-148.