Large deviations rates in a Gaussian setting and related topics

by

Diogo Bessam

A Dissertation Presented to the
FACULTY OF THE USC GRADUATE SCHOOL
UNIVERSITY OF SOUTHERN CALIFORNIA
In Partial Fulfillment of the
Requirements for the Degree
DOCTOR OF PHILOSOPHY
(APPLIED MATHEMATICS)

April 2014

Copyright 2014 Diogo Bessam

Acknowledgements

Firstly, I thank my Ph.D. advisor, Prof. Sergey Lototsky, for outstanding support and guidance throughout my graduate studies at USC, and for openly sharing plenty of mathematical questions and thoughts. Secondly, I thank Prof. Remigijus Mikulevicius for invaluable lectures and for decisive insights on several topics pertinent to this dissertation and beyond. Furthermore, I express appreciation to all members of my dissertation committee (the above mentioned and also Prof. Firdaus Udwadia) for their time and feedback. Lastly, I acknowledge the FCT-Portugal fellowship SFRH/BD/43286/2008 for partial financial support.

List of Tables

2.1 Examples of rate functions.
2.2 Pointwise and functional limit theorems.

List of Figures

2.1 Rate functions for Be(0.2), Exponential(0.1), Poisson(5), Laplace(0, 0.3).
2.2 Sample paths for $X_n(t)\sqrt{n}$ (top) and $\widetilde{X}_n\sqrt{n}$.

Table of Contents

Acknowledgements
List of Tables
List of Figures
Abstract
Chapter 1: INTRODUCTION
Chapter 2: PRIMER ON LARGE DEVIATIONS THEORY
  2.1 Introduction
  2.2 Large deviations principle
    2.2.1 Heuristics
    2.2.2 Rate functions on Polish spaces
  2.3 Laplace's principle
  2.4 Weak large deviations. Exponential tightness.
  2.5 Contraction principle
  2.6 A selection of seminal examples
    2.6.1 Cramér's theorem
    2.6.2 Sample path LD I: Mogulskii's and Schilder's theorems
    2.6.3 Sample path LD II: random perturbations of dynamical systems
Chapter 3: THE GAUSSIAN SETTING
  3.1 A finite dimensional tour
    3.1.1 The non-degenerate case
    3.1.2 The degenerate case
  3.2 Square integrable processes
    3.2.1 Operators on Hilbert spaces: compact, positive, trace class
    3.2.2 Reproducing kernel Hilbert spaces
    3.2.3 LDP for square integrable Gaussian processes
  3.3 Gaussian measures on separable Banach spaces
    3.3.1 Cameron-Martin space: Banach case
    3.3.2 Cameron-Martin space: Hilbert case
    3.3.3 LDP for Gaussian measures
  3.4 Proof of Claim 1.0.1
    3.4.1 First statement
    3.4.2 Second statement
  3.5 RKHS $\mathcal{H}_q$ vs CM $\mathcal{H}_\gamma$
  3.6 Further examples and calculations
    3.6.1 Diagonalization of the covariance operator of the Ornstein-Uhlenbeck process
    3.6.2 Multiple integrated Wiener process
Chapter 4: DISCUSSION
  4.1 Summary
  4.2 Further related questions
Reference List

Abstract

We study large deviations (LD) rates in a Gaussian setting and their representation in terms of more fundamental objects: the covariance operator, the Cameron-Martin space, and the reproducing kernel Hilbert space; and we carry out a direct proof to obtain this rate for a vanishing Gaussian random vector. We provide a fairly self-contained discussion of the relation between three well-known examples: the LD principle for the vanishing Wiener process (Schilder's theorem), the LD principle for the vanishing square integrable Gaussian process, and the LD principle for a degenerating Gaussian measure. Motivated by the case of the Wiener process, we aim at a series representation of the Cameron-Martin space associated to the Ornstein-Uhlenbeck process on a finite interval, and we are led to a second order differential equation that characterizes the spectrum of the associated covariance operator. Several examples illustrating the use of the contraction principle are carried out, namely for a $d$-dimensional Ornstein-Uhlenbeck process and for an oscillator with Gaussian random forcing.

Chapter 1

INTRODUCTION

The topic of large deviations (LD) concerns descriptions such as
$$P_n(B) \approx \exp\Big(-n \inf_{x\in B} I(x) + o(n)\Big), \quad \text{as } n\to\infty,$$
where $(P_n,\ n\geq 0)$ is a family of probability measures, $B$ is a Borel set, and $I\geq 0$. Equivalently,
$$\lim_{n\to\infty} \frac{1}{n}\log P_n(B) = -\inf_{x\in B} I(x);$$
$B$ is said to be a rare event, and $I$, called the rate function, expresses an exponential decay rate as $n\to\infty$ (see, e.g., [DZ98, FW98, Var84]). More technically, we need not assume the limit exists, but rather that the right-hand side of the display is an upper bound for $\limsup_{n\to\infty}\frac{1}{n}\log P_n(C)$, $C$ closed, and a lower bound for $\liminf_{n\to\infty}\frac{1}{n}\log P_n(U)$, $U$ open. In generic terms, to establish what we then call a large deviations principle (LDP), one shows each bound separately.
Here we are mainly interested in the Gaussian setting: this means we are dealing with Gaussian probability measures $P_n$. Essentially, we can think of these Gaussian measures as distribution laws $P_n(\Gamma) = P(X/n \in \Gamma)$, where $X$ is some Gaussian random element or stochastic process. In fact, for the class of examples we have in mind, Gaussian measures induce and are induced by Gaussian processes (this is due to the function space where the stochastic process has its paths, normally the space of square integrable functions $L^2(0,T)$, for finite $T$). This correspondence has been proved in [RC72] for a number of function spaces usually used in the analysis of stochastic processes, including $L^2(0,T)$, for finite $T$.

One of the main objectives here is to clarify the relation between the next three examples of rates appearing in the literature (below, $\varepsilon$ plays the role of $1/n$).

Example 1.0.1. Given a standard Wiener process $(W(t),\ t\in[0,1])$, the distribution laws $P_\varepsilon = \mathcal{L}(\varepsilon W)$ satisfy a large deviations principle (LDP) in $L^2(0,1)$, as $\varepsilon\to 0$, with rate function
$$I(\varphi) = \begin{cases} \dfrac{1}{2}\displaystyle\int_0^1 |\dot\varphi(t)|^2\,dt, & \varphi \in H_0^1([0,1]) \\ \infty, & \text{otherwise,} \end{cases}$$
where $H_0^1([0,1]) = \{\varphi;\ \varphi(t)=\int_0^t g(s)\,ds,\ g\in L^2(0,1)\}$. See [FW98, Section 1.2].

Example 1.0.2. More generally, consider a real valued, square integrable, zero-mean Gaussian process $X=(X(t),\ t\in S)$, where $S$ is some compact interval of the reals. The laws $\mathcal{L}(\varepsilon X)$, $\varepsilon>0$, satisfy a LDP in $(L^2(S), |\cdot|_2)$ as $\varepsilon\to 0$ with rate function
$$I(\varphi) = \begin{cases} \dfrac{1}{2}\,|Q^{-1/2}\varphi|_2^2, & \varphi\in Q^{1/2}L^2(S) \\ \infty, & \text{otherwise.} \end{cases}$$
Here $Q$ is the integral operator $f(\cdot) \mapsto \int_S f(s)\, q(s,\cdot)\,ds$ and $q(s,t) = \mathbb{E}X(s)X(t)$. In the case when $Q$ is not injective, $Q^{-1/2}$ is a generalized inverse. See [FW98, Section 4.4].

Example 1.0.3. Even more generally, given a Gaussian measure $\gamma$ on a separable Banach space $E$, the family $\gamma(\cdot/\varepsilon)$ satisfies a LDP on $E$ as $\varepsilon\to$
$0$ with rate function
$$I(x) = \begin{cases} \dfrac{1}{2}\,|x|_\gamma^2, & x\in\mathcal{H}_\gamma \\ \infty, & \text{otherwise,} \end{cases}$$
where $\mathcal{H}_\gamma$ is a certain Hilbert space continuously embedded in $E$. See [DPZ92, Theorem 12.7], [KO78], [DS89, Chapter III, Section 3.4].

In Section 3.1, we provide a proof of the LDP in Example 1.0.3 in the case when $\gamma$ is the distribution law of a standard Gaussian vector $X \in E = \mathbb{R}^d$; this setting carries the common ideas of the above examples without the technicalities of any. We then naturally arrive at

Claim 1.0.1. The three results have a correspondence in the following sense:

1. Let $X=(X(t),\ t\in[0,1])$ be a zero-mean Gaussian stochastic process such that $\mathbb{E}\int_0^1 X^2(t)\,dt < \infty$ and let $\gamma = \mathcal{L}(X)$ on $E = L^2(0,1)$. Then the rate function for the laws $\gamma(\cdot/\varepsilon) = \mathcal{L}(\varepsilon X)$ obtained in Example 1.0.2 is the rate function obtained from Example 1.0.3.

2. In particular, if $\gamma$ is the Wiener measure on $L^2(0,1)$, then the rate function for the laws $\gamma(\cdot/\varepsilon) = \mathcal{L}(\varepsilon W)$ obtained in Example 1.0.2 is the rate function obtained in Example 1.0.1.

The verification of this claim is carried out in Section 3.4. The second statement can be verified by looking at the series representations of both spaces (these calculations are known; see, e.g., [Bog98, Example 2.3.15]). The first statement is a consequence of the following: let $\gamma$ be a Gaussian measure on a separable Hilbert space $H$; then

- we can associate a positive, self-adjoint, trace class operator $Q$, called the covariance operator;
- we can associate a Hilbert space $\mathcal{H}_\gamma$, continuously embedded in $H$, that characterizes the directions $h\in H$ in which $\gamma$ and $\gamma(\cdot - h)$ are equivalent measures; furthermore,
- $\mathcal{H}_\gamma = \overline{Q^{1/2}H}^{\,((\cdot,\cdot))_\gamma}$;
- $|u|_\gamma = |Q^{-1/2}u|_H$, for $u\in\mathcal{H}_\gamma$.

Here $\mathcal{H}_\gamma$ is called the Cameron-Martin space associated with the Gaussian measure $\gamma$, and $\overline{Q^{1/2}H}^{\,((\cdot,\cdot))_\gamma}$ is the completion of the image $QH$ under the inner product
$$(Qu, Qv)_\gamma = (Qu, v)_H.$$
In case $Q$ is not injective, we are considering a generalized inverse (in the sense of [FW98, p.93]).
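In finite dimensions these identities can be checked by hand. The following is an illustrative sketch (with made-up numbers, not taken from the text): for a nondegenerate $2\times 2$ covariance matrix $Q$, the Cameron-Martin norm $|u|_\gamma = |Q^{-1/2}u|$ satisfies $|u|_\gamma^2 = (u, Q^{-1}u)$, so the rate in Example 1.0.3 is the familiar Gaussian quadratic form $\frac12 (u, Q^{-1}u)$.

```python
import math

# 2x2 symmetric positive definite covariance (illustrative numbers only)
Q = [[2.0, 1.0], [1.0, 2.0]]
u = [1.0, 2.0]

# Eigendecomposition of the symmetric matrix [[a, b], [b, c]] (b != 0 here)
a, b, c = Q[0][0], Q[0][1], Q[1][1]
disc = math.sqrt((a - c) ** 2 + 4 * b * b)
lams = [(a + c + disc) / 2, (a + c - disc) / 2]
vecs = []
for lam in lams:
    vx, vy = b, lam - a                 # unnormalized eigenvector for lam
    norm = math.hypot(vx, vy)
    vecs.append((vx / norm, vy / norm))

# |u|_gamma^2 via Q^{-1/2}: expand u in the eigenbasis, weight by 1/lambda
cm_norm_sq = sum((u[0] * v[0] + u[1] * v[1]) ** 2 / lam
                 for lam, v in zip(lams, vecs))

# The same quantity via the quadratic form (u, Q^{-1} u)
det = a * c - b * b
Qinv_u = [(c * u[0] - b * u[1]) / det, (-b * u[0] + a * u[1]) / det]
quad = u[0] * Qinv_u[0] + u[1] * Qinv_u[1]

print(cm_norm_sq, quad)   # both equal 2.0, so the rate is 1/2 * 2.0 = 1.0
```

For these particular numbers both computations give $|u|_\gamma^2 = 2$, hence $I(u) = 1$.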
When $Q$ is injective, it turns out that $Q^{1/2}H$ is dense in $H$, and this fact can be used to show $\mathcal{H}_\gamma = Q^{1/2}H$. The second assertion of Claim 1.0.1 then follows by verifying that the integral operator $Q_X$ of $X=(X(t):\ t\in[0,1])$ as a stochastic process coincides with the covariance $Q_\gamma$ of $X$ as an $L^2(0,1)$-valued random element, according to the scheme
$$\begin{array}{ccc}
q(s,t) & \qquad & C_\gamma(f,g) \\
\big\downarrow {\scriptstyle \text{Integration}} & & \big\downarrow {\scriptstyle \text{Riesz}} \\
Q_X & & Q_\gamma
\end{array}$$
where $q(s,t)=\mathbb{E}X(s)X(t)$ is the covariance function associated to $(X(t):\ t\in[0,1])$ and $C_\gamma(f,g) = \int_{L^2(0,1)} (f,x)_2\,(g,x)_2\, \gamma(dx)$ is the covariance of $\gamma = \mathcal{L}(X)$. When we are dealing with the Gaussian law of a certain stochastic process over a set $S$, we may use the notion of reproducing kernel Hilbert space (RKHS) to describe the Cameron-Martin space. Indeed, associated to the covariance function $q(s,t)$ there exists a unique RKHS $\mathcal{H}_q$ of functions on $S$. The space $\mathcal{H}_q$ is obtained canonically through the Moore-Aronszajn construction (see [Aro50]): it is the completion of the finite linear combinations of $\{q(t,\cdot),\ t\in S\}$ under the inner product $(\cdot,\cdot)_q$ determined by requiring the reproducing property $(q(t,\cdot), q(s,\cdot))_q = q(t,s)$. It turns out that the RKHS is very often isometrically isomorphic to the Cameron-Martin space, so that in the literature various authors use the terms fairly interchangeably. Furthermore, the very term "Cameron-Martin space" often refers to the specific space associated to the Wiener measure, first considered in the article by Cameron and Martin [CM44]. To avoid confusion, we adopt the terminology that an RKHS comes associated to a positive definite kernel and the Cameron-Martin space comes associated to a Gaussian law. As just mentioned, these notions are equivalent under mild assumptions. Sufficient conditions for equivalence have been established in [vdVvZ08, Theorem 2.1]: if the process has paths in a separable Banach subspace of the bounded functions equipped with the uniform norm, then $\mathcal{H}_\gamma = \mathcal{H}_q$ (see Section 3.5).
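Since the Cameron-Martin norm is expressed through the eigenpairs of the covariance operator, one may ask how computable those are in practice. As an illustrative numerical aside (not part of the text): for the Wiener kernel $q(s,t)=\min(s,t)$ on $[0,1]$ the eigenvalues are classically known to be $\lambda_k = ((k-\tfrac12)\pi)^{-2}$, and a simple midpoint (Nyström) discretization with power iteration recovers the largest one, $4/\pi^2 \approx 0.4053$.

```python
import math

# Nystrom discretization of the Wiener covariance operator
# (Qf)(t) = \int_0^1 min(s, t) f(s) ds on a midpoint grid of size N.
N = 200
t = [(i + 0.5) / N for i in range(N)]
A = [[min(t[i], t[j]) / N for j in range(N)] for i in range(N)]

v = [1.0] * N
lam = 0.0
for _ in range(60):   # power iteration; the spectral gap (factor 9) makes it fast
    w = [sum(A[i][j] * v[j] for j in range(N)) for i in range(N)]
    lam = max(abs(x) for x in w)   # sup-norm estimate of the top eigenvalue
    v = [x / lam for x in w]

print(lam, 4 / math.pi ** 2)   # both approximately 0.4053
```

The same discretization applied to another covariance kernel (e.g., Ornstein-Uhlenbeck) gives numerical approximations of the spectrum even when, as discussed next, no closed form is available.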
Now, suppose that instead of the Wiener process we have another Gaussian process, say the Ornstein-Uhlenbeck process, or the multiple integrated Wiener process: how explicit would the LD rates look in these cases? Let us deal with a Gaussian process satisfying the conditions of the second example above. In particular, the associated covariance operator $Q$ is a positive, self-adjoint, trace class operator and the covariance function $q(s,t)$ is a positive definite kernel (meaning that the matrices $(q(s_i,s_j))_{i,j}$ are symmetric and positive semidefinite). If the covariance operator $Q$ is injective, the following series representation is available:
$$\mathcal{H}_\gamma = \Big\{ h = \sum_k h_k e_k \in H;\ \sum_k \frac{|h_k|^2}{\lambda_k} < \infty \Big\}, \qquad |h|_\gamma^2 = \sum_k \frac{|h_k|^2}{\lambda_k},\quad h\in\mathcal{H}_\gamma,$$
where $(e_k)$ is a complete orthonormal system of eigenvectors of $Q$, with corresponding eigenvalues $\lambda_k$. In turn, the diagonalization of $Q$ amounts to solving equations of the type
$$Qf = \lambda f \iff (Q-\lambda I)f = 0,\ f\neq 0,$$
with $Q$ positive, self-adjoint and trace class, or equivalently
$$\int_0^T q(s,t)\, f(s)\,ds = \lambda f(t),\ f\neq 0,$$
with $q(s,t)$ a positive definite kernel such that $\int_S q(s,s)\,ds < \infty$. In the case of the Wiener process, elementary methods such as differentiation and judicious algebraic manipulations are sufficient to solve these equations. For the Ornstein-Uhlenbeck process, this method leads to a second order differential equation characterizing the spectrum of $Q$; however, the spectrum is not explicitly determined (see Section 3.6).

Chapter 2

PRIMER ON LARGE DEVIATIONS THEORY

This chapter constitutes an overview of the main notions in the subject of large deviations, with some chosen seminal examples.

2.1 Introduction.

We introduce the subject by focusing on a simple example: consider an IID sequence of random variables $X_i \in \mathbb{R}$, with zero mean and unit variance. As a consequence of the law of large numbers, for the sample mean
$$\bar X_n := \frac{X_1 + \dots + X_n}{n}, \qquad P(|\bar X_n| \in (y,\infty)) \to$$
$0$ as $n\to\infty$, for all $y\geq 0$. This result tells us that, asymptotically, as $n\to\infty$, there is some concentration of measure around the zero mean. We would like to quantify it. What type of convergence to zero is it (polynomial, exponential)? Would the convergence accelerate had we considered an interval further away from the zero mean, say $(x,\infty)$, $x>y$, instead of $(y,\infty)$? If so, how exactly?

Claim 2.1.1. For any $y\geq 0$, there exists $I(y)$ such that
$$P(|\bar X_n| > y) \asymp e^{-n I(y)}. \tag{2.1}$$
Above, $x_n \asymp y_n$ denotes $\lim_{n\to\infty}\frac{1}{n}\log x_n = \lim_{n\to\infty}\frac{1}{n}\log y_n$. Roughly, the tail probability decays exponentially at the rate $I(y)$.

We now verify Claim 2.1.1 for the particular case when the $X_i$ are standard Gaussian. Here we make use of Laplace's method, which, in particular, implies the approximation
$$\lim_{\varepsilon\to 0} \varepsilon \log \int_a^b e^{-\frac{f(x)}{\varepsilon}}\, g(x)\,dx = -\min_{x\in[a,b]} f(x),$$
provided $f$ is a continuous function with a unique minimum on $-\infty < a < b < \infty$, and $g$ is continuous and positive (see [FW98, p.71]). Applying this result with $\varepsilon = 1/n$, $f(x)=x^2/2$, $g(x)=1$, on each $[a,b]=[y,b]$, and letting $b\to\infty$, we obtain
$$P(|\bar X_n| > y) = 2\,P(\bar X_n \sqrt n > \sqrt n\, y) = \frac{2\sqrt n}{\sqrt{2\pi}} \int_y^{+\infty} e^{-\frac{n x^2}{2}}\,dx$$
and
$$\lim_{n\to\infty} \frac{1}{n}\log P(|\bar X_n| > y) = -\frac{y^2}{2}.$$
Hence $I(y) = y^2/2$.

Summary: letting $P_n((y,\infty)) = P(|\bar X_n| \in (y,\infty))$, we can reformulate the result of Claim 2.1.1 as: there is $I$ such that, for all sets $(y,\infty)$, $y>0$,
$$\lim_{n\to\infty}\frac{1}{n}\log P_n((y,\infty)) = -\inf_{x\in(y,\infty)} I(x).$$
The subject of Large Deviations deals with situations involving these elements: an exponential decay (associated to some type of concentration of measure), at a rate expressed as an optimization problem. S.R.S. Varadhan is credited with introducing, in [Var66], a general principle that formalizes and unifies these concepts. Some well established references for a survey of the theory and its far reaching generalizations are [DZ98, DS89, Var84]. Also a classic reference is the monograph by M. I. Freidlin and A. D. Wentzell [FW98], one of the first monographs on large deviations for stochastic processes.
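As a quick numerical sanity check of the rate $I(y)=y^2/2$ (an illustrative aside, not part of the text), the Gaussian tail can be evaluated exactly through the complementary error function, since $P(|\bar X_n|>y) = \operatorname{erfc}(y\sqrt{n}/\sqrt{2})$ for standard Gaussian summands:

```python
import math

# For standard Gaussian X_i, |X_bar_n| > y has probability 2*P(Z > y*sqrt(n))
# with Z ~ N(0,1), i.e. erfc(y*sqrt(n)/sqrt(2)); the claim is that
# (1/n) * log of this probability tends to -y**2/2.
def log_tail_rate(y, n):
    p = math.erfc(y * math.sqrt(n) / math.sqrt(2.0))
    return math.log(p) / n

y = 1.0
for n in (10, 100, 1000):
    print(n, log_tail_rate(y, n))   # approaches -y**2/2 = -0.5
```

The convergence is visible already at moderate $n$; the residual gap is the $O(\log n / n)$ prefactor that the exponential scaling ignores.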
2.2 Large deviations principle.

Let $E$ be a Polish space, i.e., a complete, separable metric space, and let $P_\varepsilon$, $\varepsilon\geq 0$, be a family of probability measures on the Borel subsets $\mathcal{B}(E)$ of $E$. Also consider a lower semicontinuous (LSC) function $I: E\to[0,+\infty]$.

Definition 2.2.1. We say $P_\varepsilon$, $\varepsilon\geq 0$, satisfies a large deviations principle on $E$ with rate function $I$ (and scaling, or speed, $r(\varepsilon)\to\infty$) if

(i) (upper bound) for each closed set $C\subset E$,
$$\limsup_{\varepsilon\to 0} \frac{1}{r(\varepsilon)}\log P_\varepsilon(C) \leq -\inf_{x\in C} I(x);$$

(ii) (lower bound) for each open set $U\subset E$,
$$\liminf_{\varepsilon\to 0} \frac{1}{r(\varepsilon)}\log P_\varepsilon(U) \geq -\inf_{x\in U} I(x).$$

We say $I$ is a good rate function on $E$ if

(1) for each $c<\infty$ the set $\{x;\ I(x)\leq c\}$ is a compact set in $E$.

Remark 2.2.1 (LDP for random elements and stochastic processes). Given a family of random elements $X_\varepsilon:\Omega\to E$, $\varepsilon>0$, on a probability space $(\Omega,\mathcal{F},P)$, consider the laws $\mathcal{L}(X_\varepsilon)$ of $X_\varepsilon$ induced on $(E,\mathcal{B}(E))$. The LDP for $X_\varepsilon$ means the LDP for the family of probability measures $\mathcal{L}(X_\varepsilon)$.

2.2.1 Heuristics.

Estimates (i) and (ii) express the exponential decay of the probability of such events: $I(x)$ should be seen as expressing a local (in a small ball around $x$) decay rate, while $\inf_A I$ is the global decay rate for $A$. Let us make these ideas a little more definite: suppose the set $S$ is regular with respect to $I$, i.e., such that $\inf_{S^\circ} I = \inf_{\bar S} I$, where $S^\circ$ is the interior of $S$ and $\bar S$ is the closure of $S$; then
$$P_\varepsilon(S) \asymp e^{-r(\varepsilon)\inf_{x\in S} I(x)}, \quad \text{as } \varepsilon\to 0, \tag{2.2}$$
that is,
$$\lim_{\varepsilon\to 0}\frac{1}{r(\varepsilon)}\log P_\varepsilon(S) = -\inf_{x\in S} I(x).$$
The following quote from [dH00] is appropriate here:

Any large deviation occurs in the least unlikely of all unlikely ways.

2.2.2 Rate functions on Polish spaces.

The uniqueness of $I$ is guaranteed even without the compactness condition (see a short, elegant proof in [Din93]). Nevertheless, on Polish spaces this condition implies several additional features. We emphasize two right away. The proofs are in Lemma 2.2.1.
The first one is in relation to the optimization problems $\inf_{x\in C} I(x)$, $C$ closed; a good rate function is the starting point for the so-called direct methods for finding minimizers of functionals (see [Str90, Chapter 1]). The second one is the recovery of a WLLN when we are in a situation where the probabilities $P_\varepsilon$ become more degenerate, centered at the minimum of $I$, provided the point of minimum is unique. In general, the LDP covers more situations than that of one exact minimum. Those situations correspond, probabilistically, to a breakdown in the LLN (or ergodic theorem), which is most interesting in statistical physics, as explored in the book by R.S. Ellis [Ell85].

As a preliminary step, recall that one of the characterizations of convergence in distribution, $P_n\Rightarrow P$, is the convergence $P_n(B)\to P(B)$ over all continuity sets $B$ of $P$ (sets such that $P(\partial B)=0$, where $\partial B$ is the boundary of $B$). Another characterization is the convergence $\int_E h\,dP_n \to \int_E h\,dP$ as we run through a convenient set of test functions, say the bounded and Lipschitz continuous functions. Then, in the particular case of convergence to a degenerate probability at one point, we have

Lemma 2.2.1. Let $x_0\in E$. Let $P_n$ be a sequence of probabilities on $E$. Then the following are equivalent:

a. $P_n \Rightarrow \delta_{x_0}$;

b. $P_n(B)\to 0$ for every Borel set $B$ such that $x_0\notin \bar B$.

Proof. b. $\Rightarrow$ a.: consider an open ball $\rho(x,x_0)<\varepsilon$. Let $h$ be continuous with Lipschitz constant $C$ and bounded; split the integrals over the open set $\{\rho(x,x_0)<\varepsilon\}$ and the closed set $\{\rho(x,x_0)\geq\varepsilon\}$:
$$\Big|\int_E h\,dP_n - \int_E h\,d\delta_{x_0}\Big| \leq \int_{\rho(x,x_0)<\varepsilon}|h-h(x_0)|\,dP_n + \int_{\rho(x,x_0)\geq\varepsilon}|h-h(x_0)|\,dP_n \leq C\varepsilon + 2\max|h|\; P_n(\rho(x,x_0)\geq\varepsilon),$$
and the result follows since $\varepsilon$ is arbitrary. The other implication is clear, since any such set is a continuity set for $\delta_{x_0}$.

Now, the properties:

Proposition 2.2.1 (properties of a good rate function). If $I$ is a good rate function on (a Polish space) $E$, then:

A) $I$ is lower semi-continuous (LSC), i.e.
$\liminf_{x_n\to x} I(x_n) \geq I(x)$, for all $x\in E$;

B) $I$ attains its minimum on every closed nonempty set;

C) in particular, there exists $x_0$ such that $I(x_0)$ is the global minimum;

D) the global minimum is $I(x_0)=0$;

E) let $S := \{x\in E;\ I(x)=0\}$; then for every Borel set $B$ such that $\bar B\cap S = \emptyset$ we have $P_\varepsilon(B)\to 0$;

F) (WLLN) if there exists only one global minimum, at $x_0$, then $P_\varepsilon \Rightarrow \delta_{x_0}$.

Proof. To verify A), observe that $E$ is a metric space, so the lower level sets $\{x\in E;\ I(x)\leq C\}$, being compact, are closed, which is equivalent to $I$ being LSC. For B), assume $B\subset E$ is closed and $I_B := \inf\{I(x);\ x\in B\}<\infty$; taking an approximating sequence $x_n\in B$ with $I_B \leq I(x_n) < I_B + 1/n$, so that $I(x_n)\downarrow I_B$ and $x_n\in B\cap\{x;\ I(x)\leq I_B+1\}$, it follows from the compactness property and the closedness of $B$ that $x_{k_n}\to x_0\in B$ for some subsequence $(x_{k_n})$, and, from LSC, that $I_B = \lim I(x_{k_n}) \geq I(x_0)$; hence $I_B = I(x_0)$. C) follows from the previous property by taking the closed set to be $B=E$, and D) follows from the lower and upper bounds applied to the whole space $E$. To verify E), observe that $c := \inf_{\bar B} I > 0$, so by the upper bound condition $\limsup_{\varepsilon\to 0}\varepsilon\log P_\varepsilon(B) \leq -c < 0$. Hence there exists $\varepsilon_0$ such that for all $\varepsilon<\varepsilon_0$: $P_\varepsilon(B) < e^{-c/\varepsilon}$; the result follows letting $\varepsilon\to 0$. For F), use Lemma 2.2.1 together with the previous property.

Another most striking property of a LDP with good rate function is the generalization of Laplace's method to Polish spaces (see Laplace's principle in Definition 2.3.1).

Why a Polish space? Seminal results are obtained in Polish spaces, and they make use of Prokhorov's theorem, which equates relative compactness to tightness in the space of probability measures. Nevertheless, the definition of a LDP could be given in a general topological space; for a view in that direction see [DZ98, Section 1, Section 4.1].

2.3 Laplace's principle.
One may argue that the fundamental reason why we could employ Laplace's method for integrals in verifying Claim 2.1.1 is that an equivalent way to state a LDP is by way of Laplace's principle, a remarkable generalization of Laplace's method to an arbitrary Polish space $E$.

Definition 2.3.1. We say $P_\varepsilon$ satisfies the Laplace principle with rate function $I$ on $E$ if, for all bounded continuous real valued functions $h$ defined on $E$:
$$\lim_{\varepsilon\to 0} \varepsilon\log\int_E e^{-\frac{h(x)}{\varepsilon}}\, P_\varepsilon(dx) = -\inf_{x\in E}\{h(x)+I(x)\}. \tag{2.3}$$

S.R.S. Varadhan proved that the LDP of Definition 2.2.1 implies Laplace's principle (see [Var84, Theorem 2.2]). The converse result is credited to W. Bryc (see [DZ98, Theorem 4.4.2]). Both P. Dupuis and R.S. Ellis in [DE97], and J. Feng and T. Kurtz in [FK06], explore this idea in their approach to the subject.

2.4 Weak large deviations. Exponential tightness.

Relaxing the upper bound condition (i) of Definition 2.2.1 to run over compact sets, we obtain

Definition 2.4.1. A family of probability measures $P_\varepsilon$, $\varepsilon\geq 0$, is said to satisfy the weak large deviations principle (WLDP) if it satisfies the lower bound (ii) and the upper bound (i) holds for compact sets.

Definition 2.4.2. We say $P_\varepsilon$, $\varepsilon\geq 0$, is exponentially tight if for all $C>0$ there exists a compact set $K_C\subset E$ such that
$$\limsup_{\varepsilon\to 0} \varepsilon\log P_\varepsilon(E\setminus K_C) < -C.$$

For an extensive treatment of exponential tightness in large deviations see [FK06], where the basic methodology is the exploration of this concept by analogy with the role of tightness in the weak convergence of probabilities. The analogy is based on the following: if the family of measures $P_\varepsilon$, $\varepsilon\geq 0$, is exponentially tight, then there exists a subsequence which satisfies the LDP (see [FK06, Theorem 3.7, p.44]).

We have the equivalence

Theorem 2.4.1. A family of probability measures satisfies a LDP with a good rate function iff it satisfies a WLDP and the exponential tightness property.

Proof. See [DZ98, Lemma 1.2.18, Exercise 4.1.10].
The idea for the direct implication above can readily be seen in a locally compact setting (Exercise 1.2.19 from [DZ98]): let $C>0$. By compactness, $A := \{x;\ I(x)\leq C\} \subset \bigcup B_i$ for some finite number of open balls $B_i$ centered at points of $A$. By local compactness the closure of each ball is compact, and so is the finite union $\bigcup \bar B_i$. From the upper bound condition in the LDP,
$$\limsup_{\varepsilon\to 0} \varepsilon\log P_\varepsilon\Big(E\setminus\bigcup B_i\Big) \leq -\inf_{x\in E\setminus\bigcup B_i} I(x) < -C,$$
observing that $I(x)>C$ for all $x\in E\setminus\bigcup B_i$.

2.5 Contraction principle.

The contraction principle concerns the transference of the LDP from one space to another, under the push-forward measure. Eventually, it can be applied on one set with two different metrics.

The setting: let $(E,\rho)$, $(E',\rho')$ be Polish spaces and $\pi: E\to E'$ a measurable map; define the family of measures $\mu'_\varepsilon$ on $E'$ by
$$\mu'_\varepsilon(A) := \mu_\varepsilon(\pi^{-1}(A)),$$
and the function
$$I'(y) := \inf\{I(x);\ y=\pi(x)\}.$$
Convention: $\inf\emptyset = \infty$.

Theorem 2.5.1 (contraction principle). Assume $\pi$ is continuous and that $\mu_\varepsilon$ satisfies the LDP on $E$ with good rate function $I$, as $\varepsilon\to 0$. Then $I'$ is the good rate function for $\mu'_\varepsilon$ on $E'$ as $\varepsilon\to 0$.

Proof. See [DZ98, Section 4].

There are situations (see Section 2.6.3.4) when the natural map to use with the contraction principle is not continuous. For instance, in general, the map (Itô stochastic integral)
$$W \mapsto \int_0^T \sigma(s)\,dW(s)$$
is not continuous (pathwise). Next we state a result that allows us to overcome this technical nuisance. For a more detailed discussion see [DZ98, Section 4.3].

Additionally: fix a probability space $(\Omega,\mathcal{F},P)$, let $X_\varepsilon$, $\varepsilon\geq 0$, be a family of random variables with values in $E$, and let $Y_\varepsilon = \pi(X_\varepsilon)$. Furthermore, consider a sequence of continuous maps $\pi_m: E\to E'$ and let $Y_{m,\varepsilon} = \pi_m(X_\varepsilon)$.

Theorem 2.5.2 (approximately exponentially equivalent).
Assume the family $X_\varepsilon$, $\varepsilon\geq 0$, satisfies a LDP on $E$ with good rate function $I$, and that $\pi_m$ approximates $\pi$ in the following sense:

(i) for all $C>0$,
$$\lim_{m\to\infty} \sup_{I(x)\leq C} \rho'(\pi_m(x), \pi(x)) = 0;$$

(ii) for all $\delta>0$,
$$\lim_{m\to\infty}\limsup_{\varepsilon\to 0} \varepsilon\log P\big(\rho'(Y_\varepsilon, Y_{m,\varepsilon}) > \delta\big) = -\infty.$$

Then the family $\mu'_\varepsilon$ satisfies a LDP on $E'$ with good rate function
$$I'(y) = \inf\{I(x);\ y=\pi(x)\}.$$

Proof. See [DZ98, Theorem 4.2.23].

The first condition is uniform convergence on the level sets, and when the second condition holds we say $Y_\varepsilon$ and $Y_{m,\varepsilon}$ are approximately exponentially equivalent.

2.6 A selection of seminal examples.

2.6.1 Cramér's theorem.

On the real line, consider a sequence of IID random variables $X_i$ with distribution function $F$. Assume: the moment generating function
$$M(\lambda) := \mathbb{E}e^{\lambda X_1} = \int_{\mathbb{R}} e^{\lambda x}\,dF(x) \tag{2.4}$$
is finite for all $\lambda$; and the random variable $X_1$ is not bounded below or above.

For every natural number $n$, denote by $P_n$ the distribution of the sample mean $\bar X_n := (X_1+\dots+X_n)/n$. Consider the function
$$I(x) := \sup_{\lambda\in\mathbb{R}}\big(\lambda x - \log M(\lambda)\big), \quad x\in\mathbb{R}. \tag{2.5}$$

Theorem 2.6.1 (Cramér). The sequence $P_n$ satisfies a LDP with the good rate function given by (2.5).

Remark 2.6.1. The role of the second assumption above is of a technical nature. It avoids a trivial LDP due to the absence of any probability mass in a tail event; in that case, necessarily, the rate would be infinite. In fact, it is possible to drop the second assumption; see [DS89] for a proof, and more details in the discussion ahead. With respect to the exponential moments condition: in $\mathbb{R}$ it is not necessary either; in $\mathbb{R}^d$, with $d\geq 3$, it is necessary for the upper bound. See [DZ98, p.68].

Proof. See [Var84]. In any case, we present here a summary of the arguments of the proof. The argument for the lower bound in Cramér's theorem can be traced to H. Cramér (1938) and F. Esscher (1932), and is now known as the Cramér transformation, or tilting, whereas the upper bound can be traced to H. Chernoff (1952). We outline here its main insights.
Facts about the function $I(x)$. For all $x$ and all $\lambda$, $I(x)\geq \lambda x - \log M(\lambda)$. Since for $\lambda=0$, $M(0)=1$, we have $0\cdot x - \log M(0) = 0$; consequently, $I\geq 0$. Denoting $L(\lambda) := \log M(\lambda)$, it follows from Hölder's inequality that $L$ is convex. Also, from the assumption that $X_1$ is not bounded below or above, it follows that $L(\lambda)/|\lambda| \to\infty$ as $|\lambda|\to\infty$. Now $I$ is the Legendre transformation of $L$ (the Legendre transformation of $L:\mathbb{R}\to\mathbb{R}$ is given by $L^*(p) := \sup_{\theta\in\mathbb{R}}(p\theta - L(\theta))$). So $I$ is necessarily convex, and also $I(y)/|y|\to\infty$ as $|y|\to\infty$. In fact, $I$ and $L$ are conjugate duals: $I = L^*$ and $I^* = L$. Being pointwise suprema of affine (hence continuous) functions, both are lower semi-continuous. Hence lower semicontinuity and the compactness condition (1) hold. Furthermore, it turns out that $I$ has its global minimum at $\bar X = \mathbb{E}X_1$; it is non-decreasing on $(\bar X, +\infty]$ and non-increasing on $(-\infty, \bar X]$. See Figure 2.1 for typical graphs in the case of Bernoulli(0.2), Exponential(0.1), Poisson(5) and Laplace(0, 0.2) random variables: the figure shows part of the domain where the functions are finite; the rates of Be(0.2) and Poisson(5) jump to $+\infty$ for $x<0$. Observe that all are convex LSC functions on $\mathbb{R}$ with minimum (zero) at the mean of the distribution. Also, they are smooth and strictly convex in the interior of $D_I = \{x;\ I(x)<\infty\}$. These are properties of any rate function obtained from Cramér's theorem (see [dH00, Lemma I.14]).

[Figure 2.1: Rate functions for Be(0.2), Exponential(0.1), Poisson(5), Laplace(0, 0.3).]

Table 2.1 has examples of LD rate functions for the cases of Bernoulli, Normal, Poisson, Exponential and Laplace random variables.
Table 2.1: Examples of rate functions ($\mathbb{E}e^{\lambda X_1}$ and $L^*(x)$).

- Be($p$): $\mathbb{E}e^{\lambda X_1} = 1-p+pe^\lambda$; $L^*(x) = x\log\frac{x}{p} + (1-x)\log\frac{1-x}{1-p}$ for $x\in[0,1]$, and $\infty$ otherwise.
- Normal($m, A$), $A>0$: $\mathbb{E}e^{(\lambda, X_1)} = e^{(\lambda,m)+\frac12(\lambda, A\lambda)}$; $L^*(x) = \frac12\big(x-m,\ A^{-1}(x-m)\big)$.
- Poisson($\theta$), $\theta>0$: $\mathbb{E}e^{\lambda X_1} = e^{\theta(e^\lambda-1)}$; $L^*(x) = x\log\frac{x}{\theta} + \theta - x$ for $x\geq 0$, and $\infty$ otherwise.
- Exponential($\theta$), $\theta>0$: $\mathbb{E}e^{\lambda X_1} = \frac{\theta}{\theta-\lambda}$, $\lambda<\theta$; $L^*(x) = -\log(\theta x) + \theta x - 1$ for $x>0$, and $\infty$ otherwise.
- Laplace($m, b$), $b>0$: $\mathbb{E}e^{\lambda X_1} = \frac{e^{m\lambda}}{1-b^2\lambda^2}$, $|\lambda|<\frac1b$; $L^*(x) = \eta(x-m) + \log(1-b^2\eta^2)$ for $x\neq m$, and $0$ otherwise, where $\eta := \frac{-b+\sqrt{b^2+(x-m)^2}}{(x-m)\,b}$.

Upper bound. Let $y>\mathbb{E}X_1$, $\lambda\geq 0$, and apply Chebyshev's inequality:
$$P_n[y,\infty) = P(\bar X_n\geq y) \leq e^{-\lambda y}\,\mathbb{E}e^{\lambda \bar X_n} = e^{-\lambda y}\, M\Big(\frac{\lambda}{n}\Big)^n;$$
therefore, applying $\frac1n\log$ to both sides,
$$\frac1n\log P_n[y,\infty) \leq -\frac{\lambda y}{n} + \log M\Big(\frac{\lambda}{n}\Big).$$
Since $\lambda$ is arbitrary, it can be replaced by $n\lambda$, yielding
$$\frac1n\log P_n[y,\infty) \leq -\big[\lambda y - \log M(\lambda)\big];$$
therefore, optimizing over $\lambda\geq 0$,
$$\frac1n\log P_n[y,\infty) \leq -\sup_{\lambda\geq 0}\big[\lambda y - \log M(\lambda)\big] = -I(y).$$
To justify the last equality: if $\lambda<0$, by Jensen's inequality $\log M(\lambda)\geq\lambda\mathbb{E}X_1\geq\lambda y$, so $\lambda y - \log M(\lambda)\leq 0$, and, since $I\geq 0$, $\sup_{\lambda\geq 0}[\lambda y - \log M(\lambda)] = \sup_{\lambda\in\mathbb{R}}[\lambda y - \log M(\lambda)] = I(y)$. Similarly for intervals $(-\infty, y]$ with $y<\mathbb{E}X_1$. With some more work one can show the upper bound to hold for every closed set $C$, considering the cases $\bar X\in C$ or $\bar X\notin C$. This summarizes the proof for the upper bound.

Lower bound. It is sufficient to verify, for all open intervals,
$$\liminf_{n\to\infty}\frac1n\log P_n(y-\delta, y+\delta) \geq -I(y), \quad y\in\mathbb{R},\ \delta>0.$$
In order to verify this, from the assumption that $X_1$ is not bounded below or above, it follows that
$$\lim_{|\lambda|\to\infty}\frac{\log M(\lambda)}{|\lambda|} = \infty.$$
Therefore, for any $y$, $\sup_{\lambda\in\mathbb{R}}[\lambda y - \log(M(\lambda))]$ is attained at a finite $\lambda_0$. Then
$$I(y) = \lambda_0 y - \log(M(\lambda_0)), \qquad \frac{M'(\lambda_0)}{M(\lambda_0)} = y.$$
Now, define the distribution $F_0$ by
$$dF_0(x) := \frac{e^{\lambda_0 x}}{M(\lambda_0)}\,dF(x).$$
At this point, observe two heuristic aspects:

- provided $y\neq\bar X$ and $\delta$ is small enough, $\bar X$ is outside the neighborhood $(y-\delta, y+\delta)$, so, heuristically, we can consider this a small probability event under each $P_n$;
- the mean of $F_0$ is $\int x\,dF_0(x) = \int x e^{\lambda_0 x}\,dF(x)/M(\lambda_0) = M'(\lambda_0)/M(\lambda_0) = y$, so it is centered on $y$.
Proceeding, for all $0<\delta_1<\delta$,
$$P_n(y-\delta, y+\delta) \geq P_n(y-\delta_1, y+\delta_1) = \int_{|(x_1+\cdots+x_n)/n - y|<\delta_1} dF(x_1)\cdots dF(x_n)$$
$$= [M(\lambda_0)]^n \int_{|(x_1+\cdots+x_n)/n - y|<\delta_1} e^{-\lambda_0(x_1+\cdots+x_n)}\, dF_0(x_1)\cdots dF_0(x_n) \geq e^{-n\lambda_0 y - n\delta_1|\lambda_0|}\,[M(\lambda_0)]^n \int_{|(x_1+\cdots+x_n)/n - y|<\delta_1} dF_0(x_1)\cdots dF_0(x_n).$$
By the law of large numbers (under $F_0$ the mean is $y$),
$$\lim_{n\to\infty}\int_{|(x_1+\cdots+x_n)/n - y|<\delta_1} dF_0(x_1)\cdots dF_0(x_n) = 1;$$
consequently,
$$\liminf_{n\to\infty}\frac1n\log P_n(y-\delta, y+\delta) \geq -I(y) - \delta_1|\lambda_0|.$$
Letting $\delta_1\to 0$, the lower bound is proved. This summarizes the proof of the lower bound.

2.6.2 Sample path LD I: Mogulskii's and Schilder's theorems.

2.6.2.1 Notation.

We use $0<T<\infty$ to denote the right endpoint of a time domain; $d=1,2,3,\dots$ denotes the dimension; $C([0,T];\mathbb{R}^d)$ denotes the space of continuous functions defined on the interval $[0,T]$ with values in the Euclidean space $(\mathbb{R}^d, |\cdot|)$, equipped with the uniform metric
$$\|\varphi-\psi\| \equiv \rho_{0T}(\varphi,\psi) := \sup_{0\leq t\leq T}|\varphi(t)-\psi(t)|.$$
For simplicity we write $C([0,T]) \equiv C([0,T];\mathbb{R}^d)$. Similarly, $L^p(0,T)$ denotes the usual space of $p$th-power integrable functions, $1\leq p<\infty$. We use $L^\infty(0,T)$ to denote the space of essentially bounded functions. Let $AC([0,T])$ denote the absolutely continuous functions, equivalently, the continuous functions
$$\varphi = \varphi(0) + \int_0^{\cdot} g(s)\,ds, \quad \text{with } g\in L^1(0,T).$$
Denote by $H_x^1([0,T])$ the subspace of $AC([0,T])$ such that $\dot\varphi\in L^2(0,T)$, $\varphi(0)=x$. The kernel of a linear map $A$ is denoted by $N(A)$; in a Hilbert space $H$, $N(A)^\perp$ denotes the orthogonal complement of $N(A)$ in $H$; the image of $A$ is denoted by $R(A)$. For the remainder we fix a stochastic basis $(\Omega,\mathcal{F},\mathcal{F}_t,P)$ in which the filtration $(\mathcal{F}_t)$ is $P$-complete and right continuous, $\mathcal{F}_t = \mathcal{F}_{t^+} := \bigcap_{\varepsilon>0}\mathcal{F}_{t+\varepsilon}$. Any Wiener process will be adapted to the filtration $\mathcal{F}_t$ and denoted by $(W(t),\ t\geq 0)$.

2.6.2.2 Mogulskii's theorem.

Let $(X_k)_{k\geq 1}$ be a sequence of IID random vectors with values in $\mathbb{R}^d$. Here we change notation slightly. Assume
$$H(\lambda) := \log\mathbb{E}e^{(\lambda, X_1)} < \infty \quad \text{for all } \lambda.$$
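The Legendre transform $H^*(x) = \sup_{\lambda}\big((\lambda,x)-H(\lambda)\big)$ of such a cumulant generating function is straightforward to evaluate numerically. As an illustrative sketch (not part of the text), a simple grid search in the scalar Bernoulli($p$) case reproduces the closed form $x\log\frac{x}{p}+(1-x)\log\frac{1-x}{1-p}$ from Table 2.1:

```python
import math

def H(lam, p):
    # cumulant generating function of a Bernoulli(p) random variable
    return math.log(1 - p + p * math.exp(lam))

def legendre(x, p, lo=-30.0, hi=30.0, steps=60001):
    # grid-search approximation of sup_lambda (lambda * x - H(lambda))
    h = (hi - lo) / (steps - 1)
    return max((lo + i * h) * x - H(lo + i * h, p) for i in range(steps))

def closed_form(x, p):
    # Bernoulli rate function from Table 2.1, valid for 0 < x < 1
    return x * math.log(x / p) + (1 - x) * math.log((1 - x) / (1 - p))

p, x = 0.2, 0.5
print(legendre(x, p), closed_form(x, p))  # both approximately 0.2231
```

The same grid search works for any $H$ that is finite on the grid; only the closed-form comparison is Bernoulli-specific.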
Define the sequence of processes $X_n(t)$, $n=1,2,\dots$, $t\in[0,1]$,
\[X_n(t):=\frac1n\sum_{k=1}^{\lfloor nt\rfloor}X_k=\frac1n\sum_{k=1}^{j-1}X_k,\qquad\frac{j-1}{n}\le t<\frac jn,\quad j=1,\dots,n+1,\]
where $\lfloor nt\rfloor$ is the integer part of $nt$. For fixed $n$, $X_n$ is a right-continuous process started at zero, whose paths are step functions with possible jumps only at the points $j/n$ (when $X_j\ne0$), $j=1,\dots,n+1$. Denote by $\mu_n$ the law of $X_n$ in $L^\infty(0,1)$. We have

Theorem 2.6.2 (Mogulskii). The sequence $\mu_n$ satisfies a LDP in $L^\infty(0,1)$ with rate function
\[S(\varphi)=\begin{cases}\int_0^1L(\dot\varphi(t))\,dt,&\text{if }\varphi\in AC([0,1]),\ \varphi(0)=0,\\ \infty,&\text{otherwise},\end{cases}\tag{2.6}\]
where $L(x):=H^{*}(x)=\sup_{\lambda\in\mathbb{R}^d}\big[(\lambda,x)-H(\lambda)\big]$.

Proof. See [DZ98, Section 5.1].

Example 2.6.1 (Cramér). Note that $X_n(1)=\frac1n\sum_{k=1}^nX_k$. In fact, we can recover Cramér's theorem from Theorem 2.6.2. Consider instead the sequence of processes $\widetilde X_n(t)$, $n=1,2,\dots$, $t\in[0,1]$:
\[\widetilde X_n(t):=X_n(t)+\Big(t-\frac{\lfloor nt\rfloor}{n}\Big)X_{\lfloor nt\rfloor+1}=\frac1n\sum_{k=1}^{j-1}X_k+\Big(t-\frac{j-1}{n}\Big)X_j,\qquad\frac{j-1}{n}\le t<\frac jn,\ j=1,\dots,n+1.\]
For fixed $n$, $\widetilde X_n$ is a continuous process started at zero, whose paths are polygonal lines joining the points $(j/n,\,X_n(j/n))$, $j=1,\dots,n+1$. Denote by $\widetilde\mu_n$ the law of $\widetilde X_n$ in $C([0,1])$. A LDP with the same rate can be proved for $\widetilde X_n$ on $C([0,1])$ (see [DZ98, Lemma 5.1.4]), and we can now look at the continuous map
\[\pi_1\colon C([0,1])\to\mathbb{R}^d,\qquad\omega\mapsto\pi_1(\omega):=\omega(1),\]
and use the contraction principle (Theorem 2.5.1) to conclude that $X_n(1)=\widetilde X_n(1)=\frac1n\sum_{k=1}^nX_k$ satisfies a LDP with rate function
\[I(x)=\inf\{S(\varphi);\ x=\pi_1(\varphi)\}=\inf\Big\{\int_0^1L(\dot\varphi(t))\,dt;\ \varphi\in AC([0,1]),\ \varphi(0)=0,\ \varphi(1)=x\Big\}.\]
We now verify that $I(x)=L(x)$. Since $L$ is convex, by Jensen's inequality,
\[L(x)=L\Big(\int_0^1\dot\varphi(t)\,dt\Big)\le\int_0^1L(\dot\varphi(t))\,dt,\qquad\varphi\in AC([0,1]),\ \varphi(0)=0,\ \varphi(1)=x.\]
Hence $L(x)\le I(x)$. Using the linear function $\varphi\colon t\mapsto xt$, the infimum is attained, and it follows that $I(x)\le L(x)$.

Remark 2.6.2 (LD levels). The previous example gives insight into the term "contraction".
We obtained a LDP by projecting the result from the "higher level" of the sample paths to the "lower level" of the one-dimensional marginal.

Remark 2.6.3 (Donsker). It is interesting to observe the strong parallel between pointwise and functional results for CLT and LD types of limit theorems; see Table 2.2. For visual illustration, Figure 2.2 contains realizations of the processes $X_n\sqrt n$ and $\widetilde X_n\sqrt n$, for $n=15,100,1000$, with $X_i\sim\mathrm{Be}(0.5)$, IID, $i=1,\dots,n$.

Table 2.2: Pointwise and functional limit theorems.
- CLT: pointwise $X_n(1)\sqrt n\Rightarrow N(0,1)$; functional $X_n(\cdot)\sqrt n\Rightarrow W(\cdot)$.
- LD: pointwise $X_n(1)$, rate $L(x)$; functional $X_n(\cdot)$, rate $S(\varphi)=\int_0^1L(\dot\varphi(t))\,dt$.

Figure 2.2: Sample paths for $X_n(t)\sqrt n$ (top) and $\widetilde X_n\sqrt n$.

2.6.2.3 Schilder's theorem.

To discuss the main example of this section, we first present an equivalent way of formulating the LD principle of Definition 2.2.1, the so-called Freidlin–Wentzell estimates (see [FW98, Section 3.3]).

Definition 2.6.1. Let $E$ be a Polish space with metric $\rho$. Let $\mathbb P_\varepsilon$, $\varepsilon>0$, be a family of probability measures defined on the Borel subsets of $E$. Let $\lambda(\varepsilon)$ be a positive real-valued function converging to $+\infty$ as $\varepsilon\to0$, and let $S(x)$ be a function on $E$ with values in $[0,\infty]$. We say that $\lambda(\varepsilon)S(x)$ is an action function for $\mathbb P_\varepsilon$ as $\varepsilon\to0$ if:

(0) the set $K(s):=\{x;\ S(x)\le s\}$ is compact for every $0\le s<\infty$;

(I) (upper bound) for all $\delta>0$, $\gamma>0$ and $s>0$ there exists an $\varepsilon_0>0$ such that
\[\mathbb P_\varepsilon\{y;\ \rho(y,K(s))\ge\delta\}\le\exp\{-\lambda(\varepsilon)(s-\gamma)\}\quad\text{for all }\varepsilon\le\varepsilon_0;\]

(II) (lower bound) for all $\delta>0$, $\gamma>0$ and $x\in E$ there exists an $\varepsilon_0>0$ such that
\[\mathbb P_\varepsilon\{y;\ \rho(x,y)<\delta\}\ge\exp\{-\lambda(\varepsilon)(S(x)+\gamma)\}\quad\text{for all }\varepsilon\le\varepsilon_0.\]

The function $S(x)$ is called the normalized action function and $\lambda(\varepsilon)$ the normalizing coefficient. If $E$ is a function space, the term action functional is used instead.

Remark 2.6.4.
The normalized action function in this formulation plays the role of the rate function in the earlier Definition 2.2.1; moreover, the lower bound conditions are equivalent, (II) $\Leftrightarrow$ (ii), which elucidates the local nature of the estimate. The previous upper bound condition is stronger: (i) $\Rightarrow$ (I). Conversely, if the rate is good, (I) $\Rightarrow$ (i) (see [FW98, Theorem 3.3]). The normalizing coefficient plays the role of the speed, as in the formal expression
\[d\mathbb P_\varepsilon\approx e^{-\lambda(\varepsilon)S(x)}\,dx.\]

Remark 2.6.5. If $X_\varepsilon$, $\varepsilon>0$, is a family of $E$-valued random elements defined on (possibly distinct) probability spaces $(\Omega_\varepsilon,\mathcal F_\varepsilon,\mathbb Q_\varepsilon)$, then the action function for the laws $\mathbb P_\varepsilon(\cdot):=\mathbb Q_\varepsilon(X_\varepsilon\in\cdot)$ is called the action function for the family $X_\varepsilon$, and conditions (I) and (II) can be written as
\[\mathbb P_\varepsilon(\rho(X_\varepsilon,K(s))\ge\delta)\le\exp\{-\lambda(\varepsilon)(s-\gamma)\}\]
and
\[\mathbb P_\varepsilon(\rho(X_\varepsilon,x)<\delta)\ge\exp\{-\lambda(\varepsilon)(S(x)+\gamma)\},\]
respectively.

Let $(W(t),\,t\in[0,T])$ be a standard Wiener process with values in $\mathbb{R}^d$. It is well known that this process induces a law $\mu_W$ on $E=C([0,T])$. Define, for any $\varphi\in C([0,T])$,
\[S_0(\varphi):=\begin{cases}\frac12\int_0^T|\dot\varphi(t)|^2\,dt,&\text{if }\varphi\in H^1_0([0,T]),\\ \infty,&\text{otherwise}.\end{cases}\tag{2.7}\]
We have:

Theorem 2.6.3 (Schilder). The action functional for the family of random elements $X_\varepsilon=\varepsilon W$ is $\varepsilon^{-2}S_0$.

Remark 2.6.6. According to the equivalences mentioned in Remark 2.6.4, in another language one concludes a LDP for the laws of $\sqrt\varepsilon\,W$ on $C([0,T])$ in the sense of Definition 2.2.1, with good rate function given by (2.7).

Proof. See [FW98, Theorem 2.1].

Given Schilder's theorem, Theorem 2.5.1 can readily be applied in situations where we have a continuous mapping from $C([0,T])$ to some other space.

Example 2.6.2 (Green kernel I). (See [FW98, p. 82].) Let $W$ be a standard Wiener process with values in $\mathbb{R}$. Let $g(s,t)$ be a $k$-times continuously differentiable function on the square $[0,T]\times[0,T]$, $k\ge1$.
Denote by $C^k([0,T])$ the $k$-times continuously differentiable functions on the interval $[0,T]$ with values in $\mathbb{R}$, and consider the integral operator
\[\pi v(t):=\int_0^Tg(s,t)\,dv(s):=-\int_0^T\frac{\partial g(s,t)}{\partial s}\,v(s)\,ds+g(T,t)v(T)-g(0,t)v(0),\tag{2.8}\]
for all $v\in C([0,T])$. We have that $\pi\colon C([0,T])\to C^{k-1}([0,T])$ is continuous with respect to the metric
\[\rho_{k-1}(v_1,v_2):=\max\{\|v_1-v_2\|,\dots,\|v_1^{(k-1)}-v_2^{(k-1)}\|\}.\]
Therefore, from the contraction principle (Theorem 2.5.1) and Schilder's theorem (Theorem 2.6.3), the normalized action functional for the process $X_\varepsilon:=\pi(\sqrt\varepsilon\,W)$ on $C^{k-1}([0,T])$ is
\[S_X(\varphi)=\min\{S_0(v);\ \varphi=\pi v\}=\min\Big\{\frac12\int_0^T|\dot v(s)|^2\,ds;\ \varphi=\pi v\Big\}.\]
Again $\min\emptyset=\infty$, namely if $\varphi=\pi v$ has no solution $v\in H^1([0,T])$ with $v(0)=0$.

Keeping the notation, this time introduce the integral operator $G\colon L^2(0,T)\to C^k([0,T])$,
\[Gv(t):=\int_0^Tg(s,t)v(s)\,ds;\]
it is immediate that $\pi v=G\dot v$, provided $v\in H^1([0,T])$ with $v(0)=0$. Hence
\[S_X(\varphi)=\min\Big\{\frac12|\tilde v|^2;\ \varphi=G\tilde v\Big\}.\]
We can express this action functional by resorting to the generalized inverse of $G$, setting
\[G^{-1}\varphi=u\iff Gu=\varphi\ \text{and}\ u\text{ is orthogonal to }N(G).\]
Now, if $S_X(\varphi)<\infty$, then there exists $v\in H^1([0,T])$ such that $\varphi=\pi v=G\dot v$. Also, any $v\in L^2(0,T)$ with $Gv=\varphi$ is of the form $v=v_0+G^{-1}\varphi$, with $v_0\in N(G)$, and by orthogonality
\[|v|^2=|v_0|^2+|G^{-1}\varphi|^2\ge|G^{-1}\varphi|^2;\]
hence, for all $\varphi\in C^{k-1}([0,T])$,
\[S_X(\varphi)=\begin{cases}\frac12|G^{-1}\varphi|^2,&\varphi\in R(G),\\ \infty,&\text{otherwise}.\end{cases}\tag{2.9}\]

Example 2.6.3 (Green kernel II: a linear differential equation subject to random forcing). Continuing the previous example, we can look at the problem of finding the asymptotic behavior of the solution of the differential equation
\[P\Big(\frac d{dt}\Big)X_\varepsilon(t)=\sqrt\varepsilon\,\dot W(t),\qquad t\in[0,T],\tag{2.10}\]
where $P(d/dt)=\sum_{k=0}^na_k\,d^k/dt^k$ is a differential operator, with homogeneous boundary conditions (see [CL55, Chapter 7]).
The function $g(s,t)$ is the Green kernel associated with the differential operator $P\big(\frac d{dt}\big)$:
\[X_\varepsilon(t)=\sqrt\varepsilon\int_0^Tg(s,t)\,dW(s)=\pi(\sqrt\varepsilon\,W).\]
In this case $G$ has a single-valued inverse $G^{-1}=P\big(\frac d{dt}\big)$, defined on the functions satisfying the homogeneous boundary conditions. From (2.9) it follows that the action functional for the family $X_\varepsilon$ is $\varepsilon^{-1}S_X(\varphi)$, where
\[S_X(\varphi):=\frac12\int_0^T\Big|P\Big(\frac d{dt}\Big)\varphi(t)\Big|^2\,dt,\tag{2.11}\]
and if $\varphi$ does not satisfy the boundary conditions, or if $d^{n-1}\varphi/dt^{n-1}$ is not absolutely continuous, then $S_X(\varphi)=+\infty$.

Example 2.6.4 (exterior of a ball I). (See [FW98, p. 87].) We conclude this discussion with an example studying $\mathbb P(\|X_\varepsilon\|>C)$, for fixed $C>0$, as $\varepsilon$ vanishes. Let us focus on the case where $P(d/dt)$ has a CONS of eigenvectors $m_k$, with associated eigenvalues $\lambda_k$, $k=1,2,\dots$.

Claim 2.6.1.
\[\lim_{\varepsilon\to0}\varepsilon\log\mathbb P(\|X_\varepsilon\|>C)=\lim_{\varepsilon\to0}\varepsilon\log\mathbb P(\|X_\varepsilon\|\ge C)=-\frac{C^2\lambda^2}2,\]
where $\lambda$ is the eigenvalue with the smallest magnitude.

To verify this claim, observe that the open set $E:=\{\varphi\in L^2(0,T);\ \|\varphi\|>C\}$ is regular with respect to the functional $S_X$ defined in (2.11), i.e.,
\[\inf_ES_X(\varphi)=\min_{\bar E}S_X(\varphi).\]
One inequality is immediate. To verify the other, take any $0<\delta<1$; then $\delta^{-1}\bar E\subset E$ and $S_X(\delta^{-1}\varphi)=\delta^{-2}S_X(\varphi)$, so that
\[\inf_{\varphi\in E}S_X(\varphi)\le\inf_{\varphi\in\delta^{-1}\bar E}S_X(\varphi)=\delta^{-2}\min_{\varphi\in\bar E}S_X(\varphi),\]
and the result follows letting $\delta\to1$. We conclude
\[\lim_{\varepsilon\to0}\varepsilon\log\mathbb P(\|X_\varepsilon\|>C)=\lim_{\varepsilon\to0}\varepsilon\log\mathbb P(\|X_\varepsilon\|\ge C)=-\min\{S_X(\varphi)\}=-\min\Big\{\frac12\Big\|P\Big(\frac d{dt}\Big)\varphi\Big\|^2\Big\},\]
where the minimum is taken over the functions satisfying the boundary conditions and with norm $C$. Finally, given $\varphi=\sum_ka_km_k$ with $\|\varphi\|^2=\sum_ka_k^2=C^2$,
\[\int_0^T\Big|P\Big(\frac d{dt}\Big)\varphi(t)\Big|^2\,dt=\sum_ka_k^2\lambda_k^2\ge C^2\lambda^2,\]
and the lower bound is attained at $Cm$, where $m$ is an eigenvector associated to $\lambda$. This finishes the verification of the claim.

Example 2.6.5 (exterior of a ball II). Continuing the previous example, consider the randomly forced linear differential equation
\[\ddot X(t)+\lambda X(t)=\sqrt\varepsilon\,\dot W(t),\quad\lambda>0,\ t\in[0,T],\qquad X(0)=X(T)=0,\tag{2.12}\]
with eigenfunctions $m_k(t)=\sqrt{\tfrac2T}\sin\big(\tfrac{k\pi}Tt\big)$ and eigenvalues $\lambda_k=\lambda-\tfrac{k^2\pi^2}{T^2}$.
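The eigenpair just displayed can be confirmed numerically: applying $P(d/dt)=d^2/dt^2+\lambda$ to $m_k$ via a centered finite difference reproduces $\lambda_km_k$. A quick sketch (parameter values chosen arbitrarily for illustration):

```python
import math

def check_oscillator_eigenpair(k=1, lam=0.5, T=1.0, t=0.3, h=1e-4):
    """Finite-difference check that m_k(t) = sqrt(2/T) sin(k pi t / T)
    satisfies (d^2/dt^2 + lam) m_k = (lam - k^2 pi^2 / T^2) m_k."""
    m = lambda s: math.sqrt(2 / T) * math.sin(k * math.pi * s / T)
    second = (m(t + h) - 2 * m(t) + m(t - h)) / (h * h)   # approx m_k''(t)
    lhs = second + lam * m(t)                             # P(d/dt) m_k at t
    rhs = (lam - (k * math.pi / T) ** 2) * m(t)           # lambda_k m_k(t)
    return lhs, rhs

lhs, rhs = check_oscillator_eigenpair()
```

The two sides agree up to the $O(h^2)$ discretization error of the centered difference.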
For convenience, assume $\lambda<\pi^2/T^2$, so that the eigenvalue with smallest magnitude is $\lambda_1=\lambda-\pi^2/T^2$. For instance,
\[\mathbb P(\|X_\varepsilon\|\ge1)\approx\exp\Big(-\frac{(\lambda-\pi^2/T^2)^2}{2\varepsilon}\Big),\qquad\text{as }\varepsilon\to0.\]

2.6.3 Sample path LD II: random perturbations of dynamical systems.

Let $(W(t),\,t\in[0,T])$ be a standard Wiener process in $\mathbb{R}^d$. Consider the stochastic differential equation
\[\dot X(t)=b(X(t))+\sqrt\varepsilon\,\sigma(X(t))\dot W(t),\qquad X(0)=x\in\mathbb{R}^d,\ \varepsilon>0,\ t\in[0,T],\tag{2.13}\]
which we may consider as a small random perturbation of the deterministic initial value problem
\[\dot X(t)=b(X(t)),\quad t\ge0,\qquad X(0)=x\in\mathbb{R}^d.\tag{2.14}\]

2.6.3.1 Assumptions.

Assume $b\equiv(b_i)\colon\mathbb{R}^d\to\mathbb{R}^d$ and $\sigma\equiv(\sigma_{ij})\colon\mathbb{R}^d\to\mathbb{R}^{d\times d}$ are uniformly Lipschitz continuous and $\sigma$ is bounded, i.e., there exists $K>0$ such that
\[|b_i(x)-b_i(y)|+|\sigma_{ij}(x)-\sigma_{ij}(y)|\le K|x-y|\quad\text{for all }x,y,i,j,\tag{2.15}\]
\[|\sigma_{ij}(x)|\le K\quad\text{for all }x,i,j.\tag{2.16}\]
We recall some consequences of these assumptions which are relevant for our discussion: for all $0<T<\infty$, we obtain existence and uniqueness of a family of (strong) solutions $X_\varepsilon\equiv X^{x,\varepsilon}$ of (2.13), inducing a law $\mathcal L(X^{x,\varepsilon})$ on $C([0,T])$; moreover, we have the convergence in distribution $\mathcal L(X^{x,\varepsilon})\Rightarrow\delta_{X^0}$, where $X^0$ is the solution of the deterministic part of (2.13). The first result is standard in the theory of stochastic differential equations. The second is a LLN-type result, and a proof can be found in [FW98, p. 45]. Next, we discuss the LDP for $\mathcal L(X^{x,\varepsilon})$, starting with $\sigma\equiv1$.

2.6.3.2 Small Gaussian perturbations.

Consider
\[\dot X_\varepsilon(t)=b(X_\varepsilon(t))+\varepsilon\dot W(t),\quad0\le t\le T,\qquad X_\varepsilon(0)=x\in\mathbb{R}^d.\tag{2.17}\]
Furthermore, consider the operator $\mathscr X\colon(x,u)\mapsto\varphi$, where
\[\varphi(t)=x+\int_0^tb(\varphi(s))\,ds+u(t),\quad0\le t\le T,\tag{2.18}\]
called the skeleton equation ($\mathscr X$ is called the skeleton map). It follows from Gronwall's inequality and from $b$ being Lipschitz that $\mathscr X\equiv\mathscr X_x(u)\colon\mathbb{R}^d\times C([0,T])\to C([0,T])$ is Lipschitz continuous in each argument.
Moreover, for fixed $x\in\mathbb{R}^d$, the inverse operator $\mathscr X_x^{-1}$ is given by
\[\mathscr X_x^{-1}\varphi(t)=u(t)=\varphi(t)-x-\int_0^tb(\varphi(s))\,ds,\quad0\le t\le T;\tag{2.19}\]
hence, in this case, $\mathscr X_x$ is a homeomorphism onto its range. Again for fixed $x\in\mathbb{R}^d$, define the functional
\[S_x(\varphi):=\begin{cases}\frac12\int_0^T|\dot\varphi(t)-b(\varphi(t))|^2\,dt,&\text{if }\varphi\in H^1_x([0,T]),\\ \infty,&\text{otherwise}.\end{cases}\tag{2.20}\]
This $S_x$ is the LD rate of $X_\varepsilon$ defined in (2.17). This is an application of the contraction principle and Schilder's theorem: observe that $X_\varepsilon(t)=\mathscr X_x(\varepsilon W(t))$; therefore, using Theorem 2.5.1 and (2.19),
\[\min\{S_0(\mathscr X_x^{-1}(\varphi))\}=\frac12\int_0^T\Big|\frac d{dt}\mathscr X_x^{-1}\varphi(t)\Big|^2\,dt=\frac12\int_0^T|\dot\varphi(t)-b(\varphi(t))|^2\,dt\]
for $\varphi\in H^1_x([0,T])$.

2.6.3.3 Uniform estimates.

Taking a second look at the previous considerations, one can ask about the role of $x$: we obtained a whole family of action functionals $S_x$. Can we obtain a functional independent of $x$ for which the bounds hold uniformly? To answer this, we introduce the following definition, again due to A. D. Wentzell and M. I. Freidlin (see [FW98, p. 92]).

Definition 2.6.2. Let $E$ be a metric space. Let $(\Omega_\varepsilon,\mathcal F_\varepsilon,\mathbb P^x_\varepsilon)$ be a family of probability spaces, $x\in E$, $\varepsilon>0$. Let $X_\varepsilon(t)$ be a stochastic process with values in $E$. Let $\rho_{0T}$ be a metric on the space of functions on $[0,T]$ with values in $E$, and let $S$ be a functional on this space. We say $\lambda(\varepsilon)S$ is the action functional for the family $(X_\varepsilon,\mathbb P^x_\varepsilon)$, as $\varepsilon\to0$, uniformly in a class $\mathcal A$ of subsets of $E$ if:

$(0_u)$ the functional $S$ is LSC and $\bigcup_{x\in A}K_x(s)$ is compact for any compact $A\subset E$, where
\[K_x(s):=\{\varphi\in C([0,T]);\ \varphi(0)=x,\ S(\varphi)\le s\};\]

$(\mathrm I_u)$ (uniform upper bound) for all $\delta>0$, $\gamma>0$, $s_0>0$ and $A\in\mathcal A$ there exists an $\varepsilon_0>0$ such that
\[\mathbb P^x_\varepsilon\{\rho_{0T}(X_\varepsilon,K_x(s))\ge\delta\}\le\exp\{-\lambda(\varepsilon)(s-\gamma)\}\tag{2.21}\]
for all $\varepsilon\le\varepsilon_0$, $s\le s_0$ and $x\in A$;

$(\mathrm{II}_u)$ (uniform lower bound) for all $\delta>0$, $\gamma>0$, $s_0>0$ and $A\in\mathcal A$ there exists an $\varepsilon_0>0$ such that
\[\mathbb P^x_\varepsilon\{\rho_{0T}(X_\varepsilon,\varphi)<\delta\}\ge\exp\{-\lambda(\varepsilon)(S(\varphi)+\gamma)\}\tag{2.22}\]
for all $\varepsilon\le\varepsilon_0$, all $x\in A$ and all $\varphi\in K_x(s_0)$.
We can now give the complete picture: define
\[S(\varphi):=\begin{cases}\frac12\int_0^T|\dot\varphi(t)-b(\varphi(t))|^2\,dt,&\text{if }\varphi\in AC([0,T]),\\ \infty,&\text{otherwise}.\end{cases}\tag{2.23}\]

Theorem 2.6.4. The functional $\varepsilon^{-2}S(\varphi)$ defined by (2.23) is the action functional for the family of processes $X_\varepsilon$ defined by (2.17) on $C([0,T])$, uniformly with respect to the initial point $x\in\mathbb{R}^d$.

Proof. See [FW98, Section 4.1].

Remark 2.6.7. The class $\mathcal A$ mentioned in Definition 2.6.2 is $\{\mathbb{R}^d\}$. Also note that $S(\varphi)<\infty$ only for $\varphi$ such that $\dot\varphi\in L^2(0,T)$, not on all absolutely continuous functions.

2.6.3.4 Small random perturbations.

For the general case $\sigma\not\equiv1$, a LDP (or action functional) can also be found. Nevertheless, we cannot apply exactly the same method of proof as in the case of Gaussian random perturbations, where we used the continuity of the solution operator. The corresponding operator here would be $\mathscr X_x\colon u\mapsto\varphi$, where $\varphi$ is the unique solution of
\[\varphi(t)=x+\int_0^tb(\varphi(s))\,ds+\int_0^t\sigma(\varphi(s))\dot u(s)\,ds,\quad0\le t\le T.\tag{2.24}\]
In fact, a result due to E. Wong and M. Zakai [WZ65] shows that, under additional regularity assumptions on $\sigma$, we can take a continuous piecewise linear approximation $W_n(t)$ to the Wiener process $W(t)$ in $C(0,T;\mathbb{R}^d)$, and the corresponding diffusion $X^n_\varepsilon$ converges to the Itô diffusion in Stratonovich form. This diffusion has an additional term $(\varepsilon/2)\sigma(x)D\sigma(x)$ (the so-called Wong–Zakai correction term). It is relevant to observe that, making use of rough path theory, a convenient topology has been introduced under which the solution map is continuous, thus allowing the use of the contraction principle. See more details in [LQZ02].

Theorem 2.6.5 (Freidlin–Wentzell). For fixed $x\in\mathbb{R}^d$, the family $\mathbb P^x_\varepsilon$ of laws of the solution of problem (2.13) satisfies a LDP on $C([0,T])$ as $\varepsilon\to0$ with rate function given by
\[S_x(\varphi)=\inf\Big\{\frac12\int_0^T|\dot u(s)|^2\,ds;\ \varphi=\mathscr X_x(u),\ \text{for some }u\in H^1_0([0,T])\Big\}.\tag{2.25}\]
If, in addition, $a(x):=\sigma(x)\sigma^*(x)$ is uniformly elliptic, i.e.,
there exists $C>0$ such that
\[(a(x)\lambda,\lambda)\ge C|\lambda|^2,\qquad\lambda=(\lambda_1,\dots,\lambda_d)\in\mathbb{R}^d,\ x\in\mathbb{R}^d,\tag{2.26}\]
then we have uniform estimates and the simplified expressions
\begin{align*}
S(\varphi)&=\begin{cases}\frac12\int_0^T\big(\dot\varphi(t)-b(\varphi(t)),\,a(\varphi(t))^{-1}(\dot\varphi(t)-b(\varphi(t)))\big)\,dt,&\varphi\in AC([0,T]),\\ \infty,&\text{otherwise},\end{cases}\\
&=\begin{cases}\frac12\int_0^T\big|\sigma(\varphi(t))^{-1}(\dot\varphi(t)-b(\varphi(t)))\big|^2\,dt,&\varphi\in AC([0,T]),\\ \infty,&\text{otherwise},\end{cases}\\
&=\begin{cases}\frac12\sum_{i,j}\int_0^Ta^{ij}(\varphi(t))\big(\dot\varphi_i(t)-b_i(\varphi(t))\big)\big(\dot\varphi_j(t)-b_j(\varphi(t))\big)\,dt,&\varphi\in AC([0,T]),\\ \infty,&\text{otherwise},\end{cases}
\end{align*}
where $(a^{ij}(x)):=a(x)^{-1}$.

Proof. See [DZ98, Section 5.6]. We sketch the proof. The idea is to construct an approximate solution $X^{n,x,\varepsilon}$ for which the associated skeleton map $\mathscr X^n_x$ is continuous and for which we can use the contraction principle and Schilder's theorem. After this, we approximate the intended skeleton map (2.24) by $\mathscr X^n_x$. This suffices, since it implies that $X_\varepsilon$ and $X^n_\varepsilon$ are approximately exponentially equivalent (recall Theorem 2.5.2).

Remark 2.6.8. We write only some of the conditions defining the subset of $C([0,T])$ over which $S<\infty$; the expression is, however, consistent.

In order to discuss further examples, consider
\[\dot X(t)=AX(t)+\sqrt\varepsilon\,B\dot W(t),\qquad X(0)=0,\ t\in[0,T],\ \varepsilon>0,\tag{2.27}\]
where $A,B\in\mathbb{R}^{d\times d}$. We need the notion of generalized inverse for $B\in\mathbb{R}^{d\times d}$:
\[B^{-1}y=u\iff u\text{ is the unique element of }\mathbb{R}^d\text{ orthogonal to }N(B)\text{ such that }Bu=y.\]
Formally, any solution of $Bu=y$ is of the type $u=B^{-1}y+N(B)$; so we are merely picking the particular solution $B^{-1}y$ lying in the orthogonal complement of $N(B)$. See Section 3.2.1 for more details.

Proposition 2.6.1. The family of processes $X_\varepsilon$ defined by (2.27) has action functional $\varepsilon^{-1}S$, where
\[S(\varphi)=\begin{cases}\frac12\int_0^T|B^{-1}(\dot\varphi(t)-A\varphi(t))|^2\,dt,&\dot\varphi(t)-A\varphi(t)\in R(B),\ t\in[0,T],\\ \infty,&\text{otherwise}.\end{cases}\]

Proof. By Theorem 2.6.5, the action functional is determined by
\[S(\varphi)=\min\Big\{\frac12\int_0^T|\dot u(s)|^2\,ds;\ \varphi=\mathscr X_0(u),\ \text{for some }u\in H^1_0([0,T])\Big\}\]
for all $\varphi\in C([0,T])$. Here we have a minimum (i.e., the infimum is attained), since the map $\mathscr X_0\colon f\mapsto g$,
where
\[g(t)=\int_0^tAg(s)\,ds+Bf(t),\]
is continuous: in fact, for all $t\in[0,T]$,
\[|g_1(t)-g_2(t)|\le\int_0^T|Ag_1(s)-Ag_2(s)|\,ds+|Bf_1(t)-Bf_2(t)|\le\|A\|\int_0^T|g_1(s)-g_2(s)|\,ds+\|B\|\,\|f_1-f_2\|,\]
and continuity follows from Gronwall's inequality. Now fix $\varphi\in C([0,T])$: if $S(\varphi)<\infty$, then $\varphi=\mathscr X_0(u)$ and $B\dot u=\dot\varphi-A\varphi$ for some $u\in H^1_0([0,T])$, i.e., $\dot\varphi-A\varphi\in R(B)$. Also observe that necessarily $\varphi\in H^1_0([0,T])$.

As a more specific example we have

Example 2.6.6 (oscillator). Consider the randomly forced oscillator $X_\varepsilon$ defined by
\[\ddot X(t)+\lambda X(t)=\sqrt\varepsilon\,\dot W(t),\quad\lambda>0,\ t\in[0,T],\qquad X(0)=\dot X(0)=0.\tag{2.28}\]

Proposition 2.6.2. The laws $\mathcal L(X_\varepsilon)$, $\varepsilon>0$, satisfy a LDP on $C([0,T];\mathbb{R})$ with rate function
\[S(\varphi)=\begin{cases}\frac12\int_0^T(\ddot\varphi(t)+\lambda\varphi(t))^2\,dt,&\varphi\in H^2_0([0,T]),\\ \infty,&\text{otherwise},\end{cases}\]
where
\[H^2_0([0,T])=\{f;\ f(0)=\dot f(0)=0,\ f,\dot f\in AC([0,T]),\ \ddot f\in L^2(0,T)\}.\]

Proof. Working in the phase space, consider $Z:=(X,Y)^{\mathsf t}$, where $\dot X=Y$, and $\mathcal W:=(V,W)^{\mathsf t}$, where $V$ is another standard Wiener process independent of $W$. The equation becomes
\[\begin{pmatrix}\dot X\\\dot Y\end{pmatrix}=\begin{pmatrix}0&1\\-\lambda&0\end{pmatrix}\begin{pmatrix}X\\Y\end{pmatrix}+\sqrt\varepsilon\begin{pmatrix}0&0\\0&1\end{pmatrix}\begin{pmatrix}\dot V\\\dot W\end{pmatrix}.\]
Equivalently,
\[\dot Z(t)=AZ(t)+\sqrt\varepsilon\,B\dot{\mathcal W}(t),\qquad Z(0)=0,\ t\in[0,T],\ \varepsilon>0,\]
where
\[A=\begin{pmatrix}0&1\\-\lambda&0\end{pmatrix},\qquad B=\begin{pmatrix}0&0\\0&1\end{pmatrix}.\]
It is immediate to verify that
\[(x,y)\in R(B)\iff x=0,\qquad(u,v)\in N(B)\iff v=0,\]
hence $N(B)^{\perp}=R(B)$. Therefore
\[\begin{pmatrix}\dot X\\\dot Y\end{pmatrix}-A\begin{pmatrix}X\\Y\end{pmatrix}\in R(B)\iff\dot X=Y.\]
Hence
\[B^{-1}\Bigg(\begin{pmatrix}\dot X\\\dot Y\end{pmatrix}-A\begin{pmatrix}X\\Y\end{pmatrix}\Bigg)=\begin{pmatrix}0\\\dot Y+\lambda X\end{pmatrix}.\]
Therefore, the rate function for $\mathcal L(Z_\varepsilon)$ on $C([0,T];\mathbb{R}^2)$ is
\[S(X,Y)=\begin{cases}\frac12\int_0^T(\dot Y(t)+\lambda X(t))^2\,dt,&\dot X=Y,\\ \infty,&\text{otherwise}.\end{cases}\]
We have not been writing the full set of conditions that determine the subset where the rate function is finite; in this case, it is immediate to check that the necessary and sufficient conditions are $\dot X=Y$, $Y\in H^1_0([0,T])$, $Y(0)=X(0)=0$.
Finally, to verify the claim, use the contraction principle with the projection
\[\pi_1\colon C([0,T];\mathbb{R}^2)\to C([0,T];\mathbb{R}),\qquad(X,Y)\mapsto X.\]
Note that, in the underlying probability space, $\mathbb P(\pi_1((X,Y))\in\cdot)=\mathbb P(X\in\cdot)$, so we do obtain $\mathcal L(X_\varepsilon)=\mathcal L(Z_\varepsilon)\circ\pi_1^{-1}$, and for all $\varphi\in C([0,T];\mathbb{R})$,
\begin{align*}
S(\varphi)&=\min\{S(X,Y);\ \varphi=\pi_1(X,Y)\}\\
&=\begin{cases}\frac12\int_0^T(\dot Y(t)+\lambda X(t))^2\,dt,&\varphi=X,\ \dot X=Y,\ Y\in H^1_0([0,T]),\ X(0)=0,\\ \infty,&\text{otherwise},\end{cases}\\
&=\begin{cases}\frac12\int_0^T(\ddot\varphi(t)+\lambda\varphi(t))^2\,dt,&\varphi\in H^2_0([0,T]),\\ \infty,&\text{otherwise},\end{cases}
\end{align*}
as wanted. The space $H^2_0([0,T])$ is also discussed in Example 3.2.5 in the context of reproducing kernel Hilbert spaces.

Chapter 3

THE GAUSSIAN SETTING.

This chapter examines the fundamental structure underlying the LD rates in a Gaussian setting and discusses possible representations.

3.1 A finite dimensional tour.

Let $X=(X_1,\dots,X_d)$ be a zero-mean Gaussian vector on a probability space $(\Omega,\mathcal F,\mathbb P)$ with values in the Euclidean space $(\mathbb{R}^d,|\cdot|_{\mathbb{R}^d})$. Also, $\gamma=\mathcal L(X)$ on $\mathbb{R}^d$ and
\[Q=\mathrm{cov}(X)=(\mathbb EX_iX_j)_{i,j=1}^d=(q_{ij})_{i,j=1}^d.\tag{3.1}\]
The covariance matrix $Q$ is symmetric and positive ($x^{\mathsf t}Qx\ge0$), so there exists a square root matrix $Q^{1/2}$, also positive and symmetric. It is straightforward to verify that $X=Q^{1/2}\xi$ as random vectors, where $\xi$ is a standard Gaussian random vector of $\mathbb{R}^d$; hence $X$ is supported on $Q^{1/2}\mathbb{R}^d$.

There are various ways to show that the family $\varepsilon X$, $\varepsilon>0$, satisfies a LDP as $\varepsilon\to0$. Here we illustrate, in a simple finite-dimensional setting, a method that has been generalized to more general Gaussian elements. We state the result right away.

Theorem 3.1.1 (Baby Schilder). The family of Gaussian vectors $\varepsilon X$, $\varepsilon>0$, satisfies a LDP on $\mathbb{R}^d$ with rate function
\[I(x)=\begin{cases}\frac12|Q^{-1/2}x|^2_{\mathbb{R}^d},&x\in Q^{1/2}\mathbb{R}^d,\\ \infty,&\text{otherwise}.\end{cases}\tag{3.2}\]

Remark 3.1.1. When a linear operator $A$ on a Hilbert space does not have an inverse, we will use the following notion of generalized inverse, proposed in [FW98]: $A^{-1}x=y\iff Ay=x$ and $y\in N(A)^{\perp}$. See Section 3.2.1.
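To make Theorem 3.1.1 and the generalized inverse of Remark 3.1.1 concrete, one can compute $I(x)$ directly when $Q$ is diagonal (the general case reduces to this by orthogonal diagonalization). A minimal sketch, our own illustration rather than anything from the text:

```python
import math

def rate_diag(q_eigs, x, tol=1e-12):
    """Rate I(x) = (1/2)|Q^{-1/2} x|^2 for a Gaussian vector with
    *diagonal* covariance Q = diag(q_eigs), using the generalized
    inverse: coordinates of x lying in N(Q) must vanish, otherwise
    x is outside Q^{1/2} R^d and I(x) = infinity."""
    s = 0.0
    for q, xi in zip(q_eigs, x):
        if q <= tol:
            if abs(xi) > tol:        # x has a component in N(Q)
                return math.inf
        else:
            s += xi * xi / q         # |Q^{-1/2} x|^2, coordinate-wise
    return 0.5 * s

I_on = rate_diag([2.0, 0.0], [1.0, 0.0])    # x in the support: 0.25
I_off = rate_diag([2.0, 0.0], [1.0, 1.0])   # x outside the support: inf
```

The degenerate coordinate illustrates how the rate is finite only on the support of $X$.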
From Theorem 3.1.1 we see that the rate function is finite only on the support of the Gaussian vector $X$. The theorem also encodes an important notion associated to the Gaussian distribution $\gamma$: its Cameron–Martin space. We now introduce this space and some associated notation.

Definition 3.1.1. The Cameron–Martin space of $\gamma$ on $\mathbb{R}^d$, denoted by $\mathcal H_\gamma$, is the set $Q^{1/2}\mathbb{R}^d$ with inner product
\[((u,v))_\gamma:=(Q^{-1/2}u,Q^{-1/2}v)_{\mathbb{R}^d}.\tag{3.3}\]
It follows from Remark 3.1.1 that this is indeed a well-defined, symmetric bilinear form on $Q^{1/2}\mathbb{R}^d$. It is also positive definite: if $0=\|u\|_\gamma=|Q^{-1/2}u|_{\mathbb{R}^d}$, then $Q^{-1/2}u=0$ and hence $u=0$. An alternative and equivalent way to write the same space, without explicitly discussing generalized inverses, is given in the following

Proposition 3.1.1. The Cameron–Martin space $\mathcal H_\gamma$ is the set $Q\mathbb{R}^d$ with inner product
\[(Qu,Qv)_\gamma:=(Qu,v)_{\mathbb{R}^d}=(Q^{1/2}u,Q^{1/2}v)_{\mathbb{R}^d}.\tag{3.4}\]

Proof. Observe that $Q^{1/2}\mathbb{R}^d=Q\mathbb{R}^d$: indeed, we have the isometric isomorphism
\[Q^{1/2}\colon\big(Q^{1/2}\mathbb{R}^d,|\cdot|_{\mathbb{R}^d}\big)\to\big(Q\mathbb{R}^d,|\cdot|_\gamma\big),\qquad\big|Q^{1/2}(Q^{1/2}u)\big|_\gamma=|Q^{1/2}u|_{\mathbb{R}^d},\]
and note that $Q\mathbb{R}^d$ is a subspace of $Q^{1/2}\mathbb{R}^d$. Consequently, $Q^{1/2}\mathbb{R}^d=Q\mathbb{R}^d$. Also, note that $Q^{-1/2}Qu=Q^{1/2}u$. Indeed, $Q^{1/2}Q^{1/2}u=Qu$ and $(Q^{1/2}u,z)_{\mathbb{R}^d}=(u,Q^{1/2}z)_{\mathbb{R}^d}=0$ for all $z\in N(Q^{1/2})$. Therefore $(Qu,Qv)_\gamma=(Q^{1/2}u,Q^{1/2}v)_{\mathbb{R}^d}=((Qu,Qv))_\gamma$.

We will use the notation $(\mathcal H_\gamma,(\cdot,\cdot)_\gamma)$ to refer to the Cameron–Martin space from here onwards.

Lemma 3.1.1. We have the continuous inclusion $i\colon(\mathcal H_\gamma,(\cdot,\cdot)_\gamma)\hookrightarrow(\mathbb{R}^d,|\cdot|_{\mathbb{R}^d})$.

Proof. For all $u\in\mathbb{R}^d$, $|Qu|_{\mathbb{R}^d}=|Q^{1/2}Q^{1/2}u|_{\mathbb{R}^d}\le C|Q^{1/2}u|_{\mathbb{R}^d}=C|Qu|_\gamma$.

This space plays an important role in the characterization of the vectors $h$ for which $X$ and $X+h$ have equivalent distributions. It is instructive to pay attention to the case where $Q$ has an inverse in the usual sense.

3.1.1 The non-degenerate case.

If $Q$ is non-degenerate, that is, $N(Q)=\{0\}$ (or $x^{\mathsf t}Qx>0$ for $x\ne0$), then $\mathcal H_\gamma=\mathbb{R}^d$ and $(\cdot,\cdot)_\gamma$ is just the inner product determined by the matrix $Q^{-1}$ on $\mathbb{R}^d$.
It can be written in other equivalent ways:
\[(x,y)_\gamma=(x,Q^{-1}y)_{\mathbb{R}^d}=(Q^{-1/2}x,Q^{-1/2}y)_{\mathbb{R}^d}=\sum_{i,j=1}^dx_iy_jq^{ij},\qquad Q^{-1}=(q^{ij})_{i,j=1}^d.\tag{3.5}\]
We can visualize the geometry induced by $(\cdot,\cdot)_\gamma$ by identifying the unit ball under this metric: it is the set $\{x\in\mathbb{R}^d;\ x^{\mathsf t}Q^{-1}x\le1\}$, an ellipsoidal level set of the quadratic form $x\mapsto x^{\mathsf t}Q^{-1}x$ (see [Bog98, Chapter 1]). Furthermore, the distribution $\gamma$ has a density given by
\[\frac{d\gamma}{d\ell}(x)=\frac1{(2\pi)^{d/2}(\det Q)^{1/2}}\exp\Big(-\frac12x^{\mathsf t}Q^{-1}x\Big)\]
(here $\ell$ is the Lebesgue measure of $\mathbb{R}^d$). In terms of the objects just defined,
\[\frac{d\gamma}{d\ell}(x)=\frac1{(2\pi)^{d/2}(\det Q)^{1/2}}\exp\Big(-\frac12|x|^2_\gamma\Big).\tag{3.6}\]
For any $h\in\mathbb{R}^d$, $X+h$ is a Gaussian vector and its law $\nu$ is related to the law $\gamma$ by $\nu(A)=\gamma(A-h)$, that is, $\nu=\gamma(\cdot-h)$. The density of $\gamma(\cdot-h)$ is readily seen to be
\[\frac{d\gamma(\cdot-h)}{d\ell}(x)=\frac1{(2\pi)^{d/2}(\det Q)^{1/2}}\exp\Big(-\frac12|x-h|^2_\gamma\Big).\tag{3.7}\]
Also, $\gamma(\cdot-h)$ and $\gamma$ (and also $\ell$) are equivalent measures, and the Radon–Nikodym derivative is
\begin{align*}
\frac{d\gamma(\cdot-h)}{d\gamma}(x)&=\frac{d\gamma(\cdot-h)}{d\ell}(x)\,\frac{d\ell}{d\gamma}(x)=\exp\Big(-\frac12|x-h|^2_\gamma+\frac12|x|^2_\gamma\Big)\\
&=\exp\Big(-\frac12|x|^2_\gamma+(x,h)_\gamma-\frac12|h|^2_\gamma+\frac12|x|^2_\gamma\Big)=\exp\Big((x,h)_\gamma-\frac12|h|^2_\gamma\Big).
\end{align*}

3.1.2 The degenerate case.

If $Q$ does not have an inverse (we say $Q$ is degenerate), some structure is lost and, arguably, the situation is more interesting. Contrary to the non-degenerate case, $\mathcal H_\gamma$ and $\mathbb{R}^d$ can now be very different. Let $r$ be the dimension of $\mathcal H_\gamma$. As a guiding example, take $Q=Q^{1/2}=P_1$ to be the projection onto the first coordinate of $\mathbb{R}^d$ (say $\mathbb EX_1X_1=1$ and $\mathbb EX_iX_j=0$ otherwise). Then the dimension of $\mathcal H_\gamma$ is $r=1<d$. Here $\gamma$ is supported on $\mathcal H_\gamma$, hence it is singular with respect to $\ell$ and there is no density. Also, the shifted measure $\gamma(\cdot-h)$ has support on $\mathcal H_\gamma+h$. If $h\notin\mathcal H_\gamma$, the supports are disjoint and $\gamma$ and $\gamma(\cdot-h)$ are singular. If $h\in\mathcal H_\gamma$, then $\mathcal H_\gamma-h=\mathcal H_\gamma$, and we will use the following result, which essentially says that $\gamma$ can be seen as a standard Gaussian multivariate distribution on $\mathbb{R}^r\cong\mathcal H_\gamma$, with the inner product $(\cdot,\cdot)_\gamma$.

Theorem 3.1.2. For all bounded measurable $f\colon\mathbb{R}^d\to\mathbb{R}$,
\[\int_{\mathbb{R}^d}f(x)\,d\gamma(x)=\frac1{(2\pi)^{r/2}}\int_{\mathcal H_\gamma}f(y)\exp\Big(-\frac12|y|^2_\gamma\Big)\,dy.\]

Proof. See [Dri10, Theorem 4.3].
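Theorem 3.1.2 can be sanity-checked numerically in the guiding example $d=2$, $Q=P_1$ (so $r=1$ and $|y|_\gamma=|y_1|$ on the support $\mathcal H_\gamma=\mathbb{R}\times\{0\}$): computing the right-hand side by quadrature recovers the moments of $X_1$. A rough sketch, our own illustration (trapezoid rule over a truncated line):

```python
import math

def gauss_int_on_support(f, step=1e-3, R=8.0):
    """Right-hand side of Theorem 3.1.2 for d = 2, Q = projection onto
    the first coordinate (r = 1): integrate f over H_gamma = R x {0}
    against the standard Gaussian weight, by the trapezoid rule on
    [-R, R] (tails beyond R are negligible)."""
    n = int(2 * R / step)
    total = 0.0
    for i in range(n + 1):
        y = -R + i * step
        w = 1.0 if 0 < i < n else 0.5          # trapezoid endpoint weights
        total += w * f(y, 0.0) * math.exp(-0.5 * y * y)
    return total * step / math.sqrt(2 * math.pi)

mass = gauss_int_on_support(lambda x1, x2: 1.0)     # total mass: 1
m2 = gauss_int_on_support(lambda x1, x2: x1 * x1)   # E X_1^2 = q_{11} = 1
```

Both values match the moments of the degenerate Gaussian, confirming that $\gamma$ behaves as a one-dimensional standard Gaussian on its support.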
Continuing, we conclude by a change of variables that, for all bounded measurable $f$,
\begin{align*}
\int_{\mathbb{R}^d}f(x)\,d\gamma(x-h)&=\int_{\mathbb{R}^d}f(x+h)\,d\gamma(x)=\frac1{(2\pi)^{r/2}}\int_{\mathcal H_\gamma}f(x+h)\exp\Big(-\frac12|x|^2_\gamma\Big)\,dx\\
&=\frac1{(2\pi)^{r/2}}\int_{\mathcal H_\gamma+h}f(x)\exp\Big(-\frac12|x-h|^2_\gamma\Big)\,dx\\
&=\frac1{(2\pi)^{r/2}}\int_{\mathcal H_\gamma}f(x)\exp\Big(-\frac12|x|^2_\gamma+(x,h)_\gamma-\frac12|h|^2_\gamma\Big)\,dx\\
&=\int_{\mathbb{R}^d}f(x)\exp\Big((x,h)_\gamma-\frac12|h|^2_\gamma\Big)\,d\gamma(x).
\end{align*}
In conclusion, $\gamma(\cdot-h)$ and $\gamma$ are equivalent if and only if $h\in\mathcal H_\gamma$. If $h\in\mathcal H_\gamma$, the Radon–Nikodym derivative is
\[\frac{d\gamma(\cdot-h)}{d\gamma}(x)=\exp\Big((x,h)_\gamma-\frac12|h|^2_\gamma\Big).\]
If $h\notin\mathcal H_\gamma$, then $\gamma(\cdot-h)$ and $\gamma$ are singular. Summarizing, we have

Theorem 3.1.3 (Cameron–Martin). The Cameron–Martin space $\mathcal H_\gamma$ defined in Definition 3.1.1 is continuously included in $\mathbb{R}^d$, and for all $h\in\mathbb{R}^d$,
\[\gamma(\cdot-h)\text{ is equivalent to }\gamma\text{ if and only if }h\in\mathcal H_\gamma.\]
If $h\in\mathcal H_\gamma$, the Radon–Nikodym derivative is given by
\[\frac{d\gamma(\cdot-h)}{d\gamma}(x)=e^{(x,h)_\gamma-\frac12|h|^2_\gamma}.\]
If $h\notin\mathcal H_\gamma$, then $\gamma(\cdot-h)$ and $\gamma$ are mutually singular.

Within this language, the rate function from Theorem 3.1.1 can be written as
\[I(x)=\begin{cases}\frac12|x|^2_\gamma,&x\in\mathcal H_\gamma,\\ \infty,&\text{otherwise}.\end{cases}\tag{3.8}\]
We will now prove Theorem 3.1.1.

Proof. We will show that (3.8) is the (normalized) action function of $\varepsilon X$ (see Definition 2.6.1). For the lower bound, let $\delta>0$ and $h\in\mathcal H_\gamma$; then, using the symmetry of $\gamma$,
\begin{align*}
\mathbb P(|\varepsilon X-h|_{\mathbb{R}^d}<\delta)&=\mathbb P\Big(\Big|X-\frac h\varepsilon\Big|_{\mathbb{R}^d}<\frac\delta\varepsilon\Big)=\gamma\Big(B\Big(0,\frac\delta\varepsilon\Big)-\frac h\varepsilon\Big)\\
&=\int_{B(0,\delta/\varepsilon)}\exp\big(\varepsilon^{-1}(x,h)_\gamma-\varepsilon^{-2}I(h)\big)\,\gamma(dx)\\
&=\exp\big(-\varepsilon^{-2}I(h)\big)\int_{B(0,\delta/\varepsilon)}\exp\big(\varepsilon^{-1}(x,h)_\gamma\big)\,\gamma(dx),
\end{align*}
where in the passage to the integral we used the Cameron–Martin Theorem 3.1.3. Now, by Chebyshev's inequality,
\[\gamma\Big(B\Big(0,\frac\delta\varepsilon\Big)\Big)=\mathbb P\Big(|X|_{\mathbb{R}^d}<\frac\delta\varepsilon\Big)\ge1-\frac{\varepsilon^2}{\delta^2}\mathbb E|X|^2_{\mathbb{R}^d}\ge3/4\tag{3.9}\]
for sufficiently small $\varepsilon$. Since $X=Q^{1/2}\xi$, with $\xi$ a $d$-dimensional standard Gaussian, it is straightforward to verify that
\[\mathbb E|(X,h)_\gamma|^2=|h|^2_\gamma=2I(h),\]
so that
\[\mathbb P\big(\varepsilon^{-1}(X,h)_\gamma\le-C\big)\le\mathbb P\big(|\varepsilon^{-1}(X,h)_\gamma|\ge C\big)\le\frac{\varepsilon^{-2}\mathbb E|(X,h)_\gamma|^2}{C^2}=1/4\]
for $C=2\sqrt2\,\varepsilon^{-1}\sqrt{I(h)}$.
Therefore,
\[\mathbb P\Big(\exp\big(\varepsilon^{-1}(X,h)_\gamma\big)\ge\exp\big(-2\sqrt2\,\varepsilon^{-1}\sqrt{I(h)}\big)\Big)\ge3/4.\tag{3.10}\]
From estimates (3.9) and (3.10) it follows that
\[\int_{B(0,\delta/\varepsilon)}\exp\big(\varepsilon^{-1}(x,h)_\gamma\big)\,\gamma(dx)>\frac12\exp\big(-2\sqrt2\,\varepsilon^{-1}\sqrt{I(h)}\big);\]
consequently,
\[\mathbb P(|\varepsilon X-h|_{\mathbb{R}^d}<\delta)>\frac12\exp\big(-\varepsilon^{-2}I(h)-2\sqrt2\,\varepsilon^{-1}\sqrt{I(h)}\big),\]
which implies the wanted lower bound.

As for the upper bound estimate, let $\Phi(s)=\{x\in\mathbb{R}^d;\ I(x)=\frac12|x|^2_\gamma\le s\}$ and denote $\rho(x,F)=\inf\{|x-y|_{\mathbb{R}^d};\ y\in F\}$, for any closed $F$ and $x\in\mathbb{R}^d$. Then
\[\mathbb P(\rho(\varepsilon X,\Phi(s))>\delta)\le\mathbb P(\varepsilon X\notin\Phi(s))=\mathbb P(\varepsilon Q^{1/2}\xi\notin\Phi(s))=\mathbb P\big(|\varepsilon Q^{1/2}\xi|^2_\gamma>2s\big)\le\mathbb P\Big(\varepsilon^2\sum_{i=1}^d\xi_i^2>2s\Big).\]
Setting $Y=\sum_{i=1}^d\xi_i^2$, it is a chi-squared random variable with $d$ degrees of freedom, with moment generating function $\mathbb Ee^{tY}=(1-2t)^{-d/2}$, $t<1/2$. By the Chernoff–Markov inequality, for all $\lambda>0$ with $2\lambda\varepsilon^2<1$,
\[\mathbb P(\varepsilon^2Y>2s)\le e^{-2\lambda s}\,\mathbb Ee^{\lambda\varepsilon^2Y}=e^{-2\lambda s}(1-2\lambda\varepsilon^2)^{-d/2},\]
hence $\mathbb P(\varepsilon^2Y>2s)\le\inf_{\lambda>0}\{e^{-2\lambda s}(1-2\lambda\varepsilon^2)^{-d/2}\}$ for sufficiently small $\varepsilon>0$. With some calculus we obtain the optimal $\lambda=\frac1{2\varepsilon^2}-\frac d{4s}$. Substituting, we obtain the bound
\[\mathbb P(\rho(\varepsilon X,\Phi(s))>\delta)\le C\varepsilon^{-d}e^{-s/\varepsilon^2},\qquad\text{where }C=e^{d/2}\Big(\frac{2s}d\Big)^{d/2}.\]
This suffices to show the wanted upper bound for sufficiently small $\varepsilon>0$.

Remark 3.1.2. As seen, the argument has two main ideas: (a) for the lower bound, a convenient change of measure based on the Cameron–Martin theorem; (b) for the upper bound, Chernoff–Markov-type estimates. With convenient modifications, the previous result and method of proof can be generalized to various settings of a Gaussian type; in fact, the previous proof is our adaptation to Gaussian vectors of the result appearing in [FW98, Theorem 2.1], which deals instead with Gaussian stochastic processes.

3.2 Square integrable processes.

The main objective here is to give the necessary background in order to discuss the LDP for Gaussian processes that appears in [FW98, Theorem 4.1, p. 93] (see Theorem 3.2.3).

3.2.1 Operators on Hilbert spaces: compact, positive, trace class.
We review useful facts about operators on Hilbert spaces and define what a reproducing kernel Hilbert space (RKHS) is, in connection with a positive definite kernel. References for this section are [DS63, XI.8] for integral operators, [Ree72, VI] for Hilbert–Schmidt and trace class operators, and [BTA04] for the RKHS part. Let $H$ denote a separable real Hilbert space with inner product $(\cdot,\cdot)_H$ and norm $|\cdot|_H$.

Diagonalization of compact, self-adjoint operators. Any compact, self-adjoint operator $A$ on $H$ has the following representation:
\[Ah=\sum_{k\ge1}\lambda_k(h,e_k)_He_k,\qquad h\in H,\tag{3.11}\]
where $(e_k)$ is a complete orthonormal system (CONS) of $H$ consisting of eigenvectors of $A$, and $(\lambda_k)$ is the sequence of eigenvalues of $A$ (furthermore, $\lambda_k\to0$ as $k\to\infty$ if the number of eigenvalues is infinite). This follows from the Hilbert–Schmidt theorem; see [Ree72, Theorem VI.16].

Special compact operators: Hilbert–Schmidt and trace class operators. Let $A$ be a compact, self-adjoint linear operator on $H$. Let $(e_k)$ be a CONS of eigenvectors of $A$ with associated sequence of eigenvalues $(\lambda_k)$.

Definition 3.2.1. Define
\[|A|^2_{HS}\equiv|A|^2_2:=\sum_{k,j\ge1}(Ae_k,e_j)^2_H=\sum_{k\ge1}\lambda^2_k.\tag{3.12}\]
We say $A$ is Hilbert–Schmidt if $|A|_{HS}<\infty$. The Hilbert–Schmidt operators constitute a subspace of the bounded linear operators $L(H)$ and, on this subspace, $|A|_{HS}$ is a norm, called the Hilbert–Schmidt norm of $A$. Define
\[\mathrm{tr}(A)\equiv|A|_1:=\sum_{k\ge1}(Ae_k,e_k)_H=\sum_{k\ge1}\lambda_k.\tag{3.13}\]
We say $A$ is trace class if $\mathrm{tr}(A)<\infty$. The trace class operators constitute a subspace of the bounded linear operators $L(H)$ and, on this subspace, $|A|_1$ is a norm, called the trace norm of $A$.

From (3.12) and (3.13) we conclude that, for positive $A$, every trace class operator is also Hilbert–Schmidt (since $\lambda_k\to0$, eventually $\lambda_k^2\le\lambda_k$), and that $\mathrm{tr}(A)$ and $|A|_{HS}$ do not depend on $(e_k)$. We can define these quantities for more general self-adjoint operators; however, if the norms are finite, then $A$ is compact. Hence, assuming $A$ is compact is not a restriction.
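In finite dimensions, Definition 3.2.1 and the basis-independence of these norms are easy to see: for a symmetric matrix, the sum of squared entries (the numbers $(Ae_k,e_j)_H$ computed in the standard basis) equals $\sum_k\lambda_k^2$, and the sum of diagonal entries equals $\sum_k\lambda_k$. A small sketch, our own illustration:

```python
def hs_and_trace(a):
    """For a symmetric matrix A (list of rows), return (|A|_HS^2, tr A):
    the sum of squared entries and the sum of diagonal entries.  Both are
    basis-independent and equal sum(lambda_k^2) and sum(lambda_k)."""
    hs2 = sum(v * v for row in a for v in row)
    tr = sum(a[i][i] for i in range(len(a)))
    return hs2, tr

A = [[2.0, 1.0], [1.0, 2.0]]     # eigenvalues 3 and 1
hs2, tr = hs_and_trace(A)        # hs2 = 10 = 3^2 + 1^2, tr = 4 = 3 + 1
```

Here the standard basis is not a basis of eigenvectors of $A$, yet the two sums already agree with the eigenvalue expressions in (3.12) and (3.13).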
Additionally, if $A$ is positive, i.e., $(Ah,h)_H\ge0$, $h\in H$, then $\lambda_k=(Ae_k,e_k)_H\ge0$ for all $k$, and the square root of $A$, denoted by $\sqrt A$ or $A^{1/2}$, can be represented by
\[A^{1/2}h=\sum_{k\ge1}\sqrt{\lambda_k}\,(h,e_k)_He_k,\qquad h\in H.\tag{3.14}\]
It is immediate that $\mathrm{tr}(A)=|A^{1/2}|^2_{HS}$.

Generalized inverse. Let $A$ be a positive, self-adjoint, compact operator. Then $A$ is injective if and only if $\lambda_k>0$ for all $k$, and in that case the algebraic inverse $A^{-1}$ cannot be bounded on $H$. In fact,
\[A^{-1}e_k=\frac1{\lambda_k}e_k,\qquad\text{and }\lambda_k\to0.\]
If $A$ is injective, then $R(A)\ne H$; otherwise, as a consequence of the open mapping theorem [Ree72, Theorem 3.10], its algebraic inverse would be bounded. A similar conclusion applies to $A^{1/2}$. Hence, in general, we cannot talk about inverses in $L(H)$, but rather we have the unbounded operators
\[A^{-1}\colon D(A^{-1})\subsetneq H\to H,\quad A^{-1}Ax=x;\qquad A^{-1/2}\colon D(A^{-1/2})\subsetneq H\to H,\quad A^{-1/2}A^{1/2}x=x.\]
If $A$ is not injective, the following notion of generalized inverse is used:
\[A^{-1}x=y\iff Ay=x\text{ and }y\in N(A)^{\perp}.\tag{3.15}\]
The requirement that the pre-image $A^{-1}x$ be in $N(A)^{\perp}$ turns $A^{-1}$ into a well-defined linear map $A^{-1}\colon R(A)\to N(A)^{\perp}$. In fact, if $A^{-1}y=x\in N(A)^{\perp}$ and $A^{-1}y=\tilde x\in N(A)^{\perp}$, with $Ax=A\tilde x$, then $x-\tilde x\in N(A)^{\perp}\cap N(A)$, that is, $x=\tilde x$. The algebraic relations are $A^{-1}A=\mathrm{id}_{N(A)^{\perp}}$ and $AA^{-1}=\mathrm{id}_{R(A)}$. If $A$ is injective, we recover the usual notion of inverse.

Example 3.2.1. Let $P$ be an orthogonal projection. Then $P^{-1}\colon R(P)\to N(P)^{\perp}=R(P)$. For all $y\in PH$: $x=P^{-1}y\iff Px=y$ and $x\in N(P)^{\perp}=R(P)\iff x=y$, so that $P^{-1}$ is just the identity restricted to $R(P)$.

Integral operators. We now specialize to positive operators of an integral type. We first need the

Definition 3.2.2. Let $S$ be a set. A real-valued function $K(s,t)$ defined on $S\times S$ is called a positive definite kernel on $S$ if $K(t,s)=K(s,t)$, $s,t\in S$, and, for every integer $n\ge1$, every collection of points $s_1,\dots,s_n$ from $S$, and every collection of real numbers $y_1,\dots,y_n$, the following holds:
\[\sum_{i,j=1}^nK(s_i,s_j)y_iy_j\ge0.\]

Remark 3.2.1.
Observe that the matrix $(K(s_i,s_j))$ is only required to be symmetric positive semi-definite: there may exist $(y_1,\dots,y_n)\ne 0$ such that
\[ \sum_{i,j=1}^n K(s_i,s_j)\,y_i y_j = 0. \]
But traditionally the term "positive definite" is commonly used instead of "non-negative definite".

Additionally, specialize to a finite, positive Borel measure space $(S,\mu)$. Let $K\in L^2(S\times S,\mu\otimes\mu)$, that is,
\[ \int_S\int_S K(s,t)^2\,d\mu(s)\,d\mu(t) < \infty. \tag{3.16} \]
Then
\[ (Af)(t) = \int_S f(s)K(s,t)\,d\mu(s), \quad t\in S,\ f\in L^2(S,\mu), \tag{3.17} \]
defines an integral operator on $L^2(S,\mu)$.

Proposition 3.2.1. Let $K\in L^2(S\times S,\mu\otimes\mu)$ be a positive definite kernel. Then the integral operator $A$ defined in (3.17) is a positive, self-adjoint, compact operator in $L(H)$. Furthermore:
(a) The Hilbert-Schmidt norm of $A$ is
\[ |A|_{HS}^2 = |K|_{L^2(S\times S)}^2 = \int_S\int_S K(s,t)^2\,d\mu(s)\,d\mu(t). \]
(b) The trace norm of $A$ is
\[ |A|_1 = \int_S K(s,s)\,d\mu(s). \]
(c) Given a CONS $(m_k)$ formed by eigenvectors of $A$ with corresponding eigenvalues $\lambda_k$,
\[ K(s,t) = \sum_{k\ge 1}\lambda_k m_k(s)m_k(t) \quad\text{in } L^2(S\times S), \]
and $A^{1/2}$ is also an integral operator, with kernel
\[ K^{1/2}(s,t) = \sum_{k\ge 1}\sqrt{\lambda_k}\,m_k(s)m_k(t) \quad\text{in } L^2(S\times S). \]

Proof. It is readily seen that $A$ is positive and self-adjoint since $K$ is a positive definite kernel. For the trace norm, observe that $K(t,\cdot)\in L^2(S)$ for $\mu$-a.e. $t$ and, given a CONS $(m_k)$ of $L^2(S)$,
\[ K(t,s) = \sum_{k\ge 1}\Big(\int_S K(t,u)m_k(u)\,d\mu(u)\Big)m_k(s). \]
Hence, by Fubini's theorem and the dominated convergence theorem,
\[ \sum_{k\ge 1}(Am_k,m_k)_H = \sum_{k\ge 1}\int_S\Big(\int_S K(t,u)m_k(u)\,d\mu(u)\Big)m_k(t)\,d\mu(t) = \int_S\sum_{k\ge 1}\Big(\int_S K(t,u)m_k(u)\,d\mu(u)\Big)m_k(t)\,d\mu(t) = \int_S K(t,t)\,d\mu(t). \]
For the Hilbert-Schmidt norm, observe that
\[ AK(t,s) := \int_S K(t,u)K(s,u)\,d\mu(u). \]
Hence, by Fubini's theorem and the dominated convergence theorem,
\[ \sum_{k\ge 1}(Am_k,Am_k)_H = \sum_{k\ge 1}\int_S\Big(\int_S K(t,u)m_k(u)\,d\mu(u)\Big)\Big(\int_S K(t,s)m_k(s)\,d\mu(s)\Big)d\mu(t) = \sum_{k\ge 1}\int_S\int_S\Big(\int_S K(t,u)K(t,s)\,d\mu(t)\Big)m_k(u)m_k(s)\,d\mu(s)\,d\mu(u) \]
\[ = \sum_{k\ge 1}\int_S\int_S AK(u,s)\,m_k(u)m_k(s)\,d\mu(s)\,d\mu(u) = \int_S AK(s,s)\,d\mu(s) = \int_S\int_S K(s,u)^2\,d\mu(u)\,d\mu(s). \]
Concerning the representation of $K$: assuming further that $Am_k = \lambda_k m_k$, we conclude
\[ \int_S K(t,u)m_k(u)\,d\mu(u) = Am_k(t) = \lambda_k m_k(t), \]
and then $K(t,s) = \sum_{k\ge 1}\big(\int_S K(t,u)m_k(u)\,d\mu(u)\big)m_k(s) = \sum_{k\ge 1}\lambda_k m_k(t)m_k(s)$.
Concerning $A^{1/2}$: let $B$ be the integral operator with kernel
\[ K^{1/2}(s,t) := \sum_{k\ge 1}\sqrt{\lambda_k}\,m_k(s)m_k(t). \tag{3.18} \]
Note that the eigenfunctions of $B$ are the $m_k$'s and its eigenvalues are the $\sqrt{\lambda_k}$'s, which implies $B = A^{1/2}$ and $K^{1/2}$ is the kernel of $A^{1/2}$.

Remark 3.2.2. When $K$ is a positive definite kernel, $S$ is a compact topological space and $K$ is continuous, the series representation for $K(s,t)$ converges uniformly. This result is known as Mercer's theorem (see [DS63, XI.8.58]).

3.2.2 Reproducing kernel Hilbert spaces.

Now we introduce the

Definition 3.2.3. A function $K : S\times S\to\mathbb{R}$ is a reproducing kernel of a Hilbert space $H$ of functions on $S$ when
1. $K(s,\cdot)\in H$ for all $s\in S$;
2. $h(s) = (h,K(s,\cdot))_H$ for all $h\in H$, $s\in S$ (reproducing property).

A characterization of reproducing kernels is

Theorem 3.2.1. Let $H$ be a Hilbert space of functions $S\to\mathbb{R}$. Then the following are equivalent:
1. the evaluation map $\bar s : H\to\mathbb{R}$, $\bar s(h) = h(s)$, is continuous for all $s\in S$;
2. there exists a reproducing kernel for $H$.
Moreover, if $K$ is a reproducing kernel for $H$, then $K$ is unique and it is a positive definite kernel.

Proof. $1\Rightarrow 2$: For fixed $s\in S$, observe that $h\mapsto\bar s(h)$ is in the dual $H^*$. By the Riesz representation theorem, there exists a unique $g_s\in H$ such that $\bar s(h) = h(s) = (h,g_s)_H$. Then $K(s,t) := g_s(t)$ is a reproducing kernel.
$2\Rightarrow 1$: Given a reproducing kernel $K(s,t)$, $|\bar s(h)| = |h(s)| = |(h,K(s,\cdot))_H| \le |h|_H\,|K(s,\cdot)|_H$, as wanted.

Uniqueness: if $K$ and $K'$ are reproducing kernels, then
\[ |K(s,\cdot)-K'(s,\cdot)|_H^2 = (K(s,\cdot),K(s,\cdot)) + (K'(s,\cdot),K'(s,\cdot)) - (K'(s,\cdot),K(s,\cdot)) - (K(s,\cdot),K'(s,\cdot)) = K(s,s) + K'(s,s) - K'(s,s) - K(s,s) = 0, \]
due to the reproducing property. Since $s$ is arbitrary, $K = K'$. Now, given $y_1,\dots,y_n$ and $s_1,\dots,s_n\in S$,
\[ \Big|\sum_i y_i K(s_i,\cdot)\Big|_H^2 = \sum_{i,j} y_i y_j\,K(s_i,s_j) \ge 0. \]

Conversely,

Theorem 3.2.2 (Moore-Aronszajn). Given a positive definite kernel $K : S\times S\to\mathbb{R}$, there corresponds a unique Hilbert space $\mathcal{H}_K$ of functions on $S$ admitting $K$ as a reproducing kernel.

Proof. The Hilbert space $\mathcal{H}_K$ is the completion of
\[ \Big\{h : h = \sum_{i=1}^n a_i K(t_i,\cdot)\Big\} \]
under the inner product
\[ (h,g)_K = \sum_{i=1}^n\sum_{j=1}^m a_i b_j\,K(t_i,s_j), \]
where $h = \sum_{i=1}^n a_i K(t_i,\cdot)$ and $g = \sum_{j=1}^m b_j K(s_j,\cdot)$. See [BTA04, p. 19].

Remark 3.2.3. The space $\mathcal{H}_K$ is an abstract completion that, strictly speaking, is not a space of functions $S\to\mathbb{R}$, but it can be identified with one, since for all $f\in\mathcal{H}_K$ we have
\[ f(t) = (f,K(t,\cdot))_K. \]

Definition 3.2.4. A Hilbert space $H$ of functions $S\to\mathbb{R}$ that satisfies either one of the conditions 1. or 2. of Theorem 3.2.1 is called a reproducing kernel Hilbert space (RKHS) on $S$.

Example 3.2.2 (dual RKHS). The dual $H^*$ is a RKHS of functions on $H$: let $R$ denote the Riesz isometry, with inverse $R^{-1}x = (x,\cdot)_H$. For every $\varphi\in H^*$ and all $x\in H$, $(x,\cdot)_H\in H^*$ and
\[ \varphi(x) = (R\varphi,x)_H = (\varphi,R^{-1}x)_{H^*} = (\varphi,(x,\cdot)_H)_{H^*}. \]
Hence, the reproducing kernel is the inner product $(\cdot,\cdot)_H$.

Example 3.2.3 ($L^2(0,1)$ is not a RKHS). The space $L^2(0,1)$ with the usual inner product is not a RKHS over $(0,1)$. In fact, evaluating an element $[f]$ of $L^2(0,1)$ at a point is meaningless, as it depends on the representative $f$ of the equivalence class $[f]$. In any case, there could still exist $K$ such that, for all $\varphi\in L^2(0,1)$,
\[ \varphi(t) = (\varphi,K(\cdot,t))_2 = \int_0^1\varphi(r)K(r,t)\,dr, \quad\text{a.e. } t. \]
This is also impossible, see [BTA04, p. 8].

Example 3.2.4 (RKHS with norms involving derivatives I).
The space
\[ H_0^1([0,1]) = \Big\{f : f(t) = \int_0^t g(s)\,ds,\ g\in L^2(0,1)\Big\} = \Big\{f : f(t) = \int_0^t\dot f(s)\,ds,\ \dot f\in L^2(0,1)\Big\} \]
with inner product
\[ (f,g)_{H_0^1} = \int_0^1\dot f(s)\dot g(s)\,ds \]
is a RKHS with reproducing kernel
\[ K(s,t) = t\wedge s = \min(t,s). \]
In fact, $H_0^1([0,1])$ with this inner product is a Hilbert space of Sobolev type (see [AF03]) and, if $f\in H_0^1([0,1])$, by Cauchy-Schwarz,
\[ |\bar t(f)| = |f(t)| = \Big|\int_0^t\dot f(s)\,ds\Big| \le |f|_{H_0^1}\sqrt{t}, \quad t\in[0,1]. \]
To find the reproducing kernel: is there $K(s,t)$ such that, for all $t\in[0,1]$,
\[ f(t) = (f,K(t,\cdot))_{H_0^1} = \int_0^1\dot f(s)\,\frac{\partial}{\partial s}K(t,s)\,ds, \qquad K(t,0) = 0\,? \]
Since $f(t) = \int_0^1\dot f(s)\mathbf{1}_{[0,t]}(s)\,ds$, we can try to solve the problem $\frac{\partial}{\partial s}K(t,s) = \mathbf{1}_{[0,t]}(s)$, $K(t,0) = 0$, for each fixed $t$. A solution is $K(t,s) = t\wedge s$. Integrating by parts, it is readily verified that $\int_0^1\mathbf{1}_{[0,t]}(s)\varphi(s)\,ds = -\int_0^1(t\wedge s)\dot\varphi(s)\,ds$ for all smooth test functions $\varphi$ vanishing at the boundary of $[0,1]$. Hence $\frac{\partial}{\partial s}K(t,s) = \mathbf{1}_{[0,t]}(s)\in L^2(0,1)$ in the weak sense, and $t\wedge\cdot\in H_0^1([0,1])$.

Example 3.2.5 (RKHS with norms involving derivatives II). Similarly, let $f\in B_m$ iff $f^{(j)}(0) = 0$ for $j\in\{0,1,\dots,m-1\}$. We claim that
\[ H_0^m([0,1]) \equiv H_0^m = \{f\in B_m :\ f,\dot f,\dots,f^{(m-1)}\ \text{absolutely continuous and}\ f^{(m)}\in L^2(0,1)\} \]
with inner product
\[ (f,g)_{H_0^m} = \int_0^1 f^{(m)}(s)g^{(m)}(s)\,ds \]
is a RKHS with reproducing kernel
\[ K(s,t) = \int_0^1 G_m(s,u)G_m(t,u)\,du, \tag{3.19} \]
where
\[ G_m(t,u) = \frac{(t-u)_+^{m-1}}{(m-1)!}, \qquad (x)_+ = x\,\mathbf{1}_{x\ge 0}. \]
Indeed, by Taylor's expansion, for any $f\in H_0^m$,
\[ f(t) = \sum_{k=0}^{m-1}\frac{f^{(k)}(0)}{k!}t^k + \int_0^t\frac{(t-u)^{m-1}}{(m-1)!}f^{(m)}(u)\,du = \int_0^1 G_m(t,u)f^{(m)}(u)\,du. \]
Using the Cauchy-Schwarz inequality, the evaluation map is continuous:
\[ |\bar t(f)| \le \sqrt{\int_0^1 G_m(t,u)^2\,du}\ \sqrt{\int_0^1\big(f^{(m)}(u)\big)^2\,du} = \sqrt{K(t,t)}\,\|f\|_{H_0^m}. \]
To verify that $K$ is a reproducing kernel for $H_0^m$: firstly, $K(t,\cdot)\in H_0^m$ due to (3.19); secondly, $f(t) = (f,K(t,\cdot))_{H_0^m}$, since (3.19) implies $\frac{\partial^m}{\partial s^m}K(t,s) = G_m(t,s)$.

Example 3.2.6 (RKHS with norms involving derivatives III).
Continuing the previous example, let $\varphi_j(t) = t^{j-1}/(j-1)!$, $j = 1,2,\dots,m$, and denote by $\mathcal{P}_0^m$ the $m$-dimensional space of polynomials of degree at most $m-1$, spanned by $\varphi_1 = 1,\dots,\varphi_m = t^{m-1}/(m-1)!$. We have
\[ D^{i-1}\varphi_j(0) = \delta_{i,j} \tag{3.20} \]
and $D^m(\mathcal{P}_0^m) = 0$. On $\mathcal{P}_0^m$ define the inner product
\[ (\varphi,\psi)_{\mathcal{P}_0^m} = \sum_{j=0}^{m-1}D^j\varphi(0)\,D^j\psi(0), \]
which turns $\mathcal{P}_0^m$ into an $m$-dimensional Hilbert space with $\varphi_1,\dots,\varphi_m$ as an orthonormal basis. Also,
\[ K(t,s) = \sum_{j=1}^m\varphi_j(s)\varphi_j(t) \tag{3.21} \]
is the reproducing kernel. Indeed, for any element of the basis,
\[ \varphi_i(t) = \sum_{j=1}^m\varphi_j(t)(\varphi_i,\varphi_j)_{\mathcal{P}_0^m} = (\varphi_i,K(t,\cdot))_{\mathcal{P}_0^m}. \]

Example 3.2.7 (RKHS with norms involving derivatives IV). Given the previous two RKHS $\mathcal{P}_0^m$ and $H_0^m$, we can now define the so-called Sobolev-Hilbert space
\[ H^m([0,1]) \equiv H^m = \{f :\ f,\dot f,\dots,f^{(m-1)}\ \text{absolutely continuous and}\ f^{(m)}\in L^2(0,1)\}. \]
As seen in a previous example, by Taylor expansion any element of $H^m$ admits a decomposition $f = f_0 + f_1$ with $f_0\in\mathcal{P}_0^m$ and $f_1\in H_0^m$.

Claim: $H^m = \mathcal{P}_0^m\oplus H_0^m$, if we endow $H^m$ with the inner product
\[ (f,g)_{H^m} = (f,g)_{\mathcal{P}_0^m} + (f,g)_{H_0^m}, \]
and $\mathcal{P}_0^m$, $H_0^m$ are orthogonal.

Claim: $H^m$ is a RKHS with reproducing kernel
\[ K(s,t) = \sum_{j=1}^m\varphi_j(s)\varphi_j(t) + \int_0^1 G_m(s,u)G_m(t,u)\,du. \]
We refer to [W.90, p. 7] for the proof.

3.2.3 LDP for square integrable Gaussian processes.

Let $(X(t),\,t\in[0,T])$ be a real-valued Gaussian stochastic process with zero mean function $m(t) = \mathbb{E}X(t) = 0$ and covariance function $q(s,t) := \mathbb{E}X(s)X(t)$. In $L^2(0,T)$, the inner product is denoted by $(f,g) := \int_0^T f(s)g(s)\,ds$ and the norm by $|f|_2 = \sqrt{(f,f)}$. Assume $\mathbb{E}\int_0^T|X(t)|^2\,dt < \infty$. By Fubini's theorem,
\[ \int_0^T q(s,s)\,ds < \infty. \]
Also, observe that the covariance function is a positive definite kernel: it is symmetric and, given $y_1,\dots,y_n$ and $s_1,\dots,s_n\in[0,T]$,
\[ \sum_{i,j}y_i y_j\,q(s_i,s_j) = \mathbb{E}\Big(\sum_i y_i X(s_i)\Big)^2 \ge 0. \]
Also define the operator $Q$ on $L^2(0,T)$ by
\[ Qf(t) := \int_0^T q(s,t)f(s)\,ds, \quad f\in L^2(0,T). \]
Then, by Proposition 3.2.1 (b), $Q$ is positive, self-adjoint and of trace class:
\[ \operatorname{tr}(Q) = \int_0^T q(s,s)\,ds = \mathbb{E}\int_0^T|X(t)|^2\,dt. \tag{3.22} \]
Finally, on $L^2(0,T)$ we define the functional
\[ S(\varphi) := \begin{cases}\frac12|Q^{-1/2}\varphi|_2^2, & \text{if } \varphi\in Q^{1/2}L^2(0,T),\\ \infty, & \text{otherwise.}\end{cases} \tag{3.23} \]
We have the following result.

Theorem 3.2.3. The functional $\varepsilon^{-2}S(\varphi)$ is the action function for the Gaussian process $X_\varepsilon(t) := \varepsilon X(t)$ in $L^2(0,T)$ as $\varepsilon\to 0$.

Proof. See [FW98, Theorem 4.1].

Example 3.2.8 (Projections I). Consider the process $X_\varepsilon(t) := \varepsilon f(t)\xi$, where $f\in L^2(0,T)$, $T>0$, $\xi$ is a real-valued standard Gaussian random variable and $\varepsilon>0$. The process $X(t) = f(t)\xi$ is mean zero, Gaussian, with covariance function $q(s,t) = f(t)f(s)\mathbb{E}\xi^2 = f(t)f(s)$, so that $Q$ is defined by
\[ Q\varphi(t) = \int_0^T f(t)f(s)\varphi(s)\,ds = f(t)(f,\varphi)_2. \]
For convenience, assume $|f|_2 = 1$. Then $Q$ is the projection onto the subspace generated by $f$ in $L^2(0,T)$. A straightforward verification shows $Q$ has trace $\operatorname{tr}(Q) = |f|_2^2 = 1$. Furthermore, the null space of $Q$ is $N(Q) = \{u\in L^2(0,T) : (f,u) = 0\}$. In this case $Q^{-1}\varphi = \varphi$ if $\varphi = (f,\varphi)f$, so that, from Theorem 3.2.3, the normalized action function for $X_\varepsilon$ is given by
\[ S(\varphi) = \begin{cases}\frac12|\varphi|_2^2, & \text{if } \varphi = (f,\varphi)f,\\ \infty, & \text{otherwise.}\end{cases} \]

Example 3.2.9 (Projections II). Consider the family of processes
\[ X_\varepsilon(t) := \varepsilon X(t) := \varepsilon\sum_{k=1}^N m_k(t)\xi_k, \quad \varepsilon>0, \tag{3.24} \]
where $m_1,\dots,m_N$ form an orthonormal set in $L^2(0,T)$, $T>0$, $N$ is a positive integer, and the $\xi_k$ are i.i.d. real-valued standard Gaussian random variables. Then $X$ is a Gaussian process. To verify this: for any $0\le t_1 < \dots < t_M < \infty$ and any real $a_1,\dots,a_M$,
\[ Y = \sum_j a_j\sum_k m_k(t_j)\xi_k = \sum_k c_k\xi_k, \quad\text{where } c_k = \sum_j a_j m_k(t_j); \]
since $(\xi_1,\dots,\xi_N)$ is a Gaussian vector, $Y$ is a Gaussian random variable.
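The computation above reduces the Gaussianity of $Y$ to that of $\sum_k c_k\xi_k$; in particular, $Y\sim\mathcal{N}(0,\sum_k c_k^2)$. A hedged Monte Carlo sketch of this variance identity; the orthonormal family $m_k(t) = \sqrt{2}\sin(k\pi t)$ on $[0,1]$ and the points $t_j$ and weights $a_j$ below are illustrative choices, not taken from the text:

```python
import numpy as np

# Monte Carlo check for Example 3.2.9: with X(t) = sum_k m_k(t) xi_k,
# Y = sum_j a_j X(t_j) = sum_k c_k xi_k where c_k = sum_j a_j m_k(t_j),
# hence Var(Y) = sum_k c_k^2.  The m_k below are an assumed orthonormal
# family on [0,1] chosen only for illustration.
rng = np.random.default_rng(3)
N = 4
t_pts = np.array([0.2, 0.5, 0.7])
a = np.array([1.0, -2.0, 0.5])

def m(k, t):
    return np.sqrt(2.0) * np.sin(k * np.pi * t)

c = np.array([a @ m(k, t_pts) for k in range(1, N + 1)])
var_exact = np.sum(c**2)

xi = rng.standard_normal((200000, N))
M = np.array([[m(k, t) for t in t_pts] for k in range(1, N + 1)])
Y = (xi @ M) @ a                 # samples of Y = sum_j a_j X(t_j)

print(Y.var(), var_exact)        # the two values agree closely
```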
The covariance function is
\[ q(s,t) = \mathbb{E}X(t)X(s) = \sum_{k,l}m_k(t)m_l(s)\,\mathbb{E}\xi_k\xi_l = \sum_{k=1}^N m_k(t)m_k(s), \]
and the covariance operator is
\[ Q\varphi(t) = \int_0^T q(s,t)\varphi(s)\,ds = \sum_k\Big(\int_0^T m_k(s)\varphi(s)\,ds\Big)m_k(t) = \sum_k(\varphi,m_k)_2\,m_k(t). \]
The trace is $\operatorname{tr}(Q) = N$, since $Q$ is the projection onto the subspace generated by $m_1,\dots,m_N$, say $\operatorname{span}\{m_1,\dots,m_N\}$. Furthermore, the null space of $Q$ is $N(Q) = \operatorname{span}\{m_1,\dots,m_N\}^\perp$, and $N(Q)^\perp = \operatorname{span}\{m_1,\dots,m_N\}$. Therefore $Q^{-1}\varphi = \varphi$ for all $\varphi\in\operatorname{span}\{m_1,\dots,m_N\}$. From Theorem 3.2.3, the normalized action function for $X_\varepsilon$ is given by the functional defined on $L^2(0,T)$
\[ S(\varphi) = \begin{cases}\frac12|\varphi|_2^2, & \text{if } \varphi\in\operatorname{span}\{m_1,\dots,m_N\},\\ \infty, & \text{otherwise}\end{cases} \tag{3.25} \]
\[ \phantom{S(\varphi)} = \begin{cases}\frac12\sum_{k=1}^N y_k^2, & \text{if } \varphi = \sum_{k=1}^N y_k m_k,\\ \infty, & \text{otherwise.}\end{cases} \tag{3.26} \]

3.3 Gaussian measures on separable Banach spaces.

The main objective of this section is to provide the background for, and to discuss, the LDP for Gaussian measures on separable Banach spaces that appears in [DPZ92, Theorem 12.1.2], [KO78], [DS89, Chapter III, Section 3.4] and [FW98, Chapter 4].

The subject of Gaussian measures has been extensively treated elsewhere: see, for instance, [Bog98], where Gaussian measures $\gamma$ on locally convex linear topological spaces $X$ are considered, and [Bax76], where Gaussian measures over function spaces are discussed. For the route we are taking, it would be sufficient to consider the definition on Hilbert spaces, but to obtain a wider perspective we opt for a compromise between abstraction and tractability and consider it on a separable Banach space $E$.

The content of this section is adapted from [Hai09, DPZ92]. The construction of the covariance operator associated with a Gaussian measure and the definition of the Cameron-Martin space presented here were motivated by [DPZ92, Section 2.2.2] and [Hai09, Exercise 3.25, p. 16]; the subsection devoted to Hilbert spaces was adapted from [Bog98, Chapter 2, Section 2.3].

Let $(E,\|\cdot\|)$ be a real separable Banach space and denote its topological dual by $E^*$.
We will define a Gaussian measure on the Borel sets of $E$. A Gaussian measure on a finite-dimensional space is characterized by its projections onto the one-dimensional subspaces, and this property is suited for generalization to infinite-dimensional spaces.

Definition 3.3.1. A Borel measure $\gamma$ on $E$ is called a Gaussian measure if $\gamma\circ\varphi^{-1}$ is a Gaussian probability measure on $\mathbb{R}$ for every $\varphi\in E^*$. We say $\gamma$ is centered if $\gamma\circ\varphi^{-1}$ has zero mean for every $\varphi\in E^*$.

Remark 3.3.1.
1. This definition is "correct": if $\gamma$, $\nu$ are two Borel probability measures on $E$, then $\gamma = \nu$ iff $\gamma\circ\varphi^{-1} = \nu\circ\varphi^{-1}$ for all $\varphi\in E^*$. See [Hai09, Proposition 3.6].
2. It follows immediately that $E^*\subset L^p(E,\gamma)$, $p>0$.

Unless said otherwise, the Gaussian measures considered here are all centered.

3.3.1 Cameron-Martin space: Banach case.

We will now discuss the Cameron-Martin space of a Gaussian measure $\gamma$ on a separable Banach space $E$. To understand how it is constructed, we first present some preliminaries. Right away, we need the following result:

Theorem 3.3.1 (Fernique). There exists $\lambda>0$ such that $\int_E e^{\lambda\|x\|^2}\gamma(dx) < \infty$.

Proof. See [DPZ92, Theorem 2.6] or [Bog98, Corollary 2.8.6].

Fernique's theorem implies the existence of all moments: $\int_E\|x\|^k\gamma(dx) < \infty$ for all $k>0$. It is also convenient to introduce

Definition 3.3.2. Define $C_\gamma : E^*\times E^*\to\mathbb{R}$ by
\[ C_\gamma(\varphi,\psi) = \int_E\varphi(x)\psi(x)\,\gamma(dx). \tag{3.27} \]
$C_\gamma$ is called the covariance of $\gamma$. Also, define $\hat C_\gamma : E^*\to E$ by
\[ \hat C_\gamma(\varphi) = \int_E x\,\varphi(x)\,\gamma(dx). \tag{3.28} \]
$\hat C_\gamma$ is called the covariance operator of $\gamma$.

Remark 3.3.2. Considering the canonical random element $X : (E,\gamma)\to E$, $X(x) = x$, we have $C_\gamma(f,g) = \mathbb{E}f(X)g(X)$, $f,g\in E^*$.

The previous definitions make sense due to the estimates below, based on Fernique's theorem. $C_\gamma$ takes values in $\mathbb{R}$, since
\[ |C_\gamma(\varphi,\psi)| \le \|\varphi\|_{E^*}\|\psi\|_{E^*}\int_E\|x\|^2\gamma(dx) < \infty. \]
$\hat C_\gamma$ takes values in $E$, since
\[ \int_E\|x\,\varphi(x)\|\,\gamma(dx) \le \|\varphi\|_{E^*}\int_E\|x\|^2\gamma(dx) < \infty; \]
therefore $\int_E x\,\varphi(x)\,\gamma(dx)\in E$, interpreting the integral in the Bochner sense (see [Yos95, Section V.5]).
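In finite dimensions, where $E = \mathbb{R}^d$ and $\gamma = \mathcal{N}(0,Q)$, the covariance operator (3.28) reduces to $\hat C_\gamma(\varphi) = Qu$ for $\varphi = (u,\cdot)$. A hedged Monte Carlo sketch; the matrix $Q$ and vector $u$ below are arbitrary illustrative choices:

```python
import numpy as np

# Finite-dimensional check of Definition 3.3.2: on R^d with
# gamma = N(0, Q) and phi = (u, .), the covariance operator satisfies
# C_gamma_hat(phi) = E[X phi(X)] = Q u.  Q and u are assumptions chosen
# only for illustration.
rng = np.random.default_rng(5)
Q = np.diag([2.0, 1.0, 0.25])
u = np.array([1.0, -1.0, 3.0])

X = rng.standard_normal((400000, 3)) * np.sqrt(np.diag(Q))  # samples ~ N(0, Q)
C_hat_phi = (X * (X @ u)[:, None]).mean(axis=0)             # E[X phi(X)]

print(C_hat_phi, Q @ u)          # the vectors agree closely
```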
We have the $L^2$-estimates
\[ \|\hat C_\gamma(\varphi)\| \le |\varphi|_2\Big(\int_E\|x\|^2\gamma(dx)\Big)^{1/2}, \tag{3.29} \]
\[ |C_\gamma(\varphi,\psi)| \le \sqrt{C_\gamma(\varphi,\varphi)\,C_\gamma(\psi,\psi)} = |\varphi|_2|\psi|_2. \tag{3.30} \]
$C_\gamma$ is just the $L^2(E,\gamma)$ inner product restricted to $E^*\times E^*$; so it is a symmetric, bilinear and nonnegative form on $E^*$. Note that $C_\gamma(\varphi,\varphi) = 0$ implies $\varphi = 0$ $\gamma$-a.e., but it could happen that $\varphi\ne 0$ in $E^*$, so $C_\gamma$ is an inner product on $E^*$ only when we identify functionals that are equal $\gamma$-a.s.

Denote the completion of $E^*$ in the $C_\gamma$ (i.e., $L^2$) norm by $E_2^*$. The previous $L^2$-estimates (3.30), (3.29) allow us to extend $C_\gamma$ and $\hat C_\gamma$ to $E_2^*\times E_2^*$ and $E_2^*$, respectively. We will abuse notation and keep using $C_\gamma$ and $\hat C_\gamma$ for the extensions. Let $\mathcal{H}_\gamma = \hat C_\gamma(E_2^*)\subset E$ and define the inner product
\[ (\hat C_\gamma(\varphi),\hat C_\gamma(\psi))_\gamma = C_\gamma(\varphi,\psi), \quad \varphi,\psi\in E_2^*. \tag{3.31} \]

Definition 3.3.3. The Hilbert space $(\mathcal{H}_\gamma,(\cdot,\cdot)_\gamma)$ is called the Cameron-Martin (CM) space of $\gamma$.

By construction, one readily verifies that $(\mathcal{H}_\gamma,(\cdot,\cdot)_\gamma)$ is a Hilbert space and that $\hat C_\gamma : E_2^*\to\mathcal{H}_\gamma$ is a linear isometry. We summarize the previous discussion and include a few more facts in the

Proposition 3.3.1.
1. The space $(\mathcal{H}_\gamma,(\cdot,\cdot)_\gamma)$ is a separable Hilbert space continuously included in $E$, and $\hat C_\gamma : E_2^*\to\mathcal{H}_\gamma$ is an isometry.
2. For every $\varphi\in E^*$, $\mathcal{L}(\varphi) = \mathcal{N}(0,|\varphi|_\gamma^2)$, where $|\varphi|_\gamma = \sup\{|\varphi(h)| : |h|_\gamma\le 1\}$.

Proof. Separability: since $E$ is separable, so is $L^2(E,\gamma)$; then so is the subset $E_2^*\subset L^2(E,\gamma)$ and its isometric counterpart $\mathcal{H}_\gamma$. Embedding: by Fernique's theorem and (3.29),
\[ \|\hat C_\gamma(\varphi)\| \le |\varphi|_{L^2}\Big(\int_E\|x\|^2\gamma(dx)\Big)^{1/2} = |\hat C_\gamma(\varphi)|_\gamma\Big(\int_E\|x\|^2\gamma(dx)\Big)^{1/2}, \]
showing the inclusion is continuous. The second statement also follows: for every $\psi\in E^*$, $\mathcal{L}(\psi) = \mathcal{N}(0,|\psi|_{L^2}^2)$ and
\[ |\psi|_{L^2} = \sup\{|C_\gamma(\psi,\varphi)| : |\varphi|_{L^2} = 1\} = \sup\{|\psi(\hat C_\gamma(\varphi))| : |\hat C_\gamma(\varphi)|_\gamma = 1\} = \sup\{|\psi(h)| : |h|_\gamma = 1\} = |\psi|_\gamma. \]
In the last step, $\psi\in E^*$ was identified with its restriction to $\mathcal{H}_\gamma$, an element of $\mathcal{H}_\gamma^*\cong\mathcal{H}_\gamma$.

Remark 3.3.3. In fact, with an approximation argument, we can show that the second statement of the previous proposition holds for all $\varphi\in E_2^*$ (see [Hai09, Proposition 3.37, p. 18]).
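Proposition 3.3.1(2) can be sanity-checked in finite dimensions: for $\gamma = \mathcal{N}(0,Q)$ on $\mathbb{R}^d$ with $Q$ non-degenerate, the CM norm is $|h|_\gamma = |Q^{-1/2}h|$ (anticipating Theorem 3.3.5 below), so for $\varphi = (u,\cdot)$ one computes $|\varphi|_\gamma = \sup_{|Q^{-1/2}h|\le 1}|(u,h)| = |Q^{1/2}u|$, which must equal the standard deviation of $\varphi$ under $\gamma$. A hedged sketch; the matrix $Q$ and vector $u$ are arbitrary illustrative choices:

```python
import numpy as np

# Finite-dimensional sanity check of Proposition 3.3.1(2): on R^d with
# gamma = N(0, Q), the law of phi = (u, .) is N(0, |phi|_gamma^2) with
# |phi|_gamma = |Q^{1/2} u|.  Q (SPD) and u are illustrative assumptions.
rng = np.random.default_rng(4)
Q = np.array([[2.0, 0.5, 0.0],
              [0.5, 1.0, 0.3],
              [0.0, 0.3, 0.5]])
u = np.array([1.0, -1.0, 2.0])

w, V = np.linalg.eigh(Q)
Q_half = V @ np.diag(np.sqrt(w)) @ V.T          # symmetric square root

phi_cm_norm = np.linalg.norm(Q_half @ u)        # |Q^{1/2} u|

X = rng.standard_normal((200000, 3)) @ Q_half   # samples ~ N(0, Q)
print((X @ u).std(), phi_cm_norm)               # the two values agree closely
```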
3.3.1.1 A note on nomenclature: $\mathcal{H}_\gamma$ vs $\mathcal{H}_{C_\gamma}$.

Some authors (see [DPZ92, Section 2.2.2]) refer to the Cameron-Martin space $\mathcal{H}_\gamma$ as the "reproducing kernel space for $\gamma$" or "reproducing kernel Hilbert space" (see [vdVvZ08]). For some other authors (see [Bog98, p. 44]) it is the space $E_2^*$ that is called the "reproducing kernel Hilbert space of $\gamma$". Note well: in Definition 3.2.3 we already defined what we mean by the reproducing kernel Hilbert space $\mathcal{H}_K$ associated with a positive definite kernel $K$. We show below, in Proposition 3.3.2, that the CM space $\mathcal{H}_\gamma$ is indeed a RKHS over $E^*$; in fact, $\mathcal{H}_\gamma = \mathcal{H}_{C_\gamma}$.

We have defined $\mathcal{H}_\gamma$ in two steps: by forming $E_2^*$ (the completion of $E^*$ under the $L^2(\gamma)$ inner product) and then taking its image under $\hat C_\gamma$ (more precisely, under the extension of $\hat C_\gamma$ to $E_2^*$). We will now see that we can instead complete $\hat C_\gamma(E^*)$ under the inner product $(\cdot,\cdot)_\gamma$ defined in (3.31). This follows from the

Lemma 3.3.1. Let $E$, $F$ be two vector spaces with scalar product. Let $T : E\to F$ be a linear isometry. Then
\[ T\big(\overline{E}^{(\cdot,\cdot)_E}\big) = \overline{T(E)}^{(\cdot,\cdot)_F}, \]
where $\overline{G}^{(\cdot,\cdot)}$ denotes the completion of $G$ under the inner product $(\cdot,\cdot)$.

Proof. The inclusion $T(\overline{E}^{(\cdot,\cdot)_E})\subset\overline{T(E)}^{(\cdot,\cdot)_F}$ follows immediately from the definition of the extension of $T$ by continuity to $\overline{E}^{(\cdot,\cdot)_E}$. For the other inclusion, observe that $T(E)\subset T(\overline{E}^{(\cdot,\cdot)_E})$ and that $T(\overline{E}^{(\cdot,\cdot)_E})$ is $(\cdot,\cdot)_F$-complete; hence $\overline{T(E)}^{(\cdot,\cdot)_F}\subset T(\overline{E}^{(\cdot,\cdot)_E})$.

Corollary 3.3.1. $\mathcal{H}_\gamma = \overline{\hat C_\gamma(E^*)}^{(\cdot,\cdot)_\gamma}$.

Proposition 3.3.2. The covariance $C_\gamma$ is a positive definite kernel on $E^*$, and the RKHS associated with $C_\gamma$ is the Cameron-Martin space of $\gamma$: $\mathcal{H}_{C_\gamma} = \mathcal{H}_\gamma$.

Proof. The covariance $C_\gamma$ is a positive definite kernel: given $a_1,\dots,a_n\in\mathbb{R}$ and $\varphi_1,\dots,\varphi_n\in E^*$, one way to verify this claim is
\[ \sum_{j,k}a_k a_j\,C_\gamma(\varphi_k,\varphi_j) = \mathbb{E}\Big(\sum_{i=1}^n a_i\varphi_i(X)\Big)^2 \ge 0, \]
where $X : (E,\gamma)\to E$ is the canonical random element $X(x) = x$, whose law is $\gamma$, and $\mathbb{E}$ is the expectation under $\gamma$.

Observe that
\[ \Big\{h : h = \sum_{i=1}^n a_i\,C_\gamma(\varphi_i,\cdot)\Big\} = \Big\{\hat C_\gamma\Big(\sum_{i=1}^n a_i\varphi_i\Big)\Big\} = \hat C_\gamma(E^*). \]
By the Moore-Aronszajn Theorem 3.2.2, $\mathcal{H}_{C_\gamma} = \overline{\hat C_\gamma(E^*)}^{C_\gamma}$, where $(\varphi,\psi)_{C_\gamma} = \sum_{i,j=1}^n a_i b_j\,C_\gamma(\varphi_i,\psi_j)$. Now, since $|\cdot|_\gamma = |\cdot|_{C_\gamma}$ on $\hat C_\gamma(E^*)$:
\[ |\hat C_\gamma(\varphi)|_{C_\gamma}^2 = C_\gamma(\varphi,\varphi) = |\hat C_\gamma(\varphi)|_\gamma^2, \]
then $\mathcal{H}_{C_\gamma} = \overline{\hat C_\gamma(E^*)}^{C_\gamma} = \overline{\hat C_\gamma(E^*)}^{(\cdot,\cdot)_\gamma} = \mathcal{H}_\gamma$ (Corollary 3.3.1).

Remark 3.3.4. Here is another way to verify that $C_\gamma$ is positive definite: since $(\hat C_\gamma(\varphi_j),\hat C_\gamma(\varphi_k))_\gamma = C_\gamma(\varphi_j,\varphi_k)$, we have
\[ \sum_{j,k}y_k y_j\,C_\gamma(\varphi_k,\varphi_j) = \Big|\sum_k y_k\,\hat C_\gamma(\varphi_k)\Big|_\gamma^2 \ge 0. \]

3.3.1.2 Properties of the Cameron-Martin space.

The Cameron-Martin space identifies a Gaussian measure $\gamma$. Before showing this, it is convenient to consider yet another characterization: we recall that the Fourier transform of a measure $\gamma$ on $E$ is
\[ \hat\gamma(\varphi) = \int_E e^{\sqrt{-1}\,\varphi(x)}\,\gamma(dx), \quad \varphi\in E^*. \]

Theorem 3.3.2.
1. The measure $\gamma$ is a centered Gaussian measure on $E$ if and only if
\[ \hat\gamma(\varphi) = \exp\Big(-\frac12 B(\varphi,\varphi)\Big), \quad \varphi\in E^*, \]
for some bilinear, symmetric, positive form $B : E^*\times E^*\to\mathbb{R}$ ($B(f,f)\ge 0$, $f\in E^*$). In this case $B = C_\gamma$.
2. Two measures coincide whenever their respective Fourier transforms coincide.

Proof. See [Bog98, Theorem 2.2.4] for a proof of the first statement. The second statement is proved in [Bog07, Lemma 7.3.5].

Proposition 3.3.3. If $\gamma$ and $\nu$ are two Gaussian measures on a Banach space $E$ such that $\mathcal{H}_\gamma = \mathcal{H}_\nu$ and $|h|_\gamma = |h|_\nu$ for all $h\in\mathcal{H}_\gamma$, then $\gamma = \nu$.

Proof. Since $C_\gamma(\varphi,\varphi) = |\varphi|_\gamma^2 = |\varphi|_\nu^2 = C_\nu(\varphi,\varphi)$ for all $\varphi\in E^*$, the result follows from Theorem 3.3.2.

A reason for studying $\mathcal{H}_\gamma$ is the following theorem. A seminal version of this theorem concerned the measure induced by the standard Brownian motion on $C_0([0,1])$ and can be traced back to [CM44].

Theorem 3.3.3 (Cameron-Martin). Given $h\in E$, $\gamma(\cdot-h)$ is equivalent to $\gamma$ if and only if $h\in\mathcal{H}_\gamma$. If $h\in\mathcal{H}_\gamma$, the Radon-Nikodym derivative is given by
\[ \frac{d\gamma(\cdot-h)}{d\gamma}(x) = e^{(x,h)_\gamma - \frac12|h|_\gamma^2}, \]
where $(\cdot,h)_\gamma$ denotes the $\gamma$-a.s. defined extension of the Cameron-Martin inner product. If $h\notin\mathcal{H}_\gamma$, then $\gamma(\cdot-h)$ and $\gamma$ are mutually singular.

Proof.
For a complete proof (even in a more general setting than Banach spaces) see [Bog98, Corollary 2.4.3, Theorem 2.4.5]. For a proof of everything except the statement about singularity see [Hai09, Proposition 3.41, p. 19].

The previous theorem is a characterization of the CM space, and therefore it can be used as its definition.

3.3.2 Cameron-Martin space: Hilbert case.

Now let $E = H$ be a separable Hilbert space. In this case, due to Riesz representation, we identify $H^*\cong H$; this yields some specializations and explicit computations.

Theorem 3.3.4.
1. If $\gamma$ is a centered Gaussian measure, then there exists a positive, self-adjoint, trace class operator $Q\in L(H)$ such that
\[ \int_H(u,x)_H(v,x)_H\,\gamma(dx) = (Qu,v)_H. \tag{3.32} \]
Here $Q$ is called the covariance operator of $\gamma$. Furthermore, the trace is given by $\operatorname{tr}(Q) = \int_H|x|_H^2\,\gamma(dx)$.
2. Conversely, given a positive, self-adjoint, trace class operator $Q\in L(H)$, there is a centered Gaussian measure $\gamma$ such that (3.32) holds.

Proof. First statement: consider the form
\[ B(u,v) = \int_H(u,x)_H(v,x)_H\,\gamma(dx). \]
It is bilinear, positive, symmetric and continuous in each entry:
\[ |B(u,v)| \le |u|_H|v|_H\int_H|x|_H^2\,\gamma(dx). \]
The estimate is finite due to Fernique's theorem. It follows that, for each $u\in H$, $B(u,\cdot)\in H^*$. By Riesz representation, there exists $x(u)\in H$ such that $B(u,\cdot) = (x(u),\cdot)_H$. By bilinearity and continuity, it follows that $Q : u\mapsto x(u)$ is a bounded operator. Since $B$ is symmetric and positive, $Q$ is self-adjoint and positive. Also, let $(e_k)$ be a CONS of $H$, so that $|x|_H^2 = \sum_{k\ge 1}(x,e_k)_H^2$ for all $x\in H$. By the dominated convergence theorem,
\[ \operatorname{tr}(Q) = \sum_{k\ge 1}(Qe_k,e_k)_H = \int_H\sum_{k\ge 1}(e_k,x)_H^2\,\gamma(dx) = \int_H|x|_H^2\,\gamma(dx). \]
Second statement: here we construct a random element $X$ with values in $H$ whose law $\gamma$ has covariance operator $Q$. Define
\[ X = \sum_{j\ge 1}\sqrt{\lambda_j}\,\xi_j e_j, \]
where the $\xi_j$ are independent $\mathcal{N}(0,1)$ real random variables and $(e_j)$ is a CONS of eigenvectors of $Q$ with corresponding eigenvalues $\lambda_j$. The series converges in $L^2(\Omega,\mathcal{F},\mathbb{P};H)$ (and hence a.s.) since
\[ \mathbb{E}|X|_H^2 = \sum_{j\ge 1}\mathbb{E}\big(\sqrt{\lambda_j}\,\xi_j\big)^2 = \sum_{j\ge 1}\lambda_j = \operatorname{tr}(Q) < \infty. \]
Let $\gamma$ be the induced law of $X$ in $H$. Then by definition we have independent $(X,e_j)_H = \sqrt{\lambda_j}\,\xi_j\sim\mathcal{N}(0,\lambda_j)$. Now, by the dominated convergence theorem, the Fourier transform of $\gamma$ is, for all $h\in H$,
\[ \hat\gamma(h) = \int_H\exp\big(\sqrt{-1}\,(x,h)_H\big)\,\gamma(dx) = \lim_{n\to\infty}\exp\Big(-\frac12\sum_{j=1}^n\lambda_j(e_j,h)_H^2\Big) = \exp\Big(-\frac12(Qh,h)_H\Big). \]
By Theorem 3.3.2, the measure $\gamma$ is Gaussian with covariance operator $Q$.

Remark 3.3.5. A proof without Fernique's theorem can be obtained (see [DPZ92, Section 3.3.2]).

Remark 3.3.6. It follows that there is no Gaussian measure on an infinite-dimensional Hilbert space $H$ with Fourier transform
\[ \hat\gamma(x) = \exp\Big(-\frac12|x|_H^2\Big); \]
otherwise the corresponding $Q$ would not be trace class.

Remark 3.3.7. Identifying $H^*\cong H$, $B$ is the covariance $C_\gamma$ and $Q$ is the covariance operator $\hat C_\gamma$ (see Definition 3.3.2).

After these preparations, let us identify the CM space of $\gamma$ on a Hilbert space in terms of its covariance operator $Q$. By definition, $\mathcal{H}_\gamma = \hat C_\gamma(H_2)$, where $H_2$ is the completion of $H$ under the $L^2(\gamma)$ inner product. In this case, $C_\gamma(u,v) = (Qu,v)_H$, $\hat C_\gamma = Q$, and $(\cdot,\cdot)_\gamma$ is defined so that $Q$ is a linear isometry: $(Qu,Qv)_\gamma = C_\gamma(u,v)$. Therefore, by Corollary 3.3.1,
\[ \mathcal{H}_\gamma = \overline{QH}^{(\cdot,\cdot)_\gamma}. \]
In the following paragraphs we make this representation more explicit.

3.3.2.1 The non-degenerate case.

Definition 3.3.4. Let $A$ be a positive operator on $H$. We say $A$ is non-degenerate if $(Ah,h) > 0$ for all $h\ne 0$.

Remark 3.3.8. It is readily seen that a positive self-adjoint operator is non-degenerate if and only if it is injective.

Theorem 3.3.5. Let $Q$ be a non-degenerate covariance operator of a Gaussian measure on a separable Hilbert space $H$. Then
1. $\mathcal{H}_\gamma = Q^{1/2}H$;
2. $|u|_\gamma = |Q^{-1/2}u|_H$, $u\in\mathcal{H}_\gamma$.

Proof. Observe that
\[ (Qu,Qv)_\gamma = (Qu,v)_H = (Q^{1/2}u,Q^{1/2}v)_H, \]
hence $Q^{1/2} : (Q^{1/2}H,|\cdot|_H)\to(QH,|\cdot|_\gamma)$ is an isometry.
Since $Q$ is non-degenerate, $\overline{Q^{1/2}H} = H$: in fact, by contradiction, if $\overline{Q^{1/2}H}\ne H$, there is $x\ne 0$ orthogonal to $Q^{1/2}H$, so $(x,Q^{1/2}h)_H = 0$ for all $h\in H$; letting $h = Q^{1/2}x$, we obtain $|Q^{1/2}x|_H^2 = (Qx,x)_H = 0$, contradicting non-degeneracy. Therefore, we can extend $Q^{1/2}$ to an isometry
\[ Q^{1/2} : (H,|\cdot|_H)\to\big(Q^{1/2}H,|\cdot|_\gamma\big). \]
This shows that $(Q^{1/2}H,|\cdot|_\gamma)$ is complete and, since $QH\subset Q^{1/2}H$, we have $\mathcal{H}_\gamma := \overline{QH}^{(\cdot,\cdot)_\gamma} = Q^{1/2}H$.

As for the second statement, for any $u\in\mathcal{H}_\gamma$, $|u|_\gamma = |Q^{1/2}Q^{-1/2}u|_\gamma = |Q^{-1/2}u|_H$.

A normal triple: covariance space and dual covariance space. A canonical Hilbert-Schmidt embedding. The fact that $Q$ is non-degenerate adds structure in the following sense: we can obtain a normal triple out of a non-degenerate, positive self-adjoint operator in $L(H)$ (not necessarily trace class).

Definition 3.3.5. A triple of Hilbert spaces $(V,H,X)$ such that
1. $(V,|\cdot|_V)$ is continuously and densely embedded in $(H,|\cdot|_H)$;
2. $(H,|\cdot|_H)$ is continuously and densely embedded in $(X,|\cdot|_X)$;
3. there exists $C>0$ with $|(v,h)_H| \le C|v|_V|h|_X$ for all $v\in V$, $h\in H$;
is called a normal triple.

Example 3.3.1 (Covariance space, dual covariance space). Let $Q$ be a positive self-adjoint, non-degenerate operator on $H$. Then we can define the inner products
\[ (Qu,Qv)_Q = (Qu,v)_H\ \text{ on } QH; \qquad (u,v)_{-1} = (Qu,v)_H\ \text{ on } H. \]
The non-degeneracy of $Q$ implies: $(u,u)_{-1} = 0$ only if $u = 0$.
1. We have the continuous inclusions
\[ (QH,|\cdot|_Q)\subset(H,|\cdot|_H)\subset(H,|\cdot|_{-1}). \]
In fact, the first inclusion is continuous since
\[ |Qu|_H = |Q^{1/2}Q^{1/2}u|_H \le C|Q^{1/2}u|_H = C|Qu|_Q, \quad\text{with } C = \|Q^{1/2}\|. \]
The continuity of the other inclusion is obtained similarly.
2. The inclusion $(QH,|\cdot|_Q)\subset(H,|\cdot|_H)$ is dense (see the proof of Proposition 3.3.5).
3. To construct the duality, consider the bilinear form $B(\cdot,\cdot) : QH\times H\to\mathbb{R}$, $B(Qu,v) = (Qu,v)_H$. By the Cauchy-Schwarz inequality, it follows that
\[ |B(Qu,h)| = |(Q^{1/2}u,Q^{1/2}h)_H| \le |Q^{1/2}u|_H|Q^{1/2}h|_H = |Qu|_Q|h|_{-1}. \]
Therefore, taking the completion of $QH$ with respect to $|\cdot|_Q$, denoted by $H_Q$, and the completion of $H$ with respect to $|\cdot|_{-1}$, denoted by $H_Q^{-1}$, it is possible to extend the form $B$ uniquely to a bilinear form $[\cdot,\cdot]$ on $H_Q\times H_Q^{-1}$, which defines the duality. So, we have the normal triple $(H_Q,H,H_Q^{-1})$. The space $H_Q$ is called the covariance space, and $H_Q^{-1}$ the dual covariance space.

Remark 3.3.9. Observe that
\[ (Qu,Qv)_Q = (Q^{1/2}u,Q^{1/2}v)_H = (u,v)_{-1}; \]
therefore we have the isometries
\[ Q^{1/2} : (H,|\cdot|_{-1})\to(Q^{1/2}H,|\cdot|_H), \qquad Q^{1/2} : (Q^{1/2}H,|\cdot|_H)\to(QH,|\cdot|_Q). \]
These can be extended to isometries that we still denote by $Q^{1/2}$:
\[ Q^{1/2} : (H_Q^{-1},|\cdot|_{-1})\to(H,|\cdot|_H), \qquad Q^{1/2} : (H,|\cdot|_H)\to(H_Q,|\cdot|_Q). \]
Therefore $H = Q^{1/2}H_Q^{-1}$ and $H_Q = Q^{1/2}H$.

Remark 3.3.10. We can identify $H_Q^{-1}$ with the topological dual $H_Q^*$ through the duality $[\cdot,\cdot]$.

Remark 3.3.11. Suppose $H = \mathbb{R}^2$ with the usual inner product and
\[ Q = Q^{1/2} = \begin{pmatrix}1 & 0\\ 0 & 0\end{pmatrix}; \]
then $QH\cong\mathbb{R}$, which is not dense in $\mathbb{R}^2$. Therefore, the non-degeneracy of $Q$ is required for the normal triple structure to appear.

Remark 3.3.12. Recalling Example 3.2.2, we could abuse language and say, due to the duality, that $V$ is a RKHS over $X$, and $H_Q$ a RKHS over $H_Q^{-1}$; however, we will not use such provisions.

Remark 3.3.13. Additionally, if $Q$ is trace class, then $Q$ corresponds to a Gaussian measure $\gamma$ on $H$. It is readily seen that $H_Q = \mathcal{H}_{C_\gamma} = \mathcal{H}_\gamma = \overline{QH}^{|\cdot|_Q} = Q^{1/2}H$ and $H_Q^{-1} = \overline{H}^{|\cdot|_{-1}} = H_2$.

Series representation of the spaces $H_Q$, $H_Q^{-1}$.

Proposition 3.3.4. Let $Q$ be a positive self-adjoint, trace class and non-degenerate operator. Let $(e_k)$ be a CONS of eigenvectors of $Q$ with corresponding eigenvalues $\lambda_k$. Then
\[ H_Q = \Big\{h = \sum_k h_k e_k\in H : \sum_k\frac{|h_k|^2}{\lambda_k} < \infty\Big\}, \qquad |h|_Q^2 = \sum_k\frac{|h_k|^2}{\lambda_k}, \quad h\in H_Q, \]
and (formally)
\[ H_Q^{-1} = \Big\{h = \sum_k h_k e_k : \sum_k\lambda_k|h_k|^2 < \infty\Big\}, \qquad |h|_{-1}^2 = \sum_k\lambda_k|h_k|^2, \quad h\in H_Q^{-1}. \]

Proof. Recall that $\lambda_k\to 0$ and $\lambda_k > 0$.
Let $\varphi_n = \sum_k a_k^n e_k\in H$ with $|Q\varphi_n - Q\varphi_m|_Q\to 0$ as $n,m\to\infty$. We have $Q\varphi_n = \sum_k\lambda_k a_k^n e_k$ and
\[ |Q\varphi_n - Q\varphi_m|_H^2 = \sum_k\lambda_k^2|a_k^n - a_k^m|^2 \le C\sum_k\lambda_k|a_k^n - a_k^m|^2 = C\sum_k\big|\sqrt{\lambda_k}\,a_k^n - \sqrt{\lambda_k}\,a_k^m\big|^2 = C|Q\varphi_n - Q\varphi_m|_Q^2 \to 0 \]
as $n,m\to\infty$, with $C = \sup_k\lambda_k$. Since $\ell^2$ is complete, there are sequences $(b_k),(c_k)\in\ell^2$ such that
\[ \sum_k|\lambda_k a_k^n - b_k|^2\to 0 \quad\text{and}\quad \sum_k\big|\sqrt{\lambda_k}\,a_k^n - c_k\big|^2\to 0, \]
and $c_k = b_k/\sqrt{\lambda_k}$. Now $\varphi = \sum_k b_k e_k\in H$ with
\[ |Q\varphi_n - \varphi|_H\to 0 \quad\text{and}\quad \sum_k\frac{|b_k|^2}{\lambda_k} < \infty. \]
Also,
\[ |\varphi|_Q^2 = \lim_n|Q\varphi_n|_Q^2 = \lim_n\sum_k\lambda_k|a_k^n|^2 = \sum_k|c_k|^2 = \sum_k\frac{|b_k|^2}{\lambda_k}. \]
Similarly for $H_Q^{-1}$.

Canonical embedding. Note that $H\subset H_Q^{-1}$ with dense, continuous, Hilbert-Schmidt inclusion $i$; in fact,
\[ |i|_{HS}^2 = \sum_k(e_k,e_k)_{-1} = \sum_k\lambda_k = |Q^{1/2}|_{HS}^2 = \operatorname{tr}(Q). \]

3.3.2.2 The degenerate case.

In this case, some properties are lost: for instance, $Q^{1/2}H$ is not dense in $H$ anymore, a property that allowed us before to identify $\mathcal{H}_\gamma = Q^{1/2}H$. Similarly to the finite-dimensional case discussed in the previous section, we can describe the Cameron-Martin space using the space $Q^{1/2}H$ with an appropriate norm involving the generalized inverse, as follows:
\[ ((u,v))_\gamma = (Q^{-1/2}u,Q^{-1/2}v)_H, \quad u,v\in Q^{1/2}H. \]

Theorem 3.3.6. Let $Q$ be the covariance operator of a Gaussian measure on a separable Hilbert space $H$. Then
1. $\mathcal{H}_\gamma = \overline{Q^{1/2}H}^{((\cdot,\cdot))_\gamma}$;
2. $|u|_\gamma = |Q^{-1/2}u|_H$, $u\in\mathcal{H}_\gamma$.

Proof. By the definition of the generalized inverse, $Q^{-1/2}Qu = Q^{1/2}u$; in fact, $Q^{1/2}(Q^{1/2}u) = Qu$ and, for any $z\in N(Q^{1/2})$, $(Q^{1/2}u,z) = (u,Q^{1/2}z) = 0$, so $Q^{1/2}u\in N(Q^{1/2})^\perp$. Therefore
\[ ((Qu,Qv))_\gamma = (Q^{1/2}u,Q^{1/2}v)_H = (Qu,Qv)_\gamma, \]
and $Q^{1/2} : (Q^{1/2}H,((\cdot,\cdot))_\gamma)\to(QH,(\cdot,\cdot)_\gamma)$ is a linear isometry. From Corollary 3.3.1, taking the completion, we arrive at the two statements we wanted to verify.

Remark 3.3.14. We learned Section 3.3.2.1, covering the non-degenerate case and leading to Theorem 3.3.5, from [Mik11]; Section 3.3.2.2 and Theorem 3.3.6 are the corresponding generalizations. Nevertheless, similar representations also appear in [Aze80, Chapter II] and [DPZ92, Proposition 12.8], not invoking the notion of generalized inverse.

3.3.3 LDP for Gaussian measures.

Given a Gaussian measure $\gamma$ on a separable Banach space $E$, we can also discuss the asymptotic properties of the family $\gamma(\cdot/\varepsilon)$ as $\varepsilon\to 0$.

Theorem 3.3.7. The family $\gamma(\cdot/\varepsilon)$ satisfies a LDP on $E$ with speed $\varepsilon^2$ and with the good rate function
\[ I(x) = \begin{cases}\frac12|x|_H^2, & \text{if } x\in H,\\ \infty, & \text{otherwise,}\end{cases} \]
where $(H,(\cdot,\cdot)_H)$ is a Hilbert space continuously embedded in $E$ such that, for all $\varphi\in E^*$,
\[ \mathcal{L}(\varphi) = \mathcal{N}(0,|\varphi|_H^2), \qquad |\varphi|_H = \sup_{|h|_H\le 1}|\varphi(h)|. \]

Proof. See [DPZ92, Theorem 12.7].

Remark 3.3.15. The space $H$ above is the CM space $\mathcal{H}_\gamma$ from Definition 3.3.3. Observe that, since the inclusion $i : \mathcal{H}_\gamma\hookrightarrow E$ is continuous, it has an adjoint $i^* : E^*\to\mathcal{H}_\gamma^*$ ($\mathcal{H}_\gamma^*\cong\mathcal{H}_\gamma$ by Riesz representation), $i^*\varphi(h) = \varphi(ih)$, for any $\varphi\in E^*$, $h\in\mathcal{H}_\gamma$. In other words, the adjoint $i^*$ maps $\varphi$ to its restriction to $\mathcal{H}_\gamma$, $\varphi\circ i$. Hence the expression $|\varphi|_\gamma$ abbreviates
\[ |i^*\varphi|_\gamma = \sup_{|h|_\gamma\le 1}|\varphi(ih)| = \sup_{|h|_\gamma\le 1}|\varphi(h)|. \]

3.4 Proof of Claim 1.0.1.

3.4.1 First statement.

We have seen in the previous sections that, corresponding to a Gaussian stochastic process $X = (X(t),\,0\le t\le 1)$ such that $\mathbb{E}\int_0^1 X^2(t)\,dt < \infty$, there is a positive self-adjoint, trace class operator $Q_X\in L(H)$, $H = L^2(0,1)$. On the other hand, given a positive self-adjoint, trace class operator $Q\in L(H)$, $H = L^2(0,1)$, there is a Gaussian measure $\gamma$ whose CM space contains $Q^{1/2}L^2(0,1)$ as a dense subset and whose norm is $|Q^{-1/2}\cdot|_2$ (see Theorem 3.3.4 and Theorem 3.3.6). Now, let us verify $Q_X = Q$: by Fubini's theorem,
\[ (Q_X f,g)_2 := \int_0^1\Big(\int_0^1 q(s,t)f(s)\,ds\Big)g(t)\,dt = \mathbb{E}\Big[\int_0^1 f(s)X(s)\,ds\int_0^1 g(t)X(t)\,dt\Big] = \mathbb{E}(f,X)_2(g,X)_2 = \int_{L^2(0,1)}(f,x)_2(g,x)_2\,\gamma(dx) =: C_\gamma(f,g) = (Qf,g)_2 \]
for all $f,g\in L^2(0,1)$. Hence $Q_X = Q$, and items 1.0.2 and 1.0.3 coincide.

3.4.2 Second statement.
As for the case of the Wiener process $(W(t), t \in [0,1])$,
$$q(s,t) = s \wedge t, \qquad Q_W f(t) = \int_0^1 (s \wedge t) f(s)\,ds.$$
We want to verify
$$Q_W^{1/2} L^2(0,1) = H^1_0([0,1]), \qquad |Q_W^{-1/2} h|_2 = |\dot h|_{L^2}, \quad h \in H^1_0([0,1]).$$
In fact, this is shown in [Bog98, Lemma 2.3.14]; we arrange the argument in steps.

Diagonalization of $Q_W$. If $f \in L^2(0,1)$ is an eigenvector with eigenvalue $\lambda$, then
$$\lambda f(t) = Q_W f(t) = \int_0^t s f(s)\,ds + t \int_t^1 f(s)\,ds,$$
implying that $f$ has a modification with two derivatives:
$$\lambda f'(t) = \int_t^1 f(s)\,ds, \qquad \lambda f''(t) = -f(t).$$
If $\lambda = 0$, then $f = 0$ and it is not an eigenvector; hence $\lambda > 0$. We also obtain the boundary conditions
$$f(0) = 0, \qquad f'(1) = 0.$$
Therefore $f$ satisfies a second-order linear differential equation with characteristic equation $D^2 + 1/\lambda = 0$. The general solution is $f(t) = A \cos\frac{t}{\sqrt\lambda} + B \sin\frac{t}{\sqrt\lambda}$. Since $f \neq 0$ and $\lambda > 0$, the boundary conditions imply $A = 0$ and $\frac{1}{\sqrt\lambda} = \pi/2 + k\pi$, $k \ge 0$. Therefore $f(t) = B \sin\big((\pi/2 + k\pi)t\big)$, and by requiring $|f|_2 = 1$ we find $B = \sqrt2$. Summarizing, we obtain a CONS of eigenvectors with associated eigenvalues:
$$m_k(t) = \sqrt2 \sin\frac{t}{\sqrt{\lambda_k}}, \qquad \lambda_k = \frac{1}{\pi^2 (k - \tfrac12)^2}, \quad k = 1, 2, \dots \qquad (3.33)$$

Verification that $Q_W^{1/2} L^2(0,1) = H^1_0([0,1])$. Given $h \in Q_W^{1/2} L^2(0,1)$, we have
$$h = \sum_k h_k m_k \quad \text{with} \quad \sum_k \frac{h_k^2}{\lambda_k} < \infty.$$
Differentiating, $\dot h = \sum_k h_k \dot m_k$ a.e.; in fact, it is straightforward to verify that $\sqrt{\lambda_k}\, \dot m_k(t) = \sqrt2 \cos\frac{t}{\sqrt{\lambda_k}}$, $k = 1, 2, \dots$, is a CONS for $L^2(0,1)$, yielding
$$|\dot h|_2^2 = \sum_k h_k^2 |\dot m_k|_2^2 = \sum_k \frac{h_k^2}{\lambda_k} < \infty.$$
Also $h(0) = 0$. Hence $h(t) = \int_0^t \dot h(s)\,ds$ with $\dot h \in L^2(0,1)$, as wanted. The other inclusion is obtained similarly.

Verification that $|Q_W^{-1/2} h|_2 = |\dot h|_{L^2}$, $h \in H^1_0([0,1])$. We have $Q_W^{-1/2} m_k = \frac{1}{\sqrt{\lambda_k}} m_k$; therefore, for all $h = \sum_k h_k m_k \in H^1_0([0,1])$,
$$|Q_W^{-1/2} h|_2^2 = \Big| \sum_k \frac{h_k}{\sqrt{\lambda_k}}\, m_k \Big|_2^2 = \sum_k \frac{h_k^2}{\lambda_k} = |\dot h|_2^2.$$
The last identity was already seen in the previous paragraph.

3.5 RKHS $\mathcal{H}_q$ vs CM $\mathcal{H}_\gamma$.
Let $(X(t), t \in S)$ be a square-integrable, zero-mean Gaussian stochastic process with $q(s,t) = \mathbb{E} X(t) X(s)$ and $\int_S q(s,s)\,ds < \infty$ ($S$ a compact interval), so that the paths are in $H = L^2(0,1)$.

Let $\mathcal{H}_q$ be the RKHS over $S$ (from the Moore-Aronszajn construction 3.2.2) associated to the kernel $q$. On the other hand, let $\mathcal{H}_\gamma$ be the CM space of the Gaussian law $\mathcal{L}(X)$ or, equivalently (recall Proposition 3.3.2), the RKHS $\mathcal{H}_{C_\gamma}$ over $L^2(0,1)$ associated to the kernel $C_\gamma$. These two spaces coincide whenever $\mathcal{H}_q = \overline{Q^{1/2} H}^{\,((\cdot,\cdot))_\gamma}$; if $Q$ is non-degenerate, whenever $\mathcal{H}_q = \overline{Q^{1/2} H}$.

In a broader perspective, given a process with paths in some Banach space of functions $E$, a comparison of these two spaces has been made in [vdVvZ08]. It rests essentially on the requirement that the evaluation maps $\bar s \colon b \mapsto b(s)$ be continuous.

Theorem 3.5.1. If $\gamma$ is a Gaussian measure on a complete separable subspace of the uniformly bounded functions $f \colon S \to \mathbb{R}$ equipped with the uniform norm, then the Cameron-Martin space $\mathcal{H}_\gamma$ and the RKHS $\mathcal{H}_q$ coincide (i.e., they are isometrically isomorphic). Furthermore, $\widehat{C_\gamma}(\bar t) = q(t, \cdot)$.

Proof. See [vdVvZ08, Theorem 2.1].

From this viewpoint, to make the LD rates explicit one is required to make explicit a certain RKHS associated to the covariance kernel.

3.6 Further examples and calculations.

3.6.1 Diagonalization of the covariance operator of the Ornstein-Uhlenbeck process.

We revisit the Ornstein-Uhlenbeck process with vanishing diffusion term, in light of what is suggested in Proposition 3.3.4. It is instructive to go through the program of finding the series representation: in the end, to obtain the eigenvalues of the covariance operator, we are led to a second-order boundary value problem whose eigenvalues admit no closed-form expression. More specifically, we have

Proposition 3.6.1. Let $a \neq 0$, $c > 0$, $0 < T < \infty$, and let $X = (X(t), t \in [0,T])$ solve
$$dX(t) = -a X(t)\,dt + \sqrt c\, dW(t), \quad t \in [0,T], \qquad X(0) = 0.$$
Let $Q_X$ be the covariance operator associated to $X$.
If $Q_X f = \lambda f$, $f \neq 0$, then
$$f'' + \Big(\frac{c}{\lambda} - a^2\Big) f = 0, \qquad f(0) = 0, \qquad f'(T) = -a f(T).$$

Proof. Let $a \neq 0$, $c > 0$, $0 < T < \infty$ and consider the process determined by the stochastic differential equation
$$dX(t) = -a X(t)\,dt + \sqrt c\, dW(t), \quad t \in [0,T].$$
We can write the solution explicitly:
$$X(t) = e^{-at} X(0) + \int_0^t e^{-a(t-u)} \sqrt c\, dW(u).$$
Assume also that the process is centered: $X(0) = 0$. The covariance function is
$$q(s,t) = \mathbb{E} X(t) X(s) = \mathbb{E} \int_0^T \mathbf 1_{u \le t}\, e^{-a(t-u)} \sqrt c\, dW(u) \int_0^T \mathbf 1_{u \le s}\, e^{-a(s-u)} \sqrt c\, dW(u) = e^{-a(t+s)} \int_0^{t \wedge s} e^{2au} c\,du = c\, e^{-a(t+s)}\, \frac{e^{2a(t \wedge s)} - 1}{2a}.$$
The process has paths in $L^2(0,T)$ since
$$\mathbb{E} \int_0^T X(s)^2\,ds = \int_0^T q(s,s)\,ds = \frac{c}{2a} \Big( T + \frac{e^{-2aT} - 1}{2a} \Big) < \infty.$$
The covariance operator is
$$Q f(t) = \int_0^T q(s,t) f(s)\,ds = c \int_0^T e^{-a(t+s)}\, \frac{e^{2a(t \wedge s)} - 1}{2a}\, f(s)\,ds = \frac{c}{a}\, e^{-at} \int_0^t \frac{e^{as} - e^{-as}}{2}\, f(s)\,ds + \frac{c}{a}\, e^{-at}\, \frac{e^{at} - e^{-at}}{2} \int_t^T e^{-as} f(s)\,ds = \frac{c}{a}\, e^{-at} \int_0^t \sinh(as) f(s)\,ds + \frac{c}{a}\, \sinh(at) \int_t^T e^{-as} f(s)\,ds.$$
Now, let us proceed with the diagonalization and find $f$, $\lambda$ such that $Qf = \lambda f$. Differentiating the left-hand side,
$$(Qf)'(t) = -c\, e^{-at} \int_0^t \sinh(as) f(s)\,ds + \frac{c}{a}\, e^{-at} \sinh(at)\, f(t) + c \cosh(at) \int_t^T e^{-as} f(s)\,ds - \frac{c}{a}\, \sinh(at)\, e^{-at} f(t) = -c\, e^{-at} \int_0^t \sinh(as) f(s)\,ds + c \cosh(at) \int_t^T e^{-as} f(s)\,ds$$
and
$$(Qf)''(t) = ca\, e^{-at} \int_0^t \sinh(as) f(s)\,ds + ca\, \sinh(at) \int_t^T e^{-as} f(s)\,ds - c\, e^{-at} f(t) \big( \sinh(at) + \cosh(at) \big) = a^2\, Qf(t) - c f(t).$$
In the penultimate equality we used $\sinh(at) + \cosh(at) = e^{at}$. We obtain the following ODE for $f$:
$$f'' + \Big( \frac{c}{\lambda} - a^2 \Big) f = 0. \qquad (3.34)$$
Note that if $\lambda = 0$ then $f = 0$, which is not an eigenvector; hence $\lambda > 0$ (and $Q$ is non-degenerate, so it is legitimate to try to obtain a series representation). It is readily checked that we obtain the additional conditions $Qf(0) = 0$ and $Qf(T) = -\frac1a (Qf)'(T)$, i.e.,
$$f(0) = 0, \qquad (3.35)$$
$$f'(T) = -a f(T). \qquad (3.36)$$
Let us carry out the search for the eigenvalues and eigenvectors. For convenience, let $D := a^2 - c/\lambda$.

1. If $D = 0$, then $f(t) = c_1 + c_2 t$. Since $f(0) = 0$, $c_1 = 0$ and $f(t) = c_2 t$. From the condition at $T$, $c_2 = -a c_2 T$, hence either $c_2 = 0$ or $T = -1/a$; both are impossible. Therefore $D \neq 0$.

2. If $D > 0$, then $f(t) = c_1 e^{\sqrt D t} + c_2 e^{-\sqrt D t}$. Since $f(0) = 0$,
$$f(t) = 2 c_1 \sinh(\sqrt D\, t).$$
From the condition at $T$, $2 c_1 \sqrt D \cosh(\sqrt D\, T) = -2 a c_1 \sinh(\sqrt D\, T)$. Since $c_1 \neq 0$ and $\cosh(\sqrt D\, T) \neq 0$,
$$\tanh(\sqrt D\, T) = -\frac{\sqrt D}{a}.$$
This equation has at most one solution $\sqrt D > 0$ (possible only when $a < 0$), giving at most one eigenvalue $\lambda = c/(a^2 - D) > c/a^2$.

3. If $D < 0$, then $f(t) = c_1 \cos(\sqrt{-D}\, t) + c_2 \sin(\sqrt{-D}\, t)$. Since $f(0) = 0$,
$$f(t) = c_2 \sin(\sqrt{-D}\, t).$$
From the condition at $T$, $c_2 \sqrt{-D} \cos(\sqrt{-D}\, T) = -a c_2 \sin(\sqrt{-D}\, T)$. If $c_2 = 0$ or $\cos(\sqrt{-D}\, T) = 0$, then $f = 0$ or $\sin(\sqrt{-D}\, T) = 0$, respectively, and both are contradictions. Hence
$$\tan(\sqrt{-D}\, T) = -\frac{\sqrt{-D}}{a}.$$
Let $x = \sqrt{-D}\, T = \sqrt{c/\lambda - a^2}\; T$, so that we are dealing with the equation
$$aT \tan(x) = -x.$$
Graphically, it is readily seen that this equation has countably many solutions, one in each interval $\big(k\pi - \frac\pi2,\, k\pi + \frac\pi2\big)$, $k = 1, 2, \dots$. Therefore there exists a countable, decreasing sequence of eigenvalues $\lambda_k \to 0$. Moreover, since $k\pi - \frac\pi2 < x_k < k\pi + \frac\pi2$, we have
$$c \Big( \Big( \frac{k\pi + \frac\pi2}{T} \Big)^2 + a^2 \Big)^{-1} < \lambda_k < c \Big( \Big( \frac{k\pi - \frac\pi2}{T} \Big)^2 + a^2 \Big)^{-1}$$
for $k \ge 1$, so that $\lambda_k = O(k^{-2})$.

This example illustrates a technical barrier in making the LD rate explicit through this method. We observe that, in this case, we had already provided a somewhat explicit representation for the LD rates: in fact, we obtained one in Proposition 2.6.1 (and, in Example 2.6.6, even further). Nevertheless, we are not claiming that the expressions found in Proposition 2.6.1 and Proposition 2.6.2 characterize the associated Cameron-Martin space.

3.6.2 Multiple integrated Wiener process.

Let $T < \infty$, let $m \ge 0$ be an integer, and let $(W(t), t \in [0,T])$ be a standard Wiener process. Define the $m$-fold integrated Wiener process
$$I_m(t) = \int_0^t \int_0^{t_1} \cdots \int_0^{t_{m-1}} W(t_m)\, dt_m \cdots dt_2\, dt_1, \quad t \in [0,T], \qquad (3.37)$$
or, equivalently,
$$I_m(t) = \begin{cases} W(t), & m = 0, \\ \displaystyle\int_0^t I_{m-1}(u)\,du, & m \ge 1, \end{cases} \qquad t \in [0,T]. \qquad (3.38)$$
It is a centered Gaussian process, and it can be written as
$$I_m(t) = \int_0^t \frac{(t-u)^m}{m!}\, dW(u). \qquad (3.39)$$
Indeed, integrating by parts $m$ times,
$$\int_0^t \frac{(t-u)^m}{m!}\, dW(u) = \int_0^t \frac{(t-u)^{m-1}}{(m-1)!}\, I_0(u)\,du = \cdots = \int_0^t I_{m-1}(u)\,du.$$
The covariance function is
$$q(s,t) = \mathbb{E}\, I_m(t) I_m(s) = \mathbb{E} \int_0^t \frac{(t-u)^m}{m!}\, dW(u) \int_0^s \frac{(s-u)^m}{m!}\, dW(u) = \int_0^T \frac{(t-u)_+^m}{m!}\, \frac{(s-u)_+^m}{m!}\, du.$$
Recalling Example 3.2.5 and Theorem 3.5.1, we conclude that $\mathcal{H}_\gamma = \mathcal{H}_q = H^{m+1}_0$ for $\gamma = \mathcal{L}(I_m)$. Therefore, the LDP rate for the laws $\mathcal{L}(\varepsilon I_m)$ in $L^2(0,1)$ as $\varepsilon \to 0$ has been made completely explicit in Example 3.2.5. This example provides an illustration where knowledge of the RKHS trivializes the task of making the LD rate explicit.

Chapter 4

DISCUSSION.

4.1 Summary.

In the Gaussian setting, the problem of proving and of representing large deviations rates is satisfactorily solved once we know how to correctly represent the Cameron-Martin space (this is no longer a problem of large deviations proper): for Gaussian measures with non-degenerate covariance operator we may look for the series representation (and we need to solve Fredholm equations); for Gaussian measures induced by Gaussian processes on $C([0,T])$, $T < \infty$, we look at ways to characterize the RKHS associated to the covariance kernel $q(s,t)$. The problem is solved in the abstract but, depending on how explicit the representation is required to be, further explicitation can be non-trivial.

4.2 Further related questions.

The large deviations principle for Markov processes has been successfully studied using an approach that resembles Prohorov's compactness approach to weak convergence of probability measures. This program was initiated in [Fle85, FS86], giving a connection to ideas from control theory. A similar approach is taken in [FK06], which proposes to obtain the LDP for a family of Markov processes $(X_n)_{n \ge 1}$ by looking at the asymptotic behavior of the so-called nonlinear semigroups
$$V_n(t) f(x) := \frac1n \log \mathbb{E}\big[ e^{n f(X_n(t))} \mid X_n(0) = x \big] = \frac1n \log T_n(t) e^{n f}(x).$$
Here $T_n(t)$ is the semigroup associated to the infinitesimal generator $A_n$ of $X_n$.
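To get a concrete feel for the nonlinear semigroup $V_n$, consider the toy family $X_n(t) = x + W(t)/\sqrt n$, for which a direct Gaussian computation with $f(y) = y$ gives $V_n(t)f(x) = x + t/2$ for every $n$. The sketch below is our own illustration (the helper name `V_n` and the quadrature size are hypothetical choices, not from the text); it evaluates the defining expectation by Gauss-Hermite quadrature:

```python
import numpy as np

def V_n(f, n, t, x, n_quad=80):
    """(1/n) log E[exp(n f(x + W(t)/sqrt(n)))], with W(t) ~ N(0, t) and the
    expectation over Z ~ N(0,1) computed by Gauss-Hermite quadrature."""
    nodes, weights = np.polynomial.hermite.hermgauss(n_quad)
    z = np.sqrt(2.0) * nodes                      # N(0, 1) quadrature points
    vals = np.exp(n * f(x + np.sqrt(t / n) * z))  # integrand e^{n f(X_n(t))}
    return np.log(np.sum(weights * vals) / np.sqrt(np.pi)) / n

# For f(y) = y, the exact value is x + t/2, independently of n.
for n in (1, 4, 16):
    assert abs(V_n(lambda y: y, n, t=1.0, x=0.3) - 0.8) < 1e-8
```

For a general bounded continuous $f$, Varadhan's lemma suggests that $V_n(t)f(x)$ should converge to $\sup_y \big[ f(y) - (y-x)^2/(2t) \big]$ in this toy example, which the same routine can be used to explore numerically.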
The weak convergence approach is also based on an equivalent formulation of the LDP, the so-called Laplace principle (see Definition 2.3.1), and is an alternative to the more traditional change of measure, or to the approximation method developed in [FW98, Aze80] that we discussed in the previous chapters. This approach has been successfully applied to cases where the state space is infinite-dimensional (see, e.g., [DE97, BD00, BDM08]). A first question is: how successful would this approach be for a Markov process such as the solution of a stochastic partial differential equation with vanishing multiplicative noise,
$$du(t,x) = u_{xx}(t,x)\,dt + \varepsilon\, u_x(t,x)\,dw(t), \qquad u(0,x) = u_0(x), \quad x \in \mathbb{R},$$
as $\varepsilon \to 0$? Here $(w(t), t \in [0,T])$ is a one-dimensional Wiener process and $u_0$ is given, sufficiently smooth. Given the high degeneracy of the noise, what type of explicit representations for the rate functions would one expect or obtain?

On the other hand, in general, a zero-mean Gaussian process $(X(t), t \in S)$ with covariance function $q(s,t)$ is a Markov process if and only if $q(s,t) = g(s)\, G(\min(s,t))\, g(t)$, where $g$ is a continuous function with $g(0) = 1$ and $G$ is a continuous and monotone increasing function. Furthermore, if $G > 0$, the corresponding RKHS has been characterized (see [BTA04, Example 2, p. 59]).

When the family of processes is Gauss-Markov, what can be said about the connection between the associated semigroups, the covariance operator and covariance function, the associated rate function, and the Cameron-Martin space and norm? Why is it that for the Wiener process we can obtain a series representation for its CM space, while for the Ornstein-Uhlenbeck process that task seems intractable?

Reference List

[AF03] R. A. Adams and J. J. F. Fournier. Sobolev spaces, volume 140 of Pure and Applied Mathematics (Amsterdam). Elsevier/Academic Press, Amsterdam, second edition, 2003.

[Aro50] N. Aronszajn. Theory of reproducing kernels. Trans. Amer. Math. Soc., 68:337–404, 1950.

[Aze80] R. Azencott. Grandes déviations et applications. Volume 774 of Lecture Notes in Math., pages 1–176. Springer, Berlin, 1980.

[Bax76] P. Baxendale. Gaussian measures on function spaces. Amer. J. Math., 98(4):891–952, 1976.

[BD00] A. Budhiraja and P. Dupuis. A variational representation for positive functionals of infinite dimensional Brownian motion. Probab. Math. Statist., 20(1, Acta Univ. Wratislav. No. 2246):39–61, 2000.

[BDM08] A. Budhiraja, P. Dupuis, and V. Maroulas. Large deviations for infinite dimensional stochastic dynamical systems. Ann. Probab., 36(4):1390–1420, 2008.

[Bog98] V. I. Bogachev. Gaussian measures, volume 62 of Mathematical Surveys and Monographs. American Mathematical Society, Providence, RI, 1998.

[Bog07] V. I. Bogachev. Measure theory. Vol. I, II. Springer-Verlag, Berlin, 2007.

[BTA04] A. Berlinet and C. Thomas-Agnan. Reproducing kernel Hilbert spaces in probability and statistics. Kluwer Academic Publishers, Boston, MA, 2004.

[CL55] E. A. Coddington and N. Levinson. Theory of ordinary differential equations. McGraw-Hill, New York-Toronto-London, 1955.

[CM44] R. H. Cameron and W. T. Martin. Transformations of Wiener integrals under translations. Ann. of Math. (2), 45:386–396, 1944.

[DE97] P. Dupuis and R. S. Ellis. A weak convergence approach to the theory of large deviations. Wiley Series in Probability and Statistics. John Wiley & Sons, New York, 1997.

[dH00] F. den Hollander. Large deviations, volume 14 of Fields Institute Monographs. American Mathematical Society, Providence, RI, 2000.

[Din93] I. H. Dinwoodie. Identifying a large deviation rate function. Ann. Probab., 21(1):216–231, 1993.

[DPZ92] G. Da Prato and J. Zabczyk. Stochastic equations in infinite dimensions, volume 44 of Encyclopedia of Mathematics and its Applications. Cambridge University Press, Cambridge, 1992.

[Dri10] B. Driver. Heat kernel $L^2$-spaces. Lecture notes, 2010.

[DS63] N. Dunford and J. Schwartz. Linear operators. Part II: Spectral theory. Self adjoint operators in Hilbert space. Interscience Publishers, John Wiley & Sons, New York-London, 1963.

[DS89] J.-D. Deuschel and D. W. Stroock. Large deviations, volume 137 of Pure and Applied Mathematics. Academic Press, Boston, MA, 1989.

[DZ98] A. Dembo and O. Zeitouni. Large deviations techniques and applications, volume 38 of Applications of Mathematics (New York). Springer-Verlag, New York, second edition, 1998.

[Ell85] R. S. Ellis. Entropy, large deviations, and statistical mechanics, volume 271 of Grundlehren der Mathematischen Wissenschaften. Springer-Verlag, New York, 1985.

[FK06] J. Feng and T. G. Kurtz. Large deviations for stochastic processes, volume 131 of Mathematical Surveys and Monographs. American Mathematical Society, Providence, RI, 2006.

[Fle85] W. H. Fleming. A stochastic control approach to some large deviations problems. In Recent mathematical methods in dynamic programming (Rome, 1984), volume 1119 of Lecture Notes in Math., pages 52–66. Springer, Berlin, 1985.

[FS86] W. H. Fleming and P. E. Souganidis. PDE-viscosity solution approach to some problems of large deviations. Ann. Scuola Norm. Sup. Pisa Cl. Sci. (4), 13(2):171–192, 1986.

[FW98] M. I. Freidlin and A. D. Wentzell. Random perturbations of dynamical systems, volume 260 of Grundlehren der Mathematischen Wissenschaften. Springer-Verlag, New York, second edition, 1998.

[Hai09] M. Hairer. An introduction to stochastic PDEs. ArXiv e-prints, July 2009.

[KO78] G. Kallianpur and H. Oodaira. Freĭdlin-Wentzell type estimates for abstract Wiener spaces. Sankhyā Ser. A, 40(2):116–137, 1978.

[LQZ02] M. Ledoux, Z. Qian, and T. Zhang. Large deviations and support theorem for diffusion processes via rough paths. Stochastic Process. Appl., 102(2):265–283, 2002.

[Mik11] R. Mikulevicius. Miscellaneous. Lecture notes for Math 681 (Topics in Functional Analysis), 2011.

[RC72] B. S. Rajput and S. Cambanis. Gaussian processes and Gaussian measures. Ann. Math. Statist., 43:1944–1952, 1972.

[Ree72] M. Reed and B. Simon. Methods of modern mathematical physics. I. Functional analysis. Academic Press, New York, 1972.

[Str90] M. Struwe. Variational methods. Springer-Verlag, Berlin, 1990.

[Var66] S. R. S. Varadhan. Asymptotic probabilities and differential equations. Comm. Pure Appl. Math., 19:261–286, 1966.

[Var84] S. R. S. Varadhan. Large deviations and applications, volume 46 of CBMS-NSF Regional Conference Series in Applied Mathematics. SIAM, Philadelphia, PA, 1984.

[vdVvZ08] A. W. van der Vaart and J. H. van Zanten. Reproducing kernel Hilbert spaces of Gaussian priors. In Pushing the limits of contemporary statistics: contributions in honor of Jayanta K. Ghosh, volume 3 of Inst. Math. Stat. Collect., pages 200–222. Inst. Math. Statist., Beachwood, OH, 2008.

[W.90] G. Wahba. Spline models for observational data, volume 59 of CBMS-NSF Regional Conference Series in Applied Mathematics. SIAM, Philadelphia, PA, 1990.

[WZ65] E. Wong and M. Zakai. On the convergence of ordinary integrals to stochastic integrals. Ann. Math. Statist., 36:1560–1564, 1965.

[Yos95] K. Yosida. Functional analysis. Classics in Mathematics. Springer-Verlag, Berlin, 1995. Reprint of the sixth (1980) edition.
Abstract
We study large deviations (LD) rates in a Gaussian setting and their representation in terms of more fundamental objects: the covariance operator, the Cameron-Martin space, and the reproducing kernel Hilbert space; we also carry out a direct proof of this rate for a vanishing Gaussian random vector.

We provide a fairly self-contained discussion of the relation between three well-known examples: the LD principle for the vanishing Wiener process (Schilder's theorem), the LD principle for a vanishing square-integrable Gaussian process, and the LD principle for a degenerating Gaussian measure. Motivated by the case of the Wiener process, we aim at a series representation of the Cameron-Martin space associated to the Ornstein-Uhlenbeck process on a finite interval, and we are led to a second-order differential equation that characterizes the spectrum of the associated covariance operator.

Several examples are carried out illustrating the use of the contraction principle, namely for a d-dimensional Ornstein-Uhlenbeck process and for an oscillator with Gaussian random forcing.