PARAMETER ESTIMATION IN SECOND-ORDER STOCHASTIC DIFFERENTIAL EQUATIONS

by

Ning Lin

A Dissertation Presented to the FACULTY OF THE USC GRADUATE SCHOOL, UNIVERSITY OF SOUTHERN CALIFORNIA, In Partial Fulfillment of the Requirements for the Degree DOCTOR OF PHILOSOPHY (APPLIED MATHEMATICS)

August 2012

Copyright 2012 Ning Lin

Dedication

To my parents, for their endless love and encouragement.

Acknowledgements

Though only my name appears on the cover of this dissertation, a great many people have contributed to its production. I owe my gratitude to all those who have made this dissertation possible and because of whom my graduate experience has been one that I will cherish forever.

I would like to gratefully and sincerely thank my advisor, Prof. Sergey Lototsky, for his guidance, encouragement and patience during my graduate study. He is not only knowledgeable and inspiring, but also very patient and willing to teach me useful skills, such as how to search the related literature. His guidance was crucial to the completion of this dissertation, and, more importantly, his advice on study and research will benefit me for a lifetime. As an academic advisor, he has set an excellent example for his students. His enthusiasm and discipline are remarkable and have influenced me in many good ways.

I would also like to thank Jay Bartroff and Fengzhu Sun on my dissertation committee for providing me with valuable and insightful suggestions. Jay Bartroff was also on my guidance committee, and his help with my dissertation proposal is greatly appreciated.

My special thanks go to my husband, Kaiyuan Zhang. He earned his Ph.D. in 2009 and has generously shared with me his experience in study and research. He is always supportive and encouraging, and I am very lucky to have him in my life.

Last but not least, I would like to express my gratitude to all my friends and family. It is your love and friendship that has made everything possible.
Table of Contents

Dedication
Acknowledgements
Abstract
Chapter 1: Introduction
    1.1 Review of Parameter Estimation for the Ornstein-Uhlenbeck Process
    1.2 Overview of Parameter Estimation for the Second-Order SDE (CAR(2))
Chapter 2: Ornstein-Uhlenbeck Process: CAR(1)
    2.1 General Ornstein-Uhlenbeck Process
    2.2 Classification of the Ornstein-Uhlenbeck Process
    2.3 Ergodicity of the Ornstein-Uhlenbeck Process
    2.4 Parameter Estimation
        2.4.1 Consistency of the Estimator
        2.4.2 Asymptotic Distribution of the Estimator
    2.5 Connection with the Discrete Model
Chapter 3: Parameter Estimation for CAR(2)
    3.1 Classification of the Solution
    3.2 Parameter Estimation of θ₁ and θ₂
        3.2.1 Consistency of the MLE
        3.2.2 Asymptotic Distribution of the MLE: Preparation
        3.2.3 Asymptotic Structure of the Log-likelihood Function
    3.3 Parameter Estimation when One Parameter is Known
Chapter 4: Parameter Estimation for CAR(2): Ergodic Case
    4.1 Some Preliminaries
    4.2 Parameter Estimation when One Parameter is Known
    4.3 Parameter Estimation when Neither Parameter is Known
    4.4 Connection with Autoregression of Order Two
Chapter 5: Nonergodic CAR(2): Real Eigenvalues
    5.1 Distinct Positive Roots
    5.2 Roots of Opposite Sign
    5.3 Larger Root is Zero
    5.4 Smaller Root is Zero
    5.5 Positive Double Root
    5.6 Double Zero Root
    5.7 Asymptotic Structure of the Normalized Log-likelihood Ratio
    5.8 Knowing One Parameter and Estimating the Other
        5.8.1 Larger Root is Positive
        5.8.2 Larger Root is Zero
        5.8.3 Double Zero Root
Chapter 6: Nonergodic CAR(2): Pure Imaginary Eigenvalues
    6.1 Harmonic Oscillator Driven by a Random Force
    6.2 Estimation of Frequency with No Damping
        6.2.1 Consistency of the MLE
        6.2.2 Asymptotic Distribution of the MLE
    6.3 Testing for Damping with Known Frequency
        6.3.1 Hypothesis Testing
        6.3.2 Connection with the Discrete Model
    6.4 Testing for Damping with Unknown Frequency
    6.5 Remarks
Chapter 7: Nonergodic CAR(2): Complex Eigenvalues
    7.1 Parameter Estimation and Asymptotic Structure
    7.2 Parameter Estimation when One Parameter is Known
Chapter 8: Simulation
    8.1 Simulation of the Ornstein-Uhlenbeck Process
        8.1.1 Euler Scheme
        8.1.2 Exact Discretization
        8.1.3 Comparison of the Euler Scheme and Exact Discretization
    8.2 Simulation of CAR(2)
        8.2.1 Euler Scheme
        8.2.2 Exact Discretization
        8.2.3 Comparison of the Euler Scheme and Exact Discretization
Bibliography

Abstract

While consistency of the maximum likelihood estimator of the unknown parameters in the second-order linear stochastic differential equation driven by Gaussian white noise holds under rather general conditions, little is known about the rate of convergence and the limiting distribution of the estimator, especially when the underlying process is not ergodic. The objective of this dissertation is to identify and investigate all possible types of asymptotic behavior of the maximum likelihood estimators. The emphasis is on the non-ergodic case, when the roots of the corresponding characteristic equation are not both in the left half-plane.

Chapter 1
Introduction

Statistical inference for multi-dimensional linear and bi-linear diffusion processes in continuous time has become an active area of research; the paper by Basak and Lee [7] provides a comprehensive and up-to-date survey of the literature on the subject. While every differential equation of order two and higher can be reduced to a system, linear stochastic equations deserve a separate analysis:

1. Similar to deterministic linear equations, the solution is easier to study than in the general matrix case;

2. The unknown coefficients form a vector rather than a matrix, which allows analysis of estimators of individual coefficients in a much more convenient way;

3. The general conditions in the matrix setting require a certain non-degeneracy of the diffusion, which may or may not hold for the higher-order linear stochastic equation.

We are interested in a generalization of the parameter estimation problem to this setting.
In particular, we will discuss the estimation of coefficients in the second-order equation

$$\ddot X(t) = \theta_2 \dot X(t) + \theta_1 X(t) + \sigma \dot W(t), \quad t > 0, \qquad X(0) = a, \ \dot X(0) = b, \qquad (1.1)$$

or the equivalent system

$$dX = \dot X\,dt, \qquad d\dot X = \big(\theta_1 X + \theta_2 \dot X\big)\,dt + \sigma\,dW(t), \qquad (1.2)$$

with a standard Brownian motion W = W(t) and non-random initial conditions a, b. Here $\dot g(t)$, $\ddot g(t)$ denote the first and second time derivatives of a function g.

An autoregressive model of higher order can be seen as the discrete version of a stochastic differential equation of order two and higher. For this reason, the SDE (1.1) may be called continuous autoregression of order two (CAR(2)). Over the past few decades, a large amount of research has been done on the AR(n) model, while much less has been done for continuous models. Growing demand for continuous-time modeling in various fields of study, from engineering to economics, is drawing more and more attention to statistical inference in the continuous-time setting. For instance, Nicolau [38] used equation (1.1) to model the price of financial derivatives as a continuous-time process. For a good review of the history of continuous-time economic models up to the year 1988, one can refer to [10].

The methodology of estimation used in this dissertation is the maximum likelihood approach. One important assumption we make is that the sample path can be observed continuously for all t ∈ [0, T]. This assumption is relatively rare in statistical inference because, in reality, observations can often be made only at discrete points in time. Although the benefit of continuous-time tools for modeling purposes is obvious, the use of stochastic processes with continuous (in time) sample paths still poses remarkable challenges in settings involving discrete data. Bandi and Phillips addressed this issue in their paper [6] and reviewed newly developed continuous-time approaches to statistical inference for stochastic processes.
In many cases, the discrete nature of data forces researchers to design estimation methodologies that can uniquely identify the structure of the underlying process from a sample of observations located along the continuous sample path, rather than from a continuous record of the process over that path (readers are referred to [2], [22]). Such methodologies generally, but not exclusively (cf. [44] and [1]), rely on stationarity. The reason is clear: if the underlying process is endowed with a time-invariant stationary density, then the information extracted from the discrete data can be used to identify the time-invariant probability measure and thereby, hopefully, characterize the continuous dynamics of the system. In this way, stationarity can be a powerful aid to identification and estimation. Sorensen [49] used exact discretization in systems of stochastic differential equations, but the estimation and discussion are restricted to the ARMA(2) model, which does not give a complete picture of parameter estimation in all possible cases for (1.1).

Despite the advantages of assuming the existence of a time-invariant stationary distribution, it appears that for many continuous-time models it would be more appropriate to allow for nonstationary behavior, while not ruling out stationarity either (for instance, [20], [45] and [46]). In later chapters, a few nonergodic/nonstationary models, such as the harmonic oscillator driven by a random force, will serve as examples.

1.1 Review of Parameter Estimation for the Ornstein-Uhlenbeck Process

This dissertation begins with a review of previous work on statistical inference for the well-known one-dimensional Ornstein-Uhlenbeck process. Let {W(t)}, t ≥ 0, be a standard Brownian motion and θ a real-valued parameter in the drift term; then the solution of the following stochastic differential equation is the Ornstein-Uhlenbeck process:

$$dX(t) = \theta X(t)\,dt + \sigma\,dW(t), \quad t \ge 0, \qquad X(0) = x_0. \qquad (1.3)$$

A natural problem is the estimation of θ. This problem has been intensively studied.
Chapter 2 will give a full review of basic concepts of the Ornstein-Uhlenbeck process and of related previous work on parameter estimation. The review of the Ornstein-Uhlenbeck process serves as a guide for the main body of this dissertation. The second-order SDE (1.1) is in fact a particular example of a two-dimensional Ornstein-Uhlenbeck process when written as a system. Although parameter estimation for the second-order SDE is more complicated than for the one-dimensional Ornstein-Uhlenbeck process, the two problems are closely related in nature. In later chapters the reader will see that, in several cases, the solution of the second-order SDE (1.1) breaks down into a linear combination of Ornstein-Uhlenbeck processes, so results and techniques that apply to the one-dimensional Ornstein-Uhlenbeck process can be employed to deal with the second-order SDE as well.

It is also worth noticing that a discrete version of a linear stochastic ODE is exactly the autoregressive process. As an important model in econometrics and many other fields of study, it has been studied for decades. Mann and Wald [36] were among the first to study the AR(p) model and the asymptotic behavior of the maximum likelihood estimator. In most early studies, only stationary processes were considered. However, in 1958, White [52] found the limiting distribution of the maximum likelihood estimator of α in the regression model x_i = α x_{i−1} + u_i when it is explosive, that is, when |α| = 1 or |α| > 1. An important tool he used is Donsker's theorem [18], which played the key role in proving the limiting distribution in the case α = 1. Anderson [5] discussed parameter estimation in all three cases of AR(1), with detailed computation procedures and useful results. Phillips [42] built a model connecting the AR(1) model and the Ornstein-Uhlenbeck process and developed an asymptotic theory for the AR(1) model with a root near unity.
The theory has applications to continuous-time estimation and to the analysis of the asymptotic power of tests for a unit root under a sequence of local alternatives.

Brown and Hewitt [15] studied the stable Ornstein-Uhlenbeck process and proved that the maximum likelihood estimator is asymptotically normal. Feigin [19] surveyed estimation problems for non-ergodic continuous-time processes, including diffusion processes. In 1980, Basawa and Rao [8] used a unified framework to study the asymptotic properties of tests and estimators of parameters in discrete-time and continuous-time jump-type processes.

Recently, Aït-Sahalia [1] developed an approach using Hermite polynomials to study the maximum likelihood estimator of a general continuous-time parametric diffusion with discrete-time observations.

1.2 Overview of Parameter Estimation for the Second-Order SDE (CAR(2))

The ODE corresponding to (1.1) is

$$\ddot X(t) = \theta_2 \dot X(t) + \theta_1 X(t), \quad t > 0, \qquad X(0) = 0, \ \dot X(0) = 1. \qquad (1.4)$$

The classification of the solutions of (1.4) depends on the eigenvalues of its characteristic equation. More generally, as a stochastic process, the solution of the second-order SDE (1.1) falls into two main categories: ergodic and nonergodic. However, with two parameters θ₁ and θ₂, the classification for the second-order equation (1.1) is much more complicated. The following is the definition of an ergodic process.

Definition 1.2.1. [32] An ergodic process is a stochastic process that has the ergodic property. This means that there exists an (invariant) distribution F(·), with density f(·), such that for any measurable function h(·) with E|h(ξ)| < ∞ (here ξ has distribution F(·)), we have the convergence

$$\lim_{T\to\infty} \frac{1}{T}\int_0^T h(X_t)\,dt = \int_{-\infty}^{\infty} h(x)f(x)\,dx \equiv E\,h(\xi)$$

with probability 1.

Ergodicity is equivalent to the existence and uniqueness of the invariant measure. Ergodicity is a convenient property: it says that the long-run average over time of a process, or of a measurable function of the process, equals the average of the function over space.
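As a concrete (non-original) illustration of Definition 1.2.1, consider the stable Ornstein-Uhlenbeck process dX = θX dt + σ dW with θ < 0, whose invariant distribution is N(0, −σ²/(2θ)). A minimal Euler-scheme sketch in Python (NumPy assumed; all parameter values are illustrative) compares the time average of h(X_t) = X_t² with the corresponding space average E h(ξ):

```python
import numpy as np

rng = np.random.default_rng(0)
theta, sigma = -1.0, 1.0        # stable case: theta < 0
dt, T = 1e-3, 500.0             # horizon long enough for the time average to settle
n = int(T / dt)

# Euler scheme for dX = theta*X dt + sigma dW, started at X(0) = 0
x, acc = 0.0, 0.0               # acc accumulates h(X_t) dt with h(x) = x**2
for _ in range(n):
    acc += x * x * dt
    x += theta * x * dt + sigma * np.sqrt(dt) * rng.standard_normal()

time_avg = acc / T                    # (1/T) * int_0^T X_t^2 dt
space_avg = -sigma**2 / (2 * theta)   # E[xi^2] for xi ~ N(0, -sigma^2/(2*theta))
print(time_avg, space_avg)            # the two averages nearly agree
```

Shrinking dt and growing T tightens the agreement, in line with the ergodic property above.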
For more information on ergodicity, please refer to [13], [12], [48], [31], [3], [4] and [41].

Although most previous work was concerned with ergodic processes, there are some inspiring papers on autoregressive models with unit roots, in which the processes are non-ergodic. Chan and Wei [16] in 1988 established the limiting distribution for the AR(p) model when the characteristic polynomial has all roots on the unit circle. In 1988, Phillips [43] derived the limiting distribution, as a stochastic integral, for vector ARIMA models. Johansen [28] studied a nonstationary fractional autoregressive model, which can be seen as a discrete version of the double-zero-root case of the SDE, but it assumes that θ₁ = 0 is known instead of estimating the two parameters together. Bergstrom [11] developed an algorithm for the computation of the MLE for nonstationary high-order SDEs.

With two parameters involved, there are many more cases to consider than in first-order continuous-time autoregression. The characteristic polynomial of CAR(2),

$$f(r) = r^2 - \theta_2 r - \theta_1, \qquad (1.5)$$

has two roots p and q:

$$p = \frac{\theta_2 + \sqrt{\theta_2^2 + 4\theta_1}}{2}, \qquad q = \frac{\theta_2 - \sqrt{\theta_2^2 + 4\theta_1}}{2}. \qquad (1.6)$$

A process is ergodic if it has the ergodic property [21, Chapter 6]. The CAR(2) process (1.1) is stationary and ergodic if and only if both roots of (1.5) have negative real parts, which is equivalent to θ₁ < 0, θ₂ < 0. Thus the process defined by (1.1) is ergodic if and only if the parameter pair (θ₁, θ₂) lies in the third quadrant. In the remaining, nonergodic cases the ergodic theorem does not apply, and one has to use other tools to derive the long-run properties of the process and its functionals. There are 11 cases in total to consider:

• Ergodicity of the pair (X(t), Ẋ(t)) requires that the real parts of p and q be negative; equivalently, θ₁ < 0, θ₂ < 0. There are three subcases: p, q distinct negative real roots; p, q repeated negative roots; p, q complex conjugates with negative real parts.
• When p, q are real and the process is unstable, there are six cases to consider: p, q distinct and positive; p, q repeated and positive; p, q of opposite sign; p > q = 0; p = 0 > q; p, q both zero.

• When p and q are complex and the process is nonergodic, there are two cases: p, q pure imaginary, and p, q complex with positive real parts.

In Chapter 2 we classify the Ornstein-Uhlenbeck process and review the previous work on the estimation problem, with a focus on the continuous-time model.

Chapter 3 covers the continuous-time methodology of estimating the parameters in the second-order SDE (1.1); it is the maximum likelihood approach. Two different assumptions are considered when estimating the parameters. While this dissertation focuses on the estimation of the two parameters together, which is more general and interesting, parameter estimation under the assumption that one parameter is known will also be addressed as a comparison. Indeed, the estimators under the two assumptions are not only different, but also have different rates of convergence in many nonergodic cases. A general discussion of the structure of the likelihood function is also included in that chapter.

Chapter 4 is concerned with all three ergodic cases. Despite the different values the parameter pair may take, due to the ergodic nature of the process the asymptotics of the estimators can be obtained from a central limit theorem for continuous-time ergodic processes, which is stated and proved at the beginning of that chapter. The limit distribution of the MLEs is normal and independent of the initial condition.

Chapter 5 deals with the six nonergodic cases with real roots. A theorem on the asymptotic behavior of the MLE is stated and proved for each case, followed by a brief discussion of its properties and the connections among the different cases. The last section of that chapter is on estimating one parameter under the assumption that the other one is known.
This problem is relatively easier, with only three cases, and the results follow from the earlier proofs; nevertheless the result is nontrivial and serves as a good comparison with the main estimation problem.

The two complex-root cases are discussed separately in Chapters 6 and 7. Chapter 6 is on the pure imaginary case, which has an interesting physics background: it is the equation describing a harmonic oscillator driven by a random force. Two hypothesis testing problems are introduced to give the reader a better understanding of how the theory in this dissertation is connected with the real world. Moreover, through the discussion of the hypothesis testing problems, the asymptotic behavior of the MLEs is found and stated as a theorem.

Chapter 7 covers the complex-root case, which is the only case in which the MLE does not converge to a single limit distribution. A sequence of the MLEs, with a proper choice of T_k, converges to a special type of distribution.

The last chapter is on simulation of the CAR(2) and CAR(1) processes. Although not directly related to the parameter estimation with which this dissertation is mainly concerned, the simulation algorithms are not only useful in statistical computation, but also give good insight into the connection between the continuous and discrete models.

To streamline the presentation, denote by ⇒ convergence in distribution, by $\stackrel{d}{=}$ equality in distribution, and by $0_{a.s.}(t)$ a continuous random process converging to zero with probability one as t → ∞. If f(t) > 0 is a continuous non-random function, $F(T) = \int_0^T f(s)\,ds$, and $\lim_{T\to\infty} F(T) = \infty$, then a continuous-time version of the Toeplitz lemma implies

$$\frac{1}{F(T)}\int_0^T f(t)\,0_{a.s.}(t)\,dt = 0_{a.s.}(T). \qquad (1.7)$$
Chapter 2
Ornstein-Uhlenbeck Process: CAR(1)

This chapter is concerned with a survey of results on the asymptotic theory and methods of inference for the following first-order stochastic differential equation:

$$dX(t) = \theta X(t)\,dt + \sigma\,dW(t), \qquad X(0) = x_0. \qquad (2.1)$$

Here θ ∈ R is the parameter we are estimating and W(t) is a Wiener process on the given probability space (Ω, F, P) with filtration {F_t, t ≥ 0}. It is known that, explicitly, the process is

$$X(t) = x_0\,e^{\theta t} + \sigma\int_0^t e^{\theta(t-s)}\,dW(s).$$

The solution of (1.3) is also known as the Ornstein-Uhlenbeck process. We next briefly review a more general form of the process in (1.3), corresponding to θ < 0.

2.1 General Ornstein-Uhlenbeck Process

More generally, the Ornstein-Uhlenbeck process is written as

$$dX_t = a(b - X_t)\,dt + \sigma\,dW_t, \qquad (2.2)$$

in which a, b, σ ∈ R with σ > 0 and a > 0. The initial condition can be either deterministic or stochastic, depending on the specific setting of the model.

When a > 0, the process X_t is stationary and mean-reverting: when X_t is above the long-term mean b, the drift is negative, bringing the process down towards the mean; when X_t is below b, the drift is positive, pushing it up to the mean. In other words, the mean acts as an equilibrium level for the process. This gives the process its informative name, "mean-reverting."

As one can see, the parameter a in the Ornstein-Uhlenbeck process (2.2) is crucial because it determines the long-run properties of the process, more specifically, its ergodicity. Without loss of generality, we assume that the long-term mean is zero and the diffusion parameter is one. This gives the simplified version (1.3) of the Ornstein-Uhlenbeck process.

The Ornstein-Uhlenbeck process was first introduced to describe the velocity of a massive Brownian particle under the influence of friction [50]. Later it became very popular in finance, being one of several approaches used to model interest rates, currency exchange rates, and commodity prices stochastically.
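Because the transition of (2.2) over a fixed step is Gaussian with explicitly known mean and variance, the mean-reverting process can be sampled without discretization error. The following Python sketch (not from the dissertation; the values of a, b, σ and the step size are illustrative) uses this exact one-step transition, the same idea as the exact discretization compared in Chapter 8:

```python
import numpy as np

rng = np.random.default_rng(1)
a, b, sigma = 2.0, 1.0, 0.5     # reversion speed, long-term mean, volatility
dt, n = 0.01, 5000
x = np.empty(n + 1)
x[0] = 5.0                      # start well above the equilibrium level b

# Exact transition of dX = a(b - X) dt + sigma dW over one step:
# X_{t+dt} | X_t ~ N( b + (X_t - b) e^{-a dt},  sigma^2 (1 - e^{-2 a dt}) / (2a) )
decay = np.exp(-a * dt)
step_sd = sigma * np.sqrt((1.0 - decay**2) / (2.0 * a))
for k in range(n):
    x[k + 1] = b + (x[k] - b) * decay + step_sd * rng.standard_normal()

# Mean reversion: the path is pulled from 5.0 toward b = 1 and then fluctuates
# around it with stationary standard deviation sigma / sqrt(2a) = 0.25
print(x[0], float(x[n // 2:].mean()))
```

Unlike the Euler scheme, this sampler is exact for any step size, which is why Chapter 8 uses it as the benchmark.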
In 1977, Vasicek used it to model the instantaneous interest rate (see [51], [24]). More specifically, X_t in (2.2) is the instantaneous interest rate and W_t is a Wiener process under the risk-neutral framework, modelling the random market risk factor; b is the long-term mean of the interest rate, a characterizes the velocity at which the interest rate returns to the mean, and σ is the volatility of the interest rate, which measures the "randomness" entering the system. Vasicek assumes a > 0 in his model, so the process is mean-reverting. It was the first interest rate model to capture mean reversion, an essential characteristic of interest rates that sets them apart from other financial prices.

2.2 Classification of the Ornstein-Uhlenbeck Process

Now we go back to (1.3). The discussion in the previous section shows that the sign of the parameter θ in (1.3) determines the long-run behavior of the process X(t). Based on this, we classify the process into three categories:

1. Stable case (mean-reverting): when θ < 0, the process defined by (1.3) is stable and stationary.

2. Unstable case (explosive): when θ > 0, the solution of the SDE is unstable (explosive).

3. Neutrally stable case (Brownian motion): when θ = 0, X(t) is simply a Brownian motion.

2.3 Ergodicity of the Ornstein-Uhlenbeck Process

For a continuous-time diffusion process defined by

$$dX_t = S(X_t)\,dt + \sigma(X_t)\,dW_t, \qquad 0 \le t \le T, \qquad (2.3)$$

Kutoyants gave conditions for ergodicity in [17] and [32]. The process we are interested in is a special case of (2.3). We now state and prove the following theorem.

Theorem 2.3.1. For the process X(t) in the first-order SDE (1.3), ergodicity is equivalent to θ < 0. Moreover, when θ < 0, the invariant density is $\sqrt{-\theta/\pi}\,\exp(\theta x^2)$ and the process satisfies the ergodic theorem. In particular,

$$\lim_{T\to\infty} \frac{1}{T}\int_0^T X_t\,dt = E\xi = 0.$$

Proof. A necessary and sufficient condition for a diffusion process (2.3) to be ergodic is given by Kutoyants [32].
Proposition 1.15 and Theorem 1.16 in [32] say that the stochastic process has ergodic properties if the functions S(·) and σ(·) satisfy the following two conditions:

$$V(S, x) = \int_0^x \exp\left\{-2\int_0^y \frac{S(v)}{\sigma(v)^2}\,dv\right\}dy \to \pm\infty \quad \text{as } x \to \pm\infty \qquad (2.4)$$

and

$$G(S) = \int_{-\infty}^{+\infty} \sigma(x)^{-2}\exp\left\{2\int_0^x \frac{S(v)}{\sigma(v)^2}\,dv\right\}dx < \infty. \qquad (2.5)$$

Under the first condition the diffusion process is recurrent, i.e., the time to return to any bounded set A is finite with probability 1, and under the second condition this time has finite expectation. If both conditions are fulfilled, the process is positive recurrent, i.e., it has ergodic properties. This means that for any measurable function h(·) with E|h(ξ)| < ∞, the limit

$$\frac{1}{T}\int_0^T h(X_t)\,dt \to \int_{-\infty}^{\infty} h(x)\,f_S(x)\,dx \equiv E\,h(\xi) \qquad (2.6)$$

holds with probability 1. Here the function

$$f_S(x) = G(S)^{-1}\,\sigma(x)^{-2}\exp\left\{2\int_0^x \frac{S(v)}{\sigma(v)^2}\,dv\right\} \qquad (2.7)$$

is the invariant density of the process, and ξ is a random variable with stationary density f_S(·).

Now we want to show that the process X(t) in the first-order SDE (1.3) (with σ = 1) is ergodic if and only if it is stable (θ < 0). We need to verify the positive recurrence conditions:

$$V(S, x) = \int_0^x \exp\left\{-\theta y^2\right\}dy \to \pm\infty \quad \text{as } x \to \pm\infty \qquad (2.8)$$

and

$$G(S) = \int_{-\infty}^{+\infty} \exp(\theta x^2)\,dx < \infty. \qquad (2.9)$$

It is not hard to see that θ < 0 is necessary and sufficient for the two conditions above to hold. Moreover, for any measurable function h(·) with E|h(ξ)| < ∞, the limit

$$\lim_{T\to\infty} \frac{1}{T}\int_0^T h(X_t)\,dt = \int_{-\infty}^{\infty} h(x)\,f_S(x)\,dx \equiv E\,h(\xi) \qquad (2.10)$$

holds with probability 1, where $f_S(x) = \sqrt{-\theta/\pi}\,\exp(\theta x^2)$ is the invariant density of the process and ξ is a random variable with density f_S(·).
In particular, taking h(·) to be the identity function, we obtain

$$\lim_{T\to\infty} \frac{1}{T}\int_0^T X_t\,dt = E\xi = 0.$$

2.4 Parameter Estimation

By [34, Theorem 7.6], the measure $P_T^X$ generated by (X(t), 0 ≤ t ≤ T) on the space of continuous functions is absolutely continuous with respect to the measure $P_T^W$ generated by the Brownian motion (W(t), 0 ≤ t ≤ T), and

$$\frac{dP_T^X}{dP_T^W} = \exp\left(\int_0^T \theta X(s)\,dX(s) - \frac{1}{2}\int_0^T \big(\theta X(s)\big)^2\,ds\right). \qquad (2.11)$$

Differentiating with respect to θ and setting the derivative to zero, one obtains the maximum likelihood estimator of θ:

$$\hat\theta_T = \frac{\int_0^T X(t)\,dX(t)}{\int_0^T X^2(t)\,dt} = \frac{X^2(T) - x_0^2 - T}{2\int_0^T X^2(t)\,dt}. \qquad (2.12)$$

2.4.1 Consistency of the Estimator

Since dX(t) = θX(t)dt + dW(t), (2.12) is equivalent to

$$\hat\theta_T - \theta = \frac{\int_0^T X(t)\,dW(t)}{\int_0^T X^2(t)\,dt}. \qquad (2.13)$$

The asymptotic behavior of the estimator is therefore entirely determined by the asymptotics of these two stochastic integrals. Liptser and Shiryaev [34] showed consistency of the estimator:

Theorem 2.4.1. [34, Theorem 17.4] The MLE $\hat\theta_T$ is strongly consistent; in other words, for each θ ∈ R,

$$P\left\{\lim_{T\to\infty} \hat\theta_T = \theta\right\} = 1.$$

2.4.2 Asymptotic Distribution of the Estimator

Brown and Hewitt [15] proved that in the stable (also ergodic) case the MLE is asymptotically normal. Feigin [19] covered both the stable and unstable cases by a unifying approach; this work is of great importance because it highlights the generality of the martingale approach to the asymptotic theory of maximum likelihood estimation. Kutoyants [32] discussed some non-ergodic models and gave a neat proof for the unstable OU process (Proposition 3.46 in [32]). The following is a sketch of the proof.

If θ > 0, then the process {X(t), t ≥ 0} goes to ±∞ as t → ∞. Indeed, the solution of (1.3) can be written as

$$X(t) = x_0\,e^{\theta t} + \sigma e^{\theta t}\int_0^t e^{-\theta s}\,dW(s) = \sigma Z(t)\,e^{\theta t},$$

where

$$Z(t) = \frac{x_0}{\sigma} + \int_0^t e^{-\theta s}\,dW(s) \Rightarrow \frac{x_0}{\sigma} + \frac{\zeta_2}{\sqrt{2\theta}}, \qquad \zeta_2 \sim N(0,1).$$

Hence

$$\lim_{t\to\infty} \frac{X(t)}{e^{\theta t}} = \sigma\left(\frac{x_0}{\sigma} + \frac{\zeta_2}{\sqrt{2\theta}}\right).$$

We thus have two integrals to study:

$$\int_0^T Z(t)\,e^{\theta t}\,dW(t) \quad \text{and} \quad \int_0^T Z^2(t)\,e^{2\theta t}\,dt.$$
Introduce the stochastic process

$$Y(t) = \int_0^t e^{\theta s}\,dW(s), \quad t \ge 0; \qquad e^{-\theta T}\,Y(T) \Rightarrow \frac{\zeta_1}{\sqrt{2\theta}}, \quad \zeta_1 \sim N(0,1).$$

By the Itô formula [29],

$$e^{-\theta T}\int_0^T X(t)\,dW(t) = \sigma e^{-\theta T}\int_0^T Z(t)\,dY(t) = \sigma\left(Z(T)\,e^{-\theta T}Y(T) - T\,e^{-\theta T} - e^{-\theta T}\int_0^T e^{-\theta t}\,Y(t)\,dW(t)\right) \Rightarrow \sigma\,\frac{\zeta_1\big(\zeta_2 + \sqrt{2\theta}\,x_0/\sigma\big)}{2\theta} \qquad (2.14)$$

as T → ∞, and, with probability one,

$$e^{-2\theta T}\int_0^T X^2(t)\,dt = \sigma^2 e^{-2\theta T}\int_0^T Z^2(t)\,e^{2\theta t}\,dt = \sigma^2\frac{Z^2(T)}{2\theta} - e^{-2\theta T}\frac{\sigma^2}{2\theta}\int_0^T e^{2\theta t}\,d\big(Z^2(t)\big) = \sigma^2\frac{Z^2(T)}{2\theta} - o_p(1) \Rightarrow \sigma^2\,\frac{\big(\zeta_2 + \sqrt{2\theta}\,x_0/\sigma\big)^2}{4\theta^2}. \qquad (2.15)$$

Now the convergence follows directly from these representations. Note that the random variables ζ₁ and ζ₂ are asymptotically independent, because they are jointly asymptotically normal and

$$\lim_{t\to\infty} E\big(Z(t)\,e^{-\theta t}Y(t)\big) = \frac{1}{2\theta}\,E\left[\left(\zeta_2 + \frac{\sqrt{2\theta}\,x_0}{\sigma}\right)\zeta_1\right] = 0.$$

In the end we have

$$e^{\theta T}\big(\hat\theta_T - \theta\big) = \sigma\,\frac{e^{-\theta T}\int_0^T X(t)\,dW(t)}{e^{-2\theta T}\int_0^T X^2(t)\,dt} \Rightarrow \frac{2\theta\,\zeta_1}{\zeta_2 + \sqrt{2\theta}\,x_0/\sigma}, \qquad (2.16)$$

which is a Cauchy distribution when x₀ = 0.

In summary:

• The maximum likelihood estimator $\hat\theta_T$ of θ from the observations X(t), 0 ≤ t ≤ T, is

$$\hat\theta_T = \frac{\int_0^T X(t)\,dX(t)}{\int_0^T X^2(t)\,dt}; \qquad (2.17)$$

the estimator is strongly consistent as T → ∞: $\lim_{T\to\infty}\hat\theta_T = \theta$ with probability one for all θ ∈ R.

• If θ < 0 (asymptotically stable, or ergodic, case), then

$$\lim_{T\to\infty} \sqrt{|\theta|\,T}\,\big(\hat\theta_T - \theta\big) \stackrel{d}{=} \sqrt{2}\,|\theta|\,\xi, \qquad (2.18)$$

where ξ is a standard normal random variable.

• If θ = 0 (neutrally stable case), then

$$\lim_{T\to\infty} T\big(\hat\theta_T - \theta\big) \stackrel{d}{=} \frac{w^2(1) - 1}{2\int_0^1 w^2(s)\,ds}, \qquad (2.19)$$

where w = w(s), 0 ≤ s ≤ 1, is a standard Brownian motion.

• If θ > 0 (unstable, or explosive, case), then

$$\lim_{T\to\infty} e^{\theta T}\big(\hat\theta_T - \theta\big) \stackrel{d}{=} \frac{2\theta\,\eta}{\xi + c}, \qquad (2.20)$$

where $\xi = \sqrt{2\theta}\int_0^\infty e^{-\theta t}\,dW(t)$ is a standard normal random variable, η is a standard normal random variable independent of ξ, and $c = \sqrt{2\theta}\,x_0/\sigma$. In particular, if x₀ = 0, then the limit has the Cauchy distribution and does not depend on σ.

2.5 Connection with the Discrete Model

The diffusion process defined as the solution of the first-order SDE (1.3) is the continuous-time analogue of the first-order autoregressive process, a.k.a. AR(1).
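This correspondence can be checked numerically. On a finely sampled path of (1.3), the Itô and Riemann sums approximating the continuous-time MLE (2.12) coincide, up to rounding, with a least-squares autoregression fit on the same observations. A sketch (mine, not from the text; the ergodic case with σ = 1, step size and horizon illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
theta, dt, T = -0.8, 1e-3, 400.0   # ergodic case theta < 0, sigma = 1
n = int(T / dt)

# Euler scheme for dX = theta X dt + dW
x = np.empty(n + 1)
x[0] = 0.0
dw = np.sqrt(dt) * rng.standard_normal(n)
for k in range(n):
    x[k + 1] = x[k] + theta * x[k] * dt + dw[k]

dx = np.diff(x)
# Continuous-time MLE (2.12) with the integrals replaced by discrete sums
theta_mle = np.sum(x[:-1] * dx) / (np.sum(x[:-1] ** 2) * dt)
# Least-squares AR(1) coefficient on the same observations; a_hat ~ 1 + theta*dt,
# and (a_hat - 1)/dt is algebraically identical to theta_mle at this step size
a_hat = np.sum(x[1:] * x[:-1]) / np.sum(x[:-1] ** 2)
print(theta_mle, (a_hat - 1.0) / dt)   # both close to theta
```

The identity (a_hat − 1)/dt = theta_mle follows by expanding the sums, which is the discrete shadow of the analogy developed in this section.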
Mathematically,

$$x_t = a\,x_{t-1} + u_t, \qquad t = 1, 2, \ldots, T, \qquad (2.21)$$

where the noise u_t is i.i.d. with mean zero and variance σ². The least-squares estimator of a is

$$\hat a = \frac{\sum x_t x_{t-1}}{\sum x_{t-1}^2}. \qquad (2.22)$$

Compared with (2.12), (2.22) is the discrete-time analogue of the MLE of θ. Mann and Wald [36] showed that when |a| < 1, the least-squares estimator is asymptotically normal with mean a and variance (1 − a²)/T. White [52] showed that when |a| > 1, the least-squares estimator is asymptotically Cauchy. The case |a| = 1 is called a "unit root" in time series analysis [40]. White [52] also covered this case, showing by Donsker's theorem [18] that the asymptotic distribution is

$$\frac{\tfrac{1}{2}\big(W^2(1) - 1\big)}{\int_0^1 W^2(s)\,ds}.$$

All these results conform with the results in the continuous case.

Chapter 3
Parameter Estimation for CAR(2)

In this chapter we consider the problem of parameter estimation for the second-order SDE (1.1). Recall the stochastic differential equation

$$\ddot X(t) = \theta_2 \dot X(t) + \theta_1 X(t) + \sigma \dot W(t), \quad t > 0, \qquad X(0) = a, \ \dot X(0) = b, \qquad (3.1)$$

where W(t) is a Wiener process on the given probability space (Ω, F, P) with filtration {F_t, t ≥ 0}, and a, b are non-random initial conditions. We can interpret it as a system of two equations:

$$dX = Y\,dt, \qquad dY = \big(\theta_1 X + \theta_2 Y\big)\,dt + \sigma\,dW(t). \qquad (3.2)$$

Here θ₁, θ₂ ∈ R are the parameters of interest. We shall discuss the parameter estimation problem both under the assumption that one parameter is known and in the joint setting where neither parameter is known to us.

3.1 Classification of the Solution

From the introduction, we know that when θ₁ < 0 and θ₂ < 0 the solution of (1.1) is ergodic. For any other value of the pair (θ₁, θ₂) the solution is nonergodic, meaning that the ergodic theorem does not apply and a different approach must be taken to tackle the parameter estimation problem. Before doing that, it makes sense to classify the solutions of (1.1) so that each class can be treated according to its particular properties.
The way we classify the solutions is similar to the way we classify the solutions of the corresponding ODE (1.4): we classify them according to the eigenvalues. The solution of (1.1) is a Gaussian process

X(t) = a x_1(t) + b x_2(t) + σ ∫_0^t x_2(t-s) dW(s), (3.3)

where the functions x_1(t), x_2(t) form the fundamental system of solutions for the equation

ẍ(t) - θ_2 ẋ(t) - θ_1 x(t) = 0. (3.4)

In other words, x_1(0) = 1, ẋ_1(0) = 0, x_2(0) = 0, ẋ_2(0) = 1, and both x_1 = x_1(t) and x_2 = x_2(t) satisfy (3.4). Recall that the characteristic equation for (3.4) is

r^2 - θ_2 r - θ_1 = 0, (3.5)

and the roots of (3.5) are

p = (θ_2 + √(θ_2^2 + 4θ_1))/2, q = (θ_2 - √(θ_2^2 + 4θ_1))/2. (3.6)

Then, with the usual modifications when p, q are complex,

x_1(t) = (q e^{pt} - p e^{qt})/(q - p) if p > q, x_1(t) = e^{qt}(1 - qt) if p = q;
x_2(t) = (e^{pt} - e^{qt})/(p - q) if p > q, x_2(t) = t e^{qt} if p = q. (3.7)

• Ergodicity of the pair (X(t), Ẋ(t)) requires the real parts of p and q to be negative, or, equivalently, θ_1 < 0 and θ_2 < 0. There are three sub-cases: p, q are distinct negative real roots; p, q are repeated negative roots; p, q are complex conjugates with negative real parts.
• When p, q are real numbers and the process is unstable, there are six cases to consider: p, q are distinct and positive; p, q are repeated and positive; p, q have opposite signs; p > q = 0; p = 0 > q; p, q are both zero.
• When p and q are complex numbers and the process is nonergodic, there are two cases: p, q are purely imaginary, and p, q are complex with positive real parts.
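The case analysis above is mechanical enough to encode. The sketch below computes the roots (3.6) and reports which regime a pair (θ_1, θ_2) falls into; the function names and the informal regime labels are ours, not terminology from the text.

```python
import cmath

def roots(theta1, theta2):
    """Roots p, q (p first by the + branch) of r^2 - theta2*r - theta1 = 0,
    as in (3.5)-(3.6)."""
    s = cmath.sqrt(theta2**2 + 4 * theta1)
    return (theta2 + s) / 2, (theta2 - s) / 2

def classify(theta1, theta2, tol=1e-12):
    """Informal regime label for the solution of (1.1)."""
    p, q = roots(theta1, theta2)
    if p.real < -tol and q.real < -tol:
        return "ergodic"            # equivalent to theta1 < 0 and theta2 < 0
    if abs(p.real) <= tol and abs(q.real) <= tol:
        return "neutral"            # zero root(s) or a purely imaginary pair
    return "unstable"               # some root with positive real part

print(classify(-2.0, -3.0))  # roots -1, -2 -> ergodic
print(classify(-1.0, 0.0))   # roots +i, -i -> neutral
print(classify(2.0, 1.0))    # roots 2, -1  -> unstable
```

The three branches match the three bullets above; the "unstable" label lumps together all the nonergodic cases with a root of positive real part, which the following chapters treat separately.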
25 3.2 Parameter Estimation of θ 1 and θ 2 The process Y(t) is a diffusion type process in the sense of Lipster and Shiryaev; Therefore, by [34, Theorem 7.6], the measure P Y T generated by (Y(t),0 ≤ t ≤ T) in the space of continuous functions is absolutely continuous with respect to the corresponding measure P W T generated by the Brownian motion (W(t),0 ≤ t ≤ T), and the likelihood function is dP Y T dP W T (Y) = exp ( 1 σ 2 ∫ t 0 (θ 2 Y(t)+θ 1 X(t))dY(t)− 1 2σ 2 ∫ t 0 (θ 2 Y(t)+θ 1 X(t)) 2 dt ) . (3.8) The log-likelihood function is ln dP Y T dP W T (Y) = 1 σ 2 ∫ t 0 (θ 2 Y(t)+θ 1 X(t))dY(t)− 1 2σ 2 ∫ t 0 (θ 2 Y(t)+θ 1 X(t)) 2 dt. (3.9) Inthelog-likelihoodfunction(3.9),takethederivativew.r.tθ 1 andθ 2 ,oneobtains the following linear system: 0 = ∫ T 0 X(t)dY(t)− ∫ T 0 X(t)(θ 2 Y(t)+θ 1 X(t))dt 0 = ∫ T 0 Y(t)dY(t)− ∫ T 0 Y(t)(θ 2 Y(t)+θ 1 X(t))dt (3.10) 26 Maximum likelihood estimators of θ 1 and θ 2 are b θ T 1 = ∫ T 0 Y 2 (t)dt ∫ T 0 X(t)dY(t)− ∫ T 0 Y(t)dY(t) ∫ T 0 Y(t)X(t)dt ∫ T 0 X 2 (t)dt ∫ T 0 Y 2 (t)dt− ( ∫ T 0 Y(t)X(t)dt ) 2 b θ T 2 = ∫ T 0 X 2 (t)dt ∫ T 0 Y(t)dY(t)− ∫ T 0 X(t)dY(t) ∫ T 0 Y(t)X(t)dt ∫ T 0 X 2 (t)dt ∫ T 0 Y 2 (t)dt− ( ∫ T 0 Y(t)X(t)dt ) 2 (3.11) The estimators are well-defined: by the Cauchy-Schwartz inequality, (∫ T 0 ˙ X 2 (t)dt )(∫ T 0 X 2 (t)dt ) > (∫ T 0 X(t) ˙ X(t)dt ) 2 with probability one. The residual satisfies b θ T 1 −θ 1 =σ ∫ T 0 Y 2 (t)dt ∫ T 0 X(t)dW(t)− ∫ T 0 Y(t)X(t)dt ∫ T 0 Y(t)dW(t) ∫ T 0 X 2 (t)dt ∫ T 0 Y 2 (t)dt− ( ∫ T 0 Y(t)X(t)dt ) 2 , b θ T 2 −θ 2 =σ ∫ T 0 X 2 (t)dt ∫ T 0 Y(t)dW(t)− ∫ T 0 Y(t)X(t)dt ∫ T 0 X(t)dW(t) ∫ T 0 X 2 (t)dt ∫ T 0 Y 2 (t)dt− ( ∫ T 0 Y(t)X(t)dt ) 2 . 
(3.12) Inwhatfollows, wewilldenoteby thetwo-dimensionalcolumnvector(θ 1 θ 2 ) ⊤ , and byX =X(t), the column vector ( X(t) ˙ X(t) ) ⊤ , so that d ˙ X = ⊤ Xdt+σdW(t) (3.13) and ^ T − = Ψ −1 (T) ∫ T 0 X(t)dW(t) (3.14) 27 where Ψ(T) = ∫ T 0 X(t)X ⊤ (t)dt = ∫ T 0 X 2 (t)dt ∫ T 0 X(t) ˙ X(t)dt ∫ T 0 X(t) ˙ X(t)dt ∫ T 0 ˙ X 2 (t)dt (3.15) Then it follows that the log-likelihood ratio is L T (ϑ) = ln dP Y T dP W T (Y) = 1 σ ∫ T 0 ( #− ) ⊤ XdW(t)− 1 2σ 2 ∫ T 0 ( #− ) ⊤ X(t)X ⊤ (t) ( #− ) dt = 1 σ ∫ T 0 ( #− ) ⊤ XdW(t)− 1 2σ 2 ( #− ) ⊤ Ψ(T) ( #− ) dt (3.16) Notice that matrix Ψ(T) is the Fisher information matrix. The basic questions in the study of the estimators b θ 1 (T), b θ 2 (T) are 1. strong consistency, that is, proving that, with probability one, lim T→∞ b θ i (T) = θ i , i = 1,2; 2. rate of convergence and limit distribution, that is, finding positive deter- ministic functions v i (T), i = 1,2, such that, as T → ∞, v i (T) ↗ +∞ and v i (T) ( b θ i;T −θ i ) converge in distribution to non-degenerate random variables, and identifying the corresponding limit distributions; 3. local asymptotic structure of the normalized log-likelihood ratio ℓ T (u) =L T (+A T u), (3.17) 28 as T → ∞, where A T ∈ R 2 is a suiably chosen deterministic matrix with lim T→∞ ∥A T ∥ = 0(any matrix norm will work), andu∈R 2 . 3.2.1 Consistency of MLE Theorem 3.2.1. When either θ 1 or θ 2 is known, the maximum likelihood estimators b θ 1;T and b θ 2;T in (3.12) are strongly consistent, namely, with probability one, lim T→∞ b θ T 1 =θ 1 , lim T→∞ b θ T 2 =θ 2 , (3.18) Proof. The system (1.1) in the matrix-vector form is dX(t) = ΘX(t)dt+(0 σ) ⊤ dW(t), whereX = (X ˙ X) ⊤ and Θ = 0 1 θ 1 θ 2 . The estimator studied in [7] is b Θ T = (∫ T 0 ( dX(t)X ⊤ (t) ) )(∫ T 0 X(t)X ⊤ (t)dt ) −1 . (3.19) Direct computations show that b Θ T = 0 1 b θ 1;T b θ 2;T . 29 The statement of the theorem now follows from Theorem 2.1 and Remark 3.1 in [7]. 
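To see the estimators (3.11) in action, here is a minimal simulation sketch: it discretizes the system (3.2) by Euler–Maruyama, approximates the integrals in (3.11) by left-endpoint sums, and recovers (θ_1, θ_2) from a single long path in the ergodic regime. The function names and numerical settings are illustrative, and the time discretization introduces a small bias absent from the continuous-time theory.

```python
import numpy as np

def simulate_car2(theta1, theta2, sigma, T, n, a=0.0, b=0.0, rng=None):
    """Euler-Maruyama for dX = Y dt, dY = (theta1*X + theta2*Y) dt + sigma dW."""
    rng = np.random.default_rng(rng)
    dt = T / n
    X = np.empty(n + 1); Y = np.empty(n + 1)
    X[0], Y[0] = a, b
    dW = rng.normal(0.0, np.sqrt(dt), size=n)
    for i in range(n):
        X[i + 1] = X[i] + Y[i] * dt
        Y[i + 1] = Y[i] + (theta1 * X[i] + theta2 * Y[i]) * dt + sigma * dW[i]
    return X, Y, dt

def mle_car2(X, Y, dt):
    """Discretized version of the estimators (3.11)."""
    dY = np.diff(Y)
    Xl, Yl = X[:-1], Y[:-1]           # left-endpoint values
    Sxx = np.sum(Xl**2) * dt          # ~ integral of X^2
    Syy = np.sum(Yl**2) * dt          # ~ integral of Y^2
    Sxy = np.sum(Xl * Yl) * dt        # ~ integral of X*Y
    IxdY = np.sum(Xl * dY)            # ~ stochastic integral of X dY
    IydY = np.sum(Yl * dY)            # ~ stochastic integral of Y dY
    den = Sxx * Syy - Sxy**2
    th1 = (Syy * IxdY - IydY * Sxy) / den
    th2 = (Sxx * IydY - IxdY * Sxy) / den
    return th1, th2

X, Y, dt = simulate_car2(-1.0, -2.0, 0.5, T=200.0, n=200_000, rng=1)
th1, th2 = mle_car2(X, Y, dt)
print(th1, th2)   # should land near (-1, -2)
```

With θ_1 = θ_2 < 0 the estimates scatter around the truth at the √T scale, in line with the ergodic-case results proved in Chapter 4.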
3.2.2 Asymptotic Distribution of MLE: Preparation Before detailed discussion on the limit distribution of the MLEs in each case, some technical results are introduced to help reduce the amount of calculation later on. Given two square-integrable on [0,T] functions f,g, define N(T;f,g) = (∫ T 0 f 2 (t)dt )(∫ T 0 g(t)σdW(t) ) (3.20) − (∫ T 0 f(t)g(t)dt )(∫ T 0 f(t)σdW(t) ) , D(T;f,g) = (∫ T 0 f 2 (t)dt )(∫ T 0 g 2 (t)dt ) − (∫ T 0 f(t)g(t)dt ) 2 . (3.21) Clearly, D(T;f,g) =D(T;g,f), but in general N(T;f,g)̸=N(T;g,f). Then formulas (3.12) become b θ T 1 −θ 1 = N(T; ˙ X,X) D(T;X, ˙ X) , b θ T 2 −θ 2 = N(T;X, ˙ X) D(T;X, ˙ X) . (3.22) 30 For every real number c and every square-integrable functions f,g, we have the following identities: D(T;f +cg,g) =D(T;f,g), D(T;cf,g) =c 2 D(T;f,g), N(T;f +cg,g) =N(T;f,g)−cN(T;g,f), N(T;f,cf +g) =N(T;f,g). (3.23) ThegeneralideaofalltheproofsistofindtheasymptoticbehaviorofD(T;X, ˙ X), N(T;X, ˙ X), and N(T; ˙ X,X), as T → ∞. Working directly with the processes X and ˙ X reduces the problem to the investigation of five integrals: ∫ T 0 X 2 (t)dt, ∫ T 0 ˙ X 2 (t)dt, ∫ T 0 X(t) ˙ X(t)dt, ∫ T 0 X(t)dW(t), ∫ T 0 ˙ X(t)dW(t). (3.24) Sometimes, computations can be greatly simplified by writing X = α ˙ X +R or ˙ X =βX+Q for some real numbers α,β and random processes R,Q, and then using formulas (3.23). This approach still requires investigation of five integrals of the type (3.24). ( For a general discussion on finding limit distribution of the functional ∫ T 0 f(X t )dt, one can refer to Khasminskii [30].) Here are some technical results to be used later. For r > 0, define the random variables ξ r = ∫ ∞ 0 e −rs dW(s), η r (T) = ∫ T 0 e −r(T−s) dW(s). (3.25) 31 It follows by direct computation that lim T→∞ η r (T) d =η r , (3.26) where the random variable η r is normal with mean zero and variance 1/(2r). 
Since lim T→∞ Eξ q η r (T) =e −rT ∫ T 0 e (r−q)s ds = 0, and zero correlation for Gaussian random variables is equivalent to independence, it follows that η r and ξ q are independent for every q,r > 0. 3.2.3 Asymptotic Structure of Log-likelihood Function As far as the normalized log-likelihood ratio (3.17), it follows from (3.16) that ℓ T (u) = 1 σ ∫ T 0 ( u ⊤ A T X(t) ) dW(t)− 1 2σ 2 u ⊤ A T Ψ T A T u, (3.27) that is, ℓ T is quadratic in u. The hope is that, with a suitable choice of the matrix A T , the limit ℓ ∞ (u) = lim T→∞ ℓ T (u) (3.28) exists in distribution for every u∈R. Three particular cases of ℓ T satisfying (3.28) have attracted special attention: 32 • Local Asymptotic Normality (LAN), when there exists a bi-variate normal vector with mean zero and non-degenerate covariance matrix Σ such that, for everyu∈R 2 , ℓ ∞ (u) = 1 σ u ⊤ − 1 2σ 2 u ⊤ Σ u; (3.29) • Local Asymptotic Mixed Normality (LAMN) if there exist a bi-variate nor- mal vector with zero mean and unit covariance matrix, and a random sym- metric positive definite matrix B ∈R 2×2 such that B and are independent and, for everyu∈R 2 , ℓ ∞ (u) = 1 σ u ⊤ B 1=2 − 1 2σ 2 u ⊤ Bu. (3.30) If (3.30) holds with a degenerate matrix B, we refer to ℓ T as DLAMN (degen- erate locally asymptotically mixed normal). • Local Asymptotic Brownian Functional structure (LABF) if ℓ ∞ (u) = 1 σ ∫ 1 0 u ⊤ G(t)dw(t)− 1 2σ 2 ∫ 1 0 u ⊤ G(t)G ⊤ (t)udt, (3.31) where G ∈ R 2×2 is an adapted process and the pair (G,w) is a Gaussian process. The LAN and LAMN properties of ℓ T imply certain asymptotic efficiency of the corresponding maximum likelihood estimator (MLE): 33 1. If ℓ T is LAN, then the corresponding MLE is asymptotically efficient in the sense of achieving the lower bound in the Cramer-Rao inequality; for details see [25, Theorem II.12.1]. 2. If ℓ T is LAMN, then the corresponding MLE has the maximal concentration property; for details, see [9, Theorem 2.2.1]. 
Note that the result requires non-degeneracy of the matrix B and therefore does not immediately extend to DLAMN. In the LABF case, there are results about the asymptotic efficiency of Bayesian estimators [27, Section 3, Proposition 10] and of sequential estimators (Feigin, Theorem 2).

To put our results in perspective, let us recall the estimation problem for the drift θ in the CAR(1) model, which is the one-dimensional OU process Y = Y(t) defined by

dY(t) = θ Y(t) dt + σ dW(t), Y(0) = y_0. (3.32)

If θ ≠ 0, then θ̂_T admits a normal limit with a random rate (NLRR), with rate R(T) = (∫_0^T Y^2(t) dt)^{1/2}:

lim_{T→∞} (∫_0^T Y^2(t) dt)^{1/2} (θ̂_T - θ) =_d σ η, (3.33)

where η is a standard normal random variable. The normalized log-likelihood ratio

ℓ_T(u) = (u/σ) ∫_0^T Y(t) dW(t) / (E ∫_0^T Y^2(t) dt)^{1/2} - (u^2/(2σ^2)) ∫_0^T Y^2(t) dt / (E ∫_0^T Y^2(t) dt), u ∈ R,

is LAN if θ < 0, LABF if θ = 0, and LAMN if θ > 0. For more details on the asymptotic structure of the likelihood function for the discrete AR(1) model, one can refer to [47].

Yoshida [53] studied an n-dimensional stochastic differential equation of the type

dX(t) = θ A X(t) dt + dW(t), t > 0, X(0) = 0, (3.34)

where A is a known n × n matrix, W is an n-dimensional Wiener process, and θ is a positive unknown parameter. In this setting, the MLE of θ is asymptotically mixed normal, which agrees with the result for the first-order equation (1.3). However, since the CAR(2) equation (1.1) is driven by a one-dimensional Brownian motion, it is essentially different from (3.34). In fact, in the nonergodic cases of CAR(2) we generally do not have LAMN; a detailed discussion of this point appears at the end of each section or chapter dealing with the specific case.

3.3 Parameter Estimation when One Parameter is Known

Before we study the maximum likelihood estimators of θ_1 and θ_2 when both are unknown, we study the simpler case when one of them is known and the other is being estimated.
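Before developing the one-parameter problem, the CAR(1) facts recalled above can be probed numerically. The sketch simulates the ergodic OU process (3.32) with an Euler scheme and checks that the randomly normalized residual in (3.33), divided by σ, looks standard normal; all names and parameters here are illustrative choices of ours.

```python
import numpy as np

def nlrr_stat(theta, sigma, T, n, rng):
    """One replication of sqrt(int Y^2 dt) * (theta_hat - theta) / sigma, cf. (3.33),
    with theta_hat = (int Y dY) / (int Y^2 dt) computed from an Euler path of (3.32)."""
    dt = T / n
    dW = rng.normal(0.0, np.sqrt(dt), size=n)
    Y = np.empty(n + 1)
    Y[0] = 0.0
    for i in range(n):
        Y[i + 1] = Y[i] + theta * Y[i] * dt + sigma * dW[i]
    Yl = Y[:-1]
    S = np.sum(Yl**2) * dt                     # ~ integral of Y^2 dt
    theta_hat = np.sum(Yl * np.diff(Y)) / S    # ~ (int Y dY) / (int Y^2 dt)
    return np.sqrt(S) * (theta_hat - theta) / sigma

rng = np.random.default_rng(1)
z = np.array([nlrr_stat(theta=-0.5, sigma=1.0, T=50.0, n=5000, rng=rng)
              for _ in range(300)])
print(round(float(z.mean()), 3), round(float(z.std()), 3))  # roughly 0 and 1
```

Per (3.33), the same random normalization works for θ > 0 as well, even though the deterministic-rate limits in the two regimes are completely different.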
Denote by θ̃_1^T the maximum likelihood estimator of θ_1 when θ_2 is known, and by θ̃_2^T the maximum likelihood estimator of θ_2 when θ_1 is known. If θ_2 is known and one wants to estimate θ_1, setting the derivative of (3.9) with respect to θ_1 equal to zero gives

∫_0^T X(t) dY(t) - ∫_0^T X(t)(θ_2 Y(t) + θ_1 X(t)) dt = 0.

Hence the maximum likelihood estimator of θ_1 is

θ̃_1^T = [∫_0^T X(t) dY(t) - θ_2 ∫_0^T X(t) Y(t) dt] / ∫_0^T X^2(t) dt = θ_1 + σ ∫_0^T X(t) dW(t) / ∫_0^T X^2(t) dt. (3.35)

Similarly, if θ_1 is known and one wants to estimate θ_2, the maximum likelihood estimator of θ_2 is

θ̃_2^T = [∫_0^T Y(t) dY(t) - θ_1 ∫_0^T X(t) Y(t) dt] / ∫_0^T Y^2(t) dt = θ_2 + σ ∫_0^T Ẋ(t) dW(t) / ∫_0^T Ẋ^2(t) dt. (3.36)

In the following chapters, we shall prove the consistency of θ̃_1^T and θ̃_2^T, and derive their asymptotic distributions in all possible cases of the underlying process.

Chapter 4
Parameter Estimation for Second Order SDE: Ergodic Case

From the introduction we know that when θ_1 < 0 and θ_2 < 0 in equation (1.1), the processes X(t) and Ẋ(t) are ergodic. In all the ergodic cases, we will prove that the MLEs of θ_1 and θ_2 are asymptotically normal.

4.1 Some Preliminaries

First we introduce the definition of a Gaussian martingale.

Definition 4.1.1. [26] A Gaussian martingale is an R^d-valued martingale X such that (i) X_0 = 0; (ii) the distribution of any finite family (X_{t_1}, ..., X_{t_n}) is Gaussian.

The stochastic integral ∫_0^T X(t) dW(t) is a martingale [39] but not a Gaussian one, because the integrand X(t) is random rather than deterministic. However, in the ergodic case the suitably normalized stochastic integral ∫_0^T X(t) dW(t) converges to a Gaussian martingale as T → ∞.

Theorem 4.1.2. [26, VIII 3.1.1] Assume that X is a multi-dimensional continuous Gaussian martingale with characteristics (0, C, 0), and that each X^n is a multi-dimensional continuous martingale. If D is a dense subset of R_+, the following are equivalent: (i) X^n →_d X; (ii) ⟨X^{n,i}, X^{n,j}⟩_t →_d C^{ij}_t for all i, j and any t ∈ D.

This theorem can be extended to a family of martingales.

Theorem 4.1.3.
Let M = (M 1 (t),...,M d (t)), 0≤ t≤ 1, be a d-dimensional con- tinuous Gaussian martingale with X(0) = 0, and let M T = (M T;1 (t),...,M T;d (t)), T ≥ 0, 0≤t≤ 1, be a family of continuous square-integrable d-dimensional martin- gales such that M T (0) = 0 for all T and, for every t∈ [0,1] and i,j = 1,...,d, lim T→∞ ⟨M T;i ,M T;j ⟩(t) =⟨M i ,M j ⟩(t) in probability. Then lim T→∞ M T d = M in the topology of continuous functions on [0,1]. Proof. Modulo a non-essential (in this case) difference between a sequence and a familyindexedbythepositivereals, thisisaparticularcaseoftheprevioustheorem. Withtheaidofthetheoremabove, wenowstateandprovethefollowingtheorem for ergodic processes: 38 Theorem 4.1.4. If{X t , t≥ 0} is an ergodic process, then √ T ∫ T 0 X t dW t ∫ T 0 X 2 t dt ⇒ 1 σ N(0,1), where σ 2 = lim t→∞ EX 2 t . Proof. For fixed T > 0, let M T (t) = 1 √ T ∫ tT 0 X s dW s ,T > 0. For any T, M T (t) is a martingale with quadratic variation⟨M T ⟩(t) = 1 T ∫ tT 0 X 2 s ds. Because X t is ergodic, according to ergodic theorem, as T →∞, ⟨M⟩ T (t) = 1 T ∫ tT 0 X 2 s ds→σ 2 t (4.1) where σ 2 = lim t→∞ EX 2 t . By [26, VII 3.1.1], M T (t) converges in distribution to σW(t) as T →∞. Setting t = 1, we have √ T ∫ T 0 X t dW t ∫ T 0 X 2 t dt = ∫ T 0 X t dW t / √ T ∫ T 0 X 2 t dt/T ⇒ 1 σ N(0,1) Thistheoremcanbeseenasthecentrallimittheoremforcontinuoustimeergodic process. A discrete time version can be found in literature such as [23] which applies to stationary ergodic sequences. 39 For every r > 0, η r (t), t ≥ 0, defined in (3.25) is an ergodic (in fact, strictly mixing) process. Therefore 1 T ∫ T 0 η r (t)dt = 0 a:s: (T), 1 T ∫ T 0 η 2 r (t)dt = 1 2r +0 a:s: (T), (4.2) (by ergodic theorem), and lim T→∞ 1 √ T ∫ T 0 η r (t)dW(t) d =η ⊥ r , (4.3) where η ⊥ r is normal with mean zero and variance 1/(2r), and the random variables (ξ q , η r , η ⊥ r ) are jointly independent for every q,r > 0 (by Theorem 4.1.4). 
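The ergodic-average claims (4.2) for η_r can be checked directly: η_r has the law of the OU process dη = -r η dt + dW with η(0) = 0, so a long Euler path should have time averages near 0 and 1/(2r). The helper below is an illustrative sketch of ours, not part of the dissertation.

```python
import numpy as np

def ou_time_averages(r, T, n, seed=None):
    """Simulate eta(t) with d(eta) = -r*eta dt + dW, eta(0) = 0 (the law of
    eta_r in (3.25)) and return the time averages (1/T) int eta dt and
    (1/T) int eta^2 dt appearing in (4.2)."""
    rng = np.random.default_rng(seed)
    dt = T / n
    dW = rng.normal(0.0, np.sqrt(dt), size=n)
    eta = np.empty(n + 1)
    eta[0] = 0.0
    for i in range(n):
        eta[i + 1] = eta[i] * (1.0 - r * dt) + dW[i]
    return float(eta[:-1].mean()), float(np.mean(eta[:-1] ** 2))

m1, m2 = ou_time_averages(r=1.0, T=400.0, n=400_000, seed=3)
print(round(m1, 3), round(m2, 3))  # expect roughly 0 and 1/(2r) = 0.5
```

The fluctuations of the second average around 1/(2r) are of order T^{-1/2}, which is exactly the scaling used in the martingale central limit argument above.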
Finally, recall that, by the law of the iterated logarithm for the standard Brownian motion, T^{-1/2} |W(T)| = T^ε 0_{a.s.}(T) for every ε > 0, where 0_{a.s.}(T) denotes a quantity converging to zero with probability one as T → ∞. Then the equality η_r(·) =_d W̄(·), where

W̄(t) = e^{-rt} W((e^{2rt} - 1)/(2r)),

implies

T^{-ε} η_r(T) = 0_{a.s.}(T) (4.4)

for every ε > 0.

4.2 Parameter Estimation when One Parameter is Known

In this section we study the maximum likelihood estimator of one parameter under the assumption that the other parameter is known. We are going to show that the estimators are strongly consistent and asymptotically normal.

Theorem 4.2.1. If θ_1 < 0 and θ_2 < 0 (the ergodic case), then the maximum likelihood estimator of θ_1 given θ_2 is strongly consistent in the large-sample asymptotic T → ∞, that is, lim_{T→∞} θ̃_1^T = θ_1 with probability one, and

√T (θ̃_1^T - θ_1) = σ √T ∫_0^T X(t) dW(t) / ∫_0^T X^2(t) dt ⇒ √(2θ_1θ_2) η_1.

The maximum likelihood estimator of θ_2 given θ_1 is also strongly consistent, lim_{T→∞} θ̃_2^T = θ_2 with probability one, and

√T (θ̃_2^T - θ_2) = σ √T ∫_0^T Ẋ(t) dW(t) / ∫_0^T Ẋ^2(t) dt ⇒ √(2|θ_2|) η_2.

Here η_1 and η_2 are standard Gaussian random variables.

Proof. It follows from (3.35) that

θ̃_1^T - θ_1 = σ ∫_0^T X(t) dW(t) / ∫_0^T X^2(t) dt. (4.5)

The process Z(t) = ∫_0^t X(s) dW(s) is a continuous square-integrable martingale with quadratic characteristic ⟨Z⟩(t) = ∫_0^t X^2(s) ds, so that θ̃_1^T - θ_1 = σ Z(T)/⟨Z⟩(T). By the strong law of large numbers for martingales (see, for example, [34, Corollary 1 to Theorem 2.6.10]), to complete the proof of consistency it is enough to show that ∫_0^∞ X^2(t) dt = +∞. Define the random process

Q(t) = ∫_0^t X^2(s) ds.

It is non-decreasing, and therefore the limit Q_∞ = lim_{t→∞} Q(t) = ∫_0^∞ X^2(s) ds, finite or infinite, exists with probability one. We need to show that P(Q_∞ = +∞) = 1. Assume this does not hold; then there exist ϵ > 0 and C > 0 such that

P(Q_∞ < C) > ϵ. (4.6)

Fix ϵ and C.
Since Q_∞ ≥ Q_T for all T > 0,

P(Q_T/T < C/T) = P(Q_T < C) ≥ P(Q_∞ < C). (4.7)

From (4.1) we know that

Q_T/T ⇒ σ_1^2 > 0. (4.8)

Then ϵ ≤ lim_{T→∞} P(Q_T/T < C/T) = 0, because C/T → 0 while Q_T/T converges to the positive constant σ_1^2; this contradicts (4.6). From Theorem 4.1.4 we then have

√T (θ̃_1^T - θ_1) = σ √T ∫_0^T X(t) dW(t) / ∫_0^T X^2(t) dt ⇒ (σ/σ_1) η_1 = √(2θ_1θ_2) η_1,

where σ_1^2 = lim_{t→∞} E X^2(t) = σ^2/(2θ_1θ_2). The proof of consistency of θ̃_2^T is similar.

4.3 Parameter Estimation when Neither Parameter is Known

Suppose now that neither θ_1 nor θ_2 is known and we are estimating them jointly. The maximum likelihood estimators are given by (3.11), and they are more complicated than the one-parameter estimators θ̃_{1,T} and θ̃_{2,T}. Now we state and prove the most important result of this chapter.

Theorem 4.3.1. If θ_1 < 0 and θ_2 < 0, the maximum likelihood estimators θ̂_{1,T} and θ̂_{2,T} are asymptotically jointly normal:

√T (θ̂_{1,T} - θ_1, θ̂_{2,T} - θ_2)^⊤ ⇒ N((0, 0)^⊤, Σ), where Σ = diag(2θ_1θ_2, 2|θ_2|) (4.9)

and diag(x, y) denotes the diagonal matrix with x and y on the main diagonal. Besides, there exists a normal limit with a random rate (NLRR):

lim_{T→∞} (∫_0^T X^2(t) dt)^{1/2} (θ̂_{1,T} - θ_1) =_d σ η_1, lim_{T→∞} (∫_0^T Ẋ^2(t) dt)^{1/2} (θ̂_{2,T} - θ_2) =_d σ η_2, (4.10)

where η_1 and η_2 are i.i.d. standard normal random variables. With A_T = diag(T^{-1/2}, T^{-1/2}), the normalized log-likelihood ratio ℓ_T is LAN.

Proof. Let

M_T(t) = (M_T^1(t), M_T^2(t)) = ((1/√T) ∫_0^{tT} X(s) dW(s), (1/√T) ∫_0^{tT} Ẋ(s) dW(s)), T > 0.

For every T, M_T(t) is a two-dimensional continuous square-integrable martingale with

⟨M_T⟩(t) = (1/T) ( ∫_0^{tT} X^2(s) ds      ∫_0^{tT} X(s)Ẋ(s) ds
                   ∫_0^{tT} X(s)Ẋ(s) ds    ∫_0^{tT} Ẋ^2(s) ds ). (4.11)

Since the pair (X(t), Ẋ(t)) is ergodic, by the ergodic theorem,

(1/T) ∫_0^{tT} X^2(s) ds → t lim_{s→∞} E X^2(s) = t σ^2/(2|θ_1θ_2|),
(1/T) ∫_0^{tT} X(s)Ẋ(s) ds → t lim_{s→∞} E X(s)Ẋ(s) = 0,
(1/T) ∫_0^{tT} Ẋ^2(s) ds → t lim_{s→∞} E Ẋ^2(s) = t σ^2/(2|θ_2|). (4.12)

Next, applying Theorem 4.1.3, M_T(t) converges to a bivariate Gaussian martingale.
Setting t = 1, we have 1 √ T ∫ T 0 X(t)dW(t) ∫ T 0 ˙ X(t)dW(t) ⇒ 1 √ 2 1 2 η 1 1 √ 2| 2 | η 2 (4.13) where η 1 and η 2 are i.i.d. standard normal random variables. Now define D T = X 4 (T) T 2 4 (∫ T 0 ˙ X 2 (t)dt T )( ∫ T 0 X 2 (t)dt T ). (4.14) 45 By direct computation, √ T( b θ 1;T −θ 1 ) = 1 1−D T √ T ∫ T 0 X(t)dW(t) ∫ T 0 X 2 (t)dt − X 2 (T) T (∫ T 0 X(t)dW(t) √ T ) (∫ T 0 ˙ X 2 (t)dt T )( ∫ T 0 X 2 (t)dt T ) √ T( b θ 2;T −θ 2 ) = 1 1−D T √ T ∫ T 0 ˙ X(t)dW(t) ∫ T 0 ˙ X 2 (t)dt − X 2 (T) T (∫ T 0 ˙ X(t)dW(t) √ T ) (∫ T 0 ˙ X 2 (t)dt T )( ∫ T 0 X 2 (t)dt T ) (4.15) From previous discussion, we have that as T →∞, X(T) converges to a normal randomvariablewithmeanzeroandfinitevariance. Hence X(T) √ T ⇒ 0. Itfollowsthat X 2 (T) T ⇒ 0 and X 4 (T) T 2 ⇒ 0 In the proof of the theorem in the beginning of this chapter, it is shown that ∫ T 0 X(t)dW(t) √ T and ∫ T 0 ˙ X(t)dW(t) √ T are asymptotically normal with zero mean and finite variance, and that ∫ T 0 X 2 (t)dt T and ∫ T 0 ˙ X 2 (t)dt T convergence to finite numbers. Thus when we send T to∞, many terms in (4.15) disappear and we have lim T→∞ √ T( b θ 1;T −θ 1 ) = lim T→∞ √ T ∫ T 0 X(t)dW(t) ∫ T 0 X 2 (t)dt lim T→∞ √ T( b θ 2;T −θ 2 ) = lim T→∞ √ T ∫ T 0 ˙ X(t)dW(t) ∫ T 0 ˙ X 2 (t)dt (4.16) Therefore, we have (4.9). The LAN of ℓ T follows and (4.10) holds. It is worth noticing that in ergodic case, the MLEs of θ 1 and θ 2 when they are both unknowns, have the same asymptotic behavior as the MLEs in the case of one 46 unknownparameter. Moreover, whentheyarebothunknowns, thelimitdistribution of ˆ θ 1 and ˆ θ 2 are independent. 4.4 Connection With Autoregression of Order Two Stationaryautoregressiveprocesshasbeenintensivelystudied. ThestationaryAR(2) process can be seen as a discrete analogue of the ergodic CAR(2). Discretize (1.1)using a uniform time step h as follows: X n −2X n−1 +X n−2 h 2 =θ 2 X n−1 −X n−2 h +θ 1 X n−1 + ξ n √ h . 
Then

X_n = (2 + θ_2 h + θ_1 h^2) X_{n-1} + (-1 - θ_2 h) X_{n-2} + h^{3/2} ξ_n. (4.17)

For an AR(2) process [40, Chapter 12]

y_t = φ_1 y_{t-1} + φ_2 y_{t-2} + ε_t, (4.18)

where the ε_t are i.i.d. N(0, σ^2), the conditions for stationarity are

1 - φ_2 - φ_1 > 0, 1 - φ_2 + φ_1 > 0, 1 + φ_2 > 0. (4.19)

One can easily verify that the ergodic (stationary) CAR(2), discretized as above, satisfies the stationarity conditions of AR(2). Recall that for the ergodic CAR(2), θ_1 < 0 and θ_2 < 0. With φ_1 = 2 + θ_2 h + θ_1 h^2 and φ_2 = -1 - θ_2 h,

1 - φ_2 - φ_1 = -θ_1 h^2 > 0,
1 - φ_2 + φ_1 = 4 + 2θ_2 h + θ_1 h^2 > 0 for sufficiently small h,
1 + φ_2 = -θ_2 h > 0. (4.20)

Therefore the discrete process X_n defined by (4.17) is a stationary AR(2) process. The maximum likelihood estimators of φ_1 and φ_2 in the AR(2) model (4.18) are jointly asymptotically normal. Then the maximum likelihood estimators of (θ_1, θ_2), which are linear combinations of (φ_1, φ_2), are asymptotically normal as well. This is consistent with our result in the continuous-time case.

Chapter 5
Nonergodic CAR(2): Real Eigenvalues

This chapter deals with the nonergodic cases in which the characteristic equation of SDE (1.1) has real roots. With p, q as defined in (3.6), the cases to consider are:

1. p > q > 0: distinct positive roots;
2. p > 0 > q: roots of opposite sign;
3. p > 0 = q: smaller root is zero;
4. p = 0 > q: larger root is zero;
5. p = q > 0: positive double root;
6. p = q = 0: double zero root.

The asymptotic behavior of the MLEs varies from case to case, but the computation procedure is essentially the same. We shall state the result for each case, each followed by its proof and some brief remarks. To begin, we make some general observations about the representation of the solution of (1.1). If p ≠ q, then (3.3) and (3.7) imply

X(t) = V_p(t) - V_q(t), Ẋ(t) = p V_p(t) - q V_q(t), (5.1)

where

V_p(t) = e^{pt} U_p(t), U_p(t) = (b - aq)/(p - q) + (σ/(p - q)) ∫_0^t e^{-ps} dW(s),
V_q(t) = e^{qt} U_q(t), U_q(t) = (b - ap)/(p - q) + (σ/(p - q)) ∫_0^t e^{-qs} dW(s). (5.2)

5.1 Distinct Positive Roots

Theorem 5.1.1.
Let p and q be the roots of (3.6) and assume that p>q > 0. Then lim T→∞ e qT ( b θ 1;T −θ 1 ) d =− 2(p+q)pq p−q η ξ+c , (5.3) lim T→∞ e qT ( b θ 2;T −θ 2 ) d = 2(p+q)q p−q η ξ+c , (5.4) where c = √ 2q(b−ap)/σ and ξ, η are independent standard Gaussian random vari- ables. In particular, the limit distribution depends on σ and the initial conditions a = X(0) and b = ˙ X(0) unless b = ap; if indeed b = ap, then the limit distribution is Cauchy. Let r(T) = (∫ T 0 ( ˙ X(t)−pX(t)) 2 dt ) 1 2 . (5.5) 50 Then lim T→∞ r(T)( b θ 1;T −θ 1 ) d =− 1 p lim T→∞ r(T)( b θ 2;T −θ 2 ) d = p+q p−q ση, (5.6) meaning that NLRR exists in this case. Proof. When p>q > 0, we re-write (5.1) as X(t) =V p (t)−V q (t), ˙ X =pX +(p−q)V q (t). Using (3.23), we conclude that D(T;X, ˙ X) = (p−q) 2 D(T;V p ,V q ), N(T;X, ˙ X) = (p−q) ( N(T;V q ,V p )+N(T,V p ,V q ) ) , N(T; ˙ X,X) =−(p−q) ( qN(T;V q ,V p )+pN(T,V p ,V q ) ) . (5.7) In this case, (U p (t),U q (t)) is a two dimensional Guassian process. Define ζ p = b−aq p−q + σ p−q ξ p , ζ q = b−ap p−q + σ p−q ξ q , with ξ p ,ξ q defined in (3.25). It is not hard to see that (U p (t),U q (t))⇒ (ζ p ,ζ q ) 51 ∫ T 0 V p (t)dW(t) = ∫ T 0 e pt U p (t)dW(t) =U p (T) ∫ T 0 e pt dW(t)− ∫ T 0 (∫ t 0 e ps dW(s) ) dU p (t)− ∫ T 0 e pt e −pT dt =e pT U p (T)η p (T)− ∫ T 0 η p (t)dW(s)−T =e pT (U p (T)η p (T)+0 a:s: (T)) (5.8) Similarly, ∫ T 0 V q (t)dW(t) =e qT (U q (T)η q (T)+0 a:s: (T)) ∫ T 0 V 2 p dt =e 2pT ( ζ 2 p 2p +0 a:s: (T) ) ∫ T 0 V 2 q dt =e 2qT ( ζ 2 q 2q +0 a:s: (T) ) ∫ T 0 V p (t)V q (t)dt =e (p+q)T ( ζ p ζ q p+q +0 a:s: (T) ) (5.9) Piecing them together, we have lim T→∞ e −2(p+q)T D(T;X, ˙ X) = ζ 2 p ζ 2 q (p−q) 4 4pq(p+q) 2 , (5.10) lim T→∞ e −(q+2p)T N(T;V p ,V q ) d =σζ 2 p ζ q ( η q 2p − η p p+q ) , lim T→∞ e −(q+2p)T N(T;V q ,V p ) d = 0. It remains to observe that η q 2p − η p p+q 52 is a Gaussian random variable, independent of (ζ p , ζ q ), with mean zero and variance (p−q) 2 /(8p 2 q(p+q) 2 ). 
Then (5.3) and (5.4) follow from (5.7) and (3.22). Remarks ThelimitdistributionisCauchytype, whichissimilartounstableOUprocess. How- ever, two interesting things to observe in this theorem: (a) the rate of convergence is determined by the smaller root, and (b) the normalized residuals in the limit are negative multiples of each other. 5.2 Roots of Opposite Sign Theorem 5.2.1. Let p and q be the roots of (3.6) and assume that p> 0 and q < 0. Then, lim T→∞ √ T( b θ 1;T −θ 1 ) d =−p √ 2|q|η, (5.11) lim T→∞ √ T( b θ 2;T −θ 2 ) d = √ 2|q|η (5.12) where η is a standard normal random variable. In particular, the limit distribu- tions do not depend on the initial conditions X(0) and ˙ X(0). Besides, with r(T) as defined in (5.5), lim T→∞ r(T)( b θ 1;T −θ 1 ) d =− 1 p lim T→∞ r(T)( b θ 2;T −θ 2 ) d =ση (5.13) meaning that NLRR exists in this case. 53 Proof. To proceed, define Gaussian random variable ξ by ξ = b−aq p−q + σ p−q ∫ +∞ 0 e −ps dW(s) = b−aq p−q + σ p−q ξ p and note that V p =e pt ( ξ+0 a:s: (t) ) , V q = σ p−q η |q| (t)+ b−ap p−q e qt . By direct computation, ∫ T 0 V 2 p (t)dt = ( ξ 2 2p +0 a:s: (T) ) e 2pT , ∫ T 0 V 2 q (t)dt = σ 2 T 2|q|(p−q) 2 ( 1+0 a:s: (T) ) , (5.14) ∫ T 0 V p V q (t)dt = ∫ T 0 e pt ( ξ+0 a:s: (t) ) V q (t)dt =e pT ( ξη |q| (T)+0 a:s: (T) ) , ∫ T 0 V p (t)dW(t) = √ T e pT 0 a:s: (T) lim T→∞ 1 √ T ∫ T 0 V q (t)dW(t) d = σ p−q η ⊥ |q| . Therefore, D T (T;X, ˙ X) = ( Te 2pT ) ( ξ 2 2p σ 2 2|q| +0 a:s: (T) ) , lim T→∞ e −2pT √ T N(T;V p ,V q ) d = σ 2 p−q ξ 2 2p η ⊥ |q| , lim T→∞ e −2pT √ T N(T;V q ,V p ) d = 0. 54 Then (5.11) and (5.12) follow from (3.22). Remarks This result comes as a surprise: the asymptotic behavior of both estimators is dic- tated by the exponentially stable mode, even though this mode is “invisible” with probability one: it follows from (3.3) that P ( lim t→∞ 1 t ln|X(t)| =q ) =P ( lim t→∞ 1 t ln| ˙ X(t)| =q ) = 0 for all initial conditions X(0), ˙ X(0). 
When the roots of (3.6) are real and distinct, equation (3.4) has two Lyapunov exponents, p and q. Thus, Theorem 5.2.1 sug- gests that if the larger Lyapunov exponent of (3.4) is positive, then the asymptotic behavior of the estimators is determined by the smaller Lyapunov exponent. Note that the right-hand sides of (5.11) and (5.12) are negative multiple of the other. This is not a complete surprise: correlation of the limits in multi-parameter non-ergodic models has been documented before; see, for example, [35, Section 4.1]. The other non-surprising part is that the exponentially stable mode leads to the rate √ T and a Gaussian limit independent of the initial conditions. Passing to the limit p ↘ 0 in (5.12) suggests that, when q < 0 = p, the rate of convergence v 1 (T) for b θ 2;T is faster than the rate of convergence v 2 (T) for b θ 1;T : v 1 (T)≪v 2 (T) in the sense that lim T→∞ v 2 (T)/v 1 (T) = +∞. This is consistent with Theorem 5.3.1, which is the next case we consider. 55 5.3 Larger Root is Zero Theorem 5.3.1. Let p and q be the roots of (3.6) and assume that p = 0 and q < 0. Then lim T→∞ T ( b θ 1;T −θ 1 ) d =|q| w 2 (1)−1 2 ∫ 1 0 w 2 (s)ds , (5.15) lim T→∞ √ T ( b θ 2;T −θ 2 ) d = √ 2|q|η. (5.16) where η is a standard normal random variable, w = w(s), 0≤ s≤ 1, is a standard Brownian motion, andη andw are independent. In particular, the limit distributions do not depend on the initial conditions X(0) and ˙ X(0). In this case, with r(T) as defined in (5.5), lim T→∞ ( b θ 1;T −θ 1 ) d == σ 2 √ 2|q| 3=2 η. (5.17) NLRR exists for b θ 1;T only. Proof. By (5.1) with p = 0, ˙ X(t) =be qt +ση |q| (t), X(t) =U 0 (t)−V q (t) =U 0 (t)− ˙ X(t)/q. (5.18) 56 Using (3.23), D(T;X, ˙ X) =D(T;U 0 , ˙ X), N(T;X, ˙ X) = 1 q N(T, ˙ X,U 0 )+N(T;U 0 , ˙ X), N(T; ˙ X,X) =N(T; ˙ X,U 0 ). 
(5.19) Direct computations using (4.2)–(4.4) show that ∫ T 0 ˙ X 2 (t)dt = σ 2 T 2|q| (1+0 a:s: (T)), (5.20) ∫ T 0 U 2 0 (t)dt = σ 2 T 2 q 2 ( 1 T 2 ∫ T 0 W 2 (t)dt+0 a:s: (T) ) , (5.21) ∫ T 0 U 0 (t) ˙ X(t)dt =T 3=2 0 a:s: (T), (5.22) lim T→∞ 1 √ T ∫ T 0 ˙ X(t)dW(t) d =ση ⊥ |q| , (5.23) ∫ T 0 U 0 (t)dW(t) = σT |q| ( W 2 (T)−T 2T +0 a:s: (T) ) . (5.24) Note that self-similarity of the standard Brownian motion implies W 2 (T)−T 2T d = w 2 (1)−1 2 , 1 T 2 ∫ T 0 W 2 (t)dt d = ∫ 1 0 w 2 (s)ds. (5.25) 57 Combining (5.20)–(5.25) with (5.19) yields D(T;X, ˙ X) = σ 4 T 3 2|q| 3 ( 1 T 2 ∫ T 0 W 2 (t)dt+0 a:s: (T) ) , lim T→∞ T −5=2 N(T;X, ˙ X) d = σ 4 q 2 η ⊥ |q| ∫ 1 0 w 2 (s)ds, lim T→∞ T −2 N(T; ˙ X,X) d = σ 4 4q 2 ( w 2 (1)−1 ) . Then (5.15) and (5.16) follow from (3.22). To show independence of η and w, take a standard Brownian motion ˜ w = ˜ w(t), t∈ [0,1], that is independent of w and apply Theorem 4.1.3 with M(t) = (w(t), ˜ w(t)), M T (t) = ( 1 √ T ∫ Tt 0 dW(s), √ 2|q| √ T ∫ Tt 0 η |q| (s)dW(s) ) . . Then we have M T (t)⇒M(t) (5.26) Setting t = 1 in (5.26), we obtain that η and w are independent. 58 Remarks The asymptotic behavior of b θ 1;T is similar to (2.18) even though b θ 1;T is different from (2.17). Still, this similarity is not completely surprising. Indeed, if p = 0, then (3.6) reduces to r(r−q) = 0, and (1.1) becomes d ˙ X =q ˙ X +σ ˙ W, q < 0. (5.27) In other words, ˙ X is an ergodic OU process, and the unknown drift coefficient is q =θ 1 . The asymptotic behavior of b θ 2;T is similar to (2.19), although this similarity is somewhat unexpected. 5.4 Smaller Root is Zero Theorem 5.4.1. Let p and q be the roots of (3.6) and assume that p> 0 and q = 0. Then lim T→∞ T ( b θ 1;T −θ 1 ) d =−p w 2 (1)−1 2 ∫ 1 0 w 2 (s)ds , (5.28) lim T→∞ T ( b θ 2;T −θ 2 ) d = w 2 (1)−1 2 ∫ 1 0 w 2 (s)ds , (5.29) where w =w(s), 0≤s≤ 1, is a standard Brownian motion. In particular, the limit distributions do not depend on the initial conditions X(0) and ˙ X(0). 59 Proof. 
When p > q = 0, computations are similar to the case q < p = 0. The difference is that now X = ˙ X/p−U 0 , where U 0 (t) = b−ap p + σ p W(t), ˙ X(t) =pe pt ( ξ+0 a:s: (t) ) , ξ = b p + σ p ∫ ∞ 0 e −pt dW(t). Using (3.23), D(T;X, ˙ X) =D(T;U 0 , ˙ X), N(T;X, ˙ X) = 1 p N(T, ˙ X,U 0 )+N(T;U 0 , ˙ X), N(T; ˙ X,X) =−N(T; ˙ X,U 0 ). Then ∫ T 0 ˙ X 2 (t)dt = p 2 e 2pT ( ξ 2 +0 a:s: (T) ) , ∫ T 0 U 2 0 (t)dt = σ 2 T 2 p 2 ( 1 T 2 ∫ T 0 W 2 (t)dt+0 a:s: (T) ) , ∫ T 0 U 0 (t) ˙ X(t)dt =Te pT 0 a:s: (T), ∫ T 0 ˙ X(t)dW(t) =σe pT ( ξη p +0 a:s: (T) ) , ∫ T 0 U 0 (t)dW(t) = σT p ( W 2 (T)−T 2T +0 a:s: (T) ) . 60 Therefore, D(T;X, ˙ X) = σ 2 T 2 e 2pT 2p ( ξ 2 1 T 2 ∫ T 0 W 2 (t)dt+0 a:s: (T) ) , N(T;X, ˙ X) = σ 2 2p Te 2pT ( ξ 2 W 2 (T)−T 2T +0 a:s: (T) ) , N(T; ˙ X,X) =− σ 2 2 Te 2pT ( ξ 2 W 2 (T)−T 2T +0 a:s: (T) ) , and then (5.28) and (5.29) follow from (3.22). Remarks Eventhough,inthesettingofthetheorem,equation(1.1)becomesd ˙ X =p ˙ Xdt+dW, that is, ˙ X is an unstable OU process, the behavior of the estimators is nothing like (2.20). Still, after Theorems 5.3.1 and 5.2.1, this result does not come as a complete surprise: Theorem 5.3.1 suggests that if one of the roots is zero, we should expect a special limit of the type (2.19), and Theorem 5.2.1 suggests that the asymptotic behavior of the estimators is determined by the non-dominant Lyapunov exponent, which in this case is zero. Conclusions of Theorem 5.4.1 are consistent with the conclusions of Theorem 5.2.1. Indeed, passing to the limit q↗ 0 in both (5.11) and (5.12) suggests that, if q = 0>p, the rates v 1 (T) and v 2 (T) should be the same and faster than √ T. 61 5.5 Positive Double Root Theorem 5.5.1. Let p and q be the roots of (3.6) and assume that p =q > 0. Then lim T→∞ e qT T ( b θ 1;T −θ 1 ) d =−4 √ 2q 3 η ξ+c , (5.30) lim T→∞ e qT T ( b θ 2;T −θ 2 ) d = 4 √ 2q 2 η ξ+c , (5.31) where c = √ 2q(b−aq)/σ and ξ, η are independent standard Gaussian random vari- ables. 
In particular, the limit distribution depends on σ and the initial conditions X(0) and ˙ X(0) unless b =ap; if indeed b =ap, then the limit distribution is Cauchy. Define r(T) = 1 T 2 (∫ T 0 X 2 (t)dt ) 1=2 Then lim T→∞ r(T)( b θ 1;T −θ 1 ) d =− 1 p lim T→∞ r(T)( b θ 2;T −θ 2 ) d = 2 √ 2pση (5.32) Proof. With x 1 (t) = (1−qt)e qt , x 2 (t) =te qt , (3.3) becomes X(t) =a(1−qt)e qt +bte qt +σ ∫ t 0 (t−s)e q(t−s) dW(s), ˙ X(t) =qX(t)+Q(t), where Q(t) = ( b−aq+σ ∫ t 0 e −qs dW(s) ) e qt . 62 If we define ζ =b−aq+σ ∫ +∞ 0 e −qs dW(s), then X(t) =te qt ( ζ +0 a:s: (t) ) , Q(t) =e qt ( ζ +0 a:s: (t) ) . By (3.23), D(T;X, ˙ X) =e 4qT ( ζ 4 16q 4 +0 a:s: (T) ) ; (5.33) coincidentally, the same result follows after passing to the limit p↘q in (5.10). Next, define η q;1 (t) = ∫ t 0 e −q(t−s) dW(s), η q;2 (t) = 1 t ∫ t 0 se −q(t−s) dW(s), and observe that lim T→∞ T ( η q;1 (T)−η q;2 (T) ) d = η 2q 3=2 , (5.34) where η is a standard Gaussian random variable, independent of ζ. Therefore, ∫ T 0 X 2 (t)dt =T 2 e 2qT ( ζ 2 2q +0 a:s: (T) ) , ∫ T 0 Q 2 (t)dt =e 2qT ( ζ 2 2q +0 a:s: (T) ) , ∫ T 0 X(t)Q(t)dt =Te 2qt ( ζ 2 2q +0 a:s: (T) ) , ∫ T 0 Q(t)dW(t) =e qT ( ζη q;1 (T)+0 a:s: (T) ) , ∫ T 0 X(t)dW(t) =Te qT ( ζη q;2 (T)+0 a:s: (T) ) . 63 To continue, N(T;X,Q) = σ 2q T 2 e 3qT ( η q;1 −η q;2 )( ζ 3 +0 a:s: (T) ) , N(T;Q,X) = σ 2q Te 3qT ( η q;2 −η q;1 )( ζ 3 +0 a:s: (T) ) . It remains to observe that N(T;X, ˙ X) =N(T;X,Q), N(T; ˙ X,X) =−qN(T;X,Q)+N(T;Q,X). Then (5.30) and (5.31) follow from (3.22) and (5.34). This completes the proof of Theorem 5.5.1. Remarks Even though the rate in the case of a positive double root is slightly slower than exponential, the limit distribution is the same as in (2.20) for the unstable OU process. 5.6 Double Zero Root Theorem 5.6.1. Let p and q be the roots of (3.6) and assume that p = q = 0. 
Let w = w(s), 0 ≤ s ≤ 1, be a standard Brownian motion, and define the following random variables: ξ 1 = ∫ 1 0 w(s)ds, ξ 2 = ∫ 1 0 w 2 (s)ds, ξ 3 = ∫ 1 0 (∫ t 0 w(s)ds ) 2 dt. 64 Then lim T→∞ T ( b θ 1;T −θ 1 ) d = 2ξ 3 ( w 2 (1)−1 ) −2ξ 2 1 ( w(1)ξ 1 −ξ 2 ) 4ξ 2 ξ 3 −ξ 4 1 , (5.35) lim T→∞ T 2 ( b θ 2;T −θ 2 ) d = 4ξ 2 ( w(1)ξ 1 −ξ 2 ) −ξ 2 1 ( w 2 (1)−1 ) 4ξ 2 ξ 3 −ξ 4 1 . (5.36) In particular, the limit distributions do not depend on the initial conditions X(0) and ˙ X(0). Proof. In this case the result follows from self-similarity of the standard Brownian motion: W(Ts) d = √ Tw(s), 0≤s≤ 1. Recall the notations ξ 1 = ∫ 1 0 w(s)ds, ξ 2 = ∫ 1 0 w 2 (s)ds, ξ 3 = ∫ 1 0 (∫ t 0 w(s)ds ) 2 dt. Since, with θ 1 =θ 2 = 0, equation (1.1) becomes d ˙ X =σdW, we get ˙ X(t) =b+σW(t), X(t) =a+bt+σ ∫ t 0 W(s)ds. By direct computation, D(T;X, ˙ X) d = σ 4 T 6 4 ( 4ξ 2 ξ 3 −ξ 4 1 +0 a:s: (T) ) , N(T;X, ˙ X) d = σ 4 T 5 2 ( ξ 3 ( w 2 (1)−1 ) −ξ 2 1 ( w(1)ξ 1 −ξ 2 ) +0 a:s: (T) ) , N(T; ˙ X,X) d = σ 4 T 4 4 ( 4ξ 2 ( w(1)ξ 1 −ξ 2 ) −ξ 2 1 ( w 2 (1)−1 ) +0 a:s: (T) ) , and then both (5.35) and (5.36) follow. 65 This completes the proof of Theorem 5.6.1. Remarks Note that, in the case of the zero double root, the rate of convergence v 1 (T) for b θ 2;T is faster than the rate of convergence v 2 (T) for b θ 1;T : v 1 (T) =T ≪v 2 (T) =T 2 . This is consistent with the previous results: passing to the limit q↗ 0 in (5.16) or p↘ 0 in (5.29) both suggest that v 2 (T) ≫ T; passing to the limit q ↗ 0 in (5.30) and (5.31) suggests that v 2 (T)≫v 1 (T). 5.7 Asymptotic Structure of Normalized Log- likelihood Ratio In this section, we describe the asymptotic structure of the normalized log-likelihood ratio in the following theorem. Proof of the theorem is contained in the proofs in previous sections. Theorem 5.7.1. Denote by b p the column vector b p = 1 p . 
(5.37) 66 If p > 0,q < p, including q > 0, q < 0 and q = 0, and A T = e −pT b p b ⊤ p , the ℓ T is degenerate LAMN and the matrix B in (3.30) is B = (1+p 2 ) 2 2p σ 2 ζ 2 b p b ⊤ p , (5.38) where ζ is a standard normal random variable. If q <p = 0 and A T =diag(T −1 ,T −1=2 ), then ℓ T is mixed LABF/LAN: ℓ ∞ (u) =u ⊤ ξ ξ ξ− 1 2 u ⊤ Bu, (5.39) where ξ ξ ξ = σ|q| −1 ∫ 1 0 w(s)dw(s) (2|q|) −1 ση , B = σ 2 |q| −1 ∫ 1 0 w 2 (s)ds 0 0 σ 2 (2|q|) −1 . (5.40) η is a standard normal random variable, w is a standard Brownian motion, and η and w are independent. If p = q > 0 and A T = T −1 e −pT b p b ⊤ p , then ℓ T is degenerate LAMN and the matrix B in (3.30) is given by (5.38). If p = q = 0 and A T = diag(T −1 ,T −1 ), then ℓ T is LABF and the matrix G(t) in (3.31) is G(t) = σ ∫ t 0 w(s)ds 0 σw(t) 0 , (5.41) where w is a standard Brownian motion. 67 One general conclusion of this theorem is that, if p > 0 and q < p, then it is the value of the larger root p that determines asymptotic behavior of the normalized log-likelihood ratio. 5.8 Knowing one parameter and Estimating the Other From (3.35) and (3.36), it is apparent that the asymptotic behavior of the MLE is determined by the larger eigenvalue under the assumption that one parameter is known. There are six nonergodic cases discussed previously in the chapter, now we have only three cases: larger root is positive, including double positive root (p> 0), larger root is zero (p = 0>q) and double zero root (p =q = 0). A theorem is stated for each case. The proof of consistency is similar to that of the ergodic process, and the proof of limit distribution is actually already contained in the proofs of the previous six theorems. 5.8.1 Larger Root is Positive Theorem 5.8.1. 
If (1.1) has an unstable solution and its larger eigenvalue is positive, $p > 0$ (this includes the double positive root $p = q > 0$), then the maximum likelihood estimator of $\theta_1$ given that $\theta_2$ is known is strongly consistent in the large-sample asymptotic $T \to \infty$:
\[ \lim_{T\to\infty} \tilde{\theta}^T_1 = \theta_1 \]
with probability one, and
\[ v(p,T)\big(\tilde{\theta}^T_1 - \theta_1\big) = v(p,T)\,\frac{\int_0^T \dot X(t)\,dW(t)}{\int_0^T \dot X^2(t)\,dt} \;\Rightarrow\; \frac{2p\,\zeta_1}{\zeta_2 + c}. \]
The maximum likelihood estimator of $\theta_2$ given that $\theta_1$ is known is also strongly consistent in the large-sample asymptotic:
\[ \lim_{T\to\infty} \tilde{\theta}^T_2 = \theta_2 \]
with probability one, and
\[ v(p,T)\big(\tilde{\theta}^T_2 - \theta_2\big) = v(p,T)\,\frac{\int_0^T \dot X(t)\,dW(t)}{\int_0^T \dot X^2(t)\,dt} \;\Rightarrow\; \frac{2\,\zeta_1}{\zeta_2 + c}, \]
where $\zeta_1$ and $\zeta_2$ are independent standard normal random variables and $c = a\sqrt{2p}$. Here $v(p,T) = e^{pT}$ for distinct roots and $v(p,T) = Te^{pT}$ for the double root.
5.8.2 Larger Root is Zero
Theorem 5.8.2. If (1.1) has an unstable solution and its larger eigenvalue is zero, that is, $p = 0 > q$, then the maximum likelihood estimator of $\theta_1$ given that $\theta_2$ is known is strongly consistent in the large-sample asymptotic $T \to \infty$:
\[ \lim_{T\to\infty} \tilde{\theta}^T_1 = \theta_1 \]
with probability one, and
\[ T\big(\tilde{\theta}^T_1 - \theta_1\big) \;\Rightarrow\; \frac{w^2(1)-1}{2\int_0^1 w^2(s)\,ds}. \]
The maximum likelihood estimator of $\theta_2$ given that $\theta_1$ is known is also strongly consistent in the large-sample asymptotic:
\[ \lim_{T\to\infty} \tilde{\theta}^T_2 = \theta_2 \]
with probability one, and
\[ \sqrt{T}\big(\tilde{\theta}^T_2 - \theta_2\big) \;\Rightarrow\; \sqrt{2|q|}\,\xi, \]
where $\xi$ is a standard normal random variable.
5.8.3 Double Zero Root
Theorem 5.8.3. If (1.1) has an unstable solution and both of its eigenvalues are zero, that is, $p = q = 0$, then the maximum likelihood estimator of $\theta_1$ given that $\theta_2$ is known is strongly consistent in the large-sample asymptotic $T \to \infty$:
\[ \lim_{T\to\infty} \tilde{\theta}^T_1 = \theta_1 \]
with probability one, and
\[ T^2\big(\tilde{\theta}^T_1 - \theta_1\big) \;\Rightarrow\; \frac{w(1)\int_0^1 w(t)\,dt - \int_0^1 w^2(t)\,dt}{\int_0^1 \big(\int_0^t w(s)\,ds\big)^2\,dt}. \]
The maximum likelihood estimator of $\theta_2$ given that $\theta_1$ is known is also strongly consistent in the large-sample asymptotic:
\[ \lim_{T\to\infty} \tilde{\theta}^T_2 = \theta_2 \]
with probability one, and
\[ T\big(\tilde{\theta}^T_2 - \theta_2\big) \;\Rightarrow\; \frac{w^2(1)-1}{2\int_0^1 w^2(s)\,ds}. \]
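The zero-root limit laws above all involve functionals such as $\big(w^2(1)-1\big)\big/\big(2\int_0^1 w^2(s)\,ds\big)$, whose distribution has no closed form. A standard way to tabulate it (for example, to find a rejection quantile) is Monte Carlo over discretized Brownian paths. The following sketch is illustrative and not from the text; the step count `n`, sample size `m`, and seed are arbitrary choices.

```python
# Monte Carlo sketch of the limit random variable
#     (w^2(1) - 1) / (2 * int_0^1 w^2(s) ds)
# appearing in the zero-root theorems. Assumption: a Brownian path on [0,1]
# is approximated by a Gaussian random walk with n steps.
import numpy as np

def limit_law_samples(m=20000, n=1000, seed=0):
    rng = np.random.default_rng(seed)
    dt = 1.0 / n
    # m Brownian paths on [0,1], shape (m, n+1), starting at 0
    incs = rng.normal(0.0, np.sqrt(dt), size=(m, n))
    w = np.concatenate([np.zeros((m, 1)), np.cumsum(incs, axis=1)], axis=1)
    num = w[:, -1] ** 2 - 1.0                     # w^2(1) - 1
    den = 2.0 * dt * np.sum(w[:, :-1] ** 2, axis=1)  # 2 * int_0^1 w^2 ds (left-point rule)
    return num / den

samples = limit_law_samples()
```

Empirical quantiles of `samples` then approximate the quantiles of the limit law.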
Chapter 6 Nonergodic CAR(2): Pure Imaginary Eigenvalues
This chapter is on parameter estimation when the roots are pure imaginary numbers. It is organized in a slightly different way because the SDE (1.1) becomes
\[ \ddot X(t) = \theta_1 X(t) + \dot W(t), \tag{6.1} \]
which, for zero initial conditions, describes an undamped harmonic oscillator driven by additive Gaussian white noise [33]. We first state the theorem, which captures the essence of parameter estimation in this practical setting. The proof of this theorem is contained in the discussion in the rest of this chapter.
Theorem 6.0.4. If $p = \sqrt{-1}\,\nu$, then
\[ \lim_{T\to\infty} T\big(\hat{\theta}_{1,T} - \theta_1\big) \stackrel{d}{=} \frac{2 - w_1^2(1) - w_2^2(1)}{\int_0^1 w_1^2(t)\,dt + \int_0^1 w_2^2(t)\,dt}, \qquad \lim_{T\to\infty} T\big(\hat{\theta}_{2,T} - \theta_2\big) \stackrel{d}{=} \frac{2\nu\big(\int_0^1 w_1(t)\,dw_2(t) - \int_0^1 w_2(t)\,dw_1(t)\big)}{\int_0^1 w_1^2(t)\,dt + \int_0^1 w_2^2(t)\,dt}, \tag{6.2} \]
where $w_1, w_2$ are independent standard Brownian motions.
If $A_T = \mathrm{diag}(T^{-1}, T^{-1})$, then $\ell_T$ is LABF and the matrix $G(t)$ in (3.31) is
\[ G(t) = \begin{pmatrix} \sigma w_1(t) & \sigma w_2(t) \\ -\sigma w_2(t) & \sigma w_1(t) \end{pmatrix}. \]
It is surprising that, while there is only one Brownian motion in the model, two Brownian motions are required to characterize the limit distributions. The appearance of the Lévy stochastic area in the second limit adds to the surprise. Moreover, the result does not depend on the initial conditions.
6.1 Harmonic Oscillator Driven by a Random Force
In classical mechanics, a simple harmonic oscillator is an oscillator that is neither driven nor damped. Mathematically, it is described by the solution of the ODE
\[ \ddot X(t) = -c^2 X(t). \tag{6.3} \]
In real oscillators, friction, or damping, slows the motion of the system. In general, a damped harmonic oscillator satisfies
\[ \ddot X(t) = a\dot X(t) - c^2 X(t), \tag{6.4} \]
where $a < 0$ represents the presence of damping. With a random driving force described by Brownian motion, the models with and without damping become
\[ \ddot X(t) = a\dot X(t) - c^2 X(t) + \dot W(t) \tag{6.5} \]
and
\[ \ddot X(t) = -c^2 X(t) + \dot W(t), \tag{6.6} \]
respectively.
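A trajectory of the damped or undamped stochastic oscillator, equations (6.5)–(6.6), can be produced by an Euler–Maruyama step on the first-order system $(X, \dot X)$. This is only an illustrative sketch: the horizon `T`, step count `n`, and parameter values are arbitrary choices, and `a = 0` recovers the undamped case (6.6).

```python
# Euler-Maruyama sketch for  ddot X = a*dot X - c^2 X + sigma * dW/dt,
# written as the system dX = V dt, dV = (a*V - c^2*X) dt + sigma dW.
# Zero initial conditions, as in the text; parameter values are illustrative.
import numpy as np

def oscillator_em(T=10.0, n=10000, a=0.0, c=1.0, sigma=1.0, seed=1):
    rng = np.random.default_rng(seed)
    dt = T / n
    x = np.zeros(n + 1)   # X(t_k)
    v = np.zeros(n + 1)   # dot X(t_k)
    dW = rng.normal(0.0, np.sqrt(dt), size=n)
    for k in range(n):
        x[k + 1] = x[k] + v[k] * dt
        v[k + 1] = v[k] + (a * v[k] - c**2 * x[k]) * dt + sigma * dW[k]
    return x, v

x, v = oscillator_em()           # undamped case, a = 0
xd, vd = oscillator_em(a=-0.5)   # damped case, a < 0
```

Plotting `x` for `a = 0` against `xd` for `a < 0` shows the growing oscillation amplitude that the damping term suppresses.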
The undamped harmonic oscillator driven by additive gaussian white noise is a special case of the second order stochastic differential equation (1.1) when θ 1 < 0,θ 2 = 0. In this chapter, we study two hypothesis testing problems regarding the undamped harmonic oscillator. Here we define random variables ϕ 1 and ϕ 2 , which we shall refer to from time to time , by ϕ 1 = w 2 1 (1)+w 2 2 (1)−2 ( ∫ 1 0 w 2 1 (t)dt+ ∫ 1 0 w 2 2 (t)dt ), ϕ 2 = ∫ 1 0 w 1 (t)dw 2 (t)− ∫ 1 0 w 2 (t)dw 1 (t) ( ∫ 1 0 w 2 1 (t)dt+ ∫ 1 0 w 2 2 (t)dt ) (6.7) 6.2 Estimation of Frequency with No Damping In this section, we assume that damping parameter equals zero, meaning that θ 2 = 0 is known and θ 1 is being estimated. 74 When θ 2 = 0 and θ 1 =−c 2 < 0, (1.1) is reduced to the form ¨ X(t) =−c 2 X(t)+ ˙ W(t), t> 0, X(0) = ˙ X(0) = 0, (6.8) The solution of (6.8) is X(t) = 1 c ∫ t 0 sin(c(t−s))dW(s), ˙ X(t) = ∫ t 0 cos(c(t−s))dW(s) (6.9) Assuming θ 2 = 0 is known, the maximum likelihood estimator for θ 1 is e θ T = ∫ T 0 X(t)dY(t) ∫ T 0 X 2 (t)dt = ∫ T 0 X(t)d ˙ X(t) ∫ T 0 X 2 (t)dt . (6.10) Since d ˙ X(t) =θ 1 X(t)dt+dW(t), the residual satisfies b θ 1;T −θ = ∫ T 0 X(t)dW(t) ∫ T 0 X 2 (t)dt . (6.11) Let A = {a n , n ≥ 1} be a sequence of positive numbers such that lim n→∞ a n = +∞. Then finding the limit lim T→∞ T ( e θ 1;T −θ 1 ) is equivalent to find- ing lim n→∞ a n ( e θ 1;an −θ 1 ) for every sequence A. Working with a sequence is just a technical modification that would allow us to use some standard limit theorems that are stated for sequences. 75 We define the processes M n t and N n t as follows: M n t = √ 2 ca n ∫ ant 0 coscsdW s ∼N ( 0, t c + sin(2ca n t) 2c 2 a n ) , N n t = √ 2 ca n ∫ ant 0 sincsdW s ∼N ( 0, t c − sin(2ca n t) 2c 2 a n ) (6.12) Then X(a n s) = √ a n 2c sin(ca n s)M n s − √ a n 2c cos(ca n s)N n s ˙ X(a n s) = √ ca n 2 cos(ca n s)M n s + √ ca n 2 sin(ca n s)N n s (6.13) Now we want to study the asymptotic behavior of ∫ an 0 X(t)dW t ∫ an 0 X(t) 2 dt (6.14) as n→ +∞. 
By a simple application of change of variables, ∫ an 0 X(t)dW t =a n (∫ 1 0 M n t dN n t − ∫ 1 0 N n t dM n t ) (6.15) ∫ an 0 X 2 (t)dt = a 2 n c ∫ 1 0 (sin(ca n t)N n t −cos(ca n t)M n t ) 2 dt (6.16) It follows that a n ∫ an 0 X(t)dW t ∫ an 0 X 2 (t)dt =c ∫ 1 0 M n t dN n t − ∫ 1 0 N n t dM n t ∫ 1 0 (sin(ca n t)N n t −cos(ca n t)M n t ) 2 dt (6.17) 76 The right hand side of (6.17) is indeed a function of (M n t ,N n t ). Thus it makes sense to start with finding the limit distribution of the (M n t ,N n t ). (M n t ,N n t ) a two-dimensional Gaussian martingale. The quadratic variation of M n t and N n t is ⟨M n t ,N n t ⟩ =EM n t N n t = 2 ca n ∫ ant 0 coscssincsds = 1−cos(2ca n t) 2a n c 2 (6.18) when n→∞, ⟨M n t ,M n t ⟩→t/c, ⟨N n t ,N n t ⟩→t/c, ⟨M n t ,N n t ⟩→ 0 (6.19) Therefore, by (4.1.2), (M n t ,N n t )⇒ ( W 1 (t) √ c , W 2 (t) √ c ) . (6.20) where W 1 (t) and W 2 (t) are two independent Wiener process. (6.20) is the most basic and important result, on which the rest of the proof is built. We deal with the denominator of (6.14) first. An straight forward application of trigonomic identities gives us 77 ∫ 1 0 (sin(ca n t)N n t −cos(ca n t)M n t ) 2 dt = ∫ 1 0 (sin 2 (ca n s)(M n s ) 2 +cos 2 (ca n s)(N n s ) 2 −2sin(ca n s)cos(ca n s)M n s N n s )ds = [∫ 1 0 1−cos(2ca n s) 2 (M n s ) 2 ds+ ∫ 1 0 1+cos(2ca n s) 2 (N n s ) 2 ds − ∫ 1 0 sin(2ca n s)M n s N n s ds ] = 1 2 [∫ 1 0 (M n s ) 2 ds+ ∫ 1 0 (N n s ) 2 ds− ∫ 1 0 cos(2ca n s)(M n s ) 2 ds + ∫ 1 0 cos(2ca n s)(N n s ) 2 ds−2 ∫ 1 0 sin(2ca n s)M n s N n s ds ] (6.21) Now we want to show that the last three terms in the brackets all converge to 0 in distribution. 1. 
By integration by parts, 78 ∫ 1 0 cos(2ca n s)(M n s ) 2 ds = 1 2ca n ∫ 1 0 (M n s ) 2 dsin(2ca n s) = 1 2ca n sin(2ca n s)(M n s ) 2 | 1 0 − 1 2ca n ∫ 1 0 sin(2ca n s)d(M n s ) 2 = 1 2ca n sin(2ca n s)(M n s ) 2 | 1 0 − 1 ca n ∫ 1 0 sin(2ca n s)M n s dM n s − 1 2ca n ∫ 1 0 sin(2ca n s)d⟨M n s ⟩ = sin(2ca n )(M n 1 ) 2 2ca n − 1 ca n ∫ 1 0 sin(2ca n s)M n s dM n s − 1 2ca n ∫ 1 0 sin(2ca n s)d⟨M n s ⟩ (6.22) As a n → +∞, (M n 1 ) 2 ⇒W 2 (1)/c. Then sin(2can)(M n 1 ) 2 2can ⇒ 0. Moreover, ⟨M n t ⟩ = t c + sin(2cant) 2c 2 an is uniformly bounded by 2t c , then d⟨M n t ⟩ = ( 1 c + cos(2cact) 2c ) dt. E ⟨∫ 1 0 sin(2ca n s)M n s dM n s , ∫ 1 0 sin(2ns)M n s dM n s ⟩ =E ∫ 1 0 [sin(2ca n s)M n s ] 2 d⟨M n s ⟩ ≤ 2 c ∫ 1 0 E(M n s ) 2 ds< constant (6.23) Then, 1 2can ∫ 1 0 sin(2ca n s)M n s dM n s ⇒ 0 as a n →∞. The last term in (6.22) also converges to zero in distribution. Mathematically, 79 1 2ca n ∫ 1 0 sin(2a n s)d⟨M n s ⟩ ≤ 1 c 2 a n ∫ 1 0 dt⇒ 0 That the three terms converge in distribution to zero, which is a constant, implies that they converge in probability to zero. Now we have shown that ∫ 1 0 cos(2a n s)(M n s ) 2 ds → 0 in probability. Of course this also implies ∫ 1 0 cos(2a n s)(M n s ) 2 ds⇒ 0 By a similar argument, ∫ 1 0 cos(2ca n s)(N n s ) 2 ds⇒ 0 (6.24) 2. Now the cross term: 2 ∫ 1 0 sin(2ca n s)M n s N n s ds =− 1 ca n ∫ 1 0 M n s N n s dcos(2ca n s) = M n 1 N n 1 a n + 1 a n ∫ 1 0 cos(2ca n s)d(M n s N n s ) = M n 1 N n 1 a n + 1 a n (∫ 1 0 cos(2ca n s)N n s dM n s + ∫ 1 0 cos(2ca n s)M n s dN n s + ∫ 1 0 cos(2ca n s)d⟨N n s ,M n s ⟩ ) ⇒ 0 (6.25) by similar argument as we did in the previous step. Therefore, with (6.20) 80 ∫ 1 0 (sin(ca n t)N n t −cos(ca n t)M n t ) 2 dt⇒ 1 2c (∫ 1 0 W 2 1 (s)ds+ ∫ 1 0 W 2 2 (s)ds ) (6.26) Let Y n t = (M n t ,N n t ). Then one can show that Y n t is predictably uniformly tight, which is define as follows: Denition 6.2.1. 
[26] Y n t is predictably uniformly tight, if lim a→∞ sup H n;i ∈H n ;n∈N P(| 2 ∑ i=1 H n;i ·Y n;i t |>a) = 0 (6.27) By Proposition VI.6.13 in [26], a continuous local martingale Y n t is predictably uniformly tight is equivalent to that ⟨Y n ,Y n ⟩ t is tight for each t > 0. As we have computed, ⟨Y n ,Y n ⟩ t = t c + sin(2cant) 2c 2 an 1−cos(2cant) 2anc 2 1−cos(2cant) 2anc 2 t c − sin(2cant) 2c 2 an (6.28) which is tight for each t> 0. Next, let us consider the four-dimensional process ( M n t , N n t , ∫ t 0 M n s dN n s , ∫ t 0 N n s dM n s ) , 0≤t≤ 1. To find its limiting distribution, we need Theorem VI.6.22 in [26], which states Theorem 6.2.2. [26, VI.6.22] For each n ∈ ¯ N let Y n be a d-dimensional semi- martingales onB, and let H n be a q×d-dimensional adapted cadlag process onB, 81 and set (H n ·X n ) = ∑ 1≤j≤ dH n;ij ·X n;j for i = 1,...,q. Assume that the sequence (X n ) n∈N is predictably uniformly tight. If we have (X n ,H n ) ⇒ (X ∞ ,H ∞ ), then (X n ,H n ,H n ···X n )⇒ (X ∞ ,H ∞ ,X ∞ ·H ∞ ) Since the process Y n t converges in distribution to ( W 1 (t) √ c , W 2 (t) √ c ), the theorem con- cludes that ( M n t ,N n t , ∫ 1 0 M n s dN n s , ∫ 1 0 N n s dM n s ) ⇒ ( W 1 (t) √ c , W 2 (t) √ c , 1 c ∫ t 0 W 1 (s)dW 2 (s), 1 c ∫ t 0 w 1 (s)dw 2 (s) ) . (6.29) 6.2.1 Consistency of MLE Theorem 6.2.3. If θ 1 = −c 2 < 0, where c > 0, then the estimator b θ T is strongly consistent in the large sample asymptotic T →∞: lim T→∞ b θ 1;T =θ with probability one. Proof. It follows from (6.11) that b θ 1;T −θ = ∫ T 0 X(t)dW(t) ∫ T 0 X 2 (t)dt (6.30) The process Z(t) = ∫ t 0 X(s)dW(s) is a continuous square-integrable martingale with quadratic characteristic ⟨Z⟩(t) = ∫ t 0 X 2 (s)ds, 82 so that b θ T −θ = Z(T) ⟨Z⟩(T) . By the strong law of large numbers for martingales (see, for example, [34, Corollary 1 to Theorem 2.6.10]), to complete the proof of the theorem it is enough to show that the integral ∫ ∞ 0 X 2 (t)dt = +∞. 
Define the random process Q(t) = ∫ t 0 X 2 (s)ds It is a non-decreasing process and therefore the limit Q ∞ = lim t→∞ Q(t) = ∫ ∞ 0 X 2 (s)ds, finite or infinite, exists with probability one. We need to show thatP(Q ∞ = +∞) = 1. Assume this does not hold, then for ϵ> 0 there exists some C > 0 such that P(Q ∞ <C)>ϵ (6.31) Fix ϵ and C. Since Q ∞ ≥Q T for all T > 0, P ( Q T T 2 < C T 2 ) =P(Q T <C)≥P(Q ∞ <C) (6.32) 83 From (6.16) and (6.26) we know that lim T→+∞ Q T T 2 ⇒ 1 2c (∫ 1 0 W 2 1 (s)ds+ ∫ 1 0 W 2 2 (s)ds ) =ξ (6.33) Then ϵ≤ lim T→ P ( Q T T 2 < C T 2 ) = lim T→ P(ξ < 0). which contradicts (6.33). 6.2.2 Asymptotic Distribution of MLE The process X is not ergodic, meaning that the rate of convergence can be different from √ T. As the next theorem shows, the rate of convergence is T and the limiting distribution is not normal. Now we are about to state the major theorem of this section, which is Theorem 6.2.4. With the random variable ϕ 2 defined in , for every θ 1 =c 2 > 0, lim T→∞ T ( b θ 1;T −θ 1 ) ⇒ 2cϕ 2 . (6.34) Proof. Since A = {a n , n ≥ 1} is a sequence of positive numbers such that lim n→∞ a n = +∞. Then (6.34) is equivalent to lim n→∞ a n ( b θ 1;an −c 2 ) ⇒ 2cϕ 2 84 for every sequence A. Apply continuous mapping theorem to (6.29), lim n→∞ ∫ 1 0 ( M n (t)dN n (t)−N n (t)dM n (t) ) ∫ 1 0 ( sin(cb n t)N n (t)−cos(cb n t)M n (t) ) 2 dt ⇒ 2ϕ 2 , which complete the proof. 6.3 Testing for Damping with Known Frequency In this section we assume θ 1 is known and θ 2 is the parameter to be estimated. 6.3.1 Hypothesis Testing Theorem 6.3.1. If θ 2 = 0 and the frequency c> 0 is known, then P( lim T→∞ b θ 2;T = 0) = 1 (6.35) and lim T→∞ T b θ 2;T d =ϕ 1 , (6.36) withϕ 1 defined in (6.7). Thus, for sufficiently largeT, the null hypothesisH 0 :θ 2 = 0 (no damping) can be rejected in favor of the alternative H 1 : θ 2 < 0 (damping is present) at the level of significance α if b θ 2;T <γ , where P(ϕ 1 <γ ) =α. 85 Proof. 
The arguments are similar to the proof of Theorem 6.2.4. We have b θ 2;T −θ 2 = Q(T) ⟨Q⟩(T) , where Q(t) = ∫ t 0 ˙ X(s)dW(s), and lim T→∞ ⟨Q⟩(T) = +∞ with probability one, because, with X given by (6.9), we get ˙ X(t) = ∫ t 0 cos(c(t−s))dW(s), so that P(lim t→∞ | ˙ X(t)|→ 0) = 0. Thus (6.35) follows from the strong law of large numbers for martingales. Next, we use the martingales M n and N n defined in (6.12) to write a n ( b θ 2:an −θ 2 ) = ∫ 1 0 ( M n (t)dM n (t)+N n (t)dN n (t) ) ∫ 1 0 ( cos(ca n t)M n (t)+sin(ca n t)N n (t) ) 2 dt . The proof of Theorem 6.2.4 show that the right-hand side of the last equality con- verges in distribution to 2 ∫ 1 0 w 1 (t)dw 1 (t)+ ∫ 1 0 w 2 (t)dw 2 (t) ∫ 1 0 (w 2 1 (t)+w 2 2 (t))dt =ϕ 1 . Remark. Analysis of the proof shows that (6.35) holds for all θ 2 ∈ R, but (6.36) only holds for θ 2 = 0. 86 6.3.2 Connection with Discrete Model To conclude the section, we comment briefly about the connections with the discrete time case. In [16, Corollary 3.3.8], Chan and Wei consider the parameter estimation problem for the second-order auto-regression x n = 2cosθx n−1 +αx n−2 +ε n (6.37) and show that if α =−1, then, for all θ∈ (0,π), the least squares estimator ˆ α n of α satisfies lim n→∞ n(ˆ α n +1) d =ϕ 1 . (6.38) Let us discretize (1.1) using a uniform time step h as follows: X n −2X n−1 +X n−2 h 2 +a X n−1 −X n−2 h +c 2 X n−1 = ξ n √ h . Then X n = (2−ah−c 2 h 2 )X n−1 +(ah−1)X n−2 +h 3=2 ξ n , (6.39) which is of the same form as (6.37), with 2cosθ = 2−ah−c 2 h 2 and α = ah−1. In particular, if a = 0, then α = −1, and then estimation of α = −1 in (6.37) is equivalent to estimation of a = 0 in (6.39). Since the limiting distribution of n(ˆ α n +1) in (6.37) does not depend on θ, the limiting distribution of nˆ a n in (6.39) does not depend on h, suggesting that the result should continue to hold in the limit h→ 0. 
Theorem 6.3.1 shows that this is indeed the case, which is rather remarkable, 87 because in general there is little connection between the estimators in continuous time and the corresponding estimators for discretized models. 6.4 Testing for Damping with Unknown Fre- quency Inthissectionwedealwithestimationoftwoparameters,assumingneitherisknown. Therefore, the MLEs of θ 1 and θ 2 under this assumption is (3.12). Theorem 6.4.1. If θ 1 = −c 2 < 0 and θ 2 = 0, then the estimatorsb a T and b θ T are strongly consistent in the large sample asymptotic: with probability one, lim T→∞ b θ 2;T = 0, lim T→∞ b θ 1;T =θ 1 =−c 2 , (6.40) and lim T→∞ T ( b θ 2;T −θ 2 ) d =ϕ 1 , lim T→∞ T ( b θ 1;T −θ 1 ) d = 2cϕ 2 , (6.41) with random variables ϕ 1 , ϕ 2 defined in (6.7). Then, for sufficiently large T, the null hypothesis H 0 : θ 2 = 0 (no damping) can be rejected in favor of the alternative H 1 : θ 2 < 0 (damping is present) at the level of significance α if b θ 2;T > γ , where P(ϕ 1 >γ ) =α. Proof. Consistency of the MLEs follows from theorem (3.2.1). 88 We first establish (6.41). Define D T = ( ∫ T 0 X(t) ˙ X(t)dt ) 2 ( ∫ T 0 ˙ X 2 (t)dt )( ∫ T 0 X 2 (t)dt )= ( X(T) T ) 4 4 ( 1 T 2 ∫ T 0 ˙ X 2 (t)dt )( 1 T 2 ∫ T 0 X 2 (t)dt ). (6.42) By direct computation, (3.12) becomes T( b θ 2;T −θ 2 ) = 1 1−D T 1 T ∫ T 0 ˙ X(t)dW(t) 1 T 2 ∫ T 0 ˙ X 2 (t)dt − ( 1 T ∫ T 0 X(t)dW(t) )( X 2 (T) 4T 2 ) ( 1 T 2 ∫ T 0 ˙ X 2 (t)dt )( 1 T 2 ∫ T 0 X 2 (t)dt ) (6.43) T( b θ 1;T −θ 1 ) = 1 1−D T 1 T ∫ T 0 X(t)dW(t) 1 T 2 ∫ T 0 X 2 (t)dt − ( 1 T ∫ T 0 ˙ X(t)dW(t) )( X 2 (T) 4T 2 ) ( 1 T 2 ∫ T 0 ˙ X 2 (t)dt )( 1 T 2 ∫ T 0 X 2 (t)dt ) (6.44) If θ 1 =−c 2 < 0 and θ 2 = 0, then X(T) = 1 c ∫ T 0 sin(c(T−s))dW(s), ˙ X(T) = ∫ T 0 cos(c(T−s))dW(s). In particular,EX 2 (T)≤T/c 2 and therefore. lim T→∞ X 2 (T) T 2 = 0. (6.45) in probability. Next, (6.34) implies lim T→∞ 1 T ∫ T 0 X(t)dW(t) 1 T 2 ∫ T 0 X 2 (t)dt d = 2cϕ 2 , 89 and (6.36) implies lim T→∞ 1 T ∫ T 0 ˙ X(t)dW(t) 1 T 2 ∫ T 0 ˙ X 2 (t)dt d =ϕ 1 . 
Finally, from (6.45) and the second equality in (6.42), lim T→∞ D(T) = 0 in probability. This establishes (6.41). 6.5 Remarks 1. Inthischapter, weconsidertheparameterestimationproblemfortheequation ¨ X(t) =θ 2 ˙ X(t)+θ 1 X(t) = ˙ W, X(0) = ˙ X(0) = 0. when the eigenvalues are pure imaginary numbers. We establish consistency and the rate of convergence of the maximum likelihood estimators for θ 1 and θ 2 when θ 2 = 0 and θ 1 < 0 (undamped harmonic oscillator), and use the result to propose a statistical procedure for testing θ 2 = 0 vs θ 2 < 0. 2. Whenθ 2 = 0 andθ 1 > 0, the limiting distribution of T b θ 2;T does not depend on θ 1 and is the same as for the corresponding estimator in the disretized version of the equation. 90 3. The asymptotic behavior of the MLEs ( b θ 1;T , b θ 2;T ) does not depend on the initial condition or the diffusion σ. 91 Chapter 7 Nonergodic CAR(2): Complex Eigenvalues Suppose the eigenvalues are complex numbers and the process is unstable, say, p = µ+νi,q =µ−νi where µ> 0, ν > 0. Then the fundamental system of solutions of the associated ODE (1.4) is x 1 (t) = e t ν (νcosνt−µsinνt), x 2 (t) = 1 ν e t sinνt (7.1) This will give us X(t) = e t ν ((b−aµ)sinνt+aνcosνt)+ σ ν ∫ t 0 e (t−s) sinν(t−s)dW(s), (7.2) ˙ X(t) =µX(t)+Y(t), where (7.3) Y(t) =e t ( (b−aµ)cosνt−aνsinνt ) +σ ∫ t 0 e (t−s) cosν(t−s)dW(s). (7.4) Define Gaussian random variables η c = b−aµ ν + σ ν ∫ +∞ 0 e −t cosνtdW(t), η s =−a+ σ ν ∫ +∞ 0 e −t sinνtdW(t) (7.5) 92 and the functions V c (t) =e t cosνt, V s (t) =e t sinνt. Next, define ξ c (t) =e −t ∫ t 0 V c (s)dW(s), ξ s (t) =e −t ∫ t 0 V s (s)dW(s) (7.6) Keep in mind that η c , η s , ξ c , ξ s are different from η andξ defined in (3.25). The superscription indicated the trigonometric integrand in the stochastic integral. With these notations, X(t) and Y(t) can be expressed in a neater way: X(t) =η c V s (t)−η s V c (t)+e t 0 a:s: (t), Y(t) =η s V s (t)+η s V s (t)+e t 0 a:s: (t). 
(7.7) η c and η s are indeed the limit of b−aµ ν + σ ν ∫ T 0 e −t cosνtdW(t), and −a+ σ ν ∫ T 0 e −t sinνtdW(t) respectively in the sense of convergence in distribution. 93 Although ξ c (T) and ξ s (T) have finite variance and similar structure with η c and η s , they are not convergent due to the oscillating nature of sine and cosine. In more details, F c (T) =Var(ξ c (T)) = 1 4µ (1+cos(φ)cos(2νT −φ))+o(1), F s (T) =Var(ξ s (T)) = 1 4µ (1−cos(φ)cos(2νT −φ))+o(1), F cs (T) =Cov(ξ c (T),ξ s (T)) = 1 4µ cos(φ)sin(2νT −φ)+o(1) (7.8) where o(1) denotes a function ε = ε(T) such that lim T→∞ ε(T) = 0, and cosφ = µ/ √ µ 2 +ν 2 , sinφ =ν/ √ µ 2 +ν 2 . Since generally the pair (ξ c (T),ξ s (T)) does not converge, we consider a sequence (ξ c (T k ),ξ s (T k )). Choice of T k can be made such that T k →∞ and sin(2νT i −φ) = sin(2νT j − φ), cos(2νT i − φ) = cos(2νT j − φ) for any i,j. For example, T k+1 = T k +π/ν. With proper choice of T k , F(T k ) = F c (T k ) F cs (T k ) F cs (T k ) F s (T k ) can converge, and an application of theorem 4.1.2 will yield the limit of (ξ c (T k ),ξ s (T k )). 94 7.1 Parameter Estimation and Asymptotic Struc- ture Theorem 7.1.1. If p = µ + √ −1ν, µ > 0, the limit distribution does not exist. However, with proper choice of sequence T k ,k ∈ N , for i = 1,2, the families {e T k ( b θ i;T −θ i ),k∈N} have limit distributions of the form ¯ ξ c ¯ η c + ¯ ξ s ¯ η s ¯ ξ 2 c + ¯ ξ 2 s , where the bivariate normal vectors ( ¯ ξ c , ¯ ξ s ) and (¯ η c ,¯ η s ) are independent and E¯ η c = E¯ η s = 0. Moreover, the MLE is locally asymptotically mixed normal. Thisresultsshouldbecomparedwiththeexampleconsideredin[35, Section4.1]: dX 1 (t) dX 2 (t) = µ −ν ν µ X 1 (t) X 2 (t) dt+ dW 1 (t) dW 2 (t) . 
(7.9) Whiletheeigenvaluesofthematrixin(7.9)arealsoµ± √ −1ν,thespecialstructureof the model ensures that the MLEs of µ andν, when normalized by √ 2µe T , converge to a joint limit (which, for zero initial conditions, turns out to be the bivariate t 2 -distribution). 95 Proof. Straight forward computation yields e −T ∫ T 0 X(t)dW(t) =η c ξ s (T)−η s ξ c (T)+0 a:s: (T) e −T ∫ T 0 Y(t)dW(t) =ν(η c ξ c (T)+η s ξ s (T))+0 a:s: (T) e −2T ∫ T 0 X 2 (t)dt = (η c ) 2 F s (T)+(η s ) 2 F c (T)−2η c η s F cs (T)+0 a:s: (T) e −2T ∫ T 0 Y 2 (t)dt =ν 2 [(η c ) 2 F c (T)+(η s ) 2 F s (T)+2η c η s F cs (T)+0 a:s: (T)] e −2T ∫ T 0 X(t)Y(t)dt =ν[(η c ) 2 F cs (T)−(η s ) 2 F cs (T)−η c η s F c (T)+η c η s F s (T)+0 a:s: (T)] (7.10) and N(T;X,Y) =e 3T ν[(η 2 c +η 2 s )(η c ξ c F s +η s ξ s F c )+0 a:s: (T)] N(T;Y,X) =e 3T ν 2 [(η 2 c +η 2 s )(η c ξ s F c −η s ξ c F s )+0 a:s: (T)] D(T;X,Y) =e 4T ν 2 [(η 2 c +η 2 s ) 2 (F c (T)F s (T)−F 2 cs (T))+0 a:s: (T)] (7.11) Notice that F c (T)F s (T)−F 2 cs (T) = ν 2 16µ 2 (µ 2 +ν 2 ) D(T;X,Y) = e 4T ν 4 16µ 2 (µ 2 +ν 2 ) [(η 2 c +η 2 s ) 2 +0 a:s: (T)] (7.12) By (3.23), 96 N(T;X, ˙ X) =e 3T ν[((η c ) 2 +(η s ) 2 )(η c ξ c F s (T)+η s ξ s F c (T))+0 a:s: (T)] N(T; ˙ X,X) =e 3T ν[((η c ) 2 +(η s ) 2 )(νη c ξ s F c (T)−νη s ξ c F s (T) −µη c ξ c F s (T)−µη s ξ s F c (T))+0 a:s: (T)] D(T;X, ˙ X) = e 4T ν 4 16µ 2 (µ 2 +ν 2 ) [(η 2 c +η 2 s ) 2 +0 a:s: (T)] (7.13) D(T;X, ˙ X), if scaled by e 4T , does converge to a Chi-square distribution, while N(T; ˙ X,X) andN(T;X, ˙ X) do not converge in general. Choose T k ,k∈N such that lim k→∞ T k = +∞ lim k→∞ cos(2νT k −φ) exists lim k→∞ sin(2νT k −φ) exists (7.14) then(ξ c (T k ),ξ s (T k ))convergetoa2-dimensionalGaussianlimitdistribution, and hence N(T; ˙ X,X) and N(T;X, ˙ X) converge. Obviously sequence of T k satisfying (7.14) exists. 
For example: T k+1 =T k + π ν , ,k∈N Let T k ,k∈N be a sequence satisfying (7.14) and denote ξ c (T k )F s (T k ) ξ s (T k )F c (T k ) ⇒ ¯ ξ 1 ¯ ξ 2 (7.15) 97 e T ( ˆ θ 1;T −θ 1 )⇒ 16µ 2 (µ 2 +ν 2 ) ν 4 η c (ν ¯ ξ 2 −µ ¯ ξ 1 )−η s (ν ¯ ξ 1 +µ ¯ ξ 2 ) (η c ) 2 +(η s ) 2 e T ( ˆ θ 2;T −θ 2 )⇒ 16µ 2 (µ 2 +ν 2 ) ν 3 η c ¯ ξ 1 +η s ¯ ξ 2 (η c ) 2 +(η s ) 2 (7.16) Next we prove the locally asymptotically mixed normality(LAMN) for {T k ,k ∈ N} satisfying (7.14). Let A D (T) =− e −T ν ν 0 µ −1 (7.17) Then A D is a deterministic matrix and A D (T) ∫ T 0 X(t)dW(t) ∫ T 0 dX(t)dW(t) = η s −η c η c η s ξ c (T) ξ s (T) (7.18) and A D ΨA T D = η s −η c η c η s F c F cs F cs F s η s η c −η c η s (7.19) For T = T k ,(ξ c (T),ξ s (T)) converge to a bivariate Gaussian random variable with variance matrix 1 4µ 1+g c cosϕ g s cosϕ g s cosϕ 1−g c cosϕ = lim k→∞ F c (T k ) F cs (T k ) F cs (T k ) F s (T k ) (7.20) where g c = lim k→∞ cos(2νT k −φ), and g s = lim k→∞ sin(2νT k −φ). 98 The likelihood function has LAMN structure (3.30) with A D (T k ). 7.2 Parameter Estimation when One Parameter is Known First we study the maximum likelihood estimator of θ 1 when θ 2 is known, which is e θ T 1 =θ 1 + ∫ T 0 X(t)dW(t) ∫ T 0 X 2 (t)dt (7.21) From previous section, we have that e −T ∫ T 0 X(t)dW(t) =η c ξ s (T)−η s ξ c (T)+0 a:s: (T) e −2T ∫ T 0 X 2 (t)dt = (η c ) 2 F s (T)+(η s ) 2 F c (T)−2η c η s F cs (T)+0 a:s: (T) (7.22) Therefore, e T ( e θ T 1 −θ 1 ) = η c ξ s (T)−η s ξ c (T)+0 a:s: (T) (η c ) 2 F s (T)+(η s ) 2 F c (T)−2η c η s F cs (T) (7.23) which does not converge in general, but converges with proper choice of a sequence T k ,k∈N. The limit distribution of the sequence is of the form ¯ η c ¯ ξ s − ¯ η s ¯ ξ c ¯ η c 2 + ¯ η s 2 −2ρ¯ η c ¯ η s (7.24) 99 where (¯ η c , ¯ η s ) , ( ¯ ξ c , ¯ ξ s )are independent Gaussian random variables and ( ¯ ξ c , ¯ ξ s ) has covariance matrix 1 ρ ρ 1 (7.25) Similar result can be obtained for e θ T 2 . 
The MLEs are different from those in the previous section, but they all contain four Gaussian random variables, and the limit distributions are rather complicated.
Chapter 8 Simulation
Simulation of stochastic processes is important in many areas of study. The first step in any simulation scheme is to find a way to "discretize" a continuous-time process into a discrete-time process. In the rest of this chapter we present two ways of discretizing: one is the well-known Euler scheme, which is a linear approximation; the other is exact discretization.
8.1 Simulation of the Ornstein-Uhlenbeck Process
8.1.1 Euler Scheme
The simplest way to discretize the process is the Euler discretization [14][37], which is equivalent to approximating the integrals by the left-point rule. Take equidistant time points $0 = t_0 < t_1 < t_2 < \cdots < t_n = T$ with step size $\Delta = T/n$. Then
\[ X_{t_{k+1}} - X_{t_k} = \theta X_{t_k}\Delta + \sigma\sqrt{\Delta}\,\xi_k, \tag{8.1} \]
where $\xi_k$ are i.i.d. $N(0,1)$. It is proved in [14] that the discrete-time process created by the Euler scheme converges almost surely, and in $L^p$ uniformly on compact sets, to the solution of the SDE. If an approximation of the diffusion over the entire interval $[0,T]$ is required, it is simplest to interpolate linearly between the points $X_{t_k}$ [14].
8.1.2 Exact Discretization
Given $X(t_k)$, we can obtain $X(t_{k+1})$ by solving $dX(t) = \theta X(t)\,dt + \sigma\,dW(t)$ with initial condition $X(t_k)$:
\[ X(t_{k+1}) = e^{\theta\Delta} X(t_k) + e^{\theta t_{k+1}} \int_{t_k}^{t_{k+1}} e^{-\theta t}\,\sigma\,dW(t). \tag{8.2} \]
The stochastic integrals over disjoint intervals are independent of one another. Thus
\[ X(t_{k+1}) = e^{\theta\Delta} X(t_k) + \sigma\sqrt{\frac{e^{2\theta\Delta}-1}{2\theta}}\,\eta_k, \tag{8.3} \]
where $\eta_k$ are i.i.d. $N(0,1)$.
8.1.3 Comparison of Euler Scheme and Exact Discretization
Either method requires generating one normal random variable per step, and the inductive formulas of both methods are linear in $X(t_k)$ and the newly generated normal random variable. Thus the computational cost is the same. The Euler scheme is an approximation and is accurate only when the time step is sufficiently small.
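The two Ornstein-Uhlenbeck recursions, Euler (8.1) and exact (8.3), can be sketched side by side on a shared noise sequence. This is an illustrative sketch only: the parameter values, coupling of the two schemes through the same standard normal draws, and step size are choices made here, not from the text.

```python
# Simulate dX = theta*X dt + sigma dW twice on the same N(0,1) draws:
# once by the Euler recursion (8.1), once by the exact recursion (8.3).
# Parameter values are illustrative; theta < 0 gives the ergodic case.
import numpy as np

def ou_paths(theta=-1.0, sigma=0.5, T=5.0, n=500, x0=1.0, seed=2):
    rng = np.random.default_rng(seed)
    dt = T / n
    z = rng.normal(size=n)                 # shared N(0,1) sequence
    euler = np.empty(n + 1)
    exact = np.empty(n + 1)
    euler[0] = exact[0] = x0
    a = np.exp(theta * dt)                 # exact one-step multiplier
    s = sigma * np.sqrt((np.exp(2 * theta * dt) - 1.0) / (2 * theta))
    for k in range(n):
        euler[k + 1] = euler[k] + theta * euler[k] * dt + sigma * np.sqrt(dt) * z[k]
        exact[k + 1] = a * exact[k] + s * z[k]
    return euler, exact

euler, exact = ou_paths()
```

Because the two recursions share the same draws, their pathwise difference directly displays the Euler discretization error, which shrinks as the step size decreases.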
Exact discretization provides the exact trajectory and allows any step size. Thus, for the Ornstein-Uhlenbeck process (1.3), exact discretization is preferable. However, this does not mean the Euler scheme is useless for first-order SDEs in general. The process we are interested in here is very special; many first-order SDEs do not have an explicit solution, let alone a linear inductive formula for simulation. In such situations, the Euler scheme can yield satisfactory simulation results given a sufficiently small step size.

8.2 Simulation of CAR(2)

For the simulation of CAR(2), we again discuss the Euler scheme and exact discretization.

8.2.1 Euler Scheme

We rewrite equation (1.1) in the form of a system:

$$d\begin{pmatrix} X(t) \\ \dot X(t) \end{pmatrix} = \begin{pmatrix} 0 & 1 \\ \theta_1 & \theta_2 \end{pmatrix}\begin{pmatrix} X(t) \\ \dot X(t) \end{pmatrix}dt + \begin{pmatrix} 0 \\ 1 \end{pmatrix}dW(t), \qquad (8.4)$$

$$\begin{pmatrix} X(t_{k+1}) \\ \dot X(t_{k+1}) \end{pmatrix} - \begin{pmatrix} X(t_k) \\ \dot X(t_k) \end{pmatrix} = \Delta\begin{pmatrix} 0 & 1 \\ \theta_1 & \theta_2 \end{pmatrix}\begin{pmatrix} X(t_k) \\ \dot X(t_k) \end{pmatrix} + \sqrt{\Delta}\begin{pmatrix} 0 \\ \varepsilon_k \end{pmatrix}, \qquad (8.5)$$

where the $\varepsilon_k$ are i.i.d. $N(0,1)$.

8.2.2 Exact Discretization

Exact discretization of CAR(2) is more complicated than that of the Ornstein-Uhlenbeck process. A natural idea is to write CAR(2) as a system. As for the Ornstein-Uhlenbeck process, an iterative formula can be obtained:

$$Y(t_{k+1}) = e^{A\Delta}Y(t_k) + e^{At_{k+1}}\int_{t_k}^{t_{k+1}} e^{-At}b\,dW(t), \qquad (8.6)$$

where

$$Y(t) = \begin{pmatrix} X(t) \\ \dot X(t) \end{pmatrix}, \qquad b = \begin{pmatrix} 0 \\ 1 \end{pmatrix}, \qquad A = \begin{pmatrix} 0 & 1 \\ \theta_1 & \theta_2 \end{pmatrix}. \qquad (8.7)$$

The random part $e^{At_{k+1}}\int_{t_k}^{t_{k+1}} e^{-At}b\,dW(t)$ is a bivariate normal random vector with zero mean. Simulating it involves calculating its covariance matrix, $e^{At_{k+1}}\big(\int_{t_k}^{t_{k+1}} e^{-At}b\,[e^{-At}b]^{\top}\,dt\big)\,e^{A^{\top}t_{k+1}}$. This cannot be easily done in general because of the difficulty of computing the matrix exponential and its integral.

Essentially, it is the eigenvalues of the matrix $A$ that characterize the process defined by SDE (1.1). Denote the eigenvalues by $\lambda_1$ and $\lambda_2$. One can express the process in terms of the eigenvalues. The idea is to start from the explicit solution of SDE (1.1) and express it as a linear combination of normal random variables whose means and covariances can be easily computed.
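Before turning to the eigenvalue cases, the Euler recursion (8.5) can be sketched in Python as follows (the function and variable names are our own); it takes an array of i.i.d. $N(0,1)$ increments and advances the pair $(X,\dot X)$.

```python
import math

def car2_euler(theta1, theta2, x0, v0, dt, eps):
    # Euler scheme (8.5) for the system (8.4):
    #   X_{k+1}    = X_k + Xdot_k * dt
    #   Xdot_{k+1} = Xdot_k + (theta1*X_k + theta2*Xdot_k)*dt + sqrt(dt)*eps_k
    xs, vs = [x0], [v0]
    x, v = x0, v0
    root_dt = math.sqrt(dt)
    for e in eps:
        # tuple assignment so both updates use the step-k values
        x, v = x + v * dt, v + (theta1 * x + theta2 * v) * dt + root_dt * e
        xs.append(x)
        vs.append(v)
    return xs, vs
```

A quick sanity check: with eps set to zeros the recursion reduces to the Euler method for the deterministic system, so theta1 = -1, theta2 = 0 with x0 = 1, v0 = 0 gives a noise-free harmonic oscillator whose path approximates cos(t).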
$\lambda_1$ and $\lambda_2$ real and distinct

When $\lambda_1 \neq \lambda_2$ and both are nonzero, the solution of SDE (1.1) is

$$X(t) = \int_0^t \frac{e^{\lambda_2(t-s)}-e^{\lambda_1(t-s)}}{\lambda_2-\lambda_1}\,dW(s) = \frac{1}{\lambda_2-\lambda_1}\left(e^{\lambda_2 t}\int_0^t e^{-\lambda_2 s}\,dW(s) - e^{\lambda_1 t}\int_0^t e^{-\lambda_1 s}\,dW(s)\right). \qquad (8.8)$$

For $t = t_k$,

$$X(t_k) = \frac{1}{\lambda_2-\lambda_1}\left(e^{\lambda_2 t_k}\sum_{i=0}^{k-1}\int_{t_i}^{t_{i+1}} e^{-\lambda_2 s}\,dW(s) - e^{\lambda_1 t_k}\sum_{i=0}^{k-1}\int_{t_i}^{t_{i+1}} e^{-\lambda_1 s}\,dW(s)\right) = \frac{1}{\lambda_2-\lambda_1}\left(e^{\lambda_2 t_k}\sum_{i=1}^{k}\eta_i - e^{\lambda_1 t_k}\sum_{i=1}^{k}\xi_i\right), \qquad (8.9)$$

where

$$\eta_k = \int_{t_{k-1}}^{t_k} e^{-\lambda_2 s}\,dW(s), \qquad \xi_k = \int_{t_{k-1}}^{t_k} e^{-\lambda_1 s}\,dW(s). \qquad (8.10)$$

1. If $\lambda_i \neq 0$ and $\lambda_1+\lambda_2 \neq 0$, the covariance matrix of $(\eta_k,\xi_k)$ is

$$\begin{pmatrix} \dfrac{e^{-2\lambda_2 t_{k-1}}-e^{-2\lambda_2 t_k}}{2\lambda_2} & \dfrac{e^{-(\lambda_1+\lambda_2)t_{k-1}}-e^{-(\lambda_1+\lambda_2)t_k}}{\lambda_1+\lambda_2} \\ \dfrac{e^{-(\lambda_1+\lambda_2)t_{k-1}}-e^{-(\lambda_1+\lambda_2)t_k}}{\lambda_1+\lambda_2} & \dfrac{e^{-2\lambda_1 t_{k-1}}-e^{-2\lambda_1 t_k}}{2\lambda_1} \end{pmatrix}. \qquad (8.11)$$

2. If $\lambda_1+\lambda_2 = 0$ (in other words, $\theta_2 = 0$), the covariance matrix of $(\eta_k,\xi_k)$ is

$$\begin{pmatrix} \dfrac{e^{-2\lambda_2 t_{k-1}}-e^{-2\lambda_2 t_k}}{2\lambda_2} & t_k-t_{k-1} \\ t_k-t_{k-1} & \dfrac{e^{-2\lambda_1 t_{k-1}}-e^{-2\lambda_1 t_k}}{2\lambda_1} \end{pmatrix}. \qquad (8.12)$$

3. If $\lambda_1 = 0$ (in other words, the larger root is zero and $\lambda_2 < 0$), the covariance matrix of $(\eta_k,\xi_k)$ is

$$\begin{pmatrix} \dfrac{e^{-2\lambda_2 t_{k-1}}-e^{-2\lambda_2 t_k}}{2\lambda_2} & \dfrac{e^{-\lambda_2 t_{k-1}}-e^{-\lambda_2 t_k}}{\lambda_2} \\ \dfrac{e^{-\lambda_2 t_{k-1}}-e^{-\lambda_2 t_k}}{\lambda_2} & t_k-t_{k-1} \end{pmatrix}. \qquad (8.13)$$

4. If $\lambda_2 = 0$ (in other words, the smaller root is zero and $\lambda_1 > 0$), the covariance matrix of $(\eta_k,\xi_k)$ is

$$\begin{pmatrix} t_k-t_{k-1} & \dfrac{e^{-\lambda_1 t_{k-1}}-e^{-\lambda_1 t_k}}{\lambda_1} \\ \dfrac{e^{-\lambda_1 t_{k-1}}-e^{-\lambda_1 t_k}}{\lambda_1} & \dfrac{e^{-2\lambda_1 t_{k-1}}-e^{-2\lambda_1 t_k}}{2\lambda_1} \end{pmatrix}. \qquad (8.14)$$

A sequence $\{(\eta_k,\xi_k): 1\le k\le N\}$ can be simulated, and each $X(t_k)$, $1\le k\le N$, can be obtained by (8.9).
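Case 1 above can be sketched in Python (function names ours; zero initial conditions, as in (8.8)): each entry of the per-step covariance (8.11) is the elementary integral $\int_{t_{k-1}}^{t_k} e^{-as}\,ds$, and each correlated pair $(\eta_k,\xi_k)$ is drawn via the $2\times 2$ Cholesky factor of that matrix.

```python
import math
import random

def step_cov(lam1, lam2, t0, t1):
    # Covariance (8.11) of (eta_k, xi_k) over [t0, t1]: case lam1 != lam2,
    # both nonzero, lam1 + lam2 != 0. Each entry is int_{t0}^{t1} e^{-a*s} ds.
    def J(a):
        return (math.exp(-a * t0) - math.exp(-a * t1)) / a
    return J(2 * lam2), J(lam1 + lam2), J(2 * lam1)  # Var(eta), Cov, Var(xi)

def car2_exact_distinct(lam1, lam2, ts, rng):
    # Exact simulation of X(t_k) by (8.9), accumulating the sums of eta_i, xi_i;
    # ts is an increasing grid starting at 0, rng a random.Random instance.
    s_eta = s_xi = 0.0
    xs = [0.0]  # X(0) = 0, as in (8.8)
    for t0, t1 in zip(ts[:-1], ts[1:]):
        ve, c, vx = step_cov(lam1, lam2, t0, t1)
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        eta = math.sqrt(ve) * z1
        # second row of the Cholesky factor of [[ve, c], [c, vx]]
        xi = (c / math.sqrt(ve)) * z1 + math.sqrt(max(vx - c * c / ve, 0.0)) * z2
        s_eta += eta
        s_xi += xi
        xs.append((math.exp(lam2 * t1) * s_eta - math.exp(lam1 * t1) * s_xi)
                  / (lam2 - lam1))
    return xs
```

The closed-form entries of step_cov can be checked against a numerical quadrature of the defining integrals, which is a useful guard when adapting the formulas to the other eigenvalue cases.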
$\lambda_1 = \lambda_2 = \lambda$, real and repeated, nonzero

When $\lambda_1 = \lambda_2 = \lambda \neq 0$, the solution of SDE (1.1) is

$$X(t) = \int_0^t e^{\lambda(t-s)}(t-s)\,dW(s) = te^{\lambda t}\int_0^t e^{-\lambda s}\,dW(s) - e^{\lambda t}\int_0^t se^{-\lambda s}\,dW(s). \qquad (8.15)$$

For $t = t_k$,

$$X(t_k) = t_ke^{\lambda t_k}\int_0^{t_k} e^{-\lambda s}\,dW(s) - e^{\lambda t_k}\int_0^{t_k} se^{-\lambda s}\,dW(s) = t_ke^{\lambda t_k}\sum_{i=1}^{k}\eta_i - e^{\lambda t_k}\sum_{i=1}^{k}\xi_i, \qquad (8.16)$$

where $\eta_i = \int_{t_{i-1}}^{t_i} e^{-\lambda s}\,dW(s)$ and $\xi_i = \int_{t_{i-1}}^{t_i} se^{-\lambda s}\,dW(s)$. The covariance matrix of $(\eta_k,\xi_k)$ is

$$\Sigma = \begin{pmatrix} \sigma_{11} & \sigma_{12} \\ \sigma_{21} & \sigma_{22} \end{pmatrix}, \qquad (8.17)$$

where

$$\sigma_{11} = \frac{e^{-2\lambda t_{k-1}}-e^{-2\lambda t_k}}{2\lambda},$$
$$\sigma_{12} = \sigma_{21} = \frac{t_{k-1}e^{-2\lambda t_{k-1}}-t_ke^{-2\lambda t_k}}{2\lambda} + \frac{e^{-2\lambda t_{k-1}}-e^{-2\lambda t_k}}{4\lambda^2},$$
$$\sigma_{22} = \frac{t_{k-1}^2e^{-2\lambda t_{k-1}}-t_k^2e^{-2\lambda t_k}}{2\lambda} + \frac{t_{k-1}e^{-2\lambda t_{k-1}}-t_ke^{-2\lambda t_k}}{2\lambda^2} + \frac{e^{-2\lambda t_{k-1}}-e^{-2\lambda t_k}}{4\lambda^3}. \qquad (8.18)$$

$\lambda_1$ and $\lambda_2$ complex

When $\lambda_1 = \alpha+\beta i$ and $\lambda_2 = \alpha-\beta i$ with $\alpha \neq 0$, $\beta \neq 0$, the solution of SDE (1.1) is

$$X(t) = \frac{e^{\alpha t}}{\beta}\int_0^t e^{-\alpha s}\sin\beta(t-s)\,dW(s) = \frac{e^{\alpha t}\sin(\beta t)}{\beta}\int_0^t e^{-\alpha s}\cos(\beta s)\,dW(s) - \frac{e^{\alpha t}\cos(\beta t)}{\beta}\int_0^t e^{-\alpha s}\sin(\beta s)\,dW(s). \qquad (8.19)$$

For $t = t_k$,

$$X(t_k) = \frac{e^{\alpha t_k}}{\beta}\left[\sin(\beta t_k)\sum_{i=1}^{k}\int_{t_{i-1}}^{t_i} e^{-\alpha s}\cos(\beta s)\,dW(s) - \cos(\beta t_k)\sum_{i=1}^{k}\int_{t_{i-1}}^{t_i} e^{-\alpha s}\sin(\beta s)\,dW(s)\right] = \frac{e^{\alpha t_k}}{\beta}\left[\sin(\beta t_k)\sum_{i=1}^{k}\eta_i - \cos(\beta t_k)\sum_{i=1}^{k}\xi_i\right]. \qquad (8.20)$$

The covariance matrix of $(\eta_k,\xi_k)$ is

$$\begin{pmatrix} f_{11}(t_k)-f_{11}(t_{k-1}) & f_{12}(t_k)-f_{12}(t_{k-1}) \\ f_{12}(t_k)-f_{12}(t_{k-1}) & f_{22}(t_k)-f_{22}(t_{k-1}) \end{pmatrix}, \qquad (8.21)$$

where, with the phase $\theta$ defined by $\cos\theta = \alpha/\sqrt{\alpha^2+\beta^2}$ and $\sin\theta = \beta/\sqrt{\alpha^2+\beta^2}$,

$$f_{11}(t) = -e^{-2\alpha t}\left[\frac{1}{4\alpha} + \frac{\cos(2\beta t+\theta)}{4\sqrt{\alpha^2+\beta^2}}\right], \qquad (8.22)$$
$$f_{12}(t) = -e^{-2\alpha t}\,\frac{\sin(2\beta t+\theta)}{4\sqrt{\alpha^2+\beta^2}}, \qquad (8.23)$$
$$f_{22}(t) = -e^{-2\alpha t}\left[\frac{1}{4\alpha} - \frac{\cos(2\beta t+\theta)}{4\sqrt{\alpha^2+\beta^2}}\right]. \qquad (8.24)$$

$\lambda_1$ and $\lambda_2$ pure imaginary

For $t = t_k$,

$$X(t_k) = \frac{1}{\beta}\left[\sin(\beta t_k)\sum_{i=1}^{k}\eta_i - \cos(\beta t_k)\sum_{i=1}^{k}\xi_i\right], \qquad (8.25)$$

where $\eta_i = \int_{t_{i-1}}^{t_i}\cos(\beta s)\,dW(s)$ and $\xi_i = \int_{t_{i-1}}^{t_i}\sin(\beta s)\,dW(s)$.
The covariance matrix of $(\eta_k,\xi_k)$ is

$$\begin{pmatrix} \dfrac{t_k-t_{k-1}}{2} + \dfrac{\sin(2\beta t_k)-\sin(2\beta t_{k-1})}{4\beta} & \dfrac{\cos(2\beta t_{k-1})-\cos(2\beta t_k)}{4\beta} \\ \dfrac{\cos(2\beta t_{k-1})-\cos(2\beta t_k)}{4\beta} & \dfrac{t_k-t_{k-1}}{2} - \dfrac{\sin(2\beta t_k)-\sin(2\beta t_{k-1})}{4\beta} \end{pmatrix}. \qquad (8.26)$$

$\lambda_1 = \lambda_2 = 0$, double zero

In this case,

$$X(t) = tW(t) - \int_0^t s\,dW(s), \qquad (8.27)$$

and for $t = t_k$,

$$X(t_k) = X(t_{k-1}) + t_kW(t_k) - t_{k-1}W(t_{k-1}) - \int_{t_{k-1}}^{t_k} s\,dW(s). \qquad (8.28)$$

Let $\eta_k = W(t_k)-W(t_{k-1})$ and $\xi_k = \int_{t_{k-1}}^{t_k} s\,dW(s)$; the pair $(\eta_k,\xi_k)$ is bivariate normal with zero mean and covariance matrix

$$\begin{pmatrix} t_k-t_{k-1} & \dfrac{t_k^2-t_{k-1}^2}{2} \\ \dfrac{t_k^2-t_{k-1}^2}{2} & \dfrac{t_k^3-t_{k-1}^3}{3} \end{pmatrix}. \qquad (8.29)$$

Moreover, $(\eta_k,\xi_k)$ is independent of $(\eta_j,\xi_j)$ if $k \neq j$.

8.2.3 Comparison of Euler Scheme and Exact Discretization

The Euler scheme requires generating one normal random variable per step; exact discretization requires a bivariate normal random vector per step, making it more costly than the Euler scheme. As in the simulation of the Ornstein-Uhlenbeck process, exact discretization allows any time step and makes no approximation. If one is interested in the value of $X(t)$ only at a few sparse time points, exact discretization costs far less than the Euler scheme to reach a good precision. But it loses this advantage in the degenerate case, because only $\dot X(t)$ can be simulated exactly; to obtain $X(t)$, numerical integration needs to be carried out, and a small approximation error usually requires a large number of integrand evaluations.

Bibliography

[1] Y. Ait-Sahalia, Maximum likelihood estimation of discretely sampled diffusions: a closed form approximation approach, Econometrica 70 (2002), 223-262.

[2] Y. Ait-Sahalia, L. P. Hansen, and J. Scheinkman, Operator methods for continuous-time Markov processes, Handbook of Financial Econometrics (2001).

[3] D. Alimov, Asymptotically ergodic Markov functionals of an ergodic process, Theory of Probability and its Applications (1994).

[4] D. Alimov, Markov functionals of ergodic Markov process, Theory of Probability and its Applications (1994).

[5] T. W.
Anderson, On asymptotic distributions of estimates of parameters of stochastic difference equations, Annals of Mathematical Statistics 30 (1959), 676-687.

[6] F. M. Bandi and P. C. B. Phillips, Nonstationary continuous-time models, Handbook of Financial Econometrics, Elsevier Science, forthcoming (2008).

[7] G. K. Basak and P. Lee, Asymptotic properties of an estimator of the drift coefficients of multidimensional Ornstein-Uhlenbeck process that are not necessarily stable, Electronic Journal of Statistics 2 (2008), 1309-1344.

[8] I. V. Basawa and P. Rao, Asymptotic inference for stochastic processes, Stochastic Processes and their Applications 10 (1980), 221-254.

[9] Ishwar V. Basawa and David John Scott, Asymptotic optimal inference for non-ergodic models, Lecture Notes in Statistics, vol. 17, Springer-Verlag, New York, 1983.

[10] A. R. Bergstrom, The history of continuous-time econometric models, Econometric Theory 4 (1988), no. 3, 365-383.

[11] A. R. Bergstrom, The estimation of parameters in nonstationary higher-order continuous time dynamic models, Econometric Theory 1 (1985), 369-385.

[12] P. Billingsley, Ergodic theory and information, John Wiley & Sons Inc., New York, 1965.

[13] George D. Birkhoff, Proof of the ergodic theorem, Proceedings of the National Academy of Sciences of the United States of America 17 (1931), 656-660.

[14] Nicolas Bouleau and Dominique Lépingle, Numerical methods for stochastic processes, Wiley-Interscience, 1993.

[15] B. M. Brown and J. I. Hewitt, Asymptotic likelihood theory for diffusion processes, Journal of Applied Probability 12 (1975), 228-238.

[16] N. H. Chan and C. Z. Wei, Limiting distributions of least squares estimates of unstable autoregressive processes, The Annals of Statistics 16 (1988), 367-401.

[17] A. S. Dalalyan and Yu. A. Kutoyants, Asymptotically efficient trend coefficient estimation for ergodic diffusion, Mathematical Methods of Statistics 11 (2002), no. 4, 402-427.

[18] Monroe D.
Donsker, An invariance principle for certain probability limit theorems, Memoirs of the American Mathematical Society 1951 (1951), no. 6, 12.

[19] P. D. Feigin, Maximum likelihood estimation for continuous-time stochastic processes, Advances in Applied Probability 8 (1976), 712-736.

[20] D. Graupe, V. K. Jain, and J. Salahi, A comparative analysis of various least-squares identification algorithms, Automatica. The Journal of IFAC. The International Federation of Automatic Control 16 (1980), no. 6, 663-681.

[21] Robert M. Gray, Probability, random processes, and ergodic properties, second ed., Springer, Dordrecht, 2009.

[22] A. C. Harvey and J. H. Stock, The estimation of high-order autoregressive models, Econometric Theory 1 (1985), 97-117.

[23] C. C. Heyde, On the central limit theorem for stationary processes, Z. Wahrscheinlichkeitstheorie und Verw. Gebiete 30 (1974), 315-320.

[24] J. Hull, Options, futures and other derivatives, Prentice Hall, Upper Saddle River, NJ, 2003.

[25] I. A. Ibragimov and R. Z. Khasminskii, Statistical estimation: Asymptotic theory, Applications of Mathematics, vol. 16, Springer-Verlag, New York, 1981. Translated from the Russian by Samuel Kotz.

[26] Jean Jacod and Albert N. Shiryaev, Limit theorems for stochastic processes, second ed., Grundlehren der Mathematischen Wissenschaften [Fundamental Principles of Mathematical Sciences], vol. 288, Springer-Verlag, Berlin, 2003.

[27] P. Jeganathan, Some aspects of asymptotic theory with applications to time series models, Econometric Theory 11 (1995), 818-887.

[28] S. Johansen and M. Nielsen, Likelihood inference for a nonstationary fractional autoregressive model, (2007), no. 07-27.

[29] Ioannis Karatzas and Steven E. Shreve, Brownian motion and stochastic calculus, Graduate Texts in Mathematics, vol. 113, Springer-Verlag, New York, 1988.

[30] R.
Khasminskii, Limit distributions of some integral functionals for null-recurrent diffusions, Stochastic Processes and their Applications 92 (2001), no. 1, 1-9.

[31] R. Z. Khas'minskii, Ergodic properties of recurrent diffusion processes and stabilization of the solution to the Cauchy problem for parabolic equations, Theory of Probability and its Applications (1960).

[32] Y. A. Kutoyants, Statistical inference for ergodic diffusion processes, Springer, 2004.

[33] N. Lin and S. V. Lototsky, Undamped harmonic oscillator driven by additive Gaussian white noise: a statistical analysis, Communications on Stochastic Analysis 5 (2011), no. 1, 233-250.

[34] R. S. Liptser and A. N. Shiryaev, Statistics of random processes, Springer, 2001.

[35] H. Luschgy, Local asymptotic mixed normality for semimartingale experiments, Probability Theory and Related Fields 92 (1992), 151-176.

[36] H. B. Mann and A. Wald, On the statistical treatment of linear stochastic difference equations, Econometrica 11 (1943), 173-220.

[37] G. N. Milstein, Numerical integration of stochastic differential equations, Kluwer Academic Publishers, The Netherlands, 1995.

[38] João Nicolau, Modeling financial time series through second-order stochastic differential equations, Statistics & Probability Letters 78 (2008), 2700-2704.

[39] Bernt Øksendal, Stochastic differential equations: An introduction with applications, sixth ed., Universitext, Springer-Verlag, Berlin, 2003.

[40] Bahram Pesaran and M. Hashem Pesaran, Time series econometrics using Microfit 5.0, Oxford University Press, 2009.

[41] K. Petersen, Ergodic theory, Cambridge University Press, Cambridge, 1983.

[42] P. C. B. Phillips, Towards a unified asymptotic theory for autoregression, Biometrika 74 (1987), no. 3, 535-547.

[43] P. C. B. Phillips, Weak convergence to the matrix stochastic integral $\int_0^1 B\,dB'$, Journal of Multivariate Analysis 24 (1988), no. 2, 252-265.

[44] P. C. B.
Phillips, The problem of identification in finite parameter continuous-time models, The Journal of Econometrics 4 (1973), 351-362.

[45] B. M. Pötscher, Model selection under nonstationarity: autoregressive models and stochastic linear regression models, The Annals of Statistics 17 (1989), no. 3, 1257-1274.

[46] M. M. Rao, Asymptotic distribution of an estimator of the boundary parameter of an unstable process, The Annals of Statistics 6 (1978), no. 1, 185-190.

[47] George G. Roussas and Debasis Bhattacharya, Revisiting local asymptotic normality (LAN) and passing on to local asymptotic mixed normality (LAMN) and local asymptotic quadratic (LAQ) experiments, Advances in directional and linear statistics, Physica-Verlag/Springer, Heidelberg, 2011, pp. 253-280.

[48] Paul Shields, Book Review: Mathematical theory of entropy // Book Review: Topics in ergodic theory // Book Review: An introduction to ergodic theory, Bulletin of the American Mathematical Society 9 (1983), 259-265.

[49] B. E. Sorensen, Continuous record asymptotics in systems of stochastic differential equations, Econometric Theory 8 (1992), 28-51.

[50] G. E. Uhlenbeck and L. S. Ornstein, On the theory of Brownian motion, Physical Review 36 (1930), 823-841.

[51] Oldrich Vasicek, An equilibrium characterisation of the term structure, Journal of Financial Economics 5 (1977), 177-188.

[52] J. S. White, The limiting distribution of the serial correlation coefficient in the explosive case, Annals of Mathematical Statistics 29 (1958), no. 4.

[53] Nakahiro Yoshida, On asymptotic mixed normality of the maximum likelihood estimator in a multidimensional diffusion process, Statistical theory and data analysis, II (Tokyo, 1986), North-Holland, Amsterdam, 1988, pp. 559-566.
Abstract
While consistency of the maximum likelihood estimator of the unknown parameters in the second-order linear stochastic differential equation driven by Gaussian white noise holds under rather general conditions, little is known about the rate of convergence and the limiting distribution of the estimator, especially when the underlying process is not ergodic. The objective of this dissertation is to identify and investigate all possible types of asymptotic behavior for the maximum likelihood estimators. The emphasis is on the non-ergodic case, when the roots of the corresponding characteristic equation are not both in the left half-plane.