COVARIANCE MODELING AND ESTIMATION FOR DISTRIBUTED PARAMETER SYSTEMS AND THEIR APPROXIMATIONS

by Yan Yang

A Dissertation Presented to the FACULTY OF THE GRADUATE SCHOOL, UNIVERSITY OF SOUTHERN CALIFORNIA, in Partial Fulfillment of the Requirements for the Degree DOCTOR OF PHILOSOPHY (APPLIED MATHEMATICS)

December 2008

Copyright 2008 Yan Yang

Dedication

This dissertation is dedicated to my parents.

Acknowledgements

I would like to thank my advisor, Dr. Chunming Wang, for his great help. I would also like to thank Dr. Mark Moldwin, David Galvan, Dr. Byron Iijima and Dr. Tony Mannucci for their help on the application part of my dissertation.

Table of Contents

Dedication
Acknowledgements
List of Figures
Abstract
Chapter 1: Introduction
Chapter 2: Background
  2.1 An Example
  2.2 Classical Estimation Results
Chapter 3: Application of Data Assimilation to Space Weather
  3.1 Ionospheric Data Assimilation
    3.1.1 The Earth's Ionosphere
    3.1.2 GAIM and Its Measurements
    3.1.3 Implementation of Kalman Filtering in GAIM
  3.2 3-D Tomography of the Plasmasphere
    3.2.1 The Earth's Plasmasphere
    3.2.2 Measurements
  3.3 A Specific Issue
Chapter 4: Mathematical Formulation
  4.1 Modeling of a Random State Function
    4.1.1 Introduction to Random Fields
    4.1.2 Random State Functions
    4.1.3 Averaged Random States
    4.1.4 Finite Series Approximation
  4.2 Forward Model of Linear Observation Operators
  4.3 Covariance Properties of Approximation States
  4.4 Covariance Properties of the Representation Error
  4.5 A Special Case
  4.6 Minimum Variance Estimation
  4.7 Validation
Chapter 5: Bayesian Approach
  5.1 Parameterized Model of Uncertainties
  5.2 Bayesian Approach
  5.3 Validation of the Bayesian Approach
Chapter 6: Application to 3-D Tomography of the Plasmasphere
  6.1 Further Discussion of 3-D Tomography
  6.2 Parametrization
  6.3 Least Squares Retrieval
  6.4 Minimum Variance Retrieval
Chapter 7: Conclusion
References

List of Figures

2.1 A thin rod with total mass $v$
3.1 Typical vertical distribution of ion densities in the Earth's ionosphere (source: www.windows.ucar.edu)
3.2 A typical low-resolution grid for GAIM
3.3 The plasmasphere
3.4 LEO and GPS satellites
3.5 An ionospheric model with two cells
4.1 The partition of $D$ and 15 line operators
4.2 True system state
4.3 True average system state
4.4 Estimator of average system with scaled state covariance
4.5 Estimator of average system with non-scaled state covariance
4.6 The histogram of errors with scaled state covariance for 1000 estimations
4.7 The histogram of errors with non-scaled state covariance for 1000 estimations
5.1 $L=10$ and true $k$ is 4
5.2 $L=30$ and true $k$ is 1
5.3 Examples of randomly placed lines across domain $D$
5.4 $P_k^{14}$ vs. $k$
6.1 20 minutes of data from 1 LEO satellite
6.2 Comparison of observed TEC with TEC from model 1
6.3 Comparison of observed TEC with TEC from model 2
6.4 Least squares retrieval
6.5 Singular values of the observation matrix $W$
6.6 Modified least squares retrieval
6.7 Residual before retrieval, mean = 1.0, variance = 1.7
6.8 Residual after retrieval, mean = 0.2, variance = 0.9
6.9 Residual after minimum variance retrieval, mean = 0.1, variance = 0.5
Abstract

In this dissertation, we first establish a mathematical framework for the approximation of a filtering problem for infinite dimensional systems with random state. In particular, we define random states for infinite dimensional systems. We also establish statistical properties of finite dimensional approximations and give a rigorous treatment of the contribution to the observation error due to approximation. Numerical simulations were performed to validate the results. Secondly, a type of Bayesian approach is developed to select the most appropriate covariance matrix among a finite number of candidates. Monte Carlo simulations were conducted to validate this approach. Finally, we apply the results to the 3-D estimation problem for plasmasphere electron density.

Chapter 1: Introduction

The theory of minimum variance estimation, including the Kalman filter technique, was developed in the middle of the 20th century ([8]). The theory quickly gained wide popularity in many engineering applications. In addition to providing a computationally efficient method to produce an estimate of the unknown state of a dynamical system, the approach gives a mathematically rigorous prediction of the uncertainty of the estimate. However, this mathematically rigorous prediction is founded on the assumed knowledge of the different sources of uncertainty involved in the system model and the models of observations.
With the availability of high performance computing capability, minimum variance estimation techniques are used in many new scientific research and engineering applications, including medical imaging ([14]), numerical weather prediction ([10], [12]) and other data assimilation applications in the geophysical sciences ([6]).

One major difference between the classical engineering applications of the minimum variance estimation technique and the new applications is the fact that the underlying systems for many of the new applications are inherently infinite dimensional. Any application of the estimation technique requires discretization of the system. The modeling of system uncertainties must therefore also include the uncertainties introduced by the discretization. The latter uncertainty is vaguely referred to as representation error in the literature. It is commonly known in the medical imaging and geophysical data assimilation communities that the success of the minimum variance estimation technique often relies on an "artful" choice of the uncertainty model, represented by the different error covariance matrices that one has to use in an implementation of the Kalman filter. Many empirical approaches for modeling uncertainties have been developed for each specific field of application ([7]). In particular, it is widely known that there is a need to substantially augment the uncertainties associated with measurements to account for the contribution of representation error.

The ad hoc nature of the treatment of system uncertainty in estimation problems involving fields defined on a continuum reduces our confidence in the resulting estimates. In fact, when the Kalman filter technique was used in the development of a Global Assimilative Ionospheric Model (GAIM) at the University of Southern California in collaboration with the Jet Propulsion Laboratory (JPL), it was found that the estimate of the ionospheric electron density can sometimes depend significantly on the model resolution. In this thesis, we shall use three dimensional tomography of the plasmasphere as a concrete example to illustrate the issues involved in uncertainty modeling and to demonstrate the effectiveness of the proposed solutions.

Our research intends to develop a mathematical framework to investigate the errors introduced in the discretization of a system defined on a continuum. Based on this mathematical framework, we develop practical models of the uncertainty of the infinite dimensional system that can be used to derive appropriate statistical properties of the representation error and the uncertainties of the discretized system. We intend to demonstrate, both through theoretical derivation and through numerical simulation, that the framework provides consistency of the estimates across different discretizations of the system. By removing the ad hoc nature of the uncertainty modeling for the discretized system, we can focus our attention on the truly challenging problem of selecting the correct error covariance matrices. The fundamental issue is how we can recognize that the error covariance matrices used in the filtering are inconsistent with the system behavior. A candidate approach is explored through theoretical analysis as well as numerical simulation. Our aim is to develop a practical method to fine-tune error covariance matrices by examining the consistency between the predicted variability of the estimate and the empirically computed variability.

An outline of the thesis is as follows. In Chapter 2, we first use a simple example to illustrate the basic issues involved in a system estimation problem.
Then we review the basic theories of minimum variance estimation. In Chapter 3, the formulation of the ionospheric data assimilation problem as a finite dimensional system estimation problem is presented. We also discuss several issues specific to the application of minimum variance estimation theory to the estimation of electron density in the ionosphere. In Chapter 4, we present a mathematical formulation of the representation error and establish the link between the uncertainty of the state variables and the representation error. In Chapter 5, we first provide a possible way to model the covariance matrix of the elementary state vector. The proposed model depends on a small number of parameters with finitely many choices. A Bayesian approach for the determination of the most likely candidate for the covariance matrix among a finite number of selections is also presented in Chapter 5. In Chapter 6, we apply the results from the previous chapters to the 3-D inversion problem for the plasmasphere.

Chapter 2: Background

The objective of a system estimation problem is to derive appropriate values for the variables that represent the state of the system from available observations that often contain measurement errors. A basic relationship between the observations and the state variables is referred to as the forward model of the observation system. In this thesis, we consider only linear forward models. That is, we assume that the relationship between the available observations and the state variables is represented by a linear equation in which an additive error term is included. The quality of an estimate is represented by the smallness of the difference between the true values of the state variables and the estimated values. However, this is an impractical measure of performance because it requires knowledge of the true values of the state variables. Instead, we must use various proxy performance measures. One often used proxy performance measure is the residual error, that is, the difference between the actual value of the observation and the forward-model-predicted observation when the state variables take the estimated values. Another often used proxy performance measure is the expected value of the square of the difference between the estimated values and the true values of the state variables. This is referred to as minimum variance estimation. In this second approach, we must assume that certain statistical properties of the measurement and modeling errors are known. When system estimation theories that were designed for finite dimensional systems are applied to an infinite dimensional distributed parameter system, additional issues connected with the approximation of the infinite dimensional system by a finite dimensional system are introduced. In this chapter, we first consider an extremely simple example to illustrate some of the issues related to system estimation problems for an infinite dimensional system; then we review the basic results of system estimation theory.

2.1 An Example

In this example, we consider the problem of estimating the mass density distribution of a very thin rod. We approximate the rod by a one-dimensional line segment with non-uniform density $\eta(x)$ for $x\in[0,1]$, as shown in Figure 2.1. We assume that the total mass $v$ is measured with a small error $\varepsilon$.

Figure 2.1: A thin rod with total mass $v$

The basic forward model of the observation system is given by
\[
v = \int_0^1 \eta(x)\,dx + \varepsilon. \tag{2.1}
\]
It is clear that we cannot estimate the mass density at each point $x$, since this would require us to specify an infinite number of values of the mass density.
Therefore, we must approximate the forward model (2.1) by a finite dimensional model. One straightforward way to discretize the problem is to divide the interval $[0,1]$ into several subintervals and to estimate the average mass density on each subinterval. The simplest case is to estimate the average $\eta_1$ over the whole interval $[0,1]$. We consider the case in which $\eta(x)$ has mean $\eta_0$ for all $x$ and $\varepsilon$ has mean zero. For simplicity, we also assume that $\eta$ and $\varepsilon$ are independent. A discrete model is thus given by
\[
v = \eta_1 + \varepsilon. \tag{2.2}
\]
The statistical properties are given as follows: at each point $x$ the value of $\eta(x)$ deviates from $\eta_0$ with a standard deviation of $\sigma_\eta$, and the standard deviation of the measurement uncertainty in $v$ is equal to $\sigma_\varepsilon$. We would like to find an estimator of the form $\hat\eta_1 = \eta_0 + k(v-\eta_0)$ which minimizes
\[
E\|\hat\eta_1 - \eta_1\|^2. \tag{2.3}
\]
By using the properties of the random variables $\eta_1-\eta_0$ and $\varepsilon$, we obtain
\begin{align*}
E\|\hat\eta_1-\eta_1\|^2 &= E\|\eta_0 + k(v-\eta_0) - \eta_1\|^2 = E\|(k-1)(\eta_1-\eta_0) + k\varepsilon\|^2\\
&= (k-1)^2 E\big((\eta_1-\eta_0)^2\big) + k^2 E(\varepsilon^2) = (k-1)^2\sigma_\eta^2 + k^2\sigma_\varepsilon^2.
\end{align*}
When $k = \sigma_\eta^2/(\sigma_\eta^2+\sigma_\varepsilon^2)$, the variance of the estimation error is minimized. As a result, the optimal solution is given by
\[
\hat\eta_1 = \eta_0 + (v-\eta_0)\,\frac{\sigma_\eta^2}{\sigma_\eta^2+\sigma_\varepsilon^2}. \tag{2.4}
\]
It is easy to see that if $\sigma_\eta$ is very small compared to $\sigma_\varepsilon$, the change to the initial estimate derived from the observation $v$ is also going to be very small. The low value of $\sigma_\eta$ indicates that we have very strong confidence that $\eta_0$ is a good estimate of the average density. On the other hand, if $\sigma_\eta$ is very large relative to $\sigma_\varepsilon$, the resulting estimate is going to be very close to $v$, because $\sigma_\eta^2/(\sigma_\eta^2+\sigma_\varepsilon^2)\approx 1$. A small value of $\sigma_\varepsilon$ indicates that we believe the measurement is very accurate. This illustrates the importance of knowing accurately the statistical properties of the uncertainties.

We can also divide the rod into two subintervals of equal length and estimate the average densities $\eta_{21}$ and $\eta_{22}$ of each piece, respectively. A discretized model is given by
\[
v = \frac{\eta_{21}}{2} + \frac{\eta_{22}}{2} + \varepsilon. \tag{2.5}
\]
If we keep the standard deviation of the uncertainties of $\eta_{21}$ and $\eta_{22}$ equal to $\sigma_\eta$, we can derive an optimal estimate of the form
\[
\begin{pmatrix}\hat\eta_{21}\\ \hat\eta_{22}\end{pmatrix}
=\begin{pmatrix}\eta_0\\ \eta_0\end{pmatrix}
+\begin{pmatrix}k_1(v-\eta_0)\\ k_2(v-\eta_0)\end{pmatrix}
\]
by minimizing
\[
E\left\|\begin{pmatrix}\hat\eta_{21}\\ \hat\eta_{22}\end{pmatrix}-\begin{pmatrix}\eta_{21}\\ \eta_{22}\end{pmatrix}\right\|^2.
\]
Using the statistical properties of $\eta_{21}-\eta_0$, $\eta_{22}-\eta_0$ and $\varepsilon$, we obtain
\begin{align*}
E\left\|\begin{pmatrix}\hat\eta_{21}\\ \hat\eta_{22}\end{pmatrix}-\begin{pmatrix}\eta_{21}\\ \eta_{22}\end{pmatrix}\right\|^2
&= E(\hat\eta_{21}-\eta_{21})^2 + E(\hat\eta_{22}-\eta_{22})^2\\
&= E\Big[\eta_0+k_1\Big(\tfrac{\eta_{21}}{2}+\tfrac{\eta_{22}}{2}+\varepsilon-\eta_0\Big)-\eta_{21}\Big]^2
 + E\Big[\eta_0+k_2\Big(\tfrac{\eta_{21}}{2}+\tfrac{\eta_{22}}{2}+\varepsilon-\eta_0\Big)-\eta_{22}\Big]^2\\
&= \Big(\tfrac{k_1}{2}-1\Big)^2 E(\eta_{21}-\eta_0)^2 + \Big(\tfrac{k_1}{2}\Big)^2 E(\eta_{22}-\eta_0)^2 + k_1^2 E(\varepsilon^2)\\
&\quad+ \Big(\tfrac{k_2}{2}\Big)^2 E(\eta_{21}-\eta_0)^2 + \Big(\tfrac{k_2}{2}-1\Big)^2 E(\eta_{22}-\eta_0)^2 + k_2^2 E(\varepsilon^2)\\
&= \Big(\tfrac{k_1}{2}-1\Big)^2\sigma_\eta^2 + \Big(\tfrac{k_1}{2}\Big)^2\sigma_\eta^2 + k_1^2\sigma_\varepsilon^2
 + \Big(\tfrac{k_2}{2}\Big)^2\sigma_\eta^2 + \Big(\tfrac{k_2}{2}-1\Big)^2\sigma_\eta^2 + k_2^2\sigma_\varepsilon^2.
\end{align*}
When the parameters are given by
\[
k_1 = k_2 = \frac{\sigma_\eta^2}{\sigma_\eta^2+2\sigma_\varepsilon^2},
\]
the expected value of the square of the errors in the estimation is minimized. The optimal estimates are given by
\[
\begin{pmatrix}\hat\eta_{21}\\ \hat\eta_{22}\end{pmatrix}
=\begin{pmatrix}\eta_0\\ \eta_0\end{pmatrix}
+\frac{\sigma_\eta^2}{\sigma_\eta^2+2\sigma_\varepsilon^2}
\begin{pmatrix}v-\eta_0\\ v-\eta_0\end{pmatrix}.
\]
It is interesting to note that the total estimated mass of the rod becomes
\[
\hat\eta = \eta_0 + \frac{\sigma_\eta^2}{\sigma_\eta^2+2\sigma_\varepsilon^2}(v-\eta_0).
\]
When, for example, $\sigma_\eta=\sigma_\varepsilon=1$, this new estimate of the total mass is substantially different from the optimal estimate in (2.4) obtained without subdividing the rod. It is obvious that, with only an observation of the total mass, it is impossible to resolve the mass difference between the two portions of the rod. Nevertheless, we still wish to obtain a consistent estimate of the total mass for any discretization of the problem.
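The inconsistency just described is easy to check numerically. The following Python sketch (our own illustration; the numerical values of $\eta_0$, $v$, $\sigma_\eta$, $\sigma_\varepsilon$ are arbitrary assumptions) evaluates the one-cell estimate (2.4), the two-cell estimate with the same per-cell variance, and the rescaled two-cell variant with $E(\eta_{2i}-\eta_0)^2=2\sigma_\eta^2$, which restores consistency.

```python
import numpy as np

# Illustrative parameters (assumed values, not from the dissertation)
eta0, v = 1.0, 1.4                  # prior mean density and observed total mass
sigma_eta, sigma_eps = 1.0, 1.0

# One-cell estimate of the total mass, equation (2.4)
k1 = sigma_eta**2 / (sigma_eta**2 + sigma_eps**2)
mass_one_cell = eta0 + k1 * (v - eta0)

# Two-cell estimate with the same per-cell variance sigma_eta^2
k2 = sigma_eta**2 / (sigma_eta**2 + 2 * sigma_eps**2)
mass_two_cells = eta0 + k2 * (v - eta0)

# Two-cell estimate with the rescaled variance 2*sigma_eta^2,
# which makes the total-mass estimate agree with the one-cell answer
k2s = 2 * sigma_eta**2 / (2 * sigma_eta**2 + 2 * sigma_eps**2)
mass_two_cells_scaled = eta0 + k2s * (v - eta0)

print(mass_one_cell, mass_two_cells, mass_two_cells_scaled)
# 1.2, ~1.133, 1.2 -- the unscaled two-cell model changes the total-mass estimate
```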
In fact, in many practical applications, it is often difficult to ensure that the totality of the observations spans the full dimension of the space of state variables. In our case, it is easy to see that if we had assumed $E(\eta_{21}-\eta_0)^2 = E(\eta_{22}-\eta_0)^2 = 2\sigma_\eta^2$, the new optimal estimate of the total mass would be the same as in the undivided case. Why is this change of the assumed statistical properties necessary? How should this type of change be made in more complex problems? These are some of the questions we would like to investigate in this research project. It is crucial for us to develop a mathematically rigorous framework to address these issues. In the remainder of this chapter, we present a summary of the basic theories of minimum variance estimation, following ([11]).

2.2 Classical Estimation Results

Suppose we have observation data consisting of $m$ real numbers, and we know that they are obtained from a linear function of $n$ unknown parameters. We can arrange the observations as the components of an $m$-dimensional vector $Y$ and the unknown parameters as the components of an $n$-dimensional vector $\beta$. Thus $Y = W\beta + \varepsilon$, where $W$ is an $m\times n$ matrix determined by the particular experiment or physical situation. We assume that $W$ is known and that $\varepsilon$ is the error term due to measurement error or to the model itself. If we choose the sum of squares of the residual error, $\|Y - W\beta\|^2$, as the performance measure, then we have the following optimal estimate of $\beta$.

(The Least-Squares Estimate) [11] Suppose the columns of $W$ are linearly independent. Then there exists a unique $n$-dimensional vector $\hat\beta$ which minimizes $\|Y-W\beta\|^2$ over all $\beta$ in $\mathbb{R}^n$; furthermore,
\[
\hat\beta = (W'W)^{-1}W'Y.
\]
When $W$ has linearly dependent columns, if we replace $(W'W)^{-1}$ by the generalized inverse of $W'W$, then $\hat\beta = (W'W)^{-1}W'Y$ still minimizes $\|Y-W\beta\|^2$.

If we assume $\varepsilon$ is a random variable, then $Y$ is also a random variable, and any linear estimate of $\beta$ of the form $\hat\beta = KY$ is also a random vector. It is possible to choose the minimization of the norm of the error, i.e. $E[\|\hat\beta-\beta\|^2]$, as our criterion for the optimality of the estimation performance. We must assume in this case that certain statistical properties of $\varepsilon$ are known. In particular, we assume that $E(\varepsilon)=0$ and $E(\varepsilon\varepsilon')=Q$. If we write the error explicitly, we obtain
\[
E[\|\hat\beta-\beta\|^2] = E[\|KY-\beta\|^2] = E[\|KW\beta + K\varepsilon - \beta\|^2]
= \|KW\beta-\beta\|^2 + \operatorname{Trace}(KQK').
\]
We observe that if $KW=I$, the norm of the error is independent of $\beta$ and $E(\hat\beta) = E(KW\beta + K\varepsilon) = KW\beta = \beta$, which indicates that $\hat\beta$ is unbiased. This observation suggests an alternative problem: find an unbiased linear estimator $\hat\beta=KY$ that minimizes $E[\|\hat\beta-\beta\|^2]$. We have the following classical result.

(Gauss-Markov) [11] If $W$ has linearly independent columns, then the linear minimum-variance unbiased estimator of $\beta$ based on $Y$ is
\[
\hat\beta = (W'Q^{-1}W)^{-1}W'Q^{-1}Y
\]
with corresponding error covariance
\[
E[(\hat\beta-\beta)(\hat\beta-\beta)'] = (W'Q^{-1}W)^{-1}.
\]
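The two estimators above are straightforward to compute; the following Python sketch (a minimal illustration on synthetic data of our own choosing, with an assumed diagonal $Q$) contrasts the least squares and Gauss-Markov formulas.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data for Y = W beta + eps (sizes and values assumed for illustration)
m, n = 50, 3
W = rng.normal(size=(m, n))
beta_true = np.array([1.0, -2.0, 0.5])
Q = np.diag(0.05 + rng.uniform(size=m))   # known error covariance E[eps eps']
Y = W @ beta_true + rng.multivariate_normal(np.zeros(m), Q)

# Least squares estimate: (W'W)^{-1} W'Y
beta_ls = np.linalg.solve(W.T @ W, W.T @ Y)

# Gauss-Markov estimate: (W'Q^{-1}W)^{-1} W'Q^{-1}Y, with error covariance P
Qinv = np.linalg.inv(Q)
P = np.linalg.inv(W.T @ Qinv @ W)
beta_gm = P @ (W.T @ Qinv @ Y)

print(beta_ls, beta_gm)  # both near beta_true; beta_gm has error covariance P
```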
In the preceding two methods, we assumed that no prior information concerning the values of the state vector $\beta$ is available. But in many situations we do have prior information, and we can regard $\beta$ as a random variable with known mean and covariance. As in the Gauss-Markov method, we again choose $E[\|\hat\beta-\beta\|^2]$ as the performance measure of the estimation scheme; then we have the following theorem.

(Minimum Variance Estimation) [11] Suppose $Y = W\beta + \varepsilon$, where $Y$ is a known $m$-dimensional data vector, $\beta$ is an $n$-dimensional random vector, $\varepsilon$ is an $m$-dimensional random error vector, $W$ is a known $m\times n$ constant matrix, and
\[
E(\varepsilon)=0,\quad E(\varepsilon\varepsilon')=Q,\quad E(\beta)=0,\quad E(\beta\beta')=R,\quad E(\varepsilon\beta')=0,
\]
and we assume that $R$ and $Q$ are positive-semidefinite matrices and that $WRW'+Q$ is nonsingular. Then the linear estimate $\hat\beta$ of $\beta$ minimizing $E[\|\hat\beta-\beta\|^2]$ is
\[
\hat\beta = RW'(WRW'+Q)^{-1}Y
\]
with error covariance
\[
E[(\beta-\hat\beta)(\beta-\hat\beta)'] = R - RW'(WRW'+Q)^{-1}WR.
\]

In the theorem above, $\beta$ and the error $\varepsilon$ are assumed to be uncorrelated. When these two random vectors are correlated, with covariance matrix $S=E(\beta\varepsilon')$, the minimum-variance estimate of $\beta$ based on $Y$ is
\[
\hat\beta = (RW'+S)(WRW'+WS+S'W'+Q)^{-1}Y.
\]

Consider the case in which an initial estimator $\hat\beta$ of a random vector $\beta$, with error covariance $E[(\beta-\hat\beta)(\beta-\hat\beta)'] = R$, is given. Subsequently, additional observations of the form $Y=W\beta+\varepsilon$ become available, where $\varepsilon$ is a random vector with zero mean, uncorrelated with both $\beta$ and the past observations. By applying the minimum variance method to the model
\[
Y - W\hat\beta = W(\beta-\hat\beta) + \varepsilon, \tag{2.6}
\]
the updated optimal estimate of $\beta$ is given by
\[
\hat{\hat\beta} = \hat\beta + RW'(WRW'+Q)^{-1}(Y-W\hat\beta)
\]
with error covariance
\[
E[(\beta-\hat{\hat\beta})(\beta-\hat{\hat\beta})'] = R - RW'(WRW'+Q)^{-1}WR.
\]

In many applications, we need to consider a dynamical system whose state is a sequence of random variables. The Kalman filtering method is a recursive method to estimate the state of a dynamic system from a series of incomplete and noisy measurements. An $n$-dimensional dynamic model of a random process consists of three parts:

1. A vector difference equation
\[
x_{k+1} = \Phi(k)x_k + u_k,\qquad k=0,1,2,\dots,
\]
where $x_k$ is an $n$-dimensional random state vector, $\Phi(k)$ is a known $n\times n$ matrix, and $u_k$ is an $n$-dimensional random vector with mean 0 and covariance $Q(k)$, uncorrelated with the previous $u_0,u_1,\dots$.

2. An initial random vector $x_0$ together with an initial estimator $\hat x_0$ having covariance $E[(\hat x_0-x_0)(\hat x_0-x_0)'] = P(0)$.

3. Measurements $v_k$ of the process, of the form
\[
v_k = M(k)x_k + \varepsilon_k,\qquad k=0,1,2,\dots,
\]
where $M(k)$ is a known $m\times n$ matrix and $\varepsilon_k$ is an $m$-dimensional random error with mean 0 and covariance $R(k)$. $\varepsilon_k$ is assumed to be uncorrelated with the previous $\varepsilon_0,\varepsilon_1,\dots,\varepsilon_{k-1}$. In addition, it is assumed that the random vectors $x_0$, $u_j$ and $\varepsilon_k$ are all mutually uncorrelated.

Our objective is to derive a linear minimum variance estimator of the state vector $x_k$ from the observation vectors $v_k,v_{k-1},\dots,v_0$. We introduce the notation $\hat x(k|j)$ to represent the estimate of $x_k$ given the observations $v$ up to time $j$. Under the above hypotheses, the classical result for the Kalman filtering problem is given by:

(Solution to the Recursive Problem - Kalman) [11] The optimal estimate $\hat x(k+1|k)$ may be generated recursively according to the equation
\[
\hat x(k+1|k) = \Phi(k)P(k)M'(k)\big[M(k)P(k)M'(k)+R(k)\big]^{-1}\big[v_k - M(k)\hat x(k|k-1)\big] + \Phi(k)\hat x(k|k-1),
\]
where the $n\times n$ matrix $P(k)$ is the error covariance matrix of $\hat x(k|k-1)$, recursively generated by
\begin{align*}
P(k+1) &= E\Big[\big(\hat x(k+1|k)-x(k+1)\big)\big(\hat x(k+1|k)-x(k+1)\big)'\Big]\\
&= \Phi(k)P(k)\Big\{I - M'(k)\big[M(k)P(k)M'(k)+R(k)\big]^{-1}M(k)P(k)\Big\}\Phi'(k) + Q(k).
\end{align*}
Initial conditions are $\hat x(0|-1)=\hat x_0$ and the associated covariance $P(0)=E[(\hat x_0-x_0)(\hat x_0-x_0)']$.
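The recursion above translates directly into code. The following Python sketch (a minimal illustration on an assumed toy two-state system; all matrices are our own choices) implements one measurement-update-plus-propagation step and runs it on simulated data.

```python
import numpy as np

def kalman_step(x_hat, P, v, Phi, M, Q, R):
    """One step of the recursion above: incorporate measurement v at time k,
    then propagate to k+1. Returns (x_hat(k+1|k), P(k+1))."""
    S = M @ P @ M.T + R                      # innovation covariance
    K = P @ M.T @ np.linalg.inv(S)           # gain before propagation
    x_next = Phi @ (x_hat + K @ (v - M @ x_hat))
    P_next = Phi @ (P - K @ M @ P) @ Phi.T + Q
    return x_next, P_next

# Toy 2-state, 1-observation system (assumed for illustration)
rng = np.random.default_rng(1)
Phi = np.array([[1.0, 0.1], [0.0, 1.0]])
M = np.array([[1.0, 0.0]])
Q = 0.01 * np.eye(2)
R = np.array([[0.25]])

x_true = np.array([0.0, 1.0])
x_hat, P = np.zeros(2), np.eye(2)
for k in range(50):
    v = M @ x_true + rng.normal(0.0, 0.5, size=1)   # noise std 0.5 matches R
    x_hat, P = kalman_step(x_hat, P, v, Phi, M, Q, R)
    x_true = Phi @ x_true + rng.multivariate_normal(np.zeros(2), Q)
```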
Chapter 3: Application of Data Assimilation to Space Weather

As we have indicated in the introduction of this thesis, one of the primary motivations for this thesis is to investigate several key issues related to ionospheric data assimilation. These issues were identified in the course of the development of the Global Assimilative Ionospheric Model (GAIM). Subsequently, in the course of this research, a collaborative effort between UCLA and USC on data assimilation for the Earth's plasmasphere was started. In this chapter, we provide some background information on the Earth's ionosphere and plasmasphere, and introduce the physics of the Earth's ionosphere and of data assimilation. In particular, we identify the main physical processes and the primary state variables, as well as the available observations for the ionosphere. In Section 3.2, we introduce the problem of 3-D tomography of the Earth's plasmasphere electron density. The tomographic imaging of the plasmasphere is very similar to ionospheric data assimilation in that the states and observations are nearly identical. In Section 3.3, we give a simple example to illustrate the importance of correctly modeling the system uncertainties in a data assimilation application.

3.1 Ionospheric Data Assimilation

Data assimilation is a set of techniques that aim at reconciling the difference between a theoretical, first-principles-based model of a system and the actual observations of the system by modifying either the initial and boundary conditions or the external forcing functions. Data assimilation techniques have been widely used in numerical weather prediction for the last 40 years. More recently, these techniques have also been gaining ground in other areas of the geophysical and atmospheric sciences. In particular, with the rapid growth of wireless communication, the need for accurate estimation of electron densities in the Earth's ionosphere has become ever more critical. At the same time, the available measurements of the ionospheric electron density have increased extraordinarily due to the increase of space-based telecommunication and navigation satellites. Beginning in the late 1990s, significant research efforts were made in developing global data assimilation models.

A common element in the approaches to ionospheric data assimilation is the minimum variance estimation technique. In particular, by using an array of ground receivers aligned with a satellite's ground track, a 2-dimensional tomographic inverse technique is used to retrieve the electron density in the plane containing the ground stations and the satellite orbit. On the other hand, several modified Kalman filter techniques are used in the Global Assimilative Ionospheric Models (GAIM) developed by the USC/JPL team and by the Utah State University team. In this section, we present basic information on the Earth's ionosphere and the USC/JPL GAIM.

3.1.1 The Earth's Ionosphere

The energy of the solar radiation is so strong that it ionizes the neutral atoms and molecules in the upper atmosphere of the Earth. The ionosphere is the part of the atmosphere where ions occur in sufficient density to have a significant influence on the propagation of radio waves. The ionosphere starts at about 60 km above the surface of the Earth. The density of ions varies with longitude, latitude, altitude and time. A typical distribution of ion densities as a function of altitude is shown in Figure 3.1.

Figure 3.1: Typical vertical distribution of ion densities in the Earth's ionosphere (source: www.windows.ucar.edu)

The state of the ionosphere is characterized by the distributions of the densities, velocities and temperatures of the ions and electrons. The external forces include the solar radiation intensity, the densities and velocities of the neutral particles, and variations of the Earth's electric and magnetic fields.
The basic laws governing the dynamics of the plasma in the Earth's ionosphere are the equations of fluids. In particular, the law of conservation of mass states that the change in the number of an ion species inside a volume $V$ is equal to the net flux of particles across the boundary of $V$ plus the production or recombination of the species inside $V$. This is represented by the integral equation
\[
\frac{d}{dt}\iiint_V \eta_l(t,x)\,dV = -\iint_S \eta_l \vec V_l\cdot\vec n\,dS + \iiint_V (P_l - L_l)\,dV, \tag{3.1}
\]
where $S$ is the boundary of the volume, $\eta_l$ is the density of the $l$-th ion species, $\vec V_l$ is the velocity of the species, and $P_l$ and $L_l$ are the production and loss rates, respectively.

The law of conservation of momentum is basically Newton's second law of motion. However, for an ionospheric model we often assume that the ions are in a "quasi-static" state; that is, the acceleration of the ions is equal to zero. This leads to the balance of external forces, or momentum balance, equation
\[
-\nabla(\eta_l K_B T_l) + \eta_l M_l\vec g + c_l\eta_l(\vec E + \vec V_l\times\vec B) - \vec W_l = 0, \tag{3.2}
\]
where $T_l$ is the temperature of the $l$-th ion; $K_B$ is Boltzmann's constant; $M_l$ is the molecular mass of the $l$-th ion; $\vec g$ is the gravitational field; $c_l$ is the charge number of the ion; and $\vec E$, $\vec B$ are the electric and magnetic fields, respectively. The term $\vec W_l$ represents the momentum exchange due to collisions of ion $l$ with other ions and with neutral particles.

The law of conservation of translational energy is given by
\[
\frac{d}{dt}\iiint_V \frac{3}{2}(\eta_l k_B T_l)\,dV = -\iint_S \frac{3}{2}\eta_l k_B T_l\vec V_l\cdot\vec n\,dS - \iiint_V \eta_l k_B T_l\,\mathrm{div}(\vec V_l)\,dV + \iint_S \kappa_l\nabla(T_l)\cdot\vec n\,dS + \iiint_V (F_{l,n} + Q_l)\,dV, \tag{3.3}
\]
where $\kappa_l$ is the thermal conductivity of ion $l$, and $F_{l,n}$ and $Q_l$ are the heating caused by friction with the neutral particles and the heating rate due to collisions with other ions and other factors, respectively. If we assume that the ion and electron temperatures are sufficiently well known, we can avoid solving Equation (3.3) by using values of the ion and electron temperatures provided by an empirical model in Equation (3.2).

In addition to Equations (3.1)-(3.3), a model of the ionosphere also requires knowledge of the solar radiation and the neutral densities to determine the ion production and recombination rates. The ion convection velocity due to the Earth's electric field and the neutral velocity are also required. These quantities are considered the driving forces of the ionosphere. We must also impose appropriate boundary conditions to ensure that the above equations have a unique solution. Equations (3.1)-(3.2) are in general highly nonlinear. However, if we are only interested in modeling the predominant ion species, the O+ ion, the model can be simplified to a linear partial differential equation. In this project, we do not address the nonlinear aspects of the ionospheric model. Therefore, we primarily focus our attention on the single-species version of equations (3.1)-(3.2), which constitutes the infinite dimensional version of the state dynamic equation of our system. This version of the ionospheric model is discretized and used in the first version of the USC/JPL GAIM model.
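To give a rough sense of how a continuity equation of the form (3.1) is discretized in practice, the following Python sketch advances a single-species density on a 1-D grid with a generic first-order upwind flux and local production/loss terms. This is an illustrative scheme of our own under assumed velocities and rates, not the actual GAIM solver or grid.

```python
import numpy as np

# Illustrative 1-D finite-volume step for d(eta)/dt + d(eta V)/dx = P - L.
nx, dx, dt = 100, 1.0, 0.1
V = 0.5 * np.ones(nx + 1)          # face velocities, assumed positive
P = 0.01 * np.ones(nx)             # production rate per cell (assumed)

def step(eta, eta_in=1.0, loss=0.005):
    # Upwind fluxes: for V > 0 the flux at each face comes from the cell on its left
    F = np.empty(nx + 1)
    F[0] = V[0] * eta_in           # assumed inflow density at the boundary
    F[1:] = V[1:] * eta
    return eta - dt / dx * (F[1:] - F[:-1]) + dt * (P - loss * eta)

eta = np.ones(nx)
for _ in range(200):
    eta = step(eta)                # densities remain positive (CFL number 0.05)
```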
3.1.2 GAIM and Its Measurements

One of the most powerful advantages of a data assimilation model is its ability to analyze a diverse collection of different types of measurements. In the case of the USC/JPL GAIM, the model can assimilate the following types of data:

1. Slant TEC (Total Electron Content), which is the integral of the electron density along the transmitter-receiver line of sight. In this case, the transmitters are the GPS satellites and the receivers are the GPS ground stations;

2. Relative TEC measurements taken from a low-Earth orbiter (LEO) tracking GPS satellites at positive and negative elevations (i.e., during GPS-LEO occultations). This is the difference of two consecutive TEC measurements made by the receiving satellite, which removes the receiver bias in the TEC measurements;

3. In-situ measurements of the electron density made by instruments on the Defense Meteorological Satellite Program (DMSP) satellites along the satellite orbit at 800 km altitude;

4. Night-time UV airglow radiances, which can be related to the ion density in a non-linear manner;

5. Ionosonde profiles, derived from measurements of the reflected signal transmitted by the instrument over a range of radio frequencies. The ionosonde profile represents the electron density profile below the F-region peak;

6. TEC from the TOPEX altimeter, derived from the measurement of the signal delay in the ocean-reflected signal transmitted by the altimeter; and

7. Electron density profiles measured by incoherent scatter radar.

The primary purpose of GAIM is to determine the electron density in the ionosphere at the present time and possibly to forecast the change of the electron density profile in the future. In doing this, two techniques are applied: Kalman filtering and the 4-dimensional variational (4DVAR) technique. We shall focus our attention on the Kalman filter approach.

3.1.3 Implementation of Kalman Filtering in GAIM

Before presenting the implementation of the Kalman filtering technique in GAIM, we first introduce the following notation:

• $x_k^t$: the true state, a discrete representation of the true ionospheric state (density) at time $k$. $x_k^t$ is a vector whose dimension is equal to the total number of spatial elements in a discretized version of the ionospheric model. Each entry of $x_k^t$ represents the true average electron density within the associated spatial element.

• $x_k^a$: the analysis, an estimate of $x_k^t$ after assimilation of measurements from the initial time to time step $k$;

• $x_k^f$: the forecast, an estimate of $x_k^t$ given measurements up to time $k-1$.

The observations $m_k^o$ are related to the true state $x_k^t$ through an observation operator $H_k$ via the equation
\[
m_k^o = H_k x_k^t + \varepsilon_k^o,
\]
where $\varepsilon_k^o = \varepsilon_k^m + \varepsilon_k^r$ is the observation error, composed of the measurement error $\varepsilon_k^m$ and a representativeness error $\varepsilon_k^r$. The latter is due to the discretization in time and space of the solution for the ionospheric state. Figure 3.2 below shows a typical discretization of the space.

Figure 3.2: A typical low-resolution grid for GAIM

For different data types, the observations and observation operators are different. For example, for data type 1, i.e. slant TEC, the observation is the integral of the ion density along the path from the GPS satellite to the receiver on the ground. The observation operator $H_k$ is a vector with the same dimension as $x_k^t$, and each entry of $H_k$ represents the length of the line of sight between the transmitter and the receiver inside the associated element.

We relate the true state at time $k+1$ to the true state at time $k$ via the following discretized ionospheric model:
\[
x_{k+1}^t = \Psi_k x_k^t + \varepsilon_k^q,
\]
where, for a single ion species, $\Psi_k$ is a forward model derived from equations (3.1)-(3.2), and $\varepsilon_k^q$ is a process noise which reflects our uncertainty in the forward model.
Let $M_k = \mathrm{Cov}(\varepsilon_k^m)$, $R_k = \mathrm{Cov}(\varepsilon_k^r)$ and $Q_k = \mathrm{Cov}(\varepsilon_k^q)$ be the measurement, representation and process noise covariances, respectively. Then the Kalman filter can be summarized by the following set of equations ([2]):
\begin{align*}
x_k^a &= x_k^f + K_k(m_k^o - H_k x_k^f)\\
K_k &= P_k^f H_k^T (H_k P_k^f H_k^T + R_k + M_k)^{-1}\\
P_k^a &= P_k^f - K_k H_k P_k^f\\
x_{k+1}^f &= \Psi_k x_k^a\\
P_{k+1}^f &= \Psi_k P_k^a \Psi_k^T + Q_k.
\end{align*}

3.2 3-D Tomography of the Plasmasphere

The purpose of 3-D tomography of the plasmasphere is to construct an estimate of the electron density in the three dimensional region of space corresponding to the Earth's plasmasphere. In this section we give a brief introduction to the Earth's plasmasphere and the available measurements.

3.2.1 The Earth's Plasmasphere

The plasmasphere is a doughnut-shaped plasma cloud that surrounds the Earth. It is the inner part of the magnetosphere, located above the ionosphere; it is essentially the extension of the ionosphere to high altitudes. It is cold and dense compared with other parts of the magnetosphere. The typical temperature of plasmasphere electrons is 1 electron volt, equivalent to a temperature of 11,604 degrees Kelvin. The typical particle densities range from on the order of $10^4\ \mathrm{cm^{-3}}$ near the topside ionosphere down to $10\ \mathrm{cm^{-3}}$ at the plasmapause. Its source is photoionization of the Earth's ionosphere. The plasmapause ranges from 2 to 7 Earth radii, depending on geomagnetic activity. The constituents of the plasmasphere are mainly H+, He+ and O and N species. Figure 3.3 shows what the plasmasphere looks like in space.

Figure 3.3: The plasmasphere

The study of the plasmasphere is important because it influences both the magnetosphere and the ionosphere. Our objective is to estimate the electron density of the plasmasphere.

3.2.2 Measurements

As in the ionosphere case, the observation data are also TEC, here along the lines of sight between GPS satellites and a low-Earth-orbit satellite. We chose JASON-1 to obtain the data because its high altitude (1336 km) avoids the noise TEC due to the ionosphere. Figure 3.4 shows the relative positions of the Earth, the low-Earth-orbit satellite and the GPS satellites in space.

Figure 3.4: LEO and GPS satellites

The LEO satellite communicates with several GPS satellites simultaneously every 10 seconds. We only used TEC along lines of sight with at least a 10 degree elevation angle. As before, the observed TEC $Y$ is related to the discrete representation $X$ of the density through an observation operator $H$,
\[
Y = HX + \varepsilon, \tag{3.4}
\]
where $\varepsilon$ is the sum of the measurement error and the representation error due to the discretization of space. Here we can see that a 3-D tomographic inversion is identical to a single step of the Kalman filtering in GAIM. When observations are collected over a long period of time, ignoring the time dependent features of the plasmasphere would lead to errors in the estimation. However, since data in the plasmasphere are much sparser than in the ionosphere, and the time dynamics of the plasmasphere involves many poorly known parameters, we choose to focus on 3-D tomographic inversion instead of full data assimilation for the plasmasphere estimation problem.
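To illustrate how a row of a TEC observation operator $H$ can be assembled, the following Python sketch approximates the per-cell path lengths of one line of sight through a toy 2-D grid by dense sampling along the ray. The geometry and grid are our own assumed illustration, not the GAIM or JASON-1 configuration.

```python
import numpy as np

def tec_operator_row(p0, p1, x_edges, y_edges, n_samples=10000):
    """Approximate per-cell path lengths of the segment p0 -> p1 through a
    rectangular grid by sampling the segment densely and binning the samples.
    Returns a (ny, nx) array whose entries sum to the segment length."""
    p0, p1 = np.asarray(p0, float), np.asarray(p1, float)
    length = np.linalg.norm(p1 - p0)
    t = (np.arange(n_samples) + 0.5) / n_samples
    pts = p0 + t[:, None] * (p1 - p0)
    ix = np.searchsorted(x_edges, pts[:, 0]) - 1
    iy = np.searchsorted(y_edges, pts[:, 1]) - 1
    row = np.zeros((len(y_edges) - 1, len(x_edges) - 1))
    np.add.at(row, (iy, ix), length / n_samples)
    return row

# Toy 4x4 grid on [0,1]^2 and one slant ray across it
x_edges = y_edges = np.linspace(0.0, 1.0, 5)
H_row = tec_operator_row([0.0, 0.1], [1.0, 0.9], x_edges, y_edges)
# Forward model for this ray: TEC = sum(H_row * density_grid) + noise
```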
3.3 A Specific Issue

As we can see from the above discussion, the covariance matrices play important roles in the estimation of the ion density and the plasma density. One dramatic symptom of using wrong covariance matrices is that the algorithm may produce large negative densities, which is nonphysical.

We would like to investigate this problem in detail through a relatively simple example, taken from the ionosphere setting. Instead of estimating the ion density in 3-dimensional space, we may look at a simpler geometry. Suppose we have one box filled with ions and we want to estimate the average ion densities in two adjacent cubic regions; the 2-D cross-sectional view is shown in Figure 3.5.

Figure 3.5: An ionospheric model with two cells.

Suppose we can observe the TEC along lines of sight through the box. The true value of the TEC is obtained by integrating the ion density along the line. Suppose we know only one TEC value $v$ along a straight line $L$, and the path lengths of $L$ in the two cells are $\Delta_1$ and $\Delta_2$. The true average densities in the cells are $x_1$ and $x_2$, respectively. Then the observation model is
\[
v = \Delta_1 x_1 + \Delta_2 x_2 + \varepsilon,
\]
where $\varepsilon$ is the sum of the representation error and the measurement error. We will give the precise definition of the representation error in the next chapter. Suppose we are given an unbiased prior estimate $(x_1^f, x_2^f)'$. The random variables $x_1-x_1^f$, $x_2-x_2^f$ and $\varepsilon$ are assumed to be independent of each other. Then we can apply the minimum variance method to the linear system
\[
v - (\Delta_1 x_1^f + \Delta_2 x_2^f) = \Delta_1(x_1-x_1^f) + \Delta_2(x_2-x_2^f) + \varepsilon.
\]
Assume further that the covariance matrix
\[
\mathrm{Cov}(x_1-x_1^f,\ x_2-x_2^f) = R = \begin{pmatrix}R_1 & 0\\ 0 & R_2\end{pmatrix}
\]
and the variance $\mathrm{Var}(\varepsilon)=Q$ are specified. Let $W=(\Delta_1\ \Delta_2)$ denote the observation operator; then the updated density vector is given by
\begin{align*}
\begin{pmatrix}x_1^a\\ x_2^a\end{pmatrix}
&= \begin{pmatrix}x_1^f\\ x_2^f\end{pmatrix} + RW^T(WRW^T+Q)^{-1}\Big[v - W\begin{pmatrix}x_1^f\\ x_2^f\end{pmatrix}\Big]\\
&= \begin{pmatrix}x_1^f\\ x_2^f\end{pmatrix} + \frac{v-\Delta_1 x_1^f-\Delta_2 x_2^f}{\Delta_1^2 R_1+\Delta_2^2 R_2+Q}\begin{pmatrix}\Delta_1 R_1\\ \Delta_2 R_2\end{pmatrix}.
\end{align*}
For example, if $v=3/4$, $(x_1^f,x_2^f)'=(1/4,\,1)'$, $\Delta_1=\Delta_2=1$ and $Q=1$, then the updated density vector is given by
\[
\begin{pmatrix}x_1^a\\ x_2^a\end{pmatrix}
= \begin{pmatrix}\dfrac{R_2+1-R_1}{4(R_1+R_2+1)}\\[2ex] \dfrac{2R_1+R_2+2}{2(R_1+R_2+1)}\end{pmatrix}.
\]
In this special case, if we happen to choose $R$ with $R_2 < R_1 - 1$, we will have $x_1^a < 0$.

Moreover, if we assume the densities follow a normal distribution,
\[
\begin{pmatrix}x_1\\ x_2\end{pmatrix} = \begin{pmatrix}x_1^f\\ x_2^f\end{pmatrix} + \begin{pmatrix}\xi_1\\ \xi_2\end{pmatrix},\qquad
\begin{pmatrix}\xi_1\\ \xi_2\end{pmatrix}\sim N\!\left(0,\begin{pmatrix}\sigma_1^2&0\\ 0&\sigma_2^2\end{pmatrix}\right),
\]
and the error $\varepsilon\sim N(0,Q)$ is independent of $(x_1-x_1^f,\ x_2-x_2^f)'$, then the observation follows a normal distribution, i.e.,
\[
v = \Delta_1 x_1 + \Delta_2 x_2 + \varepsilon \sim N\big(\Delta_1 x_1^f + \Delta_2 x_2^f,\ \Delta_1^2\sigma_1^2 + \Delta_2^2\sigma_2^2 + Q\big).
\]
Under these assumptions, the optimal update from Kalman filtering (or the minimum variance method) is given by
\begin{align*}
\begin{pmatrix}x_1^a\\ x_2^a\end{pmatrix}
&= \begin{pmatrix}x_1^f\\ x_2^f\end{pmatrix} + \frac{v-\Delta_1 x_1^f-\Delta_2 x_2^f}{\Delta_1^2\sigma_1^2+\Delta_2^2\sigma_2^2+Q}\begin{pmatrix}\Delta_1\sigma_1^2\\ \Delta_2\sigma_2^2\end{pmatrix}\\
&= \begin{pmatrix}x_1^f\\ x_2^f\end{pmatrix} + \begin{pmatrix}\dfrac{\Delta_1}{\Delta_1^2+\Delta_2^2\sigma_2^2/\sigma_1^2+Q/\sigma_1^2}\\[2ex] \dfrac{\Delta_2}{\Delta_1^2\sigma_1^2/\sigma_2^2+\Delta_2^2+Q/\sigma_2^2}\end{pmatrix}(v-\Delta_1 x_1^f-\Delta_2 x_2^f).
\end{align*}
On the other hand, if we use the wrong covariance matrix $R$ in the implementation of the Kalman filter, the update is given by
\[
\begin{pmatrix}x_1^f\\ x_2^f\end{pmatrix} + \frac{v-\Delta_1 x_1^f-\Delta_2 x_2^f}{\Delta_1^2 R_1+\Delta_2^2 R_2+Q}\begin{pmatrix}\Delta_1 R_1\\ \Delta_2 R_2\end{pmatrix}
= \begin{pmatrix}x_1^f\\ x_2^f\end{pmatrix} + \begin{pmatrix}\dfrac{\Delta_1}{\Delta_1^2+\Delta_2^2 R_2/R_1+Q/R_1}\\[2ex] \dfrac{\Delta_2}{\Delta_1^2 R_1/R_2+\Delta_2^2+Q/R_2}\end{pmatrix}(v-\Delta_1 x_1^f-\Delta_2 x_2^f).
\]
In the case where we choose the wrong covariance matrix of the state function, for example $\sigma_1^2 < R_1$ and $\sigma_2^2 < R_2$ while $\sigma_1^2/\sigma_2^2 = R_1/R_2$ (which can happen if we modify the resolution but do not change the covariance matrix), we have a higher probability of obtaining negative densities, since
\[
\frac{\Delta_1}{\Delta_1^2+\Delta_2^2\sigma_2^2/\sigma_1^2+Q/\sigma_1^2} < \frac{\Delta_1}{\Delta_1^2+\Delta_2^2 R_2/R_1+Q/R_1}
\quad\text{and}\quad
\frac{\Delta_2}{\Delta_1^2\sigma_1^2/\sigma_2^2+\Delta_2^2+Q/\sigma_2^2} < \frac{\Delta_2}{\Delta_1^2 R_1/R_2+\Delta_2^2+Q/R_2}.
\]
An estimation algorithm producing negative electron densities is one of the most dramatic examples of possibly wrong error covariance matrices being used in a Kalman filter. There are also many situations in which highly nonphysical electron density profiles are produced by the algorithm when inappropriate error covariance matrices are used. In the proposed project, we intend to develop a mathematical framework for the modeling of model uncertainties for a discretized model of an infinite dimensional system, and to develop a mathematically rigorous and computationally efficient method for determining the appropriate error covariance matrices.
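The two-cell negative density effect can be reproduced in a few lines of Python. The sketch below (our own illustration, using the example values from this section: $v=3/4$, prior $(1/4,1)$, $\Delta_1=\Delta_2=1$, $Q=1$) evaluates the update for a well-chosen and a badly-chosen $R$.

```python
import numpy as np

def update(xf, R1, R2, v=0.75, d1=1.0, d2=1.0, Q=1.0):
    # Minimum variance update for the two-cell model of Section 3.3
    innov = v - d1 * xf[0] - d2 * xf[1]
    denom = d1**2 * R1 + d2**2 * R2 + Q
    return np.array([xf[0] + innov * d1 * R1 / denom,
                     xf[1] + innov * d2 * R2 / denom])

xf = np.array([0.25, 1.0])
print(update(xf, R1=1.0, R2=1.0))   # both components positive
print(update(xf, R1=5.0, R2=1.0))   # R2 < R1 - 1: first component is negative
```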
Chapter 4: Mathematical Formulation

In this chapter, we develop a mathematical framework to address the issue of modeling the uncertainties introduced by a finite dimensional system approximating an infinite dimensional distributed parameter system. We first present a definition of a random state and of the representation error in general, together with two types of approximation states, in Section 4.1. The forward linear systems are set up in Section 4.2. In Section 4.3, we compute the model error covariance for the approximate state. After that, the statistical properties of the representation error are derived in Section 4.4. In Section 4.5, we present a special case on which we can perform numerical simulations. In Section 4.6, the minimum variance method is applied to the linear system of Section 4.2. In the last section, a numerical simulation is carried out to justify the results of the previous sections. In particular, we shall demonstrate that when the underlying fine-scale uncertainty model is used to generate simulated observations, a commonly used but incorrect construction of the error covariance matrices can lead to the well-known problematic behavior of the estimation results. On the other hand, using a mathematically rigorous derivation of the system error covariances, we can obtain statistically optimal estimates.

4.1 Modeling of a Random State Function

In this section, we develop a mathematical framework for the modeling of uncertainties for state functions over a subset $D$, with non-empty interior, of $\mathbb{R}^n$. Throughout this chapter, the measure on $D$ is the Lebesgue measure.

4.1.1 Introduction to Random Fields

Definition 4.1.1. Let $\{\Omega,\mathcal F,P\}$ be a given probability space and let $D$ be a subset of $\mathbb{R}^n$. A mapping $\zeta(x,\omega): D\times\Omega\to\mathbb{R}$ is called a random field on $D$ if $\zeta(x,\cdot)$ is a random variable for any $x\in D$. It is called measurable if it is measurable with respect to $\sigma\{\mathcal B^n\times\mathcal F\}$, the product $\sigma$-algebra generated by $\mathcal B^n\times\mathcal F$. Later on, we also write $\zeta(x)$ for $\zeta(x,\omega)$.

Definition 4.1.2. Let $\zeta$ be a random field on $D$ and suppose $\zeta(x)\in L^2(\Omega)$ for each $x$. Let
\[
a(x) = E\zeta(x),\qquad R(x,y) = E[\zeta(x)-a(x)][\zeta(y)-a(y)];
\]
then $a(x)$ is called the mean value of $\zeta(x)$ and $R(x,y)$ is its correlation function.

In order for a function $R(x,y)$ to be the correlation function of some random field, it is necessary and sufficient that it possess the following properties ([4]):

1. $R(y,x) = R(x,y)$;
2. for any $m$, any $u_i\in\mathbb{R}$ and any points $x_i$ in $D$,
\[
\sum_{i,j=1}^m R(x_i,x_j)\,u_i u_j \ge 0.
\]

Real functions with these properties are called positive-definite kernels on $D$.

4.1.2 Random State Functions

We are interested in integrable random fields on a region of finite volume. From now on, every region $D$ under discussion will have finite volume, i.e. $|D| = \int_D dx < \infty$.

Definition 4.1.3. A random field $\zeta$ on $D$ is called a random state if $\zeta\in L^2(D\times\Omega)$.

Proposition 4.1.1. Both the mean value function $a(x)$ and the correlation function $R(x,y)$ exist almost everywhere for a random state $\zeta$; moreover, $a(x)\in L^2(D)$ and $R(x,y)\in L^2(D\times D)$.
Proof. $D$ has finite volume and $\zeta\in L^2(D\times\Omega)$, so
\[
E\int_D |\zeta(x)|\,dx \le \Big(E\int_D |\zeta(x)|^2\,dx\Big)^{1/2}\Big(\int_D dx\Big)^{1/2} < \infty.
\]
Thus, by the Fubini theorem ([3]), $a(x) = E\zeta(x) < \infty$ a.e., and
\[
\int_D |a(x)|^2\,dx = \int_D |E\zeta(x,\omega)|^2\,dx \le \int_D E|\zeta(x,\omega)|^2\,dx < \infty.
\]
By Schwartz's inequality,
\begin{align*}
\int_D\int_D |R(x,y)|^2\,dx\,dy
&= \int_D\int_D \big(E[\zeta(x)-a(x)][\zeta(y)-a(y)]\big)^2\,dx\,dy\\
&\le \int_D\int_D E[\zeta(x)-a(x)]^2\cdot E[\zeta(y)-a(y)]^2\,dx\,dy\\
&= \Big\{\int_D E[\zeta(x)-a(x)]^2\,dx\Big\}^2 < \infty.
\end{align*}
Therefore $R(x,y)$ exists and $R(x,y)\in L^2(D\times D)$. This property guarantees the existence of the mean value and correlation functions.

Definition 4.1.4. Let $V\subseteq L^2(D)$ be a Banach space. A bounded linear functional $H: V\to\mathbb{R}$ is called an observation operator.

For any random state $\zeta(x,\omega)$ with the property that $\zeta(\cdot,\omega)\in V$ for $\omega\in\Omega$, we can apply $H$ to $\zeta(\cdot,\omega)$ for every $\omega$; then $H(\zeta(\cdot,\omega))$ is a random variable and we can take its expectation. The question now is whether we may interchange the order of $H$ and the expectation operator. To answer this question, we need to introduce the Bochner integral ([1]), which extends the Lebesgue integral to functions taking values in a Banach space. Let $(\Omega,\Sigma,\mu)$ be a measure space and $B$ a Banach space. First, a simple function is any finite sum of the form
\[
s(\omega) = \sum_{i=1}^n \chi_{E_i}(\omega)\,b_i,
\]
where the $E_i$ are disjoint members of the $\sigma$-algebra $\Sigma$, the $b_i$ are distinct elements of $B$, and $\chi_E$ is the characteristic function of $E$. The integral of a simple function is defined by
\[
\int_\Omega \Big[\sum_{i=1}^n \chi_{E_i}(\omega)\,b_i\Big]\,d\mu = \sum_{i=1}^n \mu(E_i)\,b_i.
\]
A measurable function $f:\Omega\to B$ is $B$-Bochner integrable on $\Omega$ if there exists a sequence $s_k$ of simple functions such that
\[
\lim_{k\to\infty}\int_\Omega \|f - s_k\|_B\,d\mu = 0.
\]
In this case, the Bochner integral is defined by
\[
\int_\Omega f\,d\mu = \lim_{k\to\infty}\int_\Omega s_k\,d\mu,
\]
where the limit is in the sense of the $B$-norm. In the case $B=\mathbb{R}$, $\int_\Omega f\,d\mu$ is the ordinary expectation of $f$.

Theorem 4.1.1. If $\zeta(\cdot,\omega)$ is $V$-Bochner integrable on $\Omega$, then
\[
H\int_\Omega \zeta(\cdot,\omega)\,d\mu = \int_\Omega H\zeta(\cdot,\omega)\,d\mu. \tag{4.1}
\]

Proof. For a simple function $s(\omega)=\sum_{i=1}^n\chi_{E_i}(\omega)b_i$, we may interchange $H$ and the expectation operator, since
\[
\int_\Omega H(s(\omega))\,d\mu = \sum_{i=1}^n \mu(E_i)H(b_i) = H\Big(\int_\Omega s(\omega)\,d\mu\Big).
\]
Since $\zeta(\cdot,\omega)$ is $V$-Bochner integrable, there is a sequence of simple functions $s_k$ such that $\lim_{k\to\infty}\int_\Omega\|\zeta-s_k\|_V\,d\mu = 0$. Thus
\[
\Big|\int_\Omega H(\zeta-s_k)\,d\mu\Big| \le \int_\Omega |H(\zeta-s_k)|\,d\mu \le c\int_\Omega \|\zeta-s_k\|_V\,d\mu \to 0.
\]
The last inequality holds because $H$ is a bounded operator on $V$. On the other hand, by the definition of the Bochner integral,
\[
\Big|H\int_\Omega(\zeta-s_k)\,d\mu\Big| \le c\,\Big\|\int_\Omega(\zeta-s_k)\,d\mu\Big\|_V \to 0.
\]
So
\[
H\Big(\int_\Omega\zeta\,d\mu\Big) = \lim_{k\to\infty}H\Big(\int_\Omega s_k\,d\mu\Big) = \lim_{k\to\infty}\int_\Omega H(s_k)\,d\mu = \int_\Omega H(\zeta)\,d\mu.
\]
In what follows, we always assume that $\zeta$ is $V$-Bochner integrable.

Definition 4.1.5. Suppose $\zeta$ is a random state on $D$ and $\bar\zeta$ is an approximating random state. Then, for any observation operator $H$, the representation error is defined by
\[
\varepsilon(\omega) = H(\zeta(\cdot,\omega)) - H(\bar\zeta(\cdot,\omega)).
\]
We are going to focus on two types of approximation methods and study the stochastic properties of the approximation random states and of the representation errors.
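Definition 4.1.5 is easy to make concrete numerically. The Python sketch below is an illustrative construction of our own on $D=[0,1]$ with an assumed exponential correlation structure: the observation operator is a weighted integral, the approximation is a two-cell average, and the representation error is the difference of the two observations.

```python
import numpy as np

rng = np.random.default_rng(2)

# Fine grid on D = [0,1]; zeta is piecewise constant on m_e elementary cells
m_e, n_mc = 64, 20000
edges = np.linspace(0.0, 1.0, m_e + 1)
centers = 0.5 * (edges[:-1] + edges[1:])
w = np.linspace(0.0, 2.0, m_e)             # weight of H: H(f) = integral of w*f

# Assumed covariance of the elementary values: exponential, unit variance
Sigma = np.exp(-np.abs(centers[:, None] - centers[None, :]) / 0.2)

# Monte Carlo draws of the zero-mean elementary state
eta = rng.multivariate_normal(np.zeros(m_e), Sigma, size=n_mc)

# H applied to zeta and to its two-cell average
H_zeta = eta @ (w / m_e)
half = m_e // 2
zbar = np.repeat(np.stack([eta[:, :half].mean(1),
                           eta[:, half:].mean(1)], axis=1), half, axis=1)
H_zbar = zbar @ (w / m_e)
eps = H_zeta - H_zbar                       # representation error samples

print(eps.mean(), eps.var())  # mean ~ 0; the variance is the Q of Section 4.4
```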
4.1.3 Averaged Random States

Often one is interested in a discretized version of $\zeta$ on a partition of $D$. The definition of a partition is given below.

Definition 4.1.6. A collection of subsets of the domain $D$, $\Gamma = \{\Gamma_1,\Gamma_2,\dots,\Gamma_m\}$, is called a partition of $D\subseteq\mathbb{R}^n$ if it satisfies

i. $\Gamma_i\cap\Gamma_j=\emptyset$ for all $i\ne j$;
ii. $\cup_{i=1}^m \Gamma_i = D$.

Moreover,
\[
\Delta = \max_{1\le i\le m}\ \sup_{x,y\in\Gamma_i}|x-y|
\]
is called the diameter of the partition $\Gamma$.

Definition 4.1.7. A partition of $D$, $\Gamma=\{\Gamma_1,\dots,\Gamma_m\}$, is called a subpartition of $\Lambda=\{\Lambda_1,\dots,\Lambda_n\}$, which is also a partition of $D$, if for any $\Gamma_i$, $1\le i\le m$, there exists $1\le k\le n$ such that $\Gamma_i\subseteq\Lambda_k$. $\Lambda$ is called a superpartition of $\Gamma$.

Now we are ready to define the averaged random state.

Definition 4.1.8. Suppose $\Gamma=\{\Gamma_1,\dots,\Gamma_m\}$ is a partition of $D$. The averaged random state $\bar\zeta$ is
\[
\bar\zeta(x,\omega) = \frac{1}{|\Gamma_i|}\int_{\Gamma_i}\zeta(y,\omega)\,dy,\qquad \forall x\in\Gamma_i,
\]
where $|\Gamma_i| = \int_{\Gamma_i}dx \ne 0$ for all $i$.

We can rewrite the above definition as
\[
\bar\zeta = \sum_{i=1}^m \chi(x\in\Gamma_i)\,\bar\zeta_i(\omega),\qquad \bar\zeta_i(\omega) = \frac{1}{|\Gamma_i|}\int_{\Gamma_i}\zeta(x,\omega)\,dx.
\]
Since
\[
E\bar\zeta_i = E\,\frac{1}{|\Gamma_i|}\int_{\Gamma_i}\zeta(x,\omega)\,dx = \frac{1}{|\Gamma_i|}\int_{\Gamma_i}E\zeta(x,\omega)\,dx = \frac{1}{|\Gamma_i|}\int_{\Gamma_i}a(x)\,dx \triangleq \bar a_i,
\]
the mean value of $\bar\zeta$ is
\[
\bar a(x) = \sum_{i=1}^m \bar a_i\,\chi(x\in\Gamma_i).
\]
Similarly,
\begin{align*}
\mathrm{Cov}(\bar\zeta_i,\bar\zeta_j) &= E(\bar\zeta_i - E\bar\zeta_i)(\bar\zeta_j - E\bar\zeta_j)\\
&= E\Big\{\frac{1}{|\Gamma_i|}\int_{\Gamma_i}\big(\zeta(x,\omega)-a(x)\big)\,dx\cdot\frac{1}{|\Gamma_j|}\int_{\Gamma_j}\big(\zeta(y,\omega)-a(y)\big)\,dy\Big\}\\
&= \frac{1}{|\Gamma_i||\Gamma_j|}\int_{\Gamma_i}\int_{\Gamma_j}E\big[(\zeta(x,\omega)-a(x))(\zeta(y,\omega)-a(y))\big]\,dy\,dx\\
&= \frac{1}{|\Gamma_i||\Gamma_j|}\int_{\Gamma_i}\int_{\Gamma_j}R(x,y)\,dy\,dx \triangleq \bar R_{ij},
\end{align*}
so the correlation function of $\bar\zeta$ is
\[
\bar R(x,y) = \sum_{i=1}^m\sum_{j=1}^m \chi(x\in\Gamma_i)\,\chi(y\in\Gamma_j)\,\bar R_{ij},
\]
where $\chi(\cdot)$ is the characteristic function. Moreover, since $E|\bar\zeta_i|^2 \le \frac{1}{|\Gamma_i|}E\int_{\Gamma_i}|\zeta(x,\omega)|^2\,dx$ by the Cauchy-Schwarz inequality,
\[
\int_D E|\bar\zeta|^2\,dx \le 2^m\sum_{i=1}^m \Big(\int_D \chi(x\in\Gamma_i)\,dx\Big)E|\bar\zeta_i|^2
\le 2^m\sum_{i=1}^m |\Gamma_i|\cdot\frac{1}{|\Gamma_i|}\,E\int_{\Gamma_i}|\zeta(x,\omega)|^2\,dx
\le 2^m m\,E\int_D|\zeta(x,\omega)|^2\,dx < \infty,
\]
i.e. $\bar\zeta\in L^2(D\times\Omega)$. We have proved that this type of approximation, $\bar\zeta$, is indeed a random state.
4.1.4 Finite Series Approximation

Let $\{\phi_n,\ n=1,2,\dots\}$ be an orthonormal basis of $L^2(D)$ and let $\zeta$ be a random state on $D$. Since $\zeta\in L^2(D\times\Omega)$, for almost every $\omega$, $\zeta(\cdot,\omega)$ is in the space $L^2(D)$. We have the following series expansion of $\zeta$:
\[
\zeta(x,\omega) = \sum_{n=1}^\infty \xi_n(\omega)\phi_n(x),\qquad \xi_n = \int_D \zeta\phi_n\,dx.
\]
The convergence is in the $L^2$ sense, i.e.
\[
\int_D\Big|\sum_{n=1}^N\xi_n(\omega)\phi_n(x) - \zeta(x,\omega)\Big|^2dx \to 0\quad\text{a.e.} \tag{4.2}
\]

Lemma 4.1.1. The $\xi_n$ defined in the above equation are in $L^2(\Omega)$.

Proof. $\zeta$ is a random state on $D$ and $\phi_n$ is in $L^2(D)$, so
\[
E\int_D \zeta^2\,dx < \infty \tag{4.3}
\]
and
\[
\int_D \phi_n^2\,dx < \infty. \tag{4.4}
\]
By the above facts and the Schwartz inequality,
\[
E\int_D|\zeta\phi_n|\,dx \le \Big(E\int_D\zeta^2\,dx\Big)^{1/2}\Big(\int_D\phi_n^2\,dx\Big)^{1/2} < \infty;
\]
then, from the Fubini theorem, $\xi_n = \int_D\zeta\phi_n\,dx$ is measurable. Furthermore, it is in $L^2(\Omega)$, since
\[
E\xi_n^2 = E\Big(\int_D\zeta\phi_n\,dx\Big)^2 \le E\Big(\int_D\zeta^2\,dx\cdot\int_D\phi_n^2\,dx\Big) = \Big(E\int_D\zeta^2\,dx\Big)\Big(\int_D\phi_n^2\,dx\Big) < \infty.
\]

We can use the sum of the first few terms of the above series, $\zeta_N = \sum_{n=1}^N\xi_n(\omega)\phi_n(x)$, to approximate the random state $\zeta$. The first task is to show that such a $\zeta_N$ is indeed a random state.

Lemma 4.1.2. $\zeta_N = \sum_{n=1}^N\xi_n(\omega)\phi_n(x)$ is a random state and $\lim_{N\to\infty}\zeta_N = \zeta$ (in the $L^2$ sense).

Proof. By Lemma 4.1.1, $\xi_n\in L^2(\Omega)$, and $\phi_n\in L^2(D)$; it is easy to show that $\xi_n(\omega)\phi_n(x)\in L^2(D\times\Omega)$ and thus
\[
\zeta_N = \sum_{n=1}^N\xi_n(\omega)\phi_n(x)\in L^2(D\times\Omega).
\]
Since $\{\phi_n\}_{n=1}^\infty$ is an orthonormal basis, for every $\omega\in\Omega$
\[
\int_D\Big|\sum_{n=1}^N\xi_n(\omega)\phi_n(x)-\zeta(x,\omega)\Big|^2dx \le \int_D|\zeta(x,\omega)|^2dx;
\]
together with $\zeta(x,\omega)\in L^2(D\times\Omega)$ and equation (4.2), we conclude from the dominated convergence theorem that
\[
E\int_D\Big|\sum_{n=1}^N\xi_n(\omega)\phi_n(x)-\zeta(x,\omega)\Big|^2dx \to 0.
\]

Now let us look at the mean value and correlation functions of $\zeta_N$. The mean value $a_N(x)$ of $\zeta_N$ is
\[
a_N(x) = E\zeta_N = \sum_{n=1}^N (E\xi_n)\phi_n(x),
\]
and the correlation function $R_N(x,y)$ of $\zeta_N$ is
\begin{align*}
R_N(x,y) &= E[\zeta_N(x)-a_N(x)][\zeta_N(y)-a_N(y)]\\
&= \sum_{n=1}^N\sum_{m=1}^N \big\{E[\xi_n-E\xi_n][\xi_m-E\xi_m]\big\}\phi_n(x)\phi_m(y)
= \sum_{n=1}^N\sum_{m=1}^N \mathrm{Cov}(\xi_n,\xi_m)\,\phi_n(x)\phi_m(y).
\end{align*}
Let $a(x)$ and $R(x,y)$ be the mean value and correlation functions of $\zeta$, respectively. The following proposition establishes the relationship between $a_N(x)$, $R_N(x,y)$ and $a(x)$, $R(x,y)$.

Proposition 4.1.2.
(i) $\lim_{N\to\infty} a_N(x) = a(x)$ in the $L^2$ sense.
(ii) $\lim_{N\to\infty} R_N(x,y) = R(x,y)$ in the $L^2$ sense.

Proof. (i) From Hölder's inequality we have
\[
\int_D\big(a_N(x)-a(x)\big)^2dx = \int_D\big(E\zeta_N(x,\omega)-E\zeta(x,\omega)\big)^2dx \le \int_D E\big(\zeta_N(x,\omega)-\zeta(x,\omega)\big)^2dx,
\]
hence $\lim_{N\to\infty}a_N(x)=a(x)$ in the $L^2$ sense by Lemma 4.1.2.

(ii) By Lemma 4.1.2 and (i), it is easy to see that
\[
\lim_{N\to\infty}\int_D E\big|[\zeta_N(x,\omega)-a_N(x)]-[\zeta(x,\omega)-a(x)]\big|^2dx = 0. \tag{4.5}
\]
This, together with $\zeta\in L^2(D\times\Omega)$ and $a(x)\in L^2(D)$, leads to
\[
\sup\Big\{\int_D E\big(\zeta_N(x,\omega)-a_N(x)\big)^2dx,\ \int_D E\big(\zeta(x,\omega)-a(x)\big)^2dx,\ N=1,2,\dots\Big\} < M \tag{4.6}
\]
for some $M>0$. Hence
\begin{align*}
&\int_D\int_D\big(R_N(x,y)-R(x,y)\big)^2dx\,dy\\
&\ \le 2\int_D\int_D\Big(E[\zeta_N(x,\omega)-a_N(x)][\zeta_N(y,\omega)-a_N(y)] - E[\zeta(x,\omega)-a(x)][\zeta_N(y,\omega)-a_N(y)]\Big)^2dx\,dy\\
&\quad+ 2\int_D\int_D\Big(E[\zeta(x,\omega)-a(x)][\zeta_N(y,\omega)-a_N(y)] - E[\zeta(x,\omega)-a(x)][\zeta(y,\omega)-a(y)]\Big)^2dx\,dy\\
&\ \le 2\int_D E\big[\zeta_N(y,\omega)-a_N(y)\big]^2dy\cdot\int_D E\big|[\zeta_N(x,\omega)-a_N(x)]-[\zeta(x,\omega)-a(x)]\big|^2dx\\
&\quad+ 2\int_D E\big[\zeta(x,\omega)-a(x)\big]^2dx\cdot\int_D E\big|[\zeta_N(y,\omega)-a_N(y)]-[\zeta(y,\omega)-a(y)]\big|^2dy\\
&\ \le 4M\int_D E\big|[\zeta_N(x,\omega)-a_N(x)]-[\zeta(x,\omega)-a(x)]\big|^2dx,
\end{align*}
where the middle step uses the Schwarz inequality. So we conclude from equation (4.5) that $\lim_{N\to\infty}R_N(x,y)=R(x,y)$ in the $L^2$ sense.

Since $a(x)$ and $R(x,y)$ are both $L^2$ integrable, they admit the Fourier series expansions
\[
a(x) = \sum_{n=1}^\infty \mu_n\phi_n(x),\qquad
R(x,y) = \sum_{m=1}^\infty\sum_{n=1}^\infty \lambda_{mn}\phi_m(x)\phi_n(y),
\]
where the convergence is in the $L^2$ sense in both cases. On the other hand,
\[
a(x) = \lim_{N\to\infty}a_N(x) = \sum_{n=1}^\infty(E\xi_n)\phi_n(x)
\quad\text{and}\quad
R(x,y) = \lim_{N\to\infty}R_N(x,y) = \sum_{m=1}^\infty\sum_{n=1}^\infty\mathrm{Cov}(\xi_m,\xi_n)\phi_m(x)\phi_n(y),
\]
which imply
\[
\mu_n = E\xi_n\quad\text{and}\quad \mathrm{Cov}(\xi_m,\xi_n) = \lambda_{mn}. \tag{4.7}
\]

Notice that $\zeta_N$ is actually an orthogonal projection of $\zeta$ onto the space spanned by $\{\phi_1(x),\phi_2(x),\dots,\phi_N(x)\}$. In the general case, let $\{\varphi_1(x),\dots,\varphi_n(x),\dots\}$ be a Schauder basis of $L^2(D)$ which is not necessarily orthonormal. We can still approximate $\zeta(\omega,x)$ by $\zeta_N = \sum_{n=1}^N\xi_n(\omega)\varphi_n(x)$, the orthogonal projection of $\zeta$ onto $\mathrm{span}\{\varphi_1(x),\dots,\varphi_N(x)\}$ for each $\omega\in\Omega$. In this situation, we no longer have the equation $\xi_n=\int_D\zeta\varphi_n\,dx$. But Lemmas 4.1.1 and 4.1.2 and Proposition 4.1.2 still hold true, since Gram-Schmidt orthogonalization of the Schauder basis leads to an orthonormal basis $\{\phi_1,\phi_2,\dots\}$ with
\[
\mathrm{span}\{\varphi_1(x),\dots,\varphi_N(x)\} = \mathrm{span}\{\phi_1(x),\phi_2(x),\dots,\phi_N(x)\}.
\]

4.2 Forward Model of Linear Observation Operators

The above two types of approximation methods share the following characteristics:

(1) the approximate random states $\bar\zeta$ have the form
\[
\bar\zeta = \sum_{i=1}^n \xi_i(\omega)\psi_i(x);
\]
(2) the $\psi_i(x)$ are known;
(3) the means and covariances of the $\xi_i$ can be computed if we are given the mean value $a(x)$ and correlation function $R(x,y)$ of the original random state $\zeta$.

We are interested in estimating an unknown system state $\zeta$ defined on a domain $D$ from $L$ observations $Y_l$, $l=1,\dots,L$, obtained by linear operators $H_l$. In most cases, however, it is impossible to estimate $\zeta$ itself, since that is an infinite dimensional inversion problem; so we estimate its approximation $\bar\zeta$ instead. The forward observation model is given by
\[
Y_l = H_l\zeta + \nu_l = \sum_{n=1}^N H_l(\psi_n(\cdot))\,\xi_n(\omega) + \nu_l,\qquad l=1,\dots,L,
\]
where $\nu_l$ includes the measurement error and the representation error. In matrix form it is given by
\[
Y = W\beta + \varepsilon + \theta, \tag{4.8}
\]
where
\[
Y = \begin{pmatrix}Y_1\\ Y_2\\ \vdots\\ Y_L\end{pmatrix},\qquad
W = \begin{pmatrix}H_1(\psi_1(\cdot)) & \cdots & H_1(\psi_n(\cdot))\\ \vdots & \ddots & \vdots\\ H_L(\psi_1(\cdot)) & \cdots & H_L(\psi_n(\cdot))\end{pmatrix},\qquad
\beta = \begin{pmatrix}\xi_1\\ \vdots\\ \xi_n\end{pmatrix}.
\]
The error term can be written as $\nu = \varepsilon + \theta$, where $\theta_l$, $l=1,\dots,L$, is the measurement error, usually assumed to be i.i.d. with mean 0 and independent of the representation error $\varepsilon$ and of the system state $\beta$. The term $\varepsilon$ corresponds to the representation error, defined in Definition 4.1.5.

For the linear system (4.8), both the system state and the error term are random variables. $W$ is known because the $\psi_i(x)$ are given. If we can derive the statistical properties of these random variables, we can use the minimum variance approach to estimate the averaged state. In the next two sections, we present the relationship between the stochastic properties of the original system state and those of its approximation, as well as of the representation error.
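As a small illustration of how the entries $W_{ln}=H_l(\psi_n)$ of the forward model (4.8) are computed, the following Python sketch assembles $W$ numerically on $D=[0,1]$, under an assumed toy setup of our own: the observation operators are integrals over given subintervals and the basis is an orthonormal cosine basis.

```python
import numpy as np

# Toy setup on D = [0,1] (assumed for illustration):
# H_l(f) = integral of f over a given subinterval [a_l, b_l];
# psi_n = orthonormal cosine basis on [0,1].
x = np.linspace(0.0, 1.0, 2001)

def psi(n):
    return np.ones_like(x) if n == 0 else np.sqrt(2.0) * np.cos(n * np.pi * x)

segments = [(0.0, 0.3), (0.2, 0.7), (0.5, 1.0)]   # L = 3 line operators
N = 5                                             # truncation order

W = np.empty((len(segments), N))
for l, (a, b) in enumerate(segments):
    mask = (x >= a) & (x <= b)
    for n in range(N):
        # W_ln = H_l(psi_n), evaluated by the trapezoidal rule
        W[l, n] = np.trapz(psi(n)[mask], x[mask])

# Forward model for a truncated state beta: Y = W @ beta + noise
```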
4.3 Covariance Properties of Approximation States

In this section, we evaluate the covariance properties of the averaged random state and of the finite series approximation for system (4.8). To derive them, we need to know the statistical properties of the original system state $\zeta$. As mentioned in the previous section, an unbiased prior estimate of the system state is available. Since the estimation of the true system state is equivalent to the estimation of the difference between the prior estimate and the true system state, we may assume, without loss of generality, that the mean of the system state is 0. We assume that the correlation function of the system state $\zeta$ is $R(x,y)$.

In the case of the averaged random state with respect to a partition $\Gamma=\{\Gamma_1,\dots,\Gamma_m\}$, the $i$-th element of $\beta$ in system (4.8) is
\[
\xi_i = \frac{1}{|\Gamma_i|}\int_{\Gamma_i}\zeta(x,\omega)\,dx.
\]
As computed before, the covariance matrix of $\beta$ is $R=(\bar R_{ij})_{m\times m}$, where
\[
\bar R_{ij} = \frac{1}{|\Gamma_i||\Gamma_j|}\int_{\Gamma_i}\int_{\Gamma_j}R(x,y)\,dy\,dx.
\]
In the case of the $N$-th order approximation with respect to the orthonormal basis $\{\phi_n\}_{n=1}^\infty$, if $R(x,y)$ admits the series expansion
\[
R(x,y) = \sum_{m=1}^\infty\sum_{n=1}^\infty \lambda_{mn}\phi_m(x)\phi_n(y),
\]
then $\mathrm{Var}(\beta) = (\lambda_{mn})_{N\times N}$.

The application of minimum variance estimation to the averaged system state in (4.8) also requires knowledge of the statistical properties of the error term. The error comes from two sources. One is the measurement error, whose statistical properties are well understood. The other is the representation error. In the next section, we derive the statistical properties of the representation error based on our assumptions.
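The cell-averaged covariance $\bar R_{ij}$ can be approximated by simple quadrature. The sketch below is an illustrative midpoint-rule computation of our own, with an assumed exponential kernel on $D=[0,1]$; any other correlation function could be substituted for `R`.

```python
import numpy as np

# Assumed correlation function of the underlying state (illustrative choice)
def R(x, y, corr_len=0.2):
    return np.exp(-np.abs(x - y) / corr_len)

def averaged_covariance(edges, n_quad=200):
    """Midpoint-rule approximation of
    Rbar_ij = (1/|Gi||Gj|) * double integral of R(x, y) over Gi x Gj
    for a partition of [edges[0], edges[-1]] into len(edges)-1 cells."""
    m = len(edges) - 1
    Rbar = np.empty((m, m))
    for i in range(m):
        xi = np.linspace(edges[i], edges[i + 1], n_quad + 1)
        xi = 0.5 * (xi[:-1] + xi[1:])                # midpoints in cell i
        for j in range(m):
            yj = np.linspace(edges[j], edges[j + 1], n_quad + 1)
            yj = 0.5 * (yj[:-1] + yj[1:])            # midpoints in cell j
            # mean of R over the sample pairs = integral / (|Gi||Gj|)
            Rbar[i, j] = R(xi[:, None], yj[None, :]).mean()
    return Rbar

# Covariance of the two-cell averaged state on [0,1]
print(averaged_covariance(np.array([0.0, 0.5, 1.0])))
```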
In the case of the $N$-th order approximation with respect to the orthonormal basis $\{\phi_n\}_{n=1}^\infty$, if $R(x,y)$ admits the series expansion
$$R(x,y)=\sum_{m=1}^\infty\sum_{n=1}^\infty\lambda_{mn}\phi_m(x)\phi_n(y),$$
then
$$\begin{aligned}
E[H_k\zeta(x,\omega)\beta_i]&=E\Big[H_k\Big(\sum_{m=1}^\infty\xi_m\phi_m(x)\Big)\xi_i\Big] &(4.12)\\
&=\sum_{m=1}^\infty E(\xi_m\xi_i)H_k(\phi_m(\cdot)) &(4.13)\\
&=\sum_{m=1}^\infty\lambda_{mi}H_k(\phi_m(\cdot)) &(4.14)\\
&=H_k\Big(\sum_{m=1}^\infty\lambda_{mi}\phi_m(\cdot)\Big) &(4.15)\\
&=H_k\Big(\int_D R(\cdot,y)\phi_i(y)\,dy\Big). &(4.16)
\end{aligned}$$
Once we have equations (4.9)-(4.16), the covariance matrix $Q$ of the representation error and the covariance matrix $S$ of the representation error and the state can be computed easily. Later we will give explicit formulas for $Q$ and $S$ in a special case.

4.5 A special case

In this section, the forward model of a special case is given; we will perform numerical simulations on it later. We compute the explicit formulas of the covariance matrices for the state and for the representation error. Let $\Gamma^e=\{\Gamma^e_i\}_{i=1}^m$ be a partition of a domain $D$ such that all the subsets have the same volume, and let $\zeta$ be a random state on $D$ with mean function $a(x)=0$ and correlation function
$$R(x,y)=\Sigma_{ik}\quad\text{if } x\in\Gamma^e_i,\ y\in\Gamma^e_k, \tag{4.17}$$
where $\Sigma=(\Sigma_{ik})_{m\times m}$ is a positive definite matrix. The positive definiteness of $\Sigma$ guarantees that $R(x,y)$ is a positive-definite kernel on $D$, so that such a random state $\zeta$ exists. Let
$$\eta_i=\frac{1}{|\Gamma^e_i|}\int_{\Gamma^e_i}\zeta(x,\omega)\,dx,\qquad i=1,\cdots,m;$$
then
$$\zeta^e=\sum_{i=1}^m\chi(x\in\Gamma^e_i)\eta_i(\omega)$$
is the averaged random state with respect to $\Gamma^e$.

Lemma 4.5.1. $\zeta=\zeta^e$ a.e.

Proof. For any $x\in\Gamma^e_i$, we need to show that $\zeta(x)=\eta_i$. Let $\varepsilon=\zeta(x)-\eta_i$; then $E(\varepsilon)=0$ because $a(x)=0$ for $\zeta$. And
$$\begin{aligned}
E\varepsilon^2&=E[\zeta(x)-\eta_i]^2=E\Big[\zeta(x)-\frac{1}{|\Gamma^e_i|}\int_{\Gamma^e_i}\zeta(y)\,dy\Big]^2\\
&=E[\zeta(x)]^2-\frac{2}{|\Gamma^e_i|}E\Big[\zeta(x)\int_{\Gamma^e_i}\zeta(y)\,dy\Big]+\frac{1}{|\Gamma^e_i|^2}E\Big[\int_{\Gamma^e_i}\zeta(y)\,dy\Big]^2\\
&=\Sigma_{ii}-\frac{2}{|\Gamma^e_i|}\int_{\Gamma^e_i}E\zeta(x)\zeta(y)\,dy+\frac{1}{|\Gamma^e_i|^2}\int_{\Gamma^e_i}\int_{\Gamma^e_i}E\zeta(y)\zeta(z)\,dy\,dz\\
&=\Sigma_{ii}-\frac{2}{|\Gamma^e_i|}\int_{\Gamma^e_i}\Sigma_{ii}\,dy+\frac{1}{|\Gamma^e_i|^2}\int_{\Gamma^e_i}\int_{\Gamma^e_i}\Sigma_{ii}\,dy\,dz\\
&=\Sigma_{ii}-2\Sigma_{ii}+\Sigma_{ii}=0.
\end{aligned}$$
Thus $\varepsilon=0$ a.e., which implies $\zeta=\zeta^e$ a.e.

We have shown that $\zeta(\cdot,\omega)$ is constant, for fixed $\omega$, on each subset of $\Gamma^e$; from now on we will call $\Gamma^e$ the elementary partition of $\zeta$ and $\eta$ the elementary state.

Lemma 4.5.2. Let $\eta=[\eta_1,\eta_2,\cdots,\eta_m]'$; then $E\eta=0$ and $\mathrm{Var}(\eta)=\Sigma$.

Proof. $E\eta=0$ is due to the fact that $a(x)=0$. And
$$E\eta_i\eta_k=E\frac{1}{|\Gamma^e_i|}\int_{\Gamma^e_i}\zeta(x)\,dx\cdot\frac{1}{|\Gamma^e_k|}\int_{\Gamma^e_k}\zeta(y)\,dy=\frac{1}{|\Gamma^e_i||\Gamma^e_k|}\int_{\Gamma^e_i}\int_{\Gamma^e_k}E\zeta(x)\zeta(y)\,dx\,dy=\frac{1}{|\Gamma^e_i||\Gamma^e_k|}\int_{\Gamma^e_i}\int_{\Gamma^e_k}\Sigma_{ik}\,dx\,dy=\Sigma_{ik}.$$

Now let $\Gamma=\{\Gamma_j\}_{j=1}^n$ be a super-partition of $\Gamma^e$ and denote
$$\beta_j=\frac{1}{|\Gamma_j|}\int_{\Gamma_j}\zeta(x,\omega)\,dx,\qquad j=1,\cdots,n;$$
then $\bar\zeta=\sum_{j=1}^n\chi(x\in\Gamma_j)\beta_j(\omega)$ is the averaged random state with respect to $\Gamma$. If we introduce the transformation matrix, which fully describes the relationship between two partitions when one is a subpartition of the other, we can write the relationship between $\beta$ and $\eta$ in matrix form.

Definition 4.5.1. Suppose $\Gamma^e=\{\Gamma^e_j\}_{j=1}^m$ and $\Gamma=\{\Gamma_i\}_{i=1}^n$ are two partitions of a domain $D$ and $\Gamma^e$ is a subpartition of $\Gamma$. Then the $n\times m$ matrix $T$ is called the transformation matrix of $\Gamma$ on $\Gamma^e$ if, for any $1\le i\le n$ and $1\le j\le m$,
$$T_{ij}=\begin{cases}1 & \text{if }\Gamma^e_j\subseteq\Gamma_i,\\ 0 & \text{otherwise.}\end{cases}$$
The value of $T_{ij}$ indicates whether the $j$-th subset of the partition $\Gamma^e$ is contained in the $i$-th subset of the partition $\Gamma$. By the definition of subpartition, for any $j$ there exists a unique $i$ such that $\Gamma^e_j\subseteq\Gamma_i$; thus there is only one nonzero element in the $j$-th column. It follows that the rows of $T$ are pairwise orthogonal, i.e., $\sum_{k=1}^m T_{ik}T_{jk}=0$ for $i\ne j$. Let $N_i=\sum_{k=1}^m T_{ik}T_{ik}$, which gives the number of subsets of $\Gamma^e$ contained in $\Gamma_i$, and let
$$N=\operatorname{diag}(N_1,N_2,\cdots,N_n).$$
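The transformation and counting matrices are straightforward to build numerically. In the sketch below, the assignment of six elementary cells to three super-cells (the `parent` array) is an invented example configuration; the last line checks the identity of Lemma 4.5.3 stated next.

```python
import numpy as np

# Sketch of Definition 4.5.1: transformation matrix T (n x m) and
# counting matrix N = T T' for a super-partition.  parent[j] = i means
# that elementary cell Gamma^e_j lies in super-cell Gamma_i.
m, n = 6, 3
parent = np.array([0, 0, 1, 1, 1, 2])    # hypothetical assignment

T = np.zeros((n, m))
T[parent, np.arange(m)] = 1.0            # one nonzero entry per column

N = T @ T.T                              # Lemma 4.5.3: diagonal counting matrix
print(np.diag(N))                        # [2. 3. 1.] = cells per super-cell
```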
This diagonal matrix will be called the counting matrix in the remainder of the thesis. We summarize the above analysis in the following lemma.

Lemma 4.5.3. $TT'=N$.

Lemma 4.5.4. Let $\beta=[\beta_1,\cdots,\beta_n]'$ and let $T$ be the transformation matrix of $\Gamma$ on $\Gamma^e$; then $\beta=N^{-1}T\eta$.

Proof. In the $i$-th row $T_i$ of $T$, the $k$-th element is 1 if $\Gamma^e_k\subseteq\Gamma_i$ and 0 otherwise. Therefore $T_i\eta=\sum_{k=1}^m T_{ik}\eta_k=\sum_{k:\,\Gamma^e_k\subseteq\Gamma_i}\eta_k$. Since $N$ is a diagonal matrix, the $i$-th element of $N^{-1}T\eta$ is
$$\frac{1}{N_i}T_i\eta=\frac{1}{N_i}\sum_{k:\,\Gamma^e_k\subseteq\Gamma_i}\eta_k.$$
On the other hand,
$$\beta_i=\frac{1}{|\Gamma_i|}\int_{\Gamma_i}\zeta(x)\,dx=\frac{1}{|\Gamma_i|}\sum_{k:\,\Gamma^e_k\subseteq\Gamma_i}\int_{\Gamma^e_k}\eta_k\,dx=\sum_{k:\,\Gamma^e_k\subseteq\Gamma_i}\frac{|\Gamma^e_k|}{|\Gamma_i|}\eta_k=\frac{1}{N_i}\sum_{k:\,\Gamma^e_k\subseteq\Gamma_i}\eta_k,$$
where the last equality holds because the subsets of $\Gamma^e$ all have the same volume. So $\beta=N^{-1}T\eta$.

Now let us consider linear operators $\{H_l\}_{l=1}^L$. Then
$$H_l(\zeta)=H_l(\zeta^e)=H_l\Big(\sum_{i=1}^m\chi(x\in\Gamma^e_i)\eta_i(\omega)\Big)=\sum_{i=1}^m H_l(\chi(x\in\Gamma^e_i))\eta_i(\omega);$$
in matrix form, $H(\zeta)=\widetilde W\eta$, where $\widetilde W_{li}=H_l(\chi(x\in\Gamma^e_i))$. Similarly, $H(\bar\zeta)=W\beta$, where $W_{li}=H_l(\chi(x\in\Gamma_i))$. Then we have:

Lemma 4.5.5. $W=\widetilde WT'$.

Proof.
$$W_{li}=H_l(\chi(x\in\Gamma_i))=\sum_{k:\,\Gamma^e_k\subseteq\Gamma_i}H_l(\chi(x\in\Gamma^e_k))=\sum_{k:\,\Gamma^e_k\subseteq\Gamma_i}\widetilde W_{lk}=\sum_{k=1}^m\widetilde W_{lk}T_{ik},$$
i.e., $W=\widetilde WT'$.

The forward system is
$$Y=W\beta+\varepsilon+\theta.$$
Moreover we have:

Proposition 4.5.1.
(1) $R=\mathrm{Var}(\beta)=N^{-1}T\Sigma T'N^{-1}$.
(2) $Q=\mathrm{Var}(\varepsilon)=\widetilde W(I_m-T'N^{-1}T)\Sigma(I_m-T'N^{-1}T)'\widetilde W'$.
(3) $S=E(\beta\varepsilon')=N^{-1}T\Sigma(I_m-T'N^{-1}T)'\widetilde W'$.

Proof. (1). By Lemmas (4.5.4) and (4.5.2),
$$R=\mathrm{Var}(\beta)=\mathrm{Var}(N^{-1}T\eta)=N^{-1}T\,\mathrm{Var}(\eta)\,T'N^{-1}=N^{-1}T\Sigma T'N^{-1}.$$
(2). By Lemma (4.5.5),
$$\varepsilon=H(\zeta^e)-H(\bar\zeta)=\widetilde W\eta-W\beta=\widetilde W\eta-\widetilde WT'N^{-1}T\eta=\widetilde W(I_m-T'N^{-1}T)\eta,$$
hence
$$Q=\mathrm{Var}(\varepsilon)=\widetilde W(I_m-T'N^{-1}T)\Sigma(I_m-T'N^{-1}T)'\widetilde W'. \tag{4.18}$$
(3).
$$\begin{aligned}
S=E(\beta\varepsilon')&=E[N^{-1}T\eta\eta'(I_m-T'N^{-1}T)'\widetilde W'] &(4.19)\\
&=N^{-1}T\Sigma(I_m-T'N^{-1}T)'\widetilde W'. &(4.20)
\end{aligned}$$

We can develop intuition about the scaling effect on the covariance using the case of independent, identically distributed $\eta$, referred to as the iid case. In this case the covariance of $\eta$ has the form $\Sigma=\sigma^2I_m$. Using Lemma (4.5.3), the covariance of the averaged state is
$$R=N^{-1}T\Sigma T'N^{-1}=\sigma^2N^{-1}TT'N^{-1}=\sigma^2N^{-1}NN^{-1}=\sigma^2N^{-1}=\operatorname{diag}\Big(\frac{\sigma^2}{N_1},\cdots,\frac{\sigma^2}{N_n}\Big).$$
This shows that the variance of $\beta_i$ is inversely proportional to the size of $\Gamma_i$. Therefore, in the iid case, we need to scale the uncertainty according to the size of the partition.

It is useful to understand how the covariance matrix changes as the resolution of the discretization changes. It is important to note that a simple relation only exists if one partition is a subpartition of the other. Consider two partitions $\Gamma_1$ and $\Gamma_2$ such that $\Gamma_2$ is a super-partition of $\Gamma_1$. Then the following result holds.

Corollary 4.5.1. Suppose $\Gamma_1$ and $\Gamma_2$ are both super-partitions of $\Gamma^e$, $\Gamma_2$ is a super-partition of $\Gamma_1$, $T$ is the transformation matrix of $\Gamma_2$ on $\Gamma_1$ and $N$ is the corresponding counting matrix. Then the covariance matrices $R_i$ of the averaged state on the scale of $\Gamma_i$ $(i=1,2)$ satisfy
$$R_2=N^{-1}TR_1T'N^{-1}. \tag{4.21}$$

When $\Sigma=\sigma^2I_m$, by Lemma (4.5.3), equation (4.18) reduces to
$$\begin{aligned}
Q&=\sigma^2\widetilde W(I_m-T'N^{-1}T)^2\widetilde W'\\
&=\sigma^2\widetilde W(I_m-2T'N^{-1}T+T'N^{-1}TT'N^{-1}T)\widetilde W'\\
&=\sigma^2\widetilde W(I_m-2T'N^{-1}T+T'N^{-1}NN^{-1}T)\widetilde W'\\
&=\sigma^2\widetilde W(I_m-T'N^{-1}T)\widetilde W',
\end{aligned}$$
and equation (4.20) becomes
$$S=\sigma^2(N^{-1}T-N^{-1}TT'N^{-1}T)\widetilde W'=\sigma^2(N^{-1}T-N^{-1}NN^{-1}T)\widetilde W'=0.$$
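A minimal numerical sketch of Proposition 4.5.1 follows. The elementary covariance $\Sigma$ and the matrix $\widetilde W$ are randomly generated stand-ins, not quantities from any application; the last lines check the iid reduction $S=0$ derived above.

```python
import numpy as np

# Sketch of Proposition 4.5.1: covariances of the averaged state (R),
# of the representation error (Q), and their cross-covariance (S).
rng = np.random.default_rng(0)
m, n, L = 6, 3, 4
parent = np.array([0, 0, 1, 1, 1, 2])
T = np.zeros((n, m)); T[parent, np.arange(m)] = 1.0
Ninv = np.linalg.inv(T @ T.T)

A = rng.standard_normal((m, m))
Sigma = A @ A.T + m * np.eye(m)          # positive definite elementary covariance
Wt = rng.standard_normal((L, m))         # Wt[l, i] = H_l(chi(x in Gamma^e_i))

P = np.eye(m) - T.T @ Ninv @ T           # the projection appearing in (2), (3)
R = Ninv @ T @ Sigma @ T.T @ Ninv
Q = Wt @ P @ Sigma @ P.T @ Wt.T
S = Ninv @ T @ Sigma @ P.T @ Wt.T

# In the iid case Sigma = sigma^2 I, S vanishes (here with sigma^2 = 2):
S_iid = Ninv @ T @ (2.0 * np.eye(m)) @ P.T @ Wt.T
print(np.allclose(S_iid, 0.0))           # True
```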
It is important to note that the representation error and the averaged system state are correlated even if the system state at the elementary scale is uncorrelated; only in the case $\Sigma=\sigma^2I_m$, where all state variables are i.i.d., is $S=0$. In fact, consider the following simple example. Let $\eta_1$ and $\eta_2$ be the only two elementary system states. The averaged system state $\beta$ is given by $\beta=(\eta_1+\eta_2)/2$. Consider an observation $Y$ defined by $Y=\alpha_1\eta_1+\alpha_2\eta_2$. The representation error in this case is equal to
$$\varepsilon=\alpha_1\eta_1+\alpha_2\eta_2-(\alpha_1+\alpha_2)\beta.$$
The covariance $E(\beta\cdot\varepsilon)$ is given by
$$E(\beta\cdot\varepsilon)=(\alpha_1\sigma_1^2+\alpha_2\sigma_2^2)/2-(\alpha_1+\alpha_2)(\sigma_1^2+\sigma_2^2)/4=\tfrac14(\alpha_1-\alpha_2)(\sigma_1^2-\sigma_2^2),$$
where $\sigma_1$ and $\sigma_2$ are the standard deviations of $\eta_1$ and $\eta_2$, respectively. Unless $\sigma_1=\sigma_2$, it is possible to find an observation operator with $\alpha_1-\alpha_2$ not equal to zero, so that the representation error and the averaged state are correlated. This result suggests that we must consider the case of correlated observation error and state error much more often when we work with averaged system states.

It is also useful to understand how the covariance of the representation error changes as we change the resolution of the averaged system. For the iid case, $Q=\sigma^2\widetilde W(I_m-T'N^{-1}T)\widetilde W'$ depends only on the matrix $I_m-T'N^{-1}T$ once we fix the observation operator. Therefore, we can focus our discussion on the matrix $I_m-T'N^{-1}T$. We have the following proposition.

Proposition 4.5.2. Suppose $\Gamma_1$ and $\Gamma_2$ are both partitions of a domain $D$ with elementary partition $\Gamma^e$ and $\Gamma_2$ is a super-partition of $\Gamma_1$. Then
$$I-T^{(2)\prime}(N^{(2)})^{-1}T^{(2)}\ \ge\ I-T^{(1)\prime}(N^{(1)})^{-1}T^{(1)}, \tag{4.22}$$
where $T^{(i)}$ is the transformation matrix and $N^{(i)}$ the counting matrix of $\Gamma_i$ on $\Gamma^e$, $i=1,2$.

Proof. We need the following lemma.

Lemma 4.5.6. If $a,b\in\mathbb{R}$ and $m,n$ are positive integers, then
$$\frac{(a+b)^2}{m+n}\le\frac{a^2}{m}+\frac{b^2}{n}. \tag{4.23}$$

Proof. Since $(a+b)^2/(m+n)\le(|a|+|b|)^2/(m+n)$, we may assume $a,b\ge0$. Then
$$\frac{(a+b)^2}{m+n}=\frac{a^2+2ab+b^2}{m+n}=\frac{a^2+2\big(\frac{\sqrt{n}\,a}{\sqrt{m}}\big)\big(\frac{\sqrt{m}\,b}{\sqrt{n}}\big)+b^2}{m+n}\le\frac{a^2+\frac{na^2}{m}+\frac{mb^2}{n}+b^2}{m+n}=\frac{a^2}{m}+\frac{b^2}{n}.$$

Because $\Gamma_2$ is a super-partition of $\Gamma_1$ and $\Gamma_1$ is a super-partition of $\Gamma^e$, we can assume without loss of generality that $\Gamma^e_1,\cdots,\Gamma^e_m$, $\Gamma^1_1,\cdots,\Gamma^1_n$ and $\Gamma^2_1,\cdots,\Gamma^2_r$ are ordered in such a way that
$$T^{(1)}=\begin{bmatrix}\mathbf{1}'_{N_1} & 0 & \cdots & 0\\ 0 & \mathbf{1}'_{N_2} & \cdots & 0\\ \vdots & & \ddots & \vdots\\ 0 & \cdots & 0 & \mathbf{1}'_{N_n}\end{bmatrix} \quad\text{and}\quad T^{(2)}=\begin{bmatrix}\mathbf{1}'_{M_1} & 0 & \cdots & 0\\ 0 & \mathbf{1}'_{M_2} & \cdots & 0\\ \vdots & & \ddots & \vdots\\ 0 & \cdots & 0 & \mathbf{1}'_{M_r}\end{bmatrix}, \tag{4.24}$$
and $N_{i_{j-1}+1}+\cdots+N_{i_j}=M_j$ for $j=1,\cdots,r$, where $\mathbf{1}_N=[1,1,\cdots,1]'$ denotes the $N$-vector of ones and $i_0=0$, $i_r=n$. Therefore, for any $\alpha=[\alpha_1,\alpha_2,\cdots,\alpha_m]'\in\mathbb{R}^m$,
$$\alpha'T^{(1)\prime}(N^{(1)})^{-1}T^{(1)}\alpha=\frac{1}{N_1}(\alpha_1+\cdots+\alpha_{N_1})^2+\cdots+\frac{1}{N_n}(\alpha_{N_{n-1}+1}+\cdots+\alpha_{N_n})^2 \tag{4.25}$$
and
$$\alpha'T^{(2)\prime}(N^{(2)})^{-1}T^{(2)}\alpha=\frac{1}{M_1}(\alpha_1+\cdots+\alpha_{M_1})^2+\cdots+\frac{1}{M_r}(\alpha_{M_{r-1}+1}+\cdots+\alpha_{M_r})^2. \tag{4.26}$$
Applying Lemma (4.5.6) repeatedly, we obtain
$$\begin{aligned}
&\frac{1}{N_1}(\alpha_1+\cdots+\alpha_{N_1})^2+\frac{1}{N_2}(\alpha_{N_1+1}+\cdots+\alpha_{N_2})^2+\cdots+\frac{1}{N_n}(\alpha_{N_{n-1}+1}+\cdots+\alpha_{N_n})^2\\
&\quad\ge\frac{1}{N_1+N_2}(\alpha_1+\cdots+\alpha_{N_2})^2+\cdots+\frac{1}{N_n}(\alpha_{N_{n-1}+1}+\cdots+\alpha_{N_n})^2\\
&\quad\ge\cdots\ge\frac{1}{M_1}(\alpha_1+\cdots+\alpha_{M_1})^2+\cdots+\frac{1}{M_r}(\alpha_{M_{r-1}+1}+\cdots+\alpha_{M_r})^2,
\end{aligned}$$
i.e., $\alpha'T^{(1)\prime}(N^{(1)})^{-1}T^{(1)}\alpha\ge\alpha'T^{(2)\prime}(N^{(2)})^{-1}T^{(2)}\alpha$, which implies $T^{(1)\prime}(N^{(1)})^{-1}T^{(1)}\ge T^{(2)\prime}(N^{(2)})^{-1}T^{(2)}$ and hence
$$I_m-T^{(2)\prime}(N^{(2)})^{-1}T^{(2)}\ge I_m-T^{(1)\prime}(N^{(1)})^{-1}T^{(1)}.$$
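The closed form $E(\beta\cdot\varepsilon)=\frac14(\alpha_1-\alpha_2)(\sigma_1^2-\sigma_2^2)$ from the two-state example above is easy to verify by simulation. The sketch below draws a large sample of the two elementary states, with arbitrary example values of $\alpha_1,\alpha_2,\sigma_1,\sigma_2$, and compares the empirical cross-covariance against the closed form.

```python
import numpy as np

# Monte Carlo check of the two-state example: beta = (eta1 + eta2)/2,
# Y = a1*eta1 + a2*eta2, and E(beta * eps) = (a1 - a2)(s1^2 - s2^2)/4.
rng = np.random.default_rng(1)
a1, a2, s1, s2 = 2.0, -1.0, 1.0, 3.0     # arbitrary example parameters
eta1 = s1 * rng.standard_normal(1_000_000)
eta2 = s2 * rng.standard_normal(1_000_000)

beta = (eta1 + eta2) / 2
eps = a1 * eta1 + a2 * eta2 - (a1 + a2) * beta
print(np.mean(beta * eps))               # Monte Carlo estimate
print((a1 - a2) * (s1**2 - s2**2) / 4)   # closed form: -6.0
```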
4.6 Minimum variance estimation

Suppose the variance of the measurement error is $\sigma_0^2$ and let $Q_0=\sigma_0^2I_L$. Then we have all the uncertainties that are needed, and we can apply the minimum variance method to the system
$$Y=W\beta+\varepsilon+\theta.$$
The minimum variance estimator is
$$\hat\beta=(RW'+S)(WRW'+WS+S'W'+Q+Q_0)^{-1}Y. \tag{4.27}$$
From the calculations of $R$, $Q$ and $S$, we can see that they depend on the correlation function of the random state, and the mapping is continuous. Thus the minimum variance estimate $\hat\beta$ depends continuously on the correlation function.

4.7 Validation

The purpose of this section is twofold. First, we would like to demonstrate that, using the correctly derived statistical properties of the averaged system state, the minimum variance estimation technique produces optimal results for the estimation of the averaged state. Second, we would like to show that if erroneous error covariance matrices are used for either the averaged state or the observation error, the quality of the resulting estimate deteriorates.

Experiment 4.7.1. The purpose of this experiment is to check the importance of scaling the covariance of the averaged state according to the sizes of the subsets in a partition.

Let $D=[0,50)\times[0,50)$ in the plane. The elementary partition is $\Gamma^e=\{[i,i+1)\times[j,j+1)\}_{i,j=0}^{49}$. We label the uncertainty units in order from left to right and then upward, starting at $[0,1)\times[0,1)$. Consider the partition $\Gamma$ of $D$ given by
$$\{[0,5)\times[0,5),\ [0,5)\times[5,25),\ [5,15)\times[0,25),\ [15,30)\times[0,20),\ [30,50)\times[0,40),\ [30,50)\times[40,50),\ [15,30)\times[20,50),\ [0,15)\times[25,50)\}.$$
Then we randomly generate $L=15$ straight lines through this region. The partition $\Gamma$, along with the 15 lines, is shown in Figure 4.1.

Figure 4.1: The partition of D and 15 line operators

Let the system state $\eta$ be a random vector with mean 0 and covariance matrix $\Sigma=I_{2500}$. We want to estimate the averaged state $\beta$ from the observations generated by the above 15 line observation operators. The procedure of the experiment is the following.

Step 1: Compute the transformation matrix $T$ and counting matrix $N$ of $\Gamma$, and the observation matrix $\widetilde W$, where $\widetilde W_{lj}$ is the length of the $l$-th line segment in the $j$-th elementary uncertainty unit, and $W=\widetilde WT'$.

Step 2: Generate 2500 i.i.d. random variables with mean 0 and variance 1 as our system state $\eta$, and then compute the true averaged system state $\beta$. The true system state and the averaged system state are shown in Figures 4.2 and 4.3, respectively.

Figure 4.2: True system state

Figure 4.3: True average system state

Step 3: Generate 15 i.i.d. random variables $\theta_1,\cdots,\theta_{15}$ with mean 0 and variance $\sigma_0^2$ as the measurement error, and compute the observation vector $Y=\widetilde W\eta+\theta$.

Step 4: Compute the covariance matrices
$$R=N^{-1}=\operatorname{diag}\Big(\frac{\sigma^2}{N_1},\frac{\sigma^2}{N_2},\cdots,\frac{\sigma^2}{N_8}\Big)=\operatorname{diag}\Big(\frac{1}{25},\frac{1}{100},\cdots,\frac{1}{375}\Big)$$
and $Q=\widetilde W(I-T'N^{-1}T)\widetilde W'$. The minimum variance estimate of the averaged system state is given by
$$\hat\beta=RW'(WRW'+Q+\sigma_0^2I)^{-1}Y. \tag{4.28}$$
The retrieved image is shown in Figure 4.4.

Figure 4.4: Estimator of Average System with Scaled State Covariance

For the next phase of the experiment, we do not scale the covariance matrix of the averaged system state according to the sizes of the subsets of the partition; we would like to quantify the degradation of the estimate.

Step 5: Let $\widetilde R=I_8$ be the non-scaled state covariance. Applying the minimum variance estimation method with $\widetilde R$, we get another estimate $\widetilde\beta=\widetilde RW'(W\widetilde RW'+Q+\sigma_0^2I)^{-1}Y$. The resulting retrieved image is shown in Figure 4.5.

Figure 4.5: Estimator of Average System with Non-scaled State Covariance

Step 6: Compute the errors between the estimates and the true system state by
$$\widehat{\mathrm{Err}}=\|T'\hat\beta-\eta\|, \tag{4.29}$$
$$\widetilde{\mathrm{Err}}=\|T'\widetilde\beta-\eta\|. \tag{4.30}$$
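A compressed sketch of Steps 1-6 is given below. It does not reproduce the geometry of Figure 4.1: the domain is replaced by a hypothetical row of 40 elementary cells grouped into 4 equal super-cells, and a random dense matrix stands in for the line-segment lengths $\widetilde W$. Only the comparison between the scaled and non-scaled state covariances is retained.

```python
import numpy as np

# Compressed sketch of Experiment 4.7.1 on a small hypothetical geometry:
# iid elementary state (sigma^2 = 1), random observation matrix Wt, and a
# comparison of the scaled covariance R = N^{-1} against the unscaled I.
rng = np.random.default_rng(2)
m, n, L, sig0 = 40, 4, 15, 0.1
parent = np.repeat(np.arange(n), m // n)          # 4 super-cells of 10 cells
T = np.zeros((n, m)); T[parent, np.arange(m)] = 1.0
Ninv = np.linalg.inv(T @ T.T)
Wt = rng.random((L, m))                            # hypothetical path lengths
W = Wt @ T.T
Q = Wt @ (np.eye(m) - T.T @ Ninv @ T) @ Wt.T      # iid case, sigma^2 = 1

err = {"scaled": [], "unscaled": []}
for _ in range(200):
    eta = rng.standard_normal(m)                  # Step 2: elementary state
    Y = Wt @ eta + sig0 * rng.standard_normal(L)  # Step 3: observations
    for name, R in (("scaled", Ninv), ("unscaled", np.eye(n))):
        G = R @ W.T @ np.linalg.inv(W @ R @ W.T + Q + sig0**2 * np.eye(L))
        err[name].append(np.linalg.norm(T.T @ (G @ Y) - eta))  # Step 6
# The scaled prior typically yields the smaller average error:
print({k: np.mean(v) for k, v in err.items()})
```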
The error of the retrieval is 2.7955 with the scaled state covariance $R$, while it rises dramatically to 10.0524, an increase of 259.59%, without scaling the state covariance. One might suspect that this outcome is particular to a single realization, so we repeat Steps 2-6 one thousand times with the same observation operators. The average error of the retrieval with scaled covariance is 2.5870, versus 10.1861 for the non-scaled case. We can also compare the histograms of the errors in the two cases.

Figure 4.6: The histogram of errors with scaled state covariance for 1000 estimations

Figure 4.7: The histogram of errors with non-scaled state covariance for 1000 estimations

The minimum variance estimation technique assumes that the covariance matrices provided are accurate descriptions of the properties of these errors. The above experiment provides one example of the degradation of the estimation quality. We are continuing work on evaluating and quantifying the impact of various possible wrong error covariance matrices for the averaged system state on the quality of the estimate.

Chapter 5

Bayesian approach

In the last chapter, we discussed how to derive the error covariance matrices when the correlation of the random state is given. This allows us to avoid mistakes in prescribing covariance matrices for the finite dimensional approximated state. We are now in a position to discuss possible ways to determine the most appropriate uncertainty model among a collection of candidate models. In order to make use of the Law of Large Numbers, we only discuss random states with a finite dimensional elementary state in this chapter. In particular, we first present an approach to modeling the uncertainties in which the covariance matrix of the elementary state vector is parameterized using a small number of parameters. Then, in Section 2, we propose a Bayesian approach to determine the most appropriate covariance matrix among a finite collection of matrices.

5.1 Parameterized model of uncertainties

Consider $\Gamma^e=\{\Gamma^e_1,\Gamma^e_2,\cdots,\Gamma^e_m\}$ as the elementary partition of a domain $D\subset\mathbb{R}^2$, with corresponding elementary state vector $\eta=[\eta_1,\eta_2,\cdots,\eta_m]'$. Our objective is to determine the covariance matrix
$$\Sigma=\begin{bmatrix}\sigma_1^2 & \sigma_{12} & \cdots & \sigma_{1m}\\ \sigma_{12} & \sigma_2^2 & \cdots & \sigma_{2m}\\ \vdots & \vdots & \ddots & \vdots\\ \sigma_{1m} & \sigma_{2m} & \cdots & \sigma_m^2\end{bmatrix}$$
of $\eta$. In general $\Sigma$ is determined by its upper triangular part, with $m(m+1)/2$ entries in total. In this section, we suggest a possibly useful way to model the covariance matrix of the elementary state. The suggested model takes into consideration aspects of many realistic applications in order to reduce the degrees of freedom in covariance modeling.

In many real situations, the state function in each pixel has strong correlations only with the states of neighboring pixels; in comparison, it has weak correlations with the state functions of pixels that are far away. This characteristic of the state function makes it reasonable to propose that the magnitude of the off-diagonal entry $\sigma_{ij}$ $(i\ne j)$ is completely determined by the uncertainties of the states $\eta_i$, $\eta_j$ and the distance between the pixels $\Gamma_i$ and $\Gamma_j$. In the special case where the state function is defined on a subset of $\mathbb{R}^2$, we can use $d^1$ and $d^2$ to denote the horizontal and vertical distances between the two pixels $\Gamma_i$ and $\Gamma_j$, respectively.
More precisely, in this case we have $\sigma_{ij}=\tau_{ij}\sigma_i\sigma_j$, where $\tau_{ij}=\gamma(d^1_{ij},d^2_{ij})$ for some function $\gamma$ with the properties $\gamma(0,0)=1$ and $\lim_{\|(d^1,d^2)\|\to\infty}\gamma(d^1,d^2)=0$. The reason we distinguish the horizontal distance from the vertical distance in this example is that distances in different directions may have different influence on the correlation. One possible choice of $\gamma$ is given by
$$\gamma(d^1,d^2)=\exp\left\{-\begin{bmatrix}d^1 & d^2\end{bmatrix}\Upsilon\begin{bmatrix}d^1\\ d^2\end{bmatrix}\right\}, \tag{5.1}$$
where $\Upsilon$ is a $2\times2$ matrix of the form
$$\Upsilon=\begin{bmatrix}\Upsilon_{11} & \Upsilon_{12}\\ \Upsilon_{12} & \Upsilon_{22}\end{bmatrix}.$$
This selection of $\gamma$ resembles a joint normal probability density function of two variables. It only covers the case of positively correlated system states; other approaches have to be used if the system state estimation errors are negatively correlated. Under this error covariance model, a covariance matrix $\Sigma$ is parameterized by the diagonal entries of the covariance matrix and by the matrix $\Upsilon$, totaling $m+3$ parameters. We note that the selection of the matrix $\Upsilon$ must ensure that the $\Sigma$ generated in this way stays positive definite. An often used empirical approach to modeling system uncertainty is to assume that the uncertainty is proportional to the absolute value of the nominal state; this allows us to further reduce the dimension of the parameter space used in the model of uncertainty.

In many applications the parameters of the uncertainty model have a bounded range of physically meaningful values, and selecting a finite subset of values in this range permits a sufficiently good approximation of all possible covariance matrices. We assume that we work with cases in which a finite number of representative candidate covariance matrices have been identified. The remaining issue is then the identification of the most appropriate covariance matrix among these candidates.
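A minimal sketch of this parameterization on a hypothetical $5\times5$ pixel grid is given below. The matrix $\Upsilon$ and the pixel standard deviations are invented example values, and the final line checks that the resulting $\Sigma$ is positive definite, as required above.

```python
import numpy as np

# Sketch of the covariance model (5.1): sigma_ij = gamma(d1, d2) sigma_i
# sigma_j with a quadratic-form correlation gamma on a small pixel grid.
I = J = 5
ii, jj = np.meshgrid(np.arange(I), np.arange(J), indexing="ij")
centers = np.column_stack([ii.ravel(), jj.ravel()]).astype(float)
m = I * J

Upsilon = np.array([[0.30, 0.05],
                    [0.05, 0.20]])                 # example 2x2 matrix
sigma = np.ones(m)                                 # example pixel std devs

d = centers[:, None, :] - centers[None, :, :]      # pairwise (d1, d2)
gamma = np.exp(-np.einsum("ijk,kl,ijl->ij", d, Upsilon, d))
Sigma = gamma * np.outer(sigma, sigma)

print(np.min(np.linalg.eigvalsh(Sigma)) > 0)       # positive definite: True
```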
The following experiment illustrates an idealized case of selecting the correct covariance matrix among a finite number of candidates. Consider the domain $D=[0,15)\times[0,15)$ with elementary partition $\Gamma^e=\{[i,i+1)\times[j,j+1)\}_{i,j=0}^{14}$ and the partition $\Gamma=\{[0,5)\times[0,5),[0,5)\times[5,10),\cdots,[10,15)\times[5,10),[10,15)\times[10,15)\}$. The associated state vectors are $\eta_{225\times1}$ and $\beta_{9\times1}$, respectively. In this experiment we assume that 16 different covariance matrices $\{\Sigma_1,\Sigma_2,\cdots,\Sigma_{16}\}$ are given as candidates for the covariance of the elementary state vector $\eta$. We selected $L$ lines as the observation operator. A Monte Carlo simulation is performed as follows.

Step 1: Randomly choose a number $k$ from 1 to 16 and generate the true state vector with mean 0 and covariance $\Sigma_k$. Generate the observation $Y^k$ according to the observation operator and the state. Compute the true averaged state vector $\beta^k$ for the given coarse resolution.

Step 2: For $i=1,2,\cdots,16$, regard $\Sigma_i$ as the covariance matrix of the elementary state vector, compute the error covariances according to the results of Chapter 4, and calculate the minimum variance estimate $\hat\beta_i$. Then compute the weighted error between $\hat\beta_i$ and $\beta^k$ according to $\sqrt{15}\,\|\hat\beta_i-\beta^k\|$.

Step 3: Repeat Steps 1-2 one hundred times and compute the average errors.

The figures below show two example results for different observation operators.

Figure 5.1: L=10 and true k is 4.

Figure 5.2: L=30 and true k is 1.

We can see that the true case $k$ always gives the minimum error in the above two experiments, which means that different selections of the covariance matrix of the elementary vector make a significant difference to the behavior of the estimation. Unfortunately, we cannot use this method to pick out the correct covariance matrix, because the true averaged state $\beta^k$ is precisely what needs to be estimated. We give a workable approach in the next section.

5.2 Bayesian Approach

In the previous section, we discussed how, in many applications, it is possible to select the covariance matrix $\Sigma$ of the elementary state vector from a finite set of candidate matrices $\{\Sigma_1,\Sigma_2,\ldots,\Sigma_I\}$. In this section, we show how one can select the most appropriate model using available data.

Suppose the elementary random state is normally distributed with mean 0 and variance $\Sigma_i$ for some unknown $i\in\{1,\cdots,I\}$. For a series of observation operators $H^1,H^2,\cdots$, the forward systems are
$$Y^k=\widetilde W^k\eta^k+\theta^k=W^k\beta^k+\varepsilon^k+\theta^k,$$
where $\eta^1,\eta^2,\cdots$ are i.i.d. $\sim N(0,\Sigma_i)$ and the $\theta^k$ are measurement errors with distribution $N(0,\sigma^2I)$, independent of $\eta^1,\eta^2,\cdots$. It is easy to see that $Y^1,Y^2,\cdots$ are independent of each other and $Y^k\sim N(0,\widetilde W^k\Sigma_i\widetilde W^{k\prime}+\sigma^2I)$. If $\Sigma_j$ is the right choice of covariance matrix, then the $j$-th likelihood function for the observation $Y^k$ is
$$L_k(j)=\frac{1}{(2\pi)^{L_k/2}\sqrt{|B^k_j|}}\,e^{-(Y^k)'(B^k_j)^{-1}(Y^k)/2}, \tag{5.2}$$
where $B^k_j=\widetilde W^k\Sigma_j\widetilde W^{k\prime}+\sigma^2I$. Let $P^0_j=1/I$, and compute $P^k_j$ recursively by the formula
$$P^{k+1}_j=\frac{L_k(j)P^k_j}{\sum_{r=1}^I L_k(r)P^k_r}. \tag{5.3}$$
We have the following result.

Lemma 5.2.1. If for $j=1,2,\cdots,I$ we have $P^0_j\ge0$, $\sum_{j=1}^I P^0_j=1$ and $P^{k+1}_j=\frac{L_k(j)P^k_j}{\sum_{r=1}^I L_k(r)P^k_r}$, then
$$P^{k+1}_j=\frac{\prod_{l=1}^k L_l(j)P^0_j}{\sum_{r=1}^I\prod_{l=1}^k L_l(r)P^0_r}.$$

Proof. By induction. (i) For $k=1$ it is true by definition. (ii) Suppose it is true for $k$, i.e.,
$$P^k_j=\frac{\prod_{l=1}^{k-1}L_l(j)P^0_j}{\sum_{r=1}^I\prod_{l=1}^{k-1}L_l(r)P^0_r}.$$
(iii) Then for $k+1$,
$$P^{k+1}_j=\frac{L_k(j)P^k_j}{\sum_{r=1}^I L_k(r)P^k_r}=\frac{L_k(j)\dfrac{\prod_{l=1}^{k-1}L_l(j)P^0_j}{\sum_{r=1}^I\prod_{l=1}^{k-1}L_l(r)P^0_r}}{\sum_{r=1}^I L_k(r)\dfrac{\prod_{l=1}^{k-1}L_l(r)P^0_r}{\sum_{s=1}^I\prod_{l=1}^{k-1}L_l(s)P^0_s}}=\frac{\prod_{l=1}^k L_l(j)P^0_j}{\sum_{r=1}^I\prod_{l=1}^{k-1}L_l(r)P^0_r}\cdot\frac{\sum_{s=1}^I\prod_{l=1}^{k-1}L_l(s)P^0_s}{\sum_{r=1}^I\prod_{l=1}^k L_l(r)P^0_r}=\frac{\prod_{l=1}^k L_l(j)P^0_j}{\sum_{r=1}^I\prod_{l=1}^k L_l(r)P^0_r}.$$

Then we have the following theorem.

Theorem 5.2.1. Suppose $\widetilde W^1,\widetilde W^2,\cdots$ are i.i.d. random matrices with the same distribution as the bounded $M\times m$ random matrix $\widetilde W$, independent of $\eta^1,\theta^1,\eta^2,\theta^2,\cdots$, and
$$P(\widetilde W\Sigma_i\widetilde W'\ne\widetilde W\Sigma_j\widetilde W')>0 \quad\text{for } j\ne i. \tag{5.4}$$
Then $P^k_i\to1$ and $P^k_j\to0$ $(j\ne i)$ almost surely as $k\to\infty$.

We need the following two lemmas in the proof of this theorem.

Lemma 5.2.2 ([13]). Let $X$ be an $n\times1$ vector of random variables and let $A$ be an $n\times n$ symmetric matrix. If $E[X]=\mu$ and $\mathrm{Var}[X]=\Sigma$, then $E[X'AX]=\operatorname{trace}(A\Sigma)+\mu'A\mu$.

Here and below, $X_l$ denotes the log-likelihood ratio increment
$$X_l=\frac12\log|B^l_i(B^l_j)^{-1}|+\frac12(Y^l)'[(B^l_i)^{-1}-(B^l_j)^{-1}]Y^l\qquad(j\ne i),$$
which arises in the proof of the theorem given below. Let $\mathcal{F}=\sigma(\widetilde W^1,\widetilde W^2,\cdots)$. Since $Y^l\mid\mathcal{F}\sim N(0,B^l_i)$ and $B^l_j\in\mathcal{F}$, by the above lemma we obtain
$$E\{X_l\mid\mathcal{F}\}=\frac12\log|B^l_i(B^l_j)^{-1}|+\frac12\operatorname{trace}\{[(B^l_i)^{-1}-(B^l_j)^{-1}]B^l_i\}=\frac12\log|B^l_i(B^l_j)^{-1}|+\frac{M}{2}-\frac12\operatorname{trace}[(B^l_j)^{-1}B^l_i].$$

Lemma 5.2.3. $E\{X_l\mid\mathcal{F}\}\le0$ and $c=EX_l<0$.

Proof. Let $A$ be an $M\times M$ positive definite symmetric matrix and define $f(A)=\log|A|-\operatorname{trace}(A)$. The function $f$ reaches its maximum when $A=I$. To see this, denote by $\lambda_1,\lambda_2,\cdots,\lambda_M$ the eigenvalues of $A$; clearly $\lambda_i>0$ and
$$f(A)=\sum_{i=1}^M[\log(\lambda_i)-\lambda_i].$$
$f$ reaches its maximum $-M$ when $\lambda_i=1$ for all $i=1,2,\cdots,M$, i.e., $A=I$. Therefore
$$E\{X_l\mid\mathcal{F}\}=\frac12\Big[f\big(B^l_i(B^l_j)^{-1}\big)+M\Big]\le0.$$
Equation (5.4) guarantees that
$$P\{B^l_i(B^l_j)^{-1}\ne I\}=P\{B^l_i\ne B^l_j\}=P\{\widetilde W\Sigma_i\widetilde W'\ne\widetilde W\Sigma_j\widetilde W'\}>0,$$
which implies $P(E\{X_l\mid\mathcal{F}\}<0)>0$.
This leads to $EX_l=E\{E\{X_l\mid\mathcal{F}\}\}<0$, since $E\{X_l\mid\mathcal{F}\}\le0$. Now we can prove Theorem 5.2.1.

Proof. By Lemma (5.2.1), we have
$$P^{k+1}_i=\frac{\prod_{l=1}^k L_l(i)P^0_i}{\sum_{r=1}^I\prod_{l=1}^k L_l(r)P^0_r}=\frac{\prod_{l=1}^k L_l(i)}{\sum_{r=1}^I\prod_{l=1}^k L_l(r)}=\frac{1}{1+\sum_{j\ne i}\dfrac{\prod_{l=1}^k L_l(j)}{\prod_{l=1}^k L_l(i)}}=\frac{1}{1+\sum_{j\ne i}\exp\Big\{k\cdot\frac1k\log\dfrac{\prod_{l=1}^k L_l(j)}{\prod_{l=1}^k L_l(i)}\Big\}}.$$
Let us consider, for $j\ne i$,
$$Z^k_j=\frac1k\log\frac{\prod_{l=1}^k L_l(j)}{\prod_{l=1}^k L_l(i)}=\frac1k\sum_{l=1}^k[\log L_l(j)-\log L_l(i)]$$
$$=\frac1k\sum_{l=1}^k\Big\{\Big[-\frac{L_l}{2}\log(2\pi)-\frac12\log|B^l_j|-\frac12(Y^l)'(B^l_j)^{-1}Y^l\Big]-\Big[-\frac{L_l}{2}\log(2\pi)-\frac12\log|B^l_i|-\frac12(Y^l)'(B^l_i)^{-1}Y^l\Big]\Big\}$$
$$=\frac1k\sum_{l=1}^k\Big[\frac12\log|B^l_i(B^l_j)^{-1}|+\frac12(Y^l)'[(B^l_i)^{-1}-(B^l_j)^{-1}]Y^l\Big].$$
We denote
$$X_l=\frac12\log|B^l_i(B^l_j)^{-1}|+\frac12(Y^l)'[(B^l_i)^{-1}-(B^l_j)^{-1}]Y^l.$$
Since $\widetilde W$ is bounded, from Lemma 5.2.3 we have $E|X_1|<\infty$. Since the sets $\{\widetilde W^1,\widetilde W^2,\cdots\}$, $\{\eta^1,\eta^2,\cdots\}$ and $\{\theta^1,\theta^2,\cdots\}$ are each i.i.d. and each set is independent of the others,
$$\{Y^l=\widetilde W^l\eta^l+\theta^l,\ B^l_j=\widetilde W^l\Sigma_j(\widetilde W^l)'+\sigma^2I,\ B^l_i=\widetilde W^l\Sigma_i(\widetilde W^l)'+\sigma^2I\}$$
are also i.i.d. Hence $X_1,X_2,\cdots$ are i.i.d. By the strong law of large numbers, $Z^k_j=\frac1k\sum_{l=1}^k X_l\to EX_l=c<0$ almost surely as $k\to\infty$. So, almost surely,
$$P^{k+1}_i=\frac{1}{1+\sum_{j\ne i}\exp\{kZ^k_j\}}\to\frac{1}{1+\sum_{j\ne i}0}=1 \quad\text{as } k\to\infty.$$

5.3 Validation of Bayesian approach

In this section, we use numerical simulations to illustrate the theorem proved in the last section. We consider the domain $D=[0,15)\times[0,15)$ with the elementary partition $\Gamma^e=\{[m,m+1)\times[n,n+1),\ 0\le m,n\le14\}$. The first step is to generate a group of candidate covariance matrices for the elementary state vector $\eta$. For each $k$, there is a unique pair $(m,n)$ such that the $k$-th element of $\Gamma^e$ is $[m,m+1)\times[n,n+1)$. Suppose the variance of $\eta_k$ follows
$$\sigma_k^2=100\cos\Big(-\frac\pi2+\frac{m\pi}{15}-\frac{\pi}{10}\Big)\exp\Big\{\frac12\Big(1-\frac{n}{\alpha}\Big)-3\exp\Big(-\frac{n}{\alpha}\Big)\Big(-\frac\pi2+\frac{m\pi}{15}\Big)\Big\}+0.5, \tag{5.5}$$
where $\alpha$ is an unknown parameter, while the covariance between $\eta_{k_1}$ and $\eta_{k_2}$ is
$$\exp\{-[m_1-m_2,\ n_1-n_2]\,R\,[m_1-m_2,\ n_1-n_2]'\}\,\sigma_{k_1}\sigma_{k_2}, \tag{5.6}$$
where $R$ is a positive definite $2\times2$ matrix. In this simulation we used
$$R=\begin{bmatrix}2 & 1\\ 1 & 1\end{bmatrix}.$$
Choosing candidate values for $\alpha$ then yields a family of candidate covariance matrices $\Sigma_1,\Sigma_2,\cdots,\Sigma_I$. Here we used $\alpha=0.5,2.4286,4.3571,\cdots,14$ to generate the candidates $\Sigma_1,\Sigma_2,\cdots,\Sigma_{16}$.
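Before describing the observation operators, here is a minimal sketch of the recursion (5.3) itself. To keep it self-contained, the candidate family is an invented set of scalar multiples of the identity rather than the family generated by (5.5)-(5.6), and the observation matrices are random stand-ins; the posterior mass should nevertheless concentrate on the true candidate, as Theorem 5.2.1 predicts.

```python
import numpy as np

# Sketch of the recursive posterior (5.3) over candidate covariances.
rng = np.random.default_rng(3)
m, L, sig0, true_k = 50, 10, 0.1, 2
cands = [(0.5 + 0.5 * j) * np.eye(m) for j in range(5)]   # invented family

def loglik(Y, B):
    """log N(Y; 0, B) up to an additive constant, cf. (5.2)."""
    _, logdet = np.linalg.slogdet(B)
    return -0.5 * (logdet + Y @ np.linalg.solve(B, Y))

P = np.full(len(cands), 1.0 / len(cands))         # uniform prior P^0_j
for k in range(200):
    Wt = rng.random((L, m))                       # random observation operator
    eta = rng.multivariate_normal(np.zeros(m), cands[true_k])
    Y = Wt @ eta + sig0 * rng.standard_normal(L)
    ll = np.array([loglik(Y, Wt @ S @ Wt.T + sig0**2 * np.eye(L))
                   for S in cands])
    P = P * np.exp(ll - ll.max())                 # recursion (5.3), stabilized
    P /= P.sum()
print(P.round(3))                                 # mass concentrates at true_k
```

Subtracting the maximum log-likelihood before exponentiating changes nothing after normalization but avoids numerical underflow when many observations have been accumulated.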
Once we have the candidates, we need to generate observation operators. A Monte Carlo simulation technique is used to generate randomly placed lines that cross the domain $D$ in a uniform and random fashion. The four sides of the rectangular domain $D$ are labeled $S_1$-$S_4$. Each side $i$ has an arbitrarily designated starting point $B_i$ and ending point $E_i$. A line across the domain $D$ must intersect two of the sides of $D$. The process of generating a randomly placed line across the domain $D$ consists of two steps. The first step randomly selects one of the six possible pairs of sides that the line will intersect. The second step generates two numbers $\xi_1$ and $\xi_2$ between 0 and 1: the point of intersection with the first selected side is placed at a distance equal to the fraction $\xi_1$ of the length of that side from its starting point, and similarly the point of intersection with the second side is placed at the fraction $\xi_2$ of the second selected side. Figure 5.3 shows six randomly generated lines across the domain $D$. For $l=1$, the two selected sides are $S_1$ and $S_2$, with lengths $d_1$ and $d_2$, respectively. The intersection point of the line with side $S_1$ is at a distance $d_1\cdot\xi_1$ from the point $B_1$; similarly, the intersection point of the line with side $S_2$ is at a distance $d_2\cdot\xi_2$ from the point $B_2$.

Figure 5.3: Examples of randomly placed lines across domain D.

Each time, we randomly generate $L$ lines; the $L\times225$ matrix $\widetilde W$ is then the observation matrix, where $\widetilde W_{lk}$ is the path length in the $k$-th element for line $l$. The simulation below uses $L=10$.

At each time $k$, we used $0.01I$ as the covariance matrix of the measurement error and $\Sigma_{14}$ as the true covariance matrix of the underlying system to generate observations. We then computed $P^k_i$, $i=1,\cdots,16$, by (5.3). The following graph shows how $P^k_{14}$ changes as $k$ increases.

Figure 5.4: $P^k_{14}$ vs $k$

It is easy to see that $P^k_{14}$ approaches 1 as we accumulate more and more observations, which confirms the result we proved.

Chapter 6

Application to 3-D tomography of plasmasphere

In this chapter, we carry out the 3-D tomographic inversion of electron densities in the plasmasphere. This is a joint project between UCLA and USC. In Section 6.1, we give a further discussion of the 3-D tomography problem for the plasmasphere. In Section 6.2, we set up a reasonable forward model for the inversion problem. In Section 6.3, we apply the least squares method to the inversion. In Section 6.4, we select the uncertainty of the underlying random state from a family of candidates and retrieve the electron densities by the minimum variance method.

6.1 Further discussion about 3-D tomography

In Chapter 3, we gave a brief introduction to the plasmasphere and to its three dimensional tomography problem. The data are collected from one LEO satellite, which communicates with several GPS satellites at the same time. Figure 6.1 shows 20 minutes of data.

Figure 6.1: 20 mins of data from 1 LEO Satellite

From this graph, we can see that the data from one LEO satellite are limited, since most of the space is not penetrated; a full three dimensional retrieval is thus nearly impossible. We can also see that the paths from the same GPS satellite are very close to each other, which means we are sampling the same location repeatedly. Consequently, the number of useful data points is very small compared with the total amount of data, and it is quite challenging to do the retrieval with such limited data. To have as much data as possible, we take several hours of data for the retrieval. But the Earth rotates on its axis while the plasmasphere is mostly governed by the sun: the plasmasphere in daytime is totally different from that at night, and the densities of the plasma depend on the position relative to the sun, and thus on local time. In our approach, we therefore treat the sun as fixed. As we said, it is impossible to do the full 3-D inversion; we need to discretize the problem and find the observation operators.

6.2 Parametrization

In this section we cast our problem into the form of Chapter 4, so that we can apply the results we already have to the inversion. The underlying system state is clearly the electron density $\zeta(z,\omega)$ within a certain domain $D$ in space. The observation operators are line integrals along the paths connecting the LEO and GPS satellites. We use the electron density computed from the physics-based model created by P. A. Webb and E. A. Essex ([15]) in 2004 as our prior.

The first step is to find an appropriate approximate random state. We tried the averaged random state approximation first.
For computational purposes, we adopted a spherical coordinate system. Any point $z$ in space can be located by its longitude $\theta$, latitude $\varphi$ and distance $r$ from the Earth's center. The domain $D$ we are interested in is $[0,360)\times[-90,90)\times[R_s,R_e)$. We need to find a partition of the domain $D$. Suppose $0=\theta_0<\theta_1<\cdots<\theta_I=2\pi$, $\pi/2=\varphi_0>\varphi_1>\cdots>\varphi_J=-\pi/2$ and $R_s=r_0<r_1<\cdots<r_K=R_e$; then we automatically have the partition
$$\Gamma=\{[\theta_i,\theta_{i+1})\times[\varphi_{j+1},\varphi_j)\times[r_k,r_{k+1}),\ i=0,\cdots,I-1,\ j=0,\cdots,J-1,\ k=0,\cdots,K-1\}.$$
We also call an element of the partition a voxel. The voxel $[\theta_i,\theta_{i+1})\times[\varphi_{j+1},\varphi_j)\times[r_k,r_{k+1})$ corresponds to the $((i-1)JK+(j-1)K+k)$-th element of $\Gamma$. It is easy to see that the $l$-th observation $\mathrm{TEC}_l=Y_l$, obtained from the line $P_l$, can be written as
$$Y_l=\sum_{m=1}^{IJK}W_{ml}\zeta_m+\varepsilon_l,$$
where $W_{ml}$ is the path length of $P_l$ in voxel $m$, $\zeta_m$ is the random state averaged over the $m$-th voxel, and $\varepsilon_l$ is the sum of the $l$-th measurement error and representation error.

After trying this approach on the data for some time slots, we found that it does not work well, for the following reasons. First, the majority of the space is not penetrated, so the fraction of voxels that have been penetrated is small. Second, the altitude of the GPS satellites is about 4 Earth radii, while the altitude of the LEO satellite is about 1/5 of an Earth radius. This geometry means that most of the data are almost vertically integrated electron densities, so we lack information on how the density is distributed vertically. To overcome these difficulties, we decided to take advantage of the prior and use the finite series approximation. The idea is to discretize the space according to latitude and longitude, to construct for each voxel a function of altitude from the prior, and to multiply it by a corresponding weight function on that voxel. Our task is then to find the orthogonal projection of the random state onto the space spanned by these functions. We now explain this in detail.

Let $D=[0,2\pi)\times[-\pi/2,\pi/2]\times[R_0,R_1]$ be the domain of interest and let $\zeta(z,\omega)$ be the random state we want to estimate. Suppose $0=\theta_0<\theta_1<\cdots<\theta_I=2\pi$ and $\pi/2=\varphi_0>\varphi_1>\cdots>\varphi_J=-\pi/2$; then a partition of $D$ is
$$\Gamma=\{[\theta_i,\theta_{i+1})\times[\varphi_{j+1},\varphi_j)\times[R_0,R_1),\ i=0,\cdots,I-1,\ j=0,\cdots,J-1\}.$$
In total we have $M=I\cdot J$ voxels. For the $m$-th voxel $V_m$, there are unique $i,j$ such that $V_m=[\theta_i,\theta_{i+1})\times[\varphi_{j+1},\varphi_j)\times[R_0,R_1)$; the prior along the center line of this voxel, i.e.,
$$\theta=\frac{\theta_i+\theta_{i+1}}{2},\qquad\varphi=\frac{\varphi_{j+1}+\varphi_j}{2},$$
is then a function of the distance from the Earth's center, which we denote $f_m(r)$. A natural idea is to project $\zeta$ onto the space spanned by $\{f_m(|z|)\chi(z\in V_m)\}_{m=1}^M$, i.e., to approximate $\zeta$ by $\sum_{m=1}^M\alpha_mf_m(|z|)\chi(z\in V_m)$. However, this approximation is constant over points of equal altitude within a voxel, which is not realistic. What is more, let us compare the total electron content computed from $\sum_{m=1}^M\alpha_mf_m(|z|)\chi(z\in V_m)$ with $\alpha_m=1$ against the observed total electron content in Figure 6.2. We can see that the TEC from this approximate state is not smooth enough, although the model makes sense. To avoid this discontinuity, we modify our approach slightly. First, for a given set of data, we consider only the voxels that have been penetrated, still calling them $V_1,V_2,\cdots,V_M$ for simplicity. We call two voxels neighbors if the intersection of their closures is not empty.
Figure 6.2: Compare observed TEC with TEC from the model 1

Now we construct $M$ weight functions which quantify how the electron density in each voxel is related to its neighboring points. For the $m$-th voxel $V_m$, let $\theta_i$ and $\varphi_j$ be the longitude and latitude of its centerline, respectively; then for any point $z=(\theta,\varphi,r)\in V_k$, the $m$-th function is defined as
$$d_m(z)=\begin{cases}\exp\Big\{-\dfrac{(\theta-\theta_i)^2}{\sigma_1^2}-\dfrac{(\varphi-\varphi_j)^2}{\sigma_2^2}\Big\} & \text{if }V_m\cap V_k\ne\emptyset,\\[6pt] 0 & \text{otherwise,}\end{cases} \tag{6.1}$$
where $\sigma_1,\sigma_2$ are two adjustable positive numbers. We need the weights to sum to 1, so we introduce the normalized weight functions $\rho_1,\cdots,\rho_M$, where
$$\rho_m=\frac{d_m}{\sum_{k=1}^M d_k}.$$
Denote $\delta_m(z)=\rho_m(z)f_m(|z|)$; we then seek the finite series approximation $\bar\zeta=\sum_{m=1}^M\alpha_m\delta_m$ of the random state $\zeta$. Now let us set up the forward model system. The observation operators are line integrals: for each line $P_l$, the observation is the TEC along the line, denoted $Y_l$, and the observation operator is the line integral along $P_l$. With the approximated random state $\sum_{m=1}^M\alpha_m\delta_m(z)$,
$$Y_l=\int_{P_l}\zeta(z)\,dP_l+\nu_l=\int_{P_l}\sum_{m=1}^M\alpha_m\delta_m(z)\,dP_l+\varepsilon_l+\nu_l=\sum_{m=1}^M\Big[\int_{P_l}\rho_m(z)f_m(|z|)\,dP_l\Big]\alpha_m+\varepsilon_l+\nu_l,$$
where $\varepsilon_l$ is the representation error and $\nu_l$ is the measurement error. In matrix form, the model is
$$Y=W\alpha+\varepsilon+\nu, \tag{6.2}$$
where $Y=[Y_1,\cdots,Y_L]'$, $W_{lm}=\int_{P_l}\rho_m(z)f_m(|z|)\,dP_l$, $\alpha=[\alpha_1,\cdots,\alpha_M]'$, $\varepsilon=[\varepsilon_1,\cdots,\varepsilon_L]'$ and $\nu=[\nu_1,\cdots,\nu_L]'$. Now let us compare the observed TEC with the TEC computed from the model again, with $\alpha_m=1$. We can see from Figure 6.3 that the discontinuity has disappeared and that the two curves have roughly the same shape, which indicates that this type of modeling makes sense. With the forward model in hand, we are ready to do the inversion.

Figure 6.3: Compare observed TEC with TEC from the model 2

6.3 Least Square Retrieval

The least squares approach is always the first thing to try, since it is the simplest. We take the data from 2-5 am on Oct 11th, 2004 as an example. In total, we have 5055 observations. We divide the space every 10 degrees in both longitude and latitude, so the total number of voxels is 648. Only 322 voxels have been penetrated, and we find that the rank of the observation matrix is also 322. So we tried the least squares approach; Figure 6.4 shows the resulting estimate of $\alpha$.

Figure 6.4: Least Square retrieval

It is an unsuccessful attempt, since the huge negative values of $\alpha$ lead to negative densities. This is because, although the observation matrix is mathematically of full rank, it is ill-conditioned. Let $\sigma_1,\sigma_2,\cdots,\sigma_{322}$ be the singular values of the observation matrix $W$; then there exist orthogonal matrices $U,V$ such that $W=USV'$, where
$$S=\begin{bmatrix}\sigma_1 & 0 & \cdots & 0\\ 0 & \sigma_2 & \cdots & 0\\ \vdots & & \ddots & \vdots\\ 0 & 0 & \cdots & \sigma_{322}\\ 0 & 0 & \cdots & 0\\ \vdots & & & \vdots\end{bmatrix}.$$
We plot the singular values of $W$ in Figure 6.5 and find that they drop rapidly to 0.

Figure 6.5: Singular values of the observation matrix W.

Let $S_I$ consist of the first $m$ dominant columns of $S$ and let $S_q$ be the remaining columns. Let $X=V'\alpha$, let $X_I$ be the first $m$ rows of $X$ and $X_q$ the rest; then system (6.2) can be written as
$$Y=W\alpha+\varepsilon+\nu=USV'\alpha+\varepsilon+\nu=U\begin{bmatrix}S_I & S_q\end{bmatrix}\begin{bmatrix}X_I\\ X_q\end{bmatrix}+\varepsilon+\nu=US_IX_I+US_qX_q+\varepsilon+\nu.$$
Because $S_q$ is close to 0, we can conclude that $US_qX_q\approx0$. We can simply set $X_q=0$ and apply the least squares method to the system $Y=US_IX_I+\varepsilon+\nu$ to get
$$\widehat X_I=(S_I'U'US_I)^{-1}S_I'U'Y.$$
The estimator of $\alpha$ is then given by
$$\hat\alpha=V\begin{bmatrix}\widehat X_I\\ 0\end{bmatrix}.$$
For the previous example, we take $m=75$ and perform the inversion. The new retrieval of $\alpha$ is shown in Figure 6.6.

Figure 6.6: Modified least square retrieval
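A minimal sketch of this truncated-SVD retrieval is given below. The matrix $W$, the data $Y$ and the truncation level are synthetic stand-ins for the ones above, with an exponentially decaying column scaling playing the role of the ill-conditioning.

```python
import numpy as np

# Sketch of truncated-SVD least squares: keep the r dominant singular
# triplets of W and solve only in that subspace, as described above.
rng = np.random.default_rng(4)
L, M, r = 120, 80, 25
W = rng.random((L, M)) @ np.diag(np.exp(-0.2 * np.arange(M)))  # ill-conditioned
alpha_true = rng.random(M)
Y = W @ alpha_true + 0.01 * rng.standard_normal(L)

U, s, Vt = np.linalg.svd(W, full_matrices=False)
X_I = (U[:, :r].T @ Y) / s[:r]        # least squares on the dominant modes
alpha_hat = Vt[:r].T @ X_I            # back to the original coordinates

print(np.linalg.norm(W @ alpha_hat - Y))   # residual after retrieval
```

Discarding the trailing singular directions trades a small bias for a large reduction in the noise amplification caused by the tiny singular values, which is what suppresses the spurious negative densities.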
From the graph, almost every $\alpha$ is above 0. What is more, let us compare the histograms of the residuals before and after the retrieval in Figures 6.7 and 6.8; we can see that the retrieval reduces the residuals considerably.

Figure 6.7: Residual before retrieval, mean=1.0, variance=1.7

Figure 6.8: Residual after retrieval, mean=0.2, variance=0.9

6.4 Minimum Variance Retrieval

We want to use the minimum variance method to refine our retrieval from the least squares method. To do that, we first need to compute the system uncertainties. Because $\delta_1,\delta_2,\cdots,\delta_M$ are not orthonormal, we use the Gram-Schmidt process to normalize $\{\delta_1,\cdots,\delta_M\}$ into $\{\phi_1,\cdots,\phi_M\}$, and we denote the transformation matrix between them by $T$, i.e.,
$$[\phi_1,\cdots,\phi_M]'=T[\delta_1,\cdots,\delta_M]'.$$
Then the projection of $\zeta$ onto $\mathrm{span}\{\phi_1,\cdots,\phi_M\}$ is the same as onto $\mathrm{span}\{\delta_1,\cdots,\delta_M\}$. If we use $\sum_{m=1}^M\beta_m\phi_m$ to approximate $\zeta$, then similarly
$$Y_l=\int_{P_l}\zeta(z)\,dP_l+\nu_l=\int_{P_l}\sum_{m=1}^M\beta_m\phi_m\,dP_l+\varepsilon_l+\nu_l=\sum_{m=1}^M\Big[\int_{P_l}\phi_m\,dP_l\Big]\beta_m+\varepsilon_l+\nu_l=\sum_{m=1}^M\Big[\int_{P_l}\sum_{k=1}^M T_{mk}\delta_k\,dP_l\Big]\beta_m+\varepsilon_l+\nu_l=\sum_{m=1}^M\sum_{k=1}^M T_{mk}W_{lk}\beta_m+\varepsilon_l+\nu_l,$$
and thus (6.2) becomes
$$Y=WT'\beta+\varepsilon+\nu. \tag{6.3}$$
We use the unbiased prior estimator $\hat\alpha$ from least squares to get an unbiased estimator of $\beta$ (since $\alpha=T'\beta$),
$$\beta_0=[\beta_{01},\beta_{02},\cdots,\beta_{0M}]'=(T')^{-1}\hat\alpha,$$
and the above equation becomes
$$Y-WT'\beta_0=WT'(\beta-\beta_0)+\varepsilon+\nu. \tag{6.4}$$
We are thus actually working with the underlying random system state $\zeta-\sum_{m=1}^M\beta_{0m}\phi_m$. For notational simplicity, we still write $\zeta$ for $\zeta-\sum_{m=1}^M\beta_{0m}\phi_m$, $Y$ for $Y-WT'\beta_0$ and $\beta$ for $\beta-\beta_0$; thus we still have the linear system (6.3). With the correlation function $R(x,y)$ of $\zeta$ and $\mathrm{Var}(\nu)=\sigma_0^2I$, we can easily compute the covariance matrix of $\beta-\beta_0$ and the covariances between the quantities involved from (4.10), (4.7) and (4.16), i.e.,
$$EY_kY_l=\int_{P_k}\int_{P_l}R(z,y)\,dz\,dy+\sigma_0^2\chi_{(k=l)}, \tag{6.5}$$
$$E\beta_m\beta_n=\int_D\int_DR(z,y)\phi_m(z)\phi_n(y)\,dz\,dy, \tag{6.6}$$
$$E\beta_mY_l=\int_{P_l}\int_DR(z,y)\phi_m(z)\,dz\,dy. \tag{6.7}$$

The first step is to determine the correlation function $R(z_1,z_2)$. We assume that $R(z,z)$ is proportional to $n_0^2(z)$, the squared nominal density, i.e., $R(z,z)=p\,n_0^2(z)$, and
$$R(z_1,z_2)=\exp\{-|z_1-z_2|^2/\sigma^2\}\sqrt{R(z_1,z_1)}\sqrt{R(z_2,z_2)}.$$
The task is then to identify $p$ and $\sigma$. Here we simply test $p$ from $\{0.05,0.1,0.15,\cdots,0.3\}$ and $\sigma$ from $\{R/4,R/2,R,2R,3R\}$, where $R=6371$ km is the Earth radius. We assign $\sigma_0=2$ and compute the variance of $Y_l$ for each $p$ and $\sigma$ by equation (6.5). We then plug the variance of $Y_l$ into equation (5.2) to compute the likelihood function for each $p$ and $\sigma$, and calculate the posterior density functions by equation (5.3) to see which one approaches 1. We select $p=0.05$ and $\sigma=R/4$. We then numerically approximate the system uncertainties by equations (6.5)-(6.7) and carry out the inversion. Figure 6.9 gives the residuals after the minimum variance estimation.

Figure 6.9: Residual after minimum variance retrieval, mean=0.1, variance=0.5
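The Gram-Schmidt step at the beginning of this section can also be carried out numerically through a Cholesky factorization of the Gram matrix of the $\delta_m$: if $G=(\langle\delta_i,\delta_j\rangle)=LL'$, then $T=L^{-1}$ gives an orthonormal family $\phi=T\delta$, which is equivalent to Gram-Schmidt since $L^{-1}$ is lower triangular. The sketch below uses randomly generated functions as stand-ins for the $\delta_m$.

```python
import numpy as np

# Sketch of the orthonormalization step: build T with [phi]' = T [delta]'
# from a Cholesky factorization of the Gram matrix of the delta_m.
rng = np.random.default_rng(5)
M, n_grid = 8, 2000
x = np.linspace(0.0, 1.0, n_grid)
delta = rng.random((M, n_grid)) + 0.1          # sampled stand-in basis functions

G = delta @ delta.T / n_grid                   # Gram matrix <delta_i, delta_j>
Lc = np.linalg.cholesky(G)                     # G = Lc Lc'
T = np.linalg.inv(Lc)                          # T G T' = I
phi = T @ delta

print(np.allclose(phi @ phi.T / n_grid, np.eye(M)))  # orthonormal: True
```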
Chapter 7

Conclusion

In this thesis, we have established a framework for the modeling of random states taking values in a Banach space. We have shown that this framework allows us to derive covariance properties of finite dimensional approximations to infinite dimensional filtering problems. In particular, we have established the relationship between covariance matrices for approximations of an infinite dimensional state space at different resolutions, and we have precisely defined the concept of representation error. We have also demonstrated that consistent modeling of the covariance of the finite dimensional approximations leads to statistically consistent filtering results. By removing inconsistencies in the modeling of covariances for different approximations of an infinite dimensional filtering problem, we have gained the ability to use a Bayesian approach to select the most appropriate covariance matrix among a finite family of candidates.

The work presented here is motivated by data assimilation problems for the Earth's ionosphere and plasmasphere. Using the approach developed in this work, we have demonstrated that these filtering problems with infinite dimensional state space can be solved successfully. As with any real world application, there are still many challenging issues to be addressed in future research. In particular, we still need to develop methods to combine a larger diversity of data, to allow higher fidelity of retrieval.

References

[1] Adams, Robert A., Sobolev Spaces, Academic Press, Inc. (1975).
[2] Bierman, G. J., Factorization Methods for Discrete Sequential Estimation, Academic Press (1977).
[3] Folland, Gerald B., Real Analysis: Modern Techniques and Their Applications, Wiley-Interscience (1999).
[4] Gikhman, Iosif I. and Skorokhod, Anatoli V., The Theory of Stochastic Processes I, Springer.
[5] Hajj, George A., Wilson, Brian D., Wang, Chunming and Pi, Xiaoqing, Ionospheric Data Assimilation by Use of the Kalman Filter, 10th International Ionospheric Effects Symposium (2002), 231-238.
[6] Hanea, R. G., Velders, G. J. M. and Heemink, A., Data assimilation of ground-level ozone in Europe with a Kalman filter and chemistry transport model, J. Geophys. Res., 109 (2004), D10302, doi:10.1029/2003JD004283.
[7] Houtekamer, P. L. and Mitchell, H. L., Data assimilation using an ensemble Kalman filter technique, Mon. Wea. Rev., 126 (1998), 796-811.
[8] Kalman, Rudolph Emil, A New Approach to Linear Filtering and Prediction Problems, Transactions of the ASME, Journal of Basic Engineering, 82 (1960), 35-45.
[9] Kelley, M. C., The Earth's Ionosphere, Academic Press, Inc.: San Diego, 1989.
[10] Kilpinen, J., The application of Kalman filter in statistical interpretation of numerical weather forecasts, Proceedings of the 12th Conference on Probability and Statistics in Atmospheric Sciences, American Meteorological Society (1992), 11-16.
[11] Luenberger, David G., Optimization by Vector Space Methods, John Wiley & Sons, Inc., 1969.
[12] Persson, A. O., Kalman filtering: a new approach to adaptive statistical interpretation of numerical meteorological forecasts, WMO TD 421 (1991).
[13] Seber, George A. F. and Lee, Alan J., Linear Regression Analysis, John Wiley & Sons, Inc., 2003.
[14] Tillack, Gerd-Rüdiger, Artemiev, V. M. and Naumov, A. O., Dynamic image reconstruction for CT applications, AIP Conference Proceedings, 557 (2001), 748-755.
[15] Webb, P. A. and Essex, E. A., A dynamic global model of the plasmasphere, Journal of Atmospheric and Solar-Terrestrial Physics, 66 (2004), 1057-1073.
Abstract
In this dissertation, we first established a mathematical framework for the approximation of a filtering problem for infinite dimensional systems with random state. In particular, we defined random states for infinite dimensional systems. We also established the statistical properties of finite dimensional approximations and gave a rigorous treatment of the contributions to the observation error due to approximation. Numerical simulations were performed to validate the results. Secondly, a Bayesian approach was developed to select the most appropriate covariance matrix among a finite number of candidates; Monte Carlo simulations were conducted to validate this approach. Finally, we applied the results to the 3-D estimation problem for plasmasphere electron density.