Close
About
FAQ
Home
Collections
Login
USC Login
Register
0
Selected
Invert selection
Deselect all
Deselect all
Click here to refresh results
Click here to refresh results
USC
/
Digital Library
/
University of Southern California Dissertations and Theses
/
Prohorov Metric-Based Nonparametric Estimation of the Distribution of Random Parameters in Abstract Parabolic Systems with Application to the Transdermal Transport of Alcohol
(USC Thesis Other)
Prohorov Metric-Based Nonparametric Estimation of the Distribution of Random Parameters in Abstract Parabolic Systems with Application to the Transdermal Transport of Alcohol
PDF
Download
Share
Open document
Flip pages
Contact Us
Contact Us
Copy asset link
Request this asset
Transcript (if available)
Content
Prohorov Metric-Based Nonparametric Estimation of the Distribution of Random Parameters in Abstract Parabolic Systems with Application to the Transdermal Transport of Alcohol by Lernik Asserian A Dissertation Presented to the FACULTY OF THE GRADUATE SCHOOL UNIVERSITY OF SOUTHERN CALIFORNIA In Partial Fulfillment of the Requirements for the Degree DOCTOR OF PHILOSOPHY (Applied Mathematics) August 2021 Copyright 2021 Lernik Asserian Dedication This dissertation is dedicated . . . to my family. ii Acknowledgements I would like to extend my great appreciation to my advisor Dr. Gary Rosen for his tremendous support and care in pursuing my PhD. I could not have asked for a better advisor and mentor. Besides his insightful guidance on my research and encouragement on pursuing my PhD, his great personality made my journey at USC invaluable. I would like to thank my committee members, Dr. Chunming Wang, Dr. Jay Bartroff, and Dr. Sergey Lototsky for their time and feedback. And, I would like to especially thank Dr. Susan Luczak for her care and support, as well as, collaborations on publishing the research findings. I would also like to thank Dr. Susan Montgomery, Director of Graduate Studies, for her endless care, feedback, and support during my PhD program. And, Amy Yung, Office Manager, for always being available, responding to my emails quickly, and answering all my questions. I would like to thank my officemates and friends, Maria, Irmak, Zhanerke, Keenan, and Mark, who made my journey at USC much memorable and fun. And, I am especially thankful for my academic sister, Melike, for her unlimited support. I would also like to thank USC Women in Math: Charlotte’s Web group, as well as Math Graduate Student Association (MGSA), especially Calum, for organizing fun, friendly, and exciting events. And, I would also like to thank all my students throughout these five years who made my journey at USC far exciting, fulfilling my great passion for teaching. iii Finally, I would like to thank my parents and my sister for their love and support. As an immigrant to the United States, I am always thankful for the great sacrifices that they have made for me to flourish in the land of opportunities. I would like to especially thank my father, who has always been very supportive. I can never forget the loss of my grandmother a few days before my qualifying exam. She did not know how to read and write, and was not even familiar with numbers to use a phone. As a first generation college student, I stood there, in front of the committee members, so thrilled to be pursuing my PhD in Applied Mathematics, making my grandma proud in Heaven. Last but not least, the best thing during these past five years was having my best friend, love of my life, my husband, Derenik, by my side. We have been through this challenging journey together, and I undoubtedly could not have done this without his endless love and support. He celebrated the completion of every milestones, and reminded me to focus on the end result. He kept motivating me by saying “I want to see you in the USC cardinal doctoral regalia”. Thank you for everything! iv Table of Contents Dedication ii Acknowledgements iii List Of Tables vii List Of Figures viii Abstract x Chapter 1: Introduction and Motivation 1 1.1 The Transdermal Transport of Alcohol Problem . . . . . . . . . . . . . . . . . . 1 1.2 Previous Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4 Chapter 2: Mathematical and Statistical Literature Review 10 Chapter 3: Preliminaries 13 3.1 The Least Squares Estimator: Banks and Thompson . . . . . . . . . . . . . . . . 14 3.2 The Maximum Likelihood Estimator: Kiefer and Wolfowitz . . . . . . . . . . . 16 3.3 Functional Analysis: Definitions and Theorems . . . . . . . . . . . . . . . . . . 19 3.4 Probability and Statistics: Kolmogorov-Smirnov Test . . . . . . . . . . . . . . . 24 3.5 Probability and Statistics: Markov Chain Monte Carlo . . . . . . . . . . . . . . 25 3.6 The Prohorov Metric . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27 Chapter 4: The Least Squares Estimator 33 4.1 The Mathematical Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33 4.2 Existence and Consistency of the Least Squares Estimator . . . . . . . . . . . . 35 4.3 Approximation and Convergence . . . . . . . . . . . . . . . . . . . . . . . . . . 46 4.4 A General Approach to Error Models . . . . . . . . . . . . . . . . . . . . . . . 50 Chapter 5: The Maximum Likelihood Estimator 53 5.1 The Mathematical Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54 5.2 Existence and Consistency of the Maximum Likelihood Estimator . . . . . . . . 57 5.3 Approximations and Convergence . . . . . . . . . . . . . . . . . . . . . . . . . 60 v Chapter 6: Abstract Parabolic Systems 63 6.1 Abstract Parabolic Systems and Their Properties . . . . . . . . . . . . . . . . . . 63 6.2 Finite-Dimensional Approximating Systems . . . . . . . . . . . . . . . . . . . . 66 6.3 Verification of the Finite-Dimensional Assumptions of the LS Estimator . . . . . 68 6.4 Verification of the Finite-Dimensional Assumptions of the MLE . . . . . . . . . 69 Chapter 7: Application to the Transdermal Transport of Alcohol 72 7.1 A Random Parabolic PDE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72 7.2 The Weak Form . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73 7.3 Verification of the Sesquilinear Form and Finite-Dimensional Assumptions . . . 75 Chapter 8: Numerical Results 78 8.1 The Least Squares Estimator . . . . . . . . . . . . . . . . . . . . . . . . . . . . 78 8.1.1 Example 1: Estimation Based on Simulation Data . . . . . . . . . . . . . 79 8.1.2 Example 2: Estimation Based on Human Subjects Data . . . . . . . . . . 85 8.2 The Maximum Likelihood Estimator . . . . . . . . . . . . . . . . . . . . . . . . 90 8.2.1 Example 1: Estimation Based on Simulation Data . . . . . . . . . . . . . 91 8.2.2 Example 2: Estimation Based on Human Subjects Data . . . . . . . . . . 96 Chapter 9: Discussion and Concluding Remarks 103 Reference List 107 vi List Of Tables 8.1 The averaged p-value for M = 400 and N = 128 with regularization parameters w 1 = 2 10 3 , and w 2 = 5 10 5 . . . . . . . . . . . . . . . . . . . . . . . . . 84 8.2 Decrease in NRMSE mean for an increasing number of nodes M and an increasing level of discretization N for the dataset collected by the WrisTAS7 biosensor . . . 87 8.3 Decrease in NRMSE mean for an increasing number of nodes M and an increasing level of discretization N for the dataset collected by the SCRAM biosensor . . . . 89 8.4 Decrease in D, the sum of the squared differences at each node between the es- timated and the “true” distribution, with increasing m, the number of drinking episodes, for fixed values of the number of nodes M = 400 and the level of dis- cretization N = 128 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93 8.5 Decrease in ¯ D M , the normalized sum of the squared differences at each node between the estimated distribution and the “true” distribution, with increasing M, the number of nodes, for a fixed value for the level of discretization, N = 128 . . 95 8.6 Decrease in ¯ D N , the normalized sum of the squared differences at each node between the estimated distribution and the “true” distribution, with increasing N, the level of discretization, with the number of nodes fixed at M= 400 . . . . . . 96 8.7 The measured peak TAC, the estimated peak TAC, and the 95% error band for the 9 drinking episodes from the test set collected by the SCRAM biosensor . . . 100 8.8 The measured peak time, estimated peak time, and the 95% error band for the 9 drinking episodes from the test set collected by the SCRAM biosensor . . . . . . 101 8.9 The measured area under the curve (AUC), estimated AUC, and 95% error band for 9 drinking episodes from the test set collected by the SCRAM biosensor . . . 101 vii List Of Figures 1.1 WrisTAS TM 7 (left) and SCRAM CAM R (right) alcohol biosensors . . . . . . . . 3 8.1 Different views of the estimated distribution and the “true” joint Beta(2;5) dis- tribution for M= 400 and N = 128 with regularization term . . . . . . . . . . . 83 8.2 Histogram of the generated q 1 samples (left) and q 2 samples (right) plus the “true” probability density function of Beta(2;5) (red) . . . . . . . . . . . . . . . . . . . 84 8.3 Boxplot of NRMSE i given the complexity of the model as labeled in Table (8.2). A decrease in NRMSE mean as M and N increase (blue), and an increase in the computational cost in seconds (green) is observed for the dataset collected by the WrisTAS7 biosensor . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87 8.4 The measured TAC, the estimated TAC, and the 95% simultaneous confidence band for 9 drinking episodes from the test set collected by the WrisTAS7 biosen- sor using the LOOCV . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88 8.5 Boxplot of NRMSE i given the complexity of the model as labeled in Table (8.3). A decrease in NRMSE mean as M and N increase (blue), and an increase in the computational cost in seconds (green) is observed for the dataset collected by the SCRAM biosensor . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89 8.6 The measured TAC, the estimated TAC, and the 95% simultaneous confidence band for 9 drinking episodes from the test set collected by the SCRAM biosensor using the LOOCV . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 90 8.7 Three different views of the estimated distribution and the “true” joint Beta(2;5) distribution, for different numbers of drinking episodes m= 7, m= 16, and m= 42 in the top, middle, and bottom rows, respectively, for a fixed number of nodes M= 400 and level of discretization N = 128 . . . . . . . . . . . . . . . . . . . . 94 viii 8.8 The estimated density (top left), estimated distribution (top right), marginal den- sity of q 1 (bottom left), and marginal density of q 2 (bottom right) obtained from m= 9 drinking episodes collected by the SCRAM biosensor, for fixed values of M= 400 and N = 128 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 98 8.9 The measured TAC, the estimated TAC, and the conservative 95% error band for 9 drinking episodes from the test set collected by the SCRAM biosensor using the LOOCV . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99 8.10 The measured TAC, the estimated TAC, and the conservative 95% error band for 9 drinking episodes from the test set collected by the WrisTAS7 biosensor using the LOOCV . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102 ix Abstract Motivated by the transdermal transport of alcohol problem, we consider two approaches for estimating the probability distribution of a random parameter vector in discrete-time abstract parabolic systems. First, we take the naive pooled approach to set up a least squares estimator, and in the second approach, we use the mixed effects model to set up a maximum likelihood estimator. In both approaches, we consider a Prohorov metric-based nonparametric approach to estimating the probability distribution of the random parameter vector. We establish the exis- tence and consistency of the estimators. A theoretical convergence result for a finite-dimensional approximation scheme for computing the estimators is also established and the efficacy of the ap- proach is demonstrated by applying the scheme to the transdermal transport of alcohol modeled by a random parabolic PDE. In the least squares estimator case, to show the convergence of the estimated distribution to the “true” distribution, we simulate data from the “true” distribution, apply our algorithm, and obtain the estimated cumulative distribution function. We then use the Markov Chain Monte Carlo Metropolis Algorithm to generate random samples from the estimated distribution, and perform a generalized (2-dimensional) two-sample Kolmogorov-Smirnov test with null hypothesis that our generated random samples from the estimated distribution and generated random samples from the “true” distribution are drawn from the same distribution. We then apply our algorithm to x actual human subject data from the alcohol biosensor and observe the behavior of the normalized root-mean-square error (NRMSE) using leave-one-out cross-validation (LOOCV) under different model complexities. In the maximum likelihood estimator case, numerical studies included show that the max- imum likelihood estimator is statistically consistent in that the convergence of the estimated distribution to the “true” distribution is observed in an example involving simulated data. The algorithm developed is then applied to two datasets collected using two different transdermal alcohol biosensors. Using the LOOCV method, we get an estimate for the distribution of the random parameters based on a training set. The input from a test drinking episode is then used to quantify the uncertainty propagated from the random parameters to the output of the model in the form of a 95% error band surrounding the estimated output signal. xi Chapter 1 Introduction and Motivation 1.1 The Transdermal Transport of Alcohol Problem In the absence of a blood sample, breath alcohol concentration (BrAC) obtained by a breathalyzer is typically used as a surrogate for blood alcohol concentration (BAC) for measuring the alcohol level in the human body. This is the case in law enforcement, medical research, and clinical therapy. The breathalyzer was developed by Borkenstein based on a redox (i.e. oxidation and reduction) reaction and Henry’s law [42]. While not entirely without controversy, there is rea- sonable agreement between BrAC and BAC, and this relationship has been shown to be relatively consistent across individuals under varying conditions [42]. Even though there is widespread acceptance among researchers and clinicians of using BrAC as a proxy for BAC, collecting and using samples of BrAC can be both challenging and limiting. For example, it is impractical to collect continuous or near-continuous BrAC data. Also, blowing into the breathalyzer so that a deep lung sample is obtained can be difficult, and if still drinking, the blown sample may be contaminated by mouth alcohol. Alternatively, ethanol, the type of alcohol contained in alco- holic beverages, is highly miscible and finds its way into all the water in the body. Moreover, it 1 has been known for a long time that the alcohol contained in perspiration is positively correlated with BAC (and BrAC) [73], but unfortunately, due to a number of confounding factors, the actual functional relationship between alcohol in perspiration and BAC/BrAC is not entirely clear. The technology required to measure the alcohol level in perspiration has been evolving, and there are now biosensors available that measure transdermal alcohol concentration (TAC). TAC is the amount of alcohol that diffuses from the dermal layer of the skin through the epidermal layer of the skin. The dermal layer of the skin is the layer that has active blood supply, and the epidermal layer of the skin is the layer that does not contain blood. After consuming alcohol, most of the ethanol is metabolized by the liver into other products that are then excreted. In addition, a portion of ingested ethanol exits the body directly through exhalation and urination [59] and approximately 1% of ethanol diffuses through the epidermal layer of the skin in the form of perspiration and sweat. The amount of alcohol excreted in this manner is quantified in the form of TAC. Currently, there are a number of different alcohol biosensors based on a variety of analog principles that can measure TAC essentially continuously, passively, unobtrusively, and relatively accurately, and make it available for processing in real time. Some of these devices are already commercially available and more technology are on the way. Several of these alcohol biosensors, like the breathalyzer, rely on relatively standard fuel-cell technology (i.e. converting chemical energy into electricity through redox reactions). The device produces 4 electrons for each molecule of ethanol, and effectively count the number of ethanol molecules that evaporate during perspiration from the epidermal layer of the skin in near-continuous time [48]. The result is an electrical current that can be measured and converted into TAC. Figure (1.1) shows two of these TAC measuring devices; the WrisTAS T M 7, developed by Giner, Inc. in Waltham, MA 2 and the SCRAM CAM R (Secure Continuous Remote Alcohol Monitor), developed by Alcohol Monitoring Systems, Inc. (AMS) in Littleton, Colorado. Figure 1.1: WrisTAS TM 7 (left) and SCRAM CAM R (right) alcohol biosensors Historically, researchers, clinicians, and the courts have always relied on BrAC or, when available, BAC. Consequently, in order to make TAC biosensors practical and accepted by the alcohol community, reliable and consistent means for converting TAC into equivalent BrAC or BAC must be developed. However, the precise relationship between TAC and BrAC or BAC is complicated due to a number of confounding physiological, technological, and environmental factors. For instance, there are variations in TAC 1) between different sensors including the same brand of sensors, 2) between persons (i.e. inter-individual variations) due to thickness of the skin, porosity (the quality of being porous, i.e. having tiny holes that fluid flows through), tortuosity (i.e. the diffusion and fluid flow through turns and twists in a porous media), and diffusion rate through the skin, and 3) within person across drinking episodes (i.e. intra-individual variations) due to environmental conditions, temperature and humidity, and skin hydration and vasodilation (i.e. the dilation of blood vessels which can result in a lower blood pressure). Taken together, all 3 these variations create significant challenges in obtaining a direct method of converting TAC to BrAC or BAC in order to obtain a meaningful measure for the alcohol community. 1.2 Previous Work In the past, the approach to developing a method for converting TAC into BrAC or BAC was based on deterministic methods for estimating parameters in distributed parameter systems such as those described in [16, 19]. Earlier work along these lines has been reported in, for example, [25, 32, 58]. In these treatments, a forward model in the form of a one-dimensional diffusion equation based on Fick’s law [70] with BrAC as the input and TAC as the output is first calibrated (i.e. fit) using BrAC and TAC data collected from the patient or research subject in the clinic or laboratory during what is known as a controlled alcohol challenge. Then, after the same patient or research subject has worn the TAC sensor in the field for an extended period of time (e.g. days, weeks, or even months), the TAC data is downloaded, and the fit forward model is used to deconvolve the BrAC or BAC input from the observed TAC output. However, the calibration process, which requires the simultaneous collection of BrAC and TAC for at least one drinking episode, is difficult to conduct, time consuming, and often impractical. It requires a lot of care and accuracy to collect the data. As mentioned before, the subject should blow into the breathalyzer so that a deep lung sample is obtained, and also if still drinking, the blown sample may be contaminated by mouth alcohol. So the subject should be trained on how to clear the alcohol in the mouth before obtaining a sample, and how to collect a sample as accurate as possible. In order to eliminate the calibration process, to make the process of estimating BrAC from TAC simpler for alcohol researchers and clinicians, we developed a population model-based 4 approach wherein the parameters in the model are assumed to be random. Then, rather than fitting the values of the parameters themselves, the distributions of the random parameters are estimated based on BrAC and TAC data from a cohort of individuals (see, for example, [2, 66, 67, 68]). In [66, 67, 68], a first principles physics-based initial-boundary value model for the trans- dermal transport of ethanol in the form of a random parabolic partial differential equation is considered. The parameters of this model are considered to be random due to a number of dif- ferent sources of uncertainty. The fundamental idea in both earlier work in [66, 67, 68], and our treatment here is to account for the uncertainty in the model parameters by fitting a population model to observations of both BrAC and TAC for a particular sample of subjects. Fitting the population model consists of estimating the distribution of the random parameters. Once this is done, the resulting population model can then be used to estimate BrAC from a subject’s TAC when no observations of BrAC are available. The underlying statistical model used to estimate the distribution of the parameters is based on the assumption that the dynamics in this physics- based model is common to the entire sample (i.e. all individuals who wore different sensors under different environmental conditions and all molecules of ethanol that were transported from the blood through the skin to the sensor) and then rely on randomness and uncertainty in the model parameters to capture the un-modeled variation. Thus, we assume that each data point is an observation that consists of the mean behavior plus a random error. In population modeling, we can statistically classify the methods as parametric or nonpara- metric. In the parametric approach, we assume that the general structure of the distribution is known a-priori but with unknown parameters of that particular distribution. For example, if we know that the distribution is normal with unknown mean and unknown variance, then the esti- mation of the distribution problem is now reduced to estimating the two unknown parameters 5 (i.e. the mean and the variance of the normal distribution). On the other hand, in the nonpara- metric approach, unlike the parametric approach, the structure of the distribution is assumed to be unknown, and the problem is to estimate the distribution itself. In either of these paradigms, different statistical approaches to the estimation problem can be taken. Each of these parametric or nonparametric classifications can have either a maximum likelihood approach or a Bayesian approach. Our problem of interest is very similar to a pharmacokinetic problem. Population phar- macokinetic models involve estimation of an unknown distribution based on the observed data. In a pharmacokinetic analysis, a drug is given to each individual of the population, the behavior of the drug in each individual is described by an unknown vector of parameters, and the variability of the drug’s response is analyzed [75]. The problem is to estimate the distribution of the popula- tion parameter based on the observed data. Different methodologies are applied to the population model. An overview of these different statistical approaches in the context of pharmacokinetics can be found in [75]. In the sequence of papers [66, 68, 69], the assumption was that the distribution of the ran- dom parameters is absolutely continuous with density parameterized by a parameter r (e.g., a truncated bivariate Normal), estimated via least squares as r N . It was argued that there exists a subsequencefr N j g that converges to the least squares estimate that would be obtained based on the original infinite-dimensional dynamical system. In [39], a Bayesian framework was de- veloped. Other researchers have also been looking at this problem and have tried a number of different approaches. For example, in [30, 31], a more traditional approach based on standard linear regression techniques is developed and discussed. A number of ideas from the machine learning literature have also been considered. In [34], a scheme based on random forests is used to recover BrAC from TAC, and in [53], the authors develop a method using physics-informed 6 neural networks, and in [52], BrAC or BAC is estimated from observations of TAC using a Hidden Markov Model (HMM) based approach. Our approach here is to not make any assumptions about the form of the distribution of the random parameters. We use a nonparametric estimation approach to estimate the shape of the cumulative distribution function directly rather than estimating parameters that determine a density, thus we are not limiting ourselves to any specific distribution type. We first take the naive pooled approach where we establish the existence and consistency of a least squares estimator and develop a finite-dimensional approximation and convergence theory. Our approach is an extension of the approach taken by Banks and his coauthors in [12] and [21]. Banks’ results are for continuous-time dynamical systems with a single subject. We are interested in discrete-time dynamical systems with multiple subjects. Next, we consider a mixed effects (see, for example, [27, 28, 29]) maximum likelihood based statistical model. We establish the existence and consistency of a maximum likelihood estimator and develop a finite-dimensional approximation and convergence theory. In the mixed effects model, it is assumed that observations are specific to a single individual plus a random error. The mixed effects model is a combination of the fixed-effects model, which describes the character- istics for an average individual in the population, and the random-effects model, which describes the inter-individual variability [45]. An outline of the remainder of the dissertation is as follows. In Chapter 2, we provide a liter- ature survey. In Chapter 3, we provide preliminaries used throughout the dissertation. In Section 3.1, we summarize Banks and Thompson results in [21] for the least squares estimator, which is the reference used for the extended results that we get in Chapter 4. In Section 3.2, we specify the assumptions used by Kiefer and Wolfowitz in [40] used for our consistency argument of the 7 maximum likelihood estimator in Chapter 5. In Section 3.3, we provide some basic definitions and theorems from functional analysis. In Section 3.4 and 3.5, we summarize the Kolmogorov- Smirnov test and the Markov Chain Monte Carlo (MCMC) method used in the numerical results of Section 8.1.1. In Section 3.6, we provide a summary of the Prohorov metric on the set of probability measures. In Chapter 4, we introduce the least squares estimator by setting up the mathematical model for our discrete-time dynamical systems and the associated statistical model in Section 4.1. In Section 4.2, we establish the existence and consistency of the least squares esti- mator. In Section 4.3, we establish a convergence theory for finite-dimensional approximations to the estimator. And, finally, in Section 4.4, we present a general approach to error models. Next, in Chapter 5, we introduce the maximum likelihood estimator by setting up the mathematical model in the form of a random discrete-time dynamical system and defining the maximum likelihood estimator for the distribution of the random parameters in Section 5.1. In Section 5.2, we es- tablish the existence and consistency of the maximum likelihood estimator, while in Section 5.3, we demonstrate the convergence of finite-dimensional approximations for our estimator. Next, in Section 6.1 of Chapter 6, we define abstract parabolic systems in a Gelfand triple setting and dis- cuss their properties. In Section 6.2, we describe how to get the finite-dimensional approximating systems using a Galerkin approach. We provide the application of our abstract estimation and approximation framework to them in the case of the least squares estimator in Section 6.3 and in the case of the maximum likelihood estimator in Section 6.4. In Chapter 7, the application of our scheme to the transdermal transport of alcohol is presented and discussed. Finally, in Chapter 8, we present the numerical results from the transdermal transport of alcohol problem, first, in the case of the least squares estimator in Section 8.1 in the context of an example involving simu- lated data in subsection 8.1.1, and in two examples involving human subjects data in subsection 8 8.1.2 collected in the Luczak laboratory, utilizing two different TAC biosensor devices. Next, we present the numerical results from the transdermal transport of alcohol problem in the case of the maximum likelihood estimator in Section 8.2. This again includes numerical studies for two examples, one involving simulated data in subsection 8.2.1 and the other, actual data collected in the Luczak laboratory in subsection 8.2.2. 9 Chapter 2 Mathematical and Statistical Literature Review A mathematical model for a nonlinear regression problem links the states of interest to the re- gressors (i.e. explanatory variables) and a vector of unknown parameters. Data is then collected for those states of interest through an experiment on the physical/biological system. In some experiments, subject-specific data is available while in the others aggregate or population level longitudinal data is collected. A parameter estimation method is used to estimate the unknown parameters. Estimation of parameters in a dynamical system emerges in many of the modeling problems. Previously, a deterministic approach was used to estimate a fixed parameter given a set of observations. However, due to the nature of some problems, the parameters are assumed to be random, and the goal is to find the distribution of the random parameters which characterize the behavior of the population of interest. As a result, a probabilistic approach is used rather than a deterministic one. A survey of measure estimation techniques in the literature of applied mathematics and statis- tics are provided in [17]. Here, we summarize the theoretical foundations of some of these meth- ods as well as their computational aspects. We focus on the nonparametric approach in estimating the probability distributions which provides us with the flexibility of not making any assumptions 10 about the underlying measure. The type of the observations (i.e. the dataset) available for model- ing creates some limitations on which estimation techniques are applicable for the nonparametric estimation of the probability distribution. If sufficient data are available (i.e. collected) from a particular subject, the estimation of parameters of interest in the model is a standard problem in applied mathematics and statistics [22, 27]. However, the parameter of interest varies from one subject to another in the population, and our goal is to estimate the probability distribution of the parameter of interest in the full population using the sample of subjects from the population. On the other hand, there are occasions when the collected data is an aggregate data rather than a subject-specific data, and the goal is to describe the variability of the parameter of interest by estimating its probability distribution. Based on the type of the dataset being a subject-specific dataset or an aggregate dataset, the techniques of probability distribution estimation vary. When a subject-specific dataset is collected from multiple subjects in the population with the goal of estimating the probability distribution of the parameter in the population, the case is called an “individual dynamics-individual data” problem; on the other hand, when we are working with an aggregate dataset with the goal of estimating the probability distribution of the parameter in the population, the case is called an “individual dynamics-aggregate data” problem [17]. Problems from pharmacokinetics studies, commonly found in statistical literature such as [28, 33, 71] are usually modeled as the first case. The second case is used, for example, in catch and release experiments such as the Sinko- Streifer model [22, 64]. There is a possibility of a third option being an “aggregate dynamics- aggregate data” problem, where the dynamics of the model directly depend on the distribution of the parameter across the population. Examples can be found in biology [4, 5, 6], electromagnetic models [13, 14], and wave propagation [20]. 11 For the estimation problems, a framework is established based on the the type of the dataset being a subject-specific dataset or an aggregate dataset. Then, suitable computational techniques are used for estimating the estimators of the probability distribution of the parameters. Since the probability measure space is infinite-dimensional, finite-dimensional approximations are ob- tained for numerical results. For the “individual dynamics-individual data” type of problems, the maximum likelihood estimation approach is usually applied in a setting of a mixed effects model, which requires assumptions about the distribution of the error random variables which are typically assumed to be mean zero [7]. A finite-dimensional approximation technique is used in the nonparametric maximum likelihood estimation approach. For instance, a D-optimal design is used where the determinant of the Fisher Information matrix is maximized over all probability measures [35, 43, 47]. This is shown to be equivalent to the G-optimal design [41]. The con- sistency of the maximum likelihood estimator has been studied decades ago [77, 78]. On the other hand, the “individual dynamics-aggregate data” type of problems can be found in applied mathematics literature [10, 11]. For this type of problems, the naive pooled approach using the least squares estimation method is applied, and the Prohorov metric [56] framework (summa- rized in Section 3.6) is used for establishing the finite-dimensional computational convergence [12, 21]. These methods have been applied to different problems in applied mathematics such as in [3, 4, 8, 9]. 12 Chapter 3 Preliminaries There are two methods of interest that we discuss in formulating our parameter estimation prob- lem, the naive pooled model, where we set up a least squares estimator, and the mixed effects model, where we set up a maximum likelihood estimator. In the naive pooled model, we assume that what we observe is the mean behavior plus a random error. On the other hand, in the mixed effects model, we assume that our observation is specific to a single subject plus a random error. In Section 3.1, we summarize Banks and his coauthor’s results from [21], which is the reference used in the naive pooled approach for the least squares estimator, and in Section 3.2, we include Kiefer and Wolfowitz’s assumptions, and summarize the results from [40], which is the reference used in the mixed effects approach for the maximum likelihood estimator. Then, in the next sec- tions following (Sections 3.3-3.5), we provide definitions and theorems from functional analysis and probability and statistics used throughout the dissertation. And, finally, in Section 3.6, we summarize the Prohorov metric and its properties. 13 3.1 The Least Squares Estimator: Banks and Thompson In Chapter 4, we prove the existence, consistency, and computational convergence of a least squares estimator in the naive pooled approach. The idea behind our approach comes from [21], where the existence and consistency of a least squares estimator are established, and a compu- tational method using a nonparametric estimation approach is developed, which is applicable to estimating the probability distribution of the unknown parameters. Here, we summarize the approach used by Banks and Thompson in [21], and we briefly explain our extension of this approach coming up in Chapter 4. Assume that the quantities of interest for a single subject is described as a continuous dynam- ical system as follows dy dt = g(t;y(t);q;y); y(t 0 )= y 0 ; where q2R r is the parameter vector unique to each subject of the population, and y2R u is the parameter common among all the subjects of the population. The solution of this model is as follows y(t;q;y)= C f(t;q;y 0 ;y); where C2R ls , f(t;q;y)2R s , so y2R l , andq =(q;y 0 )2R r+l =R p . Also,q2Q andy2Y, whereQ andY are the sets of admissible parameters. 14 For the aggregate problem, consider n observations as random variables directly sampled from the mean population state plus a random error as follows V j = v(t j ;P 0 ;y 0 )+E j ; j= 1;:::;n; where v(t;P;y)= E[Cy(t;:;y)jP]= Z Q Cy(t;q;y)dP(q); andE j ; j = 1;:::;n are independent and identically distributed random variables with mean 0 and variances 2 . Then, take n realizations v j of the random variables V j , and set up a least squares problem to find the estimator of the desired distribution. In the set up of this problem, v(t;P;y) has to be approximated by a numerical scheme such as a Galerkin method, and also the space of probability measures has infinitely many elements, so this space has to also be approximated by a computationally tractable set. The Prohorov metric framework is used to prove the existence and consistency of the estimator. In [21], it is shown that the estimator converges to the “true” distribution under some assumptions, and then, the computational convergence of the estimator is established. Following a similar approach as [21] and [12], in Chapter 4, we prove the existence, consis- tency, and computational convergence of the estimator with the difference that we assume that we have a discrete dynamical system instead of a continuous one, and also we extend the proof to the case where we have m subjects (i.e. individuals or drinking episodes in the alcohol problem) 15 instead of one subject, assuming that the observed data for each subject is a mean behavior plus a random error. The convergence of the distribution as a probability measure is shown in the Prohorov metric framework, summarized in Section 3.6. 3.2 The Maximum Likelihood Estimator: Kiefer and Wolfowitz In Chapter 5, we prove the existence, consistency, and computational convergence of a maximum likelihood estimator in the set up of a mixed effects model, where we estimate the probability distribution of the unknown parameters. The existence and consistency of a maximum likelihood estimator have been studied before in statistical literature. For instance, in [43], the existence and uniqueness of a maximum likelihood estimator of a mixing distribution using the geometry of mixture likelihoods were established. Similarly, in [47], the existence and uniqueness of the maximum likelihood estimator for the distribution of the parameters of a random coefficient re- gression model were established. In Chapter 5, we provide an existence argument based on the maximization of a continuous function over a compact set. To establish the consistency of the maximum likelihood estimator, we apply a theorem in [40] with the difference of working in the Prohorov metric framework (summarized in Section 3.6). We establish that the nonparametric maximum likelihood approach is statistically consistent; in other words, as the number of subjects gets larger, the estimator converges in probability to the “true” distribution. We set up our problem in a way that makes establishing the consistency a straightforward application of the consistency result in [40]. Here, we summarize the assumptions by Kiefer and Wolfowitz in [40] for the proof of consistency. 16 LetfX i j g; i= 1:::;m; j = 1;:::;n be random variables such that the density function of X i1 ;:::;X in is f(xjq;a i ), whereq is an unknown parameter shared by all the distributions, anda i is a subject-specific parameter. We assume that X i j are independent whenq;a 1 ;:::;a n are given. Letfa i g be independent and identically distribution random variables with common distribution G 0 . In [40], it is shown that the maximum likelihood estimator of q is strongly consistent (i.e. ˆ q!q with probability 1), and also, the maximum likelihood estimator of G 0 converges to G 0 at every point of continuity with probability 1. Consider the following assumptions: Assumption 1: f(xjq;a) is a density with respect to as-finite measure m on a Euclidean space of which x is the generic point. LetW be the space of possible values of the shared parameterq, and let A A A be the space of values which the subject-specific parameters a i can take. BothW and A A A are measurable subsets of Eu- clidean spaces, f is jointly measurable in x anda for eachq. The components of a pointq t inW is denoted byq (s) t (1 s r), and the Euclidean distance from the origin of a pointa2 A byjaj;t will denote Lebesgue measure on A A A. LetG=fGg be given space of cumulative distributions ofa i . Letq 0 and G 0 be the “true” value of the parameterq and the “true” distribution ofa i , respectively. It is assumed thatq 0 2W and G 0 2G. Letg =(q;G) be the generic point inWG. Define f(xjg)= Z A f(xjq;z)dG(z) andg 0 =(q 0 ;G 0 ). In the space ofWG, define the metric d(g 1 ;g 2 )=d([q 1 ;G 1 ];[q 2 ;G 2 ])= r å s=1 jarctanq (s) 1 arctanq (s) 2 j+ Z A jG 1 (z) G 2 (z)je jzj dt(z): 17 Let ¯ W ¯ G be the completed space ofWG. Then ¯ W ¯ G is compact. Assumption 2 (Continuity Assumption): It is possible to extend the definition of f(xjg) so that the range ofg will be ¯ W ¯ G and so that, for anyfg i g andg in ¯ W ¯ G,g i !g implies f(xjg i )! f(xjg ); except perhaps on a set of x whose probability is 0 according to the probability density f(xjg 0 ). Assumption 3: For anyg in ¯ W ¯ G and anyr> 0, w(xjg;r) is a measurable function of x, where w(xjg;r)= sup f(xjg 0 ); the supremum being taken over allg 0 in ¯ W ¯ G for whichd(g;g 0 )<r. Assumption 4 (Identifiability Assumption): Ifg 1 in ¯ W ¯ G is different fromg 0 , then, for at least one y, Z y ¥ f(xjg 1 )dm6= Z y ¥ f(xjg 0 )dm:; the integral being over x whose components are all the corresponding components of y. Let X be a chance variable with density f(xjg 0 ), and the operator E denote expectation underg 0 . Assumption 5 (Integrability Assumption): For anyg in ¯ W ¯ G we have lim r#0 E " log w(Xjg;r) f(Xjg 0 # + <¥: 18 Under these assumptions, in [40], the strong convergence of the maximum likelihood estima- tor is proven. 3.3 Functional Analysis: Definitions and Theorems In this section, we include some definitions and theorems from functional analysis that are used throughout the dissertation. As we shall see later, we have a random parabolic partial differen- tial equation for the transdermal transport of alcohol model. We rewrite the system in a weak form which includes a sesquilinear form satisfying the boundedness, coercivity, and continuity conditions. The sesquilinear form defines a bounded linear operator A A A in a duality pairing be- tween a Hilbert space and its dual induced by a Gelfand triple. Restricting the operator A A A to a subspace Dom(A A A), defined below, it becomes the infinitesimal generator of a holomorphic or analytic semigroupfe A A At : t 0g of bounded linear operators. The operator A A A is referred to as a regularly dissipative operator. Then we can define operators such that the aforementioned weak form can be written in a state space form as the evolution system with its mild solution given by the variation of constants formula using semigroups as the solution operators. We then apply the Galerkin approach to approximate the infinite-dimensional system by a sequence of finite- dimensional systems, and the convergence is argued by the Trotter-Kato Theorem from the linear semigroup theory. The following are the theorems and definitions of the terminologies used in the explanation of the above procedure. Definition 3.3.1. (Gelfand Triple) Let V and H be Hilbert spaces with V being continuously and densely embedded in H, denoted by V ,! H. Using Riesz map, we can identify H with its dual 19 H . It follows that H is continuously and densely embedded in V , denoted by H ,! V . The triple V ,! H,! V is referred to as a Gelfand triple. The duality pairing between V and V , denoted by <;> V ;V is then compatible with the inner product on H, that is for u2 H and v2 V ,< u;v> V ;V =< u;v> H . Definition 3.3.2. (Sesquilinear) Let V be a vector space. The map a(;) : VV!C is called sesquilinear if it is conjugate linear in the first argument, and linear in the second argument, i.e. for all u 1 ;u 2 ;v 1 ;v 2 2 V and c 1 ;c 2 2C, (i) a(u 1 + u 2 ;v 1 + v 2 )= a(u 1 ;v 1 )+ a(u 1 ;v 2 )+ a(u 2 ;v 1 )+ a(u 2 ;v 2 ), (ii) a(c 1 u 1 ;c 2 u 2 )= ¯ c 1 c 2 a(u 1 ;u 2 ). Let X denote a real or complex Banach space. LetL(X) denote the space of all bounded linear operators from X to X, and letfT(t) : 0 t<¥g be a family of operators inL(X). Definition 3.3.3. (Semigroup) A semigroup on X is a family of bounded linear operatorsfT(t) : 0 t<¥g on X such that (i) T(0)= I, (ii) T(t+ s)= T(t)T(s) for every t,s 0. Definition 3.3.4. (C 0 -Semigroup) A strongly continuous or C 0 -semigroup on X is a family of bounded linear operatorsfT(t) : 0 t<¥g on X such that (i) T(0)= I, (ii) T(t+ s)= T(t)T(s) for every t,s 0. 20 (iii) for every x2 X, the mapping t! T(t)x is continuous on[0;¥), i.e. lim t!0 + k T(t)x xk X = 0; for x2 X: Definition 3.3.5. (Analytic Semigroup) LetfT(t) : 0 t <¥g be a C 0 -semigroup on X, then fT(t) : 0 t<¥g is an analytic semigroup if (i) For someq2(0;p=2), T(t) can be extended toD q =f0g[ft2C jjarg(t)j<qg. (ii) For all t2D q f0g, T(t) is analytic in t in the sense of the uniform operator topology. Definition 3.3.6. (Infinitesimal Generator of a Semigroup) LetfT(t) : 0 t <¥g be a C 0 - semigroup on X. We define A A A as the infinitesimal generator offT(t) : 0 t <¥g on X if for x2 Dom(A A A), A A Ax= lim t!0 + T(t)x x t ; where Dom(A A A)=fx2 X j A A Ax existsg. It can be shown that Dom(A A A) is dense. Theorem 3.3.1. LetfT(t) : 0 t<¥g be a C 0 -semigroup on X with infinitesimal generator A A A. Then T(t)x= lim n!¥ I t n A A A n x 21 Theorem 3.3.2. LetfT(t) : 0 t<¥g be a C 0 -semigroup on X with infinitesimal generator A A A. Then for each x2 Dom(A A A), the map t7! T(t)x in continuously differentiable on (0;¥), T(t)x2 Dom(A A A) for each t, and d dt T(t)x= A A AT(t)x= T(t)A A Ax: Theorem 3.3.3. For eachfT(t) : 0 t<¥g a C 0 -semigroup on X, there exists M2R, M 1, andw2R such thatjT(t)j Me wt , t 0. If A A A is the infinitesimal generator offT(t) : 0 t <¥g a C 0 -semigroup on X such that jT(t)j Me wt , t 0, then we write A A A2 G(M;w). For example, if A A A2 G(1;0), i.e. M = 1 and w = 0, thenjT(t)x T(t)yjjx yj, thus A A A is the infinitesimal generator of the contraction semigroup. Definition 3.3.7. (Resolvent Set) Let A A A : Dom(A A A) X! X be a linear operator. The resolvent set r(A A A) of A A A is defined as the set of all l2C such that the range of the linear transformation lI A A A is dense andlI A A A has a continuous inverse on this set. Definition 3.3.8. (Resolvent Operator) For any l2r(A A A), the resolvent operator is denoted by R l (A A A)=(lI A A A) 1 . Theorem 3.3.4. Let A A A2 G(M;w) be the infinitesimal generator offT(t) : 0 t <¥g a C 0 - semigroup on X. Then, for all x2 X and anyl2r(A A A) such that R l (A A A)>w, R l (A A A)x= Z ¥ 0 e lt T(t)xdt: 22 The two important theorems from the linear semigroup theory are the Hille-Yosida theorem and the Trotter-Kato Theorem. The Hille-Yosida theorem provides necessary and sufficient con- ditions for an operator to be the infinitesimal generator of a C 0 -semigroup. And, the Trotter-Kato Theorem is a fundamental result on the approximation offT(t) : 0 t<¥g a C 0 -semigroup on X, which is used for convergence of numerical approximations. Theorem 3.3.5. (Hille-Yosida Theorem) For M 1 andw2R, A A A2 G(M;w), i.e. A A A generates a C 0 -semigroupfT(t) : 0 t<¥g on X that satisfiesjT(t)j Me wt , if and only if (i) A A A is closed and densely defined in X, i.e. Dom(A A A)= X, (ii) for reall >w,l2r(A A A) and jR l (A A A)j M (lw) n ; n= 1;2;:::: Theorem 3.3.6. (Trotter-Kato Theorem) Let H and H N be Hilbert spaces such that H N H. Let P N : H! H N be an orthogonal projection of H onto H N . Assume that for all x2 H, P N x! x as N!¥. Let A A A and A A A N be infinitesimal generators of C 0 -semigroups S(t) and S N (t) on H and H N , respectively, satisfying (i) there exists M andw such that for each N,jS N (t)j Me wt , (ii) there exists D dense in H such that for somel,(lI A A A)D is dense in H and A A A N P N x! A A Ax for all x2 D. Then, for each x2 H, S N (t)x! S(t)x uniformly in t on compact intervals[0;T]. 23 Theorem 3.3.7. (Morrey’s Inequality) Assume n< p¥, then there exists a constanta depend- ing on p and n, such that for all u2 C 1 (R n ), k uk C 0;g (R n ) ak uk W 1;p (R n ) ; whereg = 1 n p . 3.4 Probability and Statistics: Kolmogorov-Smirnov Test Here, we summarize the Kolmogorov-Smirnov test used in Chapter 4. We used the Matlab func- tion kstest 2s 2d as described in [49]. The two-dimensional version of the KS-test used in this algorithm was developed by Peacock [55]. The statistic, D ˆ n , used in the KS-test is the maximum of the absolute difference between the two empirical distributions of random samples generated by the estimated distribution and the “true” distribution by considering all the ordering combina- tions that are possible. In [55], Peacock argues that the two dimensional case may be analyzed in the same manner as the one dimensional case where the sample size is replaced by ˆ n= n 1 n 2 n 1 + n 2 ; where n 1 is the size of the sample generated by the estimated distribution, and n 2 is the size of the sample generated by the “true” distribution. For this algorithm, n 1 and n 2 should be greater than 24 10. Letting Z ˆ n = p ˆ nD ˆ n , for large values of n 1 and n 2 , Z ˆ n K, where K is a Kolmogorov random variable whose distribution is given by F K (z)= P(K z)= 1 2 ¥ å k=1 (1) k1 e 2k 2 z 2 : In [23], Birnbaum derived analytic expressions for the distribution of the Kolmogorov’s statis- tic for finite sample size, 2 and 3, and tabulated values for higher sample sizes. Using Monte Carlo simulation, Peacock demonstrated that one may adjust for small sample sizes via the fit asymptotic correction 1 Z ˆ n Z ¥ = 0:53 ˆ n 0:9 : Taking the null hypothesis to be that the two samples were drawn from the same distribution, the p-value is then well approximated by the expression P(K> Z ¥ ) 2e 2(Z ¥ 0:5) 2 ; the asymptotic distribution in the two-dimensional case [55]. 3.5 Probability and Statistics: Markov Chain Monte Carlo Here, we summarize the Markov Chain Monte Carlo (MCMC) algorithm used in Chapter 4. Many problems in statistics and machine learning such as solving high dimensional integrals are too complex that it is almost impossible to solve them precisely or to obtain exact solutions for them. A Monte Carlo (MC) algorithm is a randomized algorithm that can approximate the solution to a problem with a random error [57]. This random error can be reduced by increasing 25 the computational resources, namely the running time and memory. Monte Carlo methods rely on drawing samples from a probability distribution in order to estimate (Monte Carlo estimate) a quantity of interest. The basic idea in Monte Carlo methods is to view an integral or sum as the expectation of a certain random variable. For instance, assume that we are trying to calculate the following integral I = Z p(x) f(x)dx: We can rewrite the above integral as follows I = Z p(x) f(x)dx= E p [ f(X)]; where X is a random variable with distribution p, i.e. X p. Then, we can approximate I by generating n samples x 1 ;:::;x n from the distribution p, and taking the sample mean as an estimate of the expected value of f(X), i.e. ˆ I = n å i=1 f(x i ): Then, by the Law of Large Numbers (LLN) [36], lim n!¥ ˆ I = I: This idea heavily relies on whether we can easily generate samples from the distribution p. When sampling from the distribution p is not straightforward, we should rely on alternative methods such as importance sampling [76] or more general methods called Markov Chain Monte Carlo (MCMC) [24] that rely on generating samples that converge to the distribution of interest. Perhaps the most important MCMC algorithm is the Metropolis-Hastings algorithm [38]. This 26 algorithm can draw samples from any probability distribution p as long as we know the density function of p proportional to a constant, i.e. we only need to know the functional form of the density of p and there is no need for the normalization constant. The algorithm starts with an arbitrary initial value from the support of the target distribution. Then, at each iteration, using a proposal distribution centered at the current value, a new value is generated. This proposed value is either accepted or rejected with some probability. If ac- cepted, the proposed value is considered the next value, and if rejected, we remain at the current value, and repeat the process. The Metropolis-Hastings algorithm works well in low-dimensional problems, but it suffers in high-dimensional situations. Other methods have also been proposed such as slice sampling [50], Gibbs sampling [37], and Hamiltonian Mote Carlo (HMC) [51]. The HMC algorithm has been the most successful and widely used one in high-dimensional problems. 3.6 The Prohorov Metric The following is a summary of the definitions and theorems presented in [21]. The Prohorov metric framework, followed by these results, will be applied to the least squares estimation prob- lem in Chapter 4 as well as the maximum likelihood estimation problem in Chapter 5 for the nonparametric estimation of the probability measure of our model of interest. Let Q be the metric space with metric d. Define C b (Q)=f f : Q!R j f is bounded and continuousg: The following Riesz Representation Theorem is used to establish the weak topology on the continuous dual of the space of bounded continuous functions. 27 It is important to note that the weak topology and the weak topology are equivalent on the space of probability measures. Theorem 3.6.1. (Riesz Representation Theorem.) Assume (Q;d) is a Hausdorff compact space. For every f 2 C b (Q) (the continuous dual of the space C b (Q)), there exists a unique finite signed Borel measurem such that f ( f)= Z Q f(q q q)dm(q q q) for all f2 C b (Q). Moreover,jj f jj=jmj(Q). Definition 3.6.1. (Hausdorff Space.) X is a Hausdorff space if every two distinct points x 1 and x 2 in the topological space X are pairwise neighborhood-separable; i.e. there exists a neighborhood U of x 1 and a neighborhood V of x 2 such that U\V = / 0. Note that in this document,(Q;d) is assumed to be a Hausdorff space. By the Riesz Representation Theorem above, let f = f m . The setP(Q) of probability measures on(Q;d) can be identified with f m 2 C b (Q) such that f m ( f) 0 for all f2 C b (Q) and jj f m jj=m(Q)= 1. Thus,P(Q) C b (Q). Definition 3.6.2. (Weak Topology.) f n w ! f if and only if f n ( f)! f ( f) for all f2 C b (Q). Definition 3.6.3. Let(Q;d) be any metric space. Given any probability measure P2P(Q) and somee> 0, ane-neighborhood of P is B e (P)= ˜ P Z Q f(q q q)d ˜ P(q q q) Z Q f(q q q)dP(q q q) <e; for all f2 C b (Q) : 28 Definition 3.6.4. Given a sequence of measures P M 2P(Q) for all M = 1;:::;¥, we say P M converges weakly to P, P M w ! P, ifj R Q f(q q q)dP M (q q q) R Q f(q q q)dP(q q q)j! 0 for all f2 C b (Q). We want to define a metric r onP(Q)P(Q) such that given two probability measures, P; ˜ P2P(Q), ˜ P2 B e (P)() r(P; ˜ P)<e; . Definition 3.6.5. Let(Q;d) be a metric space. For all nonempty E2S Q , the Borel sigma algebra of Q, define thee-neighborhood of E, E e =f˜ q q q2 Q j inf q q q2Q d(q q q; ˜ q q q)<eg: Definition 3.6.6. Let(Q;d) be a metric space and letP(Q) be the set of all probability measures on Q. For any two measures P; ˜ P2P(Q), the Prohorov metricr is r(P; ˜ P)= inffe> 0j ˜ P(E) P(E e )+e and P(E) ˜ P(E e )+e; for all E2S Q g: Theorem 3.6.2. Let(Q;d) be a metric space. Thenr is a metric onP(Q). Theorem 3.6.3. Assume(Q;d) is separable, and P2P(Q); P M 2P(Q) for all M= 1;:::;¥. Then P M w ! P if and only ifr(P M ;P)! 0. The Prohorov metric metrizes the weak convergence of measures; as a result, the weak topol- ogy of measures is equivalent to the topology induced by the Prohorov metric on the space of 29 probability measures over a separable metric space(Q;d). In other words, given a separable met- ric space (Q;d), the space (P(Q);r) is a metric space with a topology equivalent to the weak topology of measures. Convergence in the Prohorov metric is equivalent to the weak conver- gence of measures whenP(Q) C b (Q). Some useful properties of the space(P(Q);r) is as follows: Define the space of Dirac measures on Q, D=fd q q q j j q q q j 2 Qg; where for all E2S Q , d q q q j (E)= 8 > > < > > : 1 if q q q j 2 E 0 if q q q j = 2 E Proposition 3.6.1. Let(Q;d) be a separable metric space and define DP(Q). Then r(d q q q 1 ;d q q q 2 )= 8 > > < > > : d(q q q 1 ;q q q 2 ) if d(q q q 1 ;q q q 2 ) 1 1 if d(q q q 1 ;q q q 2 )> 1 Corollary 3.6.1. The sequencefq q q j g ¥ j=1 is Cauchy in the separable space(Q;d) if and only if the sequencefd q q q j g ¥ j=1 is Cauchy in(P(Q);r). Corollary 3.6.2. Assume(Q;d) is separable. Then(Q;d) is complete if and only if(P(Q);r) is complete. 30 Corollary 3.6.3. Assume(Q;d) is separable. Then(Q;d) is compact if and only if(P(Q);r) is compact. Given the compactness of(Q;d), by the Riesz Representation Theorem we have P M w ! P if and only if f P M ( f)! f P ( f) for all f2 C b (Q). Consider the unit ball in C b (Q), B=f f 2 C b (Q)jjj f jj 1g: By Alaoglu’s Theorem, B is compact in the weak topology. Then the compact set f f 2 Bjjj f jj= 1 and f is positiveg is homeomorphic to(P(Q);r). For some n q q q , QR n q q q and P2P(Q), consider the random vector X : Q!R n q q q on the probability space (Q;S Q ;P) given by X(q q q)= q q q for q q q2 Q. The cumulative distribution function for X is given by F X (q 1 ;:::;q n )= P(X2 n q q q `=1 (¥;q ` ])= P( n q q q `=1 f(¥;q ` ]\ Qg): In this case, it follows that iffP M g;P 0 2(P(Q),r), then r(P M ;P 0 )! 0 if and only if F X M ! F X 0 31 at all points of continuity of F X 0 . Consequently, Prohorov metric convergence and weak and weak convergence inP(Q) are also referred to as convergence in distribution. Assume the metric space(Q;d) is separable and let Q d =fq q q j g ¥ j=1 be a countable dense subset of Q. Define the dense (see [21]) subset ofP(Q), ˜ P d (Q), as ˜ P d (Q)=fP2P(Q)j P= M å j=1 p j d q q q j ;q q q j 2 Q d ;M2N; p j 2[0;1]\Q; M å j=1 p j = 1g; (3.1) the collection of all convex combinations of Dirac measures on Q with rational weights p j at nodes q q q j 2 Q d , and for each M2N let P M (Q)=fP2 ˜ P d (Q) j P= M å j=1 p j d q q q j ;q q q j 2fq q q j g M j=1 g: (3.2) The Prohorov metric framework is used in the nonparametric approach of estimating the probability distribution of the random parameter in both the least squares estimation problem in Chapter 4 and the maximum likelihood estimation problem in Chapter 5. 32 Chapter 4 The Least Squares Estimator 4.1 The Mathematical Model Analogously to what was done in [12, 21], consider the following discrete-time mathematical model for the i th subject at time-step k x k;i = g k1 (x k1;i ;u k1;i ;q q q); k= 1;:::;n i ; i= 1;:::;m; x 0;i =f 0;i ; i= 1;:::;m; where q q q2 Q is the parameter vector, Q is the set of admissible parameters, g k1 :HR n Q! H ,H is, in general, an infinite-dimensional Hilbert space, and u k1;i 2R n is the input. The output is given by y k;i = h k (x k;i ;f 0;i ;u k;i ;q q q); k= 1;:::;n i ; i= 1;:::;m; where h k :HHR n Q!R. 33 We consider the aggregate problem of observing n i m sampled measurements of the mean behavior of the output plus a random error. Define Y k;i = ¯ y k;i (P 0 )+ e k;i ; k= 1;:::;n i ; i= 1;:::;m; (4.1) where ¯ y k;i (P)= E[h k (x k;i ;f 0;i ;u k;i ;q q q)]= Z Q h k (x k;i ;f 0;i ;u k;i ;q q q)dP; and e k;i are independent and identically distributed (i.i.d) random variables with mean 0 and common variance s 2 , and P is a probability measure on the Borel sigma algebra on Q,S Q . Let P(Q) denote the set of all probability measures defined on the Borel sigma algebra on Q, S Q , and let P 0 denote the “true” distribution of the random vector q q q. The goal is to find an estimate of the “true” distribution P 0 . In order to generate an estimator for P 0 , we employ an abstract framework based on non- linear least squares and the Prohorov metric onP(Q) (summarized in Section 3.6) to establish theoretical results and computational tools. Let n n n=fn i g m i=1 and define J n n n;m (Y Y Y ;P)= m å i=1 n i å k=1 (Y k;i ¯ y k;i (P)) 2 ; where Y Y Y =(fY k;i g n i k=1 ) m i=1 and define the estimator P n n n;m = arg min P2P(Q) J n n n;m (Y Y Y ;P): (4.2) 34 Let ˆ y k;i be realizations of the random variables Y k;i , and define ˆ P n n n;m = arg min P2P(Q) J n n n;m (ˆ y ˆ y ˆ y;P)= arg min P2P(Q) m å i=1 n i å k=1 (ˆ y k;i ¯ y k;i (P)) 2 ; (4.3) where ˆ y ˆ y ˆ y=ffˆ y k;i g n i k=1 g m i=1 . We cannot exactly compute ˆ P n n n;m since ¯ y k;i (P) typically must be approximated numerically. Let ¯ y N k;i (P) be an approximation of ¯ y k;i (P) based, for example, on a Galerkin scheme with N denoting the level of discretization. In addition, we define our approximating estimator over the setP M (Q) given by equation (3.2) where M denotes the number of nodes,fq q q j g M j=1 , so that the least squares optimization is now over a finite set of parameters, namely thefp j g M j=1 . Our approximating estimator is then given by ˆ P N n n n;m;M = arg min P2P M (Q) J N n n n;m (ˆ y ˆ y ˆ y;P)= arg min P2P M (Q) m å i=1 n i å k=1 (ˆ y k;i ¯ y N k;i (P)) 2 : (4.4) 4.2 Existence and Consistency of the Least Squares Estimator In this section, we establish the existence and consistency of the least squares estimator that we set up in the previous section. In order to establish the existence of the estimator P n n n;m given in equation (4.2), it is sufficient to show the existence of the estimator ˆ P n n n;m given in equation (4.3). Recall that the estimator ˆ P n n n;m is obtained from the realizationsfˆ y k;i g; k= 1;:::;n i ; i= 1;:::;m of the random variables fY k;i g; k= 1;:::;n i ; i= 1;:::;m. 35 The following theorem establishes the existence of the estimator ˆ P n n n;m . Theorem 4.2.1. For i= 1;2;:::;m, let J n i :R n i P(Q)!R be defined by J n i (ˆ y ˆ y ˆ y i ;P)= n i å k=1 (ˆ y k;i ¯ y k;i (P)) 2 ; where ˆ y ˆ y ˆ y i =[ˆ y 1;i ;:::; ˆ y n i ;i ] T and set J n n n;m (ˆ y ˆ y ˆ y;P)= m å i=1 J n i (ˆ y ˆ y ˆ y i ;P): Assume (Q;d) is separable and compact. Let (P(Q);r) be the space of probability measures P(Q) with the Prohorov metricr. Assume that for each P2P(Q) we have a measurable func- tion J n i (:;P) :R n i !R, and also for each ˆ y ˆ y ˆ y i , we have a continuous function J n i (ˆ y ˆ y ˆ y i ;:) :P(Q)!R. Then there exists a measurable function ˆ P n n n;m :Õ m i=1 R n i !P(Q) such that J n n n;m (ˆ y ˆ y ˆ y; ˆ P n n n;m )= inf P2P(Q) J n n n;m (ˆ y ˆ y ˆ y;P): Proof. According to the proof in [21], we take an enumeration of the countable dense subset of Q asfq j g M j=1 , and defineP M (Q) ˜ P d (Q) as in equation (3.2), where ˜ P d (Q) is given by equation (3.1). Then, letfP M j g ¥ j=1 be an enumeration of the elements ofP M , andP M J =fP M j g J j=1 be the first J enumerated elements. Let ˜ P M J be such that J n n n;m (ˆ y ˆ y ˆ y; ˜ P M J )= min P2P M J J n n n;m (ˆ y ˆ y ˆ y;P): 36 Since the minimum is being taken over a finite number of elements of the enumeration ofP M , the minimum must exist. It can be shown that ˜ P M J is measurable. Then identify ˜ P M J with[0;1] M \Q M by ˜ P M J (v)7!(p M 1 ;:::; p M M ) map. Let ˜ p M 1;J be the first component of the vector representation for ˜ P M J and define ˆ p M 1 = liminf J!¥ ˜ p M 1;J ; so ˆ p M 1 is measurable. Then repeat the same argument for the other components of the vector. Since [0;1] M is compact, there exists a convergent subsequence of the vector representation to some vector ˆ P M =( ˆ p M 1 ;:::; ˆ p M M ) such that lim M!¥ J n n n;m (ˆ y ˆ y ˆ y; ˆ P M )= lim M!¥ inf P2P M (Q) J n n n;m (ˆ y ˆ y ˆ y;P)= inf P2P(Q) J n n n;m (ˆ y ˆ y ˆ y;P)= J n n n;m (ˆ y ˆ y ˆ y; ˆ P n n n;m ): Next, we establish the consistency of the estimator P n n n;m by showing that P n n n;m converges to the “true” distribution, P 0 , in probability. Consider the following assumptions for a fixed m, A0. Let n= maxn i , and let y k;i = 0 for n i < k n; i= 1;:::;m. A1. For any fixed n and m,fe k;i g; k = 1;:::;n; i= 1;:::;m, introduced in equation (4.1), are i.i.d. with E[e k;i ]= 0 and Var(e k;i )=s 2 on some probability space(W;S W ;P W ). 37 A2. Let(P(Q);r) be the space of probability measuresP(Q) with the Prohorov metricr, where (Q;d) is a separable and compact metric space. A3. For T > 0 and n2N, set t n = T=n and t n k = kt n , k= 1;:::;n. Then, for each i= 1;:::;m, there exists ¯ y i 2 C(P(Q);C([0;T])) such that ¯ y i (t n k ;P)= ¯ y k;i (P), for P2P(Q), k= 1;:::;n. A4. There exists a measurem on[0;T] such that for all f2 C([0;T]), as n!¥, 1 n n å k=1 f(t n k )= Z T 0 f(t)dm n (t)! Z T 0 f(t)dm(t): A5. The “true” distribution P 0 2P(Q) is the unique minimizer of J 0 (P) where J 0 (P)=s 2 + 1 m m å i=1 Z T 0 ( ¯ y i (t;P 0 ) ¯ y i (t;P)) 2 dm(t): Theorem 4.2.2. Under assumptions A0-A5, define J n;m (Y Y Y ;P)= m å i=1 n å k=1 (Y k;i ¯ y i (t n k ;P)) 2 ; there exists a set A2S W with probability one such that for fixed m and for allw2 A, 1 mn J n;m (Y Y Y ;P)(w)! J 0 (P)(w) as n!¥, for each P2P(Q). In addition, the convergence is uniform onP(Q). 38 Proof. Let P2P(Q) be fixed. We have 1 mn J n;m (Y Y Y ;P)= 1 mn m å i=1 n å k=1 (Y k;i ¯ y i (t n k ;P)) 2 = 1 mn m å i=1 n å k=1 ( ¯ y i (t n k ;P 0 )+ e k;i ¯ y i (t n k ;P)) 2 = 1 mn m å i=1 n å k=1 e 2 k;i + 2 ¯ y i (t n k ;P 0 ) ¯ y i (t n k ;P) e k;i + ¯ y i (t n k ;P 0 ) ¯ y i (t n k ;P) 2 = 1 mn m å i=1 n å k=1 e 2 k;i (4.5) + 2 mn m å i=1 n å k=1 ¯ y i (t n k ;P 0 ) ¯ y i (t n k ;P) e k;i (4.6) + 1 mn m å i=1 n å k=1 ¯ y i (t n k ;P 0 ) ¯ y i (t n k ;P) 2 : (4.7) For the first term, (4.5), by the Strong Law of Large Numbers, we have P W (fw2W j 1 mn m å i=1 n å k=1 e 2 k;i !s 2 g)= 1: For the second term, (4.6), let ˜ e k;i = ¯ y i (t n k ;P 0 ) ¯ y i (t n k ;P) e k;i , by continuity of ¯ y i (t;P) and compactness of[0;T], we have E[ ˜ e k;i ]= ¯ y i (t n k ;P 0 ) ¯ y i (t n k ;P) E[e k;i ]= 0: Var[ ˜ e k;i ]= ¯ y i (t n k ;P 0 ) ¯ y i (t n k ;P) 2 Var[e k;i ] =s 2 ¯ y i (t n k ;P 0 ) ¯ y i (t n k ;P) 2 s 2 sup t2[0;T] ¯ y i (t;P 0 ) ¯ y i (t;P) 2 <¥: 39 So we have Var[ ˜ e k;i ] S P for some S P ; therefore, we have ¥ å k=1 Var[ ˜ e k;i ] k 2 S P ¥ å k=1 1 k 2 <¥: Hence, by Kolmogorov’s Law of Large Numbers, we have P W (fw2Wj 2 mn m å i=1 n å k=1 ¯ y i (t n k ;P 0 ) ¯ y i (t n k ;P) e k;i ! 0g)= 1: For the third term, (4.7), by assumptions A4, A5, and continuity of ¯ y i (t;P), we have 1 mn m å i=1 n å k=1 ¯ y i (t n k ;P 0 ) ¯ y i (t n k ;P) 2 ! 1 m m å i=1 Z T 0 ¯ y i (t;P 0 ) ¯ y i (t;P) 2 dm(t)= J 0 (P)s 2 : Let A P =fw2W j 1 mn m å i=1 n å k=1 e 2 k;i !s 2 g\fw2W j 2 mn m å i=1 n å k=1 ¯ y i (t n k ;P 0 ) ¯ y i (t n k ;P) e k;i ! 0g; then P W (A P )= 1 and for everyw2 A P , by adding the three terms above, we get 1 mn J n;m (Y Y Y ;P)(w)!s 2 + 0+ J 0 (P)(w)s 2 = J 0 (P)(w): Next, we want to find A with P W (A)= 1 such that we get the convergence for every w2 A and for all P2P(Q). Let A 1 =fw2Wj 1 mn å m i=1 å n k=1 je k;i j! E[je 1;1 j]g. By the Strong Law of Large Numbers, P W (A 1 )= 1. Similar to the idea in [21], we let A= A 1 T T P2 ˜ P d (Q) A P . We take the intersection over a countable number of sets, each with probability one, so P W (A)= 1. We must show that A A P 40 for all P2P(Q) from which will follow the desired convergence for every w2 A and for all P2P(Q). For allw2 A, we havew2 A 1 . Choose N 1 such that for all n N 1 , 1 mn m å i=1 n å k=1 je k;i j< E[je 1;1 j]+ 1: By Assumption A3 and ˜ P d (Q) being dense inP(Q), we choose P M 2 ˜ P d (Q) such that for all i= 1;:::;m, sup t2[0;T] j ¯ y i (t;P) ¯ y i (t;P M )j< e 4 E[je 1;1 j]+ 1 : Then for allw2 A, we havew2 A P M , and therefore that w2fw2W j 2 mn m å i=1 n å k=1 ¯ y i (t n k ;P 0 ) ¯ y i (t n k ;P M ) e k;i ! 0g: Choose N 2 such that for all n N 2 , 2 mn m å i=1 n å k=1 ¯ y i (t n k ;P 0 ) ¯ y i (t n k ;P M ) e k;i < e 2 : Then for n maxfN 1 ;N 2 g, 1 mn J n;m (Y Y Y ;P) J 0 (P) 1 mn m å i=1 n å k=1 e 2 k;i s 2 (4.8) + 2 mn m å i=1 n å k=1 ¯ y i (t n k ;P 0 ) ¯ y i (t n k ;P) e k;i (4.9) + 1 mn m å i=1 n å k=1 ¯ y i (t n k ;P 0 ) ¯ y i (t n k ;P) 2 (4.10) 1 m m å i=1 Z T 0 ¯ y i (t;P 0 ) ¯ y i (t;P) 2 dm(t) : 41 For the first term, (4.8), sincew2 A, we have w2fw2W j 1 mn m å i=1 n å k=1 e 2 k;i !s 2 g; so 1 mn m å i=1 n å k=1 e 2 k;i s 2 ! 0: For the second term, (4.9), 2 mn m å i=1 n å k=1 ¯ y i (t n k ;P 0 ) ¯ y i (t n k ;P) e k;i = 2 mn m å i=1 n å k=1 ¯ y i (t n k ;P 0 ) ¯ y i (t n k ;P)+ ¯ y i (t n k ;P M ) ¯ y i (t n k ;P M ) e k;i 2 mn m å i=1 n å k=1 ¯ y i (t n k ;P 0 ) ¯ y i (t n k ;P M ) e k;i + 2 mn m å i=1 n å k=1 ¯ y i (t n k ;P M ) ¯ y i (t n k ;P) e k;i < e 2 + 2 mn m å i=1 n å k=1 sup t2[0;T] ¯ y i (t;P M ) ¯ y i (t;P) e k;i < e 2 + 2 e 4 E[je 1;1 j]+ 1 ! 1 mn m å i=1 n å k=1 e k;i < e 2 + 2 e 4 E[je 1;1 j]+ 1 ! E[je 1;1 j]+ 1 = e 2 + e 2 =e: For the third term, (4.10), by assumptions A4 and A5, we have 1 mn m å i=1 n å k=1 ( ¯ y i (t n k ;P 0 ) ¯ y i (t n k ;P)) 2 1 m m å i=1 Z T 0 ( ¯ y i (t;P 0 ) ¯ y i (t;P)) 2 dm(t) ! 0: Therefore we have 1 mn J n;m (Y Y Y ;P)(w)! J 0 (P)(w) for everyw2 A and for all P2P(Q). 42 Next, we want to show that the convergence is uniform onP(Q) for w2 A. First, we will show that the sequence of functions 1 mn J n;m (Y Y Y ;P), n= 1;:::;¥ is equicontinuous as functions of P. Fixw2 A, lete> 0, and take P2P(Q). By compactness of[0;T] and Assumption A3, there exists ad > 0 such that for all ˜ P2 B d (P), and for all i= 1;:::;m, sup t2[0;T] j ¯ y i (t;P) ¯ y i (t; ˜ P)j< min e 6 E[je 1;1 j]+ 1 ; e 3 sup t2[0;T] j ¯ y i (t;P 0 )j ; sup t2[0;T] j ¯ y i (t;P) 2 ¯ y i (t; ˜ P) 2 j< e 3 : Also, sincew2 A, we havew2 A 1 . We choose N 3 such that n N 3 implies 1 mn m å i=1 n å k=1 je k;i j< E[je 1;1 j]+ 1: Then for n N 3 and for all ˜ P2 B d (P), 1 mn J n;m (Y Y Y ;P) 1 mn J n;m (Y Y Y ; ˜ P) = 1 mn m å i=1 n å k=1 ( ¯ y i (t n k ;P 0 )+ e k;i ¯ y i (t n k ;P)) 2 1 mn m å i=1 n å k=1 ( ¯ y i (t n k ;P 0 )+ e k;i ¯ y i (t n k ; ˜ P)) 2 = 1 mn m å i=1 n å k=1 2e k;i + ¯ y i (t n k ;P 0 ) ¯ y i (t n k ;P) ¯ y i (t n k ; ˜ P) ¯ y i (t n k ; ˜ P) ¯ y i (t n k ;P) 1 mn m å i=1 n å k=1 2 e k;i ¯ y i (t n k ; ˜ P) ¯ y i (t n k ;P) + 1 mn m å i=1 n å k=1 ¯ y i (t n k ;P 0 ) ¯ y i (t n k ; ˜ P) ¯ y i (t n k ;P) + 1 mn m å i=1 n å k=1 ¯ y i (t n k ;P) 2 ¯ y i (t n k ; ˜ P) 2 2 mn m å i=1 n å k=1 e k;i sup t2[0;T] ¯ y i (t; ˜ P) ¯ y i (t;P) 43 + 1 mn m å i=1 n å k=1 ¯ y i (t n k ;P 0 ) sup t2[0;T] ¯ y i (t; ˜ P) ¯ y i (t;P) + 1 m m å i=1 sup t2[0;T] ¯ y i (t;P) 2 ¯ y i (t; ˜ P) 2 < 2 E[je 1;1 j]+ 1 e 6 E[je 1;1 j]+ 1 ! + 1 mn m å i=1 n å k=1 sup t2[0;T] ¯ y i (t;P 0 ) e 3 sup t2[0;T] j ¯ y i (t;P 0 )j + 1 m m å i=1 sup t2[0;T] ¯ y i (t;P) 2 ¯ y i (t; ˜ P) 2 < e 3 + e 3 + e 3 =e: Therefore, the sequence of functions 1 mn J n;m (Y Y Y ;P), n= 1;:::;¥ is equicontinuous as functions of P, and by the Arzela-Ascoli Theorem, 1 mn J n;m (Y Y Y ;P)! J 0 (P) uniformly on compact subsets of P(Q), and therefore we have uniform convergence onP(Q). Theorem 4.2.3. Under assumptions A0-A5, define P n;m = arg min P2P(Q) J n;m (Y Y Y ;P)= arg min P2P(Q) m å i=1 n å k=1 (Y k;i ¯ y i (t n k ;P)) 2 ; for a fixed m,r(P n;m ;P 0 )! 0 with probability one as n!¥, wherer is the Prohorov metric. Proof. Fixw2 A (defined in Theorem 4.2.2). We know that 1 mn J n;m (Y Y Y ;P)(w)! J 0 (P) for every P2P(Q) by the previous theorem. Letd > 0, and define thed-neighborhood of P 0 by B d (P 0 )= ˜ P Z Q f(q q q)d ˜ P Z Q f(q q q)dP 0 <d; for all f2 C b (Q) : where C b (Q)=f f : Q!R j f is bounded and continuousg. 44 Then B d (P 0 ) is open inP(Q), and its complement (B d (P 0 )) C is compact, sinceP(Q) is compact. Also, by assumption A5, there exists e > 0 such that J 0 (P) J 0 (P 0 )> 2e for all P2 (B d (P 0 )) C . Also, by Theorem 4.2.2, there exists N 4 such that for n N 4 , 1 mn J n;m (Y Y Y ;P)(w) J 0 (P) < e 2 for all P2P(Q). Then for n N 4 and P2(B d (P 0 )) C , 1 mn J n;m (Y Y Y ;P) 1 mn J n;m (Y Y Y ;P 0 ) = 1 mn J n;m (Y Y Y ;P) 1 mn J n;m (Y Y Y ;P 0 )+ J 0 (P) J 0 (P)+ J 0 (P 0 ) J 0 (P 0 ) = 1 mn J n;m (Y Y Y ;P) J 0 (P) ! + J 0 (P) J 0 (P 0 ) ! + J 0 (P 0 ) 1 mn J n;m (Y Y Y ;P 0 ) ! > e 2 + 2e e 2 =e> 0: So for all P2(B d (P 0 )) C , we get J n;m (Y Y Y ;P)> J n;m (Y Y Y ;P 0 ): However, by definition of P n;m , we have J n;m (Y Y Y ;P n;m ) J n;m (Y Y Y ;P 0 ): Therefore, P n;m 2 B d (P 0 ) for all n N 4 , and sinced was arbitrary, we getr(P n;m ;P 0 )! 0. 45 4.3 Approximation and Convergence After establishing the consistency of the estimators P n;m in equation (4.2), followed by the esti- mates ˆ P n;m , using the realizations, given by equation (4.3), we need to establish computational convergence since P n;m or ˆ P n;m cannot be computed, and should be approximated numerically by ˆ P N n;m;M given in equation (4.4), where recall that ˆ P N n;m;M = arg min P2P M (Q) J N n;m (ˆ y ˆ y ˆ y;P)= arg min P2P M (Q) m å i=1 n å k=1 (ˆ y k;i ¯ y N k;i (P)) 2 : Corollary (3.6.3), proved in [21], is needed for the proof of Theorem 4.3.1. Consider the following assumptions, B1. For all n, m, and N, the map P7! J N n;m (ˆ y ˆ y ˆ y;P) is a continuous map. B2. Let(P(Q);r) be the space of probability measuresP(Q) with the Prohorov metricr. For any sequence of probability measures P M , such that r(P M ;P)! 0 inP(Q), ¯ y N k;i (P M )! ¯ y k;i (P) as N;M!¥ for k= 1;:::;n; i= 1;:::;m. B3. For all P2P(Q) and for k= 1;:::;n; i= 1;:::;m, ¯ y k;i (P) and ¯ y N k;i (P) are uniformly bounded. Theorem 4.3.1. Let (Q;d) be a compact and separable metric space, and (P(Q);r) be the space of probability measuresP(Q) with the Prohorov metricr. Let P M (Q)=fP2 ˜ P d (Q)j P= M å j=1 p j d q q q j ;q q q j 2fq q q j g M j=1 g ˜ P d (Q): 46 Then, under assumptions B1-B3, there exists minimizers ˆ P N n;m;M such that ˆ P N n;m;M = arg min P2P M (Q) J N n;m (ˆ y ˆ y ˆ y;P)= arg min P2P M (Q) m å i=1 n å k=1 (ˆ y k;i ¯ y N k;i (P)) 2 : In addition, for a fixed n and m, there exists a subsequence of ˆ P N n;m;M that converges to some ˆ P n;m as M;N!¥, and ˆ P n;m satisfies ˆ P n;m = arg min P2P(Q) J n;m (ˆ y ˆ y ˆ y;P)= arg min P2P(Q) m å i=1 n å k=1 (ˆ y k;i ¯ y k;i (P)) 2 : Proof. For any fixed n and m, by continuity of the map P7! J N n;m (ˆ y ˆ y ˆ y;P) for all n, m, and N, per assumption B1, and by Corollary 3.6.3, we can conclude that ˆ P N n;m;M exist. Since ˜ P d (Q) given by (3.1) is dense inP(Q), for M= 1;2;:::, we can construct a sequence of measures P M 2P M (Q) given by (3.2), such thatr(P M ;P)! 0 inP(Q), then by assumption B2 and B3, for some constant c, we have J N n;m (ˆ y ˆ y ˆ y;P M ) J n;m (ˆ y ˆ y ˆ y;P) = m å i=1 n å k=1 (ˆ y k;i ¯ y N k;i (P M )) 2 m å i=1 n å k=1 (ˆ y k;i ¯ y k;i (P)) 2 = m å i=1 n å k=1 2ˆ y k;i ¯ y N k;i (P M )+( ¯ y N k;i (P M )) 2 + 2ˆ y k;i ¯ y k;i (P) ( ¯ y k;i (P)) 2 ¯ y N k;i (P M ) ¯ y N k;i (P)+ ¯ y N k;i (P M ) ¯ y N k;i (P) < m å i=1 n å k=1 2ˆ y k;i ¯ y N k;i (P M ) ¯ y k;i (P) ¯ y k;i (P) ¯ y N k;i (P M ) < c m å i=1 n å k=1 ¯ y k;i (P) ¯ y N k;i (P M ) ! 0: 47 So we get, J N n;m (ˆ y ˆ y ˆ y;P M )! J n;m (ˆ y ˆ y ˆ y;P) as N;M!¥. Also, by definition, we have J N n;m (ˆ y ˆ y ˆ y; ˆ P N n;m;M ) J N n;m (ˆ y ˆ y ˆ y;P M ) (4.11) for each n, m, and N, and for all P M 2P M (Q). And, sinceP(Q) is compact, there exists a subsequence of ˆ P N n;m;M that converges to some ˆ P n;m as M;N!¥. So by taking the limit as M;N!¥ of (4.11), for all P2P(Q), we get J n;m (ˆ y ˆ y ˆ y; ˆ P n;m ) J n;m (ˆ y ˆ y ˆ y;P); therefore, ˆ P n;m = arg min P2P(Q) J n;m (ˆ y ˆ y ˆ y;P)= arg min P2P(Q) m å i=1 n å k=1 (ˆ y k;i ¯ y k;i (P)) 2 : In practice, we fix sufficiently large values of M, the number of nodes, and N, the level of discretization, in order to achieve the preferred accuracy level. The choice of N, the level of discretization, depends on the numerical framework, such as the Galerkin method, used for approximation of ¯ y k;i (P) by ¯ y N k;i (P). In order to achieve the level of accuracy that is desired, we select M nodes,fq q q j g M j=1 , and determine the values of the weights p j at each node q q q j by reducing 48 the optimization problem to a standard constrained estimation problem over Euclidean M-space as follows ˆ P N n;m;M = arg min P2P M (Q) J N n;m (ˆ y ˆ y ˆ y;P) = arg min P2P M (Q) m å i=1 n å k=1 (ˆ y k;i ¯ y N k;i (P)) 2 = arg min P2P M (Q) m å i=1 n å k=1 ˆ y k;i E[h k (x N k;i ;f N 0;i ;u k;i ;q q q)] 2 = arg min P2P M (Q) m å i=1 n å k=1 ˆ y k;i Z Q h k (x N k;i ;f N 0;i ;u k;i ;q q q)dP 2 = arg min ˜ p p p2 f R M m å i=1 n å k=1 ˆ y k;i M å j=1 h k (x N k;i ;f N 0;i ;u k;i ;q q q j )p j 2 = arg min ˜ p p p2 f R M ˆ y ˆ y ˆ y H H HD D D ˜ p p p 2 F ; where ˜ p p p=(p 1 ;:::; p M )2 f R M =f ˜ p p pj p j 2R + ;å M j=1 p j = 1g. Andk:k F is the Frobenius norm, ˆ y ˆ y ˆ y is the n m matrix of the realizations(fˆ y k;i g n k=1 ) m i=1 , H H H is the n mM matrix of H H H = (fh k (x N k;i ;f N 0;i ;u k;i ;q q q 1 )g n k=1 ) m i=1 ::: (fh k (x N k;i ;f N 0;i ;u k;i ;q q q M )g n k=1 ) m i=1 ; and D D D ˜ p p p is mM m matrix consisting of diagonal matrices as follows, D D D ˜ p p p = diag(p 1 ) diag(p 2 ) ::: diag(p M ) T ; where diag(a) is an m m diagonal matrix with a on the diagonals. As M increases, the optimization problem becomes ill-conditioned. As a result, M must be chosen large enough such that the desired accuracy of computational approximation is attained, 49 yet large numerical errors in solving the optimization problem, due to being poorly conditioned, are avoided. Later, when illustrating the numerical results, we are going to show how including a regularization term, chosen by a combination of art and science, takes care of the ill-posedness. 4.4 A General Approach to Error Models In Banks’ paper [21], as well as previous sections, the theorems are proved based on the fact that the random errors e k;i are independent and identically distributed with mean 0 and common variance s 2 . Although this is a very common assumption in simple statistical models due to computational practicality, it does not precisely model many biological and physical problems. In some problems, it is not reasonable to assume that the errors are independent. As a result, the following general approach is introduced to cover the models that do not have independent random errors. By equation (4.1) we had, Y k;i = ¯ y k;i (P 0 )+ e k;i ; k= 1;:::;n i ; i= 1;:::;m; Let ! e i =(e 1;i ;e 2;i ;:::;e n i ;i ) T ; i= 1;:::;m be independent and identically distributed Normal random vectors with mean ! 0 and covariance matrix S, i.e. ! e i N n i ( ! 0;S). As a result, the random error is independent among different subjects, but not at different time points for the same subject. Let us rewrite equation (4.1) in a vector form, i.e. ! Y i = ! ¯ y i (P 0 )+ ! e i ; i= 1;:::;m; (4.12) 50 where ! Y i =(Y 1;i ;Y 2;i ;:::;Y n i ;i ) T , and ! ¯ y i (P 0 )=( ¯ y 1;i ; ¯ y 2;i ;:::; ¯ y n i ;i ) T . By the properties of a covari- ance matrix, we know that a covariance matrix is symmetric; therefore, it is diagonalizable, [26], and we can write it as S= OLO T ; (4.13) where L is a diagonal matrix with eigenvalues of S being the diagonal elements, and O is an orthogonal matrix such that the columns of this matrix are the eigenvectors corresponding to the eigenvalues. Using the following transformation [61], let A=sL 1 2 O T ; (4.14) then left-multiplying equation (4.12) by A, we get A ! Y i = A ! ¯ y i (P 0 )+ A ! e i ; i= 1;:::;m: (4.15) Relabeling the terms in equation (4.15), we get ! V i = ! ¯ v i (P 0 )+ ! E i ; i= 1;:::;m: Now, we observe that Var( ! E i )= Var(A ! e i )= AVar( ! e i )A T ; (4.16) 51 then by substituting equation (4.13) and (4.14) in equation (4.16), and knowing that by orthogo- nality of matrix O, we have O T O= I, we get Var( ! E i )=sL 1 2 O T S(sL 1 2 O T ) T =s 2 L 1 2 O T OLO T OL 1 2 =s 2 L 1 2 LL 1 2 =s 2 I: Therefore, by applying the transformation to our model (4.12), the resulting transformed errors have a covariance matrix that is a diagonal matrix, so the transformed errors are uncorre- lated with common variance, and by normality, we can conclude that they are independent and identically distributed. As a result, the theorems in the previous section are applicable to the transformed model. 52 Chapter 5 The Maximum Likelihood Estimator In Chapter 4, we used a naive pooled approach where we established the existence and consis- tency of a least squares estimator, as well as the convergence of its finite-dimensional approxima- tion. The results in the previous chapter were based on the assumption that aggregate longitudinal data is what is available. However, if we assume that subject-specific longitudinal data is available for each drinking episode, then a mixed effects model would be more appropriate. In this chapter, we consider the mixed effects approach where we establish the existence and consistency of a maximum likelihood estimator for the joint probability distribution of the random parameters of the model, q q q. We take a nonparametric approach in the context of a mixed effects statistical model using a Prohorov metric framework on a set of feasible measures. A theoretical convergence result for a finite-dimensional approximation scheme for computing the maximum likelihood estimator is also established. First, we will set up the mathematical model for the mixed effects model. 53 5.1 The Mathematical Model Consider the following discrete-time mathematical model introduced in [2] for the i th subject at time-step k x k;i (q q q i )= g k1 (x k1;i (q q q i );u k1;i ;q q q i ); k= 1;:::;n i ; i= 1;:::;m; x 0;i =f 0;i ; i= 1;:::;m; where q q q i is the i th subject’s parameter vector in Q, denoting the set of admissible parameters, g k1 :HR n Q!H ,H is in general an infinite-dimensional Hilbert space, and u k1;i 2R n is the input. The output is given by y k;i (q q q i )= h k (x k;i (q q q i );f 0;i ;u k;i ;q q q i ); k= 1;:::;n i ; i= 1;:::;m; where h k :HHR n Q!R. For the mixed effects model, we define Y k;i = y k;i (q q q i )+ e k;i ; k= 1;:::;n i ; i= 1;:::;m; (5.1) where for each i= 1;2;:::;m, e k;i are independent and identically distributed (i.i.d.) with mean 0, variance s 2 , and e k;i j, k= 1;2;:::;n i , where j is a density with respect to a sigma-finite measure m onR and assumed to be continuous onR, and we assume that the random vectors [e 1;i ;:::;e n i ;i ] are independent with respect to i, i= 1;2;:::;m. That is, we assume that the error is independent across subjects and conditionally independent within subjects (i.e. given q q q i ). For 54 each i = 1;2;:::;m, let Y Y Y i = [Y 1;i ;Y 2;i ;:::;Y n i ;i ] T , y y y i (q q q i ) = [y 1;i (q q q i );y 2;i (q q q i );:::;y n i ;i (q q q i )] T , e e e i = [e 1;i ;e 2;i ;:::;e n i ;i ] T and rewrite (5.1) as Y Y Y i = y y y i (q q q i )+ e e e i ; i= 1;:::;m: Then, for i= 1;2;:::;m, Y Y Y i are independent with Y Y Y i f i (;q q q i ) :R n i !R, where f i (v v v;q q q i )= n i Õ k=1 j(v k y k;i (q q q i )); i= 1;:::;m; (5.2) where v v v=[v 1 ;v 2 ;:::;v n i ] T 2R n i . Let P2P(Q) denote a probability measure on the Borel sigma algebra on Q,S Q , whereP(Q) denotes the set of all probability measures defined onS Q , and let P 0 2P(Q) be the “true” distribution of the random vector q q q i . The goal is to find an estimate of P 0 . In order to generate an estimator for P 0 , and establish theoretical results and computational tools, we use the nonparametric maximum likelihood (NPML) approach, introduced by Lindsay and Mallet in [43, 47], using the Prohorov metric-based framework onP(Q), introduced by Banks and his co-authors in [21], and summarized in Section 3.6. For P2P(Q) and i= 1;2;:::;m, let L i (P;Y Y Y i )= Z Q f i (Y Y Y i ;q q q i )dP(q q q i ) = Z Q n i Õ k=1 j(Y k;i y k;i (q q q i ))dP(q q q i ) = Z Q n i Õ k=1 j(Y k;i h k (x k;i (q q q i );f 0;i ;u k;i ;q q q i ))dP(q q q i ) (5.3) be the contribution of the i th subject to the likelihood function 55 L n n n;m (P;Y Y Y)= m Õ i=1 L i (P;Y Y Y i ) = m Õ i=1 Z Q f i (Y Y Y i ;q q q i )dP(q q q i ) = m Õ i=1 Z Q n i Õ k=1 j(Y k;i y k;i (q q q i ))dP(q q q i ) = m Õ i=1 Z Q n i Õ k=1 j(Y k;i h k (x k;i (q q q i );f 0;i ;u k;i ;q q q i ))dP(q q q i ); (5.4) where n n n=fn i g m i=1 and Y Y Y =fY Y Y i g m i=1 . The goal is to find P that maximizes the likelihood function. Define the estimator P n n n;m = arg max P2P(Q) L n n n;m (P;Y Y Y) = arg max P2P(Q) m Õ i=1 Z Q f i (Y Y Y i ;q q q i )dP(q q q i ) = arg max P2P(Q) m Õ i=1 Z Q n i Õ k=1 j(Y k;i y k;i (q q q i ))dP(q q q i ) = arg max P2P(Q) m Õ i=1 Z Q n i Õ k=1 j(Y k;i h k (x k;i (q q q i );f 0;i ;u k;i ;q q q i ))dP(q q q i ): (5.5) Let ˆ y k;i be realizations of the random variables Y k;i , and define ˆ P n n n;m = arg max P2P(Q) L n n n;m (P; ˆ y ˆ y ˆ y) = arg max P2P(Q) m Õ i=1 Z Q f i (ˆ y ˆ y ˆ y i ;q q q i )dP(q q q i ) = arg max P2P(Q) m Õ i=1 Z Q n i Õ k=1 j(ˆ y k;i y k;i (q q q i ))dP(q q q i ) = arg max P2P(Q) m Õ i=1 Z Q n i Õ k=1 j(ˆ y k;i h k (x k;i (q q q i );f 0;i ;u k;i ;q q q i ))dP(q q q i ); (5.6) 56 where ˆ y ˆ y ˆ y=fˆ y ˆ y ˆ y i g m i=1 , with ˆ y ˆ y ˆ y i =[ˆ y 1;i ; ˆ y 2;i ;:::; ˆ y n i ;i ] T . The results of Lindsay and Mallet in [43, 47] states that the maximum likelihood estimator ˆ P n n n;m can be found in the class of discrete distributions with at most m support points, i.e. ˆ P n n n;m 2 P M (Q), where M m. Also, in [63, 79], algorithms are established for this purpose. Similar to our approach for the least squares estimator in [2], we cannot exactly compute the maximum likelihood estimator, ˆ P n n n;m , since y k;i must be approximated numerically by y N k;i using a Galerkin numerical scheme with N denoting the level of discretization. Also, similarly, we define our approximating estimator over the setP M (Q) where M denotes the number of nodes,fq q q j g M j=1 . As a result, the optimization is over a finite set of parameters, the rational weightsfp j g M j=1 . Thus, our approximating estimator is ˆ P N n n n;m;M = arg max P2P M (Q) L N n n n;m (P; ˆ y ˆ y ˆ y) = arg max P2P M (Q) m Õ i=1 Z Q f N i (ˆ y ˆ y ˆ y i ;q q q i )dP(q q q i ) = arg max P2P M (Q) m Õ i=1 Z Q n i Õ k=1 j(ˆ y k;i h k (x N k;i (q q q i );f N 0;i ;u k;i ;q q q i ))dP(q q q i ): (5.7) 5.2 Existence and Consistency of the Maximum Likelihood Estimator As mentioned in the literature survey in Chapter 3, in [43], the existence and uniqueness of a maximum likelihood estimator of a mixing distribution using the geometry of mixture likelihoods was established. Similarly, in [47], the existence and uniqueness of the maximum likelihood estimator for the distribution of the parameters of a random coefficient regression model was established. Here we provide an existence argument based on the maximization of a continuous function over a compact set. 57 The following theorem establishes the existence of the estimator ˆ P n n n;m in (5.7), obtained from the realizationsfˆ y k;i g; k = 1;:::;n i ; i = 1;:::;m of the random variablesfY k;i g; k = 1;:::;n i ; i = 1;:::;m. This is sufficient for establishing the existence of the maximum likeli- hood estimator P n n n;m in (5.5). Theorem 5.2.1. For i= 1;2;:::;m, letL i be given by equation (5.3), and letL n n n;m (P; ˆ y ˆ y ˆ y) be given by equation (5.4) where ˆ y ˆ y ˆ y=fˆ y ˆ y ˆ y i g m i=1 , with ˆ y ˆ y ˆ y i =[ˆ y 1;i ; ˆ y 2;i ;:::; ˆ y n i ;i ] T . Assume that for each ˆ y ˆ y ˆ y i , we have a continuous functionL i (:; ˆ y ˆ y ˆ y i ) :P(Q)!R, and also for each P2P(Q) we have a measur- able functionL i (P;:) :R n i !R. Then there exists a measurable function ˆ P n n n;m :Õ m i=1 R n i !P(Q) such that L n n n;m ( ˆ P n n n;m ; ˆ y ˆ y ˆ y)= sup P2P(Q) L n n n;m (P; ˆ y ˆ y ˆ y): Proof. The theorem can be proven in a similar way as in Theorem 4.2.1. We have the similar setting of the theorem with the difference that we are taking the sup (instead of the inf) of a continuous function over a compact set. In order to establish the consistency of the maximum likelihood estimator P n n n;m , we show that r(P n n n;m ;P 0 ) converges almost surely to zero. We do this by applying a theorem by Kiefer and Wol- fowitz in [40], establishing that the nonparametric maximum likelihood approach is statistically consistent. In other words, as the number of subjects, m, gets larger, the estimator P n n n;m converges in probability to P 0 , the “true” distribution, in the sense of the Prohorov metric, or weakly, or in distribution. Here, we have set up our problem in a way that makes establishing the consistency a straightforward application of the consistency result in [40]. 58 Theorem 5.2.2. For each i= 1;:::;m, assume that the map q q q i 7! f i (v v v;q q q i ) from Q intoR is contin- uous for each v v v2R n i , and f i (v v v;q q q i ) is measurable in v v v for any q q q i 2 Q, where f i is given by (5.2). Assume further that P 0 is identifiable; that is, for P 1 2P(Q) with P 1 6= P 0 , we have m Õ i=1 Z z z z i 0 Z Q n i Õ k=1 j(z k h k (x k;i (q q q i );f 0;i ;u k;i ;q q q i ))dP 1 (q q q i )dm n i 6= m Õ i=1 Z z z z i 0 Z Q n i Õ k=1 j(z k h k (x k;i (q q q i );f 0;i ;u k;i ;q q q i ))dP 0 (q q q i )dm n i ; for at least one z z z=[z z z T 1 ;:::;z z z T m ] T 2R å m i=1 n i , where for i= 1;2;:::;m, z z z i =[z 1 ;:::;z n i ] T 2R n i , and the technical integrability assumption holds; that is, for any P2P(Q), lim e#0 E P 0 " log sup ˜ P2B e (P) L i ( ˜ P;Y Y Y i ) L i (P 0 ;Y Y Y i ) # + <¥; whereL i is given by equation (5.3). Then, as m!¥, r(P n n n;m ;P 0 )! 0 almost surely (i.e. with probability 1) and therefore in probability as well. Proof. The assumptions we have made in the previous section (Section 5.1) and in the statement of the theorem are sufficient to argue that assumptions 1-5 in [40] are satisfied. Therefore, we can apply the results established in [40]. The conclusion of the consistency result in [40] is that the cumulative distribution functions, F n n n;m , corresponding to P n n n;m converge almost surely to the cumulative distribution function F 0 corresponding to P 0 at every point of continuity of F 0 . It follows thatr(P n n n;m ;P 0 )! 0 almost surely (i.e with probability 1); thus, in probability as well, as m!¥, and the theorem is proven. 59 5.3 Approximations and Convergence We want to establish the convergence of the finite-dimensional maximum likelihood estimators to the maximum likelihood estimator corresponding to the infinite-dimensional model. As men- tioned earlier, we cannot actually compute ˆ P n n n;m in (5.6) and consequently we approximate it by ˆ P N n n n;m;M in (5.7). Consider the following assumptions, C1. For all n n n, m, and N, the map P7!L N n n n;m (P; ˆ y ˆ y ˆ y) is a continuous map. C2. For anyfP M g;P2P(Q) such that r(P M ;P)! 0, we haveL N i (P M ; ˆ y ˆ y ˆ y i )!L i (P; ˆ y ˆ y ˆ y i ) as N;M!¥ for i= 1;:::;m. C3. For all P2P(Q) and i= 1;:::;m,L N i (P M ; ˆ y ˆ y ˆ y i ) andL i (P; ˆ y ˆ y ˆ y i ) are uniformly bounded. Theorem 5.3.1. Under assumptions C1-C3, there exists maximizers ˆ P N n n n;m;M given by equation (5.7). In addition, there exists a subsequence of ˆ P N n n n;m;M that converges to ˆ P n n n;m given by equation (5.6) as M;N!¥. Proof. For all n n n, m, and N, by continuity of the map P7!L N n n n;m (P; ˆ y ˆ y ˆ y), per assumption C1, and compactness of(P(Q);r), we can conclude that ˆ P N n n n;m;M exists. In [21], it is shown that ˜ P d (Q) given by equation (3.1) is a dense subset ofP(Q). Thus, for M= 1;2;:::, construct a sequence of probability measures P M 2P M (Q) ˜ P d (Q)P(Q), such thatr(P M ;P)! 0 inP(Q). Then by assumptions C2 and C3, we have L N n n n;m (P M ; ˆ y ˆ y ˆ y)L n n n;m (P; ˆ y ˆ y ˆ y) = m Õ i=1 L N i (P M ; ˆ y ˆ y ˆ y i ) m Õ i=1 L i (P; ˆ y ˆ y ˆ y i ) ! 0: 60 Consequently,L N n n n;m (P M ; ˆ y ˆ y ˆ y)!L n n n;m (P; ˆ y ˆ y ˆ y) as N;M!¥. In addition, by definition, for each n n n, m, and N, and for all P M 2P M (Q), we have L N n n n;m ( ˆ P N n n n;m;M ; ˆ y ˆ y ˆ y)L N n n n;m (P M ; ˆ y ˆ y ˆ y): (5.8) In addition, by compactness ofP(Q), there exists a subsequence of ˆ P N n n n;m;M that converges to ˆ P n n n;m as M;N!¥. Thus, by taking the limit in (5.8) as M;N!¥, for all P2P(Q), we find that L n n n;m ( ˆ P n n n;m ; ˆ y ˆ y ˆ y)L n n n;m (P; ˆ y ˆ y ˆ y); thus, ˆ P n n n;m = arg max P2P(Q) L n n n;m (P; ˆ y ˆ y ˆ y) as given in equation (5.6). In order to apply the results established here to get numerical results, in practice, to achieve a desired level of accuracy, M, the number of nodes, and N, the level of discretization, are fixed at sufficiently large values. In other words, we choose a sufficiently large value for the level of discretization, N, how large that needs to be, of course, depends on the particular numerical discretization scheme chosen. The most common choice would be using a Galerkin-based method to the approximate y k;i by y N k;i . And, we also choose a sufficiently large value for M, the number of nodes,fq q q j g M j=1 . Therefore, the optimization problem is reduced to a standard constrained estimation problem over Euclidean M-space, in which we determine the values of the weights p j at each node q q q j with the constraints that the weights should all be non-negative, and they should sum to one. By equation (5.7), it follows that 61 ˆ P N n n n;m;M = arg max P2P M (Q) L N n n n;m (P; ˆ y ˆ y ˆ y) = arg max P2P M (Q) m Õ i=1 Z Q n i Õ k=1 j(ˆ y k;i y N k;i (q q q i ))dP(q q q i ) = arg max P2P M (Q) m Õ i=1 Z Q n i Õ k=1 j(ˆ y k;i h k (x N k;i (q q q i );f N 0;i ;u k;i ;q q q i ))dP(q q q i ) = arg max ˜ p p p2 f R M m Õ i=1 M å j=1 n i Õ k=1 j(ˆ y k;i h k (x N k;i ;f N 0;i ;u k;i ;q q q j ))p j = arg max ˜ p p p2 f R M m Õ i=1 M å j=1 p j n i Õ k=1 j(ˆ y k;i h k (x N k;i ;f N 0;i ;u k;i ;q q q j )); where ˜ p p p=(p 1 ;:::; p M )2 f R M =f ˜ p p pj p j 2R + ;å M j=1 p j = 1g. We note that computing ˆ P N n n n;m;M involves high order products of very small numbers which not unexpectedly can cause numerical underflow. In order to mitigate this, we maximize the log- likelihood function instead and rewrite it in a form that lends itself to the use of the MATLAB optimization routine logsumexp as follows ˆ P N n n n;m;M = arg max ˜ p p p2 f R M log m Õ i=1 M å j=1 p j n i Õ k=1 j(ˆ y k;i h k (x N k;i ;f N 0;i ;u k;i ;q q q j )) (5.9) = arg max ˜ p p p2 f R M m å i=1 log M å j=1 p j n i Õ k=1 j(ˆ y k;i h k (x N k;i ;f N 0;i ;u k;i ;q q q j )) = arg max ˜ p p p2 f R M m å i=1 log M å j=1 exp log p j n i Õ k=1 j(ˆ y k;i h k (x N k;i ;f N 0;i ;u k;i ;q q q j )) = arg max ˜ p p p2 f R M m å i=1 log M å j=1 exp log(p j )+ n i å k=1 log j(ˆ y k;i h k (x N k;i ;f N 0;i ;u k;i ;q q q j )) : 62 Chapter 6 Abstract Parabolic Systems In this chapter, we briefly describe what an abstract parabolic system is, its properties, its finite- dimensional approximation, and an associated convergence theory. In Section 6.1, we define abstract parabolic systems and discuss their properties. In Section 6.2, we describe how to get the finite-dimensional approximating systems using a Galerkin approach. In Section 6.3, we verify that the assumptions B1-B3 in Chapter 4 (in Section 4.3) related to the least squares estimator are satisfied. And, in Section 6.4, we verify that the assumptions C1-C3 in Chapter 5 (in Section 5.3) related to the maximum likelihood estimator are satisfied. 6.1 Abstract Parabolic Systems and Their Properties Let V and H be Hilbert spaces with V continuously and densely embedded in H, i.e. V ,! H. Let<:;:> H andj:j H denote the inner product and norm on H, respectively, and letk:k V denote the norm on V . We identify H with its dual H to obtain the Gelfand triple V ,! H ,! V and let<;> V ;V denote the duality pairing between V and V induced by the dense and continuous embedding of V in H and H in V . For each q q q2 Q, let a(q q q;:;:) : VV!C be a sesquilinear form satisfying the conditions given below: 63 D1. (Boundedness) There exists a constanta 0 such that for all ˜ y;y2 V , we have ja(q q q; ˜ y;y)ja 0 k ˜ yk V kyk V : D2. (Coercivity) There existsl 0 2R andm 0 > 0 such that for ally2 V , we have a(q q q;y;y)+l 0 jyj 2 H m 0 kyk 2 V : D3. (Continuity) For all ˜ y;y2 V and q q q; ˜ q q q2 Q, we have ja(q q q; ˜ y;y) a(˜ q q q; ˜ y;y)j d(q q q; ˜ q q q)k ˜ yk V kyk V : Consider the following abstract parabolic system in weak form, < ˙ x;y> V ;V +a(q q q;x;y)=< B B B(q q q)u;y> V ;V ; y2 V; x(0)= x 0 ; (6.1) y(t)= C C C(q q q)x(t); where x 0 2 H, u2 L 2 ([0;T];R n ) is the input to the system, y2 L 2 ([0;T];R) is the output of the system, B B B(q q q) :R n ! V , and C C C(q q q) : V!R are bounded linear operators. Using standard variational arguments (see for example, [44]) it can be shown that the system (6.1) has a unique solution in y j y2 L 2 ([0;T];V); ˙ y2 L 2 ([0;T];V ) C([0;T];H): 64 However, we use a linear semigroup approach to convert the system in (6.1) into a discrete- time state space model and then use arguments from linear semigroup theory [19, 54] to argue convergence of finite-dimensional Galerkin-based approximations and conclude that assumptions B1-B3 and C1-C3 are satisfied. Under the assumptions D1 and D2, a(q q q;:;:) defines a bounded linear operator A A A(q q q) : V! V such that for ˜ y;y2 V , < A A A(q q q) ˜ y;y> V ;V =a(q q q; ˜ y;y): If we restrict the operator A A A(q q q) to the subspace Dom(A A A(q q q)) =fy2 V j A A A(q q q)y2 Hg, it becomes the infinitesimal generator of a holomorphic or analytic semigroup,fe A A A(q q q)t j t 0g, of bounded linear operators on H. The operator A A A(q q q) is referred to as being regularly dissipative [16, 19, 74]. Moreover, this semigroup can be extended and restricted to be a holomorphic semigroup on V and V , respectively, as well [16, 74]. It follows that the system (6.1) can be written in a state space form as the evolution system with time-invariant operators A A A(q q q), B B B(q q q), and C C C(q q q), as follows ˙ x(t)= A A A(q q q)x(t)+ B B B(q q q)u(t); x(0)= x 0 ; (6.2) y(t)= C C C(q q q)x(t): with the mild solution of (6.2) given by the variation of constants formula as 65 x(t;q q q)= e A A A(q q q)t x 0 + Z t 0 e A A A(q q q)(ts) B B B(q q q)u(s)ds; t 0; (6.3) y(t;q q q)= C C C(q q q)x(t;q q q): Let t > 0 denote the length of the sampling interval and consider strictly zero-order hold inputs of the form u(t) = u k1 ; t2 [(k 1)t;kt); k = 1;2;:::, and let x k = x(kt) and y k = y(kt); k= 1;2;:::. Applying (6.3) on each sub-interval [(k 1)t;kt], k= 1;2;:::, we obtain the discrete-time dynamical system given by x k = ˆ A A A(q q q)x k1 + ˆ B B B(q q q)u k1 ; k= 1;2;:::; (6.4) y k = ˆ C C C(q q q)x k ; k= 1;2;:::; (6.5) where x 0 2 V , ˆ A A A(q q q)= e A A A(q q q)t , ˆ B B B(q q q)= R t 0 e A A A(q q q)s B B B(q q q)ds, and ˆ C C C(q q q)= C C C(q q q). 6.2 Finite-Dimensional Approximating Systems Using a standard Galerkin approach (see, for example, [18]), we can approximate the discrete- time system given in (6.4)-(6.5) by a sequence of approximating finite-dimensional discrete-time systems in a sequence of finite-dimensional subspaces, V N , of V , where N2Z + . In order to argue convergence, we will require the following additional assumption concerning the subspaces V N , E1. (Approximation) For every x2 V , there exists x N 2 V N such thatk x x N k V ! 0 as N!¥. 66 For each q q q2 Q, let A A A N (q q q) be the linear operator on V N obtained by restricting the form a(q q q;:;:) to V N V N , i.e. for ˜ y N ;y N 2 V N , < A A A N (q q q) ˜ y N ;y N > V ;V =a(q q q; ˜ y N ;y N ): Let p N : H! V N denote the orthogonal projection of H onto V N and set B B B N (q q q)=p N B B B(q q q), where in this definition p N is understood to be the natural extension of the projection opera- tor p N to V from its dense subspace H. Define ˆ A A A N (q q q)= e A A A N (q q q)t , ˆ B B B N (q q q)= R t 0 e A A A N (q q q)s B B B N (q q q)ds, and ˆ C C C N (q q q)x k = ˆ C C C(q q q). We then consider the sequence of approximating discrete-time dynamical systems given by x N k = ˆ A A A N (q q q)x N k1 + ˆ B B B N (q q q)u k1 ; k= 1;2;:::; (6.6) y N k = ˆ C C C N (q q q)x N k ; k= 1;2;:::; (6.7) with x N 0 =p N x 0 2 V N . Then under assumptions D1-D3 and E1 together with the assumption that Q is compact and continuity assumptions on the operators B B B(q q q) and C C C(q q q), using the Trotter-Kato theorem (see, for example, [15]), it can be argued that lim N!¥ k x N k x k k V = 0 and lim N!¥ jy N k y k j= 0 for each x 0 2 V , and uniformly in q q q for q q q2 Q and k2f1;2;:::;Kg, for any fixed K2N + , where x N k and y N k are given by (6.6) and (6.7), respectively and x k and y k are given by (6.4) and (6.5), respectively. 67 6.3 Verification of the Finite-Dimensional Assumptions of the LS Estimator We next show that the assumptions B1-B3 in Section 4.3 are satisfied. To establish that assump- tion B1 holds, it is enough to show that for each N, k, and i and any sequence of probability measures P M P(Q) withr(P M ;P)! 0 for some P2P(Q), we havej ¯ y N k;i (P M ) ¯ y N k;i (P)j! 0 as M!¥. For each i = 1;2;:::;m and each k = 0;1;2;:::;n, it is not difficult to argue (for example via the Trotter-Kato theorem from linear semigroup theory, see for example, [54]) that the map q q q7! y N k;i (q q q) is continuous from Q intoR, where for each i= 1;2;:::;m and q q q2 Q, y N k;i (q q q) is given by (6.6) and (6.7). Since Q was assumed to be compact, it follows that y N k;i 2 C b (Q) and therefore from the definition of the Prohorov metric given in Section 3.6, that ifr(P M ;P)! 0 as M!¥, then ¯ y N k;i (P M )! ¯ y N k;i (P) inR as M!¥, assumption B1 is satisfied. In order to show that the assumption B2 is satisfied, from arguments given previously in this section we have that lim N!¥ k x N k;i x k;i k V = 0 and lim N!¥ jy N k;i y k;i j= 0 for each x 0;i 2 V , and uniformly in q q q for q q q2 Q, k= 1;:::;n; i= 1;:::;m. We want to show that for any sequence of probability measures P M , such thatr(P M ;P)! 0 inP(Q), and for k= 1;:::;n; i= 1;:::;m, as N;M!¥, we have ¯ y N k;i (P M )= Z Q y N k;i dP M ! ¯ y k;i (P)= Z Q y k;i dP: Lete > 0 be given and choose N 0 such that for N N 0 , we havejy N k;i y k;i j<e=2 on all of Q. Then for every M, since P M is a probability measure, we have R Q jy N k;i y k;i jdP M <e=2. 68 It follows that Z Q y N k;i dP M Z Q y k;i dP Z Q y N k;i y k;i dP M + Z Q y k;i dP M Z Q y k;i dP e 2 + e 2 =e; where the second term is made less thane=2 by using the fact thatr(P M ;P)! 0 and choosing M sufficiently large. This establishes assumption B2. Finally, the same reasoning used above to argue that for each N, y N k;i (q q q) are uniformly bounded for q q q2 Q applies as well to y k;i (q q q). This together with the fact thatjy N k;i (q q q) y k;i (q q q)j! 0 uni- formly in q q q for q q q2 Q are sufficient to conclude that assumption B3 holds. 6.4 Verification of the Finite-Dimensional Assumptions of the MLE Next, we show that the assumptions C1-C3 in Section 5.3 are satisfied. Under the assump- tions D1-D3 and E1 using the Trotter-Kato approximation theorem from the theory of linear semigroups of operators [15, 54], we were able to conclude that lim N!¥ k x N k x k k V = 0 and lim N!¥ jy N k y k j= 0 for each x 0 2 V , and uniformly in q q q for q q q2 Q and k2f1;2;:::;Kg, for any fixed K2N + . We can now use the results to show that an abstract parabolic system satisfies assumptions C1-C3 given in Section 5.3. To show that the assumption C1 is satisfied, we need to show that for all n n n, m, and N, the map P7!L N n n n;m (P; ˆ y ˆ y ˆ y) is a continuous map. It suffices to show that for any fixed n n n, m, and N, and for any sequence of probability measures P M , such that r(P M ;P)! 0 in P(Q), we haveL N n n n;m (P M ; ˆ y ˆ y ˆ y)!L N n n n;m (P; ˆ y ˆ y ˆ y) as M!¥. Towards this end, we see that 69 L N n n n;m (P M ; ˆ y ˆ y ˆ y)L N n n n;m (P; ˆ y ˆ y ˆ y) = m Õ i=1 L N i (P M ; ˆ y ˆ y ˆ y i ) m Õ i=1 L N i (P; ˆ y ˆ y ˆ y i ) = m Õ i=1 Z Q n i Õ k=1 j(ˆ y k;i y N k;i (q q q i ))dP M (q q q i ) m Õ i=1 Z Q n i Õ k=1 j(ˆ y k;i y N k;i (q q q i ))dP(q q q i ) ! 0 by definition of the Prohorov metric. It follows that the assumption C1 is satisfied. Next, we show that the assumption C2 is satisfied. Using the argument cited above from [2], we have that lim N!¥ k x N k;i x k;i k V = 0 and lim N!¥ jy N k;i y k;i j= 0 for each x 0;i 2 V , uniformly in q q q i for q q q i 2 Q, k= 1;:::;n i ; i= 1;:::;m. We want to show that for any sequence of probability measures P M , such that r(P M ;P)! 0 inP(Q), and for i = 1;:::;m, as N;M!¥, we have L N i (P M ; ˆ y ˆ y ˆ y i )!L i (P; ˆ y ˆ y ˆ y i ). Recall that j is assumed to be continuous. Let e > 0, and choose N 0 such that for N N 0 , and for every M, R Q Õ n i k=1 j(ˆ y k;i y N k;i (q q q i ))Õ n i k=1 j(ˆ y k;i y k;i (q q q i )) dP M (q q q i ) <e=2. Then, we have L N i (P M ; ˆ y ˆ y ˆ y i )L i (P; ˆ y ˆ y ˆ y i ) = Z Q n i Õ k=1 j(ˆ y k;i y N k;i (q q q i ))dP M (q q q i ) Z Q n i Õ k=1 j(ˆ y k;i y k;i (q q q i ))dP(q q q i ) Z Q n i Õ k=1 j(ˆ y k;i y N k;i (q q q i )) n i Õ k=1 j(ˆ y k;i y k;i (q q q i )) dP M (q q q i ) + Z Q n i Õ k=1 j(ˆ y k;i y k;i (q q q i ))dP M (q q q i ) Z Q n i Õ k=1 j(ˆ y k;i y k;i (q q q i ))dP(q q q i ) < e 2 + e 2 =e; 70 where the second term is less than e=2 by definition of the Prohorov metric. Consequently, the assumption C2 is satisfied. Finally, we want to show that the assumption C3 is satisfied. We want to show that for all P2P(Q) and for i= 1;:::;m,L N i (P M ; ˆ y ˆ y ˆ y i ) andL i (P; ˆ y ˆ y ˆ y i ) are uniformly bounded. Recall that the parameter space Q is compact. Thus, for q q q i 2 Q, and for each N, y N k;i (q q q i ) are uniformly bounded. Similarly, y k;i (q q q i ) are also uniformly bounded and we also have thatjy N k;i (q q q i ) y k;i (q q q i )j! 0 uni- formly in q q q i for q q q i 2 Q. Therefore, we can conclude that the assumption C3 is satisfied. 71 Chapter 7 Application to the Transdermal Transport of Alcohol 7.1 A Random Parabolic PDE As in [66, 67, 68], we consider a first principles physics-based initial-boundary value model for the transdermal transport of ethanol in the form of a random parabolic partial differential equation. The parameters of this model are considered to be random due to a number of different sources of uncertainty. Converting to dimensionless variables, the following initial-boundary value problem is obtained: ¶x ¶t (t;h)= q 1 ¶ 2 x ¶h 2 (t;h); 0<h < 1; t> 0; (7.1) q 1 ¶x ¶h (t;0)= x(t;0); t> 0; (7.2) q 1 ¶x ¶h (t;1)= q 2 u(t); t> 0; (7.3) x(0;h)= x 0 ; 0<h < 1; (7.4) y(t)= x(t;0); t> 0; (7.5) 72 where t andh are the temporal and spatial variables, respectively, and x(t;h) indicates the con- centration of ethanol in the epidermal layer of the skin at time t and depthh, whereh = 0 is at the skin surface and h = 1 is at the boundary between the epidermal and dermal layers of the skin. The input to the system is u(t), which is the BrAC or BAC at time t, and the output is y(t), which is the TAC at time t. Equation (7.1) represents the transport of ethanol through the epidermal layer of the skin. The boundary conditions (7.2) and (7.3) represent respectively the evaporation of ethanol at the skin surface and the flux of ethanol across the boundary between the epidermal and dermal layers. It is assumed that there is no alcohol in the epidermal layer of the skin at time t = 0, so the initial condition (7.4) is x 0 (h)= 0, 0<h < 1. Finally, the output equation (7.5) represents the TAC level measured by the biosensor at the skin surface. The parameters in the system that will be assumed to be random are q 1 and q 2 , which represent respectively the normalized diffusivity and the normalized flux gain at the boundary between the dermal and epidermal layers. The values or distributions of these parameters are assumed to depend on environmental conditions, the particular sensor being used, and the physiological characteristics of the individual wearing the sensor. The parameter vector is q q q=(q 1 ;q 2 )2 Q, where Q is assumed to be a compact subset ofR + R + with metric d Q . 7.2 The Weak Form In order to apply the results established in Chapter 6 to the model for the transdermal transport of alcohol problem, we must first rewrite the system (7.1)-(7.5) in weak form, identify the feasible parameter space Q, the Hilbert spaces H and V , the sesquilinear form a(q q q;:;:) : VV!C, and 73 the operators B B B(q q q), and C C C(q q q). Then we must choose the approximating space V N and argue that the assumptions D1-D3 and E1 are satisfied. The parameter space Q is assumed to be a compact subset ofR + R + with any p-metric denoted by d Q . We set V = H 1 (0;1), H = L 2 (0;1), together with their standard inner products and norms, and therefore it follows that V = H 1 (0;1). It is well-known that these three spaces, H, V , and V , form a Gelfand triple. To rewrite the system (7.1)-(7.5) in weak form, we multiply by a test functiony2 V , then by integration by parts we have < ˙ x;y> V ;V =< ¶x ¶t (t;h);y(t;h)> V ;V ; =< q 1 ¶ 2 x ¶h 2 (t;h);y(t;h)> V ;V ; = Z 1 0 q 1 ¶ 2 x ¶h 2 (t;h)y(t;h)dh; = q 2 u(t)y(t;1) x(t;0)y(t;0) Z 1 0 q 1 ¶x ¶h (t;h) ¶y ¶h (t;h)dh: Writing it in the weak form, we get < ˙ x;y> V ;V + Z 1 0 q 1 ¶x ¶h (t;h) ¶y ¶h (t;h)dh+ x(t;0)y(t;0)= q 2 u(t)y(t;1): where <;> V ;V denotes the duality pairing between V and V . Then for q q q2 Q, u2R, and ˜ y;y2 V , we set a(q q q; ˜ y;y)= Z 1 0 q 1 ˜ y 0 (h)y 0 (h)dh+ ˜ y(0)y(0); (7.6) < B B B(q q q)u;y> V ;V = q 2 uy(1); C C C(q q q)y = C C Cy =y(0): 74 7.3 Verification of the Sesquilinear Form and Finite-Dimensional Assumptions Standard arguments involving the Sobolev Embedding Theorem (see, for example, [1]) can be used to argue that the assumptions D1-D3 are satisfied and clearly the operators B B B(q q q) and C C C(q q q) are continuous in the uniform operator topology with respect to q q q2 Q. Here, we will show that the assumptions D1-D3 are satisfied for the sesquilinear form [65] given by the equation (7.6): D1. (Boundedness) There exists a constanta 0 such that for all ˜ y;y2 V , we have ja(q q q; ˜ y;y)ja 0 k ˜ yk V kyk V : In order to show that the boundedness assumption holds, we apply the Cauchy-Schwarz in- equality as well as the Morrey’s inequality as follows ja(q q q; ˜ y;y)j= Z 1 0 q 1 ˜ y 0 (h)y 0 (h)dh+ ˜ y(0)y(0) q 1 Z 1 0 ˜ y 0 (h)y 0 (h)dh + ˜ y(0)y(0) q 1 Z 1 0 ˜ y 0 (h) 2 dh 1=2 Z 1 0 y 0 (h) 2 dh 1=2 + ˜ ak ˜ yk V kyk V q 1 k ˜ yk V kyk V + ˜ ak ˜ yk V kyk V a 0 k ˜ yk V kyk V : D2. (Coercivity) There existsl 0 2R andm 0 > 0 such that for ally2 V , we have a(q q q;y;y)+l 0 jyj 2 H m 0 kyk 2 V : 75 In order to show that the coercivity assumption holds, since Q is assumed to be a compact subset ofR + R + , there existsm 0 > 0 such that we have a(q q q;y;y)+l 0 jyj 2 H = q 1 Z 1 0 jy 0 (h)j 2 dh+y(0) 2 +l 0 Z 1 0 jy(h)j 2 dh m 0 Z 1 0 jy 0 (h)j 2 dh+ Z 1 0 jy(h)j 2 dh m 0 kyk 2 V : D3. (Continuity) For ally 1 ;y 2 2 V and q q q; ˜ q q q2 Q, we have ja(q q q; ˜ y;y) a(˜ q q q; ˜ y;y)j d(q q q; ˜ q q q)k ˜ yk V kyk V : In order to show that the continuity assumption holds, we let d Q to be the Euclidean distance onR, and we apply the Cauchy-Schwarz inequality as follows ja(q q q; ˜ y;y) a(˜ q q q; ˜ y;y)j= Z 1 0 q 1 ˜ y 0 (h)y 0 (h)dh Z 1 0 ˜ q 1 ˜ y 0 (h)y 0 (h)dh = Z 1 0 (q 1 ˜ q 1 ) ˜ y 0 (h)y 0 (h)dh q 1 ˜ q 1 Z 1 0 ˜ y 0 (h)y 0 (h)dh q 1 ˜ q 1 Z 1 0 ˜ y 0 (h) 2 dh 1=2 Z 1 0 y 0 (h) 2 dh 1=2 d(q q q; ˜ q q q)k ˜ yk V kyk V So we conclude that the assumptions D1-D3 are satisfied for the sesquilinear form given by the equation (7.6). 76 It follows from Chapter 6 that in the case of the least squares estimator, we have g k1 (x k1;i ;u k1;i ;q q q)= g(x k1;i ;u k1;i ;q q q) = ˆ A A A(q q q)x k1;i + ˆ B B B(q q q)u k1;i ; k= 1;2;:::;n i ; i= 1;:::;m; h k (x k;i ;f 0;i ;u k;i ;q q q)= h(x k;i ;u k;i ;q q q) = ˆ C C C(q q q)x k;i ; k= 1;:::;n i ; i= 1;:::;m; And, similarly, it follows that in the case of the maximum likelihood estimator, we have g k1 (x k1;i (q q q i );u k1;i ;q q q i )= g(x k1;i (q q q i );u k1;i ;q q q i ) = ˆ A A A(q q q i )x k1;i (q q q i )+ ˆ B B B(q q q i )u k1;i ; k= 1;2;:::;n i ; i= 1;:::;m; h k (x k;i (q q q i );f 0;i ;u k;i ;q q q i )= h(x k;i (q q q i );u k;i ;q q q i ) = ˆ C C C(q q q i )x k;i (q q q i ); k= 1;:::;n i ; i= 1;:::;m; where ˆ A A A(q q q) = e A A A(q q q)t , ˆ B B B(q q q) = R t 0 e A A A(q q q)s B B B(q q q)ds, and ˆ C C C(q q q) = C C C(q q q) with t > 0 the length of the sampling interval. Let V N ; N = 1;2;:::, be the span of the standard linear splines defined with respect to the uniform meshf0;1=N;2=N;:::;(N 1)=N;1g on [0;1]. Then, assumption E1 is satisfied by standard arguments for spline functions (see, for example, [62]). If for each i= 1;2;:::;m, we define x N k;i and y N k;i as in (6.4)-(6.5), then by the arguments in Chapter 6, we conclude that the assumptions B1-B3 in Section 4.3 related to the least squares estimator and the assumptions C1- C3 in Section 5.3 related to the maximum likelihood estimator are satisfied. 77 Chapter 8 Numerical Results Finally, in this chapter, we present the application of our scheme to the transdermal transport of alcohol, first, in the case of the least squares estimator in Section 8.1 in the context of an example involving simulated data, and in two examples involving human subjects data collected in the Luczak laboratory [46, 60] utilizing two different TAC biosensor devices one collected with WrisTAS TM 7 biosensors (supported by NIH grant R21 AA017711) and the other collected with SCRAM CAM R biosensors (supported by NIH grant R01 AA026368). Next, we present the numerical results from the transdermal transport of alcohol problem in the case of the maximum likelihood estimator in Section 8.2. This also includes numerical studies for two examples, one involving simulated data and the other, actual human subjects data collected in the Luczak laboratory. 8.1 The Least Squares Estimator In the simulation case in Section 8.1.1, in order to show the convergence of the estimated dis- tribution of the parameter vector q q q = (q 1 ;q 2 ) to the “true” distribution, we take BrAC inputs from different drinking episodes, and simulate the TAC outputs using a specific distribution as 78 the “true” distribution. We estimate the distribution of the parameter vector using the developed algorithm. Then, we generate samples from the estimated distribution using the Markov Chain Monte Carlo (MCMC) Metropolis algorithm. We perform a generalized (2-dimensional) two- sample Kolmogorov-Smirnov test (KS-test) in order to test the null hypothesis that our generated samples from the estimated distribution and the generated samples from the “true” distribution are drawn from the same distribution. For the human subjects data in Section 8.1.2, we take nine drinking episodes, and apply the LOOCV method. Each time, we estimate the distribution of the parameter vector q q q using the training set. We simulate 100 TACs using the BrAC input of the test set and 100 samples of q q q generated from the estimated distribution. Using the average of the TACs as an estimation of the “true” TAC, we observe the behavior of the normalized root-mean-square error (NRMSE) under different model complexities and consider their computational costs. 8.1.1 Example 1: Estimation Based on Simulation Data The transdermal transport of ethanol is modeled by a random partial differential equation with random parameter vector q q q=(q 1 ;q 2 ). We estimate the distribution of this parameter vector. To demonstrate the application of the approach developed in the previous chapters, we simulated ag- gregate TAC data in MATLAB by assuming that the two parameters q 1 and q 2 are i.i.d. random variables from a Beta(2;5) distribution and therefore that their joint cumulative distribution func- tion (cdf) is the product of their marginal cdfs. Applying the results from the previous chapters, we estimate the probability distribution of the parameter vector q q q=(q 1 ;q 2 ), and compare it to the “true” distribution. 79 From equation (4.1), we have Y k;i = ¯ y k;i (P 0 )+ e k;i ; k= 1;:::;n i ; i= 1;:::;m; where m is the number of drinking episodes, ¯ y k;i (P 0 ) is the observed aggregate TAC for the i th drinking episode at time step k, given the “true” distribution P 0 , the product of the cdfs of two independent Beta(2;5) distributions, and e k;i is the random error, i.i.d. random variables from Normal distribution with mean 0 and variance 10 6 , i.e. N(0;10 6 ); k= 1;:::;n i ; i= 1;:::;m. The numerical scheme used to approximate the PDE model for the TAC observations is a linear spline-based Galerkin approximation method with N+ 1 equally spaced nodes (N equal sub-intervals of [0;1]) (see [66, 67, 68]). Let ¯ y N k;i (P) be the approximation of ¯ y k;i (P) using the Galerkin method, where N denotes the level of the spatial discretization. And, let ˆ y k;i be the realizations of the random variables Y k;i . Let n = maxn i , and let y k;i = 0 for n i < k n; i = 1;:::;m, depicting the level of alcohol staying at 0 after reaching 0. We want to compute ˆ P N n;m;M . Recall from equation (4.4) we have ˆ P N n;m;M = arg min P2P M (Q) J N n;m (ˆ y ˆ y ˆ y;P)= arg min P2P M (Q) m å i=1 n å k=1 (ˆ y k;i ¯ y N k;i (P)) 2 ; where ˆ y ˆ y ˆ y = (fˆ y k;i g n i k=1 ) m i=1 . And,P M (Q), an approximation toP(Q), is the computationally tractable set given byP M (Q). Recall from (3.2) that P M (Q)=fP2 ˜ P d (Q)j P= M å j=1 p j d q q q j ;q q q j 2fq q q j g M j=1 g; where ˜ P d (Q) is given by equation (3.1). 80 In the alcohol problem, since the parameter vector q q q is two-dimensional, we fix the nodes q q q j =(q j 1 ;q j 2 ) as M uniform meshgrid coordinates on[0;1][0;1]. From the results from Chapter 4, we have ˆ P N n;m;M = arg min ˜ p p p2 f R M ˆ y ˆ y ˆ y H H HD D D ˜ p p p 2 F ; where ˆ y ˆ y ˆ y is the n m matrix of the realizations (fˆ y k;i g n k=1 ) m i=1 , q q q j =(q j 1 ;q j 2 ) are the M nodes of the uniform meshgrid, and ˜ p p p=(p 1 ;:::; p M )2 f R M =f ˜ p p pj p j 2R + ;å M j=1 p j = 1g, where p j is the rational weight at the node q q q j . The n mM matrix H H H is given by H H H = (fh k (x N k;i ;f N 0;i ;u k;i ;q q q 1 )g n k=1 ) m i=1 ::: (fh k (x N k;i ;f N 0;i ;u k;i ;q q q M )g n k=1 ) m i=1 ; where h k (x N k;i ;f N 0;i ;u k;i ;q q q j ) is the TAC evaluated at the node q q q j = (q j 1 ;q j 2 ) and D D D ˜ p p p is mM m matrix consisting of diagonal sub-matrices as follows, D D D ˜ p p p = diag(p 1 ) diag(p 2 ) ::: diag(p M ) T : We letf N 0;i = 0, since we assume that there is no alcohol in the epidermal layer of the skin at time t= 0. The resulting constrained optimization problem is solved using the routine FMINCON in MATLAB. In order to prevent the estimation from extreme oscillations due to being ill-posed, we add a regularization term to our optimization problem. So we have ˆ ˆ P N n;m;M = arg min ˜ p p p2 f R M ˆ y ˆ y ˆ y H H HD D D ˜ p p p 2 F + w 1å j 1 ; j 2 p j 1 +1; j 2 p j 1 ; j 2 2 + w 2å j 1 ; j 2 p j 1 ; j 2 +1 p j 1 ; j 2 2 ; 81 where w 1 and w 2 are the regularization parameters. As w 1 and w 2 increase, the regularization terms plays a stronger role. The critical step is to choose the regularization parameters, w 1 and w 2 , properly, which is a combination of art and science. Using MATLAB, we first simulated the TAC data which represents data that might be col- lected by the biosensor for an individual’s drinking episode. To make our simulated TAC data as realistic as possible, we used BrAC data from three drinking episodes collected in the Luczak laboratory as the input to our model. Using these inputs, the simulated aggregate TAC is gener- ated by first sampling 100 TAC observations as longitudinal vectors through generating random samples of q 1 and q 2 , i.i.d. random variables from a Beta(2;5) distribution, then averaging them at each time point, and then adding noise. After computing the simulated TAC outputs, we used our scheme, and numerically solved the resulting optimization problem to obtain an estimate for the cdf. To show the convergence of the estimated cdf of the parameter vector q q q=(q 1 ;q 2 ) to the “true” distribution, which in the simulation case is the product of two Beta(2;5) cdfs, we use the MCMC Metropolis Algorithm to generate random samples from the estimated distribution. For the MCMC method, we used cubic spline interpolation to increase the resolution to a near-continuous distribution and used the Metropolis Algorithm with the standard uniform as the proposal distribution. Then, we per- formed a generalized (2-dimensional) two-sample Kolmogorov-Smirnov test (KS-test). The null hypothesis was that the generated random samples from the estimated distribution and those from the “true” distribution were drawn from the same distribution. We consider the case with M = 400 and N = 128 with regularization parameters w 1 = 2 10 3 , and w 2 = 5 10 5 . In Figure (8.1), we can see three different views of the estimated cdf and the “true” cdf, which we recall is the product of two Beta(2;5) cdfs. 82 Figure 8.1: Different views of the estimated distribution and the “true” joint Beta(2;5) distribu- tion for M= 400 and N = 128 with regularization term We can observe that our estimated distribution is reasonably “close” to the “true” distribu- tion, which agrees with the averaged p-value of our hypothesis test in Table (8.1). To calcu- late the averaged p-value, we generated 500 samples from the estimated distribution using the MCMC algorithm and 500 samples from the “true” distribution, and applied the two-dimensional Kolmogorov-Smirnov test. We repeated this 100 times, and take the average of all the p-values that we get each time. The averaged p-value in Table (8.1) indicates that it is reasonable to not reject the null hypothesis, which states that the samples from the estimated distribution and the “true” distribution are drawn from the same distribution. 83 M N Averaged p-value 400 128 0.0586 Table 8.1: The averaged p-value for M= 400 and N = 128 with regularization parameters w 1 = 2 10 3 , and w 2 = 5 10 5 Figure (8.2) shows the histograms of the 50000 generated q 1 and q 2 samples. In this sim- ulation case, since we assume that the parameters q 1 and q 2 are i.i.d. random variables from a Beta(2;5) distribution, we compare the histograms of each parameter with the “true” probability density function of Beta(2;5) in one dimension. Figure 8.2: Histogram of the generated q 1 samples (left) and q 2 samples (right) plus the “true” probability density function of Beta(2;5) (red) Since these numerical results are based on the simulated data, we considered the “true” dis- tribution to be a simple case by assuming q 1 and q 2 i.i.d. random variables from a Beta(2;5) distribution. The particular choice of this distribution is the scatter plot of samples of q 1 and q 2 for a particular set of drinking episode data given in [12]. In that paper, the parameters were ob- tained deterministically for BrAC and TAC measurements from 18 drinking episodes. However, 84 the choice of this distribution was mainly for demonstration purpose. We tested the algorithm on other distributions as well, and we obtained similar results. Next, we will apply the algorithm to actual human subjects data collected in the Luczak laboratory. The nonparametric estimation method gives us the flexibility of not assuming any particular distribution type for the param- eter vector q q q = (q 1 ;q 2 ), which in turn gives us the flexibility of not having to assume that q 1 and q 2 are i.i.d. when we are working with actual clinical or experimental data. By estimating the distribution of the parameter vector q q q=(q 1 ;q 2 ), we will then be able to sample from this distribution. 8.1.2 Example 2: Estimation Based on Human Subjects Data We considered two different datasets collected with WrisTAS7 and SCRAM biosensors. For each dataset, we take m= 9 drinking episodes selected from a larger dataset. For the SCRAM dataset, we chose drinking episodes from different drinking patterns; 3 chosen from a single drinking pattern, 3 from a dual drinking pattern, and 3 from a steady drinking pattern. Applying the LOOCV method, we partition the drinking episodes into the training set, which includes 8 drinking episodes, and the test set, which includes 1 drinking episode; we repeat this 9 times. Each time, we get an estimation for the distribution of the parameter vector q q q=(q 1 ;q 2 ) using the training set. We generate 100 random samples of q q q=(q 1 ;q 2 ) from the estimated distribution. Then we simulate 100 TAC signals, using the BrAC input of the test set and the 100 random samples of q q q. We then compute the average of the TAC signals as an estimate of the “true” TAC. In order to estimate the accuracy of our model, we compute the normalized root-mean-square error (NRMSE) using the estimated TAC and the measured TAC, given the model complexity based on the number of nodes M and the level of discretization N. In each round of the LOOCV 85 method, since the comparison within the drinking episodes of the test sets between the mea- sured TAC and the estimated TAC are in different scales for each test set, we use the following normalized root-mean-square error (NRMSE) for a means of comparison, NRMSE i = RMSE i max k ˆ y k;i min k ˆ y k;i ; where RMSE i = s 1 n n å k=1 (ˆ y k;i ˆ ˆ y k;i ) 2 ; with ˆ y k;i the measured TAC for i th drinking episode at time step k, and ˆ ˆ y k;i the estimated TAC. We compare various model complexities each having a different number of nodes M, and a different level of discretization N by calculating the NRMSE mean for each model complexity, NRMSE mean = 1 m m å i=1 NRMSE i : For the first example, we consider the data collected by the WrisTAS7 alcohol biosensor. In Table (8.2), we include different model complexities with a different number of nodes M and a different level of discretization N. We see that as the number of nodes M and the level of discretization N increase, the NRMSE mean decreases. In Figure (8.3), a plot with a y-axis on each side, we can see the decrease in NRMSE mean as the number of nodes M and the level of discretization N increase, referring to the model complexities in Table (8.2). We also see from the boxplots of NRMSE i , i= 1;:::;9 for each model that the variance of NRMSE i also decreases with the increase in M and N. Looking at the trend, we can 86 see that the NRMSE mean does not decrease much after a certain point. Consequently, in practice, the choice of the number of nodes M and the level of discretization N presents a trade-off between the NRMSE mean and the computational cost. Model Complexity M N NRMSE mean 1 4 2 0.8214 2 9 2 0.4394 3 16 4 0.2357 4 25 4 0.1805 5 36 8 0.1404 6 49 16 0.1261 7 64 32 0.1259 8 81 128 0.1206 9 225 128 0.1190 10 400 128 0.1140 Table 8.2: Decrease in NRMSE mean for an increasing number of nodes M and an increasing level of discretization N for the dataset collected by the WrisTAS7 biosensor Figure 8.3: Boxplot of NRMSE i given the complexity of the model as labeled in Table (8.2). A decrease in NRMSE mean as M and N increase (blue), and an increase in the computational cost in seconds (green) is observed for the dataset collected by the WrisTAS7 biosensor 87 Figure (8.4) shows the measured TAC and the estimated TAC for 9 drinking episodes from the test set collected by the WrisTAS7 alcohol biosensor using the LOOCV method, as well as the 95% simultaneous confidence band for the model complexity where the number of nodes is fixed at M= 400 and the level of discretization is fixed at N = 128. We have an overall 1a confidence level witha = 0:05 (i.e. 95% simultaneous confidence level) by using 1 a n confidence level at each time step according to the Bonferroni correction. The confidence interval at each time step is ¯ x k t 99; a 2n s k p 100 , where ¯ x k is the mean of the 100 sampled TACs at time step k, s k is the standard deviation of the 100 sampled TACs at time step k, and t 99; a 2n is the upper a 2n quantile of the t distribution with 99 degrees of freedom. Figure 8.4: The measured TAC, the estimated TAC, and the 95% simultaneous confidence band for 9 drinking episodes from the test set collected by the WrisTAS7 biosensor using the LOOCV 88 For the second example, we consider the dataset collected by the SCRAM alcohol biosensor. We can see from Table (8.3), and Figures (8.5) and (8.6) that we get a similar result. Model Complexity M N NRMSE mean 1 4 2 0.5120 2 9 2 0.2544 3 16 4 0.1870 4 25 4 0.1527 5 36 8 0.1452 6 49 16 0.1437 7 64 32 0.1304 8 81 128 0.1267 9 225 128 0.1248 10 400 128 0.1235 Table 8.3: Decrease in NRMSE mean for an increasing number of nodes M and an increasing level of discretization N for the dataset collected by the SCRAM biosensor Figure 8.5: Boxplot of NRMSE i given the complexity of the model as labeled in Table (8.3). A decrease in NRMSE mean as M and N increase (blue), and an increase in the computational cost in seconds (green) is observed for the dataset collected by the SCRAM biosensor 89 Figure 8.6: The measured TAC, the estimated TAC, and the 95% simultaneous confidence band for 9 drinking episodes from the test set collected by the SCRAM biosensor using the LOOCV 8.2 The Maximum Likelihood Estimator For the simulated data example in Section 8.2.1, we are able to observe the convergence of the estimated distribution of the random parameter vector q q q=(q 1 ;q 2 ) to the “true” distribution as the number of drinking episodes increases, as the number of Dirac measure nodes increases, and as the level of discretization in the finite-dimensional approximations increases. We look at each case separately and in turn. 90 In the actual human subjects data example discussed in Section 8.2.2, we apply the LOOCV method by first estimating the distribution of the parameter vector q q q=(q 1 ;q 2 ) using a training set, and then estimating the TAC output using the estimated distribution and the BrAC input of a the test set. 8.2.1 Example 1: Estimation Based on Simulation Data In this example, we estimate the distribution of the parameter vector q q q=(q 1 ;q 2 ) in the system (7.1)-(7.5) by first simulating TAC data in MATLAB with the assumption that the two parameters q 1 and q 2 are i.i.d. with a Beta distribution, q 1 ;q 2 Beta(2;5). Thus, their joint cumulative distribution function (cdf) is the product of their marginal Beta(2;5) cdfs. From equation (5.1), we have Y k;i = y k;i (q q q i )+ e k;i ; k= 1;:::;n i ; i= 1;:::;m; where m is the number of drinking episodes, and y k;i (q q q i ) is the observed TAC for the i th drink- ing episode at time step k. We let P 0 be the product of the cdfs of two independent Beta(2;5) distributions, and e k;i N(0;10 6 ); k= 1;:::;n i ; i= 1;:::;m. To approximate the PDE model for the TAC observations, we used the linear spline-based Galerkin approximation scheme with N equally spaced sub-intervals from[0;1] (see [66, 67, 68]). We want to compute ˆ P N n n n;m;M given by equation (5.9), where q q q j =(q j 1 ;q j 2 ) is chosen as M uniform meshgrid coordinates on [0;1][0;1]. We make the assumption that there is no alcohol in the epidermal layer of the skin at time t= 0, so we letf N 0;i = 0. The constrained optimization problem over Euclidean M-space was solved using constrained optimization routine FMINCON from the 91 Optimization Toolbox in MATLAB applied to the negative of the log-likelihood function given in equation (5.9). In the naive pooled approach in which the assumed observation was aggregated TAC, the appropriate underlying statistical model was the naive pooled model (i.e. the data point for each drinking episode at a certain time is an observation of the mean behavior plus a random error). When the nonlinear least squares-based constrained optimization problem was solved, the in- herent ill-posedness of the inverse problem resulted in undesirable oscillations. To mitigate this behavior, we included an appropriately weighted regularization term in the performance index being minimized. One advantage of the mixed effects statistical model presented here is that regularization was not required. In order to demonstrate the consistency of our estimator, we show that as m, the number of drinking episodes, increases, the estimated cdf of the parameter vector q q q=(q 1 ;q 2 ) approaches the “true” cdf, which is the product of two Beta(2;5) cdfs. In order to simulate realistic longitudinal TAC vectors, representing data that might be collected by the TAC biosensor for an individual’s drinking episode, we used BrAC data collected in the Luczak laboratory as the input to the model. Then, we generated random samples of the parameters q 1 and q 2 by considering them to be i.i.d. Beta(2;5) in MATLAB. Then, we simulated the TAC signal using the BrAC input and the generated samples. Using the algorithm developed in previous chapters, we estimated the distribution of the ran- dom parameter vector by solving the optimization problem for different cases based on the num- ber of drinking episodes, m2f1;3;7;9;16;42g, and observed the convergence of the estimated distribution to the “true” distribution as m increases. 92 To quantify this, let D be the sum of the squared differences at each node between the esti- mated and the “true” distribution, the product of two Beta(2;5) cdfs. Let p j and b j be the weights at the node q q q j of the estimated and “true” distribution, respectively. Then, D= M å j=1 (p j b j ) 2 : We fixed the number of nodes, M, and the level of discretization, N, sufficiently large. Based on the numerical studies from previous section, we set M= 400 and N = 128. We estimated the distribution of the parameter vector for different cases based on different numbers of drinking episodes, m2f1;3;7;9;16;42g, and calculated D for each case. Our results are summarized in Table (8.4). We observed that as the number of drinking episodes, m, increases, the sum of the squared differences at each node between the estimated distribution and the “true” distribution, D, decreases. m D 1 39.0164 3 28.3091 7 8.3247 9 7.1750 16 3.5697 42 3.0337 Table 8.4: Decrease in D, the sum of the squared differences at each node between the estimated and the “true” distribution, with increasing m, the number of drinking episodes, for fixed values of the number of nodes M= 400 and the level of discretization N = 128 In Figure (8.7), we have plotted three different views of the estimated distribution and the “true” distribution, (again, the product of two Beta(2;5) cdfs), for the different cases where m= 7, m= 16, and m= 42 drinking episodes in the top, middle, and bottom rows, respectively, with the 93 number of nodes set to M = 400 and the level of discretization to N = 128. We observe that as m increases, our estimated distribution gets “closer” to the “true” distribution, which agrees with the numerical results that are shown in Table (8.4). Figure 8.7: Three different views of the estimated distribution and the “true” joint Beta(2;5) distribution, for different numbers of drinking episodes m= 7, m= 16, and m= 42 in the top, middle, and bottom rows, respectively, for a fixed number of nodes M = 400 and level of dis- cretization N = 128 Next, we show that as the number of nodes M and the level of discretization N increases, the normalized sum of squared differences at each node between the estimated and the “true” 94 distribution decreases. First, we fixed the level of discretization at N= 128, and we increased the number of nodes M. Let ¯ D M = 1 M M å j=1 (p j b j ) 2 : In Table (8.5), we observe that for the fixed value of N = 128, as the number of nodes, M, increases, the normalized sum of the squared differences at each node between the estimated distribution and the “true” distribution, ¯ D M , decreases. M ¯ D M 25 0.03795 100 0.03025 225 0.02974 400 0.02081 Table 8.5: Decrease in ¯ D M , the normalized sum of the squared differences at each node between the estimated distribution and the “true” distribution, with increasing M, the number of nodes, for a fixed value for the level of discretization, N = 128 Next, we fixed the number of nodes at M= 400, and we increased the level of discretization N. Let ¯ D N = 1 N M å j=1 (p j b j ) 2 : In Table (8.6), we observe that for M= 400 fixed, as the level of discretization, N, increases, the normalized sum of the squared differences at each node between the estimated distribution and the “true” distribution, ¯ D N , decreases. 95 N ¯ D N 4 1.22783 16 0.65644 64 0.18565 128 0.06504 Table 8.6: Decrease in ¯ D N , the normalized sum of the squared differences at each node between the estimated distribution and the “true” distribution, with increasing N, the level of discretization, with the number of nodes fixed at M= 400 As mentioned earlier, the choice of the “true” distribution for q 1 and q 2 in the simulation case is the scatterplot of samples for a set of 18 drinking episodes including BrAC and TAC measure- ments obtained by a deterministic approach in [12]. However, the Beta(2;5) distribution was chosen strictly for the purpose of demonstration. When applying our algorithm to actual clinic or lab collected human subjects data, an advantage of our nonparametric approach is that we do not need to make any assumptions about the family of feasible distributions for the parameter vector. In addition, the independent and identically distributed assumption was also very simplistic given that q 1 and q 2 parameters depend on the same individual and environmental conditions at the time of measurements. This assumption is also relaxed in the flexible nonparametric approach used in the next example. 8.2.2 Example 2: Estimation Based on Human Subjects Data The two datasets used in this example were obtained by two different alcohol biosensors; SCRAM and WrisTAS7 [46, 60]. We fixed the number of nodes at M= 400 and the level of discretization at N = 128, both sufficiently large with respect to convergence as we observed in our simulation data examples. From each dataset, we chose m = 9 different drinking episodes from a larger dataset. For the SCRAM dataset, the drinking episodes where chosen from 3 different drinking 96 patterns (3 from each of the single, dual, and steady drinking patterns). We split the drinking episodes into a training set consisting of 8 drinking episodes, and a test set consisting of 1 drinking episode. This way, we could apply the LOOCV method. We repeated this partitioning process 9 times, each time leaving out a different drinking episode. Using the training set, we first estimated the distribution of the parameter vector q q q=(q 1 ;q 2 ). Next, we sampled 100 parameter vectors q q q=(q 1 ;q 2 ) from the estimated distribution, and using those along with the BrAC input from the test set, we simulated 100 TAC longitudinal signals. From these 100 simulated TAC signals, we estimated the “true” TAC by computing the mean at each time, and we provided what we refer to as a 95% conservative error band, or simply as a 95% error band, by taking the 2.5 and 97.5 percentiles. Later, we are also going to use the same approach for obtaining the 95% error band as we introduce a number of statistics associated with the TAC curve that are of particular interest to alcohol researchers and clinicians working in the area of alcohol use disorder. For the first example, we considered the dataset collected by the SCRAM alcohol biosensor. Prior to applying the LOOCV method, in order to visualize the estimated density and distribution of q q q=(q 1 ;q 2 ), and the marginal densities of q 1 and q 2 , we trained the algorithm on all 9 drinking episodes. Figure (8.8) illustrates the four aforementioned plots. In the estimated density plot, we can see that our numerical result for this example is in agreement with the theoretical result in Lindsay and Mallet [43, 47], which states that the maximum likelihood estimator ˆ P n n n;m can be found in the class of discrete distributions with at most m support points, i.e. ˆ P n n n;m 2P M (Q), where M m. In this example, since we had 9 drinking episodes, the estimated density plot displays the support points among 400 nodes for q q q=(q 1 ;q 2 ). 97 In addition, the sample mean of q q q = (q 1 ;q 2 ) is calculated to be ¯ q q q = (0:6003;1:2452); the sample covariance matrix is calculated to be S S S q q q = 0 B B @ 0:0706 0:0264 0:0264 0:0483 1 C C A ; and the sample correlation isr q q q =0:4519; observing a moderate negative association between the parameters q 1 and q 2 for our training population consisting of 9 drinking episodes. Figure 8.8: The estimated density (top left), estimated distribution (top right), marginal density of q 1 (bottom left), and marginal density of q 2 (bottom right) obtained from m= 9 drinking episodes collected by the SCRAM biosensor, for fixed values of M= 400 and N = 128 98 After illustrating the estimated density, the estimated distribution, and the marginal densities for q 1 and q 2 obtained from training our algorithm introduced in Chapter 5 on the 9 drinking episodes collected by the SCRAM alcohol biosensor, we applied the LOOCV method as ex- plained above to the same 9 drinking episodes. Figure (8.9) shows the measured TAC (i.e. measured by the SCRAM alcohol biosensor) and the estimated TAC (i.e. obtained from our algorithm) for all the 9 drinking episodes left out in the test set in the partitioning process, and the conservative 95% error band for a fixed number of nodes M= 400 and level of discretization N = 128. Figure 8.9: The measured TAC, the estimated TAC, and the conservative 95% error band for 9 drinking episodes from the test set collected by the SCRAM biosensor using the LOOCV 99 Alcohol researchers and clinicians are particularly interested in a certain statistics associated with drinking episodes: the maximum or peak value of the TAC curve, the time at which the peak value of the TAC is attained, and the area under the TAC curve. The area under the curve (AUC) is a quantifying measure of exposure to the alcohol that integrates the transdermal alcohol concentration across time. Tables (8.7)-(8.9) display these statistics along with the measured (or actual) value obtained by the SCRAM alcohol biosensor as well as the conservative 95% error band for the 9 drinking episodes from the test set. From these tables, we can observe that the 95% error bands do a reasonably good job of capturing the actual values of these statistics specially for the value of the peak TAC and the area under the curve displayed in Table (8.7) and (8.9). In Figure (8.9), we can see that there are minor fluctuations in the measured TAC curve. If we smoothed the measured TAC curve, we would have obtained better results for capturing the time at which the peak value of the TAC is attained displayed in Table (8.8). Drinking Episode Measured Peak TAC Estimated Peak TAC 95% Error Band 1 0.0585 0.0413 (0.0327, 0.0539) 2 0.0477 0.0433 (0.0324, 0.0543) 3 0.0464 0.0391 (0.0320, 0.0501) 4 0.0417 0.0375 (0.0301, 0.0489) 5 0.0535 0.0618 (0.0497, 0.0781) 6 0.0419 0.0361 (0.0287, 0.0465) 7 0.0450 0.0441 (0.0346, 0.0558) 8 0.0405 0.0430 (0.0345, 0.0542) 9 0.0391 0.0402 (0.0313, 0.0508) Table 8.7: The measured peak TAC, the estimated peak TAC, and the 95% error band for the 9 drinking episodes from the test set collected by the SCRAM biosensor 100 Drinking Episode Measured Peak Time Estimated Peak Time 95% Error Band 1 2.5600 2.4480 (1.9200, 2.8800) 2 2.2400 2.6080 (2.2400, 2.8800) 3 2.2400 2.8928 (2.5600, 3.2000) 4 4.1600 2.9056 (2.5600, 3.2000) 5 2.2400 2.5728 (2.2400, 2.8800) 6 2.2400 2.8928 (2.5600, 3.2000) 7 3.2000 2.9696 (2.5600, 3.2000) 8 2.8800 2.9312 (2.5600, 3.2000) 9 3.2000 2.9088 (2.5600, 3.2000) Table 8.8: The measured peak time, estimated peak time, and the 95% error band for the 9 drinking episodes from the test set collected by the SCRAM biosensor Drinking Episode Measured AUC Estimated AUC 95% Error Band 1 0.1909 0.1784 (0.1382, 0.2229) 2 0.1876 0.1799 (0.1343, 0.2321) 3 0.1868 0.1751 (0.1364, 0.2421) 4 0.1765 0.1735 (0.1349, 0.2394) 5 0.2231 0.2963 (0.2203, 0.3911) 6 0.1151 0.1496 (0.1117, 0.1982) 7 0.1474 0.1912 (0.1444, 0.2563) 8 0.1277 0.1898 (0.1412, 0.2505) 9 0.1493 0.1750 (0.1307, 0.2319) Table 8.9: The measured area under the curve (AUC), estimated AUC, and 95% error band for 9 drinking episodes from the test set collected by the SCRAM biosensor For the second example, we similarly applied the LOOCV method, as explained before, to the 9 drinking episodes collected by the WrisTAS7 alcohol biosensor. Figure (8.10) shows the measured TAC (i.e. measured by the WrisTAS7 alcohol biosensor) and the estimated TAC (i.e. obtained from our algorithm) for all the 9 drinking episodes left out in the test set in the partitioning process, and the conservative 95% error band for a fixed number 101 of nodes M = 400 and level of discretization N = 128. We can observe that we obtained similar results as the previous example. Figure 8.10: The measured TAC, the estimated TAC, and the conservative 95% error band for 9 drinking episodes from the test set collected by the WrisTAS7 biosensor using the LOOCV 102 Chapter 9 Discussion and Concluding Remarks This dissertation was motivated by a problem involving the development of a data analysis system for a transdermal alcohol biosensor. We took two different approaches: 1) The naive pooled approach where we established the existence and consistency of a least squares estimator as well as its computational convergence. 2) The mixed effects approach where we established the existence and consistency of a maximum likelihood estimator as well as its computational convergence. For the naive pooled approach, we assumed that each data point is an observation of the mean behavior plus a random error. We considered a nonparametric approach (i.e. not assuming any distribution form) to estimate the probability distribution of a random parameter vector in discrete-time (in general, infinite-dimensional) dynamical systems based on aggregate data from multiple subjects. Our results are related to, and in some sense an extension of, the Prohorov metric framework developed earlier in [21]. We obtained existence and consistency results for our estimator, and proved a convergence result for a finite-dimensional approximation framework for the case where the underlying dynamical system is infinite-dimensional and not directly amenable to numerical computations. We provided a rather complete treatment of how our framework can 103 be applied in the case of abstract parabolic systems. We were able to use linear semigroup theory, in particular the Trotter-Kato Theorem, and the properties of the weak topology and the Prohorov metric on the space of feasible probability measures, to establish a convergence result with respect to the least squares estimator for the finite-dimensional approximating estimation problems and the least squares estimator for the estimation problem posed in terms of the original underlying infinite-dimensional model. Our results in Section 8.1 demonstrated the efficacy of our approach in the case of both simulated data and human subjects data from an NIH funded study, illustrating that our approach works for two different alcohol biosensors, the WrisTAS7 and the SCRAM devices. The results we obtained and presented for the naive pooled approach are based on the as- sumption that aggregate longitudinal data is what is available. However, if we assume that subject-specific longitudinal data is available for each drinking episode, then a mixed effects model would be more appropriate. Thus, we took a similar approach to the one taken in the naive pooled approach that utilizes a maximum likelihood based mixed effects statistical model [28, 72]. We considered the nonparametric fitting of a population model for the transdermal transport of alcohol based on a maximum likelihood approach applied to a mixed effects statisti- cal model. In estimating a population model, we were actually estimating the distribution of the model parameters and consequently the maximum likelihood estimation problem was formulated as an optimization problem over a space of feasible probability measures endowed with the weak topology induced by the Prohorov metric. By using a first principles physics based model in the form of a one dimensional diffusion equation, we were able to capture the essential features of transdermal transport while keeping the dimension of the parameter space low. In this way, unlike the naive pooled approach, we 104 were able to avoid having to introduce regularization so as to mitigate ill-posedness and over- fitting. On the other hand, the fact that the model was infinite-dimensional being based on a partial differential equation, computing the maximum likelihood estimator necessitated finite- dimensional approximation. We were able to first theoretically demonstrate the existence and then the consistency of our maximum likelihood estimator using a decades old result from the literature. The consistency result is with respect to the uncertainty across subjects. Similar to the naive pooled approach, we were able to use linear semigroup theory, in partic- ular the Trotter-Kato Theorem, and the properties of the weak topology and the Prohorov metric on the space of feasible probability measures, to establish a convergence result with respect to the maximum likelihood estimator for the finite-dimensional approximating estimation problems and the maximum likelihood estimator for the estimation problem posed in terms of the original underlying infinite-dimensional model. We were able to demonstrate the efficacy of our theoretical results numerically first on an example involving simulated data and then on one involving actual human subjects data. We used our scheme to obtain the joint density and distribution of the parameters as well as estimates and conservative 95% error bands for the TAC signal and a number of TAC related statistics of particular interest to researchers and clinicians who work in the area of alcohol use disorder. In addition to consistency with respect to the intrinsic uncertainty, other extensions we are currently looking at include the development of a general framework that is applicable to differ- ent types of dynamical systems whether finite-dimensional or infinite-dimensional and whether continuous or discrete-time dynamical systems. This generalization will extend our framework to apply to any type of dynamical system such as ordinary differential equations (ODEs), partial 105 differential equations (PDEs), functional differential equations (FDEs), or difference equations (DEs) that would potentially subsume the results presented here as well as in [12], and [21]. Looking ahead, our primary motivation in pursuing this research is our interest in solving op- timization and control problems involving random dynamical systems. Once we have estimated the distribution of the random parameters in the underlying dynamical system, which serves as the forward model, and introduced a population model, the objective then advances to estimating the input to the system based on the observation of the output for a subject since the actual mo- tivation for this investigation is the development of schemes for converting biosensor measured TAC into BAC or BrAC (i.e. deconvolution an estimate for BAC or BrAC from the TAC signal). In particular, we are interested in comparing it to the schemes, used for this same purpose, de- veloped and implemented in [39] and [69]. In addition, we are also interested in examining how our uncertainty quantification scheme for the TAC to BAC or BrAC conversion problem performs when compared to the non-physics based, machine learning inspired schemes developed in [34], [53] and [52]. 106 Reference List [1] R.A. Adams and J.J.F. Fournier. Sobolev Spaces. Academic Press, 2nd edition, 2003. [2] L. Asserian, S.E. Luczak, and I.G. Rosen. Prohorov metric-based nonparametric estimation of the distribution of random parameters in abstract parabolic systems with application to the transdermal transport of alcohol. Submitted, 2021. [3] H.T. Banks and K. Bihari. Modelling and estimating uncertainty in parameter estimation. Inverse Problems, 17:95–111, 2001. [4] H.T. Banks and D.M. Bortz. Inverse problems for a class of measure dependent dynamical systems. J. Inverse and Ill-posed Problems, 13:103–121, 2005. [5] H.T. Banks, D.M. Bortz, and S.E. Holte. Incorporation of variability into the mathematical modeling of viral delays in hiv infection dynamics. Math. Biosciences, 183:63–91, 2003. [6] H.T. Banks, D.M. Bortz, G. Pinter, and L. Potter. Modeling and imaging techniques with po- tential for application in bioterrorism. Bioterrorism: Mathematical Modeling Applications in Homeland Security, 28, 2003. [7] H.T. Banks, M. Davidian, J.R. Samuels, and K.L. Sutton. An Inverse Problem Statistical Methodology Summary, pages 249–302. Springer Netherlands, Dordrecht, 2009. [8] H.T. Banks and J.L. Davis. A comparison of approximation methods for the estimation of probability distributions on parameters. Appl. Num. Math., 57:753–777, 2007. [9] H.T. Banks and J.L. Davis. Quantifying uncertainty in the estimation of probability distri- butions. Mathematical Biosciences and Engineering, 5:647–667, 2008. [10] H.T. Banks and B.G. Fitzpatrick. Estimation of growth rate distributions in size structured population models. Quarterly of Applied Mathematics, 49:215–235, 1991. [11] H.T. Banks, B.G. Fitzpatrick, L.K. Potter, and Y . Zhang. Estimation of probability distribu- tions for individual parameters using aggregate population observations, pages 353–371. Birkh¨ auser, 1998. [12] H.T. Banks, K.B. Flores, I.G. Rosen, E.M. Rutter, M. Sirlanci, and C. Thompson. The prohorov metric framework and aggregate data inverse problems for random pdes. Commu- nications in Applied Analysis, 22(3):415–446, 2018. [13] H.T. Banks and N.L. Gibson. Well-posedness in maxwell systems with distributions of polarization relaxation parameters. Applied Math. Letters, 18:423–430, 2005. 107 [14] H.T. Banks and N.L. Gibson. Electromagnetic inverse problems involving distributions of dielectric mechanisms and parameters. Quarterly of Applied Mathematics, 64:749–795, 2006. [15] H.T. Banks and K. Ito. A unified framework for approximation in inverse problems for distributed parameter systems. Control Theory Advanced Technology, 4(1):73–90, 1988. [16] H.T. Banks and K. Ito. Approximation in lqr problems for infinite dimensional systems with unbounded input operators. J. Math. Syst. Est. Control, 7(1):1–34, 1997. [17] H.T. Banks, Z.R. Kenz, and W.C. Thompson. A review of selected techniques in inverse problem nonparametric probability distribution estimation. Journal of Inverse and Ill-Posed Problems, 20(4):429–460, 2012. [18] H.T. Banks and K. Kunisch. The linear regulator problem for parabolic systems. SIAM J. Cont. Optim., 22(5):684–698, 1984. [19] H.T. Banks and K. Kunisch. Estimation Techniques for Distributed Parameter Systems. Birkh¨ auser, Boston-Basel-Berlin, 1989. [20] H.T. Banks and G.A. Pinter. A probabilistic multiscale approach to hysteresis in shear wave propagation in biotissue. SIAM J. Multiscale Modeling and Simulation, 3:395–412, 2005. [21] H.T. Banks and W.C. Thompson. Least squares estimation of probability measures in the Prohorov metric framework. Technical report, DTIC Document, 2012. [22] H.T. Banks and H.T. Tran. Mathematical and Experimental Modeling of Physical and Bio- logical Processes. CRC Press, Boca Raton, 2009. [23] Z.W. Birnbaum. Numerical tabulation of the distribution of kolmogorov’s statistic for finite sample size. Journal of the American Statistical Association, 47:425–441, 1952. [24] S. Brooks, A. Gelman, G. Jones, and X. Meng. Handbook of markov chain monte carlo. CRC press, 2011. [25] Z. Dai, I.G. Rosen, C. Wang, N. Barnett, and S.E. Luczak. Using drinking data and pharma- cokinetic modeling to calibrate transport model and blind deconvolution based data analysis software for transdermal alcohol biosensors. Mathematical Biosciences and Engineering, 13(5):911–934, jul 2016. [26] B.N. Datta. Numerical Linear Algebra and Applications. SIAM, Philadelphia, PA, 2nd edition, 2010. [27] M. Davidian and D.M. Giltinan. Nonlinear Models for Repeated Measurement Data. Chap- man and Hall, New York, 1995. [28] M. Davidian and D.M. Giltinan. Nonlinear models for repeated measurement data: An overview and update. Journal of Agricultural, Biological and Environmental Statistics, 8:387–419, 2003. 108 [29] E. Demidenko. Mixed Models, Theory and Applications. John Wiley and Sons, Hoboken, 2nd edition, 2013. [30] D.M. Dougherty, N.E. Charles, A. Acheson, S. John, R.M. Furr, and N. Hill-Kapturczak. Comparing the detection of transdermal and breath alcohol concentrations during periods of alcohol consumption ranging from moderate drinking to binge drinking. Experimental and Clinical Psychopharmacology, 20:373–381, 2012. [31] D.M. Dougherty, T.E. Karns, J. Mullen, Y . Liang, S.L. Lake, J.D. Roache, and N. Hill- Kapturczak. Transdermal alcohol concentration data collected during a contingency man- agement program to reduce at-risk drinking. Drug and Alcohol Dependence, 148:77–84, 2015. [32] M.A. Dumett, I.G. Rosen, J. Sabat, A. Shaman, L. Tempelman, C. Wang, and R.M. Swift. Deconvolving an estimate of breath measured blood alcohol concentration from biosensor collected transdermal ethanol data. Applied Mathematics and Computation, 196(2):724– 743, 2008. [33] E.I. Ette and P.J. Williams. Population pharmacokinetics ii: Estimation methods. Annals of Pharmacotherapy, 38:1907–1915, 2004. [34] C.E. Fairbairn, D. Kang, and N. Bosch. Using machine learning for real-time bac esti- mation from a new-generation transdermal biosensor in the laboratory. Drug and Alcohol Dependence, 216:108205, 2020. [35] V . Federov. Theory of Optimal Experiments. Academic Press, 1972. [36] T.S. Ferguson. A course in large sample theory. Routledge, 2017. [37] A.E. Gelfand. Gibbs sampling. Journal of the American statistical Association, 95(452):1300–1304, 2000. [38] W.K. Hastings. Monte carlo sampling methods using markov chains and their applications. Biometrika, 57(1):97–109, 1970. [39] K. Hawekotte, S.E. Luczak, and I.G. Rosen. Obtaining breath alcohol concentration from transdermal alcohol concentration using a bayesian approach. Submitted, 2021. [40] J. Kiefer and J. Wolfowitz. Consistency of the maximum likelihood estimator in the presence of infinitely many incidental parameters. Annals of Mathematical Statistics, 27(4):887–906, 1956. [41] J. Kiefer and J. Wolfowitz. The equivalence of two extremum problems. Canadian J. Mathematics, 12:363–366, 1960. [42] D.A. Labianca. The chemical basis of the breathalyzer: a critical analysis. J. Chem. Educ, 67(3):259–261, 1990. [43] B. Lindsay. The geometry of mixture likelihoods: a general theory. Ann Stat, 11:86–94, 1983. 109 [44] J.L. Lions. Optimal Control of Systems Governed by Partial Differential Equations. Grundlehren der Mathematischen Wissenschaften in Einzeldarstellungen mit Besonderer Ber¨ ucksichtigung der Anwendungsgebiete. Springer-Verlag, 1971. [45] M. Lovern, M. Sargentini-Maier, C. Otoul, and J. Watelet. Population pharmacokinetic and pharmacodynamic analysis in allergic diseases. Drug Metabolism Reviews, 41(3):475–485, 2009. [46] S.E. Luczak, I.G. Rosen, and T.L. Wall. Development of a real-time repeated-measures assessment protocol to capture change over the course of a drinking episode. Alcohol and Alcoholism, 50(1):1–8, 2015. [47] A. Mallet. A maximum likelihood estimation method for random coefficient regression models. Biometrika, 73(3):645–656, 1986. [48] P.R. Marques and A.S. McKnight. Field and laboratory alcohol detection with 2 types of transdermal devices. Alcoholism: Clinical and Experimental Research, 33(4):703–711, 2009. [49] D. Muir. kstest 2s 2d(x1, x2, alpha). MATLAB (Retrieved 21 January 2020), 2017. [50] R.M. Neal. Slice sampling. Annals of statistics, pages 705–741, 2003. [51] R.M. Neal. MCMC using Hamiltonian dynamics. Handbook of markov chain monte carlo, 2(11):2, 2011. [52] C. Oszkinat, S.E. Luczak, and I.G. Rosen. Uncertainty quantification in the estimation of blood alcohol concentration using physics-informed neural networks. Submitted, 2021. [53] C. Oszkinat, T. Shao, C. Wang, I.G. Rosen, A. Rosen, E. Saldich, and S.E. Luczak. A covariate-dependent hidden markov model for uncertainty quantification and estimation of blood and breath alcohol concentration from transdermal alcohol biosensor data. Submitted, 2021. [54] A. Pazy. Semigroups of Linear Operators and Applications to Partial Differential Equa- tions. Applied Mathematical Sciences. Springer, 1983. [55] J.A. Peacock. Two-dimensional goodness-of-fit testing in astronomy. Royal Astronomy Society, 202:615–627, 1983. [56] Y .V . Prohorov. Convergence of random processes and limit theorems in probability theory. Theor. Probability Appl., 1:157–214, 1956. [57] C. Robert and G. Casella. Monte Carlo statistical methods. Springer Science & Business Media, 2013. [58] I.G. Rosen, S.E. Luczak, and J. Weiss. Blind deconvolution for distributed parameter sys- tems with unbounded input and output and determining blood alcohol concentration from transdermal biosensor data. Applied Mathematics and Computation, 231:357–376, 2014. 110 [59] J.T. Sakai, S. Mikulich-Gilbertson, R.J. Long, and T.J. Crowley. Validity of transdermal alcohol monitoring: Fixed and self-regulated dosing. Alcoholism Clinical and Experimental Research, 30:26–33, 01 2006. [60] E.B. Saldich, C. Wang, I.G. Rosen, L. Goldstein, J. Bartroff, R.M. Swift, and S.E. Luczak. Obtaining high-resolution multi-biosensor data for modeling transdermal alcohol concen- tration data. Alcoholism: Clinical and Experimental Research, 44, 2020. Suppl. 1. [61] B. Santhanam. course materials for ece541 probability theory and stochastic process, Fall 2019. University of New Mexico. [62] M.H. Schultz. Spline Analysis. Prentice-Hall Series in Automatic Computation. Pearson Education, Limited, 1972. [63] A. Schumitzky. Nonparametric em algorithms for estimating prior distributions. Applied Mathematics and Computation, 45:143–157, 1991. [64] S. Sinko and W. Streifer. A new model for age-size structure of a population. Ecology, 48:910–918, 1967. [65] M. Sirlanci. Finite dimensional approximation and convergence in the estimation of the distribution of, and input to, random abstract parabolic systems with application to the deconvolution of blood/breath alcohol concentration from biosensor measured transdermal alcohol level. PhD thesis, University of Southern California, 2018. [66] M. Sirlanci, S.E. Luczak, C.E. Fairbairn, D. Kang, R. Pan, X. Yu, and I.G. Rosen. Esti- mating the distribution of random parameters in a diffusion equation forward model for a transdermal alcohol biosensor. Automatica, pages 101–109, 2019. [67] M. Sirlanci, S.E. Luczak, and I.G. Rosen. Approximation and convergence in the estimation of random parameters in linear holomorphic semigroups generated by regularly dissipative operators. In American Control Conference (ACC), 2017, pages 3171–3176. IEEE, 2017. [68] M. Sirlanci, S.E. Luczak, and I.G. Rosen. Estimation of the distribution of random param- eters in discrete time abstract parabolic systems with unbounded input and output: Approx- imation and convergence. Communications in Applied Analysis, 23:287–329, 01 2019. [69] M. Sirlanci, I.G. Rosen, S.E. Luczak, C.E. Fairbairn, K. Bresin, and D. Kang. Decon- volving the input to random abstract parabolic systems; a population model-based approach to estimating blood/breath alcohol concentration from transdermal alcohol biosensor data. Inverse Problems, 34(12):2–27, 2018. [70] W.F. Smith, J. Hashemi, and F. Presuel-Moreno. Foundations of Materials Science and Engineering. McGraw-Hill, New York, NY , 3rd edition, 2004. [71] J.L. Steimer, A. Mallet, and F. Mentre. Estimating interindividual pharmacokinetic vari- ability. Raven Press, New York, 1985. [72] A.M. Stuart. Inverse problems: A Bayesian perspective. Acta Numerica, pages 451–559, 2010. 111 [73] R.M. Swift. Transdermal alcohol measurement for estimation of blood alcohol concentra- tion. Alcoholism: Clinical and Experimental Research, 24(4):422–423, 2000. [74] H. Tanabe. Equations of Evolution. Monographs and Studies in Mathematics. Pitman, London, 1979. [75] T. Tatarinova, M. Neely, J. Bartroff, M. van Guilder, W. Yamada, D. Bayard, R. Jelliffe, R. Leary, A. Chubatiuk, and A. Schumitzky. Two general methods for population pharma- cokinetic modeling: non-parametric adaptive grid and non-parametric bayesian. J Pharma- cokinet Pharmacodyn, 40:189–199, 2013. [76] S.T. Tokdar and R.E. Kass. Importance sampling: a review. Wiley Interdisciplinary Re- views: Computational Statistics, 2(1):54–60, 2010. [77] A. Wald. Note of the consistency of the maximum likelihood estimate. Ann. Math. Stat., 20:595–601, 1949. [78] J. Wolfowitz. On Wald’s proof of the consistency of the maximum likelihood estimate. Ann. Math. Stat., 20:601–602, 1949. [79] H.P. Wynn. The sequential generation of D-optimal experimental designs. Ann. Math. Stat., 41:1655–1664, 1970. 112
Abstract (if available)
Abstract
Motivated by the transdermal transport of alcohol problem, we consider two approaches for
estimating the probability distribution of a random parameter vector in discrete-time abstract parabolic systems. First, we take the naive pooled approach to set up a least squares estimator, and in the second approach, we use the mixed effects model to set up a maximum likelihood estimator. In both approaches, we consider a Prohorov metric-based nonparametric approach to estimating the probability distribution of the random parameter vector. We establish the existence and consistency of the estimators. A theoretical convergence result for a finite-dimensional approximation scheme for computing the estimators is also established and the efficacy of the approach is demonstrated by applying the scheme to the transdermal transport of alcohol modeled by a random parabolic PDE.
In the least squares estimator case, to show the convergence of the estimated distribution to the
“true” distribution, we simulate data from the “true” distribution, apply our algorithm, and obtain the estimated cumulative distribution function. We then use the Markov Chain Monte Carlo Metropolis Algorithm to generate random samples from the estimated distribution, and perform a generalized (2-dimensional) two-sample Kolmogorov-Smirnov test with null hypothesis that our generated random samples from the estimated distribution and generated random samples from the “true” distribution are drawn from the same distribution. We then apply our algorithm to actual human subject data from the alcohol biosensor and observe the behavior of the normalized root-mean-square error (NRMSE) using leave-one-out cross-validation (LOOCV) under different model complexities.
In the maximum likelihood estimator case, numerical studies included show that the maximum likelihood estimator is statistically consistent in that the convergence of the estimated distribution to the “true” distribution is observed in an example involving simulated data. The algorithm developed is then applied to two datasets collected using two different transdermal alcohol biosensors. Using the LOOCV method, we get an estimate for the distribution of the random parameters based on a training set. The input from a test drinking episode is then used to quantify the uncertainty propagated from the random parameters to the output of the model in the form of a 95% error band surrounding the estimated output signal.
Linked assets
University of Southern California Dissertations and Theses
Conceptually similar
PDF
Linear quadratic control, estimation, and tracking for random abstract parabolic systems with application to transdermal alcohol biosensing
PDF
An abstract hyperbolic population model for the transdermal transport of ethanol in humans: estimating the distribution of random parameters and the deconvolution of breath alcohol concentration
PDF
Finite dimensional approximation and convergence in the estimation of the distribution of, and input to, random abstract parabolic systems with application to the deconvolution of blood/breath al...
PDF
Obtaining breath alcohol concentration from transdermal alcohol concentration using Bayesian approaches
PDF
M-estimation and non-parametric estimation of a random diffusion equation-based population model for the transdermal transport of ethanol: deconvolution and uncertainty quantification
PDF
Estimation of random input to semi-linear abstract parabolic systems with application to quantitative description of drinking behavior based on trans-dermal alcohol concentration
PDF
Population modeling and Bayesian estimation for the deconvolution of blood alcohol concentration from transdermal alcohol biosensor data
PDF
Nonparametric estimation of an unknown probability distribution using maximum likelihood and Bayesian approaches
PDF
Distributed parameter model based system identification and filtering in the estimation of blood alcohol concentration from transdermal alcohol biosensor data
PDF
Physics-informed machine learning techniques for the estimation and uncertainty quantification of breath alcohol concentration from transdermal alcohol biosensor data
PDF
A nonlinear pharmacokinetic model used in calibrating a transdermal alcohol transport concentration biosensor data analysis software
PDF
Delta Method confidence bands for parameter-dependent impulse response functions, convolutions, and deconvolutions arising from evolution systems described by…
PDF
Uncertainty quantification in extreme gradient boosting with application to environmental epidemiology
Asset Metadata
Creator
Asserian, Lernik
(author)
Core Title
Prohorov Metric-Based Nonparametric Estimation of the Distribution of Random Parameters in Abstract Parabolic Systems with Application to the Transdermal Transport of Alcohol
School
College of Letters, Arts and Sciences
Degree
Doctor of Philosophy
Degree Program
Applied Mathematics
Degree Conferral Date
2021-08
Publication Date
06/29/2021
Defense Date
06/09/2021
Publisher
University of Southern California
(original),
University of Southern California. Libraries
(digital)
Tag
alcohol biosensor,breath alcohol concentration,existence and consistency of estimators,finite-dimensional approximation and convergence,least squares estimation,maximum likelihood estimation,mixed effects model,nonparametric estimation,OAI-PMH Harvest,Prohorov metric,random discrete time dynamical systems,random partial differential equations,transdermal alcohol concentration
Format
theses
(aat)
Language
English
Contributor
Electronically uploaded by the author
(provenance)
Advisor
Rosen, Gary (
committee chair
), Luczak, Susan E. (
committee member
), Wang, Chunming (
committee member
)
Creator Email
asserian@usc.edu
Permanent Link (DOI)
https://doi.org/10.25549/usctheses-oUC13012487
Unique identifier
UC13012487
Identifier
(accession number),etd-AsserianLe-9680.pdf (filename)
Legacy Identifier
etd-AsserianLe-9680
Document Type
Dissertation
Format
theses (aat)
Rights
Asserian, Lernik
Internet Media Type
application/pdf
Type
texts
Source
University of Southern California
(contributing entity),
University of Southern California Dissertations and Theses
(collection)
Access Conditions
The author retains rights to his/her dissertation, thesis or other graduate work according to U.S. copyright law. Electronic access is being provided by the USC Libraries in agreement with the author, as the original true and official version of the work, but does not grant the reader permission to use the work if the desired use is covered by copyright. It is the author, as rights holder, who must provide use permission if such use is covered by copyright.
Repository Name
University of Southern California Digital Library
Repository Location
USC Digital Library, University of Southern California, University Park Campus MC 2810, 3434 South Grand Avenue, 2nd Floor, Los Angeles, California 90089-2810, USA
Repository Email
cisadmin@lib.usc.edu
Tags
alcohol biosensor
breath alcohol concentration
existence and consistency of estimators
finite-dimensional approximation and convergence
least squares estimation
maximum likelihood estimation
mixed effects model
nonparametric estimation
Prohorov metric
random discrete time dynamical systems
random partial differential equations
transdermal alcohol concentration