Close
About
FAQ
Home
Collections
Login
USC Login
Register
0
Selected
Invert selection
Deselect all
Deselect all
Click here to refresh results
Click here to refresh results
USC
/
Digital Library
/
University of Southern California Dissertations and Theses
/
Bayesian inference using Markov chain Monte Carlo methods in pharmacokinetic /pharmacodynamic systems analysis
(USC Thesis Other)
Bayesian inference using Markov chain Monte Carlo methods in pharmacokinetic /pharmacodynamic systems analysis
PDF
Download
Share
Open document
Flip pages
Contact Us
Contact Us
Copy asset link
Request this asset
Transcript (if available)
Content
INFORMATION TO USERS This manuscript has been reproduced from the microfilm master. UMI films the text directly from the original or copy submitted. Thus, some thesis and dissertation copies are in typewriter face, while others may be from any type of computer printer. The quality of this reproduction is dependent upon the quality of the copy submitted. Broken or indistinct print, colored or poor quality illustrations and photographs, print bleedthrough, substandard margins, and improper alignment can adversely affect reproduction. In the unlikely event that the author did not send UM I a complete manuscript and there are missing pages, these will be noted. Also, if unauthorized copyright material had to be removed, a note will indicate the deletion. Oversize materials (e.g., maps, drawings, charts) are reproduced by sectioning the original, beginning at the upper left-hand comer and continuing from left to right in equal sections with small overlaps. Photographs included in the original manuscript have been reproduced xerographically in this copy. Higher quality 6" x 9” black and white photographic prints are available for any photographs or illustrations appearing in this copy for an additional charge. Contact UM I directly to order. ProQuest Information and Learning 300 North Zeeb Road, Ann Arbor, M l 48106-1346 USA 800-521-0600 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. Bayesian Inference Using Markov Chain Monte Carlo Methods in Pharmacokinetic/Pharmacodynamic Systems Analysis Copyright 2000 by Dongwoo Kang A Dissertation Presented to the FACULTY OF THE GRADUATE SCHOOL UNIVERSITY OF SOUTHERN CALIFORNIA In Partial Fulfillment of the Requirements for the Degree DOCTOR OF PHILOSOPHY (Biomedical Engineering) December 2000 Dongwoo Kang Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. UMI Number: 3041477 UMI’ UMI Microform 3041477 Copyright 2002 by ProQuest Information and Learning Company. All rights reserved. This microform edition is protected against unauthorized copying under Title 17, United States Code. ProQuest Information and Learning Company 300 North Zeeb Road P.O. Box 1346 Ann Arbor, Ml 48106-1346 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. UNIVERSITY OF SOUTHERN CALIFORNIA THE GRADUATE SCHOOL UNIVERSITY PARK LOS ANGELES. CALIFORNIA 90007 This dissertation, written by under the direction of h . \ . s ........ Dissertation Committee, and approved by all its members, has been presented to and accepted by The Graduate School, in partial fulfillment of re quirements for the degree of DOCTOR OF PHILOSOPHY D o n g w o o K a n g Dean of Graduate Studies Qafg D e c £ ? .r. DISSERTATION COMMITTEE Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. DEDICATION The fear o f the LORD is the beginning o f wisdom, and knowledge o f the Holy One is understanding. (Proverbs 9:10) Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. ACKNOWLEDGEMENTS The past few years at the University of Southern California has enriched my life and myself in an immeasurable way. Most of all, I had the wonderful chance to meet my advisor, Dr. David Z. D’Argenio, who introduced to me this fascinating field of research. He led me as well as supported me until the completion of the degree. I also thank Dr. Alan Schumitzky and Dr. Roger Jelliffe, who kindly took the responsibility to be members of my committee and provided me with their insights. Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. TABLE OF CONTENTS Dedication Acknowledgements List of Tables List of Figures Abstract 1. Introduction 1.1 Pharmacokinetics and Pharmacodynamics (PK/PD) 1.2 Modeling PK/PD Processes 1.3 Estimating PK/PD Models 1.4 Specific Aims 1.5 Outline of the Dissertation 2. Review of Computational Methods for Bayesian Estimation 2.1 Analytic Approximation Techniques 2.2 Numerical Integration 2.3 Gaussian Quadrature 2.4 Other Numerical Methods 2.5 Sampling Based Methods 2.6 Markov Chain Monte Carlo Methods 3. Generic Markov Chain Monte Carlo Methods 3.1 Motivation 3.2 IMH (Independence Metropolis-Hastings) 3.2.1 Constructing a Markov Chain 3.2.2 Application to Bayesian Estimation 3.2.3 Implementation 3.3 Error Analysis, Mixing, and Convergence Test 3.3.1 Single Long Run 3.3.2 Multiple Sequences Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. 4. Regenerative Markov Chain Monte Carlo Methods 48 4.1 Motivation 48 4.2 Splitting Method: SIMH (Split Independence Metropolis-Hastings) 49 4.2.1 Defining Regeneration Times of IMH Cham 49 4.2.2 Application to Bayesian Estimation 56 4.2.3 Implementation 56 4.3 Hybrid Algorithm: RIMH (Rejection Independence Metropolis Hastings) 58 4.3.1 Introducing Regeneration Into IMH 58 4.3.2 Application to Bayesian Estimation 59 4.3.3 Implementation 59 4.4 Methods for Analysis of Regenerative Processes 61 4.4.1 Moment Estimation 61 4.4.2 Quantile Estimation 65 5. Applications 68 5.1 PK Model of High-Dose Busulfan 68 5.2 PK Model of Teniposide 72 5.3 PK/PD Model of Terbutaline 75 5.4 PK/PD Model of Warfarin 80 6. Results 88 6.1 Busulfan Example 88 6.1.1 Results with IMH 88 6.1.2 Results with SIMH 93 6.1.3 Results with RIMH 99 6.2 Teniposide Example 110 6.2.1 Results with IMH 110 6.2.2 Results with SIMH 113 6.2.3 Results with RIMH 115 6.3 Terbutaline Example 119 6.3.1 Results with IMH 119 6.3.2 Results with SIMH 124 v Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. 6.3.3 Results with RIMH 6.4 Warfarin Example 6.4.1 Results with IMH 6.4.2 Results with SIMH 6.4.3 Results with RIMH 7. Discussion and Conclusion 7.1 Evaluation of the Methods 7.2 Limitations of the Methods 7.3 Advantages of the Methods 7.4 Other Issues Investigated 7.5 Future Work References Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. LIST OF TABLES Table 5.1 Prior statistics of Busulfan PK parameters. 70 Table 5.2 Parameter values for the 3 selected subjects from Busulfan population simulation. 71 Table 5.3 Exact and noisy data for the 3 selected subjects from Busulfan population simulation. 71 Table 5.4 Prior statistics of Teniposide PK parameters. 73 Table 5.5 Parameter values for the 3 selected subjects from Teniposide population simulation. 75 Table 5.6 Exact and noisy data for the 3 selected subjects from Teniposide population simulation. 75 Table 5.7 Prior statistics of Terbutaline PK/PD parameters. 77 Table 5.8 Parameter values for the 3 selected subjects from Terbutaline population simulation. 79 Table 5.9 Exact and noisy data for the 3 selected subjects from Terbutaline population simulation. 79 Table 5.10 Prior mean vector of Warfarin PK/PD parameters. 84 Table 5.11 Measured INR data for the two subjects. 84 Table 5.12 Prior covariance matrix of Warfarin PK/PD parameters. 85 Busulfan Example: IMH Table 6.1 Table of moments. 92 Busulfan Example: SIMH Table 6.2 Effect of different values of split for BS10. 93 Table 6.3 Percentage coverage for BS 10. 96 Table 6.4 Features of the split IMH chain. 97 Table 6.5 Percentage coverage for the three subjects. 98 vii Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. Busulfan Example: RIMH Table 6.6 Features of the RIMH chain. 101 Table 6.7 Table of moments. 102 Table 6.8 Percentage coverage for BS10. 105 Table 6.9 Percentage coverage for posterior quantiles estimated using discretization method. 106 Table 6.10 Percentage coverage for the three subjects. 107 Teniposide Example: IMH Table 6.11 Table of moments. 110 Teniposide Example: SIMH Table 6.12 Features of the split IMH chain. 113 Table 6.13 Percentage coverage for the three subjects. 114 Teniposide Example: RIMH Table 6.14 Features of the RIMH chain. 116 Table 6.15 Table of moments. 116 Table 6.16 Percentage coverage for the three subjects. 118 Terbutaline Example: IMH Table 6.17 Table of moments. 121 Terbutaline Example: SIMH Table 6.18 Features of the split IMH chain. 124 Table 6.19 Summary ofthe posterior statistics of TB10. 125 Table 6.20 Summary of the posterior statistics of TB50. 126 Table 6.21 Summary of the posterior statistics of TB90. 126 viii Reproduced with permission ofthe copyright owner. Further reproduction prohibited without permission. Terbutaline Example: RIMH Table 6.22 Features of the RIMH chain. 127 Table 6.23 Table of moments. 128 Table 6.24 Summary ofthe posterior statistics of TB10. 131 Table 6.25 Summary of the posterior statistics of TB50. 132 Table 6.26 Summary of the posterior statistics of TB90. 132 Warfarin Example: IMH Table 6.27 Table of moments. 136 Warfarin Example: SIMH Table 6.28 Features of the split IMH chain. 141 Table 6.29 Summary of the posterior statistics of WF02. 142 Table 6.30 Summary of the posterior statistics of WF03. 143 Table 6.31 Percentage coverage for WF02. 144 Warfarin Example: RIMH Table 6.32 Number of subchains and number of rejections. 145 Table 6.33 Features of the RIMH chain. 146 Table 6.34 Table o f moments. 148 Table 6.35 Percentage coverage for selected parameter quantiles using discretization method for WF02. 153 Table 6.36 Summary of the posterior statistics of WF02. 154 Table 6.37 Summary of the posterior statistics of WF03. 155 Table 6.38 Percentage coverage for WF02. 156 Discussion and Conclusion Table 7.1 Approximate run times. 161 Table 7.2 Number of bum-in samples and posterior samples. 161 ix Reproduced with permission ofthe copyright owner. Further reproduction prohibited without permission. LIST OF FIGURES Figure 1.1 Conceptualization of the pharmacokinetic and pharmacodynamic framework. 2 Figure 5.1 Busulfan PK model with 2 parameters. 70 Figure 5.2 Population simulation of Busulfan model. Concentration curve. 71 Figure 5.3 Teniposide PK model with 4 parameters. 72 Figure 5.4 Population simulation of Teniposide model. Concentration curve. 74 Figure 5.5 Population simulation of Teniposide model. AUC curve. 74 Figure 5.6 Terbutaline PK model with 9 parameters. 76 Figure 5.7 Population simulation of Terbutaline model. FEV, curve. 79 Figure 5.8 Blood clotting cascade diagram. 82 Figure 5.9 Warfarin PK/PD model with 20 parameters. 83 Figure 5.10 Measured data for subject receiving Warfarin. 86 Busulfan: IMH Figure 6.1 Autocorrelation plots to assess mixing. 89 Figure 6.2 IPSRF (iterative potential scale reduction factor) for BS10. 90 Figure 6.3 Posterior sample path for BS 10. 91 Figure 6.4 Density plots for BS10. 92 Busulfan: SIMH Figure 6.5 Asymptotic 95% confidence intervals for the PK parameter V of BS 10. 94 Busulfan: RIMH Figure 6.6 Posterior sample path with the subchain starting points marked for BS 10. 101 Figure 6.7 Density plots for BS 10. 102 Figure 6.8 Asymptotic 95% confidence intervals for the PK parameter V of BS10. 103 Figure 6.9 Width of the asymptotic 95% confidence interval vs. c. 109 x Reproduced with permission ofthe copyright owner. Further reproduction prohibited without permission. Teniposide: IMH Figure 6.10 Autocorrelation plots to assess mixing. Figure 6.11 Prior and posterior distribution plot for TN10. 111 112 Teniposide: RIMH Figure 6.12 Prior and posterior distribution plot for TN10. 117 Terbutaline: IMH Figure 6.13 Autocorrelation plots to assess mixing. 120 Figure 6.14 Prior and posterior distribution plot for TB 10. 122 Terbutaline: RIMH Figure 6.15 Prior and posterior distribution plot for TB 10. 129 Warfarin: IMH Figure 6.16 Autocorrelation plots to assess mixing. 133 Figure 6.17 IPSRF (iterative potential scale reduction factor) for WF02. 134 Figure 6.18 Posterior sample path for WF02 and WF03. 135 Figure 6.19 Prior and posterior distributions for WF02 and WF03. 137 Warfarin: RIMH Figure 6.20 Posterior sample path with the subchain starting points marked. 147 Figure 6.21 Prior and posterior distributions for WF02 and WF03. 149 Figure 6.22 Width of the asymptotic 95% confidence interval vs. c. 157 Discussion and Conclusion Figure 7.1 Autocorrelation plots. 162 xi Reproduced with permission ofthe copyright owner. Further reproduction prohibited without permission. ABSTRACT Three sampling-based Bayesian parameter estimation methods were implemented in general-purpose software and evaluated using several pharmacokinetic/pharmacodynamic (PK/PD) models. The Independence Metropolis-Hastings (IMH) sampler is a generic Markov chain Monte Carlo algorithm, which can generate samples from the posterior distribution in Bayesian applications. The Independence Metropolis-Hastings algorithm with splitting (SIMH) adopts Nummelin’s splitting of the Markov chain transition kernel to obtain independent and identically distributed (i.i.d.) subchains by identifying regeneration times of the already constructed IMH chain. The regenerative analysis by Iglehart and Crane can then be used to calculate asymptotic confidence intervals to provide error analysis for the posterior estimates. The classical ratio estimator was used to estimate the posterior moments, and the empirical distribution method was used to estimate the posterior quantiles. The Rejection Independence Metropolis-Hastings (RIMH) is a hybrid sampler combining rejection sampling with the IMH sampler, which is capable of generating i.i.d. subchains by design using the predefined regeneration set. The same regenerative analysis used for SIMH can be used for RIMH. To insure convergence to the target distribution, optimal values for the bum-in period and the total chain length have to be determined for IMH. The tuning parameters for SIMH and RIMH can be used to control the average subchain length. Selection of these values is generally problem dependent and requires trial and error, especially for extreme subjects. In addition, sampling-based methods are computationally more expensive than other estimation methods. However, once the simulation is successfully done, any xii Reproduced with permission ofthe copyright owner. Further reproduction prohibited without permission. information on the posterior can be obtained, and the error analysis in the form of confidence interval is possible with the two regenerative methods. The three algorithms were evaluated using 4 PK/PD examples of differing complexity, from 2-parameter linear Busulfan PK model to 20-parameter nonlinear Warfarin PK/PD model. The posterior distributions, moments, and quantiles for system parameters and outputs were successfully obtained using the IMH sampler. In addition, the asymptotic confidence intervals for the posterior quantities obtained from SIMH and RIMH showed close to nominal coverage with appropriately chosen tuning parameters. Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. 1. Introduction 1.1 Pharmacokinetics and Pharmacodynamics (PK/PD) Pharmacokinetics is the study of the absorption, distribution, metabolism, and excretion of drugs (Riegelman et al. 1973). The most commonly employed approach used to characterize the pharmacokinetics o f a drug is to represent the body as a system of interconnected compartments, even though these compartments usually have no physiologic or anatomic reality. Drug molecules are assumed to be distributed instantaneously and homogeneously upon entering a compartment. For such compartment models, the transfer of drug between compartments or its elimination from compartments can be modeled as a first- order transfer process or by non-linear transfer models (e.g., Michaelis-Menten kinetics), depending on the mechanism of transfer. Compartment models describing the pharmacokinetics o f drugs have been used to understand and control drug therapy (e.g., Jelliffe 1967; Buell, Jelliffe, Kalaba, and Sridhar 1970). Pharmacodynamics is the study of the time relationships between the concentration of a pharmacological substance at its site of action and the intensity of its effects. While there is no general modeling framework for describing pharmacodynamic processes, analogous to compartmental modeling of pharmacokinetic processes, several different approaches have been proposed for specific drug mechanisms of action. These approaches include effect compartment models (Holford and Sheiner 1981) and indirect response models (Jusko and Ko 1994). The effect compartment is an additional, hypothetical compartment linked to the central compartment by a first-order rate constant. However, the 1 Reproduced with permission ofthe copyright owner. Further reproduction prohibited without permission. drug amount in the effect compartment does not contribute to the overall amount of drug, and nonlinear sigmoid Hill equation is used to relate the drug amount to its effect. Instead of assuming direct action of drug, the indirect response model tries to model the pharmacologic phenomenon for which the site of action precedes or follows the measured response variable. Figure 1.1 provides a framework for conceptualizing pharmacokinetic/pharmacodynamic (PK/PD) processes. Cp I ( Ce ) ir H BIOSIGNAL * tO U T DISPOSITION BIOPHASE BIOSENSOR BIOSIGNAL TRANS- KINETICS DISTRIBUTION PROCESS FLUX OUCTION RESPONSE L PHARMACOKINETICS J PHARMACODYNAMICS Figure 1.1 Conceptualization of the pharmacokinetic and pharmacodynamic framework (Jusko et al. 1995). Reproduced with permission ofthe copyright owner. Further reproduction prohibited without permission. 1.2 Modeling PK/PD Processes It is assumed that the PK/PD process under study can be described by the following set of equations (D’Argenio and Schumitzky 1997): ^ r ~ = f{x{t),a,r(t),t), jc(0) = c , ( 1.1 ) at y(t) = /»(jc(/),a,r(f),f)> ( 1.2) where x(t) is the state vector ( x e R"), a is the parameter vector (a e Rp ), r(t) is the input vector ( r e R*), andy(r) is the output vector (y e R/ ). Each element of the input vector r(t) can be used to represent the dose regimen of a specific compound. One example is the intravenous (TV) infusion regimen in which the rates of infusion remain constant between input times: n ( 0 = ri- 1. dh -1 < 1 ^ dti > ‘ = 2'-> nd + l , ( 1.3 ) where nd is the number of dose events. Inputs can also be modeled as boluses b(t) which produce instantaneous changes in model states: x(dt* ) = x(dt~) + b(dti), i = 1 ,..., n d , ( 1.4) where the - and + indicate times immediately before and after bolus administration. Data to be used in fitting the model described by (1.1) and (1.2) are generally collected at discrete times and subject to additive output error: z(tj) = h[x(tj),a,tj)+v(tj), j = l,...,m , (1.5 ) where h(t} ) represents the model output vector at time tj (from (1.2)) and v(fy) is the vector of additive output error terms, representing all sources of uncertainty including measurement error that can be modeled as a random process added to the true model output. The common 3 Reproduced with permission ofthe copyright owner. Further reproduction prohibited without permission. method is to model v(/;) as normally distributed with zero mean and variance defined by an appropriate error variance model. In general, the variance of the output error can be an arbitrary function of the model output: where I is number of outputs, and /? represents additional parameters that are unique to the error variance model. For example, either the standard deviation of error is related to the output linearly as In cases of multiple responses, each output can be defined by a unique variance model which is independent of the others. The above modeling framework will be used to define the PK/PD estimation problem that is the subject of this study. 1.3 Estimating PK/PD Models In PK/PD, the problem of model parameter estimation has historically been formulated as a nonlinear least squares estimation problem. In many PK/PD modeling studies in which collected data are sparse and prior information is available about model parameters, Bayesian methods can be adopted as an effective approach. Early work in this area includes that o f Sheiner and his colleagues (Sheiner et al. 1975) who estimated digoxin clearances as the mode of its posterior distribution given measurements o f digoxin plasma varjv,-(tj)\=gi[y(t),a,tj,fi), / = / = l,...,m, (1 .6 ) (1 .7 ) or the variance of error is related to the output in power function as var{v(r)} = cr02 -yr (t). ( 1.8 ) 4 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. concentration and a known prior distribution for clearance in the population (maximum a posteriori estimate, MAP). Katz and Schumitzky (Katz et al. 1981; Katz et al. 1982) reported the first efforts to calculate the complete posterior distribution of model parameters in a nonlinear pharmacokinetic estimation problem, for which they used numerical quadrature methods to perform the multi-dimensional integration. Following this seminal work, others have reported the use of sophisticated numerical quadrature, analytical approximation techniques, and discretization of densities for calculating posterior densities (e.g., Naylor and Smith 1982; Katz and D’Argenio 1984; Smith et al. 1985; Tiemey and Kadane 1986; Smith et al. 1987; Shah 1990). The above methods, however, are not sufficiently robust for general nonlinear model applications encountered in pharmacokinetics and pharmacodynamics. Recently, sampling-based approaches (Smith and Gelfand 1992), especially MCMC (Markov chain Monte Carlo) methods, provide the promise of a unifying framework for complex Bayesian modeling problems (Gilks et al. 1996). These methods have been applied to complicated PK/PD population models (Wakefield et al. 1994; Gelman et al. 1996), as well as to individual models (Park and D’Argenio 1995; Wakefield 1996; D’Argenio and Park 1997). MCMC methods have a flexible characteristic which allows constructing a hybrid sampler using other individual sampling methods. This hybrid approach can be employed to enhance MCMC simulation. Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. 1.4 Specific Aims The objective of this study is to develop general, robust, and computationally practical approaches for Bayesian estimation for PK/PD models of the class defined in Section 1.2, that can be used by drug development researchers. To accomplish this goal, the following aims are addressed. 1. As one example of a generic MCMC method, implement the Independence Metropolis- Hastings (IMH) algorithm to solve the Bayesian estimation problem for PK/PD models as defined in Section 1.2 above. 2. Implement the Independence Metropolis-Hastings algorithm with splitting (SIMH) to provide an error analysis for the IMH method. Implement procedures for analysis of regenerative stochastic processes to estimate the posterior quantities with error analysis for the SIMH method. 3. Implement the Rejection Independence Metropolis-Hastings (RIMH) algorithm to solve the PK/PD Bayesian estimation problem. Implement procedures for analysis of regenerative stochastic processes to estimate the posterior quantities with error analysis for the RIMH method. 4. Apply the three algorithms to four models of differing complexity: a PK model of the anticancer drug Busulfan; a PK model of the anticancer drug Teniposide; a PK/PD model of the anti-asthmatic drug Terbutaline, and a PK/PD model of the anticoagulant Warfarin. Using these examples, evaluate the implementation of the algorithms, and the strengths and limitations of each. 5. Investigate the effects of the tuning parameters for SIMH and RIMH samplers using extensive coverage studies. 6 Reproduced with permission ofthe copyright owner. Further reproduction prohibited without permission. 6. Evaluate different quantile estimation methods for the SIMH and RIMH. 1.5 Outline of the Dissertation This dissertation is organized as follows. Chapter 1 gives an overview of PK/PD modeling and estimation along with the objectives. Chapter 2 reviews the previous computational efforts in Bayesian estimation, and introduces the concept of a Markov chain and Markov chain Monte Carlo (MCMC) methods. Chapter 3 summarizes the generic MCMC methods, introduces the algorithms in chronological order, and discusses implementation and analysis issues. Chapter 4 gives an exposition of the two regenerative sampling methods considered, provides the implementation details for both of them, and presents methods for the analysis of regenerative chains. Chapter 5 introduces the drug models that were investigated in this study and provides descriptions for them along with the prior parameter statistics. Chapter 6 gives the results of the application of the three MCMC methods obtained to the four examples. Various issues related to solving the sampling-based Bayesian estimation problem are discussed and the performance o f the methods is evaluated. Chapter 7 concludes the dissertation with discussion and topics for future work. Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. 2. Review of Computational Methods for Bayesian Estimation In the Bayesian approach to parameter estimation, the model parameters are considered to be random variables. This is fundamentally different from the frequentist approach where parameters are treated as fixed quantities (Carlin and Louis 1996). The frequentist evaluates procedures based on repeated sampling, assuming an infinite replication of the inferential problem and evaluates unknown parameters over this repeated sampling framework. The Bayesian requires a sampling model represented in the form of likelihood and a prior distribution on parameters. Unknown parameters are considered random and all inferences are based on their distribution conditional on observed data, which is the posterior distribution. Bayesian inference involves fitting a probability model to a set o f data and summarizing the result by a probability distribution. It can be summarized by the following well-known formulation (Bayes 1958): P( , | Z)= ' W i W L , (2 .1 ) Jl{x I z)p(x)dx where x is the parameter vector (the parameter vector will be generally denoted by x except for the cases where it is specifically related to the PK/PD system for which it will be denoted by a as in Section 1.2), z is the observation or data, p(x) is the prior distribution, p(x | z) is the posterior distribution, and /(x| z) is the likelihood function. Once the posterior distribution p{x | z) is determined, quantities of interest such as marginal densities for model parameters, means, and standard deviations can be obtained. When the measurement 8 Reproduced with permission ofthe copyright owner. Further reproduction prohibited without permission. error is modeled as independent of each other and normally distributed with zero mean and finite variance, the likelihood function is defined as follows: where y is the prior predictions based on the prior parameter distribution. The Bayesian analogue of a frequentist confidence interval (Cl) is usually referred to as a credible set or central posterior interval, and it is used as a way of reporting the posterior uncertainty. A 100x(l-2y) % credible set for x is a subset C of U such that This definition enables direct probability statements about the likelihood ofx falling in C: ‘The probability that x lies in C given the observed data z is at least (1-2 y)' This is in contrast to the usual frequentist interpretation of Cl: ‘If we could recompute C for a large number of data sets collected in the same way as ours, about I00x(l-2y) % of them would contain the true value of x ' Therefore, the credible set provides an actual probability statement for the Bayesian based only on the observed data and the prior opinion. A similar quantity describing the posterior uncertainty is the highest posterior density (HPD) interval. The HPD interval is the region of values that contains 100x(l-2y) % of the posterior probability and also has the characteristic that the density within the region is never lower than that outside. The Bayesian approach to statistical modeling requires specifying the likelihood l(x | z ) and the prior probability function p{x) , multiplying the two o f them, and normalizing to form the posterior probability density. The technical problem of carrying out ( 2.2 ) (2 .3 ) u 9 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. a Bayesian analysis therefore reduces to that of performing or approximating a number of numerical integrals that usually turn out to be high dimensional. The associated numerical problems become severe as the number of parameters in the model increases. The remainder of this section will review some of the methods (Smith 1991) that have been proposed for Bayesian computation and have been applied to PK/PD models. 2.1 Analytic Approximation Techniques Asymptotic Normality: For large sample sizes, suppressing the dependence on the given data (i.e., denoting /(x| z) by L(x)), the posterior distribution ofx is often approximately N (x,t ) , where x is the maximum likelihood estimate and t is the inverse of the Hessian matrix evaluated at x . This approach has the powerful advantage in that practically all posteriors can be treated with normality assumption and that x and t can be easily computed. The major problem with this strategy is the justification o f approximate normality. Lindlev’s Approximation: Lindley (1980) developed specific expansions to order n '1 for ratios of integrals of the form f w(x)/(x | z)dx/ J v(x)/(x | z)dx. ( 2.4 ) Thus, for example, if we would like to calculate the expected value of posterior quantity with w(x) = /(x )p (x ), v(x) = p (x ), and p(x) = In p(x) , Lindley showed that I , (2 .5 ) ij iJJjn 10 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. where subscripts on f p, and L denote differentiation with respect to specific components of approximations require the evaluations of up to third-order derivatives, a task which quickly becomes limiting as the parameter dimension increases. The Laplace Approximations: Tierney and Kadane (1986) proposed a form of analytic approximation, which requires the evaluation of only the first and the second derivatives of slightly modified likelihood functions. The basic idea is to apply separately the Laplace method for integrals to both the numerator and denominator in (2.4). We can illustrate the method by considering again the special case w(x) = f(x)p(x) and v(x) = p(x) , for positive /(•). We write (2.4) as Let x be the mode o fM, and let < x 2 = - I f M ’(x) . Then the Laplace approximation produces x. All functions in (2.5) are evaluated at x = x , and the q, are the elements of I . The E(J{x) | z) = \f(x)eL{x)p{x)dx/ \eLMp(x)dx, ( 2.6 ) where L(x) is the Iog-likelihood function. With M = log p(x) + L(x)/n and M* = lo g /(x ) + log p(x) + L(x)fn , the posterior mean o f/is \enM™ d x l\e nMMdx. (2 .7 ) <rexp[nA/(x)], ( 2.8 ) 11 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. for the denominator in (2.7). Similarly, if x * is the mode of M* and a * 2 (x*), then the Laplace approximation to the numerator of (2.7) is Je ^ ' ^ d x (j*exp[«A/*(x*)]. (2 .9 ) From (2.8) and (2.9) we obtain the approximation £ (/(* ) | z) s (< 7 * / <r)exp{«[M * (x*) - M (x)]}, ( 2.10) which has relative error of order n 2 and seems to be more accurate than conventional approximations for a range of problems. However, this method requires a regularity condition that the likelihood times prior must be unimodal. In addition, it is necessary to rely on numerical integration in order to calculate the normalizing constant to obtain marginal density. 2.2 Numerical Integration If the additive output error v(ry ) is distributed as N{0, o2 ), the density function can be written as f * \ m P(V\X,<T ) = 1 Therefore, the likelihood function is exp 1 m L y v2 ^ J I * * * /(x|r,(T 2)Qcp(v|x,o-2)cccT"mexp| 2 < j ‘ ) ( 2.1 1 ) ( 2.1 2 ) 12 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. m where S(x) = £ v2 . If we assume independence between x and < r, that is, ;=i p(x,a2) = p(x)p(a2) , (2.13) and that x has a locally uniform prior distribution and a non-informative prior for a 2 by taking p(<r2)cc —, a (2.14) we can integrate out <j2 and obtain p{x\z)<cS{xrnl2 (2.15) Thus, to calculate p(x\ z) , it is only necessary to integrate the function S(x)~n/2 overx to obtain the constant of proportionality. Additional integration results in calculation of £(x), the covariance matrix ofx, and other quantities. To numerically integrate an expression of the form Katz, Schumitzky, and Azen (1981) used three different methods, which are summarized below. Romberg Integration utilizes extrapolation to infinity. To evaluate (2.16) atjth iteration, two trapezoidal approximations Tx (j) and T2(j) are calculated using K.) and [A T * ] (the largest integer contained in the brackets) subintervals, respectively. K\ = 1, K f = 2 , and Kj ~ 1.5 x K) are used in the first iteration, and K j = [A T ; 2 _, ] is used subsequently. The (2.16) a, a, 13 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. known rate of convergence of the trapezoidal approximations is used to extrapolate forward from Ti(f) and T2 (j) to an estimate of lim T(J) = A . The procedure stops when extrapolated K j-vb values of the integral on two successive iterations are within a specified relative tolerance, or when a maximum number of iterations is reached. Haber Integration is based on stratified random sampling. To evaluate (2.16), we first translate the variables xn so that the integration is calculated over the unit hypercube. At they'th iteration, each side of the hypercube is subdivided. In each sub hypercube, two points are selected at random, and the function is evaluated at these two points and at the reflections of these points through the center of the sub-hypercube. The mean of these function values yields an estimate £1 of the desired integral. A similar estimate £2 is obtained at yth iteration using more subintervals on each edge of the unit hypercube. The algorithm stops when both of two convergence criteria are met: i) an estimation of the relative error in £2 is small, and ii) this estimate of error is stable. Product of Simpson’s Rules approximates the integral of a function of one independent variable over the interval [a = x0 < X i <... < .t„ = 6] using n+1 points as follows: f/(x)<fe = ) + 4 / w + 2 /(1 ,)+ ...+ 4 /0 ^ , )+ /(* .)) (2 .1 7 ) ( b - a \ where x, =x0 + i ------ . For functions of k variables, (2.16) can be approximated by applying the one dimensional Simpson’s rule iteratively to each of the individual variables while holding the remaining variables fixed. A different number of points, n„ can be used for each variable. 14 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. The Katz, Schumitzky, and Azen conclusion was that the product of Simpson’s rule worked best for the examples which had single modes and tails approaching zero rapidly, but that the method did not appear to be practical for functions of more than four variables due to the excessive computing time. The examples they used to compare the three methods were 2-parameter hormone-receptor model, 2-parameter bimodal tracer experiment model, and 2- parameter overspecified model. 2.3 Gaussian Quadrature Naylor and Smith (1982) proposed Gaussian quadrature to perform the numerical integration efficiently for the class of problems for which the forms p(x | z) may be adequately approximated by the product of a /fc-dimensional multivariate normal probability and a polynomial in (xlt..., xk). The numerical integration and marginalization of p(x | z ) , as given in (2.4), involve the computation of integrals of the form s r \ f (*)] = Jf(x)l(x | z)p{x)dxr , (2.18) where I is some index set I c {l,...,A:} withxt,..., xk denoting the components of x, and xr denotes the vector o f components of x whose subscripts are not elements of I. Thus for example, 7 = 0 and/(x) = 1 correspond to the normalizing constant in (2.1). Or if we are interested in the marginal density o f xh where the set I is the subset of interest, we then simply integrate over xr , where / ' is the complement of I to obtain P(*i I z) = fP(x | z)dxr . (2.19) 15 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. Assuming the dimension of integration implicitly defined and the range understood to be the full parameter space, then all the integrals required for calculating and summarizing (2.1) and (2.4) are special cases of (2.18) for particular choices of /. If h(t) is a suitably regular function and i= t where and m ( 2.2 0 ) then the following relationship holds: ■ W O + 0 0 - jg(x)dx = | -j=h(Mx + ^2a~xi)txp{-x1)dx. ( 2.21 ) -0 0 -0 0 ’ ^ Utilizing the Gauss-Hermite formula, we obtain Ig(x)dx = 2-^=0, M/'t + >/2ov*f) -« |=! 'J* = eX P(^-2)S(Mx + V ^ , ) = 2 X s ( J f ) . ( 2.2 2 ) i= i = j2o~a)i exp(xf ), /,• =Mx+ fia~xxi > ( 2-23 ) 2n _ 1 n'.V^ < o , = -=--------------5 -, ( 2.24 ) where x, is the ith zero of the Hermite polynomial Hn(x). Thus we use approximations to the posterior mean and variance found in this way on any particular iteration to construct the 16 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. grid (/,•) and weights (m,) for the next iteration. This iterative method may be applied initially to small grid size, then the grid sizes are gradually increased until satisfactory convergence is observed. This method is generally fail-safe in the sense that convergent results cannot be attained if the assumptions underlying the approach are not reasonably satisfied. For problems with more than one parameter, a Cartesian product rule can be used: \....\g(xx,...,xk)dxv .xtxk s ’ (2-25 ) « i '* for which m\J ), are calculated using (2.23) with the marginal posterior mean and l J l J variance of ,ry substituted for x and a 2. Parameter transformation may be necessary for this to meet the assumptions of posterior independence. Therefore, all the marginal densities for a ^-parameter problem are available as marginal weighted sums. They illustrated their method with a 3-dimensional estimation problem. Later, Naylor and Shaw refined the method so that it could be applicable to problems with up to 6 parameters and further extensions with up to 26 parameters using Monte-Carlo methods. In so doing, they partitioned the integration grid into two parts, the Cartesian product part and the spatial part. The spatial part may itself be partitioned into a spherical rule part (the points may have some other deterministic configuration over the integration space) and a Monte-Carlo part (some stochastic process may be used to give more general sets of points). Thus, they defined the following partition of the parameter vector: X — (Xj ,...,Xe,Xc+j ,...,XC + S > -^c+ $ + l ,•••,^c+s+r’*c+J+r+l > •••> •* < :) > ( 2.26 ) where 17 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. c = number of dimensions in the Cartesian product rule, s = number of dimensions in the spherical rule, r = number of dimensions in the Monte-Carlo rule. The number c is restricted up to 6, and neither of r and k can exceed 26. Monte-Carlo rules are recommended for use on all problems with 9 or more parameters. However, the Monte- Carlo rules were not implemented in their standard release version of their program, Bayes Four (Naylor and Shaw). While Katz’s method is applicable to general form of distributions (for example, bimodal), it is restricted to low dimensional cases. Naylor’s method increases the dimensionality of the problem to be solved dramatically (yet not generally implemented), but it still requires a certain functional form (i.e., product of polynomial and normal probability density) of the posterior distribution. 2.4 Other Numerical Methods Fixed-grid approaches in numerical integration basically suffer from the exponential increase of computational burden as the dimension increases. An alternative for this is adaptive method which selectively subdivides the integration grid. The method was developed by van Dooren and de Riddler (1976), and later refined by Genz and Malik (1980). It first requires the hyper-rectangular region to be divided into smaller hyper- rectangles. In each subregion, the integral is estimated by a seventh-degree rule, and an error estimate is calculated by comparison with fifrh-degree rule obtained from a subset of the same points. Then the fourth difference o f the integrand along each coordinate axis is 18 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. evaluated and the subregion which has the largest absolute fourth divided difference is selected for further subdivision. This process is repeated until the resulting error becomes less than relative accuracy, van Dooren and de Riddler used test functions of dimension 2,3, 4, 5, and 6, and Genze and Malik reproduced their results with the same problems and other test functions of the same dimensions. Another approach, number theoretic method (Conroy 1967) attempts to find sequences of optimal lattice points for p-point rule approximation of the numerical integration: where {•} denotes the fractional part of its argument, ci,..., c* are the so-called optimal coefficients of x, and p is a prime number. This method uses periodic property of the Fourier expansion of the integrand function. Shah (1990) compared the two methods along with adaptive Monte Carlo method which uses a random sequence of numbers to evaluate an integral. The adaptive Monte Carlo method is adaptive in the sense that it can adjust the probability density where the random number is extracted. The number theoretic method is neither adaptive nor iterative, and it does not allow requesting a specified accuracy, hi addition, how to perform the required optimization is open to debate. The adaptive quadrature method is the most suitable routine for a well-behaved definite multidimensional integral (of dimension between 2 and 12). Katz and D’Argenio (1984) developed a method for Bayesian computation of PK/PD models that uses a discrete density approximation to replace the given continuous parameter density function. They developed a procedure for finding a discrete approximation (2.27) 19 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. to a continuous multivariate density function. It is based on an algorithm for determining the Lx optimal discrete approximation to a univariate density. Results of approximating continuous bivariate density functions, which represent distributions of the parameters of a pharmacokinetic model, showed good agreement between the mean and covariance matrix of the approximated and approximating densities. The distribution of a predicted drug concentration was also calculated using a continuous density and discrete approximations with both 25 and 81 points. The authors note, however, that their method is limited to models of dimension up to about six or five. 2.5 Sampling Based Methods Sampling based-approaches for Bayesian computation have recently drawn much attention because of the potential to solve the dimensional restriction, the ability to accommodate general, even complex, forms of distribution, and also due to the cheap computing cost. There is essential duality between a sample and the density (distribution) from which it is generated. Clearly, the density generates the sample; conversely, given a sample we can approximately recreate the density. In terms of densities, the Bayesian inference process can be encapsulated in the updating of the prior density p(x) to the posterior density p(x | z) through the likelihood function /(x| z ) . With respect to samples, this corresponds to the updating of a sample from p(x) to a sample from p(x \ z) through the likelihood function. In this context, the prior density is often referred to as the proposal or candidate density, and the posterior density is called the target density. Once posterior samples are 20 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. obtained, numerical summaries (in the form of posterior moments including mean and standard deviation, correlations, credible intervals, quantiles, etc.) are trivially obtained by performing corresponding sample quantities. The marginal distributions can be easily calculated by summation over the other parameters, and functions of parameters (e.g., output of the system) can also be obtained in a straightforward way. Two sampling ideas that provide techniques whereby samples from one distribution may be modified to form samples from another distribution are summarized below (Smith and Gelfand 1992). Acceptance-Reiection Sampling: Suppose we wish to sample from a probability density function (pdf) jt, but only have a way to sample from the pdf p. The rejection sampling method retains the sampled values X from pdf p with a probability depending on the value of ^(R ipley 1987). The procedure is as follows: The A-R Sampling Algorithm 1. Generate X t from pdf p. 2. Take X n - Xi with probability u(X i)/Mp(X,), else iterate. Here, u is a density which is normalizable such that nix) = m (jc) / 1 u(x)dx (cf. Equation (2.1)). Then A " has pdf t c , because P{X< -t0 and X is accepted) = d x , Mp(x) (2.28) and P(X is accepted) = d x. Mp(x) (2.29) Therefore 21 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. P(X < x0 1X is accepted) = (2.30) which shows the accepted values have pdf k = u / \ u. Hence, for a sample X h i = 1,..., n, from p, in resampling to obtain a sample from n we will tend to retain those X i for which the ratio of ;r relative to p is large. The resulting sample size is random, and the probability that an individual item is accepted is given by One problem with the rejection sampling is that a large value of M leads to an inefficient algorithm with many rejections. The art of using the rejection method is to find a suitable pdf p that matches ;rwell and from which it is easy to sample. Importance-Sampling: Approximate resampling from /rcan be done as follows: The Importance Sampling Algorithm 1. Given A ”,, i'= 1,..., n, a sample fromp, calculate < 2 * = u(X{ ) / p(Xj). 2. Then calculate q( = < a f / £ Oij. /=> 3. Draw A1 * from the discrete distribution over {X [,..., X n) placing mass q, on A ,. P(X is accepted) = JJi s P W y f o (2.31) where S = {(x,y):x< xQ ,y<u(x)IMp(x)}. (2.32) n 22 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. Then X* is approximately distributed according to k with the approximation improving as n increases. This can be verified as follows. Without weights, X* has density / v r s * ) = £ 71,_*,< *,) - T = - i= | n * 0 = ^p(x)dx, (2.33) - O D so that X* is approximately distributed as an observation fromp(A). Similarly, under weighted bootstrap, X * has density />(*;£ xo) = 2 f ' l( - * ) ( x') i» i r ^ r * E « w -2>, M n M P(x) ju(x)dx X o = ^ ----------= | n{x)dx, (2.34) Ju(x)dx - - O S so that A* is approximately distributed as an observation from k. But unless /rresembles p, large sample size is necessary for the distribution of X* to be a good approximation to n. 23 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. 2.6 Markov Chain Monte Carlo Methods Monte Carlo integration evaluates the expectation of a certain quantity by drawing samples from the distribution and then estimating the population mean by a sample mean (Hammersley and Handscomb 1964): If the samples are independent, the law of large numbers guarantees that the approximation can be made as accurate as possible by increasing n. If one is unable to find a way to simulate independent realizations of some complicated stochastic process, it is almost as useful to be able to simulate dependent realizations Xu X2 ,..., forming an irreducible Markov chain having the distribution of interest as its stationary (equilibrium, invariant) distribution (Geyer 1992). This method provides a way to sample even from quite non-standard target distribution by generating samples throughout the support of the target distribution in the correct proportions. Thus, after a bum-in period (the time required for the Markov chain to converge to the stationary distribution) of m samples, the samples will be dependent samples from the target distribution. After discarding the bum-in samples, we have ergodic average Convergence to the required expectation is ensured by the ergodic theorem. We will now review aspects of Markov processes, as these will be central to our development of Markov chain Monte Carlo Methods for Bayesian computation. Markov processes have been used to model a variety of systems including queueing system, E [ f ( x ) ] = - t n x k). (2.35) of (2 .3 6 ) 24 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. exchange rate between two currencies, storage model, growth of population, telecommunications, and computer networks (Meyn and Tweedie 1993). A Markov process can be either discrete-time or continuous-time; and it can be countable state space or continuous state space. The stochastic process X is called a Markov chain if it satisfies the Markov property P(Xn = 51* 0, * , ) = P(Xn = s | ) ( 2.37 ) for all n £ I and s e S, where S is some countable set called the state space and P denotes probability of an event (Grimmett and Stirzaker 1982; Medhi 1994). UX„ = s, then the chain is said to be in the state s at the nth step. Markov property means that the outcome of the nth trial depends directly only on the (n-l)-st trial. The evolution of the chain is described by its transition probabilities: P y = P iX '+ i= j\X n = 0 , (2-38) which is the probability of the chain moving from state i to state j. The transition matrix P = (Pij) is the |5|x|5] matrix of transition probabilities and has the following properties: P y* 0 , (2.39) (2 .4 0 ) j In the more general case, the m-step transition probability is defined as follows: (2.41) It is the probability that, having started from the state i at the nth trial, the state j is reached in m steps. The chain A * is called homogeneous if 25 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. P{Xn+ x = j I Xn = i) = P{X{ = j I XQ = i) ( 2.42 ) for all n, i, and j. For a homogeneous chain, the transition probabilities do not depend on specific time points. Henceforth, we will confine our consideration only to homogeneous chains. State x is called recurrent if P(Xn =i forsome n * l | Jf0 = 0 = 1, (2-43 ) that is, having started from state /, the probability of returning to the same state is 1. If this probability is less than 1, then the state is called transient. A set C of states is called irreducible if i j for all i,j e C, where < -> is used to indicate that, having started from state /, there is a positive probability of the chain to visit state j, and vice versa. Let f tJ{n) = P(X{ * j ,X 2 * j .....V„_, * j, X n = j | X0 = i) ( 2.44 ) be the probability that starting from /, the first visit to state j takes place at the nth step. Then, the mean recurrence time v ; of a state i is defined as 3 0 v, = ^ n f li(n) if i is recurrent, ( 2.45 ) n=l v, =00 if / is transient. (2.46) The recurrent state / is called positive-recurrent if v -, < oo, or null-recurrent if v ; = oo. The period d(i) of a state / is defined to be the greatest common divisor of n for p„{n) > 0. It is the number of steps after which the state returns to itself. If d(i) = I, the state i is said to be aperiodic. 26 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. The terms regarding the state and set of states defined in the previous paragraphs can be extended to transition matrix, and, equivalently, to chain (Seneta 1981). An nxn transition matrix P is irreducible if for every pair i,j of its index set, there exists a positive integer m such thatpijn) > 0. An irreducible transition matrix is said to be periodic (cyclic) with period d, if the period of any one (so of each one) of its indices satisfies d > 1, and is said to be aperiodic (acyclic) if d = 1. Positive-recurrence of a chain ensures the existence of a stationary distribution. In other words, it implies that if a value is sampled from stationary distribution, then all subsequent iterates will also be distributed according to the stationary distribution. If a chain is irreducible and positive-recurrent, then it has stationary distribution ;rsuch that When a Markov chain has a stationary distribution, it is also the limiting distribution o(Xn as rt -* oo. For example, transition matrix for a system with the transition probabilities of pi i = 0.5000, pi: = 0.5000, p2 I = 0.7000, and p ^ = 0.3000 is It can be verified that this transition matrix satisfies the condition for existence of stationary distribution. An easy way to find this stationary distribution is successively multiplying the transition matrix by itself until constant limiting value for each element is obtained. That is, 7t( > 0 for all /, and = 1, (2.47) If (2.48) is iterated, nP2 = («P)P =xP = n so that (2.48) kP" = n for all n £ 0. (2.49) P = 0.5000 0.5000 0.7000 0.3000 (2.50) 27 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. p 1 = 0.5833 0.4167 0.5833 0.4167 = P 8 =..., (2.51) where the equality holds here with 4 significant digits. Therefore, the stationary distribution of this Markov chain is n = [0.5833 0.4167]. In other words, if the states were to be repeatedly updated according to the transition probabilities defined as in (2.50), after a sufficient number of transitions, the final state would be 1 with probability 0.5833 and 2 with probability 0.4167. Analytically, the stationary distribution is defined as *,•=1/1'/, (2.52) where v ; is given as in (2.45). The existence of a stationary distribution is closely related to the concept of the spectral radius of a matrix. According to the Perron-Frobenius theorem, irreducible and aperiodic transition matrix has one eigenvalue of one, and the magnitudes of all the other eigenvalues are less than 1. Therefore, if a multiple power of the transition matrix is expressed using eigenvectors and eigenvalues according to the spectral theorem, those elements which are to be multiplied by the eigenvalues whose magnitudes are less than 1 vanish as the power becomes large. The only row that survives makes the stationary distribution of the transition matrix. If a transition matrix is periodic with period d, it has d complex roots of unity, which does not allow the stationary distribution to exist. The example transition matrix of (2.50) has eigenvalues o f I and -0.2. As for the general state space case, the above arguments hold in the same way. To begin with, measure space and measurable space are defined. A triple {E,£ ,p ) constitutes a measure space, where £ is a set (i.e., the state space) which often will be ^-dimensional 28 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. Euclidean space R* but can be quite general, £ <zP(E) (the power set of E) is a oalgebra, and // is a measure on (£, £ ) which is also called a measurable space. A set B e £ such that //(£) = 0 is called a null set Co-null set, to be of //-measure zero). A set is of measure zero if it is so small as to be negligible or invisible so that whatever happens on such a set is not of any consequence. Therefore, for many theorems the hypothesis and/or conclusion need only hold on the complement of null set. If a statement about points x e A is true except for .t in some null set, we say that it is true //-almost everywhere (a.e.) on A, or that it is true for almost every x, or that it is true for //-almost all (a.a.). For a process X, evolving on a space E and governed by an overall probability law P, to be a time-homogeneous Markov chain, there must be a set of transition probabilities \p\x,A ),x e E,A c £ ■ } for appropriate sets A such that for times n, m in Z + = {0,1,...}, P{Xn+m e A \ X J,jZm-,Xm=x} = Pn(x,A)\ ( 2.53 ) that is, Pn (x, A) denotes the probability that a chain at x will be in the set A after n steps, or transitions. The independence of P" on the values of X j , j <m, is the Markov property, and the independence of Pn and m is the time-homogeneity property. The transition kernel is a function P(x, A) such that for any n> 0 P{Xn+ x e A \ X n =x} = P(x, A) , ( 2.54 ) for all x e E and A c E . In other words, P(x, ) is the distribution of the Markov chain after one step given that it starts at x. The first return time of a Markov chain to a set A<zE is denoted by rA. That is, 29 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. ta = inf{«> l:A r„ e /l} , (2.55) with the convention that ta = q o if the chain never returns to A. A Markov chain is (p- irreducible for a probability distribution y > on E, if <p(A) >0 for a set A c E implies that Px {ta < 00} > 0 for all x e E . A chain is irreducible if it is q > -irreducible for some probability distribution q > . If a chain is < p -irreducible, then < p is called an irreducibility distribution for the chain. Irreducibility means that all interesting sets can be reached. If a chain is irreducible, it has many different irreducibility distributions. However, any irreducible chain has a maximal irreducibility distribution y/ in the sense that all other irreducibility distributions are absolutely continuous with respect to yr. (v is absolutely continuous with /r if v(A) = 0 for every A c £ for which /j(A) = 0.) Maximal irreducibility distributions are not unique but are equivalent, i.e., they have the same null sets. Recurrence is the property that all such sets will be reached infinitely often, at least from almost all starting points. An irreducible Markov chain with maximal irreducibility distribution y/ is recurrent if for any set . < 4 c £ with y/(A) > 0 the following conditions are both satisfied: Px {Xn+ l e A infinitely often } > 0 for all x, (2.56) Px {A r„+l e A infinitely often } = 1 for y/ -almost all x. (2.57) An irreducible recurrent chain is positive recurrent if it has an invariant probability distribution. Otherwise it is null recurrent. An irreducible chain with invariant distribution x is strongly aperiodic if there exists a probability distribution v o a £ , a constant J3> 0 , and a set C c £ such that v(C) > 0 and 30 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. P{.x, A) > fiv(A) for all x e C and alM c £ . Strong aperiodicity implies aperiodicity. If [Xn} is an irreducible, aperiodic Markov chain with transition kernel P and invariant distribution x, then ||pn(.t,-)-^(-)||->0 (2.58) for n -almost all x. I I • I I denotes total variation distance (for two probability distributions /u\ and ||/i| - ft\\ = 2 sup I f t - f t D - Equation (2.58) shows one mode of convergence to the Ac£ stationary distribution. 31 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. 3. Generic Markov Chain Monte Carlo Methods In this chapter, the Metropolis-Hastings algorithm is introduced, along with the historical background, which culminates in a presentation of the Independence Metropolis- Hastings (IMH) that was implemented and evaluated for Bayesian estimation in this study. 3.1 Motivation The MCMC methods are celebrated for their flexibility to be fine-tuned for a specific problem at hand and the ability to handle high dimensional problems. The method is especially useful in nonlinear applications where other traditional Bayesian approaches fail. The term generic MCMC method is used here to denote the MCMC algorithms which are not combined with other sampling methods. The Metropolis algorithm, the Metropolis- Hastings algorithm, and the IMH algorithm will be viewed in the following sections. In this study, the IMH algorithm was implemented and applied to solve the Bayesian estimation problems in several PK/PD applications. 3.2 IMH (Independence Metropolis-Hastings) 3.2.1 Constructing a Markov Chain To construct a Markov chain which has a stationary distribution, we make the transition probability satisfy the following reversibility condition: 7 T -Pij = KjPji for all i and;. (3.1) 32 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. This property guarantees that nP - P, for !LxiPij=lLnjPji =*jlLPji ~K j • ( 32) i i i Hence, 7t is the stationary distribution for P. This is a general way to ensure that the stationary distribution of a Markov chain exists. The Metropolis algorithm and the Metropolis-Hastings algorithm viewed in the following sections are practical implementing tools for applying the Markov chain in real-world problems. 3.2.1.1 Metropolis Algorithm The Metropolis algorithm (Metropolis et al. 1953) is a classic example of applying the idea of sampling from the stationary distribution of Markov chain. Their example considered only symmetrical proposals, having the distribution of the form p(J = for all i and j. They had to calculate the equilibrium value of F in a system of interacting molecules according to the equation \Fexp(-E/kT)d2Npd1Nq Jexp(~E/kT)d2Npd2Nq ’ where E is the potential energy of the system, k is Boltzmann’s constant, T is the absolute temperature of the surroundings, and d 2N pd2S q is a volume element in the 4N- dimensional phase space. Applying the Monte Carlo integration method to this problem replaces multi-dimensional integration over a regular array of points with simple integration over a random sampling of points. However, it was not practical for close-packed configurations where a configuration with very small value of exp(-£ / kT) could be chosen 33 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. with high probability. They resolved this problem by choosing configurations with a probability exp(-£ / kT) and weighting them evenly, instead of choosing configurations randomly and then weighting them with exp(-£ / kT). Selection of configurations with a probability exp(-E / kT) was done as follows: The Metropolis Algorithm (Original Form) 1. Move a particle to a new position. 2. Calculate the change in energy of the system, AE . 3. If AE < 0, allow the move and put the particle in new position. If A£ > 0, allow the move with probability exp(-A£ / kT). For the purpose of taking averages, they always considered the particle was in a new configuration after each iteration. Then, after many moves the ensemble tends to a distribution which is proportional to exp(-£ / kT). The acceptance probability of the Metropolis algorithm can be written as follows: where the equilibrium distribution is = exp(-£, I kT) in this case. This algorithm depends on k through ratios of the form Kj I Thus k needs to be known only up to a normalizing constant. (3.4) (3.5) 34 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. 3.2.1.2 Metropolis-Hastings Algorithm Later, Hastings (1970) expanded the Metropolis’ idea to a general configuration by introducing an auxiliary density. Define Pij = Q ij& ij ’ i * J , ( 3-6 > Pa = l -£/>«. ( 3 J ) j*‘ where Q = {^,y} is the transition matrix of some arbitrary Markov chain with states 0,1,..., S, and djj is given by a u = - 1 + X i h j i ' (3.8) In (3.8), Sjj is a symmetric function of i and j, which was chosen such that 0 S a,y < 1 for all i and j. It can be verified that (3.6) and (3.8) satisfy the reversibility condition (3.1). One choice for s,y is (Peskun 1973): sg = 1 + { * d 3 L > \ *j/*9 { * ifl (3.9) Then, s ij = 1+ “ilia K : it Uij = I H lh L > i I j (3.10) (3.11) (3.12) 35 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. This process can be simulated as follows: The Metropolis-Hastings Algorithm 1. Start with Xn = /, and select a state j using Q. 2. Take X^\ = j with probability a and X -i = i with probability 1 - a ,y. A special case of the MH algorithm is random-walk MH, for which q(j | i) = < y (| i - j |) . Often it is more convenient and computationally efficient to divide the variables into components of possibly differing dimensions, and then update these components one by one. This method is referred to as single-component Metropolis- Hastings. A special case of single-component Metropolis-Hastings is the Gibbs sampler (Geman and Geman 1984). Gibbs sampling deals with a collection of random variables Xu X2 ,..., Xk, for which it is known that the joint density, [Xu X2,...,Xk ], is uniquely determined by the full conditional densities [ Xs \ Xr, r * ■ s ], s = 1, 2,..., k . It is required that samples of X, can be generated straightforwardly and efficiently from [ X, | Xr, r * s ], given specified values of the conditioning variables, X„ r* s. Gibbs sampler candidates are always accepted and Gibbs sampling consists purely in sampling from full conditional distributions. If independence between X , and Xj exists, then q ij = q j and q# = q , (Tierney 1994). This can be accomplished by drawing samples independently of the current state. In this case, from (3.11) and (3.12), the Metropolis-Hastings acceptance probability of moving from state i to state j is a,y=l (3.13) (3.14) 36 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. 3.2.2 Application to Bayesian Estimation The Metropolis-Hastings algorithm presented above can be applied to the Bayesian estimation problem defined in (2.1) by relating the prior density to be the proposal density, and the posterior density to be the target density: rz = p(a | z ) , (3.15) q = p(a), (3.16) where a is the PK/PD parameter vector as defined in Section 1.2 and z is the measurement data. Then, k _ p (a | z) /(or|z) (317) 9 P(a ) I z)p(.a)da Thus in the case of independence chain, the acceptance probability of IMH sampler is given as: aij=l K(*i I 2) liCCj I 2) f / ( g y | 2 ) ^ i ‘ J Ka( \z) [/(a,! 2) _ (3.18) (3.19) where the double subscripted a denotes the MH acceptance probability and the single subscripted a denotes the PK/PD system parameter vector. Therefore, the normalizing constant doesn’t have to be known, and the acceptance probability is defined merely by the likelihood values associated with the two parameter vectors. The likelihood ratio serves as the acceptance probability for the candidate samples generated independently from the prior distribution. This likelihood is the importance weight function that would be used in importance sampling if observations were generated from the 37 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. density p(a). Candidate steps with low weights are rarely accepted. On the other hand, candidates with high weights are usually accepted, and the process will usually remain at these points for several steps, thus using repetition to build up weight on these points. If some points have very high weight values, then the process may get stuck at these points for a long time. Thus, as for importance sampling, it is useful to choose p to produce a weight function that is bounded, and as close to constant as possible. If the weight function is constant, then the Metropolis algorithm will never reject candidate steps, and the chain produces and i.i.d. (independent and identically distributed) samples from n. The best choice of the candidate density for an Independence Metropolis-Hastings kernel is still open to question. But, because of the close relation to importance sampling, it is reasonable to conjecture that multivariate t distributions with low degrees o f freedom, split-/ distributions and other distributions that have been found to be useful as importance sampling densities will be good choices in this context as well (Tierney 1994). 3.2.3 Implementation An algorithm implementing the IMH sampler was written in Fortran and incorporated as a part of the ADAPT software package (D’Argenio and Schumitzky 1997). Model equations can be formulated as described in Section 1.2. The prior distribution can be either defined as normal or lognormal. The random number generating routine used in the implementation combines two multiplicative congruendal generators with additional shuffling capability to increase the period and to prevent the possible serial correlation (Press 38 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. et al. 1992). Two separate generators were used in the algorithm below: one for random parameter deviates and the other for random number u. The IMH Algorithm 1. Select a 0 , initialize chain at a 0. 2. For/i = 1,..., do 3-5. 3. Generate a j from p(a) when currently at a , • = a „. 4. Generate u from t/(0,l), where C/(0,1) is uniform distribution between 0 and 1. K< *j 1z) 5. If u < — -----, then = a „ else a ^ i = a,. /(a, | z) > 3.3 Error Analysis, Mixing, and Convergence Test There are a number of practical implementation issues that need to be addressed when using MCMC methods, including issues of mixing of the chain over the target distribution and the convergence rate of the chain. Mixing refers to the ability of the sampler to move freely about the posterior region. Starting point, bum-in length, and total chain length affect the empirical posterior distribution. Methods for making inferences about posterior quantities need to be considered as well. In the following sections, two representative competing ideas regarding how to simulate the Markov chain are summarized, and the convergence tests relevant to each method are briefly overviewed (Brooks and Roberts 1998). 39 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. 3.3.1 Single Long Run Geyer (1992) supports the idea of running a single Markov chain sufficiently long and making all inferences from it. If the chain is irreducible, then the choice of starting values will not influence the stationary distribution. Expectations are estimated by averaging over the samples and also are standard errors of the expectations. The variance estimation is different, since it must take the dependence into account, but this is a well-studied problem with a huge literature. The standard errors can be calculated by appealing to the central limit theorem, which is always available if the chain is reversible and the asymptotic variance is finite. An integral 6 = Jfix)dP(x) ( 3.20) can be approximated by averaging the function over the chain using samples: n i= i In nice situations, 6n -* 6 almost surely (meaning, for almost all sample paths of the Monte Carlo simulation), and the central limit theorem V^(0„ - 6 ) — °^>N{0,a2) (3.22) gives the size of the Monte Carlo error. The variance is difficult to calculate theoretically, but it can be estimated from the Markov chain itself by standard time-senes methods. Rafterv and Lewis’ eibbsit: Raftery and Lewis (1992) proposed a convergence test that is applicable to single long chain and implemented it into a code named gibbsit It first forms a binary 0-1 process depending on whether or not the samples from a pilot run fall 40 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. within the bound of the quantile to be estimated. Then gibbsit constructs a new process consisting of every kxh iteration from the 0-1 sequence. This new process can be an approximate Markov chain for sufficiently large k, and its distribution is asymptotically normal. The bum-in period is determined using the analytic solution of two-state Markov chain by making the intermediate transition matrix fall within tolerance bound of its stationary distribution. To determine the total chain length necessary to obtain desired accuracy, gibbit confines the target quantile to be in the range specified by the user. In so doing, gibbsit uses the asymptotic normality of the binary sequence distribution. However, caution is required in interpreting the results to successfully apply the method. (Brooks and Roberts 1999) Heidelbereer and Welch’s convergence diagnostic: Heidelberger and Welch (1983) applied Brownian Bridges statistics to detect transient phases. They use the Cramer-von- Mises statistic to test the null hypothesis that the sampled values for each variable form a stationary process. They also control run length by calculating confidence intervals. Geweke’s spectral density diagnostic: The functional of a positive-recurrent Markov chain is special case of a stationary time series. Geweke diagnoses convergence by an equal location test o f spectral density for two subsequences. If the chain has converged to its stationary distribution by a certain time, then the equal location test should be successful (Geweke 1992). Yu and Mvkland’s CUSUM method: This method is a graphical one based upon the sampler output (Yu and Mykland 1998). Smooth plot indicates slow mixing, and hairy plot indicates fast mixing rate. Benchmark CUSUM based on a sequence of i.i.d. normal variates with mean and variance matched to estimated moments can be used to assess hairiness. 41 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. Yu’s Lx diagnostic: Suppose that a Markov chain has stationary density 7t(a) = c /r(a ), where n is known and c is the unknown normalization constant. Yu (1995) defines a kernel estimator, and an estimate of the normalization constant. Then he estimates the Ll distance over a particular subset between the kernel density estimate based on the whole chain and the one based on a finite run length. Garren and Smith’s convergence rate estimator: Garren and Smith (1995) tried to estimate the principle eigenvalues of the transition kernel in (3.23). Most MCMC algorithms converge to stationarity geometrically. In other words, there exists a measurable function V > 1, and a constant p < 1 such that ||/,'( ^ ) - * Q ||s K ( x ) /A (3.23) where P‘ denotes the /-step transition for the Markov chain. Furthermore, well-behaved Markov chains exhibit an eigen-expansion for their transition probabilities, i.e., P1 (x,•)-*(•) = a(x,A )4 + 0 ( 4 ), ( 3.24 ) where \X2\ < < 1. They suggest a heuristic approach to assessing convergence, given estimates for the constants in (3.23) and (3.24). 3.3.2 Multiple Sequences Gelman and Rubin’s PSRF (potential scale reduction factor): Gelman and Rubin (1992) proposed a multiple chain simulation method in order to overcome slow mixing, assess convergence, and provide a basis for inference involving estimates of posterior quantities. An estimate of the target distribution is created, centered about its mode (or 42 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. modes, which are typically found by an optimization algorithm) and overdispersed in the sense o f being more variable than the target distribution to start several independent sequences of the iterative simulation. Then, the multiple sequences are analyzed to form a distributional estimate of what is known about the target random variable, given the simulations thus far. Detailed procedure of the method is as follows. 1) Create a starting distribution. To do this, find the modes using either an optimization program or a statistical method such as EM (Dempster, Laird, and Rubin 1977). Then, obtain samples from an overdispersed distribution by first drawing from the normal mixture and then dividing each sample vector by a positive scalar random variable, an obvious choice being a Xq random deviate divided by rj (for example, if 7 = 1, we have a Cauchy mixture). Making this choice, the new distribution is then a mixture of multivariate t distributions. Sharpen the overdispersed approximation while keeping it overdispersed by downweighting regions that have relatively low density. One way of improving the approximation in this way is to use importance resampling (Rubin 1987). 2) Independently simulate m £ 2 sequences, each of length 2n, with starting points drawn from an overdispersed distribution. To diminish the effect o f the starting distribution, discard the first n iterations of each sequence, and focus attention on the last n. 3) For each scalar parameter of interest, calculate B / n = the variance between the m sequence means, X j , each based on n values ofX: (3.25) 43 permission of the copyright owner. Further reproduction prohibited without permission. where x ‘ ^ L x n <3- 26) and W= the average of the m within-sequence variances, sf: W = ± - f s f , (3.27) where <3- 2 8 ) 4) Estimate the target mean, 9 - 1 :c P(x) dx, by 0 , the sample mean of the mn simulated values of x. 5) Estimate the target variance, a 2 = i ( x - /u)2 ■ P(x)dx, by a weighted average of W and B, namely, a 2 = ^ - W + - B , (3.29) n n which overestimates <j2, assuming the starting distribution is appropriately overdispersed, but is unbiased for a 2 under stationarity (i.e., if the starting distribution equals the target distribution) or in the limit (n -> a>). Meanwhile, for any finite n, W should be less than a 2 because the individual sequences have not had time to range over all o f the target distribution and, as a result, will have less variability. In the limit as n -> oo, the expectation of W approaches a 2. 44 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. 6) Monitor convergence of the iterative simulation by estimating the factor by which the scale of the current distribution for X might be reduced if the simulations were continued in the limit n -> oo. This potential scale reduction factor (PSRF) is estimated by which declines to 1 as n->qo. 7) Once PSRF is near 1 for all scalar estimates of interest, summarize the target distribution by a set of simulations, rather than normal-theory approximations, in order to detect non-normal features of the target distribution. The simulated values from the last halves of the simulated sequences provide such draws. Approximate convergence is diagnosed when PSRF is less than 1.2 for all parameters, but for a final analysis of an expensive data set, a higher level of precision may be required. The accuracy of an answer can be assessed by calculating a confidence interval based on r-statistics centered at 6 , with scale + B/mn . The assumption of overdispersion of starting points can be confirmed by checking if the early intervals have conservative coverage over later distributions. Gelman and Rubin considered an example which involved fitting data from 17subjects. They ran 10 sequences using Gibbs sampler, each of which had 200 samples. Then they used the latter half of the sequences (i.e., a total of 10 x 100 = 1,000 samples) to calculate the posterior quantities. Brooks and Gelman (1996) proposed a refined version of the original method, which involved calculating the PSR iteratively (IPSRF: iterative potential scale reduction factor) to (3.30) 45 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. small sections of the sequence. The m sequences are divided into batches of length b. Then V(k), W(k), and R(k) are calculated based on the latter half of the observations of a sequence of length 2 kb, for k = l,...,n/b, for some suitably large n. If the sequence converges, the PSRF should approach to I as the sequence proceeds. This IPSRF is used to test the convergence of IMH chain in the results section. Later, Books and Gelman (1998) generalized the method to be used for multi parameter cases. With only a nominal increase in computational expense, each of the univariate measures is summarized in a single value, MPSRF (multivariate potential scale reduction factor). Ritter and Tanner’s weighting method: Ritter and Tanner (1992) tried to detect convergence of the full joint distribution by monitoring importance weights calculated from replications of a Gibbs sampler. Convergence is diagnosed by testing if the ratio o f the non- normalized stationary distribution to the Gibbs transition density is roughly constant. Zellner and Min’s Gibbs sampler difference convergence criterion: Instead of considering ratio as Ritter and Tanner did, Zellner and Min (1995) looked at difference. If the difference is small, then the estimated marginals have converged to the correct values. Liu. Liu and Rubin’s Ll convergence diagnostic: Liu, Liu and Rubin (1993) defined a global control variable to estimate the L2 distance between the r-step transition density and the stationary distribution. The control variable is calculated from m multiple replications of a Markov chain. Overall, a set of m(m-l) values are calculated, each using two independent parallel chains. Roberts’ I 2 convergence diagnostic: Roberts (1994) used a density ratio statistic (transition density for the chain to equilibrium distribution) to assess the I 2 convergence of 46 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. the distribution of the chain as a whole, using samples from several independent replications of a Markov chain with finite Hilbert-Schmidt norm. Using the ratio statistics, dependence term and interaction term are calculated. Upon convergence, both of the terms must reach stationarity and have similar location. This method is restricted to the analysis of the output from a reversible Gibbs sampler. Brooks and Roberts’ total variation diagnostic: Brooks and Roberts (1998) diagnosed convergence by obtaining an upper bound to the Z ,1 distance between full dimensional kernel estimates from different chains. This method first calculates a Rao- Blackwellized estimate of the density of blocks of the whole chain, then forms a mean between chain distance with estimates of V distance between densities. Rosenthal. Roberts and Tweedie’s geometric convergence bounds method: Even when a Markov chain satisfies (3.23), it is rare for useful bounds on V and p to be attainable in even moderately complicated statistical models. This method (Rosenthal 1995; Roberts and Tweedie 1998) gives a flexible approach for computing V and p. The method runs two chains independently, one of which is started from the stationary distribution, and the other from some initial distribution, and both with the r-step transition kernel ) . 47 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. 4. Regenerative Markov Chain Monte Carlo Methods In this chapter, two methods for producing regenerative Markov chain are considered. In the first method, the IMH algorithm described in the previous chapter is split into independent subchains according to Nummelin (1984). In the second method, a hybrid approach according to Tiemey (1994) is considered. We then use the regenerative analysis developed by Iglehart and Crane (Crane and Iglehart 1974a; Iglehart 1975; Iglehart 1976) for queueing system analysis to analyze the regenerative chains. The former is retrospective in defining the regeneration times (i.e., complete IMH chain is constructed, then i.i.d. subchains are defined), while the latter is prospective (i.e., i.i.d. subchains are defined as the posterior samples are drawn). 4.1 Motivation MCMC methods allow Bayesian estimation to be applied to high dimensional problems and to problems involving general distributions. However, assessment of mixing and convergence of generic MCMC methods to determine the optimal combination of the bum-in period and the run length poses a formidable task. In addition, the analysis of the simulated chain can be controversial. Even though the literature has dealt with this issue and has suggested some guidelines, none of the proposed methods is proved to be sufficiently general to be applied to the PK/PD modeling problems that are the subject o f this dissertation. To overcome the limitations of generic MCMC methods, the regenerative methods are considered in this section. The first is the Independence Metropolis-Hastings 48 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. with splitting (SIMH) and the second is the Rejection Independence Metropolis-Hastings (RIMH) algorithm. A process is regenerative if there is a sequence of random times at which the process starts over independently and identically. The tours of the process between these times are independent and identically distributed (i.i.d.). Methods of regenerative simulation analysis can then be applied to such i.i.d. chains to assess accuracy o f results, providing a built-in error analysis. In addition, the initialization problems do not arise and the run length determination is not necessary in hybrid sampling approach. In the following sections, how the SIMH algorithm identifies the regeneration times using the Lebesgue decomposition of measure is presented, and the algorithm is introduced. Then how the RIMH sampler regenerates itself by design is presented, followed by the description about the algorithm. The methods for analysis of the regenerative chain are explained in Section 4.4. 4.2 Splitting Method: SIMH (Split Independence Metropolis-Hastings) 4.2.1 Defining Regeneration Times of IMH Chain Let XH be an irreducible Markov chain on a state-space (£, £ ) with transition kernel P = P{x, dy) and invariant distribution /r. A set A e £ is a proper atom for the Markov chain if M.A) > 0 and P{x, •) = P(y, •) for all x, y e A. If a chain has a proper atom, then the times at which the chain enters the proper atom are regeneration times. Few chains contain proper atoms, but it is often possible to construct a related chain that does (Mykland et al. 1995). Suppose it is possible to find a function s(x) and a probability measure v(dy) such that 49 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. n(s) = \ s(x)7C(dx) > 0 ( 4.1) and P (i^ )> i(x )v (/1 ) (4 .2 ) for all x e £ and alM 6 £ . A pair (5, v) satisfying these conditions is called an atom for the transition kernel P. Atoms represent a generalization of proper atoms. If A is a proper atom, then (l ,,(*),/>(*,)) for some x e A is an atom. Condition (4.2) implies that we can write P(x,dy) = s(x)v(dy) + (l -s(x))Q(x,dy), (4.3 ) where Q is a transition kernel defined as ifs(x)<lj (4 4 ) = 1^0:) ifj(x ) = l . (4 .5 ) To generate Xn according to the above split scheme, generate a Bernoulli variable Sn with success probability s(x). Then generate from v if Sn = 1, otherwise generate X ^ from Q(x,■). However, sampling from the kernel Q may not be convenient. The following theorem accomplishes this task by generating the splitting variable Sn from their conditional distribution given the entire sequence {Xn,n = 0,1,...}. Theorem 4.1: Suppose that the pairs {X„, S„) are generated by choosing XQ from E according to an arbitrary initial distribution and for each n = 0, 1,..., generating first X^\ from P(X„, •) and then Sn as a Bernoulli variable with success probability 50 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. Then (Xn, Sn ) is a Markov chain, the times when Sn = 1 are regeneration times (with probability I that the tours are all finite), and the expected tour length is E[Ni[ = \l Ms), where /rfs) = \ s(x)n(dx). Proof: Refer to Nummelin (1984, pp. 61-62).D The conditional regeneration probability (4.6) may not always be easy to calculate. But in many examples the generation of variables from P may produce additional information that allows generation of the Bernoulli variables S„ even if r(x, y) is not explicitly available. One example is the Metropolis kernel. Suppose that the distribution it we wish to sample has a density, also denoted by it,, with respect to a measure /x, n{dx) = Mx)jx(dx). Hastings’ (1970) version of the Metropolis algorithm originally introduced by Metropolis et al. (1953) generates the next step X„n in a Markov chain from the current state Xn by first generating a candidate step Y from a transition kernel and X^i is set equal to Y. Otherwise, the candidate is rejected and X„-t is set equal to Xn. Finding an atom for a Metropolis kernel can be done by finding an atom for the subprobability transition kernel q(x,y)aHx,y)\ that is, by finding a pair (5 ', v ) such that Q(X„, dy) = q(X„, y)&dy). (4 .7 ) This candidate is accepted with probability a(Xn, Y) where n(y)g{y,x) it(x)q(x,y) (4 .8 ) q(x,y)a(x,y)ft(dy) >s'(x)v'(dy) . (4 .9 ) Because the Metropolis kernel P satisfies P{x, dy) > q{x,y)a{x,y)^dy), (4.10) 51 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. the pair (s', v') in (4.9) provides an atom of the kernel P. Proposition 4.1: Let (5, v) be an atom for the Metropolis chain and assume that v({x}) = 0 for all x e E. Then v has a density with respect to /a denoted by v(y), such that q{x, y)a(x, y) > s(x)v(y)] (4 .1 1 ) that is, (s, v) is an atom for q(x, y)a(x, y)/Ady). Proof: Let A e £ and x s E arbitrary, and set B = A - {*}. Because v{x} = 0, s(x)v(^) = s(x)v(B) < j q(x,y)a(x,y)M(dy). B < J q(x, y)a(x, y)M{dy) .□ (4 .1 2 ) A Thus the splitting variable S„ of Theorem 4.1 can be generated by allowing a split to occur only when a candidate step is accepted. Theorem 4.2: Suppose that the Metropolis chain satisfies (4.9), and suppose that A V * i and Sn are generated as follows: 1. Draw A 'b + i conditionally on X„ by taking candidate steps from q(X„, y)f4dy) and accepting or rejecting according to a(X„, y). 2. If the candidate in step 1 is rejected, then set S„ = 0. Otherwise, generate Sn as a Bernoulli random variable with success probability rA (Xn, X„ri), where . , s'(x)v'(rfv) , . , , , rA x,y) = — v ' ' . (4 .1 3 ) q(x,y)a(x,y) Then the process (Xn , S„) has the same distribution as the process described in Theorem 4.1. 52 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. Proof: Let P{x,dy) denote the Metropolis kernel, and let An+ l be the event that the candidate for X n+ l is accepted. The conditional probability of An+i, given X„ = x and X„+ l = y , is given by P(An + l \ X n = x,X n+ l= y) = q(x,y)a(x,y)/i(dy)l P(x,dy). Thus the conditional probability that Sn = I , given X n = x and X n+ l- y , is P(Sn = l\X n = x,X n+ l=y) = P(.{Sn = \}n A n+l\X n = x ,X n + x =y) = P{Sn = l\X n = x,X n+r y , A n+0-P(An+l\X n = x,X n^ y ) = rA(x,y)q(X'y M x 'y M d y ) A K P(x,dy) s\x)v'{dy) q(x,y)a(x,y)n{dy) q(x,y)a(x,y)n(dy) P(x,dy) Let (s,,, v?) be an atom for the kernel Q, provided that there exits a positive function h such that for all x and y. This condition is formally similar to a reversibility condition, but h is not required to be a probability density. Under (4.15), the candidate acceptance probability becomes A x y j d y ) P(x,dy) (4.14) h(x)q(x,y) = h(y)q(y,x) (4.15) (4 .1 6 ) where w(x) = 7dx) / h(x). 53 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. Theorem 4.3: Suppose that a Metropolis chain satisfies (4.15), and let (sq, vf) be an atom for Q. For any csplit > 0, set s\x) = sq[x)mm{csplit / vv(.r), 1 } and v'(dy) = v?(c(y)min{w(y) / csplit, 1}. Then (4.9) holds. Proof: If is sufficient to show that mm{csplit / w(jt), 1 }min{w(y) / csplit, I } < min (w(y) / w(x), 1} for all x and y and for any csplit > 0. To see this, note that if csplit / w(x) < 1, then H'fjO csplit 1 w (.r) ’ w(x) J (4.17) ( 4 .i8 ) whereas csplit / w(x) > I implies . u n |- i ^ , l l = m in| [ wf-t) J [csplit ) [csplit and min - ^ . . l U m i n (4.19) The v' given in Theorem 4.3 is not a probability measure, but this does not matter in (4.9), as v'(E) can be absorbed into s'. The tour length distribution for this atom depends on the choice of the constant csplit. The product s'(x)v'(dy) = sq [x)vq (dy) min {csplit / vv(.t),l} mm {w (_v) / csplit,\} ( 4.20 ) will be small if csplit is far above or below typical values of w(x). 54 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. The most important case where condition (4.15) holds is an independence Metropolis chain. For an independence chain, candidates are generated from a fixed density / i regardless of the current state of the chain; thus q(x,y) = f ( y ) . Equation (4.15) holds for any h proportional to/. To apply Theorem 4.3, we choose sq (x) = I and vq(dy) = f(y)u (d y ). Independence chains are useful primarily as components in hybrid chains. For an independence chain with the split of Theorems 4.2 and 4.3, the distribution v' has density proportional to / ( y ) min{w'0')/ csplit,\}. This can be sampled by rejection sampling to obtain an initial value X0 for the chain corresponding to a regeneration if necessary. The conditional probability (4.13) of a regeneration at step n, given X„ = x , X n+ l = y and no rejection, simplifies to rA (x >y) = maxi— 7 -7 , j v v( jc) > csplit and w<y) > csplit, I w(x) w(y) J = max] ^ 1 if w(.r) < csplit and w{y) < csplit. [csplit csplit J = 1 otherwise. (4.21) Thus a regeneration is certain to have occurred if w(x) and w(y) are on opposite sides of csplit. 55 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. 4.2.2 Application to Bayesian Estimation The SIMH algorithm presented above can be applied to the Bayesian estimation problem defined in (2.1) by relating the prior density and the posterior density as follows: jz = p(a | z), (4.22) h= -----£<£)------- f (4.23) fl(a | z)p(a)da where a is the PK/PD parameter vector as defined in Section 1.2 and z is the measurement data. Then, K p{a\z)\l(ct\z)p{a)da w = — = ---------- = l{a | z ) . (4.24) h p(a) Thus, the regeneration probability becomes rA ) = m a x f e ^ - , .— " I if Kai)>csplit and l(aM) > csplit, [l(a,) /(«,+!)J = max | !(& /), Ka ,+i) 1 /(a .) <C S pHt and l(aM )<csplit, [csplit csplit J = 1 otherwise. ( 4.25 ) where O j is the current sample, a*.| is the next sample, and a,- * a I+I. 4.2.3 Implementation An algorithm implementing the SIMH sampler was written in Fortran and incorporated as a part of the ADAPT software package (D’Argenio and Schumitzky 1997). Model equations can be formulated as described in Section 1.2. The prior distribution can be 56 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. either defined as normal or lognormal. The random number generating routine used in the implementation combines two multiplicative congruential generators with additional shuffling capability to increase the period and to prevent the possible serial correlation (Press et al. 1992). Different generator from the one used in constructing IMH chain was used in step 4 of the algorithm below. The SIMH Algorithm 1. Generate an IMH chain a x ,...,an . 2. Select a constant csplit > 0. 3. £ = 1; for / = . 4. While a, * a M , generate u from U(0,1). 5. Consider the following three possible cases: { csplit csplit 1 77— T T ’T ? ------T T f’ /(af |z) l{ctM |z)J then accept Nk = i as the next regeneration step. ii) if /(a.-1 z) < csplit & /(a.+11 z) < csplit and u < maxi ^ 1' ■ - , ^ '*1 ,' -- [ csplit csplit then accept A fk =i as the next regeneration step. iii) else, accept Nk =i as the next regeneration step. 6. ax ,a„i+i ,... is a split IMH chain consisting o f i.i.d. subchains. 57 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. 4.3 Hybrid Algorithm: RIMH (Rejection Independence Metropolis Hastings) 4.3.1 Introducing Regeneration Into IMH Tierney (1994) proposed considering regenerative Markov chains to overcome the limitations of the generic MCMC methods. One easy way to introduce regeneration into an MCMC sampler is to use a hybrid algorithm in which the Independence Metropolis-Hastings sampler is combined in a mixture with an independence kernel where the proposal density is sampled by a rejection sampling. This hybrid algorithm will be referred to as the Rejection Independence Metropolis-Hastings (RIMH) sampler. To introduce regeneration into a Markov chain, fix a particular state and consider the times at which the chain returns to this state. Using the notations of the previous section, define a set C such that I (a \ z) < c , where 0 < c 5 l(a* \ z) and a* = arg{max/(a 1 z)}, the maximum likelihood estimate of a. Then, with a , • as the current sample and a , as a candidate, form an acceptance probability as follows (Chib and Greenberg 1995): Thus, whenever the current state of the chain is in C, any candidate is accepted, and the next state is generated as a draw from the target distribution, no matter what the particular state within C is. The algorithm will occasionally reject candidate steps when the chain is at a point a e C. This repeats the point within the sample path and thus compensates for the 1, for or , € C, (4 .2 6 ) , for or, € C a. e C, Ka,\zy (4 .2 7 ) (4.28) 58 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. deficiency in the envelope at a. Dependence is introduced to make up for this deficiency and ensure that the target distribution is the invariant distribution of the Markov chain. The visits to the set C form a renewal process that can be used for a regenerative analysis. Depending on the choice of c, the candidate samples are less frequently rejected than with the pure rejection sampling. Therefore a certain number of posterior samples can be obtained in shorter time. The constant c represents the contribution of the rejection sampler with respect to the EMH sampler. If c is large enough, the hybrid sampler will degenerate to a rejection sampling algorithm. 4.3.2 Application to Bayesian Estimation Since the RIMH algorithm includes the IMH sampler as its component, the discussion concerning the IMH in Section 3.2.2 is relevant to the RIMH sampler. There are two sampling stages in RIMH: the first one is the modified rejection sampling stage (i.e., step 5 in the algorithm), and the second is the three-way acceptance test which includes IMH likelihood ratio test (steps 6 and 7 in the algorithm). In the first stage, the sampler filters the prior samples, letting only those samples that possibly belong to the posterior enter the three- way acceptance test. Then, decision on acceptance is made with considering the location of the current sample with respect to the regeneration set C. 4.3.3 Implementation An algorithm implementing the RIMH sampler was written in Fortran and incorporated as a part of the ADAPT software package (D’Argenio and Schumitzky 1997). 59 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. Model equations can be formulated as described in Section 1.2. The prior distribution can be either defined as normal or lognormal. The random number generating routine used in the implementation combines two multiplicative congruential generators with additional shuffling capability to increase the period and to prevent the possible serial correlation (Press et al. 1992). Three separate generators were used for the random parameter deviates, and the random numbers ul and m 2 in the algorithm below. The RIMH Algorithm 1. Find a* = arg{m ax/(a|z)}. 2. Select a constant c such that 0 < c < l(a* j z) and let C represent the set of a such that /(a | z) < c . 3. Select a 0 in C as the initial value. 4. For n = I,..., N, repeat 5-7. /(a , | z) 5. Generate a j fromp(a) and m 1 from C/(0,1) until m ! < — ----- , when currently at a , = c a n. 6. For a , e C, then a = a j. 7. Generate m 2 from £7(0,1). i) For a , « C and a , e C, Q if m 2 < -----------, then a„+i = cit, else a n+t = a ,. 1 2) ii) For a , € C and ary € C, l{CLj \z) if m 2 < / , then c H = a jt else a ^ , = a ;. l(.a> I z) 60 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. 8. ax ,ar„i+ 1 ,...,ar„,is an RIMH chain consisting of i.i.d. subchains. 4.4 Methods for Analysis of Regenerative Processes Regenerative analysis (Crane and Iglehart 1974a; Iglehart 1975; Iglehart 1976) can be used for analysis of the regenerative chains to estimate posterior quantities and to calculate the asymptotic confidence intervals associated with them. In the following two subsections, methods for estimating the posterior moments and quantiles are summarized. 4.4.1 Moment Estimation A regenerative process {X„:n> 0} with state space Rp , is a stochastic process that has increasing sequence of regeneration points A'i, A'i,..., that are random stopping times at which the process {X„: n > 0} starts from scratch according to the same probabilistic structure governing it at time Thus, between any two consecutive regeneration times Nk and A^i, the portion {X„: Nk <rt< A ^ ,} of the regenerative process is an i.i.d. replica of the portion between any other two consecutive regeneration times. Let vk - N k - N k_[ (4.29) be the length of each tour, and let f : R p -+R1 (4.30) be a measurable function. Then (4.31) 61 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. is the total observation on th* fcth observed tour. The definition o f regenerative process ensures that the sequence {(n, F * (/)): k > 1} consists of i.i.d. random vectors. The measurable function/ can be defined depending on the quantity to be estimated (for example, / (x) = .t estimates x ). Define the quantity This classical estimator is strongly consistent, that is, it converges to 9(f) with probability 1 as n -> oo. To obtain an asymptotic confidence interval, set The k th cycle of the process {A'(n): n > 0} determines the quantity Zk ( / ) . Furthermore, (4.35) implies that Then according to the central limit theorem for i.i.d. random variables with zero mean and finite variance, (4.32) (4.33 ) Then, E [/(A ) ] can be estimated by the classical ratio estimator: dc(n)= = . v (4.34) Zi (f) = Yi (f)-9 (f)vk (4.35) £ [ * * ( / ) ] = o . (4.36) (4.37) 62 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. where < x2( / ) S var[Z ,(/)]= var[rl ( / ) - « ( / ) v 1] = - ^ - t ( n - ^ h ) 2. ” “ 1 i=l The above equation can be equivalently written as JH{ec(n) -6 (f)} as 3 Then it follows that the interval i , ( . ) - ^ 3 L A w + i S ^ t ] ( 4 , 9 ) v(n)Vn v(n)vn is an asymptotic 100(1 - 2y) % confidence interval for 9(f) by the classical estimator. The half-length of the confidence interval is 1 Hn times a multiple Z\.r of the estimate of the standard deviation constant a(f) / E [ vj]. Therefore as n increases, the length of the confidence interval converges to 0, and the midpoint converges to the true value. For 95% (y = 0.025) confidence intervals, zt.r is 1.96. Several alternative estimators that attempt to reduce the bias have been proposed for 9(f). One of them is the jackknife estimator. Define 0 , = - ^ , (4.40) ' v,(n) (4.41) M * ‘ w And set • <442) n - l n - I jai 63 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. which is sample average of the data set deleting the ith tour. Then the jackknife estimator for a(J) is which is the average of deleted averages. The jackknife estimator for standard deviation for sample is is the asymptotic 100(1 - 2y) % confidence interval for the jackknife estimator. The posterior standard deviation for f (X) can be estimated using the standard formula: s2(f) = E[f2(X)]-{E[f(X)]}2. This requires first estimating E[f(X)] and E[f2 (A )], and then combining the two answers. However, the method for combining the two confidence intervals for £ [/(A )] and £ [ / 2(JQ] to obtain the confidence interval for o(/) in Crane and Iglehart (1974a) does not comply with the regenerative analysis. In other words, the actual coverage does not attain the nominal coverage. The jackknife estimator is a reliable method especially when the run length is not long enough, but requires more computing time than the classical estimator. On the other hand, the classical estimator can produce as good result as the jackknife estimator with less computation when the run length is reasonable. Therefore, the classical estimator is chosen (4.43) (4.44) Then, s jack ( n ) z \-Y (4.45) 64 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. as default estimator. Other ratio estimators that can be employed in regenerative analysis such as Fieller, Beale, and Tin estimators were also tested and the results were compared with those from the classical estimator. All of them gave similar results when the run length is adequately long, therefore only the classical estimator results are presented in the results section. 4.4.2 Quantile Estimation Two methods of estimating quantiles are summarized in the following sections: empirical distribution method and discretization method. The former method is computationally expensive, but it is more robust and general. The latter method, originally devised for the discrete state space cases with the purpose of reducing the computational burden, relies on interpolation of distribution function, therefore the state space has to be discretized. 4.4.2.1 Empirical Distribution Method The sample quantile is defined as 2 „(^ ) = inf{x:F„(x)>p}; 0 < p < 1, (4.46) where F„ (x) is the empirical distribution of the process based on n tours. For the discrete time regenerative process with continuous state space as in PK/PD application, the sample quantile Qn(p) is (Nnp)-lh order statistic if {Nnp) is integer, ([Ar„p]+ l)-st order statistic if {Nnp) is not integer, where [x] is the greatest integer less than or equal to x. This method 65 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. of quantile estimation requires sorting the whole samples, which is computationally expensive. 4.4.2.2 Discretization Method In an effort to avoid the sorting process, Iglehart (1976) suggested quantile estimation method which uses discretization of the state space and linear interpolation of the distribution function. If / = {l,..., N + 1 } is the discretized state space, define f x = 1 (-« J x ]] + ('t - W ) ’ 1 ([jtH atl+l)’ ( 4-47 ^ YkM = J f x(X (s))ds, k>2, (4.48 ) vk- ! Zk (JC ) = Y k (JC ) - F{x)vk ,k>2, ( 4.49 ) F(x) = E[fx(X)]. (4.50) Using linear interpolation, Zx (x) for \ <x<N + \ can be written as z x(x)= z,([x ])+ M ^ h 5 M ( x _ W ) [x] + l-[x] = (l + W - x)Zi (M) + (r - (xjjz, ([x] +1). (4.51 ) Thus, < 7 2(x)s<72{zt(x)} = (1 + [X] -x )V ([x ]) + (x- [X ])2 a 2 ([X ] + 1 ) + 2(x - [x]Xl + [x] - x)£{Z, ([x])Z[ ([x] +1)}. From (4.35), (4.52) 66 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. cr2([x]) = (72 {r, ([x])} - 2F([x])cover, ([x]), v,) + F 2 ([x])<r2 (v,),(4.53 ) and similarly for a 2 ([x] +1). Then, according to the central limit theorem, the 100(1 - 2y) % confidence interval for Q(p) is f(n)z, Qn(p)~ r - ’Qn(P) + - (4.54) ~{n)f4n " v(n)f4n_ where /(« ) s t'u^cpj+i)! vn > which is the slope of F„ at the point Qn (p). This method was originally developed for discrete state space cases, therefore works better with those problems. This method requires less computation than the empirical distribution method, but the result is not as good, which will be thoroughly investigated in the results section. Furthermore, the performance of the estimator depends on the particular strategy employed. However, no information on the distribution function is available to determine how to discretize the state space until the simulation is done. 67 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. S. Applications The IMH, SIMH, and RIMH algorithms introduced in the previous chapters were implemented as part of a general purpose Bayesian estimation program (BAYESID) in the ADAPT package. To run the program, the distribution model for the prior and the seed for random number generators have to be provided for all the three algorithms. For IMH, the user has to input burn-in length and the total number of posterior samples to generate. For SIMH at the end of an IMH run, the user has to input the tuning parameter split to split the IMH chain. RIMH consists of two components. For the preliminary RIMH, the user has to input the number of prior samples to generate in order to obtain an estimate of the likelihood distribution. For the main RIMH, the user has to input the tuning parameter c to define the regeneration set C along with the number of subchains to obtain. The three algorithms have been applied to several PK/PD model estimation problems. Four of these applications are described below and have been used to evaluate the performance of the methods and investigate crucial issues involved with implementing the algorithms and applying the methods in real-world problems. The four estimation problems are arranged in the ascending order of the complexity in the following sections. 5.1 PK. Model of High-Dose Busulfan Busulfan can be administered in high doses along with cyclophosphamide to destroy a patient’s existing bone marrow in preparation for bone marrow transplantation. However, the pharmacokinetics of the drug is highly variable and this variability can contribute to 68 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. uncertainty in drug administration regimen. Low drug levels result in insufficient myeloablation in the host to permit engraftment. High drug levels can cause severe toxicity such as veno-occlusive disease of the liver. To achieve a targeted drug concentration for each patient, individualized dose adjustment based on measured drug concentration is necessary. Therefore, it is important to accurately define the pharmacokinetics of the drug in an individual for safe and effective clinical administration. Bayesian estimation methods are especially useful in this context because patient data can be incorporated into the estimation process along with prior information on the drug’s distribution. For purposes of myeloablation, the drug can be given as a parenteral dose, 4 times a day for 4 days (Rodman 1998). The model in Figure 5.1 is a simplified one-compartment linear model with two PK parameters, Ke and V, that describe the plasma concentration of the drug. The model input is via intravenous (IV) infusion and the output is the drug concentration. The prior statistics on the PK parameters were obtained from clinical study of 20 patients, ages from 0.1 to 20 years (Table 5.1). The prior distribution of the parameters was modeled as lognormal. The measurement error was assumed to be normally distributed, with a standard deviation linearly related to the output: <r(r) = 0.1 + 0.15y(r). (5 .1 ) Simulated data were generated from subjects randomly selected from the prior population distribution with three one-hour IV infusion of 38 mg/m2 at t = 0,6, and 12 hours. Figure 5.2 shows the plot of population simulation results. Data from simulated individuals were analyzed using the three estimation algorithms. In the results presented in the next chapter, the 10th, 50th, and 90th percentile subjects (denoted by BS10, BS50, and BS90) based on the concentration at t = 18 hours were selected for analysis. Table 5.2 shows 69 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. the parameter values associated with the three subjects. Noisy data sets were generated, which are given in Table 5.3. The data used were 5 blood concentrations at t = 1.5, 2,3,4, and 6 hours, all of which were from the first dose. In Bayesian estimation of the Busulfan example, the drug concentration at t = 18 hours is predicted from the estimates of the system parameters. r(t) Figure 5.1 Busulfan PK model with 2 parameters. r(t): Bulsulfan infusion rate, V: distribution volume, Ke: elimination time constant. Mean Covariance Ke V Ke 0.299 (/hr) 0.08522 V 29.19 0) -0.881 18.282 Table 5.1 Prior statistics of Busulfan PK parameters. 70 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. 5 15 20 0 10 Time (hours) Figure 5.2 Population simulation of Busulfan model. Concentration curve was plotted with prior parameter means, and crossed points are population mean ± standard deviation. Prm BS10 BS50 BS90 Ke 0.350 0.287 0.350 V 31.3 25.8 9.61 Table 5.2 Parameter values for the 3 selected subjects from Busulfan population simulation. BS10 BS50 BS90 Time Exact Noisy Exact Noisy Exact Noisy 1.5 0.860 0.637 1.11 1.37 2.80 2.61 2 0.772 0.739 0.960 0.991 2.35 1.67 3 0.509 0.805 0.720 0.973 1.66 1.28 4 0.358 0.186 0.540 0.317 1.17 1.38 6 0.178 0.307 0.304 0.000 0.579 0.171 18 0.202 -1.0 0.368 -1.0 0.659 -1.0 Table 5.3 Exact and noisy data for the 3 selected subjects from Busulfan population simulation. The noisy data set was used for the Bayesian estimation. -1.0: missing data. 71 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. 5.2 PK Model of Teniposide Teniposide is used in treatment of children with acute lymphocytic leukemia, lymphomas or neuroblastoma. Clinical studies have shown that the oncolytic response to teniposide was correlated with the area under the blood concentration-time curve (AUC) of the drug, and a relationship between toxicity (i.e., mucositis) and AUC was also observed. In addition, substantial interpatient pharmacokinetic variability was noted, with systemic clearance varying over a fivefold range. Therefore, maintaining the AUC within a target range based on precise description of the PK parameters is necessary for safe and effective drug administration. Figure 5.3 shows the 4-parameter Teniposide PK model. Prior statistics of the PK parameters are modeled as lognormal based on published population study of children with cancer (Rodman et al. 1987) and are given in Table 5.4. The measurement error was assumed to be normally distributed with a standard deviation linearly related to the output: c t(0 = 1.0 + 0. ly (/). (5 .2 ) r(t) Kcp Kpc CL Figure 5.3 Teniposide PK model with 4 parameters. r(t): Teniposide infusion rate, Vc: distribution volume, CL: systemic clearance, Kcp and Kpc: elimination rate constants. 72 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. Mean Covariance Kpc Kcp V CL Kpc 0.189 (/hr) 0.0112 Kcp 0.286 (/hr) 0.0024 0.0081 V 2.04 (Urn2 ) -0.0247 -0.0116 0.163 CL 0.652 (L/m2 /hr) -0.00030 -0.136 0.0697 0.0778 Table 5.4 Prior statistics of Teniposide PK parameters. The following set of equations describes the PK system of Teniposide example: dt t CL kC D +---- C P is \ c j *i(0 + *«k*2M + K 0 (5 .3 ) ^ ^ ■ = k 9 x l{ l)~ k pcx 1ff) (5 .4 ) (5 .5 ) at y x(t) = xx{t)!Vc (5 .6 ) y 2(0 = x3(0 (5 .7 ) where >»i(0 is the drug concentration andy2 (0 is the AUC between 0 and t (AUC(0, t)). Simulated data were generated from subjects selected from the prior population distribution with four-hour IV infusion of 60 mg/m: at t = 0 hour. Figures 5.4 and 5.5 show the plots of population simulation results. Data from simulated individuals were analyzed using the three estimation algorithms. In the results presented in the next chapter, the 10th, 50th, and 90th percentile subjects (denoted by TN10, TN50, and TN90) based on the systemic exposure at t = 24 hours (i.e., AUC(0,24)) were selected. Table 5.6 shows the parameter values associated with the three subjects. Noisy data sets were generated, which 73 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. are given in Table 5.6. The data used were 4 blood concentrations at t = 1.5,3.5, 8, and 24 hours. In Bayesian estimation of the Teniposide example, the AUC at t = 24 hours is predicted from the estimates of the system parameters. 75-i 50- e o S c 8 o o 25- 30 0 10 20 Time (hours) Figure 5.4 Population simulation of Teniposide model. Concentration curve was plotted with prior parameter means, and crossed points are population mean ± standard deviation. 1000-1 E 750- w JZ i O> a. 500- o < 250- 20 30 0 10 Time (hours) Figure 5.5 Population simulation of Teniposide model. AUC(0, t) curve was plotted with prior parameter means, and crossed points are population mean ± standard deviation. 74 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. Prm TN10 TN50 TN90 Kpc 0.106 0.0908 0.192 Kcp 0.143 0.372 0.644 V 2.29 2.38 1.45 CL 1.02 0.539 0.235 Table 5.5 Parameter values for the 3 selected subjects from Teniposide population simulation. TN10 TN50t TN90 Time Exact Noisy Exact Noisy Exact Noisy 1.5 26.2 22.7 25.3 28.8 37.5 35.8 3.5 39.8 40.3 39.2 39.8 57.7 47.5 8 6.72 9.53 10.3 12.7 24.1 20.4 24 1.07 0 4.82 2.99 14.0 15.9 Table 5.6 Exact and noisy data for the 3 selected subjects from Teniposide population simulation. The noisy data set was used for the Bayesian estimation. 5.3 PK/PD Model o f Terbutaline Terbutaline is antiasthmatic agent, which can be administered orally or subcutaneously for the treatment of reversible airway obstruction. Terbutaline produces a dynamic effect by affecting FEVj (forced expiratory volume in 1 second). The delayed response of FEV, is characterized by the delay of the drug plasma concentration. An effect compartment model is used to describe the equilibrium between the two processes. A sigmoid response function is used to relate the FEVj to the effect compartment. Figure 5.6 shows the 9-parameter PK/PD model of Terbutaline. To define the prior distribution, the PK/PD parameters were assumed to be normally distributed in log scale. Since the log transform of the normal distribution is lognormal distribution, the prior distribution of the PK/PD parameters was consequently modeled as lognormal. The parameter values were rescaled by multiplying 100 or 1,000 in order to prevent the 75 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. covariance matrix possibly being non-positive-definite. The mean vector and covariance matrix were obtained using the standard 2-stage method from 12 individual parameter estimates (Oosterhuis et al. 1986), and they are shown in Table 5.7. The measurement error was assumed to be normally distributed with a standard deviation linearly related to the output: <7(0 = 0.1y(r). (5 .8 ) b(t) Kcp Kpc Kel Keo Vp Em ax E EC50 EO Figure 5.6 Terbutaline PK model with 9 parameters. b(t): Terbutaline dose, Vc and Vp: distribution volume, Ka: absorption rate constant, Kel, Kcp, and Kpc: elimination rate constants, Keo: rate constant between central and effect compartments, Emax, EC50, and EO: PD parameters. 76 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. Reproduced w ith permission o f th e copyright owner. Further reproduction prohibited without perm ission. Mean Covariance LnlOO LnlOOO LnlOOO LnlOOO LnlOO LnlOO LnlOOO LnlOO LnlOO Ka Kcp Kpc Kel V Keo EC50 Emax EO LnlOOKa 5.79 (hr-1) 0.353 LnlOOOKcp 5.89 (hr-1) -0.342 0.616 LnlOOOKpc 6.20 (hr-1) -0.0731 0.0970 0.116 LnlOOOKel 6.02 (hr-1) -0.115 0.147 -0.0310 0.143 LnlOOV 8.43 (L) 0.0337 -0.121 0.0510 -0.133 0.192 LnlOOKeo 6.18 (hr-1) 0.0605 -0.214 0.0517 -0.00651 0.0606 0.300 Lnl000EC50 7.39 (ng/ml) -0.113 0.298 -0.0330 -0.0319 -0.0921 -0.485 0.966 LnlOOEmax 5.88 (L) 0.0248 -0.00397 -0.00383 -0.0728 0.0843 -0.0849 0.156 0.133 LnlOOEO 5.18 (L) 0.0716 -0.0498 -0.0689 -0.119 0.107 -0.203 0.298 0.143 0.315 Table 5.7 Prior statistics of Terbutaline PK/PD parameters. -j — j The following set of equations describes the PK/PD system of Terbutaline example: = ~ ( k el + k cp )*I (0 + k a x l (0 + k pcx l (0 ( 5 9 ) dt dx2 (0 dt = ~kax 2(0 (5.10) dx- i— = kcpX \ (t) - k pcx 2 (/) (5.11) dt dx 4(0 ^ / 5 W - x4« (5.12) y(t) = E 0 + (Emax E0) x4(t) (5.13) * £C50 + x4(/) 4 where x{ (/). x2(t), and jc3 (/) represent the amount of drug in compartments 1,2, and 3, respectively, x4(t) is the concentration of drug in the effect site, and y(t) is the FEV,. Simulated data were generated from subjects selected from the prior population distribution with three bolus doses of 0.5 mg at t = 0, 24, and 48 hours. Figure 5.7 shows the plot of population simulation results. Data from simulated individuals were analyzed using the three estimation algorithms. In the results presented in the next chapter, the 10th, 50th, and 90th percentile subjects (denoted by TB10, TB50, and TB90) based on the FEV, at t = 56 hours were selected. Table 5.8 shows the parameter values associated with the three subjects. Noisy data sets were generated, which are given in Table 5.9. The data used were six FEV, measurements at t = 1, 8,23,25,32, and 47 hours. In Bayesian estimation of the Terbutaline example, the FEV, at t = 56 hours is predicted based on the estimates of the system parameters because it is the target time for the feedback control in D’Argenio and Park (1997). 78 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. 0-|--------- 1 --------- 1 --------- 1 --------- 1 ----------1 ----------1 0 10 20 30 40 50 60 Time (hours) Figure 5.7 Population simulation of Terbutaline model. FEV i curve was plotted with prior parameter means, and crossed points are population mean ± standard deviation. Prm TBIO TB50 TB90 LnlOOKa 5.78 5.02 5.60 LnlOOOKcp 6.82 6.26 5.16 LnlOOOKpc 6.29 6.44 6.14 LnlOOOKel 6.51 6.26 5.62 LnlOOV 7.66 8.68 9.30 LnlOOKeo 6.09 6.62 6.37 Lnl000EC50 7.50 6.40 7.06 LnlOOEmax 5.57 5.92 6.20 LnlOOEO 4.41 4.81 5.87 Table 5.8 Parameter values for the 3 selected subjects from Terbutaline population simulation. TB10 TB50 TB90 Time Exact Noisy Exact Noisy Exact Noisy 1 2.36 2.13 3.39 2.98 4.54 4.38 8 1.53 1.55 2.43 2.66 4.05 3.44 23 0.888 1.04 1.30 1.32 3.61 3.22 25 2.36 2.09 3.40 3.81 4.55 4.91 32 1.54 1.69 2.43 2.1 4.05 3.17 47 0.889 1.06 1.30 1.02 3.61 3.86 56 1.54 -1.0 2.43 -1.0 4.05 -1.0 Table 5.9 Exact and noisy data for the 3 selected subjects from Terbutaline population simulation. The noisy data set was used for the Bayesian estimation. -1.0: missing data. 79 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. 5.4 PK/PD Model of Warfarin Warfarin is one of the most widely used oral anticoagulants. Warfarin is administered orally and acts as a depressant of Factor VII and Factor II (prothrombin) in the blood clotting cascade (Figure 5.8). Warfarin is a racemic mixture of two stereo-isomers (levorotatory or S-enantiomorph and dextrorotatory or R-enantiomorph). In a population study of Warfarin in 10 healthy individuals, the PK/PD model shown in Figure 5.9 was developed (Forrest 1998). The PK model has two identical two-compartment structures for S and R, but with different kinetic parameters. The PD model used assumes both S and R enantiomorphs have the same potency. It is an indirect effect pharmacodynamic model where the zero order production of Factor VII and Factor II is decreased as a function of S and R Warfarin. The prior distribution of the parameters was modeled as lognormal. The population statistics on PK/PD parameters are given in Tables 5.10 and 5.12. The Warfarin model has a total of 18 PK/PD system parameters and 2 initial conditions to estimate. The measurement error was assumed to be normally distributed with a standard deviation linearly related to the output: a(/) = 0.15 + 0.1y(r). (5.14) The following set of equations describes the PK/PD system of Warfarin example: dx\(0 dt { CLtR C Ld\ Vc Vc ) + xx (0 + kax 2 (0 + ~ ^ - x 2 (0 Vp (5.15) (5.16) 80 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. dxM ) CLd /x CLd .. - “ r 2 = -ET“ * i( 0 - - r ^ * 3 ( 0 Fc <&4(0 _ dt (CLtS CLd' + dxs(t)_ { Vc = ~kax 5(r) Fc * 4 ( 0 + * a *5 ( 0 + * 6 ( 0 Vp dt dx6(t) CLd dt Vc , x CLd , x *4( 0 — r —*6(0 Fp r _ Jr,(0 x4(0 Vc Vc dxjit) dt = k,C 7»-k 7 ■>//7 1 — \ i c s m r ' + c ^ - k C ~ n r ~ 1 ss2 -H 2 1 — ~ *7*7(0 ~ * 2 * 8(0 /M ? = /rain IC50IIH2 +CH2, Fm21H27 + (.t7 (/) + P27 • „tg (tj)Hl1 ^ (5.17) (5.18) (5.19) (5.20) (5.21) (5.22) (5.23) (5.24) (x 7 ( 0 + P 2 7 - x 8 ( 0 ) " 27 An oral dose of 75 mg was given on the first day, and measurements of 4 blood concentrations (S, R, Factor VII, and Factor II) and INR (International Normalized Ratio) were obtained for 10 days. The measurement data were kindly provided by Dr. Alan Forrest, and selected data from five subjects are shown in Figure 5.10. Data from two subjects (denoted by WF02 and WF03) were analyzed using the three estimation algorithms, with only INR measurements used to estimate model parameters (Table 5.11). In Bayesian estimation of the Warfarin example, INR at t = 48 and 72 hours are predicted from the estimates of the system parameters. 81 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. Blood Clotting Cascade Extrinsic pathway Intrinsic pathway Activator: tissue thromboplastin Common pathway vn Vila " I I X — ► Xa V TI complex (VII, tissue thromboplastin, calcium, phospholipids) V complex (V, Xa, calcium, phospholipids) Activators: collagen, glass, and others XII — ► Xlla 1 XI — ► XIa 1 IX — ► IXa VIII complex (Vin, IXa,calcium, phospholipids) prothrombin(II) thrombin fibrinogen— ► fibrin XIII fibrin polymer Figure 5.8 Blood clotting cascade diagram (Fox 1996). Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. D/2 S CLd CLtS i D/2 I CLd Vp IC7 Factor V II H7 IC50-FVII c S + c r k 2 * C ss2 Factor II IC50-FII C s +C r Clotting Tim e (INR) H 27 Imin Fm 27 FVII+P27*FII Figure 5.9 Warfarin PK/PD model with 20 parameters. 83 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. Prm Mean Vc 5.47 n> Vp 5.16(1) CLd 2.63 (1/hr) CLtR 0.160 (1/hr) CLtS 0.281 (1/hr) ka 3.01 (/hr) IC50-FVn 975 (us/1) H7 3.17 k7 0.0783 (/hr) Css7 147 (limits) IC50-FH 1100(us/l) H2 3.93 k2 0.0124 (/hr) Css2 140 (limits) Imin 0.740 (INR) Fm27 34.2 (limits) H27 1.19 P27 0.183 Table 5.10 Prior mean vector of Warfarin PK/PD parameters. Time WF02 WF03 0 1.6 108 12 42.7 42.9 24 17.8 17.1 48 6.02 3.80 72 12.8 3.55 120 75.7 20.4 168 126 74.0 240 140 137 Table 5.11 Measured INR data for the two subjects. Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. a S T — ^ < N — o o o o > 0 — tn f* p « * o ^ © — f C ' 2 o 9 g p 0 9 § 9 p - w > s © m s p ; a g r* O ^ o — » A ^ 'O S ;3 9 t 9 © 3 9 ^ \fl O v ^ « 8 0 H c - * « n O 00 <n r* * Q J Q Z m © _ — (N < N 9 9 - 2 d 9 § 9 0 — O - f r s O „ s ^ g i p d c 9 o d 9 0 0 0 0 rn o n £ - S 2 5 ^ o - '0 ^ ,f fi« » “ 00 9?S9dfs 9 9 99 — 2 — 0 « £ S S 2 9 9 3 N O ' fls S C ; s O < » 2 ^ fj O Q K S 9 9 . — < N © O § « o ^ 9 O 00 . o z g i R 2 d d » O © d rn ^ m Z T — ~ o» 5 2 3 S o 9 o 9 o & d d — 00 — 5 5 s g g 9 d 9 9 9 ;C J o S ,,' 9 ' 0 ' ' n o S g g l s g i s s g l ^ 9 o d d — o o 9 o o e> — < * v s s g 9 9 9 i g R § 2 g - S 2 8 ^ 9 S o S 0 0 0 —0 0 . — . 0 *0 ^00 < r n « J n o S S » » s 5 ; N S l N S S S * 2 = i 5 g i l g 8 § . 0 3 o < N © — © m o o ^ rn ... N o N - V o ? s s s 000 fN fN fN o 5 d 9 w r-o f* ! ri e n rs — i -4 a Ehts m S < n S ’5£fs > > u u u i u i 5 u u i 2 u l £ i r-» < N Z a. in ha « u E 2 C 3 a a. c -c < 2 c 3 £ V. O X • f i u u c 2 •u £3 > O u w o " C a. e N vri J U M A H 85 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. 4000-. 3000- (0 2000- WF02 WF03 WF04 WF05 1000- - r - 1-------- •-----1 — 100 200 Time (hours) 300 4000-1 3000- 2000- 1000- 100 200 Time (hours) 300 Figure 5.10 Measured data for subject receiving Warfarin. Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. 200-1 § 100 i , • I_* _ 100 200 Time (hours) 3 0 0 200-i o c S 100-; £ 100 200 Time (hours) 3 0 0 7 .5 5 .0 - 2 .5 0.0 - » * * 100 Time (hours) - i— 200 3 0 0 Figure 5.10 (Cont’d) 87 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. 6. Results 6.1 Busulfan Example 6.1.1 Results with IMH The mean of the prior distribution was used as the starting point for the IMH algorithm. To form an ergodic average from the IMH chain, a decision has to be made on how many samples to discard in consideration of the bum-in period. This is necessary in order to reduce the effect of the initial distribution on the target distribution, because it may take the sampler some time to reach equilibrium. The general guideline is not to discard more samples than are necessary for the dependence to decay to a negligible level (Geyer 1992). The degree of dependence between samples can be assessed by calculating the autocorrelation of the chain. Figure 6.1 shows the autocorrelation plots for the two parameters, Ke and V, for the three subjects. Judging from the autocorrelation plot, 100 samples seems to be more than sufficient as bum-in length for all the three subjects. In the analysis presented below except for the convergence test, the first 100 samples of the generated chains have been discarded. Figure 6.2 shows the IPSRF (iterative potential scale reduction factor) to test the convergence to the posterior distribution for BS10 (Brooks and Gelman 1996). Five independent chains were constructed from different starting points. Different seeds for random number were used for each of them. The five different starting points were made up using the minimum and maximum parameter values from the prior population simulation to 88 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. BS10 1 o .o o 0 .0 20 40 O O O O t O O t 20 140 160 1 ao 200 BS50 20 4 0 e o a o i o o 12 0 1 a o 1 a o 200 1 o .a o >0.0 BS90 1 0 .0 o •0 . 0 20 40 e o 00 1 00 120 t e o 100 200 t 0 . 0 o ■ 0 . 0 I . Figure 6.1 Autocorrelation plots to assess mixing. (Busulfan, IMH) 89 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. mimic the overdispersed distribution (Gelman and Rubin 1992). The first 1,000 samples (i.e., 900 posterior samples plus 100 bum-in samples) were divided into 50 batches, each of which had 20 samples. The PSRF is less than 1.2 from the beginning and approximately 1 after 400 iterations, which proves the convergence to the identical posterior distribution according to the Brooks and Gelman’s criterion. 1.31 1 | I \ I I » i ■ 1 2 x x * x X * * 1.1 - N * x x x I . ^ v v ^ v x x w ^n ^x v x vxacmooogoweoooocoooeoooocxicooooccoooocoooeooct O gl--------- 1 -------1 ---------i -------- 1 ---------i -------- 1 -------- 1 ------------- 1 ---- 100 200 300 400 500 600 700 800 900 1000 Iteration Number X X X * x : x > * J O < * * , 'x * x x X ' ‘xxJOOcW XxSOOOOOOOOtXSOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOeOOOOOOOOOMOC , x ' - x x x 0.9 600 700 800 900 1000 100 200 300 400 500 Iteration Number Figure 6.2 IPSRF (iterative potential scale reduction factor) to diagnose convergence using multiple chains for BS10. (Busulfan, IMH) 90 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. A total of 5,000, 7,000, and 25,000 posterior samples after the bum-in period were obtained from three separate runs of BAYESID for BS10, BS50, and BS90, respectively. The first 100 samples of the IMH chain for BS10 are plotted in Figure 6.3. The horizontal lines in the figure are the portions of chain where the candidate samples are rejected and the current sample is retained. The MH acceptance ratio was 0.271 for BS10,0.178 for BS50, and 0.0537 for BS90. The marginal and the joint density plots for BS10 are shown in Figure 6.4 with the 95% HPD (highest posterior density) interval indicated by a thick line. The prior statistics and the posterior statistics from the IMH chain for the three subjects are summarized in Table 6.1. In the table, the modes were found by the MAP estimation of ADAPT 0. 0.75-. 0.50- o X 0.25- 0.00 75 100 25 50 S a m p le n u m b e r 75-i 50- > 25- 75 100 25 50 S a m p le n u m b e r Figure 6.3 Posterior sample path for BS10. (Busulfan, IMH) 91 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. BS10 0.4-1 0.3- J £ •£ 0 .2 - PRIOR 95% HPDI 0 .0 -t— 0.00 0.25 0.50 0.75 1.00 Ke (/hr) O .S -i 0.4- ? 0.3- 0 .2 - 0 . 1 - 0.0 25 125 150 0 50 75 100 v (I) 75-| 50- 25- - , 0.0 0.1 0.2 0.3 0.4 0.! Kb 0.6 O .S -i 0.4- u e _ 0.3- O u X 0.2- 0 . 1 - 0.0 0.00 0.25 0.50 0.75 1.00 1.25 1.50 Conc.(t*18) Figure 6.4 For subject BS10, the prior and posterior marginal distribution for Ke and V (upper panel). The posterior joint density plot (lower left). Prior (-------) and posterior (------) distribution of plasma concentration (lower right). (Busulfan, IMH) IMH PRIOR BS10 BS50 BS90 Ke V Ke V Ke V Ke V Mean 0.297 29.5 0.261 36.9 0.381 21.0 0.423 10.5 Mode 0.28 20.0 0.251 36.4 0.373 20.7 0.423 10.3 Std 0.0867 17.0 0.0578 7.14 0.0797 4.29 0.0619 1.79 Cov - Ke 0.00751 0.00330 0.00642 0.00386 -V -0.798 289 -0.285 50.4 -0.273 18.5 -0.0933 3.21 Corr -0.54 -0.70 -0.79 -0.84 95% CDI [0.154, [6.34, [0.160, [24.0, [0.227, [13.2, [0.300, [7.01, 0.468] 62.4] 0.377] 51.6] 0.531] 29.7] 0.540] 14.0] # Sam 1,000 5,000 7,000 25,000 ples Table 6.1 Table of moments. Mode is MAP estimate. CDI = central density interval. (Busulfan, IMH) 92 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. 6.1.2 Results with SIMH The IMH chain constructed in the previous section was split as described in Section 4.2. Table 6.2 shows the effect of using a range of values of the tuning parameter which was denoted by csplit in the algorithm. This parameter is defined using the likelihood values of the posterior samples. The number of resultant subchains increases as csplit becomes larger, and maximum number of subchains is attained at mean likelihood value. If csplit exceeds the mean likelihood value, the number of subchains decreases. To provide the user with a problem-independent method for selecting the chain spitting parameter csplit, the BAYESID program allows the user to specify a related parameter, split, which is defined as follows: if split is 0, csplit is equal to the minimum likelihood value; if split is 100, csplit is equal to the mean likelihood value. Intermediate values of split linearly interpolate csplit between these two extremes. split csplit # Subchains Avg. subchain length # Unique samples 0 0.000275 1 2 - 2 0.0657 100 48.8 13.6 5 0.164 211 23.2 6.43 10 0.328 352 14.1 3.85 20 0.655 549 9.07 2.47 30 0.982 712 7.00 1.90 40 1.31 866 5.77 1.57 50 1.64 977 5.11 1.39 100 227 1143 4.37 1.19 Table 6.2 Effect of different values o f split for BS10. (Busulfan, SIMH) 93 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. The number of unique samples in subchain in Table 6.2 was obtained by dividing the number of accepted samples by the number of subchains, because the chain can stay at a same position when the candidate is not accepted. The number of posterior samples and the tuning parameter split were selected so that 100 or more subchains could be available (cf. Tierney 1992), with the average number of unique samples in subchain to be 10 or more. The reason for these choices will be elaborated in the following paragraphs. Once the i.i.d. subchains are obtained by the splitting technique, asymptotic confidence intervals for the estimates can be calculated using the regenerative analysis described in Section 4.4. Figure 6.5 shows the 95% confidence intervals associated with the posterior mean, and 2.5%, 97.5% quantiles for the PK parameter V of BS10. The quantiles were estimated using the empirical distribution method described in Section 4.4.2.1. 95% Cl for Q2.5 95% Cl for Mean [22.8, 24.8] [36.4, 37.5] 0.3-1 0.2 - 95% Cl for Qq7 5 [51.0,54.1] > a 0 .1- 0.0 0 10 20 30 40 50 60 70 V(l) Figure 6.5 Asymptotic 95% confidence intervals for the posterior mean, and 2.5%, 97.5% quantiles for the PK parameter V of BS10. (Busulfan, SIMH) 94 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. To assess the validity of the confidence intervals, a coverage test was done with 100 replications. One hundred confidence intervals were compared with an estimate of the true value that was obtained by taking average of the results from 5 long runs (20 times longer than each of the 100 replications). The percent coverage for the subject BS10 is summarized in Table 6.3, for values of split ranging from 2 to 100. As for the confidence intervals associated with E(Prm) and E(Prm2 ), the coverage is close to the nominal 95% irrespective of the value of split. However, the coverage for the quantiles becomes worse if there are not enough unique samples in subchains. For this example, the empirically determined minimum number of unique samples in subchains is 10 or more, in order for the coverage of the 95% confidence interval to be close to the nominal value, that is, for the percentage coverage to be at least 90 with probability 0.9. The results from an analogous coverage study using the 20-parameter Warfarin PK/PD model were similar: 9 or more for Busulfan and 10 or more for Warfarin. Using this guideline, split was appropriately chosen for each subject. Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. % Coverage of 95% Cl ________split________ Prm E(Prm) 95% Cl 2 5 10 20 30 40 50 100 Ke 0.260 [0.255, 0.265] 95 95 95 95 95 95 95 95 Ke2 0.0710 [0.0684, 0.0736] 95 95 95 95 95 95 96 95 V 36.9 [36.4, 37.5] 91 93 95 94 94 94 94 94 V2 1420 [1380, 1450] 92 94 94 94 94 94 94 94 Q(Ke25) 0.166 [0.161,0.173] 91 91 90 88 88 86 86 85 Q(Kes) 0.179 [0.175.0.184] 95 96 95 95 95 91 90 89 Q(Ke5o) 0.255 [0.251,0.260] 95 95 95 95 95 95 95 95 QCKejj) 0.363 [0.356,0.376] 94 95 95 90 89 89 88 87 Q(Ke9 7 .5 ) 0.395 [0.382,0.400] 91 91 90 90 90 89 88 87 QCVw) 23.9 [22.8, 24.8] 93 92 92 91 91 90 90 89 Q(VS ) 25.7 [24.9, 26.4] 95 93 93 94 92 92 89 89 Q(Vso) 36.7 [35.9, 37.4] 96 95 95 92 92 90 90 86 Q (V,,) 49.7 [48.6, 50.8] 94 92 92 91 89 89 89 88 Q(V9 7 .s) 51.6 f51.0, 54.1] 96 96 93 94 94 93 92 88 % Coverage of 95% Cl split Prm E(Prm) 95% Cl 2 5 10 20 30 40 50 100 Y 0.333 [0.325,0.341] 93 93 96 95 96 96 96 96 Y2 0.120 [0.114,0.126] 96 94 96 96 97 96 96 96 Q(Y25) 0.177 [0.171,0.184] 96 97 95 95 92 92 92 90 Q(Ys) 0.200 [0.186,0.206] 94 92 92 91 91 89 88 89 Q(YS o) 0.320 [0.315,0.329] 91 92 90 88 86 84 82 79 Q(Y9 5 ) 0.502 [0.489,0.512] 96 95 95 92 91 91 91 91 Q ( Y 97.j ) 0.530 f0.517, 0.552] 92 92 90 88 87 86 86 81 Table 6.3 Percentage coverage (% Cov.) for BSIO. Mean and confidence interval are calculated with split = 2. (Busulfan, SIMH) Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. Table 6.4 summarizes the features of the split IMH chains. The run time was measured on Sun workstation (Ultra Sparc 10), and it is the time required for generating the posterior samples. Therefore, the run time does not include the analysis time, most of which is occupied by sorting procedure. The complete estimation results using the SIMH algorithm for the three simulated Busulfan data sets are summarized in Table 6.5. The coverage is close to the nominal 95% both for the system parameters and output, and both for the moments and the quantiles. BS10 BS50 BS90 split 2 0.7 0.8 it Subchains 100 92 126 Avg. subchain length 48.8 73.6 194 Avg. # unique samples 11.4 12.0 10.9 in subchain it Bum-in samples 100 100 100 it Posterior samples 5,000 7,000 25,000 MH acceptance ratio 0.271 0.178 0.0537 # Acceptance 1,356 1,243 1,343 Run time 20 s 28 s 1 m 39 s Table 6.4 Features of the split IMH chain. (Busulfan, SIMH) 97 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. Subject % Subject % Subject % Prm BS10 Cov. BS50 Cov. BS90 Cov. Ke [0.255,0.265] 95 [0.374, 0.386] 94 [0.418, 0.426] 92 Ke2 [0.0684, 0.0736] 1 95 [0.146, 0.156] 94 [0.178, 0.185] 94 V [36.4, 37.5] 91 [20.8,21.3] 92 [10.4, 10.7] 91 V2 [1380, 1450] 92 [450, 473] 92 [111, 117] 94 Q(Ke:5) [0.161,0.173] 91 [0.226, 0.249] 93 [0.302, 0.317] 91 Q(Kes) [0.175,0.184] 95 [0.249, 0.267] 94 [0.321,0.330] 94 Q(Ke5 0 ) [0.251,0.260] 95 [0.367,0.385] 94 [0.409, 0.428] 94 Q(Ke9 5 ) [0.356,0.376] 94 [0.506, 0.551] 87 [0.516,0.545] 89 Q(Ke975) [0.382,0.400] 91 [0.531,0.590] 84 [0.537, 0.579] 91 Q(v i j ) [22.8,24.8] 93 [12.4, 14.1] 94 [6.89, 7.38] 94 Q(VS ) [24.9, 26.4] 95 [14.1, 15.1] 91 [7.38, 7.87] 96 Q(Vso) [35.9, 37.4] 96 [20.0,21.3] 93 [10.2, 10.6] 95 Q (V9 5 ) [48.6, 50.8] 94 [28.0, 29.3] 97 [13.4, 13.9] 92 Q ( V 9 7 .s ) [51.0, 54.11 96 [29.2,31.81 95 [14.0, 14.6] 93 Subject % Subject % Subject % Prm BS10 Cov. BS50 Cov. BS90 Cov. Y [0.325, 0.341] 93 [0.267, 0.281] 92 [0.401,0.414] 92 Y2 [0.114, 0.126] 96 [0.0794, 0.0884] 92 [0.171,0.182] 94 Q(Y2S) [0.171,0.184] 96 [0.115,0.139] 89 [0.214, 0.258] 93 Q(YS ) [0.186,0.206] 94 [0.131,0.150] 91 [0.256, 0.275] 92 Q(Yso) [0.315, 0.329] 91 [0.251, 0.276] 93 [0.391,0.406] 93 Q(Y9 5 ) [0.489,0.512] 96 [0.425, 0.480] 90 [0.568,0.602] 95 Q(Y9 7 .5) [0.517, 0.552] 92 [0.486, 0.516] 94 [0.612,0.644] 95 Table 6.5 Percentage coverage (% Cov.) for the three subjects. Y is Busulfan plasma concentration. (Busulfan, SIMH) 98 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. 6.1.3 Results with RIMH To select the constant c to define the regeneration set C, a preliminary run can be used to obtain an approximate likelihood distribution as well as to search for the maximum likelihood value. In this preliminary run, the parameter vectors are generated using the prior statistics with no acceptance/rejection test. Then c can be set at a certain quantile of the prior likelihood distribution. Depending on the difficulty of a problem (i.e, how far the posterior mean is located from the prior mean), the length of the preliminary run may need to be adjusted in order to obtain better estimate of the likelihood distribution and the maximum likelihood value. A total of 5,000 samples were generated in the preliminary runs for all of the examples investigated in this study. When drawing the posterior samples, the RIMH sampler automatically finds an initial value which is a sample from the set C. Then it moves in and out of C during the process of constructing the posterior distribution until a predefined stopping rule applies. Li.d. subchains are defined whenever the chain visits C along the sample path (Figure 6.6). Some subchains may have as few as one sample, and the average subchain length depends on specific choice of c. In general, larger c values result in shorter subchains and require more computing time to generate the same number of posterior samples, because more samples are rejected at the rejection sampling step of the algorithm (i.e., step 5 in the algorithm in Section 4.3). The set C includes the parameters with low likelihood values which are located at the tail portion of the likelihood distribution. As c becomes larger, the regeneration set C becomes more dominant in the parameter space, with the posterior samples consisting mainly of the samples from C. 99 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. The stopping rule used was to obtain 100 subchains. The constant c was selected so that the average subchain length would be 10 or more for comparison with the results from SIMH. Different values of c were investigated, but the point estimates were basically independent of a specific choice of c. However, larger c values result in narrower confidence intervals. Figure 6.6 shows the first 100 posterior samples with the start of each subchain marked by ‘x’. Compared with Figure 6.3, mixing was improved with this hybrid sampling method. The resultant features of the RIMH chain are summarized in Table 6.6. The run time was measured on Sun workstation (Ultra Sparc 10), and it is the time required for generating the posterior samples. Therefore, the run time does not include the analysis time. About 19 seconds of the run time was used during the preliminary run. The marginal and the joint density plots are shown in Figure 6.7, and Table 6.7 summarizes the posterior statistics. In Table 6.7, the modes were found by the MAP estimation of ADAPT n. There is little difference in the posterior moments and marginal density plots from IMH results as expected. Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. 0.75-1 0.50- 0.00 i i i i 25 50 75 100 Sample number 50- > 25- 75 100 25 50 Sample number Figure 6.6 Posterior sample path with the subchain starting points marked (at 1,8,9,13, 14, 15, and 65) for BS10. (Busulfan, RIMH). BS10 BS50 BS90 c 70% 76% 93% Avg. subchain length 11.4 12.0 10.9 # Posterior samples 1,136 1,200 1,093 # Rejection 1,802 2,367 9,809 Rejection ratio 0.613 0.664 0.900 Runtime 29 s 30 s 55 s Table 6.6 Features of the RIMH chain. (Busulfan, RIMH) 101 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. BS10 0.4-, 0.3- X ' S. 0.2- PRIOR 95% HPDI 0.0 -f— 0.00 0.25 0.50 0.75 1.00 K* (/hr) 0 .5 -1 0.4- 0.3- i\ > 0 .2 - 0 . 1 - 0.0 75 1 0 0 125 150 50 v (I) 75-| 50- 25- 0.0 0.1 0.2 0.3 0.4 0.5 0.9 K« 0.5 0.4 g °'3' O a 0.2- o.i o.o v i "i— i ~r— i 1 0.00 0.25 0.50 0.75 1 00 1.25 1.50 C onc.(t=18) Figure 6.7 For subject BSIO, the prior and posterior marginal distribution for Ke and V (upper panel). The posterior joint density plot (lower left). Prior (------ ) and posterior (------ ) distribution of plasma concentration (lower right). (Busulfan, RIMH) RIMH PRIOR BS10 BS50____________ BS90 Ke V Ke V Ke V Ke V Mean 0.297 29.5 0.261 37.4 0.387 20.7 0.427 10.5 Mode 0.28 20.0 0.251 36.4 0.373 20.7 0.423 10.3 Std 0.0867 17.0 0.0568 7.49 0.0844 4.44 0.0597 1.74 Cov - Ke 0.00751 0.00322 0.00712 0.00352 -V -0.798 289 -0.296 56.1 -0.300 19.7 -0.0843 3.03 Corr -0.54 -0.70 -0.80 -0.81 95% CDI [0.154, [6.34, [0.157, [23.7, [0.244, [13.2, [0.304 [6.91, 0.468] 62.4] 0.369] 52.5] 0.561] 29.9] 0.534] 14.0] c 70% 76% 93% # Samples 1,000 1,136 1,200 1,093 Table 6.7 Table of moments. Mode is MAP estimate. CDI = central density interval. (Busulfan, RIMH) 102 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. The regenerative analysis was used to calculate the asymptotic 95% confidence intervals associated with the posterior quantities of interest. Figure 6.8 shows the 95% confidence intervals associated with the posterior mean, and 2.5%, 97.5% quantiles for the PK parameter V of BS10. The quantiles were estimated using the empirical distribution method described in Section 4.4.2.1. 95% Cl for Q2,5 9 5 o / o C | f o r M e a n [22.8, 25.0] [36.9, 38.0] I 0 .2 - 95% Cl for Qg7 5 [52.2, 55.5] > a 0.1- 0.0 10 20 30 40 50 60 70 0 V(l) Figure 6.8 Asymptotic 95% confidence intervals for the posterior mean, and 2.5%, 97.5% quantiles for the PK parameter V of BS10. (Busulfan, RIMH) 103 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. To assess the validity of the asymptotic 95% confidence intervals, a coverage test was done with 100 replications, in the same way as with the SIMH. One hundred confidence intervals were compared with an estimate of the true value that was obtained by taking the average of the results from 5 long runs (20 times longer than each of the 100 replications, with the stopping rule of obtaining 2,000 subchains). The percent coverage for the subject BS10 is summarized in Table 6.8, for values of c ranging from 50 to 100. The coverage of the 95% confidence interval for any c is close to nominal 95% as long as 100 or more i.i.d. subchains are available. In this regard, RIMH sampler is more robust than SIMH, since the regeneration is obtained prospectively by the design of the sampling algorithm. Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. % Coverage of 95% Cl c Prm E(prm) 95% Cl 50 70 90 100 Ke 0.261 [0.257,0.266] 90 95 95 93 Ke2 0.0716 [0.0692,0.0740] 87 96 95 93 V 37.4 [36.9, 38.0] 92 96 93 97 V2 1460 [1420, 1500] 92 97 96 95 Q(Ke2J) 0.166 [0.158, 0.173] 94 96 96 95 Q(Ke5 ) 0.179 [0.173,0.186] 92 96 94 92 Q(Kejo) 0.257 [0.250, 0.262] 93 92 94 94 Q(Ke9 5 ) 0.365 [0.356,0.371] 92 90 93 93 Q(Ke9 7 .j) 0.391 [0.376, 0.414] 91 99 97 93 Q(V2 .s) 24.1 [22.8, 24.9] 87 96 97 96 Q(v 5 ) 25.8 [25.0, 26.9] 90 98 96 96 Q(VS 0 ) 37.3 [36.4, 38.0] 97 92 91 97 Q(V„) 50.2 [49.2, 52.0] 93 93 96 97 Q (v9 7 . 5 ) 53.5 [52.2, 55.5] 87 93 96 97 % Coverage of 95% Cl c Prm E(prm) 95% Cl 50 70 90 100 Y 0.325 [0.318,0.332] 89 95 97 96 Y2 0.114 [0.109,0.118] 93 92 97 96 Q(Y2 .s) 0.174 [0.162,0.184] 93 97 93 94 Q(Y5 ) 0.193 [0.184,0.205] 92 96 92 93 Q(y 5 0 ) 0.316 [0.306, 0.326] 93 93 94 95 Q(y 9 5 ) 0.489 [0.468,0.504] 83 93 95 96 QCY9 7 .5 ) 0.517 [0.503, 0.5441 89 93 97 95 Table 6.8 Percentage coverage for BS10. Mean and confidence interval are from c = 70% case (i.e., c set at 70% point of the likelihood distribution obtained from the 5,000 preliminary samples). (Busulfan, RIMH) Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. Table 6.9 shows the results of the coverage study using discretization method to estimate the confidence intervals for posterior quantiles. The results for case A was obtained by discretizing the parameter space using 250 equally spaced grids. The results for case B was obtained by first subdividing the parameter space into three subregions considering the gradient of the distribution function (i.e., the boundary values were 10% quantile and 90% quantile), and then discretizing each subregion accordingly. In other words, 70 grids were used from minimum parameter value to 10% quantile, from 10% to 90% quantile, and from 90% quantile up to the maximum parameter value. Better results were obtained in case B than in case A, but the overall results were not as good as the empirical distribution method. Therefore, the empirical distribution method was chosen for the default method for estimation quantiles even though it was computationally more expensive. E(Q2 ) E(Q2 .5) E(Qj) E(Qio) E(QI5 ) E(Q2 0 ) A -K e 53 50 73 76 91 93 A -V 39 52 59 85 93 85 B -K e 42 55 66 93 86 92 B -V 40 61 67 92 87 84 E(Q2 ) E(Q2 .5 ) E(QS ) E(Qio) E(Qis) E(Q2 0 ) A -K e 95 88 86 76 62 68 A -V 94 87 88 65 60 56 B -K e 90 89 97 75 61 56 B -V 95 92 100 68 54 53 Table 6.9 Percentage coverage for posterior quantiles estimated using A) discretization (Iglehart’s method) with 250 grids, and B) discretization (Iglehart’s method) with different grid sizes for three subregions o f the distribution. (Busulfan, RIMH) 106 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. Table 6.10 shows the results of coverage study for all the three subjects. The coverage is close to the nominal 95% both for the system parameters and output, and both for the moments and the quantiles. Subject % Subject % Subject % Prm BS10 Cov. BS50 Cov. BS90 Cov. Ke [0.257,0.266] 95 [0.377, 0.396] 91 [0.422, 0.432] 94 Ke2 [0.0692,0.0740] 96 [0.148,0.165] 95 [0.181,0.190] 94 V [36.9, 38.0] 96 [20.2,21.1] 92 [10.3,10.6] 98 V2 [1420, 1500] 97 [428,466] 92 [109, 115] 96 Q(KeM) [0.158, 0.173] 96 [0.233, 0.253] 96 [0.294, 0.320] 94 Q(Kes) [0.173,0.186] 96 [0.253, 0.273] 93 [0.319,0.336] 97 Q(KeS 0 ) [0.250, 0.262J 92 [0.364, 0.396] 94 [0.421,0.431] 94 Q(Ke9s) [0.356,0.371] 90 [0.517,0.565] 95 [0.512, 0.545] 92 Q(Ke9 7 .5 ) [0.376,0.414] 99 [0.554, 0.644] 87 [0.533,0.577] 93 Q(V2 .5) [22.8, 24.9] 96 [10.4, 14.2] 93 [6.91,7.56] 94 Q(VS ) [25.0, 26.9] 98 [11.3, 14.9] 94 [7.22, 8.08] 92 Q(Vi0) [36.4, 38.0] 92 [19.8,21.2] 90 [10.1,10.5] 94 Q(V,5 ) [49.2, 52.0] 93 [27.8, 29.1] 94 [13.3, 14.0] 94 Q(V,7.5) [52.2, 55.51 93 [29.0,31.4] 88 ri4.0, 14.7] 95 Subject % Subject % Subject % Prm BS10 Cov. BS50 Cov. BS90 Cov. Y [0.318,0.332] 95 [0.260,0.280] 95 [0.391,0.408] 94 Y2 [0.109,0.118] 92 [0.0757, 0.0877] 92 [0.161,0.177] 96 Q(Y2 .5 ) [0.162,0.184] 97 [0.0989,0.137] 97 [0.230,0.256] 96 Q(Ys) [0.184,0.205] 96 [0.129,0.145] 97 [0.250,0.278] 94 Q(Yso) [0.306,0.326] 93 [0.243,0.274] 92 [0.378,0.400] 94 Q(Y«) [0.468,0.504] 93 [0.424, 0.464] 94 [0.560, 0.601] 97 Q(Y975) [0.503, 0.5441 93 [0.467,0.514] 95 [0.601,0.657] 94 Table 6.10 Percentage coverage (% Cov.) for the three subjects. Y is Busulfan plasma concentration. (Busulfan, RIMH) 107 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. Figure 6.9 shows the effect of c on the confidence interval width. The confidence interval width fluctuates at small c. The fluctuating portion might have resulted from the insufficient number of subchains to apply the central limit theorem and the asymptotic theory. Therefore, c must be sufficiently large so that the half width of the 95% confidence interval becomes stable. In this case, c must be larger than 60% quantile, which will produce 100 or more subchains. Once it is stabilized, the confidence interval width decreases with increasing c. The degree o f decrease was on the factor of approximately 0.5 in both 2- parameter Busulfan example and 20-parameter Warfarin example. However, as stated in the previous paragraph on the coverage study result using different c, the asymptotic characteristics remain meaningful. Therefore, the choice of large c is up to the user preference towards more accurate answer. Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. Ke 0.004-1 0.003- £ s i £ a z 0 .002- 0 .001- 0.000 95 85 55 65 75 C V 0.75-1 £ s * « 0.50- z 0.25- 0.00 95 85 55 75 65 C Figure 6.9 Width of the asymptotic 95% confidence interval vs. c, where c is expressed as a percentile of likelihood distribution from the 5,000-sample preliminary run. (Busulfan, RIMH) 109 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. 6.2 Teniposide Example 6.2.1 Results with IMH Figure 6.10 shows the autocorrelation plots for two of the PK parameters of the Teniposide example: Kpc and Kcp. Burn-in length was set at 100,1,000, and 10,000 samples for TNI00, TN500, and TN900, respectively. A total of 10,000 samples samples were drawn for TN10,5,000 samples for TN50, and 200,000 samples for TN90. The prior statistics and the posterior statistics obtained from the IMH chain for the three subjects are summarized in Table 6.11. In the table, the modes were found by the MAP estimation of ADAPT n. The marginal density plots for TN10 are shown in Figure 6.11 with the 95% HPD interval indicated by a thick line. IMH PRIOR TN10 TN50 TN90 Kpc Kcp Kpc Kcp Kpc Kcp Kpc Kcp Mean 0.189 0.286 0.186 0.231 0.154 0.277 0.162 0.556 Mode 0.15 0.24 0.152 0.209 0.128 0.250 0.146 0.509 Std 0.116 0.0900 0.0895 0.0563 0.0626 0.0635 0.0422 0.120 95% CDI [0.0417 [0.132, [0.0359, [0.141, [0.0472, [0.149, [0.0891, [0.357, ,0.4151 0.4641 0.3451 0.3521 0.2711 0.3931 0.2461 0.8031 IMH PRIOR TN10 TN50 TN90 Kpc Kcp Kpc Kcp Kpc Kcp Kpc Kcp Mean 2.04 0.652 2.31 0.930 2.21 0.673 1.68 0.239 Mode 1.9 0.55 2.32 0.925 2.22 0.673 1.69 0.240 Std 0.409 0.281 0.254 0.121 0.190 0.101 0.110 0.0429 95% CDI [1.28, [0.214, [1.81, [0.667, [1.87, [0.461, [1.46, [0.161, 2.831 1.23] 2.811 1.141 2.581 0.8551 1.881 0.3201 Table 6.11 Table of moments. Mode is MAP estimate. CDI = central density interval. (Teniposide, IMH) Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. TN10 i 30 40 O O 1 oo 1 ao »40 i e o t eo 300 1 0.0 o TN50 S O 100 1 S O 300 3SO 3 0 0 3 S O 400 4 QO SO O 10O 1 S O 300 200 3 0 0 3 0 0 400 400 S OO TN90 1 o .s o .0.0 & o S o .0.0 ll Figure 6.10 Autocorrelation plots to assess mixing. (Teniposide, IMH) 111 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. PRIOR 95% HPDI 0 . 0 0 0 . 2 S 0 . 5 0 0 . 7 5 1 .0 0 1 . 2 5 1 .5 0 Kpc (/hr) 0.4-1 0 . 3 - X. 0.2- 0 .1- 0.0 .4 0 . 5 0 . 6 0 . 7 0 .8 1 0 . 0 . 3 0.0 Kcp (/hr) 0 .4 -1 0 . 3 - ■S 0.2- 0 .1 - 0.0 2 3 4 0 1 V (I/m2) 0 .7 5 - 1 0 . 5 0 - i o ~ 5 0 . 2 5 - 0.00 3 1 2 0 CL (l/hr/m2) 0 .7 5 -1 0 . 5 0 - 2 °'25'l 0.00 1 5 20 0 5 1 0 0.4-1 0 . 3 - 0 .0 1 2 5 0 2 5 0 SOO 1000 7 5 0 0 Conc.(t*24) (mg/ml) M (t>24) (nfl-" • : Figure 6.11 Prior and posterior distribution plot for TN10. Systemic exposure at t = 24 hours (se(t=24)) is AUC(0,24). (Teniposide, IMH) 112 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. 40 6.2.2 Results with SIMH Table 6.12 shows the values for split used for each subject and the resultant features of split IMH chains for the three subjects. The run time was measured on Sun workstation (Ultra Sparc 10), and it is the time required for generating the posterior samples. Therefore, the run time does not include the analysis time, most of which is occupied by sorting procedure. Table 6.13 shows the summary of the detailed posterior statistics obtained from split IMH chain along with the coverage study results for the three subjects. With the chosen value of csplit, the coverage is close to the nominal 95% both for the system parameters and output, and both for the moments and the quantiles TN10 TN50 TN90 split 1 2 0.1 # Subchains 105 110 136 Avg. subchain length 93.9 41.9 127 Avg. # unique samples 14.4 10.8 11.8 in subchain # Bum-in samples 100 1,000 10,000 # Posterior samples 10,000 5,000 200,000 MH acceptance ratio 0.1490 0.237 0.00849 # Acceptance 1,485 1,185 1,496 Runtime 47 s 28 s 15 m Table 6.12 Features of the split IMH chain. (Teniposide, SIMH) 113 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. Prm TN10 Cov. TN50 Cov. TN90 Cov. Kpc [0.178,0.193] 89 [0.150,0.158] 95 [0.158,0.167] 95 Kpc2 [0.0390,0.0459] 87 [0.0260,0.0292] 94 [0.0263,0.0298] 95 Q(Kpc25) [0.0429,0.0597] 88 [0.0560,0.0642] 93 [0.0894,0.101] 93 Q(Kpc5 ) [0.0592,0.0704] 92 [0.0646,0.0735] 98 [0.0999,0.110] 94 Q(Kpc5 o) [0.163,0.183] 91 [0.139,0.150] 93 [0.152, 0.160] 99 Q(Kpc9 5 ) [0.325,0.371] 95 [0.261,0.287] 89 [0.230, 0.259] 85 Q(Kpc97S ) [0.361, 0.460] 91 r0.285,0.317] 88 [0.256, 0.273] 91 Prm TN10 Cov. TN50 Cov. TN90 Cov. Kcp [0.228,0.235] 90 [0.272, 0.282] 96 [0.532,0.577] 92 Kcp2 [0.0549, 0.0588] 90 [0.0776, 0.0835] 95 [0.293,0.351] 93 Q(Kcp2 .5 ) [0.131,0.145] 97 [0.156, 0.175] 93 [0.356,0.370] 88 Q(Kcps) [0.144,0.157] 94 [0.175,0.189] 92 [0.375,0.391] 89 Q(KcpS o) [0.219, 0.229] 86 [0.269, 0.281] 95 [0.524, 0.564] 93 Q(Kcp9 5 ) [0.328,0.347] 93 [0.379, 0.403] 93 [0.725,0.842] 92 QfKcp^s) [0.344,0.390] 92 [0.401,0.4271 92 [0.842, 0.8901 82 Prm TN10 Cov. TN50 Cov. TN90 Cov. V [2.29, 2.33] 92 [2.20, 2.22] 95 [1.67, 1.70] 94 V2 [5.30,5.48] 92 [4.86,4.96] 95 [2.80,2.89] 96 Q(V2.j) [1.82, 1.89] 92 [1.85, 1.89] 91 [1.44, 1.52] 93 Q (vs) [1.87, 1.95] 94 [1.91,1.92] 93 [1.47,1.53] 90 Q(Vj0) [2.27, 2.31] 91 [2.19, 2.22] 94 [1.67, 1.70] 96 Q(V9 S ) [2.71,2.84] 93 [2.50, 2.56] 92 [1.85, 1.87] 85 Q(V97j) [2.84, 2.911 94 [2.57, 2.65] 95 [1.87,1.93] 90 Prm TN10 Cov. TN50 Cov. TN90 Cov. CL [0.921,0.936] 88 [0.668,0.680] 96 [0233, 0.244] 93 CL2 [0.863,0.891] 90 [0.457,0.472] 95 [0.0562,0.0614] 92 Q(CL2 .s) [0.665,0.699] 93 [0.453,0.491] 94 [0.137, 0.168] 93 Q(CLs) [0.702,0.741] 93 [0.491,0.518] 95 [0.161,0.181] 95 Q(CL5 0 ) [0.923,0.951] 91 [0.664,0.687] 88 [0.233, 0.246] 93 CKCLw) [1.10, 1.14] 95 [0.828,0.855] 88 [0.305,0.316] 91 QICL^.s) [1.14,1.18] 91 [0.861,0.8881 89 [0.317,0.3271 89 Table 6.13 Percentage coverage (Cov.) of the 95% confidence interval for the three subjects. (Teniposide, SIMH) Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. Prm TN10 Cov. TN50 Cov. TN90 Cov. se [551,560] 90 [670, 677] 95 [893, 903] 96 se* [308k, 318k] 91 [453k, 462k] 95 [800k, 819k] 97 Q(se2 .5 ) [425, 444] 87 [544, 559] 90 [773,814] 89 Q(se5 ) [445,459] 93 [560, 575] 87 [805, 829] 91 Q(seso) [543, 558] 90 [667, 677] 92 [888, 902] 96 Q(se9 5 ) [667, 679] 94 [766, 779] 94 [978, 999] 92 Q(se9 7.s) [681,700] 93 f780, 7931 95 [990,1020] 92 Table 6.13 (Cont’d) 6.2.3 Results with RIMH A total of 5,000 samples were generated in the preliminary run. The c used for each subject and the resultant features of the RIMH chain are summarized in Table 6.14. The run time was measured on Sun workstation (Ultra Sparc 10), and it is the time required for generating the posterior samples. Therefore, the run time does not include the analysis time. About 21 seconds of the run time was used during the preliminary run. The prior statistics and the posterior statistics obtained from the RIMH chain for the three subjects are summarized in Table 6.15, and the marginal density plots are shown in Figure 6.12. In Table 6.15, the modes were found by the MAP estimation of ADAPT II. Table 6.16 shows the summary of the detailed posterior statistics obtained from RIMH chain along with the coverage study results for the three subjects. The coverage is close to the nominal 95% both for the system parameters and output, and both for the moments and the quantiles. 115 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. TN10 TN50 TN90 c 80% 72% 99% Avg. subchain length 15.7 10.9 18.7 # Posterior samples 1565 1093 1866 # Rejection 4354 1860 93141 Rejection ratio 0.736 0.630 0.980 Runtime 46 s 32 s 7 m Table 6.14 Features of the RIMH chain. (Teniposide, RIMH) RIMH PRIOR TN10 TN50 TN90 Kpc Kcp Kpc Kcp Kpc Kcp Kpc Kcp Mean 0.189 0.286 0.190 0.235 0.150 0.275 0.163 0.561 Mode 0.15 0.24 0.152 0.209 0.128 0.250 0.146 0.509 Std 0.116 0.0900 0.0927 0.0589 0.0628 0.0664 0.0404 0.110 95% CDI [0.0417, [0.132, [0.030, [0.138, [0.045, [0.159, [0.100, [0.388, 0.4151 0.4641 0.3641 0.3611 0.2651 0.4071 0.2551 0.8031 RIMH PRIOR TN10 TN50 TN90 Kpc Kcp Kpc Kcp Kpc Kcp Kpc Kcp Mean 2.04 0.652 2.307 0.929 2.21 0.671 1.68 0.239 Mode 1.9 0.55 2.32 0.925 2.22 0.673 1.69 0.240 Std 0.409 0.281 0.256 0.126 0.192 0.104 0.105 0.0436 95% CDI [1.28, [0.214, [182, [0.692, [1.85, [0.481, [1.45, [0.163, 2.83] 1.231 2.801 1.191 2.581 0.8821 1.841 0.3181 Table 6.15 Table of moments. Mode is MAP estimate. CDI = central density interval. (Teniposide, RIMH) Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. PRIOR 95% HPDI 0.00 0.25 0.50 0.75 1.00 1.25 1.50 Kpc (/hr) 0.4-1 0.3- * 0.2- 0.0 0.3 0.4 0.5 0.6 0.7 0.8 0.1 0 . Kcp (/hr) 0.4-1 0.3- 0.1- 0.0 0 2 3 4 V (I/m2) 0.75-1 0.50- _J O ' S. 0.25 0.00 2 3 0 1 CL (l/hr/m2) — 0.3- 0 . 1- 0.0 500 750 1000 250 1250 0 0.75-1 U 0.50- I - /- 0.00 5 15 0 10 20 C onc.(t*24) (pg/ml) s*(t*24) (pg-hr/ml) Figure 6.12 Prior and posterior distribution plot for TN10. Systemic exposure at t = 24 hours (se(t=24)) is AUC(0,24). (Teniposide, RIMH) 117 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. Prm TN10 Cov. TN50 Cov. TN90 Cov. Kpc [0.182, 0.198] 92 [0.146, 0.155] 96 [0.157,0.169] 95 Kpc2 [0.0408, 0.0487] 87 [0.0251,0.0280] 96 [0.0259, 0.0305] 96 Q(Kpczs) [0.0486,0.0567] 91 [0.0447, 0.0630] 97 [0.0870,0.103] 93 Q(Kpcs) [0.0567, 0.0670] 93 [0.0598, 0.0695] 95 [0.0996,0.112] 92 Q(Kpc5 0 ) [0.163,0.183] 99 [0.135, 0.146] 94 [0.150, 0.164] 91 Q(Kpc9 S ) [0.336, 0.417] 90 [0.255,0.279] 95 [0.225,0.273] 85 Q(Kpc9 7.j) [0.398,0.460] 84 [0.273, 0.307] 94 [0.242, 0.319] 89 Prm TN10 Cov. TN50 Cov. TN90 Cov. Kcp [0.230, 0.240] 91 [0.270, 0.280] 93 [0.543,0.579] 94 Kcp2 [0.0560,0.0616] 90 [0.0770,0.0832] 92 [0.304, 0.349] 96 Q(Kcp2.j) [0.138, 0.147] 90 [0.159, 0.175] 92 [0.357,0.385] 93 Q(Kcps) [0.147, 0.158] 89 [0.174,0.185] 98 [0.380, 0.414] 93 Q(KcpS 0 ) [0.222, 0.234] 94 [0.262, 0.279] 94 [0.531,0.560] 96 Q(Kcp95) [0.334, 0.375] 92 [0.351,0.379] 96 [0.726, 0.890] 83 Q(Kcp97S) [0.359, 0.406] 88 [0.379, 0.4181 95 [0.803, 0.890] 79 Prm TN10 Cov. TN50 Cov. TN90 Cov. V [2.29, 2.33] 94 [2.19, 2.22] 92 [1.66, 1.69] 94 V2 [5.30, 5.48] 93 [4.85,4.97] 91 [2.77, 2.87] 93 Q(V2 .s) [1.82, 1.89] 92 [1.84, 1.89] 93 [1.45, 1.49] 88 Q(VS ) [1.88, 1.95] 96 [1.89,1.93] 96 [1.46, 1.53] 92 Q(V5 0 ) [2.27, 2.32] 88 [2.19, 2.22] 91 [1.68, 1.71] 93 Q(V9 S ) [2.71, 2.84] 94 [2.49, 2.59] 94 [1.82, 1.85] 88 Q(V97j) [2.84, 2.95] 92 [2.58, 2.69] 94 [1.84, 1.911 89 Prm TN10 Cov. TN50 Cov. TN90 Cov. CL [0.920, 0.938] 92 [0.662, 0.679] 95 [0.231, 0.246] 94 CL2 [0.862, 0.895] 92 [0.449, 0.471] 94 [0.0555,0.0621] 95 Q(CLZ S ) [0.641, 0.692] 93 [0.450, 0.483] 96 [0.137, 0.164] 93 Q(CLs) [0.695, 0.744] 89 [0.482,0.511] 95 [0.110,0.182] 91 Q(CLS 0 ) [0.917, 0.937] 92 [0.655, 0.685] 94 [0.233, 0.252] 94 Q(CL9 5 ) [1.11,1.17] 94 [0.826, 0.855] 94 [0.303,0.315] 99 Q(CL975) [1.17,1.21] 95 [0.854,0.899] 96 [0.315,0.327] 99 Table 6.16 Percentage coverage (Cov.) of the 95% confidence interval for the three subjects (Teniposide, RIMH) 118 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. Prm TN10 Cov. TN50 Cov. TN90 Cov. se [551,560] 94 [669, 678] 90 [884, 901] 95 se' [308k, 319k] 93 [452k, 463k] 93 [784k, 815k] 94 Q(se2 .s) [415,441] 93 [542, 560] 93 [772, 797] 91 Q(ses) [429,459] 93 [561,578] 93 [776,811] 89 Q(seJ0 ) [548, 560] 94 [665, 680] 97 [883, 905] 95 Q(se9 5 ) [667, 682] 91 [764, 778] 96 [973, 990] 92 Q(se9 7 .5 ) [685, 7171 92 [778, 805] 95 [989, 1010] 93 Table 6.16 (Cont’d) 6.3 Terbutaline Example 6.3.1 Results with IMH Figure 6.13 shows the autocorrelation plots for two of the PK parameters of the Terbutaline example: Ka and Kcp. Bum-in length was set at 1,000 samples for TBIO and TB50, and 5,000 samples for TB90. A total of 50,000, 70,000, and 50,000 posterior samples were drawn for TB10, TB50, and TB90, respectively. The prior statistics and the posterior statistics obtained from the IMH chain for the three subjects are summarized in Table 6.17. In the table, the modes were found by the MAP estimation of ADAPT n. The marginal density plots for TB10 are shown in Figure 6.14 with the 95% HPD interval indicated by a thick line. Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. TB10 O .S O > 0 .9 4 0 0 9 0 0 TB50 O. S o • 0 . 9 9 0 i o o 1 0 O 2 0 0 2 0 0 3 0 0 3 9 0 4 0 0 4 9 0 9 0 0 TB90 o o 1 o o 1 0 O 2 0 0 2 9 0 3 0 0 3 0 0 4 0 0 4 0 0 S O O OO 1 0 O ^ O O 2 0 0 2 0 0 3 0 0 3 0 0 4 9 0 S O O Figure 6.13 Autocorrelation plots to assess mixing. (Terbutaline, IMH) 120 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. IMH PRIOR TB10 Mean Mode Std 95% CDI Mean Mode Std 95% CDI LnlOOKa 5.78 5.6 0.584 [4.61, 6.88] 5.63 5.68 0.548 [4.56, 6.69] LnlOOOKcp 5.89 6.0 0.790 [4.34, 7.36] 6.03 5.95 0.767 [4.47, 7.49] LnlOOOKpc 6.20 6.2 0.350 [5.55, 6.87] 6.29 6.27 0.312 [5.75, 6.92] LnlOOOKel 6.02 6.0 0.372 [5.32, 6.79] 6.28 6.27 0.268 [5.76, 6.80] LnlOOV 8.43 8.4 0.443 [7.52, 9.23] 8.15 8.16 0.284 [7.62, 8.73] LnlOOKeo 6.18 6.4 0.562 [5.08, 7.20] 6.50 6.53 0.329 [5.90, 7.14] Lnl000EC50 7.39 7.5 1.01 [5.52, 9.39] 6.88 6.78 0.652 [5.57,8.11] LnlOOEmax 5.88 5.8 0.368 [5.16, 6.59] 5.45 5.42 0.107 [5.27, 5.67] LnlOOEO 5.18 5.0 0.552 [4.03,6.21] 4.61 4.61 0.0788 [4.46, 4.75] IMH TB50 TB90 Mean Mode Std 95% CDI Mean Mode Std 95% CDI LnlOOKa 5.79 5.82 0.500 [4.87, 6.81] 5.89 5.86 0.488 [4.83, 6.76] LnlOOOKcp 5.83 5.77 0.685 [4.62,7.19] 5.92 5.89 0.650 [4.61,7.08] LnlOOOKpc 6.40 6.38 0.295 [5.84, 6.95] 6.01 5.99 0.311 [5.43, 6.60] LnlOOOKel 6.07 6.05 0.267 [5.46, 6.5] 5.87 5.89 0.297 [5.31,6.41] LnlOOV 8.52 8.51 0.276 [7.89, 8.99] 8.54 8.53 0.357 [7.82,9.24] LnlOOKeo 6.69 6.68 0.279 [6.15, 7.24] 5.67 5.67 0.337 [5.01, 6.30] Lnl000EC50 6.63 6.64 0.558 [5.48, 7.58] 8.20 8.22 0.655 [6.92,9.43] LnlOOEmax 5.92 5.90 0.0960 [5.75, 6.10] 6.26 6.24 0.133 [6.03,6.54] LnlOOEO 4.70 4.72 0.0953 [4.54,4.92] 5.80 5.79 0.0641 [5.68, 5.91] Table 6.17 Table of moments. Mode is MAP estimate. CDI = central density interval. (Terbutaline, IMH) Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. p(Ln100V) p(LnlOOOKpc) p(LnlOOKa) 0.15-1 TB10 0.10- PRIOR 0.05- 95% HPDI 0.00 5.0 10.0 2.5 7.5 0.0 0.2-1 a o S t 0 . 1 - * - e - j a o .o 10.0 7.5 2.5 5.0 0 .0 LnlOOKa (/hr) LnlOOOKcp (/hr) 0 .2 - 0 . 1- 0 0 6 7 8 5 4 S C 0.2- 00 8 6 5 7 4 LnlOOOKpc (/hr) LnlOOOKel (/hr) 0 .4 - 1 0.3- 0 .2- 0 . 1- 0.0 a 9 10 8 7 0.3-i 0 .2 - ' S . 0.1- 0.0 9 6 7 8 5 4 LnlOOV (I) LnlOOKeo (/hr) Figure 6.14 Prior and posterior distribution plot for TB10. (Terbutaline, IMH) 122 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. 0.4-1 3 0.3- S 0 .2 - O- 0.1- 0.0- 12.5 10.0 2.5 5.0 7.5 0.0 Ln1000ECS0 (ng/mi) 1 .00-1 0.75- U l § 0.50- e -j 0.00 7 6 LnlOOEmax (I) 0.75-1 U J 0.50- O - 0.25- 0.00- 7 6 5 3 4 LnlOOEO (I) Figure 6.14 (Cont’d) 0.75-r 0.50- 0.00 6 2 4 0 FEV1 (t«S6) 123 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. 6.3.2 Results with SIMH Table 6.18 shows the split values used for each subject and the resultant features of split IMH chains for the three subjects. The run time was measured on Sun workstation (Ultra Sparc 10), and it is the time required for generating the posterior samples. Therefore, the run time does not include the analysis time, most of which is occupied by sorting procedure. Tables 6.19 to 6.21 show the summary of the detailed posterior statistics obtained from split IMH chain along with the coverage study results. With the chosen value of csplit, the coverage is close to the nominal 95% both for the system parameters and output, and both for the moments and the quantiles. TB10 TB50 TB90 split 0.9 0.5 0.7 # Subchains 117 109 141 Avg. subchain length 424 615 341 Avg. # unique samples 10.7 10.9 10.3 in subchain # Bum-in Samples 1,000 1,000 5,000 # Posterior samples 50,000 70,000 50,000 MH acceptance ratio 0.0249 0.0170 0.0289 # Acceptance 1,247 1,187 1,446 Run time 29 m 39 m 30 m Table 6.18 Features of the split IMH chain. (Terbutaline, SIMH) 124 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. Parameters 2.5% Quantiles 97.5 % Quantiles Prm E(Prm) 95% Cl % Cov. E(Q25) 95% Cl % Cov. E ( Q 97 5 ) 95% Cl % Cov. LnlOOKa 5.63 [5.58, 5.68] 96 4.56 [4.41,4.63] 96 6.71 [6.58, 6.83] 94 LnlOOOKcp 6.03 [5.96,6.10] 97 4.41 [4.23,4.63] 97 7.44 [7.35, 7.65] 87 LnlOOOKpc 6.29 [6.26,6.32] 97 5.66 [5.62, 5.76] 97 6.89 [6.85, 6.98] 92 LnlOOOKel 6.28 [6.26, 6.30] 96 5.74 [5.70, 5.77] 96 6.79 [6.76, 6.84] 89 LnlOOV 8.15 [8.13, 8.18] 95 7.56 [7.52, 7.65] 95 8.69 [8.65, 8.83] 90 LnlOOKeo 6.50 [6.47, 6.52] 94 5.88 [5.82, 5.91] 94 7.14 [7.07, 7.23] 92 Lnl000EC50 6.87 [6.82,6.93] 91 5.54 [5.40, 5.70] 91 8.11 [8.04, 8.27] 88 LnlOOEmax 5.45 [5.44, 5.46] 91 5.29 [5.28, 5.30] 91 5.72 [5.68, 5.80] 89 LnlOOEO 4.61 [4.61,4.62] 98 4.47 [4.45,4.47] 98 4.76 [4.75,4.77] 91 FEV|(t=56) 1.61 [1.60, 1.62] 95 1.43 [1.42, 1.441 96 1.81 [1.79, 1.83] 96 Table 6,19 Summary of the posterior statistics of TB10. (Terbutaline, SIMH) Reproduced w ith permission o f th e copyright owner. Further reproduction prohibited without perm ission. Parameters 2.5% Quantiles 97.5 % Quantiles Prm E(Prm) 95% Cl % Cov. E(Q25) 95% Cl % Cov. 6.79 [6.69, 6.90] % Cov. LnlOOKa 5.79 [5.75, 5.83] 94 4.82 [4.71,4.91] 96 7.19 [7.05, 7.48] 91 LnlOOOKcp 5.83 [5.78, 5.88] 89 4.59 [4.33,4.65] 94 6.95 [6.92,7.07] 88 LnlOOOKpc 6.40 [6.37, 6.42] 94 5.84 [5.75, 5.85] 92 6.66 [6.54, 6.75] 90 LnlOOOKel 6.07 [6.04, 6.10] 88 5.58 [5.53, 5.60] 90 9.05 [8.99, 9.11] 90 LnlOOV 8.52 [8.49, 8.55] 97 7.90 [7.87, 7.96] 94 7.25 [7.22, 7.33] 94 LnlOOKeo 6.69 [6.67, 6.71] 93 6.15 [6.10, 6.19] 92 7.63 [7.57, 7.73] 93 Lnl000EC50 6.63 [6.56, 6.69] 90 5.49 [5.42, 5.60] 93 6.12 [6.10, 6.14] 90 LnlOOEmax 5.92 [5.91,5.93] 96 5.76 [5.75, 5.77] 96 4.89 [4.88,4.91] 87 LnlOOEO 4.70 [4.70,4.71] 95 4.49 [4.45,4.55] 93 2.72 [2.69, 2.76] 95 FEV,(t=56) 2.40 [2.39, 2.42] 96 2.13 [2.10,2.151 94 6.79 [6.69, 6.901 94 Table 6.20 Summary of the posterior statistics of TBSO. (Terbutaline, SIMH) Parameters 2.5% Quantiles 97.5 % Quantiles Prm E(Prm) 95% Cl % Cov. E(Q25) 95% Cl % Cov. E(Q975) 95% Cl % Cov. LnlOOKa 5.89 [5.86, 5.92] 95 4.91 [4.83,5.01] 92 6.87 [6.78,6.92] 94 LnlOOOKcp 5.92 [5.87, 5.97] 95 4.63 [4.55,4.72] 95 7.13 [7.06, 7.26] 91 LnlOOOKpc 6.01 [5.98,6.03] 95 5.40 [5.34, 5.45] 88 6.60 [6.57,6.67] 92 LnlOOOKel 5.87 [5.84, 5.90] 89 5.28 [5.26, 5.34] 90 6.41 [6.38, 6.50] 84 LnlOOV 8.54 [8.51,8.57] 93 7.80 [7.56, 7.95] 87 9.22 [9.18, 9.34] 94 LnlOOKeo 5.67 [5.64, 5.70] 95 5.01 [4.92, 5.08] 86 6.33 [6.29,6.39] 93 Lnl000EC50 8.20 [8.15,8.25] 95 6.85 [6.66, 6.94] 94 9.43 [9.37,9.57] 83 LnlOOEmax 6.26 [6.25, 6.27] 93 6.04 [6.03. 6.05] 90 6.57 [6.54, 6.63] 91 LnlOOEO 5.80 [5.79, 5.80] 95 5.68 [5.67, 5.69] 95 5.92 [5.91,5.93] 90 FEV|(t=56) 3.75 [3.73, 3.76] 91 3.41 [3.39, 3.43] 92 4.12 [4.09,4.16] 95 Table 6.21 Summary of the posterior statistics of TB90. (Terbutaline, SIMH) K > o\ 6.3.3 Results with RIMH A total o f5,000 samples were generated in the preliminary run. The c used for each subject and the resultant features of the RIMH chain are summarized in Table 6.22. The run time was measured on Sun workstation (Ultra Sparc 10), and it is the time required for generating the posterior samples. Therefore, the run time does not include the analysis time. The prior statistics and the posterior statistics obtained from the RIMH chain for the three subjects are summarized in Table 6.23, and the marginal density plots are shown in Figure 6.15. In Table 6.23, the modes were found by the MAP estimation of ADAPT II. Tables 6.24 to 6.26 show the summary of the detailed posterior statistics obtained from RIMH chain along with the coverage study results. The coverage is close to the nominal 95% both for the system parameters and output, and both for the moments and the quantiles. TB10 TB50 TB90 c 96.8% 97.8% 96% Avg. subchain length 13.4 11.7 15.0 # Posterior samples 1,335 1,169 1,496 # Rejection 27,914 34,986 25,719 Rejection ratio 0.954 0.968 0.945 Run time 18 m 23 m 18 m Table 6.22 Features of the RIMH chain. (Terbutaline, RIMH) 127 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. RIMH PRIOR TB10 Mean Mode Std 95% CDI Mean Mode Std 95% CDI LnlOOKa 5.78 5.6 0.584 [4.61,6.88] 5.58 5.68 0.570 [4.56, 6.80] LnlOOOKcp 5.89 6.0 0.790 [4.34, 7.36] 6.07 5.95 0.781 [4.68,7.65] LnlOOOKpc 6.20 6.2 0.350 [5.55, 6.87] 6.29 6.27 0.307 [5.70, 6.86] LnlOOOKel 6.02 6.0 0.372 [5.32, 6.79] 6.30 6.27 0.274 [5.78,6.82] LnlOOV 8.43 8.4 0.443 [7.52, 9.23] 8.16 8.16 0.295 [7.58, 8.68] LnlOOKeo 6.18 6.4 0.562 [5.08, 7.20] 6.49 6.53 0.349 [5.83,7.16] Lnl000EC50 7.39 7.5 1.01 [5.52, 9.39] 6.88 6.78 0.702 [5.40, 8.07] LnlOOEmax 5.88 5.8 0.368 [5.16, 6.59] 5.46 5.42 0.117 [5.24, 5.70] LnlOOEO 5.18 5.0 0.552 [4.03, 6.211 4.61 4.61 0.080 [4.46, 4.77] RIMH TB50 TB90 Mean Mode Std 95% CDI Mean Mode Std 95% CDI LnlOOKa 5.78 5.82 0.504 [4.79,6.73] 5.85 5.86 0.497 [4.83, 6.74] LnlOOOKcp 5.79 5.77 0.683 [4.57,7.19] 5.93 5.89 0.647 [4.60, 7.11] LnlOOOKpc 6.41 6.38 0.294 [5.88, 6.99] 6.02 5.99 0.308 [5.40, 6.55] LnlOOOKel 6.05 6.05 0.246 [5.60, 6.51] 5.87 5.89 0.289 [5.37, 6.48] LnlOOV 8.54 8.51 0.276 [7.94, 9.02] 8.56 8.53 0.360 [7.84, 9.22] LnlOOKeo 6.71 6.68 0.294 [6.11,7.23] 5.67 5.67 0.340 [5.00,6.33] LnlOOOEC50 6.60 6.64 0.578 [5.49, 7.61] 8.22 8.22 0.665 [6.94,9.56] LnlOOEmax 5.91 5.90 0.0971 [5.73, 6.09] 6.27 6.24 0.137 [6.03, 6.54] LnlOOEO 4.71 4.72 0.0908 [4.54,4.87] 5.80 5.79 0.0673 [5.67,5.91] Table 6.23 Table of moments. Mode is MAP estimate. CDI = central density interval. (Terbutaline, RIMH) Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. p(Ln100V) p(Ln1000Kpc) p(Ln100Ka) 0.15-1 TB10 0 .1 0 - PRIOR 0.05- 95% HPDI 0 .0 0 1 0 .0 0 .0 2.5 5.0 7 5 0.2-i a o K e -i a 0 .0 ' 1 0 .0 so 00 2.5 7.5 LnlOOKa (/hr) LnlOOOKcp (/hr) 0.3-i 0.2- 0 .1- 0 .0 8 6 5 7 4 LnlOOOKpc (/hr) 0.4-. f 0.3- 0 .2 - 0 .1 - 0 .0 LnlOOOKel (/hr) 0.4-, 0.3- 0.2- 0.1- 0 .0 7 8 9 1 0 6 0.3-1 0 .2 - ■ £ 0 .1 - 0 .0 9 8 6 7 5 4 LnlOOV (0 LnlOOKeo (/hr) Figure 6.15 Prior and posterior distribution plot for TB10. (Terbutaline, RIMH) 129 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. 1.00-1 0.75- 0.50- ^ 0.25- 0.00' 6 7 5 0.4-i 8 0.3- 0.2- 0.1- 0.0' 7.5 1 0 .0 12.5 0.0 2.5 5.0 LniOOOECSO (ng/ml) LnlOOEmax (I) 0.75-, O. 0.25- 0.00 T* 4 5 6 7 3 LnlOOEO (I) Figure 6.15 (Cont’d) 0.75-1 0.50- % 0.25- 0 .0 0 - 6 4 2 0 FEV1 (ts S6) 130 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. Reproduced w ith permission o f th e copyright owner. Further reproduction prohibited without perm ission. Parameters 2.5% Quantiles 97.5% Quantiles Prm E(Prm) 95% Cl % Cov. E(Q25) 95% Cl % Cov. E(Q97S) 95% Cl % Cov. LnlOOKa 5.58 [5.53, 5.64] 94 4.41 [4.30, 4.56] 92 6.67 [6.58, 6.85] 91 LnlOOOKcp 6.07 [6.01,6.13] 94 4.49 [4.22, 4.68] 93 7.59 [7.48, 7.79] 83 LnlOOOKpc 6.29 [6.26,6.32] 91 5.69 [5.52, 5.75] 89 6.86 [6.84, 6.92] 88 LnlOOOKel 6.30 [6.27,6.33] 96 5.78 [5.70, 5.83] 94 6.86 [6.81,6.96] 90 LnlOOV 8.16 [8.12,8.19] 92 7.58 [7.44, 7.66] 96 8.70 [8.65, 8.86] 95 LnlOOKeo 6.49 [6.46, 6.52] 92 5.78 [5.70, 5.89] 95 7.15 [7.12, 7.36] 92 Lnl000EC50 6.88 [6.83,6.94] 89 5.50 [5.40, 5.60] 94 8.28 [8.06, 8.58] 93 LnlOOEmax 5.46 [5.45, 5.47] 97 5.28 [5.26, 5.30] 92 5.75 [5.73, 5.80] 90 LnlOOEO 4.61 [4.60,4.62] 97 4.46 [4.43, 4.47] 90 4.77 [4.76,4.78] 97 FEV,(t=56) 1.61 [1.60,1.61] 90 1.42 [1.41, 1.44] 97 1.79 [1.78, 1.84] 95 Table 6.24 Summary o f the posterior statistics o f TB10. (Terbutaline, RIMH) Reproduced w ith permission o f th e copyright owner. Further reproduction prohibited without perm ission. Parameters 2.5% Quantiles 97.5% Quantiles Prm E(Prm) 95% Cl % Cov. E(Q25) 95% Cl % Cov. E(Q,7S) 95% Cl % Cov. LnlOOKa 5.78 [5.73, 5.83) 97 4.81 [4.73,4.90] 96 6.83 [6.71,7.00] 89 LnlOOOKcp 5.79 [5.71,5.87] 94 4.41 [4.27,4.57] 98 7.10 [6.97, 7.43] 90 LnlOOOKpc 6.41 [6.37,6.44) 95 5.88 [5.78, 5.90] 90 6.99 [6.89, 7.15] 88 LnlOOOKel 6.05 [6.03, 6.08] 88 5.60 [5.47, 5.66] 93 6.51 [6.48,6.67) 91 LnlOOV 8.54 [8.51,8.58] 95 7.94 [7.86, 8.03] 92 9.04 [8.99,9.18] 86 LnlOOKeo 6.71 [6.68, 6.74) 95 6.12 [6.03,6.17] 100 7.27 [7.23, 7.32] 90 LnlOOOECSO 6.60 [6.54, 6.66] 95 5.49 [5.12, 5.58] 96 7.63 [7.54, 7.87] 91 LnlOOEmax 5.91 [5.90, 5.92] 93 5.75 [5.73, 5.76] 98 6.11 [6.08, 6.17] 93 LnlOOEO 4.71 [4.70,4.72] 94 4.54 [4.50,4.55] 95 4.88 [4.87, 4.90] 93 FEV,(t=56) 2.40 [2.38, 2.42] 99 2.13 [2.10 2.15] 90 2.71 [2.70,2.79] 91 Table 6.25 Summary of the posterior statistics of TB50. (Terbutaline, RIMH) Parameters 2.5% Quantiles 97.5% Quantiles Prm E(Prm) 95% Cl % Cov. E(Q2S) 95% Cl % Cov. E(Q97.j) 95% Cl %Cov. LnlOOKa 5.85 [5.80, 5.90) 92 4.83 [4.65, 5.01] 89 6.91 [6.77, 7.08] 91 LnlOOOKcp 5.93 [5.88, 5.99] 94 4.61 [4.41,4.68] 95 7.17 [7.06, 7.31] 93 LnlOOOKpc 6.02 [5.99,6.06] 92 5.43 [5.21,5.53] 94 6.60 [6.55,6.71] 93 LnlOOOKel 5.87 [5.84, 5.90] 92 5.33 [5.26, 5.38] 91 6.48 [6.44,6.67] 90 LnlOOV 8.56 [8.52, 8.60] 92 7.84 [7.51,7.98] 93 9.29 [9.22,9.35] 94 LnlOOKeo 5.67 [5.65, 5.70] 95 5.00 [4.88, 5.07] 94 6.36 [6.32,6.45] 92 Lnl000EC50 8.22 [8.16,8.28] 95 6.79 [6.65,6.94] 95 9.51 [9.41,9.62] 87 LnlOOEmax 6.27 [6.26,6.28] 90 6.04 [6.02,6.06] 97 6.57 [6.52,6.65] 87 LnlOOEO 5.80 [5.79, 5.80] 93 5.67 [5.66, 5.68] 96 5.91 [5.91,5.93] 92 FEV,(t=56) 3.75 [3.73,3.77] 90 3.38 [3.35, 3.41] 96 4.12 [4.07, 4.14] 92 Table 6.26 Summary of the posterior statistics of TB90. (Terbutaline, RIMH) 6.4 Warfarin Example 6.4.1 Results with IMH Figure 6.16 shows the autocorrelation plots for two of the PK parameters of the Warfarin example: Vc and Vp for WF02 and WF03. Bum-in length was set at 100 samples for WF02, and at 5,000 samples for WF03 due to slow mixing. A total of 7,000 and 200,000 posterior samples were drawn for WF02 and WF03, respectively. WF02 2 0 <40 OO OO 1 0 O 1 2 0 1 4 0 1 0 O t OO 2 0 0 2 0 4 0 OO OO 1 0 O 1 2 0 4 4 0 1 0 O 1 0 O 2 0 0 WF03 0.0 e o o 0.0 Figure 6.16 Autocorrelation plots to assess mixing. (Warfarin, IMH) 133 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. Figure 6.17 shows the IPSRF (iterative potential scale reduction factor) to test the convergence to the posterior distribution for WF02 (Brooks and Gelman 1996). As with the Busulfan example, five independent chains were constructed from different starting points. Different seeds for random number were used for each of them. The five different starting points were made up using the minimum and maximum parameter values from the prior population simulation to mimic the overdispersed distribution (Gelman and Rubin 1992). 1,000 samples (i.e., 900 posterior samples plus 100 bum-in samples) were divided into 50 batches, each o f which had 20 samples. The PSRF is less than 1.2 from the beginning and approximately 1 after 800 iterations, which proves the convergence to the identical posterior distribution according to the Brooks and Gelman’s criterion. 1.3 1 .2 CO Q. 0.9 * x x x x x x x x x x x x x ; 100 200 300 400 500 600 700 800 900 1000 Iteration Number 1.3 1.2- 1 .1 1 - 0.9 X X * X X x*xX X X x x x x x x x x x x x x x x x j : 100 200 300 400 500 600 700 800 900 1000 Iteration Number Figure 6.17 IPSRF (iterative potential scale reduction factor) to diagnose convergence using multiple chains for WF02. (Warfarin, IMH) 134 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. The first 200 samples for WF02 and the first 2,000 samples for WF03 of the chain are plotted in Figure 6.18. Extremely slow mixing of WF03’s chain is evident from the figure (cf. Figure 6.20). The prior statistics and the posterior statistics from the IMH chain for the two subjects are summarized in Table 6.27. The MH acceptance ratio was 0.181 for WF02 and 0.00660 for WF03. The marginal density plots for the two subjects are shown in Figure 6.19. 15*1 1 0 - o > WF02 50 100 150 Sample Number 2 0 0 WF03 15-i I Q - 1500 2 0 0 0 1 0 0 0 500 Sample Number 15-1 1 0 - a > 150 200 50 1 0 0 Sample Num ber 15-i 1 0 - a > 1500 2000 1 0 0 0 500 Sample Number Figure 6.18 Posterior sample path for WF02 and WF03. Note the difference in scale range on the x axes. (Warfarin, IMH) 135 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. IMH PRIOR WF02 WF03 Mean Std 95% CDI Mean Std 95% CDI Mean Std 95% CDI Vc 5.49 1.65 [2.74, 8.73] 5.48 1.63 [2.77, 8.87] 5.68 1.83 [2.52, 9.02] Vp 5.25 1.41 [2.86, 8.09] 5.30 1.42 [2.54, 7.86] 4.63 1.28 [2.21,6.83] CLd 2.60 1.55 [0.389,5.51] 2.56 1.39 [0.582,5.12] 2.05 1.19 [0.424, 4.27] CLtR 0.161 0.0342 [0.0952,0.226] 0.167 0.0253 [0.119,0.211] 0.125 0.0167 [0.0948, 0.156] CLtS 0.281 0.0592 [0.167,0.395] 0.290 0.0447 [0.199,0.371] 0.216 0.0298 [0.160, 0.271] ka 3.08 6.97 [0.048, 9.42] 2.38 2.52 [0.147, 7.08] 1.38 1.43 [0.0950, 3.92] IC50-7 973 277 [511, 1540] 963 237 [526, 1410] 728 167 [436, 1050] H7 3.18 0.580 [2.09,4.32] 3.19 0.514 [2.35, 4.26] 2.89 0.413 [2.10,3.65] k7 0.0784 0.0102 [0.0588, 0.0981] 0.0768 0.00873 [0.0606,0.0931] 0.0783 0.00761 [0.0655,0.0951] Css7 147 42.71 [74.5, 226] 141 41.7 [73.6, 235] 151 45.9 [81.4, 250] IC50-2 1100 272.1 [641, 1650] 1110 211 [739,1560] 812 160 [505, 1100] H2 4.16 3.77 [0.430, 10.7] 3.78 2.78 [0.560, 9.24] 3.27 2.13 [0.752, 7.65] k2 0.0122 0.00316 [0.0066,0.0181] 0.0127 0.00266 [0.00781,0.0177] 0.0125 0.00223 [0.00860, 0.0168] Css2 141 28.8 [87.9, 198] 136 27.3 [84.6, 182] 138 28.4 [86.3, 193] Imin 0.739 0.0582 [0.626,0.848] 0.735 0.0573 [0.631,0.840] 0.710 0.0471 [0.627, 0.808] Fm27 34.1 9.498 [18.4, 54.9] 31.5 7.16 [18.8,45.8] 37.8 6.75 [24.8, 50.7] H27 1.19 0.141 [0.924, 1.46] 1.17 0.131 [0.892, 1.41] 1.18 0.130 [0.944, 1.45] P27 0.174 0.206 [0.0043, 0.558] 0.103 0.0682 [0.00850,0.225] 0.0730 0.0369 [0.0102,0.143] IC(7) 96.5 25.1 [54.4, 151] 106 24.5 [63.2, 151] 108 24.1 [62.6, 154] IC(8) 97.1 9.68 [77.5, 115] 97.0 10.5 [77.5, 117] 96.0 9.65 [77.6, 115] INR48 3.06 1.31 [1.01,5.60] 3.05 0.376 [2.30, 3.75] 5.28 0.516 [4.33, 6.28] INR72 2.38 1.21 [0.939,4.60] 2.19 0.320 [1.60,2.84] 6.17 0.610 [5.01,7.36] Table 6.27 Table of moments. CDI = central density interval. (Warfarin, IMH) p(CLtS) p(CLd) p(Vc) PRIOR WF02 WF03 0.1- 0.1- 0.0' 0.0 0.0 2.5 5.0 7.5 10.0 12.5 15.0 20 0 5 10 15 Vc(l) Vp(l) 0.3-i 0.5-, 0.4- 0.2 - 0.3- "S 0.2- 0. 1- 0 . 1- 0.0- 0.0- 8 0 2 6 0.3 0.4 4 0.2 0.1 0.0 CLd (I/hr) CLtR (I/hr) 0.4-1 0.3- 0.2 - 0. 1- 0 .0' 0.0 0.0 0.2 0.5 0.6 0 .1 0.3 0.4 CLtS (l/hr) ka (/hr) Figure 6.19 Prior and posterior distributions for WF02 and WF03. (Warfarin, IMH) 137 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. p(IC 50-FII) P(k7) p(IC50-FVII) 0.24 0.2 H 0.1 H 0.H 0.0' 0.0 “ n---------1 --------- 1 — '■ '■ f i 500 1000 1500 2000 2500 IC 50-F V 1I (pg/|) 0 H 7 0.3 "i 0.4-r 0.3-1 0.24 0.1 H 0.1 H 0.0 0.0 400 300 200 Css7 (lunits) 100 0.15 0.10 0.05 0.00 k7 (/hr) 0.75-1 0.3-i 0.504 0.24 ex X ' S . 0.254 o.i 4 0.00 0.0 1 1000 1500 2000 2500 30 20 500 10 0 0 IC 50-FU (iig/l) H 2 Figure 6.19 (Cont’d) 138 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. p(H 27) p(lmin) PC <2) 0.4-i 0.4-1 0 .3 H 0.3 H O 0 .2 -^ 0.1 H o.-H 0.0 0.0 300 200 100 0.03 0.02 0.01 0.00 Css2 (lunits) k2 (/hr) 0.2n 0.3H o . i H o . H o.o o.o 1.25 1.00 0.00 0.25 0.50 0.75 krin (MR) Fm27 (lunits) 0.3-| 0.2! 0.1H 0.1 H 0.0 0.0 — I 2.0 0.5 0.4 0.3 0.2 0 .1 0.0 1.5 1.0 0.5 0.0 H 27 P27 Figure 6.19 (Cont’d) 139 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. 0.2- 0.2- 0.1- 0.1- 0 .0 - 0.0' 150 100 0 50 300 100 200 0 IC7 (lunits) IC8 (lunits) 0.4-1 0.75-1 0 .3 - $ 0 .5 0 - '8 . 0 .2 5 - 0.00 10.0 12.5 7 .5 10.0 12.5 0 .0 2 .5 5.0 7 .5 0.0 2 .5 5.0 INR(t=48) INR(t=72) Figure 6.19 (Cont’d) 140 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. 6.4.2 Results with SIMH Table 6.28 shows the values o f split used for each subject and the resultant features of split IMH chains for the two subjects. The run time was measured on Sun workstation (Ultra Sparc 10), and it is the time required for generating the posterior samples. Therefore, the run time does not include the analysis time, most of which is occupied by sorting procedure. Tables 6.29 and 6.30 show the summary of the detailed posterior statistics obtained from the split IMH chain and Table 6.31 shows the coverage study results. With the chosen value of csplit, the coverage is close to the nominal 95% both for the system parameters and output, and both for the moments and the quantiles. WF02 WF03 split 1.2 0.4 # Subchains 105 127 Avg. subchain length 65.0 1,470 Avg. # unique samples 12.1 10.4 in subchain # Bum-in samples 100 5,000 # Posterior samples 7,000 200,000 MH acceptance ratio 0.181 0.00660 # Acceptance 1,268 1,319 Runtime 4 m 2h23m Table 6.28 Features o f the split IMH chain. (Warfarin, SIMH) 141 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. Reproduced w ith permission o f th e copyright owner. Further reproduction prohibited without perm ission. Prm E(Prm) 95% Cl E(Q25) 95% Cl E(Q975) 95% Cl Vc 5.47 [5.37, 5.57] 2.90 [2.77,3.10] 9.15 [8.87,9.77] Vp 5.31 [5.21,5.41] 3.10 [2.90,3.21] 8.76 [8.32,9.34] CLd 2.56 [2.46, 2.68] 0.797 [0.651,0.841] 6.20 [5.64,6.73] CLtR 0.166 [0.164, 0.169] 0.124 [0.118,0.125] 0.218 [0.211,0.226] CLtS 0.289 [0.286,0.293] 0.215 [0.205,0.215] 0.397 [0.378,0.412] ka 2.40 [2.24, 2.56] 0.304 [0.259,0.337] 9.25 [8.27, 11.3] IC50-7 965 [947, 984] 572 [546, 583] 1470 [1460, 1630] H7 3.19 [3.15,3.22] 2.35 [2.27, 2.37] 4.28 [4.21,4.45] k7 0.0769 [0.0763, 0.0775] 0.0615 [0.0607, 0.0626] 0.0954 [0.0931,0.0990] Css7 141 [138, 143] 78.0 [67.1,81.5] 242 [235,260] IC50-2 1110 [1090, 1120] 739 [676, 762] 1560 [1520, 1660] H2 3.78 [3.57, 3.99] 0.840 [0.757, 1.01] 10.9 [9.90, 13.7] k2 0.0127 [0.0126,0.0129] 0.00815 [0.00782,0.00854] 0.0184 [0.0179,0.0197] Css2 136 [134, 139] 90.8 [87.1,92.8] 196 [192,203] Imin 0.735 [0.732,0.738] 0.635 [0.624,0.644] 0.849 [0.836,0.864] Fm27 31.5 [31.0, 32.0] 19.6 [19.1,20.0] 47.6 [46.6, 50.8] H27 1.17 [1.16, 1.17] 0.911 [0.894,0.930] 1.45 [1.42, 1.51] P27 0.104 [0.0987, 0.109] 0.0141 [0.0124,0.0166] 0.265 [0.251,0.294] IC(7) 106 [104, 108] 66.0 [63.2, 67.9] 163 [151, 181] IC(8) 97.0 [96.1,97.9] 78.9 [78.1,79.9] 119 [117,121] INR48 3.05 [3.03, 3.07] 2.37 [2.30, 2.43] 3.88 [3.82, 3.94] INR72 2.19 [2.17,2.21] 1.63 [1.58, 1.67] 2.88 [2.84, 2.94] Table 6.29 Summary of the posterior statistics of WF02. (Warfarin, SIMH) Reproduced w ith permission o f th e copyright owner. Further reproduction prohibited without perm ission. Prm E(Prm) 95% Cl E(Q25) 95% Cl E(Q97.5 ) 95% Cl Vc 5.68 [5.50, 5.87] 2.93 [2.67,3.18] 10.42 [9.19, 12.0] Vp 4.63 [4.49, 4.76] 2.46 [2.22, 2.77] 7.56 [6.82, 12.3] CLd 2.05 [1.95,2.16] 0.574 [0.445, 0.719] 4.78 [4.52, 5.48] CLtR 0.125 [0.124,0.127] 0.0973 [0.0948,0.0991] 0.161 [0.157,0.167] CLtS 0.216 [0.213,0.219] 0.165 [0.162, 0.170] 0.291 [0.268, 0.326] ka 1.38 [1.24, 1.51] 0.130 [0.0638,0.204] 4.90 [4.19, 7.51] IC50-7 728 [713, 743] 454 [434, 467] 1100 [1050,1200] H7 2.89 [2.86, 2.93] 2.19 [2.11,2.25] 3.78 [3.69, 3.88] k7 0.0782 [0.0775,0.0789] 0.0646 [0.0633, 0.0662] 0.0945 [0.0929, 0.0996] Css 7 151 [146, 156] 87.2 [84.1,89.9] 267 [247,335] IC50-2 812 [797, 827] 536 [505, 570] 1190 [1110,1310] H2 3.28 [3.03, 3.52] 0.983 [0.830, 1.03] 9.18 [7.38,15.1] k2 0.0125 [0.0123,0.0127141] 0.00864 [0.00852, 0.00907] 0.0171 [0.0168,0.0177] Css2 138 [135, 141] 91.7 [88.3, 97.6] 210 [194, 224] Imin 0.710 [0.705,0.714] 0.622 [0.614, 0.627] 0.806 [0.792,0.829] Fm27 37.8 [37.2, 38.4] 25.7 [24.8, 28.0] 52.6 [51.8, 54.2] H27 1.18 [1.16, 1.19] 0.944 [0.912,0.948] 1.45 [1.44, 1.52] P27 0.0728 [0.0697, 0.0760] 0.0184 [0.0157, 0.0225] 0.162 [0.155,0.173] IC(7) 108 [106, 111] 70.0 [64.1,73.7] 166 [156,189] IC(8) 96.0 [95.1,96.9] 77.7 [75.8, 79.7] 116 [114,119] INR48 5.28 [5.24, 5.32] 4.40 [4.33,4.45] 6.46 [6.33,6.58] INR72 6.17 [6.13, 6.21] 5.11 [5.01,5.19] 7.54 [7.38, 7.70] Table 6.30 Summary of the posterior statistics o f WF03. (Warfarin, SIMH) 6 % Coverage of 95% Cl Prm E(Prm) E(Prm2 ) E(Q2 .5 > E(Qs) E(Q9 S ) E(Q,7.j) Vc 92 94 91 89 95 94 Vp 95 95 94 93 95 92 CLd 96 94 93 91 93 88 CLtR 96 96 90 95 94 93 CLtS 90 91 93 93 90 92 ka 91 90 93 94 90 94 IC50-7 89 89 95 93 92 91 H7 94 96 93 92 90 93 k7 96 95 95 92 96 92 Css7 93 92 90 94 95 94 IC50-2 97 95 90 96 97 96 H2 96 94 93 90 99 99 k2 93 94 95 92 92 94 Css2 88 89 93 91 93 93 Imin 95 95 90 92 94 93 Fm27 94 94 88 91 91 91 H27 95 96 93 96 94 89 P27 91 93 94 91 97 93 IC(7) 99 98 97 94 94 94 IC(8) 95 95 93 91 97 90 INR48 92 92 95 97 95 96 INR72 91 93 93 92 90 93 Table 6.31 Percentage coverage for WF02. The average number of unique samples i subchain is approximately 14. (Warfarin, SIMH) Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. 6.4.3 Results with RIMH The RIMH sampler was applied to all of the 10 subjects. 5,000 posterior samples were drawn for each subject. The resulting number of subchains and the number of rejected samples for each subject are shown in Table 6.32. The constant c was set at the 75% quantile of the likelihood values obtained from 5,000 sample preliminary run, except for the two subjects, WF03 and WF11. These two subjects were extreme cases, and they required higher c values to obtain more than 2 subchains from this fixed run length to apply the regenerative analysis. The number of rejections can be used as an indication of computational difficulty, and therefore, of the required computing time. Subject c # Subchains # Rejection WF02 75 453 9,580 WF03 99 318 316,000 WF04 75 204 10,400 WF05 75 327 10,100 WF06 75 67 11,000 WF07 75 344 10,300 WF08 75 362 9,960 WF09 75 278 9,820 WF10 75 398 10,000 WF11 90 34 31,600 Table 6.32 Number of subchains and number of rejections at step 5 in the RIMH algorithm for the 10 subjects. Results from 5,000 posterior samples. (Warfarin, RIMH) 145 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. To further investigate two subjects WF02 and WF03, a total of 5,000 samples were generated in the preliminary run. The c was set at 75% quantile of the likelihood values for WF02 and at 98.7% quantile for WF03. Other features of the RIMH chain are summarized in Table 6.33. Figure 6.20 shows the first 200 posterior samples with the start of each subchain marked by ‘x \ Compared with Figure 6.18, mixing was improved with this hybrid sampling method, but with the expense of computational cost. Table 6.34 summarizes the posterior statistics, and the marginal density plots are shown in Figure 6.21. WF02 WF03 c 75% 98.7% Avg. subchain length 15.2 15.7 # Posterior samples 1,515 1,573 # Rejection 3,104 98,877 Rejection ratio 0.672 0.984 Run time 6 m 1 h 9 m Table 6.33 Features of the RIMH chain. (Warfarin, RIMH) 146 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. WF02 WF03 u > 2 0 0 150 1 0 0 50 200 150 50 1 0 0 0 Sample Number Sample Number 15-, 1 0 - a > X ' 200 150 1 00 50 15-i I Q - 150 200 50 10 0 SanffeN urtar Sarrple Nunrtoer Figure 6.20 Posterior sample path with the subchain starting points marked (1, S3,54,62, 63,100,114,115,117,158, 175, and 200 for WF02; at 1,2,7,10,15,20,22,150,162, and 163 for WF03). (Warfarin, RIMH) 147 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. Reproduced w ith permission o f th e copyright owner. Further reproduction prohibited without perm ission. RIMH PRIOR WF02 WF03 Mean Std 95% HPD Interval Mean Std 95% HPD Interval Mean Std 95% HPD Interval Vc 5.49 1.65 [2.74, 8.73] 5.53 1.68 [2.73, 8.94] 5.66 1.67 [2.72, 8.79] Vp 5.25 1.41 [2.86, 8.09] 5.35 1.40 [2.54, 7.71] 4.65 1.16 [2.54, 6.87] CLd 2.60 1.55 [0.389,5.51] 2.65 1.47 [0.700, 5.59] 2.07 1.18 [0.535, 4.03] CLtR 0.161 0.0342 [0.0952,0.226] 0.168 0.0249 [0.119, 0.213] 0.125 0.0162 [0.095,0.156] CLtS 0.281 0.0592 [0.167,0.395] 0.291 0.0456 [0.200, 0.378] 0.215 0.00278 [0.162,0.268] ka 3.08 6.97 [0.048,9.42] 2.34 2.45 [0.138,6.74] 1.36 1.41 [0.067, 3.92] IC50-7 973 277 [511, 1540] 964 243 [534, 1430] 730 177 [423,1050] H7 3.18 0.580 [2.09, 4.32] 3.20 0.519 [2.31,4.30] 2.91 0.403 [2.20,3.71] k7 0.0784 0.0102 [0.0588,0.0981] 0.0764 0.00855 [0.061,0.094] 0.0780 0.00778 [0.065,0.094] Css7 147 42.7 [74.5, 226] 141 42.2 [65.5, 224] 152 44.5 [89.1,260] IC50-2 1100 272 [641, 1650] 1100 218 [720, 1540] 825 170 [559, 1210] H2 4.16 3.77 [0.430, 10.7] 3.86 2.74 [0.653, 9.74] 3.37 1.97 [0.840, 7.28] k2 0.0122 0.00316 [0.0066,0.0181] 0.0125 0.00266 [0.008, 0.018] 0.0124 0.00242 [0.007, 0.017] Css2 141 28.8 [87.9, 198] 136 26.8 [89.7, 183] 137 25.1 [87.0, 181] Imin 0.739 0.0582 [0.626,0.848] 0.735 0.0534 [0.625, 0.830] 0.711 0.0527 [0.630,0.851] Fm27 34.1 9.50 [18.4, 54.9] 31.3 7.17 [18.1,45.5] 37.9 7.02 [23.8, 50.7] H27 1.19 0.141 [0.924, 1.46] 1.17 0.136 [0.900, 1.43] 1.18 0.135 [0.935, 1.43] P27 0.174 0.206 [0.0043,0.558] 0.102 0.0683 [0.010,0.234] 0.0733 0.0357 [0.013,0.137] 1C(7) 96.5 25.1 [54.4, 151] 104 24.2 [64.9, 151] 107 23.0 [67.4, 150] IC(8) 97.1 9.68 [77.5, 115] 97.0 10.2 [77.5., 116] 96.2 9.93 [81.6, 118] INR48 3.06 1.31 [1.01,5.60] 3.03 0.373 [2.37, 3.72] 5.30 0.523 [4.30, 6.34] INR72 2.38 1.21 [0.939,4.60] 2.17 0.319 [1.57,2.82] 6.20 0.630 [5.09, 7.54] Table 6.34 Table of moments. CDI = central density interval. (Warfarin, RIMH) ■ c * . 00 p(CLtS) p(CLd) p(Vc) 0.2-i 0.2-1 P R I O R W F 0 2 W F 0 3 0 . 1- 0.0 0.0' 12.5 5.0 7.5 10.0 2.5 15 20 0.0 0 10 5 V C (I) Vp(l) 0 .4 - 0.2- 0 .3 - C- 0.2- 0 . 1- 0. 1- 0.0' 0.0' 0.2 0.3 0.4 0.1 6 8 0.0 0 2 4 CLd (i/hr) CLtR (I/hr) 0 .2-1 0.4-i 0.3- 0 .2 - 0 . 1- 0.0- 0.0 0.0 0.1 0.2 0.3 0.4 0.5 0.6 CLtS (I/hr) ka (/hr) Figure 6.21 Prior and posterior distributions for WF02 and WF03. (Warfarin, RIMH) 149 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. p(IC50-FII) p(k7) p(IC50-FVII) 0.2- 0.2 0 . 1- 0. 1- 0.0 0.0 500 1000 1500 2000 2500 IC 50-FV H (|xg/f) H7 0.4-, 0.3-i 0.3- 0.2- 0.2 - 0. 1- 0.1 0.0- — 0.00 0.0 0.05 0.10 0.15 200 300 400 0 100 k7 (/hr) Css7 (lunits) 0.3-1 0.75-1 0.2- 0.50- (M X •s 0 .1- 0.25- 4 ^ -------------------------- , o . 500 1000 1500 2000 2500 0 .0 - 20 10 30 IC50-FI0ig/l) Figure 6.21 (Cont’d) H 2 150 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. p(H 27) p(lm in) p(k2) 0.3- 0.3- O 0.2- 0. 1- 0 . 1- 0.0 0 .0 - 300 200 0 100 0.03 0.00 0.02 0.01 k2 (/hr) Css2 (lunits) 0.2 -) 0.3- 0.1- 0.0 0.0 0 10 20 30 40 50 60 70 80 0.50 0.75 1.00 1.25 0.00 0.25 Im in (IN R ) Fm27 (lunits) 0.3-| 0.2-i 0. 1- 0.0- 0.0 0.5 0.4 0.1 0.3 0.0 1.5 2.0 0.0 0.5 1.0 H27 P27 Figure 6.21 (Cont’d) 151 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. p(INR(t=48)) p(IC7) 0.2- 0. 1- 0. 1- 0.0- 0 .0 - 150 50 100 0 200 300 100 IC7 (lunits) IC8 (lunits) 0.75-1 0.3- 0.50- 0.1- 0.00- 5.0 10.0 12.5 10.0 12.5 0.0 2.5 7.5 7.5 5.0 0.0 2.5 MR(t»48) Figure 6.21 (Cont’d) INR(t=72) 152 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. Table 6.35 shows the results of quantile estimation using discretization method. Equally spaced 250 grids were used to discretize the parameter space. If the distribution is highly skewed like that of ka or H2, the coverage can be extremely poor. This result demonstrates why the discretization method can’t be adopted in general-purpose program. Tables 6.36 and 6.37 show the summary of the detailed posterior statistics obtained from RIMH chain for the two subjects, and Tables 6.38 shows the coverage study results for WF02 using the empirical distribution method with c set at 60% quantile of the likelihood distribution. The coverage is close to the nominal 95% both for the system parameters and output, and both for the moments and the quantiles. % Coverage of 95% Cl Prm E ( Q 2S) E ( Q 20) E (Q g o ) E ( Q ,7 .5) Vc 72 99 95 83 CLd 62 95 100 82 ka 0 27 100 79 IC50-7 71 98 99 82 H7 69 98 98 73 IC50-2 65 99 98 79 H2 27 88 100 70 INR48 54 95 98 69 INR72 53 97 99 66 Table 6.35 Percentage coverage for selected parameter quantiles using discretization method for WF02. Results from 250 equally spaced grids. (Warfarin, RIMH) 153 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. Reproduced w ith permission of th e copyright owner. Further reproduction prohibited without perm ission. Prm E(Prm) 95% Cl E(Q2 5 ) 95% Cl E(Q9 7 5 ) 95% Cl Vc 5.53 [5.38, 5.67] 2.99 [2.73,3.18] 9.42 [9.07, 10.4] Vp 5.35 [5.24, 5.46] 3.10 [2.86, 3.21] 8.67 [8.23, 9.43] CLd 2.65 [2.53,2.77] 0.808 [0.700,0.892] 6.08 [5.63,6.81] CLtR 0.168 [0.166,0.170] 0.123 [0.117,0.127] 0.218 [0.211,0.223] CLtS 0.291 [0.287,0.294] 0.211 [0.206,0.219] 0.398 [0.386, 0.413] ka 2.34 [2.13,2.54] 0.308 [0.259,0.343] 8.60 [7.22, 12.9] IC50-7 964 [946,982] 551 [515,588] 1470 [1420, 1660] H7 3.20 [3.15, 3.24] 2.34 [2.24,2.41] 4.45 [4.19,4.54] k7 0.0764 [0.0757,0.0770] 0.0615 [0.0605,0.0631] 0.0945 [0.0933,0.0981] Css7 141 [138,144] 76.5 [69.7,81.1] 245 [233,274] 1C50-2 1100 [1090,1120] 737 [699,751] 1580 [1540, 1650] H2 3.86 [3.62,4.10] 0.989 [0.804, 1.06] 11.0 [10.5, 13.6] k2 0.0125 [0.0123,0.0127] 0.00818 [0.00796,0.00848] 0.0182 [0.0177,0.0196] Css2 136 [134, 139] 92.4 [90.7, 96.1] 197 [188, 203] Imin 0.735 [0.731,0.740] 0.636 [0.625,0.641] 0.849 [0.835,0.854] Fm27 31.3 [30.7,31.8] 19.6 [19.1,20.5] 49.1 [46.5, 52.0] H27 1.17 [1.16, 1.18] 0.910 [0.892,0.931] 1.46 [1.44, 1.51] P27 0.102 [0.0975,0.107] 0.0141 [0.0123,0.0168] 0.267 [0.251,0.277] IC(7) 104 [103, 106] 65.7 [63.8, 67.8] 156 [151, 171] IC(8) 97.0 [96.2, 97.7] 78.8 [77.6, 79.7] 119 [116, 123] INR48 3.03 [3.01,3.06] 2.40 [2.37, 2.44] 3.83 [3.74, 3.90] INR72 2.17 [2.15,2.20] 1.60 [1.58, 1.64] 2.88 [2.82, 2.96] Table 6.36 Summary of the posterior statistics of WF02. (Warfarin, RIMH) Prm E(Prm) 95% Cl E(Q25) 95% Cl E(Q9 7 .s ) 95% Cl Vc 5.66 [5.46, 5.85] 3.02 [2.78,3.18] 9.78 [9.35, 10.7] Vp 4.65 [4.51,4.79 2.65 [2.51,2.80] 7.21 [6.97, 7.97] CLd 2.07 [1.93,2.21] 0.642 [0.438,0.754] 4.62 [4.18,5.27] CLtR 0.125 [0.123,0.126] 0.0965 [0.0952,0.0996] 0.162 [0.156, 0.172] CLtS 0.215 [0.213,0.218] 0.170 [0.164,0.173] 0.284 [0.272, 0.297] ka 1.36 [1.22, 1.50] 0.149 [0.134,0.204] 5.67 [4.36, 7.48] IC50-7 730 [705, 755] 457 [436,476] 1140 [1070, 1300] H7 2.91 [2.86, 2.95] 2.20 [2.11,2.26] 3.74 [3.69,4.11] k7 0.0780 [0.772, 0.0788] 0.0648 [0.0634,0.0659] 0.0929 [0.0929, 0.0871] Css7 152 [145, 159] 84.7 [77.7, 90.7] 260 [255, 278] IC50-2 825 [800, 849] 560 [507, 584] 1220 [1160, 1310] H2 3.37 [3.15,3.58] 0.963 [0.840,1.03] 8.11 [7.87, 10.9] k2 0.0124 [0.0121,0.0127] 0.00735 [0.00632,0.00903] 0.0176 [0.0170, 0.0189] Css2 137 [134, 139] 92.3 [89.2,98.0] 188 [186, 206] Imin 0.711 [0.703,0.718] 0.619 [0.604,0.630] 0.851 [0.851,0.904] Fm27 37.9 [37.1,38.7] 26.2 [25.1,27.0] 53.3 [52.7, 56.8] H27 1.18 [1.16, 1.20] 0.935 [0.852,0.972] 1.43 [1.41, 1.52] P27 0.0733 [0.0688,0.0778] 0.0186 [0.0133,0.0210] 0.155 [0.142,0.181] IC(7) 107 [105, 109] 72.3 [70.0, 75.0] 162 [157, 173] IC(8) 96.2 [95.1,97.4] 78.4 [66.6,81.8] 117 [114, 120] INR48 5.30 [5.25, 5.35] 4.37 [4.30,4.44] 6.47 [6.34, 6.64] INR72 6.20 [6.14, 6.25] 5.11 [5.01,5.16] 7.66 [7.50, 7.83] Table 6.37 Summary o f Ihe posterior statistics o f WF03. (Warfarin, RIMH) % Coverage of 95% Cl Prm E(Prm) E(Pmr) E(Q,s) E(Qs) E(Q,S ) E(Q97.j) Vc 91 91 96 92 96 94 Vp 96 95 91 88 93 95 CLd 95 91 94 91 95 93 CLtR 93 91- 93 92 96 94 CLtS 92 91 94 96 93 90 ka 90 89 91 94 94 93 IC50-7 96 97 97 96 90 94 H7 96 96 95 95 87 94 k7 95 95 93 94 93 93 Css7 92 93 94 94 98 97 IC50-2 93 93 94 95 96 95 H2 95 94 88 94 95 93 k2 92 93 96 94 94 94 Css2 93 93 90 91 91 90 Imin 94 94 96 95 95 95 Fm27 93 90 93 96 95 94 H27 94 94 97 95 92 94 P27 93 96 95 96 97 96 IC(7) 94 93 95 96 94 96 IC(8) 94 94 93 93 93 97 INR48 97 96 96 94 95 97 INR72 92 90 99 94 89 92 Table 6.38 Percentage coverage for WF02. c was set at 60% quantile (i.e., c set at 60% point of the likelihood distribution obtained from the 5,000 preliminary samples). (Warfarin, RIMH) 156 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. Figure 6.22shows the effect of c on confidence interval width. The degree of decrease is on the factor of approximately 0.5 as with the 2-parameter Busulfan example (cf. Figure 6.9). ka CLd 0.3-t I * m X 0.2 - 0.1- 0.0 85 95 55 75 65 0.t5n £ 5 a 0.10- 0.05- 0.00 95 75 85 55 65 ctquantila) c<quantila) IC50-FV II IC50-FII 30-, 1 2 ° ' l a I S 1 IQ - 85 95 55 85 75 2 0 -i IQ- 75 85 95 55 65 c<quantUa) Figure 6.22 Width of the asymptotic 95% confidence interval vs. c of selected paramters for WF02. (Warfarin, RIMH) 157 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. 7. Discussion and Conclusion Bayesian methods have had a long history since the concept of conditional probabilities was introduced by Thomas Bayes in 1763. However, Bayesian methods have not been widely applied to real-world problems, in part, because of the computational challenges posed by their implementation. As a result of the relatively recent development of Markov chain Monte Carlo algorithms such as the Metropolis algorithm (1953), and the widespread availability of powerful desk-top computers, Bayesian approaches have been successfully applied to solve complex problems in biomedicine and other fields. The Metropolis algorithm was generalized by Hastings (1970) so that it could be applied to non- symmetric proposal densities. Tierney (1994) introduced several ideas on how to use MCMC methods in Bayesian applications. Two of his ideas, the Independence Metropolis-Hastings (IMH) algorithm and the Rejection Independence Metropolis-Hastings (RIMH) algorithm, have been implemented in this work. In addition, the splitting technique of Nummelin (Nummelin 1984; Mykland, Tierney, and Yu 1995) has been used to obtain i.i.d. subchains from the Markov chain generated by the IMH algorithm. Iglehart’s regenerative analysis method (Crane and Iglehart 1974a) can be used to calculate the asymptotic confidence interval once the i.i.d. statistics are available. The RIMH algorithm and the IMH method with splitting (SIMH) overcome limitations of the generic MCMC methods, and allow the calculation of confidence intervals for posterior quantities using the central limit theorem. By having the asymptotic confidence interval available in addition to the Bayesian HPD interval, the frequentist approach and the Bayesian method are combined to allow more informative analysis. 158 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. The overall objective of the work reported in this dissertation is to develop and implement general and robust Markov chain Monte Carlo methods for Bayesian estimation involving pharmacokinetic/pharmacodynamic models. The three algorithms investigated (IMH, SIMH, and RJMH) were implemented as Bayesian estimation module in ADAPT, the PK/PD systems analysis software developed by D’Argenio and Schumitzky (1997). The newly developed program, BAYESID, was evaluated using four different models of differing complexity, from simple linear 2-parameter model of the drug Busulfan to complex nonlinear 20-parameter model of Warfarin’s pharmacokinetics and pharmacodynamics. Despite the large dimensional difference between the examples, all three algorithms were able to obtain the posterior distributions of parameters and model outputs, which were used to calculate estimates for a number of posterior quantities (e.g. means, standard deviations, HPD intervals, and quantiles). The confidence intervals obtained from the SIMH and RIMH algorithms showed close to nominal 95% coverage for the posterior moments and the posterior quantiles for almost every portion of the distributions. 7.1 Evaluation of the Methods As the problems become more complex, it takes more time to run the sampling- based estimation program. The numbers in Table 7.1 are approximate run times measured on Sun workstation (Ultra Sparc 10), and they represent the times required for generating the posterior samples. The run times do not include the analysis time, most of which involves the sorting procedure when the chain is very long. The difference in run times for SIMH was due to different degree of mixing of the IMH chain for each subject. In Figure 7.1, the 159 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. autocorrelation plots for the 50th percentile subjects from the three simulated examples along with the two Warfarin example subjects are shown. If the chain mixes slowly, both the bum-in length and the chain length need to be large, which requires long computing time (Table 7.2). The analysis time for the longest chain, for WF03, was approximately 4 hours and 34 minutes, which made the total run time 6 hours and 57 minutes. For WF02, the total run time was 9 minutes. The analysis time was less than one minute for BS10, BS50, TN10, and TN50; 4 minutes for BS90; 21 minutes for TN90; and approximately 10 minutes for Terbutaline example subjects. As for RIMH, the number of posterior samples is about 1,000 for all the subjects because c was selected such that the average subchain length could be close to 10 with a total of 100 subchains to be obtained (Table 7.2). However, the time required for constructing similar length chain by the RIMH sampler can be very different from subject to subject depending on the particular value of c used for the particular subject (Table 7.1). If c is closer to the maximum likelihood, it takes longer time to generate the posterior samples. However, the average subchain length can be more flexibly controlled by the RIMH sampler than by the SIMH method (cf. Section 7.3). For the RIMH sampler, the run time includes the time required by the preliminary run. For Busulfan example subjects, the preliminary run took 19 seconds. For Teniposide example subjects, it was about 21 seconds. Since the chain lengths were similar, the analysis time was similar for all the subjects. For Busulfan example, it was only a few seconds. For the longest chain case, WF03 of Warfarin example because o f the largest number of system parameters, the analysis time was still less than 2 minutes. 160 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. SIMH RIMH 10 (WF02) 50 (WF03) 90 10 (WF02) 50 (WF03) 90 BS(Busulfan) 20 s 28 s 1 m 39 s 29 s 30 s 55 s TN(Teniposide) 47 s 28 s 15 m 46 s 32 s 7 m TB(Terbutaline) 29 m 39 m 30 m 18 m 23 m 18 m WF(Warfarin) 4 m 2 h 23 m 6 m I h 9 m Table 7.1 Approximate run times for SIMH and RIMH for different subjects. SIMH RIMH 10 (WF02) 50 (WF03) 90 10 (WF02) 50 (WF03) 90 BS 100/5,000 100/7,000 100,25,000 1,136 1,200 1,093 TN 100/10,000 1,000/5,000 10,000/200,000 1,565 1,093 1,866 TB 1,000/50,000 1,000/70,000 5,000/50,000 1,335 1,169 1,496 WF 100/70,000 5,000/200,000 1,515 1,573 Table 7.2 Number of bum-in samples and posterior samples for SIMH, and number of posterior samples for RIMH. Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. Busulfan Example: BS50 o . a o • o . a ii Teniposide Example: TN50 20 a o 1 OO 1 2 0 1 4 0 1 4 1 o . a o • o . a Terbutaline Example: TB50 a o i o o 1 a o 200 200 a o o 400 a o o * o .. a o . a Figure 7.1 Autocorrelation plots to assess mixing. 162 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. W arfarin Example: WF02 0.6 _ o 200 ■0 6 120 W arfarin Example: WF03 0.6 2000 200 600 1 0.6 O 1 o o o ' • o Figure 7.1 (Cont’d) 163 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. 7.2 Limitations of the Methods According to Bayes’ theorem, the posterior is the prior weighted by the likelihood. In sampling based-methods, this corresponds to retaining samples from the prior distribution that seem to belong to the posterior distribution. If the particular subject under investigation represents an extreme sample from the population, for example subject WF03 of the Warfarin example, there may be computational difficulty in estimating the posterior distribution of such a subject. This is because the posterior density for such a subject may be located in the extreme tail portion of the prior distribution. Therefore most of the samples from the prior distribution are rejected (RIMH method) or provide little information as posterior samples (IMH method). This results in long computing times to estimate posterior distributions. A similar situation can occur when the data are extensive and very informative, even if the true posterior distribution is located near the center of the prior distribution. For the IMH method, these two situations result in extremely slow mixing of the chain and therefore require a long chain to be generated. For RIMH, the c to define the regeneration set C has to be closer to the maximum likelihood. In this case, longer computing time is necessary to obtain better estimates of the maximum likelihood in the preliminary run. Once the simulation is successful done and samples from the posterior distribution are obtained, any posterior quantities can be calculated from the posterior samples. In addition, SIMH and RIMH allow calculation of the asymptotic confidence interval by producing i.i.d. subchains. SIMH requires the already constructed IMH chain and is retrospective in defining the regeneration times. Therefore, if the original IMH chain mixes slowly, a proportionately longer chain is required in order to obtain meaningful results. RIMH is prospective in defining the regeneration times, and the same amount of or more 164 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. information can be extracted from shorter run length than SIMH due to improved mixing property. Both SIMH and RIMH have tuning parameters. Given a fixed length run, they can be used to control the number of i.i.d. subchains (therefore, the average subchain length) that are included in the complete chain. Both of them are defined using likelihood value. In RIMH, c was defined with respect to the likelihood distribution. In SIMH, csplit was defined with respect to the mean and minimum likelihood values from the data set in question. Selection of both parameters is problem dependent, and thus requires that the user select appropriate values to produce the desired number of subchains given the confidence needed for estimated quantities. Estimating quantiles requires more work than estimating moments. Two methods of estimating quantiles have been tried, and the preference was given to the empirical distribution method even though it was computationally expensive. Therefore, batch process is recommended when the computing time becomes longer. 7.3 Advantages o f the Methods The confidence interval from the regenerative methods gives the measure of accuracy of the answer. It can also be used to determine how long the chain should be in order to obtain answers with a certain preset degree of accuracy. Therefore, more emphasis is given to regenerative methods in this work. With the proper choice of split for SIMH and c for RIMH, close to nominal coverage was obtained for all the estimates. Once a meaningful confidence interval is obtained at a certain run length, predictions on the accuracy of answer at a any run length are possible by the straightforward formulation in 165 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. calculating the confidence interval. This is an important advantage of regenerative methods over non-regenerative methods. For the confidence interval to attain nominal coverage, there must be at least 100 or more subchains available (based on the results of the simulations considered in this work). Quantile estimation requires more work than moment estimation, and quantile estimation using SIMH method requires more caution than RIMH method. In SIMH, the average number of accepted samples in subchains must be at least 10 or more, which was determined empirically. The average subchain length can be controlled with the tuning parameters. However, the SIMH method is subject to the mixing of the original IMH chain. In other words, if the IMH chain mixes very slowly, the average subchain length to be obtained by splitting tends to be long. And there is no way to obtain more subchains than the case with split equal to the mean likelihood value. For the RIMH method, the average subchain length can be completely controllable by the constant c . 7.4 Other Issues Investigated Several additional issues were investigated in this work but not reported in detail in the dissertation. One of them is an adaptive approach to determining the posterior (Gilks, Roberts, and Sahu 1998; Tierney and Mira 1999). The attempted adaptive method was one of the so-called 2-stage methods for which the posterior samples obtained part way through the analysis are used to update the prior distribution. This effectively moves the prior mean to the area closer to the possible posterior mean. Relying on this approach did reduce 166 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. computing time for the extreme case subject like WF03 of the Warfarin example. However, the posterior estimates were subject to the particular adaptive strategy used. Some of the controversial issues for this approach are how many times to adapt, how many samples to use to update the prior statistics, and eventually what effect this approach will exert on the calculated posterior distribution. Another unreported work is using the /-distribution as the prior distribution. This idea was adopted in an effort to overcome the computational difficulty associated with extreme subjects by allowing more candidate samples from the tail portion of the proposal density to be available. However, since a simulating /-distribution increases the overall variance, this approach resulted in more computational burden, because more samples from the prior that don’t belong to the possible posterior region are produced as well. A possible solution to this problem is to minimize parameter correlations. However, this is difficult to do as part of a general method. 7.5 Future Work The error variance parameters play a crucial role in the estimation process, and the estimation results depend on these values. Estimating the error variance parameters therefore is a very challenging task. However, unlike the Gibbs sampler which samples from the full conditional distribution, it is difficult with the Metropolis-Hastings algorithm that was currently implemented in this work. This is due to the fact that the cancellation of the normalizing constants assumes that the likelihood function is invariable. To be able to 167 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. estimate variance parameters with the Metropolis-Hastings algorithm, a new approach is necessary. Other PK/PD applications of regenerative samplers such as SIMH and RIMH methods may be to Bayesian design of experiment problems, for example to obtain optimal sampling time (cf. Rosner 1999). The general experiment design method available by ADAPT is a deterministic approach, so introducing a Bayesian method would represent an important contribution. 168 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. References Athreya, K. B., and Ney, P. (1978), “A New Approach to the Limit Theory of Recurrent Markov Chains," Transactions o f the American Mathematical Society, 245, 493-501. Bayes, T. (1958), “An essay towards solving a problem in the doctrine of chances,” Biometrika, 45, 296-315. Best, N., Cowles, M. K., and Vines, K. (1996), Convergence Diagnostics and Output Analysis Software fo r Gibbs Sampling Output Version 0.30, Cambridge, U.K.: MRC Biostatistics Unit, Institute of Public Health. Brooks, S., and Gelman, A. (1996), “General Methods for Monitoring Convergence of Iterative Simulations,” MCMC Preprint Service. Brooks, S., and Gelman, A. (1998), “General Methods for Monitoring Convergence of Iterative Simulations,” Journal o f Computational and Graphical Statistics, 7,434-455. Brooks, S. P., and Roberts, G. O. (1998), “Convergence Assessment Techniques for Markov Chain Monte Carlo,” Statistics and Computing, 8, 319-335. ------------ (1999), “On Quantile Estimation and Markov Chain Monte Carlo Convergence,” Biometrika, 86, 710-717. Buell, J., Jelliffe, R., Kalaba, R., and Sridhar, R. (1970), “Modem Control Theory and Optimal Drug Regimen II: Combination Therapy,” Mathematical Biosciences, 6,67-74. Carlin, B. P., and Louis, T. (1996), Bayes and Empirical Bayes Methods fo r Data Analysis, London: Chapman & Hall. Chib, S., and Greenberg, E. (1995), “Understanding the Metropolis-Hastings Algorithm,” The American Statistician, 49, 327-335. Conroy, H. (1967), “Molecular Schrodinger Equation VHI. A New Method for Evaluation of Multidimensional Integrals,” The Journal o f Chemical Physics, 47,5307-5318. Crane, M. A., and Iglehart, D. L. (1974a), “Simulating the Stable Stochastic Systems, I: General Multiserver Queues,” Journal o f the Association fo r Computing Machinery, 21, 103-113. ------------ (1974b), “Simulating the Stable Stochastic Systems, II: Markov Chains,” Journal o f the Association fo r Computing Machinery, 21, 114-123. ------------ (1975a), “Simulating the Stable Stochastic Systems, HI: Regenerative Processes and Discrete-Event Simulations,” Operations Research, 23, 33-45. 169 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. ------------- (1975b), “Simulating the Stable Stochastic Systems, IV: Approximation Techniques,” Management Science, 21,1215-1224. D’Argenio, D. Z., and Park, K. (1997), “Uncertain Pharmacokinetic/Pharmacodynamic Systems; Design Estimation and Control,” Control Engineering Practice, 5, 1707-1716. D’Argenio, D. Z., and Schumitzky, A. (1997), ADAPTII User's Guide: Pharmacokinetic/ Pharmacodynamic Systems Analysis Software, Los Angeles: Biomedical Simulations Resource, University of Southern California. Dayneka, N. L., Garg, V., and Jusko, W. J. (1993), “Comparison of Four Basic Models of Indirect Pharmacodynamic Responses,” Journal o f Pharmacokinetics and Biopharmaceutics, 21,457-478. Dempster, A. P., Laird, N. M., and Rubin, D. B. (1977), “Maximum Likelihood From Incomplete Data via the EM Algorithm (with discussion),” Journal o f the Royal Statistical Society, Ser. B, 39,1-38. Folland, G. B. (1999), Real Analysis, 2nd ed., New York, NY: John Wiley & Sons, Inc. Forrest, A. (1998), Personal Correspondence. Fox, S. I. (1996), Human Physiology, 5th ed., Dubugue, IA: Wm. C. Brown Publishers. Garren, S., and Smith, R. L. (1995), “Estimating the Second Largest Eigenvalue of a Markov Transition Matrix,” Technical Report, University of Cambridge. Gelman, A., Bois, F., and Jiang, J. (1996), “Physiological Pharmacokinetic Analysis Using Population Modeling and Informative Prior Distributions,” Journal o f the American Statistical Association, 91,1400-1412. Gelman, A., Carlin, J. B., Stem, H. S., and Rubin, D. B. (1995), Bayesian Data Analysis, London: Chapman & Hall. Gelman, A., Roberts, G. O., and Gilks, W. R. (1996), “Efficient Metropolis Jumping Rules,” in Bayesian Statistics 5, eds. J. M. Bernardo, J. O. Berger, A. P. Dawid, and A. F. M. Smith, Oxford: Oxford University Press, 599-607. Gelman, A., and Rubin, D. B. (1992), “Inference From Iterative Simulation Using Multiple Sequences (with discussion),” Statistical Science, 7,467-511. Geman, S., and Geman, D. (1984), “Stochastic Relaxation, Gibbs Distributions and the Bayesian Restoration of Images,” IEEE Transactions on Pattern Analysis and Machine Intelligence, 6, 721-741. 170 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. Genz, A. C., and Malik, A. A. (1980), “Remarks on Algorithm 006: An Adaptive Algorithm for Numerical Integration Over an N-Dimensional Rectangular Region,” Journal o f Computational and Applied Mathematics, 6, 295-299. Geweke, J. (1992), “Evaluating the Accuracy of Sampling-Based Approaches to the Calculation of Posterior Moments,” in Bayesian Statistics 4, eds. J. M. Bernardo, A. F. M. Smith, A. P. Dawid, and J. O. Berger, Oxford: Oxford University Press, 169-193. Geyer, C. J. (1992), “Practical Markov Chain Monte Carlo (with discussion),” Statistical Science, 7,473-511. Gilks, W. R., Richardson, S., and Spiegelhalter, D. J. (1996), “Introducing Markov Chain Monte Carlo,” in Markov Chain Monte Carlo in Practice, eds. W. R. Gilks, S. Richardson, and D. J. Spiegelhalter, London: Chapman & Hall, 1-19. Gilks, W. R, Roberts, G. O., and Sahu, S. K. (1998), “Adaptive Markov Chain Monte Carlo through Regeneration,” Journal o f American Statistical Association, 93:1045-1054. Grimmett, G., and Stirzaker, D. (1982), Probability and Random Processes, Oxford: Clarendon Press; New York: Oxford University Press. Hammersley, J. M., and Handscomb, D. C. (1964), Monte Carlo Methods, London: Methuen & Co Ltd.; New York: John Wiley & Sons Inc. Hastings, W. K. (1970), “Monte Carlo Sampling Methods Using Markov Chains and Their Applications,” Biometrika, 57, 97-109. Heidelberger, P, and Welch, P. D. (1983), “Simulation Run Length Control in the Presence of an Initial Transient,” Operations Research, 31,1109-1144. Holford, N. H. G., and Sheiner, L. B. (1981), “Understanding the Dose-Effect Relationship: Clinical Application of Pharmacokinetic-Pharmacodynamic Model,” Clinical Pharmacokinetics, 6,429-453. Iglehart, D. L. (1975), “Simulating the Stable Stochastic Systems, V: Comparison of Ratio Estimators,” Naval Research Logistics Quarterly, 22, 553-565. ------------ (1976), “Simulating the Stable Stochastic Systems, VI: Quantile Estimation,” Journal o f the Association fo r Computing Machinery, 23, 347-360. Jelliffe, R. (1967), “A Mathematical Analysis of Digitalis Kinetics in Patients With Normal and Reduced Renal Function,” Mathematical Biosciences, I, 305-325. 171 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. Jusko, W. J., and Ko, H. C. (1994), “Physiologic Indirect Response Models Characterize Diverse Types of Pharmacodynamic Effects,” Clinical Pharmacology & Therapeutics, 56, 406-419. Jusko, W. J., Ko, H. C., and Ebling, W. F. (1995), “Convergence of Direct and Indirect Pharmacodynamic Response Models,” Journal o f Pharmacokinetics and Biopharmaceutics, 23, 5-8. Katz, D., Azen, S. P., and Schumitzky, A. (1981), “Bayesian Approach to the Analysis of Nonlinear Models: Implementation and Evaluation,” Biometrics, 37,137-142. Katz, D., and D’Argenio, D. Z. (1984), “Discrete Approximation of Multivariate Densities With Application to Bayesian Estimation,” Computational Statistics & Data Analysis, 2, 27-36. Katz, D., Schumitzky, A., and Azen, S. P. (1982), “Reduction of Dimensionality in Bayesian Nonlinear Regression With a Pharmacokinetic Application,” Mathematical Biosciences, 9, 47-56. Lindley, D. V. (1980), “Approximate Bayesian Methods,” in Bayesian Statistics, eds. J. M. Bernardo, M. H. Degroot, D. V. Lindley, and A. M. F. Smith, Valencia, Spain: University Press. Liu, C., Liu, J., and Rubin, D. B. (1993), “A Control Variable for Assessment the Convergence of the Gibbs Sampler,” Proceedings o f the Statistical Computing Section o f the American Statistical Association, 74-78. Medhi, J. (1994), Stochastic Processes, 2nd ed., New York: John Wiley & Sons Inc. Metropolis, N., Rosenbluth, A. W., Rosenbluth, M. N., Teller, A. H., and Teller, E. (1953), “Equation of State Calculation by Fast Computing Machines,” The Journal o f Chemical Physics, 21, 1087-1092. Meyn, S. P., and Tweedie, R. L. (1993), Markov Chains and Statistic Stability, London: Springer-Verlag. Mykland, P., Tierney, L., and Yu, B. (1995), “Regeneration in Markov Chain Samplers,” Journal o f the American Statistical Association, 90, 233-241. Naylor, J. C., and Smith, A. F. M. (1982), “Applications of a Method for the Efficient Computation of Posterior Distributions,” Applied Statistics, 31,214-225. Naylor, J. C., and Shaw, J. E., Bayes Four User Guide, Nottingham, U.K. 172 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. Nummelin, E (1978), “A Splitting Technique for Harris Recurrent Markov Chains,” Zeitschri/tfur Wahrscheinlichkeitstheorie und Verwandte Gebiete, 43, 309-318. Oosterhuis, B., Braat, P., Roos, C. M., Werner, J., and Boxtel, C. J. V. (1986), “Pharmacokinetic-Pharmacodynamic Modeling of Terbutaline Bronchodilation in Asthma,” Clinical Pharmacology and Therapeutics, 40,469-475. ------------ (1984), General Irreducible Markov Chains and Non-Negative Operators, Cambridge, U.K.: Cambridge University Press. Park, K., and D’Argenio, D. Z. (1995), “Stochastic Control of Pharmacodynamic Processes With Application to Ergotamine,” in Advanced Methods o f Pharmacokinetic and Pharmacodynamic Systems Analysis Vol. 2, ed. D. Z. D’Argenio, New York: Plenum Press, 189-204. Peskun, P. H. (1973), “Optimum Monte-Carlo Sampling Using Markov Chains,” Biometrika, 60,607-612. Press, W. H., Teukolsky, S. A., Vetterling, W. T., and Flannery, B. P. (1992), Numerical Recipes in Fortran, 2nd ed, New York: Cambridge University Press. Raftery, A. E., and Lewis, S. M. (1992), “How Many Iterations in the Gibbs Sampler?” in Bayesian Statistics 4, eds. J. M. Bernardo, J. O. Berger, A. P. Dawid, and A. F. M. Smith, Oxford: Oxford University Press, 763-773. Riegelman, S., Benet, L. Z., and Rowland, M. (1973), “Pharmacokinetics and Biopharmaceutics: A Definition of Terms,” Journal o f Pharmacokinetics and Biopharmaceutics, I, 3-4. Ripley, B. D. (1987), Stochastic Simulation, New York: John Wiley & Sons. Ritter, C., and Tanner, M. A. (1992), “Facilitating the Gibbs Sampler: The Gibbs Stopper and the Griddy-Gibbs Sampler,” Journal o f the American Statistical Association, 87, 861- 868. Roberts, G. O. (1994), “Methods for Estimating L2 Convergence of Markov Chain Monte Carlo,” in Bayesian Statistics and Econometrics: Essays in Honor o f Arnold Zellner, eds. D. Berry, K. Chaloner, and J. Geweke, Amsterdam: North Holland. Roberts, G. O., and Tweedie, R. L. (1998), “Bounds on Regeneration Times and Convergence Rates for Markov Chains,” Technical Report, University of Cambridge. Rodman, J. H. (1998), Personal correspondence. 173 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. Rodman, J. H., Abromowitch, M., Sinkule, J. A., Hayes, F. A., Rivera, G. K., and Evans, W. E. (1987), “Clinical Pharmacodynamics as a Determinant of Response in a Phase I Trial,” Journal o f Clinical Oncology, 5, 1007-1014. Rosenthal, J. S. (1995), “Minorization Conditions and Convergence Rates for Markov Chain Monte Carlo,” Journal o f the American Statistical Association, 90, 558-566. Rosner, G. L. (1999), “Simulation-Based Methods for Determining Optimal Sampling Times,” Advanced Methods o f PKJPD System Analysis (Biomedical Simulations Resource 1999 Workshop), University of Southern California. Rubin, D. B. (1984), “Bayesianly Justifiable and Relevant Frequency Calculations for the Applied Statistician,” The Annals o f Statistics, 12, 1151-1172. Rubin, D. B. (1987), Multiple Imputation fo r Nonresponses in Surveys, New York: Wiley. Seneta, E. (1981), Non-negative Matrices and Markov Chains, 2nd ed., New York: Springer- Verlag. Shah, S. S. (1990), A Comparison o f Numerical Integration Routines fo r Multiple Dimensions, Master Thesis, Los Angeles: University of Southern California. Shedler, G. S. (1993), Regenerative Stochastic Simulation, San Diego: Academic Press. Sheiner, L. B., and Beal, S. L. (1982), “Bayesian Individualization of Pharmacokinetics: Simple Implementation and Comparison with Non-Bayesian Methods,” Journal o f Pharmaceutical Sciences, 71, 1344-1348. Sheiner, L. B., Beal, S. L., Rosenberg, B., and Marathe, V. V. (1979), “Forecasting Individual Pharmacokinetics,” Clinical Pharmacology and Therapeutics, 26, 294-305. Sheiner, L. B., Halkin, H., Peck, C., Rosenberg, B., and Melmon, K. L. (1975), “Improved Computer Assisted Digoxin Therapy: A Method Using Feedback of Measured Serum Digoxin Concentrations,” Annals o f Internal Medicine, 82, 619-627. Smith, A. F. M. (1991), “Bayesian Computational Methods,” Philosophical Transactions of the Royal Society o f London, Ser. A, 337, 369-386. Smith, A. F. M., and Gelfand, A. E. (1992), “Bayesian Statistics Without Tears: A Sampling-Resampling Perspective,” The American Statistician, 46, 84-87. Smith, A. F. M., Skene, A. M., Shaw, J. E. H., and Naylor, J. C. (1987), “Progress with Numerical and Graphical Methods for Bayesian Statistics,” The Statistician, 36, 75-82. 174 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. Smith, A. F. M., Skene, A. M., Shaw, J. E. H., Naylor, J. C., and Dransfield, M. (1985), “The Implementation of the Bayesian Paradigm,” Communications in Statistics, Part A- Theory and Methods, 14,1097-1102. Tierney, L. (1992), Comment on “Inference From Iterative Simulation Using Multiple Sequences” by A. Gelman and D. B. Rubin, and “Practical Markov Chain Monte Carlo” by C. J. Geyer, Statistical Science, 7, 499-501. -------------(1994), “Markov Chains for Exploring Posterior Distributions (with discussion),” The Annals o f Statistics, 22, 1701 -1762. ------------- (1996), “Introduction to General State-Space Markov Chain,” in Markov Chain Monte Carlo in Practice, eds. W. R. Gilks, S. Richardson, and D. J. Spiegelhalter, London: Chapman & Hall, 59-74. Tiemey, L., and Kadane, J. (1986), “Accurate Approximations for Posterior Moments and Marginal Densities,” Journal o f the American Statistical Association, 81, 82-86. Tiemey L, and Mira, A. (1999), “Some Adaptive Monte Carlo Methods for Bayesian Inference,” Stat. Med., 18, 2507-2515. van Dooren, P., and de Riddler, L. (1976), “An Adaptive Algorithm for Numerical Integration Over an N-Dimensional Cube,” Journal o f Computational and Applied Mathematics, 2, 207-210. Wakefield, J. C. (1996), “Bayesian Individualization via Sampling-Based Methods,” Journal o f Pharmacokinetics and Biopharmaceutics, 24, 103-131. Wakefield, J. C., Smith, A. F. M., Racine-Poon, A., and Gelfand, A. E. (1994), “Bayesian Analysis of Linear and Nonlinear Population Models Using the Gibbs Sampler,” Applied Statistics, 41, 210-221. Yu, B. (1995), “Estimating L1 Error of Kernel Estimator: Monitoring Convergence of Markov Samplers,” Technical Report, Berkeley: Dept, of Statistics, University of California, Berkeley. Yu, B., and Mykland, P. (1998), “Looking at Markov Sampler Through Cusum Path Plots: A Simple Diagnostic Idea,” Statistics and Computing, 8, 275-286. Zellner, A., and Min, C. (1995), “Gibbs Sampler Convergence Criteria,” Journal o f the American Statistical Association, 90, 921-927. 175 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission.
Linked assets
University of Southern California Dissertations and Theses
Conceptually similar
PDF
Bayesian estimation using Markov chain Monte Carlo methods in pharmacokinetic system analysis
PDF
A user interface for the ADAPT II pharmacokinetic/pharmacodynamic systems analysis software under Windows 2000
PDF
Dynamics of the newly formed neuromuscular synapse
PDF
Comparisons of deconvolution algorithms in pharmacokinetic analysis
PDF
Development of ceramic-to-metal package for BION microstimulator
PDF
Effects of prenatal cocaine exposure in quantitative sleep measures in infants
PDF
Head injury biomechanics: Quantification of head injury measures in rear-end motor vehicle collisions
PDF
Cardiorespiratory interactions in sleep apnea: A comprehensive model
PDF
Assessment of minimal model applicability to longitudinal studies
PDF
Comparing signal processing methods for spectral bio-imaging
PDF
Biological materials investigation by atomic force microscope (AFM)
PDF
Construction and validation of multivariable population pharmacokinetic models: Utility of allometric forms to predict the pharmacokinetic disposition of gentamicin in pediatric patients with app...
PDF
A preliminary investigation to determine the effects of a crosslinking reagent on the fatigue resistance of the posterior annulus of the intervertebral disc
PDF
Pharmacokinetic/pharmacodynamic modeling for genetically polymorphic populations
PDF
A model of upper airway dynamics in obstructive sleep apnea syndrome
PDF
A pharmacokinetic study of the renal imaging agent Tc-99m DMSA in the rat
PDF
Destructive and non-destructive approaches for quantifying the effects of a collagen cross-linking reagent on the fatigue resistance of human intervertebral disc
PDF
Characteristic acoustics of transmyocardial laser revascularization
PDF
English phoneme and word recognition by nonnative English speakers as a function of spectral resolution and English experience
PDF
Application of artificial neural networks in arterial wall structure identification via laser induced fluorescent spectra
Asset Metadata
Creator
Kang, Dongwoo (author)
Core Title
Bayesian inference using Markov chain Monte Carlo methods in pharmacokinetic /pharmacodynamic systems analysis
School
Graduate School
Degree
Doctor of Philosophy
Degree Program
Biomedical Engineering
Publisher
University of Southern California
(original),
University of Southern California. Libraries
(digital)
Tag
biology, biostatistics,engineering, biomedical,Health Sciences, Pharmacology,OAI-PMH Harvest
Language
English
Contributor
Digitized by ProQuest
(provenance)
Advisor
D'Argenio, David (
committee chair
), Jelliffe, Roger (
committee member
), Schumitzky, Alan (
committee member
)
Permanent Link (DOI)
https://doi.org/10.25549/usctheses-c16-127165
Unique identifier
UC11334839
Identifier
3041477.pdf (filename),usctheses-c16-127165 (legacy record id)
Legacy Identifier
3041477-0.pdf
Dmrecord
127165
Document Type
Dissertation
Rights
Kang, Dongwoo
Type
texts
Source
University of Southern California
(contributing entity),
University of Southern California Dissertations and Theses
(collection)
Access Conditions
The author retains rights to his/her dissertation, thesis or other graduate work according to U.S. copyright law. Electronic access is being provided by the USC Libraries in agreement with the au...
Repository Name
University of Southern California Digital Library
Repository Location
USC Digital Library, University of Southern California, University Park Campus, Los Angeles, California 90089, USA
Tags
biology, biostatistics
engineering, biomedical
Health Sciences, Pharmacology