PHARMACOKINETIC/PHARMACODYNAMIC MODELING FOR
GENETICALLY POLYMORPHIC POPULATIONS
by
Xiaoning Wang
A Dissertation Presented to the
FACULTY OF THE GRADUATE SCHOOL
UNIVERSITY OF SOUTHERN CALIFORNIA
In Partial Fulfillment of the
Requirements for the Degree
DOCTOR OF PHILOSOPHY
(BIOMEDICAL ENGINEERING)
August 2007
Copyright 2007 Xiaoning Wang
Dedication
To my grandparents and parents, who will always be there for me
Acknowledgements
I would like to express my gratitude to Dr. David Z. D’Argenio. I have always
felt fortunate to have him as my advisor; he offered me tremendous guidance in both
scientific and many other ways. I would also like to thank Dr. Schumitzky for his
enlightening input to my research. Their advisement made my time at the University
of Southern California a valuable experience. I thank Dr. Khoo, Dr. Courellis and
Dr. Conti for serving on my committee and offering me their insights. Last but not least,
I thank my fiancé for his endless love and support.
Table of Contents

Dedication
Acknowledgements
List of Tables
List of Figures
Abstract
1 Introduction
1.1 Population Pharmacokinetics and Pharmacodynamics (PK/PD)
1.2 The Population PK/PD Estimation Problem
1.3 Population PK/PD Modeling for Genetically Polymorphic Populations
1.4 Specific Aims
2 Population PK/PD Analysis Background
2.1 General PK/PD Model Formulation
2.2 Population Modeling Problem
2.3 Estimation Approaches for the Population Modeling Problem
2.3.1 Parametric Maximum Likelihood Approach
2.3.2 Parametric Bayesian Approach
3 Population PK/PD Analysis with Finite Mixture Models: ML Solution via the EM Algorithm
3.1 Finite Mixture Model Formulation
3.2 Parametric Maximum Likelihood Estimation
3.3 Solution via the EM Algorithm
3.4 Classification of Individual Subjects to Subpopulations
3.5 Error Analysis for Mixture Models
3.6 Model Selection
3.7 Extension to Models with General Error Variance Structure
3.8 Extension to Models with Covariates
3.9 Numerical Integration
3.9.1 Direct Monte Carlo Sampling
3.9.2 Monte Carlo with Importance Sampling
3.9.3 Markov Chain Monte Carlo Sampling
4 Population Mixture Modeling Examples
4.1 One-compartment PK Model Example
4.1.1 The Model and Simulation Experiment
4.1.2 Estimation Results: Bias and Variability
4.1.3 Classification of Subjects
4.1.4 Numerical Integration and Convergence Issues
4.2 First-order Absorption PK Model Example
4.2.1 The Model and Simulation Experiment
4.2.2 Estimation Results: Bias and Variability
4.2.3 Classification of Subjects
4.3 One-compartment PK / Emax PD Model Example
4.3.1 The Model and Simulation Experiment
4.3.2 Estimation Results
4.4 Comments on the ML Mixture Modeling Examples
5 Identifiability and Singularity Issues
5.1 Identifiability of Mixture Distributions
5.2 Singularity of the Maximum Likelihood Problem for Mixture Models
5.2.1 Mixtures with Heteroscedastic Components
5.2.2 Mixtures with Homoscedastic Components
6 Population PK/PD Analysis with Finite Mixture Models: MAP Solution via the EM Algorithm
6.1 Parametric Bayesian Methods
6.2 Solution via the EM Algorithm
6.3 Comments on MAP Estimation
6.4 PK Model Example with MAP Estimation
6.4.1 Simulation Experiment
6.4.2 Prior Distribution
6.4.3 The Results
6.4.4 Comments on the MAP Estimation Example
7 Discussion and Future Work
7.1 Further Evaluation Studies
7.2 Error Analysis Issues
7.3 Extension to Models with Mixed Effects
References
Appendix 1: Derivation of EM Updating Formula for the ML Estimation of Finite Mixture Models
Appendix 2: Derivation of EM Updating Formula for the MAP Estimation of Finite Mixture Models
Appendix 3: Estimation of Maximum Likelihood EM Updating Formula for Covariate Mixture Model
List of Tables

Table 4-1 Summary and evaluations of parameter estimates over 200 population runs
Table 4-2 Parameter estimates by importance sampling with 1000, 2000 and 3000 samples
Table 4-3 Parameter estimates and log-likelihood values of the 25th and 50th EM iterations
Table 4-4 Summary and evaluations of parameter estimates over 300 population runs
Table 4-5 Summary of population parameter estimates
Table 4-6 Individual classification results
Table 6-1 Summary of MAP population parameter estimates
List of Figures

Figure 1-1 Conceptualization of the pharmacokinetic and pharmacodynamic framework
Figure 4-1 Population densities of distribution volume Vc and elimination rate constant k
Figure 4-2 Estimated population densities of Vc and k
Figure 4-3 Percent bias and variation of ML estimates
Figure 4-4 Histogram of percent correct classification
Figure 4-5 Convergence of log-likelihood by starting the EM algorithm from 9 positions
Figure 4-6 EM updating with 50 iterations for test data set
Figure 4-7 One-compartment PK model with first-order absorption
Figure 4-8 Population density of clearance CL
Figure 4-9 Estimated population densities of CL, Vc and ka
Figure 4-10 Percent differences of parameter estimates
Figure 4-11 Histogram of percent correct classification
Figure 4-12 Pharmacokinetic/pharmacodynamic model for β2 agonist
Figure 4-13 Mean response-time curves for two subpopulations
Figure 4-14 Estimated population density of efficacy Emax
Figure 4-15 Measurements and individual predictions of 7 subjects falsely classified into the low Emax group
Figure 4-16 Measurements and individual predictions of 2 subjects falsely classified into the normal Emax group
Figure 4-17 Individual classification results
Figure 6-1 Wishart prior distribution densities with different degrees of freedom
Figure 6-2 Histogram of percent correct classification for MAP example
Abstract
This dissertation presents a model-based approach to studying genetically
polymorphic populations as part of the drug development process. A nonlinear
random effects model with a finite mixture structure is used to identify
pharmacokinetic/pharmacodynamic phenotypes and quantify inter-subject
variability. An EM algorithm for both maximum likelihood (ML) and maximum a
posteriori probability (MAP) estimation of the model is developed. It uses sampling-
based methods to implement the expectation (E) step, followed by an analytically
tractable maximization (M) step. A benefit of the approach is that no model
linearization is performed and the estimation precision can be arbitrarily controlled
by the sampling process. By classifying each individual in the population into the most
likely subpopulation, this approach also provides a basis for further investigation of
the genetic mechanism of the polymorphism and makes subsequent genetics-based
dose individualization possible.

Four simulation studies, with pharmacokinetic/pharmacodynamic models
and various degrees of complexity in the mixture model structures, illustrate the
feasibility of the estimation approach and evaluate its performance. Some important
features of the mixture modeling approach are investigated, including model
selection, identifiability and potential singularity. Applications of the proposed
approach to clinical trial datasets, as well as extensions and further evaluations
of the approach, will be of interest for future investigation.
1 Introduction
1.1 Population Pharmacokinetics and Pharmacodynamics (PK/PD)
Pharmacokinetics (PK) is the study of the absorption, distribution,
metabolism, and excretion of drugs [39]. It describes the relationship between drug
inflow and drug concentration at various body sites. Pharmacodynamics (PD)
describes the time relationships between the concentration of a pharmacological
substance at its sites of action and the intensity of its effects. Figure 1-1 provides a
conceptual PK/PD framework.
Figure 1-1 Conceptualization of the pharmacokinetic and pharmacodynamic framework
(Jusko et al. 1995)
Traditional PK/PD studies (e.g., phase I clinical trials) usually recruit healthy
volunteers or highly selected patients, and inter-individual variability has been
viewed as a factor that needs to be minimized in these studies. However, there is
substantial variability in the way individuals respond to medication, in both
treatment efficacy and toxicity. The sources of this variability include factors such as
the individual’s physiological status (e.g., renal, liver, cardiovascular function),
disease severity, age, sex, ethnicity, and interactions with other drugs. For example,
drugs that are eliminated mostly by the kidney show greater steady-state
concentrations in patients suffering from renal failure than in patients with normal
renal function. It is important to know the extent and causes of this variability in both
clinical trials and treatments.
Population PK/PD is the study of the sources and correlates of variability in
drug concentrations and responses among individuals who are the target patient
population receiving clinically relevant doses of a drug of interest [1]. In addition to
establishing the PK/PD exposure/response relationship (in the PK/PD model),
population PK/PD analysis describes inter-individual variability in these
relationships as well as their residual variability. Compared with traditional PK/PD
evaluation, such population studies can be performed on sparsely collected data
from a large number of patients. Study design becomes more flexible with fewer
restrictions on inclusion/exclusion criteria. More importantly, population studies can
identify determining factors of inter-individual variability, such as demographic
(e.g., age, body weight), pathophysiological (e.g., organ function) or environmental
factors (e.g., smoking history), and provide the basis for a priori dose adjustments
[52]. Therefore, population PK/PD studies provide a better understanding of the
dose-exposure-response relationship among patients representative of the target
population, and offer the opportunity to individualize particular medications and
dose regimens based on each patient’s ability to respond to a specific drug.
1.2 The Population PK/PD Estimation Problem
In population PK/PD studies, formal statistical methods have been developed
to estimate the distribution of model parameters in the population and model the
relationship between PK/PD model parameters and measured sources of possible
variability (covariates). In their seminal work, Sheiner, Rosenberg and Melmon [45]
proposed a parametric nonlinear mixed-effects modeling framework for quantifying
both within and between subject variability in a drug’s pharmacokinetics, and
developed an approximate maximum likelihood solution to the problem. The general
purpose software package NONMEM implementing this approach was introduced by
Beal and Sheiner [4]. Under the assumption that the deviation of each individual
from the population is distributed normally, NONMEM involves a first-order
linearization of the PK/PD model and obtains maximum likelihood (ML) estimates for
the population mean and covariance of the model parameters given the linearized
model. Since then, many other approximate maximum likelihood algorithms have
been proposed to solve the nonlinear random and mixed effects modeling problem
(see Davidian and Giltinan [8] for an extensive review). Vonesh and Carter [53] have
proposed an alternative to maximum likelihood estimates by using a generalized
least squares method to circumvent the sensitivity of ML estimation to deviations
from normality. Davidian and Giltinan [9] presented a modification suitable for use
with general error variance structure. To improve the approximation associated with
first-order linearization, Lindstrom and Bates [25] advocate basing inference on a
refinement of linearization that may be expected to provide a better approximation.
They also provided an iterative generalized least squares implementation to solve
this first-order conditional approximation. All the above-mentioned methods (and
others) use analytical approximations to the original model or to the likelihood,
followed by the application of computational methods to obtain the solution of the
approximating equations.
To avoid such approximations and corresponding numerical complications,
Schumitzky [42] developed an EM algorithm for exact maximum likelihood
estimation of nonlinear random effects models and showed the connection between
this algorithm and certain two stage methods of population PK/PD analysis. Walker
[57] demonstrated the utility of the EM algorithm with an analytically tractable M
step, and an E step evaluated by Monte Carlo integration. As an alternative to the
direct sampling Monte Carlo method used by Walker for the integration, Bauer and
Guzy [2] used importance sampling-based Monte Carlo integration (as proposed by
Schumitzky) in the EM algorithm to perform analyses of several PK/PD population
models. Mentre [44] presented a stochastic approximation EM (SAEM) method,
which is implemented in the Matlab software MONOLIX for nonlinear mixed effects
model analysis. Kuhn and Lavielle [23] proposed to combine this algorithm with a
Markov chain Monte Carlo (MCMC) procedure. This approach is also incorporated
into MONOLIX.
The population PK/PD problem has also been cast in a Bayesian framework,
initially by Racine-Poon [37] using a two-stage approach. In the parametric full
Bayesian approach, an additional hierarchy is added by imposing a distribution
for all model parameters including the population parameters (e.g., for the mean and
covariance of the normally distributed PK/PD parameters). Sampling-based
approaches [16], especially the MCMC methods provide a solution to this
computationally demanding problem and have been applied to PK/PD models by
Wakefield and colleagues (e.g. [5] [54]). A software package for Bayesian
inference, WinBUGS, is available for parametric Bayesian modeling. Within
WinBUGS, PKBUGS provides an efficient interface for population PK/PD models
[28].
1.3 Population PK/PD Modeling for Genetically Polymorphic Populations
Despite the importance of population models incorporating measured
covariates in determining drug effects, it is now recognized that inherited differences
in the metabolism and disposition of drugs, and genetic polymorphisms in drug
transporters and the molecular targets of drug therapy can have an even greater
influence on the efficacy and toxicity of medications [12]. For example, CYP2D6 is a
human gene encoding the drug-metabolizing enzyme debrisoquin hydroxylase,
initially cloned and characterized in the late 1980s. Many opioid analgesics (e.g.,
codeine) are activated by CYP2D6 [11]. About 2 to 10% of the population, however,
is homozygous for non-functional CYP2D6 variant alleles, which renders them
relatively resistant to opioid analgesic effects. Thus there is considerable inter-
individual variability in the adequacy of pain relief when uniform doses of analgesics
such as codeine are widely prescribed. There are many other examples of genetic
polymorphism translating into clinical differences in drug disposition and effects
[59].
While some therapeutically important polymorphisms have been identified
and can be determined in individual patients prior to drug therapy decisions, many
others remain unidentified. There is, therefore, a need to develop methods of
population modeling that can extract and model important subpopulations using
PK/PD data collected in the course of drug development trials and other clinical
studies, in order to help identify other genetic determinants of drug response.
Knowledge of genetic bases for drug pharmacokinetics and pharmacodynamics
should eventually make it possible to select particular medications and dose
regimens based on the genetic ability of individual patients to metabolize, eliminate,
distribute and respond to specific drugs.
Nonparametric population PK/PD modeling approaches have been proposed
previously that address this important problem of analyzing polymorphic
populations. The nonparametric approach makes no assumptions about the form of
the underlying distribution of PK/PD model parameters in the population, and aims
to estimate the form itself from the population data. Mallet [30] introduced the
nonparametric maximum likelihood (NPML) method and used experiment design-based
algorithms to get a discrete estimate of the random variable distribution. Schumitzky
[43] applied the EM algorithm to solve this nonparametric problem. Davidian and
Gallant [7] proposed a smoothed nonparametric maximum likelihood approach
where maximum likelihood estimate is obtained over a smooth class of distributions.
Rosner and Muller [41][26] and Wakefield [55] have extended the Bayesian
framework to the nonparametric case, through the use of a general mixture model to
allow for population densities with an unknown (or infinite) number of modes.
Tatarinova [50] added random permutation sampler (RPS), birth-death Markov chain
Monte Carlo (BDMCMC) and Kullback-Leibler Markov chain Monte Carlo
(KLMCMC) methods into Bayesian analysis of mixture model with a finite but
unknown number of components. This approach efficiently solves the label-
switching and trapping problem of sampling-based Bayesian methods.
1.4 Specific Aims
The objective of this study is to investigate and develop general, robust and
computationally practical methods for population PK/PD analysis that can identify
subpopulations of patients with distinct pharmacokinetic and pharmacodynamic
phenotypes. We adopt a finite mixture model framework and develop both maximum
likelihood and maximum a posteriori probability estimation approaches. To
accomplish this overall goal, the work has the following 7 specific aims:
1. Derive the maximum likelihood estimates for a class of population PK/PD finite
mixture models, using the EM algorithm;
2. Derive the maximum a posteriori probability (MAP) estimates for a class of
population PK/PD finite mixture models, using the EM algorithm;
3. Extend the mixture modeling framework to accommodate general models for the
error variance structure relevant to PK/PD studies;
4. Extend the mixture modeling framework to incorporate PK/PD models that
include covariate relationships;
5. Incorporate different sampling-based approaches for calculating the expected
values required to solve for the maximum likelihood estimates;
6. Evaluate the performances of the population mixture modeling algorithm using
simulated examples and data sets;
7. Examine identifiability and singularity issues with population mixture models.
2 Population PK/PD Analysis Background
In this chapter, the traditional parametric PK/PD population modeling
approach (non-mixture model) is introduced and described under a variety of
assumptions, along with the approaches and methods used for estimation.
2.1 General PK/PD Model Formulation
It is assumed that the pharmacokinetics and pharmacodynamics of the drug
under study can be described by the multi-input, multi-output, nonlinear state-space
framework as follows:
\[
\frac{d\mathbf{x}_i(t)}{dt} = f_i(\mathbf{x}_i(t), \boldsymbol{\alpha}_i, \mathbf{r}_i(t), t), \qquad \mathbf{x}_i(0) = \mathbf{c}_i \tag{2.1}
\]
\[
\mathbf{y}_i(t_{ji}) = h_i(\mathbf{x}_i(t_{ji}), \boldsymbol{\alpha}_i, \mathbf{r}_i(t_{ji}), t_{ji}) + \mathbf{e}_i(t_{ji}), \qquad j = 1, \ldots, m_i \tag{2.2}
\]
where the subscript \(i\) denotes the quantity for the \(i\)th subject in the population,
\(i = 1, \ldots, N\). The state vector (e.g., drug and metabolite amounts) is \(\mathbf{x}(t) \in \mathbb{R}^n\); \(\boldsymbol{\alpha} \in \mathbb{R}^p\)
is the vector of model parameters (e.g., rate constants); \(\mathbf{r}(t) \in \mathbb{R}^k\) is the vector of
model inputs (e.g., drug infusions) and covariates (e.g., age, body weight); \(\mathbf{c}\)
is the vector of initial conditions (e.g., pre-dose drug concentration); \(\mathbf{y}(t) \in \mathbb{R}^l\) is the
vector of model predictions or outputs (e.g., drug concentrations, responses);
\(\mathbf{e}(t) \in \mathbb{R}^l\) is the vector of random output errors; and \(t_{ji}\), \(j = 1, \ldots, m_i\), are the
discrete times at which the data of the \(i\)th subject are collected.
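As a concrete illustration of the state-space formulation in Eqs. (2.1)–(2.2), the sketch below evaluates the output of a one-compartment bolus model (the model class used in the examples of Chapter 4), where the state is the drug amount and the output is its concentration. The dose, sampling times and parameter values are illustrative only, not taken from the simulation studies:

```python
import numpy as np

def one_compartment_conc(t, dose, Vc, k):
    """Output h of Eq. (2.2) for a one-compartment bolus model:
    the state x(t) = dose * exp(-k*t) solves Eq. (2.1); output y = x / Vc."""
    x = dose * np.exp(-k * np.asarray(t, dtype=float))  # drug amount (state)
    return x / Vc                                       # concentration (output)

# Illustrative sampling times t_ji and parameter values (not from the text)
t_obs = np.array([0.5, 1.0, 2.0, 4.0, 8.0])
conc = one_compartment_conc(t_obs, dose=100.0, Vc=20.0, k=0.3)
```

Adding an output error term \(\mathbf{e}_i(t_{ji})\), drawn according to the variance model of Eq. (2.4), to each output would complete a simulated measurement.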
In what follows, \(\boldsymbol{\theta}_i\) represents the collection of all the parameters for the \(i\)th
subject, \(\boldsymbol{\theta}_i = [\boldsymbol{\alpha}_i^T \,|\, \mathbf{c}_i^T]^T \in \mathbb{R}^{p+n}\); \(\mathbf{Y}_i\) represents the collection of all measured
outputs for the \(i\)th subject,
\[
\mathbf{Y}_i = [\mathbf{y}_i(t_{1i})^T \,|\, \mathbf{y}_i(t_{2i})^T \,|\, \cdots \,|\, \mathbf{y}_i(t_{m_i i})^T]^T \in \mathbb{R}^{l m_i};
\]
and \(\mathbf{e}_i\) represents the corresponding output error:
\[
\mathbf{e}_i = [\mathbf{e}_i(t_{1i})^T \,|\, \mathbf{e}_i(t_{2i})^T \,|\, \cdots \,|\, \mathbf{e}_i(t_{m_i i})^T]^T \in \mathbb{R}^{l m_i}.
\]
The vector \(h_i(\boldsymbol{\theta}_i)\) is defined as follows:
\[
h_i(\boldsymbol{\theta}_i) \equiv [h_i(\mathbf{x}_i(t_{1i}), \boldsymbol{\alpha}_i, \mathbf{r}_i(t_{1i}), t_{1i})^T \,|\, \cdots \,|\, h_i(\mathbf{x}_i(t_{m_i i}), \boldsymbol{\alpha}_i, \mathbf{r}_i(t_{m_i i}), t_{m_i i})^T]^T.
\]
The measurement equations for the \(i\)th subject can then be written as:
\[
\mathbf{Y}_i(\boldsymbol{\theta}_i) = h_i(\boldsymbol{\theta}_i) + \mathbf{e}_i \tag{2.3}
\]
where \(\mathbf{e}_i\) is the vector of additive output error terms, representing all sources of
uncertainty including measurement error. It is assumed that the errors \(\mathbf{e}_i(t_{ji})\) at
different sampling times are independent, and that the errors \(\mathbf{e}_i\) and \(\mathbf{e}_j\) of
different subjects are independent.

The output error \(\mathbf{e}_i\) is often modeled as normally distributed with zero mean
and variance defined by an error variance model. It is often necessary to express such
variance in terms of the model parameters or outputs:
\[
\operatorname{var}\{\mathbf{e}_i\} = R_i(\boldsymbol{\theta}_i, \boldsymbol{\beta}) \tag{2.4}
\]
where \(\boldsymbol{\beta} \in \mathbb{R}^q\) is a vector of subject-independent variance model parameters, and
\(R_i(\boldsymbol{\theta}_i, \boldsymbol{\beta})\) is a positive definite covariance matrix. For example, as a reasonable error
variance model for drug concentration, the standard deviation of the error can be linearly
related to the model output:
\[
\operatorname{var}\{\mathbf{e}_i\} = (\sigma_{\text{inter}} + \sigma_{\text{slope}}\, h_i(\boldsymbol{\theta}_i))^2 \tag{2.5}
\]
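A minimal sketch of the linear error variance model of Eq. (2.5), assuming a diagonal covariance matrix \(R_i\); the \(\sigma_{\text{inter}}\) and \(\sigma_{\text{slope}}\) values and the model outputs are illustrative:

```python
import numpy as np

def linear_sd_variance(h, sigma_inter, sigma_slope):
    """Eq. (2.5): the error standard deviation is linear in the model output,
    so var{e_i} = (sigma_inter + sigma_slope * h)^2 on the diagonal of R_i."""
    h = np.asarray(h, dtype=float)
    sd = sigma_inter + sigma_slope * h
    return np.diag(sd ** 2)

# Variance matrix for three (illustrative) model-predicted concentrations
R = linear_sd_variance([4.0, 2.0, 1.0], sigma_inter=0.1, sigma_slope=0.15)
```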
2.2 Population Modeling Problem
In population PK/PD modeling, it is natural to model the data from multiple
individuals hierarchically. As mentioned above, this allows the variability in
measurements to be separated into within-individual and between-individual
components. A two-stage hierarchy for the population model specification is adopted
here:
Stage 1: Kinetic/Dynamic Time Course for an Individual
At this stage of hierarchy, the data from each individual is modeled
conditional on a set of individual-specific PK/PD parameters. For the model defined
above, and the case of normal, independent output error, Eq. (2.3) can be written as:
\[
\mathbf{Y}_i \,|\, \boldsymbol{\theta}_i, \boldsymbol{\beta} \sim N(h_i(\boldsymbol{\theta}_i), R_i(\boldsymbol{\theta}_i, \boldsymbol{\beta})), \qquad i = 1, \ldots, N \tag{2.6}
\]
Several special cases for the error variance are common in PK/PD studies.
One of them has constant variance for the output error. The variance structure is
denoted by:
\[
R_i(\boldsymbol{\theta}_i, \boldsymbol{\beta}) = \operatorname{diag}\{\sigma_1^2, \ldots, \sigma_l^2 \,|\, \cdots \,|\, \sigma_1^2, \ldots, \sigma_l^2\} \in \mathbb{R}^{l m_i \times l m_i},
\]
with \(\boldsymbol{\beta} = [\sigma_1, \ldots, \sigma_l]^T\).
Another common variance assumption is given by
\[
R_i(\boldsymbol{\theta}_i, \boldsymbol{\beta}) = \begin{bmatrix} \varpi_1 & & \\ & \ddots & \\ & & \varpi_{m_i} \end{bmatrix} \tag{2.7}
\]
where \(\varpi_j = \operatorname{diag}\{\boldsymbol{\sigma}^2 (h(\boldsymbol{\theta}_i, t_{ji}))^\gamma\}\). In this case the variance of the error is related to the
output through a power function, with \(\boldsymbol{\sigma}^2 = [\sigma_1^2, \ldots, \sigma_l^2]^T \in \mathbb{R}^l\) and \(\boldsymbol{\beta} = (\boldsymbol{\sigma}^2, \gamma)\). This is a
phenomenon that is frequently observed in PK studies, often reflecting the fact that
assay precision is greater at lower concentrations. The power \(\gamma\) may be fixed or
estimated.
An alternative can also be presented as:
\[
\log(\mathbf{Y}_i) = \log(h_i(\boldsymbol{\theta}_i)) + \boldsymbol{\varepsilon}_i \tag{2.8}
\]
with \(\boldsymbol{\varepsilon}_i\) independent and identically distributed as \(N(0, R_i(\boldsymbol{\theta}_i, \boldsymbol{\beta}))\), in which case
\(R_i = \operatorname{diag}\{\sigma_1^2, \ldots, \sigma_l^2 \,|\, \cdots \,|\, \sigma_1^2, \ldots, \sigma_l^2\}\) and \(\boldsymbol{\beta} = [\sigma_1, \ldots, \sigma_l]^T\). This lognormal
model ensures the positivity of the measurements and has been widely used in
PK/PD studies.
Also note that the distributional assumptions made for PK and PD
measurements can be different. For example, discrete densities are used to model
PD data that are categorical, while PK data are generally represented using a
continuous model.
Stage 2: Inter-Individual Variation
Since each individual in the population has his/her own PK/PD parameters,
their PK/PD profiles are different. Therefore, at the second stage, their PK/PD
parameters are regarded as independent and identically distributed random vectors
governed by a given probability density. The probability density is generally
assumed to be multivariate normal:
\[
\boldsymbol{\theta}_i \sim N(\boldsymbol{\mu}, \boldsymbol{\Sigma}), \qquad i = 1, \ldots, N \tag{2.9}
\]
where the vector of population means is μ, and Σ denotes the inter-individual
covariance matrix. The population analysis problem is to estimate the unknown
parameters \(\{(\boldsymbol{\mu}, \boldsymbol{\Sigma}), \boldsymbol{\beta}\}\) based on all the population data \(\{\mathbf{Y}_1, \ldots, \mathbf{Y}_N\}\).
In a more general model with the inclusion of covariates, Eq.(2.9) is replaced
by:
\[
\boldsymbol{\theta}_i \sim N(g(\mathbf{r}_i, \mathbf{c}), \boldsymbol{\Sigma}), \qquad i = 1, \ldots, N \tag{2.10}
\]
where each \(\mathbf{r}_i\) is a vector of known individual-specific covariate measurements (e.g., sex, age,
body weight), and \(g\) is a model that depends on the covariate coefficients \(\mathbf{c}\) and the
covariate data. For example, for renally eliminated drugs, the observed inter-subject
variability in the total drug clearance (CL) may be explained by the contribution
from both gender and creatinine clearance (CrCL). The two factors can influence CL
as follows:
For males: \(CL = inter_m + slope_m \times CrCL\)

For females: \(CL = inter_f + slope_f \times CrCL\)

where \(\{inter_m, slope_m\}\) and \(\{inter_f, slope_f\}\) represent the intercept and slope values for the
linear relationships in the male and female groups, respectively.
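The gender/creatinine-clearance relationship above can be sketched as a covariate model \(g(\mathbf{r}_i, \mathbf{c})\) in the sense of Eq. (2.10); the coefficient values below are hypothetical, chosen only to make the example runnable:

```python
def clearance_model(crcl, is_male, coeffs):
    """Sex-specific linear covariate model for total clearance CL:
    CL = inter + slope * CrCL, with separate coefficients per sex."""
    if is_male:
        return coeffs["inter_m"] + coeffs["slope_m"] * crcl
    return coeffs["inter_f"] + coeffs["slope_f"] * crcl

# Hypothetical covariate coefficients c
c = {"inter_m": 1.0, "slope_m": 0.05, "inter_f": 0.8, "slope_f": 0.04}
cl_male = clearance_model(crcl=100.0, is_male=True, coeffs=c)
```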
In PK/PD studies, values of parameters such as clearance and volume of
distribution are always positive by definition. In order to make the above model
structure realistic and robust, it is necessary to ensure that the assumed distribution
cannot generate physically impossible parameter values over its entire support. For
example, under the normality assumption of Eq. (2.9), the PK/PD parameters are
usually log-transformed onto the whole real line before being incorporated into the
population distribution model.
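The log-transformation can be sketched as follows: subject parameters are drawn from the normal population model of Eq. (2.9) on the log scale and exponentiated back, which guarantees positivity. The population moments below (for a clearance and a volume) are illustrative only:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative population mean and covariance of log-transformed [CL, Vc]
mu = np.log(np.array([6.0, 20.0]))   # means on the log scale
Sigma = np.diag([0.1, 0.05])         # inter-individual covariance (log scale)

log_theta = rng.multivariate_normal(mu, Sigma, size=500)  # theta_i, log scale
theta = np.exp(log_theta)            # back-transform: every parameter is positive
```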
2.3 Estimation Approaches for the Population Modeling Problem
For the hierarchical population model defined above, the estimation is
performed on the population parameters \((\boldsymbol{\mu}, \boldsymbol{\Sigma})\) and the error variance parameter \(\boldsymbol{\beta}\),
given all the measurements \(\{\mathbf{Y}_1, \ldots, \mathbf{Y}_N\}\). Maximum likelihood and Bayesian
approaches are introduced as follows.
2.3.1 Parametric Maximum Likelihood Approach
(1) Maximum Likelihood Estimation
The aim of maximum likelihood estimation is to determine an estimate of the
parameters \(\boldsymbol{\phi} = \{\boldsymbol{\mu}, \boldsymbol{\Sigma}, \boldsymbol{\beta}\}\), so that the likelihood of the data (i.e., the probability of
obtaining that particular set of data, given the chosen probability distribution model)
will be maximum (i.e., \(\boldsymbol{\phi}_{ML} = \arg\max_{\boldsymbol{\phi}} L(\boldsymbol{\phi})\)). For large samples, maximum likelihood
ML
L φ φ = ). For large samples, maximum likelihood
estimates are unbiased with minimum variance, having approximate normal
distributions and approximate sample variances that can be calculated and used to
generate confidence bounds. Also the likelihood (or log-likelihood) functions can be
used to test hypotheses about models and parameters.
Given the PK/PD model assumptions made above, the log-likelihood function for subject $i$ is:

$\log L_i = \log\big(p(Y_i \mid \mu, \Sigma, \beta)\big) = \log\Big(\int l_i(Y_i \mid \theta_i, \beta)\, p(\theta_i \mid \mu, \Sigma)\, d\theta_i\Big)$   (2.11)

where $p(\theta_i \mid \mu, \Sigma)$ is the parameter population density for $\theta_i$, and $l_i(Y_i \mid \theta_i, \beta)$ is the joint probability density of all $m_i$ data points of subject $i$, following the formulation of Eq.(2.6) and Eq.(2.9). By integrating the density over $\theta_i$, every possible occurrence of $\theta_i$ in the population is taken into account. The logarithm of the joint likelihood for all $N$ subjects is then:
$\log L = \sum_{i=1}^{N} \log L_i = \log \prod_{i=1}^{N} \int_{-\infty}^{+\infty} l_i(Y_i \mid \theta_i, \beta)\, p(\theta_i \mid \mu, \Sigma)\, d\theta_i$   (2.12)

The maximum likelihood estimates of $\mu$, $\Sigma$ and $\beta$ maximize the total objective function $L$.
The integrals in Eq.(2.12) are multi-dimensional and have no closed form, due to the nonlinearity of the PK/PD model, so their direct evaluation and maximization can be a computational challenge. Linearized approximation methods (e.g., the first-order conditional estimation (FOCE) method) have traditionally been used to maximize an alternative objective function that is a linearized approximation to $L$. However, they can perform no better than a second-order approximation [58], and when there are few data per subject such methods can introduce bias into the estimation. The errors associated with the various linearization or likelihood approximation methods depend on the model and data (e.g., the degree of model nonlinearity, the number of observations per subject), and cannot be controlled at the estimation step. It is for this reason that methods that attempt to evaluate the required integrals directly, without recourse to analytical approximations to the original models, have been investigated. While computational methods are still required to approximate the integrals in Eq.(2.12), the degree of accuracy of the calculation can be controlled.
(2) Implementation by EM Algorithm
The expectation-maximization (EM) algorithm is a widely applicable
approach to the iterative computation of maximum likelihood estimates. It was
originally introduced by Dempster, Laird and Rubin [10] in 1977, and has been
applied to a variety of incomplete-data problems.
This algorithm is an iterative procedure for computing maximum likelihood estimates in situations where some of the data are unobserved, leaving the observations as "incomplete data". In brief, each iteration of the EM algorithm consists of two steps: the expectation (E) step and the maximization (M) step. The E step calculates the conditional expectation of the complete-data log-likelihood given the incomplete data, and the M step maximizes this quantity with respect to the unknown parameters over the parameter space. Starting from any point, the E and M steps define a sequence of points in this space. A desirable property of the algorithm is that this sequence is likelihood-improving, which guarantees convergence to a local maximum of the joint likelihood.

The advantage of the EM algorithm is that, for many problems, the complete-data likelihood has a form that makes both the expectation and maximization steps tractable without analytical approximation of the original models. Thus, the EM algorithm is well suited for the population PK/PD modeling problem. Schumitzky [42] introduced this algorithm into population PK analysis, and Walker [57] demonstrated its application to a class of nonlinear random effects models.
Let $(Y, \theta)$ represent the complete data, so that the individual PK/PD parameters $\{\theta_i\}$ represent the missing data. Under the model assumption of Eq.(2.9), the parameter population density $p(\theta_i \mid \mu, \Sigma)$ has the form of a multivariate normal distribution. Letting $\phi^{(r)} = \{\mu^{(r)}, \Sigma^{(r)}, \beta^{(r)}\}$, $r \geq 0$, be the current estimate of the population parameters, the following EM updating scheme holds:

$\mu^{(r+1)} = \frac{1}{N}\sum_{i=1}^{N} \bar{\theta}_i$   (2.13)

$\Sigma^{(r+1)} = \frac{1}{N}\sum_{i=1}^{N} \bar{\Sigma}_i$   (2.14)

where $\bar{\theta}_i = E(\theta_i \mid Y_i, \mu^{(r)}, \Sigma^{(r)}, \beta^{(r)})$ is the conditional mean vector for subject $i$, and $\bar{\Sigma}_i = E\big((\theta_i - \mu^{(r+1)})(\theta_i - \mu^{(r+1)})' \mid Y_i, \mu^{(r)}, \Sigma^{(r)}, \beta^{(r)}\big) = (\bar{\theta}_i - \mu^{(r+1)})(\bar{\theta}_i - \mu^{(r+1)})' + B_i$, with $B_i = E\big((\theta_i - \bar{\theta}_i)(\theta_i - \bar{\theta}_i)' \mid Y_i, \mu^{(r)}, \Sigma^{(r)}, \beta^{(r)}\big)$ the conditional covariance matrix for the $i$th subject.
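Once the per-subject conditional moments are available (e.g., from Monte Carlo samples of each $p(\theta_i \mid Y_i, \phi^{(r)})$, as discussed below), the updates (2.13) and (2.14) are simple averages. A minimal numpy sketch of that step, assuming the posterior samples have already been generated elsewhere:

```python
import numpy as np

def em_update(posterior_samples):
    """One EM update of (mu, Sigma) from per-subject conditional samples.

    posterior_samples: list of arrays, one per subject, each of shape (R, p),
    drawn from p(theta_i | Y_i, phi^(r)).
    """
    # Conditional means theta_bar_i (Monte Carlo approximation of the E step)
    theta_bar = np.array([s.mean(axis=0) for s in posterior_samples])
    mu_new = theta_bar.mean(axis=0)                       # Eq.(2.13)

    # Sigma update, Eq.(2.14):
    # Sigma_bar_i = (theta_bar_i - mu_new)(theta_bar_i - mu_new)' + B_i
    p = theta_bar.shape[1]
    sigma_new = np.zeros((p, p))
    for s, tb in zip(posterior_samples, theta_bar):
        B_i = np.cov(s, rowvar=False, bias=True)          # conditional covariance B_i
        d = (tb - mu_new)[:, None]
        sigma_new += d @ d.T + B_i
    sigma_new /= len(posterior_samples)
    return mu_new, sigma_new
```

The sketch makes explicit that, given the E-step samples, the M step for a single multivariate normal population model is closed-form.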
(3) Error Analysis
The EM algorithm presented above does not automatically provide an
estimate of standard errors, or full covariance matrix of the obtained maximum
likelihood estimates. Hence a method for such error analysis is considered here. This
method is based on approximating the asymptotic covariance matrix, calculated
within the EM framework.
Assuming the regularity conditions of Philippou and Roussas [36], it can be shown that asymptotically, as $N \to \infty$,

$Cov(\hat{\phi}_{ML}) \approx \Big(\sum_{i=1}^{N} V_i(\hat{\phi}_{ML})\Big)^{-1}$,

where $V_i(\phi) = \big(\frac{\partial}{\partial\phi}\log p(Y_i \mid \phi)\big)\big(\frac{\partial}{\partial\phi}\log p(Y_i \mid \phi)\big)^T$.

Now

$\frac{\partial}{\partial\phi}\log p(Y_i \mid \phi) = \int \Big\{\frac{\partial}{\partial\phi}\log p(Y_i, \theta_i \mid \phi)\Big\}\, p(\theta_i \mid Y_i, \phi)\, d\theta_i$,

and the gradient components are calculated as

$s_{\mu}^{(i)} = \frac{\partial}{\partial\mu}\log p(Y_i \mid \phi) = \int p(\theta_i \mid Y_i, \phi)\,\{\Sigma^{-1}(\theta_i - \mu)\}\, d\theta_i$,

$s_{\Sigma}^{(i)} = \frac{\partial}{\partial A}\log p(Y_i \mid \phi) = -\,lower\big\{\Sigma^{-1} - \Sigma^{-1}\bar{\Sigma}_i\Sigma^{-1} - \tfrac{1}{2}\,diag\big(\Sigma^{-1} - \Sigma^{-1}\bar{\Sigma}_i\Sigma^{-1}\big)\big\}$,

where $A$ is the lower triangular matrix of $\Sigma$, and $\bar{\Sigma}_i$ is the same component as in Eq.(2.14), given by $\bar{\Sigma}_i = \int p(\theta_i \mid Y_i, \phi)\,(\theta_i - \mu)(\theta_i - \mu)'\, d\theta_i$;

$s_{\sigma^2}^{(i)} = \frac{\partial}{\partial\sigma^2}\log p(Y_i \mid \phi) = \int p(\theta_i \mid Y_i, \phi)\Big\{-\frac{m_i}{2\sigma^2} + \frac{(Y_i - h_i(\theta_i))^T H_i(\theta_i)^{-1}(Y_i - h_i(\theta_i))}{2\sigma^4}\Big\}\, d\theta_i$.

Introduce the notation $s_{\varpi}^{(i)} = \big((s_{\Sigma}^{(i)})_{1,1}, (s_{\Sigma}^{(i)})_{2,1}, \ldots, (s_{\Sigma}^{(i)})_{p,p}\big)$, where $(s_{\Sigma}^{(i)})_{j_1,j_2}$ is the component of $s_{\Sigma}^{(i)}$ in the $(j_1, j_2)$ position. Putting these results together produces the vector

$s_i = \big(s_{\mu}^{(i)}, s_{\varpi}^{(i)}, s_{\sigma^2}^{(i)}\big)$,

so that

$Cov(\hat{\phi}_{ML}) = \Big(\sum_{i=1}^{N} s_i\, s_i^T\Big)^{-1}$.
All the computations outlined above can be performed during the sampling-based calculation at the final iteration of the EM algorithm; therefore, little extra work is required beyond the calculation of the MLEs.
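The final step of the error analysis is just an inverse of the summed outer products of the per-subject score vectors $s_i$. A small sketch, assuming the scores have already been evaluated by the sampling-based calculation:

```python
import numpy as np

def asymptotic_cov(scores):
    """Approximate Cov(phi_ML) ~ (sum_i s_i s_i')^{-1} from per-subject scores.

    scores: array of shape (N, q), with row i holding the score vector s_i.
    """
    info = sum(np.outer(s, s) for s in scores)   # approximate information matrix
    return np.linalg.inv(info)

def standard_errors(scores):
    """Standard errors are the square roots of the diagonal of the covariance."""
    return np.sqrt(np.diag(asymptotic_cov(scores)))
```

The standard errors obtained this way are the basis for the asymptotic confidence bounds mentioned earlier.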
(4) Error Variance Model

In general, the update of the parameter vector $\beta$ associated with the error variance model is performed by maximizing the data likelihood over $\beta$. When the error variance model takes the simple form $R_i(\theta_i, \beta) = \sigma^2 R_i(\theta_i)$, with $\beta = \sigma^2$, it leads to the recursive updating formula:

$(\sigma^2)^{(r+1)} = \frac{\sum_{i=1}^{N} E\big((Y_i - h_i(\theta_i))'\, R_i(\theta_i)^{-1}\, (Y_i - h_i(\theta_i)) \mid Y_i, \mu^{(r)}, \Sigma^{(r)}, \beta^{(r)}\big)}{\sum_{i=1}^{N} m_i}$   (2.15)
For a more general error variance model, taking a simple linear case $R_i(\theta_i, \beta)^{1/2} = \beta_1 I + \beta_2 H_i(\theta_i)$ for instance, the gradient equations serving as the M step for $\beta$ are of the form (with the quantities understood elementwise):

$\sum_{i=1}^{N}\int_{-\infty}^{+\infty} p(\theta_i \mid Y_i, \phi^{(r)})\Big\{-\frac{1}{\beta_1 + \beta_2 H_i(\theta_i)} + \frac{(Y_i - h_i(\theta_i))^2}{(\beta_1 + \beta_2 H_i(\theta_i))^3}\Big\}\, d\theta_i = 0$

$\sum_{i=1}^{N}\int_{-\infty}^{+\infty} p(\theta_i \mid Y_i, \phi^{(r)})\Big\{-\frac{H_i(\theta_i)}{\beta_1 + \beta_2 H_i(\theta_i)} + \frac{H_i(\theta_i)\,(Y_i - h_i(\theta_i))^2}{(\beta_1 + \beta_2 H_i(\theta_i))^3}\Big\}\, d\theta_i = 0$   (2.16)

Such maximization steps are intractable and do not lead to a closed-form updating formula like Eq.(2.15).
One approach to estimating $\beta$ in a general error variance model is to use one step of a gradient optimization procedure [2], performed on the data likelihood with respect to $\beta$, i.e., $\max_{\beta} \sum_{i=1}^{N}\log L_i$. A Gauss-Newton one-step update of $\beta$ at the end of each EM iteration is given by

$\beta^{(r+1)} = \beta^{(r)} - H_{\beta}^{-1}\,\frac{\partial L}{\partial \beta}$.

Each element of the vector $\partial L / \partial \beta$ can be evaluated as:

$\frac{\partial L}{\partial \beta_j} = -\frac{1}{2}\sum_{i=1}^{N}\int_{-\infty}^{+\infty} tr\Big\{R_i(\theta_i,\beta)^{-1}\,\frac{\partial R_i(\theta_i,\beta)}{\partial \beta_j}\,R_i(\theta_i,\beta)^{-1}\big[R_i(\theta_i,\beta) - (Y_i - h_i(\theta_i))(Y_i - h_i(\theta_i))'\big]\Big\}\, p(\theta_i \mid Y_i, \phi)\, d\theta_i$.
The second derivative $H_{\beta}$ can be approximated by:

$(H_{\beta})_{j_1 j_2} = \frac{1}{2}\sum_{i=1}^{N}\int_{-\infty}^{+\infty} tr\Big\{R_i(\theta_i,\beta)^{-1}\frac{\partial R_i(\theta_i,\beta)}{\partial \beta_{j_1}}\, R_i(\theta_i,\beta)^{-1}\frac{\partial R_i(\theta_i,\beta)}{\partial \beta_{j_2}}\Big\}\, p(\theta_i \mid Y_i, \phi)\, d\theta_i$,

for the element of the Hessian matrix $H_{\beta}$ in the $(j_1, j_2)$ position. The derivative $\partial R_i(\theta_i,\beta)/\partial \beta_j$ can generally be evaluated analytically, since the functional form of the variance matrix $R_i(\theta_i,\beta)$ is user-defined and considered known.
(5) Covariate Model

Under the model assumption with covariates given by Eq.(2.10), the update of the covariate coefficients $c$ involves a least-squares evaluation of the function

$\frac{1}{2}\sum_{i=1}^{N}\big(\bar{\theta}_i - g(r_i, c)\big)'\,\Sigma^{-1}\big(\bar{\theta}_i - g(r_i, c)\big)$

with respect to $c$, serving as a maximization step inside each EM iteration [2]. An analytically tractable maximization step exists when a linear relationship between the individual parameters and the individual-specific covariate data is considered, i.e., $g(r_i, c) = A_i c$, where $A_i$ is the individual-specific design matrix containing all the information in $r_i$. The updating formula is:

$c = W^{-1}\sum_{i=1}^{N} A_i'\,(\Sigma^{(r)})^{-1}\,\bar{\theta}_i$   (2.17)

where $W = \sum_{i=1}^{N} A_i'\,(\Sigma^{(r)})^{-1} A_i$. The same solution is also provided by the ECM algorithm of Meng and Rubin [32].
(6) Evaluation of the Expectations

All the conditional expectations in the EM updating formulas can be expressed as integrals of the form:

$E\big(f(\theta_i) \mid Y_i, \mu, \Sigma, \beta\big) = \frac{\int_{-\infty}^{+\infty} f(\theta_i)\, l_i(Y_i \mid \theta_i, \beta)\, p(\theta_i \mid \mu, \Sigma)\, d\theta_i}{\int_{-\infty}^{+\infty} l_i(Y_i \mid \theta_i, \beta)\, p(\theta_i \mid \mu, \Sigma)\, d\theta_i}$   (2.18)

where $f(\theta_i)$ denotes any function for which an expectation is required (e.g., $\theta_i$, $(\theta_i - \mu^{(r+1)})(\theta_i - \mu^{(r+1)})'$, or $(\theta_i - \bar{\theta}_i)(\theta_i - \bar{\theta}_i)'$). Such complicated high-dimensional integrals can be evaluated in practice by sampling-based approaches such as direct-sampling Monte Carlo methods, which allow arbitrarily accurate estimates of the integrals by randomly sampling over the parameter space of $\theta_i$ and calculating a weighted average of the quantity of interest. Sampling-based approaches will be discussed in the next chapter.
2.3.2 Parametric Bayesian Approach
A Bayesian analysis involves fitting a probability model to a set of data and summarizing the result by a probability distribution on the parameters of the model [17]; i.e., the parameters under estimation are treated as random variables, and the estimates are expressed through their distributions (the posterior distributions). This method allows the incorporation of prior knowledge about the model parameters (through prior distributions), which is useful, for example, when the collected data are sparse but prior information about the model parameters is available.
(1) Model Setting

Wakefield [54] proposed a parametric Bayesian approach for population PK/PD analysis by adding a third stage to the hierarchical model, containing the prior distributions of the population parameters to be estimated:

Stage 3: Prior specification for $\mu$, $\Sigma$, $\{\sigma_k, k = 1, \ldots, l\}$:

$\mu \sim N(\eta, H)$

$\Sigma \sim IW(\rho, R)$

$\sigma_k^2 \sim IG(a_k, b_k), \quad k = 1, \ldots, l$   (2.19)

where $IW$ denotes the inverse Wishart distribution and $IG$ the inverse gamma distribution. $\eta$, $H$, $\rho$, $R$, $a_k$ and $b_k$ are called hyperparameters, and their values must be stated explicitly. Here the error variance model at stage 1 is assumed to take the form of a constant variance. The distributional forms of the priors have traditionally been chosen from a conjugate family, which means the posterior distribution follows the same parametric form as the prior distribution. This conjugacy property is mathematically convenient in that the posterior distribution has a known parametric form.
A Bayesian population analysis involves the calculation of the joint posterior distribution of all unobserved random quantities (i.e., the population mean $\mu$ and covariance $\Sigma$, the individual subject parameters $\theta_i$, and the output error variance $\sigma^2$), conditioned on the measured data. The following additional notation is adopted. Let $Y$ be the set of all measurements from all subjects, $Y = [Y_1^T \mid \ldots \mid Y_N^T]^T$, and let $\theta$ denote all the individuals' parameter vectors, $\theta = [\theta_1^T \mid \ldots \mid \theta_N^T]^T$. From Bayes' theorem we can write:

$p(\theta, \mu, \Sigma, \sigma^2 \mid Y) = \frac{p(Y \mid \theta, \mu, \Sigma, \sigma^2)\, p(\theta, \mu, \Sigma, \sigma^2)}{p(Y)} \propto p(Y \mid \theta, \mu, \Sigma, \sigma^2)\, p(\theta, \mu, \Sigma, \sigma^2)$   (2.20)

Under the assumption of conditional independence among the elements of $Y$ and $\theta$, and independence of $\mu$, $\Sigma$, and $\sigma$, Eq.(2.20) can be written as:

$p(\theta, \mu, \Sigma, \sigma^2 \mid Y) \propto p(Y \mid \theta, \sigma^2)\, p(\theta \mid \mu, \Sigma)\, p(\mu)\, p(\Sigma)\, p(\sigma^2)$   (2.21)

where the estimation of the conditional density $p(\theta, \mu, \Sigma, \sigma^2 \mid Y)$ is the goal of the Bayesian population problem.
(2) MCMC Computations

A Bayesian analysis requires integration over the space of parameters, which for PK/PD models is typically not available in closed form. Markov chain Monte Carlo methods provide a useful computational strategy for this problem [5].

Markov chain Monte Carlo (MCMC) methods allow calculations based on the joint posterior distribution of interest by providing a mechanism for generating a set of realizations from that distribution. This is achieved via Monte Carlo simulation from a Markov chain constructed so that its stationary distribution is the relevant posterior distribution, i.e.,

$\lim_{i \to \infty}\{\theta^{(i)}, \mu^{(i)}, \Sigma^{(i)}, (\sigma^2)^{(i)}\} \sim p(\theta, \mu, \Sigma, \sigma^2 \mid Y)$,

where $i$ is the iteration number. Therefore, by running the chain long enough, the resulting samples are draws from the joint posterior distribution of interest.
There are many sampling-based approaches for carrying out MCMC methods, among which the Gibbs sampler has been widely taken up. This algorithm is an iterative Monte Carlo procedure that extracts the marginal distributions from the set of full conditional distributions. The Markovian updating scheme proceeds as follows. Given an arbitrary starting set of values $U_1^{(0)}, U_2^{(0)}, \ldots, U_k^{(0)}$, where the $U$'s are stochastic variables, we draw $U_1^{(1)} \sim p(U_1 \mid U_2^{(0)}, \ldots, U_k^{(0)})$, then $U_2^{(1)} \sim p(U_2 \mid U_1^{(1)}, U_3^{(0)}, \ldots, U_k^{(0)})$, and so on, up to $U_k^{(1)} \sim p(U_k \mid U_1^{(1)}, U_2^{(1)}, \ldots, U_{k-1}^{(1)})$. This completes one cycle, generating $k$ random variables. After $i$ such iterations we arrive at $U_1^{(i)}, U_2^{(i)}, \ldots, U_k^{(i)}$. It has been proved that, under mild conditions, the convergence

$(U_1^{(i)}, \ldots, U_k^{(i)}) \xrightarrow{d} p(U_1, \ldots, U_k)$

holds, and hence for every $s$ ($1 \leq s \leq k$), $U_s^{(i)} \xrightarrow{d} p(U_s)$. The Gibbs sampler is a highly effective Bayesian computation strategy and has been applied to complex stochastic models involving very large numbers of variables.
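The cycling scheme above can be illustrated on a toy target, a standard bivariate normal with correlation $\rho$, whose two full conditionals are univariate normals. This is only a sketch of the mechanics, not the PK/PD sampler itself:

```python
import numpy as np

def gibbs_bivariate_normal(rho, n_iter=5000, seed=0):
    """Gibbs sampler for a standard bivariate normal with correlation rho.

    Full conditionals: x | y ~ N(rho*y, 1-rho^2) and y | x ~ N(rho*x, 1-rho^2).
    """
    rng = np.random.default_rng(seed)
    sd = np.sqrt(1.0 - rho**2)
    x, y = 0.0, 0.0                    # arbitrary starting values
    draws = np.empty((n_iter, 2))
    for i in range(n_iter):
        x = rng.normal(rho * y, sd)    # draw x from p(x | y)
        y = rng.normal(rho * x, sd)    # draw y from p(y | x)
        draws[i] = x, y
    return draws

samples = gibbs_bivariate_normal(rho=0.8)
# After discarding a burn-in, the empirical correlation approaches rho.
corr = np.corrcoef(samples[1000:].T)[0, 1]
```

Running the chain "long enough", as the text puts it, is what the burn-in discard and the large iteration count stand in for here.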
Given the population PK/PD model formulation defined in Eq.(2.9) and Eq.(2.19), the full conditional distributions of $\mu$, $\Sigma$, and $\sigma$ can be derived:

$p(\mu \mid Y, \theta, \Sigma, \sigma^2) = N\Big((H^{-1} + N\Sigma^{-1})^{-1}\big(H^{-1}\eta + \Sigma^{-1}\sum_{i=1}^{N}\theta_i\big),\; (H^{-1} + N\Sigma^{-1})^{-1}\Big)$   (2.22)

$p(\Sigma^{-1} \mid Y, \theta, \mu, \sigma^2) = W\Big(\big[\sum_{i=1}^{N}(\theta_i - \mu)(\theta_i - \mu)' + R^{-1}\big]^{-1},\; \rho + N\Big)$   (2.23)

$p(\sigma_k^{-2} \mid Y, \theta, \mu, \Sigma) = Ga\Big(a_k + \frac{N}{2},\; \big[\sum_{i=1}^{N}(Y_{ki} - h_{ki}(\theta_i))'(Y_{ki} - h_{ki}(\theta_i))/2 + b_k\big]^{-1}\Big)$   (2.24)

$p(\theta_i \mid Y, \theta_j\,(j \neq i), \mu, \Sigma, \sigma^2) \propto \exp\Big\{-\frac{1}{2}\Big[\sum_{j=1}^{m_i}[Y_{ji} - h_{ji}(\theta_i, t_{ji})]'\,\Omega^{-1}\,[Y_{ji} - h_{ji}(\theta_i, t_{ji})] + (\theta_i - \mu)'\Sigma^{-1}(\theta_i - \mu)\Big]\Big\}$   (2.25)

where $\Omega = diag\{\sigma_1^2, \sigma_2^2, \ldots, \sigma_l^2\}$.

Generation of the random variables $\mu$, $\Sigma$ and $\sigma$ is straightforwardly achieved through Eq.(2.22) to Eq.(2.24). To generate variables from the conditional distribution of $\theta_i$, however, which has the form of an un-normalized density as in Eq.(2.25), advanced sampling techniques such as the Metropolis-Hastings algorithm can be applied. Once samples from the posterior density have been obtained, performing additional analysis of the posterior distributions is relatively simple.
3 Population PK/PD Analysis with Finite Mixture Models:
ML Solution via the EM Algorithm

3.1 Finite Mixture Model Formulation

Finite mixture models have been used extensively when the measurements of random variables are taken under different conditions, or when the population of sampling units consists of a number of subpopulations, each defined by a relatively simple model. The traditional formulation of a K-component finite mixture model can be written as:

$z_i \sim \sum_{k=1}^{K} w_k\, p(\phi_k), \quad i = 1, \ldots, n$   (3.1)

where the $z_i$ ($i = 1, \ldots, n$) are i.i.d. random samples, the $p(\phi_k)$ are the component densities of the mixture, and the $\{w_k\}$ (the mixing weights) are non-negative quantities that sum to one, i.e., $\sum_{k=1}^{K} w_k = 1$ with $0 \leq w_k \leq 1$. In this formulation and the following derivations, the number of components $K$ is considered known and fixed. The parameters $\phi$ of the specified component densities are unknown and must be inferred from the data.
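A finite mixture as in Eq.(3.1) can be simulated by first drawing a component label with probabilities $\{w_k\}$ and then drawing from that component. A brief sketch with two normal components, using illustrative weights and parameter values:

```python
import numpy as np

rng = np.random.default_rng(0)

# Two-component normal mixture: weights, means, standard deviations (illustrative)
w = np.array([0.7, 0.3])
mu = np.array([0.0, 5.0])
sd = np.array([1.0, 0.5])

def sample_mixture(n):
    """Draw n samples from sum_k w_k N(mu_k, sd_k^2), returning the labels z_i too."""
    labels = rng.choice(len(w), size=n, p=w)      # component indicator z_i
    return rng.normal(mu[labels], sd[labels]), labels

def mixture_pdf(x):
    """Density of the mixture at the points x."""
    comps = np.exp(-0.5 * ((x[:, None] - mu) / sd) ** 2) / (sd * np.sqrt(2 * np.pi))
    return comps @ w

z, labels = sample_mixture(10000)
```

The latent labels generated here are exactly the component indicators that the EM derivation below treats as missing data.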
The above finite mixture modeling structure has been successfully applied in a variety of areas, including astronomy, biology, genetics, medicine, engineering, and many other fields. Maximum likelihood and Bayesian approaches have been developed for model fitting. A comprehensive discussion of the major issues involved in mixture modeling, such as identifiability problems, singularity properties of ML estimators, and the assessment of the number of components, has also been provided [31].
For population PK/PD studies with genetically polymorphic populations, each subpopulation reacts to the drug differently as a consequence of genetic or other characteristics. A finite mixture model on the PK/PD parameters arises naturally in this case. The model can handle heterogeneous populations consisting of K groups with sizes proportional to their weights $\{w_k\}$. In particular, the same hierarchical structure of section 2.2 is followed, and the model assumption changes only at the second stage. For the population problem, instead of the uni-modal multivariate normal density of Eq.(2.9), a finite normal mixture model for the PK/PD parameters $\theta_i$ is considered, with density of the form:

$\theta_i \sim \sum_{k=1}^{K} w_k\, N(\mu_k, \Sigma_k), \quad i = 1, \ldots, N$   (3.2)

where the mixture components $N(\mu_k, \Sigma_k)$ are multivariate normal densities with mean vectors $\mu_k$ and covariance matrices $\Sigma_k$. The weights $\{w_k\}$ are nonnegative numbers summing to one, denoting the relative size of each subpopulation. Letting $\phi$ represent the population parameters, $\phi = \{(w_k, \mu_k, \Sigma_k),\, k = 1, \ldots, K;\, \beta\}$, with $\beta$ a parameter vector in the error variance model, the problem is to estimate $\phi$ given the population data $\{Y_1, \ldots, Y_N\}$.
When considering random effect covariates, a more general model in mixture form is given by:

$\theta_i \sim \sum_{k=1}^{K} w_k\, N(g_k(r_i, c_k), \Sigma_k)$   (3.3)

with $r_i$ the known individual covariate data and $c_k$ the covariate coefficients of the kth subpopulation.
3.2 Parametric Maximum Likelihood Estimation

The maximum likelihood approach was briefly introduced in section 2.3.1. In the case of a finite mixture model, the probability density of $Y_i$ given $\phi$ is:

$p(Y_i \mid \phi) = \int_{-\infty}^{+\infty} l_i(Y_i \mid \theta_i, \phi)\, p(\theta_i \mid \phi)\, d\theta_i$   (3.4)

which, using Eq.(3.2), can be written as:

$p(Y_i \mid \phi) = \sum_{k=1}^{K} w_k \int_{-\infty}^{+\infty} l_i(Y_i \mid \theta_i, \beta)\, p(\theta_i \mid \mu_k, \Sigma_k)\, d\theta_i$   (3.5)

Under the independence assumption on the data $\{Y_1, \ldots, Y_N\}$, the overall likelihood function is then given by:

$L = \prod_{i=1}^{N} p(Y_i \mid \phi)$   (3.6)

The maximum likelihood estimator of $\phi$ is defined as $\hat{\phi}_{ML} = \arg\max L$.

The computational challenge associated with the maximum likelihood approach involves the calculation of the integrals in Eq.(3.5), analogous to the non-mixture case. One approach to this problem is to use linearizing approximations to each individual likelihood function (i.e., $p(Y_i \mid \phi)$), followed by the application of computational methods to obtain the solution of the approximating equations. As in the non-mixture case (i.e., when the PK/PD parameters are modeled using a single normal distribution), such approximation methods include the Laplacian method, the first-order (FO) approximation method and the first-order conditional estimation (FOCE) method, all of which have been implemented in the NONMEM software [3] for mixture model analysis as well.
3.3 Solution via the EM Algorithm

As noted above, the EM algorithm is an efficient method for maximum likelihood estimation that avoids analytical approximations. In the EM context for finite mixture models, the data vector $Y^N = \{Y_1, \ldots, Y_N\}$ is viewed as incomplete, since the PK/PD parameters $\{\theta_i, i = 1, \ldots, N\}$ and the component label vectors $\{z_i, i = 1, \ldots, N\}$ are both unavailable (i.e., they are viewed as the missing data). The vector $z_i$ is defined as a K-dimensional indicator variable such that $z_i(k)$ is one or zero according to whether or not the parameter $\theta_i$ arises from the kth component of the mixture. As mentioned in section 2.3, by adding the unobservable data to the problem to form a "complete" data set, the problem can be handled by the EM algorithm. The E step takes the conditional expectation of the complete-data log-likelihood, given the observed data $Y^N$. The M step maximizes this conditional expectation with respect to the population parameter vector $\phi$ over the parameter space and gives the updated estimate of $\phi$. By proceeding iteratively with these two steps, the likelihood function is non-decreasing.
What follows is a summary of the derivation of the solution to the maximum likelihood problem with a finite mixture of normals; the complete derivation is given in Appendix 1. The assumed error variance model structure is of multiplicative form, $R_i(\theta_i, \beta) = \sigma^2 R_i(\theta_i)$, where $\beta = \sigma$ is a constant to be estimated (a more general error variance model is considered below). If $\phi^{(0)}$ is any initial point in the space of $\phi$, and $\phi^{(r)}$ is the current estimate of $\phi$, it can be shown that the components of the updated estimate of $\phi$ are given by:

$w_k^{(r+1)} = \frac{1}{N}\int_{-\infty}^{+\infty} G_k(\theta \mid \phi^{(r)})\, d\theta$

$\mu_k^{(r+1)} = \frac{\int_{-\infty}^{+\infty} \theta\, G_k(\theta \mid \phi^{(r)})\, d\theta}{\int_{-\infty}^{+\infty} G_k(\theta \mid \phi^{(r)})\, d\theta}$

$\Sigma_k^{(r+1)} = \frac{\int_{-\infty}^{+\infty} (\theta - \mu_k^{(r+1)})(\theta - \mu_k^{(r+1)})'\, G_k(\theta \mid \phi^{(r)})\, d\theta}{\int_{-\infty}^{+\infty} G_k(\theta \mid \phi^{(r)})\, d\theta}$

$(\sigma^2)^{(r+1)} = \frac{\sum_{i=1}^{N}\int_{-\infty}^{+\infty} (Y_i - h_i(\theta_i))'\, R_i(\theta_i)^{-1}\, (Y_i - h_i(\theta_i))\, F_i(\theta_i \mid \phi^{(r)})\, d\theta_i}{\sum_{i=1}^{N} m_i}$   (3.7)

where

$G_k(\theta \mid \phi) = \sum_{i=1}^{N} g_{ik}(\theta \mid \phi)$, $\quad F_i(\theta \mid \phi) = \sum_{k=1}^{K} g_{ik}(\theta \mid \phi)$,

$g_{ik}(\theta_i \mid \phi) = \frac{w_k\, l_i(Y_i \mid \theta_i, \beta)\, p(\theta_i \mid \mu_k, \Sigma_k)}{p(Y_i \mid \phi)}$   (3.8)

The above formulation can also be modified to accommodate the important case where a subset of the parameters is modeled by a multivariate normal distribution and the remaining parameters follow a mixture of normals. In many PK/PD problems it is often reasonable to assume that the mechanism of genetic polymorphism applies to only part of the system, for example drug metabolism or the drug target. It is therefore desirable to partition the parameter $\theta_i$ into two components, one ($\alpha_i$) that follows a mixture of multivariate normals and a second ($\beta_i$) described by a single multivariate normal distribution: $\theta_i = \{\alpha_i, \beta_i\}$, where $\alpha_i$ and $\beta_i$ are independent. The modified EM updates for this special case are given by:

$\mu_{\alpha,k}^{(r+1)} = \frac{\sum_{i=1}^{N}\int \alpha\, g_{ik}(\theta_i \mid \phi^{(r)})\, d\theta_i}{\sum_{i=1}^{N}\int g_{ik}(\theta_i \mid \phi^{(r)})\, d\theta_i}$,

$\Sigma_{\alpha,k}^{(r+1)} = \frac{\sum_{i=1}^{N}\int (\alpha - \mu_{\alpha,k}^{(r+1)})(\alpha - \mu_{\alpha,k}^{(r+1)})^T\, g_{ik}(\theta_i \mid \phi^{(r)})\, d\theta_i}{\sum_{i=1}^{N}\int g_{ik}(\theta_i \mid \phi^{(r)})\, d\theta_i}$,

$\mu_{\beta}^{(r+1)} = \frac{1}{N}\sum_{i=1}^{N}\sum_{k=1}^{K}\int \beta\, g_{ik}(\theta_i \mid \phi^{(r)})\, d\theta_i$,

$\Sigma_{\beta}^{(r+1)} = \frac{1}{N}\sum_{i=1}^{N}\sum_{k=1}^{K}\int (\beta - \mu_{\beta}^{(r+1)})(\beta - \mu_{\beta}^{(r+1)})^T\, g_{ik}(\theta_i \mid \phi^{(r)})\, d\theta_i$.   (3.9)

The updates for $\{w_k, 1 \leq k \leq K\}$ and $\sigma^2$ are the same as in Eq.(3.7).
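In the simplified limiting case where the $\theta_i$ are observed directly (so the inner integrals collapse and $g_{ik}$ reduces to an ordinary responsibility), the updates of Eq.(3.7) reduce to the classical EM algorithm for a Gaussian mixture. A one-dimensional sketch of that reduced case only, not the full population algorithm:

```python
import numpy as np

def normal_pdf(x, mu, sd):
    return np.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * np.sqrt(2 * np.pi))

def em_gaussian_mixture(x, w, mu, sd, n_iter=100):
    """EM for a 1-D K-component normal mixture on directly observed data x."""
    for _ in range(n_iter):
        # E step: responsibilities g_ik = w_k p_k(x_i) / sum_k w_k p_k(x_i)
        dens = np.stack([wk * normal_pdf(x, mk, sk) for wk, mk, sk in zip(w, mu, sd)])
        g = dens / dens.sum(axis=0)
        # M step: weighted-moment updates, the collapsed analogues of Eq.(3.7)
        nk = g.sum(axis=1)
        w = nk / len(x)
        mu = (g @ x) / nk
        sd = np.sqrt((g * (x - mu[:, None]) ** 2).sum(axis=1) / nk)
    return w, mu, sd

rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(0, 1, 700), rng.normal(5, 1, 300)])
w, mu, sd = em_gaussian_mixture(x, w=np.array([0.5, 0.5]),
                                mu=np.array([-1.0, 6.0]), sd=np.array([1.0, 1.0]))
```

Each iteration increases the likelihood, illustrating the monotonicity property cited earlier for the EM algorithm.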
3.4 Classification of Individual Subjects to Subpopulations

For mixture models, the estimation of the mixture component parameters $\{\mu_k, \Sigma_k\}$ and the mixing weights $\{w_k\}$ provides the population density. The problem of determining which mixture component an observation most likely comes from is important as well. For population PK/PD studies in particular, such classification can provide a basis for further investigating and understanding the intrinsic genetic mechanism of the polymorphism, ultimately making it possible to individualize drug therapy. A simple classification method within the derived EM framework is discussed here.

The component label vector $z_i$ ($i = 1, \ldots, N$) was introduced in the previous section and handled as a "missing" vector in the application of the EM algorithm to mixture models. From the derivation of the EM algorithm, during an E step the conditional expectation of $z_i$ can be expressed as:

$\tau_i(k) = E(z_i(k) \mid Y_i) = \frac{w_k \int_{-\infty}^{+\infty} l_i(Y_i \mid \theta_i, \beta)\, p(\theta_i \mid \mu_k, \Sigma_k)\, d\theta_i}{\sum_{k=1}^{K} w_k \int_{-\infty}^{+\infty} l_i(Y_i \mid \theta_i, \beta)\, p(\theta_i \mid \mu_k, \Sigma_k)\, d\theta_i}, \quad k = 1, \ldots, K$   (3.10)

This quantity is the posterior probability that the ith subject belongs to the kth component of the mixture. The mixing weights $\{w_k\}$ can then be estimated by the proportion of data from each mixture component in the following M step:

$w_k = \frac{1}{N}\sum_{i=1}^{N} \tau_i(k), \quad k = 1, \ldots, K$   (3.11)
The classification method discussed here is based on the probabilities $\{\tau_i\}$. The ith individual is classified into the kth subpopulation if his/her kth probability $\tau_i(k)$ is the largest among the K elements of $\tau_i$. An advantage of this method is that no additional calculation is required to obtain the estimates of $\{\tau_i\}$, because each iterative EM step has already obtained evaluations of all the integrals in Eq.(3.10) (by Monte Carlo methods).
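Given the matrix of posterior probabilities $\tau_i(k)$, the classification rule is a row-wise argmax, and the weight update of Eq.(3.11) is a column mean. A small sketch (the $\tau$ values shown are illustrative, not from a fitted model):

```python
import numpy as np

def classify(tau):
    """Assign each subject to the component with the largest posterior probability.

    tau: array of shape (N, K); row i holds tau_i(1), ..., tau_i(K).
    Returns 0-based component indices and the weight estimates of Eq.(3.11).
    """
    labels = tau.argmax(axis=1)      # classification rule: largest tau_i(k)
    w = tau.mean(axis=0)             # Eq.(3.11): w_k = (1/N) sum_i tau_i(k)
    return labels, w

tau = np.array([[0.9, 0.1],
                [0.2, 0.8],
                [0.6, 0.4]])
labels, w = classify(tau)
```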
3.5 Error Analysis for Mixture Models

The asymptotic error analysis presented in section 2.3.1 for ML estimation of non-mixture population problems can be extended to the mixture modeling analysis. It is based on approximating the asymptotic covariance matrix, calculated within the EM framework.

Assuming certain regularity conditions hold [36], asymptotically as $N \to \infty$,

$Cov(\hat{\phi}_{ML}) \approx \Big(\sum_{i=1}^{N} V_i(\hat{\phi}_{ML})\Big)^{-1}$,   (3.12)

where $V_i(\phi) = \big(\frac{\partial}{\partial\phi}\log p(Y_i \mid \phi)\big)\big(\frac{\partial}{\partial\phi}\log p(Y_i \mid \phi)\big)^T$.
Now

$\frac{\partial}{\partial\phi}\log p(Y_i \mid \phi) = \sum_{k=1}^{K}\int \Big\{\frac{\partial}{\partial\phi}\log p(Y_i, \theta_i \mid \phi)\Big\}\, g_{ik}(\theta_i \mid \phi)\, d\theta_i$,

and the gradient components are calculated, for $k = 1, \ldots, K$, as

$s_{\mu_k}^{(i)} = \frac{\partial}{\partial\mu_k}\log p(Y_i \mid \phi) = \int g_{ik}(\theta_i \mid \phi)\,\{\Sigma_k^{-1}(\theta_i - \mu_k)\}\, d\theta_i$,

$s_{\Sigma_k}^{(i)} = \frac{\partial}{\partial A_k}\log p(Y_i \mid \phi) = -\,lower\big\{\Sigma_k^{-1} - \Sigma_k^{-1}\bar{\Sigma}_{ik}\Sigma_k^{-1} - \tfrac{1}{2}\,diag\big(\Sigma_k^{-1} - \Sigma_k^{-1}\bar{\Sigma}_{ik}\Sigma_k^{-1}\big)\big\}$,

where $A_k$ is the lower triangular matrix of $\Sigma_k$, and $\bar{\Sigma}_{ik} = \int g_{ik}(\theta_i \mid \phi)\,(\theta_i - \mu_k)(\theta_i - \mu_k)'\, d\theta_i$;

$s_{\sigma^2}^{(i)} = \frac{\partial}{\partial\sigma^2}\log p(Y_i \mid \phi) = \sum_{k=1}^{K}\int g_{ik}(\theta_i \mid \phi)\Big\{-\frac{m_i}{2\sigma^2} + \frac{(Y_i - h_i(\theta_i))^T H_i(\theta_i)^{-1}(Y_i - h_i(\theta_i))}{2\sigma^4}\Big\}\, d\theta_i$;

and, for $k = 1, \ldots, K-1$,

$s_{w_k}^{(i)} = \frac{\partial}{\partial w_k}\log p(Y_i \mid \phi) = \frac{1}{w_k}\int g_{ik}(\theta_i \mid \phi)\, d\theta_i - \frac{1}{w_K}\int g_{iK}(\theta_i \mid \phi)\, d\theta_i$.

Introduce the notation $s_{\varpi_k} = \big((s_{\Sigma_k})_{1,1}, (s_{\Sigma_k})_{2,1}, \ldots, (s_{\Sigma_k})_{p,p}\big)$, where $(s_{\Sigma_k})_{i,j}$ is the component of $s_{\Sigma_k}$ in the $(i, j)$ position. Putting these results together produces the vector

$s_i = \big(s_{\mu_1}, \ldots, s_{\mu_K}, s_{\varpi_1}, \ldots, s_{\varpi_K}, s_{w_1}, \ldots, s_{w_{K-1}}, s_{\sigma^2}\big)$,

so that

$Cov(\hat{\phi}_{ML}) = \Big(\sum_{i=1}^{N} s_i\, s_i^T\Big)^{-1}$.
All the integral computations can be performed by the sampling-based calculation,
and the above procedures are performed at the final iteration of the EM algorithm.
3.6 Model Selection

In the preceding analyses, the number of components K in the mixture model was always assumed known a priori. In PK/PD analysis, however, the choice of K corresponds to the question of how many subpopulations exist, and may be unknown.

Assessing and testing the number of components K in a mixture is an important but difficult problem, and many approaches have been proposed for solving it [31]. The number of components can be estimated nonparametrically as a discrete random variable of the model. In terms of parametric estimation, the choice of an appropriate K can be addressed using model selection criteria; this is the approach considered here.
To compare two non-nested models, in contrast to the likelihood ratio procedures used for nested models, several statistical criteria based on a penalized objective function have been proposed. For two models A and B with different values of K, say $K_A$ and $K_B$, one compares the objective functions $Q_A$ and $Q_B$ and prefers the model with the larger value. Three criteria are commonly used, with different definitions of the penalty. The Akaike information criterion (AIC) takes the form $AIC = L - P$, where $L$ is the maximized overall data log-likelihood for the model and $P$, the total number of parameters under estimation, serves as the penalty. The Bayesian information criterion (BIC) is defined as $BIC = L - \tfrac{1}{2} P \log N$. If the penalty is given by $P \log(\log N)$, the criterion is called the Hannan-Quinn (HQ) criterion. The size of the penalty decreases from BIC to HQ to AIC; thus BIC is the strictest model selection strategy and generally favors inclusion of fewer components in the model. Fraley and Raftery [13] reported using the BIC criterion for mixture model selection and, based on applications to a range of datasets, recommended its use. A priori knowledge or assumptions about the biological mechanism underlying a PK/PD polymorphism can also facilitate the choice of an appropriate K when combined with these statistical criteria. For example, for drugs primarily metabolized by the liver, information on the hepatic cytochrome P450 family can help decide a reasonable range for the number of clearance subgroups, thus reducing the number of competing models in the subsequent statistical test.
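The three criteria differ only in the penalty term, so comparing candidate values of K is a one-line computation once each model has been fitted. A small helper using the larger-is-better convention of the text ($L$ is the maximized log-likelihood; the numbers below are illustrative, not fitted values):

```python
import math

def criteria(logL, P, N):
    """Penalized objective functions AIC, BIC, HQ; larger values are preferred."""
    return {
        "AIC": logL - P,
        "BIC": logL - 0.5 * P * math.log(N),
        "HQ":  logL - P * math.log(math.log(N)),
    }

# Comparing a one-component model against a two-component mixture:
m1 = criteria(logL=-520.0, P=3, N=100)   # K = 1
m2 = criteria(logL=-505.0, P=6, N=100)   # K = 2: better fit, more parameters
best_by_bic = max(("K=1", m1["BIC"]), ("K=2", m2["BIC"]), key=lambda t: t[1])[0]
```

For N = 100, the per-parameter penalties are about 2.30 (BIC), 1.53 (HQ), and 1 (AIC), matching the decreasing order stated above.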
3.7 Extension to Models with General Error Variance Structure

Analogous to the non-mixture case, if $R_i(\theta_i, \beta)$ takes any form other than the multiplicative form above or a constant (the special case of the multiplicative form with $R_i(\theta_i) = I$, where $I$ is the identity matrix), the maximization over the fixed effect vector $\beta$ does not lead to an explicit updating formula. The approach used for non-mixture population PK/PD models can be applied here as well: one step of a gradient optimization procedure is performed to solve $\max_{\beta} L$. A Gauss-Newton one-step update of $\beta$ at the end of each EM iteration is given by

$\beta^{(r+1)} = \beta^{(r)} - H_{\beta}^{-1}\,\frac{\partial L}{\partial \beta}$.

For this optimization, only differentiation of the data density $l_i(Y_i \mid \theta_i, \beta)$ is required:

$\frac{\partial L}{\partial \beta} = \sum_{i=1}^{N}\sum_{k=1}^{K}\int_{-\infty}^{+\infty} \frac{\partial \log l_i(Y_i \mid \theta_i, \beta)}{\partial \beta}\, g_{ik}(\theta_i \mid \phi)\, d\theta_i$.

Therefore, each element of the vector $\partial L / \partial \beta$ can be evaluated as:

$\frac{\partial L}{\partial \beta_j} = -\frac{1}{2}\sum_{i=1}^{N}\sum_{k=1}^{K}\int_{-\infty}^{+\infty} tr\Big\{R_i(\theta_i,\beta)^{-1}\frac{\partial R_i(\theta_i,\beta)}{\partial \beta_j}\, R_i(\theta_i,\beta)^{-1}\big[R_i(\theta_i,\beta) - (Y_i - h_i(\theta_i))(Y_i - h_i(\theta_i))'\big]\Big\}\, g_{ik}(\theta_i \mid \phi)\, d\theta_i$.

The second derivative is also needed and can be approximated by:

$(H_{\beta})_{j_1 j_2} = \frac{1}{2}\sum_{i=1}^{N}\sum_{k=1}^{K}\int_{-\infty}^{+\infty} tr\Big\{R_i(\theta_i,\beta)^{-1}\frac{\partial R_i(\theta_i,\beta)}{\partial \beta_{j_1}}\, R_i(\theta_i,\beta)^{-1}\frac{\partial R_i(\theta_i,\beta)}{\partial \beta_{j_2}}\Big\}\, g_{ik}(\theta_i \mid \phi)\, d\theta_i$,

for the element of the Hessian matrix $H_{\beta}$ in the $(j_1, j_2)$ position.

The method can be applied to evaluate any fixed parameters associated with intra-subject variability. Typically, the variance matrix $R_i(\theta_i,\beta)$ is modeled as a function of the PK/PD model output $h_i(\theta_i)$ and $\beta$, where the functional form is user-defined and considered known. In such cases, $\partial R_i(\theta_i,\beta)/\partial \beta_j$ can easily be evaluated, and the computation time is comparable to that of updating any other parameter.
3.8 Extension to Models with Covariates
In population PK/PD studies, models incorporating measured individual
covariates such as gender, age, body weight, creatinine clearance can be used to
explain between-subject variability. For the mixture modeling approach, the effects
of such measured covariates would be of importance to help identify unknown
genetic factors contributing to any observed PK/PD polymorphisms.
Analogous to the ECM updating of covariate coefficients under a non-mixture population distribution, an analytically tractable maximization step exists in the case of a linear relationship between the parameters and individual-specific covariate data, i.e.,
\[ \theta_i \sim \sum_{k=1}^{K} w_k\, N\big(g_k(r_i, c_k), \Sigma_k\big) \quad \text{and} \quad g_k(r_i, c_k) = A_i c_k, \]
where A_i is the individual-specific design matrix containing all the information of the covariate data r_i.
The corresponding updating formula is:
\[ c_k^{(r+1)} = W_k^{-1} \sum_{i=1}^{N}\int_{-\infty}^{+\infty} g_{ik}^{(r)}(\theta_i \mid \phi^{(r)})\, A_i' \big(\Sigma_k^{(r)}\big)^{-1} \theta_i\, d\theta_i \quad (3.13) \]
where
\[ W_k = \sum_{i=1}^{N}\int_{-\infty}^{+\infty} g_{ik}^{(r)}(\theta_i \mid \phi^{(r)})\, A_i' \big(\Sigma_k^{(r)}\big)^{-1} A_i\, d\theta_i. \]
For a general covariate model, the least squares error function
\[ \frac{1}{2}\sum_{i=1}^{N} \big(\theta_{ik} - g_k(r_i, c_k)\big)'\, \Sigma_k^{-1}\, \big(\theta_{ik} - g_k(r_i, c_k)\big), \]
minimized with respect to c_k, can serve as a basis for a maximization step at the end of each EM iteration. The updating of the covariance matrices {Σ_k} and of the fixed effect β in the error variance remains the same as given by Eq.(3.7).
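The closed-form update for the linear covariate case (Eq. 3.13) can be sketched as follows. To keep the sketch checkable, the integrals are collapsed to point values: each subject contributes a single θ_i with posterior weight g_ik = 1, and the θ_i are generated noiselessly from a known coefficient vector; all dimensions and numbers are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
N, p, q = 40, 2, 3                  # subjects, dim(theta), dim(c_k)
c_true = np.array([1.0, -0.5, 2.0]) # illustrative "true" coefficients
Sigma_inv = np.eye(p)               # component covariance inverse (identity here)

A = rng.normal(size=(N, p, q))      # individual design matrices A_i
theta = np.array([A[i] @ c_true for i in range(N)])  # noiseless theta_i = A_i c

# c_k = W_k^{-1} sum_i A_i' Sigma_k^{-1} theta_i,
# with W_k = sum_i A_i' Sigma_k^{-1} A_i  (Eq. 3.13, point-mass version)
W = sum(A[i].T @ Sigma_inv @ A[i] for i in range(N))
rhs = sum(A[i].T @ Sigma_inv @ theta[i] for i in range(N))
c_hat = np.linalg.solve(W, rhs)
print(np.allclose(c_hat, c_true))
```

With noiseless data the weighted least squares solve recovers the coefficients exactly, which makes the update easy to unit-test before embedding it in the EM loop.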
3.9 Numerical Integration
To implement the algorithm in Eq.(3.7), the evaluation of several integrals at each iteration is required, all of which are of the form:
\[ E\{f(\theta_i) \mid Y_i, \beta, \mu_k, \Sigma_k\} = \frac{\displaystyle\int_{-\infty}^{+\infty} f(\theta_i)\, l_i(Y_i \mid \theta_i, \beta)\, p(\theta_i \mid \mu_k, \Sigma_k)\, d\theta_i}{\displaystyle\int_{-\infty}^{+\infty} l_i(Y_i \mid \theta_i, \beta)\, p(\theta_i \mid \mu_k, \Sigma_k)\, d\theta_i} \quad (3.14) \]
where f(θ_i) denotes any function of θ_i inside the integral. Numerical evaluations of such integrals for PK/PD models are performed by sampling methods such as direct Monte Carlo sampling, Monte Carlo importance sampling, or Markov chain Monte Carlo (MCMC) sampling.
Monte Carlo methods can be thought of as statistical simulation methods that use a sequence of random numbers to numerically approximate the value of an integral. For integrals of the above type, the procedure is:
1) Pick R randomly distributed points θ_i^{(1)}, …, θ_i^{(R)} from the defined parameter space;
2) Approximate the integral in Eq.(3.14) by a weighted average:
\[ E\{f(\theta_i) \mid Y_i, \beta, \mu_k, \Sigma_k\} \approx \frac{\sum_{r=1}^{R} f(\theta_i^{(r)})\, \lambda\big(l_i(\theta_i^{(r)}), p(\theta_i^{(r)})\big)}{\sum_{r=1}^{R} \lambda\big(l_i(\theta_i^{(r)}), p(\theta_i^{(r)})\big)} \quad (3.15) \]
The weight λ depends on each component of the population density p(θ_i | μ_k, Σ_k), the individual data likelihood l_i(Y_i | θ_i, β), and the method of sampling. The required number of samples R depends on the desired precision of the estimation and the complexity of the overall PK/PD model. When the samples θ_i^{(1)}, …, θ_i^{(R)} are independent, laws of large numbers ensure that the approximation can be made as accurate as desired by increasing the sample size R.
As far as the different ways to draw random samples from the appropriate parameter space are concerned, two approaches will be investigated here. Also presented is a Markov chain Monte Carlo (MCMC) procedure.
3.9.1 Direct Monte Carlo Sampling
Direct sampling is the simplest of the Monte Carlo techniques and has been utilized in the general non-mixture PK/PD population problem by Walker [57] and Bauer [2].
The direct sampling method approximates the integral by generating samples θ_i^{(1)}, …, θ_i^{(R)} ~ i.i.d. N(μ_k, Σ_k) and setting λ = l_i(θ_i^{(r)}). Eq.(3.15) then becomes:
\[ \frac{\sum_{r=1}^{R} f(\theta_i^{(r)})\, l_i(\theta_i^{(r)})}{\sum_{r=1}^{R} l_i(\theta_i^{(r)})} \quad (3.16) \]
The population parameters μ_k, Σ_k and β from the previous EM iteration are used for the current Monte Carlo calculations.
This sampling technique provides a straightforward way to simulate and is easily implemented due to the normality of the sampling density. But since the random samples are drawn from p(θ_i | μ_k, Σ_k) alone, and the information in each individual's data is ignored when simulating, direct sampling can be inefficient for some problems. More samples are required to achieve convergence to the target density (convergence issues are discussed in the next chapter). For problems of large size, this could be computationally prohibitive.
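The direct sampling approximation (Eq. 3.16) can be sketched for a deliberately simple scalar case. To make the answer verifiable, the "model" below is linear (y = θ + noise), so the exact conditional mean is available from normal conjugacy; the numbers are illustrative, not from the dissertation.

```python
import numpy as np

rng = np.random.default_rng(42)
mu, tau2 = 0.3, 0.01       # population (sampling) density N(mu, tau2)
s2 = 0.04                  # intra-subject error variance
y = 0.5                    # a single observation for this subject

R = 200_000
theta = rng.normal(mu, np.sqrt(tau2), size=R)    # draws from p(theta | mu, tau2)
lik = np.exp(-0.5 * (y - theta) ** 2 / s2)       # individual likelihood l_i(Y | theta)

# Eq. (3.16) with f(theta) = theta: likelihood-weighted sample mean
post_mean_mc = np.sum(theta * lik) / np.sum(lik)

# Exact conjugate-normal conditional mean, for comparison
post_mean_exact = (y / s2 + mu / tau2) / (1 / s2 + 1 / tau2)
print(post_mean_mc, post_mean_exact)
```

Because the samples come from the population density only, many draws land where the likelihood is small and contribute little weight, which is exactly the inefficiency described above.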
3.9.2 Monte Carlo with Importance Sampling
An alternative approach is the importance sampling method, as discussed by Smith and Gelfand [47] in the context of Bayesian computation. They also discussed such extensions as rejection and weighted bootstrap methods.
Assume a sample of a random variable is easily generated from a continuous density g (the envelope function), but what is really required is a sample from a density h (the target density) absolutely continuous with respect to g. The strategy of importance sampling is to use the sample from the envelope function g to form a sample from the target density h.
To apply this approach to approximate Eq.(3.14), an envelope function p_e(θ_i) serves as the sampling function. The samples from the envelope function are more likely to be accepted if p_e(θ_i) approximates the joint density l_i(θ_i) p(θ_i) well. A good index of such similarity is the ratio of the two densities, which also acts as the weight λ. The approximation of the integrals under this method is:
\[ \frac{\sum_{r=1}^{R} f(\theta_i^{(r)})\, l_i(\theta_i^{(r)})\, p(\theta_i^{(r)}) \,/\, p_e(\theta_i^{(r)})}{\sum_{r=1}^{R} l_i(\theta_i^{(r)})\, p(\theta_i^{(r)}) \,/\, p_e(\theta_i^{(r)})} \quad (3.17) \]
The envelope function p_e(θ_i) can take essentially any density form to approximate l_i(θ_i) p(θ_i), and should be easy to draw random samples from. One example is a multivariate normal density. In practice, after the first EM iteration, such a normal density can use the conditional mean θ̄_{i,k} = E{θ_i | Y_i, μ_k, Σ_k, β} and conditional variance Σ_{i,k} = E{(θ_i − θ̄_{i,k})(θ_i − θ̄_{i,k})' | Y_i, μ_k, Σ_k, β} of each subject, based on the parameter values of the specific mixture component from the previous EM iteration. If the envelope function p_e(θ_i) is taken to be the population density p(θ_i | μ_k, Σ_k), the importance sampling approximation in (3.17) reduces to the direct sampling approximation in (3.16).
Importance sampling offers more flexibility in the integral calculations. When the envelope function includes information on both the individual data l_i(θ_i) and the population density p(θ_i), the approximation approaches the target density faster and the overall calculation is more efficient than with direct sampling.
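The importance sampling approximation (Eq. 3.17) can be sketched on the same scalar conjugate-normal toy problem used for direct sampling: the envelope p_e is a normal placed roughly at the subject's conditional mean and variance, and the weight is l_i(θ) p(θ) / p_e(θ). The envelope location and all numbers are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(7)
mu, tau2 = 0.3, 0.01       # population density N(mu, tau2)
s2 = 0.04                  # intra-subject error variance
y = 0.5

def normal_pdf(x, m, v):
    return np.exp(-0.5 * (x - m) ** 2 / v) / np.sqrt(2 * np.pi * v)

# Envelope: a normal roughly matching the conditional mean/variance
m_e, v_e = 0.34, 0.02
R = 100_000
theta = rng.normal(m_e, np.sqrt(v_e), size=R)    # draws from the envelope p_e

# Weight = l_i(theta) * p(theta) / p_e(theta), as in Eq. (3.17)
w = normal_pdf(y, theta, s2) * normal_pdf(theta, mu, tau2) / normal_pdf(theta, m_e, v_e)
post_mean_is = np.sum(theta * w) / np.sum(w)

post_mean_exact = (y / s2 + mu / tau2) / (1 / s2 + 1 / tau2)
print(post_mean_is, post_mean_exact)
```

A slightly over-dispersed envelope (v_e larger than the true conditional variance) is a common safeguard, since an envelope with tails lighter than the target produces unstable weights.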
3.9.3 Markov Chain Monte Carlo Sampling
The problem of calculating the expectation in Eq.(3.14) can also be solved as:
\[ E\{f(\theta_i) \mid Y_i, \beta, \mu_k, \Sigma_k\} \approx \frac{1}{R}\sum_{r=1}^{R} f(\theta_i^{(r)}) \quad (3.18) \]
The law of large numbers ensures that if the samples {θ_i^{(r)}, r = 1, …, R} are drawn independently from (the normalized density proportional to) l_i(Y_i | θ_i, β) p(θ_i | μ_k, Σ_k), the population mean of f(θ_i) can be estimated by a sample mean. In practice, the {θ_i^{(r)}} need not be independent. One way of generating such samples is to build a Markov chain having l_i(Y_i | θ_i, β) p(θ_i | μ_k, Σ_k) as its stationary distribution. This technique is called the Markov chain Monte Carlo (MCMC) method and has been used in numerous applications.
A Markov chain is defined here as a sequence of random variables {θ_i^{(1)}, θ_i^{(2)}, …}, such that for any number r, the next state θ_i^{(r+1)} depends only on the current state of the chain θ_i^{(r)}, and not further on the history of the chain {θ_i^{(1)}, θ_i^{(2)}, …, θ_i^{(r−1)}}. As r increases, the chain gradually forgets its initial state and eventually converges to a unique stationary distribution. Thus all the samples {θ_i^{(r)}} will look increasingly like independent samples from this distribution.
To construct such a Markov chain whose stationary distribution is precisely the distribution of interest l_i(Y_i | θ_i, β) p(θ_i | μ_k, Σ_k) (denoted by π in the following derivation), the classic Metropolis-Hastings algorithm is adopted here. It was first proposed by Metropolis et al. and generalized by Hastings. Tierney [51] incorporated this algorithm into a hybrid MCMC method and applied it to explore Bayesian posterior distributions. In the Metropolis-Hastings algorithm, for each θ_i^{(r)}, the next state θ_i^{(r+1)} is chosen by first sampling a candidate point ξ from a proposal distribution q and then accepting it with probability α(θ_i^{(r)}, ξ):
\[ \alpha(\theta_i^{(r)}, \xi) = \min\left(1, \frac{\pi(\xi)\, q(\theta_i^{(r)} \mid \xi)}{\pi(\theta_i^{(r)})\, q(\xi \mid \theta_i^{(r)})}\right) \quad (3.19) \]
If the candidate point is accepted, the next state becomes θ_i^{(r+1)} = ξ. If it is rejected, the chain does not move, i.e., θ_i^{(r+1)} = θ_i^{(r)}.
When the proposal density q does not depend on the current state θ_i^{(r)}, the method above is called the independence Metropolis-Hastings (IMH) sampler [40]. Specifically, if the population normal density p(θ_i | μ_k, Σ_k) is set to be the proposal distribution, then the acceptance probability α is given by:
\[ \alpha(\theta_i^{(r)}, \xi) = \min\left(1, \frac{l_i(\xi)}{l_i(\theta_i^{(r)})}\right) \quad (3.20) \]
where the likelihood ratio serves as the acceptance probability. The corresponding IMH algorithm is as follows:
1) Select θ_i^{(1)} to initialize the chain;
2) Repeat steps 3 to 5 for r = 1, …, R:
3) Sample a point ξ from p(θ_i | μ_k, Σ_k);
4) Generate u from a Uniform(0,1) distribution;
5) If u ≤ l_i(ξ)/l_i(θ_i^{(r)}), set θ_i^{(r+1)} = ξ; else, set θ_i^{(r+1)} = θ_i^{(r)}.
After obtaining the Markov chain {θ_i^{(r)}, r = 1, …, R}, the expectation is calculated as in Eq.(3.18). This IMH sampler has been applied to the analysis of non-mixture population PK/PD models and will be investigated in the mixture modeling approach as well.
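The five steps above can be sketched directly. The following uses the same scalar conjugate-normal toy problem as the earlier sampling sketches, with the population density as the independence proposal and the likelihood-ratio acceptance rule of Eq. (3.20); all numbers are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
mu, tau2 = 0.3, 0.01       # population density, used as the proposal
s2 = 0.04                  # intra-subject error variance
y = 0.5

def lik(theta):
    # individual data likelihood l_i(Y | theta), up to a constant
    return np.exp(-0.5 * (y - theta) ** 2 / s2)

R = 100_000
chain = np.empty(R)
chain[0] = mu                              # step 1: initialize the chain
for r in range(R - 1):
    xi = rng.normal(mu, np.sqrt(tau2))     # step 3: propose from p(theta)
    u = rng.uniform()                      # step 4: uniform draw
    if u <= lik(xi) / lik(chain[r]):       # step 5: accept with prob min(1, ratio)
        chain[r + 1] = xi
    else:
        chain[r + 1] = chain[r]            # chain does not move

post_mean_exact = (y / s2 + mu / tau2) / (1 / s2 + 1 / tau2)
post_mean_mcmc = chain[1000:].mean()       # sample mean after a short burn-in
print(post_mean_mcmc, post_mean_exact)
```

The chain mean approximates the conditional expectation of Eq. (3.18) even though successive states are correlated, at the cost of a larger R than an independent sampler would need.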
4 Population Mixture Modeling Examples
Three simulated examples incorporating parameter mixture models are
presented in this chapter. The first two involve pharmacokinetic models and include
detailed simulation studies to evaluate the population estimation. The third illustrates
the application of the approach to a PK/PD model. The results are used to evaluate
the implementation of ML estimation through the EM algorithm.
4.1 One-compartment PK Model Example
4.1.1 The Model and Simulation Experiment
The pharmacokinetic model used for simulation is a one-compartment model with a bolus drug administration of dose D. The parameter V_c denotes the volume of distribution of the central compartment, and k is the elimination rate constant. The output of the model is defined as the drug concentration in the central compartment:
\[ h(t) = \frac{D}{V_c}\exp(-k\,t) \quad (4.1) \]
where t is the discrete time point at which the concentration is measured. The intra-subject error is assumed to be i.i.d. (independent and identically distributed), and the variance model is:
\[ R_i(\theta_i) = \sigma^2\, h_i^2(\theta_i) \quad (4.2) \]
where σ is a constant. The simulation was performed using the following attributes for each data set: 5 observations per subject at 1.5, 2, 3, 4 and 5.5 units of time, and one bolus dose of 100 units given to each subject at the initial time, with simulated PK vector θ_i = (V_{ci}, k_i).
Each volume of distribution V_c was simulated from a normal distribution, while the elimination rate constant k was simulated to follow a mixture of two normals across the population. This model represents the pharmacokinetics of a drug whose elimination can be characterized by two distinct subpopulations. The population parameters were generated with values σ = 0.1, V_c = 20, k_1 = 0.3, k_2 = 0.6, Var(V_c) = 4, Var(k_1) = 0.0036, Var(k_2) = 0.0036. V_c and k are independent, where k_1 and k_2 denote the means of the first and second components of the mixture density, respectively. The population densities for k and V_c are plotted in Figure 4-1.
Figure 4-1 Population densities of distribution volume V_c and elimination rate constant k
In the simulation experiment, one hundred subjects were assumed for each PK trial. For each subject, the PK parameters were randomly sampled from the defined population distribution, and the predicted outputs were then calculated by Eq.(4.1). Output errors were generated from a normal density with variance defined by Eq.(4.2). Simulated drug concentrations were generated by adding the output errors to the predicted concentrations. The simulation was repeated for 200 PK trials, with the same population size and values.
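The simulation procedure for one such PK trial can be sketched as follows, using the population values stated in the text (with the mixing weight w_1 = 0.8 of Table 4-1); the random seed is an arbitrary choice.

```python
import numpy as np

rng = np.random.default_rng(2007)
n, D, sigma = 100, 100.0, 0.1
times = np.array([1.5, 2.0, 3.0, 4.0, 5.5])   # observation times
w1 = 0.8                                      # mixing weight of component 1

Vc = rng.normal(20.0, np.sqrt(4.0), size=n)   # Vc ~ N(20, 4)
comp = rng.uniform(size=n) < w1               # subpopulation indicator
k = np.where(comp,
             rng.normal(0.3, np.sqrt(0.0036), size=n),   # k1 component
             rng.normal(0.6, np.sqrt(0.0036), size=n))   # k2 component

h = (D / Vc)[:, None] * np.exp(-k[:, None] * times)   # Eq. (4.1), per subject
y = h + rng.normal(size=h.shape) * sigma * h          # error sd = sigma * h, Eq. (4.2)

print(y.shape)   # one trial: 100 subjects x 5 observations
```

Repeating this block 200 times with fresh random draws reproduces the structure of the 200 simulated population data sets.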
During the estimation, 1000 random samples were generated in every EM iteration to calculate the integral in each subject's conditional mean and variance by the importance sampling method. Fifty EM iterations were run for each data set starting from equal mixing weights (w_1 = 0.5), with the initial values of the remaining parameters set to the population values. The formulation of Chapter 3 has been modified to accommodate the important case where a subset of the parameters is modeled by a multivariate normal distribution and the remaining parameters follow a mixture of normals (see the modified updating formulas in Eq.(3.9)). The MLEs were obtained for each of the 200 population data sets. For each of the estimated parameters φ, its percent prediction error was calculated for each population data set as:
\[ pe_j = 100 \times (\phi_j^{ML} - \phi_j)/\phi_j, \quad j = 1, \ldots, 200. \]
These percent prediction errors were used to calculate the mean prediction error and root mean square prediction error (RMSE) for each parameter. In addition, for each population data set, the calculated standard errors were used to construct 95% confidence intervals for all estimated parameters. The percent coverage of these confidence intervals was then tabulated over the 200 population data sets. Finally, the individual subject classification accuracy was evaluated for each population data set.
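The evaluation metrics above reduce to a few lines. The sketch below computes the percent prediction errors, their mean (mean PE), and the RMSE for a tiny illustrative array of hypothetical estimates, not the study's actual results.

```python
import numpy as np

def pe_metrics(estimates, true_value):
    # percent prediction error per data set, then mean PE and RMSE across sets
    pe = 100.0 * (estimates - true_value) / true_value
    return pe.mean(), np.sqrt(np.mean(pe ** 2))

est = np.array([19.8, 20.4, 20.1, 19.6])   # hypothetical MLEs of mu_V over 4 sets
mean_pe, rmse = pe_metrics(est, 20.0)
print(round(mean_pe, 3), round(rmse, 3))   # -0.125  1.521
```

Note that RMSE is computed on the percent errors themselves, so a parameter can have a near-zero mean PE (little bias) and still a large RMSE (high variability), as seen for the variance terms in Table 4-1.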
4.1.2 Estimation Results: Bias and Variability
ML estimation by the EM algorithm was implemented in MATLAB. The computation time was approximately 30 minutes for each population data set on a PC (Pentium 4 with 2 GHz CPU, 1 GB RAM).
Figure 4-2 provides a graphical illustration of the results showing the true
population distributions of Vc and k along with the estimated distribution obtained
from the 200 simulated population data sets. The estimated densities are plotted as
dotted lines, together with the true population distributions plotted as the solid lines.
Many of the estimates shown in the plots are superimposed and close to the reference,
while some still deviate from the population distribution and introduce variations to
the estimator. Visual inspection of the plots also shows that more variations exist in
the estimates of the second (smaller) mixing component for rate constant k than in the
first (larger) one.
Figure 4-2 Estimated population densities of V_c and k
To quantitatively evaluate bias and precision, their statistics were calculated
and are shown in Table 4-1, which gives mean and root mean square prediction errors
(RMSE) as well as the percent coverage of the calculated confidence intervals for each
of the estimated parameters. The population value is the reference used to generate the
200 data sets. The mean of estimates is over the 200 sets of estimates for all the
population parameters. As shown in the table, the parameter estimates match the
population values and the percent coverage of confidence intervals is reasonable. The
estimates of population variance have relatively greater biases and RMSE. In detail,
the percent bias is less than 1% for all the parameters except for the variance of Vc and
two mixing components of k, which are –5.1%, –1.1% and -10.9% respectively. The
three largest RMSE also appear for the estimation of these terms.
Parameter   Population Value   Mean of Estimates   Mean PE (%)   RMSE (%)   Coverage of 95% CI (%)
μ_V         20                 19.978              0.043586      1.0399     94.5
μ_k1        0.3                0.29982             -0.09045      1.6491     96.5
μ_k2        0.6                0.60029             0.10042       2.6455     90.5
σ²_V        4                  3.7605              -5.0867       23.822     94.5
σ²_k1       0.0036             0.003573            -1.0539       14.88      91.0
σ²_k2       0.0036             0.003195            -10.857       40.236     83.5
w_1         0.8                0.80448             0.55991       5.4248     94.5*
σ           0.1                0.099931            -0.06857      4.0618     95.5
Table 4-1 Summary and evaluations of parameter estimates over 200 population runs
* The coverage of the transformed variable log(w_1/(1 − w_1)) is shown.
Mean percent bias is also indicated by the solid lines in Figure 4-3, with the dotted lines representing the 2.5% and 97.5% quantiles. Across all the simulations, these two quantiles demonstrate the symmetry of the percent differences from the reference values.
Figure 4-3 Percent bias and variation of ML estimates
(panels show the percent difference from the reference over the 200 simulations for V_c, Var(V_c), k_1, Var(k_1), k_2, Var(k_2), w_1 and σ)
4.1.3 Classification of Subjects
The probability-based classification approach was presented in section 3.4
and was applied to this PK model simulation study. Figure 4-4 shows the distribution
of correct classification numbers for each population data set.
Figure 4-4 Histogram of percent correct classification
The true subpopulation classification of every subject is known from the simulation process. The figure demonstrates that all the simulations accurately classified at least 96 of the one hundred subjects. Eighty-three of the 200 simulations correctly classified all subjects, and 32 simulations misclassified only 1 subject in the 100-subject population. These results demonstrate the good performance of this classification method.
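The probability-based rule can be sketched compactly: each subject is assigned to the mixture component with the largest posterior probability, proportional to the mixing weight times the component marginal likelihood of that subject's data. The likelihood values below are illustrative placeholders, not quantities from the study.

```python
import numpy as np

def classify(weights, lik):
    """weights: (K,) mixing weights; lik: (N, K) per-component likelihoods of Y_i."""
    tau = weights * lik                          # unnormalized posterior probabilities
    tau = tau / tau.sum(axis=1, keepdims=True)   # normalize over components
    return tau.argmax(axis=1), tau               # hard labels and posteriors

w = np.array([0.8, 0.2])
lik = np.array([[0.90, 0.10],    # subject 0: clearly component 0
                [0.10, 0.90],    # subject 1: likelihood favors 1, prior favors 0
                [0.01, 0.99]])   # subject 2: clearly component 1
labels, tau = classify(w, lik)
print(labels)
```

The middle row illustrates the interplay of the two factors: even with a 0.8 prior weight on component 0, a 9:1 likelihood ratio in favor of component 1 is enough to flip the assignment.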
4.1.4 Numerical Integration and Convergence Issues
(1) Accuracy of Sampling-based Integration
Central to the calculation of the MLEs is the computation of the integrals in the updating formulas, approximated by importance sampling in our implementation. Using one of the 200 population data sets, we examined the number of samples (R) in the importance sampler required to achieve two digits of accuracy for each of the estimated parameters. Table 4-2 presents the parameter estimates from 50 EM iterations using 1000, 2000 and 3000 samples in the importance sampling. Accuracy to two digits was obtained with 1000 samples. Based on this experience, R was taken to be 1000 in the current simulation study.
Parameter   R=1000     R=2000     R=3000
μ_V         19.85039   19.84843   19.85201
μ_k1        0.30840    0.30837    0.30853
μ_k2        0.60038    0.60029    0.60079
σ²_V        5.73544    5.72227    5.70365
σ²_k1       0.00327    0.00323    0.00328
σ²_k2       0.00202    0.00203    0.00200
w_1         0.75164    0.75157    0.75269
σ           0.09706    0.09687    0.09692
Table 4-2 Parameter estimates by importance sampling with 1000, 2000 and 3000 samples
(2) Convergence of EM Iterations
As presented above, the likelihood theoretically increases with every EM iteration. Under fairly general conditions, the likelihood and all the estimates converge to stationary values. Figure 4-5 demonstrates the convergence of the log-likelihood values for a particular data set when the 50 EM iterations are started from 9 different positions. The log-likelihood values were approximated via Monte Carlo integration as
\[ \log\{L(\phi)\} \cong \sum_{i=1}^{n}\log\Big\{ \sum_{k=1}^{K} w_k \frac{1}{R}\sum_{l=1}^{R} p(Y_i \mid \theta_{ik}^{(l)}, \sigma^2)\, p(\theta_{ik}^{(l)} \mid \mu_k, \Sigma_k) \,/\, p_{e_k}(\theta_{ik}^{(l)}) \Big\}, \]
where θ_{ik}^{(1)}, …, θ_{ik}^{(R)} ~ i.i.d. p_{e_k}(θ_i). All the likelihood values in the figure increase to a stationary value and remain relatively constant thereafter. In addition, the likelihood shows a small degree of fluctuation throughout the iterations, which is due to the approximate computational evaluation of the integrals discussed above.
Figure 4-5 Convergence of log-likelihood by starting the EM algorithm from 9 positions
The convergence of the EM sequence can be assessed by monitoring the change both in the estimates and in the likelihood values as the iterative estimation progresses. This is the method used in this simulation study example. Letting φ^{(r)} and L^{(r)} be the parameter estimate and likelihood value after r iterations, respectively, the sequence is assumed to be stationary if
\[ \|\phi^{(r+1)} - \phi^{(r)}\| < \varepsilon_{\phi} \quad \text{and} \quad \|L^{(r+1)} - L^{(r)}\| < \varepsilon_{L}, \]
where ε_φ or ε_L is a small number with a certain digit accuracy. In finite mixture PK/PD models, a reasonable choice of such a number depends on the complexity of the model settings, the desired precision of the estimates, and the accuracy level that can be achieved by Monte Carlo integration.
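The stopping rule above can be sketched as a loop wrapper. The "EM step" and "log-likelihood" below are deliberately trivial stand-ins that converge geometrically toward fixed target values; they are placeholders for the real mixture updates, used only to show the stopping logic.

```python
import numpy as np

def em_step(phi):
    # placeholder update: moves halfway toward a fixed target each iteration
    target = np.array([0.3, 0.6, 0.8])
    return phi + 0.5 * (target - phi)

def log_lik(phi):
    # placeholder log-likelihood, maximized at the same target
    return -np.sum((phi - np.array([0.3, 0.6, 0.8])) ** 2)

eps_phi, eps_L = 1e-6, 1e-9        # tolerances eps_phi and eps_L
phi, L = np.array([0.0, 0.0, 0.0]), -np.inf
for r in range(200):
    phi_new = em_step(phi)
    L_new = log_lik(phi_new)
    # stop when BOTH the estimate and the likelihood have stabilized
    if np.linalg.norm(phi_new - phi) < eps_phi and abs(L_new - L) < eps_L:
        break
    phi, L = phi_new, L_new

print(r, np.round(phi, 6))
```

In the real algorithm the likelihood fluctuates slightly from the Monte Carlo integration, so the tolerances must be set no tighter than that Monte Carlo noise floor, as the text notes.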
One test data set was used here again to determine the proper number of EM iterations. The standard adopted is to compare the estimates and likelihood from the 25th EM iteration with those from the 50th. The EM chains of 50 iterations for every parameter estimate, together with the calculated log-likelihood, are plotted in Figure 4-6. Table 4-3 lists both the estimated parameters and the likelihood values under comparison.
Figure 4-6 EM updating with 50 iterations for test data set
Figure 4-6 (continued) EM updating with 50 iterations for test data set
           μ_V       μ_k1     μ_k2     σ²_V     σ²_k1    σ²_k2    w_1      σ        Log(L)
25th EM    19.3723   0.3063   0.5915   3.5969   0.0040   0.0023   0.8385   0.0907   440.91
50th EM    19.3656   0.3066   0.5922   3.4849   0.0039   0.0021   0.8396   0.0913   440.27
Table 4-3 Parameter estimates and log-likelihood values of the 25th and 50th EM iterations
Visual inspection of Figure 4-6 shows that all the EM sequences enter a stable state quickly, after about 20 iterations, even though the updating of the variance parameters involves relatively more variation.
Table 4-3 shows that all the estimates agree to two digits of accuracy (except for the variance of the distribution volume, Var(V_c), for which only one digit is identical between the two estimates). Therefore, for this simulation example, 50 iterations are enough for the EM updating sequence to reach a stable state and to obtain a reliable ML estimate φ^{(50)}.
(3) General Comments on Numerical Integration
The Monte Carlo methods converge to the true values of the integrals as R increases. To compare different strategies, account is taken of R, the number of i.i.d. random samples needed to reach adequate precision.
Comparisons of the three sampling techniques mentioned above (direct sampling, importance sampling and MCMC sampling) have been made using a simulated population study with a non-mixture one-compartment PK model. The results indicated that, to achieve the same accuracy level, the importance sampling technique with a properly specified envelope function usually needs fewer samples per subject than the direct sampling approach. The MCMC approach (with the independence Metropolis-Hastings algorithm) can be computationally more efficient (less computer time to run the algorithm) than importance sampling if the proposal distribution is close enough to the target distribution.
In any particular example, of course, the number of samples R used to approximate the integrals, the number of EM iterations, and the choice of starting guess will depend on the experimental design and the complexity of the model. From previous experience with non-mixture models, direct sampling will be less efficient for more complicated PK/PD models. Importance sampling and MCMC sampling are therefore the appropriate strategies to adopt for models of large complexity.
4.2 First-order Absorption PK Model Example
4.2.1 The Model and Simulation Experiment
A one-compartment PK model with first-order elimination and first-order absorption was chosen for this simulation study. The model structure is presented in Figure 4-7. The intra-subject variability model has a proportional component with a 10% coefficient of variation. A single 100 mg oral dose with a relatively sparse sampling design was chosen for all simulations. Observation times of 2, 8, 12 and 24 hours were chosen for all the concentrations in the central compartment.
Figure 4-7 One-compartment PK model with first-order absorption
The volume of distribution and absorption rate constant were simulated from a non-mixture population distribution, with population means of 50 liters and 1.0 hr⁻¹, respectively. The inter-subject variability was set to a 20% coefficient of variation (CV) for both parameters. Each simulated population comprised two subpopulations based on differences in the drug's overall clearance. The clearance of subpopulation 1 was centered at 5 liter/hr, while that of subpopulation 2 was centered at 10 liter/hr. The inter-subject variability for both subpopulations was 50% CV. The mixing fractions of the two subpopulations were 0.3 and 0.7, respectively. Figure 4-8 shows the mixture population density of the clearance, with two dotted curves representing the densities of the two subpopulations.
Figure 4-8 Population density of clearance CL
The population size was 50 subjects per trial. Three hundred populations were simulated and subsequently analyzed. Fifty EM iterations were completed for each trial, with 1000 samples generated for the importance sampling calculations.
4.2.2 Estimation Results: Bias and Variability
The computation time for this example was approximately 30 minutes for each population data set on a PC (Pentium D with 3.73 GHz CPU, 3.25 GB RAM, MATLAB implementation).
The estimated population densities for the clearance CL, the distribution volume V_c and the absorption rate constant k_a are plotted as dotted lines in Figure 4-9. Considerably more variability was observed relative to the first simulation example.
Figure 4-9 Estimated population densities of CL, V_c and k_a
Table 4-4 shows the bias, RMSE and percent coverage of the calculated confidence intervals for each of the estimated parameters. The estimates all match the reference values reasonably well. As expected, for the difficult mixture model case in this example (i.e., less separated subpopulations, large inter-subject variability, small sample size and sparse sampling), more bias and variation were observed than in the previous example.
Parameter      Population Values   Mean of Estimates   Mean PE (%)   RMSE (%)   Coverage of 95% CI (%)
μ_CL1          5                   4.82                -3.56         26.6       87.7
μ_CL2          10                  10.2                1.53          21.5       84.3
μ_V            50                  49.1                -1.83         3.41       98.3
μ_ka           1                   0.969               -3.13         6.41       87.0
σ²_cl (CV%)    50                  42.1                -29.0         41.2       72.0
σ²_v (CV%)     20                  18.6                -13.2         26.8       89.7
σ²_ka (CV%)    20                  11.9                -64.6         74.5       66.0
w_1            0.3                 0.434               44.7          69.2       84.7
σ              0.1                 0.113               12.7          17.6       83.0
Table 4-4 Summary and evaluations of parameter estimates over 300 population runs
The percent prediction errors of all the parameters are plotted in Figure 4-10 for all 300 population trials. Their average values (i.e., mean PE) and their 2.5% and 97.5% quantiles are represented by the lines in the graphs. The prediction error for the inter-subject variability of k_a was highly skewed to the negative side, meaning that this parameter was substantially underestimated. This could be due to the fact that the simulation study design does not provide enough information about k_a: the only datum simulated during the absorption phase is the one located at the peak of the concentration profile. Therefore, instead of precisely estimating the inter-subject variability, the algorithm tends to shrink the individual k_a values towards the population mean and treat them as fixed.
Figure 4-10 Percent differences of parameter estimates
(panels show the percent difference from the reference over the 300 simulations for V_c, Var(V_c), CL_1, Var(CL), CL_2, k_a, Var(k_a) and w_1)
4.2.3 Classification of Subjects
The classification was performed based on the posterior probability. Figure 4-11 shows the histogram of the percentage of correct classification. The average percent correct classification is 71.25%.
Figure 4-11 Histogram of percent correct classification
Kaila et al. [20] conducted a PK simulation-estimation study with the same one-compartment absorption PK model using the first-order conditional estimation option in NONMEM. Under the same simulation scenario, the proportion of subjects they classified into the correct subpopulation was 71.5%, which is comparable to our results. However, only half of their 300 runs converged with a successful covariance step. They did not report prediction errors for the estimates, and thus we could not evaluate the biases of their population estimates.
4.3 One-compartment PK / Emax PD Model Example
4.3.1 The Model and Simulation Experiment
In this simulation study, a pharmacokinetic/pharmacodynamic model of a hypothetical β2 agonist was considered. The structural model is shown in Figure 4-12.
Figure 4-12 Pharmacokinetic/pharmacodynamic model for a β2 agonist
The pharmacokinetics is described by a one-compartment linear model with a distribution volume V_c and elimination rate constant k. The drug plasma concentration is related to the drug's bronchodilation effect FEV1 (forced expiratory volume in one second) by a sigmoid response function whose three dynamic parameters Emax, E0 and EC50 denote the maximum drug effect, the baseline effect, and the plasma concentration producing 50% of Emax, respectively [34]. Eq.(4.3) describes the PK/PD model, where c(t) represents the plasma concentration and y(t) is the measured drug response FEV1. The PK/PD parameter vector in this example is defined as θ_i = {V_{ci}, k_i, Emax_i, E0_i, EC50_i}.
\[ c(t) = \frac{D}{V_c}\exp(-k\,t) \]
\[ y(t) = E0 + \frac{(Emax - E0)\, c(t)}{EC50 + c(t)} \quad (4.3) \]
Following a bolus dose of 100 units, three measurements of FEV1 for each subject were assumed to be available at 1.0, 8.0 and 23 hours. The measurement error was simulated to be normally distributed with zero mean and constant variance, i.e., R_i(θ_i, β) = σ², with σ = 0.1.
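The response computation of Eq. (4.3) can be sketched directly. The parameter values below are the population means given later in this section; they are used here only to illustrate how the bolus PK profile drives the sigmoid Emax response.

```python
import numpy as np

def pkpd_response(t, D=100.0, Vc=20.0, k=0.3, E0=2.0, Emax=3.8, EC50=2.3):
    c = (D / Vc) * np.exp(-k * t)             # plasma concentration, Eq. (4.3) PK part
    return E0 + (Emax - E0) * c / (EC50 + c)  # sigmoid FEV1 response, Eq. (4.3) PD part

t = np.array([1.0, 8.0, 23.0])                # the assumed FEV1 sampling times (hours)
y = pkpd_response(t)
print(np.round(y, 3))
```

The response is bounded between E0 (as the drug washes out) and Emax (at high concentrations), which is what makes a subject's late-time FEV1 informative about E0 and the early measurement informative about Emax.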
It has been observed that genetic polymorphism of the drug target, the $\beta_2$-adrenoreceptor, can be linked to clinical responses. Three SNPs (single nucleotide polymorphisms) have been associated with altered expression, downregulation, or coupling of the receptor in response to $\beta_2$ agonists. As a result, patients with different genotypes can show various degrees of venodilation in response to the agonist medication, and such alterations are common among populations [24]. For the bronchodilation effect of $\beta_2$ agonists, it is reasonable to assume that a similar polymorphism will occur because the drug receptor is identical. Based on these observations, the simulated model assumed that there are two subpopulations differing in the maximum effect produced by the drug. One subpopulation comprises a proportion $w_1$ of the total population, with a lower maximum response $E_{max}$. The remaining subpopulation responds to the drug with normal (high) $E_{max}$, comprising $(1 - w_1)$ of the total population. This implies a mixture model with two components for the PD parameter $E_{max}$:

$$ E_{max} \sim w_1\, N(\mu_{low}, \Sigma_{low}) + w_2\, N(\mu_{norm}, \Sigma_{norm}) \qquad (4.4) $$

The other model parameters $\{V_c, k, E_0, EC_{50}\}$ are assumed to be independent and normally distributed with population mean $\mu$ and covariance matrix $\Sigma$. The simulated population parameters are as follows: $(V_c, \mathrm{var}(V_c)) = (20, 4)$, $(k, \mathrm{var}(k)) = (0.3, 0.0036)$, $(E_0, \mathrm{var}(E_0)) = (2, 0.16)$, $(EC_{50}, \mathrm{var}(EC_{50})) = (2.3, 0.16)$, $(\mu_{low}, \Sigma_{low}) = (2.8, 0.04)$, $(\mu_{norm}, \Sigma_{norm}) = (3.8, 0.04)$, and $w_1 = 0.1$.
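Simulating one population under the mixture of Eq.(4.4) can be sketched as below (function names are ours; the standard deviation 0.2 is $\sqrt{0.04}$ from the simulated values):

```python
import random

def sample_emax(rng, w1=0.1, mu_low=2.8, sd_low=0.2, mu_norm=3.8, sd_norm=0.2):
    """Draw one Emax value from the two-component mixture of Eq.(4.4)."""
    if rng.random() < w1:                        # low-efficacy subpopulation
        return rng.gauss(mu_low, sd_low), "low"
    return rng.gauss(mu_norm, sd_norm), "normal"

rng = random.Random(0)
draws = [sample_emax(rng) for _ in range(300)]   # one 300-subject population
n_low = sum(1 for _, group in draws if group == "low")
print(f"low-efficacy subjects: {n_low} of 300")
```

With $w_1 = 0.1$, a 300-subject data set contains about 30 low-efficacy subjects on average, which is why the low-$E_{max}$ component is estimated from relatively little data.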
One population data set was constructed consisting of 300 subjects. Figure
4-13 shows the simulated mean response-time curves for the two subpopulations.
Maximum likelihood estimation via the EM algorithm was performed on the simulated data set. The importance sampling method used 1000 random samples in each EM iteration to evaluate the required integrals. Fifty EM iterations were run on a PC (Pentium 4, 2 GHz CPU, 1 GB RAM) in about 12 hours.
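The importance-sampling computation inside each E step can be illustrated with a one-dimensional sketch. The real E step integrates over the full parameter vector; the scalar normal likelihood below and the use of the population density as the sampler are simplifying assumptions for illustration.

```python
import math, random

def normal_pdf(x, mu, sd):
    return math.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))

def marginal_likelihood(y, mu_pop, sd_pop, sd_err, n_samples=1000, seed=1):
    """Monte Carlo estimate of p(y|phi) = integral of l(y|theta) p(theta|phi) dtheta,
    drawing the samples from the population density itself."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        theta = rng.gauss(mu_pop, sd_pop)       # theta ~ p(theta | phi)
        total += normal_pdf(y, theta, sd_err)   # accumulate the likelihood l(y | theta)
    return total / n_samples

est = marginal_likelihood(y=0.35, mu_pop=0.3, sd_pop=0.06, sd_err=0.05)
exact = normal_pdf(0.35, 0.3, math.sqrt(0.06**2 + 0.05**2))  # closed form in this toy case
print(est, exact)
```

In this linear-normal toy case the marginal has a closed form, so the Monte Carlo estimate can be checked against it; increasing `n_samples` tightens the agreement, which is the sense in which the sampling controls the estimation precision.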
[Figure: Time (hr) vs. FEV1 (liter); curves for the Low-efficacy and Normal-efficacy Groups]
Figure 4-13 Mean response-time curves for two subpopulations
(error bars indicate one standard deviation)
4.3.2 Estimation Results
(1) Population Estimates
[Figure: population density p(Emax) vs. Emax]
Figure 4-14 Estimated population density of efficacy Emax
Figure 4-14 shows the bimodal distribution of Emax from the estimation, superimposed on the reference density. Visual inspection shows that the mean of the first estimated mixture component is slightly overestimated, while the variance of the second component is underestimated.

Estimation results for the population parameters are shown in Table 4-5. The true values were obtained from the sample statistics of the simulation. Percent prediction error, defined in section 5.1.2, is used as an index of estimation precision. The algorithm provided relatively accurate estimates for most of the parameters, with less than ±10% prediction error. The estimate of the variance of the second mixture component has the largest prediction error, -53%.
Parameter      Estimate    True      Percent Prediction Error
μ_Emax_1        2.8746     2.791        3.00
μ_Emax_2        3.8002     3.801       -0.02
σ²_Emax_1       0.0395     0.0372       6.20
σ²_Emax_2       0.0205     0.0435     -53.00
μ_Vc           19.8028    19.91        -0.54
μ_k             0.3003     0.3099      -3.10
μ_E0            2.0185     2.013        0.27
μ_EC50          2.2893     2.288        0.06
σ²_Vc           3.2674     3.71       -12.00
σ²_k            0.0023     0.00367    -37.00
σ²_E0           0.1693     0.167        1.40
σ²_EC50         0.1287     0.157      -18.00
w1              0.1506     0.12        26.00
w2              0.8494     0.88        -3.50
σ               0.1194     0.1         19.00

Table 4-5 Summary of population parameter estimates
(2) Classification of Subjects
This analysis was also evaluated by comparing the individual classification
results with the true classifications known from the simulation process. Table 4-6
gives the classification results.
                         Reference
Estimation           Low Emax    Normal Emax
Low Emax                34            7
Normal Emax              2          257

Table 4-6 Individual classification results
Of the 300 subjects analyzed, nine were misclassified. Seven normal-response individuals were falsely classified as having low response, while two low-response subjects were falsely classified into the normal-response group. This yields an accuracy of 97% for the mixture model classification.
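The probability-based assignment behind these counts can be sketched in one dimension: each subject goes to the component with the larger posterior probability $\tau_{ik} \propto w_k\, p(Y_i \mid \text{component } k)$. The scalar densities and numbers below are illustrative, not the actual multivariate computation of the analysis.

```python
import math

def classify(y, weights, means, sds):
    """Assign an observation to the component with the largest posterior
    probability tau_k = w_k * N(y|mu_k, sd_k^2) / sum_j w_j * N(y|mu_j, sd_j^2)."""
    dens = [w * math.exp(-0.5 * ((y - m) / s) ** 2) / s
            for w, m, s in zip(weights, means, sds)]
    total = sum(dens)
    taus = [d / total for d in dens]
    return max(range(len(taus)), key=lambda k: taus[k]), taus

# Illustrative Emax mixture: weights (0.1, 0.9), means (2.8, 3.8), sd 0.2
label, taus = classify(3.2, [0.1, 0.9], [2.8, 3.8], [0.2, 0.2])
print(label, [round(t, 3) for t in taus])
```

A borderline value such as 3.2 produces posterior probabilities close to one half, which is exactly the situation of the misclassified subjects discussed below: small estimation errors can then flip the assignment.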
Figure 4-15 and Figure 4-16 plot the simulated measurements of those 7 and 2 falsely classified subjects, respectively, together with the individual predictions of their response-time profiles made from the conditional estimates. Also plotted are the corresponding population predictions for both subpopulations, computed from their population means. For these subjects, the measurements and the individual predictions lie closer to the profile of the "false" subgroup, which is consistent with the classification results. Such false closeness can be partly attributed to estimation imprecision.
[Figure: seven panels (subjects 59, 86, 115, 132, 156, 191, 245), each plotting FEV1 vs. Time; legend: measurements, subgroup2 mean prediction, subgroup1 mean prediction, individual prediction]
Figure 4-15 Measurements and individual predictions of 7 subjects
falsely classified into the low Emax group
[Figure: two panels (subjects 106, 216), each plotting FEV1 vs. Time]
Figure 4-16 Measurements and individual predictions of 2 subjects
falsely classified into the normal Emax group

[Figure: scatter of each subject's true Emax (liter) vs. subject number (1-300), distinguishing the Low-efficacy and Normal-efficacy Groups; the nine misclassified subjects are labeled 1 or 2]
Figure 4-17 Individual classification results
The classification results are also shown in Figure 4-17, in which each subject's true Emax value is plotted. The 9 falsely classified subjects are labeled with the number 1 or 2, denoting that they were incorrectly assigned to the first (low Emax) or second (normal Emax) group, respectively. The locations of most of the nine false cases reflect the fact that their Emax values lie near the borderline between the low and normal response groups. This made it hard for the algorithm to pick the right subpopulation for them, and is another possible reason for the misclassifications.
4.4 Comments on the ML Mixture Modeling Examples

The application of the EM algorithm to maximum likelihood estimation of mixture models offers a practical solution for PK/PD problems involving a polymorphic population. The results from the three examples demonstrate that the proposed methods yield population estimates with reasonable precision, and that individuals are classified to subpopulations accurately.

For all the examples, only the density of one parameter was assumed to be a mixture, while the other parameter distributions were restricted to non-mixture densities. While this case is important, in a typical population PK/PD analysis it is generally not known which parameters follow a mixture distribution. Usually a mixture model for each parameter is developed first, followed by statistical tests to determine which subset of parameters may represent mixtures. A subsequent analysis with these PK/PD parameters specified as mixtures would serve as a post hoc estimation procedure.
Two of the examples (the first and third) represent the relatively easy case of mixture model analysis: the mean values of the two subgroups were fairly well separated and the inter-subject variability was relatively low. The mixture model in the second simulation study has a large overlapping region between the two subpopulations, and the larger variability (50% CV) makes the subpopulations less separated still. As a result, the uncertainty of classifying an individual to a given subpopulation is increased, which is reflected in the smaller percent correct classification. In addition, the smaller sample size (50 subjects) and relatively sparse sampling per subject likely provide less information for parameter estimation, with correspondingly less accurate classification.

It should be noted that study design also plays an important role in population PK/PD problems. In the present simulation studies, the measurement time points were selected to capture enough information about the model outputs. Had the design been less informative, poor estimates would have been obtained regardless of the analysis method used.
5 Identifiability and Singularity Issues
The EM algorithm and computational techniques of maximum likelihood estimation for finite mixture models have been addressed, with examples presented in Chapter 4. In this chapter, important features of the likelihood function, including identifiability and singularity issues, are presented and proved.

5.1 Identifiability of Mixture Distributions

For maximum likelihood estimation, the problem is said to be identifiable if distinct sets of parameter values result in distinct likelihood functions. This identifiability issue has been explored extensively for the ML estimation of "traditional" mixture models (see section 3.1 of reference [31]). With respect to the finite mixture models in population PK/PD studies, this property is investigated as follows.
Using the notation of the population PK/PD problem with a finite mixture model, as in Eq.(2.6) and Eq.(3.2), Schumitzky considered the identifiability issue and noted that an identifiability problem arises when the individual model output $h_i(\theta_i)$ takes a linear form:

$$ Y_i \mid \theta_i, \beta \sim N(A_i \theta_i, \sigma^2 I), \quad i = 1, \ldots, N \qquad (5.1) $$

where $\{A_i\}$ are known design matrices. This identifiability problem can be illustrated as follows: let the probability density $p(Y_i \mid \phi)$ be given by:

$$ p(Y_i \mid \phi) = \int l(Y_i \mid \theta_i, \beta)\, p(\theta_i \mid \phi)\, d\theta_i. \qquad (5.2) $$
Using the mixture representation in Eq.(3.2), this can be written as:

$$ p(Y_i \mid \phi) = \sum_{k=1}^{K} w_k \int l(Y_i \mid \theta_i, \sigma^2)\, \eta(\theta_i \mid \mu_k, \Sigma_k)\, d\theta_i \qquad (5.3) $$

where $\eta(\mu, \Sigma)$ denotes a normal density with mean vector $\mu$ and covariance matrix $\Sigma$. Now

$$ \int l(Y_i \mid \theta_i, \sigma^2)\, \eta(\theta_i \mid \mu_k, \Sigma_k)\, d\theta_i = \int \eta(Y_i \mid A_i \theta_i, \sigma^2 I)\, \eta(\theta_i \mid \mu_k, \Sigma_k)\, d\theta_i = \eta(Y_i \mid A_i \mu_k,\ A_i \Sigma_k A_i' + \sigma^2 I) $$

so that

$$ p(Y_i \mid \phi) = \sum_{k=1}^{K} w_k\, \eta(Y_i \mid A_i \mu_k,\ A_i \Sigma_k A_i' + \sigma^2 I). \qquad (5.4) $$

From the independence assumption on the data $\{Y_1, \ldots, Y_N\}$, the likelihood function is then given by $L = \prod_{i=1}^{N} p(Y_i \mid \phi)$.
The consequence of Eq.(5.4) is that, in terms of variances, the likelihood $L$ is a function of $A_i \Sigma_k A_i' + \sigma^2 I$, not of $\{\Sigma_k\}$ and $\sigma^2$ separately. These two terms therefore cannot be identified separately; only the combination $\{A_i \Sigma_k A_i' + \sigma^2 I\}$ can be determined uniquely. The equation also shows that the experiment design, embedded in the matrices $\{A_i\}$, is involved in the identifiability of $\{\Sigma_k\}$ and $\sigma^2$.
For an important class of PK models, the concentration-time profile has a sum-of-exponentials form. For such nonlinear models, the integral in Eq.(5.3) does not have the closed form given in Eq.(5.4). Therefore it is difficult to get as clear an insight into the identifiability issue as with the traditional linear mixture model. As a conjecture, the problem can proceed in two opposite directions. The invariant $\{A_i \Sigma_k A_i' + \sigma^2 I\}$ may be broken up, such that $\{\Sigma_k\}$ and $\sigma^2$ appear in different terms and become separable as a consequence of the nonlinearity. On the other hand, the identifiability invariant might persist in models of higher complexity, in which case it would be a more complicated function of $\{\Sigma_k\}$ and $\sigma^2$. The simulation study results presented in Chapter 4 show relatively good and consistent performance of the ML estimator, supporting the first conjecture that the identifiability invariant is likely to disappear in nonlinear PK/PD mixture models. Studies with a variety of PK/PD model structures are necessary for further assessment of this problem.
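The linear-case non-identifiability of Eq.(5.4) can be checked numerically: for a scalar model $Y = A\theta + \varepsilon$, any two pairs $(\Sigma, \sigma^2)$ with the same value of $A^2\Sigma + \sigma^2$ give exactly the same likelihood. A minimal sketch:

```python
import math

def loglik(y, A, mu, Sigma, sigma2):
    """Log-likelihood of the scalar linear model Y = A*theta + eps with
    theta ~ N(mu, Sigma) and eps ~ N(0, sigma2), so Y ~ N(A*mu, A^2*Sigma + sigma2)."""
    var = A * A * Sigma + sigma2
    return -0.5 * math.log(2 * math.pi * var) - 0.5 * (y - A * mu) ** 2 / var

y, A, mu = 1.3, 2.0, 0.5
ll1 = loglik(y, A, mu, Sigma=0.10, sigma2=0.05)   # A^2*Sigma + sigma2 = 0.45
ll2 = loglik(y, A, mu, Sigma=0.05, sigma2=0.25)   # A^2*Sigma + sigma2 = 0.45 again
print(ll1, ll2)
```

The two log-likelihoods are identical, so no amount of data of this design can distinguish the two variance decompositions; only the combination is estimable.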
5.2 Singularity of the Maximum Likelihood Problem for Mixture Models

It has been shown that for linear mixture models, a likelihood singularity problem exists if the covariance matrices of the mixture components are not constrained to be identical (the heteroscedastic mixture model) [31]. That is, for a particular choice of the parameters in $\phi$, the likelihood is unbounded: $\sup_{\phi} L(\phi) = \infty$. In population PK/PD studies, the functional forms of PK/PD models are usually nonlinear, so the singularity of the ML problem for nonlinear mixture models is of particular relevance here. Two types of mixture structures are examined below.
5.2.1 Mixtures with Heteroscedastic Components

Assume the general nonlinear finite mixture model hierarchy of Eq.(2.6) and Eq.(3.2), with two different component covariance matrices $\Sigma_1$ and $\Sigma_2$, and the error variance model defined by Eq.(5.1). Schumitzky observed that the likelihood function of the PK/PD finite mixture model with heteroscedastic components is unbounded whenever at least one subject's data can be fitted exactly. Precisely, if there exists a $\mu_1^*$ for which there is a non-empty set of indices $J$ such that $Y_i = h_i(\mu_1^*)$ for $i \in J$, then for a particular choice of parameters the likelihood function goes to infinity, i.e., $\sup L = \infty$.

The detection of such singularities is not difficult in practice, as they reveal themselves when one or more component covariance matrices become singular or the likelihood becomes unbounded during estimation. However, consideration has to be given to the estimation approach. For maximum likelihood estimation, the essential aim is to identify a consistent sequence of roots of the likelihood equation. In the case of a mixture of normal distributions, such a sequence corresponds to local maxima, and the largest one is preferred. One popular solution is to run multiple EM sequences from different starting points, which will usually identify several local maxima. Hathaway [18] introduced a constrained EM algorithm, which puts constraints on both the mixing weights and the component standard deviations. Bayesian approaches can also be used to solve this problem, with a prior distribution serving as a regularization of the parameters. With appropriately chosen prior distributions, such an approach can eliminate convergence failures due to singularity while having little effect on stable results obtainable without a prior [13]. Because of its greater flexibility, this Bayesian approach, especially the maximum a posteriori probability (MAP) method introduced in the next chapter, is a promising solution to the problem of singularities.
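The unboundedness can be seen directly from the normal density: when one subject's data are fitted exactly by a component mean, that subject's likelihood contribution grows without bound as the component variance shrinks. A scalar sketch:

```python
import math

def component_density(y, mu, var):
    """Normal density of one observation under one mixture component."""
    return math.exp(-0.5 * (y - mu) ** 2 / var) / math.sqrt(2 * math.pi * var)

y = 2.0
exact_fit = [component_density(y, mu=y, var=v) for v in (1.0, 0.01, 1e-6)]
print(exact_fit)   # diverges as var -> 0 when the fit is exact
```

Because the exponent vanishes for an exact fit, the density reduces to the normalizing constant $1/\sqrt{2\pi\,\mathrm{var}}$, which is exactly the quantity that blows up and that the constrained or Bayesian remedies above keep bounded.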
5.2.2 Mixtures with Homoscedastic Components

An alternative mixture model has homoscedastic normal components, i.e., $\Sigma_1 = \Sigma_2 = \Sigma$. For this model assumption there is a stricter condition for singularity. Following the idea of the proof for the heteroscedastic case, the following equation can be derived by taking the limit $\Sigma \to 0$:

$$ \lim_{\Sigma \to 0} L_i = w_1\, \eta(Y_i \mid h_i(\mu_1), \sigma^2 I) + w_2\, \eta(Y_i \mid h_i(\mu_2), \sigma^2 I) \qquad (5.5) $$

where all terms have identical variance matrices $\sigma^2 I$. If part of the subjects' data can be fitted exactly, i.e., $Y_i = h_i(\mu_1^*)$ or $Y_i = h_i(\mu_2^*)$ for $i \in J$, while there remain subjects who do not fulfill this requirement, then as $\sigma \to 0$ the individual likelihood $L_i \to \infty$ for each exact-fit subject, while the remaining subjects' likelihoods approach zero. The overall likelihood $L$ is then given by:

$$ \lim_{\sigma \to 0} \lim_{\Sigma \to 0} L = \prod_{i \in J} L_i \cdot \prod_{i \notin J} L_i < \infty \qquad (5.6) $$

and $L$ is bounded. Therefore, for mixture models with homoscedastic components, the sufficient condition for likelihood singularity is that all the data can be fitted exactly; in other words, every subject satisfies $Y_i = h_i(\mu_k^*)$ for some $k = 1, \ldots, K$.
In population PK/PD studies, each data set represents the measurements from one subject, and the data sets are distinct from each other. In this situation, the sufficient condition for singularity proved above implies that the number of data sets (here, the number of subjects $N$) would need to be smaller than the number of mixture components $K$ for the likelihood to be singular. In practice, although the size of a population PK/PD study varies with the stage of the drug development process, it seldom recruits fewer than 10 subjects. On the other hand, an integer less than five is usually a realistic and computationally convenient assumption for the number of mixture components. It is therefore very unlikely that the above sufficient condition could be satisfied.

This observation implies that with the mixture covariance matrices $\{\Sigma_k, k = 1, \ldots, K\}$ restricted to be equal, the ML estimate of the parameters exists as a global maximizer. It thus provides a possible solution to the singularity problem in heteroscedastic mixture models. But such restrictions may introduce bias into the estimates if the model assumption is not consistent with the true underlying distribution.
6 Population PK/PD Analysis with Finite Mixture Models:
MAP Solution via the EM Algorithm
6.1 Parametric Bayesian Methods
In this approach, the population PK/PD parameters $\phi = \{(w_k, \mu_k, \Sigma_k), k = 1, \ldots, K;\ \beta\}$ are assumed to be random variables with a known prior distribution. The population analysis problem is then to estimate the conditional distribution of these parameters given the population data $Y^N = \{Y_1, \ldots, Y_N\}$ and the prior distribution. More precisely, let $p(\phi)$ be the prior distribution for $\phi$; its posterior distribution given $Y^N$ is:

$$ p(\phi \mid Y^N) = \frac{p(Y^N \mid \phi)\, p(\phi)}{p(Y^N)} = \frac{\left[\prod_{i=1}^{N} p(Y_i \mid \phi)\right] p(\phi)}{p(Y^N)}, $$

where $p(Y_i \mid \phi)$ is given by Eq.(3.5).

The computational details of this approach are given in section 2.3.2 for the case of a general non-mixture population model. Tatarinova [50] extended it to the difficult mixture case, and proposed solutions to the label-switching and trapping problems. In what follows, we develop a solution that obtains the maximum a posteriori probability (MAP) estimate of $\phi$.
6.2 Solution via the EM Algorithm
The MAP approach provides a point estimate of $\phi$, namely the mode of its posterior distribution: $\phi_{MAP} = \arg\max_{\phi} p(\phi \mid Y^N)$. This estimator has many of the desirable properties of the ML estimator, such as consistency. The EM algorithm can be used to calculate $\phi_{MAP}$ in a manner similar to that presented above for the maximum likelihood estimate. To illustrate this approach, it is assumed that the prior density of $\phi$ is given by:

$$ p(\phi) = p(w_1, \ldots, w_K)\, p(\beta) \prod_{k=1}^{K} p(\mu_k \mid \Sigma_k)\, p(\Sigma_k), $$

where

$$ (w_1, \ldots, w_K) \sim Dirichlet(a_1, \ldots, a_K) $$
$$ \mu_k \mid \Sigma_k \sim N(\lambda_k, \Sigma_k / \tau_k) $$
$$ (\Sigma_k)^{-1} \sim Wishart(q_k, \Psi_k) $$
$$ (\sigma^2)^{-1} \sim Gamma(a, b). \qquad (6.1) $$

These are conjugate priors, meaning the posterior has the same form as the prior. The hyper-parameters $\{a, b, (a_k, \lambda_k, \tau_k, q_k, \Psi_k), k = 1, \ldots, K\}$ are assumed known and are chosen to express the prior knowledge brought to the analysis. A constant coefficient of variation error model is assumed here: $R_i(\theta_i, \beta) = \sigma^2 R_i(\theta_i)$, with $\beta = \sigma$. The EM algorithm can be applied to this problem. If $\phi^{(0)}$ is any initial point, and $\phi^{(r)}$ denotes the current estimate of $\phi_{MAP}$, the components of the updated estimate $\phi^{(r+1)}$ are given by:

$$ w_k^{(r+1)} = \frac{\int G_k(\theta \mid \phi^{(r)})\, d\theta + a_k - 1}{N - K + \sum_{k=1}^{K} a_k} $$

$$ \mu_k^{(r+1)} = \frac{\int \theta\, G_k(\theta \mid \phi^{(r)})\, d\theta + \tau_k \lambda_k}{\int G_k(\theta \mid \phi^{(r)})\, d\theta + \tau_k} $$

$$ \Sigma_k^{(r+1)} = \frac{\int (\theta - \mu_k^{(r+1)})(\theta - \mu_k^{(r+1)})'\, G_k(\theta \mid \phi^{(r)})\, d\theta + \tau_k (\lambda_k - \mu_k^{(r+1)})(\lambda_k - \mu_k^{(r+1)})' + \Psi_k}{\int G_k(\theta \mid \phi^{(r)})\, d\theta + q_k - d} $$

$$ (\sigma^2)^{(r+1)} = \frac{\sum_{i=1}^{N} \int (Y_i - h_i(\theta_i))'\, R_i^{-1}(\theta_i)\, (Y_i - h_i(\theta_i))\, F_i(\theta_i \mid \phi^{(r)})\, d\theta_i + 2b}{\sum_{i=1}^{N} m_i + 2(a - 1)} \qquad (6.2) $$

where $G_k(\theta \mid \phi)$ and $F_i(\theta_i \mid \phi)$ are given by Eq.(3.8), $m_i$ is the number of measurements for subject $i$, and $d = \dim(\theta_i)$. The complete derivation can be found in Appendix 3. It can be proved that the iterates defined by the above equations have the monotonicity property $p(\phi^{(r+1)} \mid Y^N) \ge p(\phi^{(r)} \mid Y^N)$.
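The structure of the first two updates in Eq.(6.2) is easy to see once the component membership quantities $N_k = \int G_k(\theta \mid \phi)\, d\theta$ are treated as plain numbers. The sketch below does exactly that for a scalar parameter; the effective counts and hyper-parameter values are illustrative assumptions, not results from the dissertation's analyses.

```python
def map_update_weights(Nk, a):
    """MAP update of mixture weights under a Dirichlet(a_1,...,a_K) prior:
    w_k = (N_k + a_k - 1) / (N - K + sum(a))."""
    N, K = sum(Nk), len(Nk)
    denom = N - K + sum(a)
    return [(n + ak - 1) / denom for n, ak in zip(Nk, a)]

def map_update_mean(theta_weighted_sum, Nk_k, tau_k, lambda_k):
    """MAP update of a scalar component mean under the conjugate prior
    mu_k | Sigma_k ~ N(lambda_k, Sigma_k / tau_k):
    mu_k = (G_k-weighted sum of theta + tau_k * lambda_k) / (N_k + tau_k)."""
    return (theta_weighted_sum + tau_k * lambda_k) / (Nk_k + tau_k)

# Illustrative: 100 subjects, effective counts 18 and 82, flat Dirichlet(1, 1)
w = map_update_weights([18.0, 82.0], [1.0, 1.0])
mu1 = map_update_mean(theta_weighted_sum=5.4, Nk_k=18.0, tau_k=0.01, lambda_k=0.3)
print(w, mu1)
```

With $a_k = 1$ the Dirichlet prior is flat and the weight update reduces to the ML update $N_k / N$; similarly, a small $\tau_k$ leaves the mean update dominated by the data, which is the sense in which minimally informative priors recover the ML behavior.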
6.3 Comments on MAP Estimation

In contrast to the full Bayesian approach, where statistical inference is based on the full posterior distribution of the parameters, the MAP method provides only a point estimate of $\phi$; inference therefore requires asymptotic results. Nonetheless, implementation of this approach provides valuable insight into the analysis of a mixture distribution.

Classification of individual subjects to subpopulations can follow the same probability-based approach described in the previous chapter for the maximum likelihood method. In terms of model selection, the standard criteria such as AIC, BIC and HQ apply as well; in their definitions, however, instead of being the maximized data likelihood for the model, $L$ is calculated at the MAP estimates of the parameters. The extensions to general error variance models and covariate models follow the same framework as presented in sections 3.7 and 3.8. As in the implementation of the EM algorithm for the ML approach, the evaluation of the integrals in the equations above is performed by sampling-based methods. The error analysis for MAP estimates needs more investigation and is discussed in Chapter 7 as future work.
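The model selection criteria mentioned above are direct functions of the log-likelihood value at the estimates. A small sketch with an illustrative log-likelihood value ($p$ is the number of estimated population parameters, $N$ the number of subjects):

```python
import math

def model_criteria(loglik, p, N):
    """AIC, BIC and Hannan-Quinn criteria computed from a log-likelihood
    value evaluated at the parameter estimates (smaller is better)."""
    aic = -2 * loglik + 2 * p
    bic = -2 * loglik + p * math.log(N)
    hq = -2 * loglik + 2 * p * math.log(math.log(N))
    return aic, bic, hq

# Illustrative: log L = -250.0 at the estimates, 8 parameters, 100 subjects
aic, bic, hq = model_criteria(-250.0, p=8, N=100)
print(aic, bic, hq)
```

For typical population sizes BIC penalizes extra mixture components more heavily than AIC, so the criteria can disagree on the number of components; the only change for the MAP approach is that `loglik` is evaluated at the MAP rather than the ML estimate.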
As a Bayesian approach, the MAP estimation method involves a prior distribution for the model parameters. This prior distribution may be available from previous studies on samples from the populations. When no prior information is available, using minimally informative prior distributions may circumvent some problems encountered in mixture model analysis, such as the likelihood singularity explored in Chapter 5.
6.4 PK Model Example with MAP Estimation
6.4.1 Simulation Experiment
To illustrate the application of MAP estimation via the EM algorithm, the simulated population PK data set from the first example of Chapter 4 was adopted. In summary, it uses a one-compartment PK model with a bolus dose input. The intra-individual error has a 10% constant coefficient of variation. The distribution volume Vc has a normal population distribution, while a mixture population model with two subpopulations represents the elimination rate constant k. Each simulation trial assumed 100 subjects, and the trial was repeated 200 times.
6.4.2 Prior Distribution

Minimally informative prior distributions were tested in this example to investigate their impact on the estimation. The priors are all conjugate for the population distributions and their densities are described in Section 4.2.

From our experience with ML estimation, the inter-individual variance estimates stand to benefit most from improvement, compared to the other parameters. In addition, the behavior of the variance terms plays an important role in the likelihood characteristics (see Chapter 5 for a detailed discussion). For these reasons, their corresponding priors (i.e., $(\Sigma_k)^{-1} \sim Wishart(q_k, \Psi_k)$) are the major focus here. The probability density functions (pdf) of the one-dimensional Wishart distribution are plotted in Figure 6-1, with the degrees of freedom $q_k$ varied from 1 to 4. As $q_k$ decreases, the density becomes more spread out around its mean value. Therefore, to specify a non-informative (flat) prior distribution for $\Sigma_k$, each corresponding degree of freedom was set to 1, the smallest possible value (i.e., the dimension of the parameter space). The mean value of the distribution, given by $q_k \Psi_k^{-1}$, is specified in this example as the true value of each subpopulation's precision (i.e., the inverse of the variance). The remaining prior densities are specified as:

$$ (w_1, w_2) \sim Dirichlet(1, 1) $$
$$ \mu_1 \mid \Sigma_1 \sim N(0.3, \Sigma_1 / 0.01), \qquad \mu_2 \mid \Sigma_2 \sim N(0.6, \Sigma_2 / 0.01) $$
$$ (\sigma^2)^{-1} \sim Gamma(1, 0). $$
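The flattening effect of a small degree of freedom can be quantified: for a one-dimensional Wishart prior on the precision, under the parameterization used in this chapter the mean is $q\,\Psi^{-1}$ and the coefficient of variation is $\sqrt{2/q}$, so $q = 1$ gives the most diffuse prior around a fixed mean. A quick check for the four priors of Figure 6-1:

```python
import math

def wishart_1d_summary(q, psi):
    """Mean (q / psi) and coefficient of variation (sqrt(2/q)) of a
    one-dimensional Wishart(q, psi) prior on a precision parameter,
    under the parameterization used in this chapter."""
    return q / psi, math.sqrt(2.0 / q)

priors = [(1, 0.0036), (2, 0.0072), (3, 0.0108), (4, 0.0144)]
for q, psi in priors:
    mean, cv = wishart_1d_summary(q, psi)
    print(f"q = {q}: mean precision = {mean:6.1f}, CV = {cv:.2f}")
```

All four priors share the same mean precision (about 278, the inverse of the true variance 0.0036) while the relative spread shrinks from 141% at $q = 1$ to 71% at $q = 4$, matching the visual flattening in Figure 6-1.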
[Figure: four panels, Wishart(1, 0.0036), Wishart(2, 0.0072), Wishart(3, 0.0108), Wishart(4, 0.0144), each plotting p(Σ⁻¹) vs. Σ⁻¹]
Figure 6-1 Wishart prior distribution densities with different degrees of freedom
6.4.3 The Results
Using the prior distributions described above, MAP estimation via the EM algorithm was performed on the 200 simulated data sets. Fifty EM iterations were carried out for each population data set. The integrals were calculated by importance sampling with 1,000 random samples. The algorithm was coded in MATLAB, and the computation time was around 30 minutes for each data set on a PC (Pentium D, 3.73 GHz CPU, 3.25 GB RAM).

Table 6-1 reports the mean estimates, estimation biases and root mean square errors (RMSE). Comparison with the ML estimates in Table 4-1 shows similar results for the two estimation methods. The RMSE is comparable with that of the ML results, and for the three inter-individual variance terms the biases (mean PE) are significantly improved.
Parameter   Population Value   Mean of Estimates   Mean PE (%)   RMSE (%)
μ_V              20                19.9856            0.0834       1.0424
μ_k1              0.3               0.2997           -0.1445       1.6479
μ_k2              0.6               0.5997            0.0102       2.6389
σ²_V              4                 3.9145           -1.0502      21.7925
σ²_k1             0.0036            0.003608          0.0222      14.576
σ²_k2             0.0036            0.003526         -1.2531      35.5302
w1                0.8               0.8040            0.4941       5.3572
σ                 0.1               0.09975          -0.2486       4.0442

Table 6-1 Summary of MAP population parameter estimates
Figure 6-2 shows the histogram of the percent correct classification. The results are essentially identical to the ML classification results given in Figure 4-4, with 3 more trials in the MAP estimation correctly classifying all the subjects.
[Figure: histogram of Relative Frequency vs. Correct Classification Percentage (90-100)]
Figure 6-2 Histogram of percent correct classification for MAP example
6.4.4 Comments on the MAP Estimation Example

In theory, MAP estimation with completely non-informative priors will lead to the same results as the ML approach. The prior distribution of $\Sigma_k$ in this simulation study is specified as flat but not fully non-informative. It is able to improve the estimation while having little effect on the stability of the results, as confirmed by the results presented in Table 6-1: the variance estimates were improved in terms of smaller biases and RMSE. It is reasonable to expect that as these priors shrink further toward their mean values (i.e., as the degrees of freedom increase), the variance terms will be estimated more precisely.

Most of the issues discussed in section 4.4 regarding the examples with the ML solution apply here as well. For example, the simulation study used here has a relatively simple PK structural model, and the two subpopulations are well separated and easy to identify. Therefore, the results obtained under the two approaches (ML and MAP) are similar. When applied to more complicated mixture model analyses in PK/PD studies, greater differences between the results of the ML and MAP approaches are expected.
7 Discussion and Future Work
This dissertation presents a model-based approach to studying genetically polymorphic populations as part of the drug development process. A nonlinear random effects model with a finite mixture structure is used to identify pharmacokinetic/pharmacodynamic phenotypes and quantify inter-subject variability. An EM algorithm for both maximum likelihood (ML) and maximum a posteriori probability (MAP) estimation of the model is developed. It uses sampling-based methods to implement the expectation (E) step, followed by an analytically tractable maximization (M) step. A benefit of the approach is that no model linearization is performed, and the estimation precision can be arbitrarily controlled through the sampling process. By classifying each individual in the population to the most likely subpopulation, this approach also provides a basis for further investigation of the genetic mechanism of the polymorphism, and makes subsequent genetics-based dose individualization possible.

Four simulation studies, with pharmacokinetic/pharmacodynamic models and various degrees of complexity in the mixture model structures, illustrate the feasibility of the estimation approach and evaluate its performance. Some important features of the mixture modeling approach are investigated, including model selection, identifiability and potential singularity. Applications of the proposed approach to clinical trial data sets, as well as the following extensions and further evaluations of the approach, will be of interest for future investigation.
7.1 Further Evaluation Studies
Besides the evaluations presented in chapter 5, more work is needed to explore the performance of the ML and MAP population mixture modeling approaches. Questions of how well the mixture components can be estimated will be addressed both in terms of population parameters and individual subject classification.
Different simulations should investigate:
1. different structural PK/PD models;
2. different population sizes;
3. different amounts of data per subject;
4. different numbers of mixing components (e.g., 2, 3 and 4);
5. different amounts of separation between mixing components.
In addition to simulation studies, real data from clinical trials would further assist the evaluation of the approach. For example, irinotecan is a camptothecin prodrug approved for the treatment of colorectal cancer. A previous population pharmacokinetic study of irinotecan and 2 of its metabolites, SN-38 and SN-38G, identified two subpopulations of SN-38 formation [22]. A possible explanation for this is polymorphism in one of the human carboxylesterases. A population PK study incorporating a finite mixture model can be applied to the data set from this clinical trial.
7.2 Error Analysis Issues
The error analysis for maximum likelihood estimation of the population mixture model is presented in section 3.5. For the MAP estimation approach, such an analysis has not been fully investigated. We plan to pursue this using asymptotic results, as is done for the ML approach. For individual PK/PD analysis, the asymptotic error analysis for MAP estimation is well defined [6]. A similar approach can be extended to population MAP estimation with mixture models.

In addition to inferences based on asymptotic statistics, other approaches such as bootstrapping can be used to carry out the error analysis. By resampling with replacement from the original data set, a collection of population data sets is simulated. Parameter estimation is conducted on each of them, and the standard errors/confidence intervals are calculated by estimating the sampling distribution of each estimator. As an alternative to asymptotic inference, bootstrapping makes few assumptions; however, it is more computer-intensive. This approach will be incorporated into the mixture model error analysis in the future.
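The bootstrap procedure just described (resample subjects with replacement, re-estimate on each replicate) can be sketched as follows. To keep the sketch self-contained, the "estimation" step is simply the mean of one PK parameter per subject, with illustrative values; in the real analysis it would be the full population mixture model fit.

```python
import random
import statistics

def bootstrap_se(subjects, estimate, n_boot=200, seed=0):
    """Bootstrap standard error of an estimator: resample subjects with
    replacement (same size N) and re-estimate on each replicate data set."""
    rng = random.Random(seed)
    replicates = []
    for _ in range(n_boot):
        resample = [rng.choice(subjects) for _ in subjects]
        replicates.append(estimate(resample))
    return statistics.stdev(replicates)

# Illustrative per-subject values of one PK parameter
subjects = [0.28, 0.31, 0.29, 0.33, 0.30, 0.27, 0.35, 0.32, 0.30, 0.29]
se = bootstrap_se(subjects, lambda s: sum(s) / len(s))
print(f"bootstrap SE of the mean: {se:.4f}")
```

Note that resampling is done at the subject level, not the observation level, so the within-subject correlation structure of each data set is preserved in every replicate.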
In addition to examining the standard errors of all the parameter estimates, the reliability of the classification can also be explored by error analysis. The classification proposed in section 3.4 is based on an estimate of $\tau_i$, the posterior probability of assignment for each individual. An asymptotic error analysis of the standard error/confidence interval for the assignment probability could be developed (the bootstrap method would also provide this). The results will indicate how reliably an individual is classified. This information can then be used to decide which individuals to include in a genetic analysis investigation.
7.3 Extension to Models with Mixed Effects
In population PK/PD studies, one or several of the parameters may be considered fixed across all individuals. Such models are described as mixed effects models, indicating that the parameters can be both random and fixed within the population. This is an important case in drug development trials, especially when the study design does not provide enough information to estimate some particular parameters. For example, for an orally administered drug undergoing rapid absorption, few or no sampling time points during the absorption phase would prevent accurate evaluation of the absorption rate constant.
To perform maximum likelihood estimation of such a mixed effects model via the EM algorithm, the population values of the random effects can follow the ordinary EM updating formulas derived above. The fixed effects are estimated by optimizing the objective function (i.e., the log likelihood) with respect to them, a procedure similar to updating the covariate coefficient $\mu_k$ or the error variance parameter $\beta$ presented in Sections 3.7 and 3.8. The optimization here is more time-consuming, however, because the fixed effects directly affect the kinetic/dynamic system. The entire PK/PD system must therefore be re-simulated at every optimization step, and solving the state-space differential equations (i.e., Eq. 2.1) takes most of the computation time. We plan to investigate approximation methods that reduce this computational expense, along with the resulting trade-off in estimation accuracy.
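To make the cost concrete, here is a minimal sketch under toy assumptions (the one-compartment model, the grid search, and the names `conc_oral`, `profile_loglik`, `update_fixed_effect` are all illustrative stand-ins, not the actual ADAPT-based machinery): a fixed absorption rate constant `ka` is updated by maximizing a Monte Carlo approximation of the log likelihood over candidate values, and each candidate forces a full re-evaluation of the model predictions, which is exactly the re-simulation expense discussed above.

```python
import math

def conc_oral(t, ka, cl, v=1.0, dose=100.0):
    """One-compartment oral model (illustrative). Re-evaluating this for every
    candidate ka is the costly re-simulation step."""
    ke = cl / v
    return dose * ka / (v * (ka - ke)) * (math.exp(-ke * t) - math.exp(-ka * t))

def profile_loglik(ka, data, cl_samples, sigma=0.5):
    """Monte Carlo log likelihood of the fixed effect ka, averaging over
    supplied samples of the random effect CL (additive normal error, toy sigma)."""
    ll = 0.0
    for times, ys in data:                  # one (times, observations) pair per subject
        mix = 0.0
        for cl in cl_samples:
            se = sum((y - conc_oral(t, ka, cl)) ** 2 for t, y in zip(times, ys))
            mix += math.exp(-se / (2 * sigma ** 2))
        ll += math.log(max(mix / len(cl_samples), 1e-300))  # guard against underflow
    return ll

def update_fixed_effect(data, cl_samples, grid):
    """Fixed-effect update: maximize the objective over a candidate grid."""
    return max(grid, key=lambda ka: profile_loglik(ka, data, cl_samples))
```

In practice the grid search would be replaced by a proper numerical optimizer, but the structure is the same: every objective evaluation requires re-solving the kinetic model for all subjects.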
References
1. L. Aarons. Population pharmacokinetics: theory and practice. Journal of Clinical
Pharmacology, 32: 669-670 (1991).
2. R.J. Bauer, S. Guzy. Monte Carlo parametric expectation maximization (MC-
PEM) method for analyzing population pharmacokinetic/pharmacodynamic data.
In: Advanced Methods of Pharmacokinetic and Pharmacodynamic Systems
Analysis, Vol. 3: 135-163, D.Z. D’Argenio (editor) Plenum Press, New York
(2003).
3. S.L. Beal, L.B. Sheiner. NONMEM User’s Guide, Part I. Division of Clinical
Pharmacology, University of California, San Francisco (1998).
4. S.L. Beal, L.B. Sheiner. Estimating population kinetics. CRC Critical Reviews in
Bioengineering, 8: 195-222 (1982).
5. J.E. Bennet, A. Racine-Poon, J.C. Wakefield. Markov chain Monte Carlo for
nonlinear hierarchical models. In: Markov Chain Monte Carlo in Practice, 339-357,
W.R. Gilks, S. Richardson, D.J. Spiegelhalter (editors) Chapman and Hall,
London (1996).
6. D.Z. D’Argenio, A. Schumitzky. ADAPT II User’s Guide:
Pharmacokinetic/Pharmacodynamic Systems Analysis Software. Biomedical
Simulations Resource (1997).
7. M. Davidian, A.R. Gallant. The non-linear mixed effects model with a smooth
random effects density. Biometrika, 80: 475-488 (1993).
8. M. Davidian, D.M. Giltinan. Nonlinear Models for Repeated Measurement Data.
Chapman and Hall, New York (1995).
9. M. Davidian, D.M. Giltinan. Some general estimation methods for nonlinear
mixed effects models. Journal of Biopharmaceutical Statistics, 3: 23-55 (1993).
10. A.P. Dempster, N. Laird, D.B. Rubin. Maximum likelihood from incomplete data
via the EM algorithm. Journal of the Royal Statistical Society, Series B, 39: 1-38
(1977).
11. J. Desmeules, M.P. Gascon, P. Dayer, M. Magistris. Impact of environmental
and genetic factors on codeine analgesia. European Journal of Clinical
Pharmacology, 41: 23-26 (1991).
12. W.E. Evans, M.V. Relling. Pharmacogenomics: Translating Functional
Genomics into Rational Therapeutics. Science, 286: 487-491 (1999).
13. C. Fraley, A.E. Raftery. Bayesian regularization for normal mixture estimation and
model-based clustering. Technical Report No. 486, University of Washington
(2005).
14. B. Frame, R. Miller, R.L. Lalonde. Evaluation of mixture modeling with count
data using NONMEM. Journal of Pharmacokinetics and Pharmacodynamics,
30: 167-183 (2003).
15. Z.G. Ge, R. A. Smith, R. H. Raymond. A 20,000 foot overview of population
pharmacokinetics and its application in drug development. Biopharmaceutical
Report 9: 1-7 (2001).
16. A.E. Gelfand, A.F.M. Smith. Sampling-based approaches to calculating marginal
densities. Journal of the American Statistical Association, 85: 398-409 (1990).
17. A. Gelman, J.B. Carlin, H.S. Stern, D.B. Rubin. Bayesian Data Analysis.
Chapman and Hall, New York (1995).
18. R.J. Hathaway. A constrained EM algorithm for univariate normal distributions.
Journal of Statistical Computation and Simulation, 23: 211-230 (1986).
19. W.J. Jusko, H.C. Ko, W.F. Ebling. Convergence of direct and indirect
pharmacodynamic response models. Journal of Pharmacokinetics and
Biopharmaceutics, 23: 5-8 (1995).
20. N. Kaila, R.J. Straka, R.C. Brundage. Mixture models and subpopulation
classification: a pharmacokinetic simulation study and application to metoprolol
CYP2D6 phenotype. Journal of Pharmacokinetics and Pharmacodynamics, 34:
141-156 (2007).
21. D. Kang. Bayesian inference using Markov chain Monte Carlo methods in
pharmacokinetic/pharmacodynamic systems analysis. Ph.D. thesis, University of
Southern California (2000).
22. C.E. Klein, E. Gupta, J.M. Reid, P.J. Atherton, J.A. Sloan, H.C. Pitot, M.J.
Ratain, H. Kastrissios. Population pharmacokinetic model for irinotecan and two
of its metabolites, SN-38 and SN-38 glucuronide. Clinical Pharmacology &
Therapeutics, 72: 638-647 (2002).
23. E. Kuhn, M. Lavielle. Coupling a stochastic approximation version of EM with an
MCMC procedure. ESAIM: Probability and Statistics, 8:
115-131 (2004).
24. M.W. Linder, W.E. Evans, H.L. McLeod. Application of pharmacogenetic
principles to clinical pharmacology. In: Applied Pharmacokinetics &
Pharmacodynamics: Principles of Therapeutic Drug Monitoring, 176-177, M.E.
Burton, L.M. Shaw, J.J. Schentag, W.E. Evans (editors) Lippincott Williams &
Wilkins (2006).
25. M.J. Lindstrom, D.M. Bates. Nonlinear mixed effects models for repeated
measures data. Biometrics, 46: 673-687 (1990).
26. H. Lopes, P. Muller, G.L. Rosner. Bayesian meta-analysis for longitudinal data
models using multivariate mixture priors. Biometrics, 59: 66-75 (2003).
27. T.M. Ludden, S.P. Riley. Model selection within a population model using
NONMEM and WinBUGS. Advanced Methods of Pharmacokinetic and
Pharmacodynamic Systems Analysis. BMSR Workshop, Los Angeles, CA,
organized by D.Z. D’Argenio (2005).
28. D.J. Lunn, L. Aarons. The pharmacokinetics of Saquinavir: a Markov chain
Monte Carlo population analysis. Journal of Pharmacokinetics and
Biopharmaceutics, 26: 47-74 (1998).
29. D.J. Lunn, N. Best, A. Thomas, J. Wakefield, D. Spiegelhalter. Bayesian analysis
of population PK/PD models: general concepts and software. Journal of
Pharmacokinetics and Pharmacodynamics, 29: 271-305 (2002).
30. A. Mallet. A maximum likelihood estimation method for random coefficient
regression models. Biometrika, 73: 645-656 (1986).
31. G. J. McLachlan, D. Peel. Finite Mixture Models. Wiley, New York (2000).
32. X. Meng, D.B. Rubin. Maximum likelihood estimation via the ECM algorithm: a
general framework. Biometrika, 80: 267-278 (1993).
33. B. Muthen, K. Shedden. Finite mixture modeling with mixture outcomes using
the EM algorithm. Biometrics, 55: 463-469 (1999).
34. K. Park, D.Z. D’Argenio. Control of uncertain pharmacodynamic systems. In:
Advanced Methods of Pharmacokinetic and Pharmacodynamic Systems
Analysis, Vol. 3, 275-293, D.Z. D’Argenio (editor) Plenum Press, New York
(2003).
35. D.K. Pauler, N.M. Laird. A mixture model for longitudinal data with application
to assessment of noncompliance. Biometrics, 56: 464-472 (2000).
36. A. Philippou, G. Roussas. Asymptotic normality of the maximum likelihood
estimate in the independent but not identically distributed case. Annals of the
Institute of Statistical Mathematics, 27: 45-55 (1975).
37. A. Racine-Poon. A Bayesian approach to nonlinear random effect models.
Biometrics, 41: 1015-1023 (1985).
38. S. Richardson, P.J. Green. On Bayesian analysis of mixtures with an unknown
number of components. Journal of the Royal Statistical Society, Series B, 59:
731-792 (1997).
39. S. Riegelman, L.Z. Benet, M. Rowland. Pharmacokinetics and biopharmaceutics:
a definition of terms. Journal of Pharmacokinetics and Biopharmaceutics, 1: 3-4
(1973).
40. G.O. Roberts. Markov chain concepts related to sampling algorithms. In: Markov
Chain Monte Carlo in Practice, 339-357, W.R. Gilks, S. Richardson, D. J.
Spiegelhalter (editors) Chapman and Hall, London (1996).
41. G.L. Rosner, P. Muller. Bayesian population pharmacokinetic and
pharmacodynamic analyses using mixture models. Journal of Pharmacokinetics
and Biopharmaceutics, 25: 209-234 (1997).
42. A. Schumitzky. EM algorithms and two stage methods in pharmacokinetic
population analysis. In: Advanced Methods of Pharmacokinetic and
Pharmacodynamic Systems Analysis, Vol. 2, 145-160, D.Z. D’Argenio (editor)
Plenum Press, New York (1995).
43. A. Schumitzky. Nonparametric EM algorithms for estimating prior distributions.
Applied Mathematics and Computation, 45: 143-158 (1991).
44. A. Samson, M. Lavielle, F. Mentre. Stochastic approximation EM algorithm in
nonlinear mixed effects model for viral load decrease during anti-HIV treatment
(oral communication). The Annual Meeting of the Population Approach Group in
Europe, Uppsala (2004).
45. L.B. Sheiner, B. Rosenberg, K.L. Melmon. Modeling of individual
pharmacokinetics for computer-aided drug dosing. Computers and Biomedical
Research, 5: 441-459 (1972).
46. L.B. Sheiner, J. Wakefield. Population modeling in drug development. Statistical
Methods in Medical Research, 8: 183-193 (1999).
47. A.F.M. Smith, A.E. Gelfand. Bayesian statistics without tears: a sampling-
resampling perspective. American Statistician, 46: 84-88 (1992).
48. M.E. Spilker, K. Seng, A.A. Yao, H.E. Daldrup-Link, D.M. Shames, R.C.
Brasch, P. Vicini. Mixture model approach to tumor classification based on
pharmacokinetic measures of tumor permeability. Journal of Magnetic
Resonance Imaging, 22: 549-558 (2005).
49. M. Stephens. Dealing with label-switching in mixture models. Journal of the
Royal Statistical Society, Series B, 62: 795-809 (2000).
50. T. Tatarinova. Bayesian analysis of linear and nonlinear mixture models with
applications to population pharmacokinetics and gene expression data. Ph.D.
thesis, University of Southern California (2006).
51. L. Tierney. Markov chains for exploring posterior distributions (with discussion).
Annals of Statistics, 22: 1701-1762 (1994).
52. U.S. Department of Health and Human Services, FDA, CDER, CBER. Guidance
for industry: population pharmacokinetics. (1999).
53. E.F. Vonesh, R.L. Carter. Mixed effects nonlinear regression for unbalanced
repeated measures. Biometrics, 48: 1-18 (1992).
54. J.C. Wakefield, A.F.M. Smith, A. Racine-Poon, A.E. Gelfand. Bayesian analysis
of linear and non-linear population models by using the Gibbs sampler. Applied
Statistics, 43: 201-221 (1994).
55. J.C. Wakefield, S.G. Walker. Bayesian nonparametric population models:
formulation and comparison with likelihood approaches. Journal of
Pharmacokinetics and Biopharmaceutics, 25: 235-253 (1997).
56. J.C. Wakefield, C. Zhou, S.G. Self. Modelling gene expression data over time:
curve clustering with informative prior distribution. In: Bayesian Statistics, Vol.
7, J. M. Bernardo, M.J. Bayarri, J.O. Berger, A.P. Dawid, D. Heckerman, A. F.
M. Smith, M. West (editors) Oxford University Press (2003).
57. S. Walker. An EM algorithm for nonlinear random effects models. Biometrics,
52: 934-944 (1996).
58. R. Wolfinger. Laplace’s approximation for nonlinear models. Biometrika, 80:
791-795 (1993).
59. www.sciencemag.org/feature/data/1044449.shl.
60. C. Zhou. A Bayesian model for curve clustering with application to gene
expression analysis. Ph.D. thesis, University of Washington (2003).
Appendix 1: Derivation of EM Updating Formula
for the ML Estimation of Finite Mixture Models
The derivation that follows was developed from a presentation given by
Professor Alan Schumitzky at a Biomedical Simulations Resource in-house research
seminar (May 17, 2005).
In general, the likelihood is given by:

$$
p(Y_i \mid \phi) = \sum_{k=1}^{K} w_k \int_{-\infty}^{+\infty} l_i(Y_i \mid \theta_i, \sigma^2)\, \eta(\theta_i \mid \mu_k, \Sigma_k)\, d\theta_i
\qquad (A1.1)
$$
Let $\phi^{(r)} = \{w_k^{(r)}, \mu_k^{(r)}, \Sigma_k^{(r)}, (\sigma^2)^{(r)}\}$ be the estimate of $\phi$ at stage $r$; then:

$$
\frac{p(Y_i \mid \phi)}{p(Y_i \mid \phi^{(r)})}
= \sum_{k=1}^{K}\int_{-\infty}^{+\infty}
\frac{w_k\, l_i(Y_i \mid \theta_i, \sigma^2)\,\eta(\theta_i \mid \mu_k, \Sigma_k)}{p(Y_i \mid \phi^{(r)})}\, d\theta_i
= \sum_{k=1}^{K}\int_{-\infty}^{+\infty} g_{ik}(\theta_i \mid \phi^{(r)})\,
\frac{w_k\, l_i(Y_i \mid \theta_i, \sigma^2)\,\eta(\theta_i \mid \mu_k, \Sigma_k)}
{w_k^{(r)}\, l_i(Y_i \mid \theta_i, (\sigma^2)^{(r)})\,\eta(\theta_i \mid \mu_k^{(r)}, \Sigma_k^{(r)})}\, d\theta_i
\qquad (A1.2)
$$

where

$$
g_{ik}(\theta_i \mid \phi) = \frac{w_k\, l_i(Y_i \mid \theta_i, \sigma^2)\,\eta(\theta_i \mid \mu_k, \Sigma_k)}{p(Y_i \mid \phi)}
\quad\text{and}\quad
\sum_{k=1}^{K}\int_{-\infty}^{+\infty} g_{ik}(\theta_i \mid \phi)\, d\theta_i = 1.
$$
It follows from Jensen's inequality that:

$$
\log\frac{p(Y_i \mid \phi)}{p(Y_i \mid \phi^{(r)})}
\ge \sum_{k=1}^{K}\int_{-\infty}^{+\infty} g_{ik}(\theta_i \mid \phi^{(r)})
\log\frac{w_k\, l_i(Y_i \mid \theta_i, \sigma^2)\,\eta(\theta_i \mid \mu_k, \Sigma_k)}
{w_k^{(r)}\, l_i(Y_i \mid \theta_i, (\sigma^2)^{(r)})\,\eta(\theta_i \mid \mu_k^{(r)}, \Sigma_k^{(r)})}\, d\theta_i
\qquad (A1.3)
$$
Let $Y^N = \{Y_1, \dots, Y_N\}$; then from the independence of the $\{Y_i\}$:

$$
\log\frac{p(Y^N \mid \phi)}{p(Y^N \mid \phi^{(r)})}
= \sum_{i=1}^{N}\log\frac{p(Y_i \mid \phi)}{p(Y_i \mid \phi^{(r)})}
\ge Q(\phi, \phi^{(r)}) - Q(\phi^{(r)}, \phi^{(r)})
\qquad (A1.4)
$$

where

$$
Q(\phi, \phi^{(r)}) = \sum_{i=1}^{N}\sum_{k=1}^{K}\int_{-\infty}^{+\infty}
g_{ik}(\theta_i \mid \phi^{(r)})\,\log\!\big[w_k\, l_i(Y_i \mid \theta_i, \sigma^2)\,\eta(\theta_i \mid \mu_k, \Sigma_k)\big]\, d\theta_i
$$
If $\phi^{(r+1)} = \arg\max_{\phi} Q(\phi, \phi^{(r)})$, then
$$
\log\frac{p(Y^N \mid \phi^{(r+1)})}{p(Y^N \mid \phi^{(r)})} \ge 0,
$$
and equivalently $L(\phi^{(r+1)}) \ge L(\phi^{(r)})$.
Rewrite $Q(\phi, \phi^{(r)})$ in terms of a summation:

$$
Q(\phi, \phi^{(r)}) = Q_1(\{w_1, \dots, w_K\}, \phi^{(r)})
+ Q_2(\{\mu_1, \dots, \mu_K\}, \{\Sigma_1, \dots, \Sigma_K\}, \phi^{(r)})
+ Q_3(\sigma^2, \phi^{(r)})
\qquad (A1.5)
$$
where

$$
Q_1(\{w_1, \dots, w_K\}, \phi^{(r)})
= \sum_{i=1}^{N}\sum_{k=1}^{K}\int_{-\infty}^{+\infty} g_{ik}(\theta \mid \phi^{(r)})\log[w_k]\, d\theta
= \sum_{k=1}^{K}\int_{-\infty}^{+\infty} G_k(\theta \mid \phi^{(r)})\log[w_k]\, d\theta
\qquad (A1.6)
$$

$$
Q_2(\{\mu_k\}, \{\Sigma_k\}, \phi^{(r)})
= \sum_{i=1}^{N}\sum_{k=1}^{K}\int_{-\infty}^{+\infty} g_{ik}(\theta \mid \phi^{(r)})\log[\eta(\theta \mid \mu_k, \Sigma_k)]\, d\theta
= \sum_{k=1}^{K}\int_{-\infty}^{+\infty} G_k(\theta \mid \phi^{(r)})\log[\eta(\theta \mid \mu_k, \Sigma_k)]\, d\theta
\qquad (A1.7)
$$

$$
Q_3(\sigma^2, \phi^{(r)})
= \sum_{i=1}^{N}\sum_{k=1}^{K}\int_{-\infty}^{+\infty} g_{ik}(\theta_i \mid \phi^{(r)})\log[l_i(Y_i \mid \theta_i, \sigma^2)]\, d\theta_i
= \sum_{i=1}^{N}\int_{-\infty}^{+\infty} F_i(\theta_i \mid \phi^{(r)})\log[l_i(Y_i \mid \theta_i, \sigma^2)]\, d\theta_i
\qquad (A1.8)
$$

and

$$
G_k(\theta \mid \phi) = \sum_{i=1}^{N} g_{ik}(\theta \mid \phi), \qquad
F_i(\theta \mid \phi) = \sum_{k=1}^{K} g_{ik}(\theta \mid \phi).
\qquad (A1.9)
$$
Therefore, to maximize $Q(\phi, \phi^{(r)})$ with respect to $\phi$, it is sufficient to maximize $Q_1(\{w_1, \dots, w_K\}, \phi^{(r)})$ with respect to $\{w_1, \dots, w_K\}$; to maximize $Q_2(\{\mu_k\}, \{\Sigma_k\}, \phi^{(r)})$ with respect to $\{\mu_k\}, \{\Sigma_k\}$; and to maximize $Q_3(\sigma^2, \phi^{(r)})$ with respect to $\sigma^2$.
i. Maximization of $Q_1(\{w_1, \dots, w_K\}, \phi^{(r)})$ with respect to $\{w_1, \dots, w_K\}$:

Direct differentiation does not work here, since the weights are constrained by $w_k \ge 0$ and $\sum_{k=1}^{K} w_k = 1$. The result is:

$$
w_k^{(r+1)} = \frac{1}{N}\int_{-\infty}^{+\infty} G_k(\theta \mid \phi^{(r)})\, d\theta
\qquad (A1.10)
$$

See [43] for a rigorous proof of this result.
ii. Maximization of $Q_2(\{\mu_k\}, \{\Sigma_k\}, \phi^{(r)})$ with respect to $\{\mu_k\}, \{\Sigma_k\}$:

$Q_2(\{\mu_k\}, \{\Sigma_k\}, \phi^{(r)})$ can be expressed as $Q_2(\mu_k, \Sigma_k, \phi^{(r)})$ plus terms independent of $\mu_k$ and $\Sigma_k$, where

$$
Q_2(\mu_k, \Sigma_k, \phi^{(r)}) = \int_{-\infty}^{+\infty} G_k(\theta \mid \phi^{(r)})\log[\eta(\theta \mid \mu_k, \Sigma_k)]\, d\theta.
$$

Take the derivative with respect to $\mu_k$ and set it to zero:

$$
\frac{\partial}{\partial\mu_k} Q_2(\mu_k, \Sigma_k, \phi^{(r)})
= \frac{\partial}{\partial\mu_k}\int_{-\infty}^{+\infty} G_k(\theta \mid \phi^{(r)})\log[\eta(\theta \mid \mu_k, \Sigma_k)]\, d\theta
= \int_{-\infty}^{+\infty} G_k(\theta \mid \phi^{(r)})\big[\Sigma_k^{-1}(\theta - \mu_k)\big]\, d\theta = 0
\qquad (A1.11)
$$

After multiplying through by $\Sigma_k$, we get:

$$
\mu_k^{(r+1)} = \frac{\int_{-\infty}^{+\infty}\theta\, G_k(\theta \mid \phi^{(r)})\, d\theta}{\int_{-\infty}^{+\infty} G_k(\theta \mid \phi^{(r)})\, d\theta}
\qquad (A1.12)
$$
Similarly for $\Sigma_k$:

$$
\frac{\partial}{\partial\Sigma_k} Q_2(\mu_k, \Sigma_k, \phi^{(r)})
= \frac{\partial}{\partial\Sigma_k}\int_{-\infty}^{+\infty} G_k(\theta \mid \phi^{(r)})\log[\eta(\theta \mid \mu_k, \Sigma_k)]\, d\theta
= \int_{-\infty}^{+\infty} G_k(\theta \mid \phi^{(r)})\Big[-\tfrac{1}{2}\Sigma_k^{-1} + \tfrac{1}{2}\Sigma_k^{-1}(\theta - \mu_k)(\theta - \mu_k)'\,\Sigma_k^{-1}\Big]\, d\theta = 0
$$

Multiply on the right and left by $\Sigma_k$ to get:

$$
\Sigma_k^{(r+1)} = \frac{\int_{-\infty}^{+\infty}(\theta - \mu_k^{(r+1)})(\theta - \mu_k^{(r+1)})'\, G_k(\theta \mid \phi^{(r)})\, d\theta}{\int_{-\infty}^{+\infty} G_k(\theta \mid \phi^{(r)})\, d\theta}
\qquad (A1.13)
$$
iii. Maximization of $Q_3(\sigma^2, \phi^{(r)})$ with respect to $\sigma^2$:

$$
\frac{\partial}{\partial\sigma^2} Q_3(\sigma^2, \phi^{(r)})
= \frac{\partial}{\partial\sigma^2}\sum_{i=1}^{N}\int_{-\infty}^{+\infty} F_i(\theta_i \mid \phi^{(r)})\log[l_i(Y_i \mid \theta_i, \sigma^2)]\, d\theta_i
= \sum_{i=1}^{N}\int_{-\infty}^{+\infty} F_i(\theta_i \mid \phi^{(r)})\Big[-\frac{m_i}{2\sigma^2} + \frac{(Y_i - h_i(\theta_i))'\,R_i^{-1}(\theta_i)\,(Y_i - h_i(\theta_i))}{2\sigma^4}\Big]\, d\theta_i = 0
$$

where $m_i$ is the dimension of $Y_i$.

Solving this equation leads to the updating of $\sigma^2$ as follows:

$$
(\sigma^2)^{(r+1)} = \frac{\sum_{i=1}^{N}\int_{-\infty}^{+\infty}(Y_i - h_i(\theta_i))'\,R_i^{-1}(\theta_i)\,(Y_i - h_i(\theta_i))\, F_i(\theta_i \mid \phi^{(r)})\, d\theta_i}{\sum_{i=1}^{N} m_i}
\qquad (A1.14)
$$
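In the degenerate case where $\theta_i$ is observed directly (so $l_i$ is a point mass and the integrals collapse), the updates (A1.10), (A1.12), and (A1.13) reduce to the classical finite-mixture EM. A minimal univariate sketch of one iteration, written for illustration only:

```python
import math

def em_step(y, w, mu, var):
    """One EM iteration for a K-component univariate normal mixture.
    g[i][k] plays the role of g_ik(theta_i | phi^(r)); the three updates below
    are the collapsed forms of (A1.10), (A1.12), and (A1.13)."""
    K, N = len(w), len(y)
    g = []
    for yi in y:
        # E-step: posterior membership probabilities (normalizing constants cancel)
        num = [w[k] / math.sqrt(var[k]) * math.exp(-(yi - mu[k]) ** 2 / (2 * var[k]))
               for k in range(K)]
        s = sum(num)
        g.append([v / s for v in num])
    Gk = [sum(g[i][k] for i in range(N)) for k in range(K)]   # integral of G_k
    w_new = [Gk[k] / N for k in range(K)]                                      # (A1.10)
    mu_new = [sum(g[i][k] * y[i] for i in range(N)) / Gk[k] for k in range(K)]  # (A1.12)
    var_new = [sum(g[i][k] * (y[i] - mu_new[k]) ** 2 for i in range(N)) / Gk[k]
               for k in range(K)]                                              # (A1.13)
    return w_new, mu_new, var_new
```

In the dissertation's nonlinear setting the membership quantities are not available in closed form, which is why the E-step integrals are evaluated by sampling; the update formulas themselves keep the same structure.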
Appendix 2: Derivation of EM Updating Formula
for the MAP Estimation of Finite Mixture Models
The derivation that follows was also developed from a presentation given by
Professor Alan Schumitzky at a Biomedical Simulations Resource in-house research
seminar.
The posterior distribution is given by:

$$
p(\phi \mid Y^N) = \frac{p(Y^N \mid \phi)\, p(\phi)}{p(Y^N)}
= \frac{\prod_{i=1}^{N} p(Y_i \mid \phi)\, p(\phi)}{p(Y^N)}
\propto \prod_{i=1}^{N} p(Y_i \mid \phi)\, p(\phi),
$$

with $p(Y^N)$ a normalizing constant of the density,

$$
p(Y_i \mid \phi) = \sum_{k=1}^{K} w_k \int_{-\infty}^{+\infty} l_i(Y_i \mid \theta_i, \sigma^2)\,\eta(\theta_i \mid \mu_k, \Sigma_k)\, d\theta_i,
$$

and

$$
p(\phi) = p(w_1, \dots, w_K)\, p(\beta)\prod_{k=1}^{K} p(\mu_k \mid \Sigma_k)\, p(\Sigma_k).
$$

Therefore, by the definition of the MAP approach, the quantity to be maximized takes the form

$$
D(w_1, \dots, w_K \mid a_1, \dots, a_K)\; G(\sigma^{-2} \mid a, b)
\prod_{k=1}^{K}\eta(\mu_k \mid \lambda_k, \Sigma_k/\tau_k)\, W_{q_k}(\Sigma_k^{-1} \mid \psi_k)
\prod_{i=1}^{N}\sum_{k=1}^{K} w_k \int_{-\infty}^{+\infty} l_i(Y_i \mid \theta_i, \sigma^2)\,\eta(\theta_i \mid \mu_k, \Sigma_k)\, d\theta_i
$$

where $D$ denotes the Dirichlet density, $G$ the gamma density, and $W_{q_k}$ the Wishart density with $q_k$ degrees of freedom and scale $\psi_k$.
The derivation of the EM updates for ML estimation given in Appendix 1 also applies here. As a result, a similar $Q$ function is derived:

$$
Q(\phi, \phi^{(r)}) = Q_1(\{w_1, \dots, w_K\}, \phi^{(r)})
+ Q_2(\{\mu_1, \dots, \mu_K\}, \{\Sigma_1, \dots, \Sigma_K\}, \phi^{(r)})
+ Q_3(\sigma^2, \phi^{(r)})
\qquad (A2.1)
$$

where
$$
Q_1(\{w_1, \dots, w_K\}, \phi^{(r)})
= \sum_{i=1}^{N}\sum_{k=1}^{K}\int_{-\infty}^{+\infty} g_{ik}(\theta \mid \phi^{(r)})\log[w_k]\, d\theta
= \sum_{k=1}^{K}\int_{-\infty}^{+\infty} G_k(\theta \mid \phi^{(r)})\log[w_k]\, d\theta
\qquad (A2.2)
$$
(A2.2)
()
11
()
2
1
() 1
(| )log[(| , )] ({ },{ }, )
log[ ( | , / ) ( , )]
(| )log[ (| , ) ( | , / ) ( , )
NK
r
ik i i k k i
ik
r
kk
k k kk kk
r
kkkkkkkkk
gd Q
Wq
GWq
θφ ηθ μ θ μφ
ημλ τ ψ
θφ η θ μ η μ λ τ ψ
+∞
==
−∞
−
−
−
Σ Σ=
+Σ
=ΣΣ
∑∑
∫
1
1
()
2
1
]
log[ ( | , / ) ( , )]
(, , )
K
k
kk k k k k
K
r
kk k
k
d
Wq
Q
θ
ημ λ τ ψ
μφ
+∞
=
∞
−
=
+Σ
=Σ
∑
∫
∑
(A2.3)
and

$$
Q_3(\sigma^2, \phi^{(r)})
= \sum_{i=1}^{N}\sum_{k=1}^{K}\int_{-\infty}^{+\infty} g_{ik}(\theta_i \mid \phi^{(r)})\log[l_i(Y_i \mid \theta_i, \sigma^2)]\, d\theta_i
+ \log[G(\sigma^{-2} \mid a, b)]
= \sum_{i=1}^{N}\int_{-\infty}^{+\infty} F_i(\theta_i \mid \phi^{(r)})\log[l_i(Y_i \mid \theta_i, \sigma^2)]\, d\theta_i
+ \log[G(\sigma^{-2} \mid a, b)]
\qquad (A2.4)
$$
The maximization is performed on $Q(\phi, \phi^{(r)})$ with respect to $\phi$. It is identical to maximizing $Q_1(\{w_1, \dots, w_K\}, \phi^{(r)})$ with respect to $\{w_1, \dots, w_K\}$; to maximizing $Q_2(\{\mu_k\}, \{\Sigma_k\}, \phi^{(r)})$ with respect to $\{\mu_k\}, \{\Sigma_k\}$; and to maximizing $Q_3(\sigma^2, \phi^{(r)})$ with respect to $\sigma^2$.
i. Maximization of $Q_1(\{w_1, \dots, w_K\}, \phi^{(r)})$ with respect to $\{w_1, \dots, w_K\}$:

Direct differentiation does not work here, since the weights are constrained by $w_k \ge 0$ and $\sum_{k=1}^{K} w_k = 1$. The result is:

$$
w_k^{(r+1)} = \frac{\int_{-\infty}^{+\infty} G_k(\theta \mid \phi^{(r)})\, d\theta + a_k - 1}{N + \sum_{k=1}^{K} a_k - K}
\qquad (A2.5)
$$

See [43] for a rigorous proof of this result.
ii. Maximization of $Q_2(\{\mu_k\}, \{\Sigma_k\}, \phi^{(r)})$ with respect to $\{\mu_k\}, \{\Sigma_k\}$:

Take the derivative with respect to $\mu_k$ and set it to zero:

$$
\frac{\partial}{\partial\mu_k} Q_2(\mu_k, \Sigma_k, \phi^{(r)})
= \frac{\partial}{\partial\mu_k}\Big\{\int_{-\infty}^{+\infty} G_k(\theta \mid \phi^{(r)})\log[\eta(\theta \mid \mu_k, \Sigma_k)]\, d\theta
+ \log\eta(\mu_k \mid \lambda_k, \Sigma_k/\tau_k)\Big\}
= \int_{-\infty}^{+\infty} G_k(\theta \mid \phi^{(r)})\big[\Sigma_k^{-1}(\theta - \mu_k)\big]\, d\theta
+ \tau_k\Sigma_k^{-1}(\lambda_k - \mu_k) = 0
$$

After multiplying through by $\Sigma_k$, the following is obtained:

$$
\mu_k^{(r+1)} = \frac{\int_{-\infty}^{+\infty}\theta\, G_k(\theta \mid \phi^{(r)})\, d\theta + \tau_k\lambda_k}{\int_{-\infty}^{+\infty} G_k(\theta \mid \phi^{(r)})\, d\theta + \tau_k}
\qquad (A2.6)
$$
Similarly for $\Sigma_k$:

$$
\frac{\partial}{\partial\Sigma_k} Q_2(\mu_k, \Sigma_k, \phi^{(r)})
= \frac{\partial}{\partial\Sigma_k}\Big\{\int_{-\infty}^{+\infty} G_k(\theta \mid \phi^{(r)})\log[\eta(\theta \mid \mu_k, \Sigma_k)]\, d\theta
+ \log\big[\eta(\mu_k \mid \lambda_k, \Sigma_k/\tau_k)\, W_{q_k}(\Sigma_k^{-1} \mid \psi_k)\big]\Big\}
$$
$$
= \int_{-\infty}^{+\infty} G_k(\theta \mid \phi^{(r)})\Big[-\tfrac{1}{2}\Sigma_k^{-1} + \tfrac{1}{2}\Sigma_k^{-1}(\theta - \mu_k)(\theta - \mu_k)'\,\Sigma_k^{-1}\Big]\, d\theta
- \tfrac{1}{2}\Sigma_k^{-1} + \tfrac{\tau_k}{2}\Sigma_k^{-1}(\lambda_k - \mu_k)(\lambda_k - \mu_k)'\,\Sigma_k^{-1}
- \tfrac{q_k - d - 1}{2}\Sigma_k^{-1} + \tfrac{1}{2}\Sigma_k^{-1}\Psi_k\Sigma_k^{-1}
= 0
$$

Multiply on the right and left by $\Sigma_k$ to get:

$$
\Sigma_k^{(r+1)} = \frac{\int_{-\infty}^{+\infty}(\theta - \mu_k^{(r+1)})(\theta - \mu_k^{(r+1)})'\, G_k(\theta \mid \phi^{(r)})\, d\theta
+ \tau_k(\lambda_k - \mu_k^{(r+1)})(\lambda_k - \mu_k^{(r+1)})' + \Psi_k}
{\int_{-\infty}^{+\infty} G_k(\theta \mid \phi^{(r)})\, d\theta + q_k - d}
\qquad (A2.7)
$$

where $d$ is the dimension of $\theta$.
iii. Maximization of $Q_3(\sigma^2, \phi^{(r)})$ with respect to $\sigma^2$:

$$
\frac{\partial}{\partial\sigma^2} Q_3(\sigma^2, \phi^{(r)})
= \frac{\partial}{\partial\sigma^2}\Big\{\sum_{i=1}^{N}\int_{-\infty}^{+\infty} F_i(\theta_i \mid \phi^{(r)})\log[l_i(Y_i \mid \theta_i, \sigma^2)]\, d\theta_i
+ \log[G(\sigma^{-2} \mid a, b)]\Big\}
$$
$$
= \sum_{i=1}^{N}\int_{-\infty}^{+\infty} F_i(\theta_i \mid \phi^{(r)})\Big[-\frac{m_i}{2\sigma^2} + \frac{(Y_i - h_i(\theta_i))'\,R_i^{-1}(\theta_i)\,(Y_i - h_i(\theta_i))}{2\sigma^4}\Big]\, d\theta_i
+ \frac{b}{\sigma^4} - \frac{a - 1}{\sigma^2} = 0
$$

where $m_i$ is the dimension of $Y_i$.

Solving this equation leads to the updating of $\sigma^2$ as follows:

$$
(\sigma^2)^{(r+1)} = \frac{\sum_{i=1}^{N}\int_{-\infty}^{+\infty}(Y_i - h_i(\theta_i))'\,R_i^{-1}(\theta_i)\,(Y_i - h_i(\theta_i))\, F_i(\theta_i \mid \phi^{(r)})\, d\theta_i + 2b}{\sum_{i=1}^{N} m_i + 2(a - 1)}
\qquad (A2.8)
$$
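In the same collapsed setting used to illustrate Appendix 1 (with $\theta_i$ observed directly, so the integrals become finite sums), the MAP updates (A2.5) and (A2.6) amount to adding Dirichlet pseudo-counts to the weights and shrinking each mean toward its prior location $\lambda_k$. A minimal sketch, with $\int G_k\, d\theta$ and $\int \theta\, G_k\, d\theta$ supplied as precomputed sums (`Gk_int` and `theta_sum` are names introduced here for illustration):

```python
def map_updates(Gk_int, theta_sum, N, a, tau, lam):
    """MAP weight and mean updates, the collapsed forms of (A2.5) and (A2.6).

    Gk_int[k]   : effective count for component k (integral of G_k)
    theta_sum[k]: g-weighted sum of theta for component k (integral of theta*G_k)
    a[k]        : Dirichlet prior parameter; tau[k], lam[k]: normal prior on mu_k
    """
    K = len(Gk_int)
    # (A2.5): Dirichlet prior adds a_k - 1 pseudo-counts to each component
    w = [(Gk_int[k] + a[k] - 1) / (N + sum(a) - K) for k in range(K)]
    # (A2.6): normal prior shrinks the mean toward lam_k with strength tau_k
    mu = [(theta_sum[k] + tau[k] * lam[k]) / (Gk_int[k] + tau[k]) for k in range(K)]
    return w, mu
```

With a flat Dirichlet prior ($a_k = 1$) and no mean shrinkage ($\tau_k = 0$), these reduce exactly to the ML updates (A1.10) and (A1.12).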
Appendix 3: Extension of Maximum Likelihood EM Updating
Formula for Covariate Mixture Model
For models with linear covariate relationships, the following structure is used:

$$
\theta_i \sim \sum_{k=1}^{K} w_k\, N\big(g_k(\mu_k, r_i),\, \Sigma_k\big), \qquad g_k(\mu_k, r_i) = A_i\,\mu_k
$$

where $r_i$ denotes the covariate vector of individual $i$ and $A_i$ the corresponding design matrix.
Eq. (A1.11) in Appendix 1 becomes:

$$
\frac{\partial}{\partial\mu_k} Q_2(\mu_k, \Sigma_k, \phi^{(r)})
= \frac{\partial}{\partial\mu_k}\sum_{i=1}^{N}\int_{-\infty}^{+\infty} g_{ik}(\theta_i \mid \phi^{(r)})\log\big[\eta(\theta_i \mid A_i\mu_k, \Sigma_k)\big]\, d\theta_i
= \sum_{i=1}^{N}\int_{-\infty}^{+\infty} g_{ik}(\theta_i \mid \phi^{(r)})\big[A_i'\,\Sigma_k^{-1}\theta_i - A_i'\,\Sigma_k^{-1}A_i\,\mu_k\big]\, d\theta_i = 0
\qquad (A3.1)
$$
So the updating formula for $\mu_k$ is:

$$
\mu_k^{(r+1)} = W^{-1}\sum_{i=1}^{N}\int_{-\infty}^{+\infty} g_{ik}(\theta_i \mid \phi^{(r)})\, A_i'\big(\Sigma_k^{(r)}\big)^{-1}\theta_i\, d\theta_i
\qquad (A3.2)
$$

where, from (A3.1), $W = \sum_{i=1}^{N} A_i'\big(\Sigma_k^{(r)}\big)^{-1}A_i \int_{-\infty}^{+\infty} g_{ik}(\theta_i \mid \phi^{(r)})\, d\theta_i$.
Introduction of the covariate coefficients $\mu_k$ does not affect the gradient equations for the variance parameters, so Eq. (A1.13) and Eq. (A1.14) of Appendix 1 still hold for the updating of $\{\Sigma_k\}$ and $\sigma^2$.
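In the collapsed case (scalar $\theta_i$ observed directly, design row $A_i = [1, r_i]$), the update (A3.2) reduces to a $g_{ik}$-weighted least-squares fit of $\theta$ on the covariate, since the common scalar factor $\Sigma_k^{-1}$ cancels. A minimal sketch (the function name `covariate_mu_update` is introduced here for illustration):

```python
def covariate_mu_update(theta, r, g_k):
    """Collapsed form of (A3.2) for scalar theta_i with A_i = [1, r_i]:
    solve the 2x2 weighted normal equations for (intercept, slope) of mu_k."""
    s11 = sum(g_k)                                      # sum g_ik * 1
    s1r = sum(g * ri for g, ri in zip(g_k, r))          # sum g_ik * r_i
    srr = sum(g * ri * ri for g, ri in zip(g_k, r))     # sum g_ik * r_i^2
    b1 = sum(g * th for g, th in zip(g_k, theta))       # sum g_ik * theta_i
    br = sum(g * ri * th for g, ri, th in zip(g_k, r, theta))
    det = s11 * srr - s1r * s1r                         # 2x2 determinant
    intercept = (srr * b1 - s1r * br) / det
    slope = (s11 * br - s1r * b1) / det
    return intercept, slope
```

In the full algorithm the same normal-equation structure holds, with the sums replaced by the sampled E-step integrals and the scalar solve by a matrix solve of dimension $\dim(\mu_k)$.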
Abstract
This dissertation presents a model-based approach to studying genetically polymorphic populations as part of the drug development process. A nonlinear random effects model with a finite mixture structure is used to identify pharmacokinetic/pharmacodynamic phenotypes and quantify inter-subject variability. An EM algorithm for both maximum likelihood (ML) and maximum a posteriori (MAP) estimation of the model is developed. It uses sampling-based methods to implement the expectation (E) step, followed by an analytically tractable maximization (M) step. A benefit of the approach is that no model linearization is performed, and the estimation precision can be arbitrarily controlled by the sampling process. By classifying each individual in the population into the most likely subpopulation, the approach also provides a basis for further investigation of the genetic mechanism of the polymorphism and makes subsequent genetics-based dose individualization possible.