Close
About
FAQ
Home
Collections
Login
USC Login
Register
0
Selected
Invert selection
Deselect all
Deselect all
Click here to refresh results
Click here to refresh results
USC
/
Digital Library
/
University of Southern California Dissertations and Theses
/
A hierarchical physiologically-based pharmacokinetic modeling platform for genetic and exposure effects in metabolic pathways
(USC Thesis Other)
A hierarchical physiologically-based pharmacokinetic modeling platform for genetic and exposure effects in metabolic pathways
PDF
Download
Share
Open document
Flip pages
Contact Us
Contact Us
Copy asset link
Request this asset
Transcript (if available)
Content
A HIERARCHICAL
PHYSOIOLOGICALLY-BASED PHARMACOKINETIC MODELING
PLATFORM FOR GENETIC AND EXPOSURE EFFECTS IN METABOLIC
PATHWAYS
by
Lingyun Du
A Thesis Presented to the
FACULTY OF THE GRADUATE SCHOOL
UNIVERSITY OF SOUTHERN CALIFORNIA
In Partial Fulfillment of the
Requirements for the Degree
MASTER OF SCIENCE
(BIOSTATISTICS)
August 2007
Copyright 2007 Lingyun Du
ii
ACKNOWLEDGEMENTS
I would like to express my most heartfelt gratitude to all these people who have
directly or indirectly helped me in the research leading up to this thesis.
I am profoundly indebted to Dr. Duncan Thomas, not only for his enormous
academic guidance and help, but his financial and mental support throughout my
research. He has seen the goodness in me behind my flamboyant surface, allowed
me to pursue my coveted metabolic pathway research with him, and endowed with
me the highest level of independence and freedom in the process. He never fails to
provide with me instantaneous and incisive feedbacks so that my wild enthusiasm
will never propel me out of control. He has lifted me out of the most difficult time
of my life, encouraging me to move forward instead of letting me waste away. I
shall forever cherish Dr. Thomas as a scientific role model and inspiration to me.
I am immensely grateful to Dr. David Conti not only for his constant infusion of
ingeniousl ideas in research, but his everlasting encouragement to and trust in me.
He has always been there for me, be it a programming setback or the need of a
computer to run the simulations. I shall never forget these heart-warming
iii
conversations that Dr. Conti has had with me at these critical times, be it on
metabolic pathway modeling, or a more optimistic view on everyday life. He will
always be remembered as the best peer-like advisor and motivator that I have ever
had.
I am also very thankful to Dr. Richard Watanabe, who has not only contributed
much useful genetics information in the workshops preceding this research, but
taught me the common statistical methods in human genetics in the PM570 course.
Both aspects have been instrumental to my understanding the field of genetics as a
former chemical engineer.
I would also like to thank Dr. Stanley Azen, Dr. Carolyn Ervin, Dr. Kimberly
Siegmund, and the students that I have assisted in PM510 and PM511B. Nods and
smiles from them are always my best anti-depressant on days when research is not
going so well. Their friendships and encouragements of my continued staying
true to myself have meant a lot to me.
The financial support from the USC Biostatistics Division throughout the course of
this research will always be appreciated and cherished.
iv
TABLE OF CONTENTS
ACKNOWLEDGEMENTS ii
LIST OF TABLES vi
LIST OF FIGURES vii
ABSTRACT viii
CHAPTER 1 Introduction 1
CHAPTER 2 Background Research 5
2.1 Modeling joint effects of genes and exposures in
genetic epidemiology 5
2.2 Pharmacokinetic models 7
2.2.1 Pharmacokinetic Models Overview 7
2.2.2 PBPK Models in Toxicology 9
2.3 Hierarchical population models 11
2.4 Statistical approaches 14
2.4.1 The frequentist approach 14
2.4.2 The Bayesian approach 16
2.5 Markov Chain Monte Carlo method 19
2.6 Metabolic pathways and enzyme reaction kinetics 22
2.7 Previous modeling of metabolic pathways 25
CHAPTER 3 Statistical Methods 28
3.1 The Metabolic Pathway and Modeling Philosophy 28
3.2 Data simulation 31
3.3 Parameter estimation 33
CHAPTER 4 Results 38
4.1 Crude gene-exposure interactions elicited by the PBPK simulation
models 38
4.2 Ability of the PBPK estimation models
to retrieve the simulated
parameters in correctly specified genetic models under various
data input scenarios 40
4.2.1 Posterior estimation with exposure, genotype, and
disease data only 40
4.2.2 Posterior estimation with additional measured enzyme
activity data 41
v
4.2.3 Posterior estimation with additional measured metabolite data 41
4.3 Effect of completeness of metabolite concentration and enzyme
activity data on estimation precision 45
4.4 Effect of different measurement error variances in
metabolite and enzyme activity data on estimation precision 48
4.5 Effect of V
max
between-subject variability on estimation precision 49
4.6 Effect of mis-specifying reaction kinetics on estimation precision 50
4.7 Comparison of power of PBPK modeling with that of logistic
regression 52
4.8 Prior sensitivity 54
4.9 Summary 55
CHAPTER 5 Discussion 58
REFERENCES 62
vi
LIST OF TABLES
Table 4.1: Posterior estimates comparing data on E
i
, G
i
, Y
i
only with E
i
, G
i
, Y
i
,
and M
zj
42
Table 4.2: Completeness of metabolite concentration data
on estimation
Precision 46
Table 4.3: Posterior estimates with respect to mis-specified prior means with
E
i
, G
i
, Y
i
, and M
zj
data
input 55
vii
LIST OF FIGURES
Figure 2.1: Example of a PBPK model used for metabolism of TETRA 11
Figure 2.2: Graphical representation of a hierarchical population model used in
TETRA 12
Figure 2.3: An illustrative BUGS graphical model from BUGS user manual 22
Figure 2.4: Examples of Metabolic Pathways 22
Figure 2.5: Reaction rate from Michaelis-Menten kinetics 24
Figure 3.1: The person-specific DAG in the hypothetical metabolic pathway 28
Figure 3.2: Genotypes, poteomics and metabolomic in the topology 29
Figure 4.1: Illustrative plot showing exposure×genotype interaction effect 39
Figure 4.2: History plots showing convergence under various data input
scenarios 42
Figure 4.3: History plots comparing partial with complete P
G2
input on
estimation 46
Figure 4.4: Parameter estimation precision with respect to different metabolite
or enzyme activity measurement error variances 48
Figure 4.5: Comparison of parameter estimation precision at varying magnitudes
of simulated intra-genotype between subject metabolic rate variance 50
Figure 4.6: Comparison of parameter estimation precision between Michaelis-
Menten and linear kinetics 51
Figure 4.7: Comparison of power between hierarchical PBPK modeling and
logistic regression 53
viii
ABSTRACT
Metabolic pathways are candidates for examining gene-exposure interaction effects
on complex diseases. We present a hierarchical modeling platform integrating
physiologically based pharmocokinetic models and Bayesian approaches to
characterize gene-exposure interaction effects in a metabolic pathway. We simulate
datasets under a known metabolic model. Individual-specific metabolic rates are
described by Michaelis-Menten parameters V
max
and K
m
, regressed on genotypes at
the relevant loci. Intermediate measurements such as enzyme activities and
metabolite concentrations are incorporated as flawed measurements of an
individual’s long-term metabolic rates and metabolite concentrations. Parameter
posteriors are estimated using Markov Chain Monte Carlo methods. The platform
allows investigation posterior estimation precision relative to the simulated effect
magnitude and appears to have great utility for characterizing etiologic genetic and
exposure effects and dissecting complex pathways.
Keywords: Metabolic pathways, Physiologically based pharmacokinetic models,
Bayesian approaches, Hierarchical modeling, Exposure, Genotype, Michaelis-
Menten, Intermediate measurements, Enzyme activities, Metabolite concentrations,
Markov Chain Monte Carlo.
1
CHAPTER 1
INTRODUCTION
The field of epidemiology has long been interested in the roles that genes and
environmental exposures play on complex diseases such as cancer. Much of the
work, however, has failed to capture the exact biochemical nature of the reactions
that convert an exposure to reactive toxic species. Conversely, although
physiologically-based pharmacokinetic (PBPK) models have long been established
tools to examine toxicity of chemicals by tracing them through their metabolism
steps, little effort has been devoted to incorporating genetic information into such
models. We propose a statistical approach to bridge the gap between available
biochemical knowledge and genetic modeling, in hopes of better estimating
genotypic and environmental effects on human disease such as cancers.
Many exposures of public health concern require metabolic conversion into
chemically active species before exhibiting toxicity. Metabolic pathways constitute
a sequence of biochemical reactions within the body following the intake of a
chemical. Some chemicals widely present in the environment and diet are capable
of inducing mutations and eventually cancers, and are therefore classified as
carcinogens. Biochemical reactions themselves are carried out by specialized
proteins called enzymes. These enzymes are in turn encoded by genes that are part
of an individual’s genetic makeup. The same level of carcinogen intake could
translate into different degrees of cancer risk across a population (called gene-
2
exposure interaction), however, because genetic makeup could indicate efficient,
medium or deficient enzymatic conversion of the carcinogens. Metabolic pathways
could hence become a powerful tool to examine the etiologic gene-exposure
interactions in human diseases such as cancer.
Correctly modeling the biochemical reactions within the metabolic pathways could
be crucial. It needs to be noted that these biochemical reactions are generally not
linear over the entire range of chemical concentrations. For instance, the enzyme
could only provide so many open binding sites for reaction, and saturation occurs at
high substrate concentrations as a result. Michaelis-Menten (M-M) kinetics has
been widely adopted to describe such saturating effects. In the absence of
competitive inhibition, when two or more types of chemicals are competing for the
same enzyme, Michaelis-Menten kinetics describes the reaction rate as V = V
max
[S]/(K
m
+ [S]), where V
max
and K
m
are the two parameters defining the kinetic
behavior of an enzyme as a function of substrate concentration [S]. V
max
indicates
the maximum reaction rate , while K
m
indicates the concentration of substrate
required to reach a reaction rate of V
max
/2 We intend to take the saturation effect
into account and apply the Michaelis-Menten kinetics to the biochemical reactions
involved in metabolic pathways.
It is also worth mentioning that various family based study designs, case-parent
trios, for instance, have been used to detect gene-exposure interaction. They offer
protection from bias due to population stratification and admixture as in population
3
based studies. Nevertheless, they still treat exposures, genes, and disease as if they
were isolated identities, omitting the many steps in between that would otherwise
tie them together Moreover, traditional logistic regression techniques, the standard
gene-exposure interaction statistical tool, modeling disease phenotype on gene,
exposure, and gene-exposure (G×E) interaction effects, have also become
inadequate. For instance, available metabolomics (surrogate metabolite
concentrations measurements) or proteomics (surrogate enzyme activity
measurements) are difficult to fit into a logistic regression model, whereas in reality
such measurements may provide much information about an individual’s risk of
disease.
Several researchers in the field of epidemiology haven already taken heed to such
inadequacies and made exploratory effort into modeling the gene-exposure effects
from a metabolic pathway perspective. Among them, Cortessis and Thomas (1)
provide a detailed description of the heterocyclic amine and polycyclic
hydrocarbon pathways leading to colorectal polyps, as well as a PBPK modeling
approach examining the metabolic rates dependencies on genotypic and phenotypic
measurements in these pathways. Alternatively, Conti et al. (2) describe a Bayes
model averaging method, stochastically searching for best fitting variable subsets,
highlighting variables with greater impact in the pathway while incorporating
model uncertainty by averaging over all possible models. Both approaches attempt
to take advantage of prior biochemical knowledge about the metabolism of
4
carcinogenic compounds and the various regulatory genes, thus pointing to the
direction of combined Bayesian and PBPK modeling of metabolic pathways.
I present in this thesis a further exploration into the gene-exposure modeling within
a metabolic pathway context. I propose a hierarchical modeling platform that
utilizes strengths from both PBPK modeling and Bayesian approaches, to better
characterize the joint effects of genes on the metabolic activation and detoxification
of carcinogenic exposures. Chapter 2 describes the detailed background research
that forms the basis and motivation of this thesis. The hierarchical modeling
platform and the consequent simulation and estimation methods are presented in
Chapter 3. Chapter 4 provides simulation and estimation results showing
importance of gene-exposure interactions, posterior parameter precisions at various
estimation scenarios such as added metabolite or enzyme activity data, as well as
prior sensitivity and power analysis. The statistical significance and issues
identified from this work are discussed in Chapter 6. Last but not the least, the
author hopes that the hierarchical PBPK approach will be perceived as a more
accommodating and potentially etiologically informative strategy in gene-exposure
interaction modeling, in lieu of the increasingly fine molecular biological data
available.
5
CHAPTER 2
BACKGROUND RESEARCH
This thesis constitutes a novel marriage of Bayesian method and PBPK modeling
in detecting the joint genetic and environmental effects in metabolic pathways.
Background research was therefore conducted with respect to seven associated
topics: (1) modeling joint effects of genes and exposures in genetic epidemiology,
(2) pharmacokinetic models, (3) hierarchical population models, (4) statistical
approaches, (5) Markov Chain Monte Carlo method, (6) metabolic pathways and
enzyme reaction kinetics, and (7) previous modeling of metabolic pathways.
2.1 Modeling joint effects of genes and exposures in genetic epidemiology
Many complex diseases are neither strictly genetic nor purely environmental,
arising usually through the joint effects of genetic susceptibilities and
environmental exposures. It thus becomes natural to understand how genetic
makeup and exposure history interact to cause disease susceptibility. Joint effects
of genes and exposures are typically characterized by how the genotypic effect on
disease risk changes with exposure level or, conversely, whether the effect of
exposure varies with different genotypes.
Various methods assessing gene-exposure interactions are based on models that
predict the relative risk of disease, the ratio of penetrances, for instance, which are
the ratios of disease probabilities for a particular genotype versus the other.
6
Interactions are declared when the genotype relative risks vary significantly across
levels of the exposure, recognizing that they do not necessarily imply biological
interaction (3).
In genetic epidemiology, population-based case-control studies can be used to
assess gene-exposure interactions, but are prone to spurious findings due to hidden
genetic structure such as population stratification or admixture(4,5). Family-based
designs such as case-parent triads offer protection from bias caused by hidden
genetic structure and various researchers have proposed assessment of genotype-
exposure interaction by grouping case-parent triads according to the exposure
status of the case, and testing for differences between exposed and unexposed
groups. One group of proposal, for instance, tests for differences in the
transmission rates of the variant allele from heterozygous parents to exposed versus
unexposed offspring (6-8). Other proposals include comparing the genotype
distribution of unexposed cases to that of exposed cases, conditional on parental
genotypes (9-12).
The most obvious pitfall of case-parent designs is that exposure effects cannot
usually be estimated from case-parent data. The case-only design, which requires
only diseased subjects, allow for estimation of exposure effects but are susceptible
to bias arising from non-independence between the
genetic and environmental
factors in the population. In practice, it may be difficult to collect genetic
information on parents especially if the disease is of late-onset and family-based
7
tests are generally less powerful in detecting genetic main effects than a population
based-control study at the same number of genotyped subjects.
Nevertheless, whether case-parent trios or cases only, these studies shed little light
on etiologic gene-exposure interactions because they do not usually examine the
underlying biology of how the same exposure levels translates into different disease
risks across genotypes, which must have its biochemical basis. In other words, the
complex activation and detoxification steps that usually follow the intake of an
exposure are not acknowledged in these study designs, as if a complete “black box”
persisted between exposure, genes, and disease, whereas in reality we already have
varying degrees of information into the black box that could be helpful to
understanding etiologic gene-exposure interactions. Studies that trace the
metabolism of a carcinogen mediated by the respective encoding genes within a
human body allows better use of the information between exposure and disease,
and subsequently could yield more clues on etiologic gene-exposure interactions.
2.2 Pharmacokinetic models
2.2.1 Pharmacokinetic Models Overview
Pharmacokinetic models trace the disposition of chemicals in the body, e.g., the
processes of absorption, distribution, metabolism, and elimination.
Pharmacokinetic (PK) modeling has had a long history that began as data-based PK
compartmental models developed in the 1930's, when only limited tools were
available for solving sets of differential equations. These models were expanded in
8
the 1960's and 1970's to accommodate new observations on dose-dependent
elimination and flow-limited metabolism. In the 1970's PBPK models were
developed to evaluate metabolism of volatile compounds of occupational
importance, and, for the first time, dose-dependent processes in toxicology were
included in PBPK models in order to assess the conditions under which saturation
of metabolic and elimination processes lead to non-linear dose response
relationships(13-16). In the 1980's insights from chemical engineers and
occupational toxicology were combined to develop PBPK models to support risk
assessment with methylene chloride and other solvents. The 1990's witnessed
explosive growth in risk assessment applications of PBPK models and in applying
sensitivity and variability methods to evaluate model performance. Some of the
compounds examined in detail include butadiene, styrene, glycol ethers, dioxins
and organic esters/aids(17-23).
Data-based PK modeling implies that a "model" is fit to the experimental data. It is
thus the data that define the complexity of the structural model. The philosophy
behind whole body PBPK is the overlay of drug specific data onto an essentially
independent structural model, comprising the tissues and organs of the body with
each perfused by and connected via the vascular system (Figure 2.1). The
independent physiological data comprise, among others, tissue structure, tissue
volume, tissue composition, and associated blood flows—all anatomically correct.
An attractive feature of the model is that its structure is essentially common to all
9
mammalian species, thereby facilitating interspecies scaling. In addition, as
relevant knowledge of the physiological and morphological data becomes available,
as well as how drugs interact with the components of the system, the possibility
exists for efficient use of limited drug-specific data in order to make reasonably
accurate predictions as to the pharmacokinetics of specific compounds, both within
and between species, as well as under a variety of conditions.
PBPK models may exhibit different degrees of complexity, which in their simplest
form, can be reduced to its steady-state behavior characterized by such parameters
as clearance and volume of distribution. The drug-specific data could include tissue
affinity, plasma protein binding, and membrane permeability, as well as enzymatic
and transporter activity. As a consequence, whole body PBPK models facilitate
virtual "visualization" of events at various sites and tissues within the body and
could explain the causal basis of the observed data.
2.2.2 PBPK Models in Toxicology
Examples of PBPK models are those of dioxins (24-25) and benzopyrene (26)
toxicity. The dioxin model describes the dispersion of dioxin in rats with the
intention of determining by extrapolation the maximal tolerated daily intake of
dioxin, defined as the exposure level (g/kg/day) to which the general population
may be chronically exposed without developing adverse toxic effects. The
benzopyrene model is a simulation model of intravenous injection of benzopyrene
10
to rats, with motivation to predict for human beings the importance of the
metabolism of benzopyrene and its metabolites, which may cause cancerous tumors
in the liver. It is worth noting that the concentration of unchanged benzopyrene in
the blood and the concentration of the metabolites in the blood and bile can be
measured. Moreover, both dioxin and benzopyrenes can be ingested from food,
where other carcinogens such as heterocylic amines are also present. It thus seems
tempting to apply PBPK modeling approaches to mechanistically study the effect
of dietary exposures on cancer risk among humans.
Nevertheless, the values of the physiological and physicochemical parameters with
PBPK models may not always be known with precision, especially in vivo. The
uncertainty tends to be more pronounced in the metabolic parameters (27-28). It is
therefore advisable to use a statistical model to derive population parameter
estimates, properly accounting for the inter- and intra-individual variabilities
inherent in the pharmacokinetic data.
11
Figure 2.1 Example of a PBPK model used for metabolism of TETRA. Q, blood
flows; V, compartment volumes; P, partition coefficients; VPR, ventilation over
perfusion ratio; V
max
, maximum rate of metabolism; K
m
, Michaelis-Menten
parameter.
2.3. Hierarchical population models
Whenever one is to make inference about the kinetic behavior of a certain chemical
in the general population, it is necessary to derive quantitative information from
kinetic data collected at the individual level. The simplest approach in this regard is
naive pooling of data from various subjects and subsequent regression against mean
concentrations in the sample population over time. For this method, only an
estimate of the mean behavior of the chemical in the general population is derived,
losing the rich kinetic information content contained at individual levels. . A better
12
approach is the two-stage method, where the model is fit to the data for different
individuals separately. Population behavior is then summarized by calculating
descriptive statistics from the individual parameter estimates. Nevertheless, the
combination of estimates is obtained without any statistical model accounting for
the inter-individual variability in the two-stage methods..While more sophisticated
two-stage models have been suggested, the most effective approach is probably to
use a hierarchical structure to distinguish intra-individual variability from
variability at the population level. Figure 2.2 is an illustration of a hierarchical
structure appropriate to the PBPK model in figure 2.1.
Figure 2.2 Graphical representation of a hierarchical population model used in
TETRA. Square: observed variables, circle: unknown variables, triangle: the
deterministic TK model ƒ; Solid arrow: stochastic dependencies, dashed arrow:
deterministic dependencies.
The fundamental philosophy of a population model is that the same model can
13
describe the concentration-time profiles in all individuals, and that the model
parameters can vary from individual to individual. The individual parameter sets
are assumed to have arisen from a theoretical population distribution. The
population distribution may be assumed to be known or unknown. The former
approach, which is the most common, is called parametric, with the population
parameters estimated conditional on the assumed shape of the distribution. Within
the latter approach, known as non-parametric methods, the population distribution,
both with regards to the parameter values and the shape of the distribution, is to be
investigated.
Hierarchical population models have a long history of use in educational research
(29), and econometrics (30). They are now widely used in pharmacokinetic studies
as part of drug development procedures, where they are formulated to describe the
inter- and intra-individual variability in dose concentrations observed empirically
(31). There are many published works in this regard and Scheiner et al. provides a
comprehensive discussion of the rationales in population pharmacokinetic
modeling (32).
Nevertheless, most of the population pharmocokinetic works have hitherto stayed
within drug development. Genotype information is rarely included and above all,
the exposures involved are typically rare, assigned, and carefully quantified drug
dose. Genetic epidemiology, in contrast, is more interested in common exposures
14
naturally acquired from diet or the environment, typically recalled or assessed
categorically with considerable errors. There is indeed necessity to extend the
pharmacokinetic hierarchical modeling philosophy in drug development to
describing how common exposures and common genotypes work together to cause
common diseases in genetic epedimiology, based on their shared pharmocokinetic
root.
2.4. Statistical approaches
Whenever inferences are made based on collected data, it is necessary to resort to
some sort of statistical approach to assure that these inferences are grounded. The
most common statistical approach in scientific research is probably the so-called
frequentist or “classical” method. However, there exists another option, namely the
Bayesian approach.
2.4.1. The frequentist approach
A frequentist analysis proceeds by using statistics to determine the probability of
observing the actual data under alternative values of the unknown parameters.
Inferences about the parameters are based upon which alternative values of the
parameters best make the expected observations match the actual observations.
In statistical modeling, the population parameter values around which the sampled
data are most likely to have arisen from, are derived by the use of a maximum
likelihood estimator (MLE). MLE seeks the most likely value of the parameter N by
15
maximising the likelihood function of the observed data set over all possible values
of N. The likelihood function is expressed as:
) | ,..., ( ) (
1
n D
x x f lik = (Eq. 2.1)
where x
1
,…x
n
are the observed data set.
The maximum likelihood estimator reports the parameter values as point estimates
with associated standard deviations or confidence intervals.
Logistic regression is one of the most often adopted frequentist approaches in
genetics, in which a logistic dependence was assumed between binary dependent
and independent variables (categorical or continuous). Disease status is regressed
on observed individuals’ genotypes, exposures, and (or) gene-exposure
interactions, with the estimated regression coefficients denoting ratios of log(Odds)
of outcome comparing one group to another using the maximum likelihood
method.
Frequentist approaches tend to ignore the results in previous studies or any external
evidence other than the data collected in the present study. Morever, the measures
of uncertainty produced in a frequentist analysis only takes the data collected in the
present study into account. The plausibility of the results from a frequentist analysis
is often a cause for concern, and the results of the analysis must always be assessed
with regard to the previous research by the researcher after the analysis.
In the case of PBPK modeling, available toxicokinetic data are usually relatively
16
sparse, considering the complex model structure, with many parameters impossible
to estimate when only the present toxicokinetic data are taken into account. It
therefore becomes necessary to utilize the knowledge about parameters in the
model available from previous studies, although such knowledge is subject to
varying degrees of uncertainty. For instance, the mean value of the unit perfusion in
liver tissue may be known with a fair amount of precision, with a reasonable
estimate of the level of inter-individual variability of that perfusion also available.
In contrast, there may be very little data on the mean metabolic capacity for a given
chemical in a human population. There may only be an educated guess available,
based on animal-to-human extrapolation. These different levels of uncertainty are
difficult to account for properly using the standard frequentist approach.
2.4.2. The Bayesian approach
A Bayesian approach, in contrast, allows combination of both current observed data
and existing knowledge about parameter values from the scientific literature. In a
Bayesian setting, before conducting an experimental study, a prior probability
distribution, p( ) is constructed to reflect prior knowledge of . This prior
distribution is then updated using the current observed data values, y, to yield a
posterior probability distribution of model unknowns. Bayes' theorem shows that
the posterior probability distribution of given y, p( |y), is proportional to the
product of the likelihood and the prior p( ). In a Bayesian framework, the
likelihood of a particular data set, given an assumed model and its parameter
17
values, is the conditional probability distribution of the observed data, p(y| ).
Bayes' theorem can be viewed as a formula that shows how prior beliefs, formally
expressed as probability distributions, are modified by new information:
) ( ) | ( ) | ( p y p y p × .( Eq. 2.2)
In the case of the previous population PBPK model in Figure 2.3, Equation 2.3
takes the form
p( ,Q, ,
2
|y) p(y| ,Q, ,
2
) p( ,Q, ,
2
) (2) ( Eq. 2.3)
A Bayesian approach therefore makes it possible to merge a priori knowledge from
the literature with the current information in experimental toxicokinetic data,
producing updated posterior knowledge about parameters of interest. All inferences
about , i.e., means, standard deviations (SDs), quantiles, correlations, and
marginal distributions can be obtained from appropriate integrations of posterior
distribution p( |y).
The use of a probability distribution to represent prior knowledge offers a unified
way to account for precise as well as vague information available before
experiments. When little prior knowledge is available about particular parameter
value, a non-informative, such as uniform, distribution can be used. In the event of
that stronger prior knowledge is available, the distribution shape, location, and
dispersion parameters should be selected to represent that information as
appropriately as possible. In the context of PBPK models, there can be very little
18
prior knowledge about the parameter values for a specific individual because
literature typically reports average values or variances for a population that are also
not genotype- specific. It then becomes advantageous to also use a population
hierarchical structure, because such information can be used directly in the form of
prior distributions for the population parameters. For chemical-specific parameters,
prior information is often less precise but can still be obtained from previously
published experiments such as estimates of metabolic parameters obtained from in
vivo experiments (33). In all cases, it is preferable when setting uncertainties on
population parameters to be conservative, while allowing the parameters that have
physiological interpretation to stay within their biological limits.
As the Bayesian approach takes prior knowledge into account and is prediction
oriented, it is particularly suitable in PBPK modeling for risk assessment. However,
the Bayesian approach is also not free of its critics. One common concern is that
the Bayesian approach is too subjective. Different researchers may make different
assessments of previous results, and may thus assign different priors, leading to
different parameter estimates for the same observed data. Bayes factors, as the ratio
of posterior to prior sampling probabilities, could help alleviate but not eliminate
this concern, and in practice, prior sensitivity analyses are often necessary in
Bayesian modeling. However, the author would like to contend that considerable
subjectivity also goes into frequentist analyses, in terms of model choice, statistical
tests, confidence levels, etc.
19
2.5. Markov Chain Monte Carlo method
The question then becomes how to evaluate p( |y) when the non-linear form of
PBPK models render an analytical expression impossible. Until recently, this
intractability, along with enumeration of high dimensional integrals, has been the
practical hurdle to Bayesian analyses. Fortunately, Markov Chain Monte Carlo
(MCMC) method has come to rescue.
MCMC method is an iterative sampling scheme that provides direct
approximations to a complex joint posterior distribution. It consists of
constructing one or several Markov chains that converge, as iterations progress,
toward the target posterior distribution, p( |y). Various MCMC methods have
been developed to date, which depend on how the Markov chains are constructed.
Among them, the Metropolis-Hastings (MH) algorithm has been shown as very
efficient in dealing with Bayesian population pharmocokinetic models.
Briefly, the MH algorithm starts by assigning all model unknowns initial values,
for instance, by sampling from their respective prior distributions. At each
subsequent iteration step, each component,
k
, of the parameter vector is updated
according to the following acception/rejection rule: a proposed value,
k
´, is
sampled from a "proposal" distribution (e.g., normal, centered on the current value
k
). The joint posterior density is then computed at
k
and
k
´. If the ratio of
joint probabilities exceeds 1, the new value
k
´ is accepted and replaces
k
;
otherwise,
k
´ is accepted only with probability equivalent to the ratio. In case of
20
rejection of
k
´, the value
k
is kept. In the event of Gibbs sampling, which is a
special case of the Metropolis-Hastings algorithm, the proposed point is always
accepted. After updating all components of sequentially, their values are
recorded, and this completes one iteration of the Markov chain, and many
iterations are typically needed. It has been shown that under some regularity
conditions (i.e., the Markov chain has to be irreducible, aperiodic, and positive
recurrent), the chain converges toward the target joint posterior distribution and
this process of achieving convergence is known as burn-in. Samples drawn after
burn-in can be considered as approximate samples from p( |y), and can thus be
used to make inferences such as posterior means and standard deviations. The
proposal distribution should be chosen on the basis of its ease of simulation and
approximation of the posterior distribution. Techniques to monitor convergence
are well described in the literature (34-36) and will not be discussed here.
MCMC methods should not be confused with the traditional or standard Monte
Carlo methods used in uncertainty analysis. In MCMC, the values drawn for each
parameter start from its prior distribution and converge, as iterations progress, to a
data-adjusted posterior distribution. In simple Monte Carlo sampling, the values are
always drawn from the prior distribution, thus no updating of the prior belief is
performed. It should also be noted that if uniform, i.e. non-informative priors
(complete ignorance about plausible values) are used, the posterior will be
proportional to the likelihood of the data, and in the end equivalent to the standard
21
likelihood-based (frequentist) approach to PBPK modeling. Conversely, if the data
do not convey any information at all about the parameters, the posterior
distributions will be equivalent to the prior.
BUGS (Bayesian Inference Using Gibbs Sampling), is one efficient statistical
package for carrying out Gibbs sampling on Bayesian networks (37). BUGS offers
flexible and concise coding, as well as well summarized posterior parameter
outputs using the Marko-Chain-Monte-Carlo method. It is particularly useful for
the Bayesian analysis of complex statistical models such as PBPK ones (38-40), for
which a modified version called PKBUGS is also available. In brief, BUGS
simulates the analytically difficult-to-obtain posterior probability distributions by
iteratively drawing samples from the full conditional distributions of all unobserved
nodes, given the current or known values for the other nodes, based on a Direct
Acyclic Graph (DAG) structure representing the various component probability
distributions(Figure 2.3). Specifics of the BUGS software are readily accessible
from a variety of on-line sources (http://www.mrc-
bsu.cam.ac.uk/bugs/weblinks/webresource.shtml).
22
for(i IN 1 : N)
sigma
tau beta alpha
mu[i]
Y[i]
for(i IN 1 : N)
index: i from: 1 up to: N
Figure 2.3 An illustrative BUGS graphical model from BUGS user manual, where
Y[i] depends on mu[i] and tau, with mu[i] being a logical function of alpha and
beta.
2.6. Metabolic pathways and enzyme reaction kinetics
Metabolism consists of all reactions necessary to sustain life in a living organism.
Essentially all metabolic (or biochemical) reactions convert molecules named
substrates to products (Figure 2.4-left). A metabolic pathway is a cascade of
metabolic reactions (Figure 2.4-right), which encompasses a broad spectrum of
diverse biological, chemical, and physical data.
Figure 2.4: Examples of Metabolic Pathways.
Left: Metabolic reaction; Right: Metabolic pathway
S: substrate, E: enzyme, P: product A to D: intermediates
23
The past decade has seen remarkable progresses in molecular biology, leading to
improved understanding of biological activities at cellular levels and the sequence
of events as in metabolic pathways. We now have a clearer idea of the activities
inside cells, from which we expect to make better predictions about the bigger
event such as disease. In the case of a carcinogen intake, we now know that it can
be transformed into further reactive species and (or) detoxified by enzymes Some
reactive species that remain have the capability of irreversibly binding to DNA,
hindering the normal transcription process in protein synthesis and disease might
ensue as a result. It thus becomes clear that metabolic pathways naturally tie
exposures, enzymes (encoded by genes), and disease together, and are perfect
biological “microscopes” for gene and exposure effects examination.
The increased biological knowledge about metabolic pathways calls for statistical
approaches that could properly take into account the increased information when
drawing inferences on population parameter values in the metabolic pathways. that
can be used to correctly Naturally, the most appropriate statistical models should
respect and take into account the biological nature of the data to the maximum
degree, without arbitrarily omitting biologically specified information. From a
metabolic pathway modeling perspective, one crucial biological feature it should
address is the non-linear nature of the metabolic reactions that determine metabolite
concentrations and eventually disease risks, if the metabolites are mutagenic.
24
Michaelis-Menten kinetics is the most widely adopted kinetics to describe the non-
linear nature of metabolic reactions. The kinetics (illustrated in Figure 2.5) are
determined by two parameters V
max
and K
m
in the form of
] [
] [
max
S K
S V
m
+
, where [s]
stands for substrate concentration. V
max
is the maximum rate that the enzyme can
process reactants, at which the enzyme open sites are fully saturated with substrate.
K
m
indicates how much substrate is needed for the enzyme to reach V
max
/2.
Conseuqently, V
max
is more of an intrinsic property of the enzyme itself, whreas K
m
measures more of the substrate affinity to the enzyme, which largely depends on
the properties of the substrate. Because enzymes are encoded by metabolic genes
that constitute an individual’s genetic makeup, these interpretations have important
connotations on statistical modeling. We could infer that V
max
will vary more
widely than K
m
between individuals, given the same substrate, and the variation in
V
max
could capture the most essential genotypic information contained in metabolic
pathways.
Figure 2.5. Reaction rate from Michaelis-Menten Kinetics
25
2.7. Previous modeling of metabolic pathways
Literature on statistical modeling of metabolic pathways hitherto Recently, the field
of Bioinformaticsthat are aimed at constructing or refining pathways from gene
expression data as in micro-arrays. For instance, Kurhekar et. al.(41) present a
pathway scoring method, in which various biologically possible pathways were
generated and scored with respect to gene expression measurements, the most
significant score indicating the most plausible pathway . Karp. et al. (42) present a
computational pathway analysis in a much broader context of the human genome,
assigning 2,709 human enzymes to 896 bioreactions, with 622 of the enzymes
assigned roles in 135 predicted metabolic pathways. These all represent a necessary
first step toward quantitative modeling of metabolic pathway by validating the
biologically plausibility of a pathway. The ensuing question is: what significance
does the pathway, once or already validated, have on the outcomes, disease? To be
more specific, which exposure(s), gene(s), step(s) in a pathway are the most telling
of a disease that are significant causes of morbidity and mortality?
Several researchers in the field of genetic epidemiology have taken heed to this
question and made explorative effort of incorporating known metabolic pathway
knowledge into disease modeling. The most relevant to this work is probably those
by Cortessis and Thomas (1), who illustrate the use of a toxicokinetic approach in
modeling the gene-exposure and gene-gene interactions, by the well-understood
metabolic pathways of heterocyclic amines and polycyclic aromatic hydrocarbons.
26
Their approach contained characteristic PBPK features with respect to equations
describing the step-wise carcinogenic flow, modeling metabolic rates as log-normal
distributions around genotypes, and incorporating enzyme activity data as flawed
phenotypic measurements around true metabolic rates. As an explorative effort,
linear kinetics was assumed in their work thus saturation was inadequately
accounted for, which could be of particular importance for individuals with high
carcinogen intakes. Moreover, Conti et al.(2) describe a Bayes model averaging
method, stochastically searching for both the most significant pathway and the best
fitting variable subsets within a pathway, incorporating model uncertainty by
averaging over all possible models pertaining to pathways leading from smoking
and well-done red meat to colorectal polyps. Both approaches attempt to take
advantage of the metabolism of carcinogenic compounds and the various regulatory
genes in established pathways with reasonable success, thus pointing to a
possibility of gene-exposure modeling from a metabolic pathway perspective.
It is desired to further evaluate the suitability of Bayesian PBPK modeling in
metabolic pathways and genetic epidemiology, however. This is particularly
important in the sense that the standard, frequentist approaches such as logistic
regression may not provide the most ideal tool to accommodate the rapidly
increasing toxicokinetic knowledge and intermediate measurements such as
metabolite concentrations or enzyme activity levels, which would otherwise be
illuminating on disease etiology. In contrast, the extensively studied PBPK
27
models, with its intrinsic goal of determining toxicity of a pre-specified and
measured substance, seldom takes into account of individuals’ genotype and
exposure information that would most likely affect the outcome, which is of key
interest in genetic epidemiology. Consequently, hierarchical PBPK modeling on
metabolic pathways using the Bayesian approach, should be considered as a
potentially attractive coupling of PBPK modeling and genetic epidemiology for
better understanding the gene-exposure interplay on disease outcomes.
28
CHAPTER 3
STATISTICAL METHODS
3.1 The Metabolic Pathway and Modeling Philosophy
In this pathway, an individual’s dichotomous exposure intakes (E
i
) lead to disease
Y
i
, through a series of steps of transformation, detoxification, and excess excretion,
during which metabolites Zs are created. Squares represent measured
epidemiological quantities such as genotypes, exposures and disease status,
triangles represent intermediate metabolite concentrations, and circles represent
either latent rate variables or population parameters to be estimated. Solid arrows
denote biologically deterministic relationships between quantities and dashed
arrows denote stochastic dependencies.
Figure 3.1: The person-specific DAG in the hypothetical metabolic pathway
E
1
E
2
Z
1
Z
2
Z
3 Y
1
2
G
2
3
G
3
G
1
1
Excreted
3
Detoxified
Excreted
Figure 1B
Z
T
Measured quantities
Parameters of interest
Latent Variables
Stochastic relation
Deterministic relation
Z
m
: metabolites,
m
: Excretion rates
g
: Activation or detoxification rates
29
In the simulated topology. genes G
1
, G
2
, and G
3
at locus 1 2, and 3 exert their effect
through
ik
=(V
ik
, K
ik
), V
ik
and K
ik ,
denoting an individual is Michaelis-Menten
enzymatic activity parameters V
max
and K
m
encoded by gene G at locus k, which
stochastically vary both between and within genotypes ( Figure 3.2).
Figure 3.2: Genotypes, poteomics and metabolomic in the topology
For each individual i, we denote by Z
im
the mth metabolite concentration, around
which a measured M
im
would be obtained. The conversion rate of metabolite Z
im
to
Z
im+1
depends deterministically on Z
im
itself, as well as the individual specific
ik,
for which the corresponding enzyme activity P
ik
is measured. An individual’s net
accumulation rate for metabolite Z
im
is the balance between the rate at which Z
im
is
being created from its precursor Z
im-1
, and the rates at which it is being transformed
Z
m
Z
m +1
g
g
2
P
g
g
2
G
g
g
2
Proteomics
Proteomics
Measurement
Error
Measured
Genotype
Long-Term
Metabolic Rate
Between
Subjects
Variance
Genotype Specific
Mean Rates
Between
Genotypes
Variance
g
M
m
M
m +1
2
m
2
m +1 Metabolomics
Metabolomics
Measurement
Error
30
into its successor Z
im+1
or detoxified to waste. At steady states and Michaelis-
Menten kinetics, an individual’s metabolite concentrations Z
im
at steady states can
therefore be explicitly expressed in Eq. 1 to Eq. 6.
3 3 2
2 2
2 3
2
2 3
3
2 2
2
1
1 1
1 2
1 1
1 1
1 1
i i i
i i
i i
i
i i
i
i i
i
i
i i
i i
i i
i i
i
iT iT
iT iT i
Z Z
Z K
V
dt
dZ
Z
Z K
V
Z K
V
Z
Z K
V
dt
dZ
Z
Z K
V
Z K
Z V
dt
dZ
+
=
+
+
+
+
=
+
+
+
=
where Z
iT
denotes the total initial carcinogenic intake, which is a function of an
individual’s exposure status to be discussed later.
Solving at steady states, define
R
ZiT
=
iT iT
iT iT
Z K
Z V
+
, R
Zimk
=
im ik
im ik
Z K
Z V
+
(m= 1, 2; k=1, 3 )
Eq.1
Eq. 2
Eq .3
31
3
3
3
3 2 11 2
2 3 11 3 2 11 2
3 2 11 2
2
2 2
2
2 2
2
1 1 1 1 1 1
1
1 1
2
1 1
1
and
) ( ) (
,
where
2
4
,
where
2
4
i
Zij
i
i i Zi i
i i i Z i i i Z i
i i i Z i
i
i i i i
i
ZiT i i ZiT i i i i
i
i i i i
i
R
Z
K K R c
K V R K V R b
V V R a
a
c a b b
Z
R K c R K V b
c b b
Z
=
=
+ =
=
=
= + =
+
=
These equations form the scaffold of the hierarchical PBPK modeling platform. It
will be expanded below by relating “omics” to individual’s Z
i
s and
i
s, and his
(her) disease status Y
i
to exposure status E
i
, genotypes G
ik
at locus k,
and final
metabolite concentration Z
i3
, through parameters
, , and , respectively
.
3.2 Data simulation
Data is simulated from the topology and PBPK models for the hypothetical
pathway in the statistical package R. In simulating data, cohort study populations
were generated following some binomial exposure (yes/no) and genotype (aa=0,
aA=1, AA=2) distributions. Within this cohort, total initial carcinogenic intake Z
iT
for individual i was modeled as a log-linear regression on his (her) E
i
status.
Person-specific vectors of Michaelis-Menten rate parameters -
ik
= (V
ik
, K
ik
) for locus
k were random draws from univariate log-normal distributions respectively with
mean depending on genotypes G
ik
through parameter vector . (assuming a priori
null correlation between V
ik
and K
ik
, although multivariate lognormal distributions
Eq. 4
Eq. 5
Eq.6
32
could be used otherwise). Enzyme activities P
ik
were generated from log-normal
distributions centering on V
ik
/K
ik
at varying measurement error variances V
2
k
as
within half, one, or one-and-a-half standard deviations of the mean. The -
ik
s were
incorporated into Eq. 4 to 6 to compute the metabolite concentration Z
i
s, which also
serve as the true log mean values for measured metabolite concentration M
im
, also
assessed within half, one, or one-and-a-half standard deviations of the means.
Finally, disease status Y
i
is obtained from a logistic dependence of Y
i
on Z
i3.
In statistical notations, exposures, genotypes, metabolites and disease are simulated
as:
Exposures: E
i
~ Binomial ( N, 1, P
E
)
Genotypes: G
ik
~ Binomial ( N, 2, P
G
)
Metabolites: log (Z
iT
) ~ N (X
0
+:
1
E
i
,V
2
E
)
Y
ik
= (V
ik
, K
ik
) ~ MVLN (µ
Gk
,
2
Gk
),
where µ
Gk
=Z
0k
+Z
1G
G
ik
,
2
Gk
treated in this work as a priori null
G
ik
denoting individual i’s genotype G at locus k
Z
iT
Z
i3
Metabolite concentrations or enzyme activity levels:
log (M
im
) ~ N (log (Z
im
),\
2
m
)
log (P
ik
) ~ N (log (
ik
), V
2
k
),
ik
=
V
ik
/K
ik
Y
ik
Eq.4 to Eq.6
33
Disease: logit Pr (Y
i
= 1| Z
i3
) =]
0
+]
1
Z
i3
At the same exposure and genotype distributions, simulating at different population
parameter values
, , would produce cohort populations of varying disease
probability distributions. Nested case-control datasets randomly drawn from a
simulated cohort population of 5% disease probability, with complete exposure E
i
,
genotype G
ik
information, and partial or complete enzyme activity P
ik
or metabolite
data M
im
for 500 cases and 500 controls were fed into the hierarchical models to
draw posterior inferences on parameters
, , .
3.3 Parameter estimation
, , are in reality, unknown parameters of primary interest that summarize the
effect of E
i1
and E
i2
on initial metabolite concentration Z
iT
, the population
genotypic effect on -
ik
for gene G at locus k, and the effect of final metabolite
concentrations, on disease phenotype Y
i
for each individual, respectively. We seek
to estimate
, , from the hierarchical models by treating them as stochastic
quantities with unknown means and standard deviations. Inferences on
,
, are
drawn from the simulated datasets by estimating joint posterior probability with
proteomics, Pr (, <, = | Y
i
, E
i
, G
ik,
P
ik
), or with metabolomics, Pr (, <, = | Y
i
, E
i
,
G
ik,
M
im
), or no “omics”, Pr (, <, = | Y
i
, E
i
, G
ik
), respectively
Illustrated by the no
“omics” scenario, this probability is proportional to
34
_
i
Pr (Y
i
| Z
i3
;= ) Pr (Z
i3
| Z
i2
;<
G2
) Pr (Z
i2
| Z
i1
;<
G1,
<
G2,
<
G3
) Pr (Z
i1
| Z
iT
;<
G1
)
Pr (Z
i0
| E
i
;:) × Prior (:) Prior ( <
G1
) Prior ( <
G2
) Prior (<
G3
) Prior (= )
where the Zs are deterministic functions of their precursor concentrations yet
implicit stochastic functions of genotypes.
An analytical solution to this probability is impossible, due to the non-linear
dependencies between the metabolite concentrations. Nonetheless, the highly
structured nature of the physiologically-based models favors the use of Bayesian
approach readily implemented in WINBUGS 1.4.1, where we estimate the joint
posterior distribution of , , by iteratively drawing samples from the full
conditional distributions of all unobserved parameters and variables, given the
current or known values for the other parameters and variables in the model in a
hierarchical fashion.
The first level of posterior estimation includes modeling each subject’s probability
of disease as a result of his (her) individual specific exposure, metabolic rates, and
final metabolite concentrations, i.e., Pr (Y
i
= 1| 5, -
i
(.), 6 ; E
i
,G
i
). This probability
distribution was represented in a structure similar to that used in data simulation,
the only difference being that in estimation, ,,= were stochastic quantities to be
estimated, instead of assigned quantities. Measured metabolite M
im
, can be
incorporated into the models at this stage as log normal distributions around an
35
individual’s true long term average of Z
im
, i.e., log (M
im
) ~ N (log (Z
im
),
2
m
),
2
m
denoting the measurement error variance. Similarly, measured enzyme activities
P
ik
, can be fed in as a log-normal distribution around the individual’s long term
average rates Y
ik
, with measurement error variance @
2
k
, i.e., log (P
ik
)~ N (log (
ik
),
2
k
), -
ik
= V
ik
/K
ik
.
At the second level, we used population-level models to relate the individual
specific enzyme activity levels
ik
to their genotype-dependent population mean
values. Thus, for each individual,
ik
is a realization from a multivariate log-normal
distribution:
log (-
ik
)~ MVN (µ
G
k
,
2
G
k
), µ
G
k
denoting the population mean - for genotype G at
locus k, i.e., µ
G
k
= .
0G
+ .
1G
G
k
, assuming an additive genotypic effect model, and
2
Gk
being the variance-covariance matrix( treated in this work a priori as null, but
necessary if biology suggests V
max
and K
m
for a particular gene is a priori
correlated) . In this stage, .
1G
is the key parameter, representing the population
mean genotypic differences in enzyme activities on a log scale.
At the third level, prior distributions are specified for
, , ,
2
. Normal priors were
assigned to
, , with mean either at 0, reflecting the null assumption of no effect,
or at arbitrary mis-specified non-zero values as a test of prior sensitivity, and prior
variance at 100, 10, and 2 that reflect their respective biological bounds. Because
2
represent the variance terms in log-normal distributions for either V
max
or K
m
,
36
relatively tight inverse gamma distributions were specified for
2
to reflect the
maximum physiological bound of the parameters, as well as ensuring numerical
stability. 7
k
2
and \
m
2
for the intermediate measurements P
ik
and M
im
should be
guided by laboratory measurement precisions in practice but are sampled this work
from gamma distributions with a priori means corresponding to known
measurement error variances of within half, one, and one-and-a-half standard
deviations of their respective model derived values.
Final inference on the
parameters is based on the comparison of their posterior means (standard
deviations) to the priors, as well as Bayes factors, which are the ratios of after burn-
in posterior sampling frequency to prior sampling frequency of a parameter within
95% range of the simulated magnitude, i.e., 97.5%*Sim.Value to
102.5%*Sim.Value.
As an illustration, parameters are estimated in the following hierarchical fashion:
a. First level individual disease models
log (Z
iT
) ~ N (X
0
+:
1
E
i
,V
2
E
)
Z
iT
Z
i3
Y
i
~ Bernoulli (1, expit(]
0
+]
1
Z
i3
))
log (M
im
) ~ N (log (Z
im
),\
2
m
)
log (P
ik
) ~ N (log (
ik
), V
2
k
)
b. Second level population metabolic-rate-models
Y
ik
= (V
ik
, K
ik
) ~ MVLN (µ
Gk
,
2
Gk
),
Y
ik
Eq. 4 to Eq. 6
37
where µ
G
k
=Z
0k
+Z
1G
G
ik,,
2
Gk
treated in this work as a priori null
cThird level prior specifications of parameters
:,<,= ~ N (0,B
2
),@
2
~ Gamma
-1
(2, 1),
\
2
m
~ Gamma
-1
(1, 1), V
2
k
~ Gamma
-1
(1, 1), for a priori ± 1SD measurement
precision
We also compared the PBPK models to standard logistic regression estimation
techniques in terms of their respective statistical power (defined in results section)
to detect simulated effects. For logistic regression, both a reduced model with main
G and E effects only and a full model with additional G×E, and G×G interactions,
i.e., logit (Pr (Y
i
=1E
i
, G
i
) = 9
0
+
G
G
i
+
E
E
i
+
GE
G
i
×E
i
+
GG
G
ik
×G
il
(kl)) were
considered.
38
CHAPTER 4
RESULTS
Chapter 4 describes the key results from simulation and estimation pertaining to the
hierarchical PBPK modeling platform. Results are presented with respect to (1)
crude gene-exposure interactions elicited by the PBPK simulation models, (2)
ability of the PBPK estimation models
to retrieve the simulated parameters in
correctly specified genetic models under various data input scenarios, (3) effect of
completeness of metabolite concentration and enzyme activity data on estimation
precision, (4) effect of different measurement error variances in metabolite and
enzyme activity data on estimation precision, (5) effect of V
max
between-subject
variability on estimation precision, (6) effect of mis-specifying reaction kinetics on
estimation precision, (7) comparison of PBPK modeling power to that of logistic
regression, and (8) prior sensitivity, and summarized in Section 9.
4.1 Crude gene-exposure interactions elicited by the PBPK simulation models
Although the pathway-based PBPK simulation models do not explicitly include any
interaction effects, simple descriptive plot from the simulated data clearly displays
important gene-exposure (G×E) interactions. Figure 4.1 illustrates the G×E
interactions by plotting the relative risks of disease against various (G
1
,G
2
,G
3
)
combinations, at three simulated populations of low, moderate, and high exposure
intakes, respectively. Individuals with the slowest activation and the highest
detoxification genotype combinations (G
1
=2, G
2
=2, G
3
=0) were chosen as the
39
reference group. Clearly, estimates of relative risks increase at elevated exposure
level for the same genotype combinations, indicating more pronounced genotypic
effect on disease at higher exposure intakes. For instance, relative risk estimate
comparing the genotype combination (G
1
=2, G
2
=2, G
3
=0) to (G
1
=2, G
2
=2, G
3
=0) is
5.9 at high exposure intakes, which is much higher than its estimate of 1.7 at low
exposure intakes. To correctly estimate the joint effects of genes and exposures,
G×E interaction effects should be taken into account in a statistical model, in
addition to the respective main effects.
0.00 1.00 2.00 3.00 4.00 5.00 6.00 7.00
2,2,0
2,1,0
2,2,1
1,2,0
2,1,1
2,2,2
2,0,0
1,2,1
1,1,0
0,1,0
0,2,1
0,2,0
1,0,0
1,1,1
2,1,2
2,0,1
0,0,0
1,2,2
0,2,2
1,1,2
1,0,1
0,1,1
0,1,2
0,0,1
2,0,2
1,0,2
0,0,2*
Relative Risk Estimates
* Reference
Increasing
RR at same
E for
G
1
,G
2
,G
3
Low
Exposure
Intakes
Medium
Exposure
Intakes
High
Exposure
Intakes
Figure 4.1: Illustrative plot showing exposure×genotype interaction effect
40
4.2 Ability of the PBPK estimation models
to retrieve the simulated
parameters in correctly specified genetic models under various data input
scenarios
Estimation results were collectively presented in Table 4.1 in terms of parameter
posterior means (SDs) and Figure 4.2 in terms of posterior sampling histories on
pages 43 to 44, comparing the three data input scenarios of (1) E
i
, G
i
, Y
i
only, (2)
E
i
, G
i
, Y
i
, and P
ik
, and (3) E
i
, G
i
, Y
i
, and M
zj
.
4.2.1 Posterior estimation with exposure, genotype, and disease data only
With genotype, exposure, and disease data input only, the PBPK models seem to
have trouble recovering the parameters at their simulated magnitudes. Given poor
Markov chain starting values, there is tendency for the estimation to end
prematurely in gibberish trap errors, a problem reportedly encountered by many
WINBUGS users for correctly specified models. Nevertheless, sampling history
plots (Figure 4.2a) started at null initial values show trend of posterior
1
s
increasing towards but not converging to their simulated parameter values. There is
also appreciable negative posterior correlation between
s and 6
1
These indicate
non-identifiablity problems that are common to PBPK models, meaning that
various combinatorial parameter sets would fit the data well, especially when the
parameter sampling spaces are rather wide.
41
4.2.2 Posterior estimation with additional measured enzyme activity data
When enzyme activity data P
ik
for all subjects were provided
into the estimation
models, traps still occur given poor starting values, yet illustrated by G
2,
history
plots (Figure 4.2b) before trap clearly show that .
1
for the corresponding gene at
locus k has reached fast convergence to its simulated value, while .
1
for the gene
that acts on the same metabolite as G
k
also show better signs of reaching its
simulated magnitudes and potentially less difficulty in convergence than without
P
ik
. There is no obvious estimation improvement before trap for
s and the
remaining .
1
over without P
ik
input, however.
4.2.3 Posterior estimation with additional measured metabolite data
With metabolite concentration data M
zj
for all subjects
,
the PBPK models are able
to estimate
s, s, and 6
1
with great precision for all. Traps disappear despite very
poor starting values and all parameters converge very nicely to their simulated
magnitudes( Figure 4.2c). Measured metabolite concentrations help improving
convergence and precision due to the particularly intricate balance of exposure and
genotype effect magnitudes necessary to attain a certain metabolite concentration in
the PBPK models. Moreover, parameter identifiability and precision improvement
are independent of whether M
z2
or M
Z3
is fed into the models. Surely, in practice,
stability and the laboratory measurement precisions of the metabolites would also
guide the selection of metabolite to measure.
42
Table 4.1: Posterior estimates comparing data on E
i
, G
i
, Y
i
only with E
i
, G
i
, Y
i
, M
zj
Parameter
Simulated
values
Prior
mean (sd)
Posterior mean (sd),
no metabolite or enzyme
activity data input
Posterior mean (sd),
with M
2
input
@
Posterior mean (sd),
with M
3
input
@
X
E1
^
7 0(10) 7.19 (0.91)
!
7.10 (0.43) 6.98 (0.46)
X
E2
7 0(10) 6.84 (0.86)
!
7.03 (0.39) 7.05 (0.42)
Z
1G1
$
0.50 0 (3.3)
0.58 (0.25)
!
0.53 (0.11) 0.55 (0.13)
Z
1G2
0.50 0 (3.3)
0.43 (0.21)
!
0.51 (0.10) 0.50 (0.11)
Z
1G3
0.50 0 (3.3)
0.39 (0.20)
!
0.48 (0.12) 0.47 (0.11)
]
1
*
0.06 0 (2) 0.069 (0.013) 0.064 (0.006) 0.063 (0.006)
^ Log contributions of exposure Es to total metabolite concentration in ppm
$ Log inter-genotype difference in µ
Vmax
for gene G at locus k
* Log increase in relative risk of disease y, per ppm increase in final metabolite
! With good starting values, otherwise not recoverable
@ Measured with moderate error variances:±1 SD around the model derived values
newalpha1 chains 1:3
iteration
1 200 400 600
5.0
6.0
7.0
8.0
9.0
newalpha1 chains 1:3
iteration
1 500 1000
4.0
6.0
8.0
10.0
newalpha1 chains 1:3
iteration
1 5000 10000
2.0
4.0
6.0
8.0
10.0
Figure 4.2: History plots showing convergence under various data input scenarios
Figure A1. Posterior 5
1
with E,G,Y only
Figure A2. Posterior 5
1
with E,G,Y, and P
G2
Figure A3. Posterior 5
1
with E,G,Y, and M
z3
43
newalpha2 chains 1:3
iteration
1 200 400 600
5.0
6.0
7.0
8.0
9.0
Figure B1. Posterior 5
2
with E,G,Y only
newalpha2 chains 1:3
iteration
1 500 1000
4.0
6.0
8.0
10.0
newalpha2 chains 1:3
iteration
1 5000 10000
2.0
4.0
6.0
8.0
10.0
Figure B2. Posterior 5
2
with E,G,Y, and P
G2
Figure B3. Posterior 5
2
with E,G,Y, and M
z3
a1 chains 1:3
iteration
1 200 400 600
0.0
0.2
0.4
0.6
0.8
Figure C1. Posterior .
1G1
with E,G,Y only
a1 chains 1:3
iteration
1 500 1000
0.0
0.2
0.4
0.6
0.8
a1 chains 1:3
iteration
1 5000 10000
-0.5
0.0
0.5
1.0
1.5
Figure C2.Posterior .
1G1
with E,G,Y,and P
G2
Figure C3. Posterior .
1G1
with E,G,Y, M
z3
Figure 4.2: History plots showing convergence under various data input scenarios
(continued)
44
b1 chains 1:3
iteration
1 200 400 600
-0.4
-0.2
0.0
0.2
Figure D1. Posterior .
1G2
with E,G,Y only
b1 chains 1:3
iteration
1 500 1000
-0.25
0.0
0.25
0.5
0.75
b1 chains 1:3
iteration
1 5000 10000
-0.5
0.0
0.5
1.0
1.5
Figure D2.Posterior .
1G2
with E,G,Y,and P
G2
Figure D3. Posterior .
1G2
with E,G,Y, M
z3
c1 chains 1:3
iteration
1 200 400 600
-0.4
-0.2
0.0
0.2
0.4
Figure E1. Posterior .
1G3
with E,G,Y only
c1 chains 1:3
iteration
1 500 1000
-0.25
0.0
0.25
0.5
0.75
c1 chains 1:3
iteration
1 5000 10000
-0.5
0.0
0.5
1.0
1.5
Figure E2. Posterior .
1G3
with E,G,Y,and P
G2
Figure E3. Posterior .
1G3
with E,G,Y, M
z3
Figure 4.2: History plots showing convergence under various data input scenarios
45
(Figure 4.2, continued)
gamma1 chains 1:3
iteration
1 200 400 600
-0.1
0.0
0.1
0.2
0.3
Figure F1. Posterior 6
1
with E,G,Y only
gamma1 chains 1:3
iteration
1 500 1000
-0.1
0.0
0.1
0.2
0.3
gamma1 chains 1:3
iteration
1 5000 10000
-0.05
0.0
0.05
0.1
Figure F2. Posterior 6
1
with E,G,Y, and P
G2
Figure F3. Posterior 6
1
with E,G,Y, and M
z3
Figure 4.2: History plots showing convergence under various data input scenarios
(continued)
4.3 Effect of completeness of metabolite concentration and enzyme activity
data on estimation precision
Tables 4.2 compares the estimation precision between completely and partially
measured metabolite concentrations, considering the potential difficulty in
collecting such measurements on all the study subjects. Illustrated by M
Z3,
results
show that measured metabolite concentration on 10% of cases, or controls, or both
yield reasonably precise posterior estimate for
s,
1
s, and 6
1,
with the best
precision achieved by
metabolite measured on both cases and controls. Despite
trap error messages, measured enzyme activity levels on 10% cases and controls
46
lead to converged and precise .
1
estimate for the corresponding
gene at locus k.
Measured enzyme activity levels on 10% cases only and controls only also show
promise of leading to better .
1
estimate for G
k
than without such input at all. (
Figure 4.3).
Table 4.2: Completeness of metabolite concentration data
@
on estimation precision
Parameters :
E1
^
:
E2
<
1G1
$
<
1G2
<
1G3
=
1
*
Simulated Values 7.0 7.0 0.5 0.5 0.5 0.06
Prior Mean(SD) 0(10) 0(10) 0 (3.3) 0 (3.3) 0 (3.3) 0(1.4)
Posterior
!
Mean(SD)
50 Cases Only
6.96
(0.61)
7.07
(0.53)
0.75
(0.33)
0.77
(0.37)
0.48
(0.27)
0.062
(0.010)
Posterior
!
Mean(SD)
50 Controls Only
7.08
(0.63)
6.88
(0.56)
0.56
(0.36)
0.69
(0.39)
0.38
(0.39)
0.045
(0.019)
Posterior
!
Mean(SD)
50 Cases & 50 Controls
7.11
(0.55)
7.18
(0.50)
0.65
(0.24)
0.59
(0.20)
0.46
(0.19)
0.052
(0.008)
Posterior
!
Mean(SD)
500Cases&500Controls
7.07
(0.52)
7.06
(0.49)
0.57
(0.19)
0.55
(0.14)
0.60
(0.16)
0.057
(0.007)
! Converged posteriors from three chains started from poor to good initial values
^ Log contributions of exposure Es to total metabolite concentration in ppm
$ Log inter-genotype difference in µ
Vmax
for gene G at locus k
* Log increase in relative risk of disease y, per ppm increase in final metabolite
@ Measured with moderate error variances( ±1 SD around the model derived values)
b1 chains 1:3
iteration
1 250 500 750
-0.4
-0.2
0.0
0.2
0.4
0.6
Figure A. Posterior .
1G2
with E,G,Y, for all and P
G2
for 50 cases only
Figure 4.3: History plots comparing partial with complete P
G2
input on estimation
47
b1 chains 1:3
iteration
1 500 1000
-0.25
0.0
0.25
0.5
0.75
Figure B. Posterior .
1G2
with E,G,Y, for all and P
G2
for 50 controls only
b1 chains 1:3
iteration
1 1000 2000
-0.5
0.0
0.5
1.0
Figure C. Posterior .
1G2
with E,G,Y, for all and P
G2
for 50 cases and controls
b1 chains 1:3
iteration
1 500 1000
-0.25
0.0
0.25
0.5
0.75
Figure D. Posterior .
1G2
with E,G,Y, for all and P
G2
for all cases and controls
Figure 4.3: History plots comparing partial with complete P
G2
input on estimation
(continued)
48
4.4 Effect of different measurement error variances in metabolite and enzyme
activity data on estimation precision
Figure 4.4 shows the effect of different measurement error variances in metabolite
concentration and enzyme activity data (with good Markov chain starting values)
on the estimation precision. Low, moderate, high measurement errors represent
metabolite or enzyme activities generated within 1.5, 1.0, and 0.5 standard
deviations of their respective calculated values from the PBPK simulation models
under Michaelis-Menten kinetics. Understandably, estimation precision for
s,
1
s,
and
1
, are at their highest when metabolite or enzyme activity are measured with
the lowest errors, yet even the worst studied measurement precision yield much
greater estimation precision over no such measurement at all.
0
20
40
60
80
100
120
140
160
1 23 456
Parameters of interests
Bayes Factors
1E 1
1E 2
1G 1
1G 2
1G 3
1
Common Simulation and Estimation M-M kinetics
No M
Zj
or P
Gk
input
Low, moderate, high
precision M
Z2
input
Low, moderate, high
precision P
G2
input
Figure 4.4: Parameter estimation precision with respect to different metabolite or
enzyme activity measurement error variances
49
4.5 Effect of V
max
between-subject variability on estimation precision
Metabolic rates may differ not only between genotypes but also between subjects
of the same genotype, as each individual is unique. Illustrated by gene at locus 2,
Figure 4.5 shows no appreciable decrease in BF( hence estimation precision) for .
1
of
gene at locus k despite its increased intra-genotype variability, when measured
enzyme activity data P
ik
is provided to estimation models. Illustrated by M
z2,
when
measured metabolite concentrations are incorporated
,
estimation precision for
1
s of
all three genes stay high regardless of their simulated intra-genotype V
max
variability
.
This can be explained by enzyme activities and metabolite
concentrations data being obtained as available on a subject-specific level that
already contain the latent individual specific V
max
information in them. Between-
individual differences in metabolic rates are therefore properly accounted for in
posterior estimation when measured enzyme activity or metabolite concentrations
are fitted into the models.
50
0
20
40
60
80
100
120
140
12 3
Parameters of interests
Bayes Factors
1G 1
1G 2
1G 3
Common Simulation and Estimation M-M kinetics
Green bar: No M
Zj
or P
Gk
input; Pink bar: Moderate error P
G2
input; Blue bar: Moderate error M
z2
input
From left to right: low, medium, high simulated intra-genotype V
max
variability
Figure 4.5: Comparison of parameter estimation precision at varying magnitudes
of simulated intra-genotype between subject metabolic rate variance
4.6 Effect of mis-specifying reaction kinetics on estimation precision
It is worth exploring whether linear reaction kinetics would alleviate some of the
computational load without sacrificing much estimation precision when the
pathway structure is complex. Figure 4.6 illustrates the differences in estimation
precision within PBPK hierarchical modeling between Michaelis-Menten and
linear kinetics with or without metabolite M
i2
or proteomic P
G2
data, both of which
were measured at moderate error variances. Posterior estimates for all the
parameters show evidence (20< BF< 150) around their simulated values under
either linear or Michaelis-Menten kinetics. Estimation precision is higher using
Michaelis-Menten than linear kinetics at the same data input scenario, since
51
Michaelis-Menten kinetics is the true enzyme reaction kinetics under which the
data were simulated. Consistent with previously described results in Section 4.3,
measured metabolite M
i2
with the lowest errors within Michaelis-Menten kinetics
models yield the highest estimation precision among all estimation scenarios.
Perhaps the most important reason why linear kinetics seem to perform rather well
is that in order to mimic real epidemiological datasets, a majority of the simulated
population has total exposure intakes Z
it
that are within the proximity of the linear
range of the reaction kinetics. The precision difference between kinetics will
probably be higher for a population with considerable number of individuals with
exposure intakes at the non-linear range of reaction kinetics.
0
20
40
60
80
100
120
140
160
Parameters of interests
Bayes Factors
Common simulation M-M yet different estimation kinetics
1E 2
1E 1
1G 1
1G 2
1G 3
1
LN .vs. M-M, No
M
Zj
or P
Gk
input
LN .vs. M-M, moderate
precision P
G2
input
LN .vs. M-M, moderate
precision M
Z2
input
Figure 4.6: Comparison of parameter estimation precision between Michaelis –
Menten and linear kinetics
52
4.7 Comparison of power of PBPK modeling with that of logistic regression
Because the PBPK simulation and estimation framework was aimed at better
detecting genetic and exposure effect, it is only natural to ask how it compares to
the established frequentist techniques such as logistic regression in terms of power
for the same purpose. Figure 4.7 compares the power between the two approaches
under various data input scenarios. Because PBPK modeling and logistic regression
estimate genetic and exposure effects in different ways, it is difficult to define
power in a comparable fashion for the two. Therefore, for logistic regression,
power is defined as the probability of the Wald test for a particular effect, whether
main or interaction effects being statistically significant under the simulated
alternatives, controlling the overall type I error rate for such effect at 0.05. For
PBPK hierarchical models, power is defined as the probability that posterior 95%
credibility intervals for 5
1
s and .
1
s exclude the null effect of zero under the
simulated alternatives, controlling type I error rate for each effect also at 0.05.
53
0
0.2
0.4
0.6
0.8
1
1.2
Effects
Power
*
Under null ( adjusted overall effect Type I error rate )
PBPK-M-M, no metabolomics or proteomics input
PBPK-M-M, with moderate precision metabolomics Mz2 input
Logistic regression
N =1000
E
1
E
2
G
1
G
2
G
3
* Power LR = Pr (Pwald<0.05|HA), Power PBPK =Pr (95% Post.Par CIs not inlude 0 |HA), where HA=PBPK models
with simulated non-zero effects
0.05
Figure 4.7: Comparison of power between hierarchical PBPK modeling and
logistic regression
As expected from the improvement in estimation precision, the most noticeable
power increase is observed when measured enzyme activity or metabolite data were
input into estimation. Reasonably accurate enzyme activity measurement for a gene
in estimation raises the power to detect its effect to 100%. Measured metabolite
concentrations at a comparable precision boost the power to 100% for detecting all
simulated effects. Enzyme activities or metabolite concentrations are not used in
logistic regression models, but could be most beneficial in terms of both precision
and power in PBPK models.
54
Moreover, the interactions shown to be present in metabolic pathways( Figure 4.1)
could be of particular importance in epidemiology, because they could mean how
an individual’s dietary habit or lifestyle may help modify his(her) otherwise
genetically pre-disposed disease risk. PBPK models naturally integrate G×E and
G×G interactions together with Michaelis-Menten kinetics
) , ( ) (
) , ( ) (
max
G E Z G K
G E Z G V
m
+
×
. The
author would therefore contend that the fairest power comparison between the two
would have the logistic regression models including G× E and G×G interaction
terms, in which case the PBPK models are superior in terms of power for the
scenarios studied.
4.8 Prior sensitivity
Finally, a legitimate question for this Bayesian type of analysis is the sensitivity of
the posterior parameter estimates with respect to the choice of priors. Because
priors for all the important effect parameters are uninformative (means centered at
zero with a large standard deviations, previously described results imply that the
data, rather than the priors are driving the estimates and the posteriors are
insensitive to prior specifications. Table 4.3 show that the models are indeed robust
with respect to mis-specified prior means, when metabolite measurements are
available. Surely, prior mis-specification is of much broader scope but the most
meaningful sensitivity analysis is probably to be made in a real pharmacokinetic
55
setting in terms of how sensitive the posteriors are with respect to the strength of
prior physiological knowledge applied to the models.
Table 4.3: Posterior estimates with respect to mis-specified prior means with
E
i
, G
i
, Y
i
, and M
zj
data
input
Parameter
Simulated
values
Prior
mean A(sd)
Posterior
mean A (sd)
Prior
mean B (sd)
Posterior
mean B (sd)
X
E1
^
7 0(10) 6.98 (0.46) 0(10) 7.06 (0.49)
X
E2
7 0(10) 7.05 (0.42) 0(10) 7.15 (0.44)
Z
1G1
$
0.50 0 (3.3)
0.55 (0.13)
-0.5(3.3)
0.58 (0.14)
Z
1G2
0.50 0 (3.3)
0.50 (0.11)
-0.5 (3.3)
0.54(0.13)
Z
1G3
0.50 0 (3.3)
0.47 (0.11)
-0.5 (3.3)
0.56(0.16)
]
1
*
0.06 0 (2) 0.063 (0.006) 0 (2) 0.060 (0.006)
4.9 Summary
The results are summarized below for the physiologically based pharmacokinetic
Bayesian hierarchical simulation and estimation framework under a postulated
metabolic pathway and topology, with complete genetic, exposure, disease
information for all individuals.
1. Gene-exposure interactions in terms of heterogeneous genotype-specific
RRs among different exposure groups are elicited by the simulated
metabolic pathway model, and such interactions need to be taken into
account in statistical modeling.
56
2. The hierarchical PBPK models allow flexibility in terms of data input and
provide a convenient way to incorporate intermediate measurements such as
enzyme activities and metabolite concentrations into effect estimation.
3. Measured metabolite concentrations, even if only partially complete, help to
recover 5s, .
1
s, and 6
1
with good precision, which is also independent of
which metabolite is measured and the magnitude of intra-genotype
variances in V
max
.
4. Measured enzyme activity data, whether complete or partial, help to
estimate .
1
for the encoding gene with great precision, which is also
independent of intra-genotype variances in V
max
.
5. The hierarchical PBPK models demonstrate higher power for detecting the
joint genetic and exposure effects than logistic regression.
6. The hierarchical models are robust with respect to mis-specified prior
means, given metabolite input as data rather than priors are driving the
posterior estimates.
7. If metabolite concentration measurements are available, hierarchical PBPK
modeling could serve as a viable tool to quantify the genetic effects within a
metabolic network that is clearly defined.
8. Although prematurely stopped by traps, a problem reported by many
WINBUGS users for correctly specified models, the PBPK models show
signs of parameter identifiablity problems and difficulty in convergence,
57
when provided with only genotype, exposure, and disease data and all prior
distributions for the parameters of interests are uninformative.
58
CHAPTER 5
DISCUSSION
This work represents an exploratory effort into whether incorporating metabolite
concentration or enzyme activity data into physiologically-based pharmacokinetic
(PBPK) models improves the estimate of joint effects of genes and exposures on
disease risks within a specified metabolic pathway and topology via a Bayesian
approach. Metabolite concentration or enzyme activity data, whether partial or
complete, cannot be readily incorporate into logistic regression models but are
shown to greatly help estimate all the simulated effects and the corresponding
genetic effects with notable precisions in PBPK models. The PBPK models also
demonstrate robustness with respect to mis-specified prior means, as well as higher
power than logistic regression in detecting the joint effects of genes and exposures
when the same data are available. All results presented, however, are necessarily
pathway topology and metabolic model dependent and need to be considered
within this context. Before the method can be put into practical use, the modeling
framework needs to take into account several issues discussed below.
Probably the most crucial issue that the models have yet to address is the
uncertainty about the topology between exposures, genes, and disease. In most
circumstances, we do not have complete confidence in the topology of the
metabolic network, so this thesis represents a necessarily idealized situation where
59
we assume clear knowledge of the topology and metabolic steps involved. At
present, the modeling platform presented is most directly applicable to
characterizing the effects in a segment of a metabolic network where the topology
is well defined. Good definition of topology and pathways will sooner or later
become possible, however, thanks to the increasing collaborative effort among
molecular biologist, analytical chemists, statisticians, and bioengineers. Bayesian
networks, for instance, seem to show promise in topology identification (43).
Even for a clearly defined topology, there are several important features that the
models have yet to address. In the context of study design, “reverse causation”, the
dependence of measured metabolite or enzyme activities on disease status in a case-
control study, stands out. The diseased status and possible medications that alter
physiology and body chemistry may cause such measurements for the cases to
differ from their pre-diagnostic values. The framework presented therefore is
suitable for a cohort type of study design where reverse causation is not an issue.
Multiple longitudinal metabolite or enzyme activity can also be obtained for the
same study subjects in a cohort study, however, thereby improving the modeling
further.
Well-known biological features that are not yet recognized in the current models
are those of DNA repair mechanisms (44) that are important for diseases like
cancer, as well as the inducibility of gene expression by exposure (45). DNA repair
60
follows its own pathways and may entail a very different type of model to describe
it. Exposure-induced gene expression will render a subject’s metabolic rates
dependent not only on genotypes but also exposure status.
Furthermore, the PBPK model used in this work is a simplified single
compartmental one. In reality, multi-compartmental PBPK models are often used
to describe not only metabolism, but the distribution of metabolites in the different
bodily compartments, such as the liver and fat (15). Multi-compartmental PBPK
models may serve better than a single compartmental one in modeling diseases
occurring predominantly in one particular location of the body, colorectal polyps,
for instance. One other limitation associated with using PBPK models in
epidemiology is the lack of intuitively clear result interpretation to epidemiologists,
who are probably most comfortable with risk estimators such as odds ratios. It
takes quite some manipulation to translate the between-genotypic difference in
rates in PBPK models to odds ratios, and such translation necessarily involves
exposures because PBPK models integrate genes and exposures altogether. In
contrast, standard logistic regression readily outputs the effects in terms of odds
ratios with respect to either exposure alone, genotypes alone, or both. Hierarchical
approaches with standard logistic regression as the first and second-level models
utilizing prior covariates with PBPK interpretations such as V
max
/K
m,
could
circumvent interpretation problems.
61
Probably the most serious statistical concern with increasing modeling complexity
to address various aforementioned features could be parameter identifiablity issues.
Although this work finds some of the effect parameters potentially un-identifiable
given exposure, genotype, and disease data only, there is hope for them to become
identifiable if metabolite or enzyme activity measurements are available, or by
applying informative priors on physiological parameters for which a priori strong
biochemical knowledge is present (not investigated in this work because symbolic
rather than actual genes and exposures are used).
Nevertheless, the author would like to stress that epidemiology needs to consider
and exploit the rich information content contained in metabolite or enzyme activity
measurement, much of which is not only disease relevant but could potentially be
incorporated into an appropriate Bayesian estimation framework as prior
knowledge. PBPK models provide an ideal way to extract the information content
in metabolite or enzyme activity data and utilize a priori physiological knowledge
on parameters where the essential genotypic information is contained. The gain in
statistical power and estimation precision without increasing sample size, when
incorporating even partially complete metabolite or enzyme activity information
into a PBPK model for a defined topology, greatly outweighs the increase in
modeling complexity and computational load. We expect our PBPK and Bayesian
modeling framework to be of great utility for characterizing genetic and exposure
effects in a metabolic network where the topology is clearly defined.
62
REFERENCES
1. Andersen ME, Clewell III HJ, Gargas ML, Smith FA, Reitz
RH. Physiologically based pharmacokinetics and the risk
assessment process for methylene chloride. Toxicol Appl
Pharmacol 1987;87:185-205.
2. Ballard P, Leahy DE, Rowland M. Prediction of in vivo tissue
distribution from in vitro data 1. Experiments with markers of
aqueous spaces. Pharm Res 2000;17:660-663.
3. Bass L, Robinson P, Bracken AJ. Hepatic elimination of
flowing substrates: the distributed model. J Theor Biol
1978;72:161-184.
4. Bernillon P, Bois FY. Statistical issues in toxicokinetic
modeling: A Bayesian perspective. Environ Health Persp
Suppl 2000;108:S883-S893.
5. Berry DA, Stangl DK. Bayesian methods in health-related
research. In: Berry DA, Stangl DK, ed. Bayesian biostatistics.
ed. New York: Marcel Dekker, 1996: Statistics: Textbooks and
monographs; vol 151.
6. Bois FY, Gelman A, Jiang J, Maszle DR, Zeise L, Alexeef G.
Population toxicokinetics of tetrachloroethylene. Arch Toxicol
1996;70:347-355.
7. Bois FY, Jackson ET, Pekari K, Smith MT. Population
toxicokinetics of benzene. Environ Health Persp
1996;104:1405-1411.
8. Bois FY, Maszle DR. MCSim: A Monte Carlo Simulation
Program. J Stat Software [serial on the internet] 1997;2:1-60.
9. Bois FY. Analysis of PBPK models for risk characterization.
Annals of the New York Academy of Sciences 1999;895:317-
337.
10. Bois FY. Statistical Analysis of Clewell et al. PBPK Model of
Trichloroethylene Kinetics. Environ Health Persp Suppl
2000;108:307-316.
63
11. Bois FY. Statistical Analysis of Fisher et al. PBPK Model of
Trichloroethylene Kinetics. Environ Health Persp Suppl
2000;108:275-282.
12. Brown D, Gorin M, Weeks D. Efficient strategies for genomic
searching using the affected-pedigree-member method of
linkage analysis. American Journal of Human Genetics
1994;54:544-552.
13. Brown RP, Delp MD, Lindstedt SL, Rhomberg LR, Beliles
RP. Physiological parameter values for physiologically based
pharmacokinetic models. Tox Ind Health 1997;13:407-484.
14. Carlsson A. Exposure to toluene: Uptake, distribution and
elimination in man. Scand J Work Environ Health 1982;8:43-
55.
15. Conti DV et.al., Bayesian modeling of complex metabolic
pathways, Human heredity, 2003; 56: 83-93.
16. Cortessis V, Thomas DC: Toxicokinetic genetics: An approach
to gene-environment and gene-gene interactions in complex
metabolic pathways, IARC Sci Publ. 2004;157:127-50.
17. Droz PO, Wu MM, Cumberland WG. Variability in biological
monitoring of solvent exposure. II Application of a population
physiological model. Br J Ind Med 1989;46:547-558.
18. Elston R, Guo X, Williams L. Two-stage global search designs
for linkage analysis using pairs of affected relatives. Genet
Epidemiology 1996;13:535-558.
19. Fiserova-Bergerova V. Extrapolation of phsyiological
parameters for physiologically based simulation models. Tox
Lett 1995;79:77-86.
20. Garcia-Closas M, Wacholder S, Caporaso N, Rothman N.
Inference issues in epidemiological studies of genetic effects
and gene-environment interactions. In: Khoury MJ, Little J,
Burke W, eds. Human Genome Epidemiology: Scientific basis
for using genetic information to improve health and prevent
disease, 2002: Chapter 7.
64
21. Genome-wide pathway analysis and visualization of using
gene expression data, http://psb.stanford.edu/psb-
online/proceedings/psb02/kurhekar.pdf.
22. Goldstein AM, Andrieu N., Detection of interaction involving
identified genes: available study designs, J. National Cancer
Inst. Monographs 1999; 26:49-54.
23. Goldstein AM, Falk RT, Korczak JF. Detecting gene-
environmental interactions using a case-control design,
Genetic Epi. 14(1997) 1085-1089.
24. Karp P.D., Paley S., Romero P. The Pathway tools software,
Bioinformatics, 2002; 18 Suppl 1:S225-32.
25. Lipscomb JC, Garrett CM, Snawder JE. Cytochrome P450-
dependent metabolism of trichloroethylene: Interindividual
differences in humans. Toxicol Appl Pharmacol 1997;142:311-
318.
26. Ludden TM, Allerheiligen SRB, Burk RF. Application of
population analysis to physiological pharmacokinetics. J
Pharmacokin Biopharm 1991;19:101S-113S.
27. Lunn DJ, Aarons L. Markov chain Monte Carlo techniques for
studying interoccasion variability: Application to
pharmacokinetic data. Appl Statist 1998;46:73-91.
28. M. Garcia-Closas, N. Rothman, and J. Lubin
Misclassification in Case-Control Studies of Gene-
Environment Interactions: Assessment of Bias and Sample
Size, Cancer Epidemiol. Biomarkers Prev., December 1, 1999;
8(12): 1043 - 1050.
29. Matthew J. Bartosiewicz, David Jenkins, Sharron Penn,
Jennifer Emery and Alan Buckpitt. Gene Expression Patterns
in Liver and Kidney Associated with Exposure to Chemical
Toxicants. Pharmacology and Experimental Therapeutics,
Vol. 297, Issue 3, 895-905.
30. N. Andrieu, A. M. Goldstein, Epidemiological and genetic
approaches in the study of gene-environment interaction: an
overview of available methods, Epidemiological reviews, Vol.
20, N.O. 2, 137-147.
65
31. N. Rothman, M. Garcia-Closas, W.T. Stewart, Differential
Misclassification and the Assessment of Gene-Environment
Interactions in Case-Control Studies, American Journal of
Epidemiology Vol. 147, No. 5: 426-433.
32. Poulin P, Theil F-P. A priori prediction of of tissue:plasma
partition coefficients of drugs to facilitate the use of
physiologically-based pharmacokinetic models in drug
discovery. J Pharm Sci 2000;89:16-35.
33. Racine-Poon A, Wakefield J. Statistical methods for
population pharmacokinetic modelling. Statist Method Med
Res 1998;63-84.
34. Racine-Poon A. Population models. In: Berry DA, ed.
Statistical Methodology in the Pharmaceutical Sciences. ed.
New York: Marcel Dekker, Inc., 1990: 139-162.
35. Sargent DJ, Hodges JS, Carlin BD. Structured Monte Carlo
Markov chains. J Comput Graph Stat 2000;9:217-234.
36. Sheiner LB. The population approach to pharmacokinetic data
analysis: Rationale and standard data analysis methods. Drug
Metab Rev 1984;15:153-171.
37. Slob W, Janssen PHM, van den Hof JM. Structural
identifiability of PBPK models: Practical consequences for
modeling strategies and study designs. Crit Rev Toxicol
1997;27:261-272.
38. Stephens J, Briscoe D, O'Brien S. Mapping by admixture
linkage disequilibrium in human populations: limits and
guidelines. Am J Hum Genet 1994;55:809-824.
39. Stuart JA, Karahalil B, Hogue BA, Souza-Pinto NC, Bohr
VA.. Mitochondrial and nuclear DNA base excision repair are
affected differently by caloric restriction. FASEB J 18(3):595–
97.
66
40. Thomas A, Spiegelhalter DJ, Gilks WR. BUGS: a program to
perform Bayesian inference using Gibbs sampling. In:
Bernardo JM, Berger JO, Dawid AP, Smith AFM, ed.
Bayesian Statistics Oxford: Oxford University Press, 1992:
837-842.
41. Thomas RS, Lytle WE, Keefe TJ, Constan AA, Yang RSH.
Incorporating Monte Carlo simulation into physiologically
based pharmacokinetic models using Advanced continuous
simulation language (ASCL): A computational method.
Fundam Appl Toxicol 1996;31:19.
42. Umbach D.M., Weinberg C.R., The Use of Case-Parent Triads
to Study Joint Effects of Genotype and Exposure, Am. J. Hum.
Genet. 2000; 66:251–261.
43. Wakefield J, Walker S. Bayesian nonparametric population
models: Formulation and comparison with likelihood
approaches. J Pharmacokin Biopharm 1997;25:235-252.
44. Wakefield JC, Smith AFM. Bayesian analysis of linear and
non-linear population models by using the Gibbs sampler.
Appl Stat 1994;43:201-221.
45. Zheng Li, Christina Chan. Inferring pathways and networks
within a Bayesian framework. The FASEB Journal, 2004;
18:746-48.
Abstract (if available)
Abstract
Metabolic pathways are candidates for examining gene-exposure interaction effects on complex diseases. We present a hierarchical modeling platform integrating physiologically based pharmocokinetic models and Bayesian approaches to characterize gene-exposure interaction effects in a metabolic pathway. We simulate datasets under a known metabolic model. Individual-specific metabolic rates are described by Michaelis-Menten parameters Vmax and Km, regressed on genotypes at the relevant loci. Intermediate measurements such as enzyme activities and metabolite concentrations are incorporated as flawed measurements of an individual's long-term metabolic rates and metabolite concentrations. Parameter posteriors are estimated using Markov Chain Monte Carlo methods. The platform allows investigation posterior estimation precision relative to the simulated effect magnitude and appears to have great utility for characterizing etiologic genetic and exposure effects and dissecting complex pathways.
Linked assets
University of Southern California Dissertations and Theses
Conceptually similar
PDF
In-silico physiological based pharmacokinetic modeling of prodrugs
PDF
Associations between perfluoroalkyl substances exposure and metabolic pathways in youth
PDF
Bayesian hierarchical models in genetic association studies
PDF
Adaptive set-based tests for pathway analysis
PDF
Observed and underlying associations in nicotine dependence
PDF
Investigating a physiological pathway for the effect of guided imagery on insulin resistance
Asset Metadata
Creator
Du, Lingyun (author)
Core Title
A hierarchical physiologically-based pharmacokinetic modeling platform for genetic and exposure effects in metabolic pathways
School
Keck School of Medicine
Degree
Master of Science
Degree Program
Biostatistics
Publication Date
07/13/2007
Defense Date
01/16/2007
Publisher
University of Southern California
(original),
University of Southern California. Libraries
(digital)
Tag
Bayesian approaches,enzyme activities,exposure,genotype,hierarchical modeling,intermediate measurements,Markov chain Monte Carlo,metabolic pathways,metabolite concentrations,Michaelis-Menten,OAI-PMH Harvest,physiologically based pharmacokinetic models
Language
English
Advisor
Thomas, Duncan (
committee chair
), Conti, David V. (
committee member
), Watanabe, Richard M. (
committee member
)
Creator Email
hellofromku@yahoo.com
Permanent Link (DOI)
https://doi.org/10.25549/usctheses-m618
Unique identifier
UC1426785
Identifier
etd-Du-20070713 (filename),usctheses-m40 (legacy collection record id),usctheses-c127-516458 (legacy record id),usctheses-m618 (legacy record id)
Legacy Identifier
etd-Du-20070713.pdf
Dmrecord
516458
Document Type
Thesis
Rights
Du, Lingyun
Type
texts
Source
University of Southern California
(contributing entity),
University of Southern California Dissertations and Theses
(collection)
Repository Name
Libraries, University of Southern California
Repository Location
Los Angeles, California
Repository Email
cisadmin@lib.usc.edu
Tags
Bayesian approaches
enzyme activities
exposure
genotype
hierarchical modeling
intermediate measurements
Markov chain Monte Carlo
metabolic pathways
metabolite concentrations
Michaelis-Menten
physiologically based pharmacokinetic models