NEW APPROACHES USING PROBABILISTIC GRAPHICAL MODELS IN HEALTH
ECONOMICS AND OUTCOMES RESEARCH
by
Quang Anh Le
________________________________________________________________________
A Dissertation Presented to the
FACULTY OF THE USC GRADUATE SCHOOL
UNIVERSITY OF SOUTHERN CALIFORNIA
In Partial Fulfillment of the
Requirements for the Degree
DOCTOR OF PHILOSOPHY
(PHARMACEUTICAL ECONOMICS AND POLICY)
December 2010
Copyright 2010 Quang Anh Le
DEDICATION
To our dearest little angel Ellie (06/16/2008 – 07/23/2009);
who was the most beautiful baby we had ever seen;
who brought us the greatest of love, joy, and happiness;
who taught us the deepest love that we never knew could exist;
who made us the most proud parents that we could ever wish for;
whose big and bright eyes were so dazzling;
whose laughter was always filled with such pure happiness;
whose personality was so amazing that everyone who met Ellie fell in love with her instantly;
whose life was like a shooting star: so brief, so bright, so beautiful!
We miss you, we love you, and you will live in our hearts forever, Ellie dear!
To our little princess Allison, you are the reason and strength for us to continue living our
lives without your sister Ellie.
To my wife, Quynh-Nhu, for her endless sacrifice, love, support, and encouragement.
To my parents for sacrificing their lives to support us, shaping all aspects of my life, and
making me a much better person.
To my brothers, Quan, Huy, and Minh, for their support, encouragement, intelligent
discussions, and advice.
To my nieces and nephews, Vivi, Kevin, Elaine, Aurea, Kathy, Tina, and Karl, for their
adorable personalities. And to Beanie, our loving dog of 12 years (1999 – 2010).
ACKNOWLEDGEMENTS
My sincere thanks to my advisor and dissertation chair, Dr. Jason N. Doctor, for
his invaluable advice and continuous support academically as well as personally. Dr.
Doctor introduced me to the novel idea that behavioral economics and the "Bayesian
Network" method can be successfully applied in health economics and outcomes
research, which subsequently resulted in several important publications in major medical
journals. I am very fortunate to have him as my mentor. Thank you, Dr. Doctor!
I would like to sincerely thank Dr. Joel W. Hay for his wisdom and guidance
throughout my graduate study at USC. Dr. Hay not only taught me cost-effectiveness and
econometric methods but also offered his insightful and important advice which helped
me to confirm my decision to work in academia. Thank you, Dr. Hay!
I am also very grateful to my committee, Drs. Michael B. Nichol, Kathleen A.
Johnson, and Michael Cousineau, for their invaluable comments, time, and expertise, as
well as helping me to finish my dissertation in time. Thanks so much, Drs. Nichol,
Johnson, and Cousineau!
A special thanks to Dr. Greg Strylewicz for introducing and helping me with the
R statistical language. His exceptional programming skill is amazing. Thanks Greg!
Certainly, the overall studying experience at USC would have been less enjoyable
and meaningful without my friends and classmates. Thanks Hoa Lu, Loan, Thao, Tai,
Nhan, Dang, Khoa, Vinh, Uyen, Huy, Jimmy, Natalie, Joseph, Jaejin, Jane, Janet, Jenny,
Sara, Flavia, Marcio, Vaidy, and Adam!
TABLE OF CONTENTS

DEDICATION
ACKNOWLEDGEMENTS
LIST OF TABLES
LIST OF FIGURES
ABSTRACT

CHAPTER 1: INTRODUCTION
    Uncertainty in Healthcare
    Bayesian Network
    Discrete-State Markov Model
    Purposes of the Study

CHAPTER 2: COST-EFFECTIVENESS ANALYSIS OF LAPATINIB IN HER2-POSITIVE
ADVANCED BREAST CANCER: AN APPLICATION OF MARKOV MODEL WITH
MONTE-CARLO SIMULATION
    Chapter-Two Abstract
    Chapter-Two Background
    Chapter-Two Methodology
        The Economic Model
        Patient Population
        Healthcare Resource Costs
        Health-state Utilities
        Major Assumptions
        Sensitivity Analyses
    Chapter-Two Results
    Chapter-Two Sensitivity Analyses
    Chapter-Two Discussion
    Chapter-Two Conclusion
    Chapter-Two References

CHAPTER 3: DETECTING BLOOD LABORATORY ERRORS USING A BAYESIAN
NETWORK: AN EVALUATION ON LIVER ENZYME TESTS
    Chapter-Three Abstract
    Chapter-Three Background
        Qualitative Types of Laboratory Error
    Chapter-Three Methodology
        Data Source
        The Bayesian Network
        LabRespond™ - A Benchmark Method
        Logistic Regression Model
        Experiment 1 (Synthetic Systematic Errors)
        Experiment 2 (Synthetic Random Errors)
        Data and Statistical Analysis
    Chapter-Three Results
        Experiment 1
        Experiment 2
    Chapter-Three Sensitivity Analyses
    Chapter-Three Discussion
    Chapter-Three References

CHAPTER 4: PROBABILISTIC MAPPING OF DESCRIPTIVE RESPONSES IN HEALTH
STATUS ONTO HEALTH-STATE UTILITIES USING BAYESIAN NETWORK: AN
EMPIRICAL ANALYSIS CONVERTING SF-12 INTO EQ-5D UTILITY INDEX IN A
NATIONAL U.S. SAMPLE
    Chapter-Four Abstract
    Chapter-Four Background
        Bayesian Network
    Chapter-Four Methodology
        Data Source
        Health-Related Quality of Life (HRQOL) Instruments
        The Bayesian Networks (Probabilistic Mapping)
            Monte-Carlo Simulation Method
            Expected-Utility Method
            Most-Likely Probability Method
        Current Mapping Approaches using Econometric Methods
            Ordinary Least Squares (OLS) Model
            Censored Least Absolute Deviation (CLAD) Model
            Multinomial Logistic Regression (MNL) Model
        Data and Statistical Analysis
    Chapter-Four Results
    Chapter-Four Discussion
    Chapter-Four References

CHAPTER 5: CONCLUSIONS

BIBLIOGRAPHY

APPENDIX A
APPENDIX B
APPENDIX C
LIST OF TABLES

Table 1.1: Results, on average, after 3 cycles (years)

Table 2.1: Markov model transition probabilities in each 6-week cycle

Table 2.2: Main model parameters: base-case values, ranges, and assumed distributions

Table 2.3: Summary of cost and outcome results in Monte-Carlo simulations (N = 20,000 runs)

Table 3.1: Classification accuracy for systematic errors

Table 3.2: Median AUCs and standard errors (SE), sensitivity (%), and accuracy (%) of the predictive models in 10-fold cross validation processes (n = 5,800) in detecting large, medium, and small systematic errors

Table 3.3: Classification accuracy for random errors

Table 3.4: Median AUCs and standard errors (SE), sensitivity (%), and accuracy (%) of the predictive models in 10-fold cross validation processes (n = 5,800) in detecting large, medium, and small random errors

Table 3.5: Median AUCs and standard errors (SE) of the predictive models in 10-fold cross validation processes (n = 5,800) in detecting errors at different error distributions

Table 4.1: Socio-demographic characteristics and health states of the modeling and validation samples using 2003 Medical Expenditure Panel Survey (MEPS) data

Table 4.2: MSE, MAE, and the EQ-5D utility mean scores with absolute differences between the observed and predicted EQ-5D utility mean scores of prediction models in the overall sample, age group, and number of chronic conditions for the validation sample (n = 9,839) using U.K. scoring system

Table 4.3: MSE, MAE, and the EQ-5D utility mean scores with absolute differences between the observed and predicted EQ-5D utility mean scores of prediction models in the overall sample, age group, and number of chronic conditions for the validation sample (n = 9,839) using U.S. scoring system

Table 4.4: Mean squared error (MSE) and mean absolute error (MAE) of prediction models by EQ-5D index range, age group, and number of chronic conditions (NCC) for the validation sample (n = 9,839) using U.K. scoring system
Table 4.5: Mean squared error (MSE) and mean absolute error (MAE) of prediction models by EQ-5D index range, age group, and number of chronic conditions (NCC) for the validation sample (n = 9,839) using U.S. scoring system
LIST OF FIGURES

Figure 1.1: Example of a simple Bayesian Network

Figure 1.2: Example of a Markov model for a simple and progressive disease

Figure 2.1: Markov Model with 4 Health-States and Transitions

Figure 2.2: Estimated overall survival based on the Markov model with 20,000 Monte-Carlo simulations

Figure 2.3: Tornado diagram of one-way sensitivity analyses shows the impact of 22 individual base-case assumptions on the ICER

Figure 2.4: Plot of ICER and changing duration of survival after disease progression in the combination therapy from 7.4 months to 15.0 months based on the 95% confidence interval of median overall survival (OS) reported in the EGF100151 trial (approximately 13.7 – 21.3 months) while holding the OS in the monotherapy fixed at 15.54 months

Figure 2.5: Cost-Effectiveness acceptability curve

Figure 2.6: The cost-effectiveness model using Microsoft Excel® with Visual Basic programming language

Figure 3.1: Clinical Laboratory Process

Figure 3.2: Bayesian network with mixed discrete (Gender: Male or Female, and Error: Yes or No) and continuous (ALT, AST, and LDH measures) variables

Figure 3.3: ROC curves compare the predictive performance in detecting systematic errors of the Bayesian network against LabRespond and the logistic regression model

Figure 3.4: ROC curves compare the predictive performance in detecting random errors of the Bayesian network against LabRespond and the multinomial logistic regression model

Figure 4.1: Example of a Bayesian network resulting from mapping the SF-12v2 onto the EQ-5D self-care domain

Figure 4.2: The Bayesian networks for Predicting Response-Levels of five EQ-5D domains from 12 items of the SF-12v2

Figure 4.3: Scatter Plots of the observed EQ-5D utility scores vs. predicted EQ-5D utility values based on U.K. scoring system

Figure 4.4: Scatter Plots of the observed EQ-5D utility scores vs. predicted EQ-5D utility values based on U.S. scoring system
ABSTRACT
Probabilistic graphical models (PGMs) are models that employ both
probability theory and graph theory. Fundamental to the idea of a PGM is the notion
of modularity, i.e., a complex system can be built by combining simpler parts. Health
economics and outcomes research (HEOR) is a multidisciplinary approach to healthcare
and research that incorporates a number of areas of expertise, including clinical research,
epidemiology, health services research, economics, and psychometrics. The field has
rapidly expanded in the last decade and has played a crucial role in improving the quality
of healthcare. Drugs, healthcare programs, and medical devices are increasingly required
to demonstrate not only their efficacy and safety characteristics, but also their superior
performance in clinical effectiveness, health-related quality of life and economic
outcomes. While probabilistic graphical models have become a popular tool for data
analysis in health informatics, especially for prescribing treatment or guiding diagnostic
decisions, their use and applications in HEOR have been limited. This three-paper
dissertation introduces new approaches using probabilistic graphical models in health
economics and outcomes research.
Paper 1 demonstrates a cost-effectiveness analysis model of an expensive and
newly approved cancer drug, lapatinib, using a Markov model with Monte-Carlo
simulation method. This modeling approach innovatively uses Microsoft® Excel
spreadsheet with the Visual Basic programming language and provides health economists
with flexibility in customization, ease of calibration, and graphical visualization for their
cost-effectiveness models. Paper 2 presents an alternative method using a Bayesian network
that can detect blood lab errors better than the existing automated models. Successful
implementation of the Bayesian network model in the clinical laboratory can help to reduce
medical costs and improve patient safety. Paper 3 provides a new robust and natural
approach using Bayesian networks to map health-profile or disease-specific measures
onto preference-based measures. Applying the probabilistic mapping technique to obtain
QALYs can be useful in health economic evaluations when health utilities are not
directly available.
CHAPTER 1: INTRODUCTION
Uncertainty in Healthcare
In healthcare, as in other areas of human activity, making a clinical decision is
an inherently uncertain endeavor. Characteristically, a patient's diagnosis, based on probably
accurate laboratory test results, signs, and symptoms, is probably correct; the treatment and
interventions he/she receives will probably work; and the patient will probably get better
(Thompson & Dowding, 2001). As a result, healthcare professionals often have to make
difficult clinical decisions based on the potential clinical, economic, and quality-of-life
outcomes of treatment.
To gain a sense of the range of issues of uncertainty in healthcare, Peter Szolovits
(1995) categorized uncertainty into four major groups: (1) methodological questions:
analysis of different means of representing and reasoning with uncertainty, e.g.
comparison of Bayesian modeling vs. non-linear least squares analysis of data; (2)
reducing uncertainty: providing better explanations of natural
phenomena and making distinctions that yield more consistent, and therefore
more predictable, diagnoses and treatment outcomes; (3) patients’ understanding and
response to uncertainty (i.e. understanding normative vs. descriptive decision models);
and (4) physicians’ coping with uncertainty (i.e. factors influencing physicians on making
clinical decisions). Although a substantial amount of work has focused on
methodological concerns on properly representing uncertainty in diagnostic programs and
making inferences and decisions in the face of uncertainty (Szolovits, 1995), efforts to
quantify uncertainty in specific areas of healthcare such as clinical laboratory, health
economics and outcomes research have been limited. In this dissertation, we study issues
in the reduction of uncertainty (paper 1), physicians’ coping with uncertainty (paper 2),
and methodological questions (paper 3) categories of uncertainty in healthcare.
Many decision-analysis tools providing normative models of rational decision
making have been proposed to aid clinical decision making in the face of uncertainty.
Among them, probabilistic clinical reasoning using probabilistic graphical models is a
logical analytic approach that assists healthcare professionals in identifying the key elements of
uncertainty and in choosing among alternatives in such a way as to optimize the chosen
outcome.
Probabilistic graphical models (PGMs) are those models that employ both
probability theory and graph theory (Murphy, 2002). Fundamental to the idea of a PGM
is the notion of modularity, i.e. a complex system can be built by combining simpler
parts. Probability theory provides connections where all the parts are combined, ensuring
that the system as a whole is consistent, and providing ways to interface models to data.
Graph theory provides both an intuitive and visual representation of a joint probability
distribution of the model, and a natural framework for the design of new systems
(Murphy, 2002). In this dissertation, we use two common PGMs: (1) the Bayesian network
and (2) the discrete-state Markov model.
Bayesian Network
A Bayesian network is a probabilistic graphical model that represents the
probabilistic relationships among a set of variables of interest (or nodes) and their
conditional dependencies via a directed acyclic graph (DAG) (Pearl, 2000; Neapolitan,
2003; Jensen, 1996).
Let's consider a domain Z of n variables x_1, x_2, …, x_n, where each x_i represents a
node in the Bayesian network. A Bayesian network for Z is a joint probability distribution
over Z that encodes assertions of conditional independence as well as dependence.
Specifically, a Bayesian network B = (D, P) consists of a directed acyclic graph (DAG),
D, and a set of conditional probability distributions, P, for all variables x_i in the network.

The DAG, which defines the structure of the Bayesian network, contains a node
for each variable x_i ∈ Z and a finite set of directed edges (arrows) between nodes denoting
the probabilistic dependencies among variables in Z. For health measurement, a node
may be a health domain, and the states of the node are the possible responses to that
domain. A node is called a child if the probability distribution over its states is
conditional on information about other nodes (items). Conversely, a parent node provides
the information that is used to condition probability within the child node(s). Formally,
for each child node x_i with parents π(x_i), there is an attached conditional probability
distribution, P(x_i | π(x_i)). With a set of only discrete variables x_i, the joint probability
distribution of the network can be factored as follows (Pearl, 2000; Neapolitan, 2003;
Jensen, 1996):

P(x_1, x_2, …, x_n) = ∏_{i=1}^{n} P(x_i | π(x_i))

Suppose that there are two events which could cause grass to be wet: either the
sprinkler is on or it is raining. Also, suppose that the rain has a direct effect on the use of
the sprinkler, i.e., the sprinkler is turned off when it is raining. With all three variables
(Rain, Sprinkler, and Grass wet) having two possible states, T (true) and F (false), the
situation can be modeled with a Bayesian network as shown in Figure 1.1 below (Pearl,
2000; Jensen, 1996):

Figure 1.1: An example of a simple Bayesian network

The joint probability function is:

P(G, S, R) = P(G | S, R) P(S | R) P(R)

where G = Grass wet, S = Sprinkler, and R = Rain.
The Bayesian network can be used to answer (i.e., make inference about) questions such
as, "What is the probability that it is raining, given the grass is wet?" by using the
conditional probability formula and summing over the unobserved variables (nodes):

P(R = T | G = T) = P(R = T, G = T) / P(G = T)
                 = [ Σ_{S ∈ {T,F}} P(G = T, S, R = T) ] / [ Σ_{S,R ∈ {T,F}} P(G = T, S, R) ]
                 = [ (0.2 × 0.99 × 0.01) + (0.2 × 0.99 × 0.8) ] / [ 0.00198 + 0.1584 + 0.288 + 0 ]
                 = 0.16038 / 0.44838
                 ≈ 35.77%

As the numerator shows explicitly, the joint probability
function is used to calculate each term of the summation, i.e., marginalizing
over S in the numerator, and marginalizing over both S and R in the denominator.
Discrete-State Markov Model
Markov models are probabilistic graphical models that represent a series of
probable transitions between states (Jordan, 2004). These models are generally applied to
the natural development of a disease as it progresses over time. They assume that in each
"cycle" of time a patient is in one of a finite set of health states, and that there is a certain
probability of transferring to a different health state at the next cycle. Furthermore,
Markov models assume that the probability of entering a new health state at the start of
each cycle does not depend on the path the patient took to their current health state
(although the probability may depend on the cycle and other risk factors) (Spiegelhalter
et al., 2004).
Let us assume that a Markov model comprises N cycles labeled t = 1, 2, …, N, and
that within each cycle t a patient remains in one of R health states, with all transitions
occurring at the beginning of each cycle. The probability distribution at the beginning of
the first cycle (t = 1) is represented by the row vector p_1, and it is assumed that there is
a transition matrix M_t whose (i, j)th element M_{t,ij} is the probability of moving from
health state i to health state j between cycles t − 1 and t. Hence, the marginal probability
distribution p_t during cycle t > 1 follows the recursive relationship: p_t = p_{t−1} M_t.
For example, Figure 1.2 shows a Markov model of a progressive disease with
three health states: Well, Ill, and Dead. Suppose that after each cycle a patient in the
"Well" state can stay in the same "Well" state with probability one-third or become ill
(and thus move to the "Ill" state) with probability two-thirds. Similarly, a patient in the
"Ill" state can either stay in the same "Ill" state with probability one-third or die with
probability two-thirds. Once a patient dies (enters the "Dead" state), she/he stays in this
state forever; this is called the absorbing state.
Figure 1.2: An example of a simple progressive disease (states: Well, Ill, Dead)
Assume that a cohort of 27 "healthy" patients goes through the disease and that each
cycle is one year. After 3 cycles (years), on average, one patient is in the "Well" state,
six are in the "Ill" state, and 20 patients will have died.

We have:

p_1 = [27  0  0]

M = | 1/3  2/3   0  |
    |  0   1/3  2/3 |
    |  0    0    1  |

Applying the formula from the recursive relationship for cycle N: p_N = p_{N−1} M.

Table 1.1: Results, on average, after 3 cycles (years)

                        "Well" state   "Ill" state   "Dead" state
Cycle 0 (Beginning):         27             0              0
Cycle 1:                      9            18              0
Cycle 2:                      3            12             12
Cycle 3:                      1             6             20
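Table 1.1 can be verified numerically by pushing the cohort vector through the recursion p_t = p_{t−1} M; a minimal sketch with NumPy:

```python
# Reproducing Table 1.1: a 27-patient cohort traced through the Well/Ill/Dead
# Markov model of Figure 1.2 via the recursion p_t = p_{t-1} M.
import numpy as np

M = np.array([[1/3, 2/3, 0],    # Well -> (Well, Ill, Dead)
              [0,   1/3, 2/3],  # Ill  -> (Well, Ill, Dead)
              [0,   0,   1]])   # Dead is the absorbing state

p = np.array([27.0, 0.0, 0.0])  # cycle 0: all 27 patients start "Well"
for cycle in range(1, 4):
    p = p @ M                   # one cycle of transitions
    print(f"Cycle {cycle}:", p.round(0))
# after cycle 3 the cohort is [1, 6, 20]: one Well, six Ill, twenty Dead
```

Because every row of M sums to one, the cohort total stays at 27 across cycles, which is a quick sanity check on any transition matrix.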
Purposes of the Study
Health economics and outcomes research (HEOR) is a multidisciplinary approach
to healthcare and research that incorporates a number of areas of expertise, including
clinical research, epidemiology, health services research, economics, and psychometrics.
The field has rapidly expanded in the last decade and has played a crucial role in
improving the quality of healthcare. Drugs, healthcare programs, and medical devices
are increasingly required to demonstrate not only their efficacy and safety characteristics,
but also their superior performance in clinical effectiveness, health-related quality of life
and economic outcomes (Badia et al., 2002). While probabilistic graphical models have
become a popular tool for data analysis in health informatics (Airoldi, 2007), especially
for prescribing treatment or guiding diagnostic decisions, their use and applications in
HEOR have been limited. This three-paper dissertation introduces new approaches using
probabilistic graphical models in health economics and outcomes research. Paper 1
applies a Markov model with Monte-Carlo simulation to determine the cost-effectiveness
of a newly approved and expensive cancer drug, lapatinib (Tykerb®), in treatment of
HER2-positive advanced breast cancer. In addition, the new modeling approach in this
cost-effectiveness analysis uses Microsoft® Excel with Visual Basic programming
language and examines its advantages over other traditional models that use only an Excel
spreadsheet and over those that use commercial software such as TreeAge®. Paper 2 introduces a
new approach using a Bayesian network that can detect blood laboratory errors and
compares the predictive performance of our model with the existing models. Successful
implementation of the Bayesian model in the clinical laboratory suggests that it can be an
effective means for reducing medical costs and improving patient safety. Paper 3
introduces a novel probabilistic mapping method using Bayesian networks to convert SF-
12 responses into EQ-5D utility scores in a national sample. Successful implementation
of the probabilistic mapping method can provide a natural approach to map between
health-profile or disease-specific measures and health-utility measures for health
economic evaluations.
CHAPTER 2: COST-EFFECTIVENESS ANALYSIS OF LAPATINIB IN HER2-
POSITIVE ADVANCED BREAST CANCER: AN APPLICATION OF MARKOV
MODEL WITH MONTE-CARLO SIMULATION
Chapter-Two Abstract
Background: A recent clinical trial demonstrated that the addition of lapatinib to
capecitabine in the treatment of HER2-positive advanced breast cancer (ABC)
significantly increases median time to progression. The objective of the current analysis
is to assess the cost-effectiveness of this therapy from the U.S. societal perspective.
Methods: A Markov model comprising 4 health-states (stable disease, respond-to-
therapy, disease progression, and death) was developed to estimate the projected lifetime
clinical and economic implications of this therapy. The model used Microsoft® Excel
with the Visual Basic programming language and a Monte-Carlo simulation method to
imitate the clinical course of a typical patient with ABC, incorporating response rates and
major adverse effects. Transition probabilities were estimated based on the results of the
EGF100151 and EGF20002 clinical trials of lapatinib. Health-state utilities, direct
and indirect costs of the therapy, major adverse events, laboratory tests, and costs of
disease progression were obtained from published sources. The model used a 3% discount
rate, and results are reported in 2007 U.S. dollars.
1 The final, definitive version of this paper has been published in the journal CANCER, Vol. 115, Issue 3,
February 2009, by John Wiley & Sons, Inc. All rights reserved.
Results: Over a lifetime, the addition of lapatinib to capecitabine as combination therapy
was estimated to cost an additional $19,630 with an expected gain of 0.12 quality-
adjusted life years (QALY) or an incremental cost-effectiveness ratio (ICER) of $166,113
per QALY gained. The 95% confidence limits of the ICER ranged from $158,000 to
$215,000/QALY. Cost-effectiveness acceptability curve indicated less than 1%
probability that the ICER would be lower than $100,000/QALY.
Conclusion: Compared with commonly accepted willingness-to-pay thresholds in
oncology treatment, the addition of lapatinib to capecitabine is not clearly cost-effective;
and is most likely to result in an ICER somewhat higher than societal willingness-to-pay
threshold limits. In addition, modeling in Microsoft® Excel with the Visual Basic
programming language provides two advantages over other traditional methods that use
only Excel or commercial software: ease of calibrating model parameters and graphical
visualization of subjects moving from one health state to another until death.
Chapter-Two Background
Breast cancer is the most common female cancer in the United States, the second
most common cause of cancer death in women (after lung cancer), and the main cause of
death in women ages 45 to 55 (American Cancer Society, 2008). In 2007, it was
estimated that approximately 178,480 American women were diagnosed with breast
cancer and more than 40,000 were expected to die from this disease (American Cancer
Society, 2008). The National Cancer Institute estimated that about 2.4 million women
with a history of breast cancer were alive in 2004; and most of these women were cancer-
free, while others still had evidence of cancer and may have been undergoing treatment
(Ries et al., 2007). Due to high prevalence, morbidity and mortality, considerable R&D
effort is devoted to developing new breast cancer treatments. Many of the newer
treatment approaches are associated with substantially higher costs.
Among patients with breast cancer, approximately 20% to 30% have human
epidermal growth factor receptor-2 (HER2) positive disease, which is associated with
poor treatment outcomes (Slamon et al., 1987). Furthermore, unlike early-stage breast
cancer, most metastatic breast cancer is incurable; thus leading to relatively poor
prognosis in HER2-positive metastatic breast cancer (MBC) patients. Treatment, as a
result, has been limited to controlling the spread of metastases and improving the quality
of life without causing a detrimental effect on survival (Muss, 2006).
Recently, lapatinib (Tykerb; GlaxoSmithKline, Research Triangle Park, NC), an
orally administered small molecule inhibitor of the tyrosine kinase domains of HER1 and
HER2, and epidermal growth factor receptor (EGFR), has been approved by Food and
Drug Administration (FDA) in combination with capecitabine for treatment of HER2-
positive MBC (second-line therapy) for those who had been previously treated with
trastuzumab. The EGF100151 clinical trial (Geyer et al., 2006; Geyer et al., 2007)
demonstrated that the addition of lapatinib to capecitabine in the treatment of HER2-
positive advanced breast cancer significantly improved the median time to progression
(TTP) by 8.5 weeks (18.6 weeks versus 27.1 weeks; p = 0.00013) and overall response
rate (ORR) by 9.8% (13.9% versus 23.7%; OR = 1.9, 95% CI: 1.1 – 3.4). Nonetheless,
the cost-effectiveness of the new therapy with lapatinib has not been determined. The
objective of this study is to develop an economic model to evaluate incremental cost-
effectiveness ratio (ICER) of lapatinib in second-line treatment of HER2-positive breast
cancer based on the EGF100151 clinical trial of lapatinib as well as updated outcomes
data reported at the American Society of Clinical Oncology in June 2007 (Geyer et al.,
2006; Geyer et al., 2007).
Chapter-Two Methodology
The Economic Model:
This economic model was a cost-effectiveness analysis evaluating capecitabine
monotherapy and the capecitabine plus lapatinib combination therapy from the U.S.
societal perspective. Time to progression (time from randomization to disease
progression or death due to breast cancer) was the specified primary endpoint of the
EGF100151 clinical trial (Geyer et al., 2006; Geyer et al., 2007). Secondary endpoints
included overall response rate (percentage of subjects achieving either a complete or
partial response), progression-free survival (PFS, time from randomization to disease
progression or death due to any cause), and overall survival (OS, time from
randomization to death due to any cause).
We constructed a Markov model comprising 4 health-states: metastatic breast-cancer
(MBC) treatment with disease-free progression, respond-to-therapy, disease
progression, and death. In each health-state, the gains in direct and indirect costs, life-
months (LMs), and quality-adjusted life months (QALMs) were estimated over time. In
our model, patients began their treatment in the stable-disease health-state; then they
could move to the respond-to-therapy health-state if the tumor responded to treatment; or
they could stay at the same health-state or move to the disease-progression health-state.
Death could occur in any health-state; therefore, patients did not necessarily die only in the
disease-progression state. Similarly, patients in the respond-to-therapy health-state could
stay in the same health-state or move to the disease-progression health-state; while
patients in the disease-progression health-state could stay in the same health-state or
eventually die (Figure 2.1).
Figure 2.1: Markov Model with 4 Health-States and Transitions
Our model used Monte Carlo simulation to track the clinical course of MBC
patients with response rates and adverse effects taken from clinical trial data. The
analysis was carried out in Microsoft Excel® using Visual Basic® to run the model
simulations. Transition probabilities were derived primarily from the results from
EGF100151 and EGF20002 (GlaxoSmithKline website, accessed December 2007)
clinical trials of lapatinib using the ―DEALE‖ method (Beck et al., 1982). To validate our
transition probabilities, we simulated a first-order Monte-Carlo method on the estimated
transition probabilities to obtain the output results. We then iteratively calibrated the
estimated parameters until the model consistently produced results similar to the clinical
trials outcomes. Finally, the newly calibrated transition probabilities were applied to
generate model results (Table 2.1). Each cycle of the Markov model was set to be 1.5
months (or approximately 6 weeks) to replicate the true interval time for patient re-
examination that was used in the trial. The model was run until all patients died (lifetime
horizon extrapolation). A discount rate of 3% was used in our model (Gold et al., 1996).
15
Table 2.1: Markov model transition probabilities in each 6-week cycle
(Values: Lapatinib + Capecitabine / Capecitabine Alone; source for all rows: EGF100151 and EGF20002, 2005, 2006, and 2007 [updated])

Stable Disease Health-State:
- Probability of Respond-to-Therapy: 0.0620 / 0.0430
- Probability of Stable Disease (staying in the same health-state): 0.7440 / 0.6800
- Probability of Disease-Progression: 0.1400 / 0.2150
- Probability of Death: 0.0540 / 0.0620

Respond-to-Therapy Health-State:
- Probability of Respond-to-Therapy (staying in the same health-state): 0.8150 / 0.8150
- Probability of Disease-Progression: 0.1450 / 0.1050
- Probability of Death: 0.0400 / 0.0800

Disease-Progression Health-State:
- Probability of Disease-Progression (staying in the same health-state): 0.9070 / 0.9040
- Probability of Death: 0.0930 / 0.0960

Notes: Probabilities were estimated and calibrated so that model outputs closely matched outcome results from the clinical trials of lapatinib; EGF20002: phase-II clinical trial of lapatinib; EGF100151: phase-III clinical trial of lapatinib.
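The first-order simulation described above can be sketched as a loop over the four states, here in Python rather than the Excel/Visual Basic implementation, using the combination-arm probabilities from Table 2.1. This simplified sketch omits the model's calibration details and will not exactly reproduce the published outputs.

```python
import random

# Per 6-week-cycle transition probabilities, combination arm (Table 2.1):
# S = stable disease, R = respond-to-therapy, P = disease progression, D = death.
TRANSITIONS = {
    "S": (("R", 0.0620), ("S", 0.7440), ("P", 0.1400), ("D", 0.0540)),
    "R": (("R", 0.8150), ("P", 0.1450), ("D", 0.0400)),
    "P": (("P", 0.9070), ("D", 0.0930)),
}
CYCLE_MONTHS = 1.5

def simulate_patient(rng):
    """Track one patient from stable disease until death; return months lived."""
    state, months = "S", 0.0
    while state != "D":
        draw, cumulative = rng.random(), 0.0
        for next_state, p in TRANSITIONS[state]:
            cumulative += p
            if draw < cumulative:
                state = next_state
                break
        else:
            state = "D"  # guard against floating-point shortfall in the row sum
        months += CYCLE_MONTHS
    return months

rng = random.Random(0)
mean_os_months = sum(simulate_patient(rng) for _ in range(20_000)) / 20_000
```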
Patient Population:
The baseline patient in the model is a 53-year-old woman who has progressive,
HER2-positive, locally advanced or metastatic breast cancer and has previously been
treated with, at a minimum, an anthracycline, a taxane, and trastuzumab. Patients with
preexisting heart disease, conditions that could affect gastrointestinal absorption, or
prior capecitabine treatment were ineligible. The combination regimen consisted of
lapatinib (L) at a dose of 1,250 mg daily and capecitabine (C) at a dose of 2,000 mg per
square meter of body-surface area in two divided doses on days 1 through 14 of a 21-day
cycle, while the monotherapy (C only) was administered at a dose of 2,500 mg per square
meter of body-surface area in two divided doses on days 1 through 14 of a 21-day cycle
(Geyer et al., 2006; Geyer et al., 2007).
Healthcare Resource Costs:
Healthcare resource costs were based on published data and were expressed in
2007 U.S. dollars (Table 2.2). The estimated drug cost per patient was based on the
wholesale acquisition cost (WAC) (National Institute for Health and Clinical Excellence
website, accessed May 2008). In the clinical trial of lapatinib, the average doses of
capecitabine in the combination and mono-therapy arms were reported to be
2,000 mg/m²/day and 2,377 mg/m²/day, respectively. The mean cost of treating a cardiac event was
assumed to be $1,979 (Garrison et al., 2007). The range of costs per severe diarrhea event
(grade III/IV) was between $2,559 and $8,230 (Dranitsaris et al., 2004; Shah et al.,
2004). Other monitoring laboratory-test costs, including the LVEF exam, renal function
test, complete blood count, and liver function test, were based on 2007 Medicare
reimbursement codes (Centers for Medicare and Medicaid Services website, accessed
August 2007). Trastuzumab-containing therapies are unlikely to be given as third-line
treatment once disease has progressed in patients already refractory to trastuzumab-
containing regimens; thus, the average monthly cost after disease progression was
derived from the study of McLachlan et al. (1999) of third-line treatment of metastatic
breast cancer and was set at a baseline value of $3,535 (in 2007 USD with a 3% discount
rate). The combination therapy of lapatinib and capecitabine showed a significant
reduction in central nervous system (CNS) metastases events compared with the
monotherapy (2% versus 6%, p = 0.045). The baseline annual cost per CNS metastases
event was $100,000, based on the study of Pelletier et al. (2008). Hence, the average monthly costs after disease
progression were estimated to be $3,631 [= $3,535 + ($100,000 ÷ 12) x 2% – ($3,535 x
2%)] in the combination therapy and $3,823 [= $3,535 + ($100,000 ÷ 12) x 6% – ($3,535
x 6%)] in the monotherapy.
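The expected-cost adjustment above can be checked with a few lines; the figures are reproduced from the text, and the helper name is ours:

```python
# Monthly cost after progression: the baseline post-progression cost blended
# with the annualized CNS-metastases cost, weighted by each arm's CNS risk.
def monthly_cost_after_progression(base=3535.0, cns_annual=100_000.0, cns_risk=0.02):
    return base + (cns_annual / 12.0) * cns_risk - base * cns_risk

combo = monthly_cost_after_progression(cns_risk=0.02)  # ≈ $3,631
mono = monthly_cost_after_progression(cns_risk=0.06)   # ≈ $3,823
```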
Indirect costs associated with patient time and travel were estimated at the
average hourly compensation rate from the U.S. Bureau of Labor Statistics website
(accessed July 2007) multiplied by an adjustment factor of 1.5 ($28.11 × 1.5 ≈ $42.17);
it was assumed that each 1.5-month re-examination cycle required 2 hours of patient
time.
Table 2.2: Main model parameters: base-case values, ranges, and assumed distributions

Health Utilities:
- Stable Disease Health-State: base-case 0.70; range 0.50–0.80; triangular distribution (min = 0.50, mode = 0.70, max = 0.80); Elkin et al. 2004
- Respond-to-Therapy Health-State: base-case 0.84; range 0.57–0.93; triangular distribution (min = 0.57, mode = 0.84, max = 0.93); Brown et al. 1998; Earle et al. 2000
- Disease-Progression Health-State: base-case 0.50; range 0.45–0.72; triangular distribution (min = 0.45, mode = 0.50, max = 0.72)

Costs (2007 U.S. Dollars):
- 250-mg Lapatinib Tablet: base-case $23.00; range $18.40–$27.60; triangular distribution (min = $18.40, mode = $23.00, max = $27.60); NICE
- 150-mg Capecitabine Tablet: base-case $1.50; range $1.20–$1.80; triangular distribution (WAC ± 20%; min = $1.20, mode = $1.50, max = $1.80)
- Severe Diarrhea Event: base-case $5,394; range $2,559–$8,230; log-normal distribution (log-scale mean = 8.60, SD = 0.16); Dranitsaris et al. 2004; Shah et al. 2004
- Average Annual CNS Metastases: base-case $100K; range $80K–$120K; triangular distribution (min = $80K, mode = $100K, max = $120K); Pelletier et al. 2008
- After Disease Progression (Monthly): base-case $3,535; range $2,828–$4,242; triangular distribution (min = $2,828, mode = $3,535, max = $4,242); McLachlan et al. 1999
- Cardiotoxicity Event: $1,979; not varied; Garrison et al. 2007
- LVEF Exam: $367; not varied; CMS
- Total of Monitoring Lab Tests: $50; not varied; CMS
- Value of Patient Time (hourly earnings × 1.5 × 2 hours): $84; not varied; BLS

Other Relevant Parameter:
- Duration of Survival after Disease Progression (months): base-case 11.20; range 7.40–15.00; log-normal distribution (log-scale mean = 2.40, SD = 0.07); assumed (derived from the 95% CI of OS in the trial)

Notes: LVEF: left ventricular ejection fraction; monitoring lab tests included the renal function test, complete blood count, and liver function test at a frequency of every 2 months; CMS: Centers for Medicare and Medicaid Services; BLS: U.S. Bureau of Labor Statistics; NICE: National Institute for Health and Clinical Excellence (U.K.). The 95% CI of overall survival (OS) in the trial was approximately 13.72–21.32 months.
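Drawing one parameter set per probabilistic-sensitivity-analysis run can be sketched with Python's standard library. The distribution parameters are taken from Table 2.2; the `draw_parameters` helper and the subset of parameters shown are illustrative, not the dissertation's code.

```python
import random

# Triangular draws for utilities and unit costs; a log-normal draw (parameters
# on the log scale) for the severe-diarrhea cost, following Table 2.2.
rng = random.Random(42)

def draw_parameters():
    return {
        "u_stable": rng.triangular(0.50, 0.80, 0.70),        # (low, high, mode)
        "u_respond": rng.triangular(0.57, 0.93, 0.84),
        "u_progress": rng.triangular(0.45, 0.72, 0.50),
        "c_lapatinib_tablet": rng.triangular(18.40, 27.60, 23.00),
        "c_capecitabine_tablet": rng.triangular(1.20, 1.80, 1.50),
        "c_diarrhea_event": rng.lognormvariate(8.60, 0.16),  # log-scale mean, SD
    }

draws = [draw_parameters() for _ in range(20_000)]
```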
Health-State Utilities:
Estimates of quality-of-life utilities for the different health-states were adapted from
prior MBC studies, as shown in Table 2.2. Regardless of treatment group, utilities in the
stable-disease, respond-to-therapy, and disease-progression health-states were estimated
to be 0.70 (range, 0.50–0.80), 0.84 (range, 0.57–0.93), and 0.50 (range, 0.45–0.72),
respectively (Elkin et al., 2004; Earle et al., 2000; Brown & Hutton, 1998).
Major Assumptions:
The EGF100151 clinical trial of lapatinib was halted prematurely because of a
clinically meaningful and statistically significant advantage in the primary endpoint,
time to progression (TTP), for the combination therapy of lapatinib and capecitabine
versus capecitabine monotherapy. All monotherapy patients were offered the option of
switching to the combination therapy. There was no significant difference in median
overall survival between the combination therapy (15.78 months) and the monotherapy
(15.54 months) (Geyer et al., 2006; Geyer et al., 2007). This was likely caused by the
crossover from monotherapy to combination therapy when the trial was halted; thus an
assumption was needed to adjust for overall survival differences. We therefore assumed
that the mean duration of survival after disease progression was the same for both
therapies. In addition, the mean duration
of survival after disease progression was estimated by subtracting the mean progression-
free survival from the mean overall survival (i.e. mean duration of survival after
progression = mean OS – mean PFS). As a result, the estimated mean overall survival in
combination therapy and monotherapy would be 17.52 months (mean PFS of 6.32
months + mean duration of survival after disease progression of 11.20 months) and 15.54
months, respectively (Figure 2.2). In our sensitivity analyses, we further varied the
duration of survival after disease progression in the combination therapy in accordance
with the 95% confidence interval of overall survival. In other words, when the mean PFS
is 6.32 months, varying the overall survival in the combination therapy within its 95%
confidence interval will result in varying the duration of survival after disease
progression.
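The assumed survival decomposition reduces to a two-line calculation (all figures from the text):

```python
# Mean overall survival = mean progression-free survival + the (shared)
# mean survival after disease progression.
mean_pfs_combination = 6.32        # months
mean_post_progression = 11.20      # months, assumed equal in both arms
mean_os_combination = mean_pfs_combination + mean_post_progression  # 17.52
os_difference = mean_os_combination - 15.54  # vs. monotherapy: ~2 months
```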
Figure 2.2: Estimated overall survival based on the Markov model with 20,000 Monte-
Carlo simulations
[Survival-curve plot omitted: percent survival over 0–40 months for lapatinib plus capecitabine versus capecitabine alone.]
Sensitivity Analyses:
We performed both one-way and probabilistic sensitivity analyses to examine the
impact of varying our base-case assumptions on the incremental cost per quality-adjusted
life year gained. In the one-way sensitivity analysis, the effect of changes in individual
base-case parameters across possible ranges of values was investigated. We conducted a
probabilistic sensitivity analysis in which key parameters were simultaneously and
randomly varied over the appropriate probability distributions using Monte-Carlo
simulation with 20,000 runs (Table 2.2). Based on the simulation results, we constructed
a cost-effectiveness acceptability curve at different willingness-to-pay thresholds as well
as 95% confidence limits for the base-case ICER.
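Constructing the cost-effectiveness acceptability curve from the simulation draws can be sketched as follows. The paired incremental cost/QALY draws below are synthetic stand-ins (normal around the base-case values) for the model's actual 20,000 probabilistic-sensitivity-analysis runs, so the curve's shape, not its exact values, is the point.

```python
import random

# Synthetic PSA draws: (incremental cost, incremental QALYs) per run.
rng = random.Random(1)
draws = [(rng.gauss(19630.0, 3000.0), rng.gauss(0.12, 0.02)) for _ in range(20_000)]

def ceac_point(draws, wtp):
    """Share of draws with positive incremental net monetary benefit,
    NMB = wtp * dQALY - dCost, at the given willingness-to-pay threshold."""
    favorable = sum(1 for d_cost, d_qaly in draws if wtp * d_qaly - d_cost > 0)
    return favorable / len(draws)

curve = {wtp: ceac_point(draws, wtp) for wtp in range(75_000, 250_001, 25_000)}
```

Plotting `curve` against the thresholds reproduces the familiar CEAC shape: the probability of cost-effectiveness rises with the willingness-to-pay threshold.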
Chapter-Two Results
Over the lifetime horizon, after the model was calibrated and run on 20,000
simulated patients, the addition of lapatinib to capecitabine in the combination therapy
was estimated, in the base case, to cost an additional $19,630 with an expected gain of
0.12 quality-adjusted life years (QALYs). Thus, from the U.S. societal perspective, the
incremental cost-effectiveness ratio (ICER) was estimated to be $166,113 per QALY
gained (or approximately $13,843 per QALM gained) (Table 2.3).
The simulation model outcome results were as follows for the combination and
monotherapy groups, respectively: (1) the mean times to progression were approximately
6.21 months and 4.24 months, (2) the mean overall response rates were 24.1% and
13.6%, (3) the mean overall survival was 17.41 months and 15.45 months, and (4) the
mean duration of survival after disease progression was 11.20 months and 11.22 months.
In addition, the average total costs per patient were estimated at $66,499 with
combination therapy versus $46,869 with monotherapy (Table 2.3).
Chapter-Two Sensitivity Analyses
Figure 2.3 shows the impact of varying model estimates on incremental cost per
quality-adjusted life year gained. The one-way sensitivity analysis showed ICER results
ranging from $115,000 per QALY gained to combination therapy being dominated. The
base-case ICER was most sensitive to duration of survival after disease progression, cost
of lapatinib, and the health-utilities in stable-disease and respond-to-therapy health-states.
For the major assumption in our model, we varied only the duration of survival after
disease progression in the combination therapy, from 7.4 months to 15.0 months, based
on the 95% confidence interval of median overall survival (OS) reported in the
EGF100151 trial (approximately 13.7 months to 21.3 months). The resulting ICER
similarly ranged from $115,000/QALY to the combination therapy being dominated
(Figure 2.4).
Probabilistic sensitivity analysis showed ICER 95% confidence limits ranging
from $158,000 per QALY to $215,000 per QALY gained. The cost-effectiveness
acceptability curve indicated less than 1% probability that the ICER would be lower than
$100,000/QALY; and less than 2% probability that the ICER would be lower than
$150,000/QALY (Figure 2.5).
Table 2.3: Summary of cost and outcome results in Monte Carlo simulation (N = 20,000 runs)
Parameters Lapatinib + Capecitabine Capecitabine Alone Difference
Mean Time to Progression (TTP) 6.21 months 4.24 months 1.97 months
Mean Overall Response Rate (ORR) 24.1% 13.6% 10.5%
Mean Overall Survival (OS) 17.41 months 15.45 months 1.96 months
Mean Duration after Disease Progression 11.20 months 11.22 months 0.02 months
Average Total Cost per Patient $66,499 $46,869 $19,630
Cost per Life-Year Gained $120,184/LY
Cost per Quality-Adjusted Life-Year Gained $166,113/QALY
Cost per Progression-Free Life-Year Gained $133,167/Progression-Free Life Year
Notes: Results based on 20,000 Monte Carlo simulations.
Figure 2.3: Tornado diagram of one-way sensitivity analyses shows the impact of individual base-case assumptions on the ICER
[Tornado diagram omitted in this transcript. Parameters, in decreasing order of impact on the ICER: duration of survival after disease progression; cost of lapatinib (Tykerb®) per 250-mg tablet; health utility in the stable-disease health-state; health utility in the respond-to-therapy health-state; cost of capecitabine (Xeloda®) per 150-mg tablet; average annual cost of a CNS metastases event; average monthly cost after progression; cost per severe diarrhea event; health utility in the disease-progression health-state. Base-case ICER: $166,113/QALY; the largest bar extends to the combination therapy being dominated.]
Figure 2.4: Plot of ICER and changing duration of survival after disease progression in
the combination therapy from 7.4 months to 15.0 months based on the 95% confidence
interval of median overall survival (OS) reported in the EGF100151 trial (approximately
13.7 – 21.3 months) while holding the OS in the monotherapy fixed at 15.54 months
Figure 2.5: Cost-Effectiveness acceptability curve
Chapter-Two Discussion
The current cost-utility analysis, based on the pivotal clinical trial, indicated that
the addition of lapatinib to capecitabine as a combination therapy for second-line
treatment of HER2-positive metastatic breast cancer is unlikely to be cost-effective
when compared with commonly accepted willingness-to-pay thresholds in oncology
(even at a threshold of $150,000/QALY) (Ubel et al., 2003;
Devlin & Parkin, 2004). In the clinical trial of lapatinib, progression-free survival (PFS)
was used as a secondary endpoint. Using PFS instead of overall survival as the
effectiveness criterion in metastatic breast cancer would overestimate quality of life
before disease progression and underestimate it afterward, because a weight of one is
applied to life-years before progression (no adjustment for health-state utility) and a
weight of zero after progression (no accounting for duration of survival after disease
progression) (Lamers et al., 2008). The resulting PFS-based ICER was about $133,167
per progression-free life-year gained. Again, it is unlikely that treatment with lapatinib
in advanced breast cancer (ABC) is cost-effective.
Because there was no standard clinical practice for treating trastuzumab-refractory
patients and the EGF100151 clinical trial was the only clinical evidence on the
effectiveness of lapatinib, our economic model used capecitabine monotherapy as the
sole comparator to the combination therapy. Several non-randomized controlled
and observational studies (Extra et al., 2006; Bartsch et al., 2007; Garcia-Saenz et al.,
2005; Del Bianco et al., 2006; Morabito et al., 2006) showed some benefits for
continuing treatment with trastuzumab-containing therapies after disease progression;
however, the results varied widely depending on the patient populations included and
the treatments used. As a result, it would be premature to indirectly compare the
combination therapy of lapatinib and capecitabine with trastuzumab-containing therapies.
Because of the significant improvement in the primary endpoint (TTP) with the
combination therapy compared with the monotherapy, all patients in the monotherapy
arm were offered the option of switching to the combination therapy, resulting in a
"crossover" effect. This effect likely explains the insignificant difference in median
overall survival between the two therapies (even though there was a significant
difference in TTP). To adjust for the "crossover" effect, we assumed that the mean
survival time after disease progression was the same in the two therapies. Under this
assumption, the estimated difference in mean overall survival between the combination
and mono-therapies was approximately 2 months (8.5 weeks), the same as the estimated
difference in median PFS between the two therapies. We believe this assumption is
realistic and justified; without it, the resulting ICER was well above $500,000/QALY.
In addition to varying this assumption in the sensitivity analyses, our threshold
sensitivity analysis showed that, assuming an overall survival of 23.5 months for the
combination therapy with lapatinib, the ICER fell below the WTP threshold of
$100,000/QALY.
Quality-of-life utilities in different health-states used in our model were not
obtained from the clinical trial, but rather from the breast-cancer literature. Ranges of
utilities, with reported means for the three metastatic breast-cancer health-states, were
obtained from a similar study of HER2-positive metastatic breast cancer (Elkin et al.,
2004). The model was sensitive to the ranges of utilities in the stable-disease health-
state (approximately $150,000 to $195,000 per QALY gained) and in the respond-to-
therapy health-state ($156,000 to $189,000 per QALY gained), but was not sensitive in
the disease-progression health-state ($165,800 to $166,700 per QALY gained).
Based on the assumed random distributions of our key model parameters, the
probabilistic sensitivity analysis showed 95% confidence limits of the ICER between
$158,000 and $215,000 per QALY gained. At a willingness-to-pay (WTP) threshold of
$100,000/QALY, there was less than a 1% probability that the resulting ICER would fall
below the threshold. Our threshold sensitivity analysis indicated that only at 65% of the
current acquisition cost of lapatinib (approximately $14.95 per 250-mg tablet) would the
ICER fall below the WTP threshold of $100,000/QALY.
In addition, the economic burden of CNS metastases among patients with primary
breast cancer, as reported by Pelletier et al. (2008), was estimated at about $100,000
per patient per year. The cost-effectiveness of the combination therapy with lapatinib
depends on the risk reduction of CNS metastases used in the base-case.
This subgroup of high-risk patients would benefit most, as the clinical trial showed a
threefold reduction in the risk of brain metastases with the combination therapy (2%)
compared with the monotherapy (6%). Applying lapatinib therapy to this subgroup
under an assumption of a 10% absolute risk of CNS metastases in the combination
therapy (i.e., a 30% absolute risk in the monotherapy), our model estimated a resulting
ICER of approximately $90,000/QALY. Clinical trials should therefore be conducted in
the subgroup of patients at high risk of CNS metastases so that the potential cost and
clinical benefits can be assessed.
A limitation of the current analysis was that cost and clinical data were not
available after disease progression; thus, assumptions were derived from the relevant
published literature of metastatic breast cancer. Nevertheless, the cost-effectiveness
results were robust to wide ranges of parameter values in our sensitivity analyses.
Moreover, it may be possible to extend our results to other Western countries, as our
model was most sensitive to the duration of survival after disease progression, the price
of lapatinib, and the health utilities in the stable-disease and respond-to-therapy health-
states, parameters that do not vary widely across Western countries. When we assumed
that the medical costs (cost of a CNS metastases event, medical costs after disease
progression, cost of a severe diarrhea event, and costs of related lab tests) in Western
countries were 30% lower than those in the U.S., the resulting base-case ICER was still
approximately $150,000/QALY.
Chapter-Two Conclusion
Our economic evaluation of the combination therapy with lapatinib and
capecitabine in advanced breast cancer patients resulted in an ICER of $166,113 per
QALY gained. Compared with commonly accepted willingness-to-pay thresholds in
oncology treatment, the addition of lapatinib to capecitabine is not clearly cost-effective
and most likely results in an ICER somewhat higher than the threshold limits. In
addition, modeling in Microsoft Excel® with the Visual Basic® programming language
provides two advantages over traditional methods that use only Excel or commercial
software: ease of calibrating model parameters and graphical visualization of subjects
moving from one health state to another until death (Figure 2.6).
Figure 2.6: The cost-effectiveness model using Microsoft Excel® with Visual Basic programming language
Chapter-Two References
American Cancer Society. Breast Cancer Facts & Figures 2007-2008. Atlanta: American
Cancer Society, Inc.
An open-label, multicenter, single arm phase II study of oral GW572016 as single agent
therapy in subjects with advanced or metastatic breast cancer who have
progressed while receiving Herceptin containing regimens. GlaxoSmithKline.
Available from URL: http://ctr.gsk.co.uk/Summary/lapatinib/II_EGF20002.pdf
[accessed: December 18, 2007].
Bartsch R, Wenzel C, Altorjai G, Pluschnig U, Locker GJ, Rudas M, et al. Trastuzumab
(T) plus capecitabine (C) in heavily pretreated patients (pts) with advanced breast
cancer (ABC) [abstract]. J Clin Oncol. 2007;25:18S.
Beck JR, Pauker SG, Gottlieb JE, Klein K, Kassirer JP. A convenient approximation of
life expectancy (the "DEALE"). II. Use in medical decision-making. Am J Med
1982;73:889-897.
Brown RE, Hutton J. Cost-utility model comparing docetaxel and paclitaxel in advanced
breast cancer patients. Anticancer Drugs 1998;9:899-907.
Centers for Medicare and Medicaid. Available at URL:
http://www.cms.hhs.gov/home/medicare.asp [accessed August 31, 2007].
Del Bianco S, Rondinelli R. Trastuzumab-containing therapies: Activity beyond disease
progression in M.B.C. – A pivotal experience [abstract]. J Clin Oncol.
2006;24:18S.
Devlin N, Parkin D. Does NICE have a cost effectiveness threshold and what other
factors influence its decisions? A binary choice analysis. Health Econ
2004;13:437-452.
Dranitsaris G, Maroun J, Shah A. Severe chemotherapy induced diarrhea (CID) in
patients with colorectal cancer: A cost of illness analysis [abstract]. J Clin Oncol.
2004;22:14S.
Earle CC, Chapman RH, Baker CS, Bell CM, Stone PW, Sandberg EA, et al. Systematic
overview of cost-utility assessments in oncology. J Clin Oncol 2000;18:3302-
3317.
Elkin EB, Weinstein MC, Winer EP, Kuntz KM, Schnitt SJ, Weeks JC. HER-2 testing
and trastuzumab therapy for metastatic breast cancer: a cost-effectiveness
analysis. J Clin Oncol 2004;22:854-863.
Extra J-M, Antoine E-C, Vincent-Salomon A, et al. Favourable effect of continued
trastuzumab treatment in metastatic breast cancer: results from the French
Hermine cohort study. Breast Cancer Res Treat 2006; 100:S102.
Fenwick E, Marshall DA, Levy AR, Nichol G. Using and interpreting cost-effectiveness
acceptability curves: an example using data from a trial of management strategies
for atrial fibrillation. BMC Health Services Research 2006;6:52.
Garcia-Saenz JA, Martin M, Puente J, Lopez-Tarruella S, Casado A, Moreno F, et al.
Trastuzumab associated with successive cytotoxic therapies beyond disease
progression in metastatic breast cancer. Clin Breast Cancer 2005;:325-329.
Garrison LP, Jr., Lubeck D, Lalla D, Paton V, Dueck A, Perez EA. Cost-effectiveness
analysis of trastuzumab in the adjuvant setting for treatment of HER2-positive
breast cancer. Cancer 2007;110:489-498.
Geyer CE, Forster J, Lindquist D, Chan S, Romieu CG, Pienkowski T, et al. Lapatinib
plus capecitabine for HER2-positive advanced breast cancer. N Engl J Med.
2006;355:2733-2743.
Geyer CE, Martin A, Newstat B, Casey MA, Berger MS, Oliva CR, et al. Lapatinib (L)
plus capecitabine (C) in HER2+ advanced breast cancer (ABC): Genomic and
updated efficacy data [abstract]. J Clin Oncol. 2007;25:18S.
Gold MR, ed, Siegel JE, ed, Russell LB, ed, Weinstein MC, ed. Cost-Effectiveness in
Health and Medicine. New York, NY, Oxford University Press, 1996.
Lamers LM, Stupp R, van den Bent MJ, Al MJ, Gorlia T, Wasserfallen JB, et al. Cost-
effectiveness of temozolomide for the treatment of newly diagnosed glioblastoma
multiforme: a report from the EORTC 26981/22981 NCI-C CE3 Intergroup
Study. Cancer 2008;112:1337-1344.
McLachlan SA, Pintilie M, Tannock IF. Third line chemotherapy in patients with
metastatic breast cancer: an evaluation of quality of life and cost. Breast Cancer
Res Treat 1999;54:213-23.
Morabito A, Longo R, Gattuso D, Carillio G, Massaccesi C, Mariani L, et al.
Trastuzumab in combination with gemcitabine and vinorelbine as second-line
therapy for HER-2/neu overexpressing metastatic breast cancer. Oncol Rep
2006;16:393-398.
Muss HB. Targeted therapy for metastatic breast cancer. N Engl J Med. 2006;355:2783-
2785.
National Institute for Health and Clinical Excellence. Breast cancer (advanced or
metastatic) – lapatinib: Appraisal consultation document. Available at URL:
http://www.nice.org.uk/guidance/index.jsp?action=article&o=39849 [accessed:
May 15, 2008].
Pelletier EM, Shim B, Goodman S, Amonkar MM. Epidemiology and economic burden
of brain metastases among patients with primary breast cancer: results from a US
claims data analysis. Breast Cancer Res Treat 2008;108:297-305.
Ries LAG, Melbert D, Krapcho M, et al., eds. SEER Cancer Statistics Review, 1975-
2004, National Cancer Institute. Bethesda, MD,
http://seer.cancer.gov/csr/1975_2004/, based on November 2006 SEER data
submission, posted to the SEER Website, 2007.
Shah A, Maroun J, Dranitsaris G. The cost of hospitalization secondary to severe
chemotherapy induced diarrhea (CID) in patients with colorectal cancer [abstract].
J Clin Oncol. 2004;22:14S.
Slamon DJ, Clark GM, Wong SG, Levin WJ, Ullrich A, McGuire WL. Human breast
cancer: correlation of relapse and survival with amplification of the HER-2/neu
oncogene. Science. 1987;235:177–182.
Ubel PA, Hirth RA, Chernew ME, Fendick, M. What is the price of life and why doesn't
it increase at the rate of inflation? Arch Intern Med 2003;163:1637-1641.
US Department of Labor: US Bureau of Labor Statistics. Available at URL:
http://www.bls.gov [accessed July 21, 2007].
CHAPTER 3: DETECTING BLOOD LABORATORY ERRORS USING A
BAYESIAN NETWORK: AN EVALUATION ON LIVER ENZYME TESTS
Chapter-Three Abstract
Objectives: To detect errors in blood laboratory results using a Bayesian network (BN)
and to compare results with an established frequency-pattern-based error-detection
method (LabRespond) and with a logistic regression model.
Methods: In Experiments 1 and 2, using a sample of 5,800 observations from the
National Health and Nutrition Examination Survey dataset, large, medium, and small
errors were randomly generated and introduced into the liver enzyme values (ALT,
AST, and LDH) of the dataset. Experiment 1 examined systematic errors, while
Experiment 2 investigated
random errors. The outcome of interest was the correct detection of liver enzymes as
"error" or "not error." With the BN, the outcome was predicted by exploiting
probabilistic relationships among AST, ALT, LDH, and gender. In addition to AST,
ALT, LDH, and gender, LabRespond required more information on related analytes to
achieve optimal prediction. We assessed performance by examining the area under the
receiver-operating characteristics curves using a 10-fold cross validation method, as well
as risk stratification tables.
Results: In Experiment 1, the BN significantly outperformed both LabRespond and
logistic regression in detecting large (both at p < 0.001), medium (p = 0.01 and p < 0.001,
respectively), and small (p = 0.03 and p = 0.05, respectively) systematic errors. In
Experiment 2, the BN performed significantly better than LabRespond and multinomial
logistic regression in detecting large (p = 0.04 and p < 0.001, respectively) and medium
(p = 0.05 and p < 0.001, respectively) random errors.
Conclusion: A Bayesian network detects errors better, and with less information, than
existing automated models, suggesting that Bayesian models may be an effective means
of reducing medical costs and improving patient safety.
Note: The final, definitive version of this paper has been published in Medical Decision
Making (OnlineFirst version) on August 20, 2010, by SAGE Publications, Inc. All rights
reserved.
Chapter-Three Background
Bayesian models are often used to prescribe treatment and guide diagnostic
decisions, but they have had less influence in evaluating the quality or validity of medical
data and information. One important source of medical information is a patient's clinical
laboratory results in the context of his or her clinical presentation. Approximately 60-
70% of the most important decisions on admission, discharge, and medications are based
on laboratory test results (Forsman, 1996). Errors in the clinical laboratory are
particularly problematic because they may lead to unnecessary further testing and
erroneous treatment decisions. Only recently has there been public awareness of the
financial and health costs associated with laboratory errors (Landro, 2006). Laboratory
errors come from a variety of sources, each with its own implications for patient safety.
Laboratory errors are estimated to occur in 0.1% to 1.0% of tests (Plebani & Carraro,
1997). Given an estimated 7 billion laboratory tests per year in the United States, this
equates to as many as 70 million laboratory errors annually. It is estimated that 6.4% of
erroneous laboratory results cause some harm to patients (Landro, 2006). Physicians
acting on erroneous information can cause adverse health consequences for patients and
can increase the cost of medical care by introducing inefficiency.
Currently, primary methods for detecting laboratory errors and validating patient
results include manual review of results by seasoned laboratory experts or by rule-based
autoverification systems. Autoverification systems estimate the believability of results
based on the internal consistency of the data and through delta checks (i.e. using previous
results from a patient to estimate the likelihood of error in the current results) (Boran et
al., 1996). Nevertheless, current strategies can be costly and/or limited. Laboratory
experts are effective at identifying errors, but they can become fatigued, are often
interrupted, or may simply make mistakes. While studies of rule-based autoverification systems have
suggested that they can save 40 to 80 technologist hours per week and achieve greater
accuracy compared with laboratory experts, they too have limitations (Crolla &
Westgard, 2003). Current systems implementing an algorithmic approach to detecting
laboratory errors are generally based on sets of simple logic rules (such as IF-THEN
statements combined with AND/OR conditions) (Clinical and Laboratory Standards
Institute: Autoverification of Clinical Laboratory Test Results Approved Guidelines –
AUTO 10-A, 2006). The major disadvantage of rule-based systems in the laboratory-
error context is that they cannot reason abductively, i.e., from evidence of an error to
belief in a hypothesis about the error.
Most recently, Oosterhuis, Ulenkate, & Goldschmidt (2000) developed a method, called
LabRespond, in which correlated laboratory tests are examined for frequency patterns.
In this approach, observed and expected patterns of laboratory analytes yield an
indicator of the likelihood of the data, which can then be used as an error threshold.
Oosterhuis et al. (2000) showed that LabRespond performs better than laboratory
experts in detecting synthetic errors, i.e., errors generated by the experimenters.
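As a concrete illustration of the rule-based approach, the sketch below (in Python, with made-up thresholds; real systems encode validated limits) shows a range rule and a delta check. Note that the output is a hard accept/hold decision with no probability attached, which is precisely the limitation on abductive reasoning discussed above.

```python
def delta_check(current, previous, max_relative_change=0.5):
    """Flag a result that deviates from the patient's previous value by more
    than a fixed fraction (a simple delta check; the 50% limit here is
    illustrative, not a validated threshold)."""
    if previous == 0:
        return current != 0
    return abs(current - previous) / abs(previous) > max_relative_change

def autoverify_alt(alt, previous_alt):
    """IF-THEN logic typical of rule-based autoverification: each rule either
    holds the result for manual review or releases it."""
    if alt < 0 or alt > 1000:            # implausible-range rule (illustrative)
        return "hold for review"
    if delta_check(alt, previous_alt):   # inconsistent with patient history
        return "hold for review"
    return "release"

print(autoverify_alt(24, 22))   # small change from the prior result: released
print(autoverify_alt(80, 22))   # large unexplained jump: held
```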
In this study, we propose a new approach using a Bayesian network to detect
synthetic laboratory errors in liver enzyme values and compare the performance of our
model with an established method for automatic error detection, LabRespond
(Oosterhuis et al., 2000), and with a predictive logistic regression model. Detecting
errors in liver enzyme tests is clinically important: false positive results may lead to
additional invasive procedures, while false negative results may leave liver disease
untreated, allowing further damage to the liver. We report on two experiments using
the National Health and Nutrition
Examination Survey (NHANES) dataset. Experiment 1 compares the Bayesian network
to LabRespond and to a logistic regression model in detecting systematic errors
(misestimations of true values which are persistent in direction and magnitude) in values
of the liver enzymes alanine transaminase (ALT), aspartate transaminase (AST), or lactate
dehydrogenase (LDH). Experiment 2 compares the predictive performance of our Bayesian
network to LabRespond and to a multinomial logistic regression model in detecting
random errors (misestimations of true values which are variable in direction and
magnitude) that produce variance in AST, ALT, and LDH results.
Qualitative Types of Laboratory Error
The clinical laboratory cycle, as shown in Figure 3.1, is a process in clinical
laboratories that both starts and ends with the clinician. The part of the cycle from when
the clinician orders the test to when the sample is received by the laboratory technicians
is called the pre-analytical phase. The analytical phase covers the portion of the cycle
from the laboratory technicians‘ receipt of the sample to when the results are released for
reporting to the ordering clinician. The post-analytical phase is the portion of the process
occurring after results have been released for reporting (Strylewicz, 2008).
Figure 3.1: Clinical Laboratory Process
Note: LIMS = Laboratory Information Management Systems
A recent review found high variability between laboratories, but estimated that
about two-thirds of these errors occur in the pre-analytical stage, one-sixth in the
analytical stage, and one-sixth in the post-analytical stage (Bonini et al., 2002). The
pre-analytical phase contains errors originating with the patient, such as wrong patient
identification, and errors due to improper sample collection and processing (Wiwanitkit,
2001). Analytical errors are those originating in the analytical sections of the clinical
laboratory and include analyzer errors, improper handling of samples, and data entry
errors (Witte et al., 1997). Post-analytical errors are those errors occurring after the
results have been released for reporting back to the clinician and may include
transcription or data-entry errors, excessive turn-around-time of results, or failure to
notify the clinician of an abnormal laboratory result (Stroobants, 2003). A careful
analysis of workflow and processes within these contexts may reveal "root causes" of
errors. However, because laboratories, clinics and hospitals represent numerous largely
independent entities, a cause of error at one location may not be a cause at another
(Hoelzel et al., 2004; Marcovina et al., 2007).
We approach the problem of laboratory errors from a different perspective. We
seek to evaluate the belief (i.e., the probability) that the value of an analyte is in error
within the context of the results of other analytes. Information on other analytes comes
at virtually no additional cost when it is already available in laboratory databases.
Analytic values are indicators of biological function, and errors represent exogenous
perturbations of these biological indicators that lead to unusual data patterns. Hence,
examining the value of an analyte in the context of other biological indicators can
influence one‘s belief that the value is in error. For example, measuring a high fasting
glucose and a low glycosylated hemoglobin should increase our belief in an error since
such a combination is unlikely.
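The glucose example can be made quantitative with Bayes' rule. The numbers below are hypothetical, chosen only to show the direction of the update: an unlikely combination of analytes sharply raises the belief in an error.

```python
# All probabilities here are illustrative assumptions, not estimates.
p_error = 0.01                 # prior belief that a result is in error
p_combo_given_error = 0.20     # discordant glucose/HbA1c pattern under an error
p_combo_given_ok = 0.002       # the pattern is rare when both results are correct

numerator = p_combo_given_error * p_error
evidence = numerator + p_combo_given_ok * (1 - p_error)
posterior = numerator / evidence
print(posterior)   # belief in error rises from 1% to roughly 50%
```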
This study evaluates detection methods for identifying systematic and random
errors that affect a single analyte. These are called "value errors" and represent an
important type of laboratory error. Value errors have no effect on our belief that any other
result is in error. Value errors may be due to instrument failure, miscalibration, or data
entry mistakes. A value error occurs, for example, when a technician enters an erroneous
cholesterol result of 250 mg/dl instead of an actual value of 150 mg/dl. As shown in
Figure 3.1, value errors most often occur at the pre-analytic and analytic stages (Bonini
et al., 2002). This study does not consider sample switching errors which occur when
samples from two patients are interchanged or a sample is wrongly identified. We
discuss these other types of error elsewhere (Strylewicz, 2008; Doctor & Strylewicz,
2010). Nor does this study address easily identifiable errors that are out of range from
typically sampled distributions (Jay & Provasek, 1993). These often occur during sample
processing. For example, hemolysis in a sample will cause the potassium result to be
significantly higher, often inconceivably high, while the alkaline phosphatase result will
be significantly lower than the true value. Such errors are easily identified by laboratory
staff and clinicians.
Chapter-Three Methodology
Data Source:
In the current study we utilized data from the National Health and Nutrition
Examination Survey (NHANES), which collected nutritional and health information for adults
and children across the United States (National Center for Health Statistics, Centers for Disease
Control – National Health and Nutrition Examination Survey website, accessed December 2007).
The program collected socioeconomic, demographic, dietary, and health-related information
through interviews, together with physical examinations, medical and dental assessments,
laboratory tests, and physiological measurements. In our analysis, we used data from the
2003-2004 survey years with
clinical laboratory values including aspartate transaminase (AST), alanine transaminase (ALT),
lactate dehydrogenase (LDH), gamma glutamyl transpeptidase (GGT), alkaline phosphatase
(ALP), total bilirubin (TBIL), and gender. We excluded patients missing any of these lab values,
leaving a total of 5,800 patients for our analysis.
The Bayesian Network:
The structure of our Bayesian network was built by exploiting the probabilistic
relationships among AST, ALT, LDH, and gender using training data sets (Figure 3.2).
The AST, ALT, and LDH enzymes are found in many body tissues, particularly in the
liver, and thus tend to rise together when liver tissue is damaged, for example by viral
hepatitis or acetaminophen overdose (Giannini et al., 2005). Moreover, studies
(Papatheodoridis et al., 2007; Sanfey, 2005) have also found an effect of gender on
liver enzyme levels. We used the "Deal" package for R version 1.2-33 to perform
structure and parameter learning to construct the Bayesian network with mixed
variables (Pearl, 2000; Bøttcher & Dethlefsen, 2003; Heckerman, 1995). The Bayesian
network was then imported into Hugin Researcher version 6.9 (2008) to infer the
probability that an error had occurred in each patient's liver enzyme values. An
overview of Bayesian networks with mixed variables is briefly given in Appendix A.
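A minimal sketch of the inference step is shown below in Python. For clarity it treats the analytes as conditionally independent univariate Gaussians given the error state, which is a naive simplification: the actual network learned with Deal also models the dependencies among ALT, AST, LDH, and gender, and every parameter below is made up purely for illustration.

```python
import math

def gaussian_pdf(x, mean, sd):
    """Univariate normal density."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def p_error_given_evidence(values, params, prior_error=0.5):
    """Posterior probability of error given analyte values. `values` maps
    analyte name -> observed value; `params[state][analyte]` holds
    (mean, sd) pairs that would be estimated from training data for each
    state in {'error', 'ok'}. Analytes are treated as conditionally
    independent given the error state (a naive simplification of the
    conditional-Gaussian network described in the text)."""
    like_err = prior_error
    like_ok = 1.0 - prior_error
    for name, x in values.items():
        m_e, s_e = params['error'][name]
        m_o, s_o = params['ok'][name]
        like_err *= gaussian_pdf(x, m_e, s_e)
        like_ok *= gaussian_pdf(x, m_o, s_o)
    return like_err / (like_err + like_ok)

# Illustrative parameters; means and SDs are invented for this sketch.
params = {
    'ok':    {'ALT': (24, 10), 'AST': (25, 10), 'LDH': (131, 38)},
    'error': {'ALT': (36, 20), 'AST': (25, 10), 'LDH': (131, 38)},
}
print(p_error_given_evidence({'ALT': 45, 'AST': 24, 'LDH': 130}, params))
```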
Figure 3.2: Bayesian network with mixed discrete (Gender: Male or Female, and Error:
Yes or No) and continuous (ALT, AST, and LDH measures) variables.
LabRespond ™ – A Benchmark Method:
To evaluate the predictive performance of our approach, we compared our results
to those of an established benchmark method. We selected "LabRespond," a
well-established method that has been previously validated (complete details of the
LabRespond algorithm are provided by Oosterhuis et al., 2000). LabRespond is an
automated patient validation system that uses statistical methods to estimate the
plausibility of observed clinical laboratory results. The algorithm uses patient
demographics and other analytes thought to co-vary with the analyte of interest for
error classification. To achieve optimal prediction of errors in the liver enzymes ALT,
AST, and LDH, LabRespond requires additional information from other analytes (GGT,
ALP, and total bilirubin) and gender (Oosterhuis et al., 2000).
Logistic Regression Model:
Logistic regression models the probability of an event as a logistic function of a
linear combination of predictor variables, and is one of the most common methods for
estimating probabilities from data. In experiment 1, the state of the dependent
variable (whether there is an error: yes or no) was determined from the probability
estimated by a logistic regression model with ALT, AST, LDH, ALP, GGT, TBIL, gender,
and the interaction terms between gender and these analytes as independent variables.
Additionally, to detect bi-directional random errors in experiment 2, we created a
multinomial logistic regression model, a direct extension of standard logistic regression
to conditions where the dependent variable has more than two unordered outcome
states (additive error, subtractive error, or no error).
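The two regression models can be sketched as follows (Python; the coefficients below are placeholders chosen for illustration, whereas in the study they were estimated from the training sets):

```python
import math

def logistic(eta):
    """Map a linear predictor to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-eta))

def softmax(etas):
    """Map one linear predictor per class to class probabilities."""
    m = max(etas)
    exps = [math.exp(e - m) for e in etas]
    total = sum(exps)
    return [e / total for e in exps]

# Experiment 1 (binary): P(error | covariates), with illustrative coefficients.
def p_error(alt, male):
    return logistic(-3.0 + 0.05 * alt + 0.4 * male)

# Experiment 2 (multinomial): three unordered outcomes, with 'no error' as
# the reference class (its linear predictor fixed at 0).
def outcome_probs(alt, male):
    eta_additive = -2.0 + 0.04 * alt + 0.3 * male
    eta_subtractive = -2.5 - 0.03 * alt + 0.2 * male
    p_none, p_add, p_sub = softmax([0.0, eta_additive, eta_subtractive])
    return {'no error': p_none, 'additive': p_add, 'subtractive': p_sub}
```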
Experiment 1:
The Bayesian network was evaluated against LabRespond and the logistic
regression model in detecting additive effects of systematic errors to the liver enzymes
ALT, AST, or LDH.
We introduced synthetic errors of different magnitudes into our dataset (n = 5,800
patients). Large, medium, and small systematic errors were defined as 75%, 50%, and
25% of the magnitude of the laboratory values. For example, if a large, medium, or
small systematic error was added to an analyte with a value of 20 units, its erroneous
value would be 35 units, 30 units, or 25 units, respectively. Errors were randomly
generated and added to 50% of the values of the liver enzymes ALT, AST, and LDH. If
an error was randomly added to a patient, it could occur in only one of ALT, AST, or
LDH. As a result, the average error rate added to each of ALT, AST, and LDH was
approximately 16.7%.
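The error-injection procedure of Experiment 1 can be sketched as follows (Python; the study generated errors with Visual Basic in Excel, so this is a re-expression, not the original code):

```python
import random

def inject_systematic_errors(patients, fraction=0.5, magnitude=0.5, seed=0):
    """Add a systematic error to exactly one of ALT, AST, or LDH in a random
    `fraction` of patients; `magnitude` is the error size relative to the true
    value (0.25, 0.50, or 0.75 for small, medium, large). Mutates the records
    in place and returns the error labels used as the reference standard."""
    rng = random.Random(seed)
    labels = []
    for patient in patients:
        if rng.random() < fraction:
            analyte = rng.choice(['ALT', 'AST', 'LDH'])
            patient[analyte] *= 1.0 + magnitude
            labels.append(analyte)   # this record carries an error
        else:
            labels.append(None)      # error-free record
    return labels
```

With `magnitude=0.5`, an ALT of 20 U/L becomes 30 U/L, matching the medium-error example above; on average each analyte receives an error in about one-sixth of the records.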
Experiment 2:
We evaluated the performance of the Bayesian network compared to LabRespond
and the multinomial logistic regression model in detecting random errors (including both
additive and subtractive errors) added to the liver enzymes ALT, AST, or LDH.
The distributions of the ALT, AST, and LDH results in our dataset were examined
and all were found to be right-skewed. In addition, the normal ranges for ALT, AST,
and LDH in adults are 10 to 40 U/L, 8 to 40 U/L, and 90 to 280 U/L, respectively (Wu,
2006). As a result, simply introducing subtractive errors (one component of random
errors) would make them obvious to detect. For example, the means for ALT, AST, and
LDH in our dataset are 24 U/L (range 5 U/L to 1997 U/L), 25 U/L (7 U/L to 1672 U/L),
and 131 U/L (42 U/L to 1292 U/L), respectively. If a subtractive random error of 75%
were introduced into the dataset, the erroneous values would be 6 U/L, 6 U/L, and
34 U/L for ALT, AST, and LDH, respectively. Because these erroneous values fall below
the normal ranges, it would be obvious that an error had occurred. We therefore used
a different method that randomly generates errors based on the distribution percentiles
of ALT, AST, and LDH, keeping erroneous values within their corresponding
distributions.
First, we transformed all measures of ALT, AST, and LDH to percentiles based
on their distributions in the dataset. For example, the means of ALT (24 U/L), AST
(25 U/L), and LDH (131 U/L) in the dataset are at the 68th, 63rd, and 57th percentiles
of their corresponding distributions. Next, the distributions of ALT, AST, and LDH were
divided into quartiles. We defined a small, medium, or large random error as 25, 50, or
75 percentile points of the corresponding distribution. These values were added to or
subtracted from the laboratory values, and all distribution percentiles of ALT, AST, and
LDH were then converted back into laboratory values. For example, adding or
subtracting a small random error to an ALT value of 24 U/L (the 68th percentile of ALT)
would produce an erroneous ALT value of 43 U/L or 19 U/L (the 93rd or 43rd percentile
of ALT, respectively).
A small random error was introduced to ALT, AST, or LDH using the following
approach. If a randomly selected analyte had a value in the second or third quartile of
its distribution, 25 percentile points were randomly added to or subtracted from the
analyte's value. If the value was in the first or fourth quartile, 25 percentile points were
added or subtracted, respectively. For medium random errors, 50 percentile points
were added or subtracted if the value was in the first half (first and second quartiles)
or second half (third and fourth quartiles) of the distribution, respectively. To generate
large random errors, 75 percentile points were added or subtracted if the value was in
the first or fourth quartile, respectively; 50 percentile points (a medium random error)
were added or subtracted if the value was in the second or third quartile, respectively,
so that all observations had an equal probability of receiving an error. Thus, erroneous
values were kept within observable limits when an error was introduced.
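The percentile-shifting rules above can be summarized in code (a Python sketch of the procedure, not the authors' Visual Basic implementation; the empirical-percentile helpers are simplified):

```python
import bisect
import random

def to_percentile(x, sorted_values):
    """Empirical percentile (0-100) of x within a training distribution."""
    return 100.0 * bisect.bisect_left(sorted_values, x) / len(sorted_values)

def from_percentile(p, sorted_values):
    """Map a percentile back to a laboratory value (inverse empirical CDF)."""
    p = min(max(p, 0.0), 100.0)
    i = min(int(p / 100.0 * len(sorted_values)), len(sorted_values) - 1)
    return sorted_values[i]

def perturb_percentile(p, size, rng):
    """Shift a percentile by `size` points (25, 50, or 75) following the rules
    in the text, so the result always stays within [0, 100]."""
    if size == 25:
        if 25 <= p < 75:                     # middle quartiles: either direction
            return p + rng.choice([-25, 25])
        return p + 25 if p < 25 else p - 25  # edge quartiles: shift inward
    if size == 50:
        return p + 50 if p < 50 else p - 50  # first half up, second half down
    if size == 75:
        if p < 25:
            return p + 75
        if p >= 75:
            return p - 75
        return p + 50 if p < 50 else p - 50  # middle quartiles get a medium shift
    raise ValueError("size must be 25, 50, or 75")
```

For an ALT at the 68th percentile, a small error moves it to the 93rd or 43rd percentile, reproducing the 43 U/L / 19 U/L example above.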
Data and Statistical Analysis:
We evaluated the predictive performance of the Bayesian network, LabRespond,
and the logistic regression models using a 10-fold cross validation method, which is
often recommended to reduce variability and prevent overfitting (Vickers et al., 2008;
Tourassi & Floyd, 1997). The dataset was randomly divided into 10 equal subsets;
each subset in turn served as the test set for validation while the remaining 9 subsets
were used as the training set for learning the structure and parameters of the Bayesian
network. This process was repeated 10 times, once for each subset. Predictive
accuracy was measured for each subset serving as a test set, and the median of the
performance indicators was taken as the overall predictive performance of the model.
Systematic and random errors were generated in a Microsoft Excel spreadsheet using
the Visual Basic programming language.
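The validation loop can be sketched as follows (Python; `fit` and `evaluate` are stand-ins for Bayesian-network learning and AUC computation, and the fold construction is our own illustration):

```python
import random

def ten_fold_indices(n, seed=0):
    """Randomly partition record indices into 10 near-equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::10] for i in range(10)]

def cross_validate_median(records, fit, evaluate, seed=0):
    """10-fold cross validation: each fold serves once as the test set while
    the other nine folds form the training set; the median of the 10 fold
    scores is reported, as in the text (here the upper middle of 10 values)."""
    folds = ten_fold_indices(len(records), seed)
    scores = []
    for k in range(10):
        test = [records[i] for i in folds[k]]
        train = [records[i] for j in range(10) if j != k for i in folds[j]]
        scores.append(evaluate(fit(train), test))
    return sorted(scores)[len(scores) // 2]
```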
To estimate the predictive accuracy of a probabilistic model, we computed the
area under the receiver operating characteristic curve (ROC) for each model. The ROC
curve plots true positive rate (sensitivity) against false positive rate (1 – specificity). The
area under the ROC curve (AUC) specifies an overall measurement of performance of a
model, with an area of 0.5 representing no discriminating ability and 1.0 showing perfect
discrimination. A z-score test was used to compare the AUCs of the models. Because
the ROC curves in our study were derived from the same dataset, the standard error
(SE) of the difference between two AUCs needs to take into account the correlation
between the two (Hanley & McNeil, 1983). The z statistic is calculated as:

z = (AUC₁ − AUC₂) / √(SE₁² + SE₂² − 2·r·SE₁·SE₂);

where r is a correlation coefficient, which can be obtained from Table 1 of the paper by
Hanley & McNeil (1983).
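In code, the test reads as follows (Python; the value of r below is an assumed illustration, whereas the study read r from Hanley & McNeil's table):

```python
import math

def z_correlated_aucs(auc1, se1, auc2, se2, r):
    """z statistic for comparing two AUCs derived from the same cases
    (Hanley & McNeil, 1983); r is the correlation between the two areas."""
    return (auc1 - auc2) / math.sqrt(se1 ** 2 + se2 ** 2 - 2.0 * r * se1 * se2)

# Example using the large systematic-error AUCs from Table 3.2 with an
# assumed r = 0.5 (not the value used in the study):
z = z_correlated_aucs(0.872, 0.015, 0.799, 0.018, 0.5)
print(z)
```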
In addition to the ROC curve analysis, the classification accuracy (true positives,
false positives, true negatives, false negatives, sensitivity, specificity, accuracy) of each
model was evaluated. These statistics, however, require a specific and validated
classification threshold (Janes et al., 2008). Oosterhuis et al. (2000) defined a
threshold of less than 5% for the post-test plausibility of LabRespond, which
corresponds to a specific true positive (correct rejection) and false positive (incorrect
rejection) point along its ROC curve. To compare classification accuracy among the
models, we computed the false positive rate [= false positives / (false positives + true
negatives)] of LabRespond based on the 5% cutoff point for post-test plausibility and
then used the ROC curve to select a classification threshold for the Bayesian network
and the logistic regression that yielded the same false positive rate (FPR).
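Matching the models at a common false positive rate can be done directly from the scores of the error-free cases (a Python sketch; the function name and the "classify as error when score >= threshold" decision rule are our own conventions):

```python
def threshold_for_fpr(scores, labels, target_fpr):
    """Choose a threshold so that classifying 'error' when score >= threshold
    yields approximately the target false positive rate. `scores` are
    predicted error probabilities; `labels` are 1 for true errors and 0 for
    error-free results. Assumes distinct negative scores for exactness."""
    negatives = sorted((s for s, y in zip(scores, labels) if y == 0),
                       reverse=True)
    if not negatives:
        raise ValueError("no error-free cases to set the FPR on")
    k = int(target_fpr * len(negatives))   # negatives allowed above threshold
    if k == 0:
        return negatives[0] + 1e-9         # above every negative score
    return negatives[k - 1]                # k-th highest negative score
```

Applying the same procedure to each model's scores puts all models at LabRespond's operating point before their sensitivities are compared.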
All statistical analyses were conducted with the statistical software package R
version 2.8.1 (R: A Language and Environment for Statistical Computing - R Foundation
for Statistical Computing, 2008) and Microsoft Excel using Visual Basic programming
language.
Chapter-Three Results
The typical participant was 42.3 years of age, female (51.2%), Caucasian, and had
more than a high school education. The means (standard deviations) of ALT, AST, and
LDH were 23.6 U/L (30.9 U/L), 25.6 U/L (27.3 U/L), and 130.9 U/L (37.9 U/L),
respectively. The percentages of values below and above the normal ranges for ALT,
AST, and LDH were 1.1% and 7.5%, 0% and 4.6%, and 3.2% and 0.3%, respectively.
Using logistic regression with stepwise selection and backward elimination
among 13 potential variables (6 liver enzymes and related analytes: ALT, AST, LDH,
ALP, GGT, TBIL; gender; and 6 interaction terms between gender and the analytes)
and entering all variables significant at an α-level of 0.05, we identified 10 and 12
independent correlates of error detection for the liver enzymes ALT, AST, and LDH
using the training data sets from Experiments 1 and 2, respectively. Note that the
models resulting from the automated stepwise selection produced results similar to the
non-parsimonious model (the model with all 13 variables).
Experiment 1:
Table 3.1 shows the classification accuracy of the Bayesian network, LabRespond
and logistic regression model in detecting large, medium, and small systematic errors
when the FPRs of the Bayesian network and logistic regression model were set equal
to LabRespond's FPR. The Bayesian network had significantly higher sensitivities (error
recovery rates) and accuracies than LabRespond and logistic regression at large
systematic error: 75% versus 52.1% and 39.0% (both at p < 0.001), and 82.2% versus
70.5% and 64.0% (both at p < 0.001); and at medium systematic error 55.5% versus
35.2% and 33.1% (both at p < 0.001), and 71.0% versus 60.9% and 59.8% (both at p <
0.001), respectively. For small systematic error, sensitivity was significantly higher for
the Bayesian network than LabRespond (p < 0.05) (Table 3.2).
Table 3.2 also presents the median AUCs over 10-fold cross validation processes
of the Bayesian network, LabRespond, and logistic regression model in detecting large,
medium, and small systematic errors of ALT, AST, and LDH. The Bayesian network
significantly outperformed both LabRespond and the logistic regression model in
detecting large (both at p < 0.001), medium (p = 0.01 and p < 0.001, respectively), and
small (p = 0.03 and, p = 0.05, respectively) systematic errors. Figure 3.3 compares the
ROC curves of the Bayesian network against LabRespond and the logistic regression
model in classifying large, medium, and small systematic errors.
Table 3.1: Classification accuracy for systematic errors

                             Bayesian Network        LabRespond              Logistic Regression
                           Not Error  Error  Total  Not Error  Error  Total  Not Error  Error  Total
Large Systematic Errors:
  Not Error                   258      32     290      258      32     290      258      32     290
  Error                        71     219     290      139     151     290      177     113     290
Medium Systematic Errors:
  Not Error                   251      39     290      251      39     290      251      39     290
  Error                       129     161     290      188     102     290      194      96     290
Small Systematic Errors:
  Not Error                   252      38     290      252      38     290      252      38     290
  Error                       211      79     290      234      56     290      213      77     290

Notes: The Bayesian network and logistic regression model are set at the same specificity rates (1 – false positive rate) as LabRespond, which
corresponds to its validated threshold of 5% for the estimated post-test plausibility.
Table 3.2: Median AUCs and standard errors (SE), sensitivity (%), and accuracy (%) of the predictive models in 10-fold
cross validation processes (n = 5,800) in detecting large, medium, and small systematic errors

Error Size                  Bayesian Network   LabRespond         Logistic Regression
Large Systematic Errors
  AUC (SE)                  0.872 (0.015)      0.799 (0.018)*     0.731 (0.021)*
  Sensitivity (TPR)         75.0%              52.1%*             39.0%*
  Accuracy (TP + TN)        82.2%              70.5%*             64.0%*
Medium Systematic Errors
  AUC (SE)                  0.783 (0.019)      0.732 (0.021)†     0.689 (0.022)*
  Sensitivity (TPR)         55.5%              35.2%*             33.1%*
  Accuracy (TP + TN)        71.0%              60.9%*             59.8%*
Small Systematic Errors
  AUC (SE)                  0.643 (0.023)      0.596 (0.024)‡     0.612 (0.023)§
  Sensitivity (TPR)         27.2%              19.3%‡             26.6%
  Accuracy (TP + TN)        57.1%              53.1%              56.7%

Notes: The Bayesian network and logistic regression model are set at the same specificity rates (1 – false positive rate) as LabRespond, which
corresponds to its validated threshold of 5% for the estimated post-test plausibility. TPR: true positive rate; TP: true positive; TN: true negative.
* Bayesian network performs better at p < 0.001; † Bayesian network performs better at p = 0.01; ‡ Bayesian network performs better at p < 0.05;
§ Bayesian network performs better at p = 0.05.
Figure 3.3: ROC curves compare the predictive performance in detecting systematic errors of the Bayesian network against
LabRespond and the logistic regression model
Experiment 2:
Table 3.3 shows the classification accuracy of the Bayesian network, LabRespond
and multinomial logistic regression model in detecting large, medium, and small random
errors when setting the FPRs for the Bayesian network and multinomial logistic
regression model equal to LabRespond's FPR. As with systematic errors, the Bayesian
network had significantly higher sensitivities and accuracies than LabRespond and
multinomial logistic regression at large random error 54.1% versus 35.5% and 22.1%
(both at p < 0.001), and 70.9% versus 61.6% and 54.8% (both at p < 0.001); and at
medium random error 45.5% versus 24.5% and 19.0% (both at p < 0.001), and 64.5%
versus 54.0% and 51.2% (both at p < 0.001), respectively. For small random error, the
sensitivity of the Bayesian network was significantly higher than that of the multinomial
logistic regression model (p = 0.01) (Table 3.4).
Table 3.4 presents the median AUCs over 10-fold cross validation processes of
the Bayesian network, LabRespond and multinomial logistic regression model for large,
medium, and small random errors. The Bayesian network again performed significantly
better than LabRespond and the multinomial logistic regression model in detecting large
(p = 0.04 and p < 0.001, respectively) and medium (p = 0.05 and p < 0.001, respectively)
random errors. However, the differences in predictive performance between the
Bayesian network and both LabRespond and the multinomial logistic regression model
in detecting small random errors were not statistically significant. Figure 3.4 compares
the ROC curves of the
Bayesian network against LabRespond and the multinomial logistic regression model in
all large, medium, and small random errors.
Table 3.3: Classification accuracy for random errors

                             Bayesian Network        LabRespond              Multinomial Logistic Regression
                           Not Error  Error  Total  Not Error  Error  Total  Not Error  Error  Total
Large Random Errors:
  Not Error                   254      36     290      254      36     290      254      36     290
  Error                       133     157     290      187     103     290      226      64     290
Medium Random Errors:
  Not Error                   242      48     290      242      48     290      242      48     290
  Error                       158     132     290      219      71     290      235      55     290
Small Random Errors:
  Not Error                   248      42     290      248      42     290      248      42     290
  Error                       216      74     290      225      65     290      242      48     290

Notes: The Bayesian network and multinomial logistic regression model are set at the same specificity rates (1 – false positive rate) as LabRespond,
which corresponds to its validated threshold of 5% for the estimated post-test plausibility.
Table 3.4: Median AUCs and standard errors (SE), sensitivity (%), and accuracy (%) of the predictive models in 10-fold
cross validation processes (n = 5,800) in detecting large, medium, and small random errors

Error Size                  Bayesian Network   LabRespond         Multinomial Logistic Regression
Large Random Errors
  AUC (SE)                  0.749 (0.020)      0.701 (0.022)‡     0.594 (0.024)*
  Sensitivity (TPR)         54.1%              35.5%*             22.1%*
  Accuracy (TP + TN)        70.9%              61.6%*             54.8%*
Medium Random Errors
  AUC (SE)                  0.675 (0.022)      0.627 (0.023)§     0.574 (0.024)*
  Sensitivity (TPR)         45.5%              24.5%*             19.0%*
  Accuracy (TP + TN)        64.5%              54.0%*             51.2%*
Small Random Errors
  AUC (SE)                  0.570 (0.024)      0.524 (0.024)      0.539 (0.024)
  Sensitivity (TPR)         25.5%              22.4%              16.6%‡
  Accuracy (TP + TN)        55.5%              54.0%              51.0%

Notes: The Bayesian network and multinomial logistic regression model are set at the same specificity rates (1 – false positive rate) as LabRespond,
which corresponds to its validated threshold of 5% for the estimated post-test plausibility. TPR: true positive rate; TP: true positive; TN: true
negative. * Bayesian network performs better at p < 0.001; ‡ Bayesian network performs better at p < 0.05; § Bayesian network performs better at
p = 0.05.
Figure 3.4: ROC curves compare the predictive performance in detecting random errors of the Bayesian network against
LabRespond and the multinomial logistic regression model
Chapter-Three Sensitivity Analyses
In addition to the magnitude of the error, we further performed sensitivity
analyses on error rates and distributions. The error rates were implemented at 3%, 10%,
and 50% of specimens. Note that by the aforementioned process, there is an equal
probability of each analyte value being assigned an error. We found that different error
rates produced similar results. For error distributions, we divided the analytes (ALT,
AST, and LDH) into 4 quartiles (0 to 25 percentile, > 25 to 50 percentile, > 50 to 75
percentile, and > 75 percentile), and then introduced medium errors (50% of the magnitude
of the analytes) to each quartile. Table 3.5 presents the median AUCs over 10-fold cross
validation processes of the Bayesian network, LabRespond, and logistic regression model
in detecting a medium error introduced at each quartile. The Bayesian network
significantly outperformed both LabRespond and the logistic regression model when the
error distribution was at the third quartile (p < 0.05 and p < 0.001, respectively) and at the
fourth quartile (both at p < 0.001). Moreover, the Bayesian network also performed better
than the logistic regression model when the error distribution was at the second quartile
(p < 0.001).
Table 3.5: Median AUCs and standard errors (SE) of the predictive models in 10-fold
cross validation processes (n = 5,800) in detecting errors at different error distributions

Error Distributions      Bayesian Network   LabRespond        Logistic Regression
At First Quartile        0.618 (0.037)      0.581 (0.037)     0.607 (0.037)
At Second Quartile       0.762 (0.034)      0.729 (0.035)     0.625 (0.037)*
At Third Quartile        0.879 (0.027)      0.794 (0.033)‡    0.682 (0.036)*
At Fourth Quartile       0.958 (0.017)      0.815 (0.031)*    0.866 (0.028)*

Notes: * Bayesian network performs better at p < 0.001; ‡ Bayesian network performs better at p < 0.05.
Chapter-Three Discussion
We investigated value errors that affect one analyte‘s result and have no effect on
our belief that any other result is in error. Examples of value errors include instrument
failure, miscalibration, or data entry errors. We found that a Bayesian network
significantly outperforms other methods (LabRespond and logistic regression) in
detecting large, medium, and small value errors. As expected, the Bayesian network
performed much better at detecting clinically significant errors (i.e., those of higher
magnitude).
While area under the ROC curve was employed to evaluate overall predictive
performance of the models, it was also necessary to examine the ROC curves to evaluate
the performance of the Bayesian network over different error thresholds. Figures 3.3 and
3.4 clearly show that the Bayesian network consistently performs better than LabRespond
and the logistic regression model overall and at low and moderate false positive rates,
but performs worse at high false positive rates when large systematic and random
errors are introduced. In clinical practice, however, such high false positive rates
would not be acceptable, as they would result in more frequent re-analysis of correct
results. Therefore, at operational false positive rates (normally set at 5%), the
Bayesian network consistently performs better than both LabRespond and logistic
regression.
There are several limitations to the current study. First, any forecasting method
requires a reference standard. Since this was a retrospective analysis of a published
dataset presumed to be free of errors, we used synthetic errors to evaluate the
predictive performance of the models. The performance of these algorithms in
identifying naturally occurring errors is therefore still unknown. However, our synthetic
errors spanned different types, magnitudes, rates, and distributions to simulate many
kinds of naturally occurring errors, and the Bayesian network's performance was robust
to these variations. Second, this work was conducted on a nationally representative
sample of blood lab analytes (the NHANES dataset); thus, different results may apply
in clinical sub-populations (patients with diabetes, patients currently on
cholesterol-lowering medications, Hepatitis-B carriers, etc.). Third, while the Bayesian
network performs very well at detecting errors
with large magnitude, as do all other methods (laboratory experts, LabRespond, logistic
regression model, or other existing rule-based autoverification systems), it is not as adept
at detecting errors that are small in magnitude. Fortunately, such errors would potentially
have less clinical impact as they would be less likely to change clinical care decisions.
Fourth, the NHANES data did not contain results of previous liver enzyme tests, which
are an important predictor in the LabRespond algorithm; thus its predictive
performance could not be fully assessed under optimal conditions. A study using a
dataset containing results of prior laboratory tests is a topic for future work.
The Bayesian networks described herein are appropriate when there is observed
co-variation between two or more analytes. Causal relationships among certain blood
analytes make them appropriate for Bayesian network error evaluation because such
relationships lead to co-variation. Analytes may also be appropriate for Bayesian
network error analysis when they are not causally related but share co-variation with
an unobserved variable such as disease status, an unmeasured analyte, or a
combination thereof. In addition to glucose and glycosylated hemoglobin (Strylewicz,
2008; Doctor & Strylewicz, 2010) and the liver function tests described in this paper, a
Bayesian approach to error classification may be useful for common diabetes panels
including c-peptide and insulin, and for portions of the complete blood count. For
certain analytes, a Bayesian analysis may not be appropriate. Examples include
analytes with highly skewed distributions, such as GAD65 auto-antibodies, or analytes
that are evaluated in the absence of other related analytes (e.g., HIV testing in the
absence of a white cell count).
As previously discussed, the capability of abductive reasoning is the most
advanced feature of Bayesian networks. Abductive reasoning is a process of inference
that produces a hypothesis; in the case of Bayesian networks, this is a degree of belief in
a hypothesis. With respect to the detection of laboratory errors using Bayesian
networks, abductive reasoning yields a probability of error given information about a
patient's laboratory values. Other approaches are not capable of producing such a result
within the calculus of probability theory. More importantly, because the hallmark
characteristic of most (if not all) other types of error is a deviation of the analytic value
from its true value, it is possible that a well-constructed Bayesian network would identify
such errors. Our study shows that Bayesian networks may have promise in future
autoverification systems. Future studies should validate the Bayesian network using
samples from different populations and determine which types of systems are most
cost-effective as an autoverification system.
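As a minimal illustration of the abductive step described above (Python; all numbers are hypothetical and not values from the study), Bayes' rule converts a prior error rate and two likelihoods into a posterior probability of error given an observed result pattern:

```python
def posterior_error(prior_error, lik_given_error, lik_given_ok):
    """P(error | observation) by Bayes' rule, for a binary error state.
    lik_given_error / lik_given_ok: probability of the observed analyte
    pattern under an error vs. under an error-free sample."""
    num = prior_error * lik_given_error
    return num / (num + (1.0 - prior_error) * lik_given_ok)

# A pattern that is common under errors but rare otherwise yields a high
# posterior even when errors themselves are rare (hypothetical numbers).
p = posterior_error(prior_error=0.01, lik_given_error=0.60, lik_given_ok=0.002)
```

Here the posterior probability of error is about 0.75 despite a 1% prior, which is the sense in which a discordant analyte pattern can raise a strong degree of belief in an error.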
Chapter-Three References
Bonini P, Plebani M, Ceriotti F, Rubboli F. Errors in laboratory medicine. Clin Chem
2002;48:691-8.
Boran G, Given P, O'Moore R. Patient result validation services. Comput Methods
Programs Biomed 1996;50:161-8.
Bøttcher SG. Learning Bayesian networks with mixed variables [dissertation]. Denmark:
Aalborg University; 2004.
Bøttcher SG, Dethlefsen C. DEAL: A package for learning Bayesian networks. J Stat
Software 2003;8.
Bøttcher SG, Dethlefsen C. Learning Bayesian networks with R. In: Hornik K, Leisch F,
Zeileis, editors. DSC 2003: Proceedings of the 3rd International Workshop on
Distributed Statistical Computing; 2003 Mar 20-22; Vienna, Austria.
Clinical and Laboratory Standards Institute. Autoverification of Clinical Laboratory Test
Results, Approved Guidelines (AUTO 10-A). Wayne, Pa.: Clinical and
Laboratory Standards Institute, 2006.
Crolla LJ, Westgard JO. Evaluation of rule-based autoverification protocols. Clin
Leadersh Manag Rev 2003;17:268-72.
Doctor JN, Strylewicz GB. Detecting 'Wrong Blood in Tube' Errors: Evaluation of a
Bayesian Network Approach. Artif Intell Med, 2010. [Epub ahead of print].
Forsman RW. Why is the laboratory an afterthought for managed care organizations?
Clin Chem 1996;42:813-6.
Giannini EG, Testa R, Savarino V. Liver enzyme alteration: a guide for clinicians. CMAJ
2005;172:367-79.
Hanley JA, McNeil BJ. A method of comparing the areas under receiver operating
characteristic curves derived from the same cases. Radiology 1983;148:839-43.
Heckerman D, Geiger D. Learning Bayesian networks. Microsoft Research; 1995. Report
No.: MSR-TR-95-02.
Heckerman D. A Tutorial on Learning Bayesian Networks. Microsoft Research; 1996.
Report No.: MSR-TR-95-06.
Hoelzel W, Weykamp C, Jeppsson JO, Miedema K, Barr JR, Goodall I, et al. IFCC
reference system for measurement of hemoglobin A1c in human blood and the
national standardization schemes in the United States, Japan, and Sweden: a
method-comparison study. Clin Chem 2004;50:166-174.
Hugin Expert A/S. Hugin Researcher Version 6.9. Computer software. Aalborg,
Denmark: Hugin Expert A/S; 2008.
Janes H, Pepe MS, Gu W. Assessing the Value of Risk Predictions by Using Risk
Stratification Tables. Ann Intern Med 2008;149:751-60.
Jay, DW, Provasek D. Characterization and mathematical correction of hemolysis
interference in selected Hitachi 717 assays. Clin Chem 1993;39:1804-10.
Landro L. The Informed Patient: Hospitals Move to Cut Dangerous Lab Errors. The Wall
Street Journal 2006. New York: D1.
Marcovina S, Bowsher RR, Miller G, Staten M, Myers G, Caudill SP, et al.
Standardization of insulin immunoassays: report of the American Diabetes
Association's Workgroup. Clin Chem 2007;53:711-716.
National Center for Health Statistics, Centers for Disease Control. National Health and
Nutrition Examination Survey. Data Sets and Related Documentation. Available
from http://www.cdc.gov/nchs/about/major/nhanes/datalink.htm. Accessed
December 2007.
Oosterhuis WP, Ulenkate HJ, Goldschmidt HM. Evaluation of LabRespond, a new
automated validation system for clinical laboratory test results. Clin Chem
2000;46:1811-7.
Papatheodoridis GV, Goulis J, Christodoulou D, et al. High prevalence of elevated liver
enzymes in blood donors: associations with male gender and central adiposity.
Eur J Gastroenterol Hepatol 2007;19:281-7.
Pearl J. Causality: models, reasoning, and inference. New York: Cambridge University
Press; 2000.
Plebani M, Carraro P. Mistakes in a stat laboratory: types and frequency. Clin Chem
1997;43:1348-51.
R Development Core Team. In: R: A Language and Environment for Statistical
Computing. R Foundation for Statistical Computing, Vienna, Austria; 2008.
Available from: http://www.R-project.org.
Stroobants AK, Goldschmidt HM, Plebani M. Error budget calculations in laboratory
medicine: linking the concepts of biological variation and allowable medical
errors. Clin Chim Acta 2003;333:169-76.
Strylewicz GB. Errors in the Clinical Laboratory: A Novel Approach to Autoverification
[dissertation]. Seattle (WA): University of Washington; 2007.
Sanfey H. Gender-specific issues in liver and kidney failure and transplantation: a review.
J Womens Health (Larchmt) 2005;14:617-26.
Tourassi GD, Floyd CE. The effects of data sampling on the performance evaluation of
artificial neural networks in medical diagnosis. Med Decis Making 1997;17:186-
92.
Vickers AJ, Cronin AM, Elkin EB, Gonen M. Extensions to decision curve analysis, a
novel method for evaluating diagnostic tests, prediction models and molecular
markers. BMC Med Inform Decis Mak 2008;8:53.
Witte DL, VanNess SA, Angstadt DS, Pennell BJ. Errors, mistakes, blunders, outliers, or
unacceptable results: how many? Clin Chem 1997;43:1352-6.
Wiwanitkit V. Types and frequency of preanalytical mistakes in the first Thai ISO
9002:1994 certified clinical laboratory, a 6-month monitoring. BMC Clin Pathol
2001;1:5.
Wu AH, ed. Tietz clinical guide to laboratory tests. St Louis, Missouri: Saunders, 2006.
CHAPTER 4: PROBABILISTIC MAPPING OF DESCRIPTIVE RESPONSES IN
HEALTH STATUS ONTO HEALTH STATE UTILITIES USING BAYESIAN
NETWORKS: AN EMPIRICAL ANALYSIS CONVERTING SF-12 INTO EQ-5D
UTILITY INDEX IN A NATIONAL U.S. SAMPLE
Chapter-Four Abstract
Background: As quality-adjusted life years (QALY) have become the standard metric in
health economic evaluations, mapping health-profile or disease-specific measures onto
preference-based measures to obtain QALYs has become a solution when health utilities
are not directly available. However, current mapping methods are limited due to their
predictive validity, reliability and/or other methodological issues.
Objectives: We employ probability theory together with a graphical model, called a
Bayesian network, to convert health-profile measures into preference-based measures and
to compare the results to those estimated with current mapping methods.
Methods: A sample of 19,678 adults who completed both the SF-12v2 and EQ-5D
questionnaires from the 2003 Medical Expenditure Panel Survey was split into
development and validation sets. Bayesian networks were constructed to explore the
probabilistic relationships between each EQ-5D domain and 12 items of the SF-12v2.
The EQ-5D utility scores were estimated based on the predicted probability for each
response-level of the five EQ-5D domains obtained from the Bayesian inference process
[Footnote: This is a non-final version of an article to be published in final form in the
MEDICAL CARE journal by Lippincott Williams & Wilkins, Inc. All rights reserved.]
using the following methods: Monte-Carlo simulation, expected utility, and most-likely
probability. Results were then compared with current mapping methods including
multinomial logistic regression, ordinary least squares, and censored least absolute
deviations.
Results: The Bayesian networks consistently outperformed other mapping models in
different age groups, number of chronic conditions, ranges of the EQ-5D index, and in
the overall sample using both the U.K. population-based EQ-5D scoring system (MAE =
0.077, MSE = 0.013, and overall R² = 0.802) and the U.S. population-based EQ-5D
scoring system (MAE = 0.058, MSE = 0.007, and overall R² = 0.787).
Conclusion: Bayesian networks provide a new robust and natural approach to map health
status responses into health utility measures for health economic evaluations.
Chapter-Four Background
Quality-adjusted life years (QALY) have become the standard metric in health
economic evaluations. Health utilities are commonly obtained using a preference-based
health-related quality of life (HRQOL) instrument, such as the EuroQoL 5D (EQ-5D)
(Brooks, Rabin & Charro, 2003), the Health Utilities Index Mark 3 (HUI3) (Feeny et al.,
2002), the Quality of Well-Being Scale-Self Administered (QWB-SA) (Andersen,
Rothenberg & Kaplan, 1998), or the Short Form 6D (SF-6D) (Brazier & Roberts, 2004).
However, in clinical studies, either generic HRQOL profile measures (SF-36 or SF-12)
(Ware et al., 1993; Ware, Kosinski & Keller, 1996) or disease-specific instrument
measures such as the European Organization for Research and Treatment of Cancer
Quality of Life Questionnaire (EORTC QLQ C-30) (Aaronson et al., 1993) or the
Parkinson's Disease Questionnaire (PDQ-8) (Jenkinson et al., 1997) are commonly used
rather than a preference-based measure. Possible reasons for this are a lack of resources,
time, or clinical interest in a generic utility index (Brazier et al., 1999). A common method
to estimate health utilities is to "map" a health-profile measure or disease-specific
measure onto a preference-based measure using various econometric techniques from a
simple ordinary least squares (OLS) regression to more complex models (Rowen, Brazier
& Roberts, 2009). Perhaps because of its simplicity, minimal data requirements and
adequate predictive validity, OLS models have been the most widely used method for
converting descriptive measures of health status into health utilities (Mortimer & Segal,
2008). Specifically, several papers studying the conversion of the health-profile measure
SF-12 to the preference-based measure EQ-5D have used OLS (Franks et al., 2003;
Franks et al., 2004) and censored least absolute deviation (CLAD)
regression (Sullivan &
Ghushchyan, 2006). In addition, multinomial logistic (MNL) regression as a means of
mapping from the responses to SF-12 items to each EQ-5D health domain has also been
studied (Gray, Rivero-Arias & Clarke, 2006). Nevertheless, it is well-known that these
methods have certain limitations such as predictive values that are outside the domain of
the preference-based target, ceiling/floor effects, and assignment to health states that are
not defined in the preference-based target as well as less precision in response-mapping
when using multinomial logistic regression (Mortimer & Segal, 2008).
Bayesian models are often used to prescribe treatments and guide diagnostic
decisions, but they have had less influence in health-related quality of life research. In
this study, we introduce a new approach called "probabilistic mapping" that uses
Bayesian networks to map health-profile or disease-specific measures to preference-
based measures. We then empirically implement the probabilistic mapping approach
using the 2003 Medical Expenditure Panel Survey data to map the health-profile measure
SF-12 onto the preference-based measure EQ-5D and compare the predictive validity of
our new approach to current mapping methods using econometric OLS, CLAD, and
MNL models.
Bayesian Network
A Bayesian network is a graphical model that encodes probabilistic relationships
among a set of variables of interest (Pearl, 2000; Neapolitan, 2003; Jensen, 1996).
Let's consider a domain Z of n variables x1, x2, …, xn, where each xi represents a
node in the Bayesian network. A Bayesian network for Z is a joint probability distribution
over Z that encodes assertions of conditional independence as well as dependencies.
Specifically, a Bayesian network B = (D, P) consists of a directed acyclic graph (DAG),
D, and a set of conditional probability distributions, P, for all variables xi in the network.
The DAG, which defines the structure of the Bayesian network, contains a node
for each variable xi ∈ Z and a finite set of directed edges (arrows) between nodes denoting
the probabilistic dependencies among variables in Z. For health measurement, a node
may be a health domain, and the states of the node are the possible responses to that
domain. A node is called a child if the probability distribution over its states is
conditional on information about other nodes (items). Conversely, a parent node provides
the information that is used to condition probability within the child node(s). Formally,
for each child node xi with parents π(xi), there is attached a conditional probability
distribution, P(xi | π(xi)). With a set of only discrete variables xi, the joint probability
distribution of the network can be factored as follows (Pearl, 2000; Neapolitan, 2003;
Jensen, 1996):

P(x1, x2, …, xn) = ∏ P(xi | π(xi)),  with the product taken over i = 1, …, n

For example, the Bayesian network resulting from mapping the SF-12v2 onto the EQ-5D
self-care domain has a two-node structure and conditional probabilities as shown in
Figure 4.1. The joint probability distribution of P(MA = i, SC = j), with the assumption of
conditional independence between MA and SC, can then be computed as:

P(MA = i, SC = j) = P(SC = j | MA = i) * P(MA = i)

(where i and j are response-levels of MA and SC, respectively)
Figure 4.1: Example of a Bayesian network resulting from mapping the SF-12v2 onto the
EQ-5D self-care domain

[Parent Node: SF-12 Health Limits Moderate Activities (MA)] → [Child Node: EQ-5D Self-Care (SC)]

P(MA = 1) = 0.0978    P(MA = 2) = 0.1629    P(MA = 3) = 0.7393

P(SC = 1 | MA = 1) = 0.6078    P(SC = 2 | MA = 1) = 0.3482    P(SC = 3 | MA = 1) = 0.0440
P(SC = 1 | MA = 2) = 0.9135    P(SC = 2 | MA = 2) = 0.0800    P(SC = 3 | MA = 2) = 0.0065
P(SC = 1 | MA = 3) = 0.9937    P(SC = 2 | MA = 3) = 0.0047    P(SC = 3 | MA = 3) = 0.0016

Notes: MA denotes "SF-12 health limits moderate activities" and has 3 response-levels: MA = 1 (limited a
lot), MA = 2 (limited a little), and MA = 3 (not limited); SC represents "EQ-5D self-care domain" and
also has 3 response-levels: SC = 1 (no problems), SC = 2 (some problems), SC = 3 (severe problems).
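As a concrete illustration, the two-node network of Figure 4.1 can be coded directly from its probability tables. The sketch below (Python, purely illustrative; the function names are ours) computes the joint distribution, the marginal over the child node, and the abductive posterior over the parent given an observed child response:

```python
# Probability tables transcribed from Figure 4.1.
# Levels are 1..3 for both MA (parent) and SC (child).
p_ma = {1: 0.0978, 2: 0.1629, 3: 0.7393}
p_sc_given_ma = {
    1: {1: 0.6078, 2: 0.3482, 3: 0.0440},
    2: {1: 0.9135, 2: 0.0800, 3: 0.0065},
    3: {1: 0.9937, 2: 0.0047, 3: 0.0016},
}

def joint(ma, sc):
    """P(MA = ma, SC = sc) = P(SC = sc | MA = ma) * P(MA = ma)."""
    return p_sc_given_ma[ma][sc] * p_ma[ma]

def marginal_sc(sc):
    """P(SC = sc), obtained by summing the joint over the parent."""
    return sum(joint(ma, sc) for ma in p_ma)

def posterior_ma(ma, sc):
    """Abductive step: P(MA = ma | SC = sc) via Bayes' rule."""
    return joint(ma, sc) / marginal_sc(sc)
```

For instance, observing severe self-care problems (SC = 3) shifts belief strongly toward MA = 1 ("limited a lot"): posterior_ma(1, 3) is roughly 0.66, versus a prior P(MA = 1) of about 0.10.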
Chapter-Four Methodology
Data Source:
In the current study, we used a sample of adults (≥ 18 years of age) from the 2003
Medical Expenditure Panel Survey (MEPS), a nationally representative survey of the
U.S. civilian non-institutionalized population used to collect detailed information on
demographic characteristics, health status, healthcare expenditure, health insurance
coverage, and employment (Medical Expenditure Panel Survey website, accessed June
2008). The number of chronic conditions (NCC) was calculated as the total number of
self-reported priority conditions in MEPS and included diseases such as diabetes, asthma,
hypertension, heart disease (coronary heart disease, angina, and myocardial infarction),
stroke, emphysema, joint pain, and arthritis. Two generic HRQOL instruments, SF-12
and EQ-5D, were included via the self-administered questionnaire (SAQ), a paper-and-
pencil questionnaire fielded in panel 7 round 4 and panel 8 round 2 cohorts of the 2003
MEPS.
HRQOL Instruments:
The SF-12 is a generic health-profile measure that has been used extensively in
clinical and population studies. It consists of 12 questions with 3 to 5 response levels
measuring general health, physical and emotional role limitations, physical functioning,
social functioning, pain, vitality, and mental health (Ware et al., 2002). The physical
component summary (PCS-12) and mental component summary (MCS-12) are then
derived based on responses of the 12 questions. A higher summary score indicates better
functioning. Beginning in 2003, the MEPS started implementing SF-12v2 (version 2.0)
which has several improvements relative to the original SF-12. Specifically, the SF-12v2
has shorter and simpler instructions and questionnaire items and an improved layout for
the questions and answers in the self-administered form, making the questionnaire easier
to read and complete and the translations and cultural adaptations easier to produce.
Five-level response choices have been implemented in place of the dichotomous or
six-level response choices used in the original SF-12 (Ware et al., 2002).
The EQ-5D is one of the most widely used generic preference-based measures for
estimating health utilities. The measure has 5 health domains (mobility, self-care, usual
activities, pain/discomfort, and anxiety/depression), each with three response-levels (no
problems, some problems, and severe problems), thus generating 243 possible health
states (Brooks, Rabin & Charro, 2003). The scoring system of EQ-5D used in MEPS as
well as in our study was based on the U.K. general public using the time-tradeoff method
ranging from -0.594 (all five EQ-5D health domains reported extreme problems) to 1 or
perfect health (no problems at all five EQ-5D domains) (Dolan, 1997). It should be noted
that the U.S. population-based EQ-5D scoring system was only developed in 2005 by
Shaw et al. (2005). In the current study, we investigate both the U.K. and U.S.
population-based EQ-5D scoring systems.
The Bayesian Networks (Probabilistic Mapping):
A Bayesian network can be learned (estimated) structurally and/or parametrically
from available data. The structures of our Bayesian networks were built by exploring the
probabilistic relationships between each EQ-5D domain and the 12 items of the SF-12v2
from the MEPS data. We used a structure learning algorithm called the "constraint-based"
method. In the constraint-based approach, the search starts with a fully connected
DAG, and an edge between two nodes is removed if the corresponding test of
independence is not rejected. These
tests calculate a statistic that is asymptotically chi-square distributed under the null
hypothesis of (conditional) independence (Hugin Expert A/S, 2008). In our study, we set the
probability of type-1 error for rejecting a hypothesis of independence at 0.01. Once the
structures and parameters of the Bayesian networks are learned, they are then used to
estimate the predicted probabilities of the response-levels for each EQ-5D domain. This
process is called "probabilistic inference." To estimate the EQ-5D utility scores given the
predicted probabilities for each response-level of all EQ-5D domains obtained from the
Bayesian networks, we examined the following methods:
1. Monte-Carlo Simulation:
This method was suggested by Gray and colleagues (Gray, Rivero-Arias & Clarke,
2006). Random numbers between 0 and 1 from a uniform distribution were generated
using a Monte-Carlo simulation method. Based on these random numbers, individuals
were assigned to one of the three response-levels for each EQ-5D domain by comparing
the random number with the predicted probabilities of the response-levels obtained from
the Bayesian networks. Let P1(X), P2(X), and P3(X) be the predicted probabilities from
the Bayesian networks for the response-levels "no problems" (level 1), "some problems"
(level 2), and "severe problems" (level 3), respectively (where X is the EQ-5D domain
for mobility, self-care, usual activities, pain/discomfort, or anxiety/depression). For each
random number ui between 0 and 1 from a uniform distribution, a response-level for each
of the EQ-5D domains was assigned as follows (Gray, Rivero-Arias & Clarke, 2006):
(Predicted EQ-5D Response-Level)i =
    1   if ui ≤ P1(X)
    2   if P1(X) < ui ≤ [1 − P3(X)]
    3   if ui > [1 − P3(X)]

with ui ~ Uniform(0, 1)
The predicted EQ-5D utility score was then obtained by applying the EQ-5D
scoring system. For example, applying the U.K. (Dolan, 1997) and U.S. (Shaw et al.,
2005) population-based EQ-5D scoring systems to the health state "23211," we obtain
EQ-5D utility scores of 0.331 and 0.512, respectively.
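A minimal sketch of this assignment rule (Python, illustrative; the function and variable names are ours, not from the study):

```python
import random

def draw_level(p1, p3, u):
    """Map a uniform draw u to an EQ-5D response level, given the
    predicted probabilities P1(X) and P3(X) for a single domain."""
    if u <= p1:
        return 1            # "no problems"
    elif u <= 1.0 - p3:     # i.e., P1(X) < u <= P1(X) + P2(X)
        return 2            # "some problems"
    else:
        return 3            # "severe problems"

def simulate_state(domain_probs, rng=random.random):
    """domain_probs: dict mapping each EQ-5D domain to (P1, P2, P3).
    Returns a health state string such as "23211" (one digit per domain)."""
    return "".join(str(draw_level(p1, p3, rng()))
                   for (p1, p2, p3) in domain_probs.values())
```

One draw per domain yields one simulated health state; repeating the simulation and scoring each state gives the distribution of predicted utilities for an individual.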
2. Expected-Utility Method:
The Monte-Carlo simulation method has been the only method used to estimate
the EQ-5D utility scores once the predicted probabilities of response-levels for the EQ-
5D domains are obtained. In the current study, we introduce two different and new
methods that can be used to estimate the EQ-5D utility scores: expected-utility and most-
likely probability methods. For the expected-utility method, we applied the expected
utility formula (Von Neumann & Morgenstern, 2004) and used both the U.K. and U.S.
population-based EQ-5D scoring systems (Dolan, 1997; Shaw et al., 2005) to estimate
the EQ-5D utility scores for individuals given the predicted probabilities of the response-
levels obtained from the Bayesian networks (an example on how to estimate the EQ-5D
utility score is detailed in the Appendix B).
For the U.K. population-based EQ-5D scoring system:
Predicted EQ-5D Utility Score = 1 – [Expected_Disutility(mobility) +
Expected_Disutility(self-care) + Expected_Disutility(usual activities) +
Expected_Disutility(pain/discomfort) + Expected_Disutility(anxiety/depression)
+ Expected_Disutility(any response with some/severe-problems) +
Expected_Disutility(any response with severe-problems)]
Where the Expected Disutilities for each EQ-5D domain and when an individual
responds to any EQ-5D domain with some and/or severe problems are calculated based
on the U.K. scoring system as follows (see Appendix C for Stata codes):
Expected_Disutility(X) = P2(X) * Disutility_at_some-problems_response(X) +
P3(X) * Disutility_at_severe-problems_response(X)
Expected_Disutility(any response with some/severe-problems) = P(any response
with some/severe problems) * 0.081
Expected_Disutility(any response with severe-problems) = P(any response with
severe-problems) * 0.269
For the U.S. population-based EQ-5D scoring system (see Appendix C):
Predicted EQ-5D Utility Score = 1 – [Expected_Disutility(mobility) +
Expected_Disutility(self-care) + Expected_Disutility(usual activities) +
Expected_Disutility(pain/discomfort) + Expected_Disutility(anxiety/depression)
+ Expected_Disutility(number of any response with some/severe-problems
beyond first) + Expected_Disutility(square of number of any response with some-
problems beyond first) + Expected_Disutility(number of any response with
severe-problems beyond first) + Expected_Disutility(square of number of any
response with severe-problems beyond first)]
Where the Expected Disutilities for each EQ-5D domain and when an individual
responds to any EQ-5D domain with some and/or severe problems are calculated based
on the U.S. scoring system as follows:
Expected_Disutility(X) = P2(X) * Disutility_at_some-problems_response(X) +
P3(X) * Disutility_at_severe-problems_response(X)
Expected_Disutility(number of any response with some/severe-problems beyond
first) = P(number of any response with some/severe-problems beyond first) *
(-0.140)
Expected_Disutility(square of number of any response with some-problems
beyond first) = P(square of number of any response with some-problems beyond
first) * 0.011
Expected_Disutility(number of any response with severe-problems beyond first) =
P(number of any response with severe-problems beyond first) * (-0.122)
Expected_Disutility(square of number of any response with severe-problems
beyond first) = P(square of number of any response with severe-problems beyond
first) * (-0.015)
One can think of the expected-utility method as an exact method, because it gives,
through an algebraic equation, the utility score that would be expected from repeated
simulations using Monte Carlo, for instance.
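The U.K. calculation above can be sketched as follows (Python, illustrative). Only the 0.081 ("any problem") and 0.269 ("any severe problem") constants appear in the text; the per-domain decrements below are taken from the published Dolan (1997) tariff and should be verified against that source, and the "any problem" probabilities assume the five domains are independent given the SF-12 responses:

```python
# U.K. (Dolan, 1997) tariff decrements: (some problems, severe problems).
# Assumed from the published tariff; verify against the original paper.
UK_DECREMENTS = {
    "mobility":           (0.069, 0.314),
    "self_care":          (0.104, 0.214),
    "usual_activities":   (0.036, 0.094),
    "pain_discomfort":    (0.123, 0.386),
    "anxiety_depression": (0.071, 0.236),
}

def expected_utility_uk(probs):
    """probs: dict mapping each domain to (P1, P2, P3) predicted by the
    Bayesian network. Returns the expected EQ-5D utility score."""
    exp_dis = 0.0
    prod_no_problem = 1.0   # P(all domains at level 1)
    prod_no_severe = 1.0    # P(no domain at level 3)
    for domain, (some, severe) in UK_DECREMENTS.items():
        p1, p2, p3 = probs[domain]
        exp_dis += p2 * some + p3 * severe
        prod_no_problem *= p1
        prod_no_severe *= 1.0 - p3
    return 1.0 - (exp_dis
                  + 0.081 * (1.0 - prod_no_problem)   # any some/severe problem
                  + 0.269 * (1.0 - prod_no_severe))   # any severe problem
```

As a sanity check, feeding in degenerate probabilities that pin the state to "23211" reproduces the U.K. utility of 0.331 quoted earlier.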
3. Most-Likely Probability Method:
In the most-likely probability method, the response-level with the highest
probability among P1(X), P2(X), and P3(X) was assigned:

(Predicted EQ-5D Response-Level)i =
    1   if P1(X) ≥ P2(X) and P1(X) ≥ P3(X)
    2   if P2(X) ≥ P1(X) and P2(X) ≥ P3(X)
    3   if P3(X) ≥ P1(X) and P3(X) ≥ P2(X)
Similar to the Monte Carlo simulation method, predicted EQ-5D utility scores
were then obtained by applying both the U.K. and U.S. population-based EQ-5D scoring
systems.
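This rule is a per-domain argmax; a one-function sketch (Python, illustrative; the text does not specify tie-breaking, so ties here resolve toward the lower, healthier, level):

```python
def most_likely_level(p1, p2, p3):
    """Return the EQ-5D response level (1, 2, or 3) with the highest
    predicted probability. Ties resolve toward the lower level, since
    max() keeps the first maximal key it encounters."""
    probs = {1: p1, 2: p2, 3: p3}
    return max(probs, key=probs.get)
```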
Current Mapping Approaches using Econometric Methods:
In the current study, the predictive validity of our probabilistic mapping method is
compared to current mapping methods using econometric OLS, CLAD, and MNL
models.
1. OLS Model:
We adopted the OLS model from Franks and colleagues, which was used to map
the original SF-12 onto EQ-5D (Franks et al., 2003; Franks et al., 2004). In the OLS
model, the EQ-5D utility index was regressed on the PCS-12 and MCS-12 scores of the
SF-12v2, their squared terms, and the interaction term of PCS-12 and MCS-12.
2. CLAD Model:
The CLAD model was introduced by Sullivan and Ghushchyan (2006) to map the
original SF-12 onto EQ-5D taking into account the ceiling effect of the EQ-5D index
observed in MEPS data (nearly half of the participants reported full health in MEPS).
Besides the PCS-12 and MCS-12 and their interaction term, the CLAD model in their
study also included other characteristics such as age, gender, ethnicity, education,
income, and co-morbidity. However, the predictive validity was only slightly increased
(Sullivan & Ghushchyan, 2006). Because information on socio-demographics is not
always available, and to allow a fair comparison with the other models in our study, we
decided not to include the other characteristics mentioned above in the CLAD model.
3. MNL Model:
In addressing several methodological issues of the common mapping approach
using OLS or CLAD method (such as, prediction outside the domain of the preference-
based target, ceiling/floor effect, and assignment to health states not defined in the
preference-based target), Gray and colleagues (2006) proposed a different approach
called "response-mapping." Using multinomial logistic regression on 12 items of the
original SF-12, this method estimates the probabilities of the response-levels for each
EQ-5D health domain. Once the probabilities of being in a given response-level for every
EQ-5D domain were calculated, the EQ-5D utility scores were then estimated by the
similar methods used in the probabilistic mapping approach: (1) Monte-Carlo simulation,
(2) expected utility, and (3) most-likely probability.
Data and Statistical Analysis:
The analytic sample from 2003 MEPS data was randomly and equally split into a
modeling sample (used to develop the models) and a validation sample (used to validate
the models built from the modeling sample). To evaluate the predictive validity of the
competing models, we assessed the following statistical measures for the validation
sample: (1) the proportion of variability in the data that is explained by the models in
terms of overall R-squared, which was calculated by squaring the Pearson product-
moment correlation between the observed and predicted EQ-5D utility scores; (2)
deviation between the predicted and observed EQ-5D utility scores at the individual level
[measured as overall mean absolute error (MAE) and mean square error (MSE)]; and (3)
deviation between the predicted and observed EQ-5D utility scores at the group level
(measured in MAEs, MSEs, as well as predicted means with absolute differences
between the observed and predicted EQ-5D utility mean scores of the prediction models
for different age groups, number of chronic conditions, and ranges of the EQ-5D index).
Because our sample size is large, trivial differences can be statistically significant. The
minimal clinically important difference (MCID) for the EQ-5D has been reported to be
anywhere from 0.030 to 0.100 depending on the disease state (Marra et al., 2005;
Pickard, Neary & Cella, 2005). Therefore, we chose an EQ-5D value of 0.030 as the
smallest clinically important difference in our study.
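The validity measures described above can be sketched as plain functions (Python, illustrative); overall R-squared is the squared Pearson product-moment correlation between observed and predicted scores:

```python
import math

def mae(obs, pred):
    """Mean absolute error between observed and predicted utilities."""
    return sum(abs(o - p) for o, p in zip(obs, pred)) / len(obs)

def mse(obs, pred):
    """Mean squared error between observed and predicted utilities."""
    return sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs)

def overall_r_squared(obs, pred):
    """Square of the Pearson correlation between obs and pred."""
    n = len(obs)
    mo, mp = sum(obs) / n, sum(pred) / n
    cov = sum((o - mo) * (p - mp) for o, p in zip(obs, pred))
    var_o = sum((o - mo) ** 2 for o in obs)
    var_p = sum((p - mp) ** 2 for p in pred)
    return (cov / math.sqrt(var_o * var_p)) ** 2
```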
The three methods (Monte-Carlo simulation, expected utility, and most-likely
probability) used to estimate the EQ-5D utility scores for the MNL and Bayesian models
were also evaluated and compared. Hugin Researcher software version 6.9 (2008) was
used to construct and make inference for the Bayesian networks. All analyses and the
econometric models were conducted using STATA version 10.1 (2009).
Chapter-Four Results
The analytic sample consisted of 19,678 adults who completed both the EQ-5D
and SF-12v2 questionnaires from the total of 34,215 participants in the 2003 MEPS.
Table 4.1 provides descriptive data for the modeling and validation samples, each
containing 9,839 subjects. Overall, both samples had similar socio-demographic
characteristics and health states.
The structures of the Bayesian networks were constructed based on the modeling
sample. Figure 4.2 shows the Bayesian networks or the probabilistic mappings of 12
items of the SF-12v2 onto each EQ-5D domain (mobility, self-care, usual activities,
pain/discomfort, and anxiety/depression). In general, the probabilistic relationship
between each EQ-5D domain and 12 items of SF-12v2 is anticipated. For example, the
EQ-5D mobility domain was probabilistically related to the SF-12v2 questions on general
health, physical health, moderate activities, stair climbing, pain, and energy level.
However, for the EQ-5D self-care domain, only the probabilistic relationship with the
SF-12v2 question on moderate activities was significant.
Table 4.1: Socio-demographic characteristics and health states of the modeling and
validation samples using 2003 Medical Expenditure Panel Survey (MEPS) data
Modeling Sample Validation Sample
N (%) N (%)
Number (N) 9,839 9,839
Mean Age (SD) 44.99 (17.55) 45.36 (17.33)
Gender
Male 4,413 (44.9%) 4,403 (44.8%)
Female 5,426 (55.1%) 5,436 (55.2%)
Race/Ethnicity
White 7,785 (79.1%) 7,772 (79.0%)
Black 1,395 (14.2%) 1,435 (14.6%)
Asian 423 (4.3%) 399 (4.1%)
Others 236 (2.4%) 233 (2.3%)
Years of Schooling
< 12 2,694 (27.4%) 2,654 (27.0%)
12 3,141 (31.9%) 3,148 (32.0%)
13 – 15 1,943 (19.7%) 2,016 (20.4%)
> 15 1,993 (20.3%) 1,963 (20.0%)
Not Reported 68 (0.7%) 58 (0.6%)
Family Income (as % of poverty line)
Poor/Negative (< 100%) 1,661 (16.9%) 1,646 (16.7%)
Near Poor (100% - 124%) 563 (5.7%) 573 (5.8%)
Low Income (125% - 199%) 1,597 (16.2%) 1,600 (16.3%)
Middle Income (200% - 399%) 2,909 (29.6%) 2,871 (29.2%)
High Income (≥ 400%) 3,109 (31.6%) 3,149 (32.0%)
EQ-5D^a
Mobility 1,887 (19.2%) 1,921 (19.5%)
Self-Care 484 (4.9%) 560 (5.7%)
Usual Activities 1,995 (20.3%) 1,842 (18.7%)
Pain/Discomfort 4,146 (42.1%) 4,205 (42.7%)
Anxiety/Depression 2,723 (27.7%) 2,792 (28.4%)
EQ-5D Score (SD) 0.821 (0.248) 0.816 (0.253)
SF-12 Physical Component Summary (PCS-12) 49.40 (10.75) 49.33 (10.85)
SF-12 Mental Component Summary (MCS-12) 50.46 (10.09) 50.27 (10.07)
^a Number (%) who reported problems (response-levels 2 or 3) in EQ-5D health domains
Figure 4.2. The Bayesian networks for predicting response-levels of five EQ-5D domains from 12 items of the SF-12v2
Notes:
- ACTEQ5D = EQ-5D Activity domain; DEPREQ5D = EQ-5D Anxiety/Depression domain; MOBIEQ5D = EQ-5D Mobility domain;
SELFEQ5D = EQ-5D Self-care domain; and PAINEQ5D = EQ-5D Pain/Discomfort domain
- GENH12 = SF-12 General Health (Question 1); HMODACT12 = SF-12 Health Limits Moderate Activities (Question 2.1); HCLIM12 =
SF-12 Health Limits Climbing Stairs (Question 2.2); PACC12 = SF-12 Accomplished Less due to Physical Health (Question 3.1); PLIM12
= SF-12 Work and Other Activities Limited due to Physical Health (Question 3.2); MACC12 = SF-12 Accomplished Less due to Emotional
Problems (Question 4.1); MLIM12 = SF-12 Work and Other Activities Limited due to Emotional Problems (Question 4.2); PAINLIM12 =
SF-12 Pain Interferes with Normal Work (Question 5); CALM12 = SF-12 Feeling Calm and Peaceful (Question 6.1); ENERGY12 = SF-12
Feeling a lot of Energy (Question 6.2); BLUE12 = SF-12 Feeling Downhearted and Depressed (Question 6.3); HSOCACT12 = SF-12 Physical
Health and Emotional Problems Interfere with Social Activities (Question 7)
When 12 items of the SF-12v2 were used as predictors, the overall error rates of
predicted response-levels in the EQ-5D mobility, self-care, usual activities,
pain/discomfort, and anxiety/depression domains were 8.64%, 5.69%, 3.51%, 16.39%,
and 7.97%, respectively.
Tables 4.2, 4.3, 4.4 and 4.5 provide details of the predictive measures (MAEs,
MSEs, overall R-squared, predicted EQ-5D means with absolute differences between the
observed and predicted EQ-5D utility mean scores) of all the prediction models in the
overall sample, different age groups, number of chronic conditions, and ranges of the EQ-
5D index. For the response and probabilistic mappings, the expected-utility method, for
the most part, performed the best in terms of overall R-squared, MAEs, MSEs, and
predicted EQ-5D utility mean scores as compared with both the Monte-Carlo simulation
and most-likely probability methods. The most-likely probability method resulted in the
lowest MAEs but the worst predicted EQ-5D utility mean scores in the overall sample,
across age groups, numbers of chronic conditions, and ranges of the EQ-5D index (the
absolute differences between the observed and predicted EQ-5D utility mean scores
exceeded the MCID of 0.030). Therefore, when comparing the models, we used results
from the expected-utility method for the MNL regression and the Bayesian networks.
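The predictive measures used in this comparison (MAE, MSE, the overall R-squared taken as the squared Pearson correlation between observed and predicted scores, and the absolute difference in mean scores checked against the MCID of 0.030) can be sketched as follows; the observed and predicted utility vectors are fabricated for illustration and are not MEPS data:

```python
# Illustrative sketch (not the dissertation's code): computing MAE, MSE,
# overall R-squared, and the mean absolute difference from hypothetical
# observed and predicted EQ-5D utility scores.
import math

def predictive_measures(observed, predicted, mcid=0.030):
    n = len(observed)
    mae = sum(abs(o - p) for o, p in zip(observed, predicted)) / n
    mse = sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n
    # "Overall R-squared" here is the squared Pearson product-moment
    # correlation between observed and predicted scores.
    mo = sum(observed) / n
    mp = sum(predicted) / n
    cov = sum((o - mo) * (p - mp) for o, p in zip(observed, predicted))
    var_o = sum((o - mo) ** 2 for o in observed)
    var_p = sum((p - mp) ** 2 for p in predicted)
    r2 = (cov / math.sqrt(var_o * var_p)) ** 2
    abs_diff = abs(mo - mp)  # compared against the MCID of 0.030
    return mae, mse, r2, abs_diff, abs_diff > mcid

obs = [0.85, 0.62, 1.00, 0.41, 0.73]   # made-up observed utilities
pred = [0.80, 0.65, 0.95, 0.50, 0.70]  # made-up predicted utilities
mae, mse, r2, diff, exceeds = predictive_measures(obs, pred)
```

A model whose absolute mean difference exceeds the 0.030 threshold would, by the criterion above, be judged to predict group-level utilities inadequately even if its MAE is small.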
The Bayesian networks consistently outperformed the other prediction models in the
overall sample using the U.K. population-based (MAE = 0.077, MSE = 0.013, and
overall R-squared = 0.802) and the U.S. population-based (MAE = 0.058, MSE = 0.007,
and overall R-squared = 0.787) EQ-5D scoring systems, as well as in different age groups, numbers of
chronic conditions, and ranges of the EQ-5D index. Only when the EQ-5D index ranged
between 0.900 and 1.000 did the CLAD model perform slightly better than the Bayesian
networks (MSE = 0.004 vs. 0.006 and MAE = 0.028 vs. 0.057 for the U.K.
population-based system; MSE = 0.003 vs. 0.004 and MAE = 0.025 vs. 0.048 for the
U.S. population-based system). Yet the CLAD model poorly predicted the EQ-5D utility
mean scores at the overall individual level (0.033 and 0.024 for the U.K. and U.S.
population-based systems, respectively) as well as at the group level (the absolute
differences were more than the MCID of 0.030). In addition, the absolute differences of
the Bayesian networks in predicting the EQ-5D utility mean scores were generally
smaller at the group level than those of the other models (Tables 4.2, 4.3, 4.4, and 4.5).
Figures 4.3 and 4.4 show scatter plots of the observed versus predicted EQ-5D
utility scores for all prediction models using the U.K. and U.S. population-based scoring
systems, respectively. For the response and probabilistic mappings using the
expected-utility method, the data points were closely distributed around the diagonal line
(on which the observed EQ-5D utility score equals the predicted EQ-5D utility value),
whereas the data points for the Monte-Carlo simulation and most-likely probability
methods were scattered more randomly. Similarly, compared with the other prediction
models, the scatter plot for the Bayesian networks using the expected-utility method
showed data points distributed more tightly around the diagonal line.
Table 4.2: MSE, MAE, and the EQ-5D utility mean scores with absolute differences between the observed and predicted
EQ-5D utility mean scores of prediction models in the overall sample, age groups, and number of chronic conditions for the
validation sample (n = 9,839) using U.K. scoring system
Notes:
BN = Bayesian Network; MNL = Multinomial Logistic; OLS = Ordinary Least Squares; CLAD = Censored Least Absolute Deviation; MSE = Mean
Squared Error; and MAE = Mean Absolute Error
a Using the Monte-Carlo simulation method to obtain the predicted response level for each EQ-5D domain
b Using the Expected-Utility method to estimate predicted EQ-5D utility scores
c Using the Most-Likely Probability method to obtain the predicted response level for each EQ-5D domain
Table 4.2: Continued
d Franks and Colleagues model using OLS regression of EQ-5D utility scores on physical component summary scores of SF-12 (PCS-12), mental
component summary scores of SF-12 (MCS-12), squared PCS-12, squared MCS-12, and the interaction term PCS-12 x MCS-12
e Sullivan and Ghushchyan model using CLAD regression of EQ-5D utility scores on PCS-12, MCS-12, and PCS-12 x MCS-12
f Observed EQ-5D utility scores from the validation sample (n = 9,839)
g Square of the Pearson product-moment correlation between the observed and predicted EQ-5D utility scores of the prediction models for the
validation sample
h Absolute difference between the observed and predicted EQ-5D utility mean scores of the prediction models
i Chronic Conditions reported in MEPS: Asthma; Diabetes; Hypertension; Heart Diseases including Coronary Heart Disease, Angina, Heart Attack,
or other diagnosed Heart Diseases; Stroke; Emphysema; Joint Pain; or Arthritis
* Absolute difference is more than the minimal clinically important difference (MCID) of 0.030 in EQ-5D score
Table 4.3: MSE, MAE, and the EQ-5D utility mean scores with absolute differences between the observed and predicted
EQ-5D utility mean scores of prediction models in the overall sample, age groups, and number of chronic conditions for the
validation sample (n = 9,839) using U.S. scoring system
Notes:
BN = Bayesian Network; MNL = Multinomial Logistic; OLS = Ordinary Least Squares; CLAD = Censored Least Absolute Deviation; MSE = Mean
Squared Error; and MAE = Mean Absolute Error
a Using the Monte-Carlo simulation method to obtain the predicted response level for each EQ-5D domain
b Using the Expected-Utility method to estimate predicted EQ-5D utility scores
c Using the Most-Likely Probability method to obtain the predicted response level for each EQ-5D domain
Table 4.3: Continued
d Franks and Colleagues model using OLS regression of EQ-5D utility scores on physical component summary scores of SF-12 (PCS-12), mental
component summary scores of SF-12 (MCS-12), squared PCS-12, squared MCS-12, and the interaction term PCS-12 x MCS-12
e Sullivan and Ghushchyan model using CLAD regression of EQ-5D utility scores on PCS-12, MCS-12, and PCS-12 x MCS-12
f Observed EQ-5D utility scores from the validation sample (n = 9,839)
g Square of the Pearson product-moment correlation between the observed and predicted EQ-5D utility scores of the prediction models for the
validation sample
h Absolute difference between the observed and predicted EQ-5D utility mean scores of the prediction models
i Chronic Conditions reported in MEPS: Asthma; Diabetes; Hypertension; Heart Diseases including Coronary Heart Disease, Angina, Heart Attack,
or other diagnosed Heart Diseases; Stroke; Emphysema; Joint Pain; or Arthritis
* Absolute difference is more than the minimal clinically important difference (MCID) of 0.030 in EQ-5D score
Table 4.4: Mean squared error (MSE) and mean absolute error (MAE) of prediction models by EQ-5D index range, age
group, and number of chronic conditions (NCC) for the validation sample (n = 9,839) using U.K. scoring system
Notes:
BN = Bayesian Network; MNL = Multinomial Logistic; OLS = Ordinary Least Squares; CLAD = Censored Least Absolute Deviation; and NCC =
Number of Chronic Conditions
a Using the Monte-Carlo simulation method to obtain the predicted response level for each of the five EQ-5D domains
b Using the Expected-Utility method to estimate predicted EQ-5D utility scores
c Using the Most-Likely Probability method to obtain the predicted response level for each of the five EQ-5D domains
Table 4.4: Continued
d Franks and Colleagues model using OLS regression of EQ-5D utility scores on physical component summary scores of SF-12 (PCS-12), mental
component summary scores of SF-12 (MCS-12), squared PCS-12, squared MCS-12, and the interaction term PCS-12 x MCS-12
e Sullivan and Ghushchyan model using CLAD regression of EQ-5D utility scores on PCS-12, MCS-12, and PCS-12 x MCS-12
f Chronic Conditions reported in MEPS: Asthma; Diabetes; Hypertension; Heart Diseases including Coronary Heart Disease, Angina, Heart Attack,
or other diagnosed Heart Diseases; Stroke; Emphysema; Joint Pain; or Arthritis
† Smallest MSE/MAE in the subgroup among the models [note that the Expected-Utility method was selected for the Bayesian networks, BN (2),
and the MNL model, MNL (2)]
Table 4.5: Mean squared error (MSE) and mean absolute error (MAE) of prediction models by EQ-5D index range, age
group, and number of chronic conditions (NCC) for the validation sample (n = 9,839) using U.S. scoring system
Notes:
BN = Bayesian Network; MNL = Multinomial Logistic; OLS = Ordinary Least Squares; CLAD = Censored Least Absolute Deviation; and NCC =
Number of Chronic Conditions
a Using the Monte-Carlo simulation method to obtain the predicted response level for each of the five EQ-5D domains
b Using the Expected-Utility method to estimate predicted EQ-5D utility scores
c Using the Most-Likely Probability method to obtain the predicted response level for each of the five EQ-5D domains
Table 4.5: Continued
d Franks and Colleagues model using OLS regression of EQ-5D utility scores on physical component summary scores of SF-12 (PCS-12), mental
component summary scores of SF-12 (MCS-12), squared PCS-12, squared MCS-12, and the interaction term PCS-12 x MCS-12
e Sullivan and Ghushchyan model using CLAD regression of EQ-5D utility scores on PCS-12, MCS-12, and PCS-12 x MCS-12
f Chronic Conditions reported in MEPS: Asthma; Diabetes; Hypertension; Heart Diseases including Coronary Heart Disease, Angina, Heart Attack,
or other diagnosed Heart Diseases; Stroke; Emphysema; Joint Pain; or Arthritis
† Smallest MSE/MAE in the subgroup among the models [note that the Expected-Utility method was selected for the Bayesian networks, BN (2),
and the MNL model, MNL (2)]
Figure 4.3. Scatter Plots of the observed EQ-5D utility scores vs. predicted EQ-5D utility values based on U.K. scoring
system
Figure 4.4. Scatter Plots of the observed EQ-5D utility scores vs. predicted EQ-5D utility values based on U.S. scoring
system
Notes for Figures 4.3 and 4.4:
- Among the models (first row): the Bayesian Networks with Expected-Utility (EU) scoring method; Multinomial Logistic Regression (MNL)
with Expected-Utility (EU) scoring method; the Franks and Colleagues model using OLS regression (OLS); and the Sullivan and Ghushchyan
model using CLAD regression (CLAD).
- The Monte-Carlo (MC) simulation and Most-Likely Probability (MLP) methods for the Bayesian networks and MNL (second row): the
Bayesian Networks with Monte-Carlo simulation scoring method; Multinomial Logistic Regression with Monte-Carlo simulation scoring
method; the Bayesian Networks with Most-Likely Probability scoring method; Multinomial Logistic Regression with Most-Likely Probability
scoring method.
Chapter-Four Discussion
In this study, we developed a new algorithm using Bayesian networks to map
health-profile or disease-specific measures onto preference-based measures. Using the
2003 MEPS data (without sampling weights), we showed that our probabilistic mapping
method consistently outperformed the commonly used econometric mapping approaches
at individual as well as group levels. Furthermore, the probabilistic mapping provides a
"natural" approach to model the relationships among health dimensions without
assumptions and restrictions on functional forms that exist in the econometric methods
(Mortimer et al., 2007). Besides better predictive validity, the probabilistic mapping
approach also provides the graphical relationships among variables that may be useful for
researchers in further investigating the correlational relationships of health dimensions
among and/or between preference-based measures and health-profile measures. For
example, to predict response-levels for the EQ-5D self-care domain, only the SF-12v2
question on moderate activities was needed, and it predicted with a correct classification
rate of more than 94%. It should be noted that, like other structural techniques (e.g., factor
analysis), Bayesian networks may be used to explore or confirm models. In our study, we
used the Bayesian network structural learning as an exploratory technique but we note
that a theoretically based confirmatory approach (i.e., a priori choosing a model based on
a theory of item endorsement) may prove more generalizable, but also may not optimize
predictive performance in any given study. We encourage future work on testing
theoretically based Bayesian networks.
Gray and colleagues (2006) used Monte Carlo simulation to generate EQ-5D
responses based on the predicted probabilities obtained from multinomial logistic
regression. The Monte-Carlo simulation method, however, has a penalty when an
incorrect prediction is made, which causes the estimated EQ-5D utility score to increase
or decrease significantly. In addition to the Monte-Carlo simulation method, we
introduced the expected-utility and most-likely probability methods and showed that the
expected-utility method produced the best predictive performance among the three
methods. In fact, in response mapping, using the expected-utility method decreased the
overall MSE in both U.K. and U.S. population-based EQ-5D scoring systems (0.037 to
0.021 and 0.019 to 0.010, respectively) and MAE (0.112 to 0.095 and 0.081 to 0.070,
respectively) and increased R-squared overall (0.490 to 0.679 and 0.477 to 0.673). This is
because the expected-utility method gives an exact value as opposed to a simulated
distribution with the Monte-Carlo simulation method. Also, even though the most-likely
probability method resulted in the smallest MAE, it had a higher MSE and a lower
overall R-squared, and the absolute differences between its predicted and the observed
EQ-5D means exceeded the MCID value of 0.030 at the individual and group levels, as
compared to the expected-utility method.
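The contrast among the three scoring methods can be made concrete for a single EQ-5D domain, given predicted probabilities over its three response levels; the probabilities and level utilities below are invented for illustration and are not the published decrements:

```python
# Hypothetical sketch contrasting the three scoring methods for one
# EQ-5D domain; the probabilities and level utilities are made up.
import random

levels = [1, 2, 3]                        # EQ-5D response levels
probs = [0.70, 0.25, 0.05]                # predicted P(level) from BN or MNL
utility = {1: 0.0, 2: -0.104, 3: -0.386}  # illustrative decrements only

# Expected-utility method: probability-weighted average (an exact value).
eu = sum(p * utility[l] for l, p in zip(levels, probs))

# Most-likely probability method: keep the level with the highest probability.
mlp_level = max(zip(probs, levels))[1]
mlp = utility[mlp_level]

# Monte-Carlo simulation method: draw one level from the predicted
# distribution; a wrong draw shifts the score by a whole decrement.
rng = random.Random(0)
mc_level = rng.choices(levels, weights=probs, k=1)[0]
mc = utility[mc_level]
```

Because the expected-utility score is an exact probability-weighted value rather than a single random draw or a single modal level, it avoids the large per-prediction penalty noted above for the Monte-Carlo method.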
The CLAD model by Sullivan and Ghushchyan (2006) performed poorly in
estimating the EQ-5D utility mean scores overall and across the groups, even though it
seemed more theoretically sound than the simple OLS model by Franks and Colleagues
(2003; 2004). In addition, the CLAD model could not predict EQ-5D utility
scores less than 0.010 and 0.332 for the U.K. and U.S. population-based scoring systems,
respectively. Thus, it is impossible to use the CLAD model in the worse-than-dead
situations that do occur in the EQ-5D scoring systems. Surprisingly, the simple OLS model
performed quite well compared to the MNL and CLAD models. The CLAD model,
however, performed well when the EQ-5D scores were greater than 0.900. A possible
explanation is that the CLAD model estimates the median rather than the mean and
addresses the ceiling effect of the EQ-5D scores, but performs poorly otherwise. The OLS
model, on the other hand, estimates the mean, and thus it generally performs better than
the CLAD model.
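The OLS mapping's functional form (EQ-5D regressed on PCS-12, MCS-12, their squares, and their interaction, per the Franks and Colleagues model) can be sketched as follows; the coefficients here are fabricated for illustration and differ from the published estimates:

```python
# Hypothetical sketch of the OLS mapping's functional form:
#   EQ-5D = b0 + b1*PCS + b2*MCS + b3*PCS^2 + b4*MCS^2 + b5*PCS*MCS
# All coefficient values below are made up for illustration only.
def predict_eq5d(pcs, mcs, b):
    features = [1.0, pcs, mcs, pcs ** 2, mcs ** 2, pcs * mcs]
    return sum(bi * f for bi, f in zip(b, features))

b = [-0.2, 0.015, 0.010, -0.0001, -0.00005, 0.0001]  # fabricated coefficients
score = predict_eq5d(49.4, 50.5, b)  # near-average PCS-12 and MCS-12
```

A fitted OLS model of this form predicts a conditional mean, which is why it tracks group-level mean utilities better than the median-based CLAD model, while the CLAD specification replaces the quadratic terms with only the interaction term.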
There are several limitations to the current study. First, as with other mapping
methods, the probabilistic mapping approach estimated less precisely with poor health
states even though it still performed better than other models. This might in part be due to
a large heterogeneity in the population that we defined by age groups and number of the
reported chronic conditions. Second, because basic differences among descriptive
systems always exist (e.g., the SF-12v2 specifically measures "energy" while the EQ-5D
does not), both the probabilistic and response mapping methods might not fully capture
changes occurring in certain health dimensions from the mapping HRQOL instrument to
the mapped one. Third, this work was conducted on a sample of the general population
containing only information of the health-profile SF-12v2 and preference-based EQ-5D
measures. Thus, we were not able to assess the mapping performance between disease-
specific and preference-based measures. In addition, the comparisons of predictive
performance of the models in our study were limited to the same population in MEPS.
We, therefore, tried to make the best use of the MEPS data by randomly splitting the data
into two equal halves, one for model development and the other for model validation.
Fourth, none of the models in our study adequately predicted EQ-5D scores below
0.499; it remains much more difficult for the models to predict EQ-5D scores in this
particular group. Moreover, different disease groups may have distinct probabilistic
relationships, so the external validity of our findings with respect to different disease
groups is still unclear. Finally, the often superior performance of the Bayesian
networks suggests strong dependencies among predictors. Hence, future research that
applies general linear methods to quality-of-life data may benefit from techniques
designed to handle such dependencies, e.g., the seemingly unrelated regression equations
(SURE) technique (Zellner, 1962).
In conclusion, for researchers seeking to minimize respondent burden while
estimating health utilities from an available health status measure, we introduced a new,
robust, and natural approach to convert health status responses into health utility
measures and further demonstrated that the probabilistic mapping method is superior to
current mapping methods using econometric techniques.
Chapter-Four References
Aaronson NK, Ahmedzai S, Bergman B, et al. The European Organization for Research
and Treatment of Cancer QLQC-30: a quality-of-life instrument for use in
international clinical trials in oncology. J Natl Cancer Inst 1993;85:365–76.
Andresen EM, Rothenberg BM, Kaplan RM. Performance of a self-administered mailed
version of the quality of well-being (QWB-SA) questionnaire among older adults.
Med Care 1998;36:1349–1360.
Brazier J, Deverill M, Green C, et al. A review of the use of health status measures in
economic evaluation. Health Technol Assess 1999;3:1–164.
Brazier JE, Roberts J. The estimation of a preference-based measure of health from the
SF-12. Med Care 2004;42:851–859.
Brooks R, Rabin R, Charro F. The measurement and Valuation of health status using EQ-
5D: A European perspective. Dordrecht, The Netherlands: Kluwer Academic;
2003.
Dolan P. Modeling valuations for EuroQol health states. Med Care. 1997;35:1095-1108.
Feeny D, Furlong W, Torrance GW, et al. Multiattribute and single-attribute utility
functions for the health utilities index mark 3 system. Med Care 2002;40:113–
128.
Franks P, Lubetkin DI, Gold MR, et al. Mapping the SF-12 to preference-based
instruments: convergent validity in a low-income, minority population. Med Care
2003;41:1277-1283.
Franks P, Lubetkin DI, Gold MR, et al. Mapping the SF-12 to the EuroQoL EQ-5D
index in a national US sample. Med Decis Making 2004;24:247-254.
Gray A, Rivero-Arias O, Clarke P. Estimating the association between SF-12 responses
and EQ-5D utility values by response mapping. Med Decis Making 2006;26:18-29.
Hugin Expert A/S. Hugin Researcher. Version 6.9. Computer software. Aalborg,
Denmark: Hugin Expert A/S; 2008.
Jenkinson C, Fitzpatrick R, Peto V, et al. The PDQ-8: development and validation of a
short form Parkinson's disease questionnaire. Psychology & Health 1997;12:805-814.
Jensen FV. An introduction to Bayesian Networks. New York: Springer; 1996.
Marra CA, Woolcott JC, Kopec JA, et al. A comparison of generic, indirect utility
measures (the HUI2, HUI3, SF-6D, and EQ-5D) and disease-specific instruments
(the RAQoL and HAQ) in rheumatoid arthritis. Soc Sci Med 2005;60:1571-1582.
Medical Expenditure Panel Survey. MEPS HC-079: 2003 full year consolidated data file.
Available at: http://www.meps.ahrq.gov/mepsweb/data_stats/download_data_files
_detail.jsp?cboPufNumber=HC-079. Accessed June 16, 2008.
Mortimer D, Segal L, Hawthorne G, Harris A. Item-based versus scale-based mappings
from the SF-36 to a preference-based quality of life measure. Value Health
2007;10:398-407.
Mortimer D, Segal L. Comparing the Incomparable? A systematic review of competing
techniques for converting descriptive measures of health status into QALY-
weights. Med Decis Making 2008;28:66-89.
Neapolitan RE. Learning Bayesian Networks. Upper Saddle River (NJ): Prentice-Hall
Inc.; 2003.
Pearl J. Causality: models, reasoning, and inference. New York: Cambridge University
Press; 2000.
Pickard SA, Neary MP, Cella D. Estimation of minimally important difference in EQ-5D
utility and VAS scores in cancer. Health Qual Life Outcomes 2007;5:70.
Rowen D, Brazier J, Roberts J. Mapping SF-36 onto the EQ-5D Index: How reliable is
the relationship? Health Qual Life Outcomes 2009; 7:27.
Shaw JW, Johnson JA, Coons SJ. US valuation of the EQ-5D health states: development
and testing of the D1 valuation model. Med Care 2005;43:203-220.
StataCorp. Stata statistical software. Version 10. College Station, TX: Stata Corp LP;
2009.
Sullivan PW, Ghushchyan V. Mapping the EQ-5D index from the SF-12: US general
population preferences in a nationally representative sample. Med Decis Making
2006;26:401-409.
Von Neumann J, Morgenstern O. Theory of Games and Economic Behavior, 60th
Anniversary Edition. Princeton, NJ: Princeton University Press; 2004.
Ware JE, Kosinski M, and Keller SD. A 12-item Short Form Health Survey: Construction
of scales and preliminary tests of reliability and validity. Med Care 1996;34: 220-
233.
Ware JE, Kosinski M, Turner-Bowker D, et al. How to score version 2 of the SF-12
health survey. Lincoln (RI): QualityMetric; 2002.
Ware JE, Snow KK, Kosinski M, et al. SF-36 health survey: manual and interpretation
guide. Boston (MA): The Health Institute, New England Medical Center; 1993.
Zellner A. An efficient method of estimating seemingly unrelated regression equations
and tests for aggregation bias. Journal of the American Statistical Association
1962;57: 348–368.
CHAPTER 5: CONCLUSIONS
With the growing complexity of clinical decisions, together with the need for cost
control and assessment of patients' quality of life, decision-analysis tools provide many
advantages in the development of clinical practice guidelines (Shortliffe & Perreault, 2006).
Markov models and Bayesian networks are the most common probabilistic graphical
models that combine probability and graph theory; they can serve as decision-analysis
tools that assist healthcare professionals in making better clinical decisions under
uncertainty as well as in improving the quality of healthcare.
The Markov model provides a technique for analyzing repeatable events (e.g., chronic
diseases such as diabetes and emphysema) or events that progress over an extended period
of time (e.g., advanced diseases such as metastatic breast cancer) (Cooper et al., 2003).
In addition, a Markov model with probabilistic Monte-Carlo simulation offers an efficient
way to analyze uncertainty when making clinical decisions. Unlike other uses of
Markov modeling in medical research, applications of the Markov model in health
economics and outcomes research provide a reasonably simplified representation of the
natural progression of advanced diseases that integrates both costs and outcomes
simultaneously, so that the cost-effectiveness of new treatments can be estimated
efficiently and scarce healthcare resources can be allocated effectively.
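A minimal cohort-simulation sketch of this idea is shown below; the three health states, transition probabilities, per-cycle costs, and utilities are all hypothetical, and the half-cycle correction and discounting used in full economic models are omitted for brevity:

```python
# Hypothetical three-state Markov cohort model (Stable, Progressed, Dead);
# every transition probability, cost, and utility is illustrative only.
states = ["Stable", "Progressed", "Dead"]
P = {  # per-cycle transition probabilities; each row sums to 1
    "Stable":     {"Stable": 0.85, "Progressed": 0.10, "Dead": 0.05},
    "Progressed": {"Stable": 0.00, "Progressed": 0.80, "Dead": 0.20},
    "Dead":       {"Stable": 0.00, "Progressed": 0.00, "Dead": 1.00},
}
cost = {"Stable": 2000.0, "Progressed": 5000.0, "Dead": 0.0}  # per cycle
utility = {"Stable": 0.80, "Progressed": 0.50, "Dead": 0.0}   # per cycle

dist = {"Stable": 1.0, "Progressed": 0.0, "Dead": 0.0}  # cohort starts Stable
total_cost = total_qalys = 0.0
for cycle in range(20):
    # accumulate expected cost and QALYs for the current state distribution
    total_cost += sum(dist[s] * cost[s] for s in states)
    total_qalys += sum(dist[s] * utility[s] for s in states)
    # advance the whole cohort one cycle through the transition matrix
    dist = {t: sum(dist[s] * P[s][t] for s in states) for t in states}
```

Running the same loop with treatment-specific transition probabilities and costs yields the incremental cost per QALY gained, which is how such models support cost-effectiveness comparisons.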
Bayesian networks, which have been introduced in recent years as a formalism for
representing and reasoning with models of problems involving uncertainty, offer a natural
way to represent the uncertainties in healthcare when dealing with diagnosis, treatment
choice, detection of medical errors, and prediction of prognosis (Lucas, 1998). The most
attractive feature of a Bayesian network is the ability to compute any probabilistic
statement (e.g., the likelihood that a particular lab test result is erroneous given other
lab test results and the patient's medical condition, or the probability that one would have
"some problems" in his/her self-care health domain given that his/her "moderate activity"
is not limited) if all relevant information is included as parameters (or nodes); thus,
the Bayesian network can be used as a decision-analysis tool for making clinical decisions
in the face of uncertainty.
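The self-care example above can be reduced to a toy two-node network and queried by direct enumeration; every probability here is invented for illustration and does not come from the MEPS analysis:

```python
# Toy two-node Bayesian network echoing the self-care example; all
# probabilities are made up for illustration.
# Node A: "moderate activity limited" in {True, False}
# Node S: "some problems with self-care" in {True, False}, child of A
p_a = {True: 0.20, False: 0.80}          # prior P(A)
p_s_given_a = {True: 0.30, False: 0.02}  # conditional P(S=True | A)

# Conditioning on the evidence A=False reads the CPT directly:
p_problem_given_not_limited = p_s_given_a[False]

# The marginal P(S=True) sums the joint probability over A:
p_problem = sum(p_a[a] * p_s_given_a[a] for a in (True, False))

# Bayes' rule inverts the query: P(A=True | S=True)
p_limited_given_problem = p_a[True] * p_s_given_a[True] / p_problem
```

Real networks with many nodes replace this brute-force enumeration with inference algorithms (e.g., junction-tree propagation, as implemented in tools such as Hugin), but the probabilistic statements being computed are of exactly this kind.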
Markov models have been widely used in health policy studies, specifically in
cost-effectiveness analysis to evaluate new interventions. Their use in health economics
and outcomes research will continue to grow as they become a standard modeling method
for economic evaluations (Weinstein et al., 2003). The currently limited use of Bayesian
networks in health economics and outcomes research is likely due to the scarcity of
existing literature as well as the availability and ease of use of Bayesian network software.
Fortunately, with the present rate of scientific and technological progress, especially the
advanced computational abilities of modern computers, Bayesian networks can now be
widely applied in healthcare research in general and in health economics and
outcomes research in particular.
This dissertation introduces new approaches using probabilistic graphical models
to improve patient safety and health-related quality-of-life research, and provides a unique
way to model cost-effectiveness analyses with several advantages compared to other
traditional models using Microsoft® Excel spreadsheet or models using commercial
software like TreeAge®.
Paper 1 demonstrates a cost-effectiveness analysis model of an expensive and
newly approved cancer drug, lapatinib, using a Markov model with Monte-Carlo
simulation method. This modeling approach innovatively uses a Microsoft® Excel
spreadsheet with the Visual Basic programming language and gives health economists
flexibility in customizing, calibrating, and graphically visualizing their cost-effectiveness
models. Paper 2 presents an alternative method using a Bayesian network
that can detect blood lab errors better than existing automated models. Successful
implementation of the Bayesian network model in the clinical laboratory can help reduce
medical costs and improve patient safety. Paper 3 provides a new robust and natural
approach using Bayesian networks to map health-profile or disease-specific measures
onto preference-based measures. Applying the probabilistic mapping technique to obtain
QALYs can be useful in health economic evaluations when health utilities are not
directly available.
Taken together, this dissertation suggests that PGMs can be effective and useful
in health economics and outcomes research, and future research using PGMs should be
widely explored and applied. One particular area of health economics and outcomes
research in which Bayesian networks may serve as a model for estimating treatment
effects is currently being investigated and compared with the common models using
(multinomial) logistic regression and propensity-score
methods. Successful implementation of the Bayesian network model in estimating
treatment effects can open doors for more research opportunities of Bayesian networks in
the field of health economics and outcomes research.
BIBLIOGRAPHY
Aaronson NK, Ahmedzai S, Bergman B, et al. The European Organization for Research
and Treatment of Cancer QLQC-30: a quality-of-life instrument for use in
international clinical trials in oncology. J Natl Cancer Inst 1993;85:365–76.
Airoldi EM. Getting started in probabilistic graphical models. PLoS Computational
Biology 2007;3:2421-2425.
An open-label, multicenter, single arm phase II study of oral GW572016 as single agent
therapy in subjects with advanced or metastatic breast cancer who have
progressed while receiving Herceptin containing regimens. GlaxoSmithKline.
Available from URL: http://ctr.gsk.co.uk/Summary/lapatinib/II_EGF20002.pdf
[accessed: December 18, 2007].
Andresen EM, Rothenberg BM, Kaplan RM. Performance of a self-administered mailed
version of the quality of well-being (QWB-SA) questionnaire among older adults.
Med Care 1998;36:1349–1360.
American Cancer Society. Breast Cancer Facts & Figures 2007-2008. Atlanta: American
Cancer Society, Inc.
Badia X, Guyver A, Magaz S, Bigorra J. Integrated health outcomes research strategies in
drug or medical device development, pre- and postmarketing: time for change.
Expert Review of Pharmacoeconomics & Outcomes Research 2002;2:269-278.
Bartsch R, Wenzel C, Altorjai G, Pluschnig U, Locker GJ, Rudas M, et al. Trastuzumab
(T) plus capecitabine (C) in heavily pretreated patients (pts) with advanced breast
cancer (ABC) [abstract]. J Clin Oncol. 2007;25:18S.
Beck JR, Pauker SG, Gottlieb JE, Klein K, Kassirer JP. A convenient approximation of
life expectancy (the "DEALE"). II. Use in medical decision-making. Am J Med
1982;73:889-897.
Bonini P, Plebani M, Ceriotti F, Rubboli F. Errors in laboratory medicine. Clin Chem
2002;48:691-8.
Boran G, Given P, O'Moore R. Patient result validation services. Comput Methods
Programs Biomed 1996;50:161-8.
Bøttcher SG. Learning Bayesian networks with mixed variables [dissertation]. Denmark:
Aalborg University; 2004.
Bøttcher SG, Dethlefsen C. DEAL: A package for learning Bayesian networks. J Stat
Software 2003;8.
Bøttcher SG, Dethlefsen C. Learning Bayesian networks with R. In: Hornik K, Leisch F,
Zeileis, editors. DSC 2003: Proceedings of the 3rd International Workshop on
Distributed Statistical Computing; 2003 Mar 20-22; Vienna, Austria.
Brazier J, Deverill M, Green C, et al. A review of the use of health status measures in
economic evaluation. Health Technol Assess 1999;3:1–164.
Brazier JE, Roberts J. The estimation of a preference-based measure of health from the
SF-12. Med Care 2004;42:851–859.
Brooks R, Rabin R, Charro F. The measurement and Valuation of health status using EQ-
5D: A European perspective. Dordrecht, The Netherlands: Kluwer Academic;
2003.
Brown RE, Hutton J. Cost-utility model comparing docetaxel and paclitaxel in advanced
breast cancer patients. Anticancer Drugs 1998;9:899-907.
Centers for Medicare and Medicaid. Available at URL:
http://www.cms.hhs.gov/home/medicare.asp [accessed August 31, 2007].
Clinical and Laboratory Standards Institute. Autoverification of Clinical Laboratory Test
Results, Approved Guidelines (AUTO 10-A). Wayne, Pa.: Clinical and
Laboratory Standards Institute, 2006.
Cooper NJ, Abrams KR, Sutton AJ, Turner D, Lambert PC. A Bayesian approach to
Markov modeling in cost-effectiveness analyses: application to taxane use in
advanced breast cancer. J R Statist Soc A 2003;166:389-405.
Crolla LJ, Westgard JO. Evaluation of rule-based autoverification protocols. Clin
Leadersh Manag Rev 2003;17:268-72.
Del Bianco S, Rondinelli R. Trastuzumab-containing therapies: Activity beyond disease
progression in M.B.C. – A pivotal experience [abstract]. J Clin Oncol.
2006;24:18S.
Devlin N, Parkin D. Does NICE have a cost effectiveness threshold and what other
factors influence its decisions? A binary choice analysis. Health Econ
2004;13:437-452.
Doctor JN, Strylewicz GB. Detecting 'wrong blood in tube' errors: evaluation of a
Bayesian network approach. Artif Intell Med 2010. [Epub ahead of print].
Dolan P. Modeling valuations for EuroQol health states. Med Care. 1997;35:1095-1108.
Dranitsaris G, Maroun J, Shah A. Severe chemotherapy induced diarrhea (CID) in
patients with colorectal cancer: A cost of illness analysis [abstract]. J Clin Oncol.
2004;22:14S.
Earle CC, Chapman RH, Baker CS, Bell CM, Stone PW, Sandberg EA, et al. Systematic
overview of cost-utility assessments in oncology. J Clin Oncol 2000;18:3302-
3317.
Elkin EB, Weinstein MC, Winer EP, Kuntz KM, Schnitt SJ, Weeks JC. HER-2 testing
and trastuzumab therapy for metastatic breast cancer: a cost-effectiveness
analysis. J Clin Oncol 2004;22:854-863.
Extra J-M, Antoine E-C, Vincent-Salomon A, et al. Favourable effect of continued
trastuzumab treatment in metastatic breast cancer: results from the French
Hermine cohort study. Breast Cancer Res Treat 2006; 100:S102.
Feeny D, Furlong W, Torrance GW, et al. Multiattribute and single-attribute utility
functions for the health utilities index mark 3 system. Med Care 2002;40:113–
128.
Fenwick E, Marshall DA, Levy AR, Nichol G. Using and interpreting cost-effectiveness
acceptability curves: an example using data from a trial of management strategies
for atrial fibrillation. BMC Health Services Research 2006;6:52.
Forsman RW. Why is the laboratory an afterthought for managed care organizations?
Clin Chem 1996;42:813-6.
Franks P, Lubetkin DI, Gold MR, et al. Mapping the SF-12 to preference-based
instruments: convergent validity in a low-income, minority population. Med Care
2003;41:1277-1283.
Franks P, Lubetkin DI, Gold MR, et al. Mapping the SF-12 to the EuroQoL EQ-5D
index in a national US sample. Med Decis Making 2004;24:247-254.
Garcia-Saenz JA, Martin M, Puente J, Lopez-Tarruella S, Casado A, Moreno F, et al.
Trastuzumab associated with successive cytotoxic therapies beyond disease
progression in metastatic breast cancer. Clin Breast Cancer 2005;:325-329.
Garrison LP, Jr., Lubeck D, Lalla D, Paton V, Dueck A, Perez EA. Cost-effectiveness
analysis of trastuzumab in the adjuvant setting for treatment of HER2-positive
breast cancer. Cancer 2007;110:489-498.
Geyer CE, Forster J, Lindquist D, Chan S, Romieu CG, Pienkowski T, et al. Lapatinib
plus capecitabine for HER2-positive advanced breast cancer. N Engl J Med.
2006;355:2733-2743.
Geyer CE, Martin A, Newstat B, Casey MA, Berger MS, Oliva CR, et al. Lapatinib (L)
plus capecitabine (C) in HER2+ advanced breast cancer (ABC): Genomic and
updated efficacy data [abstract]. J Clin Oncol. 2007;25:18S.
Giannini EG, Testa R, Savarino V. Liver enzyme alteration: a guide for clinicians. CMAJ
2005;172:367-79.
Gold MR, Siegel JE, Russell LB, Weinstein MC, eds. Cost-Effectiveness in
Health and Medicine. New York, NY: Oxford University Press; 1996.
Gray A, Rivero-Arias O, Clarke P. Estimating the association between SF-12 responses
and EQ-5D utility values by response mapping. Med Decis Making 2006;26:18-
29.
Hanley JA, McNeil BJ. A method of comparing the areas under receiver operating
characteristic curves derived from the same cases. Radiology 1983;148:839-43.
Heckerman D, Geiger D. Learning Bayesian networks. Microsoft Research; 1995. Report
No.: MSR-TR-95-02.
Heckerman D. A Tutorial on Learning Bayesian Networks. Microsoft Research; 1996.
Report No.: MSR-TR-95-06.
Hoelzel W, Weykamp C, Jeppsson JO, Miedema K, Barr JR, Goodall I, et al. IFCC
reference system for measurement of hemoglobin A1c in human blood and the
national standardization schemes in the United States, Japan, and Sweden: a
method-comparison study. Clin Chem 2004;50:166-174.
Hugin Expert A/S. Hugin Researcher Version 6.9. Computer software. Aalborg,
Denmark: Hugin Expert A/S; 2008.
Janes H, Pepe MS, Gu W. Assessing the Value of Risk Predictions by Using Risk
Stratification Tables. Ann Intern Med 2008;149:751-60.
Jay DW, Provasek D. Characterization and mathematical correction of hemolysis
interference in selected Hitachi 717 assays. Clin Chem 1993;39:1804-10.
Jenkinson C, Fitzpatrick R, Peto V, et al. The PDQ-8: development and validation of a
short form Parkinson's disease questionnaire. Psychology & Health 1997;12:805-
814.
Jensen FV. An introduction to Bayesian networks. New York: Springer; 1996.
Jordan M. Graphical models. Statistical Science, Special Issue on Bayesian Statistics
2004;19:140-155.
Lamers LM, Stupp R, van den Bent MJ, Al MJ, Gorlia T, Wasserfallen JB, et al. Cost-
effectiveness of temozolomide for the treatment of newly diagnosed glioblastoma
multiforme: a report from the EORTC 26981/22981 NCI-C CE3 Intergroup
Study. Cancer 2008;112:1337-1344.
Landro L. The Informed Patient: Hospitals Move to Cut Dangerous Lab Errors. The Wall
Street Journal 2006. New York: D1.
Lucas P. Bayesian networks in medicine: a model-based approach to medical decision
making. Proceedings of the EUNITE Workshop on Intelligent Systems in Patient
Care, Vienna, Oct. 2001, 73-97.
Marcovina S, Bowsher RR, Miller G, Staten M, Myers G, Caudill SP, et al.
Standardization of insulin immunoassays: report of the American Diabetes
Association's Workgroup. Clin Chem 2007;53:711-716.
Marra CA, Woolcott JC, Kopec JA, et al. A comparison of generic, indirect utility
measures (the HUI2, HUI3, SF-6D, and EQ-5D) and disease-specific instruments
(the RAQoL and HAQ) in rheumatoid arthritis. Soc Sci Med 2005;60:1571-1582.
McLachlan SA, Pintilie M, Tannock IF. Third line chemotherapy in patients with
metastatic breast cancer: an evaluation of quality of life and cost. Breast Cancer
Res Treat 1999;54:213-23.
Medical Expenditure Panel Survey. MEPS HC-079: 2003 full year consolidated data file.
Available at:
http://www.meps.ahrq.gov/mepsweb/data_stats/download_data_files_detail.jsp?c
boPufNumber=HC-079. Accessed June 16, 2008.
Morabito A, Longo R, Gattuso D, Carillio G, Massaccesi C, Mariani L, et al.
Trastuzumab in combination with gemcitabine and vinorelbine as second-line
therapy for HER-2/neu overexpressing metastatic breast cancer. Oncol Rep
2006;16:393-398.
Mortimer D, Segal L, Hawthorne G, Harris A. Item-based versus scale-based mappings
from the SF-36 to a preference-based quality of life measure. Value Health
2007;10:398-407.
Mortimer D, Segal L. Comparing the Incomparable? A systematic review of competing
techniques for converting descriptive measures of health status into QALY-
weights. Med Decis Making 2008;28:66-89.
Murphy KP. Dynamic Bayesian Networks: representation, inference and learning
[dissertation]. Berkeley (CA). University of California at Berkeley; 2002.
Muss HB. Targeted therapy for metastatic breast cancer. N Engl J Med. 2006;355:2783-
2785.
National Center for Health Statistics, Centers for Disease Control. National Health and
Nutrition Examination Survey. Data Sets and Related Documentation. Available
from http://www.cdc.gov/nchs/about/major/nhanes/datalink.htm. Accessed
December 2007.
National Institute for Health and Clinical Excellence. Breast cancer (advanced or
metastatic) – lapatinib: Appraisal consultation document. Available at URL:
http://www.nice.org.uk/guidance/index.jsp?action=article&o=39849 [accessed:
May 15, 2008].
Neapolitan RE. Learning Bayesian Networks. Upper Saddle River (NJ): Prentice-Hall
Inc.; 2003.
Oosterhuis WP, Ulenkate HJ, Goldschmidt HM. Evaluation of LabRespond, a new
automated validation system for clinical laboratory test results. Clin Chem
2000;46:1811-7.
Papatheodoridis GV, Goulis J, Christodoulou D, et al. High prevalence of elevated liver
enzymes in blood donors: associations with male gender and central adiposity.
Eur J Gastroenterol Hepatol 2007;19:281-7.
Pearl J. Causality: models, reasoning, and inference. New York: Cambridge University
Press; 2000.
Pelletier EM, Shim B, Goodman S, Amonkar MM. Epidemiology and economic burden
of brain metastases among patients with primary breast cancer: results from a US
claims data analysis. Breast Cancer Res Treat 2008;108:297-305.
Pickard SA, Neary MP, Cella D. Estimation of minimally important difference in EQ-5D
utility and VAS scores in cancer. Health Qual Life Outcomes 2007;5:70.
Plebani M, Carraro P. Mistakes in a stat laboratory: types and frequency. Clin Chem
1997;43:1348-51.
R Development Core Team. In: R: A Language and Environment for Statistical
Computing. R Foundation for Statistical Computing, Vienna, Austria; 2008.
Available from: http://www.R-project.org.
Ries LAG, Melbert D, Krapcho M, et al., eds. SEER Cancer Statistics Review, 1975-
2004, National Cancer Institute. Bethesda, MD,
http://seer.cancer.gov/csr/1975_2004/, based on November 2006 SEER data
submission, posted to the SEER Website, 2007.
Rowen D, Brazier J, Roberts J. Mapping SF-36 onto the EQ-5D Index: How reliable is
the relationship? Health Qual Life Outcomes 2009; 7:27.
Sanfey H. Gender-specific issues in liver and kidney failure and transplantation: a review.
J Womens Health (Larchmt) 2005;14:617-26.
Shah A, Maroun J, Dranitsaris G. The cost of hospitalization secondary to severe
chemotherapy induced diarrhea (CID) in patients with colorectal cancer [abstract].
J Clin Oncol. 2004;22:14S.
Shaw JW, Johnson JA, Coons SJ. US valuation of the EQ-5D health states: development
and testing of the D1 valuation model. Med Care 2005;43:203-220.
Shortliffe EH, Perreault LE, Eds. Medical informatics: computer applications in health
care. Addison-Wesley, Reading, MA, 2006.
Slamon DJ, Clark GM, Wong SG, Levin WJ, Ullrich A, McGuire WL. Human breast
cancer: correlation of relapse and survival with amplification of the HER-2/neu
oncogene. Science. 1987;235:177–182.
Spiegelhalter DJ, Abrams KR, Myles JP. Bayesian Approaches to Clinical Trials and
Health-Care Evaluation. West Sussex (England): John Wiley & Sons, 2004.
StataCorp. Stata statistical software. Version 10. College Station, TX: Stata Corp LP;
2009.
Stroobants AK, Goldschmidt HM, Plebani M. Error budget calculations in laboratory
medicine: linking the concepts of biological variation and allowable medical
errors. Clin Chim Acta 2003;333:169-76.
Strylewicz GB. Errors in the Clinical Laboratory: A Novel Approach to Autoverification
[dissertation]. Seattle (WA). University of Washington; 2007.
Sullivan PW, Ghushchyan V. Mapping the EQ-5D index from the SF-12: US general
population preferences in a nationally representative sample. Med Decis Making
2006;26:401-409.
Szolovits P. Uncertainty and decisions in medical informatics. Methods Inf Med
1995;34:111-121.
Thompson C, Dowding D. Responding to uncertainty in nursing practice. Int J Nurs Stud
2001;38:609-615.
Tourassi GD, Floyd CE. The effects of data sampling on the performance evaluation of
artificial neural networks in medical diagnosis. Med Decis Making 1997;17:186-
92.
Ubel PA, Hirth RA, Chernew ME, Fendrick AM. What is the price of life and why doesn't
it increase at the rate of inflation? Arch Intern Med 2003;163:1637-1641.
US Department of Labor: US Bureau of Labor Statistics. Available at URL:
http://www.bls.gov [accessed July 21, 2007].
Vickers AJ, Cronin AM, Elkin EB, Gonen M. Extensions to decision curve analysis, a
novel method for evaluating diagnostic tests, prediction models and molecular
markers. BMC Med Inform Decis Mak 2008;8:53.
Von Neumann J, Morgenstern O. Theory of Games and Economic Behavior, 60th
Anniversary Edition. Princeton, NJ: Princeton University Press; 2004.
Ware JE, Kosinski M, and Keller SD. A 12-item Short Form Health Survey: Construction
of scales and preliminary tests of reliability and validity. Med Care 1996;34: 220-
233.
Ware JE, Kosinski M, Turner-Bowker D, et al. How to score version 2 of the SF-12 health
survey. Lincoln (RI): QualityMetric; 2002.
Ware JE, Snow KK, Kosinski M, et al. SF-36 health survey: manual and interpretation
guide. Boston (MA): The Health Institute, New England Medical Center; 1993.
Weinstein MC, O'Brien B, Hornberger J, Jackson J, Johannesson M, McCabe C, Luce BR.
Principles of good practice for decision analytic modeling in health-care
evaluation: report of the ISPOR task force on good research practices – Modeling
studies. Value Health 2003;6:9-17.
Witte DL, VanNess SA, Angstadt DS, Pennell BJ. Errors, mistakes, blunders, outliers, or
unacceptable results: how many? Clin Chem 1997;43:1352-6.
Wiwanitkit V. Types and frequency of preanalytical mistakes in the first Thai ISO
9002:1994 certified clinical laboratory, a 6-month monitoring. BMC Clin Pathol
2001;1:5.
Wu AH, ed. Tietz clinical guide to laboratory tests. St Louis, Missouri: Saunders, 2006.
Zellner A. An efficient method of estimating seemingly unrelated regression equations
and tests for aggregation bias. Journal of the American Statistical Association
1962;57: 348–368.
Appendix A
A Bayesian network is a graphical model that encodes probabilistic relationships
among a set of variables of interest (Pearl, 2000). Consider a domain Z of n variables
x_1, x_2, …, x_n, where each x_i represents a node in the Bayesian network. A Bayesian
network for Z represents a joint probability distribution over Z by encoding assertions of
conditional independence together with a collection of probability distributions.
Specifically, a Bayesian network B = (D, P) consists of a directed acyclic graph (DAG),
D, and a set of local probability distributions, P, one for each variable x_i in the network.
The DAG, which defines the structure of the Bayesian network, contains a node
for each variable x_i ∈ Z and a finite set of directed edges (arrows) between nodes
denoting the probabilistic dependencies among variables in Z. A node can have no
dependencies or several. A node is called a child of the node(s) it depends on, and a
parent of the node(s) that depend on it. Attached to each child node x_i with parents
π(x_i) is a local probability distribution, P(x_i | π(x_i)). With a set of only discrete
variables x_i, the joint probability distribution of the network can be factored as follows
(Pearl, 2000):

P(x_1, x_2, …, x_n) = ∏_{i=1}^{n} P(x_i | π(x_i))

In our model, we allow Bayesian networks with both discrete and continuous
variables, as treated in Bøttcher & Dethlefsen (2003). The set of nodes in Z is given by
Z = Δ ∪ Γ, where Δ and Γ are the sets of discrete and continuous nodes, respectively. The
set of variables x_1, x_2, …, x_n can be denoted as:

Z = (I, Y) = ((I_δ)_{δ∈Δ}, (Y_γ)_{γ∈Γ});

where I and Y are the sets of discrete (δ) and continuous (γ) variables, respectively. To
ensure the availability of exact local computation methods, we do not allow discrete
variables to have continuous parents. The joint probability distribution can then be
factored into a discrete part and a mixed part (Bøttcher & Dethlefsen, 2003):

P(x_1, x_2, …, x_n) = ∏_{δ∈Δ} P(i_δ | π(i_δ)) · ∏_{γ∈Γ} P(y_γ | π(i_γ), π(y_γ))

Previously we mentioned that a Bayesian network consists of two parts (D and P).
Consequently, two steps are performed during model building: (1) structure learning, in
which the dependency structure that best explains the data (i.e., the structure with the
highest network score) is constructed using a heuristic search strategy combined with a
scoring metric (Bøttcher & Dethlefsen, 2003; Heckerman & Geiger, 1995; Heckerman,
1996); and (2) parameter learning, in which the parameters of the local probability
distributions corresponding to that dependency structure are estimated using the
Bayesian approach (Bøttcher, 2004).
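The discrete factorization above can be illustrated with a short Python sketch. The three-node network, its node names, and its conditional probability tables are all hypothetical and chosen only for illustration; the point is that the joint distribution is the product of the local distributions attached to the nodes:

```python
# Hypothetical three-node discrete Bayesian network with DAG A -> B, A -> C.
# Each node carries a local distribution P(x_i | parents(x_i)); the joint
# then factors as P(A, B, C) = P(A) * P(B | A) * P(C | A).
P_A = {0: 0.6, 1: 0.4}
P_B_given_A = {0: {0: 0.7, 1: 0.3}, 1: {0: 0.2, 1: 0.8}}  # P_B_given_A[a][b]
P_C_given_A = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.5, 1: 0.5}}  # P_C_given_A[a][c]

def joint(a, b, c):
    """Joint probability via the chain-rule factorization over the DAG."""
    return P_A[a] * P_B_given_A[a][b] * P_C_given_A[a][c]

# The local distributions define a proper joint: the 2^3 terms sum to 1.
total = sum(joint(a, b, c) for a in (0, 1) for b in (0, 1) for c in (0, 1))
```

The same modularity is what makes structure and parameter learning tractable: each local table can be scored and estimated separately given the DAG.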
Appendix B
Example of how to estimate EQ-5D utility scores using the U.K. population-based
scoring system for the Monte-Carlo Simulation, Expected-Utility, and Most-Likely
Probability methods:
Suppose that the probabilities of the response levels for each EQ-5D domain are
as follows:

P_1(mobility) = 0.850; P_2(mobility) = 0.100; and P_3(mobility) = 0.050
P_1(self-care) = 0.975; P_2(self-care) = 0.020; and P_3(self-care) = 0.005
P_1(usual activities) = 0.900; P_2(usual activities) = 0.075; and P_3(usual activities) = 0.025
P_1(pain/discomfort) = 0.950; P_2(pain/discomfort) = 0.040; and P_3(pain/discomfort) = 0.010
P_1(anxiety/depression) = 0.400; P_2(anxiety/depression) = 0.500; and P_3(anxiety/depression) = 0.100

where P_1(X), P_2(X), and P_3(X) are the predicted probabilities for the response levels
of no problems (level 1), some problems (level 2), and severe problems (level 3),
respectively (X = mobility, self-care, usual activities, pain/discomfort, or
anxiety/depression).
1. Monte-Carlo Simulation Method:

Suppose that the random numbers generated for EQ-5D mobility, self-care, usual
activities, pain/discomfort, and anxiety/depression are 0.163, 0.982, 0.578, 0.321, and
0.710, respectively. We then use the formula:

(Predicted EQ-5D Response-Level)_i =
  1 if u_i ≤ P_1(X_i)
  2 if P_1(X_i) < u_i ≤ [1 − P_3(X_i)]
  3 if u_i > [1 − P_3(X_i)]

where u_i is the random number drawn for domain X_i. The response levels for EQ-5D
mobility, self-care, usual activities, pain/discomfort, and anxiety/depression are therefore
assigned 1, 2, 1, 1, and 2, respectively. Applying the U.K. population-based EQ-5D
scoring system to the health state "12112", we have a predicted EQ-5D utility score of
0.744.
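The Monte-Carlo assignment can be sketched in Python. This is an illustrative translation of the worked example, not the original implementation; the domain probabilities, random draws, and U.K. tariff decrements are the ones quoted in this appendix, and the function names are mine:

```python
# Inverse-CDF draw of an EQ-5D response level from (P1, P3), then UK scoring.
def draw_level(u, p1, p3):
    """Level 1 if u <= P1; level 3 if u > 1 - P3; otherwise level 2."""
    if u <= p1:
        return 1
    if u > 1.0 - p3:
        return 3
    return 2

P13 = {  # (P1, P3) per domain, from the example above
    "mobility": (0.850, 0.050), "self-care": (0.975, 0.005),
    "usual activities": (0.900, 0.025), "pain/discomfort": (0.950, 0.010),
    "anxiety/depression": (0.400, 0.100),
}
draws = [0.163, 0.982, 0.578, 0.321, 0.710]  # the random numbers above

levels = {d: draw_level(u, p1, p3)
          for (d, (p1, p3)), u in zip(P13.items(), draws)}

# UK tariff decrements as quoted in this appendix.
L2 = {"mobility": 0.069, "self-care": 0.104, "usual activities": 0.036,
      "pain/discomfort": 0.123, "anxiety/depression": 0.071}
L3 = {"mobility": 0.314, "self-care": 0.214, "usual activities": 0.094,
      "pain/discomfort": 0.386, "anxiety/depression": 0.286}

def uk_score(levels):
    score = 1.0
    if any(lv > 1 for lv in levels.values()):
        score -= 0.081                  # constant for any dysfunction
    if any(lv == 3 for lv in levels.values()):
        score -= 0.269                  # N3 term
    for d, lv in levels.items():
        if lv == 2:
            score -= L2[d]
        elif lv == 3:
            score -= L3[d]
    return score
```

For the draws above this reproduces the health state "12112" and the predicted utility of 0.744.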
2. Expected-Utility Method:
For the expected-utility method, the predicted EQ-5D utility scores are estimated
using the following formula:
Predicted EQ-5D Utility Score = 1 – [Expected_Disutility(mobility) +
Expected_Disutility(self-care) + Expected_Disutility(usual activities) +
Expected_Disutility(pain/discomfort) + Expected_Disutility(anxiety/depression) +
Expected_Disutility(any response with some/severe-problems) +
Expected_Disutility(any response with severe-problems)] [1]
where the expected disutilities are calculated as follows, based on the U.K.
population-based EQ-5D scoring system:

Expected_Disutility(mobility) = (0.069)*P_2(mobility) + (0.314)*P_3(mobility)
= (0.069)*(0.100) + (0.314)*(0.050) = 0.0226

Expected_Disutility(self-care) = (0.104)*P_2(self-care) + (0.214)*P_3(self-care)
= (0.104)*(0.020) + (0.214)*(0.005) = 0.00315

Expected_Disutility(usual activities) = (0.036)*P_2(usual activities) + (0.094)*P_3(usual
activities) = (0.036)*(0.075) + (0.094)*(0.025) = 0.00505

Expected_Disutility(pain/discomfort) = (0.123)*P_2(pain/discomfort) +
(0.386)*P_3(pain/discomfort) = (0.123)*(0.040) + (0.386)*(0.010) = 0.00878

Expected_Disutility(anxiety/depression) = (0.071)*P_2(anxiety/depression) +
(0.286)*P_3(anxiety/depression) = (0.071)*(0.500) + (0.286)*(0.100) = 0.0641

Expected_Disutility(any response with some/severe problems) = (0.081)*P(any response
with some/severe problems) = (0.081)*[1 − P_1(mobility)*P_1(self-care)*P_1(usual
activities)*P_1(pain/discomfort)*P_1(anxiety/depression)]
= (0.081)*[1 − (0.850)*(0.975)*(0.900)*(0.950)*(0.400)]
= (0.081)*(0.7166) = 0.0580

Expected_Disutility(any response with severe problems) = (0.269)*P(any response with
severe problems) = (0.269)*{1 − [1 − P_3(mobility)]*[1 − P_3(self-care)]*[1 − P_3(usual
activities)]*[1 − P_3(pain/discomfort)]*[1 − P_3(anxiety/depression)]}
= (0.269)*{1 − [1 − 0.050]*[1 − 0.005]*[1 − 0.025]*[1 − 0.010]*[1 − 0.100]}
= (0.269)*{1 − [0.950]*[0.995]*[0.975]*[0.990]*[0.900]}
= (0.269)*{1 − 0.8212} = (0.269)*(0.1788) = 0.0481

From [1]:

Predicted EQ-5D Utility Score = 1 − [0.0226 + 0.00315 + 0.00505 + 0.00878 + 0.0641
+ 0.0580 + 0.0481] = 1 − 0.2098 = 0.790
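The expected-utility arithmetic above can be reproduced with a short Python sketch. It is an illustration using the domain probabilities and disutility weights quoted in this appendix; the variable names are mine:

```python
# Expected-utility method: 1 minus the sum of expected UK-tariff disutilities.
P = {  # (P1, P2, P3) for each EQ-5D domain, from the example above
    "mobility": (0.850, 0.100, 0.050),
    "self-care": (0.975, 0.020, 0.005),
    "usual activities": (0.900, 0.075, 0.025),
    "pain/discomfort": (0.950, 0.040, 0.010),
    "anxiety/depression": (0.400, 0.500, 0.100),
}
W = {  # (level-2, level-3) disutility weights used in this appendix
    "mobility": (0.069, 0.314), "self-care": (0.104, 0.214),
    "usual activities": (0.036, 0.094), "pain/discomfort": (0.123, 0.386),
    "anxiety/depression": (0.071, 0.286),
}

# Per-domain expected disutility: w2*P2 + w3*P3
domain_terms = sum(W[d][0] * p2 + W[d][1] * p3 for d, (_, p2, p3) in P.items())

prod_all_level1 = 1.0   # P(no problems on every domain)
prod_no_level3 = 1.0    # P(no domain at level 3)
for p1, p2, p3 in P.values():
    prod_all_level1 *= p1
    prod_no_level3 *= 1.0 - p3

any_dysfunction = 0.081 * (1.0 - prod_all_level1)  # constant term
any_level3 = 0.269 * (1.0 - prod_no_level3)        # N3 term

expected_utility = 1.0 - (domain_terms + any_dysfunction + any_level3)
```

With the probabilities above this evaluates to approximately 0.790, matching the worked example.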
3. Most-Likely Probability Method:

The predicted response levels are estimated using:

(Predicted Response-Level)_i =
  1 if P_1(X_i) ≥ P_2(X_i) and P_1(X_i) ≥ P_3(X_i)
  2 if P_2(X_i) > P_1(X_i) and P_2(X_i) ≥ P_3(X_i)
  3 if P_3(X_i) > P_1(X_i) and P_3(X_i) > P_2(X_i)

Thus, the response levels for EQ-5D mobility, self-care, usual activities,
pain/discomfort, and anxiety/depression are assigned 1, 1, 1, 1, and 2, respectively.
Applying the U.K. population-based EQ-5D scoring system to the health state "11112",
we have a predicted EQ-5D utility score of 0.848.
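A minimal Python version of this argmax rule, using the same probabilities and the U.K. tariff decrements quoted in this appendix (ties go to the lower level, which Python's `max` reproduces by keeping the first maximum):

```python
# Most-likely level per domain: argmax of (P1, P2, P3); ties favor lower levels.
P = {
    "mobility": (0.850, 0.100, 0.050),
    "self-care": (0.975, 0.020, 0.005),
    "usual activities": (0.900, 0.075, 0.025),
    "pain/discomfort": (0.950, 0.040, 0.010),
    "anxiety/depression": (0.400, 0.500, 0.100),
}
levels = [max((1, 2, 3), key=lambda lv: probs[lv - 1]) for probs in P.values()]

# UK scoring of the resulting health state, as used in this appendix.
L2 = [0.069, 0.104, 0.036, 0.123, 0.071]
L3 = [0.314, 0.214, 0.094, 0.386, 0.286]
score = 1.0
if any(lv > 1 for lv in levels):
    score -= 0.081                      # constant for any dysfunction
if any(lv == 3 for lv in levels):
    score -= 0.269                      # N3 term
for i, lv in enumerate(levels):
    if lv == 2:
        score -= L2[i]
    elif lv == 3:
        score -= L3[i]
```

This reproduces the health state "11112" and the predicted utility of 0.848.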
Appendix C
Stata code for estimating the U.K.- and U.S.-based EQ-5D scoring systems:

Let prob_mo(i) = predicted probability of the mobility domain at response level (i);
Let prob_sc(i) = predicted probability of the self-care domain at response level (i);
Let prob_ua(i) = predicted probability of the usual-activities domain at response level (i);
Let prob_pd(i) = predicted probability of the pain/discomfort domain at response level (i);
Let prob_ad(i) = predicted probability of the anxiety/depression domain at response level (i);
1. The U.K. based EQ-5D Scoring System:
gen mobi = (0.069)*prob_mo2 + (0.314)*prob_mo3
gen self = (0.104)*prob_sc2 + (0.214)*prob_sc3
gen act = (0.036)*prob_ua2 + (0.094)*prob_ua3
gen dispain = (0.123)*prob_pd2 + (0.386)*prob_pd3
gen anxiety = (0.071)*prob_ad2 + (0.236)*prob_ad3
gen prob_any = 1 - prob_mo1*prob_sc1*prob_ua1*prob_pd1*prob_ad1
gen any23 = (0.081)*prob_any
gen prob_bad = 1 - (1-prob_mo3)*(1-prob_sc3)*(1-prob_ua3)*(1-prob_pd3)*(1-
prob_ad3)
gen bad3 = (0.269)*prob_bad
gen ExpectedUtil = 1 - (mobi + self + act + dispain + anxiety + any23 + bad3)
2. The U.S. based EQ-5D Scoring System:
gen mobi_us = (0.146)*prob_mo2 + (0.558)*prob_mo3
gen self_us = (0.175)*prob_sc2 + (0.471)*prob_sc3
gen act_us = (0.140)*prob_ua2 + (0.374)*prob_ua3
gen dispain_us = (0.173)*prob_pd2 + (0.537)*prob_pd3
gen anxiety_us = (0.156)*prob_ad2 + (0.450)*prob_ad3
* I2_squared: square of the number of dimensions at level 2 beyond the first (multiplied
by 0.011)
* I2_squared = {(Prob_any_one_level_2 x 0) + (Prob_any_two_level_2 x 1) +
(Prob_any_three_level_2 x 2) + (Prob_any_four_level_2 x 3) + (Prob_five_level_2 x
4)}^2
gen Prob_any_two_level_2 = prob_mo2*prob_sc2*(1-prob_ua2)*(1-prob_pd2)*(1-
prob_ad2) + prob_mo2*(1-prob_sc2)*prob_ua2*(1-prob_pd2)*(1-prob_ad2) +
prob_mo2*(1-prob_sc2)*(1-prob_ua2)*prob_pd2*(1-prob_ad2) + prob_mo2*(1-
prob_sc2)*(1-prob_ua2)*(1-prob_pd2)*prob_ad2 + (1-
prob_mo2)*prob_sc2*prob_ua2*(1-prob_pd2)*(1-prob_ad2) + (1-
prob_mo2)*prob_sc2*(1-prob_ua2)*prob_pd2*(1-prob_ad2) + (1-
prob_mo2)*prob_sc2*(1-prob_ua2)*(1-prob_pd2)*prob_ad2 + (1-prob_mo2)*(1-
prob_sc2)*prob_ua2*prob_pd2*(1-prob_ad2) + (1-prob_mo2)*(1-
prob_sc2)*prob_ua2*(1-prob_pd2)*prob_ad2 + (1-prob_mo2)*(1-prob_sc2)*(1-
prob_ua2)*prob_pd2*prob_ad2
gen Prob_any_three_level_2 = prob_mo2*prob_sc2*prob_ua2*(1-prob_pd2)*(1-
prob_ad2) + prob_mo2*prob_sc2*(1-prob_ua2)*prob_pd2*(1-prob_ad2) +
prob_mo2*prob_sc2*(1-prob_ua2)*(1-prob_pd2)*prob_ad2 + prob_mo2*(1-
prob_sc2)*prob_ua2*prob_pd2*(1-prob_ad2) + prob_mo2*(1-
prob_sc2)*prob_ua2*(1-prob_pd2)*prob_ad2 + prob_mo2*(1-prob_sc2)*(1-
prob_ua2)*prob_pd2*prob_ad2 + (1-prob_mo2)*prob_sc2*prob_ua2*prob_pd2*(1-
prob_ad2) + (1-prob_mo2)*prob_sc2*prob_ua2*(1-prob_pd2)*prob_ad2 + (1-
prob_mo2)*prob_sc2*(1-prob_ua2)*prob_pd2*prob_ad2 + (1-prob_mo2)*(1-
prob_sc2)*prob_ua2*prob_pd2*prob_ad2
gen Prob_any_four_level_2 = prob_mo2*prob_sc2*prob_ua2*prob_pd2*(1-
prob_ad2) + prob_mo2*prob_sc2*prob_ua2*(1-prob_pd2)*prob_ad2 +
prob_mo2*prob_sc2*(1-prob_ua2)*prob_pd2*prob_ad2 + prob_mo2*(1-
prob_sc2)*prob_ua2*prob_pd2*prob_ad2 + (1-
prob_mo2)*prob_sc2*prob_ua2*prob_pd2*prob_ad2
gen Prob_five_level_2 = prob_mo2*prob_sc2*prob_ua2*prob_pd2*prob_ad2
gen I2_squared = [(Prob_any_two_level_2*1) + (Prob_any_three_level_2*2) +
(Prob_any_four_level_2*3) + (Prob_five_level_2*4)]^2
gen any2_beyond_1_sq = I2_squared*0.011
* I3: number of dimensions at level 3 beyond the first (multiplied by -0.122)
* I3 = {(Prob_any_one_level_3 x 0) + (Prob_any_two_level_3 x 1) +
(Prob_any_three_level_3 x 2) + (Prob_any_four_level_3 x 3) + (Prob_five_level_3 x
4)}
gen Prob_any_two_level_3 = prob_mo3*prob_sc3*(1-prob_ua3)*(1-prob_pd3)*(1-
prob_ad3) + prob_mo3*(1-prob_sc3)*prob_ua3*(1-prob_pd3)*(1-prob_ad3) +
prob_mo3*(1-prob_sc3)*(1-prob_ua3)*prob_pd3*(1-prob_ad3) + prob_mo3*(1-
prob_sc3)*(1-prob_ua3)*(1-prob_pd3)*prob_ad3 + (1-
prob_mo3)*prob_sc3*prob_ua3*(1-prob_pd3)*(1-prob_ad3) + (1-
prob_mo3)*prob_sc3*(1-prob_ua3)*prob_pd3*(1-prob_ad3) + (1-
prob_mo3)*prob_sc3*(1-prob_ua3)*(1-prob_pd3)*prob_ad3 + (1-prob_mo3)*(1-
prob_sc3)*prob_ua3*prob_pd3*(1-prob_ad3) + (1-prob_mo3)*(1-
prob_sc3)*prob_ua3*(1-prob_pd3)*prob_ad3 + (1-prob_mo3)*(1-prob_sc3)*(1-
prob_ua3)*prob_pd3*prob_ad3
gen Prob_any_three_level_3 = prob_mo3*prob_sc3*prob_ua3*(1-prob_pd3)*(1-
prob_ad3) + prob_mo3*prob_sc3*(1-prob_ua3)*prob_pd3*(1-prob_ad3) +
prob_mo3*prob_sc3*(1-prob_ua3)*(1-prob_pd3)*prob_ad3 + prob_mo3*(1-
prob_sc3)*prob_ua3*prob_pd3*(1-prob_ad3) + prob_mo3*(1-
prob_sc3)*prob_ua3*(1-prob_pd3)*prob_ad3 + prob_mo3*(1-prob_sc3)*(1-
prob_ua3)*prob_pd3*prob_ad3 + (1-prob_mo3)*prob_sc3*prob_ua3*prob_pd3*(1-
prob_ad3) + (1-prob_mo3)*prob_sc3*prob_ua3*(1-prob_pd3)*prob_ad3 + (1-
prob_mo3)*prob_sc3*(1-prob_ua3)*prob_pd3*prob_ad3 + (1-prob_mo3)*(1-
prob_sc3)*prob_ua3*prob_pd3*prob_ad3
gen Prob_any_four_level_3 = prob_mo3*prob_sc3*prob_ua3*prob_pd3*(1-
prob_ad3) + prob_mo3*prob_sc3*prob_ua3*(1-prob_pd3)*prob_ad3 +
prob_mo3*prob_sc3*(1-prob_ua3)*prob_pd3*prob_ad3 + prob_mo3*(1-
prob_sc3)*prob_ua3*prob_pd3*prob_ad3 + (1-
prob_mo3)*prob_sc3*prob_ua3*prob_pd3*prob_ad3
gen Prob_five_level_3 = prob_mo3*prob_sc3*prob_ua3*prob_pd3*prob_ad3
gen I3 = [(Prob_any_two_level_3*1) + (Prob_any_three_level_3*2) +
(Prob_any_four_level_3*3) + (Prob_five_level_3*4)]
gen I3_squared = I3*I3
gen any3_beyond_1 = I3*(-0.122)
gen any3_beyond_1_sq = I3_squared*(-0.015)
* D1: number of dimensions at level 2 or 3 beyond the first dimension at level 2 or 3
(multiplied by -0.140)
* D1 = {(Prob_any_one_level_2_or_3 x 0) + (Prob_any_two_level_2_or_3 x 1) +
(Prob_any_three_level_2_or_3 x 2) + (Prob_any_four_level_2_or_3 x 3) +
(Prob_five_level_2_or_3 x 4)}
gen Prob_any_two_level_2_or_3 = (1-prob_mo1)*(1-
prob_sc1)*prob_ua1*prob_pd1*prob_ad1 + (1-prob_mo1)*prob_sc1*(1-
prob_ua1)*prob_pd1*prob_ad1 + (1-prob_mo1)*prob_sc1*prob_ua1*(1-
prob_pd1)*prob_ad1 + (1-prob_mo1)*prob_sc1*prob_ua1*prob_pd1*(1-prob_ad1)
+ prob_mo1*(1-prob_sc1)*(1-prob_ua1)*prob_pd1*prob_ad1 + prob_mo1*(1-
prob_sc1)*prob_ua1*(1-prob_pd1)*prob_ad1 + prob_mo1*(1-
prob_sc1)*prob_ua1*prob_pd1*(1-prob_ad1) + prob_mo1*prob_sc1*(1-
prob_ua1)*(1-prob_pd1)*prob_ad1 + prob_mo1*prob_sc1*(1-
prob_ua1)*prob_pd1*(1-prob_ad1) + prob_mo1*prob_sc1*prob_ua1*(1-
prob_pd1)*(1-prob_ad1)
gen Prob_any_three_level_2_or_3 = (1-prob_mo1)*(1-prob_sc1)*(1-
prob_ua1)*prob_pd1*prob_ad1 + (1-prob_mo1)*(1-prob_sc1)*prob_ua1*(1-
prob_pd1)*prob_ad1 + (1-prob_mo1)*(1-prob_sc1)*prob_ua1*prob_pd1*(1-
prob_ad1) + (1-prob_mo1)*prob_sc1*(1-prob_ua1)*(1-prob_pd1)*prob_ad1 + (1-
prob_mo1)*prob_sc1*(1-prob_ua1)*prob_pd1*(1-prob_ad1) + (1-
prob_mo1)*prob_sc1*prob_ua1*(1-prob_pd1)*(1-prob_ad1) + prob_mo1*(1-
prob_sc1)*(1-prob_ua1)*(1-prob_pd1)*prob_ad1 + prob_mo1*(1-prob_sc1)*(1-
prob_ua1)*prob_pd1*(1-prob_ad1) + prob_mo1*(1-prob_sc1)*prob_ua1*(1-
prob_pd1)*(1-prob_ad1) + prob_mo1*prob_sc1*(1-prob_ua1)*(1-prob_pd1)*(1-
prob_ad1)
gen Prob_any_four_level_2_or_3 = (1-prob_mo1)*(1-prob_sc1)*(1-prob_ua1)*(1-
prob_pd1)*prob_ad1 + (1-prob_mo1)*(1-prob_sc1)*(1-prob_ua1)*prob_pd1*(1-
prob_ad1) + (1-prob_mo1)*(1-prob_sc1)*prob_ua1*(1-prob_pd1)*(1-prob_ad1) +
(1-prob_mo1)*prob_sc1*(1-prob_ua1)*(1-prob_pd1)*(1-prob_ad1) + prob_mo1*(1-
prob_sc1)*(1-prob_ua1)*(1-prob_pd1)*(1-prob_ad1)
gen Prob_five_level_2_or_3 = (1-prob_mo1)*(1-prob_sc1)*(1-prob_ua1)*(1-
prob_pd1)*(1-prob_ad1)
gen D1 = (Prob_any_two_level_2_or_3*1) + (Prob_any_three_level_2_or_3*2) +
(Prob_any_four_level_2_or_3*3) + (Prob_five_level_2_or_3*4)
gen any23_beyond_1 = D1*(-0.140)
gen ExpectedUtil_US = 1 - (mobi_us + self_us + act_us + dispain_us + anxiety_us +
any23_beyond_1 + any2_beyond_1_sq + any3_beyond_1 + any3_beyond_1_sq)
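The long gen statements above enumerate by hand every way in which exactly k of the five domains can sit at a given level. A short Python check (a verification sketch, not part of the original Stata code; the probabilities p are hypothetical) confirms the combinatorics: with independent domain indicators there are C(5, k) product terms for each k, and the "exactly k" probabilities sum to one over k = 0…5:

```python
from itertools import combinations
from math import comb, isclose

# Hypothetical marginal probabilities that each of the 5 domains is at level 2.
p = [0.10, 0.02, 0.075, 0.04, 0.50]

def prob_exactly(k):
    """P(exactly k of the 5 independent domain indicators are 'on')."""
    total = 0.0
    for idx in combinations(range(5), k):   # one term per C(5, k) combination
        term = 1.0
        for i in range(5):
            term *= p[i] if i in idx else (1.0 - p[i])
        total += term
    return total

n_terms = {k: comb(5, k) for k in range(6)}     # 1, 5, 10, 10, 5, 1 terms
total = sum(prob_exactly(k) for k in range(6))  # must equal 1
```

This is why the hand-written Prob_any_two… and Prob_any_three… expressions each contain ten product terms, while the four-of-five expressions contain five and the five-of-five expression a single term.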
Abstract
Probabilistic graphical models (PGMs) are models that employ both probability theory and graph theory. Fundamental to the idea of a PGM is the notion of modularity, i.e., a complex system can be built by combining simpler parts. Health economics and outcomes research (HEOR) is a multidisciplinary approach to healthcare and research that incorporates a number of areas of expertise, including clinical research, epidemiology, health services research, economics, and psychometrics. The field has expanded rapidly in the last decade and has played a crucial role in improving the quality of healthcare. Drugs, healthcare programs, and medical devices are increasingly required to demonstrate not only their efficacy and safety, but also their superior performance in clinical effectiveness, health-related quality of life, and economic outcomes. While probabilistic graphical models have become a popular tool for data analysis in health informatics, especially when used to prescribe treatment or guide diagnostic decisions, their use and applications in HEOR have been limited. This three-paper dissertation introduces new approaches using probabilistic graphical models in health economics and outcomes research.
Asset Metadata
Creator: Le, Quang Anh (author)
Core Title: New approaches using probabilistic graphical models in health economics and outcomes research
School: School of Pharmacy
Degree: Doctor of Philosophy
Degree Program: Pharmaceutical Economics
Publication Date: 11/16/2010
Publisher: University of Southern California (original); University of Southern California. Libraries (digital)
Tags: Bayesian networks; blood laboratory errors; cost-effectiveness; EQ-5D; lapatinib; Markov model; OAI-PMH Harvest; probabilistic mapping; SF-12
Language: English
Contributor: Electronically uploaded by the author (provenance)
Advisor: Doctor, Jason N. (committee chair); Cousineau, Michael (committee member); Hay, Joel W. (committee member)
Creator Email: quangale@gmail.com, quangle@usc.edu
Permanent Link (DOI): https://doi.org/10.25549/usctheses-m3528
Unique identifier: UC1180781
Identifier: etd-Le-4162 (filename); usctheses-m40 (legacy collection record id); usctheses-c127-409306 (legacy record id); usctheses-m3528 (legacy record id)
Legacy Identifier: etd-Le-4162.pdf
Dmrecord: 409306
Document Type: Dissertation
Rights: Le, Quang Anh
Type: texts
Source: University of Southern California (contributing entity); University of Southern California Dissertations and Theses (collection)
Repository Name: Libraries, University of Southern California
Repository Location: Los Angeles, California
Repository Email: cisadmin@lib.usc.edu