COMPARISON OF PREDICTING ACCURACY OF NEURAL NETWORKS FOR
CENSORED SURVIVAL DATA USING GENERALIZED RECEIVER OPERATING
CHARACTERISTIC (ROC) - C-INDEX METHOD
BY
YONG YUAN
A Thesis Presented to the
FACULTY OF THE GRADUATE SCHOOL
UNIVERSITY OF SOUTHERN CALIFORNIA
In Partial Fulfillment of the
Requirement for the Degree
MASTER OF SCIENCE
(Applied Epidemiology/ Biometry)
August 1998
Copyright 1998 Yong Yuan
Reproduced with permission of the copyright owner. Further reproduction prohibited without permission.
UMI Number: 1417326
UNIVERSITY OF SOUTHERN CALIFORNIA
THE GRADUATE SCHOOL
UNIVERSITY PARK
LOS ANGELES, CALIFORNIA 90007
This thesis, written by
Yong Yuan
under the direction of h.... Thesis Committee,
and approved by all its members, has been presented
to and accepted by the Dean of The
Graduate School, in partial fulfillment of the
requirements for the degree of
Master of Science
Date: August 18, 1998
THESIS COMMITTEE
Acknowledgement
I would like to thank my thesis advisors, Dr. Annie Xiang and Dr. Stanley Azen, for their
guidance and valuable input. I would like to thank Alex Ryutov for the EPILOG PROC
NEURAL procedure, which served as the basic analytic tool for the whole study. I am
thankful to Dr. Jonathan Buckley for agreeing to be on my committee. My thanks are
also due to Dr. Pablo Lapuerta for inspiring my initial interest in the use of neural
networks in the health and medical research field.
Abstract
Neural networks (NNs) are sophisticated computer programs which utilize
principles of artificial intelligence. Theoretical work suggests that neural networks can
consistently match or exceed the performance of other statistical methods. However, the
application of neural networks to predict clinical events requires development of a strategy
to address the time course of a chronic disease process. Three published strategies
(Faraggi and Simon method, Liestol method and Buckley-James method) and one method
(Ryutov method) developed by our group for applying neural networks to censored
survival-type data are evaluated. Using the EPILOG® NN utility developed by our group,
feed-forward back-propagation networks are examined under a variety of model
assumptions. A generalized version of the ROC curve, the C index, is used as a measure of
predictive accuracy for survival-type outcomes. Our results demonstrate that extensive
Monte Carlo experiments are required before any reliable conclusion can be drawn from
the simulations. Based on the C index values and the degree of variation in performance,
three interesting findings are as follows: 1) In the data set without censoring, the Ryutov
method might be the best method to choose. If NN running time is a concern, the Faraggi and
Simon (FS) method might be a better choice; 2) In the data set with censored cases, under the
assumption of no interaction terms between covariates, the Ryutov method is found to be more
reliable than the other three NN methods; under the assumption of interaction terms, no
single NN method was found to be dominant in performance; 3) In this study, the
Liestol method has relatively smaller C values and relatively larger variation, and
proves to be the least favorable method under the different model assumptions. However,
further extensive simulation studies based on more complicated model assumptions
need to be developed to justify these findings.
Table of Contents
Acknowledgement
Abstract
Table of Contents
List of Tables
Chapter
I Introduction
II Methods
2.1 NN Method for Censored Survival Data
2.1.1 Faraggi and Simon Method
2.1.2 Liestol, Andersen and Andersen Method
2.1.3 Buckley-James Method
2.1.4 Ryutov Method
2.2 Minimization Method
2.2.1 Gradient Method
2.2.2 Conjugate Gradient Method
2.2.3 Newton Method
2.3 Predictive Accuracy (C Index)
2.4 Simulation Study
2.4.1 Factors To Be Considered
2.4.2 Simulation Design
2.4.3 Simulation Data Sets
2.4.4 Parameter Setting for NN Run
2.4.5 Optimal NN Iterations
III Results
IV Discussion
Reference
Appendix
List of Tables
Table I: Simulation Results Based on Base Design
Table II: Simulation Results Based on Design [2]
Table III: Simulation Results Based on Design [3]
Table IV: Simulation Results Based on Design [4]
Chapter I
INTRODUCTION
Neural networks (NNs) are sophisticated computer programs which utilize
principles of artificial intelligence. Theoretical work suggests that neural networks can
consistently match or exceed the performance of other statistical methods (Hornik 1989).
However, the application of neural networks to predict clinical events requires
development of a strategy to address the time course of a chronic disease process.
Outcomes may be best classified in terms of the time to an event, rather than simply the
presence or absence of an event. Many problems of medical prediction involve the use of
right-censored survival data, yet the outcome has generally been coded as an uncensored
discrete variable with all censored cases omitted or included in the highest category
(Faraggi and Simon 1995). Simple exclusion of censored observations from the available
training set would limit the amount of data available for network development and could
lead to significant biases in event predictions.
In this thesis, several strategies for applying neural networks to censored data are
evaluated and compared. Using the EPILOG® NN utility developed by our group, feed-forward
back-propagation networks are examined. Minimization methods are chosen that
can determine optimal parameters during the analysis. A generalized version of the ROC
curve, the C index, is used as a measure of predictive accuracy for survival-type
outcomes (Harrell et al. 1984 and Frank et al. 1982).
Chapter II
METHODS
2.1 NN Method for Censored Survival Data
2.1.1 Faraggi and Simon Method The method due to Faraggi and Simon generalizes Cox
regression to allow non-linear functions in place of the usual linear combination of
covariates. Since a NN can be mathematically represented as a non-linear function of
covariates, they suggest a Cox-like model in which the NN output is used in place of the
linear function. This method retains the proportional hazards nature of the Cox model,
but provides the ability to model complexities and interactions in the input data that simple
Cox models would miss. A basic single hidden layer feed-forward neural network example
was reviewed in Faraggi and Simon’s paper; this example is shown in Figure 2.1.1.
Each input is connected directly to all but one special node in the hidden layer. This
special node in the hidden layer connects to all output nodes but not to the input nodes. Its
role in the NN is similar to that of the constant term on the right-hand side of a linear regression
model. However, in the Cox proportional hazards model this term cancels out because
its effect is absorbed in the baseline hazard.
Figure 2.1.1. Faraggi and Simon Single Hidden Layer NN Structure (input nodes 1, X1, X2, ..., Xp; hidden layer; output layer)
Consider a vector of covariates $X_i = (X_{i0}, X_{i1}, \ldots, X_{iP})'$ for the $i$th case as an input to
the network ($i = 1, \ldots, N$), where $X_{i0}$ is always 1. The $p$th input node takes the value $X_{ip}$ for
this input vector ($p = 0, \ldots, P$). A weight $w_{ph}$ is associated with the connection between
input node $p$ and hidden node $h$ ($h = 1, \ldots, H$). The output of hidden node $h$ is $f(w_h' x_i)$, where
$w_h = (w_{0h}, \ldots, w_{Ph})'$ and $f(\cdot)$ is a “squashing function”. The most commonly used squashing function is the
logistic function. The functional form of the output from a single hidden layer NN with $H$
hidden nodes for a given input vector $X_i$ is
$$g(x_i, \theta) = \alpha_0 + \sum_{h=1}^{H} \alpha_h f(w_h' x_i) = \alpha_0 + \sum_{h=1}^{H} \frac{\alpha_h}{1 + \exp(-w_h' x_i)} \qquad (1)$$
where $\theta$ denotes the vector of unknown parameters, $(w_{01}, w_{11}, \ldots, w_{P1}, w_{02}, w_{12}, \ldots, w_{P2},
\ldots, w_{0H}, w_{1H}, \ldots, w_{PH}, \alpha_0, \alpha_1, \ldots, \alpha_H)'$. The number of parameters is $m = (H+1) + (P+1)H$.
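As an illustration of equation (1), the following is a minimal Python sketch of the single hidden layer output with a logistic squashing function. The function names (`logistic`, `nn_output`, `n_parameters`) and the list-based data layout are assumptions made for this example; they are not taken from the EPILOG software.

```python
import math

def logistic(z):
    """Logistic squashing function f(z) = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + math.exp(-z))

def nn_output(x, w, alpha):
    """Equation (1): alpha[0] + sum over hidden nodes of alpha[h] * f(w_h . x).

    x     : covariate vector with x[0] == 1 (length P + 1)
    w     : list of H weight vectors, each of length P + 1 (assumed layout)
    alpha : output weights of length H + 1; alpha[0] is the constant term
    """
    hidden = [logistic(sum(wp * xp for wp, xp in zip(w_h, x))) for w_h in w]
    return alpha[0] + sum(a * h for a, h in zip(alpha[1:], hidden))

def n_parameters(P, H):
    """Number of parameters m = (H + 1) + (P + 1) * H."""
    return (H + 1) + (P + 1) * H
```

With two covariates and two hidden nodes, as in the simulation study, `n_parameters(2, 2)` counts three output weights and six input-to-hidden weights.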
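Maximum likelihood estimates (MLEs) of the parameters of the neural network
are obtained using the Newton-Raphson method to maximize the partial likelihood. The
partial likelihood function can be written as:
$$L_c(\theta) = \prod_{i} \frac{\exp\left\{\sum_{h=1}^{H} \alpha_h / [1 + \exp(-w_h' x_i)]\right\}}{\sum_{j \in R_i} \exp\left\{\sum_{h=1}^{H} \alpha_h / [1 + \exp(-w_h' x_j)]\right\}} \qquad (2)$$
where the product is taken over the cases $i$ with an observed event and $R_i$ is the set of cases still at risk at the event time of case $i$.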
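The partial likelihood above can be sketched in Python as a negative log-likelihood to be minimized. This is only an illustrative sketch: the function name and argument layout are assumptions, the scores are taken to be the network outputs without the constant term (which cancels), and tied event times are handled naively by including every case with a follow-up time at least as long in the risk set.

```python
import math

def neg_log_partial_likelihood(times, events, scores):
    """Negative log of a Cox-type partial likelihood.

    times  : observed follow-up times
    events : 1 if the event was observed, 0 if censored
    scores : network outputs for each case, without the constant term
    """
    total = 0.0
    for i, (t_i, d_i) in enumerate(zip(times, events)):
        if not d_i:
            continue  # censored cases contribute only through risk sets
        # Risk set: every case whose follow-up time is at least t_i.
        risk_sum = sum(math.exp(s) for t, s in zip(times, scores) if t >= t_i)
        total -= scores[i] - math.log(risk_sum)
    return total
```

Adding the same constant to every score leaves the value unchanged, which is the sense in which the constant term is absorbed in the baseline hazard.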
2.1.2 Liestol, Andersen and Andersen Method The method due to Liestol and
coworkers (Liestol et al. 1994) creates (with certain constraints) a NN that is exactly
equivalent to a Cox model for grouped survival data. Survival times are grouped into time
intervals during which the hazard is assumed to be constant, and an output node is established
for each interval. Figure 2.1.2 shows an example of such a NN: there are no hidden layers,
the weights from an input to each output are constrained to be the same, and the
transfer function is appropriately chosen. The input covariates $x_1, x_2, \ldots$ are multiplied by
the connection weights $w$ for the connections from input nodes to output nodes. These
products are summed in the output nodes and a bias parameter is added to give the net value
$V_k$ in node $k$ (Equation 3). Finally, $V_k$ is transformed by an activation function $g$, for example a
logistic function, to give the output value $O_k$ (Equation 4).
$$V_k = \sum_j w_j x_j + w_{0k} \qquad (3)$$
$$O_k(x, w) = g\Big(\sum_j w_j x_j + w_{0k}\Big) \qquad (4)$$
With this architecture and these constraints, the NN is trained to ‘predict’ the hazards
for each person, with results being identical to a Cox grouped survival analysis of the same
data. Eliminating the constraint that all weights from a node be identical results in non-proportional
hazards for the Cox regression, while adding a layer of nodes produces a
form of Cox-NN hybrid with some of the properties of both.
Figure 2.1.2 Liestol, Andersen & Andersen: No Hidden Layer NN Structure (inputs X1, X2, X3, X4; outputs O1, O2, O3)
2.1.3 Buckley-James Method The original Buckley-James (Buckley and James 1979)
method, as applied to linear regression model, accommodates censored observations by
modifying the normal equations rather than the sum of squares of residuals. The Buckley-
James method involves replacement of censored cases with their expected values given (a)
the time of censoring, and (b) the residual distribution of all cases (the observed
values minus the predicted value for the current network). In this method, NN predictions
are compared to the actual values on each run and the differences (residuals) used to
calculate a Kaplan-Meier-type curve. Based on the residual distribution (as reflected in
the Kaplan-Meier curve), it is possible to estimate the expected survival for any person
who was censored. The Buckley-James method has been generalized to the NN setting to
determine the expected survival for all censored individuals (based on the current weight
matrix) and to substitute the expected value for the censored value when determining and
back-propagating the error. Since the predicted value changes as the network changes, so
do the expected values.
2.1.4 Ryutov Method This method involves a change to the error function so that the
error calculated for censored observations only penalizes under-prediction. The quadratic
error function is modified for censored observations so that it utilizes only the available
information (the predicted time should not be less than the censoring time). This seems
reasonable, since predictions less than the observed time are clearly in error, and these
records should contribute to the error function; predictions above the observed
time may or may not be in error, so these records do not contribute to the error function.
The error for uncensored observations remains equal to the squared difference between
actual and predicted survival. An advantage of this method is that the error function is
quadratic and minimization methods work very well with it.
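A minimal Python sketch of such a one-sided error function is given below. The exact functional form used in EPILOG is not stated here, so the squared one-sided penalty for censored cases, the function name and the argument layout are all assumptions made for illustration.

```python
def ryutov_error(predicted, observed, censored):
    """Sum of squared errors with a one-sided penalty for censored cases.

    predicted : predicted survival times
    observed  : observed times (event time, or censoring time if censored)
    censored  : True where the observation is censored
    """
    total = 0.0
    for p, t, c in zip(predicted, observed, censored):
        if c and p >= t:
            # Censored case predicted at or beyond the censoring time:
            # the prediction may be correct, so it adds no error (assumption).
            continue
        # Uncensored case, or censored case that was under-predicted.
        total += (t - p) ** 2
    return total
```

For example, a censored case observed at time 3 adds nothing when the prediction is 5, but adds 1 when the prediction is 2.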
2.2 Minimization Method
2.2.1 Gradient Method This method was used by Liestol et al. to minimize the error
function via back propagation. Liestol et al. made the learning rate a decreasing
function of the iteration number, which is an improvement over a fixed rate; however, it does
not perform well if large initial steps take the solution far from the minimum. An
alternative algorithm is to increase the learning rate if the error function has decreased and
decrease it when the error function has increased. This algorithm takes less time for one
iteration step, but requires many iterations and generally shows poor convergence.
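The adaptive learning rate rule can be sketched as follows. This is an illustrative Python variant, not the implementation used in the study: the growth and shrink factors (`up`, `down`), the number of steps, and the choice to discard a step that increases the error are all assumptions.

```python
def adaptive_gradient_descent(grad, error, w, rate=0.005, up=1.1, down=0.5, steps=200):
    """Gradient descent whose learning rate grows after a successful step
    and shrinks after a step that would increase the error.

    grad  : function returning the gradient (list) at w
    error : function returning the error at w
    w     : initial weights (list of floats)
    """
    err = error(w)
    for _ in range(steps):
        g = grad(w)
        trial = [wi - rate * gi for wi, gi in zip(w, g)]
        trial_err = error(trial)
        if trial_err < err:
            w, err = trial, trial_err   # accept the step
            rate *= up                  # error decreased: increase the rate
        else:
            rate *= down                # error did not decrease: reduce the rate
    return w, err
```

On a one-dimensional quadratic error such as `(w[0] - 3) ** 2`, starting from the initial learning rate of 0.005 used in this study, the rate first grows geometrically and then oscillates around the largest value that still reduces the error.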
2.2.2 Conjugate Gradient Method Alex Ryutov tested various modifications of the
gradient method and found that the conjugate gradient minimization method resulted in a
significant improvement over the simple gradient method. He also implemented various
algorithms for determining the search direction for the conjugate gradient method; all of them were
found to be equivalent provided the error function is exactly quadratic. The learning rate can be
determined either by performing a line search or by using the Scaled Conjugate Gradient
algorithm. The Hessian matrix can be evaluated exactly (during back propagation) or
numerically, by applying central differences to the first derivatives of the error function.
The conjugate gradient method performs more operations on each iteration step and also requires
more computer memory; however, it converges much faster than the gradient method.
2.2.3 Newton Method Faraggi and Simon used the Newton-Raphson method to
maximize the partial likelihood function and approximated both the first and second
derivatives numerically. Because the numerical approximations of the derivatives can lead
to large round-off errors and can become computationally prohibitive for large networks,
Ryutov has implemented a limited memory Newton method which builds up an
approximation to the inverse Hessian matrix over a number of steps. It uses a line search
to determine the learning rate. The “line search” algorithm has been implemented in the
study to determine a learning rate which minimizes the error function at each iteration in
the direction of the search.
2.3 Predictive Accuracy (C Index)
With a binary outcome, the area under a receiver operating characteristic (ROC)
curve has been widely used as a quantitative index of model predictive accuracy. ROC
curves have also been used in several neural network studies (Lapuerta et al. 1995 and
Patil et al. 1993). A high neural network score (Hanley and McNeil 1982) may have
excellent specificity but poor sensitivity for predicting clinical outcomes, whereas a low
neural network score will be more sensitive but less specific. The ROC curve plots
combinations of sensitivity and specificity for the entire range of model predictions,
providing an overall view of performance, and the area under the ROC curve is a useful
measure of prognostic accuracy. In survival analysis the area under the ROC curve can be
used as a measure of prognostic accuracy at a specific time point, after dichotomizing the
output (Knaus et al. 1995). However, the index can only be used for analyzing model
performance when the outcome (or response) is dichotomous.
Motivated by rank tests based on Kendall’s tau developed by Brown et al., Harrell
et al. (Harrell et al. 1982 and Frank et al. 1984) derived a more powerful index of
concordance, called the C index, which can be considered a generalization of the area under the
ROC curve for censored data. When used in a model in which the outcome is binary, the C
index reduces to the fraction of pairs of patients, one with and one without the event, such
that the one with the event has the higher predicted event probability. In this case, the C
index is the area under a ROC curve.
To calculate the C index, take all possible pairings of patients. For a given pair, we
say that the predictions are concordant with the outcomes if the patient having the higher
predicted survival probability lived longer. Survival times can be compared either when both
patients have died or when one has died and the other’s follow-up time has exceeded the
survival time of the first. If both patients are still alive at the end of the study or censored, or if
only one has died and the follow-up of the other is less than the survival time of the first,
then the concordance of model predictions cannot be determined. The process is
repeated until all possible pairs of patients have been examined. Of the pairs of patients for
which the ordering of survival times could be inferred, the fraction of pairs such that the
patient with the higher score had the longer survival time is denoted the C index. In
other words, the C index is the proportion, among all pairs of patients for which we could
determine the ordering of survival times, of pairs for which the predictions are concordant.
The index is easy to interpret since it estimates the probability that, for a randomly
chosen pair of patients, the one having the higher predicted survival is the one who survives
longer. Values of C near 0.5 indicate that the predictive accuracy of the model is no better
than a fair coin flip in determining which patient will live longer; the
model is therefore not predictive. Values of the C index near 0 or 1 indicate that the input data virtually
always determine which patient has the better prognosis (Frank et al. 1982).
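The pairwise procedure described above can be sketched in Python as follows. The function name and argument layout are assumptions for illustration. Two conventions are also assumed because the text does not specify them: pairs with identical observed times are skipped, and pairs with tied scores count as half concordant.

```python
def c_index(times, events, scores):
    """Concordance index for right-censored data.

    times  : observed follow-up times
    events : 1 if death was observed, 0 if censored
    scores : predicted survival (a higher score means longer predicted survival)
    """
    concordant = 0.0
    usable = 0
    n = len(times)
    for i in range(n):
        for j in range(i + 1, n):
            # Order the pair so that a has the shorter observed time.
            a, b = (i, j) if times[i] <= times[j] else (j, i)
            # The ordering of survival times is known only if the shorter
            # time is an observed death; tied times are skipped (assumption).
            if times[a] == times[b] or not events[a]:
                continue
            usable += 1
            if scores[a] < scores[b]:
                concordant += 1.0
            elif scores[a] == scores[b]:
                concordant += 0.5   # tied predictions (assumption)
    return concordant / usable if usable else float("nan")
```

A pair in which the shorter follow-up time belongs to a censored patient is never counted, which matches the rule that concordance cannot be determined for such pairs.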
2.4 Simulation Study
2.4.1 Factors To Be Considered In this section, we describe a simulation study to
evaluate the predictive accuracy of the four NN methods. Simulated experiments are
extremely useful for three reasons. First, the underlying distribution of the variables and the
relationships among them are known, and the predictions of a fitted model can be
directly assessed against the “true” outcome values. Second, simulated data with
different model specifications and assumptions are easy to obtain. Third, the sample size
can be as large as needed. In our simulation experiments, we focus on the case where
there are only two covariates (inputs). In order to fairly assess the performance of the
different NN methods under different circumstances, two factors are considered in our
analysis: 1) inclusion/exclusion of an interaction term (a first-order interaction is
considered in the model); 2) degree of censoring (two different degrees of censoring are
considered: i. no censoring; ii. intermediate censoring, about 20%). The
study designs are based on all combinations of these assumptions. Four NN survival
techniques (the Faraggi and Simon; Liestol, Andersen and Andersen; Buckley-James; and Ryutov
methods) are applied under these designs. These Monte Carlo simulation results
offer a more balanced view of the relative merits of the different NN methods under
different conditions.
2.4.2 Simulation Design
In this study we compare the relative performance of different neural network
methods in dealing with survival-type data. A parametric survival model with the assumption
of constant risk, $\lambda(t) = \lambda$ (the exponential model), is used in the study. The model has the
following functional form:
$$S(t) = \exp\Big(-\int_0^t \lambda \, du\Big) = e^{-\lambda t} \qquad (5)$$
$$t = -\log(S(t)) / \lambda \qquad (6)$$
where $S(t)$ is the survival probability and $\lambda$ is the hazard rate. The hazard rate is
defined as:
$$\lambda = \exp\{\alpha + \beta_1 X_1 + \beta_2 X_2 + \gamma X_1 X_2\} \qquad (7)$$
where $\alpha$ is the intercept, $\beta_1$, $\beta_2$ and $\gamma$ are model coefficients, and $X_1$ and $X_2$ are covariates
(inputs). $X_1 X_2$ is the interaction term.
The values of $X_1$ are randomly generated from a binomial distribution with
probability 0.5. The values of $X_2$ are randomly generated from the standard normal
distribution N(0,1). The values of $S(t)$ are randomly generated from a uniform distribution
U(0,1). 200 samples are obtained under these assumptions.
The database is further split into 100 training records and 100 testing records. The
censoring time was randomly drawn from an exponential distribution. If the censoring time is
greater than the event time, the case is defined as uncensored and the survival time
is the event time; if the censoring time is less than the event time, the case is defined
as censored and the survival time is the censoring time.
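The data-generating steps above can be sketched in Python using only the standard library. The function name, the record layout and the `censor_rate` parameter are assumptions for illustration; in particular, the rate of the exponential censoring distribution is not given in the text, so any value passed here is hypothetical and would need tuning to reach a target censoring fraction.

```python
import math
import random

def simulate(n=200, alpha=0.0, beta1=1.0, beta2=0.25, gamma=0.0,
             censor_rate=None, seed=0):
    """Generate n records (x1, x2, time, event) and split them in half.

    censor_rate : rate of the exponential censoring distribution
                  (hypothetical parameter); None means no censoring.
    """
    rng = random.Random(seed)
    records = []
    for _ in range(n):
        x1 = 1 if rng.random() < 0.5 else 0          # binary covariate, p = 0.5
        x2 = rng.gauss(0.0, 1.0)                     # standard normal covariate
        lam = math.exp(alpha + beta1 * x1 + beta2 * x2 + gamma * x1 * x2)  # eq. (7)
        s = 1.0 - rng.random()                       # uniform on (0, 1], avoids log(0)
        event_time = -math.log(s) / lam              # eq. (6)
        if censor_rate is None:
            time, event = event_time, 1
        else:
            censor_time = rng.expovariate(censor_rate)
            if censor_time > event_time:
                time, event = event_time, 1          # uncensored
            else:
                time, event = censor_time, 0         # censored
        records.append((x1, x2, time, event))
    half = n // 2
    return records[:half], records[half:]            # training, testing
```

For example, `train, test = simulate(gamma=0.2, censor_rate=0.4)` produces a data set with an interaction term and some censoring; the value 0.4 is only a placeholder.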
The discrimination and calibration of these four NN methods were compared on
the validation set. The mean C index value and its standard deviation are used to assess
the performances of different NN methods and their variation. Mean differences in C
index are assessed with the use of the paired t test with a significance level of P<0.05.
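The paired comparison of C index values can be sketched as below. This is an illustrative Python sketch; pairing the two methods by simulation repetition is an assumption, as are the function name and return layout. The function returns the t statistic and its degrees of freedom rather than a P value, so that only the standard library is needed.

```python
import math
import statistics

def paired_t(c_a, c_b):
    """Paired t statistic for two lists of C index values of equal length,
    where element k of each list comes from the same repetition (assumption).

    Returns (t, degrees_of_freedom).
    """
    diffs = [a - b for a, b in zip(c_a, c_b)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)        # sample standard deviation
    return mean_diff / (sd_diff / math.sqrt(n)), n - 1
```

With 20 repetitions there are 19 degrees of freedom, and an absolute t value above about 2.093 corresponds to a two-sided P<0.05.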
2.4.3 Simulation Data Sets According to the above experimental design, we focus
on the case where there are only two covariates. We conduct four experiments based on
the following designs:
Design [1]: This is also called the base design. There are two input variables, $X_1$ and $X_2$. There
are no censored cases in this design. Based on equation (7), the values of $X_1$ are
randomly generated from a binomial distribution with probability 0.5. The values of $X_2$
are randomly generated from the standard normal distribution N(0,1). There are two hidden
nodes in the NN structure with one hidden layer. There is no interaction term in the
model. Without loss of generality, we set the baseline hazard rate equal to 1, i.e., $\alpha = 0$.
For equation (7), let $\beta_1 = 1$, $\beta_2 = 0.25$ and $\gamma = 0$.
Design [2]: The same as the base design except that there is a first-order interaction term in
equation (7), i.e. $\gamma = 0.2$.
Design [3]: The same as the base design except that there is intermediate censoring
(censored cases: about 20%).
Design [4]: The same as design [3] except that there is a first-order interaction term in
equation (7), i.e. $\gamma = 0.2$.
We create 200 observations for each design and perform 20 repetitions for each
experiment. In total, four experiments are performed, and each of them is run using
four different NN methods: the Faraggi and Simon; Liestol, Andersen and Andersen;
Buckley-James; and Ryutov methods. The prediction performance of these four NN
methods is compared based on the mean C index value and its variance.
2.4.4 Parameter Setting for NN Run The four study designs are based on the
following parameter settings for the NN runs. The gradient method is used with the FS method in the
study because of limited computer memory. The conjugate gradient method is used
with the Buckley-James method. The Newton minimization method with the “line search”
algorithm is recommended for use with the Liestol and Ryutov methods in the study. We
chose these combinations of NN methods and minimization methods mainly
because these specific combinations had optimal performance and used less computer
memory. There is only one hidden layer with two hidden nodes in the NN structure. The initial
learning rate in the study is set to 0.005, and is updated using the “line search” algorithm for
each minimization method. For the Liestol method, survival times are grouped into five time
intervals based on the 20%, 40%, 60% and 80% survival probabilities. Within each interval the
hazard is assumed to be constant, but the hazard is not assumed to be constant
across different intervals, and an output node is established for each interval.
2.4.5 Optimal NN Iterations Proc Neural is designed to automatically save the
weights at any iteration the user specifies. It also saves the optimal weights and the
corresponding iteration number when the Root Mean Square (RMS) error of the testing or
training group (whichever occurs first) begins to increase. At this point, the NN has
converged to a minimum and additional iterations might result in over-learning. However,
because the chosen iteration may represent a local rather than a global minimum,
additional iterations are performed to rule out this possibility. The optimal C index is
determined at the iteration with the global minimum RMS.
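The distinction between the first point at which the RMS begins to increase and the global minimum can be sketched as follows. This is a simplified Python illustration on a single RMS sequence; the function name and the zero-based iteration indexing are assumptions, and the actual Proc Neural logic, which monitors both the training and testing groups, is not reproduced here.

```python
def stopping_iterations(rms):
    """Return (first_local_minimum, global_minimum) as zero-based iteration
    indices for a list of RMS values recorded at successive iterations.

    first_local_minimum is the iteration just before the RMS first increases,
    or None if the RMS never increases.
    """
    first_local = None
    for k in range(1, len(rms)):
        if rms[k] > rms[k - 1]:
            first_local = k - 1
            break
    global_min = min(range(len(rms)), key=lambda k: rms[k])
    return first_local, global_min
```

For the sequence `[5.0, 4.0, 3.0, 3.5, 2.0, 2.5]` the RMS first rises after iteration 2, but the global minimum is at iteration 4, which is why additional iterations are run past the first increase.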
Chapter III
RESULTS
As a basis of comparison, we first run an experiment based on a simple model
specification. The experiment is based on design [1], in which there are no censored cases
and no interaction terms in the model. Results are shown in Table I. In this setting, the
Liestol method is worse than the other three NN methods: its mean C index value is
substantially smaller than those of the others, and this difference is statistically significant
(P<0.05, paired t test). The Ryutov method is the most attractive method in this design: it has
significantly higher C values compared with the other NN methods in the training subset
(P<0.05). The variation of the Ryutov method is the smallest among all NN methods, which
suggests that the Ryutov method is more stable and therefore more reliable than the other NN
methods. There is no difference in mean C index between the FS and Buckley-James methods
(P>0.05). FS has a relatively smaller standard deviation compared with the Buckley-James
method. The FS method requires the fewest NN runs, whereas the Buckley-James method needs
many more NN runs in this design.
The second experiment is based on design [2] and the results are reported in
Table II. The relative performance in design [2] is similar to that in design [1], except
that there is a striking difference in performance for the Buckley-James method between Table I and
Table II. Compared with the other three NN methods, the Buckley-James method is less favorable
because it has a much higher standard deviation. Its C index value is significantly lower
than that of the Ryutov method in both the training subset and the testing subset (P<0.05). This also
suggests that the Buckley-James method’s performance is very sensitive to the linear vs. non-linear model
specification. The Liestol method has relatively poorer performance in this experiment, with a
lower mean C index value relative to the other NN methods (P<0.05); however, its variation
is smaller than that of the FS method. Again, the Ryutov method is shown to be the most attractive
method among the four NN methods: it has a relatively higher mean C value and lower variation
relative to the other NN methods (P<0.05).
Table III reports the simulation results based on design [3]. There are about 19%
censored cases in the simulation data. Similar to Table I, the Liestol method still has
the highest variation and lowest C index values compared to the other methods (P<0.05).
Contrary to the findings in Table I, the Buckley-James method now performs better than the FS
method in terms of its relatively lower standard deviation, but at the expense of more NN
iterations. In this design, the Ryutov method is still the most attractive method in terms of its
C values and standard deviation.
Table IV describes the results based on design [4]. There are about 21%
censored cases in this design. Unlike the results from the other tables, there is no single
dominant method in this design among the FS, Liestol and Ryutov methods. There is no
difference in mean C index between the Ryutov and Buckley-James methods in both the training
and testing subsets (P>0.05), and there is no difference in mean C index between the Ryutov
and FS methods in the training subset (P>0.05). Although there is no difference in
mean C index between the FS and Buckley-James methods in both the training and testing subsets
(P>0.05), FS requires relatively fewer NN runs whereas Buckley-James requires many more NN runs.
The mean C index value from the Liestol method is still significantly lower than those of the other
three methods (P<0.05).
Chapter IV
DISCUSSION
Our review of the literature shows that this is the first research attempt to compare
the relative performance of established NN methods in dealing with censored survival-type
data. Existing neural network designs cannot readily accommodate censored survival
analysis. The application of neural networks to predict clinical events requires
development of a strategy to address the time course of a chronic disease process.
Outcomes may be best classified in terms of the time to an event, rather than simply the
presence or absence of an event. Simple exclusion of censored cases from the data set
would lose degrees of freedom, limit the amount of information available for neural
network development, and could lead to biased prediction results. Creation of the
EPILOG NN utility makes comparison among different NN methods easy. In addition,
the findings in this study can be easily duplicated and verified because of the commercial
availability of this software.
The purpose of this paper is to compare four available neural network models for
censored survival data and to demonstrate the relative merits of these techniques. Our
results demonstrate that extensive Monte Carlo experiments are required before any
reliable conclusion can be drawn from the simulations. Based on the mean C index value
and the degree of variation in performance, three interesting findings are as follows: 1) In the
data set without censoring, the first choice is the Ryutov method and the second choice is the FS or
Buckley-James method; if NN running time is a concern, FS might be the better second
choice. However, this conclusion must be used with caution. Theoretically, the
Buckley-James method should not differ from the Ryutov method in data without censoring.
In this analysis, however, the mean C index value is significantly higher for the Ryutov method
than for the Buckley-James method. This inconsistency arises because the optimal C
index in the original Proc Neural was determined at the iteration at which the minimum RMS for
the training or testing subset (whichever occurred first), rather than for the training subset alone,
was achieved. Further extensive work will be carried out based only on the optimal C index
from the minimum RMS for the training group. 2) In the data sets with censored cases, under
the assumption of no interaction term, the Ryutov method is found to be more reliable
than the other three NN methods, and the Buckley-James method might be the
second-best choice; under the assumption of an interaction term, no single NN method among FS,
Liestol and Ryutov was found to be dominant in performance, although the Ryutov method
might still be relatively better. 3) In this study, the Liestol method consistently has the
smallest C index values and relatively larger variation, and it proved to be the least
favorable method.
The EPILOG® NN utility developed by our group has several potential advantages.
To our knowledge, commercial neural network software for analyzing survival data is not
available, because the adaptation of NN methods to censored survival
analysis is still new. The EPILOG® NN procedure determines the optimal number of NN runs
automatically, which is an advantage over traditional commercial software, in which
the optimal number of NN runs must be determined by intensive visual inspection. Overfitting the
data is a major concern in neural network design. To overcome this problem,
cross-validation is used in this design. The point of termination is determined by an
estimate of prediction error computed from a validation set of observations that is not part of the
set used for training (further analyses need to be done based only on the minimum RMS
for the training subset). That is, the iterative procedure used for maximizing the fit of
predictions to observed responses is terminated before convergence is reached. The
EPILOG® NN procedure can evaluate several competing neural network strategies under
a variety of conditions. The performance of models can be easily compared using instantly
calculated receiver operating characteristic (ROC) curves and the generalized C index.
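The stopping rule described above can be illustrated with a short sketch. The text does not specify how EPILOG decides that the validation RMS has passed its minimum, so the `patience` parameter (the number of non-improving iterations tolerated before stopping) and the function name `early_stopping_iteration` are assumptions made for illustration only.

```python
def early_stopping_iteration(validation_rms, patience=5):
    """Scan a sequence of per-iteration validation RMS values and return
    (best_iteration, best_rms), where iterations are numbered from 1.

    Scanning stops once `patience` consecutive iterations have passed
    without a new minimum (an assumed rule, not the EPILOG rule).
    """
    best_rms = float("inf")
    best_iter = 0
    for iteration, rms in enumerate(validation_rms, start=1):
        if rms < best_rms:
            # New minimum: remember where it occurred.
            best_rms = rms
            best_iter = iteration
        elif iteration - best_iter >= patience:
            # No improvement for `patience` iterations: stop training.
            break
    return best_iter, best_rms


# Hypothetical validation curve that reaches a minimum and then rises.
curve = [0.90, 0.70, 0.60, 0.62, 0.65, 0.70, 0.50]
print(early_stopping_iteration(curve, patience=3))
```

With a small `patience` the scan stops at the first local minimum and never sees the later, lower value at the end of the hypothetical curve. This is one reason the choice of stopping criterion can change which iteration, and therefore which C index, is reported as optimal.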
There are some limitations in this study. The optimal C index in the original Proc
Neural was determined at the iteration at which the minimum RMS for the training or testing
subset (whichever occurred first) was achieved. Further extensive work will be carried out
based only on the optimal C index from the minimum RMS for the training group. In
addition, the C index values for the four NN methods are somewhat lower than expected.
However, this is understandable because only two input variables are considered so far in
our design. NN methods have been found to be more predictive in more complicated data
sets. A more extensive study design is needed to confirm these findings.
Moreover, we simulated only 200 observations; future studies might need to consider the
impact of different sample sizes on the comparative results.
Neural networks are more powerful than classical statistical tools when the
underlying model specification, or the relationship between the dependent variable and the input
variables, is not linear. For example, in a survival data set, if we do not know the hazard
function or the functional form of the model, an NN does not need to impose a specific functional
relationship between dependent and independent variables. Instead, the functional
relationship and hazard type are determined by the data in the process of finding values for
the weights. The advantage of this process is that the network is able to approximate any
continuous function, and we do not have to guess the functional form. The disadvantages
of NNs are that the network is difficult to interpret and that convergence to a solution can be
slow. If we already know the underlying model, the traditional statistical tool will perform
better in the analysis, and the NN will be less efficient.
References
Buckley J and James I. Linear Regression with Censored Data. Biometrika 66, 429-
436 (1979)
Faraggi D and Simon R. A Neural Network Model for Survival Data. Statistics in
Medicine 14, 73-82 (1995)
Harrell FE Jr, Califf RM, et al. Evaluating the Yield of Medical
Tests. JAMA 247(18), May 14 (1982)
Hanley JA and McNeil BJ. The Meaning and Use of the Area Under a Receiver Operating
Characteristic (ROC) Curve. Radiology 143, 29-36 (1982)
Harrell FE, Lee KL, Califf RM, Pryor DB, and Rosati RA. Regression Modeling
Strategies for Improved Prognostic Prediction. Statistics in Medicine
3, 143-152 (1984)
Hornik K. Multilayer feedforward networks are universal approximators. Neural Networks
2, 359-366 (1989)
Knaus WA, Harrell FE, Lynn J, Goldman L, et al. The SUPPORT Prognostic Model:
Objective Estimates of Survival for Seriously Ill Hospitalized Patients. Ann
Intern Med 122, 191-203 (1995)
Lapuerta P, Azen SP, and LaBree L. The Use of Neural Networks in Predicting Risk of
Coronary Artery Disease. Comput Biomed Res 28, 38-52 (1995)
Liestol K, Andersen PK, and Andersen U. Survival analysis and neural nets.
Statistics in Medicine 13, 1189-1200 (1994)
Patil S, Henry JW, Rubenfire M, and Stein PD. Neural network in the clinical diagnosis of
acute pulmonary embolism. Chest 104, 1685-1689 (1993)
Table I: Simulation Results Based on Base Design

NN Method   Training Set (C Index)   Testing Set (C Index)   NN Iterations (Optimal #)
            Mean*      STD           Mean*      STD          Mean     STD
FS          0.6220*    0.0497        0.6249*    0.0450          2       0
Liestol     0.4767     0.0403        0.4738     0.0732         21      11
Buckley     0.6255*    0.0672        0.6220**   0.0610         74      62
Ryutov      0.6495     0.0289        0.6400**   0.0254         28      13

Design [1]: It is also called the base design. There are two input variables, X1 and X2. There are no censoring cases
in this design. λ = exp{α + β1 X1 + β2 X2 + γ X1 X2} with X1 ~ Binomial(200, 0.5), X2 ~ N(0,1), α = 0, β1 = 1,
β2 = 0.25 and γ = 0.
* Means with the same letter (a, b, or c) are not significantly different at the P<0.05 level with use of the paired t test.
The remaining pairs are significantly different with P<0.05.
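The simulation design in the table footnote can be sketched as follows. Several details are assumptions, because the footnote does not state them: survival times are drawn from an exponential distribution whose hazard is the exponentiated linear predictor, X1 is treated as a 0/1 variable with probability 0.5 for each of the 200 observations, and censoring (used only in the censored designs) comes from an independent exponential distribution. The function name `simulate_design` and the parameter `censor_rate` are hypothetical.

```python
import math
import random


def simulate_design(n=200, alpha=0.0, beta1=1.0, beta2=0.25, gamma=0.0,
                    censor_rate=None, seed=0):
    """Return a list of (x1, x2, observed_time, event) tuples.

    censor_rate=None reproduces a design without censoring; a positive
    value adds independent exponential censoring (an assumed mechanism).
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x1 = 1 if rng.random() < 0.5 else 0      # assumed Bernoulli(0.5)
        x2 = rng.gauss(0.0, 1.0)                 # X2 ~ N(0, 1)
        hazard = math.exp(alpha + beta1 * x1 + beta2 * x2 + gamma * x1 * x2)
        event_time = rng.expovariate(hazard)     # assumed exponential survival
        if censor_rate is None:
            rows.append((x1, x2, event_time, 1))
        else:
            censor_time = rng.expovariate(censor_rate)
            observed = min(event_time, censor_time)
            rows.append((x1, x2, observed, int(event_time <= censor_time)))
    return rows


base = simulate_design()                          # no censoring, gamma = 0
interaction = simulate_design(gamma=0.2)          # first-order interaction
censored = simulate_design(censor_rate=0.5)       # some cases censored
print(sum(1 - event for *_, event in censored), "censored cases out of", len(censored))
```

Under these assumptions the censoring fraction depends on `censor_rate` relative to the hazards, so the rate would need to be tuned to reach a target such as about 20% censoring.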
Table II: Simulation Results Based on Design [2]

NN Method   Training Set (C Index)   Testing Set (C Index)   NN Iterations (Optimal #)
            Mean*      STD           Mean*      STD          Mean     STD
FS          0.6156*    0.0647        0.6273*    0.0591          4       5
Liestol     0.4858     0.0550        0.4967     0.0507         24      10
Buckley     0.5900*    0.1059        0.5907*    0.1069         46      57
Ryutov      0.6583     0.0357        0.6568     0.0317         30      15

Design [2]: The same as base design except that there is a first-order interaction term in the equation 7,
i.e. γ = 0.2
* Means with the same letter (a, b, or c) are not significantly different at the P<0.05 level with use of the t test.
The remaining pairs are significantly different with P<0.05.
Table III: Simulation Results Based on Design [3]

NN Method   Training Set (C Index)   Testing Set (C Index)   NN Iterations (Optimal #)
            Mean*      STD           Mean*      STD          Mean     STD
FS          0.6099*    0.0787        0.6314*    0.0627          4       9
Liestol     0.5120     0.0611        0.5009     0.0715         20      13
Buckley     0.6449*    0.0405        0.6393**   0.0500        172      35
Ryutov      0.6587     0.0293        0.6561**   0.0328         27      15

Design [3]: The same as base design except that there is medium censoring (Censoring cases: about 20%)
* Means with the same letter (a, b, or c) are not significantly different at the P<0.05 level with use of the t test.
The remaining pairs are significantly different with P<0.05.
Abstract
Neural networks (NNs) are sophisticated computer programs which utilize principles of artificial intelligence. Theoretical work suggests that neural networks can consistently match or exceed the performance of other statistical methods. However, the application of neural networks to predict clinical events requires development of a strategy to address the time course of a chronic disease process. Three published strategies (the Faraggi and Simon method, the Liestol method and the Buckley-James method) and one method (the Ryutov method) developed by our group for applying neural networks to censored survival data are evaluated. Using the EPILOG® NN utility developed by our group, feed-forward back-propagation networks are examined under a variety of model assumptions. A generalized version of the area under the ROC curve, the C index, is used as a measure of predictive accuracy for survival-type outcomes. Our results demonstrate that extensive Monte Carlo experiments are required before any reliable conclusion can be drawn from the simulations. Based on the C index values and the degree of variation in performance, three interesting findings are as follows: 1) in the data set without censoring, the Ryutov method might be the best choice. If NN running time is a concern, FS might be a better choice
Asset Metadata
Creator: Yuan, Yong (author)
Core Title: Comparison of predicting accuracy of neural networks for censored survival data using generalized Receiver Operating Charactaristic (ROC)-C-Index method
School: Graduate School
Degree: Master of Science
Degree Program: Applied biometry / epidemiology
Publication Date: 12/09/2020
Publisher: University of Southern California (original), University of Southern California. Libraries (digital)
Tag: OAI-PMH Harvest
Language: English
Contributor: Digitized by ProQuest (provenance)
Advisor: Xiang, Annie (committee chair), Azen, Stanley (committee member), Buckley, Jonathan (committee member)
Permanent Link (DOI): https://doi.org/10.25549/usctheses-c89-404185
Unique identifier: UC11666614
Identifier: 1417326.pdf (filename), usctheses-c89-404185 (legacy record id)
Legacy Identifier: 1417326.pdf
Dmrecord: 404185
Document Type: Thesis
Rights: Yuan, Yong
Type: texts
Source: University of Southern California (contributing entity), University of Southern California Dissertations and Theses (collection)
Access Conditions: The author retains rights to his/her dissertation, thesis or other graduate work according to U.S. copyright law. Electronic access is being provided by the USC Libraries in agreement with the au...
Repository Name: University of Southern California Digital Library
Repository Location: USC Digital Library, University of Southern California, University Park Campus, Los Angeles, California 90089, USA