The Impact of Data Collection Procedures on the Analysis of Randomized Clinical Trials
by
Elisabeth McIlvaine
A Thesis Presented to the
FACULTY OF THE KECK SCHOOL OF MEDICINE AT THE
UNIVERSITY OF SOUTHERN CALIFORNIA
In Partial Fulfillment of the
Requirements for the Degree
DOCTOR OF PHILOSOPHY in
BIOSTATISTICS
August 2015
Copyright 2015 Elisabeth McIlvaine
DEDICATION
I dedicate this document to my family and friends, without whom this work would not be
possible.
ACKNOWLEDGMENTS
My deepest gratitude to my mentors Mark Krailo and Wendy Mack, both of whom have been a
constant source of guidance and support.
TABLE OF CONTENTS
DEDICATION ........................................................................................................ ii
ACKNOWLEDGMENTS ..................................................................................... iii
LIST OF TABLES ................................................................................................. vi
LIST OF FIGURES .............................................................................................. vii
ABSTRACT ........................................................................................................... ix
Chapter 1 Introduction..........................................................................................1
1.1 Clinical Trial Analysis and Data Collection Procedures ..............................1
1.1.1 Censoring ..............................................................................................1
1.1.2 Reporting of Events ..............................................................................4
1.2 Alternative Data Collection Procedures........................................................5
1.2.1 Clinical Trial Example ..........................................................................6
Chapter 2 Literature Review ..............................................................................10
2.1 Early Work in Delayed and Unreported Events .........................................10
2.1.1 Insurance Claims .................................................................................10
2.1.2 AIDS Incidence ..................................................................................11
2.2 Estimating the Delay Time Distribution .....................................................13
2.3 Delayed Events in Survival Analysis ..........................................................19
2.3.1 Hu and Tsiatis .....................................................................................19
2.3.2 Hubbard and van der Laan ..................................................................25
2.4 Delayed Events as Missing Data.................................................................29
2.5 Conclusion ..................................................................................................33
Chapter 3 Estimation of the Expectation of the Hazard Rate .........................35
3.1 Data Collection Methods ............................................................................35
3.1.1 Censoring ............................................................................................35
3.1.2 Trial Structure Variables .....................................................................37
3.2 Data Collection Methods ............................................................................37
3.3 Estimating the Asymptotic Behavior of Rate Estimates .............................39
3.3.1 The Expectation of a Ratio of Random Variables ..............................39
3.3.2 Simultaneous Entry, No Late Reporting .............................................42
3.3.3 Simultaneous Entry, With Late Reporting ..........................................46
3.3.4 Random Entry, with Late Reporting ...................................................49
3.4 The Asymptotic Variance of the Rate Estimate ..........................................55
3.5 The Expectation of the Hazard Rate Under Specified Parameters .............59
3.5.1 Interactions Between Parameters ........................................................67
3.6 Conclusion ..................................................................................................71
3.7 Appendix .....................................................................................................75
Chapter 4 Simulation Results .............................................................................79
4.1 Methods.......................................................................................................79
4.2 Bias in Estimates of the Hazard Ratio ........................................................80
4.2.1 Probability of Delayed Event Reporting .............................................81
4.2.2 Interval Between Visits .......................................................................84
4.2.3 Time Elapsed Between End of Enrollment and Time of Analysis .....88
4.3 Power Analysis ...........................................................................................90
4.3.1 Methods...............................................................................................90
4.3.2 Probability of Late Reporting .............................................................91
4.3.3 Length of Visit Interval .......................................................................96
4.3.4 Time of Analysis ...............................................................................100
4.3.5 Conclusion ........................................................................................104
4.4 Test Size ....................................................................................................106
4.4.1 Methods..............................................................................................106
4.4.2 Probability of Late Reporting ...........................................................106
4.4.3 Interval Between Follow-Up Visits ..................................................112
4.4.4 Time of Analysis ...............................................................................116
4.5 Conclusion ................................................................................................119
Chapter 5 Application of Data Collection Methods to Clinical Trial Data ..122
5.1 Trials .........................................................................................................122
5.1.1 INT-0091...........................................................................................122
5.1.2 AEWS0031 .......................................................................................123
5.2 Methods.....................................................................................................123
5.3 Results .......................................................................................................124
5.4 Conclusion ...............................................................................................131
Chapter 6 Concluding Remarks and Future Work ........................................133
References ...........................................................................................................136
LIST OF TABLES
Table 1 Estimates of the Expectation of the Hazard Rate by Probability of Late Reporting 61
Table 2 Estimates of the Expectation of the Hazard Rate by Follow-Up Interval 63
Table 3 Estimates of the Expectation of the Hazard Rate by Time of Analysis 65
Table 4 Estimates of the Expectation of the Hazard Rate by True Hazard Rate 66
Table 5 INT-0091 Analyzed Based on Information Up to December, 1993 124
Table 6 INT-0091 Analyzed Based on Information Up to December, 1995 127
Table 7 AEWS0031 Analyzed Based on Information Up to December, 2006 128
Table 8 AEWS0031 Analyzed Based on Information Up to December, 2007 130
LIST OF FIGURES
Figure 1 Lexis Diagram for an Illustrative Clinical Trial 3
Figure 2 AEWS0031 – EFS (Cutoff Date 5/31/2006) 7
Figure 3 AEWS0031 – Cutoff 11/30/2005 8
Figure 4 Hazard Rate Estimate by Probability of Late Reporting 61
Figure 5 Hazard Rate Estimate by Interval Between Visits 63
Figure 6 Hazard Rate Estimate by Analysis Time 65
Figure 7 Hazard Rate by Visit Interval and Late Reporting Probability 67
Figure 8 Hazard Rate by Visit Interval and Late Reporting Probability 68
Figure 9 Hazard Rate by Analysis Time and Visit Interval 69
Figure 10 Hazard Rate by Analysis Time and Visit Interval 69
Figure 11 Hazard Rate by Analysis Time and Late Reporting Probability 70
Figure 12 Hazard Rate by Analysis Time and Late Reporting Probability 71
Figure 13 Hazard Rate by Visit Interval and Late Reporting Probability 75
Figure 14 Hazard Rate by Visit Interval and Late Reporting Probability 76
Figure 15 Hazard Rate by Analysis Time and Visit Interval 76
Figure 16 Hazard Rate by Analysis Time and Visit Interval 77
Figure 17 Hazard Rate by Analysis Time and Late Reporting Probability 77
Figure 18 Hazard Rate by Analysis Time and Late Reporting Probability 78
Figure 19 Hazard Ratio Estimates Relative to Delayed Reporting Probability 82
Figure 20 Hazard Ratio Estimates Relative to Delayed Reporting Probability 84
Figure 21 Hazard Ratio Estimates Relative to Follow-Up Interval 86
Figure 22 Hazard Ratio Estimates Relative to Follow-Up Interval 88
Figure 23 Hazard Ratio Estimates Relative to Time of Analysis 89
Figure 24 Power of Log-Rank Test by Probability of Late Reporting 91
Figure 25 Power of Exponential Regression by Probability of Late Reporting 92
Figure 26 Percent of 95% Confidence Intervals Including the True Hazard Ratio 94
Figure 27 Ratio Estimates by Probability of Late Reporting 95
Figure 28 Power of Log-Rank Test by Follow-Up Interval 96
Figure 29 Power of Exponential Regression by Follow-Up Interval 97
Figure 30 Percent of 95% Confidence Intervals Including the True Hazard Ratio 98
Figure 31 Ratio Estimates by Follow-Up Interval 99
Figure 32 Power of Log-Rank Test by Time of Analysis 100
Figure 33 Power of Exponential Regression by Time of Analysis 101
Figure 34 Percent of 95% Confidence Intervals Including the True Hazard Ratio 103
Figure 35 Ratio Estimates by Time of Analysis 104
Figure 36 Size of Log-Rank Test by Probability of Late Reporting 106
Figure 37 Size of Exponential Regression by Probability of Late Reporting 108
Figure 38 Percent of 95% Confidence Intervals Including the True Hazard Ratio 110
Figure 39 Ratio Estimates by Probability of Late Reporting 111
Figure 40 Size of the Log-Rank Test by Follow-Up Interval 112
Figure 41 Size of Exponential Regression by Follow-Up Interval 113
Figure 42 Percent of 95% Confidence Intervals Including the True Hazard Ratio 114
Figure 43 Ratio of Estimates by Follow-Up Interval 115
Figure 44 Size of the Log-Rank Test by Time of Analysis 116
Figure 45 Size of Exponential Regression by Time of Analysis 117
Figure 46 Percent of 95% Confidence Intervals Including the True Hazard Ratio 118
Figure 47 Ratio of Estimates by Time of Analysis 119
ABSTRACT
In randomized clinical trials, in which time to event is of interest, it is common practice to censor
event-free patients at their last visit prior to analysis while recording survival times of patients
with events at any time. I show that this preferential method of censoring can influence estimates
of the hazard rates and ratios when analysis is conducted before all patients experience an event,
as is usually the case in clinical trials. Three alternate methods of data collection are proposed
and asymptotic expressions for hazard rate estimates are derived for all methods both in general
and when the underlying survival distribution is assumed to be exponential. The effects of trial
length, visit schedule, and probability of reporting delay on estimates of the hazard rates, ratios,
power, and test size in an exponential regression setting are established by systematically varying
these parameters in simulation studies. The four data collection methods discussed are applied to
two randomized clinical trials. A data collection method is proposed based on asymptotic
properties and performance in simulated trials. Areas of future work are discussed, of which
interim monitoring is of particular interest.
Chapter 1. Introduction
1.1 Clinical Trial Analysis and Data Collection Procedures
The randomized clinical trial is considered the gold standard for determining the superiority or
non-inferiority of new interventions compared to those in current use. In response to widespread
use of this trial design, regulatory agencies such as the US Food and Drug Administration (FDA)
have provided guidances to assure adherence to good clinical practices (GCP) particularly with
respect to the protection of the interests of patients enrolled on such studies. This has led to the
application of similar statistical methods across a wide variety of clinical trials. However,
problems arise when the assumptions underlying the trial design are contradicted by the
procedures used to collect and report data. This contradiction can result in biased or imprecise
estimates of the effect of the investigated treatment and lead to incorrect conclusions regarding
its performance compared to the standard treatment. The aim of this dissertation is to illuminate
one such aspect of current oncology research, to describe the biases that result, and to propose
and evaluate alternative data collection methods. This dissertation will draw on data from
clinical cancer research. Although interventions will be referenced as chemotherapy, the work is
applicable to any intervention in a trial that uses random allocation and obtains data on the
outcome of interest by follow-up of study subjects.
1.1.1 Censoring
Consider a clinical trial to investigate a new chemotherapy regimen. Based on eligibility criteria,
patients treated at participating sites are enrolled in the trial and randomly assigned to either the
standard treatment or the experimental treatment. An event of interest indicative of the patient’s
disease status would be identified as the primary outcome measure to compare the regimens;
such events could include death from disease, recurrence of cancer, a second malignancy, or
some combination thereof. Patients would then be followed until this event occurred, they died
of another illness, or they were otherwise no longer under trial observation. Patients who do not
have the event of interest at the termination of follow-up and cease to be under observation for
the trial are said to be censored. This includes all patients who are event-free at the time that the
trial ends.
Two types of censoring under which standard analysis methods yield unbiased estimates
are type I censoring and type II censoring. Type I censoring is said to occur when the
observation portion of the trial is subject to a specific calendar end date. Type II censoring is said
to occur when observation is set to end when a specific number of events have been observed. In
both of these cases the censoring mechanism is independent of the event mechanism and is
therefore considered ignorable in the analysis. Clinical trials often claim to conform to type I
censoring. There is, however, a subtle difference between the ideal of ignorable censoring and
the reality of clinical trial logistics. In an ideal trial, at the point at which data are reported for
analysis, all patients would fall into one of the following categories: lost to follow-up, failed, or
censored at the time of reporting (Figure 1).
Figure 1. Graphical depiction of survival time under observation for patients in a clinical trial
relative to time of analysis.
Both censoring mechanisms require that the event-free participants who have not been
lost to follow-up at the end of the trial be censored at the same time. In reality, such patients are
censored at the last time their disease status is evaluated clinically, which may be weeks or months
before the specified end of the trial. This last event-free observation usually occurs at a regularly
scheduled follow-up visit, occurring at intervals defined by study design throughout the course of
observation of the participant. There are a number of reasons for the scheduled-follow-up
structure. The methods by which events such as cancer recurrence are detected may be invasive
or possess some inherent risk of their own, in which case it would be impossible to continuously
subject participants to such measures. They may also be costly or time-consuming in which case
time and monetary constraints would preclude the determination of every person’s status at brief,
constant intervals. Therefore the assumption of continuous follow-up, that every person’s status
be known at every moment up to the point of censoring, is rarely if ever met. When a censoring
mechanism is neither type I nor type II censoring, it cannot be assumed that such a mechanism is
ignorable. In fact, it may not even be possible to perform a statistical test to evaluate this
possibility. If a censoring mechanism is not ignorable, the standard survival analysis techniques
are inappropriate as an ignorable censoring mechanism is a key assumption.
1.1.2 Reporting of Events
Another way in which actual clinical trial practice deviates from the ideal clinical trial lies in the
reporting of those participants who experience events. A patient may become symptomatic
between scheduled visits, seek medical attention, and be diagnosed as having had an event of
interest. Since such an event is often a serious incident and may have significant impact on the
findings or even the continuation of the study, trial reporting procedures require that these events
be reported as soon as they are ascertained, regardless of the follow-up visit structure. This implies
that, for any given patient, an event may be reported at any time. Specifically, a patient may
report an event between her last scheduled follow-up visit and the end of the trial. However,
unlike a patient who demonstrates symptoms of an event of interest, an event-free patient cannot
have her time under observation end between her last scheduled follow-up visit and the end of
the study. In this way, the censoring mechanism and the event mechanism are not independent of
each other.
If all events are reported at the time of their detection while censoring times are reported
only at the last scheduled visit, then standard data procedures on average under-report the time at
risk for those patients who are event-free at their last visit. They are reported only to survive to
their last evaluation when, in truth, they have survived up to the time of analysis. This would
result in an over-estimate of the underlying event rate for exponentially distributed data, for we
have the correct number of events in the numerator but an under-estimation of the total event-
free time for all subjects in the denominator. This would especially affect estimates of
conditional survival rates close to the time of analysis, and the implications of such findings could
be particularly profound for interim reporting or futility analysis. If all patients without a
reported event were assured not to have had an event, then the correct approach would be to set
the end of their observed survival time equal to the time of analysis. However, events ascertained
because of symptomatic assessment in the period between last visit and the cut-off date for
analysis may go unreported because of lapses in the reporting mechanisms from individual
investigators. Because imputing follow-up from the last visit until the time of analysis would
represent adding follow-up time that would be later identified as not supported by actual data,
physicians and medical researchers are understandably hesitant to adopt such a method. In
addition, the existence and length of such delays may depend on patient characteristics that are
associated with survival time, such as proximity to a study center or susceptibility to symptoms.
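The inflation described above is easy to reproduce in a small simulation. The sketch below is not taken from the dissertation; the hazard rate, visit interval, and analysis time are invented for illustration. It compares the exponential rate estimate (events divided by total observed event-free time) under the standard procedure with the estimate obtained when event-free survival is correctly extended to the time of analysis.

```python
import numpy as np

rng = np.random.default_rng(0)

lam = 0.05    # true monthly hazard rate (illustrative)
n = 20000     # patients, all enrolled at time 0
visit = 6.0   # months between scheduled follow-up visits
T = 38.0      # time of analysis, in months after enrollment

t_event = rng.exponential(1.0 / lam, size=n)
had_event = t_event <= T                   # here, all events are reported immediately

# Standard procedure: events contribute time up to their occurrence, but
# event-free patients contribute time only up to their last scheduled visit.
last_visit = np.floor(T / visit) * visit   # 36 months
obs_time = np.where(had_event, t_event, last_visit)
rate_std = had_event.sum() / obs_time.sum()

# Ideal procedure: event-free patients are known to be event-free at T.
ideal_time = np.where(had_event, t_event, T)
rate_ideal = had_event.sum() / ideal_time.sum()

print(f"true rate:          {lam:.4f}")
print(f"ideal estimate:     {rate_ideal:.4f}")
print(f"standard estimate:  {rate_std:.4f}")   # same numerator, smaller denominator
```

With these parameters the standard estimate exceeds the ideal one because roughly 15% of patients are event-free at the analysis time and each loses two months of follow-up from the denominator.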
1.2 Alternative Data Collection Procedures
Considering the lack of certainty in the window between a patient’s last visit and the time that
data are collected for the analysis, investigators may choose an earlier point in time as the
deadline for contribution to the study data, such as the last scheduled follow-up visit for each
patient or some number of months before the time of analysis for all patients. There are a few
drawbacks to this tactic. It would ignore any information collected on patients after the cut-off
time, information which may have been costly to collect. If enrollment was still ongoing at the
chosen cut-off date, recently enrolled patients may be excluded entirely. The truncation in time
and possibly patients could seriously impact the power of any statistical tests performed as well
as increase the variance of estimates. These drawbacks become particularly serious when
probable enrollment or predicted numbers of events are already low.
Alternatively, investigators could assume that patients who are event-free at their last
visit had maintained that status up to the time of analysis. The appropriateness of this approach
relies on the assumption that the majority of events are reported soon after they occur. If this
assumption is not met, estimates obtained in this manner could be badly biased.
1.2.1 Clinical Trial Example
Clinical trials often involve a number of analyses conducted while the trial is ongoing. If the data
collection procedure and analysis method are incompatible we would expect to see that manifest
in the analysis results. For example, if the preferential way in which events are reported is
causing a bias we may expect to see one conclusion in a standard analysis and a different
conclusion in an analysis that does not preferentially report events.
Such an example may be found in an interim analysis performed for trial AEWS0031
undertaken by Children’s Oncology Group (Womer, 2012). The trial was designed to test
whether a decrease in the interval between chemotherapy administrations (‘compressed therapy’)
would improve the event-free survival (EFS) time of children and young adults with Ewing
Sarcoma and related tumors when compared with standard therapy timing. Patients were enrolled
from May of 2001 until August 2005. The analysis that follows was an interim analysis
performed in May of 2006. The results below considered each patient’s EFS time to be the last
visit at which they were seen if no event had been reported, and to be their failure time if an
event was reported. This analysis provided an estimate of the relative hazard rate of the
compressed treatment of 0.68. The Lan and DeMets interim monitoring method would, for this
trial, require a p-value of less than 0.0225 in order to reject the null hypothesis of no difference
in survival between treatments. The log-rank test for such a difference yielded a p-value of
0.0464.
Figure 2. Analysis of AEWS0031 randomized clinical trial using data collected by the standard
method up to the time of analysis.
The analysis was repeated using data current to six months prior to the cutoff date. Six
months represented the maximum time between planned disease evaluations for any study
subject. The log-rank test for differences in survival curves yielded a p-value of 0.0275 and the
estimate of the relative hazard rate was 0.63.
Despite the fact that the second analysis includes less total survival time and fewer
events, the resulting p-value was smaller than in the previous analysis. The example is
[Figure 2 plot: estimated proportion event-free versus months, by randomized regimen (I/E standard timing vs. I/E intensive timing); AEWS0031 EFS, cutoff date 5/31/2006]
illustrative of the possible bias in using the most current data. The approach of using data
current only to the last visit can address the issue of apparent bias, but this example also
demonstrates that such an approach can decrease the efficiency of the estimation process.
Figure 3. Analysis of randomized clinical trial AEWS0031 using data collected up to six months
prior to time of analysis.
This dissertation will consider the standard method of censoring event-free patients at
their last visit prior to analysis in addition to three other data collection techniques. One method
will attempt to impose ignorable censoring by taking into account data collected only up to some
calendar date early enough to ensure that true patient statuses are retroactively known. Another
will apply the mechanism of censoring patients at their last visit prior to analysis to all patients
who are event-free at such a visit, even those who report an event later. Still another will make
[Figure 3 plot: estimated proportion event-free versus months, by regimen (I/E standard timing vs. I/E intensive timing), all eligible patients; AEWS0031, cutoff 11/30/2005]
the extra assumption that all events are reported without delay, so that patients without reported
events can be assumed to survive to the time of analysis event-free. The remainder of the
dissertation will be as follows. A review of the literature on delayed reporting of outcomes will
be presented in Chapter 2. The asymptotic properties of rate estimates resulting from the
aforementioned data collection methods will be derived in Chapter 3. Chapter 4 will explore
issues of bias, power, and test size when exponential regression is applied to data collected by
the four methods and this will be done by way of simulation. In Chapter 5, the methods will be
applied to data from two randomized clinical trials. Chapter 6 will present concluding remarks
and areas of possible future work.
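The four data collection methods just described can be sketched as simple rules mapping each patient's true event time and report time to an observed (time, event) pair. The function below is an illustration under assumed simultaneous entry and a common visit schedule; the method names, the six-month cut-back lag, and the 30% late-reporting probability are invented for this sketch and are not the dissertation's notation.

```python
import numpy as np

def collect(t_event, t_report, visit, T, method, lag=6.0):
    """Observed (time, event) pairs under four data collection rules.
    All times are months from a common enrollment time."""
    last_visit = np.floor(T / visit) * visit   # last scheduled visit before T
    reported = t_report <= T                   # event known at analysis time

    if method == "standard":
        # events enter at occurrence; event-free patients censored at last visit
        return np.where(reported, t_event, last_visit), reported
    if method == "cutback":
        # use data current to an earlier date, by which true statuses are known
        Tc = T - lag
        ev = t_event <= Tc
        return np.where(ev, t_event, Tc), ev
    if method == "censor_all_at_visit":
        # censor everyone who was event-free at the last visit there,
        # even if an event was reported afterwards
        ev = reported & (t_event <= last_visit)
        return np.where(ev, t_event, last_visit), ev
    if method == "no_delay":
        # assume unreported patients are event-free up to the analysis time
        return np.where(reported, t_event, T), reported
    raise ValueError(method)

rng = np.random.default_rng(1)
n, lam, visit, T = 10000, 0.05, 6.0, 38.0
t_event = rng.exponential(1.0 / lam, n)
late = rng.random(n) < 0.3                          # 30% of events report late
t_report = t_event + np.where(late, rng.exponential(3.0, n), 0.0)

for m in ("standard", "cutback", "censor_all_at_visit", "no_delay"):
    time, ev = collect(t_event, t_report, visit, T, m)
    print(f"{m:20s} rate = {ev.sum() / time.sum():.4f}")
```

Because "standard" and "no_delay" count exactly the same events while "no_delay" credits strictly more event-free time, the "no_delay" rate estimate is always the smaller of the two in this setup.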
Chapter 2. Literature Review
When patients are censored at their last evaluation prior to analysis, the censoring mechanism is
not ignorable in estimating the failure time distribution. Many papers have addressed the issue of
dependence between censoring and failure mechanisms. However, the specificity of this situation
offers the opportunity to tailor specific solutions. If it could be assumed that all failures are
reported immediately, the appropriate approach to inference would be to impute survival time up to the time of analysis for all censored patients who had not dropped out. It is therefore the
possible delay between an event and when this event becomes known to investigators that
prevents such a straightforward solution to the estimation of the hazard ratios in a clinical trial.
2.1 Early Work in Delayed and Unreported Events
2.1.1 Insurance Claims
The issue of event reporting delays resulting in biased estimates of policy cost is a long-standing
issue in the actuarial sciences. Without an accurate estimate of the number of claims (events) to
be expected in a given year, insurance companies may undervalue or overvalue their policies or
underestimate the costs incurred by claims, resulting in serious financial repercussions. In order
to estimate the number of incurred but not reported (IBNR) claims, the distribution of delay
times between an event (such as a car accident) and when the resulting claim is reported must be
modeled. A common approach is to assume some distributional structure to the delay-time or
number of IBNR claims resulting from an event in a given year, often with the distributional
mean dependent on some set of measured variables. Likelihoods are then constructed and
maximized to estimate the parameters and variances associated with these variables [Hesselager
and Witting, 1988]. The issue of reporting delays was addressed in a similar context by
Kalbfleisch, Lawless, and Robinson in 1991, in which they explored the delays associated with
the reporting of warranty claims. The common aim of these methods was only to estimate the
number of existing, unreported claims at a given point in time for budgetary purposes.
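A minimal numerical sketch of the IBNR idea, with an invented reporting triangle: the delay distribution is estimated from the oldest, fully reported cohort and then used to scale up the partially reported recent cohorts. Real actuarial methods, including those cited above, pool information across cohorts and model the delay distribution parametrically; this is only the arithmetic skeleton.

```python
import numpy as np

# counts[t, d] = claims from accident period t reported with delay d.
# Entries with t + d beyond the current period are not yet observed
# (zeros below the anti-diagonal).  All numbers are invented.
counts = np.array([
    [60, 25, 10, 5],
    [55, 30, 12, 0],
    [70, 28,  0, 0],
    [65,  0,  0, 0],
], dtype=float)
T = counts.shape[0] - 1                 # index of the current period

# Delay probabilities from the fully reported first cohort.
p = counts[0] / counts[0].sum()
cum_p = np.cumsum(p)                    # P(reported within d periods)

# Scale each cohort's reported total by the probability of having been
# reported by now; the shortfall is the IBNR estimate.
ibnr = 0.0
for t in range(T + 1):
    d_max = T - t                       # largest delay observed for cohort t
    reported = counts[t, :d_max + 1].sum()
    ultimate = reported / cum_p[d_max]
    ibnr += ultimate - reported
print(f"estimated IBNR claims: {ibnr:.1f}")    # about 66 with these numbers
```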
2.1.2 AIDS Incidence
Another area in which delayed reporting times have been of interest is that of AIDS incidence.
During the 1980s and early 1990s, estimating the incidence of AIDS became a pressing public health
concern. However, the disease registries on which these estimates relied were often not up to
date. Adding further complication was the fact that the delay in reporting varied by geographical
region. Investigators recognized the need to estimate the distribution of the delay time between a
new AIDS diagnosis and the report of this case to a particular registry. In 1986, Morgan and
Curran published a paper in which they assumed incidence follows a given distribution and used
retrospective data to estimate the quantiles of this distribution. Monthly counts of reported cases
are adjusted for reporting delays, though the adjustment method used is not explicitly given in
the paper. A modified Box-Cox transformation is applied to the adjusted case counts and a
weighted linear regression including quadratic terms is used to estimate incident cases in future
months. This method is referred to as extrapolation. In 1989, Brookmeyer and Damiano
presented a method similar to that used to address the IBNR issue. Delay times are partitioned
into intervals with some maximum delay time assumed. For a given year of diagnosis, the
number of cases corresponding to delay intervals comprise a random vector that is assumed to
have a multinomial distribution. Unconditional likelihoods and likelihoods conditional on the
total number of cases per year are constructed and maximized to estimate parameters and
variances. This method is referred to as back-calculation.
Perhaps the best description of back-calculation was given by Harris in 1990. Random variables Y_tu are defined as the number of cases diagnosed in a given month (or other specified time interval) t and reported in month t + u, where months are counted in integers. These random variables are assumed to be independent with Poisson distributions of varying means that possibly depend, through a specified function, on measured variables of interest. The total number of diagnosed AIDS cases in month t, X_t, is taken to be the sum of the Y_tu over all possible delay times u, so that X_t = Σ_{u=0}^{u_max} Y_tu (again assuming some upper bound u_max on the delay time). The totals X_t are also independent Poisson variables whose means are interpreted as the incidence of AIDS in month t. For any given month t, the vector of case counts {Y_t0, …, Y_tm} given X_t is a multinomial random variable whose mean vector contains the probabilities of the delay times (the number of cases delayed by u months divided by the total number of cases, X_t). The mean number of cases diagnosed in month t and reported in month t + u can be expressed as the product of the probability of a reporting delay of u months and the monthly incidence in month t; these are functions of mutually exclusive sets of parameters, a condition known as separability. Likelihoods are constructed to estimate the parameters by maximization, and possible models for the incidence and delay probabilities are explored.
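The separable Poisson structure lends itself to a compact simulation. The sketch below generates a reporting triangle under the model just described, then recovers the delay probabilities from cohorts old enough to be fully reported and back-calculates recent incidence; the incidence curve, delay distribution, and pooling estimator are invented simplifications of the likelihood machinery in the papers cited.

```python
import numpy as np

rng = np.random.default_rng(2)

months, max_delay = 24, 5
mu = np.linspace(50, 200, months)                   # monthly incidence (invented)
p = np.array([0.40, 0.25, 0.15, 0.10, 0.06, 0.04])  # delay probabilities (invented)

# Separable model: Y[t, u] ~ Poisson(mu[t] * p[u]) independently,
# so X[t] = sum over u of Y[t, u] is Poisson(mu[t]).
Y = rng.poisson(mu[:, None] * p[None, :])

# At analysis month T, only reports with t + u <= T have arrived.
T = months - 1
obs = Y.copy()
for t in range(months):
    obs[t, max(0, T - t + 1):] = 0

# Delay probabilities pooled over cohorts that are fully reported ...
full = obs[:months - max_delay]
p_hat = full.sum(axis=0) / full.sum()
cum = np.cumsum(p_hat)

# ... then back-calculate incidence for every cohort, scaling partial
# counts up by the estimated probability of having been reported by T.
mu_hat = np.array([obs[t].sum() / cum[min(T - t, max_delay)]
                   for t in range(months)])
print(np.round(p_hat, 3))
print(np.round(mu_hat[-4:], 1))   # recent months; compare with mu[-4:]
```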
A novel approach was presented by Kalbfleisch and Lawless in 1991. Patients are
brought under observation upon occurrence of some consequent event and the time since some
initiating event is of interest. In the case of estimating delays in the reporting of incident AIDS
cases, the initiating event is considered the time of diagnosis and the consequent event is the time
at which the diagnosis is reported so that the interval of time between the two events is the
reporting delay. In this setting, all patients experience both events, and it is the distribution of the time between them that is of interest. Starting at the consequent event, the report of diagnosis, and moving backwards in time, the authors model the "hazard" of the occurrence of the initiating event, the diagnosis itself, in the case of continuous time. In the case of discrete time, the probability of a patient's diagnosis occurring in a particular interval is modeled by a log-log link regression model. Although these models provide methods for estimating numbers of unreported events and the distribution of delay times, applications to survival analysis are not provided, nor do these methods alleviate the need for an ignorable censoring mechanism.
2.2 Estimating the Delay Time Distribution
In 1994, Lawless presented a summary of methods for estimating the number of events that have
occurred when there is delay in the reporting of such events and extended them to encompass
random changes in delay with time, also in the framework of AIDS incidence. As with many of
the previous papers, Lawless works in discrete time which necessitates events that are not rare.
Two models are presented, one that models the delay time distribution without random effects
and one that models the distribution with random effects. Both models are developed so that the
total number of events occurring in a given interval and reported by some future time may be
predicted.
For the first model, let n_tx be the number of events occurring in period t and reported in period t + x. Let N(t;T) be the total number of cases in event period t reported by time period T, t ≤ T. Assuming that the vector {n_t0, n_t1, n_t2, …, n_t,T−t} follows a multinomial distribution for fixed t, conditioned on N(t;T), the likelihood can be constructed using data from the m + 1 most recent intervals. Of interest is the function g_t(x) = f_t(x)/F_t(x), where f_t(x) is the probability that an event in period t is reported in period t + x and F_t(x) is the sum of those probabilities, for fixed t, over all reporting periods up to t + x. The likelihood can be simplified if it is also assumed that g_t(x) = g(x) for all event times t (a condition referred to as stationarity) and that reporting delays for events occurring in different time intervals are independent. The maximum likelihood estimate of g(x) then pools counts over the event periods for which delay x is observable,

ĝ(x) = Σ_t n_tx / Σ_t Σ_{y=0}^{x} n_ty,

and the asymptotic covariance matrix of the ĝ(x)'s is given in (2.1).
To extend this model so that it lends itself more easily to the prediction of future reported events, assume that the total number of events occurring in an interval t is a Poisson random variable with mean λ_t and that, for different intervals t, these variables are mutually independent. Assume also that the event counts n_tx are mutually independent random variables with means λ_t f_t(x). Then N(t;T) is a Poisson random variable with mean λ_t F_t(T − t). If we consider only the m intervals strictly preceding interval T, the likelihood is given by (2.2). In the event that T − m − t < 0, the lower bound of the product is taken to be 0. Under the assumption of stationarity, the maximum likelihood estimate of λ_t F(T − t) is N(t;T), and the estimate of the g(x)'s is the same as before, as is the associated covariance matrix.
If we assume that stationarity holds for some time extending into the future, we can estimate the total number of future reported events originating from event interval t. To do this we must also assume some upper bound for the reporting delay. Let t be the interval during which an event occurs, as before. Let T_1 be some time in the future, and let T be the maximum length of the reporting delays, T_1 > T + t. Then the point estimate for the total number of events occurring in interval t and reported by interval T_1, N̂(t;T_1), is given by (2.3).
Now let Z_t = N̂(t;T_1) − N(t;T_1) denote the prediction error. The asymptotic variance of Z_t is shown to be (2.4). Under mild conditions, Z_t divided by the square root of its variance tends to a standard normal random variable. This asymptotic property may be used to construct confidence intervals for N(t;T_1), the total number of event counts originating in period t and reported by period T_1.
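Under stationarity, the pooled estimates of g(x) and the implied cumulative reporting probabilities can be computed directly from a "reporting triangle" of counts. The following Python sketch uses hypothetical counts, and the final inflation step is one common way to turn the estimated reporting distribution into a predicted eventual total; it is a schematic of the idea rather than the exact estimator of equation (2.3):

```python
import numpy as np

# Hypothetical reporting triangle: rows = event period t, columns = delay x.
# Cells not yet observable at the analysis time (period 3) are NaN.
n = np.array([
    [40.0, 20.0, 10.0, 5.0],          # t = 0: fully reported
    [38.0, 22.0,  9.0, np.nan],       # t = 1: delay 3 not yet observed
    [45.0, 18.0, np.nan, np.nan],
    [50.0, np.nan, np.nan, np.nan],
])
T_max = n.shape[1] - 1

# Pooled estimates of g(x) = f(x)/F(x): among events reported by delay x,
# the fraction reported exactly at delay x, using only observable periods.
g = np.zeros(T_max + 1)
for x in range(T_max + 1):
    obs = ~np.isnan(n[:, x])
    g[x] = n[obs, x].sum() / np.nansum(n[obs, : x + 1])

# Cumulative reporting probabilities via F(x) = F(x+1) * (1 - g(x+1)),
# anchored at F(T_max) = 1 (all events reported within T_max periods).
F = np.ones(T_max + 1)
for x in range(T_max - 1, -1, -1):
    F[x] = F[x + 1] * (1.0 - g[x + 1])

# Predicted eventual totals: counts reported so far inflated by 1/F(observed delay).
N_hat = np.array([np.nansum(n[t]) / F[min(T_max, n.shape[0] - 1 - t)]
                  for t in range(n.shape[0])])
```

Note that ĝ(0) = 1 by construction, consistent with g_t(0) = f_t(0)/F_t(0), and the most recent period receives the largest upward adjustment because the least of its reporting distribution has been observed.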
Next, a model is presented that allows for random changes in the reporting delay with time. Let t, the interval during which events of interest occur, be fixed throughout. Consider the vector f_t = (f_t(0), …, f_t(T)), where f_t(x) is the probability that an event occurring in interval t is reported in interval t + x. Assume now that f_t follows a Dirichlet distribution with positive-valued parameters α_0, …, α_T and density

π(f_t) = [Γ(α_0 + … + α_T) / (Γ(α_0) ⋯ Γ(α_T))] ∏_{x=0}^{T} f_t(x)^{α_x − 1}.    (2.5)

Because the entries of the vector are probabilities, we have that f_t(x) > 0 for x = 0, …, T. By assuming that the maximum delay time is T intervals, we have that f_t(0) + … + f_t(T) = 1. Therefore the vector f_t provides valid support for the Dirichlet distribution.
This implies (2.6), and from the Beta distribution we have (2.7). Let g(x) = E(g_t(x)) and define the dispersion parameter given in (2.8). The variance of g_t(x) can then be expressed as in (2.9).
Now introduce independent random effects for t = 0, 1, …, T, governed by the dispersion parameter above. For a fixed t, and conditional on the random effect and on f_t(0), …, f_t(T), the n_tx's are independent Poisson random variables whose means involve both. The f_t(x)'s are independent of the random effects and have the Dirichlet distribution previously discussed. Then we have (2.10), where f_t(x) is the same as above. The distribution of the n_tx's given N(t;T) is the same Dirichlet-multinomial as before. Estimating equations, in conjunction with the distributional assumptions above, yield maximum likelihood estimates with the asymptotic variance estimates (2.11) and (2.12).
The prediction of events follows the methods of the second model, and the estimate of N(t;T_1) is the same as above. With Z_t as before, its asymptotic variance is now given by (2.13). The asymptotic normality of Z_t is again used to obtain confidence limits for N(t;T_1).
By examining earlier patterns of event reporting and making some assumptions about the
stability of such patterns over time, it is possible to make predictions as to the number of events
from some originating interval that will be reported at some later time. By making additional
assumptions as to the distribution of the delay probabilities it is possible to incorporate random
spread in the estimates of reported events.
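The extra-multinomial variation induced by the Dirichlet assumption can be seen in a short simulation. The Dirichlet parameters, number of periods, and events per period below are arbitrary illustrative values:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical Dirichlet parameters alpha_0, ..., alpha_T for the delay vector f_t.
alpha = np.array([5.0, 3.0, 2.0])
n_periods, events_per_period = 2000, 100

# For each event period t, draw f_t ~ Dirichlet(alpha), then counts
# (n_t0, ..., n_tT) ~ Multinomial(N, f_t): a Dirichlet-multinomial draw.
f = rng.dirichlet(alpha, size=n_periods)
counts = np.array([rng.multinomial(events_per_period, ft) for ft in f])

# The marginal mean of f_t(x) is alpha_x / sum(alpha); the period-to-period
# variation in f_t represents the random change in reporting delay over time.
emp_mean = counts.mean(axis=0) / events_per_period
```

Relative to a plain multinomial with fixed probabilities, the period-to-period spread of the count vectors is inflated, which is exactly the random fluctuation in reporting delay that the dispersion parameter governs.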
These models provide a relatively straightforward way of modeling the delay-time
distribution in order to determine the number of events that have already occurred but have not
yet been reported as well as methods for predicting the number of such events at some future
point in time. These nonparametric models allow for random fluctuations in reporting delays and
are based on data that are only as recent as the analyst determines to be useful. In the case of determining the number of incurred-but-not-reported (IBNR) events at the present time, asymptotic methods provide confidence intervals for the estimates and residuals for use in checking model assumptions. For many applications this would be a simple, useful tool for adjusting for these as-yet-unreported events.
These methods focus on estimating the number of unreported events originating from
some time interval. They notably do not address the question of how much time under
observation may elapse before such an event takes place, a question key to the topic of this
dissertation. Due to the discrete nature of time in this model, it is possible only to estimate when
an event may have occurred and when it may be reported in terms of intervals of time. The event
time is therefore interval censored. To get increasingly accurate survival times, one could take
increasingly fine partitions of the follow-up time but that leads to events being more and more
sparse within each interval, a characteristic not supported by this model.
In the structure of the problem at hand, it is assumed that events, when not detected quickly, are detected at the latest at the patient's next visit. Therefore there is only a single natural interval of time in question: that between the visit just prior to the analysis and the visit just
after. Events occurring before that interval will all have been reported by the time the interval
begins and all events occurring within the interval will be reported by the time the interval comes
to an end. But in the methods presented by Lawless no allowances are made for a single interval
of observation. In this case the probability of being reported in the next single interval is 1 and
all estimates are undefined. For these reasons the Lawless methods serve as a stepping stone to
survival analysis techniques for this defined problem, but cannot be used directly to address the
concerns of this dissertation.
2.3 Delayed Events in Survival Analysis
2.3.1 Hu and Tsiatis
The first paper to address the issue of reporting delays in the survival analysis field was
published in 1996 by Hu and Tsiatis. The authors set out to provide an estimate of the survival
distribution, similar to the Kaplan-Meier estimator, which remains unbiased when the
ascertainment process (which includes reporting delays) is not random. The authors used
estimators from the competing risks literature and applied counting process and martingale
theory to obtain asymptotic characteristics of their estimators.
Of n participants in a clinical trial, let T_i denote the (continuous) time to failure for the i-th patient, measured from study entry. Let U_ji denote the j-th time at which vital status was evaluated for patient i. Let A_ji be the time at which the patient's status at U_ji is reported, so that the interval between evaluation time U and reporting time A is the delay in the reporting of patient status. Let k − 1 denote the number of times a patient's status is recorded before the event of interest, so that his or her failure time T_i is recorded at time A_ki. The ascertainment process for a given patient can then be expressed by the random vector {(U_1i, A_1i), …, (U_{k−1,i}, A_{k−1,i}), (T_i, A_ki)}.
In order to proceed with their model, Hu and Tsiatis make some key assumptions. All censoring is assumed to be due to incomplete follow-up; no dropout or withdrawal is considered. The duration of the reporting delays is assumed to be bounded above. Finally, the potential follow-up time (from study entry to end date) F is assumed to be independent of the vector {(U_1i, A_1i), …, (U_{k−1,i}, A_{k−1,i}), (T_i, A_ki)}. Note that this implies that the failure time distribution and the ascertainment process remain stable over the course of the study.
Now, let R_i(x) be an indicator of the failure status of the i-th patient at time x (where 1 indicates that an event has occurred). Let V_i(x) denote the first time at which the i-th patient's status at time x is known (the gap between x and V_i(x) is the delay in status reporting). There is a one-to-one relationship between the vector {(U_1i, A_1i), …, (U_{k−1,i}, A_{k−1,i}), (T_i, A_ki)} and the vector {V_i(x), R_i(x); x ≥ 0}. Let {V(x), R(x)} denote the underlying distribution from which the vectors {V_i(x), R_i(x)} are sampled. Now consider failure and censoring as two competing risks. In this framework we can express the hazard function for censoring as in (2.14) and the hazard function for failure as in (2.15).
It can be shown that the failure-specific cumulative distribution function is given by (2.16), where Λ_0 and Λ_1 are the cumulative hazard functions corresponding to λ_0 and λ_1 above.
Let C(x) be the maximum possible delay time in the reporting of a status evaluated at time x. Based on the assumption that there exists some upper bound on delay time, we can express the survival function in terms of the failure-specific function G_1 as 1 − S(x) = G_1{x, x + C(x)}. Let F denote the time from a patient's entry to the study to the time of analysis (where the subscript i indicating patient is suppressed). Let the survival distribution of the potential follow-up F be denoted H(u) = Pr[F ≥ u]. Denote the vector of observable random variables at time x as {X(x), Δ(x), R*(x)}, where X(x) = min{V(x), F}, Δ(x) indicates whether vital status at time x is known, and R*(x) is the event indicator for time x but, unlike R(x) above, is only defined when vital status at time x is known (i.e., Δ(x) = 1). The cause-specific hazard functions can be rewritten in terms of these observable variables and, by the assumption that F is independent of ascertainment, the observable hazard functions are equal to the true hazard functions; these functions can be expressed as in (2.17).
In order to obtain the estimate of the survival function and its properties, we utilize martingale theory. Define the counting process, at-risk process, σ-algebra, and intensity process given in (2.18), respectively. The F(x,u)-martingale process associated with N_j(x,u) (the sum of the N_ji over all values of i) is given in (2.19), where Y(x,t) is the sum of the Y_i(x,t) over all possible values of i. Define M(x,u) as M_1(x,u) + M_0(x,u), and substitute the Kaplan-Meier estimate of the distribution of V(x), denoted V̂(x), into the expressions. The proposed estimators are given in (2.20). Note that in the case of no delays in the reporting of vital status, the estimator of the survival distribution reduces to the Kaplan-Meier estimator. Through the theory of counting processes and martingales, it can be shown that the estimator satisfies (2.21) and achieves asymptotic normality with the variance given in (2.22).
A consistent estimate of this variance may be calculated by substituting the aforementioned estimates for E(x,u) and G_1(x, ·), and dN_j(x,u)/Y(x,u) for λ_j(x,u).
Hu and Tsiatis provide a consistent estimate of survival when reporting of status may be
delayed. They recognized the fact that reporting delays could bias the Kaplan-Meier estimate.
They also specifically noted that the practice of censoring patients at the last point at which their
status was known is a potential source of such bias. Their novel approach of considering event
incidence and censoring as competing risks laid the groundwork for others to progress toward
better estimators of survival that consider reporting delays.
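For reference, the ordinary product-limit estimator to which the Hu and Tsiatis estimator reduces in the absence of reporting delays can be sketched in a few lines of Python. This is a minimal implementation for right-censored data, not the delay-adjusted estimator itself:

```python
import numpy as np

def kaplan_meier(times, events):
    """Product-limit survival estimate at each distinct observed event time.

    times  : observed follow-up times (event or censoring)
    events : 1 if the time is an event, 0 if censored
    """
    times = np.asarray(times, dtype=float)
    events = np.asarray(events, dtype=int)
    t_grid = np.unique(times[events == 1])
    surv, s = [], 1.0
    for u in t_grid:
        at_risk = np.sum(times >= u)                 # risk set just before u
        d = np.sum((times == u) & (events == 1))     # events at u
        s *= 1.0 - d / at_risk                       # multiplicative update
        surv.append(s)
    return t_grid, np.array(surv)

t, s = kaplan_meier([1, 2, 2, 3, 4, 5], [1, 1, 0, 1, 0, 1])
```

In this toy data set, each censored observation leaves the risk set without contributing an event, so the survival curve steps down only at the observed event times.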
Some aspects of this method limit its application to current implementations of clinical trials. Hu and Tsiatis assume that there is some maximum possible duration for a reporting delay and that this maximum is a known value. They also assume that all assessment times and all reporting times are known for all patients, although this information is not recorded by default. The assumption that the potential follow-up time F is independent of the ascertainment process {U_ij, A_ij} may be questionable: when patients are censored at their last follow-up, their observed survival time is a function of the timing of their visits, which is, in turn, part of the ascertainment process. Furthermore, the model does not allow for reportable variables, such as treatment assignment, which may affect both delay and survival times.
2.3.2 Hubbard and van der Laan
In 1998, van der Laan and Hubbard extended the work of Hu and Tsiatis. They, however,
assume that there are no reporting lags for visits at which the participant is well, while the
experience of an event may be reported with some delay. Using the same notation as Hu and
Tsiatis, they develop an inverse probability of censoring weighted (IPCW) estimator of the
survival function that does not require knowledge of the upper bound on reporting delay. Using
the work of Robins and Rotnitzky (Robins and Rotnitzky, 1992), they show that their proposed
one-step estimator is locally efficient. They aim to relax the requirement of independence
between the survival time and the ascertainment process, add covariates to the model of their
predecessors, and attain optimal efficiency.
As in the prior paper, let U_j be the evaluation times, let A_j be the corresponding reporting times, and let k denote the number of times a patient's status is observed, including the point at which the event occurs. Assume that, for all visits at which the patient has not had his or her event (j < k), U_j = A_j, but that events may be reported with delay (U_k < A_k). Let C be the time of analysis, let T be the survival time, and let V be the time at which failure is reported (measuring all from each patient's entry to the study and suppressing patient indexing), and define the quantity given in (2.23). Let W(t), a real-valued k-vector, be a covariate process of measures reported at the same times as vital status. Define the process, corresponding sample path, and observed data structure given in (2.24), respectively, and let X represent the full data. The distribution F_X of X will be unspecified, and it will be assumed that G(· | X), the conditional distribution of C given X, satisfies coarsening at random (meaning that censoring is non-informative given the observed covariates). Note that the coarsening at random assumption is satisfied if the Lebesgue hazard function λ_C of the censoring time depends on the full data X only through the observed data. Let the censoring hazard take the proportional hazards form in (2.25), where λ_0(c) is some function, so that the coarsening at random assumption is met, and assume that Pr[Δ = 1 | X] > 0, F_X-almost everywhere. Let V*(t) be the first time at which R(t) = I(T ≤ t) is known, and define the quantity in (2.26).
Under the stated assumptions we have the identity (2.27), from which follows the simple estimator of the survival function given in (2.28), where Ĝ is an estimator of G assuming the proportional hazards model above. To obtain local efficiency, the empirical average of a two-part efficient influence function is added to the above estimate, yielding the one-step estimator in (2.29).
The authors show that if the survival time T is independent of the follow-up process, then this estimate is locally efficient. If IC_nu*(Y | F, G) is also estimated consistently, then the above estimator is asymptotically efficient. The authors discuss estimating IC_nu* as a regression of a random variable on observed covariates, based on the techniques of Robins (1993) and Robins and Rotnitzky (1992). In the case that there is no relevant covariate process and under weak conditions, it is shown that the estimator is asymptotically normal with the variance given in (2.30).
The authors recommend modeling the censoring distribution G(· | X) with a Cox proportional hazards model with relevant time-dependent covariates if there is possible dependence between the censoring and delay processes, although this method is not explored explicitly in the paper. Instead, the Kaplan-Meier estimate is used to estimate the censoring distribution.
Van der Laan and Hubbard thus take the problem of Hu and Tsiatis and approach it using inverse probability of censoring weighted estimators. Using the work of Robins and Rotnitzky, they show that their one-step estimator is locally efficient for a specific sub-model, and that even under misspecification of this sub-model their estimate remains consistent. Unlike the Hu and Tsiatis estimator, their estimators do not require that delays in the reporting of status be recorded for visits at which a patient does not have an event; only delays in the reporting of events must be recorded. Also unlike the Hu and Tsiatis estimator, dependence between survival time and censoring time is allowed if this dependence acts through the ascertainment process. For the special case of independent censoring and no covariates, the calculation of the survival estimate is given explicitly, along with its asymptotic variance and confidence intervals.
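The inverse-weighting idea can be illustrated in the simplest setting of independent censoring with a known censoring distribution. The simulation below is entirely hypothetical (distributions chosen for convenience) and is a sketch of the weighting principle, not the authors' one-step estimator:

```python
import numpy as np

rng = np.random.default_rng(3)

n = 200_000
T = rng.exponential(1.0, n)          # event times, so F(t) = 1 - exp(-t)
C = rng.exponential(2.0, n)          # independent censoring, G(u) = exp(-u/2)
delta = (T <= C).astype(float)       # 1 if the event is observed uncensored

def F_ipcw(t):
    # Weight each observed event by the inverse probability of remaining
    # uncensored through its event time: E[delta * 1{T<=t} / G(T)] = F(t).
    G_T = np.exp(-T / 2.0)
    return np.mean(delta * (T <= t) / G_T)
```

The weighting exactly undoes the thinning of events caused by censoring, so F_ipcw(t) is unbiased for F(t); in practice G must itself be estimated (by Kaplan-Meier or a Cox model, as discussed above), which is where the estimation of the censoring process enters.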
Despite their novelty and sophistication, these methods have some limitations. Although records of the entire ascertainment process are no longer needed, knowledge of the delay between event time and event reporting is still necessary, and delays in the reporting of non-failure statuses are not addressed. A more serious complication is the dependence of these estimates on accurate estimation of the censoring process. The conditional survival function of censoring, G(· | X), is assumed to satisfy coarsening at random; that is, censoring may depend on the data only through what has been observed. When dependent censoring is present, it is necessary to model G(· | X) with a Cox proportional hazards model. Low rates of censoring, which are particularly likely in the intervals shortly before analysis, may result in poorly defined estimates. In order to achieve the efficiency bound, the sub-model must be correctly specified. As with the Hu and Tsiatis estimator, the best properties of the proposed estimator are achieved asymptotically, leaving its adequacy in small samples questionable. The authors suggest approaches to improve the performance of their estimator in the small-sample situation. Their approach requires an upper bound C_0 on the reporting delay so that any information occurring after t + C_0 is not relevant for estimating F(t). All data are therefore truncated at t + C_0, reducing the variance of the estimate. If C_0 is not known it must be hypothesized; too large a value will create unstable estimates, too small a value will diminish efficiency.
2.4 Delayed Events as Missing Data
Tu et al. approach the issue of delay as one of missing data (Tu, Meng, and Pagano, 1993).
Censored subjects are considered to be missing data from the time of their censoring forward.
Truncated subjects are considered to be missing data from some originating time until they come
under observation. They approach the problem assuming a relevant covariate process under a
discrete-time framework with a proportional hazards model. An EM algorithm is used to
estimate the effects of covariates on survival time.
The authors cite previous nonparametric work that did not allow for a covariate process
including Turnbull (1976), Wang, Jewell, and Tsai (1986), and Wang (1992). They also note the
extensive history of delay estimation in the realm of AIDS incidence including the work of
Brookmeyer and Liao (1990), Harris (1990), and Kalbfleisch and Lawless (1991) but note that
these methods do not include adjustment for truncated data. Their estimation technique is a
specific application of the EM algorithm of Dempster et al. (1977) and may be considered an
extension of the algorithm proposed by Turnbull in 1977.
Let r and t be the lag time in reporting death and the time of death, respectively. The first step is to estimate the distribution of the delay times. Let z be a vector of covariates and let β be a vector of unknown parameters. Assume that the probability mass function of the delay times is given by (2.31), where R is the maximal observed delay and the conditional baseline probabilities for each discrete time unit j serve as baseline parameters. Assume a corresponding model for the survival times; this implies that the hazard function is approximately of the form in (2.32). Assume now that the number of unreported events follows a negative-binomial distribution, so that the expected number of unreported events m is given by (2.33), and suppose that deaths are observed at known times for patients with corresponding covariate vectors. Using distributional assumptions about the number of unreported deaths, random draws from the corresponding distributions are used to impute the unreported events, resulting in a "completed-data" set. If inferences were drawn on one of these sets alone,
the variance estimates would not reflect the added uncertainty from the imputation. Instead, M completed-data sets are generated, each with its own set of maximum likelihood estimates and inverted observed information matrices for the survival distribution. The final estimate of β is the mean of the maximum likelihood estimates across the completed-data sets, as given in (2.34). Letting the average of the inverted information matrices across the completed-data sets serve as the within-imputation variance, the estimate of the total variation of the final estimate is given by (2.35).
If d is the dimension of β and C is a k × d matrix (with d ≥ k), then the hypothesis Cβ = 0 may be tested by the modified Wald statistic given in (2.36), with its components defined in (2.37).
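The combination steps in equations (2.34) and (2.35) follow the standard multiple-imputation recipe of averaging the per-imputation point estimates and adding a between-imputation variance component. A direct Python sketch with toy numbers (not data from the paper):

```python
import numpy as np

def combine_imputations(betas, covs):
    """Combine estimates from M completed (imputed) data sets.

    betas : (M, d) array of per-imputation parameter estimates
    covs  : (M, d, d) array of per-imputation covariance matrices
    """
    betas = np.asarray(betas, dtype=float)
    covs = np.asarray(covs, dtype=float)
    M = betas.shape[0]
    beta_bar = betas.mean(axis=0)            # pooled point estimate
    W = covs.mean(axis=0)                    # average within-imputation variance
    diffs = betas - beta_bar
    B = diffs.T @ diffs / (M - 1)            # between-imputation variance
    total = W + (1 + 1 / M) * B              # total variance of beta_bar
    return beta_bar, total

# Toy check: three imputations of a 2-parameter estimate.
beta_bar, V = combine_imputations(
    [[1.0, 0.5], [1.2, 0.4], [0.8, 0.6]],
    [np.eye(2) * 0.01] * 3,
)
```

The (1 + 1/M) factor is what keeps the pooled variance honest about the finite number of imputations; with M → ∞ it reduces to within-plus-between variance.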
The methods of Tu et al. carry many advantages. They are developed in the context of
survival analysis, they allow for both right-censoring and left-truncation, and they include
covariates. If the assumptions of the model apply, these methods provide a conceptually simple
though computationally intense way to estimate the effects of the vector of parameters β on
hazard rates. Although issues may arise due to misspecification of the delay model or unbounded
delay times, the method provides an efficient and straightforward way to estimate the effects of
covariates on survival time in the presence of delay in reporting of status. However, as with
previous models, this method reduces to standard estimates when no delay in reporting of events
is present. It may be interesting to investigate this further by simultaneously considering delay in
reporting of well status, as the preferential reporting of events over well statuses close to the time
of analysis may be framed as an issue of the report of well statuses being delayed until after the
analysis (i.e. their next visit).
2.5 Conclusion
For some time, the delay in reporting events has been recognized as a potential source of bias when analytic methods do not account for it. The earliest examples proposed simple models that aimed solely to
estimate the number of unreported events that had occurred. More complex models were
developed to model the distributions of the delay times in reporting events, allowing for random
effects and future projections. More recently, the problem was addressed within the framework
of survival analysis. Many methods have converged in the process of developing hazard
estimates that account for a delay in the reporting of events; these include competing risks,
martingale theory, influence curves, inverse probability weighting, and imputation techniques.
Although these models do address the issue of delay in event reporting, delay is not the only
flaw in the data structure. The larger issue is that the manner by which the data are collected is
not reflected in the analytic methods, neither those commonly used (such as Cox regression) nor
those discussed here. The problem that remains is the restriction of reported survival times of
those patients without event to times that coincide with their scheduled visits. Despite the variety
of tactics represented in the literature, for all methods that assume an upper bound on the
reporting delay, one could delay analysis until this interval has passed between the analysis time
of interest and the end of data collection. In this way, the status of each patient is known at the
analysis time, the censoring mechanism becomes type I censoring, and standard analysis
methods can be used (Kalbfleisch and Prentice). Waiting for this prescribed amount of time to
elapse will ensure that all events and their associated times will be reported and the complete
data can then be used to determine the results of the analysis. Waiting for complete data may or
may not be reasonable at the conclusion of a trial when the main question of interest is treatment
efficacy. However, trial data are also collected for formal interim monitoring (Lan and DeMets)
conducted throughout its duration; these analyses aim to protect the interests of patients as well
as trial resources. In the case of trial monitoring, it is not possible to wait for complete data
because of the monitoring analysis and reporting schedule. Additionally, there may be other factors that do not permit an extended follow-up period. It is therefore of interest to
investigate the performance of standard analytical techniques when the standard data collection
procedure is followed, as well as other possible data collection alternatives.
Chapter 3. Estimation of the Expectation of the Hazard Rates
This chapter will identify issues of bias in estimation introduced by the preferential reporting of
events and examine the performance of alternative data collection techniques. A total of four
data collection procedures will be presented and investigated. Estimates of the asymptotic
behavior of the event rate estimates resulting from each of these techniques will be derived and
calculated for specific parameter configurations.
3.1 Data Collection Methods
3.1.1 Censoring
For studies in which the main outcome of interest is the time elapsed under observation until
some specified event, some subjects may not experience that event while under observation.
When a subject leaves observation before experiencing the event, that subject is said to be censored.
The ideal situation in standard survival analysis is independent censoring. A censoring
mechanism is said to be independent if, when applied at some given time point t, censoring
depends only on the following: the events that have occurred before time t and random external
mechanisms. Independent censoring removes subjects from observation such that those patients
censored are a random sample of all patients at risk at the time point of interest. There are two
types of censoring commonly referred to in survival analysis applications: type I censoring and
type II censoring. In the case of type I censoring, a specific calendar endpoint to the trial is
determined at the outset. Patients who have not experienced an event by the conclusion of the
trial are censored at that point. In the case of type II censoring, a specific number of failures is
specified by design. The trial is ended when this specific number of events is observed. Type I
and type II censoring mechanisms are independent of the failure mechanism and are therefore
ignorable in likelihood-based survival analysis. The independent censoring assumption is key to likelihood-based estimation of regression parameters in both the parametric and Cox model settings. To see this, consider the full likelihood of a given set of data (t_i, δ_i), i = 1, …, n, where t_i denotes the time at which observation ended for the i-th subject and δ_i is an indicator of event occurrence. Let f(t) and F(t) be the probability density and cumulative distribution functions of the failure time random variable T as before, and let g(c) and G(c) be the probability density and cumulative distribution functions of the censoring time random variable C. Let T_i and C_i denote the realizations of the random variables T and C for the i-th patient. Let β be a vector of parameters and let z_i be a corresponding set of predictors. Given that δ_i = 1, the subject was observed to fail at t_i with censoring yet to occur, so the contribution to the likelihood of the i-th subject is

f(t_i; β, z_i) [1 − G(t_i)].

Given that δ_i = 0, the subject was censored at t_i with the failure yet to occur, so the contribution to the likelihood of the i-th subject is

g(t_i) [1 − F(t_i; β, z_i)].

When T and C depend on the same parameters, as is the case when censoring is not ignorable, the probabilities involved in these likelihood contributions must be expressed in terms of the joint distribution of T and C. This is problematic because of possibly complex and unobservable relationships between these two quantities.
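When censoring is ignorable, the factors involving g and G carry no information about β, and each subject contributes f(t_i)^{δ_i} S(t_i)^{1−δ_i} up to a constant. For an exponential model this yields the familiar events-over-person-time estimator, which the following sketch (toy data) verifies numerically:

```python
import numpy as np

def exp_loglik(rate, times, deltas):
    """Censored-data log-likelihood for an exponential model with ignorable
    censoring: events contribute log f(t) = log(rate) - rate*t, and censored
    observations contribute log S(t) = -rate*t."""
    times = np.asarray(times, dtype=float)
    deltas = np.asarray(deltas, dtype=int)
    return deltas.sum() * np.log(rate) - rate * times.sum()

times = np.array([2.0, 3.5, 1.0, 4.0, 2.5])
deltas = np.array([1, 0, 1, 0, 1])

# The MLE has a closed form: number of events / total person-time.
rate_hat = deltas.sum() / times.sum()

# Numerical check that the closed form maximizes the log-likelihood.
grid = np.linspace(0.01, 1.0, 500)
best = grid[np.argmax([exp_loglik(r, times, deltas) for r in grid])]
```

The same cancellation of the censoring terms is what fails when T and C share parameters, which is the problem the alternative data collection schemes below are designed to avoid.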
The mechanism of reporting data in cancer clinical trials, described in Chapter 1, does not conform to independent censoring. All event-free subjects are censored at their last scheduled follow-up visit. After that last visit prior to the analysis, exceptions to this censoring rule are made only for those patients who report events. This mechanism censors everyone except those who are at high risk of failure in the next interval, and it therefore preferentially censors study participants who are well compared with those left in the risk set, who are at high risk of an event.
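A small simulation makes this preferential mechanism concrete. The event rate, visit schedule, and analysis time below are all hypothetical; the point is only that discarding post-visit person-time from the well, while keeping post-visit events, inflates a naive event-rate estimate:

```python
import numpy as np

rng = np.random.default_rng(2)

# All patients enrolled at time 0; visits every 1.0; analysis at calendar 5.3,
# so each event-free patient was last seen at the visit at time 5.0.
n, rate, analysis, last_visit = 20_000, 0.1, 5.3, 5.0
t_event = rng.exponential(1.0 / rate, n)

# Standard reporting: events occurring up to the analysis are reported at once,
# but event-free patients are censored at their last scheduled visit.
event = t_event <= analysis
obs_time = np.where(event, t_event, last_visit)

# With complete data, event-free patients would contribute time through 5.3.
true_time = np.minimum(t_event, analysis)

naive_rate = event.sum() / obs_time.sum()
true_rate = event.sum() / true_time.sum()
```

The naive estimate counts every event reported in the final interval but none of the event-free person-time accrued there, so it exceeds the complete-data estimate, which recovers the simulated rate.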
3.1.2 Trial Structure Variables
Consider a clinical trial to compare the efficacy of two treatments. Upon initiation of the trial
patients begin to be enrolled and randomized. The hypothesis of interest is whether the treatment
groups have different hazard rates. The following parameters must be considered in the
formation of a trial:
1) Enrollment period: the length of time for which new patients will be enrolled into the trial
2) Enrollment-free follow-up: the length of time that elapses between the end of enrollment
and the analysis
3) Follow-up window: the length of time between scheduled visits
Although in actual trials the visit intervals may vary from the specified follow-up window for numerous reasons, in these considerations it will be assumed for simplicity that patients maintain the same visit schedule regardless of time on trial.
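For concreteness, the three design parameters can be collected in a small container; a sketch (the class and field names are illustrative, not from the text):

```python
from dataclasses import dataclass

@dataclass
class TrialStructure:
    """Illustrative container for the three trial design parameters."""
    enrollment_period: float          # time during which new patients enroll
    enrollment_free_followup: float   # time between end of enrollment and analysis
    followup_window: float            # w: time between scheduled visits

    @property
    def analysis_time(self) -> float:
        """Time of analysis A, measured from the start of the trial."""
        return self.enrollment_period + self.enrollment_free_followup

# Example: three-year enrollment, two further years of follow-up, visits every
# six months -- the configuration assumed for the tables later in the chapter.
design = TrialStructure(3.0, 2.0, 0.5)
```

Here `design.analysis_time` gives A = 5, matching the setting used for the numerical results below.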
3.2 Data Collection Methods
Instead of attempting to model the joint distribution of T (the failure time random variable) and
C (the censoring time random variable), there are a few simple tactics in data collection which
may be utilized to consider the uncertainty about the true status of patients at the time of
analysis. Rather than adjusting the model to the data it is possible to adjust the data to the model,
eliminating the preferential nature of event reporting. Three such adjustments are discussed
below.
One such approach would be to censor all patients, including those who experience late
events, at their last visit prior to the analysis and thereby not incorporate any event (or other
follow-up) information obtained between the time of the last scheduled visit and the time of
38
analysis. This requires knowledge of the visit schedule as it is executed for each patient.
Standard likelihood techniques would then be used to estimate the treatment effect estimators of
interest. Although this censoring mechanism would not prefer high or low risk patients, it would
still be dependent on the observation schedule for the trial. This method will be referred to as the
personal cutback method.
Another approach would be to accumulate data to the analytic time point, but use for
analysis only information relevant to a selected calendar date before the analysis; a time point
that would ensure all patients would have their status reported as of, or later than, that calendar
date according to the visit schedule for the study. The interval between scheduled visits is often
the same for all participants, for example, six months between visits. In this case, one could take
the data as of six months prior to analysis as the appropriate data freeze time thereby eliminating
the dependence of the censoring mechanism on the follow-up schedule. This method will be
referred to as the global cutback method.
Both of these estimates involve cutting the data back to some earlier time, which increases the variance of the effect estimate. The loss of reported days at risk, as well as of events not considered in the analysis, will increase the variance of the effect estimator, which has serious implications for the power of inferential tests. A third
tactic would be to assume that all patients who do not report an event after their last visit have
not had an event. If this assumption is correct the resulting estimates would be unbiased and no
data would be lost in the process. However, if this assumption does not hold then hazard rates
may be underestimated. This method will be referred to as the pull-forward method, as we are
pulling forward the survival times of patients without events to the time of analysis.
The practice of censoring patients without a reported event at their last scheduled follow-up visit prior to the time of analysis will be referred to as the standard method.
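A Monte Carlo sketch of the four conventions, under assumptions used later in the chapter — simultaneous entry, exponential event times, visits every w, and a probability ρ that an event occurring after the last visit goes unreported by the analysis. All names and default values are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_estimates(lam=0.67, A=5.0, w=0.5, rho=0.5, n=400_000):
    """One simulated trial with simultaneous entry at time 0.

    Visits occur every w, so the last visit before analysis is at A - w.
    An event after the last visit is reported in time with probability 1 - rho.
    """
    t = rng.exponential(1.0 / lam, size=n)      # true event times
    last = A - w
    reported = rng.random(n) < 1.0 - rho

    early = t <= last                           # event known by a visit
    late = (t > last) & (t <= A) & reported     # late event, still reported

    # Standard: late reported events kept; everyone else censored at last visit.
    d_std = early | late
    y_std = np.where(d_std, t, last)

    # Personal/global cutback (identical here): ignore data after the last visit.
    y_cut = np.where(early, t, last)

    # Pull-forward: no reported event => assumed event-free through A.
    y_pf = np.where(d_std, t, A)

    return (d_std.sum() / y_std.sum(),          # standard
            early.sum() / y_cut.sum(),          # cutback
            d_std.sum() / y_pf.sum())           # pull-forward
```

With these defaults the cutback estimate lands near the true rate of 0.67, the standard estimate slightly above it, and the pull-forward estimate slightly below, matching the qualitative behavior derived in section 3.3.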
3.3 Estimating the Asymptotic Behavior of Rate Estimates
3.3.1 The Expectation of a Ratio of Random Variables
This chapter aims to provide a characterization of the asymptotic behavior of the estimates of the
hazard ratio that result from the data collection methods described above. As the hazard rate can
be expressed as the ratio of events to total survival time, the ideal approach would be to take the
expectation of this ratio. However, taking the expectation of a ratio of random variables is not
trivial. Consider the expectation of a ratio of two random variables D and T.
$$E\left[\frac{D}{T}\right] = \iint \frac{d}{t}\, f(d,t)\, dd\, dt \qquad (3.1)$$

Taylor expand the ratio d/t about (E[D], E[T]) to first-order terms:

$$\frac{d}{t} \approx \frac{E[D]}{E[T]} + \frac{1}{E[T]}\left(d - E[D]\right) - \frac{E[D]}{E[T]^{2}}\left(t - E[T]\right) \qquad (3.2)$$

Substitute into the integrand of the expectation:

$$E\left[\frac{D}{T}\right] \approx \iint \left[\frac{E[D]}{E[T]} + \frac{1}{E[T]}\left(d - E[D]\right) - \frac{E[D]}{E[T]^{2}}\left(t - E[T]\right)\right] f(d,t)\, dd\, dt$$

$$= \frac{E[D]}{E[T]}\iint f(d,t)\, dd\, dt + \frac{1}{E[T]}\iint \left(d - E[D]\right) f(d,t)\, dd\, dt - \frac{E[D]}{E[T]^{2}}\iint \left(t - E[T]\right) f(d,t)\, dd\, dt = \frac{E[D]}{E[T]} \qquad (3.3)$$

since the first integral is one and the last two integrals vanish.
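The quality of this first-order approximation is easy to probe numerically; a sketch with arbitrary independent stand-ins for D and T (the distributions are illustrative only):

```python
import numpy as np

rng = np.random.default_rng(1)

# Independent stand-ins: D ~ Poisson(50) for event counts, T ~ Gamma(100, 1)
# for accumulated survival time.
d = rng.poisson(50.0, size=1_000_000)
t = rng.gamma(shape=100.0, scale=1.0, size=1_000_000)

monte_carlo = np.mean(d / t)          # direct estimate of E[D/T]
first_order = d.mean() / t.mean()     # the approximation E[D]/E[T]
```

Here E[D/T] = 50·E[1/T] ≈ 0.505 while E[D]/E[T] = 0.5; the small discrepancy is of the order of the second-order terms dropped from the expansion.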
Based on this motivation, I will approximate the hazard rates resulting from the different
data collection methods using E[D]/E[T]. Note that a bound on the error of these estimates can
be calculated using the equations derived in Rice, 2009. For the remainder of this chapter the
following notation will be used:
$$\hat{\lambda} = E[D]\,/\,E[T]$$
where D denotes the total number of reported events and T denotes the total reported survival
time under observation during the trial.
Consider the simple method of calculating the hazard rate of a group of patients at the
time of analysis by dividing the total number of events by the total survival time accrued. Due to
the differences between the data collection techniques, this method will produce slightly
different estimates. Let us first consider the situation in which all patients enter simultaneously at
time 0.
To motivate the discussion, first assume that we have perfect knowledge of all patients' event statuses right up to the time of analysis. Let T be true survival time and assume it follows an exponential distribution with parameter λ. Let A be the time of analysis.
Define the following random variables:

$$\delta = \begin{cases} 1, & T \le A \\ 0, & T > A \end{cases} \qquad\qquad T^{*} = \begin{cases} T, & T \le A \\ 0, & T > A \end{cases} \qquad (3.4)$$

Then their expected values are as follows:

$$E[\delta] = P[T \le A] = 1 - e^{-\lambda A} \qquad (3.5)$$

$$E[T^{*} \mid \delta = 1] = \int_{0}^{A} t\,\frac{f_T(t)}{F_T(A)}\,dt = \frac{1}{\lambda} - \frac{A e^{-\lambda A}}{1 - e^{-\lambda A}} \qquad (3.6)$$
The maximum likelihood estimate of the hazard rate is

$$\hat{\lambda} = \frac{N E[\delta]}{N E[\delta]\,E[T^{*} \mid \delta = 1] + N\left(1 - E[\delta]\right)A} = \frac{E[\delta]}{E[\delta]\,E[T^{*} \mid \delta = 1] + \left(1 - E[\delta]\right)A} \qquad (3.7)$$

Note that the numerator is the expected number of events, the first term of the denominator is the total expected time accrued from patients with an event, and the second term of the denominator is the expected time contribution from those patients without an event. Then, from the above expectations, we have

$$\hat{\lambda} = \frac{1 - e^{-\lambda A}}{\left(1 - e^{-\lambda A}\right)\left[\dfrac{1}{\lambda} - \dfrac{A e^{-\lambda A}}{1 - e^{-\lambda A}}\right] + A e^{-\lambda A}} = \frac{1 - e^{-\lambda A}}{\frac{1}{\lambda}\left(1 - e^{-\lambda A}\right)} = \lambda \qquad (3.8)$$

Therefore, in the case of perfect knowledge and continuous reporting, this estimate is asymptotically unbiased. This approach of estimating the hazard rate by the ratio of expected events to expected accumulated survival time will be adopted for the remainder of this chapter.
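This limiting argument can be checked by simulation of the perfect-knowledge case, with every patient followed continuously to the time of analysis (parameter values are those used in the tables later in the chapter):

```python
import numpy as np

rng = np.random.default_rng(2)
lam, A, n = 0.67, 5.0, 500_000

t = rng.exponential(1.0 / lam, size=n)   # true event times
d = t <= A                               # events observed by the analysis
y = np.minimum(t, A)                     # time at risk under continuous follow-up

lam_hat = d.sum() / y.sum()              # events / total survival time
```

With half a million simulated patients the ratio sits within a few thousandths of the true rate 0.67.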
3.3.2 Simultaneous Entry, No Late Reporting
Assume now that we are informed of a patient’s event status only at their visit times and at the
time at which they report an event, should this happen. Assume also that all events are reported
when they occur.
Standard Estimate
Recall that the standard procedure in most clinical trials is for patients without a reported event
to be censored at their last follow-up visit prior to the time of analysis. Let A be the time of
analysis for the trial, w the interval between scheduled follow-up visits, T the survival time for a
given patient. Note that, assuming all patients enter simultaneously, the last visit for every
patient will be at A-w. Define the following random variables:
$$\delta_{1} = \begin{cases} 1, & T \le A - w \\ 0, & \text{otherwise} \end{cases} \qquad\qquad \delta_{2} = \begin{cases} 1, & A - w < T \le A \\ 0, & \text{otherwise} \end{cases} \qquad (3.9)$$

$$T_{1} = \begin{cases} T, & T \le A - w \\ 0, & \text{otherwise} \end{cases} \qquad\qquad T_{2} = \begin{cases} T, & A - w < T \le A \\ 0, & \text{otherwise} \end{cases} \qquad (3.10)$$
Assuming exponentially distributed survival times, expectations can be derived as follows:

$$E[\delta_{1}] = P[T \le A - w] = 1 - e^{-\lambda(A-w)} \qquad (3.11)$$

$$E[\delta_{2}] = P[A - w < T \le A] = \left(1 - e^{-\lambda A}\right) - \left(1 - e^{-\lambda(A-w)}\right) = e^{-\lambda(A-w)} - e^{-\lambda A} \qquad (3.12)$$

$$E[T_{1} \mid \delta_{1} = 1] = \int_{0}^{A-w} t\,\frac{\lambda e^{-\lambda t}}{1 - e^{-\lambda(A-w)}}\,dt = \frac{1}{\lambda} - \frac{(A-w)e^{-\lambda(A-w)}}{1 - e^{-\lambda(A-w)}} \qquad (3.13)$$

$$E[T_{2} \mid \delta_{2} = 1] = \int_{A-w}^{A} t\,\frac{\lambda e^{-\lambda t}}{e^{-\lambda(A-w)} - e^{-\lambda A}}\,dt = \frac{1}{\lambda} + \frac{(A-w)e^{-\lambda(A-w)} - A e^{-\lambda A}}{e^{-\lambda(A-w)} - e^{-\lambda A}} \qquad (3.14)$$
Take as the estimate of the hazard rate the number of reported events divided by the total
accrued survival time. Then the following is used to estimate its expectation. Note that a factor
of N has been omitted from both the numerator and denominator.
$$\hat{\lambda} = \frac{E[\delta_{1}] + E[\delta_{2}]}{E[\delta_{1}]\,E[T_{1} \mid \delta_{1}=1] + E[\delta_{2}]\,E[T_{2} \mid \delta_{2}=1] + \left(1 - E[\delta_{1}] - E[\delta_{2}]\right)(A - w)} \qquad (3.15)$$

Here, the first, second, and third terms of the denominator represent the expected time contribution from patients with events before their last visit, patients with events after their last visit, and patients who do not experience events during the trial, respectively. Using the expectations derived above, we have

$$\hat{\lambda} = \frac{1 - e^{-\lambda A}}{\frac{1}{\lambda}\left(1 - e^{-\lambda A}\right) - w\,e^{-\lambda A}} \qquad (3.16)$$
Written without reference to the exponential distribution, this is

$$\hat{\lambda} = \frac{P[T \le A]}{E[\min(T, A)] - w\,S_T(A)} \qquad (3.17)$$

where S_T = 1 − F_T denotes the survivor function.
Note that if, and only if, the interval between visits, w, were equal to zero (implying
constant follow-up) this estimate would be unbiased. When the visit interval is strictly greater
than zero, the denominator is smaller than it would be in the unbiased case. The standard
estimate can therefore be expected to overestimate the hazard rate under these circumstances.
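Evaluating the closed form (3.16), as reconstructed above, shows the overestimation appearing as soon as w > 0 and growing with it; a sketch:

```python
import math

def standard_rate(lam, A, w):
    """Expected standard-method rate under perfect reporting and simultaneous
    entry, per the closed form (3.16) as reconstructed above."""
    num = 1.0 - math.exp(-lam * A)
    return num / (num / lam - w * math.exp(-lam * A))
```

`standard_rate(0.67, 5.0, 0.0)` recovers the true rate, while increasing w inflates the estimate because the subtracted term shrinks the denominator.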
Cutback Estimates
Let A, w, T, δ_1, and T_1 be defined as above. Recall that the personal cutback method involves
censoring all patients who have not had an event at their last visit prior to the analysis. All
information collected following a patient’s last visit is ignored in the calculation of the rate
estimate. Recall also that the global cutback method involves using only information collected up
until the time point one visit interval prior to the time of analysis (A – w). Now, note that in the
case of simultaneous entry, the personal and global cutback methods are the same procedure, as
all patients will have their last visit at the same time: A – w.
In keeping with the approach of estimating the expectation of the hazard rate by the ratio of
expected events to expected accumulated survival time, I will take as my estimate
$$\hat{\lambda} = \frac{E[\delta_{1}]}{E[\delta_{1}]\,E[T_{1} \mid \delta_{1}=1] + \left(1 - E[\delta_{1}]\right)(A - w)} \qquad (3.18)$$

In the case of exponential survival times and based on expectations derived above,

$$\hat{\lambda} = \frac{1 - e^{-\lambda(A-w)}}{\frac{1}{\lambda}\left(1 - e^{-\lambda(A-w)}\right) - (A-w)e^{-\lambda(A-w)} + e^{-\lambda(A-w)}(A-w)} = \frac{1 - e^{-\lambda(A-w)}}{\frac{1}{\lambda}\left(1 - e^{-\lambda(A-w)}\right)} = \lambda \qquad (3.19)$$
Therefore, under these circumstances, the cutback method can be expected to provide unbiased
estimates of the hazard rate.
Pull-Forward Estimate
Recall that the pull-forward method assumes that all patients who do not report an event during a
trial are event-free at the time of analysis, and their contribution to overall survival time reflects
this. Therefore, the estimate of the expectation of the hazard rate can be taken as
$$\hat{\lambda} = \frac{E[\delta_{1}] + E[\delta_{2}]}{E[\delta_{1}]\,E[T_{1} \mid \delta_{1}=1] + E[\delta_{2}]\,E[T_{2} \mid \delta_{2}=1] + \left(1 - E[\delta_{1}] - E[\delta_{2}]\right)A} \qquad (3.20)$$
In the case of exponential survival times,
$$\hat{\lambda} = \frac{1 - e^{-\lambda A}}{\frac{1}{\lambda}\left(1 - e^{-\lambda A}\right) - A e^{-\lambda A} + A e^{-\lambda A}} = \lambda \qquad (3.21)$$
When all events are detected and reported in a timely fashion, the pull-forward method
correctly assumes that all patients who do not report an event maintain their event-free status up
to the time of analysis. The result is an unbiased estimate of the hazard rate.
3.3.3 Simultaneous Entry, With Misreporting
Next, consider the possibility that events are not always quickly reported. Let ρ be the
probability that an event is not detected until the next scheduled visit after it occurs. If an event
happens after the last scheduled visit, ρ is the probability that the event goes unreported at the
time of analysis.
Standard Estimate
Note again that, assuming all patients enter simultaneously, the last visit for every patient will be at A − w. Let A, w, T, δ_1, T_1, δ_2, and T_2 be defined as above.
In keeping with previous estimates, take as our estimate of the expectation of the hazard rate the
ratio of the expected reported events to the accumulated reported survival times:
$$\hat{\lambda} = \frac{E[\delta_{1}] + (1-\rho)E[\delta_{2}]}{E[\delta_{1}]\,E[T_{1} \mid \delta_{1}=1] + (1-\rho)E[\delta_{2}]\,E[T_{2} \mid \delta_{2}=1] + \left(1 - E[\delta_{1}] - (1-\rho)E[\delta_{2}]\right)(A - w)} \qquad (3.22)$$

$$\hat{\lambda} = \frac{1 - e^{-\lambda(A-w)} + (1-\rho)\left(e^{-\lambda(A-w)} - e^{-\lambda A}\right)}{\frac{1}{\lambda}\left[1 - e^{-\lambda(A-w)} + (1-\rho)\left(e^{-\lambda(A-w)} - e^{-\lambda A}\right)\right] - (1-\rho)\,w\,e^{-\lambda A}} \qquad (3.23)$$

Written without reference to the exponential distribution,

$$\hat{\lambda} = \frac{P[T \le A-w] + (1-\rho)P[A-w < T \le A]}{E[T\,1\{T \le A-w\}] + (1-\rho)E[T\,1\{A-w < T \le A\}] + \left(P[T > A] + \rho P[A-w < T \le A]\right)(A-w)} \qquad (3.24)$$
When ρ = 1, so that all events are reported only at the next follow-up visit, all patients surviving without an event at their last visit are censored at that visit and the standard method reduces to the personal cutback method, providing an unbiased estimate of the hazard rate. When w = 0, implying constant knowledge of all patients' event statuses, the terms involving ρ drop out and, again, an unbiased estimate is produced. When neither of these circumstances holds, the last term subtracted in the denominator makes the denominator smaller than it would be in the unbiased case. Again, the standard method generates an overestimate of the hazard rate.
Personal and Global Cutback Estimates
Cutback estimates are not affected by a misreporting process that assumes that any event that
occurs is, at the latest, detected at the next visit. The personal cutback method stops data
collection at the last visit prior to the analysis, so if the patient has had an event it is known at
that time. The global cutback method moves the analysis time back far enough that all patients
will have a visit between the cutoff date and the time of analysis. At that visit, the patient’s event
status is ascertained and we assume that the timing of the event also becomes known, so that, if
the event occurred before the cutoff date, this information can be used to calculate the hazard
rate.
48
Pull-Forward Estimate
The pull-forward estimate of the hazard rate, taking late reporting into account, is
$$\hat{\lambda} = \frac{E[\delta_{1}] + (1-\rho)E[\delta_{2}]}{E[\delta_{1}]\,E[T_{1} \mid \delta_{1}=1] + (1-\rho)E[\delta_{2}]\,E[T_{2} \mid \delta_{2}=1] + \left(1 - E[\delta_{1}] - (1-\rho)E[\delta_{2}]\right)A} \qquad (3.25)$$

$$\hat{\lambda} = \frac{1 - e^{-\lambda A} - \rho\left(e^{-\lambda(A-w)} - e^{-\lambda A}\right)}{\frac{1}{\lambda}\left[1 - e^{-\lambda A} - \rho\left(e^{-\lambda(A-w)} - e^{-\lambda A}\right)\right] + \rho\,w\,e^{-\lambda(A-w)}} \qquad (3.26)$$

Written without reference to the exponential distribution,

$$\hat{\lambda} = \frac{P[T \le A] - \rho P[A-w < T \le A]}{E[\min(T, A)] + \rho\,E\left[(A - T)\,1\{A-w < T \le A\}\right]} \qquad (3.27)$$
When ρ = 0, the assumption that all patients without a reported event are event-free at the time of analysis is satisfied and the estimate is unbiased. When ρ > 0, the last term of the denominator prevents the estimate from being unbiased. This term is an increasing function of w, so the larger the window between visits, the more biased the estimate. As the last term of the denominator grows, the estimate of the hazard rate falls further below the true rate.
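The two exponential-case closed forms, (3.23) and (3.26) as reconstructed above, can be evaluated directly; the sketch below checks the limiting cases discussed in the text and the direction of each bias:

```python
import math

def standard_rate(lam, A, w, rho):
    """Standard-method rate under late reporting, per (3.23) as reconstructed."""
    num = (1.0 - math.exp(-lam * (A - w))
           + (1.0 - rho) * (math.exp(-lam * (A - w)) - math.exp(-lam * A)))
    return num / (num / lam - (1.0 - rho) * w * math.exp(-lam * A))

def pull_forward_rate(lam, A, w, rho):
    """Pull-forward rate under late reporting, per (3.26) as reconstructed."""
    num = (1.0 - math.exp(-lam * A)
           - rho * (math.exp(-lam * (A - w)) - math.exp(-lam * A)))
    return num / (num / lam + rho * w * math.exp(-lam * (A - w)))
```

At λ = 0.67, A = 5, w = 0.5, ρ = 0.5, the pull-forward expression evaluates to about 0.6643, which agrees with the simultaneous-entry pull-forward entry of Table 1.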
3.3.4 Random Entry, With Misreporting
In practice, patients are often entered as they are identified as eligible participants over time
rather than instantaneously at the beginning of a trial. In this section, entry will be considered as a random variable following a uniform distribution over the enrollment period of the trial. This will require that the ratios which estimate the hazard rates be multiplied by the density of entry times and integrated over the enrollment period.
Standard Method
First, the rate is derived as a function of entry time by deriving each component as such. Let K
denote the end of the enrollment period, A the end of the trial, and w the interval between
scheduled visits. Let T be survival time, E entry time, and L last visit. Note that the timing of the
last visit is a function of entry time, as we assume that visits take place at every increment of
length w from entry until the last visit prior to analysis. Let ρ be the probability that an event is
not reported until the following visit. Let

$$E \sim \text{Unif}[0, K], \qquad T \sim \text{Exponential}(\lambda)$$
Define the following random variables:
$$\delta_{1} = \begin{cases} 1, & T \le L - E \\ 0, & T > L - E \end{cases} \qquad\qquad T_{1} = \begin{cases} T, & T \le L - E \\ 0, & T > L - E \end{cases} \qquad (3.28)$$

$$\delta_{2} = \begin{cases} 1, & L - E < T \le A - E \\ 0, & \text{otherwise} \end{cases} \qquad\qquad T_{2} = \begin{cases} T, & L - E < T \le A - E \\ 0, & \text{otherwise} \end{cases} \qquad (3.29)$$
Then the components of the rate estimate can be expressed as functions of entry time E as
follows.
$$E[\delta_{1} \mid E] = 1 - e^{-\lambda(L-E)} \qquad (3.30)$$

$$E[\delta_{2} \mid E] = e^{-\lambda(L-E)} - e^{-\lambda(A-E)} \qquad (3.31)$$

$$E[T_{1} \mid E, \delta_{1} = 1] = \frac{1}{\lambda} - \frac{(L-E)e^{-\lambda(L-E)}}{1 - e^{-\lambda(L-E)}} \qquad (3.32)$$

$$E[T_{2} \mid E, \delta_{2} = 1] = \frac{1}{\lambda} + \frac{(L-E)e^{-\lambda(L-E)} - (A-E)e^{-\lambda(A-E)}}{e^{-\lambda(L-E)} - e^{-\lambda(A-E)}} \qquad (3.33)$$

The numerator of the rate estimate, as a function of entry time, is

$$E[\delta_{1} \mid E] + (1-\rho)E[\delta_{2} \mid E] = 1 - e^{-\lambda(L-E)} + (1-\rho)\left(e^{-\lambda(L-E)} - e^{-\lambda(A-E)}\right) \qquad (3.34)$$

The terms of the denominator are as follows:

$$E[\delta_{1} \mid E]\,E[T_{1} \mid E, \delta_{1}=1] = \frac{1}{\lambda}\left(1 - e^{-\lambda(L-E)}\right) - (L-E)e^{-\lambda(L-E)} \qquad (3.35)$$

$$(1-\rho)E[\delta_{2} \mid E]\,E[T_{2} \mid E, \delta_{2}=1] = (1-\rho)\left[(L-E)e^{-\lambda(L-E)} - (A-E)e^{-\lambda(A-E)} + \frac{1}{\lambda}\left(e^{-\lambda(L-E)} - e^{-\lambda(A-E)}\right)\right] \qquad (3.36)$$

$$\left(1 - E[\delta_{1} \mid E] - (1-\rho)E[\delta_{2} \mid E]\right)(L-E) = \left[e^{-\lambda(L-E)} - (1-\rho)\left(e^{-\lambda(L-E)} - e^{-\lambda(A-E)}\right)\right](L-E) \qquad (3.37)$$
The denominator is the sum of these terms. In order to derive the standard estimate of the
expectation of the hazard rate, the ratio of the numerator to the denominator must be multiplied
by the distribution of entry times and integrated over the enrollment period with respect to entry
time. However, in order to do so, we must address the fact that the time of the last visit is a
function of entry time. To do this, I use the fact that, for all patients who enter within a given
interval of size w during the enrollment period, the time that elapses between entry and last visit
is the same. Partition the enrollment period into intervals of length w (these will be denoted
“enrollment intervals”):
$$[0, K] = [0, w) \cup [w, 2w) \cup [2w, 3w) \cup \cdots \cup [(r-1)w, rw]$$

Note that this assumes that the enrollment period K = rw is a multiple of w. Consider a patient who enters during enrollment interval [(f−1)w, fw). Then, assuming that A is a multiple of w, the time
elapsed between their entry and their last visit, L – E, is equal to A – fw. Therefore, if we break
up the integral of the ratio over the enrollment period into the sum of integrals of the ratio over
the enrollment intervals, the difference L – E can be expressed without being a function of E.
The estimate of the expectation of the standard rate can be expressed as

$$\hat{\lambda} = \frac{1}{K}\sum_{f=1}^{r}\int_{(f-1)w}^{fw} \frac{E[\delta_{1} \mid e] + (1-\rho)E[\delta_{2} \mid e]}{E[\delta_{1} \mid e]\,E[T_{1} \mid e, \delta_{1}=1] + (1-\rho)E[\delta_{2} \mid e]\,E[T_{2} \mid e, \delta_{2}=1] + \left(1 - E[\delta_{1} \mid e] - (1-\rho)E[\delta_{2} \mid e]\right)(A - fw)}\,de \qquad (3.38)$$

Substituting L − e = A − fw for entries in the f-th enrollment interval,

$$\hat{\lambda} = \frac{1}{K}\sum_{f=1}^{r}\int_{(f-1)w}^{fw} \frac{1 - e^{-\lambda(A-fw)} + (1-\rho)\left(e^{-\lambda(A-fw)} - e^{-\lambda(A-e)}\right)}{\frac{1}{\lambda}\left[1 - e^{-\lambda(A-fw)} + (1-\rho)\left(e^{-\lambda(A-fw)} - e^{-\lambda(A-e)}\right)\right] - (1-\rho)(fw - e)\,e^{-\lambda(A-e)}}\,de \qquad (3.39)$$
Personal Cutback Method
The personal cutback method censors patients who do not report an event at their last visit prior
to the analysis. This mechanism is applied universally to all patients who are event-free and the
timing of a patient’s last visit is entirely determined by their time of entry. As the time of entry is
assumed to be a random process, the censoring time is therefore also random and not related to
risk of event. In this way, the censoring mechanism employed by the personal cutback method is
ignorable and the analysis procedures that depend on this assumption are applicable. In addition,
it can be shown that the ratio of expected reported events to expected survival is equal to the true
hazard rate. The personal cutback method can be expressed as a function of entry time as
follows.
$$\hat{\lambda}(E) = \frac{E[\delta_{1} \mid E]}{E[\delta_{1} \mid E]\,E[T_{1} \mid E, \delta_{1}=1] + \left(1 - E[\delta_{1} \mid E]\right)(L - E)}$$

$$= \frac{1 - e^{-\lambda(L-E)}}{\frac{1}{\lambda}\left(1 - e^{-\lambda(L-E)}\right) - (L-E)e^{-\lambda(L-E)} + e^{-\lambda(L-E)}(L-E)} = \lambda \qquad (3.40)$$
The estimate of the hazard rate generated by the personal cutback method is therefore unbiased.
Global Cutback Method
The global cutback method censors patients who do not report an event one follow-up interval
prior to the true analysis date. The use of a specified calendar date is known to be a type of
ignorable censoring and the analysis procedures that depend on this assumption are applicable, as
in the personal cutback case. It should be noted, however, that it is assumed throughout this
dissertation that the information gathered in the interval between the cutoff date and the date of
analysis is sufficient to accurately determine the true status of all patients at the time of the
cutoff. It can also be shown that the ratio of expected reported events to expected survival is
equal to the true hazard rate. Let C denote the cutback date, one follow-up interval prior to the time of analysis. Define the following random variables:

$$\delta_{3} = \begin{cases} 1, & T \le C - E \\ 0, & T > C - E \end{cases} \qquad\qquad T_{3} = \begin{cases} T, & T \le C - E \\ 0, & T > C - E \end{cases} \qquad (3.41)$$
Then the components of the rate estimate can be derived as follows.
$$E[\delta_{3} \mid E] = 1 - e^{-\lambda(C-E)} \qquad (3.42)$$

$$E[T_{3} \mid E, \delta_{3} = 1] = \frac{1}{\lambda} - \frac{(C-E)e^{-\lambda(C-E)}}{1 - e^{-\lambda(C-E)}} \qquad (3.43)$$

The estimate of the hazard rate as a function of entry time is

$$\hat{\lambda}(E) = \frac{E[\delta_{3} \mid E]}{E[\delta_{3} \mid E]\,E[T_{3} \mid E, \delta_{3}=1] + \left(1 - E[\delta_{3} \mid E]\right)(C - E)} \qquad (3.44)$$

$$= \frac{1 - e^{-\lambda(C-E)}}{\frac{1}{\lambda}\left(1 - e^{-\lambda(C-E)}\right) - (C-E)e^{-\lambda(C-E)} + e^{-\lambda(C-E)}(C-E)} = \lambda \qquad (3.45)$$
The estimate of the hazard rate generated by the global cutback method is therefore unbiased.
The Pull-Forward Estimate
The pull-forward estimate differs from the standard estimate only in the censoring of patients without a reported event. The standard estimate censors patients at their last follow-up visit, while the pull-forward estimate censors them at the time of analysis A. The rate estimates therefore differ only in the last term of the denominator. We can thus express the pull-forward rate as a function of entry time as follows:

$$\hat{\lambda} = \frac{1}{K}\sum_{f=1}^{r}\int_{(f-1)w}^{fw} \frac{E[\delta_{1} \mid e] + (1-\rho)E[\delta_{2} \mid e]}{E[\delta_{1} \mid e]\,E[T_{1} \mid e, \delta_{1}=1] + (1-\rho)E[\delta_{2} \mid e]\,E[T_{2} \mid e, \delta_{2}=1] + \left(1 - E[\delta_{1} \mid e] - (1-\rho)E[\delta_{2} \mid e]\right)(A - e)}\,de \qquad (3.46)$$

$$= \frac{1}{K}\sum_{f=1}^{r}\int_{(f-1)w}^{fw} \frac{1 - e^{-\lambda(A-e)} - \rho\left(e^{-\lambda(A-fw)} - e^{-\lambda(A-e)}\right)}{\frac{1}{\lambda}\left[1 - e^{-\lambda(A-e)} - \rho\left(e^{-\lambda(A-fw)} - e^{-\lambda(A-e)}\right)\right] + \rho(fw - e)\,e^{-\lambda(A-fw)}}\,de \qquad (3.47)$$
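These integrals generally have no closed form. A sketch of evaluating the reconstructed (3.47) with the trapezoid rule, assuming (as in the partition above) that K and A are multiples of w:

```python
import math

def pull_forward_uniform_entry(lam, A, w, rho, K, m=2000):
    """Trapezoid-rule evaluation of (3.47) as reconstructed above.

    A patient entering at e in [(f-1)w, fw) has last visit A - fw after entry,
    leaving a gap of fw - e between the last visit and the analysis.
    """
    r = round(K / w)                # number of enrollment intervals
    total = 0.0
    for f in range(1, r + 1):
        a, b = (f - 1) * w, f * w

        def ratio(e, f=f):
            num = (1.0 - math.exp(-lam * (A - e))
                   - rho * (math.exp(-lam * (A - f * w)) - math.exp(-lam * (A - e))))
            den = num / lam + rho * (f * w - e) * math.exp(-lam * (A - f * w))
            return num / den

        h = (b - a) / m
        total += h * (0.5 * ratio(a) + sum(ratio(a + i * h) for i in range(1, m))
                      + 0.5 * ratio(b))
    return total / K
```

At ρ = 0 the integrand is identically λ, so the estimate is exact; for ρ > 0 the positive last term of the denominator pulls every integrand value, and hence the integral, below λ.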
3.4 The Asymptotic Variance of the Rate Estimate
Issues related to the variance of the estimate are of most interest in terms of statistical test
characteristics in the finite sample setting and will therefore be explored mainly by way of
simulation. However, techniques for deriving the asymptotic variance of estimates of the hazard
rate will be presented briefly with details left to the reader.
Assume perfect reporting and simultaneous entry. Let T be the true patient survival
random variable. Define the following random variables.
$$\delta = \begin{cases} 1, & T \le A \\ 0, & T > A \end{cases} \qquad\qquad T^{*} = \begin{cases} T, & T \le A \\ A, & T > A \end{cases} \qquad (3.48)$$
Let the individual realizations of these variables be indexed by i. Under the Weak Law of Large
Numbers, assuming the independence and identical distributions of the indicators of patient
event on trial, we have that
$$\frac{1}{N}\sum_{i=1}^{N}\delta_{i} \xrightarrow{\;p\;} E[\delta] \qquad (3.49)$$

$$\frac{1}{N}\sum_{i=1}^{N}T_{i}^{*} \xrightarrow{\;p\;} E[T^{*}] \qquad (3.50)$$

By the multivariate Central Limit Theorem, we have

$$\sqrt{N}\left[\begin{pmatrix} \frac{1}{N}\sum_{i=1}^{N}\delta_{i} \\[4pt] \frac{1}{N}\sum_{i=1}^{N}T_{i}^{*} \end{pmatrix} - \begin{pmatrix} E[\delta] \\[2pt] E[T^{*}] \end{pmatrix}\right] \xrightarrow{\;D\;} N\left(\begin{pmatrix} 0 \\ 0 \end{pmatrix},\; \Sigma\right) \qquad (3.51)$$

where

$$\Sigma = \begin{pmatrix} V[\delta] & Cov(\delta, T^{*}) \\ Cov(\delta, T^{*}) & V[T^{*}] \end{pmatrix} \qquad (3.52)$$
Now define
$$f(x, y) = \frac{x}{y} \qquad (3.53)$$

Note that the underlying random variable T of survival time is assumed to be a continuous random variable with non-negative support. Hence the sum of survival times can be assumed to be bounded away from 0. Then, by the delta method,

$$\sqrt{N}\left[f\!\left(\frac{1}{N}\sum_{i=1}^{N}\delta_{i},\; \frac{1}{N}\sum_{i=1}^{N}T_{i}^{*}\right) - f\!\left(E[\delta],\, E[T^{*}]\right)\right] \xrightarrow{\;D\;} N\!\left(0,\; \nabla f^{\,T}\,\Sigma\,\nabla f\right), \qquad \nabla f = \begin{pmatrix} \dfrac{1}{E[T^{*}]} \\[8pt] -\dfrac{E[\delta]}{E[T^{*}]^{2}} \end{pmatrix} \qquad (3.54)$$
Therefore, the asymptotic variance of the estimate of the hazard rate can be expressed as

$$\nabla f^{\,T}\,\Sigma\,\nabla f = \frac{V[\delta]}{E[T^{*}]^{2}} - \frac{2\,E[\delta]\,Cov(\delta, T^{*})}{E[T^{*}]^{3}} + \frac{E[\delta]^{2}\,V[T^{*}]}{E[T^{*}]^{4}} \qquad (3.55)$$
Given the definitions of δ, T*, and T,

$$E[\delta] = F_T(A) \qquad (3.56)$$

$$E[T^{*}] = F_T(A)\,E[T \mid T \le A] + A\,S_T(A) \qquad (3.57)$$

$$V[\delta] = F_T(A)\,S_T(A) \qquad (3.58)$$

$$V[T^{*}] = F_T(A)\,E[T^{2} \mid T \le A] + A^{2}\,S_T(A) - \left(F_T(A)\,E[T \mid T \le A] + A\,S_T(A)\right)^{2} \qquad (3.59)$$

$$Cov(\delta, T^{*}) = F_T(A)\,S_T(A)\left(E[T \mid T \le A] - A\right) \qquad (3.60)$$
When the underlying distribution is exponential, we have the following:

$$E[\delta] = 1 - e^{-\lambda A} \qquad (3.61)$$

$$E[T^{*}] = \frac{1}{\lambda}\left(1 - e^{-\lambda A}\right) \qquad (3.62)$$

$$V[\delta] = e^{-\lambda A}\left(1 - e^{-\lambda A}\right) \qquad (3.63)$$

$$V[T^{*}] = \frac{1}{\lambda^{2}} - \frac{2A}{\lambda}e^{-\lambda A} - \frac{1}{\lambda^{2}}e^{-2\lambda A} \qquad (3.64)$$

$$Cov(\delta, T^{*}) = \frac{1}{\lambda}e^{-\lambda A} - \frac{1}{\lambda}e^{-2\lambda A} - A\,e^{-\lambda A} \qquad (3.65)$$

In the case of perfect reporting, simultaneous entry, and underlying exponential survival times, we have that the asymptotic variance of the estimate is

$$\frac{V[\delta]}{E[T^{*}]^{2}} - \frac{2\,E[\delta]\,Cov(\delta, T^{*})}{E[T^{*}]^{3}} + \frac{E[\delta]^{2}\,V[T^{*}]}{E[T^{*}]^{4}} = \frac{\lambda^{2}}{1 - e^{-\lambda A}} \qquad (3.66)$$
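The exponential-case formula (3.66) can be checked by simulation: across replicated trials, N times the sampling variance of the rate estimate should approach λ²/(1 − exp{−λA}). A sketch:

```python
import numpy as np

rng = np.random.default_rng(3)
lam, A, N, reps = 0.67, 5.0, 2_000, 2_000

# Replicate the perfect-knowledge trial and record the rate estimate each time.
ests = np.empty(reps)
for r in range(reps):
    t = rng.exponential(1.0 / lam, size=N)
    ests[r] = (t <= A).sum() / np.minimum(t, A).sum()

empirical = ests.var() * N                          # N * Var(lam_hat)
theoretical = lam**2 / (1.0 - np.exp(-lam * A))     # asymptotic value
```

With 2,000 replicates the empirical value agrees with the asymptotic one to within Monte Carlo error of a few percent.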
In order to derive the asymptotic estimate of the variance resulting from the four data collection methods, it is necessary to re-define δ and T* to reflect the procedure being used. For example, consider the standard method in the presence of delayed reporting when enrollment is simultaneous. Let ν be the indicator that an event occurring after the last scheduled visit is reported before the analysis, so that ν is Bernoulli with success probability 1 − ρ. Define the following:

$$\delta = \begin{cases} 1, & T \le A - w \\ \nu, & A - w < T \le A \\ 0, & T > A \end{cases} \qquad\qquad T^{*} = \begin{cases} T, & T \le A - w \\ \nu T + (1 - \nu)(A - w), & A - w < T \le A \\ A - w, & T > A \end{cases} \qquad (3.67)$$

Using the fact that ν is a Bernoulli random variable that is independent of the underlying survival variable T, it is possible to derive the expectations and variances of δ and T* in order to calculate the asymptotic variance of the rate estimate as above. The approach is similar when enrollment is allowed to occur as a random process. Let E denote entry time and L the last visit, and note that L is a function of E. For the standard method in the presence of delayed reporting and random entry, define the following:

$$\delta = \begin{cases} 1, & T \le L - E \\ \nu, & L - E < T \le A - E \\ 0, & T > A - E \end{cases} \qquad\qquad T^{*} = \begin{cases} T, & T \le L - E \\ \nu T + (1 - \nu)(L - E), & L - E < T \le A - E \\ L - E, & T > A - E \end{cases} \qquad (3.68)$$
In this case, note that late reporting, entry, and underlying survival time are mutually independent random processes. The methods in section 3.3.4 may be applied in the calculation of the expectations and variances, and formula (3.55) used to derive the asymptotic variance.
3.5 The Expectation of the Hazard Rate Under Specified Parameters
When random entry is taken into account, the estimates of the hazard rates resulting from the
standard and pull-forward methods cannot, in general, be expressed in closed form. In order to
calculate these rates, the trapezoid method was used to estimate the component integrals. Each
integral was estimated multiple times, each time increasing the number of trapezoids used to
derive the estimate, until the difference between consecutive estimates was less than 0.000001.
All other estimates were calculated directly from the derivations above. Estimates based on the
cutback methods are not included due to the fact that they have been shown to be unbiased regardless of random entry. Note that only ratios of the hazard rates of groups with different underlying rates are investigated. Let λ denote the true hazard rate and let ρ denote the probability that an event is not reported until the following visit. Let w denote the interval between scheduled follow-up visits and let A denote the time of analysis.
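The refinement scheme described above — doubling the number of trapezoids until consecutive estimates agree to within 0.000001 — can be sketched as:

```python
def trapezoid(f, a, b, n):
    """Composite trapezoid rule with n equal subintervals."""
    h = (b - a) / n
    return h * (0.5 * f(a) + sum(f(a + i * h) for i in range(1, n)) + 0.5 * f(b))

def integrate(f, a, b, tol=1e-6):
    """Double the trapezoid count until consecutive estimates differ by < tol."""
    n = 2
    prev = trapezoid(f, a, b, n)
    while True:
        n *= 2
        cur = trapezoid(f, a, b, n)
        if abs(cur - prev) < tol:
            return cur
        prev = cur
```

The component integrands of the standard and pull-forward estimates are smooth, so the doubling sequence converges quickly.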
Table 1
Estimates of the Expectation of the Hazard Rate by Probability of Late Reporting

                           Simultaneous Entry            Uniform Entry
λ = 0.67                   ρ=0      ρ=0.5    ρ=1        ρ=0      ρ=0.5    ρ=1
  Standard Estimate        0.6824   0.6762   0.6700     0.6845   0.6773   0.6700
  Pull-Forward Estimate    0.6700   0.6643   0.6586     0.6700   0.6613   0.6527

λ = 1.00
  Standard Estimate        1.0034   1.0017   1.0000     1.0108   1.0054   1.0000
  Pull-Forward Estimate    1.0000   0.9972   0.9944     1.0000   0.9927   0.9855

Hazard Ratio
  Standard Estimate        0.6801   0.6750   0.6700     0.6772   0.6736   0.6700
  Pull-Forward Estimate    0.6700   0.6662   0.6623     0.6700   0.6662   0.6623
Figure 4. Change in the estimates of the hazard rate as the probability of late reporting increases, assuming uniform enrollment over a three-year enrollment period, a visit interval of six months, analysis at five years, and a true hazard rate of 0.67.
When ρ = 1, no patients report an event after their last visit. The standard method therefore censors all patients at their last visit and so reduces to the unbiased personal cutback estimate. For values of ρ less than 1, the standard estimate takes events after the last visit into account while failing to consider the contribution of patients who maintain their event-free status past their last visit. The smaller ρ is, the greater the number of late events preferentially used to calculate the hazard rate, and the more bias is introduced to the estimate. Because the estimate preferentially considers late events while not considering patients who are event-free at the same time point, the standard estimate overestimates the hazard rate when ρ is less than 1.
When ρ = 0, the assumption made by the pull-forward method that any patients without a reported event maintain their event-free status to the end of the trial is met and the estimate is unbiased. For values of ρ greater than zero, some patients are falsely assumed to be event-free at the end of the trial. The greater the value of ρ, the more patients whose events do not contribute to the estimate, and the more bias is introduced into the pull-forward estimate. Because the estimate falsely assumes that patients are without event up to the end of the trial, the pull-forward estimate underestimates the hazard rate when ρ is greater than zero.
Table 2
Estimates of the Expectation of the Hazard Rate by Follow-Up Interval

                           Simultaneous Entry             Uniform Entry
λ = 0.67                   w=0.25   w=0.5    w=0.75     w=0.25   w=0.5    w=0.75
  Standard Estimate        0.6731   0.6762   0.6794     0.6737   0.6773   0.6807
  Pull-Forward Estimate    0.6676   0.6643   0.6599     0.6660   0.6613   0.6561

λ = 1.00
  Standard Estimate        1.0009   1.0017   1.0026     1.0028   1.0054   1.0078
  Pull-Forward Estimate    0.9989   0.9972   0.9946     0.9968   0.9927   0.9879

Hazard Ratio
  Standard Estimate        0.6725   0.6750   0.6776     0.6718   0.6736   0.6755
  Pull-Forward Estimate    0.6683   0.6662   0.6635     0.6681   0.6662   0.6641
Figure 5. Change in the estimates of the hazard rate as the interval between visits increases, assuming uniform enrollment over a three-year enrollment period, a probability of late-reported events of 1/2, analysis at five years, and a true hazard rate of 0.67.
The longer the interval between visits, the more time that will elapse, on average,
between a patient’s last visit and the time of analysis. The standard estimate will therefore censor
patients earlier resulting in the loss of more survival time contributed by event-free patients.
Therefore, the longer the interval between visits, the more the standard method will overestimate
the hazard rate if the probability of late reporting is not equal to one.
For longer intervals between visits, the pull-forward method will assume more event-free
time for patients who do not report an event after their last visit. When the probability of late
reporting is not zero this causes the pull-forward estimate to falsely assume that some patients
are event free who have had an unreported event. Therefore, the longer the interval between
visits, the more the pull-forward method will underestimate the hazard rate when the probability
of late reporting is not zero.
Table 3
Estimates of the Expectation of the Hazard Rate by Time of Analysis

                           Simultaneous Entry            Uniform Entry
λ = 0.67                   A=4      A=4.5    A=5        A=4      A=4.5    A=5
  Standard Estimate        0.6828   0.6788   0.6762     0.6884   0.6812   0.6773
  Pull-Forward Estimate    0.6585   0.6619   0.6643     0.6496   0.6570   0.6613

λ = 1.00
  Standard Estimate        1.0047   1.0028   1.0017     1.0179   1.0095   1.0054
  Pull-Forward Estimate    0.9923   0.9954   0.9972     0.9775   0.9875   0.9927

Hazard Ratio
  Standard Estimate        0.6795   0.6769   0.6750     0.6763   0.6748   0.6736
  Pull-Forward Estimate    0.6636   0.6650   0.6662     0.6645   0.6653   0.6662
Figure 6. Change in the estimates of the hazard rate as the time of analysis increases assuming
uniform enrollment over a three-year enrollment period, a visit interval of six months, a
probability of late reporting of 1/2, and a true hazard rate of 0.67.
We assume throughout that, if a patient has an event, that event is detected and reported
no later than the next visit and that their true event time is determined. As we’ve seen from the
cutback methods, basing the estimate of the hazard rate on statuses known at the last visit is an
unbiased method. Bias is introduced by imperfect reporting during the last follow-up interval
prior to analysis and the use of a preferential method of collecting data. The longer a trial runs, the smaller the proportion of the total study time that falls in the last interval. Therefore, the proportion of possibly biased information decreases as the length of the trial increases.
Table 4
Estimates of the Expectation of the Hazard Rate by True Hazard Rate

                           Simultaneous Entry            Uniform Entry
                           λ=0.67   λ=1.0    Ratio      λ=0.67   λ=1.0    Ratio
  Standard Estimate        0.6762   1.0017   0.6750     0.6773   1.0054   0.6736
  Pull-Forward Estimate    0.6643   0.9972   0.6662     0.6613   0.9927   0.6662
When the probability of late reporting is 0.5, the visit interval is six months, the trial ends
at five years, and entry is simultaneous, the standard method overestimates the true hazard rate
by 0.93% when the true rate is 0.67 and by 0.17% when the true rate is 1.00. The pull-forward
method underestimates the hazard rate by 0.85% when the true rate is 0.67 and 0.28% when the
true rate is 1.00. When the hazard rate is higher, events occur faster and more information is
accrued over the course of the trial. Thus, the higher the true hazard rate, the less bias there is in
both the standard and the pull-forward estimates. The implication of this is that, under
circumstances resulting in biased estimates, the ratio will also be biased when comparing two
groups with different hazard rates. Using the standard method, the lower rate will be more over-
estimated than the higher rate. Using the pull-forward method, the lower rate will be more under-
estimated than the higher rate. Therefore, when taking the ratio of the lower-rate group to the
higher-rate group, the standard method will over-estimate both the rates and the hazard ratio
while the pull-forward method will under-estimate both the rates and their ratio.
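These directions can be checked numerically from the A = 5, simultaneous-entry column of Table 3. The short sketch below (not part of the original analysis) forms the quotients of the printed rate estimates; they may differ from the tabulated ratios in the last decimal because the printed rates are rounded:

```python
true_ratio = 0.67

# Table 3, simultaneous entry, A = 5
std_ratio = 0.6762 / 1.0017   # standard rate estimates, λ = 0.67 over λ = 1.00
pf_ratio = 0.6643 / 0.9972    # pull-forward rate estimates

# The standard method overestimates the ratio while the
# pull-forward method underestimates it.
print(std_ratio, pf_ratio)    # both close to, and on opposite sides of, 0.67
```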
3.5.1 Interactions Between Parameters
We have thus far investigated the change in the hazard rate estimate that occurs when varying
only a single study parameter. The following plots are presented in order to illustrate the effect
on hazard rate estimates of two parameters varying simultaneously. Plots displayed are based on
calculations under a true hazard rate of 0.67. Plots for a true rate of 1 are excluded as they reflect
the same behavior as those included here. Presented are estimates based on the standard method
of reporting assuming random, uniformly distributed enrollment. Please see the Appendix for
plots of the pull-forward estimate.
Figure 7. Scatter plot of impact of visit window (w) and probability of late reporting (ρ) on estimates of the hazard rate assuming uniform enrollment over a three-year period, an analysis five years after the start of enrollment, and a true hazard rate of 0.67.
Figure 8. Surface plot of impact of visit window (w) and probability of late reporting (ρ) on estimates of the hazard rate assuming uniform enrollment over a three-year period, an analysis five years after the start of enrollment, and a true hazard rate of 0.67.
The standard estimate is always unbiased when the probability of late reporting is equal to 1. As
this probability decreases, the bias in the estimated hazard rate increases. The rate of this
increase is small when the interval between visits is small and it is large when said interval is
large. A small probability of late reporting is tantamount to an increased probability of preferentially reporting events that occur after a patient's last visit. A larger window between visits results in more time cut out of the event-free population, causing these preferentially reported events to carry more weight, thereby compounding the bias introduced by the reporting probability.
Figure 9. Scatter plot of impact of visit window (w) and time of analysis (A) on estimates of the
hazard rate assuming uniform enrollment over a three-year period, a probability of late reporting
of 1/2, and a true hazard rate of 0.67.
Figure 10. Surface plot of impact of visit window (w) and time of analysis (A) on estimates of
the hazard rate assuming uniform enrollment over a three-year period, a probability of late
reporting of 1/2, and a true hazard rate of 0.67.
Here we see a very pronounced bias in the hazard rate when the window between visits is long
and the analysis is performed shortly after the end of the enrollment period. When the interval
between visits is one year and the analysis is done six months after the end of enrollment, it is
possible that there are late-enrolled patients who do not have a follow-up visit before the
analysis. The loss of information increases steeply when some patients are excluded from the
analysis. With this loss of information comes a steep increase in the bias of the rate estimate.
Though the combination of a shorter trial and a longer visit interval will induce some bias, this effect is most pronounced when the analysis occurs less than one interval length after the end of
enrollment. Conducting a final analysis shortly after the end of the enrollment period should be
an unusual occurrence, so the implications of this finding mostly weigh on interim analyses. This
topic, however, is reserved for future work.
Figure 11. Scatter plot of impact of time of analysis (A) and probability of late reporting (ρ) on estimates of the hazard rate assuming uniform enrollment over a three-year period, a visit interval of six months, and a true hazard rate of 0.67.
Figure 12. Surface plot of impact of time of analysis (A) and probability of late reporting (ρ) on estimates of the hazard rate assuming uniform enrollment over a three-year period, a visit interval of six months, and a true hazard rate of 0.67.
Though the standard estimate is always biased when the reporting of events is never delayed, this
bias can be reduced by extending the time of final analysis and thereby collecting more
information, which we can see for the hazards calculated for a six-year trial. The worst bias
comes, predictably, when the trial is short and the probability of delayed reporting is zero. In this
case the hazard rate is over-estimated by more than 10%. Even though the analysis is being done
one visit interval after the end of enrollment, so that no late-enrolling patients are being left out
of the analysis, the short length of the trial so limits the amount of information collected that the
impact of a low late-reporting probability results in a sizable bias.
3.6 Conclusion
The standard method of data collection results in biased estimates of the hazard rates within each
treatment group. The severity of this bias depends on the probability of late-reported or
unreported events, the length of the interval between scheduled visits, the time of analysis, and
possibly other factors not considered in this dissertation. When we assume that a patient’s true
event status is assessed at each visit, the only opportunity for bias is introduced in the time that
elapses between a patient’s last visit and the time of analysis. By eliminating the time periods
during which bias may be introduced we may generate unbiased estimates of the hazard rates.
The personal cutback method does this by ceasing data collection at the last visit prior to
analysis for all patients, even those who report a failure later. The global cutback method does
this by pushing the calendar date of analysis far enough back so as to ensure that the true event
status for all patients is retroactively known. Though these methods result in unbiased estimates,
the loss of information collected late in the study can imply a loss of power, which may be
considered a more egregious fault depending on the situation. Additionally, there may be factors affecting the finite-sample estimates that are not apparent in asymptotic calculations. This
issue will be addressed by way of simulation in Chapter 4.
The standard practice of using data from patients who have late events while censoring
patients who do not report an event at their last visit will result in a biased estimate of the hazard
rates and ratio unless events can only be detected at visits. In this case, this practice reduces to
the personal cutback method and is subject to the same loss of power. If one can make the
additional assumption that all events are detected quickly, even between visits, then the
survival times of those patients who do not report an event can be drawn forward to the time of
analysis. As long as the assumption is correct, this will result in an unbiased estimate which does
not suffer from a loss of power. The degree to which this assumption does not hold is directly
proportional to the amount of bias introduced into the estimate of the hazard rate, holding all
other factors equal.
The bias present in the estimates resulting from the standard and pull-forward methods is
affected by a number of parameters, though, in general, the standard method will overestimate
hazard rates while the pull-forward method will underestimate them. The probability of events
only being detected at visits is one source of bias. It is a measure of the degree to which the
standard method aligns with the personal cutback method. When this probability is high, the
standard and personal cutback methods are similar and bias in the standard method is small.
When it is low, the two methods are dissimilar and the bias is larger. This probability is also a
measure of the degree to which the assumption made by the pull-forward method holds. When it
is low, events are generally detected quickly whether they are between visits or not, so that those
without reported events can be assumed to be event-free. When this probability is high, it is
likely that undetected events are happening between visits, so that it cannot be assumed that
patients who do not report an event have not actually had one. If similar studies have been
previously conducted, it is possible to estimate this probability by the percentage of events that
were not detected at a scheduled visit and this could help investigators gauge how biased the
standard and pull-forward methods may be.
The length of time between scheduled visits defines the size of the window during which
bias can be introduced to hazard rate estimates. The longer this interval is, the more bias there
will be in both the standard and the pull-forward rate estimates. In the standard method, the
longer the interval, the earlier that patients without a reported event are censored and the more
event-free survival time that is lost from the estimate of the rate. In the pull-forward method,
when there are unreported events in the final window, the longer the interval, the more time that
is falsely attributed to patients assumed to be event-free who have actually failed. In general, it is
best to keep the follow-up window as short as is logistically possible, for reasons that go beyond
quality of data. But should shorter intervals throughout the trial be out of the question, one option
may be to plan on an extra visit scheduled close to the time of analysis. If events tend to be
caught only at scheduled follow-up visits, this would be one way to improve the quality of data
used to estimate the hazard rates.
Another factor that affects the bias inherent in rate estimates which stem from the
standard and pull-forward method is the length of a trial. It is easy to see that the longer a trial
runs, the more information that is garnered, and the less bias we would expect to see in our
estimates. As this is a parameter that investigators have control over, it should be noted that this
may be one approach to offset a situation in which bias is to be expected due to a combination of
data collection method and uncontrollable factors such as probability of late reporting and true,
underlying hazard rates. A longer trial run is particularly important if the interval between visits
is logistically impossible to shorten. We can consider the potential bias in the hazard rates to be
proportional to the amount of time in the last follow-up interval of the trial, relative to the trial
length. Given a fixed follow-up interval w, the ratio of interval to trial length (w/A) gets smaller
the longer the trial is run. Therefore trial length, along with visit interval, is another factor which
may impact the bias in rate estimates that investigators have under their control.
A final aspect of trials to consider is the underlying set of true hazard rates in the groups
being observed. When these rates are high, information (in the form of events) is accrued rather
quickly, resulting in rate estimates that have less bias than estimates of lower rates. If the true
rates being estimated are small, this may be offset by running the trial for a longer period of time.
It should also be noted that, because both biased methods are more biased for smaller rates than for larger rates, the bigger the difference in rates being compared, the bigger the bias in the
hazard rate ratio. When true rates are suspected to be very low or very different it is especially
important to run the trial for as long as logistics permit and to keep the visit interval as short as
possible.
Though the magnitudes of the biases explored in this chapter are relatively small, there are
some additional considerations to be taken into account. As illustrated in the preceding section,
factors discussed here may interact with each other to produce a magnitude of bias that is more
than additive. Also, the estimates in this chapter relate to the theoretical, asymptotic behavior of
these statistics. Any single trial represents but one realization of a number of random processes
and may result in much more bias than the findings presented here would suggest.
3.7 Statistical Characteristics of the Pull-Forward Method
Presented here are the pull-forward correlates of the standard method plots presented in section
3.5.1. Uniform enrollment is assumed throughout.
Figure 13. Scatter plot of impact of visit window (w) and probability of late reporting (ρ) on estimates of the hazard rate assuming uniform enrollment over a three-year period, an analysis five years after the start of enrollment, and a true hazard rate of 0.67.
Figure 14. Surface plot of impact of visit window (w) and probability of late reporting (ρ) on estimates of the hazard rate assuming uniform enrollment over a three-year period, an analysis five years after the start of enrollment, and a true hazard rate of 0.67.
Figure 15. Scatter plot of impact of visit window (w) and time of analysis (A) on estimates of
the hazard rate assuming uniform enrollment over a three-year period, a probability of late
reporting of 1/2, and a true hazard rate of 0.67.
Figure 16. Surface plot of impact of visit window (w) and time of analysis (A) on estimates of the hazard rate assuming uniform enrollment over a three-year period, a probability of late reporting of 1/2, and a true hazard rate of 0.67.
Figure 17. Scatter plot of impact of time of analysis (A) and probability of late reporting (ρ) on estimates of the hazard rate assuming uniform enrollment over a three-year period, a visit interval of six months, and a true hazard rate of 0.67.
Figure 18. Surface plot of impact of time of analysis (A) and probability of late reporting (ρ) on estimates of the hazard rate assuming uniform enrollment over a three-year period, a visit interval of six months, and a true hazard rate of 0.67.
Chapter 4. Simulation Results
The behavior of estimators in expectation is an important component of understanding the
characteristics of such estimators in general. However, the performance of estimators in the
finite-sample setting can be used to confirm the expected behavior of estimates or to elucidate
additional factors that may impact estimates and statistical tests. This chapter presents findings
on the bias, power, and size of estimates and statistical tests generated based on the four data
collection procedures.
4.1 Methods
In order to explore the characteristics of these methods and to compare the resulting estimates,
trial data were simulated. The hazard ratio between two treatment groups was estimated for each
data collection procedure. The following will be used to refer to the estimates resulting from
different data collection methods: standard (non-events censored at last visit, events reported any
time), personal cutback (all patients censored at last visit), global cutback (all patients censored
one follow-up interval prior to analysis), and pull-forward (censoring all non-events at the time
of analysis).
All simulations were undertaken using SAS statistical software, version 9.2. Data were
simulated for a clinical trial in which the hazard ratio between two groups was the estimate of
interest. The following parameters were specified for each simulation:
1. Enrollment Period
2. Enrollment-free follow-up
3. Follow-up window
4. Probability of delayed reporting: the probability that an event which occurs between
scheduled visits is not reported immediately; it is assumed that these events are detected
and reported at the next scheduled visit
5. Hazard rate in group 1 (of higher risk)
6. Hazard ratio (comparing the two groups)
7. Number of patients randomized to each group
Patients were entered into the trial uniformly during the enrollment window. Time to event
was assumed to follow an exponential distribution. Patients were assumed to be observed from
entry to the trial until event occurrence or time of analysis (no dropout) with scheduled follow-up
visits occurring at regular intervals throughout follow-up. If a patient experienced an event
before analysis, a Bernoulli random variable with probability as in item 4 above determined
whether this event was immediately reported or reported at the next scheduled visit. Regardless
of when an event was reported, the time that the event took place was assumed to be accurately
ascertained. Note that, should a patient fail between their last follow-up visit and the analysis, if their event report was randomly chosen to be delayed then the event would not contribute to the estimates
calculated at the time of analysis. For each set of trial parameters, 5000 iterations of the trial data
were generated. For each simulation the log hazard ratio was estimated by PROC LIFEREG
specifying an exponential failure time distribution. The mean of these estimates was taken across
all 5000 simulations. Selected results are presented below.
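The original simulations were run in SAS; the generating process and the four censoring rules above can be sketched in Python as follows. This is an illustrative sketch, not the author's code: the function and variable names are hypothetical, and a moment-based rate estimate (events divided by total observation time, the exponential MLE) stands in for PROC LIFEREG.

```python
import numpy as np

def simulate_group(n, lam, enroll_len, analysis, w, p_delay, rng):
    """Simulate one treatment arm and estimate its hazard rate under each
    of the four data collection procedures (rate = events / total time)."""
    entry = rng.uniform(0.0, enroll_len, n)     # uniform enrollment (calendar time)
    x = rng.exponential(1.0 / lam, n)           # true event times (patient time)
    delayed = rng.random(n) < p_delay           # Bernoulli delayed-reporting indicator
    follow = analysis - entry                   # patient time available at analysis
    last_visit = np.floor(follow / w) * w       # last scheduled visit before analysis

    # An event is known at analysis if it occurred by the last visit (detected
    # at a visit at the latest) or occurred afterwards but was reported at once.
    before_analysis = x < follow
    reported = before_analysis & ((x <= last_visit) | ~delayed)

    est = {}

    # Standard: reported events at their true time; everyone else censored at last visit.
    t = np.where(reported, x, last_visit)
    est["standard"] = reported.sum() / t.sum()

    # Personal cutback: all patients censored at their last visit.
    ev = x <= last_visit
    est["personal"] = ev.sum() / np.minimum(x, last_visit).sum()

    # Global cutback: calendar date of analysis pushed back one interval.
    avail = np.maximum(analysis - w - entry, 0.0)
    ev = x <= avail
    est["global"] = ev.sum() / np.minimum(x, avail).sum()

    # Pull-forward: reported events at their true time; everyone else
    # assumed event-free up to the time of analysis.
    t = np.where(reported, x, follow)
    est["pull_forward"] = reported.sum() / t.sum()
    return est

rng = np.random.default_rng(1)
est = simulate_group(n=20000, lam=1.0, enroll_len=3.0, analysis=5.0,
                     w=0.5, p_delay=0.5, rng=rng)
for name, rate in est.items():
    print(f"{name:>12}: {rate:.4f}")   # all near the true rate of 1
```

With a large sample the four estimates land near the true rate, with the standard estimate sitting above the pull-forward estimate, matching the direction of bias derived in Chapter 3.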
4.2 Bias in Estimates of the Hazard Ratio
The effects of three structural trial variables on treatment estimates resulting from the above data
collection methods were of interest: the probability of delayed reporting of an event, the interval
between scheduled follow-up visits, and the amount of time elapsing between the end of
enrollment and the time of analysis. Hazard rates have been chosen which reflect oncology trials
involving malignancies such as central nervous system (CNS) tumors and non-stage-IV breast
cancer. Sample sizes were based on attaining 80% power to detect a hazard ratio of 0.67 at an
analysis time of four years.
4.2.1 Probability of Delayed Event Reporting
The first simulated trial structure enrolled patients for three years then allowed two years of
enrollment-free follow-up before analysis (Figure 1). The hazard rate in the higher-risk group
was 1 and the hazard ratio was 0.67. There were 150 patients randomized to each group and
follow-up visits occurred every three months. The probability of an event not being reported
until the next visit was taken to be 0, 0.1, 0.5, 0.9, and 1. All estimates of the hazard ratio fell
between 0.67 and 0.68. Because I assume that all events, if not reported immediately, are
reported at the next scheduled visit, the bias of the personal cut-back method is not affected by
changes in the probability of this delay. The bias of the global cut-back method is also not
affected by changes in this probability because I assume that all patients will have at least one
visit during the redacted time and this visit would provide the true event status of a patient at the
time point chosen for analysis. The standard method shows a slight bias toward the null when no
events are reported with delay but this bias diminishes as the probability of delay increases.
When events are only reported with delay, all patients are censored at their last visit prior to
analysis. Therefore, as probability of delay increases, the probability that all patients are
censored in the same way also increases. The opposite is true of the pull-forward method. When
all events are immediately reported then all patients without reported events survive to the time
of analysis. This is exactly what this method assumes. But when no event is reported until the
next visit then all patients who experience events between their last visit and the analysis are
mistakenly assumed to survive until analysis without event. The reason that these biases are so
minimal in this particular setting is the choice of hazard rates and follow-up interval. These
particular hazard rates and short interval imply that one can expect relatively few events to occur
in the interval before analysis. Therefore few patients may be preferentially reported to have had
an event in this interval by the standard method. Similarly, few patients may be falsely assumed
to survive until analysis by the pull-forward method. Note that consecutive plots within the same
section have been designed with the same scale for the y-axis to aid in visual comparisons.
Figure 19. Change in the estimates of the hazard ratio as the probability of late reporting
increases assuming uniform enrollment over a period of three years, a visit interval of three
months, 150 patients per group, analysis at five years, and true hazard rates of 0.67 and 1.
The second simulated trial structure enrolled patients for three years and analysis was
undertaken at the end of the enrollment period. Compared to the first simulation, the length of
enrollment-free follow-up, the hazard rate, and the follow-up interval were changed. The hazard
rate in the higher-risk group was 3 and the hazard ratio was 0.67. There were 150 patients
randomized to each group and follow-up visits occurred annually. The probability of an event
not being reported until the next visit was taken to be 0, 0.1, 0.5, 0.9, and 1. The numbers of
reported events in this simulation are lower than those in the previous simulation due to the fact
that analysis is conducted two years earlier. Compared to the previous setting the cutback
methods were slightly more biased toward the null but not appreciably so. This slight change is
most likely due to the information lost by performing analysis at the end of enrollment without
allowing for accrual of survival time among those patients enrolled late in the trial; not enough
analytic time has been allowed to pass to see the true differences in treatments. The biases in the
standard and pull-forward methods are drastically worse. The same trends can be seen in each as
before, with the standard method being most biased toward the null when all events are quickly
reported. The pull-forward method is most biased toward the null when all events are reported
with delay. The severity in the bias in this case results from the high hazard rates and long
interval between visits. Many more patients will fail between their last visit and the analysis
which implies more preferential reporting for the standard method when delay is unlikely. There
will also be more false assumptions of survival to analysis for the pull-forward method when
delay is likely. As can be seen from Figure 20, the method of pulling survival times forward to the time of analysis carries far more bias than the standard method in this setting. This is due to the high-risk nature of the study: because all patients are at such a high risk of failure, the pull-forward method's assumption that they survive without failure until the analysis is unlikely to hold. The standard method's assumption, that patients who have their event between their last visit and the analysis are similar to the remaining patients, is more plausible.
Figure 20. Change in the estimates of the hazard ratio as the probability of late reporting
increases assuming uniform enrollment over a period of three years, a visit interval of one year,
150 patients per group, analysis at three years, and true hazard rates of 2.01 and 3.
4.2.2 Interval Between Visits
It is common practice in clinical trials to schedule follow-up visits at regular intervals, such as
every six months. Two trial designs were simulated to investigate the effect of the length of this
interval on hazard estimates. For the first trial, patients were enrolled for three years and analysis
was conducted two years after that. The hazard rate in the higher-risk group was 1 and the hazard
ratio was 0.67. For each of the two groups 150 patients were randomly assigned. The probability
of an event being reported with delay was fixed at 0.1. The interval between visits varied as
follows: 3 months, 6 months, 9 months, and one year. The higher-risk group experienced a mean
of 143 events across the 5000 simulations and the mean across all simulations of the median
survival times was 0.70 years. The lower-risk group experienced an average of 133 events and
the average of the median survival times was 1.04 years. All estimates of the hazard ratio fell
between 0.67 and 0.69. The estimates generated by the cutback methods are not affected by the
change in the interval between visits. The pull-forward estimate improves very slightly as the
interval increases due to the fact that most events are reported immediately. At no point does the
pull-forward method produce results biased more than 5% toward the null. However, the
standard method does produce estimates that are more biased toward the null and this bias
increases as the interval between visits gets longer. Many patients do survive to the analysis and
have their survival time censored at their last visit. The longer the length between visits the more
observation time is lost for these patients. Coinciding with this is the low probability of a delay
in event reporting, which means that most of the events occurring shortly before analysis are
reported. Therefore, the patients with reported survival times after their last visit are very
different from the patients who are censored at their last visit. The greater this difference, the
greater the bias in the resulting estimate generated by the standard method.
Figure 21. Change in the estimates of the hazard ratio as the window between visits increases
assuming uniform enrollment over a period of three years, a probability of late reporting of 1/10,
150 patients per group, a time of analysis of five years, and true hazard rates of 0.67 and 1.
The second trial simulated enrolled patients for three years and conducted analysis at the end of
that enrollment period. The delayed reporting probability, the hazard rate in the higher-risk
group, and the length of the follow-up interval differed from the first simulation. The hazard rate
in the higher-risk group was 3 and the hazard ratio was 0.67. Each patient group was allotted 150
patients and the probability of an event being reported with delay was 0.9. The higher-risk group
experienced a mean of 133 events across the 5000 simulations and the mean across all
simulations of the median survival times was 0.21 years. The lower-risk group experienced an average of 125 events and the average of the median survival times was 0.30 years. As before, the cutback
estimates of the hazard ratio change negligibly with changes in the follow-up interval. In this
case, the standard reporting method is well-behaved, exhibiting a small amount of bias and
infinitesimal change as the follow-up interval varies. One may expect the standard method to
produce increasingly biased estimates as the follow-up interval increases, for the longer this
interval the more time under observation is missed for those patients who do not report an event.
However, the high probability of delayed event reporting implies that most events in the last such
interval before analysis are not reported. Therefore all patients without event and most patients
with event are all censored at their last visit before analysis. In this way the standard method
approaches the personal cutback method and the bias in the estimate vanishes with increasing
probability of delayed event reporting. By comparison, the pull-forward estimates of the hazard
ratio change greatly with a change in follow-up interval under these conditions, ranging up to
nearly 0.75 when the follow-up interval is one year. In this situation, hazard rates are high and
most events occurring in the last follow-up interval before analysis are not reported. The patients
with unreported events will mistakenly have their survival time drawn forward to the time of
analysis by the pull-forward method; the longer the follow-up interval the more time that is
added to the time under observation for those patients. Missing events in both groups and
allocating the survival time of those failed patients to event-free patients causes the pull-forward
estimate of the hazard ratio to be biased much farther toward the null than any other method
under these circumstances.
Figure 22. Change in the estimates of the hazard ratio as the window between visits increases
assuming uniform enrollment over a period of three years, a probability of late reporting of 9/10,
150 patients per group, a time of analysis of three years, and true hazard rates of 2.01 and 3.
4.2.3 Time Elapsed Between End of Enrollment and Time of Analysis
In order to investigate the impact of the length of follow-up time accrued following the close of
enrollment, treatment groups, each of size 134, were assigned hazard rates of 1 and 0.67. Follow-
up visits were scheduled every six months and the probability of delayed reporting of events was
50%.
Figure 23. Change in the estimates of the hazard ratio as the time of analysis increases assuming
uniform enrollment over a period of three years, a probability of late reporting of 1/2, 134
patients per group, a visit interval of six months, and true hazard rates of 0.67 and 1.
The “Ideal” estimate referred to in the above figure denotes the estimate of the hazard ratio based on the simulated true patient statuses current up to the time of analysis. The cutback methods experience little change in the estimate of the hazard ratio as the post-enrollment follow-up period increases and remain close to the hypothetical ideal estimate. As expected, based on
previous derivations, the standard estimate overestimates the hazard ratio while the pull-forward
method underestimates it. As additional follow-up time is accrued both estimates experience a
reduction in bias. Note that the introduction of bias into the hazard ratio estimates produced by
these two methods occurs only in the final follow-up interval prior to the analysis. If the length
of the follow-up visit interval is kept constant at w and the length of analysis A is increased, then
the proportion of total observation time that is susceptible to bias, w/A, decreases as does the
bias in the estimates.
4.3 Power Analysis
4.3.1 Methods
Data were generated by the same methods that were used for the analysis of bias. Exponential
regression models were fit using PROC LIFEREG and log-rank statistics were generated using
PROC LIFETEST. Five thousand repetitions were run for each parameter set-up and p-values for
the exponential regression and the log-rank tests were generated. Power was calculated as the
percent of repetitions resulting in a p-value of less than 0.05. Designed power calculations were
performed using PASS version 12 statistical software. All other analyses were conducted with
SAS version 9.2.
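The mechanics of this calculation can be sketched compactly. The fragment below is an illustrative Python reimplementation, not the SAS/PASS code used for these analyses, and all function and parameter names are this sketch's own. It simulates exponential survival times under uniform enrollment, censors them at the analysis time, and estimates power as the fraction of repetitions in which the log-rank test rejects at the 0.05 level:

```python
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(1)

def logrank_chisq(time, event, group):
    """Two-sample log-rank chi-square statistic."""
    num, var = 0.0, 0.0
    for t in np.unique(time[event == 1]):
        at_risk = time >= t                           # still under observation just before t
        n, n1 = at_risk.sum(), (at_risk & (group == 1)).sum()
        d = ((time == t) & (event == 1)).sum()        # events at t, both groups combined
        d1 = ((time == t) & (event == 1) & (group == 1)).sum()
        num += d1 - d * n1 / n                        # observed minus expected in group 1
        if n > 1:
            var += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    return num ** 2 / var

def simulate_power(n_per_group=110, hazards=(1.0, 0.67), enroll_years=3.0,
                   analysis_time=5.0, reps=300):
    """Fraction of simulated trials in which the log-rank test rejects at 0.05."""
    crit = chi2.ppf(0.95, df=1)
    rejections = 0
    for _ in range(reps):
        group = np.repeat([0, 1], n_per_group)
        enroll = rng.uniform(0.0, enroll_years, 2 * n_per_group)
        surv = rng.exponential(1.0 / np.array(hazards)[group])
        follow = analysis_time - enroll               # follow-up available at the analysis
        time = np.minimum(surv, follow)               # censor at the analysis time
        event = (surv <= follow).astype(int)
        rejections += logrank_chisq(time, event, group) > crit
    return rejections / reps

print(simulate_power())   # close to the designed 80% power
```

Note that this sketch uses complete ("ideal") patient data; the four data collection methods would first transform `time` and `event` before the test statistic is computed.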
4.3.2 Probability of Late Reporting
Figure 24. Change in the estimates of the power of the log-rank test as the probability of late
reporting increases assuming uniform enrollment over a period of three years, a time of analysis
of five years, 110 patients per group, a visit interval of six months, and true hazard rates of 0.67
and 1.
To generate data, enrollment was assumed to follow a uniform distribution over an enrollment
period of three years. The survival times of members of the two treatment groups were assumed
to follow an exponential distribution with rate parameters equal to 1 and 0.67. Follow-up visits
were set for every six months and the analysis time was five years. A sample size of 110 in each
group was used to design the trial such that the log-rank test reached 80% power to detect a
hazard ratio of 0.67 at the planned analysis time point. The probability of late-reported events
ranged from zero to one by intervals of 0.1.
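To make the late-reporting mechanism concrete, the sketch below (Python; a hypothetical helper, not part of the simulation code described above) maps one patient's true event time to the observed time and status, assuming an event surfaces at the first scheduled visit after it occurs unless reporting is delayed:

```python
import numpy as np

rng = np.random.default_rng(2)

def observed_record(enroll, true_event, analysis_time=5.0, w=0.5, p_late=0.5):
    """Observed (time, status) for one patient when events may be reported late.

    Times are measured from enrollment. Follow-up visits occur every `w` years;
    an event is reported at the first visit after it occurs unless reporting is
    delayed (probability `p_late`) or the event falls after the last visit, in
    which case the patient still appears event-free at the analysis."""
    follow = analysis_time - enroll                # follow-up available at the analysis
    last_visit = float(w * np.floor(follow / w))   # last scheduled visit before analysis
    if true_event > follow:                        # no event before the analysis date
        return follow, 0
    next_visit = w * np.ceil(true_event / w)       # visit at which the event would surface
    if next_visit <= follow and rng.random() > p_late:
        return true_event, 1                       # reported in time for the analysis
    return last_visit, 0                           # unreported: appears event-free at last visit

print(observed_record(1.0, 3.3, p_late=1.0))   # (4.0, 0): the event goes unreported
```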
For all four methods, the log-rank test had comparable power and little variation as the probability of late reporting varied. At 78.84%, the global cutback method had slightly less power than the other approaches, as expected given that it bases its estimates on less information than the others. The pull-forward method lost a small amount of power at higher probabilities of late reporting, reaching a power of 79.36% when all events were reported at the visit following their occurrence. The standard and personal cutback methods maintained similar power, with the personal cutback achieving 79.72% power throughout and the standard method not falling below 79.62%.
Figure 25. Change in the estimates of the power of the exponential regression test for treatment
effect as the probability of late reporting increases assuming uniform enrollment over a period of
three years, a time of analysis of five years, 110 patients per group, a visit interval of six months,
and true hazard rates of 0.67 and 1.
Because the log-rank test does not rely on the underlying distribution of survival times while the
exponential regression takes advantage of the true underlying distributions, the power of the
regression test for treatment effect is greater than the power of the log-rank test across the board.
As with the log-rank test, the exponential regression test had the smallest power when the global
cutback method was used (79.26%). The personal cutback method achieved a power of 80.34%.
The standard method drifted between the two cutback methods, starting at the lower, global
cutback power when late reporting did not happen and drifting up to the personal cutback power
when all events were reported with delay. As previously mentioned, when all events are reported
at the visit following their occurrence, the standard method reduces to the personal cutback
method as all patients who are event-free at their last visit are then censored at that time. Even
though the standard method uses more data than the global cutback method, when all events are
reported quickly the two methods have the same power. The power of the pull-forward method
increases as the probability of late reporting increases, ranging from 80.74% to 82.10%. When
all events are reported with delay, all patients who experience an event between their last visit
and the analysis are assumed by the pull-forward method to survive event-free to the time of
analysis. All patients event-free at their last visit will be assumed to survive up to the time of
analysis. As the better treatment can be expected to have more patients event-free at that time
than the inferior treatment, the better treatment will garner more survival time and will appear
even better than it is. Thus, as the probability of late reporting increases, the pull-forward method rejects the null hypothesis more often.
To further investigate the performance of these methods under the exponential regression
setting, 95% confidence intervals for the hazard ratio from each simulation were checked to see
if they contained the true ratio of 0.67. These coverage probabilities are presented in Figure 26.
The “Ideal” series denotes the use of simulated patient data that is accurate up to the time of
analysis. The pull-forward method produced confidence intervals that contained the true ratio
less than 95% of the time when the probability of late reporting was greater than 40%. As this
coincides with an increased probability of rejecting the null hypothesis, we can deduce that this
reflects an estimation of the hazard ratio that is biased away from the null. To examine the
spread of ratio estimates over simulations, box plots of estimates are presented in Figure 27.
Three values of late reporting probability were selected, data were aggregated, and the inter-
quartile ranges of ratio estimates were found for each method. Ratio estimates that fell outside of
the IQR were deleted and the remaining data were used to generate the box and whisker plot
below. This was done in order to produce a plot with a scale that aided visual interpretation;
including all values in the range of ratio estimates would obscure any differences between
methods. All methods have similar inter-quartile ranges, though pull-forward estimates tend to
be lower and standard estimates tend to be higher.
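Under the exponential model the interval being checked has a simple closed form, which makes the coverage calculation easy to sketch. The Python fragment below is an illustration under simplifying assumptions (fixed administrative censoring in place of uniform enrollment, and names of this sketch's own choosing) rather than a replication of these simulations:

```python
import numpy as np

rng = np.random.default_rng(3)

def hr_ci(time0, event0, time1, event1):
    """95% Wald confidence interval for the hazard ratio, exponential model.

    Each rate's MLE is events / total follow-up time; the variance of the
    log hazard ratio is approximately 1/d0 + 1/d1."""
    d0, d1 = event0.sum(), event1.sum()
    log_hr = np.log((d1 / time1.sum()) / (d0 / time0.sum()))
    se = np.sqrt(1.0 / d0 + 1.0 / d1)
    return np.exp(log_hr - 1.96 * se), np.exp(log_hr + 1.96 * se)

def coverage(true_hr=0.67, n=110, follow=3.5, reps=1000):
    """Fraction of simulated trials whose interval contains the true ratio."""
    hits = 0
    for _ in range(reps):
        s0 = rng.exponential(1.0, n)             # control group, hazard 1
        s1 = rng.exponential(1.0 / true_hr, n)   # experimental group, hazard 0.67
        lo, hi = hr_ci(np.minimum(s0, follow), (s0 <= follow).astype(int),
                       np.minimum(s1, follow), (s1 <= follow).astype(int))
        hits += lo < true_hr < hi
    return hits / reps

print(coverage())   # close to the nominal 0.95
```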
Figure 26. For each probability of late reporting, the percent of the 5000 simulations for which
exponential regression produced a confidence interval for the hazard ratio which contained the
true ratio of 0.67.
Figure 27. Box and whisker plots for ratio estimates within the overall inter-quartile range for
selected probabilities of late reporting.
4.3.3 Length of Visit Interval
Figure 28. Change in the estimates of the power of the log-rank test as the length of the interval
between visits increases assuming uniform enrollment over a period of three years, a time of
analysis of five years, 110 patients per group, a probability of late reporting of 1/2, and true
hazard rates of 0.67 and 1.
All study parameters were set equal to those used in the above analysis with the exception that
the probability of late reporting was held at 50%. The window between scheduled follow-up
visits ranged from three months to one year in three-month intervals. There were 110 patients
assigned to each treatment group for a designed power of 80% for the log-rank test.
The global cutback method suffered from a notable loss of power as the follow-up interval
increased. For the three-month follow-up interval, this method resulted in a log-rank test with 79.82% power, but this rate decreased steadily as the length of the interval increased, reaching a low of
75.86% power when this interval was one year long, reflecting the loss of information resulting
from disregarding a full year’s worth of data. Though all methods saw a decline in power as the
follow-up interval increased, the other methods stayed within 1.5% of the simulated power of
80%.
Figure 29. Change in the estimates of the power of the exponential regression test for treatment
effect as the length of the interval between visits increases assuming uniform enrollment over a
period of three years, a time of analysis of five years, 110 patients per group, a probability of late
reporting of 1/2, and true hazard rates of 0.67 and 1.
The exponential regression setting again differentiates the methods a bit more than does the log-
rank test. The global cutback method has the lowest power of the methods throughout the range
of the follow-up interval. The power of this method ranges from a very acceptable 80.20% for
the three-month interval to 75.94% when this interval reaches one year in length. The method
producing the second-lowest power of the test for treatment effect is the standard method, though
this method falls only slightly short of the personal cutback method by this metric. Both the
standard and the personal cutback methods experience a loss of power with an increase of the
interval length, starting at 80.30% and 80.48% respectively for the three-month interval, and
falling to 78.50% and 78.96% for the one-year interval. The pull-forward method consistently
rejects the null hypothesis more often than any other method, with a maximum power of 81.44%
for the six-month long interval and a minimum of 81.18% when the interval is one year long.
Coverage probabilities are presented in Figure 30. The pull-forward method produced
confidence intervals that contained the true ratio less than 95% of the time when the follow-up
interval was six months long or more. The cutback methods undergo a small but unpredictable
variation in coverage probabilities. Box plots of hazard ratio estimates within the IQR of each
method are presented in Figure 31. All methods have similar inter-quartile ranges, though pull-
forward estimates tend to be lower and standard estimates tend to be higher.
Figure 30. For each length of follow-up interval, the percent of the 5000 simulations for which
exponential regression produced a confidence interval for the hazard ratio which contained the
true ratio of 0.67.
Figure 31. Box and whisker plots for ratio estimates within the overall inter-quartile range for
selected lengths of follow-up interval.
4.3.4 Time of Analysis
Figure 32. Change in the estimates of the power of the log-rank test as the length of post-
enrollment follow-up increases assuming uniform enrollment over a period of three years, 134
patients per group, a probability of late reporting of 1/2, an interval between visits of six months,
and true hazard rates of 0.67 and 1.
A late reporting probability of 0.5 and a follow-up interval of six months were used to generate
the data for this analysis. The time of analysis varied from 3.25 to 6 years in three-month
intervals. It should be noted that, with 134 patients in each treatment group, this study was
designed to reach 84.76% power at the 4-year mark. Data are presented for early analysis points
to illustrate the difference between the four collection methods and to provide evidence of a
trend in the power as more follow-up time is accrued.
The power of the log-rank test predictably increases as the follow-up period increases,
illustrating the additional information available on which to base these statistical tests. By the
five-year mark, all methods achieve at least 86% power and when follow-up is extended to six
years, all methods attain a power of approximately 89%. The standard method and the pull-forward method are nearly identical throughout with the personal cutback method falling only
slightly short of the power of these two methods, though this difference disappears by the five-
year mark. The standard and pull-forward methods attain 74.80% and 74.42% respectively when
the analysis is performed at 3.25 years and 83.90% and 83.60% when the analysis is performed
at 4 years. The personal cutback method achieves 73.02% power when analysis is performed at
3.25 years and increases to 83.30% by the 4-year point. The global cutback method is again the
method that results in the lowest power for the log-rank test for all analytic time-points. When
analysis is performed only three months after the end of enrollment, the log-rank test has 67.84%
power. At one year after the end of enrollment, this increases to 80.98%.
Figure 33. Change in the estimates of the power of the exponential regression test for treatment
effect as the length of post-enrollment follow-up increases assuming uniform enrollment over a
period of three years, 134 patients per group, a probability of late reporting of 1/2, an interval
between visits of six months, and true hazard rates of 0.67 and 1.
The behavior of the test for treatment effect in the exponential regression setting is quite similar to that of the log-rank test when examining changes in power as the time of analysis varies. Though this
test attains slightly more power in this setting and there is slightly more separation of the
methods, the range of power reflects both the behavior and range of the power of the log-rank
test under the same circumstances. At the planned analysis time point, the methods rank in terms
of smallest power to largest as follows: global cutback (81.02%), standard (83.62%), personal cutback (83.72%), pull-forward (85.52%). As with the log-rank test, the global cutback attains
markedly less power than the other methods, the standard and personal cutback methods retain
their similarity, and the pull-forward method achieves slightly more power than the other
methods.
Coverage probabilities are presented in Figure 34. As more post-enrollment follow-up
time is accrued, all methods converge to a 95% coverage probability. Box plots of hazard ratio
estimates within the IQR of each method are presented in Figure 35. All methods experience a
reduction in the size of the IQR as analysis occurs later and later. For early analysis times, the
global cutback method has a larger IQR than the other methods, though this difference is
attenuated for later analysis times.
Figure 34. For each time of analysis, the percent of the 5000 simulations for which exponential
regression produced a confidence interval for the hazard ratio which contained the true ratio of
0.67.
Figure 35. Box and whisker plots of estimates of the hazard ratio within the overall inter-quartile
range for selected times of analysis.
4.3.5 Conclusion
The probability of late event reporting only appears to affect the power of the test for treatment
effect in the exponential regression setting. As this probability increases, both the standard and
the pull-forward methods reject the null hypothesis more often. When no events are reported
after patients’ last visit, all patients event-free at that time are assumed to maintain their status to
analysis time. As the better treatment will have more event-free patients, this treatment will
garner more survival time resulting in an estimate of relative risk being biased away from the
null, which will cause the test for treatment effect to reject the null hypothesis more often
regardless of the true difference in survival rates between the two treatment groups.
The tests for treatment effect generally have less power the longer the interval between
scheduled visits. All methods except the pull-forward method experience some loss of
information related to the length of the follow-up window. The longer this interval, the earlier
the global cutback analysis point, the earlier the personal cutback analysis censors all patients,
and the earlier the standard method censors people who lack a reported event. In each case, the
longer the interval the more information that is left out of the analysis. Because of the
assumption that all patients without a reported event maintain this status up to analysis, the pull-
forward method can be said not to lose information regardless of the length of the follow-up
window when this assumption is met. However, when this assumption is not met, the longer the
window the more time that is mistakenly allotted to patients with unreported events.
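These four censoring rules can be summarized in code. The sketch below is one plausible Python encoding of the rules as described in this chapter, with illustrative argument names; in particular, the global cutback date is assumed here to be the analysis date pulled back by one visit window. It maps one patient's reported data to the (time, status) pair each method would analyze:

```python
def four_methods(enroll, reported_event, analysis_time=5.0, w=0.5):
    """(time, status) under each data collection method for one patient.

    `reported_event` is the event time (measured from enrollment) if an event
    was reported before the analysis, else None for a patient who appears
    event-free as of the last visit."""
    follow = analysis_time - enroll
    last_visit = w * (follow // w)            # patient's last scheduled visit
    global_cut = analysis_time - w - enroll   # assumed cutback date: analysis pulled back one window
    t, e = (reported_event, 1) if reported_event is not None else (last_visit, 0)
    return {
        "standard":         (t, e),                                        # use reported data as-is
        "personal_cutback": (min(t, last_visit), e * (t <= last_visit)),   # censor at the last visit
        "global_cutback":   (min(t, global_cut), e * (t <= global_cut)),   # censor at the cutback date
        "pull_forward":     (t, e) if e else (follow, 0),                  # assume event-free to analysis
    }

print(four_methods(1.0, 3.7))
# {'standard': (3.7, 1), 'personal_cutback': (3.7, 1), 'global_cutback': (3.5, 0), 'pull_forward': (3.7, 1)}
```

In the example, an event at 3.7 years is counted by every method except the global cutback, whose analysis date (3.5 years of follow-up for this patient) falls before the event.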
The longer patients are observed, the more information is collected and the more often both the log-rank test and the exponential regression test come to the correct conclusion.
The result is an increased power to detect the true treatment effect. Although none of the log-
rank tests quite reached the designed power for an analysis occurring at four years from the
inception of the trial, this shortcoming is always ameliorated by simply accruing more follow-up
time.
The global cutback universally has the lowest power as this is the method that loses the
most information. Based on the performance of tests based on this method, it is inadvisable to
use this approach to make inferential tests of treatment effect as such tests may fail to detect a
difference when one truly exists. Despite the fact that it also disregards some information, the
power of the tests resulting from personal cutback method is similar to that of the standard
method. The pull-forward method consistently attains the highest power but this is sometimes
due to the bias in the relative hazard estimate and may reflect a tendency to reject the null
hypothesis in any situation rather than the tendency to come to the correct conclusion when
events are reported with delay.
4.4 Test Size
4.4.1 Methods
Data were generated under the same parameters used for the power analyses with the exception
that the hazard rate for both groups was set to 1. As before, log-rank tests were performed using
PROC LIFETEST and exponential regression was conducted using PROC LIFEREG in SAS
version 9.2. Size was defined to be the percentage of tests that rejected the null hypothesis of no
treatment effect. In the plots that follow, the “Ideal” method denotes the percentage of tests that
reject the null when perfect information is used up to the time of analysis.
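The size calculation mirrors the power calculation with both hazards equal. As an illustration under simplified assumptions (fixed censoring, and a Wald test based on the exponential MLEs rather than PROC LIFEREG), the empirical size can be estimated as:

```python
import numpy as np

rng = np.random.default_rng(4)

def wald_reject(t0, e0, t1, e1):
    """Wald test of equal hazards under the exponential model."""
    d0, d1 = e0.sum(), e1.sum()
    z = np.log((d1 / t1.sum()) / (d0 / t0.sum())) / np.sqrt(1.0 / d0 + 1.0 / d1)
    return abs(z) > 1.96

def empirical_size(n=110, follow=3.5, reps=1000):
    """Rejection rate when both groups truly have hazard rate 1."""
    rejections = 0
    for _ in range(reps):
        s0, s1 = rng.exponential(1.0, n), rng.exponential(1.0, n)
        rejections += wald_reject(np.minimum(s0, follow), (s0 <= follow).astype(int),
                                  np.minimum(s1, follow), (s1 <= follow).astype(int))
    return rejections / reps

print(empirical_size())   # close to the nominal 0.05
```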
4.4.2 Probability of Late Reporting
Figure 36. Change in the estimates of the size of the log-rank test as the probability of late
reporting increases assuming uniform enrollment over a period of three years, 110 patients per
group, a visit interval of six months, analysis at five years, and a true hazard rate of 1 in both groups.
Enrollment was assumed to follow a uniform distribution over the three years during which patients were enrolled. The follow-up window was six months and analysis was conducted five years after the
inception of the trial. When all true patient statuses and survival times are used to generate the
log-rank test for equality of survival functions between the two treatment groups, the null
hypothesis is rejected 4.62% of the time. The global and personal cutback methods, not being
affected by the probability of late reporting, maintain a size of 4.82% and 4.92% respectively.
When no events are reported with delay, the assumption of the pull-forward method is met and
the resulting log-rank test has the same size as the hypothetical situation in which all true patient
statuses are known. As the late-reporting probability increases, the size of the pull-forward method
increases, reaching the same size as the global cutback method when all events are reported with
delay. The standard method behaves a bit unpredictably, though it tends to stay on the higher end
of the scale. The size of the test resulting from the standard method achieves a maximum of
5.02% when the probability of late reporting is 10% and 30%. It achieves its lowest size of
4.80% when late reporting happens 60% of the time.
Figure 37. Change in the estimates of the size of the exponential regression test for treatment
effect as the probability of late reporting increases assuming uniform enrollment over a period of
three years, 110 patients per group, a visit interval of six months, analysis at five years, and a true hazard rate of 1 in both groups.
When perfect information is used, the test for treatment effect in the exponential regression
setting rejects the null hypothesis 4.76% of the time. The global and personal cutback methods
have size 4.75% and 4.79% respectively. The size of the standard method test tends to increase,
starting at 4.62% when event reporting is not delayed and climbing to 4.79% when all events are
reported at the visit after they occur. As seen in the power analysis, the pull-forward method
tends to reject the null more as the probability of late reporting increases. When events are not
reported with delay, the pull-forward method test has a size of 4.76% and when all events are
reported with delay the size is 5.23%, higher than the maximum size of any other method.
To further investigate the performance of these methods under the exponential regression
setting, 95% confidence intervals for the hazard ratio from each simulation were checked to see
if they contained the true ratio of 1. These coverage probabilities are presented in Figure 38. The
“Ideal” series denotes the use of simulated patient data that is accurate up to the time of analysis.
The pull-forward method produced confidence intervals that contained the true ratio less than
95% of the time when the probability of late reporting was greater than 60%. As this coincides
with an increased probability of rejecting the null hypothesis, we can deduce that this reflects an
estimation of the hazard ratio that is biased away from the null. To examine the spread of ratio
estimates over simulations, box plots of estimates are presented in Figure 39. Three values of late
reporting probability were selected, data were aggregated, and the inter-quartile ranges of ratio
estimates were found for each method. Ratio estimates that fell outside of the IQR were deleted
and the remaining data were used to generate the box and whisker plot below. This was done in
order to produce a plot with a scale that aided visual interpretation; including all values in the
range of ratio estimates would obscure any differences between methods. All methods have
similar inter-quartile ranges in terms of both spread and location.
Figure 38. For each probability of late reporting, the percent of the 5000 simulations for which
exponential regression produced a confidence interval for the hazard ratio which contained the
true ratio of 1.
Figure 39. Box and whisker plots of the hazard ratio estimates within the overall inter-quartile
range for selected probabilities of late reporting.
4.4.3 Interval Between Follow-Up Visits
Figure 40. Change in the estimates of the size of the log-rank test as the interval between follow-up visits increases assuming uniform enrollment over a period of three years, 110 patients per group, a probability of late reporting of 1/2, analysis at five years, and a true hazard rate of 1 in both groups.
Using perfect simulated information, the log-rank test rejects the null hypothesis 4.62% of the
time. As the length of the window between visits increases, the pull-forward method rejects the
null more often, starting at 4.68% for a three-month window and increasing to 4.78% for a one-
year window. The remaining methods tend to increase in size as the window goes from three to
six months in length but decrease in size as the window goes from nine months to one year. The
personal cutback and standard methods are similar in size throughout. Both achieve a maximum when the follow-up interval is six months in length: 4.92% and 4.94% for the personal cutback
and standard methods respectively. Both achieve a minimum when this interval is one year long:
4.78% for the personal cutback and 4.72% for the standard. The global method appears most
unpredictable, starting at 4.76% for the three-month interval, spiking to 5.12% for the nine-month
interval, then ending at 4.86% when the interval is one year long.
Figure 41. Change in the estimates of the size of the exponential regression test for treatment
effect as the interval between follow-up visits increases assuming uniform enrollment over a
period of three years, 110 patients per group, a probability of late reporting of 1/2, analysis at
five years, and a true hazard rate of 1 in both groups.
The test for treatment effect in the exponential regression setting that uses perfect information in
this case has a size of 4.76%. The increase in size with the pull-forward method is much more
pronounced in the regression test than it is in the log-rank. For the three-month visit interval, the
size of the test for treatment effect is 4.87% and this increases steadily to 5.45% when the visit
interval is one year long. The global cutback method generally tends to increase in size with an
increase in the visit interval length. This method reaches a minimum size of 4.75% for an
interval of six months and a maximum of 4.92% for an interval of one year. The personal
cutback method does not follow a trend as the length of the follow-up interval increases and
demonstrates negligible variation. The size of this test is smallest at 4.79% for intervals of length
six months and one year and largest at 4.85% for an interval of three months. The standard
method also does not follow a monotonic pattern but does seem to decrease at the higher end
of interval lengths, dropping to 4.62% when the window between visits is one year.
Coverage probabilities are presented in Figure 42. The pull-forward method produced
confidence intervals that contained the true ratio less than 95% of the time when the follow-up
interval was nine months long or more. Box plots of hazard ratio estimates within the IQR of
each method are presented in Figure 43. All methods have similar inter-quartile ranges in terms
of both spread and location.
Figure 42. For each length of follow-up interval, the percent of the 5000 simulations for which
exponential regression produced a confidence interval for the hazard ratio which contained the
true ratio of 1.
Figure 43. Box and whisker plots of the hazard ratio estimates within the overall inter-quartile
range for selected lengths of follow-up intervals.
4.4.4 Time of Analysis
Figure 44. Change in the estimates of the size of the log-rank test as time of analysis increases
assuming uniform enrollment over a period of three years, 134 patients per group, a probability
of late reporting of 1/2, a visit interval of six months, and a true hazard rate of 1 in both groups.
As the time of analysis varies, the precise values for size tend to be somewhat erratic, even for the test that uses the true data, but all methods result in tests that have a very slight downward trend
as the time of analysis increases. The more time passes before the analysis is conducted, the
more information that is collected, and the more often statistical tests will reach the correct
conclusion.
Figure 45. Change in the estimates of the size of the exponential regression test for treatment
effect as time of analysis increases assuming uniform enrollment over a period of three years,
134 patients per group, a probability of late reporting of 1/2, a visit interval of six months, and
a true hazard rate of 1 in both groups.
Unlike the log-rank test, the exponential regression test for treatment effect differs in size across the data collection methods. The cutback methods behave in a similar
fashion to the ideal test, and tend to decrease slightly with an increase in follow-up time. The
ideal test has the highest size when the analysis is done at 3.5 years, rejecting the null 5.40% of
the time. The personal cutback method achieves its maximum of 5.30% at an analysis time of
four years. The global cutback method is largest at 5.50% when the time of analysis is only three
months following the close of enrollment. The pull-forward method always rejects the null
hypothesis more than the standard threshold of 5% of the time. This method results in tests that
are always of larger size than any other method: 6.93% when analysis is done at 3.25 years and
5.18% when done at six years. The standard method has the second largest size of the methods at
the earliest analysis time and the smallest at the latest; 6.49% at 3.25 years and 4.88% at 6 years.
Coverage probabilities are presented in Figure 46. As more post-enrollment follow-up
time is accrued, all methods converge to a 95% coverage probability. Box plots of hazard ratio
estimates within the IQR of each method are presented in Figure 47. All methods present similar
IQRs in terms of both location and spread.
Figure 46. For each time of analysis, the percent of the 5000 simulations for which exponential
regression produced a confidence interval for the hazard ratio which contained the true ratio of 1.
Figure 47. Box and whisker plots of the hazard ratio estimates within the overall inter-quartile
range for selected times of analysis.
4.5 Conclusion
The probability of delayed event reporting does not have much impact on the test size for
treatment effect with the exception of the pull-forward method. The more likely it is for events to
be reported late, the more often the pull-forward method rejects the null hypothesis. This is the
case here, where the treatment groups really are the same in terms of survival, but it is also the
case when the treatment groups are truly different, as seen in the power analysis. What may seem
like superior performance of this method in terms of power is in fact a reflection of the tendency
of this method to reject the null hypothesis whenever the assumption that patients without a reported event reach the analysis time event-free is likely to fail.
The pull-forward method is also the only method for which test size exhibits a trend with
the lengthening of the interval between scheduled follow-up visits. The variation in size for the
other methods does not follow a clear pattern, but the size of the pull-forward test increases as
the follow-up interval increases. For these calculations, the probability of late reporting was
50%, so the assumption that all events are reported when they happen does not hold. The impact
of this fact is greater the longer the follow-up interval, as longer windows imply more survival
time mistakenly allotted to patients with unreported events. If the effect of improperly allotted
survival time is to cause the pull-forward method to reject the null more often, it follows that a
longer visit interval will result in increased size in this setting. It also bears noting that the global
cutback method results in a log-rank test with a fair amount of variation that has a larger size
than any other method for longer follow-up intervals.
The time of analysis has a notable impact on tests for treatment effect only for the standard and pull-forward methods and only in the exponential regression setting. As in
all cases of exponential regression, the pull-forward method consistently rejects the null
hypothesis more than any other method. Both of these biased methods have large size when
analyses are conducted early and smaller size for later analyses. This is not surprising as both
methods are most biased away from the null when analysis is conducted early.
Neither the standard nor the personal cutback methods suffer from extremes in magnitude
of size nor in amount of variation in size and the two behave in a similar fashion throughout the
size analyses. The global cutback method does tend to vary a bit more throughout the analyses
and does at times achieve the largest size of all methods. The pull-forward method is most
affected by changes in the probability of late reporting, the length of the follow-up interval, and
the time of analysis. This method often rejects the null more than 5% of the time when the null
hypothesis is true. Based on test size, this method is not an advisable one if there is any doubt
that events are reported quickly.
Chapter 5. Application of Data Collection Methods to Clinical Trial Data
Asymptotic calculations provide important information about the theoretical behavior of an
estimate. Though simulations are more informative of the finite-sample tendencies of estimates,
they are still based on data manufactured by making assumptions about the underlying random
variables. In order to investigate the performance of hazard rate and rate ratio estimates
generated by each of the four data collection methods, these procedures were applied to the
analysis data sets of two randomized clinical trials.
5.1 Trials
Each of the four data collection methods was applied to the analysis data of two trials. Both
trials investigated treatments for Ewing’s sarcoma and took as their primary outcome event-free
survival. An event was defined to be disease progression (relapse), a second malignancy, or
death. Survival time was measured from patient enrollment to event or last patient contact.
5.1.1 INT-0091
Opened in December of 1988, NCI protocol INT-0091 compared the standard chemotherapy
treatment at that time (doxorubicin, cyclophosphamide, vincristine, and dactinomycin) to the
same treatment with the addition of ifosfamide and etoposide. Treatment groups will therefore be
denoted IE for the experimental group or non-IE for the group receiving the standard treatment.
A total of 389 patients aged 30 and younger with newly diagnosed, non-metastatic Ewing’s
sarcoma of the bone were randomly assigned to one of the two treatment arms. Enrollment was
closed in November of 1992. While on protocol therapy, patients underwent treatment every
three weeks for a total of 17 courses of chemotherapy. Follow-up visits occurred every six
months from the end of the last course of therapy for five years, then every year thereafter for a
total maximum follow-up time of ten years. For the purposes of the final analysis, data collection
ceased on August 31, 2000.
5.1.2 AEWS0031
The Children’s Oncology Group opened enrollment to the AEWS0031 study in May 2001. This
trial sought to determine whether shortening the interval between chemotherapy treatments
increased event-free survival. A total of 568 patients aged 50 and younger with newly
diagnosed, non-metastatic Ewing’s sarcoma were randomized to one of two treatment arms.
Those on the standard therapy were scheduled to receive treatment every 21 days while those on
the experimental treatment were scheduled to receive it every 14 days. Both arms were to
receive a total of 14 cycles of chemotherapy. Patients were followed every six months from the
last day of the final course of treatment. Patients were enrolled for 4.5 years and the trial was
designed to be analyzed 5.5 years after the first patient was enrolled. However, the final
published analysis was based on data collected up to March 2009, over two years after the
originally planned analysis date.
5.2 Methods
The analysis data set for INT-0091, which had a freeze date of August 31, 2000, was used for
this analysis. The primary publication data set was also used for AEWS0031, which had a final
freeze date of March 31, 2009. The exact visit dates were only available for the period of time
during which patients were undergoing protocol therapy, so post-therapy visit dates had to be
imputed. This was done by assuming that patients complied with the planned visit schedule and
came in for follow-up assessment every six months following the last day of their last course of
treatment. For patients on INT-0091, if a patient completed five years of post-therapy follow-up,
they were assumed to have attended follow-up visits every year on the day of their last visit
during the initial five-year follow-up period. Follow-up was assumed to continue in this fashion
until the patient had an event or reached their recorded date of last contact or the date of analysis.
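The imputation rule described above can be sketched in code as follows. This is a minimal illustration, not the actual imputation program: the function name `impute_visits` and the 183-day and 365-day step sizes are assumptions, since the text does not specify how six-month and one-year intervals were converted to calendar dates.

```python
from datetime import date, timedelta

def impute_visits(therapy_end, stop_date, semiannual_years=5, max_years=10):
    """Impute follow-up visit dates: roughly every six months from the end
    of therapy for `semiannual_years` years, then yearly (anchored at the
    last semiannual visit) up to `max_years` total, stopping at `stop_date`
    (the event date, date of last contact, or analysis date)."""
    visits = []
    t = therapy_end
    # Semiannual visits, approximated as 183-day intervals.
    for _ in range(2 * semiannual_years):
        t = t + timedelta(days=183)
        if t > stop_date:
            return visits
        visits.append(t)
    # Yearly visits thereafter, anchored at the last semiannual visit.
    for _ in range(max_years - semiannual_years):
        t = t + timedelta(days=365)
        if t > stop_date:
            break
        visits.append(t)
    return visits
```

For example, a patient finishing therapy on January 1, 1993 and followed until January 1, 1995 would be assigned three imputed visits under this rule.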
Analyses were done for dates that preceded the freeze dates of the provided data sets. The
analysis dates chosen for INT-0091 were December 8, 1993 and December 8, 1995, one and
two years following the close of enrollment, respectively. The dates chosen for AEWS0031 were
December 7, 2006 and December 7, 2007, representing 5.5 and 6.5 years from the enrollment
of the first patient, respectively. For each analysis, a survival time and event status were
calculated for each patient under each data collection method, based on the analysis date and the
patient's event date, imputed last follow-up visit, and date of last contact. Based on these data,
crude event rates (events per thousand patient-months in the following tables), p-values for the
log-rank test, and Kaplan-Meier plots of the survival curves were generated using Stata version 12.
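As a sketch of how the per-patient survival time and event status might be derived under the four data collection methods, consider the following Python reading of the rules described in this dissertation. The function names are illustrative, and the handling of event-free patients under the global cutback (censoring at the earlier of the last visit and the global cutoff) is an assumption about that method's mechanics, not a transcription of the actual analysis code.

```python
def per_method_times(event_m, last_visit_m, analysis_m, cutoff_m):
    """Return {method: (survival_time_months, event_indicator)} for one
    patient. `event_m` is the reported event time in months since entry,
    or None if no event was reported by the analysis date."""
    had_event = event_m is not None and event_m <= analysis_m
    out = {}
    # Standard: count any reported event; censor event-free patients at
    # their last visit prior to the analysis date.
    out["standard"] = (event_m, 1) if had_event else (last_visit_m, 0)
    # Personal cutback: censor every patient at their own last visit;
    # events occurring after that visit are not counted.
    if had_event and event_m <= last_visit_m:
        out["personal"] = (event_m, 1)
    else:
        out["personal"] = (last_visit_m, 0)
    # Global cutback: move the analysis date back to a common earlier cutoff.
    if had_event and event_m <= cutoff_m:
        out["global"] = (event_m, 1)
    else:
        out["global"] = (min(last_visit_m, cutoff_m), 0)
    # Pull-forward: assume timely reporting, so event-free patients
    # contribute survival time up to the analysis date itself.
    out["pull_forward"] = (event_m, 1) if had_event else (analysis_m, 0)
    return out

def crude_rate(records):
    """Events per 1000 patient-months from (time, status) pairs."""
    total_time = sum(t for t, _ in records)
    events = sum(s for _, s in records)
    return 1000.0 * events / total_time
```

A rate ratio, as reported in the tables below, is then simply the quotient of two such crude rates, one per treatment arm.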
5.3 Results
Table 5. INT-0091 analyzed based on information up to December 1993. Crude rates are
events per 1000 patient-months; the rate ratio is non-IE relative to IE.

Method (arm)         Patient-Months   Events   Crude Rate   Rate Ratio   Log-Rank p-value
Standard                                                    1.71         0.0043
  IE                 5140             50       9.73
  non-IE             4379             73       16.67
Personal Cutback                                            1.73         0.0056
  IE                 5135             47       9.15
  non-IE             4369             69       15.79
Global Cutback                                              1.83         0.0036
  IE                 4741             41       8.65
  non-IE             4032             64       15.87
Pull-Forward                                                1.72         0.0045
  IE                 5821             50       8.59
  non-IE             4942             73       14.77
The primary finding of the INT-0091 study, based on data collected up to August of 2000, was
that the relative risk associated with the standard treatment compared to the experimental was 1.6
with a p-value of 0.005. The first analysis presented here was based on an analysis date of
December 8, 1993, just over one year following the closing of enrollment. Crude rate estimates
were not provided in the publication of the primary findings, but the reported estimate of relative
risk can be compared to the measure of risk provided in the above table as the ratio of crude
rates.
For this particular study, there was a clear treatment effect, and this is reflected in the rate
ratios and log-rank p-values of all methods. All methods slightly over-estimate the relative risk. This
is likely due to the early time of analysis. At this time, the standard treatment has already
suffered from nearly all the events the patients in that category will undergo. A later time point
will have roughly the same number of events but will include the extended survival time of those
patients who remain event-free. If patients receiving the experimental treatment who experience
events do so at a more gradual, constant rate, then the early estimates of relative risk will indicate
greater risk in the standard group than may actually be the case over time.
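A toy calculation, with hypothetical numbers not taken from either trial, illustrates this point: if the standard arm's events all occur in the first year while the experimental arm's events accrue at a constant rate, the crude rate ratio computed after one year overstates the ratio computed after five years.

```python
def arm_patient_years(n, events_by_year, horizon):
    """Approximate patient-years through `horizon` years, counting each
    event as occurring mid-year and survivors as fully followed."""
    total, at_risk = 0.0, n
    for year in range(horizon):
        e = events_by_year[year]
        total += (at_risk - e) + 0.5 * e
        at_risk -= e
    return total

# Hypothetical arms: 100 patients each over a 5-year trial.
std_by_year = [30, 0, 0, 0, 0]   # all standard-arm events in year 1
exp_by_year = [4, 4, 4, 4, 4]    # experimental-arm events spread evenly

ratios = {}
for horizon in (1, 5):
    std_rate = sum(std_by_year[:horizon]) / arm_patient_years(100, std_by_year, horizon)
    exp_rate = sum(exp_by_year[:horizon]) / arm_patient_years(100, exp_by_year, horizon)
    ratios[horizon] = std_rate / exp_rate

print(ratios)  # the year-1 ratio is several times the year-5 ratio
```

The magnitude of the overstatement here is exaggerated by construction, but the direction matches what is seen in the tables: early analyses place more apparent risk on the arm whose events cluster early.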
The crude event rates are highest for the standard method and lowest for the pull-forward
method. This is the relative magnitude we would expect for these two methods given the
asymptotic analyses above. Despite the differences in the individual crude rate estimates, the
rate ratio estimates are similar for those two methods.
The two asymptotically unbiased methods, viz. the personal and global cutback methods,
provide crude rate estimates that fall between the standard and the pull-forward methods. This is
as predicted given the examination of the asymptotic properties above.
The point estimate of the relative hazard rate is greatest for the global cutback method.
As noted above, the global cutback method sacrifices more follow-up time and observed events
than does any other of the data collection methods examined. As such, the individual crude rate
estimates and the hazard ratios will have larger asymptotic variance than those of the other
methods. This may account, in part, for the difference between the ratio estimate for
the global cutback method when compared with the other methodologies. I did not have access
to data sets frozen at other analysis times, which would have allowed the four estimates to be
computed across analysis times so that the variances of the ratios from the four
methods could be examined.
Table 6. INT-0091 analyzed based on information up to December 1995. Crude rates are
events per 1000 patient-months; the rate ratio is non-IE relative to IE.

Method (arm)         Patient-Months   Events   Crude Rate   Rate Ratio   Log-Rank p-value
Standard                                                    1.72         0.0017
  IE                 8469             60       7.08
  non-IE             7126             87       12.21
Personal Cutback                                            1.71         0.0023
  IE                 8466             59       6.97
  non-IE             7124             85       11.93
Global Cutback                                              1.71         0.0023
  IE                 8153             59       7.24
  non-IE             6865             85       12.38
Pull-Forward                                                1.73         0.0018
  IE                 8926             60       6.72
  non-IE             7488             87       11.62
The next analysis performed on the INT-0091 data took as the date of analysis December 8,
1995, two years after the previous analysis date. The calculated event rates are lower than those
of the earlier analysis for both treatment groups for all data collection methods. The log-rank p-
values are smaller for all methods compared to the same tests in the earlier analysis. The
implication easily drawn from this observation is that, at this point, more information has been
collected, which has brought all estimates closer together in addition to reducing the variance
present in the estimates. Despite different rate estimates, the rate ratios for the standard and
personal cutback methods remain nearly unchanged and similar to each other. The rate ratio
resulting from global cutback method, by contrast, has changed drastically, from being the
biggest estimate of the rate ratio to being among the smallest. The pull-forward method
predictably results in the lowest rate estimates but the ratio is nearly identical to that from the
earlier analysis.
Table 7. AEWS0031 analyzed based on information up to December 2006. Crude rates are
events per 1000 patient-months; the rate ratio is standard relative to intensive treatment.

Method (arm)             Patient-Months   Events   Crude Rate   Rate Ratio   Log-Rank p-value
Standard                                                        1.51         0.0176
  Standard Treatment     8412             78       9.27
  Intensive Treatment    8976             55       6.13
Personal Cutback                                                1.51         0.0207
  Standard Treatment     8404             75       8.92
  Intensive Treatment    8974             53       5.91
Global Cutback                                                  1.51         0.0350
  Standard Treatment     7872             67       8.51
  Intensive Treatment    8313             47       5.65
Pull-Forward                                                    1.51         0.0215
  Standard Treatment     9069             78       8.60
  Intensive Treatment    9626             55       5.71
The final analysis of AEWS0031, based on data collected up to March of 2009, showed a
relative risk of 0.74 (p-value = 0.048) associated with the intensive therapy, or 1.35 associated
with the standard treatment. The first analysis date presented here is December 7, 2006, or 5.5
years after the enrollment of the first patient. This was the originally designed end-point of the
trial. Remarkably, though the rate estimates generated by different methods are different, the rate
ratio is the same throughout. Consider, again, the fact that the bias present in the standard and
pull-forward methods is always more pronounced in groups with lower event rates. In this case,
the denominators of the rate ratios in the above table will be more over-estimated for the
standard method and more under-estimated in the pull-forward method. The effect is that the
ratio resulting from the standard method will be smaller than if there were no differential bias,
and the ratio resulting from the pull-forward method will be larger than if there were no
differential bias. Therefore, the effect of this differential bias is to bring the rate ratio estimates
closer to the truth than their rate estimate counterparts. In this particular realization, the result is
for all rate ratios to fall into agreement. But these early estimates of treatment effect are larger
than that found in the final result, similar to what was seen in the INT-0091 analyses above. This
is probably again due to the effect of the difference in the distribution of the event times in the
two treatment groups, with events in the standard treatment group occurring early while events in
the intensive group are more evenly spread over the observation time.
It should be noted, however, that the rate estimates differ when generated by
different methods. As is to be expected, the standard method results in the highest rate estimates.
The personal cutback method generates larger rate estimates than the global method in this
situation. As we lose more information with the global cutback method, this loss is probably
disproportionately borne by the number of events rather than the amount of survival time,
indicating that events are happening between the global cutoff date and patients’ last visits prior
to the analysis date. If the rate of events is more frequent in the interval between the global
cutoff and the date of analysis than in previous times during the trial, this could result in biased
rate estimates if the global cutback method is used. The pull-forward rate estimates fall between
the personal and global cutback estimates. The pull-forward method differs from the
standard method only by the addition of more survival time to those patients who do not report
an event, which then produces rate estimates that are smaller than the standard estimates.
Table 8. AEWS0031 analyzed based on information up to December 2007. Crude rates are
events per 1000 patient-months; the rate ratio is standard relative to intensive treatment.

Method (arm)             Patient-Months   Events   Crude Rate   Rate Ratio   Log-Rank p-value
Standard                                                        1.40         0.0474
  Standard Treatment     10677            86       8.05
  Intensive Treatment    11464            66       5.76
Personal Cutback                                                1.41         0.0459
  Standard Treatment     10674            84       7.87
  Intensive Treatment    11462            64       5.58
Global Cutback                                                  1.43         0.0366
  Standard Treatment     10198            83       8.14
  Intensive Treatment    10873            62       5.70
Pull-Forward                                                    1.39         0.0488
  Standard Treatment     11290            86       7.62
  Intensive Treatment    12035            66       5.48
The second analysis presented allows for an additional year of follow-up past the original
analytic end-point of the trial. Again, though the rate estimates are disparate, the rate ratios are
similar. As expected, the standard method rates are larger than the pull-forward rates and the
personal cutback method produces rates that fall between them. In contrast, the global cutback
method estimates the rate in the intensive group to be only slightly less than the standard
estimate of the same. The global estimate of the rate in the standard group is higher than that
of any other method. This is probably again a reflection of the distribution of event times. The
information that the global method lacks must be time that has elapsed without many events.
This again illustrates the impact of the event time distribution on the global cutback estimates.
All other rate estimates are smaller in this second analysis, so in the interim period events have
not occurred as frequently. This change in the event rates shows that all methods are sensitive to
the distribution of event times. Even random events that are uniformly distributed will manifest
in any single realization as clusters and lapses to some degree and it is clear that this should be a
consideration in the interpretation of estimates of event rates.
5.4 Conclusion
The log-rank test statistic is a function of the number of events, the times at which they occur,
and the size of the risk sets at those times. The two cutback methods will incorporate fewer
events than the standard and pull-forward methods. Because this affects both the
numerator and the denominator of the statistic it is difficult to say how the p-value of the statistic
will behave. Though the standard and pull-forward methods include the same number of events,
the risk sets defined by the pull-forward method will be larger for late events, as they will
include patients the standard method censors at their last visit. The size of the risk set also
impacts both the numerator and denominator of the statistic. The characterization of the behavior
of the log-rank statistic is left for future work.
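For concreteness, the two-sample log-rank construction referred to above can be sketched as follows. This is a generic implementation of the standard statistic, not code from this dissertation: at each distinct event time, the observed number of group-1 events is compared with its expectation under the null given the risk sets, and the contributions are standardized by the accumulated hypergeometric variance.

```python
import math

def logrank_statistic(times, events, groups):
    """Two-sample log-rank statistic Z = U / sqrt(V). At each distinct event
    time t, with n_t at risk (n1_t in group 1) and d_t events (d1_t in group
    1), U accumulates d1_t - d_t * n1_t / n_t and V accumulates the
    hypergeometric variance d_t * p * (1 - p) * (n_t - d_t) / (n_t - 1),
    where p = n1_t / n_t."""
    data = sorted(zip(times, events, groups))
    U = V = 0.0
    i, n = 0, len(data)
    while i < n:
        t = data[i][0]
        at_risk = data[i:]                       # sorted, so all have time >= t
        n_t = len(at_risk)
        n1_t = sum(1 for _, _, g in at_risk if g == 1)
        tied = [rec for rec in at_risk if rec[0] == t]
        d_t = sum(e for _, e, _ in tied)         # events at t, either group
        d1_t = sum(e for _, e, g in tied if g == 1)
        if d_t > 0 and n_t > 1:
            p = n1_t / n_t
            U += d1_t - d_t * p                  # observed minus expected
            V += d_t * p * (1 - p) * (n_t - d_t) / (n_t - 1)
        i += len(tied)
    return U / math.sqrt(V)
```

The sketch makes visible why the data collection methods matter: both the risk sets (the denominator quantities n_t and n1_t) and the event counts (d_t and d1_t) change depending on which censoring rule generated the (time, status) pairs.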
As indicated by the asymptotic calculations, the standard method over-estimates event
rates while the pull-forward method under-estimates them. The personal cutback method
produces rate estimates that fall between the standard and pull-forward rates, as one would
expect an unbiased estimate to behave. The global estimate, however, does not behave
predictably. In the AEWS0031 analyses, the rate estimates are the smallest of the methods in the
early analysis and among the largest in the later analysis. Despite being unbiased in the
theoretical situation of an infinite number of patients, this method is sensitive to the
characteristics of the finite samples to which it is applied.
Despite the differences in the event rate estimates, the rate ratios tend not to diverge
greatly from each other. Although the rates themselves are biased, comparing rates calculated in
the same, biased way does not result in as large a relative bias as is seen in the rate
estimates. The most striking example of this is the early analysis of AEWS0031 in which all rate
ratios are equal to the second decimal place. In particular, the standard and personal cutback
method produce very similar rate ratios throughout all four analyses above. However, all rate
ratios are dependent upon the time at which they are calculated. In many pediatric oncology
trials, earlier periods during the trial tend to see more events. At later points during follow-up,
events become less and less frequent and surviving patients tend to maintain their event-free
status for the remainder of follow-up. When event frequencies peak earlier for the standard
treatment than for the experimental treatment, early estimates of the rate ratio will place more
risk with the standard treatment than later estimates and will be biased away from the null.
In these particular finite samples, the personal cutback method consistently estimates event rates
to be between the over-estimated standard rates and the under-estimated pull-forward rates. The
rate ratio it produces is similar to the rate ratio of the standard method. When event rates within
the treatment groups are of interest, patients who are event-free at their last follow-up prior to
analysis should be censored at that time, regardless of any known subsequent event. This method
produces the most consistently unbiased rate estimates with minimal loss of incorporated
information and without making additional assumptions.
Chapter 6. Concluding Remarks and Future Work
This dissertation has shown that estimates of the hazard rates and the ratio are biased when the
standard approach of data reporting is taken. Using only information about events that occur in
the interval between a patient’s last visit and the time of analysis disregards information from
patients who do not have events in that time frame. Though the magnitude of the bias in the
hazard ratio is not severe, the bias is a definite concern when the hazard rates in the treatment
groups are of interest. The power and size of statistical tests resulting from standard data
reporting are also sensitive to the amount of study time accrued before analysis is conducted.
The simplest way to deal with this bias is to change the way that patient information is
incorporated into estimates and statistical tests. I have presented three alternative data collection
procedures and compared the performance of these methods to each other and to the standard
method of data reporting. Though the global cutback method, which simply sets an earlier
analysis date, may seem the most intuitive approach, it has the poorest performance in terms of
power of any method presented due to the loss of information that occurs with eliminating an
entire calendar period of information. The pull-forward method simply makes the additional
assumption that all events are reported in a timely fashion so that all patients without a reported
event are still event-free at the time of analysis. When this assumption is met, this method is
unbiased, is powerful enough to detect a true difference in treatments, and does not often falsely
conclude a difference exists when one does not. However, when the assumption that it makes is
not met, the pull-forward method can exhibit even more bias than the standard method and tends
to reject the null hypothesis regardless of the true underlying rates. It is therefore my
recommendation that the personal cutback method be used to estimate hazard rates, treatment
effect, and to conduct statistical tests of the difference between survival rates in treatment
groups. This method employs an ignorable censoring mechanism while retaining most of the
information available about the patients under observation.
Another implication of the work presented here is that variables such as probability of
late-reported events, the interval between visits, the time of analysis, and the procedures used to
incorporate data into analyses all impact the characteristics of statistical tests. These factors
must therefore be considered in the statistical design of any clinical trial.
This dissertation has focused on the final analysis of a trial, usually conducted once all
patients have been enrolled and have completed treatment according to the study protocol. The
behavior of the estimates presented here suggests that all methods are sensitive to a reduction in
observed trial time. This would imply that the performance of estimates in terms of bias, power,
and size may be worse when tests are conducted while active enrollment or treatment is ongoing.
Hence, the most important area in which to further investigate methods of data collection is that
of interim monitoring.
Another natural extension of the material presented here is using data collected by these
methods in a Cox regression setting. The probability of an event being reported with delay may
be associated with some patient characteristics, such as stage of disease or proximity to a
reporting institution, which may also be associated with survival. Cox regression would permit
the investigation of the performance of estimators and tests when such individual patient
characteristics are included in a statistical model.
Another subject not addressed here is that of irregular or missing visits. As the cutback
methods presented here rely on the assumption that a follow-up schedule is uniform and adhered
to, these methods would need to be altered to take irregular visits into account. As seen in the
trial data presented in chapter 5, visit schedules may be different while a patient is on a protocol
therapy and after therapy has been completed. Visit schedules may also be different for the
experimental therapy and the standard therapy. Furthermore, the actual adherence to a pre-
determined visit schedule may rely on patient characteristics which may, in turn, be associated
with survival. All of these factors will affect the performance of the methods presented here and
may require additional statistical machinery to produce estimates that minimize bias and provide
statistical tests with desirable properties.
The modification of data reporting is one approach to deal with the incongruity between
the assumptions of standard analysis methods and the way data are actually collected and
incorporated into such analyses. Another approach to consider would be sampling techniques.
Should a patient experience an event after her last scheduled visit prior to the analysis, an
investigator may choose to select a small, random sample of the remaining event-free patients to
bring in for evaluation. In this way, data from event-free patients could be ascertained late in the
trial in addition to data from patients experiencing events.
Finally, it would be possible to construct a likelihood-based model that would allow for
estimation and testing procedures that take into account the uncertainty in patient statuses
towards the end of the trial through specification of a statistical model. The incorporation of the
patient-time contribution from patients who are without reported event at the time of analysis
could be adjusted by the inclusion of additional parameters such as the probability of unreported
events and the interval between scheduled visits.
References
Asgharian, M., M’Lan, C. E., Wolfson, D. B. (2002). Length-biased sampling with right
censoring: an unconditional approach. Journal of the American Statistical Association 97, 201-
209.
Brookmeyer, R., Damiano, A. (1989). Statistical methods for short-term projections of AIDS
incidence. Statistics in Medicine 8, 23-34.
Casper, T. C., Cook, T.D. (2012). Estimation of the mean frequency function for recurrent events
when ascertainment of events is delayed. The International Journal of Biostatistics 8, Issue 1,
Article 4.
Cox, D.R., Hinkley, D.V. (1974). Theoretical Statistics. London: Chapman and Hall.
Dempster, A.P., Laird, N.M., Rubin, D.B. (1977). Maximum likelihood estimation from
incomplete data via the EM algorithm. Journal of the Royal Statistical Society 39, 1-38.
Enanoria, W.T.A., Hubbard, A.E., van der Laan, M.J., Chen, M., Ruiz, J., Colford Jr, J.M.
(2007). Early prediction of median survival among a large AIDS surveillance cohort. BMC
Public Health 7:127.
Fine, J.P., Tsiatis, A.A. (2000). Testing for differences in survival with delayed ascertainment.
Biometrics 56, 145-153.
Grier, H.E., Krailo, M.D., Tarbell, N.J., Link, M.P., Fryer, C.J.H., Pritchard, D.J., Gebhardt,
M.C., Dickman, P.S., Perlman, E.J., Meyers, P.A., Donaldson, S.S., Moore, S., Rausen, A.R.,
Vietti, T.J., Miser, J.S. (2003). Addition of ifosfamide and etoposide to standard chemotherapy
for Ewing’s sarcoma and primitive neuroectodermal tumor of bone. New England Journal of
Medicine 348, 694-701.
Goodman, A.C., Peng, Y., Hankin, J.R., Kalist, D.E., Spurr, S.J. (2004). Estimating episode
lengths when some observations are probably censored. Statistics in Medicine 23, 2071-2087.
Harris, J.E. (1990). Reporting delays and the incidence of AIDS. Journal of the American
Statistical Association 85, 915-924.
Heitjan, D.F., Rubin, D.B. (1991). Ignorability and coarse data. The Annals of Statistics 19,
2244-2253.
Heitjan, D.F. (1993). Ignorability and coarse data: some biomedical examples. Biometrics 49,
1099-1109.
Hesselager, O., Witting, T. (1987). A credibility model with random fluctuations in delay
probabilities for the prediction of IBNR claims. Astin Bulletin 18, 20-26.
Hu, P., Tsiatis, A.A. (1996). Estimating the survival distribution when ascertainment of vital
status is subject to delay. Biometrika 83, 371-380.
Hubbard, A.E., van der Laan, M.J., Enanoria, W., Coldford Jr., J.M. (2000). Nonparametric
survival estimation when death is reported with delay. Lifetime Data Analysis 6, 237-250.
Kalbfleisch, J.D., Lawless, J.F., Robinson, J.A. (1991). Methods for the analysis and prediction
of warranty claims. Technometrics 33, 273-285.
Kalbfleisch, J.D., Lawless, J.F. (1991). Regression models for right truncated data with
applications to AIDS incubation times and reporting lags. Statistica Sinica 1, 19-32.
Kalbfleisch, J.D., Prentice, R.L. (2002). The Statistical Analysis of Failure Time Data, 2nd
edition. Hoboken: John Wiley & Sons, Inc.
Kopperschmidt, K., Stute, W. (2007). The CLT under right censorship and reporting delays.
Journal of Statistical Planning and Inference 137, 1035-1042.
Lawless, J.F. (1994). Adjustments for reporting delays and the prediction of occurred but not
reported events. The Canadian Journal of Statistics 22, 15-31.
Li, K.H., Raghunathan, T.E., Rubin, D.B. (1991). Large-sample significance levels from
multiply imputed data using moment-based statistics and an F reference distribution. Journal of
the American Statistical Association, 86, 1065-1073.
Midthune, D.N., Fay, M.P., Clegg, L.X., Feuer, E.J. (2005). Modeling reporting delays and
reporting corrections in cancer registry data. Journal of the American Statistical Association 100,
61-70.
Pagano, M., Tu, X.M., De Grutolla, V., MaWhinney, S. (1994). Regression analysis of censored
and truncated data: estimating reporting-delay distributions and AIDS incidence from
surveillance data. Biometrics 50, 1203-1214.
Rice. (2009) The expected value of the ratio of correlated random variables. (unpublished note).
Robins, J.M., Rotnitzky, A. (1992). Recovery of information and adjustment for dependent
censoring using surrogate markers. AIDS Epidemiology: Methodological Issues, 297-331.
Robins, J.M. (1993). Information recovery and bias adjustment in proportional hazards
regression analysis of randomized trials using surrogate markers. 1993 Proceedings of the
Biopharmaceutical Section, American Statistical Association, 24-33.
Sun, J., Liao, Q., Chiu, J. (2003). Simple and direct nonparametric estimation of a survival
function in the presence of reporting lags. Journal of Nonparametric Statistics 15, 395-401.
Tu, X.M., Meng, X., Pagano, M. (1993). The AIDS epidemic: estimating survival after AIDS
diagnosis from surveillance data. Journal of the American Statistical Association 88, 26-36.
Turnbull, B.W. (1976). The empirical distribution function with arbitrarily grouped, censored,
and truncated data. Journal of the Royal Statistical Society 38, 290-295.
Van der Laan, M.J., Hubbard, A.E. (1998). Locally efficient estimation of the survival
distribution with right-censored data and covariates when collection of data is delayed.
Biometrika 85, 771-783.
Van der Laan, M.J., McKeague, I.W. (1998). Efficient estimation from right-censored data when
failure indicators are missing at random. The Annals of Statistics 26, 164-182.
Wang, J., Ke, C., Jiang, Q., Zhang, C., Snapinn, S. (2011). Predicting analysis time in event-
driven clinical trials with event-reporting lag. Statistics in Medicine 31, 801-811.
Wang, M.C., Jewell, N.P., Tsai, W.Y. (1986). Asymptotic properties of the product limit
estimate under random truncation. The Annals of Statistics 14, 1597-1605.
Wang, M.C. (1992). The analysis of retrospectively ascertained data in the presence of reporting
delays. Journal of the American Statistical Association 87, 397-406.
Womer, R.B., West, D.C., Krailo, M.D., Dickman, P.S., Pawel, B.R., Grier, H.E., Marcus, K.,
Sailer, S., Healey, J.H., Dormans, J.P., Weiss, A.R. (2012). Randomized controlled trial of
interval-compressed chemotherapy for the treatment of localized Ewing sarcoma: a report from
the Children’s Oncology Group. Journal of Clinical Oncology 30, 4148-4154.
Zhang, J., Heitjan, D.F. (2006). A simple local sensitivity analysis tool for nonignorable
coarsening: application to dependent censoring. Biometrics 62, 1260-1268.
Abstract
In randomized clinical trials, in which time to event is of interest, it is common practice to censor event‐free patients at their last visit prior to analysis while recording survival times of patients with events at any time. I show that this preferential method of censoring can influence estimates of the hazard rates and ratios when analysis is conducted before all patients experience an event, as is usually the case in clinical trials. Three alternate methods of data collection are proposed and asymptotic expressions for hazard rate estimates are derived for all methods both in general and when the underlying survival distribution is assumed to be exponential. The effects of trial length, visit schedule, and probability of reporting delay on estimates of the hazard rates, ratios, power, and test size in an exponential regression setting are established by systematically varying these parameters in simulation studies. The four data collection methods discussed are applied to two randomized clinical trials. A data collection method is proposed based on asymptotic properties and performance in simulated trials. Areas of future work are discussed of which interim monitoring is of particular interest.
Asset Metadata
Creator: McIlvaine, Elisabeth J. (author)
Core Title: The impact of data collection procedures on the analysis of randomized clinical trials
School: Keck School of Medicine
Degree: Doctor of Philosophy
Degree Program: Biostatistics
Publication Date: 06/30/2015
Defense Date: 03/10/2015
Publisher: University of Southern California (original); University of Southern California. Libraries (digital)
Tags: clinical trials; current status and delay; data collection procedures; informative censoring; parametric survival models; preferential reporting; proportional hazards; survival analysis; visit schedule
Format: application/pdf (imt)
Language: English
Contributor: Electronically uploaded by the author (provenance)
Advisor: Krailo, Mark (committee chair); Azen, Stanley P. (committee member); Groshen, Susan L. (committee member); Mack, Wendy Jean (committee member); Stram, Daniel O. (committee member)
Creator Email: ej_mcilvaine@yahoo.com, emcilvai@usc.edu
Permanent Link (DOI): https://doi.org/10.25549/usctheses-c3-584037
Unique Identifier: UC11301076
Legacy Identifier: etd-McIlvaineE-3524.pdf (filename); usctheses-c3-584037 (legacy record id)
Document Type: Dissertation
Rights: McIlvaine, Elisabeth J.
Type: texts
Source: University of Southern California (contributing entity); University of Southern California Dissertations and Theses (collection)
Access Conditions: The author retains rights to his/her dissertation, thesis or other graduate work according to U.S. copyright law. Electronic access is being provided by the USC Libraries in agreement with the a...
Repository Name: University of Southern California Digital Library
Repository Location: USC Digital Library, University of Southern California, University Park Campus MC 2810, 3434 South Grand Avenue, 2nd Floor, Los Angeles, California 90089-2810, USA