The Effect of Delayed Event Reporting on Interim Monitoring Methodologies in
Randomized Clinical Trials
by
Sandy Oliver López Nájera
________________________________________________
A Dissertation Presented to the
FACULTY OF THE GRADUATE SCHOOL
UNIVERSITY OF SOUTHERN CALIFORNIA
In Partial Fulfillment of the
Requirements for the Degree
DOCTOR OF PHILOSOPHY in BIOSTATISTICS
August 2016
Copyright 2016 Sandy Oliver López Nájera
Dedication
To the undocumented student for all your hard work and determination. Your investment of
blood, sweat, and tears toward higher learning will not have been in vain.
Acknowledgements
First, mom and dad, thank you for your unconditional love and support, los quiero mucho. To my
life partner Lucy, thank you for your love and encouragement through the home stretch of
obtaining this degree; I love you, wifey. To my brother Ricardo, thank you for leading by example, and to his wife Ana for her encouragement. To Joshua, thank you for being a constant source of motivation. You are all the source of my strength. I love and cherish you all, mi familia.
Special appreciation for Dr. Mark Krailo. Thank you Dr. Krailo for guiding me through
the most transformative process of my academic career and for accepting coffee as barter for
knowledge and wisdom. A warm thank-you to my committee members whose guidance was key
to better communicating my research.
A warm and sincere muchas gracias to all of the angels who opened their doors to me when
I came knocking for help. Your courage in helping an undocumented student like me share in your
collective resources made all of the difference in the world – thank you for seeing beyond the color
of my skin and my lack of nine little digits.
To my mentors and life coaches in no particular order: Dr. Juan Francisco Lara, Dr. Betty
Uribe, Dr. Armida Ayala, Dr. Shubha Kumar, Paul Riordan, Domenika Lynch, Father Rafael
Luevano, and Dr. Luis Ortiz-Franco. Your impact in my life has made all the difference.
To Dr. Elizabeth McIlvaine, you are a super-Shero. Thank you for being so willing to
share your knowledge with me and for your friendship throughout these years.
Table of Contents

Dedication
Acknowledgements
List of Figures
List of Tables
Abstract
Chapter 1 Introduction
  1.1 Clinical trial and data processing procedures
    1.1.1 Censoring
    1.1.2 Reporting of events and data processing procedure
  1.2 Interim monitoring
    1.2.2 Data Processing Methods
Chapter 2 Literature Review
  2.1 Sequential Analysis: a brief introduction
    2.1.1 Group-sequential methods
    2.1.2 Information Time
  2.2 Brownian Motion and Independent Increments
    2.2.1 Lan and Zucker
    2.2.2 Scharfstein, Tsiatis, and Robins
  2.3 Equal Information Fraction Methods
    2.3.1 The Pocock Method
    2.3.2 The O'Brien and Fleming Method
    2.3.3 The Power Family Method of Wang and Tsiatis
  2.4 The α-Spending Function
    2.4.1 The Lan and DeMets Spending Function
    2.4.2 The Gamma-Family of Hwang, Shih, and De Cani
  2.5 Events Reported With Delay in Survival Analysis
    2.5.1 Hu and Tsiatis
    2.5.2 Van der Laan and Hubbard
    2.5.3 McIlvaine
Chapter 3 Correlation between Increments of the Score Function
  3.1 Assumptions and Censoring
    3.1.1 Data Processing Methods
  3.2 Deriving the Asymptotic Behavior of the Correlation between Increments of the Score Function
    3.2.1 Maximum Likelihood Estimate of the Hazard Ratio
    3.2.2 The Multivariate Delta Method
    3.2.3 Deriving Elements of the Covariance Matrix
    3.2.4 Asymptotic Results
  3.3 Considerations on Trial Design
    3.3.1 Methods
    3.3.2 Results and Conclusion
  3.4 Conclusion
Chapter 4 Simulation Studies
  4.1 Methods
    4.1.1 Data Generation
    4.1.2 Data Analysis
  4.2 Results: Bias in Estimates of the Natural Log of the Hazard Ratio and Other Outcomes of Interest
    4.2.1 Under the Hazard Ratio from Study Design
      4.2.1.1 Effect of delayed event reporting
      4.2.1.2 Effect of follow-up interval length
    4.2.2 Departure of the Hazard Ratio from Study Design
      4.2.2.1 Effect of delayed event reporting
      4.2.2.2 Effect of follow-up interval length
    4.2.3 Probability of a Type I Error under the null hypothesis
      4.2.3.1 Effect of Delayed Reporting and Follow-up Interval Length
    4.2.4 Conclusion
  4.3 Results: Effect of Delayed Reporting and Follow-up Interval Length on the Assumption of Independent Increments
    4.3.1 Investigation of the Assumption of Independent Increments
    4.3.2 Issues with Application of Asymptotic Results in a Finite Sample Setting
    4.3.3 Conclusion
  4.4 Conclusion and Recommendations
Chapter 5 Application to Real Data
  5.1 AEWS0031
  5.2 Methods
  5.3 Results
  5.4 Conclusion
Chapter 6 Conclusions and Future Work
References
Appendix Tables from Simulation Results
  A1. Under the Hazard Ratio from Study Design
  A2. Under Departure of the Hazard Ratio from Study Design
  A3. Under the Null Hypothesis
  A4. Correlation between Increments of the Score Statistic
  A5. Expected Values of Elements in Variance Covariance Matrix for K=3
  A6. Variance Covariance Matrices for K=3
  A7. Variance Covariance Matrices of the Score Statistics for K=3
  A8. Variance Covariance Matrices of the Increments of the Score Statistics for K=3
  A9. Exit Probabilities Under the Null Hypothesis
List of Figures

Figure 1.1. Graphical depiction of survival time under observation for participants in a clinical trial relative to time of analysis. Lines ending in bold points denote observed failure.
Figure 1.2. An example of the true event status of 7 participants in a trial with random enrollment.
Figure 1.3. The effect of applying the standard data processing method on data from figure 1.2.
Figure 1.4. The effect of applying the personal cutback data processing method on data from figure 1.2.
Figure 1.5. The effect of applying the global cutback data processing method on data from figure 1.2.
Figure 1.6. The effect of applying the pull-forward data processing method on data from figure 1.2.
Figure 2.1. Graphs of five α-spending functions.
Figure 3.1. Correlation coefficient of $(S_1, S_2 - S_1)$ under the null hypothesis for the standard method.
Figure 3.2. Correlation coefficient of $(S_2 - S_1, S_3 - S_2)$ under the null hypothesis for the standard method.
Figure 3.3. Correlation coefficient of $(S_1, S_2 - S_1)$ under the null hypothesis for the pull-forward method.
Figure 3.4. Correlation coefficient of $(S_2 - S_1, S_3 - S_2)$ under the null hypothesis for the pull-forward method.
Figure 4.1. The quadratic spending function where α=0.05 and the group-sequential boundaries for information fractions of 33%, 66%, and 100%.
Figure 4.2. The mean log hazard ratio, among trials that reject the null hypothesis at the first interim analysis, for each data processing method across different delayed reporting schemes.
Figure 4.3. The mean log hazard ratio, among trials that reject the null hypothesis at the second interim analysis, for each data processing method across different delayed reporting schemes.
Figure 4.4. The mean log hazard ratio, among trials that reject the null hypothesis at the final analysis, for each data processing method across different delayed reporting schemes.
Figure 4.5. The mean log hazard ratio, among trials that do reject the null hypothesis at any analysis, for each data processing method across different delayed reporting schemes.
Figure 4.6. The effect of delayed reporting and length of follow-up interval on the mean log hazard ratio estimates from the standard method of data processing, under the alternative hypothesis of a hazard ratio of 0.67.
Figure 4.7. The effect of delayed reporting and length of follow-up interval on the mean log hazard ratio estimates from the pull-forward method of data processing, under the alternative hypothesis of a hazard ratio of 0.67.
Figure 4.8. The effect of delayed reporting and length of follow-up interval on the mean log hazard ratio estimates from the personal cutback method of data processing, under the alternative hypothesis of a hazard ratio of 0.67.
Figure 4.9. The effect of delayed reporting and length of follow-up interval on the mean log hazard ratio estimates from the global cutback method of data processing, under the alternative hypothesis of a hazard ratio of 0.67.
Figure 4.10. The mean log hazard ratio, among trials that reject the null hypothesis at the first interim analysis, for each data processing method across different delayed reporting schemes.
Figure 4.11. The mean log hazard ratio, among trials that reject the null hypothesis at the second interim analysis, for each data processing method across different delayed reporting schemes.
Figure 4.12. The mean log hazard ratio, among trials that reject the null hypothesis at the final analysis, for each data processing method across different delayed reporting schemes.
Figure 4.13. The mean log hazard ratio, among trials that do reject the null hypothesis at any analysis, for each data processing method across different delayed reporting schemes.
Figure 4.14. The effect of delayed reporting and length of follow-up interval on the mean log hazard ratio estimates from the standard method of data processing, under the alternative hypothesis of a hazard ratio of 0.85.
Figure 4.15. The effect of delayed reporting and length of follow-up interval on the mean log hazard ratio estimates from the pull-forward method of data processing, under the alternative hypothesis of a hazard ratio of 0.85.
Figure 4.16. The effect of delayed reporting and length of follow-up interval on the mean log hazard ratio estimates from the personal cutback method of data processing, under the alternative hypothesis of a hazard ratio of 0.85.
Figure 4.17. The effect of delayed reporting and length of follow-up interval on the mean log hazard ratio estimates from the global cutback method of data processing, under the alternative hypothesis of a hazard ratio of 0.85.
Figure 4.18. The effect of delayed reporting and length of follow-up interval on the type I error rate from the standard and personal cutback methods of processing data.
Figure 4.19. The effect of delayed reporting and length of follow-up interval on the observed power from the standard and personal cutback methods of processing data, under the alternative hypothesis of a hazard ratio of 0.67.
Figure 4.20. The effect of delayed reporting and length of follow-up interval on the observed power from the standard and personal cutback methods of processing data, under the alternative hypothesis of a hazard ratio of 0.85.
Figure 4.21. The effect of delayed reporting and length of follow-up interval on the correlation between the first score statistic and the increment from the first to the second score statistic, for the standard method, under the null hypothesis of no treatment effect. Red denotes statistical significance.
Figure 4.22. The effect of delayed reporting and length of follow-up interval on the correlation between the increment from the first to the second score statistic and the increment from the second to the third score statistic, for the standard method, under the null hypothesis of no treatment effect. Red denotes statistical significance.
Figure 4.23. The effect of delayed reporting and length of follow-up interval on the correlation between the first score statistic and the increment from the first to the second score statistic, for the pull-forward method, under the null hypothesis of no treatment effect. Red denotes statistical significance.
Figure 4.24. The effect of delayed reporting and length of follow-up interval on the correlation between the increment from the first to the second score statistic and the increment from the second to the third score statistic, for the pull-forward method, under the null hypothesis of no treatment effect. Red denotes statistical significance.
List of Tables

Table 4.1. Mean information fraction for rejection group 1 for the standard and pull-forward methods under the alternative hypothesis of ln(HR) = ln(0.67) = -0.4004776.
Table 4.2. Mean information fraction for rejection group 2 for the standard and pull-forward methods for the first and second interim analyses under the alternative hypothesis of ln(HR) = ln(0.67) = -0.4004776.
Table 4.3. Mean information fraction for rejection group 3 for the standard and pull-forward methods for the first and second interim analyses and the final analysis under the alternative hypothesis of ln(HR) = ln(0.67) = -0.4004776.
Table 4.4. Mean information fraction for rejection group 4 for the standard and pull-forward methods for the first and second interim analyses and the final analysis under the alternative hypothesis of ln(HR) = ln(0.67) = -0.4004776.
Table 4.5. Observed power for the standard and pull-forward methods under the alternative hypothesis of ln(HR) = ln(0.67) = -0.4004776.
Table 4.6. Mean trial lengths in years for the standard and pull-forward methods under the alternative hypothesis of ln(HR) = ln(0.67) = -0.4004776.
Table 4.7. Observed power for the standard method for window lengths of 0.125 years and 0.25 years under the alternative hypothesis of ln(HR) = ln(0.67) = -0.4004776.
Table 4.8. Mean trial length for the standard method for narrowing window lengths under the alternative hypothesis of ln(HR) = ln(0.67) = -0.4004776.
Table 4.9. Observed power for the pull-forward method for window lengths of 0.125 years and 0.25 years under the alternative hypothesis of ln(HR) = ln(0.67) = -0.4004776.
Table 4.10. Mean trial length in years for the pull-forward method for window lengths of 0.125 years and 0.25 years under the alternative hypothesis of ln(HR) = ln(0.67) = -0.4004776.
Table 4.11. Mean information fraction for rejection group 1 for the standard and pull-forward methods under the alternative hypothesis of ln(HR) = ln(0.85) = -0.1625189.
Table 4.12. Mean information fraction for rejection group 2 for the standard and pull-forward methods for the first and second interim analyses under the alternative hypothesis of ln(HR) = ln(0.85) = -0.1625189.
Table 4.13. Mean information fraction for rejection group 3 for the standard and pull-forward methods for the first and second interim analyses and the final analysis under the alternative hypothesis of ln(HR) = ln(0.85) = -0.1625189.
Table 4.14. Mean information fraction for rejection group 4 for the standard and pull-forward methods for the first and second interim analyses and the final analysis under the alternative hypothesis of ln(HR) = ln(0.85) = -0.1625189.
Table 4.15. Observed power for the standard and pull-forward methods under the alternative hypothesis of ln(HR) = ln(0.85) = -0.1625189.
Table 4.16. Mean trial lengths in years for the standard and pull-forward methods under the alternative hypothesis of ln(HR) = ln(0.85) = -0.1625189.
Table 4.17. Observed power for the standard method for window lengths of 0.125 years and 0.25 years under the alternative hypothesis of ln(HR) = ln(0.85) = -0.1625189.
Table 4.18. Mean trial length for the standard method for window lengths of 0.125 years and 0.25 years under the alternative hypothesis of ln(HR) = ln(0.85) = -0.1625189.
Table 4.19. Observed power for the standard method for window lengths of 0.125 years and 0.25 years under the alternative hypothesis of ln(HR) = ln(0.85) = -0.1625189.
Table 4.20. Mean trial length for the standard method for window lengths of 0.125 years and 0.25 years under the alternative hypothesis of ln(HR) = ln(0.85) = -0.1625189.
Table 4.21. The effect of window length between visits and delayed reporting of events on the observed type I error rate for the standard and pull-forward methods for window lengths of 0.125 years and 0.25 years under the alternative hypothesis of ln(HR) = ln(0.85) = -0.1625189.
Table 5.1. Results from the first interim analysis at calendar time 12/31/2002.
Table 5.2. Results from the second interim analysis at calendar time 12/31/2004.
Table 5.3. Results from the third interim analysis at calendar time 6/30/2005.
Table 5.4. Results from the final analysis at calendar time 10/30/2009.
Table A1.1. Proportion of trials under the alternative hypothesis of a hazard ratio of 0.67 for trials in rejection group 1 for all data processing methods by delayed reporting probability and length between scheduled visits.
Table A1.2. Mean log hazard ratio under the alternative hypothesis of a hazard ratio of 0.67 for trials in rejection group 1 for all data processing methods by delayed reporting probability and length between scheduled visits.
Table A1.3. Mean information fraction under the alternative hypothesis of a hazard ratio of 0.67 for trials in rejection group 1 for all data processing methods by delayed reporting probability and length between scheduled visits.
Table A1.4. Proportion of trials under the alternative hypothesis of a hazard ratio of 0.67 for trials in rejection group 2 for all data processing methods by delayed reporting probability and length between scheduled visits.
Table A1.5. Mean log hazard ratio under the alternative hypothesis of a hazard ratio of 0.67 for trials in rejection group 2 for all data processing methods by delayed reporting probability and length between scheduled visits.
Table A1.6. Mean information fraction under the alternative hypothesis of a hazard ratio of 0.67 for trials in rejection group 2 for all data processing methods by delayed reporting probability and length between scheduled visits.
Table A1.7. Proportion of trials under the alternative hypothesis of a hazard ratio of 0.67 for trials in rejection group 3 for all data processing methods by delayed reporting probability and length between scheduled visits.
Table A1.8. Mean log hazard ratio under the alternative hypothesis of a hazard ratio of 0.67 for trials in rejection group 3 for all data processing methods by delayed reporting probability and length between scheduled visits.
Table A1.9. Mean information fraction under the alternative hypothesis of a hazard ratio of 0.67 for trials in rejection group 3 for all data processing methods by delayed reporting probability and length between scheduled visits.
Table A1.10. Proportion of trials under the alternative hypothesis of a hazard ratio of 0.67 for trials in rejection group 4 for all data processing methods by delayed reporting probability and length between scheduled visits.
Table A1.11. Mean log hazard ratio under the alternative hypothesis of a hazard ratio of 0.67 for trials in rejection group 4 for all data processing methods by delayed reporting probability and length between scheduled visits.
Table A1.12. Mean information fraction under the alternative hypothesis of a hazard ratio of 0.67 for trials in rejection group 4 for all data processing methods by delayed reporting probability and length between scheduled visits.
Table A2.1. Proportion of trials under the alternative hypothesis of a hazard ratio of 0.85 for trials in rejection group 1 for all data processing methods by delayed reporting probability and length between scheduled visits.
Table A2.2. Mean log hazard ratio under the alternative hypothesis of a hazard ratio of 0.85 for trials in rejection group 1 for all data processing methods by delayed reporting probability and length between scheduled visits.
Table A2.3. Mean information fraction under the alternative hypothesis of a hazard ratio of 0.85 for trials in rejection group 1 for all data processing methods by delayed reporting probability and length between scheduled visits.
Table A2.4. Proportion of trials under the alternative hypothesis of a hazard ratio of 0.85 for trials in rejection group 2 for all data processing methods by delayed reporting probability and length between scheduled visits.
Table A2.5. Mean log hazard ratio under the alternative hypothesis of a hazard ratio of 0.85 for trials in rejection group 2 for all data processing methods by delayed reporting probability and length between scheduled visits.
Table A2.6. Mean information fraction under the alternative hypothesis of a hazard ratio of 0.85 for trials in rejection group 2 for all data processing methods by delayed reporting probability and length between scheduled visits.
Table A2.7. Proportion of trials under the alternative hypothesis of a hazard ratio of 0.85 for trials in rejection group 3 for all data processing methods by delayed reporting probability and length between scheduled visits.
Table A2.8. Mean log hazard ratio under the alternative hypothesis of a hazard ratio of 0.85 for trials in rejection group 3 for all data processing methods by delayed reporting probability and length between scheduled visits.
Table A2.9. Mean information fraction under the alternative hypothesis of a hazard ratio of 0.85 for trials in rejection group 3 for all data processing methods by delayed reporting probability and length between scheduled visits.
Table A2.10. Proportion of trials under the alternative hypothesis of a hazard ratio of 0.85 for trials in rejection group 4 for all data processing methods by delayed reporting probability and length between scheduled visits.
Table A2.11. Mean log hazard ratio under the alternative hypothesis of a hazard ratio of 0.85 for trials in rejection group 4 for all data processing methods by delayed reporting probability and length between scheduled visits.
Table A2.12. Mean information fraction under the alternative hypothesis of a hazard ratio of 0.85 for trials in rejection group 4 for all data processing methods by delayed reporting probability and length between scheduled visits.
Table A3.1. Proportion of trials under the null hypothesis for trials in rejection group 1 for all data processing methods by delayed reporting probability and length between scheduled visits.
Table A3.2. Mean log hazard ratio under the null hypothesis for trials in rejection group 1 for all data processing methods by delayed reporting probability and length between scheduled visits.
Table A3.3. Proportion of trials under the null hypothesis for trials in rejection group 2 for all data processing methods by delayed reporting probability and length between scheduled visits.
Table A3.4. Mean log hazard ratio under the null hypothesis for trials in rejection group 2 for all data processing methods by delayed reporting probability and length between scheduled visits.
Table A3.5. Proportion of trials under the null hypothesis for trials in rejection group 3 for all data processing methods by delayed reporting probability and length between scheduled visits.
Table A3.6. Mean log hazard ratio under the null hypothesis for trials in rejection group 3 for all data processing methods by delayed reporting probability and length between scheduled visits.
Table A3.7. Proportion of trials under the null hypothesis for trials in rejection group 4 for all data processing methods by delayed reporting probability and length between scheduled visits.
Table A3.8. Mean log hazard ratio under the null hypothesis for trials in rejection group 4 for all data processing methods by delayed reporting probability and length between scheduled visits.
Table A4.1. Correlation coefficient of the estimate of the first score statistic with the increment to the second score statistic for all processing methods by delayed reporting probabilities under the null hypothesis.
Table A4.2. Correlation coefficient of the estimate of the increment from the first score statistic to the second and the estimate of the increment from the second score statistic to the third, by delayed reporting probabilities under the null hypothesis.
Table A5.1. Comparison of expected values of elements of the covariance matrix (3.11) for the standard method of data processing under the null hypothesis of no treatment effect when ρ=0 and w=0.5.
Table A6.1. Asymptotic variance covariance matrix (3.6) of vector v (3.4) for group j, under the null hypothesis, under the standard method of data processing, when ρ=0 and w=0.5.
Table A6.2. Variance covariance matrix (3.6) of vector v (3.4) for group j, under the null hypothesis, under the standard method of data processing, for simulated trial data with 104 participants per group* when ρ=0 and w=0.5.
Table A6.3. Variance covariance matrix (3.6) of vector v (3.4) for group j, under the null hypothesis, under the standard method of data processing, for simulated trial data with 500 participants per group* when ρ=0 and w=0.5.
Table A6.4. Variance covariance matrix (3.6) of vector v (3.4) for group j, under the null hypothesis, under the standard method of data processing, for simulated trial data with 1,000 participants per group* when ρ=0 and w=0.5.
Table A6.5. Variance covariance matrix (3.6) of vector v (3.4) for group j, under the null hypothesis, under the standard method of data processing, for simulated trial data with 2,000 participants per group* when ρ=0 and w=0.5.
Table A7.1. Asymptotic variance covariance matrix (3.11) of the score statistics for group j, under the null hypothesis, under the standard method of data processing, when ρ=0 and w=0.5.
Table A7.2. Variance covariance matrix (3.11) of the score statistics for group j, under the null hypothesis, under the standard method of data processing, for simulated data with 104 participants per group* when ρ=0 and w=0.5.
Table A7.3. Variance covariance matrix (3.11) of the score statistics for group j, under the null hypothesis, under the standard method of data processing, for simulated data with 500 participants per group* when ρ=0 and w=0.5.
Table A7.4. Variance covariance matrix (3.11) of the score statistics for group j, under the null hypothesis, under the standard method of data processing, for simulated data with 1,000 participants per group* when ρ=0 and w=0.5.
Table A7.5. Variance covariance matrix (3.11) of the score statistics for group j, under the null hypothesis, under the standard method of data processing, for simulated data with 2,000 participants per group* when ρ=0 and w=0.5.
Table A8.1. Asymptotic variance covariance matrix of the increments of the score statistics for group j, under the null hypothesis, under the standard method of data processing, when ρ=0 and w=0.5.
Table A8.2. Variance covariance matrix of the increments of the score statistics for group j, under the null hypothesis, under the standard method of data processing, for simulated data with 104 participants per group* when ρ=0 and w=0.5.
Table A8.3. Variance covariance matrix of the increments of the score statistics for group j, under the null hypothesis, under the standard method of data processing, for simulated data with 500 participants per group* when ρ=0 and w=0.5.
Table A8.4. Variance covariance matrix of the increments of the score statistics for group j, under the null hypothesis, under the standard method of data processing, for simulated data with 1,000 participants per group* when ρ=0 and w=0.5.
Table A8.5. Variance covariance matrix of the increments of the score statistics for group j, under the null hypothesis, under the standard method of data processing, for simulated data with 2,000 participants per group* when ρ=0 and w=0.5.
Table A9.1. Comparison of exit probabilities for each of three total analyses assuming independent increments and under the asymptotic situation, based on the window length between visits, w, and the probability of delayed reporting, ρ.
Abstract
It has been shown that, in a clinical trial setting where time to event is of interest, a preferential method of reporting influences estimates of the hazard rates and ratios in a single-sample situation. This commonly used practice of preferential reporting censors event-free participants at their last clinical visit prior to the analytic time point, whereas the survival time of participants experiencing an event prior to analysis can be recorded at any time. Three further methods of data processing are explored in the case where survival time is exponentially distributed and in an interim monitoring setting. The effects of delayed reporting of events and of window length between visits on estimates of the log hazard ratio are compared among the four data processing methods, both under the alternative hypothesis from study design and under a deviation of the alternative hypothesis from study design. Also, under the null hypothesis, the effect of delayed reporting of events and of length between visits on the correlation between increments of the score statistic obtained at each time of analysis is compared for each data processing method. A general framework is presented and used to derive asymptotic variance-covariance matrices of the score statistics; asymptotic values of the correlation between increments of the score statistic are also obtained.
Chapter 1 Introduction
1.1 Clinical trial and data processing procedures
The randomized clinical trial is accepted as one of the best evidence-based approaches for
instructing and improving clinical practice. Clinical trials research in human subjects offers an
opportunity to contribute knowledge directly to many fields, specifically to medicine in determining the therapeutic value of an experimental therapy for a particular disease or condition, chronic or acute. The clinical trial is thus ubiquitous as the primary methodology for comparative studies. For trials where patients can be under observation for an extended period of time prior to the time at which the primary endpoint is determined, there is an imperative, both ethical and administrative, to examine results while the study is ongoing, prior to its planned maximum duration.
The aim of this research is to investigate and evaluate different methods of data processing
and the possible resulting biases that arise during interim monitoring. Several statistical approaches are currently available for monitoring ongoing trials prior to their completion.
Regardless of what approach is ultimately chosen, applying data processing methods that do not
comply with underlying assumptions used to derive analytical methodologies can result in biased
or inefficient estimates of treatment effect at interim stages. This can lead to flawed conclusions
regarding whether a trial should be terminated early or allowed to continue to enroll participants.
While statistical stopping rules are but one of many aspects considered when determining the
continuation of a trial, they are valued as guides for how to proceed. This research relies on survival data arising from cancer research, but the results can be applied, in general, to any follow-up trial.
1.1.1 Censoring
Consider a randomized comparative clinical trial designed to investigate the effectiveness of an
experimental therapy versus a standard therapy. Eligible patients would be recruited and,
providing they consent, would be randomly assigned to receive either therapy and followed
prospectively until an event of interest is observed. Event types are selected to suit the goal of the
trial. Examples of protocol-defined events in a cancer clinical trial setting are death due to the disease, a secondary malignancy, and relapse of the existing disease. Participants who do not have an event of interest by the end of follow-up and are no longer under observation are said to be censored.
Current standard analytic methods rely on the assumption that the underlying censoring
process is independent of the underlying time-to-event process. Such censoring mechanisms are
said to be ignorable in analysis. One type of ignorable censoring, known as type I, occurs when a
trial is designed to stop observation of all patients a fixed time after enrollment is started. Another,
known as type II, occurs when the trial follow-up period ends when a specific number of events
have been observed.
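As a concrete illustration of these two ignorable censoring mechanisms, here is a minimal simulation sketch (my own, with illustrative parameters not taken from the dissertation). Under either scheme, the exponential hazard is estimated by the number of events divided by total observed time.

```python
# A minimal sketch contrasting type I and type II ignorable censoring on
# exponentially distributed event times; all parameters are illustrative.
import numpy as np

rng = np.random.default_rng(1)
event_times = rng.exponential(scale=5.0, size=200)  # true times, hazard 0.2

# Type I: observation of all patients stops a fixed time (4 years) after start.
t_fix = 4.0
obs_type1 = np.minimum(event_times, t_fix)
evt_type1 = event_times <= t_fix

# Type II: follow-up ends once a fixed number of events (100) is observed.
t_stop = np.sort(event_times)[99]          # calendar time of the 100th event
obs_type2 = np.minimum(event_times, t_stop)
evt_type2 = event_times <= t_stop

# Under either scheme, events / total observed time estimates the hazard (0.2).
print(evt_type1.sum() / obs_type1.sum(), evt_type2.sum() / obs_type2.sum())
```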
In an ideal trial, event-related information is reported without delay for all participants and
current status is known for all participants at the time of analysis. In this situation censoring occurs
at the end of observation for all participants who were either lost to follow-up or did not experience
an event. The latter conforms to the assumption of ignorable censoring (figure 1.1). For this
research, I am assuming the methods for disease evaluation unambiguously identify when an event
occurs; there can be no prior time when the event could have been detected by any other means of
evaluation. In practice, at the time of analysis, follow-up in the absence of events occurs only at regular intervals of time.
Events may, but need not, be reported between the regular follow-up intervals. Determining event status more frequently would demand finite resources and place an unreasonable burden on participants, who would require further medical testing that itself carries adverse risk. To limit this burden, routine evaluations of patients are scheduled at specific times specified in the study protocol. For example, in studies of childhood cancer, disease evaluations are often done at 3-6 month intervals from trial enrollment to the planned end of the study. Because of this, the last complete evaluation of a patient's status may occur some time before the time of analysis for the primary study aim.
Figure 1.1 Graphical depiction of survival time under observation for participants in a clinical trial relative
to time of analysis. Lines ending in bold points denote observed failure.
1.1.2 Reporting of events and data processing procedure
For this research we are interested in what occurs in the window of time between a participant’s
last complete evaluation and a time of analysis. For any given trial, there are events of interest
that are preferentially reported as soon as they are ascertained. An example is the case where a
participant requires immediate medical attention for symptoms related to their disease or possible
adverse reactions to treatment, such as death from treatment side-effects where there is a regulatory
imperative to report such events as they occur. On the other hand, the reporting scheme and
follow-up schedule for event-free participants would not allow for their observation to end between
their last complete evaluation and the time of analysis resulting in an overestimate of the
underlying event rate for exponentially distributed data whereby the total time of participants is
underestimated yet the number of events is correct when, for example the standard method of data
processing is used (Figure 1.3).
A previously explored data processing technique assumes the status of participants is
maintained to the end of the trial for those who were event-free at their last scheduled follow-up
visit. This particular method requires immediate reporting of all events that occur before the end
of the trial to provide unbiased estimation of the treatment effect. Other methods of data
processing involve moving back the deadline for the contribution of data to analysis to either the
last scheduled follow-up visit for each participant or for all participants a number of months,
usually equal to the number of months between intervals of observation. The potential pitfalls are the loss of statistical power and the inflation of the variance of estimates of parameters of interest that result from discarding information after the cut-back. The cut-back also potentially excludes from analysis participants who were enrolled just before it (see figures 1.3 through 1.6).
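The bias described above is easy to exhibit numerically. The following sketch (assumptions mine: uniform staggered enrollment, exponential event times, immediate reporting of all events) censors event-free participants at their last completed visit while counting events at their actual times, so the denominator of the exponential hazard estimate is understated and the estimate is inflated.

```python
# Hypothetical sketch of the bias from the standard data processing method;
# names and parameters are illustrative, not taken from the dissertation.
import numpy as np

rng = np.random.default_rng(2)
n, hazard, window, analysis_time = 10_000, 0.10, 0.5, 3.0
entry = rng.uniform(0.0, analysis_time, n)       # staggered enrollment times
event = rng.exponential(1.0 / hazard, n)         # time from entry to event
admin = analysis_time - entry                    # time from entry to analysis

observed_event = event < admin
# Events contribute their full time; event-free participants are censored at
# their last completed visit (a multiple of the window length).
last_visit = np.floor(admin / window) * window
time_std = np.where(observed_event, event, last_visit)

print("true hazard:", hazard)
print("standard-method estimate:", observed_event.sum() / time_std.sum())
```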
1.2 Interim monitoring
In the case that a determination can be made about the trial's null hypothesis based on statistical evidence
prior to the planned end of the investigation, there is an ethical imperative to act on such
information. The conclusions of such trials need to be disseminated so that the research agenda
for the particular condition can be advanced and study subjects need no longer be exposed to any
risks associated with trial participation. The ability to check the status of a trial and its participants
at interim intervals before the trial's conclusion allows for the determination of a significant treatment difference that might be evident early on and for the verification of statistical aspects of the trial as designed, among other things.
1.2.2 Practical Example
To examine the effect of these data processing methods and the effect of preferential reporting of
certain events in the context of interim analyses, methods described in this research will be applied
to study AEWS0031, a randomized controlled trial of interval-compressed chemotherapy for the
treatment of localized Ewing sarcoma, conducted by the Children’s Oncology Group. The primary
question for this trial was whether intensively-timed therapy, compared to standard-timed therapy,
would improve the event-free survival (EFS) of children and young adults with Ewing sarcoma.
This research will consider data processing methods in the context of interim monitoring
of clinical trials. The standard method of censoring event-free survival at the last scheduled visit (figure 1.3) will be compared with methods that impose the criteria of ignorable censoring by moving the deadline for the contribution of data to analysis earlier: either a small window of time back for all participants (figure 1.5), in effect ensuring that true patient statuses are known up to that cut-off, or back to the last follow-up visit for each participant (figure 1.4). This research will also consider a method that assumes participants who are event-free at their last follow-up visit remain event-free to the end of the trial, provided no event is reported with delay (figure 1.6).
Figure 1.2 shows the true event status of 7 participants in an example trial with random enrollment.
Participants 1, 2, 3, and 4 in this diagram enter the study at a point that leaves sufficient
time for a scheduled clinical visit, or last follow-up visit, before the time of analysis. Participants
5, 6, and 7 enter the study at a point which does not leave enough time for any clinical visits prior
to the time of analysis.
Figure 1.3 imposes the standard data processing method on data from figure 1.2. The
standard method of data processing allows for the delay in reporting of some events between the
last scheduled visit and the time of analysis, with probability greater than zero. In figure 1.3,
participants 1 and 2 experience events between their last follow-up visit and the time of analysis.
In this example, participant 2’s event is reported with delay and thus their event status is unknown
after their last visit and the survival time is censored. Participant 3’s event time can be ascertained
by the time of their last scheduled follow-up visit, before the time of analysis. Participants 5, 6,
and 7 enter the trial relatively late and do not have a scheduled follow-up visit before the time of
analysis. Participant 5's event is reported, and their event and time under observation contribute
to the analysis. On the other hand, participant 6’s event is reported with delay (light grey) and thus
does not contribute to the analysis. Participant 7 is excluded from analysis since this participant
does not experience a last follow-up visit prior to analysis. Figure 1.4 imposes the personal cutback
data processing method on data from figure 1.2. The personal cutback method (figure 1.4) pushes
back the deadline for data contribution to the last follow-up visit for each participant in the trial. Only participants with at least one scheduled follow-up visit before the time of analysis contribute
to the analysis of the data.
In figure 1.4, participants 5, 6, and 7 do not contribute to the analysis since they did not have a follow-up visit prior to the time of analysis. Figure 1.5 imposes the global cutback data processing method on data from figure 1.2. The global cutback method (figure 1.5) pushes the deadline for the contribution of data to analysis back by a small window of time, equal to the scheduled time between routine visits, for all participants, and assumes the status of all participants is known up to this study-level cutback. In figure 1.5, participants 5, 6, and 7 are excluded from analysis for the same reasons as above. Recall that both the personal and global cutback methods employ censoring mechanisms that are ignorable in analysis.
Figure 1.2. An example of the true event status of 7 participants in a trial with random enrollment.
Figure 1.6 imposes the pull-forward data processing method on data from figure 1.2. The
pull-forward method assumes any participant surviving beyond their last scheduled follow-up visit
survives until the time of analysis. In figure 1.6, the event for participant 2 is reported with delay
and the participant is incorrectly assumed to have survived to the time of analysis. On the other
hand, participant 4 is correctly assumed to survive to the time of analysis since the participant does
not have an event by the time of analysis according to figure 1.2. Participants 5, 6, and 7, who enter the trial with insufficient time to have a follow-up visit prior to the analytic time point, are also assumed to survive to the time of analysis. Notice this assumption is incorrect for participants 5 and 6. (A code sketch of all four data processing methods follows figure 1.6 below.)
Figure 1.3. The effect of applying the standard data processing method on data from figure 1.2.
Figure 1.4. The effect of applying the personal cutback data processing method on data from figure 1.2.
Figure 1.5. The effect of applying the global cutback data processing method on data from figure 1.2.
Figure 1.6. The effect of applying the pull-forward data processing method on data from figure 1.2.
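The sketch below is a hypothetical implementation of the four data processing methods for a single participant record, under assumptions of my own: visits occur every `window` years from entry, events occurring by the last completed visit are always ascertained there, and later events are known at analysis only if reported without delay. The function name and signature are illustrative.

```python
import math

def process(entry, event, reported, window, t_analysis, method):
    """Return (time_on_study, event_indicator), or None if the participant
    is excluded from the analysis under the given method."""
    follow = t_analysis - entry                       # potential follow-up
    last_visit = math.floor(follow / window) * window # last completed visit
    # An event is known at analysis if it happened by the last visit, or if
    # it happened before the analysis and was reported without delay.
    known_event = event <= last_visit or (event <= follow and reported)
    if method == "standard":
        if known_event:
            return event, 1
        return (last_visit, 0) if last_visit > 0 else None
    if method == "personal_cutback":
        if last_visit <= 0:
            return None                               # no visit: excluded
        return (event, 1) if event <= last_visit else (last_visit, 0)
    if method == "global_cutback":
        cut = follow - window                         # study-level cutback
        if cut <= 0:
            return None                               # enrolled too late
        return (event, 1) if event <= cut else (cut, 0)
    if method == "pull_forward":
        return (event, 1) if known_event else (follow, 0)

# A record like participant 2 in figure 1.2: the event falls after the last
# visit and its report is delayed (all numeric values are illustrative).
print(process(entry=0.2, event=2.7, reported=False,
              window=0.5, t_analysis=3.0, method="standard"))      # (2.5, 0)
print(process(entry=0.2, event=2.7, reported=False,
              window=0.5, t_analysis=3.0, method="pull_forward"))  # (2.8, 0)
```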
Chapter 2 Literature Review
2.1 Sequential Analysis: a brief introduction
Sequential analysis is a statistical method that applies a decision-making process at intermediate observation times prior to the planned primary analytic point of the trial, denoted herein as the time of analysis $T$. Generally, the times of these interim analyses are part of the clinical trial protocol. Wald (1945) defined the possible decisions to be made; described here in the context of a clinical trial, they are: 1) fail to reject the null hypothesis, 2) reject the null hypothesis, or 3) continue the trial and enroll more participants. Based on the accumulated information up to an observation time $t < T$, one of these decisions is made. Recruitment is stopped and the trial is terminated if either of the first two decisions is made. The trial continues to the next observation time if the third decision is made. This process proceeds until there is enough statistical evidence to reject or fail to reject the null hypothesis, or until the planned end of the trial, where the final observation and analysis occur. A key feature of sequential analysis is that, on average, it requires fewer observed outcomes than a single-sample methodology to reach a conclusion with similar statistical properties, consequently using resources more efficiently (Haybittle 1971, Pocock 1977, O'Brien and Fleming 1979, among others).
2.1.1 Group-sequential methods
Due to ethical imperatives and the protection of human lives, the process of inspecting
accumulating data in clinical trials at interim stages is necessary to monitor for extreme therapeutic
results and to provide convincing evidence to alter or stop the trial (Green, Fleming, O’Fallon
1987, Pocock 1982, DeMets and Lan 1984). This includes balancing the negative consequences
of inconclusive results against the damage done by continuing to expose participants to inferior
treatments (Peto et al 1975). From an administrative perspective, reviewing accumulating data at
interim stages allows for monitoring of participant eligibility, accrual rate, adherence to the
protocol, and adverse reactions to treatment including toxicity and related issues. Other
considerations for safety and efficacy during interim stages include accounting for findings from
concurrent related research and making valid inferences that instruct treatment for future patients.
The importance of interim analyses for administrative and statistical purposes has been
acknowledged by the FDA, which requires that all comparative study applications include a plan for interim review of accumulated data.
In the mid-1970s, formal sequential methods analyzing accumulating data in actual trials
were rarely applied because of the difficulty of conforming to an accrual pattern of participants two at a time, one from each treatment group, and of surveilling them continuously (Pocock 1977).
By the late 1970s, applying the mathematical theory of Brownian motion, an alternative
methodology labeled group-sequential procedures was developed that allows for intermittent
analyses. Methods for interim trial monitoring analyzing accumulating data include frequentist
approaches based on calendar time or information time and Bayesian approaches (Jennison and
Turnbull 1990). A main feature of these procedures is group-sequential boundaries. Group-
sequential tests using these boundaries are conducted periodically during the enrollment period of
a trial where regions for continuation and termination are defined. The test statistic in this case is
a function of time at analysis where time can be measured as calendar time elapsed or by the
amount of information accumulated, referred to as information time.
A key feature of group-sequential methods is that they address a notable consequence of using standard statistical methods in repeated testing, namely, the inflation of the type I error rate.
2.1.2 Information Time
Although the theoretical framework for sequential procedures was initially derived for trials
where the response (continuous outcome) to the treatment for all participants is known soon after
entry, extensive work has shown these methods can also be applied to time-to-event data where
the outcome of interest is event status (Jennison and Turnbull 1989, DeMets and Lan 1994). At
the heart of the matter are properties (detailed in the following section) of the test statistic that
allow for their application to such outcomes. Specifically, the logrank statistic, when computed
sequentially throughout a trial, performs like a partial sum of independently distributed normal
random variables with variance proportional to the number of events observed at the time the
analysis is conducted (Tsiatis 1982). Moreover, it has been shown that the joint distribution of the
test statistic for sequential monitoring does not depend on the total number of participants enrolled
but rather on the number of events (Lan and DeMets 1989a). For example, under the proportional
hazards assumption, the log-rank score statistic has an asymptotic variance that is equal to the
information for the log hazard ratio between treatment groups when there is no treatment effect.
Furthermore, assuming no tied events, an approximation for this information shows that it is
proportional to the number of events observed at an interim monitoring stage (Lan and Zucker
1992, Proschan et al 2006).
Consider a trial with $K$ planned analysis stages, $k = 1, 2, \ldots, K$, where at each analysis we test the null hypothesis of no treatment effect between two treatment groups, a control group and an experimental group, relying only on the amount of accumulated information available up to that stage. We define $D$ as the maximum number of events under the null hypothesis and $I_{\max}$ as the maximum information to be accumulated. Then the information fraction time (later simply referred to as information fraction or information) at stage $k$ is

$$t_k = \frac{I_k}{I_{\max}} \approx \frac{d_k}{D}, \qquad 0 < t_k \le 1,$$

where $I_k$ denotes the information and $d_k$ the number of events observed at stage $k$ (Kim and DeMets 1987).
Note the problem of estimating the information fraction at any stage when the total number of events at the end of the trial is unknown. A proposed estimate for this unknown quantity is the expected number of events at the end of the trial assuming no treatment effect, that is, under the null hypothesis (Lan and DeMets 1989, Lan and Zucker 1993, and DeMets and Lan 1995). Notable problems arise when the estimated total number of events either over- or underestimates the observed number of events, such as over- or under-spending alpha at the times of interim analyses and the final analysis. Kim et al. (1995) explored the issue of overspending or underspending alpha when utilizing alpha spending functions to generate boundary values, specifically when observing more events than the expected total by design (overspending) or fewer (underspending). The focus of their simulation studies was on maintaining the type I error α and adjusting the last boundary value under the null hypothesis no matter which hypothesis the observed data support. Their findings, corroborated later by Proschan and Nason (2011), show that either over- or underestimating the number of events, and thus incorrectly estimating the amount of information accrued, leads to boundaries that are either too conservative or too liberal at early interim analyses. Dang (2015) suggests using the expected number of events under the null hypothesis from the control group to estimate the information accumulated up to time $t$ (defined below). She shows this method limits the bias in estimating the amount of information accrued, ameliorating the issues of underspending and overspending of alpha. Here the subscript CO refers to the control group.
$$t = \frac{E[D_{CO}(t) \mid H_0]}{E[D_{CO}(T) \mid H_0]}$$

In practical applications, the number of events in the control group observed under perfect ascertainment conditions by analysis time $t$, denoted $d_{CO}(t)$, can be used:

$$\hat{t} = \frac{d_{CO}(t)}{E[D_{CO}(T) \mid H_0]}$$
This estimate achieves the following: 1) as shown via simulation (with assumptions similar to those explored in this dissertation, explained further in chapter 3), estimates of information time show little to no bias; 2) it preserves the type I error rate under the null hypothesis; and 3) it results in the shortest study length under the alternative. This robust information-time estimation method is free of the assumption of a uniform distribution of events needed by methods relying on calendar time. Dang shows that estimating $\hat{t}$ and its covariance structure based on the observed information provided by the control group can easily be implemented in practice, owing to the ready availability of boundary-value calculation software in statistical packages like SAS and R. This estimate of the information fraction is used in the simulation studies explored in chapter 3.
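A small sketch of this estimate (my own construction; the design quantities are illustrative, not from any trial in this dissertation): the information fraction is estimated as the observed control-group event count divided by the expected control-group event total under the null, computed here for uniform accrual and exponential survival.

```python
# Hedged sketch of the control-group information-fraction estimate; the
# helper name and all numeric inputs are hypothetical.
import numpy as np

def expected_control_events(n_control, hazard0, accrual_years, total_years):
    """Expected control-arm events by trial end under H0, assuming uniform
    accrual over accrual_years and exponential survival."""
    a = np.linspace(0.0, accrual_years, 10_000)      # entry times
    p_event = 1.0 - np.exp(-hazard0 * (total_years - a))
    return n_control * p_event.mean()                # average over entrants

D_max = expected_control_events(500, 0.10, 3.0, 6.0)
d_obs = 31                        # control-arm events observed at this look
print("estimated information fraction:", d_obs / D_max)
```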
2.2 Brownian Motion and Independent Increments
2.2.1 Lan and Zucker
Lan and Zucker (1993) provide a theoretical framework that allows existing monitoring methods based on Brownian motion to be applied to general situations. Well known and studied extensively in the stochastic process literature (Slud and Wei 1982, Tsiatis 1982, DeMets and Lan 1984, and others), the Brownian motion $\{B(t) : t \in [0,1]\}$ is defined such that for any $t_1 < t_2 < \cdots < t_k$ in $[0,1]$, the random vector $\left(B(t_1), B(t_2), \ldots, B(t_k)\right)$ has a multivariate normal distribution with mean zero and covariance structure defined by $\operatorname{Cov}\!\left(B(t_i), B(t_j)\right) = \min(t_i, t_j)$, where $B(t_1), B(t_2) - B(t_1), \ldots, B(t_k) - B(t_{k-1})$ are independent; a property known as independent increments.
To motivate the connection between sequential data monitoring and Brownian motion, suppose that a one-sample problem involves independent and identically distributed observations $X_1, X_2, \ldots$ with mean $\theta$ and variance 1, and that the hypothesis being tested is $H_0 : \theta = 0$. The partial sum is $S_n = X_1 + X_2 + \cdots + X_n$ for $n = 1, \ldots, N$, with $S_0 = 0$. Properties follow, such as:

$$E(S_n) = n\theta$$
$$\operatorname{Var}(S_n) = n$$
$$\operatorname{Cov}(S_m, S_n) = \min(m, n)$$

where $I(n) = n$ is the amount of information accumulated. If $N$ is thought of as the total amount of information to be accumulated at the end of the trial, then let $t_n = n/N$ be the fraction of information accumulated by the $n$th observation, also known as the information time associated with the calendar time at which the $n$th observation is complete, where $t_n$ belongs to the set of information times $\{t_1, t_2, \ldots, t_N\}$. We can now define the process $\{S(u) : u \ge 0\}$, where $u$ represents calendar time under observation, $n(u)$ the number of observations by time $u$, and $I(u) = n(u)$ the amount of accumulated information. If we let $S(u) = S_{n(u)}$, then the following properties emerge:

$$E[S(u)] = \theta\, I(u)$$
$$\operatorname{Var}\!\left(S(u)\right) = I(u)$$
$$\operatorname{Cov}\!\left(S(u_1), S(u_2)\right) = \min\!\left(I(u_1), I(u_2)\right)$$

Next, using the B-value of Lan and Wittes (1988), the authors define the process

$$B(t_n) = Z_n \sqrt{t_n} = \frac{S_n}{\sqrt{N}},$$

where $Z_n = S_n/\sqrt{n}$ is the standardized test statistic. With $\Theta = \sqrt{N}\,\theta$, then similarly as before:

$$E[B(t_n)] = \Theta\, t_n$$
$$\operatorname{Var}\!\left(B(t_n)\right) = t_n$$
$$\operatorname{Cov}\!\left(B(t_m), B(t_n)\right) = \min(t_m, t_n)$$

Note that $B$ is defined only for discrete time points.
Finally, the authors indicate that in a more general case, say for information times $t_1, t_2, \ldots, t_k$ in the interval $[0,1]$, the random vector $\left(B(t_1), \ldots, B(t_k)\right)$ converges in distribution to a multivariate normal random vector $\left(B^*(t_1), \ldots, B^*(t_k)\right)$, which has mean zero and covariance such that $\operatorname{Cov}\!\left(B^*(t_i), B^*(t_j)\right) = \min(t_i, t_j)$. Hence the process $\{B(t)\}$ can be approximated by a Brownian motion, and the property of independent increments only holds under the null hypothesis, that is, when $\theta = 0$.
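The partial-sum construction above is easy to check by simulation. This sketch (mine, not Lan and Zucker's) draws iid observations under the null, forms $B(t_k) = S_k/\sqrt{N}$ at three looks, and confirms that $\operatorname{Var}(B(t_k)) \approx t_k$ and that successive increments are essentially uncorrelated.

```python
# Simulation check of the B-value properties under H0 (theta = 0);
# N, the look times, and the replication count are illustrative.
import numpy as np

rng = np.random.default_rng(3)
N, reps = 900, 20_000
X = rng.normal(0.0, 1.0, size=(reps, N))         # iid, mean 0, variance 1
S = X.cumsum(axis=1)                             # partial sums S_n

looks = [N // 3, 2 * N // 3, N]                  # t = 1/3, 2/3, 1
B = S[:, [k - 1 for k in looks]] / np.sqrt(N)    # B(t_k) = S_k / sqrt(N)

print("Var(B):", B.var(axis=0))                  # ~ (0.33, 0.67, 1.00)
inc1 = B[:, 1] - B[:, 0]
inc2 = B[:, 2] - B[:, 1]
print("corr of increments:", np.corrcoef(inc1, inc2)[0, 1])  # ~ 0
```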
2.2.2 Scharfstein, Tsiatis, and Robins
Scharfstein et al. (1997) show that efficient test statistics used in group-sequential methods for
clinical trials have a limiting multivariate normal distribution with an independent increment
covariance structure. They then use this result to develop an information-based design and monitoring procedure that can be applied to any type of study provided there is a unique parameter of interest that can be efficiently tested, and they assure the reader that no additional effort is needed to establish the same distributional structure in a multi-parameter setting.
Let the full data collected on the $i$th $(i = 1, \ldots, n)$ study participant be described by an iid and continuous process $X_i = \{E_i, W_i(u), u \ge 0\}$, where $E_i$ is the time the participant is enrolled in the trial and $W_i(u)$ is additional data collected during the time under study observation, $u$. Assume an interim analysis is chosen to occur at time $t$; then let $\mathcal{D}_t$ denote the vector containing all of the data collected up to time $t$ for all participants $i = 1, \ldots, n$. Suppose that we are interested in testing the hypothesis $H_0 : \beta = \beta_0$ versus local alternatives using the data collected up to time $t$. An approach to performing this test would be to use a semiparametric efficient test statistic we will denote $T^*(t, \mathcal{D}_t, \beta_0)$. The authors show that instead we can use a Wald statistic, a special type of efficient test statistic, based on a semiparametric estimate of $\beta$. Let $\hat{\beta}^*(t, \mathcal{D}_t)$ denote a semiparametric efficient estimator of $\beta$ at time $t$, and let the efficient Wald statistic be of the form

$$T^*(t, \mathcal{D}_t, \beta_0) = \frac{\hat{\beta}^*(t, \mathcal{D}_t) - \beta_0}{\operatorname{se}\!\left(\hat{\beta}^*(t, \mathcal{D}_t)\right)},$$

where $\operatorname{se}\!\left(\hat{\beta}^*(t, \mathcal{D}_t)\right)$ is the standard error of $\hat{\beta}^*(t, \mathcal{D}_t)$.
In lemma 1 the authors explain that a regular and asymptotically linear (RAL) test statistic $T(t, \mathcal{F}_t, \beta_0)$, with influence function $\varphi(t, X, \beta_0)$, is asymptotically normal with mean $\eta\, E\{\varphi(t, X, \beta_0)\, S_\beta(t, X, \beta_0)\}$, where $\sqrt{n}(\beta_n - \beta_0) \to \eta$ under local alternatives $\beta_n$, and variance 1. Using the Cauchy-Schwarz inequality, the authors show that a RAL test statistic $T^*(t, \mathcal{F}_t, \beta_0)$ with influence function proportional to the efficient score, $S_{\mathrm{eff}}(t, X, \beta_0)$, achieves the upper bound of the mean of $T(t, \mathcal{F}_t, \beta_0)$ under local alternatives. Relying on the theory of semiparametric efficient estimation, they show that an efficient Wald statistic follows this framework since it is based on a regular and semiparametric efficient estimator of $\beta$ at time $t$. They proceed to show the following:
$$\sqrt{n}\{\hat\beta^*(t, \mathcal{F}_t) - \beta_0\} = \frac{1}{\sqrt{n}} \sum_{i=1}^n I_t^{-1} S_{\mathrm{eff}}(t, X_i, \beta_0) + o_p(1)$$
where $I_t$ is the Fisher information based on the data available at time $t$ and $S_{\mathrm{eff}}(t, X, \beta_0)$ is the efficient score for $\beta$ at time $t$. Since the authors assume that $\sqrt{n I_t}\,\mathrm{se}\{\hat\beta^*(t, \mathcal{F}_t)\}$ converges in probability to 1, the efficient Wald statistic $T^*(t, \mathcal{F}_t, \beta_0)$ is a RAL, semiparametric efficient test statistic.
The result of interest for this dissertation is that of their theorem 1, which states that under a semiparametric model with $\beta$ as a parameter of interest and testing the null hypothesis $H_0: \beta = \beta_0$ against local alternatives, the vector of RAL semiparametric efficient test statistics $T^*(t_j, \mathcal{F}_{t_j}, \beta_0)$ for $j = 1, \ldots, k$, with influence functions $\varphi^*(t_j, X, \beta_0) = I_{t_j}^{-1/2} S_{\mathrm{eff}}(t_j, X, \beta_0)$, converges in distribution to a multivariate normal distribution. Specifically,
$$\left(T^*(t_1, \mathcal{F}_{t_1}, \beta_0), \ldots, T^*(t_k, \mathcal{F}_{t_k}, \beta_0)\right) \to N(\mu, \Sigma)$$
where $\mu$ is a zero vector under the null hypothesis and $\Sigma$ is a $k \times k$ matrix with an independent increments covariance structure, with $\hat\Sigma_{jk}$ a consistent estimate for $\Sigma_{jk}$. They remark that efficiency is a sufficient condition that brings about the independent increments structure.
Many methods (including those described above) for the construction of group sequential boundaries depend on the independent increments property (see also Lan and Zucker 1993 and Proschan et al. 1992).
Lastly, in section 4.2 of their article, they provide the reader with recommendations for implementing interim analyses with an approach similar to that taken in this dissertation (see chapter 4). In general, when calculating the boundaries for the final analysis, the authors suggest spending the remainder of $\alpha$. In the case where more information is observed at the final analysis than expected (information fraction > 1), spending the remaining amount of $\alpha$ left over from the previous analysis ensures the significance level can be maintained, although the power of the study will be greater than initially designed. The case where less information is observed than expected by the final analysis (information fraction < 1) will result in a trial that is underpowered. The practice of spending the remainder of $\alpha$ in the final analysis is also implemented in this dissertation, as others have suggested this practice (see Kim et al. 1987 and Lan and DeMets 1989), and statistical packages like SAS and R readily implement it in software designed to calculate group sequential boundaries.
2.3 Equal Information Fraction Methods
The methods discussed in this section assume equal information units are accumulated between each planned interim analysis. Therefore the information fraction obtained at the $k$th interim analysis is $t_k = k/K$. These methods rely on the calculation of a constant, $C(\alpha, K)$, designed to maintain the overall type I error rate of $\alpha$ with a total of $K$ planned analyses for two-sided tests, with standardized test statistic $Z_k$.
2.3.1 The Pocock Method
Pocock (1977) proposes to utilize a constant $C_P(\alpha, K)$, generated via numerical quadrature, so that the significance levels for each interim analysis are equal, maintaining a total type I error rate of $\alpha$ for a planned total of $K$ analysis points. For the first $k = 1, \ldots, K-1$ analyses, the trial is terminated and the null rejected if $|Z_k| \ge C_P(\alpha, K)$; otherwise the trial continues to the next stage, $k + 1$. At the final analysis the decision is made to either reject the null hypothesis if $|Z_K| \ge C_P(\alpha, K)$ or not reject the null otherwise. A unique feature of the Pocock method is that, under the alternative hypothesis, the boundaries produced for all interim stages are the same, minimizing the average sample number (ASN). For example, for a trial with $K = 4$ planned analyses and $\alpha = 0.05$, the null is rejected the first time $|Z_k| \ge 2.361$.
2.3.2 The O’Brien and Fleming Method
Unlike the Pocock method, the O’Brien and Fleming (1979) method uses increasing levels of significance at subsequent stages of analysis. The constant $C_B(\alpha, K)$, maintaining a given type I error rate $\alpha$ for a total number of analyses $K$, is also produced via numerical integration techniques. In the first $k = 1, \ldots, K-1$ interim analyses, terminate the trial and reject the null hypothesis if $|Z_k| \ge C_B(\alpha, K)\sqrt{K/k}$, or continue the trial otherwise. In the final analysis, reject the null hypothesis if $|Z_K| \ge C_B(\alpha, K)$; otherwise fail to reject the null hypothesis. For example, for a trial designed with $K = 4$ planned analyses to maintain $\alpha = 0.05$, $C_B(\alpha, K) = 2.024$, forming non-constant boundaries.
2.3.3 The Power Family Method of Wang and Tsiatis
Wang and Tsiatis (1987) introduce what they call the "optimal" boundaries for interim analysis, which are indexed by a single parameter $0 \le \Delta \le 1$. They define "optimal" as: given $\alpha$, $K$, and power $1 - \beta$, producing boundaries that minimize the average sample number (ASN). In the first $k = 1, \ldots, K-1$ interim analyses, terminate the trial and reject the null hypothesis if $|Z_k| \ge C_{WT}(\alpha, K, \Delta)\,(k/K)^{\Delta - 1/2}$, or continue the trial otherwise. In the final analysis, reject the null hypothesis if $|Z_K| \ge C_{WT}(\alpha, K, \Delta)$; otherwise fail to reject the null hypothesis. A feature of this method is that values of $\Delta = 0$ and $\Delta = 0.5$ yield boundaries similar to those generated by the O’Brien and Fleming method and the Pocock method respectively.
2.4 The $\alpha$-Spending Function
If the number and times of interim analyses are known before conducting a trial, ensuring each stage acquires an equal amount of information, then the methods in the section above can be readily applied. In practice, the schedule and total number of analyses are less predictable. For example, a data safety monitoring board (DSMB) may meet at intervals dictated by calendar schedules, and those meetings may not coincide with equal numbers of events between analyses. As shown by DeMets and Gail (1985), unequal increments in information using these methods impact the overall type I error.
2.4.1 The Lan and DeMets Spending Function
Lan and DeMets (1983) first proposed a group sequential method that produces boundaries determined by critical values chosen such that the total probability of crossing those values during the duration of the trial is exactly the type I error rate, $\alpha$. Utilizing a "spending" function, the entirety of $\alpha$ is spent over the interim analyses. In other words, the $\alpha$-spending function describes the rate at which the total $\alpha$ is spent as a function of information fraction. Using this method, there is greater flexibility in designing trials without knowing the total number of analyses, and the restriction of equal increments of information is relaxed. The spending function $\alpha^*(t)$, $0 \le t \le 1$, is defined as an increasing function where $\alpha^*(0) = 0$ and $\alpha^*(1) = \alpha$.
For a trial with $K$ interim analyses, $k = 1, 2, \ldots, K$, and desired type I error of $\alpha$ and power of $1 - \beta$, let $t_k$ be the accumulated information fraction up to analysis $k$. Specifically,
$$t_k = \frac{\text{observed number of events up to analysis } k}{\text{expected number of total events at the end of the trial}}$$
The motivation for this approach comes from discretizing continuous boundaries for a Brownian motion. Let $\{B(t): 0 \le t \le 1\}$ be a standard Brownian motion process and $\{g(t), 0 \le t \le 1\}$ a continuous boundary. If $\{B(t)\}$ were to be observed only at discrete times $0 < t_1 < t_2 < \cdots < t_K = 1$, as $B(t_k)$, $k = 1, \ldots, K$, a discrete boundary $\{g'(t_k)\}$ can be constructed so that
$$P[B(t_1) \ge g'(t_1)] = P\{B(t) \ge g(t), \text{ for some } t \le t_1\} = \alpha^*(t_1).$$
In essence, the boundary crossing probability of a continuous process accumulates and is assigned to the discrete time points in this case.
For $k = 2, \ldots, K$, constants $g'(t_2), \ldots, g'(t_K)$ can be found such that
$$P[B(t_1) < g'(t_1), \ldots, B(t_{k-1}) < g'(t_{k-1}), B(t_k) \ge g'(t_k)]$$
$$= P\{B(t) \ge g(t), \ t_{k-1} \le t \le t_k\}$$
$$= \alpha^*(t_k) - \alpha^*(t_{k-1})$$
In other words, $\alpha^*(t_{k+1}) - \alpha^*(t_k)$ is the amount of $\alpha$ spent from analysis $k$ to analysis $k + 1$. It can be shown that $g'(t_1) = \sqrt{t_1}\,\Phi^{-1}\{1 - \alpha^*(t_1)/2\}$ for a two-sided test, where $\Phi(\cdot)$ is the cumulative distribution function for a standard normal distribution. The remaining $g'(t_2), \ldots, g'(t_K)$ are found using numerical integration as outlined by Armitage et al. (1969). Standardizing the statistics and their corresponding boundaries for use in clinical practice, the boundary becomes
$$c_k = \frac{g'(t_k)}{\sqrt{t_k}}, \quad k = 1, 2, \ldots, K$$
and the process becomes $Z(t_k) = B(t_k)/\sqrt{t_k}$ (DeMets and Lan 1989), which under the null hypothesis or small deviations from the null value has asymptotically independent increments (DeMets and Lan 1994, Lan and Zucker 1992).
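This discretization can be sketched numerically. The following is a minimal sequential Monte Carlo sketch (illustrative only; an exact implementation would use the recursive numerical integration of Armitage et al. 1969): at each look it spends $\alpha^*(t_k) - \alpha^*(t_{k-1})$ among the paths that have not yet crossed. With the linear spending function $\alpha^*(t) = \alpha t$ at equal-interval times, the first Z-scale boundary should be close to $\Phi^{-1}(1 - 0.01/2) \approx 2.576$.

import numpy as np

def spending_boundaries(spend, t, n_sim=500_000, seed=3):
    # Sequential Monte Carlo construction of two-sided boundaries g'(t_k) for a
    # Brownian motion observed at information fractions t[0] < ... < t[-1] = 1,
    # so that the cumulative crossing probability at t_k equals spend(t_k).
    rng = np.random.default_rng(seed)
    t = np.asarray(t, dtype=float)
    inc = rng.normal(size=(n_sim, len(t))) * np.sqrt(np.diff(np.concatenate([[0.0], t])))
    B = np.cumsum(inc, axis=1)
    alive = np.ones(n_sim, dtype=bool)       # paths that have not yet crossed
    bounds, spent = [], 0.0
    for k in range(len(t)):
        n_cross = int(round(n_sim * (spend(t[k]) - spent)))   # alpha spent now
        absB = np.abs(B[alive, k])
        b = np.sort(absB)[-n_cross] if n_cross > 0 else np.inf
        bounds.append(b)
        alive[alive] = absB < b              # stop the paths that crossed
        spent = spend(t[k])
    return np.array(bounds)

alpha = 0.05
b = spending_boundaries(lambda s: alpha * s, [0.2, 0.4, 0.6, 0.8, 1.0])
print(np.round(b / np.sqrt([0.2, 0.4, 0.6, 0.8, 1.0]), 3))   # Z-scale boundaries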
Kim (1989) and Kim and DeMets (1987) investigated the effect of the magnitude of the boundaries and the point estimates produced by five different spending functions,
$$\alpha_1^*(t) = 2\left\{1 - \Phi\left(z_{\alpha/2}/\sqrt{t}\right)\right\}$$
$$\alpha_2^*(t) = \alpha \log[1 + (e - 1)t]$$
$$\alpha_3^*(t) = \alpha t$$
$$\alpha_4^*(t) = \alpha t^{1.5}$$
$$\alpha_5^*(t) = \alpha t^2$$
under three types of information accumulating intervals: equal interval analysis at $t^{(1)} = \{0.2, 0.4, 0.6, 0.8, 1\}$, late analysis at $t^{(2)} = \{0.3, 0.6, 0.8, 0.9, 1\}$, and early analysis at $t^{(3)} = \{0.1, 0.2, 0.3, 0.6, 1\}$. Lan and DeMets (1983) showed that under equal intervals, $\alpha_1^*$ approximated the boundaries of the O’Brien and Fleming method and $\alpha_2^*$ those of the Pocock method. Figure 2.1 shows the graph of each spending function for $\alpha = 0.05$.
Table 1 of Kim and DeMets (1987) shows that early analysis produces bigger boundaries early on, resulting in a much smaller probability of terminating a trial in the early stages. Late analysis produces smaller boundaries early on, with equal interval analysis producing boundaries between those of late and early analysis. In table 3 they show that the expected stopping times increase the later the analyses occur, with the Pocock-type boundaries yielding the smallest times. Kim (1989) shows that point estimates achieved in early stages are biased, a finding corroborated by Pocock and Hughes (1989).
Slud and Wei (1982) recommend using a steadily increasing spending function, since it will result in steadily decreasing boundary values, in turn lessening the chance of observing a significant result at $t_k$ and subsequently a non-significant result at $t_{k+1}$. For this very reason, Kim and DeMets (1987) suggest using only convex spending functions such as $\alpha_1^*$, $\alpha_4^*$, and $\alpha_5^*$. One of the few criticisms of the spending function approach comes by way of Fleming, Harrington, and O'Brien (1984), who point out that the total information actually accumulated by an analysis is difficult to determine.
2.4.2 The Gamma-Family of Hwang, Shih, and De Cani
Hwang, Shih, and De Cani (1990) propose a gamma-family of spending functions which extends the method of Lan and DeMets. The gamma-family of spending functions, $\alpha(\gamma, t)$, is constructed using truncated exponential distributions, where $t$ specifies the proportion of information accrued as before and the parameter $\gamma$ specifies the rate at which alpha is spent as well as the shape of the boundaries. The $\gamma$-family of alpha spending functions is defined as follows:
$$\alpha(\gamma, t) = \begin{cases} \alpha\,\dfrac{1 - e^{-\gamma t}}{1 - e^{-\gamma}}, & \gamma \ne 0 \\[1ex] \alpha t, & \gamma = 0 \end{cases} \qquad \text{for } 0 \le t \le 1$$
where $\alpha(\gamma, 0) = 0$ and $\alpha(\gamma, 1) = \alpha$ for any $\gamma$.
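The gamma-family is straightforward to evaluate; a minimal sketch (illustrative only) follows, with the $\gamma = 0$ branch handling the limiting linear case:

import numpy as np

def gamma_spend(alpha, gamma, t):
    # Hwang-Shih-De Cani gamma-family spending function alpha(gamma, t).
    t = np.asarray(t, dtype=float)
    if gamma == 0:
        return alpha * t                      # limiting case
    return alpha * (1 - np.exp(-gamma * t)) / (1 - np.exp(-gamma))

t = np.array([0.25, 0.5, 0.75, 1.0])
print(gamma_spend(0.05, 1, t))     # spends alpha quickly, Pocock-like
print(gamma_spend(0.05, -4, t))    # conservative early, O'Brien-Fleming-like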
In table 1 of their paper, they conclude that $\alpha_2^*$ as defined above is almost identical to $\alpha(1, t)$, and that $\alpha(-4, t)$ or $\alpha(-5, t)$ approximate the boundaries of $\alpha_1^*$ only for $t \ge 0.6$, since the O’Brien and Fleming boundaries are extremely stringent early on. Here Hwang et al. confirm the optimality of the boundaries of Pocock and O’Brien and Fleming (Wang and Tsiatis 1987), where for lower power (50%) the latter minimizes the expected sample size and for higher power (90%) the former minimizes the expected sample size. For intermediate power (80%), $\alpha(1, t)$ and $\alpha(2, t)$ appear to be optimal. The overall advantages of the gamma-family of functions are flexibility in the choice of boundary and the independence of the rate at which alpha is spent from the choice of $\alpha$ or the sidedness of the test.
2.5 Events Reported With Delay in Survival Analysis
For any given trial, there are events of interest that are preferentially reported as soon as they are ascertained (see chapter 1). For example, a participant may require immediate medical attention for symptoms related to their disease or for possible adverse reactions to treatment, such as death from treatment side effects, where there is a regulatory imperative to report such events as they occur. Other types of events, without such a regulatory imperative, are not ascertained until the next regularly scheduled follow-up visit, since participants cannot feasibly be monitored continuously throughout the duration of the trial. For example, "well" follow-up is usually only reported when a thorough patient evaluation is conducted. Because of cost and the usually invasive nature of evaluation, well follow-up is reported on a schedule. This results in asynchronous ascertainment of well follow-up and follow-up which ends in an event. In the following sections I discuss the contributions of Hu and Tsiatis (1996) and Van der Laan and Hubbard (1998) to the estimation of the survival distribution when ascertainment of events is subject to delay, and the contribution of McIlvaine (2015) in applying a data processing solution to delayed reporting in clinical trials.
2.5.1 Hu and Tsiatis
In this work, the authors develop an estimator for the survival distribution that improves upon the Kaplan-Meier estimator when the censoring process is informative, regardless of the process of ascertainment of event status. When censoring is considered non-ignorable, the Kaplan-Meier estimator is in some cases biased. The authors draw on methods for competing risks and rely on counting process and martingale theory to verify the asymptotic behavior of their estimator.
Consider a trial with $n$ participants recruited to a clinical trial where there are no drop-outs or withdrawals. Let $T$ denote the continuous time to failure for a participant, measured from their entry to the trial. Let $t_j$ represent the time of the $j$th evaluation for a participant and $r_j$ the time when the participant's status at $t_j$ is reported. Thus, the interval of time between a participant's evaluation time and the time their status is reported is the delay (the time between $t_j$ and $r_j$). Assuming all participants are followed until the occurrence of an event, the failure time is recorded at time $t_m$, meaning the participant's status is recorded $(m - 1)$ times before the occurrence of an event. Let $N(u)$ be an indicator of the participant's status at time $u$, where $N(u) = 1$ signifies the occurrence of the event. Let $A(u)$ denote the time at which the participant's status at $u$ is known for the first time. The authors represent data for the $i$th individual as a bivariate process $\{A_i(u), N_i(u); u \ge 0\}$.
The authors make two key assumptions. The first is that the amount of reporting delay is bounded above by a constant $\epsilon(u)$, such that $P(A(u) \le u + \epsilon(u) \mid N(u) = 1) = 1$, meaning that if a participant experiences an event by time $u$, then the information will be known by time $u + \epsilon(u)$. Second, they assume the follow-up time, $C$, is independent of $\{(t_1, r_1), \ldots, (t_m, r_m), (T, m)\}$, suggesting that the failure time distribution and the process of ascertainment remain stable over the course of the trial.
Drawing from the competing risks literature, let the cause-specific hazard functions for time to ascertainment be defined by
$$\lambda_j(u, x) = \lim_{h \to 0} \frac{1}{h} P\{x \le A(u) < x + h,\ N(u) = j \mid A(u) \ge x\}, \quad (j = 0, 1)$$
Define the sub-distribution function
$$F(u, x) := P\{A(u) \le x,\ N(u) = 1\}$$
and
$$F(u, x) = \int_0^x \exp\{-(\Lambda_0(u, v) + \Lambda_1(u, v))\}\,\lambda_1(u, v)\,dv$$
where $\Lambda_j$ is the cumulative hazard function for $A(u)$ and
$$\Lambda_j(u, x) = \int_0^x \lambda_j(u, v)\,dv, \quad (j = 0, 1)$$
Since delay is bounded above by $\epsilon(u)$,
$$1 - S(u) = P\{N(u) = 1\} = P\{A(u) \le u + \epsilon(u),\ N(u) = 1\} = F(u, u + \epsilon(u))$$
In the presence of censoring, observable random variables can be defined as
$$\{X(u), \Delta(u), N^*(u); u \ge 0\}$$
where $X(u)$ is defined as the minimum of $A(u)$ and the follow-up time $C$. Let $\Delta(u)$ be an indicator of whether event status at time $u$ is known, and $N^*(u)$ be the event indicator at time $u$, where $N^*(u) = N(u)$ only when $\Delta(u) = 1$. Assuming only observable information, it can be shown that the cause-specific hazard for the observable random variables is
$$\lambda_j^*(u, x) = \lim_{h \to 0} \frac{1}{h} \frac{P\{x \le A(u) < x + h,\ C \ge x,\ N(u) = j\}}{P\{A(u) \ge x,\ C \ge x\}}, \quad (j = 0, 1)$$
And since the failure time distribution and the process of ascertainment remain stable throughout the trial,
$$\lambda_j^*(u, x) = \lambda_j(u, x), \quad (j = 0, 1).$$
Utilizing counting process theory, they derive an estimator for the survival distribution as
$$\hat S(u) = 1 - \sum_{i=1}^n \frac{\hat H\{u, X_i(u)\}}{Y\{u, X_i(u)\}}\, I\{X_i(u) \le u + \epsilon(u),\ \Delta_i(u) = 1,\ N_i(u) = 1\}$$
where $Y\{u, X_i(u)\}$ is the following sum over $j = 1, \ldots, n$: $\sum_j I\{X_j(u) \ge X_i(u)\}$, the number at risk, and $\hat H\{u, x\}$ is a product-limit estimator of remaining neither ascertained nor censored just prior to $x$. In the case of no delays in reporting, this estimate reduces to the Kaplan-Meier estimator.
Lastly, using counting process theory and the martingale central limit theorem, $\sqrt{n}\{\hat S(u) - S(u)\}$ achieves asymptotic normality and has an estimable variance.
In their discussion, the authors note that the potential source of bias in the standard approach is censoring participants at the last point at which their status was known. While this approach may be innovative in treating event incidence and censoring as competing risks, there are assumptions made that may not necessarily apply in practical applications, such as a maximum possible duration for delays in reporting and the assumption that all visit and reporting times are known and recorded accurately. The second assumption made in their paper, that follow-up time is independent of the ascertainment process, is debatable in the case when participants are censored at their last follow-up visit, making the observed survival time a function of their follow-up schedule. Also useful to note, and relevant to this dissertation, is that having an unbiased estimator of the time-to-event distribution does not provide a direct estimate of the treatment effect. Finally, Van der Laan and Hubbard extend this novel approach to include reportable covariates like treatment assignment.
2.5.2 Van der Laan and Hubbard
Van der Laan and Hubbard (1998) extend the work done by Hu and Tsiatis (1996), modifying their estimator using the 'inverse probability of censoring weighted' estimator of $F(u)$ from Robins (1993), which works in more general situations. The estimator they propose incorporates covariates, is locally efficient, and allows for dependent censoring, i.e. it allows censoring to depend on survival time through the ascertainment process.
The authors assume that the time of analysis, $C$, is independent of the vector $(A^*(u), T)$, where $A^*(u)$ is the earliest time at which $N(u) = I(T \le u)$ is known. It follows that
$$G\{A^*(u)\} = P\{\Delta(u) = 1 \mid T, A^*(u)\} = P\{C \ge A^*(u) \mid T, A^*(u)\}$$
And also, since $N(u) = I(T \le u)$, an estimator for $F(u)$ results in
$$\hat F(u) = \frac{1}{n} \sum_{i=1}^n \frac{I(T_i \le u)\,\Delta_i(u)}{G\{A_i^*(u)\}}.$$
An estimator for $G$ is obtained by way of a Kaplan-Meier estimator based on the observations $(X(u), 1 - \Delta(u))$. Letting $\pi_{A^*(u)}(x) = P\{A^*(u) \ge x\}$ and $\pi_{X(u)}(x) = P\{X(u) \ge x\}$, and using the independence of $C$ and $(A^*(u), T)$, then
$$\frac{1}{G(x)} = \frac{\pi_{A^*(u)}(x)}{\pi_{X(u)}(x)}$$
And the modified estimator can be expressed as
$$\hat F(u) = \frac{1}{n} \sum_{i=1}^n \frac{\hat\pi_{A^*(u)}\{X_i(u)\}}{\hat\pi_{X(u)}\{X_i(u)\}}\, I(T_i \le u)\,\Delta_i(u)$$
where $\hat\pi_{A^*(u)}$, the Kaplan-Meier estimator, is obtained from the data $\{X_i(u), \Delta_i(u)\}$, and $\hat\pi_{X(u)}(x)$ is the proportion of subjects with $X_i(u) \ge x$. The authors note that when the Hu and Tsiatis bound on the delay, $\epsilon(u)$, is set to infinity, the Hu and Tsiatis estimator reduces to theirs, making $\epsilon(u)$ unnecessary.
2.5.3 McIlvaine
McIlvaine (2015) investigated the effect misreporting of events has on data processing methodologies and on the estimates of the hazard rates, within each treatment group, that they produce when analysis is done once enrollment of trial participants is complete and participants have completed treatment according to the study protocol. She also cites literature where misreporting (or delay) of events has been found to potentially bias estimates if analytic methods do not account for it (see McIlvaine, chapter 2). She defines three data processing methods, which involve changing the way study participant information is incorporated into estimates and statistical tests, and presents them as a simple way to deal with the bias that results from misreporting events. The first is the personal-cutback method, where trial participants' follow-up obtained after the last scheduled visit before the analytic time point is not included in the analysis. In this case participants' survival time is censored at their last follow-up visit. The next is the global-cutback method, where the contribution of data to analysis is pushed back a small window of time equal to the scheduled time between routine visits. Finally, there is the pull-forward method, where all non-events are censored at the time of analysis. Via simulation and rigorous derivations of parameters, she compares their performance to the standard method of data processing (where non-events are censored at the last visit and events are reported at any time).
McIlvaine found that, under clinical trial design considerations similar to those explored in this dissertation (see chapter 4), the standard method can lead to severe bias of the hazard rate estimates as the probability of late-reported or unreported events increases or as the length of time between scheduled follow-up visits increases. Both the personal cutback and the global cutback methods produce unbiased estimates of the hazard rates and are not affected by the misreporting of events. With regard to performance, applying the global cutback data processing method yields a reduction in power relative to that which would be obtained if the analyst had complete knowledge of the data by the time of analysis. This poor performance is due to a tremendous loss of information from eliminating an entire calendar period of data acquisition. The pull-forward method performs well only when the assumption that all participants without a reported event by their last clinical visit survive to the time of analysis holds. When it does not, McIlvaine shows this method can exhibit even more bias than the standard method. Since the personal cutback method employs an ignorable censoring mechanism and produces unbiased estimates of hazard rates regardless of the level of misreporting of events, she recommends applying this method of data processing to conduct statistical tests of the difference between survival rates in treatment groups.
Notable from her findings is that the longer a trial runs, the more information is garnered and the less bias we would expect to see in estimates. Also, any loss in power can be remediated by extending the observation period until the number of events required by the original design is obtained. Since the behavior of the estimates produced by the data processing methodologies presented in McIlvaine's work is sensitive to a reduction in observed trial time, the performance of such processing methods and their estimates requires exploration when tests are conducted while active enrollment or treatment is ongoing.
Chapter 3 Estimation of the Correlation between Increments of the Score Function
This chapter explores the assumption of independent increments of the score function under four data processing methods, and the effects of delayed reporting of events and of the window length between scheduled clinical visits. As discussed in the previous chapter, the assumption of independent increments of the score function is fundamental to the theoretical framework for group-sequential methods and the boundaries produced to conduct interim analyses.
3.1 Assumptions and Censoring
A key assumption of likelihood-based methods of estimation of regression parameters in the Cox model and in parametric models is the notion of independent censoring. Independent censoring occurs when the participants who are censored in a trial form a representative subgroup of those who remained at risk with respect to their survival experience. Such a censoring mechanism is said to be ignorable in analysis. Two types of ignorable censoring are described in section 1.1.1. The reason ignorable censoring is required in the analysis can be shown through contributions to the likelihood.
Consider data of the form $(t_i, \delta_i)$ for participants $i = 1, 2, \ldots, n$, where $t_i$ represents the time when the $i$th participant is no longer under observation and $\delta_i$ the indicator for event status. If we consider $t_i$ and $\delta_i$ to be realizations determined by random variables $T$ and $C$, the time-to-event variable and the time to censoring, then $f(t)$ and $g(c)$ are the respective probability density functions and $F(t)$ and $G(c)$ the cumulative distribution functions.
Given a set of predictors $x_i$ and vector of parameters $\beta$, the contribution to the likelihood for trial participant $i$ experiencing an event ($\delta_i = 1$) while under observation is
$$\lim_{h \to 0} \frac{1}{h} P\{t_i \le T < t_i + h,\ C > t_i;\ x_i, \beta\}.$$
If trial participant $i$ did not experience an event ($\delta_i = 0$) while under observation, then their contribution to the likelihood is
$$\lim_{h \to 0} \frac{1}{h} P\{t_i \le C < t_i + h,\ T > t_i;\ x_i, \beta\}.$$
When the time-to-event distribution and the censoring distribution depend on the same parameters, as when the censoring is not ignorable, the likelihood must be expressed in terms of the joint distribution of $T$ and $C$. This creates problems in that once one of the random variables for time is observed, the other remains unobservable, which can make modeling covariate effects unidentifiable.
For the remainder of this chapter, I consider a clinical trial with two treatment groups. In this trial, participants are enrolled uniformly over an enrollment period and are randomly assigned to the treatment or control group in a 1:1 ratio. Trial participants are under observation until the end of the enrollment-free follow-up period or until an event is observed. The analysis plan includes $K - 1$ interim analyses and a final analysis planned at the end of the enrollment-free interval. I also assume all participants have equal intervals of time between follow-up visits from the time of enrollment.
3.1.1 Data processing methods
The approach of this research is to conform the manner in which data are processed to meet the requirements of ignorable censoring, and thereby meet model specifications, instead of attempting to model the joint distribution of time-to-event and censoring. The following is a detailed explanation of the methods of data processing under review in this research in the context of interim analyses. To begin, the standard practice of censoring participants without a reported event at their last follow-up visit before the analytic time point will be referred to as the standard method (std). Specifically, between participants' last scheduled follow-up visit and the analytic time point, the standard method provides for the possibility of reporting only events, since well follow-up can only be verified by the patient evaluation methods employed at a scheduled follow-up visit.
The proposed approach to conform the data to the model is to censor all participants at their last follow-up visit before the analytic time point in the case where an event is reported between the time of that last visit and analysis. This method also requires the participant to have at least one follow-up visit before the analytic time point for their time to contribute to analysis. Should a trial participant be enrolled in the study at a time such that the analytic time point occurs prior to the first scheduled visit, then this participant would not be counted in the analysis until the next analytic time point. While standard likelihood methods can be used to generate estimators of the treatment effect of interest, this method still depends on the follow-up schedule for the trial. This method will be referred to simply as the personal-cutback method (cb1).
Another approach is to use a previous time point, a 'relevant date' before the analytic time point, by which the event statuses of all participants would be reported and up to date. Any follow-up data obtained after this relevant date is not included in the analysis. The amount of time chosen to move back the analytic time point to this relevant date is usually the same as the interval of time between scheduled follow-up visits, essentially moving the analytic time point back a window width of time. With regard to interim analyses conducted early in a trial, a cutback approach of this kind will exclude participants enrolled during the period between the relevant date and the analytic time point, affecting the amount of information accumulated for the early analyses. This method will be referred to as the global-cutback method (cb2).
Yet another method of processing data is to assume participants who are alive at the time of their last follow-up visit survive to the analytic time point, essentially 'pulling forward' their time on the study. If no events occurred during this period, then the resulting estimates would be unbiased. If the assumption is incorrect, then the hazard rate would be underestimated (McIlvaine 2015). This method is referred to as the pull-forward method (plf). The pull-forward method, like the standard method, allows for preferential reporting of certain events between participants' last clinical visit before the analytic time point while other events are not ascertained until the next visit (reported with delay).
For methods that rely on 'cutting back' the analytic time point, either study-wide (cb2) or for each participant (cb1), there is an increase in the variance of the effect estimate, since the number of reported days at risk and the number of events are reduced compared with perfectly ascertained event-related data. The benefit of the cutback methods is that they are not subject to preferential or delayed reporting of events.
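To fix ideas, the following Python sketch (illustrative only; the function and its arguments are hypothetical constructions, not code used in this dissertation) encodes the contribution of a single participant at an analytic time point under each rule, with a perfect-ascertainment rule included as a reference:

import math

def process(e, T, delayed, w, tau, method):
    # Return (time_on_study, event_indicator) contributed at calendar analysis
    # time tau by a participant enrolled at e with calendar event time T (use
    # math.inf for no event); delayed indicates the event is only ascertained
    # at the next scheduled visit. Returns None if the participant contributes
    # nothing at this analysis. w is the time between scheduled follow-up visits.
    if e >= tau:
        return None                                   # not yet enrolled
    if method == "cb2":                               # global cutback: move the
        return process(e, T, delayed, w, tau - w, "perfect")  # analysis back w
    last_visit = e + w * math.floor((tau - e) / w)    # last scheduled visit
    if method == "perfect":                           # complete ascertainment
        return (T - e, 1) if T <= tau else (tau - e, 0)
    if method == "cb1":                               # personal cutback
        if last_visit <= e:
            return None                               # no visit yet: excluded
        return (T - e, 1) if T <= last_visit else (last_visit - e, 0)
    if method == "std":                               # standard method
        if T <= last_visit or (T <= tau and not delayed):
            return (T - e, 1)                         # reported event
        if last_visit <= e:
            return None                               # no visit, nothing reported
        return (last_visit - e, 0)                    # censored at last visit
    if method == "plf":                               # pull-forward
        if T <= last_visit or (T <= tau and not delayed):
            return (T - e, 1)
        return (tau - e, 0)                           # assumed alive at tau
    raise ValueError(method)

Under this encoding, "std" with delayed=True for every event reproduces "cb1" for participants with a prior visit, mirroring the reduction of the standard method to the personal cutback method noted later in section 3.2.3.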
3.2 Estimating the asymptotic behavior of the correlation between increments of the score function
Sequential methods for interim monitoring of clinical trials depend on the multivariate normal distribution to construct boundaries defining regions of rejection and continuation, whose purpose is to maintain the nominal type I error rate at $\alpha$ while intermittently testing the null hypothesis against local alternatives. Recall from section 2.2 that, under the null hypothesis of no treatment effect, the distribution of the increments of the score statistic has a limiting multivariate normal distribution with an independent increments covariance structure. In other words, the increment achieved from the test of hypothesis at analysis $k$, with $I_k$ amount of information accrued, is independent of (uncorrelated with) the increment to the score statistic achieved at the following analysis, with $I_{k+1}$ amount of information accrued. Deviations from this assumption can lead to generated boundaries that do not have the frequency properties associated with independent increments.
In the following sections I describe the score statistic associated with the exponential regression test for treatment effect, describe the mathematical methodology employed to derive the asymptotic distribution of the increments of the score statistic, and lastly derive the covariance structure of the increments of the score function under the four data processing methods and the effect of delayed reporting and of the length of the window between scheduled follow-up visits.
3.2.1 Maximum likelihood estimate of the hazard ratio
In the case where event times come from an exponential distribution, let the model for the failure rate depend on $\beta = [\beta_0, \beta_1]^\top$ and the hazard function be written as
$$\lambda(t; x) = \exp(\beta^\top x).$$
Here $x_0 = 1$, so $\lambda = \exp(\beta_0)$ denotes the failure rate when $x_1 = 0$, and $x_1$ is an indicator for assignment to the experimental treatment arm ($x_1 = 1$). Also, the hazard rates for participants in the control and experimental treatment groups are $\lambda_1 = e^{\beta_0}$ and $\lambda_2 = e^{\beta_0 + \beta_1}$ respectively. The hazard ratio can then be expressed as
$$\frac{\lambda_2}{\lambda_1} = e^{\beta_1}$$
and therefore $\beta_1$ is a parameter that indicates the true treatment effect between groups.
According to Kalbfleisch and Prentice (2002), when the assumption of independent censoring holds, the likelihood function for $\beta$ can be written as
$$L(\beta) = \prod_{i=1}^n \left[\exp(\beta^\top x_i)\right]^{\delta_i} \exp\left\{-t_i \exp(\beta^\top x_i)\right\}.$$
Here $x_i$ and $\delta_i$ are the corresponding regression vector and indicator for failure ($\delta_i = 1$ if failure, 0 otherwise) for the $i$th trial participant. The resulting score functions for the elements of $\beta$ are
$$U_j(\beta) = \sum_{i=1}^n \delta_i x_{ij} - \sum_{i=1}^n t_i x_{ij} \exp(\beta^\top x_i).$$
Specifically, separating the sum over all $i$ between members corresponding to each treatment group, we see that
$$U_0(\beta) = D_1 + D_2 - e^{\beta_0} T_1 - e^{\beta_0 + \beta_1} T_2$$
$$U_1(\beta) = D_2 - e^{\beta_0 + \beta_1} T_2$$
where $D_g$ and $T_g$ denote the total number of events and total time under observation for participants in treatment group $g$, $g \in \{1, 2\}$. By solving first for the maximum likelihood estimate (MLE) of $\beta_0$ from $U_0(\beta)$ under the null hypothesis, that is, setting $\beta_1 = 0$, and then evaluating the second score function at the MLE from the first score function, again under the null hypothesis, $U_1(\hat\beta_0, \beta_1 = 0)$, we arrive at the score statistic
$$U = D_2 - \frac{T_2(D_1 + D_2)}{T_1 + T_2}. \tag{3.1}$$
Here (3.1) is asymptotically normally distributed with variance $\dfrac{(D_1 + D_2)\,T_1 T_2}{(T_1 + T_2)^2}$ and can be used to test for a treatment effect, $\beta_1 = 0$.
In the setting of interim analyses, the score test using (3.1) is conducted at every analytic time point with the information accumulated up to that point. Assume a trial is designed to have $K$ total analysis time points, $k = 1, \ldots, K$ ($K - 1$ interim analyses and one final analysis). Therefore we can define the score statistic achieved at monitoring time $k$ as
$$U_k = D_{2,k} - \frac{T_{2,k}(D_{1,k} + D_{2,k})}{T_{1,k} + T_{2,k}} \tag{3.2}$$
where $D_{g,k}$ and $T_{g,k}$ denote the total number of events and total time under observation by analysis $k$ for participants in treatment group $g$, $g \in \{1, 2\}$.
Under the null hypothesis,
$$(U_1, U_2, \ldots, U_K) \tag{3.3}$$
is asymptotically normally distributed with a covariance matrix that has an independent increments structure, meaning $(U_1, U_2 - U_1, \ldots, U_K - U_{K-1})$ are independent.
I use equation 3.2 to explore the assumption of independent increments in the setting of a trial design as described above under each of the data processing methods in review, and the effect of delayed reporting and the length of the window between visits on this assumption.
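For concreteness, a minimal sketch (illustrative only; the group totals shown are invented for the example) of the score statistic (3.2) and its estimated null variance:

import math

def score_stat(d1, t1, d2, t2):
    # Score statistic (3.2) and its estimated null variance from group totals:
    # d_g reported events and t_g total reported time under observation.
    u = d2 - t2 * (d1 + d2) / (t1 + t2)
    v = (d1 + d2) * t1 * t2 / (t1 + t2) ** 2
    return u, v

u, v = score_stat(d1=45, t1=210.0, d2=30, t2=225.0)   # illustrative totals only
print(u, u / math.sqrt(v))                             # score and standardized Z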
3.2.2 The multivariate delta method
In this section I provide a detailed description of a general framework I developed which, by way of the multivariate delta method, I use to derive the asymptotic covariance matrix of the score statistics, the covariance matrix of the increments of the score statistics, and the asymptotic correlation coefficients between increments of the score statistic.
First, letting the individual realizations of the variables for event status and time under observation be indexed by $i$ and be independently and identically distributed, under the Weak Law of Large Numbers (here, $\delta_i$ and $t_i$ represent the individual event status indicator and the time under observation),
$$\bar D_{g,k} = \frac{1}{n} \sum_{i=1}^n \delta_{i,g,k} \to E(\delta_{g,k})$$
and
$$\bar T_{g,k} = \frac{1}{n} \sum_{i=1}^n t_{i,g,k} \to E(t_{g,k}).$$
Next, consider the vector
$$W = [\bar D_{1,1}, \bar T_{1,1}, \ldots, \bar D_{1,K}, \bar T_{1,K}, \bar D_{2,1}, \bar T_{2,1}, \ldots, \bar D_{2,K}, \bar T_{2,K}]^\top \tag{3.4}$$
whose elements, $\bar D_{g,k}$ and $\bar T_{g,k}$, denote the mean event status and mean time under observation at the $K$ times of analysis, $k = 1, \ldots, K$, for participants in treatment groups 1 (control) and 2 (experimental) ($g \in \{1, 2\}$).
By the multivariate central limit theorem we have that
$$\sqrt{n}\left(W - E(W)\right) \to N(\mathbf{0}, \Sigma)$$
where $E(W)$ is the element-wise expectation of $W$ in (3.4) and $\mathbf{0}$ is a $4K \times 1$ zero vector. Here, $\Sigma$ is a block diagonal matrix, since event statuses and times under observation are assumed independent between treatment groups, enrollment being done at random. Specifically,
$$\Sigma = \mathrm{diag}\left(\Sigma_{W,1}, \Sigma_{W,2}\right) \tag{3.5}$$
where
$$\Sigma_{W,g} = \mathrm{Cov}\left[(\delta_{g,j}, t_{g,j}), (\delta_{g,k}, t_{g,k})\right], \quad j = 1, \ldots, K;\ k = 1, \ldots, K \tag{3.6}$$
and $\Sigma_{W,1}$ and $\Sigma_{W,2}$ have dimensions $2K \times 2K$ since there are two treatment groups. Define a function $h$ such that $h: \mathbb{R}^{4K} \to \mathbb{R}^K$,
$$h(W) = [h_1(W), \ldots, h_K(W)]^\top \tag{3.7}$$
where I let $W$ be as in (3.4) and define $h_k(W)$ such that evaluating the function at $W$ yields the score statistic at time $k$ as in (3.2),
$$h_k(W) = \bar D_{2,k} - \frac{\bar T_{2,k}(\bar D_{1,k} + \bar D_{2,k})}{\bar T_{1,k} + \bar T_{2,k}}. \tag{3.8}$$
Note the random variable $t_{g,k}$, denoting survival time, is assumed to be a continuous random variable with non-negative support. Therefore the sum of survival times is assumed to be bounded away from 0. By the multivariate delta method, the asymptotic distribution of $h(W)$ is:
$$\sqrt{n}\left(h(W) - h(E(W))\right) \to N(\mathbf{0}, \Sigma^*) \tag{3.9}$$
with covariance matrix
$$\Sigma^* = J(E(W))\, \Sigma\, J(E(W))^\top \tag{3.10}$$
where $J(E(W))$ is the Jacobian evaluated at $E(W)$, such that
$$J(E(W)) = \begin{bmatrix} \dfrac{\partial h_1}{\partial E(\bar D_{1,1})} & \cdots & \dfrac{\partial h_1}{\partial E(\bar T_{2,K})} \\ \vdots & \ddots & \vdots \\ \dfrac{\partial h_K}{\partial E(\bar D_{1,1})} & \cdots & \dfrac{\partial h_K}{\partial E(\bar T_{2,K})} \end{bmatrix}$$
and $\Sigma$ is defined as in (3.5). The covariance matrix $\Sigma^*$ is a $K \times K$ matrix such that
$$\Sigma^*_{jk} = \mathrm{Cov}\left(h_j(W), h_k(W)\right), \quad j = 1, \ldots, K;\ k = 1, \ldots, K. \tag{3.11}$$
The elements of $J(E(W))$ for $k = 1, \ldots, K$ are:
$$\frac{\partial h_k}{\partial E(\bar D_{1,k})} = \frac{-E(\bar T_{2,k})}{E(\bar T_{1,k}) + E(\bar T_{2,k})}$$
$$\frac{\partial h_k}{\partial E(\bar D_{2,k})} = \frac{E(\bar T_{1,k})}{E(\bar T_{1,k}) + E(\bar T_{2,k})}$$
$$\frac{\partial h_k}{\partial E(\bar T_{1,k})} = \frac{E(\bar T_{2,k})\left\{E(\bar D_{1,k}) + E(\bar D_{2,k})\right\}}{\left\{E(\bar T_{1,k}) + E(\bar T_{2,k})\right\}^2}$$
$$\frac{\partial h_k}{\partial E(\bar T_{2,k})} = \frac{-E(\bar T_{1,k})\left\{E(\bar D_{1,k}) + E(\bar D_{2,k})\right\}}{\left\{E(\bar T_{1,k}) + E(\bar T_{2,k})\right\}^2}$$
All other partial derivatives are zero, since $h_k$ depends only on the analysis-$k$ components of $W$.
From (3.9), a linear transformation $A$ can be applied to (3.7), where $A$ is defined as
$$[A]_{jk} = \begin{cases} 1 & k = j \\ -1 & k = j - 1 \\ 0 & \text{otherwise} \end{cases}$$
and the covariance matrix of the increments of the score statistic can be found by
$$\Sigma_{\mathrm{inc}} = A\, \Sigma^*\, A^\top. \tag{3.12}$$
From (3.12) I obtain the asymptotic limits of the increment correlations. In the following section I derive the elements of the covariance matrix from (3.6) and explore the assumption of independent increments by deriving asymptotic limits of the elements $\Sigma^*_{jk}$ (3.11) as shown in (3.10).
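The framework can also be checked by simulation. The following minimal sketch (illustrative only; sample sizes and seeds are arbitrary, and the analysis times anticipate the design constants of section 3.2.4) simulates trials under the null with perfect ascertainment and verifies that the increments of $U_k$ are nearly uncorrelated:

import numpy as np

rng = np.random.default_rng(seed=8)
n, n_trials = 500, 4000
E_end, taus = 2.0, [1.3846, 2.2115, 5.0]   # enrollment period; analysis times
U = np.empty((n_trials, len(taus)))
for s in range(n_trials):
    e = rng.uniform(0.0, E_end, n)          # uniform enrollment
    grp = rng.integers(0, 2, n)             # 1:1 randomization
    T = e + rng.exponential(1.0, n)         # calendar event times, lambda = 1
    for k, tau in enumerate(taus):
        on = e < tau                        # enrolled by this analysis
        time = np.minimum(T, tau) - e       # perfect ascertainment
        d = (T <= tau)
        g1, g2 = on & (grp == 0), on & (grp == 1)
        d1, t1 = d[g1].sum(), time[g1].sum()
        d2, t2 = d[g2].sum(), time[g2].sum()
        U[s, k] = d2 - t2 * (d1 + d2) / (t1 + t2)

inc = np.diff(np.column_stack([np.zeros(n_trials), U]), axis=1)
print(np.round(np.corrcoef(inc, rowvar=False), 2))   # ~ identity under H0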
3.2.3 Deriving asymptotic limits of the covariance matrix for $W$
In this section I derive the variance and covariance terms from (3.6) implementing the personal cutback method, the standard method, and the pull-forward method of data processing. For all methods mentioned, using the terms from (3.6), I derive asymptotic values of the covariance matrix in (3.12) to ultimately derive asymptotic limits of the correlation between increments of the score statistics as in (3.7) for $K = 3$. Namely, since $\mathrm{Cov}(X, Y) = E(XY) - E(X)E(Y)$ and $\mathrm{Var}(X) = E(X^2) - E(X)^2$, I derive the relevant expected values. The following structural variables and their effect on the correlation between increments of the score statistics are explored: the probability of reporting events with delay ($p$) and the window between scheduled follow-up visits ($w$). Here $w$ is assumed to be the same for all participants throughout the duration of the trial, and $p$ is the parameter of a Bernoulli random variable $R$ representing whether a trial participant's event is reported with delay. Let $g \in \{1, 2\}$ denote the treatment group assignment, $k \in \{1, \ldots, K\}$ denote the time of analysis, and $\tau_k$ be the calendar time for the $k$th analytic time point. In the following derivations I assume subsequent times of analysis are at least one window length after the previous time of analysis, such that $\tau_{k+1} \ge \tau_k + w$. For data processing methods that rely on participants having a last scheduled visit prior to the analytic time, for any monitoring time that occurs while enrollment is ongoing, very little if any information will be gained if a subsequent monitoring time point occurs within less than a window length, since the calendar time of the last scheduled visit is essentially a function of the enrollment time and $w$.
To motivate this, I begin by deriving the expected values needed to calculate the values for the limiting covariance matrix (3.6), assuming the null hypothesis of no treatment effect, under perfect ascertainment, and assuming random enrollment of participants during the enrollment period $[0, E]$, where $E$ is the calendar time for the end of enrollment. Note that under the null hypothesis $\Sigma_{W,1} = \Sigma_{W,2}$ in (3.5), since the failure rates in both treatment groups are equal. Let $T$ represent time to event, a continuous random variable with density function $f_g(t)$ and survivor function $S_g(t)$ for treatment group $g$; the group subscript is suppressed below. Define $e$ as the enrollment time between 0 and the end of enrollment, $E$. For the $k$th monitoring time, define the status indicator and time variable given study entry time $e$ as:
$$\delta_k = \begin{cases} 1 & 0 \le T \le \tau_k - e \\ 0 & \text{otherwise} \end{cases} \qquad \text{and} \qquad t_k = \begin{cases} T & 0 \le T \le \tau_k - e \\ \tau_k - e & \text{otherwise.} \end{cases}$$
In the following expected values, I omit the conditional distribution of event status at different time intervals and instead express the limits of the integrals as disjoint partitions of time as defined by $\delta_k$ and $t_k$. Integrating over all possible times of enrollment $e$ prior to analysis time $\tau_k$, the expected value of $\delta_k$ is:
$$E(\delta_k) = \frac{1}{a_k} \int_0^{a_k} \int_0^{\tau_k - e} f(t)\, dt\, de \tag{3.13}$$
where the constant $a_k = \min(\tau_k, E)$, such that $[0, a_k]$, $0 \le e \le a_k$, is the interval of time over which participants enter the trial by analysis $k$.
For the expected value of $t_k$ under perfect ascertainment, trial participants surviving beyond the time of analysis contribute $\tau_k - e$ time to the analysis. Therefore the expected value of $t_k$ is expressed as:
$$E(t_k) = \frac{1}{a_k} \int_0^{a_k} \left[ (\tau_k - e)\, S(\tau_k - e) + \int_0^{\tau_k - e} t\, f(t)\, dt \right] de. \tag{3.14}$$
In a similar manner, the expected value of the time on study squared by analytic time point $k$ is:
$$E(t_k^2) = \frac{1}{a_k} \int_0^{a_k} \left[ (\tau_k - e)^2\, S(\tau_k - e) + \int_0^{\tau_k - e} t^2 f(t)\, dt \right] de. \tag{3.15}$$
From (3.13) and (3.14) it follows that
$$E(\delta_k t_k) = \frac{1}{a_k} \int_0^{a_k} \int_0^{\tau_k - e} t\, f(t)\, dt\, de \tag{3.16}$$
since $\delta_k t_k$ has mass only when an event occurs, i.e. when $\delta_k = 1$. The expected value of the product of the time variables at different times of analysis can be expressed as follows for analysis times $j$ and $k$ such that $\tau_j < \tau_k$:
$$E(t_j t_k) = \frac{1}{a_j} \int_0^{a_j} \left[ (\tau_j - e)(\tau_k - e)\, S(\tau_k - e) + \int_0^{\tau_j - e} t^2 f(t)\, dt + (\tau_j - e) \int_{\tau_j - e}^{\tau_k - e} t\, f(t)\, dt \right] de. \tag{3.17}$$
Notice that once the trial progresses to analysis time $\tau_k$ from time $\tau_j$, $t_j$ takes on the value of the censored survival time at time $\tau_j$, $(\tau_j - e)$. Also, should an event be experienced between enrollment and analysis time $\tau_j$, then $t_j = t_k$. In the case where an event does not occur by analysis time $\tau_k$, both $t_j$ and $t_k$ take on the values of their respective censored survival times. Also, note that trial participants must enter the trial by time $a_j = \min(\tau_j, E)$.
Next, the expected value of the product between the status indicator and the reported time at different times of analysis can be expressed in the following manner for, as before, analysis times $\tau_j < \tau_k$:
$$E(\delta_k t_j) = \frac{1}{a_j} \int_0^{a_j} \left[ \int_0^{\tau_j - e} t\, f(t)\, dt + (\tau_j - e) \int_{\tau_j - e}^{\tau_k - e} f(t)\, dt \right] de. \tag{3.18}$$
Notice here that, for an event occurring between the time of enrollment and analysis time $\tau_j$, $t_j = t_k$. As the trial progresses to analysis time $\tau_k$, $t_j$ retains the value of the censored survival time from analysis time $\tau_j$. Again, trial participants must enter the trial by time $a_j = \min(\tau_j, E)$.
For the expected value of $\delta_j \delta_k$ for analysis times $\tau_j < \tau_k$, the product only takes on a value of 1 if $\delta_j = \delta_k = 1$. Therefore, enrollment must occur by $a_j = \min(\tau_j, E)$, and thus
$$E(\delta_j \delta_k) = E(\delta_j). \tag{3.19}$$
Similarly, the expected value of $\delta_j t_k$ for analysis times $\tau_j < \tau_k$ only has mass where $\delta_j = 1$; therefore enrollment must occur by $a_j = \min(\tau_j, E)$, and $t_k$ only takes on values where $t_k = t_j$. Thus,
$$E(\delta_j t_k) = E(\delta_j t_j). \tag{3.20}$$
With expected values (3.13) through (3.20), the asymptotic limits of the covariance matrix (3.12) can be derived, and subsequently the correlation of increments of the score statistic for analysis times $k = 1, \ldots, K$.
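Under an exponential event-time distribution, these expected values are easy to evaluate numerically; a minimal sketch (illustrative only, with $\lambda = 1$ and a two-year enrollment period anticipating section 3.2.4) of (3.13) and (3.14):

import numpy as np
from scipy.integrate import quad

E_end, lam = 2.0, 1.0
S = lambda t: np.exp(-lam * t)          # exponential survivor function
f = lambda t: lam * np.exp(-lam * t)    # exponential density

def E_delta(tau):
    # (3.13) under perfect ascertainment with uniform enrollment on [0, a_k].
    a = min(tau, E_end)
    return quad(lambda e: 1.0 - S(tau - e), 0.0, a)[0] / a

def E_time(tau):
    # (3.14): censored survivors contribute tau - e; events contribute t.
    a = min(tau, E_end)
    inner = lambda e: (tau - e) * S(tau - e) + quad(lambda t: t * f(t), 0.0, tau - e)[0]
    return quad(inner, 0.0, a)[0] / a

for tau in (1.3846, 2.2115, 5.0):
    print(round(tau, 4), round(E_delta(tau), 4), round(E_time(tau), 4))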
Next, for the personal cutback method, recall this processing method requires participants to have at least one follow-up visit before the analytic time point for their time to contribute to analysis. Should a trial participant be enrolled in the study at a time such that the analytic time point occurs prior to the first scheduled visit, then this participant would not be counted in the analysis until the next analytic time point. The status indicator and time variable given study entry time $e$ for the $k$th monitoring time for the personal cutback method are defined as:
$$\delta_k = \begin{cases} 1 & 0 \le T \le v_k(e) - e \\ 0 & \text{otherwise} \end{cases} \qquad \text{and} \qquad t_k = \begin{cases} T & 0 \le T \le v_k(e) - e \\ v_k(e) - e & \text{otherwise.} \end{cases}$$
Here $v_k(e)$ is the last scheduled visit prior to analysis time $\tau_k$ and is a function of the time of enrollment $e$ and the length between scheduled clinical visits, $w$. Consequently, for any time of analysis scheduled to occur while enrollment is ongoing ($\tau_k < E$), or when the analysis is scheduled after the end of enrollment and $\tau_k - E < w$, participants enrolled in the window of time $[\tau_k - w, \tau_k]$ do not have a last scheduled visit prior to the analytic time point at time $\tau_k$. Therefore, enrollment can only occur at or before $\tau_k - w$. Below, I show the expected values in covariance matrix (3.6) implementing the personal cutback method of data processing. As above, I express the limits of the integrals as disjoint partitions of time as defined by $\delta_k$ and $t_k$ under the personal cutback method. Integrating over all possible times of enrollment, $e$, prior to analysis time $\tau_k$, the expected value of $\delta_k$ is:
$$E(\delta_k) = \frac{1}{a_k} \int_0^{b_k} \int_0^{v_k(e) - e} f(t)\, dt\, de \tag{3.21}$$
where $b_k = \min(\tau_k - w, E)$ and $a_k = \min(\tau_k, E)$. Note I omit the integral over enrollment times in $(b_k, a_k]$, since a last scheduled visit is not possible there and thus no events are reported; notice also that this interval shrinks as $w$ decreases. Recall that under the personal cutback method, enrolled participants without an observed event by the time of their last visit have their survival time censored and reported as $v_k(e) - e$. Therefore the expected value $E(t_k)$ can be written as
$$E(t_k) = \frac{1}{a_k} \int_0^{b_k} \left[ (v_k(e) - e)\, S(v_k(e) - e) + \int_0^{v_k(e) - e} t\, f(t)\, dt \right] de. \tag{3.22}$$
Similarly, the expected value of the time on study squared by analytic time point $k$ is:
$$E(t_k^2) = \frac{1}{a_k} \int_0^{b_k} \left[ (v_k(e) - e)^2\, S(v_k(e) - e) + \int_0^{v_k(e) - e} t^2 f(t)\, dt \right] de. \tag{3.23}$$
From (3.21) and (3.22), and since $\delta_k t_k$ has mass only when an event occurs prior to the last scheduled visit, it follows that
$$E(\delta_k t_k) = \frac{1}{a_k} \int_0^{b_k} \int_0^{v_k(e) - e} t\, f(t)\, dt\, de. \tag{3.24}$$
The expected value of the product of the time variables at different times of analysis can be expressed as follows for analysis times $j$ and $k$ such that $\tau_j < \tau_k$:
$$E(t_j t_k) = \frac{1}{a_j} \int_0^{b_j} \Big[ (v_j(e) - e)(v_k(e) - e)\, S(v_k(e) - e) + \int_0^{v_j(e) - e} t^2 f(t)\, dt + (v_j(e) - e) \int_{v_j(e) - e}^{v_k(e) - e} t\, f(t)\, dt \Big]\, de. \tag{3.25}$$
Notice that once the trial progresses to analysis time $\tau_k$ from time $\tau_j$, $t_j$ takes on the value of the censored survival time at time $\tau_j$, $(v_j(e) - e)$. Also, should an event occur between the time of enrollment and the last scheduled visit prior to analytic time $\tau_j$, $v_j(e) - e$, then $t_j = t_k$. If participants survive beyond $v_k(e) - e$, then $t_j$ and $t_k$ are reported as $v_j(e) - e$ and $v_k(e) - e$ respectively.
Next, the expected value of the product between the status indicator and the time variable at different times of analysis can be expressed in the following manner for analysis times $\tau_j < \tau_k$:
$$E(\delta_k t_j) = \frac{1}{a_j} \int_0^{b_j} \left[ \int_0^{v_j(e) - e} t\, f(t)\, dt + (v_j(e) - e) \int_{v_j(e) - e}^{v_k(e) - e} f(t)\, dt \right] de. \tag{3.26}$$
Notice here that, for an event occurring between the time of enrollment and the last scheduled visit prior to analysis time $\tau_j$, $t_j = t_k$. As the trial progresses to analysis time $\tau_k$, $t_j$ retains the value of the censored survival time from analysis time $\tau_j$.
Using the same rationale as above, $\delta_j \delta_k > 0$ only where $\delta_j = \delta_k = 1$. Therefore, enrollment must occur by $b_j = \min(\tau_j - w, E)$, and thus
$$E(\delta_j \delta_k) = E(\delta_j). \tag{3.27}$$
The expected value $E(\delta_j t_k)$ only has mass where $\delta_j = 1$, and therefore enrollment must occur by $b_j$. Also, if $\delta_j = 1$ then $t_k = t_j$, and thus
$$E(\delta_j t_k) = E(\delta_j t_j). \tag{3.28}$$
With expected values (3.21) through (3.28), via covariance matrix (3.12), the asymptotic values of the correlations of the increments can be derived for the personal cutback method of data processing for any length between visits, $w$, so long as $\tau_{k+1} \ge \tau_k + w$.
The standard method of data processing is similar to the personal cutback method in censoring participants' time under observation at their last scheduled follow-up visit if an event is not observed by that time. For the personal cutback method, this means any event occurring between a participant's last visit and the time of analysis is not reported until the subsequent follow-up visit, and so will not be reported until the next time of analysis. Thus, under the personal cutback method, all events occurring in that window of time are treated as reported with delay ($R = 1$). The standard method allows events occurring between a last visit and the time of analysis to be reported, with probability $(1 - p)$, and accounted for by the analytic time point. Define the delay indicator, status indicator, and time under observation by analytic time point $k$ as:
$$R = \begin{cases} 1 & \text{the event is reported with delay} \\ 0 & \text{otherwise} \end{cases}$$
$$\delta_k = \begin{cases} 1 & 0 \le T \le v_k(e) - e \\ 1 - R & v_k(e) - e < T \le \tau_k - e \\ 0 & \text{otherwise} \end{cases}$$
$$t_k = \begin{cases} T & 0 \le T \le v_k(e) - e \\ (1 - R)\,T + R\,(v_k(e) - e) & v_k(e) - e < T \le \tau_k - e \\ v_k(e) - e & \text{otherwise.} \end{cases}$$
Using similar notation as above, the expected value of the status indicator variable is
$$E(\delta_k) = \frac{1}{a_k} \int_0^{b_k} \left[ \int_0^{v_k(e) - e} f(t)\, dt + (1 - p) \int_{v_k(e) - e}^{\tau_k - e} f(t)\, dt \right] de + \frac{1}{a_k} \int_{b_k}^{a_k} (1 - p) \int_0^{\tau_k - e} f(t)\, dt\, de \tag{3.29}$$
where, as above, $b_k = \min(\tau_k - w, E)$ and $a_k = \min(\tau_k, E)$. In the first integral with respect to the enrollment time, events occurring between the last visit and the time of analysis are reported with probability $(1 - p)$. Also, note that for any time of analysis scheduled to occur while enrollment is ongoing ($\tau_k < E$), or when the analysis is scheduled after the end of enrollment and $\tau_k - E < w$, participants enrolled in the window of time $[\tau_k - w, \tau_k]$ do not have a last scheduled visit prior to the analytic time point at time $\tau_k$. In this case only events reported without delay are reported by the analytic time point. The second integral with respect to the enrollment period accounts for the latter case and has a value of zero should the analytic time point occur after enrollment is complete and $\tau_k - E \ge w$.
Similarly, the expected value of the time under observation by analysis time $\tau_k$ needs to account for event times reported without delay. Therefore the expected value $E(t_k)$ can be written as
$$E(t_k) = \frac{1}{a_k} \int_0^{b_k} \Big[ (v_k(e) - e)\, S(\tau_k - e) + \int_0^{v_k(e) - e} t\, f(t)\, dt + (1 - p) \int_{v_k(e) - e}^{\tau_k - e} t\, f(t)\, dt + p\,(v_k(e) - e) \int_{v_k(e) - e}^{\tau_k - e} f(t)\, dt \Big]\, de + \frac{1}{a_k} \int_{b_k}^{a_k} (1 - p) \int_0^{\tau_k - e} t\, f(t)\, dt\, de \tag{3.30}$$
and, as previously, the second integral with respect to enrollment time is zero should the analytic time point occur after enrollment is complete and $\tau_k - E \ge w$. Similarly, the expected value of the time on study squared by analytic time point $k$ is
$$E(t_k^2) = \frac{1}{a_k} \int_0^{b_k} \Big[ (v_k(e) - e)^2\, S(\tau_k - e) + \int_0^{v_k(e) - e} t^2 f(t)\, dt + (1 - p) \int_{v_k(e) - e}^{\tau_k - e} t^2 f(t)\, dt + p\,(v_k(e) - e)^2 \int_{v_k(e) - e}^{\tau_k - e} f(t)\, dt \Big]\, de + \frac{1}{a_k} \int_{b_k}^{a_k} (1 - p) \int_0^{\tau_k - e} t^2 f(t)\, dt\, de. \tag{3.31}$$
From (3.29) and (3.30), and since $\delta_k t_k$ has mass only when an event is reported by the time of analysis, it follows that
$$E(\delta_k t_k) = \frac{1}{a_k} \int_0^{b_k} \left[ \int_0^{v_k(e) - e} t\, f(t)\, dt + (1 - p) \int_{v_k(e) - e}^{\tau_k - e} t\, f(t)\, dt \right] de + \frac{1}{a_k} \int_{b_k}^{a_k} (1 - p) \int_0^{\tau_k - e} t\, f(t)\, dt\, de. \tag{3.32}$$
The expected value of the product of the time variables at different times of analysis can be expressed as follows for analysis times $j$ and $k$ such that $\tau_j < \tau_k$, accounting for events reported without delay:
$$E(t_j t_k) = \frac{1}{a_j} \int_0^{b_j} \Big[ (v_j(e) - e)(v_k(e) - e)\, S(\tau_k - e) + \int_0^{v_j(e) - e} t^2 f(t)\, dt + (1 - p) \int_{v_j(e) - e}^{\tau_j - e} t^2 f(t)\, dt + p\,(v_j(e) - e) \int_{v_j(e) - e}^{\tau_j - e} t\, f(t)\, dt + (v_j(e) - e) \int_{\tau_j - e}^{v_k(e) - e} t\, f(t)\, dt + (1 - p)(v_j(e) - e) \int_{v_k(e) - e}^{\tau_k - e} t\, f(t)\, dt + p\,(v_j(e) - e)(v_k(e) - e) \int_{v_k(e) - e}^{\tau_k - e} f(t)\, dt \Big]\, de + \frac{1}{a_j} \int_{b_j}^{a_j} (1 - p) \int_0^{\tau_j - e} t^2 f(t)\, dt\, de. \tag{3.33}$$
Notice, as previously, that once the trial progresses to analysis time $\tau_k$ from time $\tau_j$, $t_j$ takes on the value of the censored survival time at time $\tau_j$, $(v_j(e) - e)$. Also, should an event occur between the time of enrollment and the last scheduled visit prior to analytic time $\tau_j$, $v_j(e) - e$, then $t_j = t_k$. If participants survive beyond $\tau_k - e$, then $t_j$ and $t_k$ are reported as $v_j(e) - e$ and $v_k(e) - e$ respectively. Also, note the time under observation takes on the value of the censored reported time if the event is reported with delay, which occurs with probability $p$.
Next, the expected value of the product between the status indicator and the time variable at different times of analysis can be expressed in the following manner for analysis times $\tau_j < \tau_k$:
$$E(\delta_k t_j) = \frac{1}{a_j} \int_0^{b_j} \Big[ \int_0^{v_j(e) - e} t\, f(t)\, dt + (1 - p) \int_{v_j(e) - e}^{\tau_j - e} t\, f(t)\, dt + p\,(v_j(e) - e) \int_{v_j(e) - e}^{\tau_j - e} f(t)\, dt + (v_j(e) - e) \int_{\tau_j - e}^{v_k(e) - e} f(t)\, dt + (1 - p)(v_j(e) - e) \int_{v_k(e) - e}^{\tau_k - e} f(t)\, dt \Big]\, de + \frac{1}{a_j} \int_{b_j}^{a_j} (1 - p) \int_0^{\tau_j - e} t\, f(t)\, dt\, de. \tag{3.34}$$
Notice here that, for an event occurring between the time of enrollment and analysis time $\tau_j$, $t_j = t_k$. As the trial progresses to analysis time $\tau_k$, $t_j$ retains the value of the censored survival time from analysis time $\tau_j$. The time under observation in (3.34) is only accounted for if the event is observed.
I conclude, as above, that $\delta_j \delta_k > 0$ only where $\delta_j = \delta_k = 1$. Therefore, enrollment must occur by $a_j = \min(\tau_j, E)$, and thus
$$E(\delta_j \delta_k) = E(\delta_j). \tag{3.35}$$
Also, the expected value $E(\delta_j t_k)$ only has mass where $\delta_j = 1$, and therefore enrollment must occur by $a_j$. Also, if $\delta_j = 1$ then $t_k = t_j$, and thus
$$E(\delta_j t_k) = E(\delta_j t_j). \tag{3.36}$$
With expected values (3.29) through (3.36), via covariance matrix (3.12), the asymptotic values of the correlations of the increments can be derived for the standard method of data processing for any length between visits, $w$, and delayed reporting probability $p$, where I assume $\tau_{k+1} \ge \tau_k + w$. It is important to note that when all events are reported with delay, $p = 1$, the expected values for the standard method reduce to the expected values of the personal cutback method, equations (3.21) through (3.28).
Finally, the pull-forward data processing method assumes participants who are alive at the time of their last follow-up visit survive to the analytic time point, 'pulling forward' their time on the study. This method, like the standard method, allows for preferential reporting of certain events between participants' last clinical visit before the analytic time point while other events are not ascertained until the next visit (reported with delay). The status variable and time under observation are defined as follows:
$$R = \begin{cases} 1 & \text{the event is reported with delay} \\ 0 & \text{otherwise} \end{cases}$$
$$\delta_k = \begin{cases} 1 & 0 \le T \le v_k(e) - e \\ 1 - R & v_k(e) - e < T \le \tau_k - e \\ 0 & \text{otherwise} \end{cases}$$
$$t_k = \begin{cases} T & 0 \le T \le v_k(e) - e \\ (1 - R)\,T + R\,(\tau_k - e) & v_k(e) - e < T \le \tau_k - e \\ \tau_k - e & \text{otherwise.} \end{cases}$$
Using the notation as above, the expected value of the status indicator variable is
$$E(\delta_k) = \frac{1}{a_k} \int_0^{a_k} \left[ \int_0^{v_k(e) - e} f(t)\, dt + (1 - p) \int_{v_k(e) - e}^{\tau_k - e} f(t)\, dt \right] de \tag{3.37}$$
where we redefine $a_k = \min(\tau_k, E)$. Notice events occurring between the last visit and the time of analysis are reported with probability $(1 - p)$. Similarly, the expected value of the time under observation by analysis time $\tau_k$ accounts for event times reported without delay. Therefore the expected value $E(t_k)$ can be written as
$$E(t_k) = \frac{1}{a_k} \int_0^{a_k} \Big[ (\tau_k - e)\, S(\tau_k - e) + \int_0^{v_k(e) - e} t\, f(t)\, dt + (1 - p) \int_{v_k(e) - e}^{\tau_k - e} t\, f(t)\, dt + p\,(\tau_k - e) \int_{v_k(e) - e}^{\tau_k - e} f(t)\, dt \Big]\, de. \tag{3.38}$$
Notice the expected value of the time under observation accounts for the time that is 'pulled forward' for participants alive at the analytic time point, or reported with delay should the event occur after the last scheduled visit. Similarly, the expected value of the time on study squared by analytic time point $k$ is
$$E(t_k^2) = \frac{1}{a_k} \int_0^{a_k} \Big[ (\tau_k - e)^2\, S(\tau_k - e) + \int_0^{v_k(e) - e} t^2 f(t)\, dt + (1 - p) \int_{v_k(e) - e}^{\tau_k - e} t^2 f(t)\, dt + p\,(\tau_k - e)^2 \int_{v_k(e) - e}^{\tau_k - e} f(t)\, dt \Big]\, de. \tag{3.39}$$
From (3.37) and (3.38), and since $\delta_k t_k$ has mass only when an event is reported by the time of analysis, it follows that
$$E(\delta_k t_k) = \frac{1}{a_k} \int_0^{a_k} \left[ \int_0^{v_k(e) - e} t\, f(t)\, dt + (1 - p) \int_{v_k(e) - e}^{\tau_k - e} t\, f(t)\, dt \right] de. \tag{3.40}$$
The expected value of the product of the time variables at different times of analysis is expressed as follows for analysis times $j$ and $k$ such that $\tau_j < \tau_k$, accounting for events reported without delay:
$$E(t_j t_k) = \frac{1}{a_j} \int_0^{a_j} \Big[ (\tau_j - e)(\tau_k - e)\, S(\tau_k - e) + \int_0^{v_j(e) - e} t^2 f(t)\, dt + (1 - p) \int_{v_j(e) - e}^{\tau_j - e} t^2 f(t)\, dt + p\,(\tau_j - e) \int_{v_j(e) - e}^{\tau_j - e} t\, f(t)\, dt + (\tau_j - e) \int_{\tau_j - e}^{v_k(e) - e} t\, f(t)\, dt + (1 - p)(\tau_j - e) \int_{v_k(e) - e}^{\tau_k - e} t\, f(t)\, dt + p\,(\tau_j - e)(\tau_k - e) \int_{v_k(e) - e}^{\tau_k - e} f(t)\, dt \Big]\, de. \tag{3.41}$$
Once the trial progresses to analysis time $\tau_k$ from time $\tau_j$, $t_j$ takes on the value of the censored survival time at time $\tau_j$, $(\tau_j - e)$, pulling forward the time assumed alive. Also, should an event occur between the time of enrollment and the last scheduled visit prior to analytic time $\tau_j$, $v_j(e) - e$, then $t_j = t_k$. If participants survive beyond $\tau_k - e$, then $t_j$ and $t_k$ are reported as $\tau_j - e$ and $\tau_k - e$ respectively. Also, note the time under observation takes on the value of the censored reported time if the event is reported with delay, which occurs with probability $p$. For this expected value, an interesting feature of the pull-forward method is that $t_j$ can be greater than $t_k$ for events occurring in $(v_j(e) - e, \tau_j - e)$. In this case, an event reported with delay at analysis time $\tau_j$ has a reported time $t_j = \tau_j - e$, while at analysis time $\tau_k$ the event will be reported with its observed time $t_k \le t_j$.
Next, the expected value of the product between the status indicator and the time variable at different times of analysis can be expressed in the following manner for analysis times $\tau_j < \tau_k$:
$$E(\delta_k t_j) = \frac{1}{a_j} \int_0^{a_j} \Big[ \int_0^{v_j(e) - e} t\, f(t)\, dt + (1 - p) \int_{v_j(e) - e}^{\tau_j - e} t\, f(t)\, dt + p\,(\tau_j - e) \int_{v_j(e) - e}^{\tau_j - e} f(t)\, dt + (\tau_j - e) \int_{\tau_j - e}^{v_k(e) - e} f(t)\, dt + (1 - p)(\tau_j - e) \int_{v_k(e) - e}^{\tau_k - e} f(t)\, dt \Big]\, de. \tag{3.42}$$
Notice here that, for an event occurring between the time of enrollment and the last scheduled visit prior to analysis time $\tau_j$, $t_j = t_k$. As the trial progresses to analysis time $\tau_k$, $t_j$ retains the value of the censored survival time from analysis time $\tau_j$, which is $\tau_j - e$.
As with the other processing methods, $\delta_j \delta_k > 0$ only where $\delta_j = \delta_k = 1$. Therefore, enrollment must occur by $a_j = \min(\tau_j, E)$, and thus
$$E(\delta_j \delta_k) = E(\delta_j). \tag{3.43}$$
Also, the expected value $E(\delta_j t_k)$ only has mass where $\delta_j = 1$, and therefore enrollment must occur by $a_j$. Also, if $\delta_j = 1$ then $t_k = t_j$, and thus
$$E(\delta_j t_k) = E(\delta_j t_j). \tag{3.44}$$
With expected values (3.37) through (3.44), the asymptotic limits of the correlations of the increments can be derived for the pull-forward method of data processing for any length between visits, $w$, and delayed reporting probability $p$, where I assume $\tau_{k+1} \ge \tau_k + w$. It is important to note that as $w$ approaches 0, the expected values for the pull-forward method reduce to the expected values of the perfect ascertainment method, equations (3.13) through (3.20).
The expected values from (3.6) resulting from the global cutback data processing method are not shown in detail here, since they are similar to those generated by the perfect ascertainment method except that a previous time point, a 'relevant date' before the analytic time point, is utilized, by which the event statuses of all participants would be reported and up to date. The status indicator and time under observation variables would thus be defined as
$$\delta_k = \begin{cases} 1 & 0 \le T \le (\tau_k - c) - e \\ 0 & \text{otherwise} \end{cases} \qquad \text{and} \qquad t_k = \begin{cases} T & 0 \le T \le (\tau_k - c) - e \\ (\tau_k - c) - e & \text{otherwise} \end{cases}$$
where $c$, a constant, represents the amount of time the analytic time point is pushed back.
3.2.4 Asymptotic results
In this section I apply the method outlined in the previous section for $K = 3$. The calendar times of the analyses are chosen to reflect when 33%, 66%, and 100% of the total information is accumulated in a trial design as outlined in detail in section 4.1. Therefore $\tau_1 = 1.3846$, $\tau_2 = 2.2115$, and $\tau_3 = 5$. I assume the survival time $t_{g,k}$ is exponentially distributed with density function $f_g(t) = \lambda_g e^{-\lambda_g t}$ and survivor function $S_g(t) = e^{-\lambda_g t}$ for treatment groups $g = 1, 2$. Under the null hypothesis $\lambda_1 = \lambda_2$, for a log hazard ratio of $\log(\lambda_2/\lambda_1) = 0$. Also, as described in section 4.1, I let $\lambda_1 = 1$.
Note that for any proportional hazards model, that is, where $\lambda_2(t) = C\,\lambda_1(t)$, $C$ a constant, the methodology for calculating the asymptotic correlation structure outlined in the previous sections is applicable, but the time variable needs to be rescaled using the inverse survivor function for the baseline group. After this rescaling, the density $f(t)$ is the exponential density with rate parameter $\lambda = 1$. The conclusions about window width below, however, are not conclusions about the window width on the changed time scale.
SAS statistical software version 9.4 and PROC IML were used to find numerical solutions for the values in matrix (3.12) under the standard and pull-forward methods of data processing. Assuming an underlying exponential distribution for event times provides closed-form solutions to the integrals involving the exponential density function. However, the trapezoidal numerical integration technique was used to approximate the integrals over the enrollment times $E_i$, specifically because the last scheduled follow-up visit is a function of enrollment time. Convergence of the approximation was declared when the difference between subsequent approximations fell below a small fixed tolerance of the form $1.0 \times 10^{-m}$.
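A sketch of that convergence rule follows (Python rather than PROC IML; the integrand is a stand-in for the expected-value integrands of section 3.2.3, and the tolerance is illustrative):

    import numpy as np

    def trapezoid(f, a, b, n):
        """Composite trapezoidal rule with n subintervals on [a, b]."""
        x = np.linspace(a, b, n + 1)
        y = f(x)
        return (b - a) / n * (0.5 * y[0] + y[1:-1].sum() + 0.5 * y[-1])

    def integrate_to_tolerance(f, a, b, tol=1e-8, max_doublings=25):
        """Double the grid until successive approximations differ by < tol,
        mirroring the convergence rule described above."""
        n, prev = 2, trapezoid(f, a, b, 2)
        for _ in range(max_doublings):
            n *= 2
            cur = trapezoid(f, a, b, n)
            if abs(cur - prev) < tol:
                return cur
            prev = cur
        raise RuntimeError("did not converge")

    # Stand-in integrand over enrollment time e in [0, 2] (two-year accrual):
    # probability a unit-rate exponential event occurs by analysis time t1.
    t1 = 1.3846
    f = lambda e: 1.0 - np.exp(-np.maximum(t1 - e, 0.0))
    print(integrate_to_tolerance(f, 0.0, 2.0) / 2.0)   # expected events per patient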
From each processing method, asymptotic limits of matrix (3.12) are derived for values of the delayed reporting probability, $\rho$, between 0 and 1 and values of the length between scheduled follow-up visits, $w$, between 0 and 0.5 years (one quarter of the enrollment period of 2 years). The correlation coefficient for each combination of $\rho$ and $w$ is computed as $\mathrm{Cov}(U, V)/\sqrt{\mathrm{Var}(U)\,\mathrm{Var}(V)}$ for the relevant pair of increments $U$ and $V$. Recall that under the null hypothesis the increments $(S_1, S_2 - S_1, \ldots, S_K - S_{K-1})$ are independent, i.e. $\mathrm{Cov}(S_1, S_2 - S_1) = 0$ and $\mathrm{Cov}(S_2 - S_1, S_3 - S_2) = 0$. For $K = 3$ this implies the pairs $(S_1, S_2 - S_1)$ and $(S_2 - S_1, S_3 - S_2)$ are independent.
Figure 3.1. Correlation coefficient of $(S_1, S_2 - S_1)$ under the null hypothesis for the standard method.
The assumption of independent increments is fundamental in the creation of regions of rejection for group-sequential methods. A deviation from this assumption yields boundaries that are too liberal or too conservative depending on the direction of the association.
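The correlations of the increments follow from the covariance matrix of $(S_1, S_2, S_3)$ by linearity, as in the sketch below (Python; the perturbed matrix is a hypothetical example of mine). Under the independent-increments structure $\mathrm{Cov}(S_i, S_j) = I_{\min(i,j)}$, both correlations are zero:

    import numpy as np

    def increment_corrs(sigma):
        """Corr(S1, S2 - S1) and Corr(S2 - S1, S3 - S2) from Cov(S1, S2, S3)."""
        A = np.array([[1, 0, 0], [-1, 1, 0], [0, -1, 1]], dtype=float)
        v = A @ sigma @ A.T                   # covariance of the increments
        sd = np.sqrt(np.diag(v))
        return v[0, 1] / (sd[0] * sd[1]), v[1, 2] / (sd[1] * sd[2])

    info = np.array([0.33, 0.66, 1.00])       # information at each analysis
    sigma_ind = np.minimum.outer(info, info)  # Cov(S_i, S_j) = I_min(i,j)
    print(increment_corrs(sigma_ind))         # (0.0, 0.0)

    # A hypothetical perturbation, as delayed reporting might induce:
    sigma_dep = sigma_ind.copy()
    sigma_dep[0, 1] = sigma_dep[1, 0] = 0.36  # Cov(S1, S2) > I_1
    print(increment_corrs(sigma_dep))         # both correlations now positive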
Figure 3.1 shows the surface of the correlation coefficient of $(S_1, S_2 - S_1)$ by delayed reporting probability and window length between follow-up visits for the standard method. Notice that as the length between visits increases to 0.5 years and the delayed reporting probability decreases to zero, the increments deviate from independence. When $\rho = 0$, all events are reported without delay and individuals alive at the end of their last follow-up visit have their time under observation censored there. The wider the window, the further back surviving participants' time under observation is pushed, and the more time must be accounted for by the subsequent score statistic at the next time of analysis. This creates a positive association between increments. Figure 3.1 is instructive on how to eliminate, or at least ameliorate, the positive association. First, decreasing the window length between visits reduces the correlation, though window lengths below one-sixteenth of the enrollment period (0.125 years) may be unachievable.
Second, note the independence assumption holds where $\rho = 1$. Recall that when $\rho = 1$, the standard method reduces to the personal cutback method; therefore the personal cutback method maintains the assumption of independent increments for any window length between 0 and 0.5 years. Lastly, note the oscillations on the surface for high values of $\rho$ as $w$ increases. These are due to the last scheduled visit being a function of the enrollment time and the length between visits.
Figure 3.2. Correlation coefficient of $(S_2 - S_1, S_3 - S_2)$ under the null hypothesis for the standard method.
Figure 3.2 shows the surface of correlation coefficients of $(S_2 - S_1, S_3 - S_2)$ by window length and delayed reporting probability. Notice the correlation for large window sizes as $\rho$ decreases is still positive and greater in magnitude than in figure 3.1. This indicates that by the end of the trial, the final score statistic could be larger in absolute value simply due to the correlation between increments and not due to a true treatment effect. The boundaries generated under the null hypothesis are thus incorrect to a degree depending on $\rho$ and $w$, though $\rho$ may not be estimable until some time after the completion of the trial. Notice that when $\rho = 1$, the correlation coefficient is 0 for all values of $w$, meaning the personal cutback method maintains independent increments.
Figure 3.3. Correlation coefficient of $(S_1, S_2 - S_1)$ under the null hypothesis for the pull-forward method.
Figure 3.3 shows the correlation of $(S_1, S_2 - S_1)$ by window length and delayed reporting probability for the pull-forward method. Recall that for this method, participants alive at their last follow-up visit are assumed to have survived to the time of analysis. This assumption is incorrect for participants with an event between their last follow-up visit and the time of analysis, and as the delayed reporting probability increases, the latter situation happens with greater frequency. The score statistic achieved at the subsequent analytic time point therefore accounts for the fabricated data in the previous analytic time point. As expected, this worsens as both the window length and the delayed reporting probability increase, and the association between increments is negative. Notice that when $\rho = 0$ the increments are independent: the pull-forward method reduces to the situation where trial data are perfectly ascertained.
Figure 3.4. Correlation coefficient of $(S_2 - S_1, S_3 - S_2)$ under the null hypothesis for the pull-forward method.
Figure 3.4 shows the correlation of $(S_2 - S_1, S_3 - S_2)$ by window length and delayed reporting probability for the pull-forward method. Notice the negative association between increments persists, as in figure 3.3. Both figures 3.3 and 3.4 indicate that ameliorating the correlated nature of the increments means reducing the window length to almost zero or reducing the delayed reporting probability to almost zero. Any small deviation from either situation yields increments that are negatively correlated.
3.3 Considerations on Trial Design
It is of immediate interest to determine the effect that deviating from the assumption of independent increments under the null hypothesis of no treatment effect has on the resulting exit probabilities at each stage of analysis and on the total type I error rate spent throughout the entire conduct of the trial, specifically when utilizing boundaries generated assuming independent increments. In the following section, I quantify the effect on the exit probabilities described in section 2.4.1, $\alpha^*(\tau_k) - \alpha^*(\tau_{k-1})$, of departing from the assumption of independent increments under the null hypothesis.
Though the methods described in the following section can be generalized to any times of analysis, I focus on the case where $K = 3$.
3.3.1 Methods
Statistical package SAS 9.4 and PROC SEQDESIGN were used to obtain the group-sequential boundaries assuming independent increments, and R version 3.1.3 and the mvtnorm package were used to integrate over the tri-variate normal distribution to obtain the exit probabilities. As described by DeMets and Lan (1994), the boundary values are obtained from a multivariate normal distribution of $S_1, \ldots, S_K$ whose covariance matrix has an independent-increments structure. Specifically,
$$\Sigma_{ij} = \mathrm{Cov}\big(S_i, S_j\big) = I_i, \qquad i \le j, \tag{3.45}$$
where $I_i$ is the amount of information accumulated by the $i$th analytic time point.
The respective asymptotic covariance matrix under the standard method was derived as shown in section 3.2.3 above for delayed reporting probabilities $\rho \in \{0, 0.1, 0.2, 1\}$ and window lengths $w \in \{0.125, 0.25, 0.5\}$ in years. The matrix used to obtain the resulting exit probabilities under the standard method, $\Sigma^{STD}$, is derived from equation 3.11 such that
$$\Sigma^{STD}_{ij} = \frac{\mathrm{Cov}\big(h_i(\cdot),\, h_j(\cdot)\big)}{\sqrt{\mathrm{Var}\,h_i(\cdot)}\,\sqrt{\mathrm{Var}\,h_j(\cdot)}}, \qquad i = 1, \ldots, K;\ j = 1, \ldots, K. \tag{3.46}$$
The fraction of accumulated information used in PROC SEQDESIGN was calculated utilizing the general framework established above for the standard method. This means the boundaries to conduct interim and final analyses were generated for the accumulated information as it would be observed under the standard method for different values of $\rho$ and $w$. The exit probabilities resulting from the standard method were obtained for each time of analysis by integrating over a tri-variate normal distribution with covariance matrix as in equation 3.46 for $K = 3$, with mean vector $[0, 0, 0]^T$.
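The same exit-probability calculation can be approximated by Monte Carlo rather than numerical integration, as in the following sketch (Python in place of R and mvtnorm; the boundary values are illustrative stand-ins of mine, not SEQDESIGN output):

    import numpy as np

    rng = np.random.default_rng(42)

    def exit_probs(cov, bounds, n=500_000):
        """Probability the trial first exits at analysis 1, 2, or 3, for
        standardized statistics crossing two-sided boundaries |Z_k| >= bounds[k]."""
        s = rng.multivariate_normal(np.zeros(3), cov, size=n)
        z = s / np.sqrt(np.diag(cov))          # standardize each S_k
        crossed = np.abs(z) >= bounds
        first = np.where(crossed.any(axis=1), crossed.argmax(axis=1), 3)
        return [float(np.mean(first == k)) for k in range(3)]

    info = np.array([0.33, 0.66, 1.0])
    cov_ind = np.minimum.outer(info, info)     # independent increments
    bounds = np.array([2.96, 2.53, 1.99])      # illustrative two-sided bounds
    print(exit_probs(cov_ind, bounds))         # the three sum to the alpha spent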
3.3.2 Results and Conclusion
In appendix A9, table A9.1 shows the comparison of exit probabilities for each of three total analyses, assuming independent increments and under the asymptotic situation using the standard method. Column 1 details the specific time of analysis ($K = 3$). Column 2 shows the asymptotic amount of accumulated information for each combination of delayed reporting probability and window length under the standard method of data processing. Column 3 shows the group-sequential boundaries for each time of analysis obtained by SEQDESIGN in SAS using the information fraction from column 2, under the null hypothesis, assuming a two-sided test with a 0.05 type I error rate. Note SEQDESIGN assumes an independent-increments structure for the score statistics. Column 4 shows the probability of terminating the trial by each time of analysis as described by Lan and DeMets (1983), assuming an independent-increments covariance matrix structure. Column 5 shows the probability of terminating the trial at each stage of analysis utilizing the boundaries obtained from SEQDESIGN but instead using the asymptotic covariance matrix resulting from applying the standard method of data processing for each combination of window length and delayed reporting probability listed in the table. Lastly, column 6 shows the total amount of alpha spent under the asymptotic covariance structure obtained from the standard method with the given specifications.
Notice that, as designed, the total amount of alpha spent should be 0.05. Interestingly, the probability of exiting the trial by the second time of analysis under the standard method (column 5) is less than expected (column 4) for any window length and delayed reporting probability. The opposite is true for exiting by the final analysis time, where the probability of exiting according to the standard method (column 5) is larger than what is expected under independent increments. Note these differences are minuscule. Also recall that when $\rho = 1$ the standard method reduces to the personal cutback method.
In all, the total amount of alpha spent under the standard method is just below 0.05. Slightly underspending alpha by the end of the trial results in a trial with reduced power; this will be corroborated in chapter 4 by the simulation study. Ultimately, the effect of the departure from independent increments of the score statistic is mitigated by using the observed information from the data processing method employed, in this case the standard method. This result also leads to another application of the general framework presented in the previous section: it can be used to calculate the amount of information expected under the null hypothesis, which can then be used to obtain the group-sequential boundaries for analysis.
3.4 Conclusion
The assumption of independent increments of the score statistic is fundamental to the generation of the boundaries used in group-sequential methodologies. By way of the multivariate delta method, I showed that the asymptotic behavior of the increments of the score statistic under the standard and pull-forward methods of data processing yields positively and negatively correlated increments, respectively. Not accounting for such correlation in the increments of the score statistic may lead to significant results arising simply from the association between increments.
The surfaces of the correlation between increments under the standard method (figures 3.1 and 3.2) indicate that reducing the window of time between follow-up visits alleviates the association of the increments regardless of the effect of delayed reporting. If the delayed reporting probability could be managed, then increasing $\rho$ decreases the correlation between increments. More importantly, these figures show that the personal cutback method adheres to the assumption of independent increments for any window length between 0 and 0.5 years. In the case where only the window length between visits is within the control of the trialist, employing the personal cutback method with a relatively small length between visits (0.125 years to 0.25 years) assures adherence to the assumption of independent increments and may allow for as much information as the standard method accumulates at any analytic time point.
As discussed above, the local community of clinical trialists is reluctant to implement the pull-forward method, since time under observation could be incorrectly assumed to be longer than it truly is. The surfaces of the correlation between increments (figures 3.3 and 3.4) give more reason to eliminate the pull-forward method as a viable option in analyzing clinical trial data, since even small deviations from $w = 0$ or $\rho = 0$ lead to negatively associated increments of the score statistic, again leading to incorrectly defined boundaries for interim monitoring.
In chapter 4, I explore the assumption of independent increments for the methods of data processing mentioned above for a small subset of values of $\rho$ and $w$ via simulation. Also explored in chapter 4 is the observed power under the alternative hypothesis, where we can expect power to be reduced based on the result from the previous section.
Chapter 4 Simulation Studies
Evaluation of the performance of estimators in a finite-sample setting offers the opportunity to investigate factors related to the estimation itself and the effect on the associated statistical tests. Such evaluation is not feasible in studies of real data, mainly because the true structure of the data is unknown. This chapter presents empirical findings on parameters and quantities of interest in relation to the four data processing methods under investigation in the context of interim monitoring. They include:
Bias in estimation of the log hazard ratio
Observed information fraction
Power of the related statistical test for the log hazard ratio
Type I error of the related statistical test for the log hazard ratio
Average length of study
The assumption of independent increments of the score statistic under the null hypothesis
of no treatment effect
For simplicity, the data processing methods are referred to as follows. Perfect ascertainment, or raw: all events are reported immediately and non-events are censored at the time of analysis. Standard: events can be reported any time prior to the time of analysis, but 'well follow-up' is reported only to the time of the last scheduled visit. Personal cutback: follow-up obtained after a participant's last scheduled visit before the analytic time point is not included in the analysis; in this case the participant's survival time is censored at the last follow-up visit, provided no event has been experienced to that point. Global cutback: the deadline for data to contribute to the analysis is pushed back by a small window of time equal to the scheduled time between routine visits. Lastly, pull-forward: all participants without a reported event are censored at the time of analysis, i.e., assumed alive to that point.
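To fix ideas, the sketch below (Python; function and variable names are mine, not the dissertation's SAS implementation) applies each processing rule to a single participant and returns the (time under observation, event indicator) pair that would enter an analysis at calendar time t_a:

    import math

    def process(method, enroll, event_time, delayed, t_a, w):
        """Return (time under observation, event indicator) for one participant
        at analysis calendar time t_a, with visits every w years from enrollment.
        event_time is measured from enrollment; None means no event ever occurs."""
        follow = t_a - enroll                       # potential follow-up at t_a
        if follow <= 0:
            return (0.0, 0)                         # enrolled after the cutoff
        last_visit = w * math.floor(follow / w)     # last scheduled visit time
        had_event = event_time is not None and event_time <= follow
        # a delayed event occurring after the last visit is unseen at t_a
        seen = had_event and not (delayed and event_time > last_visit)

        if method == "raw":                         # perfect ascertainment
            return (event_time, 1) if had_event else (follow, 0)
        if method == "standard":
            return (event_time, 1) if seen else (last_visit, 0)
        if method == "personal_cutback":            # ignore data past last visit
            if had_event and event_time <= last_visit:
                return (event_time, 1)
            return (last_visit, 0)
        if method == "global_cutback":              # analyze as of t_a - w, where
            # all statuses are assumed reported and up to date (section 3.2.3)
            return process("raw", enroll, event_time, delayed, t_a - w, w)
        if method == "pull_forward":                # non-events assumed alive
            return (event_time, 1) if seen else (follow, 0)
        raise ValueError(method)

    # A participant enrolled at 0.4 years with an event 0.9 years later, reported
    # with delay, at the first interim analysis (t_a = 1.3846), 0.25-year visits:
    for m in ("raw", "standard", "personal_cutback", "global_cutback", "pull_forward"):
        print(m, process(m, 0.4, 0.9, True, 1.3846, 0.25))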
4.1 Methods
4.1.1 Data Generation
SAS statistical software version 9.4 was used to simulate trial data where the outcome of interest is the log hazard ratio between two treatments. Generated trial data were subject to the following parameters:
Two-year enrollment period
Three-year enrollment-free follow-up period
208 total trial participants randomly assigned to the two treatment groups in a 1:1 ratio
Interval between follow-up visits (in years), $w \in \{0.125, 0.25, 0.5\}$
Probability of delayed reporting, $\rho \in \{0, 0.10, 0.25, 0.50, 0.75, 0.90, 1\}$
Hazard rate in the higher risk group, $\lambda_1 = 1$
Hazard ratio between treatment groups, $\lambda_2/\lambda_1 \in \{0.67, 0.85, 1\}$
Two interim analyses planned when 33% and 66% of total information in the control group is accumulated
By design, a five-year trial with a two-year enrollment period, a nominal type I error rate of 5%, and 80% power to detect a hazard ratio of 0.67 requires 208 trial participants allocated 1:1 to two treatment groups, assuming perfect follow-up. The hazard rates $\lambda_1 = 1$ and $\lambda_2 = 0.67$ reflect oncology trials involving malignancies such as central nervous system (CNS) tumors and non-stage-IV breast cancer (McIlvaine 2015). Participants are assumed to enter the study uniformly over the enrollment period and are observable from the time they enter the study until they either experience an event or their survival time is censored; in other words, there is no loss to follow-up. Follow-up visits are scheduled at regular intervals, $w$, from enrollment for all participants, where it is assumed the length of this interval is the same for all participants throughout the duration of the trial. The follow-up window used in section 4.2.1 is 0.5 years, and the effect of narrowing this window on outcomes of interest is investigated in section 4.2.2.
Also examined in this simulation study is the effect on outcomes of interest of observing less of a treatment effect than anticipated. In this setting, trial data are generated with a hazard ratio of 0.85. In section 4.5, the assumption of independent increments of the score function is investigated with trial data generated under the null hypothesis of no treatment effect, a hazard ratio of 1.
To simulate delayed reporting of events, a Bernoulli random variable is generated for each participant such that the event is reported with delay with probability $\rho$. If a participant experiences an event before the time of analysis and the event is reported without delay, then the event is reported at the time it occurs. Otherwise, if the event is reported with delay, the event status and time are not ascertained until the next follow-up visit. The focus of this simulation study is on events that occur between participants' last follow-up visit and the time of analysis: if such an event is reported with delay, it does not contribute to the analysis. Values of $\rho$ explored include the extreme situations where either no events ($\rho = 0$) or all events ($\rho = 1$) are reported with delay. In these simulation studies, time to event is assumed to be exponentially distributed, and 5,000 trials were simulated for each combination of parameters.
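The data-generation step can be sketched as follows (Python rather than the SAS actually used; the seed and helper names are mine):

    import numpy as np

    rng = np.random.default_rng(2016)

    def simulate_trial(n=208, accrual=2.0, hr=0.67, lam1=1.0, rho=0.5):
        """One simulated trial: enrollment times, event times (from enrollment),
        treatment arms (1:1), and Bernoulli(rho) delayed-reporting flags."""
        arm = rng.permutation(np.repeat([0, 1], n // 2))   # 1:1 allocation
        enroll = rng.uniform(0.0, accrual, size=n)         # uniform accrual
        lam = np.where(arm == 0, lam1, lam1 * hr)          # group hazard rates
        event_time = rng.exponential(1.0 / lam)            # exponential survival
        delayed = rng.random(n) < rho                      # delayed with prob rho
        return enroll, event_time, arm, delayed

    enroll, event_time, arm, delayed = simulate_trial()
    # mean survival should be near 1/lambda in each arm (1.0 and 1/0.67 here)
    print(event_time[arm == 0].mean(), event_time[arm == 1].mean())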
4.1.2 Data Analysis
The data analysis plan for these trials includes a total of three times of analysis, two interim and one final. Interim analyses occur at calendar times chosen to reflect the amounts of accumulated fraction of total information according to the design described in the section above. Critical values defining regions of rejection and continuation for the two interim analyses and the final analysis are generated utilizing a group-sequential approach employing a quadratic alpha spending function,
$$\alpha^*(\tau) = \alpha \tau^2, \qquad \tau \in [0, 1], \tag{4.1}$$
where $\alpha$ is the nominal type I error rate and $\tau$ is the accumulated fractional amount of total information. Note that $\alpha^*(0) = 0$ and $\alpha^*(1) = \alpha$, meaning that by the planned end of the study, all of $\alpha$ is assumed to be spent.
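For concreteness, the spending function and its increments at the planned information fractions can be computed directly (a minimal Python sketch):

    alpha = 0.05
    spend = lambda tau: alpha * tau ** 2       # alpha*(tau) = alpha * tau^2

    taus = [0.33, 0.66, 1.0]
    cum = [spend(t) for t in taus]
    inc = [cum[0]] + [b - a for a, b in zip(cum, cum[1:])]
    print(cum)   # [0.005445, 0.02178, 0.05]: cumulative alpha spent
    print(inc)   # alpha available at each of the three analyses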
The choice of a quadratic spending function (4.1) is motivated by its wide usage in interim analyses of clinical trials of childhood cancers by the Children's Oncology Group. The benefits of using a convex spending function such as $\alpha^*(\tau)$ (figure 4.1) are that the probability of stopping the trial in the early stages is relatively small and that the spending of $\alpha$ depends on the amount of accumulated information and not necessarily on calendar time.
The estimate of information time used in this simulation study is one suggested by Dang (2015) (see section 2.1.2),
$$\hat{\tau} = \frac{D_{CO}}{E[D_{CO} \mid H_0]}, \tag{4.2}$$
which is the quotient of the observed number of events in the control group by the analysis time, $D_{CO}$, and the expected number of events in the control group under the null hypothesis. Here CO denotes the control group. This estimate is used since it partially addresses the issues of underspending or overspending $\alpha$. As before, this robust information-time estimation method is free of the assumption of a uniform distribution of events, as needed by methods relying on calendar time. Mentions of accumulated information or information fraction throughout refer to this estimate.
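As a sketch (Python; the helper name is mine), with the 102 control-group events expected by the end of the trial as the denominator:

    def info_fraction(observed_control_events, expected_total=102):
        """Dang-style estimate (4.2): observed control-group events divided by
        the number expected in the control group under H0 by the end of trial."""
        return observed_control_events / expected_total

    print(info_fraction(34))   # 34 of 102 events observed -> about 0.33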
Interim analyses are planned at calendar times when 33% ($\tau = 0.33$) and 66% ($\tau = 0.66$) of the total number of events in the control group under the null hypothesis are expected. The contributed program Analysis of Resources for Trials (ART) in Stata 14 yields calendar times of 1.3846 years and 2.2115 years, respectively, for when 33% and 66% of information has accumulated, where a total of 102 events are expected in the control group at the end of the trial. The final analysis will occur at the end of the trial, when it is assumed $\tau = 1$, or when all 102 expected events in the control group are observed. As before, while interim and final analyses are performed at fixed calendar times, the amount of alpha spent and the boundary values used to conduct the tests of hypothesis depend on the observed amount of information accumulated up to that calendar time in the control group. Note that each trial will accumulate a different amount of information by each fixed calendar time of analysis due to factors related to reporting events with delay and the data processing technique used.
As mentioned before, repeated testing of accumulating data without adjustment for multiple analyses leads to an inflation of the type I error rate; employing group-sequential methods to generate boundaries of continuation and rejection for each analysis preserves the type I error rate. Although the analyses are performed at fixed calendar times, the amount of alpha spent and the boundary used to conduct the test of hypothesis rely solely on the observed amount of information accumulated up to that calendar time for each trial. In the final analysis, as recommended, the remainder of alpha is spent regardless of how much information has accumulated at the planned end of the trial (see section 2.2.2). Figure 4.1 shows the boundaries achieved by using $\alpha^*(\tau)$ (4.1) when interim analysis occurs exactly when 33% and 66% of observed information has accumulated.
The right axis of figure 4.1 shows the regions of termination and continuation of a trial formed by the critical values at planned interim analyses at information fractions of 0.33 and 0.66, with a final analysis at 1. The left axis shows the amount of $\alpha$ spent using the quadratic spending function (4.1). The intersections of the dotted lines with the spending function indicate the amount of $\alpha$ spent up to each analysis point. Notice the majority of $\alpha$ is spent in the last two analyses.
Figure 4.1. The quadratic spending function with $\alpha = 0.05$ and the group-sequential boundaries for information fractions of 33%, 66%, and 100%.
Data resulting from the simulations were analyzed by the time at which an analysis of the data rejected the null hypothesis, if such a rejection occurred. This creates four disjoint groups, which for simplicity will be referred to as rejection groups. Group 1 includes trials that reject the null at the first interim analysis. Group 2 includes trials that did not reject the null at the first interim analysis but reject it at the second interim analysis. Group 3 includes trials that do not reject the null in the first two interim analyses but reject it at the third and final time of analysis. Lastly, group 4 includes trials that do not reject the null hypothesis at any time of analysis, so that no significant treatment effect was found. PROC LIFEREG was used to estimate the log hazard ratio between treatment groups. The test statistic from the exponential test of treatment effect was compared to the generated group-sequential boundaries to determine whether there was enough evidence to reject the null hypothesis and terminate the trial. PROC SEQDESIGN was used to determine boundaries of rejection and continuation utilizing the amount of information accumulated at the fixed calendar times (4.2) and the spending function described above (4.1) for each simulated trial. It is important to note that employing this procedure assumes exactly $\alpha$ is spent by the final analysis, i.e. $\alpha^*(1) = \alpha$, regardless of the estimated fraction of expected information obtained at the primary analytic time point. Estimates and quantities of interest were taken as the mean over the number of trials in the respective rejection groups for each combination of design parameters (see above). The mean trial length for each combination of design parameters was found by way of a weighted average, using the proportion of trials in each rejection group as the weight. The empirical power of the simulated trials is defined as the proportion of trials that ever reject the null hypothesis out of the total number of trials, using the regression test for treatment effect, under the alternative hypothesis of a hazard ratio of 0.67 or 0.85. The probability of a type I error is calculated as the proportion of trials that ever reject the null hypothesis out of the total number simulated, using the regression test for treatment effect, under the null hypothesis of no treatment effect (a hazard ratio of 1). Lastly, the assumption of independent increments is investigated under the null hypothesis using the Wald test statistic from the regression test for treatment effect and the information accumulated up to the time of analysis as defined by Kalbfleisch and Prentice (2002).
It is important to note that the investigation of the resulting outcomes of interest is a direct comparison among the methods of data processing discussed here, made on an equal footing on the calendar time scale, since most interim analyses are in practice scheduled by calendar time to accommodate the DSMC.
4.2 Results: Bias in Estimates of the Natural Log of the Hazard Ratio and Other Outcomes
of Interest
In the following sections, the effects of two structural trial variables on treatment estimates resulting from the four data processing methods (introduced in chapter 1) were of interest: the probability of delayed reporting of events, $\rho$, and the interval between scheduled follow-up visits, $w$. Also under investigation is the effect of observing a treatment effect that is less than anticipated (a hazard ratio of 0.85 instead of the design value of 0.67), where the hazard rates are 1 and 0.85 in the control and treatment groups respectively.
4.2.1 Under the Hazard Ratio from Study Design
In this section we examine the effect of reporting events with delay on the estimates of the log hazard ratio. We consider probabilities of delayed reporting of 0, 0.10, 0.25, 0.50, 0.75, 0.90, and 1, where the lower extreme represents reporting all events without delay and the upper extreme represents reporting all events with delay. To isolate the effect of delayed reporting, we set the window between visits to 6 months (0.5 years) since, as discussed in chapter 1, in studies of childhood cancer, disease evaluations are often done at 3-6 month intervals from trial enrollment to the planned end of the study.
4.2.1.1 Effect of Delayed Event Reporting
Figure 4.2 shows the mean log hazard ratios that result from the four data processing methods across probabilities of delayed reporting of events for rejection group 1 (i.e., among trials that reject the null hypothesis at the first interim analysis). Notice how all methods, across all delayed reporting probabilities, produce mean estimates of the log hazard ratio that reflect a more extreme treatment effect than under the alternative hypothesis (ln(HR=0.67) = -0.400477...). The global cutback method produces the most extreme estimate, almost double the smallest in magnitude among the other methods (-1.4778 versus -0.8855). As described in section 2.4.1, point estimates achieved in the early stages of trials are biased, in this case showing more of a treatment effect than expected. Regardless of the data processing method, the observed estimates of the log hazard ratio among rejection group 1 are biased and overestimate the true treatment effect. This is most apparent with the global cutback, where we are 'earlier' in the study process as judged by the number of observed events.
Although there seems to be a pattern between the standard and pull-forward estimates across delayed reporting probabilities, the only patterns described in detail (section 3.2.3) are that the pull-forward method processes data very similarly to the raw method when no events are reported with delay, and that the standard method processes data in the same manner as the personal cutback method when all events are reported with delay. Hence the agreement in the estimates at those values of $\rho$.
It is important to note the percentage of trials in rejection group 1 and the amount of information accumulated for each processing method. The raw processing method places 9.98% of all trials in rejection group 1, accumulating an average information fraction of 0.374. The personal and global cutback methods include the fewest trials in rejection group 1, 3.68% and 1.08% respectively.
Although neither cutback method is affected by delayed reporting, the censoring mechanism of each (at the last clinical visit for the personal cutback and one window length prior to the time of analysis for the global cutback) alters the amount of information that accumulates. In fact, the cutback methods actually lose patients relative to the other methods, since patients enrolled within $w$ of the analysis cutoff will not be counted even if an event is observed. The personal and global cutback methods accumulate information fractions of 0.282 and 0.217 respectively, both well below the 0.33 expected. This drives the differences in the observed log hazard ratio among the processing methods: the cutback methods are further behind in terms of information (figure 4.1), where the boundaries for rejection are much more conservative than when information accrual is further along, as with the standard and pull-forward methods, especially when more events are reported without delay.
Figure 4.2. The mean log hazard ratio, among trials that reject the null hypothesis at the first interim analysis, for each data processing method across different delayed reporting schemes.
For the standard and pull-forward methods, both affected by reporting events with delay, the number of trials in rejection group 1 decreases as the probability of delayed reporting increases. When all events are reported without delay ($\rho = 0$), the standard and pull-forward methods include up to 7.94% and 9.98% of trials respectively. On the other hand, each includes as little as 3.68% and 3.3% respectively when $\rho = 1$. For both of these methods, the amount of information accrued also has an inverse relationship with the probability of delayed reporting: the standard and pull-forward methods accumulate information fractions of up to 0.367 and 0.374 when no events are reported with delay, and only 0.282 and 0.286 when all events are reported with delay (table 4.1).
Table 4.1 Mean Information Fraction for rejection group 1 for standard and pull-forward methods
under the alternative hypothesis of ln(HR=0.67)=-0.4004776.
Delayed reporting probability
0 0.1 0.25 0.5 0.75 0.9 1
STD 0.367 0.357 0.345 0.322 0.304 0.291 0.282
PLF 0.374 0.366 0.352 0.332 0.308 0.296 0.286
It is reasonable to conclude that only very extreme treatment effects are detected at the first time of analysis: as the probability of delayed reporting increases, less information is accrued, resulting in less $\alpha$ spend and subsequently a very stringent boundary value for the test of hypothesis (see figure 4.1).
Among trials in rejection group 2 (figure 4.3), that is, trials that reject the null hypothesis at the second interim analysis, the mean log hazard ratio estimates for all processing methods, across all schemes of delayed reporting, are biased away from the alternative, indicating more of a treatment effect than anticipated. Note this bias is not as severe as that in rejection group 1. The difference between the maximum and minimum estimates of treatment effect in rejection group 2 is much smaller than that in rejection group 1, with the raw and pull-forward methods producing a mean estimate of -0.5446 (HR=0.58) and the global cutback method a mean estimate of -0.7049 (HR=0.4942).
Table 4.2 Mean Information Fraction for rejection group 2 for standard and pull-forward methods for the first
and second interim analyses under the alternative hypothesis of ln(HR=0.67)=-0.4004776.
Delayed reporting probability
0 0.1 0.25 0.5 0.75 0.9 1
STD 0.333,0.686 0.323,0.676 0.308, 0.66 0.284,0.635 0.260,0.611 0.245,0.595 0.235,0.585
PLF 0.330,0.689 0.320,0.679 0.307,0.664 0.283,0.638 0.261,0.615 0.246,0.6 0.236,0.59
Again, due to the manner in which the cutback methods censor survival times (table 4.2), affecting the amount of information accumulated and the percentage of trials in rejection group 2, only the most extreme treatment effects are detected (personal cutback -0.6159, HR=0.54) and the null hypothesis rejected. The pattern exhibited in figure 4.3 between the standard and pull-forward methods, similar to that in figure 4.2, can only be verified when either all events are reported with delay or none are.
The percentage of trials in rejection group 2 for the personal and global cutback methods is 28.44% and 20.12% respectively. Compared to the other methods, across all delayed reporting probabilities, the global cutback method includes the smallest percentage of trials in rejection group 2. For the standard and pull-forward methods, the percentage of trials in rejection group 2 decreases as the probability of delayed reporting increases, from 32.62% to 28.44% and from 33.38% to 26.84% respectively (table 4.2).
Recall we expect information fractions of 0.33 and 0.66 to be accrued by the first and second interim analyses. For trials included in rejection group 2 (table 4.2), the amount of information accumulated at the first time of analysis is less than the amount accumulated at the first time of analysis among trials in rejection group 1 (the groups are disjoint). For the personal cutback method, at the first interim analysis, the amount of information accumulated by trials in rejection group 1 versus rejection group 2 is 0.282 and 0.235 respectively. For the global cutback, that comparison is 0.217 versus 0.165. Among trials in rejection group 2, when all events are reported without delay, the standard and pull-forward methods both yield as much information as expected (0.333 and 0.330 respectively) at the first interim analysis. Here again, the cutback methods are 'earlier' in the study process as judged by the number of observed events and therefore have more conservative boundaries for rejecting the null hypothesis.
Figure 4.3. The mean log hazard ratio, among trials that reject the null hypothesis at the second interim analysis, for each data processing method across different delayed reporting schemes.
Figure 4.4. The mean log hazard ratio, among trials that reject the null hypothesis at the final analysis, for
each data processing method across different delayed reporting schemes.
By the second interim analysis, among trials in rejection group 2, the personal and global cutback methods accumulate less information than the 0.66 expected, 0.585 and 0.495 respectively. For the standard and pull-forward methods, the effect of delayed reporting is evidenced by the decrease in information accumulated as $\rho$ increases, from 0.686 to 0.585 and from 0.689 to 0.590 respectively.
Among trials in rejection group 3 (see figure 4.4), the difference between the maximum (-0.3935 for pull-forward when $\rho = 0$) and minimum (-0.4259 for global cutback) estimates of the log hazard ratio across all processing methods is 0.0324. The personal and global cutback estimates (-0.4099 and -0.4259 respectively) slightly overestimate the treatment effect but are within hundredths of the expected estimate under the alternative hypothesis (labeled 'as designed' in figure 4.4). The standard and pull-forward methods produce estimates that slightly underestimate the treatment effect for delayed reporting probabilities below about 0.25 and overestimate it as the delayed reporting probability increases beyond 0.25.
The personal and global cutback methods include 46.12% and 57.08% of all trials in rejection group 3. Unlike in the other rejection groups, the global cutback method includes the most trials in rejection group 3. Also unlike the other rejection groups, as the delayed reporting probability increases, the percentage of trials included by the standard and pull-forward methods increases, from 37.12% to 46.12% and from 34.98% to 48.26% respectively.
Table 4.3 Mean Information Fraction for rejection group 3 for standard and pull-forward methods for the first and second interim analyses and the final analysis under the alternative hypothesis of ln(HR=0.67)=-0.4004776.
Delayed reporting probability
0 0.1 0.25 0.5 0.75 0.9 1
STD (0.317, 0.654, 1.001) (0.307, 0.643, 1.000) (0.292, 0.626, 0.999) (0.267, 0.598, 0.998) (0.243, 0.568, 0.996) (0.228, 0.552, 0.995) (0.218, 0.541, 0.995)
PLF (0.315, 0.647, 1.001) (0.305, 0.636, 1.000) (0.290, 0.621, 0.999) (0.266, 0.594, 0.998) (0.243, 0.566, 0.996) (0.228, 0.550, 0.995) (0.219, 0.540, 0.995)
There are two conclusions to draw here: 1) a plurality of trials reject the null hypothesis at the time of final analysis for all methods of processing data (a majority for the global cutback); and 2) since more information can be accumulated when fewer events are reported with delay (small values of $\rho$), it is more likely that trials accumulate enough information for a significant treatment difference to be detected at earlier times of analysis. As the probability of reporting events with delay increases, trials that exhibit a true treatment difference need more calendar time to accumulate enough information to reject the null hypothesis. To corroborate this, among trials in rejection group 3 when $\rho = 0$, the amounts of information the standard method accumulates at the three times of analysis are 0.317, 0.654, and 1.0007 (versus the expected 0.33, 0.66, and 1). When all events are reported with delay ($\rho = 1$), the amounts of information accumulated are 0.218, 0.541, and 0.995. Similarly, for the pull-forward method, the information accumulated at $\rho = 0$ versus $\rho = 1$ is (0.315, 0.647, 1.0008) versus (0.219, 0.540, 0.995).
Among trials where the null hypothesis is never rejected (rejection group 4), the mean estimate of the log hazard ratio is between -0.2111 (HR=0.810) and -0.207 (HR=0.813) (figure 4.5). Again, the pattern shown by the standard and pull-forward methods as $\rho$ increases can only be verified at the boundary values of $\rho$. The proportion of trials included in rejection group 4 lies between a maximum of 22.32% (standard method when $\rho = 0$) and a minimum of 21.38% (pull-forward method when $\rho = 0.75$), meaning that all processing methods reject a very similar proportion of trials by the end of observation.
Rejection group 4 effectively determines the power with which the null is correctly rejected under the alternative hypothesis of a true hazard ratio of 0.67 between treatment groups, since the rejecting trials are exactly those outside this group. All methods of data processing achieve power that is less than expected under the design specifications of the trial. The personal and global cutback methods yield power of 78.24% and 78.28% respectively, without regard to the delayed reporting scheme. Table 4.5 shows the power achieved by the standard and pull-forward methods and the effect of delayed reporting. The similarity in power achieved by the end of the trial can be attributed to the accumulation of information as trials continue to the final analysis and reject the null hypothesis.
Figure 4.5. The mean log hazard ratio, among trials that do not reject the null hypothesis at any analysis, for each data processing method across different delayed reporting schemes.
Table 4.4 Mean Information Fraction for rejection group 4 for standard and pull-forward methods for the first and second interim analyses and the final analysis under the alternative hypothesis of ln(HR=0.67)=-0.4004776.
Delayed reporting probability
0 0.1 0.25 0.5 0.75 0.9 1
STD (0.307, 0.635, 0.990) (0.297, 0.624, 0.989) (0.282, 0.608, 0.988) (0.259, 0.582, 0.986) (0.236, 0.555, 0.984) (0.221, 0.539, 0.982) (0.212, 0.529, 0.982)
PLF (0.306, 0.633, 0.989) (0.296, 0.623, 0.990) (0.283, 0.608, 0.988) (0.259, 0.581, 0.986) (0.235, 0.554, 0.983) (0.221, 0.538, 0.982) (0.212, 0.528, 0.982)
The amount of information accumulated by the personal and global cutback methods at each time of analysis is less than expected: (0.212, 0.529, 0.982) and (0.143, 0.434, 0.971) respectively (table 4.4). For the standard method, although more information is accumulated when no events are reported with delay (0.307, 0.635, 0.990) than when all events are reported with delay (0.212, 0.529, 0.982), all of these values fall short of the expected amount of information at each time of analysis. The same is true for the pull-forward method: (0.306, 0.633, 0.989) versus (0.212, 0.528, 0.982).
Table 4.5 Observed power for the standard and pull-forward methods under the alternative hypothesis of
ln(HR=0.67)=-0.4004776.
Delayed reporting probability
0 0.1 0.25 0.5 0.75 0.9 1
STD 77.68% 77.86% 77.92% 78.16% 78.40% 78.28% 78.24%
PLF 78.34% 78.30% 78.26% 78.44% 78.62% 78.52% 78.40%
Lastly, table 4.6 shows the mean trial length when applying the standard and pull-forward methods across different probabilities of delayed reporting. The personal and global cutback methods achieve mean trial lengths of 4.07 years and 4.4 years respectively. Notice in table 4.6 that as the delayed reporting probability increases, so does the mean trial length. As expected, trial lengths are longest when all events are reported with delay, since this means more information is lost in determining a true treatment difference.
Table 4.6 Mean trial length in years for the standard and pull-forward methods under the alternative
hypothesis of ln(HR=0.67)=-0.4004776.
Delayed reporting probability
0 0.1 0.25 0.5 0.75 0.9 1
STD 3.80 3.82 3.86 3.93 4.03 4.06 4.07
PLF 3.71 3.75 3.82 3.93 4.04 4.09 4.13
4.2.1.2 Effect of follow-up interval length
In this section we examine the effect that decreasing the length of the interval of time between scheduled follow-up visits has on estimates of the log hazard ratio and other outcomes of interest. We consider window lengths, $w$, of 0.5, 0.25, and 0.125 years. As the window length narrows, the probability of events (survival times being exponentially distributed) occurring between participants' last scheduled follow-up visit and the time of analysis decreases. We expect this to lead to less information loss due to delayed reporting of events or to a censoring mechanism that pushes back the deadline for contribution of data to analysis (the cutback methods). As before, the results are presented by rejection group, labeled in the figures by when the null is rejected: first interim analysis, second interim analysis, final analysis, or null not rejected.
Figure 4.6 is a three-dimensional plot of mean log hazard ratio estimates from the standard method for different delayed reporting probabilities and window lengths. The biggest difference in estimates resulting from different window lengths occurs when all events are reported with delay in rejection group 1. In this case, a window of 0.125 years compared to 0.5 years includes more than double the percentage of trials (8.12% vs 3.68%), increases the amount of information accumulated by 25.3% in relative terms (0.353 vs 0.282), and yields an estimate of the log hazard ratio much closer to that achieved by the raw estimation. Also, when the window between visits is 0.125 years, the effect of delayed reporting on the estimate of the log hazard ratio is minimal, with a relative difference between $\rho = 0$ and $\rho = 1$ of less than 5% (-0.872464 vs -0.91322).
Among trials in rejection group 2, the difference in the estimates of the log hazard ratio between the widest and narrowest window lengths is much smaller than that observed in rejection group 1. The percentage of trials included in rejection group 2 increases by 4 percentage points, from 28.4% when $w = 0.5$ to 32.4% when $w = 0.125$. The difference in the amount of information accrued at $\rho = 0$ versus $\rho = 1$ is smaller when the window length is 0.125 years (0.688 and 0.667) than when the window length is 0.5 years (0.686 and 0.585). A smaller window between visits also means more information is accumulated at the first time of analysis, though it is still lower than the 0.33 expected, across all delayed reporting probabilities.
Figure 4.6. The effect of delayed reporting and length of follow-up interval on the mean log hazard ratio estimates from the standard method of data processing, under the alternative hypothesis of a hazard ratio of 0.67 (length between visits: 0.5, 0.25, and 0.125 years).
The effect of shortening the window lessens for trials that reject the null hypothesis at the final analysis (rejection group 3). The biggest difference in the log hazard ratio estimate for any combination of window length and probability of delayed reporting is in the third decimal place. The pattern of an increasing percentage of trials in rejection group 3 as the probability of delayed reporting increases is present for all window lengths explored. This shows that, because of the censoring mechanism, trials that truly exhibit a treatment difference require more time to accumulate the needed amount of information to reject the null hypothesis, even when the window between follow-up visits is relatively small.
Decreasing the size of the window between visits also has very little effect on the estimate of the log hazard ratio for trials that do not reject the null hypothesis by the end of the trial (rejection group 4). The amount of information accrued by the end of the trial for trials in rejection group 4 is also very similar across window lengths. The only notable improvement is in the amount of information that accumulates as the trials progress to the end: with a narrower window, the mean amount of information accumulated at each time of analysis increases, particularly when a higher proportion of events are reported with delay, though it is always below what is expected (0.33, 0.66, and 1). To corroborate this, when all events are reported with delay and the window length is 0.5 years, the mean amount of information at each time of analysis is 0.212, 0.529, and 0.982. Similarly, when all events are reported with delay but the window length is 0.125 years, the amount of information at each time of analysis is 0.284, 0.611, and 0.987.
Table 4.7 Observed power for the standard method for window lengths of 0.125 years and 0.25 years under the
alternative hypothesis of ln(HR=0.67)=-0.4004776.
Delayed reporting probability
0 0.1 0.25 0.5 0.75 0.9 1
0.125 yrs 78.22% 78.30% 78.26% 78.28% 78.42% 78.44% 78.44%
0.25 yrs 78.12% 78.14% 78.22% 78.32% 78.42% 78.42% 78.28%
Narrowing the length of time between visits seems to have little effect on the power achieved by any method. The power achieved implementing the personal cutback method with a window length of 0.25 years versus 0.125 years is 78.28% versus 78.44%. For the global cutback method, the comparison is 78.26% versus 78.38%. From tables 4.7 and 4.9 we can see that the power achieved at the two shorter window lengths of 0.25 and 0.125 years for the standard and pull-forward methods, across different delayed reporting schemes, also changes very little. Again, this similarity in power across methods is related to the amount of information that accumulates as trials progress to the final analysis. As more calendar time passes, more information is garnered with which to correctly reject the null hypothesis and detect a treatment effect.
Accumulating more information at earlier times of analysis affects the mean length of a trial, since there is more information with which to correctly reject the null hypothesis at earlier times of analysis. The mean study length when all events are reported without delay and the window between visits is 0.125 years is 3.719 years. The mean study length when all events are reported with delay and the window between visits is 0.5 years is 4.074 years. In all, as the window size decreases and the probability of delayed reporting decreases, the mean length of a trial decreases.
Table 4.8 Mean trial length for the standard method for window lengths of 0.125 years and 0.25 years under the
alternative hypothesis of ln(HR=0.67)=-0.4004776.
Delayed reporting probability
0 0.1 0.25 0.5 0.75 0.9 1
0.125 yrs 3.72 3.72 3.73 3.76 3.79 3.80 3.80
0.25 yrs 3.74 3.75 3.76 3.81 3.86 3.89 3.90
Figure 4.7 shows the effect of window length and delayed reporting on the estimates of the log hazard ratio for the pull-forward method. Recall that when no events are reported with delay, the pull-forward method correctly assumes all non-events survive to the time of analysis, reducing to the raw method. As with the standard method, the biggest differences in estimates of the log hazard ratio among window lengths occur in trials belonging to rejection group 1. This difference decreases as the window length and the delayed reporting probability decrease and is largest for trials included in rejection group 1 when all events are reported with delay.
Comparing the log hazard ratio estimates between the standard and pull-forward methods, for each combination of window length, delayed reporting probability, and rejection group, the biggest difference is 0.018. This makes sense since, as shown in section 3.2.3, the censoring mechanisms of these two data processing methods are very similar.
Table 4.9 Observed power for the pull-forward method for window lengths of 0.125 years and 0.25 years under
the alternative hypothesis of ln(HR=0.67)=-0.4004776.
Delayed reporting probability
0 0.1 0.25 0.5 0.75 0.9 1
0.125 yrs 78.34% 78.42% 78.40% 78.38% 78.46% 78.36% 78.40%
0.25 yrs 78.34% 78.36% 78.44% 78.54% 78.28% 78.40% 78.38%
As can be seen in table 4.9, the power achieved by implementing the pull-forward method changes very little when the window between visits is narrowed. In fact, even as the probability of delayed reporting of events changes, the power stays within one tenth of a percent.
Even with regard to average length of study (table 4.10), the standard and pull-forward methods provide similar outcomes. The smallest average length of study for the pull-forward method, 3.708 years, occurs when the window length is 0.125 years and all events are reported without delay. There is a slight improvement over the standard method's average length of study in this circumstance only because the pull-forward method correctly assumes non-events survive to the time of analysis, whereas the standard method censors at the last scheduled visit. When all events are reported with delay and the window length is 0.5 years, the pull-forward method's average length of study is slightly larger than that of the standard method (4.132 versus 4.074 years), since in that case events reported with delay are incorrectly assumed to survive to the time of analysis.
Figure 4.7. The effect of delayed reporting and length of follow-up interval on the mean log hazard ratio estimates from the pull-forward method of data processing, under the alternative hypothesis of a hazard ratio of 0.67 (length between visits: 0.5, 0.25, and 0.125 years).
Table 4.10 Mean trial length in years for the pull-forward method for window lengths of 0.125 years and 0.25
years under the alternative hypothesis of ln(HR=0.67)=-0.4004776.
Delayed reporting probability
0 0.1 0.25 0.5 0.75 0.9 1
0.125 yrs 3.71 3.71 3.73 3.75 3.80 3.80 3.80
0.25 yrs 3.71 3.72 3.74 3.81 3.87 3.89 3.91
Recall that estimates of the log hazard ratio for neither the personal nor the global cutback method are altered by the probability of delayed reporting. Figure 4.8 shows the effect of window length on estimates of the log hazard ratio for the personal cutback method. As shown in section 3.2.3, the personal cutback method processes data as the standard method does when all events are reported with delay. This means that estimates of the log hazard ratio, the accumulated amount of information, the percentage of trials in each rejection group, power, and mean length of study are the same as those produced by the standard method when all events are reported with delay. What is most evident from figure 4.8 is that the effect of window length is most significant among trials in rejection group 1.
Figure 4.9 shows the effect of window length on estimates of the log hazard ratio from the global cutback method. Recall the global cutback method pushes back the deadline for data to contribute to analysis by the window length between visits; more information is lost as the window between visits gets wider.
The percentage of trials in rejection groups 1 and 2 from the global cutback method is the lowest of any method across all delayed reporting probabilities and window lengths. But as trials progress to the final time of analysis, especially for the global cutback method, there is a 'catch-up' in terms of accumulating enough information to detect the true difference in treatment effect. The global cutback method places the highest percentage of trials in rejection group 3 of any data processing method for any window length and delayed reporting scheme. This method produces the longest average trial lengths for all windows between visits (3.88 and 4.4 years when the window lengths are 0.125 years and 0.5 years respectively).
Figure 4.8. The effect of delayed reporting and length of follow-up interval on the mean log hazard ratio estimates from the personal cutback method of data processing, under the alternative hypothesis of a hazard ratio of 0.67 (length between visits: 0.5, 0.25, and 0.125 years).
4.2.2 Departure of the Hazard Ratio from Study Design
In this section we examine the effect of reporting events with delay on the estimates of the log hazard ratio under the alternative hypothesis of a hazard ratio of 0.85, i.e., less of a treatment effect than the trial was designed to detect. As in the previous section, also under investigation are the amount of accumulated information, power, and mean trial length. As before, we consider probabilities of delayed reporting of 0, 0.10, 0.25, 0.50, 0.75, 0.90, and 1 and set the window between scheduled visits to 6 months (0.5 years) in order to isolate the effect of delayed reporting. Results are presented by rejection group, as above.
Figure 4.9. The effect of delayed reporting and length of follow-up interval on the mean log hazard ratio estimates from the global cutback method of data processing, under the alternative hypothesis of a hazard ratio of 0.67 (length between visits: 0.5, 0.25, and 0.125 years).
It is important to note that, according to the Analysis of Resources for Trials (ART) program in Stata 14, the expected power for a trial designed with a 0.05 type I error rate, 208 participants randomized in a 1:1 ratio to treatment, two years of enrollment, and three years of enrollment-free follow-up is 21% under a true hazard ratio of 0.85. In other words, we can expect to correctly reject the null hypothesis 21% of the time in a trial designed to detect a hazard ratio of 0.67 when the true hazard ratio is 0.85. As will be shown, this means a significantly smaller number of trials is included in rejection groups 1 through 3.
4.2.2.1 Effect of delayed event reporting
Figure 4.10 shows the mean log hazard ratio for trials in rejection group 1 under different schemes of reporting events with delay. Very similar patterns of mean log hazard ratio estimates are evident as under the alternative hypothesis of a true hazard ratio of 0.67, and all estimates are severely biased, indicating more of a treatment effect than anticipated (ln(HR=0.85) = -0.162519). The most extreme log hazard ratio is achieved by the global cutback method (-1.3585), followed by the personal cutback method (-0.8174). When all events are reported with delay, the standard method coincides with the personal cutback method, and when no events are reported with delay, the pull-forward method coincides with the raw method (-0.8024). The proportion of trials included in rejection group 1 for each data processing method is much lower than under the alternative hypothesis of a true hazard ratio of 0.67. The personal and global cutback methods include 0.78% and 0.3% of trials in rejection group 1. The standard and pull-forward methods include more than two times the personal cutback proportion and more than five times the global cutback proportion, though these proportions decrease as delayed reporting becomes more prominent (from 1.48% to 0.78% for the standard method and from 1.68% to 0.66% for the pull-forward method).
The amount of information accumulated at the first time of analysis by trials in rejection group 1 under the personal and global cutback methods is 0.293 and 0.225 respectively, similar to the values under the alternative hypothesis of a hazard ratio of 0.67. The amount of information accumulated by the standard and pull-forward methods decreases as the probability of delayed reporting increases, going from overestimating to underestimating the expected 0.33 (table 4.11).
Figure 4.10. The mean log hazard ratio, among trials that reject the null hypothesis at the first interim analysis, for each data processing method across different delayed reporting schemes.
Table 4.11 Mean Information Fraction for rejection group 1 for standard and pull-forward methods
under the alternative hypothesis of ln(HR=0.85)=-0.1625189.
Delayed reporting probability
0 0.1 0.25 0.5 0.75 0.9 1
STD 0.382 0.371 0.358 0.34 0.309 0.299 0.293
PLF 0.392 0.382 0.371 0.348 0.315 0.303 0.3
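For orientation (my notation, not taken from the text): with event-driven monitoring, the information fraction at analysis $j$ is the ratio of events contributing to that analysis to the total expected by design,
$$ t_j = \frac{D_j}{D_{\max}}, $$
which is why three equally spaced looks carry the expected fractions 0.33, 0.66, and 1 referenced throughout this section.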
Among trials in rejection group 2, the mean log hazard ratio is still biased, showing more of a treatment effect than expected, and the most extreme estimate is again that of the global cutback method (-0.6322) followed by the personal cutback method (-0.5509). Figure 4.11 illustrates the effect of delayed reporting on the estimates produced by the standard and pull-forward methods.
The proportion of trials included in rejection group 2 increases significantly from rejection group 1 for all methods, but in particular for the cutback methods, with the personal and global cutback methods including 5.04% and 3.44% of all trials respectively. For the standard and pull-forward methods, the percentage of trials included decreases as the probability of delayed reporting increases (from 6.44% to 5.04% for the standard method and 6.86% to 4.96% for the pull-forward method).
The amount of information accumulated at the first and second times of analysis among trials in rejection group 2 shows that, again, more information is garnered at the time of analysis when the null is rejected than when the trial continues to the next time of analysis (compare table 4.12 to table 4.11). The personal cutback method accumulates information fractions of 0.247 and 0.611 at the first and second times of analysis. The global cutback method achieves less, with 0.173 and 0.516 accumulated at each time of analysis. Table 4.12 shows the amount of information accumulated by the standard and pull-forward methods for different probabilities of delayed reporting. Notice that the cutback methods accumulate less information than expected at both times of analysis. For the standard and pull-forward methods, the amount of accumulated information is overestimated when no events are reported with delay and decreases as the delayed reporting probability increases, becoming underestimated when all events are reported with delay.
Figure 4.11. The mean log hazard ratio, among trials that reject the null hypothesis at the second interim
analysis, for each data processing method across different delayed reporting schemes.
Among trials in rejection group 3, figure 4.12 shows the estimates of the log hazard ratio have a smaller range than in previous rejection groups; the difference between the maximum and minimum estimates is in the thousandths. The global cutback method yields the most extreme treatment effect (-0.3541) followed by the personal cutback method (-0.3509). The standard and pull-forward methods’ estimates of the log hazard ratio become more extreme as the probability of delayed reporting increases (from -0.3448 to -0.3509 for the standard method and -0.3436 to -0.3513 for the pull-forward method). Most noticeably, among trials that reject the null at the final time of analysis, the mean log hazard ratio detected is still biased and indicates more of a treatment effect than expected (ln(HR=0.85)=-0.162519). According to ART in Stata 14, 1228 participants would be required to detect a true treatment effect of ln(HR=0.85)=-0.162519 with 80% power and all other trial design parameters equal.
Table 4.12 Mean Information Fraction for rejection group 2 for standard and pull-forward methods for the
first and second interim analyses under the alternative hypothesis of ln(HR=0.85)=-0.1625189.
Delayed reporting probability
0 0.1 0.25 0.5 0.75 0.9 1
STD 0.352,0.710 0.340,0.698 0.327,0.686 0.300,0.660 0.276,0.638 0.260,0.622 0.247,0.611
PLF 0.353,0.717 0.342,0.706 0.328,0.693 0.302,0.667 0.278,0.645 0.262,0.631 0.250,0.619
With regard to the proportion of trials in rejection group 3, the standard and pull-forward methods include more trials as more events are reported with delay, a reversal of the trend seen in rejection groups 1 and 2. The proportion included by the standard method increases from 11.02% to 13.12% as the probability of delayed reporting increases from 0 to 1; for the pull-forward method the increase is from 10.36% to 11.34%. The personal and global cutback methods include the most trials, with 13.12% and 15.26% respectively. Similarly to what was concluded in the previous section, even under a true treatment effect weaker than the trial was designed to detect, allowing more (calendar) time to pass permits enough information to accumulate to reject the null hypothesis in trials where a true difference exists. Even so, the detected treatment effects are more extreme than the true effect under the alternative hypothesis of a hazard ratio of 0.85.
The amounts of information accumulated among trials in rejection group 3 at each time of analysis for the personal and global cutback methods are (0.232, 0.568, 1.003) and (0.159, 0.473, 0.995) respectively. Notice that the amounts at the first two times of analysis fall short of the expected 0.33 and 0.66. By the final analysis, both cutback methods come within the third decimal place of the expected amount of information.
Figure 4.12. The mean log hazard ratio, among trials that reject the null hypothesis at the final analysis, for each data processing method across different delayed reporting schemes.
Table 4.13 shows the amount of information achieved by the standard and
pull-forward methods for all times of analysis and the effect of delayed reporting of events. For
both the standard and pull-forward methods, the amount of information accumulated by the first
two interim analyses decreases as more events are reported with delay. The amount of information
accumulated in the final analysis for all methods is very close to 1 and changes very little even as
more events are reported with delay.
Table 4.13 Mean Information Fraction for rejection group 3 for standard and pull-forward methods for the first and second interim analyses and the final analysis under the alternative hypothesis of ln(HR=0.85)=-0.1625189. Entries are (first interim, second interim, final).
Delayed reporting probability
0 0.1 0.25 0.5 0.75 0.9 1
STD 0.335,0.682,1.006 0.324,0.671,1.006 0.308,0.653,1.005 0.283,0.624,1.004 0.258,0.597,1.004 0.242,0.580,1.003 0.232,0.568,1.003
PLF 0.333,0.677,1.007 0.323,0.667,1.007 0.307,0.649,1.006 0.282,0.621,1.005 0.258,0.595,1.003 0.243,0.578,1.003 0.232,0.567,1.002
Among trials in rejection group 4, the difference between the maximum and minimum estimates of the mean log hazard ratio is less than 0.0015 (figure 4.13). The estimate least biased away from the null of no treatment effect is that of the global cutback method (-0.1142) followed by the personal cutback method (-0.1149). For both the standard and pull-forward methods, the estimates produced are less biased away from the null as the probability of delayed event reporting increases (from -0.1155 to -0.1149 for the standard method and -0.1157 to -0.1149 for the pull-forward method). Notice the mean estimates of the log hazard ratio among trials that do not reject the null hypothesis by the end of the trial are much closer to the log hazard ratio under the alternative hypothesis than those of trials in rejection groups 1 through 3. As noted above, the trial does not have sufficient sample size to detect a weaker treatment effect than the one used to design the trial.
As trials progress to the final analysis with the null not rejected, we see the accumulated amount of information at each time of analysis underestimates what is expected (table 4.14). For the personal and global cutback methods, the amounts of information at each time of analysis are (0.221, 0.546, 0.990) and (0.149, 0.449, 0.981). For the standard and pull-forward methods, the amount of information garnered at each time of analysis decreases as more events are reported with delay.
Figure 4.13. The mean log hazard ratio, among trials that do not reject the null hypothesis at any analysis, for each data processing method across different delayed reporting schemes.
Table 4.14 Mean Information Fraction for rejection group 4 for standard and pull-forward methods for the first and second interim analyses and the final analysis under the alternative hypothesis of ln(HR=0.85)=-0.1625189. Entries are (first interim, second interim, final).
Delayed reporting probability
0 0.1 0.25 0.5 0.75 0.9 1
STD 0.319,0.655,0.996 0.309,0.644,0.995 0.294,0.627,0.994 0.270,0.600,0.993 0.245,0.573,0.991 0.231,0.557,0.990 0.221,0.546,0.990
PLF 0.319,0.655,0.996 0.309,0.644,0.995 0.294,0.627,0.994 0.270,0.600,0.993 0.245,0.573,0.991 0.230,0.556,0.990 0.221,0.545,0.990
The number of trials included in rejection group 4 is directly related to the power with which the null was correctly rejected under the alternative hypothesis of ln(HR=0.85) = -0.162519. Mainly due to the ‘catch-up’ in the proportion of trials included in rejection group 3, the proportion of trials included in rejection group 4 differs very little across all methods of processing data and across different delayed reporting schemes. The power achieved by the personal and global cutback methods is 18.94% and 19% respectively. Table 4.15 shows the power achieved by the standard and pull-forward methods and the effect of delayed reporting of events. The effect of delayed reporting of events is almost negligible with regard to correctly rejecting the null hypothesis by the end of the trial. Note that the power achieved matches what would be expected under the trial design specifications and as computed by ART in Stata 14.
Table 4.15 Observed power for the standard and pull-forward methods under the alternative hypothesis of
ln(HR=0.85)=-0.1625189.
Delayed reporting probability
0 0.1 0.25 0.5 0.75 0.9 1
STD 18.94% 19.18% 19.2% 19.1% 18.9% 18.88% 18.94%
PLF 18.9% 19.04% 19.1% 19.14% 18.78% 18.84% 18.96%
The inability to detect the true treatment difference with ample power affects the mean
length of study. When applying the personal and global cutback methods the mean trial lengths
are 4.83 years and 4.89 years respectively, almost the total length of the study as designed (5 years).
Table 4.16 shows that employing either the standard or pull-forward method does not improve the mean trial length significantly.
Table 4.16 Mean trial lengths in years for the standard and pull-forward methods under the alternative
hypothesis of ln(HR=0.85)=-0.1625189.
Delayed reporting probability
0 0.1 0.25 0.5 0.75 0.9 1
STD 4.77 4.77 4.78 4.80 4.82 4.83 4.83
PLF 4.75 4.76 4.77 4.79 4.82 4.84 4.84
4.2.2.2 Effect of follow-up interval length
In this section we examine the effect that decreasing the length of the interval of time between scheduled follow-up visits has on estimates of the log hazard ratio and other outcomes of interest, considering window lengths of 0.125 years, 0.25 years, and 0.5 years. I show that even under the alternative hypothesis of less of a treatment effect than the trial was designed to detect, narrowing the window length garners more information, since fewer events are likely to fall between participants’ last follow-up visit and the time of analysis and consequently fewer events get reported with delay. This decrease in window length leads to slight improvements in power and mean trial length.
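A back-of-the-envelope calculation (my heuristic, not a derivation from chapter 3) makes the mechanism concrete: if the time of analysis falls uniformly within a participant's visit cycle of length $w$, the expected gap between the last scheduled visit and the analysis is
$$ \mathbb{E}[\text{gap}] \approx \frac{w}{2}, $$
so the expected number of events exposed to possible delayed reporting shrinks roughly in proportion to the window length; quartering $w$ from 0.5 years to 0.125 years roughly quarters that exposure.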
In figure 4.14 we see the effect that decreasing the window length between follow-up visits and delayed event reporting have on estimates of the mean log hazard ratio from the standard method. Rejection group 1 (labeled interim 1 in figure 4.14) produces the biggest differences in estimates of the log hazard ratio with respect to window length. As the window size increases and the probability of delayed reporting increases, the mean estimate is more biased away from the alternative, indicating more of a treatment effect than expected under the alternative hypothesis.
For the smallest window between visits, 0.125 years, the effect of reporting more events with delay is the smallest, yielding an estimate of -0.8136 when no events are reported with delay (ρ = 0) versus -0.849 when all events are reported with delay (ρ = 1). Among trials in rejection group 2
and rejection group 3 there is less difference in the mean estimates of the log hazard ratio with
respect to the window length (appendix table 6). Notice that in rejection group 3, reducing the
window length to 0.125 years still does not yield a mean log hazard ratio near what is expected
under the alternative (ln(HR=0.85) = -0.162519).
Among trials included in rejection group 1 and 2, the drop in percentage of trials as more
events are reported with delay is more significant as the window size increases (appendix tables 5, 6, 7, and 8 for complete results).
Figure 4.14. The effect of delayed reporting and length of follow-up interval on the mean log hazard ratio estimates from the standard method of data processing, under the alternative hypothesis of a hazard ratio of 0.85 (panels: window lengths of 0.5, 0.25, and 0.125 years between visits).
In other words, the effect of delayed reporting on the proportion
of trials included in rejection groups 1 and 2 is decreased by shortening the window between the
last follow-up visit and the time of analysis. Among trials in rejection group 1, when the window length is 0.125 years, the proportion of trials included falls from 1.58% when ρ = 0 to 1.52% when ρ = 1. For trials in rejection group 2, when the window length is 0.125 years, the proportion of trials included falls from 6.76% when ρ = 0 to 6.54% when ρ = 1. Also, the trend of an increase in the percentage of trials included in rejection group 3 as the delayed reporting probability increases is also observed when the window length is less than 0.5 years, but it is not as prominent as when the window length is 0.5 years.
With regard to accumulated information, for rejection groups 1, 2, and 3, window lengths of 0.125 years and 0.25 years yield overestimates of what is expected, no matter the delayed reporting scheme. Note that for rejection group 3, the overestimation is very small.
Due to the ‘catch-up’ in information as trials progress to later times of analysis, especially by the final analysis, the ability to correctly reject the null hypothesis does not vary much even as the probability of delayed reporting increases (table 4.17). As predicted by ART in Stata 14, because the initial design parameters target a larger treatment effect (in absolute value) than the one present under this alternative hypothesis, the power is very low.
Table 4.17 Observed power for the standard method for window lengths of 0.125 years and 0.25 years under the
alternative hypothesis of ln(HR=0.85)=-0.1625189.
Delayed reporting probability
0 0.1 0.25 0.5 0.75 0.9 1
0.125yrs 19.04% 18.9% 19.04% 19% 19.06% 18.9% 19.06%
0.25yrs 19.12% 19.12% 19.22% 19.24% 18.9% 18.92% 18.94%
Also, since similar levels of power are achieved and very few trials are included in rejection groups 1 and 2, the mean trial length is very close to the entire planned length of the trial (table 4.18), even as the window between visits decreases to almost one-fourth the original size.
Table 4.18 Mean trial length for the standard method for window lengths of 0.125 years and 0.25 years under the
alternative hypothesis of ln(HR=0.85)=-0.1625189.
Delayed reporting probability
0 0.1 0.25 0.5 0.75 0.9 1
0.125yrs 4.75 4.76 4.76 4.76 4.76 4.77 4.76
0.25yrs 4.75 4.76 4.76 4.76 4.78 4.79 4.79
Recall from chapter 3 that the pull-forward method censors trial participants in a very similar manner to the standard method, except that censored individuals are assumed to survive to the time of analysis rather than to the time of the last follow-up visit. As a result, as the window between visits
is shortened, the pull-forward method yields very similar mean estimates of the log hazard ratio as
the standard method (appendix tables 5, 6, 7 and 8 for complete results). Figure 4.15 shows the
effect of window size and delayed event reporting on the mean estimates of the log hazard ratio.
Notice the biggest improvement in consistent estimation of the log hazard ratio is achieved in
rejection group 1 by shortening the window length between visits to 0.125 years.
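To make the four censoring rules concrete, the sketch below paraphrases them in Python as I read them from chapters 1 and 3. It is not the dissertation's simulation code; in particular, I treat the global cutback as applying the personal cutback rule at a deadline pulled back one window length, which is my reading of its description.

```python
import math

def last_visit(enroll, t, w):
    """Calendar time of the last scheduled follow-up visit at or before t."""
    return enroll + w * math.floor((t - enroll) / w)

def contribution(method, enroll, t_event, t_analysis, w, delayed):
    """(time on study, event indicator) a participant contributes to an analysis
    at calendar time t_analysis, or None if nothing is contributed. t_event is
    the true calendar event time (math.inf if event-free); delayed flags an
    event after the last visit whose report has not yet arrived."""
    if method == "global_cutback":
        t_analysis -= w                      # pull the whole deadline back one window
    v = last_visit(enroll, t_analysis, w)
    if t_event <= v:                         # event at or before the last visit:
        return (t_event - enroll, 1)         # known to all four methods
    if (method in ("standard", "pull_forward")
            and t_event <= t_analysis and not delayed):
        return (t_event - enroll, 1)         # promptly reported post-visit event
    if method == "pull_forward":
        return (t_analysis - enroll, 0)      # fabricate survival to the analysis
    return (v - enroll, 0) if v > enroll else None   # censor at the last visit
```

For example, with enrollment at year 1, an event at year 2.3 reported late, analysis at year 2.45, and w = 0.5, the standard method contributes (1.0, 0) while the pull-forward method contributes (1.45, 0), the fabricated survival time described above.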
The proportions of trials included in rejection groups 1, 2, and 3 differ from those achieved by the standard method by less than one-tenth of one percentage point (appendix tables 5, 6, 7, and 8 for complete results) for all window length and delayed event probability combinations. Also, delayed event reporting and a narrowing window length have very similar effects on the amount of information garnered at each time of analysis. The power achieved utilizing the pull-forward method (table 4.19) is very similar to that achieved by the standard method, and due to a similar ‘catch-up’ of information achieved by trials in rejection group 3, the effect of delayed reporting and window length on power is minimal. By the same argument, the mean length of the trial employing the pull-forward method is very similar to that yielded by the standard method (table 4.20).
Table 4.19 Observed power for the pull-forward method for window lengths of 0.125 years and 0.25 years under the alternative hypothesis of ln(HR=0.85)=-0.1625189.
Delayed reporting probability
0 0.1 0.25 0.5 0.75 0.9 1
0.125yrs 18.9% 18.9% 18.88% 18.88% 19.06% 18.86% 18.92%
0.25yrs 18.89% 18.94% 18.86% 19.02% 18.8% 18.88% 18.86%
Figure 4.15. The effect of delayed reporting and length of follow-up interval on the mean log hazard ratio estimates from the pull-forward method of data processing, under the alternative hypothesis of a hazard ratio of 0.85 (panels: window lengths of 0.5, 0.25, and 0.125 years between visits).
Figure 4.16 shows the effect of delayed event reporting and window length on mean
estimates of the log hazard ratio for the personal cutback method. The trials included in rejection
group 1 experience the biggest difference in mean log hazard ratio by reducing the window size
from 0.5 years to 0.125 years (-1.0341 when the window is 0.5 years versus -0.8490 when the window is 0.125 years). This difference in estimates between the widest and narrowest window decreases as trials progress to later rejection groups (0.06 for rejection group 2, 0.005 for rejection group 3, and 0.0005 for rejection group 4). Notice that trials that ever reject the null hypothesis (rejection groups 1, 2, and 3) produce mean estimates of the log hazard ratio that are biased away from the alternative, showing more of a treatment effect than expected. Also, note that as the window length decreases to 0.125 years,
the estimates of the log hazard ratio produced by the personal cutback method approach the
standard method estimates for all rejection groups.
Table 4.20 Mean trial length for the pull-forward method for window lengths of 0.125 years and 0.25 years under the alternative hypothesis of ln(HR=0.85)=-0.1625189.
Delayed reporting probability
0 0.1 0.25 0.5 0.75 0.9 1
0.125yrs 4.75 4.75 4.76 4.75 4.75 4.76 4.76
0.25yrs 4.75 4.75 4.76 4.77 4.78 4.79 4.79
Narrowing the window length between scheduled follow-up visits almost doubles the number of trials included in rejection group 1 (1.52% with a 0.125-year window versus 0.78% with a 0.5-year window) and improves rejection group 2 by 1.5 percentage points (6.54% versus 5.04%). The amounts of information accumulated in rejection groups 1 and 2 are very close to those of the standard method and overestimate the expected information at each time of analysis. Since more information can accumulate prior to the two interim analysis times when the window length is 0.125 years, the ‘catch-up’ in information and in the number of trials included in rejection group 3 is not as profound as the ‘catch-up’ experienced when the window length is 0.5 years. Regardless of the improvement in the number of trials included and in the estimate of the log hazard ratio, the difference in power between the widest and narrowest window lengths is but one-tenth of one percentage point (19.06% with a 0.125-year window versus 18.94% with a 0.5-year window). Similarly, with respect to the average length of study, narrowing the window from 0.5 years to 0.125 years saves seven one-hundredths of a year (4.76 years versus 4.83 years).
Figure 4.16. The effect of delayed reporting and length of follow-up interval on the mean log hazard ratio estimates from the personal cutback method of data processing, under the alternative hypothesis of a hazard ratio of 0.85 (panels: window lengths of 0.5, 0.25, and 0.125 years between visits).
Similarly, when using the global cutback data processing method (figure 4.17), shortening the window length from 0.5 years to 0.125 years improves the number of trials included in rejection group 1 by a factor of four (0.3% with a 0.5-year window versus 1.2% with a 0.125-year window), decreases the bias in the mean estimate of the log hazard ratio (from -1.3583 to -0.9036), and improves the amount of information collected in rejection groups 1 and 2 so that the estimates are closer to the expected 0.33 and 0.66. Regardless of the improvement in estimation at early stages of analysis, decreasing the window length has very little effect on the observed power (19.04% with a 0.125-year window versus 19% with a 0.5-year window) and on the average length of study (4.79 years versus 4.89 years).
4.2.3 Probability of a Type I Error under the null hypothesis
In the following section, the effects of window length between scheduled visits and delayed event
reporting on the observed type I error rate are investigated. This is done specifically under the
null hypothesis of no treatment effect (ln(HR=1) = 0) employing the four data processing methods
under investigation.
Figure 4.17. The effect of delayed reporting and length of follow-up interval on the mean log hazard ratio estimates from the global cutback method of data processing, under the alternative hypothesis of a hazard ratio of 0.85 (panels: window lengths of 0.5, 0.25, and 0.125 years between visits).
4.2.3.1 Effect of Delayed Reporting and Follow-up Interval Length
Recall by design, the trial should achieve a type I error rate of 0.05. Although repeated testing
tends to inflate the type I error rate (section 2.1.1), the group-sequential methodology and
information fraction method utilized in these simulation studies maintains the type I error rate at
just below 0.05. Overall, for methods subject to delayed reporting, even under the null hypothesis,
the proportion of trials included in rejection groups 1 and 2 decreases as more events are reported with delay and less information is accumulated. The similarity in the type I error rates achieved (see below) is due to the increase in the proportion of trials included in rejection group 3, as seen in the results under the other alternatives.
For the personal and global cutback data processing methods, cutting back the deadline for
contribution of data to analysis leads to a significant loss of information in early stages of analysis
(appendix tables 5, 6, 7 and 8 for complete results). The progression of trials to the final stage of
analysis allows for sufficient information to be accumulated in trials to keep the type I error rate
near the nominal 0.05. Alleviating the loss of information with smaller windows between scheduled visits, essentially shrinking the amount of time cut back from the time of analysis, alters the probability of a type I error very little. The personal cutback method achieves type I error rates of 0.0464, 0.0456, and 0.0472 for window lengths of 0.125 years, 0.25 years, and 0.5 years respectively. The global cutback method achieves type I error rates of 0.0452, 0.0458, and 0.0448 for window lengths of 0.125 years, 0.25 years, and 0.5 years respectively.
Table 4.21 shows the type I error rate for the standard and pull-forward methods for three
different window lengths (0.125 years, 0.25 years, and 0.5 years) and the different schemes of
reporting events with delay. Notice there is only a 0.005 difference in error rate among all data
processing methods, across all delayed reporting probabilities, and window lengths. The standard
method achieves the maximum at 0.0498 (window=0.5 and rho=0.25) and the global cutback
method achieves the minimum at 0.0448 (window=0.5). Generally, as the delayed reporting
probability increases, the type I error rate decreases (table 4.21).
Table 4.21 The effect of window length between visits and delayed reporting of events on the observed type I error rate for the standard and pull-forward methods for window lengths of 0.125 years, 0.25 years, and 0.5 years under the null hypothesis of no treatment effect (ln(HR=1) = 0).
Delayed Reporting Probability
0 0.1 0.25 0.5 0.75 0.9 1
STD
0.125 0.0476 0.0466 0.0466 0.0448 0.0464 0.0458 0.0464
0.25 0.0464 0.0462 0.0470 0.0456 0.0464 0.0462 0.0456
0.5 0.0482 0.0490 0.0498 0.0488 0.0478 0.0476 0.0472
PLF
0.125 0.0478 0.0474 0.0470 0.0462 0.0458 0.0456 0.0458
0.25 0.0478 0.0460 0.0454 0.0456 0.0468 0.0466 0.0466
0.5 0.0478 0.0472 0.0486 0.0486 0.0478 0.0478 0.0488
Window lengths are given in years.
As will be discussed in the next section, the standard and personal cutback methods present the most viable options for processing data of the four presented in this dissertation. Figure 4.18 compares the type I error rates achieved by the standard and personal cutback methods for different window lengths and probabilities of delayed reporting. The dashed lines denote the error rates attained by the standard method and the colors denote the different window lengths; the solid lines denote the rates achieved by the personal cutback method. There are a couple of items to note: 1) the scale on the y-axis is very small, spanning a range of 0.005, and 2) the rates achieved by the standard method agree with the personal cutback method when all events are reported with delay.
4.2.4 Conclusion
Performance of the data processing methods under investigation is evaluated by the following: 1) reducing the amount of information loss, especially at early stages of analysis; 2) attaining the true log hazard ratio by the final analysis; 3) maintaining the power and type I error rate at the levels specified in the design; and 4) reducing the mean trial length.
Due to ethical imperatives and the protection of human lives, inspecting accumulating data
in clinical trials at interim stages is necessary to monitor for extreme therapeutic results and to
provide convincing evidence to alter or stop the trial (section 2.1.1). Therefore it is of great import
for the data processing method implemented to perform well at interim stages of data analysis. It
is well understood that estimates produced at early stages of analysis are biased (section 2.4), but loss of information due to the manner in which data are processed for analysis can lead to severe deviations under either alternative hypothesis explored above.
Figure 4.18. The effect of delayed reporting and length of follow-up interval on the type I error rate from the standard and personal cutback methods of processing data.
The global
cutback method, pushing back the deadline for contribution of data to analysis by one window
length, yields the most egregious loss of information at the first and second interim analyses, in
particular when the window length is 0.5 years (under either alternative hypothesis above). As the
window shortens, more information can be accumulated and a higher proportion of trials are
included early on, yet the gains are not enough to improve upon the other methods of data
processing. The loss is worst at the first interim analysis since enrollment is still ongoing, so subject information is eliminated simply due to the cutback. The personal cutback method, which censors participants’ contribution to analysis at the last clinical visit should they remain under observation beyond it, also experiences loss of information at early stages of analysis but performs comparably to the standard method as the window between visits shortens.
Although the pull-forward method loses as much information as the standard method due to delayed reporting (see chapter 3), excluding the same participants from analysis because of the similarity of the censoring mechanisms, it has a fundamental flaw in the manner in which survival time is fabricated for participants still under observation past the time of their last follow-up visit. For participants who experience an event between their last follow-up visit and the time of analysis, and whose event is reported with delay, the pull-forward method incorrectly assumes survival to the time of analysis. This creation of fictitious data increases as the window between visits increases. In the community of researchers at this academic institution, there is a collective reticence to apply the pull-forward method for the reasons mentioned here.
McIlvaine (2015) showed the standard and pull-forward methods yield biased estimates of the hazard rate within each treatment group as the probability of delayed reporting increases when one analysis is done at the end of the trial. In these simulation studies, these methods produced smaller (less biased) mean estimates of the log hazard ratio, mainly because trials are further along in terms of elapsed percent of information compared to the cutback methods. The personal and global cutback methods suffer early on from pushing back the deadline for the contribution of data to analysis while enrollment is still ongoing, which ultimately affects how far along the trial is with regard to elapsed percent of information, since fewer enrolled trial participants can potentially experience an event. This is alleviated significantly by reducing the window between visits, which in turn decreases the cutback for each method of processing data and increases the amount of information accumulated, allowing the trial to be further along in total information and closer to the amount expected by each time of analysis.
As can be seen under the alternative hypothesis of a hazard ratio of 0.67 (figures 17, 18, 19, and 20), the bias decreases for all four processing methods as trials progress to the final analysis and as the window between visits decreases to 0.125 years, yielding mean estimates of the log hazard ratio within three decimal places of what is expected (ln(HR=0.67) = -0.400478). By the time of
the final analysis, the effect of delayed reporting is smallest when the window between visits is
smallest. Under the alternative hypothesis of a hazard ratio of 0.85, since the trial design is not
sufficiently powered (insufficient sample size) to detect a weaker treatment effect, the bias persists
to the final analysis where stronger than expected (under the alternative) treatment effects are
detected. Again, in this situation, the effect of delayed reporting is mitigated by shortening the window between scheduled follow-up visits.
In previous sections, it was shown the power achieved by the end of the trial differs very
little across data processing techniques under investigation here, for either of the alternative
hypotheses under review. Under the alternative hypothesis of a hazard of 0.67, each method
maintained the power just under the 80% as specified in the design of the trial and was altered only
119
slightly as more events were reported with delay and the window between visits decreased. Under
the alternative hypothesis of a hazard ratio of 0.85, although under-powered, the power achieved
by the four data processing methods was maintained just below the 21% predicted by the ART
from Stata 14.
An observation made here in the setting of interim analyses, and corroborated by McIlvaine (2015) for the case in which analysis is completed once after enrollment is complete, is that given enough calendar time, trials progressing to later stages of analysis will garner enough information to reject the null hypothesis when a true difference is present. This catch-up in information is most apparent in trials included in rejection group 3, regardless of the true treatment effect (appendix tables 1, 2, 3, and 4 for complete results). The trial design in this simulation study targets detection of a treatment effect (ln(HR=0.67) = -0.400478) with 80% power. Figure 4.19 compares the effect of window length and delayed reporting probability on the power attained by the standard method versus the personal cutback method under the alternative hypothesis of a hazard ratio of 0.67; notice the range of the y-axis is 0.76%. Figure 4.20 makes the same comparison under the alternative hypothesis of a hazard ratio of 0.85; also in this situation, notice the small range of the y-axis. While there seems to be a pattern relating the power attained by the standard method to the probability of delayed reporting (for either alternative hypothesis), only the agreement in estimates when all events are reported with delay can be verified. The pull-forward method achieves power comparable to the standard and personal cutback methods.
With respect to the overall type I error rate (under the null hypothesis of no treatment
effect), the four methods of processing data maintain the nominal type I error rate close to, albeit
slightly below, the 0.05 as designed. The effect of delayed reporting is more prominent in earlier
stages of analysis but as trials progress to later stages of analysis, sufficient information is
accumulated making the effect of delayed reporting by the end of the trial almost negligible with
respect to the type I error rate. Also, narrowing the window between follow-up visits yields little
effect on the type I error rate observed. As mentioned above, only a difference of 0.005 exists between the maximum and minimum error rate attained among all data processing methods, across all delayed reporting probabilities and window lengths (figure 4.18).
Figure 4.19. The effect of delayed reporting and length of follow-up interval on the observed power from the standard and personal cutback methods of processing data, under the alternative hypothesis of a hazard ratio of 0.67.
Lastly, related to the number of trials included in the rejection groups is the mean length
of trial achieved implementing the four data processing techniques under investigation. Under the
alternative hypothesis of a hazard ratio of 0.67, the effect of delayed reporting of events is most
noticeable when the window between follow-up visits is 0.5 years. Note that in this setting, 0.5
years represents a time when the window width is 25% of the total accrual time. In this case,
reporting no events with delay, versus reporting all events with delay, reduces the mean trial length of the standard method by a little more than a quarter of a year (0.271 years) and that of the pull-forward method by almost half a year (0.424 years). Reducing the window length also reduces the mean trial length for the standard and pull-forward methods, but the effect is most notable in the cutback methods.
Figure 4.20. The effect of delayed reporting and length of follow-up interval on the observed power from the standard and personal cutback methods of processing data, under the alternative hypothesis of a hazard ratio of 0.85.
Implementing the personal cutback method and reducing the window between follow-up visits
from 0.5 years to 0.125 years reduces the mean trial length by over a quarter of a year (0.272
years). For the global cutback method the same reduction in window length leads to a reduction
in mean trial length by over half a year (0.519 years). Under the alternative hypothesis of a hazard ratio of 0.85, the mean trial length falls just under a quarter of a year short of the full planned trial length of 5 years for all methods of data processing, across the different probabilities of delayed reporting of events and window lengths between follow-up visits (appendix tables 5, 6, 7, and 8 for complete results).
In all, there are benefits and trade-offs to weigh prior to implementing any of the four processing methods explored here to analyze trial data. In essence, due to the poor performance of the global cutback method at early times of analysis and to the reasonable reluctance of the local clinical trials community to use methods that fabricate participant data, the most viable processing methods remaining are the standard method and the personal cutback method. The benefit of implementing the personal cutback method is that the outcomes of interest are not subject to the probability of reporting events with delay, a probability that is not known in the design phase of a trial and depends mainly on logistical aspects of the management of the trial. The loss of information due to the censoring mechanism of the personal cutback method can be mitigated by reducing the window between follow-up visits. In the next section, I review simulation results on the assumption of independent increments, an assumption necessary in the group-sequential methodology for generating the bounds of rejection and continuation.
4.3 Results: Effect of Delayed Reporting and Follow-up Interval Length on the Assumption
of Independent Increments
In chapter three (section 3.2.4) I showed that the independent-increments covariance structure of the multivariate distribution of the score statistics achieved at three different times of analysis, under the null hypothesis of no treatment effect, is violated when applying the standard or pull-forward data processing methods.
Specifically, the deviation from independent increments occurs applying the standard method
(figures 8 and 9, section 3.2.4) when the probability of reporting events with delay decreases and
the window between scheduled follow-up visits increases. Applying the pull-forward method the
deviation from this assumption occurs as the probability of reporting events with delay increases
and the window between follow-up visits increases (figures 10 and 11, section 3.2.4). As noted
in chapter two, many methods depend on the independent increment property for the construction
of group sequential boundaries including the method employed in this simulation study.
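To illustrate why the property matters (a self-contained Monte Carlo sketch, not the dissertation's boundary code), under independent increments the score process at the information times behaves like Brownian motion, so boundaries honoring the quadratic spending function $\alpha(t) = \alpha t^2$ used in this research (section 2.4.1) can be calibrated look by look:

```python
import numpy as np

rng = np.random.default_rng(2016)
alpha = 0.05
t = np.array([0.33, 0.66, 1.0])        # information fractions at the three looks
spend = alpha * t ** 2                 # cumulative alpha spent by each look

# Independent increments: simulate the score process as Brownian motion
# observed at the information times t.
n = 500_000
inc_sd = np.sqrt(np.diff(np.concatenate(([0.0], t))))
S = np.cumsum(rng.standard_normal((n, 3)) * inc_sd, axis=1)
Z = S / np.sqrt(t)                     # standardized statistics at each look

alive = np.ones(n, dtype=bool)         # trials that have not yet stopped
bounds = []
for k in range(3):
    extra = spend[k] - (spend[k - 1] if k else 0.0)  # alpha to spend at look k
    # choose the boundary so the cumulative rejection probability matches spend[k]
    c = np.quantile(np.abs(Z[alive, k]), 1 - extra / alive.mean())
    bounds.append(round(float(c), 2))
    alive &= np.abs(Z[:, k]) < c
print(bounds)                          # first bound near 2.78, later bounds lower
```

When the increments are correlated, as shown below for the standard and pull-forward methods, this Brownian representation no longer holds, so boundaries calibrated under it spend more or less alpha than intended.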
The following are results from survival data generated under the null hypothesis of no treatment effect, specifically with a hazard rate of 1 in both the control and treatment groups, with all other design parameters kept the same as in previous sections.
The correlation of increments is computed from all 5,000 simulated trials for each combination of window length and probability of delayed reporting. The assumption of independent increments is investigated using the Wald test statistic from the regression test for treatment effect and the observed information accumulated up to the time of analysis, as defined by Kalbfleisch and Prentice (2002), implementing a result by Scharfstein et al. (1997) (section 2.2.2).
section I will use the following notation to denote the score statistic achieved at analysis time
124
= ∗
.
(4.3)
Here and ∗
denote the estimate for the Fisher information based on the data available at
analysis and the Wald statistic achieved from the exponential regression test for treatment effect.
4.3.1 Investigation of the Assumption of Independent Increments
In this section I consider all 5,000 simulated trials per window length and delayed reporting
probability combination in assessing the assumption of independent increments under different
data processing techniques evaluated in this simulation study.
In chapter three I show that the covariance structure of the multivariate distribution of the score statistics achieved at each interim analysis and the final analysis, applying the personal cutback method of data processing, conforms to the assumption of independent increments; in other words, $\mathrm{Cov}(S_1, S_2 - S_1) = 0$ and $\mathrm{Cov}(S_2 - S_1, S_3 - S_2) = 0$ (chapter 3 equation 3.2). In agreement with the theoretical result, the simulations demonstrated that, when processing data with the personal cutback method, the correlation between the estimate of the first score statistic, (4.3), and the estimate of the increment to the second score statistic, $S_2 - S_1$, is not statistically significantly different from zero for window lengths of 0.125 years, 0.25 years, and 0.5 years. The same is true when implementing the global cutback method of processing data.
Via simulation, applying the standard method of data processing, the pattern of the correlation coefficient between $S_1$ and $S_2 - S_1$ is similar to that seen in the result from chapter 3 (chapter 3 figure 8) in that the correlation coefficient increases as the window size increases and as the probability of reporting events with delay decreases (figure 4.21). A conventional statistical test of the null hypothesis that the correlation is zero, however, was not statistically significant at the 0.05 level. As noted earlier, the positive correlation occurring when all events are reported without delay as the window size increases is driven by censoring event-free participants at their last follow-up visit yet reporting without delay all events occurring between participants’ last follow-up visits and the time of analysis. This essentially creates two different deadlines for the analytic time point, one for participants with ‘well follow-up’ and another for participants experiencing events. The wider the interval of time between visits, the more ‘well follow-up’ time needs to be made up and accounted for at the next time of analysis by the subsequent estimate of the score statistic.
Figure 4.22 shows the correlation coefficients of $S_2 - S_1$ and $S_3 - S_2$, the estimate of the increment from the first to the second score statistic and the estimate of the increment from the second to the third score statistic, for different schemes of delayed reporting and windows between follow-up visits. Again, the simulation results show the correlation between increments has a pattern similar to that expected in the asymptotic distribution of the score statistics when applying the standard method of data processing. In figure 4.22, red correlation coefficients denote statistical significance at the 0.05 level. The maximum correlation coefficient, r = 0.0403, is achieved when all events are reported without delay at the widest window between visits of 0.5 years.
In both figure 4.21 and figure 4.22, the correlation between increments decreases both as
the window decreases and more notably as the probability of delayed reporting increases. When
all events occurring between participants’ last follow-up visits and the time of analysis are reported
with delay, the standard method of data processing reduces to the personal cutback method and as
shown in section 3.2.3, the personal cutback method yields score statistics that have an independent
increment covariance structure. Thus reporting all events occurring between participants’ last visit
and the time of analysis with delay ameliorates the effect of window length between visits on the
correlated nature of the increments of the score statistics. On the other hand, reducing the window between visits, in this case to 0.125 years, reduces the effect of reporting events with delay on the correlated nature of the score statistics when applying the standard method.
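The diagnostic behind figures 4.21 and 4.22 is straightforward to state; a minimal sketch (assuming a hypothetical array stacking the three score statistics from each of the 5,000 simulated trials) is:

```python
from scipy import stats

def increment_correlations(S):
    """S: NumPy array of shape (n_trials, 3) with the score statistics
    S1, S2, S3 of equation (4.3), one row per simulated trial. Returns
    Pearson (r, p) for the pairs (S1, S2 - S1) and (S2 - S1, S3 - S2)."""
    d1, d2 = S[:, 1] - S[:, 0], S[:, 2] - S[:, 1]
    return stats.pearsonr(S[:, 0], d1), stats.pearsonr(d1, d2)
```

This computation would be repeated for each window-length and delayed-reporting-probability combination to produce the grids of coefficients plotted in the figures.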
Figures 4.23 and 4.24 show the correlation coefficients for $(S_1, S_2 - S_1)$ and $(S_2 - S_1, S_3 - S_2)$ when implementing the pull-forward method of data processing. In this setting, participants surviving to the time of their last follow-up visit are assumed to survive to the analytic time point. This assumption is incorrect for participants who experience an event between their last follow-up visit and the time of analysis whose event is reported with delay. The subsequent score statistic is affected by the survival time that was incorrectly fabricated at the previous time of analysis. This leads to a negative association between increments of the score function, as shown via simulation.
Figure 4.21. The effect of delayed reporting and length of follow-up interval on the correlation between the first score statistic and the increment from the first to the second score statistic, for the standard method, under the null hypothesis of no treatment effect. Note red denotes statistical significance.
For the correlation of $(S_1, S_2 - S_1)$, the pull-forward method produces statistically significant correlations when the window between visits is widest (for all but a delayed reporting probability of 0). Though reducing the window length seems to produce uncorrelated increments in this case, the correlation becomes greater in (negative) magnitude and significance (the p-value decreases) as the probability of delayed reporting increases. In the worst case, when all events are reported with delay and the window is the widest at 0.5 years, the correlation reaches r = -0.1129 (figure 4.23).
Figure 4.22. The effect of delayed reporting and length of follow-up interval on the correlation between the increment from the first to the second score statistic and the increment from the second to the third score statistic, for the standard method, under the null hypothesis of no treatment effect. Note red denotes statistical significance.
The correlations between the second and third score statistic increments, $(S_2 - S_1, S_3 - S_2)$, show a similar pattern to the previous increments, though the significant correlation when all events are reported with delay with a window of 0.5 years is smaller in magnitude (r = -0.0991 vs r = -0.1129). From the simulation results, it seems reducing the window may ameliorate the effect of delayed reporting of events on the correlated nature of the increments, though a smaller window between visits may not be feasible in practice. Decreasing the delayed reporting probability to zero yields estimates of the score function that satisfy the assumption of independent increments since, in that case, the pull-forward method reduces to perfect ascertainment of trial information (the raw method) (figure 4.24).
Figure 4.23. The effect of delayed reporting and length of follow-up interval on the correlation between the first score statistic and the increment from the first to the second score statistic, for the pull-forward method, under the null hypothesis of no treatment effect. Note red denotes statistical significance.
Figure 4.24. The effect of delayed reporting and length of follow-up interval on the correlation between
the increment from the first to the second score statistic and the increment from the second to the third
score statistic, for the pull-forward method, under the null hypothesis of no treatment effect. Note red
denotes statistical significance
4.3.2 Issues with Application of Asymptotic Results in a Finite Sample Setting
The main issue with applying asymptotic results to data from finite samples is extrapolating what occurs as the sample size approaches infinity to a sample size that is relatively small; in this case, a sample size of 208. For confirmation of the soundness of the general framework presented in chapter 3, this discussion focuses on the standard method when all events are reported without delay (ρ = 0) and the window between visits is 0.5 years. Table A5.1 (appendix 5) shows the asymptotic expected values of the status indicator variable and time under observation at different times of analysis, and of products of the status indicator and time under observation in different combinations at different times of analysis, and compares them with the average values achieved in finite samples for group sizes of 104, 500, 1000, and 2000 participants, each over 5000 simulated trials. Notice the average values achieved in finite samples approach the asymptotic values already for a group size of 104. The difference between the average values and the asymptotic expected values shrinks further as the group size increases.
In the tables in appendix 6, we see the asymptotic variance-covariance values, as explained in section 3.2.2 in equation 3.6, differ very little from the observed variance-covariance values achieved in finite samples. In appendix 7, showing the asymptotic and observed variance-covariance matrices of the score statistics from equation 3.11, again we see a group size of 104 is sufficiently large to achieve values very near the asymptotic ones. Ultimately, verifying that this behavior carries over to further transformations of variance-covariance matrices, tables in appendix 8 show the variance-covariance matrices for the increments of the score statistics as described in equation 3.12 in section 3.2.2. In fact, the correlation coefficients $r(S_1, S_2 - S_1)$ and $r(S_2 - S_1, S_3 - S_2)$ are within at most three decimal places of the asymptotic values. The discrepancy of the observed correlation between increments of the score statistic under the standard method from what is expected asymptotically (figure 3.1 versus figure 4.21 and figure 3.2 versus figure 4.22) may very well be due to slow convergence of the Wald statistic for relatively small sample sizes. This dissonance may also arise from variation inherent in the simulation.
4.3.3 Conclusion
Simulation results from the previous section confirm the asymptotic results from chapter 3 that
when applying the standard or pull-forward method of data processing under the null hypothesis,
the estimates of the increments of the score statistics are indeed correlated and violate a
fundamental assumption made in the generation of the boundaries used in sequential monitoring.
For the cutback methods, simulation (appendix tables 9 and 10) likewise confirms the resulting covariance structure conforms to the assumption of independent increments, also under the null hypothesis.
The only mechanical difference between the standard and pull-forward methods is the time reported for censored survival times. The manner in which subsequent score statistics make
up for the unreported survival time or the fabricated survival time leads to the difference in
direction of the association between the estimated increments of the score statistics. The
magnitude of this correlation can be troublesome since not accounting for this can lead to
boundaries for sequential monitoring that are too conservative or too liberal based on the direction
of the correlation and the method of data processing used.
In practice it may be difficult or infeasible to predict the proportion of events that will be reported with delay due to the logistical and reporting mechanisms already in place. For both the standard and pull-forward data processing techniques, the simulation results indicate reducing the window between visits ameliorates the correlated structure of the estimates of the increments of the score statistics, though the standard method shows more benefit in this regard than the pull-forward method.
Lastly, the result from this simulation investigating the increments of the estimates of the score statistics achieved at three total times of analysis provides a heuristic justification for the result given by Scharfstein et al. (1997) (section 2.2.2) pertaining to the asymptotic behavior, under the null hypothesis, of the estimate of the score statistic (4.3) and the structure of the covariance matrix.
4.4 Conclusion and Recommendations
Simulation results from section 4.2 suggest methods of data processing perform better at interim analyses when the accumulated information fraction is closer to what is expected under the design of the trial. Methods that cut back the deadline for contribution of data to analysis essentially push back how far along the trial is with regard to total elapsed information, into regions where the boundaries are designed to detect only extreme treatment differences (figure 4.1), therefore yielding severely biased mean estimates of the log hazard ratio compared to the standard and pull-forward methods. The effect of the pushback in elapsed information is alleviated by decreasing the window length between visits, which allows a higher proportion of participants to have a last scheduled visit before the time of analysis and decreases the amount of time between a last visit and the time of analysis in which an event, should it occur, would go undetected until the next scheduled follow-up visit. In practice, however, there is a limit to how compressed a timetable for follow-up visits can be, mainly due to logistical and administrative constraints.
The use of the pull-forward method as a sensitivity analysis is not warranted because the underlying assumptions used to calculate interim monitoring boundaries generally are not met. The global cutback method, while producing unbiased estimates (McIlvaine 2015) of the hazard rate in each treatment group, is not appropriate when analyses are performed early in the information progression of the trial, since the cutback in time cuts deep into the time of enrollment and yields the most severely biased log hazard ratios in the first two interim analyses. The standard method and personal cutback method are the best options for analyzing data, though problems do arise due to the window length between visits and the delayed reporting of events, mainly through the loss of information.
A possible solution to increase the amount of information garnered by each time of analysis may be to increase the number of interim monitoring points. Although the sequential monitoring method employed in this research is flexible enough to include unplanned intermittent analyses while still preserving the overall type I error rate, due to the manner in which participants are followed throughout the duration of the trial (every window length from the time of enrollment), if analyses should occur more frequently than every window length of calendar time, little if any information will be added. This is especially the case when analyses are conducted while enrollment is still ongoing since, due to delayed reporting, participants’ time under observation is not included in the analysis unless they have a last scheduled visit prior to the analysis.
According to the criteria for performance discussed in section 4.2.4, the standard method performs best in reducing the amount of information loss when analyses are conducted early in information time, and it reduces the amount of information loss overall since, unlike the personal cutback method, it allows for reporting of events between participants’ last follow-up visit and the time of analysis. The standard method yields the smallest mean trial length and coincides with the personal cutback method when all events are reported with delay.
As shown in section 3.2.4 and corroborated by simulation in section 4.3, the standard method of data processing yields estimates of the increments of the score statistics that are correlated, violating a fundamental assumption needed in the creation of the boundaries of rejection in group-sequential methodologies. In both the asymptotic results and the simulation, the correlation increases in magnitude as the probability of delayed reporting decreases. This is troublesome since not accounting for this correlation can lead to boundaries of rejection for early analyses that may lead to rejecting the null hypothesis purely because of this correlation. While it may seem infeasible to predict the proportion of events reported with delay, we saw in section 4.3.1 that reducing the window between visits alleviates the correlated nature of the estimated score statistics. Reducing the window length is also a solution for making the personal cutback method perform better by being further along in elapsed information, minimizing the bias that results from cutting back the time for contribution of data to analysis.
A benefit, discussed in detail by McIlvaine (2015), of opting for the personal cutback method is that the hazard rate estimates are unbiased and the censoring mechanism is ignorable in analysis (section 1.1.1). Ultimately, the results from this research support implementing the personal cutback method in the case that the window between visits can be reduced to one-eighth of the enrollment period, or in this case 0.25 years. Reducing the window between visits improves the amount of information accumulated by each time of analysis while maintaining assumptions like independent increments and ignorable censoring.
Chapter 5 Application to Real Data
In the following sections I apply the methods of data processing explored in this dissertation to
data from a cancer clinical trial where the primary outcome is event-free survival and events are
defined as relapse, second malignancy, or death. Survival time was measured from the time of
study enrollment to the time of last contact or the time an event was reported. It is of interest to
investigate the performance of these methods in the context of interim analysis both when
enrollment is ongoing and during enrollment-free follow-up periods.
5.1 AEWS0031 Study
The purpose of the AEWS0031 study, available to member institutions of the Children’s Oncology
Group from May 2001 to August 2005, was to determine if treatment intensification via
compression of the time between subsequent treatments could improve the event-free survival of
trial participants aged 50 years or younger newly diagnosed with non-metastatic Ewing's
sarcoma. In total, 568 eligible participants were randomized to receive either the control Regimen A
or the experimental Regimen B. Participants assigned to the control regimen received
chemotherapy every 21 days, whereas participants assigned to the experimental regimen received
chemotherapy every 14 days, as permitted by the recovery of their blood counts.
5.2 Methods
The AEWS0031 study was open for enrollment for 4.5 years and was planned to enroll 528 trial
participants. An enrollment-free follow-up period was planned for 1 year after the last participant
was enrolled. A total of four analyses were conducted at calendar times December 31st, 2002;
December 31st, 2004; June 30th, 2005; and October 30th, 2009. Note the final analysis was conducted
more than 2 years after the intended final analysis date outlined in the study protocol. The study
was designed to detect a true treatment difference between the control and experimental regimens
(hazard ratio of 0.64) with 80% power and a type I error rate of 0.05.
For these data, I assume follow-up visits occur at regular 3-month intervals (about 90 days),
and the data are processed and results produced for each of the four data processing methods presented
in this research. Also note the method of analysis described in the protocol utilizes the log-rank
statistic to determine a significant treatment effect on event-free survival. The results presented
in the following section also utilize the log-rank test for treatment effect. Crude rates are calculated
for each treatment group and are presented as the number of events per 10,000 patient-days. The
rate ratio is presented as the quotient of the crude rate observed in the experimental regimen and
the observed crude rate in the standard regimen.
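As a quick arithmetic check of these definitions, the sketch below (a minimal illustration, not the
analysis code used for this chapter) reproduces the standard-method entries of Table 5.1 from their
event counts and patient time.

def crude_rate(events, patient_days, per=10_000):
    """Crude event rate per `per` patient-days."""
    return per * events / patient_days

compressed = crude_rate(2, 15739)    # experimental (compressed) regimen, Table 5.1
standard = crude_rate(6, 15988)      # control (standard) regimen, Table 5.1
rate_ratio = compressed / standard   # experimental over standard

print(round(compressed, 3), round(standard, 3), round(rate_ratio, 3))
# 1.271 3.753 0.339, matching the first rows of Table 5.1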
The interim analysis plan includes generating boundaries of rejection and continuation
using the Lan and DeMets quadratic spending function (section 2.4.1) with the observed total
information, i.e., the quotient of the observed number of events by each time of analysis and the
total number of events observed by the end of the trial under each method of data processing.
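Computing the boundaries themselves requires the joint distribution of the sequential statistics,
but the alpha spent at each look follows directly from the spending function. The sketch below is
a minimal illustration assuming the quadratic member of the Lan-DeMets family, alpha(t) = alpha * t^2
(section 2.4.1 defines the exact form used); the information fractions are hypothetical, not those
observed in AEWS0031.

from scipy.stats import norm

alpha = 0.05

def alpha_spent(t, alpha=alpha):
    """Cumulative type I error spent at information fraction t (quadratic rule)."""
    return alpha * min(t, 1.0) ** 2

info_fractions = [0.15, 0.45, 0.75, 1.0]   # hypothetical observed fractions
cumulative = [alpha_spent(t) for t in info_fractions]
incremental = [cumulative[0]] + [b - a for a, b in zip(cumulative, cumulative[1:])]

# The first-look boundary follows directly from the spent alpha (two-sided);
# later boundaries must also account for the correlation between looks.
z1 = norm.ppf(1 - cumulative[0] / 2)
print([round(a, 5) for a in incremental])   # alpha spent at each look
print(round(z1, 2))                         # first-look boundary, ~3.26 here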
5.3 Results
Table 5.1 tabulates the resulting data from analysis at the first interim analytic time
point. As described in chapter 4, a window length of about 0.25 years (about 90 days) alleviates
the bias present in the estimates of the log hazard ratios among trials that reject the null hypothesis
early in analysis. By the first interim analysis, under the standard, personal cutback, and pull-forward
methods, a similar number of events is observed, though the total patient time is much larger for
the pull-forward method since individuals without reported events have their time under study pulled
forward to the time of analysis. As expected, the global cutback method suffers in the number of
events reported prior to its effective analytic time point, 90 days before the actual one. Here,
since no events are reported between a trial participant's last contact and the time of analysis,
the standard and personal cutback methods yield the same results. Notice the crude rates differ
between the standard and personal cutback methods and the pull-forward data processing method,
though the rate ratios are very similar even when the pull-forward method accounts for more total
patient time. The global cutback method includes the fewest trial participants by this analysis
time and also excludes more events than the other methods. In all, the trial would continue under
any of the processing methods since the rejection boundary, and the resulting significance level
for this time of analysis, is strikingly more stringent than the produced p-values.
Table 5.1. Results from first interim analysis at calendar time 12/31/2002

Method             Regimen      Total Events   Patient Time   Crude Rate   Rate Ratio   Log-rank p-value
Standard           Compressed              2          15739        1.271
                   Standard                6          15988        3.753        0.339              0.242
Personal Cutback   Compressed              2          15739        1.271
                   Standard                6          15988        3.753        0.339              0.242
Global Cutback     Compressed              2          16912        1.183
                   Standard                4          17807        2.246        0.526              0.449
Pull-Forward       Compressed              2          24181        0.827
                   Standard                6          25362        2.366        0.35               0.167
Crude rates are events per 10,000 patient-days; patient time is in patient-days.
By the second time of analysis, for the same reason as stated above, the standard and
personal cutback methods yield the same results. Here, as expected, the global cutback method
includes the fewest participants for this analysis and the pull-forward method accounts for the
most patient time. Although the rates for the pull-forward method are smaller in magnitude, the
resulting rate ratio is comparable with those resulting from the other processing methods.
Interestingly, events occur in such a way that all methods report the same number of events prior
to analysis.
Table 5.2. Results from second interim analysis at calendar time 12/31/2004

Method             Regimen      Total Events   Patient Time   Crude Rate   Rate Ratio   Log-rank p-value
Standard           Compressed             12          49878        2.406
                   Standard               17          50284        3.381        0.712               0.38
Personal Cutback   Compressed             12          49878        2.406
                   Standard               17          50284        3.381        0.712               0.38
Global Cutback     Compressed             12          54654        2.196
                   Standard               17          55620        3.056        0.718               0.37
Pull-Forward       Compressed             12          67225        1.785
                   Standard               17          68077        2.497        0.715              0.379
The trial continues to the third time of analysis (Table 5.3), where the global cutback method
includes one fewer participant in the analysis than the standard and pull-forward methods.
Incidentally, because the global cutback method sets the effective time of analysis earlier, two
events in the control group are not yet reported by that effective analytic time point. This leads
to the global cutback method producing a rate ratio that is slightly larger in magnitude than the
other rate ratios. Again here, though the rates differ slightly depending on the processing method
used, the rate ratios are similar and the trial would continue to the final analysis.
Table 5.3. Results from third interim analysis at calendar time 6/30/2005

Method             Regimen      Total Events   Patient Time   Crude Rate   Rate Ratio   Log-rank p-value
Standard           Compressed             26         135697        1.916
                   Standard               39         134353        2.903        0.66               0.103
Personal Cutback   Compressed             26         135697        1.916
                   Standard               39         134353        2.903        0.66               0.103
Global Cutback     Compressed             26         150771        1.724
                   Standard               37         150573        2.457        0.702              0.166
Pull-Forward       Compressed             26         172127        1.511
                   Standard               39         170983        2.281        0.662              0.107
By the final analysis (Table 5.4), all methods of data processing detect the same number of
events, though the patient time differs due to the censoring mechanism of each method. Mainly
because the final analysis was conducted over 2 years after the intended end of the trial, all
methods include all 568 enrolled and eligible participants. Note the rate ratios are again similar,
and the resulting p-values indicate the same conclusion no matter the processing method used.
Table 5.4. Results from final analysis at calendar time 10/30/2009

Method             Regimen      Total Events   Patient Time   Crude Rate   Rate Ratio   Log-rank p-value
Standard           Compressed             76         451566        1.683
                   Standard               95         424621        2.237        0.752              0.066
Personal Cutback   Compressed             76         451566        1.683
                   Standard               95         424621        2.237        0.752              0.066
Global Cutback     Compressed             76         507546        1.497
                   Standard               95         474419        2.002        0.748              0.067
Pull-Forward       Compressed             76         526266        1.444
                   Standard               95         491429        1.933        0.747              0.067
5.4 Conclusion
Overall, the major difference in applying the methods of data processing described in this
dissertation to real data is in the number of patients included in analyses conducted early. Total
patient time also varied by processing method, though this is mainly due to when participants are
censored and whether time under study is synthesized, as in the pull-forward method. Early in
analysis, we see that the global cutback method produces results that differ from those of the other
processing methods due to, as expected, fewer events and fewer participants included in the
analysis. As was observed in the simulation study (chapter 4), as the trial progresses to the final
analysis, there is a catch-up in information, resulting in similar rates per group and less disparate
rate ratios. Ultimately, the decision to continue enrollment or terminate was the same in these data
under any of the processing methods.
Chapter 6 Conclusions and Future Work
In this dissertation I evaluated the performance of four data processing methods in the context of
interim monitoring and showed the resulting effect on estimates of the log hazard ratio and on the
information accumulated under the null hypothesis. The effects of delayed reporting of events and
of the window length between visits on the outcomes of interest were also investigated. I confirm
that all methods of data processing presented in this research produce biased estimates of the log
hazard ratio when analyses are conducted early with regard to elapsed observed information. The
bias is most severe for methods of data processing that push the calendar time of contributing data
to analysis back further from the planned analytic time point.
For all methods, the bias diminishes as the trial progresses to later times of analysis,
allowing the trial to be further along in elapsed information. In fact, overall, the difference in
the ability to reject the null hypothesis when there is a true treatment effect is markedly small
across the processing methods described here. The most considerable effect of the data processing
method applied, and ultimately of the delayed reporting of events and the window length between
visits, occurs during the early stages of interim monitoring, especially while enrollment of trial
participants is ongoing. Of the four data processing methods under investigation, the global
cutback method accumulates the least information at any particular calendar time because it sets
the effective analytic time point back in calendar time by at least a window length, consequently
yielding the largest bias and variance in the resulting log hazard ratio.
The personal cutback method also performs inadequately at early times of analysis, but
mainly when the window between visits is relatively large. This method performs better as the
window between scheduled follow-up visits is shortened, allowing more participants to be enrolled
and to have at least one follow-up visit prior to the analytic time point. Shortening the window
between visits essentially puts the personal cutback method as far along as the standard method
with regard to elapsed information at any time of analysis. Seemingly, the benefit of the standard
method in reporting events that occur between participants' last clinical visit and the time of
analysis, and events from participants whose enrollment time is such that no visit before the
analytic time point is possible, is being further along in elapsed information so as to produce less
biased estimates of the log hazard ratio. The pull-forward method also benefits from being able to
report events in a similar way to the standard method, though this method makes the assumption
that participants alive at the time of their last visit survive to the time of analysis. This
assumption becomes increasingly incorrect as the probability of delayed reporting increases, and
the effect on the estimates of interest is more sensitive to the window length between visits.
The benefit both the standard and pull-forward methods exhibit in being further along in
information fraction by any time of analysis is mediated by the effect of creating effectively two
different reporting mechanisms: one for participants who experience an event and another for
participants with well follow-up up to a last scheduled visit. This differential reporting scheme is
what leads to the departure from the assumption of independent increments of the score statistic
under both the standard and pull-forward methods. The difference in magnitude and in direction of
the correlation between increments of the score statistic is due to the reported time under
observation for participants with censored survival times and the proportion of events reported
with delay. The asymptotic and simulation results show that reducing the window between visits
alleviates this departure from independent increments.
Overall, a partial solution to reduce the bias present at early stages of analysis and to
alleviate the correlated nature of the increments of the score statistics (in the case of the
standard and pull-forward methods) is to reduce the length of time between scheduled follow-up
visits, essentially increasing the frequency of participant follow-up. At later times of analysis,
specifically once enrollment is complete, the window length need not be as short as in earlier
analyses with regard to accumulating similar amounts of information fraction. The effectiveness
of reducing the window between visits for the standard and pull-forward methods is mediated by
the proportion of events reported with delay, since it may be unrealistic in practice to reduce the
window to lengths where the effect of delayed reporting is negligible.
It is the conclusion of this research that the recommended data processing method is the
personal cutback method, since the estimated hazard rates in the treatment groups have been shown
to be unbiased, it employs an ignorable censoring mechanism, and the independence of the increments
of the score statistics is maintained for window lengths up to one-eighth the length of the
enrollment time (section 3.3). Lastly, the reduction in window length necessary to show an
improvement in the progression of elapsed information is reasonable.
Also, there are implications relevant to the design process of a clinical trial. The results
from this research indicate that it is recommended to consider the length of time between follow-up
visits and the proportion of events that could be reported with delay, especially if employing
methods sensitive to these factors in their ability to estimate parameters indicative of treatment
effect. The practice of utilizing the pull-forward data processing method as a sensitivity analysis
is also called into question because, depending on the delayed reporting probability and the window
length, we cannot be sure of what such an analysis is quantifying.
Topics left to consider resulting from this dissertation include exploring the effect of
variable lengths of time between scheduled follow-up visits. In this research I assumed the window
length between visits was uniform for all trial participants. In practice, the window lengths may
have a more unpredictable length and frequency due to factors related to the status of therapy
completion or the treatment group allocation. From this research, a reasonable solution to reducing
the bias in estimates due to loss of information is setting the frequency of clinical visits for
trial participants conditional on the amount of time under observation, where less time under study
means more frequent clinical visits. Another possible area of further research regarding the
window length between visits, especially once enrollment is complete, is to assess the effect of
setting a relatively long window except near the time of analysis. Also assumed in this research
is that the frequency of reporting events with delay is equal between the two treatment groups.
Future work will focus on exploring the situation where reporting an event with delay depends on
patient characteristics and possibly allocation to treatment. The proposed solutions for limiting
the bias of the estimates produced, and their effect on the correlation between subsequent
increments of the score statistics, should also be explored. More rigorous mathematical and
statistical machinery would be needed to investigate these latter areas of future research.
With respect to the association between increments of the score statistic under the null
hypothesis, it may be beneficial to inspect the correlated nature conditional on trial progression
to later stages of interim monitoring. The asymptotic behavior of the increments of the score
statistic was investigated in this dissertation unconditional on the magnitude and statistical
significance of the treatment effect. Also, under each method of data processing presented in this
research, it would be instructive in the design phase of a trial to provide a mathematical
representation of the surface of the correlation coefficient of the increments of the score
statistic, for example a quadratic response surface of the form

\rho = \beta_0 + \beta_1 p + \beta_2 w + \beta_3 p w + \beta_4 p^2 + \beta_5 w^2,

showing the instantaneous effects of changes \Delta p and \Delta w on the resulting association.
Here p = the delayed reporting probability, w = the window length, and \rho = the correlation of
the increments.
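A minimal sketch of how such a surface could be fit, assuming the quadratic form above; the
(p, w, rho) points below are placeholders rather than output from the simulation study.

import numpy as np

# Placeholder grid of delayed reporting probabilities p, window lengths w,
# and observed correlations rho between score-statistic increments.
p = np.array([0.0, 0.25, 0.5, 0.75, 1.0, 0.5])
w = np.array([0.5, 0.5, 0.25, 0.25, 0.125, 0.125])
rho = np.array([0.12, 0.10, 0.05, 0.04, 0.00, 0.02])

# Design matrix for rho = b0 + b1*p + b2*w + b3*p*w + b4*p^2 + b5*w^2.
X = np.column_stack([np.ones_like(p), p, w, p * w, p ** 2, w ** 2])
beta, *_ = np.linalg.lstsq(X, rho, rcond=None)

# Instantaneous effect of p at a design point: d(rho)/dp = b1 + b3*w + 2*b4*p.
def drho_dp(p0, w0):
    return beta[1] + beta[3] * w0 + 2 * beta[4] * p0

print(np.round(beta, 3))
print(round(float(drho_dp(0.5, 0.25)), 3))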
Also, since the methodology presented in section 3.2.3 is a general framework applicable to any
proportional hazards model, other distributions of survival time can be explored, such as the
Weibull distribution, where for different values of the shape and scale parameters we can model
failure rates that are not constant over time. As briefly mentioned in section 3.2.4, the time
variable would need to be rescaled to \Lambda_0(t) = -\log S_0(t), where S_0(t) is the survivor
function for the baseline group; the density of the rescaled time is then the exponential density
with rate parameter \lambda = 1. Note that conclusions about the window width on the original time
scale are not conclusions about the window width on the transformed time scale.
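A minimal sketch of this rescaling, assuming an illustrative Weibull baseline: transforming the
times by the baseline cumulative hazard, \Lambda_0(t) = (t / scale)^shape, recovers unit-rate
exponential times, the setting studied in this dissertation. The parameter values are assumptions
for illustration only.

import numpy as np

rng = np.random.default_rng(0)
shape, scale = 1.5, 2.0                        # illustrative Weibull parameters

t = scale * rng.weibull(shape, size=100_000)   # baseline Weibull survival times
u = (t / scale) ** shape                       # rescaled times Lambda_0(t)

# For a unit-rate exponential, the mean and variance are both 1.
print(round(u.mean(), 3), round(u.var(), 3))   # both approximately 1.0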
Lastly, there is the opportunity to extend the work presented here to incorporate the Cox
regression model, allowing investigation of the performance of estimators and tests when
individual patient characteristics indicative of patient survival are included in the statistical
model while applying the methods of data processing investigated in this research.
References
[1] Dang, H. M. (2015). Interim Analysis Methods Based on Elapsed Information Time: Strategies
for Information Time Estimation (Doctoral dissertation). Retrieved from University of Southern
California Dissertation Database.
[2] DeMets, D. L. and Lan, K. K. G. (1984). An Overview of Sequential Methods and Their Application
in Clinical Trials. Communications in Statistics - Theory and Methods, 13(19):2315-2338.
[3] DeMets, D. L. and Lan, K. K. G. (1994). Interim analysis: The alpha spending function approach.
Statistics in Medicine, 13(13):1341-1352.
[4] DeMets, D. L. and Lan, K. K. G. (1995). The Alpha Spending Approach to Interim Data Analyses.
In Thall, P. F., editor, Recent Advances in Clinical Trial Design and Analysis, pages 1-27. Kluwer,
1st edition.
[5] DeMets, D. L. and Gail, M. H. (1985). Use of logrank tests and group sequential methods at fixed
calendar times. Biometrics, 41:1039-1044.
[6] Fleming, T. R. et al. (1984). Designs for Group Sequential Tests. Controlled Clinical Trials,
5:348-361.
[7] Green, S. J. et al. (1987). Policies for Study Monitoring and Interim Reporting of Results.
Journal of Clinical Oncology, 5(9): 1477-1484.
[8] Haybittle, J. L. (1971). Repeated assessment of results in clinical trials of cancer treatment.
The British Journal of Radiology, 44(526):793-797.
[9] Hu, P., Tsiatis, A.A. (1996). Estimating the survival distribution when ascertainment of vital
status is subject to delay, Biometrika 83: 371-380.
[10] Hwang, I. K., Shih, W. J., and De Cani, J. S. (1990). Group sequential designs using a family
of type I error probability spending functions. Statistics in Medicine, 9(12):1439-1445.
[11] Jennison, C., and Turnbull, B. W. (1989). Interim Analysis: The Repeated Confidence Interval
Approach. Journal of the Royal Statistical Society. Series B (Methodological), 51(3): 305-361.
[12] Jennison, C., and Turnbull, B. W. (1990). Statistical Approaches to Interim Monitoring of
Medical Trials: A Review and Commentary. Statistical Science, 5(3): 299-317.
[13] Kim, K. (1989). Point Estimation Following Group Sequential Tests. Biometrics, 45(2): 613-
617.
[14] Kim, K. and DeMets, D. L. (1987). Design and Analysis of Group Sequential Tests Based on
the Type I Error Spending Rate Function. Biometrika, 74(1):149-154.
[15] Kim, K., Boucher, H., and Tsiatis, A. A. (1995). Design and Analysis of Group Sequential
Logrank Tests in Maximum Duration Versus Information Trials. Biometrics, 51(3):988-1000.
[16] Lachin, J. M. (2005). A review of methods for futility stopping based on conditional power.
Statistics in Medicine, 24(18):2747–2764.
[17] Lan, K. G., Reboussin, D. M., and DeMets, D. L. (1994). Information and information
fractions for design and sequential monitoring of clinical trials. Communications in Statistics -
Theory and Methods, 23(2):403–420.
[18] Lan, K. K. G. and DeMets, D. (2009). Further Comments on the Alpha-Spending Function.
Statistics in Biosciences, 1(1):95–111.
[19] Lan, K. K. G. and DeMets, D. L. (1983). Discrete Sequential Boundaries for Clinical Trials.
Biometrika, 70(3):659-663.
[20] Lan, K. K. G. and DeMets, D. L. (1989). Group sequential procedures: Calendar versus
information time. Statistics in Medicine, 8(10):1191-1198.
[21] Lan, K. K. G. and Lachin, J. M. (1990). Implementation of Group Sequential Logrank Tests
in a Maximum Duration Trial. Biometrics, 46(3):759-770.
[22] Lan, K. K. G. and Zucker, D. M. (1993). Sequential monitoring of clinical trials: The role of
information and brownian motion. Statistics in Medicine, 12(8):753-765.
[23] McIlvaine, E. J. (2015). The Impact of Data Collection Procedures on the Analysis of
Randomized Clinical Trials (Doctoral dissertation). Retrieved from University of Southern
California Dissertation Database.
[24] O'Brien, P. C. and Fleming, T. R. (1979). A Multiple Testing Procedure for Clinical Trials.
Biometrics, 35(3):549-556.
[25] Pampallona, S., and Tsiatis, A. A. (1994). Group Sequential Designs for One-sided and Two-
sided Hypothesis Testing with Provision for Early Stopping in Favor of the Null Hypothesis.
Journal of Statistical Planning and Inference, 42: 19-35.
[26] Pocock, S. J. (1977). Group sequential methods in the design and analysis of clinical trials.
Biometrika, 64(2):191-199.
[27] Pocock, S. J. (1982). Interim analyses for randomized clinical trials: The group sequential
approach. Biometrics, 38(1):153-162.
[28] Pocock, S. J. and Hughes, M. D. (1989). Practical Problems in Interim Analyses, with
Particular Regard to Estimation. Controlled Clinical Trials, 10(4, Suppl.):209S-221S.
[29] Proschan, M. A. and Nason, M. (2011). A Note on Correction of Information Time in a
Survival Trial Using an Alpha Spending Function. Statistics in Biosciences, 3(2):250– 259.
[30] Proschan, M. A., Follmann, D. A., and Waclawiw, M. A. (1992). Effects of Assumption
Violations on Type I Error Rate in Group Sequential Monitoring. Biometrics, 48(4):1131-1143.
[31] Proschan, M. A., Lan, K. K. G., and Wittes, J. T. (2006). Statistical Monitoring of Clinical
Trials: A Unified Approach. Springer.
[32] Sellke, T. and Siegmund, D. (1983). Sequential Analysis of the Proportional Hazards Model.
Biometrika, 70(2):315-326.
[33] Slud, E. and Wei, L. J. (1982). Two-Sample Repeated Significance Tests Based on the
Modified Wilcoxon Statistic. Journal of the American Statistical Association, 77(380):862-868.
[34] Tsiatis, A. A. (1982a). Group Sequential Methods for Survival Analysis with Staggered Entry.
IMS Lecture Notes - Monograph Series, 2:257-268.
[35] Tsiatis, A. A. (1982b). Repeated Significance Testing for a General Class of Statistics
Used in Censored Survival Analysis. Journal of the American Statistical Association, 77(380):855-861.
[36] U.S. Food and Drug Administration/CDER/CBER. Guidance for Industry E9 Statistical
Principles for Clinical Trials. http://www.fda.gov/downloads/
Drugs/GuidanceComplianceRegulatoryInformation/Guidances/ucm073137.pdf.
Accessed: 2013-10-07.
[37] Van der Laan, M.J., Hubbard, A.E. (1998). Locally efficient estimation of the survival
distribution with right-censored data and covariates when collection of data is delayed. Biometrika
85, 771-783.
[38] Wald, A. (1945). Sequential Tests of Statistical Hypotheses. The Annals of Mathematical
Statistics, 16(2):117-186.
[39] Wang, S. K. and Tsiatis, A. A. (1987). Approximately Optimal One-Parameter Boundaries
for Group Sequential Trials. Biometrics, 43(1):193-199.
Appendix Tables from Simulation Results
A1. Under the Hazard Ratio from Study Design
Table A1.1. Proportion of trials under the alternative hypothesis of a log hazard ratio of 0.67 for trials in rejection
group 1 for all data processing methods by delayed reporting probability and length between scheduled visits.
Delayed reporting probability: 0  0.1  0.25  0.5  0.75  0.9  1
RAW 499 (9.98) 499 (9.98) 499 (9.98) 499 (9.98) 499 (9.98) 499 (9.98) 499 (9.98)
0.5 Yrs
STD 397 (7.94) 378 (7.56) 329 (6.58) 275 (5.5) 234 (4.68) 198 (3.96) 184 (3.68)
CB1 184 (3.68) 184 (3.68) 184 (3.68) 184 (3.68) 184 (3.68) 184 (3.68) 184 (3.68)
CB2 54 (1.08) 54 (1.08) 54 (1.08) 54 (1.08) 54 (1.08) 54 (1.08) 54 (1.08)
PLF 499 (9.98) 458 (9.16) 387 (7.74) 279 (5.58) 230 (4.6) 185 (3.7) 165 (3.3)
0.25 Yrs
STD 482 (9.64) 455 (9.1) 424 (8.48) 384 (7.68) 359 (7.18) 341 (6.82) 320 (6.4)
CB1 320 (6.4) 320 (6.4) 320 (6.4) 320 (6.4) 320 (6.4) 320 (6.4) 320 (6.4)
CB2 204 (4.08) 204 (4.08) 204 (4.08) 204 (4.08) 204 (4.08) 204 (4.08) 204 (4.08)
PLF 499 (9.98) 475 (9.5) 442 (8.84) 394 (7.88) 350 (7) 331 (6.62) 309 (6.18)
0.125 Yrs
STD 493 (9.86) 485 (9.7) 468 (9.36) 436 (8.72) 435 (8.7) 411 (8.22) 406 (8.12)
CB1 406 (8.12) 406 (8.12) 406 (8.12) 406 (8.12) 406 (8.12) 406 (8.12) 406 (8.12)
CB2 341 (6.82) 341 (6.82) 341 (6.82) 341 (6.82) 341 (6.82) 341 (6.82) 341 (6.82)
PLF 499 (9.98) 487 (9.74) 466 (9.32) 445 (8.9) 440 (8.8) 416 (8.32) 409 (8.18)
Entries are presented as count (%).
Table A1.2. Mean log hazard ratio under the alternative hypothesis of a log hazard ratio of 0.67 for trials in rejection
group 1 for all data processing methods by delayed reporting probability and length between scheduled visits.
Delayed reporting probability: 0  0.1  0.25  0.5  0.75  0.9  1
RAW -0.871(0.14) -0.871(0.14) -0.871(0.14) -0.871(0.14) -0.871(0.14) -0.871(0.14) -0.871(0.14)
0.5 Yrs
STD -0.886(0.147) -0.906(0.151) -0.936(0.163) -0.995(0.177) -1.053(0.171) -1.092(0.18) -1.129(0.2)
CB1 -1.129(0.2) -1.129(0.2) -1.129(0.2) -1.129(0.2) -1.129(0.2) -1.129(0.2) -1.129(0.2)
CB2 -1.478(0.277) -1.478(0.277) -1.478(0.277) -1.478(0.277) -1.478(0.277) -1.478(0.277) -1.478(0.277)
PLF -0.871(0.14) -0.887(0.146) -0.919(0.15) -0.979(0.174) -1.049(0.166) -1.091(0.171) -1.133(0.19)
0.25 Yrs
STD -0.875(0.142) -0.886(0.146) -0.897(0.153) -0.918(0.155) -0.95(0.154) -0.959(0.164) -0.974(0.175)
CB1 -0.974(0.175) -0.974(0.175) -0.974(0.175) -0.974(0.175) -0.974(0.175) -0.974(0.175) -0.974(0.175)
CB2 -1.111(0.193) -1.111(0.193) -1.111(0.193) -1.111(0.193) -1.111(0.193) -1.111(0.193) -1.111(0.193)
PLF -0.871(0.14) -0.878(0.145) -0.89(0.151) -0.91(0.157) -0.945(0.154) -0.954(0.166) -0.967(0.179)
0.125 Yrs
STD -0.872(0.142) -0.875(0.146) -0.88(0.149) -0.895(0.148) -0.906(0.146) -0.911(0.15) -0.913(0.157)
CB1 -0.913(0.157) -0.913(0.157) -0.913(0.157) -0.913(0.157) -0.913(0.157) -0.913(0.157) -0.913(0.157)
CB2 -0.971(0.165) -0.971(0.165) -0.971(0.165) -0.971(0.165) -0.971(0.165) -0.971(0.165) -0.971(0.165)
PLF -0.871(0.14) -0.873(0.145) -0.88(0.148) -0.892(0.147) -0.901(0.146) -0.907(0.149) -0.91(0.157)
Entries are presented as mean (SD).
Table A1.3. Mean information fraction under the alternative hypothesis of a log hazard ratio of 0.67 for trials in
rejection group 1 for all data processing methods by delayed reporting probability and length between scheduled visits.
Delayed reporting probability: 0  0.1  0.25  0.5  0.75  0.9  1
RAW 0.374(0.035) 0.374(0.035) 0.374(0.035) 0.374(0.035) 0.374(0.035) 0.374(0.035) 0.374(0.035)
0.5 Yrs
STD 0.367(0.039) 0.357(0.039) 0.345(0.037) 0.322(0.036) 0.304(0.037) 0.291(0.034) 0.282(0.035)
CB1 0.282(0.035) 0.282(0.035) 0.282(0.035) 0.282(0.035) 0.282(0.035) 0.282(0.035) 0.282(0.035)
CB2 0.217(0.025) 0.217(0.025) 0.217(0.025) 0.217(0.025) 0.217(0.025) 0.217(0.025) 0.217(0.025)
PLF 0.374(0.035) 0.366(0.034) 0.352(0.035) 0.332(0.033) 0.308(0.034) 0.296(0.035) 0.286(0.032)
0.25 Yrs
STD 0.37(0.037) 0.366(0.038) 0.36(0.038) 0.349(0.035) 0.339(0.037) 0.333(0.036) 0.329(0.037)
CB1 0.329(0.037) 0.329(0.037) 0.329(0.037) 0.329(0.037) 0.329(0.037) 0.329(0.037) 0.329(0.037)
CB2 0.289(0.034) 0.289(0.034) 0.289(0.034) 0.289(0.034) 0.289(0.034) 0.289(0.034) 0.289(0.034)
PLF 0.374(0.035) 0.37(0.035) 0.364(0.036) 0.352(0.035) 0.342(0.035) 0.336(0.034) 0.333(0.034)
0.125 Yrs
STD 0.372(0.037) 0.371(0.037) 0.368(0.036) 0.363(0.035) 0.356(0.037) 0.355(0.036) 0.353(0.035)
CB1 0.353(0.035) 0.353(0.035) 0.353(0.035) 0.353(0.035) 0.353(0.035) 0.353(0.035) 0.353(0.035)
CB2 0.331(0.037) 0.331(0.037) 0.331(0.037) 0.331(0.037) 0.331(0.037) 0.331(0.037) 0.331(0.037)
PLF 0.374(0.035) 0.373(0.035) 0.369(0.035) 0.364(0.035) 0.357(0.037) 0.356(0.035) 0.354(0.035)
Entries are presented as mean (SD).
Table A1.4. Proportion of trials under the alternative hypothesis of a log hazard ratio of 0.67 for trials in rejection
group 2 for all data processing methods by delayed reporting probability and length between scheduled visits.
Delayed reporting probability: 0  0.1  0.25  0.5  0.75  0.9  1
RAW 1669 (33.38) 1669 (33.38) 1669 (33.38) 1669 (33.38) 1669 (33.38) 1669 (33.38) 1669 (33.38)
0.5 Yrs
STD 1631 (32.62) 1623 (32.46) 1611 (32.22) 1557 (31.14) 1443 (28.86) 1428 (28.56) 1422 (28.44)
CB1 1422 (28.44) 1422 (28.44) 1422 (28.44) 1422 (28.44) 1422 (28.44) 1422 (28.44) 1422 (28.44)
CB2 1006 (20.12) 1006 (20.12) 1006 (20.12) 1006 (20.12) 1006 (20.12) 1006 (20.12) 1006 (20.12)
PLF 1669 (33.38) 1648 (32.96) 1616 (32.32) 1562 (31.24) 1415 (28.3) 1386 (27.72) 1342 (26.84)
0.25 Yrs
STD 1642 (32.84) 1660 (33.2) 1670 (33.4) 1642 (32.84) 1580 (31.6) 1544 (30.88) 1558 (31.16)
CB1 1558 (31.16) 1558 (31.16) 1558 (31.16) 1558 (31.16) 1558 (31.16) 1558 (31.16) 1558 (31.16)
CB2 1433 (28.66) 1433 (28.66) 1433 (28.66) 1433 (28.66) 1433 (28.66) 1433 (28.66) 1433 (28.66)
PLF 1669 (33.38) 1688 (33.76) 1682 (33.64) 1616 (32.32) 1572 (31.44) 1559 (31.18) 1561 (31.22)
0.125 Yrs
STD 1657 (33.14) 1669 (33.38) 1665 (33.3) 1665 (33.3) 1611 (32.22) 1621 (32.42) 1621 (32.42)
CB1 1621 (32.42) 1621 (32.42) 1621 (32.42) 1621 (32.42) 1621 (32.42) 1621 (32.42) 1621 (32.42)
CB2 1565 (31.3) 1565 (31.3) 1565 (31.3) 1565 (31.3) 1565 (31.3) 1565 (31.3) 1565 (31.3)
PLF 1669 (33.38) 1677 (33.54) 1670 (33.4) 1665 (33.3) 1586 (31.72) 1615 (32.3) 1616 (32.32)
Entries are presented as count (%).
Table A1.5. Mean log hazard ratio under the alternative hypothesis of a log hazard ratio of 0.67 for trials in rejection
group 2 for all data processing methods by delayed reporting probability and length between scheduled visits.
Delayed reporting probability: 0  0.1  0.25  0.5  0.75  0.9  1
RAW -0.545(0.091) -0.545(0.091) -0.545(0.091) -0.545(0.091) -0.545(0.091) -0.545(0.091) -0.545(0.091)
0.5 Yrs
STD -0.549(0.092) -0.555(0.093) -0.564(0.095) -0.582(0.098) -0.6(0.104) -0.609(0.106) -0.616(0.106)
CB1 -0.616(0.106) -0.616(0.106) -0.616(0.106) -0.616(0.106) -0.616(0.106) -0.616(0.106) -0.616(0.106)
CB2 -0.705(0.112) -0.705(0.112) -0.705(0.112) -0.705(0.112) -0.705(0.112) -0.705(0.112) -0.705(0.112)
PLF -0.545(0.091) -0.554(0.094) -0.564(0.095) -0.581(0.097) -0.598(0.102) -0.606(0.105) -0.616(0.107)
0.25 Yrs
STD -0.545(0.09) -0.547(0.091) -0.553(0.094) -0.562(0.095) -0.571(0.097) -0.578(0.099) -0.58(0.1)
CB1 -0.58(0.1) -0.58(0.1) -0.58(0.1) -0.58(0.1) -0.58(0.1) -0.58(0.1) -0.58(0.1)
CB2 -0.614(0.103) -0.614(0.103) -0.614(0.103) -0.614(0.103) -0.614(0.103) -0.614(0.103) -0.614(0.103)
PLF -0.545(0.091) -0.547(0.092) -0.552(0.094) -0.563(0.095) -0.573(0.099) -0.577(0.1) -0.579(0.1)
0.125 Yrs
STD -0.544(0.089) -0.545(0.09) -0.548(0.092) -0.555(0.094) -0.559(0.095) -0.562(0.097) -0.563(0.097)
CB1 -0.563(0.097) -0.563(0.097) -0.563(0.097) -0.563(0.097) -0.563(0.097) -0.563(0.097) -0.563(0.097)
CB2 -0.578(0.099) -0.578(0.099) -0.578(0.099) -0.578(0.099) -0.578(0.099) -0.578(0.099) -0.578(0.099)
PLF -0.545(0.091) -0.547(0.092) -0.55(0.093) -0.553(0.094) -0.56(0.095) -0.561(0.098) -0.562(0.097)
Entries are presented as mean (SD).
Table A1.6. Mean information fraction under the alternative hypothesis of a log hazard ratio of 0.67 for trials in
rejection group 2 for all data processing methods by delayed reporting probability and length between scheduled visits.
Delayed reporting probability: 0  0.1  0.25  0.5  0.75  0.9  1
RAW 0.689(0.04) 0.689(0.04) 0.689(0.04) 0.689(0.04) 0.689(0.04) 0.689(0.04) 0.689(0.04)
0.5 Yrs
STD 0.686(0.041) 0.676(0.042) 0.66(0.042) 0.635(0.042) 0.611(0.042) 0.595(0.042) 0.585(0.042)
CB1 0.585(0.042) 0.585(0.042) 0.585(0.042) 0.585(0.042) 0.585(0.042) 0.585(0.042) 0.585(0.042)
CB2 0.495(0.042) 0.495(0.042) 0.495(0.042) 0.495(0.042) 0.495(0.042) 0.495(0.042) 0.495(0.042)
PLF 0.689(0.04) 0.679(0.04) 0.664(0.04) 0.638(0.04) 0.615(0.04) 0.6(0.039) 0.59(0.04)
0.25 Yrs
STD 0.688(0.041) 0.683(0.041) 0.677(0.041) 0.666(0.041) 0.654(0.041) 0.648(0.041) 0.644(0.04)
CB1 0.644(0.04) 0.644(0.04) 0.644(0.04) 0.644(0.04) 0.644(0.04) 0.644(0.04) 0.644(0.04)
CB2 0.596(0.042) 0.596(0.042) 0.596(0.042) 0.596(0.042) 0.596(0.042) 0.596(0.042) 0.596(0.042)
PLF 0.689(0.04) 0.684(0.04) 0.678(0.039) 0.668(0.04) 0.656(0.04) 0.65(0.04) 0.646(0.04)
0.125 Yrs
STD 0.688(0.04) 0.686(0.04) 0.683(0.04) 0.678(0.04) 0.672(0.041) 0.669(0.041) 0.667(0.041)
CB1 0.667(0.041) 0.667(0.041) 0.667(0.041) 0.667(0.041) 0.667(0.041) 0.667(0.041) 0.667(0.041)
CB2 0.645(0.041) 0.645(0.041) 0.645(0.041) 0.645(0.041) 0.645(0.041) 0.645(0.041) 0.645(0.041)
PLF 0.689(0.04) 0.687(0.04) 0.683(0.04) 0.678(0.04) 0.673(0.04) 0.669(0.04) 0.667(0.04)
Entries are presented as mean (SD).
Table A1.7. Proportion of trials under the alternative hypothesis of a log hazard ratio of 0.67 for trials in rejection
group 3 for all data processing methods by delayed reporting probability and length between scheduled visits.
Delayed reporting probability: 0  0.1  0.25  0.5  0.75  0.9  1
RAW 1749 (34.98) 1749 (34.98) 1749 (34.98) 1749 (34.98) 1749 (34.98) 1749 (34.98) 1749 (34.98)
0.5 Yrs
STD 1856 (37.12) 1892 (37.84) 1956 (39.12) 2076 (41.52) 2243 (44.86) 2288 (45.76) 2306 (46.12)
CB1 2306 (46.12) 2306 (46.12) 2306 (46.12) 2306 (46.12) 2306 (46.12) 2306 (46.12) 2306 (46.12)
CB2 2854 (57.08) 2854 (57.08) 2854 (57.08) 2854 (57.08) 2854 (57.08) 2854 (57.08) 2854 (57.08)
PLF 1749 (34.98) 1809 (36.18) 1910 (38.2) 2081 (41.62) 2286 (45.72) 2355 (47.1) 2413 (48.26)
0.25 Yrs
STD 1782 (35.64) 1792 (35.84) 1817 (36.34) 1890 (37.8) 1982 (39.64) 2036 (40.72) 2036 (40.72)
CB1 2036 (40.72) 2036 (40.72) 2036 (40.72) 2036 (40.72) 2036 (40.72) 2036 (40.72) 2036 (40.72)
CB2 2276 (45.52) 2276 (45.52) 2276 (45.52) 2276 (45.52) 2276 (45.52) 2276 (45.52) 2276 (45.52)
PLF 1749 (34.98) 1755 (35.1) 1798 (35.96) 1917 (38.34) 1992 (39.84) 2030 (40.6) 2049 (40.98)
0.125 Yrs
STD 1761 (35.22) 1761 (35.22) 1780 (35.6) 1813 (36.26) 1875 (37.5) 1890 (37.8) 1895 (37.9)
CB1 1895 (37.9) 1895 (37.9) 1895 (37.9) 1895 (37.9) 1895 (37.9) 1895 (37.9) 1895 (37.9)
CB2 2013 (40.26) 2013 (40.26) 2013 (40.26) 2013 (40.26) 2013 (40.26) 2013 (40.26) 2013 (40.26)
PLF 1749 (34.98) 1757 (35.14) 1784 (35.68) 1809 (36.18) 1897 (37.94) 1887 (37.74) 1895 (37.9)
Entries are presented as count (%).
Table A1.8. Mean log hazard ratio under the alternative hypothesis of a log hazard ratio of 0.67 for trials in rejection
group 3 for all data processing methods by delayed reporting probability and length between scheduled visits.
Delayed reporting probability: 0  0.1  0.25  0.5  0.75  0.9  1
RAW -0.394(0.07) -0.394(0.07) -0.394(0.07) -0.394(0.07) -0.394(0.07) -0.394(0.07) -0.394(0.07)
0.5 Yrs
STD -0.399(0.072) -0.399(0.073) -0.401(0.074) -0.404(0.076) -0.407(0.078) -0.41(0.079) -0.41(0.08)
CB1 -0.41(0.08) -0.41(0.08) -0.41(0.08) -0.41(0.08) -0.41(0.08) -0.41(0.08) -0.41(0.08)
CB2 -0.426(0.092) -0.426(0.092) -0.426(0.092) -0.426(0.092) -0.426(0.092) -0.426(0.092) -0.426(0.092)
PLF -0.394(0.07) -0.396(0.071) -0.399(0.074) -0.404(0.076) -0.41(0.08) -0.413(0.081) -0.415(0.083)
0.25 Yrs
STD -0.394(0.07) -0.395(0.07) -0.395(0.071) -0.397(0.071) -0.4(0.073) -0.401(0.074) -0.402(0.074)
CB1 -0.402(0.074) -0.402(0.074) -0.402(0.074) -0.402(0.074) -0.402(0.074) -0.402(0.074) -0.402(0.074)
CB2 -0.408(0.079) -0.408(0.079) -0.408(0.079) -0.408(0.079) -0.408(0.079) -0.408(0.079) -0.408(0.079)
PLF -0.394(0.07) -0.394(0.07) -0.394(0.071) -0.398(0.072) -0.401(0.073) -0.402(0.074) -0.403(0.075)
0.125 Yrs
STD -0.394(0.07) -0.394(0.07) -0.394(0.07) -0.395(0.071) -0.397(0.072) -0.397(0.072) -0.397(0.072)
CB1 -0.397(0.072) -0.397(0.072) -0.397(0.072) -0.397(0.072) -0.397(0.072) -0.397(0.072) -0.397(0.072)
CB2 -0.401(0.073) -0.401(0.073) -0.401(0.073) -0.401(0.073) -0.401(0.073) -0.401(0.073) -0.401(0.073)
PLF -0.394(0.07) -0.394(0.07) -0.394(0.071) -0.395(0.071) -0.396(0.071) -0.397(0.072) -0.397(0.072)
Entries are presented as mean (SD).
Table A1.9. Mean information fraction under the alternative hypothesis of a log hazard ratio of 0.67 for trials in
rejection group 3 for all data processing methods by delayed reporting probability and length between scheduled visits.
Delayed reporting probability: 0  0.1  0.25  0.5  0.75  0.9  1
RAW 1.001(0.012) 1.001(0.012) 1.001(0.012) 1.001(0.012) 1.001(0.012) 1.001(0.012) 1.001(0.012)
0.5 Yrs
STD 1.001(0.013) 1(0.013) 0.999(0.013) 0.998(0.014) 0.996(0.014) 0.995(0.014) 0.995(0.014)
CB1 0.995(0.014) 0.995(0.014) 0.995(0.014) 0.995(0.014) 0.995(0.014) 0.995(0.014) 0.995(0.014)
CB2 0.987(0.016) 0.987(0.016) 0.987(0.016) 0.987(0.016) 0.987(0.016) 0.987(0.016) 0.987(0.016)
PLF 1.001(0.012) 1(0.013) 0.999(0.013) 0.998(0.013) 0.996(0.014) 0.995(0.014) 0.995(0.014)
0.25 Yrs
STD 1.001(0.013) 1(0.013) 1(0.013) 0.999(0.013) 0.999(0.013) 0.998(0.013) 0.998(0.013)
CB1 0.998(0.013) 0.998(0.013) 0.998(0.013) 0.998(0.013) 0.998(0.013) 0.998(0.013) 0.998(0.013)
CB2 0.994(0.014) 0.994(0.014) 0.994(0.014) 0.994(0.014) 0.994(0.014) 0.994(0.014) 0.994(0.014)
PLF 1.001(0.012) 1(0.013) 1(0.013) 0.999(0.013) 0.999(0.013) 0.998(0.013) 0.998(0.014)
0.125 Yrs
STD 1.001(0.013) 1.001(0.013) 1(0.013) 1(0.013) 1(0.013) 0.999(0.013) 0.999(0.013)
CB1 0.999(0.013) 0.999(0.013) 0.999(0.013) 0.999(0.013) 0.999(0.013) 0.999(0.013) 0.999(0.013)
CB2 0.998(0.013) 0.998(0.013) 0.998(0.013) 0.998(0.013) 0.998(0.013) 0.998(0.013) 0.998(0.013)
PLF 1.001(0.012) 1.001(0.013) 1(0.013) 1(0.013) 1(0.013) 0.999(0.013) 0.999(0.013)
Entries are presented as mean (SD).
Table A1.10. Proportion of trials under the alternative hypothesis of a log hazard ratio of 0.67 for trials in rejection
group 4 for all data processing methods by delayed reporting probability and length between scheduled visits.
Delayed reporting probability: 0  0.1  0.25  0.5  0.75  0.9  1
RAW 1083 (21.66) 1083 (21.66) 1083 (21.66) 1083 (21.66) 1083 (21.66) 1083 (21.66) 1083 (21.66)
0.5 Yrs
STD 1116 (22.32) 1107 (22.14) 1104 (22.08) 1092 (21.84) 1080 (21.6) 1086 (21.72) 1088 (21.76)
CB1 1088 (21.76) 1088 (21.76) 1088 (21.76) 1088 (21.76) 1088 (21.76) 1088 (21.76) 1088 (21.76)
CB2 1086 (21.72) 1086 (21.72) 1086 (21.72) 1086 (21.72) 1086 (21.72) 1086 (21.72) 1086 (21.72)
PLF 1083 (21.66) 1085 (21.7) 1087 (21.74) 1078 (21.56) 1069 (21.38) 1074 (21.48) 1080 (21.6)
0.25 Yrs
STD 1094 (21.88) 1093 (21.86) 1089 (21.78) 1084 (21.68) 1079 (21.58) 1079 (21.58) 1086 (21.72)
CB1 1086 (21.72) 1086 (21.72) 1086 (21.72) 1086 (21.72) 1086 (21.72) 1086 (21.72) 1086 (21.72)
CB2 1087 (21.74) 1087 (21.74) 1087 (21.74) 1087 (21.74) 1087 (21.74) 1087 (21.74) 1087 (21.74)
PLF 1083 (21.66) 1082 (21.64) 1078 (21.56) 1073 (21.46) 1086 (21.72) 1080 (21.6) 1081 (21.62)
0.125 Yrs
STD 1089 (21.78) 1085 (21.7) 1087 (21.74) 1086 (21.72) 1079 (21.58) 1078 (21.56) 1078 (21.56)
CB1 1078 (21.56) 1078 (21.56) 1078 (21.56) 1078 (21.56) 1078 (21.56) 1078 (21.56) 1078 (21.56)
CB2 1081 (21.62) 1081 (21.62) 1081 (21.62) 1081 (21.62) 1081 (21.62) 1081 (21.62) 1081 (21.62)
PLF 1083 (21.66) 1079 (21.58) 1080 (21.6) 1081 (21.62) 1077 (21.54) 1082 (21.64) 1080 (21.6)
Entries are presented as count (%).
Table A1.11. Mean log hazard ratio under the alternative hypothesis of a log hazard ratio of 0.67 for trials in rejection
group 4 for all data processing methods by delayed reporting probability and length between scheduled visits.
Delayed reporting probability: 0  0.1  0.25  0.5  0.75  0.9  1
RAW -0.21(0.071) -0.21(0.071) -0.21(0.071) -0.21(0.071) -0.21(0.071) -0.21(0.071) -0.21(0.071)
0.5 Yrs
STD -0.211(0.07) -0.211(0.071) -0.21(0.07) -0.209(0.07) -0.208(0.07) -0.208(0.07) -0.208(0.07)
CB1 -0.208(0.07) -0.208(0.07) -0.208(0.07) -0.208(0.07) -0.208(0.07) -0.208(0.07) -0.208(0.07)
CB2 -0.207(0.07) -0.207(0.07) -0.207(0.07) -0.207(0.07) -0.207(0.07) -0.207(0.07) -0.207(0.07)
PLF -0.21(0.071) -0.21(0.071) -0.21(0.071) -0.209(0.071) -0.208(0.07) -0.208(0.071) -0.208(0.071)
0.25 Yrs
STD -0.21(0.07) -0.21(0.07) -0.21(0.07) -0.209(0.07) -0.209(0.07) -0.209(0.07) -0.209(0.07)
CB1 -0.209(0.07) -0.209(0.07) -0.209(0.07) -0.209(0.07) -0.209(0.07) -0.209(0.07) -0.209(0.07)
CB2 -0.208(0.07) -0.208(0.07) -0.208(0.07) -0.208(0.07) -0.208(0.07) -0.208(0.07) -0.208(0.07)
PLF -0.21(0.071) -0.21(0.071) -0.209(0.07) -0.209(0.071) -0.209(0.071) -0.209(0.071) -0.209(0.071)
0.125 Yrs
STD -0.21(0.071) -0.21(0.071) -0.209(0.07) -0.209(0.07) -0.209(0.07) -0.208(0.07) -0.208(0.07)
CB1 -0.208(0.07) -0.208(0.07) -0.208(0.07) -0.208(0.07) -0.208(0.07) -0.208(0.07) -0.208(0.07)
CB2 -0.208(0.07) -0.208(0.07) -0.208(0.07) -0.208(0.07) -0.208(0.07) -0.208(0.07) -0.208(0.07)
PLF -0.21(0.071) -0.209(0.071) -0.209(0.07) -0.209(0.07) -0.208(0.07) -0.209(0.07) -0.209(0.07)
Entries are presented as mean (SD).
Table A1.12. Mean information fraction under the alternative hypothesis of a log hazard ratio of 0.67 for trials in
rejection group 4 for all data processing methods by delayed reporting probability and length between scheduled visits.
Delayed reporting probability: 0  0.1  0.25  0.5  0.75  0.9  1
RAW 0.989(0.015) 0.989(0.015) 0.989(0.015) 0.989(0.015) 0.989(0.015) 0.989(0.015) 0.989(0.015)
0.5 Yrs
STD 0.99(0.016) 0.989(0.016) 0.988(0.016) 0.986(0.016) 0.984(0.016) 0.983(0.017) 0.982(0.017)
CB1 0.982(0.017) 0.982(0.017) 0.982(0.017) 0.982(0.017) 0.982(0.017) 0.982(0.017) 0.982(0.017)
CB2 0.971(0.019) 0.971(0.019) 0.971(0.019) 0.971(0.019) 0.971(0.019) 0.971(0.019) 0.971(0.019)
PLF 0.989(0.015) 0.989(0.016) 0.988(0.016) 0.986(0.016) 0.983(0.016) 0.982(0.017) 0.982(0.017)
0.25 Yrs
STD 0.99(0.016) 0.989(0.016) 0.989(0.016) 0.988(0.016) 0.987(0.016) 0.986(0.016) 0.986(0.016)
CB1 0.986(0.016) 0.986(0.016) 0.986(0.016) 0.986(0.016) 0.986(0.016) 0.986(0.016) 0.986(0.016)
CB2 0.981(0.017) 0.981(0.017) 0.981(0.017) 0.981(0.017) 0.981(0.017) 0.981(0.017) 0.981(0.017)
PLF 0.989(0.015) 0.989(0.016) 0.989(0.016) 0.988(0.016) 0.987(0.016) 0.986(0.016) 0.986(0.016)
0.125 Yrs
STD 0.989(0.016) 0.989(0.016) 0.989(0.016) 0.989(0.016) 0.988(0.016) 0.988(0.016) 0.987(0.016)
CB1 0.987(0.016) 0.987(0.016) 0.987(0.016) 0.987(0.016) 0.987(0.016) 0.987(0.016) 0.987(0.016)
CB2 0.985(0.016) 0.985(0.016) 0.985(0.016) 0.985(0.016) 0.985(0.016) 0.985(0.016) 0.985(0.016)
PLF 0.989(0.015) 0.989(0.016) 0.989(0.016) 0.989(0.016) 0.988(0.016) 0.988(0.016) 0.987(0.016)
Entries are presented as mean (SD).
A2. Under Departure of the Hazard Ratio from Study Design
Table A2.1. Proportion of trials under the alternative hypothesis of a log hazard ratio of 0.85 for trials in rejection
group 1 for all data processing methods by delayed reporting probability and length between scheduled visits.
Delayed reporting probability: 0  0.1  0.25  0.5  0.75  0.9  1
RAW 84 (1.68) 84 (1.68) 84 (1.68) 84 (1.68) 84 (1.68) 84 (1.68) 84 (1.68)
0.5 Yrs
STD 74 (1.48) 70 (1.4) 64 (1.28) 49 (0.98) 51 (1.02) 42 (0.84) 39 (0.78)
CB1 39 (0.78) 39 (0.78) 39 (0.78) 39 (0.78) 39 (0.78) 39 (0.78) 39 (0.78)
CB2 15 (0.3) 15 (0.3) 15 (0.3) 15 (0.3) 15 (0.3) 15 (0.3) 15 (0.3)
PLF 84 (1.68) 79 (1.58) 68 (1.36) 54 (1.08) 44 (0.88) 37 (0.74) 33 (0.66)
0.25 Yrs
STD 84 (1.68) 81 (1.62) 81 (1.62) 80 (1.6) 70 (1.4) 62 (1.24) 60 (1.2)
CB1 60 (1.2) 60 (1.2) 60 (1.2) 60 (1.2) 60 (1.2) 60 (1.2) 60 (1.2)
CB2 47 (0.94) 47 (0.94) 47 (0.94) 47 (0.94) 47 (0.94) 47 (0.94) 47 (0.94)
PLF 84 (1.68) 76 (1.52) 73 (1.46) 68 (1.36) 68 (1.36) 60 (1.2) 58 (1.16)
0.125 Yrs
STD 79 (1.58) 74 (1.48) 76 (1.52) 77 (1.54) 76 (1.52) 71 (1.42) 76 (1.52)
CB1 76 (1.52) 76 (1.52) 76 (1.52) 76 (1.52) 76 (1.52) 76 (1.52) 76 (1.52)
CB2 60 (1.2) 60 (1.2) 60 (1.2) 60 (1.2) 60 (1.2) 60 (1.2) 60 (1.2)
PLF 84 (1.68) 81 (1.62) 79 (1.58) 76 (1.52) 77 (1.54) 70 (1.4) 66 (1.32)
Entries are presented as count (%).
Table A2.2. Mean log hazard ratio under the alternative hypothesis of a log hazard ratio of 0.85 for trials in rejection
group 1 for all data processing methods by delayed reporting probability and length between scheduled visits.
Delayed reporting probability: 0  0.1  0.25  0.5  0.75  0.9  1
RAW -0.802(0.128) -0.802(0.128) -0.802(0.128) -0.802(0.128) -0.802(0.128) -0.802(0.128) -0.802(0.128)
0.5 Yrs
STD -0.817(0.142) -0.844(0.149) -0.855(0.154) -0.915(0.184) -0.97(0.163) -1.007(0.158) -1.034(0.177)
CB1 -1.034(0.177) -1.034(0.177) -1.034(0.177) -1.034(0.177) -1.034(0.177) -1.034(0.177) -1.034(0.177)
CB2 -1.359(0.277) -1.359(0.277) -1.359(0.277) -1.359(0.277) -1.359(0.277) -1.359(0.277) -1.359(0.277)
PLF -0.802(0.128) -0.823(0.139) -0.848(0.142) -0.898(0.161) -0.971(0.159) -1.024(0.135) -1.049(0.153)
0.25 Yrs
STD -0.806(0.133) -0.813(0.145) -0.819(0.147) -0.846(0.154) -0.872(0.156) -0.899(0.162) -0.909(0.184)
CB1 -0.909(0.184) -0.909(0.184) -0.909(0.184) -0.909(0.184) -0.909(0.184) -0.909(0.184) -0.909(0.184)
CB2 -1.013(0.362) -1.013(0.362) -1.013(0.362) -1.013(0.362) -1.013(0.362) -1.013(0.362) -1.013(0.362)
PLF -0.802(0.128) -0.819(0.14) -0.827(0.147) -0.856(0.16) -0.872(0.152) -0.897(0.163) -0.918(0.179)
0.125 Yrs
STD -0.814(0.127) -0.823(0.139) -0.823(0.137) -0.831(0.143) -0.841(0.14) -0.849(0.14) -0.849(0.152)
CB1 -0.849(0.152) -0.849(0.152) -0.849(0.152) -0.849(0.152) -0.849(0.152) -0.849(0.152) -0.849(0.152)
CB2 -0.904(0.196) -0.904(0.196) -0.904(0.196) -0.904(0.196) -0.904(0.196) -0.904(0.196) -0.904(0.196)
PLF -0.802(0.128) -0.813(0.138) -0.819(0.137) -0.83(0.147) -0.839(0.14) -0.849(0.14) -0.864(0.157)
Entries are presented as mean (SD).
Table A2.3. Mean information fraction under the alternative hypothesis of a log hazard ratio of 0.85 for trials in
rejection group 1 for all data processing methods by delayed reporting probability and length between scheduled visits.
Delayed reporting probability: 0  0.1  0.25  0.5  0.75  0.9  1
RAW 0.392(0.035) 0.392(0.035) 0.392(0.035) 0.392(0.035) 0.392(0.035) 0.392(0.035) 0.392(0.035)
0.5 Yrs
STD 0.382(0.037) 0.371(0.034) 0.358(0.036) 0.34(0.03) 0.309(0.032) 0.299(0.032) 0.293(0.03)
CB1 0.293(0.03) 0.293(0.03) 0.293(0.03) 0.293(0.03) 0.293(0.03) 0.293(0.03) 0.293(0.03)
CB2 0.225(0.03) 0.225(0.03) 0.225(0.03) 0.225(0.03) 0.225(0.03) 0.225(0.03) 0.225(0.03)
PLF 0.392(0.035) 0.382(0.033) 0.371(0.032) 0.348(0.031) 0.315(0.028) 0.303(0.033) 0.3(0.032)
0.25 Yrs
STD 0.387(0.039) 0.385(0.037) 0.378(0.036) 0.366(0.036) 0.351(0.035) 0.343(0.036) 0.345(0.035)
CB1 0.345(0.035) 0.345(0.035) 0.345(0.035) 0.345(0.035) 0.345(0.035) 0.345(0.035) 0.345(0.035)
CB2 0.291(0.033) 0.291(0.033) 0.291(0.033) 0.291(0.033) 0.291(0.033) 0.291(0.033) 0.291(0.033)
PLF 0.392(0.035) 0.389(0.035) 0.385(0.033) 0.374(0.035) 0.357(0.034) 0.351(0.034) 0.35(0.035)
0.125 Yrs
STD 0.389(0.038) 0.391(0.036) 0.385(0.034) 0.378(0.037) 0.374(0.037) 0.37(0.037) 0.369(0.037)
CB1 0.369(0.037) 0.369(0.037) 0.369(0.037) 0.369(0.037) 0.369(0.037) 0.369(0.037) 0.369(0.037)
CB2 0.346(0.035) 0.346(0.035) 0.346(0.035) 0.346(0.035) 0.346(0.035) 0.346(0.035) 0.346(0.035)
PLF 0.392(0.035) 0.393(0.034) 0.387(0.033) 0.383(0.036) 0.378(0.038) 0.373(0.037) 0.371(0.037)
Entries are presented as mean (SD).
Table A2.4. Proportion of trials under the alternative hypothesis of a log hazard ratio of 0.85 for trials in rejection
group 2 for all data processing methods by delayed reporting probability and length between scheduled visits.
Delayed reporting probability: 0  0.1  0.25  0.5  0.75  0.9  1
RAW 343 (6.86) 343 (6.86) 343 (6.86) 343 (6.86) 343 (6.86) 343 (6.86) 343 (6.86)
0.5 Yrs
STD 322 (6.44) 321 (6.42) 305 (6.1) 296 (5.92) 257 (5.14) 252 (5.04) 252 (5.04)
CB1 252 (5.04) 252 (5.04) 252 (5.04) 252 (5.04) 252 (5.04) 252 (5.04) 252 (5.04)
CB2 172 (3.44) 172 (3.44) 172 (3.44) 172 (3.44) 172 (3.44) 172 (3.44) 172 (3.44)
PLF 343 (6.86) 332 (6.64) 316 (6.32) 302 (6.04) 257 (5.14) 246 (4.92) 248 (4.96)
0.25 Yrs
STD 332 (6.64) 320 (6.4) 324 (6.48) 321 (6.42) 297 (5.94) 303 (6.06) 297 (5.94)
CB1 297 (5.94) 297 (5.94) 297 (5.94) 297 (5.94) 297 (5.94) 297 (5.94) 297 (5.94)
CB2 260 (5.2) 260 (5.2) 260 (5.2) 260 (5.2) 260 (5.2) 260 (5.2) 260 (5.2)
PLF 343 (6.86) 345 (6.9) 333 (6.66) 331 (6.62) 299 (5.98) 297 (5.94) 297 (5.94)
0.125 Yrs
STD 338 (6.76) 332 (6.64) 341 (6.82) 333 (6.66) 331 (6.62) 327 (6.54) 327 (6.54)
CB1 327 (6.54) 327 (6.54) 327 (6.54) 327 (6.54) 327 (6.54) 327 (6.54) 327 (6.54)
CB2 306 (6.12) 306 (6.12) 306 (6.12) 306 (6.12) 306 (6.12) 306 (6.12) 306 (6.12)
PLF 343 (6.86) 340 (6.8) 345 (6.9) 335 (6.7) 341 (6.82) 337 (6.74) 336 (6.72)
Entries are presented as count (%).
Table A2.5. Mean log hazard ratio under the alternative hypothesis of a log hazard ratio of 0.85 for trials in rejection
group 2 for all data processing methods by delayed reporting probability and length between scheduled visits.
Delayed reporting probability: 0  0.1  0.25  0.5  0.75  0.9  1
RAW -0.48(0.085) -0.48(0.085) -0.48(0.085) -0.48(0.085) -0.48(0.085) -0.48(0.085) -0.48(0.085)
0.5 Yrs
STD -0.479(0.087) -0.484(0.105) -0.494(0.109) -0.51(0.113) -0.531(0.109) -0.544(0.109) -0.551(0.109)
CB1 -0.551(0.109) -0.551(0.109) -0.551(0.109) -0.551(0.109) -0.551(0.109) -0.551(0.109) -0.551(0.109)
CB2 -0.632(0.09) -0.632(0.09) -0.632(0.09) -0.632(0.09) -0.632(0.09) -0.632(0.09) -0.632(0.09)
PLF -0.48(0.085) -0.483(0.101) -0.495(0.105) -0.519(0.098) -0.531(0.106) -0.544(0.109) -0.55(0.109)
0.25 Yrs
STD -0.479(0.086) -0.484(0.086) -0.486(0.086) -0.491(0.087) -0.501(0.09) -0.502(0.09) -0.505(0.089)
CB1 -0.505(0.089) -0.505(0.089) -0.505(0.089) -0.505(0.089) -0.505(0.089) -0.505(0.089) -0.505(0.089)
CB2 -0.541(0.104) -0.541(0.104) -0.541(0.104) -0.541(0.104) -0.541(0.104) -0.541(0.104) -0.541(0.104)
PLF -0.48(0.085) -0.484(0.086) -0.487(0.086) -0.492(0.087) -0.502(0.092) -0.506(0.091) -0.508(0.09)
0.125 Yrs
STD -0.481(0.085) -0.484(0.085) -0.482(0.085) -0.484(0.086) -0.489(0.087) -0.491(0.088) -0.491(0.088)
CB1 -0.491(0.088) -0.491(0.088) -0.491(0.088) -0.491(0.088) -0.491(0.088) -0.491(0.088) -0.491(0.088)
CB2 -0.507(0.097) -0.507(0.097) -0.507(0.097) -0.507(0.097) -0.507(0.097) -0.507(0.097) -0.507(0.097)
PLF -0.48(0.085) -0.481(0.084) -0.48(0.084) -0.484(0.085) -0.49(0.089) -0.491(0.089) -0.492(0.089)
Entries are presented as mean (SD).
Table A2.6. Mean information fraction under the alternative hypothesis of a log hazard ratio of 0.85 for trials in
rejection group 2 for all data processing methods by delayed reporting probability and length between scheduled visits.
Delayed reporting probability: 0  0.1  0.25  0.5  0.75  0.9  1
RAW 0.717(0.04) 0.717(0.04) 0.717(0.04) 0.717(0.04) 0.717(0.04) 0.717(0.04) 0.717(0.04)
0.5 Yrs
STD 0.71(0.042) 0.698(0.043) 0.686(0.042) 0.66(0.044) 0.638(0.042) 0.622(0.042) 0.611(0.043)
CB1 0.611(0.043) 0.611(0.043) 0.611(0.043) 0.611(0.043) 0.611(0.043) 0.611(0.043) 0.611(0.043)
CB2 0.516(0.042) 0.516(0.042) 0.516(0.042) 0.516(0.042) 0.516(0.042) 0.516(0.042) 0.516(0.042)
PLF 0.717(0.04) 0.706(0.04) 0.693(0.04) 0.667(0.042) 0.645(0.04) 0.631(0.04) 0.619(0.04)
0.25 Yrs
STD 0.716(0.04) 0.711(0.041) 0.705(0.04) 0.693(0.042) 0.683(0.041) 0.675(0.042) 0.671(0.042)
CB1 0.671(0.042) 0.671(0.042) 0.671(0.042) 0.671(0.042) 0.671(0.042) 0.671(0.042) 0.671(0.042)
CB2 0.62(0.043) 0.62(0.043) 0.62(0.043) 0.62(0.043) 0.62(0.043) 0.62(0.043) 0.62(0.043)
PLF 0.717(0.04) 0.713(0.04) 0.705(0.041) 0.695(0.041) 0.686(0.04) 0.678(0.041) 0.674(0.04)
0.125 Yrs
STD 0.716(0.039) 0.713(0.04) 0.71(0.04) 0.707(0.04) 0.701(0.041) 0.699(0.041) 0.696(0.041)
CB1 0.696(0.041) 0.696(0.041) 0.696(0.041) 0.696(0.041) 0.696(0.041) 0.696(0.041) 0.696(0.041)
CB2 0.672(0.043) 0.672(0.043) 0.672(0.043) 0.672(0.043) 0.672(0.043) 0.672(0.043) 0.672(0.043)
PLF 0.717(0.04) 0.714(0.041) 0.712(0.039) 0.708(0.039) 0.702(0.04) 0.699(0.04) 0.697(0.04)
Entries are presented as mean (SD).
Table A2.7. Proportion of trials under the alternative hypothesis of a log hazard ratio of 0.85 for trials in rejection
group 3 for all data processing methods by delayed reporting probability and length between scheduled visits.
Delayed reporting probability: 0  0.1  0.25  0.5  0.75  0.9  1
RAW 518 (10.36) 518 (10.36) 518 (10.36) 518 (10.36) 518 (10.36) 518 (10.36) 518 (10.36)
0.5 Yrs
STD 551 (11.02) 568 (11.36) 591 (11.82) 610 (12.2) 637 (12.74) 650 (13) 656 (13.12)
CB1 656 (13.12) 656 (13.12) 656 (13.12) 656 (13.12) 656 (13.12) 656 (13.12) 656 (13.12)
CB2 763 (15.26) 763 (15.26) 763 (15.26) 763 (15.26) 763 (15.26) 763 (15.26) 763 (15.26)
PLF 518 (10.36) 541 (10.82) 571 (11.42) 601 (12.02) 638 (12.76) 659 (13.18) 667 (13.34)
0.25 Yrs
STD 540 (10.8) 555 (11.1) 556 (11.12) 561 (11.22) 578 (11.56) 581 (11.62) 589 (11.78)
CB1 589 (11.78) 589 (11.78) 589 (11.78) 589 (11.78) 589 (11.78) 589 (11.78) 589 (11.78)
CB2 640 (12.8) 640 (12.8) 640 (12.8) 640 (12.8) 640 (12.8) 640 (12.8) 640 (12.8)
PLF 518 (10.36) 526 (10.52) 537 (10.74) 552 (11.04) 573 (11.46) 587 (11.74) 588 (11.76)
0.125 Yrs
STD 535 (10.7) 539 (10.78) 535 (10.7) 540 (10.8) 546 (10.92) 547 (10.94) 550 (11)
CB1 550 (11) 550 (11) 550 (11) 550 (11) 550 (11) 550 (11) 550 (11)
CB2 586 (11.72) 586 (11.72) 586 (11.72) 586 (11.72) 586 (11.72) 586 (11.72) 586 (11.72)
PLF 518 (10.36) 524 (10.48) 520 (10.4) 533 (10.66) 535 (10.7) 536 (10.72) 544 (10.88)
Entries are presented as count (%).
Table A2.8. Mean log hazard ratio under the alternative hypothesis of a log hazard ratio of 0.85 for trials in rejection
group 3 for all data processing methods by delayed reporting probability and length between scheduled visits.
Delayed reporting probability: 0  0.1  0.25  0.5  0.75  0.9  1
RAW -0.345(0.075) -0.345(0.075) -0.345(0.075) -0.345(0.075) -0.345(0.075) -0.345(0.075) -0.345(0.075)
0.5 Yrs
STD -0.345(0.074) -0.345(0.074) -0.347(0.074) -0.348(0.073) -0.35(0.075) -0.35(0.076) -0.351(0.076)
CB1 -0.351(0.076) -0.351(0.076) -0.351(0.076) -0.351(0.076) -0.351(0.076) -0.351(0.076) -0.351(0.076)
CB2 -0.354(0.078) -0.354(0.078) -0.354(0.078) -0.354(0.078) -0.354(0.078) -0.354(0.078) -0.354(0.078)
PLF -0.345(0.075) -0.344(0.073) -0.346(0.074) -0.348(0.074) -0.349(0.074) -0.351(0.076) -0.351(0.077)
0.25 Yrs
STD -0.344(0.074) -0.345(0.074) -0.345(0.074) -0.345(0.074) -0.347(0.074) -0.348(0.075) -0.348(0.075)
CB1 -0.348(0.075) -0.348(0.075) -0.348(0.075) -0.348(0.075) -0.348(0.075) -0.348(0.075) -0.348(0.075)
CB2 -0.352(0.077) -0.352(0.077) -0.352(0.077) -0.352(0.077) -0.352(0.077) -0.352(0.077) -0.352(0.077)
PLF -0.345(0.075) -0.344(0.074) -0.346(0.075) -0.347(0.074) -0.347(0.074) -0.348(0.074) -0.348(0.074)
0.125 Yrs
STD -0.344(0.074) -0.344(0.074) -0.344(0.074) -0.345(0.075) -0.346(0.075) -0.346(0.075) -0.346(0.075)
CB1 -0.346(0.075) -0.346(0.075) -0.346(0.075) -0.346(0.075) -0.346(0.075) -0.346(0.075) -0.346(0.075)
CB2 -0.347(0.074) -0.347(0.074) -0.347(0.074) -0.347(0.074) -0.347(0.074) -0.347(0.074) -0.347(0.074)
PLF -0.345(0.075) -0.344(0.074) -0.345(0.075) -0.345(0.075) -0.346(0.075) -0.346(0.075) -0.346(0.075)
Entries are presented as mean (SD).
Table A2.9. Mean information fraction under the alternative hypothesis of a log hazard ratio of 0.85 for trials in
rejection group 3 for all data processing methods by delayed reporting probability and length between scheduled visits.
Delayed reporting probability: 0  0.1  0.25  0.5  0.75  0.9  1
RAW 1.007(0.01) 1.007(0.01) 1.007(0.01) 1.007(0.01) 1.007(0.01) 1.007(0.01) 1.007(0.01)
0.5 Yrs
STD 1.006(0.011) 1.006(0.011) 1.005(0.011) 1.004(0.011) 1.004(0.012) 1.003(0.012) 1.003(0.012)
CB1 1.003(0.012) 1.003(0.012) 1.003(0.012) 1.003(0.012) 1.003(0.012) 1.003(0.012) 1.003(0.012)
CB2 0.995(0.014) 0.995(0.014) 0.995(0.014) 0.995(0.014) 0.995(0.014) 0.995(0.014) 0.995(0.014)
PLF 1.007(0.01) 1.007(0.011) 1.006(0.011) 1.005(0.011) 1.003(0.012) 1.003(0.012) 1.002(0.012)
0.25 Yrs
STD 1.006(0.011) 1.006(0.011) 1.006(0.011) 1.006(0.011) 1.005(0.011) 1.005(0.011) 1.005(0.011)
CB1 1.005(0.011) 1.005(0.011) 1.005(0.011) 1.005(0.011) 1.005(0.011) 1.005(0.011) 1.005(0.011)
CB2 1.002(0.012) 1.002(0.012) 1.002(0.012) 1.002(0.012) 1.002(0.012) 1.002(0.012) 1.002(0.012)
PLF 1.007(0.01) 1.007(0.01) 1.006(0.011) 1.006(0.011) 1.005(0.011) 1.005(0.011) 1.005(0.011)
0.125 Yrs
STD 1.007(0.01) 1.007(0.01) 1.006(0.011) 1.006(0.011) 1.006(0.011) 1.006(0.011) 1.006(0.011)
CB1 1.006(0.011) 1.006(0.011) 1.006(0.011) 1.006(0.011) 1.006(0.011) 1.006(0.011) 1.006(0.011)
CB2 1.004(0.011) 1.004(0.011) 1.004(0.011) 1.004(0.011) 1.004(0.011) 1.004(0.011) 1.004(0.011)
PLF 1.007(0.01) 1.007(0.01) 1.007(0.01) 1.006(0.01) 1.006(0.011) 1.006(0.011) 1.006(0.011)
Information in this table is presented as mean (standard deviation).
Table A2.10. Proportion of trials under the alternative hypothesis of a log hazard ratio of 0.85 for trials in rejection
group 4 for all data processing methods by delayed reporting probability and length between scheduled visits.
Delayed reporting probability (ρ): 0 0.1 0.25 0.5 0.75 0.9 1
RAW 4055 (81.1) 4055 (81.1) 4055 (81.1) 4055 (81.1) 4055 (81.1) 4055 (81.1) 4055 (81.1)
0.5 Yrs
STD 4053 (81.06) 4041 (80.82) 4040 (80.8) 4045 (80.9) 4055 (81.1) 4056 (81.12) 4053 (81.06)
CB1 4053 (81.06) 4053 (81.06) 4053 (81.06) 4053 (81.06) 4053 (81.06) 4053 (81.06) 4053 (81.06)
CB2 4050 (81) 4050 (81) 4050 (81) 4050 (81) 4050 (81) 4050 (81) 4050 (81)
PLF 4055 (81.1) 4048 (80.96) 4045 (80.9) 4043 (80.86) 4061 (81.22) 4058 (81.16) 4052 (81.04)
0.25 Yrs
STD 4044 (80.88) 4044 (80.88) 4039 (80.78) 4038 (80.76) 4055 (81.1) 4054 (81.08) 4054 (81.08)
CB1 4054 (81.08) 4054 (81.08) 4054 (81.08) 4054 (81.08) 4054 (81.08) 4054 (81.08) 4054 (81.08)
CB2 4053 (81.06) 4053 (81.06) 4053 (81.06) 4053 (81.06) 4053 (81.06) 4053 (81.06) 4053 (81.06)
PLF 4055 (81.1) 4053 (81.06) 4057 (81.14) 4049 (80.98) 4060 (81.2) 4056 (81.12) 4057 (81.14)
0.125 Yrs
STD 4048 (80.96) 4055 (81.1) 4048 (80.96) 4050 (81) 4047 (80.94) 4055 (81.1) 4047 (80.94)
CB1 4047 (80.94) 4047 (80.94) 4047 (80.94) 4047 (80.94) 4047 (80.94) 4047 (80.94) 4047 (80.94)
CB2 4048 (80.96) 4048 (80.96) 4048 (80.96) 4048 (80.96) 4048 (80.96) 4048 (80.96) 4048 (80.96)
PLF 4055 (81.1) 4055 (81.1) 4056 (81.12) 4056 (81.12) 4047 (80.94) 4057 (81.14) 4054 (81.08)
Information in this table is presented as count (%).
Table A2.11. Mean log hazard ratio under the alternative hypothesis of a log hazard ratio of 0.85 for trials in rejection
group 4 for all data processing methods by delayed reporting probability and length between scheduled visits.
Delayed reporting probability (ρ): 0 0.1 0.25 0.5 0.75 0.9 1
RAW -0.116(0.108) -0.116(0.108) -0.116(0.108) -0.116(0.108) -0.116(0.108) -0.116(0.108) -0.116(0.108)
0.5 Yrs
STD -0.116(0.108) -0.115(0.108) -0.115(0.108) -0.115(0.108) -0.115(0.108) -0.115(0.108) -0.115(0.108)
CB1 -0.115(0.108) -0.115(0.108) -0.115(0.108) -0.115(0.108) -0.115(0.108) -0.115(0.108) -0.115(0.108)
CB2 -0.114(0.108) -0.114(0.108) -0.114(0.108) -0.114(0.108) -0.114(0.108) -0.114(0.108) -0.114(0.108)
PLF -0.116(0.108) -0.115(0.108) -0.115(0.108) -0.115(0.108) -0.115(0.108) -0.115(0.108) -0.115(0.108)
0.25 Yrs
STD -0.115(0.108) -0.115(0.108) -0.115(0.108) -0.115(0.108) -0.115(0.108) -0.115(0.108) -0.115(0.108)
CB1 -0.115(0.108) -0.115(0.108) -0.115(0.108) -0.115(0.108) -0.115(0.108) -0.115(0.108) -0.115(0.108)
CB2 -0.115(0.109) -0.115(0.109) -0.115(0.109) -0.115(0.109) -0.115(0.109) -0.115(0.109) -0.115(0.109)
PLF -0.116(0.108) -0.116(0.108) -0.116(0.108) -0.115(0.108) -0.116(0.108) -0.115(0.108) -0.116(0.108)
0.125 Yrs
STD -0.115(0.108) -0.116(0.108) -0.115(0.108) -0.115(0.108) -0.115(0.108) -0.116(0.108) -0.115(0.108)
CB1 -0.115(0.108) -0.115(0.108) -0.115(0.108) -0.115(0.108) -0.115(0.108) -0.115(0.108) -0.115(0.108)
CB2 -0.115(0.108) -0.115(0.108) -0.115(0.108) -0.115(0.108) -0.115(0.108) -0.115(0.108) -0.115(0.108)
PLF -0.116(0.108) -0.116(0.108) -0.116(0.108) -0.116(0.108) -0.116(0.108) -0.116(0.108) -0.116(0.108)
Information in this table is presented as mean (standard deviation).
Table A2.12. Mean information fraction under the alternative hypothesis of a log hazard ratio of 0.85 for trials in
rejection group 4 for all data processing methods by delayed reporting probability and length between scheduled visits.
Delayed reporting probability (ρ): 0 0.1 0.25 0.5 0.75 0.9 1
RAW 0.996(0.015) 0.996(0.015) 0.996(0.015) 0.996(0.015) 0.996(0.015) 0.996(0.015) 0.996(0.015)
0.5 Yrs
STD 0.996(0.015) 0.995(0.015) 0.994(0.015) 0.993(0.015) 0.991(0.016) 0.99(0.016) 0.99(0.016)
CB1 0.99(0.016) 0.99(0.016) 0.99(0.016) 0.99(0.016) 0.99(0.016) 0.99(0.016) 0.99(0.016)
CB2 0.981(0.018) 0.981(0.018) 0.981(0.018) 0.981(0.018) 0.981(0.018) 0.981(0.018) 0.981(0.018)
PLF 0.996(0.015) 0.995(0.015) 0.994(0.015) 0.993(0.015) 0.991(0.016) 0.99(0.016) 0.99(0.016)
0.25 Yrs
STD 0.996(0.015) 0.996(0.015) 0.995(0.015) 0.994(0.015) 0.994(0.015) 0.993(0.015) 0.993(0.015)
CB1 0.993(0.015) 0.993(0.015) 0.993(0.015) 0.993(0.015) 0.993(0.015) 0.993(0.015) 0.993(0.015)
CB2 0.989(0.016) 0.989(0.016) 0.989(0.016) 0.989(0.016) 0.989(0.016) 0.989(0.016) 0.989(0.016)
PLF 0.996(0.015) 0.996(0.015) 0.995(0.015) 0.994(0.015) 0.994(0.015) 0.993(0.015) 0.993(0.015)
0.125 Yrs
STD 0.996(0.015) 0.995(0.015) 0.994(0.015) 0.993(0.015) 0.991(0.016) 0.99(0.016) 0.99(0.016)
CB1 0.99(0.016) 0.99(0.016) 0.99(0.016) 0.99(0.016) 0.99(0.016) 0.99(0.016) 0.99(0.016)
CB2 0.981(0.018) 0.981(0.018) 0.981(0.018) 0.981(0.018) 0.981(0.018) 0.981(0.018) 0.981(0.018)
PLF 0.996(0.015) 0.995(0.015) 0.994(0.015) 0.993(0.015) 0.991(0.016) 0.99(0.016) 0.99(0.016)
Information in this table is presented as mean (standard deviation).
A3. Under the Null Hypothesis
Table A3.1. Proportion of trials under the null hypothesis for trials in rejection group 1 for all data processing
methods by delayed reporting probability and length between scheduled visits.
Delayed reporting probability (ρ): 0 0.1 0.25 0.5 0.75 0.9 1
RAW 24(0.48%) 24(0.48%) 24(0.48%) 24(0.48%) 24(0.48%) 24(0.48%) 24(0.48%)
0.5 Yrs
STD 21(0.42%) 25(0.5%) 22(0.44%) 17(0.34%) 15(0.3%) 13(0.26%) 8(0.16%)
CB1 8(0.16%) 8(0.16%) 8(0.16%) 8(0.16%) 8(0.16%) 8(0.16%) 8(0.16%)
CB2 3(0.06%) 3(0.06%) 3(0.06%) 3(0.06%) 3(0.06%) 3(0.06%) 3(0.06%)
PLF 24(0.48%) 20(0.4%) 22(0.44%) 18(0.36%) 9(0.18%) 9(0.18%) 10(0.2%)
0.25 Yrs
STD 19(0.38%) 19(0.38%) 19(0.38%) 15(0.3%) 13(0.26%) 15(0.3%) 13(0.26%)
CB1 13(0.26%) 13(0.26%) 13(0.26%) 13(0.26%) 13(0.26%) 13(0.26%) 13(0.26%)
CB2 13(0.26%) 13(0.26%) 13(0.26%) 13(0.26%) 13(0.26%) 13(0.26%) 13(0.26%)
PLF 24(0.48%) 19(0.38%) 19(0.38%) 21(0.42%) 16(0.32%) 15(0.3%) 16(0.32%)
0.125 Yrs
STD 24(0.48%) 21(0.42%) 22(0.44%) 20(0.4%) 21(0.42%) 20(0.4%) 20(0.4%)
CB1 20(0.4%) 20(0.4%) 20(0.4%) 20(0.4%) 20(0.4%) 20(0.4%) 20(0.4%)
CB2 9(0.18%) 9(0.18%) 9(0.18%) 9(0.18%) 9(0.18%) 9(0.18%) 9(0.18%)
PLF 24(0.48%) 22(0.44%) 24(0.48%) 23(0.46%) 20(0.4%) 18(0.36%) 18(0.36%)
Information in this table is presented as count (%).
Table A3.2. Mean log hazard ratio under the null hypothesis for trials in rejection group 1 for all data processing
methods by delayed reporting probability and length between scheduled visits.
Delayed reporting probability (ρ): 0 0.1 0.25 0.5 0.75 0.9 1
RAW -0.44(0.693) -0.44(0.693) -0.44(0.693) -0.44(0.693) -0.44(0.693) -0.44(0.693) -0.44(0.693)
0.5 Yrs
STD -0.635(0.528) -0.527(0.624) -0.59(0.6) -0.659(0.65) -0.639(0.699) -0.482(0.865) -0.743(0.768)
CB1 -0.743(0.768) -0.743(0.768) -0.743(0.768) -0.743(0.768) -0.743(0.768) -0.743(0.768) -0.743(0.768)
CB2 -1.512(0.526) -1.512(0.526) -1.512(0.526) -1.512(0.526) -1.512(0.526) -1.512(0.526) -1.512(0.526)
PLF -0.44(0.693) -0.454(0.685) -0.501(0.649) -0.364(0.839) -0.937(0.2) -0.777(0.678) -0.74(0.641)
0.25 Yrs
STD -0.535(0.64) -0.523(0.627) -0.618(0.55) -0.595(0.653) -0.63(0.714) -0.549(0.786) -0.767(0.555)
CB1 -0.767(0.555) -0.767(0.555) -0.767(0.555) -0.767(0.555) -0.767(0.555) -0.767(0.555) -0.767(0.555)
CB2 -0.683(0.836) -0.683(0.836) -0.683(0.836) -0.683(0.836) -0.683(0.836) -0.683(0.836) -0.683(0.836)
PLF -0.44(0.693) -0.44(0.705) -0.377(0.756) -0.419(0.763) -0.756(0.485) -0.66(0.67) -0.769(0.492)
0.125 Yrs
STD -0.44(0.686) -0.47(0.667) -0.412(0.71) -0.394(0.75) -0.495(0.693) -0.558(0.651) -0.56(0.639)
CB1 -0.56(0.639) -0.56(0.639) -0.56(0.639) -0.56(0.639) -0.56(0.639) -0.56(0.639) -0.56(0.639)
CB2 -0.932(0.23) -0.932(0.23) -0.932(0.23) -0.932(0.23) -0.932(0.23) -0.932(0.23) -0.932(0.23)
PLF -0.44(0.693) -0.478(0.66) -0.436(0.691) -0.489(0.66) -0.479(0.709) -0.539(0.688) -0.454(0.75)
Information in this table is presented as mean (standard deviation).
Table A3.3. Proportion of trials under the null hypothesis for trials in rejection group 2 for all data processing
methods by delayed reporting probability and length between scheduled visits.
Delayed reporting probability (ρ): 0 0.1 0.25 0.5 0.75 0.9 1
RAW 88(1.76%) 88(1.76%) 88(1.76%) 88(1.76%) 88(1.76%) 88(1.76%) 88(1.76%)
0.5 Yrs
STD 82(1.64%) 79(1.58%) 80(1.6%) 73(1.46%) 69(1.38%) 65(1.3%) 64(1.28%)
CB1 64(1.28%) 64(1.28%) 64(1.28%) 64(1.28%) 64(1.28%) 64(1.28%) 64(1.28%)
CB2 36(0.72%) 36(0.72%) 36(0.72%) 36(0.72%) 36(0.72%) 36(0.72%) 36(0.72%)
PLF 88(1.76%) 84(1.68%) 82(1.64%) 74(1.48%) 76(1.52%) 74(1.48%) 72(1.44%)
0.25 Yrs
STD 85(1.7%) 83(1.66%) 84(1.68%) 76(1.52%) 78(1.56%) 74(1.48%) 72(1.44%)
CB1 72(1.44%) 72(1.44%) 72(1.44%) 72(1.44%) 72(1.44%) 72(1.44%) 72(1.44%)
CB2 61(1.22%) 61(1.22%) 61(1.22%) 61(1.22%) 61(1.22%) 61(1.22%) 61(1.22%)
PLF 88(1.76%) 83(1.66%) 79(1.58%) 68(1.36%) 71(1.42%) 74(1.48%) 71(1.42%)
0.125 Yrs
STD 84(1.68%) 81(1.62%) 81(1.62%) 71(1.42%) 78(1.56%) 77(1.54%) 75(1.5%)
CB1 75(1.5%) 75(1.5%) 75(1.5%) 75(1.5%) 75(1.5%) 75(1.5%) 75(1.5%)
CB2 74(1.48%) 74(1.48%) 74(1.48%) 74(1.48%) 74(1.48%) 74(1.48%) 74(1.48%)
PLF 88(1.76%) 88(1.76%) 88(1.76%) 77(1.54%) 75(1.5%) 78(1.56%) 77(1.54%)
Information in this table is presented as count (%).
Table A3.4. Mean log hazard ratio under the null hypothesis for trials in rejection group 2 for all data processing
methods by delayed reporting probability and length between scheduled visits.
Delayed reporting probability (ρ): 0 0.1 0.25 0.5 0.75 0.9 1
RAW -0.032(0.462) -0.032(0.462) -0.032(0.462) -0.032(0.462) -0.032(0.462) -0.032(0.462) -0.032(0.462)
0.5 Yrs
STD -0.058(0.468) -0.081(0.473) -0.055(0.486) -0.069(0.505) -0.095(0.515) -0.06(0.538) -0.119(0.534)
CB1 -0.119(0.534) -0.119(0.534) -0.119(0.534) -0.119(0.534) -0.119(0.534) -0.119(0.534) -0.119(0.534)
CB2 -0.265(0.564) -0.265(0.564) -0.265(0.564) -0.265(0.564) -0.265(0.564) -0.265(0.564) -0.265(0.564)
PLF -0.032(0.462) -0.046(0.47) -0.006(0.48) -0.065(0.497) -0.154(0.497) -0.159(0.509) -0.171(0.513)
0.25 Yrs
STD -0.064(0.461) -0.048(0.465) -0.021(0.472) -0.016(0.487) -0.079(0.482) -0.027(0.498) -0.038(0.498)
CB1 -0.038(0.498) -0.038(0.498) -0.038(0.498) -0.038(0.498) -0.038(0.498) -0.038(0.498) -0.038(0.498)
CB2 -0.051(0.539) -0.051(0.539) -0.051(0.539) -0.051(0.539) -0.051(0.539) -0.051(0.539) -0.051(0.539)
PLF -0.032(0.462) -0.021(0.468) -0.035(0.478) -0.011(0.491) -0.074(0.486) -0.045(0.496) -0.018(0.501)
0.125 Yrs
STD -0.067(0.456) -0.056(0.462) -0.035(0.469) -0.031(0.475) -0.047(0.473) -0.021(0.481) -0.014(0.482)
CB1 -0.014(0.482) -0.014(0.482) -0.014(0.482) -0.014(0.482) -0.014(0.482) -0.014(0.482) -0.014(0.482)
CB2 -0.084(0.489) -0.084(0.489) -0.084(0.489) -0.084(0.489) -0.084(0.489) -0.084(0.489) -0.084(0.489)
PLF -0.032(0.462) -0.005(0.466) -0.015(0.469) -0.014(0.477) -0.019(0.481) -0.015(0.483) 0(0.484)
Information in this table is presented as mean (standard deviation).
Table A3.5. Proportion of trials under the null hypothesis for trials in rejection group 3 for all data processing
methods by delayed reporting probability and length between scheduled visits.
Delayed reporting probability (ρ): 0 0.1 0.25 0.5 0.75 0.9 1
RAW 127(2.54%) 127(2.54%) 127(2.54%) 127(2.54%) 127(2.54%) 127(2.54%) 127(2.54%)
0.5 Yrs
STD 138(2.76%) 141(2.82%) 147(2.94%) 154(3.08%) 155(3.1%) 160(3.2%) 164(3.28%)
CB1 164(3.28%) 164(3.28%) 164(3.28%) 164(3.28%) 164(3.28%) 164(3.28%) 164(3.28%)
CB2 185(3.7%) 185(3.7%) 185(3.7%) 185(3.7%) 185(3.7%) 185(3.7%) 185(3.7%)
PLF 127(2.54%) 132(2.64%) 139(2.78%) 151(3.02%) 154(3.08%) 156(3.12%) 162(3.24%)
0.25 Yrs
STD 128(2.56%) 129(2.58%) 132(2.64%) 137(2.74%) 141(2.82%) 142(2.84%) 143(2.86%)
CB1 143(2.86%) 143(2.86%) 143(2.86%) 143(2.86%) 143(2.86%) 143(2.86%) 143(2.86%)
CB2 155(3.1%) 155(3.1%) 155(3.1%) 155(3.1%) 155(3.1%) 155(3.1%) 155(3.1%)
PLF 127(2.54%) 128(2.56%) 129(2.58%) 139(2.78%) 147(2.94%) 144(2.88%) 146(2.92%)
0.125 Yrs
STD 130(2.6%) 131(2.62%) 130(2.6%) 133(2.66%) 133(2.66%) 132(2.64%) 137(2.74%)
CB1 137(2.74%) 137(2.74%) 137(2.74%) 137(2.74%) 137(2.74%) 137(2.74%) 137(2.74%)
CB2 143(2.86%) 143(2.86%) 143(2.86%) 143(2.86%) 143(2.86%) 143(2.86%) 143(2.86%)
PLF 127(2.54%) 127(2.54%) 123(2.46%) 131(2.62%) 134(2.68%) 132(2.64%) 134(2.68%)
Information in this table is presented as count (%).
Table A3.6. Mean log hazard ratio under the null hypothesis for trials in rejection group 3 for all data processing
methods by delayed reporting probability and length between scheduled visits.
Delayed reporting probability (ρ): 0 0.1 0.25 0.5 0.75 0.9 1
RAW 0.083(0.333) 0.083(0.333) 0.083(0.333) 0.083(0.333) 0.083(0.333) 0.083(0.333) 0.083(0.333)
0.5 Yrs
STD 0.076(0.335) 0.071(0.337) 0.063(0.339) 0.07(0.339) 0.074(0.341) 0.052(0.341) 0.056(0.342)
CB1 0.056(0.342) 0.056(0.342) 0.056(0.342) 0.056(0.342) 0.056(0.342) 0.056(0.342) 0.056(0.342)
CB2 0.064(0.344) 0.064(0.344) 0.064(0.344) 0.064(0.344) 0.064(0.344) 0.064(0.344) 0.064(0.344)
PLF 0.083(0.333) 0.076(0.333) 0.057(0.34) 0.057(0.342) 0.091(0.334) 0.068(0.339) 0.062(0.341)
0.25 Yrs
STD 0.088(0.332) 0.085(0.332) 0.073(0.335) 0.055(0.339) 0.065(0.336) 0.054(0.339) 0.066(0.336)
CB1 0.066(0.336) 0.066(0.336) 0.066(0.336) 0.066(0.336) 0.066(0.336) 0.066(0.336) 0.066(0.336)
CB2 0.061(0.343) 0.061(0.343) 0.061(0.343) 0.061(0.343) 0.061(0.343) 0.061(0.343) 0.061(0.343)
PLF 0.083(0.333) 0.08(0.333) 0.074(0.336) 0.058(0.338) 0.068(0.335) 0.062(0.336) 0.057(0.337)
0.125 Yrs
STD 0.082(0.334) 0.079(0.335) 0.071(0.336) 0.064(0.338) 0.066(0.338) 0.06(0.339) 0.06(0.339)
CB1 0.06(0.339) 0.06(0.339) 0.06(0.339) 0.06(0.339) 0.06(0.339) 0.06(0.339) 0.06(0.339)
CB2 0.066(0.337) 0.066(0.337) 0.066(0.337) 0.066(0.337) 0.066(0.337) 0.066(0.337) 0.066(0.337)
PLF 0.083(0.333) 0.077(0.333) 0.08(0.333) 0.071(0.336) 0.059(0.34) 0.055(0.341) 0.054(0.341)
Information in this table is presented as mean (standard deviation).
Table A3.7. Proportion of trials under the null hypothesis for trials in rejection group 4 for all data processing methods
by delayed reporting probability and length between scheduled visits.
Delayed reporting probability (ρ): 0 0.1 0.25 0.5 0.75 0.9 1
RAW 4761(95.22%) 4761(95.22%) 4761(95.22%) 4761(95.22%) 4761(95.22%) 4761(95.22%) 4761(95.22%)
0.5 Yrs
STD 4759(95.18%) 4755(95.1%) 4751(95.02%) 4756(95.12%) 4761(95.22%) 4762(95.24%) 4764(95.28%)
CB1 4764(95.28%) 4764(95.28%) 4764(95.28%) 4764(95.28%) 4764(95.28%) 4764(95.28%) 4764(95.28%)
CB2 4776(95.52%) 4776(95.52%) 4776(95.52%) 4776(95.52%) 4776(95.52%) 4776(95.52%) 4776(95.52%)
PLF 4761(95.22%) 4764(95.28%) 4757(95.14%) 4757(95.14%) 4761(95.22%) 4761(95.22%) 4756(95.12%)
0.25 Yrs
STD 4768(95.36%) 4769(95.38%) 4765(95.3%) 4772(95.44%) 4768(95.36%) 4769(95.38%) 4772(95.44%)
CB1 4772(95.44%) 4772(95.44%) 4772(95.44%) 4772(95.44%) 4772(95.44%) 4772(95.44%) 4772(95.44%)
CB2 4771(95.42%) 4771(95.42%) 4771(95.42%) 4771(95.42%) 4771(95.42%) 4771(95.42%) 4771(95.42%)
PLF 4761(95.22%) 4770(95.4%) 4773(95.46%) 4772(95.44%) 4766(95.32%) 4767(95.34%) 4767(95.34%)
0.125 Yrs
STD 4762(95.24%) 4767(95.34%) 4767(95.34%) 4776(95.52%) 4768(95.36%) 4771(95.42%) 4768(95.36%)
CB1 4768(95.36%) 4768(95.36%) 4768(95.36%) 4768(95.36%) 4768(95.36%) 4768(95.36%) 4768(95.36%)
CB2 4774(95.48%) 4774(95.48%) 4774(95.48%) 4774(95.48%) 4774(95.48%) 4774(95.48%) 4774(95.48%)
PLF 4761(95.22%) 4763(95.26%) 4765(95.3%) 4769(95.38%) 4771(95.42%) 4772(95.44%) 4771(95.42%)
Information in this table is presented as count (%).
Table A3.8. Mean log hazard ratio under the null hypothesis for trials in rejection group 4 for all data processing
methods by delayed reporting probability and length between scheduled visits.
Delayed reporting probability (ρ): 0 0.1 0.25 0.5 0.75 0.9 1
RAW 0.002(0.123) 0.002(0.123) 0.002(0.123) 0.002(0.123) 0.002(0.123) 0.002(0.123) 0.002(0.123)
0.5 Yrs
STD 0.003(0.123) 0.003(0.123) 0.003(0.123) 0.002(0.123) 0.002(0.123) 0.002(0.123) 0.002(0.123)
CB1 0.002(0.123) 0.002(0.123) 0.002(0.123) 0.002(0.123) 0.002(0.123) 0.002(0.123) 0.002(0.123)
CB2 0.002(0.123) 0.002(0.123) 0.002(0.123) 0.002(0.123) 0.002(0.123) 0.002(0.123) 0.002(0.123)
PLF 0.002(0.123) 0.002(0.123) 0.002(0.122) 0.002(0.123) 0.002(0.123) 0.003(0.123) 0.003(0.123)
0.25 Yrs
STD 0.002(0.123) 0.002(0.123) 0.002(0.123) 0.002(0.123) 0.003(0.123) 0.002(0.123) 0.002(0.123)
CB1 0.002(0.123) 0.002(0.123) 0.002(0.123) 0.002(0.123) 0.002(0.123) 0.002(0.123) 0.002(0.123)
CB2 0.002(0.123) 0.002(0.123) 0.002(0.123) 0.002(0.123) 0.002(0.123) 0.002(0.123) 0.002(0.123)
PLF 0.002(0.123) 0.002(0.123) 0.002(0.123) 0.002(0.123) 0.002(0.123) 0.002(0.123) 0.002(0.123)
0.125 Yrs
STD 0.003(0.123) 0.002(0.123) 0.002(0.123) 0.002(0.123) 0.003(0.123) 0.003(0.123) 0.002(0.123)
CB1 0.002(0.123) 0.002(0.123) 0.002(0.123) 0.002(0.123) 0.002(0.123) 0.002(0.123) 0.002(0.123)
CB2 0.003(0.123) 0.003(0.123) 0.003(0.123) 0.003(0.123) 0.003(0.123) 0.003(0.123) 0.003(0.123)
PLF 0.002(0.123) 0.002(0.123) 0.002(0.123) 0.002(0.123) 0.002(0.123) 0.002(0.123) 0.002(0.123)
Information in this table is presented as mean (standard deviation).
A4. Correlation between Increments of the Score Statistic
Table A4.1. Correlation coefficient of the estimate of the first score statistic with the increment to the
second score statistic, for all data processing methods, by delayed reporting probability, under the null
hypothesis.
Rows: window length w (years); columns: delayed reporting probability (ρ): 0 0.1 0.25 0.5 0.75 0.9 1
STD
0.5 0.0151 0.0152 0.0048 0.0035 0.0062 -0.0027 -0.0028
0.25 0.0212 0.0178 0.0101 0.0061 0.0007 -0.0055 -0.0088
0.125 0.0075 0.0072 0.0042 -0.0006 -0.0048 -0.0068 -0.0073
CB1
0.5 -0.0028 -0.0028 -0.0028 -0.0028 -0.0028 -0.0028 -0.0028
0.25 -0.0088 -0.0088 -0.0088 -0.0088 -0.0088 -0.0088 -0.0088
0.125 -0.0073 -0.0073 -0.0073 -0.0073 -0.0073 -0.0073 -0.0073
CB2
0.5 0.0043 0.0043 0.0043 0.0043 0.0043 0.0043 0.0043
0.25 0.0009 0.0009 0.0009 0.0009 0.0009 0.0009 0.0009
0.125 -0.0189 -0.0189 -0.0189 -0.0189 -0.0189 -0.0189 -0.0189
PLF
0.5 -0.0159 -0.0284* -0.0539* -0.0804* -0.089* -0.1088* -0.1129*
0.25 -0.0159 -0.0217 -0.0323* -0.0409* -0.0451* -0.0542* -0.0589*
0.125 -0.0159 -0.0157 -0.0188 -0.0242 -0.028* -0.0305* -0.0309*
* denotes statistical significance at the 0.05 level.
Table A4.2. Correlation coefficient between the estimate of the increment from the first score statistic to the
second and the estimate of the increment from the second score statistic to the third, by delayed reporting
probability, under the null hypothesis.
Rows: window length w (years); columns: delayed reporting probability (ρ): 0 0.1 0.25 0.5 0.75 0.9 1
STD
0.5 0.0403* 0.032* 0.0328* 0.0173 0.0017 0.0082 0.0015
0.25 0.033* 0.0259 0.0193 0.0105 -0.0013 -0.0067 -0.011
0.125 0.0169 0.011 0.0081 0.0003 0.0006 -0.0025 -0.0072
CB1
0.5 0.0015 0.0015 0.0015 0.0015 0.0015 0.0015 0.0015
0.25 -0.011 -0.011 -0.011 -0.011 -0.011 -0.011 -0.011
0.125 -0.0072 -0.0072 -0.0072 -0.0072 -0.0072 -0.0072 -0.0072
CB2
0.5 -0.0028 -0.0028 -0.0028 -0.0028 -0.0028 -0.0028 -0.0028
0.25 -0.0064 -0.0064 -0.0064 -0.0064 -0.0064 -0.0064 -0.0064
0.125 -0.0069 -0.0069 -0.0069 -0.0069 -0.0069 -0.0069 -0.0069
PLF
0.5 -0.0067 -0.0257 -0.0409* -0.0702* -0.0965* -0.0915* -0.0991*
0.25 -0.0067 -0.0154 -0.0243 -0.0335* -0.0479* -0.0544* -0.0585*
0.125 -0.0067 -0.0132 -0.016 -0.0238 -0.0239 -0.0269 -0.0319*
* denotes statistical significance at the 0.05 level.
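Entries of this form are simple to reproduce: each coefficient is the sample correlation, across the 5,000 simulated trials, between a score statistic and a subsequent increment, and the asterisk corresponds to the p-value of the test of zero correlation. A minimal sketch in Python follows (the array names and the demonstration data are illustrative assumptions, not the simulation code behind these tables):

import numpy as np
from scipy.stats import pearsonr

def increment_correlation(s1, s2, alpha=0.05):
    # correlation of S_1 with the increment S_2 - S_1 across simulated
    # trials, with the p-value of the test of zero correlation
    r, p = pearsonr(s1, s2 - s1)
    return r, p, p < alpha

# toy demonstration with truly independent increments: r should be near 0
rng = np.random.default_rng(0)
s1 = rng.normal(0.0, 1.0, 5000)        # score statistic at analysis 1
s2 = s1 + rng.normal(0.0, 1.0, 5000)   # add an independent increment
print(increment_correlation(s1, s2))

Under independent increments the coefficient should be near zero, which is the pattern the CB1 and CB2 rows display; the systematically negative, significant PLF entries as ρ grows are what signal a departure from that assumption.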
A5. Expected Values of Elements in the Variance Covariance Matrix
Table A5.1. Comparison of expected values of elements of the covariance
matrix (3.11) for the standard method of data processing under the null
hypothesis of no treatment effect when ρ = 0 and w = 0.5.
Element | Asymptotic Result | N=104 per group* | N=500 per group* | N=1000 per group* | N=2000 per group*
( , ) 0.45863 0.45897 0.45874 0.45868 0.45867
( , ) 0.3387 0.33919 0.33876 0.33837 0.33854
( , ) 0.22815 0.22871 0.22835 0.22792 0.22804
( , , ) 0.16769 0.16808 0.16781 0.16758 0.16768
( , ) 0.65008 0.64988 0.6498 0.65001 0.65011
( , ) 0.55938 0.55938 0.55916 0.55914 0.55924
( , ) 0.54494 0.54491 0.54455 0.54445 0.54465
( , , ) 0.3357 0.33541 0.33531 0.33542 0.33557
( , ) 0.97848 0.97843 0.97843 0.97847 0.97848
( , ) 0.97354 0.97389 0.97367 0.97341 0.97344
( , ) 1.76254 1.76469 1.76362 1.76244 1.76257
( , , ) 0.89912 0.89916 0.89907 0.89896 0.89904
( , , ) 0.45863 0.45897 0.45874 0.45868 0.45867
( , , ) 0.16769 0.16808 0.16781 0.16758 0.16768
( , , ) 0.2639 0.26431 0.26386 0.26365 0.26379
( , , ) 0.34016 0.3406 0.3402 0.33973 0.33997
( , , ) 0.65008 0.64988 0.6498 0.65001 0.65011
( , , ) 0.3357 0.33541 0.33531 0.33542 0.33557
( , , ) 0.54562 0.54541 0.54534 0.54536 0.54547
( , , ) 0.79908 0.79979 0.79894 0.79862 0.79883
( , , ) 0.45863 0.45897 0.45874 0.45868 0.45867
( , , ) 0.16769 0.16808 0.16781 0.16758 0.16768
( , , ) 0.3341 0.33445 0.33413 0.33377 0.33394
( , , ) 0.42833 0.42888 0.42838 0.42771 0.42806
* From 5,000 simulated trials
A6. Variance Covariance Matrices for K=3
Table A6.1. Asymptotic variance covariance matrix (3.6) of vector (3.4) for a treatment group under the null
hypothesis under the standard method of data processing, with ρ = 0 and w = 0.5.
0.2482888 0.0123524 0.1604834 -0.088859 0.0098719 -0.278808
0.0123524 0.1134332 0.0437169 0.1507028 0.0026892 0.0985925
0.1604834 0.0437169 0.227475 -0.027947 0.0139928 -0.297188
-0.088859 0.1507028 -0.027947 0.2320394 -0.001719 0.2545015
0.0098719 0.0026892 0.0139928 -0.001719 0.0210613 -0.053472
-0.278808 0.0985925 -0.297188 0.2545015 -0.053472 0.8147588
Table A6.2. Variance covariance matrix (3.6) of vector (3.4) for a treatment group under the null hypothesis under
the standard method of data processing for simulated trial data with 104 participants per group*, with ρ = 0 and w = 0.5.
0.2483165 0.012405 0.1606961 -0.088659 0.0098996 -0.278903
0.012405 0.1136636 0.0438839 0.1508621 0.0025822 0.0985468
0.1606961 0.0438839 0.2275372 -0.02812 0.0140173 -0.297496
-0.088659 0.1508621 -0.02812 0.2320007 -0.001904 0.2550079
0.0098996 0.0025822 0.0140173 -0.001904 0.021104 -0.05372
-0.278903 0.0985468 -0.297496 0.2550079 -0.05372 0.8162295
* From 5,000 simulated trials
Table A6.3. Variance covariance matrix (3.6) of vector (3.4) for a treatment group under the null hypothesis under
the standard method of data processing for simulated trial data with 500 participants per group*, with ρ = 0 and w = 0.5.
0.2482976 0.0124061 0.1606501 -0.088701 0.0098967 -0.278857
0.0124061 0.1135989 0.0437403 0.1507832 0.0026822 0.0985424
0.1606501 0.0437403 0.2275595 -0.028036 0.0140186 -0.297389
-0.088701 0.1507832 -0.028036 0.2318895 -0.001751 0.2545028
0.0098967 0.0026822 0.0140186 -0.001751 0.0211082 -0.053599
-0.278857 0.0985424 -0.297389 0.2545028 -0.053599 0.8155789
* From 5,000 simulated trials
Table A6.4. Variance covariance matrix (3.6) of vector (3.4) for a treatment group under the null hypothesis under
the standard method of data processing for simulated trial data with 1,000 participants per group*, with ρ = 0 and w = 0.5.
0.248293 0.0123745 0.1605332 -0.088892 0.0098774 -0.27891
0.0123745 0.1134296 0.0437054 0.1505298 0.002687 0.098335
0.1605332 0.0437054 0.2274959 -0.028034 0.0139975 -0.297314
-0.088892 0.1505298 -0.028034 0.2318082 -0.001748 0.2543463
0.0098774 0.002687 0.0139975 -0.001748 0.0210705 -0.053494
-0.27891 0.098335 -0.297314 0.2543463 -0.053494 0.8149085
* From 5,000 simulated trials
Table A6.5. Variance covariance matrix (3.6) of vector (3.4) for a treatment group under the null hypothesis under
the standard method of data processing for simulated trial data with 2,000 participants per group*, with ρ = 0 and w = 0.5.
0.248292 0.0124037 0.160484 -0.088826 0.0098704 -0.27881
0.0124037 0.1134358 0.0437026 0.1506447 0.0026888 0.0985124
0.160484 0.0437026 0.2274664 -0.027998 0.0139901 -0.297277
-0.088826 0.1506447 -0.027998 0.2319038 -0.001728 0.2544461
0.0098704 0.0026888 0.0139901 -0.001728 0.0210564 -0.053453
-0.27881 0.0985124 -0.297277 0.2544461 -0.053453 0.8149826
* From 5,000 simulated trials
A7. Variance Covariance Matrices of the Score Statistics for K=3
Table A7.1. Asymptotic variance covariance matrix (3.11) of the score statistics for a treatment group under the null
hypothesis under the standard method of data processing, with ρ = 0 and w = 0.5.
0.2114135 0.2208555 0.2103162
0.2208555 0.3029123 0.3059764
0.2103162 0.3059764 0.4757921
Table A7.2. Variance covariance matrix (3.11) of the score statistics for a treatment group under the null hypothesis under
the standard method of data processing for simulated data with 104 participants per group*, with ρ = 0 and w = 0.5.
0.2114318 0.2207391 0.2102901
0.2207391 0.3030042 0.3063777
0.2102901 0.3063777 0.476454
* From 5,000 simulated trials
Table A7.3. Variance covariance matrix (3.11) of the score statistics for a treatment group under the null hypothesis under
the standard method of data processing for simulated data with 500 participants per group*, with ρ = 0 and w = 0.5.
0.2115099 0.2208943 0.2102896
0.2208943 0.3029437 0.306049
0.2102896 0.306049 0.4761945
* From 5,000 simulated trials
Table A7.4. Variance covariance matrix (3.11) of the score statistics for a treatment group under the null hypothesis under
the standard method of data processing for simulated data with 1,000 participants per group*, with ρ = 0 and w = 0.5.
0.2115904 0.2209211 0.2102929
0.2209211 0.3029758 0.3060519
0.2102929 0.3060519 0.4760031
* From 5,000 simulated trials
Table A7.5. Variance covariance matrix (3.11) of the score statistics for a treatment group under the null hypothesis under
the standard method of data processing for simulated data with 2,000 participants per group*, with ρ = 0 and w = 0.5.
0.2115904 0.2209211 0.2102929
0.2209211 0.3029758 0.3060519
0.2102929 0.3060519 0.4760031
* From 5,000 simulated trials
A8. Variance Covariance Matrices of the Increments of the Score Statistics for K=3
Table A8.1. Asymptotic variance covariance matrix of the increments of the score statistics for a treatment group under the
null hypothesis under the standard method of data processing, with ρ = 0 and w = 0.5.
0.2114135 0.0094421 -0.010539
0.0094421 0.0726147 0.0136034
-0.010539 0.0136034 0.1667516
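Table A8.1 follows directly from Table A7.1: writing the increments as D = (S1, S2 - S1, S3 - S2), their covariance is L Σ Lᵀ, where Σ is the covariance of (S1, S2, S3) and L is the differencing matrix. A short numerical check (Python with NumPy; a worked verification of the published values, not code from the dissertation):

import numpy as np

# Σ: covariance of (S1, S2, S3) from Table A7.1
sigma = np.array([
    [0.2114135, 0.2208555, 0.2103162],
    [0.2208555, 0.3029123, 0.3059764],
    [0.2103162, 0.3059764, 0.4757921],
])
# L maps the score statistics to their increments (S1, S2 - S1, S3 - S2)
L = np.array([
    [ 1.0,  0.0, 0.0],
    [-1.0,  1.0, 0.0],
    [ 0.0, -1.0, 1.0],
])
print(L @ sigma @ L.T)  # matches Table A8.1 to rounding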
Table A8.2. Variance covariance matrix of the increments of the score statistics for a treatment group under the null
hypothesis under the standard method of data processing for simulated data with 104 participants per group*, with
ρ = 0 and w = 0.5.
0.2114318 0.0093073 -0.010449
0.0093073 0.0729578 0.0138226
-0.010449 0.0138226 0.1667027
* From 5,000 simulated trials
Table A8.3. Variance covariance matrix of the increments of the score statistics for a treatment group under the null
hypothesis under the standard method of data processing for simulated data with 500 participants per group*, with
ρ = 0 and w = 0.5.
0.2115099 0.0093844 -0.010605
0.0093844 0.072665 0.01371
-0.010605 0.01371 0.1670401
* From 5,000 simulated trials
Table A8.4. Variance covariance matrix of the increments of the score statistics for a treatment group under the null
hypothesis under the standard method of data processing for simulated data with 1,000 participants per group*, with
ρ = 0 and w = 0.5.
0.2115904 0.0093306 -0.010628
0.0093306 0.0727241 0.0137042
-0.010628 0.0137042 0.1668751
* From 5,000 simulated trials
Table A8.5. Variance covariance matrix of the increments of the score statistics for a treatment group under the null
hypothesis under the standard method of data processing for simulated data with 2,000 participants per group*, with
ρ = 0 and w = 0.5.
0.2115904 0.0093306 -0.010628
0.0093306 0.0727241 0.0137042
-0.010628 0.0137042 0.1668751
* From 5,000 simulated trials
A9. Exit Probabilities Under the Null Hypothesis
Table A9.1. Comparison of exit probabilities for each of three total analyses, computed assuming independent
increments and under the asymptotic situation, by window length between visits, w, and probability of delayed
reporting, ρ.
Analysis Time | Asymptotic Information Fraction | Group Sequential Boundary | Exit Probability | Asymptotic Exit Probability | Total Alpha Spent
w = 0.125 years
ρ = 0
1 0.324 2.7904 0.00526 0.00526
2 0.664 2.3475 0.01681 0.01525
3 1 2.0611 0.02793 0.02803 0.0485
ρ = 0.1
1 0.322 2.7951 0.00519 0.00519
2 0.662 2.3497 0.01673 0.01522
3 1 2.0605 0.02808 0.02818 0.0486
ρ = 0.2
1 0.32 2.8 0.00511 0.00511
2 0.66 2.3519 0.01666 0.0152
3 1 2.0598 0.02823 0.02832 0.0486
ρ = 1
1 0.301 2.8396 0.00452 0.00452
2 0.642 2.3697 0.01608 0.01496
3 1 2.0549 0.0294 0.02947 0.049
w = 0.25 years
ρ = 0
1 0.324 2.7904 0.00526 0.00526
2 0.664 2.3475 0.01681 0.0151
3 1 2.0611 0.02793 0.02811 0.0485
ρ = 0.1
1 0.32 2.8 0.00511 0.00511
2 0.66 2.3521 0.01665 0.01505
3 1 2.0598 0.02824 0.02839 0.0486
ρ = 0.2
1 0.315 2.8097 0.00496 0.00496
2 0.655 2.3567 0.0165 0.01499
3 1 2.0585 0.02854 0.02868 0.0486
ρ = 1
1 0.277 2.8923 0.00382 0.00382
2 0.618 2.3943 0.01526 0.01436
3 1 2.0486 0.03091 0.03097 0.0492
w = 0.5 years
ρ = 0
1 0.324 2.7904 0.00526 0.00526
2 0.664 2.3475 0.01681 0.01502
3 1 2.0611 0.02793 0.02853 0.0488
ρ = 0.1
1 0.315 2.8101 0.00495 0.00495
2 0.654 2.3584 0.01642 0.01481
3 1 2.0582 0.02863 0.02915 0.0489
ρ = 0.2
1 0.305 2.8304 0.00465 0.00465
2 0.643 2.3694 0.01603 0.01459
3 1 2.0553 0.02932 0.02978 0.049
ρ = 1
1 0.226 3.0166 0.00256 0.00256
2 0.557 2.4625 0.01297 0.01243
3 1 2.0338 0.03447 0.03448 0.0495
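The Exit Probability column can be reproduced from the information fractions and two-sided boundaries in the table: under independent increments the standardized statistics (Z1, ..., ZK) are multivariate normal with corr(Zi, Zj) = sqrt(ti/tj) for ti <= tj, and the exit probability at analysis k is the probability of first crossing the boundary there. A minimal sketch in Python with SciPy follows (the function names are ours; the inclusion-exclusion helper is one standard way to obtain rectangle probabilities from the multivariate normal cdf):

import itertools
import numpy as np
from scipy.stats import multivariate_normal

def rect_prob(lower, upper, cov):
    # P(lower < X < upper) componentwise for X ~ N(0, cov),
    # by inclusion-exclusion over the 2^k corners of the rectangle
    k = len(lower)
    dist = multivariate_normal(mean=np.zeros(k), cov=cov)
    total = 0.0
    for picks in itertools.product((0, 1), repeat=k):
        corner = np.where(picks, upper, lower)
        total += (-1) ** (k - sum(picks)) * dist.cdf(corner)
    return float(total)

def exit_probabilities(info_frac, bounds):
    t = np.asarray(info_frac, dtype=float)
    c = np.asarray(bounds, dtype=float)
    # corr(Zi, Zj) = sqrt(min(ti, tj) / max(ti, tj)) under independent increments
    cov = np.sqrt(np.minimum.outer(t, t) / np.maximum.outer(t, t))
    probs, still_in = [], 1.0
    for k in range(1, len(t) + 1):
        inside = rect_prob(-c[:k], c[:k], cov[:k, :k])  # |Zj| < cj for all j <= k
        probs.append(still_in - inside)                 # first crossing at analysis k
        still_in = inside
    return probs

# w = 0.125 years, rho = 0 block above: should be close to 0.00526, 0.01681, 0.02793
print(exit_probabilities([0.324, 0.664, 1.0], [2.7904, 2.3475, 2.0611]))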
Abstract
It has been shown that, in a clinical trial setting where time to event is of interest, a preferential method of reporting influences estimates of the hazard rates and ratios in a single-sample situation. This commonly used practice of preferential reporting censors event-free participants at their last clinical visit prior to the analytic time point, whereas the survival time of a participant experiencing an event prior to analysis can be recorded at any time. Three methods of data processing are further explored in the case where survival time is exponentially distributed and in an interim monitoring setting. The effects of delayed reporting of events and of the window length between visits on estimates of the log hazard ratio are compared among the four data processing methods under the alternative hypothesis specified at study design and under a deviation from that alternative hypothesis. Also, under the null hypothesis, the effect of delayed reporting of events and of the window length between visits on the correlation between increments of the score statistic obtained at each time of analysis is compared for each data processing method. A general framework is presented and used to derive asymptotic variance covariance matrices of the score statistics; asymptotic values of the correlation between increments of the score statistic are also obtained.
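To make the preferential-reporting rule described above concrete, the following minimal sketch applies it to exponentially distributed survival times with visits scheduled every w years from enrollment (Python; the variable names, the staggered-enrollment setup, and the parameter values are illustrative assumptions, not the dissertation's simulation code):

import numpy as np

def preferential_observation(enroll, event_time, analysis_time, w):
    # Preferential reporting: an event occurring before the analysis time is
    # recorded at its actual time from enrollment, while an event-free
    # participant is censored at the last scheduled visit (every w years
    # from enrollment) completed before the analysis time.
    # Assumes every participant is enrolled before the analysis time.
    followup = analysis_time - enroll            # time on study at the analysis
    observed_event = event_time <= followup
    last_visit = np.floor(followup / w) * w      # last completed visit time
    time_on_study = np.where(observed_event, event_time, last_visit)
    return time_on_study, observed_event.astype(int)

# toy example: staggered enrollment over two years, exponential survival
rng = np.random.default_rng(2016)
enroll = rng.uniform(0.0, 2.0, 1000)
event_time = rng.exponential(5.0, 1000)          # true time from enrollment to event
t_obs, d = preferential_observation(enroll, event_time, analysis_time=4.0, w=0.5)
print(d.mean(), t_obs[d == 0].max())             # event fraction; latest censoring time

The asymmetry is visible immediately: censoring times are always multiples of w, while event times are continuous. This sketch corresponds to a delayed reporting probability of ρ = 0; modeling delayed reporting additionally changes which events have entered the analysis data set by the analytic time point.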