The Effects of Late Events Reporting on Futility Monitoring of Phase III Randomized Clinical
Trials
by
Caihong Xia
A Dissertation Presented to the
FACULTY OF THE USC GRADUATE SCHOOL
UNIVERSITY OF SOUTHERN CALIFORNIA
In Partial Fulfillment of the
Requirements for the Degree
DOCTOR OF PHILOSOPHY
(BIOSTATISTICS)
December 2021
Copyright 2021 Caihong Xia
Dedication
This dissertation is dedicated to my family and friends, who have been a constant source of
support and encouragement during the challenges of graduate school and life.
Acknowledgements
First and foremost, I would like to express my deepest gratitude to my thesis advisor, Dr. Mark
Krailo. Without his marvelous guidance, tremendous support and dedicated involvement in
every step of my PhD study, this dissertation would never have been accomplished. His great
insight and knowledge always steered me in the right direction. It is my great honor to have
Dr. Krailo as my PhD mentor.
I greatly appreciate my committee members, Dr. Sandrah Eckel, Dr. Todd Alonzo,
Dr. Meredith Franklin and Dr. Leo Mascarenhas, for their thoughtful comments and
recommendations on this dissertation. They raised many valuable points in our discussions, which
refined my research.
Special thanks to Dr. Sandrah Eckel, Dr. Kimberly Siegmund, Dr. Paul Marjoram and
Dr. Darryl Shibata for providing me with research assistant positions throughout my study. It was a
precious experience to participate in research areas other than clinical trials.
Special thanks also to the Children's Oncology Group for permission to use the AEWS data
for my research.
Lastly, I would like to thank my family for their unfailing support and continuous
encouragement throughout my years of study. This thesis would not have been possible without
them.
Table of Contents
Dedication
Acknowledgements
List of Tables
List of Figures
Abstract
Chapter 1: Introduction
1.1 Clinical trials and Interim analysis
1.2 Survival data collection and Delayed events reporting
Chapter 2: Literature review
2.1 Futility monitoring
2.1.1 Group sequential test
2.1.2 Repeated Confidence Interval
2.1.3 Conditional Power
2.1.4 Freidlin Korn Gray method
2.2 Modified survival estimator for delayed events reporting
2.2.1 Hu and Tsiatis
2.2.2 Van Der Laan and Hubbard
Chapter 3: Theoretical Considerations for Inefficacy Modelling in the Presence of Delayed Reporting
3.1 Estimates under perfect reporting
3.2 Estimates under delayed events reporting
3.3 Theoretical calculations of RGray analysis time by Newton-Raphson method
3.4 Theoretical log hazard ratio at RGray analysis time
3.5 Theoretical Repeated Confidence Intervals in the presence of delayed events reporting
3.6 Loss of power with interim futility monitoring analysis
3.7 Index of non-independent censoring (INIC)
Chapter 4: Simulation Results
4.1 RGray Analysis with standard data processing method
4.1.1 Estimate RGray analysis time by simulation
4.1.2 Distribution of empirical RGray analysis time
4.1.3 Empirical log hazard ratio at RGray analysis time
4.1.4 Distribution of empirical log Hazard Ratio
4.1.5 Proportion of trials that are stopped for futility by RGray method
4.1.6 Loss of power of RGray analysis with Standard data processing method
4.2 RGray analysis with Personal Cutback method
4.2.1 Log hazard ratio at RGray analysis time with Personal Cutback method
4.2.2 Actual information level with Personal Cutback method
4.2.3 Proportion of trials that are stopped for futility by RGray method with Personal Cutback method
4.2.4 The distribution of log hazard ratio at RGray analysis time with Personal Cutback method
4.2.5 Loss of power of RGray analysis with Personal Cutback method
4.3 RGray analysis with Global Cutback method
4.3.1 Log hazard ratio at RGray analysis time with Global Cutback method
4.3.2 Actual information level with Global Cutback Method
4.3.3 Proportion of trials that are stopped for futility by RGray method with Global Cutback method
4.3.4 Distribution of log hazard ratio at RGray analysis time with Global Cutback method
4.3.5 Loss of power of RGray analysis with Global Cutback method
4.4 RGray analysis with Personal Cutback Wait method
4.4.1 RGray analysis time with Personal Cutback Wait method
4.4.2 Log hazard ratio at RGray analysis time with Personal Cutback Wait method
4.4.3 Distribution of log hazard ratio at RGray analysis time with Personal Cutback Wait method
4.4.4 Proportion of trials that are stopped for futility by RGray method with Personal Cutback Wait method
4.4.5 Loss of power of RGray analysis with Personal Cutback Wait method
4.5 RCI futility monitoring with Standard data processing method
4.5.1 Log hazard ratio with Standard data processing method
4.5.2 Proportion of trials that are stopped for futility by RCI method with Standard data processing method
4.5.3 Distribution of Lower Repeated CI with Standard data processing method
4.5.4 Loss of power of RCI method with Standard data processing method
4.6 RCI futility monitoring with Personal Cutback method
4.6.1 Log hazard ratio with Personal Cutback method
4.6.2 Proportion of trials that are stopped for futility by RCI method with Personal Cutback method
4.6.3 Distribution of Lower Repeated CI with Personal Cutback method
4.6.4 Loss of power of RCI method with Personal Cutback method
4.7 RCI futility monitoring with Personal Cutback Wait method
4.7.1 Log hazard ratio with Personal Cutback Wait method
4.7.2 Proportion of trials that are stopped for futility by RCI method with Personal Cutback Wait method
4.7.3 Distribution of Lower Repeated CI with Personal Cutback Wait method
4.7.4 Loss of power of RCI method with Personal Cutback Wait Method
4.8 Weibull distribution
4.9 Conclusions
Chapter 5: Application to Real Data
5.1 AEWS0031 study
5.1.1 Futility monitoring with RGray method
5.1.2 Futility monitoring with RCI method
5.2 AEWS1031 study
5.2.1 Futility monitoring with RGray method
5.2.2 Futility monitoring with RCI method
5.3 AEWS1221 study
5.3.1 Futility monitoring with RGray method
5.3.2 Futility monitoring with RCI method
5.4 INT0091 study
5.4.1 Futility monitoring with RGray method
5.4.2 Futility monitoring with RCI method
5.5 Conclusion
Chapter 6: Concluding Remarks and Future Work
References
List of Tables
1 Table 1. Lower Alpha-spending boundary in a 4-stage design
2 Table 2. The relationship between baseline hazard rate and INIC with fixed hazard ratio 0.67, assuming none of the events were reported with delay (p=0) and the reporting interval was 6 months.
3 Table 4.1. RGray monitored example trial w6p2-605: Comparison of Standard method versus Personal Cutback method
4 Table 4.2. RGray monitored example trial w6p2-605: Comparison of Standard method, Personal Cutback method and Personal Cutback Wait method
5 Table 4.3. RCI monitored example trial w6p2-605 with Standard data processing method
6 Table 4.4. RCI monitored example trial w6p2-605: Comparison of Standard method versus Personal Cutback method
7 Table 4.5. RCI monitored example trial w6p2-605: Comparison of Standard method, Personal Cutback method and Personal Cutback Wait method
8 Table 5.1.1. RGray interim futility monitoring for AEWS0031 study
9 Table 5.1.2.1. RCI interim futility monitoring for AEWS0031 study with Standard data processing method
10 Table 5.1.2.2. RCI interim futility monitoring for AEWS0031 study with Personal Cutback method
11 Table 5.1.2.3. RCI interim futility monitoring for AEWS0031 study with Personal Cutback Wait method
12 Table 5.2.1. RGray interim futility monitoring for AEWS1031 study
13 Table 5.2.2.1. RCI interim futility monitoring for AEWS1031 study with Standard data processing method
14 Table 5.2.2.2. RCI interim futility monitoring for AEWS1031 study with Personal Cutback method
15 Table 5.2.2.3. RCI interim futility monitoring for AEWS1031 study with Personal Cutback Wait method
16 Table 5.3.1. RGray interim futility monitoring for AEWS1221 study
17 Table 5.3.2.1. RCI interim futility monitoring for AEWS1221 study with Standard data processing method
18 Table 5.3.2.2. RCI interim futility monitoring for AEWS1221 study with Personal Cutback method
19 Table 5.3.2.3. RCI interim futility monitoring for AEWS1221 study with Personal Cutback Wait method
20 Table 5.4.1. RGray interim futility monitoring for INT0091 study
21 Table 5.4.2.1. RCI interim futility monitoring for INT0091 study with Standard data processing method
22 Table 5.4.2.2. RCI interim futility monitoring for INT0091 study with Personal Cutback method
23 Table 5.4.2.3. RCI interim futility monitoring for INT0091 study with Personal Cutback Wait method
List of Figures
1 Figure 1. Lexis Diagram for an illustrative clinical trial
2 Figure 2. Diagram of standard data collection method (A) and delayed events reporting (B)
3 Figure 3. Diagram of LIB20 boundary
4 Figure 3.3.1. Theoretical RGray analysis time (in months) under the Alternative by Newton-Raphson method
5 Figure 3.4.1. Theoretical log hazard ratio at RGray analysis time under the Alternative
6 Figure 3.5.1. Theoretical Lower Repeated Confidence Intervals under the Alternative
7 Figure 3.7.1. The bias of log hazard ratio estimation with different baseline hazard rate, assuming 6 months reporting interval and a true hazard ratio of 0.67.
8 Figure 3.7.2. Schema to illustrate the calculation of index of non-independent information
9 Figure 3.7.3. Changes of INIC in the presence of delayed events reporting as baseline hazard rate increases, at fifty percent information time with a reporting interval of 6 months.
10 Figure 3.7.4. Changes of INIC at fifty percent information time with different reporting intervals as baseline hazard rate increases.
11 Figure 3.7.5. Changes of hazard rate estimates with different INIC of control arm, at fifty percent information time with a reporting interval of 6 months.
12 Figure 3.7.6. Changes of log hazard ratio estimates with different INIC in both arms, at fifty percent information time with a reporting interval of 6 months.
13 Figure 4.1. Empirical log hazard ratio with different reporting intervals and different probability of late events reporting, assuming baseline hazard rate was as low as 0.01.
14 Figure 4.1.1.1. Empirical RGray analysis time (in months) under the Alternative by simulation
15 Figure 4.1.1.2. Empirical RGray analysis time (in months) under the Null by simulation
16 Figure 4.1.1.3. Comparison of theoretical and empirical RGray analysis time
17 Figure 4.1.2.1. Box plot showed the distribution of empirical RGray analysis time under the Alternative
18 Figure 4.1.2.2. Density plot showed the distribution of empirical RGray analysis time under the Alternative
19 Figure 4.1.2.3. Box plot showed the distribution of empirical RGray analysis time under the Null
20 Figure 4.1.2.4. Density plot showed the distribution of empirical RGray analysis time under the Null
21 Figure 4.1.3.1. Empirical log hazard ratio by simulation under the Alternative
22 Figure 4.1.3.2. Empirical log hazard ratio by simulation under the Null
23 Figure 4.1.3.3. Comparison of theoretical log HR versus empirical log HR under the Alternative
24 Figure 4.1.3.4. Comparison of theoretical log HR versus empirical log HR under the Null
25 Figure 4.1.4.1. Box plot showed the distribution of empirical log HR under the Alternative
26 Figure 4.1.4.2. Density plot showed the distribution of empirical log HR under the Alternative
27 Figure 4.1.4.3. Box plot showed the distribution of empirical log HR under the Null
28 Figure 4.1.4.4. Density plot showed the distribution of empirical log HR under the Null
29 Figure 4.1.5.1. Proportion of trials that were stopped for futility by RGray method under the Alternative
30 Figure 4.1.5.2. A 3D plot showed the proportion of trials that were stopped for futility by RGray method under the Alternative
31 Figure 4.1.5.3. Proportion of trials that were stopped for futility by RGray method under the Null
32 Figure 4.1.6. Loss of power of RGray analysis with Standard data processing method
33 Figure 4.2.1.1. Log hazard ratio under the Alternative at RGray analysis time with Personal Cutback method
34 Figure 4.2.1.2. Log hazard ratio under the Null at RGray analysis time with Personal Cutback method
35 Figure 4.2.2.1. Actual Information Level under the Alternative by RGray method with Personal Cutback method
36 Figure 4.2.2.2. Actual Information Level under the Null by RGray method with Personal Cutback method
37 Figure 4.2.2.3. Diagram of extended LIB20 boundary
38 Figure 4.2.3.1. Proportion of trials that were stopped for futility under the Alternative by RGray method with Personal Cutback method
39 Figure 4.2.3.2. Proportion of trials that were stopped for futility under the Null by RGray method with Personal Cutback method
40 Figure 4.2.4.1. Box plot shows the distribution of log hazard ratio under the Alternative at RGray analysis time with Personal Cutback method
41 Figure 4.2.4.2. Density plot shows the distribution of log hazard ratio under the Alternative at RGray analysis time with Personal Cutback method
42 Figure 4.2.4.3. Box plot showed the distribution of log hazard ratio under the Null at RGray analysis time with Personal Cutback method
43 Figure 4.2.4.4. Density plot showed the distribution of log hazard ratio under the Null at RGray analysis time with Personal Cutback method
44 Figure 4.2.5. Loss of power of RGray analysis with Personal Cutback method
45 Figure 4.3.1.1. Log hazard ratio at RGray analysis time with Global Cutback method under the Alternative
46 Figure 4.3.1.2. Log hazard ratio at RGray analysis time with Global Cutback method under the Null
47 Figure 4.3.2.1. Actual Information Level under the Alternative by RGray method with Global Cutback method
48 Figure 4.3.2.2. Actual Information Level under the Null by RGray method with Global Cutback method
49 Figure 4.3.3.1. Proportion of trials that were stopped for futility with Global Cutback method under the Alternative
50 Figure 4.3.3.2. Proportion of trials that are stopped for futility with Global Cutback method under the Null
51 Figure 4.3.4.1. Box plot showed the distribution of log HR under the Alternative at RGray analysis time with Global Cutback method
52 Figure 4.3.4.2. Density plot shows the distribution of log HR under the Alternative at RGray analysis time with Global Cutback method
53 Figure 4.3.4.3. Box plot shows the distribution of log HR under the Null at RGray analysis time with Global Cutback method
54 Figure 4.3.4.4. Density plot shows the distribution of log HR under the Null at RGray analysis time with Global Cutback method
55 Figure 4.3.5. Loss of power of RGray analysis with Global Cutback method
56 Figure 4.4.1.1. Average wait time under the Alternative with Personal Cutback Wait method
57 Figure 4.4.1.2. Density plot showed the distribution of empirical RGray analysis time under the Alternative with Personal Cutback Wait method
58 Figure 4.4.1.3. Average wait time under the Null with Personal Cutback Wait method
59 Figure 4.4.1.4. Density plot showed the distribution of empirical RGray analysis time under the Null with Personal Cutback Wait method
60 Figure 4.4.2.1. Log hazard ratio at RGray analysis time with Personal Cutback Wait method under the Alternative
61 Figure 4.4.2.2. Log hazard ratio at RGray analysis time with Personal Cutback Wait method under the Null
62 Figure 4.4.3.1. Box plot showed the distribution of log HR under the Alternative at RGray analysis time with Personal Cutback Wait method
63 Figure 4.4.3.2. Density plot showed the distribution of log hazard ratio under Alternative at RGray analysis time with Personal Cutback Wait method
64 Figure 4.4.3.3. Box plot showed the distribution of log HR under the Null at RGray analysis time with Personal Cutback Wait method
65 Figure 4.4.3.4. Density plot showed the distribution of log hazard ratio under the Null at RGray analysis time with Personal Cutback Wait method
66 Figure 4.4.4.1. Proportion of trials that were stopped for futility with Personal Cutback Wait method under the Alternative
67 Figure 4.4.4.2. Proportion of trials that were stopped for futility with Personal Cutback Wait method under the Null
68 Figure 4.4.5. Loss of power of RGray analysis with Personal Cutback Wait method
69 Figure 4.5.1.1. Log hazard ratio under the Alternative with Standard data processing method
70 Figure 4.5.1.2. Log hazard ratio under the Null with Standard data processing method
71 Figure 4.5.2.1. Proportion of trials that were stopped for futility with RCI method under the Alternative with Standard data processing method
72 Figure 4.5.2.2. Proportion of trials that are stopped for futility with RCI method under the Null with Standard data processing method
73 Figure 4.5.3.1. Distribution of Lower Repeated CI under the Alternative with Standard data processing method
74 Figure 4.5.3.2. Distribution of Lower Repeated CI under the Null with Standard data processing method
75 Figure 4.5.4. Loss of power of RCI method with Standard data processing method
76 Figure 4.6.1.1. Log hazard ratio under the Alternative with Personal Cutback method
77 Figure 4.6.1.2. Log hazard ratio under the Null with Personal Cutback method
78 Figure 4.6.2.1. Proportion of trials that were stopped for futility with RCI method under the Alternative with Personal Cutback method
79 Figure 4.6.2.2. Proportion of trials that were stopped for futility with RCI method under the Null with Personal Cutback method
80 Figure 4.6.3.1. Distribution of Lower Repeated CI under the Alternative with Personal Cutback method
81 Figure 4.6.3.2. Distribution of Lower Repeated CI under the Null with Personal Cutback method
82 Figure 4.6.4. Loss of power of RCI method with Personal Cutback method
83 Figure 4.7.1.1. Log hazard ratio under the Alternative with Personal Cutback Wait method
84 Figure 4.7.1.2. Log hazard ratio under the null with Personal Cutback Wait method
85 Figure 4.7.2.1. Proportion of trials that were stopped for futility with RCI method under the Alternative with Personal Cutback Wait method
86 Figure 4.7.2.2. Proportion of trials that are stopped for futility with RCI method under the Null with Personal Cutback Wait method
87 Figure 4.7.3.1. Distribution of Lower Repeated CI under the Alternative with Personal Cutback Wait method
88 Figure 4.7.3.2. Distribution of Lower Repeated CI under the Null with Personal Cutback Wait method
89 Figure 4.7.4. Loss of power of RCI method with Personal Cutback Wait method
90 Figure 4.8.1. Weibull distribution
91 Figure 4.8.2. Baseline hazard rate changes over time since time of enrollment
92 Figure 4.8.3. Comparison of RGray analysis time for Standard/Personal Cutback method vs. Personal Cutback Wait method
93 Figure 4.8.4. Density plot of RGray analysis time for Standard/Personal Cutback method vs. Personal Cutback Wait method
94 Figure 4.8.5. The extra waiting time for Personal Cutback Wait method
95 Figure 4.8.6. Scatter plot of log hazard ratio with different data processing method
96 Figure 4.8.7. Density plot of log hazard ratio with different data processing method
97 Figure 4.8.8. Proportion of trials stopped for futility with different data processing methods
Abstract
In randomized clinical trials, the current standard data collection method censors event-free
patients at their last visit time before the analysis time, while events can be ascertained
whenever they occur. The resulting violation of independent censoring between the
last visit time and the analysis time introduces bias into the estimate of the treatment effect.
Delayed events reporting makes this bias worse. How delayed events reporting
affects the interim futility monitoring of Phase III clinical trials remains unknown. In this
study, we evaluate the performance of two commonly used futility monitoring methods, the
RGray method and the Repeated Confidence Interval (RCI) method, in the presence of delayed
events reporting, under both the null and the alternative hypotheses. Three
data processing methods, the standard method, the personal cutback method and the proposed
personal cutback wait method, are extensively explored and compared under each futility
monitoring method. The results suggest that delayed events reporting will not affect studies with
a low index of non-independent censoring (INIC). It will affect interim futility monitoring for
studies with high INIC, in which case the personal cutback wait method can be employed to solve
the problem.
Chapter 1: Introduction
1.1 Clinical trials and Interim analysis
Phase III randomized clinical trials (RCTs) are designed primarily to evaluate the
relative efficacy of a new therapy compared with the currently accepted standard therapy. This
large-scale testing provides a thorough understanding of the effectiveness of the new therapy, its
benefits, and the range of possible adverse events. Trials of this phase may last for several years.
Compelling evidence is needed to determine whether the new therapy is superior to, or
offers no advantage over, the control therapy. To balance the need to collect
sufficient evidence against the need to provide patients with the best available treatment, RCT designs
should incorporate comparative analyses of treatment outcomes across regimens at times prior
to the planned final analysis for efficacy. At each such time, identified as an interim analysis,
the trial can be stopped for either efficacy or futility. Otherwise, the trial continues to
the next interim analysis or to the end (Pocock and Geller, 1986).
Futility monitoring is an important component of interim analysis, especially for
life-threatening diseases. It allows a trial to stop early if the new therapy is unlikely to prove superior
to the current standard therapy were the trial to continue to the end. It can thus minimize
patient exposure to therapies that do not appear to be more beneficial than the current standard
treatment (Korn and Freidlin, 2018). The analysis also allows patient and
financial resources to be reallocated to other investigations when there is strong evidence that the new therapy will
not advance treatment of the disease under study.
Several methods have been developed for futility monitoring. Among those, three general
methods are most commonly used for oncology RCTs: sequential testing of the alternative
hypothesis, repeated confidence intervals, and conditional power. It has been noted that some
of the commonly used futility rules are too conservative at the beginning and in the middle of the
trial, and thus would not allow the timely stopping of harmful trials (Freidlin and Korn, 2009;
Anderson and High, 2011). On the other hand, some futility rules are too aggressive at the end
of the trial, and thus would recommend stopping for futility even when the experimental arm has
some tangible clinical benefit (Freidlin and Korn, 2002, 2009). Tanaka et al. (2012) reported
that, in a review of 72 oncology non-inferiority trials published in 2000-2010, only 36% had a
planned interim analysis. For trials that included interim monitoring
plans, the specified plans may frequently be suboptimal in terms of protecting patients from
inferior experimental therapies.
1.2 Survival data collection and Delayed events reporting
The analysis of clinical trials is focused on measuring time to event or outcome. The event of
interest may be relapse or progression of disease, secondary malignancy, high-grade
toxicity, attainment of a biomarker, or death. Censoring occurs when information on time to
event is not available for some patients due to loss to follow-up or non-occurrence of the event
before the end of study, when the primary analysis of the study is conducted.
The most commonly used survival analysis techniques include the non-parametric Kaplan-Meier
product limit method (Kaplan and Meier, 1958), parametric Weibull and exponential
methods (Ebrahim, 2007), and the semi-parametric Cox proportional hazards method (Cox, 1972).
All these methods require the independent censoring assumption, which implies that censoring
is independent of the risk of events (Prinja, Gupta and Verma, 2010). The censored individuals
share the same risk of events as uncensored individuals; however, the censoring prevents
the researcher from recording the precise time when the event of interest occurs. One type of
independent censoring is known as Type I censoring, which stops observation for all patients
at a fixed time. Another type of independent censoring is Type II censoring, which ends the trial when
a specific number of events have been observed (Leung, Elashoff and Afifi, 1997).
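The two censoring schemes can be illustrated with a minimal sketch. It assumes, purely for illustration, that all patients enter at time zero; the function names and event times are hypothetical and are not part of the original study.

```python
def type_i_censor(event_times, cutoff):
    """Type I censoring: observation stops for every patient at a fixed time.

    Returns (observed_time, event_observed) pairs.
    """
    return [(min(t, cutoff), t <= cutoff) for t in event_times]


def type_ii_censor(event_times, n_events):
    """Type II censoring: observation stops once n_events events have occurred."""
    cutoff = sorted(event_times)[n_events - 1]  # time of the n_events-th event
    return [(min(t, cutoff), t <= cutoff) for t in event_times]


# Hypothetical event times in months, all measured from a common start.
times = [2.0, 5.5, 1.2, 9.0, 7.3]
print(type_i_censor(times, cutoff=6.0))
print(type_ii_censor(times, n_events=3))
```

In both schemes the stopping time does not depend on any individual patient's risk, which is why they are regarded as independent censoring.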
Figure 1. Lexis Diagram for an illustrative clinical trial
In an ideal trial, the event status of every participant is perfectly ascertained and every
participant's status is current at the time of analysis. Patients who are lost to follow-up before
the time of analysis are censored at their last visit time, while non-drop-out patients without
events are censored at the time of analysis (Figure 1). In reality, it is not possible to obtain
perfect reporting of event status for every participant, because resources are limited and
because unnecessarily repeated medical testing might cause adverse effects. The routine
standard method of data collection is to evaluate patient status at scheduled clinic or phone visits
at regular intervals as specified in the study protocol. Events of interest, on the other hand,
are required to be reported as soon as possible. This situation is illustrated in Figure
2A. Participants without an event of interest are censored at their last visit and do not contribute
survival time afterwards (patient 1), while patients with events contribute time to event to the
analysis (patients 2, 3 and 4). If participants enter the study late and do not have a scheduled
visit before the time of analysis, patients without an event are not included in the analysis
(patient 7), while patients with an event are included (patients 5 and 6).
Figure 2. Diagram of standard data collection method (A) and delayed events reporting (B)
There are several concerns about this standard method. First, it could lead to biased
estimates of the true survivor function. If most patients enter the study simultaneously, the last
visit times of the patients can be quite close. A large number of “well” patients are censored
at their last visit time, which results in a relatively low number of patients at risk after this
timepoint. Only events of interest are reported after that point, so no one is counted as surviving
beyond it, which causes sudden large drops in the survival curves. Second, the censoring is no
longer independent between the time of last visit and the time of analysis. During this time frame,
patients who do not have events have no chance to be reported and are always censored.
In contrast, patients who have events of interest have a chance to be reported and not
censored. Lastly, this method counts the correct number of events, but underestimates the total
survival time by ignoring the time at risk of censored patients between their last visit time
and the analysis time, which leads to overestimation of the underlying hazard rate.
The presence of delayed events reporting could make this bias even worse. If all failures
are reported without delay, we can safely assume that all non-drop-out, censored patients
do not experience events before the time of analysis, and thus they can contribute survival time
up to the time of analysis. An unbiased hazard ratio estimator can then be obtained. This
is the pull-forward method as defined in Dr. McIlvaine's work (McIlvaine, 2015). However,
this is not always the case in reality. A patient who experienced an event of interest may not
be reported to the clinic in a timely manner for reasons unrelated to the risk of the event. Figure
2B shows how the event status could be affected by delayed events reporting. Patients who
experienced an event after the time of last visit and failed to report the event in a timely fashion
would be censored at their last visit time (patient 4). Patients who enrolled too late to have
a scheduled visit before the time of analysis and failed to report an event immediately would not be
included in the analysis (patient 6). Late reporting of events causes an unpredictable bias in the
estimator of survival functions, which could lead to flawed conclusions in futility monitoring.
The information collected between the last visit and the time of analysis is the major source of bias in
the survival function estimator, and thus needs to be treated cautiously.
Chapter 2: Literature review
2.1 Futility monitoring
2.1.1 Group sequential test
A group sequential test monitors a statistic that summarizes the difference in primary response
between the two treatment groups at a series of times during the trial (Gordon Lan and Demets,
1983; Jennison and Turnbull, 1999). It provides the possibility to stop the trial early for either
efficacy or futility. The trial is stopped for futility if the alternative hypothesis is rejected by
the group sequential test. To avoid substantial inflation of the Type II error, the nominal
significance levels at the interim monitoring timepoints are determined according to a pre-specified
error spending function, such as the Pocock-type error spending method (Pocock, 1977), the
O'Brien-Fleming error spending method (O'Brien and Fleming, 1979), the Hwang-Shih-DeCani gamma
family of spending functions (Hwang, Shih and De Cani, 1990), the Wang-Tsiatis power family of
spending functions (Wang and Tsiatis, 1987), etc. Alternatively, a small constant nominal significance
level can be used.
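Numerous methods have been developed to distribute the error across the stages and then derive
the interim monitoring boundary. Several commonly used methods are:
\[
\alpha_k =
\begin{cases}
\alpha \times \log\bigl(1 + (e - 1)t\bigr) & \text{Pocock's method} \\
2\bigl(1 - \Phi\bigl(Z_{1-\alpha/2}/\sqrt{t}\bigr)\bigr) & \text{O'Brien-Fleming method} \\
\alpha \times \dfrac{1 - e^{-\gamma t}}{1 - e^{-\gamma}} \ \text{if } \gamma \neq 0, \quad \alpha t \ \text{if } \gamma = 0 & \text{Gamma method} \\
\alpha \times t^{\rho}, \ \text{where } \rho \text{ is the power parameter} & \text{Power family method}
\end{cases}
\]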
where $t$ denotes the information fraction at the interim analysis time. Extensive work has shown
that the fractional information at an interim analysis time can be estimated as a fraction of the
maximum number of failures (Lan and Zucker, 1993; Lan, Reboussin and DeMets, 1994; Kim,
Boucher and Tsiatis, 1995), which can be denoted as
\[
t_k = \frac{I_k}{I_{max}} \approx \frac{D_k}{D_{max}}, \quad 0 < t_k < 1
\]
where $I_{max}$ is the full information, $I_k$ is the information at the $k^{th}$ interim analysis, $D_k$ is the
total number of observed events in both groups at the $k^{th}$ interim analysis, and $D_{max}$ is the
total expected number of events at the end of study under the alternative (Kim and Demets, 1987; DeMets
and Lan, 1995; Kim et al., 1995).
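The four spending functions listed above can be evaluated with a short sketch. This is only an illustration using the Python standard library; the function names are hypothetical, and the values of $\alpha$, $\gamma$ and $\rho$ in the usage lines are arbitrary choices, not values used in this study.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal, mean 0 and sd 1


def pocock_spend(alpha, t):
    """Pocock-type spending: alpha * log(1 + (e - 1) t)."""
    return alpha * math.log(1 + (math.e - 1) * t)


def obrien_fleming_spend(alpha, t):
    """O'Brien-Fleming-type spending: 2 (1 - Phi(Z_{1-alpha/2} / sqrt(t)))."""
    z = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    return 2 * (1 - _STD_NORMAL.cdf(z / math.sqrt(t)))


def gamma_spend(alpha, t, gamma):
    """Hwang-Shih-DeCani gamma family; linear spending when gamma == 0."""
    if gamma == 0:
        return alpha * t
    return alpha * (1 - math.exp(-gamma * t)) / (1 - math.exp(-gamma))


def power_spend(alpha, t, rho):
    """Power family: alpha * t ** rho."""
    return alpha * t ** rho


def information_fraction(d_k, d_max):
    """Approximate information fraction as observed / maximum expected events."""
    return d_k / d_max


t = information_fraction(d_k=150, d_max=300)   # hypothetical event counts
print(pocock_spend(0.05, t), obrien_fleming_spend(0.05, t))
print(gamma_spend(0.05, t, gamma=-4), power_spend(0.05, t, rho=3))
```

Each function returns the full error $\alpha$ at $t = 1$; they differ only in how quickly the error is spent at earlier information fractions.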
Let $I_k$ be the information level at the $k^{th}$ interim analysis, and let $Z_k$ be the standardized statistic
at the $k^{th}$ interim analysis. The statistics $\{Z_1, \cdots, Z_K\}$ have the canonical joint distribution
with information levels $\{I_1, \cdots, I_K\}$ for the parameter $\theta$ if
(i) $(Z_1, \cdots, Z_K)$ is multivariate normal,
(ii) $E(Z_k) = \theta\sqrt{I_k}$, $k = 1, \cdots, K$, and
(iii) $Cov(Z_{k_1}, Z_{k_2}) = \sqrt{I_{k_1}/I_{k_2}}$, $1 \le k_1 \le k_2 \le K$.
The above defines the group sequential test in terms of the standardized statistics $\{Z_1, \cdots, Z_K\}$.
Other commonly used boundary scales include the MLE, the score statistic and the p-value.
Let $\hat{\theta}_k = Z_k/\sqrt{I_k}$ be the MLE and $S_k = Z_k\sqrt{I_k}$ be the score statistic. Each of the
sequences $\{Z_1, \cdots, Z_K\}$, $\{\hat{\theta}_1, \cdots, \hat{\theta}_K\}$, $\{S_1, \cdots, S_K\}$ is multivariate normal, with
\[
\begin{aligned}
Z_k &\sim \mathcal{N}(\theta\sqrt{I_k}, 1) \quad \text{and} \quad Cov(Z_{k_1}, Z_{k_2}) = \sqrt{I_{k_1}/I_{k_2}}, \\
\hat{\theta}_k &\sim \mathcal{N}(\theta, I_k^{-1}) \quad \text{and} \quad Cov(\hat{\theta}_{k_1}, \hat{\theta}_{k_2}) = I_{k_2}^{-1}, \\
S_k &\sim \mathcal{N}(\theta I_k, I_k) \quad \text{and} \quad Cov(S_{k_1}, S_{k_2}) = I_{k_1}
\end{aligned}
\]
for $k = 1, \cdots, K$ and $1 \le k_1 \le k_2 \le K$.
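The relationships between the three boundary scales can be sketched as follows. The function names are illustrative only, and the numeric inputs are arbitrary.

```python
import math


def to_mle(z_k, info_k):
    """MLE scale: theta_hat_k = Z_k / sqrt(I_k)."""
    return z_k / math.sqrt(info_k)


def to_score(z_k, info_k):
    """Score scale: S_k = Z_k * sqrt(I_k)."""
    return z_k * math.sqrt(info_k)


def z_covariance(info_k1, info_k2):
    """Cov(Z_k1, Z_k2) = sqrt(I_k1 / I_k2), with the earlier analysis in the numerator."""
    earlier, later = sorted((info_k1, info_k2))
    return math.sqrt(earlier / later)


print(to_mle(2.0, 16.0), to_score(2.0, 16.0), z_covariance(25.0, 100.0))
```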
Let $K$ be the total number of analyses, let $Z_k$ be the test statistic at the $k^{th}$ interim analysis, and let a
pair of constants $(C_{ak}, C_{bk})$ be the critical values at the $k^{th}$ interim analysis, with
$0 \le C_{bk} < C_{ak}$ for $k = 1, \cdots, K - 1$ and $C_{aK} = C_{bK}$. The overall Type I error probability is given by
\[
\alpha = \sum_{k=1}^{K} \alpha_k
\]
where $\alpha_k$ is the $\alpha$ spending at stage $k$. That is, at stage 1,
\[
\alpha_1 = P_{\theta_0}(|Z_1| \ge C_{a1})
\]
At the subsequent stage $k$,
\[
\alpha_k = P_{\theta_0}(C_{bj} \le |Z_j| < C_{aj},\ j = 1, 2, \cdots, k - 1,\ |Z_k| \ge C_{ak})
\]
The overall Type II error probability $\beta$ is given by
\[
\beta = \sum_{k=1}^{K} \beta_k
\]
where $\beta_k$ is the $\beta$ spending at stage $k$. That is, at stage 1,
\[
\beta_1 = P_{\theta_A}(|Z_1| < C_{b1})
\]
At the subsequent stage $k$,
\[
\beta_k = P_{\theta_A}(C_{bj} \le |Z_j| < C_{aj},\ j = 1, 2, \cdots, k - 1,\ |Z_k| < C_{bk})
\]
For a $K$-stage design with a lower alternative hypothesis and with early stopping to accept $H_0$
only, the interim lower $\alpha$ critical values are set to $-\infty$: $\alpha_k = -\infty$, $k = 1, 2, \cdots, K - 1$, and
$\alpha_K = \alpha$.
The stopping rule for the group sequential test is:
At stage $k = 1, 2, \cdots, K - 1$,
\[
\begin{cases}
\text{if } |Z_k| \ge C_{ak} & \text{stop, reject } H_0 \\
\text{if } |Z_k| < C_{bk} & \text{stop, accept } H_0 \\
\text{otherwise} & \text{continue to group } k + 1
\end{cases}
\]
At the last stage $K$,
\[
\begin{cases}
\text{if } |Z_K| \ge C_{aK} & \text{stop, reject } H_0 \\
\text{otherwise} & \text{stop, accept } H_0.
\end{cases}
\]
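The stopping rule above can be written as a small decision function. This is a sketch only; the function name, the returned labels and the critical values in the usage lines are hypothetical.

```python
def group_sequential_decision(z_k, c_a, c_b, is_final=False):
    """Apply the two-sided stopping rule at one analysis.

    z_k      : standardized test statistic at this analysis
    c_a, c_b : efficacy and futility critical values, with 0 <= c_b < c_a
               at interim stages and c_a == c_b at the final stage
    """
    if abs(z_k) >= c_a:
        return "stop, reject H0"
    if is_final or abs(z_k) < c_b:
        return "stop, accept H0"
    return "continue"


# Hypothetical critical values for one interim look and the final look.
print(group_sequential_decision(0.3, c_a=2.8, c_b=0.5))
print(group_sequential_decision(1.5, c_a=2.8, c_b=0.5))
print(group_sequential_decision(1.5, c_a=2.0, c_b=2.0, is_final=True))
```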
2.1.2 Repeated Confidence Interval
Repeated confidence intervals (RCIs) for a parameter $\theta$ are defined as a sequence of intervals
$I_k$, $k = 1, \cdots, K$, for which a simultaneous coverage probability is maintained at level $1 - \alpha$
(Jennison and Turnbull, 1984, 1989). The construction of RCIs depends on the choice of group
sequential test and the associated critical values $C_{a1}, \cdots, C_{aK}$. The defining property of a
$(1 - \alpha)$-level sequence of RCIs for $\theta$ is
\[
P_{\theta}\{\theta \in S_k \text{ for all } k = 1, \cdots, K\} = 1 - \alpha \quad \text{for all } \theta,
\]
where $S_k = \{\theta_0 : |Z_k(\theta_0)| < C_{ak}\}$ and the $C_{ak}$ are critical values appropriate to the particular form
of test at the $k^{th}$ interim analysis. The RCI at stage $k$ is
\[
\bigl(\{Z_k - C_{ak}\}/\sqrt{I_k},\ \{Z_k + C_{ak}\}/\sqrt{I_k}\bigr), \quad k = 1, \cdots, K \tag{2.1.2.1}
\]
Or, writing $\hat{\theta}_k = Z_k/\sqrt{I_k}$ for the maximum likelihood estimate of $\theta$ at analysis $k$,
\[
\bigl(\hat{\theta}_k - C_{ak}/\sqrt{I_k},\ \hat{\theta}_k + C_{ak}/\sqrt{I_k}\bigr), \quad k = 1, \cdots, K \tag{2.1.2.2}
\]
To allow early stopping for futility, suppose we desire a group sequential test with Type I error
probability $\alpha$ and power $1 - \beta$ at $\theta = \pm\theta_A$. Let $\theta_{Al}$ be the lower alternative and $\theta_{Au}$ be the upper
alternative. There are two ways to proceed.
In the first approach, form a $(1 - \alpha)$-level sequence of RCIs, $\{(\theta_{kl}, \theta_{ku});\ k = 1, \cdots, K\}$.
The formal stopping rule is:
At stage $k = 1, 2, \cdots, K - 1$,
\[
\begin{cases}
\text{if } \theta_{kl} > \theta_{Au} \text{ or } \theta_{ku} < \theta_{Al} & \text{stop, reject } H_0 \\
\text{if } (\theta_{kl}, \theta_{ku}) \subset [\theta_{Al}, \theta_{Au}] & \text{stop, accept } H_0 \\
\text{otherwise} & \text{continue to group } k + 1
\end{cases}
\]
At the last stage $K$,
\[
\begin{cases}
\text{if } \theta_{Kl} > \theta_{Au} \text{ or } \theta_{Ku} < \theta_{Al} & \text{stop, reject } H_0 \\
\text{otherwise} & \text{stop, accept } H_0.
\end{cases}
\]
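As an illustration of the first approach, the RCI of equation (2.1.2.2) and the associated stopping rule might be coded as below. The function names and the numeric inputs are hypothetical and serve only to show the mechanics.

```python
import math


def repeated_ci(theta_hat, info_k, c_a):
    """RCI at one analysis: theta_hat -/+ C_ak / sqrt(I_k)."""
    half_width = c_a / math.sqrt(info_k)
    return theta_hat - half_width, theta_hat + half_width


def rci_decision(ci, theta_al, theta_au, is_final=False):
    """Stopping rule of the first RCI approach described above."""
    lower, upper = ci
    if lower > theta_au or upper < theta_al:
        return "stop, reject H0"
    # The open interval (lower, upper) lies inside [theta_al, theta_au].
    if is_final or (theta_al <= lower and upper <= theta_au):
        return "stop, accept H0"
    return "continue"


ci = repeated_ci(theta_hat=0.0, info_k=100.0, c_a=2.0)   # hypothetical inputs
print(ci, rci_decision(ci, theta_al=-0.405, theta_au=0.405))
```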
The second approach involves two sequences of RCIs for $\theta$: one of level $(1 - \alpha)$ and the other
of level $(1 - 2\beta)$. At each stage, $H_0$ is rejected if the $(1 - \alpha)$ RCI does not contain $\theta_0$, and $H_0$
is accepted if the current $(1 - 2\beta)$ RCI lies entirely within the interval $(\theta_{Al}, \theta_{Au})$. The power
requirement is satisfied as shown below:
\[
\begin{aligned}
P_{\theta_{Au}}\{\text{Accept } H_0\} &\le P_{\theta_{Au}}\{\theta_{kl}(\beta) > \theta_{Al} \text{ and } \theta_{ku} < \theta_{Au} \text{ for some } k\} \\
&\le P_{\theta_{Au}}\{\theta_{ku} < \theta_{Au} \text{ for some } k\} \\
&= 2\beta/2 = \beta
\end{aligned}
\]
A similar argument establishes the power under $\theta = \theta_{Al}$. The acceptance RCI at stage $k$ is
\[
\bigl(\hat{\theta}_k + \theta_{Al} + C_{bk}/\sqrt{I_k},\ \hat{\theta}_k + \theta_{Au} - C_{bk}/\sqrt{I_k}\bigr) \tag{2.1.2.3}
\]
2.1.3 Conditional Power
The conditional power approach was introduced by Lan, Simon & Halperin in 1982 (Lan, Simon
and Halperin, 1982). Let $\mathcal{T}$ be the group sequential test at interim stage $k$ and let $D(k)$ denote the
data accumulated so far. The conditional power at stage $k$ is defined as
\[
p_k(\theta) = Pr_{\theta}\{\mathcal{T} \text{ will reject } H_0 \mid D(k)\}
\]
They suggested stopping the trial to accept the null if, given the observed data, $p_k(\theta_1) < \gamma$,
which means the conditional probability of rejecting $H_0$ under the alternative is less than a
pre-specified threshold $\gamma$. The value of $\gamma$ should be between 0 and 0.5. The quantity $1 - p_k(\theta_1)$
was termed the futility index (Ware, Muller and Braunwald, 1985). It is the probability of
accepting $H_0$ under the alternative given the current data. A high futility index indicates a high
probability of declaring futility given the current data.
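A minimal sketch of a conditional power calculation is given below. It is not taken from the references above: it assumes a single one-sided final test that rejects $H_0$ when the final standardized statistic exceeds `z_crit`, and it uses the common Brownian-motion (B-value) approximation in which `drift` is the expected value of the final standardized statistic under the assumed alternative. All names and inputs are illustrative.

```python
import math
from statistics import NormalDist


def conditional_power(z_k, t, drift, z_crit):
    """Probability of rejecting H0 at the final analysis given the interim data.

    z_k    : standardized statistic at the interim analysis
    t      : information fraction at the interim analysis, 0 < t < 1
    drift  : E[Z(1)] under the assumed parameter value (an assumption of this sketch)
    z_crit : final one-sided critical value
    """
    b_t = z_k * math.sqrt(t)                 # B-value at information fraction t
    mean_final = b_t + drift * (1 - t)       # conditional mean of Z(1)
    sd_final = math.sqrt(1 - t)              # conditional sd of Z(1)
    return 1 - NormalDist().cdf((z_crit - mean_final) / sd_final)


def futility_index(z_k, t, drift, z_crit):
    """Futility index: one minus the conditional power under the alternative."""
    return 1 - conditional_power(z_k, t, drift, z_crit)


# Hypothetical interim look at half the information with no observed effect.
print(conditional_power(z_k=0.0, t=0.5, drift=3.24, z_crit=1.96))
```

Under these assumptions, the trial would be stopped for futility when the returned conditional power falls below the chosen threshold $\gamma$.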
2.1.4 Freidlin Korn Gray method
Freidlin et al. (Freidlin, Korn and Gray, 2010) suggested beginning monitoring at the
earliest time-point at which $\hat{\theta} > 0$ implies that the two-sided 95% confidence interval for $\theta$
would not contain the alternative hypothesis used to design the trial ($\theta_A$). This methodology
will be referred to as the “RGray method” in the remainder of this report. The earliest point $t_0$ is given
by
\[
t_0 = \left(\frac{Z_{1-\gamma/2}}{Z_{1-\alpha/2} + Z_{1-\beta}}\right)^2 \tag{2.1.4.1}
\]
where $Z$ denotes the quantile of a standard normal distribution, $\alpha$ denotes the Type I error rate,
$\beta$ denotes the Type II error rate, and $(1 - \gamma)$ denotes the pre-specified confidence level of the confidence interval.
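As a numerical illustration of the earliest monitoring time $t_0$, the sketch below evaluates the formula for a two-sided design. The function name is hypothetical and the design parameters in the usage lines are arbitrary examples, not the settings of any particular trial.

```python
from statistics import NormalDist


def earliest_monitoring_fraction(alpha, beta, gamma):
    """t0 = (Z_{1-gamma/2} / (Z_{1-alpha/2} + Z_{1-beta}))^2 for a two-sided design."""
    q = NormalDist().inv_cdf  # standard normal quantile function
    return (q(1 - gamma / 2) / (q(1 - alpha / 2) + q(1 - beta))) ** 2


# Two-sided alpha of 0.05, 95% confidence interval, and two illustrative power levels.
print(earliest_monitoring_fraction(alpha=0.05, beta=0.10, gamma=0.05))  # about 0.37
print(earliest_monitoring_fraction(alpha=0.05, beta=0.20, gamma=0.05))  # about 0.49
```

Higher design power moves the earliest monitoring point to a smaller information fraction.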
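A detailed derivation of the formula is given below.
Consider a trial designed to detect the two-sided alternative $H_1: \theta = \theta_A$, $\theta_A \ne 0$, versus the
null hypothesis $H_0: \theta = \theta_0$, $\theta_0 = 0$, with a given power $(1 - \beta)$ at a two-sided significance
level $\alpha$. The maximum likelihood estimator of the log hazard ratio follows a normal distribution,
$\hat{\theta} \sim \mathcal{N}(\theta, \frac{1}{I})$, where $I$ denotes the Fisher information. The Type I error probability is calculated
as
\[
P_{\theta=\theta_0}\left(\left|\frac{\hat{\theta} - \theta}{\sqrt{1/I}}\right| > Z_{1-\alpha/2}\right) = \alpha
\ \Rightarrow\ P\left(\left|\hat{\theta}\sqrt{I}\right| > Z_{1-\alpha/2}\right) = \alpha
\]
The Type II error is calculated as
\[
\begin{aligned}
& P_{\theta=\theta_A}\left(\left|\hat{\theta}\sqrt{I}\right| < Z_{1-\alpha/2}\right) = \beta \\
\Rightarrow\ & P_{\theta=\theta_A}\left(-Z_{1-\alpha/2} < \hat{\theta}\sqrt{I} < Z_{1-\alpha/2}\right) = \beta \\
\Rightarrow\ & P\left(-Z_{1-\alpha/2} - \theta_A\sqrt{I} < \hat{\theta}\sqrt{I} - \theta_A\sqrt{I} < Z_{1-\alpha/2} - \theta_A\sqrt{I}\right) = \beta \\
\Rightarrow\ & \beta = \Phi\left(Z_{1-\alpha/2} - \theta_A\sqrt{I}\right) - \Phi\left(-Z_{1-\alpha/2} - \theta_A\sqrt{I}\right) \\
& \phantom{\beta} \approx \Phi\left(Z_{1-\alpha/2} - \sqrt{I}\theta_A\right) \quad (\text{the second term is very small and can be ignored if } \theta_A > 0) \\
\Rightarrow\ & Z_{1-\alpha/2} - \theta_A\sqrt{I} = Z_{\beta} \\
\Rightarrow\ & \theta_A\sqrt{I} = Z_{1-\alpha/2} - Z_{\beta} = Z_{1-\alpha/2} + Z_{1-\beta}
\end{aligned}
\]
where $\Phi$ denotes the standard normal distribution function. Similarly, if $\theta_A < 0$, $\theta_A\sqrt{I} = -(Z_{1-\alpha/2} + Z_{1-\beta})$.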
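The $100(1 - \gamma)\%$ CI for the log hazard ratio is
\[
\left[\hat{\theta} - Z_{1-\gamma/2}/\sqrt{tI},\ \hat{\theta} + Z_{1-\gamma/2}/\sqrt{tI}\right]
\]
Let $\hat{\theta} = 0$. If the confidence interval does not contain $\theta_A$, it yields
\[
Z_{1-\gamma/2}/\sqrt{tI} < \theta_A
\ \Rightarrow\ \sqrt{t} > \frac{Z_{1-\gamma/2}}{\theta_A\sqrt{I}}
\ \Rightarrow\ t > \left(\frac{Z_{1-\gamma/2}}{Z_{1-\alpha/2} + Z_{1-\beta}}\right)^2
\]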
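Similarly, for a trial designed to detect the one-sided upper alternative $H_1: \theta = \theta_A$, $\theta_A > 0$,
versus the null hypothesis $H_0: \theta = \theta_0$, $\theta_0 = 0$, with a given power $(1 - \beta)$ at a one-sided
significance level $\alpha$, the Type I error rate is
\[
P_{\theta=\theta_0}\left(\frac{\hat{\theta} - \theta}{\sqrt{1/I}} > Z_{1-\alpha}\right) = \alpha
\ \Rightarrow\ P\left(\hat{\theta}\sqrt{I} > Z_{1-\alpha}\right) = \alpha
\]
The Type II error rate is
\[
\begin{aligned}
& P_{\theta=\theta_A}\left(\hat{\theta}\sqrt{I} < Z_{1-\alpha}\right) = \beta \\
\Rightarrow\ & P\left(\hat{\theta}\sqrt{I} - \theta_A\sqrt{I} < Z_{1-\alpha} - \theta_A\sqrt{I}\right) = \beta \\
\Rightarrow\ & \beta = \Phi\left(Z_{1-\alpha} - \theta_A\sqrt{I}\right) \\
\Rightarrow\ & Z_{1-\alpha} - \theta_A\sqrt{I} = Z_{\beta} \\
\Rightarrow\ & \theta_A\sqrt{I} = Z_{1-\alpha} - Z_{\beta} = Z_{1-\alpha} + Z_{1-\beta}
\end{aligned}
\]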
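The one-sided $100(1 - \gamma)\%$ CI for the log hazard ratio is
\[
\left(-\infty,\ \hat{\theta} + Z_{1-\gamma}/\sqrt{tI}\right]
\]
Let $\hat{\theta} = 0$. If the confidence interval does not contain $\theta_A$, it yields
\[
Z_{1-\gamma}/\sqrt{tI} < \theta_A
\ \Rightarrow\ \sqrt{t} > \frac{Z_{1-\gamma}}{\theta_A\sqrt{I}}
\ \Rightarrow\ t > \left(\frac{Z_{1-\gamma}}{Z_{1-\alpha} + Z_{1-\beta}}\right)^2
\]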
Figure 3. Diagram of LIB20 boundary
To avoid the potential problem of stopping the trial for inefficacy when the experimental arm
is doing moderately better, they propose the following. The inefficacy monitoring boundary
starts with the cut-off $\hat{\theta} > 0$ at information time $t_0$ (as defined above), and gradually allows
the cut-off to increase to f% of the target benefit, subject to the requirement that a two-sided
95% confidence interval for the treatment effect observed at a cut-off excludes the alternative
hypothesis. A boundary (on a log hazard ratio scale) can be constructed by taking f to be 20%
and connecting the points $(t_0, 0)$ and $(1, 0.20 \times \theta_A)$ by a straight line (Fig 3). Then for a given
information fraction $t$ in the interval $[t_0, 1]$, the corresponding cut-off value is:
\[
\begin{aligned}
0.20 \times \theta_A \frac{t - t_0}{1 - t_0}
&= 0.20 \times \theta_A \frac{t - \left(\frac{Z_{1-\gamma/2}}{Z_{1-\alpha/2} + Z_{1-\beta}}\right)^2}{1 - \left(\frac{Z_{1-\gamma/2}}{Z_{1-\alpha/2} + Z_{1-\beta}}\right)^2} \\
&= 0.20 \times \theta_A \frac{(Z_{1-\alpha/2} + Z_{1-\beta})^2 t - Z^2_{1-\gamma/2}}{(Z_{1-\alpha/2} + Z_{1-\beta})^2 - Z^2_{1-\gamma/2}}
\end{aligned}
\tag{2.1.4.2}
\]
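The linear boundary in equation (2.1.4.2) can be evaluated with a short sketch. The function name and the numeric inputs are illustrative assumptions; `t0` would normally be obtained from equation (2.1.4.1).

```python
def lib_cutoff(t, t0, theta_a, f=0.20):
    """Cut-off on the log hazard ratio scale at information fraction t.

    The boundary is the straight line joining (t0, 0) and (1, f * theta_a).
    """
    if not t0 <= t <= 1:
        raise ValueError("t must lie in [t0, 1]")
    return f * theta_a * (t - t0) / (1 - t0)


# Hypothetical design: monitoring starts at t0 = 0.4 with target log hazard ratio 0.405.
for t in (0.4, 0.7, 1.0):
    print(t, lib_cutoff(t, t0=0.4, theta_a=0.405))
```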
2.2 Modified survival estimator for delayed events reporting
2.2.1 Hu and Tsiatis
The Kaplan-Meier estimator is the standard method for estimating survival distributions for
right-censored data. It requires the censoring process to be non-informative to achieve a consistent
estimator. When events are reported with error, informative censoring is induced
and the Kaplan-Meier estimator will be biased. Hu and Tsiatis studied the extent of this bias and
proposed another estimator for the survival distribution which is consistent even when the
ascertainment of vital status is informative (Hu and Tsiatis, 1996).
They first examined the process in which all patients are followed until they are observed to fail.
Assume a random sample of n individuals will be recruited into the clinical trial. For the $i^{th}$
individual, let $T_i$ denote the time from study entry to failure, $U_{ji}$ be the time of the $j^{th}$ visit to the hospital, and
$A_{ji}$ be the time when the life status ascertained at the $j^{th}$ visit was recorded in the database.
The final update of time to death $T_i$ was recorded at $A_{ki}$. Let $R_i(x)$ indicate the individual's
vital status at time $x$. Let $V_i(x)$ denote the first time at which the individual's vital status at
time $x$ is known, which means
\[
V_i(x) =
\begin{cases}
A_{ji} & \text{if } x \in (U_{j-1}, U_j) \text{ and } R_i(x) = 0 \\
A_{ki} & \text{if } R_i(x) = 1
\end{cases}
\]
The data for the $i^{th}$ individual can be expressed as $\{V_i(x), R_i(x);\ x \ge 0\}$. Assume there exists a
nonrandom $C(x)$ such that
\[
p\{V(x) \le x + C(x) \mid R(x) = 1\} = 1
\]
which means the death event will be known with certainty by time $x + C(x)$. Let the
cause-specific hazard function for time to ascertainment be
\[
\lambda_j(x, u) = \lim_{h \to 0} h^{-1} p\{u \le V(x) < u + h,\ R(x) = j \mid V(x) \ge u\}, \quad (j = 0, 1)
\]
Let the sub-distribution function be
\[
\begin{aligned}
G_1(x, v) &= p\{V(x) \le v,\ R(x) = 1\} \\
&= \int_0^v \exp\left\{-\left(\int_0^t \lambda_1(x, s)\,ds + \int_0^t \lambda_0(x, s)\,ds\right)\right\}\lambda_1(x, t)\,dt
\end{aligned}
\]
Thus they have
\[
1 - S(x) = p\{R(x) = 1\} = p\{V(x) \le x + C(x),\ R(x) = 1\} = G_1(x, x + C(x))
\]
To achieve a consistent estimator in the presence of informative censoring, they assume the
potential follow-up time $F$ is independent of $\{(U_1, A_1), \cdots, (U_{k-1}, A_{k-1}), (T, A_k)\}$. The
observed random variables can be represented as $\{X(x), \Delta(x), R^*(x);\ x \ge 0\}$, where $X(x) =
\min\{V(x), F\}$; $\Delta(x) = I(X(x) = V(x))$, which indicates whether the vital status at time
$x$ is known; and $R^*(x) = R(x)$, which denotes the death indicator at time $x$, defined only if
$\Delta(x) = 1$. The cause-specific hazards can now be written as functions of only observable random
variables:
\[
\begin{aligned}
\lambda^*_j(x, u) &= \lim_{h \to 0} h^{-1} pr\{u \le X(x) < u + h,\ \Delta(x) = 1,\ R^*(x) = j \mid X(x) \ge u\} \\
&= \lim_{h \to 0} h^{-1} \frac{pr\{u \le V(x) < u + h,\ F \ge u,\ R(x) = j\}}{pr\{V(x) \ge u,\ F \ge u\}} \quad (\text{by definition}) \\
&= \lambda_j(x, u) \quad (j = 0, 1) \quad (\text{by the independence assumption})
\end{aligned}
\]
They next use the theory of counting processes to obtain the estimator. For fixed $x$, define the
counting processes as
\[
\begin{aligned}
N_{ji}(x, u) &= I\{X_i(x) \le u,\ \Delta_i(x) = 1,\ R^*_i(x) = j\} \quad (j = 0, 1;\ i = 1, \cdots, n;\ u \ge 0) \\
N_j(x, u) &= \sum_i N_{ji}(x, u) \\
N(x, u) &= N_0(x, u) + N_1(x, u)
\end{aligned}
\]
define the at-risk process as
\[
Y_i(x, u) = I\{X_i(x) \ge u\}, \qquad Y(x, u) = \sum_i Y_i(x, u)
\]
define a filtration of the increasing $\sigma$-algebra $\mathcal{F}(x, u)$ for $u \ge 0$ as
\[
\begin{aligned}
\sigma[\, & I\{X_i(x) \le y,\ \Delta_i(x) = 0\},\ I\{X_i(x) \le y,\ \Delta_i(x) = 1,\ R^*_i(x) = 1\}, \\
& I\{X_i(x) \le y,\ \Delta_i(x) = 1,\ R^*_i(x) = 0\},\ 0 \le y \le u,\ i = 1, \cdots, n \,]
\end{aligned}
\]
and define the $\mathcal{F}(x, u)$-martingale process associated with $N_j(x, u)$ as
\[
M_j(x, u) = N_j(x, u) - \int_0^u \lambda^*_j(x, t)\,Y(x, t)\,dt.
\]
By substituting $dN_j(x, t)/Y(x, t)$ for $\lambda_j(x, t)$, this yields
\[
\hat{G}_1(x, v) = \int_0^v \exp\left\{-\int_0^u \frac{dN_1(x, t) + dN_0(x, t)}{Y(x, t)}\right\}\frac{dN_1(x, u)}{Y(x, u)}
\]
\[
\begin{aligned}
\hat{S}(x) &= 1 - \hat{G}_1(x, x + C(x)) \\
&= 1 - \int_0^{x + C(x)} \hat{E}(x, u^-)\,\frac{dN_1(x, u)}{Y(x, u)} \\
&= 1 - \sum_{i=1}^{n} \frac{\hat{E}\{x, X_i(x)^-\}}{Y\{x, X_i(x)\}}\, I\{X_i(x) \le x + C(x),\ \Delta_i(x) = 1,\ R_i(x) = 1\}
\end{aligned}
\]
where $Y\{x, X_i(x)\} = \sum_{l=1}^{n} I\{X_l(x) \ge X_i(x)\}$, and $\hat{E}(x, u)$ denotes the
Kaplan-Meier estimator. In the special case where the ascertainment of vital status is always up
to date, this estimator reduces to the usual Kaplan-Meier estimator.
This work makes the important assumption that the maximum possible reporting
delay is a known constant $C(x)$, which means that if a patient experiences an event at time
$x$, the information will be known by time $x + C(x)$. They also assume that the follow-up
time $F$ is independent of the ascertainment process $\{(U_1, A_1), \cdots, (U_{k-1}, A_{k-1}), (T, A_k)\}$,
which is questionable. When patients are censored at their last visit, their follow-up time $F$ is a
function of the visit time, which is part of the ascertainment process. Lastly, they assume that
the assessment time and the reporting time are recorded for every patient, which is not the
case in real clinical trials. These aspects limit the practical applications of their work.
2.2.2 Van Der Laan and Hubbard
Van Der Laan and Hubbard extended the work of Hu & Tsiatis by removing the dependence
on the unknown constant $C(x)$, which bounds the maximal reporting delay, and by allowing
censoring to depend on $T$ through the ascertainment process (Van Der Laan and Hubbard, 1998).
Their estimator can also incorporate covariates and is locally efficient. They used a data
structure similar to that in Hu & Tsiatis (1996). Let $R(t) = I(T \le t)$, which represents the vital status at
time $t$. Let $V_1(t)$ denote the time $R(t)$ was observed, that is
\[
V_1(t) =
\begin{cases}
U_j & \text{if } t \in [A_j, A_{j+1}] \\
t & \text{if } t \ge A_k
\end{cases}
\]
Let $W(t) \in \mathbb{R}^k$ $(t \ge 0)$ be a covariate process which has the same reporting delay as does the
vital status of $T$. Let $X(t) = (R\{V_1(t)\}, V_1(t), W\{V_1(t)\})$, so that observing the process $X$ up until time
$t$ corresponds to observing $R$, $V_1$ and $W$ up until time $V_1(t)$. Let $\bar{X}(t) = \{X(s) : s \le t\}$
represent the sample path of $X$ up until time $t$. Let $V(T)$ be the time at which $T$ is reported.
Thus the observed data structure can be represented as
\[
Y = \{\bar{T} \equiv C \wedge V(T),\ \Delta \equiv I\{V(T) \le C\},\ \bar{X}\{C \wedge V(T)\}\}
\]
They first showed that the Hu and Tsiatis estimator is an inverse probability of censoring weighted
estimator. Let $V^*(t)$ be the earliest time at which $R(t) = I(T \le t)$ was known. That is,
\[
V^*(t) =
\begin{cases}
A_{j+1} & \text{if } t \in (U_j, U_{j+1}] \\
A_k & \text{if } t \ge A_k
\end{cases}
\]
Let $Z(t) = V^*(t) \wedge C$ and $\Delta(t) = I\{V^*(t) \le C\}$. By assuming independence between $C$ and
$(V^*(t), T)$, they have
\[
\begin{aligned}
& E\{\Delta(t) \mid T, V^*(t)\} = pr\{C \ge V^*(t) \mid T, V^*(t)\} = \bar{G}(V^*(t)) \\
\Rightarrow\ & E\left[\frac{I(T \le t)\Delta(t)}{\bar{G}(Z(t))}\right] = F(t) \\
\Rightarrow\ & F_n(t) = \frac{1}{n}\sum_{i=1}^{n} \frac{I(T_i \le t)\Delta_i(t)}{\bar{G}_n(Z_i(t))}
\end{aligned}
\]
where $\bar{G}(x) = pr(C \ge x)$ and $\bar{G}_n$ is an estimator of $\bar{G}$. $\bar{G}$ can be estimated by the
Kaplan-Meier estimator based on $n$ observations of $(Z(t), 1 - \Delta(t))$, where $V^*(t)$ plays the role of
the censoring variable for $C$.
Next, they provide a new inverse probability of censoring weighted estimator that weights
the observed $I(T_i \le t)$ by the correct probability of censoring. The simple estimator is
presented as
\[
\begin{aligned}
& E(\Delta \mid X) = pr\{C \ge V(T) \mid X\} = \bar{G}(V(T) \mid X) \\
\Rightarrow\ & E\left\{\frac{I(T \le t)\Delta}{\bar{G}(\bar{T} \mid X)}\right\} = F(t) \\
\Rightarrow\ & F^0_n(t) = \frac{1}{n}\sum_{i=1}^{n} \frac{I(T_i \le t)\Delta_i}{\bar{G}_n(\bar{T}_i \mid X_i)}
\end{aligned}
\]
By adding the empirical mean of the efficient influence function to the above simple
estimator, they propose their locally efficient one-step estimator:
\[
F^1_n(t) = F^0_n(t) + \frac{1}{n}\sum_{i=1}^{n}\left[IC_0\{Y_i \mid G_n, F^0_n(t)\} - IC^*_{nu}(Y_i \mid F_{n,X}, G_n)\right]
\]
where
\[
\begin{aligned}
IC_0\{Y \mid G, F(t)\} &\equiv \frac{I(T \le t)\Delta}{\bar{G}(\bar{T} \mid X)} - F(t) \\
IC^*_{nu}(Y \mid F_X, G) &= -\int F\{t \mid \bar{X}(u), \bar{T} > u\}\,\frac{dM(u)}{\bar{G}(u \mid X)} \\
dM(u) &\equiv I(C \in du,\ \Delta = 0) - \Lambda_C(du \mid X)\,I(\bar{T} > u)
\end{aligned}
\]
Compared to Hu & Tsiatis, this study relaxed the requirement of recording the entire
ascertainment process; however, knowledge of the delay in event reporting is still
necessary. Moreover, the estimates they proposed depend on accurate estimation of the
censoring process. A low rate of censoring, which is not unusual, may lead to poorly defined
estimates.
Chapter 3: Theoretical Considerations for Inefficacy Modelling in the Presence of Delayed Reporting
3.1 Estimates under perfect reporting
Assume the survival time $T \sim Exp(\lambda)$. Let $t_E$ be the time of the end of enrollment and let $t_F$ be
the length of follow-up after the end of enrollment, so that $t_E + t_F$ is the planned time of study
analysis. Let $\tau \sim Unif(0, t_E)$ be the time of enrollment, defined as the number of months
after the study was opened for enrollment. The expected number of events observed in any one
individual by the end of study for each treatment arm $i$ is
\[
\begin{aligned}
E(D_i) &= \int_0^{t_E} \frac{1}{t_E}\left(\int_0^{t_E + t_F - \tau} f_i(t)\,dt\right)d\tau \\
&= \int_0^{t_E} \frac{1}{t_E}\left(\int_0^{t_E + t_F - \tau} \lambda_i e^{-\lambda_i t}\,dt\right)d\tau \\
&= \int_0^{t_E} \frac{1}{t_E}\left(1 - e^{-\lambda_i(t_E + t_F - \tau)}\right)d\tau \\
&= \frac{1}{t_E}\left.\left(\tau - \frac{e^{-\lambda_i(t_E + t_F - \tau)}}{\lambda_i}\right)\right|_0^{t_E} \\
&= 1 - \frac{e^{-\lambda_i t_F}\left(1 - e^{-\lambda_i t_E}\right)}{\lambda_i t_E}
\end{aligned}
\tag{3.1.1}
\]
where $i = 0$ denotes the control arm and $i = 1$ denotes the experimental arm, and the pdf
is $f(t) = \lambda e^{-\lambda t}$. Let $A$ be the interim analysis time when 50% of the expected events
under the alternative have been observed. Under perfect reporting, the event status at the time of
analysis $A$ in each treatment arm $i$ is
\[
D_i =
\begin{cases}
1 & \text{if } T \le A - \tau \\
0 & \text{if } T > A - \tau
\end{cases}
\]
The corresponding survival time is
\[
T_i =
\begin{cases}
T & \text{if } T \le A - \tau \\
A - \tau & \text{if } T > A - \tau
\end{cases}
\]
Let $t_A = \min(A, t_E)$. By the law of total expectation, the expected number of events per
person enrolled in each treatment arm $i$ at analysis time $A$ is
\[
\begin{aligned}
E(D_i) &= \int_0^{t_A} \frac{1}{t_A}\left(p_i(t \le A - \tau)\int_0^{A - \tau} f_i(t \mid t \le A - \tau)\,dt\right)d\tau \\
&= \int_0^{t_A} \frac{1}{t_A}\left(p_i(t \le A - \tau)\int_0^{A - \tau} \frac{f_i(t)}{p_i(t \le A - \tau)}\,dt\right)d\tau \\
&= \int_0^{t_A} \frac{1}{t_A}\int_0^{A - \tau} f_i(t)\,dt\,d\tau
\end{aligned}
\tag{3.1.2}
\]
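The corresponding expected survival time is
\[
\begin{aligned}
E(T_i) &= \int_0^{t_A} \frac{1}{t_A}\bigl(p_i(t \le A - \tau) \times E(t) + p_i(t > A - \tau) \times E(A - \tau)\bigr)d\tau \\
&= \int_0^{t_A} \frac{1}{t_A}\left(p_i(t \le A - \tau) \times \int_0^{A - \tau} t f_i(t \mid t \le A - \tau)\,dt + S_i(A - \tau) \times (A - \tau)\right)d\tau \\
&= \int_0^{t_A} \frac{1}{t_A}\left(\int_0^{A - \tau} t f_i(t)\,dt + S_i(A - \tau) \times (A - \tau)\right)d\tau
\end{aligned}
\tag{3.1.3}
\]
where the survival function is $S(t) = e^{-\lambda t}$. In this study, we will always assume the interim
monitoring for futility occurs before the end of enrollment, which means $A < t_E$, thus allowing
the possibility of stopping prior to the planned end of the trial and conserving patient resources. In
this case, $t_A = \min(A, t_E) = A$, and we have
\[
\begin{aligned}
E(D_i) &= \int_0^{A} \frac{1}{A}\int_0^{A - \tau} \lambda_i e^{-\lambda_i t}\,dt\,d\tau \\
&= \int_0^{A} \frac{1}{A}\left(\left.-e^{-\lambda_i t}\right|_0^{A - \tau}\right)d\tau \\
&= \int_0^{A} \frac{1}{A}\left(1 - e^{-\lambda_i(A - \tau)}\right)d\tau \\
&= \frac{1}{A}\left.\left(\tau - \frac{e^{-\lambda_i(A - \tau)}}{\lambda_i}\right)\right|_0^{A} \\
&= \frac{1}{A}\left(A - \frac{1 - e^{-\lambda_i A}}{\lambda_i}\right) \\
&= 1 - \frac{1 - e^{-\lambda_i A}}{\lambda_i A}
\end{aligned}
\tag{3.1.4}
\]
\[
\begin{aligned}
E(T_i) &= \int_0^{A} \frac{1}{A}\left(\int_0^{A - \tau} t\lambda_i e^{-\lambda_i t}\,dt + e^{-\lambda_i(A - \tau)}(A - \tau)\right)d\tau \\
&= \int_0^{A} \frac{1}{A}\left(-(A - \tau)e^{-\lambda_i(A - \tau)} + \frac{1 - e^{-\lambda_i(A - \tau)}}{\lambda_i} + e^{-\lambda_i(A - \tau)}(A - \tau)\right)d\tau \\
&= \int_0^{A} \frac{1}{A}\,\frac{1 - e^{-\lambda_i(A - \tau)}}{\lambda_i}\,d\tau \\
&= \frac{1}{A\lambda_i}\left.\left(\tau - \frac{e^{-\lambda_i(A - \tau)}}{\lambda_i}\right)\right|_0^{A} \\
&= \frac{1}{\lambda_i}\left(1 - \frac{1 - e^{-\lambda_i A}}{\lambda_i A}\right)
\end{aligned}
\tag{3.1.5}
\]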
Thus, the hazard rate for each treatment arm $i$ is
\[
\lambda_i = \frac{E(D_i)}{E(T_i)}, \quad i = 0, 1 \tag{3.1.6}
\]
which is independent of analysis time $A$.
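Equations (3.1.4) to (3.1.6) can be checked with a small Monte Carlo sketch. This is not the simulation of Chapter 4; it is only an illustration under the stated assumptions (uniform enrollment on $(0, A)$, exponential survival, perfect reporting), and the function names, seed, sample size and parameter values are arbitrary.

```python
import math
import random


def expected_events(lam, a):
    """Closed form of equation (3.1.4): expected events per enrolled patient."""
    return 1 - (1 - math.exp(-lam * a)) / (lam * a)


def expected_time(lam, a):
    """Closed form of equation (3.1.5): expected follow-up per enrolled patient."""
    return expected_events(lam, a) / lam


def simulate_perfect_reporting(lam, a, n, seed=0):
    """Simulate n patients and return (mean events, mean follow-up time)."""
    rng = random.Random(seed)
    events = 0
    total_time = 0.0
    for _ in range(n):
        tau = rng.uniform(0, a)          # enrollment time
        t = rng.expovariate(lam)         # true survival time from enrollment
        follow_up = a - tau              # time on study at the analysis
        if t <= follow_up:
            events += 1
            total_time += t
        else:
            total_time += follow_up      # censored at the analysis time
    return events / n, total_time / n


mean_d, mean_t = simulate_perfect_reporting(lam=1 / 3, a=24.0, n=100_000)
print(mean_d, expected_events(1 / 3, 24.0))
print(mean_d / mean_t)   # should be close to the true hazard 1/3
```

The ratio of simulated events to simulated follow-up time recovers the hazard rate regardless of the analysis time chosen, as equation (3.1.6) states.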
To solve for the interim analysis time $A$, let $t_k$ be the fraction of the total information
available at the $k^{th}$ interim analysis. We have
\[
\begin{aligned}
& \frac{A}{t_E}\left(1 - \frac{1 - e^{-\lambda_0 A}}{\lambda_0 A} + 1 - \frac{1 - e^{-\lambda_1 A}}{\lambda_1 A}\right)
= t_k\left(1 - \frac{e^{-\lambda_0 t_F}\left(1 - e^{-\lambda_0 t_E}\right)}{\lambda_0 t_E} + 1 - \frac{e^{-\lambda_1 t_F}\left(1 - e^{-\lambda_1 t_E}\right)}{\lambda_1 t_E}\right) \\
\Rightarrow\ & 2A\lambda_0\lambda_1 + \lambda_1 e^{-\lambda_0 A} + \lambda_0 e^{-\lambda_1 A} = C
\end{aligned}
\tag{3.1.7}
\]
where
\[
C = 2t_k\lambda_0\lambda_1 t_E + \lambda_0 + \lambda_1 - t_k\left(\lambda_1 e^{-\lambda_0 t_F}\left(1 - e^{-\lambda_0 t_E}\right) + \lambda_0 e^{-\lambda_1 t_F}\left(1 - e^{-\lambda_1 t_E}\right)\right)
\]
There is no closed-form solution for $A$ from the above formula. For all subsequent calculations
I used the Newton-Raphson method (Kreyszig, 2000) to find the numerical solution for $A$. An
initial guess of the analysis time $A$ was given as the starting value. Convergence was declared
when the estimate changed by less than $1.0 \times 10^{-7}$.
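A minimal Newton-Raphson sketch for equation (3.1.7) is shown below. It is not the code used for the calculations in this dissertation; the function name and starting value are illustrative, and the parameter values in the usage lines (with the hazards treated as per-month rates) are assumptions made only for the example.

```python
import math


def solve_analysis_time(lam0, lam1, t_e, t_f, t_k, a_start, tol=1e-7, max_iter=100):
    """Solve 2*A*lam0*lam1 + lam1*exp(-lam0*A) + lam0*exp(-lam1*A) = C for A."""
    c = (2 * t_k * lam0 * lam1 * t_e + lam0 + lam1
         - t_k * (lam1 * math.exp(-lam0 * t_f) * (1 - math.exp(-lam0 * t_e))
                  + lam0 * math.exp(-lam1 * t_f) * (1 - math.exp(-lam1 * t_e))))
    a = a_start
    for _ in range(max_iter):
        g = (2 * a * lam0 * lam1 + lam1 * math.exp(-lam0 * a)
             + lam0 * math.exp(-lam1 * a) - c)
        # Derivative of g with respect to A.
        dg = lam0 * lam1 * (2 - math.exp(-lam0 * a) - math.exp(-lam1 * a))
        a_new = a - g / dg
        if abs(a_new - a) < tol:         # convergence criterion from the text
            return a_new
        a = a_new
    raise RuntimeError("Newton-Raphson did not converge")


# Illustrative inputs: 48 months of enrollment, 12 months of follow-up,
# control hazard 1/3, hazard ratio 2/3, and half of the information.
a_half = solve_analysis_time(lam0=1 / 3, lam1=2 / 9, t_e=48.0, t_f=12.0,
                             t_k=0.5, a_start=24.0)
print(a_half)
```

Because the left-hand side of equation (3.1.7) is increasing and convex in $A$ for $A > 0$, the iteration is well behaved for any positive starting value to the right of the root; a poor starting value close to zero may require safeguarding.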
3.2 Estimates under delayed events reporting
To estimate the hazard ratio in the presence of delayed events reporting, let $p$ be the probability
of delayed reporting of an event, let $A$ be the interim analysis time, let $w$ be the length of the interval
between regular follow-up visits, and let $L$ be the time of the last visit before analysis time $A$. Assume
$w$ is constant for all patients throughout the entire trial. Let $t_w = \min(A - w, t_E)$ and $t_A =
\min(A, t_E)$. For patients enrolled before time $A - w$, the expected event status can be defined as
\[
E(D_i) =
\begin{cases}
1 & \text{if } T \le L - \tau \\
1 - p & \text{if } L - \tau \le T \le A - \tau \\
0 & \text{if } T > A - \tau
\end{cases}
\]
The corresponding expected survival time $T_i$ is
\[
E(T_i) =
\begin{cases}
T & \text{if } T \le L - \tau \\
(1 - p)T + p(L - \tau) & \text{if } L - \tau \le T \le A - \tau \\
L - \tau & \text{if } T > A - \tau
\end{cases}
\]
For patients enrolled in $[A - w, A]$, no regular follow-up visit is scheduled before the
analysis time $A$. Only events that occur within this time frame are reported, each with probability
$1 - p$. The expected event status for these patients can be defined as
\[
E(D_i) =
\begin{cases}
1 - p & \text{if } T \le A - \tau \\
0 & \text{if } T > A - \tau
\end{cases}
\]
The corresponding expected survival time $T_i$ is
\[
E(T_i) =
\begin{cases}
(1 - p)T & \text{if } T \le A - \tau \\
0 & \text{if } T > A - \tau
\end{cases}
\]
Therefore, by the law of total expectation, the expected number of events in treatment arm $i$ is
\[
E(D_i \mid p) = \int_0^{t_w} \frac{1}{t_A}\left[\int_0^{L - \tau} f_i(t)\,dt + \int_{L - \tau}^{A - \tau} (1 - p)f_i(t)\,dt\right]d\tau
+ \int_{t_w}^{t_A} \frac{1}{t_A}\int_0^{A - \tau} (1 - p)f_i(t)\,dt\,d\tau
\tag{3.2.1}
\]
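The corresponding expected survival time $T_i$ is
\[
\begin{aligned}
E(T_i \mid p) = & \int_0^{t_w} \frac{1}{t_A}\left[\int_0^{L - \tau} t f_i(t)\,dt + \int_{L - \tau}^{A - \tau} \bigl((1 - p)t + p(L - \tau)\bigr)f_i(t)\,dt + (L - \tau)S_i(A - \tau)\right]d\tau \\
& + \int_{t_w}^{t_A} \frac{1}{t_A}\int_0^{A - \tau} (1 - p)t f_i(t)\,dt\,d\tau
\end{aligned}
\tag{3.2.2}
\]
In the case of $A < t_E$, $t_A = \min(A, t_E) = A$ and $t_w = \min(A - w, t_E) = A - w$, and we have
\[
\begin{aligned}
E(D_i \mid p) = & \int_0^{t_w} \frac{1}{t_A}\left[\int_0^{L - \tau} f_i(t)\,dt + \int_{L - \tau}^{A - \tau} (1 - p)f_i(t)\,dt\right]d\tau + \int_{t_w}^{t_A} \frac{1}{t_A}\int_0^{A - \tau} (1 - p)f_i(t)\,dt\,d\tau \\
= & \int_0^{A - w} \frac{1}{A}\left[\int_0^{L - \tau} \lambda_i e^{-\lambda_i t}\,dt + \int_{L - \tau}^{A - \tau} (1 - p)\lambda_i e^{-\lambda_i t}\,dt\right]d\tau + \int_{A - w}^{A} \frac{1}{A}\int_0^{A - \tau} (1 - p)\lambda_i e^{-\lambda_i t}\,dt\,d\tau \\
= & \int_0^{A - w} \frac{1}{A}\left[1 - e^{-\lambda_i(L - \tau)} + (1 - p)\left(e^{-\lambda_i(L - \tau)} - e^{-\lambda_i(A - \tau)}\right)\right]d\tau + \int_{A - w}^{A} \frac{1 - p}{A}\left(1 - e^{-\lambda_i(A - \tau)}\right)d\tau \\
= & \ 1 - \frac{1 - e^{-\lambda_i A}}{\lambda_i A} - \frac{p}{\lambda_i A}\left(\lambda_i w - 1 - e^{-\lambda_i L} + e^{-\lambda_i A} + e^{-\lambda_i(L + w - A)}\right)
\end{aligned}
\tag{3.2.3}
\]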
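The closed form in equation (3.2.3) can be compared against direct numerical integration of the preceding line. The sketch below treats $L$ as a fixed number supplied by the user, which is an assumption of this illustration; the function names and the parameter values are hypothetical.

```python
import math


def expected_events_delayed(lam, a, w, last_visit, p):
    """Closed form of equation (3.2.3)."""
    base = 1 - (1 - math.exp(-lam * a)) / (lam * a)
    h = (lam * w - 1 - math.exp(-lam * last_visit) + math.exp(-lam * a)
         + math.exp(-lam * (last_visit + w - a)))
    return base - p * h / (lam * a)


def expected_events_delayed_numeric(lam, a, w, last_visit, p, n=24000):
    """Midpoint-rule integration over the enrollment time tau in (0, a)."""
    def cdf(x):
        return 1 - math.exp(-lam * x)

    step = a / n
    total = 0.0
    for j in range(n):
        tau = (j + 0.5) * step
        if tau < a - w:
            # Events before the last visit are always known; later ones
            # are reported with probability 1 - p.
            total += cdf(last_visit - tau) + (1 - p) * (cdf(a - tau) - cdf(last_visit - tau))
        else:
            # No scheduled visit before the analysis for late enrollees.
            total += (1 - p) * cdf(a - tau)
    return total * step / a


args = dict(lam=1 / 3, a=24.0, w=4.0, last_visit=22.0, p=0.4)  # hypothetical values
print(expected_events_delayed(**args), expected_events_delayed_numeric(**args))
```

With $p = 0$ the expression reduces to equation (3.1.4), the expected number of events under perfect reporting.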
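To solve for the interim analysis time $A$, let $t_k$ be the fraction of the total information
available at the $k^{th}$ interim analysis. We have
\[
\begin{aligned}
& \frac{A}{t_E}\Bigl(1 - \frac{1 - e^{-\lambda_0 A}}{\lambda_0 A} - \frac{p}{\lambda_0 A}\left(\lambda_0 w - 1 - e^{-\lambda_0 L} + e^{-\lambda_0 A} + e^{-\lambda_0(L + w - A)}\right) \\
& \qquad + 1 - \frac{1 - e^{-\lambda_1 A}}{\lambda_1 A} - \frac{p}{\lambda_1 A}\left(\lambda_1 w - 1 - e^{-\lambda_1 L} + e^{-\lambda_1 A} + e^{-\lambda_1(L + w - A)}\right)\Bigr) \\
& = t_k\left(1 - \frac{e^{-\lambda_0 t_F}\left(1 - e^{-\lambda_0 t_E}\right)}{\lambda_0 t_E} + 1 - \frac{e^{-\lambda_1 t_F}\left(1 - e^{-\lambda_1 t_E}\right)}{\lambda_1 t_E}\right) \\
\Rightarrow\ & 2A\lambda_0\lambda_1 + \lambda_1 e^{-\lambda_0 A} + \lambda_0 e^{-\lambda_1 A} - p\bigl(\lambda_0 h(\lambda_1, A) + \lambda_1 h(\lambda_0, A)\bigr) = C
\end{aligned}
\tag{3.2.4}
\]
where
\[
\begin{aligned}
h(\lambda, A) &= \lambda w - 1 - e^{-\lambda L} + e^{-\lambda A} + e^{-\lambda(L + w - A)}, \\
C &= 2t_k\lambda_0\lambda_1 t_E + \lambda_0 + \lambda_1 - t_k\left(\lambda_1 e^{-\lambda_0 t_F}\left(1 - e^{-\lambda_0 t_E}\right) + \lambda_0 e^{-\lambda_1 t_F}\left(1 - e^{-\lambda_1 t_E}\right)\right)
\end{aligned}
\]
There is no closed-form solution for $A$ from the above formula. The Newton-Raphson method
(Kreyszig, 2000) was used to find the numerical solution for $A$. An initial guess of the analysis
time $A$ was given as the starting value. Convergence was declared when the estimate changed
by less than $1.0 \times 10^{-7}$.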
3.3 Theor etical calculations of RGray analysis time by Newton-Raphson
method
In this study, we will examine futility monitoring in the presence of delayed events reporting under the alternative hypothesis ($H_1: \theta_A = \log(2/3) = -0.405$) and under the null hypothesis ($H_0: \theta_0 = 0$), where $\theta$ denotes the log hazard ratio of the experimental arm over the standard control arm. The baseline hazard rate of the control group was set to $1/3$. We will evaluate the performance of futility monitoring methods under different probabilities of late events reporting (from 0 to 1, in increments of 0.2) and with different regular follow-up intervals (2, 4 and 6 months). A one-day interval (0.033 month) will be used to reflect perfect reporting.
To perform futility monitoring with the RGray method, we first need to identify the analysis time, which is the time at which 50% of the events expected under the alternative hypothesis have been observed. As described in the previous section, the theoretical value of the RGray analysis time can be obtained by solving equation (3.2.4) with $t_k = 0.5$. The numerical solution can be found by the Newton-Raphson method. Below I present the theoretical results for the scenarios that are examined in the simulation study in Chapter 4. Let $t_E = 48$ months and $t_F = 12$ months.
Figure 3.3.1. Theoretical RGray analysis time (in months) under the Alternative by Newton-Raphson method
Fig 3.3.1 showed the estimated RGray analysis time when the total information was estimated as the total number of events expected under the alternative. The 0.033-month reporting interval (equal to a one-day interval) represented the situation in which patient status was assessed every day, which corresponded to almost perfect reporting and should not be affected by late events reporting. Indeed, the results in Fig 3.3.1 showed that the RGray analysis time for the 0.033-month reporting interval increased by less than one-tenth of one percent across the probabilities of delayed reporting examined. We will use an interval of 0.033 months to represent no reporting delay in subsequent presentations. For longer reporting intervals (2, 4 and 6 months), the RGray analysis time increased as the probability of late events reporting increased. This is because, if more patients report their events late, the researchers will collect fewer events, in expectation, at any analysis time compared with perfect reporting. Hence investigators will have to delay the futility analysis to obtain the required number of events. The target analysis time became greater as the reporting interval increased.
For the subsequent discussions in Chapter 3, I denote the theoretical hazard ratio (experimental arm over control arm) as
$$
\frac{E(D_1 \mid p)/E(T_1 \mid p)}{E(D_0 \mid p)/E(T_0 \mid p)}
$$
The ratio is the expected value of the MLE of the hazard ratio based on an exponential model that does not account for delayed events reporting. The theoretical log hazard ratio is defined analogously.
3.4 Theoretical log hazard ratio at RGray analysis time
After we obtained the theoretical RGray analysis time by the Newton-Raphson method, we can calculate the expected number of events with equation (3.2.1), the expected survival time with equation (3.2.2), and the hazard rate in each arm with equation (3.1.6). The theoretical hazard ratio can then be obtained.
Figure 3.4.1. Theoretical log hazard ratio at RGray analysis time under the Alternative
The results are presented on the log scale. Under perfect reporting, represented by the 0.033-month interval, the theoretical log hazard ratio was -0.405, exactly the design value. With 2-month reporting intervals, the theoretical log hazard ratio started to show subtle differences across the probabilities of late reporting. The difference became larger in absolute value as the reporting interval got larger. At the 6-month reporting interval, the log hazard ratio with late event reporting probability 1 remained -0.405, while it increased to -0.383 when the late event reporting probability decreased to 0, that is, when all events were reported in a timely manner but follow-up of event-free patients was reported only at the time points required by the follow-up schedule.
In contrast, the theoretical log hazard ratio under the null hypothesis should not be affected by changes in the reporting interval or in the probability of late events reporting. This is because $E(T_i)$ and $E(D_i)$ are the same for the two regimens when the hazard rates $\lambda_i$ are equal. Therefore, the hazard rate calculated by equation (3.1.6) would be the same for each arm. Hence, the hazard ratio would be 1, and the log hazard ratio would be 0, for all scenarios.
3.5 Theoretical Repeated Confidence Intervals in the presence of delayed events reporting
In addition to the RGray method, we will evaluate the performance of another widely used futility monitoring method, the Repeated Confidence Intervals (RCI), in the presence of delayed events reporting. The RCIs at each interim stage can be calculated as indicated in (2.1.2.2), which requires knowledge of the log hazard ratio estimates and the alpha-spending boundaries. For a one-sided 4-stage sequential design with a lower alternative hypothesis, the boundaries can be calculated with SAS SEQDESIGN, on both the MLE (Maximum Likelihood Estimate) scale and the standardized Z scale (Table 1).
Table 1. Lower Alpha-spending boundary in a 4-stage design

| Stage | Information level | Lower Alt Reference (MLE scale) | Lower Boundary (MLE scale) | Lower Alt Reference (Z scale) | Lower Boundary (Z scale) |
|---|---|---|---|---|---|
| 1 | 0.25 | -0.40547 | -0.80929 | -1.48060 | -2.95517 |
| 2 | 0.50 | -0.40547 | -0.49561 | -2.09389 | -2.55936 |
| 3 | 0.75 | -0.40547 | -0.36379 | -2.56448 | -2.30085 |
| 4 | 1.00 | -0.40547 | -0.28645 | -2.96120 | -2.09196 |
As shown in Section 3.4, the theoretical log hazard ratio can be obtained once the analysis time has been solved numerically. We can then calculate the theoretical repeated confidence intervals in each scenario (Fig 3.5.1).
Figure 3.5.1. Theoretical Lower Repeated Confidence Intervals under the Alternative
The theoretical lower bounds of the repeated confidence intervals were smaller than the lower alternative reference value of -0.405 across all the scenarios under the alternative. In expectation, the trial would pass the interim futility checks and proceed to the end, which is the desired outcome. Under the null, the theoretical lower confidence bounds would be the same as the lower boundary on the MLE scale, since the theoretical log hazard ratio was 0 across all the scenarios. Therefore, the trial was expected to stop and claim futility at stage 3, the 75% information time, since that was the first time the lower bound of the repeated confidence interval became greater than the lower alternative reference value.
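The stopping stage stated above can be reproduced from Table 1 with a few lines of code. This is only an illustrative sketch: it assumes that the lower bound of the repeated confidence interval equals the log hazard ratio estimate plus the lower boundary on the MLE scale, which is consistent with the description in this section but is not a restatement of (2.1.2.2). The function name is hypothetical.

```python
# Lower boundaries on the MLE scale, copied from Table 1 (stage -> boundary).
LOWER_BOUNDARY_MLE = {1: -0.80929, 2: -0.49561, 3: -0.36379, 4: -0.28645}
LOWER_ALT_REFERENCE = -0.40547  # log(2/3), the lower alternative reference


def first_futility_stage(theta_hat):
    """Return the first stage at which the lower RCI bound exceeds the
    lower alternative reference, or None if that never happens.

    Assumption: lower RCI bound = theta_hat + lower boundary (MLE scale).
    """
    for stage in sorted(LOWER_BOUNDARY_MLE):
        lower_bound = theta_hat + LOWER_BOUNDARY_MLE[stage]
        if lower_bound > LOWER_ALT_REFERENCE:
            return stage
    return None


print(first_futility_stage(0.0))        # theoretical log HR under the null
print(first_futility_stage(-0.40547))   # theoretical log HR under the alternative
```

Under the null the function returns stage 3, and under the alternative it returns `None`, matching the expectation that the trial proceeds to the end.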
3.6 Loss of power with interim futility monitoring analysis
It was shown (Jennison and Turnbull, 1999) that for interim analysis done at the information fraction $t$, the joint distribution of the corresponding standardized test statistics $\{Z_1, \cdots, Z_t\}$ follows a multivariate normal distribution with $E(Z_t) = \theta\sqrt{tI}$ and $Cov(Z_{t_i}, Z_{t_j}) = \sqrt{t_i/t_j}$, where $0 < t_i < t_j \le 1$ and $I$ denotes the full information.
The loss of power with one interim analysis done at information fraction $t_1$ can be calculated as:
$$
Pr(\text{Reject Without Interim}) - Pr(\text{Meet interim criteria and reject at the final analysis})
$$
which is
$$
\int_{-\infty}^{b} f(z)\,dz - \int_{-\infty}^{b}\int_{-\infty}^{a} g(z)\,dz_1\,dz_2 \tag{3.6.1}
$$
where $f$ is the pdf associated with $Z \sim \mathcal{N}(\theta\sqrt{I}, 1)$, and $g$ is the pdf associated with the bivariate normal distribution given by
$$
z \sim \mathcal{N}\left(\begin{bmatrix} \theta\sqrt{t_1 I} \\ \theta\sqrt{t_2 I} \end{bmatrix}, \begin{bmatrix} 1 & \sqrt{t_1/t_2} \\ \sqrt{t_1/t_2} & 1 \end{bmatrix}\right),
$$
and $t_2 = 1$.
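Equation (3.6.1) can be evaluated with one-dimensional numerical integration by conditioning on $Z_1$. The sketch below is illustrative only. The limits and drift are assumptions that do not come from this section: $a = 0$ (continue if the interim estimate favours the experimental arm), $b = -1.96$ (one-sided 0.025 test), and $\theta\sqrt{I} = -(1.959964 + 0.841621)$, the drift of a fixed-sample design with 80% power. All function names are hypothetical.

```python
import math


def phi(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def Phi(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def joint_prob(a, b, mu1, mu2, rho, n=4000):
    """P(Z1 < a, Z2 < b) for a bivariate normal with unit variances,
    means (mu1, mu2) and correlation rho, by Simpson integration over z1."""
    s = math.sqrt(1.0 - rho * rho)
    lo = mu1 - 10.0  # the density below this point is negligible
    if a <= lo:
        return 0.0

    def integrand(z1):
        cond_mean = mu2 + rho * (z1 - mu1)  # E(Z2 | Z1 = z1)
        return phi(z1 - mu1) * Phi((b - cond_mean) / s)

    step = (a - lo) / n
    total = integrand(lo) + integrand(a)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * integrand(lo + k * step)
    return total * step / 3.0


def loss_of_power(theta_sqrt_I, t1, a, b):
    """Equation (3.6.1) with one interim analysis at fraction t1 and t2 = 1."""
    mu1 = theta_sqrt_I * math.sqrt(t1)
    mu2 = theta_sqrt_I
    rho = math.sqrt(t1)  # sqrt(t1 / t2) with t2 = 1
    return Phi(b - mu2) - joint_prob(a, b, mu1, mu2, rho)


drift = -(1.959964 + 0.841621)  # assumed theta * sqrt(I)
print(loss_of_power(drift, t1=0.5, a=0.0, b=-1.959964))
```

Under these assumed inputs the loss of power comes out at a few tenths of one percent; different boundaries or a different drift would change the value.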
The loss of power with three interim analyses done at information fractions $t_1$, $t_2$ and $t_3$ can be calculated as:
$$
Pr(\text{Reject Without Interim}) - Pr(\text{Meet interim criteria and reject at the final analysis})
$$
which is
$$
\int_{-\infty}^{d} f(z)\,dz - \int_{-\infty}^{d}\int_{-\infty}^{c}\int_{-\infty}^{b}\int_{-\infty}^{a} g(z)\,dz_1\,dz_2\,dz_3\,dz_4 \tag{3.6.2}
$$
where $f$ is the pdf associated with $Z \sim \mathcal{N}(\theta\sqrt{I}, 1)$, and $g$ is the pdf associated with the multivariate normal distribution given by
$$
z \sim \mathcal{N}\left(\begin{bmatrix} \theta\sqrt{t_1 I} \\ \theta\sqrt{t_2 I} \\ \theta\sqrt{t_3 I} \\ \theta\sqrt{t_4 I} \end{bmatrix}, \begin{bmatrix} 1 & \sqrt{t_1/t_2} & \sqrt{t_1/t_3} & \sqrt{t_1/t_4} \\ \sqrt{t_1/t_2} & 1 & \sqrt{t_2/t_3} & \sqrt{t_2/t_4} \\ \sqrt{t_1/t_3} & \sqrt{t_2/t_3} & 1 & \sqrt{t_3/t_4} \\ \sqrt{t_1/t_4} & \sqrt{t_2/t_4} & \sqrt{t_3/t_4} & 1 \end{bmatrix}\right),
$$
and $t_4 = 1$.
At information fraction $t$, let $D_1$ be the expected number of events for the control arm and $D_2$ be the expected number of events for the experimental arm. The information $I_t$ can be calculated as
$$
I_t = \frac{1}{\dfrac{1}{D_1} + \dfrac{1}{D_2}}
$$
and the expected value of the standardized test statistic $Z_t$ can be calculated as
$$
E(Z_t) = \theta\sqrt{I_t}
$$
Since the $Z$ are standardized statistics, the variance-covariance matrix is actually a correlation matrix.
The loss of power with the RGray method with one interim analysis at 50% information time under perfect reporting can be calculated with equation (3.6.1), which gives 0.3%. The loss of power with the repeated confidence interval method with three interim analyses at 25%, 50% and 75% information time under perfect reporting can be calculated with equation (3.6.2), which gives 0.1%. Similarly, the loss of power with interim futility monitoring in the presence of delayed events reporting can be calculated with $E(Z_t) = \theta^*\sqrt{I_t}$, where $\theta^*$ denotes the theoretical log hazard ratio in the presence of delayed events reporting, which can be calculated as described in Section 3.4. The variance-covariance matrix can be calculated based on equation 3.10 of Dr. López's dissertation (López Nájera, 2016), which is
$$
\Sigma^* = J_h(E(\mathbf{v}))\,\Sigma\,J_h(E(\mathbf{v}))^T \tag{3.6.3}
$$
The vector $\mathbf{v}$ is defined as
$$
\mathbf{v} = [D_{0,1}, V_{0,1}, \cdots, D_{0,K}, V_{0,K}, D_{1,1}, V_{1,1}, \cdots, D_{1,K}, V_{1,K}]'
$$
where $D_{j,k}$ and $V_{j,k}$ denote the total number of events and the mean time under observation at the $k^{th}$ analysis, $k = 1, \cdots, K$, for participants in treatment groups 0 (control) and 1 (experimental) ($j \in \{0, 1\}$). $J_h(E(\mathbf{v}))$ is the Jacobian evaluated at $E(\mathbf{v})$ such that
$$
J_h(E(\mathbf{v})) = \begin{pmatrix}
\left.\dfrac{\partial h_1(x)}{\partial x_{0,1}}\right|_{E(\mathbf{v})} & \cdots & \left.\dfrac{\partial h_1(x)}{\partial y_{1,K}}\right|_{E(\mathbf{v})} \\
\vdots & \ddots & \vdots \\
\left.\dfrac{\partial h_K(x)}{\partial x_{0,1}}\right|_{E(\mathbf{v})} & \cdots & \left.\dfrac{\partial h_K(x)}{\partial y_{1,K}}\right|_{E(\mathbf{v})}
\end{pmatrix}
$$
and $\Sigma$ is a block diagonal matrix. The score statistic $h_k(\mathbf{v})$ is defined as
$$
h_k(\mathbf{v}) = \frac{D_{1,k} V_{0,k} - D_{0,k} V_{1,k}}{V_{0,k} + V_{1,k}} \tag{3.6.4}
$$
The score statistic $S$ can be transformed to a $Z$ statistic by
$$
Z = \frac{S}{\sqrt{I}} \sim \mathcal{N}(\theta\sqrt{I}, 1) \tag{3.6.5}
$$
3.7 Index of non-independent censoring (INIC)
Fig 3.4.1 showed that the biggest bias in log hazard ratio estimation was observed when the reporting interval was 6 months. The results were based on a study design with a baseline hazard rate of $1/3$, which is similar to the usual hazard rate of adult lung cancer. I was interested in learning how the bias changed as the baseline hazard rate changed, with a fixed hazard ratio of $2/3$. Let $\beta_{obs}$ be the observed log hazard ratio and $\beta_{true}$ be the true log hazard ratio. The bias is calculated as:
$$
Bias = \frac{\beta_{obs} - \beta_{true}}{|\beta_{true}|} \times 100\%
$$
The results in Fig 3.7.1 showed that the bias of the log hazard ratio increased as the baseline hazard rate increased and as the probability of late events reporting decreased. The log hazard ratio estimate was increased by 5.4% when the baseline hazard rate was 0.33 and the probability of late events reporting was 0. The bias was reduced to <1% when the baseline hazard rate was 0.01, which is similar to the hazard rate of most pediatric cancers. This result was confirmed by simulations (Fig 4.1).
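As a worked example of the bias formula, the 5.4% figure can be recovered from the rounded values reported in Section 3.4 for the 6-month reporting interval (an observed log hazard ratio of -0.383 against a design value of -0.405). The helper name below is hypothetical, and the result is only as precise as those rounded inputs.

```python
def percent_bias(beta_obs, beta_true):
    """Relative bias of the log hazard ratio, in percent."""
    return (beta_obs - beta_true) / abs(beta_true) * 100.0


# Rounded values quoted in Section 3.4 (6-month interval, p = 0).
print(round(percent_bias(-0.383, -0.405), 1))  # 5.4
```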
Figure 3.7.1. The bias of log hazard ratio estimation with different baseline hazard rates, assuming a 6-month reporting interval and a true hazard ratio of 0.67.
Unbiased estimation of the hazard ratio can be achieved when $p = 1$, which corresponds to the personal cutback method described by McIlvaine (McIlvaine, 2015), in which no event that occurs after the last scheduled visit is reported. From this fact it can be inferred that the bias arises from non-independent information. In this section, I will introduce the concept of the index of non-independent censoring (INIC) and explore its relationship with hazard ratio estimation.
Figure 3.7.2. Schema to illustrate the calculation of index of non-independent information
For patients who enter uniformly over a reporting interval $w$, the probability of accelerated reporting (PAR) within the reporting interval $w$ can be calculated as:
$$
PAR = \frac{1}{w}\int_0^w \left(1 - e^{-\lambda t}\right) dt = \frac{1}{w}\left[t + \frac{1}{\lambda} e^{-\lambda t}\right]_0^w = 1 - \frac{1 - e^{-\lambda w}}{\lambda w} \tag{3.7.1}
$$
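A short numerical check of (3.7.1) is sketched below. It compares the closed form with a midpoint-rule approximation of the integral; the function names and the example inputs ($\lambda = 1/3$ and $w = 6$, the values used elsewhere in this chapter) are chosen only for illustration.

```python
import math


def par_closed(lam, w):
    """Closed form of equation (3.7.1)."""
    return 1.0 - (1.0 - math.exp(-lam * w)) / (lam * w)


def par_numeric(lam, w, n=10000):
    """Midpoint-rule approximation of (1/w) * integral_0^w (1 - exp(-lam t)) dt."""
    step = w / n
    total = sum(1.0 - math.exp(-lam * (k + 0.5) * step) for k in range(n))
    return total * step / w


print(par_closed(1 / 3, 6.0), par_numeric(1 / 3, 6.0))
```

For small $\lambda w$ the closed form is approximately $\lambda w / 2$, which is one way to see why PAR shrinks as the hazard rate or the reporting interval decreases.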
Let $w$ be the length of the reporting interval. Let $k$ be the total number of scheduled visits before analysis time $A$. Let $\lambda_0$ be the hazard rate of the control group and $\lambda_1$ be the hazard rate of the experimental group. Considering the memoryless property of the exponential distribution, for patients who entered the study uniformly during $I_j$, $j = 1, \cdots, k$, conditional on entering the study during interval $I_j$, the probability of surviving $j - 1$ reporting intervals and failing after the last scheduled visit is:
$$
P(T > (j-1)w) \times PAR = e^{-(j-1)w\lambda_i}\left(1 - \frac{1 - e^{-\lambda_i w}}{\lambda_i w}\right), \quad 1 \le j \le k,\ i = 0, 1 \tag{3.7.2}
$$
The probability of failure before the last scheduled visit is:
$$
\begin{cases}
0, & j = 1 \\
1 - \dfrac{1 - e^{-\lambda_i (j-1)w}}{\lambda_i (j-1)w}, & 2 \le j \le k,\ i = 0, 1
\end{cases}
\tag{3.7.3}
$$
The interval $w^*$ is the time remaining after subtracting the time elapsed for $k$ scheduled visits from analysis time $A$. Conditional on enrolling during the interval $w^*$, the probability of surviving the next $k$ intervals and experiencing an event in $I_1$ after the last scheduled visit is:
$$
P(T > kw) \times PAR = e^{-kw\lambda_i}\left(1 - \frac{1 - e^{-\lambda_i w^*}}{\lambda_i w^*}\right), \quad i = 0, 1 \tag{3.7.4}
$$
The conditional probability of failure before the last scheduled visit is:
$$
1 - \frac{1 - e^{-\lambda_i k w}}{\lambda_i k w}, \quad i = 0, 1 \tag{3.7.5}
$$
The total unconditional probability of failure after the last scheduled visit is:
$$
PALSV = \frac{w}{A}\sum_{j=1}^{k} e^{-(j-1)w\lambda_i}\left(1 - \frac{1 - e^{-\lambda_i w}}{\lambda_i w}\right) + \frac{w^*}{A} e^{-kw\lambda_i}\left(1 - \frac{1 - e^{-\lambda_i w^*}}{\lambda_i w^*}\right), \quad i = 0, 1 \tag{3.7.6}
$$
The total unconditional probability of failure before the last scheduled visit is:
$$
PBLSV = \frac{w}{A}\sum_{j=2}^{k}\left(1 - \frac{1 - e^{-\lambda_i (j-1)w}}{\lambda_i (j-1)w}\right) + \frac{w^*}{A}\left(1 - \frac{1 - e^{-\lambda_i k w}}{\lambda_i k w}\right), \quad i = 0, 1 \tag{3.7.7}
$$
Assuming $p$ is the probability of late events reporting, I define the index of non-independent censoring (INIC) as
$$
INIC = (1 - p) \times PALSV \tag{3.7.8}
$$
When $p = 1$, which corresponds to the personal cutback method of McIlvaine (McIlvaine, 2015), which she demonstrated represents the case of non-informative censoring, the INIC is 0, which indicates that all information is current and accurate.
Let $E(D)$ be the expected probability of an event under perfect reporting and $E(D_{pc})$ be the number of events observed using the personal cutback method. The expected number of events at analysis time $A$ can then be calculated with an alternative method:
$$
E(D_i^* \mid p) = E(D_i) - p \times PALSV, \quad i = 0, 1 \tag{3.7.9}
$$
Similarly, the expected total time under observation can be calculated as the sum of the contributions to the expected value made by observations where events happen before the last visit, plus censored observations at the last scheduled visit, plus contributions made by events that happen after the last scheduled visit:
$$
\begin{aligned}
E(T_i^* \mid p) &= E(T_{pc}) + (1 - p) \times PALSV \times E(T_i \mid \text{Events after last visit}) \\
&= \frac{E(D_{pc})}{\lambda_i} + (1 - p) \times PALSV \times E(T_i \mid \text{Events after last visit}), \quad i = 0, 1
\end{aligned}
\tag{3.7.10}
$$
Any patient who is event-free at the last scheduled visit has a time to failure that is distributed as $Exponential(\lambda)$, because of the memoryless property of the exponential distribution. Since the time at which the last scheduled visit is reached is distributed randomly over the last $w$ time units, I calculated this contribution as:
$$
\begin{aligned}
E(T_i \mid \text{Event after last visit}) &= \frac{1}{w}\int_0^w \int_0^\tau \lambda_i t e^{-\lambda_i t}\,dt\,d\tau \\
&= \frac{1}{w}\int_0^w \left(\frac{1}{\lambda_i} - \frac{e^{-\lambda_i \tau}}{\lambda_i} - \tau e^{-\lambda_i \tau}\right) d\tau \\
&= \frac{1 + e^{-\lambda_i w}}{\lambda_i} - \frac{2\left(1 - e^{-\lambda_i w}\right)}{\lambda_i^2 w}, \quad i = 0, 1
\end{aligned}
\tag{3.7.11}
$$
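The algebra in (3.7.11) can be checked by evaluating the double integral numerically and comparing it with the closed form. The sketch below uses a nested Simpson rule; the helper names and the example inputs are illustrative assumptions, not part of the original calculations.

```python
import math


def simpson(f, a, b, n=200):
    """Composite Simpson rule on [a, b]; n must be even."""
    step = (b - a) / n
    total = f(a) + f(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * f(a + k * step)
    return total * step / 3.0


def contribution_numeric(lam, w):
    """(1/w) * int_0^w int_0^tau lam * t * exp(-lam * t) dt dtau."""
    def inner(tau):
        return simpson(lambda t: lam * t * math.exp(-lam * t), 0.0, tau)
    return simpson(inner, 0.0, w) / w


def contribution_closed(lam, w):
    """Closed form on the last line of equation (3.7.11)."""
    e = math.exp(-lam * w)
    return (1.0 + e) / lam - 2.0 * (1.0 - e) / (lam ** 2 * w)


print(contribution_numeric(1 / 3, 6.0), contribution_closed(1 / 3, 6.0))
```

Both evaluations should agree closely for any positive $\lambda$ and $w$, which confirms the integration step but not the modelling assumptions behind it.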
Let $d_i = \sum E(D_i)$ and $t_i = \sum E(T_i)$. The hazard rate of each group can be calculated as $d_i/t_i$. The hazard ratio can be calculated as $\widehat{HR} = \dfrac{d_1/t_1}{d_0/t_0}$. The natural log transformed hazard ratio can be calculated as:
$$
\ln \widehat{HR} = \ln(d_1) - \ln(t_1) - \ln(d_0) + \ln(t_0) \tag{3.7.12}
$$
I next examined how the INIC changes with different hazard rates, reporting intervals, and probabilities of late events reporting. The study design was 48 months of enrollment, 12 months of follow-up and a fixed hazard ratio of 0.67. The INICs were calculated at the 50% information time. Figure 3.7.3 showed that the INIC approached 0 as the hazard rate became smaller and the probability of late events reporting became larger. The INIC can be as large as 0.085 when the baseline hazard rate was 0.4, the probability of late events reporting was 0 and the reporting interval was 6 months (Fig 3.7.3, Table 2). The INIC was close to 0 when the baseline hazard rate was very small, regardless of $p$, the probability of late events reporting. The INIC was 0 when the probability of late events reporting was 1.
Figure 3.7.3. Changes of INIC in the presence of delayed events reporting as baseline hazard
rate increases, at fifty percent information time with a reporting interval of 6 months.
As shown in Table 2, as the baseline hazard rate increased from 0.01 to 0.4, the INIC increased from 0.02046 to 0.08538 in the control group and from 0.01446 to 0.07840 in the experimental group.
Figure 3.7.4 showed how the INIC changed with different baseline hazard rates and different reporting intervals when the probability of late events reporting was 0. It indicated that a lower INIC can be achieved when the reporting interval becomes smaller. If patients were scheduled to report daily, which was the perfect reporting condition, the INIC was almost 0.
Table 2. The relationship between baseline hazard rate and INIC with fixed hazard ratio 0.67, assuming none of the events were reported with delay (p=0) and the reporting interval was 6 months.

| $\lambda_0$ | $\lambda_1$ | Hazard Ratio | INIC (control) | INIC (experimental) |
|---|---|---|---|---|
| 0.01 | 0.00667 | 0.6667 | 0.02046 | 0.01446 |
| 0.025 | 0.01667 | 0.6667 | 0.03879 | 0.02937 |
| 0.05 | 0.03333 | 0.6667 | 0.05479 | 0.04515 |
| 0.1 | 0.06667 | 0.6667 | 0.06575 | 0.05850 |
| 0.15 | 0.1 | 0.6667 | 0.07097 | 0.06535 |
| 0.2 | 0.13333 | 0.6667 | 0.07449 | 0.06930 |
| 0.25 | 0.16667 | 0.6667 | 0.07748 | 0.07197 |
| 0.3333 | 0.22222 | 0.6667 | 0.08205 | 0.07574 |
| 0.4 | 0.26667 | 0.6667 | 0.08538 | 0.07840 |
Figure 3.7.4. Changes of INIC at fifty percent information time with different reporting intervals as baseline hazard rate increases.
To examine whether the difference in log hazard ratio caused by delayed events reporting was related to the INIC, I calculated the theoretical log hazard ratio with different baseline hazard rates (Fig 3.7.6). The study design was 48 months of enrollment, 12 months of follow-up and a 6-month reporting interval. The log hazard ratio and INIC were calculated at the 50% information time.
Figure 3.7.5 described how the estimate of the hazard rate at 50% information time in each group changed as the INIC changed. The hazard rate estimate was calculated as $E(D \mid p)/E(T \mid p)$, where $E(D \mid p)$ was calculated with equation (3.7.9) and $E(T \mid p)$ was calculated with equation (3.7.10). The hazard rate estimates in the two arms were quite close when the INIC was small. The absolute difference between the hazard rate estimates became larger as the probability of late events reporting, $p$, became smaller and hence the INIC became larger.
Figure 3.7.5. Changes of hazard rate estimates with different INIC of control arm, at fifty percent information time with a reporting interval of 6 months.
The results in Figure 3.7.6 showed that the bias of the log hazard ratio estimates was quite small when the INIC in both arms was small, regardless of the probability of late events reporting. The bias of the log hazard ratio estimates became increasingly large as the INIC increased. Therefore, my study focused on study designs with relatively high hazard rates, which had correspondingly high INICs that were associated with large bias in hazard ratio estimation.
Figure 3.7.6. Changes of log hazard ratio estimates with different INIC in both arms, at fifty percent information time with a reporting interval of 6 months.
Chapter 4: Simulation Results
First I checked the empirical log hazard ratio with different reporting intervals and different probabilities of late events reporting, assuming 48 months of enrollment plus 12 months of follow-up, a baseline hazard rate of 0.01 and a true hazard ratio of $2/3$. The empirical results agreed with the theoretical results. The bias of the log hazard ratio was very small in all scenarios when the baseline hazard rate was as low as 0.01. Therefore, my simulations focused on settings with a high baseline hazard rate.
Figure 4.1. Empirical log hazard ratio with different reporting intervals and different probabilities of late events reporting, assuming a baseline hazard rate as low as 0.01.
4.1 RGray Analysis with standard data processing method
4.1.1 Estimate RGray analysis time by simulation
For a study design with a one-sided type I error of 0.025 and a type II error of 0.2, the RGray analysis happens at 50% of the expected information. Simulation was performed to obtain the empirical value of the RGray analysis time. The empirical trial data were generated with the following parameters:
• Uniform enrollment in 4 years
• One-year follow-up after the end of enrollment
• Hazard rate in control arm $\lambda_0 = 1/3$
• Hazard rate in experimental arm $\lambda_1 = 1/4.5$
• Hazard ratio under the null (Experimental arm over control arm) $HR = \lambda_0/\lambda_0 = 1$
• Hazard ratio under the alternative (Experimental arm over control arm) $HR = \lambda_1/\lambda_0 = 0.6667$
• Interval between follow-up visits (in months): $w \in \{0.033, 2, 4, 6\}$
• Probability of late reporting of events: $p \in \{0, 0.2, 0.4, 0.6, 0.8, 1\}$
• 198 total patients randomly assigned to control arm and experimental arm in a 1:1 ratio
SAS PROC SEQDESIGN was used to calculate the sample size for a trial with a one-sided type I error rate of 0.025 and 80% power to detect a target hazard ratio of 0.67, with 3 interim analyses at 25%, 50% and 75% information time for futility monitoring only. The monitoring boundary was determined by error spending with the power family method. We assumed that the participants entered the study uniformly during the enrollment period and that none of them was lost to follow-up. Late reporting was generated for each event as a Bernoulli random variable with probability $p$: $p = 1$ indicated that the corresponding events would not be reported until the next regular visit time, while $p = 0$ indicated that the corresponding events would be reported immediately without delay. The reporting interval $w$ was assumed to be constant throughout the trial for all patients. The time to event was assumed to follow an exponential distribution. Five thousand trials were simulated for each scenario. SAS PROC LIFEREG was used to calculate the log hazard ratio for each individual experiment, assuming the exponential distribution.
Figure 4.1.1.1. Empirical RGray analysis time (in months) under the Alternative by simulation
The empirical RGray analysis times under the alternative by simulation were as predicted by my theoretical calculations. The analysis time at the 0.033-month reporting interval was quite similar for different probabilities of late reporting. The average value was about 27.65 months if the reporting interval was 0.033 months (a one-day interval). The analysis time increased as the reporting interval became longer and as the probability of late reporting became greater.
Figure 4.1.1.2. Empirical RGray analysis time (in months) under the Null by simulation
The average RGray analysis times under the null by simulation were also as predicted by my theoretical calculations. The analysis time at the one-day interval did not vary by probability of late reporting. The analysis time increased as the reporting interval became longer and as the probability of late reporting became greater. We will compare the theoretical values and the empirical values side by side for each scenario.
Figure 4.1.1.3. Comparison of theoretical and empirical RGray analysis time
The theoretical values and the empirical values were consistent with each other, for both the null and the alternative situations. Interestingly, the theoretical value was always slightly bigger than the empirical value. However, the difference was very subtle. Most of the differences were less than 1 day. Therefore, we can safely ignore the difference between the theoretical and empirical values. The general trend for the RGray analysis time was that it increased as the reporting interval became bigger. For a fixed length of reporting interval, the analysis time increased as the late event reporting probability increased.
4.1.2 Distribution of empirical RGray analysis time
Figure 4.1.2.1. Box plot showed the distribution of empirical RGray analysis time under the
Alternative
The box plot was used as an alternative method to show the distribution of RGray analysis
time. The advantage of this plot was that it could demonstrate the variance of the data from
5000 simulation trials for each scenario. In this graph, the distribution looked normal and the
variance appeared to be consistent across all scenarios.
Figure 4.1.2.2. Density plot showed the distribution of empirical RGray analysis time under the
Alternative
The kernel density plot provided a better visualization of the shift in RGray analysis time (Fig 4.1.2.2). It demonstrated not only the mean but also the extremes at which the analysis was performed. At the 0.033-month interval, the analysis time was almost the same across scenarios, as indicated by the overlapping density plots on the top row of Fig 4.1.2.2. The density plots started to move apart as the interval became bigger. The biggest shift was observed when the reporting interval was 6 months. The highest probability of late events reporting delayed the target analysis date the most.
Figure 4.1.2.3. Box plot showed the distribution of empirical RGray analysis time under the
Null
Figure 4.1.2.4. Density plot showed the distribution of empirical RGray analysis time under the
Null
The box plot and density plot constructed under the null shared the same feature with those
under the alternative. Longer waiting time was needed to get 50% of expected events if the
reporting interval got bigger , and/or the probability of late events reporting increased, for both
the situation under the alternative and under the null.
4.1.3 Empirical log hazard ratio at RGray analysis time
Figure 4.1.3.1. Empirical log hazard ratio by simulation under the Alternative
Fig 4.1.3.1 showed the average value of the log hazard ratio from 5000 simulated trials for each scenario. Under perfect reporting, the average log hazard ratio was approximately -0.3994 for all probabilities of late events reporting. As with the theoretical values, differences started to appear as the reporting interval increased. At the 6-month reporting interval, the average log hazard ratio was about -0.40 when all events were reported late and -0.38 when the probability of late events reporting was 0.
Figure 4.1.3.2. Empirical log hazard ratio by simulation under the Null
Under the null, the average values of the log hazard ratio were not affected by changes in the late reporting probability or the reporting interval. All empirical values were very close to 0, which was the design value (log 1 = 0).
Figure 4.1.3.3. Comparison of theoretical log HR versus empirical log HR under the Alternative
The theoretical log hazard ratio and the empirical log hazard ratio were put side by side for easy comparison. Both showed the same trend as the reporting interval and the probability of late reporting varied. Interestingly, the theoretical value was always slightly smaller than the empirical value. This might be because the empirical analysis time was slightly earlier than the theoretical analysis time, and point estimates obtained in the early stages of trials tend to be overestimated. The differences were very small and could be safely ignored.
Figure 4.1.3.4. Comparison of theoretical log HR versus empirical log HR under the Null
The empirical log hazard ratio was very close to the theoretical log hazard ratio under the
null, which was 0. The theoretical value was slightly smaller than the empirical value, which
was consistent with the observations under the alternative.
4.1.4 Distribution of empirical log Hazard Ratio
Figure 4.1.4.1. Box plot showed the distribution of empirical log HR under the Alternative
The box plot showed the distribution of the log hazard ratio under the alternative out of 5000 simulated trials in each scenario. The variance looked small and consistent across all the scenarios.
Figure 4.1.4.2. Density plot showed the distribution of empirical log HR under the Alternative
The density plot provided an alternative view of the distribution of the log hazard ratio (Fig 4.1.4.2). This plot assumed normality and used the sample mean and variance when constructing the plot. The overlapping bell-shaped curves indicated that the average estimates of the empirical log hazard ratio were quite close to each other when the reporting interval was small. As the reporting interval became bigger, the bell-shaped curves shifted horizontally to the right as the probability of late reporting decreased.
Figure 4.1.4.3. Box plot showed the distribution of empirical log HR under the Null
The box plot showed the distribution of log hazard ratio under the null out of 5000 simulated
trials in each scenario. The variance appeared small and consistent across all the scenarios.
Figure 4.1.4.4. Density plot showed the distribution of empirical log HR under the Null
The density plot indicated that the empirical log hazard ratio under the null was not affected by the reporting interval or the probability of late reporting. The bell-shaped curves overlapped under all scenarios.
4.1.5 Proportion of trials that are stopped for futility by RGray method
Figure 4.1.5.1. Proportion of trials that were stopped for futility by RGray method under the
Alternative
Figure 4.1.5.2. A 3D plot showed the proportion of trials that were stopped for futility by RGray
method under the Alternative
The RGray method declares futility if the log hazard ratio is greater than 0 when 50% of the expected events have been observed. Based on this rule, I calculated the proportion of the 5000 simulated trials in each scenario that were stopped early for futility. The results showed that under perfect reporting, represented by the 0.033-month reporting interval, the proportion of trials stopped early for futility under the alternative was about 2.7%. As the reporting interval became bigger, the proportion of trials stopped early with the conclusion of inefficacy increased. A higher proportion of stopping for futility was observed as the probability of late events reporting decreased. In the case of the 6-month reporting interval, about 2.7% of trials were stopped for futility if all events were reported late (probability of late events reporting equal to 1), while about 3.3% of trials were stopped for futility if none of the events were reported late (probability of late events reporting equal to 0). This is an increase of about 22% in early stopping, which is undesirable since the treatment arm is beneficial under the alternative. It would be preferable for the trial to continue to the end.
Figure 4.1.5.3. Proportion of trials that were stopped for futility by RGray method under the
Null
The proportion of trials that were stopped for futility under the null based on the RGray rule was also calculated. The proportion was quite consistent across all the scenarios, at around 51%. Based on the information shown so far, futility monitoring under the null did not seem to be affected by late events reporting, which was good news. I will examine methods to address the concern of stopping too often for futility under the alternative in the following sections.
4.1.6 Loss of power of RGray analysis with Standard data processing method
Figure 4.1.6. Loss of power of RGray analysis with Standard data processing method
The loss of power of the RGray analysis with the standard data processing method under perfect
reporting was 0.4%, which was close to the theoretical value of 0.3% calculated in section 3.6.
The loss of power increased as the reporting interval became longer and the probability of late
events reporting became smaller. The largest loss of power, 0.8%, was observed when the
reporting interval was 6 months and the probability of late events reporting equaled 0. This
was consistent with the observation that the highest proportion of trials was stopped for
futility under this scenario (Fig 4.1.5.1): fewer trials passed the interim futility check, so
fewer could reject the null hypothesis at the end of the study, which lowered the study power.
4.2 RGray analysis with Personal Cutback method
From the previous results, it is worth noting that the scenario with delayed event probability
1 always had the same estimate of the log hazard ratio and the same proportion of early
stopping for futility as perfect reporting. When all events are reported late, event
information is collected only at scheduled follow-up visits. Therefore, only the information
collected up to the last visit contributes to the analysis. This is the personal cutback data
processing method, which was reported by Dr. McIlvaine in 2015 (McIlvaine, 2015).
4.2.1 Log hazard ratio at RGray analysis time with Personal Cutback method
Figure 4.2.1.1. Log hazard ratio under the Alternative at RGray analysis time with Personal
Cutback method
Since the information for analysis was all current with personal cutback method, the estimate
of hazard ratio was expected to be unbiased. The empirical results in Fig 4.2.1.1 confirmed this
statement. The average log hazard ratio was around -0.40 for all scenarios, which was exactly
as expected.
Figure 4.2.1.2. Log hazard ratio under the Null at RGray analysis time with Personal Cutback
method
The average log hazard ratio under the null was close to 0 for all scenarios.
4.2.2 Actual information level with Personal Cutback method
Figure 4.2.2.1. Actual Information Level under the Alternative by RGray method with Personal
Cutback method
Figure 4.2.2.2. Actual Information Level under the Null by RGray method with Personal Cut-
back method
The RGray analysis time was defined as the time when 50% of the expected events had been
observed with the standard data processing method. When the personal cutback method was used
instead, the calendar time of the analysis was kept fixed. The total follow-up time contributed
by patients was reduced, as was the number of events, since all events that occurred between a
patient's last visit date and the analysis date were ignored. Therefore, fewer than 50% of the
expected events were included when the analysis was conducted at the time at which 50% of the
events would have been observed in the absence of reporting delays. The actual information
level, which is the fraction of total information available at the RGray analysis time,
changed accordingly. Fig 4.2.2.1 and Fig 4.2.2.2 show the distribution of the actual
information level at the RGray analysis time with the personal cutback method. The actual
information level was lower than 50% in most scenarios, with the lowest level observed when
p=0 and the reporting interval was 6 months, in which case the largest number of events was
ignored. Therefore, I adjusted the monitoring boundary according to the actual information
level using the linear boundary prescribed by the LIB20 method. It is worth noting that the
LIB20 method was designed to adjust the boundary to the right after passing the interim
monitoring point t_0 at the 50% information level, not to the left. My proposed adjustment
prior to t_0 has not been rigorously evaluated to date.
Figure 4.2.2.3. Diagram of extended LIB20 boundary
Table 4.1. RGray monitored example trial w6p2-605: Comparison of Standard method versus
Personal Cutback method
Method            Analysis time  Number of        Actual             RGray     log HR  Action
                  (months)       observed events  Information Level  Boundary
Standard          26.18          99               0.5                0         0.08    Accept Null
Personal Cutback  26.18          79               0.4                0.014     -0.01   Continue
For illustration, consider trial w6p2-605 (Table 4.1). The name indicates that it was the
605th simulated trial for the scenario with a 6-month reporting interval and a probability of
late events reporting of 0.2. The RGray analysis time for this trial was 26.18 months, when a
total of 99 events in both arms had been observed with the standard data processing method.
The log hazard ratio was 0.08, which was higher than the RGray boundary of 0. Therefore, this
trial would be stopped for futility. However, with the personal cutback method, the events
that occurred between the last visit time and the RGray analysis time would be ignored, so
only 79 events would be included in the analysis. The actual information level would then
become 79/198=0.40 instead of 0.5. The corresponding LIB20 boundary for information level 0.40
would be 0.014. The log hazard ratio with the personal cutback method was -0.01, which was
lower than the adjusted boundary of 0.014. Therefore, this trial would pass the checkpoint and
continue to the next stage.
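The decision for the example trial can be traced with a short Python sketch. The function names are hypothetical, and the boundary values (0 and 0.014) are simply taken from Table 4.1; the LIB20 boundary formula itself is not reproduced here.

```python
def information_level(observed_events, expected_events):
    """Fraction of the expected events available at the analysis."""
    return observed_events / expected_events


def rgray_action(log_hr, boundary):
    """Stop for futility when the log hazard ratio exceeds the boundary."""
    return "Accept Null" if log_hr > boundary else "Continue"


EXPECTED_EVENTS = 198  # denominator used in the text (79/198)

# Standard method: 99 events, boundary 0, log HR 0.08.
standard_action = rgray_action(0.08, 0.0)

# Personal cutback: 79 events, adjusted boundary 0.014 (from Table 4.1), log HR -0.01.
cutback_level = information_level(79, EXPECTED_EVENTS)
cutback_action = rgray_action(-0.01, 0.014)
```

Here `standard_action` is "Accept Null" and `cutback_action` is "Continue", matching the Action column of Table 4.1.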
4.2.3 Proportion of trials that are stopped for futility by RGray method with Personal
Cutback method
Figure 4.2.3.1. Proportion of trials that were stopped for futility under the Alternative by RGray
method with Personal Cutback method
Although the personal cutback method gave an unbiased log hazard ratio, and the monitoring
boundary had been adjusted according to the actual information level, we did not obtain the
expected proportion of trials stopped for futility. As shown in Fig 4.2.3.1, the proportion of
trials stopped early for futility was around 2.7% under perfect reporting (with a reporting
interval of 0.033 month). The proportion increased as the probability of late reporting
decreased when the personal cutback method was applied. The proportion was 3.5% for the
scenario with a 6-month reporting interval and a late reporting probability of 0. I suspected
that the distribution of the log hazard ratio had changed, so that more trials had a log
hazard ratio greater than the boundary value.
Figure 4.2.3.2. Proportion of trials that were stopped for futility under the Null by RGray method
with Personal Cutback method
The personal cutback method also caused unexpected changes in the proportion of trials
stopped for futility under the null. Fig 4.2.3.2 showed that the proportion decreased as the
reporting interval became longer. Within a fixed reporting interval, the proportion increased
as the probability of late reporting increased. Interestingly, unlike under the alternative,
where the lowest proportion of trials stopped early for futility was observed when p=1, the
lowest proportion under the null was observed when p=0.
4.2.4 The distribution of log hazard ratio at RGray analysis time with Personal Cutback
method
Figure 4.2.4.1. Box plot shows the distribution of log hazard ratio under the Alternative at
RGray analysis time with Personal Cutback method
The box plot indicated that, compared with the standard method, the variance of the log hazard
ratio at the RGray analysis time became larger with the personal cutback method, especially
when the probability of late event reporting was small.
Figure 4.2.4.2. Density plot shows the distribution of log hazard ratio under the Alternative at
RGray analysis time with Personal Cutback method
The density plot explained why the proportion of trials stopped for futility was not as
expected. Fig 4.1.4.2 showed that the bell-shaped curve shifted horizontally when the data
were analyzed with the standard method at the RGray analysis time, so the average log hazard
ratio changed across scenarios. In Fig 4.2.4.2, by contrast, the bell-shaped curve shifted
vertically and became more dispersed as p decreased, indicating higher variance. Therefore,
even though the expected value of the log hazard ratio estimate was the same across all
scenarios, the uncertainty about where the log hazard ratio would end up increased, and hence
the proportion of trials stopped early for futility increased as p decreased and the reporting
interval w increased. In this graph, the curve for p=0 had the heaviest tail, which explained
why it had the highest proportion of trials stopped early for futility. When p=0, the personal
cutback method discarded the largest proportion of observed events: all events were reported
without delay, so more events had been reported between the last visit and the RGray analysis
time. Although the boundary had been adjusted to a value greater than 0 to accommodate the
lower information level due to late events reporting, the proportion of trials with a log
hazard ratio larger than the adjusted LIB20 boundary was still greater than the expected
proportion under perfect reporting.
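To see why a larger variance alone can raise the stopping proportion, a rough normal approximation is helpful. The sketch below assumes that the estimated log hazard ratio is approximately normal with variance 4/d for d observed events under 1:1 allocation. This is a common textbook approximation and is an assumption made here for illustration, not the calculation used in the simulations. The mean of -0.40, the boundaries 0 and 0.014, and the event counts 99 and 79 are taken from the text and from the example trial.

```python
import math


def prob_above(boundary, mean_log_hr, events):
    """Approximate P(log HR estimate > boundary), assuming normality
    with standard deviation sqrt(4 / events) (assumed approximation)."""
    sd = math.sqrt(4.0 / events)
    z = (boundary - mean_log_hr) / sd
    # Upper tail of the standard normal distribution.
    return 0.5 * math.erfc(z / math.sqrt(2.0))


# 99 events with the unadjusted boundary 0.
p_full = prob_above(0.0, -0.40, 99)

# 79 events with the adjusted boundary 0.014.
p_cutback = prob_above(0.014, -0.40, 79)
```

Under these assumptions `p_cutback` is larger than `p_full`, even though the mean is unchanged and the boundary has moved to the right, which is the qualitative pattern described above.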
Figure 4.2.4.3. Box plot showed the distribution of log hazard ratio under the Null at RGray
analysis time with Personal Cutback method
The box plot indicated the presence of large variance of log hazard ratio estimates under the
null when using the personal cutback data processing method.
Figure 4.2.4.4. Density plot showed the distribution of log hazard ratio under the Null at RGray
analysis time with Personal Cutback method
The personal cutback method also changed the distribution of the log hazard ratio under the
null. The bell-shaped curve shifted vertically as the probability of late reporting changed, in
contrast to the overlapping bell-shaped curves with the standard data processing method (Fig
4.1.4.4). Fig 4.2.3.2 showed that the lowest proportion of trials stopped early for futility
under the null occurred when p=0. This is because the personal cutback method leads to the
lowest actual information level when p=0, so the highest adjusted LIB20 boundary is applied.
When the boundary is shifted from 0 to a small positive value on the right, the boundary
shifted farthest to the right results in the lowest proportion of trials stopped for futility.
4.2.5 Loss of power of RGray analysis with Personal Cutback method
Figure 4.2.5. Loss of power of RGray analysis with Personal Cutback method
As with the standard data processing method, the loss of power increased as the reporting
interval became longer and the probability of late events reporting became smaller. The
largest loss of power, 0.78%, was observed when the reporting interval was 6 months and every
event was reported without delay.
4.3 RGray analysis with Global Cutback method
The global cutback method, another data processing method introduced by Dr. McIlvaine in
2015 (McIlvaine, 2015), was also tested in the study. This method pushes the analysis cutoff
time back by one window width, and was therefore expected to give an unbiased estimate of the
log hazard ratio.
4.3.1 Log hazard ratio at RGray analysis time with Global Cutback method
Figure 4.3.1.1. Log hazard ratio at RGray analysis time with Global Cutback method under the
Alternative
As expected, the global cutback method achieved unbiased log hazard ratio across all scenarios
under the alternative.
Figure 4.3.1.2. Log hazard ratio at RGray analysis time with Global Cutback method under the
Null
The average log hazard ratio was close to 0 under the null across all scenarios.
4.3.2 Actual information level with Global Cutback Method
Figure 4.3.2.1. Actual Information Level under the Alternative by RGray method with Global
Cutback method
Figure 4.3.2.2. Actual Information Level under the Null by RGray method with Global Cutback
method
The actual information level when global cutback method was employed was even smaller than
that of personal cutback method, since global cutback method ignored many more events than
personal cutback method did. As before, the LIB20 boundaries were used to accommodate the
changes of actual information level.
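The difference in event counts between the two cutback methods can be illustrated with a small Python sketch. It rests on simplifying assumptions that are introduced here and are not a description of the simulation code: each patient is a hypothetical (enrollment time, event time) pair in calendar months, visits occur every reporting interval after enrollment, and the window width of the global cutback equals the reporting interval.

```python
import math


def last_visit(enroll, analysis_time, interval):
    """Calendar time of the last scheduled visit at or before the analysis."""
    n_visits = math.floor((analysis_time - enroll) / interval)
    return enroll + n_visits * interval


def personal_cutback_count(patients, analysis_time, interval):
    """Count events that occurred on or before each patient's own last visit."""
    return sum(
        1
        for enroll, event in patients
        if event is not None
        and event <= last_visit(enroll, analysis_time, interval)
    )


def global_cutback_count(patients, analysis_time, interval):
    """Count events that occurred on or before a common cutoff one window earlier."""
    cutoff = analysis_time - interval
    return sum(1 for _, event in patients if event is not None and event <= cutoff)


# Hypothetical patients: (enrollment time, event time or None if no event).
patients = [(0.0, 5.0), (1.0, 8.0), (2.0, 14.5), (3.0, None), (4.0, 15.5)]
analysis_time, interval = 16.0, 6.0

n_personal = personal_cutback_count(patients, analysis_time, interval)
n_global = global_cutback_count(patients, analysis_time, interval)
```

In this toy example the personal cutback keeps 3 of the 4 events and the global cutback keeps 2. Under the assumed visit schedule each patient's last visit falls later than the common cutoff, so the global count can never exceed the personal count.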
4.3.3 Proportion of trials that are stopped for futility by RGray method with Global
Cutback method
Figure 4.3.3.1. Proportion of trials that were stopped for futility with Global Cutback method
under the Alternative
However, the proportion of trials that were stopped for futility was still higher than that
under perfect reporting.
Figure 4.3.3.2. Proportion of trials that are stopped for futility with Global Cutback method
under the Null
The proportion of trials stopped early for futility under the null had the similar trend as was
observed with personal cutback method.
4.3.4 Distribution of log hazard ratio at RGray analysis time with Global Cutback
method
Figure 4.3.4.1. Box plot showed the distribution of log HR under the Alternative at RGray
analysis time with Global Cutback method
The estimate of log hazard ratio with global cutback method also had greater variance than that
of standard method.
Figure 4.3.4.2. Density plot shows the distribution of log HR under the Alternative at RGray
analysis time with Global Cutback method
The bell-shaped curve also shifted vertically, which explained the high proportion of trials
that were stopped early for futility under the scenario of low probabilities of late reporting.
Figure 4.3.4.3. Box plot shows the distribution of log HR under the Null at RGray analysis time
with Global Cutback method
The estimate of log hazard ratio under the null with global cutback method also had greater
variance than that of standard method.
Figure 4.3.4.4. Density plot shows the distribution of log HR under the Null at RGray analysis
time with Global Cutback method
Similar to what was observed with personal cutback method, the bell-shaped curve shifted
vertically under the null. The lowest proportion of early stopping for futility when p=0 can
be explained by the highest adjusted LIB20 boundary.
4.3.5 Loss of power of RGray analysis with Global Cutback method
Figure 4.3.5. Loss of power of RGray analysis with Global Cutback method
The loss of power across various scenarios showed similar pattern as observed under standard
data processing method and personal cutback method.
4.4 RGray analysis with Personal Cutback Wait method
We have shown that changes in the distribution of log hazard ratio estimates from simulated
trials affect the proportion of early stopping for futility. To obtain the desired proportion
of early rejections, we need an approach that yields the same distribution of the estimate of
the log hazard ratio regardless of the reporting interval width and the probability of delayed
reporting of an event. Because the number of observed events varied across simulated trials,
which led to distributions of the log hazard ratio with the same mean but differing variances,
we could delay the analysis time for futility monitoring until 50% of the expected events are
observed with the personal cutback method. To distinguish it from the personal cutback method
described previously, this 'new' method is called the personal cutback wait method.
4.4.1 RGray analysis time with Personal Cutback Wait method
Table 4.2. RGray monitored example trial w6p2-605: Comparison of Standard method,
Personal Cutback method and Personal Cutback Wait method
Method                 Analysis time  Number of        Actual             RGray     log HR  Action
                       (months)       observed events  Information Level  Boundary
Standard               26.18          99               0.5                0         0.08    Accept Null
Personal Cutback       26.18          79               0.4                0.014     -0.01   Continue
Personal Cutback Wait  28.70          99               0.5                0         0.05    Accept Null
Let us again use trial w6p2-605 as an example (Table 4.2). As mentioned before, the RGray
analysis time for this trial was 26.18 months, when 99 total events had been observed in both
arms with the standard data processing method. If the personal cutback method was used
instead, only 79 events would contribute to the hazard ratio estimation. The actual
information level would be 0.40 instead of 0.50, so the monitoring boundary needs to be
adjusted accordingly. With the personal cutback wait method, the analysis would be performed
at 28.70 months, at which time 99 total events were observed in both arms by the personal
cutback method. The log hazard ratio by the personal cutback wait method was 0.045, which was
higher than the boundary value of 0. Therefore, the trial would be stopped and inefficacy of
the experimental regimen concluded. The personal cutback wait analysis was performed 2.5
months later than the standard analysis.
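The idea of waiting until the target number of events is available under the personal cutback can be sketched as follows. This is a simplified illustration under assumptions introduced here, not the simulation code: each patient is a hypothetical (enrollment time, event time) pair in calendar months, visits occur every reporting interval after enrollment, and an event becomes usable at the first scheduled visit at or after it occurs.

```python
import math


def time_event_is_counted(enroll, event, interval):
    """First scheduled visit at or after the event (visits every `interval` months)."""
    n_visits = math.ceil((event - enroll) / interval)
    return enroll + n_visits * interval


def wait_analysis_time(patients, target_events, interval):
    """Earliest calendar time at which `target_events` events are counted
    under the personal cutback; None if the target is never reached."""
    counted_times = sorted(
        time_event_is_counted(enroll, event, interval)
        for enroll, event in patients
        if event is not None
    )
    if len(counted_times) < target_events:
        return None
    return counted_times[target_events - 1]


# Hypothetical patients: (enrollment time, event time or None if no event).
patients = [(0.0, 5.0), (1.0, 8.0), (2.0, 14.5), (3.0, None), (4.0, 15.5)]
target, interval = 3, 6.0

# Standard method: time of the third event itself.
standard_time = sorted(e for _, e in patients if e is not None)[target - 1]

# Personal cutback wait: time at which three events are counted after cutback.
wait_time = wait_analysis_time(patients, target, interval)
```

In this toy example the third event occurs at 14.5 months, but three events are only counted under the personal cutback at 16.0 months, so the analysis is delayed by 1.5 months.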
Figure 4.4.1.1. Average wait time under the Alternative with Personal Cutback Wait method
The wait time varied as the reporting interval and the probability of late reporting changed.
As the reporting interval became longer and the late reporting probability became smaller, the
wait time became longer. There was no wait time under perfect reporting or when the
probability of late event reporting was 1. The wait time was around 3.8 months when the
reporting interval was 6 months and the late reporting probability was 0.
Figure 4.4.1.2. Density plot showed the distribution of empirical RGray analysis time under the
Alternative with Personal Cutback Wait method
The density plot of the RGray analysis time with the personal cutback wait method indicated
that the analysis time was not affected by differences in the probability of late events
reporting. It was reasonable that the analysis time became later as the reporting interval
increased, since with a longer reporting interval events are reported less frequently, and
therefore a longer wait is needed to obtain the desired number of events.
Figure 4.4.1.3. Average wait time under the Null with Personal Cutback Wait method
The wait time under the null had the same trend as the wait time under the alternative.
Figure 4.4.1.4. Density plot showed the distribution of empirical RGray analysis time under the
Null with Personal Cutback Wait method
The RGray analysis time with personal cutback wait method under the null showed the same
pattern as under the alternative.
4.4.2 Log hazard ratio at RGray analysis time with Personal Cutback Wait method
Figure 4.4.2.1. Log hazard ratio at RGray analysis time with Personal Cutback Wait method
under the Alternative
As expected, the personal cutback wait method resulted in very similar estimates of log hazard
ratio across all scenarios. The average was approximately -0.4.
Figure 4.4.2.2. Log hazard ratio at RGray analysis time with Personal Cutback Wait method
under the Null
The average log hazard ratio under the null was very close to the expected value 0 across
all scenarios.
4.4.3 Distribution of log hazard ratio at RGray analysis time with Personal Cutback
Wait method
Figure 4.4.3.1. Box plot showed the distribution of log HR under the Alternative at RGray
analysis time with Personal Cutback Wait method
The box plot demonstrated the distribution of log hazard ratio obtained from personal cutback
wait method. The variance appeared quite small across all scenarios.
Figure 4.4.3.2. Density plot showed the distribution of log hazard ratio under Alternative at
RGray analysis time with Personal Cutback Wait method
The density plot showed that the bell-shaped curves perfectly overlapped for all scenarios.
This explained why we get consistent proportion of trials that were stopped early for futility
under all scenarios.
Figure 4.4.3.3. Box plot showed the distribution of log HR under the Null at RGray analysis
time with Personal Cutback Wait method
The distribution of log hazard ratio under the null did not differ substantially across the
various scenarios.
Figure 4.4.3.4. Density plot showed the distribution of log hazard ratio under the Null at RGray
analysis time with Personal Cutback Wait method
The bell-shaped density plot overlapped for all scenarios under the null.
4.4.4 Proportion of trials that are stopped for futility by RGray method with Personal
Cutback Wait method
Figure 4.4.4.1. Proportion of trials that were stopped for futility with Personal Cutback Wait
method under the Alternative
Now we achieved very similar proportion of trials stopped early for futility for all scenarios,
which was about 2.7%. That was the proportion we observed under the perfect reporting (with
0.033 month reporting interval).
Figure 4.4.4.2. Proportion of trials that were stopped for futility with Personal Cutback Wait
method under the Null
Under the null, we also observed similar proportions of early stopped trials across all the
scenarios, which was 51%. That was the proportion we observed under the perfect reporting
(with 0.033 month reporting interval).
4.4.5 Loss of power of RGray analysis with Personal Cutback Wait method
Figure 4.4.5. Loss of power of RGray analysis with Personal Cutback Wait method
With the personal cutback wait method, the loss of power was not affected by delayed events
reporting. It was 0.4% across all scenarios, which suggested that loss of power was not a
concern when interim futility monitoring was performed with the RGray method in the presence
of delayed events reporting.
4.5 RCI futility monitoring with Standard data processing method
Having extensively explored futility monitoring with the RGray method, I next examined the
effect of late events reporting on futility monitoring with another widely used method,
repeated confidence intervals (RCI). The RCI at each interim stage was calculated with
equation (2.1.2.2). The trial would be stopped for futility if the lower bound of the RCI was
greater than the lower alternative reference, which was -0.4054 in my study. Table 4.3 shows
an example of monitoring trial w6p2-605 with the RCI method. The trial was monitored at the
25%, 50% and 75% information levels. The lower RCI at these monitoring times was -1.06, -0.43
and -0.38, respectively. The lower RCI first exceeded the lower alternative reference of
-0.4054 at stage 3, when the information level was 75% and the lower RCI was -0.38. Therefore,
the trial would be stopped at stage 3 with the conclusion that the alternative therapy was not
sufficiently efficacious.
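The RCI stopping rule for the example trial can be written as a small Python sketch. The function name is hypothetical, and the lower RCI values are entered directly from the text rather than computed from equation (2.1.2.2).

```python
LOWER_ALT_REF = -0.4054  # lower alternative reference used in this study


def first_futility_stage(lower_rcis, reference=LOWER_ALT_REF):
    """Return the first interim stage (1-based) at which the lower RCI
    exceeds the reference, or None if the trial is never stopped early."""
    for stage, lower in enumerate(lower_rcis, start=1):
        if lower > reference:
            return stage
    return None


# Lower RCI of trial w6p2-605 at the 25%, 50% and 75% information levels.
stop_stage = first_futility_stage([-1.06, -0.43, -0.38])
```

For these values `stop_stage` is 3, in agreement with the stage at which the example trial was stopped.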
Table 4.3. RCI monitored example trial w6p2-605 with Standard data processing method
Method    Stage  Analysis time  Number of        Actual             log HR  Lower RCI  Lower Alt Ref  Action
                 (months)       observed events  Information Level
Standard  1      16.48          50               0.25               -0.23   -1.06      -0.405         Continue
Standard  2      26.18          99               0.50               0.08    -0.43      -0.405         Continue
Standard  3      37.83          148              0.75               -0.009  -0.38      -0.405         Accept Null
Standard  4      60             197              1                  -0.05   -0.34      -0.405
In this section, I performed three interim analyses at the 25%, 50% and 75% information
levels and compared the proportion of trials stopped early for futility by RCI using three
different data processing methods: the standard method, the personal cutback method, and the
personal cutback wait method.
4.5.1 Log hazard ratio with Standard data processing method
Figure 4.5.1.1. Log hazard ratio under the Alternative with Standard data processing method
The plot above shows the average empirical log hazard ratio at each of the 4 stages in each
scenario with the standard data processing method. When the reporting interval was 0.033
month, which represents perfect reporting, the average log hazard ratio was not affected by the
evaluation stage or the probability of late reporting. As the reporting interval became
longer, scenarios with a lower probability of late reporting showed higher estimates of the
log hazard ratio. At the same time, the average log hazard ratio at earlier interim monitoring
stages also became larger.
Figure 4.5.1.2. Log hazard ratio under the Null with Standard data processing method
The empirical value of the log hazard ratio under the null did not seem to be affected by the
reporting interval or the probability of late events reporting. However, the estimates
obtained at early stages tended to be larger than those at later interim stages, although the
differences were very subtle. All the estimates were quite close to 0, which was the expected
value under the null.
4.5.2 Proportion of trials that are stopped for futility by RCI method with Standard data
processing method
Figure 4.5.2.1. Proportion of trials that were stopped for futility with RCI method under the
Alternative with Standard data processing method
With the standard data processing method, the proportion of trials that were stopped early for
futility under the alternative based on RCI method increased as the reporting interval became
larger. Within a fixed reporting interval, this proportion increased as the probability of late
reporting got smaller.
Figure 4.5.2.2. Proportion of trials that are stopped for futility with RCI method under the Null
with Standard data processing method
The proportion of trials that were stopped early for futility based on RCI method was con-
sistent across all scenarios. It did not change as the reporting interval or the probability of late
events reporting changed.
4.5.3 Distribution of Lower Repeated CI with Standard data processing method
Figure 4.5.3.1. Distribution of Lower Repeated CI under the Alternative with Standard data
processing method
The density plot showed a horizontal shift at all three interim stages. The shift was most
prominent at stage 1, when the information level was 25%. The lowest probability of late events
reporting (p=0) shifted the curve farthest to the right, which led to a higher proportion of
trials with a lower RCI larger than -0.4054. That was why we observed the highest proportion
of trials stopped for futility when p was 0 in Fig 4.5.2.1.
Figure 4.5.3.2. Distribution of Lower Repeated CI under the Null with Standard data processing
method
The overlapping density plots across all scenarios under the null explained why the
proportion of trials stopped early for futility was consistent for all scenarios, as shown in
Fig 4.5.2.2.
4.5.4 Loss of power of RCI method with Standard data processing method
Figure 4.5.4. Loss of power of RCI method with Standard data processing method
When the repeated confidence interval method was deployed for interim futility monitoring, the
loss of power under perfect reporting was 0.08%, which was close to the theoretical value 0.1%
as calculated in section 3.6. The highest loss of power was observed when the reporting interval
was 6 months and none of the events were reported with delay , which was 0.3%. The loss of
power with RCI method was much smaller than that of RGray method, which was consistent
with previous report that RCI method had the lowest loss of power compared to other futility
monitoring methods (Freidlin et al., 2010).
4.6 RCI futility monitoring with Personal Cutback method
The error spending boundary was adjusted according to the actual information level when the
personal cutback method was applied. The repeated confidence intervals were recalculated
accordingly.
Table 4.4. RCI monitored example trial w6p2-605: Comparison of Standard method versus
Personal Cutback method
Method            Stage  Analysis time  Number of        Actual             log HR  Lower RCI  Lower Alt Ref  Action
                         (months)       observed events  Information Level
Standard          1      16.48          50               0.25               -0.23   -1.06      -0.405         Continue
Standard          2      26.18          99               0.50               0.08    -0.43      -0.405         Continue
Standard          3      37.83          148              0.75               -0.009  -0.38      -0.405         Accept Null
Standard          4      60             197              1                  -0.05   -0.34      -0.405
Personal Cutback  1      16.48          35               0.17               -0.24   -1.33      -0.405         Continue
Personal Cutback  2      26.18          79               0.40               -0.01   -0.62      -0.405         Continue
Personal Cutback  3      37.83          132              0.67               -0.04   -0.45      -0.405         Continue
Personal Cutback  4      60             197              1                  -0.05   -0.34      -0.405         Accept Null
Table 4.4 shows an example of how the personal cutback method affects futility monitoring
with the RCI method. The method changed the information level and hence the lower RCI, which
allowed the trial to continue to the end.
4.6.1 Log hazard ratio with Personal Cutback method
Figure 4.6.1.1. Log hazard ratio under the Alternative with Personal Cutback method
The empirical log hazard ratios under the alternative obtained with the personal cutback
method were consistent across all scenarios, in contrast to what was observed with the
standard method in Fig 4.5.1.1.
Figure 4.6.1.2. Log hazard ratio under the Null with Personal Cutback method
The empirical log hazard ratios under the null obtained with the personal cutback method were
consistent across all scenarios despite some subtle differences.
4.6.2 Proportion of trials that are stopped for futility by RCI method with Personal
Cutback method
Figure 4.6.2.1. Proportion of trials that were stopped for futility with RCI method under the
Alternative with Personal Cutback method
The personal cutback method led to an inconsistent proportion of trials stopped early for
futility based on the RCI rule. The proportion became smaller as the probability of late
reporting decreased. This is because the personal cutback method always looks at an earlier
information time than planned, since it drops some information from the analysis.
Figure 4.6.2.2. Proportion of trials that were stopped for futility with RCI method under the
Null with Personal Cutback method
Fig 4.6.2.2 showed a clear linear trend of change in the proportion of trials that were stopped
early for futility based on RCI method under the null with personal cutback method. The
proportion decreased as the reporting interval increased. Within a fixed reporting interval,
the proportion increased as the probability of late reporting increased.
4.6.3 Distribution of Lower Repeated CI with Personal Cutback method
Figure 4.6.3.1. Distribution of Lower Repeated CI under the Alternative with Personal Cutback
method
The density plot showed an interesting shift pattern. When p=1, the bell-shaped curve was in
about the same location for all reporting intervals. As p became smaller, the curve shifted
horizontally to the left and became more dispersed at the same time. This shift resulted in a
smaller proportion of trials with a lower RCI larger than -0.4054. Therefore, the proportion of
trials stopped early for futility decreased as the probability of late events reporting
decreased, as shown in Fig 4.6.2.1.
Figure 4.6.3.2. Distribution of Lower Repeated CI under the Null with Personal Cutback method
The density plot under the null had the same shift pattern as that under the alternative. The
proportion of trials with a lower RCI larger than -0.4054 was much higher than under the
alternative (~60% vs. ~1.9%), so the linear trend was much more obvious under the null, as
shown in Fig 4.6.2.2.
4.6.4 Loss of power of RCI method with Personal Cutback method
Figure 4.6.4. Loss of power of RCI method with Personal Cutback method
The loss of power of RCI futility monitoring with the personal cutback method did not show an
obvious pattern across the various scenarios. The average loss of power was about 0.1%, which
was very small.
Table 4.5. RCI monitored example trial w6p2-605: Comparison of Standard method, Personal
Cutback method and Personal Cutback Wait method
Method                 Stage  Analysis time  Number of        Actual             log HR  Lower RCI  Lower Alt Ref  Action
                              (months)       observed events  Information Level
Standard               1      16.48          50               0.25               -0.23   -1.06      -0.405         Continue
Standard               2      26.18          99               0.50               0.08    -0.43      -0.405         Continue
Standard               3      37.83          148              0.75               -0.009  -0.38      -0.405         Accept Null
Standard               4      60             197              1                  -0.05   -0.34      -0.405
Personal Cutback       1      16.48          35               0.17               -0.24   -1.33      -0.405         Continue
Personal Cutback       2      26.18          79               0.40               -0.01   -0.62      -0.405         Continue
Personal Cutback       3      37.83          132              0.67               -0.04   -0.45      -0.405         Continue
Personal Cutback       4      60             197              1                  -0.05   -0.34      -0.405         Accept Null
Personal Cutback Wait  1      18.86          50               0.25               -0.23   -1.06      -0.405         Continue
Personal Cutback Wait  2      28.70          99               0.50               0.05    -0.46      -0.405         Continue
Personal Cutback Wait  3      41.95          148              0.75               -0.07   -0.44      -0.405         Continue
Personal Cutback Wait  4      60             197              1                  -0.05   -0.34      -0.405         Accept Null
4.7 RCI futility monitoring with Personal Cutback Wait method
Table 4.5 shows the results of RCI futility monitoring with different data processing
methods. The RCI value was calculated from the observed hazard ratio and the error-spending
boundary at each interim stage. With the standard method, the lower RCI at stage 3 was bigger
than the lower alternative reference value; therefore, the trial was stopped at stage 3 to declare
futility. With the personal cutback and personal cutback wait methods, the RCIs were
recalculated based on the actual information level after the personal cutback was applied. The
recalculated lower RCIs were always smaller than the lower alternative reference value at the
interim stages; therefore, the trial was allowed to proceed to the end of the study.
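The stopping rule described above can be sketched in a few lines of Python. This is only an illustration of the comparison between the lower RCI and the lower alternative reference, using the lower RCI values from Table 4.5; the function name `first_futility_stage` is hypothetical, and the computation of the RCI itself (from the hazard ratio and the error-spending boundary) is not reproduced here.

```python
from typing import Optional, Sequence

# Lower alternative reference on the log hazard ratio scale (value from Table 4.5).
LOWER_ALT_REF = -0.405


def first_futility_stage(lower_rci: Sequence[float],
                         ref: float = LOWER_ALT_REF) -> Optional[int]:
    """Return the 1-based stage at which the lower RCI first exceeds ref, else None."""
    for stage, bound in enumerate(lower_rci, start=1):
        if bound > ref:
            return stage
    return None


# Lower RCI values copied from Table 4.5 (stages 1 to 4; stage 4 is the final analysis).
table_4_5 = {
    "Standard": [-1.06, -0.43, -0.38, -0.34],
    "Personal Cutback": [-1.33, -0.62, -0.45, -0.34],
    "Personal Cutback Wait": [-1.06, -0.46, -0.44, -0.34],
}

for method, bounds in table_4_5.items():
    print(method, first_futility_stage(bounds))
```

With these inputs the standard method crosses the reference at stage 3, while both cutback variants reach stage 4 (the final analysis) before the lower RCI exceeds -0.405.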
4.7.1 Log hazard ratio with Personal Cutback Wait method
Figure 4.7.1.1. Log hazard ratio under the Alternative with Personal Cutback Wait method
The empirical hazard ratios under the alternative obtained with the personal cutback wait method
were consistent across all scenarios. Compared with the late interim stages, the estimate of the log
hazard ratio obtained at the early interim stage was slightly bigger. The difference was much smaller
than what was observed under the standard data processing method.
Figure 4.7.1.2. Log hazard ratio under the null with Personal Cutback Wait method
The empirical hazard ratios under the null obtained with the personal cutback wait method were
consistent across all scenarios. The log hazard ratios obtained at earlier stages tended to be bigger
than those obtained at late stages, though the differences were very small.
4.7.2 Proportion of trials that are stopped for futility by RCI method with Personal
Cutback Wait method
Figure 4.7.2.1. Proportion of trials that were stopped for futility with RCI method under the
Alternative with Personal Cutback Wait method
The proportion of trials stopped early for futility under the alternative with the RCI method was
consistent across all scenarios when the personal cutback wait method was used. The subtle
differences between reporting intervals could be safely ignored.
Figure 4.7.2.2. Proportion of trials that are stopped for futility with RCI method under the Null
with Personal Cutback Wait method
The proportion of trials stopped early for futility under the null with the RCI method was
the same across all scenarios when the personal cutback wait method was used, at
around 60%.
4.7.3 Distribution of Lower Repeated CI with Personal Cutback Wait method
Figure 4.7.3.1. Distribution of Lower Repeated CI under the Alternative with Personal Cutback
Wait method
Within each interim stage, the bell-shaped curves overlapped perfectly for all scenarios, which
explained the consistent proportion of trials stopped early for futility across all scenarios,
as shown in Fig 4.7.2.1.
Figure 4.7.3.2. Distribution of Lower Repeated CI under the Null with Personal Cutback Wait
method
Similar to what was observed under the alternative, the bell-shaped curves overlapped
perfectly for all scenarios within each interim stage under the null, which explained the consistent
proportion of trials stopped early for futility across all scenarios, as shown in Fig 4.7.2.2.
4.7.4 Loss of power of RCI method with Personal Cutback Wait Method
Figure 4.7.4. Loss of power of RCI method with Personal Cutback Wait method
With the personal cutback wait method, the loss of power was not affected by delayed reporting,
and the average value was around 0.1% across all scenarios, which was very small. The results
indicated that loss of power was not a concern when interim futility monitoring was performed
with the RCI method in the presence of delayed events reporting.
4.8 Weibull distribution
Since most of my work is based on an exponential setting, I next checked whether my method can
be applied to a Weibull setting. The probability density function (PDF) of a Weibull distribution
is:

$$f(x; \lambda, k) = \frac{k}{\lambda}\left(\frac{x}{\lambda}\right)^{k-1} e^{-(x/\lambda)^{k}}, \quad x \ge 0$$

where $k$ is the shape parameter and $\lambda$ is the scale parameter. It reduces to the
exponential distribution when $k = 1$. When the scale parameter $\lambda$ is fixed and the shape
parameter $k = 0.5$, the failure rate of the Weibull distribution decreases over time
(Fig 4.8.1). This means that patients fail quickly early on and the failure rate becomes
smaller and smaller later. This can be used to mimic the survival characteristics of pediatric
cancers, which have a high risk of events immediately after diagnosis, with the risk of failure
attenuating as time goes by.
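As a quick numerical check of the statement about the failure rate, the sketch below evaluates the Weibull hazard $h(x) = (k/\lambda)(x/\lambda)^{k-1}$ at a few time points. The function names are hypothetical, and the scale of 20 and the time grid are illustrative assumptions rather than part of the derivation.

```python
import math


def weibull_hazard(x: float, scale: float, shape: float) -> float:
    """Weibull hazard h(x) = (k / lambda) * (x / lambda) ** (k - 1), for x > 0."""
    return (shape / scale) * (x / scale) ** (shape - 1)


def weibull_pdf(x: float, scale: float, shape: float) -> float:
    """Weibull density f(x) = h(x) * exp(-(x / lambda) ** k), for x > 0."""
    return weibull_hazard(x, scale, shape) * math.exp(-((x / scale) ** shape))


times = [1, 6, 12, 24, 48]  # illustrative time points (months)

# Shape 0.5: the hazard falls as time goes by.
decreasing = [weibull_hazard(t, scale=20, shape=0.5) for t in times]

# Shape 1: the hazard is constant at 1 / scale, which is the exponential case.
constant = [weibull_hazard(t, scale=20, shape=1.0) for t in times]

print(decreasing)
print(constant)
```

With shape 0.5 the printed hazards decrease monotonically, and with shape 1 they all equal 1/20, consistent with the reduction to the exponential distribution.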
Figure 4.8.1. Weibull distribution
I used the following parameters to simulate a Weibull distribution dataset: scale = 20,
shape = 0.5, and HR = 0.63. The change in the baseline hazard rate over time is shown in
Fig 4.8.2. Each simulated trial recruited 204 patients in 48 months and finished 12 months of
follow-up after the last patient entered the study. I expected to observe 148 events by the end
of the study.
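A minimal sketch of how event times with these parameters could be drawn is given below. It is not the simulation code used in this work. It assumes proportional hazards, so that the experimental arm keeps the same shape and has scale $\lambda \cdot HR^{-1/k}$, and it assumes 1:1 randomization (102 patients per arm). Accrual, censoring and reporting delays are omitted, and all function names are hypothetical.

```python
import math
import random


def treatment_scale(scale: float, shape: float, hazard_ratio: float) -> float:
    """Scale of a Weibull arm whose hazard is hazard_ratio times the baseline hazard."""
    return scale * hazard_ratio ** (-1.0 / shape)


def draw_weibull(rng: random.Random, scale: float, shape: float) -> float:
    """Inverse-transform draw: T = scale * (-log U) ** (1 / shape)."""
    u = 1.0 - rng.random()  # lies in (0, 1], which avoids log(0)
    return scale * (-math.log(u)) ** (1.0 / shape)


def simulate_arm(n: int, scale: float, shape: float, seed: int) -> list:
    """Draw n event times (months from enrollment) for one arm."""
    rng = random.Random(seed)
    return [draw_weibull(rng, scale, shape) for _ in range(n)]


SCALE, SHAPE, HR = 20.0, 0.5, 0.63  # parameters quoted in the text

control = simulate_arm(102, SCALE, SHAPE, seed=1)
experimental = simulate_arm(102, treatment_scale(SCALE, SHAPE, HR), SHAPE, seed=2)
```

Rescaling the experimental arm rather than changing its shape keeps the hazard ratio constant over time, which is the assumption behind a single HR of 0.63.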
Figure 4.8.2. Baseline hazard rate changes over time since time of enrollment
Figure 4.8.3. Comparison of RGray analysis time for Standard/Personal Cutback method vs.
Personal Cutback Wait method
I calculated the RGray analysis time (the 50% information time) in each
simulated trial for the standard method, personal cutback method, and personal cutback wait
method. The standard method and the personal cutback method were performed at the same
calendar time. A reporting interval of 0.033 months indicates that patient status was
reported every day, which represents a perfect reporting condition. Under perfect reporting,
the RGray analysis time was around 35.3 months, regardless of the probability of late
events reporting. As the reporting interval became larger and the probability of late events
reporting became bigger, the RGray analysis time also increased. The longest RGray analysis
time, 38.9 months, was observed when the reporting interval was 6 months and the
probability of late events reporting was 1. The reporting interval 3.6 means that the patients
were scheduled to report every 3 months in the first two years, then every 6 months thereafter.
Similarly, the reporting interval 6.3 means that the patients were scheduled to report every 6
months for the first two years, and every 3 months thereafter. For convenience, I will refer to
these two conditions as reporting intervals of 3.6 months and 6.3 months. The RGray
analysis time was very close for reporting intervals of 3 months and 3.6 months, and very
close for reporting intervals of 6 months and 6.3 months. This indicated that this characteristic of
the RGray method was driven by what happened in the first two years. Lengthening the reporting
interval from 3 months to 6 months after 2 years of follow-up did not materially affect
the RGray time. That means that if patients reported their status every 6 months in the first two
years and the researchers felt that they needed more information, it could not be made up by asking
the patients to report every 3 months thereafter. The RGray analysis time for the personal cutback
wait method was not affected by delayed events reporting. The analysis time increased as the
reporting interval became larger. It was 35.3 months under perfect reporting and 39 months
when the reporting interval was 6 months.
Figure 4.8.4. Density plot of RGray analysis time for Standard/Personal Cutback method vs.
Personal Cutback Wait method
The smoothed density plot showed the distribution of the RGray analysis time. The
bell-shaped curve suggested an approximately normal distribution of the 50% information time
across the 5000 simulated trials.
Figure 4.8.5. The extra waiting time for Personal Cutback Wait method
I next examined how long we should wait to perform the personal cutback wait method.
The extra waiting time was defined as the difference in RGray analysis time between the
standard/personal cutback method and the personal cutback wait method. There was no extra
waiting time under the perfect reporting condition. The extra waiting time increased when the
reporting interval became larger and the probability of late events reporting became smaller.
The longest extra waiting time was 3.77 months, which was observed when the reporting interval
was 6 months and the probability of late events reporting was 0.
Figure 4.8.6. Scatter plot of log hazard ratio with different data processing methods
The scatter plot showed that with the standard method, as the reporting interval became larger
and the probability of late events reporting became smaller, the difference in log hazard ratio
increased. Under perfect reporting, the log hazard ratio was -0.47. The biggest log hazard ratio,
-0.461, was observed when the reporting interval was 6 months and the probability of late events
reporting was 0. The difference diminished when the personal cutback method or the personal cutback
wait method was employed for data processing.
Figure 4.8.7. Density plot of log hazard ratio with different data processing methods
Similar to what was observed under the exponential setting, when the probability of late events
reporting became smaller, the bell-shaped curve shifted horizontally to the right with the standard
method and flattened vertically with the personal cutback method. The curves
overlapped perfectly with the personal cutback wait method.
Figure 4.8.8. Proportion of trials stopped for futility with different data processing methods
Under perfect reporting, the proportion of trials stopped early for futility by the RGray method
was 2.26%. The proportion increased as the reporting interval became larger and the probability
of late events reporting became smaller. With the standard method, the biggest proportion of
futility trials was 2.64%. With the personal cutback method, the biggest proportion of futility trials
was 3.14%. Both were observed when the reporting interval was 6 months and the
probability of late events reporting was 0. With the personal cutback wait method, the proportion
of futility trials was not affected by delayed events reporting.
4.9 Conclusions
In this study, we explored whether the futility monitoring of phase III clinical trials can be
affected by the length of reporting intervals and the probability of late events reporting. The
results showed that if the treatment arm was not superior to the current standard therapy, in
which case early stopping to declare futility was desired, different reporting intervals
and delayed events reporting would not affect the estimate of the hazard ratio or the proportion
of trials stopped for futility with the standard data processing method. If the treatment
arm was superior to the current standard therapy, in which case we would not want to stop the
trial early to declare futility, increasing reporting intervals and a decreasing probability of late
events reporting could lead to false declarations of futility too often. An attempted solution
to this problem was the personal cutback data processing method, which could obtain an
unbiased estimate of the hazard ratio. However, although the average hazard ratio was unbiased,
the number of observed events in each trial varied, which led to a more dispersed distribution
of the empirical hazard ratio and a lower cutoff value when the repeated confidence interval method
was used. The personal cutback method led to an increased probability of rejection for futility
under the alternative and a decreased probability of rejection for futility under the null when the RGray
method was used for interim monitoring. When the repeated confidence interval method was used,
the probability of rejection for futility decreased under both the alternative and the
null. To solve this problem, the actual analysis time should be determined by the personal cutback
wait method. This implies that the interim analysis should not be conducted until the expected
number of events is observed when the personal cutback method is applied to the maturing data.
The method obtains an unbiased estimate of the log hazard ratio, and the frequency with which
the trial is stopped, and therefore the effect on power, is what would be expected theoretically
with perfect reporting, irrespective of the width of reporting intervals and the probability of late
events reporting.
My work revealed that in order to get an accurate probability of futility declaration, two key
conditions must be met: independent censoring and the correct number of events. The standard method
failed to meet the condition of independent censoring. During the time frame between the last
visit time and the analysis time, only events had a chance to be recorded. Patients without events
had no chance to contribute information to the survival time. This was where the
bias was introduced, leading to inaccurate estimation of the hazard ratio. The personal cutback
method and global cutback method met the independent censoring condition. Therefore, both
of them could obtain unbiased hazard ratio estimates. However, the number of events varied
across trials, which introduced variance into the hazard ratio estimation. When events were
reported without delay (p=0) and the reporting interval was large (6 months in this study),
many more events fell between the last visit time and the analysis time and were
ignored by the personal cutback method. This caused the largest variance of the hazard ratio
distribution, and hence the largest probability of being stopped to declare futility. The personal
cutback wait method met both conditions, which explained why it could obtain an unbiased
estimate of the hazard ratio and predictable futility probabilities. Similar results were observed in
both the exponential setting and the Weibull setting (section 4.8).
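The difference between the standard and personal cutback methods in how one patient's record is handled at an interim analysis can be sketched as follows. This is a simplified reading of the description above, not the processing code used in this work: it assumes a fixed reporting interval, visits scheduled from the enrollment date, events reported without delay (p=0) under the standard method, and calendar times in months. All function names are hypothetical.

```python
import math


def last_visit(entry: float, interval: float, analysis_time: float) -> float:
    """Calendar time of the last scheduled visit at or before the analysis time."""
    visits = math.floor((analysis_time - entry) / interval)
    return entry + visits * interval


def standard_record(entry, event_time, interval, analysis_time):
    """Standard method with p = 0: events up to the analysis time are kept,
    while event-free patients are censored at their last visit."""
    if event_time <= analysis_time:
        return (event_time - entry, 1)
    return (last_visit(entry, interval, analysis_time) - entry, 0)


def personal_cutback_record(entry, event_time, interval, analysis_time):
    """Personal cutback: every patient is cut back to the last visit,
    so events after that visit are ignored at this analysis."""
    cutoff = last_visit(entry, interval, analysis_time)
    if event_time <= cutoff:
        return (event_time - entry, 1)
    return (cutoff - entry, 0)


# Patient enrolled at month 0, reporting every 6 months, analysis at month 16.48,
# with an event at month 14 (after the last visit at month 12).
print(standard_record(0, 14, 6, 16.48))          # (follow-up time, event indicator)
print(personal_cutback_record(0, 14, 6, 16.48))
```

In this example the standard method counts the event at month 14, while event-free patients with the same schedule contribute follow-up only to month 12; the personal cutback treats both kinds of patient the same way by censoring at month 12. The personal cutback wait method would additionally delay the analysis until the cut-back data contain the planned number of events.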
In this study, the reporting interval was set to be constant in most simulation settings.
In reality, patients in periods of reduced risk of events are likely to have longer
reporting intervals, as they are not required to report as often as patients with severe illness.
Therefore, I also examined trials with a 3-month reporting interval in the first two years and
a 6-month reporting interval thereafter (called reporting interval 3.6 months in this
study), and trials with a 6-month reporting interval in the first two years and a 3-month reporting
interval thereafter (called reporting interval 6.3 months in this study). It turned out
that the empirical log hazard ratio was very close for reporting intervals of 3 months and 3.6 months,
and very close for reporting intervals of 6 months and 6.3 months (Fig 4.8.6). This indicated
that the characteristics of the analysis were driven by what happened in the first two years, which
was the high-risk period by study design. Lengthening the reporting interval from 3 months to 6
months after 2 years of follow-up did not materially affect the log hazard ratio estimation. That
means that if patients reported their status every 6 months in the first two years and the researchers
felt that they needed more information, it could not be made up by asking the patients to report every
3 months thereafter. Most pediatric cancers have a high failure risk in the first several years and
a low failure risk thereafter. My study indicated that it was appropriate to ask patients to
report more often during the high-risk period and less often during the low-risk period to save
resources, yielding a treatment effect estimate similar to that obtained when patients
report often throughout the entire trial.
It is worth noting that the absolute proportion of trials stopped for futility under the
alternative did not change substantially in the presence of delayed reporting (2.7% under perfect
reporting versus 3.3% with delayed reporting), although the relative increase was 22%. This work
reassures researchers that they need not worry too much about delayed reporting
in terms of futility monitoring. If they are comfortable accepting this small error, they
could proceed with the standard data processing method. Otherwise, the personal cutback
wait method would provide a more precise decision, at the price of several months of extra
waiting time. However, if the experimental therapy was inferior to the standard therapy, the extra
waiting time would enroll more patients and expose more patients to an inferior
therapy. These trade-offs should be weighed carefully when making the decision.
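For reference, the relative increase quoted above follows directly from the two proportions:

$$\frac{3.3\% - 2.7\%}{2.7\%} = \frac{0.6}{2.7} \approx 0.22$$

that is, an absolute difference of 0.6 percentage points corresponds to a relative increase of about 22%.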
This study used a baseline hazard rate of 1/3 for all exponential settings. It corresponded to
a median survival time of 2 months, which was similar to the usual condition of most poor-risk
adult cancers. As shown in Fig 3.7.1, as the baseline hazard rate became smaller, the estimate
of the log hazard ratio became more and more accurate. This might be because a low hazard
rate corresponded to a low INIC (index of non-independent censoring, Fig 3.7.3), regardless
of the probability of late events reporting. Non-independent censoring was the source of
bias when estimating the empirical hazard ratio. Therefore, a low baseline hazard rate could
lead to less biased hazard ratio estimation. When the baseline hazard rate was 0.01, which
reflected a usual median survival time of 5.8 years in pediatric cancers, the bias of the log hazard
ratio estimate was less than 1%. Therefore, delayed events reporting was not a concern in
pediatric cancers. However, it might be a problem in poor-risk adult cancers and needs special
attention.
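The two median survival times quoted above can be checked from the exponential median, assuming the hazard rates are expressed per month. For an exponential distribution with hazard rate $\lambda$, the survival function is $S(t) = e^{-\lambda t}$, so the median $m$ solves $S(m) = 0.5$:

$$m = \frac{\ln 2}{\lambda}, \qquad \lambda = \tfrac{1}{3}: \; m = 3\ln 2 \approx 2.1 \text{ months}, \qquad \lambda = 0.01: \; m = 100\ln 2 \approx 69.3 \text{ months} \approx 5.8 \text{ years}.$$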
Chapter 5: Application to Real Data
In this chapter I apply the two futility monitoring approaches, the RGray method and the RCI
method, to perform interim futility monitoring with three different data processing methods for
four bone tumor trials, AEWS0031, AEWS1031, AEWS1221 and INT0091, which were carried
out by member institutions of the Children's Oncology Group. These were phase III randomized
clinical trials that aimed to evaluate the efficacy of newly designed chemotherapy compared
with the standard chemotherapy for the treatment of Ewing sarcoma. The primary study endpoint was
event-free survival, which was the time from study entry until disease progression, occurrence of a
second malignancy, death without disease progression or last follow-up, whichever occurred
first. If the patient was alive and did not experience any of the listed events, the patient was
considered censored for the analysis. Otherwise, the patient was considered to have an
event. The performance of the two futility monitoring methods is evaluated with three
different data processing methods: the standard method, the personal cutback method and the
personal cutback wait method as proposed in this research.
The interim analysis time was defined as the time at which the expected number of events was
observed. In order to calculate the last visit date before each interim analysis time, the
reporting intervals for AEWS0031, AEWS1031 and INT0091 were assumed to be 3 months.
AEWS1221 recorded the date of each visit. The recorded visit dates were compared with the
interim analysis time to determine the last visit date for AEWS1221. The crude rates were
calculated for each arm and presented as the number of events per 10,000 patient-days. The hazard
ratio was assessed with a Cox proportional hazards regression model. The log-rank test p-values
were reported for each interim check point.
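The crude rate is a simple ratio, illustrated below with the event counts and person-time reported for AEWS0031 at the 50% information time under the standard method (Table 5.1.1). The function name `crude_rate` is hypothetical, and person-time is assumed to be recorded in patient-days.

```python
def crude_rate(events: int, person_days: float, per: float = 10_000) -> float:
    """Number of events per `per` patient-days of follow-up."""
    return events / person_days * per


# AEWS0031, standard data processing, 50% information time (Table 5.1.1).
experimental = crude_rate(events=32, person_days=180201)
standard = crude_rate(events=50, person_days=176894)

print(round(experimental, 2), round(standard, 2))
```

Rounded to two decimals these reproduce the crude rates of 1.78 and 2.83 shown in the table.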
5.1 AEWS0031 study
AEWS0031 is a randomized phase III trial to determine whether chemotherapy intensification
by reduction of the intervals between chemotherapy cycles can improve the effectiveness of
treatment for Ewing sarcoma and related tumors in children, adolescents, and adults. Patients
in the control arm received chemotherapy every 21 days, while patients in the experimental arm
received chemotherapy every 14 days, or as soon as blood count recovery permitted. Both arms
received a total of 14 cycles of chemotherapy. The log-rank statistic would be used to
compare the risk of an adverse event between patient groups as noted in the statistical methods
section. This study was designed to be open for 4.5 years to enroll approximately 528 patients.
One year of follow-up after the last patient entered would be obtained before the final analysis
was undertaken. The number of expected events by the end of the study was 163 if the alternative
hypothesis was true. The study opened in May 2001. The final analysis was conducted in Mar
2009 (Womer et al., 2012), almost two years after the originally planned analysis date. A total of
568 eligible patients were enrolled and randomized to the two treatment arms.
5.1.1 Futility monitoring with RGray method
Table 5.1.1. RGray interim futility monitoring for AEWS0031 study

| Method | Treatment Arm | Person-time | Events | Crude rate | Cox hazard ratio | log rank p-value | Information level (Analysis time) |
|---|---|---|---|---|---|---|---|
| Standard | Experimental | 180201 | 32 | 1.78 | 0.63 | 0.02 | 0.50 (08/03/2005) |
| | Standard | 176894 | 50 | 2.83 | | | |
| Personal Cutback | Experimental | 168746 | 28 | 1.66 | 0.63 | 0.03 | 0.44 (08/03/2005) |
| | Standard | 166548 | 44 | 2.64 | | | |
| Personal Cutback Wait | Experimental | 182615 | 31 | 1.70 | 0.60 | 0.01 | 0.50 (09/29/2005) |
| | Standard | 179194 | 51 | 2.84 | | | |
| Final analysis | Experimental | 443546 | 75 | 1.69 | 0.75 | 0.03 | 03/31/2009 (1) |
| | Standard | 416184 | 95 | 2.28 | | | |

(1) Final analysis based on information up to this date.

Table 5.1.1 shows the results of the RGray method for AEWS0031 with three different data
processing methods. If the data were processed with the standard data processing method, the
RGray analysis would be conducted on 08/03/2005, at which time 50% of the total expected
events were observed (163/2=82). The hazard ratio estimated by Cox regression was 0.63,
which was lower than the RGray boundary of 1. Therefore, the study passed the interim futility
check and proceeded to the end of the study. On 08/03/2005, if the data were processed with the
personal cutback method instead, the events that occurred between the last visit date and the
analysis date would be ignored. The number of events included in the analysis decreased
to 72 and the adjusted information level was 0.44. The RGray-LIB20 boundary for information
level 0.44 was 1.008. The Cox hazard ratio was 0.63, which was lower than the RGray-LIB20
boundary of 1.008. Therefore, the study passed the interim futility check and proceeded to the
end of the study. If the personal cutback wait method was used, researchers would need to
wait about two months and conduct the interim analysis on 09/29/2005. The data were then
processed with the personal cutback method and 82 events were included in the analysis. This was
again the 50% information level, so the RGray boundary of 1 would apply. The Cox hazard ratio
was 0.60, lower than the RGray boundary of 1. The trial would hence pass the interim
futility check and continue to the final analysis. The final analysis was conducted based on the
information up to 03/31/2009.
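The bookkeeping behind these comparisons can be sketched as follows. The information level is taken here as observed events divided by the 163 expected events, and the boundary values (1 and 1.008) are copied from the text rather than computed; deriving the RGray-LIB20 boundary itself is outside the scope of this sketch. The function names are hypothetical.

```python
def information_level(observed_events: int, expected_events: int) -> float:
    """Fraction of the planned number of events included in the analysis."""
    return observed_events / expected_events


def passes_futility_check(hazard_ratio: float, boundary: float) -> bool:
    """True when the estimated hazard ratio lies below the futility boundary."""
    return hazard_ratio < boundary


EXPECTED_EVENTS = 163  # AEWS0031 design value

# (method, events in analysis, Cox hazard ratio, boundary quoted in the text)
checks = [
    ("Standard", 82, 0.63, 1.0),
    ("Personal Cutback", 72, 0.63, 1.008),
    ("Personal Cutback Wait", 82, 0.60, 1.0),
]

for method, events, hr, boundary in checks:
    level = round(information_level(events, EXPECTED_EVENTS), 2)
    print(method, level, passes_futility_check(hr, boundary))
```

All three rows pass the check, with information levels of 0.50, 0.44 and 0.50, matching Table 5.1.1.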
5.1.2 Futility monitoring with RCI method
Table 5.1.2.1. RCI interim futility monitoring for AEWS0031 study with Standard data
processing method

| Treatment Arm | Person-time | Events | Crude rate | Cox hazard ratio | Lower RCI | log rank p-value | Information level (Analysis time) |
|---|---|---|---|---|---|---|---|
| Experimental | 77686 | 18 | 2.32 | 0.78 | -1.05 | 0.22 | 0.25 (03/12/2004) |
| Standard | 78196 | 23 | 2.94 | | | | |
| Experimental | 180201 | 32 | 1.78 | 0.63 | -0.95 | 0.02 | 0.50 (08/03/2005) |
| Standard | 176894 | 50 | 2.83 | | | | |
| Experimental | 269767 | 51 | 1.89 | 0.68 | -0.75 | 0.02 | 0.75 (08/07/2006) |
| Standard | 259173 | 72 | 2.78 | | | | |
I next examined the performance of the repeated confidence interval method with the standard
data processing method (Table 5.1.2.1). The RCI method performed three interim checks,
at 25%, 50% and 75% information time. The first interim monitoring was conducted on
03/12/2004, when 41 events were observed, which corresponded to the 25% information level. Cox
regression was used to estimate the hazard ratio, which was 0.78 at this time point. The
alpha-spending boundaries were calculated with the power family method. The lower bound
of the RCI was -1.05, which was smaller than the lower alternative reference of -0.405. The trial
passed the first interim check point and continued to the second one. The second interim monitoring
was conducted on 08/03/2005, at which time 82 events were observed, which corresponded to the 50%
information level. The lower bound of the RCI was -0.95, still smaller than the lower alternative
reference of -0.405. The trial passed the second interim check point and proceeded to the third
one. The third interim monitoring was performed on 08/07/2006, at which time 123 events
were observed, which corresponded to the 75% information level. The lower bound of the RCI was
-0.75, which was smaller than the lower alternative reference of -0.405. The trial passed all three
interim checks and continued to the end of the study.
Table 5.1.2.2. RCI interim futility monitoring for AEWS0031 study with Personal Cutback
method

| Treatment Arm | Person-time | Events | Crude rate | Cox hazard ratio | Lower RCI | log rank p-value | Information level (Analysis time) |
|---|---|---|---|---|---|---|---|
| Experimental | 70652 | 17 | 2.41 | 0.77 | -1.10 | 0.2 | 0.24 (03/12/2004) |
| Standard | 71385 | 22 | 3.08 | | | | |
| Experimental | 168746 | 28 | 1.66 | 0.63 | -1.01 | 0.03 | 0.44 (08/03/2005) |
| Standard | 166548 | 44 | 2.64 | | | | |
| Experimental | 259693 | 47 | 1.81 | 0.67 | -0.78 | 0.02 | 0.70 (08/07/2006) |
| Standard | 249969 | 68 | 2.72 | | | | |
If the interim analyses were performed at the same calendar dates as shown in Table 5.1.2.1
and the personal cutback method was used to process the data instead, the events between the last
visit date and the analysis date would be ignored and fewer events would be included in the
analysis. The number of events included in the analysis decreased to 39, 72
and 115 at the three interim time points, and the adjusted information levels became 0.24, 0.44 and
0.70, respectively (Table 5.1.2.2). The alpha-spending boundaries were recalculated according
to the adjusted information levels. The Cox regression models were refitted and new hazard
ratio estimates were obtained. The lower bounds of the RCIs were hence updated and compared to
the lower alternative reference of -0.405. None of them was greater than -0.405. The trial passed all
three interim checks and continued to the end.

Table 5.1.2.3. RCI interim futility monitoring for AEWS0031 study with Personal Cutback
Wait method

| Treatment Arm | Person-time | Events | Crude rate | Cox hazard ratio | Lower RCI | log rank p-value | Information level (Analysis time) |
|---|---|---|---|---|---|---|---|
| Experimental | 73935 | 18 | 2.43 | 0.78 | -1.05 | 0.22 | 0.25 (04/03/2004) |
| Standard | 74701 | 23 | 3.08 | | | | |
| Experimental | 182615 | 31 | 1.70 | 0.60 | -1.01 | 0.01 | 0.50 (09/29/2005) |
| Standard | 179194 | 51 | 2.84 | | | | |
| Experimental | 269792 | 52 | 1.93 | 0.70 | -0.72 | 0.03 | 0.75 (09/23/2006) |
| Standard | 259315 | 71 | 2.74 | | | | |

If the personal cutback wait method was used for data analysis, the first interim analysis
time would be postponed from 03/12/2004 to 04/03/2004, an extra waiting time of 22 days.
The data were then processed with the personal cutback method and 41 events were
included in the analysis, which corresponded to the 25% information level. The new hazard ratio
was estimated by Cox regression and the lower bound of the RCI was recalculated, giving -1.05. It was
smaller than the lower alternative reference of -0.405. Therefore, the trial passed the first interim
check and proceeded to the next one. The second interim analysis time was postponed from
08/03/2005 to 09/29/2005, a waiting time of nearly two months. Eighty-two events
were included in the analysis, which represented the 50% information level. The hazard ratio at
this time point was 0.60 and the lower bound of the RCI was -1.01. It was still smaller than the lower
alternative reference of -0.405. The trial passed the second check point and continued to the third
one. The third interim monitoring time was postponed from 08/07/2006 to 09/23/2006, an extra
waiting time of about 1.5 months. The hazard ratio estimate was 0.70 and the lower
bound of the RCI was -0.72, which was still smaller than the lower alternative reference of -0.405. The
trial passed all the interim checks and continued to the final analysis.
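The extra waiting times quoted above can be reproduced from the analysis dates in Tables 5.1.2.1 and 5.1.2.3. The sketch below assumes the dates are written as MM/DD/YYYY; the function name `extra_wait_days` is hypothetical.

```python
from datetime import date


def extra_wait_days(standard_date: date, wait_date: date) -> int:
    """Days between the standard analysis date and the personal cutback wait date."""
    return (wait_date - standard_date).days


# (standard analysis date, personal cutback wait analysis date) for AEWS0031.
interim_dates = [
    (date(2004, 3, 12), date(2004, 4, 3)),   # 25% information
    (date(2005, 8, 3), date(2005, 9, 29)),   # 50% information
    (date(2006, 8, 7), date(2006, 9, 23)),   # 75% information
]

waits = [extra_wait_days(start, end) for start, end in interim_dates]
print(waits, sum(waits) / len(waits))
```

The three waits are 22, 57 and 47 days, an average of 42 days, or roughly 1.4 months.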
Overall, the three data processing methods reached the same interim futility monitoring
decision for AEWS0031, regardless of which futility monitoring method was used. The
personal cutback wait method required on average 1.5 months of extra waiting time for each check
point, and did not change the decision on futility monitoring. The final analysis revealed
that the interval-compressed therapy was superior to standard therapy (Womer et al., 2012).
It was the correct decision to let the trial pass the futility monitoring and continue to the end of the study.
5.2 AEWS1031 study
The previous intergroup Ewing sarcoma study INT-0091 revealed an effective regimen that
combined five drugs, vincristine-doxorubicin-cyclophosphamide and ifosfamide-etoposide,
denoted as VDC/IE. AEWS0031 demonstrated that interval compression was superior to
standard timing. The five-drug combination using interval compression was considered the
North American standard therapy for localized Ewing sarcoma. Based on the promising
results of some in vitro experiments and the in vivo pilot study (Mascarenhas et al., 2016),
AEWS1031, a randomized Phase III trial, was carried out to test the efficacy of adding
vincristine-topotecan-cyclophosphamide to the interval-compressed five-drug backbone. This
new experimental regimen was denoted as VTC/IE/VDC. The log-rank test statistic was used
to assess the treatment effect as described in the statistical methods section. This study planned
three interim futility analyses with the conditional power approach, at 2 years, 3 years, and
4 years after study opening. If the conditional probability of rejecting the null hypothesis
was less than 5% at any monitoring time, the study would be identified for possible closure of
accrual. This study was designed to enroll approximately 630 eligible patients within 4.5 years,
and the last patient enrolled would be followed for an additional year. It was expected to
have 162 events by the end of the study if the alternative hypothesis was true. This study opened
on 11/22/2010 and has reached its accrual goals.
Table 5.2.1. RGray interim futility monitoring for AEWS1031 study

| Method | Treatment Arm | Person-time | Events | Crude rate | Cox hazard ratio | log rank p-value | Information level (Analysis time) |
|---|---|---|---|---|---|---|---|
| Standard | Experimental | 270603 | 39 | 1.44 | 0.88 | 0.28 | 0.50 (02/16/2016) |
| | Standard | 257801 | 42 | 1.63 | | | |
| Personal Cutback | Experimental | 257433 | 36 | 1.40 | 0.84 | 0.22 | 0.47 (02/16/2016) |
| | Standard | 246245 | 41 | 1.67 | | | |
| Personal Cutback Wait | Experimental | 268284 | 39 | 1.45 | 0.88 | 0.28 | 0.50 (03/23/2016) |
| | Standard | 255299 | 42 | 1.65 | | | |
| Final analysis | Experimental | 555307 | 67 | 1.21 | 0.86 | 0.19 | 03/31/2020 (1) |
| | Standard | 525137 | 74 | 1.41 | | | |

(1) Final analysis based on information up to this date.
5.2.1 Futility monitoring with RGray method
The performance of the RGray futility monitoring method was assessed with three data processing
methods (Table 5.2.1). If the data were processed with the standard method, the RGray analysis
would be performed on 02/16/2016, at which time 50% of the expected events had been observed
(162/2=81). The Cox hazard ratio was 0.88, which was smaller than the RGray boundary of 1. The
trial passed the RGray interim futility check and continued to the end of study. On the same
calendar date, if the data were processed with the personal cutback method instead, the number
of events included in the analysis decreased to 77 and the adjusted information level was
0.47. The RGray-LIB20 boundary for information level 0.47 was 1.003. The Cox hazard ratio
at this time was 0.84, which was smaller than the RGray-LIB20 boundary of 1.003. Therefore, the
trial passed the interim futility check and proceeded to the final analysis. If the personal
cutback wait method was deployed, the interim analysis would be performed on 03/23/2016, which
was about 1.25 months of extra waiting time. The data were processed with the personal cutback
method and 81 events were included in the analysis, which resulted in a 50% adjusted
information level. The RGray boundary of 1 would then apply. The hazard ratio was 0.88, which
was smaller than the RGray boundary of 1. The trial passed the RGray interim monitoring and
continued to the end of study. The final analysis was conducted based on information current
to 03/31/2020.
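The three comparisons above all reduce to checking the estimated hazard ratio against the boundary that applies at the observed information level. The following is a minimal Python sketch of that check. The function name `rgray_check` is hypothetical, the boundary values are copied from the text rather than computed (the LIB20 boundary formula is not restated in this section), and treating a hazard ratio exactly equal to the boundary as a stop is an assumption.

```python
def rgray_check(hazard_ratio, boundary):
    """Return 'stop' if the estimated hazard ratio reaches the futility boundary.

    Treating equality as a stop is an assumption; the text only reports
    cases strictly below or strictly above the boundary.
    """
    return "stop" if hazard_ratio >= boundary else "continue"


# (data processing method, Cox hazard ratio, boundary quoted in the text)
aews1031_interim = [
    ("standard", 0.88, 1.0),
    ("personal cutback", 0.84, 1.003),
    ("personal cutback wait", 0.88, 1.0),
]

for method, hr, boundary in aews1031_interim:
    print(f"{method}: HR {hr} vs boundary {boundary} -> {rgray_check(hr, boundary)}")
```

All three rows give "continue", which matches the conclusion that AEWS1031 passed the RGray check under every data processing method.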
5.2.2 Futility monitoring with RCI method
Table 5.2.2.1. RCI interim futility monitoring for AEWS1031 study with Standard data processing
method

| Treatment arm | Person-time | Events | Crude rate | Cox hazard ratio | Lower RCI | Log-rank p-value | Information level (analysis time) |
|---|---|---|---|---|---|---|---|
| Experimental | 113748 | 21 | 1.846 | 0.98 | -0.83 | 0.47 | 0.25 (04/23/2014) |
| Standard | 108211 | 20 | 1.848 | | | | |
| Experimental | 270603 | 39 | 1.44 | 0.88 | -0.62 | 0.28 | 0.50 (02/16/2016) |
| Standard | 257801 | 42 | 1.63 | | | | |
| Experimental | 442475 | 57 | 1.29 | 0.83 | -0.55 | 0.15 | 0.75 (11/09/2017) |
| Standard | 418429 | 65 | 1.55 | | | | |
The performance of the RCI approach was assessed with the standard data processing method
(Table 5.2.2.1). Three interim analyses were conducted on 04/23/2014, 02/16/2016 and
11/09/2017, at which times 25%, 50% and 75% of the expected events had been observed,
respectively. The lower bound of the RCI at each time point was calculated and compared to the
lower alternative reference -0.405. All of them were smaller than -0.405. The trial passed the
three interim check points and continued to the end of study.
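The RCI check described above can be sketched in a few lines of Python. The function name `rci_check` is hypothetical. The reference value -0.405 is taken from the text; it agrees with ln(2/3) to three decimals, so reading it as the log of the alternative hazard ratio is an assumption made here for illustration, not a statement from this section.

```python
import math

ALT_LOG_HR = -0.405  # lower alternative reference quoted in the text


def rci_check(lower_bound, reference=ALT_LOG_HR):
    """Flag futility when the lower RCI bound exceeds the alternative reference."""
    return "stop" if lower_bound > reference else "continue"


# Lower RCI bounds from Table 5.2.2.1 at 25%, 50% and 75% information
decisions = [rci_check(lb) for lb in (-0.83, -0.62, -0.55)]
print(decisions)

# Assumption check: ln(2/3) rounds to the quoted reference value
print(round(math.log(2 / 3), 3))
```

Every bound lies below the reference, so each look returns "continue".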
On the same interim analysis calendar dates as noted in Table 5.2.2.1, if the data were
processed with the personal cutback method instead, the number of events included in the
analysis decreased to 39, 77 and 120 at each time point, and the adjusted information levels
became 0.24, 0.47 and 0.74, respectively (Table 5.2.2.2). The alpha-spending boundaries were
recalculated based on the adjusted information levels and the Cox regression was refitted. The
lower bounds of the RCIs were hence updated and compared to the lower alternative reference
-0.405. All of them were smaller than -0.405. The trial would pass all the interim futility
checks and continue to the end of study.

Table 5.2.2.2. RCI interim futility monitoring for AEWS1031 study with Personal Cutback
method

| Treatment arm | Person-time | Events | Crude rate | Cox hazard ratio | Lower RCI | Log-rank p-value | Information level (analysis time) |
|---|---|---|---|---|---|---|---|
| Experimental | 104460 | 21 | 2.01 | 1.09 | -0.75 | 0.60 | 0.24 (04/23/2014) |
| Standard | 99738 | 18 | 1.80 | | | | |
| Experimental | 257433 | 36 | 1.40 | 0.84 | -0.70 | 0.22 | 0.47 (02/16/2016) |
| Standard | 246245 | 41 | 1.66 | | | | |
| Experimental | 430452 | 55 | 1.28 | 0.80 | -0.58 | 0.12 | 0.74 (11/09/2017) |
| Standard | 407551 | 65 | 1.59 | | | | |
Table 5.2.2.3. RCI interim futility monitoring for AEWS1031 study with Personal Cutback
Wait method

| Treatment arm | Person-time | Events | Crude rate | Cox hazard ratio | Lower RCI | Log-rank p-value | Information level (analysis time) |
|---|---|---|---|---|---|---|---|
| Experimental | 109015 | 22 | 2.02 | 1.07 | -0.74 | 0.59 | 0.25 (05/14/2014) |
| Standard | 103658 | 19 | 1.83 | | | | |
| Experimental | 268284 | 39 | 1.45 | 0.88 | -0.62 | 0.28 | 0.50 (03/23/2016) |
| Standard | 255299 | 42 | 1.65 | | | | |
| Experimental | 447773 | 57 | 1.27 | 0.83 | -0.55 | 0.16 | 0.75 (01/16/2018) |
| Standard | 424058 | 65 | 1.53 | | | | |
If the personal cutback wait method was deployed, the first interim check at 25% information
time would be delayed by 21 days, from 04/23/2014 to 05/14/2014. The data were then
processed with the personal cutback method and 41 events were included in the analysis, which
corresponded to the 25% information level. The Cox hazard ratio was 1.07 and the lower bound of
the RCI was -0.74, which was smaller than the lower alternative reference -0.405. The trial
passed the first interim check and continued to the second one. The second interim check at 50%
information time would be delayed by 36 days, from 02/16/2016 to 03/23/2016. The data were
processed with the personal cutback method and 81 events were included in the analysis. The Cox
hazard ratio was 0.88 and the lower bound of the RCI was -0.62, which was smaller than -0.405.
The trial passed the second interim check and continued to the third one. The third interim
check at 75% information time would be delayed by more than two months, from 11/09/2017 to
01/16/2018. The data were processed with the personal cutback method and 122 events were
included in the analysis. The Cox hazard ratio was 0.83 and the lower bound of the RCI was
-0.55, which was still smaller than -0.405. The trial passed the third interim futility check
and proceeded to the final analysis (Table 5.2.2.3).
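The extra waiting times quoted above follow directly from the pairs of analysis dates. A short Python sketch, using only the dates given in the text (the helper name `extra_wait_days` is hypothetical), reproduces them:

```python
from datetime import date


def extra_wait_days(standard_date, wait_date):
    """Days between the standard analysis date and the personal cutback wait date."""
    return (wait_date - standard_date).days


# (standard analysis date, personal cutback wait analysis date) for AEWS1031
looks = [
    (date(2014, 4, 23), date(2014, 5, 14)),
    (date(2016, 2, 16), date(2016, 3, 23)),
    (date(2017, 11, 9), date(2018, 1, 16)),
]

waits = [extra_wait_days(a, b) for a, b in looks]
print(waits)
```

The third look is delayed by 68 days, consistent with the "more than two months" stated in the text.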
In all, the two futility monitoring approaches, the RGray method and the RCI method, came to
the same decision in terms of futility monitoring for AEWS1031 with the three different data
processing methods. The trial passed all futility checks and continued to the final analysis.
The final analysis showed a lower estimated hazard in the experimental arm than in the standard
arm, but the difference did not reach the significance threshold (HR=0.86, p=0.38).
5.3 AEWS1221 study
Some preclinical data support the role of IGF-1R inhibition in the treatment of Ewing sarcoma.
AEWS1221, a phase III randomized trial, was designed to evaluate the role of ganitumab, a
human monoclonal antibody against IGF-1R, in the treatment of newly diagnosed metastatic
Ewing sarcoma. Patients with newly diagnosed metastatic Ewing sarcoma were randomized
at study entry to the standard (control) arm with interval-compressed multiagent chemotherapy
(denoted as VDC/IE) or to the experimental arm with interval-compressed multiagent
chemotherapy and the addition of ganitumab (denoted as VDC/IE + Ganitumab). The statistical
alternative hypothesis was that the relative risk of an EFS event associated with the
experimental therapy was 0.67. The study was designed to enroll 300 eligible patients in 5
years. The last enrolled patient would be followed for 1.5 years. Three interim futility
analyses with the RGray method were planned at approximately 2.5 years, 3.5 years and 4.5 years
after the study opened. The study was expected to have 196 events at the end of study if the
alternative hypothesis was true. This study opened on 11/08/2014 and had enrolled 298 eligible
patients by 03/31/2020.
5.3.1 Futility monitoring with RGray method
Table 5.3.1. RGray interim futility monitoring for AEWS1221 study

| Method | Treatment arm | Person-time | Events | Crude rate | Cox hazard ratio | Log-rank p-value | Information level (analysis time) |
|---|---|---|---|---|---|---|---|
| Standard | Experimental | 68046 | 49 | 7.20 | 1.04 | 0.57 | 0.50 (11/26/2018) |
| | Standard | 69475 | 49 | 7.05 | | | |
| Personal Cutback | Experimental | 63644 | 47 | 7.38 | 1.07 | 0.63 | 0.45 (11/26/2018) |
| | Standard | 59901 | 42 | 7.01 | | | |
| Personal Cutback Wait | Experimental | 67882 | 50 | 7.37 | 1.002 | 0.5 | 0.50 (01/22/2019) |
| | Standard | 64803 | 48 | 7.41 | | | |
| Final analysis | Experimental | 98159 | 86 | 8.76 | 1.03 | 0.42 | 03/31/2020¹ |
| | Standard | 93822 | 79 | 8.42 | | | |

¹ Final analysis based on information up to this date.
The performance of the RGray method was examined for AEWS1221 with three data processing
methods (Table 5.3.1). With the standard data processing method, 98 events were observed
on 11/26/2018, which marked the 50% information time. The hazard ratio estimate was 1.04,
which was greater than the RGray boundary of 1. The trial would be stopped for futility.
On 11/26/2018, if the data were processed with the personal cutback method instead, the number
of events included in the analysis decreased to 89. The adjusted information level was 0.45.
The RGray-LIB20 boundary for the adjusted information level was 1.006. Since the hazard ratio
estimate was 1.07 with the personal cutback method, which was greater than 1.006, the trial
would be stopped for futility. If the personal cutback wait method was used, the researcher
would need to wait 57 extra days to perform the analysis. On 01/22/2019, the data were
processed with the personal cutback method and 98 events were included in the analysis, which
was 50% of the expected events. The adjusted information level was 50% and the RGray boundary
of 1 was applied. The hazard ratio estimate at this time point was 1.002, greater than 1. The
trial would hence be stopped for futility. All three data processing methods stopped the
trial at 50% information time with the RGray method. The final analysis was conducted based on
information current to 03/31/2020.
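The information levels quoted above are consistent with dividing the number of events included in the analysis by the 196 events expected at the end of study. The Python sketch below illustrates this; the function name `information_level` is hypothetical, and the formula is an assumption inferred from the reported numbers rather than a definition restated in this section.

```python
def information_level(events_analysed, expected_events):
    """Fraction of the expected end-of-study events included in the analysis."""
    return events_analysed / expected_events


EXPECTED_EVENTS = 196  # expected events for AEWS1221 under the alternative hypothesis

for method, events in [
    ("standard", 98),
    ("personal cutback", 89),
    ("personal cutback wait", 98),
]:
    level = information_level(events, EXPECTED_EVENTS)
    print(f"{method}: {events} events -> information level {level:.2f}")
```

Under this reading, the personal cutback method lowers the information level from 0.50 to 0.45 on the same calendar date, which is why the boundary is recalculated.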
5.3.2 Futility monitoring with RCI method
Table 5.3.2.1. RCI interim futility monitoring for AEWS1221 study with Standard data processing
method

| Treatment arm | Person-time | Events | Crude rate | Cox hazard ratio | Lower RCI | Log-rank p-value | Information level (analysis time) |
|---|---|---|---|---|---|---|---|
| Experimental | 32655 | 28 | 8.57 | 1.45 | -0.44 | 0.90 | 0.25 (08/17/2017) |
| Standard | 34335 | 21 | 6.12 | | | | |
| Experimental | 68046 | 49 | 7.20 | 1.04 | -0.46 | 0.57 | 0.50 (11/26/2018) |
| Standard | 69475 | 49 | 7.05 | | | | |
| Experimental | 93769 | 77 | 8.21 | 1.12 | -0.25 | 0.75 | 0.75 (09/20/2019) |
| Standard | 94896 | 70 | 7.38 | | | | |
The RCI approach was used to evaluate futility with the standard data processing method
(Table 5.3.2.1). The first interim monitoring was performed on 08/17/2017, when 49 events
had been observed, which marked the 25% information time. The lower bound of the RCI was -0.44,
which was smaller than the lower alternative reference -0.405. The trial passed the first
interim check and continued to the second one. The second interim monitoring was performed on
11/26/2018, at which time 98 events had been observed, which marked the 50% information time.
The lower bound of the RCI was -0.46, smaller than the lower alternative reference -0.405. The
trial passed the second interim check point and proceeded to the third one. The third interim
monitoring was performed on 09/20/2019, when 147 events had been observed, which marked the
75% information time. The lower bound of the RCI was -0.25, which was greater than the lower
alternative reference -0.405. The trial would be stopped for futility at this time point.
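The sequential reading of the three looks can be expressed as a loop that reports the first look at which the lower RCI bound exceeds the reference. This is a minimal Python sketch using the bounds from Table 5.3.2.1; the function name `first_futility_look` is hypothetical.

```python
ALT_LOG_HR = -0.405  # lower alternative reference quoted in the text


def first_futility_look(looks, reference=ALT_LOG_HR):
    """Return the information level of the first look whose lower RCI bound
    exceeds the reference, or None if the trial passes every look."""
    for info_level, lower_bound in looks:
        if lower_bound > reference:
            return info_level
    return None


# (information level, lower RCI bound), standard data processing method
aews1221_standard = [(0.25, -0.44), (0.50, -0.46), (0.75, -0.25)]
print(first_futility_look(aews1221_standard))
```

For AEWS1221 the function returns the 0.75 look, in line with the stop at 75% information time described above.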
Table 5.3.2.2. RCI interim futility monitoring for AEWS1221 study with Personal Cutback
method

| Treatment arm | Person-time | Events | Crude rate | Cox hazard ratio | Lower RCI | Log-rank p-value | Information level (analysis time) |
|---|---|---|---|---|---|---|---|
| Experimental | 30264 | 27 | 8.92 | 1.55 | -0.42 | 0.92 | 0.23 (08/17/2017) |
| Standard | 30400 | 18 | 5.92 | | | | |
| Experimental | 63644 | 47 | 7.38 | 1.07 | -0.47 | 0.63 | 0.45 (11/26/2018) |
| Standard | 59901 | 42 | 7.01 | | | | |
| Experimental | 86650 | 73 | 8.42 | 1.08 | -0.31 | 0.68 | 0.70 (09/20/2019) |
| Standard | 82152 | 64 | 7.79 | | | | |
On the three interim monitoring calendar dates stated in Table 5.3.2.1, if the personal
cutback method was used to process the data, the number of events included in the analysis
would decrease to 45, 89 and 137, respectively. The adjusted information levels would become
0.23, 0.45 and 0.70 accordingly. The alpha-spending boundaries were recalculated based on the
adjusted information levels. New hazard ratio estimates were obtained from Cox regression.
The lower bounds of the RCIs were hence updated and compared to the lower alternative reference
-0.405. The first time the lower bound of the RCI became greater than -0.405 was at 75%
information time. Therefore, the trial passed the first two interim checks but was stopped at
the third one. The trial would then be identified for possible termination for futility.
Table 5.3.2.3. RCI interim futility monitoring for AEWS1221 study with Personal Cutback
Wait method

| Treatment arm | Person-time | Events | Crude rate | Cox hazard ratio | Lower RCI | Log-rank p-value | Information level (analysis time) |
|---|---|---|---|---|---|---|---|
| Experimental | 33335 | 28 | 8.40 | 1.42 | -0.45 | 0.89 | 0.25 (10/25/2017) |
| Standard | 34816 | 21 | 6.03 | | | | |
| Experimental | 67882 | 50 | 7.37 | 1.002 | -0.49 | 0.5 | 0.50 (01/22/2019) |
| Standard | 64803 | 48 | 7.41 | | | | |
| Experimental | 91273 | 80 | 8.76 | 1.12 | -0.25 | 0.74 | 0.75 (11/11/2019) |
| Standard | 85959 | 67 | 7.79 | | | | |
If the personal cutback wait method was deployed for data analysis, the first interim analysis
would be postponed from 08/17/2017 to 10/25/2017, which meant 2.25 months of extra
waiting time. The data were then processed with the personal cutback method and 49 events were
included in the analysis, which corresponded to the 25% information level. The new hazard ratio
was estimated by Cox regression and the lower bound of the RCI was recalculated, giving -0.45.
It was smaller than the lower alternative reference -0.405. Therefore, the trial passed the
first interim check and proceeded to the next one. The second interim analysis was postponed
from 11/26/2018 to 01/22/2019, which meant nearly two months of extra waiting time. Ninety-eight
events were included in the analysis, which represented the 50% information level. The hazard
ratio at this time point was 1.002 and the lower bound of the RCI was -0.49. It was still
smaller than the lower alternative reference -0.405. The trial passed the second check point
and continued to the third one. The third interim monitoring was postponed from 09/20/2019 to
11/11/2019, which was 1.7 months of extra waiting time. The hazard ratio estimate was 1.12 and
the lower bound of the RCI was -0.25, which was the first time it became greater than the lower
alternative reference -0.405. The trial would then be considered for stopping for futility.
In all, all three data processing methods suggested stopping AEWS1221 at 50% information
time by the RGray method, and at 75% information time by the RCI method. As
of 03/20/2019, the study closed to accrual and institutions were instructed to immediately
discontinue ganitumab for patients on the experimental arm. Stopping the trial for futility at
an interim monitoring time point was therefore the correct decision. The RGray method performed
better than the RCI method in this case because it would have stopped the trial earlier.
5.4 INT0091 study
The COG legacy study INT0091 recruited 120 patients with metastatic Ewing’s sarcoma or
primitive neuroectodermal tumor (PNET) of bone from 1988 to 1992. The patients were
randomized into the control group, which received the standard therapy consisting of vincristine,
doxorubicin, cyclophosphamide and dactinomycin, and the experimental group, in which
ifosfamide and etoposide were added to the control regimen. The purpose of the study was to
evaluate whether the addition of ifosfamide and etoposide to the standard regimen could improve
outcomes (Miser et al., 2004). The analytical dataset, which was frozen on Aug 31, 2000, was
used for the following analysis.
5.4.1 Futility monitoring with RGray method
Table 5.4.1. RGray interim futility monitoring for INT0091 study

| Method | Treatment arm | Person-time | Events | Crude rate | Cox hazard ratio | Log-rank p-value | Information level (analysis time) |
|---|---|---|---|---|---|---|---|
| Standard | Experimental | 17431 | 22 | 12.62 | 0.87 | 0.32 | 0.50 (02/10/1992) |
| | Standard | 18088 | 25 | 13.82 | | | |
| Personal Cutback | Experimental | 16094 | 21 | 13.05 | 0.90 | 0.37 | 0.47 (02/10/1992) |
| | Standard | 16782 | 23 | 13.70 | | | |
| Personal Cutback Wait | Experimental | 17687 | 23 | 13.00 | 0.93 | 0.41 | 0.50 (04/02/1992) |
| | Standard | 18342 | 24 | 13.08 | | | |
| Final analysis | Experimental | 52227 | 45 | 8.62 | 0.95 | 0.40 | 08/31/2000¹ |
| | Standard | 57812 | 49 | 8.48 | | | |

¹ Final analysis based on information up to this date.
The performance of the RGray futility monitoring method was assessed with three data processing
methods (Table 5.4.1). If the data were processed with the standard method, the RGray analysis
would be performed on 02/10/1992, at which time 50% of the expected events had been observed
(94/2=47). The Cox hazard ratio was 0.87, which was smaller than the RGray boundary of 1. The
trial passed the RGray interim futility check and continued to the end of study. On the same
calendar date, if the data were processed with the personal cutback method instead, the number
of events included in the analysis decreased to 44 and the adjusted information level was
0.47. The RGray-LIB20 boundary for information level 0.47 was 1.003. The Cox hazard ratio
at this time was 0.90, which was smaller than the RGray-LIB20 boundary of 1.003. Therefore, the
trial passed the interim futility check and proceeded to the final analysis. If the personal
cutback wait method was employed, the interim analysis would be performed on 04/02/1992, which
was about 1.75 months of extra waiting time. The data were processed with the personal cutback
method and 47 events were included in the analysis, which resulted in a 50% adjusted
information level. The RGray boundary of 1 would then apply. The hazard ratio was 0.93, which
was smaller than the RGray boundary of 1. The trial passed the RGray interim monitoring and
continued to the end of study. The final analysis was conducted based on information current
to 08/31/2000.
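Table 5.4.1 does not state the units of its crude rate column. The tabulated values are reproduced by dividing events by person-time and scaling to 10,000 units of person-time. The Python sketch below illustrates this; the scaling factor is an inference from the reported numbers, not something this section states, and the function name `crude_rate` is hypothetical.

```python
def crude_rate(events, person_time, per=10_000):
    """Events per `per` units of person-time (the scale is an assumed convention)."""
    return events / person_time * per


# Standard data processing method at the 50% look (Table 5.4.1)
rows = {"Experimental": (22, 17431), "Standard": (25, 18088)}

for arm, (events, person_time) in rows.items():
    print(arm, round(crude_rate(events, person_time), 2))
```

The same scaling also reproduces the crude rates reported for the other two studies, for example 39 events over 270603 units of person-time giving 1.44.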
5.4.2 Futility monitoring with RCI method
Table 5.4.2.1. RCI interim futility monitoring for INT0091 study with Standard data processing
method

| Treatment arm | Person-time | Events | Crude rate | Cox hazard ratio | Lower RCI | Log-rank p-value | Information level (analysis time) |
|---|---|---|---|---|---|---|---|
| Experimental | 7428 | 10 | 13.46 | 0.94 | -0.89 | 0.44 | 0.25 (01/13/1991) |
| Standard | 9076 | 13 | 14.32 | | | | |
| Experimental | 17431 | 22 | 12.62 | 0.87 | -0.63 | 0.32 | 0.50 (02/10/1992) |
| Standard | 18088 | 25 | 13.82 | | | | |
| Experimental | 24998 | 35 | 14.00 | 0.93 | -0.43 | 0.38 | 0.75 (11/30/1992) |
| Standard | 25962 | 36 | 13.87 | | | | |
The performance of the RCI approach was assessed with the standard data processing method
(Table 5.4.2.1). Three interim analyses were conducted on 01/13/1991, 02/10/1992 and
11/30/1992, at which times 25%, 50% and 75% of the expected events had been observed,
respectively. The lower bound of the RCI at each time point was calculated and compared to the
lower alternative reference -0.405. All of them were smaller than -0.405. The trial passed the
three interim check points and continued to the end of study.
The data were processed with the personal cutback method on the same interim analysis
calendar dates as noted in Table 5.4.2.1. The number of events included in the analysis
decreased to 22, 44 and 69 at each time point, and the adjusted information levels became 0.23,
0.47 and 0.73, respectively (Table 5.4.2.2). The alpha-spending boundaries were recalculated
based on the adjusted information levels and the Cox regression was refitted. The lower bounds
of the RCIs were hence updated and compared to the lower alternative reference -0.405. All of
them were smaller than -0.405. The trial would pass all the interim futility checks and continue
to the end of study.

Table 5.4.2.2. RCI interim futility monitoring for INT0091 study with Personal Cutback method

| Treatment arm | Person-time | Events | Crude rate | Cox hazard ratio | Lower RCI | Log-rank p-value | Information level (analysis time) |
|---|---|---|---|---|---|---|---|
| Experimental | 6647 | 9 | 13.54 | 0.85 | -1.01 | 0.36 | 0.23 (01/13/1991) |
| Standard | 8104 | 13 | 16.04 | | | | |
| Experimental | 16094 | 21 | 13.05 | 0.90 | -0.62 | 0.37 | 0.47 (02/10/1992) |
| Standard | 16782 | 23 | 13.70 | | | | |
| Experimental | 24073 | 34 | 14.12 | 0.92 | -0.45 | 0.37 | 0.73 (11/30/1992) |
| Standard | 24881 | 35 | 14.07 | | | | |
Table 5.4.2.3. RCI interim futility monitoring for INT0091 study with Personal Cutback Wait
method

| Treatment arm | Person-time | Events | Crude rate | Cox hazard ratio | Lower RCI | Log-rank p-value | Information level (analysis time) |
|---|---|---|---|---|---|---|---|
| Experimental | 7066 | 10 | 14.15 | 0.97 | -0.85 | 0.47 | 0.25 (02/19/1991) |
| Standard | 8834 | 13 | 14.72 | | | | |
| Experimental | 17687 | 23 | 13.00 | 0.93 | -0.56 | 0.41 | 0.50 (04/02/1992) |
| Standard | 18342 | 24 | 13.08 | | | | |
| Experimental | 24617 | 35 | 14.22 | 0.93 | -0.43 | 0.38 | 0.75 (12/25/1992) |
| Standard | 25511 | 36 | 14.11 | | | | |
If the personal cutback wait method was employed, the first interim check at 25% information
time would be delayed by 37 days, from 01/13/1991 to 02/19/1991. The data were then
processed with the personal cutback method and 23 events were included in the analysis, which
corresponded to the 25% information level. The Cox hazard ratio was 0.97 and the lower bound of
the RCI was -0.85, which was smaller than the lower alternative reference -0.405. The trial
passed the first interim check and continued to the second one. The second interim check at 50%
information time would be delayed by 52 days, from 02/10/1992 to 04/02/1992. The data were
processed with the personal cutback method and 47 events were included in the analysis. The Cox
hazard ratio was 0.93 and the lower bound of the RCI was -0.56, which was smaller than -0.405.
The trial passed the second interim check and continued to the third one. The third interim
check at 75% information time would be delayed by 25 days, from 11/30/1992 to 12/25/1992. The
data were processed with the personal cutback method and 71 events were included in the
analysis. The Cox hazard ratio was 0.93 and the lower bound of the RCI was -0.43, which was
still smaller than -0.405. The trial passed the third interim futility check and proceeded to
the final analysis (Table 5.4.2.3).
In all, the two futility monitoring approaches, the RGray method and the RCI method, came to the
same decision in terms of futility monitoring for INT0091 with the three different data
processing methods. The trial passed all futility checks and continued to the final analysis.
The final analysis revealed that the experimental arm was not significantly superior to the
control arm (HR=0.95, p=0.81).
5.5 Conclusion
All three data processing methods came to the same decision in terms of futility monitoring,
regardless of which futility monitoring approach was deployed. The standard data processing
method and the personal cutback method were applied on the same calendar date. The standard
method counted all the events to meet the desired information level, while the personal cutback
method ignored the events that occurred between the last visit date and the analysis date, which
resulted in decreased information levels. The monitoring boundaries were adjusted accordingly
to accommodate the adjusted information levels. The personal cutback wait method was generally
conducted after 1-2 months of extra waiting time. The desired information level was then
achieved and there was no need to adjust the boundaries. This method met the independent
censoring assumption, and the estimation of the treatment effect was not affected by delayed
events reporting. Compared to the standard method and the personal cutback method, the personal
cutback wait method could provide the most reliable estimate of the treatment effect and an
unbiased probability of futility. However, the extra waiting time might be a concern, especially
when the experimental regimen was not as beneficial as proposed. For trials designed to last
for several years, 1-2 months of extra waiting time was not a big issue.
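The difference between the standard and personal cutback methods can be illustrated at the level of a single patient record. The Python sketch below is only an illustration under stated assumptions: the `Patient` fields and both function names are hypothetical, event-free patients are assumed to be censored at their last visit, and every reported event date is assumed to fall on or before the analysis date.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Patient:
    enrolled: date
    last_visit: date            # last reported visit on or before the analysis date
    event_date: Optional[date]  # None if no event has been reported


def standard(p):
    """Count every reported event; event-free patients are censored at the last visit."""
    if p.event_date is not None:
        return (p.event_date - p.enrolled).days, True
    return (p.last_visit - p.enrolled).days, False


def personal_cutback(p):
    """Cut follow-up back to the last visit; events after that visit are not counted."""
    if p.event_date is not None and p.event_date <= p.last_visit:
        return (p.event_date - p.enrolled).days, True
    return (p.last_visit - p.enrolled).days, False


# A patient whose event occurred after the last visit but before the analysis date
late = Patient(date(2015, 1, 1), date(2015, 7, 1), date(2015, 8, 15))
print(standard(late))          # (follow-up days, event indicator)
print(personal_cutback(late))
```

For this patient the standard method records an event while the personal cutback method records a censored observation at the last visit, which is how the number of events, and hence the information level, decreases on the same calendar date.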
Chapter 6: Concluding Remarks and Future Work
Phase III randomized clinical trials are the “gold standard” for assessing the overall and
relative therapeutic value of a new drug, including its efficacy, safety and special properties.
Interim futility monitoring assesses the treatment effect in the middle of the study, declaring
futility if there is enough evidence that the new regimen will not improve the outcome.
Statisticians have developed several futility monitoring approaches. In this study, I focused on
two futility monitoring methods, the RGray method (Freidlin et al., 2010) and the repeated
confidence interval (RCI) method (Jennison and Turnbull, 1984, 1989), to evaluate the results of
futility monitoring in the presence of late events reporting. Both methods utilize the estimated
hazard rate in the calculation of the futility rule. Therefore, it is important to understand the
properties of the naive estimate of the hazard rate under the various assumptions regarding the
reporting of late events.
As reported in earlier studies (McIlvaine, 2015; López Nájera, 2016), the estimates of the
hazard ratio are biased when the standard method is used to process the data. I notice that the
bias is relatively small if the baseline hazard rate is small and the probability of late events
reporting is big. The fact that the personal cutback method can achieve unbiased estimates of
the hazard ratio suggests that the source of bias is the information collected after the last
visit, which is contributed only by reported events that occur after the last visit, since
follow-up of patients who remain well cannot be accurately reported after the last visit. A
novel concept, the index of non-independent censoring (INIC), is introduced in my study. It
measures the probability that an event occurs after the last scheduled visit and is reported.
Studies with low INIC are more likely to achieve unbiased estimates of the hazard ratio, and
hence are less likely to be affected by late events reporting in terms of futility monitoring by
the RGray method or by the RCI method. That explains why my research focused only on studies
with a high baseline hazard rate, since they have high INICs.
Although the personal cutback method can obtain unbiased estimates of the hazard ratio, the
distribution of the estimates changes and therefore the probability of declaring futility
changes. The personal cutback wait method, although it requires several months of extra waiting
time, can achieve unbiased estimates of the hazard ratio and the correct probability of declaring
futility. However, during the extra waiting time more patients are recruited and exposed to a
possibly inferior therapy. Alternatively, researchers can obtain more accurate estimates of
treatment effects by scheduling patients’ visits more often. The disadvantage of short
reporting intervals is that they require more resources and may not be realistic. Fortunately,
although late events reporting does affect futility monitoring, the influence is relatively
small (2.7% under perfect reporting vs. 3.3% under delayed reporting in my exponential setting).
If researchers can live with such small errors, they can proceed with the standard data
processing method. This is a trade-off decision.
Most pediatric cancers have the pattern that event risk begins high and tapers away. Such
situations can be approximated by the Weibull distribution with a shape parameter less than 1.
By changing the reporting intervals in the high risk period and in the low risk period, I show
that the reporting interval in the high risk period is critical. If researchers have collected
enough information during the high risk period, they can ask the patients to visit less often
during the low risk period without losing too much information. On the other hand, if
researchers fail to collect enough information during the high risk period, they cannot make it
up by asking patients to report more often during the low risk period.
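The statement that a Weibull shape parameter below 1 gives a risk that starts high and tapers away can be checked directly from the standard Weibull hazard function, h(t) = (k/λ)(t/λ)^(k-1). The Python sketch below evaluates it at a few time points; the shape and scale values are illustrative only and are not taken from the study.

```python
def weibull_hazard(t, shape, scale):
    """Weibull hazard h(t) = (shape / scale) * (t / scale) ** (shape - 1), for t > 0."""
    return (shape / scale) * (t / scale) ** (shape - 1)


# Illustrative parameters (not from the study): shape < 1 gives a decreasing hazard
times = [0.5, 1.0, 2.0, 4.0]
hazards = [weibull_hazard(t, shape=0.5, scale=1.0) for t in times]

for t, h in zip(times, hazards):
    print(f"t = {t}: hazard = {h:.4f}")
```

With a shape of exactly 1 the same function returns a constant hazard, which is the exponential setting also considered in this work.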
I have evaluated the three data processing methods in different exponential and Weibull
settings and achieved similar results. Generally speaking, the smaller the bias in hazard ratio
estimation, the closer the probability of declaring futility is to its correct value. However,
it remains a question whether my results can be applied to other distributions. My future work
would be to assess the performance of futility monitoring methods under proportional hazards
models that follow continuous distributions other than the exponential and Weibull.
For the sake of study design, I authored a ShinyApp webpage to calculate the bias of hazard
ratio estimation under different study designs: https://cxia2020.shinyapps.io/ShinyAppTest/ . If
the bias is small, researchers do not need to worry about late events reporting with regard to
futility monitoring. If the bias is big, further discussion should be undertaken to examine
methods to reduce this bias. One such adjustment that is under the control of the study
team would be to change the time between scheduled follow-up visits.
Using this ShinyApp, I found that a short enrollment time can cause elevated bias
in hazard ratio estimation. This suggests that late events reporting can affect the futility
monitoring of high incidence diseases. The disease incidence rate is not under the control of
the investigator and can affect the bias. The ShinyApp can be used to show the effect briefly.
It is worthwhile to investigate further how the disease incidence rate would affect futility
monitoring in the presence of late events reporting.
My current work assumes the reporting intervals are the same for the two arms. For a study
design where the experimental therapy includes an agent whose interaction with current
treatment is not fully understood, a different, and tighter, follow-up schedule for the
experimental regimen is preferred. An interesting direction for future work would be to explore
the effect of different reporting intervals between the two arms on futility monitoring,
considering the probability of late event reporting.
Refer ences
Anderson, J.R. and High, R. (201 1) Alternatives to the standard Fleming, Harrington, and
O’Brien futility boundary . Clinical T rials (London, England) , 8 , 270–276.
Cox, D.R. (1972) Regression Models and Life-T ables. Journal of the Royal Statistical Society .
Series B (Methodological) , 34 , 187–220.
DeMets, D.L. and Lan, G. (1995) The alpha spending function approach to interim data analy-
ses. Recent Advances in Clinical T rial Design and Analysis Cancer T reatment and Research.
(ed P .F . Thall), pp. 1–27. Springer US, Boston, MA.
Ebrahim, G.J. (2007) Survival Analysis: A practical approach, 2nd ednMachin D, Cheung YB,
Parmar MKB. Journal of T r opical Pediatrics , 53 , 218–218.
Freidlin, B. and Korn, E.L. (2002) A comment on futility monitoring. Contr olled Clinical
T rials , 23 , 355–366.
Freidlin, B. and Korn, E.L. (2009) Monitoring for Lack of Benefit: A Critical Component of a
Randomized Clinical T rial. Journal of Clinical Oncology , 27 , 629–633.
Freidlin, B., Korn, E.L. and Gray , R. (2010) A general inef ficacy interim monitoring rule for
randomized clinical trials. Clinical T rials (London, England) , 7 , 197–208.
Gordon Lan, K.K. and Demets, D.L. (1983) Discrete sequential boundaries for clinical trials.
Biometrika , 70 , 659–663.
Hu, P . and T siatis, A.A. (1996) Estimating the survival distribution when ascertainment of vital
status is subject to delay . Biometrika , 83 , 371–380.
Hwang, I.K., Shih, W .J. and De Cani, J.S. (1990) Group sequential designs using a family of
type I error probability spending functions. Statistics in Medicine , 9 , 1439–1445.
Jennison, C. and T urnbull, B.W . (1984) Repeated confidence intervals for group sequential
clinical trials. Contr olled Clinical T rials , 5 , 33–45.
Jennison, C. and T urnbull, B.W . (1989) Interim Analyses: The Repeated Confidence Interval
Approach. Journal of the Royal Statistical Society: Series B (Methodological) , 51 , 305–334.
Jennison, C. and T urnbull, B.W . (1999) Gr oup Sequential Methods with Applications to Clinical
T rials . CRC Press.
Kaplan, E.L. and Meier , P . (1958) Nonparametric Estimation from Incomplete Observations.
Journal of the American Statistical Association , 53 , 457–481.
Kim, K., Boucher , H. and T siatis, A.A. (1995) Design and analysis of group sequential logrank
tests in maximum duration versus information trials. Biometrics , 51 , 988–1000.
162
Kim, K. and Demets, D.L. (1987) Design and analysis of group sequential tests based on the
type I error spending rate function. Biometrika , 74 , 149–154.
Korn, E.L. and Freidlin, B. (2018) Interim monitoring for non-inferiority trials: Minimizing
patient exposure to inferior therapies. Annals of Oncology , 29 , 573–577.
Kreyszig, E. (2000) Advanced Engineering Mathematics: Maple Computer Guide , Eighth.
John W iley & Sons, Inc., USA.
Lan, K.K.G., Reboussin, D.M. and DeMets, D.L. (1994) Information and information fractions
for design and sequential monitoring of clinical trials. Communications in Statistics - Theory
and Methods , 23 , 403–420.
Lan, K.K.G., Simon, R. and Halperin, M. (1982) Stochastically curtailed tests in longT erm
clinical trials. Communications in Statistics. Part C: Sequential Analysis , 1 , 207–219.
Lan, K.K.G. and Zucker , D.M. (1993) Sequential monitoring of clinical trials: The role of
information and brownian motion. Statistics in Medicine , 12 , 753–765.
Leung, K.M., Elashof f, R.M. and Afifi, A.A. (1997) Censoring issues in survival analysis. An-
nual Review of Public Health , 18 , 83–104.
López Nájera, S.O. (2016) The Ef fect of Delayed Event Reporting on Interim Monitoring
Methodologies in Randomized Clinical T rials.
Mascarenhas, L., Felgenhauer , J.L., Bond, M.C., V illaluna, D., Femino, J.D., Laack, N.N., et
al. (2016) Pilot study of adding vincristine, topotecan, and cyclophosphamide to interval
compressed chemotherapy in newly diagnosed patients with localized Ewing sarcoma: A
report from the Children’ s Oncology Group. Pediatric blood & cancer , 63 , 493–498.
McIlvaine, E. (2015) The Impact of Data Collection Procedures on the Analysis of Randomized
Clinical T rials.
Miser, J.S., Krailo, M.D., Tarbell, N.J., Link, M.P., Fryer, C.J.H., Pritchard, D.J., et al. (2004)
Treatment of metastatic Ewing’s sarcoma or primitive neuroectodermal tumor of bone:
Evaluation of combination ifosfamide and etoposide–a Children’s Cancer Group and Pediatric
Oncology Group study. Journal of Clinical Oncology: Official Journal of the American
Society of Clinical Oncology, 22, 2873–2876.
O’Brien, P.C. and Fleming, T.R. (1979) A Multiple Testing Procedure for Clinical Trials.
Biometrics, 35, 549–556.
Pocock, S.J. (1977) Group Sequential Methods in the Design and Analysis of Clinical Trials.
Biometrika, 64, 191–199.
Pocock, S.J. and Geller, N.L. (1986) Interim Analyses in Randomized Clinical Trials. Drug
Information Journal, 20, 263–270.
Prinja, S., Gupta, N. and Verma, R. (2010) Censoring in Clinical Trials: Review of Survival
Analysis Techniques. Indian Journal of Community Medicine: Official Publication of
Indian Association of Preventive & Social Medicine, 35, 217–221.
Tanaka, S., Kinjo, Y., Kataoka, Y., Yoshimura, K. and Teramukai, S. (2012) Statistical issues
and recommendations for noninferiority trials in oncology: A systematic review. Clinical
Cancer Research: An Official Journal of the American Association for Cancer Research,
18, 1837–1847.
Van Der Laan, M.J. and Hubbard, A.E. (1998) Locally efficient estimation of the survival
distribution with right-censored data and covariates when collection of data is delayed.
Biometrika, 85, 771–783.
Wang, S.K. and Tsiatis, A.A. (1987) Approximately Optimal One-Parameter Boundaries for
Group Sequential Trials. Biometrics, 43, 193–199.
Ware, J.H., Muller, J.E. and Braunwald, E. (1985) The futility index. An approach to the
cost-effective termination of randomized clinical trials. The American Journal of Medicine, 78,
635–643.
Womer, R.B., West, D.C., Krailo, M.D., Dickman, P.S., Pawel, B.R., Grier, H.E., et al. (2012)
Randomized Controlled Trial of Interval-Compressed Chemotherapy for the Treatment of
Localized Ewing Sarcoma: A Report From the Children’s Oncology Group. Journal of
Clinical Oncology, 30, 4148–4154.
Abstract
In randomized clinical trials, the current standard data collection method censors event-free patients at their last visit before the analysis time, whereas an event is ascertained whenever it occurs. The resulting violation of independent censoring between the last visit time and the analysis time biases the estimate of the treatment effect, and delayed events reporting makes this bias worse. How delayed events reporting affects the interim futility monitoring of Phase III clinical trials remains unknown. In this study, we evaluate the performance of two commonly used futility monitoring methods, the RGray method and the Repeated Confidence Interval (RCI) method, in the presence of delayed events reporting, under both the null hypothesis and the alternative hypothesis. Three data processing methods, the standard method, the personal cutback method and the proposed personal cutback wait method, are explored and compared under each futility monitoring method. The results suggest that delayed events reporting does not affect studies with a low index of non-independent censoring (INIC). It does affect interim futility monitoring for studies with a high INIC, in which case the personal cutback wait method can be employed to address the problem.
Linked assets
University of Southern California Dissertations and Theses
Conceptually similar
The effect of delayed event reporting on interim monitoring methodologies in randomized clinical trials
The impact of data collection procedures on the analysis of randomized clinical trials
A simulation evaluation of the effectiveness and usability of the 3+3 rules-based design for phase I clinical trials
Carboplatin and vincristine chemotherapy for progressive low grade gliomas in pediatric patients with or without neurofibromatosis type 1 (NF1)
Interim analysis methods based on elapsed information time: strategies for information time estimation
Randomized clinical trial generalizability and outcomes for children and adolescents with high-risk acute lymphoblastic leukemia
Biomarker-driven designs in oncology
Statistical analysis of a Phase II study of AMG 386 versus AMG 386 combined with anti-VEGF therapy in patients with advanced renal cell carcinoma
A comparison of methods for estimating survival probabilities in two stage phase III randomized clinical trials
Extremity primary tumors in non-rhabdomyosarcoma soft tissue sarcoma: survival analysis
Bayesian models for a respiratory biomarker with an underlying deterministic model in population research
A novel risk-based treatment strategy evaluated in pediatric head and neck non-rhabdomyosarcoma soft tissue sarcomas (NRSTS) patients: a survival analysis from the Children's Oncology Group study...
Fine-grained analysis of temporal and spatial differences of behavior patterns and their correlation with the spread of COVID-19 in Los Angeles County
Exploring the interplay of birth order and birth weight on leukemia risk
Risk factors and survival outcome in childhood alveolar soft part sarcoma among patients in the Children’s Oncology Group (COG) Phase 3 study ARST0332
Surgical aortic arch intervention at the time of extended ascending aortic replacement is associated with increased mortality
Air pollution and breast cancer survival in California teachers: using address histories and individual-level data
Effect of glutamate excitotoxicity on multiple sclerosis-related fatigue
Validation of the Children’s International Mucositis Evaluation Scale (ChIMES) in pediatric cancer and SCT
Eribulin in advanced bladder cancer patients: a phase I/II clinical trial
Asset Metadata
Creator
Xia, Caihong (author)
Core Title
The effects of late events reporting on futility monitoring of Phase III randomized clinical trials
School
Keck School of Medicine
Degree
Doctor of Philosophy
Degree Program
Biostatistics
Degree Conferral Date
2021-12
Publication Date
11/17/2021
Defense Date
08/05/2021
Publisher
University of Southern California (original), University of Southern California. Libraries (digital)
Tag
clinical trials, futility monitoring, index of non-independent censoring, INIC, late events reporting, OAI-PMH Harvest, phase III
Format
application/pdf (imt)
Language
English
Contributor
Electronically uploaded by the author (provenance)
Advisor
Eckel, Sandrah (committee chair), Alonzo, Todd (committee member), Franklin, Meredith (committee member), Krailo, Mark (committee member), Mascarenhas, Leo (committee member)
Creator Email
caihongx@usc.edu, xchwhu@gmail.com
Permanent Link (DOI)
https://doi.org/10.25549/usctheses-oUC17138465
Unique identifier
UC17138465
Legacy Identifier
etd-XiaCaihong-10233
Document Type
Dissertation
Rights
Xia, Caihong
Type
texts
Source
20211118-wayne-usctheses-batch-898-nissen (batch), University of Southern California (contributing entity), University of Southern California Dissertations and Theses (collection)
Access Conditions
The author retains rights to his/her dissertation, thesis or other graduate work according to U.S. copyright law. Electronic access is being provided by the USC Libraries in agreement with the author, as the original true and official version of the work, but does not grant the reader permission to use the work if the desired use is covered by copyright. It is the author, as rights holder, who must provide use permission if such use is covered by copyright. The original signature page accompanying the original submission of the work to the USC Libraries is retained by the USC Libraries and a copy of it may be obtained by authorized requesters contacting the repository e-mail address given.
Repository Name
University of Southern California Digital Library
Repository Location
USC Digital Library, University of Southern California, University Park Campus MC 2810, 3434 South Grand Avenue, 2nd Floor, Los Angeles, California 90089-2810, USA
Repository Email
cisadmin@lib.usc.edu