BIOMARKER-DRIVEN DESIGNS IN ONCOLOGY
by
Yue Tu
A Dissertation Presented to the
FACULTY OF THE USC GRADUATE SCHOOL
UNIVERSITY OF SOUTHERN CALIFORNIA
In Partial Fulfillment of the
Requirements for the Degree
DOCTOR OF PHILOSOPHY
(BIOSTATISTICS)
May 2024
Copyright 2024 Yue Tu
Dedication
To all those with self-doubt: don’t underestimate yourself. The journey of learning is long and never ends.
Acknowledgements
I am very thankful to have worked with my advisor, Dr. Lindsay Renfro, for her guidance, support, and unwavering confidence in me during my study. Her understanding, insightful feedback and patience, especially
during times when my research progress was slow due to personal events, have been instrumental in the
completion of my thesis. I am grateful for her acknowledgment of my efforts and for creating an environment where I felt supported and empowered. Her expertise and mentorship have shaped my work. I am
proud to be her first PhD student.
I would like to express my gratitude to my dissertation committee members: Dr. Wendy Mack, Dr.
Mark Krailo, Dr. Todd Alonzo and Dr. Leo Mascarenhas. Their invaluable advice and constructive feedback
carried me forward throughout my research.
A special thanks goes to Dr. Yusha Liu, who laid a great foundation for my thesis work and helped me
with my projects.
I would like to extend my deepest gratitude to my husband, Kang-Li Cheng, whose unwavering support, love and encouragement have been a constant source of strength throughout this journey. To my
beloved companion, Toast, thank you for always being by my side and sharing those long nights with me.
I am blessed to have both of you in my life. (I want to say they are annoying sometimes, but they both
have strong feelings against that!)
To my parents, Liyu Liu and Xiaobiao Tu, I hope I can make you proud.
Shout out to all my friends! Thank you for being there through the highs and lows, offering listening
ears, hugs and laughter. As an international student, your company and support make me less lonely
and remind me of the importance of friendship. I am so grateful to have met you all and I will never forget
the time we spent together. Thank you to the friends I met at school: Zeyun Lu, Dong Yuan, Menglin Wang,
Yinqi Zhao, Ziyi Jiang, Yijie Li, Zixuan Zhang, Jiayi Shen, Jiawen (Carmen) Chen, Yi Zhang, Ding Li, Vahan
Aslanyan, Brittney Marian, Irene Chen, Haoyue Shan, Xinyue Rui, Jingxuan He, Lulu Song, Chubing Zeng,
Xiaozhe Yin, Lai Jiang and Huiyu Deng. For friends I met outside of school, Anqi Xu, Kuo Jiao, Nelson
Lam, Xiao Yi, Haoting Chen, Hong Ling and Chen Gong. Some old friends: Shengchao Hou, Lan Mu, Yun
Zhou, Shuang Wu, Jianting Shi, Bo Ci, Bowen Chen, Hanying Yan, Yuqian Gu, Yutong Liu, Yuan Feng,
Shuo Liu, Mingxu Shan, Peng Zhang, Xin Shen, Ying Ji, Qianxi Wang, Lei Tong, Yiyi Zhao, Guochao Sun,
Yuhan Shi, Yun Liu, Shutian Liu, Xinyi Zhou, Muxiao Yu, Yourong Wang, Qian Tan, Runjia Li, Peiying
Hua, Xiaoke Zou, Tianyi Ma, Xuemeng Wang, Liwen Wu, Yi Yang, Xingyi Shi, Yun Zhang and Shuang
Song. To everyone else, I may have missed your name, but if we crossed paths during this time, this
acknowledgement includes you!
I am grateful to have worked as a research assistant with Dr. Benjamin Henwood, Dr. Randall Kuhn, Dr. Lillian
Gelberg, Dr. Amenda Landrian Gonzalez and Jessie Chien at the School of Social Work on a study of homeless
individuals. I enjoyed the teamwork vibe and learned how my statistical skills can be applied in different
study areas.
I also want to thank Jennifer Borkowsky and Dr. Cheryl Jones for the internship opportunity at
Genentech; Dr. Tai Xie for hiring me at Brightech International; and Dr. David Moulton for mentoring
me during my internship at Google.
I feel truly fortunate and grateful to have had the opportunity to attend USC as a PhD student. I thank
my younger self for having the courage to go through this challenging but rewarding life experience.
Table of Contents
Dedication . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ii
Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . iii
List of Tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . vii
List of Figures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . viii
Abstract . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . x
Chapter 1: Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
Chapter 2: Literature Review on Novel Biomarker Trial Designs: Biomarker-Driven Basket Trial
Designs: Origins and New Methodological Developments . . . . . . . . . . . . . . . . . 4
2.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
2.2 Traditional Basket Trial Design . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
2.2.1 Description and Features . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
2.2.2 Example: NCI-MATCH . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
2.2.3 Traditional Basket Trial Design Challenges . . . . . . . . . . . . . . . . . . . . . . 8
2.3 Novel Basket Trial Designs: Recent Developments . . . . . . . . . . . . . . . . . . . . . . . 9
2.3.1 New Basket Designs: Bayesian Statistical Approaches . . . . . . . . . . . . . . . . 9
2.3.2 New Basket Designs: Classical Statistical Approaches . . . . . . . . . . . . . . . . . 17
2.4 Novel Basket Trial Designs: Published Comparisons and Examination of Features . . . . . 21
2.5 Conclusion: Future Needs in Basket Trial Designs . . . . . . . . . . . . . . . . . . . . . . . 22
Chapter 3: Latest Developments in “Adaptive Enrichment” Clinical Trial Designs in Oncology . . . 24
3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
3.1.1 Enrichment Trial Designs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
3.1.2 Adaptive Enrichment Trial Designs . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
3.2 Recent Developments and Extensions in Adaptive Enrichment Trial Designs . . . . . . . . 30
3.2.1 Bayesian Approaches . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
3.2.2 Frequentist Approaches . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
3.3 Published Comparisons and Examination of Features . . . . . . . . . . . . . . . . . . . . . 40
3.4 Conclusion and Future Needs in Adaptive Enrichment Trial Designs . . . . . . . . . . . . . 42
Chapter 4: Bayesian Adaptive Enrichment Design for Continuous Biomarkers . . . . . . . . . . . . 44
4.1 Method Part I: Randomized Trial Design with Continuous Biomarker for Binary Outcome 44
4.1.1 Model formulation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
4.1.2 Posterior computation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
4.2 Method Part II: Simulations without Adaptive Design Features . . . . . . . . . . . . . . . . 48
4.2.1 Introduction of simulation scenarios . . . . . . . . . . . . . . . . . . . . . . . . . . 48
4.2.2 Default estimation setting procedure . . . . . . . . . . . . . . . . . . . . . . . . . . 55
4.2.2.1 Description of marker effect estimation . . . . . . . . . . . . . . . 57
4.2.3 Exploration of Model and MCMC settings . . . . . . . . . . . . . . . . . . . . . . . 59
4.2.3.1 Impact of maximum number of knots on convergence and autocorrelation 60
4.2.3.2 Impact of thinning on convergence and autocorrelation . . . . . . . . . . 65
4.2.4 Exploration of the effect of max number of knots . . . . . . . . . . . . . . . . . . 74
4.2.5 Estimation of Marker Eects without Adaptive Randomization . . . . . . . . . . . 78
4.3 Method Part III: Adaptive Randomization . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80
4.3.1 Adaptive Randomization Framework . . . . . . . . . . . . . . . . . . . . . . . . . . 80
4.3.2 Simulation with Adaptive Randomization . . . . . . . . . . . . . . . . . . . . . . . 82
4.3.2.1 Exploration and Estimation with Adaptive Randomization . . . . . . . . 82
4.3.2.2 Ten Single Trials for Adaptive Randomization under All Scenarios . . . . 86
4.4 Trial design: Bayesian Enrichment Trial with Adaptive Randomization . . . . . . . . . . . 89
4.4.1 Interim/Final analysis algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . 90
4.4.2 Comparator Frequentist Design . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93
4.5 Simulation Study . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 97
4.6 Simulation Result . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99
4.7 Simulation Code . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 115
Chapter 5: Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 116
Bibliography . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 118
List of Tables
2.1 Summary table for novel basket trial design articles . . . . . . . . . . . . . . . . . . . . . . 23
3.1 Summary table for recent developments and extensions in adaptive enrichment trial designs 43
4.1 Simulation scenarios with descriptions, marker eect functions on the log odds ratio scale
with true parameter values . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
4.2 Simulation scenarios with descriptions, with maximum response rate . . . . . . . . . . . . 51
4.3 The average of mean and max of absolute deviation from the truth over 1000 iterations
for Scenario 1, Scenario 5 and Scenario 10 with different maximum allowable knots . . . . 78
4.4 The average of mean and max absolute deviation from the truth over 1000 iterations for
all scenarios with proposed spline model with knots = 6 versus a comparator model . . . . 80
4.5 Simulation scenarios with most efficacious marker value in experimental arm . . . . . . . 87
4.6 Simulation results for the proposed trial design . . . . . . . . . . . . . . . . . . . . . . . . . 105
4.7 Simulation results for the comparator frequentist trial design . . . . . . . . . . . . . . . . . 112
List of Figures
2.1 Basket trial schema with two different definitions of basket . . . . . . . . . . . . . . . . . 7
2.2 Basket trial design schema by Cunanan et al. (Left) and by Krajewska et al. (Right) . . . . . 19
3.1 Enrichment trial schema with a single arm . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
3.2 Adaptive enrichment trial schema with a binary biomarker . . . . . . . . . . . . . . . . . . 28
4.1 Log of odds of response rate as function of biomarker X for Scenarios 1-6 . . . . . . . . . 52
4.2 Log of odds of response rate as function of biomarker X for Scenarios 7-12 . . . . . . . . . 53
4.3 Response rate as function of biomarker X for Scenarios 1-6 . . . . . . . . . . . . . . . . . . 54
4.4 Response rate as function of biomarker X for Scenarios 7-12 . . . . . . . . . . . . . . . . . 55
4.5 Model’s estimation with default setting for Scenarios 1-6 . . . . . . . . . . . . . . . . . . . 58
4.6 Model’s estimation with default setting for Scenarios 7-12 . . . . . . . . . . . . . . . . . . 59
4.7 Traceplot for the MCMC chain under Scenario 1 with no thinning and 9 knots . . . . . . . 61
4.8 Autocorrelation plot for the MCMC chain under Scenario 1 with no thinning and 9 knots . 62
4.9 Traceplot for the MCMC chain under Scenario 1 with no thinning and 4 knots . . . . . . . 64
4.10 Autocorrelation plot for the MCMC chain under Scenario 1 with no thinning and 4 knots . 65
4.11 Traceplot for the MCMC chain under Scenario 1 with thin=20 and 9 knots . . . . . . . . . 66
4.12 Autocorrelation plot for the MCMC chain under Scenario 1 with thin=20 and 9 knots . . . 67
4.13 Traceplot for the MCMC chain under Scenario 1 with thin=20 and 4 knots . . . . . . . . . 68
4.14 Autocorrelation plot for the MCMC chain under Scenario 1 with thin=20 and 4 knots . . . 69
4.15 Traceplot for the MCMC chain under Scenario 1 with thin= 50 and 9 knots . . . . . . . . . 70
4.16 Autocorrelation plot for MCMC chain under Scenario 1 with thin = 50 and 9 knots . . . . . 71
4.17 Traceplot for the MCMC chain under Scenario 1 with thin = 50 and 4 knots . . . . . . . . . 72
4.18 Autocorrelation plot for the MCMC chain under Scenario 1 with thin = 50 and 4 knots . . 73
4.19 Estimation of treatment effect by different maximum allowable knots and different
Scenarios . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77
4.20 Randomization probability under Scenario 1 start number = 100 with different tuning
parameters and block sizes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84
4.21 Randomization probability under Scenario 1 start number = 200 with different tuning
parameters and block sizes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85
4.22 Randomization probability under Scenario 1 for 10 trials . . . . . . . . . . . . . . . . . . . 86
4.23 Randomization probability under different Scenarios . . . . . . . . . . . . . . . . . . . . . 89
4.24 Proposed Bayesian enrichment trial with adaptive randomization design schema . . . . . . 98
4.25 Comparator frequentist enrichment trial design schema . . . . . . . . . . . . . . . . . . . . 99
4.26 Rate of status by marker values for Scenarios 1-6 . . . . . . . . . . . . . . . . . . . . . . . . 106
4.27 Rate of status by marker values for Scenarios 7-12 . . . . . . . . . . . . . . . . . . . . . . . 107
4.28 Time of final decision by marker values for Scenarios 1-6 . . . . . . . . . . . . . . . . . . 108
4.29 Time of final decision by marker values for Scenarios 7-12 . . . . . . . . . . . . . . . . . . 109
4.30 Rate of interim stopping time by marker values for Scenarios 1-6 . . . . . . . . . . . . . . . 110
4.31 Rate of interim stopping time by marker values for Scenarios 7-12 . . . . . . . . . . . . . . 111
Abstract
Recent developments in gene sequencing of tumors in oncology have led to a widespread increase in the use of targeted therapies based on patients’ biomarkers, a paradigm generally referred to as “precision medicine”. Before such advances, cancer was viewed in relatively homogeneous terms, and treatment strategies focused on type, location, and stage of disease but did not distinguish among patients’ tumor biology. With the advent of precision medicine treatments, new challenges in the statistical design of clinical trials have naturally emerged.

This thesis focuses on challenges and advances in statistical designs that attempt to evaluate treatments in the framework of predictive biomarkers. First, we describe state-of-the-art trial designs and recent developments in the areas of basket and adaptive enrichment trials, in two separate review articles. A novel adaptive enrichment trial design to handle continuous biomarkers which may have non-linear or non-monotonic relationships with outcomes or treatment effects is then introduced, where we propose a Bayesian solution involving adaptive randomization. We show that this particular design can correctly make marker-specific trial decisions with high efficiency, which results in significantly improved and patient-tailored outcomes compared to standard approaches without adaptive randomization that further ignore or over-simplify true underlying marker relationships.
This dissertation is structured as follows: Chapter 1 introduces the background on biomarker-driven clinical trial designs in oncology, current challenges, and the motivation for the thesis. Chapter 2 is a review article describing traditional and recent developments in biomarker-driven basket trial designs, categorized into Bayesian and Frequentist perspectives and main modeling methods. Chapter 3 offers an overview of the standard enrichment and adaptive enrichment trial designs, describes current practical challenges, and reviews novel designs proposed in the past few years, from both Bayesian and Frequentist perspectives. Chapter 4 proposes a novel Bayesian enrichment design with adaptive randomization that could maximize patients’ benefit when a continuous biomarker is potentially predictive or prognostic. Finally, Chapter 5 provides a conclusion of the thesis work and a discussion of potential future directions.
Chapter 1
Introduction
Cancer was once viewed as relatively homogeneous in terms of treatment strategy according to type or location of malignancy and stage of disease. But with the rapid development of gene sequencing techniques, some cancers are now better understood to be heterogeneous across specific genomically or biologically defined characteristics, defining patient subpopulations. In turn, this has led to a new treatment paradigm that has shown success and promise in certain clinical settings. However, treatments based on molecular targeting, an approach known as “precision medicine”, have brought about new challenges in the statistical design of clinical trials. Specifically, novel clinical trial designs have been developed to reveal or validate subpopulations in which an experimental treatment has enhanced benefit. My dissertation focuses on developing, evaluating, and implementing such designs.
Chapter 2 and Chapter 3 consist of review articles each describing very recent updates in biomarker
designs: one review on biomarker-driven basket trial designs, and the other describing recent advances
in adaptive enrichment designs. Basket trials investigate one or more biomarker-targeted therapies across
multiple cancer types in a tumor location agnostic fashion, while adaptive enrichment designs initially
enroll patients with a certain type of cancer regardless of biomarker values and adaptively refine or enrich the target population to a treatment-sensitive subgroup during the study. The review articles offer overviews of the standard forms of such designs, the practical challenges facing each type of design, and then review novel adaptations proposed in the past few years, categorized into Bayesian and Frequentist
perspectives. The review articles conclude by summarizing potential advantages and limitations of the
new trial design solutions.
In Chapter 4, we specifically focus on adaptive enrichment trial designs, which are randomized clinical trials of a targeted therapy allowing mid-trial restriction to subpopulation(s) that show signs of benefit. Adaptive enrichment designs have the potential to reduce the overall trial duration by identifying subpopulation(s) that would benefit from the treatment, as well as the potential to validate putative predictive biomarkers, both of which are crucial objectives in the medical treatment research and development process. Any improvements in these areas will translate to significant resource and time savings and avoidance of assigning too many patients to a less efficacious treatment based on marker status.
Currently, however, the literature and practice focus on modeling biomarkers as binary variables that are either defined at trial initiation or dichotomized mid-trial as part of algorithmic searches for cutpoints or thresholds yielding a single subpopulation with enhanced treatment effect. Since many biomarkers are naturally continuous, there is sufficient evidence to believe that designs incorporating continuous marker modeling – and specifically, those that handle potentially non-monotonic prognostic and predictive relationships between biomarkers, treatment effect, and outcome – may be more sensitive, accurate, and efficient than crude dichotomization-based enrichment designs. To this end, there is a critical need for clinical trial designs where continuous biomarkers are treated as such, and decisions (such as efficacy, futility and inferiority) are made using all the information the biomarker presents, whether the marker is predictive, prognostic, both, or neither.
We develop a clinical trial design that includes these features. Additionally, within ongoing Bayesian assessment of the biomarker’s predictive value and corresponding statistical uncertainty, we apply a marker-specific adaptive randomization procedure. In contrast to the traditional equal randomization schema, response adaptive randomization utilizes the currently estimated posterior information for the (continuous and potentially nonlinear) marker and treatment effects, assesses these against incoming patients’ biomarker profiles, and assigns them to the optimal treatment arms in accordance with their marker values. Extensive simulations are conducted to evaluate the trial design’s performance. Implementation code will be shared on GitHub and can be used by others for simulation and trial planning.
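The marker-specific response-adaptive randomization idea described above can be sketched in a few lines. This is an illustrative simplification, not the dissertation’s actual implementation: the function name, the beta-distributed posterior draws, and the power-transform allocation rule (in the spirit of Thall and Wathen’s adaptive randomization) are assumptions introduced here for illustration only.

```python
import numpy as np

def rand_prob(draws_trt, draws_ctrl, c=0.5):
    """Probability of assigning an incoming patient to the experimental arm,
    given posterior draws of each arm's response probability evaluated at
    that patient's biomarker value.

    c is a tuning parameter: c = 0 recovers equal (1:1) randomization, and
    larger c allocates more aggressively toward the arm that currently
    looks better at this marker value.
    """
    # Posterior probability that the experimental arm is superior
    # at this patient's marker value.
    p_better = np.mean(draws_trt > draws_ctrl)
    num = p_better ** c
    return num / (num + (1.0 - p_better) ** c)

rng = np.random.default_rng(1)
# Hypothetical posterior draws for one patient's marker value:
# the experimental arm looks better (response rate ~0.6 vs ~0.4).
trt = rng.beta(30, 20, size=4000)
ctrl = rng.beta(20, 30, size=4000)
p = rand_prob(trt, ctrl, c=0.5)  # well above 0.5, but not a deterministic 1
```

Under equal posteriors the rule returns 0.5; as evidence accumulates for one arm at a given marker value, allocation tilts toward that arm while retaining some randomization, which preserves the trial’s ability to keep estimating both arms.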
Chapter 2
Literature Review on Novel Biomarker Trial Designs: Biomarker-Driven
Basket Trial Designs: Origins and New Methodological Developments
2.1 Introduction
Due to increased use of gene sequencing techniques, understanding of cancer on a molecular level has
evolved, in terms of both diagnosis and evaluation in response to initial therapies. This knowledge, now
being applied within most cancers previously characterized mainly by location in the body and stage of
disease, has dramatically changed approaches to drug development, where drugs’ new mechanisms of action are emphasized. A now-common therapeutic strategy involves identifying and then targeting certain
molecular pathways associated with disease progression and spread. In parallel, clinical trials meant to
evaluate molecularly-driven interventions through assessment of both treatment effects and putative predictive biomarker effects are being employed to advance the goals of precision medicine. In 2019, the Food
and Drug Administration (FDA) published the guidance document Adaptive Designs for Clinical Trials for
Drugs and Biologics Guidance for Industry and Master Protocols: Efficient Clinical Trial Design Strategies To
Expedite Development of Oncology Drugs and Biologics [35], recognizing the importance of using innovative
designs to address biologically-motivated questions, and indicating that such designs and their properties
should be carefully evaluated. In March 2022, FDA published another guidance more specific to master
protocols, Master Protocols: Efficient Clinical Trial Design Strategies to Expedite Development of Oncology Drugs and Biologics Guidance for Industry, recommending running phase II trials of novel therapies before running larger scale master protocols, which may simultaneously study multiple drugs in more than one cancer type [36]. While master protocols in general may be more efficient, they also pose special challenges related to assessing safety profiles and interpretation of statistical findings due to their complexity.
Several introductory reviews of biomarker-driven designs covering definitions and properties of “standard” biomarker design classes (e.g., enrichment, interaction, basket, and umbrella designs) and associated operational frameworks (e.g., master protocols and platform designs) exist in the literature [75, 38, 85, 108]. However, within some of these classes of designs, novel statistical methodology affecting design components such as marker evaluation, population enrichment or selection, Bayesian posterior “information” borrowing, efficacy determination, and adaptive decision-making has continued to evolve in response to practical or statistical shortcomings.
The aim of this article is to provide a “late-breaking” overview of new design developments in basket
trials, a class of biomarker-driven design that has been actively extended by several research groups in
recent years. We first describe the traditionally used design in its most basic form, describe features or
issues that motivated development of methodologic extensions or solutions, and describe real-life examples
for illustration. Next, we review selected impactful publications from the last few years, highlighting novel
design features and performance and published head-to-head comparisons of different design strategies
that may impact trial performance. For organizational purposes and ease of comparison within statistical
paradigms, we separately describe Bayesian versus classical developments, though we acknowledge this
distinction is arbitrary on its own.
2.2 Traditional Basket Trial Design
2.2.1 Description and Features
When an investigational therapy’s mechanism of action targets a specific genomic or molecular alteration, it is reasonable to hypothesize that therapeutic activity may be independent, to some degree, of tumor
location or histology. A basket trial is a type of “master protocol” design that recruits patients with multiple tumor types into parallel sub-trials or “baskets”, where sub-trial eligibility is based on the presence
of a genomic alteration in the tumor rather than tumor location or histology [76]. Under this definition, a “basket” is defined by one or more genomic alterations, with tumor types resembling different pieces of fruit nested within the molecularly similar basket. One such “basket” trial may be conducted on its own, but more often, multiple molecular targets and matched experimental treatments may be investigated in parallel, as different baskets (sub-trials) within an overall master protocol framework, as shown in the
left panel of Figure 2.1.
Within a basket (sub-trial), patients with the same biomarker but different tumor histologies are pooled together as if they represent a homogeneous cohort, with the goal of evaluating an investigational therapy’s overall effect across tumor types. A basket (sub-trial) most often then has a single-arm, non-randomized design, as the standard of care or control that would be assigned via randomization may not be predictable or the same across tumor types within the basket. Because baseline prognosis usually differs
across tumor types as well, the primary endpoint in a basket trial is usually tumor (or blood) response
rather than time to progression or progression-free survival.
Occasionally, the question of homogeneity of response rates across tumor types is a secondary objective, with the protocol including prospective comparisons of the treatment’s activity across those tumor
types. Such an analysis of homogeneity of response may be triggered only if there is an overall response
rate above a certain threshold (see NCI-MATCH), where the goal is identifying whether one particular tumor type “drove” the overall result. Otherwise, a low overall response rate usually signals that the marker-driven and targeted-therapy hypothesis is not worth further study. Rarely, evaluation of treatment activity within each tumor-specific cohort is of main interest (see the Vemurafenib study for nonmelanoma cancers [41]). In that situation, the tumor type may be referred to as the “basket” rather than the entire sub-study, and adjustments for multiple comparisons may be made to control overall type I error [37, 53], as in the right panel of Figure 2.1. The usage of the term “basket” should be noted, due to the varied definitions of “basket” among different trials.
Figure 2.1: Basket trial schema with two different definitions of basket
2.2.2 Example: NCI-MATCH
An example basket trial with a traditional design is NCI Molecular Analysis for Therapy Choice (MATCH)
[93]. NCI-MATCH is an ongoing phase II non-randomized trial initiated in 2015 that enrolls adult patients with solid tumors, lymphoma, or multiple myeloma into one of multiple single-arm sub-studies based on genomic abnormalities identified by next-generation sequencing on tumor biopsy specimens
[23]. As of February 2023, there are 38 sub-studies and only one arm is still open for enrollment, with an
estimated sample size of 6452 screened participants [93]. After biopsy screening, patients are matched to
a sub-study if their tumor harbors a genetic mutation associated with that sub-study; patients stay on the sub-study until disease progression and may potentially match to another sub-study if additional matching aberrations exist. By broadly investigating multiple cancer types and antitumor therapies believed
to target “basket”-specific molecular abnormalities of interest, many therapies, markers, and tumor types can be studied in a large, efficient trial system with common screening and parallel infrastructure, thereby
accelerating development and standardizing conclusions.
2.2.3 Traditional Basket Trial Design Challenges
Basket trials usually have a relatively straightforward design at the sub-study level; i.e., a single-arm cohort with a response-based endpoint evaluated across tumor types in either a one-stage or two-stage enrollment design. However, this approach is inherently susceptible to some weaknesses, such as the potential for the effect within one tumor type to be masked or washed out by the others, particularly if the relative size of that tumor group is small. From this perspective, an overall analysis across tumor types may be too conservative, while independent analyses within tumor types may be too strict. Also, different tumor types often have different prognostic processes, and an overall analysis across tumor types may not be informative. In addition, most basket trials are non-randomized because incorporation of appropriately different historical controls for different tumor types in a single-arm setting often isn’t possible, so a given sub-study is often restricted to seek an indisputable “home run” achievement of a high overall response rate. To address these tradeoffs and associated barriers to implementation of basket trials, a number of research groups have proposed modifications to the “traditional” basket trial design framework.
2.3 Novel Basket Trial Designs: Recent Developments
Here, we provide an overview of new developments in basket trial designs, presented separately by
Bayesian versus Classical design solutions in order to highlight some statistically similar features within
each category.
2.3.1 New Basket Designs: Bayesian Statistical Approaches
Bayesian Hierarchical Modeling Approaches. Several recently proposed modifications or extensions to basket trials utilize the Bayesian hierarchical modeling (BHM) framework to assess and potentially borrow
endpoint information across tumor types, where in this context, the tumor types represent the “baskets”
within a single sub-trial.
Thall et al. [94] were among the first to propose a design based on BHM that adaptively borrows
information across different strata with interim analyses to estimate the treatment effect in a particular
subgroup. Although this design was not explicitly applied to or called a “basket trial” by name, its parallel
application to a disease comprised of multiple subtypes that may or may not be exchangeable became a
“launching point” for future Bayesian solutions. For example, Berry et al. [4] specifically applied BHM to
the basket trial setting to borrow strength across strata based on observed data, with a shrinkage parameter controlling the amount of borrowing (the parameter is given a non-informative prior). Subsequently,
Freidlin and Korn [25] demonstrated that the shrinkage parameter in the BHM context could not be estimated precisely from observed data when the number of baskets is fewer than 10. To overcome the issues
of shrinkage parameter estimation, Chu and Yuan [12] proposed a variation of the Bayesian hierarchical
model called the Calibrated Bayesian Hierarchical Model (CBHM) for single-arm basket trials with binary endpoints. Instead of treating a single shrinkage parameter as fixed but unknown, they define shrinkage as a
monotonically increasing function of subgroup heterogeneity, which is in turn derived from chi-squared
test statistics. The parameters of the function itself are calibrated through simulations so that shrinkage
is greatest when heterogeneity is low and least when heterogeneity is high, and the design
allows for early stopping for futility (but not efficacy). Since the degree of borrowing can be controlled, the
inflation of Type I error for ineffective subgroups can also be controlled, unlike in the Bayesian hierarchical
model. The authors compared performance against the BHM approach under different prior distributions
and against independent evaluations without pooling, and concluded that CBHM has better control of
Type I error than traditional BHM when heterogeneity is present; BHM cannot estimate the shrinkage
parameter precisely enough compared to the CBHM, which leads to inflated Type I error.
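To make the calibration idea concrete, the sketch below illustrates the mechanism under loudly labeled assumptions: the chi-squared heterogeneity statistic, the exponential functional form mapping it to a between-basket variance, and the constants a and b are all illustrative stand-ins, not Chu and Yuan’s calibrated values.

```python
import math

def heterogeneity_stat(responses, totals):
    """Chi-squared goodness-of-fit statistic comparing basket-level
    response counts to the pooled rate (larger = more heterogeneous)."""
    pooled = sum(responses) / sum(totals)
    stat = 0.0
    for r, n in zip(responses, totals):
        expected_r = n * pooled            # expected responders under pooling
        expected_f = n * (1 - pooled)      # expected non-responders
        stat += (r - expected_r) ** 2 / expected_r
        stat += ((n - r) - expected_f) ** 2 / expected_f
    return stat

def shrinkage_variance(t, a=-2.0, b=1.5):
    """One plausible monotone mapping from heterogeneity statistic T to a
    between-basket variance sigma^2; a and b would be calibrated by simulation.
    Small variance = strong shrinkage (borrowing); large = little borrowing."""
    return math.exp(a + b * math.log(t + 1e-8))

# Homogeneous-looking data -> small variance -> strong borrowing;
# heterogeneous data -> large variance -> weak borrowing.
t_low = heterogeneity_stat([6, 7, 6], [20, 20, 20])
t_high = heterogeneity_stat([2, 7, 15], [20, 20, 20])
assert t_low < t_high
assert shrinkage_variance(t_low) < shrinkage_variance(t_high)
```

The direction matches the design’s intent: low heterogeneity yields a small between-basket variance (greatest shrinkage), high heterogeneity a large one (least shrinkage).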
Instead of assuming homogeneity and exchangeability across all baskets, other approaches extended
BHM from borrowing information across all strata to allowing for stand-alone strata that do not share in information borrowing. For example, the exchangeability-nonexchangeability approach (EXNEX) proposed
by Neuenschwander et al. [66] allows a given stratum either to exchange information on treatment effect estimation with other strata or to be considered non-exchangeable and therefore analyzed as an
independent stratum. To identify strata with different response rates and aggregate those with similar
response rates, Chu and Yuan [13] proposed a Bayesian latent subgroup trial (BLAST) design that clusters baskets
into subgroups and applies BHM to borrow information within each subgroup, with the number of subgroups
ranging from one to three. Instead of using the deviance information criterion to select the number of
subgroups as in BLAST, a Bayesian cluster hierarchical model (BCHM) uses Dirichlet processes to perform
subgroup classification, which is followed by information borrowing within subgroups [10].
Another extension of the BHM was proposed by Liu et al. [55] in the context of a two-stage design for a
binary outcome. In their design, homogeneity of treatment effect across tumor-type baskets is assessed at
the first-stage interim analysis using a meta-analytic random effects model and Cochran’s Q test. If effects
are heterogeneous, Simon’s two-stage design is applied to each basket independently; if homogeneous,
only baskets with sufficiently high Bayesian predictive power of trial success will continue to the second
stage. A Bayesian hierarchical mixture model (an extension of BHM with a mixture prior distribution) is
applied at the final analysis, with potential information borrowing across baskets. Liu et al. [55] compared
their design to parallel Simon’s two-stage designs for each basket and concluded that when sample sizes
are fixed to be equivalent, their design more often makes the correct go/no-go decision in all cases while
requiring a smaller sample size on average [86].
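The first-stage homogeneity check can be sketched with a stdlib-only Cochran’s Q for proportions. This is a generic illustration rather than Liu et al.’s exact implementation; the critical value 7.815 is the 0.95 quantile of a chi-squared distribution with 3 degrees of freedom, matching the four-basket toy data below.

```python
def cochran_q(responses, totals):
    """Cochran's Q for homogeneity of proportions: inverse-variance
    weighted squared deviations from the weighted mean response rate."""
    rates, weights = [], []
    for r, n in zip(responses, totals):
        p = (r + 0.5) / (n + 1.0)          # small continuity correction
        rates.append(p)
        weights.append(n / (p * (1 - p)))  # 1 / Var(p_hat)
    pbar = sum(w * p for w, p in zip(weights, rates)) / sum(weights)
    return sum(w * (p - pbar) ** 2 for w, p in zip(weights, rates))

# Four baskets; chi-squared(df = 3) 0.95 quantile is 7.815
CRIT = 7.815
q_hom = cochran_q([5, 6, 4, 5], [20, 20, 20, 20])   # similar rates
q_het = cochran_q([1, 5, 6, 14], [20, 20, 20, 20])  # one clear outlier
print(q_hom > CRIT, q_het > CRIT)  # -> False True
```

Under the design logic, the first verdict would route all baskets to a pooled analysis, while the second would send each basket through its own Simon two-stage evaluation.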
For settings in which several treatments are studied across multiple cancer types with varying treatment effects among
biomarker-defined subgroups, Ventz et al. [96] proposed a design for basket trials with a binary outcome that generalizes BHM and combines it with response-adaptive randomization. The prior for
the response probability is constructed from the biomarker-cancer prognostic effect, the overall treatment effect, and marker-specific and cancer-specific treatment effects, all modeled as independent normal components.
Patients are adaptively allocated to the treatment that maximizes the posterior probability of treatment effect given the patient’s biomarker profile. At the same time, a minimum number of patients for each
drug, biomarker, and cancer combination is guaranteed. If a control arm exists, the number of subjects
assigned to the control should match the highest number of patients among the experimental arms, given a
particular disease and biomarker type. Futility stopping is based on the posterior probability of a positive
treatment effect, and efficacy stopping is based on efficacy test statistics after a minimum required number of patients are enrolled, with pre-specified decision boundaries. Boundaries are tuned by simulation
to control Type I error. This design can be applied under two settings: the subpopulation-stratified design, which evaluates multiple therapies in one biomarker-positive population with a control arm, and the
subpopulation-finding design, which aims to select subgroups with a positive biomarker-drug interaction
under multiple therapies. For the subpopulation-stratified study simulation, there are three experimental
arms with cancer-specific control arms and only one arm with treatment benefit, five cancer types with
different accrual rates, and a sample size of 240 per cancer type. Under 7 scenarios, adaptive randomization designs with and without borrowing information across cancer types correctly allocate more patients
into the cancer-specific positive treatment arm compared to balanced allocation. Also, the required sample
sizes for the proposed design, with or without borrowing, are much smaller to obtain the same power
compared to the two balanced randomized designs: a multi-arm study and three separate two-arm studies.
For the subpopulation-finding study simulation, there are three cancer types, four biomarkers, and five experimental arms without a control arm. Adaptive designs with and without borrowing are compared with
a multi-arm balanced randomization design, and 250 patients are enrolled for each biomarker. Only futility
stopping is enabled for all three designs. While the proposed adaptive design allocates more patients to
the effective treatment, information borrowing also results in incorrect assignments when the treatment
effect is heterogeneous across disease-biomarker combinations. Point estimation is also provided with
only small bias.
Traditional basket trials do not include a control arm, and Yin et al. [107] further extended the Bayesian
hierarchical model design by incorporating randomization between experimental and control arms within
each basket (tumor type). The primary endpoint is binary, and a two-level hierarchical model is used to
allow borrowing of information across baskets. Beta prior distributions are applied to the response rates for
the control arm in each basket. The final decision rules to claim efficacy are based on posterior probabilities of treatment effect, which can have different thresholds for each basket while controlling FWER
for multiplicity. The sample size is the same for each subgroup and is optimized to achieve the desired
power via simulation. The proposed design was compared with parallel two-arm, single-stage studies. In
the simulation, there were 4 subgroups, the response rate was 0.2 for the control arm in all subgroups, and
the targeted effect size was 0.2. Six scenarios, including the null and alternative cases, were considered,
and the required sample size was 44 per subgroup. The proposed design is not overpowered when the
treatment effect is homogeneous. However, it inflates the false positive rate in the non-active subgroups
when treatment effects are heterogeneous across subgroups. Also, no early stopping for futility is
considered in the proposed design.
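The flavor of a per-basket randomized comparison can be sketched with a generic beta-binomial calculation of the posterior probability that the experimental arm’s response rate exceeds the control’s. The Beta(0.5, 0.5) priors, the Monte Carlo approach, and the 0.95 threshold are illustrative assumptions, not the hierarchical model or calibrated cutoffs of Yin et al.

```python
import random

def prob_treatment_better(rt, nt, rc, nc, a=0.5, b=0.5, draws=100_000, seed=1):
    """Monte Carlo estimate of P(p_treatment > p_control | data) under
    independent Beta(a, b) priors on each arm's response rate."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(draws):
        pt = rng.betavariate(a + rt, b + nt - rt)  # posterior draw, treatment
        pc = rng.betavariate(a + rc, b + nc - rc)  # posterior draw, control
        wins += pt > pc
    return wins / draws

# 12/22 responders on treatment vs 4/22 on control in one basket
post = prob_treatment_better(12, 22, 4, 22)
print(round(post, 2))  # declare efficacy in this basket if, say, post > 0.95
```

In a hierarchical version, the treatment-arm posteriors would additionally shrink toward each other across baskets before this comparison is made.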
Combining the frameworks of biomarker adaptive enrichment designs with basket trials, Yin et al.
[106] proposed a basket biomarker cutoff (BCC) design for a continuous biomarker and binary outcome,
which identifies a biomarker threshold and selects the subgroup of patients who benefit the most from the
treatment within tumor-specific baskets. In Stage 1, the optimal biomarker cutoffs are determined using a
grid search, and the tumor-specific response rates are modeled by BHM to identify the biomarker cutoff that
yields a clinically meaningful response rate in each basket. Then in Stage 2, enrollment criteria
are restricted based on the selected cutoffs, and only biomarker-positive patients who are more likely to
respond to the treatment are enrolled into the respective baskets, with both efficacy and futility stopping
rules. BCC was compared with the predictive probability design by Lee and Liu [52] and the Bayesian
enrichment two-stage design by Shi and Yin [81] with marker cutoff search. In simulations with three
tumor baskets, BCC provided higher power to detect a clinically meaningful incremental treatment effect in
the selected biomarker-positive subpopulation than the other two designs across scenarios.
To incorporate not only cancer types but also biomarker information into the definition of
“basket” in a basket trial framework, Takeda et al. [91] proposed a constrained hierarchical Bayesian model
for latent subgroup (CHBM-LS) design, where cancer type serves as the first classifier and a categorical
biomarker serves as the second classifier. Assuming different underlying latent subgroups
formed by the two classifiers and similar treatment effects within the same latent subgroup, the proposed
design identifies the latent subgroup each basket belongs to and allows exchangeability within a subgroup using BHM for a binary outcome. An ordered, monotonic constraint on response rates within each
classifier is imposed. The number of latent subgroups is determined by goodness-of-fit per the deviance
information criterion, and the authors suggest limiting the number of treatment-sensitive and treatment-insensitive subgroups (by cancer type and biomarker) to 4 in practice. CHBM-LS was compared with other
approaches including independent analyses by basket, BHM, BHM with a covariate, the clustered Bayesian
hierarchical modeling of Jiang et al. [44], and EXNEX by Neuenschwander et al. [66]. Under different
scenarios assuming four cancer types and a binary biomarker, CHBM-LS performed the best in terms of
power when treatment effects were heterogeneous across both cancer types and biomarker levels. However, if treatment effects were heterogeneous only at the biomarker level and not across cancer types, the BHM
design with biomarker as a covariate was preferred.
Bayesian Model Averaging Approaches. Psioda et al. [72] proposed a single-arm Bayesian adaptive
basket trial design with a binary outcome and interim analyses for early efficacy or futility. Their approach
uses Bayesian model averaging (BMA) to pool tumor-type-specific baskets with similar response rates,
where a grid of rate models is constructed assuming different response rates and constraints for equal
response rates between some groups [58, 74, 19]. The likelihood of the observed data under each set of
models across the joint model space is calculated assuming beta-binomial models for individual cohort-specific rates. Each basket then reaches a decision of early futility or efficacy based on prespecified
thresholds at each interim analysis. In simulations, the BMA approach was compared to independent two-stage Simon designs, frequentist parallel two-stage designs, and the calibrated Bayesian hierarchical model,
assuming 5 subgroups and variations in true response rates and accrual [86, 12, 17]. BMA was found to have
the best family-wise error rate (FWER) control and high power in some scenarios, with shorter expected
trial duration and smaller sample size in most scenarios. While the design by Psioda et al. [72] demonstrated some
advantages, it becomes more computationally intensive as the number of baskets (tumor types) increases,
as a much larger model space must be explored. To address this, Asano and Hirakawa [2] proposed a
simplified BMA approach using two different models to account for homogeneity and heterogeneity of
treatment effect among tumor subgroups within a single-arm basket trial, based on the work of Simon et al.
[87] and Simon et al. [85]. Here, the number of patients in each basket (tumor type) follows a multinomial
distribution with equal probability, and at interim analyses for early futility, the posterior probabilities of
the homogeneity model (M0) and heterogeneity model (M1) are estimated from binary patient response
data. If a basket reaches a futility threshold, the remaining sample would not be reallocated to those
remaining baskets. In simulations, the proposed design was compared to independent cohort evaluations,
BHM, EXNEX as proposed by Neuenschwander et al. [66], and Simon’s method, and typically demonstrated
a higher FWER than the BHM weak-borrowing method and lower power than the BHM strong-borrowing
method.
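The computational core shared by these BMA designs can be sketched on a toy scale: enumerate candidate groupings of baskets into equal-response-rate blocks, score each by its beta-binomial marginal likelihood, and convert the scores into posterior model probabilities. The uniform model prior, Beta(1, 1) rate priors, and the use of all set partitions as the model space are simplifying assumptions for illustration, not the exact model space of Psioda et al. or Asano and Hirakawa.

```python
import math

def set_partitions(items):
    """Recursively enumerate all set partitions of a list."""
    if len(items) == 1:
        yield [items]
        return
    first, rest = items[0], items[1:]
    for smaller in set_partitions(rest):
        for i, block in enumerate(smaller):
            yield smaller[:i] + [[first] + block] + smaller[i + 1:]
        yield [[first]] + smaller

def log_marginal(partition, responses, totals, a=1.0, b=1.0):
    """Log marginal likelihood: each block shares one Beta(a, b) response rate."""
    def log_beta(x, y):
        return math.lgamma(x) + math.lgamma(y) - math.lgamma(x + y)
    total = 0.0
    for block in partition:
        r = sum(responses[i] for i in block)
        n = sum(totals[i] for i in block)
        total += log_beta(a + r, b + n - r) - log_beta(a, b)
    return total

# Three baskets: two look alike (3/20, 4/20), one looks different (11/20)
responses, totals = [3, 4, 11], [20, 20, 20]
parts = list(set_partitions([0, 1, 2]))
logs = [log_marginal(p, responses, totals) for p in parts]
m = max(logs)
weights = [math.exp(v - m) for v in logs]          # uniform model prior
probs = [w / sum(weights) for w in weights]
best = parts[probs.index(max(probs))]
print(best)  # -> [[0, 1], [2]]: baskets 0 and 1 share a rate, basket 2 stands alone
```

Inference for any basket would then average posterior quantities over all partitions with these weights rather than committing to the single best one.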
Other Bayesian Strategies. Taking a different approach than BHM or BMA, Fujikawa et al. [26] proposed
that information borrowing across strata be driven by the similarity of posterior distributions for stratum-specific response rates. In this framework, posterior distributions for the response rates are estimated from
beta-binomial models, and pairwise similarity of posteriors is assessed by Jensen-Shannon and Kullback-Leibler divergences. Posterior predictive probabilities of response in each stratum, which are estimated
from posterior distributions updated with borrowed information, are used in stopping rules for each
stratum. Simulations assuming three 24-patient cohorts were conducted, and performance was compared against
independent analyses and BHM; the authors conclude that the proposed design has higher power and
smaller sample size than BHM and independent analyses in most cases.
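The similarity calculation can be sketched numerically: discretize two Beta posteriors on a fine grid and compute their Jensen-Shannon divergence (a symmetrized, bounded variant of Kullback-Leibler). This is a generic illustration of the divergence measure, not Fujikawa et al.’s specific borrowing weights.

```python
import math

def beta_pdf(x, a, b):
    """Density of Beta(a, b) at x in (0, 1), via log-gamma for stability."""
    log_c = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_c + (a - 1) * math.log(x) + (b - 1) * math.log(1 - x))

def js_divergence(a1, b1, a2, b2, grid=2000):
    """Jensen-Shannon divergence between Beta(a1, b1) and Beta(a2, b2),
    approximated on a discrete grid over (0, 1)."""
    xs = [(i + 0.5) / grid for i in range(grid)]
    p = [beta_pdf(x, a1, b1) for x in xs]
    q = [beta_pdf(x, a2, b2) for x in xs]
    sp, sq = sum(p), sum(q)
    p = [v / sp for v in p]   # normalize to probability masses
    q = [v / sq for v in q]
    def kl(u, v):
        return sum(ui * math.log(ui / vi) for ui, vi in zip(u, v) if ui > 0)
    mix = [(ui + vi) / 2 for ui, vi in zip(p, q)]
    return 0.5 * kl(p, mix) + 0.5 * kl(q, mix)

# Posteriors after 6/20 and 7/20 responses are similar (small divergence);
# 6/20 versus 16/20 are dissimilar (divergence approaching its ln 2 maximum).
print(js_divergence(1 + 6, 1 + 14, 1 + 7, 1 + 13) <
      js_divergence(1 + 6, 1 + 14, 1 + 16, 1 + 4))  # -> True
```

A small divergence between two strata would then justify a larger borrowing weight when updating each stratum’s posterior.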
Hobbs and Landin [39] proposed a sequential basket trial design with interim Bayesian stopping rules
and a binary outcome. Rather than approaching dimension reduction through a single-source hierarchical
model (SEM), the authors proposed to apply multisource exchangeability modeling (MEM), originally proposed by Kaizer et al. [46], thereby extending MEM to the basket trial setting. In this context, exchangeable
basket-specific model parameters are assumed to achieve posterior shrinkage, and the number of distinct
parameters is determined by the grouping of possible pairwise exchangeability relationships. Basket-wise
exchangeability is based on the marginal posterior distribution and demonstrated greater flexibility than
SEM for basket-specific posterior inference. The proposed design was applied to a vemurafenib basket
trial, where it effectively integrated information among baskets with similar treatment effects [41].
While many Bayesian approaches base decision-making and inference on the posterior distribution
directly, Zhou and Ji [112] proposed a robust Bayesian hypothesis testing method (RoBoT) for the single-arm basket trial design. In this framework, baskets belong to different latent subgroups; the number
and composition of the subgroups are determined by Dirichlet process mixture modeling, and a test of
whether the overall estimated response rate differs from the null is performed. The authors conducted
simulations across 7 scenarios to assess the operating characteristics for 4 baskets and sample sizes of 10
or 20. The performance of RoBoT was also compared with independent analyses in each basket, BHM,
EXNEX by Neuenschwander et al. [66], and BLAST by Chu and Yuan [13]. FWER, family-wise power,
basket-wise Type I error and power, bias, and MSE were assessed. RoBoT proved robust and controlled FWER
and basket-wise Type I error in all scenarios. Two real-world case studies were also discussed.
Motivated by IMPACT II, which explored targeted therapies for multiple tumor-marker combinations, Xu
et al. [104] proposed a nonparametric Bayesian basket trial design that adaptively allocates patients
between targeted and non-targeted therapies and identifies treatment-sensitive subpopulations [42]. Subpopulations are defined by tumor-marker pairs, and a decision-theoretic approach is applied to find the
subpopulation with maximized expected utility. Concluding that no subgroup stands out, with or without an overall treatment effect, is also permitted as a final result. The design is applied to a survival endpoint, and the utility function is built on the log average hazard ratio over a period for tumor-marker pairs, with a penalty
for a small number of subjects. A nonparametric Bayesian survival regression model, proposed by Müller et al. [65] and Quintana et al. [73], is used to model
the treatment effect. The model randomly partitions patients into clusters given covariates, and it is a modification of the product partition model with a
similarity function that favors homogeneous clusters. After the initial run-in period, the patient-specific
randomization ratio arises from the posterior predictive probability of superiority of the targeted therapies. In
the simulation, 6 scenarios are considered with 5 biomarkers, 3 types of cancer, and 400 patients. The
proposed design allows more patients to receive an effective treatment given the tumor-marker pairs and
has a high probability of correctly identifying the subpopulation with the largest utility. The design was
compared against a two-arm randomized study without biomarker-defined subgroups and against separate
independent trials for each subgroup, and the authors demonstrated that it yields less biased estimation of
the treatment effect given tumor-marker pairs. One limitation is that the design parameters for the utility
function are arbitrary and require fine-tuning to provide the desired operating characteristics.
2.3.2 New Basket Designs: Classical Statistical Approaches
Simon’s two-stage design extensions. Simon’s two-stage design is the most conventionally used phase II design for single-arm trials [86]. Zhou et al. [111] proposed to extend Simon’s optimal design and minimax
two-stage design to the setting of basket trials as a multi-arm trial. In this context, the primary endpoint is
tumor response rate, and the research question is whether the treatment is effective in at least one basket
(tumor type). Subgroup selection is made when n1 patients are evaluated at stage 1 for each subgroup, and
only those with at least r1 responders will continue to enroll n2 patients in stage 2. At the end of stage
2, the remaining subgroups are pooled together, and α* for the pooled analysis is based on the binomial
distribution. Design parameters r1, n1, n2, and α* are solved for by controlling the Type I and Type II errors.
A table of design parameters for Type I error = 0.05 and Type II error = 0.2 is provided, corresponding to
the optimal and minimax designs. The proposed design is compared to Simon’s Bayesian basket design,
independent Simon two-stage designs with Bonferroni correction, and pooled analyses without pruning (subgroup selection). The simulations assume 6 baskets (tumor types) with a null response rate of 0.05 and
an alternative rate of 0.2, 0.3, or 0.4, and 8 scenarios are considered. The total sample size is fixed at 23 for
all four designs, which was chosen based on the optimal design. The independent design has the lowest
power, and the pooled design only performs well with a high number of active baskets. The proposed design has power similar to Simon’s Bayesian basket design in most cases. While it gives a higher expected
number of true positives when there are 1, 2, or 4 active baskets, its expected number of false positives is
higher in all scenarios compared to Simon’s Bayesian basket design. The proposed design’s setup is
straightforward, requiring only Type I and Type II error rates as design parameters. However, the relatively
high expected number of false positives may be a trade-off.
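The pruning-then-pooling mechanics can be sketched as follows; the values of r1, n1, n2, the null rate, and the pooled-stage significance level are illustrative placeholders rather than entries from the published design tables.

```python
from math import comb

def binom_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

def two_stage_basket(stage1_resp, n1, r1, stage2_resp, n2, p0, alpha_star):
    """Prune baskets with fewer than r1 responders at stage 1, pool the
    survivors' stage-1 and stage-2 data, and test against null rate p0."""
    keep = [i for i, r in enumerate(stage1_resp) if r >= r1]
    if not keep:
        return keep, False
    pooled_r = sum(stage1_resp[i] + stage2_resp[i] for i in keep)
    pooled_n = len(keep) * (n1 + n2)
    p_value = binom_tail(pooled_r, pooled_n, p0)
    return keep, p_value <= alpha_star

# Six baskets, n1 = 7 each, continue only with >= 1 responder (illustrative)
keep, reject = two_stage_basket(
    stage1_resp=[0, 1, 0, 2, 0, 3], n1=7, r1=1,
    stage2_resp=[None, 2, None, 4, None, 5], n2=10,
    p0=0.05, alpha_star=0.05)
print(keep, reject)  # -> [1, 3, 5] True
```

In the actual design, α* is not a free choice but is solved jointly with r1, n1, and n2 so that the overall procedure attains the targeted Type I and Type II error rates.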
Similarly, Cunanan et al. [17] proposed an adaptive two-stage basket trial design that extends Simon’s
optimal two-stage design in each basket in parallel. However, instead of extending directly to a multi-arm
design, heterogeneity of the response rates across baskets is assessed at interim analyses. If the treatment
effects are similar, baskets (tumor types) are pooled together and either stopped early for futility or continued to
the second stage and analyzed as a pooled sample; if the treatment effects are heterogeneous, baskets are
analyzed separately for futility, and only promising baskets enter the second stage, with a Bonferroni-type
multiplicity adjustment at the final analysis. Heterogeneity is assessed by an exact test on the contingency
table of response by basket with a calibrated decision parameter. Due to computational intensity, the
total sample size for each stage, the critical values for testing homogeneity, and the basket-specific stopping boundaries must be prespecified. In the simulation studies, 5 baskets were assumed and
6 configurations of the number of active baskets were considered, with a null response rate of 0.15 and an
alternative rate of 0.45. The proposed design was compared with parallel, independent optimal Simon two-stage designs in each basket. In that context, stage 1 has 35 patients and stage 2 has 20 patients. When
the accrual rates are equal across baskets, the proposed design can reduce the expected trial duration due
to the reduced overall sample size while controlling FWER. When accrual rates are unequal, the trial duration increases for the proposed design, but this also occurs practically for the reference set of
trials. Moreover, the proposed design has higher power when there are more than 3 active baskets. One
limitation is that the design is highly efficient when the targeted agent is effective in most or all subgroups,
but power is reduced if only a single subgroup is sensitive to the treatment. Also, while the design focuses
on assessing whether the novel therapy works overall, it cannot assess specific treatment effects in each
basket.
As an extension to the above study design proposed by Cunanan et al. [17], Krajewska and Rauch [49]
suggested clustering baskets at the heterogeneity-check step. Instead of pooling all baskets (tumor types)
together if response rates across baskets are homogeneous or analyzing baskets individually if they are
not, they proposed to cluster baskets with similar response rates together using a k-means algorithm and
perform go/no-go decisions per cluster. The number of clusters does not need to be predefined: the
optimal cluster number maximizes the average silhouette, which measures how well a basket fits its
cluster versus the others [78]. Simulations were conducted and compared with the pooling design by
Cunanan et al. [17] for 5 and 8 baskets, with each basket having a response rate of 0.45 or 0. The sample
size per basket in stage 1 was 7; in stage 2, the sample size was 20 per cluster or pooled sample, and 15 per
cluster if the cluster contained only a single basket. The clustering design performed best when there were
four active baskets out of a total of five. While the clustering design provides better marginal power
than the pooling design, it cannot always classify baskets correctly into clusters, as measured by the percent of
incorrectly merged clusters.
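The clustering step can be sketched for one-dimensional stage-1 response rates: run k-means for each candidate number of clusters and keep the number that maximizes the average silhouette. The toy rates below and the simple Lloyd’s algorithm are illustrative, not the authors’ simulation settings.

```python
def kmeans_1d(values, k, iters=50):
    """Simple Lloyd's algorithm for 1-D data with quantile initialization."""
    srt = sorted(values)
    centers = [srt[int((i + 0.5) * len(srt) / k)] for i in range(k)]
    labels = [0] * len(values)
    for _ in range(iters):
        labels = [min(range(k), key=lambda j: abs(v - centers[j]))
                  for v in values]
        for j in range(k):
            members = [v for v, lab in zip(values, labels) if lab == j]
            if members:  # keep old center if a cluster empties out
                centers[j] = sum(members) / len(members)
    return labels

def avg_silhouette(values, labels):
    """Mean silhouette width; singleton clusters contribute 0 by convention."""
    scores = []
    for i, v in enumerate(values):
        own = [abs(v - u) for j, u in enumerate(values)
               if labels[j] == labels[i] and j != i]
        if not own:
            scores.append(0.0)
            continue
        a = sum(own) / len(own)  # mean distance within own cluster
        b = min(                 # mean distance to the nearest other cluster
            sum(abs(v - u) for j, u in enumerate(values) if labels[j] == c)
            / labels.count(c)
            for c in set(labels) if c != labels[i])
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)

rates = [0.44, 0.47, 0.43, 0.12, 0.10]  # hypothetical stage-1 response rates
best_k = max(range(2, 5),
             key=lambda k: avg_silhouette(rates, kmeans_1d(rates, k)))
print(best_k)  # -> 2: three active-looking baskets vs two inactive-looking ones
```

Each resulting cluster would then receive its own pooled go/no-go decision in stage 2.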
Figure 2.2: Basket trial design schema by Cunanan et al. (left) and by Krajewska et al. (right)
Other Classical Strategies. Combining aspects of group sequential and basket trials, Liu et al. [54]
proposed a two-arm multi-stage design with subgroup-selection interim analyses followed by efficacy
interim analyses that could be applied in both adaptive enrichment trial and basket trial contexts. Here
we discuss its application to basket trials only. In the suggested design, a binary endpoint is used for the
subgroup selection analyses, and a time-to-event endpoint is used for the efficacy interim and final analyses;
both could be derived from the same clinical outcome. Though the authors discussed only one interim
analysis for efficacy, the method can extend to multiple interim efficacy analyses between the subgroup
selection and final analysis timepoints. The main focus is implementation of the interim analyses for
efficacy on the refined population while controlling Type I and Type II errors and maintaining a sufficiently
high early stopping probability for efficacy. At the population selection interim analysis, the unpooled
proportion test statistic is calculated for each of the K mutually exclusive subpopulations, and those with
a test statistic less than the prespecified selection criterion are dropped. The continuing subgroups are
pooled together for the later analyses. Overall Type I error is controlled by searching for the rejection boundaries
for both the interim and final analyses over the entire parameter space while considering all possible configurations
of selected subgroups. Due to computational intensity, the rejection boundary at the
efficacy interim is set to be a function of the rejection boundary at the final analysis. Equations for power
and the stopping probabilities for efficacy are provided. In a hypothetical case study, there are 6 subgroups
and 720 patients with 430 overall survival events. A variety of hazard ratios and different efficacy interim
analysis times are considered. Two types of sample size adjustment after enrichment are discussed in the
case study: no adjustment of the planned sample for each subgroup, or no adjustment of the total planned
sample size with reallocation of any remaining sample size to the selected subgroups. The total number of
events could potentially need to be greater than 430 for the second approach. The authors found that the
nominal alpha decreases as the efficacy interim analysis occurs later or as the correlation between the
efficacy interim endpoint and the final endpoint increases. One limitation of this method is that it only allows
for trial stopping for efficacy but not futility. Also, the current setting assumes equal allocation to
each basket (tumor type), which may not be practical given the different prevalences of cancer types. In
addition, the correlation between the endpoints utilized needs to be assessed beforehand.
2.4 Novel Basket Trial Designs: Published Comparisons and
Examination of Features
In order to tackle the issue of power reduction from pooling ineffective baskets (tumor types), several statistical methods have been suggested to conduct subgroup selection at interim analyses. Chen et al. [9]
discussed the statistical considerations of a two-arm, two-stage, equal-randomization design within each
subgroup of a basket trial design. After an interim analysis for subgroup selection, selected subgroups
are pooled for the final analysis. The main focus is on Type I error control for the pooled analysis while
implementing three different sample size adjustment strategies: fixed sample sizes for each basket, fixed
total sample size in the pooled analysis, and fixed total sample size for the overall study. Whether or not the
interim and final analyses use the same endpoint, the design’s performance with pruning is better than
without, in terms of power and nominal Type I error.
Cunanan et al. [18] investigated the Bayesian hierarchical model, following the design proposed by
Berry et al. [4], with different priors: an inverse-gamma prior on the basket-level variance, a uniform prior,
and a half-t prior on the basket-level standard deviation. They found that inverse-gamma priors did not
perform well when only one basket is active, yielding high FWER under a strong borrowing assumption
or lower power under a weak borrowing assumption. On the other hand, the other two prior distributions,
with sufficient mass in the tails, provided more robust operating characteristics.
2.5 Conclusion: Future Needs in Basket Trial Designs
The majority of the basket trial designs discussed here are to be used in a phase II setting, in which strict
control of Type I error to below 5% after adjustment for multiplicity across baskets is not the main focus,
and where a “basket” refers to different tumor types sharing the same genetic aberration. As the overarching aim of basket trials is to explore and identify promising baskets with sufficient activity to warrant
subsequent confirmatory phase III trials, efficiently utilizing all observed data across baskets should be
considered, especially when the particular biomarker-cancer combination has a low prevalence. When the
treatment effects are homogeneous across cancer types within a marker group, borrowing information
across strata can increase the overall power [16]. When facing the likelihood of heterogeneous baskets,
how to efficiently filter out non-effective baskets before borrowing information across baskets or pooling
analyses is worth further investigation.
Before choosing a specific design for a basket trial, the trial biostatistician should consider multiple
aspects, including the type of study outcome, the number of cancer subtypes, the number of investigational
drugs, the prevalence of certain biomarker-cancer combinations, required versus feasible sample sizes, and
whether control arms or interim analysis decisions would aid efficiency or interpretation. Overall response
rate is the most common endpoint in basket trials, but Xu et al. [104] and Li et al. [54] proposed designs
that evaluate survival outcomes. If a control arm within each tumor type is considered, the designs of Ventz et al. [96]
and Yin et al. [107] can be considered. When early futility or efficacy stopping rules are of interest,
the study designs proposed by Cunanan et al. [17] and Psioda et al. [72] may be appropriate. Each study
design has pros and cons and certain assumptions that should be adopted with caution, following careful
evaluation by the biostatistician in consultation with the study team and clinical experts regarding the
investigational drugs and diseases under study.
Table 2.1: Summary table for novel basket trial design articles
(organized by design method, subcategory, article, and design or model name)

Bayesian designs

  Bayesian Hierarchical Modelling Approaches
    Chu & Yuan (2018a) [12]: Calibrated Bayesian Hierarchical Model (CBHM)
    Neuenschwander et al. (2016) [66]: Exchangeability-nonexchangeability approach (EXNEX)
    Chu & Yuan (2018b) [13]: Bayesian latent subgroup trial (BLAST)
    Liu et al. (2017) [55]: Bayesian hierarchical mixture model
    Ventz et al. (2017) [96]: BHM with response-adaptive randomization
    Yin et al. (2018) [107]: Bayesian hierarchical model design with a control arm
    Yin et al. (2021) [106]: Basket biomarker cutoff (BCC) design
    Takeda et al. (2022) [91]: Constrained hierarchical Bayesian model for latent subgroup (CHBM-LS)

  Bayesian Model Averaging Approaches
    Psioda et al. (2021) [72]: Bayesian adaptive basket trial design
    Asano & Hirakawa (2020) [2]: BMA approach with a homogeneity model and a heterogeneity model

  Other Bayesian Strategies
    Fujikawa et al. (2020) [26]: Bayesian basket trial design that borrows information across strata based on the similarity between the posterior distributions of the response probability
    Hobbs & Landin (2018) [39]: Sequential basket trial design
    Zhou & Ji (2021) [112]: Robust Bayesian hypothesis testing method (RoBoT)
    Xu et al. (2019) [104]: Nonparametric Bayesian basket trial design

Classical designs

  Simon’s Two-stage Design Extensions
    Zhou et al. (2019) [111]: Extension to a multi-arm basket trial design
    Cunanan et al. (2017) [17]: Parallel two-stage design with heterogeneity test
    Krajewska & Rauch (2021) [49]: Parallel two-stage design with clustering

  Other Classical Strategies
    Li et al. (2019) [54]: Two-arm multi-stage basket trial design
Chapter 3
Latest Developments in “Adaptive Enrichment” Clinical Trial Designs in
Oncology
3.1 Introduction
As cancer has become better understood at the molecular level with the evolution of gene sequencing
techniques, considerations for individualized therapy using predictive biomarkers (those associated with a
treatment’s effect) have shifted to a new level. Traditional randomized trial designs tend to either oversimplify or overlook differences in patients’ genetic and molecular profiles, either by fully enriching eligibility
to a marker subgroup or by enrolling all-comers without prospective use of potentially predictive biomarkers.
In the former case of marker enrichment, one cannot learn about a marker’s true predictive ability from
the trial’s conduct (as marker-negative patients are excluded); in the latter case, ignoring the biomarker,
the end result may be a “washing out” of the treatment effect when a predictive marker truly does exist
within the sampled patient population.
In the last decade or so, randomized "adaptive enrichment" clinical trials have become increasingly utilized to strike a balance between enrolling all patients with a given tumor type, versus enrolling only a subpopulation whose tumors are defined by a potential predictive biomarker that is hypothesized to be related to the mechanism of action of the experimental therapy. On a high level, adaptive enrichment designs take the form of a clinical trial that begins by randomizing participants to a targeted versus a control therapy regardless of marker value, then adapts through a series of one or more interim analyses to potentially limit subsequent trial recruitment to a marker-defined patient subpopulation that is showing early signals of enhanced treatment benefit.
In this review article, we first discuss the "traditional" presentation of both enrichment and adaptive enrichment designs and their decision rules and describe statistical or practical challenges associated with each. Next, we introduce innovative design extensions and adaptations to adaptive enrichment designs proposed during the last few years in the clinical trial methodology literature, both from Bayesian and classical perspectives. Finally, we review articles in which different designs within this class are directly compared or features are examined, and we conclude with some comments on future research directions.
3.1.1 Enrichment Trial Designs
To motivate discussion of adaptive enrichment designs and why they are useful, it is helpful to first understand enrichment trial designs, or designs that focus only on a subset of the patient population from the beginning.
Design Details: In the setting of targeted therapies with strong prior evidence or clinical rationale supporting efficacy only within a biomarker-selected subgroup, "marker-enriched" or enrichment trial designs are used to confirm signal or efficacy only in that selected subgroup. In these types of trials, patients are screened and classified into prespecified marker-positive and marker-negative subgroups at or prior to enrollment, with only marker-positive patients eligible to remain on study and receive protocol-directed targeted therapy. This usually takes the form of a small, single-arm phase II study without a randomized comparator, but in some settings, comparisons against a randomized non-targeted standard-of-care therapy might be made (see Figure 3.1).
Figure 3.1: Enrichment trial schema with a single arm
Example: An example of a clinical trial with an enrichment design is the Herceptin Adjuvant (HERA) trial. The HERA trial is a phase III, randomized, three-arm trial that studied the efficacy of 1 year versus 2 years of adjuvant trastuzumab versus control (no additional treatment) in women with human epidermal growth factor receptor 2 (HER2)-positive early breast cancer after completion of locoregional therapy and chemotherapy [29]. HER2 is overexpressed in 15-25% of breast cancers, and trastuzumab, a monoclonal antibody, binds the HER2 extracellular receptor [89, 90]. The primary outcome was disease-free survival; using an intention-to-treat analysis, a significant treatment benefit was demonstrated for 1 year of trastuzumab compared to the control arm.
Limitations: One important limitation of enrichment designs is that a marker's predictive ability to select patients for treatment is assumed to already be known and cannot be validated from the trial itself. It is theoretically possible that a pre-defined marker-negative subgroup might also benefit from the targeted treatment, but that knowledge won't be updated with an enrichment design. For example, a pre-clinical study found that trastuzumab can decrease cancer cell proliferation in breast cancer cell lines that are HER2-negative but positive for HER2 phosphorylation at tyrosine Y877, with an effect comparable to that observed in HER2-positive breast cancer cell lines, showing that a HER2-negative subpopulation may also benefit from trastuzumab [7].
Another limitation of enrichment trial designs is the necessity of establishing predefined subgroups during the study planning phase, which becomes complicated when dealing with biomarkers that exist on a continuous scale, like expression levels or laboratory values. Determining an appropriate threshold to divide patients into "positive" and "negative" groups is not always straightforward, validated, or effective in distinguishing the effect of the targeted treatment. Selecting an incorrect threshold during trial design can result in an ineffective or underpowered study, and revising the decision once the trial has begun accrual is not advisable.
3.1.2 Adaptive Enrichment Trial Designs
Adaptive enrichment trial designs, on the other hand, are an attractive solution to the inherent weaknesses
of a fully enriched trial design.
Design Details: An adaptive enrichment trial design initially enrolls patients with any marker value(s) and randomizes them to experimental targeted versus standard (non-targeted) therapy. As the trial progresses, accrual may be subsequently refined or restricted to patients with certain marker values according to those showing initial efficacy on the basis of one or more interim analyses. This design is randomized out of necessity, so that treatment-by-marker interactions may be computed, and adaptations based on differential treatment effects by marker subgroups can be facilitated. At the interim analyses, according to pre-specified decision rules, a trial may stop early for futility or efficacy, either overall, or within a marker-defined subgroup. If the biomarker of interest is not naturally dichotomous, the same interim analyses may also be used to select or revise marker cutpoints (see Figure 3.2).
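The core adaptive enrichment logic described above can be sketched in a few lines. The fragment below is a toy illustration only (it does not correspond to any specific published design; the subgroup counts, response rates, and futility margin are invented), using a binary biomarker and a binary endpoint:

```python
def interim_decision(pos, neg, futility_margin=0.0):
    """Toy interim rule: estimate the treatment effect (response-rate
    difference) in each marker subgroup, then decide whether accrual
    continues in the full population or is restricted to the
    marker-positive subgroup. Each subgroup is a dict mapping arm
    ("trt"/"ctl") to a (responders, n) tuple."""
    def effect(subgroup):
        r_t, n_t = subgroup["trt"]
        r_c, n_c = subgroup["ctl"]
        return r_t / n_t - r_c / n_c

    eff_pos, eff_neg = effect(pos), effect(neg)
    # Restrict accrual when the marker-negative subgroup shows no signal.
    if eff_neg <= futility_margin:
        return "restrict to marker-positive", eff_pos, eff_neg
    return "continue full population", eff_pos, eff_neg

# Hypothetical interim data: benefit confined to marker-positive patients.
pos = {"trt": (30, 50), "ctl": (10, 50)}
neg = {"trt": (10, 50), "ctl": (11, 50)}
print(interim_decision(pos, neg)[0])  # restrict to marker-positive
```

Real designs replace the simple margin check with formal group-sequential boundaries, but the branching structure (continue overall, restrict, or stop) is the same.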
Figure 3.2: Adaptive enrichment trial schema with a binary biomarker
Example: One real-world example of an adaptive enrichment design is the Morphotek Investigation in Colorectal Cancer: Research of MORAb-004 (MICRO), which is an adaptive, two-stage, phase II study assessing the effect of ontuxizumab versus placebo in patients with advanced metastatic colorectal cancer [64, 32]. Ontuxizumab, a monoclonal antibody treatment targeting endosialin function, was expected to be more effective in patients with endosialin-related biomarkers. Since the biomarkers were continuous in nature and the optimal cutoffs were unknown, the study included an assessment for determining the best cutoffs at an interim analysis, where progression-free survival (PFS) served as the primary endpoint. Initially, the goal was to demonstrate the treatment effect of ontuxizumab either overall or within subgroups defined by biomarkers. However, the interim analysis revealed that none of the biomarkers had a predictive relationship with treatment outcome. Consequently, the design shifted to a non-marker-driven comparison. Additionally, the interim analysis showed early futility for ontuxizumab compared to placebo overall, terminating the trial early due to lack of efficacy. In summary, this adaptive enrichment design concluded both the biomarker assessment and the evaluation of the therapy early, and additional resources and patients were spared.
Limitations: Adaptive enrichment trial designs do have some statistical challenges, including limitations faced in the design of the MICRO trial. These include estimation of subgroup-specific treatment effects, particularly when the marker prevalence is low, as a sufficiently large sample size is required to have enough patient-level information at interim analysis for informative subgroup selection. As a practical consideration, the primary endpoint must be quickly observed relative to the pace of accrual, to allow time for impactful adaptations based on observed outcomes relatively early in the trial. Another challenge is how exactly one should select cutpoints for adaptation of accrual. In the MICRO trial, at the interim analysis, a series of Cox proportional hazards models were fit over a grid of possible cutpoints, and the significance of a marker-by-treatment interaction term was evaluated. A pre-specified level of statistical significance for the interaction, along with a clinically meaningful effect in the marker "positive" group defined by the interaction, would warrant potential accrual restriction; however, this approach treated truly continuous biomarkers as binary in its implementation, which results (at least theoretically) in a loss of information and potential loss of power.
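The grid-search idea can be sketched in simplified form. The fragment below is a toy illustration, not the MICRO trial's actual analysis: it substitutes a binary response and a plain difference-in-differences of response rates for the trial's Cox models and formal interaction tests, and the dataset is invented. It scans candidate cutpoints of a continuous marker and picks the one maximizing the estimated treatment-by-marker interaction:

```python
def interaction_at_cutpoint(data, cut):
    """Dichotomize the marker at `cut` and return the difference-in-
    differences of response rates: (trt - ctl | marker >= cut) minus
    (trt - ctl | marker < cut). `data` is a list of
    (marker_value, arm, response) tuples with arm in {"trt", "ctl"}."""
    def rate(arm, positive):
        grp = [r for m, a, r in data if a == arm and (m >= cut) == positive]
        return sum(grp) / len(grp) if grp else 0.0
    return (rate("trt", True) - rate("ctl", True)) - \
           (rate("trt", False) - rate("ctl", False))

def best_cutpoint(data, grid):
    """Return the candidate cutpoint with the largest interaction."""
    return max(grid, key=lambda c: interaction_at_cutpoint(data, c))

# Hypothetical data: treatment benefit appears only above marker value 0.6.
data = []
for i in range(200):
    m = (i % 100) / 100.0                      # marker spread over [0, 1)
    arm = "trt" if i < 100 else "ctl"
    resp = 1 if (arm == "trt" and m >= 0.6) else 0
    data.append((m, arm, resp))

print(best_cutpoint(data, [0.2, 0.4, 0.6, 0.8]))  # → 0.6 in this toy dataset
```

The information loss from dichotomization noted above is visible here: every cutpoint collapses the full marker distribution into two groups before the effect is estimated.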
Several groups have attempted to extend or modify the standard adaptive enrichment trial design
in various ways to address statistical shortcomings or tailor the strategy to various applications. The
remainder of this paper provides an overview of some of these recent developments. While we admit such
designations are rather arbitrary, we present this work separately by Bayesian and frequentist/classical
approaches, so that structural similarities among them may be readily described and compared.
3.2 Recent Developments and Extensions in Adaptive Enrichment Trial
Designs
3.2.1 Bayesian Approaches
Xu et al. proposed an adaptive enrichment randomized two-arm design that combines exploration of treatment-benefit subgroups and estimation of subgroup-specific effects in the context of a multilevel target product profile, where both minimal and targeted treatment effect thresholds are investigated [103]. This adaptive subgroup-identification enrichment design (ASIED) opens for all-comers first, and subgroups identified as having enhanced treatment effects are selected at an interim analysis, where pre-set minimum and targeted treatment effects are evaluated against a set of decision criteria for futility or efficacy stopping for all-comers or possible subgroups. A Bayesian random partition (BayRP) model for subgroup identification is incorporated into ASIED, based on models proposed by Xu et al. [105] and Guo et al. [33]. Due to the flexibility of the BayRP model, biomarkers can be continuous, binary, categorical, or ordinal, and the primary endpoint types can be binary, categorical, or continuous. Per the authors, extensions to count or survival outcomes are also possible. BayRP was implemented due to its robustness, but other Bayesian subgroup identification methods could be used as well, like Bayesian additive regression trees (BART) or random forests for larger sample sizes [11]. A tree-type random partition of biomarkers is used as a prior, and an equally spaced k-dimensional grid constructed from k biomarkers is used to represent possible biomarker profiles. The operating characteristics of ASIED as a trial design were evaluated by simulations with 4 continuous biomarkers, a total sample size of 180, an interim analysis after 100 patients were enrolled, a minimum desired treatment effect of 2.37, and a target treatment effect of 3.08 on a continuous score scale. ASIED's recommendations were close to the expected results. However, the number of simulated trials was only 100, which could yield lower precision of the estimated operating characteristics. Another limitation is that the partition of the biomarker profile was limited to at most four biomarker subgroups due to the small sample size in each partition.
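ASIED's interim decisions compare posterior probabilities of exceeding the minimum and target treatment effects against prespecified criteria. A schematic version of that type of rule is shown below; this is a normal-normal conjugate sketch with invented data and probability cutoffs, not the BayRP model itself:

```python
from math import sqrt
from statistics import NormalDist

def decision(effect_est, se, minimum, target,
             prior_mean=0.0, prior_sd=10.0,
             p_futility=0.10, p_efficacy=0.90):
    """Normal-normal posterior for a treatment effect, then a three-way
    decision against minimum/target effect thresholds (all cutoffs here
    are illustrative placeholders)."""
    # Posterior precision = prior precision + data precision.
    post_var = 1.0 / (1.0 / prior_sd**2 + 1.0 / se**2)
    post_mean = post_var * (prior_mean / prior_sd**2 + effect_est / se**2)
    post = NormalDist(post_mean, sqrt(post_var))
    p_above_min = 1.0 - post.cdf(minimum)
    p_above_target = 1.0 - post.cdf(target)
    if p_above_min < p_futility:
        return "stop for futility"
    if p_above_target > p_efficacy:
        return "stop for efficacy"
    return "continue"

# Thresholds borrowed from the ASIED simulation (2.37 minimum, 3.08 target);
# the observed effect and its standard error are hypothetical.
print(decision(effect_est=3.5, se=0.3, minimum=2.37, target=3.08))
```

The actual design evaluates such probabilities within subgroups produced by the random partition rather than for a single overall effect.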
Another Bayesian randomized group-sequential adaptive enrichment two-arm design (AED) incorporating multiple baseline biomarkers was proposed by Park et al. [69]. The design's primary endpoint is time-to-event, while a binary early response acts as a surrogate endpoint assisting with biomarker pruning and enrichment to a sensitive population at each interim analysis. Initially, the study is open for all-comers, and the baseline biomarkers can be binary, continuous, or categorical. The first step at each interim analysis is to jointly select covariates based on both the surrogate and final endpoints by checking each treatment-by-covariate interaction. The second step is to recalculate the personalized benefit index (PBI), a weighted average posterior probability indicating patients with selected biomarkers who benefit more from the experimental treatment. The refitted regression from the variable selection step redefines the treatment-sensitive patients, and only patients with PBI values larger than some pre-specified cutoff continue to be enrolled to the trial. The third step is to test for futility and efficacy stopping by a Bayesian group sequential test procedure for the previously identified treatment-sensitive subgroups. In simulations, AED was compared with group sequential enrichment designs called InterAdapt and GSED, an adaptive enrichment design, and an all-comers group sequential design [59, 77, 84]. The maximum sample size considered was 400, and patients were accrued by a Poisson process with 100 patients per year. Two interim analyses took place after 200 and 300 patients enrolled, and 10 baseline biomarkers were considered. Across each of the seven scenarios, prevalence of the treatment-sensitive group was set to 0.65, 0.50, or 0.35. While nearly all the designs controlled the nominal Type I error at 0.05, AED had higher probabilities of identifying the sensitive subgroup and correctly concluding efficacy than the other designs. Also, 1000 future patients were simulated and treated by each design's suggested treatment, and AED had the longest median survival time overall. One stated limitation of this work is its inability to handle high-dimensional baseline biomarker covariates, as the authors suggest considering no more than 50 baseline covariates in total. Also, biomarkers in this design are assumed to be independent, though selection adjustment for correlated predictors is mentioned.
To address the scenario of a single continuous predictive biomarker where the marker-treatment relationship is continuous instead of a step function, Ohwada and Morita proposed a Bayesian adaptive patient enrollment restriction (BAPER) design that can restrict the subsequent enrollment of treatment-insensitive biomarker-based subgroups based on interim analyses [67]. The primary endpoint is assumed to be time-to-event, and the relationship between the biomarker and treatment effect is assumed to increase monotonically and is modeled via a four-parameter change-point model within a proportional hazards model. Parameters are assumed to follow non-informative priors, and the posterior distributions are calculated using the partial likelihood of the Cox proportional hazards model. At each interim analysis, decisions can be made for a subgroup or the overall cohort. In addition, treatment-sensitive patients can be selected based on a biomarker cutoff value, which is determined by searching over the range of biomarker values and picking the one with the highest conditional posterior probability of achieving the target treatment effect. Simulations were conducted to compare the proposed method against both a similar method without enrichment and a design using a step function to model marker-treatment interaction effects without enrichment. The maximum sample size considered was 240 with two interim analyses, and the assumed target hazard ratio was 0.6. The results show that the proposed BAPER method decreases the average number of enrolled patients who will not experience the targeted treatment effect, compared to designs without patient selection. Also, BAPER has a higher probability of correctly identifying the cutoff point that achieves the target hazard ratio. However, BAPER has certain restrictions: the biomarker cannot be prognostic, as the main effect for the biomarker is excluded from the proportional hazards model. Also, the design does not consider the distribution of the biomarker values themselves, so a larger sample size is required when the prevalence of the treatment-sensitive (or insensitive) population is small.
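The monotone biomarker-treatment relationship in a change-point model can be illustrated concretely. The sketch below uses a generic four-parameter form (lower plateau, two change points, upper plateau) for the treatment log hazard ratio as a function of the marker; the authors' exact parameterization may differ:

```python
from math import exp, log

def log_hr(x, c1, c2, beta_low, beta_high):
    """Piecewise-linear change-point model for the treatment log hazard
    ratio as a function of a continuous biomarker x: constant at
    beta_low below c1, linear between c1 and c2, constant at beta_high
    above c2 (monotone decreasing when beta_high <= beta_low <= 0)."""
    if x <= c1:
        return beta_low
    if x >= c2:
        return beta_high
    frac = (x - c1) / (c2 - c1)
    return beta_low + frac * (beta_high - beta_low)

# Hypothetical parameters: no benefit below marker value 0.3 (log HR = 0),
# full benefit above 0.7 (log HR = log 0.6, matching the target HR of 0.6).
params = dict(c1=0.3, c2=0.7, beta_low=0.0, beta_high=log(0.6))
print(exp(log_hr(0.5, **params)))  # HR halfway between 1.0 and 0.6 on the log scale
```

In BAPER this curve is estimated from the data, and the enrollment cutoff is chosen as the marker value whose conditional posterior probability of reaching the target hazard ratio is highest.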
Focusing on an optimal decision threshold for a binary biomarker which is either potentially predictive or both prognostic and predictive, Krisam and Kieser proposed a new class of interim decision rules for a two-stage, two-arm adaptive enrichment design [50]. This approach is an extension of Jenkins et al.'s design but with a binary endpoint instead of a time-to-event outcome [43]. Initially, their trial randomizes all patients from two distinct subgroups (i.e., a binary biomarker), assuming one subgroup will have greater benefit, and the sample size is fixed per stage by treatment group. At the first interim analysis, the trial might stop early for futility, continue enrolling only the marker-positive group, or continue enrolling the full population, using Hochberg multiplicity-corrected p-values for these decisions. When the full population proceeds to the second stage, it remains possible that efficacy testing will be performed both overall and in the treatment-sensitive subgroup if the biomarker is found to be predictive or prognostic, or only within the total population if the biomarker is not predictive. The critical boundaries for subgroup decisions minimize the Bayes risk of a quadratic loss function by setting the roots of partial derivatives as optimal thresholds, assuming the estimated treatment effects follow bivariate normal distributions with design parameters from uniform prior distributions. A relevance threshold for the effect size, which serves as the minimal clinically meaningful effect, also needs to be prespecified. Optimal decision threshold tables are presented for a biomarker that is predictive, both predictive and prognostic, or non-informative, with sample sizes ranging from 20 to 400 and subgroup prevalence values of 0.1, 0.25, and 0.5 considered. In their simulations, the sample size is 200 per group per stage (for a total trial sample size of 800), the treatment effect (response rate) in one of the subgroups is 0.15, and the biomarker is both predictive and prognostic. Optimal decision rules with three different assumptions for the biomarker (predictive, predictive and prognostic, non-informative) and subgroup prevalence are compared with a rule based only on relevance thresholds. Power is increased under the proposed decision rules when the correct biomarker assumption is made. Since the decision thresholds incorporate sample size and subgroup prevalence information, one limitation is that knowledge about the biomarkers must be strong enough pre-trial to prespecify the required parameters.
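The Hochberg step-up procedure used for these multiplicity-corrected decisions is easy to state concretely. Below is a generic implementation of the standard procedure (not code from the paper):

```python
def hochberg(p_values, alpha=0.05):
    """Hochberg step-up procedure: sort p-values ascending; working from
    the largest down, once p_(i) <= alpha / (m - i + 1) for some i,
    reject the hypotheses with the i smallest p-values. Returns a list
    of booleans (True = rejected) in the original input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda k: p_values[k])
    reject = [False] * m
    for i in range(m, 0, -1):                     # i = m, m-1, ..., 1
        if p_values[order[i - 1]] <= alpha / (m - i + 1):
            for k in order[:i]:
                reject[k] = True
            break
    return reject

print(hochberg([0.04, 0.045]))  # both rejected: the larger p-value <= 0.05
print(hochberg([0.02, 0.20]))   # only the first: 0.02 <= 0.05 / 2
```

With two hypotheses (overall and subgroup), this reduces to: reject both if the larger p-value is below alpha; otherwise reject the smaller one only if it is below alpha/2.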
Nesting frequentist testing procedures within a Bayesian framework, Simon and Simon proposed a group-sequential randomized adaptive enrichment trial design that uses frequentist hypothesis tests for controlling Type I error but Bayesian modeling to select treatment-sensitive subgroups and estimate effect size [84]. The primary endpoint in their models is binary, and multiple continuous biomarkers are allowed, comprising a vector of covariates for each patient. Patients are sequentially enrolled in a total of K blocks, and enrollment criteria for the next block are refined by a decision function, which is built on the block adaptive enrichment design by Simon and Simon [83]. The final analysis is based on inverse normal combination test statistics using data from the entire trial. A prior for the response rate in each arm needs to be prespecified, which is based on both the biomarker covariates and a utility function. Different utility functions can be applied according to the trial's goal, and the one adopted here is the expected future patient outcome penalized by accrual time. Using the conditional posterior given the previous block's information, simulations are conducted to find the optimal enrollment criteria based on the utility function. The expected treatment effect given covariates can be estimated by the posterior predictive distribution for the response rate at the end of the trial. In the presented simulation study, there are two continuous biomarkers and 300 patients accrued in two or three enrollment blocks, with three logistic and three cutpoint models for the biomarker-response relationships. An unenriched design and an adaptive enrichment strategy with prespecified fixed cutpoints are compared with the proposed design. The two adaptive enrichment designs have higher power than the unenriched design to detect a treatment-sensitive subgroup, and the enrichment designs have higher power with three versus two enrollment blocks. Compared with the fixed-cutpoint enrichment method, the proposed design generally correctly identifies the treatment-sensitive subgroup while avoiding non-ideal pre-determined cutoff points for the subsequent enrollment criteria. Though the effect size estimation is biased under the proposed design, the bias is more severe under the unenriched design.
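The inverse normal combination test used at the final analysis has a simple closed form: stage-wise one-sided p-values are converted to z-scores and combined with prespecified weights. Below is a generic sketch of the standard combination statistic (the equal weights in the example are illustrative, not the paper's choice):

```python
from math import sqrt
from statistics import NormalDist

def inverse_normal_combination(p_values, weights=None):
    """Combine independent stage-wise one-sided p-values via
    Z = sum(w_k * Phi^{-1}(1 - p_k)) / sqrt(sum(w_k^2)),
    returning the combined one-sided p-value 1 - Phi(Z)."""
    std = NormalDist()
    if weights is None:
        weights = [1.0] * len(p_values)
    z = sum(w * std.inv_cdf(1.0 - p) for w, p in zip(weights, p_values))
    z /= sqrt(sum(w * w for w in weights))
    return 1.0 - std.cdf(z)

# Two stages each yielding p = 0.05 combine to roughly p = 0.01.
print(round(inverse_normal_combination([0.05, 0.05]), 4))
```

Because the weights are fixed in advance, the combined test preserves the Type I error rate even when the enrollment criteria change between blocks, which is exactly why such combination tests recur throughout the adaptive enrichment literature.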
Graf et al. proposed to optimize design decisions using utility functions from the sponsor and public health points of view in the context of a two-stage adaptive enrichment design with a continuous biomarker [30]. Similar to Simon and Simon's method, the proposed design's decisions are based on frequentist hypothesis tests, while the utility functions are evaluated under the Bayesian approach. In this design, patients are classified into marker-positive and marker-negative groups at enrollment, and decisions can be made with respect to the full population or the marker-positive subgroup only. Closed testing procedures along with Hochberg tests are used to control the family-wise Type I error rate. Parameters called "gain", which quantify the benefit rendered by the trial to the sponsor and society, need to be pre-specified. The utility function under the sponsor view is the sum of the gain multiplied by the probability of claiming treatment efficacy in the full population or the marker-positive group, respectively. In addition to gain and success probabilities, the public health utility function also considers the true effect sizes in subgroups and safety risk as a penalization parameter. Prior distributions are used to model treatment effects in each subgroup to account for uncertainty, but the authors assume that only the marker-negative group can be ineffective, and only point priors are used, which leads to a single probability that the treatment is effective in just the marker-positive subgroup or the full population. This optimized adaptive design is compared with a non-adaptive design when the total sample sizes are the same. The adaptive design provides larger expected utility under both utility functions only for intermediate values of the gain from treatment efficacy and of the prior point probability. One limitation is that these utility functions can only compare designs with the same total sample size, and the cost of running a trial is not included.
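The structure of the two utility views can be written down compactly. The sketch below is schematic only: the gain values, success probabilities, effect sizes, and safety penalty are invented placeholders, and the paper's exact functional forms differ in detail:

```python
def sponsor_utility(gain_full, p_full, gain_sub, p_sub):
    """Sponsor view: expected utility = sum over possible claims of
    gain x probability of claiming efficacy in that population."""
    return gain_full * p_full + gain_sub * p_sub

def public_health_utility(gain_full, p_full, effect_full,
                          gain_sub, p_sub, effect_sub, safety_penalty):
    """Public-health view: additionally weights in the true effect
    sizes and subtracts a safety-risk penalization term."""
    return (gain_full * p_full * effect_full
            + gain_sub * p_sub * effect_sub
            - safety_penalty)

# Hypothetical numbers: a full-population claim is worth more but less likely.
print(sponsor_utility(gain_full=100, p_full=0.3, gain_sub=40, p_sub=0.5))  # 50.0
```

Comparing designs then reduces to computing these expectations under each design's claim probabilities, which is why the comparison only makes sense at a fixed total sample size.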
Serving as an extension of Graf et al.'s work by incorporating a term for the trial cost in the utility functions, Ondra et al. derived an adaptive two-stage partial enrichment design for a normally distributed outcome with subgroup selection and optimization of the second-stage sample size [68]. In a partial enrichment design, the proportion of marker-positive subjects enrolled does not need to be aligned with the true prevalence. At the interim analysis, the trial can be stopped for futility, or continued in only the marker-positive population or the full population. The final analysis is based on the weighted inverse normal function with Bonferroni correction. Utility functions used for optimization are from societal or sponsor perspectives. Expected utility is calculated by numerical integration over the joint sampling distribution of the two stage-wise test statistics, with prior distributions for the treatment effect in each subgroup. The optimal sample size for the second stage maximizes the conditional expected utility given the first-stage test statistics and sample size used, and the optimal first-stage sample size maximizes the utility using the solved optimal number for the second stage. The optimization function is solved recursively by dynamic programming, and the optimal design in terms of sample size is obtained. The optimized adaptive enrichment design is compared with an optimized single-stage design for subgroup prevalence ranging from 10% to 90%, with both weak and strong predictive biomarker priors considered. Expected utilities are higher under both sponsor and societal views for the adaptive design. Also, even if the prior distribution for the effect size used in the design differs from the true distribution, the proposed adaptive design is robust in terms of expected utilities when the biomarker's prevalence is high enough. One limitation is that the endpoint needs to be observed immediately, which might be addressed by a short-term surrogate endpoint.
3.2.2 Frequentist Approaches
Fisher et al. proposed an adaptive multi-stage enrichment design that allows subgroup selection at an interim analysis with continuous or binary outcomes [22]. Two subpopulations are predefined, and the goal is to claim treatment efficacy in one of the subpopulations or the full population. The cumulative test statistics for the subgroups and the full population are calculated at each interim analysis and compared against efficacy and non-binding futility boundaries. To control the family-wise Type I error rate (FWER), two methods for constructing efficacy boundaries are presented. One, proposed by Rosenblum et al., spends alpha based on the covariance matrix of the test statistics by population (two subpopulations and the full population) and by interim stage [77]. Another is the alpha reallocation approach [61, 6]. The design parameters, including sample size per stage, futility boundaries, etc., are optimized to minimize the expected number enrolled or expected trial duration using simulated annealing, with constraints on power and Type I error. If the resulting design does not meet the power requirement, the total sample size is increased until it does. The optimized adaptive design is compared with a single-stage design, an optimized single-stage design, and a multi-stage group sequential design with O'Brien-Fleming or Pocock boundaries using actual trial data from MISTIE [63] and ADNI [1]. For the MISTIE trial, the proposed designs are optimized by the expected number enrolled, which is lower than for the optimized single-stage design and the group-sequential design, but the maximum number enrolled is still lower in the simple single-stage design. In the ADNI trial, when the expected trial duration is optimized, the proposed design has a slightly shorter expected duration but a longer maximum duration than the optimized single-stage design.
Similar to the aforementioned Bayesian approaches without predefined subpopulations, Zhang et al. proposed a two-stage adaptive enrichment design that does not require predefined subgroups [109]. The primary outcome is binary, and a collection of baseline covariates, including biomarkers and demographics, is used to define a treatment-sensitive subgroup. The selection criteria are based on a prespecified function modeling the treatment effect and marker-by-treatment interaction using first-stage data. The final treatment effect estimate is a weighted average of the estimates in each stage. To minimize the resubstitution bias from using first-stage data in subsequent subgroup selection and inference, four methods for estimating the first-stage treatment effect and variance are discussed: the naive approach, cross-validation, the nonparametric bootstrap, and the parametric bootstrap. To compare these estimation methods, ECHO [62] and THRIVE [15] trial data are used for simulation with a total sample size of 1000. The first stage has 250, 500, or 750 subjects, and the function used to simulate outcomes is a logistic regression model. The results show that the bootstrap methods are more favorable than both the naive estimate (which has a large empirical bias) and the cross-validation method (which is overly conservative). The weight for each stage and the first-stage sample size need to be selected carefully to reach a small root mean squared error (RMSE) and close-to-nominal one-sided coverage. Though a trial can stop due to an empty subset from the restricted enrollment, the proposed method does not include an early stopping rule for futility or efficacy.
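The resubstitution bias that motivates these corrections arises because the same first-stage data both select the subgroup and estimate its effect. A small simulation sketch (a toy setup with made-up sizes, unrelated to the ECHO/THRIVE data): every subgroup has a true treatment effect of zero, yet the naive estimate in the selected "best" subgroup is biased upward:

```python
import random
from statistics import mean

random.seed(0)

def naive_selected_effect(n_per_arm=50, n_subgroups=3, true_rate=0.3):
    """Simulate one stage-1 dataset with no true treatment effect in
    any subgroup, select the subgroup with the largest estimated
    effect, and return that (naively resubstituted) estimate."""
    effects = []
    for _ in range(n_subgroups):
        trt = sum(random.random() < true_rate for _ in range(n_per_arm))
        ctl = sum(random.random() < true_rate for _ in range(n_per_arm))
        effects.append((trt - ctl) / n_per_arm)
    return max(effects)   # estimate in the selected ("best") subgroup

bias = mean(naive_selected_effect() for _ in range(2000))
print(f"average naive estimate under the null: {bias:.3f}")  # clearly above 0
```

The bootstrap approaches discussed above work by estimating this selection-induced optimism from resampled data and subtracting it from the naive estimate.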
In order to reduce sample size while assessing the treatment effect in the full population, Matsui and Crowley proposed a two-stage subgroup-focused sequential design for time-to-event outcomes, which could extend to multiple stages [60]. In this design, patients are classified into two subgroups by a dichotomized predictive marker, with the assumption that the experimental treatment is more efficacious in the marker-positive subgroup. The trial can proceed to the second stage with one of the subgroups or the full population, but treatment efficacy is only tested in the marker-positive group or the full population at the final analysis. Choices of testing procedures are fixed-sequence and split-alpha. At the interim analysis, a superiority boundary for the marker-positive subgroup and a futility boundary for the marker-negative subgroup are constructed. The superiority boundary is calculated to control the study-wide alpha level, while the futility boundary is based on a Bayesian posterior probability of efficacy with a non-informative prior. The required sample sizes for each subgroup are calculated separately, and the hazard ratio for the marker-positive subgroup is recommended to be 0.05-0.70 under this application. The proposed design is compared with a traditional all-comers design, an enriched design with only marker-positive subjects, a two-stage enriched design, and a traditional marker-stratified design. Different scenarios are considered, including those with no treatment effect, a constant treatment effect in both groups with hazard ratio (HR) = 0.75, a qualitative interaction with HRs = 0.65 and 1, and a quantitative interaction with HRs = 0.7 and 0.8. The marker prevalence is set to 0.4, and the accrual rate is 200 patients per year. When using the split-alpha test, the proposed design has greater than 80% power to reject any null hypothesis in the alternative cases, but the traditional marker-stratified design also provides enough power under all cases. The number screened and the number randomized are reduced for the proposed design compared to the traditional marker-stratified design, but the reduction is only moderate.
To determine whether the full population or only the biomarker-positive subgroup benet more from
the experimental treatment, Uozumi and Hamada proposed a two-stage adaptive population selection design for a time-to-event outcome, an extension of methods from Brannath et al. and Jenkins et al. [95,
5, 43]. The main extension is that the decision-making strategy at the interim analysis incorporates both
progression-free survival (PSF) and overall survival (OS) information. Also, OS is decomposed into time-toprogression (TTP) and post-progression survival (PPS) when tumor progression has occurred, to account
for the correlation between OS and PFS. The combination test approach is used for the nal analysis based
on Simes’ procedure [82]. The hypothesis rejection rule for each population is a weighted inverse normal
combination function with prespecied weights based on the expected number of OS events in each stage.
At the interim analysis, a statistical model from Fleischer et al. under the semi-competing risks framework
is applied to account for the correlation between OS and PFS [24, 20]. The interim decision rule uses the
predictive power approach in each population, extending Brannath et al.’s method from single endpoint
to multiple endpoints with a higher weight on PFS data due to its rapid observation. In the simulation, a
dichotomized biomarker is used with a 50% prevalence. Four scenarios are considered, where hazard ratios in the marker-positive subgroup are always 0.5 and are higher in the marker-negative subgroup. For
simplicity, the HR is the same for TTP, PPS, and death. FWER is controlled for all cases, but it is a little too
conservative when the treatment is eective. The proposed design has a higher probability of identifying
the treatment-sensitive population at the interim analysis, particularly when the PPS eect is large; those
probabilities are similar between using OS or PFS alone or the combined endpoints when the PFS eect is
small. One limitation of this design is that sample size calculations are not considered.
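The weighted inverse normal combination of stage-wise p-values described above can be sketched as follows. This is a generic one-sided illustration in Python; the function name and the equal-weight usage example are ours, not taken from [95], where the weights would come from the expected number of OS events per stage.

```python
from math import sqrt
from statistics import NormalDist

def inverse_normal_combination(p1, p2, w1):
    """Combine stage-wise one-sided p-values p1, p2 with prespecified
    weights w1 and w2 = sqrt(1 - w1^2), so that w1^2 + w2^2 = 1."""
    nd = NormalDist()
    z = w1 * nd.inv_cdf(1.0 - p1) + sqrt(1.0 - w1 ** 2) * nd.inv_cdf(1.0 - p2)
    return 1.0 - nd.cdf(z)
```

With equal weights (w1 = w2 = sqrt(0.5)), two stage-wise p-values of 0.05 combine to approximately 0.01, illustrating how moderate evidence at both stages yields stronger combined evidence.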
Instead of a single primary endpoint, Sinha et al. suggested a two-stage Phase III design with population enrichment for two binary co-primary endpoints, which is an extension of Magnusson and Turnbull's work to co-primary endpoints [88, 59]. The two binary endpoints are assumed to be independent, and the efficacy goal must be reached for both endpoints. With two distinct predefined subgroups, a set of decision rules stops the non-responsive subgroups using efficient score statistics. The futility and efficacy boundary values, which do not depend on the marker prevalence, are the same for both endpoints due to independence. The lower and upper stopping boundaries are calculated by alpha spending functions, and FWER is strongly controlled. Simulations were conducted assuming biomarker prevalences of 0.25 or 0.75 and weighted subgroup effect sizes of 0, 1, and 2 as the means of efficient score statistics under a normal distribution. The results show that the proposed design can reduce false-negative results for heterogeneous treatment effects between subgroups. The authors note that the design could be extended to a bivariate continuous outcome, while an extension to bivariate survival would be more challenging.
3.3 Published Comparisons and Examination of Features
Kimani, Todd, and Stallard derived a uniformly minimum variance unbiased point estimator (UMVUE) of the treatment effect in an adaptive two-arm, two-stage enrichment design with a binary biomarker [48]. Based on the Rao-Blackwell theorem, the UMVUE for the treatment effect conditional on the selected subgroup is derived with and without prior information on marker prevalence. The proposed estimator is compared with the naïve estimator, which is biased but has a lower mean squared error (MSE) when prevalence is known. The proposed estimator is robust both with and without prior information on marker prevalence.
Kimani et al. developed estimators for a two-stage adaptive enrichment design with a normally distributed outcome [47]. A predictive continuous biomarker is used to partition the full population into a prespecified number of subgroups, and the cutoff values are determined at the interim analyses based on stage I observations. To estimate the treatment effect after enrichment for the selected subgroup, a naive estimator, a uniformly minimum variance conditional unbiased estimator (UMVCUE), an unbiased estimator, single-iteration and multiple-iteration bias-adjusted estimators, and two shrinkage estimators are derived and compared. Though no estimator is superior in terms of bias and MSE in all scenarios, the UMVCUE is recommended by the authors due to its mean unbiasedness.
Tang et al. evaluated several proposed adaptive enrichment designs with a binary biomarker against the traditional group sequential design (GSD) for a time-to-event outcome [92]. Type I error is controlled, and the subpopulation is selected by Bayesian predictive power. Adaptive design A selects the subgroup after considering futility and efficacy stopping decisions. Design B selects the subgroup when the targeted number of events is observed in the full population, which can be earlier than the interim analysis. Design C selects the subgroup only after the full population has reached a futility rule. Design D, proposed by Wang et al. [98], proceeds with the subgroup or the full population by checking the treatment effect in the complementary subgroup. When an enhanced treatment effect exists in the subpopulation, all of these adaptive designs can improve study power compared to the GSD. Furthermore, among the adaptive designs, Design C generally provides the highest power across all scenarios.
Benner and Kieser explored how the timing of interim analyses affects power in adaptive enrichment designs with a fixed total sample size for a continuous outcome and a binary marker [3]. Two subgroup selection rules are considered: the estimated treatment effect, or the estimated difference in treatment effect between the subgroup and the full population. Under the first selection rule, early timing increases power when the marker prevalence and marker cutoff values are low; however, the impact of interim analysis timing on power is small when marker prevalence is high. If the absolute treatment effect is used instead, earlier timing generally leads to power loss. Power depends more on the marker threshold, prevalence, and treatment effect size when the interim analysis occurs after outcomes have been observed for more than half of the total sample size.
Kunzmann et al. investigated the performance of six different estimators besides the maximum likelihood estimator (MLE) for a two-stage adaptive enrichment design with a continuous outcome [51]. Those estimators are an empirical Bayes estimator (EBE) [40, 8], a parametric bootstrap estimator [70], a conditional moment estimator (CME) [57], and the UMVCUE together with two hybrid estimators combining the UMVCUE with the MLE and with the CME [14]. The hybrid UMVCUE-CME estimator reduces bias across all considered scenarios and is recommended by the authors, though at the cost of a larger RMSE.
3.4 Conclusion and Future Needs in Adaptive Enrichment Trial Designs
In this review article, we have given a brief overview of traditional enrichment and adaptive enrichment designs, outlined their limitations, and described recent extensions and modifications to adaptive enrichment design strategies. Both Bayesian and frequentist perspectives on handling the statistical issues of these designs were discussed in detail, along with important considerations for design parameters.
Overall, adaptive enrichment trial designs tend to increase study efficiency while minimizing subsequent study participation among patients showing a low likelihood of benefit based on early trial results [83]. Biomarker-driven designs that reliably identify or validate predictive biomarker relationships and their thresholds with sufficient power to achieve phase II or III objectives continue to be of interest and warrant further development. Designs that make better use of truly continuous (versus dichotomized) marker-efficacy relationships are essential for future research.
Table 3.1: Summary table for recent developments and extensions in adaptive enrichment trial designs

Method | Subcategory | Article | Design or Model Name or Title
Bayesian | Time-to-event outcome | Park et al. (2022) [69] | Group sequential adaptive enrichment design
Bayesian | Time-to-event outcome | Ohwada & Morita (2016) [67] | Bayesian adaptive patient enrollment restriction (BAPER)
Bayesian | Binary outcome | Krisam & Kieser (2015) [50] | Optimal decision rules for biomarker-based subgroup selection for a targeted therapy in oncology
Bayesian | Binary outcome | Simon & Simon (2018) [84] | Using Bayesian modeling in frequentist adaptive enrichment designs
Bayesian | Continuous outcome | Graf et al. (2015) [30] | Adaptive designs for subpopulation analysis optimizing utility functions
Bayesian | Continuous outcome | Ondra et al. (2019) [68] | Optimized adaptive enrichment designs
Bayesian | Binary/Categorical/Continuous outcome | Xu et al. (2020) [103] | Adaptive subgroup-identification enrichment design (ASIED)
Frequentist | Time-to-event outcome | Matsui & Crowley (2018) [60] | Subgroup-focused marker-stratified sequential design
Frequentist | Time-to-event outcome | Uozumi & Hamada (2017) [95] | Interim decision-making strategies in adaptive designs for population selection using time-to-event endpoints
Frequentist | Binary outcome | Zhang et al. (2018) [109] | Treatment evaluation for a data-driven subgroup in adaptive enrichment designs of clinical trials
Frequentist | Binary outcome | Sinha et al. (2019) [88] | Group-sequential adaptive enrichment design with two binary co-primary endpoints
Frequentist | Binary/Continuous outcome | Fisher et al. (2018) [22] | Stochastic optimization of adaptive enrichment designs for two subpopulations
Chapter 4
Bayesian Adaptive Enrichment Design for Continuous Biomarkers
4.1 Method Part I: Randomized Trial Design with Continuous Biomarker for Binary Outcome
In order to study the effect of an investigational treatment on a binary endpoint Y, a two-arm randomized clinical trial is conducted. We use Z to represent the treatment assignment, with Z = 1 indicating receipt of the experimental treatment and Z = 0 receipt of the control treatment. We assume that a continuous baseline biomarker X exists with values ranging from a to b for some a, b ∈ ℝ, which is potentially prognostic for disease progression and predictive of treatment effect based on previous knowledge. We assume the biomarker value is available for each patient at baseline.
4.1.1 Model formulation
The model formulation in this section extends Liu et al.'s work to a binary endpoint [56]. To model the effect of the continuous biomarker X and the investigational treatment Z on the binary response outcome Y, a logistic regression model is used, which can be expressed as

logit(P(Y = 1)) = log( P(Y = 1) / (1 − P(Y = 1)) ) = f(x) + g(x)z,   (4.1.1)
where f(x) represents the marker's prognostic effect in the control arm, and g(x) represents the marker's predictive treatment effect in the experimental arm. Cubic B-splines are used to model f(x) and g(x), which allows non-linear and non-monotonic underlying functions for the marker effects. We first define a knot sequence τ: τ_1, ..., τ_{M+8} such that

a = τ_1 = τ_2 = τ_3 = τ_4 < τ_5 < ··· < τ_{M+4} < τ_{M+5} = τ_{M+6} = τ_{M+7} = τ_{M+8} = b.   (4.1.2)
Then, we can express f(x) and g(x) for a patient with marker value x as:

f(x) = Σ_{m=1}^{M+4} B_m(x) η_{f,m},   g(x) = Σ_{m=1}^{M+4} B_m(x) η_{g,m},   (4.1.3)

where B_1(x), ..., B_{M+4}(x) are the cubic B-spline basis functions with knot sequence τ, and η_{f,m}, η_{g,m} are the corresponding coefficients.
We can choose the M interior knots τ_5, ..., τ_{M+4} from the observed biomarker values; using the biomarker's quantiles to select the interior knots is one example.
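As an illustrative sketch (not part of the original analysis), the cubic B-spline design matrix with clamped boundary knots and quantile-based interior knots can be built as follows; the function name is ours.

```python
import numpy as np
from scipy.interpolate import BSpline

def bspline_design(x, m_interior, a=0.0, b=1.0):
    """Cubic B-spline design matrix B with clamped boundary knots (each
    repeated four times, as in (4.1.2)) and M interior knots taken at
    quantiles of the observed marker values; returns an n x (M + 4) matrix."""
    probs = np.linspace(0.0, 1.0, m_interior + 2)[1:-1]
    interior = np.quantile(x, probs)
    knots = np.concatenate([[a] * 4, interior, [b] * 4])  # tau_1, ..., tau_{M+8}
    n_basis = m_interior + 4
    # Evaluate each basis function B_m(x) via an indicator coefficient vector
    cols = [BSpline(knots, np.eye(n_basis)[m], 3)(x) for m in range(n_basis)]
    return np.column_stack(cols)

x = np.linspace(0.05, 0.95, 100)     # marker values strictly inside [0, 1]
B = bspline_design(x, m_interior=5)  # M = 5 -> 9 basis functions
```

A quick sanity check on such a basis: each row of B sums to 1 (partition of unity) for marker values inside [a, b].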
The model above (4.1.3) can also be expressed for n patients with marker values x_1, ..., x_n as:

f(x) = Bη_f,   g(x) = Bη_g,   (4.1.4)

where x = (x_1, ..., x_n)', f(x) = (f(x_1), ..., f(x_n))', g(x) = (g(x_1), ..., g(x_n))', B is the n × (M + 4) cubic B-spline design matrix with (i, m)-th entry B_{im} = B_m(x_i), η_f = (η_{f,1}, ..., η_{f,M+4})', and η_g = (η_{g,1}, ..., η_{g,M+4})'. We use the same cubic B-spline design matrix for f(x) and g(x), which is common in the spline literature [97]. To prevent overfitting, we also add an L2 penalty to (4.1.4),
which penalizes the integral of the squared second derivative of f(x) and g(x); this is equivalent to placing the following priors on η_f and η_g:

p(η_f) ∝ exp( −(1/(2σ²_f)) η_f' Λ η_f ),   p(η_g) ∝ exp( −(1/(2σ²_g)) η_g' Λ η_g ),   (4.1.5)

where Λ is the penalty matrix induced by the integrated squared second derivative, and σ²_f and σ²_g serve as regularization parameters.
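The penalty matrix Λ can be approximated numerically; the sketch below (our own illustration, with hypothetical function names) builds it for equally spaced interior knots and verifies its rank, which should be M + 2 because linear functions have zero second derivative.

```python
import numpy as np
from scipy.interpolate import BSpline

def penalty_matrix(m_interior, a=0.0, b=1.0, n_grid=2001):
    """Approximate Lambda = integral of B''(x) B''(x)' dx over [a, b] by the
    trapezoid rule on a fine grid; adequate for illustration since the exact
    integrand is piecewise polynomial."""
    interior = np.linspace(a, b, m_interior + 2)[1:-1]
    knots = np.concatenate([[a] * 4, interior, [b] * 4])
    n_basis = m_interior + 4
    xs = np.linspace(a, b, n_grid)
    d2 = np.column_stack(
        [BSpline(knots, np.eye(n_basis)[m], 3).derivative(2)(xs)
         for m in range(n_basis)])
    w = np.full(n_grid, (b - a) / (n_grid - 1))  # trapezoid weights
    w[0] *= 0.5
    w[-1] *= 0.5
    return (d2 * w[:, None]).T @ d2

lam = penalty_matrix(5)  # M = 5 interior knots -> Lambda is 9 x 9
evals = np.linalg.eigvalsh(lam)
rank = int(np.sum(evals > 1e-8 * evals.max()))
```

The two zero eigenvalues correspond to the constant and linear components, which is exactly why the reparameterization below separates them out with their own diffuse priors.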
Since rank(Λ) = M + 2 [97], let d_1, ..., d_{M+2} denote the M + 2 positive eigenvalues of Λ after spectral decomposition, and let W_Λ denote the (M + 4) × (M + 2) matrix formed by the corresponding eigenvectors of Λ. We can then rewrite (4.1.4) as:

f(x) = 1_n β_{f,1} + x β_{f,2} + W_B v_f,   g(x) = 1_n β_{g,1} + x β_{g,2} + W_B v_g,   (4.1.6)

where 1_n is an n × 1 vector of 1's; β_{f,1}, β_{f,2}, β_{g,1} and β_{g,2} are scalars with non-informative Gaussian priors β_{f,1}, β_{f,2}, β_{g,1}, β_{g,2} ~ N(0, H²), where H is very large; v_f and v_g are (M + 2) × 1 random vectors with mutually independent Gaussian priors v_f ~ MVN(0, σ²_f I_{M+2}) and v_g ~ MVN(0, σ²_g I_{M+2}); and W_B = B W_Λ diag(d_1^{−1/2}, ..., d_{M+2}^{−1/2}). The Gaussian priors placed on β_{f,1}, β_{f,2}, β_{g,1}, β_{g,2}, v_f and v_g reflect the L2 penalty on η_{f,m}, η_{g,m} after the matrix transformation. A non-informative inverse Gamma prior is used for σ²_f and σ²_g: σ²_f ~ IG(a_{0,f}, b_{0,f}), σ²_g ~ IG(a_{0,g}, b_{0,g}).
4.1.2 Posterior computation
Let D = {(y_i, x_i, z_i); i = 1, ..., n} be the patient data collected during the study at an interim analysis or at the end of the study, where y_i equals 1 if patient i is observed to have a response and 0 otherwise, x_i is patient i's biomarker value, and z_i is patient i's treatment assignment. Let θ_f = (β_{f,1}, β_{f,2}, v_f')' and θ_g = (β_{g,1}, β_{g,2}, v_g')'. The logistic regression model parameters are given appropriate priors and updated together with the other parameters at each MCMC iteration. Given (4.1.1) and (4.1.6), the joint posterior for (θ_f, θ_g, σ²_f, σ²_g) given data D is proportional to the full likelihood times the joint prior:
π(θ_f, θ_g, σ²_f, σ²_g | D) ∝ L(θ_f, θ_g | D) × p(v_f | σ²_f) p(v_g | σ²_g) p(σ²_f) p(σ²_g)
    × p(β_{f,1}) p(β_{f,2}) p(β_{g,1}) p(β_{g,2})

∝ ∏_{i=1}^{n} [ exp(f(x_i) + g(x_i)z_i) / (1 + exp(f(x_i) + g(x_i)z_i)) ]^{y_i} [ 1 / (1 + exp(f(x_i) + g(x_i)z_i)) ]^{1−y_i}
    × ∏_{m=1}^{M+2} [ (1/σ_f) exp( −v²_{f,m} / (2σ²_f) ) ] × ∏_{m=1}^{M+2} [ (1/σ_g) exp( −v²_{g,m} / (2σ²_g) ) ]
    × exp( −β²_{f,1} / (2H²) ) × exp( −β²_{f,2} / (2H²) ) × exp( −β²_{g,1} / (2H²) ) × exp( −β²_{g,2} / (2H²) )
    × (σ²_f)^{−a_{0,f}−1} exp( −b_{0,f} / σ²_f ) × (σ²_g)^{−a_{0,g}−1} exp( −b_{0,g} / σ²_g ).   (4.1.7)
The parameters σ²_f and σ²_g have conjugate full conditional distributions and are updated by Gibbs sampling. θ_f and θ_g do not have closed-form full conditional distributions, so they are updated by adaptive Metropolis sampling [34]. The posterior samples of θ_f and θ_g can also be used for Bayesian inference, including credible intervals at a prespecified significance level for f(x) and g(x).
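From the inverse Gamma terms in (4.1.7), the Gibbs update for each smoothing variance is a standard conjugate draw. A minimal sketch (the helper names are ours; the 0.01 defaults mirror the simulation settings used later):

```python
import numpy as np

def sigma2_full_conditional(v, a0=0.01, b0=0.01):
    """Inverse-Gamma full conditional for a smoothing variance given the
    (M+2)-vector of spline coefficients v:
    sigma^2 | v ~ IG(a0 + (M+2)/2, b0 + v'v/2)  (shape/rate parameterization)."""
    a_n = a0 + len(v) / 2.0
    b_n = b0 + 0.5 * float(np.dot(v, v))
    return a_n, b_n

def draw_sigma2(v, rng, a0=0.01, b0=0.01):
    """One Gibbs draw: sample Gamma(shape=a_n, rate=b_n) and take the reciprocal."""
    a_n, b_n = sigma2_full_conditional(v, a0, b0)
    return 1.0 / rng.gamma(shape=a_n, scale=1.0 / b_n)
```

The shape grows with the number of penalized coefficients and the rate with their squared magnitude, so rougher fitted curves pull the smoothing variance upward.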
When we need to fit the logistic regression model at an early stage of the trial with a small sample, complete or quasi-complete separation may occur, for example due to perfect prediction by continuous covariates. Several solutions exist for this problem, including exact logistic regression and Firth's bias-reduced logistic regression [21]. We instead impose an L2 penalty when we initialize the Metropolis random walk process. The shrinkage on the parameter estimates at the initialization step is applied throughout the whole trial. Let X_f = (1_n, x, v_f)' and X_g = (1_n, x, v_g)'; then the variances of θ_f and θ_g can be expressed as:
Var(θ_f) = (X_f' W_f X_f + λI_{M+4})^{−1} X_f' W_f [Var(Z_f)] W_f X_f (X_f' W_f X_f + λI_{M+4})^{−1}
Var(θ_g) = (X_g' W_g X_g + λI_{M+4})^{−1} X_g' W_g [Var(Z_g)] W_g X_g (X_g' W_g X_g + λI_{M+4})^{−1}   (4.1.8)

where

Var(Z_f) = W_f^{−1} Var(Y) W_f^{−1} = W_f^{−1},   Var(Z_g) = W_g^{−1} Var(Y) W_g^{−1} = W_g^{−1},   (4.1.9)

λ is the regularization parameter, and W_f and W_g are diagonal weight matrices whose i-th diagonal entry is the estimated probability of response for patient i [101].
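The role of the ridge term under separation can be illustrated with a small sketch. This is not the exact initializer used here (which optimizes the penalized likelihood with a proximal Newton algorithm); it is a generic Newton/IRLS loop, with names of our choosing, showing that a small L2 penalty keeps estimates finite where the unpenalized MLE diverges.

```python
import numpy as np

def ridge_logistic(X, y, lam=0.001, iters=50):
    """L2-penalized logistic regression via Newton's method. The ridge term
    lam keeps the estimates finite even under complete or quasi-complete
    separation, where the unpenalized MLE diverges."""
    n, p = X.shape
    beta = np.zeros(p)
    for _ in range(iters):
        mu = 1.0 / (1.0 + np.exp(-(X @ beta)))         # fitted probabilities
        W = mu * (1.0 - mu)                            # IRLS weights
        H = X.T @ (X * W[:, None]) + lam * np.eye(p)   # penalized Hessian
        g = X.T @ (y - mu) - lam * beta                # penalized score
        beta = beta + np.linalg.solve(H, g)
    return beta

# Perfectly separated toy data: x < 0 -> y = 0, x > 0 -> y = 1
X = np.array([[1.0, -2.0], [1.0, -1.0], [1.0, 1.0], [1.0, 2.0]])
y = np.array([0.0, 0.0, 1.0, 1.0])
beta = ridge_logistic(X, y)
```

On this separated data the fitted slope is large but finite, and the fitted model still classifies every observation correctly.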
4.2 Method Part II: Simulations without Adaptive Design Features
4.2.1 Introduction of simulation scenarios
To evaluate this model's performance, simulations were conducted for several scenarios without adaptive randomization, described in Table 4.1. The following 12 scenarios are investigated in order to cover a range of combinations of prognostic and predictive marker and treatment effects, including scenarios where the effects are non-linear and non-monotone. Scenario 1 is the null case. Scenario 2 has a constant treatment effect but no marker effect. Scenario 3 has a linear prognostic marker effect but no treatment effect. Scenarios 4 to 10 all have predictive marker effects with different underlying functions. Scenario 11 is an inferior treatment effect case. Scenario 12 has a linear prognostic marker effect and a nearly dichotomous predictive marker effect. The visual presentation of each scenario on the true log-odds-of-response scale and on the true response rate scale is shown in Figures 4.1-4.2 and Figures 4.3-4.4, respectively. For each scenario, we assume 500 patients in total, with 250 in the control arm and 250 in the experimental arm under 1:1 randomization. The response rate is set to 5% for the control arm, except for Scenarios 3, 11 and 12, and the maximum response rate for the experimental arm is set to 30%, except for Scenarios 1, 11 and 12 (Table 4.2). Given 250 patients per arm and an effect size difference of 25%, the trial has 0.7981 power based on a two-sided two-sample test of proportions with α = 0.05.
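Data generation for one such scenario can be sketched as follows, using Scenario 5 (nearly dichotomous predictive marker effect) as an example; the function names are ours, and the parameter values follow Table 4.1.

```python
import numpy as np

def expit(u):
    return 1.0 / (1.0 + np.exp(-u))

def response_prob(x, z, delta=2.1, x0=0.5):
    """True response probability under Scenario 5 (nearly dichotomous
    predictive marker effect): logit p = -2.95 + z * delta * expit{30(x - x0)}."""
    return expit(-2.95 + z * delta * expit(30.0 * (x - x0)))

def simulate_trial(n=500, rng=None):
    """One simulated trial: uniform marker, 1:1 randomization, binary response."""
    rng = rng if rng is not None else np.random.default_rng()
    x = rng.uniform(0.0, 1.0, n)
    z = rng.integers(0, 2, n)
    y = rng.binomial(1, response_prob(x, z))
    return x, z, y
```

Note that response_prob(x, 0) ≈ 0.0497 for all x, matching the 5% control rate, and response_prob(1.0, 1) ≈ 0.299, matching the 30% maximum rate in the experimental arm.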
Table 4.1: Simulation scenarios with descriptions, marker effect functions on the log odds ratio scale, and true parameter values

Case 1. No treat effect (no marker effect).
    Control arm f(x): −2.95.  Treatment arm f(x) + g(x): −2.95.
Case 2. Constant treat effect (no marker effect).
    Control: −2.95.  Treatment: ∆ − 2.95.  ∆ = 2.1.
Case 3. Prognostic marker effect (no treat effect).
    Control: x∆ − 2.95.  Treatment: x∆ − 2.95.  ∆ = 2.1.
Case 4. Predictive marker effect (perfectly dichotomous).
    Control: −2.95.  Treatment: 1{x > x0}∆ − 2.95.  x0 = 0.5, ∆ = 2.1.
Case 5. Predictive marker effect (nearly dichotomous).
    Control: −2.95.  Treatment: [exp{30(x − x0)} / (1 + exp{30(x − x0)})]∆ − 2.95.  x0 = 0.5, ∆ = 2.1.
Case 6. Predictive marker effect (linear).
    Control: −2.95.  Treatment: x∆ − 2.95.  ∆ = 2.1.
Case 7. Predictive marker effect (non-linear and monotone).
    Control: −2.95.  Treatment: −exp(−7(x − x0)) + ∆ − 2.95.  x0 = 0.106, ∆ = 2.1.
Case 8. Predictive marker effect (non-linear and monotone).
    Control: −2.95.  Treatment: exp(1.13x) + ∆.  ∆ = −3.95.
Case 9. Predictive marker effect (non-linear and non-monotone).
    Control: −2.95.  Treatment: if x ≤ x0, −0.85 − ∆ exp{30(x − x1)} / (1 + exp{30(x − x1)}); if x > x0, −2.95 − ∆ exp{30(x − x2)} / (1 + exp{30(x − x2)}).  x0 = 0.5, ∆ = 2.1, x1 = 0.2, x2 = 0.8.
Case 10. Predictive marker effect (non-linear and non-monotone).
    Control: −2.95.  Treatment: if x ≤ x0, −2.95 + ∆ exp{30(x − x1)} / (1 + exp{30(x − x1)}); if x > x0, −0.85 − ∆ exp{30(x − x2)} / (1 + exp{30(x − x2)}).  x0 = 0.5, ∆ = 2.1, x1 = 0.2, x2 = 0.8.
Case 11. Constant inferior treat effect (no marker effect).
    Control: ∆ − 2.95.  Treatment: −2.95.  ∆ = 2.1.
Case 12. Prognostic and predictive marker effect (linear; nearly dichotomous).
    Control: x∆ − 2.95.  Treatment: [exp{30(x − x0)} / (1 + exp{30(x − x0)})] x1 − 2.95 + x∆.  x0 = 0.5, ∆ = 1.22, x1 = 1.33.
Table 4.2: Simulation scenarios with descriptions and maximum response rates

Case | Scenario | Control arm | Treatment arm
1 | No treat effect (no marker effect) | 5% | 5%
2 | Constant treat effect (no marker effect) | 5% | 30%
3 | Prognostic marker effect (no treat effect) | 30% | 30%
4 | Predictive marker effect (perfectly dichotomous) | 5% | 30%
5 | Predictive marker effect (nearly dichotomous) | 5% | 30%
6 | Predictive marker effect (linear) | 5% | 30%
7 | Predictive marker effect (non-linear and monotone) | 5% | 30%
8 | Predictive marker effect (non-linear and monotone) | 5% | 30%
9 | Predictive marker effect (non-linear and non-monotone) | 5% | 30%
10 | Predictive marker effect (non-linear and non-monotone) | 5% | 30%
11 | Constant inferior treat effect (no marker effect) | 30% | 5%
12 | Prognostic and predictive marker effect (linear; nearly dichotomous) | 15% | 40%
Figure 4.1: Log of odds of response rate as a function of biomarker X for Scenarios 1-6
Figure 4.2: Log of odds of response rate as a function of biomarker X for Scenarios 7-12
Figure 4.3: Response rate as a function of biomarker X for Scenarios 1-6
Figure 4.4: Response rate as a function of biomarker X for Scenarios 7-12
4.2.2 Default estimation setting procedure
In this section, to evaluate the base model and estimation performance, we demonstrate posterior estimation of the model parameters under default settings, assuming estimation is performed only at the end of the trial with no interim analyses.
For the default simulation setting under each scenario, the maximum number of allowable interior knots for the splines is 9, based on the suggestion that more than 10 knots provide little advantage [31]. The choice of the maximum number of allowable interior knots will be explored in later sections. The marker values are sampled from the uniform distribution between 0 and 1, and the median marker value is set as the reference for the prognostic effect. In our simulation, we set a_{0,f} = b_{0,f} = a_{0,g} = b_{0,g} = 0.01, the parameters of the uninformative inverse Gamma priors for σ²_f and σ²_g. We set H² = 10⁸, the variance of the Gaussian priors for β_{f,1}, β_{f,2}, β_{g,1}, β_{g,2}. These parameter choices follow the recommendation of Zhao et al. [110]. We set the regularization parameter λ = 0.001 in (4.1.8), since we only want to impose a small amount of shrinkage on the coefficient estimates to handle complete or quasi-complete separation. The continuous baseline biomarker values are standardized to lie in the range [0, 1].
For the MCMC chain's default setting, the burn-in length is 25000 samples without thinning, and the number of posterior samples collected is 10000 for a single chain. A Metropolis-Hastings random walk sampling algorithm is used to generate the chain. The initial starting position is the estimate obtained by optimizing the L2-penalized negative binomial log-likelihood with a "proximal Newton" algorithm. The proposal distribution is a multivariate Gaussian with covariance equal to the covariance estimated at initialization times 0.5². The adaptive Metropolis-Hastings random walk scales the covariance of the proposal distribution based on the history of the chain [34]. The basic choice of scaling factor is 2.4²/d, where d is the number of estimated parameters [27]. In what follows, we refer to 2.4²/d as the basic scaling factor. Given that the optimal acceptance rate is 0.234 for a multivariate proposal distribution [28], we choose the final scaling factor based on the acceptance rate over every 2000 iterations. If the acceptance rate is below 0.1, the covariance of the proposal distribution is updated to the simulated posterior covariance matrix times the basic scaling factor times a scaling factor of 0.6², to reduce the step size; if the acceptance rate is below 0.2, the basic scaling factor times a scaling factor of 0.9² is used; if the acceptance rate is above 0.23, the basic scaling factor times a scaling factor of 1.1² is used, to increase the step size. The scaling factors are thus 0.5, 0.6, 0.9 and 1.1. An expectation-maximization (EM) algorithm is used to estimate θ_f and θ_g alternately. The parameters of the MCMC chain will also be explored in later sections.
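The acceptance-rate-driven rescaling rules above can be sketched compactly; the function name is ours, and the multiplier returned would scale the simulated posterior covariance of the chain.

```python
def adapt_scale(current, acc_rate, d):
    """Acceptance-rate-driven choice of the proposal-covariance multiplier,
    following the tuning rules in the text: basic factor 2.4^2 / d, shrunk
    or enlarged according to the acceptance rate over the last 2000 draws."""
    basic = 2.4 ** 2 / d
    if acc_rate < 0.1:
        return basic * 0.6 ** 2   # far too few acceptances: shrink step size sharply
    if acc_rate < 0.2:
        return basic * 0.9 ** 2   # slightly low: shrink mildly
    if acc_rate > 0.23:
        return basic * 1.1 ** 2   # above the 0.234 target: enlarge step size
    return current                # near-optimal: leave unchanged
```

For d = 26 parameters (the 9-knot model), the basic factor is 2.4²/26 ≈ 0.22, and the rule nudges the proposal scale toward the 0.234 acceptance target.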
Multiple convergence diagnostics for MCMC are available; we use Heidelberger and Welch's statistic, a method derived from spectral density estimation, since it is readily available in the CODA R package and is well suited to assessing a single MCMC chain [79]. Heidelberger and Welch's test statistic, based on the Cramér-von Mises test statistic, tests the null hypothesis that the sampled Markov chain comes from a stationary distribution, and a p-value is produced for each estimated parameter. The test is applied successively, first to the whole chain; if rejected, the first 10% of the chain is discarded and the test repeated, then the first 20%, and so on, until either the null hypothesis is accepted or 50% of the chain has been discarded [71]. After a multiple-testing adjustment based on the number of parameters implied by the maximum number of allowable knots, the cutoff p-value used to determine convergence failure is p < 0.05/28 = 0.0018 when the maximum number of allowable knots is 9 [102]. If the chain does not converge, we re-run it up to 15 times or until failure of convergence is declared. Again, at this stage, we only perform final model estimation at the end of the trial.
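The successive-discard scheme can be sketched as follows; this is our own illustration, with the stationarity test abstracted as a pluggable function rather than a reimplementation of the Cramér-von Mises statistic used by CODA's heidel.diag.

```python
def hw_successive(chain, stationarity_pvalue, alpha):
    """Successive-discard scheme of the Heidelberger-Welch diagnostic:
    test the whole chain, then drop 10%, 20%, ..., up to 50% of the front,
    declaring convergence at the first pass. `stationarity_pvalue` stands in
    for the stationarity test applied to the retained portion of the chain."""
    n = len(chain)
    for k in range(6):                         # discard 0%, 10%, ..., 50%
        sub = chain[int(round(0.1 * k * n)):]
        if stationarity_pvalue(sub) >= alpha:
            return True, k / 10.0              # converged; fraction discarded
    return False, 0.5                          # failed after discarding half
```

With a toy test that only passes once the non-stationary prefix is gone, the scheme reports convergence together with the fraction of the chain it had to discard.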
4.2.2.1 Description of marker effect estimation
Using the model and prior specifications from Section 4.1 and the default settings above, we visually illustrate the estimated predictive marker effect g(x), which is also the treatment effect at a given marker value, for a single trial iteration under each scenario in Table 4.1. In the plots below (Figures 4.5 and 4.6), the x-axis represents marker values and the y-axis is on the log odds ratio scale for response. The solid black line is the estimated treatment effect, the mean of the estimated posterior distribution at a given marker value. The dotted black lines are the 95% credible bands for the estimated treatment effect curve. The solid red line is the true predictive marker function associated with the experimental arm.
Figure 4.5: Model’s estimation with default setting for Scenarios 1-6
Figure 4.6: Model’s estimation with default setting for Scenarios 7-12
4.2.3 Exploration of Model and MCMC settings
Above, we have assumed a maximum of 9 interior knots, but fewer may be adequate. Additionally, from visual inspection, the estimated predictive marker effects are often inaccurate and do not resemble the true shapes, especially for the non-monotone Scenarios 9 and 10. Thinning of the sampled chains to control autocorrelation may also be needed. Below, we explore the separate and combined impacts of a reduced maximum number of knots and various levels of thinning on autocorrelation and model convergence.
4.2.3.1 Impact of maximum number of knots on convergence and autocorrelation
To better understand the performance of the MCMC chain, we run a single iteration under Scenario 1 with 500 patients. When the maximum number of allowable knots is 9, the covariates generated in the model are 1 + x + w1 + w2 + w3 + w4 + w5 + w6 + w7 + w8 + w9 + w10 + w11 + trt + x∗trt + w1∗trt + w2∗trt + w3∗trt + w4∗trt + w5∗trt + w6∗trt + w7∗trt + w8∗trt + w9∗trt + w10∗trt + w11∗trt, giving 26 coefficients in total, plus σ²_f and σ²_g (the variances of the Gaussian priors for v_f and v_g in (4.1.6)). The trace plot and autocorrelation plot for each coefficient are shown below. For the chain with knots = 9 (Figure 4.7 and Figure 4.8), the minimum p-value of Heidelberger and Welch's statistic for θ_f and θ_g is 0.0519; the minimum p-value for σ²_f and σ²_g is 0.0526. The acceptance rates for θ_f and θ_g are 0.1697 and 0.1882. Though the chain converged based on Heidelberger and Welch's diagnostic, the autocorrelations for all of the coefficients do not approach zero as the lag increases.
Figure 4.7: Traceplot for the MCMC chain under Scenario 1 with no thinning and 9 knots
Figure 4.8: Autocorrelation plot for the MCMC chain under Scenario 1 with no thinning and 9 knots
To explore the impact of restricting the maximum number of allowable knots on convergence and autocorrelation, we repeat the single-trial simulation above but reduce the number of knots to 4. The covariates generated in the model become 1 + x + w1 + w2 + w3 + w4 + w5 + w6 + trt + x∗trt + w1∗trt + w2∗trt + w3∗trt + w4∗trt + w5∗trt + w6∗trt, giving 16 coefficients in total, plus σ²_f and σ²_g. The cutoff value used to determine convergence failure is z > 2.9913 after the multiple-testing adjustment. For the chain with knots = 4 (Figure 4.9 and Figure 4.10), the minimum p-value of Heidelberger and Welch's statistic for θ_f and θ_g is less than 0.0001; the minimum p-value for σ²_f and σ²_g is 0.0013. The acceptance rates for θ_f and θ_g are 0.2048 and 0.1809. We reject the null hypothesis that the chain has converged based on Heidelberger and Welch's diagnostic, and the samples are highly correlated with each other.
Figure 4.9: Traceplot for the MCMC chain under Scenario 1 with no thinning and 4 knots
Figure 4.10: Autocorrelation plot for the MCMC chain under Scenario 1 with no thinning and 4 knots
4.2.3.2 Impact of thinning on convergence and autocorrelation
4.2.3.2.1 Low-level thinning, thin = 20
Next, we explore the impact of thinning. We first keep every 20th accepted value in the chain (thin = 20), with all other settings kept the same. For the chain with knots = 9 (Figure 4.11 and Figure 4.12), the minimum p-value of Heidelberger and Welch's statistic for θ_f and θ_g is 0.0053; the minimum p-value for σ²_f and σ²_g is 0.0751. The acceptance rates for θ_f and θ_g are 0.2707 and 0.2657. The chain converged. The trace plots are more stable than those without thinning, and the autocorrelation in the chain is also reduced.
Figure 4.11: Traceplot for the MCMC chain under Scenario 1 with thin=20 and 9 knots
Figure 4.12: Autocorrelation plot for the MCMC chain under Scenario 1 with thin=20 and 9 knots
For the chain with knots = 4 (Figure 4.13 and Figure 4.14), the minimum p-value of Heidelberger and Welch's statistic for θ_f and θ_g is 0.0174; the minimum p-value for σ²_f and σ²_g is 0.3320. The acceptance rates for θ_f and θ_g are 0.2869 and 0.2479. The chain converged. Again, the autocorrelation plots improve, the same as observed for knots = 9.
Figure 4.13: Traceplot for the MCMC chain under Scenario 1 with thin = 20 and 4 knots
Figure 4.14: Autocorrelation plot for the MCMC chain under Scenario 1 with thin=20 and 4 knots
4.2.3.2.2 High-level thinning, thin = 50
Next, we keep every 50th accepted value in the chain (thin = 50) while keeping the other settings the same. For the chain with knots = 9 (Figure 4.15 and Figure 4.16), the minimum p-value of Heidelberger and Welch's statistic for θ_f and θ_g is 0.0244; the minimum p-value for σ²_f and σ²_g is 0.0765. The acceptance rates for θ_f and θ_g are 0.3005 and 0.2762. The chain converged. The trace plots are similar between thin = 20 and thin = 50, but the autocorrelation plots are better for thin = 50.
Figure 4.15: Traceplot for the MCMC chain under Scenario 1 with thin = 50 and 9 knots
Figure 4.16: Autocorrelation plot for the MCMC chain under Scenario 1 with thin = 50 and 9 knots
For the chain with knots = 4 (Figure 4.17 and Figure 4.18), the minimum p-value of Heidelberger and Welch's statistic for θ_f and θ_g is 0.0991; the minimum p-value for σ²_f and σ²_g is 0.7144. The acceptance rates for θ_f and θ_g are 0.2766 and 0.2827. The chain converged. We observe a similar result for knots = 4 and knots = 9: the autocorrelation plots improve as the thinning interval increases.
Figure 4.17: Traceplot for the MCMC chain under Scenario 1 with thin = 50 and 4 knots
Figure 4.18: Autocorrelation plot for the MCMC chain under Scenario 1 with thin = 50 and 4 knots
In general, based on the single MCMC chain comparison of ways to reduce autocorrelation and a review of trace plots and Heidelberger and Welch's diagnostics, we learn that both a reduction in the number of knots and some degree of thinning can improve the chain's estimation accuracy. The impact of thinning on the chain's performance is greater than the impact of the number of knots.
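The effect of thinning on autocorrelation can be demonstrated with a simple surrogate; here an AR(1) process stands in for a slowly mixing MCMC trace (our illustration, not the actual chains from the trial simulations).

```python
import numpy as np

def lag1_autocorr(x):
    """Sample lag-1 autocorrelation of a 1-D chain."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    return float(np.sum(x[:-1] * x[1:]) / np.sum(x * x))

rng = np.random.default_rng(0)
phi, n = 0.95, 100_000
e = rng.normal(size=n)
chain = np.empty(n)
chain[0] = e[0]
for t in range(1, n):          # AR(1): a slowly mixing chain
    chain[t] = phi * chain[t - 1] + e[t]
thinned = chain[::20]          # keep every 20th draw, as with thin = 20
```

Keeping every 20th draw drops the lag-1 autocorrelation from about phi = 0.95 to about phi^20 ≈ 0.36, mirroring the improvement seen in the autocorrelation plots above.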
4.2.4 Exploration of the effect of the maximum number of knots
To investigate the impact of the maximum number of allowable interior knots on the accuracy of marker effect estimation, we plot the continuous treatment effects from simulated trials under Scenarios 1 to 12 with different numbers of knots, ranging from 4 to 12 (Figure 4.19). This range is chosen based on the simulation results of Ruppert showing that, for logit-shape and bump-shape functions, the numbers of knots selected by two different search algorithms are 5 and 10 for penalized splines [80]. The logit shape and bump shape are similar to Scenario 5 and Scenario 10. Though their simulations model a continuous outcome, the result can still be applied to a binary outcome after using a link function [99]. All parameter settings are as above, except that we have applied thinning: the burn-in length is 25000 samples; we keep every 50th value (thin = 50), and the number of posterior samples collected after thinning is 10000 for a single chain; the scaling factors used to adaptively adjust the step size of the random walk proposal are 0.5, 0.6, 0.9 and 1.1. The column name is the corresponding maximum number of knots allowed, and each row shows 10 simulated trials for each scenario. For every plot, the ranges of the y-axis and x-axis are fixed. The black line is the estimated treatment effect; the dotted lines are 95% credible bands; the red line is the true treatment effect function on the log odds scale. In general, the more allowable knots, the more flexible the model. Also, the 95% credible band is wider near the boundaries of the marker values.
Figure 4.19: Estimation of treatment effect by different maximum allowable knots and different scenarios
Another metric for understanding how the maximum number of allowable knots affects estimation is the maximum
absolute deviation of the estimated curve from the truth, along with the mean absolute deviation. For a
single iteration, we find the maximum and the mean of the absolute deviation of the estimated posterior mean
treatment effect from the true treatment effect over all marker values; we then run 1000 simulated trials and
calculate the average of those two quantities. For both the average mean and the average maximum absolute
deviation from the truth, small values are preferable. We conduct the analysis with the maximum number of
knots allowed ranging from 4 to 12, under the null scenario (Scenario 1), the smooth step-function shape
(Scenario 5) and the bell shape (Scenario 10). These three scenarios are chosen because the null scenario
should be the easiest and most straightforward to estimate, while the other two scenarios are the most extreme
and challenge the estimation the most. The results are shown in Table 4.3.
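The two deviation criteria can be computed with a short helper; the sketch below is a hypothetical implementation, assuming the estimated posterior-mean curves and the true curve are evaluated on a common marker grid:

```python
import numpy as np

def deviation_metrics(est_curves, true_curve):
    """Average mean and average max absolute deviation of estimated
    posterior-mean treatment-effect curves from the truth.

    est_curves : (n_trials, n_grid) posterior-mean estimates of g(x),
                 one row per simulated trial, on a common marker grid
    true_curve : (n_grid,) true g(x) on the same grid
    """
    abs_dev = np.abs(est_curves - true_curve)   # (n_trials, n_grid)
    mean_dev = abs_dev.mean(axis=1)             # per-trial mean deviation
    max_dev = abs_dev.max(axis=1)               # per-trial max deviation
    return mean_dev.mean(), max_dev.mean()      # averaged over trials
```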
Interestingly, both deviation criteria appear to increase as a function of the maximum number of
knots under Scenario 1. The average of the mean deviations stays relatively constant across the potential
numbers of knots under Scenario 5 and Scenario 10. We also see an increasing relationship between the number of
knots and the average maximum absolute deviation for the non-linear and non-monotonic experimental
effects under Scenario 5 and Scenario 10. The deviation is smaller for the simple null case than for Scenario
5 and Scenario 10. In addition, the effect of the maximum number of allowable knots on estimation deviation is
smaller for non-linear and non-monotonic true underlying marker-treatment relationships. Based on the results in
Table 4.3, a maximum of 6 allowable knots should be adequate, as it yields near-minimal values of both the
average mean and the average maximum absolute deviation from the true value across these scenarios.
Table 4.3: The average of the mean and max of the absolute deviation from the truth over 1000 iterations for
Scenario 1, Scenario 5 and Scenario 10 with different maximum allowable knots

Maximum            Scenario 1          Scenario 5          Scenario 10
allowable knots    mean      max       mean      max       mean      max
 4                 0.2122    0.4591    0.3757    0.9935    0.3550    0.9423
 5                 0.2126    0.4624    0.3776    1.0019    0.3483    0.9277
 6                 0.2136    0.4646    0.3765    0.9990    0.3465    0.9213
 7                 0.2134    0.4650    0.3763    0.9987    0.3466    0.9197
 8                 0.2134    0.4649    0.3770    0.9992    0.3478    0.9217
 9                 0.2126    0.4633    0.3759    0.9978    0.3471    0.9196
10                 0.2124    0.4627    0.3764    0.9981    0.3469    0.9195
11                 0.2141    0.4752    0.3757    0.9978    0.3462    0.9191
12                 0.2128    0.4720    0.3775    0.9985    0.3465    0.9182
4.2.5 Estimation of Marker Effects without Adaptive Randomization
To establish the general accuracy of estimation with this model, simulations were conducted for several
scenarios without adaptive randomization. For each scenario from 1 to 12 in Table 4.1, we assume 500
patients in total, with 250 in the control arm and 250 in the experimental arm, and a randomization ratio of 1:1.
For each scenario, we simulated 1000 replicated trials. The maximum number of allowable knots for the
splines is 6, based on the exploration of how the maximum number of allowable knots affects estimation
described in Section 4.2.4. The marker values are sampled from the uniform distribution between 0 and
1, and the median marker value is set as the reference for the prognostic effect. For the MCMC chain, the
burn-in length is 25000 samples and the total chain length is 525000 samples. We keep every 50th
value (thin = 50), so the number of posterior samples collected is 10000 for a single chain. The scaling
factors that adaptively adjust the size of the random walk proposal are 0.5, 0.9, 0.6 and 1.1. The deviation criteria
described in Section 4.2.4 are used for assessment.
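As a quick arithmetic check, the retained sample count quoted above follows directly from the chain length, burn-in, and thinning interval:

```python
# Consistency check of the chain settings above: total length 525000,
# burn-in 25000, thin = 50 should leave exactly 10000 retained draws.
total_draws, burn_in, thin = 525_000, 25_000, 50
kept = range(burn_in, total_draws, thin)   # indices of retained posterior draws
assert len(kept) == 10_000
```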
A comparator model was used for comparison with the proposed spline model. Instead of using cubic
B-splines to model the prognostic and predictive marker effects, the comparator model is simplified to

logit(P(Y = 1)) = β1 x + β2 z + β3 xz, (4.2.1)

where X is the continuous marker value and Z is an indicator for the investigated treatment. This simple
comparator model assumes only a linear predictive biomarker-treatment interaction. The coefficients are
calculated by L2-penalized maximum likelihood estimation. Again, we assume 500 patients in total, with
250 in the control arm and 250 in the experimental arm, and a randomization ratio of 1:1. For each scenario, we
simulate 1000 replicated trials. The deviation criteria described in the previous section are used for assessment
as well.
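A minimal sketch of fitting (4.2.1) by L2-penalized maximum likelihood, using scikit-learn's ridge-penalized logistic regression as a stand-in (the function name and penalty parameterization here are assumptions of this sketch, not the dissertation's implementation):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def fit_comparator(x, z, y, l2_strength=1.0):
    """Fit the linear-interaction comparator model (4.2.1),
    logit P(Y = 1) = b1*x + b2*z + b3*x*z, by L2-penalized maximum
    likelihood.  scikit-learn parameterizes the ridge penalty by its
    inverse, C = 1 / l2_strength; the intercept is omitted to match
    (4.2.1) as written.
    """
    X = np.column_stack([x, z, x * z])
    m = LogisticRegression(penalty="l2", C=1.0 / l2_strength,
                           fit_intercept=False, solver="lbfgs")
    m.fit(X, y)
    return m.coef_.ravel()   # (b1, b2, b3)
```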
The results are presented in Table 4.4. The proposed model's estimation performs better for simple
scenarios than for complicated underlying scenarios. The most deviated estimation occurs for Scenario 4,
which is a step function with a sudden increase at 0. The discontinuity of the underlying model may account
for the estimation difference. Compared to the comparator model, the proposed model performed much
better on both deviation criteria for the non-linear and non-monotonic cases (Scenarios 9 and 10). Though
the proposed model performed slightly worse in the other cases, the differences were small and the results are
comparable.
Table 4.4: The average of the mean and max absolute deviation from the truth over 1000 iterations for all
scenarios with the proposed spline model with knots = 6 versus a comparator model

           Spline Model        Simple Model
Scenario   mean      max       mean      max
 1         0.2126    0.4616    0.1984    0.3918
 2         0.3114    0.7096    0.2640    0.5243
 3         0.2198    0.4736    0.2100    0.4187
 4         0.4483    1.3771    0.4992    1.2225
 5         0.3769    1.0010    0.4050    0.9041
 6         0.2391    0.5299    0.2167    0.4258
 7         0.3272    1.0065    0.3512    1.1348
 8         0.2343    0.5494    0.2263    0.5195
 9         0.3501    1.2709    0.8078    1.6255
10         0.3471    0.9228    0.8561    1.4214
11         0.3024    0.6906    0.2645    0.5246
12         0.3031    0.7745    0.3054    0.6914
4.3 Method Part III: Adaptive Randomization
4.3.1 Adaptive Randomization Framework
In this section, we expand our modeling to the setting of adaptive trial design, where the marker effects
estimated to date are used to assess treatment performance at a series of interim analyses, and
where the randomization probabilities to the experimental versus control arm for a newly enrolling patient with a given
marker value x are updated periodically, so that the patient is preferentially assigned to the arm showing
benefit, with the probability of assignment derived from the posterior strength of evidence observed
to date in the trial.
The quantity controlling the randomization ratio for a patient with marker value x is based on the
estimated probability of response. To obtain the probability of response from the logistic model, we
invert the logistic function, which we define as logit⁻¹:

logit⁻¹(x) = e^x / (1 + e^x) = P(Y = 1). (4.3.1)

In our context, the above equation represents the predicted response rate for a patient with marker value
x.
Assume a trial has randomized n patients and observed the response for all enrolled patients, with
data available to estimate the posterior distributions of f̂(x) and ĝ(x), the marker's estimated
prognostic and predictive effects, respectively. Then, for a newly enrolled (n+1)th patient with marker
value x_{n+1}, we can write the predicted response rate for the (n+1)th patient in the control arm as

R̂_ctl,n+1 = logit⁻¹(f̂(x_{n+1})), (4.3.2)

and the predicted response rate for the (n+1)th patient in the experimental arm as

R̂_exp,n+1 = logit⁻¹(f̂(x_{n+1}) + ĝ(x_{n+1})). (4.3.3)
Intuitively, we can assign the (n+1)th patient to the experimental arm with a randomization probability
based on the ratio of the point estimate of the response rate in the experimental arm over the sum of the response
rates in both arms:

P_exp,n+1 = R̂_exp,n+1 / (R̂_exp,n+1 + R̂_ctl,n+1), (4.3.4)
while the randomization probability to the control arm is 1 − P_exp,n+1. However, this approach does not
consider the reliability of the estimates. Instead, we use the posterior probability that the currently
estimated predictive marker effect favors one treatment over the other to determine the assignment
probabilities for a patient with marker value x_{n+1}. Given the observed data D, the
posterior probability that the experimental arm has a higher response rate than the control arm for this
patient can be expressed as Pr(R̂_exp,n+1 > R̂_ctl,n+1 | D). Then, we can assign the (n+1)th patient to the
experimental arm with probability

P_exp,n+1 = Pr(R̂_exp,n+1 > R̂_ctl,n+1 | D)^c / [Pr(R̂_exp,n+1 > R̂_ctl,n+1 | D)^c + Pr(R̂_ctl,n+1 > R̂_exp,n+1 | D)^c], (4.3.5)
where c is a tuning parameter that controls how strongly the randomization probability is influenced by
the data. Following the recommendation of Thall and Wathen, c can be chosen to be 0.5 or n/(2N), where n is the
number of currently enrolled patients and N is the total planned sample size for the trial [100].
Since we need to observe enough data to inform the model posteriors for subsequent adaptations, we
include a run-in phase, during which patients are randomized with the standard 1:1 ratio. Also, in order to
allow data to accumulate between updates, we can set the frequency at which the posteriors and marker-specific
randomization probabilities are updated (maintaining the same marker-specific randomization
probabilities between updates). According to the simulation study conducted by Wathen and Thall, in order
to have desirable selection properties, the adaptive randomization probabilities should be restricted
to an interval [r_min, r_max], for example [0.1, 0.9] [100].
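Equations (4.3.1) through (4.3.5), together with the Thall-Wathen tuning and the [r_min, r_max] restriction, can be sketched as a short routine operating on posterior draws of f(x) and g(x) at a patient's marker value. The function and argument names below are illustrative assumptions, not the dissertation's implementation:

```python
import numpy as np

def inv_logit(x):
    """Equation (4.3.1): logit⁻¹(x) = e^x / (1 + e^x)."""
    return 1.0 / (1.0 + np.exp(-x))

def rand_prob_experimental(f_draws, g_draws, n, N, c="thall-wathen",
                           r_min=0.1, r_max=0.9):
    """Randomization probability to the experimental arm for one patient,
    computed from posterior draws of the prognostic effect f(x) and the
    predictive effect g(x) at that patient's marker value.

    Implements (4.3.2)-(4.3.5): response-rate draws for each arm, the
    posterior probability that the experimental arm is superior, the
    Thall-Wathen tuning c = n / (2N) (or a fixed c such as 0.5), and
    clipping of the result to [r_min, r_max].
    """
    r_ctl = inv_logit(f_draws)                 # draws of R_ctl
    r_exp = inv_logit(f_draws + g_draws)       # draws of R_exp
    p_sup = np.mean(r_exp > r_ctl)             # Pr(R_exp > R_ctl | D)
    if c == "thall-wathen":
        c = n / (2.0 * N)
    num = p_sup ** c
    den = num + (1.0 - p_sup) ** c
    return float(np.clip(num / den, r_min, r_max))
```

With c = n/(2N), early in the trial the exponent is near zero and the probability stays close to 0.5; the data increasingly dominate as enrollment approaches N.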
4.3.2 Simulation with Adaptive Randomization
4.3.2.1 Exploration and Estimation with Adaptive Randomization
Adaptive randomization is implemented and simulations are conducted first for Scenario 1 as defined
above. Again, there are 500 patients in total. Adaptive randomization starts after results are observed
from the first 100 or 200 patients, and during the run-in phase the randomization ratio is 1:1. The true
function in Scenario 1 for the control arm is f(x) = −2.95 and the true function for the experimental arm
is f(x) + g(x) = −2.95 (Table 4.1). The maximum number of allowable knots for the splines is set to 4
based on earlier results. The marker values are sampled from the uniform distribution between 0 and 1,
and the median marker value is set as the reference for the prognostic effect. For the MCMC chain, the burn-in
length is 25000 samples and the total chain length is 525000 samples. We keep every 50th value (thin
= 50), so the number of posterior samples collected is 10000 for a single chain. The scaling factors that
adaptively adjust the size of the random walk proposal are 0.5, 0.9, 0.6 and 1.1, as discussed in Section 4.2.2.
To visualize the randomization process in a single trial, we make the plot below. Under Scenario 1,
the null case, the plot below shows a single trial with adaptive randomization starting from the 100th patient,
with the randomization ratio updated after every 50 patients or after every 10 patients (block size =
50 or block size = 10). The x-axis indexes enrolled patients and the y-axis is the probability of
randomization to the experimental arm for a patient with the marker value associated with the largest
treatment effect for that scenario. In this scenario, there is no predictive marker effect, so a patient with
a marker value of 0.5, the median marker value, is used. Tuning parameters of c = 0.5 and c = n/(2N) are
each explored, with r_min and r_max set to 0.1 and 0.9.
In Figure 4.20, we see that the randomization probability trends and shapes are similar for both block
sizes. For a smaller block size, the randomization probability is updated more frequently, making the
changes in the probability of randomization to the experimental arm easier to visualize. Here, a
well-performing trial would maintain randomization probabilities near 50% over time, as there is truly no
treatment effect. For both tuning parameters, the randomization probabilities reach their highest point between
the 200th and 250th patients. The tuning parameter c = n/(2N) is overall more conservative and more stable
than c = 0.5. The combinations of c = n/(2N) with block size = 10 or block size = 50 are best for this
single-trial simulation, because they keep the randomization probabilities near 50% over the course
of the trial.
Figure 4.20: Randomization probability under Scenario 1 with start number = 100, for different tuning parameters and block sizes
Figure 4.21 below shows Scenario 1 with adaptive randomization starting after the 200th patient, while
all other settings are kept the same. The randomization probabilities are closer to equal randomization,
which is desirable since this is the null case. The observations are similar to the plots with adaptive
randomization starting after the 100th patient. For c = 0.5, the randomization probability reaches its highest point
between the 200th and 250th patients. Based on the trace plots, the tuning parameter c = n/(2N) is more
conservative than c = 0.5. The combinations of c = n/(2N) with block size = 10 or
block size = 50 yield the most desirable results for this single-trial simulation.
Figure 4.21: Randomization probability under Scenario 1 with start number = 200, for different tuning parameters and block sizes
Based on the observations above, c = n/(2N), start number = 100 and block size = 50 are selected for
further investigation.
Figure 4.22: Randomization probability under Scenario 1 for 10 trials
4.3.2.2 Ten Single Trials for Adaptive Randomization under All Scenarios
The randomization probability spaghetti plots of 10 trial iterations, generated as described above, are
shown for all 12 scenarios using tuning parameter c = n/(2N), start number = 100 and block size =
50. If there is no marker effect, a patient with a marker value of 0.5, the median marker value, is used.
Since all scenarios except Scenarios 1, 3 and 11 include predictive marker effects or treatment
effects, we expect an upward trend in the randomization probabilities associated with the most
efficacious marker value (Table 4.5). For Scenario 11, which is a constant inferior treatment case, there is
no marker value associated with treatment benefit. In Figure 4.23, we see that randomization probabilities
to the experimental arm approach 1 as patients accrue and more information is collected in the
scenarios with treatment benefit. For Scenario 1 and Scenario 3, the cases without a treatment effect, the
randomization probabilities stay around 50%, with some small variance. For Scenario 11, due to the inferior
treatment effect, the randomization probabilities fall to the lower bound of the randomization ratio, which is
0.1.
Table 4.5: Simulation scenarios with most efficacious marker value in experimental arm

Scenario   Case                                                     Most efficacious marker value
 1         No treat effect, no marker effect                        0.5
 2         Constant treat effect, no marker effect                  0.5
 3         Prognostic marker effect, no treat effect                0.5
 4         Predictive marker effect (perfectly dichotomous)         near 1.0
 5         Predictive marker effect (nearly dichotomous)            near 1.0
 6         Predictive marker effect (linear)                        near 1.0
 7         Predictive marker effect (non-linear and monotone)       near 1.0
 8         Predictive marker effect (non-linear and monotone)       near 1.0
 9         Predictive marker effect (non-linear and non-monotone)   near 0.0 or 1.0
10         Predictive marker effect (non-linear and non-monotone)   0.5
11         Constant inferior treat effect (no marker effect)        -
12         Prognostic and predictive marker effect
           (linear; nearly dichotomous)                             near 1.0
Figure 4.23: Randomization probability under different Scenarios
4.4 Trial design: Bayesian Enrichment Trial with Adaptive
Randomization
We propose a trial design that combines features of covariate-adaptive and response-adaptive randomization,
given currently enrolled patients' responses, with adaptive enrichment if accrual for patients
with certain marker values is terminated early for efficacy, inferiority or futility. We assume a
one-to-one randomization ratio to each treatment arm before the start of adaptive randomization. After the
run-in phase, patients are adaptively randomized based on the marker-specific randomization ratio
calculated from the penalized splines model estimated with the accumulated patient information. Suppose
there are K adaptive randomization blocks, each ending in an analysis. Let
D_k = {(y_i, x_i, z_i); i = 1, ..., n_k} be the patient data observed by analysis k ∈ 1, ..., K, where
K is the final analysis. At an interim check k, which occurs at the end of a randomization block, we first
check whether there is sufficient evidence of early efficacy, inferiority or futility for the experimental arm
among marker values in the continuing marker group. If so, the definition of the "continuing marker
group" is updated, and patients with those marker values are no longer enrolled. Instead, the trial
enriches, continuing with enrollment restricted to the continuing marker group to collect more
evidence on the remaining marker values' relationships with treatment and outcome. The trial terminates
once there is sufficiently precise evidence for all marker values' associations with efficacy, inferiority, or
futility, or when the planned total sample size has been reached and outcomes have been observed for all patients.
Patients previously enrolled with marker values in a stopped marker group continue to contribute
to subsequent modeling for estimation purposes. We assume that a patient's response can be observed
shortly after receiving a treatment assignment.
4.4.1 Interim/Final analysis algorithm
At the end of each adaptive randomization block k ∈ 1, ..., K, corresponding to an interim or final analysis time point, our proposed trial design's decision algorithm proceeds as follows.
Step 1: Update the full posterior marker model:
Using all data observed up to the current check k, we fit or re-fit the full penalized splines model as described in Section 4.1 to evaluate the treatment effect based on the continuous marker X.
Step 2a: Interim efficacy check:
If k ∈ 1, ..., K − 1, we check for early efficacy of the experimental treatment among the continuing
marker group X_c by comparing the posterior probability of g(x) > 0 given x with a prespecified
efficacy threshold. We define X_eff,k := {x ∈ [a, b] : P(g(x) > 0 | D_k) > P_eff},
where P_eff is a prespecified efficacy threshold. We stop recruitment for patients with marker values
in X_eff,k due to early efficacy and remove X_eff,k from X_c. We add X_eff,k to the set
X_eff = {x : x ∈ ⋃_{j=1}^{k} X_eff,j}, which then includes the treatment-sensitive marker values identified up to
the current check k. Then, go to Step 3.
Step 2b: Final efficacy check:
If k = K, we check for final efficacy of the experimental treatment among the continuing marker
group X_c by comparing the posterior probability of g(x) > 0 given x with a prespecified efficacy
threshold. We define X_eff,K := {x ∈ [a, b] : P(g(x) > 0 | D_K) > P_eff}, where P_eff is a
prespecified efficacy threshold. We add X_eff,K to the set X_eff = {x : x ∈ ⋃_{j=1}^{K} X_eff,j},
which then includes the treatment-sensitive marker values identified up to the final check K. We claim
final efficacy for all marker values in X_eff, and efficacy at the final analysis for X_eff,K. For marker values in X_c
but not in X_eff, final futility is claimed.
Step 3: Interim inferiority check:
If k ∈ 1, ..., K − 1, we check for early inferiority of the experimental treatment among the continuing
marker group X_c by comparing the posterior probability of g(x) > 0 given x with a prespecified
inferiority threshold. We define X_inf,k := {x ∈ [a, b] : P(g(x) > 0 | D_k) < P_inf},
where P_inf is a prespecified inferiority threshold. We stop recruitment for patients with marker values
in X_inf,k for early inferiority and remove X_inf,k from X_c. We add X_inf,k to the set
X_inf = {x : x ∈ ⋃_{j=1}^{k} X_inf,j}, which then includes the marker values with an inferior
experimental treatment effect identified up to the current check k. Then, we go to Step 4.
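The marker-wise classification in Steps 2a and 3 reduces to thresholding posterior probabilities computed from MCMC draws. A minimal sketch, assuming posterior samples of g(x) on a marker grid are available (the function and variable names are hypothetical):

```python
import numpy as np

def classify_markers(marker_grid, g_draws, p_eff=0.975, p_inf=0.025):
    """Interim efficacy / inferiority classification of marker values
    (Steps 2a and 3).  g_draws is an (n_draws, n_grid) array of posterior
    samples of the predictive effect g(x) evaluated on marker_grid.
    """
    post_prob = (g_draws > 0).mean(axis=0)      # P(g(x) > 0 | D_k) per x
    x_eff = marker_grid[post_prob > p_eff]      # stop early for efficacy
    x_inf = marker_grid[post_prob < p_inf]      # stop early for inferiority
    return x_eff, x_inf
```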
Step 4: Interim futility check:
If k ∈ 1, ..., K − 1, we check for early futility of the experimental treatment among the continuing
marker group X_c based on the posterior predictive probability of trial success given x. We calculate the
marker-specific posterior predictive probability of trial success by simulating subsequent trial completion
in a loop over r ∈ [1, R]. For each simulated iteration r, we randomly sample the remaining patients with
marker values from X_c, which, combined with the currently observed data, add up to the total planned
sample size N. Based on the currently estimated posterior distribution and marker-specific randomization
probabilities, we assign patients to treatment arms and simulate outcomes for the remaining patients yet to
have observed outcomes in the trial. The posterior penalized splines model is re-fitted with the observed data
along with the simulated data. For marker values in X_c, we calculate an efficacy indicator X_{c,r}: we
define X_{c,r} = 1 if x ∈ X_c satisfies P(g(x) > 0 | D_r) > P_eff, where the efficacy threshold P_eff is
prespecified; otherwise, X_{c,r} = 0. Then, for each marker value in X_c, we take the mean of the
marker-specific efficacy indicators observed over the R iterations, denoted by
P_hit,x = (1/R) Σ_{r=1}^{R} X_{c,r}. If P_hit,x < P_fut, where
P_fut is set to some low probability, we claim early futility for patients with those marker values and
add them to the set X_fut,k. We stop recruitment for patients with marker values in X_fut,k due to early
futility and remove X_fut,k from X_c. We add X_fut,k to the set X_fut = {x : x ∈ ⋃_{j=1}^{k} X_fut,j},
which then includes the futile experimental treatment effect marker values identified up to the current check
k. Then, we go to Step 5.
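The outer structure of this posterior predictive check can be sketched as follows. The completion-and-refit step (sampling remaining patients, simulating outcomes, re-fitting the spline model) is abstracted into a user-supplied callable, which is an assumption of this sketch rather than part of the dissertation's code:

```python
import numpy as np

def futility_set(marker_grid, x_cont, simulate_completion,
                 p_eff=0.975, p_fut=0.1, R=100, rng=None):
    """Sketch of the Step 4 posterior-predictive futility check.

    simulate_completion(rng) is a user-supplied callable (hypothetical):
    it should sample the remaining patients up to N from the continuing
    marker group, simulate their outcomes from the current posterior,
    re-fit the spline model, and return the vector of posterior
    probabilities P(g(x) > 0 | D_r) evaluated on marker_grid.
    """
    rng = np.random.default_rng(rng)
    hits = np.zeros(len(marker_grid))
    for _ in range(R):
        post_prob = simulate_completion(rng)   # one completed-trial iteration
        hits += (post_prob > p_eff)            # marker-wise efficacy indicator
    p_hit = hits / R                           # P_hit,x over the R completions
    # early futility: continuing marker values with low success probability
    fut_mask = (p_hit < p_fut) & np.isin(marker_grid, x_cont)
    return marker_grid[fut_mask]
```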
Step 5: Update the adaptive randomization model and continue recruitment:
If there are no marker values remaining in the continuing marker group X_c, we terminate the study early.
Otherwise, if k ∈ 1, ..., K − 1, we continue recruitment only for patients with marker values in the
continuing marker group X_c and enroll patients up to the next randomization block. Each subsequently
enrolled patient is adaptively randomized based on their marker value at enrollment, with the
marker-specific randomization ratio calculated from the full penalized splines model estimated
at the current randomization block.
4.4.2 Comparator Frequentist Design
We compare our proposed trial design with a traditional frequentist adaptive enrichment design in order
to assess the advantages of modeling the truly continuous marker effect as continuous and of using adaptive
randomization. The comparator design treats the marker effect as linear and monotonic, as is commonly
done [64, 32, 83, 45]. At a single interim analysis, a grid search algorithm is used to decide whether
the marker-by-treatment interaction is sufficiently significant; if so, the optimal marker threshold is
used to dichotomize the patient population into marker-positive and marker-negative subgroups. Then, early
efficacy, inferiority and futility analyses are conducted in each marker subgroup, or in the overall
group if no optimal marker cutoff point is identified. If one subgroup is stopped early, the remaining
sample size is allocated to the continuing subgroup. When we dichotomize the patient population based
on a marker cutpoint, we define an indicator variable X_d, where X_d = 1 denotes patients in the marker-positive
subgroup and X_d = 0 denotes patients in the marker-negative subgroup based on the cutpoint. We
assume a one-to-one randomization ratio during the entire trial. Suppose the trial includes K analysis
time points, and let D_k = {(y_i, x_i, z_i); i = 1, ..., n_k} be the patient data observed by analysis timepoint
k ∈ 1, ..., K, where K is the final analysis. Also, let D_{k,X_d} = {(y_i, x_i, z_i); i = 1, ..., n_k; X_d = d},
d ∈ {0, 1}, be the patient data observed by analysis k in the corresponding marker subgroup. At each interim
or final analysis time point, with k ∈ 1, ..., K, the comparator design proceeds as follows.
Step 1: Interim check for identifying marker subgroup:
If the marker cut-off point was identified and defined prior to k, and one marker subgroup was stopped
previously due to early efficacy, inferiority or futility, continue to Step 2a and perform the following checks
for the remaining marker subgroup.
If no marker subgroup has been stopped early, the optimal marker cut-off point is identified using
a grid search algorithm, in which equally spaced cut-off points across the range of possible marker values are
evaluated. Patients with marker values equal to or higher than the cut-off point are defined as the marker-positive
subgroup, and those with marker values lower than the cut-off point as the marker-negative subgroup.
At each potential cutpoint, a logistic regression model is fitted on D_k with treatment assignment, the indicator of
marker subgroup based on the potential cut-off point, and a marker-by-treatment interaction as covariate
terms. After the search, if the smallest p-value for the marker-specific interaction term is
less than P_int,freq, where P_int,freq is a prespecified threshold, we dichotomize the patient population
using the corresponding cut-off point and classify patients using the binary indicator X_d. Then, we go to
Step 2a and evaluate each marker subgroup independently. If no cut-off point is identified, we go to Step
2c and evaluate patients as an overall group.
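The grid search step can be sketched as below. This is an illustrative stand-in, not the dissertation's code: it tests the subgroup-by-treatment interaction with a likelihood-ratio test rather than the Wald p-value from the fitted model, and the function names are assumptions:

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import chi2

def _neg_loglik(beta, X, y):
    # negative log-likelihood of a logistic regression, numerically stable
    eta = X @ beta
    return np.sum(np.logaddexp(0.0, eta) - y * eta)

def _min_nll(X, y):
    return minimize(_neg_loglik, np.zeros(X.shape[1]),
                    args=(X, y), method="BFGS").fun

def grid_search_cutpoint(x, z, y, cutpoints, p_int=0.05):
    """Grid search of Step 1: at each candidate cut-off, dichotomize the
    marker and test the subgroup-by-treatment interaction in a logistic
    model; p_int plays the role of P_int,freq.  Returns the best
    cutpoint if its p-value is below p_int, else None.
    """
    best_p, best_cut = np.inf, None
    for cut in cutpoints:
        xd = (x >= cut).astype(float)          # marker-positive indicator
        if xd.min() == xd.max():               # degenerate split, skip
            continue
        X0 = np.column_stack([np.ones_like(x), z, xd])   # no interaction
        X1 = np.column_stack([X0, z * xd])               # with interaction
        lr = 2.0 * (_min_nll(X0, y) - _min_nll(X1, y))   # LR statistic, 1 df
        p = chi2.sf(max(lr, 0.0), df=1)
        if p < best_p:
            best_p, best_cut = p, cut
    return best_cut if best_p < p_int else None
```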
Step 2a: Interim efficacy check for marker subgroup:
If k ∈ 1, ..., K − 1, we check for early subgroup efficacy of the experimental treatment in subgroup X_d.
A logistic regression model with treatment assignment is fitted on D_{k,X_d}. If the p-value of the treatment
assignment term is less than P_eff,grp,freq, where P_eff,grp,freq is a prespecified efficacy threshold, and
the estimated coefficient is greater than 0, indicating a positive treatment effect, we stop recruiting patients in
subgroup X_d and claim efficacy for X_d; otherwise, go to Step 3a.
Step 2b: Final efficacy check for marker subgroup:
If k = K, we check for final subgroup efficacy of the experimental treatment in subgroup X_d. A logistic
regression model with treatment assignment is fitted on D_{k,X_d}. If the p-value of the treatment
assignment term is less than P_eff,grp,freq, where P_eff,grp,freq is a prespecified efficacy threshold, and the
estimated coefficient is greater than 0, indicating a positive treatment effect, we claim final efficacy for X_d;
otherwise, we claim final futility for subgroup X_d.
Step 2c: Interim efficacy check for overall population:
If k ∈ 1, ..., K − 1, we check for efficacy of the experimental treatment in the overall patient population.
A logistic regression model with treatment assignment is fitted on D_k. If the p-value of the treatment
assignment term is less than P_eff,all,freq, where P_eff,all,freq is a prespecified efficacy threshold, and
the estimated coefficient is greater than 0, indicating a positive treatment effect, we stop recruiting all patients
and claim efficacy for patients with any marker value; otherwise, go to Step 3b.
Step 2d: Final efficacy check for overall population:
If k = K, we check for final efficacy of the experimental treatment in the overall patient population. A
logistic regression model with treatment assignment is fitted on D_k. If the p-value of the treatment
assignment term is less than P_eff,all,freq, where P_eff,all,freq is a prespecified efficacy threshold, and the
estimated coefficient is greater than 0, indicating a positive treatment effect, we claim final efficacy for
patients with any marker value; otherwise, we claim final futility for the overall patient population.
Step 3a: Interim inferiority check for marker subgroup:
We check for early subgroup inferiority of the experimental treatment in subgroup X_d. A logistic regression
model with treatment assignment is fitted on D_{k,X_d}. If the p-value of the treatment assignment
term is less than P_inf,grp,freq, where P_inf,grp,freq is a prespecified inferiority threshold, and the estimated
coefficient is less than 0, indicating a negative treatment effect, we stop recruiting patients in subgroup X_d and
claim early inferiority for X_d; otherwise, go to Step 4a.
Step 3b: Interim inferiority check for overall population:
We check for early inferiority of the experimental treatment in the overall patient population. A
logistic regression model with treatment assignment is fitted on D_k. If the p-value of the treatment
assignment term is less than P_inf,all,freq, where P_inf,all,freq is a prespecified inferiority threshold, and the
estimated coefficient is less than 0, indicating a negative treatment effect, we stop recruiting all patients and
claim early inferiority for patients with any marker value; otherwise, go to Step 4b.
Step 4a: Interim futility check for marker subgroup:
If k ∈ 1, ..., K − 1, we check for early subgroup futility of the experimental treatment in subgroup X_d. Based
on the current marker prevalence for X_d in D_{k,X_d} and assuming equal randomization within
subgroup X_d, the conditional power of the test for the difference between the response rates in the experimental
and control arms is calculated at interim k within X_d. The type I error rate α used in the test is based
on P_eff,grp,freq, and θ_alter denotes the expected difference under the alternative hypothesis. If the
conditional power is lower than P_fut,grp,freq, where P_fut,grp,freq is a prespecified futility threshold, we stop
recruiting patients in subgroup X_d and claim early futility for X_d; otherwise, we continue recruitment for
patients in subgroup X_d.
Step 4b: Interim futility check for overall population:
If k ∈ 1, ..., K − 1, we check for early futility of the experimental treatment in the overall patient
population. Assuming equal randomization, the conditional power of the test for the difference between the response
rates in the experimental and control arms is calculated at interim k in the overall population. The type I
error rate α used in the test is based on P_eff,all,freq, and θ_alter denotes the expected difference
under the alternative hypothesis. If the conditional power is lower than P_fut,all,freq, where P_fut,all,freq is
a prespecified futility threshold, we stop recruiting patients with any marker value and claim early futility;
otherwise, we continue recruitment for patients with any marker value.
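The conditional power quantity used in Steps 4a and 4b can be approximated by Monte Carlo; the sketch below is an illustrative stand-in (the exact analytic formula is not reproduced here, and the function name and arguments are assumptions). It completes the trial R times assuming the remaining patients split 1:1 with a true response-rate difference of θ_alter, and reports how often the final one-sided pooled z-test rejects:

```python
import numpy as np
from scipy.stats import norm

def conditional_power(resp_exp, n_exp, resp_ctl, n_ctl, n_remaining,
                      theta_alt, alpha=0.025, R=2000, rng=None):
    """Simulation-based approximation of conditional power for the
    interim futility checks (Steps 4a/4b): given interim response counts,
    complete the trial R times under the alternative theta_alt and report
    the fraction of completions in which the final one-sided pooled
    z-test rejects at level alpha.
    """
    rng = np.random.default_rng(rng)
    p_ctl = resp_ctl / n_ctl                   # observed control response rate
    m = n_remaining // 2                       # remaining patients per arm
    rejections = 0
    for _ in range(R):
        ye = resp_exp + rng.binomial(m, min(p_ctl + theta_alt, 1.0))
        yc = resp_ctl + rng.binomial(m, p_ctl)
        ne, nc = n_exp + m, n_ctl + m
        pe, pc = ye / ne, yc / nc
        pooled = (ye + yc) / (ne + nc)
        se = np.sqrt(pooled * (1.0 - pooled) * (1.0 / ne + 1.0 / nc))
        if se > 0 and (pe - pc) / se > norm.ppf(1.0 - alpha):
            rejections += 1
    return rejections / R
```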
4.5 Simulation Study
We conducted simulations to evaluate the performance of our proposed trial design. In the simulation,
N = 500 is the maximum sample size and there are two treatment arms. Adaptive randomization
starts after results are observed for the first 100 patients, and during the run-in phase the randomization
ratio is 1:1. The block size is set to 50 and the tuning parameter is c = n/(2N). Given the total number
of patients and the block size, there are 8 randomization blocks, or trial interim analyses. The maximum
number of allowable knots for the splines is 6. In order to assess the proposed trial design's performance
over iterations, we fix the pool of possible continuous marker values as a sequence from 0.01 to 0.99
with an increment of 0.01, giving 99 unique marker values. During the run-in phase, patients are
recruited from an unrestricted population in which marker values are sampled with replacement, and the
median marker value is set as the reference for the prognostic effect. Once the set of continuing marker
values X_c no longer represents the full population due to enrichment, marker values for subsequently
enrolling patients are sampled with replacement from X_c. For the MCMC chain, the burn-in length is
25000 samples. We keep every 50th value (thin = 50), and the number of posterior samples collected
is 10000 for a single chain. The scaling factors that adaptively adjust the size of the random walk proposal are
0.5, 0.9, 0.6 and 1.1. To save computational time, the single MCMC chain used during the posterior
predictive-based futility check has burn-in length = 20000, thin = 20, and 2000 collected posterior samples.
The thresholds for the decision rules are the following: P_eff = 0.975, P_inf = 0.025, P_fut = 0.1, R = 100,
r_min = 0.1, r_max = 0.9. The proposed trial design schema can be found in Figure 4.24.
Simulations are run for all 12 scenarios described in Table 4.1, with 1000 iterations each.
Figure 4.24: Proposed Bayesian enrichment trial with adaptive randomization design schema
We conducted simulations of the comparator frequentist design as well, intentionally choosing comparable settings for a more direct comparison. In this simulation, up to N = 500 total patients are enrolled across two treatment arms. Only one interim analysis, conducted when 50% of patients are enrolled and evaluated, and one final analysis are performed. The grid search algorithm for an optimal marker cutpoint scans marker values from 0.01 to 0.99 in increments of 0.01. The thresholds for the decision rules are: Peff,all,freq = 0.025, Peff,grp,freq = 0.025, Pinf,all,freq = 0.025, Pinf,grp,freq = 0.025, Pfut,all,freq = 0.1, Pfut,grp,freq = 0.1, θalter = 0.01, chosen to match those used in our design as closely as possible. The comparator trial design schema can be found in Figure 4.25. Simulations are run for all 12 scenarios with 1000 iterations.
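The comparator's cutpoint grid search can be sketched as follows. This is an illustration only: the score used here to rank candidate cutpoints is a simple between-arm response-rate contrast, a stand-in for whatever test statistic the actual design optimizes, and the function name is hypothetical.

```python
import numpy as np

def best_cutpoint(marker, response, arm, grid=None):
    """Grid search for a dichotomizing marker cutpoint.

    Scans candidate cutpoints 0.01, 0.02, ..., 0.99 and returns the one
    maximizing the absolute difference in treatment effect above vs. below
    the cut. arm: 1 = experimental, 0 = control; response: 1 = responder.
    """
    marker, response, arm = map(np.asarray, (marker, response, arm))
    if grid is None:
        grid = np.round(np.arange(0.01, 1.0, 0.01), 2)

    def effect(mask):
        # Response-rate difference (experimental - control) in a subgroup;
        # returns 0 if the subgroup is empty or contains only one arm.
        if mask.sum() == 0 or len(set(arm[mask])) < 2:
            return 0.0
        return response[mask & (arm == 1)].mean() - response[mask & (arm == 0)].mean()

    scores = [abs(effect(marker >= c) - effect(marker < c)) for c in grid]
    return float(grid[int(np.argmax(scores))])
```

For data in which only experimental-arm patients with marker values at or above 0.5 respond, the search recovers a cutpoint just below 0.5 (ties among equally scoring cutpoints are broken toward the smallest).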
Figure 4.25: Comparator frequentist enrichment trial design schema
4.6 Simulation Results
Simulation results with 1000 iterations are presented in the following figures and tables. In Figures 4.26 and 4.27, marker-specific final trial decisions are summarized in stacked bar plots. The x-axis is the marker value and the y-axis is the cumulative rate of each final trial decision, including efficacy (red), futility (green) and inferiority (blue), which add up to 100% for each marker value. Given that all scenarios from 1 to 10 (except Scenario 3) have no prognostic marker effects, we notice that the marker-specific separation boundaries between claiming efficacy and claiming futility have shapes similar to the predictive marker effect curves in Scenarios 1, 2 and 4-10. In Scenario 12, the marker-specific separation boundary has the same shape as the treatment effect, i.e., the difference between the experimental and control arms, due to the presence of both prognostic and predictive marker effects. The similarity between the separation boundaries and the predictive marker effect curves shows that our proposed design can effectively model the predictive marker-by-treatment interaction. In Scenario 1, since there is no treatment effect, we expect to claim futility across marker values. Our proposed design correctly classifies overall futility about 90% of the time on average across marker values under this null case. Of the remaining 10%, 7% is misclassified as overall efficacy and 3% as overall inferiority, averaged over marker values given the absence of a predictive marker effect. In Scenario 2, there is a constant treatment effect; the proposed design correctly concludes overall efficacy almost 100% of the time and concludes overall futility only 0.2% of the time across marker values. Scenario 3 has a constant prognostic marker effect but no treatment effect, so we expect a result similar to the null case. The proposed design correctly classifies the conclusion as overall futility about 80% of the time on average across marker values. However, we notice higher chances of an incorrect conclusion at marker values close to 0 or 1, which may reflect the model's difficulty at the marker boundaries. Scenario 4 has a predictive marker effect represented as a step function, which is perfectly dichotomous, and Scenario 5 uses a smooth step function, which is nearly dichotomous, so the results from these two scenarios are expected to be similar. In both cases, beyond the marker value of 0.5, where patients start to have treatment benefit in the underlying models, almost all marker values are classified as overall efficacy, as shown in Figure 4.26. Due to the smoothing of the step function, the rate of efficacy is higher in Scenario 5 than in Scenario 4 as marker values approach 0.5 from below. Conversely, the small misclassification rate of efficacy for marker values close to 0.5 in Scenario 4 is caused by the discontinuity of the step function. For Scenario 6, we have a linear predictive marker effect, and a small curvature in the separation boundary is observed due to the non-linearity-allowing property of our proposed model. Scenarios 7 and 8 both have non-linear predictive marker effect functions, and the separation boundaries of the final status reflect those shapes accordingly. Scenario 9 has no treatment effect around the marker value of 0.5 and maximum treatment effects around the lowest and highest marker values, and we see low futility rates at marker values close to 0 and 1. Scenario 10 has the opposite underlying function, a bell-shaped marker effect, and we see low efficacy rates at marker values close to 0 and 1. Scenario 11 has a constant inferior treatment effect with no prognostic marker effect, and the separation boundary between overall futility and overall inferiority is close to a horizontal line. The proposed design correctly classifies overall inferiority about 87% of the time on average across the marker values where the experimental arm is worse than the control arm. Scenario 12 has a linear prognostic marker effect and a smooth S-shaped increasing predictive marker effect, with a nearly dichotomous treatment difference. The result is similar to what we observed in Scenario 5, as expected.
In Figures 4.28 and 4.29, the rates of trial decisions being reached at interim or final analysis timepoints are shown for each marker value. The x-axis is the marker value and the y-axis is the cumulative proportion of when the final marker-specific decision was reached, either at the final analysis (red) or at an interim analysis (cyan). Our proposed design can successfully reach marker-specific conclusions before the final analysis timepoint at almost all marker values in all cases. We observe that marker values reaching the final analysis timepoint are more likely to be at the marker boundaries (close to 0 or 1) or to have a relatively small treatment effect (a small difference in the log odds of response between the experimental and control arms), as expected.
The marker-specific decision times by specific interim analyses (blocks) are shown in Figures 4.30 and 4.31, a more granular breakdown of the rates of stopping at each possible interim analysis shown in Figures 4.28 and 4.29. In these plots, the x-axis is the marker value and the y-axis is the cumulative rate of when the final trial decision is made, adding up to 100% for each marker value and timepoint. Since an interim analysis is conducted at the end of each randomization block, of which there are 8 in our setting, there are 8 interim analyses and a total of 9 possible stopping times, including the final analysis. Notice that the rates corresponding to a stopping time of 9 equal the rates of stopping at the final analysis shown in Figures 4.28 and 4.29. As mentioned above, marker values that stop at a later interim check (because they required more information to conclude a final marker-specific trial decision) or that reach the final analysis are more likely to be at the boundaries (close to 0 or 1) or to be associated with a relatively small treatment effect. For example, there are higher rates of later stopping around the boundary marker values in Scenario 2; in Scenario 9, as the treatment effect decreases when marker values approach 0.5, final decisions are more likely to be concluded at a later interim check.
The corresponding mean sample sizes and percentages of early trial termination are shown in Table 4.6. When the treatment effect is constantly inferior or superior and the effect size is large enough, the required sample size is relatively low and 100% of simulated trials can be terminated early. For example, the average final sample size is 157 for Scenario 2 and 115 for Scenario 11, much less than the pre-planned maximum sample size of 500. When a higher proportion of marker values have a large treatment effect size, the required sample size is also low, with a high rate of early termination, as shown in Scenario 7 with an average sample size of 294 and 88% early termination. However, when the predictive marker effect is non-linear or there is higher variance in the treatment effect across marker values, the average sample size tends to be larger. With more patient information required for each marker value, more data need to be collected and trials are more likely to run to the end, especially for marker values with a relatively small effect size.
Simulation results for the comparator design with 1000 iterations are presented in Table 4.7. For each scenario, the rates of marker detection are reported, indicating whether sufficiently strong evidence existed for dichotomization of the full population into positive/negative marker subgroups. The percentages of each possible trial decision are also presented: overall efficacy, overall inferiority, or overall futility when no marker is detected; if marker subgroups are identified, efficacy, inferiority or futility can be concluded in the positive or negative marker subgroups. Average sample sizes and average marker prevalences, defined as the proportion of marker-positive patients in the full population when a marker is detected, are also reported. For Scenario 1, the comparator design correctly concludes overall futility 85% of the time, while our proposed design does so about 90% of the time on average across marker values. In Scenario 2, the comparator design correctly concludes overall efficacy 89% of the time, while our proposed design does so about 100% of the time on average across marker values. The comparator design concludes overall futility 85% of the time under Scenario 3, performing better than our proposed design. Scenarios 4 and 5 are "ideal" cases for the comparator design due to their perfectly dichotomous or nearly dichotomous predictive marker effects, and the comparator design identifies a marker around 99% and 98% of the time for Scenarios 4 and 5, respectively. In Scenario 6, a marker is identified 81% of the time, and the most common final decision is efficacy in the marker-positive subgroup and futility in the marker-negative subgroup, reached 76% of the time. Since the average marker prevalence is around 55%, the selected marker cut-off point is around 0.55 on average; however, patients with marker values close to 0.5 could also benefit from the treatment, which can be detected by our proposed design. Under Scenario 7, patients with marker values greater than 0.12 have the maximum treatment effect, and our proposed design's results could detect that. However, the comparator design claims marker-negative subgroup futility 44% of the time, with the selected marker cut-off point around 0.34 on average. This means that patients with marker values less than 0.34 will be classified as non-responding, but in fact patients with marker values between 0.12 and 0.34 also benefit from the experimental arm; in other words, under the comparator design, around 20% of the full population would miss out on an efficacious treatment. Similarly, in Scenario 8, patients with marker values greater than 0.25 start to benefit from the experimental treatment, but the comparator design claims marker-negative subgroup futility 78% of the time, with the selected marker cut-off around 0.5. Scenarios 9 and 10 feature non-linear and non-monotone predictive effects, which can be captured by our design but not by the comparator design. The comparator design is most likely to claim efficacy in the marker-positive subgroup and futility in the marker-negative subgroup, at rates of 75% and 81%, respectively. A marker is detected 87% and 82% of the time under that design, respectively, with corresponding cut-off points of 0.26 and 0.77, leading to misclassification of non-responders as responders at marker values around the tail in Scenario 9, and of responders as non-responders at marker values around the head in Scenario 10. Due to the truly non-linear and non-monotonic predictive marker effects under Scenarios 9 and 10, the comparator design cannot perform well by searching for a binary marker cut-off value. In Scenario 11, the comparator design correctly concludes overall inferiority 92% of the time. For Scenario 12, the comparator design correctly detects a marker 84% of the time, with a cut-off point around 0.5, slightly worse than its performance in Scenario 5 due to the presence of the prognostic marker effect.
We notice that the average sample sizes for the comparator design are low across all cases and close to the minimum possible number of patients, since the only interim analysis occurs when roughly 50% of patients are enrolled and evaluated. Though the average sample sizes are smaller under the comparator design than under the proposed design, the comparator design has lower power to detect a constant treatment effect (Scenario 2) and a lower rate of correct classification in the null case (Scenario 1). The comparator is also much more likely to categorize patients into the non-responder marker subgroup when they can actually benefit from the treatment under the true predictive marker effect function, whereas efficacy for some of these patients can be revealed by the proposed design.
In conclusion, our proposed design is able to model the underlying prognostic and predictive marker effects, which can be non-linear and non-monotonic, and to correctly and efficiently make marker-specific (patient-focused) decisions. Though more marker-specific information may be required to reach some decisions, the proposed design has higher power to identify patients whose marker values have a positive predictive effect, reflecting the truly broader underlying classification of "responder" better than the traditional frequentist adaptive enrichment design.
Table 4.6: Simulation results for the proposed trial design

Case  Scenario                                                              Average sample size  Percentage of early termination
1     No treat effect, no marker effect                                     316.2                0.85
2     Constant treat effect, no marker effect                               156.1                1.00
3     Prognostic marker effect, no treat effect                             310.4                0.83
4     Predictive marker effect (perfectly dichotomous)                      410.4                0.71
5     Predictive marker effect (nearly dichotomous)                         426.5                0.60
6     Predictive marker effect (linear)                                     424.4                0.56
7     Predictive marker effect (non-linear and monotone)                    293.5                0.88
8     Predictive marker effect (non-linear and monotone)                    436.0                0.51
9     Predictive marker effect (non-linear and non-monotone)                446.6                0.49
10    Predictive marker effect (non-linear and non-monotone)                447.2                0.50
11    Constant inferior treat effect (no marker effect)                     114.5                1.00
12    Prognostic and predictive marker effect (linear; nearly dichotomous)  416.3                0.62
Figure 4.26: Rate of status by marker values for Scenarios 1-6
Figure 4.27: Rate of status by marker values for Scenarios 7-12
Figure 4.28: Time of final decision by marker values for Scenarios 1-6
Figure 4.29: Time of final decision by marker values for Scenarios 7-12
Figure 4.30: Rate of interim stopping time by marker values for Scenarios 1-6
Figure 4.31: Rate of interim stopping time by marker values for Scenarios 7-12
Table 4.7: Simulation results for the comparator frequentist trial design
(Decision combinations not listed had a rate of zero. M+/M- denote the marker-positive/marker-negative subgroups; E = efficacy, Fut = futility, Inf = inferiority.)

Scenario 1: No treat effect, no marker effect (average sample size N = 254.8)
  No marker detected: 0.713; marker detected: 0.287 (average marker prevalence 0.505)
  Overall efficacy 0.010; overall inferiority 0.017; overall futility 0.846
  M+ E / M- Fut 0.063; M+ E / M- Inf 0.005; M+ Fut / M- Inf 0.059

Scenario 2: Constant treat effect, no marker effect (average sample size N = 250)
  No marker detected: 0.726; marker detected: 0.274 (average marker prevalence 0.505)
  Overall efficacy 0.891
  M+ E / M- Fut 0.109

Scenario 3: Prognostic marker effect, no treat effect (average sample size N = 253.5)
  No marker detected: 0.302; marker detected: 0.698 (average marker prevalence 0.472)
  Overall efficacy 0.006; overall inferiority 0.008; overall futility 0.859
  M+ E / M- Fut 0.066; M+ E / M- Inf 0.001; M+ Fut / M- Inf 0.060

Scenario 4: Predictive marker effect, perfectly dichotomous (average sample size N = 251)
  No marker detected: 0.009; marker detected: 0.991 (average marker prevalence 0.515)
  Overall efficacy 0.015
  M+ E / M- Fut 0.947; M+ E / M- Inf 0.037; M+ Fut / M- Inf 0.001

Scenario 5: Predictive marker effect, nearly dichotomous (average sample size N = 251)
  No marker detected: 0.025; marker detected: 0.975 (average marker prevalence 0.527)
  Overall efficacy 0.030
  M+ E / M- Fut 0.946; M+ E / M- Inf 0.024

Scenario 6: Predictive marker effect, linear (average sample size N = 254.25)
  No marker detected: 0.189; marker detected: 0.811 (average marker prevalence 0.545)
  Overall efficacy 0.229; overall futility 0.008
  M+ E / M- Fut 0.759; M+ E / M- Inf 0.004

Scenario 7: Predictive marker effect, non-linear and monotone (average sample size N = 250.5)
  No marker detected: 0.428; marker detected: 0.572 (average marker prevalence 0.665)
  Overall efficacy 0.558
  M+ E / M- Fut 0.442

Scenario 8: Predictive marker effect, non-linear and monotone (average sample size N = 261.5)
  No marker detected: 0.179; marker detected: 0.821 (average marker prevalence 0.498)
  Overall efficacy 0.188; overall futility 0.024
  M+ E / M- Fut 0.779; M+ E / M- Inf 0.008; M+ Fut / M- Inf 0.001

Scenario 9: Predictive marker effect, non-linear and non-monotone (average sample size N = 263.25)
  No marker detected: 0.128; marker detected: 0.872 (average marker prevalence 0.261)
  Overall efficacy 0.189; overall futility 0.058
  M+ E / M- Fut 0.753

Scenario 10: Predictive marker effect, non-linear and non-monotone (average sample size N = 251.25)
  No marker detected: 0.177; marker detected: 0.823 (average marker prevalence 0.767)
  Overall efficacy 0.173; overall futility 0.007
  M+ E / M- Fut 0.812; M+ E / M- Inf 0.007; M+ Fut / M- Inf 0.001

Scenario 11: Constant inferior treat effect, no marker effect (average sample size N = 250)
  No marker detected: 0.763; marker detected: 0.237 (average marker prevalence 0.420)
  Overall inferiority 0.916
  M+ Fut / M- Inf 0.084

Scenario 12: Prognostic and predictive marker effect, linear; nearly dichotomous (average sample size N = 255.75)
  No marker detected: 0.157; marker detected: 0.843 (average marker prevalence 0.509)
  Overall efficacy 0.088; overall futility 0.104
  M+ E / M- Fut 0.789; M+ E / M- Inf 0.016; M+ Fut / M- Inf 0.003
4.7 Simulation Code
The R code used to implement our proposed Bayesian enrichment trial with adaptive randomization design
and the comparator frequentist enrichment design is available on GitHub: https://github.com/huoying831/
Bayesian-enrichment-adaptive-randomization-design.
Chapter 5
Conclusions
Within the scope of my thesis, two major weaknesses of currently used biomarker-driven designs in cancer can be resolved: (1) simplification or misrepresentation of the prognostic or predictive effects of naturally continuous biomarkers (e.g., circulating tumor DNA) that drive study design and could therefore lead to incorrect conclusions, and (2) fixed rather than adaptive treatment allocation ratios to marker-driven versus control therapy, despite accumulating on-trial information that could inform the treatment of enrolling patients at a personalized level.
The proposed design combines the machinery of Bayesian statistical theory and computing with immediate access to individual patient data, which together inform the development of an impactful, clinically intuitive, and ready-to-implement trial design framework. These designs allow more accurate and precise real-time identification of patient subgroups who are benefiting from an experimental targeted therapy, by avoiding standard design manipulations (e.g., marker dichotomization) and assumptions (e.g., effect linearity or monotonicity). Instead, a flexible Bayesian modeling "engine" uses accumulating patient, molecular, and outcome data to continuously (1) update estimates of design-driving marker relationships, (2) quantify posterior uncertainty around these effects, and (3) translate the statistical information from (1) and (2) into personalized treatment allocation, such that a newly enrolled patient's probability of randomization to experimental therapy is a function of their tumor marker value(s), the experimental therapy's trial performance to date among molecularly similar patients, and the remaining statistical uncertainty in that performance. This in turn translates into real-time (vs. delayed) learning from patient data, shortened trial duration with faster answers to critical questions, and more ethical and personalized treatment of study patients.
The current proposed design applies only in a single-biomarker setting, but could in theory be extended to a higher-dimensional marker setting, for example by developing a "risk score" or classification scale comprising multiple markers; however, such refinements should be approached with caution, as marker combinations may not be clinically meaningful or interpretable.
Currently, our adaptive randomization algorithm includes only patients' biomarker information to predict the probability of response. Extending the adaptive randomization step to utilize additional key baseline characteristics, chosen based on the investigational drug and disease, is one future direction. In addition, our proposed design is relatively computationally intensive, and high-performance computing is required to run simulations; optimizing the method or code to increase computational speed is another future direction.
[57] Xiaolong Luo, Mingyu Li, Weichung Joe Shih, and Peter Ouyang. “Estimation of treatment eect
following a clinical trial with adaptive design”. In: Journal of biopharmaceutical statistics 22.4
(2012), pp. 700–718.
[58] David Madigan and Adrian E Raftery. “Model selection and accounting for model uncertainty in
graphical models using Occam’s window”. In: Journal of the American Statistical Association
89.428 (1994), pp. 1535–1546.
[59] Baldur P Magnusson and Bruce W Turnbull. “Group sequential enrichment design incorporating
subgroup selection”. In: Statistics in medicine 32.16 (2013), pp. 2695–2714.
[60] Shigeyuki Matsui and John Crowley. “Biomarker-stratied phase III clinical trials: enhancement
with a subgroup-focused sequential design”. In: Clinical Cancer Research 24.5 (2018), pp. 994–1001.
122
[61] Willi Maurer and Frank Bretz. “Multiple testing in group sequential trials using graphical
approaches”. In: Statistics in Biopharmaceutical Research 5.4 (2013), pp. 311–320.
[62] Jean-Michel Molina, Pedro Cahn, Beatriz Grinsztejn, Adriano Lazzarin, Anthony Mills,
Michael Saag, Khuanchai Supparatpinyo, Sharon Walmsley, Herta Crauwels, Laurence T Rimsky,
et al. “Rilpivirine versus efavirenz with tenofovir and emtricitabine in treatment-naive adults
infected with HIV-1 (ECHO): a phase 3 randomised double-blind active-controlled trial”. In: The
Lancet 378.9787 (2011), pp. 238–246.
[63] T Morgan, M Zuccarello, R Narayan, P Keyl, K Lane, and D Hanley. “Preliminary ndings of the
minimally-invasive surgery plus rtPA for intracerebral hemorrhage evacuation (MISTIE) clinical
trial”. In: Cerebral Hemorrhage. Springer. 2008, pp. 147–151.
[64] Morphotek Investigation in Colorectal Cancer: Research of MORAb-004 (MICRO). 2012. url:
https://clinicaltrials.gov/study/NCT01507545.
[65] Peter Müller, Fernando Quintana, and Gary L Rosner. “A product partition model with regression
on covariates”. In: Journal of Computational and Graphical Statistics 20.1 (2011), pp. 260–278.
[66] Beat Neuenschwander, Simon Wandel, Satrajit Roychoudhury, and Stuart Bailey. “Robust
exchangeability designs for early phase clinical trials with multiple strata”. In: Pharmaceutical
statistics 15.2 (2016), pp. 123–134.
[67] Shoichi Ohwada and Satoshi Morita. “Bayesian adaptive patient enrollment restriction to identify
a sensitive subpopulation using a continuous biomarker in a randomized phase 2 trial”. In:
Pharmaceutical Statistics 15.5 (2016), pp. 420–429.
[68] Thomas Ondra, Sebastian Jobjörnsson, Robert A Beckman, Carl-Fredrik Burman, Franz König,
Nigel Stallard, and Martin Posch. “Optimized adaptive enrichment designs”. In: Statistical methods
in medical research 28.7 (2019), pp. 2096–2111.
[69] Yeonhee Park, Suyu Liu, Peter F Thall, and Ying Yuan. “Bayesian group sequential enrichment
designs based on adaptive regression of response and survival time on baseline biomarkers”. In:
Biometrics 78.1 (2022), pp. 60–71.
[70] Michael D Pickard and Mark Chang. “A exible method using a parametric bootstrap for
reducing bias in adaptive designs with treatment selection”. In: Statistics in Biopharmaceutical
Research 6.2 (2014), pp. 163–174.
[71] Martyn Plummer, Nicky Best, Kate Cowles, and Karen Vines. “CODA: Convergence Diagnosis
and Output Analysis for MCMC”. In: R News 6.1 (2006), pp. 7–11. url:
https://journal.r-project.org/archive/.
[72] Matthew A Psioda, Jiawei Xu, QI Jiang, Chunlei Ke, Zhao Yang, and Joseph G Ibrahim. “Bayesian
adaptive basket trial design using model averaging”. In: Biostatistics 22.1 (2021), pp. 19–34.
[73] Fernando A Quintana, Peter Müller, and Ana Luisa Papoila. “Cluster-specic variable selection
for product partition models”. In: Scandinavian Journal of Statistics 42.4 (2015), pp. 1065–1077.
123
[74] Adrian E Raftery. “Bayesian model selection in social research”. In: Sociological methodology
(1995), pp. 111–163.
[75] Lindsay A Renfro and Sumithra J Mandrekar. “Denitions and statistical properties of master
protocols for personalized medicine in oncology”. In: Journal of biopharmaceutical statistics 28.2
(2018), pp. 217–228.
[76] Lindsay A Renfro and Daniel J Sargent. “Statistical controversies in clinical research: basket trials,
umbrella trials, and other master protocols: a review and examples”. In: Annals of Oncology 28.1
(2017), pp. 34–43.
[77] Michael Rosenblum, Brandon Luber, Richard E Thompson, and Daniel Hanley. “Group sequential
designs with prospectively planned rules for subpopulation enrichment”. In: Statistics in medicine
35.21 (2016), pp. 3776–3791.
[78] Peter J Rousseeuw. “Silhouettes: a graphical aid to the interpretation and validation of cluster
analysis”. In: Journal of computational and applied mathematics 20 (1987), pp. 53–65.
[79] Vivekananda Roy. “Convergence diagnostics for markov chain monte carlo”. In: Annual Review of
Statistics and Its Application 7 (2020), pp. 387–412.
[80] David Ruppert. “Selecting the number of knots for penalized splines”. In: Journal of computational
and graphical statistics 11.4 (2002), pp. 735–757.
[81] Haolun Shi and Guosheng Yin. “Bayesian enhancement two-stage design for single-arm phase II
clinical trials with binary and time-to-event endpoints”. In: Biometrics 74.3 (2018), pp. 1055–1064.
[82] R John Simes. “An improved Bonferroni procedure for multiple tests of signicance”. In:
Biometrika 73.3 (1986), pp. 751–754.
[83] Noah Simon and Richard Simon. “Adaptive enrichment designs for clinical trials”. In: Biostatistics
14.4 (2013), pp. 613–625.
[84] Noah Simon and Richard Simon. “Using Bayesian modeling in frequentist adaptive enrichment
designs”. In: Biostatistics 19.1 (2018), pp. 27–41.
[85] Richard Simon. “New designs for basket clinical trials in oncology”. In: Journal of
biopharmaceutical statistics 28.2 (2018), pp. 245–255.
[86] Richard Simon. “Optimal two-stage designs for phase II clinical trials”. In: Controlled clinical trials
10.1 (1989), pp. 1–10.
[87] Richard Simon, Susan Geyer, Jyothi Subramanian, and Sameek Roychowdhury. “The Bayesian
basket design for genomic variant-driven phase II trials”. In: Seminars in Oncology. Vol. 43.
Elsevier. 2016, pp. 13–18.
124
[88] Arup K Sinha, Lemuel Moye III, Linda B Piller, Jose-Miguel Yamal, Carlos H Barcenas,
Jianchang Lin, and Barry R Davis. “Adaptive group-sequential design with population
enrichment in phase 3 randomized controlled trials with two binary co-primary endpoints”. In:
Statistics in medicine 38.21 (2019), pp. 3985–3996.
[89] Dennis J Slamon, William Godolphin, Lovell A Jones, John A Holt, Steven G Wong, Duane E Keith,
Wendy J Levin, Susan G Stuart, Judy Udove, Axel Ullrich, et al. “Studies of the HER-2/neu
proto-oncogene in human breast and ovarian cancer”. In: Science 244.4905 (1989), pp. 707–712.
[90] Dennis J Slamon, Brian Leyland-Jones, Steven Shak, Hank Fuchs, Virginia Paton,
Alex Bajamonde, Thomas Fleming, Wolfgang Eiermann, Janet Wolter, Mark Pegram, et al. “Use of
chemotherapy plus a monoclonal antibody against HER2 for metastatic breast cancer that
overexpresses HER2”. In: New England journal of medicine 344.11 (2001), pp. 783–792.
[91] Kentaro Takeda, Shufang Liu, and Alan Rong. “Constrained hierarchical Bayesian model for latent
subgroups in basket trials with two classiers”. In: Statistics in Medicine 41.2 (2022), pp. 298–309.
[92] Rui Tang, Xiaoye Ma, Hui Yang, and Michael Wolf. “Biomarker-Dened Subgroup Selection
Adaptive Design for Phase III Conrmatory Trial with Time-to-Event Data: Comparing Group
Sequential and Various Adaptive Enrichment Designs”. In: Statistics in Biosciences 10 (2018),
pp. 371–404.
[93] Targeted Therapy Directed by Genetic Testing in Treating Patients With Advanced Refractory Solid
Tumors, Lymphomas, or Multiple Myeloma (The MATCH Screening Trial). 2015. url:
https://clinicaltrials.gov/ct2/show/NCT02465060.
[94] Peter F Thall, J Kyle Wathen, B Nebiyou Bekele, Richard E Champlin, Laurence H Baker, and
Robert S Benjamin. “Hierarchical Bayesian approaches to phase II trials in diseases with multiple
subtypes”. In: Statistics in medicine 22.5 (2003), pp. 763–780.
[95] Ryuji Uozumi and Chikuma Hamada. “Interim decision-making strategies in adaptive designs for
population selection using time-to-event endpoints”. In: Journal of Biopharmaceutical Statistics
27.1 (2017), pp. 84–100.
[96] Steen Ventz, William T Barry, Giovanni Parmigiani, and Lorenzo Trippa. “Bayesian
response-adaptive designs for basket trials”. In: Biometrics 73.3 (2017), pp. 905–915.
[97] Matt P Wand and John T Ormerod. “On semiparametric regression with O’Sullivan penalized
splines”. In: Australian & New Zealand Journal of Statistics 50.2 (2008), pp. 179–198.
[98] Sue-Jane Wang, Robert T O’Neill, and HM James Hung. “Approaches to evaluation of treatment
eect in randomized clinical trials with genomic subset”. In: Pharmaceutical Statistics: The Journal
of Applied Statistics in the Pharmaceutical Industry 6.3 (2007), pp. 227–244.
[99] Wei Wang and Dylan S Small. “Monotone B-spline smoothing for a generalized linear model
response”. In: The American Statistician 69.1 (2015), pp. 28–33.
[100] J Kyle Wathen and Peter F Thall. “A simulation study of outcome adaptive randomization in
multi-arm clinical trials”. In: Clinical Trials 14.5 (2017), pp. 432–440.
125
[101] Wessel N van Wieringen. “Lecture notes on ridge regression”. In: arXiv preprint arXiv:1509.09169
(2015).
[102] Dawn B Woodard. “Detecting poor convergence of posterior samplers due to multimodality”. In:
Discussion Paper 2008–05. Department of Statistical Science, Duke University Durham, NC, 2007.
[103] Yanxun Xu, Florica Constantine, Yuan Yuan, and Yili L Pritchett. “ASIED: a Bayesian adaptive
subgroup-identication enrichment design”. In: Journal of biopharmaceutical statistics 30.4 (2020),
pp. 623–638.
[104] Yanxun Xu, Peter Müller, Apostolia M Tsimberidou, and Donald Berry. “A nonparametric
Bayesian basket trial design”. In: Biometrical Journal 61.5 (2019), pp. 1160–1174.
[105] Yanxun Xu, Lorenzo Trippa, Peter Müller, and Yuan Ji. “Subgroup-based adaptive (SUBA) designs
for multi-arm biomarker trials”. In: Statistics in Biosciences 8 (2016), pp. 159–180.
[106] Guosheng Yin, Zhao Yang, Motoi Odani, and Satoru Fukimbara. “Bayesian hierarchical modeling
and biomarker cuto identication in basket trials”. In: Statistics in Biopharmaceutical Research
13.2 (2021), pp. 248–258.
[107] Jun Yin, Rui Qin, Daniel J Sargent, Charles Erlichman, and Qian Shi. “A hierarchical Bayesian
design for randomized Phase II clinical trials with multiple groups”. In: Journal of
Biopharmaceutical Statistics 28.3 (2018), pp. 451–462.
[108] Weidong Zhang, Jing Wang, and Sandeep Menon. “Advancing cancer drug development through
precision medicine and innovative designs”. In: Journal of biopharmaceutical statistics 28.2 (2018),
pp. 229–244.
[109] Zhiwei Zhang, Ruizhe Chen, Guoxing Soon, and Hui Zhang. “Treatment evaluation for a
data-driven subgroup in adaptive enrichment designs of clinical trials”. In: Statistics in Medicine
37.1 (2018), pp. 1–11.
[110] Yihua Zhao, John Staudenmayer, Brent A Coull, and Matthew P Wand. “General design Bayesian
generalized linear mixed models”. In: Statistical science (2006), pp. 35–51.
[111] Heng Zhou, Fang Liu, Cai Wu, Eric H Rubin, Vincent L Giranda, and Cong Chen. “Optimal
two-stage designs for exploratory basket trials”. In: Contemporary clinical trials 85 (2019),
p. 105807.
[112] Tianjian Zhou and Yuan Ji. “RoBoT: a robust Bayesian hypothesis testing method for basket
trials”. In: Biostatistics 22.4 (2021), pp. 897–912.
126
Abstract
Recent developments in gene sequencing of tumors in oncology have led to a widespread increase in the use of targeted therapies based on patients' biomarkers, a paradigm generally referred to as "precision medicine". Before these advances, cancer was viewed in relatively homogeneous terms, and treatment strategies focused on the type, location, and stage of disease without distinguishing among patients' tumor biology. With the advent of precision medicine treatments, new challenges in the statistical design of clinical trials have naturally emerged.
This thesis focuses on challenges and advances in statistical designs that evaluate treatments in the framework of predictive biomarkers. First, in two separate review articles, we describe state-of-the-art trial designs and recent developments in the areas of basket and adaptive enrichment trials. We then introduce a novel adaptive enrichment trial design for continuous biomarkers that may have nonlinear or nonmonotone relationships with outcomes or treatment effects, proposing a Bayesian solution involving adaptive randomization. We show that this design makes correct marker-specific trial decisions with high efficiency, resulting in significantly improved, patient-tailored outcomes compared to standard approaches that lack adaptive randomization and further ignore or oversimplify the true underlying marker relationships.
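The adaptive randomization idea summarized above can be sketched in miniature. The snippet below is an illustrative sketch, not the design developed in the thesis: it applies a commonly used Thall-Wathen-style tempered rule, in which a patient's probability of assignment to the experimental arm is driven by the posterior probability that the treatment benefits patients at that patient's biomarker value. The posterior draws here are simulated normal samples standing in for output of a fitted Bayesian biomarker model; the function name, the tempering exponent `gamma = 0.5`, and the two example biomarker scenarios are all hypothetical choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(7)

def ar_probability(post_draws, gamma=0.5):
    """Tempered adaptive randomization probability (illustrative).

    post_draws: posterior samples of the marker-specific treatment effect
    (experimental minus control) at one patient's biomarker value.
    Returns the probability of assigning that patient to the experimental
    arm; gamma < 1 softens the rule so early imbalance is less extreme.
    """
    p_benefit = np.mean(post_draws > 0.0)  # Pr(effect > 0 | data)
    num = p_benefit ** gamma
    return num / (num + (1.0 - p_benefit) ** gamma)

# Hypothetical posterior draws at two biomarker values: one where the
# treatment clearly helps, and one where the evidence is equivocal.
draws_strong = rng.normal(loc=0.8, scale=0.3, size=4000)
draws_weak = rng.normal(loc=0.0, scale=0.3, size=4000)

print(round(ar_probability(draws_strong), 3))  # near 1: favor experimental arm
print(round(ar_probability(draws_weak), 3))    # near 0.5: remain balanced
```

In a design of the kind described in the abstract, the posterior draws would vary smoothly with the biomarker, so patients at marker values where the treatment appears effective are preferentially randomized to it while equipoise is preserved elsewhere.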
Asset Metadata
Creator: Tu, Yue (author)
Core Title: Biomarker-driven designs in oncology
School: Keck School of Medicine
Degree: Doctor of Philosophy
Degree Program: Biostatistics
Degree Conferral Date: 2024-05
Publication Date: 05/24/2024
Defense Date: 04/11/2024
Publisher: Los Angeles, California (original); University of Southern California (original); University of Southern California. Libraries (digital)
Tags: adaptive enrichment design, adaptive randomization, basket design, Bayesian adaptive design, biomarker-driven design, clinical trial, OAI-PMH Harvest, precision medicine, predictive biomarker
Format: theses (aat)
Language: English
Contributor: Electronically uploaded by the author (provenance)
Advisor: Renfro, Lindsay A. (committee chair); Alonzo, Todd A. (committee member); Krailo, Mark (committee member); Mack, Wendy J. (committee member); Mascarenhas, Leo (committee member)
Creator Email: tuyue831@gmail.com, yuetu@usc.edu
Permanent Link (DOI): https://doi.org/10.25549/usctheses-oUC113967455
Unique identifier: UC113967455
Identifier: etd-TuYue-13028.pdf (filename)
Legacy Identifier: etd-TuYue-13028
Document Type: Dissertation
Rights: Tu, Yue
Internet Media Type: application/pdf
Type: texts
Source: 20240528-usctheses-batch-1162 (batch); University of Southern California (contributing entity); University of Southern California Dissertations and Theses (collection)
Access Conditions: The author retains rights to his/her dissertation, thesis or other graduate work according to U.S. copyright law. Electronic access is being provided by the USC Libraries in agreement with the author, as the original true and official version of the work, but does not grant the reader permission to use the work if the desired use is covered by copyright. It is the author, as rights holder, who must provide use permission if such use is covered by copyright.
Repository Name: University of Southern California Digital Library
Repository Location: USC Digital Library, University of Southern California, University Park Campus MC 2810, 3434 South Grand Avenue, 2nd Floor, Los Angeles, California 90089-2810, USA
Repository Email: cisadmin@lib.usc.edu