USC Digital Library / University of Southern California Dissertations and Theses

Calibration uncertainty in model-based analyses for medical decision making with applications for ovarian cancer (USC Thesis)
CALIBRATION UNCERTAINTY IN MODEL-BASED ANALYSES FOR MEDICAL DECISION MAKING WITH APPLICATIONS FOR OVARIAN CANCER

by Jing Voon Chen

A Dissertation Presented to the FACULTY OF THE GRADUATE SCHOOL, UNIVERSITY OF SOUTHERN CALIFORNIA, in Partial Fulfillment of the Requirements for the Degree DOCTOR OF PHILOSOPHY (INDUSTRIAL AND SYSTEMS ENGINEERING)

August 2018

Copyright 2018 Jing Voon Chen

Acknowledgements

I would like to express my sincere gratitude to my advisor, Dr. Julia Higle, for her guidance, inspiration, patience, and support throughout my Ph.D. study. I thank the rest of my dissertation committee, Dr. Sze-Chuan Suen and Dr. Lynda Roman, for their time, interest, and support. I wish to extend thanks to my peers for the stimulating discussions. Last but not least, I would like to thank my family and friends for their continued support and encouragement.

Table of Contents

Acknowledgements
List of Tables
List of Figures
Abstract
Chapter 1: Introduction
  1.1 Bibliography
Chapter 2: A Literature Review on Validation, Calibration, and Sensitivity Analysis in Disease Models
  2.1 Validation
  2.2 Calibration
    2.2.1 Parameters to be Calibrated
    2.2.2 Calibration Targets
    2.2.3 GOF Measure
    2.2.4 Search Strategy
    2.2.5 Acceptance Criteria
    2.2.6 Stopping Rule
    2.2.7 Integration of Calibration Results and Economic Parameters
    2.2.8 Conclusion
  2.3 Sensitivity Analysis (SA)
    2.3.1 Sources of Uncertainty
    2.3.2 Types of SA
    2.3.3 Best Practices for Conducting SA
    2.3.4 Summary
  2.4 Conclusion
  2.5 Bibliography
Chapter 3: Model-Based Calibration for Natural History Modeling
  3.1 Motivation
  3.2 A Fictitious Disease
    3.2.1 Disease Dynamics
    3.2.2 Incorporating Relationships Among Parameters in Calibration
  3.3 Calibration Uncertainty
  3.4 Model-Based Calibration
  3.5 Net Monetary Benefit Analysis
  3.6 Discussion
  3.7 Bibliography
Appendix A: Data for the Fictitious Disease
  A.1 P_true, π_true, and α_true
  A.2 Data for Plausible Models
  A.3 Validity Conditions
  A.4 Parameters for the Interventions
    A.4.1 Screening Test Characteristics
    A.4.2 Treatment Parameters
    A.4.3 Quality of Life
    A.4.4 Net Monetary Benefit Calculations
    A.4.5 The NMB of Each Intervention
Chapter 4: A Literature Review of Model-Based Analyses for Ovarian Cancer
  4.1 Natural History Models
  4.2 Screening Detection Models
  4.3 Cancer Prevention Models
    4.3.1 Models Without Salpingectomy
    4.3.2 Models With Salpingectomy
  4.4 Conclusion
  4.5 Bibliography
Chapter 5: Calibration Uncertainty and Model-Based Analyses with Applications to Ovarian Cancer Modeling
  5.1 Introduction
  5.2 Model-Based Analysis for Ovarian Cancer
  5.3 Natural History Model for Ovarian Cancer
    5.3.1 Data: Sources and Characteristics
    5.3.2 Model Structure
    5.3.3 Modeled Outcomes
    5.3.4 Validity Conditions
    5.3.5 Model-Based Calibration for Ovarian Cancer
  5.4 Results
  5.5 Conclusions
  5.6 Bibliography
Appendix B: Data
  B.1 Ovarian Cancer Data
  B.2 Validity Conditions
Chapter 6: Conditionally Stationary Markov Models
  6.1 The Disease Model
    6.1.1 The Nonstationary Model
    6.1.2 The Conditionally Stationary Model
    6.1.3 Modeled Outcomes
    6.1.4 Modeling Disease Activation as a Piecewise-Linear Function
    6.1.5 The Calibration Problem
  6.2 Bibliography
Appendix C: Derivations
  C.1 P^(n)_{i,j}(a); n-Step Transition Probability
  C.2 S(i,a,n); Survival Probabilities
  C.3 P{T_{S1} = a}; Disease Activation
Chapter 7: A Risk- and Subtype-Differentiated Model for Ovarian Cancer Prevention
  7.1 Introduction
    7.1.1 Literature Review
  7.2 Methods
    7.2.1 Model Structure
    7.2.2 Calibration Targets
    7.2.3 Modeled Outcomes
    7.2.4 Model Fit
    7.2.5 Model Simplifications
    7.2.6 Validity Conditions
    7.2.7 The Calibration Problem
  7.3 Illustrative Results
  7.4 Discussions
  7.5 Bibliography
Appendix D: Data
  D.1 Age-Specific Risk Estimates for BRCA Mutation Carriers
  D.2 Validity Conditions
  D.3 Modeling Surgical Interventions
Chapter 8: Conclusions

List of Tables

Table 3.1: Data fit of three models in two GOF measures
Table 3.2: Results of NMB analysis for P_true, P_A, and P_B
Table 3.3: Net monetary benefit for each intervention
Table 3.4: Results of 1,000,000 randomly generated matrices
Table A.1: π_true
Table A.2: α_true
Table A.3: π_0
Table A.4: π_A
Table A.5: π_B
Table A.6: π_true
Table A.7: π_0
Table A.8: π_A
Table A.9: π_B
Table A.10: α_true
Table A.11: α_0
Table A.12: α_A
Table A.13: α_B
Table A.14: π_true
Table A.15: π_0
Table A.16: π_A
Table A.17: π_B
Table A.18: Probability of a positive test result given health state
Table A.19: Screening test costs
Table A.20: Treatment parameters
Table A.21: Quality of life parameters
Table A.22: Age distribution
Table A.23: Net monetary benefit for each intervention
Table 4.1: A summary of ovarian cancer modeling studies
Table 4.2: A summary of natural history models for ovarian cancer
Table 5.1: Mean duration of preclinical phase, percentiles from plausible models
Table 5.2: Percentiles of the duration of the preclinical phase
Table 7.1: Mean duration of preclinical phase
Table 7.2: Mean time to metastasis
Table 7.3: Mean duration in precursor lesions
Table 7.4: Mean age at activation for non-serous cancer
Table 7.5: Mean age at activation for serous cancer
Table 7.6: Median lifetime reduction in incidence and mortality, by subtype and patient type
Table 7.7: Median lifetime reduction in serous incidence for women with a BRCA mutation
Table D.1: Modeling the effect of BS and BSO for the serous model
Table D.2: Modeling the effect of BS and BSO for the non-serous model

List of Figures

Figure 3.1: Possible transitions among 9 health states
Figure 3.2: Representation of plausible search spaces for P_HH and P_II
Figure 3.3: The range of the net monetary benefit for each intervention
Figure 3.4: The range of the net monetary benefit for each intervention
Figure 5.1: Possible transitions for the ovarian cancer model
Figure 5.2: Plausible mortality reduction as a function of the age at which screening is initiated
Figure 5.3: Evidence of early detection using age and stage at diagnosis
Figure B.1: The boxplots of the age and stage at diagnosis from years 2000 to 2014
Figure B.2: Nonstationarity in post-diagnosis survival
Figure 7.1: Possible transitions for the serous (dashed) and non-serous (dotted) models
Figure D.1: Annual risk (per 100,000 per year) by age for BRCA mutation carriers
Figure D.2: Ovarian cancer risk by age for BRCA mutation carriers

Abstract

Model-based analysis for comparative evaluation of strategies for disease treatment and management requires the development of a model of the disease that is being examined. The natural history (NH) model is arguably the most critical element in this process. A NH model requires the specification of various model parameters, some of which may not be observable, and generates modeled outcomes that offer an ability to assess the model's consistency with available data and clinical judgment. There is rarely a unique set of model parameters that is consistent with observable data, a phenomenon known as calibration uncertainty. Because model parameters influence comparative analyses, insufficient examination of the breadth of potential model parameters can create a false sense of confidence in the model recommendations, and ultimately cast doubt on the value of the analysis. This dissertation introduces a systematic approach to the examination of calibration uncertainty and its impact.
We represent the calibration process as a constrained nonlinear optimization problem and introduce the notion of plausible models, which define the uncertainty region for model parameters. In doing so, our framework integrates three crucial components that are undertaken in model-based analysis for assessing the credibility of a model's conclusions: validation, calibration, and sensitivity analysis. We first illustrate our calibration approach using a fictitious disease. We add degrees of realism by adapting our framework within the context of ovarian cancer. By examining the breadth of plausible models for ovarian cancer, we explore the range of potential unobservable disease characteristics, as well as the potential for early detection of ovarian cancer. We introduce the notion of a conditionally stationary Markov model as a method for reducing the computational burden associated with the identification of plausible models. Finally, we incorporate recent discoveries in ovarian carcinogenesis and develop a risk- and subtype-differentiated model. By adopting our calibration framework, we characterize the set of plausible models and assess the potential of prophylactic strategies.

Chapter 1: Introduction

Analyses for medical decision making (MDM) typically involve comparisons of disease screening or treatment options in terms of the relative costs and benefits of the various strategies under consideration. Randomized control trials (RCTs) are considered the gold standard for such purposes because the actual effects of different medical interventions are observed and analyzed. However, RCTs are typically costly and time-intensive, and are restricted to a small number of interventions. As a result, model-based analyses for MDM can be appealing as surrogates for RCTs.

Model-based analysis for MDM relies heavily upon a natural history (NH) model of the disease, which is a model of the disease progression and regression in the absence of interventions.
A NH model requires the specification of various model parameters, some of which may not be observable, and can be used to model disease outcomes that can be compared to data available from epidemiological and clinical studies. Because the models are used to assess the costs and benefits of interventions, it is essential to ensure that a model is consistent with such data. The process of selecting the model parameters to encourage consistency between modeled outcomes and observed data is known as calibration. It is rarely the case that exactly one model fits the data, and this lack of a unique model is known as calibration uncertainty. Insufficient examination of the impact of calibration uncertainty on model recommendations can lead to misleading conclusions.

In Chapter 2, we review vital components involved in the development of a model-based analysis: calibration, validation, and sensitivity analysis. Limited transparency in documenting the calibration and validation process in disease modeling (Kim and Thompson [2010]; Sendi et al. [1999]; Stout et al. [2009]) and insufficient examination of the impact of calibration uncertainty motivate our investigation into the use of an operations-research-based approach in the calibration and validation process.

In Chapter 3, we explore the use of constrained optimization problems to represent the calibration/validation process. We illustrate our approach via a fictitious disease that progresses as a stationary Markov chain calibrated to noise-free data. We also adapt our approach to perform a robust examination of the impact of calibration uncertainty on the outcomes.

Motivated by the non-observability or relative paucity of data from clinical or epidemiological studies in ovarian cancer, we apply the calibration/validation framework developed in Chapter 3 to ovarian cancer. Chapter 4 presents a review of the limited literature on natural history models, screening detection models, and cancer prevention models of ovarian cancer.
We add degrees of realism to our framework by incorporating age-dependency of disease activation and competing-risk mortality. To do so, we introduce a conditionally stationary natural history model, which involves a structural decomposition of a nonstationary Markov model. As in Chapter 3, the calibration/validation process of the conditionally stationary ovarian cancer natural history model is represented as a constrained optimization problem in Chapter 5. After identifying a set of plausible models, we examine the plausible ranges of the preclinical duration for ovarian cancer. We examine the effect of a hypothetical screening program on ovarian cancer mortality as a function of the age at which screening is initiated. The technical details of the conditionally stationary model are presented in Chapter 6.

Chapter 7 introduces a risk- and subtype-differentiated model of ovarian cancer. There is increasing evidence of fallopian tube lesions as a precursor of serous ovarian cancer (Kindelberger et al. [2007]; Labidi-Galy et al. [2017]; Lee et al. [2007]; Piek et al. [2001]). We incorporate these findings into our model, and examine the plausible impact of preventive surgeries.

1.1 Bibliography

Kim, L. G. and Thompson, S. G. (2010). Uncertainty and validation of health economic decision models. Health Economics, 19(1):43-55.

Kindelberger, D. W., Lee, Y., Miron, A., Hirsch, M. S., Feltmate, C., Medeiros, F., Callahan, M. J., Garner, E. O., Gordon, R. W., Birch, C., et al. (2007). Intraepithelial carcinoma of the fimbria and pelvic serous carcinoma: Evidence for a causal relationship. The American Journal of Surgical Pathology, 31(2):161-169.

Labidi-Galy, S. I., Papp, E., Hallberg, D., Niknafs, N., Adleff, V., Noe, M., Bhattacharya, R., Novak, M., Jones, S., Phallen, J., Hruban, C. A., Hirsch, M. S., Lin, D. I., Schwartz, L., Maire, C. L., Tille, J.-C., Bowden, M., Ayhan, A., Wood, L. D., Scharpf, R. B., Kurman, R., Wang, T.-L., Shih, I.-M., Karchin, R., Drapkin, R., and Velculescu, V. E. (2017). High grade serous ovarian carcinomas originate in the fallopian tube. Nature Communications, 8(1):1093.

Lee, Y., Miron, A., Drapkin, R., Nucci, M., Medeiros, F., Saleemuddin, A., Garber, J., Birch, C., Mou, H., Gordon, R., et al. (2007). A candidate precursor to serous carcinoma that originates in the distal fallopian tube. The Journal of Pathology, 211(1):26-35.

Piek, J. M., van Diest, P. J., Zweemer, R. P., Jansen, J. W., Poort-Keesom, R. J., Menko, F. H., Gille, J. J., Jongsma, A. P., Pals, G., Kenemans, P., et al. (2001). Dysplastic changes in prophylactically removed Fallopian tubes of women predisposed to developing ovarian cancer. The Journal of Pathology, 195(4):451-456.

Sendi, P. P., Craig, B. A., Pfluger, D., Gafni, A., and Bucher, H. C. (1999). Systematic validation of disease models for pharmacoeconomic evaluations. Journal of Evaluation in Clinical Practice, 5(3):283-295.

Stout, N. K., Knudsen, A. B., Kong, C. Y., McMahon, P. M., and Gazelle, G. S. (2009). Calibration methods used in cancer simulation models and suggested reporting guidelines. Pharmacoeconomics, 27(7):533-545.

Chapter 2: A Literature Review on Validation, Calibration, and Sensitivity Analysis in Disease Models

Model development for medical decision making (MDM) begins with two components: problem conceptualization and model conceptualization (Roberts et al. [2012]). In problem conceptualization, the nature of the problem at hand is understood in terms of its objective, modeling perspective, target population, performance outputs, strategies, resources, and modeling time horizon. Problem conceptualization is followed by model conceptualization, which involves selecting an appropriate model structure to represent the problem. The inputs, outputs, and specifications of the model should be consistent with current knowledge and available data. Examination of the impact of the uncertainty associated with components of the model is also necessary.
In this chapter, we review the literature associated with the post-conceptualization phase of the development of a model, namely,
• validation,
• calibration, and
• sensitivity analysis.
Calibration involves the selection of model input parameter values that result in model outputs that are consistent with observed data. Validation establishes the credibility of a model through verification of computational accuracy and consistency with current knowledge and available data. Sensitivity analysis examines the relationship between the uncertainty in model components and the conclusions that are drawn from the model. In §2.1, we introduce validation in model-based analysis for MDM. In §2.2, we provide an overview of calibration methods, and in §2.3, we trace the history of sensitivity analysis in model-based analysis for MDM. Finally, we present our conclusion in §2.4.

2.1 Validation

Model-based analyses are intended as aids to decision making. Without sufficient confidence in a model, decision makers are reluctant to utilize the information and insights obtained via the model. Validation is one of the methods used to assess the credibility of a model. There are five types of model validity suggested by the MDM research community: face validity, internal validity, external validity, cross validity, and predictive validity (Eddy et al. [2012]; Ramos et al. [2015]).

Face validity
Face validity refers to the extent to which the model structure, data requirements, problem formulation, and results of a model are consistent with current science and evidence, as evaluated by experts (Eddy et al. [2012]).

Internal validity
Internal validity refers to the internal, computational accuracy of a model, including the accuracy of the code, equations, parameters, and mathematical calculations (Eddy et al. [2012]; Sendi et al. [1999]; Weinstein et al. [2001]).
External validity
External validity refers to the extent to which simulated events accurately represent corresponding events in a clinical trial or population (Eddy et al. [2012]).

Cross validity
Cross validity refers to the agreement between different models that address the same research problem. When performing a cross validation, discrepancies in the conclusions are explained (Eddy et al. [2012]; Weinstein et al. [2001]).

Predictive validity
Predictive validity refers to the ability of a model to predict the results of a prospective trial that are to be released in the future (Eddy et al. [2012]; Weinstein et al. [2003, 2001]). Until those results are released, predictive validation cannot be conducted during the development of a model.

The evidence required to achieve these various forms of validation overlaps, which can create confusion within and between the definitions. For instance, consistency in the results of a model with current science and data can be interpreted as achieving both face validity and external validity. Face validity should fail if there are errors in mathematical and computational expressions, which suggests an overlap with internal validity. External validation and predictive validation both involve comparing a model's predictions to actual events. If a model is developed using data from the initial study period of a clinical trial and validated against the data from the remaining study period of the same trial, it is unclear whether a predictive validation or an external validation is conducted.

The types of data available play an important role in the types of validation that can be conducted. In addition, a tradeoff must be made between the value of an improved model as a result of further validation and the costs of obtaining additional data for validation purposes (Weinstein et al. [2001]). As a result, although all five types of validation are recommended as best practices for validation in model-based analyses by the International Society for Pharmacoeconomics and Outcomes Research (ISPOR) and the Society for Medical Decision Making (SMDM) Task Force on Good Research Practices in Modeling, the ISPOR-SMDM Task Force recognizes that not all levels of validity can be achieved by a model. A model's failure to achieve all levels of validity does not imply that it is not useful; the validity of a model is judged by those using its conclusions (Eddy et al. [2012]).

The value of validation is widely recognized, yet the process of validating a model is not usually reported in model-based analyses (Kim and Thompson [2010]; Sendi et al. [1999]). Sendi et al. [1999] propose a rigorous approach for model validation and demonstrate the approach using an analysis investigating the impact of two treatment regimens for Mycobacterium avium complex (MAC) drug prophylaxis in human immunodeficiency virus (HIV) infected individuals, where the natural history for developing MAC infection is modeled as a stationary Markov process. The proposed validation approach consists of four sequential steps: internal validation, external validation, face validation, and cross validation. Internal validity is achieved through two separate computer implementations of the model. The computer programs are debugged until results are concordant and technical accuracy is achieved. One-way sensitivity analysis is used in the debugging process. For example, the expected survival length increases when the drug efficacy is increased to 100%, and the expected gain in life-months with either treatment compared to no treatment increases as MAC incidence increases. External validity is established by comparing the intermediate outcomes, such as MAC incidence, and the final outcomes, such as the expected median survival length, with independent data from different studies (Egger et al. [1997]; Low et al. [1997]).
Both intermediate and final outcomes are found to be consistent with the published data. Face validity is established by checking whether the modeled survival curves for different starting health states are consistent with logical conclusions. For example, the simulated results show that HIV patients who initially have AIDS have a much lower expected survival than those who initially do not have AIDS when both groups are subjected to the same MAC prophylactic regimen, which agrees with clinical knowledge. Lastly, cross validity is achieved by comparing the model's conclusions with those from other independently created models (Freedberg et al. [1997, 1998]) that are used to study the effect of the same MAC prophylactic treatment regimens. Sendi et al. [1999] conclude that clarithromycin treatment results in lower life expectancy compared to azithromycin, contradicting the conclusions of Freedberg et al. [1997] and Freedberg et al. [1998]. They attribute the discrepancy to different modeling assumptions regarding the efficacy of the treatment regimens.

Kim and Thompson [2010] demonstrate the use of validation to uncover structural errors. Internal, predictive, and external validations are conducted on a long-term cost-effectiveness analysis of one-time screening at age 65 for abdominal aortic aneurysm (AAA) in males. The AAA natural history is modeled as a stationary Markov process in which the transition probabilities are estimated from 4-year follow-up, patient-level data from a large randomized controlled trial, the Multi-centre Aneurysm Screening Study (MASS) (Scott and The Multicentre Aneurysm Screening Study Group [2002]). Internal validation of the Markov model is conducted by simulating the model and comparing the total numbers of events with the 4-year follow-up information from the MASS trial. They conclude that the model-predicted numbers of events, total costs, and life-years gained are similar to the actual trial data, but the timings of the predicted events are inconsistent with observed data.
Predictive validation is performed using the data from the 7-year follow-up of the same MASS trial (Kim et al. [2007b]). The model is extrapolated to 7 years. As with internal validation, the numbers of the same events and their time trend are compared with the trial data from years 5 to 7. The model predictions do not agree with the actual data, so a second Markov model is developed to account for the nonstationarity of disease progression and is calibrated to the 7-year trial data. In external validation, the numbers and timings of the events of the first model are compared with the 15-year follow-up data from a smaller, UK-based randomized controlled trial in Chichester (Ashton et al. [2007]), and discrepancies can be explained by the differences in population and intervention characteristics in the Chichester trial. To account for these discrepancies, a third model that is slightly modified from the first model and calibrated to the Chichester trial data is used. All three models are used for subsequent analyses to estimate the long-term cost-effectiveness of screening for AAA.

While the approach to model validation proposed by Sendi et al. [1999] represents a rigorous way to achieve transparency and validity of a model, their recommendation on using sensitivity analysis as a debugging tool clearly shows that there is ambiguity in the interpretations of internal and face validity. It is also unclear whether the independent models chosen in cross validation are themselves validated; little value is added in the case where invalid models are selected. On the other hand, Kim and Thompson [2010] show that model development is an iterative process; errors in the model can be identified via various forms of attempted validation. Since no further validation is conducted for any of the three models, the level of credibility achieved by the models remains unclear.

In summary, the validity of a model is greatly valued and can be achieved at different levels.
Although not all levels of validity can be accomplished in a model-based analysis, analysts should strive for greater credibility of a model's conclusions by clearly describing their validation process. This transparency is not yet prevalent in current model-based studies. A valid model often requires that model parameters be estimated to fit the observed data. Closely associated with external validation, this process of fitting the observed data is conventionally referred to as model calibration, which we discuss in the next section.

2.2 Calibration

The natural history model is perhaps the most important component in model-based analyses. The natural history of a disease is the natural progression and regression of the disease in the absence of medical interventions. It is rarely observed completely, for a variety of reasons. First, it is unethical to harm a person by withholding medical interventions, and treatments alter the natural course of the disease. In addition, the preclinical stage of a disease, i.e., from disease onset to the appearance of symptoms, is unobservable. Thus, there are known and unknown components in the natural history of a disease, and they need to be represented properly in the natural history model. Once a model structure is determined, the values of the model parameters are selected to populate the model. In conventional natural history modeling, observable model parameters can be directly estimated using published data from epidemiological studies or other data sources such as disease-related registries. Unobservable model parameters, for example, the parameters representing the preclinical progression of a disease, need to be estimated via other methods. Calibration is often employed in model-based analyses to estimate these parameters.

Calibration is a process of selecting the values of model parameters so that the model outputs are consistent with observed data (Karnon and Vanni [2011]; Stout et al. [2009]; Vanni et al. [2011]; Weinstein et al. [2001]).
It involves several subjective decisions that 9 modelers need to make, including the choice of data used to be calibrated against, the quantitative measure of fit between the data and the parameters, and the criteria by which a set of parameters is declared as good-fitting. Duetoitsroleinestimatingtheparametervalues,calibrationisanessentialcomponentin thedevelopmentofamodel. Acarefullycalibratedmodeladdstothevalidityandcredibility of a model (Stout et al. [2009]; Vanni et al. [2011]). However, Stout et al. [2009] review 154 published cancer screening models and note that there are significant discrepancies in reporting details of the calibration processes used. They conclude that there is a need for rigorous and consistent documentation for calibration. Several studies such as those of Havrilesky et al. [2008], Havrilesky et al. [2011], and Tan et al. [2006] report minimal and vague calibration descriptions that consist of only a few sentences. Sincethereislittleconsensusonbestpracticesforcalibration, Vannietal.[2011]propose arigorous,seven-stepapproachtomodelcalibration. Itincludesidentificationof 1. Parameters to be calibrated, 2. Calibration targets, 3. Goodness-of-fit (GOF) measure, 4. Search strategy, 5. Acceptance criteria, 6. Stopping rule, and 7. Integration of calibration results and economic parameters. 2.2.1 Parameters to be Calibrated Traditionally, calibration is employed to select the unobservable parameter values in disease modeling, as seen in the modeling studies of Fryback et al. [2006], Kong et al. [2009], Erenay et al. [2011], Taylor et al. [2011], and Taylor et al. [2012]. In recent years, more studies 10 advocate the inclusion of all model parameters, both observable and unobservable, in the process of calibration (Goldhaber-Fiebert et al. [2007]; Karnon et al. [2009]; Karnon and Vanni [2011]; Kim et al. [2007a]; van de Velde et al. [2007]). Vanni et al. 
[2011] comment that the relationships between parameters can be better depicted by including all model parameters in calibration. 2.2.2 Calibration Targets The calibration targets are typically observed data to be compared with model outputs during calibration. Vanni et al. [2011] suggest that calibration targets be derived using good-quality data that suciently represent both the population of interest and the health management context. Such data include local population data and cohort study data. They also stress that caution should be taken when cross-sectional data are used as calibration targetssincedi↵erentbirthcohortsmightexperiencedi↵erentepidemiologicalpatternsofthe same disease. Common calibration targets include incidence, prevalence, survival, mortality, and disease subtype distribution. In natural history models that are discrete time Markov chains, transition matrix multiplication can be used to assess the fit to calibration targets. 2.2.3 GOF Measure The GOF measure is the quantitative measure that describes the accuracy of model outputs in predicting the calibration targets for an input set of parameters. In optimization, the GOF measure may appear in the objective function or in the constraint set. One of the most commonGOFmeasuresforasinglecalibrationtargetisthesquarederrorbetweenthemodel output and the target (Erenay et al. [2011]; Kong et al. [2009]). Draisma et al. [2003] and Karnonetal.[2009]chooseaweightedsquarederrorcalledthechi-squaredstatistic, whichis obtainedbydividingthesquarederrorbythestandarddeviationofthecorrespondingtarget. Taylor et al. [2011] and Taylor et al. [2012] compute the percent error for each target, which is the absolute deviation from the target divided by the target. Fryback et al. [2006] and 11 van de Velde et al. [2007] define a range for each target within which the fit is considered acceptable. Another popular choice of GOF measure is the likelihood (Goldhaber-Fiebert et al. 
[2007]; Karnon and Vanni [2011]; Kim et al. [2007a]; Tan et al. [2006]). Goldhaber- Fiebert et al. [2007] assume that the calibration targets are independently and normally distributed with mean and standard deviations obtained from 95% confidence intervals. The overall GOF score is defined as 2timesthesumofthelog-likelihoodscoresforeachtarget. The likelihood GOF score is defined similarly in Kim et al. [2007a]. Each of these methods have di↵erent requirements for the amount and quality of data. For instance, the likelihood methodismoredatademandingthanthesquareddeviationandchi-squaredmethods(Vanni et al. [2011]). Inthecasewheretherearemultiplecalibrationtargets, anoverallGOFmeasurecombin- ing individual GOF measure for each target is computed to reflect the overall fit of data for a particular set of parameter values. However, there is no standard practice for combining the GOF scores for multiple calibration targets (Kong et al. [2009]). Vanni et al. [2011] summarize several methods of combining individual GOF measures into one global GOF measure: one method weights each calibration target and sums the GOF measures across all calibrationtargets; asecondmethodinvolvesrankingeachtargetaccordingtoitsimportance and conducting a step-by-step search that begins with the most important target; a third approach specifies multi-dimensional integrals representing the uncertainty around groups of calibration targets. Enns et al. [2015] propose a Pareto frontier approach that does not require an overall GOF measure to be specified. They define a set of parameter values to be Pareto-optimal if no other sets can fit all the calibration targets as least as well or better. 2.2.4 Search Strategy A search strategy is a systematic method of searching the parameter space to obtain the best fit. In optimization, it is the solution method. There are several commonly used search methods in disease-modeling literature, but there remains no standard practice (Kong et al. 
[2009]) nor perfect method (Vanni et al. [2011]). There are two types of search methods in model-based analyses for MDM: grid-based methods and search-based methods.

In a deterministic grid-based method, each parameter included in the search is assigned a plausible range that is divided into parts defined by grid points. The search method involves the complete enumeration of each grid point. All points are evaluated and the parameter set with the best GOF score is returned (Vanni et al. [2011]). Erenay et al. [2011] specify a biologically plausible range for each of 5 unknown parameters involved in calibration and a step size for each range. The grid for each parameter consists of the lower and upper bounds of the range and the interior points, where adjacent points are a step size away from each other. To reduce the computational burden associated with complete enumeration of representative points, a subset of the representative points can be selected for evaluation. Stochastic grid-based methods randomly select points according to some probability distribution. The uniform distribution is commonly chosen as the probability distribution, with its range informed by biologically plausible values and data from the literature (Fryback et al. [2006]; Goldhaber-Fiebert et al. [2007]; Kim et al. [2007a]). Other distributions may be chosen if additional information on the parameters is available. The randomization is accomplished by selecting values for each parameter, typically independently. In an effort to obtain sufficient coverage of the parameter space, Goldhaber-Fiebert et al. [2007] obtain 1,000,000 sets of parameters whereas Kim et al. [2007a] generate 555,000 sets of parameters. On the other hand, van de Velde et al. [2007] employ a Latin Hypercube to obtain 200,000 sets of parameters. The Latin Hypercube is a grid-based method in which the parameter space is split into hypercubes of equal probability where each interval of a parameter is sampled exactly once (Vanni et al. [2011]).
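A Latin Hypercube draw of this kind can be sketched as follows. This is a minimal illustration, not any cited implementation: the two parameters and their plausible ranges are invented for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

def latin_hypercube(n_samples, ranges):
    """Draw n_samples points so that, for each parameter, each of the
    n_samples equal-probability strata is sampled exactly once, with
    strata paired at random across parameters."""
    n_params = len(ranges)
    # One uniform draw inside each of the n_samples strata of [0, 1).
    u = (np.arange(n_samples)[:, None] + rng.random((n_samples, n_params))) / n_samples
    # Independently permute the strata for each parameter.
    for j in range(n_params):
        u[:, j] = rng.permutation(u[:, j])
    # Rescale the [0, 1) draws to each parameter's plausible range.
    lo = np.array([r[0] for r in ranges])
    hi = np.array([r[1] for r in ranges])
    return lo + u * (hi - lo)

# Hypothetical plausible ranges for two progression probabilities.
samples = latin_hypercube(1000, [(0.01, 0.10), (0.05, 0.30)])
print(samples.shape)  # (1000, 2)
```

Because each parameter's range is stratified, every interval is sampled exactly once, which is the coverage property the review attributes to the Latin Hypercube.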
Popular search-based methods in the disease-modeling literature include the Nelder-Mead algorithm (Nelder and Mead [1965]) and the generalized reduced gradient method (Lasdon et al. [1974]). In the Nelder-Mead algorithm, a simplex consisting of n+1 points in R^n is formed at each step toward minimizing a function of n variables. At each step, the point that attains the worst functional value is replaced with a new point (Nelder and Mead [1965]). The generalized reduced gradient method reduces the nonlinear program by dividing the variables into basic and nonbasic variables, so the original problem becomes a reduced problem of the nonbasic variables. At each iteration, the search direction is the steepest descent direction for the reduced problem (Lasdon et al. [1974]). Search-based methods generally produce a single parameter set upon termination. However, Taylor et al. [2012] demonstrate using different combinations of initializations and GOF metrics to produce multiple good-fitting parameter sets. A mixed approach combining multiple search methods has also been proposed: Jit et al. [2010] employ simulated annealing in their search and switch to the generalized reduced gradient method when converging to a minimum.

2.2.5 Acceptance Criteria

The acceptance criteria describe the level of fit achieved by a set of parameter values that is considered to be acceptable. The analyst can simply search for the parameter set with the best GOF score, or define an acceptance threshold at which the fit is considered good. For example, Fryback et al. [2006] define a set of parameters for a model with 104 calibration targets to be "minimally acceptable" if at most 10 model outputs fall outside the acceptable ranges of the target values, and "exceptionally good" if at most 5 do. Others suggest that parameter sets are acceptable only when all outputs fall within the acceptable ranges of all targets (van de Velde et al. [2007]). Alternatively, Goldhaber-Fiebert et al.
[2007] identify the best-fitting parameter set, the one with the lowest overall GOF score. Assuming that the GOF score is approximately distributed as a chi-square with degrees of freedom equal to the number of calibration targets, they conduct a likelihood-ratio test at a significance level of 0.05 to identify parameter sets that are statistically indistinguishable from the best-fitting parameter set. Confusion arises and remains unresolved because, first, the hypothesis test is not stated clearly, and second, the likelihood functions are not functions of parameters that characterize the chi-square distributions, since the degrees of freedom are fixed.

2.2.6 Stopping Rule

The stopping rule is simply the criterion used to terminate the search process; this ends the calibration. For example, Taylor et al. [2012] terminate the Nelder-Mead algorithm at 10,000 iterations. Karnon and Vanni [2011] perform the generalized reduced gradient search with multiple start points. Whenever a local optimum is identified, the estimate of the most probable total number of local optima is updated, although it is unclear how this total is estimated. The multistart search terminates when the number of identified local optima is at most one away from the most probable total number of local optima.

2.2.7 Integration of Calibration Results and Economic Parameters

When calibration is terminated, the calibrated parameter values are integrated with the economic parameters for subsequent analyses. A single parameter set is usually produced as a result of calibration, but there exist other parameter sets that are consistent with the data, and they are often unexplored. This lack of exploration is called calibration uncertainty. Few studies consider calibration uncertainty when conducting their analyses. However, several studies examine the impact of calibration uncertainty (e.g., Enns et al. [2015]; Fryback et al. [2006]; Goldhaber-Fiebert et al. [2007]; Karnon et al. [2009]; Karnon and Vanni [2011]; Kim et al.
[2007a]; Taylor et al. [2012]; van de Velde et al. [2007]).

2.2.8 Conclusion

In short, calibration involves generation of a set of potential model parameters and evaluation of a GOF measure. It can be viewed as an optimization problem that maximizes the model fit to available data. The current disease-modeling practice in defining and searching the parameter space may be prone to insufficient examination of the actual parameter space. Although relationships among model parameters can be specified while defining the parameter space, it is still possible that some combinations of the parameter values might be invalid, or that valid combinations might be overlooked during the search. In addition, it is unclear whether disease modelers verify face validity for each parameter set evaluated during the search process. Hence, the relationship between the valid parameter sets and the search space is unclear. As a result, the extent to which the actual parameter space is examined is not clear, even when a large number of parameter sets are selected in the search, rendering the conclusions unreliable. Although analysts have paid more attention to the impact of calibration uncertainty by considering a small number of parameter sets that are consistent with observed data, it is unclear that calibration uncertainty is fully captured unless all such parameter sets are considered. Adequate examination of the impact of this uncertainty should be exercised in model-based analyses for MDM in order to provide valuable and reliable insights.

Another source of uncertainty is the uncertainty around the data used during the entire process of modeling. Sensitivity analyses are conventionally conducted as a standalone component after a model is developed to examine the impact of this uncertainty on the modeled outcomes. In the following section, we introduce sensitivity analysis and review how it is conducted in model-based analyses.
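Taken together, the calibration workflow reviewed in this section, generating candidate parameter sets, scoring each against the targets with a GOF measure, and accepting those that fit well enough, can be sketched as follows. This is a minimal illustration, not any cited model: the three-state Markov toy model, the plausible ranges, the targets, and the acceptance threshold are all invented.

```python
import numpy as np

rng = np.random.default_rng(1)

def model_outputs(p_sick, p_die):
    """Toy natural-history model: prevalence and cumulative mortality at
    year 10 from a 3-state (healthy/sick/dead) Markov chain."""
    P = np.array([[1 - p_sick, p_sick, 0.0],
                  [0.0, 1 - p_die, p_die],
                  [0.0, 0.0, 1.0]])
    dist = np.array([1.0, 0.0, 0.0]) @ np.linalg.matrix_power(P, 10)
    return dist[1], dist[2]  # prevalence, mortality at year 10

targets = np.array([0.15, 0.05])          # invented calibration targets

def gof(params):                          # squared-error GOF, summed over targets
    return np.sum((np.array(model_outputs(*params)) - targets) ** 2)

# Stochastic search: uniform draws over plausible ranges, keep acceptable fits.
candidates = rng.uniform([0.0, 0.0], [0.1, 0.1], size=(20000, 2))
scores = np.array([gof(c) for c in candidates])
accepted = candidates[scores < 1e-3]      # invented acceptance threshold
print(len(accepted) > 0)  # True
```

Note that the search typically returns many acceptable sets rather than one, which is exactly the calibration uncertainty discussed above.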
2.3 Sensitivity Analysis (SA)

A model-based analysis for MDM synthesizes all available evidence and helps inform decision making. Since there is uncertainty about some aspects of a model, uncertainty about the conclusions is inevitable. Sensitivity analysis (SA) consists of several systematic approaches designed to illuminate the impact of this uncertainty on conclusions drawn from the model. In SA, the values of model parameters and assumptions are varied over their plausible ranges and the corresponding conclusions are examined (Weinstein [1981]; Weinstein and Fineberg [1980]). A model is said to be sensitive to a parameter if a relatively small variation in the parameter results in a large variation in the conclusions, and insensitive to the parameter if a large variation in the parameter leads to a small variation in the conclusions (Felli and Hazen [1998]). SA helps increase confidence in the conclusions of a model-based analysis if they remain unchanged under different input values. On the other hand, if the conclusions change when a parameter or an assumption is varied, SA can help suggest areas where further research can achieve a better understanding (Doubilet et al. [1984]; Weinstein [1981]; Weinstein and Fineberg [1980]; Weinstein and Stason [1977]). Recognizing the importance of SA, Briggs et al. [2012] recommend that all model-based analyses perform systematic examination of uncertainty and report uncertainty around model outputs.

2.3.1 Sources of Uncertainty

Different sources of uncertainty govern the types of SA that an analyst can perform. Briggs [2000] and Briggs et al. [2012] identify five areas of uncertainty that arise in cost-effectiveness models:

Model uncertainty
Model uncertainty, also known as structural uncertainty, refers to the uncertainty around the structure of a model that is used in an analysis. Structural assumptions include the form of the model and the choice of disease states.
Patient cohort uncertainty
Patient cohort uncertainty refers to the uncertainty around parameters relating to the characteristics of a patient sample. Such characteristics include age, gender, and clinical characteristics (e.g., cholesterol levels). A treatment that is deemed cost-effective for the entire group of patients might have large variations in cost-effectiveness across different subgroups of patients, where some benefit more than others.

Parameter uncertainty
Parameter uncertainty refers to the uncertainty around parameters. Such parameters include the transition probabilities in a Markov process, resource use items, health outcomes, unit costs of resources, and utilities of health outcomes. Parameter uncertainty also includes calibration uncertainty.

Methodological uncertainty
Methodological uncertainty refers to the uncertainty around parameters relating to analytical methods. Some examples of analytical methods are the discounting procedure, the inclusion or exclusion of indirect costs, and the method used to value resource use and health outcomes.

Stochastic uncertainty
Stochastic uncertainty, also known as first-order uncertainty or Monte Carlo error, refers to the variability in individual events between similar patients; two patients with the same characteristics might have different treatment outcomes due to chance.

2.3.2 Types of SA

Deterministic Sensitivity Analysis (DSA)

SA examines the robustness of a model's results. Weinstein and Stason [1977] suggest that the most uncertain parameters or assumptions be included in SA. There are several types of SA. Deterministic sensitivity analysis (DSA) involves manually varying one or more parameters (Briggs [2000]). There are two DSA methods: one-way and multi-way SA. In one-way SA, one parameter is varied across its plausible range while the remaining parameters are set at their base-case values (Briggs et al. [1994]). The parameter is associated with a threshold value beyond which the study's conclusions change.
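A one-way sweep of this kind can be sketched as follows. This is an invented example: a hypothetical incremental net-benefit outcome that depends linearly on one uncertain parameter, with the other parameters implicitly held at base case.

```python
import numpy as np

# Hypothetical one-way SA: every parameter except one is held at base case,
# and the uncertain parameter is swept across its plausible range.
def incremental_net_benefit(p_progress):
    # Invented linear relationship between the parameter and the outcome.
    return 5000.0 - 40000.0 * p_progress

base_case = 0.10                       # base-case value of the parameter
grid = np.linspace(0.0, 0.3, 301)      # plausible range, swept in small steps
inb = incremental_net_benefit(grid)

# The adopt/reject conclusion (adopt iff INB > 0) flips at a threshold value.
adopt = (inb > 0).astype(int)
flips = grid[np.where(np.diff(adopt) != 0)[0] + 1]
print(incremental_net_benefit(base_case) > 0)  # True: base case favors adoption
print(flips)  # threshold value at which the conclusion changes (≈ 0.125)
```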
Such identification of a threshold value is often called a threshold analysis in model-based analyses for MDM (Briggs et al. [1994, 2012]; Weinstein and Fineberg [1980]). Extreme values of a parameter can also be used so that worst/best case analyses can be performed; this is often called analysis of extremes (Briggs et al. [1994]). On the other hand, in multi-way SA, two or more parameters are varied simultaneously across all combinations of the sets of possible values defined during the modeling process, while the other parameters are set at their base-case values (Briggs et al. [1994]; Pass and Goldstein [1981]).

SA is useful for examining the generalizability of a study beyond the setting it addresses (Briggs et al. [1994]). For example, DSA can examine the effect of different values of a resource use parameter on the model outcomes. In addition, DSA can also be used to handle methodological uncertainty (Briggs et al. [1994]). For example, indirect cost that is excluded in the base-case analysis can be included in SA. Briggs and Sculpher [1995] advocate examining the generalizability and the effect of various methods, perspectives, and contexts using SA, since the decision makers and other users of the analysis may be interested in a different context. Analysis of extremes can be used to study a parameter that has a base-case value within the study but may vary over a plausible range without a known distribution over it (Briggs et al. [1994]).

DSA has a number of limitations. First, one-way SA does not capture the joint uncertainty in other parameters (Critchfield et al. [1986]; O'Brien et al. [1994]). While multi-way SA explores the joint sensitivity of the parameters that are systematically varied, it becomes cumbersome when a large number of parameters are considered (Critchfield et al. [1986]; Doubilet et al. [1984]). Presenting the SA results graphically can also be challenging when more than three parameters are varied in a multi-way SA (Doubilet et al. [1984]). In addition, it is highly possible that multi-way SA insufficiently examines the uncertainty around parameters. For example, the results of a multi-way SA that involves only a subset of all parameters depend on the validity of the base-case estimates of those parameters not selected for variation (Doubilet et al. [1984]). On the other hand, selection bias might be an issue in DSA since the analyst has to decide which parameters to include and what plausible values they can take (O'Brien et al. [1994]). DSA does not allow a quantitative summary regarding the certainty of the recommendation being optimal (Doubilet et al. [1984]). There is no clear standard as to how to determine whether a conclusion is sensitive or insensitive to the uncertainty in a parameter (O'Brien et al. [1994]).

Probabilistic Sensitivity Analysis (PSA)

Probabilistic sensitivity analysis (PSA) is proposed to overcome the limitations of DSA. PSA is first described by Pass and Goldstein [1981]; it involves imposing a probability distribution on each parameter included in SA, generating parameter values from the joint distribution, and examining the distribution of the model output. Therefore, PSA can be viewed as a multi-way SA enacted by imposing distributions on the parameters. Doubilet et al. [1984] illustrate PSA using Monte Carlo simulation in a decision tree context. Each parameter in the decision tree, including probabilities and utilities, is included in PSA and assumed to be a random variable. For practicality and ease, each probability and utility is modeled using a logistic-normal distribution. That is, logit(X) = log(X/(1 − X)) is normally distributed. The mean and standard deviation of the logistic-normal distribution of the parameter can then be determined through its base-case estimate and one bound of its 95% confidence interval.
Samples are drawn from the individual distributions, and the utility of each strategy in the decision tree is calculated using the sampled values of all parameters and averaged over the outcomes. A large number of iterations is conducted to obtain information such as the mean and standard deviation of the expected utility for each strategy and the frequency with which each strategy is optimal.

Critchfield et al. [1986], Critchfield and Willard [1986], and Willard and Critchfield [1986] further describe the PSA analog of one-way SA in a decision tree context. A probability distribution is imposed on each parameter, as in PSA. A parameter θ is first set at a fixed value t, and the other parameters are generated from the joint probability distribution given θ = t. By varying the value of θ over its range, the individual effect of θ on the sensitivity of the conclusion can be described by the conditional probability of each strategy being optimal as a function of θ. Although illustrated in a decision tree context, PSA and the aforementioned PSA analog of one-way SA can be applied to other models (Critchfield et al. [1986]).

There are different viewpoints regarding the rationale for assigning a probability distribution to a parameter in conducting PSA. Critchfield and Willard [1986] assert that the probability distribution is used to reflect the uncertainty about the true value of the parameter, which is assumed to be fixed but unknown. This is different from the Bayesian viewpoint of Doubilet et al. [1984], who regard parameters as random variables. Briggs [2000] and Briggs et al. [2002] consider PSA as inherently Bayesian, where parameters are assigned prior distributions that reflect degrees of belief. Briggs et al. [2002] attribute the lack of PSA in cost-effectiveness models to the lack of clarity regarding appropriate assignments of probability distributions.
They consider four types of parameters in cost-effectiveness models and suggest types of distributions for each type of parameter according to the Bayesian method. They argue that a beta prior can be used to represent a probability parameter, since a probability parameter can be estimated from binomial data and the beta distribution is conjugate to a binomial likelihood function, which results in a beta posterior. Resource use items are generally distributed as Poisson, so gamma distributions, which are conjugate to Poisson distributions, can be used. A gamma or normal distribution can be used to represent a unit cost parameter. A log-normal distribution can be used to represent a relative risk parameter because the logarithm of relative risk data is normal and the normal distribution is self-conjugate.

Pasta et al. [1999] also recognize the challenges associated with selecting appropriate probability distributions for parameters in PSA. They advocate the use of the bootstrap technique to determine the probability distribution of a parameter that is estimated through meta-analysis. In particular, each study from the meta-analysis is considered as a sample of all possible studies. A bootstrap sample is drawn with replacement from the set of all studies in the meta-analysis. The estimate for the parameter is the weighted average over the estimates of the drawn bootstrap samples, where the weights are the numbers of patients in the bootstrap sample. Baltussen et al. [2002] also propose using the bootstrap technique to estimate a parameter in the case where individual-level data are accessible for the parameter. They argue that by resampling the original data, the bootstrap technique does not make assumptions regarding the distribution of the parameter of interest.

Due to the limitations of DSA, PSA has been widely advocated. Briggs [2000] and Bilcke et al. [2011] suggest the use of PSA as a method to handle uncertainty. Claxton et al.
[2005] recommend including all uncertain parameters in SA and using the full distributions based on all evidence to represent the uncertainty in the conclusions of a model more realistically. They also stress that PSA allows correct calculation of expected costs and effects in non-multilinear models such as Markov processes. By quantifying the uncertainty around the recommendations, PSA can help guide future research (Claxton et al. [2005]). Jain et al. [2011] review 406 cost-effectiveness analyses and note increasing use of PSA over time.

PSA has several challenges despite its advantages and growing utilization. PSA is hindered by more complex models due to computational burden (Koerkamp et al. [2010]). In addition, modeling relationships between parameters can be challenging since there are often no data to model the relationships (Bilcke et al. [2011]; Briggs [2000]; Koerkamp et al. [2010]). Another limitation of PSA is that it can only be used to handle parameter uncertainty (Briggs et al. [1994]). Methodological and model uncertainties require other forms of SA such as one-way SA.

Felli and Hazen [1998] further differentiate between value sensitivity and decision sensitivity. Value sensitivity refers to the variation in the optimal value of a model as a result of the variation in the input parameters, whereas decision sensitivity refers to the variation in the decision recommended by the model as a result of the variation in the input parameters. They argue that DSA and PSA focus on decision sensitivity instead of value sensitivity, which ignores the loss in payoff as a result of a decision change. They propose using the expected value of perfect information (EVPI) as a method to perform SA. The EVPI approach combines both decision sensitivity and value sensitivity, thus presenting a fuller picture of sensitivity to the decision makers. EVPI builds upon the idea of PSA.
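In PSA terms, the EVPI calculation compares choosing the best action for each sampled parameter value against committing to a single action up front. A minimal Monte Carlo sketch follows, with an invented two-strategy payoff model and an invented beta distribution for the uncertain parameter; as a simplification, the committed baseline action here is the one maximizing expected payoff over the PSA draws.

```python
import numpy as np

rng = np.random.default_rng(3)

# Invented payoff model: the net benefit of "treat" depends on an uncertain
# response probability theta; "wait" has a fixed net benefit.
def payoff(action, theta):
    return 10000.0 * theta if action == "treat" else 3000.0 + 0.0 * theta

# PSA distribution for theta (invented beta prior with mean 0.3).
theta_draws = rng.beta(3, 7, size=200_000)

payoffs = np.column_stack([payoff("treat", theta_draws),
                           payoff("wait", theta_draws)])

# Without further information: commit to the action with the best expected payoff.
base_value = payoffs.mean(axis=0).max()
# With perfect information: choose the best action for each realized theta.
perfect_value = payoffs.max(axis=1).mean()

evpi = perfect_value - base_value
print(evpi >= 0.0)  # True: perfect information never has negative expected value
```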
It represents the expected gain in payoff due to perfect information on the parameters. Specifically, if θ is an uncertain parameter, a_0 is the preferred action when the base-case value for θ equals θ_0, and V_a represents the payoff resulting from choosing action a, then

E[V_{a_0} | θ = θ_0] = max_a E[V_a | θ = θ_0].

That is, action a_0 maximizes the expected payoff when θ = θ_0. The EVPI of θ is defined as

EVPI(θ) = E_θ[ max_a E[V_a | θ] − E[V_{a_0} | θ] ]

(Felli and Hazen [1998]). EVPI can easily be extended to the entire set of parameters; in this case it is called the total EVPI (Koerkamp et al. [2010]). Felli and Hazen [1998] and Koerkamp et al. [2010] also introduce the notion of partial EVPI. Let θ be a set of parameters and θ = θ_I ∪ θ_I^c, where θ_I is the subset of parameters whose EVPI is of interest and θ_I^c is the rest of the parameters. Let a_0 be the preferred action using base-case estimates of θ and a*(θ_I) be the preferred action as a function of θ_I. Then, the partial EVPI of θ_I is

EVPI(θ_I) = E_{θ_I, θ_I^c}[ E[V_{a*(θ_I)} | θ_I, θ_I^c] − E[V_{a_0} | θ_I, θ_I^c] ].

The EVPI approach is also advocated by Baio and Dawid [2015], who assert that it is the proper method to perform PSA.

2.3.3 Best Practices for Conducting SA

More attention has been placed on the best practices for conducting SA. It has been recommended that methodological uncertainty be handled by using a "reference case" of agreed methods to promote comparability among studies (Briggs [2000]; Briggs et al. [2012]; Claxton et al. [2005]). Also, patient characteristics should be clarified (Briggs [2000]) and separate analyses should be run for each subgroup of patients (Claxton et al. [2005]). PSA is recommended as the method to handle parameter uncertainty (Briggs [2000]; Claxton et al. [2005]). Briggs et al. [2012] recommend that probability distributions be assigned according to standard statistical methods as previously discussed in Briggs et al.
[2002] (e.g., a beta distribution for binomial data), or according to expert opinion. They assert that insufficient information does not justify the exclusion of any parameter from SA. The correlation among parameters should be accounted for by specifying a multivariate distribution for correlated parameters (Bilcke et al. [2011]; Briggs et al. [2012]). In the case where calibration is employed, Bilcke et al. [2011] recommend investigating multiple sets of parameters that are calibrated to data. For model uncertainty, different teams should be employed to model the same problem with different structural assumptions (Briggs [2000]). On the other hand, Claxton et al. [2005] recommend that PSA be run for each scenario differentiated by structural assumptions, which is also known as scenario analysis. However, Jackson et al. [2011] contend that presenting multiple model structures does not allow the decision makers to make an appropriate decision. They suggest that all structural assumptions be parameterized into the model. Probability distributions for such parameters can be obtained from expert opinion or published data to allow analysis using PSA. Scenario analysis can be conducted if there are no data and no expert opinion to help inform probability distributions. Model uncertainty can also be assessed through a model-averaged analysis, which involves weighting each plausible model by a probability weight that is based on the data fit and the complexity of the model (Jackson et al. [2011]).

As best practices in reporting the results of SA, Briggs et al. [2012] recommend that analysts reveal and justify all assumptions pertaining to SA, such as the plausible ranges in DSA and the distributional assumptions in PSA. The results of SA should be reported responsibly. For one-way SA, a tornado diagram may be easily constructed. In a tornado diagram, each parameter involved in SA has a horizontal bar whose length is proportional to the uncertainty in the model outcome it generates.
The longest bar is at the top, and the other bars are ordered in decreasing length from the top. A vertical line representing the base-case values intersects all bars. For discrete parameters, the results of SA for each value should be reported separately. For multi-way SA involving two or more parameters, threshold analysis can be useful for presenting the results. For two-way SA, a graph where each axis represents a parameter can be constructed, and a curve is drawn to separate the regions where different decisions are preferred. Briggs et al. [2012] also recommend reporting the results of PSA by presenting cost-effectiveness acceptability curves (CEACs) (van Hout et al. [1994]) and distributions of model outcomes. The CEAC for each option shows its probability of being cost-effective as a function of the cost-effectiveness limit. This limit is a threshold that defines the maximum value for being cost-effective (van Hout et al. [1994]).

Examining calibration uncertainty involves considering multiple plausible models differentiated by different parameter sets that are calibrated to observed data. Calibration uncertainty may be reported in deterministic or probabilistic form (Briggs et al. [2012]). In deterministic form, the range of calibrated parameter values is identified from the accepted parameter sets, the range of model outputs is identified similarly, and both ranges should be reported. In probabilistic form, a probability distribution may be reported if a probability weight is assigned to each accepted parameter set. Otherwise, results from each plausible model may be reported separately.

2.3.4 Summary

In summary, uncertainty is inevitable. SA provides a systematic approach for examining the consequences of uncertainty in various aspects of a model on its conclusions. It allows the decision makers to make sufficiently informed decisions based on available evidence.
Although PSA can overcome some limitations of DSA, the need to specify probability distributions and correlations among parameters can be a challenge. This challenge is especially evident in Markov processes where the transition probabilities are estimated through calibration. Even though the EVPI approach considers both decision sensitivity and value sensitivity, it is fundamentally based on the PSA approach, thus inheriting the challenge associated with the need for distributional specifications. It is imperative that the uncertainty associated with multiple plausible models be examined. Ultimately, after all sources of uncertainty are considered, it is the decision makers who will decide whether the conclusion is sensitive to a parameter and whether additional information is warranted before a decision is made.

2.4 Conclusion

Validation, calibration, and SA are essential components for achieving credibility of a model's conclusions. A calibrated model has to be valid, and an alternative model considered in SA has to be calibrated against observed data while remaining valid. However, the lack of transparency in validation, calibration, and SA impedes clarity with respect to these concerns. Inadequate examination of the actual parameter space and of calibration uncertainty further obstructs the reliability of a model. Since the data and expert opinion considered in validation could potentially be defined mathematically, and calibration can be formulated as an optimization problem that optimizes data fit, incorporating validation into calibration and SA might be plausible. However, such integration has yet to be seen in model-based analyses. Therefore, we investigate the potential of using mathematical programming approaches for such integration.

2.5 Bibliography

Ashton, H. A., Gao, L., Kim, L. G., Druce, P., Thompson, S. G., and Scott, R. (2007). Fifteen-year follow-up of a randomized clinical trial of ultrasonographic screening for abdominal aortic aneurysms.
British Journal of Surgery, 94(6):696–701.

Baio, G. and Dawid, A. P. (2015). Probabilistic sensitivity analysis in health economics. Statistical Methods in Medical Research, 24(6):615–634.

Baltussen, R. M., Hutubessy, R. C., Evans, D. B., and Murray, C. J. (2002). Uncertainty in cost-effectiveness analysis. International Journal of Technology Assessment in Health Care, 18(1):112–119.

Bilcke, J., Beutels, P., Brisson, M., and Jit, M. (2011). Accounting for methodological, structural, and parameter uncertainty in decision-analytic models: A practical guide. Medical Decision Making, 31(4):675–692.

Briggs, A. and Sculpher, M. (1995). Sensitivity analysis in economic evaluation: A review of published studies. Health Economics, 4(5):355–371.

Briggs, A., Sculpher, M., and Buxton, M. (1994). Uncertainty in the economic evaluation of health care technologies: The role of sensitivity analysis. Health Economics, 3(2):95–104.

Briggs, A. H. (2000). Handling uncertainty in cost-effectiveness models. Pharmacoeconomics, 17(5):479–500.

Briggs, A. H., Goeree, R., Blackhouse, G., and O'Brien, B. J. (2002). Probabilistic analysis of cost-effectiveness models: Choosing between treatment strategies for gastroesophageal reflux disease. Medical Decision Making, 22(4):290–308.

Briggs, A. H., Weinstein, M. C., Fenwick, E. A., Karnon, J., Sculpher, M. J., and Paltiel, A. D. (2012). Model parameter estimation and uncertainty analysis: A report of the ISPOR-SMDM Modeling Good Research Practices Task Force Working Group–6. Medical Decision Making, 32(5):722–732.

Claxton, K., Sculpher, M., McCabe, C., Briggs, A., Akehurst, R., Buxton, M., Brazier, J., and O'Hagan, T. (2005). Probabilistic sensitivity analysis for NICE technology assessment: Not an optional extra. Health Economics, 14(4):339–347.

Critchfield, G. C. and Willard, K. E. (1986). Probabilistic analysis of decision trees using Monte Carlo simulation. Medical Decision Making, 6(2):85–92.

Critchfield, G. C., Willard, K. E., and Connelly, D. P. (1986).
Probabilistic sensitivity analysis methods for general decision models. Computers and Biomedical Research, 19(3):254–265.

Doubilet, P., Begg, C. B., Weinstein, M. C., Braun, P., and McNeil, B. J. (1984). Probabilistic sensitivity analysis using Monte Carlo simulation: A practical approach. Medical Decision Making, 5(2):157–177.

Draisma, G., Boer, R., Otto, S. J., van der Cruijsen, I. W., Damhuis, R. A., Schröder, F. H., and de Koning, H. J. (2003). Lead times and overdetection due to prostate-specific antigen screening: Estimates from the European Randomized Study of Screening for Prostate Cancer. Journal of the National Cancer Institute, 95(12):868–878.

Eddy, D. M., Hollingworth, W., Caro, J. J., Tsevat, J., McDonald, K. M., and Wong, J. B. (2012). Model transparency and validation: A report of the ISPOR-SMDM Modeling Good Research Practices Task Force–7. Medical Decision Making, 32(5):733–743.

Egger, M., Hirschel, B., Francioli, P., Sudre, P., Wirz, M., Flepp, M., Rickenbach, M., Malinverni, R., Vernazza, P., and Battegay, M. (1997). Impact of new antiretroviral combination therapies in HIV infected patients in Switzerland: Prospective multicentre study. BMJ, 315(7117):1194–1199.

Enns, E. A., Cipriano, L. E., Simons, C. T., and Kong, C. Y. (2015). Identifying best-fitting inputs in health-economic model calibration: A Pareto frontier approach. Medical Decision Making, 35(2):170–182.

Erenay, F. S., Alagoz, O., Banerjee, R., and Cima, R. R. (2011). Estimating the unknown parameters of the natural history of metachronous colorectal cancer using discrete-event simulation. Medical Decision Making, 31(4):611–624.

Felli, J. C. and Hazen, G. B. (1998). Sensitivity analysis and the expected value of perfect information. Medical Decision Making, 18(1):95–109.

Freedberg, K. A., Cohen, C. J., and Barber, T. W. (1997). Prophylaxis for disseminated Mycobacterium avium complex (MAC) infection in patients with AIDS: A cost-effectiveness analysis.
Journal of Acquired Immune Deficiency Syndromes,15(4):275–282. Freedberg, K. A., Scharfstein, J. A., Seage III, G. R., Losina, E., Weinstein, M. C., Craven, D. E., and Paltiel, A. D. (1998). The cost-e↵ectiveness of preventing AIDS-related oppor- tunistic infections. JAMA,279(2):130–136. Fryback, D. G., Stout, N. K., Rosenberg, M. A., Trentham-Dietz, A., Kuruchittham, V., and Remington, P. L. (2006). The Wisconsin breast cancer epidemiology simulation model. Journal of the National Cancer Institute Monographs,2006(36):37–47. Goldhaber-Fiebert, J. D., Stout, N. K., Ortendahl, J., Kuntz, K. M., Goldie, S. J., and Salomon, J. A. (2007). Modeling human papillomavirus and cervical cancer in the United States for analyses of screening and vaccination. Population Health Metrics,5(11). 29 Havrilesky, L. J., Sanders, G. D., Kulasingam, S., Chino, J. P., Berchuck, A., Marks, J. R., and Myers, E. R. (2011). Development of an ovarian cancer screening decision model that incorporates disease heterogeneity. Cancer,117(3):545–553. Havrilesky, L. J., Sanders, G. D., Kulasingam, S., and Myers, E. R. (2008). Reducing ovarian cancer mortality through screening: Is it possible, and can we a↵ord it? Gynecologic Oncology,111(2):179–187. Jackson, C. H., Bojke, L., Thompson, S. G., Claxton, K., and Sharples, L. D. (2011). A framework for addressing structural uncertainty in decision models. Medical Decision Making,31(4):662–674. Jain, R., Grabner, M., and Onukwugha, E. (2011). Sensitivity analysis in cost-e↵ectiveness studies. Pharmacoeconomics,29(4):297–314. Jit, M., Gay, N., Soldan, K., Choi, Y. H., and Edmunds, W. J. (2010). Estimating progres- sion rates for human papillomavirus infection from epidemiological data. Medical Decision Making,30(1):84–98. Karnon, J., Czoski-Murray, C., Smith, K. J., and Brand, C. (2009). 
A hybrid cohort in- dividual sampling natural history model of age-related macular degeneration: Assessing the cost-e↵ectiveness of screening using probabilistic calibration. Medical Decision Making, 29(3):304–316. Karnon, J. and Vanni, T. (2011). Calibrating models in economic evaluation: A compari- son of alternative measures of goodness of fit, parameter search strategies and convergence criteria. Pharmacoeconomics,29(1):51–62. Kim, J. J., Kuntz, K. M., Stout, N. K., Mahmud, S., Villa, L. L., Franco, E. L., and Goldie, S. J. (2007a). Multiparameter calibration of a natural history model of cervical cancer. American Journal of Epidemiology,166(2):137–150. Kim, L. G., Scott, R. A. P., Ashton, H. A., and Thompson, S. G. (2007b). A sustained mortalitybenefitfromscreeningforabdominalaorticaneurysm.AnnalsofInternalMedicine, 146(10):699–706. Kim, L. G. and Thompson, S. G. (2010). Uncertainty and validation of health economic decision models. Health Economics,19(1):43–55. Koerkamp, B. G., Weinstein, M. C., Stijnen, T., Heijenbrok-Kal, M. H., and Hunink, M. M. (2010). Uncertainty and patient heterogeneity in medical decision models. Medical Decision Making,30(2):194–205. Kong, C. Y., McMahon, P. M., and Gazelle, G. S. (2009). Calibration of disease simulation model using an engineering approach. Value in Health,12(4):521–529. 30 Lasdon, L. S., Fox, R. L., and Ratner, M. W. (1974). Nonlinear optimization using the generalized reduced gradient method. Revue fran¸ caise d’automatique, d’informatique et de recherche op´ erationnelle. Recherche op´ erationnelle,8(3):73–103. Low,N.,Pfluger,D.,Egger,M.,andStudy,T.S.H.C.(1997). DisseminatedMycobacterium avium complex disease in the Swiss HIV Cohort Study: Increasing incidence, unchanged prognosis. AIDS,11(9):1165–1171. Nelder, J. A. and Mead, R. (1965). A simplex method for function minimization. The Computer Journal,7(4):308–313. O’Brien, B. J., Drummond, M. F., Labelle, R. J., and Willan, A. (1994). 
In search of power and significance: Issues in the design and analysis of stochastic cost-e↵ectiveness studies in health care. Medical Care,32(2):150–163. Pass, T. M. and Goldstein, L. P. (1981). CE-TREE: A computerized aid for medical cost- e↵ectiveness analysis. In Proceedings of the Annual Symposium on Computer Application in Medical Care, pages 219–221. American Medical Informatics Association. Pasta, D. J., Taylor, J. L., and Henning, J. M. (1999). Probabilistic sensitivity analysis incorporating the bootstrap: An example comparing treatments for the eradication of Heli- cobacter pylori. Medical Decision Making,19(3):353–363. Ramos, M. C. P., Barton, P., Jowett, S., and Sutton, A. J. (2015). A systematic review of research guidelines in decision-analytic modeling. Value in Health,18(4):512–529. Roberts, M., Russell, L. B., Paltiel, A. D., Chambers, M., McEwan, P., and Krahn, M. (2012). Conceptualizing a model: A report of the ISPOR-SMDM Modeling Good Research Practices Task Force–2. Medical Decision Making,32(5):678–689. Scott, R. and The Multicentre Aneurysm Screening Study Group (2002). The Multicentre Aneurysm Screening Study (MASS) into the e↵ect of abdominal aortic aneurysm screening on mortality in men: A randomised controlled trial. The Lancet,360(9345):1531–1539. Sendi, P. P., Craig, B. A., Pfluger, D., Gafni, A., and Bucher, H. C. (1999). Systematic validation of disease models for pharmacoeconomic evaluations. Journal of Evaluation in Clinical Practice,5(3):283–295. Stout, N. K., Knudsen, A. B., Kong, C. Y., McMahon, P. M., and Gazelle, G. S. (2009). Calibration methods used in cancer simulation models and suggested reporting guidelines. Pharmacoeconomics,27(7):533–545. Tan, S. Y., van Oortmarssen, G. J., de Koning, H. J., Boer, R., and Habbema, J. D. F. (2006). The MISCAN-Fadia continuous tumor growth model for breast cancer. Journal of the National Cancer Institute Monographs,2006(36):56–65. 31 Taylor, D. C., Pawar, V., Kruzikas, D. T., Gilmore, K. 
E., Pandya, A., Iskandar, R., and Weinstein, M. C. (2011). Calibrating longitudinal models to cross-sectional data: The e↵ect of temporal changes in health practices. Value in Health,14(5):700–704. Taylor, D. C., Pawar, V., Kruzikas, D. T., Gilmore, K. E., Sanon, M., and Weinstein, M. C. (2012). Incorporating calibrated model parameters into sensitivity analyses. Pharmacoeco- nomics,30(2):119–126. van de Velde, N., Brisson, M., and Boily, M.-C. (2007). Modeling human papillomavirus vaccine e↵ectiveness: Quantifying the impact of parameter uncertainty. American Journal of Epidemiology,165(7):762–775. van Hout, B. A., Al, M. J., Gordon, G. S., and Rutten, F. F. (1994). Costs, e↵ects and C/E-ratios alongside a clinical trial. Health Economics,3(5):309–319. Vanni, T., Karnon, J., Madan, J., White, R. G., Edmunds, W. J., Foss, A. M., and Legood, R. (2011). Calibrating models in economic evaluation: A seven-step approach. Pharma- coeconomics,29(1):35–49. Weinstein, M. C. (1981). Economic assessments of medical practices and technologies. Med- ical Decision Making,1(4):309–330. Weinstein, M. C. and Fineberg, H. V. (1980). Clinical Decision Analysis. WB Saunders Company. Weinstein, M. C., O’Brien, B., Hornberger, J., Jackson, J., Johannesson, M., McCabe, C., and Luce, B. R. (2003). Principles of good practice for decision analytic modeling in health- care evaluation: Report of the ISPOR Task Force on Good Research Practice–Modeling studies. Value in Health,6(1):9–17. Weinstein, M. C. and Stason, W. B. (1977). Foundations of cost-e↵ectiveness analysis for health and medical practices. New England Journal of Medicine,296(13):716–721. Weinstein, M. C., Toy, E. L., Sandberg, E. A., Neumann, P. J., Evans, J. S., Kuntz, K. M., Graham, J. D., and Hammitt, J. K. (2001). Modeling for health care and other policy decisions: Uses, roles, and validity. Value in Health,4(5):348–361. Willard, K. E. and Critchfield, G. C. (1986). 
Probabilistic analysis of decision trees using symbolic algebra. Medical Decision Making, 6(2):93–100.

Chapter 3: Model-Based Calibration for Natural History Modeling

3.1 Motivation

Markov models are commonly seen in natural history modeling for various cancers, including cervical cancer (Goldhaber-Fiebert et al. [2007]; Jit et al. [2010]; Kim et al. [2007]; Myers et al. [2000]; Siebert et al. [2006]; Taylor et al. [2011, 2012]; van de Velde et al. [2007]), ovarian cancer (Havrilesky et al. [2011, 2008]), breast cancer (Hillner and Smith [1991]), and prostate cancer (Grover et al. [2000]). Typical data that can be used to estimate the transition probabilities include prevalence and incidence, and in some cases direct observations of transitions between some health states are available. The natural history is rarely observable, and there are many transition probabilities that are not observable. As a consequence, calibration is often employed to select the transition probabilities for a particular model.

Calibration is a process of selecting the values of model parameters to achieve consistency between model outputs and observed data. In model-based analyses, a single parameter set is usually produced as a result of calibration, but there exist other parameter sets that are consistent with the data and are often unexplored. Without examining calibration uncertainty, the conclusions made lack reliability, which undermines the purpose of a model-based analysis as an aid to medical decision making. In an effort to examine the impact of calibration uncertainty, Kim et al. [2007] generate 555,000 sets of parameters of which 16,818 sets are considered "acceptable," while Goldhaber-Fiebert et al. [2007] generate 1,000,000 sets of parameters of which 183 sets are accepted, although the measure for acceptability varies in these two studies.
However, unless the full breadth of parameter sets that are consistent with observed data is considered in the analyses, it is unclear that the impact of calibration uncertainty is fully captured.

In addition to the limited exploration of calibration uncertainty, there exist other challenges in the calibration of a disease model. Stout et al. [2009] review 154 published cancer screening models and note that there are significant discrepancies in reporting details of the calibration processes used. They conclude that there is a need for rigorous and consistent documentation for calibration. To this end, Vanni et al. [2011] propose a seven-step approach to model calibration: identifying parameters to be calibrated, calibration targets, goodness-of-fit (GOF) measure, search strategy, acceptance criteria, stopping rule, and integration of calibration results and economic parameters.

Another challenge in calibration for model-based analyses is the difficulty in capturing relationships among parameters. A common practice in defining and searching the parameter space during calibration involves the specification of a plausible range for each parameter included in the search. Parameter values are often selected by searching among these ranges independently. A set of parameter values is considered to be valid if it is consistent with observed data and existing clinical judgment about the disease of interest. This is often assessed a posteriori based on the GOF measure, although details of the validation process are often undocumented. As illustrated in Kim et al. [2007] and Goldhaber-Fiebert et al. [2007], searching within plausible ranges independently can overlook the relationships that exist among parameters and the validity of the parameter sets evaluated. This limits opportunities to perform guided searches or to recognize optimal selections. The laws of probability, as well as clinical judgment, reveal relationships among parameters.
Without incorporating these relationships and verifying the validity of the parameter sets in the search, the extent to which the actual parameter space is examined remains unclear. Therefore, conclusions might be unreliable even when a large number of parameter sets are selected in the search. The purpose of this chapter is to demonstrate a systematic approach that combines calibration and validation in the context of a Markovian natural history model.

3.2 A Fictitious Disease

In this section, we present a fictitious disease that progresses through disease states according to a Markov process. This fictitious disease involves transitions among 3 stages of the disease (which may be detected or undetected). The nine health states that result are:

• healthy (H);
• stage 1, 2, 3, which may be
  – undetected (1U, 2U, 3U), or
  – detected (1D, 2D, 3D); and
• death from
  – disease (DD), or
  – other causes (DO).

Figure 3.1: Possible transitions among 9 health states

An individual who is in the healthy state (H) is free of the fictitious disease but may have other diseases. The disease is strictly progressive. That is, a diseased person cannot transition back to healthy nor regress to a less advanced disease stage in the future. In addition, an individual who is detected with the disease remains detected throughout his or her natural history until death. Figure 3.1 shows all possible transitions among the 9 health states for our fictitious disease. Transitions among health states do not depend on age, so that the transition probabilities from a health state i to another health state j are independent of age. The transition probability matrix for this fictitious disease, which we call P^true, is available in Appendix A.1.

3.2.1 Disease Dynamics

The dynamics of a disease that is a discrete-state, discrete-time Markov process are well defined. Transitions over time and measures such as incidence and prevalence are described via a system of equations.
To begin, let

• A = set of ages,
• S = set of all health states,
• D = {1D, 2D, 3D} = set of detected disease states,
• U = {1U, 2U, 3U} = set of undetected disease states,
• O = D ∪ {DO, DD} = set of observable states,
• P = [P_{i,j}] = transition probability matrix, where P_{i,j} denotes the probability that a patient in state i ∈ S at age a ∈ A will be in state j ∈ S at age a+1, for all a ∈ A,
• π(a) = [π_i(a) : i ∈ S], where π_i(a) = probability that an individual is in health state i ∈ S at age a ∈ A. Note that π_i(a) also represents the proportion of individuals who are in health state i at age a, and
• α_j(a) = the probability that an individual is newly detected in state j ∈ D at age a ∈ A. Note that α_j(a) also represents the proportion of individuals whose disease is initially detected in state j at age a.

Relationships that result from a Markov process include:

    π(a) = π(a−1) P    ∀ a ∈ A,    (3.1)

or equivalently π(a) = π(0) P^(a), where P^(n) is the n-step transition probability matrix. In other words, the proportion of individuals at age a+1 who are in health state j is the result of all transitions into j from age a. Note that (3.1) is valid for all health states, regardless of the observability of the health states. The observability of a health state is important because it determines whether relevant data can be available.

In addition, the first time a diseased individual is diagnosed with the disease is the result of a transition from either healthy (H) or one of the undetected disease states to a detected disease state. Then:

    α_j(a) = Σ_{i ∈ U ∪ {H}} π_i(a−1) P_{i,j}    ∀ j ∈ D, a ∈ A.    (3.2)

Equations (3.1) and (3.2) represent the dynamics of the disease, and relate transition probabilities to observable quantities within the general population: incidence and prevalence by age and state. Prevalence and incidence are two common measures of disease burden. In epidemiology, prevalence is defined as the proportion of individuals who have the disease of interest, while incidence refers to the newly detected cases.
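The recursions (3.1) and (3.2) are straightforward to implement. The sketch below uses a hypothetical 4-state stand-in (H, 1U, 1D, DO) with made-up transition probabilities; the actual 9-state matrix P^true is the one given in Appendix A.1, not the one shown here.

```python
import numpy as np

# Hypothetical 4-state stand-in (H, 1U, 1D, DO) for the disease dynamics;
# all numbers are illustrative only, not taken from Appendix A.1.
H, U1, D1, DO = 0, 1, 2, 3
P = np.array([
    [0.90, 0.05, 0.02, 0.03],   # from H
    [0.00, 0.70, 0.25, 0.05],   # from 1U: persist, be detected, or die
    [0.00, 0.00, 0.95, 0.05],   # from 1D: once detected, stay detected
    [0.00, 0.00, 0.00, 1.00],   # from DO: death is absorbing
])

ages = 5
pi = np.zeros((ages + 1, 4))
pi[0, H] = 1.0                      # cohort starts healthy at age 0
alpha = np.zeros(ages + 1)          # incidence: new detections in 1D
for a in range(1, ages + 1):
    pi[a] = pi[a - 1] @ P                               # eq. (3.1)
    alpha[a] = pi[a - 1, [H, U1]] @ P[[H, U1], D1]      # eq. (3.2)
```

Each row π(a) sums to one, and α(a) counts only first-time transitions into the detected state, i.e., those originating in H or 1U, matching the sum over U ∪ {H} in (3.2).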
Prevalence data are available for health states that are observable, and incidence data are available for the detected health states. In this section, we further expand the notion of prevalence to include all health states, so that the prevalence of a particular health state represents the proportion of individuals who are in that health state. By doing so, the prevalence of state i ∈ S at age a is just π_i(a). It is readily seen that α_j(a) is equivalent to the incidence in state j ∈ D at age a.

3.2.2 Incorporating Relationships Among Parameters in Calibration

In (3.1) and (3.2), P_{i,j} denotes the probability that an individual transitions from i ∈ S to j ∈ S at any given age. Since not all transitions are observable, calibration is frequently used to select transition probabilities that provide consistency between model outputs and observed data. Combinations of parameter values are evaluated based on the GOF measure. A set of parameters that achieves a specified level of "fit" can be considered as acceptable. Failure to incorporate the relationships described by (3.1) and (3.2) in the calibration process can result in an inefficient and insufficient search of the parameter space, as illustrated by the following example.

Example: Consider a simpler example in which there are three health states: healthy (H), ill (I), and dead (D). The transition probability matrix is

             H       I        D
      H  [ P_HH    P_HI      0    ]
  P = I  [   0     P_II     P_ID  ]
      D  [   0       0       1    ]

and note that P_HI = 1 − P_HH and P_ID = 1 − P_II, so we have two parameters, P_HH and P_II, to estimate. Suppose that we are given plausible ranges for P_HH and P_II: P_HH ∈ [0.7, 0.9] and P_II ∈ [0.4, 0.6].

Figure 3.2: Representation of plausible search spaces for P_HH and P_II. (a) Plausible ranges without dynamics. (b) Actual parameter space with dynamics.

Figure 3.2a illustrates the intersection of these plausible ranges for P_HH and P_II. Now, suppose that the prevalence vectors for two consecutive ages are known.
That is, suppose that we are given π(1) = [0.8 0 0.2] and π(2) = [0.64 0.16 0.2], where the order of the health states in the prevalence vector is H, I, and D, respectively. Using equation (3.1), we obtain the following:

                  [ P_HH   1 − P_HH      0     ]
    [0.8 0 0.2]   [   0      P_II     1 − P_II ]   = [0.64 0.16 0.2].    (3.3)
                  [   0        0          1    ]

Solving (3.3) yields P_HH = 0.8, but provides no additional information regarding P_II. Therefore, after considering (3.1) within the context of the data provided, the plausible parameter values that remain valid form the line segment indicated in Figure 3.2b. That is, the actual parameter space is considerably more restricted than the shaded region. The region defined solely by the plausible ranges provides combinations of parameter values that are not valid. Although the line segment implies an infinite number of plausible models that are consistent with observed data, a search process that focuses on the region in Figure 3.2a is unlikely to recover the valid models that lie within the line segment depicted in Figure 3.2b.

3.3 Calibration Uncertainty

The existence of multiple models that are consistent with all available data, i.e., multiple valid models, results in calibration uncertainty. That is, the representation of the behavior of the disease outside of the observable states will vary, causing uncertainty in the consequences of various treatments or interventions that might be subjected to model-based analysis. This is a form of uncertainty that is introduced with the selection of a specific set of parameters to represent the model. It is therefore important to provide a systematic description of the validity of plausible models.

For our 9-state fictitious disease depicted in Figure 3.1, we calculate incidence and prevalence from P^true using (3.1) and (3.2). We refer to the resulting values as

• α^true(a) and
• π^true(a)

for a ∈ A (available in Appendix A.1).
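The multiplicity of valid models can be seen concretely in the three-state example of Section 3.2.2: scanning the plausible box [0.7, 0.9] × [0.4, 0.6] for pairs consistent with equation (3.3) recovers exactly the line segment P_HH = 0.8 with P_II unrestricted. A minimal numerical check:

```python
import numpy as np

# Grid scan of the plausible box for the three-state example; a pair is
# kept only if it reproduces pi(2) from pi(1) via equation (3.1).
pi1 = np.array([0.8, 0.0, 0.2])
pi2 = np.array([0.64, 0.16, 0.2])

def transition_matrix(p_hh, p_ii):
    return np.array([[p_hh, 1.0 - p_hh, 0.0],
                     [0.0, p_ii, 1.0 - p_ii],
                     [0.0, 0.0, 1.0]])

consistent = [(p_hh, p_ii)
              for p_hh in np.linspace(0.7, 0.9, 21)   # step 0.01
              for p_ii in np.linspace(0.4, 0.6, 21)
              if np.allclose(pi1 @ transition_matrix(p_hh, p_ii), pi2)]
# Every surviving pair has p_hh = 0.8; p_ii is untouched by the data.
```

All 21 surviving pairs share P_HH = 0.8 and differ only in P_II, i.e., the data select a line segment, not a point.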
We consider three models, P^0, P^A, and P^B, that are consistent with the observable elements of α^true and π^true, α^true_D = {α^true_i(a) : i ∈ D, a ∈ A} and π^true_O = {π^true_i(a) : i ∈ O, a ∈ A}. These models are available in Appendix A.2. Table 3.1 summarizes the extent to which these models fit the observable elements using two common GOF measures: sum of squared deviations from data and sum of absolute deviations.

    Model   Sum of squared deviations   Sum of absolute deviations
    P^0     6.6533 × 10^−10             1.5966 × 10^−4
    P^A     1.4634 × 10^−9              2.3541 × 10^−4
    P^B     1.3510 × 10^−12             5.5766 × 10^−6

    Table 3.1: Data fit of three models in two GOF measures

Note that all three models fit the observed data with high precision. However, close inspection of P^0 in comparison to P^true suggests inconsistencies with respect to clinical judgment that is likely to develop for this fictitious disease. For example, in model P^0, an individual in state 3U will either remain in 3U or transition to 3D. In addition, an individual in state H can transition to state 3D but not 3U. These features associated with P^0 defy logic and clinical judgment. Therefore, even though P^0 is strongly consistent with observed data, the model is not plausible and cannot be considered "valid." Without a formal representation of "validity," other less obviously flawed models can emerge and impact recommendations. By incorporating mathematical expressions of validity within the calibration process, we ensure the validity of the model while establishing consistency with observed data. In Appendix A.3, we illustrate a representation of "validity" conditions for our hypothetical model, and verify that P^A and P^B are valid models, while P^0 is not.

3.4 Model-Based Calibration

Not all transition probabilities and components of {π_i(a) : i ∈ S} are observable. By representing {P, π, α} as decision variables, we are able to select their values so that the model outputs are consistent with observed data.
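This selection can be prototyped in miniature on the three-state example of Section 3.2.2: treat (P_HH, P_II) as decision variables, enforce the dynamics by construction (simulating π from each candidate P), and scan the validity box by brute force while minimizing the sum-of-squared-deviations GOF of Table 3.1. The targets below are synthetic stand-ins, and the brute-force scan replaces the nonlinear solver used for the full 9-state model.

```python
import numpy as np

# Miniature calibration: decision variables (P_HH, P_II), dynamics
# enforced by simulation, validity as box bounds. Targets are synthetic.
pi0 = np.array([1.0, 0.0, 0.0])

def observables(p_hh, p_ii, ages=3):
    """Prevalence of the observable states I and D at ages 1..ages."""
    P = np.array([[p_hh, 1.0 - p_hh, 0.0],
                  [0.0, p_ii, 1.0 - p_ii],
                  [0.0, 0.0, 1.0]])
    pi, out = pi0, []
    for _ in range(ages):
        pi = pi @ P              # disease dynamics, eq. (3.1)
        out.append(pi[1:])       # keep only the observable states I, D
    return np.concatenate(out)

targets = observables(0.8, 0.5)  # pretend these values were observed

# Brute-force scan of the validity box V = [0.7,0.9] x [0.4,0.6],
# minimizing the total sum of squared deviations (GOF).
grid = [(p_hh, p_ii)
        for p_hh in np.linspace(0.7, 0.9, 41)
        for p_ii in np.linspace(0.4, 0.6, 41)]
gof, best = min((float(np.sum((observables(*g) - targets) ** 2)), g)
                for g in grid)
```

With observable prevalence at several ages, both parameters are pinned down here; with a single pair of ages, as in (3.3), P_II would remain free.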
Suppose we have {α̂_D, π̂_O} = {α̂_i(a), π̂_j(a) : i ∈ D, j ∈ O, a ∈ A} as calibration targets and a set

    V = {{P, π, α} that satisfy known validity conditions}.

A mathematical program that combines calibration and validation may be stated as:

    [CM(α̂_D, π̂_O, V)]:  optimize    GOF(α̂_D, π̂_O; α, π)
                         subject to  π_j(a+1) = Σ_{i ∈ S} π_i(a) P_{i,j}          ∀ j ∈ S, a ∈ A
                                     α_j(a+1) = Σ_{i ∈ U ∪ {H}} π_i(a) P_{i,j}    ∀ j ∈ D, a ∈ A
                                     {P, π, α} ∈ V

A solution to CM(α̂_D, π̂_O, V) will satisfy the restrictions imposed by the disease dynamics and the validity constraints and will be consistent with observed data α̂_D and π̂_O. Models P^A and P^B are two solutions to CM(α^true_D, π^true_O, V) where the GOF is the total sum of squared deviations, which were obtained using the Baron solver (Tawarmalani and Sahinidis [2005]). The set V for the fictitious disease is specified in Appendix A.3.

3.5 Net Monetary Benefit Analysis

The process of calibration results in one or more plausible models through which recommendations regarding alternative medical interventions are made. These alternatives are evaluated based on criteria such as cost-effectiveness or net monetary benefit (NMB; Stinnett and Mullahy [1998]; Trippoli [2017]), and recommendations are based on the outcome of these evaluations. Every intervention impacts costs, quality of life, and transitions between states. In order to make a recommendation regarding a set of potential interventions, it is necessary to estimate the net cost/benefit across the population. This is accomplished using a model of the natural history of the disease. Consequently, the assessment of possible interventions depends heavily on the transition probability matrix P used. We illustrate the impact of calibration uncertainty using our fictitious disease. For this disease, we have five intervention options:

1. do nothing,
2. treatment 1 (T1),
3. treatment 2 (T2),
4. screening + treatment 1 (ST1), and
5. screening + treatment 2 (ST2).
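Each option is scored by its net monetary benefit, NMB = WTP × expected QALYs − expected cost, accumulated while the cohort evolves under the option-specific transition matrix. A hedged sketch on the three-state example of Section 3.2.2: every utility, cost, and WTP value below is invented for illustration, and the "treatment" shown is a made-up row modification, not one of the five options above (the actual inputs are in Appendix A.4).

```python
import numpy as np

# NMB = WTP * expected QALYs - expected cost for a cohort followed under
# a given transition matrix P. All numbers here are hypothetical.
def nmb(P, utility, cost, wtp, pi0, horizon):
    pi = np.asarray(pi0, dtype=float)
    qalys = costs = 0.0
    for _ in range(horizon):
        pi = pi @ P               # cohort advances one cycle
        qalys += pi @ utility     # quality-adjusted time this cycle
        costs += pi @ cost        # state-dependent spending this cycle
    return wtp * qalys - costs

# Three-state example (H, I, D); a made-up "treatment" lowers the
# ill-to-dead transition probability, i.e., it alters the I row of P.
def P_with(p_id):
    return np.array([[0.8, 0.2, 0.0],
                     [0.0, 1.0 - p_id, p_id],
                     [0.0, 0.0, 1.0]])

utility = np.array([1.0, 0.7, 0.0])    # QALY weight per state per cycle
cost = np.array([0.0, 5000.0, 0.0])    # cost per cycle while ill
pi0 = np.array([1.0, 0.0, 0.0])

nmb_base = nmb(P_with(0.5), utility, cost, wtp=100_000, pi0=pi0, horizon=10)
nmb_trt = nmb(P_with(0.3), utility, cost, wtp=100_000, pi0=pi0, horizon=10)
```

Because interventions act through the rows of P, repeating this calculation under different calibrated matrices is exactly how calibration uncertainty propagates into the NMB comparisons that follow.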
An intervention applied to a patient in a particular state will alter the transition probabilities from that state. That is, interventions alter the rows of P. In particular, screening affects the rows corresponding to healthy or undetected disease states, and treatment alters the rows corresponding to the detected disease states. Additionally, each intervention incurs costs. All parameters related to the cost/benefit analysis, including the willingness-to-pay (WTP), are available in Appendix A.4.

Table 3.2 summarizes the results of NMB analyses for P^true and models P^A and P^B. Using P^true, NMB analysis identifies the optimal intervention as ST1. P^A and P^B are models of the fictitious disease that are validated and well-calibrated. However, these models recommend interventions that differ from the recommended intervention associated with P^true.

    Model    Preferred medical intervention
    P^true   ST1
    P^A      T2
    P^B      T1

    Table 3.2: Results of NMB analysis for P^true, P^A, and P^B

Using the fictitious disease, i.e., P^true, we are able to examine the NMB of each possible intervention and their deviation from the optimal NMB (available in Appendix A.4.5). Table 3.3 summarizes the results.

    P^true                      Do nothing   T1         T2         ST1   ST2
    % deviation from max NMB    -0.39%       -0.0026%   -0.0058%   0%    -0.0089%

    Table 3.3: Net monetary benefit for each intervention

The recommended intervention suggested by P^A is T2, and for P^B it is T1. In both cases, the cost of making a decision that is not preferred by P^true is a very small fraction of the optimal net benefit. Unfortunately, this is based on the true transition matrix, P^true, which is not available outside the realm of fictitious diseases.

A common practice in disease modeling to understand the impact of calibration uncertainty on the recommendation is to randomly generate a parameter set from the plausible ranges around its elements. Using this method, Goldhaber-Fiebert et al. [2007] randomly generate 1,000,000 parameter sets.
For each parameter set, they evaluate the likelihood-based GOF score of the modeled outputs; the 183 good-fitting sets are accepted. Using our fictitious disease, we randomly generate 1,000,000 transition probability matrices from the plausible ranges. We compute the GOF score, defined as the total sum of squared deviations from the data, for each matrix and assess whether the matrix satisfies the validity conditions outlined in Appendix A.3. Only ten matrices are valid, and one provides a good fit, although it is invalid. Among the ten valid matrices, the smallest GOF score is 0.39. Table 3.4 summarizes the results.

    Randomly generated P    GOF           Valid?   Recommendation
    1                       3.131363537   Yes      ST1
    2                       0.394529165   Yes      ST1
    3                       2.395214755   Yes      ST1
    4                       1.641439985   Yes      ST1
    5                       0.969297336   Yes      ST1
    6                       2.076240118   Yes      ST1
    7                       2.601293352   Yes      ST1
    8                       0.000976364   No       T1
    9                       1.403418576   Yes      ST1
    10                      1.683823924   Yes      ST1
    11                      0.691347396   Yes      ST1

    Table 3.4: Results of 1,000,000 randomly generated matrices

Since a GOF score of zero is attainable in our experimental setting and both P^A and P^B have GOF scores on the order of 10^−9 or less, these randomly generated matrices that satisfy the validity conditions can be considered as ill-fitting. As a result, generating a matrix that is both valid and well-fitting in this manner can be difficult.

We can adapt the mathematical program that represents the process of calibration and validation, CM(α̂_D, π̂_O, V), to formally and mathematically characterize the models that provide a good fit for the calibration targets while satisfying the validity conditions. Such models are members of the set

    PM = {P :  GOF(α̂_D, π̂_O; α, π) ≤ ε
               π_j(a+1) = Σ_{i ∈ S} π_i(a) P_{i,j}          ∀ j ∈ S, a ∈ A
               α_j(a+1) = Σ_{i ∈ U ∪ {H}} π_i(a) P_{i,j}    ∀ j ∈ D, a ∈ A
               {P, π, α} ∈ V}                                          (3.4)

where ε is the threshold for which the GOF is considered acceptable. We call a model belonging to the set PM a plausible model.
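The difficulty of landing in PM by blind sampling can be reproduced at small scale on the three-state example of Section 3.2.2, where the data-consistent set is a line segment, a measure-zero subset of the plausible box. The draw count and the threshold ε below are illustrative, not the values used for the 1,000,000-matrix experiment above.

```python
import numpy as np

# Independent draws from the plausible ranges, filtered by fit to the
# data of the three-state example; mirrors, in miniature, the chapter's
# 1,000,000-matrix experiment. Epsilon and draw count are illustrative.
rng = np.random.default_rng(0)
pi1 = np.array([0.8, 0.0, 0.2])
pi2 = np.array([0.64, 0.16, 0.2])

def gof(p_hh, p_ii):
    P = np.array([[p_hh, 1.0 - p_hh, 0.0],
                  [0.0, p_ii, 1.0 - p_ii],
                  [0.0, 0.0, 1.0]])
    return float(np.sum((pi1 @ P - pi2) ** 2))   # sum of squared deviations

eps = 1e-6
n = 100_000
draws = zip(rng.uniform(0.7, 0.9, n), rng.uniform(0.4, 0.6, n))
accepted = [d for d in draws if gof(*d) <= eps]
rate = len(accepted) / n
# The acceptance rate collapses as eps shrinks, because the good-fitting
# set concentrates on the line segment p_hh = 0.8.
```

Every accepted draw clusters tightly around P_HH = 0.8 while P_II wanders over its whole range, so the sampler spends almost all of its effort on combinations that the dynamics (3.1) rule out.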
By searching among the plausible models, we can examine the maximum and minimum NMB achievable for each intervention. This is accomplished by incorporating the NMB calculations associated with an intervention (outlined in Appendix A.4) into the set PM.

From Table 3.4, we observe that all of the valid matrices recommend the correct intervention, ST1, while the invalid matrix recommends an incorrect intervention. One might suspect that the validity of a model plays a more significant role than its GOF score in guiding the decision. In other words, the recommendation might be sensitive to the GOF threshold. As a result, we examine the manner in which the GOF level that we consider acceptable impacts the range of the NMB. We use various values of ε, based on what we consider as a good fit and the GOF scores from Table 3.4, while searching in the set PM. The results are shown in Figure 3.3. The black horizontal line segment represents the NMB range that results by searching among all plausible models when ε = 10^−7. The vertical lines represent the intervention-specific NMB values computed using P^true, a value that is unknowable (available in Appendix A.4.5).

When ε ≥ 0.1, the optimal intervention, ST1, displays the largest range of the NMB, indicating that the risk associated with selecting ST1 is the greatest. T1 and T2 have relatively smaller NMB ranges. As a result, a decision maker might prefer to choose T1 or T2 over ST1, as the risk is smaller. This shows that the set of plausible models can be used to effectively capture the impact of calibration uncertainty through examining the maximum and minimum NMB achievable for every intervention. In addition, we observe that the NMB range becomes larger as ε increases, whereas the maximum NMB achievable for each intervention decreases as ε decreases. The ranges for all interventions become almost identical when ε is large, suggesting that all interventions have the same level of risk.
As ε decreases, the NMB range differs from one intervention to another. Consequently, it is easier to identify the intervention with the smallest NMB range.

We consider other constraints that are not currently in the validity set V. Biologically plausible ranges for some transition probabilities, incidence, or prevalence can be added to the set V as they become available. Mathematical representations of evolving expertise and judgment can be similarly incorporated. For example, we could add a "judgment constraint" stating that, without interventions, awareness of the disease does not affect progression:

    P_{iU,jU} + P_{iU,jD} = P_{iD,jD}    ∀ i = 1, 2, 3,  j ≥ i.    (3.5)

With this constraint in V, it is easily seen that neither P^A nor P^B remains valid. In fact, solutions obtained from this updated calibration model recommend ST1, the optimal intervention recommended by P^true, even though these models still differ from P^true. This suggests the importance of the role of a correct "judgment."

By searching among the set of plausible models with ε = 10^−7, we are able to examine the manner in which the "judgment constraint," (3.5), impacts the range of the NMB. The results are shown in Figure 3.4. For each intervention, the black horizontal line segment represents the NMB range using the original validity conditions specified in V, and the red line is the NMB range when the "judgment constraint" is included. The addition of the judgment constraint (3.5) to V substantially reduces the range for every intervention, indicating a smaller calibration uncertainty with respect to the NMB values. This further highlights the value of a correct judgment.

3.6 Discussion

This chapter introduces a systematic approach for the incorporation of issues related to "validity" within the calibration process for a Markov natural history model.
This approach allows a higher level of transparency in natural history modeling, since calibration and validation are crucial aspects in modeling that are not, in most cases, sufficiently well-documented in the disease modeling literature. We also introduce the concept of plausible models, which are models that fit the calibration targets well and are valid. We show how the set of plausible models can be mathematically characterized through an adaptation of our calibration/validation model. This allows a complete assessment of the impact of calibration uncertainty on the NMB values for an intervention.

This chapter is not without limitations. Our fictitious disease is a stationary Markov process, as are our natural history models. This eliminates issues associated with structural uncertainty. Non-stationarities can be incorporated by expanding the number of health states and introducing age-specific transition probabilities. In doing so, the number of variables and the complexity of the resulting model are substantially increased, which impacts the magnitude of calibration uncertainty. Despite the stationarity of our disease, incidence and prevalence are age-dependent nonetheless.

We use a Markov natural history model with cycle length coinciding with the age range for which the data are available. This is largely because the Surveillance, Epidemiology, and End Results (SEER) Program database, which is commonly used for cancer models, provides incidence and prevalence data in 5-year age groups. A shorter cycle length (e.g., 1 month) might eliminate some direct transitions, leading to a sparse transition probability matrix with fewer nonzero {P_{i,j}} variables in the calibration model. However, using equations (3.1) and (3.2), we note that there are also a large number of incidence and prevalence variables for which there will be no calibration targets.
With respect to calibration uncertainty, the impact of a model cycle that is smaller than the data cycle is unclear, and further research is warranted.

We only consider two types of calibration targets, incidence and prevalence, in our calibration/validation model. Other common calibration targets include survival curves that result from some treatments. Calibrating the model to these targets involves specifying the functional relationship between the natural history model and the resulting transition probability matrix under the treatment, as well as the mathematical or computational representations of survival curves. In addition, while our validity conditions are linear, there exist other validity conditions that may be nonlinear or difficult to represent mathematically. Future directions for our research include incorporating these calibration targets into our calibration/validation model and examining the effect of these validity conditions.

We stress the value of "judgment" through the judgment constraint (3.5). Expert judgment and available data enable guided searches. However, it is possible that the calibration/validation model is infeasible. Infeasibility indicates that the representations of the disease dynamics and validity are in conflict. In other words, there exists a misrepresentation of clinical knowledge or a structural error. Therefore, our framework allows another level of validity through feasibility, in addition to the level of validity achieved through the validity conditions in V.

Our approach ensures the validity of parameter sets in the search by integrating validation and calibration. This integration, however, does not imply that we nullify other types of validation processes commonly performed in model-based analysis for medical decision making. On the contrary, the value of validation is greatly emphasized. For example, cross validation among different model structures can help examine structural uncertainty.
Our approach simply allows a higher level of transparency, which is not yet prevalent in model-based analysis for medical decision making, and an efficient and sufficient search of the parameter space.

We only consider plausible ranges for the transitions that are observable (see Appendix A.3). For instance, P_{i,j} where i ∈ D and j ∈ O is restricted within a plausible range, whereas all other transition probabilities are bounded between 0 and 1. This is because the transitions from H or U are not observable, and the lack of observability for these transitions suggests a lack of data. As expert judgment regarding the plausible ranges for the unobservable transitions becomes available, such information can be included in the set of validity conditions to help guide searches.

Figure 3.3: The range of the net monetary benefit for each intervention

Figure 3.4: The range of the net monetary benefit for each intervention

3.7 Bibliography

Goldhaber-Fiebert, J. D., Stout, N. K., Ortendahl, J., Kuntz, K. M., Goldie, S. J., and Salomon, J. A. (2007). Modeling human papillomavirus and cervical cancer in the United States for analyses of screening and vaccination. Population Health Metrics, 5(11).

Grover, S. A., Coupal, L., Zowall, H., Rajan, R., Trachtenberg, J., Elhilali, M., Chetner, M., and Goldenberg, L. (2000). The clinical burden of prostate cancer in Canada: Forecasts from the Montreal Prostate Cancer Model. Canadian Medical Association Journal, 162(7):977–983.

Havrilesky, L. J., Sanders, G. D., Kulasingam, S., Chino, J. P., Berchuck, A., Marks, J. R., and Myers, E. R. (2011). Development of an ovarian cancer screening decision model that incorporates disease heterogeneity. Cancer, 117(3):545–553.

Havrilesky, L. J., Sanders, G. D., Kulasingam, S., and Myers, E. R. (2008). Reducing ovarian cancer mortality through screening: Is it possible, and can we afford it? Gynecologic Oncology, 111(2):179–187.

Hillner, B. E. and Smith, T. J. (1991).
Efficacy and cost-effectiveness of adjuvant chemotherapy in women with node-negative breast cancer: A decision-analysis model. The New England Journal of Medicine, 324(3):160–168.

Jit, M., Gay, N., Soldan, K., Choi, Y. H., and Edmunds, W. J. (2010). Estimating progression rates for human papillomavirus infection from epidemiological data. Medical Decision Making, 30(1):84–98.

Kim, J. J., Kuntz, K. M., Stout, N. K., Mahmud, S., Villa, L. L., Franco, E. L., and Goldie, S. J. (2007). Multiparameter calibration of a natural history model of cervical cancer. American Journal of Epidemiology, 166(2):137–150.

Myers, E. R., McCrory, D. C., Nanda, K., Bastian, L., and Matchar, D. B. (2000). Mathematical model for the natural history of human papillomavirus infection and cervical carcinogenesis. American Journal of Epidemiology, 151(12):1158–1171.

Siebert, U., Sroczynski, G., Hillemanns, P., Engel, J., Stabenow, R., Stegmaier, C., Voigt, K., Gibis, B., Hölzel, D., and Goldie, S. J. (2006). The German cervical cancer screening model: Development and validation of a decision-analytic model for cervical cancer screening in Germany. The European Journal of Public Health, 16(2):185–192.

Stinnett, A. A. and Mullahy, J. (1998). Net health benefits: A new framework for the analysis of uncertainty in cost-effectiveness analysis. Medical Decision Making, 18(2 suppl):S68–S80.

Stout, N. K., Knudsen, A. B., Kong, C. Y., McMahon, P. M., and Gazelle, G. S. (2009). Calibration methods used in cancer simulation models and suggested reporting guidelines. Pharmacoeconomics, 27(7):533–545.

Tawarmalani, M. and Sahinidis, N. V. (2005). A polyhedral branch-and-cut approach to global optimization. Mathematical Programming, 103(2):225–249.

Taylor, D. C., Pawar, V., Kruzikas, D. T., Gilmore, K. E., Pandya, A., Iskandar, R., and Weinstein, M. C. (2011). Calibrating longitudinal models to cross-sectional data: The effect of temporal changes in health practices. Value in Health, 14(5):700–704.

Taylor, D. C., Pawar, V., Kruzikas, D.
T., Gilmore, K. E., Sanon, M., and Weinstein, M. C. (2012). Incorporating calibrated model parameters into sensitivity analyses. Pharmacoeconomics, 30(2):119–126.

Trippoli, S. (2017). Incremental cost-effectiveness ratio and net monetary benefit: Current use in pharmacoeconomics and future perspectives. European Journal of Internal Medicine, 43:e36.

van de Velde, N., Brisson, M., and Boily, M.-C. (2007). Modeling human papillomavirus vaccine effectiveness: Quantifying the impact of parameter uncertainty. American Journal of Epidemiology, 165(7):762–775.

Vanni, T., Karnon, J., Madan, J., White, R. G., Edmunds, W. J., Foss, A. M., and Legood, R. (2011). Calibrating models in economic evaluation: A seven-step approach. Pharmacoeconomics, 29(1):35–49.

Appendix A: Data for the Fictitious Disease

A.1 P_true, π_true, and α_true

P_true is the true transition probability matrix for the fictitious disease (rows and columns ordered H, 1U, 2U, 3U, 1D, 2D, 3D, DO, DD):

      H      1U       2U      3U       1D       2D      3D       DO      DD
H     0.97   0.00875  0.0051  0.003    0.00375  0.0034  0.003    0.0028  0.0002
1U    0      0.5635   0.0753  0.03325  0.2415   0.0502  0.03325  0.0017  0.0013
2U    0      0        0.414   0.1535   0        0.276   0.1535   0.0012  0.0018
3U    0      0        0       0.4985   0        0       0.4985   0.001   0.002
1D    0      0        0       0        0.805    0.1255  0.0665   0.0017  0.0013
2D    0      0        0       0        0        0.69    0.307    0.0012  0.0018
3D    0      0        0       0        0        0       0.997    0.001   0.002
DO    0      0        0       0        0        0       0        1       0
DD    0      0        0       0        0        0       0        0       1

The health states are ordered in this manner: H, 1U, 2U, 3U, 1D, 2D, 3D, DO, and DD. All individuals at age 1 are in state H, that is, π_true(1) = [1, 0, 0, 0, 0, 0, 0, 0, 0] and α_true_j(1) = [0, 0, 0] where j ∈ D. The data are available for 11 ages only.
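Assuming (3.1) is the standard Markov prevalence recursion π(a + 1) = π(a)P (equations (3.1) and (3.2) are defined earlier in the chapter), the prevalence table below can be regenerated directly from P_true; a minimal sketch:

```python
# Reproduce the prevalence table (Table A.1) from P_true via pi(a+1) = pi(a) P,
# assuming (3.1) is the standard Markov update.
# State order: H, 1U, 2U, 3U, 1D, 2D, 3D, DO, DD.
P_TRUE = [
    [0.97, 0.00875, 0.0051, 0.003, 0.00375, 0.0034, 0.003, 0.0028, 0.0002],
    [0, 0.5635, 0.0753, 0.03325, 0.2415, 0.0502, 0.03325, 0.0017, 0.0013],
    [0, 0, 0.414, 0.1535, 0, 0.276, 0.1535, 0.0012, 0.0018],
    [0, 0, 0, 0.4985, 0, 0, 0.4985, 0.001, 0.002],
    [0, 0, 0, 0, 0.805, 0.1255, 0.0665, 0.0017, 0.0013],
    [0, 0, 0, 0, 0, 0.69, 0.307, 0.0012, 0.0018],
    [0, 0, 0, 0, 0, 0, 0.997, 0.001, 0.002],
    [0, 0, 0, 0, 0, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 1],
]

def step(pi, P):
    """One cycle of pi(a+1) = pi(a) P."""
    n = len(P)
    return [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]

pi = [1, 0, 0, 0, 0, 0, 0, 0, 0]  # everyone starts in H at age 1
prevalence = [pi]
for _ in range(10):  # ages 2 through 11
    pi = step(pi, P_TRUE)
    prevalence.append(pi)
```

The age-3 row, for example, comes out as π_H(3) = 0.97² = 0.9409 and π_{1D}(3) ≈ 0.00877, matching Table A.1 to the five reported decimals.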
Using the bilinear relations (3.1) and (3.2), we obtain the following:

Age   H        1U       2U       3U       1D       2D       3D       DO       DD
1     1        0        0        0        0        0        0        0        0
2     0.97000  0.00875  0.00510  0.00300  0.00375  0.00340  0.00300  0.00280  0.00020
3     0.94090  0.01342  0.00772  0.00548  0.00877  0.00796  0.00976  0.00555  0.00044
4     0.91267  0.01579  0.00900  0.00718  0.01383  0.01260  0.01995  0.00826  0.00071
5     0.88529  0.01689  0.00957  0.00823  0.01837  0.01681  0.03290  0.01092  0.00103
6     0.85873  0.01726  0.00975  0.00879  0.02218  0.02040  0.04797  0.01353  0.00138
7     0.83297  0.01724  0.00972  0.00903  0.02525  0.02334  0.06459  0.01609  0.00177
8     0.80798  0.01700  0.00957  0.00906  0.02761  0.02565  0.08231  0.01861  0.00220
9     0.78374  0.01665  0.00936  0.00898  0.02936  0.02741  0.10075  0.02108  0.00267
10    0.76023  0.01624  0.00913  0.00882  0.03060  0.02868  0.11963  0.02351  0.00317
11    0.73742  0.01580  0.00888  0.00862  0.03140  0.02955  0.13873  0.02589  0.00371

Table A.1: π_true

Age   1D       2D       3D
1     0        0        0
2     0.00375  0.00340  0.00300
3     0.00575  0.00514  0.00548
4     0.00677  0.00600  0.00718
5     0.00724  0.00638  0.00823
6     0.00740  0.00650  0.00879
7     0.00739  0.00648  0.00903
8     0.00729  0.00638  0.00906
9     0.00714  0.00624  0.00898
10    0.00696  0.00608  0.00882
11    0.00677  0.00592  0.00862

Table A.2: α_true

A.2 Data for Plausible Models

P_0 (rows and columns ordered H, 1U, 2U, 3U, 1D, 2D, 3D, DO, DD):

      H        1U       2U       3U       1D       2D       3D       DO       DD
H     0.56257  0.42421  0.00008  0        0.00375  0.00340  0.00300  0.00280  0.00020
1U    0        0.97000  0        0.00196  0.00859  0.00752  0.00893  0.00275  0.00026
2U    0        0        0.08330  0.10788  0        0.66461  0.05077  0.06091  0.03253
3U    0        0        0        0.48958  0        0        0.51042  0        0
1D    0        0        0        0        0.80502  0.12882  0.06395  0.00201  0.00020
2D    0        0        0        0        0        0.68631  0.31000  0.00078  0.00291
3D    0        0        0        0        0        0        0.99692  0.00104  0.00204
DO    0        0        0        0        0        0        0        1        0
DD    0        0        0        0        0        0        0        0        1

P_A:

      H        1U       2U       3U       1D       2D       3D       DO       DD
H     0.96997  0.00881  0.00420  0.00389  0.00375  0.00340  0.00299  0.00280  0.00020
1U    0        0.56404  0.05606  0.03977  0.23948  0.05606  0.03977  0.00241  0.00240
2U    0        0        0.42118  0.16037  0        0.32472  0.09373  0.00000  0.00000
3U    0        0        0        0.52318  0        0        0.47358  0.00112  0.00212
1D    0        0        0        0        0.80505  0.10018  0.08997  0.00241  0.00240
2D    0        0        0        0        0        0.71743  0.28257  0.00000  0.00000
3D    0        0        0        0        0        0        0.99676  0.00112  0.00212
DO    0        0        0        0        0        0        0        1        0
DD    0        0        0        0        0        0        0        0        1

P_B:

      H        1U       2U       3U       1D       2D       3D       DO       DD
H     0.56350  0.41623  0.00356  0.00356  0.00375  0.00340  0.00300  0.00280  0.00020
1U    0        0.97000  0.00457  0.00456  0.00874  0.00457  0.00456  0.00278  0.00022
2U    0        0        0.41400  0.17766  0        0.37264  0.03270  0.00000  0.00300
3U    0        0        0        0.49850  0        0        0.49850  0.00101  0.00199
1D    0        0        0        0        0.80500  0.12550  0.06650  0.00278  0.00022
2D    0        0        0        0        0        0.69000  0.30700  0.00000  0.00300
3D    0        0        0        0        0        0        0.99700  0.00101  0.00199
DO    0        0        0        0        0        0        0        1        0
DD    0        0        0        0        0        0        0        0        1

Age   H        1U       2U       3U       1D       2D       3D       DO       DD
1     1        0        0        0        0        0        0        0        0
2     0.56257  0.42421  0.00008  0        0.00375  0.00340  0.00300  0.00280  0.00020
3     0.31648  0.65013  0.00005  0.00084  0.00877  0.00797  0.00976  0.00555  0.00044
4     0.17804  0.76489  0.00003  0.00169  0.01383  0.01260  0.01995  0.00826  0.00071
5     0.10016  0.81747  0.00002  0.00233  0.01837  0.01680  0.03290  0.01092  0.00103
6     0.05635  0.83543  0.00001  0.00275  0.02219  0.02040  0.04797  0.01353  0.00138
7     0.03170  0.83427  0.00001  0.00298  0.02525  0.02333  0.06459  0.01610  0.00177
8     0.01783  0.82269  0.00000  0.00310  0.02761  0.02565  0.08231  0.01861  0.00220
9     0.01003  0.80557  0.00000  0.00313  0.02936  0.02741  0.10075  0.02108  0.00266
10    0.00564  0.78566  0.00000  0.00311  0.03060  0.02868  0.11963  0.02351  0.00317
11    0.00317  0.76448  0.00000  0.00306  0.03140  0.02955  0.13873  0.02589  0.00371

Table A.3: π_0

Age   H        1U       2U       3U       1D       2D       3D       DO       DD
1     1        0        0        0        0        0        0        0        0
2     0.96997  0.00881  0.00420  0.00389  0.00375  0.00340  0.00299  0.00280  0.00020
3     0.94085  0.01351  0.00633  0.00682  0.00877  0.00796  0.00977  0.00555  0.00044
4     0.91259  0.01591  0.00737  0.00878  0.01383  0.01260  0.01995  0.00826  0.00071
5     0.88519  0.01701  0.00782  0.00995  0.01837  0.01681  0.03290  0.01092  0.00103
6     0.85861  0.01739  0.00796  0.01058  0.02218  0.02040  0.04797  0.01353  0.00138
7     0.83283  0.01737  0.00793  0.01084  0.02525  0.02333  0.06459  0.01609  0.00177
8     0.80782  0.01713  0.00781  0.01087  0.02761  0.02564  0.08231  0.01861  0.00220
9     0.78356  0.01678  0.00764  0.01076  0.02936  0.02740  0.10075  0.02108  0.00266
10    0.76003  0.01637  0.00744  0.01057  0.03060  0.02868  0.11963  0.02351  0.00317
11    0.73721  0.01593  0.00724  0.01033  0.03140  0.02956  0.13873  0.02590  0.00371

Table A.4: π_A

Age   H        1U       2U       3U       1D       2D       3D       DO       DD
1     1        0        0        0        0        0        0        0        0
2     0.56350  0.41623  0.00356  0.00356  0.00375  0.00340  0.00300  0.00280  0.00020
3     0.31753  0.63829  0.00538  0.00631  0.00877  0.00796  0.00976  0.00555  0.00044
4     0.17893  0.75131  0.00628  0.00814  0.01383  0.01260  0.01995  0.00826  0.00071
5     0.10083  0.80324  0.00667  0.00924  0.01837  0.01681  0.03290  0.01092  0.00103
6     0.05682  0.82111  0.00679  0.00981  0.02218  0.02040  0.04797  0.01353  0.00138
7     0.03202  0.82013  0.00677  0.01004  0.02525  0.02334  0.06459  0.01609  0.00177
8     0.01804  0.80885  0.00666  0.01006  0.02761  0.02565  0.08231  0.01861  0.00220
9     0.01017  0.79209  0.00652  0.00995  0.02936  0.02741  0.10075  0.02108  0.00267
10    0.00573  0.77256  0.00636  0.00977  0.03060  0.02868  0.11963  0.02351  0.00317
11    0.00323  0.75177  0.00618  0.00954  0.03140  0.02955  0.13873  0.02589  0.00371

Table A.5: π_B

The comparison of observable data among π_true, π_0, π_A, and π_B is shown below. The difference between each observable element in π that is calculated from a model and the corresponding element in π_true is minimal.
Age   1D       2D       3D       DO       DD
1     0        0        0        0        0
2     0.00375  0.00340  0.00300  0.00280  0.00020
3     0.00877  0.00796  0.00976  0.00555  0.00044
4     0.01383  0.01260  0.01995  0.00826  0.00071
5     0.01837  0.01681  0.03290  0.01092  0.00103
6     0.02218  0.02040  0.04797  0.01353  0.00138
7     0.02525  0.02334  0.06459  0.01609  0.00177
8     0.02761  0.02565  0.08231  0.01861  0.00220
9     0.02936  0.02741  0.10075  0.02108  0.00267
10    0.03060  0.02868  0.11963  0.02351  0.00317
11    0.03140  0.02955  0.13873  0.02589  0.00371

Table A.6: π_true

Age   1D       2D       3D       DO       DD
1     0        0        0        0        0
2     0.00375  0.00340  0.00300  0.00280  0.00020
3     0.00877  0.00797  0.00976  0.00555  0.00044
4     0.01383  0.01260  0.01995  0.00826  0.00071
5     0.01837  0.01680  0.03290  0.01092  0.00103
6     0.02219  0.02040  0.04797  0.01353  0.00138
7     0.02525  0.02333  0.06459  0.01610  0.00177
8     0.02761  0.02565  0.08231  0.01861  0.00220
9     0.02936  0.02741  0.10075  0.02108  0.00266
10    0.03060  0.02868  0.11963  0.02351  0.00317
11    0.03140  0.02955  0.13873  0.02589  0.00371

Table A.7: π_0

Age   1D       2D       3D       DO       DD
1     0        0        0        0        0
2     0.00375  0.00340  0.00299  0.00280  0.00020
3     0.00877  0.00796  0.00977  0.00555  0.00044
4     0.01383  0.01260  0.01995  0.00826  0.00071
5     0.01837  0.01681  0.03290  0.01092  0.00103
6     0.02218  0.02040  0.04797  0.01353  0.00138
7     0.02525  0.02333  0.06459  0.01609  0.00177
8     0.02761  0.02564  0.08231  0.01861  0.00220
9     0.02936  0.02740  0.10075  0.02108  0.00266
10    0.03060  0.02868  0.11963  0.02351  0.00317
11    0.03140  0.02956  0.13873  0.02590  0.00371

Table A.8: π_A

Age   1D       2D       3D       DO       DD
1     0        0        0        0        0
2     0.00375  0.00340  0.00300  0.00280  0.00020
3     0.00877  0.00796  0.00976  0.00555  0.00044
4     0.01383  0.01260  0.01995  0.00826  0.00071
5     0.01837  0.01681  0.03290  0.01092  0.00103
6     0.02218  0.02040  0.04797  0.01353  0.00138
7     0.02525  0.02334  0.06459  0.01609  0.00177
8     0.02761  0.02565  0.08231  0.01861  0.00220
9     0.02936  0.02741  0.10075  0.02108  0.00267
10    0.03060  0.02868  0.11963  0.02351  0.00317
11    0.03140  0.02955  0.13873  0.02589  0.00371

Table A.9: π_B

The comparison of observable data among α_true, α_0, α_A, and α_B is shown below. Minimal difference exists between each element in α that is calculated from a model and the corresponding element in α_true.

Age   1D       2D       3D
1     0        0        0
2     0.00375  0.00340  0.00300
3     0.00575  0.00514  0.00548
4     0.00677  0.00600  0.00718
5     0.00724  0.00638  0.00823
6     0.00740  0.00650  0.00879
7     0.00739  0.00648  0.00903
8     0.00729  0.00638  0.00906
9     0.00714  0.00624  0.00898
10    0.00696  0.00608  0.00882
11    0.00677  0.00592  0.00862

Table A.10: α_true

Age   1D       2D       3D
1     0        0        0
2     0.00375  0.00340  0.00300
3     0.00575  0.00516  0.00548
4     0.00677  0.00600  0.00718
5     0.00724  0.00638  0.00823
6     0.00740  0.00650  0.00879
7     0.00739  0.00648  0.00903
8     0.00729  0.00638  0.00906
9     0.00713  0.00625  0.00898
10    0.00696  0.00609  0.00882
11    0.00677  0.00593  0.00862

Table A.11: α_0

Age   1D       2D       3D
1     0        0        0
2     0.00375  0.00340  0.00299
3     0.00575  0.00515  0.00549
4     0.00677  0.00601  0.00718
5     0.00723  0.00638  0.00821
6     0.00740  0.00650  0.00877
7     0.00739  0.00648  0.00902
8     0.00729  0.00638  0.00906
9     0.00714  0.00624  0.00898
10    0.00696  0.00608  0.00882
11    0.00677  0.00592  0.00863

Table A.12: α_A

Age   1D       2D       3D
1     0        0        0
2     0.00375  0.00340  0.00300
3     0.00575  0.00514  0.00548
4     0.00677  0.00600  0.00718
5     0.00724  0.00638  0.00823
6     0.00740  0.00650  0.00879
7     0.00739  0.00648  0.00903
8     0.00729  0.00638  0.00906
9     0.00714  0.00624  0.00898
10    0.00696  0.00608  0.00882
11    0.00677  0.00592  0.00862

Table A.13: α_B

The comparison of unobservable data among π_true, π_0, π_A, and π_B is shown below. Some elements in π calculated from a model differ significantly from π_true.
Age   H        1U       2U       3U
1     1        0        0        0
2     0.97000  0.00875  0.00510  0.00300
3     0.94090  0.01342  0.00772  0.00548
4     0.91267  0.01579  0.00900  0.00718
5     0.88529  0.01689  0.00957  0.00823
6     0.85873  0.01726  0.00975  0.00879
7     0.83297  0.01724  0.00972  0.00903
8     0.80798  0.01700  0.00957  0.00906
9     0.78374  0.01665  0.00936  0.00898
10    0.76023  0.01624  0.00913  0.00882
11    0.73742  0.01580  0.00888  0.00862

Table A.14: π_true

Age   H        1U       2U       3U
1     1        0        0        0
2     0.56257  0.42421  0.00008  0
3     0.31648  0.65013  0.00005  0.00084
4     0.17804  0.76489  0.00003  0.00169
5     0.10016  0.81747  0.00002  0.00233
6     0.05635  0.83543  0.00001  0.00275
7     0.03170  0.83427  0.00001  0.00298
8     0.01783  0.82269  0.00000  0.00310
9     0.01003  0.80557  0.00000  0.00313
10    0.00564  0.78566  0.00000  0.00311
11    0.00317  0.76448  0.00000  0.00306

Table A.15: π_0

Age   H        1U       2U       3U
1     1        0        0        0
2     0.96997  0.00881  0.00420  0.00389
3     0.94085  0.01351  0.00633  0.00682
4     0.91259  0.01591  0.00737  0.00878
5     0.88519  0.01701  0.00782  0.00995
6     0.85861  0.01739  0.00796  0.01058
7     0.83283  0.01737  0.00793  0.01084
8     0.80782  0.01713  0.00781  0.01087
9     0.78356  0.01678  0.00764  0.01076
10    0.76003  0.01637  0.00744  0.01057
11    0.73721  0.01593  0.00724  0.01033

Table A.16: π_A

Age   H        1U       2U       3U
1     1        0        0        0
2     0.56350  0.41623  0.00356  0.00356
3     0.31753  0.63829  0.00538  0.00631
4     0.17893  0.75131  0.00628  0.00814
5     0.10083  0.80324  0.00667  0.00924
6     0.05682  0.82111  0.00679  0.00981
7     0.03202  0.82013  0.00677  0.01004
8     0.01804  0.80885  0.00666  0.01006
9     0.01017  0.79209  0.00652  0.00995
10    0.00573  0.77256  0.00636  0.00977
11    0.00323  0.75177  0.00618  0.00954

Table A.17: π_B

A.3 Validity Conditions

V = { {P, π, α} that satisfy the following constraints }:

Σ_{j∈S} P_{i,j} = 1   ∀ i ∈ S   (A.1)
Σ_{i∈S} π_i(a) = 1   ∀ a ∈ A   (A.2)
0 ≤ P_{i,j} ≤ 1   ∀ i, j ∈ S   (A.3)
0 ≤ π_i(a) ≤ 1   ∀ i ∈ S, a ∈ A   (A.4)
0 ≤ α_i(a) ≤ 1   ∀ i ∈ D, a ∈ A   (A.5)
P_{iU,j} = P_{iD,j}   ∀ i ∈ {1, 2, 3}, j ∈ {DO, DD}   (A.6)
P_{H,H} ≥ P_{H,1U} ≥ P_{H,2U} ≥ P_{H,3U}   (A.7)
P_{H,H} ≥ P_{H,1D} ≥ P_{H,2D} ≥ P_{H,3D}   (A.8)
P_{1U,1U} ≥ P_{1U,2U} ≥ P_{1U,3U}   (A.9)
P_{1U,1D} ≥ P_{1U,2D} ≥ P_{1U,3D}   (A.10)
P_{2U,2U} ≥ P_{2U,3U}   (A.11)
P_{2U,2D} ≥ P_{2U,3D}   (A.12)
P_{1D,1D} ≥ P_{1D,2D} ≥ P_{1D,3D}   (A.13)
P_{2D,2D} ≥ P_{2D,3D}   (A.14)
P_{i,jU} ≥ P_{i,jD}   ∀ i ∈ {H} ∪ U, j ∈ {1, 2, 3}   (A.15)
P_{H,1U} + P_{H,1D} ≥ P_{H,2U} + P_{H,2D} ≥ P_{H,3U} + P_{H,3D}   (A.16)
P_{1U,1U} + P_{1U,1D} ≥ P_{1U,2U} + P_{1U,2D} ≥ P_{1U,3U} + P_{1U,3D}   (A.17)
P_{2U,2U} + P_{2U,2D} ≥ P_{2U,3U} + P_{2U,3D}   (A.18)
P_{i,H} = 0   ∀ i ∉ {H}   (A.19)
P_{i,1U} = 0   ∀ i ∉ {H, 1U}   (A.20)
P_{i,2U} = 0   ∀ i ∉ {H, 1U, 2U}   (A.21)
P_{i,3U} = 0   ∀ i ∉ {H} ∪ U   (A.22)
P_{i,1D} = 0   ∀ i ∉ {H, 1U, 1D}   (A.23)
P_{i,2D} = 0   ∀ i ∉ {H, 1U, 1D, 2U, 2D}   (A.24)
P_{i,3D} = 0   ∀ i ∉ {H} ∪ U ∪ D   (A.25)
P_{DO,DO} = 1   (A.26)
P_{DD,DD} = 1   (A.27)
P_{i,j} ≥ 1 × 10⁻⁷   ∀ i ∉ {DO, DD}, j ∈ {DO, DD}   (A.28)
π_H(1) = 1   (A.29)
0.75 ≤ P_{1D,1D} ≤ 0.85   (A.30)
0.1 ≤ P_{1D,2D} ≤ 0.15   (A.31)
0.06 ≤ P_{1D,3D} ≤ 0.09   (A.32)
0.0015 ≤ P_{1D,DO} ≤ 0.003   (A.33)
0.0002 ≤ P_{1D,DD} ≤ 0.0025   (A.34)
0.68 ≤ P_{2D,2D} ≤ 0.72   (A.35)
0.28 ≤ P_{2D,3D} ≤ 0.31   (A.36)
0.0000001 ≤ P_{2D,DO} ≤ 0.0015   (A.37)
0.0000001 ≤ P_{2D,DD} ≤ 0.003   (A.38)
0.95 ≤ P_{3D,3D} ≤ 1   (A.39)
0.0005 ≤ P_{3D,DO} ≤ 0.0015   (A.40)
0.0018 ≤ P_{3D,DD} ≤ 0.0024   (A.41)

Equation (A.6) states that mortality is independent of awareness of the disease. Equations (A.7) to (A.18) indicate that the disease is more likely to remain in the current state than to progress to a worse health state or to become detected. Equation (A.28) indicates that there is a non-zero probability of dying from any state. Equations (A.30) to (A.41) represent the biologically plausible ranges for observable transition probabilities. These constraints represent the expert judgment that would likely result from observations of the diseased patients.

A.4 Parameters for the Interventions

A.4.1 Screening Test Characteristics

We consider a multimodal screening protocol in which every individual in H or U will be screened with the first-line test within each cycle. If the first-line test is positive, a second-line test will be applied immediately. A third-line diagnostic test will be applied immediately to individuals with a positive second-line test. The third-line test is 100% accurate.
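The validity conditions of §A.3 are simple enough to audit mechanically before any optimization is attempted. A minimal Python sketch, checking a handful of them ((A.1), (A.13), (A.14), and the range constraints (A.30) to (A.32)) against the detected-state rows of P_true; the function name is illustrative, not the calibration model itself:

```python
# Audit a few of the validity conditions in A.3 against candidate rows.
# Row entries below are the 1D/2D/3D rows of P_true from Appendix A.1.
def check_validity(rows):
    checks = [
        # (A.1): each transition row sums to one
        all(abs(sum(r.values()) - 1) < 1e-6 for r in rows.values()),
        # (A.13)-(A.14): remaining in a stage is at least as likely as progressing
        rows["1D"]["1D"] >= rows["1D"]["2D"] >= rows["1D"]["3D"],
        rows["2D"]["2D"] >= rows["2D"]["3D"],
        # (A.30)-(A.32): biologically plausible ranges for observable transitions
        0.75 <= rows["1D"]["1D"] <= 0.85,
        0.10 <= rows["1D"]["2D"] <= 0.15,
        0.06 <= rows["1D"]["3D"] <= 0.09,
    ]
    return all(checks)

ROWS_TRUE = {
    "1D": {"1D": 0.805, "2D": 0.1255, "3D": 0.0665, "DO": 0.0017, "DD": 0.0013},
    "2D": {"2D": 0.69, "3D": 0.307, "DO": 0.0012, "DD": 0.0018},
    "3D": {"3D": 0.997, "DO": 0.001, "DD": 0.002},
}
```

A candidate whose 1D row places too much mass on remaining in 1D (say, 0.90, outside the plausible range (A.30)) would be rejected by the same check even though its row still sums to one.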
Tables A.18 and A.19 summarize the characteristics of the screening tests.

Test         1U    2U    3U    H
First-line   0.78  0.87  0.89  0.01
Second-line  0.97  0.98  0.99  0.001
Third-line   1     1     1     0

Table A.18: Probability of a positive test result given health state

Test         Cost
First-line   $30
Second-line  $72
Third-line   $154

Table A.19: Screening test costs

The multimodal screening test alters the rows in the transition probability matrix P corresponding to H or U. Let the probability of a positive i-th-line test given state j be p^(i)_j for i = 1, 2, 3 and j ∈ {H} ∪ U. If P = [P_{i,j}] is the matrix representing the natural history and P^scr = [P^scr_{i,j}] represents the matrix under screening, then we have

P^scr_{i,j} = P_{i,j}   ∀ i ∉ U, j ∈ S
P^scr_{iU,j} = p^(1)_{iU} p^(2)_{iU} P_{iD,j} + [(1 − p^(1)_{iU}) + p^(1)_{iU}(1 − p^(2)_{iU})] P_{iU,j}   ∀ i = 1, 2, 3, j ∈ S.

Let us define a constant matrix A^scr as follows (rows and columns ordered H, 1U, 2U, 3U, 1D, 2D, 3D, DO, DD):

      H   1U   2U   3U   1D   2D   3D   DO  DD
H     1   0    0    0    0    0    0    0   0
1U    0   a_1  0    0    b_1  0    0    0   0
2U    0   0    a_2  0    0    b_2  0    0   0
3U    0   0    0    a_3  0    0    b_3  0   0
1D    0   0    0    0    1    0    0    0   0
2D    0   0    0    0    0    1    0    0   0
3D    0   0    0    0    0    0    1    0   0
DO    0   0    0    0    0    0    0    1   0
DD    0   0    0    0    0    0    0    0   1

where a_i = (1 − p^(1)_{iU}) + p^(1)_{iU}(1 − p^(2)_{iU}) and b_i = p^(1)_{iU} p^(2)_{iU}. Then, P^scr = A^scr P. From P^scr, (3.1) and (3.2) can be used to calculate the altered prevalence and incidence under screening, π^scr and α^scr.

A.4.2 Treatment Parameters

Two treatments are considered for the fictitious disease. Treatments alter the rows in the transition probability matrix P corresponding to D. Let P = [P_{i,j}] be the matrix representing the natural history and P^Tn = [P^Tn_{i,j}] the matrix under treatment n. Then for treatment n,

P^Tn_{i,j} = P_{i,j}   ∀ i ∉ D, j ∈ S
P^Tn_{iD,DO} = P_{iD,DO}   ∀ i = 1, 2, 3
P^Tn_{iD,DD} = (1 − m^(n)_{iD}) P_{iD,DD}   ∀ i = 1, 2, 3
P^Tn_{1D,2D} = (1 − t^(n)_{1,2}) P_{1D,2D}
P^Tn_{1D,3D} = (1 − t^(n)_{1,3}) P_{1D,3D}
P^Tn_{2D,3D} = (1 − t^(n)_{2,3}) P_{2D,3D}
P^Tn_{iD,iD} = 1 − Σ_{j≠iD} P^Tn_{iD,j}   ∀ i = 1, 2, 3

From P^Tn, (3.1) and (3.2) can be used to calculate the altered prevalence and incidence under treatment n, π^Tn and α^Tn.
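A minimal sketch of the screening adjustment P^scr = A^scr P, using the first- and second-line sensitivities of Table A.18 and the 1U/1D rows of P_true; the variable names are ours:

```python
# Build the screened stage-1 row: P_scr[1U] = a1 * P[1U] + b1 * P[1D], where
# a_i = (1 - p1) + p1 (1 - p2) is the probability the case is NOT referred
# through both tests, and b_i = p1 p2 is the probability it is.
p1 = {"1U": 0.78, "2U": 0.87, "3U": 0.89}  # first-line sensitivity (Table A.18)
p2 = {"1U": 0.97, "2U": 0.98, "3U": 0.99}  # second-line sensitivity

a = {s: (1 - p1[s]) + p1[s] * (1 - p2[s]) for s in p1}
b = {s: p1[s] * p2[s] for s in p1}

# 1U and 1D rows of P_true (state order H, 1U, 2U, 3U, 1D, 2D, 3D, DO, DD):
row_1U = [0, 0.5635, 0.0753, 0.03325, 0.2415, 0.0502, 0.03325, 0.0017, 0.0013]
row_1D = [0, 0, 0, 0, 0.805, 0.1255, 0.0665, 0.0017, 0.0013]
scr_1U = [a["1U"] * u + b["1U"] * d for u, d in zip(row_1U, row_1D)]
```

Since a_i + b_i = 1 for every stage, each screened row remains a proper probability distribution.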
Table A.20 summarizes the parameters related to treatments T1 and T2. The state-specific costs are the total costs for that state within a cycle length.

Treatment  State  Cost   m     t_{1,2}  t_{1,3}  t_{2,3}
T1         1D     $2900  0.78  0.83     0.44     0.58
           2D     $3000  0.64
           3D     $3300  0.41
T2         1D     $1200  0.40  0.46     0.46     0.53
           2D     $1400  0.34
           3D     $1700  0.32

Table A.20: Treatment parameters

The transition probability matrix for screening + treatment n, P^STn, is obtained by applying the procedures outlined in §A.4.1 and §A.4.2 to the appropriate rows.

A.4.3 Quality of Life

Each test has an impact on quality of life. We assume a baseline quality of life for each health state. For all screening tests and treatments, we assume that there is a state-specific disutility, which is presented in Table A.21. A negative disutility implies that the intervention causes some discomfort, whereas a positive value implies an improved quality of life. For screening tests, the quality of life for the states in O remains the same as the baseline value since screening tests do not apply to those states. On the other hand, only detected diseased individuals receive treatments. Treatment starts immediately when a case is detected.

State  Baseline quality  First-line  Second-line  Third-line  T1    T2
H      1                 0           0            0.005       0     0
1U     0.94              0           0            0.005       0     0
2U     0.90              0           0            0.005       0     0
3U     0.85              0           0            0.005       0     0
1D     0.90              0           0            0           0.05  0.05
2D     0.85              0           0            0           0.05  0.04
3D     0.80              0           0            0           0.03  0.03
DO     0                 0           0            0           0     0
DD     0                 0           0            0           0     0

Table A.21: Quality of life parameters

A.4.4 Net Monetary Benefit Calculations

Calculation of Quality-Adjusted Life Years (QALYs)

The total quality-adjusted life years (QALYs) when there is no intervention is equal to Σ_{a∈A} w(a) Σ_{i∈S} Q_i π_i(a), where

• {w(a)} is the age distribution of the target population,
• Q_i is the quality of life for state i, and
• π_i(a) is the prevalence of state i for age a calculated using P, the natural history model.

For our fictitious disease, the age distribution is shown in Table A.22.
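The treatment adjustment of §A.4.2 can be sketched with the T1 state-1D parameters of Table A.20 (m = 0.78, t_{1,2} = 0.83, t_{1,3} = 0.44) applied to the 1D row of P_true; the names are illustrative:

```python
# Apply treatment T1 to the 1D row of P_true (A.4.2): scale the progression
# and disease-death entries, keep DO, and absorb the freed mass into
# remaining in 1D so the row still sums to one.
t12, t13 = 0.83, 0.44  # T1 progression reductions (Table A.20)
m_1D = 0.78            # T1 mortality reduction for state 1D

row_1D = {"1D": 0.805, "2D": 0.1255, "3D": 0.0665, "DO": 0.0017, "DD": 0.0013}
treated = {
    "2D": (1 - t12) * row_1D["2D"],
    "3D": (1 - t13) * row_1D["3D"],
    "DO": row_1D["DO"],
    "DD": (1 - m_1D) * row_1D["DD"],
}
treated["1D"] = 1 - sum(treated.values())
```

Treatment keeps stage-1 patients in 1D longer: the staying probability rises from 0.805 to about 0.939 while the row still sums to one.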
Due to the nature of a multimodal screening test, a healthy or undetected individual can receive one to three tests depending on the test results, which determines the total QALYs.

Age  w(a)
1    0.0923
2    0.0994
3    0.0982
4    0.0948
5    0.0891
6    0.0884
7    0.0912
8    0.0977
9    0.0954
10   0.0834
11   0.0703

Table A.22: Age distribution

The quality of life for state i due to screening when a person receives only the first-line test, both the first- and second-line tests, and all three tests (denoted as Q^scr_i(1), Q^scr_i(2), and Q^scr_i(3), respectively) is calculated by summing the baseline quality and all relevant disutilities. For instance, Q^scr_i(2) is the sum of the baseline quality and the disutilities of the first- and second-line tests. Note that Q^scr_i(1) = Q^scr_i(2) = Q^scr_i(3) for i ∈ O. The total QALYs for screening is equal to

Σ_{a∈A} w(a) Σ_{i∈{H}∪U} [(1 − p^(1)_i) Q^scr_i(1) + p^(1)_i (1 − p^(2)_i) Q^scr_i(2) + p^(1)_i p^(2)_i Q^scr_i(3)] π^scr_i(a)
+ Σ_{a∈A} w(a) Σ_{i∈O} Q^scr_i(1) π^scr_i(a).

The quality of life for state i due to treatment n, Q^Tn_i, is the sum of the baseline quality and its disutility. For an arbitrary individual, the total QALYs for treatment n is equal to Σ_{a∈A} w(a) Σ_{i∈S} Q^Tn_i π^Tn_i(a).

The total QALYs for screening + treatment n is

Σ_{a∈A} w(a) Σ_{i∈{H}∪U} [(1 − p^(1)_i) Q^scr_i(1) + p^(1)_i (1 − p^(2)_i) Q^scr_i(2) + p^(1)_i p^(2)_i Q^scr_i(3)] π^STn_i(a)
+ Σ_{a∈A} w(a) Σ_{i∈O} Q^Tn_i π^STn_i(a),

where π^STn_i(a) is obtained by using P^STn and (3.1).

Calculation of Total Costs

The screening costs for the first-, second-, and third-line tests are denoted as C^S_1, C^S_2, and C^S_3, whereas the costs of treatment n for state i ∈ D are denoted as C^Tn_i. Then,

Total cost of screening = Σ_{a∈A} w(a) Σ_{i∈{H}∪U} π^scr_i(a) [C^S_1 + C^S_2 p^(1)_i + C^S_3 (p^(1)_i p^(2)_i)]
Total cost of treatment n = Σ_{a∈A} w(a) Σ_{i∈D} π^Tn_i(a) C^Tn_i

The total cost of screening + treatment n is the sum of the total costs of screening and treatment, both of which are calculated using π^STn.
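Putting the pieces of §A.4.4 together, the NMB itself is a one-line computation once total QALYs and total cost are in hand, and the bracketed screening term is a simple expectation over the number of tests received. A hedged sketch; the arguments passed in are illustrative, not the thesis results:

```python
# NMB = WTP * total QALYs - total cost (A.4.4), with WTP = $300,000 per QALY.
WTP = 300_000

def nmb(total_qalys, total_cost):
    return WTP * total_qalys - total_cost

def expected_screen_quality(p1, p2, q1, q2, q3):
    """Bracketed term of the screening QALY sum: expected quality for a
    screened H/U individual who receives one, two, or three tests."""
    return (1 - p1) * q1 + p1 * (1 - p2) * q2 + p1 * p2 * q3
```

For state 1U (baseline quality 0.94, third-line disutility 0.005), expected_screen_quality(0.78, 0.97, 0.94, 0.94, 0.945) works out to roughly 0.9438.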
Calculation of NMB

The NMB for an intervention is defined as WTP × total QALYs for that intervention − total cost. The WTP is set at $300,000 per QALY.

A.4.5 The NMB of Each Intervention

P_true                  Do nothing  T1       T2       ST1      ST2
NMB                     $290338     $291459  $291450  $291466  $291440
Deviation from max NMB  -$1128      -$8      -$17     $0       -$26

Table A.23: Net monetary benefit for each intervention

Chapter 4: A Literature Review of Model-Based Analyses for Ovarian Cancer

Ovarian cancer is the fifth deadliest cancer for females, with 22,500 estimated new cases and 14,000 estimated deaths in 2017 (American Cancer Society [2017]). While 15% of cases are diagnosed at the localized stage, where the 5-year survival rate is 92%, a majority of cases (60%) are detected at the distant stage, where the 5-year survival rate is only 29% (American Cancer Society [2017]). Early detection might help improve the survival outlook of ovarian cancer patients. However, no screening for ovarian cancer is recommended by the U.S. Preventive Services Task Force (USPSTF) (Moyer [2012]).

Randomized control trials (RCTs) provide the strongest empirical evidence of the effectiveness of a screening strategy. The two largest RCTs to date, the Prostate, Lung, Colorectal and Ovarian (PLCO) trial (Buys et al. [2011]) and the United Kingdom Collaborative Trial of Ovarian Cancer Screening (UKCTOCS) (Menon et al. [2009]), have been conducted to evaluate the effectiveness of ovarian cancer screening. The PLCO is a United States based RCT targeting women from age 55 to 74. A total of 39,105 women were enrolled in the screening arm (annual screening with biomarker CA 125 and transvaginal ultrasound (TVS)), whereas 39,111 women were in the control arm (usual care with no annual screening). From this study, it was concluded that annual screening with CA 125 and TVS did not reduce mortality from ovarian cancer. The UKCTOCS is a United Kingdom based RCT recruiting post-menopausal women from age 50 to 74.
There are 101,359 women in the control arm, 50,640 women in the multimodal screening arm (annual screening with CA 125 and TVS as a second-line test), and 50,639 women in the annual TVS arm. The multimodal screening arm shows a significant reduction in ovarian cancer mortality of 28% after 7 years of screening when prevalent cases are excluded (Jacobs et al. [2016]).

Although considered the gold standard for evaluating the effectiveness of a screening strategy, RCTs are generally costly and time-consuming and can only evaluate the specific strategies for which they are undertaken. Therefore, model-based analyses for cancer screening are often used for assessing the potential impact of a broader array of screening strategies. Many have been used in evaluating screening strategies for different cancers such as prostate cancer (Cantor et al. [1995]; Kobayashi et al. [2007]), cervical cancer (Mandelblatt et al. [2002]; McLay et al. [2010]; Myers et al. [2000]; Siebert et al. [2006]), breast cancer (Fryback et al. [2006]; Tan et al. [2006]), colorectal cancer (Rutter and Savarino [2010]; Zauber et al. [2008]), and lung cancer (Kong et al. [2009]; Schultz et al. [2012]). In the context of ovarian cancer screening, models involving stochastic simulation, decision trees, and Markov processes that describe the natural history of ovarian cancer have been developed. These natural history models are essential tools for evaluating cancer screening strategies. In the following section, we review models of the natural history of ovarian cancer available in the literature.

4.1 Natural History Models

Skates and Singer [1991] are the first to develop a stochastic simulation model to evaluate the benefit of using the CA 125 radioimmunoassay to screen for ovarian cancer. The model consists of four components:

• a model of the natural history of ovarian cancer,
• a model of the time of clinical detection of ovarian cancer,
• a model of the survival probability for each cancer stage, and
• a model of the screening program.
The natural history of ovarian cancer is modeled using the standard four-stage cancer staging system. The joint distribution of the durations of the four stages is characterized by a four-variate log-normal distribution. By reducing the covariance matrix of the distribution to a function of one variable and fixing the ratios between the mean durations of early stage (I and II) and late stage (III and IV), between the mean durations of stage I and stage II, and between the mean durations of stage III and stage IV, they further simplify the joint distribution of the durations to a function of two variables: the mean duration in stage I and the coefficient of variation of the duration of each stage, which is assumed to be constant across all stages. Each incident case is assumed to be clinically detected at a time uniformly distributed within a 30-year period, and this time is independent of other variables in the model. The survival distribution for each stage is a function of three parameters:

• the cure portion of cancer patients, and
• the mean and variance of a log-normal survival distribution,

where the parameters are maximum likelihood estimates based on data from the MGH (Massachusetts General Hospital) tumor registry. An annual screening program based on CA 125 serum level (with a cutoff value of 35 U/ml) is simulated for a hypothetical cohort of 1,000,000 women aged 50-75 over a 30-year period. The CA 125 level after tumor inception is assumed to follow an exponential growth model, resulting in stage-specific sensitivities. The outcome of the simulation is the expected number of life-years saved (LYS) per ovarian cancer case. The simulation study indicates an average of 3.42 LYS using annual CA 125 screening.

Urban et al. [1997] extend the model developed by Skates and Singer [1991] and evaluate six ovarian cancer screening strategies in terms of their effectiveness and cost-effectiveness.
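The two-parameter reduction described above (a mean stage-I duration plus a coefficient of variation shared by all stages) can be sketched by sampling log-normal stage durations. The parameterization below is the standard mean/CV conversion for a log-normal, not code from Skates and Singer [1991], and it ignores the between-stage correlation structure that their full model retains:

```python
import math
import random

def lognormal_params(mean, cv):
    """mu and sigma of the underlying normal for a log-normal with the
    given mean and coefficient of variation."""
    sigma2 = math.log(1 + cv * cv)
    return math.log(mean) - sigma2 / 2, math.sqrt(sigma2)

def sample_stage_duration(mean, cv, rng):
    """Draw one stage duration (e.g., years spent in stage I)."""
    mu, sigma = lognormal_params(mean, cv)
    return rng.lognormvariate(mu, sigma)

rng = random.Random(2018)
durations = [sample_stage_duration(2.0, 0.5, rng) for _ in range(100_000)]
```

The empirical mean of the draws approaches the specified mean (2.0 here); reproducing the joint four-stage distribution would additionally require the covariance structure that Skates and Singer reduce to a single free variable.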
Their natural history model is based on the four-stage log-normal model of Skates and Singer; their clinical detection model assigns each case an age and stage at diagnosis according to the age- and stage-specific distributions given by the Surveillance, Epidemiology, and End Results Program (SEER) database. Departing from Skates and Singer, the age- and stage-specific survival distributions are estimated by applying the Kaplan-Meier method to the SEER data. Six screening protocols involving TVS and/or the CA 125 assay are evaluated:
• annual TVS;
• annual CA 125 with a cutoff value of 35 U/ml;
• annual CA 125 with a positivity criterion of doubled CA 125 since the last screen or elevated CA 125;
• annual CA 125 followed by TVS if CA 125 is doubled or elevated;
• 6-month CA 125 followed by TVS if CA 125 is doubled or elevated; and
• 2-year CA 125 followed by TVS if CA 125 is doubled or elevated.
TVS is assumed to be 100% sensitive following a normally distributed time delay after the inception of stage I; the false-positive rates of TVS at the first, second, and third or later screens are assumed to be 0.019, 0.010, and 0.006, respectively. The CA 125 serum level is modeled using the exponential growth model described in Skates and Singer. Each of the six screening strategies is imposed on a hypothetical cohort of 1,000,000 women, aged 50 in 1990, over 30 years. The costs of CA 125 and TVS are assumed to be $40 and $150, respectively. The outcomes of the model are LYS and the overall cost, which is the sum of treatment, screening, and diagnosis costs. Annual TVS is the most cost-effective strategy among the three single-modality strategies. Comparing all six strategies, the most effective strategies are the multimodal strategy performed every six months or annually.

Drescher et al. [2012] refine the stochastic microsimulation model developed by Urban et al. [1997] to evaluate the impact of screening test performance and cost on the effectiveness and cost-effectiveness of ovarian cancer screening.
The model, which focuses on epithelial ovarian cancer (EOC), consists of four components: the natural history, screening, survival, and cost components. The natural history component is based on parameters such as age at death from competing risks; EOC tumor characteristics (incidence, stage, grade, histology, and age at diagnosis); benign tumor characteristics (incidence and age at diagnosis); total duration of malignant disease by stage, histology, and grade; and benign disease duration. All parameter estimates are derived from the SEER database, the U.S. Vital Statistics Report, and the literature (Anderson et al. [2010]; Havrilesky et al. [2011]; Katsube et al. [1982]; Partridge et al. [2009]; Yabroff et al. [2008]), except that the malignant durations are point estimates from gynecologic oncologists. Under the natural history model, each woman can be one of four types:
• symptomatic EOC,
• benign,
• healthy, and
• latent (asymptomatic) EOC.
The simulation examines a hypothetical cohort of 1 million women as they are screened annually from ages 45 to 85 with a multimodal screening test, with the first-line test being either CA 125 or a hypothetical biomarker assay and the second-line test being either TVS or a hypothetical imaging test. The sensitivity of the CA 125 assay is modeled using the parametric empirical Bayes (PEB) rules (McIntosh et al. [2002]), resulting in an increasing step function of the time before clinical diagnosis. The specificity of the CA 125 assay is assumed to be 95%. The sensitivity function of the hypothetical biomarker test is assumed to be the minimum of 1 and twice the CA 125 sensitivity at each time before diagnosis. The screening component also assumes that TVS has a fixed sensitivity of 63% and specificity of 97%, whereas the hypothetical imaging test has a sensitivity of 90% and specificity of 97%. A screening result is regarded as positive if both the first- and second-line tests are positive. True positives and false positives undergo surgery.
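The sensitivity assumptions of the screening component can be sketched directly. The CA 125 step-function values below are illustrative placeholders rather than the PEB-derived estimates, and the both-tests-positive combination assumes the two tests are independent.

```python
# Illustrative CA 125 sensitivity as an increasing step function of years
# before clinical diagnosis (placeholder values, not the PEB estimates).
def ca125_sensitivity(years_before_diagnosis):
    if years_before_diagnosis <= 1:
        return 0.85
    if years_before_diagnosis <= 2:
        return 0.60
    if years_before_diagnosis <= 4:
        return 0.30
    return 0.10

def biomarker_sensitivity(years_before_diagnosis):
    # Hypothetical biomarker: min(1, 2 x CA 125 sensitivity), as in Drescher et al.
    return min(1.0, 2.0 * ca125_sensitivity(years_before_diagnosis))

TVS_SENSITIVITY = 0.63   # fixed second-line sensitivity assumed by Drescher et al.

def multimodal_sensitivity(years_before_diagnosis, second_line=TVS_SENSITIVITY):
    # A positive screen requires BOTH tests positive; assuming the tests are
    # independent, the combined sensitivity is the product.
    return ca125_sensitivity(years_before_diagnosis) * second_line

combined_1y = multimodal_sensitivity(0.5)   # within a year of diagnosis
```

The product form makes the reported insensitivity to second-line performance plausible: raising the second-line sensitivity from 0.63 to 0.90 scales the combined sensitivity, but only for cases the first-line test catches at all.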
The survival component assigns a time of death to all clinically diagnosed cases and screen-detected cases. The survival estimate is derived from the age-, stage-, histology-, and grade-specific survival data obtained from SEER. All women are assumed to die by age 110. The cost component considers the screening test costs, surgical procedure costs, and treatment costs, which are derived from Medicare data. The costs of the screening tests are $31 for CA 125, $210 for the hypothetical biomarker test, $111 for TVS, and $750 for the hypothetical imaging test. The outcomes of the simulation include mortality reduction, LYS, and cost-effectiveness. With CA 125 as the first-line test and TVS as the second-line test, the mortality reduction is 13%. Using a hypothetical imaging test in place of TVS improves the mortality reduction to only 15% at a much greater cost per LYS. When CA 125 is replaced with the more sensitive hypothetical biomarker test and TVS is used as the second-line test, the mortality reduction improves from 13% to 25%; when CA 125 is used as the first-line test and TVS is replaced with the more sensitive hypothetical imaging test, the mortality reduction improves only slightly, from 13% to 15%. This indicates that screening outcomes are insensitive to second-line test performance and costs.

Schapira et al. [1993] conduct a decision tree analysis of the effectiveness of a one-time ovarian cancer screening with CA 125 and TVS in a cohort of 40-year-old women in the United States. No screening and a one-time CA 125 and TVS screen are examined in the decision analysis. The natural history of ovarian cancer consists of three parameters:
• ovarian cancer prevalence,
• proportion of prevalent cases that are in early stages at the time of screening, and
• probability of early-stage disease being diagnosed clinically in the absence of screening.
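Schapira et al. estimate the first of these parameters, prevalence, as the product of age-specific incidence and the assumed 2-year mean preclinical duration. A toy calculation, using an illustrative incidence value rather than the Cutler-Young estimate:

```python
# Prevalence = age-specific incidence x mean preclinical duration.
# The incidence value below is an illustrative placeholder, not the
# Cutler-Young estimate used by Schapira et al.
incidence_per_100k = 15.0          # annual cases per 100,000 women aged 40
mean_preclinical_years = 2.0       # assumption in Schapira et al. [1993]

prevalence_per_100k = incidence_per_100k * mean_preclinical_years
prevalence = prevalence_per_100k / 100_000   # probability a woman screens positive-eligible
```

This is the standard "prevalence = incidence x duration" identity for a disease in steady state; it makes clear why the unexplained 2-year duration assumption directly scales the estimated yield of a one-time screen.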
The prevalence of ovarian cancer in 40-year-old women is estimated as the product of age-specific incidence and the average duration of the preclinical disease phase, where the incidence is based on Cutler and Young [1975] and the average preclinical duration is assumed to be 2 years, although it is unclear how they arrive at this assumption. The time it takes to progress from early- to late-stage disease before clinical diagnosis is assumed to be one year. Screening-related parameters used in the decision tree include the sensitivity of the combined CA 125 and TVS screening for early-stage disease, the sensitivity of the combined screening for late-stage disease, the specificity of the combined screening, and the probability of post-laparotomy death. The sensitivities for early- and late-stage disease and the specificity of the combined screening are estimated to be 45%, 81%, and 99.95%, respectively. Each branch of the decision tree ends in a terminal node, and the appropriate remaining life expectancy is assigned to each terminal node. The combined screening results in an average remaining life expectancy of 40.192 years, increasing the average life expectancy by less than 1 day compared to no screening.

Havrilesky et al. [2008] develop a one-phenotype Markov model of the ovarian cancer natural history and study the cost-effectiveness of potential screening strategies. The length of a Markov cycle is 1 month. The model consists of 13 health states, including well, undetected stage I-IV ovarian cancer, detected stage I-IV ovarian cancer, benign oophorectomy, ovarian cancer survivor, death from ovarian cancer, and death from other causes. Direct transition from undetected stage I cancer to undetected stage III cancer is allowed in the natural history model. In addition, the model assumes that death from ovarian cancer will occur only at or after detection of cancer, and that ovarian cancer survivors are those who stay alive 10 years post diagnosis and can die of other causes.
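A cohort Markov model of this kind updates a state-occupancy distribution once per monthly cycle. The sketch below uses a reduced five-state version; the transition probabilities are illustrative placeholders, not the calibrated values of Havrilesky et al.

```python
# Reduced illustrative state set (the actual model has 13 states).
STATES = ["well", "undetected_cancer", "detected_cancer",
          "death_cancer", "death_other"]

# Monthly transition matrix: placeholder values, NOT the calibrated ones.
# Note row 2 (undetected cancer) has no transition to death from ovarian
# cancer, matching the assumption that cancer death occurs only at or
# after detection.
P = [
    [0.9989, 0.0001, 0.0,    0.0,   0.0010],  # well
    [0.0,    0.97,   0.02,   0.0,   0.0100],  # undetected cancer
    [0.0,    0.0,    0.975,  0.015, 0.0100],  # detected cancer
    [0.0,    0.0,    0.0,    1.0,   0.0],     # death from ovarian cancer (absorbing)
    [0.0,    0.0,    0.0,    0.0,   1.0],     # death from other causes (absorbing)
]

def step(dist):
    """One monthly cycle: redistribute the cohort according to P."""
    n = len(P)
    return [sum(dist[i] * P[i][j] for i in range(n)) for j in range(n)]

dist = [1.0, 0.0, 0.0, 0.0, 0.0]   # the whole cohort enters in the 'well' state
for _ in range(12 * 60):           # e.g. ages 20 to 80 in monthly cycles
    dist = step(dist)

cancer_deaths = dist[STATES.index("death_cancer")]
```

A screening strategy is then overlaid by increasing the monthly undetected-to-detected transition probability as a function of test sensitivity and screening interval.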
All patients enter the model at age 20 and die by age 100. The authors estimate the stationary transition probabilities between two cancer stages by searching over clinically justifiable values to fit the SEER data, including the lifetime probability of developing ovarian cancer, stage distribution, and lifetime probability of death from ovarian cancer. Age-specific SEER data are used to estimate ovarian cancer incidence; age- and stage-specific SEER data are used to estimate the transition probabilities involving ovarian cancer survival; age-specific U.S. Life Tables data are used to estimate the age-specific probability of death from other causes. Age-specific probabilities of benign oophorectomy and mortality from oophorectomy are estimated from literature data (Keshavarz et al. [2002]; Merrill [2006]; Wingo et al. [1985]) following model calibration. Imposed on their natural history model are several hypothetical screening strategies, including no screening and screening at intervals of 3-36 months. In the base case, an annual screen of 85% sensitivity and 95% specificity costing $50 is performed on those between ages 50 and 85. Two scenarios are examined in their study: screening within the general population and within a high-risk population (simulated based on the prevalence of having a risk factor and the relative risk of ovarian cancer). The annual screening results in a 43% mortality reduction in the base case. The study concludes that annual screening is potentially cost-effective, especially in a high-risk population.

Havrilesky et al. [2011] extend their 1-phenotype model (Havrilesky et al. [2008]) to a 2-phenotype Markov model in which EOC is classified into two phenotypes, "aggressive" and "indolent". They examine the effectiveness of ovarian cancer screening using both 1- and 2-phenotype models, considering only the histological types defining the two phenotypes. The 1-phenotype model has the same structure as in Havrilesky et al. [2008].
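The calibration procedure described above (searching over clinically justifiable parameter values so that model outputs match SEER targets) can be sketched with a toy one-parameter model. The target value, parameter grid, and background mortality below are illustrative, not SEER estimates.

```python
# Toy calibration: choose a monthly incidence probability so that the
# model-predicted lifetime probability of developing cancer matches a
# target. The target and grid below are illustrative, not SEER values.

TARGET_LIFETIME_RISK = 0.013          # e.g. a 1.3% lifetime-risk target
N_CYCLES = 12 * 80                    # monthly cycles over a lifetime

def lifetime_risk(p_incidence, p_death_other=0.001):
    """Lifetime probability of developing cancer in a minimal cohort model."""
    well = 1.0
    ever_cancer = 0.0
    for _ in range(N_CYCLES):
        new_cases = well * p_incidence
        ever_cancer += new_cases
        well -= new_cases + well * p_death_other   # leave 'well' by cancer or death
    return ever_cancer

def calibrate(grid):
    """Grid search: the parameter minimizing squared deviation from the target."""
    return min(grid, key=lambda p: (lifetime_risk(p) - TARGET_LIFETIME_RISK) ** 2)

grid = [i * 1e-6 for i in range(1, 101)]   # candidate monthly probabilities
best_p = calibrate(grid)
fit_error = abs(lifetime_risk(best_p) - TARGET_LIFETIME_RISK)
```

The actual calibration searches jointly over many transition probabilities against several targets at once, which is exactly where the calibration uncertainty studied in this dissertation arises: many parameter combinations can fit the targets comparably well.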
The health states of the 2-phenotype model include well; undetected stage I-IV "aggressive" ovarian cancer; detected stage I-IV "aggressive" ovarian cancer; undetected stage I-IV "indolent" ovarian cancer; detected stage I-IV "indolent" ovarian cancer; benign oophorectomy; ovarian cancer survivor; death from ovarian cancer; and death from other causes. All model assumptions are inherited from the 1-phenotype model, with the exception that ovarian cancer survivors are those who survive 15 years post diagnosis. All model parameters are estimated using the methods described in Havrilesky et al. [2008]: the phenotype-specific transition probabilities between two cancer stages are estimated by searching over clinically justifiable values to fit the SEER data, including the lifetime probability of developing ovarian cancer, stage distribution, and lifetime probability of death from ovarian cancer; ovarian cancer incidence is estimated from the age-specific SEER data; the transition probabilities involving ovarian cancer survival are estimated from the age- and stage-specific SEER data; and the age-specific probability of death from other causes is based on the age-specific U.S. Life Tables data. The length of a cycle is 1 month. All patients enter the model at age 40 and die by age 100. Following model calibration, age-specific probabilities of benign oophorectomy and mortality from oophorectomy are estimated from literature data (Keshavarz et al. [2002]; Merrill [2006]; Wingo et al. [1985]). The model is then validated against the prevalence screen results of the UKCTOCS trial (Menon et al. [2009]), using a screening test of 89.5% sensitivity and 99.8% specificity. Specifically, they simulate a cohort of 1 million women with an age distribution approximating that of the UKCTOCS population and with the relative risk of developing ovarian cancer set at 2, so that the prevalence of ovarian cancer mimics the UKCTOCS prevalence.
The simulated model outputs estimates of the stage distribution of detections and the positive predictive value (PPV) of the screening for validation. Screening strategies evaluated in this study include no screening and screening at intervals of 3-36 months. In the base case, annual screening of 89.5% sensitivity and 99.8% specificity is performed on women between ages 50 and 85. The base-case analysis results in a PPV of 14% and an average of 0.05 lifetime false positives. In addition, a 14.7% mortality reduction is achieved in the 1-phenotype model, whereas only a 10.9% reduction is achieved in the 2-phenotype model, where 68% of cancer deaths result from the aggressive phenotype. The smaller mortality reduction predicted by the 2-phenotype model may be due to the fact that indolent cancers are generally diagnosed at an early stage whereas aggressive cancers are often diagnosed at an advanced stage. Tables 4.1 and 4.2 summarize the modeling studies reviewed in this section.

Table 4.1: A summary of ovarian cancer modeling studies

Study | Screening Strategies | Screening Period | Target Population | Study Outcomes | Major Findings
Skates and Singer [1991] | annual CA 125 | 30 years | ages 50-75 | LYS per case | 3.42 LYS
Urban et al. [1997] | annual TVS or CA 125; CA 125 + TVS every 6, 12, 24 months | 30 years | age 50 | LYS and cost/LYS | CA 125 + TVS every 6 or 12 months are cost-effective
Drescher et al. [2012] | annual CA 125 + TVS; annual CA 125 + HI(a); annual HB(b) + TVS; annual HB + HI | 40 years | age 45 | mortality reduction, LYS, and cost/LYS | annual CA 125 + TVS reduces mortality by 13%; HB can reduce mortality by at least 25%
Schapira et al. [1993] | one-time CA 125 + TVS | NA(c) | age 40 | LYS | screening increases the average life expectancy by less than 1 day
Havrilesky et al. [2008] | HS(d) every 3-36 months | 35 years | age 50 | mortality reduction, lifetime false-positive tests, PPV(e), and ICER (cost/LYS) | annual screening reduces mortality by 43%; annual screening for a high-risk population can be cost-effective
Havrilesky et al. [2011] | HS every 3-36 months | 35 years | age 50 | mortality reduction, PPV, and lifetime false-positive tests | annual screening reduces mortality by 10.9% in the 2-phenotype model and 14.7% in the 1-phenotype model

(a) Hypothetical imaging test. (b) Hypothetical biomarker test. (c) Not applicable. (d) Hypothetical screening test. (e) Positive predictive value.

Table 4.2: A summary of natural history models for ovarian cancer

Study | Model structure | Model parameters | Data sources | Parameter estimation
Skates and Singer [1991] | microsimulation | stage durations | expert opinion | modeled as a lognormal distribution
| | age at clinical detection | literature data | derived from data
| | stage at clinical detection | literature data, MGH database | derived from data
| | survival distribution | MGH database | maximum likelihood
| | CA 125 growth model | literature data | distribution fitting
Urban et al. [1997] | microsimulation | stage durations | expert opinion, trial | modeled as a lognormal distribution
| | age at clinical detection | SEER database | derived from data
| | stage at detection | SEER database | derived from data
| | survival distribution | SEER database | Kaplan-Meier
| | CA 125 growth model | literature data | distribution fitting
Drescher et al. [2012] | microsimulation | disease durations | expert opinion | derived from data
| | age at clinical detection | SEER database | derived from data
| | tumor characteristics at detection | SEER database | derived from data
| | survival distribution | SEER database | derived from data
| | competing risk mortality | U.S. Vital Statistics | derived from data
Schapira et al. [1993] | decision tree | prevalence | literature data | derived from data
| | early-stage disease proportion | - | assumption
| | early-stage disease detection probability | literature data | derived from data
| | life expectancy | U.S. Vital Statistics | DEALE method(1)
Havrilesky et al. [2011, 2008] | Markov model | transition probabilities | SEER database, U.S. Life Tables, literature data | direct derivation, calibration

(1) Beck et al. [1982].

4.2 Screening Detection Models

In this section, we review three screening detection models of ovarian cancer, which estimate the window of opportunity for ovarian cancer screening.

Lutz et al. [2008] develop a linear one-compartment model that relates secreted tumor biomarker plasma levels to tumor size, with application to ovarian cancer and prostate cancer. The model assumes that the compartment represents well-mixed and kinetically homogeneous plasma, that tumor cells secrete the biomarker into the extracellular fluid outside of the plasma, and that only a percentage of the secreted biomarker enters the plasma compartment at a continuous rate. Under these model assumptions, the tumor biomarker plasma level at steady state is expressed as a function of the serum half-life of the biomarker and the biomarker secretion rates from normal cells and from tumor cells. The calculated biomarker plasma level is then extrapolated to the total plasma volume of the patient to obtain the total biomarker level. The minimal number of tumor cells required to reach the secreted biomarker level is calculated using the total biomarker level and the percentage of the biomarker entering the plasma compartment. The minimal tumor size is then estimated from the minimal number of tumor cells required.
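The steady-state logic of the one-compartment model can be sketched as follows. The parameter values are illustrative placeholders rather than Lutz et al.'s estimates, and the mass-balance form (influx equals first-order elimination at steady state) is a generic reconstruction, not their exact equations.

```python
import math

# Illustrative placeholder parameters (not Lutz et al.'s estimates).
HALF_LIFE_DAYS = 5.0            # serum half-life of the biomarker (e.g. CA 125)
PLASMA_VOLUME_L = 3.0           # total plasma volume of the patient
FRACTION_TO_PLASMA = 0.10       # share of secreted biomarker entering plasma
SECRETION_RATE = 1e-4           # units secreted per tumor cell per day
DETECTION_THRESHOLD = 35.0      # U/ml assay positivity cutoff
CELL_VOLUME_MM3 = 1e-6          # volume of a single tumor cell

k_elim = math.log(2) / HALF_LIFE_DAYS        # first-order elimination rate (1/day)

def steady_state_level(n_tumor_cells):
    """Plasma concentration (U/ml) at steady state: influx = elimination."""
    influx = FRACTION_TO_PLASMA * SECRETION_RATE * n_tumor_cells   # units/day
    return influx / (k_elim * PLASMA_VOLUME_L * 1000.0)            # per ml of plasma

def minimal_detectable_cells():
    """Smallest tumor cell count whose steady-state level reaches the cutoff."""
    return (DETECTION_THRESHOLD * k_elim * PLASMA_VOLUME_L * 1000.0
            / (FRACTION_TO_PLASMA * SECRETION_RATE))

min_cells = minimal_detectable_cells()
min_tumor_mm3 = min_cells * CELL_VOLUME_MM3
```

Because the minimal detectable size is inversely proportional to both the secretion rate and the fraction reaching plasma, varying those two inputs produces the wide size ranges Lutz et al. report.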
Under the setting of ovarian cancer, the minimally detectable tumor size ranges from 0.11 mm^3 to 3610.14 mm^3 under various combinations of the CA 125 test assay sensitivity and CA 125 secretion rate, assuming that CA 125 is secreted by the tumor cells only and that 10% of CA 125 reaches the plasma. If CA 125 is secreted by both normal and tumor cells, the minimally detectable tumor size that gives a positive test result ranges from 116.7 mm^3 to 1.52 × 10^6 mm^3 under various combinations of the percentage of CA 125 reaching the plasma and the CA 125 secretion rate.

Brown and Palmer [2009] estimate the duration of the "occult period" (i.e., the preclinical phase) of serous ovarian cancer through the prevalence of non-advanced occult serous cancer in prophylactic bilateral salpingo-oophorectomy (PBSO) specimens and the incidence of serous ovarian cancer in a cohort of women with BRCA1 mutations. The duration of the occult period is equal to the prevalence divided by the incidence, where the prevalence and incidence are estimated from literature data (Agoff et al. [2004]; Callahan et al. [2007]; Carcangiu et al. [2006]; Colgan et al. [2001]; Finch et al. [2006a,b]; Kauff et al. [2002]; Laki et al. [2007]; Lamb et al. [2006]; Lee et al. [2006]; Leeper et al. [2002]; Liede et al. [2002]; Medeiros et al. [2006]; Olivier et al. [2004]; Paley et al. [2001]; Powell et al. [2005]; Rebbeck et al. [2002]). Stage progression is modeled as a function of tumor size and as a function of diameter using separate Kaplan-Meier analyses. Exponential models of tumor growth for early- and advanced-stage tumors are then fitted to the data from the PBSO studies using the Monte Carlo method. The life histories of 1000 tumors are then simulated through the Monte Carlo method based on the tumor growth models, the stage progression model, and the probability of clinical diagnosis as a function of size (van Nagell et al. [2007]).
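The screening experiment built on these simulated life histories (exponential tumor growth, a hard minimum-size detection threshold, a fixed screening interval) can be sketched as follows. The growth-rate distribution, initial diameter, and stage-transition diameter are illustrative placeholders, not the Brown-Palmer estimates.

```python
import math
import random

random.seed(1)

# Illustrative placeholders, not the Brown-Palmer estimates.
D0 = 0.05                      # tumor diameter at inception (cm)
D_ADVANCED = 3.0               # diameter at progression to advanced stage (cm)
SCREEN_INTERVAL = 1.0          # years between screens
THRESHOLD = 0.5                # minimum detectable tumor diameter (cm)

def simulate_tumor():
    """Return True if the tumor is screen-detected while still early stage."""
    # Random exponential diameter growth rate (per year) for this tumor.
    growth = random.lognormvariate(math.log(0.8), 0.4)
    t_detectable = math.log(THRESHOLD / D0) / growth   # crosses size threshold
    t_advanced = math.log(D_ADVANCED / D0) / growth    # progresses to advanced
    # Screens occur at t0, t0 + interval, ... with a random phase t0.
    t0 = random.uniform(0.0, SCREEN_INTERVAL)
    # First screen at or after the tumor crosses the detection threshold;
    # detection is assumed certain above the threshold, impossible below it.
    k = max(0, math.ceil((t_detectable - t0) / SCREEN_INTERVAL))
    t_first_detection = t0 + k * SCREEN_INTERVAL
    return t_first_detection < t_advanced              # caught while early stage

n = 20_000
early_stage_sensitivity = sum(simulate_tumor() for _ in range(n)) / n
```

Sweeping `THRESHOLD` and `SCREEN_INTERVAL` over a grid reproduces the kind of sensitivity-versus-threshold trade-off reported below.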
Hypothetical testing of these 1000 simulated tumors with varying screening intervals (3, 6, 12, or 24 months) and minimum detectable tumor diameters is performed to estimate the sensitivity and survival benefits of early detection. The study assumes that the screening test will fail to detect a tumor below the minimum detectable tumor size with certainty and will always succeed in detecting a tumor above the minimum size threshold. The mean duration of the entire occult period is estimated to be 5.1 years, with 4.3 years spent as an early-stage tumor, suggesting that serous ovarian cancers tend to progress to advanced stages 1 year before becoming clinically detectable. Also, the study finds that most occult serous cancers are less than 1 cm in diameter, and that the median diameter of an advanced serous tumor is 3 cm. To detect early-stage tumors in a normal-risk population, an annual screen would need a minimum size threshold of 1.3 cm to achieve 50% sensitivity and 0.4 cm to achieve 80% sensitivity. Finally, a 50% five-year mortality reduction requires an annual screen that is able to detect tumors of at most 0.5 cm in diameter, which demands extremely sensitive biomarker assays.

Danesh et al. [2012] estimate the window of opportunity for ovarian cancer screening by modeling the progression of ovarian cancer as a multi-type continuous-time branching process. The model assumes that ovarian cancer has three cell subtypes:
Theoretical results on branching processes require the knowledge of µ 1 µ 2 ,whichisestimatedtobe10 4 . The window of opportunity is defined as the di↵erence of the time when the number of metastatic cells first reaches 10 9 (approximately 1 cm in diameter) and the time when the number of primary cells first reaches 6.5⇥ 10 7 (approximately 0.5 cm in diameter). The analysis assumes that tumors with diameter greater than 0.5 cm will be detected by the screening test and that there are no false negatives. Under such assumptions, the window of opportunity is 2.9 years, most probably between 30 and 36 months before metastasis. The study concludes that e↵ective screening for ovarian cancer needs to be done at least every two years. 4.3 Cancer Prevention Models GeneticmutationofcancersusceptibilitygenessuchasBRCAisanestablishedriskfactorfor ovarian cancer (American Cancer Society [2017]; Permuth-Wey and Sellers [2009]; Schorge et al. [2010]). Although only 0.1% to 0.2% of the general population are BRCA mutation carriers(Dowdy et al. [2004]), thecumulativerisk ofovarian cancer by age 80isestimated to be44%forBRCA1mutationcarriersand17%forBRCA2mutationcarriers(Kuchenbaecker et al. [2017]), as opposed to the SEER projected lifetime risk of 1.3% for the general pop- ulation. Additionally, the mean age at diagnosis for BRCA-associated cases is significantly 86 younger than that for non-BRCA-associated cases (Boyd et al. [2000]). This suggests that disease initiation is risk-di↵erentiated. Risk-reducing surgeries include the removal of fallop- ian tubes (salpingectomy) and ovaries (oophorectomy), on both sides (bilateral) or only one side (unilateral). As more studies reveal evidence of fallopian tube lesions as a precursor of serous ovarian cancer (Kindelberger et al. [2007]; Labidi-Galy et al. [2017]; Lee et al. [2007]; Piek et al. [2001]), salpingectomy becomes an increasingly important option as a prophylactic measure for ovarian cancer. 
The Society of Gynecologic Oncology recommends bilateral salpingo-oophorectomy (surgical removal of the fallopian tubes and ovaries) between ages 35 and 40 as a risk-reducing strategy for women at increased genetic risk of ovarian cancer (Walker et al. [2015]). Salpingectomy at the completion of childbearing with delayed oophorectomy is also recommended as an option (Walker et al. [2015]). Randomized clinical trials to examine the effects of these prophylactic surgeries might be infeasible, even among the very small fraction of women who are at high risk, although model-based analyses might provide insight. In this section, we review the model-based analyses that examine the impact of prophylactic surgeries for ovarian cancer.

4.3.1 Models Without Salpingectomy

Before the recent finding of the role of fallopian tube disease in serous ovarian cancer, cancer prevention models tended to focus on women with BRCA mutations and the implications of having such mutations. As a result, both breast and ovarian cancers are incorporated in these models, as these women are at elevated risk for both (American Cancer Society [2017]; Schorge et al. [2010]). These model-based analyses generally examine the impact of prophylactic mastectomy and prophylactic oophorectomy. In this section, we briefly review these models.

Schrag et al. [1997] develop a Markov model for women with BRCA mutations and compare the effect on life expectancy of nine treatment strategies involving prophylactic mastectomy and prophylactic oophorectomy, performed immediately or with a 10-year delay. van Roosmalen et al. [2002] adopt a modeling approach similar to that of Schrag et al. [1997] and evaluate the impact of prophylactic surgery (mastectomy and oophorectomy) and/or screening for women carrying BRCA1 mutations. Grann et al.
[1998] develop a Markov model for women with BRCA mutations to investigate the cost-effectiveness of four strategies: prophylactic mastectomy, prophylactic oophorectomy, both prophylactic mastectomy and prophylactic oophorectomy, and surveillance. Using a model structure identical to Grann et al. [1998], Norum et al. [2008] develop a Markov model for Norwegian BRCA1 carriers and examine the cost-effectiveness of performing prophylactic bilateral salpingo-oophorectomy at age 35, with or without a prophylactic bilateral mastectomy at age 30, compared to no intervention. The same conclusions are drawn from these model-based analyses: the combined strategy involving both prophylactic mastectomy and prophylactic oophorectomy is effective or cost-effective compared to other strategies.

Grann et al. [2000] develop a Markov model for BRCA mutation carriers to evaluate the survival, quality-adjusted survival, and cost-effectiveness of seven prophylactic strategies: tamoxifen, raloxifene, oral contraceptives, prophylactic mastectomy, prophylactic bilateral salpingo-oophorectomy, both surgeries, and surveillance alone. The study of Grann et al. [2000] leads to a series of model-based analyses with a similar focus. Grann et al. [2002] perform an analysis that is highly similar to Grann et al. [2000] but incorporates different assumptions on the risk level for BRCA mutation carriers. Anderson et al. [2006] update the model of Grann et al. [2002] by analyzing BRCA1 and BRCA2 mutations separately, using the age- and mutation-specific breast cancer and ovarian cancer incidence based on King et al. [2003]. Grann et al. [2011] update the analysis of Anderson et al. [2006] by considering strategies involving mammography and MRI. The results from these model-based analyses are similar: prophylactic bilateral salpingo-oophorectomy, with or without prophylactic mastectomy, is effective or cost-effective.
4.3.2 Models With Salpingectomy

As salpingectomy gains more attention as an important option for ovarian cancer prevention, more ovarian cancer models are being developed to examine its impact. In this section, we review these models.

Kwon et al. [2013] develop a Markov Monte Carlo simulation model to compare the costs and benefits of salpingectomy with bilateral salpingo-oophorectomy among women with BRCA mutations. As in the models reviewed in §4.3.1, they consider both breast and ovarian cancers in their model. They conclude that bilateral salpingo-oophorectomy confers the highest reduction in breast and ovarian cancer risks.

Kwon et al. [2015] focus on the cost-effectiveness of opportunistic salpingectomy as an ovarian cancer prevention strategy in the general population. As in Kwon et al. [2013], they develop a Markov Monte Carlo simulation model. There are four health states: well (i.e., not at risk), at risk, ovarian cancer, and death. Pre-menopausal women enter the model in the "at risk" state, and women who survive ovarian cancer after 10 years transition to the "well" state. Two hypothetical cohorts of women are considered separately. The first cohort consists of 28,000 women aged 45 who undergo hysterectomy for benign conditions and are eligible for elective salpingectomy. This cohort represents the number of women who had hysterectomy with ovarian preservation in Canada in 2011 (McAlpine et al. [2014]). Three strategies are considered for this cohort: hysterectomy alone, hysterectomy with salpingectomy, and hysterectomy with bilateral salpingo-oophorectomy. The second cohort consists of 25,000 women aged 35 seeking surgical sterilization. For this cohort, tubal ligation and salpingectomy are considered. Each cohort is modeled for 40 years, and 100 simulations are performed for each. Age-dependent competing risk mortality is based on Canadian Life Tables (Statistics Canada [2014]).
Lifetime ovarian cancer risk and 10-year overall survival with ovarian cancer are estimated from the Canadian Cancer Society [2014] and SEER. Model parameters, including the ovarian cancer risk reduction attributable to bilateral salpingo-oophorectomy and salpingectomy and the proportion of women discontinuing hormone therapy five years after oophorectomy, are derived from the literature (e.g., Cibula et al. [2011]; Kindelberger et al. [2007]; Read et al. [2010]; Rice et al. [2014, 2012]; Seidman et al. [2011]). Sensitivity analyses on treatment costs, age at surgery, and the risk reduction associated with salpingectomy are conducted.

Using cost data from sources such as the British Columbia Medical Services Plan, the Canadian Institute for Health Information, and the Ontario Cancer Registry, hysterectomy with salpingectomy has the lowest cost and gains more life-years than hysterectomy alone or hysterectomy with oophorectomy, although the differences are small. Compared to hysterectomy alone, concomitant salpingectomy reduces ovarian cancer risk by 38.1%, and concomitant oophorectomy by 88.1%. Compared to tubal ligation, salpingectomy is cost-effective. Tubal ligation with concomitant salpingectomy achieves a 29.2% reduction in ovarian cancer risk. Results are sensitive to the age at hysterectomy and the estimates of risk reduction associated with tubal ligation and salpingectomy.

Dilley et al. [2017] develop a decision tree model for ovarian cancer to assess the cost-effectiveness of salpingectomy at the time of hysterectomy for benign indications and at permanent contraception for women in the general population. As in Kwon et al. [2015], they consider two cohorts of women: those seeking hysterectomy at age 45 and those seeking surgical sterilization at age 35. The lifetime risk of ovarian cancer (estimated at 1.3%) and ovarian cancer mortality 5 years post-diagnosis (54%) are based on SEER.
The estimates of risk reduction associated with tubal ligation, bilateral salpingectomy, and hysterectomy are derived from Falconer et al. [2015]. The surgical complication rates are based on published studies such as Jamieson et al. [2000]. For the permanent contraception model, women may experience unintended pregnancy or ectopic pregnancy after tubal ligation or salpingectomy. The probabilities of these complications are derived from Peterson et al. [1997]. One-way sensitivity analysis is performed on all variables. Additionally, a probabilistic sensitivity analysis is conducted in which each probability in the model is modeled using a beta distribution. A total of 1,000 simulations are performed in the probabilistic sensitivity analysis.

For the hysterectomy cohort, hysterectomy with salpingectomy is the dominant, cost-saving strategy. For the permanent contraception cohort, salpingectomy is cost-effective. The results are sensitive to the risk reduction estimates, complication rates, and procedural costs. The probabilistic sensitivity analysis concludes that salpingectomy is cost-effective 62.3% of the time for the hysterectomy model and 55% of the time for the permanent contraception model, at a willingness-to-pay of $100,000 per QALY.

4.4 Conclusion

In this chapter, we have reviewed three types of model-based analyses for ovarian cancer:
• models of the natural history of ovarian cancer,
• screening detection models that estimate the window of opportunity for early detection, and
• cancer prevention models that incorporate prophylactic surgeries.

Most of the models assume that ovarian cancer is homogeneous. However, Köbel et al. [2008] conclude that the subtypes of ovarian cancer (serous, clear cell, endometrioid, and mucinous carcinomas) are different diseases and should not be analyzed as a whole. The evidence of fallopian tube lesions as a precursor of serous ovarian cancer (Kindelberger et al. [2007]; Labidi-Galy et al. [2017]; Lee et al. [2007]; Piek et al.
[2001]) further highlights the need for subtype-specific analyses. None of the models reviewed in this chapter incorporate the precursor states. Moreover, only three ovarian cancer models examine the effect of salpingectomy, despite the uptake in its utilization.

Most cancer prevention models focus on women with BRCA mutations and the implications of such mutations. The conclusions drawn from these models may not generalize to women at low risk. The few cancer prevention models that focus on the general population do not differentiate women according to their risk levels; as a result, they do not permit the assessment of risk-differentiated prophylactic strategies. Because BRCA mutation carriers have a much higher risk of developing ovarian cancer than women at general risk, a risk-differentiated model can facilitate such assessment and provide a better understanding of the benefits for women at different risk levels.

Consistent with our general observations in Chapter 2, disclosure of the details regarding calibration in the model-based analyses for ovarian cancer reviewed in §4.3 is nonexistent or lacks transparency. Methods to incorporate expert judgment into the models are typically not documented, and the process of modeling the relationship between the risk reduction of prophylactic surgeries and the transition probabilities is unclear. In addition, model validation is nonexistent in the cancer prevention models discussed in §4.3.

Sensitivity analyses on the model parameters (for instance, the duration of stage I cancer, the probability of undergoing a surgical procedure, screening test sensitivity and specificity, test costs, and risk reduction estimates for prophylactic surgeries) are generally conducted; however, these analyses change one model parameter at a time. Such one-way sensitivity analyses do not account for the complex relationships among the model parameters. Only one study (i.e., Dilley et al.
[2017]) performs a probabilistic sensitivity analysis, and there are no sensitivity analyses on calibration uncertainty.

Since natural history is the natural progression of a disease in the absence of interventions, models describing the natural history should not involve interventions such as bilateral oophorectomy, as in Havrilesky et al. [2008] and Havrilesky et al. [2011]. Such interventions should be imposed on the natural history model rather than incorporated into it.

The cancer prevention models discussed in §4.3 are generally simplistic models with minimal health states. The cancer-related health states are generally not differentiated by cancer stage, and pre-diagnostic health states are nonexistent. The lack of these health states might misrepresent the cost-effectiveness of a prophylactic surgery, since such surgery targets women without a cancer diagnosis.

Since screening is undertaken in an effort to detect latent cancers and prophylactic surgery is performed prior to clinical diagnosis, accurate descriptions of the latent period of the natural history are the keys to designing effective tools to study screening and/or prophylactic protocols. However, the lack of longitudinal data from large cohort studies such as RCTs renders parameter estimation for the natural history model of ovarian cancer difficult. The lack of longitudinal data also challenges the estimation of the window of opportunity for early detection and prevention. By applying our calibration model to ovarian cancer data, we can perform model-based analyses to investigate the potential of screening and/or prophylactic strategies for ovarian cancer.

4.5 Bibliography

Agoff, S. N., Garcia, R. L., Goff, B., and Swisher, E. (2004). Follow-up of in situ and early-stage fallopian tube carcinoma in patients undergoing prophylactic surgery for proven or suspected BRCA-1 or BRCA-2 mutations. The American Journal of Surgical Pathology, 28(8):1112–1114.
American Cancer Society (2017). Cancer Facts & Figures 2017. American Cancer Society, Atlanta.

Anderson, G. L., McIntosh, M., Wu, L., Barnett, M., Goodman, G., Thorpe, J. D., Bergan, L., Thornquist, M. D., Scholler, N., Kim, N., et al. (2010). Assessing lead time of selected ovarian cancer biomarkers: A nested case–control study. Journal of the National Cancer Institute, 102(1):26–38.

Anderson, K., Jacobson, J. S., Heitjan, D. F., Zivin, J. G., Hershman, D., Neugut, A. I., and Grann, V. R. (2006). Cost-effectiveness of preventive strategies for women with a BRCA1 or a BRCA2 mutation. Annals of Internal Medicine, 144(6):397–406.

Beck, J. R., Kassirer, J. P., and Pauker, S. G. (1982). A convenient approximation of life expectancy (the “DEALE”): I. Validation of the method. The American Journal of Medicine, 73(6):883–888.

Boyd, J., Sonoda, Y., Federici, M. G., Bogomolniy, F., Rhei, E., Maresco, D. L., Saigo, P. E., Almadrones, L. A., Barakat, R. R., Brown, C. L., et al. (2000). Clinicopathologic features of BRCA-linked and sporadic ovarian cancer. JAMA, 283(17):2260–2265.

Brown, P. O. and Palmer, C. (2009). The preclinical natural history of serous ovarian cancer: Defining the target for early detection. PLoS Medicine, 6(7):1–14.

Buys, S. S., Partridge, E., Black, A., Johnson, C. C., Lamerato, L., Isaacs, C., Reding, D. J., Greenlee, R. T., Yokochi, L. A., Kessel, B., et al. (2011). Effect of screening on ovarian cancer mortality: The Prostate, Lung, Colorectal and Ovarian (PLCO) Cancer Screening Randomized Controlled Trial. JAMA, 305(22):2295–2303.

Callahan, M. J., Crum, C. P., Medeiros, F., Kindelberger, D. W., Elvin, J. A., Garber, J. E., Feltmate, C. M., Berkowitz, R. S., and Muto, M. G. (2007). Primary fallopian tube malignancies in BRCA-positive women undergoing surgery for ovarian cancer risk reduction. Journal of Clinical Oncology, 25(25):3985–3990.

Canadian Cancer Society (2014). Canadian Cancer Statistics 2014. Canadian Cancer Society, Toronto (Canada).
Available at: http://www.cancer.ca/ /media/cancer.ca/CW/cancer information/cancer 101/Canadian cancer statistics/Canadian-Cancer-Statistics-2014-EN.pdf.

Cantor, S. B., Spann, S. J., Volk, R. J., Cardenas, M. P., and Warren, M. M. (1995). Prostate cancer screening: A decision analysis. The Journal of Family Practice, 41(1):33–41.

Carcangiu, M. L., Peissel, B., Pasini, B., Spatti, G., Radice, P., and Manoukian, S. (2006). Incidental carcinomas in prophylactic specimens in BRCA1 and BRCA2 germ-line mutation carriers, with emphasis on fallopian tube lesions: Report of 6 cases and review of the literature. The American Journal of Surgical Pathology, 30(10):1222–1230.

Cibula, D., Widschwendter, M., Majek, O., and Dusek, L. (2011). Tubal ligation and the risk of ovarian cancer: Review and meta-analysis. Human Reproduction Update, 17(1):55–67.

Colgan, T. J., Murphy, J., Cole, D. E., Narod, S., and Rosen, B. (2001). Occult carcinoma in prophylactic oophorectomy specimens: Prevalence and association with BRCA germline mutation status. The American Journal of Surgical Pathology, 25(10):1283–1289.

Cutler, S. J. and Young, J. L., editors (1975). Third National Cancer Survey: Incidence Data, National Cancer Institute Monograph 41, DHEW Publication No. (NIH) 75-787. U.S. Government Printing Office, Washington, DC.

Danesh, K., Durrett, R., Havrilesky, L. J., and Myers, E. (2012). A branching process model of ovarian cancer. Journal of Theoretical Biology, 314:10–15.

Dilley, S. E., Havrilesky, L. J., Bakkum-Gamez, J., Cohn, D. E., Straughn, J. M., Caughey, A. B., and Rodriguez, M. I. (2017). Cost-effectiveness of opportunistic salpingectomy for ovarian cancer prevention. Gynecologic Oncology, 146(2):373–379.

Dowdy, S. C., Stefanek, M., and Hartmann, L. C. (2004). Surgical risk reduction: Prophylactic salpingo-oophorectomy and prophylactic mastectomy. American Journal of Obstetrics and Gynecology, 191(4):1113–1123.

Drescher, C. W., Hawley, S., Thorpe, J. D., Marticke, S., McIntosh, M., Gambhir, S.
S., and Urban, N. (2012). Impact of screening test performance and cost on mortality reduction and cost-effectiveness of multimodal ovarian cancer screening. Cancer Prevention Research, 5(8):1015–1024.

Falconer, H., Yin, L., Grönberg, H., and Altman, D. (2015). Ovarian cancer risk after salpingectomy: A nationwide population-based study. Journal of the National Cancer Institute, 107(2):dju410.

Finch, A., Beiner, M., Lubinski, J., Lynch, H. T., Moller, P., Rosen, B., Murphy, J., Ghadirian, P., Friedman, E., Foulkes, W. D., et al. (2006a). Salpingo-oophorectomy and the risk of ovarian, fallopian tube, and peritoneal cancers in women with a BRCA1 or BRCA2 mutation. JAMA, 296(2):185–192.

Finch, A., Shaw, P., Rosen, B., Murphy, J., Narod, S. A., and Colgan, T. J. (2006b). Clinical and pathologic findings of prophylactic salpingo-oophorectomies in 159 BRCA1 and BRCA2 carriers. Gynecologic Oncology, 100(1):58–64.

Fryback, D. G., Stout, N. K., Rosenberg, M. A., Trentham-Dietz, A., Kuruchittham, V., and Remington, P. L. (2006). The Wisconsin breast cancer epidemiology simulation model. Journal of the National Cancer Institute Monographs, 2006(36):37–47.

Grann, V. R., Jacobson, J. S., Thomason, D., Hershman, D., Heitjan, D. F., and Neugut, A. I. (2002). Effect of prevention strategies on survival and quality-adjusted survival of women with BRCA1/2 mutations: An updated decision analysis. Journal of Clinical Oncology, 20(10):2520–2529.

Grann, V. R., Jacobson, J. S., Whang, W., Hershman, D., Heitjan, D. F., Antman, K. H., and Neugut, A. I. (2000). Prevention with tamoxifen or other hormones versus prophylactic surgery in BRCA1/2-positive women: A decision analysis. The Cancer Journal from Scientific American, 6(1):13–20.

Grann, V. R., Panageas, K. S., Whang, W., Antman, K. H., and Neugut, A. I. (1998). Decision analysis of prophylactic mastectomy and oophorectomy in BRCA1-positive or BRCA2-positive patients. Journal of Clinical Oncology, 16(3):979–985.

Grann, V.
R., Patel, P. R., Jacobson, J. S., Warner, E., Heitjan, D. F., Ashby-Thompson, M., Hershman, D. L., and Neugut, A. I. (2011). Comparative effectiveness of screening and prevention strategies among BRCA1/2-affected mutation carriers. Breast Cancer Research and Treatment, 125(3):837–847.

Havrilesky, L. J., Sanders, G. D., Kulasingam, S., Chino, J. P., Berchuck, A., Marks, J. R., and Myers, E. R. (2011). Development of an ovarian cancer screening decision model that incorporates disease heterogeneity. Cancer, 117(3):545–553.

Havrilesky, L. J., Sanders, G. D., Kulasingam, S., and Myers, E. R. (2008). Reducing ovarian cancer mortality through screening: Is it possible, and can we afford it? Gynecologic Oncology, 111(2):179–187.

Jacobs, I. J., Menon, U., Ryan, A., Gentry-Maharaj, A., Burnell, M., Kalsi, J. K., Amso, N. N., Apostolidou, S., Benjamin, E., Cruickshank, D., et al. (2016). Ovarian cancer screening and mortality in the UK Collaborative Trial of Ovarian Cancer Screening (UKCTOCS): A randomised controlled trial. The Lancet, 387(10022):945–956.

Jamieson, D. J., Hillis, S. D., Duerr, A., Marchbanks, P. A., Costello, C., Peterson, H. B., et al. (2000). Complications of interval laparoscopic tubal sterilization: Findings from the United States collaborative review of sterilization. Obstetrics & Gynecology, 96(6):997–1002.

Katsube, Y., Berg, J., and Silverberg, S. (1982). Epidemiologic pathology of ovarian tumors: A histopathologic review of primary ovarian neoplasms diagnosed in the Denver Standard Metropolitan Statistical Area, 1 July–31 December 1969 and 1 July–31 December 1979. International Journal of Gynecological Pathology, 1(1):3–16.

Kauff, N. D., Satagopan, J. M., Robson, M. E., Scheuer, L., Hensley, M., Hudis, C. A., Ellis, N. A., Boyd, J., Borgen, P. I., Barakat, R. R., et al. (2002). Risk-reducing salpingo-oophorectomy in women with a BRCA1 or BRCA2 mutation. New England Journal of Medicine, 346(21):1609–1615.

Keshavarz, H., Hillis, S. D., Kieke, B.
A., and Marchbanks, P. A. (2002). Hysterectomy surveillance—United States, 1994–1999. MMWR CDC Surveillance Summaries, 51(SS05):1–8.

Kindelberger, D. W., Lee, Y., Miron, A., Hirsch, M. S., Feltmate, C., Medeiros, F., Callahan, M. J., Garner, E. O., Gordon, R. W., Birch, C., et al. (2007). Intraepithelial carcinoma of the fimbria and pelvic serous carcinoma: Evidence for a causal relationship. The American Journal of Surgical Pathology, 31(2):161–169.

King, M.-C., Marks, J. H., Mandell, J. B., et al. (2003). Breast and ovarian cancer risks due to inherited mutations in BRCA1 and BRCA2. Science, 302(5645):643–646.

Kobayashi, T., Goto, R., Ito, K., and Mitsumori, K. (2007). Prostate cancer screening strategies with re-screening interval determined by individual baseline prostate-specific antigen values are cost-effective. European Journal of Surgical Oncology, 33(6):783–789.

Köbel, M., Kalloger, S. E., Boyd, N., McKinney, S., Mehl, E., Palmer, C., Leung, S., Bowen, N. J., Ionescu, D. N., Rajput, A., et al. (2008). Ovarian carcinoma subtypes are different diseases: Implications for biomarker studies. PLoS Medicine, 5(12):1749–1760.

Kong, C. Y., McMahon, P. M., and Gazelle, G. S. (2009). Calibration of disease simulation model using an engineering approach. Value in Health, 12(4):521–529.

Kuchenbaecker, K. B., Hopper, J. L., Barnes, D. R., Phillips, K.-A., Mooij, T. M., Roos-Blom, M.-J., Jervis, S., Van Leeuwen, F. E., Milne, R. L., Andrieu, N., et al. (2017). Risks of breast, ovarian, and contralateral breast cancer for BRCA1 and BRCA2 mutation carriers. JAMA, 317(23):2402–2416.

Kwon, J. S., McAlpine, J. N., Hanley, G. E., Finlayson, S. J., Cohen, T., Miller, D. M., Gilks, C. B., and Huntsman, D. G. (2015). Costs and benefits of opportunistic salpingectomy as an ovarian cancer prevention strategy. Obstetrics & Gynecology, 125(2):338–345.

Kwon, J. S., Tinker, A., Pansegrau, G., McAlpine, J., Housty, M., McCullum, M., and Gilks, C. B. (2013).
Prophylactic salpingectomy and delayed oophorectomy as an alternative for BRCA mutation carriers. Obstetrics & Gynecology, 121(1):14–24.

Labidi-Galy, S. I., Papp, E., Hallberg, D., Niknafs, N., Adleff, V., Noe, M., Bhattacharya, R., Novak, M., Jones, S., Phallen, J., Hruban, C. A., Hirsch, M. S., Lin, D. I., Schwartz, L., Maire, C. L., Tille, J.-C., Bowden, M., Ayhan, A., Wood, L. D., Scharpf, R. B., Kurman, R., Wang, T.-L., Shih, I.-M., Karchin, R., Drapkin, R., and Velculescu, V. E. (2017). High grade serous ovarian carcinomas originate in the fallopian tube. Nature Communications, 8(1):1093.

Laki, F., Kirova, Y. M., This, P., Plancher, C., Asselain, B., Sastre, X., Stoppa-Lyonnet, D., and Salmon, R. (2007). Prophylactic salpingo-oophorectomy in a series of 89 women carrying a BRCA1 or a BRCA2 mutation. Cancer, 109(9):1784–1790.

Lamb, J. D., Garcia, R. L., Goff, B. A., Paley, P. J., and Swisher, E. M. (2006). Predictors of occult neoplasia in women undergoing risk-reducing salpingo-oophorectomy. American Journal of Obstetrics and Gynecology, 194(6):1702–1709.

Lee, Y., Medeiros, F., Kindelberger, D., Callahan, M. J., Muto, M. G., and Crum, C. P. (2006). Advances in the recognition of tubal intraepithelial carcinoma: Applications to cancer screening and the pathogenesis of ovarian cancer. Advances in Anatomic Pathology, 13(1):1–7.

Lee, Y., Miron, A., Drapkin, R., Nucci, M., Medeiros, F., Saleemuddin, A., Garber, J., Birch, C., Mou, H., Gordon, R., et al. (2007). A candidate precursor to serous carcinoma that originates in the distal fallopian tube. The Journal of Pathology, 211(1):26–35.

Leeper, K., Garcia, R., Swisher, E., Goff, B., Greer, B., and Paley, P. (2002). Pathologic findings in prophylactic oophorectomy specimens in high-risk women. Gynecologic Oncology, 87(1):52–56.

Liede, A., Karlan, B. Y., Baldwin, R. L., Platt, L. D., Kuperstein, G., and Narod, S. A. (2002). Cancer incidence in a population of Jewish women at risk of ovarian cancer.
Journal of Clinical Oncology, 20(6):1570–1577.

Lutz, A. M., Willmann, J. K., Cochran, F. V., Ray, P., and Gambhir, S. S. (2008). Cancer screening: A mathematical model relating secreted blood biomarker levels to tumor sizes. PLoS Medicine, 5(8):1287–1297.

Mandelblatt, J. S., Lawrence, W. F., Womack, S. M., Jacobson, D., Yi, B., Hwang, Y.-t., Gold, K., Barter, J., and Shah, K. (2002). Benefits and costs of using HPV testing to screen for cervical cancer. JAMA, 287(18):2372–2381.

McAlpine, J. N., Hanley, G. E., Woo, M. M., Tone, A. A., Rozenberg, N., Swenerton, K. D., Gilks, C. B., Finlayson, S. J., Huntsman, D. G., and Miller, D. M. (2014). Opportunistic salpingectomy: Uptake, risks, and complications of a regional initiative for ovarian cancer prevention. American Journal of Obstetrics & Gynecology, 210(5):471.e1–471.e11.

McIntosh, M. W., Urban, N., and Karlan, B. (2002). Generating longitudinal screening algorithms using novel biomarkers for disease. Cancer Epidemiology Biomarkers & Prevention, 11(2):159–166.

McLay, L. A., Foufoulides, C., and Merrick, J. R. (2010). Using simulation-optimization to construct screening strategies for cervical cancer. Health Care Management Science, 13(4):294–318.

Medeiros, F., Muto, M. G., Lee, Y., Elvin, J. A., Callahan, M. J., Feltmate, C., Garber, J. E., Cramer, D. W., and Crum, C. P. (2006). The tubal fimbria is a preferred site for early adenocarcinoma in women with familial ovarian cancer syndrome. The American Journal of Surgical Pathology, 30(2):230–236.

Menon, U., Gentry-Maharaj, A., Hallett, R., Ryan, A., Burnell, M., Sharma, A., Lewis, S., Davies, S., Philpott, S., Lopes, A., et al. (2009). Sensitivity and specificity of multimodal and ultrasound screening for ovarian cancer, and stage distribution of detected cancers: Results of the prevalence screen of the UK Collaborative Trial of Ovarian Cancer Screening (UKCTOCS). The Lancet Oncology, 10(4):327–340.

Merrill, R. M. (2006).
Impact of hysterectomy and bilateral oophorectomy on race-specific rates of corpus, cervical, and ovarian cancers in the United States. Annals of Epidemiology, 16(12):880–887.

Moyer, V. A. (2012). Screening for ovarian cancer: U.S. Preventive Services Task Force reaffirmation recommendation statement. Annals of Internal Medicine, 157(12):900–904.

Myers, E. R., McCrory, D. C., Nanda, K., Bastian, L., and Matchar, D. B. (2000). Mathematical model for the natural history of human papillomavirus infection and cervical carcinogenesis. American Journal of Epidemiology, 151(12):1158–1171.

Norum, J., Hagen, A. I., Mæhle, L., Apold, J., Burn, J., and Møller, P. (2008). Prophylactic bilateral salpingo-oophorectomy (PBSO) with or without prophylactic bilateral mastectomy (PBM) or no intervention in BRCA1 mutation carriers: A cost-effectiveness analysis. European Journal of Cancer, 44(7):963–971.

Olivier, R., van Beurden, M., Lubsen, M., Rookus, M., Mooij, T., van de Vijver, M., and van’t Veer, L. (2004). Clinical outcome of prophylactic oophorectomy in BRCA1/BRCA2 mutation carriers and events during follow-up. British Journal of Cancer, 90(8):1492–1497.

Paley, P. J., Swisher, E. M., Garcia, R. L., Agoff, S. N., Greer, B. E., Peters, K. L., and Goff, B. A. (2001). Occult cancer of the fallopian tube in BRCA-1 germline mutation carriers at prophylactic oophorectomy: A case for recommending hysterectomy at surgical prophylaxis. Gynecologic Oncology, 80(2):176–180.

Partridge, E., Greenlee, R. T., Xu, J.-L., Kreimer, A. R., Williams, C., Riley, T., Reding, D. J., Church, T. R., Kessel, B., Johnson, C. C., et al. (2009). Results from four rounds of ovarian cancer screening in a randomized trial. Obstetrics and Gynecology, 113(4):775–782.

Permuth-Wey, J. and Sellers, T. A. (2009). Epidemiology of Ovarian Cancer, pages 413–437. Humana Press, Totowa, NJ.

Peterson, H. B., Xia, Z., Hughes, J. M., Wilcox, L. S., Tylor, L. R., and Trussell, J. (1997).
The risk of ectopic pregnancy after tubal sterilization. New England Journal of Medicine, 336(11):762–767.

Piek, J. M., van Diest, P. J., Zweemer, R. P., Jansen, J. W., Poort-Keesom, R. J., Menko, F. H., Gille, J. J., Jongsma, A. P., Pals, G., Kenemans, P., et al. (2001). Dysplastic changes in prophylactically removed Fallopian tubes of women predisposed to developing ovarian cancer. The Journal of Pathology, 195(4):451–456.

Powell, C. B., Kenley, E., Chen, L.-m., Crawford, B., McLennan, J., Zaloudek, C., Komaromy, M., Beattie, M., and Ziegler, J. (2005). Risk-reducing salpingo-oophorectomy in BRCA mutation carriers: Role of serial sectioning in the detection of occult malignancy. Journal of Clinical Oncology, 23(1):127–132.

Read, M. D., Edey, K. A., Hapeshi, J., and Foy, C. (2010). Compliance with estrogen hormone replacement therapy after oophorectomy: A prospective study. Menopause International, 16(2):60–64.

Rebbeck, T. R., Lynch, H. T., Neuhausen, S. L., Narod, S. A., van’t Veer, L., Garber, J. E., Evans, G., Isaacs, C., Daly, M. B., Matloff, E., et al. (2002). Prophylactic oophorectomy in carriers of BRCA1 or BRCA2 mutations. New England Journal of Medicine, 346(21):1616–1622.

Rice, M. S., Hankinson, S. E., and Tworoger, S. S. (2014). Tubal ligation, hysterectomy, unilateral oophorectomy, and risk of ovarian cancer in the Nurses’ Health Studies. Fertility and Sterility, 102(1):192–198.

Rice, M. S., Murphy, M. A., and Tworoger, S. S. (2012). Tubal ligation, hysterectomy and ovarian cancer: A meta-analysis. Journal of Ovarian Research, 5(1):13.

Rutter, C. M. and Savarino, J. E. (2010). An evidence-based microsimulation model for colorectal cancer: Validation and application. Cancer Epidemiology Biomarkers & Prevention, 19(8):1992–2002.

Schapira, M. M., Matchar, D. B., and Young, M. J. (1993). The effectiveness of ovarian cancer screening: A decision analysis model. Annals of Internal Medicine, 118(11):838–843.

Schorge, J. O., Modesitt, S. C., Coleman, R.
L., Cohn, D. E., Kauff, N. D., Duska, L. R., and Herzog, T. J. (2010). SGO white paper on ovarian cancer: Etiology, screening and surveillance. Gynecologic Oncology, 119(1):7–17.

Schrag, D., Kuntz, K. M., Garber, J. E., and Weeks, J. C. (1997). Decision analysis – Effects of prophylactic mastectomy and oophorectomy on life expectancy among women with BRCA1 or BRCA2 mutations. New England Journal of Medicine, 336(20):1465–1471.

Schultz, F., Boer, R., and de Koning, H. (2012). Description of MISCAN-lung, the Erasmus MC lung cancer microsimulation model for evaluating cancer control interventions. Risk Analysis, 32(s1):S85–S98.

Seidman, J. D., Zhao, P., and Yemelyanova, A. (2011). “Primary peritoneal” high-grade serous carcinoma is very likely metastatic from serous tubal intraepithelial carcinoma: Assessing the new paradigm of ovarian and pelvic serous carcinogenesis and its implications for screening for ovarian cancer. Gynecologic Oncology, 120(3):470–473.

Siebert, U., Sroczynski, G., Hillemanns, P., Engel, J., Stabenow, R., Stegmaier, C., Voigt, K., Gibis, B., Hölzel, D., and Goldie, S. J. (2006). The German cervical cancer screening model: Development and validation of a decision-analytic model for cervical cancer screening in Germany. The European Journal of Public Health, 16(2):185–192.

Skates, S. J. and Singer, D. E. (1991). Quantifying the potential benefit of CA 125 screening for ovarian cancer. Journal of Clinical Epidemiology, 44(4):365–380.

Statistics Canada (2014). Complete Life Table, Canada, 2000 to 2002: Females. Available at: http://www.statcan.gc.ca/pub/84-537-x/t/pdf/4198611-eng.pdf.

Tan, S. Y., van Oortmarssen, G. J., de Koning, H. J., Boer, R., and Habbema, J. D. F. (2006). The MISCAN-Fadia continuous tumor growth model for breast cancer. Journal of the National Cancer Institute Monographs, 2006(36):56–65.

Urban, N., Drescher, C., Etzioni, R., and Colby, C. (1997). Use of a stochastic simulation model to identify an efficient protocol for ovarian cancer screening.
Controlled Clinical Trials, 18(3):251–270.

van Nagell, J. R., DePriest, P. D., Ueland, F. R., DeSimone, C. P., Cooper, A. L., McDonald, J. M., Pavlik, E. J., and Kryscio, R. J. (2007). Ovarian cancer screening with annual transvaginal sonography. Cancer, 109(9):1887–1896.

van Roosmalen, M. S., Verhoef, L. C., Stalmeier, P. F., Hoogerbrugge, N., and van Daal, W. A. (2002). Decision analysis of prophylactic surgery or screening for BRCA1 mutation carriers: A more prominent role for oophorectomy. Journal of Clinical Oncology, 20(8):2092–2100.

Walker, J. L., Powell, C. B., Chen, L., Carter, J., Bae Jump, V. L., Parker, L. P., Borowsky, M. E., and Gibb, R. K. (2015). Society of Gynecologic Oncology recommendations for the prevention of ovarian cancer. Cancer, 121(13):2108–2120.

Wingo, P. A., Huezo, C. M., Rubin, G. L., Ory, H. W., and Peterson, H. B. (1985). The mortality risk associated with hysterectomy. American Journal of Obstetrics and Gynecology, 152(7):803–808.

Yabroff, K. R., Lamont, E. B., Mariotto, A., Warren, J. L., Topor, M., Meekins, A., and Brown, M. L. (2008). Cost of care for elderly cancer patients in the United States. Journal of the National Cancer Institute, 100(9):630–641.

Zauber, A. G., Lansdorp-Vogelaar, I., Knudsen, A. B., Wilschut, J., van Ballegooijen, M., and Kuntz, K. M. (2008). Evaluating test strategies for colorectal cancer screening: A decision analysis for the U.S. Preventive Services Task Force. Annals of Internal Medicine, 149(9):659–669.

Chapter 5: Calibration Uncertainty and Model-Based Analyses with Applications to Ovarian Cancer Modeling

5.1 Introduction

Medical decision making (MDM) studies typically include comparative analyses of disease screening or treatment options in order to understand the relative costs and benefits of various strategies and programs under consideration.
Randomized controlled trials (RCTs) are considered the gold standard for such purposes because the actual effects of different medical interventions across a representative segment of the population are observed and analyzed. However, RCTs are typically cost- and time-intensive, and the number of interventions that can be tested within a trial is limited. As a result, model-based analyses for MDM can be appealing as surrogates for RCTs.

Model-based analysis for MDM relies heavily upon a natural history (NH) model, which represents disease progression and regression in the absence of interventions. A NH model requires the specification of various model parameters, some of which may not be observable, and generates modeled outcomes that lead to a representation of the costs and benefits of interventions. The NH model can also be used to calculate modeled outcomes that can be compared to observed data from epidemiological and clinical studies. Because it serves as a replacement for direct observation, it is essential to ensure that a NH model is consistent with such data. The process of selecting model parameters that provide consistency between modeled outcomes and observed data is known as calibration.

In order to assess interventions such as screening, vaccination, etc., a NH model will necessarily incorporate phases of a disease that occur prior to diagnosis, and data associated with these phases are typically not available. This gives rise to a phenomenon known as “calibration uncertainty”, where distinct sets of model parameters offer consistency with observed data while providing decidedly different representations of unobservable phases. Because model parameters influence comparative analyses, insufficient examination of the impact of calibration uncertainty on model recommendations can lead to misleading interpretations of the conclusions drawn.
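The phenomenon can be illustrated with a minimal sketch. The toy model below is hypothetical (it is not the dissertation's model): diagnosed incidence is the product of an onset probability and a detection probability, so many distinct parameter pairs reproduce the same observed incidence while implying very different values for the unobservable mean preclinical duration.

```python
import itertools

# Toy illustration of calibration uncertainty (hypothetical model).
# Observable target: annual diagnosed incidence = p_onset * p_detect.
# Unobservable quantity: mean preclinical duration, taken as 1/p_detect.
TARGET = 0.0001   # observed annual incidence to match
TOL = 0.05        # accept parameter sets within 5% of the target

p_onset_grid = [i / 10000 for i in range(1, 51)]     # onset probabilities
p_detect_grid = [i / 100 for i in range(5, 101, 5)]  # detection probabilities

accepted = []
for p_onset, p_detect in itertools.product(p_onset_grid, p_detect_grid):
    modeled = p_onset * p_detect
    if abs(modeled - TARGET) / TARGET <= TOL:
        accepted.append((p_onset, p_detect))

durations = sorted(1 / p_detect for _, p_detect in accepted)
print(f"{len(accepted)} parameter sets fit the target; implied mean "
      f"preclinical duration ranges from {durations[0]:.1f} "
      f"to {durations[-1]:.1f} years")
```

Every accepted pair matches the observable target, yet the implied preclinical duration, which is exactly what a screening analysis depends on, spans an order of magnitude.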
In this chapter, we present an approach to model-based analysis for MDM that systematically examines the breadth of models that can plausibly represent the disease. We illustrate the approach within the context of ovarian cancer, with a particular emphasis on the corresponding variability of modeled outcomes that might impact typical comparative analyses of methods for early detection.

5.2 Model-Based Analysis for Ovarian Cancer

Ovarian cancer is the fifth deadliest cancer for females, with 22,500 estimated new cases and 14,000 estimated deaths in 2017 (American Cancer Society [2017]). While a small fraction (15%) of the cases are diagnosed at the localized stage, where the 5-year survival rate is 92%, the majority of the cases (60%) are diagnosed at the distant stage, where the 5-year survival rate is only 29% (American Cancer Society [2017]). Although early detection often increases treatment options and improves the survival outlook for cancer patients, the U.S. Preventive Services Task Force (USPSTF) does not recommend screening for ovarian cancer (Moyer [2012]).

The strongest empirical evidence of the effectiveness of cancer screening strategies often comes from RCTs. Two large RCTs involving ovarian cancer appear in the literature: the Prostate, Lung, Colorectal and Ovarian study (PLCO) (Buys et al. [2011]) and the United Kingdom Collaborative Trial of Ovarian Cancer Screening study (UKCTOCS) (Menon et al. [2009]). PLCO is a United States-based RCT targeting women from age 55 to 74, which considers screening based on either a biomarker (CA 125) or imaging (transvaginal ultrasound, TVS). The study concludes that the screening regimens examined did not reduce mortality from ovarian cancer. UKCTOCS is a United Kingdom-based RCT targeting postmenopausal women from age 50 to 74, which concludes that screening that combines CA 125 and TVS may reduce ovarian cancer mortality (Jacobs et al. [2016]).
The literature also includes a very small number of models of the natural history of ovarian cancer. Skates and Singer [1991] present a stochastic simulation model designed to evaluate the potential benefit of using the CA 125 radioimmunoassay to screen for ovarian cancer. Given the focus on a screening program, a representation of the disease prior to diagnosis is necessary. Thus, their model consists of four components:

• a model of the natural history of ovarian cancer,
• a model of the time of clinical detection of ovarian cancer,
• a model of the survival probability for each cancer stage, and
• a model of the screening program.

The natural history of ovarian cancer is modeled using the standard four-stage cancer staging system. The joint log-normal distribution of the durations of the four stages is represented as a function of two variables: the mean duration in stage I and the coefficient of variation of the duration of each stage, which is assumed to be constant across all stages. The survival distributions are differentiated by stage at diagnosis, and are derived via maximum likelihood estimation based on data from the Massachusetts General Hospital tumor registry. The resulting model is used to assess a comparison of post-diagnosis survival with, and without, a screening strategy based on CA 125 serum levels.

Urban et al. [1997] extend the model developed by Skates and Singer [1991] and evaluate six ovarian cancer screening strategies in terms of their efficacy and cost-effectiveness. Their natural history model is based on the four-stage log-normal model of Skates and Singer [1991]. Their model for clinical detection assigns each case an age and stage at diagnosis according to the age- and stage-specific distributions given by the Surveillance, Epidemiology, and End Results Program (SEER) of the National Cancer Institute (NCI). Departing from Skates and Singer, the age- and stage-specific survival distributions are estimated by applying the Kaplan-Meier method to the SEER data.
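The log-normal duration specification shared by Skates and Singer [1991] and Urban et al. [1997] can be sketched directly: given a desired mean duration and coefficient of variation, the (mu, sigma) parameters of the log-normal follow from its moments. The numeric values below (mean of 1.5 years, CV of 0.5) are illustrative, not the published estimates.

```python
import math
import random

def lognormal_params(mean, cv):
    """Convert a desired mean and coefficient of variation into the
    (mu, sigma) parameters of a log-normal distribution, using
    mean = exp(mu + sigma^2/2) and cv^2 = exp(sigma^2) - 1."""
    sigma2 = math.log(1 + cv**2)
    mu = math.log(mean) - sigma2 / 2
    return mu, math.sqrt(sigma2)

# Illustrative values only: a mean stage I duration of 1.5 years and a
# coefficient of variation of 0.5, the CV being held constant across
# stages as in the Skates-Singer formulation.
random.seed(0)
mu, sigma = lognormal_params(mean=1.5, cv=0.5)
durations = [random.lognormvariate(mu, sigma) for _ in range(100_000)]

sample_mean = sum(durations) / len(durations)
print(f"sample mean duration: {sample_mean:.2f} years")
```

With this parameterization, the sampled stage durations recover the specified mean, and the same CV can be reused for the remaining stages.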
Six screening protocols involving TVS and/or the CA 125 assay are evaluated.

Drescher et al. [2012] refine the model of Urban et al. [1997] by introducing hypothetical biomarker and imaging tests. Parameter estimates for the natural history model are derived from the SEER database, the U.S. Vital Statistics Report, and the literature (Anderson et al. [2010]; Havrilesky et al. [2011]; Katsube et al. [1982]; Partridge et al. [2009]; Yabroff et al. [2008]), except that the malignant durations are point estimates provided by gynecologic oncologists. A hypothetical cohort of 1 million women is screened annually from ages 45 to 85 using a multimodal screening test, with the first-line test being either CA 125 or a hypothetical biomarker assay, and the second-line test being either TVS or a hypothetical imaging test. The survival component assigns a time of death for all clinically diagnosed and screen-detected cases, derived from the age-, stage-, histology-, and grade-specific survival data obtained from SEER.

Schapira et al. [1993] analyze the effectiveness of a one-time ovarian cancer screening with CA 125 and TVS in a cohort of 40-year-old women in the United States. Unlike the approaches that developed from Skates and Singer [1991], Schapira et al. [1993] use a simplified decision tree to compare “no screening” to one-time screening with CA 125 and TVS at age 40. The disease model involves only:

• ovarian cancer prevalence,
• the proportion of prevalent cases that are in the early stages at the time of screening, and
• the probability of early-stage disease being diagnosed clinically in the absence of screening.

The prevalence of ovarian cancer in 40-year-old women is estimated as the product of age-specific incidence (based on Cutler and Young [1975]) and the average duration of the preclinical disease phase (assumed to be 2 years). The time it takes to progress from early- to late-stage disease before clinical diagnosis is assumed to be one year.
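The prevalence construction used by Schapira et al. [1993] is a one-line calculation: preclinical prevalence is approximated as incidence times mean preclinical duration. The incidence value below is a hypothetical placeholder, not the Cutler and Young [1975] estimate.

```python
# Prevalence of preclinical disease approximated as the product of
# age-specific incidence and mean preclinical duration, as in
# Schapira et al. [1993]. The incidence figure is illustrative only.
incidence_age_40 = 15 / 100_000   # hypothetical annual incidence at age 40
mean_preclinical_years = 2.0      # assumed preclinical duration

prevalence = incidence_age_40 * mean_preclinical_years
print(f"preclinical prevalence at age 40: "
      f"{prevalence * 100_000:.0f} per 100,000")
```

This steady-state approximation holds only when incidence and duration are roughly constant over the relevant ages, which is part of why the assumed 2-year duration matters so much.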
Each branch of the decision tree ends in a terminal node, to which the appropriate remaining life expectancy is assigned. Also departing from the models that developed from Skates and Singer [1991], Havrilesky et al. [2008] develop a discrete-time Markov chain model of the natural history of ovarian cancer and study the cost-effectiveness of potential screening strategies. The model consists of 13 health states, including well, undiagnosed stage I-IV ovarian cancer, diagnosed stage I-IV ovarian cancer, benign oophorectomy (surgical removal of ovaries without a priori evidence of disease), ovarian cancer survivor, death from ovarian cancer, and death from other causes. All patients enter the model at age 20 and die by age 100. Data obtained from SEER that aids calibration includes the lifetime probability of developing ovarian cancer, the stage distribution at diagnosis, and the lifetime probability of death from ovarian cancer. Age-specific U.S. Life Tables are used to estimate the age-specific probability of death from other causes. Stationary transition probabilities between pairs of cancer stages that are consistent with the SEER data are obtained by searching over clinically justifiable values. Age-specific probabilities of benign oophorectomy and mortality from oophorectomy are estimated from literature data (Keshavarz et al. [2002]; Merrill [2006]; Wingo et al. [1985]) following model calibration. Imposed on their natural history model are several hypothetical screening strategies, including no screening and screening at intervals of 3-36 months. Two scenarios are examined in their study: screening within the general population and within a high-risk population (simulated based on the prevalence of having a risk factor and the relative risk of ovarian cancer). The study concludes that annual screening is potentially cost effective, especially in a high-risk population. Havrilesky et al. [2011] extend their original model (Havrilesky et al.
[2008]) to include two distinct ovarian cancer phenotypes, referred to as “aggressive” and “indolent”. Model parameters are estimated using the methods described in Havrilesky et al. [2008]. Following the selection of model parameters, age-specific probabilities of benign oophorectomy and mortality from oophorectomy are estimated from literature data (Keshavarz et al. [2002]; Merrill [2006]; Wingo et al. [1985]). Screening strategies evaluated in this study include no screening and the use of a hypothetical screening test at intervals of 3-36 months. In an effort to estimate the duration of the preclinical phase of serous ovarian cancer, Brown and Palmer [2009] examine prophylactic salpingo-oophorectomy specimens. A broad range of techniques is used to create a tumor-growth model for early- and advanced-stage tumors. Simulated tumors are then subjected to hypothetical screening tests in order to assess a relationship between tumor size at diagnosis and the benefits of early detection.

Summary

Because screening is undertaken in an effort to detect latent cancers, accurate descriptions of the latent period of the natural history are key to designing effective tools to study screening protocols. However, the lack of longitudinal data from large cohort studies such as RCTs renders the parameter estimation of the natural history model of ovarian cancer difficult. The lack of longitudinal data also challenges the estimation of the window of opportunity for early detection. In the remainder of this chapter, we discuss model-based analyses that investigate the screening potential for ovarian cancer. Our modeling approach aims to characterize the set of models that can be considered to provide “plausible” representations of ovarian cancer. By examining the breadth of such models, we can explore the range of potential unobservable disease characteristics, as well as the range of potential outcomes for various interventions.
5.3 Natural History Model for Ovarian Cancer

Natural history model development requires identification of data sources and model structure, specification of validity conditions, and calibrated model parameters that satisfy validity conditions and yield consistency with available data. In this section, we discuss each of these in turn.

5.3.1 Data: Sources and Characteristics

The SEER program routinely collects data from, and provides cancer statistics for, the United States. Its database currently covers approximately 30% of the U.S. population, and is a common source of data for NH cancer models. SEER data includes patient demographics, tumor characteristics, stage at diagnosis, initial course of treatment, and post-diagnosis follow-up for vital status. SEER*Stat is statistical software that has been developed under the SEER program to serve as an interface for retrieval and analysis of the SEER data. Unless stated otherwise, all data used in this study were obtained via SEER*Stat. Ovarian cancer data describing annual age- and stage-specific incidence for 2000 through 2014 was analyzed using SEER*Stat for all ages up to 85 and 3 stages: (I) localized, (II) regional, and (III) distant. A cursory examination of this data indicates that the distribution of patient age and cancer stage at diagnosis (i.e., incidence) exhibits minimal variation over this 15-year period (see Figure B.1 in Appendix B.1). Accordingly, we developed our model using the aggregation of these data. An analysis of data describing the 5-year survival following diagnosis indicates that these survival probabilities are a function of the age and stage at diagnosis. That is, for a given stage at diagnosis, an older patient is less likely to survive the given number of years post diagnosis than a younger patient, as illustrated in Figure B.2 in Appendix B.1. Disease mortality data for all ages up to 95 are obtained from DevCan, another statistical software developed under the SEER program.
All-cause mortality data is obtained via the National Center for Health Statistics (NCHS) of the Centers for Disease Control and Prevention (CDC) for all ages up to 100 (Arias et al. [2016]). Both disease mortality and all-cause mortality data are age-dependent, which suggests that competing-risk mortality is as well. SEER provides a type of prevalence data called the limited-duration prevalence, which is the prevalence of ovarian cancer cases that were diagnosed since 1975. This indicates that ovarian cancer patients who were diagnosed prior to 1975 and are still alive currently are excluded from the limited-duration prevalence data. To estimate the complete prevalence regardless of when the diagnosis occurs, SEER employs a statistical model called the completeness index that is based on the limited-duration prevalence, incidence, and survival data (Capocaccia and De Angelis [1997]; Merrill et al. [2000]). Because the complete prevalence data are modeled values, we decided against the use of both complete and limited-duration prevalence data. Hence, our analysis excluded ovarian cancer prevalence data.

5.3.2 Model Structure

We use a discrete-time Markov chain to represent the natural history of ovarian cancer based on nine health states, as follows:
• healthy (H), indicates that the patient is free from ovarian cancer;
• stage 1 (“localized”), 2 (“regional”), 3 (“distant”), which may be
  – undiagnosed (1U, 2U, 3U), or
  – diagnosed (1D, 2D, 3D), and
• mortality due to
  – disease (DD), or
  – other causes (DO).
Given the lack of evidence that suggests otherwise, we model ovarian cancer as a strictly progressive disease. That is, our model excludes transitions from a disease state to healthy or to a less advanced disease state. In addition, a person in a diagnosed disease state remains within the diagnosed states until death.
Recognizing that SEER data is likely to have the impact of post-diagnosis treatments embedded within it, our model permits transition probabilities from undiagnosed disease stages to differ from those for diagnosed stages. Our Markovian model of the natural history of ovarian cancer is structurally decomposed into three discrete-time components:
• a component that represents disease activation as an age-dependent process,
• a component that represents the process of disease progression following activation as a stationary discrete-time Markov chain (DTMC), and
• a component that represents competing-risk mortality as an age-dependent process.
The cycle lengths for these component processes are one year. From one cycle to the next, a healthy female may remain healthy, transition to 1U, or die from other causes. The transition from H to 1U is referred to as disease activation. Following activation, a female progresses according to the DTMC, or dies from competing risks. Patients enter the model in the healthy state at age 20.

Notation:
• S = {H, 1U, 2U, 3U, 1D, 2D, 3D, DD, DO}, the set of all health states.
• D = {1D, 2D, 3D}, the set of all health states corresponding to a diagnosis of ovarian cancer.
• U = {H, 1U, 2U, 3U}, the set of all undiagnosed health states.
• A = {20, ..., 85}, the set of all ages for which patients are modeled.
• P(a) = [P_{i,j}(a)] = the transition probability matrix for a patient at age a. That is, P_{i,j}(a) denotes the probability that a female in state i ∈ S at age a ∈ A will be in state j ∈ S at age a+1.
• P = {P(a), a ∈ A}.
• p_{DO}(a) = the probability that a female at age a succumbs to competing risks by age a+1.
Figure 5.1 illustrates the set of possible transitions. Since transitions into DO can occur from all states, these transitions are excluded from Figure 5.1 in an effort to enhance visual clarity.
Figure 5.1: Possible transitions for the ovarian cancer model (transitions to DO are not shown)

The nonstationary transition probability matrix, P(a), with rows and columns ordered H, 1U, 2U, 3U, 1D, 2D, 3D, DD, DO, is represented as

\[
P(a) =
\begin{bmatrix}
P_{H,H}(a) & P_{H,1U}(a) & 0 & 0 & 0 & 0 & 0 & 0 & p_{DO}(a) \\
0 & P_{1U,1U}(a) & P_{1U,2U}(a) & P_{1U,3U}(a) & P_{1U,1D}(a) & P_{1U,2D}(a) & P_{1U,3D}(a) & P_{1U,DD}(a) & p_{DO}(a) \\
0 & 0 & P_{2U,2U}(a) & P_{2U,3U}(a) & 0 & P_{2U,2D}(a) & P_{2U,3D}(a) & P_{2U,DD}(a) & p_{DO}(a) \\
0 & 0 & 0 & P_{3U,3U}(a) & 0 & 0 & P_{3U,3D}(a) & P_{3U,DD}(a) & p_{DO}(a) \\
0 & 0 & 0 & 0 & P_{1D,1D}(a) & P_{1D,2D}(a) & P_{1D,3D}(a) & P_{1D,DD}(a) & p_{DO}(a) \\
0 & 0 & 0 & 0 & 0 & P_{2D,2D}(a) & P_{2D,3D}(a) & P_{2D,DD}(a) & p_{DO}(a) \\
0 & 0 & 0 & 0 & 0 & 0 & P_{3D,3D}(a) & P_{3D,DD}(a) & p_{DO}(a) \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1
\end{bmatrix}
\tag{5.1}
\]

Unknown parameter values are:
• p_{DO}(a), a ∈ A,
• P_{H,H}(a), a ∈ A, and
• P_{i,j}(a), i ∉ {H, DD, DO}, j ∉ {H, DO}, a ∈ A.
There are (24 + 2) × 65 = 1690 parameters in P whose values must be determined. We incorporate three simplifications that serve to streamline the effort required to identify plausible model parameters. We represent the process of disease progression following activation as a stationary process. Accordingly,

P_{i,j}(a) = (1 − p_{DO}(a)) P_{i,j},  i ∉ {H, DD, DO}, j ∉ {H, DO}.  (5.2)

We model disease activation, P_{H,1U}(a), as a piecewise linear function of age with slope β:

P_{H,1U}(a) = 0 if a ≤ 30;  β(a − 30) if a ∈ (30, 75];  45β if a ∈ (75, 85].  (5.3)

Finally, we define competing-risk mortality, p_{DO}(·), as the difference between the observed all-cause mortality and the disease mortality. As a result of (5.2) and (5.3), the set of unknown parameter values is reduced to
• β, and
• P_{i,j}, i ∉ {H, DD, DO}, j ∉ {H, DO}.
Note that with these simplifications, we have 24 + 1 = 25 variables, a significant reduction from 1690 variables in the original presentation of the transition matrices.
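To illustrate how the decomposition in (5.2)-(5.3) assembles the full nonstationary matrix, the following Python sketch builds P(a) from a stationary progression matrix, an activation slope, and a competing-risk schedule. All numerical values here (the matrix `C`, the slope `BETA`, and the `p_DO` function) are invented placeholders, not the calibrated parameters of this dissertation:

```python
import numpy as np

STATES = ["H", "1U", "2U", "3U", "1D", "2D", "3D", "DD", "DO"]
I = {s: k for k, s in enumerate(STATES)}
N = len(STATES)

# Hypothetical stationary progression matrix for the post-activation
# process (each disease row sums to 1; placeholder values only).
C = np.zeros((N, N))
C[I["1U"], [I["1U"], I["2U"], I["3U"], I["1D"], I["2D"], I["3D"], I["DD"]]] = \
    [0.55, 0.20, 0.05, 0.12, 0.05, 0.02, 0.01]
C[I["2U"], [I["2U"], I["3U"], I["2D"], I["3D"], I["DD"]]] = [0.45, 0.25, 0.18, 0.09, 0.03]
C[I["3U"], [I["3U"], I["3D"], I["DD"]]] = [0.40, 0.45, 0.15]
C[I["1D"], [I["1D"], I["2D"], I["3D"], I["DD"]]] = [0.80, 0.12, 0.05, 0.03]
C[I["2D"], [I["2D"], I["3D"], I["DD"]]] = [0.75, 0.17, 0.08]
C[I["3D"], [I["3D"], I["DD"]]] = [0.65, 0.35]

BETA = 0.0004  # hypothetical activation slope

def p_DO(a):
    """Placeholder competing-risk mortality schedule; rises with age."""
    return 1e-4 * np.exp(0.07 * (a - 20))

def activation(a):
    """Piecewise-linear P_{H,1U}(a) from (5.3)."""
    if a <= 30:
        return 0.0
    return BETA * (min(a, 75) - 30)

def P(a):
    """Assemble the nonstationary one-step matrix P(a) per (5.2)."""
    p = p_DO(a)
    M = (1.0 - p) * C                  # scaled stationary progression, (5.2)
    M[:, I["DO"]] = p                  # competing-risk column
    M[I["H"]] = 0.0                    # healthy row: activation process, (5.3)
    M[I["H"], I["1U"]] = activation(a)
    M[I["H"], I["DO"]] = p
    M[I["H"], I["H"]] = 1.0 - activation(a) - p
    for s in ("DD", "DO"):             # death states are absorbing
        M[I[s]] = 0.0
        M[I[s], I[s]] = 1.0
    return M
```

Every row of the assembled matrix sums to one by construction, and only the 24 entries of C plus β remain free, mirroring the 25-variable reduction noted above.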
5.3.3 Modeled Outcomes

Transitions among health states over time result in modeled trends and tendencies governed by the laws of probability. These modeled values can be compared to available data when determining whether or not a given set of model parameters, P, “fits” the available data. In modeling incidence by age and stage, we define
• α_i(a; P) = the probability that a patient receives an initial diagnosis in state i ∈ D ∪ {DD} at age a ∈ A.
Then

α_i(a+1; P) = Σ_{k ∈ U} π_k(a; P) P_{k,i}(a)  ∀ i ∈ D, a ∈ A,  (5.4)
α_{DD}(a+1; P) = Σ_{k ∈ U ∪ D} π_k(a; P) P_{k,DD}(a)  ∀ a ∈ A,  (5.5)

where the vectors {π(a; P), a ∈ A} are calculated as
• π_i(20; P) = 1 if i = H, and 0 otherwise,
• π(a+1; P) = π(20; P) Π_{t=20}^{a} P(t)  ∀ a ∈ A.
Equations (5.4) and (5.5) consider incidence within a cohort, and provide a basis for comparing outcomes derived from the model to observations from the data sources described in §5.3.1. Finally, the n-year survival probability following diagnosis at age a ∈ A in state i ∈ D, S(i, a, n; P), is readily calculated as

S(i, a, n; P) = Σ_{j ≠ DD, DO} [ Π_{t=a}^{a+n−1} P(t) ]_{i,j}.  (5.6)

5.3.4 Validity Conditions

We impose limitations on model parameters, including transition probabilities and the slope parameter in the activation process, so that models that do not satisfy these restrictions are eliminated from consideration. For example, transition probabilities must follow basic probability laws; they are nonnegative and each row of each transition matrix must sum to one. In addition, we impose constraints such that prior to diagnosis, progression is more likely than after diagnosis. This accounts for the effect of treatments that is contained within the post-diagnosis data.
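The recursions (5.4)-(5.6) are straightforward to implement once the matrices P(a) are in hand. The sketch below exercises the formulas on a deliberately reduced stand-in state space (a single undiagnosed/diagnosed stage pair) with one invented, age-invariant matrix; none of the numbers come from the calibrated models:

```python
import numpy as np

STATES = ["H", "1U", "1D", "DD", "DO"]  # reduced stand-in state space
I = {s: k for k, s in enumerate(STATES)}
U, D = ["H", "1U"], ["1D"]
A0 = 20

# One placeholder matrix, used at every age purely to exercise the formulas.
P_CONST = np.array([
    #   H      1U     1D     DD     DO
    [0.990, 0.005, 0.00,  0.00,  0.005],  # H
    [0.000, 0.850, 0.10,  0.02,  0.030],  # 1U
    [0.000, 0.000, 0.90,  0.07,  0.030],  # 1D
    [0.000, 0.000, 0.00,  1.00,  0.000],  # DD
    [0.000, 0.000, 0.00,  0.00,  1.000],  # DO
])

def P(a):
    """Age-invariant here only for simplicity of the illustration."""
    return P_CONST

def pi(a):
    """State distribution at age a, starting healthy at age 20."""
    v = np.zeros(len(STATES))
    v[I["H"]] = 1.0
    for t in range(A0, a):
        v = v @ P(t)
    return v

def alpha(i, a):
    """Incidence per (5.4)-(5.5): probability of first entering state i at age a."""
    v = pi(a - 1)
    src = U if i in D else U + D  # entries into DD count transitions from U and D
    return sum(v[I[k]] * P(a - 1)[I[k], I[i]] for k in src)

def survival(i, a, n):
    """(5.6): probability of surviving n years after diagnosis in state i at age a."""
    M = np.eye(len(STATES))
    for t in range(a, a + n):
        M = M @ P(t)
    alive = [I[j] for j in STATES if j not in ("DD", "DO")]
    return M[I[i], alive].sum()
```

Replacing `P(a)` with the assembled nonstationary matrices and enlarging the state lists recovers the full nine-state calculation.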
Other types of constraints include:
• The likelihood of a diagnosis increases with the severity of the health state;
• Progression is more likely when an individual is in a more severe health state than in a less severe one;
• The likelihood of progression to a health state decreases with the severity of the state;
• The slope defining the activation process is confined to be strictly positive, with an upper bound that ensures that basic probability laws are satisfied.
The full statement of the validity conditions for our ovarian cancer study appears in Appendix B.2.

5.3.5 Model-Based Calibration for Ovarian Cancer

We present the task of identifying valid and well-calibrated parameter sets as an optimization problem. Similar to the modeling framework discussed in Chapter 3, we define an objective function that represents the deviation between modeled outcomes and calibration targets obtained from SEER. We differentiate modeled outcomes from calibration targets by using a “hat” for the targets. Thus, while α_i(a; P) denotes the modeled value of incidence for state i ∈ D at age a ∈ A, α̂_i(a) represents the corresponding target value. Our computational work in this chapter uses the total weighted absolute deviation between modeled outcomes and calibration targets, as a function of the set of model parameter values. Our objective function is the weighted sum of the total deviations associated with the three types of targets used: incidence, disease mortality, and post-diagnosis survival. For a given set of parameter values, P, the objective value is given by:

GOF(P) = Σ_{t ∈ {inc, DD, S}} w_t GOF_t(P)
       = w_inc Σ_{a ∈ A} Σ_{i ∈ D} |α_i(a; P) − α̂_i(a)|
       + w_DD Σ_{a ∈ A} |α_DD(a; P) − α̂_DD(a)|
       + w_S Σ_{a ∈ A} Σ_{i ∈ D} Σ_n |S(i, a, n; P) − Ŝ(i, a, n)|,

where α and S are calculated as in (5.4)-(5.6). The coefficients w_inc, w_DD, and w_S represent weights corresponding to the three types of calibration targets: incidence, disease mortality, and survival, respectively. These sets of target values have significant differences in magnitude.
Thus, our computational work in this chapter places the various types of targets on a similar footing by using weights that are equal to the reciprocal of the mean of the corresponding target values. We define the set V as the set of models, P, that satisfy the known validity conditions described in §5.3.4. Our calibration problem for ovarian cancer may now be stated as:

minimize GOF(P) subject to P ∈ V.  (CP)

Note that V is a polyhedral set, while the objective function, GOF(P), is nonconvex and requires numerous multi-step calculations. We use the numerical method of Nelder and Mead [1965] to identify solutions to (CP). This is a heuristic search method for nonlinear optimization that is widely used, but is not guaranteed to find an optimal solution to (CP). Accordingly, we solve (CP) repeatedly using randomized initializations. That is, when the Nelder-Mead method stabilizes at a particular set of model parameters, the parameters are retained and the search is restarted in a random fashion. The process is repeated several times, after which we examine the quality of fit to each type of target. A set of model parameters is considered to yield a plausible representation of the natural history of ovarian cancer when the modeled outcomes provide an acceptable fit for each type of target. Specifically, if T_i represents the sum of the targets of type i, plausible models satisfy

GOF_i(P) / T_i ≤ c_1  (5.7)

for each i ∈ {inc, DD, S}, and, for at least two target types,

GOF_i(P) / T_i ≤ c_2,  (5.8)

where c_2 < c_1. In combination, (5.7) and (5.8) ensure that model parameters that are considered to be plausible fit the modeled outcomes for all three target types well, with at least two types being fit with a tighter tolerance. Within this study, we set c_1 = 0.25 and c_2 = 0.2.

5.4 Results

Using the approach described in §5.3.5, we identify approximately 3500 sets of model parameters, of which 150 are considered to yield plausible models of the natural history of ovarian cancer.
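The randomized-restart Nelder-Mead loop and the plausibility screen (5.7)-(5.8) can be sketched as follows. The two-parameter "model" and its synthetic targets are stand-ins for the full ovarian cancer model and the SEER targets; everything numeric here is illustrative:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)

# Synthetic stand-in targets: two target types, each a small vector.
targets = {"inc": np.array([0.2, 0.4, 0.6]), "S": np.array([0.9, 0.8])}
# Reciprocal-of-mean weights, as in the text.
weights = {t: 1.0 / v.mean() for t, v in targets.items()}

def modeled(theta, t):
    """Toy two-parameter model; reproduces the targets at theta = (1, 0)."""
    x, y = theta
    return targets[t] * x + y

def gof_parts(theta):
    """Total absolute deviation for each target type (GOF_t)."""
    return {t: np.abs(modeled(theta, t) - v).sum() for t, v in targets.items()}

def gof(theta):
    return sum(weights[t] * g for t, g in gof_parts(theta).items())

plausible = []
for _ in range(20):                       # randomized restarts
    x0 = rng.uniform(-2.0, 2.0, size=2)
    res = minimize(gof, x0, method="Nelder-Mead")
    parts = gof_parts(res.x)
    ratios = {t: parts[t] / targets[t].sum() for t in targets}
    # Plausibility screen in the spirit of (5.7)-(5.8), c1 = 0.25, c2 = 0.2.
    if all(r <= 0.25 for r in ratios.values()) and \
       sum(r <= 0.20 for r in ratios.values()) >= 2:
        plausible.append(res.x)
```

In the dissertation's setting, the search is additionally restricted to the polyhedral set V; with Nelder-Mead, which is unconstrained, this can be handled by penalizing or rejecting iterates that violate the validity conditions.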
With this collection of plausible models, we can develop a plausible range of modeled outcomes relevant to possible medical interventions. In this way, a sense of the broad set of projected outcomes is available. For example, early detection programs can be most effective when they detect the presence of cancers during their preclinical phase. By definition, the preclinical phase is not observable. Its duration is unknowable, and the nature of this window of opportunity is subject to speculation. The collection of plausible models can shed some light on this. For each plausible model, we calculate the expected duration of the preclinical phase, which is defined as the expected duration from disease onset to a diagnosed disease state (i.e., the first passage time from 1U to {1D, 2D, 3D, DD}). This yields 150 plausible values of the expected duration of the preclinical phase. This set of values is summarized in Table 5.1. Note that 80% of the plausible mean durations fall between 4.1 years and 9.6 years, while the interquartile range is from 5.8 years to 8.3 years. In addition to the expected duration of the preclinical phase, there may be interest in understanding the plausible ranges for various distributional characteristics. For example, associated with each plausible model is a cumulative distribution function (cdf) of the duration of the preclinical phase, from which ranges of percentiles can be obtained. Table 5.2 contains information regarding various percentiles that are typically of interest.

Table 5.1: Mean duration of the preclinical phase, percentiles across plausible models
10th percentile: 4.1 yrs | 25th percentile: 5.8 yrs | Median: 7.4 yrs | 75th percentile: 8.3 yrs | 90th percentile: 9.6 yrs

Table 5.2: Percentiles of the duration of the preclinical phase
Percentile | Interquartile range from plausible models
25th | 2.3 – 3.3 years
Median | 4.1 – 6.1 years
75th | 7.1 – 10.5 years

Despite the absence of recommended screening for ovarian cancer by the U.S.
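For a given stationary progression matrix, the expected preclinical duration and its cdf follow from standard absorbing-chain calculations. A sketch with placeholder probabilities (not values from the calibrated models), which for simplicity treats every exit from the undiagnosed states, including competing-risk death, as absorbing:

```python
import numpy as np

# One-year probabilities among the transient (undiagnosed) states
# 1U, 2U, 3U; placeholder values only. Any exit (diagnosis, DD, DO)
# is treated as absorption in this simplified sketch.
Q = np.array([
    [0.80, 0.08, 0.02],   # 1U -> 1U, 2U, 3U
    [0.00, 0.75, 0.12],   # 2U -> 2U, 3U
    [0.00, 0.00, 0.70],   # 3U -> 3U
])

# Expected years until leaving the undiagnosed states, starting in 1U:
# t = (I - Q)^{-1} 1, via the fundamental matrix of the absorbing chain.
t = np.linalg.solve(np.eye(3) - Q, np.ones(3))
mean_preclinical = t[0]

def cdf(n):
    """P(preclinical duration <= n years) = 1 - sum_j [Q^n]_{1U, j}."""
    return 1.0 - np.linalg.matrix_power(Q, n)[0].sum()
```

Applying this calculation to each of the 150 plausible progression matrices is what produces the distributions summarized in Tables 5.1 and 5.2.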
Preventive Services Task Force (USPSTF) (Moyer [2012]), our results suggest a high likelihood of a multi-year window of opportunity for early detection. Using our plausible models, we assess the impact that a screening program might yield. We examine the opportunity for reductions in disease mortality as a function of the sensitivity of a hypothetical screening test and the age at which annual screening is initiated. The hypothetical screening program is two-phased, with a first-line screening test and a second-line diagnostic test. An individual with a positive screen result undertakes the second-line diagnostic test, after which her health state is known with certainty and she continues to transition according to P. That is, a healthy individual with a false positive screen result will be correctly classified as healthy, while an individual in state iU who tests positive will be correctly diagnosed, placing her in state iD. This movement to state iD, which corresponds to an early diagnosis, causes the patient to progress and receive treatment as a diagnosed patient. Mortality reduction is defined as the reduction in the probability of dying from ovarian cancer by age 85 that is attributed to the screening program. The results for mortality reduction are illustrated in Figure 5.2, where we see that the age at which annual screening is initiated exhibits a more strongly differentiated impact than the sensitivity of the screening test. This is especially true for the median and upper quartile obtained from the collection of plausible models, as compared with the lower quartile. Not surprisingly, the effect on mortality reduction diminishes as the age at which screening is initiated increases.

Figure 5.2: Plausible mortality reduction as a function of the age at which screening is initiated

Figure 5.2 illustrates mortality reduction as a result of early diagnosis, based in part on post-diagnosis outcomes embedded within SEER data.
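The mortality-reduction calculation amounts to comparing two cohort propagations: the natural-history chain, and a modified chain in which, each year after screening begins, mass in an undiagnosed state is moved to the corresponding diagnosed state with probability equal to the first-line test sensitivity (a perfect second-line test, as assumed in the text). A toy version with a single undiagnosed/diagnosed pair and invented probabilities, in which the diagnosed state was deliberately given a lower annual death rate so that early detection helps:

```python
import numpy as np

STATES = ["H", "1U", "1D", "DD", "DO"]
I = {s: k for k, s in enumerate(STATES)}

P = np.array([                        # placeholder natural-history matrix
    #   H      1U     1D     DD     DO
    [0.990, 0.005, 0.00,  0.00,  0.005],  # H
    [0.000, 0.840, 0.08,  0.05,  0.030],  # 1U: annual disease death 0.05
    [0.000, 0.000, 0.94,  0.03,  0.030],  # 1D: annual disease death 0.03
    [0.000, 0.000, 0.00,  1.00,  0.000],  # DD (absorbing)
    [0.000, 0.000, 0.00,  0.00,  1.000],  # DO (absorbing)
])

def p_disease_death(sens, a0=20, a_screen=50, aT=85):
    """P(death from disease by age aT) when annual screening with the
    given first-line sensitivity starts at a_screen."""
    v = np.zeros(len(STATES))
    v[I["H"]] = 1.0
    for a in range(a0, aT):
        if a >= a_screen:
            moved = sens * v[I["1U"]]   # screen-detected cases move to 1D
            v[I["1U"]] -= moved
            v[I["1D"]] += moved
        v = v @ P
    return v[I["DD"]]

base = p_disease_death(sens=0.0)
reduction = 1.0 - p_disease_death(sens=0.8) / base
```

Repeating this for each plausible model, each sensitivity, and each starting age yields the quartile bands of the kind plotted in Figure 5.2.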
Figure 5.3 illustrates the manner in which the age and stage of cancers, including cancers that are currently latent, would be diagnosed under the hypothetical screening program. Using two plausible models selected from the set of 150 such models, we illustrate the changes in diagnostic states at various ages as a function of the age at which screening is initiated. Close inspection of the side-by-side graphs reveals differences between the two models when screening is introduced. Overall, we see a stark increase in diagnoses, especially in the early stages. Information such as this would be useful in assessing the potential range of impact when new treatment protocols are under consideration.

Figure 5.3: Evidence of early detection using age and stage at diagnosis (two plausible models)

5.5 Conclusions

We present a Markov model for ovarian cancer that is decomposed into separate processes representing disease activation, disease progression, and competing-risk mortality. Exploitation of our structural decomposition of the model of the natural history of ovarian cancer leads to a significant reduction in the number of parameters involved in the calibration process, thereby streamlining the computational challenges associated with identifying solutions to (CP). Additionally, we model disease activation as a piecewise linear function, which further reduces the number of parameters required to model disease activation. While a fully nonstationary model might appear to be more realistic and desirable, such a model would overwhelm search methods and exacerbate the magnitude of calibration uncertainty. The net result would likely render the analysis unreliable. We illustrate how model-based analysis can help shed some light on our understanding of the duration of the preclinical phase for ovarian cancer, an unknowable value that has a significant impact on the potential efficacy of a screening program.
We estimate that the mean duration is likely to fall between 4.1 years and 9.6 years. This range includes the estimate from Brown and Palmer [2009] (i.e., 5.1 years), excludes that of Skates and Singer [1991] (i.e., 1.3 years), and excludes the assumption of Cutler and Young [1975] (i.e., 1 year). Using our plausible models to assess the efficacy of a hypothetical annual screening program initiated at age 50, we observe a median mortality reduction of approximately 25% across all first-line test sensitivities when the second-line diagnostic test is used. Among our models, the 25th percentile of mortality reduction is approximately 6%, whereas the 75th percentile is approximately 65%. Havrilesky et al. [2008] use model-based analysis to conclude that annual screening initiated at age 50, without a second-line diagnostic test, would result in a 43% reduction in ovarian cancer mortality. This corresponds to the 65th percentile among values observed from our plausible models, although the differences in the diagnostic tests make a direct comparison difficult. Our work is not without limitations. The disease activation process is modeled as a piecewise linear function that eliminates activation prior to age 30. Although the activation process is inherently age-dependent, the assumption that ovarian cancer activates in a linear fashion after age 30 might be overly rigid and simplistic. Nevertheless, this process is unknowable, and ovarian cancer is rarely diagnosed before age 40 (Boyd et al. [2000]; Permuth-Wey and Sellers [2009]). As a future direction, we will modify the piecewise linear function to examine the impact of this assumption on modeled outcomes. Another limitation is that in modeling the general population, we did not create a risk-differentiated model. In addition to age, genetic/hereditary conditions are major risk factors (Chambers and Hess [2008]; Schorge et al. [2010]), as activation is influenced by these factors.
Our future research includes an adaptation of our model to incorporate risk-differentiated activation, which will allow us to examine the potential of risk-differentiated early detection strategies.

5.6 Bibliography

American Cancer Society (2017). Cancer Facts & Figures 2017. American Cancer Society, Atlanta.
Anderson, G. L., McIntosh, M., Wu, L., Barnett, M., Goodman, G., Thorpe, J. D., Bergan, L., Thornquist, M. D., Scholler, N., Kim, N., et al. (2010). Assessing lead time of selected ovarian cancer biomarkers: A nested case–control study. Journal of the National Cancer Institute, 102(1):26–38.
Arias, E., Heron, M., and Xu, J. (2016). United States Life Tables, 2012. National Vital Statistics Reports, 65(8):14–15.
Boyd, J., Sonoda, Y., Federici, M. G., Bogomolniy, F., Rhei, E., Maresco, D. L., Saigo, P. E., Almadrones, L. A., Barakat, R. R., Brown, C. L., et al. (2000). Clinicopathologic features of BRCA-linked and sporadic ovarian cancer. JAMA, 283(17):2260–2265.
Brown, P. O. and Palmer, C. (2009). The preclinical natural history of serous ovarian cancer: Defining the target for early detection. PLoS Medicine, 6(7):1–14.
Buys, S. S., Partridge, E., Black, A., Johnson, C. C., Lamerato, L., Isaacs, C., Reding, D. J., Greenlee, R. T., Yokochi, L. A., Kessel, B., et al. (2011). Effect of screening on ovarian cancer mortality: The Prostate, Lung, Colorectal and Ovarian (PLCO) Cancer Screening Randomized Controlled Trial. JAMA, 305(22):2295–2303.
Capocaccia, R. and De Angelis, R. (1997). Estimating the completeness of prevalence based on cancer registry data. Statistics in Medicine, 16(4):425–440.
Chambers, S. K. and Hess, L. M. (2008). Ovarian cancer prevention. In Alberts, D. S. and Hess, L. M., editors, Fundamentals of Cancer Prevention, chapter 17, pages 447–473. Springer-Verlag Berlin Heidelberg.
Cutler, S. J. and Young, J. L., editors (1975). Third National Cancer Survey: Incidence Data, National Cancer Institute Monograph 41, DHEW Publication No. (NIH) 75-787. U.S.
Government Printing Office, Washington, DC.
Drescher, C. W., Hawley, S., Thorpe, J. D., Marticke, S., McIntosh, M., Gambhir, S. S., and Urban, N. (2012). Impact of screening test performance and cost on mortality reduction and cost-effectiveness of multimodal ovarian cancer screening. Cancer Prevention Research, 5(8):1015–1024.
Havrilesky, L. J., Sanders, G. D., Kulasingam, S., Chino, J. P., Berchuck, A., Marks, J. R., and Myers, E. R. (2011). Development of an ovarian cancer screening decision model that incorporates disease heterogeneity. Cancer, 117(3):545–553.
Havrilesky, L. J., Sanders, G. D., Kulasingam, S., and Myers, E. R. (2008). Reducing ovarian cancer mortality through screening: Is it possible, and can we afford it? Gynecologic Oncology, 111(2):179–187.
Jacobs, I. J., Menon, U., Ryan, A., Gentry-Maharaj, A., Burnell, M., Kalsi, J. K., Amso, N. N., Apostolidou, S., Benjamin, E., Cruickshank, D., et al. (2016). Ovarian cancer screening and mortality in the UK Collaborative Trial of Ovarian Cancer Screening (UKCTOCS): A randomised controlled trial. The Lancet, 387(10022):945–956.
Katsube, Y., Berg, J., and Silverberg, S. (1982). Epidemiologic pathology of ovarian tumors: A histopathologic review of primary ovarian neoplasms diagnosed in the Denver Standard Metropolitan Statistical Area, 1 July–31 December 1969 and 1 July–31 December 1979. International Journal of Gynecological Pathology, 1(1):3–16.
Keshavarz, H., Hillis, S. D., Kieke, B. A., and Marchbanks, P. A. (2002). Hysterectomy surveillance—United States, 1994–1999. MMWR CDC Surveillance Summaries, 51(SS05):1–8.
Menon, U., Gentry-Maharaj, A., Hallett, R., Ryan, A., Burnell, M., Sharma, A., Lewis, S., Davies, S., Philpott, S., Lopes, A., et al. (2009). Sensitivity and specificity of multimodal and ultrasound screening for ovarian cancer, and stage distribution of detected cancers: Results of the prevalence screen of the UK Collaborative Trial of Ovarian Cancer Screening (UKCTOCS).
The Lancet Oncology, 10(4):327–340.
Merrill, R. M. (2006). Impact of hysterectomy and bilateral oophorectomy on race-specific rates of corpus, cervical, and ovarian cancers in the United States. Annals of Epidemiology, 16(12):880–887.
Merrill, R. M., Capocaccia, R., Feuer, E. J., and Mariotto, A. (2000). Cancer prevalence estimates based on tumour registry data in the Surveillance, Epidemiology, and End Results (SEER) Program. International Journal of Epidemiology, 29(2):197–207.
Moyer, V. A. (2012). Screening for ovarian cancer: U.S. Preventive Services Task Force reaffirmation recommendation statement. Annals of Internal Medicine, 157(12):900–904.
Nelder, J. A. and Mead, R. (1965). A simplex method for function minimization. The Computer Journal, 7(4):308–313.
Partridge, E., Greenlee, R. T., Xu, J.-L., Kreimer, A. R., Williams, C., Riley, T., Reding, D. J., Church, T. R., Kessel, B., Johnson, C. C., et al. (2009). Results from four rounds of ovarian cancer screening in a randomized trial. Obstetrics and Gynecology, 113(4):775–782.
Permuth-Wey, J. and Sellers, T. A. (2009). Epidemiology of Ovarian Cancer, pages 413–437. Humana Press, Totowa, NJ.
Schapira, M. M., Matchar, D. B., and Young, M. J. (1993). The effectiveness of ovarian cancer screening: A decision analysis model. Annals of Internal Medicine, 118(11):838–843.
Schorge, J. O., Modesitt, S. C., Coleman, R. L., Cohn, D. E., Kauff, N. D., Duska, L. R., and Herzog, T. J. (2010). SGO white paper on ovarian cancer: Etiology, screening and surveillance. Gynecologic Oncology, 119(1):7–17.
Skates, S. J. and Singer, D. E. (1991). Quantifying the potential benefit of CA 125 screening for ovarian cancer. Journal of Clinical Epidemiology, 44(4):365–380.
Urban, N., Drescher, C., Etzioni, R., and Colby, C. (1997). Use of a stochastic simulation model to identify an efficient protocol for ovarian cancer screening. Controlled Clinical Trials, 18(3):251–270.
Wingo, P. A., Huezo, C. M., Rubin, G. L., Ory, H. W., and Peterson, H. B. (1985).
The mortality risk associated with hysterectomy. American Journal of Obstetrics and Gynecology, 152(7):803–808.
Yabroff, K. R., Lamont, E. B., Mariotto, A., Warren, J. L., Topor, M., Meekins, A., and Brown, M. L. (2008). Cost of care for elderly cancer patients in the United States. Journal of the National Cancer Institute, 100(9):630–641.

Appendix B: Data

B.1 Ovarian Cancer Data

Figure B.1 illustrates the boxplots of the age and stage at diagnosis from years 2000 to 2014, where the combined data for all 15 years are included at the last positions on the subplots. Figure B.2 illustrates survival distributions as a function of the age and stage at diagnosis. These are clearly age-dependent, for all stages.

Figure B.1: The boxplots of the age at diagnosis (panel a) and the stage at diagnosis (panel b) across years 2000–2014

Figure B.2: Nonstationarity in post-diagnosis survival

B.2 Validity Conditions

P_{1U,2U} ≥ P_{1D,2D}  (B.1)
P_{1U,3U} ≥ P_{1D,3D}  (B.2)
P_{1U,DD} ≥ P_{1D,DD}  (B.3)
P_{2U,3U} ≥ P_{2D,3D}  (B.4)
P_{2U,DD} ≥ P_{2D,DD}  (B.5)
P_{3U,DD} ≥ P_{3D,DD}  (B.6)
P_{2U,3U} ≥ P_{1U,2U}  (B.7)
P_{3U,DD} ≥ P_{2U,3U}  (B.8)
P_{2D,3D} ≥ P_{1D,2D}  (B.9)
P_{3D,DD} ≥ P_{2D,3D}  (B.10)
P_{1U,2U} ≥ P_{1U,3U}  (B.11)
P_{1U,3U} ≥ P_{1U,DD}  (B.12)
P_{1D,2D} ≥ P_{1D,3D}  (B.13)
P_{1D,3D} ≥ P_{1D,DD}  (B.14)
P_{2U,3U} ≥ P_{2U,DD}  (B.15)
P_{2D,3D} ≥ P_{2D,DD}  (B.16)
P_{1U,1D} ≥ P_{1U,2D}  (B.17)
P_{1U,2D} ≥ P_{1U,3D}  (B.18)
P_{1U,3D} ≥ P_{1U,DD}  (B.19)
P_{2U,2D} ≥ P_{2U,3D}  (B.20)
P_{2U,3D} ≥ P_{2U,DD}  (B.21)
P_{2U,DD} ≥ P_{1U,DD}  (B.22)
P_{3U,DD} ≥ P_{2U,DD}  (B.23)
P_{2D,DD} ≥ P_{1D,DD}  (B.24)
P_{3D,DD} ≥ P_{2D,DD}  (B.25)
P_{2U,3D} ≥ P_{1U,3D}  (B.26)
P_{3U,3D} ≥ P_{2U,3D}  (B.27)
P_{2U,2D} ≥ P_{1U,2D}  (B.28)
P_{3U,3D} ≥ P_{3U,DD}  (B.29)
P_{2U,2D} ≥ P_{1U,1D}  (B.30)
P_{3U,3D} ≥ P_{2U,2D}  (B.31)
p_{DO}(a) = p̂_{DO}(a)  ∀ a ∈ A  (B.32)
Σ_{j ∈ S} P_{i,j}(a) = 1  ∀ i ∈ S, a ∈ A  (B.33)
0 ≤ P_{i,j}(a) ≤ 1  ∀ i, j ∈ S, a ∈ A  (B.34)
0 < β ≤ (1 − p_{DO}(85)) / 45  (B.35)

Inequalities (B.1)–(B.6) state that treatment helps slow down progression.
(B.7)–(B.10) impose that progression is more likely when the person is in a more severe health state. (B.11)–(B.31) require that progressing to a more severe health state is less likely than progressing to a less severe health state. (B.32) sets p_DO(a) to the corresponding observed value. (B.33)–(B.34) reflect the basic laws of probability, whereas (B.35) ensures that the slope β defining the disease activation process is strictly positive and is at most (1 − p_DO(85))/45. This bound is derived from the fact that p̂_DO(a) is a monotonically decreasing function of a, and that (B.33) must be satisfied.

Chapter 6: Conditionally Stationary Markov Models

This chapter provides a summary of various technical details associated with the specification and calibration of the various disease models that appear throughout this dissertation. We formally introduce the notion of a conditionally stationary Markov model, a model consisting of structurally decomposed elements of the natural history of the disease. We provide the technical details of the decomposition and document the manner in which it facilitates computations. We begin with the elements of a nonstationary disease model, followed by a presentation of a conditionally stationary model. We illustrate the calculation of various disease measures from the conditionally stationary model.

6.1 The Disease Model

In this section, we consider a model of the disease as a discrete-time Markov chain (DTMC) in which transitions occur over a set of "health states" and all model parameters vary with age. Individuals enter the model at age a_0 free of the disease (i.e., "healthy") and depart at age a_T.

6.1.1 The Nonstationary Model

Let:
• S = {H, S_1, ..., S_N, DD, DO}, the set of all health states.
  – H denotes the "healthy" state (i.e., the patient is free of the disease).
  – {S_i, i = 1, ..., N} denotes the set of health states corresponding to the disease. S_1 denotes the least severe disease state.
  – DD denotes disease mortality (i.e., death from the disease).
  – DO denotes competing risk mortality (i.e., death from other causes).
  – U = the set of health states that precede the observable/clinical stages of the disease.
  – O = the set of all "alive" states that represent observable/clinical stages of the disease.
• A = {a_0, ..., a_T}, the set of ages for which patients are modeled, where a_{t+1} = a_t + 1.
• P_{ij}(a) = the one-step transition probability from age a; the probability that a patient who is in health state i at age a is in health state j at age a + 1.
• P(a) = [P_{ij}(a)], the one-step transition probability matrix from age a.
• P^{(n)}_{ij}(a) = the n-step transition probability from age a; the probability that a patient who is in health state i at age a is in health state j at age a + n.
• P^{(n)}(a) = [P^{(n)}_{ij}(a)], the n-step transition probability matrix from age a. P^{(n)}(a) = ∏_{t=0}^{n−1} P(a + t).
• P = {P(a), ∀ a ∈ A}.

The collection of matrices, P, constitutes the disease model. When all transition probabilities vary with age, there are approximately T × N² possible transition probabilities that must be determined through the calibration process. The conditionally stationary model, described in the following subsection, streamlines these requirements considerably.

6.1.2 The Conditionally Stationary Model

A conditionally stationary model decomposes the disease model into three components:
• a nonstationary process for disease activation (i.e., the transition from H to S_1),
• a nonstationary representation of the competing risk mortality (i.e., a transition from i to DO),
• a stationary discrete-time Markov chain (DTMC) governing the disease progression following activation.

The conditionally stationary model requires additional notation in order to describe age-dependent disease activation and competing risk mortality. In particular,
• {P_{H,S_1}(a), ∀ a ∈ A} models the disease activation process,
• {p_DO(a), ∀ a ∈ A} models competing risk mortality,
and we introduce a constant matrix C which models disease progression following activation as an age-independent process.
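The decomposition just described can be sketched numerically. The fragment below is an illustrative sketch, not the dissertation's implementation: the state ordering, the toy values for C, the competing-risk mortality p_DO(a), and the activation probability are all assumptions. It assembles a one-step matrix from the three components and checks that the n-step transitions among alive disease states factor into a survival term times powers of C, the identity derived in Appendix C.1.

```python
import numpy as np

# States: [H, S1, S2, DD, DO]; C is indexed over [S1, S2, DD].
# All numerical values below are illustrative placeholders.
C = np.array([[0.60, 0.30, 0.10],
              [0.00, 0.70, 0.30],
              [0.00, 0.00, 1.00]])

def p_DO(a):          # toy competing-risk mortality, increasing with age
    return 0.005 + 0.0004 * a

def p_act(a):         # toy disease-activation probability P_{H,S1}(a)
    return 0.0005 * max(a - 30, 0)

def build_P(a):
    """Assemble the one-step matrix P(a) of the conditionally stationary model."""
    q = 1.0 - p_DO(a)                    # q_DO(a)
    P = np.zeros((5, 5))
    P[0, 0] = 1.0 - p_act(a) - p_DO(a)   # remain healthy
    P[0, 1] = p_act(a)                   # activation H -> S1
    P[0, 4] = p_DO(a)                    # death from other causes
    P[1:3, 1:4] = q * C[:2, :]           # alive disease rows scaled by q_DO(a)
    P[1:3, 4] = p_DO(a)
    P[3, 3] = P[4, 4] = 1.0              # DD and DO are absorbing
    return P

a, n = 50, 6
Pn = np.linalg.multi_dot([build_P(a + t) for t in range(n)])
R = np.prod([1.0 - p_DO(a + t) for t in range(n)])   # R(a, n)
Cn = np.linalg.matrix_power(C, n)
# Appendix C.1 identity on the alive disease states:
assert np.allclose(Pn[1:3, 1:3], R * Cn[:2, :2])
assert np.allclose(build_P(a).sum(axis=1), 1.0)      # rows are distributions
```

The healthy row here absorbs the remaining mass into P_{H,H}(a) so that the row sums to one; that bookkeeping choice is ours, not prescribed by the text.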
Let
• p_DO(a) = the probability that an individual at age a will die from competing risks by age a + 1.
• q_DO(a) = the probability that an individual at age a will survive competing risks through age a + 1 (i.e., q_DO(a) = 1 − p_DO(a)).
• R(a, n) = the probability that an individual at age a will survive competing risks through age a + n (i.e., R(a, n) = ∏_{t=0}^{n−1} q_DO(a + t)).
• C = [C_{i,j} : i, j ∈ {S_1, ..., S_N, DD}], a stationary one-step transition probability matrix associated with the disease states.
• C^{(n)} = the n-step transition probability matrix associated with the stationary component, C. C^{(n)} = ∏_{t=1}^{n} C.
• P^{(n)}_{i,j}(a) = R(a, n) C^{(n)}_{i,j}   ∀ i, j ∈ {S_1, ..., S_N} (see Appendix C.1).

In the conditionally stationary model, the nonstationary transition matrix P(a) is represented as

           H            S_1                   ...  S_i                   ...  DD                    DO
  H     [  P_{H,H}(a)   P_{H,S_1}(a)          ...  0                     ...  0                     p_DO(a)  ]
  S_1   [  0            q_DO(a) C_{S_1,S_1}   ...  q_DO(a) C_{S_1,S_i}   ...  q_DO(a) C_{S_1,DD}    p_DO(a)  ]
  ...
  S_i   [  0            q_DO(a) C_{S_i,S_1}   ...  q_DO(a) C_{S_i,S_i}   ...  q_DO(a) C_{S_i,DD}    p_DO(a)  ]
  ...
  DD    [  0            0                     ...  0                     ...  1                     0        ]
  DO    [  0            0                     ...  0                     ...  0                     1        ]
                                                                                                      (6.1)

Simply stated, the conditionally stationary model is a special case of the nonstationary model for which

P_{i,j}(a) = q_DO(a) C_{i,j},   i ∈ {S_1, ..., S_N}, j ∈ {S_1, ..., S_N, DD}.   (6.2)

One advantage of using the conditionally stationary model, as depicted in (6.1), is that the number of model parameters that must be determined through the calibration process is greatly reduced from approximately T × N² to approximately 2T + N². In addition to facilitating calibration, this also has implications for calculated outcomes associated with the disease model.

6.1.3 Modeled Outcomes

Common Epidemiological Values

Transitions among health states over time result in trends and tendencies that yield common epidemiological measures.
Within the model, these are governed by the laws of probability, so that these trends and tendencies can be calculated directly as "modeled outcomes". This section includes several such outcomes and identifies how they are calculated. Let:
• π(a) = [π_i(a) : i ∈ S], where π_i(a) represents the probability that a patient is in state i ∈ S at age a ∈ A. Note that the observable elements of π(a) correspond to the age- and stage-specific prevalence.
  π(a) = π(a − 1) P(a − 1).
• α_i(a) = the probability that a patient is newly detected in state i ∈ O ∪ {DD} at age a. Note that {α_i(a), i ∈ O, a ∈ A} correspond to the age- and stage-specific incidence.
  α_i(a) = Σ_{k∈U} π_k(a − 1) P_{k,i}(a − 1)   ∀ i ∈ O, ∀ a ∈ A,
  α_DD(a) = Σ_{k∈U∪O} π_k(a − 1) P_{k,DD}(a − 1)   ∀ a ∈ A.
• S(i, a, n) = the probability that a patient survives n years following an initial diagnosis that occurs in state i ∈ O at age a ∈ A.
  S(i, a, n) = R(a, n)(1 − C^{(n)}_{i,DD}) (see Appendix C.2).
• T_{S_1} = the age at which an individual activates the disease. T_{S_1} is a random variable whose distribution is given by
  P{T_{S_1} = a} = π_H(a_0) (∏_{t=a_0}^{a−2} P_{H,H}(t)) P_{H,S_1}(a − 1)   ∀ a ∈ A
  (see Appendix C.3).

In epidemiological studies and disease registries, incidence is conventionally defined as the number of new cases relative to the population alive rather than the original cohort. In this case, let α′_i(a) denote the conventional viewpoint of incidence for state i ∈ O at age a ∈ A. It follows that

α′_i(a) = α_i(a) / Σ_{j≠DD,DO} π_j(a) = α_i(a) / (1 − π_DD(a) − π_DO(a)).   (6.3)

More Complex Modeled Outcomes

In this section, we illustrate how to precisely calculate the modeled outcomes corresponding to important disease characteristics one may be interested in. In many cases these outcomes are constructed in two steps. The first involves a perturbation of the transition matrices during which certain states are redefined as absorbing states, while the second involves calculations that make use of the perturbed matrix.
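The two-step recipe can be illustrated with a small sketch: redefine the target states of a toy stationary matrix as absorbing, then accumulate the first-passage distribution by repeated multiplication. The state labels and all numbers below are illustrative assumptions, not values from the dissertation.

```python
import numpy as np

# Toy stationary disease matrix over [S1, S2, O, DD]: S1 and S2 are
# preclinical, O is an observable (diagnosed) state. Values are placeholders.
C = np.array([[0.70, 0.15, 0.10, 0.05],
              [0.00, 0.60, 0.30, 0.10],
              [0.00, 0.00, 0.90, 0.10],
              [0.00, 0.00, 0.00, 1.00]])

def make_absorbing(M, states):
    """Step 1: perturb M so that the given states become absorbing (M')."""
    Mp = M.copy()
    for s in states:
        Mp[s, :] = 0.0
        Mp[s, s] = 1.0
    return Mp

def expected_duration(M, start, targets, max_n=2000):
    """Step 2: expected first-passage time from `start` into `targets`."""
    Mp = make_absorbing(M, targets)
    dist = np.zeros(M.shape[0])
    dist[start] = 1.0
    prev_cdf = expectation = 0.0
    for n in range(1, max_n + 1):
        dist = dist @ Mp
        cdf = dist[targets].sum()            # P{duration <= n}
        expectation += n * (cdf - prev_cdf)  # accumulate n * P{duration = n}
        prev_cdf = cdf
    return expectation

# Expected preclinical duration: from S1 (index 0) to O or DD (indices 2, 3).
E_preclinical = expected_duration(C, start=0, targets=[2, 3])
```

The same routine with a different `targets` set gives quantities such as the expected time to metastasis discussed below.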
In such cases, the perturbation of a given matrix (e.g., P) will be represented with a prime (e.g., P′). Let
• R_a represent the lifetime risk of activating the disease. R_a = Σ_{a∈A} P{T_{S_1} = a}.
• R_d represent the lifetime risk of being diagnosed with the disease. We first modify the transition probabilities in each P(a) so that the health states in O become absorbing states. Denoting the perturbed transition probability matrix as P′(a), the lifetime risk of having a diagnosis can simply be calculated as follows:
  R_d = π_H(a_0) Σ_{i∈O} P′^{(a_T − a_0)}_{H,i}(a_0).
• T̃ denote the preclinical duration, and E[T̃] denote its expected value. We define the preclinical duration as the duration from the disease activation, S_1, to a state in O ∪ {DD}. We modify the transition probabilities in the DTMC, C, so that the health states in O become absorbing states. The resulting transition probability matrix is denoted as C′. Then
  P{T̃ ≤ n} = Σ_{j∈O∪{DD}} C′^{(n)}_{S_1,j},   n = 1, 2, ...,
  P{T̃ = n} = P{T̃ ≤ n} − P{T̃ ≤ n − 1}, and
  E[T̃] = Σ_n n P{T̃ = n}.

In addition to the preclinical duration, an analyst may be interested in the amount of time spent in certain health states. An example is the duration before cancer metastasizes. In this case, let the time to metastasis be the duration from disease onset, S_1, to the health state(s) corresponding to disease metastasis or worse (i.e., DD). The expected time to metastasis can be calculated similarly to the preclinical duration, by first modifying the transition probabilities in C so that the health states representing disease metastasis are absorbing.

6.1.4 Modeling Disease Activation as a Piecewise-Linear Function

In order to further reduce the number of model parameters involved in the calibration process, we model the disease activation process as a piecewise-linear function of age.
Specifically, if {t_i : i = 1, ..., N} ⊆ A represent the break points of the piecewise-linear function, then

P_{H,S_1}(a) = c_1 + β_1 a   if a ∈ (a_0, t_1]
             = c_2 + β_2 a   if a ∈ (t_1, t_2]
               ...
             = c_N + β_N a   if a ∈ (t_{N−1}, t_N]   (6.4)

where c_i and β_i represent the intercept and slope for the i-th linear function, respectively. Assuming that individuals can only activate the disease after entering the model at a_0, the continuity of the piecewise-linear function leads to the following:

c_1 = −β_1 a_0,   (6.5)
c_i = c_{i−1} + (β_{i−1} − β_i) t_{i−1},   i = 2, ..., N.   (6.6)

As a result of (6.5) and (6.6), the piecewise-linear function in the form of (6.4) has only N parameters (i.e., the slopes), leading to a decrease of T − N in the number of model parameters. Another advantage is the gain in flexibility in the shape of the activation function. A piecewise-linear function with more pieces can better approximate a curvature compared to one with few. Also, a small change in a slope results in a different shape.

6.1.5 The Calibration Problem

As in Chapter 3 and Chapter 5, the calibration process under the conditionally stationary model may be represented as a nonlinear program. Let
• τ̂ = {τ̂_t : t ∈ T} denote the set of calibration targets available via epidemiological and clinical studies,
• τ(P) = {τ_t(P) : t ∈ T} denote the corresponding set of modeled outcomes associated with the model P,
• GOF(τ(P), τ̂) denote the goodness-of-fit function that specifies the deviation of the modeled outcomes from the calibration targets,
• V denote the set of model parameters, P, that do not violate any known validity conditions.

The calibration problem for a disease model can be represented as:

(CP):  minimize  GOF(τ(P), τ̂)
       subject to  P ∈ V

Solving the Calibration Problem and Identifying a Plausible Model

(CP) is highly nonlinear and nonconvex, as the calculation of τ involves matrix polynomials. Heuristics such as Nelder-Mead (Nelder and Mead [1965]) can be used to solve (CP).
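A minimal sketch of a random-restart Nelder-Mead scheme of this kind is shown below, using SciPy's implementation on a toy absolute-deviation objective. The targets, the initialization range, the number of restarts, and the acceptance threshold are all placeholders, and the toy function stands in for the model-based GOF of (CP); none of this is the dissertation's actual code.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)
targets = np.array([3.0, 6.0, 9.0])      # toy calibration targets

def gof(theta):
    """Toy absolute-deviation objective standing in for GOF(tau(P), tau_hat)."""
    return np.abs(theta - targets).sum()

retained = []
for _ in range(8):                        # randomized initializations
    x0 = rng.uniform(0.0, 12.0, size=targets.size)
    res = minimize(gof, x0, method="Nelder-Mead",
                   options={"xatol": 1e-9, "fatol": 1e-9,
                            "maxiter": 5000, "maxfev": 5000})
    # Acceptance filter: average deviation relative to the average target
    # value, in the spirit of the fit ratio used in this section.
    fit_ratio = (gof(res.x) / targets.size) / targets.mean()
    if fit_ratio <= 1e-3:                 # threshold epsilon (placeholder)
        retained.append(res.x)
```

Each retained vector plays the role of one plausible model; in the dissertation the retained parameter sets are the points at which the Nelder-Mead search stabilizes on the actual calibration problem.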
As this is a heuristic search, convergence to local minima can happen. In our implementation, we repeatedly solve (CP) with randomized initializations and retain the models at which the Nelder-Mead method stabilizes. All retained models are examined so that only those with a good fit to each type of calibration target are considered plausible and are subjected to subsequent analyses. Specifically, the model fit for a calibration target of type t is considered good if the average deviation from that target type is small relative to the average target value. That is, if the goodness-of-fit function is the absolute deviation from the targets, then the fit is considered good if

(GOF(τ_t(P), τ̂_t)/n_t) / (T_t/n_t) ≤ ε_t   (6.7)

where T_t represents the sum of targets of type t, n_t denotes the size of target t, and ε_t is the threshold below which the fit is considered good. We refer to the ratio on the left-hand side of (6.7) as a "fit ratio". Note that one may assign different thresholds for different target types, so that the model fit to certain target types is required to be tighter than for the others.

6.2 Bibliography

Nelder, J. A. and Mead, R. (1965). A simplex method for function minimization. The Computer Journal, 7(4):308–313.

Appendix C: Derivations

C.1 P^{(n)}_{i,j}(a); n-Step Transition Probability

When n = 2,

P^{(2)}_{i,j}(a) = Σ_{k∈S} P_{i,k}(a) P_{k,j}(a + 1)
                 = Σ_{k=S_1}^{S_N} P_{i,k}(a) P_{k,j}(a + 1)
                 = Σ_{k=S_1}^{S_N} (q_DO(a) C_{i,k})(q_DO(a + 1) C_{k,j})
                 = q_DO(a) q_DO(a + 1) Σ_{k=S_1}^{S_N} C_{i,k} C_{k,j}
                 = R(a, 2) C^{(2)}_{i,j}.

The second equality follows from the fact that DD and DO are absorbing states, and regression to H is not allowed post activation. The third equality results from (6.2).
Assume that P^{(n)}_{i,j}(a) = R(a, n) C^{(n)}_{i,j}. Then

P^{(n+1)}_{i,j}(a) = Σ_{k∈S} P^{(n)}_{i,k}(a) P_{k,j}(a + n)
                   = Σ_{k=S_1}^{S_N} P^{(n)}_{i,k}(a) P_{k,j}(a + n)
                   = Σ_{k=S_1}^{S_N} R(a, n) C^{(n)}_{i,k} q_DO(a + n) C_{k,j}
                   = R(a, n) q_DO(a + n) Σ_{k=S_1}^{S_N} C^{(n)}_{i,k} C_{k,j}
                   = R(a, n + 1) C^{(n+1)}_{i,j}.

C.2 S(i, a, n); Survival Probabilities

Surviving n years following an initial diagnosis that occurs in state i ∈ O at age a is equivalent to being in state j ≠ DD, DO at age a + n, given that the person is in state i at age a. By the definition of an n-step transition probability,

S(i, a, n) = Σ_{j≠DD,DO} P^{(n)}_{i,j}(a)
           = Σ_{j≠DD,DO} R(a, n) C^{(n)}_{i,j}
           = R(a, n) Σ_{j≠DD} C^{(n)}_{i,j}
           = R(a, n)(1 − C^{(n)}_{i,DD}).

C.3 P{T_{S_1} = a}; Disease Activation

For an individual to activate the disease for the first time at age a, it must be the case that the individual remains healthy prior to a and the transition from H to S_1 occurs at a − 1. Since all individuals enter the model at a_0 and activation cannot occur prior to entering the model, π_i(a_0) = 0 for i = S_1, ..., S_N, DD. That is, for a ≥ a_1,

P{T_{S_1} = a} = π_H(a_0) P_{H,H}(a_0) P_{H,H}(a_1) ... P_{H,H}(a − 2) P_{H,S_1}(a − 1)
               = π_H(a_0) (∏_{t=a_0}^{a−2} P_{H,H}(t)) P_{H,S_1}(a − 1).

Chapter 7: A Risk- and Subtype-Differentiated Model for Ovarian Cancer Prevention

7.1 Introduction

Chapter 3 introduces a systematic approach to examining the impact of calibration uncertainty in model-based analysis for medical decision making. The approach is illustrated using a fictitious disease. The disease is modeled as a discrete-time Markov chain (DTMC), and the process of calibrating the DTMC parameters to available data is represented as a constrained nonlinear optimization problem (NLP). In Chapter 5, this approach is used to develop a model of ovarian cancer. The NLP is used to characterize the set of plausible models for the disease in question. The natural history model is general in that the patient population is not differentiated by risk for ovarian cancer, and the disease is not differentiated by subtype.
Search techniques are used to obtain a collection of plausible models, which are examined to assess the plausible range of predicted outcomes.

In this chapter, we build on Chapter 3 and Chapter 5 by introducing an extended model of ovarian cancer that can be used to investigate the efficacy of preventive or early detection strategies. Early detection can often result in mortality reduction or survival benefit. Despite this, the U.S. Preventive Services Task Force (USPSTF) currently recommends against screening for ovarian cancer (US Preventive Services Task Force [2018]). This is likely due to the limited sensitivity and specificity of screening tests, which leads to limited mortality reduction and moderate (and potentially substantial) harms resulting from false-positive screening test results (US Preventive Services Task Force [2018]). As a result, ovarian cancer remains the fifth deadliest cancer for females, with 22,000 estimated new cases and 14,000 estimated deaths in 2018 (American Cancer Society [2018]). The majority of new cases (60%) are diagnosed at the distant stage, where the 5-year survival rate is only 29%, while only 15% of the cases are diagnosed at the localized stage, where the 5-year survival rate is 92% (American Cancer Society [2018]).

Risk-reducing surgeries include the removal of the fallopian tubes (salpingectomy), the ovaries (oophorectomy), or both (salpingo-oophorectomy), on both sides (bilateral) or only one side (unilateral). Studies have identified fallopian tube lesions as precursors to serous ovarian cancer, the most aggressive form of ovarian cancer (Kindelberger et al. [2007]; Labidi-Galy et al. [2017]; Lee et al. [2007]; Piek et al. [2001]). The involvement of the fallopian tubes in serous carcinogenesis has led to a spike in interest in salpingectomy as a prophylactic measure for ovarian cancer, and specifically opportunistic salpingectomy, where the procedure is performed in addition to an already scheduled surgery.
The Society of Gynecologic Oncology recommends bilateral salpingo-oophorectomy between 35 and 40 years of age as a risk-reducing strategy for women at increased genetic risk of ovarian cancer (Walker et al. [2015]). Salpingectomy at the completion of childbearing, with delayed oophorectomy, is also recommended as an option (Walker et al. [2015]). Because clinical trials to examine the effects of these prophylactic surgeries do not appear to be imminent, model-based analyses might provide insight.

7.1.1 Literature Review

The literature on ovarian cancer includes a small number of model-based analyses. These include models that explore opportunities for early detection and opportunities for cancer prevention.

Early Detection Models

Skates and Singer [1991] develop a stochastic simulation model to evaluate the potential benefit of using the CA 125 radioimmunoassay to screen for ovarian cancer. Extending the model developed by Skates and Singer [1991], Urban et al. [1997] evaluate the efficacy and cost-effectiveness of six ovarian cancer screening strategies involving transvaginal ultrasound (TVS) and/or the CA 125 assay. Drescher et al. [2012] refine the model of Urban et al. [1997] by introducing hypothetical biomarker and imaging tests in the multimodal screening test, in which the first-line test is either CA 125 or a hypothetical biomarker assay and the second-line test is either TVS or a hypothetical imaging test. Schapira et al. [1993] develop a decision tree model to compare "no screening" to a one-time ovarian cancer screening with CA 125 and TVS in a cohort of 40-year-old women in the United States. Havrilesky et al. [2008] develop a discrete-time Markov model of ovarian cancer and examine the cost-effectiveness of a hypothetical screening test at intervals of 3–36 months, within the general population and within a high-risk population. Havrilesky et al. [2011] extend their original model (Havrilesky et al. [2008]) to a subtype-differentiated model that incorporates two ovarian cancer phenotypes.
Brown and Palmer [2009] estimate the duration of the preclinical phase of serous ovarian cancer by incorporating data on occult cancers from prophylactic salpingo-oophorectomy specimens in their model.

Cancer Prevention Models

Before the recent finding of the role of fallopian tube disease in serous ovarian cancer, cancer prevention models tended to focus on women with a BRCA (a cancer susceptibility gene) mutation and the implications of having such a mutation. As such, both breast and ovarian cancers are incorporated in these models, as these women are at elevated risk for both (American Cancer Society [2018]; Schorge et al. [2010]). These model-based analyses are Markovian in nature, and generally examine the impact of prophylactic mastectomy and prophylactic oophorectomy (see Anderson et al. [2006], Grann et al. [2002, 2000, 1998, 2011], Norum et al. [2008], Schrag et al. [1997], and van Roosmalen et al. [2002]). The results from these model-based analyses are similar: prophylactic bilateral salpingo-oophorectomy, with or without prophylactic mastectomy, is effective or cost-effective for women who carry a BRCA mutation.

Three cancer prevention models include salpingectomy in their analyses. Kwon et al. [2013] develop a Markov Monte Carlo simulation model to compare the costs and benefits of salpingectomy with bilateral salpingo-oophorectomy among women with a BRCA mutation. Both breast and ovarian cancers are modeled. Kwon et al. [2015] focus on the cost-effectiveness of opportunistic salpingectomy as an ovarian cancer prophylactic strategy in the general population. As in Kwon et al. [2013], they develop a Markov Monte Carlo simulation model. There are four health states: well (i.e., not at risk), at risk, ovarian cancer, and death. Pre-menopausal women enter the model in the "at risk" state, and women who survive ovarian cancer after 10 years transition to the "well" state. Two hypothetical cohorts of women are considered separately.
The first cohort consists of 28,000 women (age 45) who undergo hysterectomy for benign conditions and are eligible for elective salpingectomy. Three strategies are considered for this cohort: hysterectomy alone, hysterectomy with salpingectomy, and hysterectomy with bilateral salpingo-oophorectomy. Hysterectomy with salpingectomy has the least cost and gains more life-years compared to hysterectomy alone or hysterectomy with oophorectomy, although the differences are small. Compared to hysterectomy alone, concomitant salpingectomy reduces ovarian cancer risk by 38.1%, and concomitant oophorectomy by 88.1%. The second cohort consists of 25,000 women (age 35) seeking surgical sterilization. For this cohort, tubal ligation and salpingectomy are considered. Compared to tubal ligation, salpingectomy is cost-effective and achieves a 29.2% reduction in ovarian cancer risk.

Dilley et al. [2017] develop a decision tree model for ovarian cancer to assess the cost-effectiveness of salpingectomy at the time of hysterectomy for benign indications and at permanent contraception for women in the general population. As in Kwon et al. [2015], they consider two cohorts of women: those seeking hysterectomy at age 45 and those seeking surgical sterilization at age 35. The lifetime risk of ovarian cancer (estimated at 1.3%) and ovarian cancer mortality 5 years post-diagnosis (54%) are based on the Surveillance, Epidemiology, and End Results (SEER) Program. The estimates of risk reduction associated with tubal ligation, bilateral salpingectomy, and hysterectomy are derived from Falconer et al. [2015]. The surgical complication rates are based on published studies such as Jamieson et al. [2000]. For the permanent contraception model, women may experience unintended pregnancy or ectopic pregnancy after tubal ligation or salpingectomy. The probabilities of these complications are derived from Peterson et al. [1997].
For the hysterectomy cohort, hysterectomy with salpingectomy is the dominant, cost-saving strategy. For the permanent contraception cohort, salpingectomy is cost-effective.

Summary

Despite the evidence of fallopian tube lesions as precursors to serous ovarian cancer, none of the models reviewed above have incorporated these precursor states. Also, very few ovarian cancer models examine the effect of salpingectomy. Most cancer prevention models focus on women with a BRCA mutation and the implications of such a mutation. Few cancer prevention models focus on the general population, and conclusions drawn from models tailored to women who carry a BRCA mutation may not generalize to women at low risk. In addition, existing cancer prevention models do not differentiate cancer-related health states by cancer stage, nor do they model pre-diagnostic health states. A restrictive set of health states can result in a misrepresentation of the cost-effectiveness of a prophylactic surgery, since prophylactic surgery targets women without a cancer diagnosis. As a result, the inclusion of pre-diagnostic states within the natural history is critical. By adopting the approach to model development described in Chapter 3, we can examine the breadth of plausible representations of ovarian cancer and quantify the range of important yet unobservable disease characteristics.

7.2 Methods

Our methods include the specification of a model of ovarian cancer progression over time. This requires the specification of the structure of the model under development, the data used for model calibration, the validity conditions, and the calibration process. In this section, we discuss these elements in turn.

7.2.1 Model Structure

We develop a discrete-time Markov model for each ovarian cancer subtype (serous/non-serous) for women at two risk levels: those with and those without a BRCA mutation.
We choose to incorporate the BRCA genetic mutation in our model, as it is an established risk factor for ovarian cancer (American Cancer Society [2018]; Permuth-Wey and Sellers [2009]; Schorge et al. [2010]). Although only 0.1% to 0.2% of the general population are BRCA mutation carriers (Dowdy et al. [2004]), the cumulative risk of ovarian cancer by age 80 is estimated to be 44% for BRCA1 mutation carriers and 17% for BRCA2 mutation carriers (Kuchenbaecker et al. [2017]), as opposed to the SEER projected lifetime risk of 1.3% for the general population. Additionally, the mean age at diagnosis for BRCA-associated cases is significantly younger than that for non-BRCA-associated cases (Boyd et al. [2000]). This suggests that disease initiation is risk-differentiated.

The non-serous subtype is modeled using the model structure specified in Chapter 5 and includes the following health states:
• healthy (H);
• stage 1 ("localized"), 2 ("regional"), or 3 ("distant") disease, which may be
  – undiagnosed (1U, 2U, 3U), or
  – diagnosed (1D, 2D, 3D); and
• death from
  – the disease (DD), or
  – other causes (DO).

The serous model includes these nine health states and adds two undiagnosed health states representing the fallopian tube precursor lesions, F1 and F2. As a result, there are eleven health states in the serous model.

Each subtype is modeled as a strictly progressive disease, due to the lack of evidence suggesting otherwise. Hence, transitions from a disease state to healthy, or to a less severe disease state, are excluded. Also, patients who are in a diagnosed disease state remain within the diagnosed states until death. Because the impact of treatments can be embedded within the post-diagnosis data from SEER, we permit transition probabilities from undiagnosed disease states to differ from those from diagnosed disease states.

The two model structures are represented as conditionally stationary models.
In modeling a subtype, serous (S) or non-serous (NS), the transition from H to a disease state (i.e., F1 for subtype S and 1U for subtype NS) is referred to as disease activation. Following activation, a patient progresses according to the DTMC, or succumbs to competing risks. We illustrate the model structure in Figure 7.1, in which dashed lines are specific to subtype S, dotted lines are specific to NS, and solid lines are common to both. To maintain visual clarity, transitions into DO, which can occur from any state (excluding DD), are not shown in the figure.

Figure 7.1: Possible transitions for the serous (dashed) and non-serous (dotted) models (transitions into DO are not shown)

To describe the model(s) mathematically, we require the following notation:
• A = {20, 21, ..., 85}, the set of ages for which patients are modeled,
• T = {S, NS}, the set of all subtypes being modeled, where S represents serous ovarian cancer and NS represents non-serous ovarian cancer,
• R = {B, NB}, the set of all patient types being modeled, where B represents a patient with a BRCA mutation and NB represents a patient without a BRCA mutation,
• S(t) = the set of all health states for subtype t ∈ T. That is,
  – S(NS) = {H, 1U, 2U, 3U, 1D, 2D, 3D, DD, DO} and
  – S(S) = S(NS) ∪ {F1, F2},
• D = {1D, 2D, 3D}, the set of all health states corresponding to a diagnosis of the disease,
• U(t) = the set of all health states corresponding to undiagnosed disease or the absence of disease for subtype t ∈ T. That is,
  – U(NS) = {H, 1U, 2U, 3U} and
  – U(S) = U(NS) ∪ {F1, F2},
• P^{t,r}(a) = [P^{t,r}_{i,j}(a)] = the transition probability matrix corresponding to subtype t ∈ T for a patient of type r ∈ R at age a. That is, P^{t,r}_{i,j}(a) denotes the probability that a patient of type r who is in state i ∈ S(t) at age a ∈ A will be in state j ∈ S(t) at age a + 1,
• P^t = {P^{t,r}(a) : a ∈ A, r ∈ R}, and
• p_DO(a) = the probability that a female at age a will die from competing risks by age a + 1.
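The notation above translates directly into code. The sketch below (the variable names are our own, purely illustrative) encodes the sets and checks the state counts quoted earlier: nine non-serous health states and eleven serous health states.

```python
# Illustrative encoding of the Section 7.2.1 notation as Python sets.
AGES = range(20, 86)                                  # A = {20, ..., 85}
SUBTYPES = {"S", "NS"}                                # T
PATIENT_TYPES = {"B", "NB"}                           # R
S_NS = {"H", "1U", "2U", "3U", "1D", "2D", "3D", "DD", "DO"}
S_S = S_NS | {"F1", "F2"}                             # serous adds tube lesions
STATES = {"NS": S_NS, "S": S_S}                       # S(t)
D = {"1D", "2D", "3D"}                                # diagnosed states
UNDIAG = {"NS": {"H", "1U", "2U", "3U"},
          "S": {"H", "1U", "2U", "3U", "F1", "F2"}}   # U(t)

# Consistency checks against the counts stated in the text.
assert len(STATES["NS"]) == 9 and len(STATES["S"]) == 11
assert D <= STATES["NS"] and UNDIAG["S"] <= STATES["S"]
```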
7.2.2 Calibration Targets

Based on the International Classification of Diseases for Oncology (ICD-O) by the World Health Organization (WHO) (Tavassoli and Devilee [2003]), we obtain subtype-specific data from the Surveillance, Epidemiology, and End Results (SEER) Program. For each subtype, age-specific incidence for all ages up to 85 is obtained by aggregating data from 2000 through 2014. In addition, the 5-year survival data by age and stage at diagnosis for each subtype are extracted from SEER. Ovarian cancer mortality data for all ages up to 85 are obtained from DevCan, a statistical software package developed under the SEER program. All-cause mortality data for all ages up to 85 are obtained from the United States Life Tables (Arias et al. [2016]).

The SEER projected lifetime risk of developing ovarian cancer is 1.3% for the general population. We derive the cumulative risk of ovarian cancer by age 80 for BRCA mutation carriers by weighting the estimates for BRCA1 and BRCA2 mutation carriers from Kuchenbaecker et al. [2017] using the proportion of each mutation type among all BRCA mutation carriers (refer to Appendix D.1).

Subtype-specific data are necessary to inform our model. Because disease mortality and cumulative risks for each patient type are not differentiated by subtype, we derive subtype-specific data using the following procedure: we weight ovarian cancer mortality by age 85 and the lifetime risk of ovarian cancer for the general population using the proportion of each ovarian cancer subtype, which is estimated through the case listing data in SEER. Similarly, the subtype-specific risk by age 80 for BRCA mutation carriers is obtained by weighting the cumulative risk of ovarian cancer for BRCA mutation carriers using the proportion of each ovarian cancer subtype among BRCA-associated cases (Bolton et al. [2012]; Candido-dos Reis et al. [2015]; Kotsopoulos et al. [2016]; McLaughlin et al. [2013]).
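This weighting procedure is a simple probability-weighted average. In the sketch below, the 44% and 17% cumulative-risk estimates are the values quoted from Kuchenbaecker et al. [2017] earlier in this chapter, while the BRCA1/BRCA2 proportions are hypothetical placeholders standing in for the Appendix D.1 values, which are not reproduced here.

```python
# Weighted cumulative risk of ovarian cancer by age 80 for BRCA carriers.
# Risks are quoted in the text (Kuchenbaecker et al. [2017]); the
# mutation-type proportions are HYPOTHETICAL placeholders for Appendix D.1.
risk_by_mutation = {"BRCA1": 0.44, "BRCA2": 0.17}
proportion = {"BRCA1": 0.55, "BRCA2": 0.45}   # placeholder proportions

assert abs(sum(proportion.values()) - 1.0) < 1e-12
risk_brca = sum(risk_by_mutation[m] * proportion[m] for m in risk_by_mutation)
# The same weighting pattern, applied with subtype proportions among
# BRCA-associated cases, yields the subtype-specific risk targets.
```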
The data presented in this section represent the collection of calibration targets that are compared with modeled outcomes when assessing the model fit. We use a "hat" to differentiate these calibration targets from the corresponding modeled outcomes. For instance, α̂^t_i(a) represents the observed incidence for subtype t for state i at age a, while α^t_i(a) represents the corresponding modeled incidence. For a given subtype t ∈ T, the set of calibration targets is denoted as follows:
• α̂^t(a) = observed subtype-specific incidence at age a ∈ A,
• Ŝ^t(i, a, n) = observed n-year subtype-specific, post-diagnosis survival in state i ∈ D at age a ∈ A,
• π̂^t_DD(85) = observed subtype-specific mortality by age 85,
• μ̂^t = estimated cumulative risk of ovarian cancer of subtype t by age 85 for women without a BRCA mutation, and
• ν̂^t = estimated cumulative risk of ovarian cancer of subtype t by age 80 for BRCA mutation carriers.

7.2.3 Modeled Outcomes

For a given set of NH model parameters, P^t, incidence, prevalence, and survival by age and stage are defined as follows:
• α^{t,r}_i(a; P^t) = the probability that a patient receives an initial diagnosis in state i ∈ D ∪ {DD} at age a ∈ A,
• π^{t,r}_i(a; P^t) = the probability that a patient is in state i ∈ S(t) at age a ∈ A,
• S^{t,r}(i, a, n; P^t) = the probability that a patient who is diagnosed in state i ∈ D at age a ∈ A will remain alive n years following diagnosis,
• μ^t(P^t) = the probability that a patient without a BRCA mutation will develop ovarian cancer of subtype t by age 85, and
• ν^t(P^t) = the probability that a patient with a BRCA mutation will develop ovarian cancer of subtype t by age 80.

α^{t,r}_i(a; P^t), π^{t,r}_i(a; P^t), and S^{t,r}(i, a, n; P^t) can be calculated according to the details outlined in Chapter 6. Recognizing that most of the risk-differentiated modeled outcomes are not directly comparable with the observed data described in §7.2.2, we first combine risk-specific outcomes for each subtype.
That is,

$\alpha^t_i(a; P^t) = \sum_{r \in R} \rho_r \, \alpha^{t,r}_i(a; P^t) \qquad \forall\, i \in D \cup \{DD\},\ t \in T$

$\pi^t_i(a; P^t) = \sum_{r \in R} \rho_r \, \pi^{t,r}_i(a; P^t) \qquad \forall\, i \in S(t),\ t \in T$

where $\rho_B$ is the prevalence of BRCA mutations, which is estimated at 0.15% (Dowdy et al. [2004]). Finally,

$\alpha^t(a; P^t) = \sum_{i \in D} \alpha^t_i(a; P^t), \qquad \mu^t(P^t) = \sum_{i \in D} \sum_{a \le 85} \alpha^{t,NB}_i(a; P^t), \qquad \nu^t(P^t) = \sum_{i \in D} \sum_{a \le 80} \alpha^{t,B}_i(a; P^t).$

7.2.4 Model Fit

We assess a model's fit based on the differences between the calibration targets and the modeled outcomes. For a given set of parameter values, $P^t$, the objective function with subtype-specific calibration targets (i.e., incidence, survival, disease mortality, and cumulative risk for each patient type) is defined as the total weighted absolute deviation from the targets, which is given by:

$\mathrm{GOF}(P^t) = \sum_{\tau \in \{\alpha, S, DD, \mu, \nu\}} w^t_\tau \, \mathrm{GOF}^t_\tau(P^t)$
$\quad = w^t_\alpha \sum_{a \in A} |\alpha^t(a; P^t) - \hat{\alpha}^t(a)| + w^t_S \sum_{a \in A} \sum_{i \in D} \sum_n |S^t(i,a,n; P^t) - \hat{S}^t(i,a,n)|$
$\qquad + w^t_{DD} \, |\pi^t_{DD}(85; P^t) - \hat{\pi}^t_{DD}(85)| + w^t_\mu \, |\mu^t(P^t) - \hat{\mu}^t| + w^t_\nu \, |\nu^t(P^t) - \hat{\nu}^t|. \qquad (7.1)$

The coefficients $w^t_\alpha$, $w^t_S$, $w^t_{DD}$, $w^t_\mu$, and $w^t_\nu$ are the weights associated with the types of calibration targets: incidence, survival, disease mortality, cumulative risk for women without a BRCA mutation, and cumulative risk for women with a BRCA mutation, respectively. Because the targets vary in magnitude significantly, the weights are selected so that all target types are placed on similar footing. That is, the weights are the reciprocal of the average target value. One exception is $w^t_\mu$ and $w^t_\nu$, because a larger weight is necessary in order to encourage consistency. Accordingly, $w^t_\mu$ and $w^t_\nu$ are set at 35 times the reciprocal of the average target value for the non-serous subtype and 15 times for the serous subtype.

7.2.5 Model Simplifications

$\{P^t, t \in T\}$ involves more than 3500 model parameters. Because the calculations of the modeled outcomes require matrix multiplications, (7.1) is highly nonlinear and nonconvex, which taxes any solver. As a result, simplifications are necessary.
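The weighted absolute-deviation objective in (7.1) can be sketched in a few lines; the target-type keys, values, and weights below are toy inputs for illustration, not the dissertation's data.

```python
import numpy as np

def gof(weights, modeled, targets):
    """Total weighted absolute deviation from the calibration targets,
    in the spirit of equation (7.1). `modeled` and `targets` map each
    target type to arrays of matching shape."""
    total = 0.0
    for tau, w in weights.items():
        total += w * np.sum(np.abs(modeled[tau] - targets[tau]))
    return total

# Toy example; per the text, weights are reciprocals of average target values.
targets = {"incidence": np.array([10.0, 20.0]), "mortality": np.array([0.5])}
modeled = {"incidence": np.array([12.0, 19.0]), "mortality": np.array([0.6])}
weights = {tau: 1.0 / np.mean(vals) for tau, vals in targets.items()}
print(round(gof(weights, modeled, targets), 3))  # 0.4
```

Up-weighting selected target types (as done for the cumulative-risk targets) only changes the entries of `weights`.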
For a given ovarian cancer subtype $t \in T$ and patient type $r \in R$, we represent the model as a conditionally stationary model. Other model simplifications are stated below.

• Progression following activation is independent of age and patient type. That is,

$P^{t,r}_{i,j}(a) = (1 - p_{DO}(a)) \, P^t_{i,j} \qquad i \in S(t) \setminus \{H, DD, DO\}$ and $j \in S(t) \setminus \{H, DO\}.$

• Disease activation is modeled as a continuous piecewise-linear function of age for each subtype and patient type. Specifically,

$P^{t,B}_{H,1U}(a) = \begin{cases} \lambda^{t,B}_1 a + c^{t,B}_1 & \text{if } a \le 30 \\ \lambda^{t,B}_2 a + c^{t,B}_2 & \text{if } a \in (30,40] \\ \lambda^{t,B}_3 a + c^{t,B}_3 & \text{if } a \in (40,60] \\ \lambda^{t,B}_4 a + c^{t,B}_4 & \text{if } a \in (60,70] \\ \lambda^{t,B}_5 a + c^{t,B}_5 & \text{if } a \in (70,85] \end{cases} \qquad (7.2)$

and

$P^{t,NB}_{H,1U}(a) = \begin{cases} 0 & \text{if } a \le 30 \\ \lambda^{t,NB}_1 a + c^{t,NB}_1 & \text{if } a \in (30,40] \\ \lambda^{t,NB}_2 a + c^{t,NB}_2 & \text{if } a \in (40,75] \\ \lambda^{t,NB}_3 a + c^{t,NB}_3 & \text{if } a \in (75,85] \end{cases} \qquad (7.3)$

The break points in (7.2) are selected based on the general trend observed from age-specific incidence estimates for BRCA mutation carriers, which are derived from Finch et al. [2006] and Kuchenbaecker et al. [2017] (see Appendix D.1). The break points in (7.3) are based on the incidence trend observed from SEER data and are consistent with the age-at-diagnosis distribution reported in Boyd et al. [2000].

• Competing-risk mortality is set at the difference between the observed all-cause mortality and disease mortality for all subtypes and patient types. Namely,

$p_{DO}(a) = \hat{p}_{DO}(a), \qquad a \in A.$

As a result of the simplifications, the set of unknown parameter values now becomes

• $\lambda^{t,r}_i$, $\forall\, i$, $t \in T$, $r \in R$, and
• $P^t_{i,j}$, $i \in S(t) \setminus \{H, DD, DO\}$, $j \in S(t) \setminus \{H, DO\}$, $t \in T$.

In other words, the total number of variables is significantly reduced to 16 + 24 + 28 = 68 variables.

7.2.6 Validity Conditions

There are many conditions that can be used to eliminate potential sets of model parameters that are necessarily invalid. For example, probabilities must be nonnegative, and must sum to one across certain subsets.
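A continuous piecewise-linear function such as (7.2)/(7.3) can be evaluated by accumulating slope increments between knots, so the intercepts never need to be stored explicitly. The knots below follow (7.2) for a BRCA carrier, but the slope values are placeholders, not calibrated parameters.

```python
def activation_probability(a, knots, slopes, p0=0.0):
    """Evaluate a continuous piecewise-linear function of age, as in
    (7.2)/(7.3): segment k has slope slopes[k] on (knots[k], knots[k+1]],
    and the intercepts c_k are implied by continuity, starting from the
    value p0 at age knots[0]."""
    p = p0
    for k, slope in enumerate(slopes):
        lo, hi = knots[k], knots[k + 1]
        if a <= lo:
            break
        p += slope * (min(a, hi) - lo)
    return p

# Knots follow (7.2) for a BRCA carrier; slopes are illustrative only.
knots = [20, 30, 40, 60, 70, 85]
slopes = [0.0, 0.001, 0.002, -0.001, 0.0005]
print(round(activation_probability(50, knots, slopes), 4))  # 0.03
```

Parameterizing by slopes alone is one way to arrive at the reduced variable count described in the text, since continuity determines each $c_k$.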
Any set of proposed parameter values is easily recognized as invalid if these conditions are violated. Other examples of validity conditions include:

• the likelihood of a diagnosis increases with the severity of the disease,
• the likelihood of progression from a given state to a "next" state decreases with the severity of that "next" state, and
• the likelihood of progression from a given health state increases with the severity of that health state.

Taking into account that the impact of treatments is likely to be contained within the post-diagnosis data obtained from SEER, we also impose constraints on all subtypes so that progression is more likely prior to diagnosis than after diagnosis. Additionally, we impose constraints on the slopes in (7.2) according to the general trend observed in Figures D.1 and D.2. Similarly, the slope parameters in (7.3) are constrained to follow the trend observed from the SEER data. We also impose constraints so that, for each subtype, women with a BRCA mutation are more likely to activate the disease at ages 40 and 60 than women without such a mutation.

We incorporate these conditions as constraints on the model parameters so that parameter sets that violate the validity conditions are eliminated from consideration. We denote the set of validity conditions for $P^t$ as $V^t$, which is available in Appendix D.2.

7.2.7 The Calibration Problem

We adopt the modeling framework presented in Chapter 3 and Chapter 5 to identify plausible models by formulating the calibration process as a nonlinear program for each subtype. Using model simplifications, calibration targets, modeled outcomes, and validity conditions, the two calibration problems can be represented as:

$(\mathrm{CP}^t):\quad \text{minimize } \mathrm{GOF}(P^t) \quad \text{subject to } P^t \in V^t$

where $V^t$ consists of the subtype-specific validity conditions discussed in §7.2.6. We use a heuristic search introduced in Nelder and Mead [1965] to solve $(\mathrm{CP}^t)$, $t = S, NS$. A heuristic search can stall at suboptimal points.
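A randomized-restart Nelder-Mead loop of this kind can be sketched with SciPy on a toy two-parameter stand-in for $\mathrm{GOF}(P^t)$; the objective, the penalty for violated validity conditions, and the tolerances below are illustrative assumptions, not the dissertation's implementation.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
target = np.array([0.2, 0.5])  # toy "true" parameters

def penalized_gof(x):
    """Toy stand-in for GOF(P^t): L1 deviation from `target`, plus a
    penalty that discourages violating one simple validity condition
    (probabilities must be nonnegative)."""
    return np.sum(np.abs(x - target)) + 1e3 * np.sum(np.maximum(0.0, -x))

retained = []  # parameter sets at which the search stabilized
for _ in range(5):  # randomized re-initializations
    x0 = rng.uniform(0.0, 1.0, size=2)
    res = minimize(penalized_gof, x0, method="Nelder-Mead",
                   options={"xatol": 1e-8, "fatol": 1e-8})
    retained.append((res.fun, res.x))

best_fun, best_x = min(retained, key=lambda pair: pair[0])
print(np.round(best_x, 3))  # approximately [0.2 0.5]
```

Retaining every stabilized parameter set, rather than only the best one, is what later permits the plausibility screening described below.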
To safeguard against this, each time the search method appears to stabilize at a parameter set, we retain the parameter set and randomly re-initialize the search. All parameter sets that are retained are examined for their fit to each type of calibration target. A parameter set is considered to fit the calibration targets of type $\tau$ well if the average deviation from the targets of type $\tau$ is small in comparison to the average target value. That is,

$\frac{\mathrm{GOF}^t_\tau(P^t)/n_\tau}{T_\tau/n_\tau} \le \epsilon_\tau \qquad (7.4)$

where

• $T_\tau$ denotes the sum of targets of type $\tau$,
• $n_\tau$ denotes the number of targets of type $\tau$, and
• $\epsilon_\tau$ denotes the threshold below which the fit is regarded as good.

Parameter sets that fit each target type well are considered to be plausible and are used in subsequent analyses. As in Chapter 3 and Chapter 5, we refer to the ratio on the left-hand side of (7.4) as a "fit ratio". We require that at least four of the five fit ratios be at most 0.15, and that all fit ratios be at most 0.25. In other words, a plausible model is required to fit at least four target types exceptionally well; a model with only four exceptional fits is still plausible if the remaining fit ratio is at most 0.25.

7.3 Illustrative Results

Using the approach described in §7.2.7, approximately 55,000 sets of model parameters for the non-serous model and 53,000 sets for the serous model were retained via the Nelder-Mead search. Of these, 25 non-serous models and 66 serous models are considered plausible and used in subsequent analyses. We examined the following disease characteristics:

• the expected duration of the preclinical phase for each subtype,
• the expected time from disease activation to metastasis for each subtype,
• the expected time spent in precursor lesions for the serous subtype, and
• the mean age at activation for each subtype and patient type.

The calculation of the expected duration described in Chapter 6 provides an estimate by "censoring" patients who die from competing risks.
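The fit-ratio screen of (7.4) and the plausibility rule above translate directly into code; the threshold structure below follows §7.2.7, while the example ratios and the dictionary keys naming the target types are made up.

```python
def fit_ratio(gof_tau, target_sum, n_targets):
    """Fit ratio of equation (7.4): the average deviation from the targets
    of one type, divided by the average target value of that type."""
    return (gof_tau / n_targets) / (target_sum / n_targets)

def is_plausible(ratios, tight=0.15, loose=0.25, n_tight=4):
    """A parameter set is deemed plausible when at least `n_tight` of its
    fit ratios are <= `tight` and every fit ratio is <= `loose`."""
    values = list(ratios.values())
    return sum(v <= tight for v in values) >= n_tight and max(values) <= loose

# Hypothetical fit ratios for the five target types of one parameter set.
ratios = {"alpha": 0.10, "S": 0.12, "DD": 0.14, "mu": 0.09, "nu": 0.22}
print(is_plausible(ratios))  # True
```

Applying `is_plausible` to every retained parameter set yields the collection of plausible models used in the analyses that follow.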
In this chapter, we calculate the expected durations with the effect of competing-risk mortality. The duration of the preclinical phase is defined to be the first passage time from disease activation to a diagnosed disease state (i.e., a state in {1D, 2D, 3D}). For the non-serous subtype, it is the first passage time from 1U; for the serous subtype, it is from F1. For each plausible model of a given ovarian cancer subtype, we calculate the expected duration of the preclinical phase. As there are 25 and 66 plausible models for the non-serous and serous subtypes, respectively, there exist 25 and 66 plausible values of the expected preclinical duration accordingly. Table 7.1 summarizes the results. We observe that for the non-serous subtype, the interquartile range of the plausible mean durations falls between 4.1 years and 6.8 years for women with a BRCA mutation and between 6.3 years and 6.7 years for women without a BRCA mutation. For the serous subtype, the interquartile ranges for women with a BRCA mutation and for those without are identical: between 4 years and 4.7 years.

The expected time to metastasis is defined as the first passage time from disease activation to {3U, 3D}. The results are presented in Table 7.2. For the non-serous subtype, the interquartile range of the expected time to metastasis falls between 4.4 years and 7.9 years for women with a BRCA mutation and between 6.7 years and 8.2 years for those without. For the serous subtype, it falls between 4.7 years and 6.9 years for women with a BRCA mutation and between 4.5 years and 5.9 years for those without.

For each of the plausible models for the serous subtype, we also calculate the expected time spent in precursor lesions, F1 and F2, for both patient types. The results are summarized in Table 7.3. The interquartile ranges for both patient types are almost identical: between 2 and 3 years.
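One standard way to compute such expected first-passage times from a transition matrix is via the fundamental matrix of an absorbing Markov chain; the 4-state annual-cycle matrix below is a toy illustration, not a calibrated model, and the time reported is the expected time until absorption by either diagnosis or death.

```python
import numpy as np

def expected_first_passage(P, start, absorbing):
    """Expected number of steps from `start` until first entry into any
    state in `absorbing`, computed from the fundamental matrix
    N = (I - Q)^{-1}, where Q restricts P to the transient states."""
    n = P.shape[0]
    transient = [i for i in range(n) if i not in absorbing]
    Q = P[np.ix_(transient, transient)]
    N = np.linalg.inv(np.eye(len(transient)) - Q)
    steps = N.sum(axis=1)  # expected steps from each transient state
    return steps[transient.index(start)]

# Toy chain: state 0 = 1U, 1 = 2U, 2 = diagnosed, 3 = dead (illustrative).
P = np.array([[0.70, 0.20, 0.08, 0.02],
              [0.00, 0.60, 0.30, 0.10],
              [0.00, 0.00, 1.00, 0.00],
              [0.00, 0.00, 0.00, 1.00]])
print(round(expected_first_passage(P, start=0, absorbing={2, 3}), 6))  # 5.0
```

Repeating this calculation once per plausible parameter set produces the distribution of expected durations that the percentile tables summarize.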
Our results suggest that there is a multi-year window for early detection or preventive opportunities for all subtypes and patient types. For the serous subtype, patients spend approximately 2.5 years in precursor lesions regardless of their mutation status, and the difference between the median expected preclinical duration and the median expected time to metastasis is just slightly more than one year. This suggests that the disease progresses relatively quickly, which underlines the importance of early detection or prevention.

Subtype      Patient Type   25th Percentile   Median    75th Percentile
Non-serous   BRCA           4.1 yrs           4.9 yrs   6.8 yrs
             Non-BRCA       6.3 yrs           6.4 yrs   6.7 yrs
Serous       BRCA           4.0 yrs           4.3 yrs   4.7 yrs
             Non-BRCA       4.0 yrs           4.3 yrs   4.7 yrs

Table 7.1: Mean duration of preclinical phase, percentiles from plausible models

Subtype      Patient Type   25th Percentile   Median    75th Percentile
Non-serous   BRCA           4.4 yrs           5.1 yrs   7.9 yrs
             Non-BRCA       6.7 yrs           7.3 yrs   8.2 yrs
Serous       BRCA           4.7 yrs           5.5 yrs   6.9 yrs
             Non-BRCA       4.5 yrs           5.1 yrs   5.9 yrs

Table 7.2: Mean time to metastasis, percentiles from plausible models

Patient Type   25th Percentile   Median    75th Percentile
BRCA           2.2 yrs           2.5 yrs   2.8 yrs
Non-BRCA       2.3 yrs           2.5 yrs   2.9 yrs

Table 7.3: Mean duration in precursor lesions, percentiles from plausible models

Patient Type   25th Percentile   Median   75th Percentile
BRCA           61.7              72.7     76.0
Non-BRCA       64.8              64.9     65.0

Table 7.4: Mean age at activation for non-serous cancer, percentiles from plausible models

Patient Type   25th Percentile   Median   75th Percentile
BRCA           54.7              56.9     60.0
Non-BRCA       66.54             66.56    66.59

Table 7.5: Mean age at activation for serous cancer, percentiles from plausible models

We estimate the mean age at activation for each subtype and patient type. Tables 7.4 and 7.5 show the results. We note that for all subtypes, the interquartile range for the mean age at activation for women without a BRCA mutation is almost zero. We also observe a wide interquartile range for those with a BRCA mutation.
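The percentile summaries reported in Tables 7.1-7.5 are computed across the plausible models; with NumPy this is a one-liner (the duration values below are hypothetical, not the dissertation's results).

```python
import numpy as np

# One expected preclinical duration per plausible model (hypothetical values).
durations = np.array([3.8, 4.0, 4.1, 4.3, 4.4, 4.6, 4.7, 5.0])
q25, median, q75 = np.percentile(durations, [25, 50, 75])
print(round(q25, 3), round(median, 3), round(q75, 3))  # 4.075 4.35 4.625
```

Reporting the interquartile range rather than a single point estimate is what carries the calibration uncertainty through to the disease-characteristic results.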
Women with a BRCA mutation activate non-serous cancer at a median age of 73, whereas those without activate at a median age of 65. Women with a BRCA mutation also activate serous cancer at a median age of 57, whereas those without activate at age 67.

7.4 Discussion

In this chapter, we extend our ovarian cancer model from Chapter 5 by differentiating ovarian cancer subtypes and patient types. The models are represented as conditionally stationary models, and the calibration problems are formulated according to the framework introduced in Chapter 3. Our illustrative results indicate that there is a multi-year window in the preclinical phase for both disease subtypes, and 2.5 years is spent as fallopian tube disease for the serous subtype. The mean age at serous activation is predicted to be 55-60 years for women who carry a BRCA mutation and 66.5 years for those who do not.

Our modeling framework allows a broad array of analyses in addition to the ones presented in §7.3. For example, women at high risk of developing ovarian cancer might be interested in undertaking prophylactic surgery, such as salpingectomy and salpingo-oophorectomy, as soon as childbearing is complete. By contrast, women at general risk might prefer to delay the surgery until closer to natural menopause. Using the set of plausible models for each subtype, we examine the potential of prophylactic surgery, including bilateral salpingo-oophorectomy (BSO) and bilateral salpingectomy (BS), in reducing incidence and disease mortality as a function of age at surgery. We model the efficacy of the surgeries so that at the time of surgery, women who are healthy will have their probability of activating the serous subtype reduced by 85% with BS and by 99% with BSO for the rest of their lifetime. BSO is modeled as reducing the future probability of activating the non-serous subtype by 99%, whereas BS is modeled as having no effect on non-serous activation. The details are available in Appendix D.3.
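These efficacy assumptions amount to scaling the activation probability for women who are healthy at the time of surgery; the sketch below encodes only the healthy-state effects stated in this section (Appendix D.3 gives the full state-by-state rules, which also cover precursor and undiagnosed states).

```python
def post_surgery_activation(p_activate, subtype, surgery):
    """Annual activation probability for a healthy woman after
    prophylactic surgery: BS reduces serous activation by 85% and
    leaves non-serous activation unchanged; BSO reduces activation
    of either subtype by 99%."""
    reduction = {("serous", "BS"): 0.85,
                 ("serous", "BSO"): 0.99,
                 ("non-serous", "BS"): 0.00,
                 ("non-serous", "BSO"): 0.99}[(subtype, surgery)]
    return p_activate * (1.0 - reduction)

print(round(post_surgery_activation(0.002, "serous", "BS"), 6))  # 0.0003
print(post_surgery_activation(0.002, "non-serous", "BS"))        # 0.002
```

Running each plausible model with and without this scaling, for each candidate age at surgery, yields the incidence and mortality reductions reported next.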
With these, we calculate the median lifetime reduction in incidence (i.e., diagnoses) and mortality for each subtype and patient type for a selection of ages at surgery. Tables 7.6a - 7.6d summarize the results for both patient types.

                 Median incidence reduction   Median mortality reduction
Age at surgery   BS        BSO                BS        BSO
35               79.7%     97.3%              79.7%     96.9%
40               78.8%     96.1%              78.5%     95.1%
45               77.8%     94.5%              77.1%     92.1%
50               75.1%     91.0%              73.6%     87.9%

(a) Serous, BRCA

                 Median incidence reduction   Median mortality reduction
Age at surgery   BS        BSO                BS        BSO
35               83.6%     98.9%              82.9%     98.9%
40               83.6%     98.9%              82.9%     98.9%
45               83.4%     98.5%              83.0%     98.7%
50               81.8%     96.3%              81.6%     96.4%

(b) Serous, non-BRCA

                 Median incidence reduction   Median mortality reduction
Age at surgery   BS        BSO                BS        BSO
35               -0.7%     96.6%              0.4%      96.7%
40               -1.2%     95.1%              0.9%      94.6%
45               -2.3%     90.6%              1.5%      92.1%
50               -5.3%     82.6%              2.3%      89.4%

(c) Non-serous, BRCA

                 Median incidence reduction   Median mortality reduction
Age at surgery   BS        BSO                BS        BSO
35               -0.5%     97.4%              0.2%      97.6%
40               -2.0%     92.7%              0.9%      93.0%
45               -4.4%     85.8%              2.1%      86.4%
50               -7.9%     77.1%              3.7%      78.6%

(d) Non-serous, non-BRCA

Table 7.6: Median lifetime reduction in incidence and mortality, by subtype and patient type (BS: bilateral salpingectomy; BSO: bilateral salpingo-oophorectomy)

            Delayed oophorectomy
Age at BS   10-year delay   15-year delay
35          94.1            91.5
36          93.5            90.7
37          92.9            89.8
38          92.2            88.8
39          91.4            87.8
40          90.6            86.8

Table 7.7: Median lifetime reduction (%) in serous incidence for women with a BRCA mutation (combined strategy)

From Tables 7.6a - 7.6d, we observe that the reduction in serous incidence and mortality decreases with the age at BS or BSO. With BSO, the reduction in incidence is similar to the reduction in mortality for each subtype and patient type combination; with BS, the mortality reduction for the non-serous subtype is slightly higher than the incidence reduction regardless of patient type.
The negative incidence reductions in Tables 7.6c and 7.6d represent an increase in non-serous incidence, due to occult cancers discovered during BS and the assumption that non-serous risk is unaffected by BS. We also examine a combined strategy that consists of BS between ages 35 and 40 and bilateral oophorectomy after a delay of 10 or 15 years. For women with a BRCA mutation, this combined strategy is expected to result in at least an 87% reduction in serous incidence (see Table 7.7). The mortality reduction is at least 87% for all subtypes and patient types (data not shown).

Our work is not without limitations. We consider two ovarian cancer subtypes and subsequently develop two disjoint models, one for each subtype, for two patient types. This approach considers the natural history of one subtype while disregarding the possibility of developing the other, thus placing a limitation on the types of analysis that can be performed. For example, with this approach, one can only examine the impact of an intervention on a subtype instead of on ovarian cancer as a whole. Simply combining the results from each subtype to represent the overall impact of the intervention on ovarian cancer might result in an overestimation of the actual impact. Incorporating both ovarian cancer subtypes into a single disease model requires that all 68 model parameters be calibrated simultaneously, a task that may be challenging computationally, especially since our approach involves solving the calibration problem repeatedly with randomized initializations.

Havrilesky et al. [2011] developed a subtype-differentiated model that incorporates two ovarian cancer subtypes. Their model targets the general U.S. population, who enter the model at age 40. Additionally, they did not model precursor lesions. Our model differentiates disease activation by BRCA mutation status. Women with a BRCA mutation can activate the disease as early as age 20, as opposed to age 30 for those without.
Incorporating precursor lesions in the model permits a better exploration of the window of opportunity for ovarian cancer prevention, as preventive interventions are likely to have the most potential while the disease is pre-cancerous. Our model also allows a complete examination of the impact of risk-differentiated interventions.

We did not differentiate between high-grade serous carcinomas (HGSC) and low-grade serous carcinomas (LGSC) when modeling the serous subtype, although HGSC and LGSC are considered two distinct subtypes of ovarian cancer (Köbel et al. [2008]; Prat [2012]). However, LGSC is rare (Prat [2012]) and little is known about its carcinogenesis.

In our models, survival is not risk-differentiated. While studies have found that BRCA-associated ovarian cancer cases have improved survival over non-BRCA-associated cases (Bolton et al. [2012]; Chetrit et al. [2008]; Vencken et al. [2011]), several studies have concluded that the survival advantage is short-term and diminishes over time (Candido-dos Reis et al. [2015]; Kotsopoulos et al. [2016]; McLaughlin et al. [2013]).

Our choice of two precursor states in the serous model might appear somewhat arbitrary. Although a single precursor state would reduce the number of parameters involved in the calibration process, it constrains the representation of the duration of the precursor states. Additionally, there are two precursor events prior to serous carcinogenesis: the p53 signature and serous tubal intraepithelial carcinoma (STIC) (Crum et al. [2012]). For these reasons, we chose to incorporate two precursor states in our model.

7.5 Bibliography

American Cancer Society (2018). Cancer Facts & Figures 2018. American Cancer Society, Atlanta.

Anderson, K., Jacobson, J. S., Heitjan, D. F., Zivin, J. G., Hershman, D., Neugut, A. I., and Grann, V. R. (2006). Cost-effectiveness of preventive strategies for women with a BRCA1 or a BRCA2 mutation. Annals of Internal Medicine, 144(6):397–406.
Arias, E., Heron, M., and Xu, J. (2016). United States life tables, 2012. National Vital Statistics Reports, 65(8):14–15.

Bolton, K. L., Chenevix-Trench, G., Goh, C., Sadetzki, S., Ramus, S. J., Karlan, B. Y., Lambrechts, D., Despierre, E., Barrowdale, D., McGuffog, L., et al. (2012). Association between BRCA1 and BRCA2 mutations and survival in women with invasive epithelial ovarian cancer. JAMA, 307(4):382–389.

Boyd, J., Sonoda, Y., Federici, M. G., Bogomolniy, F., Rhei, E., Maresco, D. L., Saigo, P. E., Almadrones, L. A., Barakat, R. R., Brown, C. L., et al. (2000). Clinicopathologic features of BRCA-linked and sporadic ovarian cancer. JAMA, 283(17):2260–2265.

Brown, P. O. and Palmer, C. (2009). The preclinical natural history of serous ovarian cancer: Defining the target for early detection. PLoS Medicine, 6(7):1–14.

Candido-dos Reis, F. J., Song, H., Goode, E. L., Cunningham, J. M., Fridley, B. L., Larson, M. C., Alsop, K., Dicks, E., Harrington, P., Ramus, S. J., et al. (2015). Germline mutation in BRCA1 or BRCA2 and ten-year survival for women diagnosed with epithelial ovarian cancer. Clinical Cancer Research, 21(3):652–657.

Chetrit, A., Hirsh-Yechezkel, G., Ben-David, Y., Lubin, F., Friedman, E., and Sadetzki, S. (2008). Effect of BRCA1/2 mutations on long-term survival of patients with invasive ovarian cancer: The National Israeli Study of Ovarian Cancer. Journal of Clinical Oncology, 26(1):20–25.

Crum, C. P., McKeon, F. D., and Xian, W. (2012). BRCA, the oviduct, and the space and time continuum of pelvic serous carcinogenesis. International Journal of Gynecological Cancer, 22(S1):S29–S34.

Dilley, S. E., Havrilesky, L. J., Bakkum-Gamez, J., Cohn, D. E., Straughn, J. M., Caughey, A. B., and Rodriguez, M. I. (2017). Cost-effectiveness of opportunistic salpingectomy for ovarian cancer prevention. Gynecologic Oncology, 146(2):373–379.

Dowdy, S. C., Stefanek, M., and Hartmann, L. C. (2004). Surgical risk reduction: Prophylactic salpingo-oophorectomy and prophylactic mastectomy. American Journal of Obstetrics and Gynecology, 191(4):1113–1123.

Drescher, C. W., Hawley, S., Thorpe, J. D., Marticke, S., McIntosh, M., Gambhir, S. S., and Urban, N. (2012). Impact of screening test performance and cost on mortality reduction and cost-effectiveness of multimodal ovarian cancer screening. Cancer Prevention Research, 5(8):1015–1024.

Falconer, H., Yin, L., Grönberg, H., and Altman, D. (2015). Ovarian cancer risk after salpingectomy: A nationwide population-based study. Journal of the National Cancer Institute, 107(2):dju410.

Finch, A., Beiner, M., Lubinski, J., Lynch, H. T., Moller, P., Rosen, B., Murphy, J., Ghadirian, P., Friedman, E., Foulkes, W. D., et al. (2006). Salpingo-oophorectomy and the risk of ovarian, fallopian tube, and peritoneal cancers in women with a BRCA1 or BRCA2 mutation. JAMA, 296(2):185–192.

Grann, V. R., Jacobson, J. S., Thomason, D., Hershman, D., Heitjan, D. F., and Neugut, A. I. (2002). Effect of prevention strategies on survival and quality-adjusted survival of women with BRCA1/2 mutations: An updated decision analysis. Journal of Clinical Oncology, 20(10):2520–2529.

Grann, V. R., Jacobson, J. S., Whang, W., Hershman, D., Heitjan, D. F., Antman, K. H., and Neugut, A. I. (2000). Prevention with tamoxifen or other hormones versus prophylactic surgery in BRCA1/2-positive women: A decision analysis. The Cancer Journal from Scientific American, 6(1):13–20.

Grann, V. R., Panageas, K. S., Whang, W., Antman, K. H., and Neugut, A. I. (1998). Decision analysis of prophylactic mastectomy and oophorectomy in BRCA1-positive or BRCA2-positive patients. Journal of Clinical Oncology, 16(3):979–985.

Grann, V. R., Patel, P. R., Jacobson, J. S., Warner, E., Heitjan, D. F., Ashby-Thompson, M., Hershman, D. L., and Neugut, A. I. (2011). Comparative effectiveness of screening and prevention strategies among BRCA1/2-affected mutation carriers. Breast Cancer Research and Treatment, 125(3):837–847.

Havrilesky, L. J., Sanders, G. D., Kulasingam, S., Chino, J. P., Berchuck, A., Marks, J. R., and Myers, E. R. (2011). Development of an ovarian cancer screening decision model that incorporates disease heterogeneity. Cancer, 117(3):545–553.

Havrilesky, L. J., Sanders, G. D., Kulasingam, S., and Myers, E. R. (2008). Reducing ovarian cancer mortality through screening: Is it possible, and can we afford it? Gynecologic Oncology, 111(2):179–187.

Jamieson, D. J., Hillis, S. D., Duerr, A., Marchbanks, P. A., Costello, C., Peterson, H. B., et al. (2000). Complications of interval laparoscopic tubal sterilization: Findings from the United States collaborative review of sterilization. Obstetrics & Gynecology, 96(6):997–1002.

Kindelberger, D. W., Lee, Y., Miron, A., Hirsch, M. S., Feltmate, C., Medeiros, F., Callahan, M. J., Garner, E. O., Gordon, R. W., Birch, C., et al. (2007). Intraepithelial carcinoma of the fimbria and pelvic serous carcinoma: Evidence for a causal relationship. The American Journal of Surgical Pathology, 31(2):161–169.

Köbel, M., Kalloger, S. E., Boyd, N., McKinney, S., Mehl, E., Palmer, C., Leung, S., Bowen, N. J., Ionescu, D. N., Rajput, A., et al. (2008). Ovarian carcinoma subtypes are different diseases: Implications for biomarker studies. PLoS Medicine, 5(12):1749–1760.

Kotsopoulos, J., Rosen, B., Fan, I., Moody, J., McLaughlin, J. R., Risch, H., May, T., Sun, P., and Narod, S. A. (2016). Ten-year survival after epithelial ovarian cancer is not associated with BRCA mutation status. Gynecologic Oncology, 140(1):42–47.

Kuchenbaecker, K. B., Hopper, J. L., Barnes, D. R., Phillips, K.-A., Mooij, T. M., Roos-Blom, M.-J., Jervis, S., Van Leeuwen, F. E., Milne, R. L., Andrieu, N., et al. (2017). Risks of breast, ovarian, and contralateral breast cancer for BRCA1 and BRCA2 mutation carriers. JAMA, 317(23):2402–2416.

Kwon, J. S., McAlpine, J. N., Hanley, G. E., Finlayson, S. J., Cohen, T., Miller, D. M., Gilks, C. B., and Huntsman, D. G. (2015). Costs and benefits of opportunistic salpingectomy as an ovarian cancer prevention strategy. Obstetrics & Gynecology, 125(2):338–345.

Kwon, J. S., Tinker, A., Pansegrau, G., McAlpine, J., Housty, M., McCullum, M., and Gilks, C. B. (2013). Prophylactic salpingectomy and delayed oophorectomy as an alternative for BRCA mutation carriers. Obstetrics & Gynecology, 121(1):14–24.

Labidi-Galy, S. I., Papp, E., Hallberg, D., Niknafs, N., Adleff, V., Noe, M., Bhattacharya, R., Novak, M., Jones, S., Phallen, J., Hruban, C. A., Hirsch, M. S., Lin, D. I., Schwartz, L., Maire, C. L., Tille, J.-C., Bowden, M., Ayhan, A., Wood, L. D., Scharpf, R. B., Kurman, R., Wang, T.-L., Shih, I.-M., Karchin, R., Drapkin, R., and Velculescu, V. E. (2017). High grade serous ovarian carcinomas originate in the fallopian tube. Nature Communications, 8(1):1093.

Lee, Y., Miron, A., Drapkin, R., Nucci, M., Medeiros, F., Saleemuddin, A., Garber, J., Birch, C., Mou, H., Gordon, R., et al. (2007). A candidate precursor to serous carcinoma that originates in the distal fallopian tube. The Journal of Pathology, 211(1):26–35.

McLaughlin, J. R., Rosen, B., Moody, J., Pal, T., Fan, I., Shaw, P. A., Risch, H. A., Sellers, T. A., Sun, P., and Narod, S. A. (2013). Long-term ovarian cancer survival associated with mutation in BRCA1 or BRCA2. Journal of the National Cancer Institute, 105(2):141–148.

Nelder, J. A. and Mead, R. (1965). A simplex method for function minimization. The Computer Journal, 7(4):308–313.

Norum, J., Hagen, A. I., Mæhle, L., Apold, J., Burn, J., and Møller, P. (2008). Prophylactic bilateral salpingo-oophorectomy (PBSO) with or without prophylactic bilateral mastectomy (PBM) or no intervention in BRCA1 mutation carriers: A cost-effectiveness analysis. European Journal of Cancer, 44(7):963–971.

Permuth-Wey, J. and Sellers, T. A. (2009). Epidemiology of Ovarian Cancer, pages 413–437. Humana Press, Totowa, NJ.

Peterson, H. B., Xia, Z., Hughes, J. M., Wilcox, L. S., Tylor, L. R., and Trussell, J. (1997). The risk of ectopic pregnancy after tubal sterilization. New England Journal of Medicine, 336(11):762–767.

Piek, J. M., van Diest, P. J., Zweemer, R. P., Jansen, J. W., Poort-Keesom, R. J., Menko, F. H., Gille, J. J., Jongsma, A. P., Pals, G., Kenemans, P., et al. (2001). Dysplastic changes in prophylactically removed Fallopian tubes of women predisposed to developing ovarian cancer. The Journal of Pathology, 195(4):451–456.

Prat, J. (2012). Ovarian carcinomas: Five distinct diseases with different origins, genetic alterations, and clinicopathological features. Virchows Archiv, 460(3):237–249.

Schapira, M. M., Matchar, D. B., and Young, M. J. (1993). The effectiveness of ovarian cancer screening: A decision analysis model. Annals of Internal Medicine, 118(11):838–843.

Schorge, J. O., Modesitt, S. C., Coleman, R. L., Cohn, D. E., Kauff, N. D., Duska, L. R., and Herzog, T. J. (2010). SGO white paper on ovarian cancer: Etiology, screening and surveillance. Gynecologic Oncology, 119(1):7–17.

Schrag, D., Kuntz, K. M., Garber, J. E., and Weeks, J. C. (1997). Decision analysis: Effects of prophylactic mastectomy and oophorectomy on life expectancy among women with BRCA1 or BRCA2 mutations. New England Journal of Medicine, 336(20):1465–1471.

Skates, S. J. and Singer, D. E. (1991). Quantifying the potential benefit of CA 125 screening for ovarian cancer. Journal of Clinical Epidemiology, 44(4):365–380.

Tavassoli, F. A. and Devilee, P., editors (2003). Tumours of the Ovary and Peritoneum. In World Health Organization Classification of Tumours. Pathology and Genetics of Tumours of the Breast and Female Genital Organs, chapter 2. IARC Press, Lyon, France.

Urban, N., Drescher, C., Etzioni, R., and Colby, C. (1997). Use of a stochastic simulation model to identify an efficient protocol for ovarian cancer screening. Controlled Clinical Trials, 18(3):251–270.

US Preventive Services Task Force (2018). Screening for ovarian cancer: US Preventive Services Task Force recommendation statement. JAMA, 319(6):588–594.

van Roosmalen, M. S., Verhoef, L.
C., Stalmeier, P. F., Hoogerbrugge, N., and van Daal, W. A. (2002). Decision analysis of prophylactic surgery or screening for BRCA1 mutation carriers: A more prominent role for oophorectomy. Journal of Clinical Oncology, 20(8):2092–2100.

Vencken, P., Kriege, M., Hoogwerf, D., Beugelink, S., van der Burg, M., Hooning, M., Berns, E., Jager, A., Collée, M., Burger, C., et al. (2011). Chemosensitivity and outcome of BRCA1- and BRCA2-associated ovarian cancer patients after first-line chemotherapy compared with sporadic ovarian cancer patients. Annals of Oncology, 22(6):1346–1352.

Walker, J. L., Powell, C. B., Chen, L., Carter, J., Bae Jump, V. L., Parker, L. P., Borowsky, M. E., and Gibb, R. K. (2015). Society of Gynecologic Oncology recommendations for the prevention of ovarian cancer. Cancer, 121(13):2108–2120.

Appendix D: Data

D.1 Age-Specific Risk Estimates for BRCA Mutation Carriers

We select the break points in (7.2) based on the risk estimates derived from Finch et al. [2006] and Kuchenbaecker et al. [2017], which are presented in Figures D.1 and D.2. We also impose validity conditions on the slope parameters so that the slopes follow the general trend observed in these studies.

Figure D.1: Annual risk (per 100,000 per year) by age for BRCA mutation carriers, derived from Finch et al. [2006]

Figure D.2: Ovarian cancer risk by age for BRCA mutation carriers, derived from Kuchenbaecker et al. [2017]

D.2 Validity Conditions

The set of validity conditions imposed on the model for a given subtype $t \in T$, $V^t$, is given by:

$P^t_{1U,2U} \ge P^t_{1D,2D}$ (D.1)
$P^t_{1U,3U} \ge P^t_{1D,3D}$ (D.2)
$P^t_{1U,DD} \ge P^t_{1D,DD}$ (D.3)
$P^t_{2U,3U} \ge P^t_{2D,3D}$ (D.4)
$P^t_{2U,DD} \ge P^t_{2D,DD}$ (D.5)
$P^t_{3U,DD} \ge P^t_{3D,DD}$ (D.6)
$P^t_{2U,3U} \ge P^t_{1U,2U}$ (D.7)
$P^t_{3U,DD} \ge P^t_{2U,3U}$ (D.8)
$P^t_{2D,3D} \ge P^t_{1D,2D}$ (D.9)
$P^t_{3D,DD} \ge P^t_{2D,3D}$ (D.10)
$P^t_{1U,2U} \ge P^t_{1U,3U}$ (D.11)
$P^t_{1U,3U} \ge P^t_{1U,DD}$ (D.12)
$P^t_{1D,2D} \ge P^t_{1D,3D}$ (D.13)
$P^t_{1D,3D} \ge P^t_{1D,DD}$ (D.14)
$P^t_{2U,3U} \ge P^t_{2U,DD}$ (D.15)
$P^t_{2D,3D} \ge P^t_{2D,DD}$ (D.16)
$P^t_{1U,1D} \ge P^t_{1U,2D}$ (D.17)
$P^t_{1U,2D} \ge P^t_{1U,3D}$ (D.18)
$P^t_{1U,3D} \ge P^t_{1U,DD}$ (D.19)
$P^t_{2U,2D} \ge P^t_{2U,3D}$ (D.20)
$P^t_{2U,3D} \ge P^t_{2U,DD}$ (D.21)
$P^t_{2U,DD} \ge P^t_{1U,DD}$ (D.22)
$P^t_{3U,DD} \ge P^t_{2U,DD}$ (D.23)
$P^t_{2D,DD} \ge P^t_{1D,DD}$ (D.24)
$P^t_{3D,DD} \ge P^t_{2D,DD}$ (D.25)
$P^t_{2U,3D} \ge P^t_{1U,3D}$ (D.26)
$P^t_{3U,3D} \ge P^t_{2U,3D}$ (D.27)
$P^t_{2U,2D} \ge P^t_{1U,2D}$ (D.28)
$P^t_{3U,3D} \ge P^t_{3U,DD}$ (D.29)
$P^t_{2U,2D} \ge P^t_{1U,1D}$ (D.30)
$P^t_{3U,3D} \ge P^t_{2U,2D}$ (D.31)
$p_{DO}(a) = \hat{p}_{DO}(a) \qquad \forall\, a \in A$ (D.32)
$\sum_j P^{t,r}_{H,j}(a) = 1 \qquad \forall\, a \in A,\ r \in R$ (D.33)
$0 \le P^{t,r}_{H,j}(a) \le 1 \qquad \forall\, j \in S(t),\ a \in A,\ r \in R$ (D.34)
$\sum_{j \ne H, DO} P^t_{i,j} = 1 \qquad \forall\, i \ne H, DO$ (D.35)
$0 \le P^t_{i,j} \le 1 \qquad \forall\, i, j \ne H, DO$ (D.36)
$\lambda^{t,B}_1 \le \lambda^{t,B}_2 \le \lambda^{t,B}_3$ (D.37)
$\lambda^{t,NB}_1 \le \lambda^{t,NB}_2$ (D.38)
$0 \le \lambda^{t,B}_1 \le \frac{1 - p_{DO}(30)}{10}$ (D.39)
$0 < \lambda^{t,B}_2 \le \frac{1 - p_{DO}(40)}{10}$ (D.40)
$0 < \lambda^{t,B}_3 \le \frac{1 - p_{DO}(60)}{20}$ (D.41)
$-\frac{1 - p_{DO}(60)}{10} \le \lambda^{t,B}_4 < 0$ (D.42)
$-\frac{1 - p_{DO}(70)}{10} \le \lambda^{t,B}_5 \le \frac{1 - p_{DO}(85)}{15}$ (D.43)
$0 \le \lambda^{t,NB}_1 \le \frac{1 - p_{DO}(40)}{10}$ (D.44)
$0 < \lambda^{t,NB}_2 \le \frac{1 - p_{DO}(75)}{35}$ (D.45)
$-\frac{1 - p_{DO}(75)}{10} \le \lambda^{t,NB}_3 \le \frac{1 - p_{DO}(85)}{10}$ (D.46)
$10\lambda^{t,B}_1 + 10\lambda^{t,B}_2 \le 1 - p_{DO}(40)$ (D.47)
$10\lambda^{t,B}_1 + 10\lambda^{t,B}_2 + 20\lambda^{t,B}_3 \le 1 - p_{DO}(60)$ (D.48)
$10\lambda^{t,B}_1 + 10\lambda^{t,B}_2 + 20\lambda^{t,B}_3 + 10\lambda^{t,B}_4 \le 1 - p_{DO}(70)$ (D.49)
$10\lambda^{t,B}_1 + 10\lambda^{t,B}_2 + 20\lambda^{t,B}_3 + 10\lambda^{t,B}_4 \ge 0$ (D.50)
$10\lambda^{t,B}_1 + 10\lambda^{t,B}_2 + 20\lambda^{t,B}_3 + 10\lambda^{t,B}_4 + 15\lambda^{t,B}_5 \le 1 - p_{DO}(85)$ (D.51)
$10\lambda^{t,B}_1 + 10\lambda^{t,B}_2 + 20\lambda^{t,B}_3 + 10\lambda^{t,B}_4 + 15\lambda^{t,B}_5 \ge 0$ (D.52)
$10\lambda^{t,NB}_1 + 35\lambda^{t,NB}_2 \le 1 - p_{DO}(75)$ (D.53)
$10\lambda^{t,NB}_1 + 35\lambda^{t,NB}_2 + 10\lambda^{t,NB}_3 \le 1 - p_{DO}(85)$ (D.54)
$10\lambda^{t,NB}_1 + 35\lambda^{t,NB}_2 + 10\lambda^{t,NB}_3 \ge 0$ (D.55)
$P^{S,B}_{H,F1}(40) = 20\, P^{S,NB}_{H,F1}(40)$ (D.56)
$P^{S,B}_{H,F1}(60) = 30\, P^{S,NB}_{H,F1}(60)$ (D.57)
$P^{NS,B}_{H,1U}(40) = 4\, P^{NS,NB}_{H,1U}(40)$ (D.58)
$P^{NS,B}_{H,1U}(60) = 6\, P^{NS,NB}_{H,1U}(60)$ (D.59)

Equations (D.1) - (D.6) incorporate the effect of treatments by requiring that
progression is more likely prior to disease diagnosis. (D.7)–(D.10) state that the likelihood of progression from a given health state increases with the severity of that health state. (D.11)–(D.31) require that progression from a given health state to a more severe health state is less likely than progression to a less severe health state. (D.32) sets p_DO(a) as the corresponding observed value. (D.33)–(D.36) refer to the basic probability laws involving the elements in P^t. (D.37) and (D.38) are based on the general trend observed in the studies that are discussed in §7.2.5. (D.39)–(D.55) ensure that the slopes do not violate (D.33) and (D.34). (D.56)–(D.59) require that the probability of activating the disease for women with a BRCA mutation is a constant multiple of that for women without a BRCA mutation at ages 40 and 60, where the constants are chosen based on the subtype-specific risks for both patient types.

D.3 Modeling Surgical Interventions

We model the effect of bilateral salpingectomy (BS) and bilateral salpingo-oophorectomy (BSO) on each subtype as outlined in Tables D.1 and D.2. Because these interventions are performed as a preventive measure, only unobservable health states are affected.
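The effect of a preventive surgery on the unobservable health states can be operationalized in two ways: scaling the activation probability for disease-free states, and immediately moving occult invasive disease to the corresponding diagnosed state. A minimal sketch follows; it is an illustration, not the dissertation's implementation, and the function name and the reduction factors in the dictionary are placeholders patterned after the serous-model values:

```python
# Sketch: applying a prophylactic surgery's effect in a state-transition
# model. State labels follow the model (H, 1U/2U/3U occult, 1D/2D/3D
# diagnosed); the reduction factors below are illustrative placeholders.

ACTIVATION_REDUCTION = {   # multiplicative reduction of the H -> 1U probability
    "BS": 0.85,            # bilateral salpingectomy
    "BSO": 0.99,           # bilateral salpingo-oophorectomy
}

def apply_surgery(p_activation, state, surgery):
    """Return (new activation probability, new state) after surgery.

    p_activation : annual probability of activating the disease (H -> 1U)
    state        : current unobservable health state
    surgery      : "BS" or "BSO"
    """
    if state in ("1U", "2U", "3U"):
        # occult invasive disease is found and staged during surgery
        return p_activation, state.replace("U", "D")
    if state == "H":
        # healthy patients keep state H with a reduced activation probability
        return p_activation * (1 - ACTIVATION_REDUCTION[surgery]), "H"
    raise ValueError(f"unhandled state: {state}")
```

The same pattern extends to the precursor states F1 and F2 of the serous model, with state- and surgery-specific factors as given in the tables.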
Table D.1: Modeling the effect of BS and BSO for the serous model

Health state | BS                                                       | BSO
H            | reduce activation¹ probability by 85%                    | reduce activation probability by 99%
F1           | consider patient as healthy and reduce activation probability by 99% | reduce activation probability by 85%
F2           | consider patient as healthy and reduce activation probability by 95% | reduce activation probability by 80%
1U           | identify patient as 1D immediately                       | identify patient as 1D immediately
2U           | identify patient as 2D immediately                       | identify patient as 2D immediately
3U           | identify patient as 3D immediately                       | identify patient as 3D immediately

Table D.2: Modeling the effect of BS and BSO for the non-serous model

Health state | BS                                  | BSO
H            | activation probability is unaffected | reduce activation probability by 99%
1U           | identify patient as 1D immediately   | identify patient as 1D immediately
2U           | identify patient as 2D immediately   | identify patient as 2D immediately
3U           | identify patient as 3D immediately   | identify patient as 3D immediately

¹ After surgery, activation refers to the transition from H to 1U.

Chapter 8: Conclusions

Calibration, validation, and sensitivity analysis are three crucial elements in natural history modeling. Limited transparency and insufficient documentation regarding these elements within the disease modeling literature challenge the interpretation as well as the reliability of modeled recommendations. In this dissertation, we show how our calibration framework can aid in the examination of the impact of calibration uncertainty on modeled recommendations in a systematic fashion. Additionally, our approach promotes transparency. Prompted by the lack of data from large cohort studies and the relatively small body of literature on model-based analysis for ovarian cancer, we adapt our calibration framework within the context of ovarian cancer.
Our modeling exercise indicates that the mean duration of the preclinical phase for ovarian cancer is likely to fall between 4.1 years and 9.6 years, suggesting a multi-year window for early detection. A hypothetical annual screening program with a sensitivity of 95% that initiates at age 50 is likely to result in a median mortality reduction of approximately 25%.

Recent findings on fallopian tube precursors in serous carcinogenesis renew interest in salpingectomy as a prophylactic measure. We incorporate these findings and BRCA mutations in our natural history model. The result is a risk- and subtype-differentiated model. We apply our calibration framework and evaluate the impact of calibration uncertainty on unobservable disease characteristics. From our results, we find that the expected time spent in precursor lesions is between 2 and 3 years for serous ovarian cancer. Also, there is a multi-year window for early detection or preventive strategies for ovarian cancer. The plausible ranges of the mean preclinical duration for all subtypes and patient types are consistent with our findings from Chapter 5. Using the set of plausible models, we examine the potential of bilateral salpingectomy (BS) and bilateral salpingo-oophorectomy (BSO) for various ages at surgery. Although the reduction in subtype-specific mortality and/or incidence decreases with the age at surgery, the decrease is small, so that both BS and BSO can be effective. A combined strategy that consists of bilateral salpingectomy with delayed oophorectomy is likely to achieve a similar level of mortality and/or incidence reduction compared to strategies involving a single surgery.

We have performed precise calculation of all modeled outcomes in our work. This facilitates computations, so that the effort is spent primarily on obtaining plausible models via repeated Nelder-Mead searches. Future directions include improving computational efficiency in obtaining plausible models.
Additionally, a single ovarian cancer model that incorporates both subtypes would allow a complete assessment of the impact of an intervention on the overall risk of ovarian cancer.
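The repeated Nelder-Mead searches mentioned above can be organized as a simple multi-start loop: optimize the goodness-of-fit from many random starting points and retain every parameter set that meets a plausibility threshold. A minimal sketch using scipy; the quadratic `gof` function, its targets, and the `PLAUSIBLE_GOF` threshold are stand-ins for the calibration problem's actual GOF measure and acceptance criterion:

```python
import numpy as np
from scipy.optimize import minimize

# Placeholder goodness-of-fit: squared distance of a two-parameter model
# from its calibration targets (stand-in for the real GOF measure).
TARGETS = np.array([0.2, 0.5])

def gof(theta):
    return float(np.sum((theta - TARGETS) ** 2))

PLAUSIBLE_GOF = 1e-6   # placeholder plausibility (acceptance) threshold

rng = np.random.default_rng(0)
plausible_models = []
for _ in range(20):    # repeated searches from random starting points
    x0 = rng.uniform(0.0, 1.0, size=2)
    res = minimize(gof, x0, method="Nelder-Mead",
                   options={"xatol": 1e-8, "fatol": 1e-10})
    if res.fun <= PLAUSIBLE_GOF:      # keep parameter sets that fit well
        plausible_models.append(res.x)
```

In the actual framework the retained set characterizes the calibration-uncertainty region, so downstream analyses are run over every plausible model rather than a single best fit.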
Abstract
Model-based analysis for comparative evaluation of strategies for disease treatment and management requires the development of a model of the disease that is being examined. The natural history (NH) model is arguably the most critical element in this process. A NH model requires the specification of various model parameters, some of which may not be observable, and generates modeled outcomes that offer an ability to assess the model's consistency with available data and clinical judgment. There is rarely a unique set of model parameters that are consistent with observable data, a phenomenon known as calibration uncertainty.

Because model parameters influence comparative analyses, insufficient examination of the breadth of potential model parameters can create a false sense of confidence in the model recommendations, and ultimately cast doubt on the value of the analysis. This dissertation introduces a systematic approach to the examination of calibration uncertainty and its impact. We represent the calibration process as a constrained nonlinear optimization problem and introduce the notion of plausible models, which define the uncertainty region for model parameters. In doing so, our framework integrates three crucial components that are undertaken in model-based analysis for assessing the credibility of a model's conclusions: validation, calibration, and sensitivity analysis.

We first illustrate our calibration approach using a fictitious disease. We add degrees of realism by adapting our framework within the context of ovarian cancer. By examining the breadth of plausible models for ovarian cancer, we explore the range of potential unobservable disease characteristics, as well as the potential for early detection of ovarian cancer. We introduce the notion of a conditionally stationary Markov model as a method for streamlining the computational burden associated with the identification of plausible models.
Finally, we incorporate recent discoveries in ovarian carcinogenesis and develop a risk- and subtype-differentiated model. By adopting our calibration framework, we characterize the set of plausible models and assess the potential of prophylactic strategies.
Asset Metadata

Creator: Chen, Jing Voon (author)
Core Title: Calibration uncertainty in model-based analyses for medical decision making with applications for ovarian cancer
School: Viterbi School of Engineering
Degree: Doctor of Philosophy
Degree Program: Industrial and Systems Engineering
Publication Date: 08/05/2018
Defense Date: 06/05/2018
Publisher: University of Southern California
Tags: calibration uncertainty, Markov model, natural history model, OAI-PMH Harvest, ovarian cancer
Format: application/pdf (imt)
Language: English
Contributor: Electronically uploaded by the author (provenance)
Advisor: Higle, Julia (committee chair), Roman, Lynda (committee member), Suen, Sze-Chuan (committee member)
Creator Email: jingvooc@usc.edu, jingvoon@gmail.com
Permanent Link (DOI): https://doi.org/10.25549/usctheses-c89-53229
Unique Identifier: UC11672502
Identifier: etd-ChenJingVo-6644.pdf (filename), usctheses-c89-53229 (legacy record id)
Document Type: Dissertation
Rights: Chen, Jing Voon
Type: texts
Source: University of Southern California (contributing entity), University of Southern California Dissertations and Theses (collection)
Access Conditions: The author retains rights to his/her dissertation, thesis or other graduate work according to U.S. copyright law. Electronic access is being provided by the USC Libraries in agreement with the a...
Repository Name: University of Southern California Digital Library
Repository Location: USC Digital Library, University of Southern California, University Park Campus MC 2810, 3434 South Grand Avenue, 2nd Floor, Los Angeles, California 90089-2810, USA