ROC Surface in the Presence of Verification Bias

by Ying Zhang

Supervised by Dr. Todd Alonzo

A Dissertation Presented to the
FACULTY OF THE USC GRADUATE SCHOOL
UNIVERSITY OF SOUTHERN CALIFORNIA
In Partial Fulfillment of the Requirements for the Degree
DOCTOR OF PHILOSOPHY (BIOSTATISTICS)

December 2015

Copyright 2015 Ying Zhang

DEDICATION

I dedicate this dissertation to my parents, Guoying Wei and Dezhou Zhang, and to my beloved grandparents in heaven.

ACKNOWLEDGMENTS

Earning a Ph.D. degree is not a solo mission; it comes with the efforts and support of many. I would like to first express my sincere appreciation and gratitude to my dissertation advisor, Dr. Todd Alonzo. His constant support and encouragement made my doctoral research an amazing journey. He was always there whenever I needed his advice. He taught me how to ask the right questions and how to express my ideas. He showed me different ways to approach a research problem. He guided me through tough times and never doubted my abilities. He brought out the best in me and, more importantly, he inspired me to work hard by being a role model himself. He spent endless hours proofreading my dissertation and provided countless ways to elevate it to a higher level. Without his everlasting encouragement, constant guidance and genuine care, I could not have completed this dissertation. I will treasure this experience as a unique and enlightening one.

I extend my special thanks to Dr. Juan Pablo Lewinger, who has served as my mentor in many regards. He helped me with much personal guidance and individual insight, and I benefited enormously from his kind acts. I also give special thanks to Dr. Wendy Mack, Dr. Kimberly Siegmund and Dr. Helena Chang Chui for their insights and suggestions throughout my research journey. I thank Dr. Mack for all the helpful advice about my research, especially regarding the data application; being a TA for PM518a was an enjoyable and rewarding experience. I thank Dr. Siegmund for her enlightening perspectives; the exposure to the PM 610 seminars really improved my presentation skills and taught me how to deliver my work effectively. I thank Dr. Chui for her patience and invaluable feedback; her insightful comments and knowledge of neuroscience enabled me to improve my dissertation further.

I thank the Alzheimer's Disease Neuroimaging Initiative research group for kindly permitting access to the data analyzed in this dissertation. In addition, I take this opportunity to thank the faculty here at USC, Dr. Ite Laird-Offringa, Dr. Carrie Breton, and Dr. Towhid Salam, for all the support and guidance. I also owe many thanks to my classmates and friends for their friendship and selfless help.

To my parents, Guoying Wei and Dezhou Zhang: thank you for your endless love, support and encouragement. You provided me with the best life possible and did everything you could to make my life wonderful and meaningful. You have always believed in me and given me the strength to reach for the stars and chase my dreams. You are the reason I strive for the best. For your kind hearts and love, I will forever be grateful. To my family members, I am truly fortunate to have your constant love and care; I can never thank you enough.

And finally, to my boyfriend and soul mate, Guanyi Huang: the meaning of my life is you. I am very lucky to have you. Thank you for coming into my life and loving me, even when I am being impossible and grumpy. You showed me the light when I was in the dark.
Thank you for never giving up on me and always knowing how to make me smile. And also, I love your smile, and I am Super Cereal©.

TABLE OF CONTENTS

Dedication
Acknowledgments
List of Figures
List of Tables
Abstract

Chapter 1  Introduction
  1.1 Types of Test Results
  1.2 Verification Bias
  1.3 Importance of Recognizing and Correcting Verification Bias
  1.4 Three-class Disease Classification Problems
  1.5 Motivating Examples - Alzheimer's Disease (AD)
  1.6 Aims
  1.7 Summary

Chapter 2  Existing Methods for Assessing the Diagnostic Accuracy (two-class disease status)
  2.1 Data and Notation
  2.2 Assumptions
  2.3 Measure of Diagnostic Accuracy
  2.4 Assessing Accuracy for Complete Data
  2.5 Missing At Random (MAR) Assumption
  2.6 Naïve Estimators
  2.7 Two-Stage Design Methods
    2.7.1 Full Imputation (FI) Method
    2.7.2 Mean Score Imputation (MSI) Method
    2.7.3 Inverse Probability Weighting (IPW) Method
    2.7.4 Semi-Parametric Efficient (SPE) Method
  2.8 Direct Estimation of Bias-corrected AUC
  2.9 Non-ignorable Missingness
    2.9.1 Doubly Robust (DR) Estimators
  2.10 Summary

Chapter 3  Three-Class ROC Analysis
  3.1 Data and Notations
  3.2 Three-Class ROC Analysis with Complete Data
  3.3 Naïve Estimators
  3.4 Three-Class ROC Analysis under Verification Bias
  3.5 Summary

Chapter 4  Estimation of VUS in the Presence of Ignorable Verification Bias
  4.1 A Brief Review of U-statistics
  4.2 IPW Estimator of AUC
  4.3 IPW Estimator of VUS
  4.4 Summary

Chapter 5  Estimation of VUS in the Presence of Non-ignorable Missingness
  5.1 Model Assumptions for Identification
  5.2 IPW Estimation under Non-ignorable Verification Bias
  5.3 DR Estimation
  5.4 Pseudo Doubly Robust (PDR) Estimation of VUS
  5.5 Qualitative Comparison of Approaches
  5.6 Summary
Chapter 6  Asymptotic Properties of Bias-corrected Estimators of VUS
  6.1 Data and Notation
  6.2 Closed-form Estimation of Asymptotic Variance for the IPW Estimator
  6.3 Jackknife Estimator of Variance
  6.4 Summary

Chapter 7  Simulation Studies: IPW Methodology
  7.1 Notation for Estimators
  7.2 Simulation Set-up
  7.3 Default Setting
  7.4 Performance of VUS: Bias
  7.5 Performance of VUS: SD
  7.6 Performance of VUS: MSE
  7.7 Performance of VUS: Relative Efficiency
  7.8 Varying Test Accuracy
  7.9 Varying Disease Prevalence
  7.10 Varying Verification Mechanism
  7.11 Summary

Chapter 8  Simulation Studies: IPW, DR and PDR Methodology
  8.1 Simulation Set-up
  8.2 Working Models
  8.3 Small Sample Bias: VUS Estimator
    8.3.1 Under Ignorable Missingness
    8.3.2 Under Non-ignorable Missingness
  8.4 Performance of VUS: Variance
    8.4.1 Under Ignorable Missingness
    8.4.2 Under Non-ignorable Missingness
  8.5 Performance of VUS: Relative Efficiency
  8.6 Performance of VUS: MSE
  8.7 Default Setting
  8.8 Varying Test Accuracy
  8.9 Varying Disease Prevalence
  8.10 Varying Verification Mechanism
  8.11 Model Misspecification
  8.12 Alpha Sensitivity
  8.13 Ties in Test Results
  8.14 Summary

Chapter 9  Data Application
  9.1 Study Description
  9.2 Convention and Notation
  9.3 Data
  9.4 Verification Process
  9.5 Working Models
  9.6 Application Results
  9.7 Summary
124 Chapter 10 Conclusions and Future Works 130 10.1 Dissertation Contributions and Recommendations . . . . . . . . . . . . 130 10.2 Future Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 133 10.2.1 Asymptotic Distribution Theory . . . . . . . . . . . . . . . . . . . 133 10.2.2 Estimation with NULL probability of verification . . . . . . . . . . 133 10.2.3 Estimation under verification bias adjusting for covariates . . . . . . 133 Bibliography 135 vii List of Figures 2.1 Examples of Four ROC Curves . . . . . . . . . . . . . . . . . . . . . . . . 25 3.1 Example of ROC Surface . . . . . . . . . . . . . . . . . . . . . . . . . . . 33 7.1 Full data (Green) and CC (Red) ROC surface from a randomly chosen re- alization of the simulation study; TPR0, TPR1 and TPR2 for D = 0, 1, and 2, respectively. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 78 7.2 Full data (Green) and IPW (Red)ROC surface from a randomly chosen re- alization of the simulation study; TPR0, TPR1 and TPR2 for D = 0, 1, and 2, respectively. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79 8.1 Full data (Red surface) and CC (Green surface) ROC surface from a ran- domly chosen realization of the simulation study; TPR0, TPR1 and TPR2 for D = 0, 1, and 2, respectively. . . . . . . . . . . . . . . . . . . . . . . . 106 8.2 Full data (Red surface) and IPW (Green surface) ROC surface from a ran- domly chosen realization of the simulation study; TPR0, TPR1 and TPR2 for D = 0, 1, and 2, respectively. . . . . . . . . . . . . . . . . . . . . . . . 107 8.3 Full data (Red surface) and CC (Green surface) ROC surface from a ran- domly chosen realization of the simulation study; TPR0, TPR1 and TPR2 for D = 0, 1, and 2, respectively. . . . . . . . . . . . . . . . . . . . . . . . 108 8.4 Full data (Red surface) and IPW (Green surface) ROC surface from a ran- domly chosen realization of the simulation study; TPR0, TPR1 and TPR2 for D = 0, 1, and 2, respectively. . . . . . . . . . . . . . . . . . . . . . . . 109 9.1 Boxplot of Tau protein & Ab1-42, stratified by disease status . . . . . . . . 126 9.2 Boxplot of Tau protein & Ab1-42, stratified by verification status . . . . . . 127 viii 9.3 Boxplot of Tau protein & Ab1-42, stratified by verification status . . . . . . 128 9.4 Boxplot of Tau protein & Ab1-42, stratified by verification status . . . . . . 129 ix List of Tables 1.1 Hypothetical example of verification bias . . . . . . . . . . . . . . . . . . . 4 7.1 True and CC VUS under different test accuracy . . . . . . . . . . . . . . . 67 7.2 Mean estimated VUS from 1000 realizations of the simulation with differ- ent sample sizes. Relative bias, (mean VUS - true estimate of 0.792)/0.792 is provided in parentheses. . . . . . . . . . . . . . . . . . . . . . . . . . . 69 7.3 The ratio of the Monte Carlo mean of the estimated SD to simulation SD of the estimators. Jackknife variance estimator used for full data, CC, IPW(E)- J, and IPW(K)-J. Closed form variance estimator used for IPW(E)-CF and IPW(K)-CF. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71 7.4 90 % CI coverage probability of VUS variance estimator. Jackknife vari- ance estimator used for full data, CC, IPW(E)-J, and IPW(K)-J. Closed form variance estimator used for IPW(E)-CF and IPW(K)-CF. . . . . . . . 72 7.5 MSE10 3 of VUS estimators. Jackknife variance estimator used for full data, CC, IPW(E), and IPW(K). . . . . . . . . . . . . . . . . . . . . . . . . 
7.6 Efficiency comparison between \widehat{VUS}_{IPW(E)} and \widehat{VUS}_{IPW(K)}. ERE relative to \widehat{VUS}_{IPW(K)} is provided in parentheses.
7.7 Detailed information about verification rate: varying test accuracy
7.8 Comparison of the VUS Estimators: Varying Test Accuracy
7.9 Detailed information about verification rate: varying disease prevalence
7.10 Comparison of the VUS Estimators: Varying Disease Prevalence
7.11 Detailed information about verification rate: varying verification mechanism
7.12 Comparison of the VUS Estimators: Varying Verification Rate
8.1 Mean estimated VUS from 1000 realizations of the simulation with different sample sizes under ignorable missingness (true VUS: 0.843). Relative bias, (mean VUS - 0.843)/0.843, is provided in parentheses.
8.2 Mean estimated VUS from 1000 realizations of the simulation with different sample sizes under non-ignorable missingness (true VUS: 0.843). Relative bias, (mean VUS - 0.843)/0.843, is provided in parentheses.
8.3 Ratio of the Monte Carlo mean of estimated SD to the simulation SD of the estimators under ignorable missingness.
8.4 90% CI coverage probability of the VUS variance estimator: under non-ignorable missingness
8.5 Ratio of the Monte Carlo mean of estimated SD of the estimators to the Monte Carlo SD of the estimators: under non-ignorable missingness
8.6 90% CI coverage probability of the VUS variance estimator: under ignorable missingness
8.7 Efficiency relative to DR2. ARE relative to DR2 is provided in parentheses: under ignorable missingness.
8.8 Efficiency relative to DR2. ARE relative to DR2 is provided in parentheses: under non-ignorable missingness.
8.9 MSE (×10³) of VUS estimators: ignorable missingness. Jackknife variance estimator used for all the estimators.
8.10 MSE (×10³) of VUS estimators: non-ignorable missingness. Jackknife variance estimator used for all the estimators.
8.11 Detailed information about verification rate: varying test accuracy
8.12 Comparison of the VUS Estimators: Varying Test Accuracy. Relative bias (%) to "True" estimates is provided in parentheses.
8.13 Detailed information about verification rate: varying disease prevalence
8.14 Comparison of the VUS Estimators: Varying Disease Prevalence. Relative bias (%) to "True" estimates is provided in parentheses.
8.15 Detailed information about verification rate: varying verification mechanism
8.16 Comparison of the VUS Estimators: Varying Verification Rate
8.17 Comparison of the VUS Estimators: Model Correctness
8.18 Comparison of the VUS Estimators: Alpha Sensitivity
9.1 Summary statistics of ADNI data
9.2 Estimated model parameters: verification model
9.3 Estimates of the VUS using data from ADNI
9.4 Sensitivity analysis for estimates of the VUS using data from ADNI (true α = 1)
Abstract

In diagnostic medicine, the volume under the receiver operating characteristic (ROC) surface (VUS) is a commonly used index to quantify the ability of a continuous diagnostic test to discriminate between three disease states. In practice, verification of the true disease status may be performed for only a subset of the subjects under study, since the verification procedure can be invasive, risky or expensive. The selection for disease examination might depend on the results of the diagnostic test and on other clinical characteristics of the patients, which in turn can bias estimates of the VUS. This bias is referred to as verification bias. The only study considering verification bias correction in three-way ROC analysis focuses on ordinal tests. We propose new verification bias-correction methods to estimate the VUS for a continuous diagnostic test. We investigate the assumptions of missing at random (MAR) and non-ignorable missingness. Three classes of estimators are proposed, namely inverse probability weighted (IPW), imputation-based, and doubly robust estimators. A jackknife estimator of variance is derived for all the proposed VUS estimators. Based on U-statistics theory, we also develop asymptotic properties for the IPW estimators. Extensive simulation studies are performed to evaluate the performance of the new estimators in terms of bias correction and variance, and the proposed methods are applied to data from Alzheimer's disease research.

KEYWORDS: Diagnostic test; ROC surface; VUS; Verification bias; MAR; Non-ignorable missingness

Chapter 1  Introduction

This dissertation addresses the problem of developing and comparing statistical methods that assess the accuracy of a continuous diagnostic test for three-way disease classification in the presence of verification bias, i.e., when the selection procedure for verifying the true disease status is related to the true disease status itself. Diagnostic tests, which are of critical importance in health care, are performed to provide information about the diagnosis or detection of disease, to track disease progression, and to understand disease mechanisms[1, 2]. Common examples of diagnostic tests include physical examinations, X-rays, biopsies, pregnancy tests and medical histories. A perfect diagnostic test would correctly classify all patients with the target health condition (often a disease or a disease stage).

Unfortunately, the perfect test rarely exists, so researchers should understand a test's diagnostic accuracy, that is, its ability to discriminate between different health conditions, before they can confidently introduce a new test into the clinic. An accurate diagnostic test often enables early diagnosis of disease, which leads to successful treatment and improved recovery. On the contrary, an inaccurate test often results in ineffective or even harmful treatment that can be both physically and emotionally painful to incorrectly diagnosed patients. With the rapid development of new technologies such as microarrays, the number of new diagnostic tests has been increasing dramatically. Thus, it is critical to develop proper statistical methods to evaluate diagnostic accuracy.
1.1 Types of Test Results

Recently, advances in medical technology have facilitated the development of a vast number of diagnostic and screening tests that provide non-invasive and accurate tools to predict disease existence and progression. Tests may yield results on a binary, ordinal or continuous scale. A binary test is the simplest type, having only two diagnostic values, negative and positive; a classical example is the pregnancy test. Some tests produce results that can take more than two values. For example, echocardiographic grading of mitral regurgitation produces a 5-point ordinal scale (0, 1+, 2+, 3+, 4+) indicating disease severity. Test results can also be continuous, with an unlimited number of possible values; examples include gene expression profiling and a clinician's rating of confidence in the presence of a tumor, ranging from 0% to 100%. A continuous test offers more detailed information about the degree of illness, but it also imposes difficulties when it comes to estimating accuracy. In this dissertation, we focus on accuracy estimation for continuous diagnostic tests.

1.2 Verification Bias

The accuracy of a new diagnostic test is usually assessed by comparing the results of the new test with those from a reference test, hereinafter referred to as the "gold standard" test (GST), the best possible method for disease ascertainment. It is a common assumption that the true disease status can be perfectly verified using the GST. Ideally, patients undergo both the new test and the GST, after which the results of the two tests are compared; accuracy analysis is straightforward if the true disease status of every subject is determined by the GST. In practice, however, only a subset of the subjects who took the new test go on to take the GST and have their true disease status verified. This situation is particularly common when the GST (such as biopsy to determine the presence of prostate cancer, angiography to determine the existence of coronary artery disease, or a behavioral test to ascertain hearing impairment in infants) is invasive, risky or expensive. Under those conditions, it is more ethical and cost-effective to preferentially administer the GST to patients whom the new test identifies as being at high risk of disease, so the frequency of true disease status verification is lower among patients with negative results on the new test. For subjects not receiving the GST, the true disease status is therefore unknown. As a consequence, naïve estimators of the accuracy of the new diagnostic test that base estimation only on disease-verified patients may be badly biased. In other words, the estimators are subject to verification bias, also called work-up bias[3, 4].

In most cases, the decision to refer patients for the GST is not made randomly, but instead is based on test results, signs, symptoms or a medical history suggestive of disease. Usually the results of the new test under study help identify subjects at higher disease risk, and the decision about verification status is made accordingly. For example, in a study assessing the accuracy of ankle swelling for predicting a fracture in patients with ankle injuries, physicians are less likely to refer patients with no ankle swelling to take the GST (X-ray).
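To make this concrete, the following minimal sketch (the distributions, coefficients and logistic form of the selection rule are hypothetical illustrations, not taken from any study) simulates a verification process whose probability depends only on the observed test result. Because higher test results are both more likely to be verified and more likely to come from diseased subjects, the verified subsample is enriched for disease:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# Hypothetical population: 30% disease prevalence; the continuous test
# tends to be higher among diseased subjects.
d = rng.binomial(1, 0.3, size=n)
t = rng.normal(loc=1.0 * d, scale=1.0)

# Verification depends only on the observed test result, not directly
# on the unknown disease status: higher T, higher chance of the GST.
p_verify = 1.0 / (1.0 + np.exp(-(t - 0.5)))
v = rng.binomial(1, p_verify)

# The verified subsample is no longer a random sample of the study:
# its disease prevalence overstates the true prevalence.
print("true prevalence:    ", d.mean())
print("verified prevalence:", d[v == 1].mean())
```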
Not only can test results influence the selection for true disease status verification; signs, symptoms, or a medical history can also serve as disease indicators that further affect the probability of a person being referred for the GST. Such informative indicators other than the new test results are called auxiliary data. It is not uncommon for clinicians to combine information from the new test and auxiliary data when selecting patients in need of GST confirmation. Apart from that, patients with a serious health condition may be too ill to undergo the GST procedure. Thus, the selection procedure for disease verification is often related to the test results and other clinical characteristics, making the verified subjects no longer a random sample of those in the study. Verification bias may therefore arise when only verified subjects are used to evaluate the diagnostic accuracy[3]. The hypothetical example below illustrates how verification bias distorts estimators of diagnostic accuracy.

Assume that in a study evaluating the accuracy of a new binary screening test for a disease, 200 subjects are enrolled and receive the new test, and the prevalence of the disease is 50%. Among the several indices of diagnostic accuracy of a binary test, sensitivity and specificity are commonly used. Sensitivity, also called the true positive rate (TPR), is the probability that the test correctly identifies an individual as "diseased". Specificity, equal to 1 - false positive rate (FPR), is the ability of the test to correctly identify an individual as "non-diseased". In this study, the true sensitivity and specificity of the new test are 80% and 90%, respectively. If all subjects were to receive the GST, we would observe the results in Table 1.1a.

Table 1.1: Hypothetical example of verification bias

a: Truth
                 Diseased   Non-diseased
Positive Test    80         10
Negative Test    20         90
Total            100        100

b: Observed
                 Diseased   Non-diseased
Positive Test    48         6
Negative Test    4          18
Total            52         24

In fact, only a random sample of 20% of the subjects with a negative test result take the GST, while 60% of the subjects with a positive result are verified. The observed results from the verified subjects are shown in Table 1.1b. Using only verified subjects, the observed sensitivity is 48/52 ≈ 92%, and the observed specificity is 18/24 = 75%. In this case, the true sensitivity is overestimated, while the true specificity is underestimated. Based on these naïve estimators, one could draw the misleading conclusion that the new test is more sensitive and less specific than it really is. This simple example shows that verification bias can distort the assessment of diagnostic accuracy.

1.3 Importance of Recognizing and Correcting Verification Bias

Verification bias has been an underrecognized source of error in studies assessing diagnostic accuracy. Bates et al. reviewed the pediatric literature between 1987 and 1989 for potential verification bias[5]: among studies assessing diagnostic accuracy, 36% were subject to verification bias. Petscavage et al. reviewed 776 original research articles mentioning sensitivity or specificity as the endpoint in the radiology literature between 2006 and 2009, and found that as many as 36.4% of the articles had potential verification bias[6].
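The verification probabilities in the Table 1.1 example are known by design (60% for test positives, 20% for test negatives), so it is easy to preview the weighting idea developed formally in Chapter 2: counting each verified subject as 1/p subjects, where p is that subject's verification probability, recovers the full-data counts. The sketch below is only an illustration of this arithmetic:

```python
# Table 1.1b: counts among verified subjects, paired with the
# verification probabilities that produced them.
cells = {"TP": (48, 0.6), "FP": (6, 0.6), "FN": (4, 0.2), "TN": (18, 0.2)}

# Naive (verified-subjects-only) estimates use the counts as-is.
print("naive sensitivity:", 48 / (48 + 4))   # ~0.92 vs. true 0.80
print("naive specificity:", 18 / (18 + 6))   # 0.75  vs. true 0.90

# Weighting each verified subject by the inverse of its verification
# probability recovers the full-data counts of Table 1.1a.
w = {k: count / p for k, (count, p) in cells.items()}
print(w)                                     # TP 80, FP 10, FN 20, TN 90
print("weighted sensitivity:", w["TP"] / (w["TP"] + w["FN"]))   # 0.80
print("weighted specificity:", w["TN"] / (w["TN"] + w["FP"]))   # 0.90
```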
In a retrospective design, verification bias arises when a sample of patients receives a diagnosis using an imaging modality, but only the patients who go on to have the GST are included in the statistical analysis. Similarly, verification bias occurs in prospective studies when, at the beginning of the study, all patients are imaged with the test under study, but only those with a positive test result subsequently receive the GST, or only a fraction of the patients with negative test results have the disease verified. Fortunately, more and more attention is being paid to the prevalence of verification bias in the medical literature. Cronin found that the percentage of diagnostic test studies correcting verification bias increased remarkably, from 29% during 1978-1981 to 62% between 1990 and 1993[7]. Furthermore, 40% of studies assessing diagnostic accuracy for cancer published between 1990 and 2003 acknowledged the potential presence of verification bias[8].

It has been shown that correcting for verification bias improves the estimates of the sensitivity and specificity of a diagnostic test and allows better clinical decision making[9]. Between 1995 and 2001, 6691 eligible men were enrolled in a diagnostic test study and underwent PSA examination and digital rectal examination to determine the presence of prostate cancer. Specifically, the GST method, biopsy, was recommended if the patient had a PSA value greater than 2.5 ng/ml or the digital rectal examination suggested a high risk of prostate cancer. In this study, men with higher PSA values were more likely to receive disease confirmation, thus constituting verification bias. Compared with the unadjusted result, the accuracy estimate of the PSA test increased by almost 62% after adjustment for verification bias. Verification bias was also found to inflate the sensitivity and underestimate the true specificity of the PSA test. The unadjusted estimates of diagnostic accuracy therefore led to the wrong conclusion, and adjustment for verification bias is necessary to obtain more accurate information about the diagnostic characteristics of the test and to promote better recommendations for prostate biopsy.

As a common problem in medical research, verification bias can be eliminated at the design stage by referring every subject under study to take the GST. However, due to ethical and cost-effectiveness issues, it is usually not practical to remove verification bias through the study design. In clinical trials, the verification mechanism can be defined in the protocol, while in observational studies it is more difficult to know the mechanism completely. It is thus crucial to recognize the existence of this bias and to construct a bias-corrected estimate of the diagnostic accuracy accordingly.

1.4 Three-class Disease Classification Problems

More than two disease classes are often necessary in many important biomedical diagnostic tasks. For example, it is more informative and helpful to predict the progression stage of a disease than to merely report whether the disease is present or not. In the cancer detection setting, for instance, the GST can classify specimens into one of three ordinal classes: non-disease, benign lesions and malignant lesions. Other practical cases in the medical field include the classification of ECG signals as normal, ventricular premature or supraventricular premature beats[10], and the classification of pigmented skin lesions as benign, dysplastic nevi, or cutaneous melanoma[11].
When different stages of a disease are important to report, three or even more ordinal diagnostic groups are introduced into the accuracy study. Accordingly, the diagnostic accuracy must then measure the capability of the test to correctly classify subjects into the different ordinal disease groups.

In the two-class disease classification setting, one can focus on a single class. For example, subjects are classified as diseased or non-diseased according to the test results, so the sensitivity and specificity for a single class are informative enough. Since there are only three degrees of freedom in the contingency table (the error matrix, as shown in Tables 1.1a and 1.1b), the classification rates for the other class can easily be obtained. With c classes, there are c(c - 1) degrees of freedom in the error matrix, and focusing on one particular class alone is likely to yield misleading conclusions. In practice, the major hindrance in the multi-class classification problem is the high dimensionality of the probability space. Since c(c - 1) possible misclassification combinations are needed to evaluate the test, the difficulty of multi-class problems lies not only in visualization but also in the complexity of constructing estimates. A complete characterization of accuracy, even for a simple class distribution, is an important yet difficult task. Three-class classification problems have been considered a convenient and sufficient representative of multi-class classification problems, and many methods have been proposed to perform accuracy analysis in this setting.

1.5 Motivating Examples - Alzheimer's Disease (AD)

There are numerous examples of medical diagnostic studies that do not fit into the binary disease classification framework. Such tests typically yield an ordered gradation of illness from not diseased to seriously ill. Here we take the research area of AD as one example.

Alzheimer's disease is a complex and progressive disease with sequentially interacting pathological cascades that lead to loss of synaptic integrity, loss of effective neural network connectivity, and progressive regional neurodegeneration[12, 13]. Mild cognitive impairment (MCI) is a transitional stage between normal cognition and dementia. When the underlying disease process is due to AD (plaques and tangles), memory disturbance predominates and manifests clinically as MCI of the amnestic type[14, 15]. Patients with MCI are of particular interest because they are at considerably higher risk of progressing to dementia of the Alzheimer type, so secondary prevention trials can be very meaningful in these patients. In addition, since the progression of AD is a remarkably slow process, primary prevention trials for AD require at least 3000 subjects followed for 5 to 7 years to achieve sufficient clinical endpoints. The availability of suitable biomarkers tracking the intermediate stages of disease progression could therefore markedly accelerate drug development by providing an earlier indication of drug efficacy. The problem with biomarker development for AD and AD-related MCI is that a definite diagnosis requires autopsy confirmation. The true disease status will therefore be missing for many subjects, because living subjects cannot have the disease verified and deceased subjects do not always have an autopsy performed. As a result, the percentage of subjects receiving disease verification can be very low and may depend on the unobserved disease status.
We now take the Alzheimer's Disease Neuroimaging Initiative (ADNI) study as a real-life illustration. ADNI is a large, multicenter, longitudinal neuroimaging study launched in 2003 by the National Institute on Aging, the National Institute of Biomedical Imaging and Bioengineering, the Food and Drug Administration, private pharmaceutical companies and non-profit organizations[16]. For disease verification, because it is unrealistic to perform autopsy on all subjects under study, ADNI investigators used neuropsychological tests to classify disease stages as probable AD, amnestic MCI and normal cognition. Over 1600 participants with MCI, probable AD or elderly, cognitively normal (CN) status have been identified and recruited over three phases: ADNI 1, ADNI GO and ADNI 2.

One of the goals of ADNI is to develop new biomarker tests that could substitute for neuropathological findings. Considering that performing autopsy on every subject is unrealistic and that the alternative cognitive assessment approach can be expensive or even inaccurate, it would be useful and cost-effective if, in future studies, we could properly estimate accuracy using in vivo biomarkers, rather than requiring post-mortem biomarkers for disease verification, and without requiring disease verification for all subjects. Based on the methods developed in Chapters 5 and 6, a study could be designed without requiring definite AD verification for all subjects. This makes the accuracy estimation of a new diagnostic test feasible when it is not realistic to obtain the true disease status of every study subject. In addition, our methods could significantly reduce the cost of a future study by allowing only a subset of subjects to take the time-consuming and expensive neuropsychological tests, without compromising decisions about the effectiveness of the new diagnostic test. For example, a cost-efficient study could be designed so that patients with higher test results are given a higher probability of receiving disease verification than patients with lower test results.

1.6 Aims

The goal of this dissertation is to develop and compare methods for assessing the accuracy of a continuous medical diagnostic or screening test with three ordinal diagnostic groups when verification bias is present. Existing methods for assessing the accuracy of continuous tests with binary disease status under verification bias are reviewed in Chapter 2. Existing methods for assessing the accuracy of a diagnostic test with three ordinal disease classes are reviewed in Chapter 3. In Chapters 4 and 5, new approaches for bias-corrected estimation of accuracy in the three-class classification problem are proposed under different assumptions. Inferential methods are derived for these approaches in Chapter 6. Chapters 7 and 8 compare the performance of the different approaches using simulation studies. A data application using the ADNI study data set is presented in Chapter 9. Conclusions and recommendations, along with plans for future work, are summarized in Chapter 10.

1.7 Summary

Diagnostic tests are widely used in medicine to aid disease diagnosis and to inform health care plans. Before a new test can be made available to the public, it is of great importance to properly evaluate its diagnostic accuracy by comparing it to the performance of the GST.
Nonetheless, when the GST is expensive or invasive, it may not be applied to all subjects under study, which can result in verification bias in estimates of the accuracy. Estimates using data from verified subjects are biased if the subjects in the study are not equally likely to receive disease ascertainment and the selection procedure for taking the reference test depends on the test results and on related clinical characteristics.

There has been much work on the statistical estimation and inference of bias-corrected diagnostic accuracy when subjects are categorized in a binary fashion, as summarized in Chapter 2. Because many medical diagnostic problems involve more than two diagnostic groups, researchers have been prompted to investigate the evaluation of diagnostic accuracy for three-class classification problems. So far, there are few existing methods to quantify three-class diagnostic performance in the presence of verification bias. In particular, for a continuous diagnostic test there is no existing method that corrects verification bias in the three-class problem. The work in this dissertation fills this gap. We propose to extend recent developments in verification bias-corrected estimation of the accuracy of continuous diagnostic tests in two-class classification problems, and to derive estimators of three-class diagnostic performance in the presence of verification bias for diagnostic tests measured on a continuous scale. Such an effort is applicable to the evaluation of a broad class of problems that can be formulated as three-class classification tasks, and to the assessment of classifiers for tasks involving three or more classes.

Chapter 2  Existing Methods for Assessing the Diagnostic Accuracy (two-class disease status)

In this chapter we discuss existing methods for assessing the accuracy of a continuous diagnostic test in the presence of verification bias for two-class classification problems. In the subsequent chapter, we introduce methods for three-class classification problems, both in the setting where every subject has disease ascertainment and in the presence of verification bias. First, we introduce some notation.

2.1 Data and Notation

Suppose a study evaluating the accuracy of a diagnostic test enrolls n subjects randomly from the target population. Let T_i denote the test result and D_i the true disease status of the i-th subject, i = 1, 2, ..., n. If there are only two disease classes, D_i = 1 indicates a diseased status, while D_i = 0 indicates a non-diseased status. The true disease status D_i is known if the i-th subject is selected for disease verification, indicated by V_i = 1; if V_i = 0, then D_i is unknown. Let A_i be a vector of observed auxiliary data measured on the i-th subject. We let n_v denote the verified group, in which subjects have disease verification, and n_{\bar{v}} the non-verified group, without disease verification.

2.2 Assumptions

The disease status D can be perfectly ascertained using a GST. It is common and convenient to assume that the test results are positively related to the true disease status, i.e., higher test results indicate that the patient is at higher disease risk.

2.3 Measure of Diagnostic Accuracy

For diagnostic tests with binary results (positive or negative), sensitivity and specificity are often used. For an ordinal or continuous test, a threshold c can be chosen to dichotomize the test result, i.e., a positive result is given if T > c and a negative result otherwise.
Sensitivity and specificity can then be calculated for each possible cut-off point c. The overall accuracy of the test is generally assessed through the area under the receiver operating characteristic (ROC) curve. The ROC curve is a plot of TPR(c) (sensitivity) versus FPR(c) (1 - specificity) as the cut-point c for a positive diagnosis is varied. For a given cut-point c, the TPR and FPR of the test are defined as

TPR(c) = P(T > c | D = 1)   and   FPR(c) = P(T > c | D = 0).

The ROC curve is a very useful summary graph of diagnostic test accuracy. It is monotone increasing from 0 to 1 within the unit square. Figure 2.1 shows examples of ROC curves. A perfect diagnostic test (Test A) generates an ROC curve passing along the upper left corner of the unit square, where both the sensitivity and the specificity are 100%, while an ROC curve lying on the diagonal 45° line of the unit square represents an uninformative test (Test D). The closer the ROC curve climbs to the upper left corner, the better the diagnostic test; for instance, Test B represents a more accurate test than Test C. In general, ROC curves show how well the diagnostic test discriminates the positive condition from the negative condition by directly visualizing the degree of separation between the distributions of test results under the respective conditions. The ROC curve is also invariant to any monotone transformation of the test results and thus does not depend on the measurement units[17, 18].

Since a summary statistic is usually needed for accuracy evaluation, the area under the ROC curve (AUC) was proposed as a general index derived from the ROC curve across all possible cut-off points. The AUC is defined as P(T_i > T_j | D_i > D_j), where i is a randomly selected diseased subject and j is a randomly selected non-diseased subject[19]. The AUC can be interpreted as the probability that the test results are consistent with the disease status; in other words, the test result of a diseased subject will be higher than that of a non-diseased subject, assuming disease status is positively correlated with test results. A larger AUC indicates better performance of the diagnostic test: an AUC of 1 represents a perfect test (Test A in Figure 2.1), while an AUC of 0.5 represents a worthless test (Test D in Figure 2.1).

2.4 Assessing Accuracy for Complete Data

Continuous test results can be dichotomized at any cut-off point c varying from -∞ to +∞, i.e., a positive result if T > c and a negative result otherwise. When all subjects in the study have their true disease status verified, TPR(c) and FPR(c) can be estimated non-parametrically for a given cut-off value c as follows:

\widehat{TPR}(c) = \frac{\sum_{i=1}^{n} I(T_i > c) D_i}{\sum_{i=1}^{n} D_i}    (2.1)

\widehat{FPR}(c) = \frac{\sum_{i=1}^{n} I(T_i > c)(1 - D_i)}{\sum_{i=1}^{n} (1 - D_i)}    (2.2)

where I(·) denotes the indicator function. The empirical ROC curve is the plot of the pairs (FPR(c), TPR(c)) as c ranges over the observed test results. For a study with a random sample of n_1 diseased subjects and n_0 non-diseased subjects, the empirical ROC plot consists of at most n_1 + n_0 + 1 points. Extrapolation of the empirical ROC curve to all possible threshold values can be obtained by connecting adjacent points with vertical lines of length 1/n_1 and horizontal lines of length 1/n_0.
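As an illustration, here is a minimal sketch of the empirical estimators (2.1)-(2.2) on simulated complete data (the data-generating choices are hypothetical); evaluating them at every observed test value traces out the empirical ROC curve:

```python
import numpy as np

def empirical_roc(t, d):
    """Empirical TPR(c) and FPR(c) of equations (2.1)-(2.2), evaluated
    at each observed test value used as the cut-off c."""
    t, d = np.asarray(t), np.asarray(d)
    cuts = np.unique(t)
    tpr = np.array([(t[d == 1] > c).mean() for c in cuts])
    fpr = np.array([(t[d == 0] > c).mean() for c in cuts])
    return fpr, tpr

# Hypothetical complete data: diseased subjects score higher on average.
rng = np.random.default_rng(1)
d = rng.binomial(1, 0.5, size=500)
t = rng.normal(loc=1.2 * d, scale=1.0)
fpr, tpr = empirical_roc(t, d)   # (FPR(c), TPR(c)) pairs trace the curve
```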
We assume that F_0 and F_1 are the cumulative distribution functions of the test results in non-diseased and diseased subjects, respectively. For a specificity of t (0 < t < 1), the sensitivity (TPR) of the test can be calculated as

TPR(t) = 1 - F_1(F_0^{-1}(t)),

where F_0^{-1} is the inverse function of F_0. The ROC curve can thus be expressed as the plot of 1 - t versus TPR(t) for 0 < t < 1. As a widely used summary measure of the ROC curve, the AUC is defined as \int_0^1 TPR(t) \, dt.

Let X_1, ..., X_{n_0} be the test results of the non-diseased subjects and Y_1, ..., Y_{n_1} be the test results of the diseased subjects. The AUC can then be calculated by the trapezoidal rule as the Wilcoxon-Mann-Whitney two-sample rank statistic:

\widehat{AUC} = \frac{1}{n_0 n_1} \sum_{i=1}^{n_0} \sum_{j=1}^{n_1} I(Y_j > X_i)

ROC analysis requires verification of every subject's true disease status D. In practice, however, only a subset of subjects is selected for disease ascertainment when the GST is expensive and/or invasive. If the selection mechanism for disease verification is associated with the disease status and other related clinical characteristics, so that the verified subjects are not a random sample of the subjects in the study, then estimation of the ROC curve and AUC using only verified subjects will be biased. In the remainder of this chapter, we briefly review existing methods for correcting verification bias when the diagnostic task is dichotomous and the new test is continuous.

2.5 Missing At Random (MAR) Assumption

Since only a subset of the subjects receive true disease status verification, non-verified subjects can be viewed as having missing data on the true disease status. MAR is the key assumption underlying many methods for verification bias correction[20]. The verification process is MAR if the probability of disease verification is determined purely by the test result and other observed clinical covariates. The probability of receiving disease verification is then conditionally independent of the unknown true disease status given the observed variables, i.e., V ⊥ D | (T, A). In other words, MAR means that the verification process is ignorable. If the verification process is related to the true disease status even after conditioning on the observed variables, the verification process is non-ignorable. It is sometimes unrealistic to assume ignorable verification, since the physician's decision about verification depends on the patient's overall health status, which may not be fully captured by the test results and the measured covariates.

2.6 Naïve Estimators

The naïve estimators of TPR(c) and FPR(c) are similar to equations (2.1) and (2.2), except that they use data only from subjects with observed true disease status. These estimators are also referred to as complete-case (CC) estimators:

\widehat{TPR}_{CC}(c) = \frac{\sum_{i=1}^{n} I(T_i > c) V_i D_i}{\sum_{i=1}^{n} V_i D_i}    (2.3)

\widehat{FPR}_{CC}(c) = \frac{\sum_{i=1}^{n} I(T_i > c) V_i (1 - D_i)}{\sum_{i=1}^{n} V_i (1 - D_i)}    (2.4)

Notably, the CC estimators are unbiased only when the verification process is missing completely at random, which is typically an unrealistic assumption; biased results are still obtained under the less restrictive MAR assumption.

2.7 Two-Stage Design Methods

Alonzo et al. considered a study with a disease verification process as a study with a two-phase or double-sampling design[21, 22]. Estimators of the sensitivity and specificity of a diagnostic test can be derived by incorporating methodology for prevalence estimation under a two-phase study design. The bias-corrected ROC curve is constructed by plotting bias-corrected TPR(c) versus bias-corrected FPR(c) for all possible thresholds c.
By applying the trapezoidal rule for integration, the AUC can be estimated empirically from the bias-corrected ROC curve[21].

2.7.1 Full Imputation (FI) Method

The idea of the FI method is to impute the probability of disease for all subjects in the study as a function of (T, A). The FI estimator of the disease prevalence can be written as

\hat{P}(D = 1) = \frac{1}{n} \sum_{i=1}^{n} \hat{p}(D_i = 1 | T_i, A_i)    (2.5)

where \hat{p}(D_i = 1 | T_i, A_i) is an estimate of the disease model P(D = 1 | T, A). By the MAR assumption, the disease model can be estimated by P(D | T, A, V = 1), which uses only verified subjects. The FI estimator of P(T > c, D = 1) is

\hat{P}(T > c, D = 1) = \frac{1}{n} \sum_{i=1}^{n} I(T_i > c) \hat{p}(D_i = 1 | T_i, A_i)    (2.6)

where the distribution of (T, A) is empirically assigned mass 1/n at each observed point (t, a). The FI estimator of TPR(c) is then the ratio of the FI estimators of P(T > c, D = 1) (equation 2.6) and P(D = 1) (equation 2.5); similarly, FPR(c) is estimated as the ratio of the estimators of P(T > c, D = 0) and P(D = 0):

\widehat{TPR}_{FI}(c) = \frac{\sum_{i=1}^{n} I(T_i > c) \hat{p}(D_i = 1 | T_i, A_i)}{\sum_{i=1}^{n} \hat{p}(D_i = 1 | T_i, A_i)}    (2.7)

\widehat{FPR}_{FI}(c) = \frac{\sum_{i=1}^{n} I(T_i > c) \{1 - \hat{p}(D_i = 1 | T_i, A_i)\}}{\sum_{i=1}^{n} \{1 - \hat{p}(D_i = 1 | T_i, A_i)\}}    (2.8)

2.7.2 Mean Score Imputation (MSI) Method

Instead of imputing the probability of disease for all subjects as in the FI method, the MSI method imputes the disease status only for non-verified subjects and uses the observed disease status for verified subjects. The MSI estimators of TPR(c) and FPR(c) are:

\widehat{TPR}_{MSI}(c) = \frac{\sum_{i=1}^{n} I(T_i > c) \{V_i D_i + (1 - V_i) \hat{p}(D_i = 1 | T_i, A_i)\}}{\sum_{i=1}^{n} \{V_i D_i + (1 - V_i) \hat{p}(D_i = 1 | T_i, A_i)\}}    (2.9)

\widehat{FPR}_{MSI}(c) = \frac{\sum_{i=1}^{n} I(T_i > c) \{V_i (1 - D_i) + (1 - V_i) \hat{p}(D_i = 0 | T_i, A_i)\}}{\sum_{i=1}^{n} \{V_i (1 - D_i) + (1 - V_i) \hat{p}(D_i = 0 | T_i, A_i)\}}    (2.10)

where, by the MAR assumption, the estimate of P(D = 1 | T, A) can be obtained using only verified subjects.

2.7.3 Inverse Probability Weighting (IPW) Method

Another approach, proposed by Horvitz & Thompson for estimating the prevalence of disease in a two-phase design, is to weight each verified subject by the inverse of the probability of being selected for verification[23]. Let π = P(V = 1 | T, A). In the IPW estimators, each verified subject is given weight 1/π, the inverse of the verification probability. The IPW estimators of TPR and FPR are:

\widehat{TPR}_{IPW}(c) = \frac{\sum_{i=1}^{n} I(T_i > c) V_i D_i / \hat{\pi}_i}{\sum_{i=1}^{n} V_i D_i / \hat{\pi}_i}    (2.11)

\widehat{FPR}_{IPW}(c) = \frac{\sum_{i=1}^{n} I(T_i > c) V_i (1 - D_i) / \hat{\pi}_i}{\sum_{i=1}^{n} V_i (1 - D_i) / \hat{\pi}_i}    (2.12)

π may be known or may be estimated from the observed data, which further requires the MAR assumption.
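A minimal sketch of the IPW estimators (2.11)-(2.12) follows. The vectorized form is illustrative; in practice the estimated probabilities \hat{\pi} would typically come from a model such as a logistic regression of V on (T, A):

```python
import numpy as np

def ipw_rates(t, d, v, pi_hat, c):
    """IPW estimators (2.11)-(2.12) at cut-off c. The disease status d
    only needs to be valid where v == 1: each verified subject carries
    weight 1/pi_hat (assumed positive), unverified subjects weight 0."""
    w = np.where(v == 1, 1.0 / pi_hat, 0.0)
    dd = np.where(v == 1, d, 0.0)        # placeholder where unverified
    pos = (t > c)
    tpr = np.sum(pos * w * dd) / np.sum(w * dd)
    fpr = np.sum(pos * w * (1.0 - dd)) / np.sum(w * (1.0 - dd))
    return tpr, fpr
```

Evaluating ipw_rates over a grid of cut-offs and plotting the resulting (FPR, TPR) pairs yields the bias-corrected ROC curve described above.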
2.7.4 Semi-Parametric Efficient (SPE) Method

Building on their work on the SPE estimator of disease prevalence in two-phase sampling studies, Alonzo et al. further developed SPE estimators of TPR(c) and FPR(c) for any given cut-off point c:

\widehat{TPR}_{SPE}(c) = \frac{\sum_{i=1}^{n} I(T_i > c) \{V_i D_i / \hat{\pi}_i - (V_i - \hat{\pi}_i) \hat{p}(D_i = 1 | T_i, A_i) / \hat{\pi}_i\}}{\sum_{i=1}^{n} \{V_i D_i / \hat{\pi}_i - (V_i - \hat{\pi}_i) \hat{p}(D_i = 1 | T_i, A_i) / \hat{\pi}_i\}}    (2.13)

\widehat{FPR}_{SPE}(c) = \frac{\sum_{i=1}^{n} I(T_i > c) \{V_i (1 - D_i) / \hat{\pi}_i - (V_i - \hat{\pi}_i) \hat{p}(D_i = 0 | T_i, A_i) / \hat{\pi}_i\}}{\sum_{i=1}^{n} \{V_i (1 - D_i) / \hat{\pi}_i - (V_i - \hat{\pi}_i) \hat{p}(D_i = 0 | T_i, A_i) / \hat{\pi}_i\}}    (2.14)

This estimator combines the essence of the MSI and IPW methods: like the MSI approach, it imputes the probability of disease for the unverified subjects, and, like the IPW method, it attaches weights (the inverse of the probability of being selected for disease verification) to the observed disease status of the verified subjects. It is considered semi-parametric since it specifies parametric models to estimate the conditional probability of disease P(D | T, A) and the conditional probability of verification P(V | T, A), while estimating the joint distribution of (D, T, A) non-parametrically. The advantage of this approach is that it is doubly robust: it is consistent as long as either the disease model or the verification model is correctly specified, i.e., incorrect estimation of one of P(D | T, A) and P(V | T, A) does not affect the consistency of the SPE estimator.

2.8 Direct Estimation of the Bias-corrected AUC

He et al. achieved direct estimation of the AUC based on U-statistics and the IPW approach[24]. Under the MAR assumption, they observed that

E\{\pi_i^{-1} \pi_j^{-1} V_i V_j I(T_i > T_j) I(D_i > D_j)\} = P(D_i = 1) P(D_j = 0) \cdot AUC

where \pi_i and \pi_j are the verification probabilities of two randomly selected subjects i and j. Each verified subject is given a weight equal to the inverse of its verification probability, i.e., \pi_i^{-1} and \pi_j^{-1}. Similarly, \pi_i^{-1} \pi_j^{-1} V_i V_j I(D_i > D_j) is an estimator with expectation P(D_i = 1) P(D_j = 0). The ratio of these two estimators, of P(D_i = 1) P(D_j = 0) \cdot AUC and of P(D_i = 1) P(D_j = 0), then yields a direct estimator of the AUC. Written in symmetric form, it can be expressed as a function of a one-sample U-statistic:

\widehat{AUC} = \frac{\sum_{i=1}^{n} \sum_{j=1}^{n} \frac{1}{2} \pi_i^{-1} \pi_j^{-1} V_i V_j \{I(T_i > T_j) I(D_i > D_j) + I(T_i < T_j) I(D_i < D_j)\}}{\sum_{i=1}^{n} \sum_{j=1}^{n} \frac{1}{2} \pi_i^{-1} \pi_j^{-1} V_i V_j \{I(D_i > D_j) + I(D_i < D_j)\}}    (2.15)

This estimator corrects verification bias using the IPW approach: each pair of verified subjects is given weight \pi_i^{-1} \pi_j^{-1}. Using the asymptotic theory of U-statistics, the asymptotic properties of the AUC estimator can be derived when \pi is assumed known; when \pi is unknown, logistic regression may be used to estimate it. He et al. also proved that this estimator is equivalent to the empirical AUC computed from the bias-corrected ROC curve obtained using Alonzo & Pepe's IPW approach (Section 2.7.3).
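Because the products of indicators in (2.15) are symmetric in (i, j), swapping the indices in the second terms shows that the numerator and denominator reduce to sums of the weighted I(T_i > T_j) I(D_i > D_j) and I(D_i > D_j) terms alone. A minimal sketch of this direct estimator (the vectorization details are illustrative, not from the original paper):

```python
import numpy as np

def ipw_auc(t, d, v, pi_hat):
    """Direct IPW estimator of the AUC (equation 2.15): sums over pairs
    of verified subjects, each pair weighted by 1/(pi_i * pi_j)."""
    keep = (v == 1)
    t, d, w = t[keep], d[keep], 1.0 / pi_hat[keep]
    ti, tj = t[:, None], t[None, :]
    di, dj = d[:, None], d[None, :]
    wij = w[:, None] * w[None, :]
    num = np.sum(wij * ((ti > tj) & (di > dj)))
    den = np.sum(wij * (di > dj))
    return num / den
```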
2.9 Non-ignorable Missingness

The methods introduced in the previous sections are all based on the MAR assumption. In practice, however, this assumption may not hold, since the doctor's decision to refer a patient for disease verification may depend on comprehensive information about the patient's health condition that is not guaranteed to be fully captured by the test results or the other measured auxiliary data. In such a situation, we call the underlying missingness in disease status non-ignorable. A few papers have considered non-ignorable missingness, the earliest works coming from Baker (1995), Kosinski and Barnhart (2003), and Zhou (1993, 1994)[25, 26, 27, 28], which focus on estimation of the sensitivity and specificity of a binary diagnostic test. Zhou & Rodenberg (1998) and Zhou & Castelluccio (2003) considered estimation of the AUC under non-ignorable missingness, but for categorical tests[29, 30]. Here we review a doubly robust method to estimate the AUC for a continuous test that adjusts for measured covariates and for an arbitrary specified degree of residual selection bias.

2.9.1 Doubly Robust (DR) Estimators

Rotnitzky et al. applied and extended ideas suggested in previous work on the theory of semi-parametric models with non-ignorable missing data[31], and proposed a DR estimator of the AUC in the presence of non-ignorable missingness. First, they specify the selection model for verification as

\log \frac{P(V = 0 | T, A, D)}{P(V = 1 | T, A, D)} = h(T, A) + q(T, A) D    (2.16)

where q(T, A) is an arbitrary (known) function that represents the association between the disease status and the verification process after controlling for the observed variables (i.e., the test results and auxiliary data). When q(T, A) is chosen to be zero, the non-ignorable verification assumption reduces to the MAR assumption, meaning that the test results and covariates cover all information related to disease status that may influence the selection for verification. One can construct estimates of the AUC by assuming that the verification process follows a parametric model with a known non-ignorable parameter q(T, A), or by assuming a parametric model for the conditional probability of disease among verified subjects. But when the test results and covariates are continuous, with a moderate sample size there is no guarantee that the disease model or the verification model can be estimated well. To address this problem, Rotnitzky et al. first defined

D_i^{DR} = P(D_i | V_i, T_i, A_i; \beta) + U_i(\gamma, \beta)    (2.17)

where

P(D_i | V_i, T_i, A_i) = P(D_i = 1 | V_i = 1, T_i, A_i) \left[ V_i + (1 - V_i) \exp\{q(T_i, A_i)\} \left( 1 - P(D_i = 1 | V_i = 1, T_i, A_i) \left[ 1 - \exp\{q(T_i, A_i)\} \right] \right)^{-1} \right]

U_i(\gamma, \beta) = V_i \left\{ D_i - P(D_i | V_i = 1, T_i, A_i) \right\} + \exp\{h(T_i, A_i; \gamma) - q(T_i, A_i) D_i\} \left\{ D_i - P(D_i | V_i = 0, T_i, A_i) \right\}    (2.18)

The DR estimator of the AUC can then be constructed as

\widehat{AUC}_{DR} = \frac{\sum_{i=1}^{n} \sum_{j=1}^{n} D_i^{DR} (1 - D_j^{DR}) I(T_i > T_j)}{\sum_{i=1}^{n} \sum_{j=1}^{n} D_i^{DR} (1 - D_j^{DR})}

Similar to the SPE estimators of sensitivity and specificity developed by Alonzo et al., this DR estimator combines the two parametric models, for the conditional disease probability and for the conditional verification probability, and it is consistent and asymptotically normal if either of these models is correctly specified. As a trade-off for the attractive doubly robust property, the non-ignorable parameter in the verification model cannot be identified. Rotnitzky et al. argued that the non-ignorable parameter can be assumed and a sensitivity analysis performed accordingly. Fluss et al. applied the same approach to construct DR estimators of the sensitivity and specificity as well as the empirical ROC curve[32].

Considering that it may be difficult to specify a valid non-ignorable parameter, Liu et al. proposed a likelihood-based approach, under additional assumptions on the disease model, to estimate the ROC curve and AUC under non-ignorable verification bias. Their estimation process requires the specification of a parametric disease model for the whole study population. With this parametric setting of the joint probability of D and V, the non-ignorable parameter becomes identifiable. The resulting estimator has the same functional form as the DR estimator, but it does not have the doubly robust property in general[33].

2.10 Summary

ROC curves are a standard and widely used tool for assessing the diagnostic accuracy of continuous tests. In this chapter, we reviewed existing verification bias correction methods for a continuous test when the diagnostic task is binary (i.e., two disease classes). The FI, MSI, IPW and SPE estimators of the ROC curve and AUC require the MAR assumption, while the DR
To accommodate the problem of discriminating three disease classes, three-way ROC analysis is needed. In the next chapter we review existing methods of three-way ROC analysis both when all subjects receive disease verification and when verification bias is present.

Figure 2.1: Examples of Four ROC Curves

Chapter 3 Three-Class ROC Analysis

Evaluation of diagnostic tests that discriminate patients into one of three classes of disease severity is of great clinical importance. In this chapter, we review ROC analysis as applied to three-class disease classification problems.

3.1 Data and Notation

First we extend the notation defined for the two-class classification problem to accommodate the three-class problem. Suppose $n$ subjects are chosen randomly from the target population to assess the accuracy of a diagnostic test. Let $T_i$ denote the continuous result of an investigational diagnostic test and $D_i$ the true disease status of the $i$-th subject, $i=1,\dots,n$: $D_i=0$ indicates that the subject does not have the disease, $D_i=1$ indicates mild disease, and $D_i=2$ indicates severe disease. Only a subset of the subjects have their disease status verified; let $V_i=1$ if the $i$-th subject's true disease status is verified and $V_i=0$ otherwise. Let $A_i$ be a vector of observed covariates that may be informative about $D_i$.

3.2 Three-Class ROC Analysis with Complete Data

ROC analysis generally assumes that the diagnostic test classifies subjects into just two mutually exclusive classes or conditions (e.g., diseased or non-diseased). In many clinical diagnostic situations, however, the objective is to discriminate among three or more disease statuses, and ROC curves cannot be applied. The ROC surface has therefore been proposed as a natural extension of ROC methodology to three-class diagnostic problems [34]. Because there are three disease categories to be determined, two ordered decision boundaries $c_1 < c_2$ are selected and the diagnostic decision is made according to the following rules:

1. if $T_i < c_1$, then $\hat D_i = 0$: non-diseased subject
2. if $c_1 < T_i < c_2$, then $\hat D_i = 1$: mild-disease subject
3. if $T_i > c_2$, then $\hat D_i = 2$: severe-disease subject

For each pair of ordered thresholds $(c_1,c_2)$, three true classification rates (TCRs) are generated, and the ROC surface is obtained by plotting the TCRs in three-dimensional space for all possible threshold pairs (a sketch of this computation follows below). Figure 3.1 shows an example of an ROC surface, which illustrates the trade-off among the three TCRs as the thresholds vary. The volume under the surface (VUS) has been proposed to evaluate the overall performance of the diagnostic test. As a generalization of the AUC for summarizing an ROC curve in a binary diagnostic task, the VUS is defined as $P(T_i>T_j>T_k\mid D_i>D_j>D_k)$, where $i$, $j$, $k$ are subjects randomly selected from the severe-disease, mild-disease and non-disease groups, respectively. It represents the probability that the test correctly orders subjects from the three diagnostic categories: a value of 1 indicates a perfect test, while a value of 1/6 means the test is no better than random chance.
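As a concrete illustration of the decision rules above, the sketch below (a minimal Python rendering under our own naming) classifies subjects by a pair of thresholds, returns the three TCRs, and sweeps the ordered threshold pairs to trace out the points of the empirical ROC surface; it assumes fully verified data with at least one subject in each class.

```python
import numpy as np

def tcr_triplet(T, D, c1, c2):
    """TCRs for thresholds c1 < c2 under the rule:
    T < c1 -> class 0, c1 < T < c2 -> class 1, T > c2 -> class 2."""
    D_hat = np.where(T < c1, 0, np.where(T < c2, 1, 2))
    return tuple(np.mean(D_hat[D == d] == d) for d in (0, 1, 2))

def roc_surface_points(T, D):
    """Empirical ROC surface: one TCR triplet per ordered threshold pair."""
    cuts = np.unique(T)
    return [tcr_triplet(T, D, c1, c2)
            for i, c1 in enumerate(cuts) for c2 in cuts[i + 1:]]
```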
Let $F_0$, $F_1$ and $F_2$ be the cumulative distribution functions of the test result in the non-disease, mild-disease and severe-disease categories, respectively. Let $d_0=F_0(c_1)$ and $d_2=1-F_2(c_2)$ be the true classification rates for the non-disease and severe-disease groups. Then the probability that a randomly selected mild-disease subject has a test result between $c_1$ and $c_2$ is
\[
d_1 = F_1(c_2) - F_1(c_1) = F_1\{F_2^{-1}(1-d_2)\} - F_1\{F_0^{-1}(d_0)\}
\]
By plotting the triplet $(d_0,d_1,d_2)$ for all possible pairs of cut-points $(c_1,c_2)$ in three-dimensional space, one constructs the ROC surface; details about forming an ROC surface are discussed elsewhere [35]. To summarize overall diagnostic accuracy, the VUS is defined as the probability that the test results of three randomly selected subjects, one from each class, are correctly ordered, and it can be expressed as
\[
VUS = \int_0^1 \int_0^{\,1-F_2\{F_0^{-1}(d_0)\}} \Big[F_1\{F_2^{-1}(1-d_2)\} - F_1\{F_0^{-1}(d_0)\}\Big]\, dd_2\, dd_0
\]
Nakas and Yiannoutsos proposed estimating the VUS nonparametrically with a Mann-Whitney U-statistic,
\[
\widehat{VUS} = \frac{1}{n_0 n_1 n_2}\sum_{i=1}^{n_0}\sum_{j=1}^{n_1}\sum_{k=1}^{n_2} I(T_i<T_j<T_k)
\]
where $n_0$, $n_1$, $n_2$ are the numbers of subjects in the non-disease, mild-disease and severe-disease groups and $i$, $j$, $k$ index subjects in those respective groups [36]. In accordance with earlier work by Dreiseitl et al. [11], they used U-statistic methodology for inference about the VUS estimator. He and Frey also described large-sample confidence intervals for the VUS based on U-statistic theory and bootstrap methodology [37].

Parametric approaches to estimating the VUS have also been developed [38][39][40]. One simple method assumes that the test results in the different disease classes follow normal distributions, $T\sim N(\mu_d,\sigma_d^2)$, $d=0,1,2$, and calculates the VUS accordingly:
\[
VUS = \int_{-\infty}^{+\infty} \Phi(as-b)\,\Phi(d-cs)\,\phi(s)\, ds \tag{3.1}
\]
where $a=\sigma_1/\sigma_0$, $b=(\mu_0-\mu_1)/\sigma_0$, $c=\sigma_1/\sigma_2$, $d=(\mu_2-\mu_1)/\sigma_2$, $\Phi(\cdot)$ is the standard normal distribution function, and $\phi(\cdot)$ is the standard normal density [38][40]. In practice the normality assumption is often not satisfied; since the VUS is invariant to monotone transformations, a Box-Cox type transformation may be applied to achieve approximate normality [39]. Construction of a smoothed ROC curve using kernel density estimates has been proposed to summarize diagnostic accuracy in two-class problems [41][42], and Kang et al. combined a Box-Cox type transformation with kernel smoothing using Gaussian basis functions to estimate the VUS [39]. Li and Zhou described a parametric approach to estimating the ROC surface using the same formulation (equation 3.1), but used Brownian bridges and quasi-likelihood to estimate the variance of the VUS.
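The nonparametric Mann-Whitney estimator above admits a simple $O(n\log n)$ implementation by sorting: for each mild-disease result, count the non-disease results below it and the severe-disease results above it. The sketch below is a minimal Python version under our own naming, assuming fully verified, continuous (tie-free) data.

```python
import numpy as np

def vus_mann_whitney(T, D):
    """Empirical VUS of Nakas & Yiannoutsos with complete data."""
    t0, t1, t2 = (np.sort(T[D == d]) for d in (0, 1, 2))
    n_below = np.searchsorted(t0, t1, side="left")             # #{T0 < t} per mild t
    n_above = len(t2) - np.searchsorted(t2, t1, side="right")  # #{T2 > t} per mild t
    return np.sum(n_below * n_above) / (len(t0) * len(t1) * len(t2))
```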
3.3 Naïve Estimators

If only a subset of subjects is selected for disease verification, the naïve estimator of the VUS, which constructs the Mann-Whitney U-statistic purely from verified subjects, is
\[
\widehat{VUS} = \frac{\sum_{i=1}^n\sum_{j=1}^n\sum_{k=1}^n I(T_i<T_j<T_k)\,I(D_i<D_j<D_k)\,V_iV_jV_k}{\sum_{i=1}^n\sum_{j=1}^n\sum_{k=1}^n I(D_i<D_j<D_k)\,V_iV_jV_k}
\]
This naïve estimator is unbiased only if subjects are selected for disease verification completely at random; otherwise it is biased. Few existing methods correct verification bias in three-class ROC analysis.

3.4 Three-Class ROC Analysis under Verification Bias

Thus far, only one publication has addressed the verification-bias problem in ROC surface analysis. A nonparametric likelihood-based approach was developed by Chi and Zhou to construct an empirical bias-corrected ROC surface for an ordinal diagnostic test [43], with results ranging from 1 to $M$. For any pair of ordered decision thresholds $(d_1,d_2)$, where $0\le d_1<d_2\le M$, the three correct classification rates, for $k=1,2,3$, are
\[
P(\hat D=k\mid D=k) = P(d_{k-1}<T\le d_k\mid D=k) =
\begin{cases}
0 & \text{if } \underline d_{k-1} > \bar d_k\\[2pt]
\sum_{i=\underline d_{k-1}}^{\bar d_k} w_{ik} & \text{if } \underline d_{k-1} \le \bar d_k
\end{cases}
\]
where $w_{ik}=P(T=i\mid D=k)$, $\underline d_{k-1}$ is the smallest integer greater than $d_{k-1}$, and $\bar d_k$ is the largest integer less than or equal to $d_k$. The empirical ROC surface across all possible pairs of cutoffs $(d_1,d_2)$ can then be constructed. Because ties occur among the observed results of an ordinal test, the VUS is expressed as
\[
VUS = P(T_1<T_2<T_3) + \tfrac12\{P(T_1<T_2=T_3)+P(T_1=T_2<T_3)\} + \tfrac16 P(T_1=T_2=T_3)
\]
where $T_k$ denotes a test result from the $k$-th disease class, $k=1,2,3$. Accordingly, the nonparametric estimate of the VUS is
\[
\widehat{VUS} = \sum_{i=1}^{M-2}\sum_{j=i+1}^{M-1}\sum_{k=j+1}^{M} w_{i1}w_{j2}w_{k3}
+ \frac12\Big\{\sum_{i=1}^{M-1}\sum_{j=i+1}^{M}\big(w_{i1}w_{j2}w_{j3}+w_{i1}w_{i2}w_{j3}\big)\Big\}
+ \frac16\Big\{\sum_{i=1}^{M} w_{i1}w_{i2}w_{i3}\Big\} \tag{3.2}
\]
where, by Bayes' rule, $w_{ik}$ can be calculated explicitly as
\[
w_{ik} = P(T=i\mid D=k) = \frac{\tau_i\phi_{ki}}{\sum_{j=1}^M \tau_j\phi_{kj}} \tag{3.3}
\]
with $\tau_i=P(T=i)$ and $\phi_{ki}=P(D=k\mid T=i)$. Chi et al. also proposed maximum likelihood (ML) estimates of the parameters $\tau$ and $\phi$ under the MAR assumption: the likelihood, the product $P(T)P(D\mid T)P(V\mid D,T)$, is proportional to $P(T)P(D\mid T,V=1)$ under MAR. Substituting the ML estimates of $\tau$ and $\phi$ into (3.3) yields ML estimates of $w$, and replacing the unknown $w$ in (3.2) with these estimates gives the ML estimator of the VUS:
\[
\widehat{VUS} = \frac{\sum_{i=1}^{M-2}\sum_{j=i+1}^{M-1}\sum_{k=j+1}^{M}\hat\tau_i\hat\tau_j\hat\tau_k\,\hat\phi_{1i}\hat\phi_{2j}\hat\phi_{3k}}{\sum_{i=1}^M\hat\tau_i\hat\phi_{1i}\,\sum_{i=1}^M\hat\tau_i\hat\phi_{2i}\,\sum_{i=1}^M\hat\tau_i\hat\phi_{3i}}
+ \frac{\sum_{i=1}^{M-1}\sum_{j=i+1}^{M}\big(\hat\tau_i\hat\tau_j^2\,\hat\phi_{1i}\hat\phi_{2j}\hat\phi_{3j}+\hat\tau_i^2\hat\tau_j\,\hat\phi_{1i}\hat\phi_{2i}\hat\phi_{3j}\big)}{2\sum_{i=1}^M\hat\tau_i\hat\phi_{1i}\,\sum_{i=1}^M\hat\tau_i\hat\phi_{2i}\,\sum_{i=1}^M\hat\tau_i\hat\phi_{3i}}
+ \frac{\sum_{i=1}^M\hat\tau_i^3\,\hat\phi_{1i}\hat\phi_{2i}\hat\phi_{3i}}{6\sum_{i=1}^M\hat\tau_i\hat\phi_{1i}\,\sum_{i=1}^M\hat\tau_i\hat\phi_{2i}\,\sum_{i=1}^M\hat\tau_i\hat\phi_{3i}}
\]
The Delta method [44] or the jackknife method [45] can be used to estimate the variance of the VUS.
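The following minimal Python sketch evaluates (3.2)-(3.3) from (estimated) probabilities $\tau$ and $\phi$; the array names are ours. Plugging in the ML estimates described above yields the Chi-Zhou ML estimator of the VUS.

```python
import numpy as np

def vus_ordinal(tau, phi):
    """VUS of an ordinal test with ties, per (3.2)-(3.3).

    tau : length-M array, tau[i] = P(T = i+1)
    phi : 3 x M array, phi[k, i] = P(D = k+1 | T = i+1)
    """
    w = tau * phi                          # Bayes' rule numerator of (3.3)
    w = w / w.sum(axis=1, keepdims=True)   # w[k, i] = P(T = i+1 | D = k+1)
    w1, w2, w3 = w
    M = len(tau)
    strict = sum(w1[i] * w2[j] * w3[k]
                 for i in range(M - 2)
                 for j in range(i + 1, M - 1)
                 for k in range(j + 1, M))
    pair_tie = sum(w1[i] * w2[j] * w3[j] + w1[i] * w2[i] * w3[j]
                   for i in range(M - 1) for j in range(i + 1, M))
    triple_tie = np.sum(w1 * w2 * w3)
    return strict + pair_tie / 2 + triple_tie / 6
```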
In light of the work of Chi et al. [43], a number of issues remain open for further investigation. In particular, it is imperative to develop proper methods for evaluating the accuracy of a continuous diagnostic test while accounting for verification bias. As illustrated in the previous chapter, methods have been proposed in the literature to construct bias-corrected accuracy estimates for a continuous test in the two-class setting, such as the IPW and DR estimators of the AUC. Extending these methods to three-class problems will be greatly helpful for accuracy assessment when disease severity is of interest and correction is needed for verification-biased sampling.

3.5 Summary

In this chapter we reviewed existing methodologies for measuring accuracy in three-class classification tasks. The only existing approach for verification-bias correction in three-way ROC analysis is applicable solely to ordinal tests. Therefore, in the next chapter, we propose bias-correction methods for continuous tests in three-class disease classification problems.

Figure 3.1: Example of ROC Surface

Chapter 4 Estimation of VUS in the Presence of Ignorable Verification Bias

ROC curve analysis has been a very important and useful tool for assessing the accuracy of a diagnostic test when the true disease status is binary (for example, diseased or non-diseased). Three-class classification problems have motivated many researchers to generalize ROC analysis, and the ROC surface is a natural extension of the ROC curve for three-class diagnostic tasks [34]. The VUS is the most widely used summary of the ability of a continuous test to discriminate among three disease states: it is the probability that three subjects, randomly selected from the three ordinal disease statuses, have test results ordered in agreement with their disease statuses. In the ideal setting, every subject receives the GST, every true disease status is verified, and the VUS can be estimated non-parametrically with a Mann-Whitney U-statistic [36]. Developing verification-bias correction approaches for continuous tests is compelling because many tests are measured on a continuous scale, yet no existing method estimates a bias-corrected VUS for continuous tests. In this chapter, we extend existing bias-correction approaches for two-class classification problems and propose new estimators of the VUS in the presence of verification bias; asymptotic properties of the estimators are also investigated. A brief review of the relevant U-statistic theory behind our estimators is given in Section 4.1, after which the new estimators are introduced and closed-form variance estimators are given.

4.1 A Brief Review of U-statistics

Developed by W. Hoeffding (1948), U-statistics have been widely applied in many estimation and testing problems [46]. Below we introduce some properties related to the consistency and asymptotic distribution of U-statistics; detailed expositions of U-statistic theory may be found in A. J. Lee (1990) and Koroljuk and Borovskich (1994) [47, 26].

Let $\zeta$ be a family of probability measures on an arbitrary measurable space, subject only to mild restrictions such as continuity or the existence of moments. Let $X$ be a random variable or vector distributed according to $F\in\zeta$ and let $\theta(F)$ denote a real-valued estimable parameter within $\zeta$. Suppose there exists a real-valued measurable function $h(x_1,\dots,x_m)$ such that
\[
E_F\,h(X_1,\dots,X_m) = \theta(F) \quad \text{for all } F\in\zeta,
\]
where $X_1,\dots,X_m$ are independent and identically distributed (i.i.d.) with distribution $F$. For a sample of size $n>m$ from $F$, a consistent estimator of $\theta(F)$ is
\[
U_n = U_n(h) = \frac{(n-m)!}{n!}\sum_{P_{m,n}} h(X_{i_1},\dots,X_{i_m})
\]
where the summation is over the set $P_{m,n}$ of all permutations $(i_1,\dots,i_m)$ of size $m$ chosen from the $n$ indices $(1,2,\dots,n)$. Note that, without loss of generality, $h$ may be assumed symmetric in its arguments. It is straightforward to show that this U-statistic, $U_n$, is a consistent estimator of $\theta(F)$.
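For instance, the Gini mean difference $\theta(F)=E|X_1-X_2|$ is estimable with the symmetric degree-2 kernel $h(x_1,x_2)=|x_1-x_2|$. The following minimal Python sketch (our own naming) computes a one-sample U-statistic by averaging a symmetric kernel over all size-$m$ combinations, which equals the permutation average above when $h$ is symmetric.

```python
import itertools
import numpy as np

def u_statistic(x, h, m):
    """One-sample U-statistic with symmetric kernel h of degree m."""
    return float(np.mean([h(*c) for c in itertools.combinations(x, m)]))

# example: Gini mean difference, theta(F) = E|X1 - X2|
rng = np.random.default_rng(0)
x = rng.normal(size=200)
gmd = u_statistic(x, lambda a, b: abs(a - b), m=2)
```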
To derive the asymptotic distribution of $U_n$, we need some notation. For $c=0,1,\dots,m$, let
\[
h_c(x_1,\dots,x_c) = E\,h(x_1,\dots,x_c,X_{c+1},\dots,X_m)
\]
where $X_{c+1},\dots,X_m$ are i.i.d. with distribution $F$; these functions all have the same expectation, $\theta(F)$. Let
\[
\sigma_c^2 = \mathrm{Cov}\big(h(X_{i_1},\dots,X_{i_m}),\, h(X_{j_1},\dots,X_{j_m})\big)
\]
where $(i_1,\dots,i_m)$ and $(j_1,\dots,j_m)$ are size-$m$ permutations drawn from the $n$ indices $(1,\dots,n)$ and $c$ is the number of indices common to $(i_1,\dots,i_m)$ and $(j_1,\dots,j_m)$, $0\le c\le m$.

Theorem 4.1. For $F\in\zeta$,
\[
\mathrm{Var}(U_n) = \binom{n}{m}^{-1}\sum_{c=1}^{m}\binom{m}{c}\binom{n-m}{m-c}\sigma_c^2.
\]
If $\sigma_1^2<\infty$, then $\sqrt n\,(U_n-\theta)\to_d N(0,\, m^2\sigma_1^2)$.

4.2 IPW Estimator of AUC

In a study where every subject's true disease status is verified, the empirical AUC is the two-sample Wilcoxon statistic, an example of a two-sample (diseased and non-diseased populations) U-statistic. The obstacle to constructing the Wilcoxon statistic for the AUC in the presence of verification bias is that the true disease status of some subjects is missing, which prevents dividing the sample into the diseased and non-diseased subsamples. As mentioned before, constructing the Wilcoxon statistic only from subjects with verified disease status yields the complete-case (CC) naïve method, which suffers from verification bias. With properly developed bias-correction methods, the accuracy of a new test can still be estimated correctly. The IPW approach is one such methodology: each observation in the verification group is weighted by the inverse of its sampling fraction, i.e., the probability of receiving disease verification. Using the IPW approach, He et al. proposed a one-sample U-statistic estimator under the MAR assumption [24]; their method was reviewed in Section 2.8. We now propose an IPW estimator of the VUS by extending their methodology to the three-class classification setting.

4.3 IPW Estimator of VUS

We continue to use the data and notation defined in Section 3.1 to develop a bias-corrected estimator of the VUS. The observed data comprise $n$ i.i.d. copies $S_i=(V_i,D_i,T_i,A_i)$ of a random vector, where $(T_i,V_i,A_i)$ is observed for every subject but $D_i$ is observed only if $V_i=1$. We again make the MAR assumption, i.e., $V\perp D\mid(T,A)$ (ignorable verification). Let $\lambda_0$, $\lambda_1$, $\lambda_2$ be the prevalences of the three disease subgroups and let $\pi_i=\Pr(V_i=1\mid T_i,A_i)$ be the verification probability. The new estimator is built on the following observation:
\[
\begin{aligned}
&E\{\pi_i^{-1}\pi_j^{-1}\pi_k^{-1}V_iV_jV_k\, I(T_i>T_j>T_k)\,I(D_i>D_j>D_k)\}\\
&\quad= E\big\{E[\pi_i^{-1}\pi_j^{-1}\pi_k^{-1}V_iV_jV_k\, I(T_i>T_j>T_k)\,I(D_i>D_j>D_k)\mid T_i,T_j,T_k,A_i,A_j,A_k]\big\}\\
&\quad= E\big\{I(T_i>T_j>T_k)\,E[I(D_i>D_j>D_k)\mid T_i,T_j,T_k,A_i,A_j,A_k]\,E[\pi_i^{-1}V_i\mid T_i,A_i]\,E[\pi_j^{-1}V_j\mid T_j,A_j]\,E[\pi_k^{-1}V_k\mid T_k,A_k]\big\}\\
&\quad= E\{I(T_i>T_j>T_k)\,I(D_i>D_j>D_k)\}\\
&\quad= P(T_i>T_j>T_k\mid D_i>D_j>D_k)\,P(D_i>D_j>D_k)\\
&\quad= VUS\cdot\Pr(D_i=2)\Pr(D_j=1)\Pr(D_k=0) = VUS\cdot\lambda_0\lambda_1\lambda_2,
\end{aligned}
\]
using the MAR assumption to factor $V$ and $D$ given $(T,A)$ and the fact that $E[\pi^{-1}V\mid T,A]=1$. By a similar argument,
\[
E\{\pi_i^{-1}\pi_j^{-1}\pi_k^{-1}V_iV_jV_k\, I(D_i>D_j>D_k)\} = \Pr(D_i=2)\Pr(D_j=1)\Pr(D_k=0) = \lambda_0\lambda_1\lambda_2.
\]
When $D_i$ is not verified (unobserved), $V_i=0$, so the $i$-th subject contributes nothing to $\pi_i^{-1}\pi_j^{-1}\pi_k^{-1}V_iV_jV_k\, I(T_i>T_j>T_k)\,I(D_i>D_j>D_k)$. Thus this quantity is computable for every subject, even when the true disease status is unknown.
We propose direct estimation of the VUS in the presence of verification bias as
\[
\widehat{VUS}_{IPW} = \frac{\sum_{i=1}^n\sum_{j=1}^n\sum_{k=1}^n \pi_i^{-1}\pi_j^{-1}\pi_k^{-1}V_iV_jV_k\, I(T_i>T_j>T_k)\,I(D_i>D_j>D_k)}{\sum_{i=1}^n\sum_{j=1}^n\sum_{k=1}^n \pi_i^{-1}\pi_j^{-1}\pi_k^{-1}V_iV_jV_k\, I(D_i>D_j>D_k)} \tag{4.1}
\]
To express it as a function of U-statistics, we rewrite (4.1) in symmetric form:
\[
\widehat{VUS}_{IPW} = \frac{\sum_{i=1}^n\sum_{j=1}^n\sum_{k=1}^n \phi(S_i,S_j,S_k)}{\sum_{i=1}^n\sum_{j=1}^n\sum_{k=1}^n \psi(S_i,S_j,S_k)} \tag{4.2}
\]
where
\[
\begin{aligned}
\phi(S_i,S_j,S_k) &= \tfrac16\,\pi_i^{-1}\pi_j^{-1}\pi_k^{-1}V_iV_jV_k\, I[(T_i-T_j)(D_i-D_j)>0]\; I[(T_k-T_j)(D_k-D_j)>0]\; I[(T_k-T_i)(D_k-D_i)>0]\\
\psi(S_i,S_j,S_k) &= \tfrac16\,\pi_i^{-1}\pi_j^{-1}\pi_k^{-1}V_iV_jV_k\, [I(D_i>D_j)+I(D_i<D_j)]\,[I(D_k>D_j)+I(D_j<D_k)]\,[I(D_i>D_k)+I(D_k<D_i)]
\end{aligned} \tag{4.3}
\]
Next we prove that the new estimator $\widehat{VUS}_{IPW}$ is consistent.

Theorem 4.2. The estimator $\widehat{VUS}_{IPW}$ is consistent.

Proof. We already know that
\[
E[\phi(S_i,S_j,S_k)] = VUS\cdot\lambda_0\lambda_1\lambda_2, \qquad E[\psi(S_i,S_j,S_k)] = \lambda_0\lambda_1\lambda_2.
\]
Therefore, by the law of large numbers for U-statistics,
\[
\frac{\sum_{i\ne j\ne k}\phi(S_i,S_j,S_k)}{n(n-1)(n-2)} \to_p VUS\cdot\lambda_0\lambda_1\lambda_2, \qquad
\frac{\sum_{i\ne j\ne k}\psi(S_i,S_j,S_k)}{n(n-1)(n-2)} \to_p \lambda_0\lambda_1\lambda_2 \tag{4.4}
\]
Hence,
\[
\widehat{VUS}_{IPW} = \frac{\sum_{i\ne j\ne k}\phi(S_i,S_j,S_k)}{\sum_{i\ne j\ne k}\psi(S_i,S_j,S_k)} \to_p VUS.
\]
Applying the asymptotic theory of U-statistics, we can also obtain the asymptotic distribution of $\widehat{VUS}_{IPW}$; Chapter 6 discusses the asymptotic properties of the IPW estimator in detail.
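For concreteness, a minimal Python sketch of (4.1) follows; the names are ours. It groups the verified subjects by disease class, carries the weight $\pi^{-1}$ with each subject, and accumulates the weighted concordant triples; unverified disease entries may hold any placeholder value.

```python
import numpy as np

def ipw_vus(T, D, V, pi):
    """IPW estimator of the VUS (4.1) under the MAR assumption."""
    w = V / pi
    t0, w0 = T[(V == 1) & (D == 0)], w[(V == 1) & (D == 0)]
    t1, w1 = T[(V == 1) & (D == 1)], w[(V == 1) & (D == 1)]
    t2, w2 = T[(V == 1) & (D == 2)], w[(V == 1) & (D == 2)]
    # numerator: per mild subject, weight * (weighted #T0 below) * (weighted #T2 above)
    num = sum(u * np.sum(w0[t0 < t]) * np.sum(w2[t2 > t]) for t, u in zip(t1, w1))
    den = w0.sum() * w1.sum() * w2.sum()
    return num / den
```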
4.4 Summary

Under the ignorable verification-bias assumption, we proposed a new bias-corrected estimator of the VUS for a continuous test and a three-way classification task. Like the naïve CC estimator, estimation is based on verified subjects with observed true disease status, but the new estimator weights the observed disease statuses by the inverse probability of receiving disease verification. The approach requires a parametric model for verification status. In the following chapter, we discuss bias correction when the MAR assumption is violated, i.e., under non-ignorable missingness: we extend the proposed IPW estimator to that setting and introduce a new estimator with the doubly robust property of requiring correctness of only one of the parametric models, for disease status or for verification status.

Chapter 5 Estimation of VUS in the Presence of Non-ignorable Missingness

The method proposed in the previous chapter relies on the assumption that disease verification is independent of true disease status after controlling for test results and related auxiliary information. In practice, however, the selection procedure for disease verification may be based on detailed clinical information that is not entirely captured by the test results and the measured auxiliary covariates. The MAR assumption then no longer holds, since disease verification may still be related to unobserved covariates that depend on disease even after adjustment for test results and observed covariates; this is often referred to as non-ignorable missingness.

A few papers discuss correcting the AUC for non-ignorable missingness. Recently, Rotnitzky et al. developed a doubly robust (DR) estimator of the AUC under a non-ignorable verification model, arbitrarily specifying a value for the non-ignorable parameter and conducting sensitivity analyses [31]. Liu et al. considered the challenge of specifying the non-ignorable parameter and estimated it by maximum likelihood under further assumptions on the disease model [33]. In this chapter, we extend their work to the situation where three ordinal disease classes are to be determined. In the following sections, estimation of the VUS with and without an arbitrary non-ignorable parameter is presented, followed by a qualitative comparison. We continue to use the data and notation defined in Section 4.3.

5.1 Model Assumptions for Identification

As discussed in Chapter 3, without missing disease statuses we can derive a nonparametric maximum likelihood estimator (MLE) of the VUS,
\[
\widetilde{VUS} = \frac{\sum_{i=1}^n\sum_{j=1}^n\sum_{k=1}^n I(T_i>T_j>T_k)\,I(D_i>D_j>D_k)}{\sum_{i=1}^n\sum_{j=1}^n\sum_{k=1}^n I(D_i>D_j>D_k)}
\]
which is essentially a three-sample Wilcoxon statistic. Unobserved disease statuses prevent us from constructing this statistic on the whole population, and the naïve CC estimator built from verified subjects suffers verification bias unless verification is decided completely at random. Thus, we must impose assumptions in order to identify the VUS and the ROC surface. Extending the assumption made by Rotnitzky et al. to identify the ROC curve [31], we assume
\[
\log\frac{P(V=0\mid D,T,A)}{P(V=1\mid D,T,A)} = h(T,A) + q(T,A)\,D \tag{5.1}
\]
where $q(T,A)$ is an arbitrary specified function and $h(T,A)$ is an arbitrary unknown function. By Bayes' rule, (5.1) holds for some $h(T,A)$ if and only if
\[
P(D=d\mid V=0,T,A) = \frac{P(D=d\mid V=1,T,A)\,\exp\{q(T,A)\,d\}}{E[\exp\{q(T,A)D\}\mid V=1,T,A]} \tag{5.2}
\]
where
\[
\begin{aligned}
E[\exp\{q(T,A)D\}\mid V=1,T,A] &= P(D=0\mid V=1,T,A) + \exp\{q(T,A)\}\,P(D=1\mid V=1,T,A) + \exp\{2q(T,A)\}\,P(D=2\mid V=1,T,A)\\
&= 1 - P(D=1\mid V=1,T,A)\,\big(1-\exp\{q(T,A)\}\big) - P(D=2\mid V=1,T,A)\,\big(1-\exp\{2q(T,A)\}\big)
\end{aligned} \tag{5.3}
\]
We define model $\mathcal A(q)$ as the model making assumption (5.2) with some arbitrary specified $q(T,A)$ and unknown $h(T,A)$. One main assumption required in model $\mathcal A(q)$ is that
\[
P\big(P(V=1\mid D,T,A)\ge\sigma\big) = 1 \tag{5.4}
\]
for some $\sigma>0$; that is, regardless of the values of $T$ and $A$, the probability of having disease status verified is always positive. In some settings it is plausible that subjects with certain test results and/or clinical characteristics would never receive the GST, perhaps because there is strong evidence of the presence or absence of disease. For example, invasive coronary angiography has been the reference GST for assessing coronary artery disease; it is not warranted to refer subjects with a negative diagnostic test result to this GST, so a research study would usually consider only subjects with a positive diagnostic test result for disease verification. Pepe and Alonzo (2001) showed that even with zero verification probability it is still possible to estimate ratios of sensitivities and specificities [48]; unfortunately, the VUS cannot be estimated in this situation. We can, however, specify an extra function representing the probability of disease for subjects with zero probability of receiving disease verification. Rotnitzky et al. discussed this strategy in detail, and it is one of the research areas we plan to explore (Section 10.12) [31].
The function $q(T,A)$ measures the degree of residual association between $D$ and $V$ after controlling for the observed $T$ and $A$. For example, the choice $q(T,A)=0$ indicates that, after adjustment for $T$ and $A$, $V$ is independent of $D$ (i.e., $V\perp D\mid(T,A)$), which corresponds to the assumption of ignorable verification. In practice, the decision for disease verification may be based not only on $T$ and $A$ but also on additional unobserved covariates correlated with the true disease status; in other words, $q(T,A)\ne 0$ when there is non-ignorable missingness. For example, the choice $q(T,A)=-1$ means that, at a given level of $T$ and $A$, the odds of having disease status verified are about 2.7 times higher for subjects with mid-level disease (i.e., $D=1$) than for subjects without disease. One could also choose $q(T,A)=\alpha_0+\alpha_1 T$ to allow the odds of disease verification to be modified by the value of the new test result, monotonically in $T$.

Scharfstein et al. showed that the choice of $q(T,A)$ imposes no restriction on the distribution of the observed data $S_i=(V_i,D_i,T_i,A_i)$ [49]; the choice is therefore untestable, because any $q(T,A)$ can fit the observed data perfectly. Robins et al. suggested repeating the estimation under different choices of the selection-bias function $q(T,A)$ as a series of sensitivity analyses [50]. It is recommended to choose a series of simple functions with one or two parameters; for example, setting $q(T,A)$ equal to a constant $\beta$ encodes the assumption that the odds of disease verification for medium-level diseased versus non-diseased subjects are constant across all values of $T$ and $A$.

Under assumption (5.2), we can derive the conditional probability of each disease status given the verification status, the new diagnostic test result, and the observed auxiliary data for the $i$-th subject, $i=1,\dots,n$:
\[
E[I(D=d)\mid V,T,A] = E[I(D=d)\mid V=1,T,A]\left[V + (1-V)\,\frac{\exp\{q(T,A)\,d\}}{E[\exp\{q(T,A)D\}\mid V=1,T,A]}\right] \tag{5.5}
\]
Also by assumption (5.2), we can express the conditional probability of receiving disease verification given the new diagnostic test result, the disease status and the measured covariates:
\[
\pi_i = \big(\exp[h(T_i,A_i)+q(T_i,A_i)D_i]+1\big)^{-1} \tag{5.6}
\]

Theorem 5.1. Under model $\mathcal A(q)$, $h(T,A)$ is identified as
\[
h(T,A) = \log\frac{P(V=0\mid T,A)}{P(V=1\mid T,A)} - \log E[\exp\{q(T,A)D\}\mid V=1,T,A] \tag{5.7}
\]
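A minimal Python sketch of the tilting identity (5.2)-(5.3) follows (our own naming): given fitted disease probabilities among verified subjects and a specified $q(T,A)$, it returns the implied disease probabilities among unverified subjects.

```python
import numpy as np

def disease_probs_unverified(p_ver, q):
    """P(D = d | V = 0, T, A) from P(D = d | V = 1, T, A) via (5.2).

    p_ver : (n, 3) array; row i holds P(D = d | V = 1, T_i, A_i), d = 0, 1, 2
    q     : length-n array of the specified q(T_i, A_i)
    """
    tilt = p_ver * np.exp(np.outer(q, np.arange(3)))  # multiply by exp{q * d}
    return tilt / tilt.sum(axis=1, keepdims=True)     # denominator is (5.3)
```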
Although model $\mathcal A(q)$ is sufficient for identification of the VUS, it is not sufficient for estimation, owing to the curse of dimensionality: when both $T$ and $A$ are continuous, $h(T,A)$ and $E[I(D=d)\mid V=1,T,A]$ cannot be estimated well at practical sample sizes, even under smoothness conditions on these conditional expectations. Two dimension-reduction modeling strategies are considered for estimation.

The first option is to assume that the unknown function $h(T,A)$ follows a parametric model
\[
h(T,A) = h(T,A;\gamma) \tag{5.8}
\]
where $h(T,A;\gamma)$ is a known function, smooth in $\gamma$, and $\gamma$ is an unknown $h\times1$ parameter vector. The second strategy is to assume that $P(D=d\mid V=1,T,A)$ follows a parametric model, namely
\[
\log\frac{P(D=1\mid V=1,T,A)}{P(D=0\mid V=1,T,A)} = m(T,A;\beta_1), \qquad
\log\frac{P(D=2\mid V=1,T,A)}{P(D=0\mid V=1,T,A)} = m(T,A;\beta_2) \tag{5.9}
\]
where $m(T,A;\beta_1)$ and $m(T,A;\beta_2)$ are smooth in $\beta_1$ and $\beta_2$, which are unknown $m\times1$ parameter vectors.

Two models can be built from these assumptions: model $\mathcal B(q)$ is defined by (5.8) with some specified $q(T,A)$ together with assumption (5.1), and model $\mathcal C(q)$ is defined by (5.9) with some specified $q(T,A)$ together with assumption (5.2).

5.2 IPW Estimation under Non-ignorable Verification Bias

Model $\mathcal B(q)$ essentially models the selection procedure for disease verification, through which we can estimate the verification probability $\pi$ via (5.6). In Chapter 4 we introduced the IPW estimator of the VUS under the MAR assumption; here we extend it to the non-ignorable missingness setting. Taking conditional expectations given $(T,A,D)$, we have the analogous identity
\[
\begin{aligned}
&E\{\pi_i^{-1}\pi_j^{-1}\pi_k^{-1}V_iV_jV_k\, I(T_i>T_j>T_k)\,I(D_i>D_j>D_k)\}\\
&\quad= E\big\{I(T_i>T_j>T_k)\,I(D_i>D_j>D_k)\,E[\pi_i^{-1}V_i\mid T_i,A_i,D_i]\,E[\pi_j^{-1}V_j\mid T_j,A_j,D_j]\,E[\pi_k^{-1}V_k\mid T_k,A_k,D_k]\big\}\\
&\quad= VUS\cdot\Pr(D_i=2)\Pr(D_j=1)\Pr(D_k=0) = VUS\cdot\lambda_0\lambda_1\lambda_2,
\end{aligned}
\]
where now $\pi_i=P(V_i=1\mid T_i,A_i,D_i)$. It is then straightforward to obtain the IPW estimator accounting for non-ignorable bias,
\[
\widehat{VUS}_{IPW} = \frac{\sum_{i=1}^n\sum_{j=1}^n\sum_{k=1}^n \pi_i^{-1}\pi_j^{-1}\pi_k^{-1}V_iV_jV_k\, I(T_i>T_j>T_k)\,I(D_i>D_j>D_k)}{\sum_{i=1}^n\sum_{j=1}^n\sum_{k=1}^n \pi_i^{-1}\pi_j^{-1}\pi_k^{-1}V_iV_jV_k\, I(D_i>D_j>D_k)} \tag{5.10}
\]
where $\pi$ can be estimated under model $\mathcal B(q)$ with user-specified $q(T,A)$.

To estimate $\gamma$, note that assumption (5.1) imposes a logistic regression model on the selection probabilities for disease verification with offset $q(T,A)D$. Because the offset is unobserved when $V=0$, standard logistic regression cannot be used to find $\hat\gamma$. Rotnitzky et al. proposed solving the following estimating equation for $\hat\gamma$ [31]:
\[
\sum_{i=1}^n \big[V_i\,\pi(T_i,A_i;\gamma)^{-1} - 1\big]\, d(T_i,A_i) = 0 \tag{5.11}
\]
where $V_i\,\pi(T_i,A_i;\gamma)^{-1}$ is computable because $D_i$ is observed whenever $V_i=1$. Applying the theoretical results of Rotnitzky and Robins (1997) [51], it has been proved that if we let $d(T,A)$ be
\[
d_{opt}(T,A) = \frac{\partial h(T,A;\gamma)}{\partial\gamma}\bigg|_{\gamma=\gamma^*}\;
\frac{E[\exp\{q(T,A)D\}\mid V=1,T,A]}{E[\exp\{q(T,A)D\}\,\pi^{-1}(T,A;\gamma^*)\mid V=1,T,A]} \tag{5.12}
\]
then the solution $\hat\gamma_{opt}$ of (5.11) is a locally semiparametric efficient estimator of $\gamma$ under model $\mathcal B(q)$.
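A minimal Python sketch of solving (5.11) follows, assuming $h(T,A;\gamma)=\gamma_0+\gamma_1 T+\gamma_2 A$, a constant $q(T,A)=\alpha$, and the simple (not necessarily optimal) choice $d(T,A)=(1,T,A)^T$ rather than $d_{opt}$; the names are ours.

```python
import numpy as np
from scipy.optimize import root

def fit_gamma(T, A, D, V, alpha, gamma0=np.zeros(3)):
    """Solve the estimating equation (5.11) for gamma."""
    X = np.column_stack([np.ones_like(T), T, A])

    def ee(gamma):
        # pi^{-1} = 1 + exp{h + alpha * D}; computable whenever V = 1
        inv_pi = 1.0 + np.exp(X @ gamma + alpha * np.where(V == 1, D, 0.0))
        resid = V * inv_pi - 1.0   # has conditional mean zero given (T, A, D)
        return X.T @ resid

    return root(ee, gamma0).x
```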
5.3 DR Estimation

Estimation under model $\mathcal B(q)$ or $\mathcal C(q)$ alone is not completely satisfactory, since the resulting estimators can be biased if the corresponding model, (5.8) or (5.9), is incorrect. We are therefore motivated to develop an intermediate estimation strategy that gives the researcher two chances to obtain correct inference. Following the theoretical results on DR estimation of the AUC proposed by Rotnitzky et al. [31], we define
\[
\Delta^{DR}(D_i=d) = E[I(D_i=d)\mid V_i,T_i,A_i] + U_i(\gamma,\beta_1,\beta_2) \tag{5.13}
\]
where
\[
U_i(\gamma,\beta_1,\beta_2) = V_i\Big([I(D_i=d)-P(D_i=d\mid V_i=1,T_i,A_i)] + \exp[h(T_i,A_i)+q(T_i,A_i)D_i]\,[I(D_i=d)-P(D_i=d\mid V_i=0,T_i,A_i)]\Big) \tag{5.14}
\]
$\Delta^{DR}(D=d)$, $d=0,1,2$, is computable because it is based only on observed variables. We can show that this function has the following properties:
\[
E[\Delta^{DR}(D_i=d;\gamma^*,\beta_1,\beta_2)\mid T_i,A_i] = E[I(D_i=d)\mid T_i,A_i] \quad\text{for any } \beta_1,\beta_2 \text{ if model (5.8) holds} \tag{5.15}
\]
\[
E[\Delta^{DR}(D_i=d;\gamma,\beta_1^*,\beta_2^*)\mid T_i,A_i] = E[I(D_i=d)\mid T_i,A_i] \quad\text{for any } \gamma \text{ if model (5.9) holds} \tag{5.16}
\]
To show (5.15), first note that
\[
E[U_i(\gamma,\beta_1,\beta_2)\mid T_i,A_i] = \frac{P(V=0\mid D=d,T,A)}{P(V=1\mid D=d,T,A)}\,P(D=d\mid V=1,T,A) - \frac{P(V=0\mid T,A)}{P(V=1\mid T,A)}\,P(D=d\mid V=0,T,A) = 0.
\]
To show (5.16), note that $\Delta^{DR}(D_i=d)$ can also be written as
\[
\tilde\Delta_i^{DR}(d) = V_i\pi_i^{-1} I(D_i=d) - (V_i\pi_i^{-1}-1)\,P(D_i=d\mid V_i=0,T_i,A_i).
\]
Noticing that $E[V_i\pi_i^{-1}\mid D_i,T_i,A_i]=1$, we conclude that this implies $E[\Delta^{DR}(D_i=d)\mid T_i,A_i]=E[I(D_i=d)\mid T_i,A_i]$.

Based on the above observations, we can construct a new doubly robust (DR) estimator of the VUS,
\[
\widehat{VUS}_{DR} = \frac{\sum_{i=1}^n\sum_{j=1}^n\sum_{k=1}^n \Delta^{DR}(D_i=2;\hat\theta)\,\Delta^{DR}(D_j=1;\hat\theta)\,\Delta^{DR}(D_k=0;\hat\theta)\,I(T_i>T_j>T_k)}{\sum_{i=1}^n\sum_{j=1}^n\sum_{k=1}^n \Delta^{DR}(D_i=2;\hat\theta)\,\Delta^{DR}(D_j=1;\hat\theta)\,\Delta^{DR}(D_k=0;\hat\theta)} \tag{5.17}
\]
where $\hat\theta=\{\hat\gamma,\hat\beta_1,\hat\beta_2\}$ and $\hat\gamma$, $\hat\beta_1$, $\hat\beta_2$ are consistent estimators of $\gamma$, $\beta_1$, $\beta_2$ under model (5.8) or (5.9). Compared with the IPW estimator (5.10), the DR estimator protects against misspecification of either (5.8) or (5.9), but not both: as long as model $\mathcal A(q)$, condition (5.4), and one of models (5.8) or (5.9) hold, the DR estimator is consistent for the VUS. $\hat\beta_1$ and $\hat\beta_2$ can be obtained by solving
\[
\sum_{i=1}^n H_i(\beta_1,\beta_2) = 0 \tag{5.18}
\]
with
\[
H(\beta_1) = V\,\frac{\partial m(T,A;\beta_1)}{\partial\beta_1}\Big[I(D=1) - \big(1+\exp[-m(T,A;\beta_1)]\big)^{-1}\Big], \qquad
H(\beta_2) = V\,\frac{\partial m(T,A;\beta_2)}{\partial\beta_2}\Big[I(D=2) - \big(1+\exp[-m(T,A;\beta_2)]\big)^{-1}\Big] \tag{5.19}
\]
Then the bias-corrected $\widehat{VUS}$ can be computed accordingly.
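A minimal Python sketch of the DR construction follows (our own naming), using the augmented form of $\Delta^{DR}$ shown above: the verified-subject disease probabilities, the fitted $h$ and the specified $q$ determine $\pi^{-1}$ via (5.6) and the unverified-subject probabilities via (5.2), and the $\Delta$'s are then plugged into (5.17).

```python
import numpy as np

def dr_vus(T, D, V, p1_ver, p2_ver, h, q):
    """DR estimator of the VUS (5.17).

    p1_ver, p2_ver : fitted P(D = 1 | V = 1, T, A), P(D = 2 | V = 1, T, A)
    h, q           : per-subject h(T_i, A_i; gamma) and specified q(T_i, A_i)
    """
    p_ver = np.column_stack([1 - p1_ver - p2_ver, p1_ver, p2_ver])
    tilt = p_ver * np.exp(np.outer(q, np.arange(3)))
    p_unver = tilt / tilt.sum(axis=1, keepdims=True)     # (5.2)
    Dv = np.where(V == 1, D, 0).astype(int)              # placeholder when V == 0
    inv_pi = 1.0 + np.exp(h + q * Dv)                    # pi^{-1} from (5.6)
    delta = np.empty((len(T), 3))
    for d in range(3):
        ind = (Dv == d) & (V == 1)
        # Delta = V*pi^{-1}*I(D=d) - (V*pi^{-1} - 1)*P(D=d | V=0, T, A)
        delta[:, d] = V * inv_pi * ind - (V * inv_pi - 1.0) * p_unver[:, d]
    d0, d1, d2 = delta[:, 0], delta[:, 1], delta[:, 2]
    num = sum(u * np.sum(d0[T < t]) * np.sum(d2[T > t]) for t, u in zip(T, d1))
    den = d0.sum() * d1.sum() * d2.sum()
    return num / den
```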
5.4 Pseudo Doubly Robust (PDR) Estimation of VUS

Since no choice of $q(T,A)$ places a restriction on the observed-data distribution, model $\mathcal B(q)$ is a non-parametric model for the distribution of $S=(V,\,c(V,D,T,A))$: for a given model of the observed-data distribution, any value of the non-ignorable parameter $q(T,A)$ can fit the data. Therefore, in practice, when there is little prior information about the non-ignorable parameter, additional assumptions are needed to carry out estimation. Extending the approach of Liu et al. for AUC estimation with likelihood-based estimation of the non-ignorable parameter [33], we make the non-ignorable selection parameter identifiable by assuming an additional parametric disease model, and we construct the PDR estimator of the VUS when the true three-class disease status is subject to non-ignorable missingness. It is named PDR because it shares the functional form of the DR estimator but does not, in general, have the doubly robust property.

Again, we assume the following selection mechanism for disease verification:
\[
\log\frac{P(V=0\mid D,T,A)}{P(V=1\mid D,T,A)} = h(T,A;\mu) + q(T,A)\,D \tag{5.20}
\]
For simplicity, consider the setting with a single auxiliary covariate $A$ and let $h(T,A;\mu)=\mu_0+\mu_1 T+\mu_2 A$ and $q(T,A)=\alpha$; this is easily extended to allow more complicated models, for example one with an interaction between $T$ and $A$. Then $P(D=d\mid V=0,T,A)$ can be solved as a function of $P(D=d\mid V=1,T,A)$:
\[
P(D=d\mid V=0,T,A) = \frac{P(D=d\mid V=1,T,A)\,\exp(\alpha d)}{E[\exp(\alpha D)\mid V=1,T,A]} \tag{5.21}
\]
To make $\alpha$ identifiable, a further assumption of a disease model for the entire sample is required:
\[
\log\frac{P(D=1\mid T,A)}{P(D=0\mid T,A)} = m(T,A;\beta_1), \qquad
\log\frac{P(D=2\mid T,A)}{P(D=0\mid T,A)} = m(T,A;\beta_2) \tag{5.22}
\]
Again for simplicity, take $m(T,A;\beta_1)=\beta_{01}+\beta_{11}T+\beta_{21}A$ and $m(T,A;\beta_2)=\beta_{02}+\beta_{12}T+\beta_{22}A$. Notice that if we fix $P(D=d\mid V=1,T,A)$ and change $\alpha$ in (5.21), model (5.22) may no longer hold, since $P(D=d\mid V=0,T,A)$ changes; hence the additional disease-model assumption provides the information needed to identify $\alpha$. Let $\mu=(\mu_0,\mu_1,\mu_2)^T$, $\beta_1=(\beta_{01},\beta_{11},\beta_{21})^T$ and $\beta_2=(\beta_{02},\beta_{12},\beta_{22})^T$, and let
\[
\begin{aligned}
\pi_{di} &= P(V_i=1\mid D_i=d,T_i,A_i) = \frac{1}{\exp(\mu_0+\mu_1T_i+\mu_2A_i+\alpha d)+1}\\
\rho_{1i} &= P(D_i=1\mid T_i,A_i) = \frac{\exp(\beta_{01}+\beta_{11}T_i+\beta_{21}A_i)}{\exp(\beta_{01}+\beta_{11}T_i+\beta_{21}A_i)+\exp(\beta_{02}+\beta_{12}T_i+\beta_{22}A_i)+1}\\
\rho_{2i} &= P(D_i=2\mid T_i,A_i) = \frac{\exp(\beta_{02}+\beta_{12}T_i+\beta_{22}A_i)}{\exp(\beta_{01}+\beta_{11}T_i+\beta_{21}A_i)+\exp(\beta_{02}+\beta_{12}T_i+\beta_{22}A_i)+1}
\end{aligned} \tag{5.23}
\]
with $\rho_{0i}=1-\rho_{1i}-\rho_{2i}$. Then the log-likelihood can be expressed as
\[
\ell(\mu,\beta_1,\beta_2,\alpha) = \sum_{i=1}^n\Big[I(D_i=0)V_i\log(\pi_{0i}\rho_{0i}) + I(D_i=1)V_i\log(\pi_{1i}\rho_{1i}) + I(D_i=2)V_i\log(\pi_{2i}\rho_{2i}) + (1-V_i)\log(1-\pi_{0i}\rho_{0i}-\pi_{1i}\rho_{1i}-\pi_{2i}\rho_{2i})\Big] \tag{5.24}
\]
Under model assumptions (5.20) and (5.22), the score equations for $\alpha$, $\mu$, $\beta_2$ and $\beta_1$ are
\[
\begin{aligned}
0 &= \sum_{i=1}^n\bigg[-I(D_i=1)V_i(1-\pi_{1i}) - 2I(D_i=2)V_i(1-\pi_{2i}) + (1-V_i)\,\frac{\rho_{1i}\pi_{1i}(1-\pi_{1i})+2\rho_{2i}\pi_{2i}(1-\pi_{2i})}{1-\pi_{0i}\rho_{0i}-\pi_{1i}\rho_{1i}-\pi_{2i}\rho_{2i}}\bigg]\\
0 &= \sum_{i=1}^n\begin{pmatrix}1\\T_i\\A_i\end{pmatrix}\bigg[-\sum_{d=0}^2 I(D_i=d)V_i(1-\pi_{di}) + (1-V_i)\,\frac{\sum_{d=0}^2\rho_{di}\pi_{di}(1-\pi_{di})}{1-\pi_{0i}\rho_{0i}-\pi_{1i}\rho_{1i}-\pi_{2i}\rho_{2i}}\bigg]\\
0 &= \sum_{i=1}^n\begin{pmatrix}1\\T_i\\A_i\end{pmatrix}\bigg[-I(D_i=0)V_i\rho_{2i} - I(D_i=1)V_i\rho_{2i} + I(D_i=2)V_i(1-\rho_{2i}) + (1-V_i)\,\frac{\pi_{0i}\rho_{0i}\rho_{2i}+\pi_{1i}\rho_{1i}\rho_{2i}-\pi_{2i}\rho_{2i}(1-\rho_{2i})}{1-\pi_{0i}\rho_{0i}-\pi_{1i}\rho_{1i}-\pi_{2i}\rho_{2i}}\bigg]\\
0 &= \sum_{i=1}^n\begin{pmatrix}1\\T_i\\A_i\end{pmatrix}\bigg[-I(D_i=0)V_i\rho_{1i} + I(D_i=1)V_i(1-\rho_{1i}) - I(D_i=2)V_i\rho_{1i} + (1-V_i)\,\frac{\pi_{0i}\rho_{0i}\rho_{1i}+\pi_{2i}\rho_{2i}\rho_{1i}-\pi_{1i}\rho_{1i}(1-\rho_{1i})}{1-\pi_{0i}\rho_{0i}-\pi_{1i}\rho_{1i}-\pi_{2i}\rho_{2i}}\bigg]
\end{aligned} \tag{5.25}
\]
Using the Newton-Raphson algorithm or an EM algorithm, we can solve the above equations to estimate the model parameters and then construct the PDR estimator. First denote by $\rho_{0di}=P(D_i=d\mid V_i=0,T_i,A_i)$, $d=0,1,2$, the disease probability conditional on not being verified; it can be estimated from $\hat\pi_{di}$ and $\hat\rho_{di}$ as
\[
\hat\rho_{0di} = \frac{(1-\hat\pi_{di})\,\hat\rho_{di}}{(1-\hat\pi_{0i})\hat\rho_{0i}+(1-\hat\pi_{1i})\hat\rho_{1i}+(1-\hat\pi_{2i})\hat\rho_{2i}} \tag{5.26}
\]
Let $\Delta^{PDR}(D_i=d) = V_i\pi_{di}^{-1}I(D_i=d) - (V_i\pi_{di}^{-1}-1)\,\rho_{0di}$. The estimated $\hat\Delta^{PDR}_i$ is obtained by replacing $\pi_{di}$ and $\rho_{0di}$ with their estimates. The new PDR estimator of the VUS is then
\[
\widehat{VUS}_{PDR} = \frac{\sum_{i=1}^n\sum_{j=1}^n\sum_{k=1}^n \hat\Delta^{PDR}(D_i=2)\,\hat\Delta^{PDR}(D_j=1)\,\hat\Delta^{PDR}(D_k=0)\,I(T_i>T_j>T_k)}{\sum_{i=1}^n\sum_{j=1}^n\sum_{k=1}^n \hat\Delta^{PDR}(D_i=2)\,\hat\Delta^{PDR}(D_j=1)\,\hat\Delta^{PDR}(D_k=0)} \tag{5.27}
\]
Although it has the same functional form as the DR estimator, the PDR estimator estimates $\rho_{di}$ and $\pi_{di}$, $d=0,1,2$, jointly from the likelihood, and thus correct specification of both the verification model (5.20) and the disease model (5.22) is required to obtain valid bias correction.
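A minimal Python sketch of the PDR fitting step follows (our own naming), with $h(T,A)=\mu_0+\mu_1T+\mu_2A$ and $q(T,A)=\alpha$ as above. It maximizes the observed-data log-likelihood (5.24) numerically instead of solving the score equations (5.25) directly; the two are equivalent at an interior optimum.

```python
import numpy as np
from scipy.optimize import minimize

def fit_pdr_params(T, A, D, V):
    """ML fit of (mu0, mu1, mu2, alpha, b01, b11, b21, b02, b12, b22) per (5.24)."""
    n = len(T)
    Dv = np.where(V == 1, D, 0).astype(int)          # placeholder when V == 0

    def negloglik(theta):
        mu, alpha, b1, b2 = theta[:3], theta[3], theta[4:7], theta[7:10]
        e1 = np.exp(b1[0] + b1[1] * T + b1[2] * A)
        e2 = np.exp(b2[0] + b2[1] * T + b2[2] * A)
        rho = np.stack([np.ones(n), e1, e2]) / (1 + e1 + e2)            # (5.23)
        lin = mu[0] + mu[1] * T + mu[2] * A
        pi = 1.0 / (np.exp(lin + alpha * np.arange(3)[:, None]) + 1)    # pi_{di}
        joint = pi * rho                             # P(V = 1, D = d | T, A)
        ll = np.where(V == 1,
                      np.log(joint[Dv, np.arange(n)] + 1e-12),
                      np.log(1 - joint.sum(axis=0) + 1e-12))
        return -ll.sum()

    return minimize(negloglik, np.zeros(10), method="BFGS").x
```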
5.5 Qualitative Comparison of Approaches

We have proposed DR, IPW and PDR estimators of the VUS under non-ignorable missingness. Both the DR and PDR estimators require a selection model for disease verification and a disease model, while the IPW estimator requires only a verification model. The DR estimator needs only one of the disease model or the selection model to be correct, but at the same time it requires an inestimable specification of the residual association $q(T,A)$, which measures the degree of influence of the true disease status on the verification process given the new diagnostic test and any other observed covariates. It is recommended to conduct sensitivity analyses over different degrees of non-ignorable selection bias by repeating the estimation under different plausible choices of the $q(T,A)$ function.

In some cases it may be challenging to specify $q(T,A)$, while more information may be available about the disease regression model. In the effort to estimate this parameter from the data instead, possibly under a reasonable parametric model, the PDR estimator was proposed: by further assuming a disease model for the whole sample, it makes the non-ignorable selection model identifiable, so that the parameters can be estimated by solving the score equations. Compared with the DR estimator, the PDR estimator must specify the parametric disease model for the whole study population and gives up the doubly robust property in order to estimate the $q(T,A)$ function. Researchers should choose an estimator based on their confidence in the available information about the reasons for missing true disease status and about the disease mechanism.

5.6 Summary

In this chapter we focused on estimating the VUS under non-ignorable missingness when the test result is continuous and there are three ordinal disease classes. Two approaches were considered: the DR estimator achieves double robustness with an arbitrarily specified non-ignorable parameter, while the likelihood-based PDR estimator identifies the non-ignorable parameter under a further assumption about the disease model for the entire population. By assuming a verification model with a non-ignorable parameter, we also extended our previously proposed IPW estimator to non-ignorable missingness. Their asymptotic behavior is discussed in the next chapter.

Chapter 6 Asymptotic Properties of Bias-corrected Estimators of VUS

In the previous two chapters, we proposed several bias-corrected estimators of the VUS under different assumptions. In this chapter, we show that, by applying the asymptotic theory of U-statistics, a closed-form expression for the asymptotic variance of the IPW estimator can be derived. We also use the jackknife technique to obtain variance estimates for all of the new estimators.

6.1 Data and Notation

The data and notation defined in Section 4.3 still apply here. We further define $F_d(t)$ as the distribution function of the new test result $T$ for subjects with $D=d$, and $G_d(t,a)$ as the joint distribution function of the new test result $T$ and covariates $A$ for subjects with $D=d$, $d=0,1,2$.
6.2 Closed-form Estimation of the Asymptotic Variance of the IPW Estimator

First, recall that the IPW estimator of the VUS has the functional form
\[
\widehat{VUS}_{IPW} = \frac{\sum_{i=1}^n\sum_{j=1}^n\sum_{k=1}^n \phi(S_i,S_j,S_k)}{\sum_{i=1}^n\sum_{j=1}^n\sum_{k=1}^n \psi(S_i,S_j,S_k)} \tag{6.1}
\]
where
\[
\begin{aligned}
\phi(S_i,S_j,S_k) &= \tfrac16\,\pi_i^{-1}\pi_j^{-1}\pi_k^{-1}V_iV_jV_k\, I[(T_i-T_j)(D_i-D_j)>0]\; I[(T_k-T_j)(D_k-D_j)>0]\; I[(T_k-T_i)(D_k-D_i)>0]\\
\psi(S_i,S_j,S_k) &= \tfrac16\,\pi_i^{-1}\pi_j^{-1}\pi_k^{-1}V_iV_jV_k\, [I(D_i>D_j)+I(D_i<D_j)]\,[I(D_k>D_j)+I(D_j<D_k)]\,[I(D_i>D_k)+I(D_k<D_i)]
\end{aligned} \tag{6.2}
\]

Theorem 6.1. Let $A = \big(1/(\lambda_0\lambda_1\lambda_2),\, -VUS/(\lambda_0\lambda_1\lambda_2)\big)^T$. Then $\sqrt n\,(\widehat{VUS}_{IPW}-VUS) \to_d N(0,\, 9\,A^T\Sigma A)$, where
\[
\Sigma = \begin{pmatrix}
\mathrm{Cov}[\phi(S_i,S_j,S_k),\phi(S_i,S_m,S_n)] & \mathrm{Cov}[\phi(S_i,S_j,S_k),\psi(S_i,S_m,S_n)]\\
\mathrm{Cov}[\psi(S_i,S_j,S_k),\phi(S_i,S_m,S_n)] & \mathrm{Cov}[\psi(S_i,S_j,S_k),\psi(S_i,S_m,S_n)]
\end{pmatrix}
\]

Proof. Let
\[
U = \frac{1}{n(n-1)(n-2)}\sum_{i\ne j\ne k}\phi(S_i,S_j,S_k), \qquad
W = \frac{1}{n(n-1)(n-2)}\sum_{i\ne j\ne k}\psi(S_i,S_j,S_k).
\]
By Theorem 4.1, with $m=3$, we have $\sqrt n\,\big((U,W)^T-\mu\big)\to_d N(0,9\Sigma)$, where $\mu=(\lambda_0\lambda_1\lambda_2\,VUS,\;\lambda_0\lambda_1\lambda_2)^T$. Therefore, by the multivariate delta method, $\sqrt n\,(\widehat{VUS}_{IPW}-VUS)\to_d N(0,\,9\,A^T\Sigma A)$. $\Sigma$ can be expressed in terms of $\lambda_d$, $F_d(t)$ and $G_d(t,a)$, $d=0,1,2$. Since
\[
\begin{aligned}
\mathrm{Cov}[\phi(S_i,S_j,S_k),\phi(S_i,S_m,S_n)] &= E[\phi(S_i,S_j,S_k)\,\phi(S_i,S_m,S_n)] - \lambda_0^2\lambda_1^2\lambda_2^2\,VUS^2\\
\mathrm{Cov}[\psi(S_i,S_j,S_k),\psi(S_i,S_m,S_n)] &= E[\psi(S_i,S_j,S_k)\,\psi(S_i,S_m,S_n)] - \lambda_0^2\lambda_1^2\lambda_2^2\\
\mathrm{Cov}[\phi(S_i,S_j,S_k),\psi(S_i,S_m,S_n)] &= E[\phi(S_i,S_j,S_k)\,\psi(S_i,S_m,S_n)] - \lambda_0^2\lambda_1^2\lambda_2^2\,VUS
\end{aligned}
\]
we need to compute the three cross-product expectations. Conditioning on the shared and non-shared subjects and using $E[\pi^{-1}V\mid T,A]=1$ for the non-shared indices,
\[
\begin{aligned}
E[\phi(S_i,S_j,S_k)\,\phi(S_i,S_m,S_n)]
&= \tfrac{1}{36}\,E\big\{\pi_i^{-1}\, I[(T_i-T_j)(D_i-D_j)>0]\cdots I[(T_n-T_i)(D_n-D_i)>0]\big\}\\
&= \tfrac{1}{36}\Big\{ 4\,E[W(i,j,k,m,n)\mid D_i=2,\, D_j=D_m=1,\, D_k=D_n=0]\,\lambda_0^2\lambda_1^2\lambda_2\\
&\qquad + 4\,E[W(i,j,k,m,n)\mid D_i=1,\, D_j=D_m=0,\, D_k=D_n=2]\,\lambda_0^2\lambda_2^2\lambda_1\\
&\qquad + 4\,E[W(i,j,k,m,n)\mid D_i=0,\, D_j=D_m=1,\, D_k=D_n=2]\,\lambda_1^2\lambda_2^2\lambda_0\Big\}
\end{aligned}
\]
where
\[
W(i,j,k,m,n) = \pi_i^{-1}\, I[(T_i-T_j)(D_i-D_j)>0]\, I[(T_k-T_j)(D_k-D_j)>0]\, I[(T_k-T_i)(D_k-D_i)>0]\, I[(T_i-T_m)(D_i-D_m)>0]\, I[(T_n-T_m)(D_n-D_m)>0]\, I[(T_n-T_i)(D_n-D_i)>0]
\]
Computation of $E[W(i,j,k,m,n)\mid D_i=2, D_j=D_m=1, D_k=D_n=0]$: conditioning further on $T_j$ and $T_m$,
\[
E[W\mid D_i=2, D_j=D_m=1, D_k=D_n=0]
= \int\!\!\int \Big[\int \pi_i^{-1}\,I(T_i>T_j)\,I(T_i>T_m)\, dG_2(T_i,A_i)\Big]\Big[\int I(T_j>T_k)\, dF_0(T_k)\Big]\Big[\int I(T_m>T_n)\, dF_0(T_n)\Big]\, dF_1(T_j)\, dF_1(T_m)
\]
Similarly, we obtain
\[
E[W\mid D_i=0, D_j=D_m=1, D_k=D_n=2]
= \int\!\!\int \Big[\int \pi_i^{-1}\,I(T_i<T_j)\,I(T_i<T_m)\, dG_0(T_i,A_i)\Big]\Big[\int I(T_k>T_j)\, dF_2(T_k)\Big]\Big[\int I(T_n>T_m)\, dF_2(T_n)\Big]\, dF_1(T_j)\, dF_1(T_m)
\]
\[
E[W\mid D_i=1, D_j=D_m=2, D_k=D_n=0]
= \int \pi_i^{-1}\Big[\int I(T_j>T_i)\, dF_2(T_j)\Big]\Big[\int I(T_m>T_i)\, dF_2(T_m)\Big]\Big[\int I(T_i>T_k)\, dF_0(T_k)\Big]\Big[\int I(T_i>T_n)\, dF_0(T_n)\Big]\, dG_1(T_i,A_i)
\]
Hence,
\[
\mathrm{Cov}[\phi(S_i,S_j,S_k),\phi(S_i,S_m,S_n)] = \tfrac19\Big\{E[W\mid D_i=2,\cdot\,]\,\lambda_0^2\lambda_1^2\lambda_2 + E[W\mid D_i=1,\cdot\,]\,\lambda_0^2\lambda_2^2\lambda_1 + E[W\mid D_i=0,\cdot\,]\,\lambda_1^2\lambda_2^2\lambda_0\Big\} - \lambda_0^2\lambda_1^2\lambda_2^2\,VUS^2
\]
By similar arguments,
\[
\begin{aligned}
\mathrm{Cov}[\phi(S_i,S_j,S_k),\psi(S_i,S_m,S_n)] &= \tfrac19\Big\{\Big[\int\!\!\int \pi_i^{-1}I(T_i>T_j)\, dG_2(T_i,A_i)\int I(T_j>T_k)\, dF_0(T_k)\, dF_1(T_j)\Big]\lambda_0^2\lambda_1^2\lambda_2\\
&\qquad + \Big[\int \pi_i^{-1}\int I(T_j>T_i)\, dF_2(T_j)\int I(T_i>T_k)\, dF_0(T_k)\, dG_1(T_i,A_i)\Big]\lambda_0^2\lambda_2^2\lambda_1\\
&\qquad + \Big[\int\!\!\int \pi_i^{-1}I(T_i<T_j)\, dG_0(T_i,A_i)\int I(T_j<T_k)\, dF_2(T_k)\, dF_1(T_j)\Big]\lambda_1^2\lambda_2^2\lambda_0\Big\} - \lambda_0^2\lambda_1^2\lambda_2^2\,VUS
\end{aligned}
\]
\[
\mathrm{Cov}[\psi(S_i,S_j,S_k),\psi(S_i,S_m,S_n)] = \tfrac19\Big\{\Big[\int \pi_i^{-1}\, dG_2(T_i,A_i)\Big]\lambda_0^2\lambda_1^2\lambda_2 + \Big[\int \pi_i^{-1}\, dG_1(T_i,A_i)\Big]\lambda_0^2\lambda_2^2\lambda_1 + \Big[\int \pi_i^{-1}\, dG_0(T_i,A_i)\Big]\lambda_1^2\lambda_2^2\lambda_0\Big\} - \lambda_0^2\lambda_1^2\lambda_2^2
\]
$\lambda_d$ can be estimated by
\[
\hat\lambda_d = \frac{\sum_{i=1}^n \pi_i^{-1}V_i\,I(D_i=d)}{\sum_{i=1}^n \pi_i^{-1}V_i}
\]
and the empirical distribution functions $\hat F_d(t)$ and $\hat G_d(t,a)$ can likewise be estimated from the observed data, for $d=0,1,2$.

The above variance estimation of $\widehat{VUS}_{IPW}$ assumed that the $\pi_i=\Pr(V_i=1\mid T_i,A_i)$, $i=1,\dots,n$, were known. At the design stage of a study whose protocol dictates which subjects get disease verified, the sampling fractions $\pi$ are known. In some studies, such as observational studies, the actual selection probabilities are unknown and must be estimated; in those cases we substitute $\hat\pi_i$ into the expressions for the estimator and its variance.

6.3 Jackknife Estimator of Variance

First, we briefly describe jackknife variance estimation; detailed theory can be found in [52, 53]. Let $Z_1,\dots,Z_n$ be independent random variables and define
\[
T_n = T(Z_1,\dots,Z_n) = \binom{n}{m}^{-1}\sum_{1\le i_1<\cdots<i_m\le n} h(Z_{i_1},\dots,Z_{i_m})
\]
a one-sample U-statistic of degree $m$ and a consistent estimator of the parameter $\theta$. Let the jackknife pseudo-values be $\hat V_i = nT_n - (n-1)T_{n-1}^{(i)}$, where
\[
T_{n-1}^{(i)} = T(Z_1,\dots,Z_{i-1},Z_{i+1},\dots,Z_n) = \binom{n-1}{m}^{-1}\sum_{\substack{1\le j_1<\cdots<j_m\le n\\ j_l\ne i}} h(Z_{j_1},\dots,Z_{j_m})
\]
is the statistic computed from the $n-1$ observations remaining after deleting the $i$-th data point. The jackknife estimator of $\theta$ is the average of the pseudo-values,
\[
\hat T_n^{(jack)} = \frac1n\sum_{i=1}^n \hat V_i,
\]
and the jackknife estimate of the variance of $T_n$ is given by
\[
\widehat{\mathrm{Var}}(T_n) = \frac{1}{n(n-1)}\sum_{i=1}^n\big(\hat V_i - \hat T_n^{(jack)}\big)^2.
\]
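The generic recipe above translates directly into code. The following minimal Python sketch (our own naming) computes the jackknife variance of any scalar statistic by leave-one-out pseudo-values; applied with $\widehat{VUS}_{IPW}$ as the statistic, it produces the variance estimator developed below.

```python
import numpy as np

def jackknife_variance(stat, data):
    """Jackknife variance via leave-one-out pseudo-values.

    stat : function mapping an (n, ...) data array (rows = subjects) to a scalar
    data : array with one row per subject
    """
    n = len(data)
    t_full = stat(data)
    loo = np.array([stat(np.delete(data, i, axis=0)) for i in range(n)])
    pseudo = n * t_full - (n - 1) * loo         # pseudo-values V_i
    return np.sum((pseudo - pseudo.mean()) ** 2) / (n * (n - 1))
```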
It is straightforward to apply the jackknife method to obtain variance estimates for our new estimators; take the IPW estimator of the VUS as an example. We proved that
\[
VUS = \frac{E[\pi_i^{-1}\pi_j^{-1}\pi_k^{-1}V_iV_jV_k\, I(T_i>T_j>T_k)\,I(D_i>D_j>D_k)]}{E[\pi_i^{-1}\pi_j^{-1}\pi_k^{-1}V_iV_jV_k\, I(D_i>D_j>D_k)]}
\]
Let $\phi = E[\pi_i^{-1}\pi_j^{-1}\pi_k^{-1}V_iV_jV_k\, I(T_i>T_j>T_k)\,I(D_i>D_j>D_k)]$ and $\psi = E[\pi_i^{-1}\pi_j^{-1}\pi_k^{-1}V_iV_jV_k\, I(D_i>D_j>D_k)]$; then estimators of $\phi$ and $\psi$ are
\[
\hat\phi = \frac{1}{n(n-1)(n-2)}\sum_{i\ne j\ne k}\pi_i^{-1}\pi_j^{-1}\pi_k^{-1}V_iV_jV_k\, I(T_i>T_j>T_k)\,I(D_i>D_j>D_k)
\]
\[
\hat\psi = \frac{1}{n(n-1)(n-2)}\sum_{i\ne j\ne k}\pi_i^{-1}\pi_j^{-1}\pi_k^{-1}V_iV_jV_k\, I(D_i>D_j>D_k)
\]
The $i$-th jackknife pseudo-value for $\hat\phi$ is $\hat\phi_i^{PS} = n\hat\phi - (n-1)\hat\phi_{n-1}^{(i)}$, and the jackknife estimator of $\phi$ is $\hat\phi_{JK} = \frac1n\sum_{i=1}^n\hat\phi_i^{PS}$. Thus the jackknife estimator of the variance of $\hat\phi$ is
\[
\widehat{\mathrm{Var}}(\hat\phi) = \frac{1}{n(n-1)}\sum_{i=1}^n(\hat\phi_i^{PS}-\hat\phi_{JK})^2
\]
Similarly, the jackknife estimator of the variance of $\hat\psi$ is
\[
\widehat{\mathrm{Var}}(\hat\psi) = \frac{1}{n(n-1)}\sum_{i=1}^n(\hat\psi_i^{PS}-\hat\psi_{JK})^2
\]
and the jackknife estimator of the covariance between $\hat\phi$ and $\hat\psi$ is
\[
\widehat{\mathrm{Cov}}(\hat\phi,\hat\psi) = \frac{1}{n(n-1)}\sum_{i=1}^n(\hat\phi_i^{PS}-\hat\phi_{JK})(\hat\psi_i^{PS}-\hat\psi_{JK})
\]
By the multivariate delta method,
\[
\widehat{\mathrm{Var}}(\widehat{VUS}_{IPW}) = \frac{\hat\phi^2}{\hat\psi^4}\,\widehat{\mathrm{Var}}(\hat\psi) + \frac{1}{\hat\psi^2}\,\widehat{\mathrm{Var}}(\hat\phi) - \frac{2\hat\phi}{\hat\psi^3}\,\widehat{\mathrm{Cov}}(\hat\phi,\hat\psi)
\]
Because our other estimators have a similar functional form to the IPW estimator, the same arguments yield jackknife variance estimators for them.

6.4 Summary

In this chapter we developed asymptotic distribution theory for the IPW estimator based on U-statistic theory and, using the IPW estimator as an example, derived the jackknife variance estimator; confidence intervals for the new estimators can then be calculated. In the next two chapters, simulation studies investigate the performance of the different estimators under the two assumptions about the missingness, i.e., ignorable and non-ignorable.

Chapter 7 Simulation Studies: IPW Methodology

In Chapter 4, we developed the IPW estimator of the VUS under the MAR assumption, proved that $\widehat{VUS}_{IPW}$ is consistent, and derived its asymptotic properties. In this chapter, the finite-sample behavior of this new estimator of the VUS is investigated via simulation.

7.1 Notation for Estimators

We define short names for our proposed estimators. We consider two IPW estimators: "IPW(E)" denotes $\widehat{VUS}_{IPW}$ using $\pi$ estimated by assuming a logistic regression model for the verification process, while "IPW(K)" denotes $\widehat{VUS}_{IPW}$ using $\pi$ known from prior knowledge. "CC" denotes the naïve estimator of the VUS using complete cases only, without bias correction. "Full Data" denotes estimates based on complete data in which every subject has disease verified, which should be unbiased. "True" estimates are the average Full Data estimates with a sample size of 5000 across 1000 realizations, representing the true value of the VUS.

7.2 Simulation Set-up

The simulation set-up is similar to that of Alonzo et al. (2003, 2005), modified to accommodate three ordered disease stages rather than a binary disease status. Specifically, the disease status is formed from an underlying continuous pathologic process, which remains subclinical or mild until it crosses certain thresholds and hence progresses to the next disease stage. In this simulation study, the disease status $D$ with three stages (non-disease, mild disease and severe disease) is generated by comparing a random variable $Z\sim N(0,1)$ with two ordered decision boundaries $p_1<p_2$.
That is, $D=0$ (non-disease) if $Z<p_1$; $D=2$ (severe disease) if $Z>p_2$; and $D=1$ (mild disease) otherwise. In other words, the thresholds $p_1$ and $p_2$ determine the prevalences of the disease stages. In practice there are usually multiple factors contributing to the development of disease, so it is reasonable to view $Z$ as the sum of two independent random variables $Z_1$ and $Z_2$, each distributed $N(0,0.5)$. The continuous diagnostic test aims to capture the information in those factors, and here we construct $T$ as a linear combination of $Z_1$, $Z_2$ and a random normal error $e_1$:
\[
T = a_1 Z_1 + b_1 Z_2 + e_1, \qquad e_1\sim N(0,0.25)
\]
As shown in Table 7.1, test accuracy varies with the values of $a_1$ and $b_1$. The test is highly accurate when $a_1=b_1=1$. Decreasing either $a_1$ or $b_1$ to zero causes the test to capture less information about disease status and thus lowers accuracy; similarly, decreasing both $a_1$ and $b_1$ to 0.5 reduces the inherent accuracy of $T$. Setting $a_1=b_1=0$ makes the test no better than random guessing.

Table 7.1: True and CC VUS under different test accuracies $(a_1, b_1)$

  Method   (1, 1)   (0.5, 0.5)   (1, 0)   (0, 0)
  True     0.792    0.457        0.565    0.167
  CC       0.747    0.427        0.521    0.167

One covariate $A$ is generated, in a way similar to $T$, according to
\[
A = a_2 Z_1 + b_2 Z_2 + e_2, \qquad e_2\sim N(0,0.25)
\]
where $e_1$ and $e_2$ are independent. Again, by varying the values of $a_2$ and $b_2$, one can change the degree to which the covariate is correlated with disease status.

The verification mechanism is a Bernoulli random variable with verification probability related to the test result and covariate through the model
\[
\mathrm{logit}\{P(V=1)\} = \gamma_0 + \gamma_1 I(t_{q_1}<T\le t_{q_2}) + \gamma_2 I(T>t_{q_2}) + \gamma_3 I(a_{q_1}<A\le a_{q_2}) + \gamma_4 I(A>a_{q_2})
\]
where $t_{q_1}$ and $t_{q_2}$ ($a_{q_1}$ and $a_{q_2}$) are the $q_1$-th and $q_2$-th quantiles of the distribution of $T$ (of $A$), with $q_1<q_2$. In this setting, verification status is related to disease status through the new test results and the covariate: higher $T$ or $A$ increases the probability of disease verification. Using the ordered thresholds, patients are classified into different categories according to the values of their $T$ and $A$, and different verification probabilities are assigned to patients in different categories to introduce informative missingness in the disease status.

In the estimation procedure, the correct models are assumed to hold for the verification probability; robustness to verification-model misspecification is investigated comprehensively in the next chapter along with the other proposed estimators. Simulation results under the different scenarios are summarized over 1000 replications.
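A minimal Python sketch of one realization of this design follows, with the parameter defaults taken from the default setting described in Section 7.3; the function name and interface are ours.

```python
import numpy as np
from scipy.stats import norm

def simulate_data(n, a1=1.0, b1=1.0, a2=1.0, b2=1.0,
                  prev=(0.05, 0.15), q=(0.70, 0.90),
                  gamma=(-1.4, 1.0, 2.0, 0.8, 2.0), seed=None):
    """One realization of the Section 7.2 design.

    prev  : prevalences of D = 0 and D = 1 (the remainder is D = 2)
    q     : quantile pair (q1, q2), applied to both T and A
    gamma : (g0, g1, g2, g3, g4) of the verification logit model
    """
    rng = np.random.default_rng(seed)
    z1 = rng.normal(0.0, np.sqrt(0.5), n)
    z2 = rng.normal(0.0, np.sqrt(0.5), n)
    z = z1 + z2                                     # latent pathology, N(0, 1)
    p1, p2 = norm.ppf(prev[0]), norm.ppf(prev[0] + prev[1])
    D = np.where(z < p1, 0, np.where(z > p2, 2, 1))
    T = a1 * z1 + b1 * z2 + rng.normal(0.0, 0.5, n) # error sd 0.5 -> var 0.25
    A = a2 * z1 + b2 * z2 + rng.normal(0.0, 0.5, n)
    tq1, tq2 = np.quantile(T, q)
    aq1, aq2 = np.quantile(A, q)
    g0, g1, g2, g3, g4 = gamma
    lin = (g0 + g1 * ((T > tq1) & (T <= tq2)) + g2 * (T > tq2)
              + g3 * ((A > aq1) & (A <= aq2)) + g4 * (A > aq2))
    V = rng.binomial(1, 1.0 / (1.0 + np.exp(-lin)))
    return T, A, D, V
```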
7.3 Default Setting

Unless otherwise indicated, the following default simulation set-up is used. In the default setting, the diagnostic test under study is highly accurate (true VUS = 0.792), with $a_1=b_1=1$. The value of $(p_1,p_2)$ is chosen to yield disease prevalences of 5%, 15% and 80% for the non-disease, mild-disease and severe-disease statuses, respectively. $(q_1,q_2)$ and $(\gamma_0,\gamma_1,\gamma_2,\gamma_3,\gamma_4)$ are chosen to be (70%, 90%) and (-1.4, 1, 2, 0.8, 2), respectively. Note that under this verification model, subjects with higher test results, higher covariate values, or both are more likely to receive disease verification. Averaging over 1000 realizations of the simulation with a sample size of 2000, the percentages of disease verification among subjects in the non-disease ($D=0$), mild-disease ($D=1$) and severe-disease ($D=2$) groups are 26%, 67% and 89%, respectively; overall, the probability of verification in the population is about 35%. The default simulations have a sample size of 1000.

7.4 Performance of VUS: Bias

Table 7.2 reports the average estimated VUS of the diagnostic test across 1000 replications of the simulation for sample sizes ranging from 400 to 2000. As expected, the CC estimator is biased (5.6%-6.0% relative bias) compared with the true value of the VUS ("True" estimates). Both IPW(E) and IPW(K) correct the noticeable bias of the CC estimator and are quite close to the Full Data estimates; the relative biases of the IPW and Full Data estimators with respect to the "True" estimates are practically ignorable, and their performance improves as the sample size increases.

The verification bias caused by ignoring the biased sampling procedure for disease verification is also apparent in Figure 7.1, which shows the empirical ROC surfaces based on the complete cases and on the full data for one randomly chosen realization. The CC ROC surface is biased downward relative to the full-data surface (VUS 0.772 vs. 0.807). The graphs are drawn with respect to the three correct classification rates, across all possible decision thresholds. Figure 7.2 shows the bias-corrected ROC surface using IPW(E) together with the full-data ROC surface; not surprisingly, the IPW surface lies close to the full-data surface and the corresponding VUS is similar to the full-data value (0.812 vs. 0.807). The ROC surface for IPW(K) is not presented because it is very similar to that for IPW(E).

When the sample size is as small as 400 and the overall disease verification rate is 34%, only about 136 subjects have disease status verified. Although the CC and IPW estimators both use data only from verified subjects, the IPW estimators exhibit substantially smaller bias than the naïve CC estimator, which further supports the usefulness of the IPW estimators in correcting verification bias and properly assessing test accuracy with small sample sizes.

Table 7.2: Mean estimated VUS from 1000 realizations of the simulation with different sample sizes. Relative bias, (mean VUS - true estimate of 0.792)/0.792, is in parentheses.

  Method      400            600            800            1000           2000
  Full Data   0.791 (0.2%)   0.791 (-0.2%)  0.794 (-0.2%)  0.792 (-0.1%)  0.793 (0.1%)
  CC          0.745 (-6.0%)  0.745 (-5.9%)  0.748 (-5.6%)  0.746 (-5.8%)  0.748 (-5.7%)
  IPW(E)      0.791 (0.2%)   0.791 (0.1%)   0.795 (-0.4%)  0.793 (0.0%)   0.793 (0.1%)
  IPW(K)      0.791 (0.2%)   0.791 (0.2%)   0.795 (-0.4%)  0.792 (0.0%)   0.793 (0.1%)

7.5 Performance of VUS: SD

In Chapter 6, we developed a closed-form asymptotic variance estimator as well as a jackknife variance estimator for the IPW estimators of the VUS. Here we investigate the performance of these asymptotic standard deviation (SD) estimators of the VUS for a range of sample sizes. The ratio of the Monte Carlo mean of the estimated SD to the Monte Carlo SD of the VUS estimators (the simulation SD) is presented in Table 7.3. The simulation SD of the VUS estimators is calculated from 1000 realizations and fairly represents the variability of the estimators.
7.5 Performance of VUS: SD

In Chapter 6 we developed a closed-form asymptotic variance estimator, as well as a Jackknife variance estimator, for the IPW estimators of the VUS. Here we investigate the performance of these standard deviation (SD) estimators over a range of sample sizes. The ratio of the Monte Carlo mean of the estimated SD to the Monte Carlo SD of the VUS estimators (the simulation SD) is presented in Table 7.3; the simulation SD is calculated from 1000 realizations and fairly represents the true variability of the estimators.

The Jackknife variance estimator performs very well for sample sizes as small as 400, with the ratio of estimated SD to simulation SD ranging from 0.983 to 1.043.

Turning to the closed-form variance estimators for the IPW estimators, the variance estimator for IPW(K) is also very close to the empirical estimates, and it consistently outperforms the variance estimator for IPW(E) in terms of closeness to the simulation SD. When the sample size is small, the closed-form IPW(E) variance estimator consistently underestimates the variance of IPW(E), likely because it does not account for the variation in estimating the verification probabilities. An extension that accounts for this extra variation is therefore worth further investigation and is discussed in Chapter 10. As the sample size increases, the improvement in the closed-form variance estimator for IPW(E) indicates that this should not be a significant problem for sample sizes greater than 600. This is not surprising, because the variance in the estimated verification probabilities is small relative to the variance of the VUS estimator.

The performance of the variance estimators is also assessed by calculating coverage probabilities of the confidence intervals (CIs) based on each variance estimator. Table 7.4 summarizes nominal 90% CI coverage probabilities of the VUS for several sample sizes. For the Jackknife variance estimators, coverage probabilities between 89.7% and 91.8% are achieved. The closed-form variance estimator for IPW(K) has comparable performance (87.0% to 91.0%). Not surprisingly, the closed-form variance estimator for IPW(E) does not attain the nominal 90% rate for small sample sizes (71.4%-83.3% for n ≤ 600) but has good coverage at a sample size of 2000.

In general, the Jackknife variance estimator and the closed-form variance estimator of the IPW VUS using known π perform very well in small samples. The closed-form variance estimator of the IPW VUS using estimated π, however, requires a relatively larger sample size to achieve good performance.

Table 7.3: The ratio of the Monte Carlo mean of the estimated SD to the simulation SD of the estimators. Jackknife variance estimator used for Full Data, CC, IPW(E)-J and IPW(K)-J; closed-form variance estimator used for IPW(E)-CF and IPW(K)-CF.

  Method      n = 400   600     800     1000    2000
  Full Data   0.972     0.991   0.987   1.028   1.034
  CC          0.998     1.008   1.013   1.017   1.033
  IPW(E)-J    0.986     1.008   1.007   1.015   1.043
  IPW(K)-J    0.983     1.001   1.002   1.012   1.034
  IPW(E)-CF   0.714     0.833   0.875   0.911   0.992
  IPW(K)-CF   0.917     0.949   0.966   0.980   1.019
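As a sketch of how the Jackknife SD in these tables can be obtained, the delete-one recipe below applies to any of the VUS estimators; for IPW(E), the verification model would in principle be refit within each leave-one-out replicate, a detail omitted here.

import numpy as np

def jackknife_sd(stat, *arrays):
    """Delete-one Jackknife SD of an estimator `stat`, which takes the data
    arrays (e.g. t, d, v, pi) and returns a scalar such as an IPW VUS."""
    n = len(arrays[0])
    reps = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i
        reps[i] = stat(*(a[keep] for a in arrays))
    # jackknife variance: (n - 1)/n times the sum of squared deviations
    return np.sqrt((n - 1) / n * ((reps - reps.mean()) ** 2).sum())

# usage: jackknife_sd(ipw_vus, t, d, v, pi)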
Table 7.4: 90% CI coverage probability (%) of the VUS variance estimators. Jackknife variance estimator used for Full Data, CC, IPW(E)-J and IPW(K)-J; closed-form variance estimator used for IPW(E)-CF and IPW(K)-CF.

  Method      n = 400   600    800    1000   2000
  Full Data   88.6      90.4   90.2   91.3   91.7
  CC          89.7      89.9   90.0   91.6   91.5
  IPW(E)-J    90.2      90.3   91.0   91.7   91.8
  IPW(K)-J    89.7      90.2   90.5   91.5   91.6
  IPW(E)-CF   76.9      83.9   87.2   87.8   89.4
  IPW(K)-CF   87.0      88.5   88.6   90.2   91.0

7.6 Performance of VUS: MSE

A good estimator should have reasonably small mean squared error (MSE), reflecting both small bias and small variance. Here we calculate the MSE for all the estimators. In Table 7.5, the IPW MSE is smaller than the CC MSE (by between 45% and 78%) but is about 35% larger than the full data MSE, which is reasonable given that about 66% of the observations are missing the disease status.

Table 7.5: MSE × 10^3 of the VUS estimators. Jackknife variance estimator used for Full Data, CC, IPW(E) and IPW(K).

  Method      n = 400   600    800    1000   2000
  Full Data   2.21      1.46   1.06   0.86   0.42
  CC          5.42      4.30   3.46   3.34   2.59
  IPW(E)      3.03      1.97   1.43   1.15   0.57
  IPW(K)      2.98      1.97   1.41   1.14   0.57

7.7 Performance of VUS: Relative Efficiency

To compare the efficiency of IPW(K) and IPW(E), we further investigate the small-sample relative efficiency (SSRE) and the estimated relative efficiency (ERE). SSRE is calculated as the ratio of the simulation variances of two estimators, while ERE is the ratio of their Jackknife variance estimates. SSRE relative to IPW(K) is presented in Table 7.6; a value less than 1 implies that IPW(E) is more efficient than IPW(K). Overall, IPW(K) and IPW(E) have similar efficiency. Again, the closeness between SSRE and ERE supports the good performance of the Jackknife variance estimators.

Table 7.6: Efficiency comparison between IPW(E) and IPW(K). ERE relative to IPW(K) is provided in parentheses.

  Method   n = 400         600             800             1000            2000
  IPW(E)   1.014 (1.008)   0.992 (1.005)   0.997 (1.008)   0.997 (1.003)   0.986 (1.002)

7.8 Varying Test Accuracy

The inherent accuracy of the new diagnostic test may influence the degree of verification bias [3]. In this setting, we vary the accuracy of the test by changing the values of a1 and b1. We consider the following four scenarios:

Scenario A (default setting): a1 = b1 = 1.
Scenario B: a1 = 1, b1 = 0.
Scenario C: a1 = 0.5, b1 = 0.5.
Scenario D: a1 = 0, b1 = 0.

Detailed information about the verification percentages can be found in Table 7.7. All other simulation set-up parameters, including sample size, verification mechanism and disease prevalence, stay the same as in the default setting. Results are presented in Table 7.8.

Since the CC estimator does not account for the biased sampling, it yields biased estimates of the VUS whenever there is potential for verification bias; in each scenario, CC underestimates the true VUS. Conversely, IPW is very close to the true value. Note that as the test accuracy decreases, the relative bias tends to increase, reflecting the smaller base value (the true VUS). The Jackknife variance estimates are similar to the simulation variance, and the 90% coverage rates are near their nominal values.

Table 7.7: Detailed information about verification rate: varying test accuracy

  Statistic                    Scenario A   Scenario B   Scenario C   Scenario D
  Verification rate (overall)  35%          36%          36%          36%
  Verification rate (D = 0)    26%          28%          28%          31%
  Verification rate (D = 1)    67%          59%          62%          51%
  Verification rate (D = 2)    89%          80%          84%          67%

Table 7.8 also provides the MSE of the VUS estimators for Scenarios A-D. Under all scenarios, IPW shows smaller MSE than CC, and the two IPW estimators perform comparably.

7.9 Varying Disease Prevalence

In this section we investigate the effect of varying disease prevalence on the performance of the proposed VUS estimators. Three scenarios are defined as follows:

Low prevalence: disease prevalence rates of 90%, 9% and 1% for the non-disease, mild-disease and severe-disease categories, respectively.
Medium prevalence: disease prevalence rates of 70%, 20% and 10%, respectively.
High prevalence: disease prevalence rates of 40%, 40% and 20%, respectively.
All other simulation settings, including sample size, verification mechanism and test accuracy, stay the same as in the default setting. Detailed information about the verification percentages can be found in Table 7.9.

Table 7.10 reports the average estimated VUS, the simulation SD and MSE of the estimated VUS, the average Jackknife-estimated SD, and the 90% CI coverage for the different disease prevalences. IPW is consistently closer to the true value and remains more accurate than CC under all prevalence settings. The Jackknife variance estimators generally work well in terms of consistency and CI coverage. Comparing the medium- and high-prevalence scenarios with the low-prevalence scenario, all estimators become more efficient, as reflected by the decreasing variance. This is reasonable: increasing the disease prevalence while keeping the verification mechanism fixed makes the study less likely to end up with extremely small numbers of verified subjects in each disease group, especially the severe-disease group. Again, IPW(E) has efficiency similar to IPW(K). In addition, as disease prevalence increases, the effective sample size increases and the MSE decreases accordingly. The IPW estimators consistently have smaller MSE than CC.

7.10 Varying Verification Mechanism

Next, we investigate the performance of the VUS estimators as the verification percentage varies. Four scenarios representing low, medium and high verification rates, as well as random verification, are considered. Detailed information about the verification percentages can be found in Table 7.11. All other simulation set-up parameters, including sample size, disease prevalence and test accuracy, stay the same as in the default setting.

In Table 7.12, for the first three scenarios with different verification rates, we find very little small-sample bias (less than 0.3%) in the IPW estimators. Conversely, CC underestimates the VUS (-1.2% to -5.4%). As expected, increasing the verification rate yields a larger number of complete cases, that is, a larger effective sample size for the CC estimator, so its bias decreases accordingly. Also, as the percentage of subjects receiving disease verification increases, the difference in efficiency between the full data estimator and the other estimators decreases.

In the fourth scenario, where verified subjects are selected completely at random, every subject shares the same verification probability. There is then little potential for verification bias, and the IPW estimator attaches the same weight (1/π_i) to each verified subject, making it essentially the same as the CC estimator. As expected, both IPW and CC are unbiased. This simulation supports that IPW remains valid and robust under different verification mechanisms. The same holds in terms of MSE, as IPW exhibits lower MSE than CC. The performance of IPW(E) is no different from that of IPW(K).

7.11 Summary

In this chapter, we studied the performance of the proposed IPW estimators with estimated or known verification probabilities, together with their variance estimators based on U-statistic asymptotic theory and the Jackknife approach. First, the simulation studies showed that, by ignoring the biased sampling in the verification procedure, the CC estimator yields biased estimates of the VUS.
Conversely, the IPW estimators perform quite well under different scenarios, including small sample sizes, rare disease and low verification rates. In addition, the MSEs of IPW(K) and IPW(E) are comparable and are consistently smaller than that of CC.

The Jackknife variance estimators were in general very similar to the empirical estimates of the variance, and the CI coverage was close to the nominal 90%. The closed-form variance estimator based on U-statistic theory for IPW(K) is similar to the simulation variance and attains good nominal 90% coverage probabilities. In comparison, the closed-form variance estimator for IPW(E) requires a larger sample size to approximate the simulation variance, likely because it does not account for the extra variance introduced by estimating the verification probabilities. We therefore suggest that the SD estimated by the Jackknife method may be preferable to the SD estimated by the U-statistic method when the sample size is small and the verification mechanism is unknown. When the sample size is large, given that computing the Jackknife variance is then very time-consuming while the closed-form estimator takes much less time, we recommend the closed-form estimator for assessing the variance of the IPW estimators. In practice it is not uncommon for the verification mechanism to be well understood or controlled by the researchers, in which case the closed-form variance estimator is less of an issue even for small sample sizes.

In the next chapter, a different simulation set-up is introduced to allow the evaluation of the IPW, DR and PDR VUS estimators together, under both ignorable and non-ignorable missingness.

[Figure 7.1: Full data (green) and CC (red) ROC surfaces from a randomly chosen realization of the simulation study; TPR0, TPR1 and TPR2 are the correct-classification rates for D = 0, 1 and 2, respectively.]

[Figure 7.2: Full data (green) and IPW (red) ROC surfaces from a randomly chosen realization of the simulation study; TPR0, TPR1 and TPR2 are the correct-classification rates for D = 0, 1 and 2, respectively.]

Table 7.8: Comparison of the VUS estimators: varying test accuracy.

  Method      VUS*             SD      SE†     Coverage (%)‡   MSE$
  Scenario A
  Full Data   0.792 (-0.1%)    0.028   0.029   91.3            0.86
  CC          0.746 (-5.8%)    0.034   0.035   91.6            3.34
  IPW(E)      0.793 (0.1%)     0.033   0.034   91.7            1.15
  IPW(K)      0.792 (0.0%)     0.033   0.034   91.5            1.14
  Scenario B
  Full Data   0.458 (0.2%)     0.036   0.036   90.1            1.27
  CC          0.402 (-12.1%)   0.041   0.040   88.7            4.69
  IPW(E)      0.459 (0.5%)     0.046   0.045   88.6            1.99
  IPW(K)      0.459 (0.4%)     0.046   0.044   88.4            1.98
  Scenario C
  Full Data   0.565 (-0.0%)    0.036   0.036   90.6            1.32
  CC          0.506 (-10.5%)   0.041   0.042   90.0            5.23
  IPW(E)      0.567 (0.2%)     0.045   0.045   89.9            2.00
  IPW(K)      0.566 (0.2%)     0.045   0.045   89.7            1.99
  Scenario D
  Full Data   0.167 (-0.0%)    0.023   0.023   90.0            0.52
  CC          0.136 (-18.1%)   0.025   0.025   90.0            1.55
  IPW(E)      0.168 (0.9%)     0.030   0.029   89.4            0.86
  IPW(K)      0.168 (0.9%)     0.030   0.029   89.7            0.86

  * Relative bias to the "True" estimate is provided in parentheses. SD is the simulation SD.
  † The average of the SD estimator (Jackknife).
  ‡ 90% CI coverage probabilities calculated using the SD estimator (Jackknife).
  $ MSE × 10^3 of the VUS estimators, calculated using the SD estimator (Jackknife).

Table 7.9: Detailed information about verification rate: varying disease prevalence

  Statistic                    Low prevalence   Medium prevalence   High prevalence
  True VUS                     0.85             0.78                0.81
  Verification rate (overall)  35%              35%                 35%
  Verification rate (D = 0)    30%              23%                 20%
  Verification rate (D = 1)    82%              53%                 23%
  Verification rate (D = 2)    93%              83%                 56%

Table 7.10: Comparison of the VUS estimators: varying disease prevalence

  Method      VUS*            SD      SE†     Coverage (%)‡   MSE$
  Low prevalence
  Full Data   0.850 (0.1%)    0.052   0.050   91.7            2.55
  CC          0.804 (-5.2%)   0.058   0.056   90.7            5.04
  IPW(E)      0.850 (0.1%)    0.056   0.053   91.2            2.84
  IPW(K)      0.850 (0.1%)    0.055   0.053   91.2            2.84
  Medium prevalence
  Full Data   0.774 (-0.1%)   0.023   0.022   89.6            0.49
  CC          0.743 (-4.2%)   0.031   0.029   88.8            1.90
  IPW(E)      0.774 (-0.1%)   0.030   0.029   88.8            0.86
  IPW(K)      0.774 (-0.2%)   0.030   0.029   89.1            0.86
  High prevalence
  Full Data   0.809 (-0.3%)   0.015   0.015   89.1            0.22
  CC          0.839 (3.5%)    0.028   0.028   89.9            1.61
  IPW(E)      0.811 (-0.0%)   0.031   0.032   89.9            0.99
  IPW(K)      0.811 (0.0%)    0.031   0.031   90.4            0.99

  * Relative bias to the "True" estimate is provided in parentheses. SD is the simulation SD.
  † The average of the SD estimator (Jackknife).
  ‡ 90% CI coverage probabilities calculated using the SD estimator (Jackknife).
  $ MSE × 10^3 of the VUS estimators, calculated using the SD estimator (Jackknife).

Table 7.11: Detailed information about verification rate: varying verification mechanism

  Statistic   Low verification   Medium verification   High verification   Random verification
  Overall     15%                54%                   84%                 36%
  D = 0       9%                 48%                   81%                 36%
  D = 1       30%                73%                   97%                 36%
  D = 2       63%                93%                   99%                 36%

Table 7.12: Comparison of the VUS estimators: varying verification rate

  Method      VUS*            SD      SE†     Coverage (%)‡   MSE$
  Low verification
  Full Data   0.792 (-0.1%)   0.028   0.029   91.8            0.86
  CC          0.750 (-5.4%)   0.048   0.049   89.6            4.19
  IPW(E)      0.795 (0.3%)    0.059   0.054   88.0            2.93
  IPW(K)      0.794 (0.2%)    0.058   0.053   88.3            2.86
  Medium verification
  Full Data   0.792 (-0.1%)   0.028   0.029   91.8            0.86
  CC          0.775 (-2.1%)   0.032   0.033   91.0            1.30
  IPW(E)      0.792 (-0.0%)   0.033   0.033   90.5            1.03
  IPW(K)      0.792 (-0.0%)   0.032   0.033   90.7            1.03
  High verification
  Full Data   0.792 (-0.1%)   0.028   0.029   91.8            0.86
  CC          0.783 (-1.2%)   0.029   0.030   91.0            0.97
  IPW(E)      0.792 (-0.1%)   0.029   0.030   91.2            0.87
  IPW(K)      0.792 (-0.1%)   0.029   0.030   91.2            0.87
  Random verification
  Full Data   0.792 (-0.1%)   0.028   0.029   91.8            0.86
  CC          0.793 (0.1%)    0.055   0.055   90.9            2.98
  IPW(E)      0.793 (0.2%)    0.055   0.055   91.0            3.01
  IPW(K)      0.794 (0.1%)    0.055   0.055   90.9            2.98

  * Relative bias to the "True" estimate is provided in parentheses. SD is the simulation SD.
  † The average of the SD estimator (Jackknife).
  ‡ 90% CI coverage probabilities calculated using the SD estimator (Jackknife).
  $ MSE × 10^3 of the VUS estimators, calculated using the SD estimator (Jackknife).

Chapter 8

Simulation Studies: IPW, DR and PDR Methodology

In the previous chapter, we focused on evaluating the performance of the IPW estimators under ignorable missingness. In practice, the observed factors may not fully capture the association between the verification mechanism and the disease status, so that even after conditioning on the observed variables the missingness remains non-ignorable. In Chapter 5, we proposed IPW, DR and PDR estimators that account for non-ignorable missingness. Here, we perform simulation studies to evaluate and compare the finite-sample behavior of the DR, PDR and IPW VUS estimators when the missingness in disease verification is non-ignorable.
8.1 Simulation Set-up

The simulation set-up is similar to that of Rotnitzky et al. (2006) [31] but extended to the three-class problem. For each subject, we generated an indicator of disease status with three ordinal categories, with (P(D = 0), P(D = 1), P(D = 2)) = (0.7, 0.2, 0.1); a continuous new diagnostic test from the model T | D = d ~ N(d, 0.25); a covariate A | D = d ~ N(0.5d, 0.25); and a verification status indicator V that follows model (5.1) with h(T, A) = γ0 + γ1 T + γ2 A and q(T, A) = α. In this setting, the disease prevalence rates of the non-disease, mild-disease and severe-disease groups are thus 70%, 20% and 10%, respectively. Note that by choosing α = 0, the missingness in disease verification is ignorable; in other words, the MAR assumption is satisfied. Scenarios with non-ignorable missingness are obtained simply by choosing α ≠ 0.

In the estimation procedure, a generalized linear model with logit link for verification given T and A (given T, A and D for non-ignorable missingness) is the correct verification model, and a generalized linear model with logit link for D given T and A is the correct disease model. For comparison purposes, estimates based on the "Full Data" (i.e., using all simulated data as if all subjects had their disease status verified), which are not subject to verification bias, are also presented. Simulation results under the different scenarios are summarized over 1000 replications.

Next, we define short notation for the proposed estimators. The notation of Chapter 7, including "CC", "Full Data", "IPW(E)" and "IPW(K)", still applies here. "IPW(E)" and "IPW(K)" have the same structure as the IPW estimators of Chapter 7; the only difference is that here the verification probability is estimated accounting for potentially non-ignorable missingness. In addition, we consider two DR estimators: "DR1" reports results for the DR VUS using the estimate of γ solving (5.17) with d(T, A) = (1, T, A)', while "DR2" uses the optimal estimate of γ solving (5.17) with d_opt(T, A) in (5.18). "PDR" denotes the PDR VUS estimator. Again, the "True" estimate is obtained as the average of the "Full Data" estimates across 1000 realizations with a sample size of 5000, and represents the true value of the VUS.

8.2 Working Models

Correct working models for the disease process (P(D = d | T, A, V = 1)), for the verification process (P(V = 1 | T, A, D)), or for both are required for the validity of the proposed estimators. Under our data-generating process, the correct working disease model is

  log[ P(D = 1 | V = 1, T, A) / P(D = 0 | V = 1, T, A) ] = log(o1(T, A)) + β01 + β11 T + β21 A
  log[ P(D = 2 | V = 1, T, A) / P(D = 0 | V = 1, T, A) ] = log(o2(T, A)) + β02 + β12 T + β22 A      (8.1)

where o1(T, A) = [1 + exp(γ0 + γ1 T + γ2 A)] / [1 + exp(γ0 + γ1 T + γ2 A + α)] and o2(T, A) = [1 + exp(γ0 + γ1 T + γ2 A)] / [1 + exp(γ0 + γ1 T + γ2 A + 2α)]. The correctly specified model for selection in (5.10) is

  h(T, A; γ) = γ0 + γ1 T + γ2 A      (8.2)

with α = 0 (ignorable missingness) or α = -1 (non-ignorable missingness) specified and known.
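As a concrete sketch of this design, the snippet below generates one realization. The exact functional form of model (5.1) is not reproduced in this chapter, so the way q(T, A) = α enters the verification logit here (linearly in D, with this sign convention) is an assumption, as are the default coefficient values passed in.

import numpy as np

rng = np.random.default_rng(8)

def simulate_ch8(n=1000, g=(-1.0, 1.0, 0.5), alpha=0.0):
    """One realization of the section-8.1 design; alpha = 0 gives MAR, while a
    nonzero alpha adds a residual dependence of verification on D itself."""
    d = rng.choice(3, size=n, p=[0.7, 0.2, 0.1])   # P(D = 0, 1, 2) = 70/20/10%
    t = rng.normal(d, 0.5)                          # T | D=d ~ N(d, 0.25)
    a = rng.normal(0.5 * d, 0.5)                    # A | D=d ~ N(0.5d, 0.25)
    logit = g[0] + g[1] * t + g[2] * a + alpha * d  # h(T,A) + q(T,A)*D (assumed)
    v = rng.binomial(1, 1.0 / (1.0 + np.exp(-logit)))
    return t, a, d, v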
8.3 Small-Sample Bias: VUS Estimators

8.3.1 Under Ignorable Missingness

We first assess how well the new methods correct verification bias at different sample sizes when the verification process is ignorable (i.e., MAR). For the verification model we choose (γ0, γ1, γ2, α) = (-1, 1, 0.5, 0). These values result in an average of 39% of the subjects receiving disease verification; the percentages of disease verification among subjects in the non-disease (D = 0), mild-disease (D = 1) and severe-disease (D = 2) groups are 28%, 56% and 80%, respectively. Note that in this case the MAR assumption is satisfied. Results are presented for scenarios in which the study sample size n varies from 100 to 2000.

By showing the mean estimates of the VUS across 1000 realizations, Table 8.1 demonstrates the finite-sample behavior of the new estimators under the MAR assumption. The naïve CC estimator does not correct for the biased sampling and thus, as expected, underestimates the VUS at every sample size (relative bias of 1.8%-2.2% in absolute value). Very little bias is observed for the bias-corrected estimators: the IPW(E), IPW(K), DR1 and DR2 VUS estimators are very close to the true VUS (magnitude of relative bias < 0.3%). Even when the sample size is as small as 100, their bias is negligible, and unsurprisingly their average estimates move closer to the true VUS as the sample size increases. Comparing the new estimators at small sample sizes, the DR and IPW estimators generally perform better than the PDR estimator; once the sample size exceeds 200, the magnitude of the bias for PDR is also negligible.

Figure 8.1 illustrates that the CC ROC surface underestimates the full data ROC surface, while in Figure 8.2 little bias is observed in the IPW bias-corrected ROC surface compared with the full data surface.

In conclusion, under ignorable missingness our estimators perform very well with respect to bias for sample sizes as small as 100. The IPW and DR estimators have somewhat better overall performance than the PDR estimator, which requires a larger sample size to achieve comparable results. As expected from the theoretical development, the bias-correction procedures provide nearly unbiased estimators at finite sample sizes.

Table 8.1: Mean estimated VUS from 1000 realizations of the simulation with different sample sizes, under ignorable missingness (true VUS: 0.843). Relative bias (%), (mean VUS - 0.843)/0.843, is provided in parentheses.

  Method      n = 100        200            500            800            1000           2000
  Full Data   0.841 (-0.3)   0.843 (-0.0)   0.844 (0.0)    0.842 (-0.1)   0.843 (-0.0)   0.843 (0.0)
  CC          0.824 (-2.2)   0.826 (-2.0)   0.828 (-1.8)   0.827 (-1.9)   0.826 (-2.0)   0.827 (-2.0)
  DR1         0.841 (-0.3)   0.844 (0.1)    0.844 (0.1)    0.843 (-0.0)   0.843 (-0.0)   0.843 (-0.0)
  DR2         0.841 (-0.2)   0.844 (0.1)    0.844 (0.1)    0.843 (-0.0)   0.843 (-0.0)   0.843 (-0.0)
  IPW(E)      0.842 (-0.1)   0.843 (0.0)    0.844 (0.1)    0.843 (-0.1)   0.842 (-0.1)   0.843 (-0.0)
  IPW(K)      0.841 (-0.2)   0.843 (-0.0)   0.844 (0.1)    0.842 (-0.1)   0.842 (-0.1)   0.843 (-0.0)
  PDR         0.830 (-1.6)   0.834 (-1.1)   0.841 (-0.2)   0.840 (-0.3)   0.841 (-0.2)   0.842 (-0.1)

8.3.2 Under Non-ignorable Missingness

As discussed in previous chapters, under ignorable missingness the probability that a subject has the disease status verified is determined entirely by the test result and the subject's observed characteristics, and is conditionally independent of the unknown true disease status. Our methods can also accommodate non-ignorable missingness. We conducted the following simulation experiments to evaluate how well the new estimators correct verification bias when the missingness is non-ignorable. To represent this scenario, we choose (γ0, γ1, γ2, α) = (-1, 1, 0.5, -1) for the verification model.
Because α denotes the residual association between the unknown disease status and the verification process, this setting assumes that, within levels of T and A, the odds of having the disease status verified are about 2.7 times higher for subjects with the middle disease stage (i.e., D = 1) than for subjects without disease. In this case, roughly 45% of the disease statuses are verified; the percentages of disease verification among subjects in the non-disease (D = 0), mild-disease (D = 1) and severe-disease (D = 2) groups are 28%, 76% and 97%, respectively.

Table 8.2: Mean estimated VUS from 1000 realizations of the simulation with different sample sizes, under non-ignorable missingness (true VUS: 0.843). Relative bias (%), (mean VUS - 0.843)/0.843, is provided in parentheses.

  Method      n = 100        200            500            800            1000           2000
  Full Data   0.842 (-0.2)   0.843 (0.0)    0.844 (0.1)    0.842 (-0.1)   0.843 (-0.0)   0.843 (0.0)
  CC          0.814 (-3.5)   0.814 (-3.5)   0.816 (-3.2)   0.814 (-3.4)   0.814 (-3.4)   0.814 (-3.4)
  DR1         0.842 (-0.2)   0.842 (-0.1)   0.844 (0.1)    0.843 (-0.0)   0.843 (-0.0)   0.843 (-0.0)
  DR2         0.841 (-0.2)   0.842 (-0.1)   0.844 (0.1)    0.843 (-0.0)   0.843 (-0.0)   0.843 (-0.0)
  IPW(E)      0.841 (-0.2)   0.841 (-0.2)   0.844 (0.1)    0.843 (-0.0)   0.842 (-0.1)   0.843 (-0.0)
  IPW(K)      0.842 (-0.2)   0.842 (-0.2)   0.844 (0.1)    0.842 (-0.1)   0.843 (-0.0)   0.843 (-0.0)
  PDR         0.835 (-1.0)   0.838 (-0.7)   0.842 (-0.1)   0.841 (-0.2)   0.842 (-0.1)   0.842 (-0.1)

Table 8.2 presents the mean estimates of the VUS across the 1000 realizations for a range of sample sizes. Under non-ignorable missingness, the naïve CC estimator again leads to substantial negative bias of roughly 3%, which cannot be removed by a larger sample size. For a large sample size such as 2000, all new estimators (IPW(E), IPW(K), DR1, DR2 and PDR) are unbiased. For a small sample size of 100, the largest bias among the proposed estimators is approximately -1%, for PDR, which is noticeably smaller than that of the naïve estimator. Again, Figure 8.3 illustrates that CC underestimates the full data ROC surface, while the closeness of the full data and IPW bias-corrected ROC surfaces is evident in Figure 8.4.

8.4 Performance of VUS: Variance

8.4.1 Under Ignorable Missingness

Next we consider the performance of the Jackknife variance estimators for the VUS as a function of sample size. We focus on the ratio of the Monte Carlo mean of the Jackknife-estimated SD (averaged over the 1000 realizations) to the Monte Carlo SD of the estimated VUS (the simulation SD). Again, the simulation SD is calculated from 1000 realizations and reflects the true variability of the VUS estimators.

Under the MAR setting described in detail in Section 8.3.1, Table 8.3 provides the ratio of the estimated SD to the simulation SD. For small sample sizes such as 100 and 200, the Jackknife estimator is quite comparable to the simulated variability (ratios ranging from 0.82 to 1.01), and as the sample size grows the estimated variance is essentially unbiased (ratios between 0.93 and 1.06). We observe that when the sample size is small, the variance estimators for DR and PDR perform somewhat worse than the one for IPW. In general, the variance estimator performs fairly well even at small sample sizes, and can therefore be used to generate interval estimates in small samples.
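The coverage figures reported below are, in effect, the fraction of replications whose confidence interval contains the truth. A minimal sketch, assuming Wald-type intervals built from each replication's estimate and Jackknife SD (the text does not spell out the interval form):

import numpy as np
from scipy.stats import norm

def ci_coverage(estimates, sds, true_vus, level=0.90):
    """Share of replications whose Wald interval, estimate +/- z * SD,
    contains the true VUS; `estimates` and `sds` are arrays over replications."""
    z = norm.ppf(0.5 + level / 2.0)   # about 1.645 for a nominal 90% interval
    return np.mean((estimates - z * sds <= true_vus) &
                   (true_vus <= estimates + z * sds))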
Table 8.4 presents the 90% coverage rates for our proposed estimators. Overall, nominal coverage probabilities between 81.4% and 92.1% are achieved for all the estimators. For a small sample size of 100, the proposed estimators still give good performance, with coverage rates above 80%, and better performance is observed at larger sample sizes, with coverage rates ranging from 86.1% to 92.1%.

Table 8.3: The ratio of the Monte Carlo mean of the estimated SD to the simulation SD of the estimators, under ignorable missingness.

  Method      n = 100   200     500     800     1000    2000
  CC          1.012     0.977   1.059   0.994   1.012   0.999
  Full Data   0.974     0.956   1.007   1.002   1.046   1.031
  DR1         0.902     0.928   1.017   0.979   1.002   0.995
  DR2         0.913     0.929   1.020   0.980   1.003   0.995
  IPW(E)      0.970     0.970   1.047   0.982   1.009   1.004
  IPW(K)      0.965     0.965   1.044   0.982   1.008   0.999
  PDR         0.977     0.818   0.977   0.933   0.984   0.989

Table 8.4: 90% CI coverage probability (%) of the VUS variance estimators, under ignorable missingness.

  Method      n = 100   200    500    800    1000   2000
  Full Data   82.7      88.0   89.9   89.8   92.1   91.2
  CC          83.3      88.4   91.0   90.0   89.4   90.9
  DR1         82.2      87.1   90.2   90.1   90.6   89.1
  DR2         81.4      87.7   90.5   90.3   90.6   89.2
  IPW(E)      82.7      89.2   90.9   89.8   89.9   90.7
  IPW(K)      82.8      89.0   90.4   89.7   89.8   90.7
  PDR         83.7      86.1   90.3   88.8   89.4   89.7

8.4.2 Under Non-ignorable Missingness

We further investigate the performance of the Jackknife variance estimator under non-ignorable missingness. Table 8.5 shows that the estimated SD and the simulation SD are similar, with ratios ranging from 0.91 to 1.07. Consistently, for small sample sizes (100 or 200), the variance estimators for IPW, Full Data and CC perform better than those for DR and PDR. When the sample size is small the average estimated SD tends to be slightly smaller than the simulation SD, while once the sample size reaches 500 it tends to be slightly larger. The 90% coverage probabilities presented in Table 8.6 are also in agreement with the consistency of the variance estimator, with coverage rates close to the nominal 90% (ranging from 84.5% to 92.5%). The good performance of the variance estimator, together with the satisfactory coverage under non-ignorable missingness, further justifies its practical use for generating interval estimates.

8.5 Performance of VUS: Relative Efficiency

To compare the efficiency of the different VUS estimators, both SSRE and ERE are calculated; details of the SSRE and ERE calculations can be found in Section 7.7. SSRE relative to DR2 under ignorable and non-ignorable missingness is presented in Tables 8.7 and 8.8, respectively; an SSRE value less than 1 indicates that the estimator is more efficient than DR2. Generally, under both ignorable and non-ignorable missingness, DR2 is the most efficient bias-corrected VUS estimator. Comparing IPW(E) and IPW(K) suggests that the IPW estimator with estimated verification probabilities is more efficient than the one with known probabilities. As expected, the Full Data estimator is more efficient than the others because of its larger effective sample size. ERE relative to DR2 is included in parentheses; the closeness of ERE to SSRE again indicates good performance of the Jackknife variance estimator.

8.6 Performance of VUS: MSE

The MSE of each estimator when the MAR assumption holds is compared in Table 8.9. For DR, the MSE is smaller than the MSE of CC (by between 19% and 50%); IPW also has lower MSE than CC (by between 9% and 46%). For small sample sizes of 100 and 200, the relatively larger variation causes PDR to have slightly higher MSE than CC.
With sample sizes larger than 200, the PDR MSE is comparable to the MSE of the other estimators. Table 8.10 provides the MSE results when the missingness is non-ignorable. All bias-corrected estimators have lower MSE than CC; DR performs best, and IPW performs similarly to PDR.

Table 8.5: Ratio of the Monte Carlo mean of the estimated SD to the simulation SD of the estimators, under non-ignorable missingness.

  Method      n = 100   200     500     800     1000    2000
  CC          0.975     0.964   1.024   1.014   1.044   1.004
  Full Data   0.959     0.955   1.007   1.002   1.046   1.031
  DR1         0.922     0.936   0.995   1.012   1.045   1.009
  DR2         0.954     0.942   0.998   1.015   1.050   1.012
  IPW(E)      0.995     0.982   1.033   1.029   1.072   1.035
  IPW(K)      0.978     0.966   1.023   1.013   1.047   1.015
  PDR         0.919     0.912   0.982   0.986   1.032   1.006

Table 8.6: 90% CI coverage probability (%) of the VUS variance estimators, under non-ignorable missingness.

  Method      n = 100   200    500    800    1000   2000
  Full Data   85.3      88.6   89.9   89.8   92.1   91.2
  CC          86.2      88.8   90.2   91.0   91.6   90.4
  DR1         84.5      88.1   89.8   91.1   91.9   90.4
  DR2         85.0      88.5   89.7   91.1   91.9   90.6
  IPW(E)      87.0      89.3   90.3   91.5   92.5   91.7
  IPW(K)      86.1      88.7   90.3   90.5   91.9   90.7
  PDR         84.8      89.0   89.8   90.9   91.6   90.2

8.7 Default Setting

In the remainder of this chapter we focus on the non-ignorable missingness situation, keeping the current setting for the residual association between disease status and missingness (α = -1). We investigate the effects of test accuracy, disease prevalence, verification rate and model correctness on the performance of the VUS estimators. In this simulation setting, q(T, A) is specified as a constant α for the DR estimators; as suggested before, one would choose a series of values of α and run a sensitivity analysis. Since Chapter 7 already provides comprehensive simulations under ignorable missingness, here we mainly consider the non-ignorable case. We first fix the default setting and then change one factor at a time to study its influence on the proposed estimators. The default sample size is 1000, and we continue to use the simulation setting of Section 8.3.2.

8.8 Varying Test Accuracy

We define the accuracy of the new diagnostic test through the model T | D = d ~ N(μ1 d, 0.25); in the default setting, μ1 = 1. Here we change the test accuracy by varying μ1, considering the following three scenarios:

Low accuracy: μ1 = 0.3.
Medium accuracy: μ1 = 0.7.
High accuracy: μ1 = 1.2.

Table 8.7: Efficiency relative to DR2 under ignorable missingness. ERE relative to DR2 is provided in parentheses.

  Method      n = 100         200             500             800             1000            2000
  Full Data   0.735 (0.784)   0.753 (0.774)   0.762 (0.753)   0.728 (0.745)   0.715 (0.745)   0.717 (0.743)
  CC          1.000 (1.108)   1.009 (1.061)   0.980 (1.018)   0.990 (1.004)   0.993 (1.002)   0.995 (0.999)
  DR1         1.021 (1.008)   1.005 (1.003)   1.001 (0.999)   1.001 (1.000)   1.002 (1.001)   1.000 (1.000)
  IPW(E)      1.019 (1.082)   1.015 (1.060)   1.012 (1.039)   1.033 (1.035)   1.027 (1.033)   1.023 (1.032)
  IPW(K)      1.025 (1.082)   1.022 (1.062)   1.014 (1.038)   1.033 (1.036)   1.028 (1.033)   1.026 (1.031)
  PDR         1.210 (1.294)   1.277 (1.125)   1.109 (1.063)   1.098 (1.046)   1.051 (1.030)   1.021 (1.015)
Table 8.8: Efficiency relative to DR2 under non-ignorable missingness. ERE relative to DR2 is provided in parentheses.

  Method      n = 100         200             500             800             1000            2000
  Full Data   0.889 (0.893)   0.874 (0.886)   0.897 (0.885)   0.888 (0.885)   0.867 (0.884)   0.717 (0.743)
  CC          1.104 (1.129)   1.093 (1.119)   1.108 (1.106)   1.112 (1.105)   1.112 (1.104)   0.995 (0.999)
  DR1         1.038 (1.003)   1.010 (1.003)   1.003 (1.000)   1.005 (1.000)   1.003 (1.000)   1.000 (1.000)
  IPW(E)      1.014 (1.058)   1.003 (1.046)   1.016 (1.030)   1.007 (1.028)   1.003 (1.027)   1.023 (1.032)
  IPW(K)      1.021 (1.047)   1.015 (1.041)   1.032 (1.030)   1.030 (1.027)   1.023 (1.026)   1.026 (1.031)
  PDR         1.088 (1.048)   1.096 (1.061)   1.042 (1.012)   1.020 (1.002)   1.001 (0.996)   1.021 (1.015)

Detailed information about the true VUS and the verification percentages can be found in Table 8.11. All other simulation settings, including sample size, verification mechanism and disease prevalence, stay the same as in the default setting.

Table 8.12 summarizes the simulation results. The proposed VUS estimators are more accurate than CC, which yields negatively biased estimates of the VUS (magnitude of relative bias between 2.1% and 13.5%). The DR, PDR and IPW estimators give unbiased results and have similar efficiency, although DR in general tends to be the most efficient of the three. The Jackknife variance estimates are close to the simulation variance, and the 90% coverage rates are largely retained. Table 8.12 also indicates that DR generally has the lowest MSE; PDR performs comparably, though its MSE can be somewhat higher than that of DR when the accuracy is low. The two IPW estimators have very similar MSEs, which are higher than those of PDR and DR, with the difference increasing as the accuracy decreases.

Table 8.9: MSE × 10^3 of the VUS estimators, ignorable missingness. Jackknife variance estimator used for all the estimators.

  Method      n = 100   200    500    800    1000   2000
  Full Data   3.40      1.60   0.61   0.38   0.30   0.15
  CC          7.14      3.29   1.34   0.96   0.82   0.54
  DR1         5.63      2.68   1.07   0.69   0.55   0.27
  DR2         5.53      2.67   1.07   0.69   0.55   0.27
  IPW(E)      6.48      2.99   1.16   0.74   0.58   0.29
  IPW(K)      6.47      3.00   1.16   0.74   0.58   0.29
  PDR         9.44      3.46   1.21   0.76   0.58   0.28

Table 8.10: MSE × 10^3 of the VUS estimators, non-ignorable missingness. Jackknife variance estimator used for all the estimators.

  Method      n = 100   200    500    800    1000   2000
  Full Data   3.37      1.60   0.61   0.38   0.30   0.15
  CC          6.24      3.41   1.69   1.43   1.31   1.07
  DR1         4.25      2.04   0.77   0.49   0.39   0.19
  DR2         4.23      2.03   0.77   0.49   0.39   0.19
  IPW(E)      4.73      2.22   0.82   0.52   0.41   0.20
  IPW(K)      4.63      2.20   0.82   0.52   0.41   0.20
  PDR         4.71      2.32   0.79   0.50   0.39   0.19

Table 8.11: Detailed information about verification rate: varying test accuracy.

  Statistic                    Low accuracy   Medium accuracy   High accuracy
  True VUS                     0.376          0.682             0.910
  Verification rate (overall)  41%            43%               45%
  Verification rate (D = 0)    28%            28%               28%
  Verification rate (D = 1)    63%            71%               80%
  Verification rate (D = 2)    88%            94%               98%

8.9 Varying Disease Prevalence

In this section we investigate whether our estimators remain valid and robust as the prevalence rates of the disease stages vary. In the default setting, (P(D = 0), P(D = 1), P(D = 2)) = (0.7, 0.2, 0.1); by changing these three multinomial probabilities we vary the disease prevalence. Three scenarios are considered:

Low prevalence: disease prevalence rates of 90%, 9% and 1% for the non-disease, mild-disease and severe-disease categories.
Medium prevalence: disease prevalence rates of 50%, 30% and 20%, respectively.
High prevalence: disease prevalence rates of 20%, 45% and 35%, respectively.
Detailed information about the verification percentages can be found in Table 8.13. All other simulation settings, including sample size, verification mechanism and test accuracy, stay the same as in the default setting.

As shown in Table 8.14, CC consistently underestimates the VUS by roughly 3.4% across the different disease prevalence settings. All bias-corrected estimators perform very well, although PDR tends to have higher bias than the other estimators. All the bias-corrected VUS estimators have similar efficiency based on the average simulation SD. When the disease prevalence is very high, the proposed estimators tend not to perform quite as well as under the medium-prevalence scenarios; such instability might be due to the relatively small effective sample size in the non-diseased group. Again, the Jackknife variance estimates resemble the simulation variance, and the 90% coverage rates are close to the nominal level.

Considering MSE, CC has the highest MSE, as expected. In general, DR performs better than IPW and PDR, while IPW and PDR have similar efficiency. Interestingly, when the disease prevalence is high, IPW and PDR both have smaller MSE than DR; we do not currently understand the reason behind this finding.

8.10 Varying Verification Mechanism

The magnitude of verification bias also depends on the verification mechanism. Here we vary the verification mechanism to further assess the performance of our methods. Four scenarios representing low, medium and high verification rates, as well as random verification, are considered; detailed information about the verification percentages can be found in Table 8.15.

Low verification: overall, the probability of verification in the population is about 24%; verification rates are 7%, 50% and 94% for D = 0, D = 1 and D = 2, respectively.
Medium verification: overall verification probability about 46%; verification rates of 29%, 80% and 98% for D = 0, D = 1 and D = 2, respectively.
High verification: overall verification probability about 81%; verification rates of 74%, 99% and 100% for D = 0, D = 1 and D = 2, respectively.
Random verification: 40% of the subjects under study are chosen at random to receive disease verification.

All other simulation settings, including sample size, disease prevalence and test accuracy, stay the same as in the default setting.

Table 8.16 summarizes the performance of all the estimators; note that the Full Data estimator is shared by all the scenarios. For the first three scenarios, with low, medium or high verification rates, there is little bias (< 0.9%) in the estimated VUS for any method except CC. The bias of CC decreases as the percentage of subjects verified increases (magnitude of relative bias ranging from 1.9% to 6.7%). When the verification rate is very low (24% overall), the differences between the bias-corrected estimators and the true value remain small (magnitude of relative bias ranging from 0.1% to 0.8%), and the bias is much smaller than that of CC (magnitude of relative bias: 6.7%). Among the correction methods, PDR tends to have the largest bias and variance. With random verification, where verified subjects are selected completely independently of any observed or unobserved variables, there is no verification bias and CC is very close to the true value.
All VUS estimators are less efficient when the verification rates are very low, probably because these estimators contain the term V/π, so their variance is inflated when π is small. The Jackknife variance estimators still give good performance in all scenarios, and the 90% coverage rates are retained (between 89.7% and 92.9%). In terms of MSE (Table 8.16), the MSE of all the estimators decreases as the verification rate increases, and DR, IPW and PDR are consistently more efficient than CC.

8.11 Model Misspecification

IPW requires a correct model for the probability of verification, while DR requires a correct model for either the probability of disease among verified subjects or the verification probability. PDR, in turn, needs a correct working model for disease using the whole sample. It is therefore important to evaluate the behavior of our new estimators in finite samples under correct and incorrect specification of the disease and/or the selection model. Up to this point in the chapter, correct model specification has been considered; next we consider incorrect specification.

In practice, verification probabilities may not be known, so estimation from the available data is required. This is a very common situation in observational studies. At the beginning of a randomized trial the probabilities of disease verification may be known from the study design, but the actual selection procedure may become unknown due to drop-out, loss to follow-up, or other unpredictable reasons. On the other hand, even when the selection procedure for disease status verification is well understood and controlled by the investigators at the design stage, the model for disease may not be valid.

We considered situations in which the disease model, the verification model, or both are severely misspecified, to explore how robust the new estimators are in terms of correcting verification bias. In our simulations we estimate the VUS using the proposed approaches under the following four scenarios for the working models (a code sketch of the misspecifications follows this list):

1. Both Correct: both working models are correctly specified as in (8.1) and (8.2).
2. Wrong V: the selection model for verification is incorrectly specified, setting h(T, A; γ) = γ0 + γ1 log|T| + γ2 I(A > 0).
3. Wrong D: the disease model is incorrectly specified, setting log[ P(D = d | V = 1, T, A) / P(D = 0 | V = 1, T, A) ] = log(o_d(T, A)) + β0d + β1d T^3 + β2d I(A > 0), for d = 1, 2.
4. Both Wrong: both working models are incorrectly specified, as in (2) and (3).

All other simulation settings, including sample size, disease prevalence, verification mechanism and test accuracy, stay the same as in the default setting.
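A small sketch of the covariate sets entering the correct and misspecified working models; the helper names are hypothetical, and fitting code (logistic for V, multinomial logistic for D) is omitted.

import numpy as np

def verification_design(t, a, wrong=False):
    """Design matrix for the verification working model: the correct model
    (8.2) uses (1, T, A); 'Wrong V' replaces them with (1, log|T|, I(A>0))."""
    if wrong:
        return np.column_stack([np.ones_like(t), np.log(np.abs(t)),
                                (a > 0).astype(float)])
    return np.column_stack([np.ones_like(t), t, a])

def disease_design(t, a, wrong=False):
    """Covariates entering the disease working model (8.1): the correct model
    uses (T, A); 'Wrong D' replaces them with (T**3, I(A>0))."""
    if wrong:
        return np.column_stack([t ** 3, (a > 0).astype(float)])
    return np.column_stack([t, a])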
Table 8.17 shows the simulation results. Note that the Full Data, CC and IPW(K) estimators stay the same in all the scenarios because they are not affected by misspecified disease or verification models. When only the disease model is incorrect, IPW(E) has very little bias (magnitude of relative bias: 0.1%), while PDR shows bias of 1.7% because its estimation procedure relies on correct specification of the disease model. When only the verification model is incorrect, IPW(E) has bias of -2.3%, because the estimated verification probabilities are no longer valid, and PDR also yields bias (magnitude of relative bias: 1.1%), since correctness of the verification model is also required for the PDR approach.

On the contrary, because the working disease model is correct in that scenario, the DR estimators have little bias (magnitude of relative bias < 0.2%), which supports the theoretical result that the DR estimators are doubly robust. As predicted by theory, the simulation results show that the DR estimators largely remove the noticeable bias of the CC estimator (-3.4%) in any of the first three scenarios, in which either the disease or the verification process is correctly modeled. When both working models are incorrectly specified, DR leads to biased results (magnitude of relative bias ranging from 3.3% to 4.0%). For PDR, the magnitude of bias is higher when both models are wrong (3.2%) than when only one of the two models is misspecified.

Table 8.17 also suggests that as long as either the disease model or the verification model is correct, the MSE of DR remains comparable. When the verification model is wrong, the MSE of IPW(E) is about 1.22 times its value under a correct verification model. In general, when either one of the two models, or both, are incorrect, PDR yields higher MSE. Considering variance and MSE, it is interesting that PDR is more sensitive to misspecification of the disease model; this is probably because the simulation setting makes the PDR estimation procedure more sensitive to the selected type of disease-model misspecification.

Overall, as expected from the theoretical development, the DR procedure provides unbiased estimates whenever at least one of the working models is correctly specified. In addition, it performs well at small sample sizes, providing substantially less biased estimates than the naïve estimator, but it performs poorly in bias when both working models are incorrect. The IPW(E) estimator is clearly not robust to verification-model misspecification, as it relies on correct weights (verification probabilities) for the bias correction. By a similar argument, the PDR estimator is sensitive to both verification- and disease-model misspecification.

8.12 Alpha Sensitivity

To overcome the limitations of inference based on unverifiable assumptions, Little and Rubin advocate running sensitivity analyses and reporting results under a range of plausible assumptions [20]. We can therefore conduct a sensitivity analysis over a series of plausible values of α, the user-specified residual association between disease status and verification status after adjusting for observed factors. In the current simulation setting, the true value of α is -1.
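Such a sensitivity analysis amounts to re-running a bias-corrected estimator while treating each candidate α as known and reporting the spread of the resulting estimates. A minimal sketch; the `estimator` callable is a hypothetical wrapper around, for example, a DR implementation of Chapter 5, which is not reproduced here.

def sensitivity_over_alpha(estimator, data, alpha_grid=(-3, -2, -1, 0, 1)):
    """Run a bias-corrected VUS estimator across a grid of assumed alpha
    values; `estimator(data, alpha)` returns a scalar VUS estimate."""
    return {alpha: estimator(data, alpha) for alpha in alpha_grid}

# usage: results = sensitivity_over_alpha(dr_vus, (t, a, d, v))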
Note that the Full Data, CC, IPW(K) and PDR estimators stay the same in all the scenarios because they do not rely on the correctness of the non-ignorable parameter. In Table 8.18, the DR estimator with a misspecified α can be biased (magnitude of relative bias ranging from 0.4% to 3.4%), and the bias tends to be more sensitive to over-estimation of α (magnitude of relative bias between 0.8% and 3.4%). When α is under-estimated, the bias of the DR estimator is considerably smaller (magnitude of relative bias between 0.4% and 1.1%) than that of the CC estimate (magnitude of relative bias: 3.4%). The validity of the IPW estimator is influenced less substantially by misspecification of α (magnitude of relative bias < 1.4%) than that of the DR estimator. Considering MSE, incorrect specification of α increases the MSE of DR and IPW(E). The magnitude of the change in MSE tends to be larger when α is biased upward, and, consistently, the MSE of IPW(E) is less affected by an incorrect α than the MSE of DR.

8.13 Ties in Test Results

In the above scenarios we assumed that there are no tied test results. In practice, one reasonable approach to tied test results is to add a small amount of uniformly distributed random "jitter" to each test result to break the ties. In this section we study the sensitivity of the proposed estimators to the amount of jitter added. We estimate the VUS over a series of jitter values, adding a uniform number on [-jitter, +jitter] to each test result. We tried four values, 0.001, 0.0001, 0.00001 and 0.000001, and they all give almost identical results in terms of bias and variance (the Euclidean distance between the average VUS estimates is less than 10^{-10}). Since adding a small amount of randomness to the data does not affect the validity of the results, our methods can also handle tied test results.
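A minimal sketch of the jitter device just described; the function name and default half-width are illustrative.

import numpy as np

def jitter_ties(t, half_width=1e-5, rng=None):
    """Add Uniform(-half_width, +half_width) noise to each test result to
    break ties; section 8.13 reports essentially identical results for
    half-widths from 1e-3 down to 1e-6."""
    rng = rng or np.random.default_rng()
    t = np.asarray(t, dtype=float)
    return t + rng.uniform(-half_width, half_width, t.size)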
8.14 Summary

Naïve CC VUS estimators that do not account for the biased sampling yield biased estimates of the VUS. In contrast, our proposed bias-corrected VUS estimators produce unbiased estimates even in small samples, under both ignorable and non-ignorable missingness. DR2 (the DR VUS using the optimal γ estimate solving (5.17) with d_opt(T, A) in (5.18)) is in general the most efficient. The PDR estimator has higher bias at small sample sizes than the DR and IPW estimators, but its bias is considerably smaller than that of the CC estimator and becomes negligible as the sample size increases.

Both ERE and SSRE relative to DR2 show that, aside from the Full Data estimator, which uses all subjects under study, DR2 is the most efficient, with PDR and IPW having similar efficiency. For the Jackknife variance estimators, the mean of the SD estimator is similar to the simulation SD, which is averaged across the 1000 replications to represent the true SD, and the 90% coverage rates are mostly achieved.

Investigations of the effects of test accuracy, disease prevalence and verification mechanism reveal that, across scenarios, all proposed estimators correct the noticeable bias of the CC estimator. In most cases the DR estimator has the lowest MSE among the proposed estimators; the exception is when the disease prevalence is very high, where DR has the highest MSE. The performances of IPW(E) and IPW(K) are comparable overall, and PDR has similar, slightly lower MSE than IPW.

Considering model misspecification, IPW(E), which requires a correct working model for the probability of disease verification, yields biased results if the verification model is not correctly specified. IPW(K), by contrast, is unaffected by the correctness of the verification model, since it relies on prior knowledge of the verification probabilities. DR estimates are doubly robust: they perform fairly well under a misspecified verification model or a misspecified disease model, but when both the disease and verification models are wrong, DR yields biased results. PDR relies on the correctness of both the disease and verification models, so misspecification of either one, or both, biases its estimates.

The sensitivity analysis shows that over-estimating the residual association between disease status and verification has the larger influence on the performance of the DR and IPW(E) estimates; comparing DR with IPW, IPW is more robust to a wrong α value.

For potentially tied test results, we recommend breaking the ties by adding a small random quantity (e.g., uniform on [-10^{-5}, 10^{-5}]) to the test results. According to our sensitivity analysis, the conclusions are stable when a small amount of randomness is added to the data (small relative to the mean and SD of the test results).

In conclusion, since DR consistently gives the most efficient estimates, we recommend using DR over IPW, together with a sensitivity analysis over a range of plausible values of α. Despite its relatively lower efficiency, the PDR estimator can be used to obtain a crude estimate of α. Since the DR estimators tend to be robust to under-estimation of α, the sensitivity analysis can then be conducted over a range of plausible values of α, possibly lower than the crude estimate.

In the next chapter, we analyze data from the ADNI study. Specifically, the proposed methods are used to estimate the VUS of a new diagnostic test for distinguishing three stages of Alzheimer's Disease (AD).

[Figure 8.1: Full data (red surface) and CC (green surface) ROC surfaces from a randomly chosen realization of the simulation study; TPR0, TPR1 and TPR2 are the correct-classification rates for D = 0, 1 and 2, respectively.]

[Figure 8.2: Full data (red surface) and IPW (green surface) ROC surfaces from a randomly chosen realization of the simulation study; TPR0, TPR1 and TPR2 are the correct-classification rates for D = 0, 1 and 2, respectively.]

[Figure 8.3: Full data (red surface) and CC (green surface) ROC surfaces from a randomly chosen realization of the simulation study; TPR0, TPR1 and TPR2 are the correct-classification rates for D = 0, 1 and 2, respectively.]

[Figure 8.4: Full data (red surface) and IPW (green surface) ROC surfaces from a randomly chosen realization of the simulation study; TPR0, TPR1 and TPR2 are the correct-classification rates for D = 0, 1 and 2, respectively.]

Table 8.12: Comparison of the VUS estimators: varying test accuracy. Relative bias (%) to the "True" estimate is provided in parentheses.

  Method      VUS*            SD      SE†     Coverage (%)‡   MSE$
  Low accuracy
  Full Data   0.376 (-0.1)    0.024   0.024   90.0            0.59
  CC          0.325 (-13.5)   0.028   0.028   89.7            3.37
  DR1         0.376 (-0.2)    0.028   0.028   90.7            0.80
  DR2         0.376 (-0.1)    0.028   0.028   90.8            0.80
  IPW(E)      0.375 (-0.3)    0.029   0.031   91.9            0.95
  IPW(K)      0.376 (-0.2)    0.031   0.031   89.8            0.94
  PDR         0.373 (-0.9)    0.031   0.028   88.8            0.82
  Medium accuracy
  Full Data   0.681 (-0.1)    0.023   0.024   91.1            0.56
  CC          0.636 (-6.6)    0.028   0.029   90.0            2.88
  DR1         0.681 (-0.0)    0.026   0.027   91.7            0.74
  DR2         0.681 (-0.1)    0.026   0.027   91.8            0.74
  IPW(E)      0.681 (-0.1)    0.027   0.028   92.2            0.81
  IPW(K)      0.681 (-0.1)    0.027   0.028   90.4            0.80
  PDR         0.680 (-0.3)    0.027   0.027   91.1            0.74
  High accuracy
  Full Data   0.910 (-0.0)    0.012   0.013   91.4            0.16
  CC          0.892 (-2.1)    0.016   0.016   91.4            0.61
  DR1         0.910 (0.0)     0.014   0.014   91.8            0.20
  DR2         0.910 (-0.0)    0.014   0.014   92.0            0.20
  IPW(E)      0.910 (-0.0)    0.014   0.015   92.7            0.21
  IPW(K)      0.910 (-0.0)    0.014   0.015   92.4            0.21
  PDR         0.910 (-0.0)    0.014   0.014   91.7            0.20

  * Relative bias to the "True" estimate is provided in parentheses. SD is the simulation SD.
  † The average of the SD estimator (Jackknife).
  ‡ 90% CI coverage probabilities calculated using the SD estimator (Jackknife).
  $ MSE × 10^3 of the VUS estimators, calculated using the SD estimator (Jackknife).

Table 8.13: Detailed information about verification rate: varying disease prevalence

  Statistic                    Low prevalence   Medium prevalence   High prevalence
  True VUS                     0.84             0.84                0.84
  Verification rate (overall)  33%              56%                 70%
  Verification rate (D = 0)    28%              28%                 28%
  Verification rate (D = 1)    76%              76%                 76%
  Verification rate (D = 2)    97%              97%                 97%

Table 8.14: Comparison of the VUS estimators: varying disease prevalence. Relative bias (%) to the "True" estimate is provided in parentheses.

  Method      VUS*            SD      SE†     Coverage (%)‡   MSE$
  Low prevalence
  Full Data   0.844 (-0.1)    0.045   0.044   90.6            1.92
  CC          0.816 (-3.4)    0.051   0.049   89.9            3.20
  DR1         0.844 (-0.0)    0.049   0.046   90.0            2.10
  DR2         0.844 (-0.0)    0.049   0.046   89.7            2.10
  IPW(E)      0.844 (-0.0)    0.049   0.046   90.3            2.15
  IPW(K)      0.844 (-0.0)    0.049   0.046   90.0            2.15
  PDR         0.843 (-0.1)    0.049   0.046   89.8            2.15
  Medium prevalence
  Full Data   0.843 (-0.0)    0.014   0.014   89.4            0.19
  CC          0.814 (-3.4)    0.019   0.019   89.6            1.18
  DR1         0.843 (-0.0)    0.016   0.017   90.7            0.27
  DR2         0.843 (-0.0)    0.016   0.017   91.1            0.27
  IPW(E)      0.843 (-0.0)    0.016   0.017   91.3            0.29
  IPW(K)      0.843 (-0.0)    0.017   0.017   89.9            0.29
  PDR         0.841 (-0.2)    0.017   0.016   89.7            0.27
  High prevalence
  Full Data   0.843 (-0.1)    0.014   0.013   89.0            0.18
  CC          0.813 (-3.5)    0.023   0.023   89.1            1.42
  DR1         0.845 (0.2)     0.021   0.022   91.5            0.50
  DR2         0.843 (0.0)     0.020   0.022   93.3            0.49
  IPW(E)      0.841 (-0.2)    0.018   0.019   89.9            0.36
  IPW(K)      0.842 (-0.2)    0.019   0.019   89.6            0.36
  PDR         0.839 (-0.5)    0.018   0.018   88.3            0.33

  * Relative bias to the "True" estimate is provided in parentheses. SD is the simulation SD.
  † The average of the SD estimator (Jackknife).
  ‡ 90% CI coverage probabilities calculated using the SD estimator (Jackknife).
  $ MSE × 10^3 of the VUS estimators, calculated using the SD estimator (Jackknife).

Table 8.15: Detailed information about verification rate: varying verification mechanism

  Statistic   Low    Medium   High   Random
  Overall     24%    46%      81%    40%
  D = 0       7%     29%      74%    40%
  D = 1       50%    80%      99%    40%
  D = 2       94%    98%      100%   40%

Table 8.16: Comparison of the VUS estimators: varying verification rate

  Method      VUS*            SD      SE†     Coverage (%)‡   MSE$
  Full Data   0.843 (-0.0)    0.017   0.017   92.1            0.30
  Low verification
  CC          0.787 (-6.7)    0.033   0.033   90.4            4.27
  DR1         0.843 (0.1)     0.029   0.028   91.5            0.80
  DR2         0.842 (-0.1)    0.027   0.027   91.7            0.72
  IPW(E)      0.840 (-0.4)    0.031   0.031   91.5            1.00
  IPW(K)      0.840 (-0.3)    0.032   0.031   90.6            0.98
  PDR         0.837 (-0.8)    0.037   0.035   92.9            1.27
  Medium verification
  CC          0.814 (-3.4)    0.021   0.021   90.4            1.29
  DR1         0.843 (0.0)     0.018   0.019   92.3            0.37
  DR2         0.843 (-0.0)    0.018   0.019   91.9            0.37
  IPW(E)      0.842 (-0.1)    0.019   0.020   91.6            0.40
  IPW(K)      0.843 (-0.0)    0.019   0.020   90.7            0.40
  PDR         0.842 (-0.1)    0.019   0.019   91.1            0.38
  High verification
  CC          0.827 (-1.9)    0.017   0.023   90.8            0.95
  DR1         0.842 (-0.1)    0.017   0.022   90.7            0.31
  DR2         0.842 (-0.1)    0.017   0.022   90.8            0.31
  IPW(E)      0.842 (-0.1)    0.017   0.019   91.4            0.31
  IPW(K)      0.842 (-0.1)    0.017   0.019   91.1            0.31
  PDR         0.843 (-0.1)    0.017   0.018   91.1            0.31
  Random verification
  CC          0.842 (-0.1)    0.027   0.028   91.7            0.78
  DR1         0.843 (-0.0)    0.026   0.026   89.8            0.69
  DR2         0.843 (-0.0)    0.026   0.026   89.7            0.69
  IPW(E)      0.842 (-0.1)    0.027   0.028   91.6            0.78
  IPW(K)      0.842 (-0.1)    0.027   0.028   91.7            0.78
  PDR         0.841 (-0.2)    0.028   0.027   89.7            0.71

  * Relative bias to the "True" estimate is provided in parentheses. SD is the simulation SD.
  † The average of the SD estimator (Jackknife).
Table 8.17: Comparison of the VUS Estimators: Model Correctness. Relative bias (%) to the "true" value is given in parentheses.

  Method     VUS           SD     SE†    Coverage (%)‡  MSE§
  Full Data  0.843 (-0.0)  0.017  0.017  92.1           0.30
  CC         0.814 (-3.4)  0.021  0.022  91.6           1.31
  IPW(K)     0.843 (-0.0)  0.019  0.020  91.9           0.41

Both Correct
  DR1        0.843 (-0.0)  0.019  0.020  91.9           0.39
  DR2        0.843 (-0.0)  0.019  0.020  91.9           0.39
  IPW(E)     0.842 (-0.1)  0.019  0.020  92.5           0.41
  PDR        0.842 (-0.1)  0.019  0.020  91.6           0.39

Wrong V
  DR1        0.841 (-0.2)  0.019  0.018  89.2           0.31
  DR2        0.841 (-0.2)  0.019  0.018  89.2           0.31
  IPW(E)     0.824 (-2.3)  0.024  0.023  91.3           0.91
  PDR        0.834 (-1.1)  0.019  0.020  92.9           0.50

Wrong D
  DR1        0.843 (-0.0)  0.020  0.022  90.9           0.46
  DR2        0.843 (-0.0)  0.020  0.021  91.3           0.46
  IPW(E)     0.842 (-0.1)  0.019  0.020  91.3           0.41
  PDR        0.857 (1.7)   0.043  0.064  97.6           4.31

Both Wrong¶
  DR1        0.871 (3.3)   0.022  0.022  84.3           1.26
  DR2        0.809 (-4.0)  0.113  0.148  93.3           13.93
  IPW(E)     0.824 (-2.3)  0.023  0.021  83.2           0.91
  PDR        0.816 (-3.2)  0.081  0.389  93.8           7.26

SD: simulation SD. † average of the Jackknife SD estimator. ‡ 90% CI coverage probabilities calculated using the Jackknife SD estimator. § MSE × 10^3 of the VUS estimators, calculated using the Jackknife SD estimator. ¶ MSE × 10^3 calculated using the simulation SD due to the inaccuracy of the Jackknife SD estimator.

Table 8.18: Comparison of the VUS Estimators: α Sensitivity. Relative bias (%) to the "true" value is given in parentheses.

  Method     VUS           SD     SE†    Coverage (%)‡  MSE§
  Full Data  0.843 (-0.0)  0.017  0.017  92.1           0.31
  CC         0.814 (-3.4)  0.021  0.022  91.6           1.31
  IPW(K)     0.843 (-0.0)  0.019  0.020  91.9           0.41
  PDR        0.842 (-0.1)  0.019  0.020  91.6           0.39

α = -3
  DR1        0.834 (-1.1)  0.019  0.020  91.1           0.45
  DR2        0.834 (-1.1)  0.019  0.020  91.6           0.46
  IPW(E)     0.831 (-1.4)  0.019  0.021  93.1           0.57

α = -2
  DR1        0.839 (-0.4)  0.023  0.031  98.2           0.98
  DR2        0.840 (-0.4)  0.023  0.032  98.3           1.01
  IPW(E)     0.836 (-0.8)  0.019  0.020  93.3           0.45

α = -1¶
  DR1        0.843 (-0.0)  0.021  0.020  91.9           0.39
  DR2        0.843 (-0.0)  0.019  0.020  91.9           0.39
  IPW(E)     0.842 (-0.1)  0.019  0.020  92.5           0.41

α = 0
  DR1        0.836 (-0.8)  0.021  0.022  91.7           0.53
  DR2        0.836 (-0.8)  0.021  0.022  91.1           0.53
  IPW(E)     0.843 (0.0)   0.021  0.022  92.4           0.49

α = 1
  DR1        0.815 (-3.4)  0.028  0.029  91.1           1.65
  DR2        0.816 (-3.2)  0.030  0.030  91.0           1.66
  IPW(E)     0.834 (-1.1)  0.028  0.029  91.1           0.93

SD: simulation SD. † average of the Jackknife SD estimator. ‡ 90% CI coverage probabilities calculated using the Jackknife SD estimator. § MSE × 10^3 of the VUS estimators, calculated using the Jackknife SD estimator. ¶ correctly specified α.

Chapter 9
Data Application

9.1 Study Description

We illustrate our proposed approach with data from the ADNI study. Detailed background on the ADNI study can be found in Section 1.5.1. To summarize, the GST for AD requires brain autopsy and is based on the extent of neuritic plaques and neurofibrillary tangles. According to the National Institute on Aging-Reagan Institute criteria, the frequency of plaques and tangles in the neocortex can be used to determine the severity of AD. Thus, people who are still alive cannot have their disease status confirmed. In addition, disease status is also missing for patients who, or whose families, decline brain autopsy. As described previously, there are many reasons for the low autopsy verification rate in AD.
Since identifying disease stages and tracking pathological progression are critical for effective AD intervention and drug development, ADNI, a large, multicenter, longitudinal neuroimaging study, was launched in 2003. One of the goals of the ADNI study is to develop a cerebrospinal fluid (CSF) biomarker signature for AD stage identification [16]. Among the potential biomarkers under study, the CSF tau protein concentration is considered a promising biomarker that may be diagnostic for different AD stages relative to the degree of cognitive impairment [54]. Recent studies also showed that CSF tau changes may predict conversion to AD in subjects with mild cognitive impairment (MCI) [55]. However, before introducing it into clinical diagnosis, it is critically important to perform a proper study to determine the accuracy of the CSF tau level as a biomarker for predicting the different stages of AD.

In the current ADNI data set, all subjects received a clinical diagnosis of normal cognition, amnestic MCI, or probable AD based on a series of clinical-cognitive assessments. In our analysis, we are interested in evaluating the diagnostic accuracy of CSF tau protein in predicting stages of AD in terms of cognitive impairment, which ranges from normal aging (non-disease) through mild cognitive impairment (mild-disease) to probable AD (severe-disease). Our purpose is to evaluate whether the proposed estimators can properly assess the accuracy of CSF tau protein without requiring disease verification for all subjects. In practice, the clinical decision about disease verification can be made based on a combination of the new test results and available clinical characteristics. To mimic real-life application without over-complication, we introduce non-random missingness in disease status based on the level of tau protein (the new biomarker test) in CSF and a clinical covariate. We then apply the proposed methods to estimate the VUS. Because the full data are available, we can compare our estimators to the full data estimator, which is not subject to verification bias.

9.2 Convention and Notation

Here we consider the tau protein level as the new diagnostic test. Test results are positively associated with disease status (i.e., larger test values indicate more severe disease stages). Reduced CSF levels of Aβ1-42 are believed to result from large-scale accumulation of this Aβ peptide into insoluble plaques in the AD brain, which makes the Aβ1-42 concentration in CSF an informative clinical covariate [54]. We denote by D the true AD stage (D = 0: normal aging; D = 1: mild cognitive impairment; D = 2: probable AD). T denotes the tau protein level, A denotes Aβ1-42, and V denotes verification status. We continue to use the notation defined in Section 8.1 for all the estimators.

9.3 Data

In the full data set, AD stage (D), tau protein (T) and Aβ1-42 (A) are available for 1078 subjects. As mentioned before, a decreased level of Aβ1-42 implies a more severe disease stage. For convenience, T and -A are standardized. Note that there are tied values in both T and A. Since our methods apply to continuous test results, as suggested in the simulation chapter (Section 8.13), we artificially create continuous data by randomly subtracting a small random value from each tied test result. Specifically, for each tied observation, we randomly pick a value from the uniform distribution on [-10^-5, 10^-5]; a small sketch of this jittering step is given below. Table 9.1 shows the summary statistics for all the subjects used in our analysis.
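For concreteness, the following is a minimal sketch of the jittering step just described; the function name, the random seed, and the numpy-based approach are ours, not part of the original analysis code.

import numpy as np

rng = np.random.default_rng(1)

def break_ties(t, eps=1e-5):
    """Jitter tied test results so they can be treated as continuous:
    a draw from Uniform(-eps, eps) is subtracted from every member
    of each tied group, as described in Section 9.3."""
    t = np.asarray(t, dtype=float).copy()
    values, counts = np.unique(t, return_counts=True)
    for v in values[counts > 1]:
        idx = np.flatnonzero(t == v)
        t[idx] -= rng.uniform(-eps, eps, size=idx.size)
    return t

# example usage: standardize tau, then jitter the remaining ties
# tau_std = break_ties((tau - tau.mean()) / tau.std())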
We observe that T and A are not normally distributed and show large variation relative to their means. Our proposed estimators are nevertheless appropriate here since our approach is non-parametric. Because of the large variation, we also expect wider 95% confidence intervals than in simulation results with similar settings. Figure 9.1 displays boxplots of T and A stratified by disease status. Subjects with higher T and A are more likely to be at a more severe disease stage.

Table 9.1: Summary statistics of ADNI data

  Variable  Summary statistics
  D         Non-disease: 25%; Mild-disease: 18%; Severe-disease: 56%
  T†        Mean: 0.002; Median: -0.261; SD: 0.999
  A‡        Mean: 0.002; Median: 0.246; SD: 1.000

† Tau protein; values are standardized. ‡ Aβ1-42; values are standardized and multiplied by -1.

9.4 Verification Process

Non-random missingness is introduced into the disease status. Only a fraction of the subjects receive the GST and have disease status observed; T and A are obtained on all subjects. Since in practice the observed characteristics are unlikely to capture all of the relationship between disease status and verification status, we generate missingness of the disease status that is non-ignorable. The simulation process for the verification status is therefore similar to the one used in Section 8.1. Specifically, V is generated following model (5.1) with h(T, A) = γ0 + γ1 T + γ2 A and q(T, A) = α. Again, by selecting α ≠ 0, one can create a scenario with non-ignorable missingness. In particular, we choose (γ0, γ1, γ2, α) = (1, -1, 0.5, -1) for the verification model. The magnitude of the non-ignorable parameter is 1, so that, conditional on T and A, the verification odds for mild-disease versus non-disease subjects differ by a factor of e^1 ≈ 2.7. Overall, 47% of the subjects receive disease verification. The percentages of disease verification among subjects in the non-disease (D = 0), mild-disease (D = 1) and severe-disease (D = 2) groups are 24%, 47% and 75%, respectively. As Figure 9.2 shows, subjects with higher T and A have increased verification rates. A sketch of this generating mechanism follows Table 9.2 below.

9.5 Working Models

The DR and PDR estimators require a model for the disease probability conditional on the test under study and the covariate data. We choose a multinomial logistic regression model assuming linear relationships between the log-odds of the disease stages and T, A. IPW, DR and PDR also require a model for Pr(V | T, A). We estimate the verification probabilities by assuming T and A are linearly related to the log-odds of verification status. Table 9.2 compares the known coefficients of the verification model with the estimated ones. The coefficient estimates from the DR and PDR methods are close to the true values, with DR performing better than PDR. For the non-ignorable parameter, the PDR approach provides a reasonable estimate, which indicates proper specification of the disease model. The estimated non-ignorable parameter can also serve as the starting point of the α sensitivity analysis in the DR approach.

Table 9.2: Estimated parameters of the verification model

  Coefficients  Intercept  T       A      D
  Known         1.000      -1.000  0.500  -1.000
  DR            1.037      -0.903  0.388  -
  PDR           1.204      -0.899  0.445  -1.229
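To illustrate the verification process of Section 9.4, the sketch below generates a non-ignorable verification indicator. The exact form of model (5.1) is defined earlier in the dissertation; here we assume, purely for illustration, that it reduces to a logistic model in which the q(T, A) term acts additively on the disease stage. All names and the additive form are our assumptions.

import numpy as np

rng = np.random.default_rng(2)

def expit(x):
    """Inverse logit."""
    return 1.0 / (1.0 + np.exp(-x))

def simulate_verification(T, A, D, gamma=(1.0, -1.0, 0.5), alpha=-1.0):
    """Sketch of a non-ignorable verification mechanism: we assume
    logit Pr(V = 1 | T, A, D) = g0 + g1*T + g2*A + alpha*D, where the
    additive alpha*D term is an illustrative stand-in for q(T, A)
    acting on the (partly unobserved) disease stage in model (5.1);
    alpha != 0 makes the missingness non-ignorable."""
    g0, g1, g2 = gamma
    lin = g0 + g1 * np.asarray(T) + g2 * np.asarray(A) + alpha * np.asarray(D)
    return rng.binomial(1, expit(lin))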
9.6 Application Results

The resulting estimates of the VUS are presented in Table 9.3. For each VUS estimate, a Jackknife SD estimate and the corresponding 95% confidence interval (CI) are given. For the IPW estimates, the asymptotic SD from Theorem 6.1 is also presented. As expected, CC underestimates the VUS, in this case by nearly 13%, which is also apparent in Figure 9.3, where the empirical ROC surfaces based on the complete cases (red surface) and the full data (green surface) are presented. The CC ROC surface is consistently biased downwards relative to the full data surface. Figure 9.4 shows the bias-corrected ROC surface using IPW(K) (red surface) together with the full data (green surface) ROC surface. The bias-corrected surface shows slight under- and over-estimation spread evenly across the different pairs of decision thresholds, which yields VUS estimates similar to the full data value, with bias ranging from 0.1% to 2%. The ROC surfaces for the other estimators are not presented because they are very similar to that for IPW(K). Clearly, the naïve estimator that uses only the verified subjects without accounting for verification bias could lead to inaccurate conclusions about the accuracy of the test.

The true disease model is unknown in this application, so some degree of model misspecification is likely present. This explains the bias of the PDR estimate, but that bias is small compared to the bias of the CC estimate (2% versus 13%). When the known verification probabilities are used, the IPW(K) VUS estimate is quite close to the full data estimate. Even when we do not have much prior knowledge about the verification probabilities and require a model to estimate them, both the IPW(E) and DR estimates still resemble the full data value. Comparing all the new bias-corrected estimates, the 95% CIs obtained with the Jackknife method are very close to each other, and all of them contain the full data value of 0.357. Consistent with the simulation results, DR2 is more efficient than the DR1 and IPW estimators. Possibly due to randomness, PDR is slightly more efficient than DR2. The variance estimates obtained via the Jackknife method and via the asymptotic theorem are very similar, which further supports the use of the asymptotic variance with real-life data. A sketch of the leave-one-out Jackknife computation is given after Table 9.4.

Table 9.3: Estimates of the VUS using data from ADNI

  Method     VUS    JK SD (x100)  Asymptotic SD (x100)  95% CI (JK)
  Full Data  0.357  1.872         -                     (0.320, 0.393)
  CC         0.312  2.858         -                     (0.256, 0.368)
  DR1        0.357  3.275         -                     (0.293, 0.421)
  DR2        0.356  3.238         -                     (0.293, 0.420)
  IPW(E)     0.355  3.363         3.311                 (0.289, 0.421)
  IPW(K)     0.358  3.456         3.409                 (0.290, 0.426)
  PDR        0.363  3.167         -                     (0.301, 0.425)

In addition, a sensitivity analysis is performed for α ∈ [-3, 1]. According to Table 9.4, the DR and IPW(E) estimates are more robust to under-estimation of α, with bias ranging from 2% to 6%. On the contrary, when the selection bias function is over-estimated, substantial bias is observed (13%-34%). These results emphasize the danger of over-estimating the non-ignorable parameter. Comparing the DR and IPW(E) estimates, IPW(E) is influenced by misspecification of α to a smaller degree.

Table 9.4: Sensitivity analysis for estimates of the VUS using data from ADNI (true α = -1). Jackknife SD (x100) is given in parentheses.

  Method   α = -3         α = -2         α = -1         α = 0          α = 1
  DR1      0.368 (2.317)  0.378 (2.839)  0.357 (3.275)  0.306 (3.343)  0.238 (2.915)
  DR2      0.369 (2.349)  0.379 (2.860)  0.356 (3.238)  0.304 (3.260)  0.236 (2.834)
  IPW(E)   0.365 (3.285)  0.370 (3.334)  0.355 (3.363)  0.312 (3.283)  0.254 (3.012)
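The Jackknife SDs in Tables 9.3 and 9.4 are leave-one-out quantities. The sketch below shows the generic computation for any VUS estimator; vus_hat is a hypothetical callable mapping a full data set to a scalar estimate, and the array layout is ours.

import numpy as np

def jackknife_sd(data, vus_hat):
    """Leave-one-out Jackknife SD for an arbitrary VUS estimator.
    `data` is an (n, p) array of per-subject records and `vus_hat`
    maps such an array to a scalar VUS estimate."""
    n = len(data)
    loo = np.array([vus_hat(np.delete(data, i, axis=0)) for i in range(n)])
    return np.sqrt((n - 1) / n * np.sum((loo - loo.mean()) ** 2))

# a 95% CI of the kind reported in Table 9.3:
# est = vus_hat(data)
# sd = jackknife_sd(data, vus_hat)
# ci = (est - 1.96 * sd, est + 1.96 * sd)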
9.7 Summary

After inducing biased sampling of the verification status in the ADNI data set, our proposed methods yield valid estimates of the VUS that closely resemble the full data estimates. Based on our findings, with only part of the subjects having disease verified, we conclude that the VUS of tau protein is 35.6%, and we are 95% confident that the true value lies between 29.3% and 42.0%. This conclusion is consistent with the results that would be obtained if every subject had disease verification. In practice, it may be difficult to specify both the disease model and the verification model exactly right. However, if enough information about the disease mechanism and the verification process can be obtained, it is possible to build models that are close to the correct ones. Logistic regression models also usually provide reasonably valid estimates. With a relatively large sample size, which is not uncommon in screening studies, good approximations can be achieved by parsimonious and scientifically plausible models. Thus, to properly evaluate the accuracy of a new diagnostic test, it is not necessary that the true disease status be available for all subjects under study. Our methods correct potential verification bias when it is not realistic to require all subjects to receive the GST and obtain disease verification. In addition, applying our methods can be cost-effective since only a subset of subjects needs disease verification.

Figure 9.1: Boxplots of tau protein and Aβ1-42, stratified by disease status.

Figure 9.2: Boxplots of tau protein and Aβ1-42, stratified by verification status.

Figure 9.3: Full data (green surface) and CC (red surface) ROC surfaces for the ADNI data.

Figure 9.4: Full data (green surface) and IPW(K) (red surface) ROC surfaces for the ADNI data.

Chapter 10
Conclusions and Future Work

10.1 Dissertation Contributions and Recommendations

This dissertation focuses on the important problem of assessing the accuracy of a continuous medical diagnostic test under biased sampling in disease verification for three-class classification problems. We explored estimation of the summary measure of the ROC surface, the VUS, in the presence of verification bias. Both the MAR and MNAR missingness mechanisms were investigated. By imposing an identification assumption about the residual association between disease status and the missingness, we can estimate the disease probability and the verification probability separately from the score equations. Another option is to assume a joint distribution for (D_i, V_i) and estimate the disease probability and verification probability jointly from the score equations.

Under different model setups, several types of estimators were proposed to correct verification bias, and these new bias-correction methods were compared in this dissertation. First, we constructed the IPW VUS under the MAR assumption. Then, to incorporate a non-ignorable selection process for disease verification, we further proposed IPW, DR and PDR estimators of the VUS. In simulation studies, we demonstrated that the proposed estimators successfully corrected for this bias in both small and large samples.

The IPW estimator weights each verified subject by the inverse of the verification probability. The DR estimator uses both the verification and the disease model information, and has better efficiency than the IPW estimator. Moreover, as long as either the disease or the verification model is correctly specified, the DR estimator is consistent. A schematic version of the IPW VUS is sketched below.
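To fix ideas, here is a schematic, O(n^3), unoptimized version of the IPW VUS for three ordered classes, assuming the weighted empirical form in which each verified subject receives weight equal to the inverse of its verification probability. The function and variable names are ours; the exact estimator is the one defined in Chapter 6, not this sketch.

import numpy as np
from itertools import product

def ipw_vus(T, D, V, pi):
    """Schematic IPW VUS.  Each verified subject (V == 1) is weighted
    by 1/pi; a triple (i, j, k) with D_i = 0, D_j = 1, D_k = 2 counts
    as correctly ordered when T_i < T_j < T_k, and the weighted
    proportion of correctly ordered triples estimates the VUS."""
    T, D, V, pi = (np.asarray(x, dtype=float) for x in (T, D, V, pi))
    w = np.where(V == 1, 1.0 / pi, 0.0)
    idx0 = np.flatnonzero((V == 1) & (D == 0))
    idx1 = np.flatnonzero((V == 1) & (D == 1))
    idx2 = np.flatnonzero((V == 1) & (D == 2))
    num = den = 0.0
    for i, j, k in product(idx0, idx1, idx2):
        wijk = w[i] * w[j] * w[k]
        den += wijk
        num += wijk * float(T[i] < T[j] < T[k])
    return num / den

Unverified subjects contribute nothing directly; their information enters only through the weights of the verified subjects, which is what removes the verification bias when the probabilities pi are correct.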
In practice, the DR estimator is recommended, especially when the verification process or the disease model is not clearly understood. A drawback of the DR estimator is that it relies on an arbitrary identification assumption. A series of sensitivity analyses is recommended across a range of possible values of the non-ignorable parameter (the q function).

The PDR estimator estimates the VUS by jointly modeling the missingness mechanism and the risk of having a certain disease level. By making a further assumption about the joint distribution of the missingness and the disease status, we can obtain more information about the non-ignorable parameter, which can serve as the starting point for the sensitivity analysis of the DR estimation. In practice, we can treat historical data as training data and introduce non-random missingness with a known missing mechanism. Then, in the particular disease/research area, we can determine proper model specifications for the verification and disease models by checking how close the q function estimated by the PDR approach is to the true q function.

The estimators we propose are easy to implement. When the verification probabilities are known and the sample size is small, we recommend using the IPW and DR estimators since they have better MSE performance than the PDR estimator. Depending on the purpose of the study, IPW may be preferred since it is more intuitive and requires specifying only the verification model. In observational studies where the verification mechanism is unknown or the clinical course of the disease is too heterogeneous to build a proper disease model, we recommend the DR estimators since they are robust to model misspecification and very efficient. Regarding the non-ignorable parameter, we recommend using PDR to estimate the residual association between disease status and verification status and then performing a series of sensitivity analyses for the DR estimation around the estimated value. Considering the less substantial bias caused by under-estimating the residual association, we recommend selecting more values below the estimated one. With historical data, we can also use the PDR approach to propose reasonable verification and disease models.

Jackknife estimators of variance were proposed and evaluated. In practice, when the verification process is well understood, we recommend the closed-form variance estimator for the IPW VUS with known verification probabilities. When we do not have a good understanding of the selection process for disease verification and the sample size is small (< 400), we recommend the Jackknife approach to estimating the variance of our proposed estimators, because it performs fairly well in terms of closeness to the simulated variance and maintains the required coverage probabilities. Otherwise, when the sample size is large, the closed-form variance estimator for IPW with estimated verification probabilities is preferred, as it is comparable to the simulated variance and much less computer-intensive.

In summary, we proposed several methods to estimate the VUS accounting for verification bias. Through a theoretical framework and simulation studies, we showed that the proposed estimators perform quite well. In addition, Jackknife variance estimators were derived and shown to work well in finite-sample settings.
A closed-form expression for the asymptotic variance of the IPW estimator was also derived, and it resembled the simulation variance when the verification process is known. With a relatively large sample size (> 400) and reasonably estimated verification probabilities, the closed-form variance estimator performs very well. The proposed estimators are easy to implement, requiring only a subset of the subjects under study to obtain disease verification. In future studies, when it is unrealistic to apply the GST to each subject, our proposed estimators can still provide unbiased estimates with only a subset of subjects having disease verified. When the GST is too expensive, our approach can reduce cost by requiring only partial disease verification while still yielding valid conclusions about the accuracy of the new diagnostic test. If studies are designed so that the verification probabilities are known or can be estimated reasonably well, the performance of our estimators is further ensured.

10.2 Future Work

10.2.1 Asymptotic Distribution Theory

In Chapter 6, asymptotic distribution theory was developed for the IPW estimators based on U-statistics theory. In simulation studies, we found that the variance estimator performs well when the verification probabilities π_i are known. But when the π_i are not known, the variance estimator tends to underestimate the variance, probably because we did not account for the variability introduced by estimating the π_i. Although this should not be a significant problem in large samples, developing variance estimators that account for the variability in the estimated π_i warrants further attention. In addition, since, like the IPW estimator, the DR and PDR estimators are also functions of U-statistics, we plan to extend the theory to develop closed-form variance estimators for DR and PDR.

10.2.2 Estimation with Null Probability of Verification

One of our assumptions is positive selection probabilities: regardless of the values of the test results and observed covariates, the probability of having disease verified is always positive for each subject. In practice, this assumption can be unrealistic. One obvious example is that for living patients, the probability of being selected to receive autopsy is always zero. In addition, doctors might obtain strong and convincing evidence of disease status from the observed information for some patients and thus never send them for disease verification. Estimation of test accuracy with zero probability of verification is another area we plan to explore.

10.2.3 Estimation under Verification Bias Adjusting for Covariates

It is well known that some covariates, such as gender and age, may influence the accuracy of a new diagnostic test. Lack of covariate adjustment may bias the results and compromise the generalizability of the study findings to different populations. It would be useful to develop bias-corrected estimation that adjusts for covariates.

Bibliography

[1] B. J. McNeil and S. J. Adelstein, "Determining the value of diagnostic and screening tests," J Nucl Med, vol. 17, no. 6, pp. 439-448, 1976.

[2] H. C. Sox Jr., M. A. Blatt, M. C. Higgins, and K. I. Marton, Medical Decision Making. Boston: Butterworth-Heinemann, 1989.

[3] C. B. Begg and R. A. Greenes, "Assessment of diagnostic tests when disease verification is subject to selection bias," Biometrics, vol. 39, no. 1, pp. 207-215, 1983.
[4] D. F. Ransohoff and A. R. Feinstein, "Problems of spectrum and bias in evaluating the efficacy of diagnostic tests," N Engl J Med, vol. 299, no. 17, pp. 926-930, 1978.

[5] A. S. Bates, P. A. Margolis, and A. T. Evans, "Verification bias in pediatric studies evaluating diagnostic tests," J Pediatr, vol. 122, no. 4, pp. 585-590, 1993.

[6] J. M. Petscavage, M. L. Richardson, and R. B. Carr, "Verification bias: an underrecognized source of error in assessing the efficacy of medical imaging," Acad Radiol, vol. 18, no. 3, pp. 343-346, 2011.

[7] A. M. Cronin and A. J. Vickers, "Statistical methods to correct for verification bias in diagnostic studies are inadequate when there are few false negatives: a simulation study," BMC Med Res Methodol, vol. 8, p. 75, 2008.

[8] S. Mallett, J. J. Deeks, S. Halligan, S. Hopewell, V. Cornelius, and D. G. Altman, "Systematic reviews of diagnostic tests in cancer: review of methods and reporting," BMJ, vol. 333, no. 7565, p. 413, 2006.

[9] R. S. Punglia, A. V. D'Amico, W. J. Catalona, K. A. Roehl, and K. M. Kuntz, "Effect of verification bias on screening for prostate cancer by measurement of prostate-specific antigen," N Engl J Med, vol. 349, no. 4, pp. 335-342, 2003.

[10] R. Silipo, R. Vergassola, W. Zong, and M. R. Berthold, "Knowledge-based and data-driven models in arrhythmia fuzzy classification," Methods Archive, vol. 40, no. 5, pp. 397-403, 2001.

[11] S. Dreiseitl, L. Ohno-Machado, and M. Binder, "Comparing three-class diagnostic tests by three-way ROC analysis," Medical Decision Making, vol. 20, no. 3, pp. 323-331, 2000.

[12] D. S. Knopman, J. E. Parisi, A. Salviati, M. Floriach-Robert, B. F. Boeve, R. J. Ivnik, G. E. Smith, D. W. Dickson, K. A. Johnson, L. E. Petersen, W. C. McDonald, H. Braak, and R. C. Petersen, "Neuropathology of cognitively normal elderly," Journal of Neuropathology and Experimental Neurology, vol. 62, pp. 1087-1095, 2003.

[13] G. M. Savva, S. B. Wharton, P. G. Ince, G. Forster, F. E. Matthews, and C. Brayne, "Age, neuropathology, and dementia," The New England Journal of Medicine, vol. 360, pp. 2302-2309, 2009.

[14] J. A. Schneider, Z. Arvanitakis, W. Bang, and D. A. Bennett, "Mixed brain pathologies account for most dementia cases in community-dwelling older persons," Neurology, vol. 69, pp. 2197-2204, 2007.

[15] L. White, B. J. Small, H. Petrovitch, G. W. Ross, K. Masaki, R. D. Abbott, J. Hardman, D. Davis, J. Nelson, and W. Markesbery, "Recent clinical-pathologic research on the causes of dementia in late life: update from the Honolulu-Asia Aging Study," Journal of Geriatric Psychiatry and Neurology, vol. 18, pp. 224-227, 2005.

[16] C. Misra, Y. Fan, and C. Davatzikos, "Baseline and longitudinal patterns of brain atrophy in MCI patients, and their use in prediction of short-term conversion to AD: results from ADNI," NeuroImage, vol. 44, pp. 1415-1422, 2009.

[17] M. H. Zweig and G. Campbell, "Receiver-operating characteristic (ROC) plots: a fundamental evaluation tool in clinical medicine," Clin Chem, vol. 39, no. 4, pp. 561-577, 1993.

[18] G. Campbell, "Advances in statistical methodology for the evaluation of diagnostic and laboratory tests," Stat Med, vol. 13, no. 5-7, pp. 499-508, 1994.

[19] D. Bamber, "The area above the ordinal dominance graph and the area below the receiver operating characteristic graph," Journal of Mathematical Psychology, vol. 12, no. 4, pp. 387-415, 1975.

[20] R. J. A. Little and D. B. Rubin, Statistical Analysis with Missing Data. New York: Wiley, 1987.
[21] T. A. Alonzo and M. S. Pepe, "Assessing accuracy of a continuous screening test in the presence of verification bias," Journal of the Royal Statistical Society, Series C (Applied Statistics), vol. 54, no. 1, pp. 173-190, 2005.

[22] T. A. Alonzo, M. S. Pepe, and T. Lumley, "Estimating disease prevalence in two-phase studies," Biostatistics, vol. 4, no. 2, pp. 313-326, 2003.

[23] D. G. Horvitz and D. J. Thompson, "A generalization of sampling without replacement from a finite universe," Journal of the American Statistical Association, vol. 47, no. 260, pp. 663-685, 1952.

[24] H. He, J. M. Lyness, and M. P. McDermott, "Direct estimation of the area under the receiver operating characteristic curve in the presence of verification bias," Statistics in Medicine, vol. 28, pp. 361-376, 2009.

[25] S. G. Baker, "Evaluating multiple diagnostic tests with partial verification," Biometrics, vol. 51, pp. 330-337, 1995.

[26] A. S. Kosinski and H. X. Barnhart, "Accounting for nonignorable verification bias in assessment of diagnostic tests," Biometrics, vol. 59, pp. 163-171, 2003.

[27] X.-H. Zhou, "Maximum likelihood estimators of sensitivity and specificity corrected for verification bias," Communications in Statistics - Theory and Methods, vol. 22, no. 11, pp. 3177-3198, 1993.

[28] X.-H. Zhou, "Effect of verification bias on positive and negative predictive values," Statistics in Medicine, vol. 13, pp. 1737-1745, 1994.

[29] X.-H. Zhou and C. A. Rodenberg, "Estimating an ROC curve in the presence of non-ignorable verification bias," Communications in Statistics - Theory and Methods, vol. 27, pp. 635-657, Jan. 1998.

[30] X.-H. Zhou and P. Castelluccio, "Nonparametric analysis for the ROC areas of two diagnostic tests in the presence of nonignorable verification bias," Journal of Statistical Planning and Inference, vol. 115, pp. 193-213, 2003.

[31] A. Rotnitzky, D. Faraggi, and E. Schisterman, "Doubly robust estimation of the area under the receiver-operating characteristic curve in the presence of verification bias," Journal of the American Statistical Association, vol. 101, no. 475, pp. 1276-1288, 2006.

[32] R. Fluss, B. Reiser, D. Faraggi, and A. Rotnitzky, "Estimation of the ROC curve under verification bias," Biometrical Journal, vol. 51, pp. 475-490, 2009.

[33] D. Liu and X.-H. Zhou, "A model for adjusting for nonignorable verification bias in estimation of the ROC curve and its area with likelihood-based approach," Biometrics, vol. 66, pp. 1119-1128, Dec. 2010.

[34] B. K. Scurfield, "Multiple-event forced-choice tasks in the theory of signal detectability," J Math Psychol, vol. 40, no. 3, pp. 253-269, 1996.

[35] D. Mossman, "Three-way ROCs," Med Decis Making, vol. 19, no. 1, pp. 78-89, 1999.

[36] C. T. Nakas and C. T. Yiannoutsos, "Ordered multiple-class ROC analysis with continuous measurements," Stat Med, vol. 23, no. 22, pp. 3437-3449, 2004.

[37] X. He and E. C. Frey, "Three-class ROC analysis: the equal error utility assumption and the optimality of the three-class ROC surface using the ideal observer," IEEE Transactions on Medical Imaging, vol. 25, no. 8, pp. 979-986, 2006.

[38] P. S. Heckerling, "Parametric three-way receiver operating characteristic surface analysis using Mathematica," Med Decis Making, vol. 21, no. 5, pp. 409-417, 2001.

[39] L. Kang and L. Tian, "Estimation of the volume under the ROC surface with three ordinal diagnostic categories," Computational Statistics & Data Analysis, vol. 62, pp. 39-51, 2013.
[40] J. Li and X.-H. Zhou, "Nonparametric and semiparametric estimation of the three way receiver operating characteristic surface," Journal of Statistical Planning and Inference, vol. 139, no. 12, pp. 4133-4142, 2009.

[41] C. J. Lloyd, "Using smoothed receiver operating characteristic curves to summarize and compare diagnostic systems," Journal of the American Statistical Association, vol. 93, no. 444, pp. 1356-1364, 1998.

[42] C. T. Nakas and T. A. Alonzo, "ROC graphs for assessing the ability of a diagnostic marker to detect three disease classes with an umbrella ordering," Biometrics, vol. 63, no. 2, pp. 603-609, 2007.

[43] Y.-Y. Chi and X.-H. Zhou, "Receiver operating characteristic surfaces in the presence of verification bias," Journal of the Royal Statistical Society, Series C (Applied Statistics), vol. 57, no. 1, pp. 1-23, 2008.

[44] A. Agresti, Categorical Data Analysis. New York: Wiley, 1990.

[45] B. Efron and R. J. Tibshirani, An Introduction to the Bootstrap. New York: Chapman and Hall, 1993.

[46] W. Hoeffding, "A class of statistics with asymptotically normal distribution," Annals of Mathematical Statistics, vol. 19, pp. 293-325, 1948.

[47] A. J. Lee, U-Statistics: Theory and Practice. New York: Marcel Dekker, 1990.

[48] M. S. Pepe and T. A. Alonzo, "Comparing disease screening tests when true disease status is ascertained only for screen positives," Biostatistics, vol. 2, pp. 249-260, 2001.

[49] D. Scharfstein, A. Rotnitzky, and J. Robins, "Adjusting for nonignorable drop-out using semiparametric nonresponse models: rejoinder," Journal of the American ..., vol. 94, pp. 1135-1146, 1999.

[50] J. Robins, A. Rotnitzky, and D. Scharfstein, "Sensitivity analysis for selection bias and unmeasured confounding in missing data and causal inference models," in Statistical Models in Epidemiology, the Environment, and Clinical Trials (M. Halloran and D. Berry, eds.), vol. 116 of The IMA Volumes in Mathematics and its Applications, pp. 1-94, New York: Springer, 2000.

[51] A. Rotnitzky and J. Robins, "Analysis of semi-parametric regression models with non-ignorable non-response," Statistics in Medicine, vol. 16, pp. 81-102, 1997.

[52] J. W. Tukey, "Bias and confidence in not-quite large samples," The Annals of Mathematical Statistics, vol. 29, p. 614, 1958.

[53] M. H. Quenouille, "Notes on bias in estimation," Biometrika, vol. 43, pp. 353-360, Dec. 1956.

[54] R. A. Frank, D. Galasko, H. Hampel, J. Hardy, M. J. De Leon, P. D. Mehta, J. Rogers, E. Siemers, and J. Q. Trojanowski, "Biological markers for therapeutic trials in Alzheimer's disease: proceedings of the biological markers working group; NIA initiative on neuroimaging in Alzheimer's disease," Neurobiology of Aging, vol. 24, pp. 521-536, 2003.

[55] O. Hansson, H. Zetterberg, P. Buchhave, E. Londos, K. Blennow, and L. Minthon, "Association between CSF biomarkers and incipient Alzheimer's disease in patients with mild cognitive impairment: a follow-up study," Lancet Neurology, vol. 5, pp. 228-234, 2006.
Abstract
In diagnostic medicine, the volume under the receiver operating characteristic (ROC) surface (VUS) is a commonly used index to quantify the ability of a continuous diagnostic test to discriminate between three disease states. In practice, verification of the true disease status may be performed only for a subset of subjects under study, since the verification procedure is invasive, risky or expensive. The selection for disease examination might depend on the results of the diagnostic test and other clinical characteristics of the patients, which in turn can bias estimates of the VUS. This bias is referred to as verification bias. The only prior study considering verification bias correction in three-way ROC analysis focuses on ordinal tests. We propose new verification bias-correction methods to estimate the VUS for a continuous diagnostic test. We investigate the assumptions of missing at random (MAR) and non-ignorable missingness. Three classes of estimators are proposed, namely inverse probability weighted, imputation-based, and doubly robust estimators. A Jackknife estimator of variance is derived for all the proposed VUS estimators. Based on U-statistics theory, we also develop asymptotic properties for the IPW estimators. Extensive simulation studies are performed to evaluate the performance of the new estimators in terms of bias correction and variance, and the proposed methods are applied to data from Alzheimer's disease research.