Comparison of variance estimators in case-cohort studies
INFORMATION TO USERS

This manuscript has been reproduced from the microfilm master. UMI films the text directly from the original or copy submitted. Thus, some thesis and dissertation copies are in typewriter face, while others may be from any type of computer printer. The quality of this reproduction is dependent upon the quality of the copy submitted. Broken or indistinct print, colored or poor quality illustrations and photographs, print bleedthrough, substandard margins, and improper alignment can adversely affect reproduction. In the unlikely event that the author did not send UMI a complete manuscript and there are missing pages, these will be noted. Also, if unauthorized copyright material had to be removed, a note will indicate the deletion. Oversize materials (e.g., maps, drawings, charts) are reproduced by sectioning the original, beginning at the upper left-hand corner and continuing from left to right in equal sections with small overlaps.

ProQuest Information and Learning
300 North Zeeb Road, Ann Arbor, MI 48106-1346 USA
800-521-0600

Reproduced with permission of the copyright owner. Further reproduction prohibited without permission.

COMPARISON OF VARIANCE ESTIMATORS IN CASE-COHORT STUDIES

by

Jenny Q. Jiao

A Dissertation Presented to the FACULTY OF THE GRADUATE SCHOOL, UNIVERSITY OF SOUTHERN CALIFORNIA, In Partial Fulfillment of the Requirements for the Degree DOCTOR OF PHILOSOPHY (BIOMETRY)

May 2002

Copyright 2002 Jenny Q. Jiao

UMI Number: 3074933

UMI Microform 3074933. Copyright 2003 by ProQuest Information and Learning Company. All rights reserved. This microform edition is protected against unauthorized copying under Title 17, United States Code.

ProQuest Information and Learning Company, 300 North Zeeb Road, P.O.
Box 1346, Ann Arbor, MI 48106-1346

UNIVERSITY OF SOUTHERN CALIFORNIA
THE GRADUATE SCHOOL
UNIVERSITY PARK
LOS ANGELES, CALIFORNIA 90007

This dissertation, written by Jenny Jiao under the direction of her Dissertation Committee, and approved by all its members, has been presented to and accepted by The Graduate School, in partial fulfillment of requirements for the degree of DOCTOR OF PHILOSOPHY

Dean of Graduate Studies
Date: [illegible], 2002

DISSERTATION COMMITTEE
Chairperson

Dedication

To my parents, my husband Frank, and my two lovely children Jonathan and Vanessa.

Acknowledgement

I would like to thank Professor Bryan Langholz, who identified this research topic and guided me through this dissertation work. I am very grateful to him for his thoughtful and focused comments, which helped me greatly in accomplishing this important task.

Contents

Dedication
Acknowledgement
List of Tables
1 Introduction
1.1 Study Examples
1.2 Exposure Stratified Case-Cohort Design
1.3 Motivation of Dissertation Topic
1.4 Outline of the Dissertation
1.5 Summary of Findings
2 Background
2.1 Overview of Case-Cohort Design
2.2 Prentice’s Pseudolikelihood and Variance Estimator
2.3 Asymptotic Variance
2.3.1 Asymptotic Variance for Standard Case-Cohort
2.3.2 A Working Formula for Software Packages
2.3.3 Asymptotic Estimators for Exposure-Stratified Case-Cohort
2.4 Robust Variance
2.4.1 Robust Variance for Standard Case-Cohort
2.4.2 Weighted Likelihood by Cain and Lange
2.4.3 Functional Statistics for Case Influence by Reid and Crepeau
3 Relationship Between Asymptotic and Robust Variance Estimators for Standard Case-Cohort
3.1 Robust Variance to Asymptotic Variance
3.2 Relative Efficiency: Comparison of Variances of Variance Estimators
4 Robust Estimators for Exposure-Stratified Case-Cohort
4.1 Robust Estimators Based on Weighted Pseudolikelihood
4.1.1 The Naive Robust Estimators
4.1.2 The Asymptotic Variance Based Robust Estimator
5 Simulation Studies
5.1 Standard Case-Cohort Design
5.1.1 Cohort Generation
5.1.2 Scope of Simulation Studies
5.1.2.1 Validity of estimators
5.1.2.2 Precision of estimators
5.1.2.3 Relative efficiencies
5.1.2.4 Practical aspects
5.1.3 Simulation Results
5.1.4 Conclusions from the Simulations
5.2 Exposure Stratified Case-Cohort Design
5.2.1 Cohort Generation
5.2.2 Scope of Simulation Studies
5.2.3 Simulation Results
5.2.4 Conclusions from the Simulations
6 Applications to A Real Data Example
6.1 The TBF Cohort
6.2 Method of Sampling and Scope of Data Analysis
6.3 Results of Analysis
7 Discussion
7.1 Standard Case-Cohort
7.2 Exposure Stratified Case-Cohort
7.3 Future Research
Reference List
Appendix A Simulation Results: Standard Case-Cohort
Appendix B Simulation Results: Exposure Stratified Case-Cohort
Appendix C TBF Study: Results of Data Analysis

List of Tables

1.1 Summary of Literature Search for Case-Cohort Studies (1997-2000)
3.1 Standard Case-Cohort: Partitioned Score Residuals
4.1 Exposure-Stratified Case-Cohort: Partitioned Score Residuals
5.1 Definitions of Parameters Used in Simulations
A.1 Standard Case-Cohort: Unbiasedness of Variance Estimators in Clinical Trials
A.2 Standard Case-Cohort: Unbiasedness of Variance Estimators in Occupational Studies
A.3 Standard Case-Cohort: Unbiasedness of Variance Estimators in Intervention Trials
A.4 Standard Case-Cohort: Variance of the Score Under β0
A.5 Standard Case-Cohort: Variance of the Score Under β0 in Occupational Cohorts
A.6 Standard Case-Cohort: Variance of X_k (σ²_{X_k})
A.7 Standard Case-Cohort: Consistency of Variance Estimators in Clinical Trials (10% Exposure Rate)
A.8 Standard Case-Cohort: Consistency of Variance Estimators in Clinical Trials (50% Exposure Rate)
A.9 Standard Case-Cohort: Consistency of Variance Estimators in Intervention Trials
A.10 Standard Case-Cohort: Comparison of Variance Estimators in Clinical Trials
A.11 Standard Case-Cohort: Comparison of Variance Estimators in Occupational Studies
A.12 Standard Case-Cohort: Comparison of Variance Estimators in Intervention Studies (No Lost to Follow-up)
A.13 Standard Case-Cohort: Comparison of Variance Estimators in Intervention Studies (10% Lost to Follow-up)
A.14 Standard Case-Cohort: Comparison of Variance Estimators in Intervention Studies (20% Lost to Follow-up)
B.1 Exposure-Stratified: Unbiasedness of Variance Estimators
B.2 Exposure-Stratified: Consistency of Variance Estimators in Clinical Trials
B.3 Exposure-Stratified: Consistency of Variance Estimators in Occupational Studies
B.4 Exposure-Stratified: Consistency of Variance Estimators in Intervention Trials
B.5 Exposure-Stratified: Comparison of Variance Estimators in Clinical Trials
B.6 Exposure-Stratified: Comparison of Variance Estimators in Occupational Studies
B.7 Exposure-Stratified: Comparison of Variance Estimators in Intervention Trials (No Lost to Follow-up)
B.8 Exposure-Stratified: Comparison of Variance Estimators in Intervention Trials (10% Lost to Follow-up)
B.9 Exposure-Stratified: Comparison of Variance Estimators in Intervention Trials (20% Lost to Follow-up)
B.10 Exposure-Stratified: Robust Variance Estimators in Clinical Trials
B.11 Exposure-Stratified: Robust Variance Estimators in Occupational Studies
B.12 Exposure-Stratified: Robust Variance Estimators in Intervention Trials (No Lost to Follow-up)
B.13 Exposure-Stratified: Robust Variance Estimators in Intervention Trials (10% Lost to Follow-up)
B.14 Exposure-Stratified: Robust Variance Estimators in Intervention Trials (20% Lost to Follow-up)
C.1 TBF Study - Standard Case-Cohort
C.2 TBF Study - Exposure Stratified Case Cohort Sample I (Size of Subcohort=75, the Number of Cases)
C.3 TBF Study - Exposure Stratified Case Cohort Sample II (Size of Subcohort=150, Two Times the Number of Cases)
C.4 TBF Study - Exposure Stratified Case Cohort Sample III (Size of Subcohort=225, Three Times the Number of Cases)
C.5 TBF Study - Exposure Stratified Case Cohort Sample IV (Size of Subcohort=300, Four Times the Number of Cases)
C.6 TBF Study - Exposure Stratified Case Cohort Sample V (Full Cohort)

Jenny Jiao
Bryan Langholz

ABSTRACT

COMPARISON OF VARIANCE ESTIMATORS IN CASE-COHORT STUDIES

The complexity of variance estimation under a case-cohort design has been recognized and discussed in the statistical literature ever since the design was first introduced by Prentice. In the past decade, several analysis methods have been proposed by different authors using independent approaches. However, a review of the current literature indicates a lack of extensive research comparing the estimators in terms of validity and performance under a variety of cohort types. In this paper, we explore the theoretical relationship between the asymptotic and the robust variance estimators and provide simulation-based evidence for assessing these estimators, plus the Prentice method, under a standard case-cohort design. Moreover, for the exposure stratified case-cohort design, a new sampling method proposed recently by Borgan, Langholz, Samuelsen, et al., a robust variance estimator is derived which is basically the unadjusted naive robust estimator minus a correction term. The performance of this estimator is evaluated and compared to that of its asymptotic counterpart.
In summary, our results show that all three estimators provide similar and approximately consistent results for estimating the parameter variance, with very high precision. The efficiency of the asymptotic estimator is superior to that of its robust counterpart. Our theory indicates that the robust variance is in fact an empirical version of the asymptotic variance and that it has two extra variance terms that cause the loss in efficiency. The major concern raised by our simulations is the observed downward bias, particularly in the occupational cohort setting with staggered entry. For the exposure stratified case-cohort design, the proposed robust variance performed fairly well in our simulation studies, although its actual robustness with respect to model misspecification needs to be explored extensively. Except for the Prentice variance, the asymptotic and robust methods are extremely easy and efficient to implement using software such as SAS. Furthermore, for the exposure stratified design, the stratum-specific sampling fraction is required for both estimators in order to form the pseudolikelihood properly, whereas no such information is needed to calculate the robust variance under the standard case-cohort design.

Chapter 1

Introduction

The case-cohort design for studies of chronic diseases has been increasingly popular since the introduction of methods for analyzing failure time data in 1986 [17]. A MEDLINE search for the period of 1997 to 2000 returned 68 occurrences under the keyword “case-cohort.” Among these, about 45 articles reported the study design and statistical analysis in enough detail to determine that the design was in fact a case-cohort design. Different analysis methods were employed on the outcome measures according to the nature of the collected data.
Nevertheless, among the studies that used Cox regression models for failure time data, more than five different methods were used in the statistical analyses, all of which incorporated the required adjustment of the variance estimation for case-cohort studies. Table 1.1 summarizes the results of this survey by type of analysis. Inspired by the unique features and advantages of the case-cohort design for rare diseases in large cohort studies, approximately half of the studies used time to failure as the primary endpoint; however, the choice of method for estimating the variance of regression parameters varied across studies. The most frequently used methods for variance estimation in Cox regression settings have been the Prentice estimator [17] with the Epicure software and the robust estimator with SAS macros [1, 10]. Other methods used include the Self-Prentice asymptotic estimator [21], Wacholder’s bootstrapping techniques [27], Volovics’ derivation [26], and Schouten’s formula [19] in “Poisson regression” settings.

In these studies, basically two types of sampling were applied for selecting the subcohort: simple random sampling, with one random sample from the cohort, or stratified sampling, with one random sample from each stratum, the strata often being defined by one or more confounding variables such as sex and age categories. The sampling fractions could vary across strata. Most studies used the stratified sampling approach, sampling a fixed number of subjects without replacement within each stratum, to obtain a better distribution of the subcohort across covariates for comparison with the cases. Further, almost all of the stratified case-cohort studies reported explicitly that a stratified version of the Cox regression model was used in the analysis.
Detailed descriptions of the most commonly used methods are provided in the next chapter; here we present some study examples to illustrate the usage of the case-cohort design and the analysis method applied in each study.

Table 1.1: Summary of Literature Search for Case-Cohort Studies (1997-2000) (a)

Type of analysis                                      Number of studies
Cox regression, by method of variance estimation
  Prentice estimator (Epicure)                        8
  Barlow robust estimator (SAS)                       7
  Self-Prentice asymptotic estimator (GLIM)           3
  Volovics (GLIM)                                     2
  Wacholder (bootstrap)                               1
Poisson regression for grouped data                   4
Logistic regression for odds ratio                    4
Others                                                4

(a) Excludes studies that misused the concept of case-cohort design or lacked a sufficient description of the statistical analyses.

1.1 Study Examples

1. Serum selenium levels and incidence of esophageal and gastric cancers

Recently Mark et al [12] published the results of a case-cohort study which is considered one of the largest prospective studies of serum selenium levels and cancer risks. The study was a randomized nutritional intervention trial conducted during 1985-1991 in Linxian, China, where a high rate of esophageal and gastric cancers is observed. The participants consisted of 29,584 healthy adults aged 40-69 years. Blood was drawn from all the participants prior to the start of the intervention. Frozen samples were transported to the US and stored at -70°C at the National Cancer Institute. The subsequent risks of developing squamous esophageal, gastric cardia, and gastric non-cardia cancers during the trial were examined and cases were identified for each of the outcomes.
Controls were selected using a stratified case-cohort sampling method, with six strata defined by sex and three age categories at the start of the intervention. Specifically, controls were drawn from each stratum to achieve a ratio of approximately 1:1 of controls to case subjects with incident esophageal/gastric cardia cancers. Overall, 1062 controls were selected and 1079 subjects developed one of the specified cancers. The blood samples from these subjects were processed for assessing serum selenium levels. In conclusion, the study found highly significant inverse associations of serum selenium levels with the incidence of esophageal and gastric cardia cancers, but no evidence of a relationship with gastric non-cardia cancers. For estimation of relative risks and 95% confidence intervals, the authors used the Prentice estimator with a stratified Cox proportional hazards model based on the six sex-age strata. The calculation was carried out using the Epicure software.

2. The Netherlands cohort study

Numerous studies [24, 25, 6, 20] investigating risk factors for specific cancers have been conducted and published based on the Netherlands cohort, which was started in September 1986 with 62,573 women and 58,279 men aged 55-69 years enrolled in the study. A subcohort of 3,500 subjects (1,688 men, 1,812 women) was sampled randomly from the cohort after the baseline exposure measurements. The subcohort has been followed biennially for vital status information in order to estimate the accumulated person-years in the cohort. Incident cancer cases have been identified by record linkage to cancer registries and a pathology register. During 1997 to 2000, at least four case-cohort studies were conducted and their results published in medical journals.
Two of them examined the relationship between breast cancer risk and diet in adolescence or anthropometric indices such as height, weight, and weight change. One study tested the hypothesis that occupational exposure to carcinogens increases the risk of lung cancer. The last study investigated the relationship between anthropometry and prostate cancer risk. For methods of analysis, the first three studies used the estimator suggested by Self and Prentice through the GLIM software, whereas the last one applied the method derived by Volovics, implemented with a self-developed macro in GLIM.

3. The atherosclerosis risk in communities (ARIC) study

The ARIC study follows a cohort of 15,792 men and women aged 45-64 years who were initially examined between 1986 and 1989. The participants were randomly selected from four US communities. The main goals of the ARIC study are to investigate the etiology and natural history of subclinical and clinical atherosclerosis. During 1997-2000, four case-cohort studies [9, 16, 22, 5] were published, each using a separately sampled subcohort drawn from those participants who satisfied further restriction criteria for that particular study. They investigated the association of coronary heart disease (CHD) with the history of the following risk factors: infectious diseases such as Chlamydia pneumoniae, cytomegalovirus, or herpes simplex virus; and abnormality of cardiac functions. Three studies used a stratified sampling method, and two of them actually shared the same subcohort, sampled from strata defined by sex, age, and arterial wall thickness. The other stratified study partitioned patients by center-race and a disease indicator. Only one study took the simple sampling approach.
For estimation of relative risks, two of the studies applied the robust variance estimator proposed by Barlow, which uses the Cox proportional hazards regression procedure phreg in SAS. The other two studies used two different methods to estimate the same quantity: one by Prentice and the other by Schouten in a Poisson regression setting.

1.2 Exposure Stratified Case-Cohort Design

Recently, a new sampling method for case-cohort studies has been proposed by Borgan, Langholz, Samuelsen, et al (BLS) [3]. Suppose that a surrogate measure of the exposure variable is available or can be easily collected for the entire cohort, while the detailed exposure information is expensive or difficult to obtain, so that it is desirable to gather this additional information only from a small sample such as the subcohort. In that situation it would seem sensible and advantageous to incorporate the surrogate information in the sampling process. The exposure stratified case-cohort design samples the subcohort members from strata defined by the surrogate measure (e.g. exposed or unexposed), with the exposed patients oversampled. In particular, when the exposure is rare, this sampling process will result in a much higher percentage of exposed subcohort controls than a simple random sample would, and therefore in a more informative subcohort.

For instance, the study of exposure to fluoroscopy and breast cancer risk, which will be used later as a data example for this dissertation, has information for each patient on whether she was treated with fluoroscopy or not. The selection of the subcohort for this study can be done more efficiently by sampling most or all of the treated patients and only a small portion of the untreated patients. The resulting subcohort should be more informative because it contains more exposed patients and greater variance in the exposure data.
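The sampling step described above can be illustrated with a minimal sketch. The cohort size, exposure rate, and stratum sample sizes below are hypothetical values chosen only for illustration, and the function name sample_subcohort is not taken from BLS [3]; the sketch assumes a binary surrogate exposure and sampling without replacement within each stratum.

```python
import random


def sample_subcohort(exposed_flags, n_per_stratum, seed=0):
    """Draw a subcohort without replacement within each surrogate-exposure stratum.

    exposed_flags: list of 0/1 surrogate exposure indicators, one per cohort member.
    n_per_stratum: dict mapping stratum label (0 or 1) to the number to sample.
    Returns the sorted sampled indices and the realized stratum sampling fractions.
    """
    rng = random.Random(seed)
    strata = {0: [], 1: []}
    for index, flag in enumerate(exposed_flags):
        strata[flag].append(index)

    sampled = []
    fractions = {}
    for label, members in strata.items():
        # Never ask for more subjects than the stratum contains.
        size = min(n_per_stratum[label], len(members))
        sampled.extend(rng.sample(members, size))
        fractions[label] = size / len(members) if members else 0.0
    return sorted(sampled), fractions


# Hypothetical cohort: 1000 subjects, 10% exposed according to the surrogate.
cohort = [1] * 100 + [0] * 900
# Oversample the exposed: 80 of 100 exposed, 120 of 900 unexposed.
subcohort, fractions = sample_subcohort(cohort, {1: 80, 0: 120})
n_exposed = sum(cohort[i] for i in subcohort)
print(len(subcohort), n_exposed, fractions)
```

With these hypothetical inputs the exposed make up 10% of the cohort but 40% of the subcohort. The returned stratum sampling fractions are the quantities an analysis would need in order to account for the oversampling.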
Obviously, the subcohort chosen by exposure stratified sampling is no longer a random sample of the full cohort, so the corresponding statistical analysis based on this sample requires some adjustments in order to eliminate the bias induced by the sampling. At this point, no actual study using this special sampling scheme has been conducted. The paper by BLS et al [3] is the first and only article in the literature investigating this special case-cohort design. They proposed three estimators for analyzing such a subcohort sample with failure time data, based on a pseudolikelihood in the spirit of Prentice. The asymptotic variance estimator they derived is a natural extension of the Self-Prentice estimator for a simple case-cohort design. Detailed descriptions of their methods are provided in the background section.

1.3 Motivation of Dissertation Topic

This literature review reveals that there are several options and tools for the analysis of case-cohort studies. Nevertheless, there has been little research comparing the existing methods in terms of properties and performance. First of all, there are three variance estimators that appear to be valid for failure time analysis of case-cohort data. These are Prentice’s original formulation, the asymptotic estimator, and the robust variance. The estimators were derived from independent theories, and their relationships as well as their differences warrant further investigation. Previous research [17, 1, 23] based on small and limited simulation studies has shown that they are approximately unbiased estimators of the actual variance of β. However, there has been a lack of rigorous testing of the validity and performance of these estimators across a variety of cohort types. Further, the efficiency of each variance estimator, which varies across methods, is also of concern.
This information is important to investigators for selecting the method of analysis and interpreting the results correctly. Therefore, a study comparing the variance estimators and providing recommendations on which estimator to use is warranted.

1.4 Outline of the Dissertation

The scope of this dissertation extends from theoretical exploration of the underlying relationship between estimators to simulation studies for examining and comparing the performance of these estimators.

The first goal is to conduct a complete and extensive comparison study which assesses the validity of the three aforementioned variance estimators in terms of unbiasedness, consistency, and precision, and compares their performance under different settings with respect to distribution parameters and cohort types. Specifically, we evaluate the methods in rare disease situations (e.g. 10% failure rate) and generate simulated data which mimic the features and structure of the following three types of studies: (1) clinical trials; (2) occupational studies; and (3) intervention trials. We use simulations as the primary tool and apply analytic techniques wherever possible. We outline each method using standardized counting process notation and investigate the underlying theoretical relationship between the asymptotic and robust estimators.

We are also interested in extending the existing analysis methods to accommodate more complex study designs. In particular, we develop robust variance estimators for the exposure stratified case-cohort design, evaluate their performance, and compare them to BLS’s asymptotic estimator I using simulated data under different settings. Finally, we test the proposed estimators and compare them to the existing methods, as well as to results from the full cohort analysis, using data collected from a real occupational cohort study.
In addition, practical aspects such as implementation of the method, computing time, and information required for calculation will be assessed as well.

The rest of this dissertation is organized as follows. In Chapter 2, we give an overview of the case-cohort design and discuss its advantages and disadvantages relative to other design options in terms of cost and efficiency, as reported in the current literature. Following that, we introduce the origin of the existing methods of variance estimation and the subsequent developments that simplify the computation using standard software such as SAS. Chapter 3 presents recent work by Langholz, the theoretical exploration of the underlying relationship between the asymptotic and robust variance estimators for a standard case-cohort design, and explicit expressions for the extra error terms involved in the robust variance, which result in a lower efficiency compared to the asymptotic counterpart. In Chapter 4, we propose three robust estimators for the exposure stratified case-cohort design, including the derivation of an asymptotic-based robust variance. Chapter 5 presents the simulation studies carried out under two sampling schemes: simple and exposure stratified case-cohort. The simulations are described in detail, including the design, scope, summary of results, and simulation-based conclusions. Chapter 6 reports the analysis performed on sampled data from a real study, the TBF cohort, using the methods described in previous chapters. Finally, in Chapter 7, we discuss our findings and propose future research in this area. The tables for the simulation studies are deferred to Appendixes A to C.

1.5 Summary of Findings

Our theory reveals that the robust and the asymptotic variances are actually closely related.
The difference between the two estimators per se is that the robust variance substitutes observed quantities for the expectations required in the asymptotic variance calculations. Although the expectation of the robust estimate is equivalent to the asymptotic one, the former estimator has extra variance terms which make it less efficient than the latter.

Simulation results suggest a downward bias in the estimators evaluated. It is especially prominent (up to 30%) in occupational cohorts, where subjects enter the study in a random pattern rather than at a fixed “time zero.” As estimators of the variance, they provide very similar results on average, and all demonstrate good consistency and high precision.

For an exposure-stratified design, three robust estimators are proposed. The first two, the so-called naive estimators, are based on the form of the robust variance for the standard case-cohort design, and the last one is derived from the asymptotic variance given by BLS et al. The main concerns for this type of design are twofold: (1) correction for the effect of over-sampled subgroups, which can be easily achieved by using a weighted likelihood with the inverse of the sampling fraction as the stratum-specific weight; (2) the stratum mean of the dfbetas is no longer zero, unlike in the standard case-cohort design, and it needs to be incorporated into the calculation of the variance. The adjusted naive robust estimator adjusts the sum of squares of the dfbetas by the stratum means in a naive way. The asymptotic-based robust estimator is derived from BLS's asymptotic estimator I, which is essentially the unadjusted robust estimator minus a correction term.
Our simulation studies demonstrate that the derived robust variance estimator provides satisfactory results compared to BLS's estimator, whereas the variances given by the naive estimators are quite different from the others.

In summary, the newly derived estimator appears to be a good candidate for an exposure-stratified design and may be a better alternative for estimating the variance when the assumptions of the model are in question. However, the robustness of this estimator with respect to model misspecification and other incorrect assumptions about the data warrants further investigation.

Chapter 2

Background

This section provides the background and theory development of the three most popular variance estimators: the Prentice method, Self and Prentice's asymptotic estimator, and Barlow's robust variance. The paths used for deriving these estimators may differ from those originally proposed, although the final formulas are exactly the same. The intention is to use standard counting process notation and to illustrate the concepts in a logical and systematic way that is easier for readers to follow.

2.1 Overview of Case-Cohort Design

The case-cohort design was first proposed by Prentice [17] as a special sampling method for large studies in which the collection of covariates for the full cohort is infeasible. Specifically, a random sample of the cohort, called the “subcohort,” is selected at the beginning of the study and is followed until the time of failure or censoring. All cases, including those that occur outside the subcohort, are used in the sampled risk set. This method is most useful
in analyzing time to failure in large cohort studies when failure is rare. Valuable resources can be used more efficiently by collecting relevant information only from the sampled subjects who are most informative to the study. The design is particularly cost-effective for studies that collect covariate information which is expensive to retrieve. With the case-cohort design, such information is required only for subcohort members and cases, which together often form a much smaller sample than the full cohort. Evidently, it is necessary to ensure that covariate information for any member of the full cohort can be retrieved whenever needed, since we cannot predict who will be in the case set. For failure time studies, the efficiency for estimating regression parameters depends strongly on the number of cases, whereas the contribution from controls is relatively small. Studies [17] have shown that this sampling scheme can be very cost efficient. Furthermore, it is well suited for studies designed to investigate multiple outcomes, since the same comparison group can be used for different endpoints.

On the other hand, one commonly recognized difficulty with the case-cohort design has been its analytic complexity. In particular, the covariances among score terms induced by the sampling design have been difficult to calculate and remain an active research area. Prentice proposed a pseudo-likelihood for parameter estimation of risk ratios. Like the partial likelihood used in Cox models, the pseudo-likelihood is based on the probability that the case failed, conditional on one subject having failed from the sampled risk set. The corresponding maximum pseudo-likelihood provides a consistent estimator of the parameter. However, the actual variance of the score is underestimated by the “standard” likelihood-based method, the inverse of the information matrix.
This is because the inverse information does not take into account the correlation between score terms induced by this particular sampling scheme. To properly estimate the variance, Prentice suggested an estimator that accounts for the covariances among score terms, conditioning on whether the case is within or outside the sampled subcohort. His method is computationally intensive and difficult to modify for situations that differ from the one for which it was originally derived.

In the past few years, various forms of pseudo-likelihood, essentially modifications of Prentice's original pseudo-likelihood with different weighting schemes, have been explored. Moreover, different approaches for variance estimation have been discussed and studied. Among those, Self and Prentice [21] proposed a consistent estimator of the variance based on asymptotic theory. Barlow [1] and Lin and Ying [11] proposed two different forms of robust variance estimators that share the same idea. Barlow's theory was based on an approximate jackknife that measures the influence of each individual on the estimated parameters. He also showed that his estimate could be generalized to more complicated sampling schemes. However, basic properties such as consistency have not yet been explored. The same author provides a SAS macro on the Internet. Lin and Ying proposed a variance estimator for a Cox model with incomplete covariates and included the case-cohort design as a special case. Wacholder et al. [27] applied a bootstrap technique that mimics the original sampling scheme by resampling cases and subcohort controls separately.

The literature search revealed that the three most commonly used methods for variance calculation in Cox proportional hazards regression have been Prentice's original formulation, Barlow's robust estimator, and Self and Prentice's asymptotic approximation.
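
The sampling scheme described in this section can be illustrated with a short sketch. It is only an illustration under stated assumptions: the function name `draw_case_cohort`, the cohort size of 1000, the 10% failure probability, and the subcohort size of 100 are all hypothetical choices made for the example, and failure status is generated at random rather than from a survival model.

```python
import random


def draw_case_cohort(is_case, m, seed=0):
    """Return (subcohort, case_cohort) as sets of subject indices.

    is_case : list of booleans, one per cohort member (True = failure)
    m       : subcohort size, drawn by simple random sampling
    """
    rng = random.Random(seed)
    n = len(is_case)
    # Subcohort: simple random sample without replacement from the full cohort.
    subcohort = set(rng.sample(range(n), m))
    # All failures are used, whether or not they fall in the subcohort.
    cases = {i for i in range(n) if is_case[i]}
    return subcohort, subcohort | cases


# Hypothetical cohort: 1000 subjects, each failing with probability 0.10.
rng = random.Random(1)
is_case = [rng.random() < 0.10 for _ in range(1000)]

subcohort, case_cohort = draw_case_cohort(is_case, m=100)
# Cases that were not sampled into the subcohort.
non_subcohort_cases = case_cohort - subcohort
```

Covariates would then be collected only for the members of `case_cohort`, which is the source of the cost saving discussed above.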
2.2 Prentice's Pseudolikelihood and Variance Estimator

Prentice [17] proposed a “pseudolikelihood” procedure for estimation of the relative risk parameter based on a proportional hazards model. He derived an explicit formula for the variance of the estimator that corrects for the covariances between score terms, conditioning on whether the case is within or outside the sampled subcohort. For simplicity of illustration, we assume an exponential relative risk, although this is not strictly necessary; the results can be generalized to other forms.

Using standard counting process notation, let $\{N_i(t), Y_i(t), Z_i(t)\}$ denote the counting process, the censoring indicator, and the covariate value at time $t$ for subject $i$. Let $n$ be the size of the full cohort $C$ and $m$ be the size of the subcohort $\tilde{C}$. Furthermore, we define

\[
\begin{aligned}
R(t) &= \tilde{C} \cup \{i \mid dN_i(t) = 1\}, \quad \text{the case-cohort set at time } t, \\
\Delta(t) &= \begin{cases} 1 & \text{if } R(t) \neq \tilde{C}, \\ 0 & \text{otherwise,} \end{cases} \\
\mathcal{F}(t) &= \{Z(u), N(u), Y(u);\ u < t;\ R(t), \Delta(t)\}, \\
r_i(t) &= Y_i(t)\, e^{\beta' Z_i(t)}, \\
r'_i(t) &= Y_i(t)\, Z_i(t)\, e^{\beta' Z_i(t)}, \\
S^{(0)}(t) &= \sum_{i \in R(t)} Y_i(t)\, e^{\beta' Z_i(t)} = \sum_{i \in R(t)} r_i(t), \\
S^{(1)}(t) &= \sum_{i \in R(t)} Y_i(t)\, Z_i(t)\, e^{\beta' Z_i(t)} = \sum_{i \in R(t)} r'_i(t), \\
S^{(2)}(t) &= \sum_{i \in R(t)} Y_i(t)\, Z_i(t)^{\otimes 2}\, e^{\beta' Z_i(t)}, \\
v_{jj} &= \text{conditional variance of the pseudoscore at } t_j, \\
v_{kj} &= \text{covariance between the score terms at } t_k \text{ and } t_j.
\end{aligned}
\]

$\mathcal{F}(t)$ contains the information for $R(t)$ and $\Delta(t)$ and all available counting, censoring, and covariate information up to time $t$. Assuming independent censoring and failures, Prentice writes the conditional probability that subject $i$ failed, given $\mathcal{F}(t)$ and the fact that one subject failed at $t$ in $R(t)$, as

\[
P\Big[dN_i(t) = 1 \,\Big|\, \mathcal{F}(t), \sum_{l \in R(t)} dN_l(t) = 1\Big] = \frac{r_i(t)}{\sum_{l \in R(t)} r_l(t)}. \tag{2.1}
\]

The pseudolikelihood for the cohort is then

\[
L(\beta) = \prod_{t} \prod_{i} \left( \frac{r_i(t)}{\sum_{l \in R(t)} r_l(t)} \right)^{dN_i(t)}. \tag{2.2}
\]

Note that the denominator is a sum over subjects at risk in $R(t)$ rather than over subjects at risk in the entire cohort. The pseudoscore for subject $i$ is then

\[
U_i(\beta) = \int \left[ Z_i(t) - \frac{S^{(1)}(t)}{S^{(0)}(t)} \right] dN_i(t).
\]
(2.3)

Before proceeding with the covariance between score terms, it is necessary to clarify some notation. Let $t_j$ and $t_k$ denote two distinct failure times with $t_k < t_j$. Assume subject $j$ failed at $t_j$ and subject $k$ failed at $t_k$. Let $U_j(\beta)$ and $U_k(\beta)$ be the corresponding pseudoscores for each failure time, and write the conditional expectation of the score at $t_j$ as

\[
\begin{aligned}
E\Big[U_j(\beta) \,\Big|\, \mathcal{F}(t_j), \sum_{l \in R(t_j)} dN_l(t_j) = 1\Big]
&= \sum_{i \in R(t_j)} \left[ Z_i(t_j) - \frac{S^{(1)}(t_j)}{S^{(0)}(t_j)} \right] \frac{r_i(t_j)}{\sum_{l \in R(t_j)} r_l(t_j)} \\
&= \frac{\sum_{i \in R(t_j)} Z_i(t_j)\, r_i(t_j)}{S^{(0)}(t_j)} - \frac{S^{(1)}(t_j)}{S^{(0)}(t_j)} \cdot \frac{\sum_{i \in R(t_j)} r_i(t_j)}{S^{(0)}(t_j)} = 0.
\end{aligned} \tag{2.4}
\]

It follows immediately that the expectation of the score for the cohort is $E[U(\beta)] = 0$. This result conforms to the partial likelihood property. Similarly, we can write the conditional variance of the pseudoscore at $t_j$ as

\[
Var\Big[U_j(\beta) \,\Big|\, \mathcal{F}(t_j), \sum_{l \in R(t_j)} dN_l(t_j) = 1\Big]
= \sum_{i \in R(t_j)} U_j^{2}(\beta)\; P\Big[i \text{ failed at } t_j \,\Big|\, \mathcal{F}(t_j), \sum_{l \in R(t_j)} dN_l(t_j) = 1\Big] = v_{jj}, \tag{2.5}
\]

where $U_j(\beta)$ inside the sum is evaluated with subject $i$ as the failure at $t_j$. The covariance between $U_k$ and $U_j$ is then

\[
E\Big[U_k(\beta)\, U_j(\beta) \,\Big|\, \mathcal{F}(t_j), \sum_{l \in R(t_j)} dN_l(t_j) = 1\Big] = v_{kj}. \tag{2.6}
\]

The calculation of this joint conditional expectation requires an explicit expression of $U_k$ for each possible failure at $t_j$. The first step is to determine the case-cohort set at $t_k$, i.e. $R(t_k)$, using the information at $t_j$. Consider two situations:

[1] $\Delta(t_j) = 0$: the case is in the subcohort, so $\tilde{C} = R(t_j)$. Recall that the case-cohort set used in the pseudolikelihood includes all subcohort members plus the case, so $R(t_k) = \{l \mid dN_l(t_k) = 1\} \cup R(t_j)$. In this case, $R(t_k)$ is fixed with regard to $R(t_j)$. Namely, the set $R(t_k)$ does not depend on the particular individual who actually failed at $t_j$, and $U_k(\beta)$ is therefore a fixed quantity in (2.6):

\[
E\Big[U_k(\beta)\, U_j(\beta) \,\Big|\, \mathcal{F}(t_j), \sum_{l \in R(t_j)} dN_l(t_j) = 1\Big]
= U_k(\beta)\; E\Big[U_j(\beta) \,\Big|\, \mathcal{F}(t_j), \sum_{l \in R(t_j)} dN_l(t_j) = 1\Big] = 0.
\]

Then $Cov[U_k(\beta), U_j(\beta)] = 0$.

[2] $\Delta(t_j) = 1$: the case is outside the subcohort, so $\tilde{C} = R(t_j) - \{j\}$ and $R(t_k) = \{l \mid dN_l(t_k) = 1\} \cup \{R(t_j) - \{j\}\}$.
In this case, $R(t_k)$ changes with the failure $j$ at $t_j$, which implies that $U_k$ is no longer constant in (2.6) but varies with the failure at $t_j$. This is the covariance induced by including the non-subcohort cases. Rewrite (2.6):

\[
E\Big[U_k(\beta)\, U_j(\beta) \,\Big|\, \mathcal{F}(t_j), \sum_{l \in R(t_j)} dN_l(t_j) = 1\Big]
= \sum_{i \in R(t_j)} \left[ Z_k(t_k) - \frac{S^{(1)}(t_k) + r'_j(t_k) - r'_i(t_k)}{S^{(0)}(t_k) + r_j(t_k) - r_i(t_k)} \right]
\left[ Z_i(t_j) - \frac{S^{(1)}(t_j)}{S^{(0)}(t_j)} \right] \frac{r_i(t_j)}{S^{(0)}(t_j)} = v_{kj}. \tag{2.7}
\]

The first factor in (2.7) is $U_k$ modified for the case at $t_j$. When $i = j$, the subject who actually failed at $t_j$, the effect of $j$ is removed from $U_k$. However, when $i \neq j$, then $R(t_k) = \{R(t_j)\} + \{j\} - \{i\}$.

Rewrite the variance of the score as

\[
Var\{U(\beta)\} = \sum_{j} \Big\{ v_{jj} + 2\Delta(t_j) \sum_{k:\, t_k < t_j} v_{kj} \Big\} = I + \Delta, \tag{2.8}
\]

where the outer sum runs over the failure times and $\Delta$ is the total covariance between score terms. The notation $\Delta(t_j)$ was introduced to indicate non-subcohort cases. $\Delta(t_j) = 0$ implies that case $j$ occurred within the subcohort; clearly, the covariance term is zero for subcohort failures. The first part of (2.8) is the variance of the pseudoscore, which can be consistently estimated by the corresponding information matrix $I$. The second part is the total covariance between score terms, which is twice the sum of the $v_{kj}$ terms over unique failure times; each $v_{kj}$ can be calculated from equation (2.6).

With a Taylor expansion of $U(\beta)$ at $\beta_0$ around $\hat\beta$, we have

\[
U(\beta_0) = U(\hat\beta) + (\beta_0 - \hat\beta)\, U'(\beta_0) + o_p(\sqrt{n}). \tag{2.9}
\]

By definition of the MLE, $U(\hat\beta) = 0$. Letting $n \to \infty$, it follows that

\[
Var(\hat\beta) = I^{-1} + I^{-1} \Delta I^{-1}. \tag{2.10}
\]

2.3 Asymptotic Variance

2.3.1 Asymptotic Variance for Standard Case-Cohort

Self and Prentice [21] modified Prentice's original pseudolikelihood by using the subcohort only as the risk set for all cases.
The difference between the two definitions is that Prentice's likelihood includes the non-subcohort case in the risk set at its own failure time. The Self and Prentice pseudolikelihood is

\[
\tilde{L}(\beta) = \prod_{t} \prod_{i} \left( \frac{r_i(t)}{\sum_{l \in \tilde{C}} r_l(t)} \right)^{dN_i(t)}. \tag{2.11}
\]

The impact of the non-subcohort case on the denominator is minimal when the sample size is large and the disease is rare. This likelihood provides a consistent estimator of the parameter which is very close to that obtained from Prentice's likelihood. However, the information $\tilde{I}$ does not provide a consistent estimator of the variance of the score.

Let $I_j$ be an indicator for subcohort membership, so that $\sum_{l \in \tilde{C}} r_l = \sum_{j=1}^{n} r_j I_j$. Let $E$ and $\tilde{E}$ be the conditional expectations of the covariate for the full cohort $C$ and the subcohort $\tilde{C}$, respectively. Then

\[
E = \frac{\sum_{j=1}^{n} Z_j r_j}{\sum_{j=1}^{n} r_j} \quad \text{and} \quad \tilde{E} = \frac{\sum_{j=1}^{n} Z_j r_j I_j}{\sum_{j=1}^{n} r_j I_j}.
\]

Rewrite the score term from the case-cohort sample, evaluated at $\beta_0$, by substituting the conditional covariate expectation $E$ for the full cohort with its subcohort counterpart $\tilde{E}$:

\[
\tilde{U} = \sum_{i=1}^{n} \int (Z_i - \tilde{E})\, dN_i = \sum_{i=1}^{n} \int (Z_i - E)\, dN_i + \int (E - \tilde{E})\, dN., \tag{2.12}
\]

where $dN. = \sum_{i=1}^{n} dN_i$. The first term is the partial likelihood score for the full cohort, and the second term is the “deviation” of the subcohort from the full cohort caused by replacing $E$ with $\tilde{E}$. Rearrange the second term as follows:

\[
\int (E - \tilde{E})\, dN.
= \int \left[ \frac{\sum_k Z_k r_k}{\sum_j r_j} - \frac{\sum_k Z_k r_k I_k}{\sum_j r_j I_j} \right] dN.
\approx \sum_{k=1}^{n} \Big(1 - \frac{n}{m} I_k\Big) X_k, \tag{2.13}
\]

where $X_k = \int (Z_k - E) \frac{r_k}{\sum_{j=1}^{n} r_j}\, dN.$ and $E$ is the expectation of the covariate from the full cohort. It follows immediately that $\sum_{k=1}^{n} X_k = 0$ and $\bar{X} = 0$. The pseudoscore can then be written as

\[
\tilde{U} = U + \sum_{k=1}^{n} \Big(1 - \frac{n}{m} I_k\Big) X_k. \tag{2.14}
\]

The approximation in (2.13) holds because, asymptotically, the term $\frac{n}{m} \sum_j r_j I_j$ approximates $\sum_k r_k$, and $X_k$ is “fixed” with respect to the random part $I_k$. Therefore, we can decompose the variance of the score for the case-cohort sample into three terms:

\[
Var(\tilde{U}) = Var(U) + \Big(\frac{n}{m}\Big)^2 Var\Big(\sum_{k=1}^{n} I_k X_k\Big) + 2\, Cov\Big[U, \sum_{k=1}^{n} \Big(1 - \frac{n}{m} I_k\Big) X_k\Big]. \tag{2.15}
\]
The covariance term is zero because the full cohort score $U$ and the second term are “conditionally uncorrelated” given the full cohort. Specifically, the covariate information and the total number of cases are determined by the given cohort. The only random part in $U$ is which subject fails and contributes to the likelihood. For the second part, $I_k$ is the only random variable; it is determined at the beginning of the study and does not depend on which individual fails during the study. Therefore, the score $U$ and the term $\sum_{k=1}^{n} (1 - \frac{n}{m} I_k) X_k$ are uncorrelated, conditioning on the covariates and the number of cases.

$Var(U)$ can be consistently estimated by the information $I$ from the full cohort or $\tilde{I}$ from the subcohort. The second term is the variance attributed to the case-cohort sampling given the full cohort, here denoted as $\Delta$. Specifically, the term $\sum_{k=1}^{n} I_k X_k$ is a sum of $m$ sampled $X_k$'s from the full cohort, and its variance can be derived by applying finite sampling theory as given by Langholz, hence

\[
\Delta = \Big(\frac{n}{m}\Big)^2 Var\Big(\sum_{k=1}^{n} I_k X_k\Big) = \frac{n(n - m)}{m}\, \sigma_X^2, \tag{2.16}
\]

where $\sigma_X^2$ is the finite-population variance of the $X_k$ in the full cohort. The term $\sigma_X^2$ can be approximated by its subcohort counterpart $\sigma_{\tilde{X}}^2$, the empirical variance of the $X_k$ in the subcohort. Similarly to $\sum_{k=1}^{n} X_k = 0$, we have $\bar{X}_{\tilde{C}} = \frac{1}{m} \sum_{k \in \tilde{C}} X_k \approx 0$. Therefore the subcohort counterpart of $\Delta$ can be estimated by the following, using the empirical variance formula for $\sigma_{\tilde{X}}^2$:

\[
\tilde{\Delta} = \frac{n(n - m)}{m}\, \sigma_{\tilde{X}}^2 = \frac{n(n - m)}{m^2} \sum_{k \in \tilde{C}} X_k^{\otimes 2}. \tag{2.17}
\]

Using the subcohort information $\tilde{I}$ as an estimate of the full cohort information $I$, the variance of the pseudoscore $Var(\tilde{U})$ is simply $\tilde{I} + \tilde{\Delta}$. Applying the sandwich estimator for the variance of $\hat\beta$, we get

\[
Var(\hat\beta) = \tilde{I}^{-1} + \tilde{I}^{-1} \tilde{\Delta}\, \tilde{I}^{-1}. \tag{2.18}
\]

2.3.2 A Working Formula for Software Packages

The asymptotic variance estimator for $Var(\hat\beta)$ given in equation (2.18) may be computed using the output from popular statistical packages, such as SAS. Write the score residual based on case-cohort data, using the pseudolikelihood defined by Self and Prentice, as

\[
R_i = \int (Z_i - \tilde{E}) \Big( dN_i - \frac{r_i}{\sum_{j \in \tilde{C}} r_j}\, dN. \Big).
\]
For censored subjects, $X_i = -\frac{m}{n} R_i$ and $I^{-1} X_i = -\frac{m}{n} I^{-1} R_i$. The quantity $I^{-1} R_i$ has been widely used as an approximation for the influence diagnostics, following the work by Cain and Lange. The detailed background for theory and application is provided in the next section. Here we show how to calculate $Var(\hat\beta)$ for case-cohort data by plugging in the influence diagnostics returned from the Cox model, which is currently an option in SAS. Denote $D_i$ as the approximate influence of subject $i$ on $\hat\beta$, so that $D_i = I^{-1} R_i$. The $D_i$'s are referred to as dfbetas in the literature and software manuals.

Rewrite (2.18), writing $I$ for the information estimated from the case-cohort data:

\[
\begin{aligned}
Var(\hat\beta) &= I^{-1} + I^{-1} \Big[ \frac{n(n - m)}{m^2} \sum_{k \in \tilde{C}} X_k^{\otimes 2} \Big] I^{-1} \\
&= I^{-1} + \frac{n(n - m)}{m^2} \Big(\frac{m}{n}\Big)^2 \sum_{k \in \tilde{C}} D_k^{\otimes 2} \\
&= I^{-1} + \frac{n - m}{n}\, D_{\tilde{C}}^{\otimes 2},
\end{aligned} \tag{2.19}
\]

where $D_{\tilde{C}}$ is a $p \times m$ matrix that contains the dfbeta residuals from $\tilde{C}$. Equation (2.19) is exactly the formula given by Therneau and Li [23], although their derivation was different.

Recall that the substitution of the $X_i$'s by the $R_i$'s requires $dN_i = 0$, i.e. the individual is censored. We need to prepare the data in this form before applying (2.19). In their paper, Therneau and Li suggest restructuring the data by creating two observations for each subcohort case, one as a case and the other as censored. Only the dfbeta from the censored observation is used to calculate the covariance. In this way, the score residual $R_i$ is split into two parts: the censored observation, which contains $-\int (Z_i - \tilde{E}) \frac{r_i}{\sum_{j \in \tilde{C}} r_j}\, dN.$, and the case observation, which contains $\int (Z_i - \tilde{E})\, dN_i$. A weighted relative risk is used in the denominator, for which the case observation is assigned a log weight equal to a large negative number (e.g., $-100$) and the censored observation is assigned a log weight of 0. The weight is used as the offset term in SAS phreg, which essentially “eliminates” the contribution of the cases to the denominator.
The exit time for all cases, regardless of subcohort membership, is set to the time point right before the failure time (i.e., a very small number is subtracted from the actual time). Subcohort cases contribute to the likelihood as censored up to the failure time and as cases at the failure time. The additional term from the subcohort case in the denominator is offset by the small weight and therefore has very little impact on the actual value. Moreover, non-subcohort cases are included in the risk set only at their own failure times, and their contribution to the denominator is offset by the small weight, so that they make essentially no contribution. Therefore, the resulting parameter estimate is the Self and Prentice estimator. However, by setting the weight for non-subcohort cases to 1 instead of the small value, we should obtain the Prentice estimator. Thus, with the proper software (such as SAS), the asymptotic variance of $\hat\beta$ can be easily calculated using the case-cohort data and the sampling fraction for the subcohort.

2.3.3 Asymptotic Estimators for Exposure-Stratified Case-Cohort

As mentioned earlier, the analysis of exposure-stratified case-cohort data requires adjustments that correct for the over-represented sample. BLS et al. [3] proposed three estimators based on the following weighted pseudolikelihood for exposure-stratified case-cohort studies:

\[
L(\beta) = \prod_{t} \prod_{i} \left( \frac{r_i(t)}{\sum_{l \in R(t)} r_l(t)\, w_l} \right)^{dN_i(t)}. \tag{2.20}
\]

The type of estimator obtained depends on the choice of the “case-cohort set” $R(t)$ and the weight $w_i$. They considered three combinations of $R(t)$ and $w_i$ which are natural generalizations of ideas from simple case-cohort to stratified sampling. For example, Estimator I uses the subcohort $\tilde{C}$ for $R(t)$ and incorporates a stratum-specific weight which is the inverse of the sampling fraction for the stratum.
They proved that all estimators are score-unbiased, which means that the expectation of the pseudoscore equals zero at the true value of $\beta$. Their simulation results showed that all of the proposed methods performed well and were more efficient than the simple sampled case-cohort study. The variance estimator for the exposure-stratified case-cohort design is a simple extension of its counterpart for the simple case-cohort design, and the derivation is very similar to that shown in the simple case-cohort section.

Assume the cohort can be partitioned into $L$ strata based on information available at entry time. Let $n_l$ and $m_l$ be the total number of patients and the number of sampled subcohort members in stratum $l$, respectively, and let $n_{l(i)}$ and $m_{l(i)}$ be the corresponding quantities for the stratum that patient $i$ belongs to at entry. We randomly sample $m_l$ subcohort members without replacement from the $n_l$ patients in stratum $l$. Then the full cohort size is $n = \sum_{l=1}^{L} n_l$ and the sampled subcohort size is $m = \sum_{l=1}^{L} m_l$. For Estimator I, $w_i = n_{l(i)} / m_{l(i)}$. The score given by this weighted pseudolikelihood is

\[
\tilde{U} = \sum_{i=1}^{n} \int (Z_i - E)\, dN_i + \int (E - \tilde{E})\, dN.,
\quad \text{where} \quad
\tilde{E} = \frac{\sum_{j=1}^{n} Z_j r_j w_j I_j}{\sum_{j=1}^{n} r_j w_j I_j}.
\]

Following the same steps as shown in Section 2.3.1,

\[
\int (E - \tilde{E})\, dN.
\approx \sum_{k=1}^{n} \int \big[ w_k I_k E - E + Z_k - Z_k w_k I_k \big] \frac{r_k}{\sum_{j=1}^{n} r_j}\, dN.
= \sum_{k=1}^{n} (1 - w_k I_k) X_k, \tag{2.21}
\]

where $X_k = \int (Z_k - E) \frac{r_k}{\sum_{j=1}^{n} r_j}\, dN.$ Decompose the sum over the cohort into sums for each stratum using the stratum-specific weight and rewrite the score term:

\[
\tilde{U} = U + \sum_{l=1}^{L} \sum_{k \in C_l} \Big(1 - \frac{n_l}{m_l} I_k\Big) X_{l(k)}. \tag{2.22}
\]

We can easily show that the term $X_{l(k)}$ has the following properties:

• The sum over the cohort is zero ($\sum_{l=1}^{L} \sum_{k \in C_l} X_{l(k)} = 0$), and the sum over the subcohort is approximately zero ($\sum_{l=1}^{L} \sum_{k \in \tilde{C}_l} X_{l(k)} \approx 0$) because $\tilde{E} \approx E$;
• The mean over strata (overall mean) is zero for the cohort ($\bar{X}_C = 0$) and is approximately zero for the subcohort ($\bar{X}_{\tilde{C}} \approx 0$);
• The sum within a stratum is not zero for the cohort or the subcohort ($\sum_{k \in C_l} X_{l(k)} \neq 0$ and $\sum_{k \in \tilde{C}_l} X_{l(k)} \neq 0$);
• The stratum mean is not zero for the cohort or the subcohort ($\bar{X}_{C_l} \neq 0$ and $\bar{X}_{\tilde{C}_l} \neq 0$).

In addition, the same argument of “conditionally uncorrelated” terms used for the simple case-cohort design applies here as well. Therefore the covariance between $U$ and $\sum_{l=1}^{L} \sum_{k \in C_l} (1 - \frac{n_l}{m_l} I_k) X_{l(k)}$ is zero. So the variance of the pseudoscore has the following two components:

\[
Var(\tilde{U}) = I + \sum_{l=1}^{L} \Big(\frac{n_l}{m_l}\Big)^2 Var\Big(\sum_{k \in C_l} I_k X_{l(k)}\Big) = I + \Delta.
\]

The term $I$ can be estimated fairly well by $\tilde{I}$ from the case-cohort sample. Moreover, the $I_{l(k)}$'s are independent across strata as a result of independent sampling, so the second quantity in the above equation can be simplified to

\[
\Delta = \sum_{l=1}^{L} \Big(\frac{n_l}{m_l}\Big)^2 Var\Big(\sum_{k \in C_l} I_k X_{l(k)}\Big) = \sum_{l=1}^{L} \frac{n_l (n_l - m_l)}{m_l}\, \sigma_{X_l}^2. \tag{2.23}
\]

As in the simple case-cohort design, $\sigma_{X_l}^2$ can be estimated by its subcohort counterpart $\sigma_{\tilde{X}_l}^2$ using the empirical variance within stratum $l$, namely,

\[
\sigma_{\tilde{X}_l}^2 = \frac{1}{m_l} \sum_{k \in \tilde{C}_l} (X_k - \bar{X}_{\tilde{C}_l})^{\otimes 2},
\quad \text{where} \quad
\bar{X}_{\tilde{C}_l} = \frac{1}{m_l} \sum_{k \in \tilde{C}_l} X_k.
\]

The score residual for the exposure-stratified design can be written as follows, using the weighted likelihood in (2.20):

\[
R_i = \int (Z_i - \tilde{E}) \Big( dN_i - \frac{w_i r_i}{\sum_{j \in \tilde{C}} w_j r_j}\, dN. \Big).
\]

For censored patients, $dN_i = 0$ and $X_i = -\frac{1}{w_i} R_i = -\frac{m_{l(i)}}{n_{l(i)}} R_i$. Then $\sigma_{\tilde{X}_l}^2 = \frac{m_l^2}{n_l^2}\, \sigma_{\tilde{R}_l}^2$. So the asymptotic variance for the exposure-stratified design can be expressed as follows:

\[
Var(\hat\beta) = \tilde{I}^{-1} + \sum_{l=1}^{L} \frac{m_l (n_l - m_l)}{n_l}\, \sigma_{D_l}^2, \tag{2.24}
\]

where $\sigma_{D_l}^2 = \frac{1}{m_l} \sum_{k \in \tilde{C}_l} (D_k - \bar{D}_l)^{\otimes 2}$ and $\bar{D}_l = \frac{1}{m_l} \sum_{k \in \tilde{C}_l} D_k$. Note that the dfbetas are contributions from the “censored” subcohort. In particular, for subcohort failures, which make two contributions (one as censored up to the failure time and the other as a failure at the failure time), the covariance part uses only the censored contribution.

2.4 Robust Variance

2.4.1 Robust Variance for Standard Case-Cohort

Barlow proposed a robust variance estimator based on influence diagnostics and stated that it is an infinitesimal jackknife variance. Lin and Ying used estimation
equations for Cox models with incomplete covariate measurements and treated the case-cohort design as a special case. The variance estimator they derived is identical to Barlow's.

The influence of subject $i$ on the parameter estimate $\hat\beta$ is defined as the change in $\hat\beta$ when subject $i$ is deleted from the likelihood, denoted as $\Delta\hat\beta_i = \hat\beta - \hat\beta_{(i)}$, where $\hat\beta_{(i)}$ is the MLE obtained without subject $i$. The robust variance for the case-cohort design is then

\[
Var(\hat\beta) = \sum_{i} \Delta\hat\beta_i^{\otimes 2}. \tag{2.25}
\]

The direct calculation of $\Delta\hat\beta_i$ requires fitting the Cox model repeatedly, deleting one subject each time, which is extremely computationally intensive. However, different approaches for approximating case influence have been developed in the past decade. The commonly used method is a linear approximation to $\Delta\hat\beta_i$, which can be calculated by multiplying the inverse of the information by the score residual. This approximation has been widely used in recent years and is referred to as dfbetas in the literature. In matrix notation, let $D_{p \times n}$ be the dfbetas, $I_{p \times p}$ be the information, and $R_{p \times n}$ be the score residuals; then $D = I^{-1} R$. The robust variance estimator for $\hat\beta$ is simply $D^{\otimes 2}$.

It is worth mentioning that the $D$ matrix used here is different from that in the asymptotic variance. In this case, there is only one score residual for each subcohort case. In particular, the $D_i$'s for subcohort cases are equivalent to the sum of the two observations used in the previous method, one as the case and the other as censored.

Among the different derivations of this approximation are two basic methods: the weighted procedure proposed by Cain and Lange [4] and the influence function derived through differentials of functional statistics by Reid and Crepeau [18]. Both methods use a first-order Taylor expansion and are therefore asymptotic approximations of the variance. The framework for the two methods is summarized in the following sections.
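
The two calculations contrasted above can be sketched side by side. This is a minimal illustration rather than a reference implementation: it assumes a scalar parameter ($p = 1$) so that matrices reduce to numbers, the function names are hypothetical, and the residual values, the information value, and the cohort sizes are made up for the example. In practice the score residuals would come from a fitted Cox model.

```python
def dfbetas(score_residuals, information):
    """Linear approximation to case influence: D_i = I^{-1} R_i (scalar beta)."""
    return [r / information for r in score_residuals]


def robust_variance(score_residuals, information):
    """Robust variance: sum of squared dfbetas, one residual per subject."""
    return sum(d * d for d in dfbetas(score_residuals, information))


def asymptotic_variance(censored_subcohort_residuals, information, n, m):
    """Working formula (2.19): I^{-1} + (n - m)/n * sum of squared dfbetas,
    using only the 'censored' observations of subcohort members."""
    d = dfbetas(censored_subcohort_residuals, information)
    return 1.0 / information + (n - m) / n * sum(x * x for x in d)


# Made-up inputs, for illustration only.
info = 2.0
all_residuals = [0.8, -0.5, 0.3, -0.4, -0.2]   # one residual per subject
censored_residuals = [-0.2, -0.4, -0.2]        # subcohort, censored part only

v_robust = robust_variance(all_residuals, info)
v_asym = asymptotic_variance(censored_residuals, info, n=1000, m=100)
```

Note that the robust calculation needs only the residuals, whereas the asymptotic calculation also needs the cohort size `n` and the subcohort size `m`.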
2.4.2 Weighted Likelihood by Cain and Lange

Let $w_j$ be the case weight for subject $j$, defined as

\[
w_j = \begin{cases} 1 & \text{if } j \text{ is included,} \\ 0 & \text{if } j \text{ is excluded.} \end{cases}
\]

Further, assume that all subjects other than $j$ have weight unity. Now $\hat\beta$ depends on $w_j$. Setting $w_j = 1$ yields $\hat\beta$, the estimate from the full data. Setting $w_j = 0$ deletes subject $j$ from the likelihood and yields $\hat\beta_{(j)}$. Expand $\hat\beta(w_j)$ at $w_j = 1$:

\[
\hat\beta(w_j) = \hat\beta(1) + \frac{\partial \hat\beta(w_j)}{\partial w_j}\Big|_{w_j = 1} (w_j - 1) + o_p.
\]

With $w_j = 0$, the influence of subject $j$ is approximately $\hat\beta - \hat\beta_{(j)} \approx \frac{\partial \hat\beta}{\partial w_j}$. Therefore, the case influence can be approximated by the first derivative of $\hat\beta(w)$ evaluated with the full data. Recall that $\hat\beta(w_j)$ is the solution to the score equation $U(\hat\beta(w_j), w_j) = 0$; taking the first derivative with respect to $w_j$ gives

\[
\frac{\partial U(\hat\beta)}{\partial \beta} \frac{\partial \hat\beta(w_j)}{\partial w_j} + \frac{\partial U(\hat\beta, w_j)}{\partial w_j} = 0.
\]

Then

\[
\frac{\partial \hat\beta(w_j)}{\partial w_j} = \Big( -\frac{\partial U(\hat\beta)}{\partial \beta} \Big)^{-1} \frac{\partial U(\hat\beta, w_j)}{\partial w_j}.
\]

The first term is the inverse of the observed information from the full data. The calculation of the second term requires expressing the partial likelihood as a function of $w$. Note that $L(\beta, w)$ is differentiable in each $w$. The weighted score is then

\[
U(\beta(w), w) = \sum_{i=1}^{n} \int w_i \big(Z_i - E(\beta, w)\big)\, dN_i,
\quad \text{where} \quad
E(\beta, w) = \frac{\sum_i w_i Z_i r_i}{\sum_i w_i r_i}.
\]

The derivative of the score with respect to $w_j$ is

\[
\frac{\partial U(\beta(w), w)}{\partial w_j} = \int \big(Z_j - E(\beta, w)\big)\, dN_j - \int \big(Z_j - E(\beta, w)\big) \frac{r_j}{\sum_i w_i r_i} \sum_{i=1}^{n} w_i\, dN_i. \tag{2.26}
\]

Plugging in $w = 1$ gives

\[
R_j = \int \big(Z_j - E(\beta)\big)\, dN_j - \int \big(Z_j - E(\beta)\big) \frac{r_j}{\sum_i r_i}\, dN.
\]

This is the score residual. Therefore, the influence of subject $i$ on $\hat\beta$ can be approximated by

\[
D_i = I(\hat\beta)^{-1} R_i.
\]

Similarly, we can expand $U(\beta, w_i)$ at $w_i = 1$:

\[
U(\beta, w_i) \approx U(\beta, 1) + \frac{\partial U(\beta, 1)}{\partial w_i} (w_i - 1).
\]

With $w_i = 0$, the above equation gives

\[
\frac{\partial U(\beta, 1)}{\partial w_i} \approx U(\beta, 1) - U(\beta, 0),
\]

where $U(\beta, 1)$ is the score from the full likelihood and $U(\beta, 0)$ is the score excluding individual $i$. Therefore, the term $\frac{\partial U(\beta, 1)}{\partial w_i}$ is the contribution of $i$ to the score.
The empirical variance of the score in the partial likelihood setting is simply

\[
Var\, U(\hat\beta) = \sum_{i=1}^{n} R_i(\hat\beta)\, R_i'(\hat\beta).
\]

The variance of $\hat\beta$ is then

\[
Var(\hat\beta) = I^{-1} \Big[ \sum_{i=1}^{n} R_i(\hat\beta)\, R_i'(\hat\beta) \Big] I^{-1} = D^{\otimes 2}.
\]

Therefore, the robust variance is an empirical version of the variance estimate which does not require the relative size or sampling fraction of the subcohort.

2.4.3 Functional Statistics for Case Influence by Reid and Crepeau

The influence function of an estimator is an infinite sample concept which was originally proposed for indicating the sensitivity of the estimator to particular values [13]. In past years, influence functions of estimators have been used for calculating the empirical version of the asymptotic variance. Following the introduction of the infinitesimal jackknife, which established the bridge between the jackknife and robust estimation of variance, much research has explored robust variances under different settings through influence functions.

First we consider a simple case, a one-dimensional cumulative distribution function $F$ for a random variable $X$. Suppose the estimator $\hat\beta$ can be expressed as a functional of the empirical cumulative distribution of the data, $\hat\beta = B(\hat{F})$, and the unknown true parameter $\beta$ can be denoted as $B(F)$, the value of the functional at the true distribution $F$. Under regularity conditions, Miller [13] showed that the estimator can be expressed in the form

\[
B(\hat{F}) = B(F) + \int B'(F, x)\, d(\hat{F} - F)(x) + o_p(n^{-1/2}),
\]

where $B'(F, x)$ is a Von Mises [14] derivative defined by

\[
\lim_{\epsilon \to 0} \frac{B(F + \epsilon G) - B(F)}{\epsilon} = \int B'(F, x)\, dG(x),
\]

where $G(x) = F_x - F$ and $\epsilon F_x$ is a “contaminated” $F$ at point $x$. For a normalized function $B'$, it follows that $\int B'(F, x)\, dF(x) = 0$, so that

\[
B(\hat{F}) = B(F) + \frac{1}{n} \sum_{i=1}^{n} B'(F, X_i) + o_p(n^{-1/2}).
\]

Then $B(\hat{F}) - B(F)$ is asymptotically normal with mean zero and variance $\frac{1}{n} E[B'(F, X_i)]^2$.
In the same spirit, Reid and Crepeau extended the idea to censored data and derived an explicit formula for the influence function of the estimator. Let $H(t, z, \delta)$ be the joint distribution of the survival time $T$, the covariate vector $Z$, and the censoring indicator $\Delta$. Let $H(t, z)$ be the joint marginal distribution function of $(T, Z)$. The influence function obtained by evaluating $\lim_{\epsilon \to 0} \frac{B(H + \epsilon G) - B(H)}{\epsilon}$ is

\[
C(t, z, \delta) = I^{-1} \left\{ \delta \Big[ z - \frac{s_1(t, \beta)}{s_0(t, \beta)} \Big] - e^{\beta z} \int_0^{t} \Big[ z - \frac{s_1(u, \beta)}{s_0(u, \beta)} \Big] d\Lambda_0(u) \right\}, \tag{2.27}
\]

where $s_0(t, \beta) = E\{e^{\beta z}\}$, $s_1(t, \beta) = E\{z e^{\beta z}\}$, and $I$ is the information matrix. The variance of $\hat\beta$ is then

\[
\int C(t, z, \delta)\, C'(t, z, \delta)\, dH(t, z, \delta).
\]

For a finite sample of size $n$, the variance using the empirical distribution is simply $\frac{1}{n} \sum_{i=1}^{n} C C'$, which is the infinitesimal jackknife estimator used by Barlow.

Chapter 3

Relationship Between Asymptotic and Robust Variance Estimators for Standard Case-Cohort

3.1 Robust Variance to Asymptotic Variance

This section reflects work by Langholz.

Using the pseudolikelihood of Self and Prentice and the conditional expectation of the covariate from the subcohort, we can write the score residual for patient $i$ as follows:

\[
R_i = \int_0^{\tau} (Z_i - \tilde{E}) \Big( dN_i - \frac{r_i}{\sum_{j \in \tilde{C}} r_j}\, dN. \Big),
\]

where $\tau$ is the maximum follow-up time of the cohort. In order to explore the relationship between the two estimators, the first step is to decompose the score residual for each individual into two time segments, which separate the contribution of each individual into the “censoring” period ending right before the failure time $t$ (at $t^-$) and the “failure” moment starting just before the failure time (at $t^-$). Obviously, subcohort controls only contribute to the first
part whereas subcohort failures contribute to both segments. Non-subcohort failures are included in the risk set only at their own failure time, and therefore contribute only to the second part.

Define:

$A_i = \int_0^{\tau} (Z_i - E_{\tilde C})\, dN_i$, the "observed" part of $R_i$.

$A_i$ reflects the contribution of $i$ from the failure part, the second time period. Clearly, for subcohort controls, $A_i = 0$ because $dN_i = 0$. Further, define:

$B_i = \int_0^{\tau} (Z_i - E_{\tilde C}) \frac{r_i}{\sum_{j \in \tilde C} r_j}\, dN_\cdot$, the "expected" part of $R_i$.

$B_i$ reflects the contribution of $i$ from the "censoring" period, the first time segment. For non-subcohort failures, $B_i = 0$ because they are not in the risk set, so that $Y_i(t) = 0$ and $r_i(t) = 0$.

The score residual can then be partitioned as shown in Table 3.1 according to subcohort membership and time period.

Table 3.1: Standard Case-Cohort: Partitioned Score Residuals.

                                  Subcohort          Non-subcohort
                                  (all members)      (failures)
  Score residual matrix           R_0                R_1
  Matrix dimension                p x m              p x m*
  Score residual for patient i    A_i - B_i          A_i
  Corresponding matrix            A_0 - B_0          A_1

  *Number of non-subcohort cases.

For subcohort non-failures, $A_i = 0$ and the score residual is simply $-B_i$; for subcohort failures, the residual is the full term of $R_i$, i.e. $A_i - B_i$; for non-subcohort failures, $B_i = 0$ and the residual is just $A_i$.

Rewrite the robust variance in terms of the partitioned residuals:

$D^{\otimes 2} = I^{-1} [R_0\ R_1]^{\otimes 2} I^{-1} = I^{-1} [(A_0 - B_0)\ A_1]^{\otimes 2} I^{-1}$

Expand the middle term and combine the matrices that carry the same letter and represent the same quantities:

$R^{\otimes 2} = [A_0\ A_1]^{\otimes 2} + B_0^{\otimes 2} - 2 A_0 B_0'$

Note that $A_0 B_0' = B_0 A_0'$.

Term $[A_0\ A_1]^{\otimes 2} = \sum_{i=1}^{n} \int_0^{\tau} (Z_i - E_{\tilde C})^{\otimes 2}\, dN_i$: this is the sum of squares of the pseudoscore terms, each of which has expectation zero. Therefore, the expectation of $[A_0\ A_1]^{\otimes 2}$ is just the information matrix $I$.
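The expansion of the middle term can be checked numerically for a single covariate (p = 1). The sketch below is only an illustration and is not taken from the thesis: the function name and the residual values are hypothetical, and the check is purely algebraic (it says nothing about expectations).

```python
def partition_sum_of_squares(a_sub, b_sub, a_nonsub):
    """Compare the direct and the expanded sum of squared score residuals.

    a_sub, b_sub : "observed" (A_i) and "expected" (B_i) parts for subcohort
                   members; controls have A_i = 0.
    a_nonsub     : "observed" parts A_i for non-subcohort failures (B_i = 0).
    Scalar covariate only, so the outer product reduces to a square.
    """
    # Direct form: residuals are A_i - B_i in the subcohort, A_i outside it.
    direct = sum((a - b) ** 2 for a, b in zip(a_sub, b_sub))
    direct += sum(a ** 2 for a in a_nonsub)

    # Expanded form: [A0 A1]^2 + B0^2 - 2 * A0 B0'.
    observed = sum(a ** 2 for a in a_sub) + sum(a ** 2 for a in a_nonsub)
    expected = sum(b ** 2 for b in b_sub)
    cross = sum(a * b for a, b in zip(a_sub, b_sub))
    expanded = observed + expected - 2 * cross
    return direct, expanded


# Hypothetical values: two subcohort controls, two subcohort failures,
# and two non-subcohort failures.
direct, expanded = partition_sum_of_squares(
    a_sub=[0.0, 0.0, 0.6, -0.4],
    b_sub=[0.1, 0.3, 0.2, 0.05],
    a_nonsub=[0.5, -0.7],
)
```

Both returned values agree up to floating-point rounding, which is all the identity claims; the cross term only involves subcohort failures because A_i = 0 for controls.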
Term $B_0^{\otimes 2} = R_c^{\otimes 2}$, where the contributions from subcohort failures correspond to the expected part only ($B_0$), namely the censored observations in the dataset as described in section 2.3.2.

The term $2 A_0 B_0'$ reflects the contribution of subcohort failures. By definition,

$A_0 B_0' = \sum_{i \in \tilde C} \left[ \int_0^{\tau} (Z_i - E_{\tilde C}(t))\, dN_i(t) \right] \left[ \int_0^{\tau} (Z_i - E_{\tilde C}(s)) \frac{r_i(s)}{\sum_{j \in \tilde C} r_j(s)}\, dN_\cdot(s) \right]'$

Because $r_i(t)$ is zero after a jump and the censoring variable $Y_i(t)$ is left continuous, $r_i(s) = 0$ for $s > t$; therefore the second integral is nonzero only up to $t^-$. Further, using $dN_i = r_i\, d\Lambda_0 + dM_i$ and estimating $d\Lambda_0$ with the "case-cohort" version of the Breslow estimator,

$d\hat\Lambda_0(t) = \frac{dN_\cdot(t)}{\frac{n}{m} \sum_{j \in \tilde C} r_j(t)}$

we can rewrite the above term as:

$\sum_{i \in \tilde C} \int_0^{\tau} (Z_i - E_{\tilde C}(t)) \left[ \int_0^{t^-} (Z_i - E_{\tilde C}(s))' \frac{r_i(s)}{\sum_{j \in \tilde C} r_j(s)}\, dN_\cdot(s) \right] \left[ \frac{m}{n} \frac{r_i(t)\, dN_\cdot(t)}{\sum_{j \in \tilde C} r_j(t)} + dM_i(t) \right]$

where $M(t)$ is a martingale and its expectation is zero. Thus,

$2 E[A_0 B_0'] \approx \frac{m}{n} B_0^{\otimes 2}$

Re-form the robust variance from the partitioned terms:

$D^{\otimes 2} = I^{-1} \left[ I + \epsilon + \frac{n - m}{n} R_c^{\otimes 2} + M(\tau) \right] I^{-1}$
$\quad = I^{-1} + I^{-1} \left\{ \frac{n - m}{n} R_c^{\otimes 2} \right\} I^{-1} + \epsilon' + M'(\tau)$
$\quad = I^{-1} + \frac{n - m}{n} D_c^{\otimes 2} + \epsilon' + M'(\tau)$ (3.1)

where $E[\epsilon'] = 0$ and $E[M'(\tau)] = 0$. Thus,

$E[D^{\otimes 2}] = I^{-1} + \frac{n - m}{n} D_c^{\otimes 2}$

This is exactly the asymptotic variance given by Self and Prentice.

3.2 Relative Efficiency: Comparison of Variances of Variance Estimators

In the previous section we showed that the relationship between the robust and asymptotic variance estimators is that of an empirical quantity versus its expectation. Here we compare the variances of the two variance estimators.

Equation 3.1 can be expressed as follows:

$Var(\hat\beta)_{robust} = Var(\hat\beta)_{asym.} + \epsilon + M$

where $\epsilon$ is the random error due to the replacement of $I$ by the sum of squares of pseudoscore terms, with $E(\epsilon) = 0$ and $Var(\epsilon) = \sigma_\epsilon^2$. The second term $M$ is a martingale with expectation zero and variance $\sigma_M^2$. So it follows immediately that

$Var[Var(\hat\beta)_{robust}] = Var[Var(\hat\beta)_{asym.}] + \sigma_\epsilon^2 + \sigma_M^2.$
(3.2)

Therefore, compared with the asymptotic estimator, the robust estimator has extra variation because observed quantities replace expectations in the estimation of $I$ and $\Delta$.

Conclusion: the above theory shows that the asymptotic estimator should be more efficient than its robust counterpart. However, calculating the asymptotic estimator requires the sampling fraction (or the size of the full cohort) for the case-cohort sample, whereas the robust estimator has no such requirement.

Chapter 4 Robust Estimators for Exposure-Stratified Case-Cohort

For this type of design, the stratification variable is correlated with exposure information, and the case-cohort sample is commonly selected by over-representing the exposed cohort. Consequently, the obtained sample will be biased and statistical analysis will require corrections for sampling fractions. Section 2.3.3 described the asymptotic variance estimators proposed by BLS et al., which essentially are extensions of the asymptotic estimator for standard case-cohort through a weighted pseudolikelihood. Naturally, we would want to extend similar ideas to robust estimators by applying a stratum-specific weight which is the inverse of the sampling fraction.

4.1 Robust Estimators Based on Weighted Pseudolikelihood

Consider the following likelihood:

[equation illegible in the source] (4.1)

where $w_i = n_l / m_l$, and $n_l$ and $m_l$ are the sizes of the cohort and of the sampled subcohort in the stratum $l$ to which subject $i$ belongs. $R(t)$ can be the subcohort $C$, or the subcohort plus those cases that occurred outside the subcohort, i.e. $C \cup D$. For simplicity, here we choose $R(t) = C$. The robust estimator can then be calculated using the df betas returned from a Cox model based on this weighted pseudolikelihood.

4.1.1 The Naive Robust Estimators

Naturally, the first robust estimator one would want to explore for the exposure-stratified design mimics the form for the simple case-cohort design, namely, using the sum of cross products of df betas over the case-cohort sample as the variance. Here we call it the unadjusted naive robust estimator:

$Var(\hat\beta)_{unadjusted\ naive\ robust} = \sum_{l=1}^{L} \sum_{k \in R_l} D_k^{\otimes 2}$

where $R_l = C_l \cup d_l$, the case-cohort set in stratum $l$. This would be the variance suggested by Barlow for his modified case-cohort design.

However, as mentioned in Section 2.3.3, with the exposure-stratified design the stratum mean of the $X_k$'s is no longer zero as it is in standard case-cohort ($L = 1$), although the overall mean for the subcohort remains zero. This fact directly extends to other related terms such as the score residual $R_i$ and the df betas. Evidently, the nonzero strata means need to be built into the variance calculation. Thus, an intuitive way to incorporate the strata means of df betas is:

$Var(\hat\beta)_{adjusted\ naive\ robust} = \sum_{l=1}^{L} \sum_{k \in C_l \cup d_l} (D_k - \bar D_l)^{\otimes 2}$ (4.3)

where $\bar D_l = \frac{1}{m_l} \sum_{k=1}^{m_l} D_k$. In a way, this looks like a closed form for the empirical variance of $\hat\beta$.

4.1.2 The Asymptotic Variance Based Robust Estimator

The approach for deriving the adjusted robust estimator is based on the formulation of the asymptotic estimator given by BLS et al., with empirical variances used as approximations for expected quantities.

Derivation: Using the weighted pseudolikelihood defined in 4.1, the score residual for patient $i$ in stratum $l$ can be expressed as follows:

$R_{i(l)} = \int_0^{\tau} (Z_i - E_{\tilde C}) \left( dN_i - \frac{w_i r_i}{\sum_{j \in \tilde C} w_j r_j}\, dN_\cdot \right)$

where $E_{\tilde C}$ [definition illegible in the source]. Further, define

$A_{i(l)} = \int_0^{\tau} (Z_i - E_{\tilde C})\, dN_i$, the "observed" part of $R_i$.

For subcohort non-failures, $A_{i(l)} = 0$ because $dN_i = 0$. The stratum mean of the $A_{i(l)}$'s over the subcohort is:

$\bar A_l = \frac{1}{m_l} \sum_{i=1}^{m_l} A_{i(l)}$

and,
$B_{i(l)} = \int_0^{\tau} (Z_i - E_{\tilde C}) \frac{w_i r_i}{\sum_{j \in \tilde C} w_j r_j}\, dN_\cdot$, the "expected" part of $R_i$,

which is the sum of weighted differences over all failure times. For non-subcohort failures, $B_{i(l)} = 0$ because they are not in the risk set, so that $Y_i(t) = 0$ and $r_i(t) = 0$. The stratum mean of $B_i$ over the subcohort $C_l$ is:

$\bar B_l = \frac{1}{m_l} \sum_{i=1}^{m_l} B_{i(l)}$

Note that $\bar A_l$ and $\bar B_l$ are the stratum means over the subcohort $C_l$, which includes both controls and cases.

Let $\tilde R_{i(l)} = R_{i(l)} - \bar R_l$ be the score residual adjusted by the stratum mean. Then the score residuals can be partitioned according to subcohort membership and failure status. As shown in Table 4.1, these terms are further broken down by strata to allow for adjustment within each stratum.

Table 4.1: Exposure-Stratified Case-Cohort: Partitioned Score Residuals.

                                               Subcohort              Non-subcohort
                                               (all members)          (failures)
  Patient set                                  C_l                    d_l
  Strata (l)                                   l = 1, 2, ..., L       l = 1, 2, ..., L
  Score residual matrix                        R_l(0)                 R_l(1)
  Score residual for patient i in stratum l    A_i(l) - B_i(l)        A_i(l)
  Corresponding matrix                         A_l(0) - B_l(0)        A_l(1)
  Matrix dimension                             p x m_l                p x m_l*
  Stratum mean (R-bar_l)                       A-bar_l - B-bar_l

Recall that the asymptotic variance for the exposure-stratified design with a single covariate is:

$Var(\hat\beta)_{asym.}$ = [equation illegible in the source] (4.4)

The first term $I$ can be approximated by the sum of squares of the $A_i$'s over the case-cohort sample, i.e.,

$I \approx \sum_{l=1}^{L} \left( A_{l(0)}^{\otimes 2} + A_{l(1)}^{\otimes 2} \right)$

The second term in 4.4 is the sum of squares of adjusted score residuals from the expected part (the $B_i$'s), namely the censored observations from the subcohort. Recall that contributions from subcohort failures are twofold: one as censored up to the failure time and the other as a failure at the failure time. This part reflects contributions from the censored observations.

$\sum_{l=1}^{L} \sum_{i \in C_l} (B_{i(l)} - \bar B_l)^{\otimes 2} = \sum_{l=1}^{L} \sum_{i=1}^{m_l} (B_{i(l)} - \bar B_l)^{2}$
$\quad = \sum_{l=1}^{L} \left[ \sum_{i=1}^{m_l} B_{i(l)}^2 - m_l \bar B_l^2 \right]$
$\quad = \sum_{l=1}^{L} B_{l(0)}^{\otimes 2} - \sum_{l=1}^{L} m_l \bar B_l^{\otimes 2}$
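The last step of the second-term expansion is the usual "sum of squares minus m times the squared mean" identity applied within each stratum. A small sketch with a single covariate makes this concrete; the stratum labels and the B values are hypothetical and serve only as an arithmetic check.

```python
def centered_sum_of_squares(b_by_stratum):
    """Return (direct, shortcut) forms of the stratum-centered sum of squares.

    b_by_stratum maps a stratum label to the list of "expected" parts B_i
    for the subcohort members of that stratum (scalar covariate).
    """
    direct = 0.0
    shortcut = 0.0
    for values in b_by_stratum.values():
        m_l = len(values)
        mean_l = sum(values) / m_l
        # Direct form: sum over i of (B_i - mean)^2 within the stratum.
        direct += sum((b - mean_l) ** 2 for b in values)
        # Shortcut form: sum of B_i^2 minus m_l times the squared stratum mean.
        shortcut += sum(b * b for b in values) - m_l * mean_l ** 2
    return direct, shortcut


# Hypothetical strata: unexposed and exposed subcohort members.
b_direct, b_shortcut = centered_sum_of_squares(
    {"unexposed": [0.10, 0.25, 0.05, 0.30], "exposed": [0.40, 0.15, 0.35]}
)
```

The two values coincide up to rounding, so the stratum means enter the variance only through the correction term that subtracts m_l times the squared mean in each stratum.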
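The third term in 4.4 can be expanded similarly,

$\sum_{l=1}^{L} \frac{m_l}{n_l} \sum_{i \in C_l} (B_{i(l)} - \bar B_l)^{\otimes 2} = \sum_{l=1}^{L} \frac{m_l}{n_l} B_{l(0)}^{\otimes 2} - \sum_{l=1}^{L} \frac{m_l}{n_l} m_l \bar B_l^{\otimes 2}$

The first term above can be approximated by contributions from the subcohort using the following relationship:

$2 \sum_{l=1}^{L} A_{l(0)} B_{l(0)}' \approx \sum_{l=1}^{L} \frac{m_l}{n_l} B_{l(0)}^{\otimes 2}$

Proof: By definition,

$\sum_{l=1}^{L} A_{l(0)} B_{l(0)}' = \sum_{l=1}^{L} \sum_{i \in C_l} \left[ \int_0^{\tau} (Z_i - E_{\tilde C}(t))\, dN_i(t) \right] \left[ \int_0^{\tau} (Z_i - E_{\tilde C}(s)) \frac{w_i r_i(s)}{\sum_{j \in \tilde C} w_j r_j(s)}\, dN_\cdot(s) \right]'$

Because $r_i(t)$ is zero after a jump and the censoring variable $Y_i(t)$ is left continuous, $r_i(s) = 0$ for $s > t$; therefore the second integral is nonzero only up to $t^-$. Furthermore, replace $dN_i(t)$ by the baseline hazard and the martingale residual, i.e. $dN_i = r_i\, d\Lambda_0 + dM_i$, and express $d\Lambda_0$ by the Breslow estimator using the stratum-specific weight $w_j$, i.e.:

$d\hat\Lambda_0(t) = \frac{dN_\cdot(t)}{\sum_{j \in \tilde C} w_j r_j(t)}$

Then the term $\sum_{l=1}^{L} A_{l(0)} B_{l(0)}'$ can be expanded as follows:

$\sum_{l=1}^{L} \sum_{i \in C_l} \int_0^{\tau} (Z_i - E_{\tilde C}(t)) \left[ \int_0^{t^-} (Z_i - E_{\tilde C}(s))' \frac{w_i r_i(s)\, dN_\cdot(s)}{\sum_{j \in \tilde C} w_j r_j(s)} \right] \frac{r_i(t)\, dN_\cdot(t)}{\sum_{j \in \tilde C} w_j r_j(t)} + M(\tau)$

where $M(\tau)$ is a martingale and has an expectation of zero. Thus,

$2 \sum_{l=1}^{L} E\left( A_{l(0)} B_{l(0)}' \right) \approx \sum_{l=1}^{L} \frac{m_l}{n_l} B_{l(0)}^{\otimes 2}$

Rewrite the asymptotic variance in terms of the partitioned score residuals:

$Var(\hat\beta)_{asym.} \approx I^{-1} \left[ \sum_{l=1}^{L} \left( A_{l(0)}^{\otimes 2} + A_{l(1)}^{\otimes 2} \right) + \sum_{l=1}^{L} B_{l(0)}^{\otimes 2} - \sum_{l=1}^{L} m_l \bar B_l^{\otimes 2} - 2 \sum_{l=1}^{L} A_{l(0)} B_{l(0)}' + \sum_{l=1}^{L} \frac{m_l}{n_l} m_l \bar B_l^{\otimes 2} \right] I^{-1}.$ (4.5)

Similarly, rewrite the unadjusted naive robust variance defined in the previous section in terms of the partitioned score residuals:

$Var(\hat\beta)_{unadjusted\ naive\ robust} = \sum_{l=1}^{L} \sum_{k \in C_l \cup d_l} D_k^{\otimes 2}$
$\quad = I^{-1} \left[ \sum_{l=1}^{L} [R_{l(0)}\ R_{l(1)}]^{\otimes 2} \right] I^{-1}$
$\quad = I^{-1} \left[ \sum_{l=1}^{L} \left( (A_{l(0)} - B_{l(0)})^{\otimes 2} + A_{l(1)}^{\otimes 2} \right) \right] I^{-1}$
$\quad = I^{-1} \left[ \sum_{l=1}^{L} \left( A_{l(0)}^{\otimes 2} + A_{l(1)}^{\otimes 2} \right) + \sum_{l=1}^{L} B_{l(0)}^{\otimes 2} - 2 \sum_{l=1}^{L} A_{l(0)} B_{l(0)}' \right] I^{-1}$
$\quad \approx Var(\hat\beta)_{asym.} + I^{-1} \left[ \sum_{l=1}^{L} m_l \left( 1 - \frac{m_l}{n_l} \right) \bar B_l^{\otimes 2} \right] I^{-1}.$

To further explore the second term, we evaluate the relationship between $\bar A_l$ and $\bar B_l$:

$\bar A_l = \frac{1}{m_l} \sum_{i \in C_l} A_{i(l)}$
$\quad = \frac{1}{m_l} \sum_{i \in C_l} \int_0^{\tau} (Z_i - E_{\tilde C})\, dN_i$
$\quad = \frac{1}{m_l} \sum_{i \in C_l} \int_0^{\tau} (Z_i - E_{\tilde C}) (r_i\, d\hat\Lambda_0 + dM_i)$
$\quad = \frac{1}{m_l} \frac{m_l}{n_l} \sum_{i \in C_l} B_{i(l)} + M_2(\tau)$
$\quad = \frac{m_l}{n_l} \bar B_l + M_2(\tau)$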
Therefore, E{At] = ni Then the additional term in unadjusted robust variance, leaving out the factor J-1, can be approximated by rearranging the terms as following: Y mi(l - — )B2 ( « Y miB] - Y miAiBi 1 = 1 ni 1 = 1 i=i = - Y r m f a - B t j B t i=i L _____ i=i Term “- B ” is actually the stratum mean of score residuals contributed by cen sored observations from subcohort, i.e. ft? = — Bi. Therefore, the relationship between the asymptotic variance and the unadjusted robust variance is simply, L _ V 0,T(P^unadjU 3 ie d naive robust ~ V Q T ^P ^asym . "b -f 1 = 1 = V a r 0 )a s& m . + Y ^ o P k (4‘6) z=i where is the stratum mean of df betas from the “censored” subcohort. Thus, the adjusted robust estimator for exposure stratified case-cohort can be expressed as following: Var(£)„** = £ £ D \ - ’ E m B d P k - (4-7) /=1 fcecud, l = = 1 When L = 1 which corresponds to simple case-cohort, Dg = 0 and the second term goes away. Then, the above variance converts to the form of the robust variance under standard case-cohort. 50 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. C hapter 5 Sim ulation Studies 5.1 Standard C ase-cohort D esign 5.1.1 Cohort Generation The primary interest of our simulation study is to evaluate the performances of variance estimators as described in earlier sections and compare them under different sampling schemes. Three cohort types are considered for this purpose. 
For convenience, we will call these types: (i) clinical trials, where all patients enter the study at time zero and exit randomly, due to either failure or censoring, during the trial; (ii) occupational studies, where patients enter the study at staggered time points and exit randomly; and (iii) intervention trials, where all patients enter the study at time zero and are followed up to a predetermined termination time: patients either fail or are lost to follow-up before the end of the study, and otherwise are censored at this maximum follow-up time. A detailed description and parameter settings for each design are provided in the following section. In general, distributions for time variables (entry, failure, and censoring) are assumed exponential and covariate dependent, except that entry time is fixed at zero for clinical trials and intervention trials. The proportional hazards model with exponential relative risk is used for all risk ratio models.

Simulated data were generated from the assumed distributions using SAS macro programs. The settings of each simulation are very general: they allow the parameters to vary to produce the desired cohort while restricting them to yield a specific failure rate. In particular, the following parameters are fixed in all settings (i.e. values are assigned according to the specific cohort to be produced): probability of failure, probability of exposure, and relative risks for entry, failure, and censoring time. The baseline hazard for failure $\lambda_{T0}$ is fixed at 1, and the baseline hazards for censoring $\lambda_{C0}$ and entry $\lambda_{E0}$ are derived from the restrictions applied to each cohort.

To compare the different estimators, 1000 subjects were randomly drawn from the assumed cohort determined by the covariate, failure, and censoring distributions. 500 replications were done for each set of parameters.
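The generation scheme for the simplest design, the clinical trial cohort, can be sketched as follows. This is a minimal illustration in Python rather than the SAS macros the study used; the function names, the seed, the censoring baseline hazard of 9.0, and the choice of relative risks equal to 1 are all assumptions made for the example. Only the 1000-subject cohort, the binary exposure, the exponential times, and the 15% subcohort sampling fraction come from the text.

```python
import math
import random


def simulate_clinical_trial(n, pi1, beta_t, beta_c, lam_t0, lam_c0, rng):
    """Draw one cohort with exponential failure and censoring times.

    Each subject has a binary exposure Z (probability pi1 of exposure) and
    hazards lam_t0 * exp(beta_t * Z) and lam_c0 * exp(beta_c * Z).
    """
    cohort = []
    for _ in range(n):
        z = 1 if rng.random() < pi1 else 0
        t = rng.expovariate(lam_t0 * math.exp(beta_t * z))  # failure time
        c = rng.expovariate(lam_c0 * math.exp(beta_c * z))  # censoring time
        cohort.append({"z": z, "time": min(t, c), "failed": t < c})
    return cohort


def draw_case_cohort(cohort, fraction, rng):
    """Sample a random subcohort and add all cases that fall outside it."""
    m = round(fraction * len(cohort))
    subcohort = set(rng.sample(range(len(cohort)), m))
    cases_outside = {
        i for i, subject in enumerate(cohort)
        if subject["failed"] and i not in subcohort
    }
    return subcohort, cases_outside


rng = random.Random(2002)  # arbitrary seed, for reproducibility only
cohort = simulate_clinical_trial(
    n=1000, pi1=0.5, beta_t=0.0, beta_c=0.0, lam_t0=1.0, lam_c0=9.0, rng=rng
)
subcohort, cases_outside = draw_case_cohort(cohort, fraction=0.15, rng=rng)
```

With both relative risks equal to 1, a censoring hazard nine times the failure hazard gives an expected failure probability of 1/(1 + 9) = 0.1. The case-cohort analysis set is the union of `subcohort` and `cases_outside`.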
For consistency of the estimators, the trial sizes are increased from 500 to 2000 in increments of 500, with 500 replicates for each size. The case-cohort sample is formed by a random sample from the full cohort with a sampling fraction of 15%, plus all non-subcohort cases. A single binary covariate $Z$ is used in all settings, with $Z = 1$ for exposed and $Z = 0$ for unexposed. A detailed list of symbols and their definitions is provided in Table 5.1.

Table 5.1: Definitions of Parameters Used in Simulations

  Parameter       Definition
  T               Failure time
  C               Censoring time
  E               Entry time
  lambda_T0       Baseline hazard function for failure time
  lambda_T        Hazard function for failure time
  lambda_C0       Baseline hazard function for censoring time
  lambda_C        Hazard function for censoring time
  lambda_E0       Baseline hazard function for entry time
  lambda_E        Hazard function for entry time
  f_T(t)          Density function for failure time
  f_C(c)          Density function for censoring time
  f_E(e)          Density function for entry time
  r_T             e^{beta_T Z}, relative risk for failure time
  r_C             e^{beta_C Z}, relative risk for censoring time
  r_E             e^{beta_E Z}, relative risk for entry time
  p               Probability of failure, overall
  p_0             Probability of failure for unexposed
  p_1             Probability of failure for exposed
  p_l             Probability of lost to follow-up, overall
  pi_0            Probability of unexposed
  pi_1            Probability of exposure
  T_max           Maximum follow-up time

Clinical Trials

The probability density function for the failure time $T$ is $f_T(t) = \lambda_T e^{-\lambda_T t}$ and for the censoring time $C$ is $f_C(c) = \lambda_C e^{-\lambda_C c}$. Assuming independent censorship, the joint density function for $T$ and $C$ is then $f_{T,C}(t, c) = \lambda_T e^{-\lambda_T t} \lambda_C e^{-\lambda_C c}$. From this we can derive the probability of failure $p$:

$p = P[T < C] = \frac{\lambda_T}{\lambda_T + \lambda_C}$

Using the proportional hazards model with exponential relative risk, we can write $\lambda_T = \lambda_{T0} e^{\beta_T Z}$ and $\lambda_C = \lambda_{C0} e^{\beta_C Z}$. Further, the probability of failure for the unexposed is

$p_0 = \frac{\lambda_{T0}}{\lambda_{T0} + \lambda_{C0}}$

and for the exposed is

$p_1 = \frac{\lambda_{T0} e^{\beta_T}}{\lambda_{T0} e^{\beta_T} + \lambda_{C0} e^{\beta_C}}$

Then the expected failure rate over exposed and unexposed is

$p = \pi_0 \frac{\lambda_{T0}}{\lambda_{T0} + \lambda_{C0}} + \pi_1 \frac{\lambda_{T0} e^{\beta_T}}{\lambda_{T0} e^{\beta_T} + \lambda_{C0} e^{\beta_C}}$

Solving the above equation for $\lambda_{C0}$, we get

$\lambda_{C0} = \lambda_{T0} \frac{-[(p - \pi_0) e^{\beta_C} + (p - \pi_1) e^{\beta_T}] \pm \sqrt{\Delta}}{2 p e^{\beta_C}}$ (5.1)

where

$\Delta = (p - \pi_0)^2 e^{2\beta_C} + 2 [(p - \pi_0)(p - \pi_1) + 2 p (1 - p)] e^{\beta_T} e^{\beta_C} + (p - \pi_1)^2 e^{2\beta_T}.$

To avoid negative values of $\lambda_{C0}$, the bigger root is used. For convenience, $\lambda_{T0}$ is set to 1 in all cohort generations. The cohort is then generated from the known distributions of $T$ and $C$, which are determined by plugging in the assigned values of $p$, $\pi_1$, $e^{\beta_T}$, $e^{\beta_C}$ and the corresponding derived value of $\lambda_{C0}$.

Occupational Cohort Studies

The only difference in this design is the staggered entry time, as opposed to the fixed time zero used in clinical trials. Therefore, the cohort is generated simply by adding the exponentially distributed entry time to the exit time used in clinical trials. The mean entry time, with density function $f_E(t) = \lambda_E e^{-\lambda_E t}$, can be expressed as follows:

$\frac{\pi_0}{\lambda_{E0}} + \frac{\pi_1}{\lambda_{E0} e^{\beta_E}}$

The main interest in exposure-dependent entry time is to have more late entries in the exposed group. For exponentially distributed data, this can be achieved by setting the relative risk at values that are less than one (i.e., $e^{\beta_E Z} < 1$). Another restriction applied here is setting the mean entry time at 20% of the mean failure time, as shown in the following equation:

$\frac{\pi_0}{\lambda_{E0}} + \frac{\pi_1}{\lambda_{E0} e^{\beta_E}} = 0.2 \times \left[ \frac{\pi_0}{\lambda_{T0}} + \frac{\pi_1}{\lambda_{T0} e^{\beta_T}} \right]$

Solving this equation for $\lambda_{E0}$,

$\lambda_{E0} = 5 \times \frac{[\pi_1 + \pi_0 e^{\beta_E}]\, \lambda_{T0} e^{\beta_T}}{e^{\beta_E} [\pi_0 e^{\beta_T} + \pi_1]}$

The cohort is generated by using the assigned values of $p$, $\pi_1$, $e^{\beta_T}$, $e^{\beta_C}$, and the corresponding derived values for $\lambda_{C0}$ and $\lambda_{E0}$. $\lambda_{C0}$ is calculated using equation (5.1).

Intervention Trials

By design, patients either fail or are lost to follow-up before the end of the study, and otherwise are censored at the maximum follow-up time. Therefore the probability of failure is

$p = P[T < C \text{ and } T < T_{max}] = \frac{\lambda_T}{\lambda_T + \lambda_C} \left[ 1 - e^{-(\lambda_T + \lambda_C) T_{max}} \right]$

Similarly, the probability of lost to follow-up is

$p_l = P[C < T \text{ and } C < T_{max}] = \frac{\lambda_C}{\lambda_T + \lambda_C} \left[ 1 - e^{-(\lambda_T + \lambda_C) T_{max}} \right].$

Plugging in $\lambda_T = \lambda_{T0} e^{\beta_T Z}$ and $\lambda_C = \lambda_{C0} e^{\beta_C Z}$, the expected probabilities of failure and of lost to follow-up can be expressed by the following two equations:

$p = \frac{\pi_0 \lambda_{T0}}{\lambda_{T0} + \lambda_{C0}} \left[ 1 - e^{-(\lambda_{T0} + \lambda_{C0}) T_{max}} \right] + \frac{\pi_1 \lambda_{T0} e^{\beta_T}}{\lambda_{T0} e^{\beta_T} + \lambda_{C0} e^{\beta_C}} \left[ 1 - e^{-(\lambda_{T0} e^{\beta_T} + \lambda_{C0} e^{\beta_C}) T_{max}} \right]$ (5.3)

and

$p_l = \frac{\pi_0 \lambda_{C0}}{\lambda_{T0} + \lambda_{C0}} \left[ 1 - e^{-(\lambda_{T0} + \lambda_{C0}) T_{max}} \right] + \frac{\pi_1 \lambda_{C0} e^{\beta_C}}{\lambda_{T0} e^{\beta_T} + \lambda_{C0} e^{\beta_C}} \left[ 1 - e^{-(\lambda_{T0} e^{\beta_T} + \lambda_{C0} e^{\beta_C}) T_{max}} \right]$ (5.4)

To generate the cohort, the following parameters are fixed: $p$, $p_l$, $\lambda_{T0}$, $e^{\beta_T}$, and $e^{\beta_C}$. The remaining two parameters, $\lambda_{C0}$ and $T_{max}$, are the solutions to equations 5.3 and 5.4. These two parameters cannot be derived explicitly, so their roots are obtained numerically; a software package called "CMAP" is used to solve the above nonlinear equations.

5.1.2 Scope of Simulation Studies

The goal of this research is to evaluate the three aforementioned variance estimators developed for the standard case-cohort design, namely, the Prentice, asymptotic, and robust variance estimators.
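The derivation of the censoring baseline hazard for the clinical trial cohort in Section 5.1.1 can be checked with a short calculation. The sketch below is an illustration only: the function names are hypothetical, and it assumes that equation (5.1) is the larger root of the quadratic obtained by clearing denominators in the expected-failure-rate equation. It computes that root and then substitutes it back to confirm that the target failure probability is recovered.

```python
import math


def baseline_censoring_hazard(p, pi1, rr_t, rr_c, lam_t0=1.0):
    """Larger root of the quadratic in lambda_C0 implied by equation (5.1).

    p is the target overall failure probability, pi1 the probability of
    exposure, rr_t = exp(beta_T) and rr_c = exp(beta_C).
    """
    pi0 = 1.0 - pi1
    b = (p - pi0) * rr_c + (p - pi1) * rr_t
    # Discriminant: b^2 + 4 p (1 - p) rr_t rr_c, which expands to Delta.
    delta = b * b + 4.0 * p * (1.0 - p) * rr_t * rr_c
    return lam_t0 * (-b + math.sqrt(delta)) / (2.0 * p * rr_c)


def overall_failure_probability(lam_c0, pi1, rr_t, rr_c, lam_t0=1.0):
    """Expected failure probability pi0 * p0 + pi1 * p1 for a given lambda_C0."""
    pi0 = 1.0 - pi1
    p0 = lam_t0 / (lam_t0 + lam_c0)
    p1 = lam_t0 * rr_t / (lam_t0 * rr_t + lam_c0 * rr_c)
    return pi0 * p0 + pi1 * p1


# Settings in the spirit of the study: 10% failures, 50% exposed,
# relative risk 2 for failure and 1 for censoring.
lam_c0 = baseline_censoring_hazard(p=0.10, pi1=0.5, rr_t=2.0, rr_c=1.0)
p_check = overall_failure_probability(lam_c0, pi1=0.5, rr_t=2.0, rr_c=1.0)

# With no exposure effect the formula reduces to lambda_T0 * (1 - p) / p.
lam_c0_null = baseline_censoring_hazard(p=0.10, pi1=0.5, rr_t=1.0, rr_c=1.0)
```

Because the product of the two roots is negative whenever 0 < p < 1, exactly one root is positive, which is why taking the larger root always yields a valid hazard.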
The main objective of this simulation study is to assess the validity of the estimators and to compare their performance using simulated data sampled from various cohort types. The detailed descriptions for this task are outlined in the following sections.

5.1.2.1 Validity of estimators

The validity of each estimator is assessed with respect to the following two properties:

1. Unbiasedness

Prentice's estimator is score-unbiased, which means that the expectation of the score under $\beta_0$ is zero. This is only approximately true for the asymptotic variance in small-sample situations. The unbiasedness of the robust estimator has never been explored.

The approach for examining the unbiasedness of the estimators uses the empirical variance as the "gold standard." The estimator is considered unbiased when the 95% confidence interval for the difference between the estimator and the empirical one (e.g. $Var_{rob}(\hat\beta) - Var_{emp}(\hat\beta)$) includes zero. It is worth mentioning that the data for the empirical variance were generated independently. This ensures that the variance estimator is completely independent of the empirical variance. Thus, the variance of the difference is just the sum of the two variances. Further, the variances for the robust and asymptotic estimators are simply the empirical variances of the estimates, and the variance of the empirical variance can be calculated from a formula provided by a textbook [15]:

$Var(Var_{emp}(\hat\beta)) = \frac{1}{n} \left( \mu_4 - \frac{n - 3}{n - 1} \sigma^4 \right)$

where $\mu_4$ is the fourth moment and $\sigma^4$ is the square of the variance.

2. Consistency

By definition, the consistency of an estimator is twofold: as the sample size goes to infinity, the sample-based estimate approaches the true value (simple consistency), and the variance of the estimator approaches zero provided that it is an unbiased estimator (mean square error consistency).
For this purpose, the size of the cohort will be increased from 500 to 2000 in increments of 500. The empirical variance of each estimator will be calculated for each cohort size. The difference of the variance estimator from the "true" value and the rate of change in the variance of each variance estimator will be examined.

5.1.2.2 Precision of estimators

For each valid estimator, the precision of the estimator is also of concern. We evaluate this feature by calculating the standard error of the estimate (i.e. $SE = \sqrt{Var(Var(\hat\beta))/n}$) as a percentage of the "true" value ($SE_{emp}(\hat\beta)$). For example, $SE_{Robust}/SE_{emp}(\hat\beta)$.

5.1.2.3 Relative efficiencies

The efficiency of the estimators will be compared by calculating the relative efficiency, that is, the ratio of the variances of the variance estimators. Specifically, the efficiency of the asymptotic variance will be compared with that of its two counterparts, namely, the Prentice and the robust estimators.

5.1.2.4 Practical aspects

When several methods are available and provide similar results, one major factor that weighs heavily in an investigator's choice of method is ease of implementation. From a practical perspective, one would consider how easy or difficult it is to carry out the analysis using available software, how much computing time it needs, and what kind of information it requires. These factors will be assessed and compared among the three methods.

5.1.3 Simulation Results

All simulation work was done on a Pentium III (600 MHz CPU) computer with 128 megabytes of memory. All analyses, except the Prentice method, were performed using SAS. The Prentice variance was calculated using Epicure.

As mentioned earlier, the case-cohort design is particularly useful for rare disease situations.
Therefore, the failure rate was chosen at 10% for all simulated cohorts. The exposure rate was set at 50% in most cases, although in some cohorts a 10% exposure rate was used. Results of simulation studies for standard case-cohort are provided in Appendix A, Tables A.1 to A.14.

Assessment of unbiasedness focused on the asymptotic and robust estimators, whereas only limited evaluation was done for the Prentice method. As shown in Tables A.1-A.3, there was a clear tendency toward downward bias, which suggests that the estimators underestimated the true variance. The bias was especially severe and consistent in occupational cohorts, with a range of 5% to 21%, regardless of whether the entry time depended on exposure. Results from the Prentice method indicate similar bias. The direction of bias was mixed in clinical trials and intervention trials. To confirm the findings, simulations were repeated, and similar results were obtained for occupational cohorts. Results from clinical trials varied slightly and the magnitude of bias was relatively small (<12%). In intervention trials, very small bias (<6%) was observed, and the direction of bias for $r_T = 1$ and $r_T = 2$ was opposite. Noticeably, the confidence intervals were extremely narrow due to the very small variance of the estimators. In summary, both the asymptotic and robust estimators indicated some degree of bias (plus or minus).

Recall that the variance of $\hat\beta$ is the first-order approximation of the Taylor expansion of the variance of the score. To investigate the possible "second order bias," the score under $\beta_0$ and its corresponding asymptotic and robust variances were calculated and compared with the empirical counterpart. The 95% confidence intervals for the differences between each estimator and the empirical variance (e.g. $Var_{rob}(U(\beta_0)) - Var_{emp}(U(\beta_0))$) were calculated. Data for the empirical variance
Data for the empirical variance 61 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. were generated independently. This was done so that the variance estimators were completely independent of the empirical. The variance for the empirical variance was calculated using the same formula as described previously. As shown in Table A.4, similar downward bias as seen in simulations for Var(p) was found for Var(U(/30)) in clinical trial and occupational studies. However, in intervention trials, a substantial downward bias (23%) was observed with r = 2 which is in contradiction to results obtained for Var(j3). Since the bias in occupational cohorts was consistently large in V a r 0 ) and Var{U(fio)), the simulation was repeated with a much larger replications (5000 samples) and the results (Table A.5) were very similar which confirmed the downward bias found in previous simulations. Based on these results, we can conclude that the bias was not attributed to the “first order approximation” used in deriving Var(p) from Var(U), it is particularly true in clinical trial and occupational studies. The differences found in intervention trials are controversial and warrant further investigation. Recall that the asymptotic variance of the pseudoscore is the sum of “ I + A” and / can be estimated fairly accurately by I from subcohort with all cases in cluded. Therefore the estimation of A, which is the covariance between score terms, needs to be further evaluated. For this purpose, we want to investigate the quantities involved in the calculation of term A. As shown in equation 2.14, A consists of two parts: Var[S] and Cov[U,8], where U is the full cohort score and 8 = ££=i(l - %h)Xk- These two quantities (U and 5) are considered “con ditionally uncorrelated” and therefore the covariance term Cov[U, 5] was dropped 62 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. 
from the calculation of $\Delta$. To examine whether $Cov[U, \delta] \approx 0$, we need to calculate the full score $U$ and $\sum_{k=1}^{n} (1 - \frac{n}{m} \eta_k) X_k$. Although the second term can be simplified to $-\frac{n}{m} \sum_{k \in \tilde C} X_k$ due to the fact that $\sum_{k=1}^{n} X_k = 0$, the calculation of $X_k$ is not straightforward because of the mixed terms from the full cohort ($E$) and the subcohort ($\frac{n}{m} \sum_{j \in \tilde C} r_j$).

Our second concern is the method used for estimating $Var(X_k)$ as shown in equation 2.17. The empirical version of $Var(X_k)$ assumes that the $X_k$'s are independent. However, this is only true for $X_k(\beta_0)$; the $X_k(\hat\beta)$ are expected to be correlated. Replacing $Var(X_k(\beta_0))$ by $Var(X_k(\hat\beta))$ would result in smaller variances due to the correlation of the $X_k$ terms. Moreover, $Var(X_k)$ has a positive impact on $Var(U)$ or $Var(\hat\beta)$, so an underestimated $Var(X_k)$ will lead to a smaller asymptotic variance. Thus, we conjecture that $\Delta(\beta_0) > \Delta(\hat\beta)$ because $Var(X_k(\beta_0)) > Var(X_k(\hat\beta))$, and we investigate this question in the context of our simulation. Table A.6 presents results for the comparison of $Var(X_k(\beta_0))$ and $Var(X_k(\hat\beta))$. Little or no difference was found between these two quantities, which suggests that $Var(X_k(\hat\beta))$ approximates $Var(X_k(\beta_0))$ very well despite the correlations between the $X_k$'s.

To evaluate the consistency of the estimators, simulations were repeated while increasing the size of the trials from 500 to 2000 in increments of 500. Two distributions in the clinical trial setting, with exposure rates of 10% and 50%, are used, and the results are given in Tables A.7 and A.8, respectively. Results from the intervention trial cohort are provided in Table A.9. One caution is worth mentioning for reading these tables: the magnitude of the variances of the variance estimators (shown in parentheses) decreased greatly as the sample size increased, and therefore for better precision, the variance of the estimator was multiplied by a factor of $10^4$.
Readers should always refer to the footnotes for proper interpretation of this quantity. All estimators demonstrate good consistency, as evidenced by the following facts: (1) the variances of the variance estimators decreased at a rate of approximately $10^{-3}$ as the sample size increased; (2) the means of the estimates were close to the empirical values for large sample sizes. Overall, the robust estimator had the largest variance in all settings, while the asymptotic estimator was similar to the Prentice one. In other words, the asymptotic estimator is superior to the robust one in terms of variance.

Results of extensive variance comparisons under different settings are given in Table A.10 for clinical trials, Table A.11 for occupational studies, and Tables A.12 to A.14 for intervention trials with three different settings for lost to follow-up: 0% (no lost follow-ups), 10%, and 20%. The values for the relative risks $r_T$ and $r_C$ included 1 and 2, where $r = 1$ corresponds to $\beta = 0$ (no association with disease) and $r = 2$ corresponds to $\beta = 0.693$ (exposure-elevated risk). Simulations were done for all four possible combinations of $r_T$ and $r_C$. The relative risk for entry time as defined in occupational cohorts was chosen at 0.5, which results in more late entries in exposed cohorts. For cases with a relative risk not equal to 1, censoring and staggered entry are considered differential, i.e. dependent on exposure status.

The following five variance estimators were calculated for comparison purposes: (i) empirical variance, the actual variance of $\hat\beta$ over multiple trials; (ii) inverse information, the average of $I^{-1}$ returned from Cox models, often called the naive estimator for case-cohort studies; (iii) robust variance, the average of
the cross product of dfbetas ($D^{\otimes 2}$) from the case-cohort sample; (iv) asymptotic variance, the average of the variance estimates given by equation (2.18); and (v) Prentice, the average of the Prentice variance as shown in Section 2. In addition, the variances of the different variance estimators are also provided for assessment of precision and comparison of efficiencies. All the calculations except for the Prentice method were done with 500 trials using SAS phreg. The Prentice variance was calculated using the Epicure module Peanuts, and only the first 100 of the 500 generated trials were used because of software limitations such as time and memory constraints.

As shown in all tables, the variances given by the three estimators are very close to each other and in most cases slightly smaller than the empirical one, whereas the inverse information severely and consistently underestimated the variance, by 28% to 60%. In general, the Prentice and the asymptotic estimators performed better than the robust counterpart. Again, the efficiency of the asymptotic variance is superior to that of the robust one and similar to that of the Prentice one. For example, as seen in Table A.10 for clinical trials, the relative efficiency of robust to asymptotic ranges from 1.41 to 1.79, while that of Prentice to asymptotic ranges from 0.93 to 1.68. One noteworthy point in this table is that the relative efficiency of Prentice versus the asymptotic is very close to 1 when censoring is independent of exposure ($r_C = 1$). For occupational studies, the asymptotic variance estimator outperformed the robust one by a two-fold efficiency difference in all settings. However, compared to the Prentice estimator, the gain is very small when $r_T = 2$, and it actually loses efficiency when $r_T = 1$. With
intervention trials, the asymptotic method consistently performed slightly better than the robust one, whereas it is superior to the Prentice method only in some cases.

Computation of the asymptotic and robust variances is extremely efficient. For example, for 500 trials with 1000 subjects per trial and a 15% sampling fraction for the subcohort, generating the cohort, sampling the subcohort, and calculating the robust and asymptotic variances normally takes less than 2 minutes.

5.1.4 Conclusions from the Simulations

1. Validity of the estimators
1.1 Unbiasedness - In most cases, all estimators for $\mathrm{Var}(\hat\beta)$ showed some degree of downward bias from the “true” variance. Further evaluation of the variance of the score under $\beta_0$ (i.e., $\mathrm{Var}(U(\beta_0))$) confirmed similar findings; the bias was more prominent in occupational cohorts, where both entry time and exit time are random:
(a) Clinical Trials: almost all biased down for $\mathrm{Var}(\hat\beta)$, except the case with $r_T = 1$ and $r_C = 1$, where the robust estimator was biased up slightly (4%);
(b) Occupational Studies: biased down in all settings for both $\mathrm{Var}(\hat\beta)$ and $\mathrm{Var}(U(\beta_0))$; the bias ranged from 5% to 21%;
(c) Intervention Trials: contradictory results were obtained between $\mathrm{Var}(\hat\beta)$ and $\mathrm{Var}(U(\beta_0))$.
1.2 Consistency - all demonstrated good consistency, evidenced by:
(a) Simple consistency: the bias seemed to decrease as sample size increased.
(b) Mean square error consistency: the variability of the estimators decreased rapidly with sample size.
2. Variances of the estimators
2.1 Very small variances in all estimators (with SE < 1% of the estimate);
2.2 Asymptotic estimator appears the best;
2.3 Prentice estimator is often close to the asymptotic one;
2.4 Robust estimator has larger variance than the other two counterparts.

5.2 Exposure Stratified Case-Cohort Design

5.2.1 Cohort Generation

Three types of study designs are considered in this simulation study: clinical trials, occupational studies, and intervention trials, using the same design features as described in the standard case-cohort section. The distribution parameters used to generate the exposure-stratified cohorts are very similar to those specified for the simple case-cohort design, except that the exposure distribution varies across strata. For convenience, a binary surrogate $\tilde{Z}$ of the exposure covariate $Z$ (1 for exposed and 0 for unexposed) is used to partition the cohort into two strata ($l = 1, 2$) with the following fractions:

Stratum 1: $\tilde{Z} = 1$, exposure surrogate negative $\tilde{Z}(-)$, and $\nu_1 = P(\tilde{Z} = 1)$
Stratum 2: $\tilde{Z} = 2$, exposure surrogate positive $\tilde{Z}(+)$, and $\nu_2 = P(\tilde{Z} = 2)$

The proportion of exposed in each stratum is denoted as $r_l = P(Z = 1 \mid \tilde{Z} = l)$ with $l = 1$ or 2, and the overall exposure rate is then $r = \nu_1 r_1 + \nu_2 r_2$. The correlation between $\tilde{Z}$ and $Z$ can be specified by the two parameters below:

Sensitivity: $P(\tilde{Z} = 2 \mid Z = 1) = r_2 \nu_2 / r$
Specificity: $P(\tilde{Z} = 1 \mid Z = 0) = (1 - r_1)\nu_1 / (1 - r)$

Sampling for the case-cohort is done independently for strata 1 and 2 with fractions $\pi_1$ and $\pi_2$, respectively.

To generate the desired cohort, the following parameters are fixed at assigned values: the overall failure rate ($p$), the proportions of exposed for the cohort ($r$) and for the surrogate positive stratum ($r_2$), sensitivity, specificity, and the sampling fractions for the cohort ($\pi$) and for the surrogate positive stratum ($\pi_2$). Values for the other parameters (including $\nu_2$ and $\pi_1$) are adjusted by the constraints mentioned above.
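The constraints above can be solved for the free parameters once the fixed ones are chosen. The following is a minimal sketch rather than the program used in this study. The function name `stratum_parameters` and the input values are hypothetical, the relation `pi = nu1*pi1 + nu2*pi2` for the overall sampling fraction is our assumption (it is not stated explicitly above), and specificity is treated here as implied by the other inputs rather than fixed independently.

```python
def stratum_parameters(r, r2, sensitivity, pi, pi2):
    """Derive the remaining stratum parameters from the fixed ones.

    r           overall exposure rate, P(Z = 1)
    r2          exposure rate in the surrogate-positive stratum
    sensitivity P(surrogate positive | exposed)
    pi, pi2     sampling fractions: whole cohort, surrogate-positive stratum
    """
    nu2 = sensitivity * r / r2            # from sensitivity = r2 * nu2 / r
    nu1 = 1.0 - nu2                       # the two strata partition the cohort
    r1 = (r - nu2 * r2) / nu1             # from r = nu1 * r1 + nu2 * r2
    specificity = (1.0 - r1) * nu1 / (1.0 - r)
    pi1 = (pi - nu2 * pi2) / nu1          # ASSUMPTION: pi = nu1 * pi1 + nu2 * pi2
    params = {"nu1": nu1, "nu2": nu2, "r1": r1,
              "specificity": specificity, "pi1": pi1}
    for name, value in params.items():
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name}={value:.4f} is not a valid probability")
    return params


# Hypothetical inputs, chosen only for illustration.
example = stratum_parameters(r=0.5, r2=0.8, sensitivity=0.8, pi=0.15, pi2=0.2)
```

The range check is there because not every combination of fixed values yields valid probabilities; an inconsistent choice raises an error instead of silently producing an impossible cohort.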
Cohorts for each study type were generated using methods almost identical to those described in Section 5.1.1, except that each patient was first randomized into one of the strata according to the value of the exposure surrogate, and the sampling was done using the stratum-specific fractions.

5.2.2 Scope of Simulation Studies

The scope of our simulation studies for the exposure-stratified design is similar to that specified for the standard case-cohort design. Specifically, we want to assess the validity of BLS’s asymptotic estimator and the derived robust estimator, in terms of unbiasedness and consistency, under various cohort types. Further, we evaluate and compare the estimators with respect to the following properties: 1) Precision - the variances of the variance estimators; 2) Relative efficiencies - robust vs. asymptotic estimator; 3) Practical aspects - implementation using available software, computing time, and required information. In addition, the two naive robust estimators will be calculated and compared to the asymptotic-based robust, BLS’s asymptotic, and the empirical variances.

5.2.3 Simulation Results

For each simulated cohort, a Cox model was fitted using the weighted pseudolikelihood defined in Section 4. The related quantities, such as $\hat\beta$, $I^{-1}$, and the dfbetas, were obtained from the model. Results of the simulations are given in Appendix B, Tables B.1 to B.14. Among the three proposed robust estimators, the asymptotic-based one was the primary robust estimator for evaluation in terms of unbiasedness, consistency, and comparison to the empirical and BLS’s asymptotic estimator (Tables B.1 to B.9). The naive robust estimators were calculated for illustration and comparison only (Tables B.10 to B.14). The empirical variance of each estimator is reported, and the variance of the empirical variance was calculated using the same formula as described previously.
Note that all variances of the estimators are reported with a factor of “$\times 10^{-4}$”.

As shown in Table B.1, both the asymptotic and the robust estimators showed some degree of bias. In clinical trials and occupational studies, they both underestimated the “true” variance by up to 30%. The magnitude of the bias arising from the two estimators was actually very close. However, the bias seen in intervention trials differed in direction between the two cases considered, although its magnitude was relatively small (<5%): upward bias for $r = 1$ and downward bias for $r = 2$. The two estimators demonstrated good consistency, as shown in Tables B.2-??, evidenced by the following facts: (1) the variances of the variance estimators decreased at a rate of $> 10^{-2}$ as cohort size increased; for example, in clinical trials the coefficient of variation dropped from approximately 24% to 12% as cohort size increased from 500 to 2000; (2) the means of the estimates were close to the empirical variance for large cohort sizes. For precision, they showed very small variances, typically with SE < 2% of the estimate.

As shown in Tables B.5 to B.9, the results obtained from the two methods are very close to each other, despite the fact that they both tend to be biased downward from the empirical variance, especially in clinical trials and occupational cohorts. Further, they seemed to be closer to the “true” variance in intervention studies; in other words, very little bias was observed in these settings. The efficiencies of the two estimators are very similar, except in occupational cohorts, where the asymptotic one was superior to the robust counterpart.

As indicated in Tables B.10-B.14, results based on the naive robust estimators were quite different from the empirical and other derived estimators.
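The bias, precision, and relative-efficiency summaries discussed in this chapter can be sketched as follows. This is an illustration under stated assumptions, not the SAS code behind the tables: the function names and input numbers are hypothetical, and defining relative efficiency as a ratio of the variances of two variance estimators (values above 1 meaning the first estimator is less precise) is our reading of how the ratios are reported.

```python
from statistics import mean, variance


def summarize(beta_hats, var_estimates):
    """Summaries for one variance estimator across simulated trials.

    beta_hats      estimated log relative risk from each trial
    var_estimates  the estimator's value for Var(beta-hat) in each trial
    """
    empirical = variance(beta_hats)       # "true" variance: spread of beta-hat over trials
    avg = mean(var_estimates)             # mean of the variance estimates
    return {
        "empirical": empirical,
        "mean_estimate": avg,
        "relative_bias": (avg - empirical) / empirical,   # negative = downward bias
        "var_of_estimator": variance(var_estimates),      # precision of the estimator
    }


def relative_efficiency(var_estimates_a, var_estimates_b):
    """Ratio of the variances of two variance estimators (a relative to b)."""
    return variance(var_estimates_a) / variance(var_estimates_b)


# Hypothetical numbers for three trials, for illustration only.
summary = summarize([0.5, 0.7, 0.9], [0.02, 0.03, 0.04])
ratio = relative_efficiency([0.01, 0.03, 0.05], [0.02, 0.03, 0.04])
```

With these toy inputs the mean estimate (0.03) falls below the empirical variance (0.04), which is the kind of downward bias described above.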
Overall, the adjusted naive estimator performed poorly compared to the rest, especially in clinical and intervention trials. This suggests that the adjusted naive is not a valid estimator of the variance. In some cases, under clinical trial or occupational study settings, the unadjusted robust variance provided results closer to the empirical variance, even though they were further away from BLS’s asymptotic and the derived robust variance. This unexplained phenomenon could be due to the downward bias reported earlier. Moreover, in intervention studies, results based on the two naive robust estimators were noticeably different from the rest, whereas both BLS’s asymptotic and the derived robust estimators provided results very close to the “true” variance. In addition, the unadjusted naive has much better precision, namely, a much smaller variance compared to the other estimators.

5.2.4 Conclusions from the Simulations

1. Validity of the estimators - BLS’s asymptotic and the derived robust (asymptotic-based) variances:
1.1 Unbiasedness - Both estimators showed some degree of bias from the “true” variance, particularly in clinical trials and occupational studies;
1.2 Consistency - All demonstrated good consistency, evidenced by:
(a) Simple consistency: the bias seemed to decrease as sample size increased.
(b) Mean square error consistency: the variability of the estimators decreased rapidly with sample size.
2. Comparison of the estimators:
2.1 The asymptotic-based robust variance provides results very similar to those of BLS’s asymptotic estimator;
2.2 The naive robust variances are not considered valid estimators, although the unadjusted one may provide close results in some cases.

3.
Variances of the estimators
3.1 All estimators had very small variances, and the unadjusted naive had a much smaller variance compared to the rest;
3.2 Although in theory the robust estimator should have a larger variance than the asymptotic one, this was true only for the simulated occupational cohort. This raises the question of the form of the “real” robust estimator, which calls for further exploration.

It is worth mentioning that, for the exposure stratified design, calculation of both the robust and the asymptotic estimators requires knowledge of the sampling fraction for each stratum in forming the weighted pseudolikelihood. This condition is unavoidable because the sampling scheme results in over-represented subgroups. From a practical standpoint, both estimators can be calculated fairly easily and efficiently from the same Cox model using available software such as SAS, Splus, etc.

Chapter 6

Applications to a Real Data Example

6.1 The TBF Cohort

The cohort used for the data example was from a follow-up study [8] of 1742 female patients who were discharged alive from two tuberculosis (TB) sanatoria in Massachusetts between 1930 and 1956. During the course of lung collapse therapy, 1044 women were examined repeatedly, an average of 101 times, with X-ray fluoroscopies. The comparison group of 698 women with TB received other treatments that did not require fluoroscopic monitoring. The average age at the start of follow-up was 25.5 years, and the average length of follow-up to December 1980 was 30.2 years. At the end of the study, 75 breast cancer cases were identified, with 54 exposed and 21 unexposed. Estimates of radiation dose to the breast in the exposed were computed based on the number of fluoroscopies, type of equipment used, reconstruction of exposure conditions, and absorbed dose calculations.
Results of analysis using the full cohort data were first published by Boice and Monson [2], followed by Hrubec’s paper [8] for the second follow-up report. Using the breast cancer incidence rates of Connecticut, they found excess breast cancer cases in the exposed group (55 observed versus 35.8 expected). Using a maximum likelihood method, they found an increase in risk with radiation dose, but only after a latency of 15 years. Further findings in their papers include: risk appeared to decrease with increasing age at exposure; and low dose fractionated exposures are as effective as single exposures of the same total dose in inducing breast cancer.

This cohort provides a good example of a situation where a case-cohort study might have been cost-saving. Suppose that only cohort membership was known; then a simple case-cohort design would be appropriate. Subjects would be randomly sampled, and the radiation dose information would be needed only for the sampled subcohort and the cases. Further, suppose fluoroscopic exposure was recorded for every patient or could be easily identified from medical records; then exposure stratified sampling may be more efficient, since we could oversample the exposed cohort. In addition, because radiation dose information was available for the full cohort, we can also perform our analysis on the full cohort and compare the results with those from case-cohort samples.

6.2 Method of Sampling and Scope of Data Analysis

The full cohort data obtained were in SAS format and contained the following information: disease status, date of birth, age and date at first treatment, date at entry, date at exit, exposure and dose information.
Upon examination of the data, we identified the following problems prior to data analyses:

Data related problems and solutions:
• Two patients had a second breast cancer diagnosis; only the first one was used in the analysis.
• Due to data collection error, sixteen patients had an exit date shortly before the entry date; none of them was a breast cancer case. For analysis purposes, exit dates for these patients were set as “exit age = entry age + 0.000001.”
• Twenty-two exposed patients had missing dose information and were therefore excluded from the analysis.

Methods for sampling the case-cohort included simple (unstratified) sampling and exposure stratified sampling. For the simple case-cohort design, the following sampling fractions were used: 5%, 10%, 20%, 50%, and 100% (full cohort). For comparison purposes, calculations of the variance of $\hat\beta$ (presented as standard errors of $\hat\beta$ in the results section) included the Prentice method, the robust and asymptotic estimators, and the information inverse. Estimation of $\beta$ was based on the Cox model using the pseudolikelihood with the subcohort as the risk set, which was the same model used for calculation of the robust and asymptotic variances. In addition, the relative risk and its confidence intervals are also provided. For the exposure stratified design, the following sampling strategies were applied:
• Fixing the size of the subcohort to a multiple of the number of cases (i.e., 75, 150, 225, and 300).
• With each size, varying the ratio of exposed to unexposed (i.e., 2:1, 1:1, and 1:2).

6.3 Results of Analysis

As described in the previous section, there were 75 identified breast cancer cases, and among them 54 were exposed to fluoroscopy. The observed incidence rate for breast cancer was very low (< 5%) in this cohort, which makes it a good case for using the case-cohort design.
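The exposure stratified sampling strategies listed in Section 6.2 can be made concrete with a short sketch that turns a subcohort size and an exposed:unexposed ratio into stratum sample sizes and sampling fractions. The stratum counts used here (1044 - 22 = 1022 exposed, 698 unexposed) are our assumption about the analysis cohort after the exclusions above, and the function `allocate` and its rounding rule are hypothetical, not the procedure actually used.

```python
N_EXPOSED = 1044 - 22    # ASSUMPTION: exposed women remaining after the 22 exclusions
N_UNEXPOSED = 698        # comparison group size reported for the cohort


def allocate(subcohort_size, ratio, n_exposed=N_EXPOSED, n_unexposed=N_UNEXPOSED):
    """Split a fixed subcohort size between strata by an exposed:unexposed ratio."""
    a, b = ratio
    # Integer arithmetic, rounding halves up, so results do not depend on float rounding.
    m_exposed = (subcohort_size * a + (a + b) // 2) // (a + b)
    m_unexposed = subcohort_size - m_exposed
    return {
        "m_exposed": m_exposed,
        "m_unexposed": m_unexposed,
        "fraction_exposed": m_exposed / n_exposed,        # stratum sampling fraction
        "fraction_unexposed": m_unexposed / n_unexposed,
    }


# All twelve designs: four subcohort sizes crossed with three ratios.
designs = {(size, ratio): allocate(size, ratio)
           for size in (75, 150, 225, 300)
           for ratio in ((2, 1), (1, 1), (1, 2))}
```

The inverse of each stratum sampling fraction is the kind of stratum-specific weight that enters the weighted pseudolikelihood.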
Results of the analysis are presented in Appendix C, Table C.1 for the standard case-cohort design and Tables C.2-C.6 for the exposure stratified design.

A significant dose effect on increasing breast cancer risk was found in all samples, with the lower bound of the 95% confidence interval of the relative risk greater than 1. As shown in Table C.1, the $\hat\beta$ estimated by the Prentice method, which includes each non-subcohort case in the risk set at its own failure time, was almost identical to the one returned from the pseudolikelihood with only the subcohort in the risk set. The standard errors of $\hat\beta$ based on the three estimators (asymptotic, robust, and Prentice methods) were all very similar. However, the information inverse based estimate of $SE(\hat\beta)$ was much smaller, although the difference tends to shrink as the sampling fraction increases. In particular, when the sampling fraction reaches 50%, the estimate based on $I^{-1}$ is almost identical to the SEs from the three estimators. Compared to the result from the full cohort, the parameter estimate $\hat\beta$ varied substantially between samples. This partially reflects the variability of sampling. Clearly, the variance of $\hat\beta$ decreased as sample size increased. In parallel, the confidence intervals for the relative risk became narrower.

Results from exposure stratified sampling confirmed similar findings. All samples suggest a significant dose effect on breast cancer. For a fixed subcohort size, sampling more exposed subjects (for instance, 2:1 for exposed:unexposed) would result in a smaller variance of $\hat\beta$. Of course, increasing the overall sampling fraction would lower the variance as well. The proposed robust estimator provided results very similar to those of BLS’s asymptotic counterpart across all samples. Furthermore, when the full cohort was used, the robust estimator calculated by the exposure stratified method gave a result identical to that of the robust estimator for the standard case-cohort design.
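How a standard error translates into a confidence interval for the relative risk can be illustrated with a short sketch. It assumes the usual Wald interval on the log scale, exp(beta-hat ± 1.96 SE); the text does not state how the reported intervals were computed, so this is an assumption, and the numbers below are hypothetical rather than taken from Appendix C.

```python
import math


def relative_risk_ci(beta_hat, se, z=1.96):
    """Relative risk and Wald confidence limits from a log relative risk and its SE."""
    rr = math.exp(beta_hat)
    lower = math.exp(beta_hat - z * se)
    upper = math.exp(beta_hat + z * se)
    return rr, lower, upper


# Hypothetical values: the same beta-hat with a smaller and a larger standard error.
rr, lower, upper = relative_risk_ci(0.693, 0.20)
rr_wide, lower_wide, upper_wide = relative_risk_ci(0.693, 0.30)
```

Because the interval depends on the standard error only through the exponent, an underestimated standard error, such as one taken from the information inverse at a small sampling fraction, yields an interval that is too narrow.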
In summary, this application provides real-data evidence for the close relationship identified in this research. Results from the exposure stratified samples showed that the proposed asymptotic-based robust estimator worked very well under various sampling schemes. All findings regarding the performance of the estimators are consistent with those from the simulation studies.

Chapter 7

Discussion

This dissertation provides an important theoretical foundation for establishing the relationship, and explaining the differences, between the asymptotic and robust variance estimators in case-cohort studies with standard or exposure stratified sampling schemes. Results from our extensive simulation studies provide adequate evidence to answer the questions of major concern raised in the dissertation proposal regarding the validity and performance of existing variance estimators published in the statistical literature. Based on our findings, new issues have emerged and further research has been identified in this specific area.

7.1 Standard Case-Cohort

In general, we conclude that all three estimators provide similar and consistent results for estimating the true variance of $\hat\beta$ under a variety of cohort types. All methods are considered estimators with high precision, typically with a standard error less than 2% of the estimate.

The major concern for the validity of the estimators under evaluation is the bias indicated by the simulation results. In most cases they appear to underestimate the true variance.
Further investigation of the score and its variance given by the asymptotic and robust estimators under the true parameter value ($\beta_0$) confirms that this bias is not due to the “second order bias” from the Taylor expansion used in approximating the variance of the parameter. In particular, the bias is more prominent in occupational cohorts, where patients enter the study randomly. For instance, the bias for the asymptotic and robust estimators in occupational cohorts under the standard case-cohort design ranged from 12% to 18%. Conceptually, with prospective case-cohort sampling, which is exactly the case in our simulations, the subcohort is selected at “time zero” and the process is independent of the patient’s actual entry time into the study. It is conceivable that members of the subcohort with late entries could serve as controls only for failures occurring after their entry time. Therefore this prospective sampling scheme can cause potential bias in studies with staggered entry, although the actual impact of this bias on the variance estimators is not clear. One possible way to avoid this problem is to sample the subcohort retrospectively, namely after entry into the study.

It is worth mentioning that Barlow’s simulation was based on a setup similar to our intervention trial, in which all patients enter at “0” and exit either as failures or as censored once the expected number of failures is reached, i.e., all remaining patients are censored at the last failure time. Our simulation results indicate that very little bias was observed in intervention trials. On the other hand, Therneau’s simulations were based on a setup that resembled our clinical trials, and their results showed a similar downward bias in the asymptotic and robust estimators, which is consistent with our findings.
They recognized the bias and stated that the estimators are only approximately unbiased. To our knowledge, no simulation studies based on an occupational cohort have been reported in the literature. Therefore, the problem of bias had not been addressed or discussed before our findings from the simulation studies. Although at this stage we have not been able to identify the source of the bias or to provide solutions for minimizing it, we propose some ideas for future research at the end of this dissertation.

From the practical perspective, following the work by Therneau and Li, computation of the asymptotic variance has become extremely easy and efficient using software such as SAS or S-Plus. In addition, the robust estimator can be calculated from the output of the same Cox model. However, the Prentice method is still limited to one software package, Epicure, and is computationally intensive and time consuming. Furthermore, the asymptotic variance requires knowledge of the sampling fraction, whereas no such knowledge is required for computing the robust or Prentice variances.

7.2 Exposure Stratified Case-Cohort

When exposure stratified sampling is the better option for study design, the only analysis method available in the literature is the asymptotic variance given by BLS et al., which is considered model-based. The proposed robust estimator reduces the dependency on the model by using empirical variance approximations. However, since it was derived from the asymptotic variance, its actual robustness is not clear and needs to be investigated thoroughly. In principle, the robust estimator should be derived by calculating the empirical variance from the case influence on the full score (or on $\hat\beta$), which is actually the score residual $R_i$ (or $I^{-1}D_i$) returned from the Cox model.
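For the standard design, the robust variance was described in Section 5.1 as the cross product of the dfbetas over the case-cohort sample. A minimal sketch of that cross product follows; it assumes the dfbeta residuals have already been exported from a fitted Cox model as one row per subject and one column per coefficient. The function name and the numbers are hypothetical, and this is not the SAS code used in the dissertation.

```python
def robust_variance(dfbetas):
    """Cross product D'D of dfbeta rows.

    dfbetas  list of rows, one per subject in the case-cohort sample;
             each row holds that subject's influence on each coefficient.
    Returns a p x p matrix as nested lists.
    """
    p = len(dfbetas[0])
    return [[sum(row[j] * row[k] for row in dfbetas) for k in range(p)]
            for j in range(p)]


# Hypothetical dfbeta residuals for three subjects and two coefficients.
D = [[0.1, 0.0],
     [-0.2, 0.1],
     [0.1, -0.1]]
V = robust_variance(D)   # V[0][0] is the robust variance of the first coefficient
```

No sampling fraction enters this calculation, which matches the remark above that the robust variance, unlike the asymptotic one, does not require it in the standard design.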
For the case-cohort design, the additional variance induced by finite sampling should be estimated over all possible subcohort samples from the cohort. However, this was not the technique adopted for deriving the robust estimator for the simple case-cohort design. The proposed robust variance estimator under the exposure stratified design is a modified empirical version of BLS’s asymptotic counterpart. As a matter of fact, results from the simulation studies show that they are very close to each other. With respect to relative efficiency, the asymptotic estimator outperformed the robust one only in occupational cohorts. Some cases in intervention trials showed a borderline loss or gain in efficiency. It is not clear whether this unexpected phenomenon was purely due to chance or to some undiscovered component of the estimator. Alternatively, we can conclude that the efficiencies of the two estimators were very similar. The asymptotic-based robust variance provides results similar to those obtained from BLS’s estimator, whereas the naive robust estimators performed poorly in general, although the unadjusted naive robust variance may sometimes provide a fairly close estimate.

In addition, similar bias was found in the estimators for the exposure stratified design. The magnitude of the bias ranged from 21% to 30%, which was much higher than that seen in the standard case-cohort design. The even higher bias observed in the exposure stratified simulations is primarily a consequence of oversampling the exposed cohorts while the entry time was purposely set to be exposure dependent, which generated more late entries in the exposed group. This would result in a higher proportion of subcohort members entering the study at a late stage. Thus, if the assumption that the bias is partially due to staggered entry is true, then more bias would be expected.
With a modified case-cohort design, Barlow applied the robust estimator using a weighted pseudolikelihood similar to the one we used for the exposure stratified design. In brief, the study was designed to investigate the effect of mammography screening on preventing breast cancer death. In order to evaluate whether breast cancer diagnosis can serve as a surrogate outcome for mortality, the subcohort was augmented by all other non-fatal breast cancer cases during the course of the study. At each failure time, the risk set consisted of the failure, all non-fatal cases weighted by unity, and the subcohort excluding those non-fatal cases, weighted by the inverse of the sampling fraction. The case influence was then calculated based on this weighted pseudolikelihood. However, the actual formula used for estimating $\mathrm{Var}(\hat\beta)$ was not described explicitly in the paper, although it was implied that it could be just the sum of squares of the dfbetas over the case-cohort sample. This design was somewhat similar to the exposure stratified design, except that it was stratified by an intermediate outcome, the diagnosis of breast cancer, and this stratum was sampled 100%, therefore resulting in a weight of unity. Unlike the exposure stratified design, in which the subcohort was selected at the beginning of the study, these non-fatal cases enter the study at the time of diagnosis and exit at death. Barlow’s paper suggested that the unadjusted robust variance based on the weighted likelihood could estimate the true variance of $\hat\beta$ fairly closely. However, this is not consistent with what we found in our simulations. In a few situations, such as occupational cohorts, where more bias was found in the derived estimators (i.e., BLS’s asymptotic and the derived robust variances), the unadjusted robust variance provided results closer to the empirical variance, whereas the two estimators considered valid underestimated the true variance considerably.
It is not clear whether this result was due to chance or to some other unknown reason. On the other hand, simple derivations and simulation results show that the unadjusted robust variance is an empirical version of the unadjusted asymptotic variance given by BLS et al. (i.e., leaving out the stratum mean in the covariance part, the second term). Nevertheless, in theory, the stratum mean of the $X_k$ is nonzero and should not be dropped from the covariance term. The asymptotic-based robust variance is equal to the unadjusted naive robust variance minus a positive correction term, so the former is always smaller than the latter. The relative magnitude of this correction term was rather large, ranging from 15% to 25% of BLS’s asymptotic variance. Therefore, the correction term is not negligible for a valid estimate.

Simulations for the exposure stratified case-cohort design were based on fixed stratum-specific weights (i.e., $n_l/m_l$). Programs for time dependent weights, which use the actual sizes of the risk set in the full cohort and the subcohort ($n(t)/m(t)$) at each unique failure time, were developed but not implemented in this dissertation. However, two published papers that examined the differences between fixed weights and time varying weights, one by Barlow for the modified case-cohort design and the other by BLS et al. for the exposure stratified design, reported a slight improvement in efficiency with time dependent weights.

7.3 Future Research

In summary, this research provides theoretical and simulation-based evidence for the close relationship between the asymptotic and robust variance estimators for case-cohort studies. The proposed robust estimator under the exposure stratified design performed fairly well in our simulation studies and in the application to a real data example.
These results suggest that the newly developed robust estimator may be a potential candidate, or even a better alternative, for estimating the variance, especially in situations where the assumed model is not a good fit for the actual data. This statement is only our speculation, however; the robustness of the proposed estimator still needs to be explored extensively.

Upon completion of this dissertation, new questions have been raised and future research has been identified in this specific area:

1. Identify the source of bias through theoretical investigation and/or more rigorous and thorough simulation studies, including but not limited to:
(a) Perform simulation studies on occupational cohorts using a retrospective sampling approach and reexamine the unbiasedness of the estimators to find out whether the bias was partially caused by the late entries;
(b) Modify or adjust the estimators to accommodate studies with staggered entry (such as the occupational cohorts) and prospective sampling;
(c) Evaluate the accuracy of estimation of the covariance terms (A) by calculating the $X_k$ under $\beta_0$ and computing the corresponding empirical variances;
(d) Examine the pattern of bias: check whether increasing the sample size can reduce the bias;
(e) Quantify the magnitude of bias under various settings in terms of distribution parameters and cohort types.
2. Test the robustness of the derived robust estimator under various model misspecifications.
3. Derive the “true” robust estimator for the exposure stratified design using the classical approach, i.e., the infinitesimal jackknife concept and finite sampling theory, and compare it to the asymptotic-based robust estimator presented in this dissertation.

Reference List

[1] W.
Barlow, "Robust variance estimation for the case-cohort design", Biometrics vol. 50, pp. 1064-72, 1994.
[2] J.D. Boice and R.R. Monson, "Breast cancer in women after repeated fluoroscopic examinations of the chest", J. Natl. Cancer Inst. vol. 59, no. 3, pp. 823-832, 1977.
[3] Ø. Borgan, B. Langholz, S. Samuelsen, L. Goldstein, and J. Pogoda, "Exposure stratified case-cohort designs", Lifetime Data Analysis vol. 6, pp. 39-58, 2000.
[4] K.C. Cain and N.T. Lange, "Approximate case influence for the proportional hazards regression model with censored data", Biometrics vol. 40, pp. 493-99, 1984.
[5] J.M. Dekker, R.S. Crow, et al., "Low heart rate variability in a 2-minute rhythm strip predicts risk of coronary heart disease and mortality from several causes: the ARIC study", Circulation vol. 101, no. 11, September 12, pp. 1239-1244, 2000.
[6] M.J.M. Dirx, P.A. van den Brandt, R.A. Goldbohm, and L.H. Lumey, "Diet in adolescence and the risk of breast cancer: results of the Netherlands Cohort Study", Cancer Causes and Control vol. 10, pp. 189-199, 1999.
[7] S. Greenland, "Adjustment of risk ratios in case-base studies (hybrid epidemiologic designs)", Stat. Med. vol. 5, pp. 579-584, 1986.
[8] Z. Hrubec, J.D. Boice, Jr., R.R. Monson, and M. Rosenstein, "Breast cancer after multiple chest fluoroscopies: second follow-up of Massachusetts women with tuberculosis", Cancer Research vol. 49, pp. 229-234, 1989.
[9] D.P. Liao, J.W. Cai, et al., "Cardiac autonomic function and incident coronary heart disease: A population-based case-cohort study", Am. J. Epidemiol. vol. 145, pp. 696-706, 1997.
[10] D.Y. Lin and L.J. Wei, "The robust inference for the Cox Proportional Hazards Model", J. Amer. Stat. Assoc. vol. 84, pp. 1074-79, 1989.
[11] D.Y. Lin and Z. Ying, "Cox regression with incomplete covariate measurements", J. Amer. Stat. Assoc. vol. 88, pp. 1341-49, 1993.
[12] S.D.
Mark, Y-L Qiao, S.M. Dawsey, et al., "Prospective study of serum selenium levels and incident esophageal and gastric cancers", J. Natl. Cancer Inst. vol. 92, no. 21, pp. 1753-63, 2000.
[13] R.G. Miller, "The jackknife - a review", Biometrika vol. 61, pp. 1-15, 1974.
[14] R. von Mises, "On the asymptotic distribution of differentiable statistical functions", Ann. Math. Statist. vol. 18, pp. 309-348, 1947.
[15] A.M. Mood, F.A. Graybill, and D.C. Boes, Introduction to the theory of statistics, 3rd ed., McGraw-Hill Publishing Company, 1974.
[16] F.J. Nieto, A.R. Folsom, et al., "Chlamydia pneumoniae infection and incident coronary heart disease - The atherosclerosis risk in communities study", Am. J. Epidemiol. vol. 150, pp. 149-56, 1999.
[17] R.L. Prentice, "A case-cohort design for epidemiologic cohort studies and disease prevention trials", Biometrika vol. 73, pp. 1-11, 1986.
[18] N. Reid and H. Crepeau, "Influence functions for proportional hazards regression", Biometrika vol. 72, pp. 1-99, 1985.
[19] E.G. Schouten, J.M. Dekker, F.J. Kok, et al., "Risk ratio and rate ratio estimation in case-cohort designs: hypertension and cardiovascular mortality", Stat. Med. vol. 12, pp. 1733-1745, 1993.
[20] A.G. Schuurman, R.A. Goldbohm, E. Dorant, and P.A. van den Brandt, "Anthropometry in relation to prostate cancer risk in the Netherlands Cohort Study", Am. J. Epidemiol. vol. 151, no. 6, March 15, pp. 541-549, 2000.
[21] S.G. Self and R.L. Prentice, "Asymptotic distribution theory and efficiency results for case-cohort studies", Ann. Statist. vol. 16, pp. 64-81, 1988.
[22] P.D. Sorlie, F.J. Nieto, et al., "A prospective study of cytomegalovirus, herpes simplex virus 1, and coronary heart disease - The atherosclerosis risk in communities study", Arch. Intern. Med. vol. 160, no. 13, July 10, pp. 2027-2032, 2000.
[23] T.M. Therneau and H. Li, "Computing the Cox model for case cohort designs", Lifetime Data Analysis vol. 5, pp. 99-112, 1999.
[24] P.A. van den Brandt, M.J.M. Dirx, C.M.
Ronckers, P. van den Hoogen, and R.A. Goldbohm, "Height, weight, weight change, and postmenopausal breast cancer risk: the Netherlands Cohort Study", Cancer Causes and Control vol. 8, pp. 39-47, 1997.
[25] A.J.M. van Loon, I.J. Kant, G.M.H. Swaen, R.A. Goldbohm, A.M. Kremer, and P.A. van den Hoogen, "Occupational exposure to carcinogens and risk of lung cancer: results from The Netherlands cohort study", Occup. Environ. Med. vol. 54, pp. 817-824, 1997.
[26] A. Volovics and P.A. van den Brandt, "Methods for the analyses of case-cohort studies", Biom. J. vol. 2, pp. 195-214, 1997.
[27] S. Wacholder, M.H. Gail, D. Pee, and R. Brookmeyer, "Alternative variance and efficiency calculations for the case-cohort design", Biometrika vol. 76, pp. 117-123, 1989.

Appendix A
Simulation Results: Standard Case-Cohort

Table A.1: Standard Case-Cohort: Unbiasedness of Variance Estimators in Clinical Trials^a.
Exposure Rate = 50%; Failure Rate (overall) = 10%

  Rel. Risk      Empirical     Robust Variance                 Asymptotic Variance
  Fail  Censor   Variance^b    % of     95% CI for             % of    95% CI for
  r_T   r_C      V_Emp(β̂)     Bias^c   V_Rob - V_Emp          Bias    V_Asym - V_Emp
  1     1        .1002         -12%     (-.0130, -.0113)       -13%    (-.0143, -.0127)
  1     2        .0904           4%     ( .0028,  .0048)        -3%    (-.0019, -.0036)
  2     1        .0953          -2%     (-.0025, -.0003)        -4%    (-.0048, -.0029)
  2     2        .0973          -3%     (-.0044, -.0021)        -5%    (-.0061, -.0042)

^a Based on 500 trials with 1000 patients per trial for full cohort; case-cohort sample is formed by a 15% random sample from full cohort plus all non-subcohort cases.
^b Data for empirical variance were sampled independently. Therefore, no covariances between empirical variance and other estimators.
^c 100 x (V_Rob - V_Emp)/V_Emp.

Table A.2: Standard Case-Cohort: Unbiasedness of Variance Estimators in Occupational Studies^a.
Exposure Rate = 50%; Failure Rate (overall) = 10%

  Rel. Risk      Empirical     Robust Variance                 Asymptotic Variance
  Fail  Censor   Variance^b    % of     95% CI for             % of    95% CI for
  r_T   r_C      V_Emp(β̂)     Bias^c   V_Rob - V_Emp          Bias    V_Asym - V_Emp
  a. Entry time is exposure dependent (r_E = 0.5)
  1     1        .1177         -11%     (-.0142, -.0115)       -15%    (-.0190, -.0168)
  1     2        .1283         -16%     (-.0224, -.0191)       -21%    (-.0278, -.0254)
  2     1        .1130          -5%     (-.0073, -.0049)        -9%    (-.0116, -.0097)
  2     2        .1192         -15%     (-.0192, -.0168)       -20%    (-.0246, -.0226)
  b. Entry time is exposure independent (r_E = 1)
  1     1        .1038          -6%     (-.0074, -.0053)       -10%    (-.0116, -.0099)
  1     2        .1135          -9%     (-.0114, -.0090)       -14%    (-.0166, -.0147)
  2     1        .1211         -11%     (-.0154, -.0124)       -17%    (-.0224, -.0202)
  2     2        .1116          -9%     (-.0113, -.0087)       -15%    (-.0172, -.0154)

^a Based on 500 trials with 1000 patients per trial for full cohort; case-cohort sample is formed by a 15% random sample from full cohort plus all non-subcohort cases.
^b Data for empirical variance were sampled independently. Therefore, no covariances between empirical variance and other estimators.
^c 100 x (V_Rob - V_Emp)/V_Emp.

Table A.3: Standard Case-Cohort: Unbiasedness of Variance Estimators in Intervention Trials^a.
Exposure Rate = 50%; Failure Rate (overall) = 10%; Lost to follow-up (overall) = 0% (no lost to follow up)

  Rel. Risk      Empirical     Robust Variance                 Asymptotic Variance
  Fail  Censor   Variance^b    % of     95% CI for             % of    95% CI for
  r_T   r_C      V_Emp(β̂)     Bias^c   V_Rob - V_Emp          Bias    V_Asym - V_Emp
  1     1        .0649          -4%     (-.0030, -.0021)        -3%    ([illegible], -.0019)
  1     2        .0661          -6%     (-.0044, -.0034)        -5%    (-.0042, -.0032)
  2     1        .0644           4%     ( .0017,  .0029)         4%    ( .0019,  .0031)
  2     2        .0656           3%     ( .0013,  .0024)         3%    ( .0014,  .0025)

^a Based on 500 trials with 1000 patients per trial for full cohort; case-cohort sample is formed by a 15% random sample from full cohort plus all non-subcohort cases.
^b Data for empirical variance were sampled independently. Therefore, no covariances between empirical variance and other estimators.
^c 100 x (V_Rob - V_Emp)/V_Emp.

Table A.4: Standard Case-Cohort: Variance of the Score Under β_0^a.

  Rel. Risk   Score U(β_0)     Robust Variance               Asymptotic Variance
                               % of     95% CI for           % of    95% CI for
  r_T  r_C    Mean    Var.^b   Bias^c   V_Rob - V_Emp        Bias    V_Asym - V_Emp
  Clinical Trial^d
  1    1      -0.2    49.6       2%     (  0.4,   1.9)        -0%    ( -0.9,   0.6)
  2    2      -0.6    50.6     -12%     ( -6.7,  -5.2)       -14%    ( -7.7,  -6.2)
  Occupational Studies^d,e
  1    1      -0.4    51.7      -8%     ( -5.0,  -3.5)       -13%    ( -7.5,  -6.1)
  2    2       1.2    56.0     -10%     ( -6.8,  -5.2)       -16%    ( -9.9,  -8.5)
  Intervention Trial^d
  1    1       0.2    41.4       3%     (  0.8,   1.8)         2%    (  0.5,   1.5)
  2    2      -1.4    48.3     -23%     (-11.7, -10.6)       -23%    (-11.6, -10.4)

^a Based on 500 trials with 1000 patients per trial for full cohort; case-cohort sample is formed by a 15% random sample from full cohort plus all non-subcohort cases.
^b Data for empirical variance were sampled independently. Therefore, no covariances between empirical variance and other estimators.
^c 100 x (V_Rob - V_Emp)/V_Emp.
^d Exposure Rate = 50% and Failure Rate (overall) = 10%.
^e Relative risk for entry time = 0.5.
^f No lost to follow up.

Table A.5: Standard Case-Cohort: Variance of the Score Under β_0 in Occupational Cohorts^a.

  Rel. Risk   Score U(β_0)     Robust Variance               Asymptotic Variance
                               % of     95% CI for           % of    95% CI for
  r_T  r_C    Mean    Var.^b   Bias^c   V_Rob - V_Emp        Bias    V_Asym - V_Emp
  Occupational Studies^d,e
  1    1      -0.1    52.1      -8%     (-5.0, -3.5)         -13%    (-7.4, -5.9)
  2    2      -1.1    55.0     [illegible]  (-5.1, -3.7)     -14%    (-8.4, -7.0)

^a Based on 500 trials with 1000 patients per trial for full cohort; case-cohort sample is formed by a 15% random sample from full cohort plus all non-subcohort cases.
^b Data for empirical variance were sampled independently. Therefore, no covariances between empirical variance and other estimators.
^c 100 x (V_Rob - V_Emp)/V_Emp.
^d Exposure Rate = 50% and Failure Rate (overall) = 10%.
^e Relative risk for entry time = 0.5.

Table A.6: Standard Case-Cohort: Variance of X_k^a.

  Rel. Risk     X_k Variance Estimation - Var(X_k)^b
  r_T   r_C     Under β_0     Under MLE (β̂)
  Clinical Trials
  1     1       .0045         .0046
  2     2       .0039         .0039
  Occupational Studies^d
  1     1       .0042         .0042
  2     2       .0044         .0044
  Intervention Trials
  1     1       .0029         .0029
  2     2       .0024         .0023

^a Based on 500 trials with 1000 patients per trial for full cohort; case-cohort sample is formed by a 15% random sample from full cohort plus all non-subcohort cases.
^b Empirical Variance.
^c Exposure Rate = 50% and Failure Rate (overall) = 10%.
^d Relative risk for entry time = 0.5.
^e No lost to follow up.
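The "% of Bias" columns in Tables A.1 to A.5 follow footnote c, 100 x (V_Rob - V_Emp)/V_Emp. The sketch below shows one way such a figure and an accompanying interval could be computed from per-trial variance estimates. It is illustrative only: the function name and its inputs are hypothetical, the interval is a plain normal approximation that treats V_Emp as a fixed benchmark, and the dissertation's own interval construction is not described in this section.

```python
import math
import statistics


def percent_bias(estimates, v_emp, z=1.96):
    """Percent bias of a variance estimator and a normal-approximation CI.

    estimates: one variance estimate per simulated trial (hypothetical input).
    v_emp:     empirical variance of the log relative risk estimate, treated
               here as a fixed benchmark (an assumption of this sketch).
    Returns (percent bias, (lower, upper)), where the interval is for
    mean(estimates) - v_emp.
    """
    n = len(estimates)
    diff = statistics.fmean(estimates) - v_emp
    half_width = z * statistics.stdev(estimates) / math.sqrt(n)
    return 100.0 * diff / v_emp, (diff - half_width, diff + half_width)


if __name__ == "__main__":
    # Made-up estimates, only to show the call; not data from the thesis.
    pct, (lower, upper) = percent_bias([0.08, 0.10], v_emp=0.10)
    print(f"{pct:.1f}% ({lower:.4f}, {upper:.4f})")
```

As a consistency check on the tabulated values, the first row of Table A.1 reports the interval (-.0130, -.0113) for V_Rob - V_Emp with V_Emp = .1002; the interval midpoint, -.01215, divided by .1002 gives about -12%, which matches the reported bias.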
Table A.7: Standard Case-Cohort: Consistency of Variance Estimators in Clinical Trials (10% Exposure Rate).
Exposure Rate = 10%; Failure Rate (overall) = 10%; Relative Risk for Failure r_T = 2; Relative Risk for Censoring r_C = 1.5

  Trial Size                  Method of Variance Estimation for Var(β̂)^a                                     Efficiency
  Full     Case-                               Inv.                                                           Rob./   Pren./
  Cohort   Cohort^b   β̂_T    Emp.^c           Infor.   Robust^c        Asym.^c         Prentice^c,d          Asym.   Asym.
  500      95         .928    1.020 (.007)     .243     .704 (.204)     .640 (.061)     .695 (.132)           3.36    2.17
  1000     190        .751    .424 (.001)      .107     .328 (.043)     .314 (.011)     .321 (.016)           3.97    1.42
  1500     286        .768    .249 (.0004)     .065     .203 (.003)     .201 (.002)     .198 (.002)           1.27    0.95
  2000     381        .740    .155 (.0001)     .049     .150 (.001)     .149 (.001)     .152 (.001)           1.30    0.95

^a All estimates except Prentice are based on 500 trials. Eight trials for cohort size 500 and one trial for cohort size 1000 were excluded due to nonconvergence of the estimator.
^b Average over multiple trials; each case-cohort sample is formed by a 10% random sample from full cohort plus all non-subcohort cases.
^c Numbers in parentheses are variances of the corresponding variance estimators.
^d Based on 100 trials.

Table A.8: Standard Case-Cohort: Consistency of Variance Estimators in Clinical Trials (50% Exposure Rate).
Exposure Rate = 50%; Failure Rate (overall) = 10%; Relative Risk for Failure r_T = 2; Relative Risk for Censoring r_C = 1.5

  Trial Size                  Method of Variance Estimation for Var(β̂)^a                                     Efficiency
  Full     Case-                               Inv.                                                           Rob./   Pren./
  Cohort   Cohort^b   β̂_T    Emp.^c           Infor.   Robust^c        Asym.^c         Prentice^c,d          Asym.   Asym.
  500      94         .772    .349 (7.537)     .101     .269 (186.7)    .246 (18.42)    .227 (13.09)          10.1    .71
  1000     190        .740    .125 (.739)      .046     .121 (2.576)    .118 (1.435)    .113 (1.323)          1.79    .92
  1500     285        .722    .091 (.369)      .030     .080 (.769)     .078 (.463)     .076 (.378)           1.66    .82
  2000     380        .713    .067 (.209)      .022     .059 (.263)     .058 (.174)     .058 (.190)           1.51    1.09

^a All estimates except Prentice are based on 500 trials.
^b Average over multiple trials; each case-cohort sample is formed by a 10% random sample from full cohort plus all non-subcohort cases.
^c Numbers in parentheses are variances of the corresponding variance estimators (x 10^-4).
^d Based on 100 trials.

Table A.9: Standard Case-Cohort: Consistency of Variance Estimators in Intervention Trials.
Exposure Rate = 50%; Failure Rate (overall) = 10%; Lost to follow-up (overall) = 0% (no lost to follow up); Relative Risk for Failure r_T = 2; Relative Risk for Censoring r_C = 1

  Trial Size                  Method of Variance Estimation for Var(β̂)^a                                     Efficiency
  Full     Case-                               Inv.                                                           Rob./   Pren./
  Cohort   Cohort^b   β̂_T    Emp.^c           Infor.   Robust^c        Asym.^c         Prentice^c,d          Asym.   Asym.
  500      98         .743    .174 (1.554)     .089     .167 (2.963)    .167 (2.808)    .167 (3.089)          1.06    1.10
  1000     196        .738    .087 (.272)      .043     .081 (.307)     .081 (.292)     .079 (.217)           1.05    0.74
  1500     294        .718    .056 (.137)      .028     .053 (.079)     .054 (.074)     .053 (.078)           1.07    1.04
  2000     392        .720    .039 (.052)      .021     .040 (.031)     .040 (.028)     .040 (.028)           1.11    1.02

^a All estimates except Prentice are based on 500 trials.
^b Average over multiple trials; each case-cohort sample is formed by a 10% random sample from full cohort plus all non-subcohort cases.
^c Numbers in parentheses are variances of the corresponding variance estimators (x 10^-4).
^d Based on 100 trials.

Table A.10: Standard Case-Cohort: Comparison of Variance Estimators in Clinical Trials.
Exposure Rate = 50%; Failure Rate (overall) = 10%

  Rel. Risk               Method of Variance Estimation for Var(β̂)^a                                        Efficiency
  Fail  Censor                             Inv.                                                              Rob./   Pren./
  r_T   r_C      β̂_T     Emp.^b           Infor.   Robust^b        Asym.^b         Prentice^b,c             Asym.   Asym.
  1     1        .009     .100 (.515)      .042     .088 (.613)     .087 (.436)     .085 (.463)              1.41    1.06
  1     2        .005     .110 (.468)      .050     .094 (.945)     .093 (.642)     .092 (.782)              1.47    1.22
  2     1        .719     .085 (.302)      .047     .094 (1.123)    .092 (.633)     .089 (.591)              1.78    0.93
  2     2        .730     .097 (.440)      .047     .094 (1.337)    .092 (.749)     .092 (1.259)             1.79    1.68

^a All estimates except Prentice are based on 500 trials with 1000 patients per trial for full cohort; case-cohort sample is formed by a 15% random sample from full cohort plus all non-subcohort cases.
^b Numbers in parentheses are variances of the corresponding variance estimators (x 10^-4).
^c Based on 100 trials.

Table A.11: Standard Case-Cohort: Comparison of Variance Estimators in Occupational Studies.
Exposure Rate = 50%; Failure Rate (overall) = 10%; Relative Risk for Entry Time r_E = 0.5

  Rel. Risk               Method of Variance Estimation for Var(β̂)^a                                        Efficiency
  Fail  Censor                             Inv.                                                              Rob./   Pren./
  r_T   r_C      β̂_T     Emp.^b           Infor.   Robust^b        Asym.^b         Prentice^b,c             Asym.   Asym.
  1     1        .012     .103 (.477)      .048     .105 (1.867)    .100 (.877)     .096 (.795)              2.13    0.91
  1     2        -.020    .112 (.652)      .051     .108 (2.771)    .102 (1.134)    .096 (.947)              2.44    0.88
  2     1        .722     .107 (.476)      .051     .107 (1.456)    .102 (.699)     .099 (.790)              2.08    1.13
  2     2        .747     .118 (.676)      .046     .101 (1.227)    .096 (.565)     .090 (.577)              2.17    1.02

^a All estimates except Prentice are based on 500 trials with 1000 patients per trial for full cohort; case-cohort sample is formed by a 15% random sample from full cohort plus all non-subcohort cases.
^b Numbers in parentheses are variances of the corresponding variance estimators (x 10^-4).
^c Based on 100 trials.

Table A.12: Standard Case-Cohort: Comparison of Variance Estimators in Intervention Studies (No Lost to Follow-up).
Exposure Rate = 50%; Failure Rate (overall) = 10%; Lost to follow-up (overall) = 0% (no lost to follow up)

  Rel. Risk               Method of Variance Estimation for Var(β̂)^a                                        Efficiency
  Fail  Censor                             Inv.                                                              Rob./   Pren./
  r_T   r_C      β̂_T     Emp.^b           Infor.   Robust^b        Asym.^b         Prentice^b,c             Asym.   Asym.
  1     1        -.010    .062 (.151)      .039     .062 (.136)     .063 (.130)     .062 (.099)              1.05    0.76
  1     2        -.002    .059 (.127)      .039     .062 (.137)     .062 (.123)     .062 (.127)              1.11    1.04
  2     1        .742     .060 (.137)      .043     .067 (.270)     .067 (.262)     .067 (.260)              1.03    1.00
  2     2        .755     .068 (.198)      .044     .068 (.315)     .068 (.304)     .067 (.311)              1.04    1.02

^a All estimates except Prentice are based on 500 trials with 1000 patients per trial for full cohort; case-cohort sample is formed by a 15% random sample from full cohort plus all non-subcohort cases.
^b Numbers in parentheses are variances of the corresponding variance estimators (x 10^-4).
^c Based on 100 trials.

Table A.13: Standard Case-Cohort: Comparison of Variance Estimators in Intervention Studies (10% Lost to Follow-up).
Exposure Rate = 50%; Failure Rate (overall) = 10%; Lost to follow-up (overall) = 10%

  Rel. Risk               Method of Variance Estimation for Var(β̂)^a                                        Efficiency
  Fail  Censor                             Inv.                                                              Rob./   Pren./
  r_T   r_C      β̂_T     Emp.^b           Infor.   Robust^b        Asym.^b         Prentice^b,c             Asym.   Asym.
  1     1        -.009    .062 (.179)      .036     .061 (.126)     .061 (.114)     .061 (.131)              1.11    1.15
  1     2        .060     .061 (.141)      .036     .061 (.128)     .061 (.117)     .061 (.146)              1.09    1.24
  2     1        .728     .070 (.196)      .041     .066 (.235)     .066 (.220)     .066 (.226)              1.07    1.03
  2     2        .799     .069 (.189)      .041     .066 (.260)     .066 (.243)     .066 (.194)              1.07    0.98

^a All estimates except Prentice are based on 500 trials with 1000 patients per trial for full cohort; case-cohort sample is formed by a 15% random sample from full cohort plus all non-subcohort cases.
^b Numbers in parentheses are variances of the corresponding variance estimators (x 10^-4).
^c Based on 100 trials.

Table A.14: Standard Case-Cohort: Comparison of Variance Estimators in Intervention Studies (20% Lost to Follow-up).
Exposure Rate = 50%; Failure Rate (overall) = 10%; Lost to follow-up (overall) = 20%

  Rel. Risk               Method of Variance Estimation for Var(β̂)^a                                        Efficiency
  Fail  Censor                             Inv.                                                              Rob./   Pren./
  r_T   r_C      β̂_T     Emp.^b           Infor.   Robust^b        Asym.^b         Prentice^b,c             Asym.   Asym.
  1     1        .0001    .066 (.194)      .034     .060 (.108)     .060 (.102)     .060 (.091)              1.06    0.89
  1     2        .112     .058 (.130)      .034     .060 (.113)     .060 (.099)     .060 (.114)              1.14    1.14
  2     1        .736     .066 (.168)      .038     .064 (.182)     .064 (.159)     .064 (.188)              1.15    1.18
  2     2        .841     .059 (.130)      .038     .064 (.187)     .064 (.163)     .063 (.157)              1.15    0.96

^a All estimates except Prentice are based on 500 trials with 1000 patients per trial for full cohort; case-cohort sample is formed by a 15% random sample from full cohort plus all non-subcohort cases.
^b Numbers in parentheses are variances of the corresponding variance estimators (x 10^-4).
^c Based on 100 trials.

Appendix B
Simulation Results: Exposure Stratified Case-Cohort

Table B.1: Exposure-Stratified: Unbiasedness of Variance Estimators^a.
Stratification variable is a binary surrogate Z with 50% for Z(-) and 50% for Z(+); Sensitivity 80%; Specificity 80%; Exposure rate: overall 50%, Z(-) 20%, Z(+) 80%; Sampling fraction for case-cohort: overall 15%, Z(-) 5%, Z(+) 25%; Expected failure rate: 10%.

  Rel. Risk      Empirical     Robust Variance                 Asymptotic Variance
  Fail  Censor   Variance^b    % of     95% CI for             % of    95% CI for
  r_T   r_C      V_Emp(β̂)     Bias^c   V_Rob - V_Emp          Bias    V_Asym - V_Emp
  Clinical Trials
  1     1        .131          -18%     (-.0249, -.0214)       -19%    (-.0264, -.0230)
  2     2        .119          -11%     (-.0150, -.0117)       -11%    (-.0151, -.0118)
  Occupational Studies^d
  1     1        .182          -25%     (-.0486, -.0416)       -30%    (-.0566, -.0516)
  2     2        .157          -21%     (-.0350, -.0301)       -25%    (-.0417, -.0373)
  Intervention Trials^e
  1     1        .066            4%     ( .0017,  .0034)         5%    ( .0028,  .0045)
  2     2        .075           -3%     (-.0033, -.0014)        -2%    (-.0022, -.0002)

^a Based on 500 trials with 1000 patients per trial.
^b Data for empirical variance were sampled independently. Therefore, the covariances between empirical variance and other estimators were zero.
^c 100 x (V_Rob - V_Emp)/V_Emp.
^d Relative risk for entry time = 0.5.
^e No lost to follow up.

Table B.2: Exposure-Stratified: Consistency of Variance Estimators in Clinical Trials.
Stratification variable is a binary surrogate Z with 50% for Z(-) and 50% for Z(+); Sensitivity 80%; Specificity 80%; Exposure rate: overall 50%, Z(-) 20%, Z(+) 80%; Sampling fraction for case-cohort: overall 15%, Z(-) 5%, Z(+) 25%; Expected failure rate: 10%; Relative risk for failure r_T = 2; Relative risk for censoring r_C = 2.

  Trial Size                  Method of Variance Estimation for Var(β̂)^a              Efficiency
  Full     Case-                               Inv.                                    Robust/
  Cohort   Cohort^b   β̂_T    Emp.^b           Infor.   Robust^b        Asym.^b        Asym.
  500      116        .689    .257 (2.854)     .100     .217 (26.85)    .213 (24.13)   1.11
  1000     234        .689    .111 (.579)      .047     .105 (3.450)    .104 (3.346)   1.03
  1500     351        .672    .081 (.279)      .031     .071 (1.060)    .070 (1.060)   1.01
  2000     468        .709    .055 (.115)      .023     .053 (.436)     .053 (.458)    0.95

^a Based on 500 trials.
^b Numbers in parentheses are variances of the corresponding variance estimators (x 10^-4).
Table B.3: Exposure-Stratified: Consistency of Variance Estimators in Occupational Studies.
Stratification variable is a binary surrogate Z with 50% for Z(-) and 50% for Z(+); Sensitivity 80%; Specificity 80%; Exposure rate: overall 50%, Z(-) 20%, Z(+) 80%; Sampling fraction for case-cohort: overall 15%, Z(-) 5%, Z(+) 25%; Expected failure rate: 10%; Relative risk for failure r_T = 2; Relative risk for censoring r_C = 2.

  Trial Size                  Method of Variance Estimation for Var(β̂)^a              Efficiency
  Full     Case-                               Inv.                                    Robust/
  Cohort   Cohort^b   β̂_T    Emp.^b           Infor.   Robust^b        Asym.^b        Asym.
  500      117        .634    .375 (6.405)     .106     .279 (72.35)    .245 (34.30)   2.11
  1000     234        .655    .149 (.992)      .047     .124 (5.476)    .118 (4.663)   1.17
  1500     352        .682    .097 (.399)      .030     .080 (1.615)    .077 (1.383)   1.17
  2000     470        .688    .065 (.145)      .022     .059 (.696)     .058 (.593)    1.17

^a Based on 500 trials.
^b Numbers in parentheses are variances of the corresponding variance estimators (x 10^-4).

Table B.4: Exposure-Stratified: Consistency of Variance Estimators in Intervention Trials with No Lost to Follow-up.
Stratification variable is a binary surrogate Z with 50% for Z(-) and 50% for Z(+); Sensitivity 80%; Specificity 80%; Exposure rate: overall 50%, Z(-) 20%, Z(+) 80%; Sampling fraction for case-cohort: overall 15%, Z(-) 5%, Z(+) 25%; Expected failure rate: 10%; Relative risk for failure r_T = 2; Relative risk for censoring r_C = 1.

  Trial Size                  Method of Variance Estimation for Var(β̂)^a              Efficiency
  Full     Case-                               Inv.                                    Robust/
  Cohort   Cohort^b   β̂_T    Emp.^b           Infor.   Robust^b        Asym.^b        Asym.
  500      118        .725    .173 (1.125)     .089     .149 (8.517)    .154 (9.070)   0.94
  1000     237        .761    .072 (.200)      .043     .072 (1.006)    .073 (1.039)   0.97
  1500     356        .722    .053 (.104)      .029     .048 (.258)     .049 (.258)    1.00
  2000     475        .738    .038 (.063)      .022     .036 (.107)     .036 (.103)    1.04

^a Based on 500 trials.
^b Numbers in parentheses are variances of the corresponding variance estimators (x 10^-4).
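The "Efficiency" column in the consistency tables appears to be the ratio of the parenthesized variances of the two variance estimators (Robust over Asym.). This reading is inferred from the tabulated numbers rather than from a definition given in this section, so the sketch below should be taken as a check under that assumption; the function names are hypothetical.

```python
import statistics


def relative_efficiency(var_of_estimator, var_of_reference):
    """Ratio of two variances of variance estimators (e.g. Robust / Asym.)."""
    return var_of_estimator / var_of_reference


def relative_efficiency_from_trials(estimates, reference_estimates):
    """Same ratio, computed from per-trial estimates (hypothetical inputs)."""
    return statistics.variance(estimates) / statistics.variance(reference_estimates)


# Parenthesized variances (x 10^-4) of the Robust and Asym. estimators as
# reported in Table B.3, keyed by full-cohort size.
TABLE_B3 = {
    500: (72.35, 34.30),
    1000: (5.476, 4.663),
    1500: (1.615, 1.383),
    2000: (0.696, 0.593),
}

if __name__ == "__main__":
    for size, (v_rob, v_asym) in TABLE_B3.items():
        print(size, round(relative_efficiency(v_rob, v_asym), 2))
```

Under this assumption the ratios reproduce the Efficiency column of Table B.3 to two decimals. A ratio above 1 means the robust estimator varied more across trials than the asymptotic estimator.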
Table B.5: Exposure-Stratified: Comparison of Variance Estimators in Clinical Trials.
Stratification variable is a binary surrogate Z with 50% for Z(-) and 50% for Z(+); Sensitivity 80%; Specificity 80%; Exposure rate: overall 50%, Z(-) 20%, Z(+) 80%; Sampling fraction for case-cohort: overall 15%, Z(-) 5%, Z(+) 25%; Expected failure rate: 10%.

  Rel. Risk               Method of Variance Estimation for Var(β̂)^a               Efficiency
  Fail  Censor                             Inv.                                     Robust/
  r_T   r_C      β̂_T     Emp.^b           Infor.   Robust^b        Asym.^b         Asym.
  1     1        -.038    .113 (.505)      .043     .108 (3.097)    .106 (2.859)    1.08
  1     2        -.027    .129 (.678)      .050     .107 (3.182)    .107 (3.254)    0.98
  2     1        .696     .125 (.686)      .047     .112 (3.321)    .111 (3.180)    1.04
  2     2        .658     .108 (.551)      .047     .105 (3.033)    .105 (2.958)    1.03

^a Based on 500 trials with 1000 patients per trial.
^b Numbers in parentheses are variances of the corresponding variance estimators (x 10^-4).

Table B.6: Exposure-Stratified: Comparison of Variance Estimators in Occupational Studies.
Stratification variable is a binary surrogate Z with 50% for Z(-) and 50% for Z(+); Sensitivity 80%; Specificity 80%; Exposure rate: overall 50%, Z(-) 20%, Z(+) 80%; Sampling fraction for case-cohort: overall 15%, Z(-) 5%, Z(+) 25%; Expected failure rate: 10%; Relative risk for entry time r_E = 0.5.

  Rel. Risk               Method of Variance Estimation for Var(β̂)^a               Efficiency
  Fail  Censor                             Inv.                                     Robust/
  r_T   r_C      β̂_T     Emp.^b           Infor.   Robust^b        Asym.^b         Asym.
  1     1        -.087    .182 (2.688)     .050     .137 (14.46)    .128 (7.072)    2.04
  1     2        -.117    .146 (.922)      .053     .139 (11.51)    .128 (6.437)    1.79
  2     1        .648     .168 (1.262)     .052     .132 (7.334)    .126 (5.734)    1.28
  2     2        .662     .157 (1.153)     .047     .124 (6.248)    .118 (4.918)    1.27

^a Based on 500 trials with 1000 patients per trial.
^b Numbers in parentheses are variances of the corresponding variance estimators (x 10^-4).

Table B.7: Exposure-Stratified: Comparison of Variance Estimators in Intervention Trials (No Lost to Follow-up).
Stratification variable is a binary surrogate Z with 50% for Z(-) and 50% for Z(+); Sensitivity 80%; Specificity 80%; Exposure rate: overall 50%, Z(-) 20%, Z(+) 80%; Sampling fraction for case-cohort: overall 15%, Z(-) 5%, Z(+) 25%; Expected failure rate: 10%.

  Rel. Risk               Method of Variance Estimation for Var(β̂)^a               Efficiency
  Fail  Censor                             Inv.                                     Robust/
  r_T   r_C      β̂_T     Emp.^b           Infor.   Robust^b        Asym.^b         Asym.
  1     1        .018     .065 (.185)      .039     .068 (.811)     .069 (.832)     0.97
  1     2        .004     .074 (.219)      .039     .069 (.840)     .070 (.877)     0.96
  2     1        .752     .068 (.232)      .043     .072 (.913)     .073 (.898)     1.02
  2     2        .717     .074 (.210)      .043     .073 (1.025)    .074 (1.019)    1.01

^a Based on 500 trials with 1000 patients per trial.
^b Numbers in parentheses are variances of the corresponding variance estimators (x 10^-4).

Table B.8: Exposure-Stratified: Comparison of Variance Estimators in Intervention Trials (10% Lost to Follow-up).
Stratification variable is a binary surrogate Z with 50% for Z(-) and 50% for Z(+); Sensitivity 80%; Specificity 80%; Exposure rate: overall 50%, Z(-) 20%, Z(+) 80%; Sampling fraction for case-cohort: overall 15%, Z(-) 5%, Z(+) 25%; Expected failure rate: 10%.

  Rel. Risk               Method of Variance Estimation for Var(β̂)^a               Efficiency
  Fail  Censor                             Inv.                                     Robust/
  r_T   r_C      β̂_T     Emp.^b           Infor.   Robust^b        Asym.^b         Asym.
  1     1        -.010    .066 (.192)      .037     .069 (.876)     .070 (.903)     0.97
  1     2        .057     .067 (.195)      .037     .068 (.805)     .069 (.819)     0.98
  2     1        .745     .080 (.271)      .041     .073 (1.045)    .074 (1.058)    0.99
  2     2        .763     .071 (.259)      .040     .072 (.937)     .073 (.968)     0.97

^a Based on 500 trials with 1000 patients per trial.
^b Numbers in parentheses are variances of the corresponding variance estimators (x 10^-4).

Table B.9: Exposure-Stratified: Comparison of Variance Estimators in Intervention Trials (20% Lost to Follow-up).
Stratification variable is a binary surrogate Z with 50% for Z(-) and 50% for Z(+); Sensitivity 80%; Specificity 80%; Exposure rate: overall 50%, Z(-) 20%, Z(+) 80%; Sampling fraction for case-cohort: overall 15%, Z(-) 5%, Z(+) 25%; Expected failure rate: 10%.

  Rel. Risk               Method of Variance Estimation for Var(β̂)^a               Efficiency
  Fail  Censor                             Inv.                                     Robust/
  r_T   r_C      β̂_T     Emp.^b           Infor.   Robust^b        Asym.^b         Asym.
  1     1        .004     .077 (.239)      .034     .069 (.957)     .070 (.986)     0.97
  1     2        .075     .065 (.180)      .034     .068 (.842)     .070 (.889)     0.95
  2     1        .759     .071 (.179)      .038     .072 (1.051)    .073 (1.078)    0.97
  2     2        .840     .073 (.201)      .038     .072 (1.069)    .073 (1.112)    0.96

^a Based on 500 trials with 1000 patients per trial.
^b Numbers in parentheses are variances of the corresponding variance estimators (x 10^-4).

Table B.10: Exposure-Stratified: Robust Variance Estimators in Clinical Trials.
Stratification variable is a binary surrogate Z with 50% for Z(-) and 50% for Z(+); Sensitivity 80%; Specificity 80%; Exposure rate: overall 50%, Z(-) 20%, Z(+) 80%; Sampling fraction for case-cohort: overall 15%, Z(-) 5%, Z(+) 25%; Expected failure rate: 10%.

  Rel. Risk               Method of Variance Estimation for Var(β̂)^a
                                                            Robust Variance
  Fail  Censor                             BLS's            Asym.           Unadj.          Adjusted
  r_T   r_C      β̂_T     Emp.^b           Asym.            based           naive           naive
  1     1        -.038    .113 (.505)      .106 (2.859)     .108 (3.097)    .125 (2.324)    .168 (3.345)
  1     2        -.027    .129 (.678)      .107 (3.254)     .107 (3.182)    .124 (2.006)    .172 (2.568)
  2     1        .696     .125 (.686)      .111 (3.180)     .112 (3.321)    .129 (2.383)    .169 (2.841)
  2     2        .658     .108 (.551)      .105 (2.958)     .105 (3.033)    .123 (1.990)    .167 (2.634)

^a Based on 500 trials with 1000 patients per trial.
^b Numbers in parentheses are variances of the corresponding variance estimators (x 10^-4).

Table B.11: Exposure-Stratified: Robust Variance Estimators in Occupational Studies.
Stratification variable is a binary surrogate Z with 50% for Z(-) and 50% for Z(+); Sensitivity 80%; Specificity 80%; Exposure rate: overall 50%, Z(-) 20%, Z(+) 80%; Sampling fraction for case-cohort: overall 15%, Z(-) 5%, Z(+) 25%; Expected failure rate: 10%; Relative risk for entry time r_E = 0.5.

  Rel. Risk               Method of Variance Estimation for Var(β̂)^a
                                                            Robust Variance
  Fail  Censor                             BLS's            Asym.           Unadj.          Adjusted
  r_T   r_C      β̂_T     Emp.^b           Asym.            based           naive           naive
  1     1        -.087    .182 (2.688)     .128 (7.072)     .137 (14.46)    .154 (13.26)    .199 (15.60)
  1     2        -.117    .146 (.922)      .128 (6.437)     .139 (11.51)    .155 (10.60)    .204 (14.04)
  2     1        .648     .168 (1.262)     .126 (5.734)     .132 (7.334)    .149 (6.149)    .189 (7.521)
  2     2        .662     .157 (1.153)     .118 (4.918)     .124 (6.248)    .142 (5.093)    .187 (6.274)

^a Based on 500 trials with 1000 patients per trial.
^b Numbers in parentheses are variances of the corresponding variance estimators (x 10^-4).

Table B.12: Exposure-Stratified: Robust Variance Estimators in Intervention Trials (No Lost to Follow-up).
Stratification variable is a binary surrogate Z with 50% for Z(-) and 50% for Z(+); Sensitivity 80%; Specificity 80%; Exposure rate: overall 50%, Z(-) 20%, Z(+) 80%; Sampling fraction for case-cohort: overall 15%, Z(-) 5%, Z(+) 25%; Expected failure rate: 10%.

  Rel. Risk               Method of Variance Estimation for Var(β̂)^a
                                                            Robust Variance
  Fail  Censor                             BLS's            Asym.           Unadj.          Adjusted
  r_T   r_C      β̂_T     Emp.^b           Asym.            based           naive           naive
  1     1        .018     .065 (.185)      .069 (.832)      .068 (.811)     .085 (.260)     .128 (.906)
  1     2        .004     .074 (.219)      .070 (.877)      .069 (.840)     .085 (.303)     .129 (1.032)
  2     1        .752     .068 (.232)      .073 (.898)      .072 (.913)     .089 (.448)     .128 (.904)
  2     2        .717     .074 (.210)      .074 (1.019)     .073 (1.025)    .090 (.494)     .128 (.904)

^a Based on 500 trials with 1000 patients per trial.
^b Numbers in parentheses are variances of the corresponding variance estimators (x 10^-4).

Table B.13: Exposure-Stratified: Robust Variance Estimators in Intervention Trials (10% Lost to Follow-up).
Stratification variable is a binary surrogate Z with 50% for Z(-) and 50% for Z(+); Sensitivity 80%; Specificity 80%; Exposure rate: overall 50%, Z(-) 20%, Z(+) 80%; Sampling fraction for case-cohort: overall 15%, Z(-) 5%, Z(+) 25%; Expected failure rate: 10%.

  Rel. Risk               Method of Variance Estimation for Var(β̂)^a
                                                            Robust Variance
  Fail  Censor                             BLS's            Asym.           Unadj.          Adjusted
  r_T   r_C      β̂_T     Emp.^b           Asym.            based           naive           naive
  1     1        -.010    .066 (.192)      .070 (.903)      .069 (.876)     .085 (.321)     .131 (1.214)
  1     2        .057     .067 (.195)      .069 (.819)      .068 (.805)     .085 (.295)     .131 (1.172)
  2     1        .745     .080 (.271)      .074 (1.058)     .073 (1.045)    .089 (.446)     .129 (.908)
  2     2        .763     .071 (.259)      .073 (.968)      .072 (.937)     .089 (.411)     .128 (.967)

^a Based on 500 trials with 1000 patients per trial.
^b Numbers in parentheses are variances of the corresponding variance estimators (x 10^-4).
Further reproduction prohibited without permission. Table B.14: Exposure-Stratified: R obust Variance E stim ators in In ter vention Trials (20% Lost to Follow-up). Stratification variable is a binary surrogate Z with 50% for Z (— ) and 50% for Z(+); Sensitivity-80%; Specificity- 80%; Exposure rate: overall 50%, Z (-) 20%, Z(+) 80%; Sampling fraction for case-cohort: overall 15%, Z ( -) 5%, Z(+) 25%; Expected failure rate: 10%. Rel. Risk Method of Variance Estimation for Var((3)a Robust Variance Fail Censor BLS’ s Asym. Unadj. Adjusted rT tc 0T Emp.6 Asym. based. naive naive 1 1 .004 .077 .070 .069 .085 .132 (.239) ( .986) ( .957) (.338) (1.183) 1 2 .075 .065 .070 .068 .085 .131 (.180) ( .889) (.842) (.281) (1.233) 2 1 .759 .071 .073 .072 .089 .130 (.179) (1.078) (1.051) (.448) (1.074) 2 2 .840 .073 .073 .072 .089 .130 (.201) (1.112) (1.069) (.443) (1.172) “Based on 500 trials with 1000 patients per trial. 6Numbers in parenthesis are variances of the corresponding variance estimators. (xlO-4). 119 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. A ppendix C T B F Study: R esults o f D ata A nalysis Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. Table C.l: T B F Study0 - S tandard Case-C ohort: Testing for Linear Dose Effect w ith E xponential R elative Risk._____________________ Sampling Fraction Method for &Size4 Variance Pt S E 0 ) Relative Risk [95% Cl for RR] 5% Asym. 0.49 0.21 1.63 [1.08, 2.45] 84 (72) Robust - 0.22 - [1.06, 2.49] j-1 - 0.12 - [1.28, 2.08] Prentice 0.47 0.21 1.60 [1.06, 2.40] 10% Asym. 0.60 0.17 1.82 [1.31, 2.53] 166 (69) Robust - 0.17 - [1.30, 2.55] J - 1 - 0.11 - [1.47, 2.25] Prentice 0.58 0.17 1.79 [1.28, 2.49] 20% Asym. 0.24 0.12 1.27 [1.01, 1.60] 357 (56) Robust - 0.11 - [1.02, 1.58] I - 1 - 0.09 - [1.05, 1.53] Prentice 0.24 0.12 1.27 [1.01, 1.60] 50% Asym. 
0.29 0.11 1.34 [1.09, 1.65] 911 (38) Robust - 0.10 - [1.11, 1.62] I ' 1 - 0.10 - [1.10, 1.63] Prentice 0.29 0.11 1.34 [1.09, 1.65] Full Cohort: 100% Asym. 0.30 0.10 1.34 [1.10, 1.64] 1578 (0) Robust - 0.09 - [1.12, 1.62] J - 1 - 0.10 - [1.10, 1.64] Prentice 0.30 0.10 1.34 [1.10, 1.64] °Size of full cohort: 1758 patients (1052 exposed and 706 unexposed); Total breast cancer cases: 75 (54 exposed and 21 unexposed); 22 exposed patients had dose information missing and therefore were excluded from analysis. 4 Size of subcohort (number of non-subcohort cases). cReturned from the same Cox model for calculation of (3 for the following 3 estimators: robust, asymptotic, and 7-1. 121 permission of the copyright owner. Further reproduction prohibited without permission. Table C.2: TBF Study® - Exposure Stratified Case Cohort Sample I (Size of Subcohort=75, the Number of Cases): Testing for Linear Dose Effect with Exponential Relative Risk.___________________________ Subcohort Ratio of Size Exp:Unexp Method for Relative Risk (%)» (Actual Size) Variance Pt S E 0 ) [95% Cl for RR] 75 2:1 Asym. 0.89 0.25 2.42 [1.48, 3.97] (4.3%) (50:25) Robust - 0.28 - [1.39, 4.23] J - 1 - 0.15 - [1.81, 3.24] 1:1 Asym. 0.80 0.27 2.22 [1.32, 3.74] (38:37) Robust - 0.29 - [1.27, 3.90] I~l - 0.14 - [1.69, 2.92] 1:2 Asym. 0.75 0.30 2.12 [1.17, 3.84] (25:50) Robust - 0.32 - [1.14, 3.96] J - 1 - 0.14 - [1.63, 2.77] “Size of full cohort: 1758 patients (1052 exposed and 706 unexposed); Total breast cancer cases: 75 (54 exposed and 21 unexposed); 22 exposed patients had dose information missing and therefore were excluded from analysis. 6 Percent of full cohort. cReturned from the same Cox model for calculation of (3 for the following 3 estimators: robust, asymptotic, and I~x. 122 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. 
Table C.3: TBF Study® - Exposure Stratified Case Cohort Sample II (Size of Subcohort=150, Two Times the Number of Cases): Testing for Linear Dose Effect with Exponential Relative Risk.___________ Subcohort Ratio of Size Exp:Unexp Method for (%)6 (Actual Size) Variance Relative Risk S E 0 ) [95% Cl for RR] 150 (8.5%) 2:1 Asym. 0.47 0.17 1.60 [1.16, 2.22] (100:50) Robust - 0.17 - [1.14, 2.24] I - 1 - 0.11 - [1.28, 2.01] 1:1 Asym. 0.65 0.21 1.91 [1.26, 2.89] (75:75) Robust - 0.22 - [1.23, 2.95] J - 1 - 0.12 - [1.49, 2.43] 1:2 Asym. 0.84 0.25 2.32 [1.42, 3.80] (50:100) Robust - 0.29 - [1.32, 4.06] J - 1 - 0.15 - [1.73, 3.11] “Size of full cohort: 1758 patients (1052 exposed and 706 unexposed); Total breast cancer cases: 75 (54 exposed and 21 unexposed); 22 exposed patients had dose information missing and therefore were excluded from analysis. A Percent of full cohort. “Returned from the same Cox model for calculation of (3 for the following 3 estimators: robust, asymptotic, and J -1. 123 permission of the copyright owner. Further reproduction prohibited without permission. Table C.4: TBF Study0 - Exposure Stratified Case Cohort Sample III (Size o f Subcohort=225, Three Times the Number of Cases): Testing for Linear Dose Effect with Exponential Relative Risk.___________ Subcohort Ratio of Size Exp.Unexp Method for Relative Risk (%)b (Actual Size) Variance S E 0 ) [95% Cl for RR] 225 (12.7%) 2:1 Asym. 0.43 0.14 1.54 [1.17, 2.01] (150:75) Robust - 0.13 - [1.19, 1.97] I - 1 - 0.11 - [1.24, 1.91] 1:1 Asym. 0.43 0.14 1.53 [1.15, 2.03] (113:112) Robust - 0.13 - [1.17, 1.99] 7 -1 - 0.11 - [1.23, 1.90] 1:2 Asym. 0.64 0.21 1.89 [1.25, 2.87] (75:150) Robust - 0.22 - [1.22, 2.93] I ' 1 - 0.12 - [1.48, 2.41] °Size of full cohort: 1758 patients (1052 exposed and 706 unexposed); Total breast cancer cases: 75 (54 exposed and 21 unexposed); 22 exposed patients had dose information missing and therefore were excluded from analysis. 6 Percent of full cohort. 
cReturned from the same Cox model for calculation of /3 for the following 3 estimators: robust, asymptotic, and 7-1. 124 permission of the copyright owner. Further reproduction prohibited without permission. Table C.5: TBF Study0 - Exposure Stratified Case Cohort Sample IV (Size o f Subcohort=300, Four Times the Number o f Cases): Testing for Linear Dose Effect with Exponential Relative Risk.___________ Subcohort Ratio of Size Exp:Unexp Method for Relative Risk (%)6 (Actual Size) Variance % S E 0 ) [95% Cl for RR] 300 (17.1%) 2:1 Asym. 0.47 0.13 1.61 [1.24, 2.09] (200:100) Robust - 0.12 - [1.26, 2.05] I ' 1 - 0.11 - [1.29, 2.00] 1:1 Asym. 0.42 0.14 1.53 [1.17, 2.00] (150:150) Robust - 0.13 - [1.19, 1.96] I ' 1 - 0.11 - [1.23, 1.90] 1:2 Asym. 0.47 0.17 1.59 [1.15, 2.21] (100:200) Robust - 0.17 - [1.14, 2.23] I ' 1 - 0.11 - [1.27, 2.00] “Size of full cohort: 1758 patients (1052 exposed and 706 unexposed); Total breast cancer cases: 75 (54 exposed and 21 unexposed); 22 exposed patients had dose information missing and therefore were excluded from analysis. 6 Percent of full cohort. “Returned from the same Cox model for calculation of (3 for the following 3 estimators: robust, asymptotic, and 7_1. 125 permission of the copyright owner. Further reproduction prohibited without permission. Table C.6: T B F Study" - E xposure Stratified Case C ohort Sample V (Full C ohort): Testing for L inear Dose Effect w ith E xponential Rela- tive Risk._______________________________________________________ Subcohort Ratio of Size Exp.Unexp Method for Relative Risk (%)* (Actual Size) Variance SE{/3) [95% Cl for RR] 1758 1.45:1 Asym. 0.30 0.10 1.34 [1.10, 1.64] (100%) (1052:706) Robust 0.09 [1.12, 1.62] r 1 0.10 [1.10, 1.64] “Size of full cohort: 1758 patients (1052 exposed and 706 unexposed); Total breast cancer cases: 75 (54 exposed and 21 unexposed); 22 exposed patients had dose information missing and therefore were excluded from analysis. 6 Percent of full cohort. 
“Returned from the same Cox model for calculation of (3 for the following 3 estimators: robust, asymptotic, and I~l. 126 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission.
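The relative risks and 95% confidence intervals in Tables C.1 to C.6 appear consistent with exponentiating the log relative risk estimate and a Wald interval around it, that is RR = exp(β̂) with limits exp(β̂ ± 1.96·SE(β̂)). The short Python sketch below is not part of the thesis. It assumes that construction (the function name `wald_rr_interval` and the 1.96 multiplier are illustrative choices) and applies it to the first row of Table C.1. Because the published β̂ and SE are rounded to two decimals, recomputed limits can differ from the tabulated ones in the last digit.

```python
import math


def wald_rr_interval(beta, se, z=1.96):
    """Return (RR, lower, upper) for a log relative risk estimate.

    Assumes a Wald interval on the log scale: exp(beta +/- z * se).
    This is an illustrative assumption, not a formula quoted from the thesis.
    """
    rr = math.exp(beta)
    lower = math.exp(beta - z * se)
    upper = math.exp(beta + z * se)
    return rr, lower, upper


# Table C.1, 5% sampling fraction, asymptotic variance: beta = 0.49, SE = 0.21
rr, lower, upper = wald_rr_interval(0.49, 0.21)
print(f"RR = {rr:.2f}, 95% CI = [{lower:.2f}, {upper:.2f}]")
```

For this row the sketch gives 1.63 [1.08, 2.46] against the tabulated 1.63 [1.08, 2.45]; a gap of that size in the upper limit is what rounding of the inputs would be expected to produce.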
Asset Metadata
Creator
Jiao, Jenny Qun (author)
Core Title
Comparison of variance estimators in case-cohort studies
School
Graduate School
Degree
Doctor of Philosophy
Degree Program
Biometry
Publisher
University of Southern California (original), University of Southern California. Libraries (digital)
Tag
biology, biostatistics, health sciences, public health, OAI-PMH Harvest, statistics
Language
English
Contributor
Digitized by ProQuest (provenance)
Advisor
Langholz, Bryan (committee chair), [illegible] (committee member), Sather, Harland (committee member), Stram, Daniel O. (committee member)
Permanent Link (DOI)
https://doi.org/10.25549/usctheses-c16-242761
Unique identifier
UC11339398
Identifier
3074933.pdf (filename), usctheses-c16-242761 (legacy record id)
Legacy Identifier
3074933.pdf
Dmrecord
242761
Document Type
Dissertation
Rights
Jiao, Jenny Qun
Type
texts
Source
University of Southern California (contributing entity), University of Southern California Dissertations and Theses (collection)
Access Conditions
The author retains rights to his/her dissertation, thesis or other graduate work according to U.S. copyright law. Electronic access is being provided by the USC Libraries in agreement with the au...
Repository Name
University of Southern California Digital Library
Repository Location
USC Digital Library, University of Southern California, University Park Campus, Los Angeles, California 90089, USA