COMPARISON OF NESTED CASE-CONTROL WITH FULL COHORT ANALYSIS UNDER MODEL MISSPECIFICATION

by

Anny Hui Xiang

A Dissertation Presented to the FACULTY OF THE GRADUATE SCHOOL, UNIVERSITY OF SOUTHERN CALIFORNIA, In Partial Fulfillment of the Requirements for the Degree DOCTOR OF PHILOSOPHY (Biometry)

August 1995

Copyright 1995 Anny Hui Xiang

UMI Number: 9621652

UMI Microform 9621652 Copyright 1996, by UMI Company. All rights reserved.
This microform edition is protected against unauthorized copying under Title 17, United States Code.

UNIVERSITY OF SOUTHERN CALIFORNIA
THE GRADUATE SCHOOL
UNIVERSITY PARK
LOS ANGELES, CALIFORNIA 90007

This dissertation, written by [name illegible] under the direction of the Dissertation Committee, and approved by all its members, has been presented to and accepted by The Graduate School, in partial fulfillment of requirements for the degree of DOCTOR OF PHILOSOPHY.

Dedication

To my husband Jeff, my son Marshall, and my parents for their belief in me.

Acknowledgements

I would like to express my appreciation to the members of my committee for their guidance and support and for their belief in my abilities. My deepest gratitude goes to Dr. Bryan Langholz, my committee chairman, for his guidance and support in the development of my dissertation. Special thanks go to Dr. Stanley Azen, who gave me continuous support throughout my studies. Appreciation is also expressed to Dr. Harland Sather and Dr. Larry Goldstein, who gave me valuable review of this work. I would also like to thank Dr. Wendy Mack for her contribution to the early development of my dissertation.

Contents

Dedication
Acknowledgements
List of Tables
Abstract
1 Introduction
2 Literature review
  2.1 Covariate omission
  2.2 Mismodelling available measurements
  2.3 Measurement error
3 An example
4 Asymptotic properties of estimators with model misspecification for nested case-control and full cohort designs
  4.1 Notation and assumptions
  4.2 Preliminaries
  4.3 Convergence of the maximum partial likelihood estimator $\hat{\beta}$
  4.4 Asymptotic normality of the maximum likelihood estimator $\hat{\beta}$
  4.5 Extension to the full cohort design
5 Two covariates $Z_1, Z_2$ with $Z_2$ omitted
  5.1 Both $Z_1$ and $Z_2$ are binary and the distribution is constant over time
  5.2 Both $Z_1$ and $Z_2$ are normally distributed and constant over time
  5.3 Idealized intervention trials with $Z_1, Z_2$ binary or normally distributed
  5.4 Results: bias
  5.5 Results: variances and asymptotic relative efficiency
  5.6 Both $Z_1$ and $Z_2$ are binary with interaction
6 Other types of model misspecification
  6.1 Mismodelling available measurements
  6.2 The true model is accelerated failure time model
  6.3 Measurement errors
7 Simulation study
  7.1 Design of the simulation study
  7.2 Study parameters
  7.3 Results: the maximum partial likelihood estimator $\hat{\beta}$
  7.4 Results: score variance, information and variance of $\hat{\beta}$
8 Summary and discussion
References

List of Tables

2.1 Bias and
test size for exposure effect with omitting a balanced covariate
2.2 Effect of pooled analysis with perfectly balanced design
3.1 Descriptive analysis of the cohort data
3.2 Comparison of estimates of log hazard rates between full cohort and nested case-control sampling
3.3 Differences of log relative hazard estimates between nested case-control sampling and full cohort analysis
5.1 Bias of nested case-control vs. full cohort with $Z_2$ omitted for idealized intervention trial
5.2 Bias of nested case-control vs. full cohort with $Z_2$ omitted for idealized intervention trial
5.3 Bias of nested case-control vs. full cohort with $Z_2$ omitted for idealized intervention trial
5.4 Bias of nested case-control vs. full cohort with $Z_2$ omitted for idealized intervention trial
5.5 Bias of nested case-control vs.
full cohort with $Z_2$ omitted for an idealized intervention trial
5.6 Asymptotic variance for nested case-control and full cohort with $Z_2$ omitted for an idealized intervention trial
5.7 Bias of nested case-control vs. full cohort with $Z_2$ omitted for interaction model
6.1 Bias of nested case-control vs. full cohort with $Z$ [exponent illegible] modelled as $Z$ when the distribution of $Z$ is constant over time
6.2 Bias of nested case-control vs. full cohort with $Z$ [exponent illegible] modelled as $Z$ for an idealized intervention trial
7.1 Comparison between simulated and asymptotic results of $\hat{\beta}$ with $Z_2$ omitted for an idealized intervention trial (ORb = 5.0)
7.2 Comparison between simulated and asymptotic results of $\hat{\beta}$ with $Z_2$ omitted for an idealized intervention trial (ORb = 0.23)
7.3 Comparison between simulated and asymptotic results of $\hat{\beta}$ with $Z_2$ omitted for an idealized intervention trial (pb = 0.0)
7.4 Comparison between simulated and asymptotic results of $\hat{\beta}$ with $Z_2$ omitted for an idealized intervention trial (pb = 0.5)
7.5 Comparison between simulated and asymptotic results of $\hat{\beta}$ when $Z$ [exponent illegible] is modelled as $Z$ for an idealized intervention trial
7.6 Simulation: Comparison between full cohort and nested case-control designs with $Z_2$ omitted for an idealized intervention trial (ORb = 1)
7.7 Simulation: Comparison between full cohort and nested case-control designs with $Z_2$ omitted for an idealized intervention trial (ORb = 0.23)
7.8 Simulation: Comparison between full cohort and nested case-control designs with $Z_2$ omitted for an idealized intervention trial (pb = 0.0)
7.9 Simulation: Comparison between full cohort and nested case-control designs with $Z_2$ omitted for an idealized intervention trial (pb = 0.5)
7.10 Simulation: Comparison between full cohort and nested case-control designs when $Z$ [exponent illegible] is modelled as $Z$ for an idealized intervention trial
7.11 Comparison between simulated and asymptotic results of $\Sigma$ and $I$ with $Z_2$ omitted for an idealized intervention trial (ORb = 1)
7.12 Comparison between simulated and asymptotic results of $\Sigma$ and $I$ with $Z_2$ omitted for an idealized intervention trial (ORb = 0.23)
7.13 Comparison between simulated and asymptotic variances of $\hat{\beta}$ with $Z_2$ omitted for an idealized intervention trial (ORb = 1)
7.14 Comparison between simulated and asymptotic variances of $\hat{\beta}$ with $Z_2$ omitted for an idealized intervention trial (ORb = 5.0)

Abstract

Properties of the maximum partial likelihood estimator under nested case-control sampling with model misspecification are evaluated. A very general underlying true hazard is assumed, and the proposed model is a proportional hazards model. Specifically, we compare the estimator from a nested case-control sample to the partial likelihood estimator from full cohort analysis to see if nested case-control sampling represents the full cohort data under model misspecification. This machinery is used to investigate the bias between nested case-control and full cohort with three types of model misspecification: covariate omission; mismodelling available measurements; and measurement error. The model-based variance estimator is also evaluated under covariate omission.
Extensive numerical and simulation studies show that model misspecification generally results in biased estimates from both full cohort and nested case-control designs. However, the biases are almost the same from both designs. We conclude that nested case-control studies reliably represent the full cohort under model misspecification. With covariate omission, the information based on the misspecified model well represents the actual variance of the score; thus, inferences based on the usual likelihood techniques are reliable.

1 Introduction

Epidemiologic cohort studies are considered the most reliable design for studying risk factors for disease incidence. Baseline characteristics, general exposure histories, and other risk factors (generally called "covariates") are collected during the course of a prospective study; thus, information bias is avoided and subject selection bias is minimized compared to a retrospective study. In cohort studies, subjects are observed over some time period, and either "fail" (develop or die from the disease of interest) or are "censored" (are alive at the end of the study period, or are lost to follow-up). If complete covariate information is obtained for all cohort members, the relationships between the incidence of disease and the covariate history are modeled through parametric or semi-parametric analytic techniques (Breslow and Day, 1987) to find the model which gives the "best" goodness-of-fit to the observed data. The Cox (1972) semi-parametric proportional hazards model has been widely used in this case. The regression coefficients are estimated by maximizing a partial likelihood (Cox, 1975; Borgan et al., 1992). The large sample properties of these estimators have been shown to be "as expected" (Andersen and Gill, 1982; Gill, 1981; Borgan et al., 1992) if the true model is the assumed proportional hazards model.
That is, in large samples these regression coefficient estimators are approximately normally distributed with mean equal to the true regression coefficients and with variances and covariances that can be estimated from the second derivative of the logarithm of the partial likelihood.

Typically, incident cases of diseases such as cancer are rare events, and cohort studies usually require very large numbers of subjects or long periods of follow-up in order to accumulate enough failures to have sufficient statistical power to give reliable answers to the questions of interest. The cost of collecting high quality covariate information on all subjects is prohibitive. If the disease of interest is rare, the contribution of the non-failures, in terms of the "power" of the study, will be negligible compared to that of the failures. Thus cohort sampling methods which include all the failures and only a portion of the non-failures are highly desirable.

One type of cohort sampling, the nested case-control design, is the most popular sampling design in large epidemiologic studies. This method treats the cohort data as a matched case-control study in which the failure (case) is matched to all subjects who are alive and on study at the time the case became diseased (potential controls), and it parallels the life table analysis of cohort data in the Cox model. The partial likelihood is then the conditional logistic likelihood typically used for matched case-control studies. In nested case-control designs, controls are randomly sampled without replacement from the potential controls for each case. After a few controls per case, increasing the number of controls results in little increase in efficiency (Ury, 1975; Breslow et al., 1983; Goldstein and Langholz, 1992). The resulting partial likelihood of the nested case-control design is very similar to Cox's partial likelihood for the full cohort design when a proportional hazards model is assumed.
The only difference is that the full cohort partial likelihood uses all the potential controls for each failure, and the nested case-control partial likelihood uses only the sampled controls for each failure. The maximum likelihood estimators based on the nested case-control partial likelihood have been shown to be consistent (Goldstein and Langholz, 1992; Borgan et al., 1992), and the relative efficiency of nested case-control to the full cohort for a single covariate is $(m - 1)/m$, with $m - 1$ controls sampled per case (Ury, 1975; Breslow et al., 1983; Goldstein and Langholz, 1992), under the null hypothesis when the assumed proportional hazards model is correct.

The purpose of epidemiologic studies is to identify risk factors for the disease of interest and understand the influence of joint effects of risk factors. However, generally, the true model is unknown. In this dissertation, we are concerned with the validity of our conclusions about the relationship between covariates and disease rates if the true model is not proportional hazards, or the covariates are modelled using incorrect functional forms, or some needed covariates are omitted from the model. Thus, questions are raised about the properties of the maximum partial likelihood estimators for both full cohort and nested case-control designs if the assumed proportional hazards model is misspecified.

Gail (1991) in his review paper indicated four sources of model misspecification: missing latent structure; mismodelling available measurements; missing data; and errors in measurements. One of the most important forms of missing latent structure is the failure to include important confounders in the model. Confounders may be omitted either because they are unmeasured, or because, though measured, they are not thought to be important. It is generally believed that models should be kept simple, and that only previously known risk factors and those covariates which make major contributions should be included in the model.
Examples of mismodelling available measurements are: the correct model is an accelerated failure time model while the assumed model is a proportional hazards model; or the true relationship between the exposure and the response variables is log quadratic while the proposed model is log linear. Missing data is another important problem which may give misleading results even if the assumed model is correct. If data on the response variable are missing completely at random, then classical analyses of the observed data will yield valid estimates of parameters. The penalty for missing data on exposure variables or covariates is loss of information rather than loss of validity of inference when using classical analysis methods. Another important problem is measurement error. Even though a correct model has been assumed for correct measurements of the response, exposure and covariate variables, distortions of the estimated effect and loss of power can result from mismeasurement of any of these elements. This dissertation focuses on the type of model misspecification where the assumed model is proportional hazards with certain covariates and the correct model is any type of model which may differ from the assumed model. Examples are: the correct model is proportional hazards with more covariates, or an accelerated failure time model, or the correct model is the same as the proposed model but the covariates in the proposed model contain measurement error. Thus, our model misspecification includes missing latent structure, mismodelling available measurements and measurement error by Gail's definition.

Various researchers have investigated the biases of estimators of covariate effects from linear or non-linear models, and the efficiency loss of corresponding test statistics with various types of model misspecification for full cohort, retrospective case-control designs or randomized clinical trials; a detailed literature review is given in Section 2.
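The nested case-control sampling scheme described earlier in this section can be illustrated with a minimal sketch. It is not the sampling code used in this dissertation: the function names, the toy data and the choice of m = 3 are assumptions made only for illustration. Each failure is paired with up to m - 1 controls drawn without replacement from the subjects still on study at the failure time, and the null-hypothesis relative efficiency (m - 1)/m quoted above is evaluated for a given m.

```python
import random


def nested_case_control_sample(times, events, m, seed=0):
    """Return a list of (case, controls) pairs, one per failure.

    times[i]  : exit time of subject i (failure or censoring)
    events[i] : 1 if subject i failed, 0 if censored
    m         : size of each sampled risk set (the case plus m - 1 controls)
    All names here are hypothetical; this is an illustrative sketch only.
    """
    rng = random.Random(seed)
    sampled_sets = []
    for case, (t_case, failed) in enumerate(zip(times, events)):
        if not failed:
            continue
        # Potential controls: every other subject still on study at t_case.
        potential = [j for j, t in enumerate(times) if j != case and t >= t_case]
        # Sample without replacement; fewer than m - 1 may be available.
        k = min(m - 1, len(potential))
        sampled_sets.append((case, rng.sample(potential, k)))
    return sampled_sets


def relative_efficiency(m):
    """Null-hypothesis efficiency of nested case-control relative to full cohort."""
    return (m - 1) / m


# Toy cohort (hypothetical values): six subjects, three failures.
times = [5, 3, 8, 2, 9, 7]
events = [1, 0, 1, 1, 0, 0]
sets = nested_case_control_sample(times, events, m=3)
```

A subject who fails later can still serve as a control for an earlier case, which is why the potential controls are defined by exit time alone. Under this formula, one control per case (m = 2) gives efficiency 0.5, and four controls per case (m = 5) give 0.8, consistent with the remark that gains diminish after a few controls.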
In this dissertation, we investigate the properties of the maximum likelihood estimators based on the nested case-control partial likelihood when the assumed proportional hazards model is misspecified. In particular, we focus on comparing the maximum partial likelihood estimators from full cohort and nested case-control designs under model misspecification, since usually the true model is unknown, or covariates are omitted because they are not collected. In this case, not very much can be done in terms of correcting the bias in either cohort or cohort sampling designs. However, comparing bias and efficiency loss between cohort and nested case-control designs, to see if sampling introduces extra bias and lowers efficiency, has significant practical implications. We evaluate the asymptotic properties of maximum partial likelihood estimators for both full cohort and nested case-control designs under model misspecification, since these asymptotic results do not depend on the size of any particular cohort, and the exact distributions are in general intractable. Our results can give some theoretical guidelines in terms of the comparability between full cohort and nested case-control designs, and the relative efficiency of nested case-control to the full cohort when the assumed proportional hazards model is not correct under various situations. Our results can also be generalized to population based case-control studies, since cases and controls are random samples from an "infinite" population cohort, matching on either calendar time or some other covariates, such as age or sex.

In summary, this dissertation primarily focuses on the following four areas:

1. Derive the asymptotic distribution of the maximum partial likelihood estimator based on a proportional hazards model with a very general underlying model in a nested case-control design.

2.
Derive the asymptotic distribution of the maximum partial likelihood estimator for full cohort data with a very general underlying model.

3. Apply the above asymptotic results to various interesting types of model misspecification. Compare the asymptotic bias of the exposure effect from full cohort and nested case-control designs to the true model. Evaluate the relative performance between full cohort and nested case-control designs to see if cohort sampling introduces extra bias and reduces efficiency when the proportional hazards model is misspecified.

4. Perform simulation studies to examine the performance of the maximum partial likelihood estimators under model misspecification between full cohort and nested case-control designs when sample sizes are limited. Compare the simulated results using limited sample sizes with the asymptotic results.

The outline of this dissertation is as follows. Section 2 gives a detailed literature review of model misspecification for classical linear or non-linear models used in case-control, full cohort, and clinical trial designs in terms of biases of estimates and efficiency loss of corresponding test statistics, concentrating especially on the proportional hazards model. Specific areas are omission of covariates, mismodelled available measurements and measurement errors. In Section 3, a data example is used to compare the full cohort and nested case-control designs and pose the questions we investigate in this dissertation. In Section 4, we derive the limits of maximum partial likelihood estimators based on a proportional hazards model under nested case-control sampling with a very general form of underlying true model. The asymptotic distribution of the maximum partial likelihood estimator under model misspecification is postulated. Extensions to the full cohort design are also given. Section 5 considers one simple application of the results from Section 4; that is,
the true model is proportional hazards with two covariates and the assumed model is proportional hazards with only one of the covariates. The comparisons of asymptotic biases of the covariate effect and relative efficiency between full cohort and nested case-control designs are given. Section 6 compares the asymptotic maximum partial likelihood estimators between full cohort and nested case-control designs when other types of model misspecification exist, such as measurement error, mismodelling available measurements, or a true model that is an accelerated failure time model. Section 7 describes the simulation design, simulation results and the comparison between simulated and asymptotic results for selected types of model misspecification. Summary and discussion are given in Section 8.

2 Literature review

The various types of model misspecification are related. Mismodelling a covariate and measurement error can be viewed as a form of covariate omission. However, for consistency within this dissertation, Gail's classification (Gail, 1991) is used in this section. Literature on the effect of model misspecification is reviewed under covariate omission, mismodelling available measurements and measurement error. We shall focus on the bias of estimates of exposure or treatment effect (hazard rates) and the efficiency loss of the score test statistic based on a proportional hazards model given these types of model misspecification.

2.1 Covariate omission

Since the omission of a covariate that is correlated with an exposure variable (but is not a consequence of exposure) and causally related with a disease leads to well-established confounding bias, most researchers have considered the situation where the omitted covariate is independent of the exposure variable. In clinical trials, randomization is used to ensure the balance of other covariates across treatment groups. In epidemiological studies, many covariates and exposure variables might be statistically independent.
In follow-up studies, the unexposed cohort might be chosen to have the same distribution of certain covariates as the exposed cohort. This could be accomplished by pair matching exposed and unexposed individuals on the covariate or by randomly sampling unexposed individuals whose covariate values fall into various categories with probabilities defined by the covariate distribution in the exposed cohort (frequency matching). In case-control studies, covariate balance is achieved by selecting controls to match the potential confounder variable values of the cases.

In the early 1970s, several publications in the epidemiologic literature started discussing the problem of unbiased estimates of relative risk using classical methods of analysis of grouped data, particularly as they relate to matching factors (Miettinen, 1970; Hardy and White, 1971; Currie, 1971). Seigel and Greenhouse (1973) explored the bias of relative risk estimators in the classical 2x2 setting when a matching factor is omitted in cohort and case-control study designs, assuming a simple model involving only an exposure, a disease, and a covariate (potential matching factor), each of which is binary. They showed that for unmatched cohort and case-control designs, if the covariate is not associated with both exposure and disease, i.e., it is not a confounder, then the estimate of relative risk is unbiased if the covariate is omitted. If the covariate is a confounder, then the estimate of relative risk is biased in the unmatched cohort and case-control designs. For matched cohort designs (the covariate is matched in exposure and non-exposure categories), the estimates of relative risk from both unpaired and paired tables are unbiased. For matched case-control designs, estimates of relative risk from paired tables are unbiased. However, analysis from the unpaired table yields biased estimates of relative risk (toward the null) unless the covariate is not associated with the exposure.
In other words, if the covariate is a confounder, or if the covariate is not a confounder but is associated with the exposure, then the estimates of relative risk are biased by ignoring this covariate.

Thomas and Greenland (1983) compared the relative efficiencies of case-control studies for stratified and independent sampling designs, with matched, stratified and collapsed analysis methods. With binary exposure, disease and potentially confounding variables, the following five strategies were compared:

1. Independent samples with a collapsed analysis;
2. Independent samples with a stratified analysis;
3. Stratified samples with a collapsed analysis;
4. Stratified samples with a stratified analysis;
5. Stratified samples incorporating pair matching in the analysis.

The results showed that: (1) Under the null hypothesis, if a stratified sampling design is used, then a collapsed data analysis gives an unbiased estimate of exposure effect and is fully efficient. If an independent sampling design and a collapsed data analysis are used, then the estimator of exposure effect is biased when the potential matching factor is a confounder. (2) Under the alternative hypothesis, if the potential matching factor is a confounder, then a collapsed data analysis gives a biased estimate of exposure effect for both designs. If the potential matching factor is not a confounder, an independent design with a collapsed data analysis is fully efficient and unbiased. However, if the matching factor is related with exposure but not related with disease, a collapsed data analysis for the stratified design is biased.

Struthers and Kalbfleisch (1986) investigated the bias of the maximum partial likelihood estimator based on a proportional hazards model for full cohort analysis when the assumed model is incorrect. They used multivariate counting process and martingale theorems to derive the consistency of the maximum partial likelihood estimator with model misspecification.
They showed that the estimator from the partial likelihood is a consistent estimator for a parameter defined implicitly. They applied their results to the type of model misspecification in which the true model is proportional hazards with two binary independent covariates and the proposed model is proportional hazards with only one of the covariates. They assumed random censorship in which the survivor function for the censoring time does not depend on the missing covariate. They showed that under the null hypothesis, the estimator from the proportional hazards model with an omitted covariate has the same asymptotic distribution as that from the proportional hazards model with both covariates. Under the alternative hypothesis, the estimator of the covariate effect from the misspecified model is asymptotically biased toward zero, even with no censoring. However, the bias is relatively small unless there is a strong effect from the omitted covariate.

Bretagnolle and Huber-Carol (1988) studied the effects of omitting more than one covariate in Cox's regression model for survival data with censoring. They assumed that all covariates are independent and that censoring and survival times are independent conditional on all the covariates. They predicted that: (1) If there is only one covariate included in the model, regardless of the number of omitted covariates, the effect of the covariate under study is always underestimated; (2) If several covariates are included in the model, the same underestimation holds for each of them at least up to some fixed time. This time depends on the distribution of the covariates and the size of the true effects. They concluded that the asymptotic bias resulting from covariate omissions is not negligible.

Many authors have considered the bias or efficiency of estimates of treatment effect in a randomized study when some needed balanced covariates are omitted.
Randomization provides a valid basis for testing the null hypothesis of no treatment effect without any assumption of a population model and without the necessity of measurements on all the possibly important covariates. In other words, under randomization of subjects to the treatments, the estimate of the treatment effect is unbiased under the null hypothesis even if the proposed model is misspecified, provided censoring of responses acts equally on both treatment groups. However, randomization does not always lead to asymptotically unbiased estimates of treatment effect when needed covariates are omitted. Gail et al. (Gail et al., 1984; Gail, 1986) examined the effects of omitting a balanced covariate when analyzing cohort data or randomized experiment data using various regression methods. Specifically, they considered 9 models: normal linear, exponential multiplicative, exponential reciprocal, Bernoulli logistic, Bernoulli additive, Bernoulli multiplicative, Poisson multiplicative, Cox model, and the proportional hazards model for paired survival data (Table 2.1). They showed that under the null hypothesis the maximum likelihood (or partial likelihood) estimator of the exposure or treatment effect is asymptotically unbiased for all of the 9 models if the balanced covariate is omitted. If the regression of the response variable on exposure and omitted covariate is additive (normal linear, Bernoulli additive) or multiplicative (uncensored exponential multiplicative, Bernoulli multiplicative, Poisson multiplicative), then the asymptotic bias from omitting the covariate is zero. Odds ratio estimates from Bernoulli logistic models are attenuated toward the null. For randomly censored survival data with proportional hazards, parametric models with known nuisance hazards yield conservative estimates of the exposure effect, as does the Cox partial likelihood analysis.
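The contrast between the multiplicative and logistic Bernoulli models described above can be checked with a small calculation. The sketch below is only an illustration under assumed values: a binary exposure, a binary covariate that is balanced across exposure groups with prevalence 0.5, and hypothetical coefficients chosen so that all risks stay below 1. The function names are ours and do not come from Gail et al.

```python
import math


def expit(x: float) -> float:
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-x))


def marginal_log_odds_ratio(intercept: float, beta_exposure: float,
                            beta_covariate: float,
                            prob_covariate: float = 0.5) -> float:
    """Log odds ratio for exposure after collapsing over a balanced binary
    covariate, when the true model is logistic in exposure and covariate."""
    def marginal_risk(exposure: int) -> float:
        risk_x0 = expit(intercept + beta_exposure * exposure)
        risk_x1 = expit(intercept + beta_exposure * exposure + beta_covariate)
        return (1.0 - prob_covariate) * risk_x0 + prob_covariate * risk_x1

    r1, r0 = marginal_risk(1), marginal_risk(0)
    return math.log(r1 / (1.0 - r1)) - math.log(r0 / (1.0 - r0))


def marginal_log_risk_ratio(intercept: float, beta_exposure: float,
                            beta_covariate: float,
                            prob_covariate: float = 0.5) -> float:
    """Log risk ratio for exposure after collapsing over the same covariate,
    when the true model is multiplicative (log-linear) in both variables."""
    def marginal_risk(exposure: int) -> float:
        risk_x0 = math.exp(intercept + beta_exposure * exposure)
        risk_x1 = math.exp(intercept + beta_exposure * exposure + beta_covariate)
        return (1.0 - prob_covariate) * risk_x0 + prob_covariate * risk_x1

    return math.log(marginal_risk(1)) - math.log(marginal_risk(0))


if __name__ == "__main__":
    # Hypothetical parameter values, chosen only for illustration.
    print("logistic, collapsed log OR:", marginal_log_odds_ratio(-1.0, 1.0, 2.0))
    print("multiplicative, collapsed log RR:",
          marginal_log_risk_ratio(math.log(0.05), math.log(2.0), math.log(3.0)))
```

With these assumed values the collapsed log odds ratio falls below the conditional value of 1.0, while the collapsed log risk ratio reproduces the conditional value, in line with the qualitative pattern summarized in Table 2.1.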
For uncensored exponential survival data, analysis with the exponential survival model yields asymptotically unbiased estimates; however, analysis with the Cox partial likelihood yields a bias toward the null when the independent covariate is omitted (Struthers and Kalbfleisch, 1986). Conditions for asymptotically unbiased estimates of treatment effect for censored survival data are typically quite complicated and lack generality.

Omission of covariates in randomized studies can also inflate the size of score tests for no exposure or treatment effect in some models (Table 2.1). Gail et al. (Gail, 1986; Gail et al., 1988) showed that, in general, if a balanced covariate is omitted, the model-based variance estimate is inconsistent, and asymptotically it differs from the true variance by a "variance deflation factor", yielding a supranominal test size. However, all the Bernoulli models (Table 2.1) and the Cox analysis do have the nominal size if the balanced covariate is omitted. Lagakos and Schoenfeld (1984) and Morgan (1986) investigated the properties of the proportional hazards score test with omitted covariates for comparing two randomized treatments. They showed that the asymptotic size of the score test is not adversely affected by omitting a balanced covariate, but there is a substantial loss of power. The omission of a balanced covariate induces nonproportionality in the treatment hazard ratio, thereby causing a loss of power.

Chastang et al. (1988) studied the bias in estimating the treatment effect caused by omitting a balanced covariate in parametric exponential survival models. Their results agree with the findings in other papers. Specifically, they present approximate formulae for calculating the bias and power loss for the multiplicative exponential survival model. They noted that the percentage relative bias is most affected by the effect size of the omitted covariate, the percentage censoring, and the distribution of the omitted covariate.
Using simulation studies they investigated the properties of estimates from the Cox and Weibull models for data generated from the multiplicative exponential model. The results indicate that even when there is no censoring, the Cox and Weibull estimates of treatment effect are biased when an important balanced covariate is omitted, unlike the situation for the multiplicative exponential model (Table 2.1).

In contrast to simple randomization, Weinberg (1985) considered the effect of omitting the matching factor in a cohort study in which a dichotomous exposure was perfectly balanced within each covariate stratum by "frequency-matching", "pair-matching" and stratification. She commented that valid estimates of the relative risk could be obtained from data pooled across strata under homogeneity of the relative risk across levels of the matching factor. However, the variance estimator obtained from the pooled data tended to be too large, leading to a hypothesis test with subnominal size. Gail (1988) extended Weinberg's results to the entire class of generalized linear models and to other models of interest in epidemiologic cohort studies and randomized clinical trials. He examined model-based inference for a variety of response variables and generalized linear regressions, as well as for the Cox model (Table 2.2). If the data are pooled so that stratum effects are omitted from the regression, all Poisson models and normal models with known variance retain nominal size. The model-based tests have supranominal size for all exponential models. The test sizes for Bernoulli models and Cox models are subnominal (Table 2.2), in contrast to the results from simple randomization (Table 2.1). However, the conclusions regarding the asymptotic bias of the estimate of exposure effect when pooling across strata in perfectly balanced designs for these models are the same as in a simple randomized study with a balanced covariate omitted (Table 2.1).
All multiplicative or additive models are asymptotically unbiased. The extent of efficiency loss here depends on the model and on the particular parameter values.

2.2 Mismodelling available measurements

Lagakos and Schoenfeld (1984) considered the effect of the following types of misspecification of proportional hazards regression models on the corresponding partial-likelihood score test for comparing treatments: nonproportional treatment hazard functions, omitted treatment-covariate interactions, and mismodelled covariates. They concluded that the asymptotic efficiency of the proportional-hazards score test, relative to the optimal partial-likelihood test, declines slowly as the hazard functions for the two treatments deviate from proportionality. The efficiency can be very low when the hazard functions cross or differ only at large survival times. With the omission of a treatment-covariate interaction term, the loss in efficiency from using the partial likelihood score test instead of the optimal score test was slight for small to moderate interactions. The test size is not adversely affected. Misspecification of the functional form of the regression portion of a proportional hazards model does not affect the size of the score test, but does cause a reduction in power. However, because some of the effect of the covariate will be captured by the terms that have been fitted to the model, the efficiency loss may often be far less than when the covariate is omitted.

Lagakos (1988a; 1988b) derived the asymptotic efficiency of the partial likelihood score tests based on a proportional hazards regression model, the least-squares tests based on a linear model, and the likelihood score tests based on a logistic regression model with a misspecified form of the covariate process.
It is shown that the relative efficiency of the misspecified model for testing the null hypothesis of no exposure effect equals the squared correlation between the correct and incorrect coding of the covariates in the model for all three statistical tests, provided that the risk of censoring does not depend on the covariates in the proportional hazards model. The results were used to examine the consequences of 1) using an incorrect dose metameter in a test for trend, 2) mismodelling a continuous explanatory variable, 3) discretizing a continuous explanatory variable, 4) classification error of a discrete explanatory variable, and 5) measurement errors for a continuous explanatory variable. In examining the effect of an incorrect dose metameter on test efficiency, he showed that if both convex and (monotone) concave metameters seem plausible a priori, then it is reasonable to use the linear metameter, since it maintains a reasonably good asymptotic relative efficiency against all the monotone metameters. If it is felt a priori that the dose-response will be either linear or convex, use of a mildly convex metameter is reasonable.

Struthers and Kalbfleisch (1986) investigated the properties of the maximum likelihood estimator based on a proportional hazards model when the underlying true model is the accelerated failure time model, using counting processes and martingale theorems. Solomon (1984) addressed this same problem using a different approach. They concluded that the use of the proportional hazards model for analysis when the accelerated failure time model is true leaves unchanged, to first order, the relative importance of the covariates, regardless of the censoring distribution. Struthers and Kalbfleisch also pointed out that the approximation holds to second order when covariates are symmetric and there is no censoring.
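The squared-correlation result of Lagakos summarized above lends itself to a quick numerical check. The following sketch is an illustration only: it assumes four equally likely dose levels 0, 1, 2, 3 and two hypothetical kinds of miscoding (a linear metameter used when the true one is quadratic, and a dichotomized dose used when the true coding is linear). The function name is ours.

```python
from typing import Optional, Sequence


def squared_correlation(correct: Sequence[float], used: Sequence[float],
                        weights: Optional[Sequence[float]] = None) -> float:
    """Squared correlation between the correct and the used covariate coding.

    Under the conditions quoted from Lagakos (1988a; 1988b), this quantity is
    the asymptotic relative efficiency of the test based on the used coding.
    """
    n = len(correct)
    if weights is None:
        weights = [1.0 / n] * n  # equally likely covariate values (assumption)
    mean_c = sum(w * c for w, c in zip(weights, correct))
    mean_u = sum(w * u for w, u in zip(weights, used))
    cov = sum(w * (c - mean_c) * (u - mean_u)
              for w, c, u in zip(weights, correct, used))
    var_c = sum(w * (c - mean_c) ** 2 for w, c in zip(weights, correct))
    var_u = sum(w * (u - mean_u) ** 2 for w, u in zip(weights, used))
    return cov * cov / (var_c * var_u)


if __name__ == "__main__":
    doses = [0.0, 1.0, 2.0, 3.0]
    # True metameter quadratic (convex), linear metameter used in the test.
    print(squared_correlation([d ** 2 for d in doses], doses))
    # True coding linear, dose dichotomized at 2 in the test.
    print(squared_correlation(doses, [0.0, 0.0, 1.0, 1.0]))
```

In this toy setting the linear metameter retains most of the efficiency against the quadratic alternative, whereas dichotomizing the dose costs more, which is the kind of comparison the result is used for.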
Begg and Lagakos (1992) studied the effects of misspecifying the covariate vector on tests of association between a particular covariate and the response variable in logistic regression models. They derived the asymptotic distribution of the likelihood score test statistic under a sequence of models. The main result can be used to study the effects of random measurement error, discretizing a continuous explanatory variable, and mismodelling the functional form of an explanatory variable.

2.3 Measurement error

Design and statistical analysis issues have been extensively discussed in terms of providing valid estimates when measurement errors exist. Here, for my purposes, I only review how the estimates are biased in various models.

Let $E$ be the true exposure value and $E^*$ be its observed value. In a classical measurement error model, $E^*$ is modelled as a function of $E$, i.e. $E^* = E + \epsilon$, where $\epsilon$ is normally distributed with mean zero and a constant variance $\sigma^2$, and independent of $E$. If $Y$ is a measured response variable related to $E$ by a linear model, it is well known (Fuller, 1987) that the least-squares estimator of $\beta$ based on using $E^*$ instead of $E$ is biased. Its expectation is $r\beta$, where $r = \mathrm{var}(E)/\{\mathrm{var}(E)+\sigma^2\}$. Under the null hypothesis, the least-squares estimate of $\beta$ is unbiased and the standard test statistics have nominal size but reduced power. For logistic regression models, the maximum likelihood estimates of the $E$ effect are attenuated (Carroll, 1989; Stefanski and Carroll, 1985), and hypothesis tests in logistic regression for no exposure effect have valid size but reduced power, which is similar to linear models.

For a proportional hazards model, Prentice (1982) showed that the proportional hazards structure was preserved in time to response models if an error-prone exposure $E^*(t)$ was measured instead of $E(t)$, although the relative risk model was altered. Typically, the analysis of errors is difficult.
However, in the case of rare diseases it may be reasonable to suppose that the joint distribution of $E$ and $E^*$ is constant over time. In this case, under the classical measurement error model $E^* = E + \epsilon$, estimates of the exposure effect $\beta$ in the relative risk model $r(E) = 1 + \beta E$ are attenuated (Pepe et al., 1989) by the factor $\mathrm{var}(E)/\{\mathrm{var}(E)+\sigma^2\}$. The same attenuation applies to the log relative risk parameter $\beta$ in the model $r(E) = \exp(\beta E)$ if $E^*$ and $E$ have a bivariate normal distribution. More complicated techniques are required when the joint distribution of $E^*$ and $E$ changes over time, as when $E$ is time-dependent, or when events are not rare, or when the censoring distributions depend on $E$.

Lagakos (1988a) considered the asymptotic relative efficiency due to measurement error for three statistical tests: (1) least-squares tests derived from linear models; (2) likelihood score tests from logistic models; and (3) partial likelihood score tests from proportional hazards models. In examining the effect of measurement errors in continuous explanatory variables, he studied the classical measurement error model (i.e., $E^* = E + \epsilon$ as defined above) and the Berkson model (i.e., $E = E^* + \gamma$, where $\gamma$ is normally distributed with mean zero and variance $\sigma^2$, and independent of $E^*$). For the classical model, the asymptotic relative efficiency is simply $\mathrm{var}(E)/\{\mathrm{var}(E)+\sigma^2\}$. For the Berkson model, the asymptotic relative efficiency corresponds to $\mathrm{var}(E^*)/\{\mathrm{var}(E^*)+\sigma^2\}$.

In a randomized study of treatment or exposure effect $E$, with a covariate $X$ measured with error, least-squares estimates in linear regression models give an unbiased estimate of the $E$ effect and valid test size. The effect of measurement error in $X$ is loss of power (Carroll et al., 1985). In logistic regression, the estimate of the $E$ effect is biased and the hypothesis tests have incorrect type I error and reduced power (Carroll et al., 1984).
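The attenuation factor $\mathrm{var}(E)/\{\mathrm{var}(E)+\sigma^2\}$ for the classical measurement error model in linear regression can be illustrated by simulation. The sketch below is a minimal example under assumed values (normal true exposure, unit residual variance, hypothetical slope and error variance); the function names are ours and the check is numerical, not a proof.

```python
import math
import random
from typing import List


def attenuation_factor(var_true: float, var_error: float) -> float:
    """r = var(E) / (var(E) + sigma^2) for the classical error model."""
    return var_true / (var_true + var_error)


def least_squares_slope(x: List[float], y: List[float]) -> float:
    """Ordinary least-squares slope of y on x (with intercept)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    return sxy / sxx


def simulate_naive_slope(beta: float, var_true: float, var_error: float,
                         n: int, seed: int) -> float:
    """Regress Y on the error-prone exposure E* = E + epsilon."""
    rng = random.Random(seed)
    observed, response = [], []
    for _ in range(n):
        e_true = rng.gauss(0.0, math.sqrt(var_true))
        e_observed = e_true + rng.gauss(0.0, math.sqrt(var_error))
        y = beta * e_true + rng.gauss(0.0, 1.0)  # unit residual variance (assumption)
        observed.append(e_observed)
        response.append(y)
    return least_squares_slope(observed, response)


if __name__ == "__main__":
    beta, var_true, var_error = 2.0, 1.0, 1.0  # hypothetical values
    print("expected naive slope:", attenuation_factor(var_true, var_error) * beta)
    print("simulated naive slope:",
          simulate_naive_slope(beta, var_true, var_error, n=20000, seed=1))
```

The simulated slope should sit near $r\beta$ rather than $\beta$; how near depends on the sample size and seed.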
If measurement error in $X$ is treated as omitting a needed covariate $\beta(X - X^*)$, then the estimate of the exposure $E$ effect is not biased in multiplicative regression models, but is biased towards zero in both logistic and proportional hazards models (Gail et al., 1984). Generally, in a non-randomized study, measurement error in covariate $X$ creates a biased estimate of the $E$ effect for almost all commonly used models, invalidates the inference for the exposure effect, and reduces power (Carroll, 1989). Such errors can severely distort estimates of the common odds ratio (Kupper, 1984; Greenland, 1980).

Table 2.1: Bias and test size for exposure effect when a balanced covariate is omitted

Model                         Asymptotic bias    Test size
Normal linear                 0                  supranominal
Uncensored exponential
  multiplicative              0                  supranominal
  reciprocal                  away from 0        supranominal
Poisson multiplicative        0                  supranominal
Bernoulli
  additive                    0                  nominal
  multiplicative              0                  nominal
  logistic                    toward 0           nominal
Cox model                     toward 0           nominal
Paired survival               toward 0           nominal

Table 2.2: Effect of pooled analysis with perfectly balanced design

Model                 Asymptotic bias    Test size       Efficiency loss
Normal additive       0                  nominal         no
Exponential
  additive            0                  supranominal    yes
  multiplicative      0                  supranominal    yes
  reciprocal          away from 0        supranominal    no
Poisson
  additive            0                  nominal         yes
  multiplicative      0                  nominal         no
Bernoulli
  additive            0                  subnominal      yes
  multiplicative      0                  subnominal      yes
  logistic            toward 0           subnominal      no
Cox model             toward 0           subnominal      yes

3 An example

An observational study was conducted in Shanghai, People's Republic of China, involving 17,716 male subjects aged 45 to 64 and cancer free. The recruitment period was from 1986 to 1989. Questionnaires were sent regarding exposure histories, health status and dietary habits. Subjects were followed until they developed cancer or were lost to follow-up. The last follow-up date for this data analysis is Dec. 31, 1992.
We examined the relationship of lung cancer incidence rate to smoking, alcohol and chronic lung disease. We assume that the true model for these data is proportional hazards. We first analyze these data assuming a full cohort design, using the Cox partial likelihood approach. Then we perform nested case-control sampling and analyze the sampled data using conditional logistic regression. The relative risk estimates from fitting these two models are then compared.

Out of 17,716 subjects who entered this cohort, 105 developed lung cancer before Dec. 31, 1992. Smoking history was coded as never smoker, ex-smoker and current smoker. For smokers, the estimated cigarettes per day and the ages at which smoking started were given. For ex-smokers, the ages at which smoking stopped were provided. The coding of alcohol history is the same as for smoking, except that the estimated grams of ethanol per day were given. Chronic lung disease history is a simple binary variable. For our purposes, we model smoking and alcohol as cumulative exposures and chronic lung disease as a binary exposure.

Table 3.1 presents the descriptive analysis of the cohort data, comparing the full cohort with the lung cancer cases. Subjects who developed lung cancer during follow-up were much more likely to be current smokers than the full cohort (81% vs 51%). The percentage of current alcohol drinkers in the lung cancer group is somewhat higher than in the full cohort (55% vs 41%). The rate of chronic lung disease history in the lung cancer group is double that in the full cohort. The average entry age of subjects who later developed lung cancer is about 3 years older than that of the full cohort. This is because the lung cancer rate increases with age. The average exit ages from this cohort are comparable between the two groups.

We analyze the data using the Cox partial likelihood approach and nested case-control sampling.
The covariates cumulative smoking exposure (pack-year/10), cumulative alcohol exposure (100 g ethanol-year/10) and chronic lung disease (yes/no) are considered univariately and multivariately. The results are given in Table 3.2. For nested case-control sampling, 1:1 and 1:4 matchings are considered. For each matching strategy, the sampling is repeated 10 times. The average of the estimates of the log relative hazards and the average of the estimated standard errors are then calculated.

For the full cohort design, all three covariates are significantly related to lung cancer univariately (cumulative smoking and alcohol, P<0.0001; chronic lung disease, P=0.047). When cumulative smoking is included, both cumulative alcohol and chronic lung disease are no longer related to lung cancer (cumulative alcohol, P=0.13; chronic lung disease, P=0.25). The estimate of the log relative hazard for cumulative smoking is not adversely affected by adding cumulative alcohol and chronic lung disease. However, the estimates of the log relative hazards for cumulative alcohol and chronic lung disease are dramatically reduced by adding cumulative smoking. These results suggest that the estimators of the alcohol and chronic lung disease effects are severely biased when the smoking effect is omitted from the full cohort model.

For nested case-control sampling, both 1:1 and 1:4 matching give generally higher average estimates of the log relative hazards for all models. For all covariates in all models, 1:4 matching gives estimates closer to the full cohort than 1:1 matching. The actual differences in the estimates of the log relative hazard between sampling and full cohort for all models are given in Table 3.3. One notices that 1:4 sampling gives approximately the same results as the full cohort for all models in this example.

So far a proportional hazards model has been assumed when analyzing these data. The underlying true model is unknown.
The comparison of the results between full cohort analysis and nested case-control sampling in this example does not provide a general conclusion. However, these results do illustrate the need to investigate the properties of estimators from full cohort and nested case-control sampling when the proposed model is misspecified in some particular ways.

Table 3.1: Descriptive analysis of the cohort data

                          Full cohort               Lung cancer
                          percent    mean (SD)      percent    mean (SD)
Smoking
  never                   42.77                     12.38
  ex                       6.63                      6.67
  current                 50.59                     80.95
Alcohol
  never                   57.30                     42.86
  ex                       2.02                      1.90
  current                 40.62                     55.24
Chronic lung disease       6.47                     12.38
Entry age                            55.5 (5.2)                58.1 (4.3)
Exit age                             60.2 (5.2)                60.6 (4.5)
Follow-up in years                    4.7 (1.1)                 2.5 (1.6)

Table 3.2: Comparison of estimates of log hazard rates between full cohort and nested case-control sampling

                                  Full cohort                  1:1 sampling     1:4 sampling
Models                            $\hat\beta$^1 (SE)   P-value    Mean(SE)^2       Mean(SE)^2
Univariate analysis
  1. Cumulative smoking^3         0.300 (0.035)    <0.0001     0.433(0.092)     0.344(0.054)
  2. Cumulative alcohol^4         0.190 (0.045)    <0.0001     0.309(0.123)     0.214(0.070)
  3. Chronic lung disease         0.589 (0.298)    0.047       0.717(0.523)     0.541(0.355)
Cumulative smoking + alcohol
  Cumulative smoking              0.284 (0.038)    <0.0001     0.415(0.095)     0.330(0.057)
  Cumulative alcohol              0.078 (0.052)    0.13        0.126(0.133)     0.069(0.076)
Cumulative smoking + chronic lung disease
  Cumulative smoking              0.290 (0.030)    <0.0001     0.443(0.094)     0.341(0.054)
  Chronic lung disease            0.340 (0.300)    0.25        0.812(0.624)     0.390(0.383)
All three covariates
  Cumulative smoking              0.280 (0.038)    <0.0001     0.423(0.098)     0.326(0.057)
  Cumulative alcohol              0.078 (0.052)    0.13        0.144(0.136)     0.074(0.076)
  Chronic lung disease            0.341 (0.300)    0.25        0.865(0.631)     0.417(0.385)

^1 Log of relative hazard.
^2 Average of estimates of log relative hazard from 10 samples.
^3 pack-year/10.
^4 100 g ethanol-year/10.
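The nested case-control sampling used in this example (a fixed number of controls drawn at random, without replacement, from those at risk at each case's failure time) can be sketched as follows. This is a minimal illustration and not the program used for Table 3.2: it assumes age as the time scale, ignores tied failure times, and uses a hypothetical `(entry_age, exit_age, is_case)` record layout.

```python
import random
from typing import List, Tuple

# Hypothetical record layout: (entry_age, exit_age, is_case)
Subject = Tuple[float, float, bool]


def sample_nested_case_control(subjects: List[Subject],
                               controls_per_case: int,
                               rng: random.Random) -> List[Tuple[int, List[int]]]:
    """For each case, sample controls from the subjects at risk at its failure time.

    Returns a list of (case_index, control_indices). A subject who fails later
    may serve as a control for an earlier case, as long as he is still at risk.
    """
    sampled_sets = []
    for case_index, (_, failure_time, is_case) in enumerate(subjects):
        if not is_case:
            continue
        # At risk: entered before the failure time and not yet exited.
        at_risk = [j for j, (entry, exit_time, _) in enumerate(subjects)
                   if j != case_index and entry < failure_time <= exit_time]
        k = min(controls_per_case, len(at_risk))
        sampled_sets.append((case_index, rng.sample(at_risk, k)))
    return sampled_sets


if __name__ == "__main__":
    toy_cohort = [(50.0, 55.0, False), (52.0, 60.0, True), (48.0, 62.0, False),
                  (58.0, 61.0, True), (45.0, 70.0, False)]
    for case, controls in sample_nested_case_control(toy_cohort, 2, random.Random(0)):
        print("case", case, "controls", sorted(controls))
```

Setting `controls_per_case` to 1 or 4 would correspond to the 1:1 and 1:4 matchings; repeating the call with different seeds mimics the repeated samplings whose averages are reported above.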
Table 3.3: Differences of log relative hazard estimates between nested case-control sampling and full cohort analysis

Models                                        $\hat\beta_{1:1} - \hat\beta_{full}$    $\hat\beta_{1:4} - \hat\beta_{full}$
Univariate analysis
  1. Cumulative smoking^1                     0.133                  0.044
  2. Cumulative alcohol^2                     0.113                  0.018
  3. Chronic lung disease                     0.128                  -0.048
Cumulative smoking + alcohol
  Cumulative smoking                          0.131                  0.046
  Cumulative alcohol                          0.048                  -0.009
Cumulative smoking + chronic lung disease
  Cumulative smoking                          0.1 17                 0.472
  Chronic lung disease                        0.172                  0.050
All three covariates
  Cumulative smoking                          0.143                  0.046
  Cumulative alcohol                          0.060                  -0.004
  Chronic lung disease                        0.524                  0.076

^1 pack-year/10.
^2 100 g ethanol-year/10.

4 Asymptotic properties of estimators with model misspecification for nested case-control and full cohort designs

4.1 Notation and assumptions

A time interval $[0, \tau]$ for a given terminal time $\tau$, $0 < \tau < \infty$, is fixed throughout this thesis. Let $(\Omega, \mathcal{F}, P)$ be a probability space and $(\mathcal{H}_t)_{t \in [0,\infty)}$ a right-continuous, nondecreasing family of sub-$\sigma$-algebras of $\mathcal{F}$; $\mathcal{H}_t$ specifies the "cohort history" up to time $t$ in the sense that it contains all events (including censoring history) whose occurrence or not is fixed by time $t$. $\mathcal{H}_{t-}$ contains all events whose occurrence or not, and censoring history, are fixed strictly before time $t$.

Let $t_j$ be the ordered failure times, $i_j$ the index of the failure at time $t_j$, and $R_j$ the set of all those "at risk" at $t_j$, including both the failure and those on study. Denote the counting processes

$$N_i(t) = \sum_{j \ge 1} I(t_j \le t,\; i_j = i)$$

which count the number of observed failures for individual $i$ in $[0, t]$, $i = 1, 2, \ldots, n$. Suppose that $N_i(t)$ is right continuous and that no two components have simultaneous jumps. Let $Y_i(t)$ be a predictable indicator process taking the value 1 if the $i$th individual is at risk at $t-$ and 0 otherwise. Assume that the $p$-dimensional
covariate vectors $Z_i(t)$ are left continuous and adapted; consequently, they are predictable and locally bounded. This means that the values of the covariates at time $t$ should be known, based on available information on the cohort, just before time $t$. Suppose the processes $(Z_i, Y_i)$, $i = 1, 2, \ldots, n$, are independent copies of the process $(Z, Y)$. Assume the underlying true intensity processes associated with $N_i(t)$ are specified as

$$\lambda_i(t) = Y_i(t)\,\alpha_i(t), \tag{4.1}$$

where $\alpha_i(t)$ could be any function of $Z_i(t)$, such as a proportional hazards model, an accelerated failure time model, etc. By standard counting process theory,

$$M_i(t) = N_i(t) - \int_0^t \lambda_i(u)\,du$$

are local square integrable martingales. Their predictable variation processes are given as

$$\langle M_i \rangle(t) = \int_0^t \lambda_i(u)\,du, \tag{4.2}$$

and their predictable covariation processes are $\langle M_i, M_j \rangle(t) = 0$ for $i \ne j$.

In cohort sampling, a set of controls is sampled according to some prespecified distribution from the risk set $R_j$ at each failure time $t_j$ for the failing individual $i_j$. This sampling activity introduces some extra random variation into the model (4.1). Let $\mathcal{F}_t$ represent the history of observed events in the cohort together with the sampled risk sets. Under independent sampling, the intensity processes corresponding to the counting processes $N_i$ will be the same under $\mathcal{F}_t$ as under $\mathcal{H}_t$ (Borgan et al., 1992); that is, additional knowledge of sampling which has occurred before any time $t$ should not alter the intensities of failure at $t$. We assume independent sampling throughout this dissertation, and all intensity processes, martingales, etc., will be with respect to $(\mathcal{F}_t)$.

Suppose that, for reasons mentioned in the introduction, model (4.1) is not used to analyze the data. Instead, we model the data assuming a proportional hazards model with covariate vector $Z_i(t)$. Here, $Z_i(t)$ could be all the $p$-dimensional covariates or only part of them, with some covariates omitted. For generality, we use $Z_i(t)$ to represent the covariates in our postulated proportional hazards model throughout; that is, the following intensity processes are postulated:

$$\lambda_i(t) = Y_i(t)\,\lambda_1(t)\,e^{\beta^T Z_i(t)}, \tag{4.3}$$

where $\lambda_1(t)$ is the mis-modelled baseline hazard function (non-negative) and $\beta$ is a vector representing the mis-modelled covariate effects from covariates $Z_i(t)$.

In nested case-control sampling, at each failure time a set of controls is selected randomly. Each subject at risk has an equal probability of being selected as a control (the case is "at risk" but cannot be sampled as a control). Let $\tilde{R}_i(t)$ be the sampled risk set, which includes the sampled controls and the failure, individual $i$, at time $t$. Then the nested case-control partial likelihood under the assumed model (4.3) can be written as

$$L(\beta) = \prod_{t \ge 0} \prod_{i=1}^{n} \left( \frac{e^{\beta^T Z_i(t)}}{\sum_{j \in \tilde{R}_i(t)} e^{\beta^T Z_j(t)}} \right)^{dN_i(t)}. \tag{4.4}$$

It is convenient here to introduce the notation

$$S^{(p)}(\beta, U) = \sum_{j \in U} Z_j^{\otimes p}\, e^{\beta^T Z_j}$$

for $p = 0, 1, 2$, where for a vector $S$ we define $S^{\otimes 0} = 1$, $S^{\otimes 1} = S$, $S^{\otimes 2} = SS^T$. Throughout this dissertation, expressions of the form $0/0$ and $\binom{m}{j}$ for $m < j$ will be set to 1. The symbols $\Rightarrow$ and $\xrightarrow{P}$ will denote convergence in distribution and in probability, respectively.

4.2 Preliminaries

The following Lemma is the key to the derivation of the asymptotic distribution of the maximum nested case-control partial likelihood estimator $\hat\beta$ with true intensity (4.1). These preliminary results will be stated without proof; they are direct results of Lemma 1 from Goldstein and Langholz (1992). Here, we extend their Lemma to include $p = 0$ and a general form of $A_i$, and specify the corresponding form of $A_i$ when $p = 0$, for our purpose of deriving the asymptotic distribution of the maximum nested case-control partial likelihood estimator with true intensity (4.1). In fact, their Lemma 1 works for any finite $p$, any general form of $w(U)$, and any general form of $A_i$ under the conditions specified below.
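In this section, for notational simplicity, we suppress an arbitrary $s \in [0, \infty)$, and write $Y = Y(s)$, $Z = Z(s)$ and $p = p(s)$, etc.

Lemma 1 Let $p \in \{0, 1, 2\}$, and $(Y_j, Z_j)$ be independent copies of $(Y, Z)$, $U = \{1, 2, \ldots, m\}$, $Z$ is bounded, and $p = P(Y = 1) > 0$. Let $R = \{j : Y_j = 1\}$, $\mathcal{P} = \{U \subset R, |U| = m\}$ and $\mathcal{P}_i = \{U \in \mathcal{P} : i \in U\}$. With $U \in \mathcal{P}$, let $w(U)$ be of the form

$$\frac{S^{(p)}(\beta, U)}{S^{(0)}(\beta, U)} \quad \text{or} \quad \left( \frac{S^{(1)}(\beta, U)}{S^{(0)}(\beta, U)} \right)^{\otimes 2}$$

with $w(\emptyset) = 0$. Under independent sampling, suppose that the $\tilde{R}_i$ are mutually independent and uniform on $\mathcal{P}_i$. Let

$$S_n = \frac{1}{n} \sum_{i=1}^{n} w(\tilde{R}_i)\, Y_i A_i,$$

where $A_i$ is $\alpha_i(t)$ when $p = 2$, may be either $\alpha_i(t)$ or $Z_i \alpha_i(t)$ when $p = 1$, and is either $Z_i \alpha_i(t)$ or $(Z_i)^{\otimes 2} \alpha_i(t)$ when $p = 0$. Then $S_n \xrightarrow{P} \psi$, where

$$\psi = p\, E\left\{ w(U)\, \frac{1}{m} \sum_{j \in U} A_j \;\Big|\; Y_U = 1 \right\}.$$

4.3 Convergence of the maximum partial likelihood estimator $\hat\beta$

Theorem 1 Let $\hat\beta$ be the estimator which maximizes the nested case-control partial likelihood (4.4) under model misspecification, and let $U = \{1, 2, \ldots, m\}$. Then $\hat\beta$ converges in probability to $\beta^*$, where $\beta^*$ is the solution to the equation $h(\beta) = 0$ with

$$h(\beta) = \int_0^\infty \frac{p(s)}{m}\, E\left\{ S_\alpha^{(1)}(U) - \frac{S^{(1)}(\beta, U)}{S^{(0)}(\beta, U)}\, S_\alpha^{(0)}(U) \;\Big|\; Y_U = 1 \right\} ds. \tag{4.5}$$

Here $S_\alpha^{(p)}(U) = \sum_{j \in U} Z_j^{\otimes p} \alpha_j$ denotes the analogue of $S^{(p)}(\beta, U)$ under the true intensity (4.1).

Proof. This proof is similar to the proof of the consistency of the maximum partial likelihood estimator given by Andersen and Gill (1982) when there is no model misspecification. Let $l(\beta, t)$ be the log of the partial likelihood (4.4). Using (4.2), we have

$$n^{-1} l(\beta, t) = n^{-1} \sum_{i=1}^{n} \int_0^t \left\{ \beta^T Z_i(s) - \log S^{(0)}(\beta, \tilde{R}_i(s)) \right\} dM_i(s) + n^{-1} \sum_{i=1}^{n} \int_0^t \left\{ \beta^T Z_i(s) - \log S^{(0)}(\beta, \tilde{R}_i(s)) \right\} \lambda_i(s)\, ds, \tag{4.6}$$

where $\lambda_i(s)$ is from the true model (4.1). Let

$$X(\beta, t) = n^{-1} l(\beta, t), \qquad A(\beta, t) = n^{-1} \sum_{i=1}^{n} \int_0^t \left\{ \beta^T Z_i(s) - \log S^{(0)}(\beta, \tilde{R}_i(s)) \right\} \lambda_i(s)\, ds;$$

then for each $\beta$, $X(\beta, \cdot) - A(\beta, \cdot)$ is a local square integrable martingale with

$$\langle X(\beta, \cdot) - A(\beta, \cdot),\; X(\beta, \cdot) - A(\beta, \cdot) \rangle = B(\beta, \cdot),$$

where

$$B(\beta, t) = n^{-2} \sum_{i=1}^{n} \int_0^t \left\{ \beta^T Z_i(s) - \log S^{(0)}(\beta, \tilde{R}_i(s)) \right\}^2 \lambda_i(s)\, ds$$
$$= n^{-2} \sum_{i=1}^{n} \int_0^t \left\{ (\beta^T Z_i(s))^2 - 2 \beta^T Z_i(s) \log S^{(0)}(\beta, \tilde{R}_i(s)) + \left[ \log S^{(0)}(\beta, \tilde{R}_i(s)) \right]^2 \right\} Y_i(s)\, \alpha_i(s)\, ds.$$

By the boundedness condition on $Z_i(t)$, $nB(\beta, \infty)$ converges in probability to some finite quantity (depending on $\beta$). Therefore, by the inequality of Lenglart (Andersen and Gill, 1982, Appendix 1.2), $X(\beta, \infty)$ converges in probability to the same limit as $A(\beta, \infty)$ for each $\beta$.

In what follows, interchanges of limiting operations are justified by a dominated convergence theorem. Hence,

$$\frac{\partial}{\partial \beta} A(\beta, \infty) = n^{-1} \sum_{i=1}^{n} \int_0^\infty \left\{ Z_i(s) - \frac{S^{(1)}(\beta, \tilde{R}_i(s))}{S^{(0)}(\beta, \tilde{R}_i(s))} \right\} Y_i(s)\, \alpha_i(s)\, ds,$$

and, using Lemma 1,

$$\frac{\partial}{\partial \beta} A(\beta, \infty) \xrightarrow{P} \int_0^\infty \frac{p(s)}{m}\, E\left\{ S_\alpha^{(1)}(U) - \frac{S^{(1)}(\beta, U)}{S^{(0)}(\beta, U)}\, S_\alpha^{(0)}(U) \;\Big|\; Y_U = 1 \right\} ds = h(\beta).$$

This demonstrates that $A(\beta, \infty)$ converges to a function with first derivative 0 at $\beta = \beta^*$.

Let $Z_{Y,U} = (Z_{Y,1}, \ldots, Z_{Y,m})$ be a vector of $m$ independent copies of the random variable $Z_Y$. Let

$$p_j = \frac{e^{\beta^T Z_{Y,j}}}{\sum_{k \in U} e^{\beta^T Z_{Y,k}}}$$

and set $P(Z = Z_{Y,j} \mid Z_{Y,U}) = p_j$. Then

$$\mathrm{Cov}(Z \mid Z_{Y,U}) = \sum_{j \in U} Z_{Y,j}^{\otimes 2}\, p_j - \Big( \sum_{j \in U} Z_{Y,j}\, p_j \Big)^{\otimes 2}. \tag{4.7}$$

Then the second derivative of the limit of $A(\beta, \infty)$ with respect to $\beta$ is

$$I = -\int_0^\infty \frac{p(s)}{m}\, E\left\{ \mathrm{Cov}(Z(s) \mid Z_{Y,U})\, S_\alpha^{(0)}(U) \right\} ds. \tag{4.8}$$

The second derivative is a negative definite matrix. Thus for each $\beta$, $n^{-1} l(\beta, \infty)$ converges in probability to a concave function of $\beta$, the limit of $A(\beta, \infty)$, with a unique maximum at $\beta = \beta^*$ (by definition of $\beta^*$). Since $\hat\beta$ maximizes the random concave function $n^{-1} l(\beta, \infty)$, it follows by some convex analysis (Andersen and Gill, 1982, Appendix 2) that $\hat\beta \xrightarrow{P} \beta^*$.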
4.4 Asymptotic normality of the maximum likelihood estimator $\hat\beta$

In this section, we try to demonstrate that with true intensity (4.1) and the proposed proportional hazards model (4.3), the maximum likelihood estimator $\hat\beta$, at which the nested case-control partial likelihood (4.4) is maximized, is asymptotically normally distributed with asymptotic mean $\beta^*$ and some variance. The formula for calculating $\beta^*$ has been given in Theorem 1 of the previous section.

Theorem 2 (Conjecture: Asymptotic normality of $\hat\beta$)

$$n^{1/2}(\hat\beta - \beta^*) \Rightarrow N(0, \Gamma),$$

where $\Gamma = (I^{-1})\, \Sigma\, (I^{-1})^T$, $I$ is the limiting value of the second derivative of the log-partial likelihood evaluated at $\beta^*$ (4.8), and $\Sigma$ is the score variance evaluated at $\beta^*$, where

$$\Sigma = \int_0^\infty \frac{p(s)}{m}\, E\left\{ S_\alpha^{(2)}(U) - \frac{S^{(1)}(\beta^*, U)}{S^{(0)}(\beta^*, U)}\, S_\alpha^{(1)}(U)^T - S_\alpha^{(1)}(U) \left( \frac{S^{(1)}(\beta^*, U)}{S^{(0)}(\beta^*, U)} \right)^T + \left( \frac{S^{(1)}(\beta^*, U)}{S^{(0)}(\beta^*, U)} \right)^{\otimes 2} S_\alpha^{(0)}(U) \;\Big|\; Y_U = 1 \right\} ds. \tag{4.9}$$

Let $U(\beta, t)$ be the score function, i.e., the first derivative of the log partial likelihood function (4.6) with respect to $\beta$. We have

$$U(\beta, t) = \sum_{i=1}^{n} \int_0^t \left\{ Z_i(s) - \frac{S^{(1)}(\beta, \tilde{R}_i(s))}{S^{(0)}(\beta, \tilde{R}_i(s))} \right\} dN_i(s),$$

and

$$\Sigma = \mathrm{Var}\left( n^{-1/2}\, U(\beta^*, \infty) \right) = \lim_{n \to \infty} n^{-1} \sum_{i=1}^{n} \int_0^\infty \left\{ Z_i(s) - \frac{S^{(1)}(\beta^*, \tilde{R}_i(s))}{S^{(0)}(\beta^*, \tilde{R}_i(s))} \right\}^{\otimes 2} \lambda_i(s)\, ds$$
$$= \lim_{n \to \infty} n^{-1} \sum_{i=1}^{n} \int_0^\infty \left\{ Z_i(s) - \frac{S^{(1)}(\beta^*, \tilde{R}_i(s))}{S^{(0)}(\beta^*, \tilde{R}_i(s))} \right\}^{\otimes 2} Y_i(s)\, \alpha_i(s)\, ds.$$

Using Lemma 1 from Section 4.2, this limit equals the right-hand side of Equation (4.9).

The theoretical proof of the above conjecture is difficult for sampling designs when model misspecification exists. The proposed asymptotic variance formula $\Gamma = (I^{-1})\, \Sigma\, (I^{-1})^T$ may not be exactly true according to the work of Lin and Wei (1989). Lin and Wei derived the asymptotic distribution of the maximum partial likelihood estimator for the full cohort design when model misspecification exists. They claim that $n^{1/2}(\hat\beta - \beta^*)$ is asymptotically normal with mean 0 and with a covariance matrix that can be consistently estimated. In their paper, the score variance differs from ours by a correction factor. Further consideration of this will be given in the discussion section. Future work is needed in this area.
A Monte Carlo simulation will be used to check the applicability of our proposed variance formula and its performance for small sample sizes. The detailed description and the results are given in the simulation section.

4.5 Extension to the full cohort design

The above results can be extended to a full cohort design when $m \to \infty$. As $m \to \infty$,

$$E\{Y(s)\, m^{-1} S^{(k)}(\beta, U)\} \to E\{Y(s)\,(Z(s))^{\otimes k} e^{\beta^T Z(s)}\},$$

and similar results can be found for $S_\alpha^{(k)}$. Define

$$s^{(k)}(\beta, t) = E\{Y(t)\,(Z(t))^{\otimes k} e^{\beta^T Z(t)}\}, \qquad s_\alpha^{(k)}(t) = E\{Y(t)\,(Z(t))^{\otimes k} \alpha(t)\}.$$

Then when $m \to \infty$, $h(\beta)$ is exactly the same as that derived by Struthers and Kalbfleisch (1986, equation 2.5). In particular, let $\hat\beta_{full}$ be the maximum partial likelihood estimator for a full cohort design under an assumed intensity (4.3) with true intensity (4.1); then $\hat\beta_{full}$ converges to $\beta^*_{full}$, where $\beta^*_{full}$ is the solution to the following equation:

$$h_{full}(\beta) = \int_0^\infty \Big\{ s_\alpha^{(1)}(t) - \frac{s^{(1)}(\beta, t)}{s^{(0)}(\beta, t)}\, s_\alpha^{(0)}(t) \Big\}\, dt = 0. \quad (4.10)$$

The formulae for computing $I_{full}$ and $\Sigma_{full}$ are given below, respectively.

5 Two covariates $Z_1, Z_2$ with $Z_2$ omitted

The results derived above apply for any form of the underlying true model when the data are modelled with proportional hazards, so we can study the biases due to such model misspecifications. In this section, we consider a common situation where the true model is a proportional hazards model with two covariates (possibly time-dependent and/or with interaction between these two covariates) and the proposed model is proportional hazards with only one covariate. Sections 5.1 to 5.5 assume the true random intensity process is of the form

$$Y_i(t)\lambda_0(t)e^{\alpha_1 Z_{1i}(t) + \alpha_2 Z_{2i}(t)}. \quad (5.1)$$

Section 5.6 considers interaction between $Z_1$ and $Z_2$, that is, the true random intensity process is of the form

$$Y_i(t)\lambda_0(t)e^{\alpha_1 Z_{1i}(t) + \alpha_2 Z_{2i}(t) + \alpha_3 Z_{1i}(t) Z_{2i}(t)}, \quad (5.2)$$

where $\lambda_0(t)$ is the true baseline hazard (non-negative).
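For all situations, we consider estimation of $\alpha_1$ when the model assumed is

$$Y_i(t)\lambda_1(t)e^{\beta Z_{1i}(t)}, \quad (5.3)$$

and compare $\beta^*$ between the full cohort and nested case-control designs. We use the subscripts full and nest to differentiate the results between full cohort and nested case-control designs. The proposed asymptotic score variance $\Sigma$, information $I$ and the variance $\Gamma$ of $\hat\beta$ are also evaluated under certain situations.

5.1 Both $Z_1$ and $Z_2$ are binary and the distribution is constant over time

We first examine the case where $Z_1(t)$ and $Z_2(t)$ are binary with true model $Y_i(t)\lambda_0(t)e^{\alpha_1 Z_{1i}(t) + \alpha_2 Z_{2i}(t)}$. The working model is $Y_i(t)\lambda_1(t)e^{\beta Z_{1i}(t)}$. For simplicity, we assume the joint exposure distribution of $Z_1$ and $Z_2$ is constant over time (which is approximately true for a rare disease). Let

$$\pi_{ij} = P(Z_1(t) = i, Z_2(t) = j \mid Y(t) = 1), \qquad T_{ij} = \sum_{k=1}^{m} I(Z_{1k} = i, Z_{2k} = j), \quad i, j = 0, 1,$$

where $T_{ij}$ represents the total number of subjects in each exposure category of $Z_1, Z_2$ in the sampled risk set. Then for $k = 0, 1, 2$,

$$S^{(k)}(\beta, U) = \sum_{i,j=0,1} T_{ij}\, i^k e^{\beta i}, \qquad S_\alpha^{(k)}(U) = \lambda_0(t)\sum_{i,j=0,1} T_{ij}\, i^k e^{i\alpha_1 + j\alpha_2},$$

$$s^{(k)}(t) = p(t)\,E\{Z_1^k e^{\beta Z_1}\}, \qquad s_\alpha^{(k)}(t) = \lambda_0(t)\,p(t)\,E\{Z_1^k e^{\alpha_1 Z_1 + \alpha_2 Z_2}\},$$

where $p(t)$ is the probability of being at risk as defined previously. Since we assume that the joint exposure probability to $Z_1, Z_2$ is constant over time, $h_{nest}(\beta^*_{nest}) = 0$ from equation (4.5) and $h_{full}(\beta^*_{full}) = 0$ from equation (4.10) are equivalent to

$$E\Big\{\sum_{i,j} T_{ij}\, i\, e^{i\alpha_1 + j\alpha_2} - \frac{\sum_{i,j} T_{ij}\, i\, e^{\beta i}}{\sum_{i,j} T_{ij}\, e^{\beta i}} \sum_{i,j} T_{ij}\, e^{i\alpha_1 + j\alpha_2}\Big\} = 0 \quad (5.4)$$

for a nested case-control model, and

$$\sum_{i,j} \pi_{ij}\, i\, e^{i\alpha_1 + j\alpha_2} - \frac{\sum_{i,j} \pi_{ij}\, i\, e^{\beta i}}{\sum_{i,j} \pi_{ij}\, e^{\beta i}} \sum_{i,j} \pi_{ij}\, e^{i\alpha_1 + j\alpha_2} = 0 \quad (5.5)$$

for a full cohort model. The expectations in the nested case-control model are taken with respect to $T$. Since the four categories defined by $Z_1$ and $Z_2$ arise randomly, $T$ has a multinomial $(\pi_{00}, \pi_{01}, \pi_{10}, \pi_{11})$ distribution. Define

$$A = t_{10}e^{\alpha_1} + t_{11}e^{\alpha_1 + \alpha_2}, \qquad B = t_{10}e^{\beta} + t_{11}e^{\beta},$$
$$C = t_{00} + t_{01}e^{\alpha_2} + t_{10}e^{\alpha_1} + t_{11}e^{\alpha_1 + \alpha_2}, \qquad D = t_{00} + t_{01} + t_{10}e^{\beta} + t_{11}e^{\beta},$$
$$a = \pi_{10}e^{\alpha_1} + \pi_{11}e^{\alpha_1 + \alpha_2}, \qquad b = \pi_{10}e^{\beta} + \pi_{11}e^{\beta},$$
$$c = \pi_{00} + \pi_{01}e^{\alpha_2} + \pi_{10}e^{\alpha_1} + \pi_{11}e^{\alpha_1 + \alpha_2}, \qquad d = \pi_{00} + \pi_{01} + \pi_{10}e^{\beta} + \pi_{11}e^{\beta}.$$

Let $t_{11} = m - t_{00} - t_{01} - t_{10}$;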
then the above two equations (5.4, 5.5) become

$$\sum_{t_{00}=0}^{m}\;\sum_{t_{01}=0}^{m-t_{00}}\;\sum_{t_{10}=0}^{m-t_{00}-t_{01}} \binom{m}{t_{00}\; t_{01}\; t_{10}\; t_{11}} \pi_{00}^{t_{00}}\pi_{01}^{t_{01}}\pi_{10}^{t_{10}}\pi_{11}^{t_{11}} \{A - B\,C/D\} = 0$$

for the nested case-control model and

$$a - b\,c/d = 0$$

for the full cohort model, respectively. One can easily obtain the analytical solution of $\beta^*_{full}$ as follows:

$$\beta^*_{full} = \alpha_1 + \log\left(\frac{(\pi_{10} + \pi_{11}e^{\alpha_2})(\pi_{00} + \pi_{01})}{(\pi_{00} + \pi_{01}e^{\alpha_2})(\pi_{10} + \pi_{11})}\right).$$

For $m = 2$ or 3, we analytically solve the binomial equation for $\beta^*_{nest}$. The solution is exactly the same as $\beta^*_{full}$ (it does not depend on $m$). For $m > 3$, solving the equation analytically is tedious. However, we expect that $\beta^*_{nest}$ is equal to $\beta^*_{full}$ for any size of $m$. A FORTRAN program that solves these two equations numerically for any $m$ gives the same results as our analytical solutions and expectations. That is, there is no extra bias because of sampling in this kind of model misspecification.

5.2 Both $Z_1$ and $Z_2$ are normally distributed and constant over time

We next examine the case where $Z_1$ and $Z_2$ are jointly normally distributed with true model $Y_i(t)\lambda_0(t)e^{\alpha_1 Z_{1i} + \alpha_2 Z_{2i}}$. The working model is $Y_i(t)\lambda_1(t)e^{\beta Z_{1i}}$. Assume the distributions are constant over time (which is approximately true for a rare disease). Further assume that $Z_1$ and $Z_2$ are independent of each other at any time. Then both $h_{full}(\beta)$ and $h_{nest}(\beta)$ are monotone decreasing functions of $\beta$, and $\alpha_1$ is easily seen to be the unique solution to both equations (unbiased). That is,

$$\beta^*_{full} = \beta^*_{nest} = \alpha_1.$$

For any normally distributed and correlated $Z_1$ and $Z_2$, we can write the following linear regression line,

$$Z_2 = \rho\frac{\sigma_{Z_2}}{\sigma_{Z_1}} Z_1 + \epsilon,$$

where $\rho$ is the correlation coefficient between $Z_1$ and $Z_2$, $\sigma_{Z_1}, \sigma_{Z_2}$ are the standard deviations of $Z_1, Z_2$, respectively, and $\epsilon$ is normally distributed. Then $Z_1$ and $\epsilon$ are independent of each other, and

$$\alpha_1 Z_1 + \alpha_2 Z_2 = \alpha_1 Z_1 + \alpha_2\Big(\rho\frac{\sigma_{Z_2}}{\sigma_{Z_1}} Z_1 + \epsilon\Big) = \Big(\alpha_1 + \alpha_2\rho\frac{\sigma_{Z_2}}{\sigma_{Z_1}}\Big) Z_1 + \alpha_2\epsilon.$$
Thus, if $\rho$, $\sigma_{Z_1}$ and $\sigma_{Z_2}$ do not change over time, then

$$\beta^*_{full} = \beta^*_{nest} = \alpha_1 + \alpha_2\rho\frac{\sigma_{Z_2}}{\sigma_{Z_1}}.$$

Generally, it is difficult to get analytical solutions of $\beta^*_{full}$ and $\beta^*_{nest}$ for other types of complicated distributions of correlated $Z_1$ and $Z_2$, and when the distributions of $Z_1$ and $Z_2$ are changing over time. We wrote FORTRAN programs to obtain numerical solutions of $\beta^*_{full}$ and $\beta^*_{nest}$ for an idealized intervention trial where both $Z_1, Z_2$ are either binary or normally distributed at the beginning of the trial.

5.3 Idealized intervention trials with $Z_1, Z_2$ binary or normally distributed

For idealized intervention trials, the distributions of $Z_1, Z_2$ for subjects being at risk change over time. Let $Z_1(t), Z_2(t)$ be dichotomous with values 0, 1 and $P[Z_1(0) = i, Z_2(0) = j] = \pi_{ij}$, where $i, j = 0, 1$. Assuming $\lambda_0(t) = 1$, the probability of being at risk at each time $t$ is

$$p(t) = \pi_{00}e^{-t} + \pi_{01}e^{-te^{\alpha_2}} + \pi_{10}e^{-te^{\alpha_1}} + \pi_{11}e^{-te^{\alpha_1 + \alpha_2}}.$$

Let $\pi_{ij}(t)$ be the distribution probabilities of $Z_1, Z_2$ at any time $t$; then

$$\pi_{00}(t) = \pi_{00}e^{-t}/p(t), \quad \pi_{01}(t) = \pi_{01}e^{-te^{\alpha_2}}/p(t), \quad \pi_{10}(t) = \pi_{10}e^{-te^{\alpha_1}}/p(t), \quad \pi_{11}(t) = \pi_{11}e^{-te^{\alpha_1 + \alpha_2}}/p(t).$$

There is no closed form solution for $\beta^*_{full}$ here. However, for nested case-control sampling, it can be shown that when $m = 2$,

$$\beta^*_{nest} = \log\left\{\frac{\int_0^\infty p(t)\,(\pi_{10}(t)e^{\alpha_1} + \pi_{11}(t)e^{\alpha_1 + \alpha_2})(\pi_{00}(t) + \pi_{01}(t))\,dt}{\int_0^\infty p(t)\,(\pi_{00}(t) + \pi_{01}(t)e^{\alpha_2})(\pi_{10}(t) + \pi_{11}(t))\,dt}\right\}.$$

An IMSL subroutine DQDAGI is used to calculate the one-dimensional integration with respect to time $t$ in our FORTRAN program to obtain numerical solutions of $\beta^*_{full}$ and $\beta^*_{nest}$. The results are shown later.

For any continuous variables $Z_1(t), Z_2(t)$, let the joint density of $Z_1(0), Z_2(0)$ be $f(Z_1(0), Z_2(0))$; then

$$p(t) = \iint \exp\big(-te^{\alpha_1 Z_1(t) + \alpha_2 Z_2(t)}\big)\, f(Z_1(t), Z_2(t))\, dZ_1(t)\, dZ_2(t). \quad (5.6)$$

The joint density of $Z_1(t), Z_2(t)$ at any particular time $t$ is

$$\frac{\exp\big(-te^{\alpha_1 Z_1(t) + \alpha_2 Z_2(t)}\big)\, f(Z_1(t), Z_2(t))}{p(t)}.$$

We only calculate numerical results for normally distributed $Z_1$ and $Z_2$ (at time 0).
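The binary-covariate quantities $p(t)$ and $\pi_{ij}(t)$ of the idealized intervention trial can be sketched in a few lines of Python. This is only an illustration under the stated assumption $\lambda_0(t) = 1$; the function names are hypothetical and it is not the FORTRAN program used in this work.

```python
import math

CELLS = [(0, 0), (0, 1), (1, 0), (1, 1)]  # (Z1, Z2) exposure categories


def at_risk_probability(t, pi0, a1, a2):
    """p(t): probability of still being at risk at time t, assuming lambda_0(t) = 1.

    pi0 maps (i, j) to P(Z1(0) = i, Z2(0) = j).
    """
    return sum(pi0[i, j] * math.exp(-t * math.exp(a1 * i + a2 * j)) for i, j in CELLS)


def exposure_distribution(t, pi0, a1, a2):
    """pi_ij(t): joint distribution of (Z1, Z2) among subjects at risk at time t."""
    p_t = at_risk_probability(t, pi0, a1, a2)
    return {
        (i, j): pi0[i, j] * math.exp(-t * math.exp(a1 * i + a2 * j)) / p_t
        for i, j in CELLS
    }


def odds_ratio(pi):
    """Odds ratio pi_00 * pi_11 / (pi_01 * pi_10)."""
    return pi[0, 0] * pi[1, 1] / (pi[0, 1] * pi[1, 0])


if __name__ == "__main__":
    pi0 = {(0, 0): 0.25, (0, 1): 0.25, (1, 0): 0.25, (1, 1): 0.25}
    for t in (0.0, 0.5, 1.0):
        dist = exposure_distribution(t, pi0, a1=0.5, a2=1.0)
        print(t, at_risk_probability(t, pi0, 0.5, 1.0), odds_ratio(dist))
```

Under these formulas the odds ratio among subjects at risk equals the time-0 odds ratio multiplied by $\exp\{-t(e^{\alpha_1} - 1)(e^{\alpha_2} - 1)\}$, so covariates that are independent at time 0 stay independent only when $\alpha_1 = 0$ or $\alpha_2 = 0$.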
IMSL subroutine DQAND is used to calculate the expectations in $h_{full}(\beta)$ at any time $t$ (two-dimensional integrations) and to calculate the expectations in $h_{nest}(\beta)$ (four-dimensional integrations) at any time $t$ when $m = 2$. Because of numerical problems and the amount of calculation involved, the integration with respect to time $t$ is performed using summation. For the same reason, the numerical solution of $\beta^*_{nest}$ when $m > 2$ cannot be obtained (more than four-dimensional integration), even though theoretically it exists.

5.4 Results: bias

We have proven that when $Z_1$ and $Z_2$ are either binary or normally distributed and constant over time, there is no extra bias due to sampling compared to full cohort analysis when $Z_2$ is omitted. The absolute bias for both full cohort and nested case-control designs depends on the distributions of $Z_1$ and $Z_2$, the effect of $Z_2$ ($\alpha_2$) and the interaction between $Z_1$ and $Z_2$ ($\alpha_3$). The absolute bias of the maximum partial likelihood estimators from both full cohort and nested case-control designs when $Z_2$ is omitted is (not considering interaction):

$$\left|\log\left(\frac{(\pi_{10} + \pi_{11}e^{\alpha_2})(\pi_{00} + \pi_{01})}{(\pi_{00} + \pi_{01}e^{\alpha_2})(\pi_{10} + \pi_{11})}\right)\right|$$

when $Z_1$ and $Z_2$ are binary, and

$$\alpha_2\rho\frac{\sigma_{Z_2}}{\sigma_{Z_1}}$$

when $Z_1, Z_2$ are normally distributed. Special cases are:

1. If $\alpha_2 = 0$, then $\beta^*_{full} = \beta^*_{nest} = \alpha_1$, that is, no bias.
2. If $Z_1$ and $Z_2$ are independent, then $\beta^*_{full} = \beta^*_{nest} = \alpha_1$, that is, no bias.

For an idealized intervention trial, define $\pi_{ij}(0) = p(Z_1(0) = i, Z_2(0) = j)$, $i, j = 0, 1$, and $OR = \pi_{00}(0)\pi_{11}(0)/(\pi_{01}(0)\pi_{10}(0))$; then $\pi_{1\cdot}(0) = p(Z_1(0) = 1)$ and $\pi_{\cdot 1}(0) = p(Z_2(0) = 1)$. Tables 5.1 and 5.2 show the results assuming that $Z_1$ and $Z_2$ are independent ($OR = 1$) at the beginning of the trial. Table 5.1 assumes that the marginal probabilities are $p(Z_1(0) = 1) = p(Z_2(0) = 1) = 0.5$. Table 5.2 assumes that the marginal probabilities are $p(Z_1(0) = 1) = 0.1$ and $p(Z_2(0) = 1) = 0.2$. Because of subject failures, $Z_1$ and $Z_2$ are not independent over time.
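For the constant-distribution binary case, the claim that sampling adds no bias can be checked numerically with a short Python sketch. It evaluates the closed-form full cohort limit and solves the nested case-control equation $E\{A - B\,C/D\} = 0$ by enumerating the multinomial counts of a sampled risk set of size $m$. This is an illustrative re-implementation under the assumptions of Section 5.1, not the FORTRAN program used in this work; the function names, the bisection bounds and the optional interaction argument `a3` (the $\alpha_3$ of model (5.2)) are assumptions of the sketch.

```python
import math
from itertools import product

CELLS = [(0, 0), (0, 1), (1, 0), (1, 1)]  # (Z1, Z2) exposure categories


def true_rr(i, j, a1, a2, a3=0.0):
    """True relative risk exp(a1*i + a2*j + a3*i*j) for Z1 = i, Z2 = j."""
    return math.exp(a1 * i + a2 * j + a3 * i * j)


def beta_full(pi, a1, a2, a3=0.0):
    """Closed-form full cohort limit when only Z1 is modelled.

    pi maps (i, j) to P(Z1 = i, Z2 = j | at risk), assumed constant over time.
    """
    exposed = sum(pi[1, j] * true_rr(1, j, a1, a2, a3) for j in (0, 1)) / (pi[1, 0] + pi[1, 1])
    unexposed = sum(pi[0, j] * true_rr(0, j, a1, a2, a3) for j in (0, 1)) / (pi[0, 0] + pi[0, 1])
    return math.log(exposed / unexposed)


def h_nest(beta, pi, m, a1, a2, a3=0.0):
    """E{A - B*C/D} over the multinomial counts of a sampled risk set of size m."""
    total = 0.0
    for counts in product(range(m + 1), repeat=4):
        if sum(counts) != m:
            continue
        prob = float(math.factorial(m))
        for n, cell in zip(counts, CELLS):
            prob *= pi[cell] ** n / math.factorial(n)
        A = sum(n * i * true_rr(i, j, a1, a2, a3) for n, (i, j) in zip(counts, CELLS))
        B = sum(n * i * math.exp(beta * i) for n, (i, j) in zip(counts, CELLS))
        C = sum(n * true_rr(i, j, a1, a2, a3) for n, (i, j) in zip(counts, CELLS))
        D = sum(n * math.exp(beta * i) for n, (i, j) in zip(counts, CELLS))
        total += prob * (A - B * C / D)
    return total


def beta_nest(pi, m, a1, a2, a3=0.0, lo=-20.0, hi=20.0):
    """Root of h_nest in beta by bisection, for m >= 2 (h_nest decreases in beta)."""
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if h_nest(mid, pi, m, a1, a2, a3) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    pi = {(0, 0): 0.20, (0, 1): 0.20, (1, 0): 0.10, (1, 1): 0.50}
    print("full cohort:", beta_full(pi, 0.5, 1.0))
    for m in (2, 3, 5):
        print("nested, m =", m, ":", beta_nest(pi, m, 0.5, 1.0))
```

In this sketch the nested root agrees with the closed form for each $m$ tried. With the Table 5.7 inputs ($\pi_{00} = 0.833$, $\pi_{01} = 0.017$, $\pi_{10} = 0.147$, $\pi_{11} = 0.003$, $\alpha_1 = \alpha_2 = 0$, $\alpha_3 = \log 100$), `beta_full` returns approximately 1.092, the value reported there for a constant distribution.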
However, under the null hypothesis ($\alpha_1 = 0$), there is no bias for both full cohort and nested case-control designs ($\beta^*_{full} = \beta^*_{nest} = 0$). Under the alternative hypothesis, both $\beta^*_{full}$ and $\beta^*_{nest}$ are lower than $\alpha_1$ (biased toward the null). The larger the effect from $Z_2$, the more bias there is. The bias decreases with decreasing $p(Z_2(0) = 1)$, and increasing the exposure to $Z_1$ changes the bias very little (results not shown here). However, the differences and the relative percent differences between $\beta^*_{full}$ and $\beta^*_{nest}$ are very small. One also notices that with the increase of $m$ in nested case-control sampling, $\beta^*_{nest}$ approaches the value of $\beta^*_{full}$, the result we expected.

Tables 5.3 and 5.4 give the results when $Z_1$ and $Z_2$ are correlated at time 0. With $\alpha_1 \geq 0$ and $\alpha_2 \geq 0$, for both null and alternative hypotheses, $\beta^*_{full}$ and $\beta^*_{nest}$ are less than $\alpha_1$ (biased toward the null) when $Z_1$ and $Z_2$ are negatively correlated ($OR < 1$), and are generally greater than $\alpha_1$ (biased away from the null) when $Z_1$ and $Z_2$ are positively correlated ($OR > 1$). However, for positively correlated $Z_1$ and $Z_2$, if the effects from $Z_1$ and $Z_2$ are large, $\beta^*$ is biased toward the null. For situations with $\alpha_1$ and/or $\alpha_2$ less than 0, one can change the sign of $\alpha_1$ and/or $\alpha_2$ and the direction of the correlation between $Z_1$ and $Z_2$, making both $\alpha_1$ and $\alpha_2$ non-negative; the above predictions with respect to the direction of bias then apply. In all cases, the differences and the relative percent differences between $\beta^*_{full}$ and $\beta^*_{nest}$ are very small, and $\beta^*_{nest}$ approaches $\beta^*_{full}$ when $m$ is increased.

Table 5.5 shows an example of an idealized intervention trial with both $Z_1$ and $Z_2$ having standard normal distributions and various correlation coefficients $\rho$ at time 0. We divide the time $t$ into 20 intervals $t_k$, in which the probabilities of being at risk ($p(t_k)$) are 1.0, 0.95, 0.90, 0.85, 0.80, 0.75, 0.70, 0.65, 0.60, 0.55, 0.50, 0.45, 0.40, 0.35, 0.30, 0.25, 0.20, 0.15, 0.10, 0.05, respectively.
Formula (5.6) and an iterative method are used to calculate each $\beta^*$ for each $\alpha_1$, $\alpha_2$ and $\rho$ value. The integration with respect to time $t$ is replaced by a summation over these 20 points. For nested case-control sampling, calculation is done only for $m = 2$. The top part of Table 5.5 gives the results when $Z_1$ and $Z_2$ are independent at the beginning of the trial (time 0). Under the null hypothesis, both full cohort and nested case-control designs provide unbiased estimates of $\alpha_1$ when $Z_2$ is omitted. Under the alternative hypothesis, the estimates from both full cohort and nested case-control designs are biased toward the null. The difference between $\beta^*_{full}$ and $\beta^*_{nest}$ when $m = 2$ is very small except when both $\alpha_1$ and $\alpha_2$ are very large. This large difference might be due to numerical inaccuracy, since four-dimensional integrations are involved when $m = 2$. A later simulation study can be used to confirm this conjecture. When $Z_1$ and $Z_2$ are positively correlated at the beginning of the trial, the estimates from both full cohort and nested case-control designs are generally greater than $\alpha_1$ (biased away from the null under both null and alternative hypotheses), but switch direction with large $\alpha_1$ and $\alpha_2$. When $Z_1$ and $Z_2$ are negatively correlated at the beginning of the trial, the estimates from both full cohort and nested case-control designs are less than $\alpha_1$ (biased toward the null). These results are consistent with binary $Z_1$ and $Z_2$.

5.5 Results: variances and asymptotic relative efficiency

In this section, we consider the efficiency of nested case-control relative to the full cohort if our conjecture regarding the asymptotic variance formula is true. We begin with a theorem and a proof. We will compare this theoretical result with the simulation study in the simulation section.

Theorem 3 For two independent covariates $Z_1, Z_2$, if the true model is (5.1) and the proposed model is (5.3),
then under the null hypothesis that $\alpha_1 = 0$, $\beta^* = 0$ is the solution for both nested case-control and full cohort designs, and the asymptotic relative efficiency of nested case-control relative to the full cohort is $(m - 1)/m$, independent of covariate distributions. If censoring exists, censoring is assumed not to be related to the covariates.

Proof. Substituting $\alpha_1 = 0$ and models (5.1, 5.3) into equations (4.5, 4.10), we have

$$\int_0^\infty \lambda_0(s)p(s)\, E\Big\{\sum_{j \in U} Z_{1j}e^{\alpha_2 Z_{2j}} - \frac{\sum_{j \in U} Z_{1j}e^{\beta Z_{1j}}}{\sum_{j \in U} e^{\beta Z_{1j}}}\sum_{j \in U} e^{\alpha_2 Z_{2j}} \,\Big|\, Y_U = 1\Big\}\, ds = 0 \quad (5.7)$$

for the nested case-control design and

$$\int_0^\infty \lambda_0(s)p(s)\Big(E\{Z_1e^{\alpha_2 Z_2}\} - \frac{E\{Z_1e^{\beta Z_1}\}}{E\{e^{\beta Z_1}\}}E\{e^{\alpha_2 Z_2}\}\Big)\, ds = 0 \quad (5.8)$$

for the full cohort design. Since $\alpha_1 = 0$ and $Z_1$ and $Z_2$ are independent at time 0, $Z_1$ and $Z_2$ will be independent at any time $t$ (by the assumption on censoring, censoring will not change this independence over time). Then at any time $t$,

$$E\{Z_1e^{\alpha_2 Z_2}\} = E\{Z_1\}E\{e^{\alpha_2 Z_2}\},$$

and the same factorization holds for the sums over $j \in U$. We can see that the unique solutions to equations (5.7, 5.8) are $\beta^*_{nest} = 0$ and $\beta^*_{full} = 0$. Under the above situation,

$$I_{full} = \int_0^\infty \lambda_0(s)p(s)\{E\{Z_1^2\} - (E\{Z_1\})^2\}E\{e^{\alpha_2 Z_2}\}\, ds,$$

$$\Sigma_{full} = \int_0^\infty \lambda_0(s)p(s)\{E\{Z_1^2e^{\alpha_2 Z_2}\} - 2E\{Z_1e^{\alpha_2 Z_2}\}E\{Z_1\} + (E\{Z_1\})^2E\{e^{\alpha_2 Z_2}\}\}\, ds = I_{full},$$

and

$$I_{nest} = \int_0^\infty \lambda_0(s)p(s)\, E\Big\{\frac{1}{m}\sum_{j \in U} Z_{1j}^2 - \Big(\frac{1}{m}\sum_{j \in U} Z_{1j}\Big)^2\Big\}E\{e^{\alpha_2 Z_2}\}\, ds = \frac{m - 1}{m}\, I_{full}.$$

Since the expectations over the sampled risk set again factorize under independence,

$$\Sigma_{nest} = \int_0^\infty \lambda_0(s)p(s)\, E\Big\{\frac{1}{m}\sum_{j \in U} Z_{1j}^2e^{\alpha_2 Z_{2j}} - \frac{2}{m}\sum_{j \in U} Z_{1j}e^{\alpha_2 Z_{2j}}\,\frac{1}{m}\sum_{j \in U} Z_{1j} + \Big(\frac{1}{m}\sum_{j \in U} Z_{1j}\Big)^2\frac{1}{m}\sum_{j \in U} e^{\alpha_2 Z_{2j}}\Big\}\, ds = I_{nest}.$$

The asymptotic relative efficiency (ARE) of nested case-control against full cohort is

$$\frac{I_{nest}}{I_{full}} = \frac{m - 1}{m}.$$

Goldstein and Langholz (1992) proved that for $1{:}m - 1$ matching, the asymptotic relative efficiency is $(m - 1)/m$ when a single covariate is modelled and the assumed proportional hazards model (5.3) is correct, independent of censoring and the covariate distributions.
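The factor $(m - 1)/m$ comes from the expected within-set variance of the modelled covariate in a sampled risk set of size $m$. The following Python sketch checks this identity by exact enumeration for a binary covariate; it is an illustration only, and the function names and the Bernoulli assumption are not taken from this work.

```python
from math import comb


def are_nested_vs_full(m: int) -> float:
    """Asymptotic relative efficiency (m - 1) / m for sampled risk sets of size m."""
    if m < 2:
        raise ValueError("a sampled risk set needs the case and at least one control")
    return (m - 1) / m


def expected_within_set_variance(m: int, p: float) -> float:
    """E{(1/m) sum Z_j^2 - ((1/m) sum Z_j)^2} for m i.i.d. Bernoulli(p) covariates.

    The number of exposed subjects n1 in the set is Binomial(m, p), and for a
    binary covariate the bracketed quantity equals n1/m - (n1/m)^2.
    """
    total = 0.0
    for n1 in range(m + 1):
        prob = comb(m, n1) * p ** n1 * (1.0 - p) ** (m - n1)
        total += prob * (n1 / m - (n1 / m) ** 2)
    return total


if __name__ == "__main__":
    p = 0.3  # hypothetical exposure prevalence
    for m in (2, 3, 5, 10, 20):
        ratio = expected_within_set_variance(m, p) / (p * (1.0 - p))
        print(m, ratio, are_nested_vs_full(m))
```

The ratio of the expected within-set variance to the population variance $p(1 - p)$ reproduces $(m - 1)/m$ for every $m$, whatever the prevalence $p$.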
Our results show that the asymptotic relative efficiency remains the same if the omitted covariate is independent of the modelled covariate and under the null hypothesis. Our proof also shows that the inverse of the information provides an unbiased estimate of the asymptotic variance of $\hat\beta$ for both full cohort and nested case-control studies under the null hypothesis. The size of the score test in this case is nominal for both full cohort and nested case-control designs.

We numerically calculate the conjectured score variance and information when bias exists for $\hat\beta$. This happens when $Z_1$ and $Z_2$ are not independent, or under the alternative hypothesis. Table 5.6 shows representative results. Generally, the information is very close to the score variance. The bigger the effect from $Z_2$ (larger $\alpha_2$), the larger the difference. For rare exposure ($\pi_{00} = 0.72$, $\pi_{01} = 0.18$, $\pi_{10} = 0.08$, $\pi_{11} = 0.02$, or $\pi_{00} = 0.706$, $\pi_{01} = 0.194$, $\pi_{10} = 0.094$, $\pi_{11} = 0.006$), $\Sigma$ and $I$ are almost identical for both full cohort and nested case-control designs although relatively large bias exists for $\hat\beta$.

5.6 Both $Z_1$ and $Z_2$ are binary with interaction

We consider a special case where $Z_1$ and $Z_2$ are binary with true model

$$Y_i(t)\lambda_0(t)e^{\alpha_1 Z_{1i}(t) + \alpha_2 Z_{2i}(t) + \alpha_3 Z_{1i}(t)Z_{2i}(t)}.$$

The working model is $Y_i(t)\lambda_1(t)e^{\beta Z_{1i}(t)}$. We first assume that the joint exposure distribution of $Z_1$ and $Z_2$ is constant over time (which is approximately true for a rare disease). Using the same notation and derivation method as in Section 5.1, it can be shown that the analytical solution to $\beta^*_{full}$ is

$$\beta^*_{full} = \alpha_1 + \log\left(\frac{(\pi_{10} + \pi_{11}e^{\alpha_2 + \alpha_3})(\pi_{00} + \pi_{01})}{(\pi_{00} + \pi_{01}e^{\alpha_2})(\pi_{10} + \pi_{11})}\right).$$

For $m = 2$, we analytically solve the binomial equation for $\beta^*_{nest}$. The solution is exactly the same as $\beta^*_{full}$. For $m > 2$, numerical solutions from our FORTRAN program indicate that the same estimates will be provided by nested case-control studies with any $m$. That is, there is no extra bias because of sampling if the covariate (and, of course, the interaction term) is omitted.

We next consider an idealized intervention trial using the strategy in Section 5.3.
An IMSL subroutine DQDAGI is used to calculate the one-dimensional integration with respect to time $t$ in our FORTRAN program to obtain numerical solutions of $\beta^*_{full}$ and $\beta^*_{nest}$.

An application of the above results to genetic epidemiologic studies is provided by Table 5.7. Assume that an environmental factor confers additional risk of disease only in a genetically susceptible subgroup of the population. Let $Z_1$ represent the environmental factor and $Z_2$ represent the genetic susceptibility. In the interaction model, this situation corresponds to $\alpha_1 = 0$, $\alpha_2 = 0$, $\alpha_3 > 0$. We evaluate the situation where the relative risk for the environmental factor in genetically susceptible subjects is 100 ($\alpha_3 = 4.605$). Further assume that the two factors are independent. Model misspecification exists when the genetic factor is not known. Table 5.7 shows the bias of $\beta^*$ in the extreme cases (rare exposures and very large $\alpha_3$). In the situation with little censoring (constant distribution of the $Z$'s), we would conclude that the environmental factor is a risk factor itself ($\beta^* = 1.092$). With an idealized intervention trial, $\beta^* = 0.02$. One notices the large change of $\beta^*$ because of the changes of the distributions of $Z_1, Z_2$. Practically, one might have an estimate of $\beta^*$ between 0.02 and 1.092. In any case, nested case-control designs provide the same results as full cohort designs.

Table 5.1: Bias of nested case-control vs.
full cohort with $Z_2$ omitted for idealized intervention trial^a

$\alpha_1$  $\alpha_2$  $\beta^*_{full}$   m    $\beta^*_{nest}$ (OR = 1^b)   Difference^c   Percent^d
0.0   0.5   0.0000     2    0.0000    0.0000    0.00
                       3    0.0000    0.0000    0.00
                       5    0.0000    0.0000    0.00
                      10    0.0000    0.0000    0.00
                      20    0.0000    0.0000    0.00
      1.0   0.0000     2    0.0000    0.0000    0.00
                       3    0.0000    0.0000    0.00
                       5    0.0000    0.0000    0.00
                      10    0.0000    0.0000    0.00
                      20    0.0000    0.0000    0.00
      2.0   0.0000     2    0.0000    0.0000    0.00
                       3    0.0000    0.0000    0.00
                       5    0.0000    0.0000    0.00
                      10    0.0000    0.0000    0.00
                      20    0.0000    0.0000    0.00
0.5   0.5   0.4746     2    0.4754    0.0008    0.17
                       3    0.4752    0.0006    0.14
                       5    0.4750    0.0004    0.08
                      10    0.4748    0.0002    0.04
                      20    0.4747    0.0001    0.02
      1.0   0.4253     2    0.4262    0.0009    0.21
                       3    0.4259    0.0006    0.14
                       5    0.4257    0.0004    0.09
                      10    0.4255    0.0002    0.05
                      20    0.4254    0.0001    0.02
      2.0   0.3475     2    0.3466   -0.0009   -0.26
                       3    0.3469   -0.0006   -0.17
                       5    0.3471   -0.0004   -0.12
                      10    0.3473   -0.0002   -0.06
                      20    0.3474   -0.0001   -0.04

^a $Z_1$ and $Z_2$ are dichotomous variables with $\pi_{ij}(0) = p(Z_1(0) = i, Z_2(0) = j)$, $i, j = 0, 1$; baseline hazard $\lambda_0(t) = 1$.
^b $\pi_{00}(0) = 0.25$, $\pi_{01}(0) = 0.25$, $\pi_{10}(0) = 0.25$, $\pi_{11}(0) = 0.25$. $OR = \pi_{00}(0)\pi_{11}(0)/(\pi_{01}(0)\pi_{10}(0))$.
^c $\beta^*_{nest} - \beta^*_{full}$.
^d $(\beta^*_{nest} - \beta^*_{full})/\beta^*_{full} \times 100$.

Table 5.1: Bias of nested case-control vs.
full cohort with $Z_2$ omitted for idealized intervention trial^a (continued)

$\alpha_1$  $\alpha_2$  $\beta^*_{full}$   m    $\beta^*_{nest}$ (OR = 1^b)   Difference^c   Percent^d
1.0   0.5   0.9526     2    0.9573    0.0047    0.49
                       3    0.9561    0.0035    0.37
                       5    0.9549    0.0023    0.24
                      10    0.9538    0.0012    0.13
                      20    0.9532    0.0006    0.06
      1.0   0.8529     2    0.8604    0.0075    0.88
                       3    0.8583    0.0054    0.63
                       5    0.8562    0.0033    0.39
                      10    0.8546    0.0017    0.20
                      20    0.8537    0.0008    0.09
      2.0   0.6826     2    0.6809   -0.0017   -0.25
                       3    0.6810   -0.0016   -0.23
                       5    0.6814   -0.0012   -0.18
                      10    0.6819   -0.0007   -0.10
                      20    0.6822   -0.0004   -0.06
2.0   0.5   1.9188     2    1.9384    0.0196    1.02
                       3    1.9352    0.0164    0.85
                       5    1.9309    0.0121    0.63
                      10    1.9259    0.0071    0.37
                      20    1.9225    0.0037    0.19
      1.0   1.7321     2    1.7764    0.0443    2.56
                       3    1.7677    0.0356    2.06
                       5    1.7568    0.0247    1.43
                      10    1.7453    0.0132    0.76
                      20    1.7387    0.0066    0.38
      2.0   1.3380     2    1.3649    0.0269    2.01
                       3    1.3562    0.0182    1.36
                       5    1.3477    0.0097    0.72
                      10    1.3416    0.0036    0.27
                      20    1.3393    0.0013    0.10

^a $Z_1$ and $Z_2$ are dichotomous variables with $\pi_{ij}(0) = p(Z_1(0) = i, Z_2(0) = j)$, $i, j = 0, 1$; baseline hazard $\lambda_0(t) = 1$.
^b $\pi_{00}(0) = 0.25$, $\pi_{01}(0) = 0.25$, $\pi_{10}(0) = 0.25$, $\pi_{11}(0) = 0.25$. $OR = \pi_{00}(0)\pi_{11}(0)/(\pi_{01}(0)\pi_{10}(0))$.
^c $\beta^*_{nest} - \beta^*_{full}$.
^d $(\beta^*_{nest} - \beta^*_{full})/\beta^*_{full} \times 100$.

Table 5.2: Bias of nested case-control vs.
full cohort with $Z_2$ omitted for idealized intervention trial^a

$\alpha_1$  $\alpha_2$  $\beta^*_{full}$   m    $\beta^*_{nest}$ (OR = 1^b)   Difference^c   Percent^d
0.0   0.5  -0.0001     2    0.0000    0.0001    0.00
                       3    0.0000    0.0001    0.00
                       5   -0.0001    0.0000    0.00
                      10   -0.0001    0.0000    0.00
                      20   -0.0001    0.0000    0.00
      1.0  -0.0001     2    0.0000    0.0001    0.00
                       3    0.0000    0.0001    0.00
                       5   -0.0001    0.0000    0.00
                      10   -0.0001    0.0000    0.00
                      20   -0.0001    0.0000    0.00
      2.0  -0.0001     2    0.0000    0.0001    0.00
                       3    0.0000    0.0001    0.00
                       5   -0.0001    0.0000    0.00
                      10   -0.0001    0.0000    0.00
                      20   -0.0001    0.0000    0.00
0.5   0.5   0.4873     2    0.4877    0.0004    0.08
                       3    0.4874    0.0001    0.02
                       5    0.4873    0.0000    0.00
                      10    0.4874    0.0001    0.02
                      20    0.4873    0.0000    0.00
      1.0   0.4630     2    0.4631    0.0001    0.02
                       3    0.4630    0.0000    0.00
                       5    0.4631    0.0001    0.02
                      10    0.4630    0.0000    0.00
                      20    0.4629   -0.0001   -0.02
      2.0   0.4223     2    0.4220   -0.0003   -0.07
                       3    0.4220   -0.0003   -0.07
                       5    0.4222   -0.0001   -0.02
                      10    0.4223    0.0000    0.00
                      20    0.4223    0.0000    0.00

^a $Z_1$ and $Z_2$ are dichotomous variables with $\pi_{ij}(0) = p(Z_1(0) = i, Z_2(0) = j)$, $i, j = 0, 1$; baseline hazard $\lambda_0(t) = 1$.
^b $\pi_{00}(0) = 0.72$, $\pi_{01}(0) = 0.18$, $\pi_{10}(0) = 0.08$, $\pi_{11}(0) = 0.02$. $OR = \pi_{00}(0)\pi_{11}(0)/(\pi_{01}(0)\pi_{10}(0))$.
^c $\beta^*_{nest} - \beta^*_{full}$.
^d $(\beta^*_{nest} - \beta^*_{full})/\beta^*_{full} \times 100$.

Table 5.2: Bias of nested case-control vs.
full cohort with $Z_2$ omitted for idealized intervention trial^a (continued)

$\alpha_1$  $\alpha_2$  $\beta^*_{full}$   m    $\beta^*_{nest}$ (OR = 1^b)   Difference^c   Percent^d
1.0   0.5   0.9772     2    0.9779    0.0007    0.07
                       3    0.9777    0.0005    0.05
                       5    0.9776    0.0004    0.04
                      10    0.9774    0.0002    0.02
                      20    0.9773    0.0001    0.01
      1.0   0.9277     2    0.9288    0.0011    0.12
                       3    0.9283    0.0006    0.06
                       5    0.9282    0.0005    0.05
                      10    0.9279    0.0002    0.02
                      20    0.9278    0.0001    0.01
      2.0   0.8306     2    0.8290   -0.0016   -0.19
                       3    0.8292   -0.0014   -0.17
                       5    0.8296   -0.0010   -0.12
                      10    0.8298   -0.0008   -0.10
                      20    0.8302   -0.0004   -0.05
2.0   0.5   1.9634     2    1.9670    0.0036    0.18
                       3    1.9665    0.0031    0.16
                       5    1.9660    0.0026    0.13
                      10    1.9652    0.0018    0.09
                      20    1.9646    0.0012    0.06
      1.0   1.8713     2    1.8785    0.0072    0.38
                       3    1.8776    0.0063    0.34
                       5    1.8763    0.0050    0.27
                      10    1.8749    0.0036    0.19
                      20    1.8735    0.0022    0.12
      2.0   1.6312     2    1.6314    0.0002    0.01
                       3    1.6311   -0.0001   -0.01
                       5    1.6309   -0.0003   -0.02
                      10    1.6307   -0.0005   -0.03
                      20    1.6307   -0.0005   -0.03

^a $Z_1$ and $Z_2$ are dichotomous variables with $\pi_{ij}(0) = p(Z_1(0) = i, Z_2(0) = j)$, $i, j = 0, 1$; baseline hazard $\lambda_0(t) = 1$.
^b $\pi_{00}(0) = 0.72$, $\pi_{01}(0) = 0.18$, $\pi_{10}(0) = 0.08$, $\pi_{11}(0) = 0.02$. $OR = \pi_{00}(0)\pi_{11}(0)/(\pi_{01}(0)\pi_{10}(0))$.
^c $\beta^*_{nest} - \beta^*_{full}$.
^d $(\beta^*_{nest} - \beta^*_{full})/\beta^*_{full} \times 100$.

Table 5.3: Bias of nested case-control vs.
full cohort with $Z_2$ omitted for idealized intervention trial^a

$\alpha_1$  $\alpha_2$  $\beta^*_{full}$   m    $\beta^*_{nest}$ (OR = 0.23^b)   Difference^c   Percent^d
0.0   0.5  -0.0635     2   -0.0634    0.0001   -0.16
                       3   -0.0634    0.0001   -0.16
                       5   -0.0635    0.0000    0.00
                      10   -0.0635    0.0000    0.00
                      20   -0.0635    0.0000    0.00
      1.0  -0.1028     2   -0.1027    0.0001   -0.10
                       3   -0.1027    0.0001   -0.10
                       5   -0.1027    0.0001   -0.10
                      10   -0.1028    0.0000    0.00
                      20   -0.1028    0.0000    0.00
      2.0  -0.1408     2   -0.1409   -0.0001    0.07
                       3   -0.1408    0.0000    0.00
                       5   -0.1408    0.0000    0.00
                      10   -0.1409   -0.0001    0.07
                      20   -0.1409   -0.0001    0.07
0.5   0.5   0.4236     2    0.4235   -0.0001   -0.02
                       3    0.4234   -0.0002   -0.05
                       5    0.4236    0.0000    0.00
                      10    0.4235   -0.0001   -0.02
                      20    0.4236    0.0000    0.00
      1.0   0.3595     2    0.3591   -0.0004   -0.11
                       3    0.3590   -0.0005   -0.14
                       5    0.3593   -0.0002   -0.06
                      10    0.3593   -0.0002   -0.06
                      20    0.3594   -0.0001   -0.03
      2.0   0.2804     2    0.2795   -0.0009   -0.32
                       3    0.2798   -0.0006   -0.21
                       5    0.2799   -0.0005   -0.18
                      10    0.2801   -0.0003   -0.11
                      20    0.2802   -0.0002   -0.07

^a $Z_1$ and $Z_2$ are dichotomous variables with $\pi_{ij}(0) = p(Z_1(0) = i, Z_2(0) = j)$, $i, j = 0, 1$; baseline hazard $\lambda_0(t) = 1$.
^b $\pi_{00}(0) = 0.706$, $\pi_{01}(0) = 0.194$, $\pi_{10}(0) = 0.094$, $\pi_{11}(0) = 0.006$. $OR = \pi_{00}(0)\pi_{11}(0)/(\pi_{01}(0)\pi_{10}(0))$.
^c $\beta^*_{nest} - \beta^*_{full}$.
^d $(\beta^*_{nest} - \beta^*_{full})/\beta^*_{full} \times 100$.

Table 5.3: Bias of nested case-control vs.
full cohort, w i t h Z 2 o m i t t e d for id e a li z e d in t e r v e n t io n trial" ( c o n t in u e d ) «1 «2 ll* m OR Pneat = 0.23* Difference0 Percent'* 1.0 0.5 0.0133 2 0.0134 0.0001 0.01 3 0.9131 -0.0002 -0.02 5 0.9132 -0.0001 -0.01 10 0.9132 -0.0001 -0.01 20 0.9132 -0.0001 -0.01 1.0 0.8231 ■ ) 0.8219 -0.0012 -0.15 3 0.S221 -0.0010 -0.12 5 0.8224 -0.0007 -0.09 10 0.8226 -0.0005 -0.06 20 0.8229 -0.0002 -0.02 2.0 0.6857 2 0.6813 -0.0044 -0.64 :{' 0.6822 -0.0035 -0.51 5 0.6832 -0.0025 -0.36 10 0.6812 -0.0015 -0.22 20 0.6849 -0.0008 -0.12 2.0 0.5 1.0001 2 1.9015 0.0011 0.06 3 1.9011 0.0007 0.04 5 1.9008 0.0004 0.02 10 1.9007 0.0003 0.02 20 1.9006 0.0002 0.01 1.0 1.7682 2 1.7674 -0.0008 -0.05 3 1.7673 -0.0009 -0.05 5 1.7673 -0.0009 -0.05 10 1.7676 -0.0006 -0.03 20 1.7678 -0.0004 -0.02 2.0 1.1827 2 1.4706 -0.0121 -0.82 3 1.4720 -0.0107 -0.72 5 1.4739 -0.0088 -0.59 10 1.1767 -0.0060 -0.40 20 1.1792 -0.0035 -0.24 1 1 Z\ and Z -< arc dirliolomous variables with <r,j(0) = p(Zj(0) = i.Z-j(U) = j).ij = 0, 1 . baseline hazard XnV) = 1 . 4 JToo(O) = 0.706. JToi(O) = 0.104, ff,(,(0) = 0.094, jrM (U) = 0.006. OR= Jroo(0)ffn(0)/TodO)n-|„(0). C /I’ — * IIH ) f t Pnest Pjulf 7TT * luu- 62 T a b le 5.4: B ia s o f n e s t e d r a s e -c o n tr o l vs. full c o h o r t w ith Z 2 o m i t t e d for id e a liz e d in t e r v e n t io n trial" « 1 o 2 fi'futl m OR ■ f * ■ nest = 5.06 Differencec Percent'* 0 . 0 0.5 0.1694 ■ ) 0.1694 0 . 0 0 0 0 0 . 0 0 2 0.1094 0 . 0 0 0 0 0 . 0 0 5 0.1094 0 . 0 0 0 0 0 . 0 0 1 0 0.1094 0 . 0 0 0 0 0 . 0 0 2 0 0.1094 0 . 0 0 0 0 0 . 0 0 1 .0 0 . 2 1 1 2 2 0 . 2 1 1 0 0.0004 0.13 2 0.2115 0.0003 0 . 1 0 5 0.2114 0 . 0 0 0 2 0.06 1 0 0 . 2 1 1 2 0 . 0 0 0 1 0.03 2 0 0 . 2 1 1 2 0 . 0 0 0 0 0 . 0 0 2 . 0 0.-1752 2 0.4799 0.0046 0.97 2 0.-1785 0 . 0 0 2 2 0.67 5 0.1772 0 . 0 0 2 0 0.42 1 0 0.1702 0 . 0 0 1 0 0 . 2 1 2 0 0.4758 0.0005 0 . 1 1 0.5 0.5 0 .0 - 1 0 0 2 0.0477 0 . 0 0 1 1 0.17 2 0.0474 0.0008 0 . 1 2 5 0.0471 0.0005 0.08 1 0 0.0408 0 . 
0 0 0 2 0.03 2 0 0.0407 0 . 0 0 0 1 0 . 0 2 1 .0 0.7272 2 0.7428 0.0006 0.90 2 0.7420 0.0048 0.65 5 0.7402 0 . 0 0 2 1 0.42 1 0 0.7288 0.0016 0 . 2 2 2 0 0.7280 0.0008 0 . 1 1 2 . 0 0.8092 2 0.8270 0.0182 2.26 2 0.8224 0.0131 1.62 5 0.8174 0.0081 1 . 0 0 1 0 0.8124 0.0041 0.51 2 0 0.8112 0 . 0 0 2 0 0.25 “ Z\ and Zn are dicliot.omons variables wi11 • 7r,j(()) = f/(Zi(0) — i, Z-< (0 ) = j), ij = 0, 1 , baseline hazard Ao(/) = I. 6 7T on (0) = 0.20.jr„i(0) = 0.20. Jr,n(0) = 0.10. rr,, (0) = 0.50. OR= too(0)t, i(0)/thi(0)T|U (0). ' < c „ T a b le 5.4: B ia s o f n e s te d r a s e - c o n t m l v s . full c o h o r t w ith Z-i o m i t t e d for id e a liz e d in t e r v e n t io n trial'1 ( c o n t in u e d ) « 1 a 2 fi'fult in OR = 5.0* Difference0 Percent'* 1 . 0 0.5 1.1265 2 1.1320 0.0055 0.49 3 1.1307 0.0042 0.37 5 1.1294 0.0029 0.26 1 0 1.1280 0.0015 0.13 2 0 1.1273 0.0008 0.07 1 . 0 1.1661 . ■ > 1.1891 0.0227 1 .95 3 1.1839 0.0175 1.50 5 1.1780 0.0116 0.99 1 0 1.1725 0.0061 0.52 2 0 1.1695 0.0031 0.27 2 . 0 1.1:180 2 1.1815 0.0435 3.82 3 1.1699 0.0319 2.80 5 1.1580 0 . 0 2 0 0 1.76 1 0 1.1 180 0 . 0 1 0 0 0 . 8 8 2 0 1.1429 0.0019 0.43 2 . 0 0.5 2.0048 2 2.1157 0.0209 1 . 0 0 3 2.1126 0.0178 0.85 5 2.1083 0.0135 0.64 1 0 2.1030 0.0082 0.39 2 0 2.0993 0.0045 0 . 2 1 1 . 0 2.0420 2 2.1240 0.0820 4.02 :l 2 . 1 1 1 2 0.0692 3.39 5 2.0932 0.0512 2.51 III 2.0720 0.0300 1.47 2 0 2.0579 0.0159 0.78 2 . 0 1.7775 2 1.9292 0.1517 8.53 3 1.8963 0.1188 6 . 6 8 5 1.8562 0.0787 4.43 1 0 1.8170 0.0395 2 . 2 2 2 0 1.7964 0.0189 1.06 ° Z\ and Zn are dicliotomons variables with 70,(0) = p(Zi(0) = /, Z->(0) = j), ij = 0,1, baseline hazard A u(/) = 1 . * ;ron(0) = 0.20, w«i(0) = 0.2U,jr,„(0) = 0.10, jrM (0) = 0.50. O R = tfoolOlJrufOl/TudOlTndO). e K»,-H}uu- ' ' % ^ * I 0 0 . 64 T a b l e 5.5: B ia s o f n e s te d r a s e -c o n tr o l vs. full c o h o r t w ith Z-i o m i t t e d for an id e a liz e d in t e r v e n t io n trial" «t o 2 ft'full $ mat Difference0 Percent'* 0 . 0 0 . 
0           0.0    0.0000   0.0000   0.0000    0.00
            0.5    0.0000   0.0000   0.0000    0.00
            1.0    0.0000   0.0000   0.0000    0.00
     0.5    0.0    0.5000   0.5001   0.0001    0.02
            0.5    0.4162   0.4277   0.0115    2.76
            1.0    0.3237   0.3282   0.0045    1.39
     1.0    0.0    1.0000   1.0006   0.0006    0.06
            0.5    0.8364   0.8938   0.0574    6.86
            1.0    0.5607   0.7112   0.1505   26.84
0.5  0.0    0.0    0.0000   0.0000   0.0000    0.00
            0.5    0.2193   0.2219   0.0026    1.19
            1.0    0.3509   0.3791   0.0282    8.04
     0.5    0.0    0.5000   0.5000   0.0000    0.00
            0.5    0.6609   0.7028   0.0419    6.34
            1.0    0.7040   0.8560   0.1520   21.59
     1.0    0.0    1.0000   1.0002   0.0002    0.02
            0.5    1.1551   1.4376   0.2825   24.46
            1.0    1.1049   1.3683   0.2634   23.84
a Both $Z_1$ and $Z_2$ have standard normal distributions at time 0. $\lambda_0(t) = 1$. $\beta^*_{nest}$ is for $m = 2$.
b $\rho$ is the correlation coefficient between $Z_1$ and $Z_2$.
c $\beta^*_{nest} - \beta^*_{full}$.
d $(\beta^*_{nest} - \beta^*_{full})/\beta^*_{full} \times 100$.

Table 5.6: Asymptotic variance for nested case-control and full cohort with $Z_2$ omitted for an idealized intervention trial^a

distribution          $\alpha_1$  $\alpha_2$  design    $\beta^*$   $\Sigma^*$   $I^*$
$\pi_{00}(0)$=0.25    0.5         0.5         full      0.4746      0.2373       0.2381
$\pi_{01}(0)$=0.25                            m = 2     0.4754      0.1130       0.1130
$\pi_{10}(0)$=0.25                            m = 6     0.4750      0.1956       0.1960
$\pi_{11}(0)$=0.25                            m = 11    0.4748      0.2146       0.2152
(OR=1.0)              2.0         2.0         full      1.3380      0.1794       0.1834
                                              m = 2     1.3650      0.0672       0.0672
                                              m = 6     1.3456      0.1383       0.1415
                                              m = 11    1.3412      0.1507       0.1611
$\pi_{00}(0)$=0.20    0.5         0.5         full      0.6466      0.2179       0.2187
$\pi_{01}(0)$=0.20                            m = 2     0.6477      0.1060       0.1060
$\pi_{10}(0)$=0.10                            m = 6     0.6470      0.1816       0.1820
$\pi_{11}(0)$=0.50                            m = 11    0.6468      0.1983       0.1989
(OR=5.0)              2.0         2.0         full      1.7775      0.1287       0.1516
                                              m = 2     1.9202      0.0451       0.0451
                                              m = 6     1.8438      0.0989       0.1064
                                              m = 11    1.8132      0.1125       0.1265
$\pi_{00}(0)$=0.72    0.5         0.5         full      0.4873      0.0885       0.0886
$\pi_{01}(0)$=0.18                            m = 2     0.4877      0.0352       0.0352
$\pi_{10}(0)$=0.08                            m = 6     0.4874      0.0681       0.0681
$\pi_{11}(0)$=0.02                            m = 11    0.4873      0.0770       0.0771
(OR=1.0)              2.0         2.0         full      1.6312      0.0773       0.0771
                                              m = 2     1.6314      0.0154       0.0154
                                              m = 6     1.6309      0.0437       0.0437
                                              m = 11    1.6308      0.0562       0.0563
$\pi_{00}(0)$=0.706   0.5         0.5         full      0.4236      0.0889       0.0888
$\pi_{01}(0)$=0.194                           m = 2     0.4235      0.0366       0.0366
$\pi_{10}(0)$=0.094                           m = 6     0.4235      0.0692       0.0692
$\pi_{11}(0)$=0.006                           m = 11    0.4235      0.0779       0.0778
(OR=0.23)             2.0         2.0         full      1.4827      0.0802       0.0779
                                              m = 2     1.4706      0.0176       0.0176
                                              m = 6     1.4748      0.0474       0.0471
                                              m = 11    1.4772      0.0598       0.0592
a $Z_1$ and $Z_2$ are dichotomous, $\lambda_0(t) = 1$.

Table 5.7: Bias of nested case-control vs. full cohort with $Z_2$ omitted for interaction model

Constant distribution of $Z_1$, $Z_2$
$\pi$                   $\alpha_1$  $\alpha_2$  $\alpha_3$  $\beta^*_{full}$  m    $\beta^*_{nest}$
$\pi_{00}$ = 0.833      0           0           4.605       1.092             2    1.092
$\pi_{01}$ = 0.017                                                            3    1.092
$\pi_{10}$ = 0.147                                                            5    1.092
$\pi_{11}$ = 0.003                                                            10   1.092
$p(Z_1 = 1) = 0.15$, $p(Z_2 = 1) = 0.02$, OR = 1.

Idealized intervention trial
$\pi(0)$                $\alpha_1$  $\alpha_2$  $\alpha_3$  $\beta^*_{full}$  m    $\beta^*_{nest}$
$\pi_{00}(0)$ = 0.833   0           0           4.605       0.0199            2    0.0200
$\pi_{01}(0)$ = 0.017                                                         3    0.0200
$\pi_{10}(0)$ = 0.147                                                         5    0.0200
$\pi_{11}(0)$ = 0.003                                                         10   0.0199
$p(Z_1(0) = 1) = 0.15$, $p(Z_2(0) = 1) = 0.02$, OR = 1.

6 Other types of model misspecification

In this section, three other types of model misspecification are considered: mismodelling of available measurements, a true model that is an accelerated failure time model, and measurement error.

6.1 Mismodelling available measurements

By mismodelling available measurements we mean that the exposure variables or covariates are included in the working model, but the functional forms of those covariates are misspecified. For example, the true functional form for exposure variable $Z$ is $Z^2$ and we model it as linear in $Z$, or the true relationship is loglinear and the modelled relationship is not loglinear. As one application, we consider the case where the true hazard is $\lambda_0(t)e^{\alpha Z^2(t)}$ and the proposed model is $\lambda_1(t)e^{\beta Z(t)}$. Other situations can be evaluated with the same technique. We have

$$h_{full}(\beta) = \int \lambda_0(t)\,p(t)\left\{E[Z(t)e^{\alpha Z^2(t)} \mid Y = 1] - \frac{E[Z(t)e^{\beta Z(t)} \mid Y = 1]}{E[e^{\beta Z(t)} \mid Y = 1]}\,E[e^{\alpha Z^2(t)} \mid Y = 1]\right\}dt.$$

If $Z(t)$ is a binary variable with values 0 and 1, then there is no bias in the working model since $Z^2(t) = Z(t)$. We calculate $\beta^*_{full}$ and $\beta^*_{nest}$ when $Z(t)$ is normally distributed.

First we assume that the distribution of $Z$ does not change over time (approximately true for a rare disease). Numerical calculation indicates that if $Z(t)$ is normally distributed with mean 0 and any variance, then $\beta^*_{full}$ and $\beta^*_{nest}$ are always 0 for any value of $\alpha$. In other words, if the true effect of a normal covariate with mean 0 is quadratic under proportional hazards, a linear functional form cannot estimate the true effect of that covariate at all, for both full cohort and nested case-control studies. However, if the mean of that normal variable is not zero, a linear form can estimate the true effect $\alpha$ to a certain degree. The bias relative to the true effect depends on the sizes of the mean and variance of that covariate. Table 6.1 presents the results with $Z$ distributed as $N(0.3, 0.1^2)$ and $N(0.5, 0.1^2)$. $\beta^*_{nest}$ was only calculated for $m = 2$. When $Z$ is $N(0.3, 0.1^2)$, the estimates from both full cohort and nested case-control designs are biased towards the null. However, the biases from full cohort and nested case-control designs are almost identical. When $Z$ is $N(0.5, 0.1^2)$, both full cohort and nested case-control designs provide almost unbiased estimates of the true effect $\alpha$.

For an idealized intervention trial, in which all subjects are followed until the event happens, Table 6.2 presents some representative results. The parameters in Table 6.2 are the same as in Table 6.1 except that $Z$ is normally distributed only at the beginning of the trial. Because subjects who fail leave the risk set, $Z$ is no longer normally distributed among those at risk at any time $t > 0$. One notices, however, that the results in Table 6.2 are very similar to Table 6.1.
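For the constant-distribution (rare disease) case, the full-cohort limit can also be written in closed form, which gives a quick check on the numerical results. The sketch below rests on assumptions that are not stated in the text: the expectations in $h_{full}(\beta)$ are taken not to depend on $t$, so that $h_{full}(\beta) = 0$ reduces to equating the mean of $Z$ under the density weighted by $e^{\alpha z^2}$ with the mean under the density weighted by $e^{\beta z}$, and $2\alpha\sigma^2 < 1$ so that the first weighted density is proper. Under those assumptions $\beta^*_{full} = 2\alpha\mu/(1 - 2\alpha\sigma^2)$ for $Z \sim N(\mu, \sigma^2)$. The function name is illustrative only.

```python
def beta_full_quadratic(alpha, mu, sd):
    """Closed-form full-cohort limit (an assumption-based sketch) when the true
    log-hazard is alpha * Z**2, the working model is linear in Z, and
    Z ~ N(mu, sd**2) among subjects at risk at every time."""
    var = sd ** 2
    if 2.0 * alpha * var >= 1.0:
        raise ValueError("need 2 * alpha * sd**2 < 1 for the weighted density to be proper")
    # Mean of Z under the density proportional to exp(alpha * z**2) * N(mu, var).
    weighted_mean = mu / (1.0 - 2.0 * alpha * var)
    # Under exp(beta * z) weighting the mean is mu + beta * var; equate and solve for beta.
    return (weighted_mean - mu) / var


if __name__ == "__main__":
    for mu in (0.3, 0.5):
        for alpha in (-2.0, -1.0, -0.5, 0.0, 0.5, 1.0, 2.0):
            value = beta_full_quadratic(alpha, mu, 0.1)
            print(f"N({mu}, 0.1^2)  alpha={alpha:5.1f}  beta*_full={value:8.4f}")
```

With $\mu = 0$ the function returns 0 for every $\alpha$, and with $\mu = 0.5$ it returns $\alpha/(1 - 2\alpha\sigma^2)$, which is close to $\alpha$ when $\sigma = 0.1$. Both agree with the statements above for the constant-distribution case.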
Also, if the mean of $Z$ is zero at the beginning of a trial (with any variance), the estimated effect of $Z$ is zero. In summary, for rare diseases or an idealized intervention trial, if $Z$ is normally distributed at the beginning of the trial and the true effect of $Z$ is proportional hazards with quadratic form, the estimates from both full cohort and nested case-control designs are biased if $Z$ is modelled in linear form. The actual bias depends on the distribution of $Z$. Nested case-control designs provide estimates close to those of the full cohort. Sampling does not introduce extra bias in this situation.

6.2 The true model is accelerated failure time model

Suppose that the true model is an accelerated failure time model with

$$\log T = \alpha_0 + \alpha^T Z + \sigma W,$$

where the coefficient vector $\alpha$ and $\sigma > 0$ are unknown parameters, and $W$ has density $f(w)$ and distribution function $F(w)$. Consider the special case where $W$ has the extreme value distribution for a minimum, $f(w) = \exp(w - e^w)$, $-\infty < w < \infty$. Then $T$ has a Weibull distribution with density function

$$f(t) = \lambda p(\lambda t)^{p-1} e^{-(\lambda t)^p}$$

and survival function

$$S(t) = e^{-(\lambda t)^p},$$

where $p = \sigma^{-1}$ and $\lambda = e^{-(\alpha_0 + \alpha^T Z)}$. The hazard function for the Weibull distribution is

$$\lambda(t) = \lambda p(\lambda t)^{p-1} = \sigma^{-1} t^{\sigma^{-1}-1} e^{-\sigma^{-1}\alpha_0} e^{-\sigma^{-1}\alpha^T Z}.$$

One can easily see that Weibull distributions belong to the proportional hazards family with $\lambda_0(t) = \sigma^{-1} t^{\sigma^{-1}-1} e^{-\sigma^{-1}\alpha_0}$ and relative risk $e^{-\sigma^{-1}\alpha^T Z}$. If, instead of modelling $T$ as a Weibull distribution, the proposed hazard function given $Z$ is taken to be $\lambda(t) = \lambda_1(t)e^{\beta^T Z}$, then it is easily seen from $h_{full}(\beta)$ and $h_{nest}(\beta)$ that $\beta^* = -\sigma^{-1}\alpha$ is the only solution. The absolute bias due to model misspecification for full cohort and nested case-control studies depends on the shape parameter $\sigma^{-1}$ of the Weibull distribution. Again, sampling does not introduce extra bias in this case, i.e., the nested case-control design reproduces the result of the corresponding full cohort study.
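The relation $\beta^* = -\sigma^{-1}\alpha$ can be checked numerically from the accelerated failure time model itself. The sketch below is only an illustration under assumptions that go beyond the text: a single scalar covariate $Z$ and a survival function $\exp(-e^w)$ for $W$, which corresponds to the extreme value density $f(w) = \exp(w - e^w)$. The function names are hypothetical. It evaluates the cumulative hazard $-\log S(t \mid z)$ implied by the model and confirms that the log ratio for $z = 1$ versus $z = 0$ equals $-\alpha/\sigma$ at every time, which is what proportional hazards with relative risk $e^{-\sigma^{-1}\alpha Z}$ requires.

```python
import math


def aft_cumulative_hazard(t, z, alpha0, alpha, sigma):
    """-log S(t | z) when log T = alpha0 + alpha * z + sigma * W and
    W has survival function exp(-exp(w)) (extreme value, minimum)."""
    w = (math.log(t) - alpha0 - alpha * z) / sigma
    return math.exp(w)


def log_relative_risk(t, alpha0, alpha, sigma):
    """Log ratio of cumulative hazards for z = 1 versus z = 0 at time t."""
    exposed = aft_cumulative_hazard(t, 1.0, alpha0, alpha, sigma)
    unexposed = aft_cumulative_hazard(t, 0.0, alpha0, alpha, sigma)
    return math.log(exposed / unexposed)


if __name__ == "__main__":
    # Illustrative parameter values; they are not taken from the study.
    for t in (0.1, 1.0, 10.0):
        print(t, log_relative_risk(t, alpha0=0.3, alpha=0.5, sigma=2.0))
```

The printed value does not change with $t$ and equals $-\alpha/\sigma$, so a working proportional hazards model with a linear covariate recovers $-\sigma^{-1}\alpha$ rather than $\alpha$.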
6.3 Measurement errors

Consider a classical measurement error model $Z^*(t) = Z(t) + \epsilon(t)$, where $Z(t)$ is the true exposure, $Z^*(t)$ is the measured exposure and $\epsilon(t)$ is the measurement error, which is normally distributed and independent of $Z(t)$. Assume the true model is

$$\lambda(t) = \lambda_0(t)e^{\alpha Z(t)} \qquad (6.1)$$

and the working model is $\lambda(t) = \lambda_1(t)e^{\beta Z^*(t)}$. Then

$$h_{full}(\beta) = \int \lambda_0(t)\,p(t)\left\{E[Z^*(t)e^{\alpha(Z^*(t) - \epsilon(t))} \mid Y = 1] - E[e^{\alpha(Z^*(t) - \epsilon(t))} \mid Y = 1]\,\frac{E[Z^*(t)e^{\beta Z^*(t)} \mid Y = 1]}{E[e^{\beta Z^*(t)} \mid Y = 1]}\right\}dt.$$

One can see that this type of measurement error can be considered as a special case of missing covariates in which the true model is $\lambda_0(t)e^{\alpha Z^*(t) - \alpha\epsilon(t)}$ and the proposed model is $\lambda_1(t)e^{\beta Z^*(t)}$. The technique used in the missing covariate section can be fully applied here. Specifically, if $Z^*(t)$ and $Z(t)$ are assumed to be bivariately normally distributed and the distributions do not change over time, then the exact bias can be calculated. Let $\sigma_Z$ and $\sigma_\epsilon$ be the standard deviations of $Z(t)$ and $\epsilon(t)$, respectively. Then

$$\beta^* = \alpha\,\frac{\sigma_Z^2}{\sigma_{Z^*}^2} = \alpha\rho^2,$$

where $\sigma_{Z^*} = \sqrt{\sigma_Z^2 + \sigma_\epsilon^2}$ and $\rho$ is the correlation coefficient between $Z^*(t)$ and $Z(t)$, which is the same result derived by Pepe et al. (1989) for the full cohort design. We conclude that nested case-control sampling has the same bias as the full cohort design when measurement error exists (assuming the classical measurement error model) and the true exposure and error distributions are constant over time.

Table 6.1: Bias of nested case-control vs. full cohort with $Z^2$ modelled as $Z$ when the distribution of $Z$ is constant over time^a

N(mean, SD^2)^b    $\alpha$   $\beta^*_{full}$   $\beta^*_{nest}$   Difference^c   Percent^d
N(0.3, 0.1^2)      -2.0       -1.1539            -1.2009            -0.0470         4.07
                   -1.0       -0.5882            -0.5997            -0.0115         1.96
                   -0.5       -0.2970            -0.2995            -0.0025         0.84
                    0.0        0.0000             0.0000             0.0000         0.00
                    0.5        0.3030             0.2992            -0.0038        -1.25
                    1.0        0.6122             0.5989            -0.0133        -2.17
                    2.0        1.2500             1.2005            -0.0495        -3.96
N(0.5, 0.1^2)      -2.0       -1.9231            -2.0000            -0.0769         4.00
                   -1.0       -0.9804            -1.0003            -0.0199         2.03
                   -0.5       -0.4951            -0.4995            -0.0044         0.89
                    0.0        0.0000             0.0000             0.0000         0.00
                    0.5        0.5051             0.4992            -0.0059        -1.17
                    1.0        1.0204             0.9997            -0.0207        -2.03
                    2.0        2.0833             2.0007            -0.0826        -3.96
a $Z$ is normally distributed. $\beta^*_{nest}$ is for $m = 2$.
b The distribution parameter for $Z$.
c $\beta^*_{nest} - \beta^*_{full}$.
d $(\beta^*_{nest} - \beta^*_{full})/\beta^*_{full} \times 100$.

Table 6.2: Bias of nested case-control vs. full cohort with $Z^2$ modelled as $Z$ for an idealized intervention trial^a

N(mean, SD^2)^b    $\alpha$   $\beta^*_{full}$   $\beta^*_{nest}$   Difference^c   Percent^d
N(0.3, 0.1^2)      -2.0       -1.1925            -1.2003            -0.0078         0.65
                   -1.0       -0.5985            -0.6003            -0.0018         0.30
                   -0.5       -0.2996            -0.3000            -0.0004         0.13
                    0.0        0.0000             0.0000             0.0000         0.00
                    0.5        0.3003             0.3000            -0.0003        -0.10
                    1.0        0.6004             0.6003            -0.0001        -0.02
                    2.0        1.2033             1.1997            -0.0006        -0.05
N(0.5, 0.1^2)      -2.0       -1.9871            -2.0004            -0.0166         0.83
                   -1.0       -0.9968            -1.0004            -0.0072         0.72
                   -0.5       -0.4992            -0.5001            -0.0009         0.18
                    0.0        0.0000             0.0000             0.0000         0.00
                    0.5        0.5005             0.5001            -0.0004        -0.08
                    1.0        1.0018             1.0004            -0.0014        -0.14
                    2.0        2.0056             2.0019            -0.0037        -0.18
a $Z(0)$ is normally distributed at time 0. $\lambda_0(t) = 1$. $\beta^*_{nest}$ is for $m = 2$.
b The distribution parameter for $Z$ at time 0.
c $\beta^*_{nest} - \beta^*_{full}$.
d $(\beta^*_{nest} - \beta^*_{full})/\beta^*_{full} \times 100$.

7 Simulation study

In the previous sections, we gave the theoretical formulae for the convergence of the maximum partial likelihood estimator and a conjecture for the asymptotic distribution of the maximum likelihood estimator for full cohort and nested case-control designs when general model misspecification exists. We also calculated the asymptotic results for several types of model misspecification. The general conclusions were that maximum partial likelihood estimates are biased for both full cohort and nested case-control designs when model misspecification exists. However, the biases of the full cohort and nested case-control designs are almost the same.
The inverse of the information provides a close estimate of the asymptotic variance of the maximum partial likelihood estimator when model misspecification exists.

To evaluate the performance of the nested case-control design compared with the full cohort design for limited sample sizes, and our conjecture about the asymptotic variance when model misspecification exists, a simulation study was conducted. Section 7.1 describes the design of the simulation study. Section 7.2 summarizes the study parameters. Simulation results about bias are presented in Section 7.3. Section 7.4 shows the results related to the score variance, information and variance of the maximum likelihood estimator of $\beta$.

7.1 Design of the simulation study

There were two parts in the simulation study. The first part was to generate data from an assumed true model. We only considered idealized intervention trials here in order to make a comparison with the asymptotic results. In other words, we assumed that all subjects entered the trial at the same time and were followed until an event, with no other censoring. The hazard rate was from the assumed true model. Two types of model misspecification were considered: (1) the true model was proportional hazards with two dichotomous or normally distributed covariates (independent or correlated), i.e. $\lambda(t) = \lambda_0(t)e^{\alpha_1 Z_1 + \alpha_2 Z_2}$, and the working model was $\lambda(t) = \lambda_1(t)e^{\beta Z_1}$; (2) the true model was $\lambda(t) = \lambda_0(t)e^{\alpha Z_1^2}$, where $Z_1$ was normally distributed, and the working model was $\lambda(t) = \lambda_1(t)e^{\beta Z_1}$. Other types of model misspecification can be easily evaluated using the same strategy.

To meet the first objective, after selecting a true model and the sample size of the cohort, the covariate values for each subject were generated according to the given covariate distributions at time 0. Then the inverse probability transformation (Newman and Odell, 1971) was used to generate the survival time for each cohort member according to the given true model.
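The data-generation step can be sketched as follows for the case of two binary covariates. This is an illustration under stated assumptions, not the program used in the study: the covariates are treated as fixed after time 0, the baseline hazard is constant and equal to 1, so the hazard of a subject is $e^{\alpha_1 z_1 + \alpha_2 z_2}$ and the inverse probability transformation of a uniform draw $u$ gives $T = -\log(u)/\text{hazard}$. All function names and the seed are hypothetical.

```python
import math
import random


def draw_binary_pair(pi, rng):
    """Draw (z1, z2) from the joint cell probabilities pi[(z1, z2)] at time 0."""
    u = rng.random()
    cumulative = 0.0
    for cell, prob in pi.items():
        cumulative += prob
        if u < cumulative:
            return cell
    return cell  # guard against rounding when the probabilities sum to just under 1


def survival_time(z1, z2, alpha1, alpha2, rng):
    """Inverse probability transformation for a constant hazard exp(a1*z1 + a2*z2)."""
    hazard = math.exp(alpha1 * z1 + alpha2 * z2)
    u = 1.0 - rng.random()  # uniform on (0, 1], so log(u) is always finite
    return -math.log(u) / hazard


def simulate_cohort(n, pi, alpha1, alpha2, seed=0):
    """Return a list of (failure time, z1, z2); every subject fails (no censoring)."""
    rng = random.Random(seed)
    cohort = []
    for _ in range(n):
        z1, z2 = draw_binary_pair(pi, rng)
        cohort.append((survival_time(z1, z2, alpha1, alpha2, rng), z1, z2))
    return cohort


if __name__ == "__main__":
    cells = {(0, 0): 0.20, (0, 1): 0.20, (1, 0): 0.10, (1, 1): 0.50}
    for row in simulate_cohort(5, cells, alpha1=0.5, alpha2=0.5, seed=1):
        print(row)
```

Repeating `simulate_cohort` with different seeds gives independent cohorts for one parameter set. Normally distributed covariates would only change the covariate draw, not the transformation of the uniform variate.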
One thousand independent cohorts were generated for each set of parameters.

The second part involved modelling and sampling. The working model was proportional hazards with a single linear covariate, i.e., $\lambda(t) = \lambda_1(t)e^{\beta Z_1}$. Cox's partial likelihood was used for the analysis of the full cohort data. For each failure, we randomly sampled a given number of controls without replacement from the remaining at-risk subjects. The partial likelihood (replacing the risk set with the sampled controls plus the failure) was used for the sampled data. The Newton-Raphson iterative method was employed to obtain the maximum partial likelihood estimates for both the full cohort and the nested case-control designs. Besides the maximum partial likelihood estimate of $\beta$, the variances of $\hat\beta$ and of the score function evaluated at the corresponding $\beta^*$ were calculated based on 1000 independent simulation runs. The averages of the information evaluated at $\beta^*$ and at $\hat\beta$ (observed information) were also computed.

7.2 Study parameters

We simulated data under the following true models and parameters. These will be compared to the asymptotic results given in previous sections.

1. The true model was $\lambda(t) = \lambda_0(t)e^{\alpha_1 Z_1 + \alpha_2 Z_2}$, where $Z_1$ and $Z_2$ were binary with distributions at time 0 given by:

$p(Z_1 = 1)$  $p(Z_2 = 1)$  OR     $\pi_{00}$  $\pi_{01}$  $\pi_{10}$  $\pi_{11}$
0.5           0.5           1.0    0.25        0.25        0.25        0.25
0.6           0.7           5.0    0.20        0.20        0.10        0.50
0.1           0.2           1.0    0.72        0.18        0.08        0.02
0.1           0.2           0.23   0.706       0.194       0.094       0.006

with all combinations of $\alpha_1 = 0.0, 0.5, 1.0, 2.0$ and $\alpha_2 = 0.0, 0.5, 1.0, 2.0$. Note that the first two rows were for common exposures and the last two rows were for rare exposures.

2. The true model was $\lambda(t) = \lambda_0(t)e^{\alpha_1 Z_1 + \alpha_2 Z_2}$, where $Z_1$ and $Z_2$ were standard normal at time 0 with correlation coefficient $\rho$ between $Z_1$ and $Z_2$ equal to 0.0 or 0.5, and all combinations of $\alpha_1 = 0.0, 0.5, 1.0$ and $\alpha_2 = 0.0, 0.5, 1.0$.

3. The true model was $\lambda(t) = \lambda_0(t)e^{\alpha Z_1^2}$, where $Z_1$ was normally distributed at time 0 as $N(0.3, 0.1^2)$ or $N(0.5, 0.1^2)$, with all the combinations of $\alpha = -2.0, -1.0, -0.5, 0.0, 0.5, 1.0, 2.0$.

Without loss of generality, $\lambda_0(t)$ was assumed to be 1 for all cases. The cohort sizes were 50, 100 and 500. In each cohort, for each failure, 1, 5, and 10 controls ($m = 2$, 6, and 11) were independently randomly selected from the at-risk subjects.

7.3 Results: the maximum partial likelihood estimator

Tables 7.1 through 7.5 show representative results comparing $\hat\beta$ from the simulation study with the corresponding asymptotic values ($\beta^*$). In each table, we calculated the difference between $\hat\beta_n$ (from the simulation study) and $\beta^*$ for cohort sizes $n = 50$, 100 and 500 when the full cohort design ($m = \infty$) or a nested case-control design ($m = 2, 6, 11$) was selected. Only the idealized intervention trial was considered.

Tables 7.1 and 7.2 give the results with true model $\lambda(t) = e^{\alpha_1 Z_1 + \alpha_2 Z_2}$, where $Z_1$ and $Z_2$ were both binary. The results where $Z_1$ and $Z_2$ were both normally distributed are given in Tables 7.3 and 7.4. Table 7.5 shows the results with true model $\lambda(t) = e^{\alpha Z^2}$, where $Z$ was normally distributed. Note that because of numerical calculation problems, $\beta^*$ was not calculated for the situation when $m > 2$ and the $Z$'s were normally distributed. We used the full cohort $\beta^*$ when $m > 2$ in our comparison. The rationale was that, based on our observation, the full cohort $\beta^*$ should provide a close estimate. We also provided the simulation results with no model misspecification ($\alpha_2 = 0$).

The simulation results indicated that with the increase of the cohort size $n$, the maximum partial likelihood estimates of $\beta$ from the simulation study approached the asymptotic value $\beta^*$. With a cohort size of 100, the differences between $\hat\beta$ and $\beta^*$ were almost negligible. Model misspecification did not notably influence the difference between $\hat\beta$ and $\beta^*$, since the differences were similar when correct models were specified.
Increasing the sample size of nested case-control designs tended to reduce the differences and made the differences more comparable to the full cohort. With rare exposure covariates (Table 7.2), the difference between $\hat\beta_n$ and $\beta^*$ tended to be larger than that with common exposure covariates, since few failure subjects were in the exposure categories. Increasing the cohort size reduced the differences. Note that in Table 7.2 for $m = 2$ and $\alpha_1 = 1.0, 2.0$, $\hat\beta_n - \beta^*$ was large both with model misspecification and without model misspecification ($\alpha_2 = 0$). Thus, it was the general failure of the asymptotic results to hold at this small sample size with few failure subjects. Also note that in Tables 7.3 and 7.4, with $\alpha_1 = 1.0$ and $\alpha_2 = 0.5, 1.0$, the simulated $\hat\beta_n$ were much smaller than the asymptotic $\beta^*$ when $m = 2$. Using the full cohort $\beta^*$ when $m = 6$ and 11 for nested case-control sampling gave differences $\hat\beta_n - \beta^*$ as consistent as those for the full cohort. This comparison confirmed our previous conjecture that the asymptotic $\beta^*$ for $m = 2$ was not accurately calculated because of the 4-dimensional integrations (Section 5.4).

Tables 7.6 through 7.10 provide representative results of the comparison of $\hat\beta$ between full cohort and nested case-control designs in the simulation study only. Tables 7.6 and 7.7 are the results with true model $\lambda(t) = e^{\alpha_1 Z_1 + \alpha_2 Z_2}$, where $Z_1$ and $Z_2$ were both binary. The results when $Z_1$ and $Z_2$ were both normally distributed are given in Tables 7.8 and 7.9. Table 7.10 shows the results with true model $\lambda(t) = e^{\alpha Z^2}$, where $Z$ was normally distributed. We also provided the simulation results with no model misspecification ($\alpha_2 = 0$).

The comparisons indicated that with a large cohort ($n = 500$ here), 1:1 matching in nested case-control sampling provided a close estimate to the full cohort design when a binary covariate was omitted. With a small cohort size, 5 to 10 controls per failure subject gave a close estimate to the full cohort design.
For rare exposure (Table 7.7) and a large effect from $Z_1$, 1:1 matching ($m = 2$) gave poor estimates compared with the full cohort design when the cohort size was small. Note that this was true even with the model correctly specified ($\alpha_2 = 0$). Thus, this was the general failure of the asymptotic theory with small sample sizes. Model misspecification does not influence the comparison between full cohort and nested case-control designs. Increasing the cohort size in this case will reduce the difference. When $Z_2$ was normally distributed and omitted, or when $Z^2$ was modelled as $Z$, the same conclusions hold.

Our simulation results confirmed our previous conclusions. We concluded that with model misspecification, such as omitting covariate $Z_2$ or modelling $Z^2$ as $Z$, the maximum partial likelihood estimates of $\beta$ from both full cohort and nested case-control designs were biased in general compared with the true effect $\alpha$. The nested case-control design provided a close estimate to the full cohort when model misspecification exists. In general, 1:m ($m = 1, 2, \ldots$) matching asymptotically provided a close estimate to the full cohort. With a small cohort, or a moderate cohort size with rare exposure variables, multiple controls per subject in sampling were necessary. The principles for selecting the number of controls per subject with no model misspecification applied.

7.4 Results: score variance, information and variance of $\hat\beta$

In this section, we provide a comparison of our simulated score variance, observed information, and variance of $\hat\beta$ with our conjectured asymptotic ones. We also provide a comparison of the variance of $\hat\beta$ with the inverse of the observed information (model based variance) from the simulation study only. The simulation study was only done when the true model is $\lambda(t) = e^{\alpha_1 Z_1 + \alpha_2 Z_2}$, where $Z_1$ and $Z_2$ were both binary, and the working model was $\lambda(t) = \lambda_1(t)e^{\beta Z_1}$. We only considered idealized intervention trials.
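The quantities compared in this section (the score, the observed information and $\hat\beta$) can be obtained from one simulated cohort along the following lines. This is a minimal sketch under assumptions, not the program used in the study: every subject fails, there are no tied failure times, the working model has the single covariate $Z_1$, and $m$ counts the failure plus its $m - 1$ sampled controls, with `m=None` standing for the full cohort. All names are hypothetical.

```python
import math
import random


def risk_sets(times, z, m, rng):
    """For each failure return (z of the case, [z of the case and of its comparison subjects]).
    m=None keeps the whole remaining risk set (full cohort); otherwise m - 1 controls
    are sampled without replacement from the subjects still at risk."""
    order = sorted(range(len(times)), key=lambda i: times[i])
    sets = []
    for pos, case in enumerate(order):
        at_risk = order[pos + 1:]
        if not at_risk:
            break  # the last failure has nobody to be compared with
        if m is None:
            chosen = at_risk
        else:
            chosen = rng.sample(at_risk, min(m - 1, len(at_risk)))
        sets.append((z[case], [z[case]] + [z[j] for j in chosen]))
    return sets


def score_and_information(beta, sets):
    """Partial likelihood score U(beta) and observed information I(beta)."""
    score, info = 0.0, 0.0
    for z_case, zs in sets:
        weights = [math.exp(beta * v) for v in zs]
        s0 = sum(weights)
        mean = sum(w * v for w, v in zip(weights, zs)) / s0
        second = sum(w * v * v for w, v in zip(weights, zs)) / s0
        score += z_case - mean
        info += second - mean ** 2
    return score, info


def newton_raphson(sets, beta=0.0, tol=1e-8, max_iter=50):
    """Maximum partial likelihood estimate by Newton-Raphson iteration."""
    for _ in range(max_iter):
        score, info = score_and_information(beta, sets)
        if info <= 0.0:
            raise ValueError("no information about beta in these risk sets")
        step = score / info
        beta += step
        if abs(step) < tol:
            return beta
    raise RuntimeError("Newton-Raphson did not converge")


if __name__ == "__main__":
    # Small hand-made data set, for illustration only.
    times = [1, 2, 3, 4, 5, 6, 7, 8]
    z1 = [1, 1, 0, 1, 0, 0, 1, 0]
    full = risk_sets(times, z1, None, random.Random(1))
    beta_hat = newton_raphson(full)
    print(beta_hat, score_and_information(beta_hat, full))
```

Evaluating `score_and_information` at $\beta^*$ in each simulated cohort and taking the variance of the scores over the runs gives a simulated score variance, and averaging the information over the runs gives an average observed information. With very few failures, or with $m = 2$, the iteration can fail because some data sets carry no finite maximum, which is one practical face of the small-sample problems noted above.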
Tables 7.11 and 7.12 show representative results of the comparison between the simulated score variance and observed information and the conjectured asymptotic results. $\Sigma^*$ and $I^*$ were the conjectured asymptotic score variance and information. $Var_n(U(\beta^*))$ was the simulated score variance, calculated by substituting $\beta^*$ into the score function $U(\beta)$ in each simulation run and calculating the variance over the 1000 simulation runs. $I_n$ was the average observed model based information over the 1000 simulation runs. All the variances and informations were given per subject.

The comparisons indicated that the conjectured score variance and information provided good estimates of the simulated ones for different cohort sizes, whether the full cohort or nested case-control sampling designs of different sizes were selected. Increasing the cohort size tended to reduce the difference.

Tables 7.13 and 7.14 compare the variances of $\hat\beta$. $Var(\hat\beta)$ was calculated using our conjectured formula $\Sigma^*/(I^*)^2$. Since $\Sigma^*$ was very close to $I^*$ in general (as indicated in Section 5.5) and the observed information $I_n$ was a good estimator for $I^*$, we also compared the inverse of the observed information with the variance of $\hat\beta$ from the simulation study (third column, $I_n^{-1} - Var_n(\hat\beta)$) to see the influence of model misspecification on the conventional method of variance estimation (just using the inverse of the observed information). All the variances presented were given per subject.

For both full cohort and nested case-control designs, the difference between $Var_n(\hat\beta)$ and $Var(\hat\beta)$ (column 2) became smaller with the increase of cohort size, as we anticipated. In general, model misspecification did not increase the difference at each cohort size compared to no model misspecification ($\alpha_2 = 0$) for both designs. 1:1 matching was not a statistically efficient design (although it is cost-efficient) because of the large variance of $\hat\beta$.
The difference between $Var_n(\hat\beta)$ and $Var(\hat\beta)$ when $m = 2$ was very large in some cases and was reduced with the increase of cohort size. This result held with no model misspecification.

The comparison between the inverse of the observed information and the variance of $\hat\beta$ (column 4) showed that with a cohort size of 500, the model based variance (inverse of the observed information) provided a very good approximation of the true variance when model misspecification existed and the full cohort design was selected. In nested case-control sampling, we have the same conclusions when 5 to 10 controls per subject were chosen. 1:1 matching was not a good choice in terms of using the model based variance to estimate the true variance with a limited number of failure subjects.

Bear in mind that the above results applied to situations with or without covariate omission. In other words, covariate omission did not significantly alter our conclusions regarding the performance of the nested case-control design compared with the full cohort design. With covariate omission, the model based variance estimator from both full cohort and nested case-control designs may be asymptotically biased; however, the bias was very small in our simulation studies.

Table 7.1: Comparison between simulated and asymptotic results of $\beta$ with $Z_2$ omitted for an idealized intervention trial^a (OR^b = 5.0)

                                         $\hat\beta_n - \beta^*$
$\alpha_1$  $\alpha_2$  m^c   $\beta^*$    n=50      n=100     n=500
0.0         0.0         ∞     0.0000       0.0045    0.0038   -0.0010
                        2     0.0000      -0.0103    0.0015   -0.0020
                        6     0.0000       0.0003    0.0031   -0.0022
                        11    0.0000       0.0027    0.0041   -0.0007
            0.5         ∞     0.1094       0.0009    0.0079    0.0019
                        2     0.1094       0.0000   -0.0022    0.0051
                        6     0.1094       0.0040    0.0107    0.0007
                        11    0.1091       0.0089    0.0104    0.0010
            1.0         ∞     0.3112       0.0083    0.0139   -0.0003
                        2     0.3116       0.0073    0.0045   -0.0032
                        6     0.3113       0.0007    0.0123   -0.0017
                        11    0.3113       0.0058    0.0121   -0.0010
            2.0         ∞     0.4753       0.0287    0.0181    0.0027
                        2     0.4799       0.0207    0.0190    0.0028
                        6     0.4769       0.0231    0.0142   -0.0002
                        11    0.4762       0.0271    0.0117    0.0004
0.5         0.0         ∞     0.5000       0.0252    0.0079    0.0020
                        2     0.5000       0.0034    0.0215    0.0021
                        6     0.5000       0.0200    0.0133    0.0012
                        11    0.5000       0.0255    0.0080    0.0009
            0.5         ∞     0.6466       0.0195   -0.0030   -0.0020
                        2     0.6477       0.0473    0.0045    0.0014
                        6     0.6470       0.0201   -0.0047    0.0000
                        11    0.6468       0.0171   -0.0052   -0.0025
            1.0         ∞     0.7372       0.0139    0.0099    0.0000
                        2     0.7438       0.0495    0.0321    0.0013
                        6     0.7398       0.0140    0.0109   -0.0017
                        11    0.7387       0.0130    0.0094   -0.0003
            2.0         ∞     0.8093       0.0000    0.0275    0.0013
                        2     0.8270       0.1122    0.0585    0.0049
                        6     0.8101       0.0010    0.0262   -0.0007
                        11    0.8130       0.0564    0.0169   -0.0003
a $Z_1$ and $Z_2$ are dichotomous variables with $\pi_{ij}(0) = p(Z_1(0) = i, Z_2(0) = j)$, $i, j = 0, 1$, baseline hazard $\lambda_0(t) = 1$.
b $\pi_{00}(0) = 0.20$, $\pi_{01}(0) = 0.20$, $\pi_{10}(0) = 0.10$, $\pi_{11}(0) = 0.50$. OR $= \pi_{00}(0)\pi_{11}(0)/\pi_{01}(0)\pi_{10}(0)$.
c $m = \infty$ refers to full cohort.

Table 7.1: Comparison between simulated and asymptotic results of $\beta$ with $Z_2$ omitted for an idealized intervention trial^a (OR^b = 5.0, continued)

                                         $\hat\beta_n - \beta^*$
$\alpha_1$  $\alpha_2$  m^c   $\beta^*$    n=50      n=100     n=500
1.0         0.0         ∞     1.0000       0.0586    0.0276    0.0049
                        2     1.0001       0.0851    0.0622    0.0051
                        6     1.0000       0.0599    0.0313    0.0047
                        11    1.0000       0.0595    0.0277    0.0058
            0.5         ∞     1.1265       0.0087    0.0107   -0.0026
                        2     1.1520       0.0345    0.0303    0.0037
                        6     1.1200       0.0100    0.0124   -0.0062
                        11    1.1270       0.0087    0.0106   -0.0033
            1.0         ∞     1.1664       0.0520    0.0320    0.0049
                        2     1.1891       0.1001    0.0513    0.0022
                        6     1.1763       0.0587    0.0298    0.0032
                        11    1.1720       0.0471    0.0285    0.0036
            2.0         ∞     1.1380       0.0651    0.0146    0.0010
                        2     1.1815       0.1296    0.0159    0.0060
                        6     1.1547       0.0611    0.0033    0.0015
                        11    1.1471       0.0565    0.0003   -0.0010
2.0         0.0         ∞     2.0001       0.0570    0.0396    0.0055
                        2     2.0001       0.0123    0.1468    0.0175
                        6     2.0001       0.0733    0.0573    0.0075
                        11    2.0001       0.0665    0.0427    0.0081
            0.5         ∞     2.0948       0.1036    0.0614    0.0062
                        2     2.1157       0.0179    0.1053    0.0296
                        6     2.1068       0.0860    0.0684    0.0119
                        11    2.1023       0.1006    0.0567    0.0078
            1.0         ∞     2.0420       0.1146    0.0600    0.0087
                        2     2.1240       0.0190    0.1466    0.0245
                        6     2.0871       0.1164    0.0611    0.0088
                        11    2.0606       0.1007    0.0517    0.0056
            2.0         ∞     1.7775       0.1039    0.0540    0.0076
                        2     1.9202       0.0588    0.0922    0.0173
                        6     1.8438       0.0999    0.0632    0.0109
                        11    1.8132       0.0834    0.0430    0.0058
a $Z_1$ and $Z_2$ are dichotomous variables with $\pi_{ij}(0) = p(Z_1(0) = i, Z_2(0) = j)$, $i, j = 0, 1$, baseline hazard $\lambda_0(t) = 1$.
b $\pi_{00}(0) = 0.20$, $\pi_{01}(0) = 0.20$, $\pi_{10}(0) = 0.10$, $\pi_{11}(0) = 0.50$. OR $= \pi_{00}(0)\pi_{11}(0)/\pi_{01}(0)\pi_{10}(0)$.
c $m = \infty$ refers to full cohort.

Table 7.2: Comparison between simulated and asymptotic results of $\beta$ with $Z_2$ omitted for an idealized intervention trial^a (OR^b = 0.23)

                                         $\hat\beta_n - \beta^*$
$\alpha_1$  $\alpha_2$  m^c   $\beta^*$    n=50      n=100     n=500
0.0         0.0         ∞    -0.0001       0.0612    0.0409    0.0095
                        2     0.0000       0.0556    0.0994    0.0247
                        6    -0.0001       0.0956    0.0551    0.0155
                        11   -0.0001       0.0790    0.0466    0.0086
            0.5         ∞    -0.0635       0.0733    0.0485    0.0131
                        2    -0.0634       0.0948    0.0811    0.0237
                        6    -0.0634       0.1179    0.0624    0.0132
                        11   -0.0635       0.0937    0.0574    0.0139
            1.0         ∞    -0.1028       0.0804    0.0348    0.0112
                        2    -0.1027       0.0792    0.0951    0.0188
                        6    -0.1027       0.1129    0.0544    0.0155
                        11   -0.1027       0.0857    0.0359    0.0131
            2.0         ∞    -0.1408       0.0847    0.0336    0.0075
                        2    -0.1409       0.0902    0.0781    0.0138
                        6    -0.1408       0.1246    0.0575    0.0059
                        11   -0.1408       0.0926    0.0397    0.0083
0.5         0.0         ∞     0.4999       0.0957    0.0547    0.0087
                        2     0.5002       0.0211    0.1353    0.0276
                        6     0.5000       0.1252    0.0736    0.0103
                        11    0.4990       0.1177    0.0688    0.0090
            0.5         ∞     0.4236       0.1191    0.0582    0.0144
                        2     0.4235       0.0547    0.1235    0.0369
                        6     0.4235       0.1635    0.0862    0.0149
                        11    0.4235       0.1339    0.0663    0.0162
            1.0         ∞     0.3595       0.0989    0.0375    0.0026
                        2     0.3591       0.0404    0.0832    0.0109
                        6     0.3593       0.1255    0.0499    0.0057
                        11    0.3594       0.1187    0.0518    0.0020
            2.0         ∞     0.2804       0.0920    0.0641    0.0028
                        2     0.2795       0.0499    0.1204    0.0265
                        6     0.2801       0.1187    0.0910    0.0055
                        11    0.2802       0.1033    0.0737    0.0043
a $Z_1$ and $Z_2$ are dichotomous variables with $\pi_{ij}(0) = p(Z_1(0) = i, Z_2(0) = j)$, $i, j = 0, 1$, baseline hazard $\lambda_0(t) = 1$.
b $\pi_{00}(0) = 0.706$, $\pi_{01}(0) = 0.194$, $\pi_{10}(0) = 0.094$, $\pi_{11}(0) = 0.006$. OR $= \pi_{00}(0)\pi_{11}(0)/\pi_{01}(0)\pi_{10}(0)$.
c $m = \infty$ refers to full cohort.

Table 7.2: Comparison between simulated and asymptotic results of $\beta$ with $Z_2$ omitted for an idealized intervention trial^a (OR^b = 0.23, continued)

                                         $\hat\beta_n - \beta^*$
$\alpha_1$  $\alpha_2$  m^c   $\beta^*$    n=50      n=100     n=500
1.0         0.0         ∞     0.9999       0.1469    0.0595    0.0173
                        2     1.0001      -0.0738    0.0973    0.0400
                        6     1.0000       0.1534    0.0771    0.0212
                        11    0.9999       0.1629    0.0824    0.0202
            0.5         ∞     0.9133       0.0911    0.0536    0.0085
                        2     0.9134      -0.1118    0.0785    0.0269
                        6     0.9131       0.1008    0.0705    0.0176
                        11    0.9132       0.1116    0.0643    0.0080
            1.0         ∞     0.8231       0.1153    0.0660    0.0146
                        2     0.8219      -0.0589    0.1019    0.0540
                        6     0.8225       0.1470    0.0936    0.0172
                        11    0.8227       0.1425    0.0815    0.0186
            2.0         ∞     0.6857       0.1004    0.0716    0.0114
                        2     0.6814       0.0283    0.1386    0.0233
                        6     0.6836       0.1388    0.0950    0.0140
                        11    0.6844       0.1273    0.0778    0.0153
2.0         0.0         ∞     1.9999       0.1706    0.0439    0.0090
                        2     2.0006      -0.6643   -0.2596    0.0870
                        6     1.9996       0.0388    0.0946    0.0256
                        11    2.0000       0.1299    0.0685    0.0155
            0.5         ∞     1.9004       0.1881    0.0792    0.0155
                        2     1.9016      -0.6441   -0.1679    0.0834
                        6     1.9006       0.0670    0.1453    0.0267
                        11    1.9007       0.1629    0.0967    0.0236
            1.0         ∞     1.7682       0.1531    0.0623    0.0134
                        2     1.7674      -0.5380   -0.1153    0.0723
                        6     1.7674       0.1086    0.1100    0.0256
                        11    1.7675       0.1629    0.0999    0.0219
            2.0         ∞     1.4827       0.0998    0.0566    0.0086
                        2     1.4706      -0.3934   -0.0023    0.0551
                        6     1.4748       0.0758    0.1194    0.0141
                        11    1.4772       0.1211    0.0676    0.0207
a $Z_1$ and $Z_2$ are dichotomous variables with $\pi_{ij}(0) = p(Z_1(0) = i, Z_2(0) = j)$, $i, j = 0, 1$, baseline hazard $\lambda_0(t) = 1$.
b $\pi_{00}(0) = 0.706$, $\pi_{01}(0) = 0.194$, $\pi_{10}(0) = 0.094$, $\pi_{11}(0) = 0.006$. OR $= \pi_{00}(0)\pi_{11}(0)/\pi_{01}(0)\pi_{10}(0)$.
c $m = \infty$ refers to full cohort.

Table 7.3: Comparison between simulated and asymptotic results of $\beta$ with $Z_2$ omitted for an idealized intervention trial^a ($\rho$^b = 0.0)

                                         $\hat\beta_n - \beta^*$
$\alpha_1$  $\alpha_2$  m^c   $\beta^*$^d  n=50      n=100     n=500
0.0         0.0         ∞     0.0000       0.0038    0.0083   -0.0012
                        2     0.0000       0.0085    0.0115   -0.0027
                        6     0.0000      -0.0013    0.0072   -0.0012
                        11    0.0000       0.0043    0.0072   -0.0011
            0.5         ∞     0.0000       0.0037    0.0059   -0.0011
                        2     0.0000       0.0028    0.0081   -0.0002
                        6     0.0000       0.0023    0.0055   -0.0001
                        11    0.0000       0.0033    0.0059   -0.0010
            1.0         ∞     0.0000       0.0009   -0.0005   -0.0011
                        2     0.0000       0.0029   -0.0031   -0.0013
                        6     0.0000       0.0051   -0.0014   -0.0005
                        11    0.0000       0.0081   -0.0011   -0.0018
1.0         0.0         ∞     1.0000       0.0217    0.0147    0.0028
                        2     1.0006       0.0800    0.0132    0.0085
                        6     1.0000       0.0340    0.0188    0.0044
                        11    1.0000       0.0273    0.0158    0.0035
            0.5         ∞     0.8364       0.0587    0.0516    0.0235
                        2     0.8938       0.0529    0.0275   -0.0216
                        6     0.8364       0.0033    0.0501    0.0270
                        11    0.8364       0.0574    0.0526    0.0255
            1.0         ∞     0.5607       0.1300    0.1056    0.0939
                        2     0.7112       0.0242   -0.0310   -0.0119
                        6     0.5607       0.1410    0.1024    0.0989
                        11    0.5607       0.1343    0.1033    0.0902
a $Z_1$ and $Z_2$ are standard normally distributed at time 0, baseline hazard $\lambda_0(t) = 1$.
b Correlation coefficient between $Z_1$ and $Z_2$ at time 0.
c $m = \infty$ refers to full cohort.
d When $m = 6$ or $m = 11$, $\beta^*$ is not calculated and is assumed to be equal to that of the full cohort.

Table 7.4: Comparison between simulated and asymptotic results of $\beta$ with $Z_2$ omitted for an idealized intervention trial^a ($\rho$^b = 0.5)

                                         $\hat\beta_n - \beta^*$
$\alpha_1$  $\alpha_2$  m^c   $\beta^*$^d  n=50      n=100     n=500
0.0         0.0         ∞     0.0000       0.0038    0.0083   -0.0012
                        2     0.0000       0.0085    0.0115   -0.0027
                        6     0.0000      -0.0013    0.0072   -0.0012
                        11    0.0000       0.0043    0.0072   -0.0011
            0.5         ∞     0.2193       0.0143    0.0114   -0.0010
                        2     0.2219       0.0170    0.0130   -0.0001
                        6     0.2193       0.0172    0.0110   -0.0011
                        11    0.2193       0.0131    0.0100   -0.0022
            1.0         ∞     0.3509       0.0205    0.0030   -0.0004
                        2     0.3791       0.0000   -0.0210   -0.0318
                        6     0.3509       0.0208    0.0020   -0.0072
                        11    0.3509       0.0211    0.0037   -0.0071
1.0         0.0         ∞     1.0000       0.0217    0.0147    0.0028
                        2     1.0002       0.0810    0.0430    0.0080
                        6     1.0000       0.0340    0.0188    0.0014
                        11    1.0000       0.0273    0.0158    0.0035
            0.5         ∞     1.1551       0.0022   -0.0077   -0.0382
                        2     1.4376      -0.2240   -0.2572   -0.2997
                        6     1.1551       0.0102   -0.0030   -0.0327
                        11    1.1551       0.0021   -0.0103   -0.0302
            1.0         ∞     1.1049       0.0318   -0.0008   -0.0288
                        2     1.3683      -0.1582   -0.2329   -0.2029
                        6     1.1049       0.0344   -0.0008   -0.0152
                        11    1.1049       0.0350   -0.0080   -0.0211
a $Z_1$ and $Z_2$ are standard normally distributed at time 0, baseline hazard $\lambda_0(t) = 1$.
b Correlation coefficient between $Z_1$ and $Z_2$ at time 0.
c $m = \infty$ refers to full cohort.
d When $m = 6$ or $m = 11$, $\beta^*$ is not calculated and is assumed to be equal to that of the full cohort.
Table 7.5: Comparison between simulated and asymptotic results of $\beta$ when $Z^2$ is modelled as $Z$ for an idealized intervention trial^a

                             $\hat\beta_n - \beta^*$
$\alpha$   m^b   $\beta^*$^c   n=50      n=100     n=500
-2.0       ∞     -1.1925      -0.0338   -0.0205    0.0140
           2     -1.2003      -0.1364   -0.0596    0.0080
           6     -1.1925      -0.0715   -0.0412    0.0020
           11    -1.1925      -0.0186   -0.0299    0.0064
-1.0       ∞     -0.5985      -0.0288    0.0053   -0.0101
           2     -0.6003      -0.0143   -0.0165   -0.0226
           6     -0.5985      -0.0486   -0.0077   -0.0067
           11    -0.5985      -0.0173    0.0093   -0.0093
-0.5       ∞     -0.2996       0.0818    0.0203   -0.0021
           2     -0.3000       0.0514    0.0250    0.0015
           6     -0.2996       0.0274    0.0333    0.0013
           11    -0.2996       0.0694    0.0251   -0.0067
 0.0       ∞      0.0000       0.0461    0.0151    0.0119
           2      0.0000       0.0554    0.0115   -0.0143
           6      0.0000       0.0415    0.0197    0.0142
           11     0.0000       0.0459    0.0191    0.0126
 0.5       ∞      0.3003       0.0716   -0.0101    0.0202
           2      0.3000       0.0989    0.0373    0.0082
           6      0.3003       0.0593   -0.0111    0.0130
           11     0.3003       0.0707   -0.0176    0.0183
 1.0       ∞      0.6004       0.0050    0.0686    0.0140
           2      0.6003      -0.0031    0.0652    0.0198
           6      0.6004       0.0047    0.0779    0.0003
           11     0.6004       0.0098    0.0545    0.0207
 2.0       ∞      1.2033       0.0071    0.0232    0.0366
           2      1.1997       0.0068    0.0075    0.0342
           6      1.2033      -0.0352    0.0267    0.0371
           11     1.2033       0.0274    0.0065    0.0331
a $Z(0)$ is normally distributed as $N(0.3, 0.1^2)$. $\lambda_0(t) = 1$.
b $m = \infty$ refers to full cohort.
c When $m = 6$ or $m = 11$, $\beta^*$ is not calculated and is assumed to be equal to that of the full cohort.

Table 7.6: Simulation: Comparison between full cohort and nested case-control designs with $Z_2$ omitted for an idealized intervention trial^a (OR^b = 1)

                                            $\hat\beta_m - \hat\beta_f$
$\alpha_1$  $\alpha_2$  n^c   $\hat\beta_f$   m=2       m=6       m=11
0.0         0.0         50    -0.0148        -0.0033   -0.0038   -0.0014
                        100    0.00??         0.0025    0.0027    0.0006
                        500    0.0029         0.0006   -0.0011    0.0004
            0.5         50    -0.0144         0.0179    0.0012   -0.0006
                        100   -0.0025        -0.0063   -0.0007    0.0002
                        500    0.0037        -0.0068    0.0036   -0.0008
            1.0         50     0.0048         0.0013    0.0037    0.0019
                        100    0.0095        -0.0014    0.0011    0.0029
                        500   -0.0012        -0.0010    0.0000    0.0004
            2.0         50     0.0083         0.0183   -0.0026    0.0021
                        100    0.0101         0.0000   -0.0041    0.0019
                        500   -0.0029         0.0053    0.0003    0.0000
0.5         0.0         50     0.530?         0.0313    0.0029    0.0014
                        100    0.5235         0.0267    0.0012    0.0010
                        500    0.5013         0.0061   -0.0010    0.0005
            0.5         50     0.5059         0.0354   -0.0010   -0.0015
                        100    0.4839         0.0129   -0.0028   -0.0022
                        500    0.4710         0.0011   -0.0003    0.0011
            1.0         50     0.4228         0.0285    0.0036    0.0005
                        100    0.4272         0.0076    0.0036   -0.0009
                        500    0.4297        -0.0021    0.0015    0.0001
            2.0         50     0.3441         0.0305    0.0008   -0.0004
                        100    0.3528         0.0220    0.0076    0.0033
                        500    0.3461        -0.0065    0.0003    0.0003
a $Z_1$ and $Z_2$ are dichotomous variables with $\pi_{ij}(0) = p(Z_1(0) = i, Z_2(0) = j)$, $i, j = 0, 1$, baseline hazard $\lambda_0(t) = 1$.
b $\pi_{00}(0) = 0.25$, $\pi_{01}(0) = 0.25$, $\pi_{10}(0) = 0.25$, $\pi_{11}(0) = 0.25$. OR $= \pi_{00}(0)\pi_{11}(0)/\pi_{01}(0)\pi_{10}(0)$.
c Cohort size in simulation study.
Table 7.6: Simulation: Comparison between full cohort and nested case-control designs with $Z_2$ omitted for an idealized intervention trial$^a$ (OR$^b$ = 1, continued)

                                                 $\hat\beta_m - \hat\beta_f$
$\alpha_1$   $\alpha_2$   $n^c$   $\hat\beta_f$   m=2       m=6       m=11
1.0   0.0   50    1.0275    0.0727    0.0093    0.0004
1.0   0.0   100   1.0030    0.0348    0.0014   -0.0003
1.0   0.0   500   1.0010    0.0071    0.0008    0.0006
1.0   0.5   50    0.9953    0.0367    0.0095    0.0036
1.0   0.5   100   0.9671    0.0335    0.0086    0.0023
1.0   0.5   500   0.9534   -0.0003    0.0031    0.0021
1.0   1.0   50    0.8722    0.0622    0.0114    0.0027
1.0   1.0   100   0.8612    0.0377    0.0032    0.0001
1.0   1.0   500   0.8597    0.0155    0.0029    0.0012
1.0   2.0   50    0.6979    0.0313    0.0122    0.0061
1.0   2.0   100   0.6921    0.0215    0.0064    0.0031
1.0   2.0   500   0.6852    0.0083    0.0004   -0.0008
2.0   0.0   50    2.0793   -0.0533    0.0213    0.0033
2.0   0.0   100   2.0223    0.0819    0.0181    0.0005
2.0   0.0   500   2.0027    0.0236    0.0023    0.0006
2.0   0.5   50    1.9916   -0.0166    0.0247    0.0196
2.0   0.5   100   1.9399    0.0731    0.0192    0.0038
2.0   0.5   500   1.9252    0.0293    0.0106    0.0066
2.0   1.0   50    1.8116    0.0386    0.0326    0.0197
2.0   1.0   100   1.7766    0.1255    0.0361    0.0111
2.0   1.0   500   1.7381    0.0542    0.0252    0.0130
2.0   2.0   50    1.3863    0.0999    0.0205    0.0070
2.0   2.0   100   1.3532    0.0776    0.0178    0.0060
2.0   2.0   500   1.3383    0.0356    0.0056    0.0060

a $Z_1$ and $Z_2$ are dichotomous variables with $\pi_{ij}(0) = p(Z_1(0) = i, Z_2(0) = j)$, $i, j = 0, 1$, baseline hazard $\lambda_0(t) = 1$.
b $\pi_{00}(0) = 0.25$, $\pi_{01}(0) = 0.25$, $\pi_{10}(0) = 0.25$, $\pi_{11}(0) = 0.25$. OR $= \pi_{00}(0)\pi_{11}(0)/\pi_{01}(0)\pi_{10}(0)$.
c Cohort size in simulation study.

Table 7.7: Simulation: Comparison between full cohort and nested case-control designs with $Z_2$ omitted for an idealized intervention trial$^a$ (OR$^b$ = 0.23)

                                                 $\hat\beta_m - \hat\beta_f$
$\alpha_1$   $\alpha_2$   $n^c$   $\hat\beta_f$   m=2       m=6       m=11
0.0   0.0   50     0.0011   -0.0055    0.0344    0.0178
0.0   0.0   100    0.0108    0.0580    0.0142    0.0057
0.0   0.0   500    0.0091    0.0153    0.0060   -0.0009
0.0   0.5   50     0.0098    0.0210    0.0447    0.0204
0.0   0.5   100   -0.0150    0.0327    0.0140    0.0089
0.0   0.5   500   -0.0501    0.0107    0.0002    o.ooos
0.0   1.0   50    -0.0224   -0.0011    0.0320    0.0054
0.0   1.0   100   -0.0080    0.0004    0.0197    0.0012
0.0   1.0   500   -0.0910    0.0077    0.0044    0.0020
0.0   2.0   50    -0.0501    0.0051    0.0399    0.0079
0.0   2.0   100   -0.1072    0.0444    0.0239    0.0061
0.0   2.0   500   -o.i3.33   0.0002   -0.0010    0.0008
0.5   0.0   50     0.5950   -0.0743    0.0290    0.0220
0.5   0.0   100    0.5510    0.0809    0.0190    0.0111
0.5   0.0   500    0.5080    0.0192    0.0017    0.0003
0.5   0.5   50     0.5427   -0.0645    0.0443    0.0147
0.5   0.5   100    0.1818    0.0652    0.0279    0.0080
0.5   0.5   500    0.4380    0.0224    0.0004    0.0017
0.5   1.0   50     0.4584   -0.0589    0.0201    0.0197
0.5   1.0   100    0.3970    0.0153    0.0122    0.0142
0.5   1.0   500    0.3021    0.0079    0.0029   -0.0007
0.5   2.0   50     0.3724   -0.0430    0.0261    0.0111
0.5   2.0   100    0.3115    0.0554    0.0266    0.0094
0.5   2.0   500    0.2832    0.0228    0.0024    0.0013

a $Z_1$ and $Z_2$ are dichotomous variables with $\pi_{ij}(0) = p(Z_1(0) = i, Z_2(0) = j)$, $i, j = 0, 1$, baseline hazard $\lambda_0(t) = 1$.
b $\pi_{00}(0) = 0.706$, $\pi_{01}(0) = 0.194$, $\pi_{10}(0) = 0.094$, $\pi_{11}(0) = 0.006$. OR $= \pi_{00}(0)\pi_{11}(0)/\pi_{01}(0)\pi_{10}(0)$.
c Cohort size in simulation study.

Table 7.7: Simulation: Comparison between full cohort and nested case-control designs with $Z_2$ omitted for an idealized intervention trial$^a$ (OR$^b$ = 0.23, continued)

                                                 $\hat\beta_m - \hat\beta_f$
$\alpha_1$   $\alpha_2$   $n^c$   $\hat\beta_f$   m=2       m=6       m=11
1.0   0.0   50    1.1108   -0.2205    0.0000    0.0160
1.0   0.0   100   1.0591    0.0380    0.0177    0.0229
1.0   0.0   500   1.0172    0.0229    0.0040    0.0029
1.0   0.5   50    1.0044   -0.2028    0.0095    0.0204
1.0   0.5   100   0.9009    0.0250    0.0107    0.0106
1.0   0.5   500   0.9218    0.0185    0.0089   -0.0006
1.0   1.0   50    0.9384   -0.1754    0.0311    0.0268
1.0   1.0   100   0.8891    0.0347    0.0270    0.0151
1.0   1.0   500   0.8377    0.0382    0.0020    0.0036
1.0   2.0   50    0.7801   -0.0701    0.0303    0.0256
1.0   2.0   100   0.7573    0.0027    0.0213    0.0019
1.0   2.0   500   0.0971    0.0070    0.0005    0.0026
2.0   0.0   50    2.1705   -0.8342   -0.1319   -0.0406
2.0   0.0   100   2.0438   -0.3028    0.0500    0.0247
2.0   0.0   500   2.0089    0.0787    0.0105    0.0006
2.0   0.5   50    2.0885   -0.8310   -0.1207   -0.0219
2.0   0.5   100   1.9790   -0.2459    0.0005    0.0178
2.0   0.5   500   1.9159    0.0091    0.0110    0.0081
2.0   1.0   50    1.9213   -0.0919   -0.0453    0.0091
2.0   1.0   100   1.8305   -0.1781    0.0109    0.0369
2.0   1.0   500   1.7810    0.0581    0.0111    0.0078
2.0   2.0   50    1.5825   -0.5053   -0.0319    0.0158
2.0   2.0   100   1.5393   -0.0710    0.0549    0.0055
2.0   2.0   500   1.1913    0.0344   -0.0024    0.0066

a $Z_1$ and $Z_2$ are dichotomous variables with $\pi_{ij}(0) = p(Z_1(0) = i, Z_2(0) = j)$, $i, j = 0, 1$,
baseline hazard $\lambda_0(t) = 1$.
b $\pi_{00}(0) = 0.706$, $\pi_{01}(0) = 0.194$, $\pi_{10}(0) = 0.094$, $\pi_{11}(0) = 0.006$. OR $= \pi_{00}(0)\pi_{11}(0)/\pi_{01}(0)\pi_{10}(0)$.
c Cohort size in simulation study.

Table 7.8: Simulation: Comparison between full cohort and nested case-control designs with $Z_2$ omitted for an idealized intervention trial$^a$ ($\rho^b$ = 0.0)

                                                 $\hat\beta_m - \hat\beta_f$
$\alpha_1$   $\alpha_2$   $n^c$   $\hat\beta_f$   m=2       m=6       m=11
0.0   0.0   50     0.0038    0.0047   -0.0051    0.0005
0.0   0.0   100    0.0083    0.0032   -0.0011   -0.0011
0.0   0.0   500   -0.0012   -0.0015    0.0000    0.0001
0.0   0.5   50     0.0037   -0.0009   -0.0014   -0.0004
0.0   0.5   100    0.0059    0.0022   -0.0004    0.0000
0.0   0.5   500   -0.0011    0.0009    0.0007   -0.0005
0.0   1.0   50     0.0069   -0.0010   -0.0015    0.0012
0.0   1.0   100   -0.0005   -0.0026   -0.0009   -0.0006
0.0   1.0   500   -0.0011   -0.0002    0.0006   -0.0007
0.5   0.0   50     0.5162    0.0170   -0.0027    0.0002
0.5   0.0   100    0.5089    0.0143    0.0025    0.0020
0.5   0.0   500    0.5011    0.0043    0.0002   -0.0003
0.5   0.5   50     0.1173    0.0134   -0.0012    0.0005
0.5   0.5   100    0.1397    0.0089    0.0017   -0.0011
0.5   0.5   500    0.3280    0.0032    0.0011    0.0007
0.5   1.0   50     0.3388    0.0111   -0.0031   -0.0021
0.5   1.0   100    0.3311    0.0054   -0.0009   -0.0030
0.5   1.0   500    0.3231    0.0064   -0.0002    0.0002
1.0   0.0   50     1.0217    0.0595    0.0123    0.0056
1.0   0.0   100    1.0147    0.0291    0.0041    0.0011
1.0   0.0   500    1.0028    0.0063    0.0016    0.0007
1.0   0.5   50     0.8951    0.0516    0.0046   -0.0013
1.0   0.5   100    0.8880    0.0333    0.0045    0.0010
1.0   0.5   500    0.8599    0.0123    0.0035    0.0020
1.0   1.0   50     0.6973    0.0381    0.0050   -0.0023
1.0   1.0   100    0.6663    0.0133   -0.0032   -0.0023
1.0   1.0   500    0.6546    0.0147    0.0050    0.0023

a $Z_1$ and $Z_2$ are both standard normally distributed at time 0, baseline hazard $\lambda_0(t) = 1$.
b Correlation coefficient between $Z_1$ and $Z_2$ at time 0.
c Cohort size in simulation study.

Table 7.9: Simulation: Comparison between full cohort and nested case-control designs with $Z_2$ omitted for an idealized intervention trial$^a$ ($\rho^b$ = 0.5)

                                                 $\hat\beta_m - \hat\beta_f$
$\alpha_1$   $\alpha_2$   $n^c$   $\hat\beta_f$   m=2       m=6       m=11
0.0   0.0   50     0.0038    0.0047   -0.0051    0.0005
0.0   0.0   100    0.0083    0.0032   -0.0011   -0.0011
0.0   0.0   500   -0.0012   -0.0015    0.0000    0.0001
0.0   0.5   50     0.2330    0.0002    0.0029   -0.0012
0.0   0.5   100    0.2307    0.0042    0.0005   -0.0008
0.0   0.5   500    0.2177    0.0041    0.0005   -0.0006
0.0   1.0   50     0.3711    0.0077    0.0003    0.0009
0.0   1.0   100    0.3515    0.0027   -0.0007    0.0001
0.0   1.0   500    0.3445    0.0028   -0.0008   -0.0007
0.5   0.0   50     0.5102    0.0170   -0.0027    0.0002
0.5   0.0   100    0.5089    0.0143    0.0025    0.0020
0.5   0.0   500    0.5011    0.0043    0.0002   -0.0003
0.5   0.5   50     0.0911    0.0272    0.0033   -0.0005
0.5   0.5   100    0.0831    0.0200    0.0036    0.0015
0.5   0.5   500    0.0081    0.0087    0.0018    0.0010
0.5   1.0   50     0.7111    0.0312    0.0038    0.0024
0.5   1.0   100    0.7207    0.0190    0.0057    0.0003
0.5   1.0   500    0.7088    0.0185    0.0036    0.0018
1.0   0.0   50     1.0217    0.0595    0.0123    0.0056
1.0   0.0   100    1.0147    0.0291    0.0041    0.0011
1.0   0.0   500    1.0028    0.0003    0.0016    0.0007
1.0   0.5   50     1.1573    0.0503    0.0080   -0.0001
1.0   0.5   100    1.1171    0.0330    0.0017   -0.0026
1.0   0.5   500    1.1109    0.0210    0.0055    0.0020
1.0   1.0   50     1.1307    0.0731    0.0026    0.0032
1.0   1.0   100    1.0981    0.0373    0.0060   -0.0018
1.0   1.0   500    1.0701    0.0293    0.0136    0.0077

a $Z_1$ and $Z_2$ are both standard normally distributed at time 0, baseline hazard $\lambda_0(t) = 1$.
b Correlation coefficient between $Z_1$ and $Z_2$ at time 0.
c Cohort size in simulation study.
Table 7.10: Simulation: Comparison between full cohort and nested case-control designs when $Z^2$ is modelled as $Z$ for an idealized intervention trial$^a$

                                     $\hat\beta_m - \hat\beta_f$
$\alpha$   $n^b$   $\hat\beta_f$   m=2       m=6       m=11
-2.0   50    -1.2263   -0.1104   -0.0377   -0.0148
-2.0   100   -1.2130   -0.0160   -0.0207   -0.0094
-2.0   500   -1.17S5   -0.0138   -0.0120   -0.0076
-1.0   50    -0.6273    0.0127   -0.0108   -0.0185
-1.0   100   -0.5932   -0.0236   -0.0130    0.0040
-1.0   500   -0.6086   -0.0143    0.0034    0.0008
-0.5   50    -0.2118   -0.0338   -0.0574   -0.0154
-0.5   100   -0.2703    0.0043    0.0130    0.0048
-0.5   500   -0.3017    0.0032    0.0034   -0.0046
 0.0   50     0.0461    0.0003   -0.0016   -0.0002
 0.0   100    0.0151   -0.0036    0.0046    0.0010
 0.0   500    0.0110   -0.0262    0.0023    0.0007
 0.5   50     0.3710    0.0270   -0.0123   -0.0009
 0.5   100    0.2002    0.0471   -0.0010   -0.0075
 0.5   500    0.3205   -0.0123   -0.0072   -0.0019
 1.0   50     0.6051   -0.0082   -0.0003    0.0048
 1.0   100    0.6600   -0.0035    0.0093   -0.0141
 1.0   500    0.6114    0.0057   -0.0137    0.0067
 2.0   50     1.2101   -0.0030   -0.0423    0.0203
 2.0   100    1.2265   -0.0103    0.0035   -0.0167
 2.0   500    1.2300   -0.0060    0.0005   -0.0035

a $Z(0)$ is normally distributed with $N(0.3, 0.1^2)$. $\lambda_0(t) = 1$.
b Cohort size in simulation study.
Table 7.11: Comparison between simulated and asymptotic results of $\Sigma$ and $I$ with $Z_2$ omitted for an idealized intervention trial$^a$ (OR$^b$ = 1)

                                        $\mathrm{Var}_n(U(\hat\beta)) - \Sigma^*$                $I_n - I^*$
$\alpha_1$   $\alpha_2$   $m^c$   $\Sigma^*$   n=50      n=100     n=500     $I^*$     n=50      n=100     n=500
0.0   0.0   ∞    0.2500   -0.0287    0.0025   -0.0056   0.2500   -0.0206   -0.0115   -0.0029
0.0   0.0   2    0.1250   -0.0139    0.0078   -0.0091   0.1250   -0.0074   -0.0039   -0.0007
0.0   0.0   6    0.2083   -0.0058    0.0026    0.0076   0.2083   -0.0057   -0.0026   -0.0004
0.0   0.0   11   0.2273   -0.0150    0.0082   -0.0057   0.2273   -0.0084   -0.0035   -0.0006
0.0   0.5   ∞    0.2500   -0.0049    0.0052   -0.0112   0.2500   -0.0203   -0.0116   -0.0028
0.0   0.5   2    0.1250    0.0051   -0.0004   -0.0003   0.1250   -0.0074   -0.0036   -0.0006
0.0   0.5   6    0.2083   -0.0002    0.0126   -0.0111   0.2083   -0.0061   -0.0028   -0.0003
0.0   0.5   11   0.2273    0.0041    0.0132   -0.0019   0.2273   -0.0080   -0.0039   -0.0005
0.0   1.0   ∞    0.2500   -0.0195   -0.0078   -0.0205   0.2500   -0.0200   -0.0117   -0.0030
0.0   1.0   2    0.1250   -0.0033   -0.0002    0.0042   0.1250   -0.0073   -0.0032   -0.0005
0.0   1.0   6    0.2083   -0.0008    0.0035   -0.0095   0.2083   -0.0061   -0.0031   -0.0006
0.0   1.0   11   0.2273   -0.0081   -0.0017   -0.0109   0.2273   -0.0077   -0.0039   -0.0007
0.0   2.0   ∞    0.2500   -0.0203   -0.0155    0.0170   0.2500   -0.0206   -0.0115   -0.0030
0.0   2.0   2    0.1250   -0.0077   -0.0059    0.0035   0.1250   -0.0069   -0.0046   -0.0009
0.0   2.0   6    0.2083   -0.0002   -0.0041    0.0081   0.2083   -0.0058   -0.0028   -0.0005
0.0   2.0   11   0.2273   -0.0071   -0.0089    0.0252   0.2273   -0.0083   -0.0038   -0.0008
0.5   0.0   ∞    0.2302   -0.0285   -0.0100   -0.0067   0.2362   -0.0190   -0.0099   -0.0021
0.5   0.0   2    0.1118   -0.0108   -0.0027   -0.0043   0.1118   -0.0062   -0.0032   -0.0005
0.5   0.0   6    0.1943   -0.0178    0.0012   -0.0099   0.1943   -0.0043   -0.0018    0.0001
0.5   0.0   11   0.213d   -0.0154    0.0000    0.0035   0.2134   -0.0063   -0.0022   -0.0000
0.5   0.5   ∞    0.2373   -0.0107   -0.0116   -0.0008   0.2381   -0.0195   -0.0103   -0.0021
0.5   0.5   2    0.1130    0.0004   -0.0078   -0.0012   0.1130   -0.0067   -0.0033   -0.0006
0.5   0.5   6    0.1950    0.0012   -0.0004    0.0062   0.1960   -0.0051   -0.0016   -0.0001
0.5   0.5   11   0.2140    0.0012   -0.0006   -0.0080   0.2152   -0.0069   -0.0025    0.0001
0.5   1.0   ∞    0.2390   -0.0107    0.0153    0.0029   0.2406   -0.0178   -0.0101   -0.0024
0.5   1.0   2    0.1152   -0.0092    0.0101    0.0060   0.1152   -0.0059   -0.0027   -0.0004
0.5   1.0   6    0.1980    0.0029    0.0198    0.0057   0.1985   -0.0038   -0.0016   -0.0003
0.5   1.0   11   0.2109    0.0001    0.0110    0.0035   0.2177   -0.0053   -0.0024   -0.0003
0.5   2.0   ∞    0.2431   -0.0248   -0.0282    0.0068   0.2418   -0.0174   -0.0091   -0.0021
0.5   2.0   2    0.1180   -0.0009   -0.0062    0.0022   0.1180   -0.0069   -0.0028   -0.0004
0.5   2.0   6    0.2011   -0.0150   -0.0138    0.0092   0.2005   -0.0035   -0.0014   -0.0000
0.5   2.0   11   0.2202   -0.0123   -0.0176    0.0101   0.2193   -0.0054   -0.0017   -0.0001

a $Z_1$ and $Z_2$ are dichotomous variables with $\pi_{ij}(0) = p(Z_1(0) = i, Z_2(0) = j)$, $i, j = 0, 1$, baseline hazard $\lambda_0(t) = 1$.
b $\pi_{00}(0) = 0.25$, $\pi_{01}(0) = 0.25$, $\pi_{10}(0) = 0.25$, $\pi_{11}(0) = 0.25$. OR $= \pi_{00}(0)\pi_{11}(0)/\pi_{01}(0)\pi_{10}(0)$.
c $m = \infty$ refers to full cohort.

Table 7.11: Comparison between simulated and asymptotic results of $\Sigma$ and $I$ with $Z_2$ omitted for an idealized intervention trial$^a$ (OR$^b$ = 1, continued)

[Table body illegible in the scanned source.]

a $Z_1$ and $Z_2$ are dichotomous variables with $\pi_{ij}(0) = p(Z_1(0) = i, Z_2(0) = j)$, $i, j = 0, 1$, baseline hazard $\lambda_0(t) = 1$.
b $\pi_{00}(0) = 0.25$, $\pi_{01}(0) = 0.25$, $\pi_{10}(0) = 0.25$, $\pi_{11}(0) = 0.25$. OR $= \pi_{00}(0)\pi_{11}(0)/\pi_{01}(0)\pi_{10}(0)$.
c $m = \infty$ refers to full cohort.

Table 7.12: Comparison between simulated and asymptotic results of $\Sigma$ and $I$ with $Z_2$ omitted for an idealized intervention trial$^a$ (OR$^b$ = 0.23)

[Table body illegible in the scanned source.]

a $Z_1$ and $Z_2$ are dichotomous variables with $\pi_{ij}(0) = p(Z_1(0) = i, Z_2(0) = j)$, $i, j = 0, 1$, baseline hazard $\lambda_0(t) = 1$.
b $\pi_{00}(0) = 0.706$, $\pi_{01}(0) = 0.194$, $\pi_{10}(0) = 0.094$, $\pi_{11}(0) = 0.006$. OR $= \pi_{00}(0)\pi_{11}(0)/\pi_{01}(0)\pi_{10}(0)$.
c $m = \infty$ refers to full cohort.
Table 7.13: Comparison between simulated and asymptotic variances of $\hat\beta$ with $Z_2$ omitted for an idealized intervention trial$^a$ (OR$^b$ = 1)

                                              $\mathrm{Var}_n(\hat\beta) - \mathrm{Var}(\hat\beta)$        $I_n^{-1} - \mathrm{Var}_n(\hat\beta)$
$\alpha_1$   $\alpha_2$   $m^d$   $\mathrm{Var}(\hat\beta)^c$   n=50     n=100    n=500    n=50     n=100    n=500
0.0   0.0   ∞    11.11    5.15    4.38    0.66   -4.28   -3.81   -0.49
0.0   0.0   2    22.22    4.42    6.27    2.59   -2.84   -4.80   -2.19
0.0   0.0   6    13.33    8.21    5.33    0.81   -7.43   -4.90   -0.71
0.0   0.0   11   12.22    5.41    4.87    0.59   -4.90   -4.51   -0.51
0.0   0.5   ∞    11.11    4.32    1.31    1.04   -3.29   -0.76   -0.92
0.0   0.5   2    22.22    1.09    8.24    3.32    0.55   -6.31   -3.06
0.0   0.5   6    13.33    4.06    1.15    1.47   -3.78   -0.71   -1.41
0.0   0.5   11   12.22    4.01    1.39    0.81   -3.92   -1.06   -0.77
0.0   1.0   ∞    11.11    5.73    2.18    0.65   -4.71   -1.02   -0.46
0.0   1.0   2    22.22    3.80    7.08    0.47   -1.74   -5.98   -0.03
0.0   1.0   6    13.33    5.72    3.57    0.73   -4.80   -3.15   -0.61
0.0   1.0   11   12.22    5.44    2.32    0.43   -4.76   -1.99   -0.32
0.0   2.0   ∞    11.11    4.71    3.12    0.02   -3.84   -2.65    0.04
0.0   2.0   2    22.22    1.35    9.08    1.82   -0.05   -8.51   -1.63
0.0   2.0   6    13.33    6.28    5.05    0.48   -5.57   -4.73   -0.49
0.0   2.0   11   12.22    4.31    3.10    0.17   -3.80   -2.86   -0.19
0.5   0.0   ∞    11.30    3.71    1.95   -1.02   -3.04   -1.41    1.14
0.5   0.0   2    28.59   -7.17    8.70    1.64    0.69   -7.03   -0.97
0.5   0.0   6    14.73    3.95    2.82   -0.60   -3.40   -2.34    0.69
0.5   0.0   11   13.01    5.27    2.08   -1.19   -4.90   -2.37    1.25
0.5   0.5   ∞    11.29    4.14    1.04    0.81   -3.37   -0.94   -0.66
0.5   0.5   2    28.38   -0.01    0.90    1.28    5.43   -5.03   -0.81
0.5   0.5   6    14.07    5.00    3.94    0.51   -5.19   -3.16   -0.37
0.5   0.5   11   12.97    5.97    2.29    0.45   -5.60   -1.79   -0.36
0.5   1.0   ∞    11.27    1.42    2.95    0.47   -3.67   -2.71   -0.38
0.5   1.0   2    27.97   -4.90    7.34    2.57    4.27   -6.53   -2.07
0.5   1.0   6    14.59    5.30    3.91    0.75   -4.82   -3.79   -0.68
0.5   1.0   11   12.93    8.09    3.18    0.51   -7.09   -3.19   -0.47
0.5   2.0   ∞    11.30    3.02    2.77   -0.61   -2.20   -2.20    0.65
0.5   2.0   2    27.34   -0.05    3.80    1.64    5.50   -2.26   -1.59
0.5   2.0   6    14.48    4.14    3.34   -0.03   -3.05   -2.77    0.04
0.5   2.0   11   12.88    3.87    3.68   -0.28   -3.45   -3.31    0.26

a $Z_1$ and $Z_2$ are dichotomous variables with $\pi_{ij}(0) = p(Z_1(0) = i, Z_2(0) = j)$, $i, j = 0, 1$, baseline hazard $\lambda_0(t) = 1$.
b $\pi_{00}(0) = 0.72$, $\pi_{01}(0) = 0.18$, $\pi_{10}(0) = 0.08$, $\pi_{11}(0) = 0.02$. OR $= \pi_{00}(0)\pi_{11}(0)/\pi_{01}(0)\pi_{10}(0)$.
c Asymptotic variance $\mathrm{Var}(\hat\beta) = \Sigma^*/(I^*)^2$.
d $m = \infty$ refers to full cohort.

Table 7.13: Comparison between simulated and asymptotic variances of $\hat\beta$ with $Z_2$ omitted for an idealized intervention trial$^a$ (OR$^b$ = 1, continued)

                                              $\mathrm{Var}_n(\hat\beta) - \mathrm{Var}(\hat\beta)$        $I_n^{-1} - \mathrm{Var}_n(\hat\beta)$
$\alpha_1$   $\alpha_2$   $m^d$   $\mathrm{Var}(\hat\beta)^c$   n=50     n=100    n=500    n=50     n=100    n=500
1.0   0.0   ∞    11.80     4.48     2.86   -0.01   -3.34   -2.35     0.14
1.0   0.0   2    39.69   -18.99    -2.11    2.79   13.66    1.85    -2.30
1.0   0.0   6    17.23     3.26     4.45    0.38   -2.26   -4.06    -0.30
1.0   0.0   11   14.49     6.41     4.24    0.56   -5.72   -3.99    -0.48
1.0   0.5   ∞    11.74     4.75     2.16    0.80   -3.62   -1.65    -0.70
1.0   0.5   2    39.06   -21.35    -1.74    2.03   16.00    2.81    -1.96
1.0   0.5   6    17.07     2.49     4.18    1.11   -1.67   -3.55    -1.09
1.0   0.5   11   14.38     3.79     3.87    1.08   -3.18   -3.55    -1.05
1.0   1.0   ∞    11.67     4.64     3.22    0.21   -3.81   -2.63    -0.12
1.0   1.0   2    37.73   -17.94    -1.76    5.83   12.45    1.91    -5.40
1.0   1.0   6    16.77     4.29     5.21    0.41   -3.87   -4.57    -0.37
1.0   1.0   11   14.19     3.79     4.34   -0.07   -3.62   -3.98     0.09
1.0   2.0   ∞    11.71     2.89     0.63   -0.14   -1.97   -0.09     0.15
1.0   2.0   2    35.24   -16.23     0.19    1.31   12.09    0.97    -1.23
1.0   2.0   6    16.29     3.05     4.46    0.12   -2.32   -3.82    -0.14
1.0   2.0   11   13.97     4.11     1.45   -0.16   -3.58   -1.12     0.14
2.0   0.0   ∞    14.02     8.06     4.43   -0.56   -5.78   -3.26     0.84
2.0   0.0   2    88.89   -76.53   -59.54   20.54   34.34   36.66   -19.96
2.0   0.0   6    28.22    -7.03    10.62    1.27    5.96   -9.53    -0.79
2.0   0.0   11   20.87     1.44     6.57    0.50   -1.13   -5.78    -0.24
2.0   0.5   ∞    13.77     9.87     4.63    2.46   -7.37   -3.58    -2.24
2.0   0.5   2    86.31   -73.10   -56.31   20.00   33.99   34.17   -19.07
2.0   0.5   6    27.61    -6.40     5.71    3.26    5.90   -5.10    -3.11
2.0   0.5   11   20.48     3.31     4.88    2.75   -2.44   -4.35    -2.72
2.0   1.0   ∞    13.35     8.73     4.99    1.79   -6.32   -3.77    -1.40
2.0   1.0   2    79.93   -67.03   -49.56   32.16   32.16   32.28   -30.80
2.0   1.0   6    26.13    -4.60     7.69    2.11    4.08   -6.45    -1.70
2.0   1.0   11   19.57     2.39     5.33    1.16   -1.40   -4.61    -0.91
2.0   2.0   ∞    13.02     5.37     1.23   -0.35   -3.74   -0.38     0.40
2.0   2.0   2    64.83   -47.63   -31.56   14.78   24.27   23.20   -14.77
2.0   2.0   6    22.86    -2.80     4.86    1.53    2.97   -3.45    -1.50
2.0   2.0   11   17.78     1.61     4.31    0.07   -0.84   -3.63    -0.07

a $Z_1$ and $Z_2$ are dichotomous variables with $\pi_{ij}(0) = p(Z_1(0) = i, Z_2(0) = j)$, $i, j = 0, 1$, baseline hazard $\lambda_0(t) = 1$.
b $\pi_{00}(0) = 0.72$, $\pi_{01}(0) = 0.18$, $\pi_{10}(0) = 0.08$, $\pi_{11}(0) = 0.02$. OR $= \pi_{00}(0)\pi_{11}(0)/\pi_{01}(0)\pi_{10}(0)$.
c Asymptotic variance $\mathrm{Var}(\hat\beta) = \Sigma^*/(I^*)^2$.
d $m = \infty$ refers to full cohort.

Table 7.14: Comparison between simulated and asymptotic variances of $\hat\beta$ with $Z_2$ omitted for an idealized intervention trial$^a$ (OR$^b$ = 5.0)

                                              $\mathrm{Var}_n(\hat\beta) - \mathrm{Var}(\hat\beta)$        $I_n^{-1} - \mathrm{Var}_n(\hat\beta)$
$\alpha_1$   $\alpha_2$   $m^d$   $\mathrm{Var}(\hat\beta)^c$   n=50     n=100    n=500    n=50     n=100    n=500
0.0   0.0   ∞    4.17    0.86    0.45    0.02   -0.49   -0.24    0.04
0.0   0.0   2    8.33    1.74    0.61    0.08   -1.18   -0.40   -0.04
0.0   0.0   6    5.00    0.71    0.32   -0.06   -0.55   -0.24    0.08
0.0   0.0   11   4.58    0.71    0.25   -0.05   -0.53   -0.16    0.07
0.0   0.5   ∞    4.20    0.72    0.43    0.04   -0.32   -0.23    0.01
0.0   0.5   2    8.31    2.09    0.53    0.43   -1.53   -0.31   -0.39
0.0   0.5   6    5.02    0.89    0.42   -0.15   -0.71   -0.36    0.16
0.0   0.5   11   4.61    0.61    0.31    0.00   -0.42   -0.24    0.01
0.0   1.0   ∞    4.24    0.58    0.44    0.61   -0.16   -0.21   -0.54
0.0   1.0   2    8.46    2.30    0.88    0.67   -1.73   -0.65   -0.61
0.0   1.0   6    5.07    0.59    0.26    0.48   -0.42   -0.17   -0.46
0.0   1.0   11   1.66    0.42    0.29    0.50   -0.21   -0.19   -0.48
0.0   2.0   ∞    4.23    1.60    1.22    0.35   -1.05   -0.87   -0.20
0.0   2.0   2    8.,SO   3.22    1.85    0.06   -2.67   -1.60   -0.02
0.0   2.0   6    5.12    1.52    1.08    0.47   -1.26   -0.92   -0.39
0.0   2.0   11   4.67    1.27    1.10    0.33   -0.95   -0.92   -0.24
0.5   0.0   ∞    4.42    0.83    0.43   -0.04   -0.43   -0.23    0.09
0.5   0.0   2    8.92    2.87    0.08   -0.14   -2.29    0.16    0.18
0.5   0.0   6    5.29    0.72    0.56    0.07   -0.58   -0.51   -0.06
0.5   0.0   11   4.85    0.69    0.29   -0.02   -0.52   -0.23    0.02
0.5   0.5   ∞    4.55    0.83    0.46    0.18   -0.40   -0.23   -0.12
0.5   0.5   2    9.44    1.78    0.74    1.04   -1.25   -0.57   -1.01
0.5   0.5   6    5.48    0.59    0.26    0.25   -0.46   -0.22   -0.24
0.5   0.5   11   5.01    0.63    0.33    0.04   -0.45   -0.26   -0.02
0.5   1.0   ∞    4.54    1.03    0.61    0.60   -0.52   -0.30   -0.45
0.5   1.0   2    9.83    3.88    1.59    1.10   -3.36   -1.36   -1.09
0.5   1.0   6    5.55    0.91    0.56    0.85   -0.74   -0.46   -0.79
0.5   1.0   11   5.04    0.82    0.64    0.53   -0.59   -0.52   -0.45
0.5   2.0   ∞    4.41    1.48    1.10    0.83   -0.72   -0.60   -0.56
0.5   2.0   2    10.20   4.79    1.56    1.46   -4.06   -1.13   -1.43
0.5   2.0   6    5.52    1.64    0.89    0.89   -1.25   -0.65   -0.74
0.5   2.0   11   4.96    1.12    0.96    0.90   -0.68   -0.68   -0.72

a $Z_1$ and $Z_2$ are dichotomous variables with $\pi_{ij}(0) = p(Z_1(0) = i, Z_2(0) = j)$, $i, j = 0, 1$, baseline hazard $\lambda_0(t) = 1$.
b $\pi_{00}(0) = 0.20$, $\pi_{01}(0) = 0.20$, $\pi_{10}(0) = 0.10$, $\pi_{11}(0) = 0.50$. OR $= \pi_{00}(0)\pi_{11}(0)/\pi_{01}(0)\pi_{10}(0)$.
c Asymptotic variance $\mathrm{Var}(\hat\beta) = \Sigma^*/(I^*)^2$.
d $m = \infty$ refers to full cohort.

Table 7.14: Comparison between simulated and asymptotic variances of $\hat\beta$ with $Z_2$ omitted for an idealized intervention trial$^a$ (OR$^b$ = 5.0, continued)

                                              $\mathrm{Var}_n(\hat\beta) - \mathrm{Var}(\hat\beta)$        $I_n^{-1} - \mathrm{Var}_n(\hat\beta)$
$\alpha_1$   $\alpha_2$   $m^d$   $\mathrm{Var}(\hat\beta)^c$   n=50     n=100    n=500    n=50     n=100    n=500
1.0   0.0   ∞    5.15     0.88    0.01    0.09   -0.39   -0.37   -0.04
1.0   0.0   2    11.35    1.93    2.20    1.47   -1.42   -1.75   -1.40
1.0   0.0   6    0.28     0.57    0.23    0.20   -0.40   -0.17   -0.19
1.0   0.0   11   5.70     0.54    0.40    0.12   -0.36   -0.35   -0.12
1.0   0.5   ∞    5.30     1.18    0.95    0.33   -0.65   -0.65   -0.23
1.0   0.5   2    12.30    2.64    3.20    0.37   -2.45   -3.05   -0.36
1.0   0.5   6    6.59     1.00    0.39    0.11   -0.93   -0.33   -0.10
1.0   0.5   11   5.93     0.84    0.03    0.22   -0.69   -0.56   -0.19
1.0   1.0   ∞    5.09     1.98    1.04    1.38   -1.17   -1.08   -1.08
1.0   1.0   2    12.73    3.99    2.03    1.50   -3.45   -2.44   -1.55
1.0   1.0   6    0.54     1.90    1.89    1.03   -1.01   -1.00   -1.48
1.0   1.0   11   5.,SO    1.78    1.71    1.30   -1.41   -1.45   -1.10
1.0   2.0   ∞    1.71     2.02    1.10    0.00   -1.00   -0.77   -0.17
1.0   2.0   2    12.58    5.00    2.33    0.74   -4.22   -2.34   -0.70
1.0   2.0   6    0.20     1.80    1.19    0.59   -1.41   -0.93   -0.32
1.0   2.0   11   5.44     1.78    1.37    0.58   -1.24   -1.03   -0.25
2.0   0.0   ∞    8.30     2.91    0.02    0.55   -2.09   -0.11   -0.45
2.0   0.0   2    23.74   -0.10    4.52    1.84    4.56   -3.30   -1.82
2.0   0.0   6    11.04    2.02    1.08    0.85   -2.48   -0.85   -0.82
2.0   0.0   11   9.02     2.71    0.20    0.25   -2.02   -0.05   -0.22
2.0   0.5   ∞    8.38     4.30    2.30    0.88   -2.81   -1.37   -0.54
2.0   0.5   2    20.18   -8.01    4.22    3.38    0.98   -4.04   -3.12
2.0   0.5   6    11.75    1.68    2.07    0.90   -1.39   -1.05   -0.78
2.0   0.5   11   10.00    3.12    1.11    0.92   -2.60   -0.78   -0.78
2.0   1.0   ∞    7.22     5.37    3.35    2.45   -3.27   -1.92   -1.58
2.0   1.0   2    20.31   -8.10    5.11    5.80    0.14   -4.09   -5.75
2.0   1.0   6    11.18    5.55    3.44    1.75   -4.97   -2.98   -1.44
2.0   1.0   11   9.24     4.98    2.58    1.87   -4.20   -1.97   -1.39
2.0   2.0   ∞    5.00     0.49    3.28    1.77   -4.54   -1.79   -0.68
2.0   2.0   2    22.17   -2.95    6.47    3.82    2.02   -6.25   -3.74
2.0   2.0   6    8.74     0.97    5.08    2.83   -6.13   -4.17   -2.13
2.0   2.0   11   7.03     0.02    3.54    1.93   -4.96   -2.58   -1.04

a $Z_1$ and $Z_2$ are dichotomous variables with $\pi_{ij}(0) = p(Z_1(0) = i, Z_2(0) = j)$, $i, j = 0, 1$, baseline hazard $\lambda_0(t) = 1$.
b $\pi_{00}(0) = 0.20$, $\pi_{01}(0) = 0.20$, $\pi_{10}(0) = 0.10$, $\pi_{11}(0) = 0.50$.
OR $= \pi_{00}(0)\pi_{11}(0)/\pi_{01}(0)\pi_{10}(0)$.
c Asymptotic variance $\mathrm{Var}(\hat\beta) = \Sigma^*/(I^*)^2$.
d $m = \infty$ refers to full cohort.

8 Summary and discussion

Concerns regarding the appropriateness of nested case-control designs versus full cohort designs when model misspecification exists are partially answered in this dissertation. We studied the behavior of the nested case-control and full cohort designs over a wide range of model misspecification types, including some very extreme situations. Our results indicate that: (1) model misspecification does not result in practical differences in estimates from the full cohort and nested case-control designs; (2) with covariate omission (no interaction term), the information based on the misspecified model represents the actual variance of the score well; thus, inference based on the usual likelihood techniques is reliable. We conclude that nested case-control data analyzed using partial likelihood techniques reliably represent the full cohort under model misspecification. The following summarizes our contributions and specific results.

We derive the formula to calculate the asymptotic maximum partial likelihood estimates for a nested case-control design when the Cox proportional hazards model is misspecified in very general ways. The asymptotic variance of the maximum partial likelihood estimator under model misspecification is postulated. The formulae are extended to full cohort designs when model misspecification exists. We apply the formulae to three types of commonly seen model misspecification:

1. Covariate omission: the true model is $\lambda_0(t)e^{\alpha_1 Z_1 + \alpha_2 Z_2}$, or the same model with an interaction term; the working model is $\lambda_1(t)e^{\beta Z_1}$, where $Z_1$ and $Z_2$ are either binary or normally distributed.

2. Mismodelling available measurements: the true model is $\lambda_0(t)e^{\alpha Z^2}$ and the working model is $\lambda_1(t)e^{\beta Z}$, where $Z$ is either binary or normally distributed; or the true model is accelerated failure time and the working model is proportional hazards with the same covariate.

3. Measurement error: the true model is $\lambda_0(t)e^{\alpha Z}$, and the working model is $\lambda_1(t)e^{\beta Z^*}$, where $Z^* = Z + \epsilon$ (classical measurement error model).

Our theoretical and simulation studies conclude that:

1. Covariate omission:

• When $Z_1$ and $Z_2$ are both binary and constantly distributed over time (which is approximately true for rare disease and little censoring), if the true model does not include the interaction between $Z_1$ and $Z_2$, then

$$\beta_{nest} = \beta_{full} = \alpha_1 + \log\left\{\frac{(\pi_{10} + \pi_{11}e^{\alpha_2})(\pi_{00} + \pi_{01})}{(\pi_{00} + \pi_{01}e^{\alpha_2})(\pi_{10} + \pi_{11})}\right\}$$

where the $\pi$'s are the joint distribution probabilities of $Z_1, Z_2$. Thus, sampling does not introduce extra bias compared to using the full cohort. When $Z_1$ and $Z_2$ are independent, there is no bias compared to the true model. If the true model involves the interaction between $Z_1$ and $Z_2$ (with coefficient $\alpha_3$), then

$$\beta_{nest} = \beta_{full} = \alpha_1 + \log\left\{\frac{(\pi_{10} + \pi_{11}e^{\alpha_2 + \alpha_3})(\pi_{00} + \pi_{01})}{(\pi_{00} + \pi_{01}e^{\alpha_2})(\pi_{10} + \pi_{11})}\right\}$$

Thus, sampling does not introduce extra bias compared to the full cohort.

• When $Z_1$ and $Z_2$ are both normally and constantly distributed over time,

$$\beta_{nest} = \beta_{full} = \alpha_1 + \alpha_2\rho\,\sigma_2/\sigma_1$$

where $\rho$ is the correlation coefficient between $Z_1$ and $Z_2$, and the $\sigma$'s are the standard deviations. Thus, sampling does not introduce extra bias compared to the full cohort. When $Z_1$ and $Z_2$ are independent, there is no bias.

• For idealized intervention trials with $Z_1$, $Z_2$ both binary or normally distributed at the beginning of the trial, the joint distributions of the $Z$'s change over time. Generally, the maximum likelihood estimates from both full cohort and nested case-control designs are biased compared to the true model.
However, asymptotically, a nested case-control design provides a close estimate to the full cohort design. With a small cohort, or a moderate cohort size with rare exposure variables, multiple controls per subject are necessary. The principles for selecting the number of controls per subject with no model misspecification apply.

• The conjectured formulae for calculating the asymptotic score variance, information and variance of $\hat\beta$ provide similar results to our simulation studies in covariate omission settings. In the cases we studied (binary exposure with no interaction term), the difference between score variance and information is very small. With a cohort size of 500, model-based variance (inverse of observed information) provides a very good approximation to the true variance when full cohort designs or nested case-control designs with multiple controls per subject are chosen. Those conclusions apply both with and without covariate omission.

2. Mismodelling available measurements:

• If the true model is $\lambda_0(t)e^{\alpha Z^2(t)}$ and the working model is $\lambda_1(t)e^{\beta Z(t)}$, then for rare diseases or an idealized intervention trial, if $Z(0)$ is normally distributed with mean 0, $\beta_{nest} = \beta_{full} = 0$. Otherwise, the bias compared to the true model depends on the distribution of $Z$. Nested case-control designs provide close estimates to full cohort designs.

• If the true model is the accelerated failure time model $\log T = \alpha_0 + \alpha^T Z + \sigma W$, with $W$ following the extreme value distribution for a minimum ($T$ has a Weibull distribution), and the working model is $\lambda_1(t)e^{\beta^T Z}$, then $\beta_{nest} = \beta_{full} = -\sigma^{-1}\alpha$. The nested case-control design does not introduce extra bias.

3. Measurement error:

• For a classical measurement error model $Z^*(t) = Z(t) + \epsilon(t)$, if the true model is $\lambda_0(t)e^{\alpha Z}$ and the working model is $\lambda_1(t)e^{\beta Z^*}$, and $Z^*(t)$, $Z(t)$ are bivariately normally distributed and the distribution does not change over time, then

$$\beta_{nest} = \beta_{full} = \alpha\,\frac{\sigma_Z^2}{\sigma_Z^2 + \sigma_\epsilon^2},$$

where $\sigma_Z^2$, $\sigma_\epsilon^2$ are the variances of $Z(t)$, $\epsilon(t)$, respectively. Nested case-control designs have similar performance to full cohort designs in this situation. Generally, the technique used in covariate omission can be applied here, since this model can be considered as a special case of missing covariates with true model $\lambda_0(t)e^{\alpha Z^* - \alpha\epsilon}$.

The consequences of misspecifying the Cox model for full cohort designs have been extensively investigated in recent years (Gail et al., 1984; Lagakos, 1988a; Lagakos, 1988b; Lagakos and Schoenfeld, 1984; Morgan, 1986; Solomon, 1984; Struthers and Kalbfleisch, 1986; Lin and Wei, 1989). Our conclusions regarding full cohort designs are consistent with previous studies. We show that with covariate omission, conventional model-based variance provides a very good estimate of the true variance. Gail et al. (1988) show that proportional hazards models are robust when relevant balanced covariates are omitted, and that the conventional model-based score tests retain nominal size. Lin and Wei (1989) derive the asymptotic distribution of the maximum partial likelihood estimator for full cohort designs when the Cox proportional hazards model is misspecified in a very general form. They show that for general model misspecification, the conventional model-based inference procedures often lead to tests with supranominal size and confidence intervals with poor coverage probability. However, their simulation results do indicate that with covariate omission, the test size is nominal. They propose a robust variance estimator for the full cohort design and show that it improves the test size for other types of model misspecification.
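
As a numerical illustration of two of the closed-form limits summarized in this chapter, the following is a minimal sketch, assuming the rare-disease setting in which the covariate distribution is constant over time. The function names `omitted_binary_limit` and `attenuated_limit`, the dictionary layout for the joint probabilities, and the example values are hypothetical and not part of the dissertation; the binary covariate-omission formula is written as read from the damaged transcript, and the second function uses the standard attenuation factor for classical measurement error.

```python
import math


def omitted_binary_limit(alpha1, alpha2, pi, alpha3=0.0):
    """Limit of the working-model coefficient on Z1 when binary Z2 is omitted.

    pi maps (z1, z2) to the joint probability P(Z1 = z1, Z2 = z2).
    alpha3 is the Z1*Z2 interaction coefficient (0.0 means no interaction).
    Assumes a rare disease, so the joint distribution stays constant over time.
    """
    # Average relative hazard contributed by Z2 among subjects with Z1 = 1
    exposed = (pi[(1, 0)] + pi[(1, 1)] * math.exp(alpha2 + alpha3)) / (
        pi[(1, 0)] + pi[(1, 1)]
    )
    # Average relative hazard contributed by Z2 among subjects with Z1 = 0
    unexposed = (pi[(0, 0)] + pi[(0, 1)] * math.exp(alpha2)) / (
        pi[(0, 0)] + pi[(0, 1)]
    )
    return alpha1 + math.log(exposed / unexposed)


def attenuated_limit(alpha, var_z, var_eps):
    """Limit of the working-model coefficient under classical measurement error.

    Assumes Z and the error are normal with time-constant variances
    var_z and var_eps.
    """
    return alpha * var_z / (var_z + var_eps)


# Hypothetical joint distributions: independent, then positively associated.
pi_independent = {(0, 0): 0.42, (0, 1): 0.28, (1, 0): 0.18, (1, 1): 0.12}
pi_associated = {(0, 0): 0.5, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.3}

print(omitted_binary_limit(0.5, 1.0, pi_independent))  # close to alpha1
print(omitted_binary_limit(0.5, 1.0, pi_associated))   # larger than alpha1
print(attenuated_limit(1.0, 1.0, 1.0))                 # half of alpha
```

With independent covariates the logarithmic term vanishes, which matches the no-bias statement for independent $Z_1$ and $Z_2$; with associated covariates the limit moves away from $\alpha_1$. Because the same expression applies to both designs, the sketch says nothing about finite-sample differences between nested case-control and full cohort analyses.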

For the classical measurement error model, we derived the attenuation factor for both full cohort and nested case-control designs when measurements and errors are normally distributed and the distributions are constant over time; it is the same as the factor Pepe et al. (1989) calculated for full cohort designs.

Future work exists in the following areas:

• Theoretically derive the asymptotic distribution of the maximum partial likelihood estimator $\hat\beta$ for nested case-control designs with general model misspecifications. A robust variance estimator for nested case-control designs with general model misspecifications may be provided based on the asymptotic variance of $\hat\beta$.

• Calculate the asymptotic bias for other types of model misspecification of particular interest, to see if nested case-control designs still represent full cohort designs. For example, more than one covariate is omitted with only one or multiple covariates left in the working model. This has been investigated by Bretagnolle and Huber-Carol (1988) for full cohort designs. We anticipate that the same results apply to nested case-control designs. Both theoretical calculation and simulation approaches can be adopted.

• Use counting processes, martingale theorems and similar, but much more complicated, techniques to derive the asymptotic distribution of the maximum partial likelihood estimator for general sampling methods (not limited to the nested case-control design) with general model misspecifications. Borgan et al. (1992) develop the general asymptotic theory for the maximum partial likelihood estimator for general sampling methods when proportional hazards models are correctly specified.

References

Andersen, P. K. and Gill, R. D. (1982). Cox's regression model for counting processes: A large sample study. Ann. Statist. 10, 1100-1120.
Begg, M. D., and Lagakos, S. (1992). Effects of mismodeling on tests of association based on logistic regression models. The Annals of Statistics 20, 1929-1952.
Borgan, Ø., Goldstein, L., Langholz, B. (1993). Methods for the analysis of sampled cohort data in the Cox proportional hazards model. Technical Report 50/92. Department of Preventive Medicine, Biostatistics Division, University of Southern California.
Breslow, N. E. and Day, N. E. (1987). Statistical Methods in Cancer Research. Volume 2 - The Design and Analysis of Cohort Studies, IARC Scientific Publications, Vol. 82. International Agency for Research on Cancer, Lyon.
Breslow, N. E., Lubin, J. H., Marek, P., and Langholz, B. (1983). Multiplicative models and cohort analysis. J. Amer. Statist. Assoc. 78, 1-12.
Bretagnolle, J. and Huber-Carol, C. (1988). Effects of omitting covariates in Cox's model for survival data. Scand J Statist 15, 125-138.
Carroll, R. J. (1989). Covariate analysis in generalized linear measurement error models. Statistics in Medicine 8, 1075-1093.
Carroll, R. J., Spiegelman, C. H., Lan, K. K., Bailey, K. T., Abbott, R. D. (1984). On errors in variables for binary regression models. Biometrika 71, 19-25.
Carroll, R. J., Gallo, P. P., Gleser, L. J. (1985). Comparison of least squares and errors in variables regression with special reference to randomized analysis of covariance. Journal of the American Statistical Association 80, 929-932.
Chastang, C., Byar, D., Piantadosi, S. (1988). A quantitative study of the bias in estimating the treatment effect caused by omitting a balanced covariate in survival models. Statistics in Medicine 7, 1213-1255.
Cox, D. R. (1972). Regression models and life-tables (with discussion). J. Roy. Statist. Soc. B 34, 187-220.
Cox, D. R. (1975). Partial likelihood. Biometrika 62, 269-276.
Currie, J. B. (1971). Matched and unmatched: A comparison of two design, with epidemiologic data. Am J Epidemiol 93, 315-316.
Fuller, W. A. (1987). Measurement error models, Wiley, New York.
Gail, M. H. (1986). Adjusting for covariates that have the same distribution in exposed and unexposed cohorts, in Moolgavkar, S. H. and Prentice, R. L. (eds), Modern Statistical Methods in Chronic Disease Epidemiology, Wiley, New York, pp. 3-18.
Gail, M. H. (1988). The effect of pooling across strata in perfectly balanced studies. Biometrics 44, 151-162.
Gail, M. H. (1991). A bibliography and comments on the use of statistical models in epidemiology in the 1980's. Statistics in Medicine 10, 1819-1885.
Gail, M. H., Tan, W. Y., Piantadosi, S. (1988). Tests for no treatment effect in randomized clinical trials. Biometrika 75, 57-64.
Gail, M. H., Wieand, S., Piantadosi, S. (1984). Biased estimates of treatment effect in randomized experiments with nonlinear regressions and omitted covariates. Biometrika 71, 131-144.
Gill, R. D. (1984). Understanding Cox's regression model: A martingale approach. J. Amer. Statist. Assoc. 79, 441-447.
Goldstein, L. and Langholz, B. (1992). Asymptotic theory for nested case-control sampling in the Cox regression model. Ann. Statist. (to appear).
Greenland, S. (1980). The effect of misclassification in the presence of covariates. Am J Epidemiol 112, 561-569.
Hardy, R. J. and White, C. (1971). Matching in retrospective studies. Am J Epidemiol 93, 75-76.
Kupper, L. L. (1984). Effects of the use of unreliable surrogate variables on the validity of epidemiologic research studies. Am J Epidemiol 120, 643-648.
Lagakos, S. W. (1988a). Effects of mismodelling and mismeasuring explanatory variables on tests of their association with a response variable. Statistics in Medicine 7, 257-274.
Lagakos, S. W. (1988b). The loss in efficiency from misspecifying covariates in proportional hazards regression models. Biometrika 75, 156-160.
Lagakos, S. W. and Schoenfeld, D. A. (1984). Properties of proportional-hazards score tests under misspecified regression models. Biometrics 40, 1037-1048.
Lin, D. Y. and Wei, L. J. (1989). The robust inference for the Cox proportional hazards model. Journal of the American Statistical Association 84, 1074-1078.
Miettinen, O. S. (1970). Matching and design efficiency in retrospective studies. Am J Epidemiol 91, 111-118.
Morgan, T. M. (1986). Omitting covariates from the proportional hazards model. Biometrics 42, 993-995.
Newman, T. G. and Odell, P. L. (1971). The generation of random variates. Charles Griffin & Co. Ltd, London.
Pepe, M. S., Self, S. G., Prentice, R. L. (1989). Further results in covariate measurement error in cohort studies with time to response data. Statistics in Medicine 8, 1167-1178.
Prentice, R. L. (1982). Covariate measurement errors and parameter estimation in a failure time regression model. Biometrika 69, 331-342.
Seigel, D. G. and Greenhouse, S. W. (1973). Validity in estimating relative risk in case-control studies. J. Chron. Dis. 26, 219-225.
Solomon, P. J. (1984). Effect of misspecification of regression models in the analysis of survival data. Biometrika 71, 291-298.
Stefanski, L. A. and Carroll, R. J. (1985). Covariate measurement error in logistic regression. Annals of Statistics 13, 1335-1351.
Struthers, C. A., and Kalbfleisch, J. D. (1986). Misspecified proportional hazard models. Biometrika 73, 363-369.
Thomas, D. C. and Greenland, S. (1983). The relative efficiencies of matched and independent sample designs for case-control studies. J. Chron. Dis. 36, 685-697.
Ury, H. (1975). Efficiency of case-control studies with multiple controls per case: Continuous or dichotomous data. Biometrics 31, 643-649.
Weinberg, C. R. (1985). On pooling across strata when frequency matching has been followed in a cohort data. Biometrics 41, 117-127.
Linked assets
University of Southern California Dissertations and Theses
Asset Metadata
Core Title
00001.tif
Tag
OAI-PMH Harvest
Permanent Link (DOI)
https://doi.org/10.25549/usctheses-oUC11257361
Unique identifier
UC11257361
Legacy Identifier
9621652