Analysis of gene-environment interaction in lung cancer
(USC Thesis Other)
INFORMATION TO USERS

This manuscript has been reproduced from the microfilm master. UMI films the text directly from the original or copy submitted. Thus, some thesis and dissertation copies are in typewriter face, while others may be from any type of computer printer.

The quality of this reproduction is dependent upon the quality of the copy submitted. Broken or indistinct print, colored or poor quality illustrations and photographs, print bleedthrough, substandard margins, and improper alignment can adversely affect reproduction.

In the unlikely event that the author did not send UMI a complete manuscript and there are missing pages, these will be noted. Also, if unauthorized copyright material had to be removed, a note will indicate the deletion.

Oversize materials (e.g., maps, drawings, charts) are reproduced by sectioning the original, beginning at the upper left-hand corner and continuing from left to right in equal sections with small overlaps. Each original is also photographed in one exposure and is included in reduced form at the back of the book.

Photographs included in the original manuscript have been reproduced xerographically in this copy. Higher quality 6" x 9" black and white photographic prints are available for any photographs or illustrations appearing in this copy for an additional charge. Contact UMI directly to order.

A Bell & Howell Information Company
300 North Zeeb Road, Ann Arbor, MI 48106-1346 USA
313/761-4700 800/521-0600

Reproduced with permission of the copyright owner. Further reproduction prohibited without permission.

ANALYSIS OF GENE-ENVIRONMENT INTERACTION IN LUNG CANCER

by

John Leo Morrison

A Thesis Presented to the FACULTY OF THE GRADUATE SCHOOL, UNIVERSITY OF SOUTHERN CALIFORNIA, In Partial Fulfillment of the Requirements for the Degree MASTER OF SCIENCE (Biometry)

May 1997
UMI Number: 1384913

Copyright 1997 by Morrison, John Leo. All rights reserved.

UMI Microform 1384913. Copyright 1997, by UMI Company. All rights reserved. This microform edition is protected against unauthorized copying under Title 17, United States Code.

UMI, 300 North Zeeb Road, Ann Arbor, MI 48103

UNIVERSITY OF SOUTHERN CALIFORNIA
THE GRADUATE SCHOOL
UNIVERSITY PARK
LOS ANGELES, CALIFORNIA 90007

This thesis, written by John L. Morrison under the direction of his Thesis Committee, and approved by all its members, has been presented to and accepted by the Dean of The Graduate School, in partial fulfillment of the requirements for the degree of Master of Science.

Date: February 13, 1997

THESIS COMMITTEE

John Leo Morrison
William James Gauderman, Jr.

Analysis of Gene-Environment Interaction in Lung Cancer

The Louisiana Lung Cancer Dataset, consisting of 337 extended pedigrees, is analyzed to determine whether a major Mendelian gene interacts with cumulative tobacco smoking (pack-years). The proportional hazards model is used because it is a natural framework for estimating relative risks while adjusting for variability in age of disease onset. Segregation analyses show evidence that a Mendelian gene is segregating in these families, with the most parsimonious model including sex, pack-years, pack-years squared, and a dominant major gene. The estimated frequency of the high-risk allele is 2%, and carriers are estimated to have a relative risk of 17.3 for developing lung cancer, compared to non-carriers. There was no significant evidence for a gene-smoking interaction, either by direct modeling or by stratifying the pedigrees based on a surrogate for smoking exposure.
DEDICATION

To my parents who always believed in the value of education and whose love and support made this possible.

ACKNOWLEDGMENTS

I would like to express my sincerest appreciation to the Chairman of my committee, Dr. William James Gauderman Jr., for his support and guidance in the development of this thesis. I would also like to thank Dr. Duncan Thomas for his guidance, encouragement, and the sharing of his knowledge of the subject. Thanks to Dr. Jonathan Buckley for the privilege of working on the development of G.A.P., which added greatly to my understanding of genetic analysis. Special thanks to Dr. Stanley Azen for his support throughout my entire graduate training. Thanks to Dr. Henry Rothschild for providing me access to the Louisiana Lung Cancer Dataset. Finally, I would like to thank Dr. Cheryl Faucett for her programming in G.A.P. and Dr. Catherine Carpenter for her knowledge of lung cancer.

TABLE OF CONTENTS

DEDICATION
ACKNOWLEDGMENTS
LIST OF TABLES
Chapter
I. INTRODUCTION
II. MATERIALS AND METHODS
III. RESULTS
IV. DISCUSSION
REFERENCES

LIST OF TABLES

Table
I. Summary of model parameters
II. Descriptive Statistics of the LLCD
III. Segregation analysis results without gene-smoking interaction in the model, all families
IV. Segregation analysis results without gene-smoking interaction in the model, early onset proband families
V. Segregation analysis results without gene-smoking interaction in the model, late onset proband families
VI. Parameter estimates and confidence intervals for the dominant model for early and late onset proband families
VII. Segregation analysis results with gene-smoking interaction in the model

CHAPTER I

INTRODUCTION

Primary lung cancer represents roughly 15% of all cancer cases in the United States (22% in males, 8% in females). However, it accounts for 25% of all cancer-related deaths, due to its high fatality rate (34% in males, 15% in females). Case-control studies in the 1950s indicated a link between cigarette smoking and lung cancer (Levin et al., 1950, Doll and Hill, 1952). Subsequent large cohort studies confirmed this link, reporting relative risks of 9 to 14 for currently smoking males compared to nonsmokers (Kahn, 1966, Hammond, 1972, Doll and Peto, 1976), and 2.2 for currently smoking females (Hammond, 1972). All of these studies found a strong dose-response relationship.

In addition to smoking, occupational exposures have also been found to increase lung cancer risk. Hammond et al. (1979) found relative risks due to asbestos exposure of 5 for exposed nonsmokers and 50 for exposed smokers, compared to unexposed nonsmokers. Other occupational risk factors for lung cancer include radon, mustard gas, chromium, nickel, and rubber.
Environmental factors suspected to increase lung cancer risk include air pollution, radiation, and a deficiency of vitamin A in the diet (Fraumeni and Blot, 1982). Residential radon has also been implicated as a risk factor, showing evidence for interaction with smoking (Pershagen et al., 1994).

Evidence is mounting that host factors also play an important role in the risk of lung cancer. A genetic component was implicated by Tokuhata and Lilienfeld (1963), who reported significant familial clustering of the disease. Wu et al. (1988) also found a significant effect of family history after controlling for other risk factors, including smoking. Using frailty models, Mack et al. (1990) reanalyzed the data of Wu et al. and concluded that the observed familial clustering could not be fully explained by familial aggregation of smoking. Using the Louisiana Lung Cancer Dataset (LLCD, described below), Ooi et al. (1986) compared family members of 337 deceased lung cancer cases (probands) with family members of their spouses and found that, even after controlling for age, sex, cigarette smoking, and occupational exposures, relatives of cases were at increased risk (RR=2.4) for developing lung cancer.

Sellers et al. (1990) analyzed the LLCD and found evidence of a major gene segregating in the families of the 337 probands, after adjusting for smoking and variability in age at onset. In a subsequent analysis, Sellers et al. (1992) divided the 337 pedigrees into two groups based on whether or not the proband was at least 60 years old. The rationale for these two groups was that probands with an age of onset below 60 were born after World War I, when smoking in the U.S. rose quickly. Thus, the probands with an age of onset less than 60 were more likely to smoke than probands with an age of onset of 60 or greater.
They found different segregation patterns in the two groups and suggested that this was due to differences in smoking prevalence between the two cohorts (with younger probands and their relatives more likely to smoke). Although this analysis indicated a possible gene-smoking interaction, no direct modeling of such an effect was performed.

In this thesis, the LLCD is modeled using an extension of Cox's proportional hazards model, as it provides a natural framework for estimating relative risks while adjusting for variable age of onset. All models include smoking history and gender as covariates. A segregation analysis with no gene-smoking interaction is performed on the LLCD. The LLCD is then divided into two groups based on whether or not the proband is at least 60 years old. Segregation analyses with no gene-smoking interaction are performed on these two groups. Finally, a segregation analysis with a gene-smoking interaction term is performed on the entire LLCD.

CHAPTER II

MATERIALS AND METHODS

Louisiana Lung Cancer Dataset

White persons who died of lung cancer during a four-year period (1976-1979) in ten Louisiana parishes were identified (n=440) from the Office of Public Health Statistics, Louisiana Health and Human Resources Administration, New Orleans. Trained interviewers obtained information on each proband's parents, siblings, half-siblings, spouses, and offspring. Data were collected from each family member by telephone, or when not possible (29%), by mailed questionnaire or in-person interview. Adequate 2- or 3-generation pedigree data were collected for 337 (76%) of the 440 identified probands.
The remaining families were excluded due to a lack of information on names, telephone numbers, and addresses (15%), incomplete data on all pedigree members (3%), or refusals to participate (6%). Lung cancer affection status in non-probands was verified by comparison with a random sample of death certificates (70%), or by corroboration of additional family members. The age at death, or the current age for those living, was also obtained. Information on occupation, smoking status, packs of cigarettes smoked per day, and the total number of years of cigarette consumption was collected. A quantity denoted pack-years was calculated as packs per day times years of consumption. For use in modeling, pack-years was standardized by subtracting its mean (22.9) and dividing by its standard deviation (10.0). Also computed was a cumulative index of occupation exposure.

Models

I analyze the 337 proband-pedigrees, consisting of a total of 4,356 subjects. Pedigree sizes range from 4 to 28 members. The outcome for each subject $i$ includes a disease indicator ($d_i$) and an age of onset ($t_i$), which is the age at death for diseased subjects or the last known disease-free age for unaffected subjects.

To account for variability in age of onset and right censoring, the standard epidemiological approach of modeling on the hazard scale is adopted. Cox's proportional hazards model is utilized to express the age-specific incidence rate as a function of observed covariates ($z$), an unobserved major gene ($g$), and their interaction(s), i.e.

$$\lambda(t) = \lambda_0(t)\, e^{\beta z + \gamma G + \eta G z} \qquad (1)$$

The function $\lambda_0(t)$ describes the baseline dependence of disease risk on age, $Gz$ represents an interaction between the major gene and one or more of the measured covariates, and $\beta$, $\gamma$, and $\eta$ are regression coefficients to be estimated.
The disease gene (A) is assumed to be diallelic and in Hardy-Weinberg equilibrium with allele frequency $q_A$ to be estimated. Each individual's genotype $g$ is mapped into a "genetic risk" covariate $G$ on the range [0,1] depending on the mode of inheritance. For example, under a dominant mode of inheritance, $G=1$ for genotypes 'Aa' and 'AA', and $G=0$ for genotype 'aa'.

In a preliminary analysis of non-probands only, we used standard Cox regression treating subjects as independent to determine which measured covariates to include in the model. The model that produced the best fit included pack-years, pack-years squared, and sex. The cumulative index of occupation exposure and a dichotomous ever/never smoked variable were not significantly associated with lung cancer risk, given the other variables included in the model. Also, there was not a statistically significant interaction between sex and pack-years. We tested the proportional hazards assumption for each significant variable by adding age-covariate interactions to the above model (see Kalbfleisch and Prentice, 1980). Neither pack-years, pack-years squared, nor sex showed significant deviations from proportionality.

Since a particular parametric form for $\lambda_0(t)$ is difficult to specify for lung cancer, I model it as a step function on a predefined set of 5 age intervals, based on the cutpoints $t_{(1)} = 50$, $t_{(2)} = 60$, $t_{(3)} = 70$, and $t_{(4)} = 80$. The baseline hazard is then expressed as

$$\lambda_0(t) = \lambda_k \quad \text{for } t_{(k-1)} < t < t_{(k)},\; k = 1, \ldots, 5 \qquad (2)$$

with $t_{(0)} = 30$ and $t_{(5)} = 100$. In this dataset, there were no lung cancers in subjects younger than 30, so $\lambda_0(t)$ was set to zero for $t < 30$. Table I summarizes all of the model parameters. For notational simplicity, the set of hazard model parameters will be denoted by $\Omega = \{\beta, \gamma, \eta, \lambda_1, \ldots, \lambda_5\}$.
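As a rough illustration of Equations 1 and 2, the following Python sketch evaluates the hazard for one subject. It is not the G.A.P. implementation. The coefficients are copied from the dominant-model column of Table III, while the baseline rates in `LAMBDAS` are hypothetical placeholders (the thesis does not report them in this section). The function names, the handling of ages exactly on a cutpoint, and the behaviour above age 100 are all assumptions made for the example.

```python
import math

# Cutpoints t_(0), ..., t_(5) from the text (ages in years).
CUTPOINTS = [30.0, 50.0, 60.0, 70.0, 80.0, 100.0]
MEAN_PY, SD_PY = 22.9, 10.0  # standardization constants given in the text


def genetic_covariate(genotype, mode="dominant"):
    """Map a genotype ('aa', 'Aa', 'AA') to the risk covariate G."""
    copies = genotype.count("A")  # number of high-risk alleles
    if mode == "dominant":
        return 1.0 if copies >= 1 else 0.0
    if mode == "recessive":
        return 1.0 if copies == 2 else 0.0
    raise ValueError("this sketch only handles dominant and recessive coding")


def baseline_hazard(age, lambdas):
    """Step-function baseline hazard of Equation 2; zero before age 30."""
    if age < CUTPOINTS[0]:
        return 0.0
    for k in range(1, len(CUTPOINTS)):
        if age <= CUTPOINTS[k]:  # assumption: a cutpoint belongs to the lower interval
            return lambdas[k - 1]
    return lambdas[-1]  # assumption: carry the last step beyond age 100


def hazard(age, male, pack_years, genotype, coef, lambdas, mode="dominant"):
    """Equation 1 with covariates sex, pack-years, pack-years squared."""
    z_py = (pack_years - MEAN_PY) / SD_PY
    G = genetic_covariate(genotype, mode)
    log_rr = (coef["male"] * male + coef["py"] * z_py + coef["py2"] * z_py ** 2
              + coef["gamma"] * G + coef["eta"] * G * z_py)
    return baseline_hazard(age, lambdas) * math.exp(log_rr)


# Dominant-model estimates from Table III; eta = 0 means no interaction.
COEF = {"male": 0.555, "py": 0.426, "py2": -0.047, "gamma": 2.853, "eta": 0.0}
# Hypothetical baseline rates per year, NOT taken from the thesis.
LAMBDAS = [1e-5, 5e-5, 1e-4, 2e-4, 3e-4]

carrier = hazard(65, 1, 40, "Aa", COEF, LAMBDAS)
noncarrier = hazard(65, 1, 40, "aa", COEF, LAMBDAS)
print(carrier / noncarrier)  # equals exp(gamma) when eta is zero
```

Because the baseline hazard cancels in the ratio, the carrier to non-carrier relative risk does not depend on the placeholder rates.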
The appropriate penetrance function for a given individual based on the proportional hazards model is given by

$$f(d_i, t_i \mid g_i, z_i, \Omega) = [\lambda(t_i)]^{d_i}\, S(t_i) \qquad (3)$$

where $S(t_i) = \exp\left(-\int_0^{t_i} \lambda(s)\,ds\right)$ is the survival function, i.e. the probability of surviving disease free to age $t_i$. The likelihood of the parameters for each family ($f = 1, \ldots, 337$), uncorrected for ascertainment, is given by

$$L_f(d, t \mid z, \Omega, q_A, \tau) = \sum_{g} \Pr(d, t \mid g, z, \Omega)\, \Pr(g \mid q_A, \tau) \qquad (4)$$

where the first factor is the product over all subjects in the family of the penetrance probability in Equation 3 and the second factor is the genotype vector probability, which is a function of the allele frequency $q_A$ for founders and the transmission probabilities $\tau = (\tau_{AA}, \tau_{Aa}, \tau_{aa})$ for nonfounders.

To account for sampling of pedigrees via diseased probands, a correction for single ascertainment must be made. This is achieved by dividing the above likelihood by $L_{A_f}$, the probability that the proband was affected at his/her age of onset (or age of examination), i.e.

$$L_{A_f}(d_p, t_p \mid z_p, \Omega, q_A, \tau) = \sum_{g_p} \Pr(d_p, t_p \mid g_p, z_p, \Omega)\, \Pr(g_p \mid q_A, \tau) \qquad (5)$$

where $p$ indicates the value for the proband. The corrected likelihood is then given by $L = \prod_f (L_f / L_{A_f})$.

Segregation analysis was performed by first fitting the general "ousiotype" model (Cannings et al., 1978) and several nested alternatives using likelihood ratio tests. These alternatives included Mendelian codominant, dominant, and recessive models, the sporadic (no-major-gene) model, and the pure environmental model (i.e. a single discrete type with frequency $q_A$, independent and identically distributed among all subjects). Mendelian inheritance is supported when one or more of the Mendelian models cannot be statistically rejected while the sporadic and pure environmental models are rejected.
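To make Equations 3 and 5 concrete, here is a minimal Python sketch of the penetrance and of the ascertainment term for a single proband under a dominant gene. It is an illustration under stated assumptions, not the thesis software: covariates are treated as fixed over age (so the cumulative hazard is the relative risk times the cumulative baseline), the proband's genotype prior is taken from Hardy-Weinberg proportions, the baseline rates in `LAMBDAS` are hypothetical, and all function names are invented for the example.

```python
import math

CUTPOINTS = [30.0, 50.0, 60.0, 70.0, 80.0, 100.0]  # t_(0), ..., t_(5)


def baseline_at(age, lambdas):
    """Step-function baseline hazard; zero before age 30."""
    if age < CUTPOINTS[0]:
        return 0.0
    for k in range(1, len(CUTPOINTS)):
        if age <= CUTPOINTS[k]:
            return lambdas[k - 1]
    return lambdas[-1]  # assumption: carry the last step beyond age 100


def cumulative_baseline(age, lambdas):
    """Integral of the baseline hazard from 0 to `age` (capped at age 100)."""
    total = 0.0
    for k in range(1, len(CUTPOINTS)):
        lo, hi = CUTPOINTS[k - 1], CUTPOINTS[k]
        if age <= lo:
            break
        total += lambdas[k - 1] * (min(age, hi) - lo)
    return total


def penetrance(d, age, rel_risk, lambdas):
    """Equation 3: [lambda(t)]^d * S(t), with lambda(t) = lambda_0(t) * rel_risk."""
    lam = baseline_at(age, lambdas) * rel_risk
    surv = math.exp(-cumulative_baseline(age, lambdas) * rel_risk)
    return (lam ** d) * surv


def hardy_weinberg(q):
    """Genotype probabilities for allele frequency q of the high-risk allele A."""
    return {"aa": (1 - q) ** 2, "Aa": 2 * q * (1 - q), "AA": q ** 2}


def proband_likelihood(d, age, covariate_rr, gamma, q, lambdas):
    """Equation 5 for a dominant gene: sum over the proband's genotypes."""
    total = 0.0
    for genotype, prob in hardy_weinberg(q).items():
        G = 1.0 if "A" in genotype else 0.0  # dominant coding
        rr = covariate_rr * math.exp(gamma * G)
        total += penetrance(d, age, rr, lambdas) * prob
    return total


# Hypothetical baseline rates per year, NOT taken from the thesis.
LAMBDAS = [1e-5, 5e-5, 1e-4, 2e-4, 3e-4]
print(proband_likelihood(1, 65.0, 1.0, 2.853, 0.024, LAMBDAS))
```

Dividing a family likelihood (Equation 4) by this quantity would give that family's ascertainment-corrected contribution. The full family likelihood also requires summing over joint genotype vectors with the transmission probabilities, which this sketch does not attempt.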
For any subject with missing data, the penetrance in Equation 3 was set to one for all genotypes, yielding a "complete case" analysis. Table I lists the parameters in the models. All analyses were performed using the Genetic Analysis Package (G.A.P., 1997).

The LLCD was divided into two subsets based on the age of onset for the proband in each family. Families where the age of onset was less than 60 formed the early-onset proband families. This subset consisted of 106 families with 1,349 subjects. Families where the age of onset of the proband was at least 60 formed the late-onset proband families. This subset consisted of 231 families with 3,007 subjects. Table II shows the descriptive statistics of the LLCD and the two subsets.

Table I
Summary of model parameters

Parameter                          Description
$\beta_1$                          Log relative risk for males, compared to females
$\beta_2$                          Log relative risk per unit of standardized pack-years {(pack-years - 22.9)/10}
$\beta_3$                          Log relative risk per unit of squared standardized pack-years
$\gamma_g$                         Log relative risk for carriers of type g (AA and/or Aa), compared to wild type (aa)
$\eta_{g \times py}$               Difference in log relative risk per unit of standardized pack-years for type g, compared to wild type
$\lambda_1, \ldots, \lambda_5$     Baseline hazards for age intervals 30-50, 50-60, 60-70, 70-80, 80-100, respectively, i.e. age-specific disease rates for wild-type females with 22.9 pack-years
$q_A$                              Population frequency of high-risk allele A
$\tau_g$                           Probability that a parent of type g (g=AA, Aa, or aa) transmits A to an offspring
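As a small worked example of how the smoking parameters in Table I combine, the Python sketch below computes the relative risk between two pack-year levels from $\beta_2$ and $\beta_3$. The coefficient values are the dominant-model estimates reported in Table III, and the helper names are invented for illustration. Because the quadratic term is defined on the standardized scale, both exposure levels must be standardized before differencing.

```python
import math

MEAN_PY, SD_PY = 22.9, 10.0  # standardization constants from Table I


def standardize(pack_years):
    """Standardized pack-years: (pack-years - 22.9) / 10."""
    return (pack_years - MEAN_PY) / SD_PY


def smoking_log_rr(py_exposed, py_reference, beta_py, beta_py2):
    """Log relative risk between two pack-year levels under the quadratic model."""
    z1, z0 = standardize(py_exposed), standardize(py_reference)
    return beta_py * (z1 - z0) + beta_py2 * (z1 ** 2 - z0 ** 2)


# Dominant-model estimates from Table III.
BETA_PY, BETA_PY2 = 0.426, -0.047

# Average smoker (23 pack-years) versus a non-smoker (0 pack-years).
rr = math.exp(smoking_log_rr(23.0, 0.0, BETA_PY, BETA_PY2))
print(round(rr, 1))
```

Under these inputs the ratio comes out near 3.4, in line with the figure quoted in the Results for a person who has smoked the average amount. Note that the negative $\beta_3$ raises this particular contrast, because a non-smoker sits 2.29 standard units below the mean.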
Table II
Descriptive Statistics of the LLCD

                                  All              Early onset        Late onset
                                  families         proband families   proband families
Families                          337              106                231
Subjects                          4,356            1,349              3,007
Subjects with complete data^a     3,336 (76.6)     1,035 (76.7)       2,301 (76.5)
Males^b                           1,662 (49.8)     535 (51.7)         1,127 (49.0)
Cases^b                           412 (12.3)       132 (12.8)         280 (12.2)
Nonproband cases^b                91 (3.0)         29 (3.0)           62 (3.0)
Age of onset^c                    56.3 (17.7)      47.9 (17.4)        60.3 (16.6)
Pack-years^c                      23.0 (30.9)      21.3 (27.9)        23.9 (32.2)

a Count (percent of total subjects)
b Count (percent of subjects with complete data)
c Mean (standard deviation)

CHAPTER III

RESULTS

Segregation analysis by maximum likelihood was performed four times. The first three analyses did not include a gene by smoking interaction in the penetrance function. These analyses were performed on the complete LLCD, the early-onset proband families only, and the late-onset proband families only. The final analysis was performed on the complete LLCD with a gene by smoking interaction term in the penetrance function.

Table III shows the results for the segregation analysis on the complete LLCD without a gene by smoking interaction in the penetrance function. Both the environmental and sporadic models could be rejected, but none of the Mendelian alternatives could be rejected. The dominant model is the most parsimonious, yielding the lowest AIC. The estimated allele frequency from this model is 0.024, and the relative risk for carriers versus non-carriers is exp(2.85) = 17.33. The estimated risk for males is 1.7 times the risk for females, even after adjusting for smoking. Based on the estimates of $\beta_2$ and $\beta_3$, a person who has smoked the average amount (23 pack-years) has 3.4 times the lung cancer risk of a non-smoker.

Table IV shows the segregation analysis on the early age of onset families without a gene-smoking interaction in the model.
Neither the sporadic nor the environmental model could be rejected. Among the Mendelian models, the dominant model yields the lowest AIC. The estimated allele frequency from the dominant model is 0.105, and the relative risk for carriers versus non-carriers is exp(3.627) = 37.60.

Table III
Segregation analysis results without gene-smoking interaction in the model
All families (n=337)

                                               Hypothesis
                        ------------ Mendelian -------------
Parameter               Dominant    Recessive   Codominant  Sporadic    Environmental  General
$\beta_1$ (male)        0.555       0.587       0.555       0.581       0.581          0.558
$\beta_2$ (py)          0.426       0.409       0.426       0.360       0.360          0.436
$\beta_3$ (py^2)        -0.047      -0.043      -0.047      -0.036      -0.036         -0.048
$\gamma_{AA}$           2.853       2.743       2.212       0.000^a     0.000^e        2.096
$\gamma_{Aa}$           2.853^b     0.000^a     2.870       0.000^a     0.000^e        0.026
$q_A$                   0.024       0.297       0.024       -           1.000          0.026
$\tau_{AA}$             1.000^a     1.000^a     1.000^a     -           1.000^b        0.547
$\tau_{Aa}$             0.500^a     0.500^a     0.500^a     -           1.000^b        0.451
$\tau_{aa}$             0.000^a     0.000^a     0.000^a     -           1.000^b        0.000
$\chi^2$ (df)^c         0.31 (4)    5.08 (4)    0.28 (3)    14.36 (6)   14.36 (6)      -
p-value                 0.99        0.28        0.96        0.026       0.026          -
AIC^d                   1240.0      1244.8      1242.0      1250.1      1250.1         1247.7

a Parameter value is fixed
b Parameter value is constrained to equal preceding parameter value
c Likelihood-ratio chi-squared (degrees of freedom) testing the null hypothesis of no difference from the general model
d AIC = -2(Log likelihood) + 2(Number of free parameters)
e The genetic risk parameters converged to zero, thus the environmental model is equivalent to the sporadic model

Table IV
Segregation analysis results without gene-smoking interaction in the model
Early-onset proband families (n=106)

                                               Hypothesis
                        ------------ Mendelian -------------
Parameter               Dominant    Recessive   Codominant  Sporadic    Environmental  General
$\beta_1$ (male)        0.882       1.047       0.902       0.994       1.017          0.678
$\beta_2$ (py)          0.580       0.542       0.640       0.498       0.523          0.737
$\beta_3$ (py^2)        -0.059      -0.053      -0.072      -0.051      -0.055         -0.070
$\gamma_{AA}$           3.627       3.132       5.625       0.000^a     -67.15         3.679
$\gamma_{Aa}$           3.627^b     0.000^a     3.285       0.000^a     -56.68         7.168
$q_A$                   0.105       0.102       0.060       -           0.125          0.073
$\tau_{AA}$             1.000^a     1.000^a     1.000^a     -           0.125^b        0.000
$\tau_{Aa}$             0.500^a     0.500^a     0.500^a     -           0.125^b        0.382
$\tau_{aa}$             0.000^a     0.000^a     0.000^a     -           0.125^b        0.009
$\chi^2$ (df)^c         2.37 (4)    2.66 (4)    1.80 (3)    3.96 (6)    3.92 (3)       -
p-value                 0.67        0.62        0.62        0.68        0.27           -
AIC^d                   348.07      348.36      349.49      355.62      351.62         353.70

a Parameter value is fixed
b Parameter value is constrained to equal preceding parameter value
c Likelihood-ratio chi-squared (degrees of freedom) testing the null hypothesis of no difference from the general model
d AIC = -2(Log likelihood) + 2(Number of free parameters)

Table V shows the segregation analysis on the late age of onset families without a gene-smoking interaction in the model. Neither the sporadic nor the environmental model could be rejected. Among all the models, the dominant model was the most parsimonious, yielding the lowest AIC. The estimated allele frequency from this model is 0.022, and the relative risk for carriers versus non-carriers is exp(2.64) = 14.01.

Table VI shows the parameter estimates with 95% confidence intervals for the dominant model fitted to both the early and late age of onset families.
The confidence intervals for each parameter overlap, indicating that there is not a significant difference between the two groups. The 10-degree-of-freedom likelihood ratio test for a difference in the two groups gives $\chi^2 = 7.83$, corresponding to a non-significant p-value of 0.65.

Table VII shows the segregation analysis when a gene by pack-years interaction is included in the penetrance function. Once again, the dominant model fits the data best, and both non-Mendelian alternatives are rejected. The estimated interaction log-relative risk from the dominant model is $\eta = 0.024$ with a standard error of 0.07. Comparing the dominant models from Tables III and VII, the 1-degree-of-freedom likelihood ratio test for the addition of the interaction term gives $\chi^2 = 0.13$, corresponding to a non-significant p-value of 0.74. The estimates of the allele frequency and the main effects of the gene, smoking, and sex in the dominant model are virtually unchanged with the addition of the interaction term.
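The likelihood ratio tests quoted above can be checked with a short, standard-library-only Python sketch. The function `chi2_sf` is a hand-written upper-tail probability for integer degrees of freedom (an assumption made to avoid external packages; a statistics library would normally be used), and `lr_from_aic` simply inverts the AIC definition given in the table footnotes. Both names are invented for this example.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-squared variable with integer df."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Closed form for even df: exp(-x/2) * sum_{k < df/2} (x/2)^k / k!
        term, total = 1.0, 1.0
        for k in range(1, df // 2):
            term *= half / k
            total += term
        return math.exp(-half) * total
    # Odd df: start from df = 1 and add one correction term per two extra df.
    total = math.erfc(math.sqrt(half))
    term = math.sqrt(half) / math.gamma(1.5)
    for k in range(1, (df - 1) // 2 + 1):
        total += math.exp(-half) * term
        term *= half / (k + 0.5)
    return total


def lr_from_aic(aic_reduced, aic_full, extra_params):
    """Likelihood-ratio statistic implied by two AIC values for nested models."""
    return aic_reduced - aic_full + 2 * extra_params


# 10-df test for a difference between early- and late-onset families.
print(round(chi2_sf(7.83, 10), 2))
# Sporadic versus general model in Table III (6 df).
print(round(chi2_sf(14.36, 6), 3))
# Statistic implied by the dominant-model AICs in Tables III and VII.
print(round(lr_from_aic(1240.0, 1241.9, 1), 2))
```

With the tabulated inputs, the first two calls give values close to the reported 0.65 and 0.026. The AIC-implied statistic is about 0.1, which agrees with the reported interaction test only up to the one-decimal rounding of the AIC values.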
Table V
Segregation analysis results without gene-smoking interaction in the model
Late-onset proband families (n=231)

                                               Hypothesis
                        ------------ Mendelian -------------
Parameter               Dominant    Recessive   Codominant  Sporadic    Environmental  General
$\beta_1$ (male)        0.443       0.474       0.442       0.446       0.445          0.431
$\beta_2$ (py)          0.359       0.354       0.359       0.313       0.313          0.346
$\beta_3$ (py^2)        -0.040      -0.034      -0.041      -0.032      -0.032         -0.038
$\gamma_{AA}$           2.639       2.997       1.724       0.000^a     0.000          1.975
$\gamma_{Aa}$           2.639^b     0.000^a     2.870       0.000^a     0.000          2.466
$q_A$                   0.022       0.262       0.023       -           0.999          0.022
$\tau_{AA}$             1.000^a     1.000^a     1.000^a     -           0.999^b        0.690
$\tau_{Aa}$             0.500^a     0.500^a     0.500^a     -           0.999^b        0.640
$\tau_{aa}$             0.000^a     0.000^a     0.000^a     -           0.999^b        0.000
$\chi^2$ (df)^c         0.33 (4)    0.55 (4)    0.30 (3)    7.50 (6)    7.50 (3)       -
p-value                 0.99        0.97        0.96        0.28        0.06           -
AIC^d                   894.1       894.3       896.1       897.3       903.3          901.8

a Parameter value is fixed
b Parameter value is constrained to equal preceding parameter value
c Likelihood-ratio chi-squared (degrees of freedom) testing the null hypothesis of no difference from the general model
d AIC = -2(Log likelihood) + 2(Number of free parameters)

Table VI
Parameter estimates and confidence intervals for the dominant model for early and late onset proband families

                        Early onset proband families    Late onset proband families
Parameter               Estimate    95% CI              Estimate    95% CI
$\beta_1$ (male)        0.88        (-0.19, 1.95)       0.44        (-0.18, 1.02)
$\beta_2$ (py)          0.58        (0.28, 0.88)        0.36        (0.20, 0.51)
$\beta_3$ (py^2)        -0.06       (-0.10, -0.02)      -0.04       (-0.06, -0.02)
$\gamma_{AA}$           3.63        (1.12, 6.10)        2.64        (1.60, 3.67)
$q_A$                   0.10        (0.00, 0.74)        0.02        (0.00, 0.10)
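The entries in Table VI are on the log relative risk scale. As a small illustration, the Python sketch below exponentiates an estimate and its interval to the relative risk scale and checks whether the early- and late-onset intervals overlap. The dictionary layout and function names are invented for the example, and interval overlap is only an informal screen; the formal comparison in the text is the 10-degree-of-freedom likelihood ratio test.

```python
import math

# (estimate, lower, upper) on the log scale, copied from Table VI.
TABLE_VI = {
    "early": {"male": (0.88, -0.19, 1.95), "gamma": (3.63, 1.12, 6.10)},
    "late": {"male": (0.44, -0.18, 1.02), "gamma": (2.64, 1.60, 3.67)},
}


def to_relative_risk(estimate, lower, upper):
    """Exponentiate a log relative risk and its confidence limits."""
    return tuple(math.exp(v) for v in (estimate, lower, upper))


def intervals_overlap(a, b):
    """True when two (estimate, lower, upper) intervals share any value."""
    return a[1] <= b[2] and b[1] <= a[2]


for group in ("early", "late"):
    rr, lo, hi = to_relative_risk(*TABLE_VI[group]["gamma"])
    print(group, round(rr, 1), round(lo, 1), round(hi, 1))

print(intervals_overlap(TABLE_VI["early"]["gamma"], TABLE_VI["late"]["gamma"]))
```

The wide early-onset interval for carriers reflects the smaller number of families in that subset (106 versus 231).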
Table VII
Segregation analysis results with gene-smoking interaction in the model
All families (n=337)

                                               Hypothesis
                        ------------ Mendelian -------------
Parameter               Dominant    Recessive   Codominant  Sporadic    Environmental  General
$\beta_1$ (male)        0.553       0.591       0.539       0.581       0.609          0.538
$\beta_2$ (py)          0.409       0.422       0.458       0.360       1.996          0.468
$\beta_3$ (py^2)        -0.047      -0.044      -0.051      -0.036      -3.679         -0.052
$\gamma_{AA}$           2.792       2.798       -16.85      0.000^a     4.409          -17.12
$\gamma_{Aa}$           2.792^b     0.000^a     2.935       0.000^a     -5.780         3.017
$\eta_{AA \times py}$   0.024       0.591       0.539       0.000^a     0.609          0.538
$\eta_{Aa \times py}$   0.024^b     0.000^a     -0.018^a    0.000^a     0.760          -0.021
$q_A$                   0.025       0.287       0.023       -           0.551          0.025
$\tau_{AA}$             1.000^a     1.000^a     1.000^a     -           1.000^b        0.951
$\tau_{Aa}$             0.500^a     0.500^a     0.500^a     -           1.000^b        0.453
$\tau_{aa}$             0.000^a     0.000^a     0.000^a     -           1.000^b        0.000
$\chi^2$ (df)^c         3.25 (5)    8.12 (5)    0.09 (3)    17.4 (6)    12.1 (6)       -
p-value                 0.66        0.15        0.99        0.026       0.007          -
AIC^d                   1241.9      1246.8      1242.7      1250.1      1254.7         1248.6

a Parameter value is fixed
b Parameter value is constrained to equal preceding parameter value
c Likelihood-ratio chi-squared (degrees of freedom) testing the null hypothesis of no difference from the general model
d AIC = -2(Log likelihood) + 2(Number of free parameters)

A five-step baseline hazard was used in the previous analyses, which assumes the disease rates are homogeneous across fairly large age intervals (see Table I). To check the validity of this assumption, we refit the dominant model with interaction, defining the baseline hazard over 20 2-year intervals from age 40 to 80. Since there are no diseased subjects less than 30, only one non-proband lung cancer between 30 and 40, and not many subjects older than 80, we kept these intervals broad. The estimated log-relative risks due to sex, pack-years, and pack-years squared were essentially unchanged from the dominant model results shown in Table III.
The log-relative risk for gene-carriers was 2.93 from this model with allele frequency of 0.020, not substantially different from the previous analysis.

CHAPTER IV

DISCUSSION

The results from these analyses support previous findings that a major gene plays an important role in lung cancer risk. An additional finding not previously observed is that there is no apparent interaction between the putative lung cancer gene and smoking. The most parsimonious model included a dominant major gene, sex, pack-years of tobacco smoking, and the square of pack-years.

The results obtained differ significantly from the results of Sellers et al. (1990). They concluded that the most parsimonious model included a codominant major gene when the entire LLCD or the early-onset proband families were analyzed, while the dominant model fit best for the late-onset proband families. For all the models fit in this thesis, the Mendelian model with the lowest AIC is the dominant model. A possible cause for this difference is the penetrance model used. They assumed that age of onset of lung cancer for susceptible individuals was distributed logistically with a density that depended on a major gene and pack-years (but not sex). They also included a parameter for lifetime susceptibility. This differs significantly from the proportional hazards model assumed here. Proportional hazards is a standard tool used by epidemiologists, but there may be other models that are more biologically plausible.

For individuals with complete smoking data, pack-years of tobacco consumption is a variable that is subject to substantial measurement error.
Each subject's value was obtained as the product of packs per day and years of consumption, a quantity that does not take into account when in calendar time the tobacco was consumed (i.e. the subject may have quit 10 years ago or may still be smoking) and does not quantify variation in the level of consumption over time. Additionally, under- or over-reporting of either packs per day or years of consumption could lead to a large error in pack-years. Nondifferential random error in a measured covariate has been shown to bias regression coefficients toward zero in the proportional hazards model (Thomas et al., 1993).

Failure to accurately measure a covariate which is correlated within families can also lead to biased estimates of the effect of a major gene. Covariates that could fall into this category include second-hand smoke, dietary factors, radon exposure, and occupational exposures. Severe under-reporting of pack-years, especially in diseased subjects, would also cause an upward bias in the genetic relative risk if the amount of tobacco consumption is correlated among family members in a way that mimics Mendelian inheritance. However, it is unlikely that the strong genetic effect supported by these data is due solely to mismeasurement or failure to measure environmental factors.

Detecting a gene-environment interaction is difficult in segregation analysis since the gene is an unmeasured quantity. Gauderman and Faucett (1997) demonstrated that the power for detecting a G x E interaction increases with the inclusion of a tightly linked marker, but that the power is still much lower than if the major gene could actually be observed. However, the large sample size in this lung cancer dataset does provide the possibility of detecting a moderate interaction effect using a segregation model.
For example, using the estimated standard error of η, I would have 80% power to detect a true η of 0.20 at the 5% significance level.

The results of this study of lung cancer imply the possible existence of at least two risk groups. We find that heavy smokers have high lung cancer risk, with or without an inherited major gene mutation. Additionally, our results indicate that persons inheriting a mutation are at increased risk whether or not they smoke. We do not find a statistical interaction between smoking and a major gene on the hazard ratio scale, an appropriate scale for modeling age-specific disease risk. However, this does not exclude the existence of a biological interaction which increases genetic risk for a specific subgroup defined by level of smoking (London et al. 1995a, Nakachi et al. 1991).

Part of the work in this thesis is about to be published (Gauderman et al. 1997). Specifically, the segregation analyses for the entire data set with and without the interaction term are summarized in that paper. Unlike the paper, this thesis also examines the data by stratifying on age of onset of the proband, a surrogate for smoking exposure.

The rapid advances in molecular biology in terms of identifying genes and their functions should lead to a greater understanding of the interplay of genetic and environmental factors in the etiology of lung cancer.

References

Cannings C, Thompson EA, Skolnick MH (1978) Probability functions on complex pedigrees. Adv Appl Prob 10:26-61.

Doll R, Hill AB (1952) A study of the aetiology of carcinoma of the lung. Br Med J 2:1271-1286.

Fraumeni JF, Blot WJ (1982) Lung and Pleura. In: Schottenfeld D, Fraumeni JF (eds) Cancer epidemiology and prevention. W.B. Saunders Company, Philadelphia, pp 564-582.

G.A.P. (1997) Genetic Analysis Package, Release 1.0. Computer program available from Epicenter Software, Pasadena, California.

Gauderman WJ, Faucett CL (1996) Detecting gene-environment interactions in joint segregation and linkage analysis. J Hum Genet, in review.

Gauderman WJ, Morrison JL, Carpenter CL, Thomas DC (1997) Analysis of Gene-Smoking Interaction in Lung Cancer. Genet Epidemiol, in press.

Hammond EC (1972) Smoking habits and air pollution in relation to lung cancer. In: Lee DHK (ed) Environmental factors in respiratory disease. Academic Press, New York, pp 177-198.

Hammond EC, Selikoff IJ, Seidman H (1979) Asbestos exposure, cigarette smoking and death rates. Ann NY Acad Sci 330:473-490.

Kahn HA (1966) The Dorn Study of smoking and mortality among U.S. veterans: report on 8 1/2 years of observation. NCI Monograph 19:1-125.

Kalbfleisch JD, Prentice RL (1980) The statistical analysis of failure time data. John Wiley & Sons, Inc, New York.

Levin ML, Goldstein H, Gerhardt PR (1950) Cancer and tobacco smoking: A preliminary report. JAMA 143:336-338.

London SJ, Daly AK, Cooper J, Navidi WC, Carpenter CL, Idle JR (1995a) Polymorphism of Glutathione S-transferase and lung cancer risk among African Americans and Caucasians in Los Angeles County, California. JNCI 87:1246-1253.

Mack W, Langholz B, Thomas DC (1990) Survival models for familial aggregation of cancer. Env Health Persp 37:27-35.

Nakachi K, Imai K, Hayashi S, Watanabe J, Kawajiri K (1991) Genetic susceptibility of squamous cell carcinoma of the lung in relation to cigarette smoking dose. Cancer Res 51:5177-5189.

Ooi WL, Elston RC, Chen VW, Bailey-Wilson JE, Rothschild H (1986) Increased familial risk for lung cancer. J Natl Cancer Inst 76:217-222.

Pershagen G, Akerblom G, Axelson O, Clavensjo B, Damber L, Desai G, Enflo A, et al. (1994) Residential radon exposure and lung cancer in Sweden. N Eng J Med 330:159-164.

Sellers TA, Bailey-Wilson JE, Elston RC, Wilson AF, Elston GZ, Ooi WL, Rothschild H (1990) Evidence for Mendelian inheritance in the pathogenesis of lung cancer. J Natl Cancer Inst 82:1272-1279.

Sellers TA, Bailey-Wilson JE, Potter JD, Rich SS, Rothschild H, Elston RC (1992) Effect of cohort differences in smoking prevalence on models of lung cancer susceptibility. Genet Epidemiol 9:261-272.

Thomas DC, Stram DS, Dwyer J (1993) Exposure measurement error: Influence on exposure-disease relationships and methods of correction. Annu Rev Publ Health 14:69-93.

Tokuhata GK, Lilienfeld AM (1963) Familial aggregation of lung cancer in humans. J Natl Cancer Inst 30:289-312.

Wu AH, Yu MC, Thomas DC, Pike MC, Henderson BE (1988) Personal and family history of lung disease as risk factors for adenocarcinoma of the lung. Cancer Res 48:7279-7289.
Asset Metadata
Creator: Morrison, John Leo (author)
Core Title: Analysis of gene-environment interaction in lung cancer
School: Graduate School
Degree: Master of Science
Degree Program: Biometry
Publisher: University of Southern California (original), University of Southern California. Libraries (digital)
Tag: biology, biostatistics; biology, genetics; health sciences, oncology; OAI-PMH Harvest
Language: English
Contributor: Digitized by ProQuest (provenance)
Advisor: Gauderman, William James (committee chair), Buckley, Jonathan D. (committee member), Thomas, Duncan C. (committee member)
Permanent Link (DOI): https://doi.org/10.25549/usctheses-c16-12947
Unique identifier: UC11341326
Identifier: 1384913.pdf (filename), usctheses-c16-12947 (legacy record id)
Legacy Identifier: 1384913.pdf
Dmrecord: 12947
Document Type: Thesis
Rights: Morrison, John Leo
Type: texts
Source: University of Southern California (contributing entity), University of Southern California Dissertations and Theses (collection)
Access Conditions: The author retains rights to his/her dissertation, thesis or other graduate work according to U.S. copyright law. Electronic access is being provided by the USC Libraries in agreement with the au...
Repository Name: University of Southern California Digital Library
Repository Location: USC Digital Library, University of Southern California, University Park Campus, Los Angeles, California 90089, USA