simulated based on their specified associations with E and Y as in the single-marker simulation. Two sets of simulations were performed to assess power, sensitivity, and specificity for discovery. The first set of simulations consisted of 1,000 replicates of N=10,000 samples with equal numbers of cases and controls. We specified W=1 million, d=1, pE=0.4, qA=0.1, and pY=0.05. The second set of genome-wide simulations consisted of 1,000 replicates of N=3,750 samples with equal numbers of cases and controls, W=10,000, d=20, pE=0.4, qA=0.225 for all d SNPs, and pY=0.01. We note that a smaller sample size was used in the simulations for the Receiver Operating Characteristic (ROC) curves in order to reduce sensitivity across all approaches and yield informative differentiation between methods; otherwise, all methods in our comparison showed very high sensitivity, making it impossible to show differences between them.

Unless otherwise specified, we simulated independent E and G, and ‘non-causal’ genetic variants with Pr(G=1) sampled from a uniform distribution within the range [0.10, 0.40]. We set a null marginal environmental effect of OR(E)=1.0 for both sets of simulations except for one instance in which we calculated power using induced main G effects. For induced main effects, we increased the effects of E and G with increasing interaction effect.

To measure empirical power in simulations with one designated ‘causal’ marker, we took the proportion of replicates in which the ‘causal’ marker was identified as genome-wide significant (P-value ≤ 5×10⁻⁸). To create ROC plots, we repeatedly simulated sets of markers with d=20 designated causal markers. The resulting P-values in each iteration were ordered from least to greatest, and the number of ‘causal’ markers ranked within the set of k smallest P-values,
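The two summary measures described on this page can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the dissertation's code: the function names `empirical_power` and `causal_in_top_k`, the list-based inputs, and the toy P-values are all hypothetical. Only the threshold 5×10⁻⁸ and the idea of counting causal markers among the k smallest P-values come from the text, and the sketch does not show how that count is turned into ROC coordinates.

```python
GENOME_WIDE_ALPHA = 5e-8  # genome-wide significance threshold stated in the text


def empirical_power(causal_p_values, alpha=GENOME_WIDE_ALPHA):
    """Proportion of replicates whose causal-marker P-value is <= alpha.

    causal_p_values: one P-value per replicate for the single designated
    causal marker (hypothetical input layout).
    """
    if not causal_p_values:
        raise ValueError("need at least one replicate")
    hits = sum(1 for p in causal_p_values if p <= alpha)
    return hits / len(causal_p_values)


def causal_in_top_k(p_values, causal_idx, k):
    """Count causal markers among the k smallest P-values of one replicate.

    p_values: P-values for all markers in the replicate.
    causal_idx: indices of the designated causal markers (assumed known,
    as they are in a simulation).
    """
    # Order marker indices from smallest to largest P-value.
    order = sorted(range(len(p_values)), key=lambda i: p_values[i])
    return len(set(order[:k]) & set(causal_idx))


# Toy values for illustration only; these are not results from the study.
toy_causal_p = [1e-9, 5e-8, 1e-3, 0.2]
toy_replicate = [0.5, 1e-6, 0.03, 1e-9, 0.2]
power = empirical_power(toy_causal_p)
top2 = causal_in_top_k(toy_replicate, causal_idx=[1, 3], k=2)
```

In a full simulation, `empirical_power` would be called once per scenario on the 1,000 replicate P-values, and `causal_in_top_k` would be evaluated over a range of k within each replicate.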
Object Description
Title | Bayesian model averaging methods for gene-environment interactions and admixture mapping |
Author | Moss, Lilit Chemenyan |
Author email | chemenya@usc.edu;liliths8686@yahoo.com |
Degree | Doctor of Philosophy |
Document type | Dissertation |
Degree program | Biostatistics |
School | Keck School of Medicine |
Date defended/completed | 2018-06-19 |
Date submitted | 2018-07-31 |
Date approved | 2018-07-31 |
Restricted until | 2018-07-31 |
Date published | 2018-07-31 |
Advisor (committee chair) | Conti, David |
Advisor (committee member) | Gauderman, William James; Thomas, Duncan; Stram, Daniel; Amezcua, Lilyana |
Abstract | Evidence suggests that identifying genetic contributions to the risk of complex diseases requires moving beyond independent tests of association between markers and traits. The purpose of this study is to present two methods within a Bayesian framework to be used in identifying gene-by-environment (GxE) interactions and genomic regions contributing to differential disease risk by ancestry. We first introduce a GxE approach which combines a Bayesian framework with a two-degree-of-freedom (2df) test structure for a simultaneous test of main and interaction effects. Simulations are used to present a comparison study of classical and more complex GxE approaches used currently and demonstrate that our proposed method performs similarly to existing 2df approaches with increased power and robustness in numerous scenarios. A second approach is introduced to perform admixture mapping and map susceptibility loci to complex disease with parental ancestry. Our admixture mapping approach provides a linear regression framework in which we reformulate the often-used case-control and case-only statistics as nested regression models which are combined within a Bayesian model selection framework. Simulation is used to demonstrate that this approach is advantageous over using case-control or case-only statistics, with increased power and robustness. We conduct two genome-wide interaction studies (GWIS) for childhood asthma using air pollution and ethnicity as environmental factors in a nested case-control sample from the Children’s Health Study (CHS). We conduct an admixture mapping of prostate cancer (PrCa) in African Americans and Latinos from the Multiethnic Cohort as well as multiple sclerosis in Hispanic Whites using our proposed method, as well as case-control and case-only methods. |
Keyword | BMA; Bayesian model averaging; gene-environment; interactions; admixture; mapping; childhood asthma |
Language | English |
Format (imt) | application/pdf |
Part of collection | University of Southern California dissertations and theses |
Publisher (of the original version) | University of Southern California |
Place of publication (of the original version) | Los Angeles, California |
Publisher (of the digital version) | University of Southern California. Libraries |
Provenance | Electronically uploaded by the author |
Type | texts |
Legacy record ID | usctheses-m |
Contributing entity | University of Southern California |
Rights | Moss, Lilit Chemenyan |
Physical access | The author retains rights to his/her dissertation, thesis or other graduate work according to U.S. copyright law. Electronic access is being provided by the USC Libraries in agreement with the author, as the original true and official version of the work, but does not grant the reader permission to use the work if the desired use is covered by copyright. It is the author, as rights holder, who must provide use permission if such use is covered by copyright. The original signature page accompanying the original submission of the work to the USC Libraries is retained by the USC Libraries and a copy of it may be obtained by authorized requesters contacting the repository e-mail address given. |
Repository name | University of Southern California Digital Library |
Repository address | USC Digital Library, University of Southern California, University Park Campus MC 7002, 106 University Village, Los Angeles, California 90089-7002, USA |
Repository email | cisadmin@lib.usc.edu |
Filename | etd-MossLilitC-6584.pdf |
Archival file | Volume3/etd-MossLilitC-6584.pdf |