Comparisons of four commonly used methods in GWAS to detect gene-environment interactions
COMPARISONS OF FOUR COMMONLY USED
METHODS IN GWAS TO DETECT
GENE-ENVIRONMENT INTERACTIONS
By
Claire Chen
______________________________________________________
A Thesis Presented to the
FACULTY OF THE GRADUATE SCHOOL
UNIVERSITY OF SOUTHERN CALIFORNIA
In Partial Fulfillment of the
Requirements for the Degree
MASTER OF SCIENCE
(BIOSTATISTICS)
August 2018
Copyright 2018 Claire Chen
TABLE OF CONTENTS
DEDICATION
ACKNOWLEDGEMENTS
ABSTRACT
INTRODUCTION
METHODS
RESULTS
DISCUSSION
TABLES
FIGURES
REFERENCES
DEDICATION
This work is dedicated to my parents for their endless love and
support, and to my aunt for being there for me during hard times.
ACKNOWLEDGEMENTS
I would like to thank my advisor Dr. W. James Gauderman for being a
great mentor, allowing me to join his research, and for being incredibly
supportive, helpful and patient. In addition, I would like to thank my
committee members Dr. Conti and Dr. Lewinger for their expertise and
guidance.
ABSTRACT
It is commonly believed that most complex diseases are affected in part
by gene-environment interactions. Correspondingly, multiple methods have
been proposed to detect such interactions. However, which method is the
most powerful under different conditions? In this thesis, we compare four
of the most commonly used methods to see which one outperforms the others
in most cases.
INTRODUCTION
For a long time, human traits were thought to be determined by either
genetic differences or environmental factors. More recently, a growing
number of scientists hold that neither genetic nor environmental
differences alone lead to variation among individuals. Rather, it is the
joint action of genes and environmental factors that leads to differences
in human characteristics. It is therefore important to investigate
gene-environment interactions in genetic epidemiology studies.
A gene-environment interaction occurs when different genotypes respond
differently to the same environmental exposure. Individuals exposed to
the same environmental factors can therefore develop different disease
phenotypes. A norm-of-reaction graph shows how different genotypes
respond to environmental exposures when the phenotype is continuous, and
it can help in understanding gene-environment interactions. Generally
speaking, there are four patterns to consider: pure interaction, no
interaction, quantitative interaction and crossing interaction. In the
graphs described below, G denotes the risk allele, with G=1 for carriers
and G=0 for non-carriers, and E denotes the environmental exposure, with
E=1 for exposed and E=0 for unexposed. The Y-axis represents the odds
ratio.
In the graph of the pure interaction model, we assume that there is no
increased risk (odds ratio = 1.0) when G=0 and/or E=0, and that risk is
increased only when both G=1 and E=1.
In the graph of the no interaction model, we assume that there is no
increased risk (odds ratio = 1.0) when G=0 and E=0, and that risk is
increased when either G=1 or E=1.
In the graph of the quantitative interaction model, we assume that there
is no increased risk (odds ratio = 1.0) when G=0 and E=0, and that risk is
increased when either G=1 or E=1, but the risk increases faster when E=1.
This represents the situation where G and E each have a main effect of
their own (i.e. $OR_g > 1.0$ and $OR_e > 1.0$), but the combined effect
when G=1 and E=1 is greater than the product of the two.
In the graph of the crossing interaction model, we assume that there is
decreased risk (odds ratio < 1.0) when G=0 and E=0, and that risk is
increased when G=1 and E=0, or when E=1 and G=0. However, when G=1 and
E=1, the risk is decreased. This represents the situation where G and E
each increase the risk on their own (i.e. $OR_g > 1.0$ and
$OR_e > 1.0$), but the combined effect of G and E instead decreases the
risk.
Generally speaking, when the two lines in the chart are not parallel, we
say there is a gene-environment interaction. This indicates that different
genotypes respond differently to the same environmental exposure.
Identifying gene-environment interactions also helps individuals to
determine their risk of developing a disease based on both their genetic
risk profiles and their environmental factors. Studies of
gene-environment interactions have therefore generated considerable
enthusiasm in recent years.
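The four patterns described above can be made concrete with a small numerical sketch. It assumes a multiplicative odds-ratio parameterization with G=0, E=0 as the reference cell (odds ratio 1.0); the odds ratios are hypothetical values chosen only for illustration, and the function names are likewise illustrative rather than taken from any program used in this thesis.

```python
from itertools import product


def joint_odds_ratios(or_g, or_e, or_ge):
    """Odds ratio of each (G, E) cell relative to the G=0, E=0 cell."""
    return {
        (g, e): (or_g ** g) * (or_e ** e) * (or_ge ** (g * e))
        for g, e in product((0, 1), repeat=2)
    }


# Hypothetical odds ratios, chosen only to illustrate each pattern.
PATTERNS = {
    "pure interaction": dict(or_g=1.0, or_e=1.0, or_ge=2.5),
    "no interaction": dict(or_g=1.5, or_e=2.0, or_ge=1.0),
    "quantitative interaction": dict(or_g=1.5, or_e=2.0, or_ge=2.0),
    "crossing interaction": dict(or_g=1.5, or_e=2.0, or_ge=0.2),
}


def genetic_effect_by_exposure(cells):
    """Carrier vs. non-carrier odds ratio among unexposed and exposed."""
    unexposed = cells[(1, 0)] / cells[(0, 0)]
    exposed = cells[(1, 1)] / cells[(0, 1)]
    return unexposed, exposed


for name, params in PATTERNS.items():
    unexposed, exposed = genetic_effect_by_exposure(joint_odds_ratios(**params))
    print(f"{name:25s} G effect if E=0: {unexposed:.2f}  if E=1: {exposed:.2f}")
```

The two printed values are equal only in the no-interaction setting, which corresponds to parallel lines when the odds ratio is plotted on a log scale. In the crossing setting the genetic effect is above 1.0 among the unexposed and below 1.0 among the exposed.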
The most common method to detect a gene-environment interaction is a
standard logistic regression analysis of a case-control study. It tests
whether the disease risk in those with both the genotype and the exposure
departs from the logistic-additive (i.e. multiplicative on the odds scale)
effects of genotype and environmental exposure. This will be denoted as
the 1-df GxE test in the following discussion.
However, the 1-df test may not be very powerful when a locus has an
effect only among the exposed or only among the unexposed part of the
population. A marginal genetic effect test is commonly used in this
situation since it can be more powerful. Building on this, a joint test
of the genetic marginal effect and the gene-environment interaction has
been proposed. This test will be denoted as the 2-df test [Kraft et al.,
2007] or the G-GE test. The validity of the 1-df and 2-df tests does not
depend on whether G and E are correlated in the population.
When G and E are uncorrelated in the population (i.e. the G-E
independence assumption is satisfied) and the disease prevalence is low, a
“case-only” test can be used to test for GxE interaction and is often more
powerful than the usual case-control test. A case-only analysis involves
testing for association between genotype and exposure (G-E association)
among the cases. This method relies heavily on the G-E independence
assumption; if that assumption fails, a case-only analysis will produce an
inflated number of false positives [Piegorsch, 1994]. In a genome-wide
association study (GWAS), one can typically evaluate the G-E independence
assumption by examining the q-q plot for large-scale deviations between
the observed and expected values of the -log10 p-values.
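The role of the two conditions (G-E independence and low prevalence) can be checked numerically. The sketch below is illustrative only: it assumes a binary exposure, dominant coding with Hardy-Weinberg proportions for the carrier frequency, and a logistic disease model, and the function names and parameter values are hypothetical.

```python
def cell_probabilities(q_a, p_e, p0, or_g, or_e, or_ge, diseased):
    """Return P(G=g, E=e | D) for binary G (dominant coding) and binary E.

    Assumptions of this sketch: Hardy-Weinberg proportions, so carriers
    have frequency 1 - (1 - q_a)**2, G-E independence in the population,
    and p0 as the disease risk when G=0 and E=0.
    """
    p_g = 1.0 - (1.0 - q_a) ** 2
    base_odds = p0 / (1.0 - p0)
    weights = {}
    for g in (0, 1):
        for e in (0, 1):
            p_cell = (p_g if g else 1.0 - p_g) * (p_e if e else 1.0 - p_e)
            odds = base_odds * or_g ** g * or_e ** e * or_ge ** (g * e)
            risk = odds / (1.0 + odds)
            weights[(g, e)] = p_cell * (risk if diseased else 1.0 - risk)
    total = sum(weights.values())
    return {cell: w / total for cell, w in weights.items()}


def ge_odds_ratio(cells):
    """G-E association odds ratio from a 2x2 table of cell probabilities."""
    return cells[(1, 1)] * cells[(0, 0)] / (cells[(1, 0)] * cells[(0, 1)])


# Hypothetical disease model with an interaction odds ratio of 2.5.
model = dict(q_a=0.10, p_e=0.10, or_g=1.5, or_e=2.0, or_ge=2.5)

rare_cases = ge_odds_ratio(cell_probabilities(p0=0.0001, diseased=True, **model))
rare_controls = ge_odds_ratio(cell_probabilities(p0=0.0001, diseased=False, **model))
common_cases = ge_odds_ratio(cell_probabilities(p0=0.20, diseased=True, **model))

print(f"rare disease, cases only:    {rare_cases:.3f}")
print(f"rare disease, controls only: {rare_controls:.3f}")
print(f"common disease, cases only:  {common_cases:.3f}")
```

With a rare disease, the G-E odds ratio among cases is close to the interaction odds ratio of 2.5 and the one among controls is close to 1.0. When the baseline risk is raised to 0.20, the case-only value falls well below 2.5, which illustrates why low prevalence matters for this design.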
Murcray et al. [2009] demonstrated that for a rare disease, one can
expect to observe a G-E association in the combined case-control sample
for a gene involved in a GxE interaction. This formed the basis for the
screening step in their 2-step testing procedure, described briefly below.
They further noted that this G-E information is independent of the
marginal G and GxE information used in the 2-df test of Kraft et al. One
might therefore consider a joint test that combines information about the
genetic marginal effect, the gene-environment interaction, and the
gene-environment association. This will be denoted as the 3-df test below
[Gauderman et al., 2018]. The 3-df test is still subject to the G-E
independence assumption in the population, although it is less sensitive
to this assumption than the case-only analysis. We will examine whether it
can enhance the power to detect gene-environment interactions.
As mentioned above, a test of the marginal genetic effect is the most
common approach used to identify novel genes in a GWAS, so we also include
it in the following comparison. It will be denoted as the Marg G test.
Finally, there are many methods that are not included in the comparison
but that are widely applicable under certain conditions. For example, Li
and Conti [2009] proposed a Bayes Model Averaging approach and Mukherjee
and Chatterjee [2008] proposed an empirical Bayes estimator, both designed
to weight the case-control and case-only estimators according to whether
the G-E independence assumption is likely to hold. In addition, some
researchers have focused on a screening step, such as the genetic marginal
association-based [Kooperberg and LeBlanc, 2008] and the correlation-based
[Murcray et al., 2009] procedures. These two-stage analyses can gain power
in genome-wide association studies, where the majority of SNPs are not
involved in gene-environment interactions. The choice of
multiple-comparison procedure can also affect the power to detect
gene-environment interactions. For example, the weighted hypothesis
testing approach allocates the Type I error differentially, which can
increase power by assigning more weight to SNPs that are more likely to be
involved in a gene-environment interaction.
In this thesis, we focus on single-step procedures: the Marg G test, the
1-df test, the 2-df test and the 3-df test. By varying the scenarios and
the combinations of parameters, we examine which method is more powerful
than the others in detecting gene-environment interactions.
METHODS
Let D be an indicator of disease, and assume we have a sample of cases
(D=1) and unrelated controls (D=0). Three models underlie the Marg G test,
the 1-df test, the 2-df test and the 3-df test:
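Model 1:
$$\mathrm{logit}[\Pr(D=1 \mid G)] = \alpha + \beta_g G$$
Model 2:
$$\mathrm{logit}[\Pr(D=1 \mid G,E)] = \alpha + \beta_g G + \beta_e E + \beta_{ge}\, G \times E$$
Model 3:
$$\mathrm{logit}[\Pr(G=1 \mid E)] = \alpha_g + \gamma_g E$$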
Model 3 is applied to the combined sample of cases and controls. In all
of the models, G is the genotype coding and E is the environmental
exposure. Under an additive genetic model, G equals 0, 1 or 2, coding the
number of minor alleles at a particular locus. Under a dominant or
recessive model, the genotype categories collapse into a comparison of G=1
with G=0. In this thesis, we consider only a dominant genotype coding,
where G=1 for carriers of the risk allele and G=0 for non-carriers. The
exponential of the parameter $\beta_{ge}$ is a ratio of two odds ratios:
the genetic odds ratio (carriers vs. non-carriers) among the exposed
divided by the genetic odds ratio among the unexposed. If
$\beta_{ge} = 0$, we say there is no gene-environment interaction. If
$\beta_{ge} > 0$, the genetic effect is larger in exposed individuals than
in unexposed individuals. Conversely, if $\beta_{ge} < 0$, the genetic
effect is smaller in exposed individuals than in unexposed individuals.
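The interpretation of the interaction parameter follows directly from Model 2. As a short worked derivation using only the symbols of Model 2 (the stratum-specific notation $OR_{G \mid E}$ is introduced here for illustration), the odds ratio for G within each exposure stratum is:

```latex
\begin{aligned}
OR_{G \mid E=0} &= \frac{\exp(\alpha + \beta_g)}{\exp(\alpha)} = \exp(\beta_g), \\
OR_{G \mid E=1} &= \frac{\exp(\alpha + \beta_g + \beta_e + \beta_{ge})}{\exp(\alpha + \beta_e)}
                 = \exp(\beta_g + \beta_{ge}), \\
\frac{OR_{G \mid E=1}}{OR_{G \mid E=0}} &= \exp(\beta_{ge}).
\end{aligned}
```

The ratio equals 1 exactly when the interaction coefficient is zero, so the genetic odds ratio is then the same in exposed and unexposed individuals.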
Please note that the null hypotheses of the Marg G test, 1-df test, 2-df
test and 3-df test are different. They are, respectively:
Marg G test: $H_{null}: \beta_g = 0$ ($\beta_g$ from Model 1)
1-df test: $H_{null}: \beta_{ge} = 0$ ($\beta_{ge}$ from Model 2)
2-df test: $H_{null}: \beta_{ge} = \beta_g = 0$ (both $\beta_g$ and $\beta_{ge}$ from Model 2)
3-df test: $H_{null}: \beta_{ge} = \beta_g = \gamma_g = 0$ ($\beta_g$ and $\beta_{ge}$ from Model 2, and $\gamma_g$ from Model 3)
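Because the tests constrain different numbers of parameters, their statistics are referred to distributions with different degrees of freedom. The following sketch assumes each statistic is asymptotically chi-square under its null hypothesis (a common choice for Wald, score or likelihood-ratio tests; the text does not state which statistic is used), and relies on closed-form tail probabilities for 1, 2 and 3 degrees of freedom so that only the standard library is needed. The statistic value is hypothetical.

```python
from math import erfc, exp, pi, sqrt


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable for df in {1, 2, 3}."""
    if x < 0:
        raise ValueError("x must be non-negative")
    if df == 1:
        return erfc(sqrt(x / 2.0))
    if df == 2:
        return exp(-x / 2.0)
    if df == 3:
        # Integration by parts reduces the 3-df tail to the 1-df tail
        # plus a closed-form correction term.
        return erfc(sqrt(x / 2.0)) + sqrt(2.0 * x / pi) * exp(-x / 2.0)
    raise ValueError("only 1, 2 or 3 degrees of freedom are handled here")


# Hypothetical test statistic of 30, referred to 1, 2 and 3 df.
statistic = 30.0
p_values = {df: chi2_sf(statistic, df) for df in (1, 2, 3)}
for df, p in p_values.items():
    print(f"{df}-df p-value for a statistic of {statistic}: {p:.2e}")
```

The same statistic yields a larger p-value as the degrees of freedom grow, so an added parameter has to contribute enough signal to pay for its extra degree of freedom.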
We assume that the joint distribution of G and E in the general
population is Pr(G,E)=Pr(G)Pr(E), i.e. the genetic and environmental
factors are independent. To compare the above tests, we calculate power
for studies with 1,000 cases and 1,000 controls, and with 5,000 cases and
5,000 controls. We consider allele frequencies of 0.10 and 0.25, exposure
prevalences of 0.10 and 0.50, environmental main effect odds ratios of 1.0
and 2.0, genetic main effect odds ratios from 1.0 to 1.50, and
gene-environment interaction odds ratios from 1.0 to 2.5. In all
calculations, we assume a baseline disease prevalence of 0.0001, an
overall significance level of 0.05, a two-sided alternative hypothesis,
and a total of 1,000,000 SNPs tested in the GWAS. We assume a Bonferroni
correction for multiple testing, so the per-SNP significance level is
0.05/1,000,000 = 0.00000005 (5 × 10^-8).
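To give a feel for how these settings translate into power, the sketch below computes the Bonferroni threshold and a rough large-sample (Wald-type) approximation to the power of the 1-df interaction test. It is not the Q2 calculation: it assumes a binary exposure, dominant coding with Hardy-Weinberg proportions, G-E independence, and a variance equal to the sum of reciprocal expected cell counts. All function names are hypothetical.

```python
from math import log, sqrt
from statistics import NormalDist

N_SNPS = 1_000_000
ALPHA_PER_SNP = 0.05 / N_SNPS  # Bonferroni-corrected level, 5e-8


def expected_counts(n, diseased, q_a, p_e, p0, or_g, or_e, or_ge):
    """Expected (G, E) cell counts among n cases or n controls.

    Assumes Hardy-Weinberg proportions with dominant coding and G-E
    independence in the population (assumptions of this sketch).
    """
    p_g = 1.0 - (1.0 - q_a) ** 2
    base_odds = p0 / (1.0 - p0)
    weights = {}
    for g in (0, 1):
        for e in (0, 1):
            p_cell = (p_g if g else 1.0 - p_g) * (p_e if e else 1.0 - p_e)
            odds = base_odds * or_g ** g * or_e ** e * or_ge ** (g * e)
            risk = odds / (1.0 + odds)
            weights[(g, e)] = p_cell * (risk if diseased else 1.0 - risk)
    total = sum(weights.values())
    return {cell: n * w / total for cell, w in weights.items()}


def wald_power_1df(n_cases, n_controls, alpha=ALPHA_PER_SNP, **model):
    """Approximate power of a two-sided Wald test of beta_ge = 0."""
    cases = expected_counts(n_cases, True, **model)
    controls = expected_counts(n_controls, False, **model)
    # Large-sample variance of the log interaction odds ratio:
    # sum of reciprocal expected counts over all eight cells.
    variance = sum(1.0 / c for c in cases.values())
    variance += sum(1.0 / c for c in controls.values())
    shift = abs(log(model["or_ge"])) / sqrt(variance)
    std_normal = NormalDist()
    z_crit = -std_normal.inv_cdf(alpha / 2.0)
    return std_normal.cdf(shift - z_crit) + std_normal.cdf(-shift - z_crit)


model = dict(q_a=0.10, p_e=0.10, p0=0.0001, or_g=1.0, or_e=1.0, or_ge=2.5)
for n in (1_000, 5_000):
    print(f"N = {n}: approximate 1-df power = {wald_power_1df(n, n, **model):.2f}")
```

Because this is only an approximation under the stated assumptions, it should be expected to land in the neighborhood of the corresponding tabulated 1-df power values rather than reproduce them.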
The methods above are implemented in Q2, a program written in C++
(developed by Gauderman, 2016). The parameters that can be set in this
program include: the type of design (denoted as Design), the number of
controls per case (denoted as C/case), the dominance model (denoted as
DomG), the exposure type (denoted as Etype), the frequency of the
reference allele at the disease susceptibility locus (denoted as qA), the
proportion exposed (denoted as pE), the baseline disease prevalence
(denoted as p0), the baseline odds ratio for G=1 vs. G=0 assuming E=0
(denoted as $OR_g$), the baseline odds ratio for E=1 vs. E=0 assuming G=0
(denoted as $OR_e$), and the G and E interaction odds ratio (denoted as
$OR_{ge}$).
RESULTS
Table 1 shows the power estimates for allele frequency qA = 0.10,
proportion exposed pE = 0.10, and several settings of the disease model
parameters. Not surprisingly, for the same interaction ($OR_{ge}$),
environmental main effect ($OR_e$) and genetic main effect ($OR_g$), the
power of a study with 5,000 cases is at least as large as the power of a
study with 1,000 cases.
In the absence of an interaction ($OR_{ge} = 1.0$), the 1-df test has no
power (beyond the Type I error rate). Whenever there is a genetic main
effect ($OR_g > 1.0$), the Marg G test is at least as powerful as the 2-df
test, and the 2-df test is at least as powerful as the 3-df test.
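In the presence of an interaction ($OR_{ge} \geq 1.5$), the relative
power depends on both the genetic and environmental main effects. When
$OR_e = 1.0$ and $OR_g = 1.0$, or they are only slightly greater than 1.0,
the 2-df test is more powerful than the Marg G test. When $OR_e$ and
$OR_g$ are of modest to large magnitude, the 2-df test is a little less
powerful than the Marg G test. In most settings the difference in power
between the 2-df test and the Marg G test is small; the exception is the
absence of a genetic main effect ($OR_g = 1.0$) with 5,000 cases, where
the Marg G test has little power and the 2-df test is clearly better (for
example, 0.83 vs. 0.00 at $OR_e = 1.0$, $OR_{ge} = 2.5$). Regardless of
the values of $OR_e$ and $OR_g$, as long as $OR_{ge} \neq 1.0$, the 3-df
test is the most powerful of the four methods, or is tied with the most
powerful. The 1-df test is the least powerful method whenever there is a
genetic main effect ($OR_g > 1.0$); when $OR_g = 1.0$, the Marg G test has
the lowest power.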
Figure 1 and Table 1 show the power estimates versus the
gene-environment interaction parameter $OR_{ge}$ (X-axis) for a range of
genetic main effects ($OR_g$) and environmental main effects ($OR_e$). If
$OR_{ge} \geq 1.5$, the 3-df test is the most powerful method, and the
1-df test is generally the least powerful. The power estimates of the
Marg G and 2-df tests typically differ only slightly and lie between those
of the 3-df test and the 1-df test. In addition, the power of the 3-df
test increases substantially with $OR_{ge}$ once $OR_{ge} \geq 1.5$ in a
study with 1,000 cases and 1,000 controls.
Table 2 shows the power estimates for allele frequency qA = 0.25,
proportion exposed pE = 0.10, and several settings of the disease model
parameters. The findings are similar to those in Table 1. When there is no
gene-environment interaction, the Marg G test is at least as powerful as
the 2-df test, and the 2-df test is at least as powerful as the 3-df test.
When there is an interaction, the 3-df test is generally the most powerful
method, and the 1-df test is generally the least powerful. The power
estimates of the Marg G and 2-df tests typically differ only slightly and
lie between those of the 3-df test and the 1-df test.
The main difference between Table 1 and Table 2 is that the power
estimates in Table 2 are generally larger than those in Table 1. From this
we can conclude that a higher allele frequency qA can enhance the power
of the methods to detect gene-environment interactions.
Similarly, Table 3 shows the power estimates for allele frequency qA =
0.10, proportion exposed pE = 0.50, and several settings of the disease
model parameters. The main difference between Table 1 and Table 3 is that
the power estimates in Table 3 are generally larger than those in Table 1.
From this we can conclude that a higher proportion exposed pE can also
enhance the power of the methods to detect gene-environment interactions.
DISCUSSION
In this thesis, we compared the power of several popular tests, and a
novel 3-df test, to identify genes in a GWAS. In general, we found that in
the absence of an interaction the Marg G test is more powerful than the
2-df test, and the 2-df test is more powerful than the 3-df test. For the
interaction scenarios we investigated, which include only pure and
quantitative GxE models, the 3-df test was generally more powerful than
the Marg G test and the 2-df test. The difference in power between the
Marg G test and the 2-df test was usually small, and the 1-df test was the
least powerful whenever the gene had a main effect.
We also compared the power estimates with 1,000 cases and 5,000 cases.
Not surprisingly, the more cases we have, the higher the power estimates.
The number of cases does not change the relative ranking of the tests
(i.e. when $OR_{ge} \geq 1.5$, the 3-df test is more powerful than the
Marg G and 2-df tests, and the 1-df test is generally less powerful than
the Marg G and 2-df tests), but it does change the absolute differences in
power among the tests. For example, with 1,000 cases, the power of the
3-df test is substantially higher than that of the other tests when
$OR_{ge} > 1.5$. With 5,000 cases, however, there are often no large
differences between the power estimates of the 2-df and 3-df tests, as
both approach the maximum of 100%.
Finally, by comparing the tables above, we can conclude that higher
values of the allele frequency qA and the proportion exposed pE both
increase the power estimates. They do not change the relative ranking of
the tests, nor do they substantially change the differences in power among
the tests. We therefore conclude that the allele frequency qA and the
proportion exposed pE have little bearing on the choice of test to detect
a gene-environment interaction, as long as they are within a typical
range.
While the 1-df and 2-df tests do not depend on an assumption of G-E
independence in the population, the 3-df test does. If this assumption is
violated, one may consider other methods to avoid biased estimates, e.g.
the Bayes Model Averaging approach [Li and Conti, 2009] or the empirical
Bayes estimator [Mukherjee and Chatterjee, 2008]. Both are designed to
weight the case-control and case-only estimators according to whether the
G-E independence assumption is likely to hold.
In this thesis, we have replicated the basic power comparisons first
shown by Kraft et al. [2007] for three commonly used procedures, the Marg
G test, 1-df test of GxE, and the 2-df combined test. We have demonstrated
that a 3-df test that augments the 2-df test with G-E association information
can provide greater power than any of the three commonly used procedures.
Across the four procedures evaluated, one needs to remember that the
corresponding underlying hypotheses being evaluated are different. While
these differences are important for understanding the underlying biology of
how a gene might influence a trait, they may be less important in a GWAS
where the primary goal is to first identify a gene or genes that are important
among the very large collection of genes being tested. In that context, it may
be less important whether a gene has a marginal-only or GxE-only effect,
but rather that the gene has some effect that we would like to explore further
in a future study. The procedures that attempt to capture as much
information as possible, including the 2-df, 3-df, and the 2-step procedures
we did not evaluate here, may provide our best opportunity to identify novel
genes for complex human traits.
TABLES
Table 1.
                     Study with 1,000 cases          Study with 5,000 cases
OR_e  OR_g  OR_ge    Marg G  1-df    2-df    3-df    Marg G  1-df    2-df    3-df
1.00  1.00  1.00     0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00
1.00  1.00  1.50     0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.02
1.00  1.00  2.00     0.00    0.00    0.00    0.00    0.00    0.19    0.21    0.90
1.00  1.00  2.50     0.00    0.00    0.00    0.09    0.00    0.78    0.83    >0.99
1.00  1.25  1.00     0.00    0.00    0.00    0.00    0.18    0.00    0.12    0.09
1.00  1.25  1.50     0.00    0.00    0.00    0.00    0.54    0.00    0.65    0.82
1.00  1.25  2.00     0.01    0.00    0.02    0.07    0.86    0.22    0.99    >0.99
1.00  1.25  2.50     0.02    0.00    0.08    0.41    0.98    0.81    >0.99   >0.99
1.00  1.50  1.00     0.05    0.00    0.03    0.02    >0.99   0.00    >0.99   0.99
1.00  1.50  1.50     0.11    0.00    0.10    0.11    >0.99   0.00    >0.99   >0.99
1.00  1.50  2.00     0.23    0.00    0.27    0.47    >0.99   0.24    >0.99   >0.99
1.00  1.50  2.50     0.38    0.01    0.51    0.86    >0.99   0.83    >0.99   >0.99
2.00  1.00  1.00     0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00
2.00  1.00  1.50     0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.21
2.00  1.00  2.00     0.00    0.00    0.00    0.05    0.02    0.31    0.39    >0.99
2.00  1.00  2.50     0.00    0.01    0.01    0.47    0.29    0.90    0.95    >0.99
2.00  1.25  1.00     0.00    0.00    0.00    0.00    0.18    0.00    0.11    0.09
2.00  1.25  1.50     0.01    0.00    0.00    0.01    0.82    0.01    0.75    0.99
2.00  1.25  2.00     0.03    0.00    0.03    0.34    >0.99   0.34    >0.99   >0.99
2.00  1.25  2.50     0.14    0.01    0.13    0.89    >0.99   0.91    >0.99   >0.99
2.00  1.50  1.00     0.05    0.00    0.03    0.02    >0.99   0.00    >0.99   0.99
2.00  1.50  1.50     0.21    0.00    0.11    0.25    >0.99   0.01    >0.99   >0.99
2.00  1.50  2.00     0.49    0.00    0.34    0.84    >0.99   0.36    >0.99   >0.99
2.00  1.50  2.50     0.77    0.01    0.62    >0.99   >0.99   0.92    >0.99   >0.99
Table 1 provides power estimates based on a study with N (1,000 or 5,000) cases and N controls. Type I error
rate α=0.05 with 2-sided alternative hypothesis. Baseline disease prevalence is 0.0001.
Both the minor allele frequency and exposure are uncommon (qA=0.10, pE=0.10).
Table 2.
                     Study with 1,000 cases          Study with 5,000 cases
OR_e  OR_g  OR_ge    Marg G  1-df    2-df    3-df    Marg G  1-df    2-df    3-df
1.00  1.00  1.00     0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00
1.00  1.00  1.50     0.00    0.00    0.00    0.00    0.00    0.01    0.01    0.14
1.00  1.00  2.00     0.00    0.00    0.00    0.02    0.00    0.52    0.56    >0.99
1.00  1.00  2.50     0.00    0.02    0.02    0.30    0.02    0.97    0.98    >0.99
1.00  1.25  1.00     0.00    0.00    0.00    0.00    0.54    0.00    0.44    0.37
1.00  1.25  1.50     0.01    0.00    0.01    0.02    0.91    0.01    0.95    0.99
1.00  1.25  2.00     0.03    0.00    0.06    0.23    0.99    0.51    >0.99   >0.99
1.00  1.25  2.50     0.08    0.02    0.22    0.73    >0.99   0.97    >0.99   >0.99
1.00  1.50  1.00     0.18    0.00    0.12    0.09    >0.99   0.00    >0.99   >0.99
1.00  1.50  1.50     0.35    0.00    0.31    0.33    >0.99   0.01    >0.99   >0.99
1.00  1.50  2.00     0.55    0.00    0.58    0.78    >0.99   0.50    >0.99   >0.99
1.00  1.50  2.50     0.73    0.01    0.80    0.98    >0.99   0.97    >0.99   >0.99
2.00  1.00  1.00     0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00
2.00  1.00  1.50     0.00    0.00    0.00    0.00    0.00    0.02    0.03    0.63
2.00  1.00  2.00     0.00    0.00    0.00    0.19    0.10    0.70    0.78    >0.99
2.00  1.00  2.50     0.00    0.03    0.03    0.82    0.71    0.99    >0.99   >0.99
2.00  1.25  1.00     0.00    0.00    0.00    0.00    0.54    0.00    0.43    0.37
2.00  1.25  1.50     0.02    0.00    0.01    0.06    0.99    0.02    0.98    >0.99
2.00  1.25  2.00     0.14    0.00    0.10    0.67    >0.99   0.69    >0.99   >0.99
2.00  1.25  2.50     0.39    0.03    0.33    0.99    >0.99   0.99    >0.99   >0.99
2.00  1.50  1.00     0.18    0.00    0.11    0.09    >0.99   0.00    >0.99   >0.99
2.00  1.50  1.50     0.51    0.00    0.34    0.57    >0.99   0.02    >0.99   >0.99
2.00  1.50  2.00     0.82    0.00    0.66    0.97    >0.99   0.68    >0.99   >0.99
2.00  1.50  2.50     0.96    0.03    0.86    >0.99   >0.99   0.99    >0.99   >0.99
Table 2 provides power estimates based on a study with N (1,000 or 5,000) cases and N controls. Type I error
rate α=0.05 with 2-sided alternative hypothesis. Baseline disease prevalence is 0.0001.
The minor allele frequency is common and exposure is uncommon (qA=0.25, pE=0.10).
Table 3.
                     Study with 1,000 cases          Study with 5,000 cases
OR_e  OR_g  OR_ge    Marg G  1-df    2-df    3-df    Marg G  1-df    2-df    3-df
1.00  1.00  1.00     0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00
1.00  1.00  1.50     0.00    0.00    0.00    0.01    0.18    0.09    0.62    0.94
1.00  1.00  2.00     0.05    0.01    0.18    0.54    >0.99   0.95    >0.99   >0.99
1.00  1.00  2.50     0.44    0.10    0.80    >0.99   >0.99   >0.99   >0.99   >0.99
1.00  1.25  1.00     0.00    0.00    0.00    0.00    0.18    0.00    0.12    0.09
1.00  1.25  1.50     0.10    0.00    0.12    0.20    >0.99   0.10    >0.99   >0.99
1.00  1.25  2.00     0.71    0.01    0.83    0.97    >0.99   0.96    >0.99   >0.99
1.00  1.25  2.50     0.98    0.12    >0.99   >0.99   >0.99   >0.99   >0.99   >0.99
1.00  1.50  1.00     0.05    0.00    0.03    0.02    >0.99   0.00    >0.99   0.99
1.00  1.50  1.50     0.71    0.00    0.70    0.78    >0.99   0.12    >0.99   >0.99
1.00  1.50  2.00     0.99    0.02    >0.99   >0.99   >0.99   0.97    >0.99   >0.99
1.00  1.50  2.50     >0.99   0.13    >0.99   >0.99   >0.99   1.00    >0.99   >0.99
2.00  1.00  1.00     0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00
2.00  1.00  1.50     0.00    0.00    0.00    0.01    0.67    0.06    0.75    0.98
2.00  1.00  2.00     0.26    0.01    0.26    0.70    >0.99   0.90    >0.99   >0.99
2.00  1.00  2.50     0.89    0.07    0.88    1.00    1.00    1.00    1.00    1.00
2.00  1.25  1.00     0.00    0.00    0.00    0.00    0.18    0.00    0.11    0.09
2.00  1.25  1.50     0.26    0.00    0.16    0.33    >0.99   0.08    >0.99   >0.99
2.00  1.25  2.00     0.95    0.01    0.90    0.99    >0.99   0.92    >0.99   >0.99
2.00  1.25  2.50     >0.99   0.08    >0.99   >0.99   >0.99   >0.99   >0.99   >0.99
2.00  1.50  1.00     0.05    0.00    0.02    0.02    >0.99   0.00    >0.99   0.99
2.00  1.50  1.50     0.89    0.00    0.76    0.90    >0.99   0.09    >0.99   >0.99
2.00  1.50  2.00     >0.99   0.01    >0.99   >0.99   >0.99   0.94    >0.99   >0.99
2.00  1.50  2.50     >0.99   0.09    >0.99   >0.99   >0.99   >0.99   >0.99   >0.99
Table 3 provides power estimates based on a study with N (1,000 or 5,000) cases and N controls. Type I error
rate α=0.05 with 2-sided alternative hypothesis. Baseline disease prevalence is 0.0001.
The minor allele frequency is uncommon and exposure is common (qA=0.10, pE=0.50).
FIGURES
Figure 1.
REFERENCES
1. Kraft P, Yen YC, Stram DO, Morrison J, Gauderman WJ: Exploiting
gene-environment interaction to detect genetic associations. Hum Hered
2007;63:111–119.
2. Piegorsch W, Weinberg C, Taylor J: Non-hierarchical logistic models and
case-only designs for assessing susceptibility in population-based
case-control studies. Stat Med 1994;13:153–162.
3. Murcray CE, Lewinger JP, Gauderman WJ: Gene-environment interaction
in genome-wide association studies (with commentaries and rejoinder). Am
J Epidemiol 2009;169(2):219–226.
4. Li D, Conti DV: Detecting gene-environment interactions using a
combined case-only and case-control approach. Am J Epidemiol
2009;169(4):497–504.
5. Mukherjee B, Chatterjee N: Exploiting gene-environment independence
for analysis of case-control studies: an empirical Bayes-type shrinkage
estimator to tradeoff between bias and efficiency. Biometrics
2008;64(3):685–694.
6. Kooperberg C, LeBlanc M: Increasing the power of identifying gene x
gene interactions in genome-wide association studies. Genet Epidemiol
2008;32(3):255–263.
7. Roeder K, Devlin B, Wasserman L: Improving power in genome-wide
association studies: weights tip the scale. Genet Epidemiol
2007;31(7):741–747. doi: 10.1002/gepi.20237.
8. Ionita-Laza I, McQueen M, Laird N, Lange C: Genomewide weighted
hypothesis testing in family-based association studies, with an
application to a 100K scan. Am J Hum Genet 2007;81:607–614.
9. Roeder K, Wasserman L: Genome-wide significance levels and weighted
hypothesis testing. Stat Sci 2009;24(4):398–413.
10. Wikipedia: Gene–environment interaction.
https://en.wikipedia.org/wiki/Gene%E2%80%93environment_interaction
Asset Metadata
Creator: Chen, Ke (Claire) (author)
Core Title: Comparisons of four commonly used methods in GWAS to detect gene-environment interactions
School: Keck School of Medicine
Degree: Master of Science
Degree Program: Biostatistics
Publication Date: 08/03/2018
Defense Date: 08/02/2018
Publisher: University of Southern California (original), University of Southern California. Libraries (digital)
Tag: gene-environment interaction, GWAS, OAI-PMH Harvest
Format: application/pdf (imt)
Language: English
Contributor: Electronically uploaded by the author (provenance)
Advisor: Gauderman, William (committee chair), Conti, David (committee member), Lewinger, Juan (committee member)
Creator Email: cocochen0822@gmail.com, kec@usc.edu
Permanent Link (DOI): https://doi.org/10.25549/usctheses-c89-49399
Unique identifier: UC11670914
Identifier: etd-ChenKeClai-6622.pdf (filename), usctheses-c89-49399 (legacy record id)
Legacy Identifier: etd-ChenKeClai-6622.pdf
Dmrecord: 49399
Document Type: Thesis
Rights: Chen, Ke (Claire)
Type: texts
Source: University of Southern California (contributing entity), University of Southern California Dissertations and Theses (collection)
Access Conditions: The author retains rights to his/her dissertation, thesis or other graduate work according to U.S. copyright law. Electronic access is being provided by the USC Libraries in agreement with the a...
Repository Name: University of Southern California Digital Library
Repository Location: USC Digital Library, University of Southern California, University Park Campus MC 2810, 3434 South Grand Avenue, 2nd Floor, Los Angeles, California 90089-2810, USA