Minimum p-value approach in two-step tests of genome-wide gene-environment interactions
by
Ziting Jiao
A Thesis Presented to the
FACULTY OF THE USC KECK SCHOOL OF MEDICINE
UNIVERSITY OF SOUTHERN CALIFORNIA
In Partial Fulfillment of the
Requirements for the Degree
MASTER OF SCIENCE
(BIOSTATISTICS)
May 2023
Copyright 2023 Ziting Jiao
Acknowledgements
I would like to acknowledge and give my warmest thanks to my advisor, Professor Eric Kawaguchi, who
made this work possible. His guidance and advice carried me through all the stages of writing my project.
I would also like to thank my committee members, Professors Gauderman and Lewinger, for all their help
and suggestions.
Table of Contents
Acknowledgements
List of Tables
Abstract
Chapter 1: Introduction
Chapter 2: Methods
2.1 Standard GWIS
2.2 Two-step testing
2.2.1 Three types of screening statistics
2.2.2 Different interaction patterns
2.2.3 Subset and weighted approach
2.3 The minimum p-value approach
Chapter 3: Simulation
3.1 Parameter settings
3.2 Application of two-step testing with new min-p approach
Chapter 4: Result
Chapter 5: Conclusions
Bibliography
List of Tables
3.1 Settings for $OR_{G\times E}$ and corresponding $OR_G$ and $OR_E$
4.1 Estimated Type I error rates for tests of interaction in all parameter settings using different screening statistics
4.2 Estimated power for tests of interaction in all parameter settings using different screening statistics
Abstract
Faced with the same environmental exposure, people with different genetic backgrounds can have different risks of developing disease. Identifying the specific genes that interact with environmental factors is therefore essential. Genome-wide interaction scans (GWIS) were introduced to address this question, but GWIS suffers from low power, which led to the development of the two-step testing method. In case-control studies, several existing screening statistics have been shown to be powerful in step 1, but their performance depends on the underlying interaction pattern. To overcome this issue, we introduce a minimum p-value screening method that aims to perform well when the interaction pattern is unknown. Using a simulation study, we compare the efficacy of this new approach with existing screening methods under both the subset and weighted step-2 approaches, with GWIS as a reference. The results show that the new approach performs more reliably across interaction patterns without compromising the family-wise error rate (FWER).
Chapter 1
Introduction
Gene-environment interactions (GxE) refer to the interplay between genetic factors and environmental exposures in determining an individual's risk of developing a disease. Investigating GxE is crucial for identifying the genetic and environmental factors that contribute to disease risk and for understanding the underlying disease mechanisms.
Given the importance of detecting gene-environment interactions, numerous approaches, study designs, and statistical methods have been developed.[10] The traditional method is an exhaustive scan that tests each SNP in turn within a regression framework. The main goal of subsequent work has been to increase the power to detect interactions, and many approaches have been proposed for quantitative[14][13], binary[9], and time-to-event[8] traits. Among these, two-step testing is widely used. This study introduces a new screening statistic for step 1 of two-step testing, aimed at maintaining power across different marginal-effect patterns. The proposed approach addresses the low power of the exhaustive scan and the variable performance of existing screening statistics by providing a more reliable method for detecting GxE interactions.
Chapter 2
Methods
2.1 Standard GWIS
Genome-wide interaction scans (GWIS) are a simple way to test GxE interactions. As in the model below, GWIS tests one SNP at a time by modeling the genotype $G_j$, an environmental exposure $E$, and the corresponding interaction term $G_j \times E$, where $j = 1, \dots, M$ indexes the SNPs. The exposure $E$ can represent either the external environment (e.g., water quality) or a personal characteristic (e.g., gender). The test of gene-environment interaction is based on the significance of the interaction term, typically from a logistic (binary or disease trait), linear (quantitative trait), or Cox[2] (survival trait) regression model; this study focuses only on a binary disease outcome $D$ from a case-control study.
GWIS is performed using the following logistic regression model:
$$\operatorname{logit}\Pr(D = 1 \mid G_j, E) = \beta_{0j} + \beta_{G_j} G_j + \beta_E E + \beta_{G_j \times E}\,(G_j \times E)$$
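For illustration only, the per-SNP GWIS test can be sketched in Python with the standard library: the function below fits the logistic model above by Newton-Raphson and returns the Wald p-value for $\beta_{G_j \times E}$. The function names (`fit_logistic`, `gwis_interaction_test`) and the simulated data (additive coding of $G_j$, a hypothetical intercept of -1, no main effects, interaction odds ratio 2.2) are assumptions of this sketch, not the code used in this thesis; in practice a standard GLM routine would be used.

```python
import math
import random


def expit(eta):
    """Numerically stable inverse logit."""
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    z = math.exp(eta)
    return z / (1.0 + z)


def solve(a, b):
    """Solve a x = b (small dense system) by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= f * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_logistic(x, y, max_iter=50, tol=1e-8):
    """Newton-Raphson logistic regression; returns (beta, Fisher information near convergence)."""
    k = len(x[0])
    beta = [0.0] * k
    for _ in range(max_iter):
        score = [0.0] * k
        info = [[0.0] * k for _ in range(k)]
        for xi, yi in zip(x, y):
            mu = expit(sum(b * v for b, v in zip(beta, xi)))
            w = mu * (1.0 - mu)
            for a in range(k):
                score[a] += (yi - mu) * xi[a]
                for c in range(k):
                    info[a][c] += w * xi[a] * xi[c]
        step = solve(info, score)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    return beta, info


def gwis_interaction_test(g, e, d):
    """Wald test of H0: beta_{GxE} = 0 in logit Pr(D=1|G,E) = b0 + bG*G + bE*E + bGE*G*E."""
    x = [[1.0, gi, ei, gi * ei] for gi, ei in zip(g, e)]
    beta, info = fit_logistic(x, d)
    # Var(beta_GE) is the last diagonal element of info^{-1}: solve info v = (0, 0, 0, 1).
    var_ge = solve(info, [0.0, 0.0, 0.0, 1.0])[3]
    z = beta[3] / math.sqrt(var_ge)
    return beta[3], math.erfc(abs(z) / math.sqrt(2.0))


rng = random.Random(2023)
g, e, d = [], [], []
for _ in range(3000):
    gi = int(rng.random() < 0.23) + int(rng.random() < 0.23)   # additive genotype, MAF 0.23
    ei = 1 if rng.random() < 0.4 else 0
    eta = -1.0 + math.log(2.2) * gi * ei                        # hypothetical disease model
    g.append(gi)
    e.append(ei)
    d.append(1 if rng.random() < expit(eta) else 0)
est, p = gwis_interaction_test(g, e, d)
```

In a real scan this test is repeated for each of the $M$ SNPs, which is what makes the multiple-testing correction below necessary.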
To test for interaction in GWIS, we use the statistic for testing the null hypothesis $H_0\colon \beta_{G_j \times E} = 0$ for each SNP. Rather than the nominal significance level $\alpha = 0.05$, a Bonferroni-adjusted level $\alpha^* = \alpha/M$ is used, where $M$ is the total number of SNPs tested. Any interaction whose test p-value is smaller than $\alpha^*$ is considered significant. Because $M$ is large in a genome-wide scan, this correction makes $\alpha^*$ very small, which leads to low power for detecting GxE interactions.
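With the $M = 20{,}000$ SNPs used in the simulation (Chapter 3), $\alpha^* = 0.05/20{,}000 = 2.5 \times 10^{-6}$. The short sketch below applies this rule to hypothetical p-values; the function name and the inputs are illustrative only.

```python
def gwis_significant(pvalues, alpha=0.05):
    """Bonferroni rule for GWIS: a SNP is significant if its interaction p-value < alpha / M."""
    threshold = alpha / len(pvalues)          # alpha* = alpha / M
    hits = [j for j, p in enumerate(pvalues) if p < threshold]
    return threshold, hits


# Hypothetical p-values for M = 20,000 SNPs: SNP 0 passes alpha* = 2.5e-6, while SNP 1
# (p = 1e-5) would pass the unadjusted 0.05 level but fails the Bonferroni threshold.
pvalues = [0.5] * 20000
pvalues[0] = 1e-7
pvalues[1] = 1e-5
threshold, hits = gwis_significant(pvalues)
```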
2.2 Two-step testing
To address this issue, the two-step testing method was proposed, which uses two steps to identify potential GxE effects more efficiently. The two-step method improves power compared with GWIS while maintaining control of the Type I error rate.
In the first step, a screening test is performed to identify the genetic variants most likely to interact with the environmental exposure. A screening statistic $T_1$ and corresponding p-value $p_1$ are computed for each SNP. Many ways of computing screening statistics have been proposed[12][3][5], and their performance depends on the underlying GxE pattern.
In the second step, SNPs are ranked by $p_1$, with smaller $p_1$ ranked higher, and the GxE interaction is tested using the regression model above. For each SNP, the GxE test statistic $T_2$ and its p-value $p_2$ come from testing $H_0\colon \beta_{G_j \times E} = 0$. A key requirement for any two-step method is that the step-1 screening statistic $T_1$ and the step-2 test statistic $T_2$ are independent[8]. The step-1 ranking determines which SNPs are tested in step 2, or which significance level each receives, so that the SNPs most likely to interact face a less severe multiple-testing correction; this increases statistical power while the family-wise error rate remains controlled. Several step-2 methods have been proposed[12][7]; this study uses the weighted method[6] and the subset method[9], which are introduced in Section 2.2.3.
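The two steps can be summarised in a few lines of Python. This is a generic skeleton under stated assumptions, not the thesis code: step-1 and step-2 p-values are taken as given, and the step-1 rank is passed to a hypothetical `threshold_for_rank` function, so the subset and weighted approaches of Section 2.2.3 differ only in how that function is defined.

```python
def two_step(p1, p2, threshold_for_rank):
    """Generic two-step test (illustrative sketch).

    p1: step-1 screening p-values, one per SNP.
    p2: step-2 interaction p-values, one per SNP (assumed independent of p1).
    threshold_for_rank: maps a 0-based step-1 rank to a step-2 significance level;
        returning 0.0 means the SNP is effectively not tested.
    Returns the sorted indices of SNPs whose GxE null hypothesis is rejected.
    """
    order = sorted(range(len(p1)), key=lambda j: p1[j])  # smallest p1 ranked first
    return sorted(j for rank, j in enumerate(order) if p2[j] < threshold_for_rank(rank))


# Toy example with 4 SNPs: only the two top-ranked SNPs are tested, each at 0.05 / 2.
p1 = [0.9, 0.001, 0.2, 0.04]
p2 = [0.001, 0.004, 0.5, 0.02]
rejected = two_step(p1, p2, lambda rank: 0.05 / 2 if rank < 2 else 0.0)
# rejected == [1, 3]; SNP 0 has a small p2 but is screened out in step 1.
```

The toy example also shows the cost of screening: a true signal with a weak step-1 result can be missed.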
2.2.1 Three types of screening statistics
Three types of step-1 screening tests are used in this study. The first is a test of the marginal association between D and G (DG). The DG statistic[9] tests $H_0\colon \gamma_{G_j} = 0$ for $j = 1, \dots, M$, where $\gamma_{G_j}$ is estimated from the model
$$\operatorname{logit}\Pr(D = 1 \mid G_j) = \gamma_{0j} + \gamma_{G_j} G_j$$
The second is a test of the association between E and G (EG). The EG statistic[12] tests $H_0\colon \mu_{gE} = 0$, where $\mu_{gE}$ is estimated from the model
$$\operatorname{logit}\Pr(G = g \mid E) = \mu_g + \mu_{gE} E$$
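As a lightweight stand-in for the likelihood-based DG and EG tests, the sketch below uses the score (Cochran-Armitage trend) statistic for the slope in a logistic regression of a binary variable on the additive genotype. That statistic equals $n r^2$, where $r$ is the Pearson correlation between the two variables, so the signed version $z = r\sqrt{n}$ is approximately standard normal under no association. Applied to (G, D) it gives a DG-type statistic; applied to (G, E) in the pooled case-control sample it gives an EG-type statistic, although it regresses E on G rather than using the model for G given E above. These choices and the function names are assumptions for illustration.

```python
import math


def signed_trend_z(g, y):
    """Signed score (trend) statistic z = r * sqrt(n) for an additive genotype g (0/1/2)
    and a binary variable y; z**2 = n * r**2 is the logistic-regression score statistic
    for the slope of y on g and is approximately chi-square(1) under no association."""
    n = len(g)
    mg = sum(g) / n
    my = sum(y) / n
    sxy = sum((gi - mg) * (yi - my) for gi, yi in zip(g, y))
    sxx = sum((gi - mg) ** 2 for gi in g)
    syy = sum((yi - my) ** 2 for yi in y)
    if sxx == 0 or syy == 0:          # monomorphic SNP or constant outcome: no information
        return 0.0
    return sxy / math.sqrt(sxx * syy) * math.sqrt(n)


def two_sided_p(z):
    """Two-sided normal p-value, 2 * (1 - Phi(|z|))."""
    return math.erfc(abs(z) / math.sqrt(2.0))


def dg_eg_screen(g, e, d):
    """DG-type and EG-type screening statistics for one SNP in the combined case-control sample."""
    z_dg = signed_trend_z(g, d)       # marginal G-D association
    z_eg = signed_trend_z(g, e)       # G-E association, cases and controls pooled (assumption)
    return z_dg, z_eg, two_sided_p(z_dg), two_sided_p(z_eg)


g = [0, 0, 1, 1, 2, 2]
d = [0, 0, 0, 1, 1, 1]
e = [0, 1, 0, 1, 0, 1]
z_dg, z_eg, p_dg, p_eg = dg_eg_screen(g, e, d)   # z_dg is 2.0 up to rounding, z_eg is 0.0
```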
Finally, EDGE[3], which combines the DG and EG screening statistics, has been proposed to improve efficiency. Its statistic is
$$T_{EDGE} = T_{DG}^2 + T_{EG}^2$$
Because $T_{DG}$ and $T_{EG}$ each follow a standard normal distribution under the null hypothesis and are asymptotically independent, $T_{EDGE}$ follows a chi-square distribution with two degrees of freedom, from which the corresponding p-value is obtained.
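The EDGE p-value needs no numerical integration, because the upper tail of a chi-square distribution with two degrees of freedom has the closed form $\Pr(\chi^2_2 \ge t) = e^{-t/2}$. The sketch assumes the signed DG and EG z-statistics are already available; the function name is illustrative.

```python
import math


def edge_screen(z_dg, z_eg):
    """EDGE screening statistic and p-value.

    T_EDGE = z_DG**2 + z_EG**2 is referred to a chi-square distribution with 2 degrees of
    freedom, whose upper-tail probability is Pr(chi2_2 >= t) = exp(-t / 2).
    """
    t_edge = z_dg ** 2 + z_eg ** 2
    return t_edge, math.exp(-t_edge / 2.0)


t_edge, p_edge = edge_screen(2.0, 1.0)   # T_EDGE = 5.0, p = exp(-2.5), about 0.082
```

The p-value is an upper-tail probability: larger values of $T_{EDGE}$ give smaller p-values.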
2.2.2 Different interaction patterns
Even with the same interaction effect size $OR_{G\times E}$, the joint effects of G and E on D can follow completely different patterns. A key concept here is the marginal effect[1]. In a case-control study, the marginal effect of a gene (G) on disease (D) is measured by the genetic odds ratio $OR_G$, which can be calculated as $\exp(\gamma_{G_j})$ from the DG regression model above. The marginal effect $OR_G$ is approximately a weighted average of the genetic odds ratios in the two exposure groups, $OR_{G \mid E=0}$ and $OR_{G \mid E=1}$.
In this study, three statistics are used for step-1 screening, and each performs differently depending on the interaction pattern between gene and exposure. For example, if there is a GxE effect but no marginal G effect, the DG statistic performs poorly, because its model contains G but not E and can only detect a marginal G-D association. If there is a G effect but no GxE effect, DG would likely be the best screening statistic, whereas EG would perform relatively poorly. Performance also depends on whether the GxE effect induces a marginal G effect. Because EDGE combines the two methods, it is unlikely to be the worst performer. It would therefore be helpful to find a screening statistic that performs well regardless of the interaction pattern.
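To illustrate how a GxE effect can induce a marginal G effect, the sketch below averages a conditional logistic model over E and computes the marginal odds ratio for one copy of the allele. It assumes G and E are independent in the population and $\Pr(E = 1) = 0.4$ as in the simulation; the coefficient values are hypothetical and are not the parameters behind Table 3.1.

```python
import math


def expit(x):
    return 1.0 / (1.0 + math.exp(-x))


def marginal_or_g(beta0, beta_g, beta_e, beta_ge, pe=0.4):
    """Marginal OR_G for G=1 vs G=0 implied by
    logit Pr(D=1|G,E) = beta0 + beta_g*G + beta_e*E + beta_ge*G*E,
    averaging over E ~ Bernoulli(pe) with G and E independent (assumption)."""

    def risk(g):
        # Pr(D=1 | G=g) = sum over e of Pr(E=e) * Pr(D=1 | G=g, E=e)
        return (1.0 - pe) * expit(beta0 + beta_g * g) + pe * expit(
            beta0 + beta_g * g + beta_e + beta_ge * g
        )

    def odds(p):
        return p / (1.0 - p)

    return odds(risk(1)) / odds(risk(0))


# Pure interaction (no G main effect): the stratum-specific ORs are 1 (E=0) and 2.2 (E=1),
# yet the marginal OR_G differs from 1, i.e. the interaction induces a marginal G effect.
induced = marginal_or_g(beta0=-3.0, beta_g=0.0, beta_e=0.0, beta_ge=math.log(2.2))
# Qualitative pattern: G is protective when E=0 (OR 0.8) but harmful when E=1 (OR 1.76).
qualitative = marginal_or_g(beta0=-3.0, beta_g=math.log(0.8), beta_e=0.0, beta_ge=math.log(2.2))
```

With a rare disease and no interaction, the marginal OR essentially equals the conditional OR, which is the setting in which the weighted-average description is closest to exact.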
2.2.3 Subset and weighted approach
For step 2 of two-step testing, both the subset and the weighted methods are used, in order to give a more comprehensive evaluation of the newly proposed minimum p-value method. The subset approach selects the top-ranked SNPs based on the step-1 results, typically the top 5%. The interactions in this subset are then tested at the Bonferroni-corrected significance level $\alpha^* = \alpha/m$, where $m$ is the number of selected SNPs. Because $m < M$, this threshold is less stringent than the GWIS threshold $\alpha/M$: the correction accounts only for the number of tests actually performed in step 2, which still controls the multiple-testing burden.
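A minimal sketch of the subset approach, assuming step-1 and step-2 p-values are already available for every SNP. The rule for turning 5% of $M$ into an integer subset size (rounding here) is an assumption; with $M = 20{,}000$ it gives $m = 1{,}000$ and $\alpha^* = 0.05/1{,}000 = 5 \times 10^{-5}$, a threshold 20 times larger than the GWIS level of $2.5 \times 10^{-6}$.

```python
def subset_two_step(p1, p2, alpha=0.05, fraction=0.05):
    """Subset approach: keep the top `fraction` of SNPs by step-1 p-value and test only
    those in step 2 at the Bonferroni level alpha / m (m = subset size)."""
    M = len(p1)
    m = max(1, round(fraction * M))                      # subset size (rounding is an assumption)
    top = sorted(range(M), key=lambda j: p1[j])[:m]     # indices of the m best-screened SNPs
    threshold = alpha / m
    rejected = sorted(j for j in top if p2[j] < threshold)
    return m, threshold, rejected


# Hypothetical data: SNP 0 screens strongly in step 1 and has p2 = 1e-5,
# which fails the GWIS level 2.5e-6 but passes the subset level 5e-5.
M = 20000
p1 = [0.5] * M
p2 = [0.9] * M
p1[0], p2[0] = 1e-6, 1e-5
m, threshold, rejected = subset_two_step(p1, p2)
```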
The weighted approach tests all SNPs, rather than only a subset, using weighted significance levels. After the SNPs are sorted by the significance of the step-1 screening test, they are divided into a total of $B$ bins, and all SNPs within a bin share the same significance level. As suggested in [6], the first bin has size $B_0 = 5$, and each subsequent bin is twice as large as the previous one, so that bin $b$ contains $B_{b-1} = 2^{b-1} B_0$ SNPs (for example, $B_1 = 10$, $B_2 = 20$, $B_3 = 40$, and so on). To control the FWER at level $\alpha$, $\alpha$ is split into $\alpha_1, \alpha_2, \dots, \alpha_B$ with $\alpha_b = \alpha/2^b$, so that for large $B$
$$\alpha_1 + \alpha_2 + \cdots + \alpha_B = \alpha\left(1 - 2^{-B}\right) \approx \alpha.$$
Within each bin, a Bonferroni correction is also applied,
$$\alpha_b^* = \frac{\alpha_b}{B_{b-1}},$$
where $B_{b-1}$ is the size of bin $b$; for the first bin this gives $\alpha_1^* = \alpha/(2B_0)$. The per-SNP threshold therefore decreases as the bins get larger: SNPs with smaller step-1 p-values receive larger step-2 thresholds. Because every SNP is tested and the most generous thresholds are reserved for the top-ranked SNPs, the weighted method is likely to be more powerful than the subset method.
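The bin construction can be written as a function that returns a step-2 threshold for every step-1 rank. This sketch makes one assumption not stated in the text: when the bins overshoot $M$, the last bin is simply truncated while keeping its full-bin per-SNP threshold, which keeps the total error budget below $\alpha$. With $M = 20{,}000$ and $B_0 = 5$ this yields 12 bins, a threshold of $0.05/(2 \cdot 5) = 0.005$ for the top five SNPs, and $0.05/(4 \cdot 10) = 0.00125$ for the next ten. The function names are illustrative.

```python
def weighted_thresholds(M, alpha=0.05, b0=5):
    """Per-rank step-2 thresholds for the weighted approach.

    Bin b (b = 1, 2, ...) holds B_{b-1} = b0 * 2**(b-1) SNPs and receives
    alpha_b = alpha / 2**b, split evenly among its SNPs (Bonferroni):
    alpha*_b = alpha_b / B_{b-1}. The final bin is truncated at M SNPs (assumption).
    Returns (thresholds indexed by 0-based step-1 rank, number of bins).
    """
    thresholds = []
    b = 0
    while len(thresholds) < M:
        b += 1
        size = b0 * 2 ** (b - 1)
        per_snp = alpha / 2 ** b / size
        thresholds.extend([per_snp] * min(size, M - len(thresholds)))
    return thresholds, b


def weighted_two_step(p1, p2, alpha=0.05, b0=5):
    """Test every SNP in step 2 at the threshold attached to its step-1 rank."""
    thresholds, _ = weighted_thresholds(len(p1), alpha, b0)
    order = sorted(range(len(p1)), key=lambda j: p1[j])   # best screening p-value first
    return sorted(j for rank, j in enumerate(order) if p2[j] < thresholds[rank])


thresholds, n_bins = weighted_thresholds(20000)
```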
2.3 The minimum p-value approach
This study introduces a novel screening statistic, the minimum p-value (min-p) approach. Its screening statistic is obtained by computing the p-values of several existing screening statistics and taking the smallest. In step 1 of the two-step testing approach, if $q$ screening statistics $T_1, T_2, \dots, T_q$ are computed for a SNP, with corresponding p-values $p_1, p_2, \dots, p_q$, the minimum p-value is
$$p_{pmin1} = \min(p_1, p_2, \dots, p_q).$$
In this study, the minimum over the DG, EG, and EDGE screening statistics is used. The rationale is that different screening methods perform differently under different interaction patterns, so taking the minimum may let each SNP be ranked by whichever screening statistic is most effective for it.
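Computing $p_{pmin1}$ is an element-wise minimum over the per-SNP screening p-values. Because both step-2 approaches use the step-1 values only to rank SNPs, $p_{pmin1}$ does not need to be a calibrated p-value itself. The p-values and the function name below are hypothetical.

```python
def min_p(*pvalue_lists):
    """Element-wise minimum across screening p-value lists (one list per statistic,
    one entry per SNP): p_pmin1 = min(p_1, ..., p_q) for each SNP."""
    return [min(ps) for ps in zip(*pvalue_lists)]


p_dg = [0.30, 0.001, 0.80]
p_eg = [0.02, 0.50, 0.90]
p_edge = [0.04, 0.004, 0.95]
p_pmin1 = min_p(p_dg, p_eg, p_edge)                              # [0.02, 0.001, 0.8]
ranking = sorted(range(len(p_pmin1)), key=lambda j: p_pmin1[j])  # [1, 0, 2]
```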
In addition, to assess whether performance is affected by including invalid screening statistics, a second min-p statistic is introduced in the simulation. Besides the $q$ screening p-values, $t$ invalid p-values are generated at random, and the second minimum p-value is
$$p_{pmin2} = \min(p_1, \dots, p_q, p_{q+1}, \dots, p_{q+t}).$$
If $p_{pmin2}$, which includes the invalid screening statistics, performs worse than $p_{pmin1}$, this indicates that invalid screening statistics should not be included when using the new approach.
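A sketch of $p_{pmin2}$, assuming the $t$ invalid p-values are independent Uniform(0, 1) draws (the text only says they are generated at random). Because extra values can only lower a minimum, $p_{pmin2} \le p_{pmin1}$ for every SNP, and the random draws can move SNPs with no real signal up the step-1 ranking. Names and inputs are illustrative.

```python
import random


def min_p_with_invalid(pvalue_lists, t, rng):
    """p_pmin2 for each SNP: minimum over the q valid screening p-values plus t random
    'invalid' p-values drawn as Uniform(0, 1) (assumption)."""
    out = []
    for ps in zip(*pvalue_lists):
        noise = [rng.random() for _ in range(t)]
        out.append(min(list(ps) + noise))
    return out


rng = random.Random(7)
valid = ([0.30, 0.001, 0.80], [0.02, 0.50, 0.90], [0.04, 0.004, 0.95])   # DG, EG, EDGE
p_pmin1 = [min(ps) for ps in zip(*valid)]
p_pmin2 = min_p_with_invalid(valid, t=3, rng=rng)
```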
Chapter 3
Simulation
In this study, we compared the performance of DG, EG, EDGE, and the two newly proposed minimum p-value statistics within two-step testing under different interaction patterns. Two procedures for prioritizing SNPs in step-2 GxE testing after step-1 screening are used: subset and weighted hypothesis testing. For a comprehensive evaluation, the standard one-step GWIS is included as a reference for each interaction pattern.
3.1 Parameter settings
$G$ is an $N \times M$ genotype matrix with $N = 2{,}000$ individuals[4] and $M = 20{,}000$ SNPs. Except for $G_1$, each $G_j$ is simulated using a minor allele frequency drawn from a uniform distribution between 0.1 and 0.3. $G_1$ is the disease susceptibility locus (DSL); it has a GxE interaction effect and a minor allele frequency of 0.23. $E$ is a binary exposure with $\Pr(E = 1) = 0.4$. Two interaction effects are considered: a modest effect ($OR_{G\times E} = 1.6$) and a stronger effect ($OR_{G\times E} = 2.2$)[11]. For each interaction effect, four combinations of marginal genetic and environmental effects are used (Table 3.1). These settings cover a wide range of disease risk models for G × E interaction, including qualitative models (where the effects of G go in opposite directions depending on E) and quantitative models (where the effects of G go in the same direction but differ in magnitude across levels of E). In the two-step testing, the overall significance level is $\alpha = 0.05$, and for the weighted approach the initial bin size is $B_0 = 5$[6].
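The genotype and exposure settings above can be simulated as follows. Drawing each genotype as Binomial(2, MAF), which assumes Hardy-Weinberg equilibrium, and drawing G and E independently are assumptions of this sketch. The usage line uses only five SNPs because a full 2,000 × 20,000 matrix of Python lists would be slow; an array library would normally be used instead.

```python
import random


def simulate_g_e(n, m, rng, dsl_maf=0.23, pe=0.4):
    """Simulate an n x m additive genotype matrix (0/1/2 minor alleles) and a binary exposure.

    Column 0 plays the role of the DSL G_1 with MAF 0.23; every other column uses a MAF
    drawn from Uniform(0.1, 0.3). E ~ Bernoulli(pe), independent of G (assumption).
    """
    mafs = [dsl_maf] + [rng.uniform(0.1, 0.3) for _ in range(m - 1)]
    # Two independent allele draws per person and SNP, i.e. Binomial(2, MAF) under HWE.
    g = [[int(rng.random() < f) + int(rng.random() < f) for f in mafs] for _ in range(n)]
    e = [1 if rng.random() < pe else 0 for _ in range(n)]
    return mafs, g, e


rng = random.Random(2023)
mafs, g, e = simulate_g_e(n=2000, m=5, rng=rng)
```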
Table 3.1: Settings for $OR_{G\times E}$ and corresponding $OR_G$ and $OR_E$
OR_G×E = 1.6      OR_G×E = 2.2
OR_G    OR_E      OR_G    OR_E
0.81    0.99      0.70    0.87
0.89    0.98      0.77    0.85
1.05    0.96      0.91    0.83
1.22    0.95      1.06    0.80
3.2 Application of two-step testing with new min-p approach
To obtain a balanced dataset, data are generated until there are 1,000 cases and 1,000 controls. For two-step testing, all required screening statistics are computed first: DG, EG, EDGE, and the two minimum p-values. Each statistic takes $M$ values, one per SNP, because each follows a one-SNP-at-a-time model. After $p_{DG}$ and $p_{EG}$ are obtained from the corresponding logistic regression models, $p_{EDGE}$ is the probability that a chi-square random variable with two degrees of freedom is at least $T_{DG}^2 + T_{EG}^2$. $p_{pmin1}$ is the minimum of the three screening p-values $p_{DG}$, $p_{EG}$, and $p_{EDGE}$. For $p_{pmin2}$, three random values between 0 and 1 are generated ($t = 3$), and $p_{pmin2}$ is the minimum of all six values, including the three random ones.
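The case-control sampling step (drawing individuals until both quotas are filled) can be sketched as below for a single SNP. The disease model coefficients, in particular the intercept of -3 and the pure-interaction effect $\log 2.2$, are hypothetical values chosen for illustration; the conditional coefficients behind Table 3.1 are not reported here. Function names are illustrative.

```python
import math
import random


def expit(x):
    return 1.0 / (1.0 + math.exp(-x))


def sample_case_control(n_cases, n_controls, draw_person, risk, rng):
    """Draw (g, e) pairs from the population and assign disease status with probability
    risk(g, e) until n_cases cases and n_controls controls have been collected."""
    cases, controls = [], []
    while len(cases) < n_cases or len(controls) < n_controls:
        g, e = draw_person()
        if rng.random() < risk(g, e):
            if len(cases) < n_cases:
                cases.append((g, e))
        elif len(controls) < n_controls:
            controls.append((g, e))
    return cases, controls


rng = random.Random(11)
maf, pe = 0.23, 0.4


def draw_person():
    g = int(rng.random() < maf) + int(rng.random() < maf)   # Binomial(2, MAF)
    e = int(rng.random() < pe)
    return g, e


def risk(g, e):
    # Hypothetical conditional model: logit Pr(D=1|G,E) = -3 + log(2.2) * G * E.
    return expit(-3.0 + math.log(2.2) * g * e)


cases, controls = sample_case_control(1000, 1000, draw_person, risk, rng)
```

Because surplus cases or controls are discarded once a quota is met, the final sample is balanced regardless of disease prevalence.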
In step 2, the two approaches are applied separately. For the subset approach, SNPs are ranked by their screening p-values from smallest to largest and only the top 5% are kept. A constant significance level $\alpha^* = \alpha/(0.05M)$ is used as the threshold. For the weighted approach, the same ranking is used but all SNPs are kept, and the thresholds $\alpha^*$ computed as described in Section 2.2.3 are assigned in descending order, so that the top-ranked SNPs receive the largest thresholds. Each SNP's test of the null hypothesis $\beta_{G_j \times E} = 0$ is then compared with its threshold $\alpha^*$.
For both approaches, power is estimated as the proportion of simulation replicates in which the null hypothesis $H_0\colon \beta_{G_1 \times E} = 0$ is rejected at level $\alpha^*$. The FWER is estimated as the proportion of replicates in which the null hypothesis is rejected for at least one of the SNPs $j = 2, \dots, M$ at the corresponding significance level. Because the FWER is set to 5%, the estimated FWER from the simulation should also be close to 5%.
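Given the set of SNPs rejected in each simulation replicate, power and FWER are simple proportions. The sketch below also reports a Monte Carlo standard error, $\sqrt{\hat{p}(1-\hat{p})/R}$ for $R$ replicates, which indicates how far an estimated FWER can plausibly fall from 0.05 by chance. The number of replicates is not stated in the text, so $R = 10{,}000$ in the usage line is purely illustrative, as are the toy rejection sets; SNP index 0 stands for the DSL $G_1$.

```python
import math


def power_and_fwer(rejections, dsl=0):
    """rejections: one set of rejected SNP indices per simulation replicate.
    Power = share of replicates rejecting the DSL; FWER = share rejecting any other SNP."""
    r = len(rejections)
    power = sum(dsl in rej for rej in rejections) / r
    fwer = sum(any(j != dsl for j in rej) for rej in rejections) / r
    return power, fwer


def mc_standard_error(p_hat, r):
    """Monte Carlo standard error of a proportion estimated from r replicates."""
    return math.sqrt(p_hat * (1.0 - p_hat) / r)


power, fwer = power_and_fwer([{0}, set(), {0, 17}, {42}])   # power 0.5, FWER 0.5
se = mc_standard_error(0.05, 10000)                          # about 0.0022
```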
Chapter 4
Result
We first compare the family-wise error rate across all parameter settings (Table 4.1). The estimated FWER does not deviate substantially from 0.05 for any step-1 screening statistic, including the two new min-p statistics: all estimates lie between 0.0388 and 0.0502. This indicates that the new minimum p-value approach controls the Type I error, so meaningful comparisons of power with the other screening methods are possible.
Table 4.1: Estimated Type I error rates for tests of interaction in all parameter settings using different screening statistics
                  OR_G×E = 1.6                      OR_G×E = 2.2
OR_G              0.81    0.89    1.05    1.22      0.70    0.77    0.91    1.06
OR_E              0.99    0.98    0.96    0.95      0.87    0.85    0.83    0.80
GWIS              0.0454  0.0421  0.0469  0.0457    0.0440  0.0426  0.0431  0.0435
subset   DG       0.0442  0.0458  0.0443  0.0467    0.0472  0.0439  0.0498  0.0470
         EG       0.0476  0.0446  0.0427  0.0469    0.0457  0.0493  0.0456  0.0433
         EDGE     0.0459  0.0433  0.0412  0.0499    0.0435  0.0482  0.0488  0.0447
         pmin1    0.0462  0.0441  0.0411  0.0486    0.0462  0.0472  0.0476  0.0448
         pmin2    0.0428  0.0423  0.0423  0.0495    0.0464  0.0485  0.0464  0.0463
weighted DG       0.0481  0.0487  0.0423  0.0471    0.0478  0.0502  0.0453  0.0440
         EG       0.0435  0.0449  0.0433  0.0440    0.0445  0.0499  0.0471  0.0412
         EDGE     0.0457  0.0489  0.0388  0.0449    0.0490  0.0456  0.0436  0.0391
         pmin1    0.0444  0.0469  0.0402  0.0459    0.0470  0.0462  0.0458  0.0388
         pmin2    0.0472  0.0479  0.0411  0.0446    0.0444  0.0456  0.0462  0.0414
Table 4.2 shows that two-step testing methods generally have higher power than traditional GWIS. In addition, under the same interaction pattern, the power obtained with the weighted method in step 2 is generally greater than that obtained with the subset method, suggesting that the weighted method is better at detecting significant GxE interactions.
Table 4.2 also shows that no single screening statistic has the greatest power across all patterns. For the modest interaction effect ($OR_{G\times E} = 1.6$), the EG screening statistic has the largest power when $OR_G = 0.81, OR_E = 0.99$ and $OR_G = 0.89, OR_E = 0.98$, and the EDGE screening statistic has the largest power when $OR_G = 1.05, OR_E = 0.96$ and $OR_G = 1.22, OR_E = 0.95$, under both step-2 approaches. For the stronger interaction effect ($OR_{G\times E} = 2.2$), EG has the largest power when $OR_G = 0.70, OR_E = 0.87$ and $OR_G = 0.77, OR_E = 0.85$, and EDGE has the largest power when $OR_G = 0.91, OR_E = 0.83$ and $OR_G = 1.06, OR_E = 0.80$. Although the new minimum p-value statistic ($p_{pmin1}$) does not have the highest power, both the table and Figure 4.1 show that it attains the second-highest power in nearly every parameter setting: it ties for the highest power in one subset setting ($OR_G = 1.22, OR_E = 0.95$) and ranks third in one weighted setting ($OR_G = 0.89, OR_E = 0.98$).
[Figure 4.1: Power comparison among different screening statistics using the parameter settings. Panels for $OR_{G\times E} = 1.6$ and $OR_{G\times E} = 2.2$, each under the subset and weighted step-2 methods, plot power against the four ($OR_G$, $OR_E$) settings for GWIS, DG, EG, EDGE, pmin1, and pmin2.]
Notably, when three invalid screening statistics were added ($p_{pmin2}$), the min-p approach had lower power than $p_{pmin1}$ in every parameter setting, although in some settings the difference is negligible. It is therefore crucial that only validated screening methods be used with this approach. These findings are consistent across the subset and weighted methods, indicating that the comparison of screening statistics is not affected by the step-2 testing approach used.
Table 4.2: Estimated power for tests of interaction in all parameter settings using different screening statistics
                  OR_G×E = 1.6                      OR_G×E = 2.2
OR_G              0.81    0.89    1.05    1.22      0.70    0.77    0.91    1.06
OR_E              0.99    0.98    0.96    0.95      0.87    0.85    0.83    0.80
GWIS              0.0148  0.0152  0.1352  0.0157    0.3304  0.3186  0.3296  0.3343
subset   DG       0.0029  0.0141  0.3234  0.0689*   0.0293  0.1073  0.4848  0.5846
         EG       0.0490* 0.0502* 0.2990  0.0537    0.5825* 0.5733* 0.5810  0.5842
         EDGE     0.0425  0.0477  0.3715* 0.0689*   0.5722  0.5684  0.5844* 0.5877*
         pmin1    0.0430  0.0481  0.3622  0.0689*   0.5753  0.5690  0.5837  0.5876
         pmin2    0.0351  0.0394  0.3409  0.0685    0.5593  0.5564  0.5807  0.5875
weighted DG       0.0017  0.0090  0.3377  0.3502    0.0666  0.1266  0.5482  0.8816
         EG       0.0877* 0.0961* 0.3205  0.1179    0.8356* 0.8439* 0.8613  0.8721
         EDGE     0.0629  0.0897  0.3749* 0.3778*   0.7842  0.8182  0.9014* 0.9337*
         pmin1    0.0653  0.0866  0.3686  0.3698    0.8027  0.8208  0.8915  0.9320
         pmin2    0.0472  0.0625  0.3525  0.3539    0.7527  0.7820  0.8733  0.9295
*Largest power in each parameter setting, marked separately for the subset and weighted methods (tied values are all marked).
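The ranking statements in this chapter can be checked directly against Table 4.2. The short script below copies the table's subset and weighted power values (GWIS excluded) and computes, for each parameter setting, the rank of $p_{pmin1}$ among the five screening statistics (1 = highest power, ties sharing the better rank), and whether $p_{pmin2}$ is below $p_{pmin1}$ everywhere. The helper names are illustrative.

```python
# Power values from Table 4.2; columns follow the table: OR_GxE = 1.6 settings
# (OR_G = 0.81, 0.89, 1.05, 1.22), then OR_GxE = 2.2 settings (OR_G = 0.70, 0.77, 0.91, 1.06).
SUBSET = {
    "DG": [0.0029, 0.0141, 0.3234, 0.0689, 0.0293, 0.1073, 0.4848, 0.5846],
    "EG": [0.0490, 0.0502, 0.2990, 0.0537, 0.5825, 0.5733, 0.5810, 0.5842],
    "EDGE": [0.0425, 0.0477, 0.3715, 0.0689, 0.5722, 0.5684, 0.5844, 0.5877],
    "pmin1": [0.0430, 0.0481, 0.3622, 0.0689, 0.5753, 0.5690, 0.5837, 0.5876],
    "pmin2": [0.0351, 0.0394, 0.3409, 0.0685, 0.5593, 0.5564, 0.5807, 0.5875],
}
WEIGHTED = {
    "DG": [0.0017, 0.0090, 0.3377, 0.3502, 0.0666, 0.1266, 0.5482, 0.8816],
    "EG": [0.0877, 0.0961, 0.3205, 0.1179, 0.8356, 0.8439, 0.8613, 0.8721],
    "EDGE": [0.0629, 0.0897, 0.3749, 0.3778, 0.7842, 0.8182, 0.9014, 0.9337],
    "pmin1": [0.0653, 0.0866, 0.3686, 0.3698, 0.8027, 0.8208, 0.8915, 0.9320],
    "pmin2": [0.0472, 0.0625, 0.3525, 0.3539, 0.7527, 0.7820, 0.8733, 0.9295],
}


def rank_of(method, table):
    """Rank of `method` in each column (1 = highest power); ties share the better rank."""
    mine = table[method]
    return [1 + sum(col[c] > mine[c] for col in table.values()) for c in range(len(mine))]


def always_lower(a, b, table):
    """True if method a has strictly lower power than method b in every setting."""
    return all(x < y for x, y in zip(table[a], table[b]))


subset_ranks = rank_of("pmin1", SUBSET)        # [2, 2, 2, 1, 2, 2, 2, 2]
weighted_ranks = rank_of("pmin1", WEIGHTED)    # [2, 3, 2, 2, 2, 2, 2, 2]
```

In the subset table $p_{pmin1}$ ties for first once; in the weighted table it ranks third once, behind EG and EDGE.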
Chapter 5
Conclusions
This study focuses on using two-step testing to detect significant gene-environment (G × E) interactions in case-control studies. Because the DG and EG screening methods are each suited to different interaction patterns, the performance of EDGE may be compromised when one of its components is not informative, even though it integrates both approaches. This study therefore proposes a new method to obtain more stable screening statistics.
The simulation results show that two-step testing generally improves power over traditional GWIS while maintaining the family-wise error rate (FWER). Furthermore, in the simulation scenarios considered, using the weighted method in the second step is more effective than using the subset method. The newly proposed minimum p-value approach satisfies the independence condition. Although it did not perform the best, it had the second-largest power in nearly all of the selected parameter combinations and was more stable across interaction patterns than the other screening statistics. In conclusion, this new method is suitable when the interaction pattern is unknown and good detection power is desired. Its stability can reduce the number of screening statistics, and hence the computations, required. However, we caution against adding unvalidated screening statistics to this method, as doing so reduced the power of the min-p approach in our simulations.
Bibliography
[1] Heather Cordell. "Detecting gene-gene interactions that underlie human diseases". In: Nature Reviews Genetics 10 (June 2009), pp. 392–404. doi: 10.1038/nrg2579.
[2] D. R. Cox. "Regression Models and Life-Tables". In: Journal of the Royal Statistical Society: Series B (Methodological) 34.2 (1972), pp. 187–202. doi: 10.1111/j.2517-6161.1972.tb00899.x.
[3] W. Gauderman, Pingye Zhang, John Morrison, and Juan Pablo Lewinger. "Finding Novel Genes by Testing G × E Interactions in a Genome-Wide Association Study". In: Genetic Epidemiology 37 (Sept. 2013). doi: 10.1002/gepi.21748.
[4] W. James Gauderman. "Sample Size Requirements for Association Studies of Gene-Gene Interaction". In: American Journal of Epidemiology 155.5 (Mar. 2002), pp. 478–484. doi: 10.1093/aje/155.5.478.
[5] W. James Gauderman, Bhramar Mukherjee, Hugues Aschard, Li Hsu, Juan Pablo Lewinger, Chirag J. Patel, John S. Witte, Christopher Amos, Caroline G. Tai, David Conti, Dara G. Torgerson, Seunggeun Lee, and Nilanjan Chatterjee. "Update on the State of the Science for Analytical Methods for Gene-Environment Interactions". In: American Journal of Epidemiology 186.7 (July 2017), pp. 762–770. doi: 10.1093/aje/kwx228.
[6] Iuliana Ionita-Laza, Matthew B. McQueen, Nan M. Laird, and Christoph Lange. "Genomewide weighted hypothesis testing in family-based association studies, with an application to a 100K scan". In: American Journal of Human Genetics 81.3 (Sept. 2007), pp. 607–614. doi: 10.1086/519748.
[7] Eric S. Kawaguchi, Andre E. Kim, Juan Pablo Lewinger, and W. James Gauderman. "Improved two-step testing of genome-wide gene-environment interactions". In: bioRxiv (2022). doi: 10.1101/2022.06.14.496154.
[8] Eric S. Kawaguchi, Gang Li, Juan Pablo Lewinger, and W. James Gauderman. "Two-step hypothesis testing to detect gene-environment interactions in a genome-wide scan with a survival endpoint". In: Statistics in Medicine 41.9 (2022), pp. 1644–1657. doi: 10.1002/sim.9319.
[9] Charles Kooperberg and Michael LeBlanc. "Increasing the power of identifying gene × gene interactions in genome-wide association studies". In: Genetic Epidemiology 32.3 (2008), pp. 255–263. doi: 10.1002/gepi.20300.
[10] Kimberly McAllister, Leah E. Mechanic, Christopher Amos, Hugues Aschard, Ian A. Blair, Nilanjan Chatterjee, David Conti, W. James Gauderman, Li Hsu, Carolyn M. Hutter, Marta M. Jankowska, Jacqueline Kerr, Peter Kraft, Stephen B. Montgomery, Bhramar Mukherjee, George J. Papanicolaou, Chirag J. Patel, Marylyn D. Ritchie, Beate R. Ritz, Duncan C. Thomas, Peng Wei, John S. Witte, and on behalf of workshop participants. "Current Challenges and New Opportunities for Gene-Environment Interaction Studies of Complex Diseases". In: American Journal of Epidemiology 186.7 (July 2017), pp. 753–761. doi: 10.1093/aje/kwx227.
[11] Cassandra Milmont, Juan Pablo Lewinger, David Conti, Duncan Thomas, and W. Gauderman. "Sample Size Requirements to Detect Gene-Environment Interactions in Genome-Wide Association Studies". In: Genetic Epidemiology 35 (Apr. 2011), pp. 201–210. doi: 10.1002/gepi.20569.
[12] Cassandra E. Murcray, Juan Pablo Lewinger, and W. James Gauderman. "Gene-Environment Interaction in Genome-Wide Association Studies". In: American Journal of Epidemiology 169.2 (Nov. 2008), pp. 219–226. doi: 10.1093/aje/kwn353.
[13] Jianjun Zhang, Qiuying Sha, Han Hao, Shuanglin Zhang, Xiaoyi Raymond Gao, and Xuexia Wang. "Test Gene-Environment Interactions for Multiple Traits in Sequencing Association Studies". In: bioRxiv (2019). doi: 10.1101/710574.
[14] Pingye Zhang, Juan Pablo Lewinger, David Conti, John L. Morrison, and W. James Gauderman. "Detecting Gene–Environment Interactions for a Quantitative Trait in a Genome-Wide Association Study". In: Genetic Epidemiology 40.5 (2016), pp. 394–403. doi: 10.1002/gepi.21977.
Asset Metadata
Creator: Jiao, Ziting (author)
Core Title: Minimum p-value approach in two-step tests of genome-wide gene-environment interactions
School: Keck School of Medicine
Degree: Master of Science
Degree Program: Biostatistics
Degree Conferral Date: 2023-05
Publication Date: 05/04/2023
Defense Date: 05/04/2023
Publisher: University of Southern California (original), University of Southern California. Libraries (digital)
Tag: case-control study, G*E interaction, minimum p-value, OAI-PMH Harvest, two-step method
Format: theses (aat)
Language: English
Contributor: Electronically uploaded by the author (provenance)
Advisor: Kawaguchi, Eric (committee chair); Gauderman, William (committee member); Lewinger, Juan Pablo (committee member)
Creator Email: zitingji@usc.edu, zitingjiao@gmail.com
Permanent Link (DOI): https://doi.org/10.25549/usctheses-oUC113099682
Unique identifier: UC113099682
Identifier: etd-JiaoZiting-11773.pdf (filename)
Legacy Identifier: etd-JiaoZiting-11773
Document Type: Thesis
Rights: Jiao, Ziting
Internet Media Type: application/pdf
Type: texts
Source: 20230505-usctheses-batch-1037 (batch), University of Southern California (contributing entity), University of Southern California Dissertations and Theses (collection)
Access Conditions: The author retains rights to his/her dissertation, thesis or other graduate work according to U.S. copyright law. Electronic access is being provided by the USC Libraries in agreement with the author, as the original true and official version of the work, but does not grant the reader permission to use the work if the desired use is covered by copyright. It is the author, as rights holder, who must provide use permission if such use is covered by copyright.
Repository Name: University of Southern California Digital Library
Repository Location: USC Digital Library, University of Southern California, University Park Campus MC 2810, 3434 South Grand Avenue, 2nd Floor, Los Angeles, California 90089-2810, USA
Repository Email: cisadmin@lib.usc.edu