COMBINATION OF QUANTILE INTEGRAL LINEAR MODEL WITH TWO-STEP
METHOD TO IMPROVE THE POWER OF GENOME-WIDE INTERACTION SCANS
by
Ke Sun
A Thesis Presented to the
FACULTY OF THE USC KECK SCHOOL OF MEDICINE
UNIVERSITY OF SOUTHERN CALIFORNIA
In Partial Fulfillment of the
Requirements for the Degree
MASTER OF SCIENCE
(BIOSTATISTICS)
MAY 2023
Copyright 2023 Ke Sun
Table of Contents
List of Tables
List of Figures
Abstract
Chapter 1 Introduction
Chapter 2 Methods
2.1 Statistical model
2.1.1 Marginal (YG) test
2.1.2 Interaction test (GxE)
2.1.3 Two-step method with step 1 on marginal G (YG|GxE)
2.1.4 Two-step with step 1 on Levene’s test (Var|GxE)
2.1.5 Two-step with step 1 on Levene’s test with adjustment for exposure
2.1.6 QUAIL method
2.1.7 Two-step with step 1 on QUAIL method with adjustment for exposure
2.1.8 Two-step with step 1 on QUAIL method (QUAIL|GxE) without covariate adjustment
2.2 Simulation setting
Chapter 3 Results
3.1 Type I Error
3.2 Power Comparison
Chapter 4 Discussion
References
Appendix
List of Tables
TABLE 1 SIMULATION SETTING IN DIFFERENT SCENARIOS
TABLE 2 SUMMARIZED POWER IN DIFFERENT SCENARIOS
TABLE 3 SUMMARIZED FWER IN DIFFERENT SCENARIOS
List of Figures
FIGURE 1 SUMMARIZED LINE PLOT FOR FWER IN FIVE SCENARIOS
FIGURE 2 SUMMARIZED LINE PLOT FOR POWER IN FIVE SCENARIOS
Abstract
Two-step methods can improve the power to detect gene-environment (GxE) interactions in
genome-wide interaction scans. Typically, the first step tests marginal trait-vs-G or G-vs-E effects
to prioritize the SNPs that are more likely to be involved in GxE interactions. However, apart
from producing marginal G effects, a GxE interaction can also influence the variability of
quantitative trait levels. The quantile integral linear model (QUAIL) is a quantile regression-based
framework for estimating genetic effects on the variability of outcomes. In this study, we used the
QUAIL method to test trait variability in the screening step. We designed several representative
scenarios and, through simulation, compared this combined method with other interaction testing
methods. We found that adjusting the QUAIL method for exposure removes its inflated type I error,
and that the adjusted QUAIL method performed slightly better than the adjusted Levene’s test, a
common test of trait variance. However, the power of the adjusted QUAIL method remains low
compared with other two-step testing strategies.
Chapter 1 Introduction
Genome-wide association studies (GWAS) are commonly used to identify genetic factors that
contribute to complex traits. However, many complex traits are likely to be influenced not only by
genetics, but also by environmental factors. To address this, genome-wide interaction scans (GWIS)
have been developed to test for gene-by-environment (GxE) interactions. These scans fit a model that
includes the genotype, an environmental exposure, and their interaction term. Despite their
potential, detecting GxE interactions remains difficult, in part because of the polygenic nature of
human traits and the relatively small effects of GxE interactions. Nonetheless, GWIS are an
important tool in the ongoing effort to better understand the complex etiology of many human
traits and diseases.
The standard approach for analyzing gene-by-environment (GxE) interactions is a regression model
that includes a gene-by-environment interaction term. However, this approach can lack power, and
two-step methods have therefore been developed to improve the ability to detect GxE interactions
[Kooperberg and LeBlanc, 2008; Murcray et al., 2009; Ionita-Laza et al., 2007]. In a two-step
method, the first step screens SNPs based on their marginal effect [Ionita-Laza et al., 2007]. Only
SNPs that pass the significance threshold of this screening test are tested for GxE interaction in the
second step. By prioritizing the SNPs most likely to be involved in a GxE interaction, the screening
step reduces the number of interaction tests that must be corrected for and can thereby improve power.
Genetic variants can affect both the mean and the variability of a quantitative trait [Paré et al.,
2010]. As an alternative to screening on marginal genetic effects in step 1 of a two-step genome-wide
interaction scan (GWIS), Paré et al. proposed screening SNPs by testing for variance heterogeneity
across SNP genotypes [Paré et al., 2010]. However, previous studies have shown that using a
variance-heterogeneity screening test can inflate the false-positive rate [Zhang et al., 2016].
In this paper, we propose a new approach that combines a test for variance heterogeneity with the
two-step method for GWIS. Our goal is to compare the performance of this new testing approach
with that of existing methods for detecting GxE interactions. By incorporating a more robust test
of the variability of a quantitative trait, we hope to improve the overall ability to detect GxE
interactions by exploiting a different source of information.
The Quantile Integral Linear Model (QUAIL) is a recent method that uses a quantile regression-based
framework to estimate genetic effects on the variance of quantitative traits [Miao et al., 2022].
QUAIL is a promising approach for finding variance quantitative trait loci (vQTL) and can adjust
for confounders that affect both the mean and the variance of the phenotype [Miao et al., 2022].
Unlike other methods that test for variance heterogeneity, QUAIL can be applied to both categorical
and continuous variables without assuming a specific distribution of phenotypes. In this study, we
used simulation to compare the performance of a two-step method with QUAIL screening to that of
the common two-step methods.
Chapter 2 Methods
Notation:
In developing the methods, we use the following notation:
$Y$: Quantitative phenotype
$G$: Genotype variable ($G = 2$ for the AA genotype, $G = 1$ for Aa and $G = 0$ for aa)
$E$: Dichotomous exposure variable
$M$: Number of SNPs. M SNPs are genotyped on N study individuals, with $G_1, G_2, \ldots, G_M$ denoting the genotypes at the M loci
$q_A$: Minor allele frequency (MAF) of allele A at the quantitative trait locus (QTL)
Consider a gene-environment interaction study with a quantitative trait Y, an environmental
exposure of interest E, and M SNPs ($G_j$, $j = 1, 2, \ldots, M$) measured or imputed for each of
the N subjects.
2.1 Statistical model
2.1.1 Marginal (YG) test
We test $H_0: \beta_G = 0$ with a linear regression model of the form:
$$Y = \beta_0 + \beta_G G + \epsilon$$
Adjustment covariates can also be added to the model. Because E is not in the model, $\beta_G$ is a
weighted average of the genetic effects of G within each exposure group. The magnitude of $\beta_G$
quantifies the marginal effect, and it serves as the screening statistic for the two-step methods
below that screen on the marginal G effect.
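A minimal Python sketch of the marginal (YG) test, assuming NumPy and SciPy are available and that no adjustment covariates are used. The helper name `marginal_test`, the additive 0/1/2 coding drawn from a binomial distribution, and the effect size 0.3 are illustrative assumptions, not the thesis code.

```python
import numpy as np
from scipy import stats


def marginal_test(y, g):
    """Marginal (YG) test of H0: beta_G = 0 in Y = beta_0 + beta_G * G + eps."""
    res = stats.linregress(g, y)  # simple OLS of Y on G with an intercept
    return res.slope, res.pvalue


# Hypothetical example: additive 0/1/2 coding, MAF 0.231, effect size 0.3
rng = np.random.default_rng(1)
n = 2000
g = rng.binomial(2, 0.231, size=n)
y = 0.3 * g + rng.normal(size=n)
beta_hat, p_value = marginal_test(y, g)
print(f"beta_G = {beta_hat:.3f}, p = {p_value:.2e}")
```

In a screening step, `p_value` (or the size of `beta_hat`) is what decides whether a SNP moves on to the interaction test.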
2.1.2 Interaction test (GxE)
We test the significance of the GxE interaction term with a linear regression model of the form:
$$Y = \beta_0 + \beta_G G + \beta_E E + \beta_{GxE}\,(G \times E) + \epsilon$$
For a genome-wide interaction study (GWIS), we assume M tests of GxE interaction, with test
statistics $\{T_j\}_{j=1}^{M}$ and corresponding p-values $\{P_j\}_{j=1}^{M}$. Each $T_j$ is the test
statistic for the null hypothesis $H_0: \beta_{G_j xE} = 0$. A multiple-comparison adjustment is used
to control the family-wise Type I error rate (FWER) at a prespecified significance level α. For
example, with the Bonferroni correction each single hypothesis among the multiple comparisons is
tested at level $\alpha^* = \alpha/M$. However, this stringent correction decreases the power to
detect the GxE interaction term.
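The one-step interaction test and its Bonferroni level can be sketched in Python as follows. This assumes a classical OLS t-test for the interaction coefficient; the helper names `ols_coef_pvalue` and `gxe_test` and the simulated effect sizes are hypothetical.

```python
import numpy as np
from scipy import stats


def ols_coef_pvalue(X, y, col):
    """OLS fit of y on X; returns (coefficient, two-sided t-test p-value) for column `col`."""
    n, p = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (n - p)            # residual variance estimate
    cov = sigma2 * np.linalg.inv(X.T @ X)       # classical OLS covariance matrix
    t_stat = beta[col] / np.sqrt(cov[col, col])
    return beta[col], 2 * stats.t.sf(abs(t_stat), df=n - p)


def gxe_test(y, g, e):
    """One-step test of H0: beta_GxE = 0 in Y = b0 + bG*G + bE*E + bGxE*G*E + eps."""
    X = np.column_stack([np.ones(len(y)), g, e, g * e])
    return ols_coef_pvalue(X, y, col=3)


rng = np.random.default_rng(2)
n, M, alpha = 2000, 10_000, 0.05
g = rng.binomial(2, 0.231, size=n)
e = rng.binomial(1, 0.3, size=n)
y = 0.1 * g + 0.2 * e + 0.5 * g * e + rng.normal(size=n)  # illustrative effects
b_gxe, p_gxe = gxe_test(y, g, e)
alpha_star = alpha / M        # Bonferroni per-test level for a one-step scan
print(b_gxe, p_gxe, p_gxe < alpha_star)
```

With M = 10,000 the per-test level is 5e-6, which illustrates why a single interaction test with a modest effect is often not declared significant.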
2.1.3 Two-step method with step 1 on marginal G (YG|GxE)
In two-step methods, information about GxE is captured not only by the standard GxE test but also
by a prior screening test that prioritizes SNPs more likely to participate in GxE interactions
[Kawaguchi et al., 2023; Zhang et al., 2016].
In the first step of the two-step method, we prioritize SNPs based on their marginal effect on the
trait, as an indication that they may be involved in GxE interactions. In the second step, we test for
interactions using a significance threshold that depends on the number of SNPs selected in the first
step, to account for multiple comparisons.
Two procedures have been proposed for using the step-1 screening results in step-2 GxE interaction
testing: subset testing and weighted hypothesis testing [Gauderman et al., 2013]. Both procedures
are widely used in the field. In subset testing, only the subset of the M SNPs that pass the
significance threshold of the screening statistic is tested in step 2 with the standard GxE
interaction test. Denote the number of SNPs that pass the screening test by m, with $m \ll M$. In
step 2, the significance threshold for GxE interaction is obtained by a Bonferroni correction for
the m SNPs that pass the screening test. The new significance level $\alpha^* = \alpha/m$ is much less
stringent than the threshold $\alpha/M$ used in a one-step interaction scan. However, the relaxed
threshold is a trade-off: SNPs that do not pass the step-1 screening test cannot be tested in step 2.
Weighted hypothesis testing is another approach for prioritizing SNPs in step 2 of GxE interaction
testing, and it does not depend on whether a SNP passes a step-1 threshold. In this method, SNPs are
ranked by the screening statistic and allocated into bins [Ionita-Laza et al., 2007]. Each bin is
assigned its own significance level, and these levels add up to α. Typically, earlier bins receive a
more liberal significance threshold, because their SNPs are more likely to interact according to the
screening statistic, while later bins receive progressively more stringent significance levels for
step-2 testing. Unlike the subset method, all SNPs in the screening test can be tested for
interaction in step 2.
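To make the binning concrete, here is a Python sketch of one common weighting scheme. The scheme is an assumption on our part (doubling bin sizes with geometrically decreasing α shares, often attributed to Ionita-Laza et al. [2007]); the first-bin size of 5 and the function name `weighted_thresholds` are hypothetical, and the simulated p-values are placeholders.

```python
import numpy as np


def weighted_thresholds(p_screen, alpha=0.05, first_bin=5):
    """Per-SNP step-2 significance levels under an assumed geometric bin scheme.

    SNPs are ranked by step-1 p-value; bin k (k = 1, 2, ...) holds the next
    first_bin * 2**(k - 1) SNPs and receives alpha * 2**(-k), shared equally
    (Bonferroni) by the SNPs in that bin. The total level spent is below alpha.
    """
    p_screen = np.asarray(p_screen)
    order = np.argsort(p_screen)              # most promising SNPs first
    thresholds = np.empty(p_screen.size)
    start, k = 0, 1
    while start < p_screen.size:
        size = first_bin * 2 ** (k - 1)       # nominal size of bin k
        idx = order[start:start + size]       # the last bin may be partly filled
        thresholds[idx] = alpha * 2.0 ** (-k) / size
        start += size
        k += 1
    return thresholds


rng = np.random.default_rng(3)
p_screen = rng.uniform(size=1000)   # hypothetical step-1 p-values
p_gxe = rng.uniform(size=1000)      # hypothetical step-2 GxE p-values
thr = weighted_thresholds(p_screen)
significant = np.flatnonzero(p_gxe <= thr)
```

Every SNP receives a threshold, so none is excluded outright; the cost is that SNPs ranked late get very small per-SNP levels, which is where bin overcrowding hurts.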
Although weighted hypothesis testing is typically more powerful than the subset method, its power
can be reduced by “bin overcrowding” [Kawaguchi et al., 2023]. This occurs when SNPs with a
non-zero marginal effect but no GxE interaction are allocated to the earlier bins, so that SNPs with
true GxE effects are pushed into later bins and tested at stricter significance levels. Linkage
disequilibrium (LD) is another source of overcrowding, because SNPs in high LD with an index SNP
tend to be placed together in the top bins. To avoid the potential computational burden and the
overcrowding issue, we simulated independent SNPs without LD and used the subset method in our
simulation study [Kawaguchi et al., 2022].
Step 1:
$$Y = \beta_0 + \beta_G G + \epsilon$$
Step 2:
$$Y = \beta_0 + \beta_G G + \beta_E E + \beta_{GxE}\,(G \times E) + \epsilon$$
As outlined earlier, the first step of the two-step GxE interaction testing approach is a marginal
test (YG) to identify candidate SNPs with a marginal effect on the quantitative outcome. These SNPs
are then moved to step 2, where they are tested for interaction with a Bonferroni-corrected
significance level $\alpha^* = \alpha/m$, where m is the number of candidate SNPs that passed the
step-1 screening test.
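The subset procedure can be sketched in a few lines of Python. The function name `two_step_subset` is hypothetical, and the default step-1 threshold `alpha1 = 0.05` is an assumption (it matches the QUAIL screening threshold stated in Section 2.1.7); any screening test's p-values can be passed in.

```python
import numpy as np


def two_step_subset(p_screen, p_gxe, alpha=0.05, alpha1=0.05):
    """Subset two-step testing.

    Step 1 keeps SNPs with screening p-value < alpha1; step 2 tests only those
    m SNPs for GxE at the Bonferroni level alpha / m.
    Returns (indices of significant SNPs, step-2 level alpha_star).
    """
    p_screen, p_gxe = np.asarray(p_screen), np.asarray(p_gxe)
    passed = np.flatnonzero(p_screen < alpha1)
    m = passed.size
    if m == 0:                       # nothing passes screening: no step-2 tests
        return passed, np.nan
    alpha_star = alpha / m
    return passed[p_gxe[passed] < alpha_star], alpha_star


# SNPs 1 and 3 have tiny GxE p-values but fail the screen, so they are never tested.
hits, alpha_star = two_step_subset(p_screen=[0.01, 0.20, 0.03, 0.50],
                                   p_gxe=[0.01, 1e-6, 0.03, 1e-9])
print(hits, alpha_star)   # [0] 0.025
```

The toy input shows both sides of the trade-off described earlier: only two SNPs share the α budget, but SNPs that fail step 1 cannot be rescued however strong their interaction.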
The motivation for screening on the marginal G effect is that a true GxE interaction often also
produces a significant marginal G effect on the outcome, which makes the screening statistic
informative about true GxE effects. However, screening on the marginal G effect is not always
effective: in some GxE settings there may be no marginal G effect to detect in the screening step.
Apart from the marginal G effect, a GxE interaction can also influence the variability of
quantitative trait levels.
2.1.4 Two-step with step 1 on Levene’s test (Var|GxE)
In addition to screening for the marginal G effect, we can also screen for variance heterogeneity of
the quantitative trait across genotype groups [Paré et al., 2010]. The rationale is that, if
interaction effects exist, they may also appear as differences in the variability of the quantitative
trait between individuals carrying the risk allele and non-carriers. Levene's test of equality of
variances [Levene H, 1960] has been proposed as a way to prioritize SNPs for further interaction
testing. Levene's test assesses whether k samples have equal variances; here, it compares the
variance of the quantitative trait across the genotype groups. The p-value from Levene's test
determines whether a SNP enters step 2.
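A minimal sketch of the unadjusted Levene screen in Python, assuming SciPy. `scipy.stats.levene` with `center="mean"` computes Levene's original statistic (SciPy's default `center="median"` is the Brown-Forsythe variant). The helper name `levene_screen` and the simulated standard deviations are hypothetical.

```python
import numpy as np
from scipy import stats


def levene_screen(y, g):
    """Step-1 variance screen: Levene's test of Y across genotype groups (no adjustment)."""
    groups = [y[g == level] for level in np.unique(g)]
    # center="mean" gives Levene's original statistic; center="median" would
    # give the Brown-Forsythe variant.
    return stats.levene(*groups, center="mean").pvalue


rng = np.random.default_rng(4)
n = 2000
g = rng.binomial(2, 0.231, size=n)
y_het = (1 + 0.3 * g) * rng.normal(size=n)   # SD grows with genotype (hypothetical)
y_hom = rng.normal(size=n)                    # equal variances across genotypes
print(levene_screen(y_het, g), levene_screen(y_hom, g))
```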
However, previous studies have shown that the variance estimator is correlated with the GxE
interaction estimator when a marginal effect of E exists, which violates the independence
assumption of two-step methods [Zhang et al., 2016]. Simulation results confirm this, showing a
high false-positive rate for the approach proposed by Paré et al.
2.1.5 Two-step with step 1 on Levene’s test with adjustment for exposure
As mentioned in the previous section, the correlation between Levene's test and the GxE interaction
test violates the independence assumption between step 1 and step 2. This correlation can be
removed by adjusting for the exposure in a regression model, and this adjustment does not affect the
GxE interaction test in step 2. We therefore use a revised Levene's test as follows [Zhang et al., 2016]:
Step 1:
Fit a linear regression of the quantitative trait Y on the exposure:
$$Y_i = \beta_0 + \beta_E E_i + \epsilon_i, \qquad i = 1, \ldots, N,$$
where N is the number of individuals in the sample and $\epsilon_i$ is the residual for individual i
after adjustment for exposure.
With K genotype groups (here K = 3: AA, Aa and aa), we test the null hypothesis
$H_0: \sigma_1 = \sigma_2 = \cdots = \sigma_K$, where $\sigma_k$ is the standard deviation in the kth
group. The revised Levene’s test statistic is
$$W = \frac{(N-K)\sum_{k=1}^{K} N_k\left(\bar Z_{k\cdot} - \bar Z_{\cdot\cdot}\right)^2}{(K-1)\sum_{k=1}^{K}\sum_{j=1}^{N_k}\left(Z_{kj} - \bar Z_{k\cdot}\right)^2}$$
where $N_k$ is the sample size of the kth group and $Z_{kj} = \left|\epsilon_{kj} - \bar\epsilon_{k\cdot}\right|$,
with $\epsilon_{kj}$ the residual of individual j in group k from the regression of Y on E and
$\bar\epsilon_{k\cdot}$ the mean of the residuals in group k. $\bar Z_{k\cdot}$ and $\bar Z_{\cdot\cdot}$ are
the means of $Z_{kj}$ within the kth genotype group and overall, respectively. Compared with the
original test, the statistic is computed on the exposure-adjusted residuals ε rather than on the trait
Y itself. The p-value is
$$P_{Var} = \Pr\left(F_{K-1,\,N-K} > W\right),$$
where, under $H_0$, W approximately follows an F distribution with $K-1$ and $N-K$ degrees of freedom.
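The adjusted statistic can be computed directly from the formula. The Python sketch below residualizes Y on an intercept and E by least squares, then evaluates W on the residuals and its F tail probability; `adjusted_levene` is a hypothetical name and the simulated effects are illustrative. Because the formula is Levene's statistic applied to residuals, it should agree with `scipy.stats.levene(..., center="mean")` run on the same residual groups.

```python
import numpy as np
from scipy import stats


def adjusted_levene(y, g, e):
    """Levene's W on residuals of Y regressed on exposure E; returns (W, p-value)."""
    X = np.column_stack([np.ones(len(y)), e])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta                          # exposure-adjusted residuals
    levels = np.unique(g)
    N, K = len(resid), len(levels)
    # Z_kj = |eps_kj - mean of eps in genotype group k|
    z_groups = [np.abs(resid[g == k] - resid[g == k].mean()) for k in levels]
    z_bar = np.concatenate(z_groups).mean()       # overall mean of Z
    between = sum(len(z) * (z.mean() - z_bar) ** 2 for z in z_groups)
    within = sum(((z - z.mean()) ** 2).sum() for z in z_groups)
    W = (N - K) * between / ((K - 1) * within)
    return W, stats.f.sf(W, K - 1, N - K)         # Pr(F_{K-1, N-K} > W)


rng = np.random.default_rng(5)
n = 2000
g = rng.binomial(2, 0.231, size=n)
e = rng.binomial(1, 0.3, size=n)
y = 0.5 * e + 0.3 * g * e + rng.normal(size=n)   # hypothetical marginal E plus GxE
W, p_var = adjusted_levene(y, g, e)
print(W, p_var)
```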
Step 2: The candidate SNPs selected in step 1 are then tested for $H_0: \beta_{G_j xE} = 0$. Because
the step-1 statistic is computed on residuals from which the marginal effect of E on the quantitative
trait has been removed, the approach satisfies the independence assumption and the family-wise
error rate is preserved.
2.1.6 QUAIL method
QUAIL (quantile integral linear model) is a novel method for identifying SNPs associated with
differential variability of a quantitative trait across genotype groups. The QUAIL method uses
quantile regression to quantify the change in variability of the quantitative trait across genotype
groups [Miao et al., 2022]. Quantile regression estimates the conditional quantile function of a
quantitative trait at different quantile levels. If a SNP affects the variability of a quantitative trait,
the quantile regression slopes on genotype will differ across quantile levels.
To illustrate, consider a SNP G that changes the variability of trait Y, and let $\beta_\tau$ be the slope
of the quantile regression of Y on G at conditional quantile level τ, with intercept $\mu_\tau$:
$$Q_Y(\tau \mid G = g) = \mu_\tau + g\beta_\tau$$
For τ ∈ (0, 0.5), the QUAIL method pairs quantile level τ with 1 − τ and uses the difference between
the regression slopes at the upper and lower quantile levels, $\beta_{1-\tau} - \beta_\tau$. Conversely, if the
SNP has no effect on the variability of the trait, the null hypothesis is
$H_0: \beta_{1-\tau} = \beta_\tau$ for all τ ∈ (0, 0.5).
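A short worked example shows why the paired slope difference isolates a variability effect. Assume, purely for illustration, a location-scale model $Y = g\beta + (1 + g\gamma)\varepsilon$, where ε has a continuous, strictly increasing CDF F and $1 + g\gamma > 0$. The mean shift β cancels in the difference, and only the scale effect γ remains:

```latex
\begin{aligned}
Q_Y(\tau \mid G = g) &= g\beta + (1 + g\gamma)\,F^{-1}(\tau)
  = \underbrace{F^{-1}(\tau)}_{\mu_\tau}
  + g\,\underbrace{\bigl[\beta + \gamma F^{-1}(\tau)\bigr]}_{\beta_\tau},\\
\beta_{1-\tau} - \beta_\tau &= \gamma\,\bigl[F^{-1}(1-\tau) - F^{-1}(\tau)\bigr],
  \qquad \tau \in (0, 0.5).
\end{aligned}
```

Since $F^{-1}(1-\tau) > F^{-1}(\tau)$ for τ < 0.5, every difference is zero exactly when γ = 0. For standard normal ε, $F^{-1} = \Phi^{-1}$ and $\Phi^{-1}(\tau) = -\Phi^{-1}(1-\tau)$, so integrating over (0, 0.5) gives $\int_0^{0.5}(\beta_{1-\tau} - \beta_\tau)\,d\tau = 2\gamma\int_0^{\infty} z\,\phi(z)\,dz = 2\gamma\,\phi(0) = \gamma\sqrt{2/\pi} \approx 0.80\gamma$.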
After fitting the quantile regressions, we introduce the quantile-integrated effect, which aggregates
the variability changes of a SNP's effect on trait Y across all quantile levels. This integrated effect
summarizes the overall impact of the SNP on the variability of Y, accounting for both the
magnitude and the direction of the effects observed across quantiles:
$$\beta_{QI} = \int_0^{0.5}\left(\beta_{1-\tau} - \beta_\tau\right)d\tau$$
The slope differences across pairs of quantile levels can therefore be tested through
$H_0: \beta_{QI} = 0$. In practice, the integral over a continuum of quantile levels cannot be computed
directly, so $\beta_{QI}$ is approximated by averaging the estimated slope differences over K quantile
levels $\tau_1, \ldots, \tau_K \in (0, 0.5)$:
$$\hat\beta_{QI} = \frac{1}{K}\sum_{k=1}^{K}\left(\hat\beta_{1-\tau_k} - \hat\beta_{\tau_k}\right)$$
In quantile regression, the coefficients at quantile level $\tau_k$ are estimated by
$$\hat\theta_{\tau_k} = \arg\min_{\theta}\sum_{i=1}^{N}\rho_{\tau_k}\left(Y_i - \mu_{\tau_k} - g_i\beta_{\tau_k}\right),$$
where $\theta = (\mu_{\tau_k}, \beta_{\tau_k})$, $\rho_\tau(u) = u[\tau - I(u < 0)]$ is the check loss function and
i indexes individuals. This yields the estimates $\hat\beta_{\tau_k}$ and $\hat\mu_{\tau_k}$. The QUAIL
method was developed to compute the K quantile regressions efficiently and to overcome the
difficulty of obtaining their variance-covariance matrix. More details are given in the next two
sections.
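The averaged slope difference can be estimated directly by fitting each quantile regression as a linear program. The Python sketch below is a brute-force illustration, assuming SciPy's `linprog` with the HiGHS solver; it is not the efficient QUAIL algorithm, and the helper names, the five quantile levels and the simulated model are hypothetical. Under the worked location-scale example above with γ = 0.5, the first estimate should be near 0.77 (the average of $2\gamma\Phi^{-1}(1-\tau_k)$ over these levels) and the second near 0. For evenly spaced $\tau_k$ the average approximates $2\beta_{QI}$, a constant factor that does not affect testing.

```python
import numpy as np
from scipy.optimize import linprog


def quantile_regression(X, y, tau):
    """Linear quantile regression at level tau, solved as a linear program.

    Minimises sum_i rho_tau(y_i - x_i @ theta) by writing each residual as
    u_plus - u_minus with u_plus, u_minus >= 0.
    """
    n, p = X.shape
    c = np.concatenate([np.zeros(p), np.full(n, tau), np.full(n, 1.0 - tau)])
    A_eq = np.hstack([X, np.eye(n), -np.eye(n)])       # X theta + u+ - u- = y
    bounds = [(None, None)] * p + [(0, None)] * (2 * n)
    res = linprog(c, A_eq=A_eq, b_eq=y, bounds=bounds, method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    return res.x[:p]                                   # (mu_tau, beta_tau)


def beta_qi_hat(y, g, taus):
    """Average of slope differences beta_{1-tau} - beta_tau over the given taus."""
    X = np.column_stack([np.ones(len(y)), g])
    diffs = [quantile_regression(X, y, 1 - t)[1] - quantile_regression(X, y, t)[1]
             for t in taus]
    return float(np.mean(diffs))


rng = np.random.default_rng(6)
n = 1000
g = rng.binomial(2, 0.3, size=n).astype(float)
eps = rng.normal(size=n)
y_var = 0.2 * g + (1 + 0.5 * g) * eps   # G shifts both the mean and the spread
y_mean = 0.2 * g + eps                   # G shifts the mean only
taus = [0.05, 0.15, 0.25, 0.35, 0.45]
est_var, est_mean = beta_qi_hat(y_var, g, taus), beta_qi_hat(y_mean, g, taus)
print(est_var, est_mean)
```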
2.1.7 Two-step with step 1 on QUAIL method with adjustment for exposure
Step 1: QUAIL method to quantify the genotype effect on variability
The first procedure of the QUAIL method estimates the intercept and covariate effects under the null
model for 2K (K = 1000) quantile levels. The quantile regression model is
$Q_Y(\tau \mid G = g, C) = \mu_\tau + g\beta_\tau + C\alpha_\tau$, where C is the covariate matrix, $\alpha_\tau$ is
the vector of regression coefficients for the covariates and $\mu_\tau$ is the intercept at quantile level τ.
Under the null model, the estimates are obtained by minimizing
$$\left(\hat\mu_\tau, \hat\alpha_\tau\right) = \arg\min_{\mu_\tau,\,\alpha_\tau}\sum_{i=1}^{N}\rho_\tau\left(Y_i - \mu_\tau - C_i\alpha_\tau\right),$$
where $\rho_\tau(u) = u[\tau - I(u < 0)]$ is the loss function for quantile regression. Then, for each
individual i, we construct 2K quantile rank scores:
$$\hat a_i(\tau) = \tau - I\left(Y_i < C_i\hat\alpha_\tau + \hat\mu_\tau\right)$$
where $I(Y_i < C_i\hat\alpha_\tau + \hat\mu_\tau)$ indicates whether $Y_i$ is smaller than its estimated τth
conditional quantile. These rank scores are then rescaled for each individual as:
$$Y_{i\tau} = \frac{\sqrt{n}\,\hat a_i(\tau)\,SE(\hat\gamma_\tau)}{\sqrt{\tau - \tau^2}}$$
where n is the sample size and $SE(\hat\gamma_\tau)$ is the estimated standard error of the regression
coefficient $\gamma_\tau$ in the quantile regression $Q_Y(\tau \mid d, C) = b_\tau + d\gamma_\tau + C\alpha_\tau$, in
which d is a random variable sampled from N(0, 1). The variable d and its standard error are used
to approximate $\hat\beta_{QI}$.
Finally, we construct the quantile integrated rank score for each individual i by combining
$Y_{i\tau}$ across quantile levels:
$$Y_{QI,i} = \frac{1}{K}\sum_{k=1}^{K}\left[Y_{i(1-\tau_k)} - Y_{i\tau_k}\right]$$
We standardize $Y_{QI,i}$ to have mean 0 and SD 1.
In the second procedure of the QUAIL method, after transforming the phenotype into the quantile
integrated rank score, we estimate the quantile integral effect by regressing the integrated rank
score on G:
$$Y_{QI,i} = \hat\beta_{QI}\,G^{\star}_i + \epsilon_i, \qquad \hat\beta_{QI} = \arg\min_{\beta}\left\lVert Y_{QI} - G^{\star}\beta\right\rVert^2$$
where $G^{\star}$ is the $n \times 1$ vector of genotype residuals after regressing out the covariates.
Because $G^{\star}$ is the covariate-adjusted genotype, the quantile integral effect estimated from this
regression can be interpreted as the genotype effect on the variability of the trait after adjusting Y
for the covariates.
Under $H_0$ that all paired slope differences equal 0, $\hat\beta_{QI}$ is asymptotically normally
distributed, so the QUAIL test statistic and p-value can be obtained from the regression of
$Y_{QI,i}$ on $G^{\star}_i$.
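The two procedures can be sketched in simplified form in Python. This is not the QUAIL software of Miao et al.: it assumes a single binary exposure as the only covariate, uses 9 quantile levels instead of K = 1000, and drops the $\sqrt{n}\,SE(\hat\gamma_\tau)/\sqrt{\tau-\tau^2}$ rescaling (so levels are weighted equally). With one binary E, the null model $\mu_\tau + E\alpha_\tau$ is saturated, so its fitted quantile is the sample quantile within each exposure group. Passing `e=None` gives the intercept-only variant described in Section 2.1.8. All function names are hypothetical.

```python
import numpy as np
from scipy import stats


def null_quantile_fit(y, tau, e=None):
    """Fitted tau-quantile of Y under the null model (no genotype term).

    No covariates: the overall sample quantile. One binary exposure E: the
    model mu_tau + E * alpha_tau is saturated, so the fit is the sample
    quantile within each exposure group.
    """
    if e is None:
        return np.full(len(y), np.quantile(y, tau))
    fitted = np.empty(len(y))
    for level in np.unique(e):
        mask = e == level
        fitted[mask] = np.quantile(y[mask], tau)
    return fitted


def quail_like_test(y, g, taus, e=None):
    """Simplified QUAIL-style variance screen; returns (slope, p-value)."""
    def score(t):                      # a_i(tau) = tau - I(Y_i < fitted quantile)
        return t - (y < null_quantile_fit(y, t, e)).astype(float)

    y_qi = np.mean([score(1 - t) - score(t) for t in taus], axis=0)
    y_qi = (y_qi - y_qi.mean()) / y_qi.std()    # standardise to mean 0, SD 1
    if e is None:
        g_star = g - g.mean()
    else:                                       # residualise G on intercept and E
        X = np.column_stack([np.ones(len(g)), e])
        coef, *_ = np.linalg.lstsq(X, g, rcond=None)
        g_star = g - X @ coef
    res = stats.linregress(g_star, y_qi)
    return res.slope, res.pvalue


rng = np.random.default_rng(8)
n = 2000
g = rng.binomial(2, 0.231, size=n).astype(float)
e = rng.binomial(1, 0.3, size=n).astype(float)
y = 0.5 * e + (1 + 0.5 * g) * rng.normal(size=n)  # hypothetical variance-affecting SNP
taus = np.linspace(0.05, 0.45, 9)
slope, p_value = quail_like_test(y, g, taus, e=e)       # adjusted for exposure
slope_u, p_value_u = quail_like_test(y, g, taus)        # intercept-only variant
print(slope, p_value, slope_u, p_value_u)
```

The integrated score is large for individuals in either tail and small near the middle, so a positive slope on $G^{\star}$ means the trait spreads out as the allele count increases.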
Step 2: After selecting the candidate SNPs with a QUAIL p-value < 0.05 in step 1, the null
hypothesis $H_0: \beta_{G_j xE} = 0$ is tested with the standard interaction test. As in the other
two-step methods, the significance level is $\alpha^* = \alpha/m$, where m is the number of SNPs passing
the QUAIL screening in step 1. Because the screening step tests the change in variability of Y
rather than the marginal G effect, weighted testing could be used instead of subset testing;
however, to compare the performance of the different methods, we still use subset testing in the
simulations.
2.1.8 Two-step with step 1 on QUAIL method (QUAIL|GxE) without covariate adjustment
The procedure is similar to Section 2.1.7, except that under the null model only the intercept
$\mu_\tau$ is estimated, because the model $Q_Y(\tau \mid G = g) = \mu_\tau + g\beta_\tau$ contains no
covariates. Accordingly, the quantile rank score becomes
$$\hat a_i(\tau) = \tau - I\left(Y_i < \hat\mu_\tau\right)$$
and $SE(\hat\gamma_\tau)$ in
$$Y_{i\tau} = \frac{\sqrt{n}\,\hat a_i(\tau)\,SE(\hat\gamma_\tau)}{\sqrt{\tau - \tau^2}}$$
is the estimated standard error of $\gamma_\tau$ in $Q_Y(\tau \mid d) = b_\tau + d\gamma_\tau$, without the
covariate term.
2.2 Simulation setting
Let G be an $N \times M$ genotype matrix for N individuals and M SNPs. In all of our simulations, we
assume N = 2000 and M = 10000. G is simulated from the minor allele frequencies (MAFs); to
simplify the simulation, we do not model correlations between SNPs. Quantitative traits are
simulated according to the linear model:
$$Y = \sum_{j=1}^{M}\beta_{G_j}G_j + \beta_E E + \beta_{GxE}\,(G_1 \times E) + \epsilon$$
Here, E is a binary exposure variable with Pr(E = 1) = 0.3 and j indexes the SNPs. Only $G_1$ has
a true GxE effect on the outcome; the other SNPs ($G_j$, $j \neq 1$) have neither a marginal G effect
nor a GxE effect. ε is a residual assumed to be normally distributed with mean 0 and variance
$\sigma_E^2$.
To generate the genotypes $G_j$, first let X be an $N \times M$ matrix of independent standard normal
variables. Letting $X_{i,k}$ and $G_{i,k}$ denote the entries in the ith row and kth column of X and G,
the relationship between $X_{i,k}$ and $G_{i,k}$ is:
$$G_{i,k} = \begin{cases} 0, & X_{i,k} < \Phi^{-1}\left(f_k^2\right) \\ 1, & \Phi^{-1}\left(f_k^2\right) \le X_{i,k} < \Phi^{-1}\left(f_k^2 + 2f_k(1-f_k)\right) \\ 2, & X_{i,k} \ge \Phi^{-1}\left(f_k^2 + 2f_k(1-f_k)\right) \end{cases}$$
where $\Phi^{-1}$ is the quantile function (inverse cumulative distribution function) of the standard
normal distribution and $f_k$ is the minor allele frequency of SNP k.
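A Python sketch of this thresholding step, assuming NumPy and SciPy (`norm.ppf` is $\Phi^{-1}$); `simulate_genotypes` is a hypothetical name. Note what the cut-offs imply: G = 0 has probability $f_k^2$, G = 1 has $2f_k(1-f_k)$ and G = 2 has $(1-f_k)^2$, i.e. Hardy-Weinberg proportions in which G counts copies of the allele with frequency $1 - f_k$. Using $(1-f_k)^2$ in place of $f_k^2$ in both cut-offs would make G count the minor allele instead, matching the coding in the Notation section.

```python
import numpy as np
from scipy.stats import norm


def simulate_genotypes(n, maf, rng):
    """Threshold an n x M standard-normal matrix into 0/1/2 genotypes.

    Cut-offs follow the text: Phi^{-1}(f^2) and Phi^{-1}(f^2 + 2 f (1 - f)).
    """
    f = np.asarray(maf, dtype=float)          # one frequency per SNP (column)
    x = rng.standard_normal((n, f.size))
    cut1 = norm.ppf(f ** 2)
    cut2 = norm.ppf(f ** 2 + 2 * f * (1 - f))
    return np.where(x < cut1, 0, np.where(x < cut2, 1, 2))


rng = np.random.default_rng(9)
G = simulate_genotypes(100_000, [0.231, 0.5], rng)
print((G[:, 0] == 0).mean(), (G[:, 0] == 2).mean())   # near 0.231**2 and 0.769**2
```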
In this model, the value of each parameter β is set to achieve a predetermined $R^2$ for its term,
following the approach of Zhang et al. [2016]. We assume that the total variance of Y is 1 and use
the strategy proposed by Gauderman [Gauderman et al., 2003] to partition the variance. The
variances of the individual terms are
$$Var(G) = 2q_A(1-q_A)$$
$$Var(E) = P_E(1-P_E)$$
$$Var(GxE) = Var(G)\,Var(E)$$
$$\sigma_E = \sigma_Y\sqrt{1 - R_{GxE}^2 - R_G^2 - R_E^2}$$
where $\sigma_E$ denotes the residual standard deviation.
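Each regression coefficient can therefore be obtained from the corresponding $R^2$, where the $R^2$
of a term is its squared coefficient times the variance of the centered term (G and E independent):
$$R_G^2 = \beta_{\bar G}^2\,Var(G - \mu_G)/\sigma_Y^2 = \beta_{\bar G}^2\,Var(G)/\sigma_Y^2$$
$$R_E^2 = \beta_{\bar E}^2\,Var(E - \mu_E)/\sigma_Y^2 = \beta_{\bar E}^2\,Var(E)/\sigma_Y^2$$
$$R_{GxE}^2 = \beta_{GxE}^2\,Var\left[(G - \mu_G)(E - \mu_E)\right]/\sigma_Y^2 = \beta_{GxE}^2\,Var(G)\,Var(E)/\sigma_Y^2$$
Here $\mu_G$ and $\mu_E$ are the population means of G and E. Solving for the slopes gives: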
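$$\beta_{GxE} = \sqrt{\frac{R_{GxE}^2\,\sigma_Y^2}{Var(GxE)}}, \qquad \beta_{\bar G} = \sqrt{\frac{R_G^2\,\sigma_Y^2}{Var(G)}}, \qquad \beta_{\bar E} = \sqrt{\frac{R_E^2\,\sigma_Y^2}{Var(E)}}$$
Here, $\beta_{\bar G}$ and $\beta_{\bar E}$ are the regression coefficients when G and E are centered at their
means. Expanding $\beta_{\bar G}(G - \mu_G) + \beta_{\bar E}(E - \mu_E) + \beta_{GxE}(G - \mu_G)(E - \mu_E)$ with
$\mu_G = 2q_A$ and $\mu_E = P_E$ gives the coefficients $\beta_E$ and $\beta_G$ of the uncentered model:
$$\beta_E = \beta_{\bar E} - 2q_A\,\beta_{GxE}$$
$$\beta_G = \beta_{\bar G} - P_E\,\beta_{GxE}$$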
To compare the methods under different circumstances, we designed five scenarios corresponding
to different interaction patterns between G and E. In all scenarios, we generated 10,000 replicate
datasets with $P_E = 0.3$ and, except where noted in Chapter 3, $q_A = 0.231$.
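The variance-partitioning formulas can be checked numerically with a short Python sketch. It computes the coefficients from target $R^2$ values and then simulates one trait from the uncentered model. The $R^2$ values are illustrative placeholders only (the scenario settings are in Table 1), `effect_sizes` is a hypothetical name, and genotypes are drawn as minor-allele counts so that $\mu_G = 2q_A$ as the formulas assume. With independent G and E the simulated variance of Y should be close to 1.

```python
import numpy as np


def effect_sizes(r2_g, r2_e, r2_gxe, q_a, p_e, var_y=1.0):
    """Coefficients giving each term its target R^2 (G and E independent)."""
    var_g = 2 * q_a * (1 - q_a)          # additive coding, Hardy-Weinberg
    var_e = p_e * (1 - p_e)              # binary exposure
    var_gxe = var_g * var_e              # variance of the centred product
    b_gxe = np.sqrt(r2_gxe * var_y / var_gxe)
    b_g_bar = np.sqrt(r2_g * var_y / var_g)
    b_e_bar = np.sqrt(r2_e * var_y / var_e)
    b_e = b_e_bar - 2 * q_a * b_gxe      # centred -> uncentred coefficients
    b_g = b_g_bar - p_e * b_gxe
    sigma = np.sqrt(var_y * (1 - r2_gxe - r2_g - r2_e))  # residual SD
    return b_g, b_e, b_gxe, sigma


# Illustrative R^2 values only (not the thesis's scenario settings).
b_g, b_e, b_gxe, sigma = effect_sizes(r2_g=0.01, r2_e=0.005, r2_gxe=0.01,
                                      q_a=0.231, p_e=0.3)
rng = np.random.default_rng(10)
n = 200_000
g = rng.binomial(2, 0.231, size=n)        # minor-allele count, mean 2 * q_a
e = rng.binomial(1, 0.3, size=n)
y = b_g * g + b_e * e + b_gxe * g * e + rng.normal(scale=sigma, size=n)
print(y.var())                            # should be close to var_y = 1
```

Setting `r2_g=0` reproduces the Crossing pattern: the centred G coefficient vanishes and the uncentred one becomes $-P_E\,\beta_{GxE}$.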
Chapter 3 Results
We designed five scenarios to compare the performance of the QUAIL method with the other GxE
interaction tests. All five scenarios have the same proportion of variance explained by GxE
($R_{GxE}^2$) and the same Pr(E) = 0.3. In Crossing 1 and Crossing 2, we set $R_G^2 = 0$ to evaluate
the methods’ performance when no variance is explained by a marginal G effect. Crossing 2 has a
higher $q_A$ of 0.5, whereas Crossing 1 has $q_A = 0.231$, with all other parameters the same.
Scenarios Quantitative 1 and Quantitative 2 simulate conditions in which the effects of genotype on
the variability of the trait are larger than the effects of exposure. Quantitative 1 has $R_G^2 = 0.01$,
and Quantitative 2 has a larger proportion of variance explained by the marginal G effect,
$R_G^2 = 0.02$. Finally, in the Strong Quant scenario the marginal effect of E ($R_E^2 = 0.006$) is
larger than the marginal effect of G ($R_G^2 = 0.0043$), with the same $q_A$ and $R_{GxE}^2$ as the
Quantitative scenarios. Details of all parameters in the different scenarios are presented in the
supplemental materials.
3.1 Type I Error
Figure 1 Summarized line plot for FWER in five scenarios
Figure 1 summarizes the family-wise error rate (FWER). The one-step GWIS, YG, adjusted QUAIL
and adjusted Levene’s test all maintain the correct test size. However, the FWER of the unadjusted
QUAIL method and the unadjusted Levene’s test is inflated to an unacceptable level, indicating that
the adjusted methods remove the marginal effect of exposure and bring the FWER back to the
correct level. Regardless of the magnitude of the G effect, the type I error rates are similar for all
adjusted methods, GWIS and YG.
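For readers reproducing such summaries, the empirical FWER and power can be computed from per-replicate lists of significant SNPs. The Python sketch below assumes FWER is the share of replicates declaring at least one non-causal SNP significant and power is the share detecting the causal SNP $G_1$ (index 0 here); `fwer_and_power` and the toy input are hypothetical.

```python
import numpy as np


def fwer_and_power(hits_per_replicate, causal=0):
    """Empirical FWER and power from per-replicate lists of significant SNP indices.

    FWER: share of replicates declaring at least one non-causal SNP significant.
    Power: share of replicates in which the causal SNP is declared significant.
    """
    false_pos = [any(j != causal for j in hits) for hits in hits_per_replicate]
    detected = [causal in hits for hits in hits_per_replicate]
    return float(np.mean(false_pos)), float(np.mean(detected))


# Four hypothetical replicates; SNP 0 plays the role of G_1.
print(fwer_and_power([[0], [], [0, 5], [7]]))   # (0.5, 0.5)
```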
3.2 Power Comparison
Figure 2 Summarized line plot for power in five scenarios
Figure 2 summarizes the power of four methods in the five scenarios. Because of their inflated
type I error, the unadjusted QUAIL method and the unadjusted Levene’s test were excluded from the
power comparison. The power to detect the interaction with the one-step GWIS is stable across
conditions. The power of the YG method remains low (below 0.04) in the Crossing scenarios and is
high in the Quantitative scenarios. The power of YG in the Quantitative scenarios also increases
with the effect of G on the outcome, because YG tests the marginal effect of G in its first step.
We next focus on screening for variability heterogeneity. The adjusted Levene’s test performs
poorly in all scenarios, with power around 0.05. In contrast, the adjusted QUAIL method performs
slightly better than the adjusted Levene’s test in the Quantitative scenarios and no differently in the
Crossing scenarios.
Chapter 4 Discussion
For FWER, among the methods that screen for variability heterogeneity, both the unadjusted QUAIL
method and the unadjusted Levene’s test have inflated type I error rates, which return to the correct
magnitude after adjusting for exposure. This demonstrates that both the QUAIL method and
Levene’s test can control the FWER once the confounding effect of exposure is removed.
The standard GWIS method performs consistently, with power around 0.47 in all conditions. The
power of YG, which tests the marginal G effect in the screening step, is strongly associated with the
magnitude of $R_G^2$, which explains its higher power as $R_G^2$ increases. However, all methods
that screen on variability heterogeneity of the outcome have relatively low power, especially the
adjusted Levene’s test, whose power is consistently 0.05 in all conditions. This result is consistent
with a previous study [Zhang et al., 2016], which concluded that the poor performance of the
adjusted Levene’s test when the marginal E effect is absent results from sacrificing degrees of
freedom to use an inefficient source of information, since the magnitude of variance heterogeneity
is small. However, in our simulation, even with a marginal E effect present in all scenarios
($R_E^2 = 0.5\%$), the power of the variability-screening methods remains low. Low power in the
Crossing scenarios is expected, because $R_G^2 = 0$ and the first-step screen cannot detect a G effect
on the variability of the outcome. The adjusted QUAIL method has higher power than the adjusted
Levene’s test in the Quantitative scenarios, and its power increases slightly with $R_G^2$. This is
likely because the QUAIL method can detect more subtle changes in the variability of the outcome.
Overall, the adjusted QUAIL method performs as well as or better than the adjusted Levene’s test in
all simulated conditions, but its power remains low.
References
Murcray CE, Lewinger JP, Gauderman WJ. Gene-environment interaction in genome-wide
association studies. Am J Epidemiol. 2009 Jan 15;169(2):219-26. doi: 10.1093/aje/kwn353. Epub
2008 Nov 20. PMID: 19022827; PMCID: PMC2732981.
Gauderman WJ, Zhang P, Morrison JL, Lewinger JP. Finding novel genes by testing G × E
interactions in a genome-wide association study. Genet Epidemiol. 2013 Sep;37(6):603-13. doi:
10.1002/gepi.21748. Epub 2013 Jul 19. PMID: 23873611; PMCID: PMC4348012.
Kooperberg C, Leblanc M. Increasing the power of identifying gene x gene interactions in
genome-wide association studies. Genet Epidemiol. 2008 Apr;32(3):255-63. doi:
10.1002/gepi.20300. PMID: 18200600; PMCID: PMC2955421.
Ionita-Laza I, McQueen MB, Laird NM, Lange C. Genomewide weighted hypothesis testing in
family-based association studies, with an application to a 100K scan. Am J Hum Genet. 2007
Sep;81(3):607-14. doi: 10.1086/519748. Epub 2007 Jul 17. PMID: 17701906; PMCID:
PMC1950836.
Paré G, Cook NR, Ridker PM, Chasman DI. On the use of variance per genotype as a tool to
identify quantitative trait interaction effects: a report from the Women's Genome Health Study.
PLoS Genet. 2010 Jun 17;6(6):e1000981. doi: 10.1371/journal.pgen.1000981. PMID: 20585554;
PMCID: PMC2887471.
Zhang P, Lewinger JP, Conti D, Morrison JL, Gauderman WJ. Detecting Gene-Environment
Interactions for a Quantitative Trait in a Genome-Wide Association Study. Genet Epidemiol. 2016
Jul;40(5):394-403. doi: 10.1002/gepi.21977. Epub 2016 May 27. PMID: 27230133; PMCID:
PMC5108681.
Miao J, Lin Y, Wu Y, Zheng B, Schmitz LL, Fletcher JM, Lu Q. A quantile integral linear model
to quantify genetic effects on phenotypic variability. Proc Natl Acad Sci U S A. 2022 Sep
27;119(39):e2212959119. doi: 10.1073/pnas.2212959119. Epub 2022 Sep 19. PMID: 36122202;
PMCID: PMC9522331.
Kawaguchi ES, Kim AE, Lewinger JP, Gauderman WJ. Improved two-step testing of genome-
wide gene-environment interactions. Genet Epidemiol. 2023 Mar;47(2):152-166. doi:
10.1002/gepi.22509. Epub 2022 Dec 26. PMID: 36571162; PMCID: PMC9974838.
Hsu L, Jiao S, Dai JY, Hutter C, Peters U, Kooperberg C. Powerful cocktail methods for detecting
genome-wide gene-environment interaction. Genet Epidemiol. 2012 Apr;36(3):183-94. doi:
10.1002/gepi.21610. PMID: 22714933; PMCID: PMC3654520.
Murcray CE, Lewinger JP, Conti DV, Thomas DC, Gauderman WJ. Sample size requirements to
detect gene-environment interactions in genome-wide association studies. Genet Epidemiol. 2011
Apr;35(3):201-10. doi: 10.1002/gepi.20569. Epub 2011 Feb 9. PMID: 21308767; PMCID:
PMC3076801.
Levene H. Robust tests for equality of variances. Stanford University Press; 1960. p. 278-292.
Gauderman WJ. Candidate gene association analysis for a quantitative trait, using parent-offspring
trios. Genet Epidemiol. 2003 Dec;25(4):327-38. doi: 10.1002/gepi.10262. PMID: 14639702.
Kawaguchi ES, Li G, Lewinger JP, Gauderman WJ. Two-step hypothesis testing to detect gene-
environment interactions in a genome-wide scan with a survival endpoint. Stat Med. 2022 Apr
30;41(9):1644-1657. doi: 10.1002/sim.9319. Epub 2022 Jan 24. PMID: 35075649; PMCID:
PMC9007892.
Appendix
General simulation settings:
Scenario          $q_A$    $R^2_G$    $R^2_E$    $R^2_{G\times E}$
Quantitative 1    0.231    0.01       0.005      0.01
Quantitative 2    0.231    0.02       0.005      0.01
Strong Quant      0.231    0.0043     0.006      0.01
Crossing 1        0.231    0          0.005      0.01
Crossing 2        0.5      0          0.005      0.01
Table 1. Simulation settings in different scenarios
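To illustrate how the variance fractions in Table 1 translate into effect sizes, the sketch below assumes a linear model $Y = \beta_G G_c + \beta_E E + \beta_{G\times E} G_c E + \varepsilon$, where $G_c$ is the additive genotype (allele frequency $q_A$) centred at its mean and $E \sim N(0,1)$ is independent of $G$. Under these assumptions the three terms are uncorrelated, so $\beta_G^2 \cdot 2q_A(1-q_A) = R^2_G$, $\beta_E^2 = R^2_E$ and $\beta_{G\times E}^2 \cdot 2q_A(1-q_A) = R^2_{G\times E}$, with the residual variance chosen so that $\mathrm{Var}(Y) = 1$. This generating model and the function names are hypothetical; it is not necessarily the model behind Tables 2 and 3, and it does not try to encode the Quantitative versus Crossing patterns.

```python
import numpy as np


def effect_sizes(q_a, r2_g, r2_e, r2_gxe):
    """Convert target variance fractions into regression coefficients.

    Hypothetical model: G is a 0/1/2 genotype with allele frequency q_a,
    centred at its mean; E ~ N(0, 1) independent of G. The terms G_c, E
    and G_c*E are then uncorrelated, so each beta is solved separately.
    """
    var_g = 2.0 * q_a * (1.0 - q_a)     # Var(G) for Binomial(2, q_a) genotypes
    beta_g = np.sqrt(r2_g / var_g)
    beta_e = np.sqrt(r2_e)              # Var(E) = 1
    beta_gxe = np.sqrt(r2_gxe / var_g)  # Var(G_c * E) = Var(G) * Var(E)
    return beta_g, beta_e, beta_gxe


def simulate_trait(n, q_a, r2_g, r2_e, r2_gxe, rng):
    """Simulate (y, g, e) with total trait variance 1 (hypothetical model)."""
    beta_g, beta_e, beta_gxe = effect_sizes(q_a, r2_g, r2_e, r2_gxe)
    g = rng.binomial(2, q_a, size=n)
    g_c = g - 2.0 * q_a                 # centred genotype
    e = rng.normal(size=n)
    resid_sd = np.sqrt(1.0 - r2_g - r2_e - r2_gxe)
    noise = rng.normal(scale=resid_sd, size=n)
    y = beta_g * g_c + beta_e * e + beta_gxe * g_c * e + noise
    return y, g, e


# Example with the "Quantitative 1" settings from Table 1.
rng = np.random.default_rng(2023)
y, g, e = simulate_trait(200_000, 0.231, 0.01, 0.005, 0.01, rng)
```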
Results for power and FWER in the five scenarios (the unadjusted QUAIL and Levene's tests are omitted from the power table):
Method            Quantitative 1   Quantitative 2   Strong Quant   Crossing 1   Crossing 2
GWIS              0.4832           0.4964           0.473          0.4571       0.472
DG                0.726            0.7417           0.5961         0.0323       0.0364
Adjusted QUAIL    0.0608           0.0985           0.0501         0.0436       0.0334
Adjusted Levene   0.0448           0.0459           0.045          0.0445       0.0412
Table 2. Summarized power in different scenarios
Method            Quantitative 1   Quantitative 2   Strong Quant   Crossing 1   Crossing 2
GWIS              0.0632           0.0666           0.0549         0.0536       0.0524
DG                0.0617           0.0639           0.0539         0.053        0.0511
QUAIL             0.2494           0.2616           0.259          0.2205       0.0777
Adjusted QUAIL    0.0731           0.0701           0.0687         0.0619       0.0606
Levene            0.2029           0.2124           0.212          0.1797       0.0724
Adjusted Levene   0.0698           0.0688           0.066          0.0567       0.0622
Table 3. Summarized FWER in different scenarios
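Power and FWER in Tables 2 and 3 are, presumably, Monte Carlo proportions over simulation replicates. A minimal sketch of that bookkeeping, assuming each replicate yields reject/accept decisions for every SNP and exactly one SNP is the simulated causal GxE SNP: power is the share of replicates in which that SNP is detected, and FWER is the share with at least one rejection among the remaining (null) SNPs. The function names and the toy decision matrix are hypothetical, and the definition of null SNPs is an assumption.

```python
import numpy as np


def estimate_power_fwer(reject, causal_idx):
    """Monte Carlo power and FWER from a replicate-by-SNP rejection matrix.

    reject[r, j] is True when SNP j is declared significant in replicate r.
    causal_idx is the column of the simulated causal GxE SNP.
    """
    reject = np.asarray(reject, dtype=bool)
    # Power: share of replicates in which the causal SNP is detected.
    power = reject[:, causal_idx].mean()
    # FWER: share of replicates with >= 1 false positive among null SNPs.
    null_cols = np.ones(reject.shape[1], dtype=bool)
    null_cols[causal_idx] = False
    fwer = reject[:, null_cols].any(axis=1).mean()
    return float(power), float(fwer)


def mc_standard_error(rate, n_reps):
    """Binomial Monte Carlo standard error of an estimated rate."""
    return float(np.sqrt(rate * (1.0 - rate) / n_reps))


# Toy example: 4 replicates x 3 SNPs, SNP 0 is causal.
toy = [[True, False, False],
       [True, True, False],
       [False, False, False],
       [True, False, True]]
power, fwer = estimate_power_fwer(toy, causal_idx=0)
```

For instance, with a hypothetical 10,000 replicates, a rate near 0.05 carries a Monte Carlo standard error of about 0.002, which is useful when judging how far the entries in Tables 2 and 3 are from 0.05.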
Abstract
Two-step methods can improve the power to detect gene-environment (GxE) interactions in genome-wide interaction scans. Typically, in the first step, marginal trait-vs-G or G-vs-E effects are tested to prioritize the SNPs that are more likely to be involved in GxE interactions. However, apart from marginal G effects, a GxE interaction can also influence the variability of quantitative trait levels. The quantile integral linear model (QUAIL) is a quantile regression-based framework for estimating genetic effects on the variability of an outcome. In this study, we aimed to use the QUAIL method to test trait variability in the first-step screening test. We designed several representative scenarios to evaluate the performance of different interaction detection methods and, through simulation, compared this new combined method with other interaction testing methods. We found that adjusting the QUAIL method for exposure reduces its inflated type 1 error, and that the adjusted QUAIL method performed slightly better than the adjusted Levene's test, a common method for testing the variance of a trait. However, the power of the adjusted QUAIL method remains low compared with other two-step testing strategies.
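The two-step strategy described in the abstract can be sketched with the weighted hypothesis testing scheme of Ionita-Laza et al. (2007), which Gauderman et al. (2013) applied to GxE scans: SNPs are ranked by their step-1 screening p-value and split into bins of doubling size, and bin k receives a share alpha * 2^(-k) of the overall level, divided equally among its SNPs. Because these shares sum to at most alpha, the family-wise error rate of the step-2 GxE tests is controlled, provided the step-1 and step-2 statistics are independent under the null. The function `weighted_two_step` and the initial bin size of 5 are illustrative assumptions; any step-1 screen (marginal, G-vs-E, or a variability test such as adjusted QUAIL) could supply `step1_p`.

```python
import numpy as np


def weighted_two_step(step1_p, step2_p, alpha=0.05, initial_bin=5):
    """Weighted two-step testing (hypothetical helper).

    step1_p : screening p-values used only to rank SNPs
    step2_p : GxE interaction p-values, same SNP order
    Returns a boolean array marking SNPs significant in step 2.
    """
    step1_p = np.asarray(step1_p, dtype=float)
    step2_p = np.asarray(step2_p, dtype=float)
    m = step1_p.size
    order = np.argsort(step1_p, kind="stable")  # most promising SNPs first
    significant = np.zeros(m, dtype=bool)

    start, k = 0, 1
    while start < m:
        size = initial_bin * 2 ** (k - 1)       # bin k holds B * 2^(k-1) SNPs
        idx = order[start:start + size]
        # Bin k gets alpha * 2^-k, split over its nominal size; using the
        # nominal size keeps a partial last bin conservative.
        threshold = alpha * 2.0 ** (-k) / size
        significant[idx] = step2_p[idx] < threshold
        start += size
        k += 1
    return significant


# Toy example: 20 SNPs already in step-1 order, identical step-2 p-values.
step1 = np.linspace(0.001, 0.9, 20)
step2 = np.full(20, 0.001)
sig = weighted_two_step(step1, step2, alpha=0.05, initial_bin=5)
```

In the toy example the per-SNP thresholds are 0.005 (bin 1), 0.00125 (bin 2) and 0.0003125 (bin 3), so the same step-2 p-value of 0.001 is significant for the 15 SNPs in the first two bins but not for the 5 SNPs ranked last. This is the mechanism by which a better step-1 screen raises power: it moves the true GxE SNP into an early bin with a looser threshold.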
Linked assets
University of Southern California Dissertations and Theses
Conceptually similar
Minimum p-value approach in two-step tests of genome-wide gene-environment interactions
Two-step testing approaches for detecting quantitative trait gene-environment interactions in a genome-wide association study
Comparisons of four commonly used methods in GWAS to detect gene-environment interactions
Bayesian model averaging methods for gene-environment interactions and admixture mapping
Efficient two-step testing approaches for detecting gene-environment interactions in genome-wide association studies, with an application to the Children’s Health Study
High-dimensional regression for gene-environment interactions
Quantile mediation models: methods for assessing mediation across the outcome distribution
Generalized linear discriminant analysis for high-dimensional genomic data with external information
Uncertainty quantification in extreme gradient boosting with application to environmental epidemiology
Sentiment analysis in the COVID-19 vaccine willingness among staff in the University of Southern California
Detecting joint interactions between sets of variables in the context of studies with a dichotomous phenotype, with applications to asthma susceptibility involving epigenetics and epistasis
Hierarchical approaches for joint analysis of marginal summary statistics
Prediction and feature selection with regularized regression in integrative genomics
Bayesian multilevel quantile regression for longitudinal data
Missing heritability may be explained by the common household environment and its interaction with genetic variation
Leveraging functional datasets of stimulated cells to understand the relationship between environment and diseases
Statistical downscaling with artificial neural network
Characterization and discovery of genetic associations: multiethnic fine-mapping and incorporation of functional information
Inference correction in measurement error models with a complex dosimetry system
Gene-set based analysis using external prior information
Asset Metadata
Creator
Sun, Ke (author)
Core Title
Combination of quantile integral linear model with two-step method to improve the power of genome-wide interaction scans
School
Keck School of Medicine
Degree
Master of Science
Degree Program
Biostatistics
Degree Conferral Date
2023-05
Publication Date
05/09/2023
Defense Date
05/09/2023
Publisher
University of Southern California (original), University of Southern California. Libraries (digital)
Tag
gene-environment interaction,OAI-PMH Harvest,Quantile integral linear model,two-step methods
Format
theses (aat)
Language
English
Contributor
Electronically uploaded by the author (provenance)
Advisor
Kawaguchi, Eric (committee chair), Gauderman, William (committee member), Lewinger, Juan (committee member)
Creator Email
ksun7731@usc.edu,zhuyebamboo@163.com
Permanent Link (DOI)
https://doi.org/10.25549/usctheses-oUC113103793
Unique identifier
UC113103793
Identifier
etd-SunKe-11809.pdf (filename)
Legacy Identifier
etd-SunKe-11809
Document Type
Thesis
Rights
Sun, Ke
Internet Media Type
application/pdf
Type
texts
Source
20230509-usctheses-batch-1040 (batch), University of Southern California (contributing entity), University of Southern California Dissertations and Theses (collection)
Access Conditions
The author retains rights to his/her dissertation, thesis or other graduate work according to U.S. copyright law. Electronic access is being provided by the USC Libraries in agreement with the author, as the original true and official version of the work, but does not grant the reader permission to use the work if the desired use is covered by copyright. It is the author, as rights holder, who must provide use permission if such use is covered by copyright.
Repository Name
University of Southern California Digital Library
Repository Location
USC Digital Library, University of Southern California, University Park Campus MC 2810, 3434 South Grand Avenue, 2nd Floor, Los Angeles, California 90089-2810, USA
Repository Email
cisadmin@lib.usc.edu