Combination of quantile integral linear model with two-step method to improve the power of genome-wide interaction scans
COMBINATION OF QUANTILE INTEGRAL LINEAR MODEL WITH TWO-STEP
METHOD TO IMPROVE THE POWER OF GENOME-WIDE INTERACTION SCANS
by
Ke Sun
A Thesis Presented to the
FACULTY OF THE USC KECK SCHOOL OF MEDICINE
UNIVERSITY OF SOUTHERN CALIFORNIA
In Partial Fulfillment of the
Requirements for the Degree
MASTER OF SCIENCE
(BIOSTATISTICS)
MAY 2023
Copyright 2023 Ke Sun

Table of Contents
List of Tables
List of Figures
Abstract
Chapter 1 Introduction
Chapter 2 Methods
2.1 Statistical model
2.1.1 Marginal (YG) test
2.1.2 Interaction test (GxE)
2.1.3 Two-step method with step 1 on marginal G (YG|GxE)
2.1.4 Two-step with step 1 on Levene's test (Var|GxE)
2.1.5 Two-step with step 1 on Levene's test with adjusting on Exposure
2.1.6 QUAIL method
2.1.7 Two-step with step 1 on QUAIL method with adjusting on Exposure
2.1.8 Two-step with step 1 on QUAIL method (QUAIL|G*E) without adjusting covariates
2.2 Simulation setting
Chapter 3 Results
3.1 Type I Error
3.2 Power Comparison
Chapter 4 Discussion
References
Appendix


 

List of Tables
TABLE 1 SIMULATION SETTING IN DIFFERENT SCENARIOS
TABLE 2 SUMMARIZED POWER IN DIFFERENT SCENARIOS
TABLE 3 SUMMARIZED FWER IN DIFFERENT SCENARIOS

 

List of Figures
FIGURE 1 SUMMARIZED LINE PLOT FOR FWER IN FIVE SCENARIOS
FIGURE 2 SUMMARIZED LINE PLOT FOR POWER IN FIVE SCENARIOS

 

Abstract
Two-step methods can improve the power of detecting gene-environment (GxE) interactions in
genome-wide interaction scans. Typically, in the first step, marginal trait-vs-G or G-vs-E effects
are tested to prioritize the SNPs that are more likely involved in GxE interactions. However, apart
from marginal G effects, GxE interaction can also influence the variability in quantitative trait
levels. Quantile integral linear model (QUAIL) is a quantile regression-based framework used to
estimate genetic effects on the variability of outcomes. In this study, we aimed to utilize the
QUAIL method to test variability of trait in the screening test. We designed several representative
scenarios to evaluate the performance of different interaction detection methods. Through
simulation, we compared this new combined method with other interaction testing methods. We
found that the QUAIL method can be adjusted for exposure to reduce the inflated type I error
and that it performed slightly better than the adjusted Levene's test, a common method for testing
the variance of a trait. However, the power of the adjusted QUAIL method remains low compared to
other two-step testing strategies.

Chapter 1 Introduction
Genome-wide association studies (GWAS) are commonly used to identify genetic factors that
contribute to complex traits. However, many complex traits are likely to be influenced not only by
genetics, but also by environmental factors. To address this, genome-wide interaction scans (GWIS)
have been developed to test for gene-by-environment (GxE) interactions. These scans model the
genotype, an environmental exposure, and the interaction term of exposure and genotype. Despite
their potential, there are many limitations to detecting GxE interactions. These limitations arise in
part due to the polygenic nature of human traits, as well as the relatively small effects of GxE
interactions. Nonetheless, GWIS represent an important tool in the ongoing effort to better
understand the complex etiology of many human traits and diseases.
The original approach for analyzing gene-by-environment (GxE) interactions is based on a
regression model that includes an interaction term for the gene and the environment. However,
this approach can lack power, and as a result, two-step methods have been developed to improve
the ability to detect GxE interactions [Kooperberg and LeBlanc, 2008; Murcray et al., 2009; Ionita-
Laza et al., 2007]. In the two-step method, the first step involves screening SNPs based on their
marginal effect [Ionita-Laza et al., 2007]. Only SNPs that pass the significance threshold in this
screening test are then tested for GxE interaction in the second step. By prioritizing the SNPs most
likely to be involved in a GxE interaction, the screening test can improve efficiency.  
Genetic variants can affect both the level and the variability of a quantitative trait [Paré et al.,
2010]. In addition to detecting marginal genetic effects in step 1 of a two-step genome-wide
interaction scan (GWIS), Paré et al. developed an alternative procedure that tests for
variance heterogeneity across SNP genotypes [Paré et al., 2010]. However, previous studies have
indicated that using a screening test for variance heterogeneity may result in an inflated
false-positive rate [Zhang et al., 2016].
In this study, we propose a new screening approach for detecting variability heterogeneity and
combine it with the two-step method for GWIS testing. Our goal is to compare the performance of
this new testing approach to other existing methods for detecting GxE interactions. By
incorporating a more robust approach for testing the variability of a quantitative trait, we hope
to improve the overall ability to detect GxE interactions.
The QUAIL (Quantile Integral Linear Model) method is a new technique that uses a quantile
regression-based framework to estimate genetic effects on the variance of quantitative traits [Miao
et al., 2022]. QUAIL is a promising approach for finding variance quantitative trait loci (vQTL)
and can also adjust for the effects of confounders on both the phenotypic variance and the level of
the quantitative trait [Miao et al., 2022]. Unlike other methods that test for variance
heterogeneity, QUAIL can be applied to both categorical and continuous variables without
assuming a specific distribution of phenotypes. In this study, we evaluated by simulation the
performance of the two-step method using QUAIL and compared it to the performance of the common
two-step method.
 

Chapter 2 Methods
Notation:
In developing the methods, we use the following notation:
$Y$ : Quantitative phenotype
$G$ : Genotype variable ($G = 2$ for AA genotype, $G = 1$ for Aa genotype and $G = 0$ for aa genotype)
$E$ : Dichotomous exposure variable
$M$ : Number of SNPs. $M$ SNPs have been genotyped on $N$ study individuals, with $G_1, G_2, \ldots, G_M$ denoting the genotypes at the $M$ loci
$q_A$ : Minor allele frequency (MAF) of allele A for the quantitative trait locus (QTL)

Consider a gene-environment interaction study with a quantitative trait $Y$, an environmental
exposure of interest $E$, and $M$ SNPs ($G_j$, $j = 1, 2, \ldots, M$) measured or imputed for each
of the $N$ subjects.
2.1   Statistical model
2.1.1 Marginal (YG) test
We test the null hypothesis of no genotype effect, $\beta_G = 0$, with a linear regression model of
the form:
$$Y = \beta_0 + \beta_G G + \epsilon$$
Adjustment covariates can also be added to the model. $\beta_G$ is a weighted average of the
genetic effect of $G$ in each environmental group. The magnitude of $\beta_G$ quantifies the
marginal effect and serves as the screening statistic for the two-step methods below that screen
on the marginal $G$ effect.
2.1.2 Interaction test (GxE)
We test the significance of the GxE interaction term with a linear regression model of the form:
$$Y = \beta_0 + \beta_G G + \beta_E E + \beta_{GxE}\,(G \times E) + \epsilon$$
For a genome-wide interaction study (GWIS), we assume $M$ tests of GxE interaction, for which test
statistics $\{T_j\}_{j=1}^{M}$ and corresponding p-values $\{P_j\}_{j=1}^{M}$ are computed. Each
$T_j$ is the test statistic for the null hypothesis $H_0: \beta_{G_j xE} = 0$. An adjustment for
multiple comparisons is used to preserve the family-wise Type I error rate (FWER) at a prespecified
significance level $\alpha$. For example, under the Bonferroni correction each single hypothesis is
tested at the significance level $\alpha^* = \alpha / M$. However, this correction decreases the
power to detect the GxE interaction.
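The one-step interaction test for a single SNP can be sketched as follows. This is a minimal illustration, not the implementation used in the thesis: it assumes a binary exposure, in which case the least-squares estimate of $\beta_{GxE}$ equals the difference between the slopes of $Y$ on $G$ fitted separately in the $E = 1$ and $E = 0$ strata. The function names, the toy data and the normal approximation to the p-value are assumptions made for illustration.

```python
import math
import random
from statistics import NormalDist


def slope_parts(g, y):
    """Return (slope, Sxx, RSS) for a simple linear regression of y on g."""
    n = len(g)
    g_bar = sum(g) / n
    y_bar = sum(y) / n
    sxx = sum((a - g_bar) ** 2 for a in g)
    sxy = sum((a - g_bar) * (b - y_bar) for a, b in zip(g, y))
    slope = sxy / sxx
    intercept = y_bar - slope * g_bar
    rss = sum((b - intercept - slope * a) ** 2 for a, b in zip(g, y))
    return slope, sxx, rss


def gxe_test(y, g, e):
    """Wald test of H0: beta_GxE = 0 for a binary exposure e (normal approximation)."""
    parts = {}
    for level in (0, 1):
        gs = [a for a, ex in zip(g, e) if ex == level]
        ys = [b for b, ex in zip(y, e) if ex == level]
        parts[level] = slope_parts(gs, ys)
    (b0, sxx0, rss0), (b1, sxx1, rss1) = parts[0], parts[1]
    beta_gxe = b1 - b0                       # difference of stratum-specific slopes
    sigma2 = (rss0 + rss1) / (len(y) - 4)    # pooled residual variance, 4 coefficients
    se = math.sqrt(sigma2 * (1 / sxx0 + 1 / sxx1))
    z = beta_gxe / se
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return beta_gxe, p


# Toy data (hypothetical): one SNP with a true interaction effect of 0.5
random.seed(1)
n = 2000
g = [random.choice([0, 1, 2]) for _ in range(n)]
e = [1 if random.random() < 0.3 else 0 for _ in range(n)]
y = [0.5 * gi * ei + random.gauss(0, 1) for gi, ei in zip(g, e)]

beta_gxe, p = gxe_test(y, g, e)
alpha, M = 0.05, 10000
print(beta_gxe, p, p < alpha / M)  # compare with the Bonferroni level alpha / M
```

With $M = 10000$ tests and $\alpha = 0.05$, a SNP is declared significant only if its p-value falls below $5 \times 10^{-6}$, which is the stringency that motivates the two-step methods.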
2.1.3 Two-step method with step 1 on marginal G (YG|GxE)
In two-step methods, information on GxE is captured not only by the standard GxE test but also by
a prior screening test that prioritizes SNPs that are more likely to participate in GxE
interactions [Kawaguchi, et al., 2023; Zhang et al., 2016].
In the first step of the two-step method, we prioritize SNPs based on their marginal effect on the
trait, as an indication that they may be involved in GxE interactions. In the second step, we test
for interactions using a significance threshold that depends on the number of SNPs selected in the
first step, to account for multiple comparisons.
Two procedures have been proposed for prioritizing SNPs in step-2 GxE interaction testing after
the step-1 screening: subset testing and weighted hypothesis testing [Gauderman et al., 2013].
These procedures are widely used in the field. In subset testing, only the subset of the $M$ total
SNPs that pass the significance threshold of the screening test is eligible for step 2, which uses
the standard GxE interaction test. The number of SNPs that pass the screening test is denoted
$m \ll M$. In step 2, the significance threshold for GxE interaction is obtained by Bonferroni
correction for the $m$ SNPs that pass the screening test. The new significance level
$\alpha^* = \alpha / m$ is much less stringent than the threshold $\alpha / M$ used in a one-step
interaction test. However, this relaxed threshold comes with a trade-off: SNPs that do not pass the
step-1 screening test are not tested in step 2.
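The subset-testing rule can be sketched in a few lines. This is an illustrative sketch under stated assumptions: the function name, the step-1 screening threshold `alpha1` and the toy p-values are hypothetical and are not taken from the thesis.

```python
def subset_step2(step1_p, step2_p, alpha=0.05, alpha1=0.05):
    """Subset testing: keep SNPs with step-1 p < alpha1, then apply a
    Bonferroni correction over the m survivors in step 2."""
    passed = [j for j, p in enumerate(step1_p) if p < alpha1]
    m = len(passed)
    if m == 0:
        return [], None
    threshold = alpha / m
    hits = [j for j in passed if step2_p[j] < threshold]
    return hits, threshold


# Hypothetical p-values for five SNPs
step1_p = [0.001, 0.20, 0.03, 0.60, 0.04]     # screening test
step2_p = [0.012, 0.0001, 0.30, 0.004, 0.015]  # GxE interaction test

hits, threshold = subset_step2(step1_p, step2_p)
print(hits, threshold)
```

In this toy example three SNPs pass the screen, so step 2 uses $\alpha / 3$ instead of $\alpha / 5$. The SNP at index 1 has the smallest interaction p-value but is never tested because it failed the screen, which is the trade-off described above.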
Weighted hypothesis testing is another approach for prioritizing SNPs in step 2 of GxE interaction
testing, and it does not depend on whether a SNP passes the screening test in step 1. In this
method, SNPs are allocated into bins based on the magnitude of the screening statistic [Ionita-Laza
et al., 2007]. Each bin is assigned its own significance level, and these levels add up to
$\alpha$. Typically, the lower (earlier) bins are given a more liberal significance threshold, as
SNPs in these bins are more likely to have an interaction based on the screening statistics.
Conversely, SNPs in higher (later) bins are given a more stringent significance level for step-2
testing. Unlike the subset method, all SNPs in the screening test can be tested for interaction in
step 2.
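A bin-based allocation of significance levels can be sketched as follows. The text does not specify the bin sizes or the per-bin levels, so this sketch assumes one common scheme: the first bin holds the 5 SNPs with the smallest step-1 p-values, each later bin is twice as large as the previous one, and bin $b$ receives $\alpha / 2^b$, shared equally among the SNPs it can hold. The function name and toy p-values are hypothetical.

```python
def weighted_thresholds(step1_p, alpha=0.05, first_bin=5):
    """Per-SNP step-2 thresholds under an assumed doubling-bin scheme.

    SNPs are ranked by step-1 p-value and placed in bins of size first_bin,
    2 * first_bin, 4 * first_bin, ...; bin b (1-based) is given alpha / 2**b,
    divided by the nominal size of the bin.
    """
    order = sorted(range(len(step1_p)), key=lambda j: step1_p[j])
    thresholds = [0.0] * len(step1_p)
    start, size, b = 0, first_bin, 1
    while start < len(order):
        for j in order[start:start + size]:
            thresholds[j] = alpha / 2 ** b / size
        start += size
        size *= 2
        b += 1
    return thresholds


# Hypothetical step-1 p-values for 20 SNPs, already in increasing order
step1_p = [(j + 1) / 21 for j in range(20)]
thresholds = weighted_thresholds(step1_p)
print(thresholds[0], thresholds[5], thresholds[19])
```

Because the per-bin levels form a geometric series, the thresholds summed over all SNPs never exceed $\alpha$, while every SNP keeps a non-zero chance of being declared significant in step 2.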
Although weighted hypothesis testing is typically more powerful than the subset method, its power
is more likely to be affected by "bin overcrowding" [Kawaguchi, et al., 2023]. This occurs when
SNPs with a non-zero marginal effect but no GxE interaction are allocated to earlier bins, so that
SNPs with true GxE effects are not optimally tested, being assigned a stricter significance level
in the later bins. Another component leading to overcrowding is linkage disequilibrium (LD): SNPs
in high LD with an index SNP are placed in the upper-level bins together with it. To avoid the
potential computational burden and the issue of overcrowding, we chose to generate independent SNPs
without LD and to use the subset method in our simulation study [Kawaguchi, et al., 2022].
Step 1:
$$Y = \beta_0 + \beta_G G + \epsilon$$
Step 2:
$$Y = \beta_0 + \beta_G G + \beta_E E + \beta_{GxE}\,(G \times E) + \epsilon$$
As outlined earlier, the first step in the two-step GxE interaction testing approach involves
conducting a marginal test (YG) to identify candidate SNPs with a marginal effect on the
quantitative outcome. These SNPs are then moved to step 2, where they are subjected to the standard
interaction test with Bonferroni correction, with the significance level adjusted to
$\alpha^* = \alpha / m$, where $m$ is the number of candidate SNPs that passed the step-1 screening
test.
The motivation behind two-step methods that screen on the marginal G effect is that a true GxE
interaction effect will, in some cases, also appear as a significant marginal G effect on the
outcome. This makes the screening statistic informative about true GxE effects. However, screening
on the marginal G effect is not always effective: in some GxE situations there is no marginal G
effect to detect in the screening step. Apart from the marginal G effect, GxE interaction can also
influence the variability in quantitative trait levels.
2.1.4 Two-step with step 1 on Levene's test (Var|GxE)

In addition to screening for the marginal G effect, we can also consider variance heterogeneity in
the quantitative trait across genotype groups [Paré et al., 2010]. The rationale is that if
interaction effects exist, they may also be reflected in differences in the variability of the
quantitative trait between individuals carrying the risk allele and non-carriers. Levene's test
has been proposed as a test of the equality of variances [Levene H, 1960], and it can be used to
prioritize SNPs for further interaction testing. Levene's test assesses whether k samples have
equal variance; here it is used to test the variance of the quantitative trait across genotype
groups. The p-value from Levene's test is the metric that determines eligibility for step 2.
However, previous studies have shown that the variance estimator and the GxE interaction estimator
are correlated, so the independence assumption of two-step methods is violated when a marginal
effect of E exists [Zhang et al., 2016]. Simulation results confirm this, showing a high
false-positive rate for the approach proposed by Paré et al.
2.1.5 Two-step with step 1 on Levene's test with adjusting on Exposure
As mentioned in the previous section, the correlation between Levene's test and the GxE interaction
test violates the independence assumption between step 1 and step 2. However, we can eliminate this
correlation by adjusting for the exposure covariate in the regression model. This adjustment for
exposure does not affect the testing of GxE interaction in step 2. Thus, we use a revised
Levene's test as follows [Zhang et al., 2016]:

Step 1:
Fit a linear regression of the quantitative trait $Y$ on the exposure:
$$Y_i = \beta_0 + \beta_E E_i + \epsilon_i$$
where $i = 1, \ldots, N$ indexes the $N$ individuals in the sample and $\epsilon_i$ denotes the
regression residual for individual $i$ after adjustment for exposure.
Consider $K$ genotype groups; in this case $K = 3$, corresponding to the genotype groups AA, Aa and
aa. We test the null hypothesis $H_0: \sigma_1 = \sigma_2 = \cdots = \sigma_K$, where $\sigma_k$ is
the standard deviation in the $k$th group. The statistic of the revised Levene's test is defined as:
$$W = \frac{(N-K)\sum_{k=1}^{K} N_k \,(Z_{k\cdot} - Z_{\cdot\cdot})^2}{(K-1)\sum_{k=1}^{K}\sum_{j=1}^{N_k} (Z_{kj} - Z_{k\cdot})^2}$$
where $N_k$ is the sample size of the $k$th group and $Z_{kj} = |\epsilon_{kj} - \epsilon_{k\cdot}|$,
with $\epsilon_{kj}$ the residual of individual $j$ in group $k$ from the regression of $Y$ on $E$
and $\epsilon_{k\cdot}$ the mean of $\epsilon_{kj}$ in the $k$th group. $Z_{k\cdot}$ and
$Z_{\cdot\cdot}$ are the means of $Z_{kj}$ in the $k$th genotype group and overall, respectively.
Compared with the original test, the statistic is computed from the exposure-adjusted residuals
$\epsilon$ rather than from the quantitative trait $Y$. The p-value is then obtained as:
$$P_{Var} = \Pr\left(F_{(K-1,\,N-K)} > W\right)$$
where, under the null hypothesis, $W$ follows an F distribution with $K-1$ and $N-K$ degrees of
freedom.
Step 2: The candidate SNPs from step 1 are then selected and the null hypothesis
$H_0: \beta_{G_j xE} = 0$ is tested using residuals to remove the marginal effect of E on the
quantitative trait. Thus, the proposed approach satisfies the independence assumption, and the
family-wise error rate is preserved.
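The exposure-adjusted Levene statistic of step 1 can be sketched as below. This is a minimal sketch that assumes a binary exposure, so that regressing $Y$ on $E$ amounts to subtracting the mean of $Y$ within each exposure level, and that uses mean-centred absolute deviations as in the definition of $Z_{kj}$. The function name and the toy data are hypothetical.

```python
def adjusted_levene_w(y, e, g):
    """Levene-type W statistic computed on residuals of y regressed on e."""
    n = len(y)

    # Residuals of y on a binary e: subtract the mean of y at each level of e
    e_means = {}
    for level in set(e):
        vals = [yi for yi, ei in zip(y, e) if ei == level]
        e_means[level] = sum(vals) / len(vals)
    resid = [yi - e_means[ei] for yi, ei in zip(y, e)]

    # Group residuals by genotype
    groups = {}
    for r, gi in zip(resid, g):
        groups.setdefault(gi, []).append(r)
    k = len(groups)

    # Z_kj = |eps_kj - eps_k.| within each genotype group
    z = {}
    for gi, rs in groups.items():
        centre = sum(rs) / len(rs)
        z[gi] = [abs(r - centre) for r in rs]
    z_means = {gi: sum(v) / len(v) for gi, v in z.items()}
    z_all = sum(sum(v) for v in z.values()) / n

    between = sum(len(v) * (z_means[gi] - z_all) ** 2 for gi, v in z.items())
    within = sum((zi - z_means[gi]) ** 2 for gi, v in z.items() for zi in v)
    return (n - k) * between / ((k - 1) * within)


# Toy data (hypothetical): spread of y increases with genotype, e shifts the mean
g = [0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2]
e = [0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1]
y = [9.5, 13.5, 10.5, 16.5, 9.0, 12.0, 11.0, 18.0, 8.5, 10.5, 11.5, 19.5]

w = adjusted_levene_w(y, e, g)
print(w)
```

The screening p-value is the upper tail of the F distribution with $K-1$ and $N-K$ degrees of freedom evaluated at `w`; if SciPy is available, `scipy.stats.f.sf(w, k - 1, n - k)` is one way to obtain it.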
2.1.6 QUAIL method
QUAIL (quantile integral linear model) is a novel method for identifying SNPs that show
differential variability of a quantitative trait across genotype groups. The QUAIL method uses
quantile regression to quantify the change in variability of the quantitative trait across
genotype groups [Miao et al., 2022]. Quantile regression estimates the conditional quantile
function of a quantitative trait at different quantile levels. If a SNP has an effect on the
variability of a quantitative trait, the quantile regression slopes on genotype will differ across
the quantile levels.
To illustrate, consider an example where a SNP $G$ causes a change in the variability of trait $Y$,
and let $\beta_\tau$ represent the slope of the quantile regression of $Y$ on $G$ at conditional
quantile level $\tau$:
$$Q_Y(\tau \mid G = g) = g\,\beta_\tau$$
For $\tau \in (0, 0.5)$, the QUAIL method pairs each quantile level $\tau$ with $1 - \tau$ and uses
the difference between the regression slopes at the upper and lower quantile levels, i.e.,
$(\beta_{1-\tau} - \beta_\tau)$. Conversely, if there is no change in the variability of the
quantitative trait, the null hypothesis can be stated as $H_0: \beta_{\tau+0.5} = \beta_\tau$ for
$\tau \in (0, 0.5)$ in general.
After implementing quantile regression, we introduce the quantile-integrated effect to aggregate
all the variability changes across quantile levels for each SNP on trait $Y$. This integrated
effect provides a summary measure of the overall impact of each SNP on the variability of $Y$,
taking into account both the magnitude and direction of the effects observed across different
quantiles.
$$\beta_{QI} = \int_0^{0.5} (\beta_{1-\tau} - \beta_\tau)\, d\tau$$
The above equation shows that the differences in slope between paired quantile levels can be
tested jointly by testing $H_0: \beta_{QI} = 0$. In practical computation, the integral over
infinitely many quantile levels is approximated by averaging the regression coefficients from $K$
quantile levels:
$$\hat\beta_{QI} = \frac{\sum_{k=1}^{K} (\hat\beta_{1-\tau_k} - \hat\beta_{\tau_k})}{K}$$
In quantile regression, the coefficients are estimated by solving:
$$\hat\theta_{\tau_k} = \arg\min_{\theta} \sum_i \rho_{\tau_k}(Y_i - g_i \beta_{\tau_k})$$
From the above equation, we obtain the estimated parameters $\theta$, such as
$\hat\beta_{\tau_k}$ and $\hat\mu_{\tau_k}$, where $\rho_\tau(u) = u[\tau - I(u < 0)]$ and $i$
indexes the $i$th individual. The QUAIL method was developed to compute the $K$ quantile
regressions efficiently and to overcome the difficulties associated with obtaining
the variance-covariance matrix. More details can be found in the next two sections.
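The quantile loss and the quantile-integrated effect can be illustrated with a deliberately simplified case. The sketch below assumes a binary genotype (carrier versus non-carrier) and a model with an intercept; under those assumptions the quantile regression slope at level $\tau$ is the difference between the $\tau$th sample quantiles of the two groups, so no optimizer is needed. The function names, the choice $\tau_k = 0.5k/(K+1)$ and the toy data are assumptions for illustration, not the QUAIL implementation.

```python
from statistics import NormalDist


def check_loss(u, tau):
    """Quantile regression loss rho_tau(u) = u * (tau - I(u < 0))."""
    return u * (tau - (1.0 if u < 0 else 0.0))


def sample_quantile(values, tau):
    """An order statistic that minimises sum(check_loss(v - q, tau)) over q."""
    ordered = sorted(values)
    idx = min(int(tau * len(ordered)), len(ordered) - 1)
    return ordered[idx]


def beta_qi_binary(y0, y1, K=100):
    """Approximate beta_QI for a 0/1 genotype as the average of
    (beta_{1 - tau_k} - beta_{tau_k}) over K quantile levels in (0, 0.5)."""
    total = 0.0
    for k in range(1, K + 1):
        tau = 0.5 * k / (K + 1)
        lower = sample_quantile(y1, tau) - sample_quantile(y0, tau)
        upper = sample_quantile(y1, 1 - tau) - sample_quantile(y0, 1 - tau)
        total += upper - lower
    return total / K


# Deterministic toy data: same mean in both groups, carriers twice as variable
grid = [(i + 0.5) / 400 for i in range(400)]
y0 = [NormalDist(0, 1).inv_cdf(p) for p in grid]  # non-carriers, SD 1
y1 = [NormalDist(0, 2).inv_cdf(p) for p in grid]  # carriers, SD 2

print(beta_qi_binary(y0, y1))  # positive: upper-quantile slopes exceed lower ones
print(beta_qi_binary(y0, y0))  # zero: no difference in variability
```

A SNP that changes only the spread of the trait leaves the median slope at zero but pushes the upper and lower quantile slopes in opposite directions, which is what the averaged difference picks up.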
2.1.7 Two-step with step 1 on QUAIL method with adjusting on Exposure
Step 1: QUAIL Method to Quantify the Variability in Genotype:
The first procedure of the QUAIL method estimates the intercept and covariate effects under the
null model for $2K$ ($K = 1000$) quantile levels in the quantile regression
$Q_Y(\tau \mid G = g, C) = \mu_\tau + g\beta_\tau + C\alpha_\tau$, where $C$ is the covariate
matrix, $\alpha_\tau$ is the vector of regression coefficients for the covariates and $\mu_\tau$ is
the intercept at quantile level $\tau$. The estimated coefficients are obtained by minimizing the
quantile loss:
$$\hat\alpha_\tau = \arg\min_{\alpha_\tau} \sum_i \rho_\tau(Y_i - \mu_\tau - C_i\alpha_\tau)$$
where $\rho_\tau(u) = u[\tau - I(u < 0)]$ is the loss function for quantile regression. Then, for
each individual $i$, we construct $2K$ quantile rank scores as:
$$\hat a_i(\tau) = \tau - I(Y_i < C_i\hat\alpha_\tau + \hat\mu_\tau)$$
where $I(Y_i < C_i\hat\alpha_\tau + \hat\mu_\tau)$ is an indicator of whether $Y_i$ is smaller than
the estimated $\tau$th conditional quantile for $Y_i$. The quantile rank score is then rescaled for
each individual as:
$$Y_{i\tau} = \frac{\sqrt{n}\,\hat a_i(\tau)\,SE(\hat\gamma_\tau)}{\sqrt{\tau - \tau^2}}$$
where $n$ is the sample size, $SE(\hat\gamma_\tau)$ is the estimated standard error of the
regression coefficient in the quantile regression
$Q_Y(\tau \mid d, C) = c_\tau + d\gamma_\tau + C\alpha_\tau$ and $d$ is a random variable sampled
from $N(0,1)$. We use the variable $d$ and the standard error of its coefficient to approximate
$\hat\beta_{QI}$.
Finally, we construct the quantile integrated rank score for each individual $i$ by combining
$Y_{i\tau}$ across quantile levels:
$$Y_{QI,i} = \frac{\sum_{k=1}^{K} \left[Y_{i(1-\tau_k)} - Y_{i\tau_k}\right]}{K}$$
We standardize $Y_{QI,i}$ to have mean 0 and SD 1.
In the second procedure of the QUAIL method, after transforming the phenotype into the quantile
integrated rank score, we estimate the quantile integral effect by regressing the integrated rank
score on $G$. The regression equation is given by:
$$Y_{QI,i} = \hat\beta_{QI} G^\star + \epsilon$$
$$\hat\beta_{QI} = \arg\min_{\beta} \left\| Y_{QI} - G^{\star T}\beta \right\|^2$$
where $G^\star$ is the $n \times 1$ vector of genotype residuals after regressing out the
covariates. Because $G^\star$ is the genotype adjusted for covariates, the quantile integral effect
can be computed by regressing the integrated rank score on $G^\star$, and it can be interpreted as
the genotype effect on the variability of the trait after adjusting $Y$ for covariates.
Under the $H_0$ that all differences in paired slopes are identically equal to 0,
$\hat\beta_{QI}$ is asymptotically normally distributed. From the regression
$Y_{QI,i} = \hat\beta_{QI} G^\star + \epsilon$ above, the QUAIL statistics and p-values can be
obtained.
Step 2: After selecting the candidate SNPs with p-value < 0.05 from the QUAIL method, the null
hypothesis $H_0: \beta_{G_j xE} = 0$ is tested with the standard interaction test. The significance
level follows the same criterion as in the other two-step methods, $\alpha^* = \alpha / m$, where
$m$ is the number of SNPs passing the QUAIL screening in step 1. Since we test the variability
change in $Y$ instead of the marginal G effect, weighted testing could be used instead of subset
testing. To compare the performance of the different methods on an equal footing, we still use
subset testing in the simulations.
2.1.8 Two-step with step 1 on QUAIL method (QUAIL|G*E) without adjusting covariates
The procedure is similar to that in Section 2.1.7, but we only need to estimate the intercept under
the model $Q_Y(\tau \mid G = g) = \mu_\tau + g\beta_\tau$. Accordingly, when constructing the
quantile integrated rank score, $\hat a_i(\tau)$ becomes:
$$\hat a_i(\tau) = \tau - I(Y_i < \hat\mu_\tau)$$
and the $SE(\hat\gamma_\tau)$ in
$$Y_{i\tau} = \frac{\sqrt{n}\,\hat a_i(\tau)\,SE(\hat\gamma_\tau)}{\sqrt{\tau - \tau^2}}$$
is the estimated standard error of $\gamma_\tau$ in $Q_Y(\tau \mid d) = c_\tau + d\gamma_\tau$,
without the covariate term.
2.2 Simulation setting

Let $G$ be an $N \times M$ genotype matrix for $N$ individuals and $M$ SNPs. In all of our
simulations, we assume $N = 2000$ and $M = 10000$. $G$ is simulated based on the minor allele
frequencies (MAFs). To simplify the simulation, we did not consider correlations between SNPs.
Quantitative traits are simulated according to the following linear model:
$$Y = \sum_j \beta_{G_j} G_j + \beta_E E + \beta_{GxE}\,(G_1 \times E) + \epsilon$$
Here, $E$ is the exposure variable (assumed to be binary) with $\Pr(E = 1) = 0.3$ and $j$ is the
SNP index; only $G_{j=1}$ has a true GxE effect on the outcome. The other SNPs, $G_{j \neq 1}$,
have neither a marginal G effect nor a GxE effect. $\epsilon$ is the residual, assumed to be
normally distributed with mean 0 and standard deviation $\sigma_E$.
To generate the genotype $G_j$, first let $X$ be an $N \times M$ matrix of independent standard
normal variables, $X_{i,k} \sim N(0,1)$. Letting $X_{i,k}$ and $G_{i,k}$ be the entries in the
$i$th row and $k$th column of $X$ and $G$, the relationship between $X_{i,k}$ and $G_{i,k}$ is:
$$G_{i,k} = \begin{cases} 0, & \text{if } X_{i,k} < \Phi^{-1}(f_k^2) \\ 1, & \text{if } \Phi^{-1}(f_k^2) < X_{i,k} < \Phi^{-1}\left(f_k^2 + 2f_k(1-f_k)\right) \\ 2, & \text{if } \Phi^{-1}\left(f_k^2 + 2f_k(1-f_k)\right) < X_{i,k} \end{cases}$$
where $\Phi^{-1}$ is the inverse of the cumulative distribution function of the standard normal
distribution, so that the three genotype classes have probabilities $f_k^2$, $2f_k(1-f_k)$ and
$(1-f_k)^2$, and $f_k$ is the minor allele frequency of SNP $k$.
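The genotype generation step can be sketched as follows. This is a minimal sketch that assumes the thresholds are standard normal quantiles (inverse CDF values), so that the three genotype classes occur with probabilities $f_k^2$, $2f_k(1-f_k)$ and $(1-f_k)^2$ in the order given in the text. The function name, the seed handling and the example allele frequencies are hypothetical.

```python
import random
from statistics import NormalDist


def simulate_genotypes(n, mafs, seed=0):
    """Return an n x M list of genotypes (0/1/2), one column per entry of mafs,
    obtained by thresholding independent N(0, 1) draws."""
    rng = random.Random(seed)
    std_normal = NormalDist()
    cuts = []
    for f in mafs:
        lower = std_normal.inv_cdf(f ** 2)
        upper = std_normal.inv_cdf(f ** 2 + 2 * f * (1 - f))
        cuts.append((lower, upper))

    genotypes = []
    for _ in range(n):
        row = []
        for lower, upper in cuts:
            x = rng.gauss(0.0, 1.0)
            row.append(0 if x < lower else (1 if x < upper else 2))
        genotypes.append(row)
    return genotypes


# Example: 2000 individuals and three independent SNPs
genotypes = simulate_genotypes(2000, [0.231, 0.5, 0.1], seed=1)
share_0 = sum(row[0] == 0 for row in genotypes) / len(genotypes)
print(share_0)  # should be close to 0.231 ** 2
```

Drawing each column independently matches the simplification stated above that correlations between SNPs (LD) are not modelled.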
In this model, the value of each parameter $\beta$ is set to achieve a predetermined $R^2$ for the
corresponding term [Zhang et al., 2016]. We assumed the total variance of $Y$ to be 1 and used the
strategy proposed by Gauderman [Gauderman et al., 2003] to partition the variance. The variances of
the different terms are given by
$$Var(G) = 2(1 - q_A)\,q_A$$
$$Var(E) = P_E(1 - P_E)$$
$$Var_{GxE} = Var(G)\,Var(E)$$
$$\sigma_E = \sigma_Y\sqrt{1 - R^2_{GxE} - R^2_G - R^2_E}$$
Therefore, the regression coefficients can be derived from the corresponding $R^2$:
$$R^2_G = \beta_G^2\,Var(G - \mu_G) = q_A(2 - q_A)(1 - q_A)^2$$
$$R^2_E = \beta_E^2\,Var(E - \mu_E) = P_E(1 - P_E)$$
$$R^2_{GxE} = \beta_{GxE}^2\,Var[(G - \mu_G)(E - \mu_E)] = 2P_E(1 - P_E)\,q_A(2 - q_A)(1 - q_A)^2$$
Here $\mu_G$ and $\mu_E$ are the population means of the covariates $G$ and $E$. Therefore, we can
get the corresponding slope values:
$$\beta_{GxE} = \sqrt{\frac{R^2_{GxE}}{\sigma_Y\,Var(GxE)}}$$
$$\beta_{\bar G} = \sqrt{\frac{R^2_G}{\sigma_Y\,Var(G)}}$$
$$\beta_{\bar E} = \sqrt{\frac{R^2_E}{\sigma_Y\,Var(E)}}$$
Here, $\beta_{\bar G}$ and $\beta_{\bar E}$ are the regression coefficients in the model with $G$
and $E$ centered at the mean of genotype and the mean of exposure, respectively. From the above
equations, we obtain the general regression coefficients $\beta_E$ and $\beta_G$:
$$\beta_E = \beta_{\bar E} - \beta_{GxE}\,2q_A$$
$$\beta_G = \beta_{\bar G} - \beta_{GxE}\,P_E$$
To compare the models under different circumstances, we designed five scenarios corresponding
to different interaction patterns between G and E. In all scenarios we generated 10,000 replicate
data sets with $q_A = 0.231$ and $P_E = 0.3$.
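The conversion from target $R^2$ values to regression coefficients can be sketched as a short calculation. The sketch follows the slope formulas of this section with the total variance of $Y$ fixed at 1, so $\sigma_Y$ drops out. The function name is hypothetical, and the value $R^2_{GxE} = 0.005$ in the example is a placeholder chosen for illustration, not a setting taken from the thesis.

```python
import math


def simulation_betas(r2_g, r2_e, r2_gxe, q_a=0.231, p_e=0.3):
    """Regression coefficients implied by target R^2 values, assuming Var(Y) = 1."""
    var_g = 2 * q_a * (1 - q_a)
    var_e = p_e * (1 - p_e)
    var_gxe = var_g * var_e

    beta_gxe = math.sqrt(r2_gxe / var_gxe)
    beta_g_bar = math.sqrt(r2_g / var_g)   # coefficient for mean-centred G
    beta_e_bar = math.sqrt(r2_e / var_e)   # coefficient for mean-centred E

    return {
        "beta_gxe": beta_gxe,
        "beta_g": beta_g_bar - beta_gxe * p_e,
        "beta_e": beta_e_bar - beta_gxe * 2 * q_a,
        "sigma_err": math.sqrt(1 - r2_gxe - r2_g - r2_e),
    }


# Example with marginal R^2 values of 0.0043 (G) and 0.006 (E);
# the interaction R^2 of 0.005 is a hypothetical placeholder
betas = simulation_betas(r2_g=0.0043, r2_e=0.006, r2_gxe=0.005)
print(betas)
```

Setting `r2_g=0` reproduces the idea behind a scenario without marginal genetic variance: the mean-centred genotype coefficient is zero, so $\beta_G = -\beta_{GxE} P_E$.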
 

Chapter 3 Results
We designed five scenarios to test the performance of the QUAIL method compared with the other
GxE interaction tests. All five scenarios have the same level of variance explained by the GxE
interaction ($R^2_{GxE}$) and the same $\Pr(E) = 0.3$. In Crossing 1 and Crossing 2, we set
$R^2_G = 0$ to evaluate the methods' performance when no variance is explained by the marginal G
effect. Crossing 2 has a higher $q_A$ of 0.5, while Crossing 1 has a $q_A$ of 0.231, with the other
parameter settings remaining the same. Scenarios Quantitative 1 and Quantitative 2 are designed to
simulate a condition where the effects of genotype on the variability of the trait are larger than
the effects of exposure. Quantitative 1 has $R^2_G = 0.01$ and Quantitative 2 has a larger
proportion of variance explained by the marginal G effect, $R^2_G = 0.02$. We also include a
condition, Strong Quant, in which the marginal effect of $E$ ($R^2_E = 0.006$) is larger than the
marginal effect of G ($R^2_G = 0.0043$), with the same $q_A$ and $R^2_{GxE}$ as in the Quantitative
scenarios. Details of all parameters in the different scenarios are presented in the supplemental
materials.
3.1 Type I Error




Figure 1 Summarized line plot for FWER in five scenarios
In Figure 1, we present summary information about the FWER. We observe that the family-wise error
rates (FWER) of GWIS, YG, adjusted QUAIL and adjusted Levene's test are maintained at the correct
test size. However, the FWER of the unadjusted QUAIL method and unadjusted Levene's test is
inflated to an unacceptable level, indicating that the adjustment removes the marginal effect of
exposure and brings the FWER back to the correct level. We also find that, regardless of the
magnitude of the G effect, the type I error rates are similar for all adjusted methods, GWIS and YG.


3.2 Power Comparison

Figure 2 Summarized line plot for power in five scenarios
Figure 2 shows a summary of the results of power four different method in five scenarios. Due to
the inflated type 1 error, we eliminated unadjusted QUAIL method and Leveneโ€™s test in power
comparison. The power to detect the interaction in GWIS is stable across different conditions. For
the YG method, the power of testing remains at a low level less than 0.04 in Crossing condition
and stays high level in Quantitative condition. We can also observe a positive relationship between
19

the power of YG in Quantitative condition and the effect of G on outcome, due to testing the
marginal effect of G in first step of YG method.  
We then focus on the screening tests for variance heterogeneity. The adjusted Levene's test
performs poorly in all situations, with power around 0.05. By comparison, the adjusted QUAIL
method performs slightly better than the adjusted Levene's test in the Quantitative conditions
and no differently in the Crossing conditions.
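For readers unfamiliar with variance-based screening, the following sketch computes a Levene-type statistic after adjusting for exposure. It is an assumption-laden illustration rather than the implementation used here: "adjusted" is taken to mean that the trait is first regressed on E and the test is applied to the residuals, and the median-centered (Brown-Forsythe) variant of Levene's test is used. The function name `adjusted_levene_statistic` is hypothetical, and the function returns only the F statistic, not a p-value.

```python
import numpy as np

def adjusted_levene_statistic(y, g, e):
    """Median-centered Levene F statistic computed on E-adjusted residuals."""
    y = np.asarray(y, dtype=float)
    g = np.asarray(g)
    e = np.asarray(e, dtype=float)
    # Step 1: remove the marginal exposure effect by ordinary least squares.
    design = np.column_stack([np.ones_like(e), e])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    # Step 2: absolute deviations from each genotype group's median residual.
    z = []
    for level in np.unique(g):
        r = resid[g == level]
        z.append(np.abs(r - np.median(r)))
    # Step 3: one-way ANOVA F statistic on the absolute deviations.
    n = sum(len(zi) for zi in z)
    k = len(z)
    grand_mean = np.concatenate(z).mean()
    between = sum(len(zi) * (zi.mean() - grand_mean) ** 2 for zi in z) / (k - 1)
    within = sum(((zi - zi.mean()) ** 2).sum() for zi in z) / (n - k)
    return between / within

# Hypothetical check: the residual spread grows with genotype in y_alt only.
rng = np.random.default_rng(1)
g = rng.binomial(2, 0.3, size=5000)
e = rng.standard_normal(5000)
noise = rng.standard_normal(5000)
y_null = 0.5 * e + noise
y_alt = 0.5 * e + noise * (1.0 + 0.5 * g)
f_null = adjusted_levene_statistic(y_null, g, e)
f_alt = adjusted_levene_statistic(y_alt, g, e)
```

Under the null hypothesis of equal variances the statistic is approximately F-distributed with (k - 1, n - k) degrees of freedom, where k is the number of genotype groups and n the sample size.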
 
Chapter 4 Discussion
For FWER, among the methods that screen for variance heterogeneity, both the
QUAIL method and Levene's test have inflated type I error rates, which return to the correct
level after adjusting for exposure. This demonstrates that both the QUAIL method and Levene's
test can control the FWER once the confounding effect of exposure is removed.
The standard GWIS method performs consistently, with power around 0.47 in all conditions. The power of YG,
which tests the marginal G effect in the screening step, is strongly related to the magnitude of
$R_G^2$, which explains why its power increases with $R_G^2$. However, all methods
that screen on variance heterogeneity of the outcome have relatively low power,
especially the adjusted Levene's test, whose power is consistently around 0.05 in all conditions. This result
is consistent with a previous study [Zhang et al., 2016], whose authors concluded that the poor
performance of the adjusted Levene's test when the marginal E effect is absent results from sacrificing
degrees of freedom to use an inefficient source of information, since the magnitude of
variance heterogeneity is small. However, in our simulation, even with a marginal E effect
consistently present ($R_E^2 = 0.5\%$), the power of the variance-screening methods remains
low. Low power in the Crossing conditions is expected because $R_G^2 = 0$ and the
first-step screening cannot detect a G effect on the variability of the outcome. The adjusted QUAIL
method has higher power than the adjusted Levene's test in the Quantitative conditions and shows a
slightly positive relationship with $R_G^2$. This is likely because the QUAIL method can detect more
subtle changes in the variability of the outcome. Overall, the adjusted QUAIL method performs
slightly better than the adjusted Levene's test in the Quantitative conditions and comparably in the
Crossing conditions, but its power remains low.
References
Murcray CE, Lewinger JP, Gauderman WJ. Gene-environment interaction in genome-wide
association studies. Am J Epidemiol. 2009 Jan 15;169(2):219-26. doi: 10.1093/aje/kwn353. Epub
2008 Nov 20. PMID: 19022827; PMCID: PMC2732981.
Gauderman WJ, Zhang P, Morrison JL, Lewinger JP. Finding novel genes by testing G × E
interactions in a genome-wide association study. Genet Epidemiol. 2013 Sep;37(6):603-13. doi:
10.1002/gepi.21748. Epub 2013 Jul 19. PMID: 23873611; PMCID: PMC4348012.
Kooperberg C, Leblanc M. Increasing the power of identifying gene x gene interactions in
genome-wide association studies. Genet Epidemiol. 2008 Apr;32(3):255-63. doi:
10.1002/gepi.20300. PMID: 18200600; PMCID: PMC2955421.
Ionita-Laza I, McQueen MB, Laird NM, Lange C. Genomewide weighted hypothesis testing in
family-based association studies, with an application to a 100K scan. Am J Hum Genet. 2007
Sep;81(3):607-14. doi: 10.1086/519748. Epub 2007 Jul 17. PMID: 17701906; PMCID:
PMC1950836.
Paré G, Cook NR, Ridker PM, Chasman DI. On the use of variance per genotype as a tool to
identify quantitative trait interaction effects: a report from the Women's Genome Health Study.
PLoS Genet. 2010 Jun 17;6(6):e1000981. doi: 10.1371/journal.pgen.1000981. PMID: 20585554;
PMCID: PMC2887471.
Zhang P, Lewinger JP, Conti D, Morrison JL, Gauderman WJ. Detecting Gene-Environment
Interactions for a Quantitative Trait in a Genome-Wide Association Study. Genet Epidemiol. 2016
Jul;40(5):394-403. doi: 10.1002/gepi.21977. Epub 2016 May 27. PMID: 27230133; PMCID:
PMC5108681.
Miao J, Lin Y, Wu Y, Zheng B, Schmitz LL, Fletcher JM, Lu Q. A quantile integral linear model
to quantify genetic effects on phenotypic variability. Proc Natl Acad Sci U S A. 2022 Sep
27;119(39):e2212959119. doi: 10.1073/pnas.2212959119. Epub 2022 Sep 19. PMID: 36122202;
PMCID: PMC9522331.
Kawaguchi ES, Kim AE, Lewinger JP, Gauderman WJ. Improved two-step testing of genome-wide
gene-environment interactions. Genet Epidemiol. 2023 Mar;47(2):152-166. doi:
10.1002/gepi.22509. Epub 2022 Dec 26. PMID: 36571162; PMCID: PMC9974838.
Hsu L, Jiao S, Dai JY, Hutter C, Peters U, Kooperberg C. Powerful cocktail methods for detecting
genome-wide gene-environment interaction. Genet Epidemiol. 2012 Apr;36(3):183-94. doi:
10.1002/gepi.21610. PMID: 22714933; PMCID: PMC3654520.
Murcray CE, Lewinger JP, Conti DV, Thomas DC, Gauderman WJ. Sample size requirements to
detect gene-environment interactions in genome-wide association studies. Genet Epidemiol. 2011
Apr;35(3):201-10. doi: 10.1002/gepi.20569. Epub 2011 Feb 9. PMID: 21308767; PMCID:
PMC3076801.
Levene, H. Robust tests for equality of variances. Stanford University Press; 1960. p. 278-292.
Gauderman WJ. Candidate gene association analysis for a quantitative trait, using parent-offspring
trios. Genet Epidemiol. 2003 Dec;25(4):327-38. doi: 10.1002/gepi.10262. PMID: 14639702.
Kawaguchi ES, Li G, Lewinger JP, Gauderman WJ. Two-step hypothesis testing to detect
gene-environment interactions in a genome-wide scan with a survival endpoint. Stat Med. 2022 Apr
30;41(9):1644-1657. doi: 10.1002/sim.9319. Epub 2022 Jan 24. PMID: 35075649; PMCID:
PMC9007892.

Appendix
General settings of the simulation study
Scenario        $q_A$   $R_G^2$   $R_E^2$   $R_{G \times E}^2$
Quantitative 1  0.231   0.01      0.005     0.01
Quantitative 2  0.231   0.02      0.005     0.01
Strong Quant    0.231   0.0043    0.006     0.01
Crossing 1      0.231   0         0.005     0.01
Crossing 2      0.5     0         0.005     0.01
Table 1 Simulation setting in different scenarios

Power and FWER results in the five scenarios (the power table excludes the unadjusted QUAIL method and Levene's test):
Method           Quantitative 1  Quantitative 2  Strong Quant  Crossing 1  Crossing 2
GWIS             0.4832          0.4964          0.473         0.4571      0.472
DG               0.726           0.7417          0.5961        0.0323      0.0364
Adjusted QUAIL   0.0608          0.0985          0.0501        0.0436      0.0334
Adjusted Levene  0.0448          0.0459          0.045         0.0445      0.0412
Table 2 Summarized power in different scenarios






Method           Quantitative 1  Quantitative 2  Strong Quant  Crossing 1  Crossing 2
GWIS             0.0632          0.0666          0.0549        0.0536      0.0524
DG               0.0617          0.0639          0.0539        0.053       0.0511
QUAIL            0.2494          0.2616          0.259         0.2205      0.0777
Adjusted QUAIL   0.0731          0.0701          0.0687        0.0619      0.0606
Levene           0.2029          0.2124          0.212         0.1797      0.0724
Adjusted Levene  0.0698          0.0688          0.066         0.0567      0.0622
Table 3 Summarized FWER in different scenarios
Asset Metadata
Creator Sun, Ke (author) 
Core Title Combination of quantile integral linear model with two-step method to improve the power of genome-wide interaction scans 
Contributor Electronically uploaded by the author (provenance) 
School Keck School of Medicine 
Degree Master of Science 
Degree Program Biostatistics 
Degree Conferral Date 2023-05 
Publication Date 05/09/2023 
Defense Date 05/09/2023 
Publisher University of Southern California (original), University of Southern California. Libraries (digital) 
Tag gene-environment interaction,oai:digitallibrary.usc.edu:usctheses,OAI-PMH Harvest,Quantile integral linear model,two-step methods 
Format theses (aat) 
Language English
Advisor Kawaguchi, Eric (committee chair), Gauderman, William (committee member), Lewinger, Juan (committee member) 
Creator Email ksun7731@usc.edu,zhuyebamboo@163.com 
Permanent Link (DOI) https://doi.org/10.25549/usctheses-oUC113103793 
Unique identifier UC113103793 
Identifier etd-SunKe-11809.pdf (filename) 
Legacy Identifier etd-SunKe-11809 
Document Type Thesis 
Rights Sun, Ke 
Internet Media Type application/pdf 
Type texts
Source 20230509-usctheses-batch-1040 (batch), University of Southern California (contributing entity), University of Southern California Dissertations and Theses (collection) 
Access Conditions The author retains rights to his/her dissertation, thesis or other graduate work according to U.S. copyright law.  Electronic access is being provided by the USC Libraries in agreement with the author, as the original true and official version of the work, but does not grant the reader permission to use the work if the desired use is covered by copyright.  It is the author, as rights holder, who must provide use permission if such use is covered by copyright. 
Repository Name University of Southern California Digital Library
Repository Location USC Digital Library, University of Southern California, University Park Campus MC 2810, 3434 South Grand Avenue, 2nd Floor, Los Angeles, California 90089-2810, USA
Repository Email uscdl@usc.edu
Abstract Two-step methods can improve the power of detecting gene-environment (GxE) interactions in genome-wide interaction scans. Typically, in the first step, marginal trait-vs-G or G-vs-E effects are tested to prioritize the SNPs that are more likely involved in GxE interactions. However, apart from marginal G effects, GxE interaction can also influence the variability in quantitative trait levels. Quantile integral linear model (QUAIL) is a quantile regression-based framework used to estimate genetic effects on the variability of outcomes. In this study, we aimed to utilize the QUAIL method to test variability of trait in the screening test. We designed several representative scenarios to evaluate the performance of different interaction detection methods. Through simulation, we compared this new combined method with other interaction testing methods. We found that the QUAIL method can be adjusted with exposure to reduce the inflated type 1 error and performed slightly better than adjusted Levene's test, a common method testing the variance of a trait. However, the power of adjusted QUAIL method still remains low compared to other two-step testing strategies.