twas_sim, a Python-based tool for simulation and power analysis of transcriptome-wide
association analysis
By
Xinran Wang
A Thesis Presented to the
FACULTY OF THE USC KECK SCHOOL OF MEDICINE
UNIVERSITY OF SOUTHERN CALIFORNIA
In Partial Fulfillment of the
Requirements for the Degree
MASTER OF SCIENCE
(APPLIED BIOSTATISTICS AND EPIDEMIOLOGY)
May 2023
Copyright 2023 Xinran Wang
Acknowledgements
I would like to express my deepest appreciation and gratitude to everyone who has contributed to
the successful completion of my master's thesis.
First and foremost, I would like to extend my sincere thanks to my thesis committee chair, Nicholas
Mancuso. His guidance, encouragement, and insightful comments throughout this journey have
been invaluable. I am grateful for his time, expertise, and patience in helping me navigate through
the complexities of this research. I would also like to extend my appreciation to Kimberly
Siegmund and David Conti for their valuable contributions as committee members. Their feedback
and constructive criticism helped me to refine and improve the quality of my work.
I am also grateful to Zeyun Lu and Zixuan Zhang, both Ph.D. candidates, for their support and
assistance during my research. They helped me to explain some of the more complicated aspects
of my research and were always available to offer valuable feedback and advice.
Finally, I would like to express my appreciation to my family, friends, and colleagues who
provided me with the motivation, encouragement, and support I needed to complete this research.
Without their support and belief in me, this achievement would not have been possible.
Thank you all for your contributions, guidance, and support throughout this journey.
Table of Contents
Acknowledgements
List of Tables
List of Figures
List of Algorithms
Abstract
Introduction
Chapter 1: Methods
    Model for gene expression and complex phenotype
    Model for GWAS summary statistics and fast GWAS simulation
    Gene expression prediction models
    Model for Transcriptome-Wide Association Study
Chapter 2: Implementation
Chapter 3: Simulation
    Simulation data preparation
    Simulation groups
    Simulation parameters
Chapter 4: Performance Metrics and Definitions
Chapter 5: Dynamic Import of External Modules
Chapter 6: Application
    Unbiasedness
    FWER and Inflation
    Power
    LD Misspecification
    Scalability
    Horizontal Pleiotropy Through Linkage
Chapter 7: Conclusion
References
List of Tables
Table 1. One-Sample, Two-Sided Kolmogorov-Smirnov Test Statistics
Table 2. Two-Sided T-test for Power, FWER, and Inflation for Correct and Misspecified Reference Panel
Supplementary Table 1. Structure of TWAS simulator OUTPUT.summary.tsv
Supplementary Table 2. Structure of TWAS simulator OUTPUT.scan.tsv
List of Figures
Figure 1. twas_sim Workflow
Figure 2. twas_sim Simulation Results
Figure 3. The TWAS statistics simulated by twas_sim are well controlled at P=0.05 under the null in family-wise error rate (FWER)
Figure 4. twas_sim simulates non-inflated TWAS statistics under the null across eQTL/GWAS sample sizes and genetic architecture
Figure 5. The TWAS statistics simulated by twas_sim retain high power under the alternative across eQTL/GWAS sample sizes and genetic architecture
Figure 6. SuSiE and Elastic Net have comparable power for $h_{ge}^{2} < 0.0001$
Figure 7. Impact of LD misspecification on statistical power
Figure 8. fast mode outperforms standard mode in mean CPU time regardless of GWAS sample sizes
Figure 9. A. TWAS $\chi^{2}$ test statistics generated from a model of horizontal pleiotropy through linkage as compared to TWAS model. B. FWER increases as GWAS $h_{g}^{2}$ increases in horizontal pleiotropy through linkage model
List of Algorithms
Supplementary Algorithm 1. External Module to Call susieR
Supplementary Algorithm 2. R Code to Import Simulated eQTL Data and Output susieR Prediction Weights
Abstract
Genome-wide association studies (GWASs) have identified numerous variants associated with
quantitative traits. However, most of these associations are in linkage disequilibrium (LD) with
other variants and fall in non-coding regions, complicating the identification of proximal target genes.
Transcriptome-wide association studies (TWASs) have been proposed to bridge this gap by
integrating expression quantitative trait loci (eQTL) data with GWAS data. Numerous
methodological advancements have been made for TWAS, yet each approach requires ad-hoc
simulations to demonstrate feasibility. The power of TWAS test statistics may be compromised by
LD misspecification, defined as the inconsistency of LD patterns for individuals of single ancestry
across different reference panels or within the same reference panel. Here, we present twas_sim, a
computationally scalable and easily extendable tool for simplified performance evaluation and
power analysis for TWAS methods. To illustrate the utility of twas_sim, we perform simulations
across various genetic architectures and data contexts from 1000 Genomes genotype data to
evaluate the power and unbiasedness of the simulated TWAS test statistics. We further use
twas_sim to assess the degree of LD misspecification within the same reference panel.
Introduction
Genome-wide association studies (GWASs) have identified numerous genetic variants associated
with complex traits and diseases (Visscher et al., 2017). However, most associated variants fall
within non-coding regions, which makes identifying the target gene challenging (Edwards et al.,
2013; Hindorff et al., 2009). Furthermore, functional evidence suggests that most GWAS hits are
involved in regulatory processes (Maurano et al., 2012; Vierstra et al., 2020), which implies that
causal variants regulate the expression of nearby genes. Transcriptome-wide association studies
(TWASs) have been proposed to address this limitation by integrating expression quantitative trait
loci (eQTL) data with GWAS data to identify functionally-informed gene-level associations
(Gamazon et al., 2015; Gusev et al., 2016). A growing ecosystem of methods has been developed
around TWAS, each relying on different statistical assumptions (Nagpal et al., 2019; Parrish et al.,
2022; Liu et al., 2021; Tang et al., 2021; Mancuso et al., 2019; Lu et al., 2022; Bhattacharya et
al., 2021). Prior methodological work evaluated performance through a combination of ad-hoc
simulations and real data analysis. However, validating and assessing model performance requires
researchers to implement custom simulations, which duplicates effort and can result in subtle
differences in how baselines are defined.
To address this, we developed twas_sim, a computationally scalable and easily extendable tool
for downstream TWAS method evaluation and comparison (e.g., statistical power and false positive
rate). It leverages real genetic data to capture typical linkage disequilibrium (LD) patterns and
can simulate gene expression levels and complex traits under a variety of feasible genetic
architectures. Importantly, it is capable of dynamically loading custom code (e.g., Python, R, Julia)
to evaluate independently developed TWAS methods. It is freely available at
https://github.com/mancusolab/twas_sim.
Chapter 1: Methods
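Model for gene expression and complex phenotype. The first step in simulating TWAS
association statistics is to generate the genotype matrix and the eQTL effect sizes for the eQTL
panel. We use reference linkage-disequilibrium (LD) data from 1000 Genomes (Sudmant et al., 2015;
1000 Genomes Project Consortium et al., 2015) to simulate genotype samples and a user-defined
architecture to simulate the eQTL reference panel. We then construct models for gene expression
and the complex trait. We model the gene expression level $\mathbf{g} \in \mathbb{R}^{n}$ as a
linear function of the genotype matrix $\mathbf{X} \in \mathbb{R}^{n \times p}$ for $n$
individuals and $p$ SNPs,

$$\mathbf{g} = \mathbf{X}\boldsymbol{\beta}_{\mathrm{eQTL}} + \boldsymbol{\varepsilon}_{ge},$$

where $\boldsymbol{\beta}_{\mathrm{eQTL}} \in \mathbb{R}^{p}$ is the vector of eQTL effect sizes,
$\boldsymbol{\varepsilon}_{ge} \sim N(\mathbf{0}, \sigma_{ge}^{2}\mathbf{I}_{n})$ is random
environmental noise such that $\sigma_{ge}^{2} := 1 - h_{g}^{2}$, and $h_{g}^{2}$ is the
SNP-heritability of gene expression (Shi et al., 2016).
We model $\mathbf{y} \in \mathbb{R}^{n}$, the normally distributed quantitative trait for $n$
individuals, as a linear function of the expression levels $\mathbf{g}$,

$$
\begin{aligned}
\mathbf{y} &= \mathbf{g}\alpha + \boldsymbol{\varepsilon}
 = (\mathbf{X}\boldsymbol{\beta}_{\mathrm{eQTL}} + \boldsymbol{\varepsilon}_{ge})\alpha + \boldsymbol{\varepsilon}
 = \mathbf{X}\boldsymbol{\beta}_{\mathrm{eQTL}}\alpha + (\boldsymbol{\varepsilon}_{ge}\alpha + \boldsymbol{\varepsilon}) \\
 &= \mathbf{X}\boldsymbol{\beta}_{\mathrm{eQTL}}\alpha + \boldsymbol{\varepsilon}_{y}
 = \mathbf{X}\boldsymbol{\beta}_{\mathrm{GWAS}} + \boldsymbol{\varepsilon}_{y},
\end{aligned}
$$

where $\alpha$ is the causal effect of gene expression,
$\boldsymbol{\varepsilon}_{y} \sim N(\mathbf{0}, \sigma_{y}^{2}\mathbf{I}_{n})$ is random
environmental noise such that $\sigma_{y}^{2} := 1 - h_{ge}^{2}$, and $h_{ge}^{2}$ is the
proportion of variance in the complex trait $\mathbf{y}$ explained by gene expression levels
$\mathbf{g}$.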
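The two linear models above can be sketched in a few lines of NumPy. This is a minimal illustration rather than the twas_sim implementation: the function name, the toy binomial genotypes, and the way the effect sizes are rescaled to reach the target heritabilities are all assumptions made for the example.

```python
import numpy as np


def simulate_expression_and_trait(X, h2g, h2ge, n_causal, rng):
    """Simulate expression g and trait y from genotypes X (n individuals x p SNPs)."""
    n, p = X.shape
    # Center and standardize each SNP column.
    X = (X - X.mean(axis=0)) / X.std(axis=0)

    # Sparse eQTL architecture: n_causal SNPs receive non-zero effects.
    beta_eqtl = np.zeros(p)
    causal = rng.choice(p, size=n_causal, replace=False)
    beta_eqtl[causal] = rng.normal(size=n_causal)
    # Rescale so that Var(X beta_eQTL) = h2g (assumed normalization).
    beta_eqtl *= np.sqrt(h2g / (X @ beta_eqtl).var())

    # g = X beta_eQTL + eps_ge, with Var(eps_ge) = 1 - h2g.
    g = X @ beta_eqtl + rng.normal(scale=np.sqrt(1.0 - h2g), size=n)

    # y = g alpha + eps, with alpha chosen so that g explains h2ge of Var(y).
    alpha = np.sqrt(h2ge / g.var())
    y = g * alpha + rng.normal(scale=np.sqrt(1.0 - h2ge), size=n)
    return g, y, beta_eqtl, alpha


rng = np.random.default_rng(0)
# Toy genotypes drawn independently; real runs would use reference LD instead.
X = rng.binomial(2, 0.3, size=(2000, 50)).astype(float)
g, y, beta_eqtl, alpha = simulate_expression_and_trait(
    X, h2g=0.1, h2ge=0.01, n_causal=5, rng=rng
)
```

Under this normalization the implied GWAS effects are simply `beta_eqtl * alpha`, mirroring the identity $\boldsymbol{\beta}_{\mathrm{GWAS}} = \boldsymbol{\beta}_{\mathrm{eQTL}}\alpha$ in the text.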
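Model for genome-wide association study summary statistics and fast GWAS simulation. We
simulate GWAS summary statistics using two different modes. The standard mode simulates
genotypes for GWAS individuals using multivariate normal approximations parameterized by
estimated linkage disequilibrium (LD) at the genomic region, simulates phenotypes under a fixed
eQTL and trait architecture using the above equations, and finally performs marginal regression at
each approximated SNP to obtain GWAS summary statistics. Here, we describe how to perform
GWAS simulations without generating simulated individual-level data (i.e., fast mode). Given that
$\boldsymbol{\beta}_{\mathrm{GWAS}} = \boldsymbol{\beta}_{\mathrm{eQTL}}\alpha$ (see above), using
standard linear regression we can compute $\hat{\beta}_{\mathrm{GWAS}}$ for the $j$th SNP as

$$\hat{\beta}_{\mathrm{GWAS},j} = (\mathbf{x}_{j}^{T}\mathbf{x}_{j})^{-1}\mathbf{x}_{j}^{T}\mathbf{y} \approx \frac{1}{n}\mathbf{x}_{j}^{T}\mathbf{y},$$

where $\mathbf{x}_{j}$ is the centered and standardized genotype vector at the $j$th SNP. Thus, we
can compute the entire vector of marginal estimates as

$$\hat{\boldsymbol{\beta}}_{\mathrm{GWAS}} \approx \frac{1}{n}\mathbf{X}^{T}\mathbf{y},$$

where $\mathbf{X} \in \mathbb{R}^{n \times p}$ is the centered and standardized genotype matrix at
all $p$ SNPs. Alternatively, we can model GWAS summary statistics directly using a multivariate
normal distribution parameterized by estimated LD and causal effects
$\boldsymbol{\beta}_{\mathrm{GWAS}}$, which is a common modelling assumption in numerous
statistical genetic frameworks (Pasaniuc and Price, 2017; Yang et al., 2012). Based on the
computation above, we can compute the expectation and variance of
$\hat{\boldsymbol{\beta}}_{\mathrm{GWAS}}$ as:

$$\mathbb{E}[\hat{\boldsymbol{\beta}}_{\mathrm{GWAS}}] = \mathbb{E}\left[\frac{1}{n}\mathbf{X}^{T}\mathbf{y}\right] = \frac{1}{n}\mathbf{X}^{T}\mathbb{E}[\mathbf{y}] = \frac{1}{n}\mathbf{X}^{T}\mathbf{X}\boldsymbol{\beta}_{\mathrm{GWAS}} = \mathbf{V}\boldsymbol{\beta}_{\mathrm{GWAS}}$$

$$\mathbb{V}[\hat{\boldsymbol{\beta}}_{\mathrm{GWAS}}] = \mathbb{V}\left[\frac{1}{n}\mathbf{X}^{T}\mathbf{y}\right] = \frac{1}{n}\mathbf{X}^{T}\,\mathbb{V}[\mathbf{y}]\,\mathbf{X}\frac{1}{n} = \frac{1}{n^{2}}\mathbf{X}^{T}(\mathbf{I}_{n}\sigma_{y}^{2})\mathbf{X} = \frac{\sigma_{y}^{2}}{n}\cdot\frac{1}{n}\mathbf{X}^{T}\mathbf{X} = \frac{\sigma_{y}^{2}}{n}\mathbf{V}$$

where $\mathbf{V} = \frac{1}{n}\mathbf{X}^{T}\mathbf{X}$ is the $p \times p$ LD matrix.
Together we have,

$$\hat{\boldsymbol{\beta}}_{\mathrm{GWAS}} \sim N\left(\mathbf{V}\boldsymbol{\beta}_{\mathrm{GWAS}},\ \mathbf{V}\frac{\sigma_{y}^{2}}{n}\right).$$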
Typically, $n$ is large and the normal distribution is a good approximation. However, we extend our
framework to the $t$ distribution to allow for additional variability at low sample sizes.
Specifically, we use an inverse gamma distribution to model the variance in the GWAS effect
estimate at the $j$th SNP as

$$s_{j}^{2} \sim \Gamma^{-1}\left(\frac{v}{2},\ \frac{v\tau^{2}}{2}\right),$$

where $v = n - 1$ and $\tau^{2} = \sigma_{y}^{2}/n$. Let
$\mathbf{D} = \mathrm{diag}(s_{1}, \ldots, s_{p})$ be a diagonal matrix of GWAS standard errors;
we then sample GWAS summary data following Zhu and Stephens (2017) as

$$\hat{\boldsymbol{\beta}}_{\mathrm{GWAS}} \sim N\left(\mathbf{D}\mathbf{V}\mathbf{D}^{-1}\boldsymbol{\beta}_{\mathrm{GWAS}},\ \mathbf{D}\mathbf{V}\mathbf{D}\right).$$
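The two sampling schemes described for fast mode can be sketched as follows. This is an illustrative NumPy version under stated assumptions, not the code shipped with twas_sim: the function name, the `use_t` switch, and the toy AR(1)-style LD matrix are hypothetical choices for the example.

```python
import numpy as np


def sample_gwas_fast(V, beta_gwas, n, sigma2_y, rng, use_t=False):
    """Draw marginal GWAS effects and Z-scores directly from LD, without genotypes."""
    p = V.shape[0]
    tau2 = sigma2_y / n
    if not use_t:
        # Normal approximation: beta_hat ~ N(V beta, V sigma_y^2 / n).
        se = np.full(p, np.sqrt(tau2))
        beta_hat = rng.multivariate_normal(V @ beta_gwas, V * tau2)
    else:
        # Per-SNP variances s_j^2 ~ InvGamma(v/2, v tau^2 / 2), drawn as
        # (v tau^2 / 2) divided by a Gamma(v/2, 1) variate.
        v = n - 1
        s2 = (v * tau2 / 2.0) / rng.gamma(shape=v / 2.0, scale=1.0, size=p)
        se = np.sqrt(s2)
        D = np.diag(se)
        D_inv = np.diag(1.0 / se)
        # beta_hat ~ N(D V D^{-1} beta, D V D).
        beta_hat = rng.multivariate_normal(D @ V @ D_inv @ beta_gwas, D @ V @ D)
    return beta_hat, se, beta_hat / se


rng = np.random.default_rng(1)
p = 10
# Toy LD matrix with correlation decaying as 0.5^|i-j| (assumed, not from 1000 Genomes).
idx = np.arange(p)
V = 0.5 ** np.abs(idx[:, None] - idx[None, :])
beta_gwas = np.zeros(p)
beta_gwas[3] = 0.01  # a single hypothetical causal SNP

beta_norm, se_norm, z_norm = sample_gwas_fast(V, beta_gwas, 200_000, 0.9995, rng)
beta_t, se_t, z_t = sample_gwas_fast(V, beta_gwas, 200_000, 0.9995, rng, use_t=True)
```

With a GWAS sample size this large the inverse-gamma draws concentrate tightly around $\tau^{2}$, so the two schemes give nearly identical standard errors; the difference only matters for small $n$.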
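Gene expression prediction models. We use LASSO, Elastic Net, and GBLUP penalized linear
prediction models to fit effect sizes $\hat{\mathbf{w}}_{\mathrm{Lasso}}$,
$\hat{\mathbf{w}}_{\mathrm{EN}}$, and $\hat{\mathbf{w}}_{\mathrm{GBLUP}}$ (Tibshirani, 1996; Zou
and Hastie, 2005; Hoerl and Kennard, 1970; Patterson and Thompson, 1971; Goeman, 2010). We use
$\hat{\mathbf{w}}$ to represent penalized predictive weights instead of
$\hat{\boldsymbol{\beta}}_{\mathrm{eQTL}}$. We compute the LASSO predictive model as

$$\hat{\mathbf{w}}_{\mathrm{Lasso}} := \underset{\mathbf{w}}{\mathrm{argmin}}\ \|\mathbf{g} - \mathbf{X}\mathbf{w}\|_{2}^{2} + \frac{\lambda_{1}}{2}\|\mathbf{w}\|_{1}$$

using L1 penalized regression, where $\lambda_{1}$ is the coefficient shrinkage adjustment
constant. We compute the Elastic Net predictive model as

$$\hat{\mathbf{w}}_{\mathrm{EN}} := \underset{\mathbf{w}}{\mathrm{argmin}}\ \|\mathbf{g} - \mathbf{X}\mathbf{w}\|_{2}^{2} + \frac{\lambda_{2}\lambda_{1}}{2}\|\mathbf{w}\|_{1} + \frac{(1 - \lambda_{2})\lambda_{1}}{2}\|\mathbf{w}\|_{2}^{2}$$

using L1/L2 penalized regression, where $\lambda_{2}$ is a weight between 0 and 1 that trades off
the L1 and L2 penalties.
We compute the GBLUP predictive model as

$$\hat{\mathbf{w}}_{\mathrm{GBLUP}} := \underset{\mathbf{w}}{\mathrm{argmin}}\ \|\mathbf{g} - \mathbf{X}\mathbf{w}\|_{2}^{2} + \lambda_{1}\|\mathbf{w}\|_{2}^{2}$$

using a penalty computed from REML variance component estimates (Searle et al., 1992) as
$\lambda_{1} = \sigma_{ge}^{2} / (h_{g}^{2}/p)$.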
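Of the three models, GBLUP has a closed-form solution, which makes it a convenient one to illustrate. The sketch below assumes that $h_{g}^{2}$ is supplied directly rather than estimated by REML as in the text, and the function names are hypothetical.

```python
import numpy as np


def gblup_penalty(h2g, p):
    """Ridge penalty lambda_1 = sigma_ge^2 / (h2g / p), with sigma_ge^2 = 1 - h2g."""
    return (1.0 - h2g) / (h2g / p)


def fit_gblup(X, g, h2g):
    """Minimize ||g - X w||_2^2 + lambda_1 ||w||_2^2 via the normal equations."""
    p = X.shape[1]
    lam = gblup_penalty(h2g, p)
    # Closed form: w = (X^T X + lambda_1 I)^{-1} X^T g.
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ g)


rng = np.random.default_rng(2)
X = rng.normal(size=(250, 40))  # toy standardized genotypes for 250 eQTL samples
g = X[:, 0] * 0.3 + rng.normal(size=250)
lam = gblup_penalty(0.1, 40)
w_hat = fit_gblup(X, g, h2g=0.1)
```

The LASSO and Elastic Net objectives have no closed form and are typically fit by coordinate descent in an existing library, so they are not reproduced here.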
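Model for Transcriptome-Wide Association Study. Following TWAS (Gusev et al., 2016;
Gamazon et al., 2015), we first estimate the causal effect of gene expression, $\hat{\alpha}$,
using ordinary least squares (OLS) regression as

$$\hat{\alpha} = \frac{\hat{\mathbf{g}}^{T}\mathbf{y}}{\hat{\mathbf{g}}^{T}\hat{\mathbf{g}}} = \frac{(\mathbf{X}\hat{\mathbf{w}})^{T}\mathbf{y}}{(\mathbf{X}\hat{\mathbf{w}})^{T}\mathbf{X}\hat{\mathbf{w}}} = \frac{\hat{\mathbf{w}}^{T}\mathbf{X}^{T}\mathbf{y}}{n\,\hat{\mathbf{w}}^{T}\mathbf{V}\hat{\mathbf{w}}},$$

where $\hat{\mathbf{g}}$ is the predicted gene expression. We can compute the variance of the
estimated causal effect of gene expression ($\hat{\alpha}$) as

$$\mathbb{V}[\hat{\alpha}] = \mathbb{V}\left[\frac{\hat{\mathbf{w}}^{T}\mathbf{X}^{T}\mathbf{y}}{n\,\hat{\mathbf{w}}^{T}\mathbf{V}\hat{\mathbf{w}}}\right] = \frac{\hat{\mathbf{w}}^{T}\mathbf{X}^{T}}{n\,\hat{\mathbf{w}}^{T}\mathbf{V}\hat{\mathbf{w}}}\,\mathbb{V}[\mathbf{y}]\,\frac{\mathbf{X}\hat{\mathbf{w}}}{n\,\hat{\mathbf{w}}^{T}\mathbf{V}\hat{\mathbf{w}}} = \frac{\sigma_{y}^{2}}{n\,\hat{\mathbf{w}}^{T}\mathbf{V}\hat{\mathbf{w}}}.$$

Lastly, we compute the TWAS Z-score as

$$z_{\mathrm{TWAS}} = \frac{\hat{\alpha}}{\sqrt{\mathbb{V}[\hat{\alpha}]}} = \frac{\hat{\mathbf{w}}^{T}\mathbf{X}^{T}\mathbf{y}}{n\,\hat{\mathbf{w}}^{T}\mathbf{V}\hat{\mathbf{w}}} \cdot \frac{\sqrt{n\,\hat{\mathbf{w}}^{T}\mathbf{V}\hat{\mathbf{w}}}}{\sigma_{y}} = \frac{\hat{\mathbf{w}}^{T}\mathbf{z}_{\mathrm{GWAS}}}{\sqrt{\hat{\mathbf{w}}^{T}\mathbf{V}\hat{\mathbf{w}}}},$$

where $\mathbf{z}_{\mathrm{GWAS}} := \frac{1}{\sqrt{n}\,\sigma_{y}}\mathbf{X}^{T}\mathbf{y}$.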
Chapter 2: Implementation
twas_sim is a Python-based tool that uses real genotype data to generate TWAS test statistics by
simulating complex traits as a function of latent expression levels, fitting eQTL weights in
independent reference data, and performing genome-wide and predicted-transcriptome association
testing on the simulated complex trait (see Figure 1). twas_sim accepts optional arguments to
vary eQTL/GWAS sample sizes, genetic architectures (e.g., $h_{g}^{2}$, $h_{ge}^{2}$, and sparsity
of eQTL effects), horizontal pleiotropy through linkage, and reference genotype datasets for each
step in the pipeline (e.g., GWAS, eQTL reference, TWAS testing). For details on parameters and
options, see Supplementary Tables 1, 2, and Simulation.
twas_sim supports simulating GWAS summary data through two possible modes. Standard mode
simulates genotypes for GWAS individuals using multivariate normal approximations
parameterized by LD at the genomic region, simulates phenotypes under a fixed eQTL and trait
architecture, and finally performs marginal regression at each approximated SNP to obtain GWAS
summary statistics. When the GWAS sample size, $N_{GWAS}$, is large, this process requires large
amounts of memory (i.e., $O(N_{GWAS} \cdot P)$, where $P$ is the number of genetic variants). As a
workaround, twas_sim supports fast mode, which simulates GWAS summary statistics directly using
the multivariate normal distribution parameterized by LD (Pasaniuc and Price, 2017). By making
distributional assumptions about the underlying summary statistics, this setting bypasses the need
for individual-level genotype data and requires only $O(P^{2})$ memory, which can vastly reduce
the memory footprint and speed up simulation times (see Methods).
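To make the difference between the two memory bounds concrete, the following back-of-the-envelope calculation compares the dominant arrays in each mode. The region size of 1,000 SNPs and the 8-byte floating-point storage are assumptions chosen for illustration, not measurements from twas_sim.

```python
def gwas_memory_bytes(n_gwas, n_snps, bytes_per_value=8):
    """Approximate size of the dominant array in each GWAS simulation mode."""
    standard = n_gwas * n_snps * bytes_per_value  # N x P genotype matrix
    fast = n_snps * n_snps * bytes_per_value      # P x P LD matrix
    return standard, fast


# Canonical GWAS sample size (200K) with a hypothetical region of 1,000 SNPs.
standard_bytes, fast_bytes = gwas_memory_bytes(200_000, 1_000)
print(standard_bytes / 1e9)  # 1.6  (gigabytes)
print(fast_bytes / 1e6)      # 8.0  (megabytes)
```

Under these assumptions the fast mode array is 200 times smaller, and the ratio grows linearly with the GWAS sample size.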
Importantly, to model LD misspecification, twas_sim supports the option to use different LD
reference panels across GWAS and eQTL simulations in addition to TWAS testing. To predict
gene expression levels in GWAS data, twas_sim supports internally fitting least absolute
shrinkage and selection operator (LASSO), elastic net, and genomic best linear unbiased prediction
(GBLUP) linear prediction models from simulated reference gene expression data; in addition, it
allows users to use the true eQTL effect sizes for the TWAS calculation instead of weights from a
regularized model (Searle et al., 1992; Tibshirani, 1996; Zou and Hastie, 2005).
The dynamic import feature enables twas_sim to include external prediction tools easily. It
requires only that users define a simple Python interface with a function named `fit` (see Dynamic
import of external modules, Supplementary Algorithms 1, 2). To illustrate the simplicity of
this approach, we have provided two example scripts in the repository to perform Ordinary Least
Squares (OLS) regression using sklearn and the Sum of Single Effects (SuSiE) sparse regression
from susieR (Wang et al., 2020).
Figure 1. twas_sim workflow. Step 1: We approximated genotypes under 1000 Genomes reference LD
structure using an MVN model. Step 2: First, we simulated eQTL effect sizes
($\boldsymbol{\beta}_{\mathrm{eQTL}}$) based on user-defined genetic architectures and
environmental noise. Second, we simulated gene expression in the eQTL dataset
($\mathbf{g}_{\mathrm{eQTL}}$). Third, we estimated eQTL effect sizes
($\hat{\boldsymbol{\beta}}_{\mathrm{eQTL}}$) using penalized prediction models (LASSO, Elastic
Net, GBLUP). Step 3: First, we obtained gene expression effect sizes ($\alpha$). Second, we
calculated GWAS effect sizes ($\boldsymbol{\beta}_{\mathrm{GWAS}}$). Third, we performed GWAS
using either standard mode or memory-efficient fast mode. In the standard mode, we first generated
the genotype matrix ($\mathbf{X}_{\mathrm{GWAS}}$) and phenotype ($\mathbf{y}_{\mathrm{GWAS}}$).
Then, we regressed the phenotype ($\mathbf{y}_{\mathrm{GWAS}}$) on each simulated variant using
marginal linear regression to get GWAS Z-scores. In the fast mode, we simulated GWAS effect sizes
($\hat{\boldsymbol{\beta}}_{\mathrm{GWAS}}$) from the MVN approximation and generated GWAS
Z-scores. Step 4: We computed TWAS test statistics using LD, GWAS Z-scores, and estimated eQTL
effect sizes ($\hat{\boldsymbol{\beta}}_{\mathrm{eQTL}}$).
Chapter 3: Simulation
Simulation data preparation. Here we describe our simulation pipeline that includes reference
genotype data pre-processing and twas_sim. To simulate a region with sufficient complexity, we
first sample a genomic region uniformly at random from approximately independent LD blocks
that harbor between 5 and 20 genes, using RefSeq gene definitions (O’Leary et al., 2016; Berisa and
Pickrell, 2016). Next, we subset 1000G reference genotype data (1000 Genomes Project
Consortium et al., 2015) from European ancestry individuals to the genomic region from the
previous step, while filtering out genetic variants that are not bi-allelic SNPs, have minor allele
frequency less than 1%, have Hardy-Weinberg p-value < 1e-5, or have variant missingness > 10%.
We additionally restrict to HapMap3 variants (International HapMap 3 Consortium et al., 2010).
Next, we provide this QC’d reference genotype data to twas_sim to perform simulations under a
variety of eQTL and complex trait architectures, sample sizes, and linear prediction models.
Simulation groups. We performed two groups of simulations to account for LD misspecification:
• Correct reference panel: use 1000Genomes reference genotypes from all 489 individuals
of European ancestry to compute GWAS, eQTL, and TWAS LD information.
• Misspecified reference panel under LD misspecification: randomly assign 1000Genomes
reference genotypes from 489 individuals of European ancestry into two subgroups, with
244 individuals in the first group and 245 individuals in the second group. Use the reference
genotype data from the first subgroup to compute GWAS LD information and the genotype
data from the second subgroup to compute eQTL and TWAS LD information.
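The misspecified-panel design described in the list above can be sketched as a random split followed by separate LD estimates. This is an illustration only: the function names are hypothetical, and the toy genotypes are independent binomial draws rather than 1000 Genomes data.

```python
import numpy as np


def split_reference_panel(genotypes, rng):
    """Randomly split individuals (rows) into two near-equal subgroups."""
    n = genotypes.shape[0]
    order = rng.permutation(n)
    half = n // 2
    return genotypes[order[:half]], genotypes[order[half:]]


def ld_matrix(genotypes):
    """SNP-by-SNP correlation matrix; assumes every SNP varies within the subgroup."""
    return np.corrcoef(genotypes, rowvar=False)


rng = np.random.default_rng(4)
panel = rng.binomial(2, 0.3, size=(489, 20)).astype(float)  # 489 toy individuals
gwas_group, eqtl_group = split_reference_panel(panel, rng)

ld_gwas = ld_matrix(gwas_group)  # would parameterize the GWAS simulation
ld_eqtl = ld_matrix(eqtl_group)  # would be used for eQTL simulation and TWAS testing
max_ld_gap = float(np.abs(ld_gwas - ld_eqtl).max())
```

Even though both subgroups come from the same panel, `max_ld_gap` is non-zero because each LD matrix is estimated from a different finite sample, which is the source of misspecification being studied.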
Additionally, we performed two groups of simulations that generate $\boldsymbol{\beta}_{\mathrm{GWAS}}$ under:
• Causal TWAS Model: dependent GWAS and eQTL signals, and
• Horizontal Pleiotropy through Linkage: independent GWAS and eQTL signals (see
Performance metrics and definitions).
Simulation parameters. Here, we define a set of “canonical” parameters that represent a baseline
from which we deviated when exploring specific parameter settings:
• GWAS simulation mode: fast mode,
• linear model: Elastic Net,
• SNP model: 1% SNPs,
• eQTL sample size: 250,
• GWAS sample size: 200K,
• $h_{ge}^{2}$: 0.0005,
• $h_{g}^{2}$: 0.1.
When exploring specific parameter settings, we varied one parameter at a time as follows:
• GWAS simulation mode: fast or standard mode,
• linear model: LASSO, Elastic Net, GBLUP, SuSiE, or True eQTL,
• SNP model: 1 SNP, 1% SNPs, or 10% SNPs,
• eQTL sample size: 100, 250, or 500,
• GWAS sample size: 50K, 100K, 200K, or 500K,
• $h_{ge}^{2}$: 0.0 (null), 0.00005, 0.0001, 0.0005, 0.001, 0.0025, 0.005, or 0.01,
• $h_{g}^{2}$: 0.1.
We performed 50 simulations for each fixed set of simulation parameters.
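The canonical baseline and its one-at-a-time deviations can be written down as a small configuration. The dictionary keys below are hypothetical labels for this sketch and are not twas_sim command-line options.

```python
CANONICAL = {
    "gwas_mode": "fast",
    "linear_model": "Elastic Net",
    "snp_model": "1% SNPs",
    "eqtl_n": 250,
    "gwas_n": 200_000,
    "h2ge": 0.0005,
    "h2g": 0.1,
}

VARIED = {
    "gwas_mode": ["fast", "standard"],
    "linear_model": ["LASSO", "Elastic Net", "GBLUP", "SuSiE", "True eQTL"],
    "snp_model": ["1 SNP", "1% SNPs", "10% SNPs"],
    "eqtl_n": [100, 250, 500],
    "gwas_n": [50_000, 100_000, 200_000, 500_000],
    "h2ge": [0.0, 0.00005, 0.0001, 0.0005, 0.001, 0.0025, 0.005, 0.01],
}


def one_at_a_time(canonical, varied):
    """Return the canonical setting plus every setting that changes one parameter."""
    settings = [dict(canonical)]
    for name, values in varied.items():
        for value in values:
            if value != canonical[name]:
                settings.append({**canonical, name: value})
    return settings


settings = one_at_a_time(CANONICAL, VARIED)
n_runs = len(settings) * 50  # 50 simulations per fixed parameter set
```

Enumerated this way, the grid has 20 distinct parameter sets (the baseline plus 19 single-parameter deviations), each repeated 50 times.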
Chapter 4: Performance Metrics and Definitions
We evaluated the performance of twas_sim in terms of unbiasedness, power, degree of LD
misspecification, memory usage, and CPU time usage. We assessed unbiasedness using three
metrics (Kolmogorov-Smirnov test, family-wise error rate, and inflation) under the null hypothesis,
which states that there is no association between the complex trait and predicted gene expression
(i.e., α = 0). First, we used a two-sided, one-sample Kolmogorov-Smirnov (KS) test to test whether
there was a statistically significant difference between the TWAS Z-score distribution and the
normal distribution for each fixed set of simulation parameters (50 simulations per set) in each
simulation group. Second, we calculated the family-wise error rate (FWER) to assess the proportion
of TWAS tests in which the null hypothesis is rejected under the null. For each fixed set of
simulation parameters, we performed bootstraps with 1,000 repeats to track the proportion of genes
that are TWAS significant with nominal P-value < 0.05. Then, we calculated the FWER by computing
the mean and 95% confidence interval of the bootstrapped results. Third, we calculated inflation to
measure how much twas_sim results deviate from the expected distribution ($\chi^{2}$ distribution
with 1 degree of freedom). We estimated the inflation and its 95% confidence interval using
bootstraps with 1,000 repeats. Specifically, we defined inflation as
$\mathrm{median}(\chi^{2}_{\mathrm{TWAS}})/0.455$, where 0.455 is the median of the 1-df
$\chi^{2}$ distribution; in other words, the inflation is expected to be 1 if the twas_sim results
are perfectly unbiased.
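The FWER and inflation estimates described above can be sketched with the standard library alone. This example makes several assumptions that are not stated in the text: it uses a percentile bootstrap for the 95% interval, resamples pooled test statistics, and substitutes simulated standard-normal Z-scores for real twas_sim output.

```python
import math
import random
import statistics

CHI2_1DF_MEDIAN = 0.455  # median of the chi-square distribution with 1 df


def rejection_rate(p_values, threshold=0.05):
    """Proportion of tests with a nominal p-value below the threshold."""
    return sum(p < threshold for p in p_values) / len(p_values)


def inflation(chi2_values):
    """Median observed chi-square divided by the expected null median."""
    return statistics.median(chi2_values) / CHI2_1DF_MEDIAN


def bootstrap_ci(values, stat, n_boot=1000, seed=0):
    """Mean and percentile 95% interval of a statistic over bootstrap resamples."""
    rng = random.Random(seed)
    draws = sorted(stat(rng.choices(values, k=len(values))) for _ in range(n_boot))
    lower = draws[int(0.025 * n_boot)]
    upper = draws[int(0.975 * n_boot) - 1]
    return statistics.mean(draws), lower, upper


# Toy null Z-scores standing in for TWAS output under alpha = 0.
sim_rng = random.Random(42)
null_z = [sim_rng.gauss(0.0, 1.0) for _ in range(2000)]
null_p = [math.erfc(abs(z) / math.sqrt(2.0)) for z in null_z]  # two-sided p-values
null_chi2 = [z * z for z in null_z]

fwer, fwer_lo, fwer_hi = bootstrap_ci(null_p, rejection_rate)
infl, infl_lo, infl_hi = bootstrap_ci(null_chi2, inflation)
```

For well-calibrated statistics the FWER estimate should sit near 0.05 and the inflation estimate near 1, which is what these toy null draws produce.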
We defined power as the probability of detecting the association between the complex trait and
predicted gene expression when the association truly exists. In other words, it is the proportion
of TWAS tests in which the null hypothesis is rejected under the alternative hypothesis. For each
fixed set of simulation parameters, we estimated the power and its 95% confidence interval by
performing 1,000 nonparametric bootstraps with the threshold TWAS p-value < 0.05/22,000, where
22,000 approximates the total number of protein-coding genes in the human genome.
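As a worked example of the power definition, the threshold 0.05/22,000 is approximately 2.27e-06, and power is the bootstrapped fraction of tests falling below it. The sketch below assumes a percentile interval and uses toy Z-scores drawn around 5 in place of real simulation output.

```python
import math
import random
import statistics

N_GENES = 22_000
THRESHOLD = 0.05 / N_GENES  # about 2.27e-06


def two_sided_p(z):
    """Two-sided normal p-value for a Z-score."""
    return math.erfc(abs(z) / math.sqrt(2.0))


def bootstrap_power(z_scores, threshold=THRESHOLD, n_boot=1000, seed=0):
    """Bootstrap mean and percentile 95% interval of the rejection proportion."""
    rng = random.Random(seed)
    hits = [two_sided_p(z) < threshold for z in z_scores]
    draws = sorted(
        sum(rng.choices(hits, k=len(hits))) / len(hits) for _ in range(n_boot)
    )
    lower = draws[int(0.025 * n_boot)]
    upper = draws[int(0.975 * n_boot) - 1]
    return statistics.mean(draws), lower, upper


sim_rng = random.Random(1)
z_alt = [sim_rng.gauss(5.0, 1.0) for _ in range(50)]  # 50 toy simulations
power, ci_low, ci_high = bootstrap_power(z_alt)
```

With only 50 simulations per parameter set, the bootstrap interval is wide, which is worth keeping in mind when comparing power across models.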
We defined LD misspecification as the inconsistency of LD patterns for individuals of the same
ancestry within the same reference panel, and horizontal pleiotropy through linkage as the
situation in which nearby tagging genes are also tested in TWAS.
We extracted MaxRSS, the maximum memory usage of each simulation process, from Slurm
Workload Manager output using the sacct command. In addition, we obtained CPU time usage with the
process_time function from the Python time library.
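CPU time measurement with `time.process_time` can be wrapped in a small helper. The helper name is hypothetical and the timed call is a stand-in for a simulation run.

```python
import time


def timed_cpu(func, *args, **kwargs):
    """Run func and return its result with the CPU seconds consumed by this process."""
    start = time.process_time()
    result = func(*args, **kwargs)
    return result, time.process_time() - start


result, cpu_seconds = timed_cpu(sum, range(100_000))
```

Unlike wall-clock timers, `process_time` excludes time spent sleeping or waiting on other processes, so it is less sensitive to cluster load.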
Chapter 5: Dynamic Import of External Modules
To enhance the flexibility of twas_sim and enable the use of externally defined prediction models,
we have implemented a dynamic import function that permits users to call external models in a
language-agnostic way, provided it is wrapped inside a Python function named `fit`. As a
demonstration of this feature, we implemented a module that fits the genetic component of gene
expression using the Sum of Single Effects (SuSiE) model (Wang et al., 2020) as implemented in
susieR (see Supplementary Algorithms 1, 2).
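The general mechanism of loading a user-supplied `fit` function at run time can be sketched with `importlib`. This is not the twas_sim loader: the helper name, the module name, and in particular the arguments and return value of the toy `fit` function are assumptions for illustration, and the interface expected by twas_sim should be taken from its repository and Supplementary Algorithm 1.

```python
import importlib.util
import pathlib
import tempfile


def load_fit_function(module_path):
    """Import a Python file by path and return its `fit` function."""
    spec = importlib.util.spec_from_file_location("external_model", str(module_path))
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    fit = getattr(module, "fit", None)
    if not callable(fit):
        raise AttributeError(f"{module_path} does not define a callable `fit`")
    return fit


# Hypothetical external module; a real one could call R or Julia inside `fit`.
EXAMPLE_MODULE = '''
def fit(genotypes, expression):
    """Toy predictor: one zero weight per SNP (placeholder for a real model)."""
    n_snps = len(genotypes[0])
    return [0.0] * n_snps
'''

with tempfile.TemporaryDirectory() as tmp:
    path = pathlib.Path(tmp) / "my_model.py"
    path.write_text(EXAMPLE_MODULE)
    fit = load_fit_function(path)
    weights = fit([[0, 1, 2], [1, 1, 0]], [0.3, -0.1])
```

Because the simulator only needs a callable named `fit`, the body of that function is free to shell out to another language, which is what makes the approach language-agnostic.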
Chapter 6: Application
To illustrate the utility of twas_sim, we performed simulations using genetic data from
1000Genomes (1000 Genomes Project Consortium et al., 2015) across a variety of gene expression
and complex trait architectures, and genotype reference panels (see Simulation).
Unbiasedness. First, we investigated unbiasedness under null simulations (i.e., α = 0) under three
metrics: Kolmogorov-Smirnov (KS) test on TWAS Z-scores, family-wise error rate (FWER) on
TWAS p-value, and inflation (see Performance metrics and definitions). We found TWAS test
statistics computed using Elastic Net are largely consistent with the null (P=0.26) and observed
similar patterns for other linear models (see Figure 2A). Focusing on Elastic Net prediction models,
we observed similar results under various eQTL architectures, eQTL/GWAS sample sizes, and
simulation modes (see Table 1).
Figure 2. TWAS simulation results. A) QQ plot for TWAS χ2 under the null hypothesis. Each point reflects the χ2
statistic under null simulations based on different predictive models. B) TWAS power analysis. Each point reflects
the proportion of simulations where the null was rejected at p < 2.27e-06. The x-axis reflects the proportion of trait
variability explained by gene expression. C) Memory usage by simulation mode. The height of each bar reflects the
average memory usage for fast/standard simulation modes. All error bars reflect 95% confidence intervals.
FWER and Inflation. Next, we evaluated FWER and found calibrated results across prediction
models, eQTL architectures, eQTL/GWAS sample sizes, and simulation modes (see Figure 3).
Similarly, we found no inflation across all settings (see Figure 4). Together, these results suggest
that TWAS test statistics are robust to model assumptions.
Figure 3. The TWAS statistics simulated by twas_sim are well controlled at P=0.05 under the null in
family-wise error rate (FWER). The bar plot for TWAS FWER (under the null scenario where gene expression effect size
α=0) for canonical parameters with various A. linear models B. SNP models C. increasing GWAS sample sizes D.
eQTL sample sizes E. GWAS mode. See Supplementary Note for FWER calculation. 50 simulations were performed
for each fixed set of simulation parameters. The FWER and its error bars (95% confidence interval) are estimated
using bootstraps with 1,000 repeats.
Figure 4. twas_sim simulates non-inflated TWAS statistics under the null across eQTL/GWAS sample sizes
and genetic architecture. The bar plot for TWAS inflation (under the null scenario where gene expression effect size
α=0) for canonical parameters with various A. linear models B. SNP models C. increasing GWAS sample sizes D.
eQTL sample sizes E. GWAS mode. See Supplementary Note for inflation calculation. 50 simulations were
performed for each fixed set of simulation parameters. The inflation and its error bars (95% confidence interval) are
estimated using bootstraps with 1,000 repeats.
Power. Next, we evaluated the power of each prediction model when a causal relationship between
eQTL and complex trait exists (i.e., α ≠ 0). We observed Elastic Net (power=0.66) outperformed
GBLUP (power=0.64), LASSO (power=0.62), and SuSiE (power=0.44; see Figures 5, 6). We
assessed power under various simulation settings and observed that power increased with increasing
$h_{ge}^{2}$, GWAS and eQTL sample sizes, and sparsity of eQTL architectures (see Figure 2B;
Figures 5, 6).
Figure 5. The TWAS statistics simulated by twas_sim retain high power under the alternative across
eQTL/GWAS sample sizes and genetic architecture. The plot for TWAS power (under the alternative scenario where gene
expression effect size α≠0) for canonical parameters with various A. $h_{ge}^{2}$ B. linear models C. SNP models
D. increasing GWAS sample sizes E. eQTL sample sizes F. GWAS mode. See Supplementary Note for power calculation. 50
simulations were performed for each fixed set of simulation parameters. The power and its error bars (95% confidence
interval) are estimated using bootstraps with 1,000 repeats.
Figure 6. SuSiE and Elastic Net have comparable power. The line plot for TWAS power (under the alternative
scenario where gene expression effect size α≠0) for Elastic Net and SuSiE with canonical parameters and increasing
$h_{ge}^{2}$. 50 simulations were performed for each fixed set of simulation parameters. The power and its error
bars (95% confidence interval) are estimated using bootstraps with 1,000 repeats.
19
Canonical parameters with various GWAS modes
GWAS Mode                            KS Statistic    P Value
Fast                                 0.14            0.26
Standard                             0.07            0.95

Canonical parameters with various linear models
Linear Model                         KS Statistic    P Value
Elastic Net                          0.14            0.26
LASSO                                0.17            0.10
GBLUP                                0.10            0.67
True eQTL                            0.12            0.48
External (SuSiE)                     0.16            0.16

Canonical parameters with various SNP models
SNP Model                            KS Statistic    P Value
10% SNPs                             0.10            0.71
1% SNPs                              0.14            0.26
1 SNP                                0.12            0.45

Canonical parameters with increasing eQTL sample size
eQTL Sample Size                     KS Statistic    P Value
100                                  0.10            0.75
250                                  0.14            0.26
500                                  0.16            0.14

Canonical parameters with increasing GWAS sample size
GWAS Sample Size                     KS Statistic    P Value
50K                                  0.12            0.51
100K                                 0.14            0.30
200K                                 0.14            0.26
500K                                 0.12            0.46

Canonical parameters with various reference panels
Reference Panel                      KS Statistic    P Value
Correct                              0.14            0.26
Misspecified                         0.11            0.57

Canonical parameters with various GWAS and eQTL signals
GWAS and eQTL Signals                KS Statistic    P Value
Dependent GWAS and eQTL Signals      0.11            0.57
Independent GWAS and eQTL Signals    0.11            0.59

Table 1. One-Sample, Two-Sided Kolmogorov-Smirnov Test Statistics.
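A one-sample, two-sided Kolmogorov-Smirnov test of the kind summarized in Table 1 can be sketched with SciPy. This is an illustration under stated assumptions: the reference distribution is taken to be the standard normal for TWAS z-scores (the section does not state which distribution was used), and both sets of z-scores below are toy data.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
null_z = rng.standard_normal(50)    # toy z-scores drawn under the null
shifted_z = null_z + 3.0            # toy z-scores that violate the null

# kstest is two-sided by default; "norm" means the standard normal N(0, 1)
null_res = stats.kstest(null_z, "norm")
shifted_res = stats.kstest(shifted_z, "norm")

print(f"null:    D={null_res.statistic:.2f}, p={null_res.pvalue:.2g}")
print(f"shifted: D={shifted_res.statistic:.2f}, p={shifted_res.pvalue:.2g}")
```

A large p-value means the test found no evidence that the statistics deviate from the reference distribution, which is how the non-significant p-values in Table 1 are read.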
LD Misspecification. To assess the degree to which LD misspecification affects TWAS test
statistics, we performed simulations splitting 1000G EUR individuals into two subsets (N=244,
245). The first subset was used to simulate GWAS test statistics, whereas the second was used for
eQTL simulation and downstream TWAS testing. Under the null, we found TWAS test statistics
computed using the correct reference panel (P=0.26) and the misspecified reference panel (P=0.57;
see Table 1) were largely consistent, with similar estimates of inflation (P=0.049) and moderately
reduced FWER (P=0.005; see Table 2). In simulations under a causal model, we observed LD
misspecification reduced power significantly compared with the correctly specified model
(P=2.2E-16; see Table 2, Figures 7A-L).
Performance Evaluation Method    T Statistic    P Value
Power                            26.95          2.2E-16
FWER                             2.80           0.005
Inflation                        2.05           0.049

Table 2. Two-Sided T-test for Power, FWER, and Inflation for Correct and Misspecified Reference Panel.
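A two-sided, two-sample t-test like those in Table 2 can be sketched as below. The section does not say which t-test variant was used or what the sampling units were, so this example assumes Welch's unequal-variance test on per-replicate estimates. The arrays are toy values and are not thesis results.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
# hypothetical per-replicate power estimates under each reference panel
power_correct = rng.normal(loc=0.6, scale=0.05, size=100)
power_misspec = rng.normal(loc=0.4, scale=0.05, size=100)

# ttest_ind is two-sided by default; equal_var=False selects Welch's t-test
res = stats.ttest_ind(power_correct, power_misspec, equal_var=False)
print(f"t={res.statistic:.2f}, p={res.pvalue:.2g}")
```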
Figure 7. Impact of LD misspecification on statistical power. We quantified the impact of LD misspecification.
Briefly, we performed simulations by splitting all Europeans in the 1000G reference panel into two equally sized
groups, simulated GWAS and eQTL data using the first LD group, and computed TWAS test statistics using the
second LD group. We compared results under this scenario with results obtained using the correctly specified panel
(eQTL, GWAS, and TWAS data generated from all Europeans in the 1000G reference panel) to assess statistical power
(A-D), FWER (E-H), and inflation under the null (I-L). The power, FWER, and inflation and their error bars (95%
confidence interval) are estimated using bootstraps with 1,000 repeats.
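The panel split used for the LD misspecification experiment can be sketched as a random partition of individual indices. The thesis does not state how individuals were assigned to each subset, so random assignment is an assumption here, and `split_panel` is a hypothetical helper name.

```python
import numpy as np

def split_panel(n_individuals, seed=0):
    """Randomly partition individual indices into two near-equal, disjoint halves."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_individuals)
    half = n_individuals // 2
    return np.sort(order[:half]), np.sort(order[half:])

# 244 + 245 = 489 individuals, matching the subset sizes reported in the text
first, second = split_panel(489)
print(len(first), len(second))
```

Each index array would then select the rows of the genotype matrix used to compute that subset's LD.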
Scalability. To highlight the scalability of twas_sim to extremely large GWAS sample sizes, we
evaluated its performance under standard and fast simulation modes. We found fast mode required
6x and 36x less memory and 8x and 41x less CPU time compared with standard mode, for GWAS
sample sizes of 100K and 500K, respectively (see Figure 2C, and Figure 8).
Figure 8. Fast mode outperforms standard mode in mean CPU time regardless of GWAS sample size. Bar
plot of mean CPU time (in seconds) for canonical parameters under the two GWAS modes with increasing GWAS sample
sizes. 50 simulations were performed for each fixed set of simulation parameters.
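CPU time and memory comparisons of this kind can be collected with the Python standard library. The sketch below is one possible approach and not necessarily how the thesis measurements were made. `profile` and `toy_workload` are hypothetical names, and `tracemalloc` only tracks memory allocated through Python's allocator.

```python
import time
import tracemalloc

def profile(func, *args, **kwargs):
    """Run func once; return (result, cpu_seconds, peak_bytes)."""
    tracemalloc.start()
    cpu_start = time.process_time()
    try:
        result = func(*args, **kwargs)
        cpu_seconds = time.process_time() - cpu_start
        _, peak_bytes = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, cpu_seconds, peak_bytes

def toy_workload(n):
    # stand-in for one simulation run; allocates a list of n integers
    return sum(list(range(n)))

result, cpu_seconds, peak_bytes = profile(toy_workload, 200_000)
print(f"cpu={cpu_seconds:.4f}s, peak={peak_bytes / 1e6:.1f} MB")
```

Averaging `cpu_seconds` over repeated runs for each GWAS sample size would give per-mode means comparable to those plotted in Figure 8.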
Horizontal Pleiotropy Through Linkage. Lastly, to assess how horizontal pleiotropy through
linkage (i.e., genes whose eQTLs are in LD with eQTLs for a causal gene) inflates TWAS test
statistics, we simulated GWAS effect sizes independently from eQTLs and performed TWAS
testing. Overall, although TWAS test statistics at tagging genes were not as large as those
computed using the causal gene (see Figure 9A), they were significantly inflated, resulting in an
elevated FWER (see Performance metrics and definitions; Figure 9B). This is consistent with
previous work (Wainberg et al., 2019; Mancuso et al., 2019; Lu et al., 2022) and emphasizes the
need for joint testing of multiple nearby genes or statistical fine-mapping.
Figure 9. A. TWAS $\chi^2$ test statistics generated from a model of horizontal pleiotropy through linkage as
compared to the TWAS model. B. FWER increased as GWAS $h_g^2$ increased in the horizontal pleiotropy through linkage
model. The bar plot represents TWAS $\chi^2$ test statistics for canonical parameters with various GWAS $h_g^2$. 50
simulations were performed for each fixed set of simulation parameters. The TWAS $\chi^2$ and their error bars (95%
confidence interval) are estimated using bootstraps with 1,000 repeats.
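One way to see why tagging genes pick up signal is that LD between the eQTLs of two genes makes their genotype-predicted expression correlated. The sketch below assumes standardized genotypes, so that the LD matrix equals the genotype covariance. The two-SNP LD matrix, the weights, and the helper name `predicted_expression_corr` are toy values chosen for illustration.

```python
import numpy as np

def predicted_expression_corr(w_a, w_b, ld):
    """Correlation between genotype-predicted expression of two genes."""
    cov = w_a @ ld @ w_b
    return cov / np.sqrt((w_a @ ld @ w_a) * (w_b @ ld @ w_b))

# two SNPs in LD (r = 0.8); each gene has a single, distinct eQTL
ld = np.array([[1.0, 0.8],
               [0.8, 1.0]])
w_causal = np.array([1.0, 0.0])    # eQTL weights for the causal gene
w_tagging = np.array([0.0, 1.0])   # eQTL weights for the tagging gene

r = predicted_expression_corr(w_causal, w_tagging, ld)
r_no_ld = predicted_expression_corr(w_causal, w_tagging, np.eye(2))
```

Here the two genes share no eQTL SNP, yet their predicted expression is correlated at the LD between the SNPs. With no LD, the correlation is zero.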
Chapter 7: Conclusion
Here, we present twas_sim, a flexible and scalable tool for simulating TWAS test statistics. It
simulates expression levels and complex traits under a variety of feasible genetic architectures,
can efficiently generate GWAS data, and provides a high-level summary report and an individual
report for each SNP. The simulation results are easily interpretable for downstream model
evaluation. We evaluated the performance of twas_sim in terms of unbiasedness
(Kolmogorov-Smirnov test, family-wise error rate, inflation) under the null hypothesis (no causal
effect of gene expression, α = 0) and power under the alternative hypothesis (α ≠ 0). The
evaluation covered various eQTL architectures, eQTL/GWAS sample sizes, and both fast and
standard simulation modes. We conclude that TWAS test statistics simulated by twas_sim remain
unbiased regardless of sample size and genetic architecture. Elastic Net has lower power than the
true eQTL model and higher power than LASSO and GBLUP. LD misspecification reduced power
significantly in simulations under a causal model as compared with the correctly specified model.
Fast mode uses similar memory and CPU time regardless of GWAS sample size; it uses 1/6 and
1/36 as much memory, and 1/8 and 1/41 as much CPU time, as standard mode for GWAS sample
sizes of 100K and 500K, respectively. Additionally, while TWAS test statistics at tagging genes
were not as large as those computed using the causal gene, we found significantly inflated test
statistics resulting in an elevated FWER, which demonstrates the need for joint testing of multiple
nearby genes or statistical fine-mapping. The simulator currently supports fitting LASSO, Elastic
Net, and GBLUP prediction models to predict gene expression into GWAS. It is easily extendable
through a dynamic import function that allows additional linear models to be included to
accommodate TWAS methods.
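The dynamic import mechanism mentioned above can be illustrated with Python's `importlib`. This is a generic sketch and not twas_sim's actual implementation. The module name `toy_model`, the helper `load_fit`, and the throwaway module written to a temporary directory are all hypothetical.

```python
import importlib
import sys
import tempfile
from pathlib import Path

def load_fit(module_name):
    """Import a module by name and return its `fit` function."""
    importlib.invalidate_caches()
    module = importlib.import_module(module_name)
    fit = getattr(module, "fit", None)
    if not callable(fit):
        raise AttributeError(f"{module_name} must define a callable `fit`")
    return fit

# write a throwaway module to a temp directory so the sketch is self-contained
tmp_dir = tempfile.mkdtemp()
Path(tmp_dir, "toy_model.py").write_text(
    "def fit(Z, y, h2g, b_qtls=None, args=None):\n"
    "    return [0.0] * len(Z[0]), None, None\n"
)
sys.path.insert(0, tmp_dir)

fit = load_fit("toy_model")
coef, r2, logl = fit([[0, 1, 2]], [0.5], h2g=0.1)
```

Resolving the model by name at run time lets a new prediction method be added as a separate file, without editing the simulator itself.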
Supplementary Tables.
Label               Description
gwas.sim            GWAS mode
real.time           Real time spent on the current simulation
cpu.time            CPU time spent on the current simulation
linear_model        Linear model used in the current simulation
h2ge                Variance explained in trait by GE
snp_model           SNP model used in the current simulation
nsnps               Number of SNPs
ngwas               GWAS sample size
nqtl                eQTL sample size
h2g                 Narrow-sense heritability of GE
h2g.hat             Predicted narrow-sense heritability of GE
avg.ldsc            Average LD-score at the region
min.gwas.p          Minimum GWAS SNP p-value
mean.gwas.chi2      Mean GWAS SNP chi-sq
median.gwas.chi2    Median GWAS SNP chi-sq
twas.z              TWAS Z score
twas.p              TWAS p-value
alpha               TWAS alpha

Supplementary Table 1. Structure of TWAS simulator OUTPUT.summary.tsv. “Label” represents output file’s
column names.
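A summary file with the columns in Supplementary Table 1 can be aggregated with the standard `csv` module. The rows below are toy data covering only a subset of the columns. The `linear_model` values (`enet`, `lasso`) and the 0.05 threshold are assumptions for illustration, and `rejection_rate` is a hypothetical helper.

```python
import csv
import io

# toy rows using a subset of the columns from Supplementary Table 1
toy_summary = (
    "linear_model\tngwas\ttwas.z\ttwas.p\n"
    "enet\t100000\t5.2\t2.0e-07\n"
    "enet\t100000\t0.4\t6.9e-01\n"
    "lasso\t100000\t4.9\t9.6e-07\n"
    "lasso\t100000\t1.1\t2.7e-01\n"
)

def rejection_rate(handle, threshold=0.05):
    """Fraction of simulations with twas.p below threshold, per linear model."""
    counts = {}
    for row in csv.DictReader(handle, delimiter="\t"):
        hits, total = counts.get(row["linear_model"], (0, 0))
        hits += float(row["twas.p"]) < threshold
        counts[row["linear_model"]] = (hits, total + 1)
    return {model: hits / total for model, (hits, total) in counts.items()}

rates = rejection_rate(io.StringIO(toy_summary))
```

For a real run, the same function could be passed an open file handle for the summary file in place of the in-memory string.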
Column             Description
chrom              Chromosome
snp                SNP identifier
pos                bp position
a0                 Non-effect allele
a1                 Effect allele
maf                Minor allele frequency
ld.score           LD score (i.e., $\sum_i r_{ij}^2$, where $r_{ij}$ is the LD between SNPs i and j)
ld.score.causal    LD score for causal variants
gwas.sim           GWAS mode
gwas.true          True causal effect for complex trait
gwas.beta          Beta coefficient in GWAS
gwas.se            Standard error in GWAS
eqtl.true          True causal effect for expression
eqtl.beta          Beta coefficient in eQTL
eqtl.se            Standard error in eQTL
eqtl.model         Linear model to predict gene expression from genotype
eqtl.model.beta    Coefficient estimated in selected linear model

Supplementary Table 2. Structure of TWAS simulator OUTPUT.scan.tsv. “Column” represents output file’s
column names.
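The `ld.score` definition in Supplementary Table 2 (the sum over SNPs i of squared LD with SNP j) can be computed directly from a genotype matrix. The sketch below uses the plain in-sample sum, including each SNP's LD with itself. Whether twas_sim applies any bias adjustment is not stated here, and the genotypes are simulated toy data.

```python
import numpy as np

def ld_scores(genotypes):
    """LD score per SNP: column sums of squared pairwise correlations."""
    r = np.corrcoef(genotypes, rowvar=False)   # SNP-by-SNP correlation matrix
    return (r ** 2).sum(axis=0)

rng = np.random.default_rng(3)
snp1 = rng.binomial(2, 0.3, size=200).astype(float)
snp3 = rng.binomial(2, 0.5, size=200).astype(float)
# the second column duplicates snp1 (perfect LD); snp3 is drawn independently
G = np.column_stack([snp1, snp1, snp3])
scores = ld_scores(G)
```

The two SNPs in perfect LD get identical scores of at least 2, while the independent SNP scores close to 1.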
Supplementary Algorithms
Supplementary Algorithm 1. External module to call susieR. This Python script creates a
subprocess to run custom R code (external.R; see Supplementary Algorithm 2), which imports the
simulated eQTL data, fits the susieR model, and exports the fitted prediction weights. These fitted
weights are imported and returned to the primary twas_sim module for use in TWAS testing.
"""
This file (`external_r.py`) is an example of how to define an external/custom function to
fit a predictive model of gene expression from genotype to be used by `twas_sim`. Here
we are illustrating how to call an external R script to call susieR on the simulated
data. Please see `external.R` for example R script details.

External modules -must- include a function named `fit` that takes as arguments:
    Z: numpy matrix of genotype
    y: numpy vector of gene expression/phenotype
    h2g: the true h2g of gene expression
    b_qtls: the true beta/effect-sizes for gene expression (i.e. eQTL)
    args: the argparse object from twas_sim; useful for pulling `args.output`
        as a prefix for temp data.

Similarly, it must return a tuple containing (coef, r2, logl):
    coef: the numpy vector for estimated eQTL weights
    r2: the predictive r2 (optional; None)
    logl: the log likelihood of the model (optional; None)
"""
import subprocess

import numpy as np


def fit(Z, y, h2g, b_qtls=None, args=None):
    # create output/input paths
    geno_path = f"{args.output}.eqtl.genotype.txt.gz"
    pheno_path = f"{args.output}.eqtl.gexpr.txt.gz"
    coef_path = f"{args.output}.susie.coef.txt.gz"

    # write genotype and phenotype to disk so that R can load it
    np.savetxt(geno_path, Z, fmt="%.5f")
    np.savetxt(pheno_path, y, fmt="%.5f")

    # launch R script in a separate process
    # R script reads in genotype, phenotype matrix and writes out SuSiE-inferred
    # coefficients to `coef_path`
    subprocess.run(
        f"~/miniconda3/bin/Rscript external.R {geno_path} {pheno_path} {coef_path}",
        shell=True,
        check=True,
    )

    # load/read in SuSiE-inferred coefficients
    coef = np.loadtxt(coef_path)

    # r2 and logl are optional => hence `None`
    return coef, None, None
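For comparison, an external module does not have to call out to R. The following sketch keeps the same `fit` signature and return tuple as Supplementary Algorithm 1 but fits ridge regression in NumPy. It is a hypothetical example and not part of twas_sim. The penalty `p * (1 - h2g) / h2g` is a BLUP-style assumption that presumes standardized genotypes and unit-variance expression.

```python
import numpy as np

def fit(Z, y, h2g, b_qtls=None, args=None):
    """Ridge-regression weights; returns the (coef, r2, logl) tuple."""
    n, p = Z.shape
    # assumption: BLUP-style penalty, ratio of residual to per-SNP variance
    lam = p * (1.0 - h2g) / h2g
    coef = np.linalg.solve(Z.T @ Z + lam * np.eye(p), Z.T @ y)
    # r2 and logl are optional, so None is returned for both
    return coef, None, None

# toy data: 100 individuals, 5 SNPs, two true eQTLs
rng = np.random.default_rng(4)
Z = rng.standard_normal((100, 5))
true_b = np.array([0.5, 0.0, 0.0, -0.3, 0.0])
y = Z @ true_b + 0.1 * rng.standard_normal(100)

coef, r2, logl = fit(Z, y, h2g=0.5)
```

Because only the `fit` interface is fixed, any estimator that returns one weight per SNP can be substituted in the same way.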
Supplementary Algorithm 2. R code to import simulated eQTL data and output susieR
prediction weights.
# Example R script (external.R) to read in txt-based genotype data, along with
# gene expression data. Here, we fit SuSiE using the `susieR` package,
# pull coefficients at genotype (ignoring constant term)
# and write out results so that Python can read them in/load them.
library(susieR)
library(readr)
# paths to genotype, phenotype, and output
args <- commandArgs(trailingOnly=TRUE)
z_path <- args[1]
y_path <- args[2]
out_path <- args[3]
# load data as matrices
Z <- as.matrix(read.table(z_path, header = FALSE, dec = "."))
y <- as.matrix(read.table(y_path, header = FALSE, dec = "."))
# run SuSiE using at most 5 effects
res <- susie(Z, y, L=5)
# pull coefficients at genotypes (first coef is intercept)
g_coef <- data.frame(COEF=coef(res)[2:length(coef(res))])
# write out the result, ignoring column name
write_tsv(g_coef, out_path, col_names= FALSE)
References
1000 Genomes Project Consortium et al. (2015) A global reference for human genetic variation.
Nature, 526, 68–74.
Berisa,T. and Pickrell,J.K. (2016) Approximately independent linkage disequilibrium blocks in
human populations. Bioinformatics, 32, 283–285.
Bhattacharya,A. et al. (2021) MOSTWAS: Multi-omic strategies for transcriptome-wide
association studies. PLoS Genet., 17, e1009398.
Edwards,S.L. et al. (2013) Beyond GWASs: illuminating the dark road from association to
function. Am. J. Hum. Genet., 93, 779–797.
Gamazon,E.R. et al. (2015) A gene-based association method for mapping traits using reference
transcriptome data. Nat. Genet., 47, 1091–1098.
Goeman,J.J. (2010) L1 penalized estimation in the Cox proportional hazards model. Biom. J., 52,
70–84.
Gusev,A. et al. (2016) Integrative approaches for large-scale transcriptome-wide association
studies. Nat. Genet., 48, 245–252.
Hindorff,L.A. et al. (2009) Potential etiologic and functional implications of genome-wide
association loci for human diseases and traits. Proc. Natl. Acad. Sci. U. S. A., 106, 9362–
9367.
Hoerl,A.E. and Kennard,R.W. (1970) Ridge regression: Biased estimation for nonorthogonal
problems. Technometrics, 12, 55–67.
International HapMap 3 Consortium et al. (2010) Integrating common and rare genetic variation
in diverse human populations. Nature, 467, 52–58.
Liu,L. et al. (2021) Multi-trait transcriptome-wide association studies with probabilistic
Mendelian randomization. Am. J. Hum. Genet., 108, 240–256.
Lu,Z. et al. (2022) Multi-ancestry fine-mapping improves precision to identify causal genes in
transcriptome-wide association studies. Am. J. Hum. Genet., 109, 1388–1404.
Mancuso,N. et al. (2019) Probabilistic fine-mapping of transcriptome-wide association studies.
Nat. Genet., 51, 675–682.
Maurano,M.T. et al. (2012) Systematic localization of common disease-associated variation in
regulatory DNA. Science, 337, 1190–1195.
Nagpal,S. et al. (2019) TIGAR: An improved Bayesian tool for transcriptomic data imputation
enhances gene mapping of complex traits. Am. J. Hum. Genet., 105, 258–266.
O’Leary,N.A. et al. (2016) Reference sequence (RefSeq) database at NCBI: current status,
taxonomic expansion, and functional annotation. Nucleic Acids Res., 44, D733-45.
Parrish,R.L. et al. (2022) TIGAR-V2: Efficient TWAS tool with nonparametric Bayesian eQTL
weights of 49 tissue types from GTEx V8. HGG Adv, 3, 100068.
Pasaniuc,B. and Price,A.L. (2017) Dissecting the genetics of complex traits using summary
association statistics. Nat. Rev. Genet., 18, 117–127.
Patterson,H.D. and Thompson,R. (1971) Recovery of Inter-Block Information when Block Sizes
are Unequal. Biometrika, 58, 545.
Searle,S.R. et al. (1992) Variance Components Searle,S.R. et al. (eds) John Wiley & Sons,
Nashville, TN.
Shi,H. et al. (2016) Contrasting the genetic architecture of 30 complex traits from summary
association data. Am. J. Hum. Genet., 99, 139–153.
Sudmant,P.H. et al. (2015) An integrated map of structural variation in 2,504 human genomes.
Nature, 526, 75–81.
Tang,S. et al. (2021) Novel Variance-Component TWAS method for studying complex human
diseases with applications to Alzheimer’s dementia. PLoS Genet., 17, e1009482.
Tibshirani,R. (1996) Regression shrinkage and selection via the lasso. J. R. Stat. Soc., 58, 267–
288.
Vierstra,J. et al. (2020) Global reference mapping of human transcription factor footprints. Nature,
583, 729–736.
Visscher,P.M. et al. (2017) 10 years of GWAS discovery: Biology, function, and translation. Am.
J. Hum. Genet., 101, 5–22.
Wainberg,M. et al. (2019) Opportunities and challenges for transcriptome-wide association studies.
Nat. Genet., 51, 592–599.
Wang,G. et al. (2020) A simple new approach to variable selection in regression, with application
to genetic fine mapping. J. R. Stat. Soc. Series B Stat. Methodol., 82, 1273–1300.
Yang,J. et al. (2012) Conditional and joint multiple-SNP analysis of GWAS summary statistics
identifies additional variants influencing complex traits. Nat. Genet., 44, 369–75, S1-3.
Zhu,X. and Stephens,M. (2017) Bayesian large-scale multiple regression with Summary Statistics
from genome-wide association studies. Ann. Appl. Stat., 11, 1561–1592.
Zou,H. and Hastie,T. (2005) Regularization and variable selection via the elastic net. J. R. Stat.
Soc. Series B Stat. Methodol., 67, 301–320.
Abstract
Genome-wide association studies (GWASes) have identified numerous variants associated with quantitative traits. However, most of these associations are in linkage disequilibrium (LD) with other variants and fall in the non-coding region, complicating proximal target gene identification. Transcriptome-wide association studies (TWASs) have been proposed to mitigate this gap by integrating expression quantitative trait loci (eQTL) data with GWAS data. Numerous methodological advancements have been made for TWAS, yet each approach requires ad-hoc simulations to demonstrate feasibility. The power of TWAS test statistics may be compromised by LD misspecification, defined as the inconsistency of LD patterns for individuals of single ancestry across different reference panels or within the same reference panel. Here, we present twas_sim, a computationally scalable and easily extendable tool for simplified performance evaluation and power analysis for TWAS methods. To illustrate the utility of twas_sim, we perform simulations across various genetic architectures and data contexts from 1000 Genomes genotype data to evaluate the power and unbiasedness of the simulated TWAS test statistics. We further use twas_sim to assess the degree of LD misspecification within the same reference panel.
Asset Metadata
Creator: Wang, Xinran (author)
Core Title: twas_sim, a Python-based tool for simulation and power analysis of transcriptome-wide association analysis
School: Keck School of Medicine
Degree: Master of Science
Degree Program: Applied Biostatistics and Epidemiology
Degree Conferral Date: 2023-05
Publication Date: 05/12/2023
Defense Date: 05/12/2023
Publisher: University of Southern California (original), University of Southern California. Libraries (digital)
Tag: GWAS, OAI-PMH Harvest, TWAS
Format: theses (aat)
Language: English
Contributor: Electronically uploaded by the author (provenance)
Advisor: Mancuso, Nicholas (committee chair), Conti, David (committee member), Siegmund, Kimberly (committee member)
Creator Email: xrwang010110111@gmail.com, xwang505@usc.edu
Permanent Link (DOI): https://doi.org/10.25549/usctheses-oUC113121764
Unique identifier: UC113121764
Identifier: etd-WangXinran-11838.pdf (filename)
Legacy Identifier: etd-WangXinran-11838
Document Type: Thesis
Rights: Wang, Xinran
Internet Media Type: application/pdf
Type: texts
Source: 20230512-usctheses-batch-1043 (batch), University of Southern California (contributing entity), University of Southern California Dissertations and Theses (collection)
Access Conditions: The author retains rights to his/her dissertation, thesis or other graduate work according to U.S. copyright law. Electronic access is being provided by the USC Libraries in agreement with the author, as the original true and official version of the work, but does not grant the reader permission to use the work if the desired use is covered by copyright. It is the author, as rights holder, who must provide use permission if such use is covered by copyright.
Repository Name: University of Southern California Digital Library
Repository Location: USC Digital Library, University of Southern California, University Park Campus MC 2810, 3434 South Grand Avenue, 2nd Floor, Los Angeles, California 90089-2810, USA
Repository Email: cisadmin@lib.usc.edu